10 Datasets for Machine Learning Classification Experiments

1. IRIS Dataset

IRIS Dataset for Classification
The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured for each sample: the length and width of the sepals and of the petals, in centimeters.
This dataset is free and is publicly available at the UCI Machine Learning Repository.
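As a minimal sketch of how such a file might be read, the snippet below parses rows with four numeric features followed by a species label. It assumes the comma-separated layout used by the UCI copy of the file; the three rows in `SAMPLE` are only illustrative, and `load_iris_rows` is a hypothetical helper name, not part of any library.

```python
import csv
import io
from collections import Counter

# Three illustrative rows in the assumed layout:
# sepal length, sepal width, petal length, petal width (cm), species label.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_iris_rows(text):
    """Parse rows into (features, label) pairs; blank lines are skipped."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        *values, label = row
        samples.append(([float(v) for v in values], label))
    return samples


samples = load_iris_rows(SAMPLE)
class_counts = Counter(label for _, label in samples)
print(len(samples[0][0]), dict(class_counts))
```

Run against the full file, the same counter should report 50 samples for each of the three species.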

2. Telecom Customer Churn Prediction

Customer Churn Prediction Dataset

The Telecom Customer Churn dataset is a raw dataset containing more than 7,000 entries. Each entry has several features and a column stating whether or not the customer has churned.

Credits : Kaggle

3. Wine Quality Dataset

Wine Quality Dataset

The dataset can be used to predict the quality of a wine from its chemical properties. It covers two types of wine, red and white, with a quality rating for each sample.

Credits : UCI Machine Learning Repository
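Because the quality rating is a number, one common way to use it in a classification experiment is to bin it into classes. The sketch below shows a two-class version; the threshold of 7 and the function name `quality_to_class` are assumptions made for illustration, not something the dataset prescribes.

```python
def quality_to_class(scores, threshold=7):
    """Map numeric quality scores to 1 ("good") or 0 ("not good").

    The threshold is an assumed cut-off; choose one that suits the experiment.
    """
    return [1 if score >= threshold else 0 for score in scores]


labels = quality_to_class([5, 7, 8, 6])
print(labels)
```

Keeping the raw scores as separate classes instead gives a multi-class problem.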


4. Parkinson Disease Dataset

Parkinson Disease Dataset

Parkinson’s disease is a progressive nervous system disorder that affects movement. Symptoms start gradually, sometimes with a barely noticeable tremor in just one hand. The dataset contains biomedical measurements, and its main objective is to identify people with Parkinson’s disease.

Credits: Kaggle

5. Breast Cancer Dataset

Breast Cancer Prediction Classification
Breast cancer is cancer that forms in the cells of the breasts. Breast tumors are of two types: non-cancerous, or ‘benign’, and cancerous, or ‘malignant’. This dataset has a number of features that can help us classify a tumor as Benign (B) or Malignant (M). The diagnosis column is the target variable here.

Credits : UCI Machine Learning Repository
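Most classifiers expect numeric targets, so the B/M labels are usually encoded before training. A minimal sketch, assuming the labels arrive as the strings "B" and "M"; `encode_diagnosis` is a hypothetical helper, and mapping Malignant to 1 is a convention chosen here, not a requirement.

```python
def encode_diagnosis(labels):
    """Encode Benign (B) as 0 and Malignant (M) as 1.

    Raises KeyError on any label other than B or M, so bad rows are not
    silently assigned to a class.
    """
    mapping = {"B": 0, "M": 1}
    return [mapping[label.strip().upper()] for label in labels]


print(encode_diagnosis(["M", "B", "B", "M"]))
```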

6. Titanic Dataset

Titanic Survival Dataset
The sinking of the Titanic is one of the most infamous shipwrecks in history. This dataset contains passenger information such as name, age, gender, and socio-economic class. The objective is to build a predictive model that says whether or not a passenger survived.

Credits : Kaggle

7. PIMA Indians Diabetes Dataset

PIMA Indians Diabetes Dataset
The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Credits : UCI Machine Learning Repository
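The split between predictor variables and the Outcome target can be sketched as follows. The header row follows the column names used in the commonly distributed CSV copy of this dataset (an assumption, so check your file), the two data rows are only illustrative, and `split_features_target` is a hypothetical helper.

```python
import csv
import io

# Assumed header plus two illustrative rows.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
"""


def split_features_target(text, target="Outcome"):
    """Return (X, y): a list of feature dicts and a list of integer targets."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        y.append(int(row.pop(target)))  # remove the target from the features
        X.append({name: float(value) for name, value in row.items()})
    return X, y


X, y = split_features_target(SAMPLE)
print(len(X[0]), y)
```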

8. Default of Credit Card Clients Dataset

Default of Credit Card Clients Dataset
This dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. A default payment is indicated by 1 (yes) or 0 (no).

Credits : UCI Machine Learning Repository

9. Heart Disease Dataset

Heart Disease Dataset

The main objective of the data is to identify patients with heart disease. It has around 14 attributes that help to identify the presence of heart disease. A target value of 1 indicates the presence of disease and 0 indicates its absence.

Credits : UCI Machine Learning Repository

10. Bank Note Authentication Dataset

Bank Note Authentication Dataset
This advanced level data set has 1372 rows and 5 columns. Data were extracted from images of genuine and forged banknote-like specimens, taken for the evaluation of an authentication procedure for banknotes. A Wavelet Transform tool was used to extract features from the images. 1 indicates the note is genuine and 0 indicates it is fake.

Credits : UCI Machine Learning Repository
