In data science projects, imbalanced data is common when working on classification problems. When you evaluate such a model with a metric like accuracy_score(), you may get a high accuracy simply because the model predicts the dominant class (Class - No) well, even though it fails to predict the minority class. This can be dangerous in medical use cases, where the minority class (for example, patients who have a disease) is often the one that matters most. This article is intended to help you understand how to handle data imbalance in real-world data science use cases.
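The sketch below uses a synthetic, hypothetical dataset (about 95% "No" and 5% "Yes") to illustrate the accuracy trap. A baseline that always predicts the majority class scores high on accuracy_score() but has zero recall on the minority class. It then shows one common way to handle the imbalance, `class_weight="balanced"` in scikit-learn's `LogisticRegression`. This is only one option. Resampling methods such as oversampling or undersampling are alternatives, and the sketch does not claim to be the article's method.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: roughly 95% class 0 ("No") and 5% class 1 ("Yes").
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], flip_y=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
y_base = baseline.predict(X_test)

# Logistic regression that weights each class inversely to its frequency.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
y_weighted = weighted.predict(X_test)

baseline_acc = accuracy_score(y_test, y_base)
baseline_recall = recall_score(y_test, y_base)      # recall on class 1 (the minority)
weighted_recall = recall_score(y_test, y_weighted)

print(f"baseline accuracy: {baseline_acc:.3f}, minority recall: {baseline_recall:.3f}")
print(f"class-weighted minority recall: {weighted_recall:.3f}")
```

Reporting recall (or precision, F1, or a confusion matrix) alongside accuracy is what exposes the baseline's failure on the minority class.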
Exploratory Data Analysis (EDA) in data analytics helps uncover hidden, meaningful information in the data. Once the data has been visualized and understood, later steps of the analysis, such as data cleaning and model building, can be planned effectively to meet the business outcomes. Furthermore, EDA supports quicker conclusions, because better decisions are made when data is represented visually rather than as just a collection of numbers. In this article, we will see how to perform exploratory data analysis using matplotlib and seaborn to derive insights into the data.
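A minimal sketch of two typical EDA plots follows: a histogram to inspect a single variable's distribution and a scatter plot to inspect the relationship between two variables. The data here is randomly generated and the column meanings ("age", "income") are hypothetical. To keep dependencies minimal, the sketch uses matplotlib only. seaborn builds on matplotlib and offers higher-level equivalents such as `seaborn.histplot` and `seaborn.scatterplot`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data standing in for a real dataset.
ages = rng.normal(loc=40, scale=12, size=500)
income = 1000 * ages + rng.normal(scale=8000, size=500)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate view: how is a single variable distributed?
ax_hist.hist(ages, bins=30)
ax_hist.set(title="Distribution of age", xlabel="age", ylabel="count")

# Bivariate view: do two variables move together?
ax_scatter.scatter(ages, income, s=8, alpha=0.6)
ax_scatter.set(title="Age vs. income", xlabel="age", ylabel="income")

fig.tight_layout()
fig.savefig("eda_overview.png")  # writes the figure to the current directory
```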
Breast cancer is a cancer that develops in breast cells. According to 2019 U.S. statistics, about 1 in 8 U.S. women (about 12%) will develop invasive breast cancer over the course of her lifetime. In this article, we will see how to identify breast cancer using the K-Nearest Neighbors (KNN) algorithm.
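As an illustration, the sketch below trains a KNN classifier on the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn (`load_breast_cancer`, where label 0 is malignant and 1 is benign). Using this bundled dataset and `n_neighbors=5` are assumptions for the example; the article may use a different dataset or settings. Because KNN classifies by distance to the nearest training points, the features are standardized first so that large-valued features do not dominate the distance.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features, then classify each sample by a majority vote of its 5 nearest neighbors.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")
```

In practice, `n_neighbors` is usually tuned with cross-validation rather than fixed by hand.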
Decision Trees and Random Forests are two powerful tree-based machine learning algorithms that are widely used by data scientists. One notable advantage of tree-based algorithms is that they are relatively easy to interpret. This also makes it straightforward to derive the importance of each variable in the decision-making process: in tree-based methods, it is easy to compute how much each variable contributes to the decisions. In this article, we will see two approaches for feature selection using tree-based models.
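The sketch below shows one common way to use tree-based importances for feature selection. It is not necessarily either of the article's two approaches. It fits a scikit-learn `RandomForestClassifier`, ranks features by the impurity-based `feature_importances_`, and keeps only the features whose importance is above the mean via `SelectFromModel`. The bundled breast-cancer dataset is used purely as a stand-in. `sklearn.inspection.permutation_importance` is an alternative importance measure that is less biased toward high-cardinality features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

data = load_breast_cancer()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
ranking = sorted(zip(forest.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranking[:5]:
    print(f"{name:25s} {importance:.3f}")

# Keep only the features whose importance exceeds the mean importance.
selector = SelectFromModel(forest, prefit=True, threshold="mean")
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)
```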
Regression and classification are the two main types of supervised machine learning tasks. The main difference between them is that regression predicts continuous output values, while classification predicts discrete output values. For example, predicting whether a person has diabetes is a classification problem, while predicting the oil price is a regression problem. Regression is useful when you want to estimate a continuous output value from a set of predictors (inputs). Here we will see how to implement linear regression using Python and scikit-learn with the help of the Advertisement dataset.
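Since the Advertisement dataset itself is not reproduced here, the sketch below fits scikit-learn's `LinearRegression` on synthetic data. It has three hypothetical "advertising spend" predictors and a target generated from known coefficients plus noise. The model should recover those coefficients approximately. The true coefficients, noise level, and sample size are all illustrative assumptions, not values from the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200

# Synthetic stand-in for spend in three hypothetical advertising channels.
X = rng.uniform(0, 100, size=(n, 3))
true_coef = np.array([0.05, 0.2, 0.0])  # the third channel has no real effect
y = 3.0 + X @ true_coef + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Ordinary least squares fit: y ≈ intercept + X @ coef
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("intercept:", round(model.intercept_, 2))
print("coefficients:", np.round(model.coef_, 3))
print("test R^2:", round(r2, 3))
```

On real data the coefficients are unknown, so the held-out R^2 (and residual plots) are what indicate whether a linear model is a reasonable fit.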