In data science projects, it is quite common to encounter imbalanced data when working on classification problems. When you evaluate a model on such data with a metric like accuracy_score(), you may get a high accuracy simply because the model predicts the dominant class (for example, the "No" class) very well, even though it fails to predict the class with fewer samples. This can be dangerous in medical use cases, where the rare class is often the one that matters most. This article is intended to help you understand how to handle data imbalance in real-world data science use cases.
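To make the accuracy problem concrete, here is a minimal sketch in plain Python. The 95/5 class split, the always-"No" model, and the helper names `accuracy` and `recall` are illustrative assumptions, not taken from any real dataset or library. The `accuracy` helper computes the fraction of correct predictions, which is what accuracy_score() reports by default.

```python
# Hypothetical labels: 95 majority samples ("No" = 0) and 5 minority samples ("Yes" = 1).
y_true = [0] * 95 + [1] * 5

# A naive "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive samples that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)

print(f"accuracy: {acc:.2f}")
print(f"minority-class recall: {minority_recall:.2f}")
```

The accuracy looks strong while the recall on the minority class is zero, which is why accuracy alone is a poor guide on imbalanced data.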
Generating large volumes of data for various purposes is often tedious and time-consuming, especially for tasks such as testing an application with dummy data, filling database tables, running machine learning algorithms, and performance testing. Here we look at an effective technique for generating a huge amount of data in seconds with a Python library called ‘Faker’.