Machine Learning with Imbalanced Data
What you'll learn
- Apply random under-sampling to remove observations from majority classes
- Perform under-sampling by removing observations that are hard to classify
- Carry out under-sampling by retaining observations at the boundary of class separation
- Apply random over-sampling to augment the minority class
- Create synthetic data to increase the number of minority-class examples
- Implement SMOTE and its variants to synthetically generate data
- Use ensemble methods with sampling techniques to improve model performance
- Change the misclassification cost optimized by the models to accommodate minority classes
- Determine model performance with the most suitable metrics for imbalanced datasets
Requirements
- Knowledge of basic machine learning algorithms, i.e., regression, decision trees and nearest neighbours
- Python programming, including familiarity with NumPy, Pandas and Scikit-learn
- A Python and Jupyter notebook installation
Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques which you can use with imbalanced datasets to improve the performance of your machine learning models.
If you are working with imbalanced datasets right now and want to improve the performance of your models, or you simply want to learn more about how to tackle data imbalance, this course will show you how.
We'll take you step by step through engaging video tutorials and teach you everything you need to know about working with imbalanced datasets. Throughout this comprehensive course, we cover almost every available methodology for working with imbalanced datasets, discussing the logic behind each technique, its implementation in Python, its advantages and shortcomings, and the considerations to keep in mind when using it. Specifically, you will learn:
- Under-sampling methods, either at random or focused on highlighting certain sample populations
- Over-sampling methods, both at random and those that create new examples based on existing observations
- Ensemble methods that leverage the power of multiple weak learners in conjunction with sampling techniques to boost model performance
- Cost-sensitive methods that penalize wrong decisions more severely for minority classes
- The appropriate metrics to evaluate model performance on imbalanced datasets
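To give a flavour of the sampling ideas listed above, here is a minimal sketch in plain Python. It is not the course's own code: the function names (`random_under_sample`, `random_over_sample`, `interpolate_minority`) and the toy data are made up for illustration, and the last function is only a simplified, SMOTE-style interpolation that picks a random minority partner instead of one of the k nearest neighbours.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):  # sampling without replacement
            X_res.append(row)
            y_res.append(label)
    return X_res, y_res


def random_over_sample(X, y, seed=0):
    """Duplicate rows at random until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_res.append(row)
            y_res.append(label)
    return X_res, y_res


def interpolate_minority(X, y, minority_label, n_new, seed=0):
    """Create synthetic rows on the segment between two minority rows.

    Simplifying assumption: the partner row is chosen at random, whereas
    SMOTE chooses it among the k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    new_rows = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # position along the segment from a to b
        new_rows.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new_rows


# Toy dataset (illustrative only): 9 majority rows and 3 minority rows.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [0] * 9 + [1] * 3

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
synthetic = interpolate_minority(X, y, minority_label=1, n_new=6)

print(Counter(y_under))
print(Counter(y_over))
print(len(synthetic))
```

In practice, resampling of this kind is applied to the training split only, so that the evaluation data keeps the original class distribution.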
By the end of the course, you will be able to decide which technique is suitable for your dataset, and to apply the different methods to multiple datasets and compare the improvement in performance each one returns.
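Comparing methods on imbalanced data depends on the metric chosen. The short sketch below, written in plain Python with a hypothetical helper named `summarize` and made-up labels, shows why accuracy alone can mislead: a model that always predicts the majority class scores high accuracy while never finding a minority example.

```python
def summarize(y_true, y_pred, positive=1):
    """Compute a few binary classification metrics from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority examples found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority examples found
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative labels: 95 majority (0) and 5 minority (1) observations.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores the minority class entirely.
always_majority = [0] * 100

report = summarize(y_true, always_majority)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Here accuracy is 0.95 while recall is 0.0 and balanced accuracy is 0.5, which is the kind of gap that metrics suited to imbalanced datasets are meant to expose.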
This comprehensive machine learning course includes over 50 lectures spanning more than 10 hours of video, and ALL topics include hands-on Python code examples which you can use for reference and for practice, and re-use in your own projects.
In addition, the code is updated regularly to keep up with new trends and new Python library releases.
So what are you waiting for? Enroll today, learn how to work with imbalanced datasets and build better machine learning models.
Who this course is for:
- Data scientists and machine learning engineers working with imbalanced datasets
- Data scientists who want to improve the performance of models trained on imbalanced datasets
- Students who want to learn intermediate content on machine learning
- Students working with imbalanced multi-class targets
Hey, I am Sole. I am a data scientist and open-source Python developer with a passion for teaching and programming.
I teach intermediate and advanced courses on machine learning, covering topics like how to improve machine learning pipelines, better engineer and select features, optimize models, and deal with imbalanced datasets.
I am the developer and maintainer of Feature-engine, an open-source Python library for feature engineering and selection, and the author of Packt's "Python Feature Engineering Cookbook" and the "Feature Selection in Machine Learning with Python" book.
I received a Data Science Leaders Award in 2018 and was selected as one of "LinkedIn’s voices" in data science and analytics in 2019.
I worked as a data scientist for financial and insurance firms, developing and putting in production machine learning models to assess credit risk, process insurance claims, and prevent fraud.
I love sharing knowledge about data science and machine learning. This is why I teach online, create and contribute to open-source software, and also speak at meetups, write blogs, and participate in podcasts.
I've got an MSc in Biology, a PhD in Biochemistry, and 8+ years of experience as a research scientist at well-known institutions like University College London and the Max Planck Institute. I've also taught biochemistry for 4+ years at the University of Buenos Aires and mentored MSc and PhD students.
Feel free to contact me on LinkedIn, follow me on Twitter, or visit our website for blogs about machine learning.