Udemy
30-Day Money-Back Guarantee

Machine Learning with Imbalanced Data

Learn multiple techniques to tackle data imbalance and improve the performance of your machine learning models.
Rating: 4.7 out of 5 (109 ratings)
1,683 students
Created by Soledad Galli
Last updated 2/2021
English
English [Auto]

What you'll learn

  • Random under-sampling methods
  • Under-sampling methods which focus on observations that are harder to classify
  • Under-sampling methods that ignore potentially noisy observations
  • Over-sampling methods to increase the number of minority observations
  • Ways of creating synthetic data to increase the number of minority-class examples
  • SMOTE and its variants
  • Use ensemble methods with sampling techniques to improve model performance
  • The most suitable evaluation metrics to use with imbalanced datasets
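
As a rough illustration of the last point, the sketch below shows why plain accuracy can mislead on an imbalanced dataset. It is not taken from the course materials: the toy labels, the two "models" and the helper names `confusion_counts` and `summarize` are all hypothetical, and only the Python standard library is used.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) observations.
y_true = [0] * 95 + [1] * 5

# "Model" A always predicts the majority class.
always_majority = [0] * 100

# "Model" B raises 10 false alarms but finds 4 of the 5 minority cases.
some_model = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

report_a = summarize(y_true, always_majority)
report_b = summarize(y_true, some_model)
print(report_a)
print(report_b)
```

Model A scores higher on accuracy even though it never identifies a minority observation, while recall and F1 favour model B. Which metric matters most depends on the cost of each kind of error in the application.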
Curated for the Udemy for Business collection

Course content

10 sections • 104 lectures • 8h 27m total length

  • Preview lecture
    03:16
  • Preview lecture
    03:13
  • Preview lecture
    02:01
  • Code | Jupyter notebooks
    00:15
  • Presentations covered in the course
    00:04
  • Python package Imbalanced-learn
    00:07
  • Download Datasets
    00:08
  • Additional resources for Machine Learning and Python programming
    00:47

  • Imbalanced classes - Introduction
    05:24
  • Nature of the imbalanced class
    04:56
  • Approaches to work with imbalanced datasets - Overview
    03:59
  • Additional Reading Resources (Optional)
    00:10

  • Introduction to Performance Metrics
    02:43
  • Accuracy
    04:33
  • Accuracy - Demo
    06:05
  • Precision, Recall and F-measure
    13:32
  • Install Yellowbrick
    00:07
  • Precision, Recall and F-measure - Demo
    10:04
  • Confusion tables, FPR and FNR
    06:03
  • Confusion tables, FPR and FNR - Demo
    07:32
  • Geometric Mean, Dominance, Index of Imbalanced Accuracy
    04:29
  • Geometric Mean, Dominance, Index of Imbalanced Accuracy - Demo
    10:25
  • ROC-AUC
    07:26
  • ROC-AUC - Demo
    04:46
  • Precision-Recall Curve
    07:54
  • Precision-Recall Curve - Demo
    02:40
  • Additional reading resources (Optional)
    00:15
  • Probability
    04:32

  • Preview lecture
    05:21
  • Random Under-Sampling - Intro
    05:37
  • Random Under-Sampling - Demo
    10:11
  • Preview lecture
    07:25
  • Condensed Nearest Neighbours - Demo
    07:25
  • Tomek Links - Intro
    04:48
  • Tomek Links - Demo
    03:18
  • One Sided Selection - Intro
    02:39
  • One Sided Selection - Demo
    03:32
  • Edited Nearest Neighbours - Intro
    04:44
  • Edited Nearest Neighbours - Demo
    04:02
  • Repeated Edited Nearest Neighbours - Intro
    04:39
  • Repeated Edited Nearest Neighbours - Demo
    03:00
  • All KNN - Intro
    03:38
  • All KNN - Demo
    02:54
  • Neighbourhood Cleaning Rule - Intro
    04:14
  • Neighbourhood Cleaning Rule - Demo
    02:03
  • NearMiss - Intro
    03:47
  • NearMiss - Demo
    03:53
  • Instance Hardness Threshold - Intro
    04:09
  • Instance Hardness Threshold - Demo
    03:41
  • Undersampling Method Comparison
    07:44
  • Summary Table
    00:00

  • Over-Sampling Methods - Introduction
    03:41
  • Random Over-Sampling
    03:10
  • Random Over-Sampling - Demo
    04:55
  • Preview lecture
    09:26
  • SMOTE - Demo
    02:35
  • SMOTE-NC
    09:02
  • SMOTE-NC - Demo
    02:56
  • ADASYN
    07:11
  • ADASYN - Demo
    03:17
  • Borderline SMOTE
    08:41
  • Borderline SMOTE - Demo
    03:13
  • SVM SMOTE
    05:39
  • SVM SMOTE - Demo
    04:32
  • K-Means SMOTE
    05:42
  • K-Means SMOTE - Demo
    03:29
  • Over-Sampling Method Comparison
    06:18

  • Combining Over and Under-sampling - Intro
    06:32
  • Combining Over and Under-sampling - Demo
    05:26
  • Comparison of Over and Under-sampling Methods
    05:54

  • Ensemble methods with Imbalanced Data
    04:49
  • Foundations of Ensemble Learning
    03:12
  • Bagging
    03:04
  • Bagging plus Over- or Under-Sampling
    05:38
  • Boosting
    10:03
  • Boosting plus Re-Sampling
    07:05
  • Hybrid Methods
    04:48
  • Ensemble Methods - Demo
    09:59
  • Additional Reading Resources
    00:17

  • Cost-sensitive Learning - Intro
    07:27
  • Types of Cost
    10:55
  • Obtaining the Cost
    04:28
  • Cost Sensitive Approaches
    01:52
  • Misclassification Cost in Logistic Regression
    03:35
  • Misclassification Cost in Decision Trees
    04:02
  • Cost Sensitive Learning with Scikit-learn - Demo
    07:13
  • Find Optimal Cost with hyperparameter tuning
    03:33
  • Bayes Conditional Risk
    13:44
  • MetaCost
    08:03
  • MetaCost - Demo
    03:40
  • Optional: MetaCost Base Code
    06:39
  • Additional Reading Resources
    00:13

  • Probability Calibration
    06:41
  • Probability Calibration Curves
    05:56
  • Probability Calibration Curves - Demo
    09:37
  • Brier Score
    03:06
  • Brier Score - Demo
    07:07
  • Under- and Over-sampling and Cost-sensitive learning on Probability Calibration
    05:10
  • Calibrating a Classifier
    05:25
  • Calibrating a Classifier - Demo
    06:20
  • Calibrating a Classifier after SMOTE or Under-sampling
    08:05
  • Calibrating a Classifier with Cost-sensitive Learning
    03:31
  • Probability: Additional reading resources
    00:09

  • Next steps
    00:12

Requirements

  • Knowledge of machine learning basic algorithms, i.e., regression, decision trees and nearest neighbours
  • Python programming, including familiarity with NumPy, Pandas and Scikit-learn

Description

Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques which you can use with imbalanced datasets to improve the performance of your machine learning models.


If you are working with imbalanced datasets right now and want to improve the performance of your models, or you simply want to learn more about how to tackle data imbalance, this course will show you how.


We'll take you step-by-step through engaging video tutorials and teach you everything you need to know about working with imbalanced datasets. Throughout this comprehensive course, we cover almost every available methodology for working with imbalanced datasets, discussing the logic behind each technique, its implementation in Python, its advantages and shortcomings, and what to consider when using it. Specifically, you will learn:


  • Under-sampling methods at random or focused on highlighting certain sample populations

  • Over-sampling methods at random and those which create new examples based on existing observations

  • Ensemble methods that leverage the power of multiple weak learners in conjunction with sampling techniques to boost model performance

  • Cost sensitive methods which penalize wrong decisions more severely for minority classes

  • The appropriate metrics to evaluate model performance on imbalanced datasets
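
To make the first two items more concrete, here is a minimal sketch of random under-sampling and of a SMOTE-style interpolation between minority observations. It is an illustration under stated assumptions, not the course's code and not the imbalanced-learn implementation: the function names `random_under_sample` and `interpolate_minority`, the toy data and all parameter defaults are hypothetical, and it uses only the Python standard library.

```python
import random


def random_under_sample(X, y, majority=0, seed=0):
    """Drop majority rows at random until both classes have the same count.

    Assumes a binary problem with at least as many majority as minority rows.
    """
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def interpolate_minority(X, y, minority=1, n_new=10, k=3, seed=0):
    """SMOTE-style sketch: new point = x + gap * (neighbour - x), 0 <= gap < 1.

    Assumes numeric features and at least two minority observations.
    """
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in pts if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * n_new


# Hypothetical toy data: 40 majority points near (0, 0), 5 minority near (3, 3).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(5)]
y = [0] * 40 + [1] * 5

X_under, y_under = random_under_sample(X, y)
X_over, y_over = interpolate_minority(X, y, n_new=35)

print(len(y_under), y_under.count(0), y_under.count(1))
print(len(y_over), y_over.count(0), y_over.count(1))
```

Under-sampling balances the classes by discarding majority data, whereas the interpolation approach keeps every original observation and adds synthetic minority points that lie on line segments between existing ones. In practice, resampling is usually applied to the training split only, so that evaluation reflects the original class distribution.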


By the end of the course, you will be able to decide which technique is suitable for your dataset, apply it, and compare the performance improvements that the different methods deliver across multiple datasets.


This comprehensive machine learning course includes over 50 lectures spanning about 8 hours of video, and ALL topics include hands-on Python code examples which you can use for reference and for practice, and re-use in your own projects.


In addition, the code is updated regularly to keep up with new trends and new Python library releases.

So what are you waiting for? Enroll today, learn how to work with imbalanced datasets and build better machine learning models.

Who this course is for:

  • Data Scientists and Machine Learning engineers working with imbalanced datasets

Instructor

Soledad Galli
Lead Data Scientist
  • 4.6 Instructor Rating
  • 5,805 Reviews
  • 25,125 Students
  • 6 Courses

Soledad Galli is a lead data scientist and founder of Train in Data. She has experience in finance and insurance, received a Data Science Leaders Award in 2018 and was selected “LinkedIn’s voice” in data science and analytics in 2019. Sole is passionate about sharing knowledge and helping others succeed in data science.

As a data scientist in finance and insurance companies, Sole researched, developed and put into production machine learning models to assess credit risk and insurance claims and to prevent fraud, leading the adoption of machine learning in those organizations.

Sole is passionate about empowering people to step into and excel in data science. She mentors data scientists, writes articles online, speaks at data science meetings, and teaches online courses on machine learning.

Sole has recently created Train In Data, with the mission to facilitate and empower people and organizations worldwide to step into and excel in data science and analytics.

Sole has an MSc in Biology, a PhD in Biochemistry and 8+ years of experience as a research scientist in well-known institutions like University College London and the Max Planck Institute. She has scientific publications in various fields such as Cancer Research and Neuroscience, and her research was covered by the media on multiple occasions.

Soledad has 4+ years of experience as an instructor in Biochemistry at the University of Buenos Aires, has taught seminars and tutorials at University College London, and has mentored MSc and PhD students at several universities.

Feel free to contact her on LinkedIn.



© 2021 Udemy, Inc.