Unbalanced Data - Quick Start
What you'll learn
- Understand the underlying causes of the class imbalance problem
- Understand why it is a major challenge in the machine learning and data mining fields
- Learn the different characteristics of imbalanced datasets
- Learn the state-of-the-art techniques and algorithms
- Understand a couple of data-based undersampling techniques and apply them
- Understand a couple of data-based oversampling techniques and apply them
- Learn an algorithm-based technique
- Knowledge of Data Science - Basic Level
There is an unprecedented amount of data available. This has caused knowledge discovery to garner attention in recent years. However, many real-world datasets are imbalanced. Learning from imbalanced data poses major challenges and is recognized as needing significant attention.
The core problem with imbalanced data is that learning algorithms perform poorly when some classes are underrepresented and the class distribution is severely skewed. Models trained on imbalanced datasets strongly favor the majority class and largely ignore the minority class. The approaches introduced to date fall into two groups: data-based solutions and algorithmic solutions.
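To make the majority-class bias concrete, here is a minimal Python sketch using only the standard library. It assumes a toy dataset of 95 majority and 5 minority samples, and it uses plain random oversampling and random undersampling as stand-ins for the data-based family of solutions. These are not necessarily the specific techniques covered in the course, and the function names `random_oversample` and `random_undersample` are hypothetical helpers written for this illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many samples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        out_samples.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_samples, out_labels


# Toy data (an assumption for illustration): 95 majority samples
# labeled 0 and 5 minority samples labeled 1.
labels = [0] * 95 + [1] * 5
samples = list(range(100))

# A degenerate "model" that always predicts the majority class.
predictions = [0] * len(labels)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
minority_recall = true_positives / labels.count(1)
print("accuracy:", accuracy)
print("minority recall:", minority_recall)

# Rebalance the class distribution in two different ways.
over_samples, over_labels = random_oversample(samples, labels)
under_samples, under_labels = random_undersample(samples, labels)
print("oversampled class counts:", dict(Counter(over_labels)))
print("undersampled class counts:", dict(Counter(under_labels)))
```

On this toy data the always-majority predictor reaches 0.95 accuracy while its minority recall is 0.0, which is why accuracy alone is a misleading metric for skewed classes. Oversampling keeps every original sample but repeats minority ones, whereas undersampling discards majority samples, so the choice trades overfitting risk against information loss.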
The specific goals of this course are:
- Help students understand the underlying causes of this problem
- Discuss the different characteristics of an unbalanced dataset
- Highlight the severity and importance of this branch of data science
- Give a general idea of the two main state-of-the-art approaches that have been developed to handle this problem
- Go over two methods in detail to illustrate some of the techniques used and, hopefully, motivate students to learn more
Who this course is for:
- People who are curious about unbalanced datasets and how to handle them
Hello, and thank you for checking out my course. I have a B.Sc., M.Sc. and PhD in computer science from the University of California, San Diego and the University of Houston, respectively.
I'm an experienced machine learning specialist. I enjoy working on various aspects of machine learning problems, high-dimensional statistics and predictive analytics with a main focus on developing and analyzing learning algorithms for imbalanced data. I am especially interested in understanding and exploiting the intrinsic structure in data (e.g. manifold or sparse structure) to design more effective learning algorithms.
I am an entrepreneur who wants to use technology to improve people's lives and an educator who wants to turn technology consumers into technology builders.
My Method: The first step is always simply noticing a problem that already exists. What could be changed or improved about the way we currently do things to make them easier, cheaper, more efficient or helpful? Next begins the ongoing process of gathering insight. What do people closest to the issue see as the hurdles? How can we collaborate to understand the problem in its most basic form? Third, I map out a clear path from what we have now to a better solution. Finally, I work relentlessly, tirelessly, to come up with an answer while being flexible enough to take criticism and firm enough to stay driven.