In this tutorial, you will discover how to develop and evaluate a model for the imbalanced adult income classification dataset.
After completing this tutorial, you will know:
- How to load and explore the dataset and generate ideas for data preparation and model selection.
- How to systematically evaluate a suite of machine learning models with a robust test harness.
- How to fit a final model and use it to predict class labels for specific cases.
Many binary classification tasks do not have an equal number of examples from each class; that is, the class distribution is skewed or imbalanced.
A popular example is the adult income dataset that involves predicting personal income levels as above or below $50,000 per year based on personal details such as relationship and education level. There are many more cases of incomes less than $50K than above $50K, although the skew is not severe.
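One way to check how skewed a dataset is: count the examples in each class and report each count as a percentage of the total. The following is a minimal sketch using only the standard library. The helper `summarize_class_distribution` and the toy label list are hypothetical illustrations, not the real adult dataset or the tutorial's own code.

```python
from collections import Counter


def summarize_class_distribution(labels):
    """Return {label: (count, percentage)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Hypothetical toy labels (NOT the real adult data): 15 low-income, 5 high-income.
labels = ['<=50K'] * 15 + ['>50K'] * 5

for label, (n, pct) in summarize_class_distribution(labels).items():
    print(f"Class={label}, Count={n}, Percentage={pct:.1f}%")
# Class=<=50K, Count=15, Percentage=75.0%
# Class=>50K, Count=5, Percentage=25.0%
```

A 75/25 split like this toy example is a mild skew. Severely imbalanced problems may have a minority class of 1% or less.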