Python Data Science: Classification Modeling

Rating: 4.7 (215 reviews)

Learn Python for data science & supervised machine learning, and build classification models w/ a top Python instructor!

Created by Maven Analytics • 1,500,000 Learners, Chris Bruehl

Last updated 6/2026

English

What you'll learn

Master the foundations of supervised Machine Learning & classification modeling in Python
Perform exploratory data analysis on model features and targets
Apply feature engineering techniques and split the data into training, test and validation sets
Build and interpret k-nearest neighbors and logistic regression models using scikit-learn
Evaluate model performance using tools like confusion matrices and metrics like accuracy, precision, recall, and F1
Learn techniques for modeling imbalanced data, including threshold tuning, sampling methods, and adjusting class weights
Build, tune, and evaluate decision tree models for classification, including advanced ensemble models like random forests and gradient boosted machines

Course content

14 sections • 170 lectures • 9h 51m total length

Course Introduction (2:00)
Master classification modeling in Python with data prep, feature engineering, imbalanced data techniques, and evaluating K-nearest neighbors, logistic regression, and trees for credit risk at Maven National Bank.
About This Series (0:42)
Explore classification modeling in Python as part three of a five-part data science series covering data prep and EDA, regression, classification, unsupervised learning, and natural language processing.
Course Structure & Outline (2:16)
Explore a project-based, hands-on course in Python data science that covers classification modeling, data prep, EDA, and model deployment with interactive demos, quizzes, and assignments.
READ ME: Important Notes for New Students (2:18)
DOWNLOAD: Course Resources (0:13)
Introducing the Course Project (0:50)
Explore and visualize bank customer data, prepare it for modeling, and apply classification algorithms to predict high, medium, and low credit risk, then evaluate and interpret the best model.
Setting Expectations (1:27)
Explore classification modeling from k-nearest neighbors to logistic regression and tree-based models, use metrics for imbalanced data, and practice in Jupyter Notebooks or Google Colab.
Jupyter Installation & Launch (4:03)
Install Anaconda and launch Jupyter to create notebooks. Explore Google Colab as an online alternative and verify the setup with a simple Python test.

What is Data Science? (2:44)
Explore the data science landscape, compare it with other analytics fields, and walk through each phase of the data science workflow, highlighting supervised and unsupervised learning and common algorithms.
The Data Science Skillset (1:46)
What is Machine Learning? (2:43)
Learn how machine learning uses data-driven algorithms to enable computers to learn and predict, covering supervised learning (regression and classification) and unsupervised learning for pattern discovery, customer segmentation, and recommendations.
Common Machine Learning Algorithms (1:59)
Discover the most common machine learning algorithms, covering supervised and unsupervised learning, regression, classification, and key models like k-NN, logistic regression, tree-based models, and neural networks.
Data Science Workflow (1:08)
The data science workflow guides you from scoping the project to gathering, cleaning, exploring data, selecting features, and modeling with machine learning. Share insights with end users and iterate.
Data Prep & EDA Steps (3:42)
Define the project scope and business problem to guide data gathering and data cleaning, then perform exploratory data analysis with profiling and visualization to derive actionable insights.
Modeling Steps (2:54)
Move from data steps to modeling by splitting data, selecting and engineering features, and training and validating models. Emphasize simple, interpretable approaches for stakeholder buy-in, deployment, and business impact.
Classification Modeling (0:37)
Explore classification modeling within the data science workflow, with an emphasis on predicting categorical targets and on prioritizing data prep and EDA to get the most out of the modeling phase.
Key Takeaways (1:17)
Data science uses data to make smart decisions with supervised and unsupervised learning. Classification predicts categorical outcomes like purchases, fraud, or positive/negative/neutral reviews.
Intro to Data Science

Classification 101 (5:45)
Classification modeling is a supervised technique that predicts a categorical target from features, for example using age and income to predict whether a customer makes a purchase. It covers terminology, goals, workflow, and a k-nearest neighbors example.
Goals of Classification (1:50)
Explore how classification models balance prediction and inference to maximize predictive accuracy. Learn how target variables relate to features and how this informs model choice and feature engineering.
Types of Classification (2:21)
Classification Modeling Workflow (2:52)
Learn the classification modeling workflow from data preparation and feature engineering to model selection, evaluation, and deployment, with algorithms like logistic regression, trees, and ensembles.
Key Takeaways (1:21)
Explore classification modeling concepts, including predicting categorical values, target variables (Y), and feature inputs (X), while emphasizing predictive accuracy through data splitting, feature engineering, and model validation.
Classification 101

EDA For Classification (3:30)
Explore data prep and EDA for classification, visualize the target and features, assess their relationships and multicollinearity, and prepare data with feature engineering and data splitting.
Defining a Target (4:27)
DEMO: Defining a Target (5:44)
Explore how to define a binary target for classification using loan default status, credit score thresholds, and loan type mappings, with data exploration and transformations in Python and SQL.
Exploring the Target (4:29)
Explore the target variable to assess class frequencies and imbalances with value counts and bar charts; express distributions as percentages and note the need for numeric encoding.
Exploring the Features (2:07)
Explore features for classification using histograms and box plots for numeric features, and value counts with bar charts (via the pandas plotting API) for categorical features; note the spike in ages 35–45.
DEMO: Exploring the Features (5:08)
Explore a loan default dataset with numeric and object features using histograms, box plots, and loops to visualize distributions and identify rare classes for cleaning.
ASSIGNMENT: Exploring the Target & Features (2:18)
Read the income data with pandas, convert sal stat to a binary target. Plot target frequencies and explore numeric features like age, capital gain, capital loss, and hours per week.
SOLUTION: Exploring the Target & Features (8:28)
Read an income dataset with pandas, create a binary target via numpy.where, assess class imbalance, and explore numeric and categorical features with box and bar plots for classification modeling.
Correlation (5:14)
Explore how correlation reveals the strength and direction of relationships between numeric features and the target, using pandas to compute correlations and screen for multicollinearity in classification modeling.
PRO TIP: Correlation Matrix (2:28)
Explore how to create and interpret a correlation matrix using df.corr and sns.heatmap to identify strong predictors like age and estimated salary, while recognizing nonpredictive features such as user ID.
DEMO: Correlation Matrix (4:59)
Explore correlation basics using a loan data frame, visualize relationships with scatter plots and heatmaps, and quantify links via a correlation matrix, including loan amount, property value, and default.
Feature-Target Relationships (7:18)
Explore feature-target relationships using box plots for numeric features and bar charts for categorical features to identify strong predictors and reveal how means and distributions relate to the target variable.
Feature-Feature Relationships (2:29)
Explore feature-feature relationships to detect multicollinearity and avoid redundant predictors in classification modeling, using correlation matrices, data visualizations, and pair plots such as age versus salary.
PRO TIP: Pair Plots (4:28)
ASSIGNMENT: Exploring Relationships (1:33)
Explore feature relationships in the data by building a correlation heat map, a pair plot of numeric variables, and a function to visualize the average target rate by categorical levels.
SOLUTION: Exploring Relationships (7:53)
Build a correlation heatmap from a correlation matrix, then use seaborn pair plots and bar plots to reveal how age, hours worked, and education influence the likelihood of earning over 50,000.
Feature Engineering Overview (4:43)
Engineer features by creating or modifying inputs to improve model performance, using domain knowledge to transform and combine columns, such as binary gender and dummy variables, for logistic regression.
Numeric Feature Engineering (4:10)
Dummy Variables (4:48)
Transform categorical data into numeric with dummy variables via one-hot encoding using pandas get_dummies, and learn how drop_first creates a reference level for logistic regression.
Binning Categories (3:34)
Group rare and related categories through binning to reduce dummy variables and model width, boosting interpretability in classification models by mapping eight diamond quality levels into three bins.
DEMO: Feature Engineering (7:01)
Explore quick feature engineering techniques, including scaling with mean and standard deviation, binning categorical variables, creating ratio features like loan amount to income, and preparing data for modeling with splits.
Data Splitting (5:28)
Preparing Data for Modeling (2:05)
Prepare data for modeling by organizing it into a single data frame with features and a binary target. Tree-based models handle missing values; kNN and logistic regression require imputation.
ASSIGNMENT: Preparing the Data for Modeling (1:59)
Prepare data for modeling by creating dummy variables, grouping rare categories, and splitting 20% of data for testing, producing x_train, x_test, y_train, and y_test.
SOLUTION: Prepare the Data for Modeling (7:29)
Group rare or ambiguous categorical levels into missing or other buckets, replace values, create dummy variables, and prepare training and test sets for modeling.
Key Takeaways (1:37)
Explore features with histograms and box plots, bar charts for targets; use scatter plots and correlations to identify predictors and multicollinearity, perform feature engineering, and split data for classification modeling.
Data Prep & EDA
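The binning and dummy-variable steps described in the Data Prep & EDA lectures can be illustrated in plain Python. This is a minimal sketch, not the course's code: in practice you would use pandas (for example `pd.get_dummies(df, drop_first=True)`), and the helper names `bin_rare` and `one_hot`, the `is_` column prefix, and the loan-type values are all hypothetical.

```python
def bin_rare(values, min_count, other="other"):
    """Group categories that appear fewer than min_count times into one bin."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values, drop_first=False):
    """Encode category labels as 0/1 dummy columns, one per category.

    Categories are sorted alphabetically; with drop_first=True the first
    category is dropped and becomes the reference level (all zeros).
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    columns = [f"is_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


# Hypothetical loan-type column with two rare levels ("personal", "boat")
loan_type = ["home", "auto", "home", "personal", "auto", "boat"]
binned = bin_rare(loan_type, min_count=2)
columns, rows = one_hot(binned, drop_first=True)
print(binned)   # ['home', 'auto', 'home', 'other', 'auto', 'other']
print(columns)  # ['is_home', 'is_other']
print(rows)     # [[1, 0], [0, 0], [1, 0], [0, 1], [0, 0], [0, 1]]
```

Binning first keeps the model narrower (two dummy columns instead of three), and "auto" becomes the reference level that a logistic regression's intercept absorbs.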

K-Nearest Neighbors (5:44)
The KNN Workflow (4:57)
Learn the KNN workflow: split the data, standardize the features with scikit-learn's StandardScaler, then fit and tune a KNN model and evaluate how well it generalizes to a test set.
KNN in Python (2:16)
Apply k-nearest neighbors classification in Python using scikit-learn, emphasizing data splitting and scaling, and use accuracy scores with cross-validation to tune the k parameter and optimize model performance.
Model Accuracy (3:55)
Measure classification accuracy by counting correct predictions, using scikit-learn's score method, and compare training and test performance with tools like the confusion matrix.
Confusion Matrix (3:58)
Analyze the confusion matrix to compare predicted versus actual classes, identify true and false positives and negatives, derive accuracy, precision, and recall with Python and scikit-learn; visualize with seaborn heatmap.
DEMO: Confusion Matrix (4:10)
Explore how accuracy can be misleading and diagnose model performance with a confusion matrix, illustrating true negatives, true positives, false positives, and false negatives using scikit-learn and seaborn.
ASSIGNMENT: Fitting a Simple KNN Model (1:50)
Fit a k-nearest neighbors model with k=5 using age and hours per week, after scaling, and report train/test accuracy and confusion matrices; create a scatter plot colored by predictions.
SOLUTION: Fitting a Simple KNN Model (3:42)
Standardize age and hours per week with a standard scaler, train a five-nearest-neighbors model, and evaluate with accuracy and a confusion matrix; explore tuning k and features.
Hyperparameter Tuning (3:39)
Tune hyperparameters to optimize k-nearest neighbors classification, testing values of k from 1 to n and considering distance metrics (Euclidean or Manhattan) and weights to improve validation performance.
Overfitting & Validation (7:07)
Balance overfitting and underfitting by using validation data and cross-validation to gauge generalization, tune hyperparameters, features, and regularization, and evaluate accuracy, precision, and recall.
DEMO: Hyperparameter Tuning (6:13)
Tune the k in k-nearest neighbors with grid search cross-validation to optimize hyperparameters, compare distance metrics like Minkowski, Euclidean, and Manhattan, and report a test accuracy of 88.75%.
Hard vs. Soft Classification (4:54)
DEMO: Probability vs. Event Rate (10:05)
Compare soft probability scores to hard thresholds through a probability versus event rate plot. Bin scores into deciles, then compare mean probability to actual event rate.
ASSIGNMENT: Tuning a KNN Model (1:16)
Tune the k in a kNN classifier using all features, apply scaling, and employ cross-validated grid search to report test accuracy and generate a confusion matrix.
SOLUTION: Tuning a KNN Model (3:33)
Tune a kNN model using all features to boost accuracy, but note the training-test gap. Cross-validation informs n_neighbors=25, raising test accuracy to 82.8%.
Pros & Cons of KNN (4:17)
Explore the pros and cons of k-nearest neighbors (KNN) for classification: it is simple and interpretable on small datasets, but computationally expensive and prone to the curse of dimensionality with large data.
Key Takeaways (1:12)
K-nearest neighbors is a distance-based classifier using Euclidean distance (with Manhattan as an alternative), k tuned by cross-validation, and evaluated via accuracy and confusion matrix before moving to logistic regression.
K-Nearest Neighbors
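The KNN workflow from this section (scale using training statistics, classify by majority vote among the nearest neighbors, then score with accuracy and a confusion matrix) can be sketched in plain Python. This is a teaching sketch, not the course's code: the age and hours-per-week rows are hypothetical, and the helper names are illustrative. In scikit-learn the same steps map to `StandardScaler`, `KNeighborsClassifier(n_neighbors=3)`, and `confusion_matrix`.

```python
import math
from collections import Counter


def standardize(train, test):
    """Scale each feature to mean 0 and (population) std 1 using statistics
    from the training rows only, then apply the same scaling to the test rows."""
    n_features = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_features)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / len(train)) or 1.0
        for j in range(n_features)
    ]

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return scale(train), scale(test)


def knn_predict(train_X, train_y, x, k=5):
    """Return the majority class among the k training points closest to x
    (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, row) for row in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted):
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]
    (rows = actual class, columns = predicted class)."""
    pairs = list(zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    return [[tn, fp], [fn, tp]]


# Hypothetical [age, hours_per_week] rows and a binary target
train_X = [[25, 40], [30, 45], [22, 20], [45, 50], [50, 60], [38, 55], [28, 35], [55, 45]]
train_y = [0, 0, 0, 1, 1, 1, 0, 1]
test_X = [[24, 30], [48, 55]]
test_y = [0, 1]

train_s, test_s = standardize(train_X, test_X)
preds = [knn_predict(train_s, train_y, x, k=3) for x in test_s]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(preds, accuracy)
print(confusion_matrix(test_y, preds))
```

Fitting the scaler on the training rows only keeps test-set information out of the model; an odd k also avoids tied votes in a binary problem.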

Logistic Regression (3:00)
Explore logistic regression for binary and multiclass classification in Python, covering the sigmoid probability curve, likelihood, fitting, scoring, regularization, tuning, and model interpretation.
Logistic vs. Linear Regression (2:41)
Explore how logistic regression, a classification algorithm, uses a linear model of the log odds (logit) and transforms its output into a probability p in [0, 1].
The Logistic Function (3:24)
See how logistic regression converts a linear combination of features into a probability via the sigmoid function, with the intercept and slope shaping the odds and the weights found through likelihood optimization.
Likelihood (4:53)
Learn how logistic regression uses likelihood to fit model weights, choosing the weights that maximize the predicted probability of the true labels, and extend the idea to multiple features and regularization.
Multiple Logistic Regression (3:17)
Learn to apply multiple logistic regression to predict binary outcomes with several features, such as X1 and X2, an intercept, and beta weights for spam detection.
The Logistic Regression Workflow (0:52)
Split training/validation and test sets to assess performance, scale for regularization if needed, then fit and tune logistic regression with features and hyperparameters, and score on test.
Logistic Regression in Python (4:43)
Explore how to fit a logistic regression model in Python using scikit-learn, inspect the coefficients and intercept, evaluate accuracy and confusion matrices on loan default data, and consider feature engineering ideas.
Interpreting Coefficients (3:41)
Interpret logistic regression coefficients as changes in log odds; positive coefficients increase probability and negative coefficients decrease it, with odds ratios given by e^beta.
ASSIGNMENT: Logistic Regression (1:35)
Fit a logistic regression model on income data using age and hours per week, prepare the data, and evaluate with a confusion matrix and accuracy on test data.
SOLUTION: Logistic Regression (3:24)
Fit a logistic regression with age and hours per week to predict income over 50k, evaluate with training and test accuracy and a confusion matrix, and interpret coefficients as odds.
Feature Engineering & Selection (3:53)
Improve ad purchase prediction by applying feature engineering and selection, using binary cutoffs for age and salary and interaction terms to boost out-of-sample accuracy of logistic regression.
Regularization (5:57)
Explore regularization to curb overfitting through hyperparameter tuning, cross-validation, and penalty terms (L1, L2, elastic net) in linear models and logistic regression to improve out-of-sample performance.
Tuning a Regularized Model (3:51)
Tune regularized logistic regression by scaling features and using grid search with cross-validation to optimize C and the penalty type (L1 or L2) with the saga solver.
DEMO: Regularized Logistic Regression (3:45)
Scale the data with standardization, run a grid search over the logistic regression C and penalty hyperparameters, note that elastic net also needs an l1_ratio, and use the saga solver with max_iter=1000.
ASSIGNMENT: Regularized Logistic Regression (1:07)
Fit a full logistic regression model with all features, scale the inputs, and tune regularization hyperparameters, using feature selection and engineering to build an MVP production model.
SOLUTION: Regularized Logistic Regression (4:28)
Fit a logistic regression model using all features and scale data. Tune regularization (C, L2) with grid search to improve test accuracy to 85.6% and preview multiclass logistic regression.
Multi-class Logistic Regression (6:43)
Explore multiclass logistic regression with the iris data set to predict flower species from petal and sepal measurements. See how three models handle classes, interpret coefficients, and assess accuracy.
ASSIGNMENT: Multi-class Logistic Regression (1:22)
Fit a multi-class logistic regression on the provided dataset, and report test accuracy and a confusion matrix to evaluate model performance for the upcoming projects.
SOLUTION: Multi-class Logistic Regression (3:52)
Fit a multiclass logistic regression on credit data with three target classes, and note baseline class frequencies. Compare its accuracy to a random forest baseline and discuss misclassifications.
Pros & Cons of Logistic Regression (2:33)
Key Takeaways (1:40)
Master logistic regression as a linear, interpretable classifier with a logit link and 0–1 probabilities, forming an S-shaped curve. Explore regularization, multiclass extensions, and metrics beyond accuracy and confusion matrix.
Logistic Regression
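The logistic function, likelihood, and odds-ratio ideas from the Logistic Regression section can be sketched in plain Python for a single feature. This is a teaching sketch under stated assumptions: it maximizes the log-likelihood with plain gradient ascent and no regularization, whereas scikit-learn's `LogisticRegression` uses dedicated solvers and applies an L2 penalty by default (`C=1.0`). The data and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps log odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of 0/1 labels under p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, n_iter=20000):
    """Maximize the log-likelihood with plain gradient ascent (no penalty term).
    The gradient is the mean residual (y - p) for the intercept and the mean
    residual times x for the slope."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1


# Hypothetical single feature x and binary target y (not linearly separable)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
# A one-unit increase in x multiplies the odds of y = 1 by e^b1
print(f"odds ratio per unit of x: {math.exp(b1):.3f}")
print(f"P(y=1 | x=5) = {sigmoid(b0 + b1 * 5):.3f}")
```

Because the model is linear in the log odds, a positive slope raises the predicted probability and e^b1 is the odds ratio, matching the coefficient interpretation covered in Interpreting Coefficients.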

Classification Metrics (2:37)
Explore classification metrics for evaluating models, including accuracy, confusion matrix, precision, recall, and F1 score, plus ROC curves and AUC for soft classification thresholds and multiclass metrics.
Accuracy, Precision & Recall (6:39)
DEMO: Accuracy, Precision & Recall (5:24)
Explore how to compute accuracy, precision, and recall using scikit-learn metrics, interpret confusion matrices, and compare untuned and tuned logistic regression to balance precision and recall.
PRO TIP: F1 Score (3:39)
Explore the F1 score, the harmonic mean of precision and recall, and learn how to compute it with scikit-learn, interpret imbalances, and compare untuned and tuned models.
ASSIGNMENT: Model Metrics (0:57)
Apply Python to calculate accuracy, precision, recall, and F1 scores for a logistic regression classifier, generate a confusion matrix, and practice metrics with the metrics assignments notebook on income data.
SOLUTION: Model Metrics (4:06)
Soft Classification (7:02)
Shift the default 0.5 threshold to balance false positives and false negatives, optimizing precision, recall, and F1 for tasks like spam filtering, fraud detection, and targeted advertising.
DEMO: Leveraging Soft Classification (3:28)
Shift decision thresholds with predict_proba to balance precision and recall in a loan default model, analyze false negatives versus false positives, and preview the precision recall curve.
PRO TIP: Precision-Recall & F1 Curves (3:44)
Visualize the precision-recall curve to explore how thresholds trade off precision and recall with predicted probabilities. Use the F1 curve to pick the threshold that balances the two metrics on test data.
DEMO: Plotting Precision-Recall & F1 Curves (4:09)
Plot precision-recall and F1 curves to compare thresholds, observe how precision rises as recall falls, and identify near-optimal points using training and test data, alongside ROC and AUC metrics.
The ROC Curve & AUC (3:15)
Explore the ROC curve and AUC to evaluate probabilistic classifiers across thresholds, using true positive rate and false positive rate to illustrate threshold-agnostic ranking.
DEMO: The ROC Curve & AUC (3:46)
Evaluate a classifier with the ROC curve and AUC, using scikit-learn metrics to compute false positive and true positive rates from predicted probabilities, and plot the curve.
Classification Metrics Recap (2:22)
Recap the key classification metrics for model tuning, including accuracy, precision, recall, F1, and ROC-AUC, with guidance on when each metric matters, especially with imbalanced data.
ASSIGNMENT: Threshold Shifting (1:25)
Shift the model threshold to maximize the F1 score, plot precision, recall, and F1 versus threshold, and report metrics at the optimum. Plot the ROC curve and report the AUC.
SOLUTION: Threshold Shifting (5:33)
Plot precision-recall and F1 vs threshold to identify the optimal threshold around 0.32 using training data, then compare precision, recall, F1, and ROC-AUC to understand trade-offs.
Multi-class Metrics (5:43)
Explore multiclass confusion matrices to assess model predictions, compute per-class and weighted metrics (accuracy, precision, recall), and implement calculations with Python.
Multi-class Metrics in Python (1:36)
Explore multiclass metrics in Python with scikit-learn, passing the actual and predicted y values to precision_score and setting average to None, 'macro', or 'weighted' to handle imbalanced classes.
ASSIGNMENT: Multi-class Metrics (1:00)
Learn to compute precision and recall by class for a fitted multiclass model, and derive accuracy and weighted-average precision and recall on test data.
SOLUTION: Multi-class Metrics (2:54)
Explore multi-class metrics in Python data science by calculating precision, recall, and F1 scores using scikit-learn, comparing class-level and weighted averages, and relating recall to accuracy.
Key Takeaways (1:32)
Use accuracy for balanced data, then leverage precision, recall, and F1 for imbalanced problems; adjust thresholds and compare ROC and AUC, with AUC near one indicating strong ranking.
Classification Metrics
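The metrics in the Classification Metrics section reduce to a few counts from the confusion matrix. The plain-Python sketch below (hypothetical data and function names) computes accuracy, precision, recall, and F1 at a chosen threshold, scans candidate thresholds to maximize F1, and computes ROC-AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. In scikit-learn you would reach for `precision_score`, `recall_score`, `f1_score`, `precision_recall_curve`, and `roc_auc_score` instead.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Label an observation positive when its probability is at or above
    `threshold`, then compute accuracy, precision, recall, and F1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs in which the positive
    gets the higher score; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for an imbalanced set (3 positives of 10)
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]

print(binary_metrics(y_true, y_prob))  # default 0.5 threshold

# Threshold shifting: scan candidate cutoffs and keep the one with the best F1
thresholds = [i / 100 for i in range(5, 100, 5)]
best = max(thresholds, key=lambda t: binary_metrics(y_true, y_prob, t)["f1"])
print(best, binary_metrics(y_true, y_prob, best))
print(roc_auc(y_true, y_prob))
```

Here lowering the cutoff from 0.5 to 0.4 trades perfect precision for perfect recall and raises F1 from 0.8 to about 0.857. For brevity the sketch picks the threshold on the same data it reports; as in the course's threshold-shifting solution, the cutoff should be chosen on training or validation data before scoring the test set.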

Imbalanced Data (4:03)
Explore techniques for modeling imbalanced data, including oversampling, SMOTE, and undersampling, and tune thresholds and class weights to improve rare-event detection and overall metrics.
Managing Imbalanced Data (4:03)
Choose the correct metric during scoping, then address imbalanced data by tuning the decision threshold, applying sampling methods, and adjusting class weights to improve precision and recall.
Threshold Shifting (2:24)
Sampling Strategies (1:49)
Oversampling (1:30)
Explore oversampling, duplicating minority class observations to balance data and improve a model's ability to distinguish the positive class, while noting risks of overfitting and larger data size.
Oversampling in Python (2:44)
DEMO: Oversampling (4:32)
Install imbalanced-learn and demo oversampling, balancing the data by increasing the positive class with a RandomOverSampler, then evaluate F1 improvements across ratios and threshold tuning, including SMOTE.
SMOTE (1:10)
Explore SMOTE, a distance-based oversampling technique that creates synthetic minority observations near existing ones, reducing overfitting compared with duplication, while noting its higher computational cost and the need for cross-validation.
SMOTE in Python (2:31)
Learn to apply SMOTE in Python with the imbalanced-learn (imblearn) library, configure the oversampling ratio (e.g., 4, 8, 16), and compare F1 scores to improve model performance.
Undersampling (2:11)
Explore undersampling, or downsampling, to balance imbalanced data by randomly removing majority class rows, improving model performance while reducing data size and training time in Python.
Undersampling in Python (5:14)
Explore undersampling in Python using the imbalanced-learn library's RandomUnderSampler to balance imbalanced data, adjust minority–majority ratios, and evaluate logistic regression with F1-score and threshold tuning.
ASSIGNMENT: Sampling Methods (2:19)
Explore imbalance techniques by applying oversampling, undersampling, and SMOTE to an income data set, then tune logistic regression thresholds to improve F1 and related metrics.
SOLUTION: Sampling Methods (5:21)
Explore how to apply oversampling and undersampling methods, including SMOTE and random oversampling, to logistic regression, compare accuracy, precision, recall, and F1, and optimize thresholds.
Changing Class Weights (2:51)
Explore changing class weights in imbalanced data to improve logistic regression performance, comparing default, balanced, and 4-to-1 weighting, and evaluating with F1 and AUC.
DEMO: Changing Class Weights (2:55)
Tune class weights and thresholds to improve F1 and accuracy in a binary classifier. Use AUC, precision, and recall to compare balanced, 4-to-1, and threshold-based strategies.
ASSIGNMENT: Changing Class Weights (0:59)
Experiment with logistic regression by testing standard, balanced, and 4-to-1 class weights to maximize AUC, then tune the threshold to maximize F1 using the imbalanced data assignments notebook.
SOLUTION: Changing Class Weights (3:25)
Compare three logistic regression models with no weighting, balanced weighting, and four-to-one positive weighting, and evaluate them with accuracy, AUC, and F1 to determine whether standard weighting suffices.
Imbalanced Data Recap (1:50)
Recap strategies for imbalanced data: establish a baseline, tune the threshold, apply sampling methods (oversampling with SMOTE, undersampling), and adjust class weights, guided by cross-validation.
Key Takeaways (1:08)
Imbalanced Data
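Two of the imbalanced-data strategies above, random oversampling and class weighting, are easy to see in plain Python. The `balanced_class_weights` helper reproduces the formula scikit-learn documents for `class_weight="balanced"` (n_samples / (n_classes * class count)). `random_oversample` is a hypothetical stand-in for imbalanced-learn's `RandomOverSampler`, with `ratio` playing a role similar to a float `sampling_strategy` (minority count divided by majority count after resampling); it assumes a binary target. SMOTE differs in that it interpolates new synthetic points between nearby minority observations instead of duplicating rows.

```python
import random
from collections import Counter


def balanced_class_weights(y):
    """Weights inversely proportional to class frequency:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(X, y, minority=1, ratio=1.0, seed=0):
    """Duplicate randomly chosen minority rows (sampling with replacement)
    until minority_count / majority_count is approximately `ratio`.
    Assumes a binary target: every label other than `minority` is majority."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    target = round(ratio * majority_count)
    extra = rng.choices(minority_idx, k=max(0, target - len(minority_idx)))
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


# Hypothetical data: 8 negatives, 2 positives (20% positive rate)
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

print(balanced_class_weights(y))  # {0: 0.625, 1: 2.5}
X_res, y_res = random_oversample(X, y, ratio=1.0)
print(Counter(y_res))             # Counter({0: 8, 1: 8})
X_half, y_half = random_oversample(X, y, ratio=0.5)
print(Counter(y_half))            # Counter({0: 8, 1: 4})
```

Both approaches push the model to pay more attention to the rare class: weighting does it inside the cost function without changing the data, while oversampling changes the data and should be applied to the training split only.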

Decision Trees (3:44)
Explore how decision trees classify data by splitting on features to maximize information gain and entropy reduction, then fit, visualize, and tune tree depth and leaf size.
Entropy (5:45)
Explore how entropy measures class impurity and guides decision tree splits, using churn prediction with last login time, age, lifetime value, and sign up date to maximize information gain.
Decision Tree Predictions (4:07)
Predict with decision trees from root to leaf nodes, using soft and hard classifications, information gain splits, and hyperparameter tuning to prevent overfitting; compare random forest and gradient boosting ensembles.
Decision Trees in Python (2:59)
Learn to build decision trees in Python with scikit-learn, tune hyperparameters, fit models, evaluate training versus test accuracy, and visualize splits with plot_tree to highlight age and salary.
DEMO: Decision Trees (3:55)
Preprocess a loan default dataset by imputing numeric values with the mean and categorical values with the mode, train a scikit-learn decision tree, and examine data quality issues.
Feature Importance (4:58)
Learn how feature importances, which sum to one, reveal how much each feature contributes to a tree model, and how removing low-importance features can simplify your model and potentially improve generalization.
ASSIGNMENT: Decision Trees (1:14)
Develop a decision tree classifier with max depth 3 using age, hours per week, and gender; evaluate accuracy, confusion matrix, and feature importance, and visualize the tree.
SOLUTION: Decision Trees (5:52)
Train a simple decision tree on income data, evaluate train and test accuracy, precision, and recall, and examine feature importance (marital status, capital gain, hours, age, education) ahead of tuning.
Hyperparameter Tuning for Decision Trees (4:17)
Learn how decision tree hyperparameters like max_depth, min_samples_leaf, criterion, and class_weight control overfitting, and use cross-validation and grid search to boost out-of-sample performance.
DEMO: Hyperparameter Tuning (2:33)
Apply grid search to tune a decision tree classifier, selecting max depth and min samples leaf, yielding a simpler, more interpretable model with improved test accuracy and highlighted feature importance.
ASSIGNMENT: Tuned Decision Tree (0:48)
SOLUTION: Tuned Decision Tree (4:11)
Pros & Cons of Decision Trees (2:34)
Key Takeaways (1:00)
Explore how decision trees use entropy to split data, visualize early splits, and tune hyperparameters and depth to prevent overfitting while evaluating feature importance.
Decision Trees
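The entropy and information-gain ideas behind decision tree splits can be sketched in plain Python for one numeric feature. This is a simplified illustration with hypothetical data (days since last login versus churn, echoing the churn example in the Entropy lecture): it searches a single feature for a single split, whereas scikit-learn's `DecisionTreeClassifier` searches all features recursively and uses Gini impurity by default (`criterion="entropy"` switches to entropy).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels: 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


def best_split(xs, ys):
    """Try each midpoint between sorted unique values of one numeric feature and
    return (threshold, gain) for the split x <= threshold with the highest gain."""
    values = sorted(set(xs))
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = information_gain(ys, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical feature: days since last login; target: churned (1) or not (0)
days_since_login = [1, 3, 5, 20, 25, 30]
churned = [0, 0, 0, 1, 1, 1]

print(entropy(churned))                       # 1.0 (a 50/50 node)
print(best_split(days_since_login, churned))  # (12.5, 1.0)
```

A gain of 1.0 bit means the split produces two pure leaves; on real data gains are smaller, and limits such as max_depth and min_samples_leaf stop the tree from chasing tiny gains that only fit noise.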

Requirements

We strongly recommend taking our Data Prep & EDA and Regression courses before this one
Jupyter Notebooks (free download, we'll walk through the install)
Familiarity with base Python and Pandas is recommended, but not required

Description

This is a hands-on, project-based course designed to help you master the foundations for classification modeling and supervised machine learning in Python.

We’ll start by reviewing the Python data science workflow, discussing the primary goals & types of classification algorithms, and doing a deep dive into the classification modeling steps we’ll be using throughout the course.

You’ll learn to perform exploratory data analysis (EDA), leverage feature engineering techniques like scaling, dummy variables, and binning, and prepare data for modeling by splitting it into train, test, and validation datasets.

From there, we’ll fit K-Nearest Neighbors & Logistic Regression models, and build an intuition for interpreting their coefficients and evaluating their performance using tools like confusion matrices and metrics like accuracy, precision, and recall. We’ll also cover techniques for modeling imbalanced data, including threshold tuning, sampling methods like oversampling & SMOTE, and adjusting class weights in the model cost function.

Throughout the course, you'll play the role of Data Scientist for the risk management department at Maven National Bank. Applying the skills you learn, you'll use Python to explore their data and build classification models to accurately determine which customers have high, medium, and low credit risk based on their profiles.

Last but not least, you'll learn to build and evaluate decision tree models for classification. You’ll fit, visualize, and fine-tune these models using Python, then apply your knowledge to more advanced ensemble models like random forests and gradient boosted machines.

COURSE OUTLINE:

Intro to Data Science in Python
- Introduce the fields of data science and machine learning, review essential skills, and introduce each phase of the data science workflow
Classification 101
- Review the basics of classification, including key terms, the types and goals of classification modeling, and the modeling workflow
Pre-Modeling Data Prep & EDA
- Recap the data prep & EDA steps required to perform modeling, including key techniques to explore the target, features, and their relationships
K-Nearest Neighbors
- Learn how the k-nearest neighbors (KNN) algorithm classifies data points and practice building KNN models in Python
Logistic Regression
- Introduce logistic regression, learn the math behind the model, and practice fitting these models and tuning their regularization strength
Classification Metrics
- Learn how and when to use several important metrics for evaluating classification models, such as precision, recall, F1 score, and ROC-AUC
Imbalanced Data
- Understand the challenges of modeling imbalanced data and learn strategies for improving model performance in these scenarios
Decision Trees
- Build and evaluate decision tree models, algorithms that look for the splits in your data that best separate your classes
Ensemble Models
- Get familiar with the basics of ensemble models, then dive into specific models like random forests and gradient boosted machines

__________

Ready to dive in? Join today and get immediate, LIFETIME access to the following:

9.5 hours of high-quality video
18 homework assignments
9 quizzes
2 projects
Python Data Science: Classification ebook (250+ pages)
Downloadable project files & solutions
Expert support and Q&A forum
30-day Udemy satisfaction guarantee

If you're a business intelligence professional or aspiring data scientist looking for an introduction to the world of classification modeling with Python, this is the course for you.

Happy learning!

-Chris Bruehl (Data Science Expert & Lead Python Instructor, Maven Analytics)

__________

Looking for our full business intelligence stack? Search for "Maven Analytics" to browse our full course library, including Excel, Power BI, MySQL, Tableau and Machine Learning courses!

See why our courses are among the TOP-RATED on Udemy:

"Some of the BEST courses I've ever taken. I've studied several programming languages, Excel, VBA and web dev, and Maven is among the very best I've seen!" Russ C.

"This is my fourth course from Maven Analytics and my fourth 5-star review, so I'm running out of things to say. I wish Maven was in my life earlier!" Tatsiana M.

"Maven Analytics should become the new standard for all courses taught on Udemy!" Jonah M.

Who this course is for:

Data scientists who want to learn how to build and apply supervised learning models in Python
Analysts or BI experts looking to learn about classification modeling or transition into a data science role
Anyone interested in learning one of the most popular open source programming languages in the world

Course content

Introduction: 8 lectures • 14min

Intro to Data Science: 9 lectures • 19min

Classification 101: 5 lectures • 14min

Data Prep & EDA: 26 lectures • 1hr 55min

K-Nearest Neighbors: 17 lectures • 1hr 13min

Logistic Regression: 21 lectures • 1hr 11min

Classification Metrics: 20 lectures • 1hr 11min

Imbalanced Data: 19 lectures • 53min

Mid-Course Project: 2 lectures • 16min

Decision Trees: 14 lectures • 48min
