Python Data Science: Unsupervised Machine Learning

Rating: 4.9 (416 reviews)

Learn Python for data science and machine learning, and build unsupervised learning models with a top Python instructor!

Created by Maven Analytics • 1,500,000 Learners, Alice Zhao

Last updated 11/2025

English

What you'll learn

Master the foundations of unsupervised Machine Learning in Python, including clustering, anomaly detection, dimensionality reduction, and recommenders
Prepare data for modeling by applying feature engineering, selection, and scaling
Fit, tune, and interpret three types of clustering algorithms: K-Means Clustering, Hierarchical Clustering, and DBSCAN
Use unsupervised learning techniques like Isolation Forests and DBSCAN for anomaly detection
Apply and interpret two types of dimensionality reduction models: Principal Component Analysis (PCA) and t-SNE
Build recommendation engines using content-based and collaborative filtering techniques, including Cosine Similarity and Singular Value Decomposition (SVD)

Course content

13 sections • 202 lectures • 16h 47m total length

Course Introduction2:57
Learn unsupervised learning in Python through hands-on clustering, anomaly detection, and dimensionality reduction, using k-means, hierarchical clustering, dbscan, isolation forest, PCA, t-SNE, and cosine similarity-based and SVD-based recommenders.
About This Series0:47
Discover unsupervised learning in Python in part four of a five-part series on applying data science, covering clustering and dimensionality reduction as you progress from data prep to natural language processing.
Course Structure & Outline4:18
This project-based Python data science course teaches unsupervised learning through clustering, anomaly detection, dimensionality reduction, and recommender systems, with interactive demos, quizzes, and a downloadable slide pdf.
READ ME: Important Notes for New Students2:18
DOWNLOAD: Course Resources0:14
Introducing the Course Project1:02
Prepare and apply unsupervised learning on an HR analytics dataset to cluster employees, visualize clusters with dimensionality reduction, perform exploratory data analysis, and propose retention improvements.
Setting Expectations1:40
Explore unsupervised learning in Python, covering clustering and dimensionality reduction with K-means, hierarchical clustering, DBSCAN, PCA, and t-SNE, plus anomaly detection and recommender applications using isolation forest and SVD.
Jupyter Installation & Launch8:15
Install Anaconda, a package manager that includes Python and Jupyter Notebook, then launch a Jupyter Notebook to write Python code, organize notebooks, and learn basic workflow.

Section Introduction1:09
Explore the field of data science and distinguish it from other data disciplines. Then walk through the data science workflow, supervised and unsupervised learning, and common machine learning algorithms.
What is Data Science?1:03
Data science uses data to make smart decisions, differentiating descriptive analytics for the past from predictive analytics for the future, and clarifying its relation to data analysis and business intelligence.
Data Science Skill Set2:13
Data scientists combine coding, math, and domain expertise with soft skills like communication and problem solving. They work with larger data sets and advanced algorithms, differentiating themselves from data analysts.
What is Machine Learning?1:55
Data scientists use machine learning algorithms to enable computers to learn from data and make decisions. Unsupervised learning finds patterns and groups data, enabling customer segmentation and TV show recommendations.
Common Machine Learning Algorithms3:03
Examine common machine learning algorithms within supervised and unsupervised learning, including regression and classification, reinforcement learning, and natural language processing concepts like Naive Bayes and topic modeling.
Data Science Workflow1:00
Define the data science workflow by scoping a project, gathering and cleaning data, exploring data, applying models, and sharing insights, acknowledging the non-linear, iterative nature.
Step 1: Scoping a Project1:25
Scope a project by identifying stakeholders and business problems, choose supervised or unsupervised learning, and define the data needed to proceed.
Step 2: Gathering Data1:07
Gather data strategically by defining the problem first, then choosing datasets from CSV and TXT files, spreadsheets, relational SQL and NoSQL databases, scraped websites, and APIs that Python can access.
Step 3: Cleaning Data1:20
Clean your data to prevent garbage in, garbage out, and learn techniques for resolving formatting issues, correcting data types, imputing missing values, and restructuring data.
Step 4: Exploring Data1:12
Explore data with exploratory data analysis (EDA) to understand data structure, visualize patterns, and guide modeling. Use slicing, filtering, profiling, and visualization to assess cleanliness and generate insights.
Step 5: Modeling Data1:29
Model data by restructuring and preparing features, then fit models to reveal patterns in unsupervised learning or make predictions in supervised tasks, focusing on simple, classic techniques and interpreting results.
Step 6: Sharing Insights1:22
Reiterate the original problem, summarize analysis results and interpretations, and share business-focused recommendations and next steps. Consider deploying the model to let users discover insights directly.
Unsupervised Learning0:50
Explore how unsupervised learning fits the data science workflow, from data prep and EDA to cleaning, exploring, and modeling, revealing patterns and generating insights across stages.
Key Takeaways1:47
Discover how data science uses data to drive smart decisions with a focus on unsupervised learning to find patterns, and learn the data science workflow from scope to sharing insights.
Intro to Data Science

Section Introduction0:45
Explore the basics of unsupervised learning, key concepts, techniques, and applications, and see how these techniques fit into the data science workflow.
Unsupervised Learning 101 4:39
Explore unsupervised learning by clustering customers based on listening behavior, using features to reveal patterns without labels and naming clusters like music lovers, podcast enthusiasts, and casual listeners.
Unsupervised Learning Techniques3:39
Explore unsupervised learning techniques, focusing on clustering and dimensionality reduction. Learn how K-means, hierarchical clustering, DBSCAN, PCA, and t-SNE enable anomaly detection, segmentation, and data visualization.
Unsupervised Learning Applications2:17
Explore anomaly detection and recommender systems using unsupervised techniques like clustering and dimensionality reduction, plus methods such as isolation forests, time series analysis, and cosine similarity.
Structure of This Course1:39
Explore unsupervised learning techniques and applications, including clustering (K-means, hierarchical, DBSCAN), anomaly detection with isolation forests, and dimensionality reduction (PCA, t-SNE, SVD) for visualization and recommenders.
Unsupervised Learning Workflow4:56
Learn the unsupervised learning workflow from data prep to algorithm tuning, featuring clustering and dimensionality reduction, feature engineering and scaling, with inertia and intuition guided by domain expertise.
Key Takeaways1:39
Explore patterns in data with unsupervised learning, focusing on data structure and algorithms, where there are no predictions or labels, using clustering and dimensionality reduction.
Unsupervised Learning 101

Section Introduction1:24
Learn essential data prep steps for unsupervised learning, including aligning rows and columns, ensuring non-null numeric features, and applying feature engineering, selection, and scaling.
Data Prep for Unsupervised Learning2:34
Master five data prep steps to transform data into unsupervised learning inputs: set row granularity, ensure numeric non-null columns, engineer and select features excluding identifiers, and scale for distance-based algorithms.
Setting the Correct Row Granularity6:28
Set the correct row granularity by making each customer a single row, then reshape data with groupby and pivot for unsupervised clustering; learn reset index and melt basics.
DEMO: Group By6:37
Create a new Jupyter notebook to practice reshaping data with group by in Python, then aggregate by customer and reset the index to produce a clean dataframe.
DEMO: Pivot4:25
Demonstrates pivoting a data frame in Pandas by turning customers into rows and genres into columns, filling missing values with zeros and flattening the index.
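The two reshaping demos above can be sketched roughly as follows. This is a minimal illustration, not the course's notebook: the `listens` data frame and its column names are made up, and `pivot_table` is used here as one common way to pivot while filling gaps with zeros.

```python
import pandas as pd

# Hypothetical listening history: one row per customer/genre record.
listens = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", "Bob", "Cat"],
    "genre": ["pop", "rock", "pop", "pop", "jazz"],
    "songs": [3, 2, 5, 1, 4],
})

# Group by: aggregate to one row per customer, then reset the index
# so "customer" becomes a regular column again.
per_customer = listens.groupby("customer")["songs"].sum().reset_index()

# Pivot: customers become rows, genres become columns,
# and customer/genre pairs with no data are filled with 0.
by_genre = listens.pivot_table(
    index="customer", columns="genre", values="songs",
    aggfunc="sum", fill_value=0,
).reset_index()
by_genre.columns.name = None  # drop the leftover "genre" axis label

print(per_customer)
print(by_genre)
```

Either result has one row per customer, which is the row granularity a clustering model would expect.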
ASSIGNMENT: Setting the Correct Row Granularity2:21
Format data for analysis by setting the correct row granularity with pandas, using groupby or pivot on an entertainment preferences dataset, and verify a 150-row output.
SOLUTION: Setting the Correct Row Granularity5:29
Walk through solving assignment one by loading entertainment data with pandas, reading an Excel file, pivoting to one row per student, and saving the transformed data frame with 150 rows.
Preparing Columns for Modeling2:04
Prepare columns for unsupervised modeling by ensuring non-null, numeric features; impute or remove missing values, convert text to numbers, apply conditional logic with np.where, and create dummy variables for categoricals.
Identifying Missing Data6:35
Identify missing data in a data frame using info() and isna(), which flag NaN (not a number) values. Filter to rows with at least one missing value using any(axis=1) to inspect them.
Handling Missing Data8:09
Handle missing data in pandas data frames by dropping rows or columns with missing values, resetting the index, and imputing values with the median age and with zero followers.
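A short sketch of identifying and handling missing values, assuming a small made-up data frame with `age` and `followers` columns (the course's actual data is not shown here):

```python
import numpy as np
import pandas as pd

# Hypothetical customer data with gaps.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat"],
    "age": [34, np.nan, 29],
    "followers": [10, 5, np.nan],
})

# Identify: keep only the rows that contain at least one missing value.
missing_rows = df[df.isna().any(axis=1)]

# Option 1: drop rows with missing values and renumber the index.
dropped = df.dropna().reset_index(drop=True)

# Option 2: impute, using the median for age and zero for followers.
filled = df.fillna({"age": df["age"].median(), "followers": 0})

print(missing_rows)
print(filled)
```

Which option fits depends on the data: dropping is simplest but discards rows, while imputing keeps every row at the cost of an assumption about the missing values.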
Converting to Numeric8:04
Convert text fields to numeric in pandas by removing dollar signs and commas with str.replace, then apply pd.to_numeric to the income column for modeling.
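As a rough illustration of this cleanup, assuming a hypothetical `income` column stored as text with dollar signs and commas:

```python
import pandas as pd

# Hypothetical income values stored as text.
df = pd.DataFrame({"income": ["$52,000", "$1,250", "$980"]})

# Strip the currency symbol and thousands separators as literal text
# (regex=False avoids "$" being treated as a regex anchor).
cleaned = (
    df["income"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Convert the cleaned strings to a numeric dtype.
df["income"] = pd.to_numeric(cleaned)

print(df.dtypes)
```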
Converting to DateTime6:55
Convert sign up date from text to datetime with pd.to_datetime using a specified format, then extract numeric components for modeling; handle spaces in column names and address parsing warnings.
Extracting DateTime5:23
Learn to extract datetime components in pandas with the .dt accessor to create sign up month and sign up day of week columns, then drop the date column before modeling.
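The conversion and extraction steps might look like the sketch below. The column name `Sign Up Date`, the date format, and the new column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sign-up dates stored as text (note the spaces in the column name).
df = pd.DataFrame({"Sign Up Date": ["01/15/2024", "03/02/2024"]})

# Passing an explicit format avoids ambiguous parsing and related warnings.
df["Sign Up Date"] = pd.to_datetime(df["Sign Up Date"], format="%m/%d/%Y")

# Extract numeric components with the .dt accessor.
df["signup_month"] = df["Sign Up Date"].dt.month
df["signup_dow"] = df["Sign Up Date"].dt.dayofweek  # Monday=0 ... Sunday=6

# The raw datetime column is not a model input, so drop it.
df = df.drop(columns="Sign Up Date")

print(df)
```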
Calculating Based on a Condition3:49
Learn how to convert categorical text to numeric indicators in pandas using np.where, turning yes/no discounts into a 0/1 column for model input.
Dummy Variables5:41
Transform categorical fields into numeric features using dummy variables or one-hot encoding, applying pandas get_dummies, converting booleans to 0/1, and combining for modeling.
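A minimal sketch of both encoding approaches, a 0/1 indicator with `np.where` and dummy variables with `pd.get_dummies`. The `discount` and `plan` columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical customer fields stored as text.
df = pd.DataFrame({
    "discount": ["Yes", "No", "Yes"],
    "plan": ["basic", "premium", "basic"],
})

# Conditional logic: turn a yes/no field into a 0/1 indicator.
df["discount"] = np.where(df["discount"] == "Yes", 1, 0)

# Dummy variables: one column per category, cast to 0/1 integers
# (recent pandas versions return booleans by default).
dummies = pd.get_dummies(df["plan"], prefix="plan").astype(int)

# Combine the numeric columns into a single modeling table.
model_df = pd.concat([df[["discount"]], dummies], axis=1)

print(model_df)
```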
ASSIGNMENT: Preparing Columns for Modeling0:58
Identify missing values and fill them with zeros to prepare columns for modeling, then create a video game lover column set to one if hours exceed seven a week, else zero.
SOLUTION: Preparing Columns for Modeling2:25
Identify missing values, fill books with zeros, and create a video game lover feature for modeling using np.where (hours played > 7).
Feature Engineering3:17
Create new features by adding columns to strengthen model inputs, such as aggregating genre songs and deriving an age feature from external data, then apply calculations and identify proxy variables.
Feature Engineering During Data Prep2:29
Review feature engineering in data prep stage, using feature aggregation to set row granularity, and impute missing data plus encode categoricals with indicator and dummy columns for non-null numeric features.
Applying Calculations4:49
Apply feature engineering techniques by creating new features through calculations, such as percent pop, using numerator and denominator, and combining columns with pandas like pd.concat and axis=1.
Binning Values3:42
Bin numeric features into discrete categories with np.where, turning sign up day of week into a weekend versus weekday indicator. Replace the original column with weekend and apply age-range bins.
Identifying Proxy Variables5:01
Learn how proxy variables use external data to approximate hard-to-measure features, such as average temperature for signup month, and merge dataframes to create numeric, model-ready features.
Feature Engineering Tips1:48
Leverage domain expertise to engineer meaningful features, favor long data with many rows and few columns, start simple, and continually revisit data prep during modeling.
ASSIGNMENT: Feature Engineering0:48
Apply feature engineering to create total entertainment and percent screen columns for each student, summing weekly entertainment hours and calculating screen usage excluding books, in the data prep Jupyter notebook.
SOLUTION: Feature Engineering1:42
The lecture demonstrates feature engineering by creating two new columns: total entertainment and percent screen, computed from books, movies, tv shows, and video games to summarize a student’s media exposure.
Excluding Identifiers From Modeling2:34
Exclude identifier columns during feature selection, but keep them for interpretation; use Jupyter notebook to save the name column as a series named names, and drop it from the data.
Feature Selection4:47
Learn to select a subset of features for modeling using intuition and MVP, then start simple with a few features and iteratively refine to differentiate customers in Python.
ASSIGNMENT: Feature Selection0:45
Master feature selection by saving the student name as a series and compiling a modeling data frame with three engineered features: video game lover, total entertainment, and percent screen.
SOLUTION: Feature Selection2:14
Perform feature selection in pandas by extracting the student name as a series and selecting the last three columns for modeling: video game lover, total entertainment, percent screen.
Feature Scaling2:18
Explain feature scaling as an optional data prep step for unsupervised learning, covering normalization and standardization, and why scaling matters for distance-based algorithms.
Normalization7:46
Normalize data by scaling all features to a 0 to 1 range with scikit-learn's MinMaxScaler, using fit, transform, or fit_transform to place columns on a common scale.
Standardization5:12
Standardization scales data by transforming each column to a mean of zero and a standard deviation of one, ideal for normally distributed features, using standard scaler in Python.
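To make the difference between the two scaling methods concrete, here is a small sketch using scikit-learn's `MinMaxScaler` and `StandardScaler` on made-up `books` and `tv` columns:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical weekly entertainment hours on very different scales.
df = pd.DataFrame({"books": [0.0, 2.0, 4.0], "tv": [10.0, 20.0, 30.0]})

# Normalization: rescale every column to the 0-1 range.
normalized = pd.DataFrame(
    MinMaxScaler().fit_transform(df), columns=df.columns
)

# Standardization: rescale every column to mean 0 and standard deviation 1.
standardized = pd.DataFrame(
    StandardScaler().fit_transform(df), columns=df.columns
)

print(normalized)
print(standardized)
```

`fit_transform` wraps the scaled values in a NumPy array, so the sketch rebuilds a data frame to keep the column names.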
ASSIGNMENT: Feature Scaling0:41
Scale the three features in the data frame to zero mean and unit variance for a distance-based clustering approach. Save the transformed data as a final modeling-ready data frame.
SOLUTION: Feature Scaling3:09
Learn how to scale features with a standard scaler to achieve a mean of zero and standard deviation of one, and save the scaled data frame for modeling.
Key Takeaways1:38
Review data prep for unsupervised learning, covering row and column preparation, feature engineering, scaling with normalization or standardization, and techniques like groupby, pivot, fillna, np.where, and pd.get_dummies.
Pre-Modeling Data Prep

Section Introduction1:16
Explore the fundamentals of clustering and compare k-means clustering, hierarchical clustering, and DBSCAN, covering theory, Python implementations, and emphasis on interpretation to answer business questions.
Clustering Basics4:21
Visualize data in unsupervised learning to identify clusters, prepare features, scale data, and apply k-means, hierarchical clustering, and dbscan. Tune with inertia and intuition toward business insights.
K-Means Clustering6:25
Learn k-means clustering, the unsupervised algorithm that assigns data points to k clusters using centroids and iterates until stable. Apply it in Python to customer segmentation and clustering store locations.
K-Means Clustering in Python7:53
Apply k-means clustering in Python using scikit-learn, choosing n_clusters and random_state to produce stable results. Learn about initialization, inertia, and how to interpret cluster assignments.
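A minimal k-means sketch with scikit-learn, using six made-up points that form two obvious groups. Setting `n_init` explicitly is a choice made here so the result does not depend on the library version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two tight groups of three points each.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# random_state fixes the centroid initialization so results are repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

labels = kmeans.labels_            # cluster assignment for each row
centers = kmeans.cluster_centers_  # one centroid per cluster
inertia = kmeans.inertia_          # within-cluster sum of squares

print(labels)
print(centers)
print(inertia)
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is which rows share a label and what the centers look like.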
DEMO: K-Means Clustering in Python10:06
Create a k-means clustering model in Python by preparing data in a notebook, reading a CSV with pandas, cleaning and selecting numeric features, and fitting two clusters with scikit-learn.
Visualizing K-Means Clustering7:14
Visualize clusters from a k-means clustering model by using the labels_ attribute and Python libraries to create a 3D scatter plot with books, TV shows, and video games as axes.
Interpreting K-Means Clustering7:36
Interpret cluster centers from a two-cluster k-means model to reveal entertainment-hour patterns and guide naming with domain intuition.
Visualizing Cluster Centers9:16
Learn to visualize cluster centers with a heat map by converting k-means centers into a data frame and using seaborn to interpret three clusters.
ASSIGNMENT: K-Means Clustering1:26
Apply k-means clustering to the cereal data set, using two clusters, by reading cereal.csv, preparing numeric data (dropping name and manufacturer), and interpreting the cluster centers for the assignment.
SOLUTION: K-Means Clustering7:23
Apply a two-cluster k-means model to cereal data after dropping name and manufacturer and selecting numeric features. Visualize cluster centers with a heat map and interpret for kids vs adults.
Inertia5:45
Calculate inertia (within cluster sum of squares) to compare k means models with different cluster counts, plot inertia vs k, and identify the elbow to choose a suitable k.
Plotting Inertia in Python2:46
Explore how to plot inertia for k-means in Python by fitting models from 2 to 15 clusters, recording inertia values, and visualizing the results to identify the elbow.
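The inertia loop can be sketched as follows. The data here is synthetic (three made-up blobs), and the plotting step is left as a comment so the example runs without a plotting library.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three blobs of 20 points each (illustration only).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in blob_centers])

# Fit one model per cluster count from 2 to 15 and record its inertia.
cluster_counts = list(range(2, 16))
inertia_values = []
for k in cluster_counts:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertia_values.append(model.inertia_)

# Plotting cluster_counts against inertia_values (for example with
# matplotlib) would show where the curve bends, i.e. the "elbow".
for k, value in zip(cluster_counts, inertia_values):
    print(k, round(value, 2))
```

Inertia generally keeps falling as k grows, so the goal is not the minimum but the point where adding clusters stops helping much.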
DEMO: Plotting Inertia in Python11:25
Explore inertia values for k-means models from two to fifteen clusters, plot the inertia to identify elbows, and compare three- and five-cluster solutions.
ASSIGNMENT: Inertia Plot1:07
Fit 14 k-means models with 2–15 clusters and plot inertia to locate the elbow. Identify the optimal cluster count and interpret the resulting cluster centers with a heat map.
SOLUTION: Inertia Plot6:04
Fit k-means models for 2–15 clusters, plot inertia to locate the elbow at three, and interpret the three cluster centers with a heat map of calories and vitamins and minerals.
Tuning a K-Means Model4:53
Tune a k-means model by refining data prep, outlier removal, scaling, and feature engineering to improve cluster quality and stability.
DEMO: Tuning a K-Means Model8:23
Tune a k-means model by engineering features and scaling data version two. Fit 2–15 clusters and use the inertia elbow to select four.
ASSIGNMENT: Tuning a K-Means Model1:04
Tune a k-means model by removing the fat column, standardizing the remaining features, iterating 2 to 15 clusters via inertia plots to find elbow, and interpret centers with heat map.
SOLUTION: Tuning a K-Means Model9:17
Remove the fat column and standardize the remaining features, then fit a six-cluster k-means model and interpret clusters by calories, sugar, protein, and vitamins and minerals.
Selecting the Best Model6:03
Explore how to select the best clustering model by comparing cluster assignments and metrics, testing on new data, and prioritizing results that meaningfully solve the business problem.
DEMO: Selecting the Best Model13:55
Compare two k-means models to select the best clustering; map cluster labels to names like non-readers and entertainment enthusiasts, and examine centers to guide targeted ads.
ASSIGNMENT: Selecting the Best K-Means Model1:23
Compare two k-means models by labeling rows with unstandardized and standardized clusters, tally cereals per cluster, and recommend the best model plus the number of cereal displays for Maven Supermarket.
SOLUTION: Selecting the Best K-Means Model13:44
Label data with k-means cluster names, compare distribution across models, and map clusters to cereal types to decide the best model. Recommend display placements using pandas filtering and sorting.
Hierarchical Clustering13:25
Explore hierarchical clustering as an agglomerative method that builds a dendrogram by iteratively merging closest points and clusters, comparing linkage methods like single, complete, average, and Ward's.
Dendrograms in Python10:53
Explore hierarchical clustering in Python using SciPy's dendrogram and linkage functions, with Euclidean distance, to visualize data points and clusters. Learn how to adjust color threshold to reveal three clusters.
Agglomerative Clustering in Python3:59
Learn to apply agglomerative (hierarchical) clustering in scikit-learn to data, using n_clusters, Euclidean distance, and Ward linkage, and compare to K-means while interpreting dendrograms.
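A rough sketch of the two hierarchical clustering tools mentioned above: SciPy's `linkage` and `dendrogram` for the tree, and scikit-learn's `AgglomerativeClustering` for the fitted model. The data is made up, and `no_plot=True` is used so the dendrogram structure is computed without drawing a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# Hypothetical data: two tight groups of three points each.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# SciPy: build the merge history with Ward linkage on Euclidean distance.
Z = linkage(X, method="ward", metric="euclidean")

# Compute the dendrogram layout only; drop no_plot to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order

# scikit-learn: fit an agglomerative model with a chosen cluster count.
# Ward linkage uses Euclidean distance.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)
print(labels)
```

A typical workflow is to read the number of clusters off the dendrogram first, then pass that number to `n_clusters`.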
DEMO: Agglomerative Clustering in Python6:11
Learn to perform agglomerative hierarchical clustering in Python using scikit-learn, compare with SciPy dendrograms to determine cluster counts, fit models, and analyze labels and cluster sizes.
Cluster Maps in Python3:20
Visualize agglomerative clustering results with Seaborn's cluster map, combining a heatmap and dendrogram to interpret clusters of students across books, TV shows, and video games.
DEMO: Cluster Maps in Python12:03
Demonstrates cluster maps in Python using agglomerative clustering with three and four clusters. Interpret SciPy dendrograms and cluster assignments, and map data points to clusters with fcluster and ivl.
ASSIGNMENT: Hierarchical Clustering1:26
Create dendrograms for both the original five-numeric-field cereal data and the standardized four-field cereal data, identify the number of clusters, then fit an agglomerative model and generate a cluster map to interpret results.
SOLUTION: Hierarchical Clustering7:22
Identify four clusters of cereals using hierarchical clustering on five numeric fields, visualized with a dendrogram and cluster map, and interpret clusters by vitamins, minerals, protein, calories, and sugar.
DBSCAN8:49
Explore density-based clustering with dbscan, using epsilon and min samples to label core, border, and noise points, forming irregular clusters and identifying outliers.
DBSCAN in Python4:27
Learn to apply dbscan in python with scikit-learn, using epsilon as the radius and min samples to define core points, clusters, and noting there is no random state.
Silhouette Score6:16
Use the silhouette score to compare clustering models such as k-means, hierarchical, and dbscan, since it ranges from -1 to 1 and indicates how well points fit their own clusters.
Silhouette Score in Python1:52
Learn to compute silhouette scores in Python using scikit-learn's silhouette_score with your data and cluster labels, including the default Euclidean metric and optional sample size in a Jupyter notebook demo.
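Here is a small sketch combining DBSCAN with `silhouette_score`, using made-up points and hand-picked `eps` and `min_samples` values. Excluding noise points before scoring is a choice made for this example; scoring all points, with noise treated as its own group, is another option.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Hypothetical data: two dense groups plus one isolated point.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
    [4.5, 15.0],
])

# eps is the neighborhood radius; min_samples is the number of points
# (including the point itself) needed to make a core point.
# DBSCAN has no random_state because it is deterministic for a given input order.
dbscan = DBSCAN(eps=1.0, min_samples=2)
labels = dbscan.fit_predict(X)
print(labels)  # noise points are labeled -1

# Score only the clustered points (assumption: noise is excluded).
clustered = labels != -1
score = silhouette_score(X[clustered], labels[clustered], metric="euclidean")
print(round(score, 3))
```

Scores near 1 mean points sit well inside their own cluster, scores near 0 mean clusters overlap, and negative scores suggest points may be in the wrong cluster.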
DEMO: DBSCAN and Silhouette Score in Python19:07
Demonstrates tuning DBSCAN in Python with epsilon and min samples, computing silhouette scores to select the best model, using loops, data frames, and a DBSCAN tuning function.
ASSIGNMENT: DBSCAN1:08
Explore DBSCAN by looping epsilon from 0.1 to 2 in steps of 0.1 and min samples from 2 to 10 in steps of 1 on original and standardized data, identify the best silhouette score, and fit the model.
SOLUTION: DBSCAN4:45
Learn to tune dbscan by looping over eps and min_samples, evaluate with silhouette scores on original and standardized data, and select the best model with eps 1.9 and min_samples 4.
Comparing Clustering Algorithms7:56
Compare k-means, hierarchical clustering, and DBSCAN, outlining pros and cons to guide practical model selection. Learn when to prioritize interpretability, scalability, or density-based clustering for complex data.
Clustering Next Steps3:53
Compare clustering models using silhouette score and inertia, balance metrics with intuition to select the best model, and learn how to label unseen data with consistent data prep.
DEMO: Compare Clustering Models5:18
Compare three clustering models—k-means, agglomerative, and DBSCAN—by converting labels to series, counting cluster sizes, and evaluating silhouette scores to balance accuracy and interpretability.
DEMO: Label Unseen Data14:19
Label unseen data with the k means model, employing feature engineering and scaling, and predict cluster assignments for new students using the prepared data pipeline.
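Labeling unseen data hinges on reusing the fitted data prep, not refitting it. The sketch below assumes made-up training and new rows and a `StandardScaler` followed by k-means; the course's own pipeline may include more steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: two groups on different feature scales.
X_train = np.array([
    [1.0, 10.0], [1.2, 12.0], [0.9, 11.0],
    [8.0, 90.0], [8.2, 95.0], [7.8, 88.0],
])

# Fit the scaler and the model on the training data only.
scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(scaler.transform(X_train))

# New rows must go through the SAME fitted scaler (transform, not fit_transform)
# before predict assigns each one to its nearest existing centroid.
X_new = np.array([[1.1, 9.0], [8.1, 92.0]])
new_labels = kmeans.predict(scaler.transform(X_new))

print(new_labels)
```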
Key Takeaways2:13
Learn to cluster data by preparing features, then compare k means, hierarchical, and DBSCAN models using inertia, dendrograms, silhouette scores, and intuition to answer business questions.
Clustering

Project Overview2:05
Cluster client data by scaling features and applying k-means, hierarchical, and dbscan techniques. Evaluate segments with silhouette score and predict a new client's cluster using the best model.
SOLUTION: Data Prep4:26
Read the wholesale client csv, drop channel and region, and standardize the six spending features to mean zero and std near one for a 440-row dataset.
SOLUTION: K-Means Clustering16:54
Apply k-means clustering on scaled data and use inertia plots to identify the elbow at five clusters, then visualize the five clusters with a heat map of product categories.
SOLUTION: Hierarchical Clustering15:56
Master hierarchical clustering with dendrograms, agglomerative models, and cluster maps; explore row-wise z-score scaling, threshold tuning, and silhouette-based cluster selection.
SOLUTION: DBSCAN4:52
Tune dbscan on scaled data to optimize clustering using silhouette scores. Expand epsilon up to five, compare models, and select the best configuration.
SOLUTION: Compare, Recommend and Predict8:40
Compare clustering techniques, including k-means, hierarchical, and DBSCAN, on scaled data using silhouette scores. Adopt three-cluster k-means, review cluster centers and segments, and apply scaling for new data predictions.

Section Introduction0:47
Explore anomaly detection within the data science workflow and compare unsupervised techniques—isolation forests and DBSCAN—through Python applications and interpretation of results.
Anomaly Detection Basics2:29
Learn anomaly detection basics and the interchangeable use of anomalies and outliers. Visualize with two-dimensional and three-dimensional scatter plots, especially across many features, and apply advanced modeling when needed.
Anomaly Detection Approaches6:24
See how anomaly detection sits in the cleaning and modeling steps, using unsupervised techniques like isolation forests and DBSCAN to identify data issues and uncover insights.
Anomaly Detection Workflow2:16
Apply the anomaly detection workflow by preparing numeric, non-null data in a single table, then model with isolation forest or DBSCAN, and iterate with plots to meet business objectives.
Isolation Forests9:33
Explore how isolation forests detect anomalies by building many random-split trees, measuring path lengths, and scoring observations to identify outliers in fraud, sensor, and patient data.
Isolation Forests in Python7:57
Learn to build an isolation forest in Python with scikit-learn, configure contamination and random state, and interpret anomaly scores and flags using decision_function and predict.
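A minimal isolation forest sketch with scikit-learn. The data is synthetic (99 ordinary points plus one planted outlier), and the 2% contamination value is an assumption picked for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 99 ordinary points plus one obvious outlier at (8, 8).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(99, 2)), [[8.0, 8.0]]])

# contamination is the expected share of anomalies; it sets the flag threshold.
forest = IsolationForest(contamination=0.02, random_state=42)
forest.fit(X)

flags = forest.predict(X)             # -1 = anomaly, 1 = normal
scores = forest.decision_function(X)  # lower (more negative) = more anomalous

print((flags == -1).sum(), "rows flagged")
print("most anomalous row:", scores.argmin())
```

Sorting rows by score is often more informative than the flags alone, since the flags change whenever the contamination setting does.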
Visualizing Anomalies6:59
Visualize anomalies with Seaborn pair plots by coloring points using an anomaly flag from isolation forest across books, TV shows, and video games.
Tuning and Interpreting Isolation Forests7:56
Tune the isolation forest contamination from 2% to 5% and predict anomalies. Interpret results by sorting data and examining anomaly flags and scores with pair plots and visualizations.
ASSIGNMENT: Isolation Forests1:15
Preprocess the Tripadvisor review csv by removing the user id, view rating ranges, visualize with a seaborn pair plot, and detect anomalies with isolation forests at 0.01 and 0.005 contamination.
SOLUTION: Isolation Forests10:37
Explore unsupervised learning with isolation forests to detect anomalies in numeric TripAdvisor review data, using pandas for prep and seaborn pair plots for visualization.
DBSCAN for Anomaly Detection1:23
Harness DBSCAN to detect anomalies by identifying core points, border points, and noise points in dense regions, revealing outliers as anomalies. Translate clustering techniques to anomaly detection in Python.
DBSCAN for Anomaly Detection in Python8:13
Explore dbscan for anomaly detection in Python using scikit-learn, with epsilon and min samples, data scaling, and labeling anomalies as noise.
Visualizing DBSCAN Anomalies5:32
Visualize dbscan anomalies with cluster labels in a pair plot to reveal anomalies alongside data trends. Compare these dbscan findings with isolation forest results to inform stakeholders through visualization-driven insights.
ASSIGNMENT: DBSCAN for Anomaly Detection0:46
Apply DBSCAN for anomaly detection on the tourist rating data set, use the silhouette score to pick the best eps and min_samples, and visualize the resulting anomalies on a pair plot.
SOLUTION: DBSCAN for Anomaly Detection6:57
Apply and tune DBSCAN for anomaly detection on user ratings, compare with isolation forest, and identify and visualize anomalous patterns to refine future analyses.
Comparing Anomaly Detection Algorithms3:45
Compare anomaly detection algorithms, highlighting isolation forests as efficient for high-dimensional data and global anomalies, and DBSCAN for local anomalies and complex clusters, with practical start-with-isolation-forest guidance.
RECAP: Clustering and Anomaly Detection1:56
Explore how unsupervised learning reveals data relationships by comparing points for similarity and difference, using clustering (including dbscan) to form groups and anomaly detection to highlight outliers.
Key Takeaways2:00
Use unsupervised anomaly detection with isolation forest and DBSCAN to identify unusual observations, then visualize, tune, and decide whether to explore or exclude anomalies.
Anomaly Detection

Section Introduction1:21
Explore dimensionality reduction within unsupervised learning, focusing on PCA and t-SNE, with practical Python implementations and interpretation of results in the data science workflow.
Dimensionality Reduction Basics3:03
Explore dimensionality reduction, transforming data from three dimensions into two components, PC1 and PC2, while preserving information and revealing clearer clusters.
Why Reduce Dimensions?8:46
Discover why reducing dimensions helps visualize high-dimensional data and improve modeling, using PCA, t-SNE, and SVD for feature extraction, supervised learning, and unsupervised discovery.
Dimensionality Reduction Workflow3:17
Learn the dimensionality reduction workflow—from data prep to modeling and tuning—using scaling and PCA, t-SNE, or SVD, with explained variance guiding iteration.
Principal Component Analysis15:18
Learn how principal component analysis reduces dimensions by projecting data onto PC1, the linear combination of features with the greatest variance, and using eigendecomposition to transform and visualize clusters.
Principal Component Analysis in Python4:39
Apply principal component analysis in Python using scikit-learn's PCA, including centering data, choosing n_components, and deciding when to standardize for visualization or feature extraction.
Explained Variance Ratio3:38
Explore explained variance ratio in PCA, showing how each principal component captures data variance, with the first component capturing the most and the sum equaling one, guiding component selection.
DEMO: PCA and Explained Variance Ratio in Python7:24
Learn to perform PCA on numeric data in Python by centering inputs, fitting with n_components, and interpreting explained_variance_ratio to reduce dimensions for visualization or feature extraction, aiming for 80–90% variance.
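A sketch of the PCA steps described above, on synthetic three-column data where two columns are deliberately correlated. The explicit centering mirrors the lecture description; scikit-learn's `PCA` also centers the data internally.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: column 2 is roughly twice column 1, column 3 is unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(50, 1)),
    rng.normal(scale=0.5, size=(50, 1)),
])

# Center each column at zero.
X_centered = X - X.mean(axis=0)

# Reduce three features to two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_centered)

# Share of total variance captured by each component, largest first.
ratios = pca.explained_variance_ratio_
print(ratios, ratios.sum())

# Each row shows how one component weights the original three features.
print(pca.components_)
```

If the cumulative ratio falls short of a target such as 80 to 90 percent, increasing `n_components` is the usual next step.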
ASSIGNMENT: Principal Component Analysis0:54
Apply principal component analysis to the student grades dataset, drop the student ID column, center the data, fit a two-component PCA, and interpret the explained variance ratios.
SOLUTION: Principal Component Analysis3:11
Demonstrates applying principal component analysis to student grades data: load data, drop the first column with student ID, center the features, fit PCA with two components, and interpret explained variance.
Interpreting PCA6:33
Interpret PCA outputs by examining component loadings, mapping PC1 and PC2 to original features, and visualizing in a scatter plot to reveal how books, TV shows, and video games relate.
DEMO: Interpreting PCA12:55
Interpret PCA by inspecting components and transforming data from three features to two. Visualize PC1 and PC2 to reveal patterns, such as books, TV shows, and video games driving clusters.
ASSIGNMENT: Interpreting PCA0:55
Interpret the PCA model components, then plot students on a two-dimensional scatter plot with PC1 and PC2, and interpret clusters to guide college recommendations.
SOLUTION: Interpreting PCA7:33
Interpret PCA components to show that higher PC1 means better grades, while PC2 differentiates STEM versus humanities strengths; visualize with a two-component scatter plot to guide student recommendations.
Feature Selection vs Feature Extraction3:59
Explore feature selection and feature extraction as dimensionality reduction techniques; compare dropping columns with PCA, and apply to clustering and data visualization for cereals and other datasets.
PCA Next Steps4:47
Explore next steps after fitting PCA, including visualization with 2D or 3D components, data subset inputs, and applying identical PCA transformations to test data.
T-SNE18:09
Explore t-SNE, a nonlinear dimensionality reduction technique that emphasizes point relationships over absolute distances. See how affinity and density shape neighbor probabilities to reveal clusters in a 2D visualization.
T-SNE in Python10:08
Apply t-SNE in Python with scikit-learn: configure n_components for 2D visualization, set random_state for reproducibility, and visualize clusters. Note that t-SNE doesn't require centering, but scaling is recommended.
ASSIGNMENT: T-SNE0:30
Fit a t-SNE model with two components, plot a scatter of students with x as component one and y as component two, and interpret the plot data.
SOLUTION: T-SNE2:51
Apply t-SNE with two components to visualize grade data, fitting and transforming, then plot component one versus component two to reveal distinct clusters, contrasting with PCA results.
PCA vs t-SNE3:31
Compare PCA and t-SNE as dimensionality reduction methods, showing PCA's accurate, interpretable linear components and t-SNE's visually separated clusters despite not providing a true data representation.
DEMO: Dimensionality Reduction and Clustering9:26
Explore how dimensionality reduction and clustering reveal data structure. Visualize with PCA and t-SNE, then apply k-means and interpret inertia and silhouette scores to decide cluster counts.
ASSIGNMENT: T-SNE & K-Means Clustering0:31
Combine t-SNE and k-means clustering by fitting a three-cluster k-means model, overlaying the clusters on the t-SNE plot with colors, and interpreting the cluster centers.
SOLUTION: T-SNE & K-Means Clustering4:36
Fit a three-cluster k-means model on the grades data, visualize clusters with a t-SNE plot, and interpret centroid heat maps to reveal humanities vs STEM patterns.
Key Takeaways2:42
Master dimensionality reduction concepts for unsupervised learning by using PCA for feature extraction and visualization, explained variance ratio as the metric, and t-SNE for clustering visualization.
Dimensionality Reduction

Section Introduction1:22
Explore the basics of recommenders, compare content-based and collaborative filtering, and learn to use cosine similarity and SVD to identify similar items and users.
Recommenders Basics4:51
Explore the basics of recommender systems, including content-based filtering and collaborative filtering, with cosine similarity and singular value decomposition, and learn to build unsupervised recommenders.
Content-Based Filtering2:03
Learn how content-based filtering recommends items based on item characteristics, using features like energy, vitamin C, and sugar to compare fruits and suggest similar options such as mangoes.
Cosine Similarity7:11
Apply cosine similarity, a popular metric for recommenders, to measure the angle between data points. For non-negative data, which lies in the first quadrant, cosine similarity ranges from 0 to 1.
Cosine Similarity in Python14:33
Compute cosine similarity in Python with scikit-learn's cosine_similarity function to compare the rows of a fruit DataFrame. Analyze how sugar and vitamin C influence their similarity.
Making a Content Based Filtering Recommendation8:36
Apply content-based filtering with cosine similarity to generate fruit recommendations, filter by mangoes, compare two-column and six-column nutrition data, and build a reusable function with error handling.
ASSIGNMENT: Content-Based Filtering1:03
Apply content-based filtering by computing cosine similarity on genre labels from 1600 labeled movies to identify Toy Story's closest peers and return the top five most similar movies.
SOLUTION: Content-Based Filtering8:40
Explore content-based filtering with cosine similarity on a movie genres dataset, including data prep, indexing by title, and ranking top five recommendations similar to Toy Story.
Collaborative Filtering4:00
Explore collaborative filtering for personal recommendations by analyzing user behaviors, contrasting user-based and item-based approaches, and using a user-item matrix of ratings to predict new fruits a user may like.
User-Item Matrix7:46
Learn to pivot data into a user-item matrix for collaborative filtering in Python, fill missing ratings with the mean, and prepare data for machine learning models.
ASSIGNMENT: User-Item Matrix0:45
Kick off collaborative filtering by building a user-item matrix from movie ratings: read movies, users, and ratings into three data frames, then use a pivot to reshape the data for modeling.
SOLUTION: User-Item Matrix4:21
Read movie, user, and ratings data from multiple sheets, then pivot to a user-item matrix and fill missing ratings for later analysis.
Singular Value Decomposition8:42
Apply truncated singular value decomposition to the user-item matrix, decomposing it into U, Sigma, and V transpose to reveal two latent features and reduce dimensionality for recommender systems.
Singular Value Decomposition in Python13:05
Apply truncated SVD in Python with scikit-learn to fit and transform a user-item matrix into two latent features, revealing U, Sigma, and V transpose and their singular values.
ASSIGNMENT: Singular Value Decomposition0:47
Apply truncated SVD to the user-item matrix with two components, then view the matrix, the U matrix, and their shapes to verify dimensions.
SOLUTION: Singular Value Decomposition4:08
Apply truncated SVD to the user-item matrix to reduce 943 users and 1682 movies to two latent components, revealing the U and V matrices that capture user and movie features.
Choosing the Number of Components4:26
Learn to select the number of components for a truncated SVD model using explained variance ratio and the cumulative explained variance plot, aiming for about 80% with a practical cutoff.
DEMO: Choosing the Number of Components12:47
Demonstrate explained variance ratio for two-component and six-component SVD on a 30-by-6 user-item matrix. Use cumulative variance to decide the number of components, noting SVD ratios may not decrease.
ASSIGNMENT: Choosing the Number of Components0:54
Apply truncated SVD with 500 components, plot cumulative explained variance, and select the optimal component count to fit a final model that captures the data variance.
SOLUTION: Choosing the Number of Components7:25
Fit a truncated SVD with 500 components, analyze the cumulative explained variance, and select 250 components to capture about 84% of the variance for the user-item matrix.
Making a Collaborative Filtering Recommendation8:17
Learn collaborative filtering with truncated SVD, fit and transform to create a user matrix, then transform a new user, fill missing values, and generate recommendations via a dot product.
DEMO: Making a Collaborative Filtering Recommendation11:57
Learn collaborative filtering with a two-component truncated SVD, transform a new user into latent space, and generate top fruit recommendations via the V transpose product while excluding rated items.
ASSIGNMENT: Collaborative Filtering1:13
Use a 250-component SVD to map a new user into latent space, generate ten movie recommendations, reconstruct the user-item matrix, and assess whether they make sense, with optional component adjustments.
SOLUTION: Collaborative Filtering11:31
Explore collaborative filtering with singular value decomposition to transform a new user into a latent space, reconstruct a user-item matrix, and generate top movie recommendations.
Recommender Next Steps6:23
Explore content-based and collaborative filtering, then build a hybrid recommender with a popularity baseline and evaluate it with explained variance ratio; deploy, gather feedback, and refine for data sparsity and cold start.
DEMO: Hybrid Approach4:51
Combine four recommenders (collaborative filtering with truncated SVD, content-based filtering on sugar and vitamin C, content-based filtering on six nutritional values, and the mean of the user-item matrix) to form a hybrid recommender.
Key Takeaways2:38
Explore the two main types of recommenders—content-based filtering and collaborative filtering—and learn how cosine similarity and SVD drive item- and user-based recommendations, including hybrid approaches.
Recommenders

Project Overview1:51
Harness data prep and unsupervised recommenders, truncated SVD and cosine similarity, to deliver Maven Eats restaurant recommendations, with five home page picks and five similar restaurants on each detail page.
SOLUTION: Data Prep1:56
Import the restaurant ratings data, confirm a 0–2 scale with describe, pivot to a user-item matrix, fill NaNs with the mean, and note 138 users and 127 restaurants.
SOLUTION: TruncatedSVD11:22
Demonstrates collaborative filtering with truncated SVD on a centered user-item matrix, selects components to reach about 80% explained variance, and delivers top restaurant recommendations with details.
SOLUTION: Cosine Similarity5:45
Convert restaurant data into numeric features using get_dummies for cuisine, a single numeric price encoding, and a yes/no franchise indicator, then apply cosine similarity for recommendations.
SOLUTION: Recommendations4:28
Explore how collaborative filtering yields top ten restaurant recommendations via truncated SVD, and how tuning parameters, data prep, and hybrid content-based approaches with distance can refine them.

Requirements

We strongly recommend taking our Data Prep & EDA course before this one
Jupyter Notebooks (free download, we'll walk through the install)
Familiarity with base Python and Pandas is recommended, but not required

Description

This is a hands-on, project-based course designed to help you master the foundations of unsupervised machine learning in Python.

We’ll start by reviewing the Python data science workflow, discussing the techniques & applications of unsupervised learning, and walking through the data prep steps required for modeling. You’ll learn how to set the correct row granularity for modeling, apply feature engineering techniques, select relevant features, and scale your data using normalization and standardization.
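
As a rough illustration of the two scaling approaches mentioned above, here is a minimal sketch using scikit-learn's MinMaxScaler and StandardScaler. The toy array is hypothetical and is not taken from the course materials.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Normalization: rescale each column to the [0, 1] range
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: rescale each column to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)

print(X_norm)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

Either transform puts the features on a comparable scale, which matters for distance-based models such as K-Means.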

From there we'll fit, tune, and interpret 3 popular clustering models using scikit-learn. We’ll start with K-Means Clustering, learn to interpret the output’s cluster centers, and use inertia plots to select the right number of clusters. Next, we’ll cover Hierarchical Clustering, where we’ll use dendrograms to identify clusters and cluster maps to interpret them. Finally, we’ll use DBSCAN to detect clusters and noise points and evaluate the models using their silhouette score.
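
To make the K-Means tuning step concrete, the sketch below fits models for several cluster counts and records inertia and silhouette score for each. The synthetic blobs, the range of k values, and the variable names are assumptions chosen for illustration, not the course's own data or code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2D blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                         # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, model.labels_)  # higher means better separated

# One simple heuristic: pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print(inertias)
print(silhouettes)
print("best k by silhouette:", best_k)
```

In practice the inertia values are usually plotted against k and read for an "elbow", with the silhouette score as a second opinion.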

We’ll also use DBSCAN and Isolation Forests for anomaly detection, a common application of unsupervised learning models for identifying outliers and anomalous patterns. You’ll learn to tune and interpret the results of each model and visualize the anomalies using pair plots.
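
A minimal sketch of anomaly detection with scikit-learn's IsolationForest follows. The synthetic data, the two planted outliers, and the contamination value are assumptions made for this example only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 200 ordinary points plus two planted outliers
rng = np.random.default_rng(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([X_normal, X_outliers])

# contamination is the assumed share of anomalies in the data
model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(X)        # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower scores are more anomalous

anomaly_rows = np.where(labels == -1)[0]
print(anomaly_rows)
```

The contamination parameter is the main tuning knob here: raising it flags more rows as anomalies.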

Next, we’ll introduce the concept of dimensionality reduction, discuss its benefits for data science, and explore the stages in the data science workflow in which it can be applied. We’ll then cover two popular techniques: Principal Component Analysis, which is great for both feature extraction and data visualization, and t-SNE, which is ideal for data visualization.
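
As an illustration of PCA used for feature extraction, this sketch reduces three features to two components and inspects the explained variance ratio. The synthetic data is hypothetical; the third feature is deliberately built as a near-combination of the first two so that two components capture almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: the third feature is almost a sum of the first two
rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + b + rng.normal(scale=0.1, size=100)])

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)            # scikit-learn's PCA centers the data internally

ratios = pca.explained_variance_ratio_  # share of variance captured per component
loadings = pca.components_              # rows map each component back to the original features

print(X_2d.shape)
print(ratios, ratios.sum())
print(loadings)
```

The loadings are what make the components interpretable: each row shows how strongly every original feature contributes to that component.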

Last but not least, we’ll introduce recommendation engines, and you'll practice creating both content-based and collaborative filtering recommenders using techniques such as Cosine Similarity and Singular Value Decomposition.
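
The two recommender styles can be sketched in a few lines with scikit-learn: cosine similarity over item features for content-based filtering, and truncated SVD over a user-item matrix for collaborative filtering. All item names, feature values, and ratings below are hypothetical placeholders, and the ratings matrix is assumed to be already filled in.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# --- Content-based filtering: compare items by their features ---
items = ["apple", "mango", "lemon"]
features = np.array([[52.0, 10.0],   # hypothetical feature values per item
                     [60.0, 14.0],
                     [29.0, 2.5]])
sim = cosine_similarity(features)    # item-by-item similarity matrix

mango = items.index("mango")
order = np.argsort(sim[mango])[::-1]                  # most similar first
closest = [items[i] for i in order if i != mango][0]  # skip the item itself
print("most similar to mango:", closest)

# --- Collaborative filtering: factor a user-item ratings matrix ---
ratings = np.array([[5, 4, 1, 1],    # hypothetical, already-filled ratings
                    [4, 5, 1, 2],
                    [1, 2, 5, 4],
                    [2, 1, 4, 5]], dtype=float)
svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(ratings)  # users in latent space, shape (4, 2)
item_factors = svd.components_             # latent features by item, shape (2, 4)

# Dot product of user and item factors gives approximate predicted ratings
reconstructed = user_factors @ item_factors
print(reconstructed.round(2))
```

Ranking each user's row of the reconstructed matrix, after excluding items they have already rated, is one simple way to turn these scores into recommendations.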

Throughout the course you'll play the role of an Associate Data Scientist for the HR Analytics team at a software company trying to increase employee retention. Using the skills you learn throughout the course, you'll use Python to segment the employees, visualize the clusters, and recommend next steps to increase retention.

COURSE OUTLINE:

Intro to Data Science in Python
- Introduce the fields of data science and machine learning, review essential skills, and introduce each phase of the data science workflow
Unsupervised Learning 101
- Review the basics of unsupervised learning, including key concepts, types of techniques and applications, and its place in the data science workflow
Pre-Modeling Data Prep
- Recap the data prep steps required to apply unsupervised learning models, including restructuring data, engineering & scaling features, and more
Clustering
- Apply three different clustering techniques in Python and learn to interpret their results using metrics, visualizations, and domain expertise
Anomaly Detection
- Understand where anomaly detection fits in the data science workflow, and apply techniques like Isolation Forests and DBSCAN in Python
Dimensionality Reduction
- Use techniques like Principal Component Analysis (PCA) and t-SNE in Python to reduce the number of features in a data set while retaining as much information as possible
Recommenders
- Recognize the variety of approaches for creating recommenders, then apply unsupervised learning techniques in Python, including Cosine Similarity and Singular Value Decomposition (SVD)

__________

Ready to dive in? Join today and get immediate, LIFETIME access to the following:

16.5 hours of high-quality video
22 homework assignments
7 quizzes
3 projects
Python Data Science: Unsupervised Learning ebook (350+ pages)
Downloadable project files & solutions
Expert support and Q&A forum
30-day Udemy satisfaction guarantee

If you're a business intelligence professional or data scientist looking for a practical overview of unsupervised learning techniques in Python with a focus on interpretation, this is the course for you.

Happy learning!

-Alice Zhao (Python Expert & Data Science Instructor, Maven Analytics)

__________

Looking for our full business intelligence stack? Search for "Maven Analytics" to browse our full course library, including Excel, Power BI, MySQL, Tableau and Machine Learning courses!

See why our courses are among the TOP-RATED on Udemy:

"Some of the BEST courses I've ever taken. I've studied several programming languages, Excel, VBA and web dev, and Maven is among the very best I've seen!" Russ C.

"This is my fourth course from Maven Analytics and my fourth 5-star review, so I'm running out of things to say. I wish Maven was in my life earlier!" Tatsiana M.

"Maven Analytics should become the new standard for all courses taught on Udemy!" Jonah M.

Who this course is for:

Data scientists who want to learn how to build and interpret unsupervised learning models in Python
Analysts or BI experts looking to learn about unsupervised learning or transition into a data science role
Anyone interested in learning one of the most popular open source programming languages in the world

Course content

Getting Started • 8 lectures • 22min

Intro to Data Science • 14 lectures • 21min

Unsupervised Learning 101 • 7 lectures • 20min

Pre-Modeling Data Prep • 35 lectures • 2hr 14min

Clustering • 43 lectures • 4hr 47min

PROJECT: Clustering Clients • 6 lectures • 53min

Anomaly Detection • 18 lectures • 1hr 27min

Dimensionality Reduction • 25 lectures • 2hr 21min

Recommenders • 27 lectures • 2hr 44min

PROJECT: Recommending Restaurants • 5 lectures • 25min
