
Learn unsupervised learning in Python through hands-on clustering, anomaly detection, and dimensionality reduction, using k-means, hierarchical clustering, DBSCAN, isolation forest, PCA, t-SNE, and cosine similarity-based and SVD-based recommenders.
Discover unsupervised learning in Python in part four of a five-part series on applying data science, covering clustering and dimensionality reduction as you progress from data prep to natural language processing.
This project-based Python data science course teaches unsupervised learning through clustering, anomaly detection, dimensionality reduction, and recommender systems, with interactive demos, quizzes, and a downloadable slide PDF.
Prepare and apply unsupervised learning on an HR analytics dataset to cluster employees, visualize clusters with dimensionality reduction, perform exploratory data analysis, and propose retention improvements.
Explore unsupervised learning in Python, covering clustering and dimensionality reduction with K-means, hierarchical clustering, DBSCAN, PCA, and t-SNE, plus anomaly detection and recommender applications using isolation forest and SVD.
Install Anaconda, a distribution that bundles Python, Jupyter Notebook, and the conda package manager, then launch a Jupyter Notebook to write Python code, organize notebooks, and learn the basic workflow.
Explore the field of data science and distinguish it from other data disciplines. Then walk through the data science workflow, supervised and unsupervised learning, and common machine learning algorithms.
Data science uses data to make smart decisions, differentiating descriptive analytics for the past from predictive analytics for the future, and clarifying its relation to data analysis and business intelligence.
Data scientists combine coding, math, and domain expertise with soft skills like communication and problem solving. They work with larger data sets and advanced algorithms, differentiating themselves from data analysts.
Data scientists use machine learning algorithms to enable computers to learn from data and make decisions. Unsupervised learning finds patterns and groups data, enabling customer segmentation and TV show recommendations.
Examine common machine learning algorithms within supervised and unsupervised learning, including regression and classification, reinforcement learning, and natural language processing concepts like Naive Bayes and topic modeling.
Define the data science workflow by scoping a project, gathering and cleaning data, exploring data, applying models, and sharing insights, acknowledging its non-linear, iterative nature.
Scope a project by identifying stakeholders and business problems, choose supervised or unsupervised learning, and define the data needed to proceed.
Gather data strategically by defining the problem first and choosing datasets from CSV and TXT files, spreadsheets, relational SQL and NoSQL databases, websites with scraping, and APIs that Python can access.
Clean your data to prevent garbage in, garbage out, and learn techniques for resolving formatting issues, correcting data types, imputing missing values, and restructuring data.
Explore data with exploratory data analysis (EDA) to understand data structure, visualize patterns, and guide modeling. Use slicing, filtering, profiling, and visualization to assess cleanliness and generate insights.
Model data by restructuring and preparing features, then fit models to reveal patterns in unsupervised learning or make predictions in supervised tasks, focusing on simple, classic techniques and interpreting results.
Reiterate the original problem, summarize analysis results and interpretations, and share business-focused recommendations and next steps. Consider deploying the model to let users discover insights directly.
Explore how unsupervised learning fits the data science workflow, from data prep and EDA to cleaning, exploring, and modeling, revealing patterns and generating insights across stages.
Discover how data science uses data to drive smart decisions with a focus on unsupervised learning to find patterns, and learn the data science workflow from scope to sharing insights.
Explore the basics of unsupervised learning, key concepts, techniques, and applications, and see how these techniques fit into the data science workflow.
Explore unsupervised learning by clustering customers based on listening behavior, using features to reveal patterns without labels and naming clusters like music lovers, podcast enthusiasts, and casual listeners.
Explore unsupervised learning techniques, focusing on clustering and dimensionality reduction. Learn how K-means, hierarchical clustering, DBSCAN, PCA, and t-SNE enable anomaly detection, segmentation, and data visualization.
Explore anomaly detection and recommender systems using unsupervised techniques like clustering and dimensionality reduction, plus methods such as isolation forests, time series analysis, and cosine similarity.
Explore unsupervised learning techniques and applications, including clustering (K-means, hierarchical, DBSCAN), anomaly detection with isolation forests, and dimensionality reduction (PCA, t-SNE, SVD) for visualization and recommenders.
Learn the unsupervised learning workflow from data prep to algorithm tuning, featuring clustering and dimensionality reduction, feature engineering and scaling, with inertia and intuition guided by domain expertise.
Explore patterns in data with unsupervised learning, focusing on data structure and algorithms, where there are no predictions or labels, using clustering and dimensionality reduction.
Learn essential data prep steps for unsupervised learning, including aligning rows and columns, ensuring non-null numeric features, and applying feature engineering, selection, and scaling.
Master five data prep steps to transform data into unsupervised learning inputs: set row granularity, ensure numeric non-null columns, engineer and select features excluding identifiers, and scale for distance-based algorithms.
Set the correct row granularity by making each customer a single row, then reshape data with groupby and pivot for unsupervised clustering; learn reset_index and melt basics.
Create a new Jupyter notebook to practice reshaping data with groupby in Python, then aggregate by customer and reset the index to produce a clean data frame.
Pivot a data frame in pandas by turning customers into rows and genres into columns, filling missing values with zeros and flattening the index.
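A minimal sketch of the pivot step, assuming a hypothetical listening log with `customer`, `genre`, and `songs` columns (these names and values are invented for illustration and are not the course dataset):

```python
import pandas as pd

# Hypothetical listening log: one row per customer-genre pair
listens = pd.DataFrame({
    "customer": ["Ana", "Ana", "Ben", "Cy"],
    "genre": ["pop", "rock", "pop", "jazz"],
    "songs": [10, 5, 3, 7],
})

# Pivot so each customer is one row and each genre is one column;
# combinations that never occurred are filled with zero
wide = listens.pivot_table(index="customer", columns="genre",
                           values="songs", aggfunc="sum", fill_value=0)

# Flatten the index so customer becomes an ordinary column again
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

The result has one row per customer, which is the row granularity clustering expects.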
Format data for analysis by setting the correct row granularity with pandas, using groupby or pivot on an entertainment preferences dataset, and verify a 150-row output.
Walk through solving assignment one by loading entertainment data with pandas, reading an Excel file, pivoting to one row per student, and saving the transformed data frame with 150 rows.
Prepare columns for unsupervised modeling by ensuring non-null, numeric features; impute or remove missing values, convert text to numbers, apply conditional logic with np.where, and create dummy variables for categoricals.
Identify missing data in a data frame using info() and isna(), and flag NaN (not a number) values. Filter to rows containing at least one missing value with any(axis=1) to inspect them.
Handle missing data in pandas data frames by dropping rows or columns with missing values, resetting the index, and imputing values with the median age and with zero followers.
Convert text fields to numeric in pandas by removing dollar signs and commas with str.replace, then apply pd.to_numeric to the income column for modeling.
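As a rough illustration of that cleanup, assuming an `income` column stored as text with dollar signs and commas (the sample values are made up):

```python
import pandas as pd

# Hypothetical income values stored as text
df = pd.DataFrame({"income": ["$52,000", "$1,250", "$98,400"]})

# Strip the literal "$" and "," characters (regex=False treats them as plain text)
cleaned = (df["income"]
           .str.replace("$", "", regex=False)
           .str.replace(",", "", regex=False))

# Convert the cleaned strings to a numeric dtype
df["income"] = pd.to_numeric(cleaned)
print(df.dtypes)
```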
Convert sign up date from text to datetime with pd.to_datetime using a specified format, then extract numeric components for modeling; handle spaces in column names and address parsing warnings.
Learn to extract date time components in pandas with dt methods to create sign up month and sign up day of week columns, then drop the date column before modeling.
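A small sketch of the date handling described in the two lectures above. The column name `sign up date` and the `%m/%d/%Y` format are assumptions for illustration; the real data may use a different format.

```python
import pandas as pd

# Hypothetical sign-up dates stored as text
df = pd.DataFrame({"sign up date": ["01/15/2024", "03/02/2024"]})

# Parse text into datetimes; an explicit format avoids parsing warnings
df["sign up date"] = pd.to_datetime(df["sign up date"], format="%m/%d/%Y")

# Extract numeric components with the .dt accessor (Monday is 0)
df["sign_up_month"] = df["sign up date"].dt.month
df["sign_up_dow"] = df["sign up date"].dt.dayofweek

# Drop the raw datetime column so only numeric features remain
df = df.drop(columns="sign up date")
print(df)
```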
Learn how to convert categorical text to numeric indicators in pandas using numpy where, turning yes/no discounts into a 0/1 column for model input.
Transform categorical fields into numeric features using dummy variables or one-hot encoding, applying pandas get_dummies, converting booleans to 0/1, and combining for modeling.
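The two encoding approaches above (np.where for a yes/no field, dummy variables for a multi-level field) might look like this. The `discount` and `plan` columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table with two text columns
df = pd.DataFrame({
    "discount": ["Yes", "No", "Yes"],
    "plan": ["free", "premium", "free"],
})

# Binary field: 1 where the value is "Yes", otherwise 0
df["discount"] = np.where(df["discount"] == "Yes", 1, 0)

# Multi-level field: one indicator column per category, cast to 0/1
dummies = pd.get_dummies(df["plan"], prefix="plan").astype(int)

# Combine the numeric pieces side by side for modeling
model_df = pd.concat([df[["discount"]], dummies], axis=1)
print(model_df)
```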
Identify missing values and fill them with zeros to prepare columns for modeling, then create a video game lover column set to one if hours exceed seven a week, else zero.
Identify missing values, fill books with zeros, and create a video game lover feature with numpy where hours played > 7 for modeling.
Create new features by adding columns to strengthen model inputs, such as aggregating genre songs and deriving an age feature from external data, then apply calculations and identify proxy variables.
Review feature engineering in the data prep stage, using feature aggregation to set row granularity, imputing missing data, and encoding categoricals with indicator and dummy columns to produce non-null numeric features.
Apply feature engineering techniques by creating new features through calculations, such as percent pop, using numerator and denominator, and combining columns with pandas like pd.concat and axis=1.
Bin numeric features into discrete categories with np.where, turning sign up day of week into a weekend versus weekday indicator. Replace the original column with weekend and apply age-range bins.
Learn how proxy variables use external data to approximate hard-to-measure features, such as average temperature for signup month, and merge dataframes to create numeric, model-ready features.
Leverage domain expertise to engineer meaningful features, favor long data with many rows and few columns, start simple, and continually revisit data prep during modeling.
Apply feature engineering to create total entertainment and percent screen columns for each student, summing weekly entertainment hours and calculating screen usage excluding books, in the data prep Jupyter notebook.
The lecture demonstrates feature engineering by creating two new columns: total entertainment and percent screen, computed from books, movies, tv shows, and video games to summarize a student’s media exposure.
Exclude identifier columns during feature selection, but keep them for interpretation; use Jupyter notebook to save the name column as a series named names, and drop it from the data.
Learn to select a subset of features for modeling using intuition and MVP, then start simple with a few features and iteratively refine to differentiate customers in Python.
Master feature selection by saving the student name as a series and compiling a modeling data frame with three engineered features: video game lover, total entertainment, and percent screen.
Perform feature selection in pandas by extracting the student name as a series and selecting the last three columns for modeling: video game lover, total entertainment, percent screen.
Explain feature scaling as an optional data prep step for unsupervised learning, covering normalization and standardization, and why scaling matters for distance-based algorithms.
Normalize data by scaling all features to a 0 to 1 range with scikit-learn's MinMaxScaler, using fit, transform, or fit_transform to place columns on a common scale.
Standardization scales data by transforming each column to a mean of zero and a standard deviation of one, ideal for normally distributed features, using scikit-learn's StandardScaler in Python.
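A brief sketch contrasting normalization and standardization with scikit-learn. The `age` and `income` values are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales
df = pd.DataFrame({"age": [20, 30, 40], "income": [30000, 60000, 90000]})

# Normalization: rescale every column to the 0-1 range
normalized = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# Standardization: rescale every column to mean 0 and standard deviation 1
# (StandardScaler uses the population standard deviation, ddof=0)
standardized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

print(normalized)
print(standardized)
```

Either way, both columns end up on a common scale, so neither dominates a distance calculation.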
Scale the three features in the data frame to zero mean and unit variance for a distance-based clustering approach. Save the transformed data as a final modeling-ready data frame.
Learn how to scale features with a standard scaler to achieve a mean of zero and standard deviation of one, and save the scaled data frame for modeling.
Review data prep for unsupervised learning, covering row and column preparation, feature engineering, scaling with normalization or standardization, and techniques like groupby, pivot, fillna, np.where, and pd.get_dummies.
Explore the fundamentals of clustering and compare k-means clustering, hierarchical clustering, and DBSCAN, covering theory, Python implementations, and emphasis on interpretation to answer business questions.
Visualize data in unsupervised learning to identify clusters, prepare features, scale data, and apply k-means, hierarchical clustering, and DBSCAN. Tune with inertia and intuition toward business insights.
Learn k-means clustering, the unsupervised algorithm that assigns data points to k clusters using centroids and iterates until stable. Apply it in Python to customer segmentation and clustering store locations.
Apply k-means clustering in Python using scikit-learn, choosing n_clusters and random_state to produce stable results. Learn about initialization, inertia, and how to interpret cluster assignments.
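A minimal k-means sketch on six made-up points that form two obvious groups. Setting `n_init` explicitly is a choice made here so the behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six illustrative points: three near (1, 1) and three near (8, 8)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# random_state fixes the centroid initialization so results are repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

labels = kmeans.labels_            # cluster assignment for each row
centers = kmeans.cluster_centers_  # one centroid per cluster
inertia = kmeans.inertia_          # within-cluster sum of squares

print(labels, centers, inertia, sep="\n")
```

The cluster numbers themselves (0 or 1) are arbitrary; only the grouping is meaningful.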
Create a k-means clustering model in Python by preparing data in a notebook, reading a csv with pandas, cleaning and selecting numeric features, and fitting two clusters with scikit learn.
Visualize clusters from a k-means clustering model by using the labels_ attribute and Python plotting libraries to create a 3D scatter plot with books, TV shows, and video games as axes.
Interpret cluster centers from a two-cluster k-means model to reveal entertainment-hour patterns and guide naming with domain intuition.
Learn to visualize cluster centers with a heat map by converting k-means centers into a data frame and using seaborn to interpret three clusters.
Apply k-means clustering to the cereal data set, using two clusters, by reading the cereal CSV file, preparing numeric data (dropping name and manufacturer), and interpreting the cluster centers for the assignment.
Apply a two-cluster k-means model to cereal data after dropping name and manufacturer and selecting numeric features. Visualize cluster centers with a heat map and interpret for kids vs adults.
Calculate inertia (within-cluster sum of squares) to compare k-means models with different cluster counts, plot inertia vs k, and identify the elbow to choose a suitable k.
Explore how to plot inertia for k-means in Python by fitting models from 2 to 15 clusters, recording inertia values, and visualizing the results to identify the elbow.
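The loop described above could be sketched as follows. Synthetic data from `make_blobs` stands in for the course dataset, which is an assumption made purely so the example runs on its own.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with three underlying groups
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Fit one model per cluster count from 2 through 15 and record its inertia
inertias = {}
for k in range(2, 16):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

# A Series indexed by k is convenient to inspect or plot
inertias = pd.Series(inertias, name="inertia")
print(inertias)
```

With matplotlib installed, `inertias.plot(marker="o")` would draw the curve; the elbow is the k after which inertia stops dropping sharply.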
Explore inertia values for k-means models from two to fifteen clusters, plot the inertia to identify elbows, and compare three- and five-cluster solutions.
Fit 14 k-means models with 2–15 clusters and plot inertia to locate the elbow. Identify the optimal cluster count and interpret the resulting cluster centers with a heat map.
Fit k-means models for 2–15 clusters, plot inertia to locate the elbow at three, and interpret the three cluster centers with a heat map of calories and vitamins and minerals.
Tune a k-means model by refining data prep, outlier removal, scaling, and feature engineering to improve cluster quality and stability.
Tune a k-means model by engineering features and scaling data version two. Fit 2–15 clusters and use the inertia elbow to select four.
Tune a k-means model by removing the fat column, standardizing the remaining features, iterating 2 to 15 clusters via inertia plots to find elbow, and interpret centers with heat map.
Remove the fat column and standardize the remaining features, then fit a six-cluster k-means model and interpret clusters by calories, sugar, protein, and vitamins and minerals.
Explore how to select the best clustering model by comparing cluster assignments and metrics, testing on new data, and prioritizing results that meaningfully solve the business problem.
Compare two k-means models to select the best clustering; map cluster labels to names like non-readers and entertainment enthusiasts, and examine centers to guide targeted ads.
Compare two k-means models by labeling rows with unstandardized and standardized clusters, tally cereals per cluster, and recommend the best model plus the number of cereal displays for Maven Supermarket.
Label data with k-means cluster names, compare distribution across models, and map clusters to cereal types to decide the best model. Recommend display placements using pandas filtering and sorting.
Explore hierarchical clustering as an agglomerative method that builds a dendrogram by iteratively merging closest points and clusters, comparing linkage methods like single, complete, average, and Ward's.
Explore hierarchical clustering in Python using SciPy's dendrogram and linkage functions, with Euclidean distance, to visualize data points and clusters. Learn how to adjust color threshold to reveal three clusters.
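A short SciPy sketch on six invented points. `no_plot=True` is used here so the example runs without a plotting backend; omit it to draw the dendrogram. The threshold of 2.0 is specific to this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Six illustrative points forming three tight pairs
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)

# Build the merge tree with Ward linkage on Euclidean distances
Z = linkage(X, method="ward", metric="euclidean")

# color_threshold controls where branches change color in the plot;
# with no_plot=True only the layout dictionary is returned
tree = dendrogram(Z, no_plot=True, color_threshold=2.0)

# Cut the tree into at most three flat clusters (labels start at 1)
labels = fcluster(Z, t=3, criterion="maxclust")

print(tree["ivl"])  # leaf labels in dendrogram order
print(labels)
```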
Learn to apply agglomerative (hierarchical) clustering in scikit-learn to data, using n_clusters, Euclidean distance, and Ward linkage, and compare to K-means while interpreting dendrograms.
Learn to perform agglomerative hierarchical clustering in Python using scikit-learn, compare with SciPy dendrograms to determine cluster counts, fit models, and analyze labels and cluster sizes.
Visualize agglomerative clustering results with Seaborn's cluster map, combining a heatmap and dendrogram to interpret clusters of students across books, TV shows, and video games.
Create cluster maps in Python using agglomerative clustering with three and four clusters. Interpret SciPy dendrograms and cluster assignments, and map data points to clusters with fcluster and ivl.
Create dendrograms for both the original cereal data (five numeric fields) and the standardized cereal data (four fields), identify the number of clusters, then fit an agglomerative model and generate a cluster map to interpret results.
Identify four clusters of cereals using hierarchical clustering on five numeric fields, visualized with a dendrogram and cluster map, and interpret clusters by vitamins, minerals, protein, calories, and sugar.
Explore density-based clustering with DBSCAN, using eps (epsilon) and min_samples to label core, border, and noise points, forming irregular clusters and identifying outliers.
Learn to apply DBSCAN in Python with scikit-learn, using eps as the neighborhood radius and min_samples to define core points and clusters, noting that DBSCAN takes no random_state.
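A minimal DBSCAN sketch on made-up points: two dense groups plus one isolated point. The `eps` and `min_samples` values are chosen to suit this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of three points and one far-away point
X = np.array([[0, 0], [0, 0.5], [0.5, 0],
              [5, 5], [5, 5.5], [5.5, 5],
              [20, 20]], dtype=float)

# eps is the neighborhood radius; min_samples is the number of points
# (including the point itself) needed within eps to be a core point
db = DBSCAN(eps=1.0, min_samples=3).fit(X)

# Cluster labels start at 0; noise points are labeled -1
labels = db.labels_
print(labels)
```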
Use the silhouette score to compare clustering models such as k-means, hierarchical, and dbscan, since it ranges from -1 to 1 and indicates how well points fit their own clusters.
Learn to compute silhouette scores in Python using scikit-learn's silhouette_score with your data and cluster labels, including the default Euclidean metric and optional sample size in a Jupyter notebook demo.
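For illustration, a silhouette score computed on six invented points clustered with k-means. Any model that produces labels could be substituted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Six illustrative points in two well-separated groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Pass the data and its cluster labels; the metric defaults to Euclidean.
# sample_size can be set to score a random subset of a large dataset.
score = silhouette_score(X, labels)
print(round(score, 3))
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points.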
Tune DBSCAN in Python with eps and min_samples, computing silhouette scores to select the best model, using loops, data frames, and a tune_dbscan function.
Explore DBSCAN by looping eps from 0.1 to 2 in steps of 0.1 and min_samples from 2 to 10 in steps of 1 on original and standardized data, identify the best silhouette score, and fit the final model.
Learn to tune DBSCAN by looping over eps and min_samples, evaluate with silhouette scores on original and standardized data, and select the best model with eps 1.9 and min_samples 4.
Compare k-means, hierarchical clustering, and DBSCAN, outlining pros and cons to guide practical model selection. Learn when to prioritize interpretability, scalability, or density-based clustering for complex data.
Compare clustering models using silhouette score and inertia, balance metrics with intuition to select the best model, and learn how to label unseen data with consistent data prep.
Compare three clustering models—k-means, agglomerative, and DBSCAN—by converting labels to series, counting cluster sizes, and evaluating silhouette scores to balance accuracy and interpretability.
Label unseen data with the k-means model, applying the same feature engineering and scaling, and predict cluster assignments for new students using the prepared data pipeline.
Learn to cluster data by preparing features, then compare k-means, hierarchical, and DBSCAN models using inertia, dendrograms, silhouette scores, and intuition to answer business questions.
Cluster client data by scaling features and applying k-means, hierarchical, and DBSCAN techniques. Evaluate segments with silhouette score and predict a new client's cluster using the best model.
Read the wholesale client csv, drop channel and region, and standardize the six spending features to mean zero and std near one for a 440-row dataset.
Apply k-means clustering on scaled data and use inertia plots to identify the elbow at five clusters, then visualize the five clusters with a heat map of product categories.
Master hierarchical clustering with dendrograms, agglomerative models, and cluster maps; explore row-wise z-score scaling, threshold tuning, and silhouette-based cluster selection.
Tune DBSCAN on scaled data to optimize clustering using silhouette scores. Expand eps up to five, compare models, and select the best configuration.
Compare clustering techniques, including k-means, hierarchical, and DBSCAN, on scaled data using silhouette scores. Adopt three-cluster k-means, review cluster centers and segments, and apply scaling for new data predictions.
Explore anomaly detection within the data science workflow and compare unsupervised techniques—isolation forests and DBSCAN—through Python applications and interpretation of results.
Learn anomaly detection basics and the interchangeable use of the terms anomaly and outlier. Visualize anomalies with two-dimensional and three-dimensional scatter plots, and apply advanced modeling when the data has many features.
See how anomaly detection sits in the cleaning and modeling steps, using unsupervised techniques like isolation forest and DBSCAN to identify data issues and uncover insights.
Apply the anomaly detection workflow by preparing numeric, non-null data in a single table, then model with isolation forest or DBSCAN, and iterate with plots to meet business objectives.
Explore how isolation forests detect anomalies by building many random-split trees, measuring path lengths, and scoring observations to identify outliers in fraud, sensor, and patient data.
Learn to build an isolation forest in Python with scikit-learn, configure contamination and random state, and interpret anomaly scores and flags using decision_function and predict.
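A hedged isolation forest sketch on synthetic data: 100 points drawn around the origin plus one planted outlier at (10, 10). The data and the 2% contamination value are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: a normal cloud plus one obvious outlier as the last row
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)), [[10.0, 10.0]]])

# contamination is the expected share of anomalies in the data
iso = IsolationForest(contamination=0.02, random_state=42).fit(X)

flags = iso.predict(X)             # -1 for anomalies, 1 for normal points
scores = iso.decision_function(X)  # negative scores correspond to anomalies

print((flags == -1).sum(), "rows flagged")
print("planted outlier:", flags[-1], round(scores[-1], 3))
```

Sorting rows by score, lowest first, gives a ranked list of the most unusual observations to review.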
Visualize anomalies with Seaborn pair plots by coloring points using an anomaly flag from isolation forest across books, TV shows, and video games.
Tune the isolation forest contamination from 2% to 5% and predict anomalies. Interpret results by sorting data and examining anomaly flags and scores with pair plots and visualizations.
Preprocess the Tripadvisor review csv by removing the user id, view rating ranges, visualize with a seaborn pair plot, and detect anomalies with isolation forests at 0.01 and 0.005 contamination.
Explore unsupervised learning with isolation forests to detect anomalies in numeric TripAdvisor review data, using pandas for prep and seaborn pair plots for visualization.
Harness DBSCAN to detect anomalies by identifying core points, border points, and noise points in dense regions, revealing outliers as anomalies. Translate clustering techniques to anomaly detection in Python.
Explore DBSCAN for anomaly detection in Python using scikit-learn, with eps and min_samples, data scaling, and labeling noise points as anomalies.
Visualize DBSCAN anomalies with cluster labels in a pair plot to reveal anomalies alongside data trends. Compare these DBSCAN findings with isolation forest results to inform stakeholders through visualization-driven insights.
Apply DBSCAN for anomaly detection on the tourist rating data set, use the silhouette score to pick the best eps and min_samples, and highlight the anomalies on a pair plot.
Apply and tune DBSCAN for anomaly detection on user ratings, compare with isolation forest, and identify and visualize anomalous patterns to refine future analyses.
Compare anomaly detection algorithms, highlighting isolation forests as efficient for high-dimensional data and global anomalies, and DBSCAN for local anomalies and complex clusters, with practical start-with-isolation-forest guidance.
Explore how unsupervised learning reveals data relationships by comparing points for similarity and difference, using clustering (including DBSCAN) to form groups and anomaly detection to highlight outliers.
Use unsupervised anomaly detection with isolation forest and DBSCAN to identify unusual observations, then visualize, tune, and decide whether to explore or exclude anomalies.
Explore dimensionality reduction within unsupervised learning, focusing on PCA and t-SNE, with practical Python implementations and interpretation of results in the data science workflow.
Explore dimensionality reduction, transforming data from three dimensions into two components, PC1 and PC2, while preserving information and revealing clearer clusters.
Discover why reducing dimensions helps visualize high-dimensional data and improve modeling, using PCA, t-SNE, and SVD for feature extraction, supervised learning, and unsupervised discovery.
Learn the dimensionality reduction workflow—from data prep to modeling and tuning—using scaling and PCA, t-SNE, or SVD, with explained variance guiding iteration.
Learn how principal component analysis reduces dimensions by projecting data onto PC1, the linear combination of features with the most variance, and using eigendecomposition to transform and visualize clusters.
Apply principal component analysis in Python using scikit-learn's PCA, including centering data, choosing n_components, and deciding when to standardize for visualization or feature extraction.
Explore explained variance ratio in PCA, showing how each principal component captures data variance, with the first component capturing the most and the ratios summing to one when all components are kept, guiding component selection.
Learn to perform PCA on numeric data in Python by centering inputs, fitting with n_components, and interpreting explained_variance_ratio to reduce dimensions for visualization or feature extraction, aiming for 80–90% variance.
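A minimal PCA sketch on synthetic three-column data in which two columns are strongly correlated. The data is generated only for illustration. scikit-learn's PCA also centers the data internally, so the explicit centering below mirrors the workflow described above rather than being strictly required.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: column 2 is roughly twice column 1, column 3 is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x,
                     2 * x + rng.normal(scale=0.1, size=200),
                     rng.normal(scale=0.5, size=200)])

# Center each column at zero
X_centered = X - X.mean(axis=0)

# Fit a two-component PCA and project the data onto PC1 and PC2
pca = PCA(n_components=2).fit(X_centered)
X_2d = pca.transform(X_centered)

# Share of total variance captured by each component
ratios = pca.explained_variance_ratio_
print(ratios, ratios.sum())
```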
Apply principal component analysis to the student grades dataset, drop the student ID column, center the data, fit a two-component PCA, and interpret the explained variance ratios.
Demonstrates applying principal component analysis to student grades data: load data, drop the first column with student ID, center the features, fit PCA with two components, and interpret explained variance.
Interpret PCA outputs by examining component loadings, mapping PC1 and PC2 to original features, and visualizing in a scatter plot to reveal how books, TV shows, and video games relate.
Interpret PCA by inspecting components and transforming data from three features to two. Visualize PC1 and PC2 to reveal patterns, such as books, TV shows, and video games driving clusters.
Interpret the PCA model components, then plot students on a two-dimensional scatter plot with PC1 and PC2, and interpret clusters to guide college recommendations.
Interpret PCA components to show that higher PC1 means better grades, while PC2 differentiates STEM versus humanities strengths; visualize with a two-component scatter plot to guide student recommendations.
Explore feature selection and feature extraction as dimensionality reduction techniques; compare dropping columns with PCA, and apply to clustering and data visualization for cereals and other datasets.
Explore next steps after fitting PCA, including visualization with 2D or 3D components, using the components as inputs to other models, and applying identical PCA transformations to test data.
Explore t-SNE, a nonlinear dimensionality reduction technique that emphasizes point relationships over absolute distances. See how affinity and density shape neighbor probabilities to reveal clusters in a 2D visualization.
Explore applying t-SNE in Python with scikit-learn: configure n_components for 2D visualization, use random_state for reproducibility, and visualize clusters. t-SNE doesn't require centering, but scaling is recommended.
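A small t-SNE sketch on synthetic data. The blobs, the five features, and the perplexity of 10 are illustrative choices; perplexity must be smaller than the number of rows.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 60 rows, 5 features, 3 underlying groups
X, _ = make_blobs(n_samples=60, n_features=5, centers=3, random_state=0)

# Scaling is recommended so no single feature dominates the distances
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two components for plotting; random_state makes the layout repeatable
tsne = TSNE(n_components=2, perplexity=10, random_state=42)
embedding = tsne.fit_transform(X_scaled)

print(embedding[:5])
```

The two columns of `embedding` serve as the x and y coordinates of a scatter plot. t-SNE has no separate transform step for new rows, so it is fit and applied in one call.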
Fit a t-SNE model with two components, plot a scatter of students with x as component one and y as component two, and interpret the plot data.
Apply t-SNE with two components to visualize grade data, fitting and transforming, then plot component one versus component two to reveal distinct clusters, contrasting with PCA results.
Compare PCA and t-SNE as dimensionality reduction methods, showing PCA's accurate, interpretable linear components and t-SNE's visually separated clusters despite not providing a true data representation.
Explore how dimensionality reduction and clustering reveal data structure. Visualize with PCA and t-SNE, then apply k-means and interpret inertia and silhouette scores to decide cluster counts.
Combine t-SNE and k-means clustering by fitting a three-cluster k-means model, overlaying the clusters on the t-SNE plot with colors, and interpreting the cluster centers.
Fit a three-cluster k-means model on the grades data, visualize clusters with a t-SNE plot, and interpret centroid heat maps to reveal humanities vs STEM patterns.
Master dimensionality reduction concepts for unsupervised learning by using PCA for feature extraction and visualization, explained variance ratio as the metric, and t-SNE for clustering visualization.
Explore the basics of recommenders, compare content-based and collaborative filtering, and learn to use cosine similarity and SVD to identify similar items and users.
Explore the basics of recommender systems, including content based filtering and collaborative filtering, with cosine similarity and singular value decomposition, and learn to build unsupervised recommenders.
Learn how content-based filtering recommends items based on item characteristics, using features like energy, vitamin C, and sugar to compare fruits and suggest similar options such as mangoes.
Apply cosine similarity, a popular metric for recommenders, which is the cosine of the angle between two data points treated as vectors. For non-negative data (the first quadrant), cosine similarity ranges from 0 to 1.
Compute cosine similarity in Python with scikit-learn's cosine_similarity function to compare rows of a fruit data frame. Analyze how sugar and vitamin C influence their similarity.
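A sketch of the row-by-row comparison. The sugar and vitamin C numbers are made up for illustration and are not real nutrition data.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative (not real) feature values, one row per fruit
fruits = pd.DataFrame(
    {"sugar": [10, 20, 1], "vitamin_c": [5, 10, 30]},
    index=["apple", "mango", "lemon"],
)

# Pairwise cosine similarity between all rows, labeled by fruit name
sim = pd.DataFrame(cosine_similarity(fruits),
                   index=fruits.index, columns=fruits.index)

# Fruits most similar to mango, excluding mango itself
print(sim["mango"].drop("mango").sort_values(ascending=False))
```

In this toy data, apple and mango have the same sugar-to-vitamin-C proportions, so their vectors point in the same direction even though mango's values are twice as large.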
Apply content-based filtering with cosine similarity to generate fruit recommendations, filter by mangoes, compare two-column and six-column nutrition data, and build a reusable function with error handling.
Apply content-based filtering by computing cosine similarity on genre labels from 1600 labeled movies to identify Toy Story's closest peers and return the top five most similar movies.
Explore content-based filtering with cosine similarity on a movie genres dataset, including data prep, indexing by title, and ranking top five recommendations similar to Toy Story.
Explore collaborative filtering for personal recommendations by analyzing user behaviors, contrasting user-based and item-based approaches, and using a user-item matrix of ratings to predict new fruits a user may like.
Learn to pivot data into a user-item matrix for collaborative filtering in Python, fill missing ratings with the mean, and prepare data for machine learning models.
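A minimal sketch of building the user-item matrix from a made-up ratings table. Filling with the overall mean rating is one reading of "the mean"; a per-item or per-user mean would be an equally plausible choice.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per user-item rating
ratings = pd.DataFrame({
    "user": [1, 1, 2, 3, 3],
    "item": ["apple", "kiwi", "apple", "kiwi", "pear"],
    "rating": [5, 3, 4, 2, 5],
})

# Pivot to one row per user and one column per item; unrated cells become NaN
matrix = ratings.pivot(index="user", columns="item", values="rating")

# Fill unrated cells with the overall mean rating (an assumption, see above)
overall_mean = ratings["rating"].mean()
matrix = matrix.fillna(overall_mean)
print(matrix)
```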
Kick off collaborative filtering by building a user-item matrix from movie ratings, reading movies, users, and ratings into three data frames, then using the pandas pivot method to reshape for modeling.
Read movie, user, and ratings data from multiple sheets, then pivot to a user-item matrix and fill missing ratings for later analysis.
Apply truncated singular value decomposition to the user-item matrix, decomposing it into U, Sigma, and V transpose to reveal two latent features and reduce dimensionality for recommender systems.
Apply truncated SVD in Python with scikit-learn to fit and transform a user-item matrix into two latent features, revealing U, Sigma, and V transpose and their singular values.
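A short truncated SVD sketch on an invented 5-by-4 user-item matrix. In scikit-learn, `fit_transform` returns the user matrix already multiplied by the singular values (U times Sigma), and `components_` holds V transpose.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Illustrative user-item matrix: 5 users (rows) by 4 items (columns)
M = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5],
              [5, 5, 2, 1]], dtype=float)

svd = TruncatedSVD(n_components=2, random_state=42)

U_sigma = svd.fit_transform(M)  # users in latent space, shape (5, 2)
Vt = svd.components_            # items in latent space, shape (2, 4)
sigma = svd.singular_values_    # strength of each latent feature

print(U_sigma.shape, Vt.shape, sigma)
```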
Apply truncated SVD to the user-item matrix with two components, then view the original matrix, the U matrix, and their shapes to verify dimensions.
Apply truncated SVD to the user-item matrix to reduce 943 users and 1682 movies to two latent components, revealing the U and V matrices that capture user and movie features.
Learn to select the number of components for a truncated SVD model using explained variance ratio and the cumulative explained variance plot, aiming for about 80% with a practical cutoff.
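One way to turn the cumulative explained variance into a component count is sketched below. The data is random, the columns are centered as an assumption, and the 0.80 target simply mirrors the "about 80%" guideline.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical data: 100 rows, 20 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X_centered = X - X.mean(axis=0)  # assumption: center columns first

# Fit with many components, then inspect how variance accumulates
svd = TruncatedSVD(n_components=19, random_state=0).fit(X_centered)
cumulative = np.cumsum(svd.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 0.80
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)
print(n_keep, cumulative[n_keep - 1])
```

In practice the cumulative curve is usually plotted as well, since a visible elbow can justify a cutoff slightly above or below the target.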
Demonstrate explained variance ratio for two-component and six-component SVD on a 30-by-6 user-item matrix. Use cumulative variance to decide the number of components, noting that truncated SVD's explained variance ratios are not guaranteed to appear in decreasing order.
Apply truncated SVD with 500 components, plot cumulative explained variance, and select the optimal component count to fit a final model that captures the data variance.
Fit a truncated SVD with 500 components, analyze the cumulative explained variance, and select 250 components to capture about 84% of the variance for the user-item matrix.
Learn collaborative filtering with truncated SVD, fit and transform to create a user matrix, then transform a new user, fill missing values, and generate recommendations via a dot product.
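The new-user flow described above could be sketched as follows. The item names, the random ratings, and the choice to fill unrated items with item means are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Hypothetical user-item matrix: 30 users, 6 fruits, ratings from 1 to 5
items = ["apple", "banana", "cherry", "grape", "lemon", "mango"]
rng = np.random.default_rng(1)
user_item = pd.DataFrame(
    rng.integers(1, 6, size=(30, 6)).astype(float), columns=items
)

svd = TruncatedSVD(n_components=2, random_state=0)
user_matrix = svd.fit_transform(user_item)

# A new user who has rated only two fruits; the rest are NaN
new_user = pd.Series({"apple": 5.0, "mango": 4.0}).reindex(items)
rated = new_user.notna()

# Assumption: fill unrated items with each item's mean rating
new_user_filled = new_user.fillna(user_item.mean())

# Map the new user into latent space, then back to item scores
# with a dot product against V transpose
latent = svd.transform(new_user_filled.to_frame().T)
scores = pd.Series((latent @ svd.components_).ravel(), index=items)

# Recommend only items the user has not rated yet
recommendations = scores[~rated].sort_values(ascending=False)
print(recommendations)
```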
Learn collaborative filtering with a two-component truncated SVD, transform a new user into latent space, and generate top fruit recommendations via the V transpose product while excluding rated items.
Use a 250-component SVD to map a new user into latent space, generate ten movie recommendations, reconstruct the user-item matrix, and assess whether they make sense, with optional component adjustments.
Explore collaborative filtering with singular value decomposition to transform a new user into a latent space, reconstruct a user-item matrix, and generate top movie recommendations.
Explore content-based and collaborative filtering, then build a hybrid recommender with a popularity baseline, and evaluate with explained variance ratio; deploy, gather feedback, and refine for data sparsity and cold start.
Combine four recommenders (collaborative filtering with truncated SVD, content-based filtering on sugar and vitamin C, content-based filtering on six nutritional values, and the mean of the user-item matrix) to form a hybrid recommender.
Explore the two main types of recommenders—content-based filtering and collaborative filtering—and learn how cosine similarity and SVD drive item- and user-based recommendations, including hybrid approaches.
Harness data prep and unsupervised recommenders, truncated SVD and cosine similarity, to deliver Maven Eats restaurant recommendations, with five home page picks and five similar restaurants on each detail page.
Import the restaurant ratings data, confirm a 0–2 scale with describe, pivot to a user-item matrix, fill NaNs with the mean, and note 138 users and 127 restaurants.
Demonstrate collaborative filtering with truncated SVD on a centered user-item matrix, select components to reach about 80% explained variance, and deliver top restaurant recommendations with details.
Convert restaurant data into numeric features using pandas get_dummies for cuisine, a single numeric price encoding, and a yes/no franchise indicator, then apply cosine similarity for recommendations.
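A small sketch of this encoding, assuming hypothetical restaurant names, price labels, and a 1-to-3 price mapping that are not taken from the Maven Eats data:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical restaurant attributes, invented for illustration
restaurants = pd.DataFrame(
    {
        "cuisine": ["mexican", "italian", "mexican", "japanese"],
        "price": ["low", "high", "medium", "high"],
        "franchise": ["yes", "no", "no", "no"],
    },
    index=["Taco Stop", "Pasta House", "Casa Verde", "Sushi Bar"],
)

# One 0/1 column per cuisine
features = pd.get_dummies(restaurants["cuisine"], dtype=float)

# Assumption: price levels map to a single ordered numeric column
features["price"] = restaurants["price"].map({"low": 1, "medium": 2, "high": 3})

# Yes/no franchise flag as 1/0
features["franchise"] = (restaurants["franchise"] == "yes").astype(int)

sim = pd.DataFrame(
    cosine_similarity(features), index=features.index, columns=features.index
)

# Restaurants most similar to one shown on a detail page
similar_to_taco = sim["Taco Stop"].drop("Taco Stop").sort_values(ascending=False)
print(similar_to_taco)
```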
Explore how collaborative filtering yields top ten restaurant recommendations via truncated SVD, and how tuning parameters, data prep, and hybrid content-based approaches with distance can refine them.
This is a hands-on, project-based course designed to help you master the foundations of unsupervised machine learning in Python.
We’ll start by reviewing the Python data science workflow, discussing the techniques & applications of unsupervised learning, and walking through the data prep steps required for modeling. You’ll learn how to set the correct row granularity for modeling, apply feature engineering techniques, select relevant features, and scale your data using normalization and standardization.
From there we'll fit, tune, and interpret 3 popular clustering models using scikit-learn. We’ll start with K-Means Clustering, learn to interpret the output’s cluster centers, and use inertia plots to select the right number of clusters. Next, we’ll cover Hierarchical Clustering, where we’ll use dendrograms to identify clusters and cluster maps to interpret them. Finally, we’ll use DBSCAN to detect clusters and noise points and evaluate the models using their silhouette score.
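To give a flavor of how inertia and silhouette score help pick the number of clusters, here is a minimal sketch on synthetic data. The three blobs and the range of k values are assumptions for illustration, not the course's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated blobs of 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 6], [0, 6]])
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, model.labels_)

# The k with the highest silhouette score is one candidate choice;
# plotting inertias against k shows the elbow
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```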
We’ll also use DBSCAN and Isolation Forests for anomaly detection, a common application of unsupervised learning models for identifying outliers and anomalous patterns. You’ll learn to tune and interpret the results of each model and visualize the anomalies using pair plots.
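As a rough sketch of anomaly detection with an Isolation Forest, the example below plants two obvious outliers in synthetic data. The contamination value is an assumption that would normally be tuned.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 200 typical points plus two planted outliers
rng = np.random.default_rng(0)
normal_points = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([normal_points, outliers])

# Assumption: roughly 1% of rows are expected to be anomalous
model = IsolationForest(contamination=0.01, random_state=0)

# fit_predict labels each row: -1 for anomaly, 1 for normal
labels = model.fit_predict(X)
anomalies = X[labels == -1]
print(anomalies)
```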
Next, we’ll introduce the concept of dimensionality reduction, discuss its benefits for data science, and explore the stages in the data science workflow in which it can be applied. We’ll then cover two popular techniques: Principal Component Analysis, which is great for both feature extraction and data visualization, and t-SNE, which is ideal for data visualization.
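A minimal PCA sketch for reducing four features to two is shown below. The synthetic features are built from two underlying signals, and standardizing before PCA is an assumption about the prep step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: four features driven by two underlying signals
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack(
    [
        a,
        2 * a + rng.normal(scale=0.1, size=100),
        b,
        -b + rng.normal(scale=0.1, size=100),
    ]
)

# Assumption: standardize so each feature contributes on the same scale
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)  # two columns, ready for a scatter plot

# Share of the original variance kept by the two components
explained = pca.explained_variance_ratio_.sum()
print(X_2d.shape, round(explained, 3))
```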
Last but not least, we’ll introduce recommendation engines, and you'll practice creating both content-based and collaborative filtering recommenders using techniques such as Cosine Similarity and Singular Value Decomposition.
Throughout the course you'll play the role of an Associate Data Scientist for the HR Analytics team at a software company trying to increase employee retention. Using the skills you learn throughout the course, you'll use Python to segment the employees, visualize the clusters, and recommend next steps to increase retention.
COURSE OUTLINE:
Intro to Data Science in Python
Introduce the fields of data science and machine learning, review essential skills, and introduce each phase of the data science workflow
Unsupervised Learning 101
Review the basics of unsupervised learning, including key concepts, types of techniques and applications, and its place in the data science workflow
Pre-Modeling Data Prep
Recap the data prep steps required to apply unsupervised learning models, including restructuring data, engineering & scaling features, and more
Clustering
Apply three different clustering techniques in Python and learn to interpret their results using metrics, visualizations, and domain expertise
Anomaly Detection
Understand where anomaly detection fits in the data science workflow, and apply techniques like Isolation Forests and DBSCAN in Python
Dimensionality Reduction
Use techniques like Principal Component Analysis (PCA) and t-SNE in Python to reduce the number of features in a data set while retaining as much information as possible
Recommenders
Recognize the variety of approaches for creating recommenders, then apply unsupervised learning techniques in Python, including Cosine Similarity and Singular Value Decomposition (SVD)
__________
Ready to dive in? Join today and get immediate, LIFETIME access to the following:
16.5 hours of high-quality video
22 homework assignments
7 quizzes
3 projects
Python Data Science: Unsupervised Learning ebook (350+ pages)
Downloadable project files & solutions
Expert support and Q&A forum
30-day Udemy satisfaction guarantee
If you're a business intelligence professional or data scientist looking for a practical overview of unsupervised learning techniques in Python with a focus on interpretation, this is the course for you.
Happy learning!
-Alice Zhao (Python Expert & Data Science Instructor, Maven Analytics)
__________
Looking for our full business intelligence stack? Search for "Maven Analytics" to browse our full course library, including Excel, Power BI, MySQL, Tableau and Machine Learning courses!
See why our courses are among the TOP-RATED on Udemy:
"Some of the BEST courses I've ever taken. I've studied several programming languages, Excel, VBA and web dev, and Maven is among the very best I've seen!" Russ C.
"This is my fourth course from Maven Analytics and my fourth 5-star review, so I'm running out of things to say. I wish Maven was in my life earlier!" Tatsiana M.
"Maven Analytics should become the new standard for all courses taught on Udemy!" Jonah M.