
Explore a beginner-friendly visual guide to machine learning and data science with interactive Excel models, covering data profiling, classification, regression, unsupervised learning, and models like logistic regression and decision trees.
Demystify essential machine learning topics using Excel as a teaching tool, with no coding required. Cover data profiling, linear and logistic regression, forecasting, and unsupervised learning for analysts and BI professionals.
Define machine learning as the use of statistical methods to find patterns and make predictions, inferring from context rather than following explicitly programmed rules, with applications in churn, sales, cross-selling, and sentiment.
Explore the machine learning landscape, distinguishing supervised and unsupervised methods, and preview foundational techniques like classification, regression, clustering, k-NN, logistic regression, and sentiment analysis for business intelligence.
Master quality assurance and data profiling to ensure clean data before modeling. Gain tools for preliminary data quality analysis, including variable types and empty values, plus univariate and multivariate profiling.
Learn how to perform preliminary data quality assurance in machine learning, including handling missing values, variable types, and outliers, to ensure error-free data and reliable analyses.
Practice rigorous data QA to ensure error-free data, proper encoding, and unbiased capture so ML models are reliable and you avoid wasted time, money, and reputational risk.
Analyze variable types to ensure data quality and readiness: distinguish numeric, discrete, and categorical variables, recognize when to treat zip codes as strings or recode values into buckets, and handle empty values.
Examine empty values in data, distinguish zeros from blanks, and decide whether to keep, remove, or impute missing values using methods like mean imputation or linear interpolation, while avoiding bias.
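To make the keep/remove/impute decision concrete, here is a minimal Python sketch (not part of the course's Excel files) of the two imputation methods named above. It assumes blanks are stored as None while genuine zeros stay as 0, so the two are never confused; the function names and the sample series are illustrative only.

```python
def mean_impute(values):
    """Replace blanks (None) with the mean of the observed values; zeros are kept as real data."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values):
    """Fill interior blanks on a straight line between the nearest known neighbours.

    Blanks before the first or after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


daily_sales = [10, None, 14, None, None, 20]  # hypothetical series with blanks
print(mean_impute(daily_sales))         # blanks become 44/3, about 14.67
print(linear_interpolate(daily_sales))  # [10, 12.0, 14, 16.0, 18.0, 20]
```

Mean imputation fills every gap with the same value and ignores order, while interpolation follows the local trend, which usually suits ordered data such as time series better.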
Explore range calculations to verify min and max values, reveal outliers, and confirm that values are realistic for variables like age, income, and height, then prepare for count calculation checks.
Identify left- and right-censored data by recognizing when the observed min or max values do not reflect the true range, with examples from surveys of people over 18 and censored repurchase rates.
Explore table structure by comparing long and wide formats, and learn how pivoting and unpivoting transform data rows into columns for exploratory data analysis and modeling.
Master preliminary QA by reviewing all fields, configuring variable types, handling missing values and zeros, and applying basic diagnostics, including checks for censored data, to support modeling.
Advance from quality assurance to univariate profiling, performing descriptive analysis of each variable before multivariate modeling, covering categorical and numerical distributions, histograms, kernel densities, and mean, median, and mode.
Understand categorical variables and how categories serve as values and dimensions to filter numerical data, with examples like product type, country, gender, and binary 1/0 flags.
Discretize a numerical variable to create a categorical price level using rules that label values as low, medium, or high, enabling better modeling and analysis.
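The discretization rule can be written as a short function. The cutoffs of 10 and 50 below are hypothetical placeholders; in practice you would choose them from the data or from business rules.

```python
def price_level(price, low_cutoff=10.0, high_cutoff=50.0):
    """Bucket a numeric price into an ordinal label (cutoffs are hypothetical)."""
    if price < low_cutoff:
        return "low"
    if price < high_cutoff:
        return "medium"
    return "high"


prices = [4.99, 12.50, 49.99, 50.00, 120.00]
print([price_level(p) for p in prices])  # ['low', 'medium', 'medium', 'high', 'high']
```

Values that land exactly on a cutoff go to the higher bucket here; whichever convention you pick, apply it consistently.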
Learn to distinguish nominal and ordinal categorical variables, understand when order matters, and explore their distributions to build foundations for predictive modeling.
Explore numerical variables and their distributions, distinguishing quantitative data from categorical, and apply histograms and kernel densities to metrics like page views and revenue for machine learning and business intelligence.
Visualize numerical distributions with histograms and kernel densities by binning data and smoothing the shape. Use both to spot outliers, compare bin sensitivity, and apply Sturges' rule for bin counts.
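Sturges' rule is commonly written as k = ceil(log2(n)) + 1 bins for n observations. The sketch below assumes that form and equal-width bins between the min and max; some texts write 1 + log2(n) and round differently, so bin counts may differ by one.

```python
import math


def sturges_bins(n):
    """Number of histogram bins for n observations, using ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def histogram_counts(values, bins):
    """Count values into equal-width bins spanning min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width) if width > 0 else 0
        counts[min(index, bins - 1)] += 1  # the maximum value lands in the last bin
    return counts


print(sturges_bins(100))   # 8
print(sturges_bins(1000))  # 11
```

Trying a few bin counts around the Sturges value is a quick way to see how sensitive a histogram's shape is to binning.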
Present the normal (Gaussian) distribution, a symmetric bell curve centered at the mean, which underpins much of ML and statistics and enables testing for differences between distributions.
Analyze female athlete heights from the 2016 Rio Olympics against the general population using histograms and kernel density to reveal near-normal distributions and a notable height difference.
Identify the mode as the most frequent value, illustrated with examples such as Houston and 24 sessions. The mode is not very useful on its own, but it guides multivariate profiling for deeper insight.
Learn how the mean defines the central value for numerical data and serves as a basic predictive estimate, with a quick example and notes on outliers and skewness.
Apply the median to numerical data to identify the center of a distribution and resist outliers. The median is the middle value of the ordered data or, when there is an even number of observations, the average of the two middle values.
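A minimal Python sketch of the median rule, including the even-count case, shows why it resists outliers better than the mean. The sample values are made up for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


sessions = [1, 2, 3, 1000]             # one extreme outlier
print(median(sessions))                # 2.5
print(sum(sessions) / len(sessions))   # 251.5, pulled up by the outlier
```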
Explore variance as the measure of how far observations lie from the mean, describing distribution width and enabling comparison of numerical groups, with examples and a path to standard deviation.
Relate standard deviation to variance: taking the square root of the variance returns the measure to the variable's original scale. Apply the empirical rule: for normal distributions, about 68%, 95%, and 99.7% of observations fall within one, two, and three standard deviations of the mean.
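The link between variance and standard deviation, and a quick check of the empirical rule, can be sketched in Python. This version uses the sample formula (dividing by n - 1, as Excel's VAR.S and STDEV.S do) by default; pass sample=False for the population form (VAR.P, STDEV.P). The simulated normal data are synthetic.

```python
import math
import random


def variance(values, sample=True):
    """Average squared distance from the mean (n - 1 denominator when sample=True)."""
    mean = sum(values) / len(values)
    squared = sum((v - mean) ** 2 for v in values)
    return squared / (len(values) - 1 if sample else len(values))


def std_dev(values, sample=True):
    """Square root of the variance, back on the variable's original scale."""
    return math.sqrt(variance(values, sample))


def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = std_dev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0

rng = random.Random(42)
normal = [rng.gauss(0, 1) for _ in range(10_000)]  # synthetic normal data
for k in (1, 2, 3):
    print(k, round(share_within(normal, k), 3))  # roughly 0.68, 0.95, 0.997
```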
Explore how skewness measures a distribution's asymmetry, or departure from the symmetric normal shape, in univariate profiling. Visualize left and right skew, compare how the mean, median, and mode shift, and learn how skewness flags non-normal distributions.
Apply univariate profiling tools: distinguish categorical and numerical variables, explore their distributions, and use metrics like mean, median, mode, and variance for quality assurance and predictive insights.
Learn multivariate profiling of two categorical variables using frequency and proportion tables and heatmaps to visualize joint distributions, with examples on design and size and notes on Naive Bayes classification.
Analyze heat maps of NYC traffic accidents by time of day and day of week, using frequency tables, counts, proportions, and conditional formatting with a red–yellow–green color scale in Excel.
Explore categorical-numerical distributions to compare a numerical variable across categories using histograms, kernel densities, violin plots, and box plots for multivariate profiling of key business metrics.
Visualize categorical-numerical distributions by plotting a kernel density for each class on the same chart, compare means and variances, and prepare to contrast them with violin and box plots.
Explore violin plots, which draw a mirrored kernel density for each category and turn it on its side, so the distributions appear side by side without overlapping, aiding machine learning insights.
Discover box plots, which, like violin plots, show the distribution of a numerical variable for each category: the median, the 25th and 75th percentiles, the min and max (excluding outliers), and the outliers themselves.
Explore how correlation measures the linear relationship between two numeric variables in multivariate profiling, calculated from their covariance and standard deviations, and use scatter plots to visualize the relationship.
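The correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch, with made-up spend and sales figures used only for illustration:

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by (std of x * std of y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


spend = [10, 20, 30, 40, 50]   # hypothetical weekly spend
sales = [12, 25, 31, 38, 41]   # hypothetical weekly sales
print(round(pearson_r(spend, sales), 3))
```

Because the same n - 1 appears in the covariance and in both standard deviations, it cancels, so sample and population formulas give the same r, which always falls between -1 and 1.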
Explain why correlation does not imply causation, using an ice cream sales and drowning example, and highlight the role of a common unobserved variable like warm weather.
Analyze correlations among weekly digital media spend, site traffic, offline spend, and sales using scatter plots to reveal positive, negative, and diminishing returns patterns.
Build on data profiling and quality assurance to explore supervised learning and classification techniques, such as k-nearest neighbors, Naive Bayes, decision trees, logistic regression, and sentiment analysis for business intelligence.
Learn supervised classification, from feature engineering and data splitting to overfitting, then explore k-nearest neighbors, naive Bayes, decision trees, random forest, logistic regression, sentiment analysis, and model selection and tuning.
Explore supervised and unsupervised learning, including classification and regression techniques like k-nearest neighbors, Naive Bayes, and decision trees, plus clustering and outlier detection: supervised learning predicts labels, while unsupervised learning reveals structure.
Explore supervised learning by comparing classification and regression, focusing on predicting categorical targets versus numerical values, with examples like churn, sentiment, and revenue forecasts.
Review rows and columns and categorical and numerical variables (including binary 1/0 flags) to understand the data, then see how conditioning on variables improves predictions, building on quality assurance and data profiling for classification with machine learning.
Explore how classification predicts a dependent variable from independent variables using a CRM example, training a model on observed churn data to predict future churn for new customers.
Map the classification workflow from scoping the business challenge and stakeholders to feature engineering, data splitting, iterative training, and tuned model selection.
Enrich data with new independent variables through feature engineering, including one hot encoding, scaling, log transformation, discretization, date component extraction, and boolean flags to boost predictive power for model validation.
Explore common foundational classification methods, including k-nearest neighbors, naive Bayes, decision trees, and random forests, plus logistic regression and sentiment analysis for classifying categorical outcomes.
Explore k-nearest neighbors, a classification technique that predicts an observation's class from the closest points in a scatter plot, with k setting how many neighbors vote, for applications like customer segmentation.
Explore how k-nearest neighbors classifies purchases by comparing a new customer's age and income to nearby examples, selecting the best k and resolving ties with distances.
Explore a KNN case study predicting a Spotify track outcome (listen, skip, or favorite) from scaled features. See how Excel visualizes the neighbors with a scatter plot and computes prediction confidence.
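A minimal Python sketch of k-nearest neighbors on two features such as age and income. It assumes min-max scaling (so income does not dominate the distance just because its numbers are larger), Euclidean distance, and one possible tie-break: among tied classes, the one whose neighbors are closer in total wins. The training data are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return a function that rescales each feature to 0-1 using the training min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(row, lows, highs))

    return scale


def knn_predict(train_x, train_y, new_x, k=3):
    """Majority vote of the k nearest training points (Euclidean distance on scaled features)."""
    scale = fit_min_max(train_x)
    target = scale(new_x)
    nearest = sorted(
        (math.dist(scale(x), target), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    # Tie-break (an assumption): the tied class with the smallest total distance wins.
    return min(tied, key=lambda lab: sum(d for d, l in nearest if l == lab))


# Hypothetical customers: (age, income) and whether they purchased.
train_x = [(25, 40_000), (30, 42_000), (45, 90_000), (50, 95_000), (35, 60_000)]
train_y = ["no", "no", "buy", "buy", "no"]
print(knn_predict(train_x, train_y, (48, 88_000), k=3))  # buy
print(knn_predict(train_x, train_y, (27, 41_000), k=3))  # no
```

The share of the k votes won by the predicted class is one simple way to express prediction confidence.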
Train a naive Bayes classifier by building frequency tables for each independent variable and the purchase outcome, then calculate conditional probabilities to predict purchases.
Build intuition for naïve Bayes by deriving conditional probabilities from frequency tables, predicting purchase likelihood for new observations, and embracing computer-assisted, rapid probability calculations.
Explore a Naïve Bayes case study using a small binary dataset to predict purchase probability from three interactions: newsletter, Facebook, and website visits, with frequency tables and conditional probabilities.
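The frequency-table approach can be sketched in Python for binary features like the newsletter, Facebook, and website interactions above. The rows are invented for illustration, and no smoothing is applied, so a feature value never seen with a class drives that class's probability to zero (Laplace smoothing is a common fix, and it also avoids dividing by zero when every class scores zero).

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Build the frequency tables: class counts, plus counts of each (feature value, class) pair."""
    class_counts = Counter(labels)
    freq = defaultdict(Counter)  # freq[feature_index][(value, label)] -> count
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            freq[i][(value, label)] += 1
    return class_counts, freq


def predict_proba(model, row):
    """P(class) times the product of P(feature value | class), normalised to sum to 1."""
    class_counts, freq = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            score *= freq[i][(value, label)] / count
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical rows: (newsletter, facebook, website) -> purchased (1) or not (0)
rows = [(1, 1, 1), (1, 0, 1), (0, 1, 0), (1, 1, 0),
        (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]
labels = [1, 1, 0, 1, 0, 0, 0, 1]
model = train_naive_bayes(rows, labels)
print(predict_proba(model, (1, 1, 1)))  # purchase probability 27/28, about 0.964
```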
Calculate entropy from the class proportions P1 and P2 using log base two, which for two classes traces a curve from 0 (a pure node) to 1 (a 50/50 split), and see how entropy guides splits on churn and login days.
Explore decision trees for churn prediction, covering root, decision, and leaf nodes, information gain, hyperparameters, overfitting, and an introduction to random forests.
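Entropy and information gain for a two-class target such as churn can be computed in a few lines. The split counts below are hypothetical (for example, customers with few versus many login days), and the function names are illustrative.

```python
import math


def entropy(count_a, count_b):
    """Two-class entropy in bits: -(p1*log2(p1) + p2*log2(p2)), with 0*log2(0) treated as 0."""
    total = count_a + count_b
    h = 0.0
    for count in (count_a, count_b):
        p = count / total
        if p > 0:
            h -= p * math.log2(p)
    return h


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes after a split."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(*child) for child in children)
    return entropy(*parent) - weighted


print(entropy(5, 5))    # 1.0, a 50/50 node is maximally mixed
print(entropy(10, 0))   # 0.0, a pure node
# Hypothetical split of 10 customers (5 churned, 5 retained) on login days:
print(round(information_gain((5, 5), [(4, 0), (1, 5)]), 3))
```

A tree-growing step would compute this gain for every candidate split and keep the largest.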
Random forests use random subsets of observations and variables at each split to explore many options. Each tree votes, and the forest prediction is the mode of all tree predictions.
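The two sources of randomness and the final vote can be sketched as follows. This is only the sampling-and-voting skeleton under common conventions (a bootstrap sample of rows per tree, a random subset of variables at each split); the tree-building step itself is omitted, and all names are illustrative.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Sample row indices with replacement: each tree sees a different mix of observations."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(n_features, m, rng):
    """Pick m distinct variables at random; a split may only use these."""
    return rng.sample(range(n_features), m)


def forest_vote(tree_predictions):
    """The forest's prediction is the mode (most common class) of the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(7)
print(bootstrap_rows(8, rng))         # indices may repeat, by design
print(candidate_features(6, 2, rng))  # two of the six variable indices
print(forest_vote(["churn", "stay", "churn", "churn", "stay"]))  # churn
```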
Explore a practical case study on building a simple decision tree to predict paid subscriptions using binary customer features, entropy, and information gain.
Fit the logistic regression S-curve by adjusting beta zero and beta one to maximize the likelihood, so that the predicted probabilities come as close as possible to the actual outcomes in the training data.
Explore logistic regression to predict unsubscribe probability from weekly email frequency, maximizing likelihood with a univariate model, visualizing the curve, and identifying the 50% decision threshold for optimal email cadence.
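The fitting step can be sketched outside Excel. Beta zero and beta one are chosen to maximize the likelihood; the Python below does this with plain gradient ascent, a stand-in for whatever optimizer the workbook uses. The emails-per-week data are invented, and the learning rate and step count are assumptions tuned for this small example.

```python
import math


def sigmoid(z):
    """The logistic S-curve: maps any number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Sum of log P(observed outcome) under the curve p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, steps=20_000):
    """Nudge b0 and b1 uphill along the average log-likelihood gradient."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical data: emails sent per week and whether the subscriber unsubscribed (1).
emails = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
unsubscribed = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(emails, unsubscribed)
threshold = -b0 / b1  # the x where b0 + b1 * x = 0, i.e. predicted probability 0.5
print(round(b0, 2), round(b1, 2), round(threshold, 2))
```

With this symmetric toy data the 50% threshold lands at 3.5 emails per week. This simple approach only settles when the two classes overlap; if they separate perfectly, the likelihood keeps rising as the slope grows and the coefficients never converge.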
Explore sentiment analysis as a classification approach in supervised machine learning, using bag-of-words features, labeled training data, and data preparation to predict emotions, noting word clouds' limitations for market research.
Clean text data for sentiment analysis by removing noise, punctuation and special characters, and stopwords, while applying stemming or lemmatizing and proper encoding to preserve key sentiment words.
Select the best model for a problem and tune hyperparameters to maximize predictive power, address imbalanced classes, and interpret confusion matrices while monitoring drift over time.
Learn how imbalanced classes bias predictive models toward the majority class and balance data with up sampling, down sampling, and weighting for rare event detection, using confusion matrices for evaluation.
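Random up-sampling, the simplest of these balancing options, duplicates randomly chosen rows of the smaller classes until every class matches the majority count; down-sampling is the mirror image, dropping majority rows instead. A minimal sketch with illustrative names and data; balance only the training split, so duplicated rows never leak into the test data.

```python
import random
from collections import Counter


def upsample(rows, labels, rng):
    """Duplicate random rows of each smaller class until all classes match the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        members = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


rows = [(i,) for i in range(10)]   # hypothetical feature rows
labels = [0] * 8 + [1] * 2         # 8 majority rows, 2 rare events
new_rows, new_labels = upsample(rows, labels, random.Random(0))
print(sorted(Counter(new_labels).items()))  # [(0, 8), (1, 8)]
```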
Explore how a confusion matrix compares predicted to actual classes and identifies true positives, true negatives, false positives, and false negatives, linking these counts to accuracy, precision, and recall.
Explore the confusion matrix and define accuracy, precision, and recall, showing how true positives, true negatives, false positives, and false negatives influence model performance.
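The three metrics follow directly from the four counts: accuracy = (TP + TN) / total, precision = TP / (TP + FP), and recall = TP / (TP + FN). A minimal sketch for a binary target, with made-up labels (precision or recall is undefined if its denominator is zero):

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for one chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def scores(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are right
    precision = tp / (tp + fp)                  # of predicted positives, how many are real
    recall = tp / (tp + fn)                     # of real positives, how many were caught
    return accuracy, precision, recall


actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)           # (2, 3, 1, 2)
print(scores(*counts))  # accuracy 0.625, precision about 0.667, recall 0.5
```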
Explore multi-class confusion matrices for predictions across products A, B, C, and D, focusing on the diagonal for accuracy and diagnosing misclassifications between B and C to guide precision, recall, and feature engineering.
Explore multi-class confusion matrices and compute per-class and weighted-average accuracy, precision, and recall to evaluate and compare predictive models.
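For more than two classes, each class gets its own precision and recall (treating it as the positive class and all others as negative), and a weighted average combines them using each class's actual count (support) as the weight. A sketch with hypothetical product labels A to D:

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Precision, recall, and support (actual count) for every class, one-vs-rest."""
    support = Counter(actual)
    metrics = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        predicted_c = sum(p == c for p in predicted)
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / support[c] if support[c] else 0.0
        metrics[c] = (precision, recall, support[c])
    return metrics


def weighted_average(metrics):
    """Average precision and recall across classes, weighted by each class's support."""
    total = sum(s for _, _, s in metrics.values())
    w_precision = sum(p * s for p, _, s in metrics.values()) / total
    w_recall = sum(r * s for _, r, s in metrics.values()) / total
    return w_precision, w_recall


actual    = ["A", "A", "B", "B", "B", "C", "C", "D"]
predicted = ["A", "B", "B", "C", "B", "C", "B", "D"]
m = per_class_metrics(actual, predicted)
print(m["B"])               # precision 0.5, recall 2/3, support 3
print(weighted_average(m))  # weighted precision 0.6875, weighted recall 0.625
```

Support-weighted recall always equals overall accuracy, because each class's recall multiplied by its count is simply its number of correct predictions.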
Train multiple classification models quickly and select the best using context-specific metrics. Evaluate recall, precision, and accuracy via the confusion matrix to choose the most important metric for your challenge.
Model drift degrades predictions over time as the relationships in the data change. Benchmark performance from day one, retrain with newer data, and use feature engineering to counter drift.
Finish part two and advance from supervised machine learning foundations to regression and forecasting, predicting numeric variables with linear regression, intervention analysis, and Markov chains.
This course is for everyday people looking for an intuitive, beginner-friendly introduction to the world of machine learning and data science.
Build confidence with guided, step-by-step demos, and learn foundational skills from the ground up. Instead of memorizing complex math or learning a new coding language, we'll break down and explore machine learning techniques to help you understand exactly how and why they work.
Follow along with simple, visual examples and interact with user-friendly, Excel-based models to learn topics like linear and logistic regression, decision trees, KNN, naïve Bayes, hierarchical clustering, sentiment analysis, and more – without writing a SINGLE LINE of code.
This course combines 4 best-selling courses from Maven Analytics into a single masterclass:
PART 1: Univariate & Multivariate Profiling
PART 2: Classification Modeling
PART 3: Regression & Forecasting
PART 4: Unsupervised Learning
PART 1: Univariate & Multivariate Profiling
In Part 1 we’ll introduce the machine learning workflow and common techniques for cleaning and preparing raw data for analysis. We’ll explore univariate analysis with frequency tables, histograms, kernel densities, and profiling metrics, then dive into multivariate profiling tools like heat maps, violin & box plots, scatter plots, and correlation:
Section 1: Machine Learning Intro & Landscape
Machine learning process, definition, and landscape
Section 2: Preliminary Data QA
Variable types, empty values, range & count calculations, left/right censoring, etc.
Section 3: Univariate Profiling
Histograms, frequency tables, mean, median, mode, variance, skewness, etc.
Section 4: Multivariate Profiling
Violin & box plots, kernel densities, heat maps, correlation, etc.
Throughout the course, we’ll introduce real-world scenarios to solidify key concepts and simulate actual data science and business intelligence cases. You’ll use profiling metrics to clean up product inventory data for a local grocery, explore Olympic athlete demographics with histograms and kernel densities, visualize traffic accident frequency with heat maps, and more.
PART 2: Classification Modeling
In Part 2 we’ll introduce the supervised learning landscape, review the classification workflow, and address key topics like dependent vs. independent variables, feature engineering, data splitting and overfitting. From there we'll review common classification models like K-Nearest Neighbors (KNN), Naïve Bayes, Decision Trees, Random Forests, Logistic Regression and Sentiment Analysis, and share tips for model scoring, selection, and optimization:
Section 1: Intro to Classification
Supervised learning & classification workflow, feature engineering, splitting, overfitting & underfitting
Section 2: Classification Models
K-nearest neighbors, naïve Bayes, decision trees, random forests, logistic regression, sentiment analysis
Section 3: Model Selection & Tuning
Hyperparameter tuning, imbalanced classes, confusion matrices, accuracy, precision & recall, model drift
You’ll help build a simple recommendation engine for Spotify, analyze customer purchase behavior for a retail shop, predict subscriptions for an online travel company, extract sentiment from a sample of book reviews, and more.
PART 3: Regression & Forecasting
In Part 3 we’ll introduce core building blocks like linear relationships and least squared error, and practice applying them to univariate, multivariate, and non-linear regression models. We'll review diagnostic metrics like R-squared, mean error, F-significance, and P-Values, then use time-series forecasting techniques to identify seasonality, predict nonlinear trends, and measure the impact of key business decisions using intervention analysis:
Section 1: Intro to Regression
Supervised learning landscape, regression vs. classification, prediction vs. root-cause analysis
Section 2: Regression Modeling 101
Linear relationships, least squared error, univariate & multivariate regression, nonlinear transformation
Section 3: Model Diagnostics
R-squared, mean error, null hypothesis, F-significance, T & P-values, homoskedasticity, multicollinearity
Section 4: Time-Series Forecasting
Seasonality, autocorrelation, linear trending, non-linear models, intervention analysis
You’ll see how regression analysis can be used to estimate property prices, forecast seasonal trends, predict sales for a new product launch, and even measure the business impact of a new website design.
PART 4: Unsupervised Learning
In Part 4 we’ll explore the differences between supervised and unsupervised machine learning and introduce several common unsupervised techniques, including cluster analysis, association mining, outlier detection and dimensionality reduction. We'll break down each model in simple terms and help you build an intuition for how they work, from K-means and apriori to outlier detection, principal component analysis, and more:
Section 1: Intro to Unsupervised Machine Learning
Unsupervised learning landscape & workflow, common unsupervised techniques, feature engineering
Section 2: Clustering & Segmentation
Clustering basics, K-means, elbow plots, hierarchical clustering, dendrograms
Section 3: Association Mining
Association mining basics, apriori, basket analysis, minimum support thresholds, Markov chains
Section 4: Outlier Detection
Outlier detection basics, cross-sectional outliers, nearest neighbors, time-series outliers, residual distribution
Section 5: Dimensionality Reduction
Dimensionality reduction basics, principal component analysis (PCA), scree plots, advanced techniques
You'll see how K-means can help identify customer segments, how apriori can be used for basket analysis and recommendation engines, and how outlier detection can spot anomalies in cross-sectional or time-series datasets.
__________
Ready to dive in? Join today and get immediate, LIFETIME access to the following:
9+ hours of on-demand video
ML Foundations ebook (350+ pages)
Downloadable Excel project files
Expert Q&A forum
30-day money-back guarantee
If you're an analyst or aspiring data professional looking to build the foundation for a successful career in machine learning or data science, you've come to the right place.
Happy learning!
-Josh & Chris
__________
Looking for our full business intelligence stack? Search for "Maven Analytics" to browse our full course library, including Excel, Power BI, MySQL, Tableau and Machine Learning courses!
See why our courses are among the TOP-RATED on Udemy:
"Some of the BEST courses I've ever taken. I've studied several programming languages, Excel, VBA and web dev, and Maven is among the very best I've seen!" Russ C.
"This is my fourth course from Maven Analytics and my fourth 5-star review, so I'm running out of things to say. I wish Maven was in my life earlier!" Tatsiana M.
"Maven Analytics should become the new standard for all courses taught on Udemy!" Jonah M.