
Statistical Thinking & Data science with R.

Rating: 4.3 (254 reviews)

Master probability, distributions, hypothesis testing, regression & Tidymodels in R. Business-focused. Beginners welcome

Created by Haytham Omar, Ph.D.

Last updated 6/2026

English

What you'll learn

Learn R from scratch: installation, syntax, data structures, data manipulation, and visualisation with ggplot2
Calculate and interpret measures of spread and centrality — and detect and remove outliers from real datasets
Understand probability fundamentals and apply them to real business decision problems
Work with continuous and discrete probability distributions and fit the right distribution to your data
Build business simulations in R to model revenue, risk, and uncertainty for planning decisions
Conduct hypothesis testing and confidence interval analysis to validate business decisions with data
Apply ANOVA to compare means across groups and regression to model relationships between variables
Build and interpret logistic regression models — including log odds, odds ratios, and conversion probabilities
Apply regularised regression (Ridge, Lasso) for feature selection and model performance improvement
Use Bayesian analysis to estimate distributions and quantify uncertainty in business parameters
Measure relative risk and odds ratios for business and clinical decision problems
Build and evaluate machine learning models with Tidymodels, a widely used ML framework in R

Course content

16 sections • 243 lectures • 26h 58m total length

Introduction (3:06)
Explore statistical thinking for data science with R, from data cleaning and visualization to probability, hypothesis testing, simulations, and regression models, including regularized approaches for business analytics.
Get the Best out of this course (2:11)
Curriculum (5:15)
Types of analytics (6:20)
Explore descriptive, predictive, and prescriptive analytics as three interconnected approaches, using descriptive visuals, predictive models like linear and logistic regression, and prescriptive optimization to improve outcomes.
Objectives of data science (4:01)
Learn how data science transforms raw data into information, replacing subjectivity with objectivity, speeding up analytics, and handling structured and unstructured data for machine learning and analytics.
Applications of data science (4:55)
Explore data science applications across marketing, supply chain, and finance, including RFM analysis, churn prediction, forecasting, stock replenishment, and optimization powered by AI, computer vision, and natural language processing.
The data science process (2:18)
Identify problems and goals, acquire and clean structured and unstructured data, integrate sources, then analyze, model, and interpret results, and deploy automated, real-time data science workflows.
Why R (5:38)
Explore why R is recommended for data science, highlighting its rich statistical packages, attractive visualizations, large community, and ease of use for beginners, and compare it with Python's speed and versatility.

Welcome to the World of R! (1:45)
What is the R statistical language? (3:12)
Discover what the R statistical language is (a free software environment for statistical computing and graphics) and its role in data science, with comparisons to Python and its growing community.
How to install R? (4:07)
How to install RStudio? (5:23)
A walk-through tutorial (3:13)
Explore the RStudio workspace, including the console, environment, plot area, and packages, and learn to start a script, run code, and view output from downloaded packages.
Set up your project (3:39)
Create a new project folder for your supply chain data science work, set up a dedicated workspace, and save outputs there to keep data, projects, and analyses organized.
Install packages (11:20)
Summary (1:32)
Install packages with install.packages(), load them with library(), and set up a dedicated project folder with a resource folder for easy data import and export.

Introduction (12:08)
Explore core data structures in R, including vectors, data frames, and lists, and types like strings, numeric, integers, dates, and factors. Learn type conversions, shaping, for loops, and simple functions.
Different data structures and types in R (6:40)
Discover core data structures in R, including vectors, lists, matrices, and data frames, and how observations and attributes guide modeling. Learn the essential data types: character, numeric, integer, factor, and date.
Do arithmetic calculations and write functions in R (12:49)
Practice arithmetic in R, save results as variables, and explore vectors, names, and selective indexing, using a UK online retailer dataset to simulate stocks and forecasting.
Creating a list (7:52)
Create a list in R by combining numeric and character vectors, then access elements with double brackets and named indices, exploring subsetting and extraction techniques.
Importing Data in R and Basic exploration (18:30)
Import diverse data formats into R with the readr package, then explore the retail data by inspecting its structure and names, viewing samples, and summarizing numeric statistics.
Selecting data in a data frame (20:30)
Learn to subset and filter data frames in R by selecting rows and columns, inspecting names and types, and filtering for the country France while addressing negative quantities.
If else function (8:21)
Conditions (6:09)
Functions with Conditions (8:29)
Build a function of x using if-else conditions to classify a person as a child, teenager, or adult, and apply it to many people.
For loops (7:27)
Applying a function inside the loop (3:36)
Apply a function inside a loop to categorize each person by age, iterate through a list of names, and print the age-based categories.
For-loop on a data frame (10:16)
Applying the function on a data frame (7:59)
Apply a country function to every row of a data frame, create a new column, and compare for loops with if-else logic to label United Kingdom or France.
Assignment (1:37)
Assignment Section 4 answer Part 1 (8:53)
Explore the cars dataset by counting observations and features, computing cylinder counts with unique() and table(), and summarizing speed, horsepower, and price, while renaming the name column to car name.
Assignment Section 4 answer part 2 (6:39)
Add a price category to the cars dataset in R (budget under 20,000, suitable under 35,000, otherwise expensive) by subsetting name and price and counting the results.
Summary (4:08)
A review of section four covering R data structures (vectors, lists, matrices, data frames) and data types, with importing, summarizing continuous and categorical variables, subsetting, filtering, and creating new columns.

Intro (8:53)
Examine data types, including categorical and numeric, and distinguish discrete from continuous data while applying measures of central tendency (mean, median, mode) and spread (standard deviation, variance), and the effect of outliers.
Central tendency (6:21)
Explore descriptive statistics and central tendency in statistical thinking and data science with R, including the mean, median, and mode, and how outliers affect the center of a data distribution.
Measures of spread (7:26)
Calculating measures of spread and centrality Part 1 (11:25)
Compare the mean and median to see how outliers shift the average, and compute spread using standard deviation, variance, range, and the 25th percentile in R.
Calculating measures of spread and centrality PART 2 (7:17)
Explore key measures of centrality and spread in data, including mean, median, range, standard deviation, variance, and the coefficient of variation, with applications to demand forecasting in supply chain.
Central tendency assignment
Detecting outliers (4:18)
Identify and interpret outliers in supply chain data using the 1.5 × IQR rule, explain their impact on forecasting and inventory decisions, and explore the causes behind anomalies.
Detecting outliers in R (6:48)

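The 1.5 × IQR rule referred to in the outlier lectures is usually stated as below, with Q1 and Q3 the 25th and 75th percentiles of the data. This is the standard textbook form; the quantile method used in the R lectures may shift the fences slightly.

```latex
\mathrm{IQR} = Q_3 - Q_1,
\qquad
x \text{ is flagged as an outlier if } \;
x < Q_1 - 1.5\,\mathrm{IQR}
\;\text{ or }\;
x > Q_3 + 1.5\,\mathrm{IQR}
```

For example, with hypothetical quartiles Q1 = 40 and Q3 = 60, the IQR is 20, so values below 10 or above 90 would be flagged. In base R, quantile() and IQR() return these quantities.
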
Intro (5:53)
Explore data manipulation with the dplyr package by using verbs like filter, group_by, summarize, arrange, mutate, and slice, and connect them with the pipe to perform chained operations.
Intro to dplyr (5:31)
Investigate with dplyr (20:59)
Explore how to use dplyr to investigate sales by country: group by country, summarize units sold, and sort the results, while filtering out outliers and cleaning the data.
Unique invoices (3:23)
Average basket value per country (10:30)
Average items in an invoice (10:00)
Joining (17:12)
Learn how to join datasets using left and full joins to combine sales, stocks, and production data, aligning records on common keys specified with the by argument.
Changing date time to date (3:33)
Pivot wider (10:01)
Pivot longer (6:52)
Separate and Paste (5:14)
Master separating a column by space into date and time with separate(), then pasting the parts back together, enabling data cleaning and wrangling for supply chain analysis.
Putting it all together (8:42)
Assignment: New York airlines (1:44)
Analyze the 2013 New York flights data package by joining flights with planes, airlines, and weather to identify the most used planes and the most punctual airlines.
Assignment: Question 1 answer (8:15)
Assignment question 2&3 answer (8:21)
Use R to identify the busiest month and compute airline punctuality from aggregate delays and the mean departure and arrival delays.
Assignment question 4, 5, 6 (16:09)
Identify the destination with the longest average air time across origins, and find the airline with the most delays and the carrier with the highest seat capacity using joins and aggregations.
Assignment question 7 (6:31)
Analyze which airplane model and manufacturer are most used by applying joins, group by, and distinct counts, handle missing data, and practice data manipulation in supply chain analytics.
Summary (4:00)
Master data manipulation with the core verbs filter, mutate, arrange, slice, and summarize, using pipes to filter on multiple variables and create summaries, and apply group by, pivot longer, and left join.

Introduction (5:14)
Learn the basics of plotting and the grammar of graphics, including aesthetics and geometries, to create scatterplots, line plots, histograms, and customized visuals with color, size, and shape.
Line plots (22:50)
Create line plots of total sales over time by country in R, using long-format data, coloring by country, and using facet_wrap for separate country trends.
Scatter plots (12:34)
Bar plots (11:55)
Distribution plots (8:48)
Explore distribution plots to visualize continuous data and compare the iris species (virginica, versicolor, and setosa) by length, revealing how their distributions differ and hinting at a simple classification.
Box plots (7:46)
Explore box plots that compare cylinder categories (six and eight) on a continuous y-axis, revealing quartiles, medians, and outliers, and learn to make the visualization interactive.
Histograms (5:43)
Explore histograms in R to characterize juice distributions, adjust bin width and bandwidth, identify outliers, and compare apple, grape, and other juice distributions.
Histograms 2 (5:02)
Learn to arrange multiple histograms together using the grid.arrange() function, compare distributions (normal, non-normal, discrete), and customize colors to visualize data in R.
Assignment (2:33)
Assignment Solution Question 1 and 2 (7:54)
Assignment Solution Part 2 (8:14)
Use scatterplots to relate price, horsepower, and mileage, and visualize iris species distributions; filter cars by cylinder count, highlighting the role of visualization in data science.
Summary (3:53)

Intro (7:16)
Introduce probability and randomness through random experiments, random variables, and sample spaces, with coin flips and the two-child problem. Explain discrete versus continuous distributions and their role in data science.
Probability introduction (6:22)
Explore probability by distinguishing discrete and continuous random variables and defining events by their likelihood. Use the probability mass function and expected value to compute outcomes, with fair and unfair dice.
Variance and standard deviation (3:48)
Learn how to measure dispersion by computing the variance and standard deviation around the mean, weighting squared distances by the probability mass function, and interpret the spread.
Overlapping of probability (5:32)
Discrete and continuous probability
Conditional Probability (3:59)
Explore conditional probability and independent events through practical examples, from a die roll that is both even and divisible by three to conditioning on at least two heads in three coin flips.
Question 1 Probability (5:33)
Explore the probability distribution of X, the sum of two fair dice, by enumerating all 36 outcomes to compute P(X > 7) and P(X is odd).
Question 2 Probability (5:29)
Rolling the dice
Binomial distribution (6:52)
Question 1 Binomial (5:28)
Question 2 Binomial (2:27)
Compute the probability of exactly nine heads in twenty fair coin flips using the binomial formula, multiplying the number of combinations by (0.5)^20.
For looping on a binomial distribution (6:32)
Explore looping on a binomial distribution by summing probabilities with a for loop, storing results in a probability vector, and analyzing outcomes for 20 flips of a fair coin.
Binomial assignment
Poisson Distribution (4:02)
Poisson distribution in R (8:11)
Learn to model World Cup goal counts with the Poisson distribution in R using lambda = 2.5, compute P(X = 5) with the Poisson formula, and visualize the probabilities for 0 to 10 goals.
Continuous Distributions (6:33)
Normal distributions example (8:31)
Compute probabilities for a normal distribution with mean 20 and standard deviation 2, including P(X > 25) and P(18 ≤ X ≤ 23), using the normal CDF, and explore how a simulated histogram converges to the theoretical distribution.
Uniform distribution example (6:05)
Illustrate the uniform distribution in discrete and continuous forms with coins, random number generators, lotteries, and everyday examples, and explain probability and the effect of sample size on the histogram.
Central Limit theorem (3:21)
Explore the central limit theorem: the sum of many independent random variables is approximately normally distributed, which makes simulation straightforward. See how combined binomial outcomes, as in repeated coin flips, come to resemble a normal distribution.
Associations (5:39)
Calculating Relative risk in R (4:54)
Association among numerical variables (4:16)
Correlation Matrix (19:47)
Cause and effect (5:16)
Demonstrate that association is not causation, illustrating how experiments with randomization, control and treatment groups, and double-blind designs establish cause and effect, while confounding variables can obscure the link.
Bayes theory (8:50)
Bayes' theorem uses prior knowledge and conditioning to update the probability of events, illustrated with examples on gender and smoking, two-child scenarios, and applications of Bayes' rule.

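The specific calculations named in the binomial, Poisson, and normal lecture summaries of the probability section can be worked by hand as follows (values rounded to four decimal places; Φ is the standard normal CDF). In R, dbinom(), dpois(), and pnorm() should return the same numbers; the course's own worked solutions may present them differently.

```latex
\begin{aligned}
\text{Binomial } (n = 20,\ p = 0.5):\quad & P(X = 9) = \binom{20}{9}(0.5)^{20} = \frac{167960}{1048576} \approx 0.1602 \\
\text{Poisson } (\lambda = 2.5):\quad & P(X = 5) = \frac{e^{-2.5}\,2.5^{5}}{5!} \approx 0.0668 \\
\text{Normal } (\mu = 20,\ \sigma = 2):\quad & P(X > 25) = 1 - \Phi\!\left(\frac{25 - 20}{2}\right) = 1 - \Phi(2.5) \approx 0.0062 \\
& P(18 \le X \le 23) = \Phi(1.5) - \Phi(-1) \approx 0.9332 - 0.1587 = 0.7745
\end{aligned}
```
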
Distributions Intro (7:35)
Explore how data form distributions around a center, with the normal and standard normal shapes, the central limit theorem, and discrete versus continuous examples like binomial and revenue simulations.
Distribution shapes (8:24)
Explore how distribution shapes affect inventory decisions and service levels, comparing normal and skewed distributions, discrete and continuous types, and models like binomial, gamma, and exponential.
Chi-square Tests (2:54)
Apply the chi-square goodness-of-fit test to assess normality by comparing a sample with a simulated normal reference in Excel, stating null and alternative hypotheses and using a 0.05 significance level.
Chi-square test in Excel (8:21)
Explore apple juice data in Excel by building a histogram and calculating the average and standard deviation. Create a frequency or contingency table with bucketed ranges and determine cumulative observations.
Part 2 (6:55)
Explore using a normal distribution with mean 100 and standard deviation 20 to compute the CDF, PDF, and probabilities, and apply a chi-square test to compare observed and expected counts.
Cover for 90% of distribution (4:49)
Assignment Distribution in Excel (1:02)
Assignment answer: Bike demand (7:38)
Distributions in R (21:04)
Explore fitting distributions in R with the Gammell package, distinguish discrete from continuous data, and compare apple, grape, and cantaloupe samples to estimate 90% coverage.
Assignment (1:04)
Assignment answer (5:27)
Practice using R to fit continuous demand distributions with chi-square tests and estimate a price-demand relationship via linear regression, concluding that demand is normally distributed and that price has a negative effect.

Simulation Intro (4:49)
Simulations (6:00)
Simulate the interaction of discrete, normal, and exponential distributions to estimate daily revenue, exploring arrival patterns, table capacity, and profit through 10,000 simulation runs of a 16-hour day.
Restaurant Example 1 (6:25)
Simulate restaurant arrivals with exponential interarrival times at a given rate, filling an arrivals vector until 960 minutes, and illustrate the cumulative sum and while-loop logic in R.
Customer's number (11:33)
Expected revenue (6:15)
Explore modeling the path from customer arrivals to revenue in R by building an expected revenue vector and simulating revenue with a normal distribution, then computing total expected revenue.
Simulation assignment
Conclusion (6:00)
Learn to build a revenue simulation in R using a function and 10,000 runs to estimate the distribution of daily revenue, which is approximately normal, illustrating the central limit theorem.
Waiting lines (6:35)
Explore how waiting lines are modeled with queuing theory and simulation to assess bank capacity, arrival and service distributions, and performance metrics like utilization, queue length, and waiting time.
Example (1:24)
Explore the M/G/1 queueing model with a single server, exponential interarrival times, and general service times, to assess whether one cashier keeps waiting times under two minutes.
Waiting lines in Excel (9:09)
Waiting lines in R (10:37)
Explore simulating a single-server waiting line in R using exponential arrival times and normal service times, building a waiting time function and looping over 400 events to study queue dynamics.
Simulating waiting lines 400 times (3:22)
Simulate a grocery store waiting-line model 400 times, drawing random arrival and service times inside a for loop, and compare mean waiting times with one and two servers.

Waiting line at a call centre (6:09)
Model a call center queue in R to determine the number of servers required to keep waiting time below five minutes, using an arrival rate of 40 per hour, seven-minute service times, and exponential times in a simulation.
Defining the right K (8:17)
Define the right K by simulating call center staffing from 1 to 20 reps, computing waiting times and utilization, and identifying the rep level that meets the threshold.
Capacity Constraints (7:25)
Explore capacity constraints in a bank operations scenario using exponential arrival and service times, with a 55-customer capacity and a 10-minute waiting-time threshold, analyzed via simulation.
Assignment (1:18)
Apply a for loop to find the optimal number of servers in a bank model with exponential arrivals and a capacity of 30, keeping the mean waiting time under 10 minutes.
Assignment solution (5:29)
Sequential service on one system (4:32)
Link multiple services in a bank to model sequential service on one system, routing 70% of arrivals to tellers and 30% to customer service, with varied service times.
Many Services (5:57)
Model a multi-service bank with registration, tellers, and customer service, with exponential arrivals at 150 per hour routed 65% to tellers and 35% to customer service, to estimate the median waiting time.
Multiple service simulations in R (11:28)
Conclusion (7:00)
Apply a multi-stage queuing model to analyze mean waiting times and service times across registration and other stages, and demonstrate how adding a machine reduces waits and guides service decisions.
Assignment (1:38)
Assignment Solution (9:10)
Model a bank with registration (30 seconds per registration), tellers, and customer service using exponential arrivals at 150 per hour. Split arrivals 55% to tellers and 45% to customer service.
Summary (2:24)

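As a rough check on the call-centre lectures (arrivals at 40 per hour, seven-minute mean service time), the standard stability condition for a multi-server queue gives a lower bound on the number of agents k before any simulation is run. This is a textbook bound rather than the course's method: it only guarantees that the queue does not grow without limit, so keeping waits under five minutes may require more agents, which is what the simulation determines.

```latex
\begin{aligned}
\lambda &= 40 \text{ calls/hour}, \qquad \mu = \frac{60}{7} \approx 8.57 \text{ calls/hour per agent} \\
a &= \frac{\lambda}{\mu} = \frac{40 \times 7}{60} \approx 4.67 \text{ agents busy on average} \\
\rho &= \frac{a}{k} < 1 \;\Longrightarrow\; k \ge 5, \qquad \rho\big|_{k=5} \approx 0.93
\end{aligned}
```
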
Requirements

No R experience required — the opening sections teach R from absolute scratch, including installation, syntax, and data manipulation.
No prior statistics or probability knowledge required — every concept is introduced from first principles before any R code is written.
Basic numeracy and comfort working with data (spreadsheets, simple formulas) is helpful but not essential.
A computer capable of running R and RStudio (both free) — installation walkthroughs are provided at the start of the course.
Tidyverse and Tidymodels are free, open-source R packages — installation is guided within the relevant modules.

Description

★ Taught by a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux

Haytham’s doctoral research was in forecasting and statistical modelling. He also applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France, and that applied experience informs how the material in this course is sequenced and explained.

★ Each method is introduced through an applied business question before the underlying theory

Whether a campaign had a genuine effect, whether a change in sales is meaningful or within normal variation, what the probability of a given outcome is — these are the framing questions for hypothesis testing, ANOVA, regression, and Bayesian inference. The statistical reasoning is taught in service of these questions rather than as an abstract sequence of topics.

★ 27 hours across 16 sections — broader in scope than most introductory R statistics courses

The curriculum includes topics that are frequently omitted from shorter courses: queuing theory, multinomial logistic regression, K-means clustering with silhouette analysis, model stacking with Tidymodels, step AIC for model selection, and Fisher’s exact test. Each is treated with the same care given to the more commonly taught material.

★ Simulation coverage extends beyond Monte Carlo to queuing theory

Most business statistics courses that cover simulation stop at Monte Carlo revenue estimation. This course also covers queuing theory: waiting line analysis, determining the appropriate number of service channels, capacity constraints, and multi-server systems such as call centres. These methods are common in operations research but are not often taught alongside introductory statistics.

★ Concludes with a complete machine learning pipeline in Tidymodels

The course progresses from K-means clustering and decision trees through logistic classification with confusion matrices and ROC curves, to a full Tidymodels workflow including cross-validation, hyperparameter tuning, and model stacking. The aim is a working, reproducible ML pipeline rather than an isolated example.

COURSE DESCRIPTION

This course covers the statistical methods that support evidence-based business decisions. Questions such as whether an observed difference is statistically meaningful, or what the probability of a given outcome is, recur across marketing, operations, and finance. The course addresses these questions directly, using R as the analytical tool throughout.

No prior experience with R or statistics is assumed. The course progresses from R fundamentals and probability theory through distributions, Monte Carlo and queuing simulation, hypothesis testing, ANOVA, linear and logistic regression, Ridge and Lasso regularisation, Bayesian inference, K-means clustering, and a complete Tidymodels machine learning pipeline. Across 16 sections and 243 lectures, each method is introduced alongside the type of business question it is typically used to answer.

The instructor holds a Ph.D. in forecasting and statistical modelling from the University of Bordeaux and applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France. Over 18,300 students have enrolled in the course to date.

WHAT MAKES THIS COURSE DIFFERENT

Applied framing throughout

Each statistical method is paired with the type of business question it is commonly used to answer.

No prerequisites

R is installed and introduced in the first section. No prior programming or statistics background is required.

Extends to applied machine learning

The course concludes with cross-validated, tuned, and stacked models built with Tidymodels.

WHAT YOU WILL LEARN

✓ Learn R from scratch: installation, syntax, data structures, manipulation with dplyr and tidyr, and visualisation with ggplot2

✓ Calculate measures of spread and centrality, detect outliers statistically, and investigate data with dplyr on real invoice and airline datasets

✓ Master probability fundamentals: conditional probability, Bayes’ theorem, relative risk, odds and odds ratio, and correlation matrices

✓ Work with discrete and continuous distributions: Binomial, Poisson, Normal, Uniform — and apply the Central Limit Theorem

✓ Build Monte Carlo revenue simulations and queuing theory models for waiting lines, call centres, and multi-server service systems in R

✓ Apply the full hypothesis testing toolkit: t-tests (one- and two-sample), chi-square, Fisher’s exact test, and tests for association, plus Bayesian inference

✓ Apply one-way and two-way ANOVA with Tukey HSD post-hoc tests to compare group means in real business data

✓ Build and evaluate linear and multiple regression models: cleaning, EDA, feature importance, step AIC selection, and model comparison with ANOVA

✓ Build logistic and multinomial logistic regression models with odds ratios, log odds, confusion matrices, and ROC curves for classification

✓ Apply Ridge and Lasso regularisation with cross-validation: multicollinearity, encoding, non-zero coefficient selection, and prediction

✓ Apply K-means clustering with silhouette analysis and interactive 3D visualisation, and decision trees for supervised classification

✓ Build complete Tidymodels ML pipelines: time series features, recipes, model workflows, resampling, metrics, model stacking, and future prediction

COURSE CONTENT — 16 SECTIONS · 243 LECTURES · 27 HOURS · 79 DOWNLOADABLE RESOURCES

PART 1 — R FOUNDATIONS

SECTION 1

Introduction to data science and R

Set the context for the entire course. Understand the types of analytics (descriptive, predictive, prescriptive), the objectives and applications of data science, and the data science process. Learn why R is the language of choice for statistical analysis and business modelling. Install R and RStudio, set up your project, and install packages.

SECTION 2

R programming fundamentals

Learn R from scratch with a business analytics mindset. Work with different data structures and types, perform arithmetic and write functions, create lists, import and explore data, select from data frames, apply if-else logic, write functions with conditions, and build for loops. Includes a two-part graded assignment.

SECTION 3

Data manipulation with dplyr

dplyr plays the role in R that pandas plays in Python. Apply dplyr to real invoice data: investigate patterns, calculate unique invoices, average basket values per country, average items per invoice, join datasets, handle datetimes, and use pivot wider, pivot longer, separate, and paste. Includes the New York Airlines graded assignment: seven questions on a real airline dataset.

Tools: R, Tidyverse

SECTION 4

Visualisation with ggplot2

Data visualisation is not decoration — it is analysis. Build line plots, scatter plots, bar plots, distribution plots, box plots, and histograms in ggplot2, applied to real business datasets. Graded two-part assignment.

Tools: R, ggplot2

PART 2 — PROBABILITY & DISTRIBUTIONS

SECTION 5

Probability, distributions, associations, and Bayes

Probability is the language of uncertainty. Cover probability fundamentals, variance and standard deviation, overlapping events, discrete vs continuous probability, conditional probability, Binomial and Poisson distributions with for-loop automation, continuous distributions (Normal and Uniform), the Central Limit Theorem, relative risk, associations, correlation matrices, and Bayes’ theorem — all introduced through business questions. Includes distribution shapes, chi-square tests in Excel, bike demand assignment, and distributions in R.

Tools: R, Excel

PART 3 — SIMULATION

SECTION 6

Monte Carlo business simulation

Stop estimating with a single number. Build Monte Carlo simulations in R: restaurant revenue example, modelling customer numbers, calculating expected revenue, and simulation assignment. It is a technique many business analysts have not used, yet it is directly useful for strategic planning.

SECTION 7

Queuing theory and waiting line simulation

The second simulation module — and the one most statistics courses omit entirely. Apply queuing theory to real service operations: waiting line fundamentals, worked examples in Excel and R, simulating 400 waiting line scenarios, call centre modelling, defining the optimal number of service channels (K), capacity constraints, sequential single-system service, multiple parallel services, and full multiple-service simulation in R. Includes a graded assignment.

Tools: R, Excel

PART 4 — STATISTICAL INFERENCE

SECTION 8

Hypothesis testing, Bayesian inference, and ANOVA

The core of applied statistics. Build the full hypothesis testing toolkit: sampling, t-tests (one and two sample), non-normality handling, chi-square test for independence, Fisher’s exact test, UK drivers case study, tests for association, hypothesis test for binomial distributions, Bayesian inference, posterior estimation, odds and odds ratio. Then apply ANOVA: one-way, two-way, Tukey HSD post-hoc tests, and interpretation. Quiz on regression and ANOVA.

PART 5 — REGRESSION MODELS

SECTION 9

Linear and multiple regression

Model relationships between business variables. Learn linear regression in Excel and R. Clean data and perform EDA on a housing dataset. Build one-variable and multiple regression models. Add interaction terms. Compare models with ANOVA. Perform further data analysis, regress on all variables, measure feature importance, and apply step AIC for model selection.

Tools: R, Excel

SECTION 10

Logistic and multinomial logistic regression

When the outcome is binary or multi-class. Build logistic regression from first principles: city vs price per square foot, predicting individual observations, odds and probability, fitting all variables, understanding multiple predictors, testing categorical variables, comparing three models, log odds of categorical variables, and log odds interpretation. Extend to multinomial logistic regression, prediction, testing socioeconomic status, and model improvement. Quiz included.

SECTION 11

Ridge and Lasso regularisation

Ordinary least squares becomes unstable on messy real data with highly correlated (multicollinear) predictors. Apply regularised regression: understand the loss function, detect multicollinearity, split the data, encode categorical variables, train and cross-validate Ridge regression, visualise Ridge coefficients, apply Lasso with minimum mean squared error, cross-validate Lasso, build a model matrix for logistic Lasso, identify non-zero coefficients, and interpret Lasso feature selection.

PART 6 — MACHINE LEARNING

SECTION 12

Unsupervised and supervised machine learning in R

Build the machine learning toolkit from first principles. Decision tree demo and theory. K-means clustering with the elbow method (total within-cluster sum of squares), silhouette analysis, and interactive 3D scatter plot visualisation. An overview of forecasting with ML. Supervised decision tree learning and model comparison. Graded assignment.

SECTION 13

Classification models: logistic regression, decision trees, and model evaluation

Apply supervised learning to a real classification dataset. Explore and orient to the data. Build a correlation matrix. Split into training and testing sets, control the fitting process, apply logistic regression classification, extract probabilities, build and read a confusion matrix, plot and interpret an ROC curve, build a decision tree classifier, compare all models, and draw conclusions. Graded assignment.

SECTION 14

Tidymodels: the complete ML workflow

The capstone section. Apply Tidymodels to a full time series machine learning problem. Convert data to tsibble, generate time series features, handle missing values by level, split and log-transform the data, build recipes for feature engineering, define model workflows, resample with cross-validation, collect and compare accuracy metrics, stack models for improved performance, predict and visualise the future. This section is the professional-grade ML endpoint of the entire statistical journey.

Tools: R, Tidymodels, Tidyverse

THIS COURSE IS NOT FOR YOU IF...

✗ You want a Python statistics course — this course uses R throughout; separate Python analytics courses are available in the instructor’s catalogue

✗ You are looking for a pure data engineering or SQL course — this course focuses on statistical analysis and machine learning, not data infrastructure or pipelines

✗ You need deep learning or neural networks — this course covers traditional machine learning with Tidymodels; deep learning is a separate specialisation

✗ You want a maths-heavy academic statistics course — this course prioritises business intuition and practical R application over formal mathematical proofs

YOUR INSTRUCTOR

Haytham Omar, Ph.D.

Supply Chain & Business Intelligence Consultant · Developer · Trainer — UAE & France · Co-Founder, Keip

Haytham holds a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux and a Master of Science in Global Supply Chain Management from Bordeaux École de Management. His doctoral research was in forecasting and statistical modelling.

He is an active consultant who works with retailers and supply chain organisations including Sephora France and Sharaf Group Dubai, and is co-founder of Keip, a SaaS platform for retail and supply chain management. He has trained over 70,000 professionals across more than 70 workshops in the UAE, with additional clients including Aster Group, DNO, PWC Training Academy Dubai, Qarar, and the Higher College of Technology.

He's also the creator of the Inventorize package for R and Python, used by over 90,000 supply chain professionals worldwide. This course is the statistical foundation underneath everything else he teaches — forecasting, inventory, retail analytics, supply chain design. It's the one to start with.

A complete, applied foundation in statistics and machine learning with R.

16 sections · 27 hours · 243 lectures · R from zero · Probability to ML · Ph.D. instructor

Who this course is for:

Supply chain and operations professionals
Data science career changers
Marketing and commercial analysts
Business analysts and BI professionals
Finance and risk professionals
Students and early-career analysts

Course content by section

Introduction: 8 lectures • 34 min
Installing R and RStudio: 8 lectures • 34 min
R fundamentals: 17 lectures • 2 hr 32 min
Descriptive statistics: 7 lectures • 52 min
Data cleaning and manipulation: 18 lectures • 2 hr 33 min
Visualization: 12 lectures • 1 hr 42 min
Probabilities: 23 lectures • 2 hr 25 min
Fitting Distributions: 11 lectures • 1 hr 15 min
Simulations: 11 lectures • 1 hr 12 min
Simulation with Capacity Constraints: 12 lectures • 1 hr 11 min