
Explore statistical thinking for data science with R, from data cleaning and visualization to probability, hypothesis testing, simulations, and regression models, including regularized approaches for business analytics.
Explore descriptive, predictive, and prescriptive analytics as three interconnected approaches, using descriptive visuals, predictive models like linear and logistic regression, and prescriptive optimization to improve outcomes.
Learn how data science transforms raw data into information, replacing subjectivity with objectivity, speeding analytics, and handling structured and unstructured data for machine learning and analytics.
Explore data science applications across marketing, supply chain, and finance, including RFM analysis, churn prediction, forecasting, stock replenishment, and optimization powered by AI, computer vision, and natural language processing.
Identify problems and goals, acquire and clean structured and unstructured data, integrate sources, then analyze, model, and interpret results, deploying automated, real-time data science workflows.
Explore why the lecture advocates using R for data science, highlighting its rich statistical packages, beautiful visualizations, large community, and ease for beginners, while comparing Python's speed and versatility.
Discover what the R statistical language is—a free software environment for statistical computing and graphics—and its role in data science, with comparisons to Python and its growing community.
Explore the RStudio workspace, including the console, environment, plot area, and packages, and learn to start a script, run code, and view output from downloaded packages.
Create a new project folder for your supply chain data science work, set up a dedicated workspace, and save outputs there to keep data, projects, and analyses organized.
Install packages with install.packages, load them with library, and set up a dedicated project folder with a resource folder for easy data import and export.
Explore core data structures in R, including vectors, data frames, and lists, and types like strings, numeric, integers, dates, and factors. Learn type conversions, shaping, for loops, and simple functions.
Discover core data structures in R, including vectors, lists, matrices, and data frames, with observations and attributes guiding modeling. Learn essential data types: character, numeric, integer, factor, and dates.
Practice arithmetic in R, save results as variables, and explore vectors, names, and selective indexing, using a UK online retailer data set to simulate stocks and forecasting.
Create a list in R by combining numeric and character vectors, then access elements with double brackets and named indices, exploring subset and extraction techniques.
Import diverse data formats into R with the readr package, then explore the retail data by inspecting its structure and column names, viewing samples, and summarizing numeric statistics.
Learn to subset and filter data frames in R by selecting rows and columns, inspecting names and types, and filtering for the country France while addressing negative quantities.
Build a function of x using if-else conditions to classify a person as child, teenager, or adult, and apply it to many people.
Apply a function inside a loop to categorize each person by age, iterate through a list of names, and print the age-based categories.
Apply a country function to every row of a data frame, create a new column, and compare for loops with if-else logic to label United Kingdom or France.
Explore the cars dataset by counting observations and features, computing cylinder counts with unique and table, and summarizing speed, horsepower, and price, while renaming the name column to car name.
Add a price category to the cars dataset in R (budget under 20000, suitable under 35000, otherwise expensive) by subsetting name and price, then count the results.
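As a rough illustration of the price bands described above, here is a minimal sketch in base R. The data frame `cars_df`, its column names, and the prices are made up for illustration and are not the course's dataset.

```r
# Hypothetical stand-in for the cars data; names and prices are invented.
cars_df <- data.frame(
  car_name = c("car_a", "car_b", "car_c", "car_d"),
  price    = c(15000, 20000, 34999, 50000)
)

# Nested ifelse() is vectorised, so every row is labelled in one pass.
cars_df$price_category <- ifelse(
  cars_df$price < 20000, "budget",
  ifelse(cars_df$price < 35000, "suitable", "expensive")
)

# Count how many cars fall into each category.
category_counts <- table(cars_df$price_category)
print(category_counts)
```

A price of exactly 20000 lands in "suitable" here because the first condition uses a strict less-than.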
Review of section four covers R data structures (vectors, lists, matrices, data frames) and data types, with importing, summarizing continuous and categorical variables, subsetting, filtering, and creating new columns.
Examine data types, including categorical and numeric, and distinguish discrete from continuous data while applying central tendency measures (mean, median, mode) and spread (standard deviation, variance) with outliers.
Explore descriptive statistics and central tendency in statistical thinking and data science with R, including the mean, median, and mode, and how outliers affect the center of a data distribution.
Compare mean and median to see how outliers shift the average, and compute spread using standard deviation, variance, range, and the 25th percentile in R.
Explore key measures of centrality and spread in data, including mean, median, range, standard deviation, variance, and the coefficient of variation, with applications to demand forecasting in supply chain.
Identify and interpret outliers in supply chain data using the 1.5 × IQR rule, explain their impact on forecasting and inventory decisions, and explore causes behind anomalies.
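The 1.5 × IQR rule flags any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. A minimal sketch, assuming a made-up vector of daily demand (the numbers are illustrative, not course data):

```r
# Invented daily demand figures; 95 is a deliberately extreme value.
demand <- c(12, 15, 14, 13, 16, 15, 14, 95, 13, 12)

# Quartiles and interquartile range (R's default quantile type).
q1  <- quantile(demand, 0.25, names = FALSE)
q3  <- quantile(demand, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences of the 1.5 x IQR rule.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- demand[demand < lower_fence | demand > upper_fence]
print(outliers)
```

Whether a flagged value is an error or a genuine demand spike (a promotion, for instance) still has to be judged from context.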
Explore data manipulation with the dplyr package by using verbs like filter, group_by, summarize, arrange, mutate, and slice, and connect them with the pipe to perform chained operations.
Explore how to use dplyr to investigate sales by country, group by country, summarize units sold, and sort results, while filtering out outliers and cleaning data.
Learn how to join data sets using left and full joins to combine sales, stocks, and production data, using common keys and the by argument to align records.
Master separating a column by space into date and time with separate, then paste the parts back, enabling data cleaning and wrangling for supply chain analysis.
Analyze the 2013 New York flights data (the nycflights13 package) by joining flights with planes, airlines, and weather to identify the most used planes and the most punctual airlines.
Use R to identify the busiest month and compute airline punctuality by aggregate delay and mean of departure/arrival delays.
Identify the destination with the longest average air time across origins, and find the airline with the most delays and the carrier with highest seat capacity using joins and aggregations.
Analyze which airplane model and manufacturer are most used by applying joins, group by, and distinct counts, handle missing data, and practice data manipulation in supply chain analytics.
Master data manipulation with core verbs filter, mutate, arrange, slice, and summarize, using pipes to filter multiple variables and create summaries; apply group by, pivot longer, and left join.
Learn the basics of plotting and the plot grammar, including aesthetics and geometry, to create scatterplots, lines, histograms, and customized visuals with color, size, and shape.
Create line plots of total sales over time by country in R, using long data, color by country, and facet wrap for separate country trends.
Explore distribution plots to visualize continuous data and compare iris species—virginica, versicolor, and setosa—by length, revealing how distributions differ and hinting at simple classification.
Explore box plots that compare cylinder categories (six and eight) on a continuous y axis, revealing quartiles, medians, and outliers while learning to make the visualization interactive.
Explore histograms in R to characterize juice distributions, adjust bin width and bandwidth, identify outliers, and compare apple, grape, and other juice distributions.
Learn to arrange multiple histograms together using the grid.arrange function, compare distributions (normal, not normal, discrete), and customize colors to visualize data in R.
Use scatterplots to relate price, horsepower, and mileage, and visualize iris species distributions; filter cars by cylinder count, highlighting the role of visualization in data science.
Introduce probability and randomness through random experiments, random variables, and sample spaces, with coin flips and the two-child problem. Explain discrete versus continuous distributions and their role in data science.
Explore probability by distinguishing discrete and continuous random variables and defining events by likelihood. Use the probability mass function and expected value to compute outcomes, with fair and unfair dice.
Learn how to measure data dispersion by computing variance and standard deviation from the mean, using squared distances multiplied by the probability mass function, and interpret the spread.
Explore conditional probability and independent events through practical examples, from a die showing even and divisible by three to conditioning on at least two heads in three coin flips.
Explore the probability distribution of X, the sum of two fair dice, by enumerating all 36 outcomes to compute P(X > 7) and P(X is odd).
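One way to sketch this enumeration in base R is to build all 36 sums with `outer()` and take proportions. The variable names are illustrative only.

```r
faces <- 1:6

# All 36 equally likely sums of two fair dice.
sums <- as.vector(outer(faces, faces, "+"))

# With equally likely outcomes, a probability is a proportion of outcomes.
p_greater_than_7 <- mean(sums > 7)        # 15 of 36 outcomes
p_odd            <- mean(sums %% 2 == 1)  # 18 of 36 outcomes

print(c(p_greater_than_7 = p_greater_than_7, p_odd = p_odd))
```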
Compute the probability of exactly nine heads in twenty fair coin flips using the binomial formula, multiplying the number of combinations by (0.5)^20.
Explore looping on a binomial distribution by summing probabilities with a for loop, storing results in a probability vector, and analyzing outcomes for 20 flips with a fair coin.
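The two binomial lectures above can be sketched together: the nine-heads probability from the formula, then a for loop that fills a probability vector for 0 to 20 heads. This is an illustrative sketch with made-up variable names, checked against R's built-in `dbinom()`.

```r
n_flips <- 20
p_heads <- 0.5

# Exactly nine heads: number of combinations times (0.5)^20.
p_nine_manual  <- choose(n_flips, 9) * p_heads^n_flips
p_nine_builtin <- dbinom(9, size = n_flips, prob = p_heads)

# Fill a probability vector for 0..20 heads with a for loop.
prob_vector <- numeric(n_flips + 1)
for (k in 0:n_flips) {
  # R vectors are 1-indexed, so k heads is stored at position k + 1.
  prob_vector[k + 1] <- choose(n_flips, k) * p_heads^k * (1 - p_heads)^(n_flips - k)
}

# The probabilities over all possible outcomes should sum to 1.
total_probability <- sum(prob_vector)
print(round(p_nine_manual, 4))
```

The nine-heads probability comes out at roughly 0.16, since choose(20, 9) is 167960 and 2^20 is 1048576.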
Learn to model World Cup goal counts with the Poisson distribution in R using lambda = 2.5, compute P(X=5) with the Poisson formula, and visualize probabilities from 0 to 10.
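A minimal sketch of the Poisson calculation with lambda = 2.5, writing the formula out by hand and comparing it with `dpois()`. Variable names are illustrative; the plotting step is left out to keep the sketch short.

```r
lambda <- 2.5

# Poisson formula: P(X = k) = exp(-lambda) * lambda^k / k!
p_five_manual  <- exp(-lambda) * lambda^5 / factorial(5)
p_five_builtin <- dpois(5, lambda)

# Probabilities for 0 to 10 goals, ready to be plotted.
goals <- 0:10
goal_probs <- data.frame(goals = goals, probability = dpois(goals, lambda))
print(goal_probs)
```

Passing `goal_probs` to a bar chart (for example with ggplot2) gives the kind of visual the lecture describes.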
Compute probabilities for a normal distribution with mean 20 and standard deviation 2, including P(X > 25) and P(18 ≤ X ≤ 23) using the normal CDF; explore simulation and histogram convergence.
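Both probabilities follow from the normal CDF, which R exposes as `pnorm()`. The sketch below also draws a large simulated sample to show the empirical proportion approaching the exact value; the seed and sample size are arbitrary choices.

```r
mu    <- 20
sigma <- 2

# P(X > 25): upper tail of the CDF.
p_above_25 <- pnorm(25, mean = mu, sd = sigma, lower.tail = FALSE)

# P(18 <= X <= 23): difference of two CDF values.
p_between <- pnorm(23, mean = mu, sd = sigma) - pnorm(18, mean = mu, sd = sigma)

# Simulation check: the share of simulated draws in the interval
# should be close to the exact probability for a large sample.
set.seed(1)  # arbitrary seed, only for reproducibility
draws <- rnorm(100000, mean = mu, sd = sigma)
p_between_simulated <- mean(draws >= 18 & draws <= 23)

print(round(c(p_above_25, p_between, p_between_simulated), 4))
```

In standard units these are P(Z > 2.5) and P(-1 ≤ Z ≤ 1.5), about 0.006 and 0.775 respectively.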
Illustrate the uniform distribution in discrete and continuous forms with coins, random generators, lotteries, and everyday encounters, and explain probability and the effect of sample size on the histogram.
Explore the central limit theorem: the sum of many independent random variables is approximately normally distributed, which makes simulations easier; see how binomial outcomes, such as counts of heads in coin flips, resemble a normal distribution.
Learn why association is not causation through experiments with randomization, control and treatment groups, and double-blind designs that establish cause and effect where confounding variables would otherwise obscure the link.
Bayes' theorem uses prior knowledge and conditioning to update the probability of events, illustrated by examples on gender and smoking, two-child scenarios, and applications of Bayes' rule.
Explore how data form distributions around a center, with the normal and standard normal shapes, the central limit theorem, and discrete versus continuous examples like binomial and revenue simulations.
Explore how distribution shapes affect inventory decisions and service levels, comparing normal and skewed distributions, discrete and continuous types, and models like binomial, gamma, and exponential.
Apply the chi-square test to assess normality by comparing a sample with a simulated normal reference in Excel, using null and alternative hypotheses and p-values around 0.05.
Explore apple juice data in Excel by building a histogram and calculating the average and standard deviation. Create a frequency or contingency table with bucketed ranges and determine cumulative observations.
Explore using a normal distribution with mean 100 and standard deviation 20 to compute the CDF, PDF, and probabilities, and apply a chi-square test to validate observed versus expected counts.
Explore fitting distributions in R with the gamlss package, distinguish discrete from continuous data, and compare apple, grape, and cantaloupe samples to estimate 90% coverage.
Practice using R to fit continuous demand distributions with chi-square tests and estimate a price-demand relationship via linear regression, concluding normally distributed demand and a negative price effect.
Simulate interactions of discrete, normal, and exponential distributions to estimate daily revenue, exploring arrival patterns, table capacity, and profit through 10,000 simulation runs over a 16-hour day.
Simulate restaurant arrivals with exponential interarrival times at a given rate, filling an arrivals vector until 960 minutes, and illustrate the cumulative sum and while loop logic in R.
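The arrival logic can be sketched in two equivalent ways: a while loop that keeps adding exponential gaps until the 960-minute horizon is passed, and a vectorised version based on `cumsum()`. The arrival rate below (one customer every two minutes on average) is a made-up assumption, not a figure from the course.

```r
set.seed(42)              # arbitrary seed, only for reproducibility
rate_per_minute <- 0.5    # hypothetical rate: mean gap of 2 minutes
horizon_minutes <- 960    # 16-hour day

# While-loop version: advance the clock one exponential gap at a time.
arrivals <- numeric(0)
clock <- rexp(1, rate_per_minute)
while (clock <= horizon_minutes) {
  arrivals <- c(arrivals, clock)
  clock <- clock + rexp(1, rate_per_minute)
}

# cumsum() version: draw more gaps than needed, accumulate, then truncate.
gaps <- rexp(2000, rate_per_minute)
arrivals_vectorised <- cumsum(gaps)
arrivals_vectorised <- arrivals_vectorised[arrivals_vectorised <= horizon_minutes]

print(c(loop = length(arrivals), vectorised = length(arrivals_vectorised)))
```

The two versions use different random draws, so their counts differ slightly, but both should be near rate × horizon, which is 480 under these assumptions.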
Explore modeling customer arrivals to revenue in R by building an expected revenue vector and simulating revenue with a normal distribution, then computing total expected revenue.
Learn to build a revenue simulation in R using a function and 10,000 runs to estimate daily revenue and the normal distribution, illustrating the central limit theorem.
Explore how waiting lines are modeled with queuing theory and simulation to assess bank capacity, arrival and service distributions, and performance metrics like utilization, queue length, and waiting time.
Explore the M/G/1 queueing model with a single server, exponential interarrival times, and general service times, to assess if one cashier keeps waiting times under two minutes.
Explore simulating a single-server waiting line in R using arrival (exponential) and service (normal) times, building a waiting time function and looping 400 events to study queue dynamics.
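A single-server waiting line like this one can be sketched with the standard recursion: each customer waits for whatever remains of the previous customer's wait plus service once the gap between their arrivals has elapsed. The function name, parameter values, and the truncation of negative service times at zero are all assumptions made for illustration, not the course's own code.

```r
# Simulate waiting times (in minutes) for a single-server queue.
# All default parameter values are hypothetical.
simulate_waits <- function(n_customers = 400,
                           arrival_rate = 1 / 3,  # mean gap of 3 minutes
                           service_mean = 2,
                           service_sd = 0.5) {
  # Gap between each customer and the one before (exponential).
  interarrival <- rexp(n_customers, arrival_rate)
  # Normal service times; negative draws are truncated at zero.
  service <- pmax(rnorm(n_customers, service_mean, service_sd), 0)

  # The first customer never waits; later customers wait for any
  # leftover work from the customer immediately ahead of them.
  wait <- numeric(n_customers)
  for (i in seq_len(n_customers)[-1]) {
    wait[i] <- max(0, wait[i - 1] + service[i - 1] - interarrival[i])
  }
  wait
}

set.seed(123)  # arbitrary seed, only for reproducibility
waits <- simulate_waits()
mean_wait <- mean(waits)
print(round(mean_wait, 2))
```

Calling `simulate_waits()` repeatedly, or with a higher `arrival_rate`, shows how quickly the mean wait grows as the server approaches full utilization.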
Explore simulating a grocery store waiting model with 400 runs, adding random arrival and service times inside a for loop, and compare mean waiting times with one and two servers.
Model a call center queue in R to determine required servers for keeping waiting time below five minutes, using arrival rate 40/hour, seven-minute service, and exponential times in a simulation.
Define the right K by simulating various call center rep counts from 1 to 20, computing waiting times and utilization, then identify the rep level that meets the threshold.
Explore capacity constraints in a bank operations scenario using exponential arrival and service times, with a 55-customer capacity and a 10-minute waiting-time threshold, analyzed via simulation.
Apply a for loop to find the optimal number of servers in a bank model with exponential arrivals and a capacity of 30, keeping mean waiting time under 10 minutes.
Link multiple services in a bank to model sequential service on one system, routing arrivals 70% to teller and 30% to customer service, with varied service times.
Model a multi-service bank with registering, tellers, and customer service; exponential arrivals at 150 per hour, 65% to tellers, 35% to customer service, to estimate the median waiting time.
Apply a multi-stage queuing model to analyze mean waiting times and service times across registration and other stages, and demonstrate how adding a machine reduces waits and guides service decisions.
Model a bank with registration (30 seconds per registration), tellers, and customer service using exponential arrivals at 150 per hour. Split arrivals 55% to tellers and 45% to customer service.
★ Taught by a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux
Haytham’s doctoral research was in forecasting and statistical modelling. He also applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France, and that applied experience informs how the material in this course is sequenced and explained.
★ Each method is introduced through an applied business question before the underlying theory
Whether a campaign had a genuine effect, whether a change in sales is meaningful or within normal variation, what the probability of a given outcome is — these are the framing questions for hypothesis testing, ANOVA, regression, and Bayesian inference. The statistical reasoning is taught in service of these questions rather than as an abstract sequence of topics.
★ 27 hours across 16 sections — broader in scope than most introductory R statistics courses
The curriculum includes topics that are frequently omitted from shorter courses: queuing theory, multinomial logistic regression, K-means clustering with silhouette analysis, model stacking with Tidymodels, step AIC for model selection, and Fisher’s exact test. Each is treated with the same care given to the more commonly taught material.
★ Simulation coverage extends beyond Monte Carlo to queuing theory
Most business statistics courses that cover simulation stop at Monte Carlo revenue estimation. This course also covers queuing theory: waiting line analysis, determining the appropriate number of service channels, capacity constraints, and multi-server systems such as call centres. These methods are common in operations research but are not often taught alongside introductory statistics.
★ Concludes with a complete machine learning pipeline in Tidymodels
The course progresses from K-means clustering and decision trees through logistic classification with confusion matrices and ROC curves, to a full Tidymodels workflow including cross-validation, hyperparameter tuning, and model stacking. The aim is a working, reproducible ML pipeline rather than an isolated example.
COURSE DESCRIPTION
This course covers the statistical methods that support evidence-based business decisions. Questions such as whether an observed difference is statistically meaningful, or what the probability of a given outcome is, recur across marketing, operations, and finance. The course addresses these questions directly, using R as the analytical tool throughout.
No prior experience with R or statistics is assumed. The course progresses from R fundamentals and probability theory through distributions, Monte Carlo and queuing simulation, hypothesis testing, ANOVA, linear and logistic regression, Ridge and Lasso regularisation, Bayesian inference, K-means clustering, and a complete Tidymodels machine learning pipeline. Across 16 sections and 243 lectures, each method is introduced alongside the type of business question it is typically used to answer.
The instructor holds a Ph.D. in forecasting and statistical modelling from the University of Bordeaux and applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France. Over 18,300 students have enrolled in the course to date.
WHAT MAKES THIS COURSE DIFFERENT
[ BIZ ]
Applied framing throughout
Each statistical method is paired with the type of business question it is commonly used to answer.
[ A→Z ]
No prerequisites
R is installed and introduced in the first section. No prior programming or statistics background is required.
[ ML ]
Extends to applied machine learning
The course concludes with cross-validated, tuned, and stacked models built with Tidymodels.
TOOLS AND TECHNOLOGIES COVERED
R / RStudio | Tidyverse | Tidymodels | ggplot2 | dplyr / tidyr | Excel
WHAT YOU WILL LEARN
✓ Learn R from scratch: installation, syntax, data structures, manipulation with dplyr and tidyr, and visualisation with ggplot2
✓ Calculate measures of spread and centrality, detect outliers statistically, and investigate data with dplyr on real invoice and airline datasets
✓ Master probability fundamentals: conditional probability, Bayes’ theorem, relative risk, odds and odds ratio, and correlation matrices
✓ Work with discrete and continuous distributions: Binomial, Poisson, Normal, Uniform — and apply the Central Limit Theorem
✓ Build Monte Carlo revenue simulations and queuing theory models for waiting lines, call centres, and multi-server service systems in R
✓ Conduct the full hypothesis testing toolkit: t-tests (one and two sample), chi-square, Fisher’s exact test, tests for association, and Bayesian inference
✓ Apply one-way and two-way ANOVA with Tukey HSD post-hoc tests to compare group means in real business data
✓ Build and evaluate linear and multiple regression models: cleaning, EDA, feature importance, step AIC selection, and model comparison with ANOVA
✓ Build logistic and multinomial logistic regression models with odds ratios, log odds, confusion matrices, and ROC curves for classification
✓ Apply Ridge and Lasso regularisation with cross-validation: multi-collinearity, encoding, non-zero coefficient selection, and prediction
✓ Apply K-means clustering with silhouette analysis and interactive 3D visualisation, and decision trees for supervised classification
✓ Build complete Tidymodels ML pipelines: time series features, recipes, model workflows, resampling, metrics, model stacking, and future prediction
COURSE CONTENT — 16 SECTIONS · 243 LECTURES · 27 HOURS · 79 DOWNLOADABLE RESOURCES
PART 1 — R FOUNDATIONS
SECTION 1
Introduction to data science and R
Set the context for the entire course. Understand the types of analytics (descriptive, predictive, prescriptive), the objectives and applications of data science, and the data science process. Learn why R is the language of choice for statistical analysis and business modelling. Install R and RStudio, set up your project, and install packages.
R
SECTION 2
R programming fundamentals
Build R from scratch with a business analytics mindset. Work with different data structures and types, perform arithmetic and write functions, create lists, import and explore data, select from dataframes, apply if-else logic, write functions with conditions, and build for loops. Includes a two-part graded assignment.
R
SECTION 3
Data manipulation with dplyr
The pandas of R. Apply dplyr to real invoice data: investigate patterns, calculate unique invoices, average basket values per country, average items per invoice, join datasets, handle datetime, and use pivot wider, pivot longer, separate, and paste. Includes the New York Airlines graded assignment — seven questions on a real airline dataset.
R Tidyverse
SECTION 4
Visualisation with ggplot2
Data visualisation is not decoration — it is analysis. Build line plots, scatter plots, bar plots, distribution plots, box plots, and histograms in ggplot2, applied to real business datasets. Graded two-part assignment.
R ggplot2
PART 2 — PROBABILITY & DISTRIBUTIONS
SECTION 5
Probability, distributions, associations, and Bayes
Probability is the language of uncertainty. Cover probability fundamentals, variance and standard deviation, overlapping events, discrete vs continuous probability, conditional probability, Binomial and Poisson distributions with for-loop automation, continuous distributions (Normal and Uniform), the Central Limit Theorem, relative risk, associations, correlation matrices, and Bayes’ theorem — all introduced through business questions. Includes distribution shapes, chi-square tests in Excel, bike demand assignment, and distributions in R.
R Excel
PART 3 — SIMULATION
SECTION 6
Monte Carlo business simulation
Stop estimating with a single number. Build Monte Carlo simulations in R: restaurant revenue example, modelling customer numbers, calculating expected revenue, and simulation assignment. A tool most business analysts have never used but every strategic planner needs.
R
SECTION 7
Queuing theory and waiting line simulation
The second simulation module — and the one most statistics courses omit entirely. Apply queuing theory to real service operations: waiting line fundamentals, worked examples in Excel and R, simulating 400 waiting line scenarios, call centre modelling, defining the optimal number of service channels (K), capacity constraints, sequential single-system service, multiple parallel services, and full multiple-service simulation in R. Includes graded assignment.
R Excel
PART 4 — STATISTICAL INFERENCE
SECTION 8
Hypothesis testing, Bayesian inference, and ANOVA
The core of applied statistics. Build the full hypothesis testing toolkit: sampling, t-tests (one and two sample), non-normality handling, chi-square test for independence, Fisher’s exact test, UK drivers case study, tests for association, hypothesis test for binomial distributions, Bayesian inference, posterior estimation, odds and odds ratio. Then apply ANOVA: one-way, two-way, Tukey HSD post-hoc tests, and interpretation. Quiz on regression and ANOVA.
R
PART 5 — REGRESSION MODELS
SECTION 9
Linear and multiple regression
Model relationships between business variables. Introduce linear regression in Excel and R. Clean data and perform EDA on a housing dataset. Build one-variable and multiple regression models. Apply model interaction terms. Compare models with ANOVA. Perform further data analysis, regress all variables, measure feature importance, and apply step AIC for model selection.
R Excel
SECTION 10
Logistic and multinomial logistic regression
When the outcome is binary or multi-class. Build logistic regression from first principles: city vs price per square foot, predicting individual observations, odds and probability, fitting all variables, understanding multiple predictors, testing categorical variables, comparing three models, log odds of categorical variables, and log odds interpretation. Extend to multinomial logistic regression, prediction, testing socioeconomic status, and model improvement. Quiz included.
R
SECTION 11
Ridge and Lasso regularisation
Standard regression fails on messy, multi-collinear real data. Apply regularised regression: understand the loss function, detect multi-collinearity, split data, encode categorical variables, train and cross-validate Ridge regression, visualise Ridge coefficients, apply Lasso with minimum squared error, cross-validate Lasso, build model matrix for logistic Lasso, identify non-zero coefficients, and interpret Lasso feature selection.
R
PART 6 — MACHINE LEARNING
SECTION 12
Unsupervised and supervised machine learning in R
Build the machine learning toolkit from first principles. Decision tree demo and theory. K-means clustering with elbow method (total sum of squares), silhouette analysis, and interactive 3D scatter plot visualisation. Forecasting with ML overview. Supervised decision tree learning and model comparison. Graded assignment.
R
SECTION 13
Classification models: logistic regression, decision trees, and model evaluation
Apply supervised learning to a real classification dataset. Explore and orient to the data. Build a correlation matrix. Split into training and testing sets, control the fitting process, apply logistic regression classification, extract probabilities, build and read a confusion matrix, plot and interpret an ROC curve, build a decision tree classifier, compare all models, and draw conclusions. Graded assignment.
R
SECTION 14
Tidymodels: the complete ML workflow
The capstone section. Apply Tidymodels to a full time series machine learning problem. Convert data to tsibble, generate time series features, handle missing values by level, split and log-transform the data, build recipes for feature engineering, define model workflows, resample with cross-validation, collect and compare accuracy metrics, stack models for improved performance, predict and visualise the future. This section is the professional-grade ML endpoint of the entire statistical journey.
R Tidymodels Tidyverse
THIS COURSE IS NOT FOR YOU IF...
✗ You want a Python statistics course — this course uses R throughout; separate Python analytics courses are available in the instructor’s catalogue
✗ You are looking for a pure data engineering or SQL course — this course focuses on statistical analysis and machine learning, not data infrastructure or pipelines
✗ You need deep learning or neural networks — this course covers traditional machine learning with Tidymodels; deep learning is a separate specialisation
✗ You want a maths-heavy academic statistics course — this course prioritises business intuition and practical R application over formal mathematical proofs
REQUIREMENTS
● No R experience required — Sections 1 to 3 teach R from scratch, including installation, syntax, data structures, and data manipulation.
● No prior statistics or probability knowledge required — every concept is introduced from first principles before any R code is written.
● Basic numeracy and comfort working with data — spreadsheets and simple formulas — is helpful but not essential.
● A computer capable of running R and RStudio (both free) — full installation walkthroughs are provided at the start of the course.
● Tidyverse and Tidymodels are free, open-source R packages — installation is guided within the relevant sections.
YOUR INSTRUCTOR
Haytham Omar, Ph.D.
Supply Chain & Business Intelligence Consultant · Developer · Trainer — UAE & France · Co-Founder, Keip
Haytham holds a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux and a Master of Science in Global Supply Chain Management from Bordeaux École de Management. His doctoral research was in forecasting and statistical modelling.
He is an active consultant who works with retailers and supply chain organisations including Sephora France and Sharaf Group Dubai, and is co-founder of Keip, a SaaS platform for retail and supply chain management. He has trained over 70,000 professionals across more than 70 workshops in the UAE, with additional clients including Aster Group, DNO, PWC Training Academy Dubai, Qarar, and the Higher College of Technology.
He's also the creator of the Inventorize package for R and Python, used by over 90,000 supply chain professionals worldwide. This course is the statistical foundation underneath everything else he teaches — forecasting, inventory, retail analytics, supply chain design. It's the one to start with.
A complete, applied foundation in statistics and machine learning with R.
16 sections · 27 hours · 243 lectures · R from zero · Probability to ML · Ph.D. instructor