Udemy
Statistical Thinking & Data science with R.
Rating: 4.3 out of 5 (254 ratings)
18,327 students

Master probability, distributions, hypothesis testing, regression & Tidymodels in R. Business-focused. Beginners welcome.
Last updated 6/2026
English

What you'll learn

  • Learn R from scratch: installation, syntax, data structures, data manipulation, and visualisation with ggplot2
  • Calculate and interpret measures of spread and centrality — and detect and remove outliers from real datasets
  • Understand probability fundamentals and apply them to real business decision problems
  • Work with continuous and discrete probability distributions and fit the right distribution to your data
  • Build business simulations in R to model revenue, risk, and uncertainty for planning decisions
  • Conduct hypothesis testing and confidence interval analysis to validate business decisions with data
  • Apply ANOVA to compare means across groups and regression to model relationships between variables
  • Build and interpret logistic regression models — including log odds, odds ratios, and conversion probabilities
  • Apply regularised regression (Ridge, Lasso) for feature selection and model performance improvement
  • Use Bayesian analysis to estimate distributions and quantify uncertainty in business parameters
  • Measure relative risk and odds ratios for business and clinical decision problems
  • Build and evaluate machine learning models with Tidymodels — the standard ML framework in R

Course content

16 sections • 243 lectures • 26h 58m total length
  • Introduction (3:06)

    Explore statistical thinking for data science with R, from data cleaning and visualization to probability, hypothesis testing, simulations, and regression models, including regularized approaches for business analytics.

  • Get the best out of this course (2:11)
  • Curriculum (5:15)
  • Types of analytics (6:20)

    Explore descriptive, predictive, and prescriptive analytics as three interconnected approaches, using descriptive visuals, predictive models like linear and logistic regression, and prescriptive optimization to improve outcomes.

  • Objectives of data science (4:01)

    Learn how data science transforms raw data into information, replacing subjectivity with objectivity, speeding analytics, and handling structured and unstructured data for machine learning and analytics.

  • Applications of data science (4:55)

    Explore data science applications across marketing, supply chain, and finance, including RFM analysis, churn prediction, forecasting, stock replenishment, and optimization powered by AI, computer vision, and natural language processing.

  • The data science process (2:18)

    Identify problems and goals, acquire and clean structured and unstructured data, integrate sources, then analyze, model, and interpret results, deploying automated, real-time data science workflows.

  • Why R (5:38)

    Explore the lecture's case for using R in data science: its rich statistical packages, polished visualizations, large community, and ease for beginners, set against Python's speed and versatility.

Requirements

  • No R experience required: Sections 1 to 3 teach R from absolute scratch, including installation, syntax, and data manipulation.
  • No prior statistics or probability knowledge required — every concept is introduced from first principles before any R code is written.
  • Basic numeracy and comfort working with data (spreadsheets, simple formulas) is helpful but not essential.
  • A computer capable of running R and RStudio (both free) — installation walkthroughs are provided at the start of the course.
  • Tidyverse and Tidymodels are free, open-source R packages; installation is guided within the relevant sections.

Description


★ Taught by a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux

Haytham’s doctoral research was in forecasting and statistical modelling. He also applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France, and that applied experience informs how the material in this course is sequenced and explained.


★ Each method is introduced through an applied business question before the underlying theory

Whether a campaign had a genuine effect, whether a change in sales is meaningful or within normal variation, what the probability of a given outcome is — these are the framing questions for hypothesis testing, ANOVA, regression, and Bayesian inference. The statistical reasoning is taught in service of these questions rather than as an abstract sequence of topics.


★ 27 hours across 16 sections — broader in scope than most introductory R statistics courses

The curriculum includes topics that are frequently omitted from shorter courses: queuing theory, multinomial logistic regression, K-means clustering with silhouette analysis, model stacking with Tidymodels, step AIC for model selection, and Fisher’s exact test. Each is treated with the same care given to the more commonly taught material.


★ Simulation coverage extends beyond Monte Carlo to queuing theory

Most business statistics courses that cover simulation stop at Monte Carlo revenue estimation. This course also covers queuing theory: waiting line analysis, determining the appropriate number of service channels, capacity constraints, and multi-server systems such as call centres. These methods are common in operations research but are not often taught alongside introductory statistics.


★ Concludes with a complete machine learning pipeline in Tidymodels

The course progresses from K-means clustering and decision trees through logistic classification with confusion matrices and ROC curves, to a full Tidymodels workflow including cross-validation, hyperparameter tuning, and model stacking. The aim is a working, reproducible ML pipeline rather than an isolated example.





COURSE DESCRIPTION

This course covers the statistical methods that support evidence-based business decisions. Questions such as whether an observed difference is statistically meaningful, or what the probability of a given outcome is, recur across marketing, operations, and finance. The course addresses these questions directly, using R as the analytical tool throughout.

No prior experience with R or statistics is assumed. The course progresses from R fundamentals and probability theory through distributions, Monte Carlo and queuing simulation, hypothesis testing, ANOVA, linear and logistic regression, Ridge and Lasso regularisation, Bayesian inference, K-means clustering, and a complete Tidymodels machine learning pipeline. Across 16 sections and 243 lectures, each method is introduced alongside the type of business question it is typically used to answer.

The instructor holds a Ph.D. in forecasting and statistical modelling from the University of Bordeaux and applies these methods professionally as a consultant to retail and supply chain organisations across the UAE and France. Over 18,300 students have enrolled in the course to date.



WHAT MAKES THIS COURSE DIFFERENT


[ BIZ ]

Applied framing throughout

Each statistical method is paired with the type of business question it is commonly used to answer.


[ A→Z ]

No prerequisites

R is installed and introduced in the first section. No prior programming or statistics background is required.


[ ML ]

Extends to applied machine learning

The course concludes with cross-validated, tuned, and stacked models built with Tidymodels.



TOOLS AND TECHNOLOGIES COVERED

R / RStudio | Tidyverse | Tidymodels | ggplot2 | dplyr / tidyr | Excel



WHAT YOU WILL LEARN

✓ Learn R from scratch: installation, syntax, data structures, manipulation with dplyr and tidyr, and visualisation with ggplot2

✓ Calculate measures of spread and centrality, detect outliers statistically, and investigate data with dplyr on real invoice and airline datasets

✓ Master probability fundamentals: conditional probability, Bayes’ theorem, relative risk, odds and odds ratio, and correlation matrices

✓ Work with discrete and continuous distributions: Binomial, Poisson, Normal, Uniform — and apply the Central Limit Theorem

✓ Build Monte Carlo revenue simulations and queuing theory models for waiting lines, call centres, and multi-server service systems in R

✓ Apply the full hypothesis testing toolkit: t-tests (one and two sample), chi-square, Fisher’s exact test, tests for association, and Bayesian inference

✓ Apply one-way and two-way ANOVA with Tukey HSD post-hoc tests to compare group means in real business data

✓ Build and evaluate linear and multiple regression models: cleaning, EDA, feature importance, step AIC selection, and model comparison with ANOVA

✓ Build logistic and multinomial logistic regression models with odds ratios, log odds, confusion matrices, and ROC curves for classification

✓ Apply Ridge and Lasso regularisation with cross-validation: multi-collinearity, encoding, non-zero coefficient selection, and prediction

✓ Apply K-means clustering with silhouette analysis and interactive 3D visualisation, and decision trees for supervised classification

✓ Build complete Tidymodels ML pipelines: time series features, recipes, model workflows, resampling, metrics, model stacking, and future prediction



COURSE CONTENT — 16 SECTIONS · 243 LECTURES · 27 HOURS · 79 DOWNLOADABLE RESOURCES


PART 1 — R FOUNDATIONS

SECTION 1

Introduction to data science and R

Set the context for the entire course. Understand the types of analytics (descriptive, predictive, prescriptive), the objectives and applications of data science, and the data science process. Learn why R is the language of choice for statistical analysis and business modelling. Install R and RStudio, set up your project, and install packages.

R


SECTION 2

R programming fundamentals

Learn R from scratch with a business analytics mindset. Work with different data structures and types, perform arithmetic and write functions, create lists, import and explore data, select from dataframes, apply if-else logic, write functions with conditions, and build for loops. Includes a two-part graded assignment.

R


SECTION 3

Data manipulation with dplyr

The pandas of R. Apply dplyr to real invoice data: investigate patterns, calculate unique invoices, average basket values per country, average items per invoice, join datasets, handle datetime, and use pivot wider, pivot longer, separate, and paste. Includes the New York Airlines graded assignment — seven questions on a real airline dataset.

R Tidyverse


SECTION 4

Visualisation with ggplot2

Data visualisation is not decoration — it is analysis. Build line plots, scatter plots, bar plots, distribution plots, box plots, and histograms in ggplot2, applied to real business datasets. Graded two-part assignment.

R ggplot2


PART 2 — PROBABILITY & DISTRIBUTIONS

SECTION 5

Probability, distributions, associations, and Bayes

Probability is the language of uncertainty. Cover probability fundamentals, variance and standard deviation, overlapping events, discrete vs continuous probability, conditional probability, Binomial and Poisson distributions with for-loop automation, continuous distributions (Normal and Uniform), the Central Limit Theorem, relative risk, associations, correlation matrices, and Bayes’ theorem — all introduced through business questions. Includes distribution shapes, chi-square tests in Excel, bike demand assignment, and distributions in R.

R Excel


PART 3 — SIMULATION

SECTION 6

Monte Carlo business simulation

Stop estimating with a single number. Build Monte Carlo simulations in R: restaurant revenue example, modelling customer numbers, calculating expected revenue, and simulation assignment. A tool most business analysts have never used but every strategic planner needs.

R


SECTION 7

Queuing theory and waiting line simulation

The second simulation module — and the one most statistics courses omit entirely. Apply queuing theory to real service operations: waiting line fundamentals, worked examples in Excel and R, simulating 400 waiting line scenarios, call centre modelling, defining the optimal number of service channels (K), capacity constraints, sequential single-system service, multiple parallel services, and full multiple-service simulation in R. Includes graded assignment.

R Excel


PART 4 — STATISTICAL INFERENCE

SECTION 8

Hypothesis testing, Bayesian inference, and ANOVA

The core of applied statistics. Build the full hypothesis testing toolkit: sampling, t-tests (one and two sample), non-normality handling, chi-square test for independence, Fisher’s exact test, UK drivers case study, tests for association, hypothesis test for binomial distributions, Bayesian inference, posterior estimation, odds and odds ratio. Then apply ANOVA: one-way, two-way, Tukey HSD post-hoc tests, and interpretation. Quiz on regression and ANOVA.

R


PART 5 — REGRESSION MODELS

SECTION 9

Linear and multiple regression

Model relationships between business variables. Introduce linear regression in Excel and R. Clean data and perform EDA on a housing dataset. Build one-variable and multiple regression models. Apply model interaction terms. Compare models with ANOVA. Perform further data analysis, regress all variables, measure feature importance, and apply step AIC for model selection.

R Excel


SECTION 10

Logistic and multinomial logistic regression

When the outcome is binary or multi-class. Build logistic regression from first principles: city vs price per square foot, predicting individual observations, odds and probability, fitting all variables, understanding multiple predictors, testing categorical variables, comparing three models, log odds of categorical variables, and log odds interpretation. Extend to multinomial logistic regression, prediction, testing socioeconomic status, and model improvement. Quiz included.

R


SECTION 11

Ridge and Lasso regularisation

Standard regression fails on messy, multi-collinear real data. Apply regularised regression: understand the loss function, detect multi-collinearity, split data, encode categorical variables, train and cross-validate Ridge regression, visualise Ridge coefficients, apply Lasso with minimum squared error, cross-validate Lasso, build model matrix for logistic Lasso, identify non-zero coefficients, and interpret Lasso feature selection.

R


PART 6 — MACHINE LEARNING

SECTION 12

Unsupervised and supervised machine learning in R

Build the machine learning toolkit from first principles. Decision tree demo and theory. K-means clustering with the elbow method (total within-cluster sum of squares), silhouette analysis, and interactive 3D scatter plot visualisation. Forecasting with ML overview. Supervised decision tree learning and model comparison. Graded assignment.

R


SECTION 13

Classification models: logistic regression, decision trees, and model evaluation

Apply supervised learning to a real classification dataset. Explore and orient to the data. Build a correlation matrix. Split into training and testing sets, control the fitting process, apply logistic regression classification, extract probabilities, build and read a confusion matrix, plot and interpret an ROC curve, build a decision tree classifier, compare all models, and draw conclusions. Graded assignment.

R


SECTION 14

Tidymodels: the complete ML workflow

The capstone section. Apply Tidymodels to a full time series machine learning problem. Convert data to tsibble, generate time series features, handle missing values by level, split and log-transform the data, build recipes for feature engineering, define model workflows, resample with cross-validation, collect and compare accuracy metrics, stack models for improved performance, predict and visualise the future. This section is the professional-grade ML endpoint of the entire statistical journey.

R Tidymodels Tidyverse



THIS COURSE IS NOT FOR YOU IF...

✗ You want a Python statistics course — this course uses R throughout; separate Python analytics courses are available in the instructor’s catalogue

✗ You are looking for a pure data engineering or SQL course — this course focuses on statistical analysis and machine learning, not data infrastructure or pipelines

✗ You need deep learning or neural networks — this course covers traditional machine learning with Tidymodels; deep learning is a separate specialisation

✗ You want a maths-heavy academic statistics course — this course prioritises business intuition and practical R application over formal mathematical proofs


REQUIREMENTS

● No R experience required: Sections 1 to 3 teach R from absolute scratch, including installation, syntax, data structures, and data manipulation.

● No prior statistics or probability knowledge required — every concept is introduced from first principles before any R code is written.

● Basic numeracy and comfort working with data — spreadsheets and simple formulas — is helpful but not essential.

● A computer capable of running R and RStudio (both free) — full installation walkthroughs are provided at the start of the course.

● Tidyverse and Tidymodels are free, open-source R packages — installation is guided within the relevant sections.


YOUR INSTRUCTOR


Haytham Omar, Ph.D.

Supply Chain & Business Intelligence Consultant · Developer · Trainer — UAE & France · Co-Founder, Keip

Haytham holds a Ph.D. in Supply Chain and Forecasting from the University of Bordeaux and a Master of Science in Global Supply Chain Management from Bordeaux École de Management. His doctoral research was in forecasting and statistical modelling.

He is an active consultant who works with retailers and supply chain organisations including Sephora France and Sharaf Group Dubai, and is co-founder of Keip, a SaaS platform for retail and supply chain management. He has trained over 70,000 professionals across more than 70 workshops in the UAE, with additional clients including Aster Group, DNO, PWC Training Academy Dubai, Qarar, and the Higher College of Technology.

He's also the creator of the Inventorize package for R and Python, used by over 90,000 supply chain professionals worldwide. This course is the statistical foundation underneath everything else he teaches — forecasting, inventory, retail analytics, supply chain design. It's the one to start with.


A complete, applied foundation in statistics and machine learning with R.

16 sections · 27 hours · 243 lectures · R from zero · Probability to ML · Ph.D. instructor

Who this course is for:

  • Supply chain and operations professionals
  • Data science career changers
  • Marketing and commercial analysts
  • Business analysts and BI professionals
  • Finance and risk professionals
  • Students and early-career analysts