
Explore the theoretical foundations of decision trees, including bagging and boosting, through a hands-on classification exercise using a heart attack dataset and the recursive partitioning algorithm.
Learn how to build a decision tree by filtering predictors such as chest pain and exercise-induced angina, building contingency tables for heart attack predictions, and evaluating impurity to select the root node.
Build a decision tree by evaluating Gini impurity for chest pain and exercise induced angina, selecting blocked arteries as the left node and comparing weighted impurities for prediction.
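The weighted Gini comparison described here can be sketched in base R. This is a minimal illustration, not the course's own script: the counts in both contingency tables are made up, and the helper names gini and weighted_gini are hypothetical.

```r
# Gini impurity of one node, given its class counts.
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Weighted Gini impurity of a split: each child node's impurity is weighted
# by the share of observations that fall into it.
weighted_gini <- function(tab) {
  node_sizes <- rowSums(tab)
  node_gini <- apply(tab, 1, gini)
  sum(node_sizes / sum(node_sizes) * node_gini)
}

# Hypothetical contingency tables (rows = predictor value, columns = outcome).
chest_pain <- matrix(c(30, 10, 10, 30), nrow = 2, byrow = TRUE,
                     dimnames = list(chest_pain = c("yes", "no"),
                                     heart_attack = c("yes", "no")))
angina <- matrix(c(25, 15, 15, 25), nrow = 2, byrow = TRUE,
                 dimnames = list(angina = c("yes", "no"),
                                 heart_attack = c("yes", "no")))

impurities <- c(chest_pain = weighted_gini(chest_pain),
                angina = weighted_gini(angina))
print(impurities)

# The predictor with the lowest weighted impurity is the candidate for the split.
best <- names(which.min(impurities))
print(best)
```

With these made-up counts, chest pain has the lower weighted impurity, so it would be chosen over angina at this node.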
Set the working directory to the folder that contains the CSV file, import the data into R with a semicolon delimiter, and convert the first column to row names via a data frame operation.
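A minimal sketch of this import step follows. The file name, column names, and values are hypothetical, and the file is written to a temporary folder only so that the example runs on its own.

```r
# Write a tiny semicolon-delimited file so the example is self-contained.
# The file name and columns are hypothetical.
csv_path <- file.path(tempdir(), "heart_example.csv")
writeLines(c("id;age;chest_pain;heart_attack",
             "p1;63;1;1",
             "p2;45;0;0",
             "p3;58;1;0"), csv_path)

# In a real session you would first point R at the folder holding the file,
# for example setwd("path/to/folder"), and then read the file by name.
heart <- read.csv(csv_path, sep = ";", header = TRUE)

# Move the first column into the row names and drop it from the data.
rownames(heart) <- heart[[1]]
heart <- heart[, -1]

print(heart)
```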
Learn how to change the data type of variables in R, converting numeric categories to factors so decision trees treat them as categories rather than numbers, using as.factor.
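The conversion might look like the sketch below, assuming a small made-up data frame in which chest_pain and heart_attack are categories stored as numbers.

```r
# Hypothetical data: chest_pain and heart_attack are categories coded as numbers.
heart <- data.frame(age = c(63, 45, 58, 50),
                    chest_pain = c(1, 0, 1, 2),
                    heart_attack = c(1, 0, 0, 1))

str(heart)  # the coded columns start out as numeric

# Convert the coded columns to factors so tree functions treat them as categories.
heart$chest_pain <- as.factor(heart$chest_pain)
heart$heart_attack <- as.factor(heart$heart_attack)

str(heart)
levels(heart$chest_pain)
```

Truly numeric variables such as age are left unchanged, so a tree can still split them on thresholds.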
Split data into training and testing sets to evaluate predictive models, using a random seed and an 80 percent training and 20 percent testing split with the caret package.
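One way to sketch the split is shown below. The data are simulated and the seed value is arbitrary. The code uses createDataPartition when caret is installed and otherwise falls back to a plain random sample, which is a substitute and not a stratified split.

```r
set.seed(123)  # any fixed seed makes the split reproducible

# Hypothetical data set with a binary outcome.
dat <- data.frame(x = rnorm(100),
                  y = factor(rep(c("no", "yes"), each = 50)))

if (requireNamespace("caret", quietly = TRUE)) {
  # createDataPartition samples within each class of y (stratified split).
  train_idx <- as.vector(caret::createDataPartition(dat$y, p = 0.8, list = FALSE))
} else {
  # Fallback without caret: a simple random 80 percent sample (not stratified).
  train_idx <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
}

train_set <- dat[train_idx, ]
test_set <- dat[-train_idx, ]

c(train = nrow(train_set), test = nrow(test_set))
```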
Learn to estimate and interpret the area under the ROC curve (AUC) for binary classification in R, assess model discrimination, and compare methods across thresholds.
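To make the meaning of AUC concrete, here is a hand-rolled base R sketch using the rank (Mann-Whitney) formulation. The course may rely on a dedicated package instead; the function name auc and the example scores are hypothetical.

```r
# AUC as the probability that a randomly chosen positive case receives a higher
# score than a randomly chosen negative case (ties count one half).
auc <- function(labels, scores) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  r <- rank(c(pos, neg))  # average ranks handle ties
  (sum(r[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2) /
    (length(pos) * length(neg))
}

# Made-up labels (1 = event) and predicted scores.
labels <- c(1, 1, 1, 0, 0, 0)
scores <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.2)
auc_value <- auc(labels, scores)
print(auc_value)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1 to perfect separation.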
Explore how a random forest improves model accuracy by combining many decision trees through bagging. Learn how to create a random forest, an ensemble algorithm built from multiple trees.
Build a classification random forest in R with the randomForest package, evaluating importance and proximity, and interpreting variable importance via mean decrease accuracy and Gini impurity.
Learn how gradient boosting turns weak learners, such as decision trees, into a strong model through sequential weighted training, and how it compares with random forests.
Artificial neural networks form the basis for deep learning, using feedforward architectures with input, hidden, and output layers to learn non-linear mappings via activation functions.
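The forward pass of such a network can be sketched in a few lines of base R. This is an illustration only: the layer sizes, the sigmoid activation, and the random, untrained weights are all assumptions.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass: input layer (2 features) -> hidden layer (3 units) -> output (1 unit).
forward <- function(x, W1, b1, W2, b2) {
  hidden <- sigmoid(W1 %*% x + b1)  # non-linear activation of the hidden layer
  sigmoid(W2 %*% hidden + b2)       # output squashed into (0, 1)
}

set.seed(1)
W1 <- matrix(rnorm(6), nrow = 3, ncol = 2)  # hypothetical, untrained weights
b1 <- rnorm(3)
W2 <- matrix(rnorm(3), nrow = 1, ncol = 3)
b2 <- rnorm(1)

x <- c(0.5, -1.2)
out <- forward(x, W1, b1, W2, b2)
print(out)
```

Without the activation function, the two layers would collapse into a single linear map, which is why the non-linearity matters.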
Explore recurrent neural networks and how looping the hidden state captures sequential information and word dependencies, with parameter sharing across time steps.
This session introduces convolutional neural networks, their automatic filter learning, spatial feature extraction, and parameter sharing, with applications in image and video processing and a teaching case on credit cards.
Identify missing data by inspecting the dataset in Excel. Convert the start date from character to Date in R using as.Date with a day/month/year format, so that missing values can be handled before training the neural network.
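A minimal sketch of the date conversion, assuming made-up text dates in day/month/year order with one empty entry:

```r
# Hypothetical start dates stored as text in day/month/year order;
# one entry is empty, as it might be in the raw file.
start_date <- c("15/03/2019", "01/12/2020", "", "28/02/2021")

parsed <- as.Date(start_date, format = "%d/%m/%Y")

class(parsed)
is.na(parsed)       # the empty string becomes NA
sum(is.na(parsed))  # number of missing dates
```

Once the column is a Date, entries that could not be parsed show up as NA and can be counted or filtered like any other missing value.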
Compute the correlation matrix to identify multicollinearity and use qualitative analysis to decide which variables to keep, with examples like time of employment, down payment fraction, and region or branch.
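The screening step might be sketched as follows. The variables are simulated, with loan_amount deliberately built to track income, and the 0.8 cutoff is an arbitrary choice for illustration. Deciding which variable of a flagged pair to keep remains a judgment call.

```r
set.seed(42)
n <- 200

# Hypothetical numeric predictors; loan_amount is built to track income closely.
income <- rnorm(n, mean = 50, sd = 10)
loan_amount <- 3 * income + rnorm(n, sd = 2)
employment_years <- rnorm(n, mean = 8, sd = 3)
dat <- data.frame(income, loan_amount, employment_years)

cor_mat <- cor(dat)
print(round(cor_mat, 2))

# List variable pairs whose absolute correlation exceeds a chosen cutoff.
cutoff <- 0.8  # an arbitrary threshold for illustration
high <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
pairs_tbl <- data.frame(var1 = rownames(cor_mat)[high[, "row"]],
                        var2 = colnames(cor_mat)[high[, "col"]],
                        r = cor_mat[high])
print(pairs_tbl)
```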
Set a seed and install the caret package, then split the dataset with createDataPartition for an 80/20 training/testing split to train and evaluate the model.
Complete your foundations in neural networks for business analytics, then apply your skills to create predictive models with machine learning tools and add value to your professional career.
Do you want to build predictive models with machine learning—and actually understand what’s happening under the hood?
Welcome to “Decision Trees, Random Forests, and Gradient Boosting in R.” This is a hands-on, learning-by-doing course where you’ll work with real datasets and build models step by step, using the most important tree-based methods in applied machine learning.
I’m Carlos Martínez (Ph.D., University of St. Gallen). I designed this course to be practical, structured, and rigorous, so you can go beyond “running code” and gain the judgment you need to build, tune, and evaluate models properly.
What you’ll learn
By the end of the course, you’ll be able to:
Understand how recursive partitioning works (the logic behind decision trees)
Build trees in R using rpart and ctree (conditional inference trees)
Control complexity, reduce overfitting, and improve generalization using the complexity parameter (cp) and pruning strategies
Apply and compare two high-performance ensemble methods: Random Forests and Gradient Boosting
Evaluate predictive performance using ROC curves and AUC, so you can compare models with a robust metric
What’s included
Video lessons + structured explanations
Real datasets and all course code (R scripts)
Practice assignments + detailed solutions, so you can self-check and build confidence
Who this course is for
University students and professionals who want practical machine learning skills
Analysts working in business intelligence, analytics, finance, operations, or data roles
Anyone who wants to learn tree-based modeling properly, from fundamentals to evaluation
Prerequisites
Basic comfort with spreadsheets
Basic familiarity with R (you don’t need to be advanced)
What students say
Stefan L.: “Even though the topic was new to me, the course is easy to understand and the RStudio exercises work as explained.”
Frank B.: “Very beneficial… well organized and easy to understand. It gave me new ideas to assess model validity.”
Steven H.: “A very good review before my test tomorrow.”
Al M.: “Excellent.”
If you want a clear, practical path to mastering decision trees and modern ensembles in R—and learning how to evaluate them correctly—this course is for you.
Enroll today, and I’ll see you in the first lesson.