
Explore decision tree models in R, from simple, interpretable trees to advanced bagging, random forest, and boosting, with hands-on coding to solve business problems.
Install R from the official R Project site and set up RStudio (shown for Windows users) to run scripts using the script and output windows, followed by a quick statistics crash course.
Celebrate reaching this milestone in the decision trees, random forests, bagging, and XGBoost course in RStudio, and access resources and your certificate to continue.
Master the basics of R and RStudio: run code with Ctrl+Enter, use comments, create variables with <-, perform vector operations, and manage the workspace with ls() and rm().
Learn how to install, load, and manage packages in R, including using library() and require(), installing from CRAN repositories, and scripting installations for reproducible analysis.
Add data in R by using built-in datasets, entering data manually, or importing from a CSV file. Explore the iris dataset with help() and str(), then load it with data(iris).
Learn to input data manually in R by assigning values, concatenating with c(), and generating sequences (multiples of five from five to fifty) with the seq() function.
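A minimal sketch of these basics follows. The variable names are illustrative. In RStudio, each line can be run with Ctrl+Enter.

```r
# Assign values with the <- operator
x <- 5
y <- c(1, 2, 3, 4)

# Arithmetic on vectors is element-wise
z <- y * x          # 5 10 15 20
total <- sum(y)     # 10

# List the objects in the workspace, then remove one of them
print(ls())
rm(x)
print(exists("x"))  # FALSE once x has been removed
```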
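Manual entry and sequence generation look roughly like this in base R. The values in `sales` are made up.

```r
# Enter values manually with c() (concatenation)
sales <- c(120, 95, 143, 110)

# Multiples of five from 5 to 50 with seq()
multiples <- seq(from = 5, to = 50, by = 5)
print(multiples)
# [1]  5 10 15 20 25 30 35 40 45 50
```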
Import tab-delimited product data and comma-delimited customer data into the workspace, create data frames with headers, and inspect structure (1862 observations, four variables; 793 observations, nine variables).
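The sketch below shows the two import styles. It first writes two tiny sample files so it runs anywhere. The file contents and column names are hypothetical, not the course's product and customer files.

```r
# Create small sample files (contents are made up for illustration)
prod_file <- tempfile(fileext = ".txt")
cust_file <- tempfile(fileext = ".csv")
writeLines(c("product_id\tname\tcategory\tprice",
             "P1\tPen\tStationery\t1.5",
             "P2\tLamp\tHome\t20"), prod_file)
writeLines(c("customer_id,region,age",
             "C1,East,34",
             "C2,West,51",
             "C3,East,27"), cust_file)

# Tab-delimited: read.delim() defaults to sep = "\t" and header = TRUE
product <- read.delim(prod_file, header = TRUE)
# Comma-delimited: read.csv()
customer <- read.csv(cust_file, header = TRUE)

str(product)   # 2 obs. of 4 variables
str(customer)  # 3 obs. of 3 variables
```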
Create a frequency distribution of regions from customer data and visualize it with a bar plot in R, adjusting color, orientation, borders, and labels, then export the chart.
Learn to create histograms in R to visualize age distributions by binning into categories with breaks, display frequencies, customize color and labels, and export the chart.
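Here is a small sketch of a binned histogram with custom breaks, colour and labels, exported to a PDF file. The ages are made up. `pdf()` is used as the export device here; `png()` works similarly where available.

```r
ages <- c(23, 35, 41, 19, 52, 67, 30, 45, 38, 29, 61, 33)

# Open a file device so the chart is exported rather than shown on screen
pdf(file = tempfile(fileext = ".pdf"))
h <- hist(ages, breaks = seq(10, 70, by = 10), col = "skyblue",
          main = "Age distribution", xlab = "Age")
invisible(dev.off())

# Frequencies per bin: [10,20], (20,30], ..., (60,70]
print(h$counts)
```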
Explore how machine learning uses past data to optimize performance, distinguish supervised and unsupervised learning, and apply classification and regression to real-world problems.
Learn the seven-step process to build a machine learning model—from problem formulation and data preparation to train-test split, model training, validation, and deployment for prediction and monitoring.
Explore the basics of decision trees, including root and leaf nodes, splitting to form regions, and regression and classification trees, illustrated with study hours and scores.
Understand how a regression tree partitions data into regions and predicts each region's mean. See how greedy binary splitting selects variables and splits to minimize the sum of squared errors.
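One greedy step can be written out in base R. The sketch tries every cut point on a single predictor, predicts each side by its mean, and keeps the cut with the smallest total sum of squared errors. A full tree repeats this over all predictors and then recursively inside each region. The hours/scores data are hypothetical.

```r
# Hypothetical data: hours studied vs exam score
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8)
scores <- c(35, 40, 42, 50, 71, 75, 80, 84)

# Sum of squared errors when a region is predicted by its mean
sse <- function(y) sum((y - mean(y))^2)

# Try the midpoint between each pair of consecutive sorted x values and
# keep the cut that minimises SSE(left region) + SSE(right region)
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  total <- sapply(cuts, function(s) sse(y[x < s]) + sse(y[x >= s]))
  list(cut = cuts[which.min(total)], sse = min(total))
}

best <- best_split(hours, scores)
print(best$cut)   # 4.5
print(best$sse)   # 213.75
```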
Control tree growth by setting stopping criteria such as minimum observations to split, minimum observations at leaf nodes, and maximum depth, to prevent overfitting.
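In the rpart package, these stopping criteria are passed through `rpart.control()` as `minsplit`, `minbucket` and `maxdepth`. The sketch uses the built-in `mtcars` data as a stand-in dataset, and the parameter values are illustrative.

```r
library(rpart)

# Stopping criteria for tree growth
ctrl <- rpart.control(minsplit = 20,   # min observations in a node to attempt a split
                      minbucket = 7,   # min observations in any leaf node
                      maxdepth = 2)    # max depth of any node (root counted as depth 0)

fit <- rpart(mpg ~ ., data = mtcars, method = "anova", control = ctrl)

# Count leaves: with maxdepth = 2 there can be at most 2^2 = 4
n_leaves <- sum(fit$frame$var == "<leaf>")
print(n_leaves)
```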
Explore a simulated movie dataset with 18 columns, in which 17 predictors, such as budget, marketing, and genre, are used to estimate the box-office collection (the dependent variable) with a regression tree.
Learn how to import a data set into R, inspect headers and variables, perform mean imputation for missing values, and prepare the data for training a decision tree model.
Learn how to split data into training and testing sets in R using an 80/20 split, set seed for reproducibility, and evaluate model performance on unseen data.
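A base R sketch of a reproducible 80/20 split follows. It uses `sample()` with a fixed seed on the built-in `mtcars` data. The seed value itself is arbitrary.

```r
set.seed(0)  # any fixed seed makes the split reproducible
n <- nrow(mtcars)

# Randomly pick 80% of the row indices for training
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]   # the remaining 20% is held out

print(c(train = nrow(train), test = nrow(test)))  # 25 and 7
```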
Build a regression tree in R using the rpart and rpart.plot packages: train on a movie dataset, predict box-office collection on the test set, and evaluate with mean squared error.
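The workflow of train, predict and score can be sketched as follows. The movie data are not reproduced here, so `mtcars` (predicting `mpg`) stands in. `minsplit` is lowered only because that dataset is small. Plotting with `rpart.plot()` needs the rpart.plot package, so it is left as a comment.

```r
library(rpart)

set.seed(0)
idx   <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Regression tree: method = "anova"
fit <- rpart(mpg ~ ., data = train, method = "anova",
             control = rpart.control(minsplit = 10))
# rpart.plot::rpart.plot(fit)   # optional, requires the rpart.plot package

# Predict on unseen data and compute mean squared error
pred <- predict(fit, newdata = test)
mse  <- mean((test$mpg - pred)^2)
print(mse)
```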
Prune large decision trees to balance interpretability and performance with cost complexity pruning, which selects the subtree minimizing RSS plus a penalty of alpha times the number of terminal nodes, with alpha chosen by cross-validation.
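The criterion can be written out in the usual notation, which is assumed here rather than taken from the lecture. For a given tuning parameter α, pruning picks the subtree T that minimizes the expression below. In it, |T| is the number of terminal nodes, R_m is the region of the m-th terminal node, and ŷ_{R_m} is the mean response in that region. Setting α = 0 keeps the full tree, and larger values of α favour smaller trees.

```latex
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha \, |T|
```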
Learn how to prune a regression tree in R using the rpart package, selecting the cp value via cross-validated error to create a simpler, more accurate model.
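Below is a sketch of cp selection with rpart. It grows a deliberately large tree (`cp = 0`), reads the cross-validated error (`xerror`) from `cptable`, and prunes at the cp with the lowest value. `mtcars` is a stand-in dataset. Choosing the minimum `xerror` is one common rule; it is not necessarily the exact rule used in the lecture.

```r
library(rpart)

set.seed(1)  # rpart's internal cross-validation folds are random
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(minsplit = 5, cp = 0))  # grow a large tree
printcp(fit)  # cp table with cross-validated error (xerror)

# Pick the cp with the lowest cross-validated error and prune to it
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
print(nrow(pruned$frame))  # number of nodes after pruning
```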
Analyze classification trees that assign the most frequent class in each region, and compare splitting criteria such as classification error rate, Gini index, and cross entropy.
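The three node-impurity measures can be computed directly from a node's class proportions. The sketch uses a hypothetical two-class node with proportions 0.7 and 0.3. Cross entropy is computed with log base 2 here; some texts use the natural log.

```r
# p: vector of class proportions in a node (sums to 1)
class_error <- function(p) 1 - max(p)
gini        <- function(p) sum(p * (1 - p))
entropy     <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

node <- c(0.7, 0.3)   # hypothetical node: 70% class A, 30% class B
class_error(node)     # 0.3
gini(node)            # 0.42
entropy(node)         # about 0.881
```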
Use a 506-movie dataset to build a classification model predicting Oscar wins from its variables. Split the data into training and testing sets to train and evaluate performance.
Build a classification decision tree in R by reusing the regression template: impute missing values, split the data into train and test sets, fit an rpart model with method = "class", plot the tree, and evaluate accuracy.
Decision trees are easy to explain, can be represented graphically, and handle qualitative predictors without dummy variables. However, a single tree often has lower predictive accuracy, which ensembles can significantly improve.
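The same template switched to classification might look like this. The built-in `iris` data stand in for the Oscar dataset. `type = "class"` returns predicted labels, and the confusion matrix gives the accuracy.

```r
library(rpart)

set.seed(42)
idx   <- sample(seq_len(nrow(iris)), size = floor(0.8 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Classification tree: method = "class"
fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on the held-out data
conf <- table(Predicted = pred, Actual = test$Species)
accuracy <- sum(diag(conf)) / sum(conf)
print(conf)
print(accuracy)
```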
Explore ensemble methods like bagging, random forest, and boosting to reduce variance in decision-tree predictions. See how bootstrapping and averaging multiple trees improve regression and classification accuracy.
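Bootstrapping and averaging can be shown without any ensemble package. The sketch fits one deep rpart tree per bootstrap sample and averages their predictions, which is the core of bagging. The randomForest package does this (and more) internally. `mtcars` and `B = 25` are illustrative choices.

```r
library(rpart)

set.seed(7)
B <- 25                                    # number of bootstrap trees
idx   <- sample(seq_len(nrow(mtcars)), 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# One column of test-set predictions per bootstrap tree
preds <- sapply(seq_len(B), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]   # bootstrap sample
  tree <- rpart(mpg ~ ., data = boot, method = "anova",
                control = rpart.control(minsplit = 5, cp = 0))  # deep, high-variance tree
  predict(tree, newdata = test)
})

bagged <- rowMeans(preds)                  # average over the B trees
mse_bagged <- mean((test$mpg - bagged)^2)
print(mse_bagged)
```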
Learn bagging in R with the randomForest package, using bootstrap samples and all predictors, compare its MSE to pruned trees, and understand the trade-off between prediction accuracy and interpretability.
Explore how random forest improves over bagging by decorrelating the trees through random subsets of predictors at each split, and apply the m ≈ √p rule of thumb for the number of predictors considered.
Build a random forest model in R with the randomForest package, using a formula and predictors from the training data, and tune mtry to improve MSE relative to bagging.
Explore boosting techniques in ensemble learning, including gradient boosting, AdaBoost, and XGBoost, using sequential trees, residuals, shrinkage, depth, and regularization to improve performance.
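Sequential fitting to residuals with shrinkage can be sketched in base R using depth-1 trees (stumps) on one predictor. This is a simplified illustration of gradient boosting for squared error, not the gbm implementation. The data are simulated, and `n_trees` and `lambda` are illustrative values.

```r
# Simulated 1-D data
set.seed(3)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# Fit a stump to residuals r: best single cut on x, each side predicted by its mean
fit_stump <- function(x, r) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  sse <- sapply(cuts, function(s) {
    l <- r[x < s]; rr <- r[x >= s]
    sum((l - mean(l))^2) + sum((rr - mean(rr))^2)
  })
  s <- cuts[which.min(sse)]
  list(cut = s, left = mean(r[x < s]), right = mean(r[x >= s]))
}
predict_stump <- function(st, x) ifelse(x < st$cut, st$left, st$right)

n_trees <- 200                  # number of sequential trees
lambda  <- 0.1                  # shrinkage (learning rate)
f <- rep(mean(y), length(y))    # start from the mean
for (b in seq_len(n_trees)) {
  r  <- y - f                               # residuals of the current model
  st <- fit_stump(x, r)                     # small tree fitted to the residuals
  f  <- f + lambda * predict_stump(st, x)   # add a shrunken copy of it
}

baseline  <- mean((y - mean(y))^2)
mse_train <- mean((y - f)^2)
print(c(baseline = baseline, boosted = mse_train))
```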
Install and load the gbm package, use gradient boosting in R to tune n.trees, interaction.depth, and shrinkage, predict on test data, and compare mean squared error with bagging and random forest.
Learn to implement AdaBoost for classification in R with the adabag package, train boosted models, and evaluate them with a confusion matrix, tuning mfinal to improve accuracy.
Explore XGBoost in R by preparing data in DMatrix format, converting categorical variables to dummy variables, and training a multi-class classifier with a tunable learning rate, maximum depth, and number of iterations.
Identify the business context and key variables through primary and secondary research, then gather data to model factors like cart abandonment along the customer journey.
Identify the data needed, request internal and external data, perform a quality check on the data received, and study cart abandonment by marketing channel across the three buying steps: cart review, address entry, and payment.
Identify the price as the dependent variable and uncover independent factors, standardize variable names with underscores, merge sources, and build a data dictionary with primary keys and definitions.
Import the dataset from a csv file into RStudio with read.csv(header=TRUE), creating data frame B, then use str to show 506 observations and 19 variables.
Explore univariate analysis by examining descriptive statistics for each variable, including mean, median, mode, range, quartiles, and standard deviations, while identifying outliers and missing values using the extended data dictionary.
Perform exploratory data analysis in R by examining the data dictionary, distributions, histograms, and scatter plots to identify outliers, missing values, and categorical variables that affect price and crime rate.
Identify and treat outliers using box plots, scatter plots, and histograms; apply treatments such as capping at the 99th percentile, flooring at a lower limit, and sigma-based replacement to preserve model accuracy.
Apply capping to outliers in the number of hot rooms and in rainfall by setting an upper bound at three times the 99th percentile and a lower bound at 0.3 times the first quartile, improving mean–median alignment.
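The capping rule described here can be sketched with `quantile()`. The two vectors are small hypothetical stand-ins, each with one extreme value. The multipliers 3 and 0.3 follow the text above.

```r
# Hypothetical variables, each with one extreme value
rooms    <- c(rep(10, 50), rep(12, 49), 500)      # one very high value
rainfall <- c(3, seq(20, 40, length.out = 99))    # one very low value

# Upper cap: 3 x 99th percentile
uv <- 3 * unname(quantile(rooms, 0.99))
rooms[rooms > uv] <- uv

# Lower floor: 0.3 x first quartile
lv <- 0.3 * unname(quantile(rainfall, 0.25))
rainfall[rainfall < lv] <- lv

print(c(mean = mean(rooms), median = median(rooms)))
```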
Learn to handle missing values by imputation with mean, median, or mode, or zero when sensible, and apply segment means for groups, guided by business knowledge.
Impute missing values in R by replacing with the mean, handling NA with na.rm, and identifying NA entries using is.na and which, followed by assignment to update the dataset.
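The mean-imputation steps map directly onto base R. Compute the mean with `na.rm = TRUE`, locate the NAs with `is.na()` and `which()`, then assign. The `age` vector is made up.

```r
age <- c(34, NA, 51, 27, NA, 45)

m      <- mean(age, na.rm = TRUE)   # mean of the non-missing values: 39.25
na_pos <- which(is.na(age))         # positions of the NAs: 2 and 5
age[na_pos] <- m                    # replace the NAs with the mean
print(age)
```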
Explore seasonality in data, such as summer sales and tourism fluctuations, and learn to normalize it by multiplying each observation by a correction factor M = (yearly mean) / (monthly mean).
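One way to apply this correction factor is sketched below. Each calendar month gets its own M, and every observation from that month is multiplied by it. The monthly sales figures are hypothetical. After the adjustment, every month's average equals the overall mean.

```r
# Hypothetical monthly sales over two years (24 observations)
sales <- c(80, 85, 90, 100, 120, 150, 160, 155, 110, 95, 85, 90,
           84, 88, 95, 104, 126, 156, 168, 160, 114, 99, 88, 94)
month <- rep(1:12, times = 2)

year_mean  <- mean(sales)                  # mean over the whole period
month_mean <- tapply(sales, month, mean)   # mean for each calendar month
M <- year_mean / month_mean                # correction factor per month

adjusted <- sales * M[month]               # seasonally normalized series
print(round(tapply(adjusted, month, mean), 2))
```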
Analyze two-variable relationships with scatter plots and correlation matrices, decide to keep, discard, or transform variables, and apply transformations to achieve linearity for regression.
Transform crime rate as log(1 + crime rate) to linearize its relationship with price, as shown by scatter plots. Create an average distance variable from the four distance variables and remove the unused columns.
Identify and remove non-informative variables, such as those with a single value or with missing-value issues, and iteratively refine features using business and regulatory knowledge for decision trees, random forests, and XGBoost.
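Both steps are one-liners in base R. `log1p()` computes log(1 + x) and stays defined at zero, and `rowMeans()` averages the distance columns. The column names and values are hypothetical.

```r
df <- data.frame(crime_rate = c(0.006, 0.03, 8.9, 25.0),
                 dist1 = c(4.1, 5.0, 2.1, 1.5), dist2 = c(4.4, 5.2, 2.0, 1.7),
                 dist3 = c(3.9, 4.8, 2.3, 1.4), dist4 = c(4.2, 5.1, 2.2, 1.6))

# log(1 + crime rate) compresses the long right tail
df$crime_rate <- log1p(df$crime_rate)

# Average of the four distances, then drop the original columns
dist_cols  <- c("dist1", "dist2", "dist3", "dist4")
df$avg_dist <- rowMeans(df[, dist_cols])
df <- df[, !(names(df) %in% dist_cols)]
print(df)
```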
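A simple automated check for the single-value case counts the distinct non-missing values per column and drops columns with only one. The data frame and its column names are hypothetical.

```r
df <- data.frame(price         = c(24, 21.6, 34.7, 33.4),
                 constant_flag = c("YES", "YES", "YES", "YES"),  # single value
                 rooms         = c(6.5, 6.4, 7.2, 7.0))

# Number of distinct non-missing values in each column
n_unique <- sapply(df, function(col) length(unique(na.omit(col))))

# Keep only columns that vary
df <- df[, n_unique > 1, drop = FALSE]
print(names(df))
```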
Create dummy variables to convert categorical data into numeric inputs for regression models by coding each category as 0 or 1, using n minus one variables for n categories.
Create dummy variables in R using the dummies package to convert the airport and water body categories into numeric columns, drop redundant columns, and prepare a numerical dataset for regression.
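As a base R alternative to the dummies package, `model.matrix()` creates n - 1 dummy columns per categorical variable under the default treatment contrasts. The data frame is hypothetical. Dropping the intercept column leaves only the dummies.

```r
df <- data.frame(price     = c(24, 21.6, 34.7, 33.4, 36.2),
                 airport   = c("YES", "NO", "NO", "YES", "NO"),
                 waterbody = c("Lake", "River", "None", "Lake and River", "River"))

# airport has 2 levels -> 1 dummy; waterbody has 4 levels -> 3 dummies
X <- model.matrix(~ airport + waterbody, data = df)[, -1]  # drop the intercept

# Numeric dataset ready for modelling
num_df <- data.frame(price = df$price, X, check.names = FALSE)
print(num_df)
```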
Learn to interpret positive, negative, and zero correlations with scatter plots and correlation coefficients, distinguish correlation from causation, and use a correlation matrix to manage multicollinearity in modeling.
Compute and round a correlation matrix in R to relate variables to price. Identify high correlations and remove one variable, such as deleting box and keeping air quality.
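A sketch of this step uses the built-in `mtcars` data, with `mpg` standing in for price. It rounds the correlation matrix and lists the pairs above a threshold. The 0.85 cutoff is an illustrative choice, and deciding which variable of each pair to drop remains a judgment call.

```r
cm <- round(cor(mtcars), 2)

# Correlation of each variable with the target
target_cor <- cm[, "mpg"]
print(target_cor)

# Variable pairs with |r| > 0.85 (upper triangle only, to avoid duplicates)
high <- which(abs(cm) > 0.85 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(var1 = rownames(cm)[high[, 1]],
                         var2 = colnames(cm)[high[, 2]],
                         r    = cm[high])
print(high_pairs)
```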
Celebrate reaching the final milestone by completing all lectures to obtain your certificate of completion and download it from your email or course site; share a review to motivate others.
You're looking for a complete Decision tree course that teaches you everything you need to create a Decision tree / Random Forest / XGBoost model in R, right?
You've found the right Decision Trees and tree based advanced techniques course!
After completing this course you will be able to:
Identify the business problems which can be solved using Decision tree / Random Forest / XGBoost machine learning models.
Have a clear understanding of Advanced Decision tree based algorithms such as Random Forest, Bagging, AdaBoost and XGBoost
Create a tree based (Decision tree, Random Forest, Bagging, AdaBoost and XGBoost) model in R and analyze its result.
Confidently practice, discuss and understand Machine Learning concepts
How this course will help you?
A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning advanced course.
If you are a business manager, an executive, or a student who wants to learn and apply machine learning to real-world business problems, this course will give you a solid base by teaching you some of the advanced techniques of machine learning: Decision tree, Random Forest, Bagging, AdaBoost and XGBoost.
Why should you choose this course?
This course covers all the steps that one should take while solving a business problem through Decision tree.
Most courses only focus on teaching how to run the analysis, but we believe that what happens before and after running the analysis is even more important. Before running the analysis, it is very important that you have the right data and do some pre-processing on it. After running the analysis, you should be able to judge how good your model is and interpret the results to actually help your business.
What makes us qualified to teach you?
The course is taught by Abhishek and Pukhraj. As managers in a global analytics consulting firm, we have helped businesses solve their business problems using machine learning techniques, and we have used our experience to include the practical aspects of data analysis in this course.
We are also the creators of some of the most popular online courses - with over 150,000 enrollments and thousands of 5-star reviews like these ones:
This is very good, i love the fact the all explanation given can be understood by a layman - Joshua
Thank you Author for this wonderful course. You are the best and this course is worth any price. - Daisy
Our Promise
Teaching our students is our job and we are committed to it. If you have any questions about the course content, practice sheet or anything related to any topic, you can always post a question in the course or send us a direct message.
Download Practice files, take Quizzes, and complete Assignments
With each lecture, there are class notes attached for you to follow along. You can also take quizzes to check your understanding of concepts. Each section contains a practice assignment for you to practically implement your learning.
What is covered in this course?
This course teaches you all the steps of creating a decision tree based model, one of the most popular Machine Learning models, to solve business problems.
Below are the contents of this course:
Section 1 - Introduction to Machine Learning
In this section we will learn what Machine Learning means and what the different terms associated with machine learning mean. You will see some examples so that you understand what machine learning actually is. It also covers the steps involved in building a machine learning model, not just linear models but any machine learning model.
Section 2 - R basics
This section will help you set up R and RStudio on your system, and it will teach you how to perform some basic operations in R.
Section 3 - Pre-processing and Simple Decision trees
In this section you will learn what actions you need to take to prepare the data for analysis; these steps are very important for creating a meaningful model.
In this section, we will start with the basic theory of decision tree then we cover data pre-processing topics like missing value imputation, variable transformation and Test-Train split. In the end we will create and plot a simple Regression decision tree.
Section 4 - Simple Classification Tree
In this section we will expand our knowledge of regression Decision trees to classification trees, and we will also learn how to create a classification tree in R.
Section 5, 6 and 7 - Ensemble techniques
In this section we will start our discussion about advanced ensemble techniques for Decision trees. Ensemble techniques are used to improve the stability and accuracy of machine learning algorithms. In this course we will discuss Random Forest, Bagging, Gradient Boosting, AdaBoost and XGBoost.
By the end of this course, your confidence in creating a Decision tree model in R will soar. You'll have a thorough understanding of how to use Decision tree modelling to create predictive models and solve business problems.
Go ahead and click the enroll button, and I'll see you in lesson 1!
Cheers
Start-Tech Academy
------------
Below is a list of popular FAQs from students who want to start their Machine learning journey:
What is Machine Learning?
Machine Learning is a field of computer science which gives the computer the ability to learn without being explicitly programmed. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.
What are the steps I should follow to be able to build a Machine Learning model?
You can divide your learning process into 4 parts:
Statistics and Probability - Implementing Machine learning techniques requires basic knowledge of Statistics and probability concepts. The second section of the course covers this part.
Understanding of Machine learning - The fourth section helps you understand the terms and concepts associated with Machine learning and gives you the steps to be followed to build a machine learning model.
Programming Experience - A significant part of machine learning is programming. Python and R clearly stand out as the leaders in recent days. The third section will help you set up the R environment and teach you some basic operations. In later sections there is a video on how to implement each concept taught in the theory lectures in R.
Understanding of models - The fifth and sixth sections cover Classification models, and each theory lecture comes with a corresponding practical lecture where we actually run each query with you.
Why use R for Machine Learning?
Understanding R is one of the valuable skills needed for a career in Machine Learning. Below are some reasons why you should learn Machine learning in R
1. It’s a popular language for Machine Learning at top tech firms. Almost all of them hire data scientists who use R. Facebook, for example, uses R to do behavioral analysis with user post data. Google uses R to assess ad effectiveness and make economic forecasts. And by the way, it’s not just tech firms: R is in use at analysis and consulting firms, banks and other financial institutions, academic institutions and research labs, and pretty much everywhere else data needs analyzing and visualizing.
2. Learning the data science basics is arguably easier in R. R has a big advantage: it was designed specifically with data manipulation and analysis in mind.
3. Amazing packages that make your life easier. Because R was designed with statistical analysis in mind, it has a fantastic ecosystem of packages and other resources that are great for data science.
4. Robust, growing community of data scientists and statisticians. As the field of data science has exploded, R has exploded with it, becoming one of the fastest-growing languages in the world (as measured by StackOverflow). That means it’s easy to find answers to questions and community guidance as you work your way through projects in R.
5. Put another tool in your toolkit. No one language is going to be the right tool for every job. Adding R to your repertoire will make some projects easier – and of course, it’ll also make you a more flexible and marketable employee when you’re looking for jobs in data science.
What is the difference between Data Mining, Machine Learning, and Deep Learning?
Put simply, machine learning uses the same algorithms and techniques as data mining, except the kinds of predictions vary. While data mining discovers previously unknown patterns and knowledge, machine learning reproduces known patterns and knowledge, and further automatically applies that information to data, decision-making, and actions.
Deep learning, on the other hand, uses advanced computing power and special types of neural networks and applies them to large amounts of data to learn, understand, and identify complicated patterns. Automatic language translation and medical diagnoses are examples of deep learning.