
Learn machine learning in R Studio by setting up R Studio, revising basic statistics, and building models such as linear regression, logistic regression, KNN, decision trees, and SVM.
Learn how to install R and R Studio on Windows, including downloading from the official site, selecting a mirror, installing R, and using R Studio’s four windows.
Celebrate reaching the milestone and maintain motivation to complete the course, while learning to rate, adjust playback and subtitles, use the Q&A and AI assistant, and download your certificate.
Learn to run R code from the script editor and the console, using Ctrl + Enter (Windows) or Command + Return (Mac), create variables with assignment, and perform element-wise vector operations.
Install packages in R from the repository, load them with library or require, view attached packages with search, detach them with detach, remove them with remove.packages, and use scripts for reproducible analyses.
Learn to input data in R using built-in data sets from the datasets package, manual entry, or import from CSV or text files, and explore the iris data set with str.
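As a small illustration of assignment and element-wise arithmetic, the sketch below uses made-up vectors named x and y; the names and values are placeholders, not taken from the lecture.

```r
# Assign values with the <- operator
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Arithmetic on vectors of equal length is applied element by element
total <- x + y
product <- x * y

print(total)
print(product)
```

Placing the cursor on a line in the script editor and pressing the run shortcut sends that line to the console.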
Assign values to variables manually, use concatenation, and generate sequences to input data, including the sequence function for multiples of 5 from 5 to 50 and scan for manual entry.
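A minimal sketch of these input methods follows. The variable names and the age values are illustrative assumptions. Because scan normally waits for keyboard input, the example passes a text argument so it also runs non-interactively.

```r
# Manual entry by concatenation (hypothetical values)
ages <- c(23, 35, 41)

# Multiples of 5 from 5 to 50 with the sequence function
multiples <- seq(5, 50, by = 5)
print(multiples)

# scan() reads values typed at the console; the text argument
# supplies the same input as a string instead
typed <- scan(text = "3 7 9", quiet = TRUE)
print(typed)
```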
Learn to import data into RStudio from text and csv files, using read.table and read.csv with headers, tab and comma delimiters, and inspect with str.
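The sketch below shows one way these imports might look. It writes two tiny temporary files first so that it is self-contained; the file contents and the names csv_path, txt_path, df_csv, and df_txt are assumptions for illustration only.

```r
# Write a small comma-delimited file to a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,customers", "North,120", "South,95"), csv_path)

# header = TRUE treats the first row as column names
df_csv <- read.csv(csv_path, header = TRUE)

# The same data as a tab-delimited text file
txt_path <- tempfile(fileext = ".txt")
writeLines(c("region\tcustomers", "North\t120", "South\t95"), txt_path)
df_txt <- read.table(txt_path, header = TRUE, sep = "\t")

# Inspect column names and data types
str(df_csv)
str(df_txt)
```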
Create a frequency distribution of customers by region. Visualize it with a bar plot in R, customizing order, orientation, color, borders, and labels, and export as image or PDF.
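A rough sketch of such a bar plot is shown below. The region values, the colors, and the titles are made up, and the plot is exported to a temporary PDF file; an image export with png would follow the same open, plot, close pattern.

```r
# Hypothetical customer regions; the course data set is not reproduced here
region <- c("East", "West", "East", "North", "West", "East")

# Frequency distribution of customers by region
freq <- table(region)

# Order the bars from most to least frequent
freq_sorted <- sort(freq, decreasing = TRUE)

# Open a PDF device, draw the plot, then close the device to write the file
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
barplot(freq_sorted,
        horiz = TRUE,        # horizontal orientation
        col = "steelblue",   # fill color
        border = NA,         # no bar borders
        main = "Customers by region",
        xlab = "Number of customers")
dev.off()
```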
Learn how to create and customize histograms in R using hist, set breaks or widths to define age categories, adjust frequency vs density, color, titles, labels, and export the plot.
Identify the two main data types—qualitative and quantitative—and distinguish nominal, ordinal, discrete, and continuous data to determine appropriate analyses.
Explore descriptive statistics to summarize data with measures of center and dispersion. Use inferential statistics to predict outcomes with linear and logistic regression, LDA, and decision trees.
Describe data using frequency distributions for qualitative and quantitative data, convert to bar charts and histograms, and explore descriptive statistics, class width, grouped data, and normal distribution.
Explore the four measures of center: mean, median, mode, and mid-range, and learn their differences, including population versus sample mean and how to choose the right measure.
Explore measures of dispersion: range, standard deviation, and variance. View variance as the square of standard deviation, and note outliers affect range; include population vs sample with n-1.
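The relationship between these measures can be checked on a small made-up sample. In this sketch the ages vector is an assumption; R's built-in sd and var use the sample (n - 1) form, so the population variance is computed by hand.

```r
ages <- c(22, 25, 27, 30, 36)   # hypothetical sample

rng <- max(ages) - min(ages)    # range as a single number
s <- sd(ages)                   # sample standard deviation (divides by n - 1)
v <- var(ages)                  # sample variance (divides by n - 1)

# Population variance divides by n instead of n - 1
n <- length(ages)
pop_var <- sum((ages - mean(ages))^2) / n

print(c(range = rng, sd = s, variance = v, population_variance = pop_var))
```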
Explore how machine learning uses past data to improve task performance, and distinguish supervised versus unsupervised learning, including classification, regression, and predictive models.
Formulate the business problem as a statistical task, identify dependent and independent variables, tidy and preprocess data, then split, train, validate, and deploy predictions.
Map the business context to identify key variables and gather relevant data through primary research with stakeholders and secondary studies on cart abandonment.
Identify data needs, request data from internal and external sources, and perform quality checks to tidy variables; apply this to cart abandonment with channel data, cart steps, value, and ratings.
Aggregate data from multiple sources into a single dataset of 19 variables and 506 observations, identifying price as the dependent variable and others as independent variables, with a data dictionary.
Import the house pricing dataset into RStudio from a CSV file with header = TRUE, then view it and examine its structure with str(df), which shows 506 observations and 19 variables.
Explore univariate analysis with descriptive statistics like mean, median, mode, dispersion, and quartiles, using age examples, and use the extended data dictionary to spot outliers and missing values.
Perform univariate analysis in R using histograms, bar plots, and summaries (min, max, mean, quartiles) to assess distributions, skewness, outliers, and missing values for crime_rate, n_hot_rooms, and rainfall.
Identify outliers with box plots, scatter plots, and histograms; impute them via 99th percentile capping or lower-limit rules, exponential smoothing, or sigma methods.
Cap n_hot_rooms at three times the 99th percentile and floor rainfall at 0.3 times the first percentile using quantile. Replace the outliers in df$n_hot_rooms and df$rainfall, then observe that the mean and median move closer together.
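The capping and flooring rule described here could be sketched as follows. The data frame is simulated with one extreme value per column, so the numbers are assumptions; only the column names and the two multipliers come from the lecture summary. The limit names uv and lv are placeholders.

```r
# Simulated stand-in for the house pricing data (hypothetical values)
set.seed(1)
df <- data.frame(
  n_hot_rooms = c(rnorm(99, mean = 12, sd = 2), 100),  # one extreme high value
  rainfall    = c(rnorm(99, mean = 40, sd = 5), 1)     # one extreme low value
)

# Upper limit: three times the 99th percentile
uv <- 3 * quantile(df$n_hot_rooms, 0.99)
df$n_hot_rooms[df$n_hot_rooms > uv] <- uv

# Lower limit: 0.3 times the 1st percentile
lv <- 0.3 * quantile(df$rainfall, 0.01)
df$rainfall[df$rainfall < lv] <- lv

# Compare mean and median after the replacement
summary(df$n_hot_rooms)
summary(df$rainfall)
```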
Learn to handle missing values by removing sparse rows or imputing with harmless values, such as mean, median, or mode for a categorical variable, including segment or neighboring means.
Learn to impute missing values in R by computing the mean with na.rm=TRUE, locating NA positions with is.na, and replacing them in df$n_hos_beds to produce a complete variable.
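A minimal sketch of mean imputation, assuming a tiny made-up n_hos_beds column with two missing entries:

```r
# Hypothetical values with two missing entries
df <- data.frame(n_hos_beds = c(5.5, NA, 7.0, 8.5, NA, 9.0))

# Mean of the observed values only
beds_mean <- mean(df$n_hos_beds, na.rm = TRUE)

# Positions of the missing values
missing <- which(is.na(df$n_hos_beds))

# Replace the missing values with the mean
df$n_hos_beds[missing] <- beds_mean

print(df)
```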
Learn how seasonality in time-based data creates recurring patterns and remove it with a correction factor by multiplying monthly values by the yearly mean divided by the monthly mean.
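One possible reading of this correction factor is sketched below, assuming that "yearly mean" means the mean over all observations and "monthly mean" means the mean of one calendar month across years. The sales matrix is made up and shows only four months for brevity.

```r
# Hypothetical sales: rows are years, columns are months
sales <- matrix(
  c(80, 90, 100, 130,
    88, 99, 110, 143),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("year1", "year2"), c("Jan", "Feb", "Mar", "Apr"))
)

yearly_mean <- mean(sales)        # mean over all observations
monthly_mean <- colMeans(sales)   # mean of each calendar month

# Correction factor per month: yearly mean divided by monthly mean
correction <- yearly_mean / monthly_mean

# Multiply every column (month) by its correction factor
adjusted <- sweep(sales, 2, correction, `*`)
print(round(adjusted, 2))
```

After the adjustment every month has the same average, so the remaining variation reflects the year-to-year change rather than the recurring monthly pattern.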
Explore bivariate analysis using scatterplots and correlation matrices to assess relationships and apply variable transformations such as log, exponential, and polynomial forms to improve linearity and model fit.
Apply a log transformation to crime rate (add one to avoid zero) to linearize its relation to price. Verify with a scatter plot and average distance from dist1–dist4.
Identify and remove non-usable variables by evaluating unique values, missing data, and regulatory constraints, using imputation and sensitivity considerations, then validate with bivariate analysis to refine models.
Learn to convert categorical variables into dummy variables for regression analysis, using the n-1 rule and 0/1 encoding to handle nominal data such as airport, waterbody, and subjects.
Learn how to convert categorical values to numerical by creating dummy variables in R, using airport and waterbody examples, and drop redundant columns to preserve information.
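The n-1 rule can be illustrated with base R alone. This is a sketch under assumptions: the category values are made up, and base functions ifelse and model.matrix are used here in place of whatever package the lecture itself relies on.

```r
# Hypothetical categorical columns
df <- data.frame(
  airport   = c("YES", "NO", "YES"),
  waterbody = c("River", "Lake", "None")
)

# A two-level variable needs only one 0/1 column
df$airport_yes <- ifelse(df$airport == "YES", 1, 0)

# model.matrix creates one column per level except the baseline level;
# dropping the first column removes the intercept
wb <- model.matrix(~ waterbody, data = df)[, -1, drop = FALSE]

# Combine the dummies and drop the original text columns
df_dummy <- cbind(df["airport_yes"], wb)
print(df_dummy)
```

Here "Lake" is the baseline level, so a row with zeros in both waterbody columns still identifies a lake and no information is lost.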
Explore correlation matrix and correlation coefficient to see how two variables move together, distinguish correlation from causation, and identify multicollinearity to decide which variables to keep.
Explore reading and interpreting a correlation matrix in R, identify variables with high or low correlation to the price, and detect multicollinearity to select variables.
Explore linear regression as a foundational supervised learning method, learn the least squares fit, and use it to predict house prices while estimating each variable's effect.
Explore simple linear regression and the least squares method to estimate beta0-hat and beta1-hat, predict house prices from room count, and minimize the residual sum of squares.
Assess how sample regression coefficients estimate population effects for house price and rooms, using standard errors, confidence intervals, and hypothesis tests to confirm a nonzero relationship.
Learn to assess model accuracy using residual standard error and R-squared, interpret R-squared and adjusted R-squared, explain TSS and RSS, and judge fit for regression models.
Run a linear regression in R with lm to predict price from room_num. Beta zero and beta one show price rises 9.09 units per room, with 48% of variance explained.
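A sketch of such a model on simulated data is shown below. The simulated relationship is an assumption, so the coefficients and R-squared will not match the 9.09 and 48% reported for the course data. The last lines recompute R-squared from RSS and TSS to show where the summary figure comes from.

```r
# Simulated stand-in for the house pricing data (hypothetical relationship)
set.seed(42)
room_num <- runif(100, min = 4, max = 8)
price <- -30 + 9 * room_num + rnorm(100, sd = 3)
df <- data.frame(price, room_num)

# Fit price as a linear function of room_num
simple_model <- lm(price ~ room_num, data = df)
summary(simple_model)

# Intercept (beta zero) and slope (beta one)
coef(simple_model)

# R-squared by hand: 1 - RSS / TSS
rss <- sum(residuals(simple_model)^2)
tss <- sum((df$price - mean(df$price))^2)
r_squared <- 1 - rss / tss
print(r_squared)
```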
Extend linear regression to multiple predictors, interpreting each beta as the effect of a unit change in a predictor on price while holding others fixed, with RSS, R-squared, and p-values.
Explore the F-statistic in multiple regression to test if the model's predictors collectively relate to the response, and interpret t-values, p-values, and the 5% significance threshold.
Learn how to interpret categorical variables in linear models by converting them to dummy variables, reading beta coefficients, and assessing p-values to judge their impact on house prices.
Learn to run multiple linear regression in R with lm on all predictors for price, interpret coefficients, p-values, and R-squared, and derive business insights about air quality and room number.
Split data into training and test sets and use test mean squared error to compare models, selecting the one with the lowest test error for future predictions.
Explore the bias-variance trade-off: how variance grows with model flexibility while bias shrinks, how overfitting raises test error, and how to minimize their sum.
Split data into training and test sets with 0.8 ratio using caTools and set.seed; train a linear model and compare mean squared errors on training and testing data.
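The split-and-compare workflow might look like the sketch below. Note the swap: base R's sample is used here instead of caTools so the example has no package dependency, and the data are simulated, so every name and number is an assumption.

```r
# Simulated data (hypothetical)
set.seed(0)
n <- 200
df <- data.frame(x = rnorm(n))
df$y <- 2 + 3 * df$x + rnorm(n)

# Randomly pick 80% of the row indices for training
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Train on the training set only
fit <- lm(y ~ x, data = train)

# Mean squared error on each set
mse_train <- mean((train$y - predict(fit, newdata = train))^2)
mse_test <- mean((test$y - predict(fit, newdata = test))^2)
print(c(train = mse_train, test = mse_test))
```

Setting the seed before sampling makes the split reproducible, so the two errors can be compared across models on the same rows.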
Explore linear models beyond ordinary least squares, using RSS, shrinkage and regularization to reduce variance, and subset selection to boost accuracy and interpretability.
Explore best subset, forward stepwise, and backward stepwise selection, and learn how adjusted R-squared guides model choice with a three-predictor house price example.
Master best subset selection in R with the leaps library, compare models by adjusted R-squared, and use forward or backward stepwise methods to identify the optimal variable subset.
Learn how ridge regression and the lasso shrink coefficients toward zero, balance bias and variance with a tuning parameter lambda, and perform variable selection for clearer models.
Learn to train a model with ridge and lasso regression in R using glmnet, select the optimal lambda via cross-validation, and compare R-squared performance.
Explore three classification models—logistic regression, k-nearest neighbors, and linear discriminant analysis—and learn when to apply them to binary, categorical prediction problems using a preprocessed property dataset.
Import data into R by reading a CSV file into df with header = TRUE, changing backslashes in the file path to forward slashes, then view the contents and inspect the structure for variables and data types.
Define two business questions—prediction and inferential analysis—and assess classifier performance to predict whether a house will sell within three months and gauge the impact of each variable.
Explore why linear regression can't be used for classification, including issues with multi-level responses, predicted values not representing probabilities, and sensitivity to outliers, and preview logistic regression as the remedy.
Explore logistic regression with the sigmoid function to model the probability of credit default, keeping values between zero and one, and estimate coefficients via maximum likelihood.
Learn logistic regression by building a simple glm model with one predictor, price, to predict whether a house sells within three months, using binomial family and interpreting beta0 and beta1.
Run a logistic model with price as predictor; interpret beta0 and beta1, then use standard error, z value, and p-value to test price impact when p-value is below threshold.
Learn to extend logistic regression from a single predictor to multiple predictors, estimate betas by maximum likelihood, and use predicted probabilities with a 0.5 threshold to classify binary outcomes.
Run a logistic regression with multiple predictors in R using dot notation to include all variables, interpret beta coefficients and p-values to identify significant predictors of sold.
Learn to read a confusion matrix to evaluate predictions, recognize correct versus incorrect predictions, differentiate type 1 (false positive) and type 2 (false negative) errors, and consider threshold-based costs.
Explore how to interpret a confusion matrix and key performance metrics—precision, sensitivity, specificity, and false positive rate—through the ROC curve and AUC for model evaluation.
Explore predicting house sale probabilities with a glm in R using predict(type='response'), classify with a 0.5 threshold, and evaluate with a confusion matrix.
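A sketch of this prediction and evaluation step is given below on simulated data. The relationship between price and sold is invented for the example, so the resulting counts say nothing about the course data set.

```r
# Simulated stand-in data: cheaper houses sell more often (hypothetical)
set.seed(1)
n <- 300
price <- runif(n, min = 10, max = 50)
sold <- rbinom(n, size = 1, prob = plogis(3 - 0.1 * price))
df <- data.frame(sold, price)

# Logistic regression with the binomial family
fit <- glm(sold ~ price, data = df, family = binomial)

# Predicted probabilities of sold = 1
prob <- predict(fit, type = "response")

# Classify with a 0.5 threshold
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: rows are actual values, columns are predictions
conf_mat <- table(actual = df$sold, predicted = pred)
print(conf_mat)

accuracy <- mean(pred == df$sold)
print(accuracy)
```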
Explore linear discriminant analysis, a multiclass classifier using Bayes-based conditional probabilities and normal distribution to assign the most likely class.
Learn to perform linear discriminant analysis in R using the MASS package, fit with lda, and interpret predicted classes, posterior probabilities, and confusion matrices.
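As an illustration, the sketch below fits lda from the MASS package on the built-in iris data set rather than on the course's property data, which is not reproduced here.

```r
library(MASS)

# Fit LDA with Species as the class and all other columns as predictors
lda_fit <- lda(Species ~ ., data = iris)

# predict returns the predicted class and the posterior probabilities
lda_pred <- predict(lda_fit, iris)
head(lda_pred$class)
head(lda_pred$posterior)

# Confusion matrix of actual versus predicted classes
conf_mat <- table(actual = iris$Species, predicted = lda_pred$class)
print(conf_mat)
```

Each row of the posterior matrix sums to one, and the predicted class is the one with the highest posterior probability.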
You're looking for a complete Machine Learning course that can help you launch a flourishing career in the field of Data Science, Machine Learning, R and Predictive Modeling, right?
You've found the right Machine Learning course!
After completing this course, you will be able to:
· Confidently build predictive Machine Learning models using R to solve business problems and create business strategy
· Answer Machine Learning related interview questions
· Participate and perform in online Data Analytics competitions such as Kaggle competitions
Check out the table of contents below to see all the Machine Learning models you are going to learn.
How will this course help you?
A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning basics course.
If you are a business manager, an executive, or a student who wants to learn and apply machine learning, R and predictive modelling to real-world business problems, this course will give you a solid base by teaching you the most popular techniques of machine learning, R and predictive modelling.
Why should you choose this course?
This course covers all the steps that one should take while solving a business problem through linear regression. This course will give you an in-depth understanding of machine learning and predictive modelling techniques using R.
Most courses focus only on teaching how to run the analysis, but we believe that what happens before and after the analysis is even more important. Before running the analysis, you need the right data and some pre-processing of it. After running the analysis, you should be able to judge how good your model is and interpret the results so that they actually help your business.
What makes us qualified to teach you?
The course is taught by Abhishek and Pukhraj. As managers in a global analytics consulting firm, we have helped businesses solve their problems using machine learning techniques in R and Python, and we have used that experience to include the practical aspects of data analysis in this course.
We are also the creators of some of the most popular online courses - with over 150,000 enrollments and thousands of 5-star reviews like these ones:
This is very good, i love the fact the all explanation given can be understood by a layman - Joshua
Thank you Author for this wonderful course. You are the best and this course is worth any price. - Daisy
Our Promise
Teaching our students is our job and we are committed to it. If you have any questions about the course content, machine learning, R, predictive modelling, practice sheet or anything related to any topic, you can always post a question in the course or send us a direct message.
Download Practice files, take Quizzes, and complete Assignments
With each lecture, there are class notes attached for you to follow along. You can also take quizzes to check your understanding of concepts of machine learning, R and predictive modelling. Each section contains a practice assignment for you to practically implement your learning on machine learning, R and predictive modelling.
Below is a list of popular FAQs from students who want to start their Machine Learning journey:
What is Machine Learning?
Machine Learning is a field of computer science which gives the computer the ability to learn without being explicitly programmed. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.
What are the steps I should follow to be able to build a Machine Learning model?
You can divide your learning process into 4 parts:
Statistics and Probability - Implementing machine learning techniques requires basic knowledge of statistics and probability concepts. The second section of the course covers this part.
Understanding of Machine learning - The fourth section helps you understand the terms and concepts associated with machine learning and gives you the steps to follow to build a machine learning model.
Programming Experience - A significant part of machine learning is programming. Python and R clearly stand out as the leaders in recent days. The third section will help you set up the R environment and teach you some basic operations. In later sections, there is a video on how to implement each concept taught in the theory lectures in R.
Understanding of models - The fifth and sixth sections cover classification models, and each theory lecture comes with a corresponding practical lecture where we actually run each query with you.
Why use R for Machine Learning?
Understanding R is one of the most valuable skills needed for a career in Machine Learning. Below are some reasons why you should learn Machine Learning in R:
1. It’s a popular language for Machine Learning at top tech firms. Almost all of them hire data scientists who use R. Facebook, for example, uses R to do behavioral analysis with user post data. Google uses R to assess ad effectiveness and make economic forecasts. And by the way, it’s not just tech firms: R is in use at analysis and consulting firms, banks and other financial institutions, academic institutions and research labs, and pretty much everywhere else data needs analyzing and visualizing.
2. Learning the data science basics is arguably easier in R than in Python. R has a big advantage: it was designed specifically with data manipulation and analysis in mind.
3. Amazing packages that make your life easier. Because R was designed with statistical analysis in mind, it has a fantastic ecosystem of packages and other resources that are great for data science.
4. Robust, growing community of data scientists and statisticians. As the field of data science has exploded, usage of R and Python has exploded with it, making them among the fastest-growing languages in the world (as measured by Stack Overflow). That means it’s easy to find answers to questions and community guidance as you work your way through projects in R.
5. Put another tool in your toolkit. No one language is going to be the right tool for every job. As with Python, adding R to your repertoire will make some projects easier, and of course it’ll also make you a more flexible and marketable employee when you’re looking for jobs in data science.
What are the major advantages of using R over Python?
Compared to Python, R has a large user base among statisticians and one of the largest collections of statistical packages and libraries available. Although Python has almost all the features that analysts need, R has the edge for statistical work.
R is a function-based language, whereas Python is object-oriented. If you are coming from a purely statistical background and are not looking to take over major software engineering tasks when productizing your models, R is an easier option than Python.
R has more data analysis functionality built in than Python, whereas Python relies on packages.
Python has a few main packages for data analysis tasks, whereas R has a larger ecosystem of small packages.
Graphics capabilities are generally considered better in R than in Python
R has more statistical support in general than Python
What is the difference between Data Mining, Machine Learning, and Deep Learning?
Put simply, machine learning uses the same algorithms and techniques as data mining, except that the kinds of predictions vary. While data mining discovers previously unknown patterns and knowledge, machine learning reproduces known patterns and knowledge, and further automatically applies that information to data, decision-making, and actions.
Deep learning, on the other hand, uses advanced computing power and special types of neural networks and applies them to large amounts of data to learn, understand, and identify complicated patterns. Automatic language translation and medical diagnoses are examples of deep learning.