Machine Learning with R

Name: Machine Learning with R
Rating: 3.8 (15 reviews)

Learn how to use the R programming language for data science, machine learning, and data visualization

Created by EDUCBA Bridging the Gap

Last updated 1/2024

English

What you'll learn

Read Data Into the R Environment From Different Sources
Implement Unsupervised/Clustering Techniques Such As k-means Clustering
Implement Supervised Learning Techniques/Classification Such As Random Forests
Be Able To Harness The Power Of R For Practical Data Science

Course content

3 sections • 156 lectures • 24h 49m total length

Introduction to Machine Learning (8:08)
Explore how machine learning turns data into intelligent action using statistical methods. Understand data input, abstraction, and generalization, plus the ethical and legal considerations.
How Do Machines Learn (8:46)
Explore the abstraction and knowledge representation that turn data into meaningful models, and examine training, generalization, overfitting, underfitting, bias, and evaluation.
Steps to Apply Machine Learning (7:25)
Collect data in electronic format, then explore and prepare it with missing-value imputation, outlier detection, and exploratory data analysis before training and evaluating a machine learning model.
Regression and Classification Problems (8:27)
Explore supervised learning with regression and classification, contrast it with unsupervised clustering and collaborative filtering, and apply these approaches in R using distributed tools such as Hadoop and H2O.
Basic Data Manipulation in R (9:19)
Learn basic data manipulation in R using vectors, matrices, and data frames, and master interactive analysis with RStudio and the R console.
More on Data Manipulation in R (7:22)
Explore the RStudio interface, including the History, Environment, Plots, and Help panes, and learn to load libraries and datasets such as airquality for data manipulation in R.
Basic Data Manipulation in R - Practical (9:19)
Master basic data manipulation in R and navigate RStudio and the R console to run commands, manage data frames, and leverage the CRAN package ecosystem.
Create a Vector (9:03)
Learn how to create vectors with the c() function in R, distinguish vectors from scalars, and compute basic statistics (mean, median, sd, var, correlation, and covariance) using practical examples.
2.7 Problem and Solution (8:24)
Handle missing values with na.rm = TRUE in mean and sd, and perform element-by-element comparisons between vectors and scalars, including sequences such as 1 to 10 generated with seq() and length.out.
2.10 Problem and Solution (9:25)
Explore conditional selection on vectors, naming and accessing elements by name, vector arithmetic with recycling rules, and namespace precedence using :: and $ in R programming.
Exponentiation Right to Left (6:57)
Explore right-to-left exponentiation and unary, arithmetic, logical, and bitwise operators in R. Master function definitions, assignment directions, scope, and practical examples such as the coefficient of variation (CV) and gcd.
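The operator rules in this lecture can be checked directly in R. The sketch below is an illustration, not the lecture's own code; the function names cv and gcd are chosen here for clarity.

```r
# Exponentiation groups right to left: 2^3^2 means 2^(3^2) = 2^9
right_to_left <- 2^3^2      # 512
left_to_right <- (2^3)^2    # 64, only with explicit parentheses

# Unary minus binds more loosely than ^, so -2^2 is -(2^2)
neg_square <- -2^2          # -4

# Assignment works in both directions
a <- 5
10 -> b

# Coefficient of variation: standard deviation relative to the mean
cv <- function(x) sd(x) / mean(x)

# Greatest common divisor with the Euclidean algorithm (recursive)
gcd <- function(m, n) {
  if (n == 0) return(m)
  gcd(n, m %% n)
}

cv(c(10, 20, 30))   # 0.5
gcd(48, 18)         # 6
```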
2.13 Avoiding Some Common Mistakes (7:17)
Identify common R mistakes, such as escaping backslashes, using correct sequence syntax, and distinguishing assignment from comparison, then apply proper vector, list, and data frame handling for accurate results.
Simple Linear Regression (10:39)
Learn simple linear regression and other regression forms, and fit a line in R using ordinary least squares with an intercept and slope, plus R-squared and p-values.
Simple Linear Regression Continues (6:59)
Explore simple linear regression in R by building a model with x and y data, drawing the fitted line with abline(), and interpreting the intercept and slope from the model summary.
What is R Square (10:44)
Compute x and y values, derive the mean of y, and obtain the estimated y from the regression to calculate R-squared, then explain how adjusted R-squared accounts for model complexity.
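The R-squared calculation described above can be reproduced by hand and checked against summary(). The lecture's data is not listed here. This sketch assumes x = 1..5 and y = (2, 4, 5, 4, 5), a toy dataset that gives mean(y) = 4 and the fit y = 2.2 + 0.6x mentioned elsewhere in the course.

```r
# Assumed toy data (not confirmed as the lecture's): reproduces y = 2.2 + 0.6x
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)            # estimated y from the regression
y_bar <- mean(y)                # mean of y (4 for this data)

ss_tot <- sum((y - y_bar)^2)    # total sum of squares
ss_res <- sum((y - y_hat)^2)    # residual sum of squares
r2     <- 1 - ss_res / ss_tot   # 0.6 for this data

n <- length(y)                  # number of observations
k <- 1                          # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes extra predictors

c(r2 = r2, adj_r2 = adj_r2)
summary(fit)[c("r.squared", "adj.r.squared")]    # built-in values for comparison
```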
Standard Error (9:29)
Explore standard error as derived from the differences between actual and estimated values: compute residuals and the residual sum of squares, and evaluate linear regression with F-statistics and null hypotheses in Excel.
General Statistics (5:51)
Compute the F-statistic under restricted conditions in R and compare it with the F critical value to conclude that the coefficients are not zero.
General Statistics Continues (6:51)
Learn to compute p-values and t-statistics from a regression model using the diagonal of the variance-covariance matrix, with coefficients 2.2 and 0.6, and interpret residuals, F-statistics, R², and adjusted R².
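The t-statistics and p-values come from the square roots of the diagonal of vcov(). Here is a minimal sketch, again assuming the toy data x = 1..5, y = (2, 4, 5, 4, 5), which yields the coefficients 2.2 and 0.6. The lecture's own data may differ.

```r
# Assumed toy data that yields the coefficients 2.2 and 0.6
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
fit <- lm(y ~ x)

b  <- coef(fit)                          # intercept and slope
se <- sqrt(diag(vcov(fit)))              # standard errors from the variance-covariance diagonal
t_stat <- b / se                         # t = estimate / standard error
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)   # two-sided p-value

cbind(estimate = b, std_error = se, t = t_stat, p = p_val)
coef(summary(fit))                       # the same columns as reported by summary()
```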
Simple Linear Regression and More of Statistics (10:40)
Master simple linear regression in R: intercept and slope, R-squared, t- and F-statistics, p-values, residuals, and how ordinary least squares fits a regression line.
Open the Studio (7:00)
Open RStudio and fit a simple linear regression with lm to obtain y = 2.2 + 0.6x. Inspect the model summary for coefficients, residuals, and R-squared values.
What is R Square (10:44)
Explore how R-squared measures the fit of a simple regression with x values 1–5 and a mean y of 4, and how adjusted R-squared accounts for model complexity.
What is STD Error (9:21)
Explore how standard error derives from residuals in linear regression, including residual standard error, residual sum of squares, and the role of F-statistics and null hypotheses in Excel.
Reject Null Hypothesis (10:14)
Learn to test a linear regression model by rejecting the null hypothesis that the coefficients are zero, using F-statistics, t values, and p-values, and interpret residuals and R-squared.
Variance, Covariance, and Correlation (10:44)
Explore variance, covariance, and correlation, and learn how these measures describe data spread, how two series change together, and how they inform regression analysis.
Root Names and Types of Distribution Functions (10:52)
Explore covariance and correlation with a simple linear regression model, test significance via Pearson correlation and p-values, and learn about the normal, binomial, and Poisson distributions through R functions whose names combine a prefix (d, p, q, r) with a root name such as norm, binom, or pois.
Generating Random Numbers and Combination Function (8:02)
Learn to generate random numbers in R with runif and set.seed, specify min and max ranges, use sample with and without replacement, and model normal data with rnorm and dnorm.
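Below is a short base-R sketch of the random-number functions this lecture names, plus choose() and combn() for the "combination function" in the title. The sizes and seeds are arbitrary choices made for illustration.

```r
set.seed(42)                                   # makes the draws reproducible

u  <- runif(5, min = 1, max = 10)              # 5 uniform draws between 1 and 10
d1 <- sample(1:6, size = 10, replace = TRUE)   # 10 die rolls, repeats allowed
d2 <- sample(1:6, size = 6, replace = FALSE)   # a permutation of 1..6
z  <- rnorm(5, mean = 50, sd = 5)              # 5 normal draws
dens_at_mean <- dnorm(50, mean = 50, sd = 5)   # normal density at the mean

n_pairs <- choose(5, 2)                        # number of 2-item combinations from 5 items
pairs4  <- combn(4, 2)                         # the combinations themselves, one per column

# Re-seeding reproduces the same sequence
set.seed(42); first  <- runif(3)
set.seed(42); second <- runif(3)
```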
Probabilities for Discrete Distribution Function (10:22)
Visualize a normal distribution with mean 50 and sd 5, and compute binomial probabilities in R using dbinom and pbinom for 25 widgets at p = 0.005 and 20 trials at p = 1/6.
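The lecture does not say which events it computes. The events below (no defects, at most one defect, exactly three or at most three sixes) are illustrative assumptions that use the parameters given above.

```r
# 25 widgets, each defective with probability 0.005
p_none    <- dbinom(0, size = 25, prob = 0.005)   # P(X = 0)
p_atmost1 <- pbinom(1, size = 25, prob = 0.005)   # P(X <= 1)

# 20 rolls of a fair die, "success" = rolling a six
p_exactly3 <- dbinom(3, size = 20, prob = 1/6)    # P(X = 3)
p_atmost3  <- pbinom(3, size = 20, prob = 1/6)    # P(X <= 3), cumulative

# The normal curve with mean 50 and sd 5 could be drawn with:
# curve(dnorm(x, mean = 50, sd = 5), from = 30, to = 70)
c(p_none = p_none, p_atmost1 = p_atmost1, p_exactly3 = p_exactly3, p_atmost3 = p_atmost3)
```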
Quantile Function and Poisson Distribution (10:24)
Explore the Poisson distribution, its lambda parameter, and the probability of x events in a fixed time, then connect to normal distribution quantiles and the R functions pnorm, qnorm, and dt.
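This sketch pairs the Poisson probabilities with the normal and t functions named above. The value lambda = 3 is an arbitrary assumption.

```r
lambda <- 3                                 # assumed average number of events per interval
p_pois <- dpois(0:5, lambda)                # P(X = x) for x = 0..5
p_le2  <- ppois(2, lambda)                  # P(X <= 2)

area   <- pnorm(1.96)                       # area left of z = 1.96 (about 0.975)
z975   <- qnorm(0.975)                      # quantile function: the inverse of pnorm
t_dens <- dt(0, df = 10)                    # Student t density at 0 with 10 degrees of freedom
```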
Student's t Distribution, Hypothesis and Example (9:37)
Learn to perform a one-sample t-test, compute the t-statistic, and decide on the null hypothesis using critical values with df = n - 1, including p-values from the airquality data.
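The one-sample t-test can be computed by hand and checked against t.test(). The sketch uses the Ozone column of airquality. The hypothesized mean of 40 is an assumption, not taken from the lecture.

```r
oz  <- na.omit(airquality$Ozone)          # drop missing Ozone readings
mu0 <- 40                                 # assumed hypothesized mean

n      <- length(oz)
t_stat <- (mean(oz) - mu0) / (sd(oz) / sqrt(n))
df     <- n - 1
t_crit <- qt(0.975, df)                   # two-sided critical value at alpha = 0.05
p_val  <- 2 * pt(-abs(t_stat), df)

reject_null <- abs(t_stat) > t_crit       # reject H0 when |t| exceeds the critical value
tt <- t.test(oz, mu = mu0)                # built-in equivalent
```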
Chi-Square Distribution (4:51)
Use z-scores and the empirical rule to interpret p-values and reject the null hypothesis. The lecture covers the normal distribution, six sigma, the central limit theorem, and applying p.test in R.
Data Visualization (9:11)
Explore multiple linear regression in R with lm, visualize data, and select predictors using the Cars93 dataset, converting Type to a factor and applying aggregate insights.
More on Data Visualization (8:27)
Visualize car data with coplots and pairs plots, explore how horsepower, RPM, weight, and origin affect city MPG, and convert origin to a factor for a linear model.
Multiple Linear Regression (8:47)
Explore multiple linear regression by building model one and model two to predict city MPG, comparing p-values, R-squared, and adjusted R-squared, using AIC and the step command with a 70/30 train-test split.
Multiple Linear Regression Continues (7:11)
Apply stepwise multiple linear regression with the lm function, using backward and forward selection to add or remove predictors, with AIC as the criterion for identifying the best-fitting model.
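Stepwise selection with step() works the same way on any lm fit. To avoid depending on the MASS package, this sketch uses the built-in mtcars data instead of the course's Cars93 data. Backward selection starts from the full model, and forward selection grows from the intercept-only model.

```r
full <- lm(mpg ~ ., data = mtcars)       # every predictor
null <- lm(mpg ~ 1, data = mtcars)       # intercept only

# Backward: drop terms while AIC improves
backward <- step(full, direction = "backward", trace = 0)
# Forward: add terms from the full model's formula while AIC improves
forward  <- step(null, scope = formula(full), direction = "forward", trace = 0)

formula(backward)
formula(forward)
AIC(full, backward, forward)
```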
Regression Variables (9:05)
Run regression analysis using forward and backward variable selection, compare models with AIC and RSS, and validate predictions on the test set to reach an 85% accurate model.
Generalized Linear Model (11:58)
Explore generalized linear models and generalized least squares, and contrast lm regression with glm on the airquality data (Ozone as y, Wind as x) after removing NAs.
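glm() with a Gaussian family and identity link fits the same model as lm(). Changing the family is what makes it "generalized". The Poisson fit at the end is illustrative only.

```r
aq <- na.omit(airquality[, c("Ozone", "Wind")])   # remove rows with NAs

fit_lm   <- lm(Ozone ~ Wind, data = aq)
fit_glm  <- glm(Ozone ~ Wind, family = gaussian(), data = aq)   # identity link: same as lm
fit_pois <- glm(Ozone ~ Wind, family = poisson(), data = aq)    # log link, for comparison only

coef(fit_lm)
coef(fit_glm)    # matches lm
coef(fit_pois)   # on the log scale
```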
Generalized Least Squares (9:22)
Apply generalized least squares to model ozone on wind and date, assess autocorrelation, compare models with AIC, and predict missing ozone values using gls on the airquality data.
KNN - Various Methods of Distance Measurements (8:07)
Explore k-nearest neighbors as a supervised classifier, using distance measures such as Euclidean, Manhattan, Minkowski, Hamming, cosine, and Jaccard to label new items by majority vote.
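Each distance measure above fits in a line or two of base R. The vectors and item sets below are made up to keep the arithmetic easy to follow.

```r
a <- c(1, 2, 3)
b <- c(4, 6, 8)
m <- rbind(a, b)

euclid    <- dist(m, method = "euclidean")         # sqrt(3^2 + 4^2 + 5^2)
manhattan <- dist(m, method = "manhattan")         # 3 + 4 + 5
minkowski <- dist(m, method = "minkowski", p = 3)  # (27 + 64 + 125)^(1/3)
cosine_sim <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hamming: number of positions where two categorical vectors differ
hamming <- sum(c("red", "S", "yes") != c("red", "M", "no"))

# Jaccard similarity: shared items over all distinct items
s1 <- c("milk", "bread", "eggs")
s2 <- c("milk", "eggs", "butter")
jaccard <- length(intersect(s1, s2)) / length(union(s1, s2))
```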
Overview of KNN: Steps Involved (9:26)
Apply kNN in R by exploring distance measures such as Hamming and Euclidean and by normalizing the data. Build and evaluate a model on the iris data using train-test splits and the misclassification rate.
Data Normalization and Prediction on Test Data (8:08)
Normalize the iris data, create iris_new, and split it into 100 training and 50 test records to apply knn for species prediction using sepal and petal features.
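The lecture's knn call most likely comes from an add-on package. To keep this sketch self-contained, it implements the majority vote in base R instead. The function name knn_predict and the seed are illustrative.

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))   # min-max scaling to [0, 1]

iris_new <- as.data.frame(lapply(iris[, 1:4], normalize))
iris_new$Species <- iris$Species

set.seed(1)
idx   <- sample(nrow(iris_new), 100)        # 100 training rows; the other 50 are test rows
train <- iris_new[idx, ]
test  <- iris_new[-idx, ]

# Majority vote among the k nearest training rows (Euclidean distance)
knn_predict <- function(train_x, train_y, test_x, k = 11) {
  apply(test_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))   # distance to every training row
    votes <- table(train_y[order(d)[1:k]])
    names(votes)[which.max(votes)]             # ties go to the first level
  })
}

pred <- knn_predict(as.matrix(train[, 1:4]), train$Species,
                    as.matrix(test[, 1:4]), k = 11)
conf <- table(predicted = pred, actual = test$Species)
misclassification <- mean(pred != as.character(test$Species))
```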
Improvement of Model Performance and ROC (9:48)
Explore k-nearest neighbors classification on the iris dataset using k = 11, and assess performance with a confusion matrix and ROC, reporting accuracy, true positive rate, and false positive rate.
Decision Tree Classifier (8:30)
Construct and prune decision trees, distinguish classification and regression trees, and apply splitting criteria such as information gain, Gini index, and reduction in variance with practical examples.
More on Decision Tree Classifier (9:14)
Explore how to compute the Gini index for features B, C, and D, compare Gini gains to identify the root-level split, and understand how the data is split by y.
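The Gini index of a node is 1 minus the sum of squared class proportions. The gain of a split is the parent's Gini minus the size-weighted Gini of its children. The labels and features below are hypothetical, not the lecture's B, C, D table.

```r
gini <- function(y) {
  p <- table(y) / length(y)     # class proportions
  1 - sum(p^2)
}

# Size-weighted Gini of the child nodes after splitting y by a feature
gini_split <- function(feature, y) {
  parts <- split(y, feature)
  sum(sapply(parts, function(g) length(g) / length(y) * gini(g)))
}

gini_gain <- function(feature, y) gini(y) - gini_split(feature, y)

# Hypothetical toy data
y  <- c("yes", "yes", "no", "no", "yes", "no")
fB <- c("a", "a", "b", "b", "a", "b")   # separates the classes perfectly
fC <- c("x", "y", "x", "y", "x", "y")   # mixes them

gains <- sapply(list(B = fB, C = fC), gini_gain, y = y)
names(which.max(gains))                  # feature chosen for the root split
```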
Pruning of Decision Trees (9:01)
Explore building a decision tree using the Gini index for splits, in Excel and in R with the Carseats data, including creating a sales indicator and plotting.
Decision Tree Remaining (7:11)
Convert the sales indicator to a factor, train a decision tree model, and visualize its structure. Use cross-validation and pruning to reduce misclassification and improve generalization.
Decision Tree Remaining Continues (5:56)
Prune the decision tree to reduce overfitting, compare misclassification error on test data, and observe that pruning does not always improve performance.
General Concept of Random Forest (10:32)
Explore the general concept of random forest, a supervised learning model that builds many decision trees via random data and feature selection to improve classification and regression accuracy.
Ada Boosting and Ensemble Learning (11:01)
Explore AdaBoost, an adaptive boosting approach that sequentially refines models by reweighting the observations that weak learners misclassify, and harness ensemble learning to combine multiple algorithms, including random forest, for improved predictions.
Data Visualization and Preparation (10:42)
Import CTG data, convert NSP to a factor, split training and test sets, build a 500-tree random forest, and assess it with out-of-bag error and confusion matrices.
Tuning Random Forest Model (7:39)
Tune a random forest in R by adjusting the number of trees and mtry, compare out-of-bag error, train/test accuracy, and feature importance, then plot a histogram of tree sizes to assess model performance.
Evaluation of Random Forest Model Performance (7:10)
Explore how to evaluate a random forest model using variable importance plots, Gini gains, and the accuracy impact when features such as class and LTV are removed, plus partial dependence plots.
Introduction to K-means Clustering (11:42)
Explore k-means clustering, an unsupervised vector quantization method that partitions data into k clusters using centroid distances, distance metrics (Euclidean, Manhattan, Minkowski), and the elbow method.
K-means Elbow Point and Dataset (10:46)
Example of K-means Dataset (11:15)
Creating a Graph for K-means Clustering (11:23)
Creating a Graph for K-means Clustering Continues (7:24)
Apply k-means clustering to a dataset of 30,000 teens, assign cluster labels to the original data, and reveal the dominant interests in each group.
Aggregation Function of Clustering (9:10)
Learn how aggregation functions summarize cluster characteristics by age, gender, and social behavior, and use R-squared (between SS / total SS) and the within, between, and total sums of squares to evaluate k-means clustering.
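The k-means workflow in this run of lectures (elbow method, attaching labels to the original data, aggregate profiles, and the between/total sum-of-squares ratio) is sketched below. It substitutes the built-in iris measurements for the course's teen dataset, and the seed and k range are arbitrary.

```r
x <- scale(iris[, 1:4])     # standardize so no feature dominates the Euclidean distance

set.seed(123)
# Elbow method: total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")

km <- kmeans(x, centers = 3, nstart = 20)
iris_clustered <- cbind(iris, cluster = km$cluster)   # attach labels to the original data

# Profile each cluster with aggregate()
aggregate(cbind(Sepal.Length, Petal.Length) ~ cluster, data = iris_clustered, FUN = mean)

# "R-squared" of the clustering: between-cluster SS as a share of total SS
r2_cluster <- km$betweenss / km$totss
```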
Conditional Probability with Bayes Algorithm (10:27)
Explore the Naive Bayes classifier and Bayes' theorem, including conditional probability and class-conditional independence, and apply them to spam filtering and text classification.
Venn Diagram Naive Bayes Classification (8:55)
Explore how joint probability and Venn diagrams illuminate naive Bayes classification for spam detection, using prior, likelihood, marginal likelihood, and posterior probabilities.
Components of Bayes Theorem Using Frequency Table (10:54)
Explore constructing a frequency table and a likelihood table to apply Bayes' theorem to text classification, highlighting Naive Bayes' conditional probability and spam filtering.
Naive Bayes Classification Algorithm and Laplace Estimator (9:17)
Explore how Naive Bayes classification combines feature probabilities, priors, and the Laplace estimator to handle zero counts and distinguish spam from ham.
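The sketch below turns a frequency table into smoothed likelihoods and a posterior. All counts are hypothetical. The smoothing shown is one common form of the Laplace estimator for presence/absence features: add 1 to each count and 2 to each class total.

```r
# Hypothetical counts: 20 spam and 80 ham messages, and how many of each contain a word
n_class <- c(spam = 20, ham = 80)
word_counts <- rbind(
  viagra  = c(spam = 4,  ham = 0),
  money   = c(spam = 10, ham = 14),
  meeting = c(spam = 0,  ham = 24)
)

laplace <- 1   # without it, the zero count for "viagra" in ham would force P(ham | message) to 0

# P(word present | class), smoothed
likelihood <- sweep(word_counts + laplace, 2, n_class + 2 * laplace, "/")
prior <- n_class / sum(n_class)

# New message contains "viagra" and "money" but not "meeting"
present <- c(viagra = TRUE, money = TRUE, meeting = FALSE)
lik_msg <- apply(likelihood, 2, function(p) prod(ifelse(present, p, 1 - p)))

# Bayes' theorem: prior x likelihood, divided by the marginal likelihood
posterior <- prior * lik_msg / sum(prior * lik_msg)
posterior
```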
Example of Naive Bayes Classification (9:26)
Learn how Naive Bayes uses discretization and Laplace smoothing to handle numerical features, builds frequency-based models, and applies to SMS spam filtering with an R example.
Example of Naive Bayes Classification Continues (11:01)
Learn to perform text mining in R with the tm package: create a cleaned SMS corpus and a document-term matrix for training a spam classifier.
Spam and Ham Messages in Word Cloud (9:09)
Explore spam detection with R by building a term-document matrix from SMS data, splitting it into training and test sets, and visualizing spam and ham word clouds.
Implementation of Dictionary and Document Term Matrix (6:57)
Learn to build a document-term matrix from an SMS corpus, filter it to frequent terms, convert counts to binary yes/no values, and train a naive Bayes model with e1071 for spam detection.
Executes the Function Naive Bayes (8:50)
Explore Naive Bayes classification for SMS spam filtering in R, including training data, cross tables, Laplace smoothing, and evaluating misclassification rates.
Support Vector Machine with Black Box Method (9:29)
Explore how support vector machines, a black box method, use a hyperplane to separate data with maximum margin, enabling binary classification, numerical prediction, and pattern recognition in high-dimensional spaces.
Linearly and Non-Linearly Support Vector Machine (9:46)
Discover how to find the maximum margin hyperplane for linearly separable data using convex hulls and quadratic optimization, and apply slack variables with the cost parameter C for soft margins.
Kernel Trick (10:17)
Learn how support vector machines use the kernel trick to transform nonlinear data into a higher-dimensional space for linear separation, using linear, polynomial, and sigmoid kernels.
Gaussian RBF Kernel and OCR with SVMs (9:47)
Explore the Gaussian RBF kernel and SVM-based OCR, using a letter dataset with 16 features to train and test, and compare kernel choices through trial and error.
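These are the four kernels named above, written as plain functions in their textbook forms. SVM libraries parameterize gamma and coef0 slightly differently, so treat the defaults here as assumptions. The last part shows the kernel trick concretely: a degree-2 polynomial kernel equals a dot product in an explicit higher-dimensional feature space.

```r
linear_k  <- function(x, z) sum(x * z)
poly_k    <- function(x, z, degree = 2, coef0 = 1) (sum(x * z) + coef0)^degree
sigmoid_k <- function(x, z, gamma = 0.5, coef0 = 0) tanh(gamma * sum(x * z) + coef0)
rbf_k     <- function(x, z, gamma = 0.5) exp(-gamma * sum((x - z)^2))   # Gaussian RBF

x <- c(1, 2)
z <- c(2, 0)
c(linear = linear_k(x, z), poly = poly_k(x, z),
  sigmoid = sigmoid_k(x, z), rbf = rbf_k(x, z))

# Kernel trick: for degree 2 and coef0 = 0, (x . z)^2 = phi(x) . phi(z)
# with the explicit map phi(v) = (v1^2, sqrt(2) v1 v2, v2^2)
phi <- function(v) c(v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2)
explicit <- sum(phi(x) * phi(z))
implicit <- poly_k(x, z, degree = 2, coef0 = 0)
```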
Examples of Gaussian RBF Kernel and OCR with SVMs (7:32)
Learn how a Gaussian RBF kernel used with an SVM raises OCR accuracy on letters from 84% to 93%, with tuning on a 16,000-record training set and a 4,000-record test set.
Summary of Support Vector Machine (8:24)
Explore how to tune a support vector machine with a radial kernel, selecting the best cost and gamma to achieve high accuracy, as demonstrated by 97% test accuracy with 116 misclassifications.
Feature Selection Dimension Reduction Technique (9:36)
Feature Extraction Dimension Reduction Technique (9:54)
Learn principal component analysis and kernel PCA for dimensionality reduction, and explore graph-based kernel PCA, Isomap, LLE, Hessian LLE, Laplacian eigenmaps, LTSA, plus LDA and GDA.
Dimension Reduction Technique Example (8:59)
The lecture demonstrates dimension reduction in R by loading caret and correlation plot libraries, removing features with many missing values and near-zero variance, computing correlations, and dropping highly correlated predictors.
Dimension Reduction Technique Example Continues (7:42)
Perform feature selection with random forest, ranking predictors by mean decrease in accuracy to reveal the top features, then build a reduced dataset for training and testing.
Introduction to Principal Component Analysis (10:52)
Explore principal component analysis as a dimension reduction technique and learn how it uses eigenvalues and eigenvectors to perform an orthogonal transformation into linearly uncorrelated principal components.
Steps of PCA (10:51)
Learn to perform principal component analysis by centering the data, building the covariance matrix, and computing eigenvalues and eigenvectors; multiply the centered data by the eigenvectors to obtain orthogonal principal components.
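The four PCA steps can be carried out by hand on the iris measurements and compared with prcomp(). Eigenvector signs are arbitrary, so the scores are compared in absolute value.

```r
X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # 1. center each column

S  <- cov(Xc)                                   # 2. covariance matrix
e  <- eigen(S)                                  # 3. eigenvalues and eigenvectors
scores <- Xc %*% e$vectors                      # 4. project onto the eigenvectors

var_share <- e$values / sum(e$values)           # share of variance per component

# prcomp gives the same components (column signs may be flipped)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
```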
Steps of PCA Continues (9:27)
Explore how principal components reduce data from two dimensions to one along the x axis, illustrating information loss, compression in apps like WhatsApp, and the eigenvalue–eigenvector framework behind PCA.
Eigen Values (9:22)
Learn to compute the eigenvalues and eigenvectors of a 2x2 matrix and verify the results in R, finding lambda values of -1 and 8 with their corresponding eigenvectors.
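The lecture's matrix is not shown. This sketch assumes A = [[1, 2], [7, 6]]. Its trace is 7 and its determinant is -8, so the characteristic equation λ² - 7λ - 8 = 0 has the roots 8 and -1 quoted above.

```r
# Assumed matrix: trace 7, determinant -8 -> lambda^2 - 7*lambda - 8 = 0 -> lambda = 8 or -1
A <- matrix(c(1, 2,
              7, 6), nrow = 2, byrow = TRUE)

e <- eigen(A)
e$values          # 8 and -1
e$vectors         # unit-length eigenvectors, one per column

# Check A v = lambda v for each eigenpair
checks <- sapply(1:2, function(i) {
  v <- e$vectors[, i]
  isTRUE(all.equal(as.vector(A %*% v), e$values[i] * v))
})
```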
Eigen Vectors (7:41)
Explore eigenvalues and eigenvectors of a matrix, solve for lambda values, and find the corresponding eigenvectors using R and matrix transformations. Understand how PCA leverages eigenvalues to preserve maximum information.
Principal Component Analysis using Pr-Comp (10:06)
Apply principal component analysis in R using prcomp on the iris data after scaling the features; interpret standard deviations, rotations, and variance proportions, and use a scree plot to decide how many components to keep.
Principal Component Analysis using Pr-Comp Continues (9:02)
Explore principal component analysis with prcomp on the iris data, interpret PC1 and PC2 variance, use the loadings (rotation), and create a PCA biplot.
C Bind Type in PCA (9:02)
Explore PCA on the iris dataset, select the first two principal components for training and test data, and build a classification model with a decision tree in R.
R Type Model (12:31)
Build an rpart model on the iris data, predict with iris_pc_train and iris_pc_test, convert probabilities to species, and report misclassifications; explore standardization and PCA with eigenvalues, loadings, and scores.
Black Box Method in Neural Network (8:57)
This lecture demystifies neural networks as powerful but opaque black box models, explains their brain-inspired architecture, and surveys practical applications from speech recognition to self-driving cars.
Characteristics of a Neural Networks (9:25)
Explain how artificial neurons process inputs with weighted sums and activation functions, compare threshold and sigmoid activations, and discuss normalization and standardization to squash inputs for training.
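Here is a single artificial neuron written out in base R: a weighted sum plus bias passed through either a sigmoid or a threshold activation, with min-max normalization applied to the inputs. The inputs, weights, and bias are hypothetical values chosen for illustration.

```r
sigmoid   <- function(z) 1 / (1 + exp(-z))          # smooth output in (0, 1)
threshold <- function(z) ifelse(z >= 0, 1, 0)       # hard step at 0

# One neuron: weighted sum of inputs plus bias, then an activation function
neuron <- function(x, w, b, activation = sigmoid) activation(sum(w * x) + b)

minmax <- function(v) (v - min(v)) / (max(v) - min(v))   # squash raw inputs to [0, 1]

x <- minmax(c(200, 350, 500))         # hypothetical raw inputs -> 0, 0.5, 1
w <- c(0.4, -0.2, 0.1)                # hypothetical weights
b <- -0.1                             # hypothetical bias

out_sigmoid   <- neuron(x, w, b)              # about 0.475
out_threshold <- neuron(x, w, b, threshold)   # 0
```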
Network Topology of a Neural Networks (10:55)
Explore neural network topology, from single-layer networks with input and output nodes to multilayer networks with hidden layers, fully connected designs, and backpropagation training.
Weight Adjustment and Case Update (11:30)
Learn how neural networks adjust weights with case updates and batch updates through inputs, biases, weighted sums, errors, activations, outputs, and backpropagation with a learning rate.
Introduction Model Building in R (10:44)
Explore practical neural networks in R by building a university admission model with inputs normalized to a 0 to 1 scale, a 70/30 training/testing split, and sigmoid activation.
Installing the Package of Model Building in R (11:14)
Install and load the neuralnet package in R, set up and run a fully connected neural network, and interpret weights, bias, and sigmoid outputs to predict admission probability.
Nodes in Model Building in R (8:29)
Learn how a neural network initializes random weights and biases, applies sigmoid activation, evaluates with training and testing data using a compute function, and tunes hidden layers to optimize accuracy.
Example of Model Building in R (8:19)
Explore building a neural network in R with a two-hidden-layer configuration, training and testing performance, parameter tuning, and model visualization to compare errors and iteration results.
Time Series Analysis (8:22)
Learn time series analysis and forecasting theory with autoregressive models such as ARMA, focusing on application over math. Distinguish univariate time series from cross-sectional data, and see how past values drive current outcomes.
Pattern in Time Series Data (8:13)
Explore patterns in time series across frequencies, including upward trends, sine wave patterns, and random series, and learn to identify trend, seasonality, cyclical, and random components for forecasting.
Time Series Modelling (8:48)
Explore univariate time series forecasting with autoregressive and moving average models, using past values and past errors, while recognizing white noise and ARIMA applications in central banks.
Moving Average Model (10:46)
Explore how moving average models rely on past error terms (white noise) and how ARMA and ARIMA blend autoregression, moving averages, and differencing to achieve stationarity.
Auto Correlation Function (8:27)
Learn to make a time series stationary via differencing, and use the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the model order p and the lags.
Inference of ACF and PACF (7:10)
Learn how ACF and PACF determine the number of past values in AR, MA, or ARMA models by using correlograms to select p and q, after ensuring stationarity.
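This sketch shows differencing and the correlogram values on a simulated random walk, not the course's data. Plotting is switched off so that the numbers can be inspected directly. As a rule of thumb, a PACF cut-off suggests the AR order p and an ACF cut-off suggests the MA order q.

```r
set.seed(7)
x  <- cumsum(rnorm(200))            # simulated random walk (non-stationary)
dx <- diff(x)                       # first difference: x[t] - x[t-1]

acf_x   <- acf(x,  lag.max = 20, plot = FALSE)   # decays slowly before differencing
acf_dx  <- acf(dx, lag.max = 20, plot = FALSE)   # near zero beyond lag 0 for white noise
pacf_dx <- pacf(dx, lag.max = 20, plot = FALSE)  # note: PACF output starts at lag 1

# Approximate 95% significance band used by the default correlogram
band <- qnorm(0.975) / sqrt(length(dx))

# Lag-1 autocorrelation before and after differencing
c(level = acf_x$acf[2], differenced = acf_dx$acf[2])
```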
Diagnostic Checking (9:07)
Analyze diagnostic checking in time series using regression and ARIMA: select models by R-squared and AIC/BIC, examine residuals, and compare random walk and exponential smoothing forecasts.
Forecasting Using Stock Price (10:18)
Forecast the SBI stock closing price with ARIMA using historical National Stock Exchange data prepared as a CSV file for R, covering the AR, differencing (integration), and moving average components.
Stock Price Index (10:35)
Stock Price Index Continues (9:44)
Perform ARIMA forecasting with p = 1, d = 1, q = 1 on a time series split into 733 training records and five test records, and compare models via AIC and confidence intervals.
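Here is a sketch of the ARIMA(1,1,1) workflow with the same 733/5 split, using base R's arima(). The series is a simulated placeholder, not the SBI closing prices, so the forecasts mean nothing beyond illustrating the steps.

```r
set.seed(11)
prices <- 100 + cumsum(rnorm(738))          # simulated placeholder series

train <- prices[1:733]                      # 733 training observations
test  <- prices[734:738]                    # final five held out

fit <- arima(train, order = c(1, 1, 1))     # AR(1), one difference, MA(1)
fc  <- predict(fit, n.ahead = 5)

forecast <- as.numeric(fc$pred)
se       <- as.numeric(fc$se)
result <- data.frame(
  actual   = test,
  forecast = forecast,
  lower95  = forecast - 1.96 * se,          # approximate 95% interval
  upper95  = forecast + 1.96 * se
)
AIC(fit)                                    # compare candidate orders by AIC
result
```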
Prophet Stock (5:17)
Forecast stock prices with time series insights and apply Prophet, a library developed by Facebook, in R. Prepare the data with the two columns Prophet expects: ds (date) and y (closing price).
Run Prophet Stock (8:18)
Learn how to forecast stock prices using Prophet in R, prepare data frames, build predictions, and visualize results with confidence intervals and trend components.
Time Series Data Deseasonalization (9:43)
Time Series Data Deseasonalization Continues (7:35)
Learn to create and customize an Excel line chart, label axes and titles, identify seasonality and trend, and apply a centered moving average (CMA) to smooth the data.
Average of Quarter Deseasonalization (11:19)
Identify seasonality and irregularity in time series, deseasonalize data by dividing by the seasonal component, and infer the trend with simple linear regression using Excel’s data analysis tool.
Regression of Deseasonalization (9:15)
Perform simple linear regression with R to estimate trend, seasonality, and predictions from time series data, using coefficients, p-values, and 95% confidence intervals.
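The same deseasonalization steps can be done in R: a centered moving average, seasonal indices from the average ratio per quarter, division by the index, and a linear trend. The quarterly sales figures are hypothetical.

```r
# Hypothetical quarterly sales over 4 years (16 quarters)
sales   <- c(48, 41, 60, 65, 58, 52, 68, 74, 60, 56, 75, 78, 63, 59, 80, 86)
quarter <- rep(1:4, times = 4)

# Centered moving average for quarterly data: weights 1/8, 1/4, 1/4, 1/4, 1/8
cma <- as.numeric(stats::filter(sales, c(0.5, 1, 1, 1, 0.5) / 4, sides = 2))

ratio    <- sales / cma                                 # seasonal x irregular component
seasonal <- tapply(ratio, quarter, mean, na.rm = TRUE)  # average ratio per quarter
seasonal <- seasonal / mean(seasonal)                   # normalize so the indices average 1

deseasonalized <- sales / seasonal[quarter]             # divide out the seasonal component
t_index   <- seq_along(sales)
trend_fit <- lm(deseasonalized ~ t_index)               # trend by simple linear regression
confint(trend_fit, level = 0.95)
```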
Gradient Boosting Machines (9:37)
Gradient boosting builds an ensemble of weak models in a stagewise fashion to optimize a differentiable loss for regression and classification, sequentially focusing on misclassified observations.
Errors in Gradient Boosting Machines (11:54)
Discover how gradient boosting turns weak classifiers into a strong ensemble through weighted voting, with h1, h2, h3 models and weights, and learn tricks to tackle overfitting.
What is Error Rate in Gradient Boosting Machines (9:34)
Explore how gradient boosting combines weak learners such as decision tree stumps to minimize the error rate, using initial equal weights and alpha for a weighted wisdom-of-crowds ensemble.
Optimization Gradient Boosting Machines (9:02)
Learn how gradient boosting machines optimize exponential loss, updating weights iteratively via alpha and a normalization factor Z, increasing weights for wrong predictions and decreasing them for correct ones.
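The weight update described above follows the AdaBoost formulation, in which alpha is the weak learner's vote weight. The sketch shows one round for five hypothetical observations and a stump that misclassifies one of them.

```r
y <- c( 1,  1, -1, -1,  1)           # true labels coded as +1 / -1 (hypothetical)
h <- c( 1, -1, -1, -1,  1)           # one weak learner's predictions: observation 2 is wrong
w <- rep(1 / length(y), length(y))   # start with equal weights

err   <- sum(w[h != y])              # weighted error rate (0.2 here)
alpha <- 0.5 * log((1 - err) / err)  # vote weight of this learner (log 2 here)

w_new <- w * exp(-alpha * y * h)     # y*h = -1 for mistakes, so their weight grows
Z     <- sum(w_new)                  # normalization factor
w_new <- w_new / Z                   # weights sum to 1 again

w_new                                # 0.125 for correct rows, 0.5 for the mistake
```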
Gradient Boosting Trees (GBT) (6:26)
Explore gradient boosting trees, which sequentially add shallow learners to correct errors; tune the learning rate, depth, and number of trees for better generalization and less overfitting, and compare them with random forest.
Dataset Boosting in Gradient (9:25)
Explore gradient boosting and AdaBoost, showing how weak learners are boosted to correct misclassifications and form a strong model, with an end-to-end R example.
Example of Dataset Boosting in Gradient (9:55)
Explore a dataset boosting example using gradient boosting and CART with rpart: create training and test sets, tune cp with cross-validation, and evaluate with a confusion matrix.
Example of Dataset Boosting in Gradient Continues (11:19)
Explore gradient boosting in R with gbm, comparing the adaboost and bernoulli distributions, tuning the number of trees and shrinkage, and evaluating with confusion matrices that show high accuracy.
Market Basket Analysis Association Rules (11:54)
Learn market basket analysis as an unsupervised method to uncover association rules among item sets using support, confidence, and lift.
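Support, confidence, and lift can be computed by hand for a single rule. The five baskets are hypothetical. In practice a package such as arules mines all rules at once, but the arithmetic is the same. Here the lift is below 1, which shows how a rule with high confidence can still be misleading.

```r
# Hypothetical transactions
baskets <- list(
  c("milk", "bread"),
  c("milk", "diapers", "beer"),
  c("bread", "butter"),
  c("milk", "bread", "butter"),
  c("milk", "bread", "beer")
)

# Support: share of baskets containing every item in the set
support <- function(items) mean(sapply(baskets, function(b) all(items %in% b)))

# Rule {milk} => {bread}
s_both     <- support(c("milk", "bread"))      # 0.6
confidence <- s_both / support("milk")         # P(bread | milk) = 0.75
lift       <- confidence / support("bread")    # 0.9375: below 1, so no real association
```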
Market Basket Analysis Association Rules Continues (10:37)
Explore market basket analysis as an unsupervised algorithm, deriving association rules through support, confidence, and lift, with practical examples and cautions against misleading results.
Market Basket Analysis Interpretation (7:41)
Interpret market basket association rules by measuring support, confidence, and lift on region-based transaction data to understand how product pairs co-occur and influence purchase likelihood.
Implementation of Market Basket Analysis (5:19)
Learn how to apply market basket analysis using association rules to optimize store layout, cross-selling, and pricing strategies, including loss leader tactics and product placement.
Example of Market Basket Analysis (9:22)
Explore market basket analysis using a groceries dataset and CSV file, applying association rules with support and confidence, and using Excel Miner and R for data mining and modeling.
Datamining in Market Basket Analysis (10:29)
Apply market basket analysis with association rules and the apriori algorithm on a grocery CSV dataset, using binary sparse matrices and metrics such as support, confidence, and lift. See how to configure data ranges, handle binary data formats, and move from Excel outputs to RStudio for analysis.
Market Basket Analysis Using RStudio (9:17)
Learn market basket analysis in RStudio using the arules package to read groceries transaction data, convert it to sparse matrices, and explore item frequencies and basket statistics.
Market Basket Analysis Using RStudio Continues (9:26)
Explore market basket analysis in RStudio by inspecting baskets, computing item frequencies, and building apriori rules with support and confidence to reveal frequent associations.
More on RStudio in Market Analysis (11:52)
Explore association rules in RStudio for market analysis: inspect rules, sort them by lift, fix the left-hand side and right-hand side, and visualize rules as graphs for items such as herbs, vegetables, and milk.
New Development in Machine Learning (10:59)
Explore new developments in machine learning with R, including data science scope, salaries, and real-world examples such as acquiring Twitter data, building a Facebook chatbot, and key data scientist skills.
Data Scientist in Machine Learning (10:33)
Explore the data science landscape, including roles, skills, tools, and Google's machine learning APIs for vision, speech, translation, and video intelligence.
Types of Detection in Machine Learning (11:02)
Explore how the Google Vision API detects labels, faces, OCR text, landmarks, and explicit content via REST, returning JSON for images and enabling longitude, latitude, and price insights.
Example of New Development in Machine Learning (10:07)
Learn natural language processing in R, perform sentiment analysis on hotel and food reviews, and visualize analytics with BigQuery and Firebase; experiment with word clouds from tweets via Twitter packages.
Example of New Development in Machine Learning Continues (5:07)
Extract and preprocess Twitter data in R: convert tweets to text, build a corpus, clean the data, and create a word cloud to highlight frequent terms and prepare for sentiment analysis.

Working on Linear Regression (15:57)
Use linear regression to predict a dependent variable from an independent variable, showing how correlation relates to regression with simple and multiple regression and the regression equation.
Equation (11:51)
Discover how linear regression fits a line of best fit through (x-bar, y-bar) that minimizes squared errors, and use it to predict salaries.
Making the Regression of the Algorithm (5:45)
Explore the dataset and fit a linear regression model with lm, plotting tip values and adding a mean line. Contrast linear regression with logistic regression: continuous versus categorical outcomes.
Basic Types of Algorithms (13:13)
Learn how to fit a simple linear regression in R using y ~ x with tip and bill data, and visualize the regression line.
Predicting the Salary of the Employee (15:56)
Learn to build a simple linear regression in R to predict employee salary from years of experience, with training and test splits and RMSE evaluation.
Making of Simple Linear Regression Model (8:04)
Build a simple linear regression model in R using lm to predict salary from years of experience with training and test sets, and evaluate it with MSE, RMSE, and MAPE.
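These are the three error metrics from this lecture applied to a handful of hypothetical actual and predicted salaries. RMSE is in the units of the target, and MAPE is a percentage.

```r
actual    <- c(100, 200, 300, 400)      # hypothetical salaries (thousands)
predicted <- c(110, 190, 330, 380)      # hypothetical model predictions

mse  <- mean((actual - predicted)^2)                      # mean squared error
rmse <- sqrt(mse)                                         # back in the target's units
mape <- mean(abs((actual - predicted) / actual)) * 100    # mean absolute percentage error

c(MSE = mse, RMSE = rmse, MAPE = mape)
```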
Plotting Training Set and Work (17:20)
Visualize training and test set results with ggplot2 by plotting points and the regression line, and interpret coefficients, residuals, R-squared, and p-values in simple and multiple linear regression.
Multiple Linear Regression (12:58)
Explore multiple linear regression with several predictors to predict profit using a dataset of 50 startups, handling dummy variables for state and evaluating model performance with R-squared.
Dummy Variable Concept (7:05)
Explore dummy variable concepts in feature engineering for multiple linear regression, using one-hot encoding for state categories and avoiding the dummy variable trap, with R creating the dummies automatically.
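R's model.matrix() shows how R builds dummies automatically, and it also shows why a full one-hot encoding combined with an intercept falls into the dummy variable trap. The rows below are hypothetical, not the 50-startups data.

```r
startups <- data.frame(                      # hypothetical rows
  State  = factor(c("New York", "California", "Florida", "New York", "Florida")),
  Profit = c(190, 185, 170, 160, 150)
)

# Treatment coding: one dummy per level except the baseline (first level, California)
X <- model.matrix(Profit ~ State, data = startups)
colnames(X)   # "(Intercept)" "StateFlorida" "StateNew York"

# Full one-hot encoding plus an intercept is perfectly collinear: the dummy trap
one_hot <- model.matrix(~ State - 1, data = startups)
trap    <- cbind(Intercept = 1, one_hot)
qr(trap)$rank   # 3, although trap has 4 columns
```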
Predictions Over Year (10:00)
Learn to generate yearly predictions, evaluate accuracy with RMSE and MAPE, and improve models using backward elimination, forward selection, and bidirectional approaches at alpha = 0.05.
Difference Between Reference Elimination (9:48)
Compare backward elimination and forward selection in regression by evaluating p-values and stopping when all remaining variables meet the 0.05 threshold, highlighting the dominance of R&D spend.
Working of the Model (13:08)
Evaluate a linear regression model with R-squared and adjusted R-squared, explaining SSR, SSE, and SST, and assess RMSE and MAPE while applying backward elimination and forward selection, including marketing spend.
Working on Another Dataset (14:08)
Model medv on the Boston housing data using multiple linear regression, assess multicollinearity with correlation plots from corrplot, and identify key predictors such as rooms per dwelling and crime rate.
Backward Elimination Approach (15:48)
Apply backward elimination to refine a regression model by removing the variable with the highest p-value while monitoring R-squared, adjusted R-squared, and AIC via the step function.
Making of the Model with Full and Null (12:28)
Explore backward elimination and forward selection in R using lm and the step function to build models from null to full, compare RMSE and MAPE, and assess multicollinearity with VIF.
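The variance inflation factor for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. This base-R sketch uses mtcars in place of the course's Boston data, and vif_manual is a hypothetical helper, not a package function.

```r
# Hypothetical helper: VIF for each predictor by regressing it on the others
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)      # VIF_j = 1 / (1 - R_j^2)
  })
}

vifs <- vif_manual(mtcars, c("disp", "hp", "wt", "drat"))
round(vifs, 2)        # larger values mean stronger multicollinearity
```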

Intro to Machine Learning Project1:18
Learn to build a random forest in R using caret and ggplot, with train control and train functions, plus oversampling, imputation, stratified sampling, and cross-validation.
Starting with the Machine Learning Project10:59
Conduct a machine learning project to predict the bankruptcy of Polish manufacturing firms using a 7,000-company, five-year dataset with 64 financial attributes, addressing missing data and oversampling the minority class, with models evaluated by AUC.
Reading Files in the List10:01
Read the list of five data frames, rename the class and attribute names, convert zeros and ones to meaningful labels, and apply a change_names function across the list.
Mapping the Missing Data10:06
Explore mapping and imputing missing data in a machine learning workflow using the Amelia package to visualize missingness and the mice package for imputation, while assessing correlations and data loss.
Checking the Attributes9:42
Convert missingness to 0/1 indicators with is.na, identify attributes with any missing values, compute correlations, and visualize them as a heatmap with the ComplexHeatmap package.
Creating Lower Triangular Correlation Matrix12:18
Create a lower triangular correlation matrix heatmap that displays only correlations above 0.7, with values printed to one decimal place, to reveal missingness patterns in the financial data.
Calculating Data Imbalance10:12
Calculate the data imbalance by computing the percentage of bankrupt companies and of those still operating as going concerns, then impute missing data using mean imputation and predictive mean matching with mice.
Choose the Imputation9:21
Select the pmm (predictive mean matching) method and complete the data with the mice package, then oversample the minority class using SMOTE with five nearest neighbors and check for any remaining missing values.
Preprocess the Imputed Data11:13
Learn how to preprocess imputed data by renaming labels to meaningful factor levels, partitioning the data with caret, and training a random forest with repeated cross-validation and parallel processing.
Make Clusters10:19
Train a random forest classifier in R using a formula with trainControl; reload saved data with readRDS, predict, and assess the results with a confusion matrix and an ROC curve (AUC 0.9985).
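A minimal base-R sketch of three preprocessing steps from this project: measuring class imbalance, turning missingness into 0/1 indicators with is.na, and mean imputation. The data frame below is hypothetical toy data, not the Polish bankruptcy dataset, and the column names Attr1, Attr2, and class are assumptions. The lectures use mice (including predictive mean matching) and SMOTE for imputation and oversampling; those packages are not shown here.

```r
# Hypothetical toy data standing in for the bankruptcy attributes.
# class: 1 = bankrupt, 0 = still a going concern.
df <- data.frame(
  Attr1 = c(0.20, NA, 0.35, 0.10, 0.50, 0.42),
  Attr2 = c(1.10, 0.90, NA, 1.40, 1.00, 1.20),
  class = c(0, 0, 0, 1, 0, 0)
)

# Class imbalance: percentage of bankrupt companies vs. going concerns.
pct_bankrupt <- 100 * mean(df$class == 1)
cat(sprintf("Bankrupt: %.1f%%, going concern: %.1f%%\n",
            pct_bankrupt, 100 - pct_bankrupt))

# Missingness as 0/1 indicators (1 = value is NA), one column per attribute.
attrs <- c("Attr1", "Attr2")
miss <- as.data.frame(lapply(df[attrs], function(x) as.integer(is.na(x))))
print(colSums(miss))  # number of missing values per attribute

# Mean imputation: replace each NA with the mean of that column's observed values.
df_mean <- df
for (col in attrs) {
  observed_mean <- mean(df_mean[[col]], na.rm = TRUE)
  df_mean[[col]][is.na(df_mean[[col]])] <- observed_mean
}
print(df_mean)
```

Mean imputation is the simplest baseline; it shrinks the variance of each imputed column, which is one reason predictive mean matching is often preferred.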

Requirements

No prior knowledge of machine learning is required.
Basic knowledge of R.

Description

Data Scientist has been ranked the number one job on Glassdoor, and the average salary of a data scientist is over $120,000 in the United States according to Indeed! Data Science is a rewarding career that allows you to solve some of the world's most interesting problems! This course is designed both for complete beginners with no programming experience and for experienced developers looking to make the jump to Data Science! This comprehensive course is comparable to other ML bootcamps that usually cost thousands of dollars, but now you can learn all that information at a fraction of the cost! This is one of the most comprehensive courses for data science and machine learning. We'll teach you how to program with R, how to create amazing data visualizations, and how to use Machine Learning with R!

Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that model to make predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to, and often overlaps with, computational statistics, a discipline that also specializes in prediction-making. This training is an introduction to the concept of machine learning and its application using R.

The training will include the following:

Introducing Machine Learning

a. The origins of machine learning

b. Uses and abuses of machine learning

c. Ethical considerations

d. How do machines learn?

e. Steps to apply machine learning to your data

f. Choosing a machine learning algorithm

g. Using R for machine learning

Forecasting Numeric Data – Regression Methods

Understanding regression

Example – predicting medical expenses using linear regression

a. Collecting data

b. Exploring and preparing the data

c. Training a model on the data

d. Evaluating model performance

e. Improving model performance

Who this course is for:

Anyone who wants to learn about data and analytics, including data engineers, analysts, architects, software engineers, IT operations staff, and technical managers
