
Explore how machine learning turns data into intelligent action using statistical methods. Understand data input, abstraction, and generalization, plus the ethical and legal considerations.
Explore abstraction and knowledge representation that turn data into meaningful models, and examine training, generalization, overfitting, underfitting, bias, and evaluation.
Collect data in electronic format, then explore and prepare it with missing value imputation, outlier detection, and exploratory data analysis, before training and evaluating a machine learning model.
Explore supervised learning with regression and classification, contrast with unsupervised clustering and collaborative filtering, and apply these approaches in R using distributed tools like Hadoop and H2O.
Learn basic data manipulation in R, using vectors, matrices, and data frames, and master interactive analysis with RStudio and the R console.
Explore the RStudio interface, including history, environment, plots, and help, and learn to load libraries and datasets such as air quality for data manipulation in R.
Master basic data manipulation in R and navigate RStudio and the R console to run commands, manage data frames, and leverage the CRAN package ecosystem.
Learn how to create vectors with the c constructor in R, distinguish vectors from scalars, and compute basic statistics: mean, median, sd, var, correlation, and covariance using practical examples.
Handle missing values with na.rm = TRUE in mean and sd, and perform element by element comparisons between vectors and scalars, including sequences like 1 to 10 with length.out.
Explore conditional selection on vectors, naming and accessing elements by name, vector arithmetic with recycling rules, and namespace precedence using :: and $ in R programming.
Explore R operators, including right-to-left exponentiation and unary, arithmetic, logical, and bitwise operators. Master function definitions, assignment directions, scope, and practical examples such as CV and gcd.
Identify common R mistakes, such as escaping backslashes, using correct sequence syntax, and distinguishing assignment from comparison, then apply proper vector, list, and data frame handling for accurate results.
Learn simple linear regression and other regression forms, and fit a line in R using ordinary least squares with intercept and slope, plus r square and p values.
Explore simple linear regression in R by building a model with x and y data, visualizing the fitted line with abline, and interpreting the intercept and slope from the model summary.
Compute x and y values, derive mean of y, and obtain estimated y from the regression to calculate r square, then explain adjusted r square for model complexity.
Explore standard error as the difference between actual and estimated values, compute residuals and residual sums of squares, and evaluate linear regression with f statistics and null hypotheses in Excel.
Compute the f-statistic under restricted conditions and compare it with the f critical value to conclude that the coefficients are not zero using R.
Learn to compute p values and t statistics from a regression model using the variance-covariance diagonal, with coefficients 2.2 and 0.6, and interpret residuals, F statistics, R², and adjusted R².
Master simple linear regression in R, learning intercept and slope, r-squared, t and f statistics, p values, residuals, and how ordinary least squares fits a regression line.
Open RStudio and fit a simple linear regression with lm to obtain y = 2.2 + 0.6x. Inspect the model summary for coefficients, residuals, and R-squared values.
Explore how r square measures the fit of a simple regression with x values 1–5 and mean y of 4, and how adjusted r square accounts for model complexity.
Explore how standard error derives from residuals in linear regression, including residual standard error, residual sum of squares, and the role of F statistics and null hypotheses in Excel.
Learn to test a linear regression model by rejecting the null hypothesis that coefficients are zero, using f statistics, t values, p values, and interpreting residuals and r-squared.
Explore variance, covariance, and correlation, and learn how these measures describe data spread, how two series change together, and how they inform regression analysis.
Explore covariance and correlation with a simple linear regression model, test significance via Pearson correlation and p-values, and learn about normal, binomial, and Poisson distributions with R functions.
Learn to generate random numbers in R with runif and set.seed, specify min and max ranges, use sample with replacement options, and model normal data with rnorm and dnorm.
Visualize a normal distribution with mean 50 and sd 5, and compute binomial probabilities in R using dbinom and pbinom for 25 widgets at p=0.005 and 20 trials at p=1/6.
Explore the Poisson distribution, its lambda parameter, and the probability of x events in a fixed time, then connect to normal distribution quantiles and R functions pnorm, qnorm, and dt.
Learn to perform a one-sample t test, compute the t statistic, and decide on the null hypothesis using critical values with df = n-1, including p-values from air quality data.
Use z scores and the empirical rule to interpret p values and reject the null. The lecture covers normal distribution, six sigma, central limit theorem, and applying p.test in R.
Explore multiple linear regression in R with lm, visualize data, and select predictors using the Cars93 data set, converting type to a factor and applying aggregate insights.
Visualize car data with coplots and pairs plots, explore how horsepower, rpm, weight, and origin affect city mpg, and convert origin to a factor for a linear model.
Explore multiple linear regression by building model one and model two to predict city mpg, comparing p-values, R-squared, and adjusted R-squared, with AIC and step commands and a 70/30 train-test split.
Apply stepwise multiple linear regression with the lm function, using backward and forward selection to add or remove predictors. Use AIC as the criterion to identify the best fitting model.
Run regression analysis using forward and backward variable selection, compare models with AIC and RSS, and validate predictions on the test set to reach a model with 85% accuracy.
Explore generalized linear models and generalized least squares, and contrast lm regression with glm on air quality data (ozone as y, wind as x) after removing NAs.
Apply generalized least squares to model ozone on wind and date, assess autocorrelation, compare models with AIC, and predict missing ozone values using gls on the air quality data.
Explore k-nearest neighbors as a supervised classifier, using distance measures like Euclidean, Manhattan, Minkowski, Hamming, cosine, and Jaccard to label new items by majority vote.
Apply kNN in R by exploring distance measures such as Hamming and Euclidean, and data normalization. Build and evaluate a model on iris data using train-test splits and misclassification rate.
Normalize the iris data, create iris_new, and split into 100 training and 50 test records to apply knn for species prediction using sepal and petal features.
Explore k-nearest neighbors classification on the iris data set, using k=11, and assess performance with a confusion matrix and ROC, reporting accuracy, true positive rate, and false positive rate.
Construct and prune decision trees, distinguish classification and regression trees, and apply splitting criteria such as information gain, Gini index, and reduction in variance with practical examples.
Explore how to compute the Gini index for features B, C, and D, compare Gini gains to identify the root level split, and understand data splitting by y.
Explore building a decision tree using the Gini index for splits, in Excel and in R with the Carseats data, including creating a sales indicator and plotting.
Convert the sales indicator to a factor, train a decision tree model, and visualize its structure. Use cross-validation and pruning to reduce misclassification and improve generalization.
Prune the decision tree to reduce overfitting, compare misclassification error with test data, and observe that pruning may improve or sometimes not improve performance.
Explore the general concept of random forest, a supervised learning model that builds many decision trees via random data and feature selection to improve classification and regression accuracy.
Explore AdaBoost, an adaptive boosting approach that sequentially refines models by weighting weak features, and harness ensemble learning to combine multiple algorithms for improved predictions, including random forest.
Import the CTG data, convert NSP to a factor, split training and test sets, build a 500-tree random forest, and assess with out-of-bag error and confusion matrices.
Tune a random forest in R by adjusting the number of trees and mtry, compare out-of-bag error, train/test accuracy, and feature importance, then plot a tree-size histogram to assess model performance.
Explore how to evaluate a random forest model using variable importance plots, Gini gains, and accuracy impact when features like class and LTV are removed, plus partial plots.
Explore k-means clustering, an unsupervised vector quantization method that partitions data into k clusters using centroid distances, distance metrics (Euclidean, Manhattan, Minkowski), and the elbow method.
Apply k-means clustering to a 30,000-teen dataset, assign cluster labels to the original data, and reveal dominant interests per group.
Learn how aggregation functions summarize cluster characteristics by age, gender, and social behavior, and use r square and within, between, and total sum of squares to evaluate k-means clustering.
Explore the Naive Bayes classifier and Bayes theorem, including conditional probability and class conditional independence, and apply to spam filtering and text classification.
Explore how joint probability and the Venn diagram illuminate naive Bayes classification for spam detection, using prior, likelihood, marginal likelihood, and posterior probabilities.
Explore constructing a frequency table and a likelihood table to apply Bayes theorem for text classification, highlighting Naive Bayes' conditional probability and spam filtering.
Explore how Naive Bayes classification combines feature probabilities, priors, and the Laplace estimator to handle zero counts and distinguish spam from ham.
Learn how Naive Bayes uses discretization and Laplace smoothing to handle numerical features, builds frequency-based models, and applies to SMS spam filtering with an R example.
Learn to perform text mining in R with the tm package: create a cleaned SMS corpus and a document-term matrix for spam classifier training.
Explore spam detection with R by building a term-document matrix from SMS data, splitting into training and test sets, and visualizing spam and ham word clouds.
Learn to build a document-term matrix from an SMS corpus, filter to frequent terms, convert counts to binary yes/no, and train a Naive Bayes model with e1071 for spam detection.
Explore Naive Bayes classification for SMS spam filtering in R, including training data, cross tables, Laplace smoothing, and evaluating misclassification rates.
Explore how support vector machines, a black box method, use a hyperplane to separate data with maximum margin, enabling binary classification, numerical prediction, and pattern recognition in high-dimensional spaces.
Discover how to find the maximum margin hyperplane for linearly separable data using convex hulls and quadratic optimization, and apply slack variables with C for soft margins.
Learn how support vector machines use the kernel trick to transform nonlinear data into a higher-dimensional space for linear separation, using linear, polynomial, and sigmoid kernels.
Explore the Gaussian RBF kernel and SVM-based OCR, using a 16-feature letter dataset to train and test, and compare kernel choices through trial and error.
Learn how a Gaussian RBF kernel used with an SVM boosts OCR of letters from 84% to 93% accuracy, with tuning on a 16,000-record training set and a 4,000-record test set.
Explore how to tune a support vector machine with radial kernel, selecting the best cost and gamma to achieve high accuracy, as demonstrated with 97% test accuracy and 116 misclassifications.
Learn principal component analysis and kernel PCA for dimensionality reduction, and explore graph-based kernel PCA, Isomap, LLE, Hessian LLE, Laplacian eigenmaps, and LTSA, plus LDA and GDA.
The lecture demonstrates dimension reduction in R by loading the caret and corrplot libraries, removing features with high missing values and near-zero variance, computing correlations, and dropping highly correlated predictors.
Perform feature selection with random forest, ranking predictors by mean decrease accuracy to reveal top features, then build a reduced data set for training and testing in dimension reduction.
Explore principal component analysis as a dimension reduction technique and learn how it uses eigenvalues and eigenvectors to perform an orthogonal transformation into linearly uncorrelated principal components.
Learn to perform principal component analysis by centering data, building the covariance matrix, and computing eigenvalues and eigenvectors; multiply by the eigenvectors to obtain orthogonal principal components.
Explore how principal components reduce data from two dimensions to one along the x axis, illustrating information loss, compression in apps like WhatsApp, and the eigenvalue–eigenvector framework behind PCA.
Learn to compute eigenvalues and eigenvectors of a 2x2 matrix and verify results in R, finding lambda values -1 and 8 with corresponding eigenvectors.
Explore eigenvalues and eigenvectors of a matrix, solve for lambda values, and find corresponding eigenvectors, using R and matrix transformations. Understand how PCA leverages eigenvalues to preserve maximum information.
Apply principal component analysis in R using prcomp on the iris data, after scaling features; interpret standard deviation, rotations, and variance proportions, and use a scree plot to decide components.
Explore principal component analysis with prcomp on iris data, interpret PC1 and PC2 variance, use loadings (rotation), and create a PCA biplot.
Explore PCA on the iris data set, select the first two principal components for training and test data, and build a classification model with a decision tree in R.
Build an rpart model on iris data, predict with iris_pc_train and iris_pc_test, convert probabilities to species, and report misclassifications; explore standardization and PCA with eigenvalues, loadings, and scores.
This lecture demystifies neural networks as powerful but opaque black box models, explains their brain-inspired architecture, and surveys practical applications from speech recognition to self-driving cars.
Explain how artificial neurons process inputs with weighted sums and activation functions, compare threshold and sigmoid activations, and discuss normalization and standardization to squash inputs for training.
Explore neural network topology, from single-layer networks with input and output nodes to multilayer networks with hidden layers, fully connected designs, and backpropagation training.
Learn how neural networks adjust weights with case updates and batch updates, through inputs, biases, sums, errors, activations, outputs, and backpropagation with learning rate.
Explore practical neural networks in R by building and normalizing a university admission model using a 0 to 1 scale, 70/30 training/testing split, and sigmoid activation.
Install and load the neuralnet package in R, set up and run a fully connected neural network, and interpret weights, bias, and sigmoid outputs to predict admission probability.
Learn how a neural network initializes random weights and biases, applies sigmoid activation, evaluates with training and testing data using a compute function, and tunes hidden layers to optimize accuracy.
Explore building a neural network in R with a two-hidden-layer configuration, training and testing performance, parameter tuning, and model visualization to compare errors and iteration results.
Learn time series analysis and forecasting theory with autoregressive models like ARMA, focusing on application over math. Distinguish univariate time series from cross-sectional data and how past values drive current outcomes.
Explore patterns in time series across frequencies, including upward trends, sine wave patterns, and random series, and learn to identify trend, seasonality, cyclical, and random components for forecasting.
Explore univariate time series forecasting with autoregressive and moving average models, using past values and past errors, while recognizing white noise and ARIMA applications in central banks.
Explore how moving average models rely on past error terms (white noise) and how ARMA and ARIMA blend autoregression, moving averages, and differencing to achieve stationarity.
Learn to make a time series stationary via differencing, and use the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the model order p and the relevant lags.
Learn how ACF and PACF determine the number of past values in AR, MA, or ARMA models by using correlograms to select p and q, after ensuring stationarity.
Analyze diagnostic checking in time series, using regression and ARIMA, to select models by R square and AIC/BIC, examine residuals, and compare random walk and exponential smoothing forecasts.
Forecast SBI stock closing price with ARIMA, using National Stock Exchange historical data and AR, differencing, integration, and moving average components, prepared in CSV for R.
Perform ARIMA forecasting with p=1, d=1, q=1 on a time series split into 733 training records and five test records, and compare models via AIC and confidence intervals.
Forecast stock prices with time series insights and apply Prophet, a library developed by Facebook, in R. Prepare data with two parameters: y (closing price) and date.
Learn how to forecast stock prices using Prophet in R, prepare data frames, build predictions, and visualize results with confidence intervals and trend components.
Learn to create and customize an Excel line chart, label axes and titles, identify seasonality and trend, and apply a centered moving average (CMA) to smooth data.
Identify seasonality and irregularity in time series, deseasonalize data by dividing by the seasonal component, and infer the trend with simple linear regression using Excel’s data analysis tool.
Perform simple linear regression with R to estimate trends, seasonality, and predictions from time series data, using coefficients, p-values, and 95% confidence intervals.
Gradient boosting builds an ensemble of weak models in a stagewise fashion to optimize a differentiable loss for regression and classification, sequentially focusing on misclassified observations.
Discover how gradient boosting turns weak classifiers into a strong ensemble through weighted voting, with h1, h2, h3 models and weights, and learn tricks to tackle overfitting.
Explore how gradient boosting combines weak learners like decision tree stumps to minimize the error rate, using initial equal weights and alpha for a weighted wisdom of crowds ensemble.
Learn how gradient boosting machines optimize with exponential loss, updating weights iteratively via a learning rate alpha and z normalization, increasing weights for wrong predictions and decreasing for correct ones.
Explore gradient boosting trees that sequentially add shallow learners to correct errors, tune learning rate, depth, and number of trees for better generalization and reduced overfitting vs random forest.
Explore gradient boosting and AdaBoost, showing how weak learners are boosted to correct misclassifications and form a strong model, with an end-to-end R example.
Explore a boosting example on a dataset using gradient boosting and CART with rpart, creating training and test sets, tuning cp, cross-validation, and evaluating with a confusion matrix.
Explore gradient boosting in R with gbm, comparing the adaboost and bernoulli distribution settings, tuning trees and shrinkage, and evaluating with confusion matrices showing high accuracy.
Learn market basket analysis as an unsupervised method to uncover association rules among item sets using support, confidence, and lift.
Explore market basket analysis as an unsupervised algorithm, deriving association rules through support, confidence, and lift, with practical examples and cautions against misleading results.
Interpret market basket association rules by measuring support, confidence, and lift on region-based transaction data to understand how product pairs co-occur and influence purchase likelihood.
Learn how to apply market basket analysis using association rules to optimize store layout, cross selling, and pricing strategies, including loss leader tactics and product placement.
Explore market basket analysis using a groceries data set and csv file, applying association rules with support and confidence, using Excel Miner and R for data mining and modeling.
Apply market basket analysis with association rules and the apriori algorithm on a grocery CSV dataset, using binary sparse matrices and metrics like support, confidence, and lift. See how to configure data ranges, handle binary data formats, and transition from Excel outputs to RStudio for analysis.
Learn market basket analysis in RStudio using the arules package to read groceries transaction data, convert to sparse matrices, and explore item frequencies and basket statistics.
Explore market basket analysis in RStudio by inspecting baskets, computing item frequencies, and building apriori rules with support and confidence to reveal frequent associations.
Explore association rules in RStudio for market analysis: inspect, sort by lift, fix left-hand side and right-hand side, and visualize rules with graphs for items like herbs, vegetables, and milk.
Explore new developments in machine learning with R, including data science scope, salaries, and real-world examples like acquiring Twitter data, building a Facebook chatbot, and key data scientist skills.
Explore the data science landscape, including roles, skills, tools, and Google's machine learning APIs for vision, speech, translation, and video intelligence.
Explore how the Google Vision API detects labels, faces, OCR text, landmarks, and explicit content via REST, returning JSON for images and enabling longitude, latitude, and price insights.
Learn natural language processing in R, perform sentiment analysis on hotel and food reviews, and visualize analytics with BigQuery and Firebase; experiment with word clouds from tweets via Twitter packages.
Extract and preprocess Twitter data in R, convert tweets to text, build a corpus, clean the data, and create a word cloud to highlight frequent terms and prepare for sentiment analysis.
Use linear regression to predict a dependent variable from an independent variable, showing how correlation relates to regression with simple and multiple regression and the regression equation.
Discover how linear regression fits a line of best fit through x-bar and y-bar to minimize squared errors and predict salaries.
Explore the data set and fit a linear regression model with lm, plotting tip values and adding a mean line. Contrast linear regression with logistic regression, continuous versus categorical outcomes.
Learn how to fit a simple linear regression in R using y ~ x with tip and bill data, and visualize the regression line.
Learn to build a simple linear regression in R to predict employee salary from years of experience, with training and test splits and RMSE evaluation.
Build a simple linear regression model in R using lm to predict salary from years of experience, train and test sets, and evaluate with MSE, RMSE, and MAPE.
Visualize training and test set results with ggplot2 by plotting points and the regression line, and interpret coefficients, residuals, R-squared, and p-values in simple and multiple linear regression.
Explore multiple linear regression with several predictors to predict profit using a dataset of 50 startups, handling dummy variables for state and evaluating model performance with r-squared.
Explore dummy variable concepts in feature engineering for multiple linear regression, using one-hot encodings for state categories, avoiding the dummy variable trap, with R automating dummies.
Learn to generate yearly predictions, evaluate accuracy with RMSE and MAPE, and improve models using backward elimination, forward selection, and bidirectional approaches at alpha 0.05.
Compare backward elimination and forward selection in regression by evaluating p-values and stopping when remaining variables meet the 0.05 threshold, highlighting the dominance of R&D spend.
Evaluate a linear regression model with R square and adjusted R square, explaining SSR, SSE, SST, and assess RMSE, MAPE while applying backward elimination and forward selection including marketing spend.
Model medv on the Boston housing data using multiple linear regression, assess multicollinearity with correlation plots from corrplot, and identify key predictors like rooms per dwelling and crime rate.
Apply backward elimination to refine a regression model by removing the highest p-value variable while monitoring R-squared, adjusted R-squared, and AIC via the step function.
Explore backward elimination and forward selection in R using lm and the step function to build models from null to full, compare RMSE and MAPE, and assess multicollinearity with VIF.
Learn to build a random forest in R using caret and ggplot, with train control and train functions, plus oversampling, imputation, stratified sampling, and cross-validation.
Conduct a machine learning project to predict bankruptcy of Polish manufacturing firms using 64 financial attributes and a 7,000-company, 5-year dataset, addressing missing data and oversampling the minority class, evaluated with AUC.
Read the list of five data frames, rename the class and attribute names, convert zeros and ones to meaningful labels, and apply a change_names function across the list.
Explore mapping and imputing missing data in a machine learning workflow using the Amelia package to visualize missingness and the mice package for imputation, while assessing correlations and data loss.
Convert missing indicators to 0/1 with is.na, identify attributes with any missing values, compute correlations, and visualize them as a heatmap with the ComplexHeatmap package.
Create a lower triangular correlation matrix heat map by displaying only correlations above 0.7, printing values with one decimal place, and generating visualizations to reveal missingness patterns in financial data.
Calculate data imbalance by computing the percentage of bankrupt companies and how many remain going concerns, then impute missing data using mean imputation and predictive mean matching with mice.
Select the pmm method and complete the data with the mice package, then oversample the minority class using SMOTE with five neighbors and check remaining missing values.
Learn how to preprocess imputed data by renaming labels to meaningful factors, partition data with caret, and train a random forest with repeated cross-validation and parallel processing.
Train a random forest classifier in R using a formula with train control; load audio data with radius, predict, and assess with a confusion matrix and ROC curve AUC 0.9985.
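Several of the regression summaries above quote a fitted line of y = 2.2 + 0.6x for x values 1 to 5 with a mean y of 4. The sketch below shows one way such a fit might be reproduced with lm. The y values are an assumption (they are not given in the summaries) and were chosen only because they yield those same numbers.

```r
# Assumed data: the y values are illustrative, picked so that the fit
# matches the intercept, slope, and mean of y quoted in the summaries.
x <- 1:5
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)              # ordinary least squares
coefs <- coef(fit)            # intercept 2.2, slope 0.6

# R-squared by hand: 1 - SSE / SST
y_hat <- fitted(fit)
sse <- sum((y - y_hat)^2)     # residual sum of squares
sst <- sum((y - mean(y))^2)   # total sum of squares
r2 <- 1 - sse / sst

# Adjusted R-squared penalizes for the number of predictors (k = 1 here)
n <- length(y)
k <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(coefs)
print(c(r2 = r2, adj_r2 = adj_r2))
```

The hand-computed values can be compared against summary(fit), which reports the same coefficients along with R-squared, adjusted R-squared, t values, and the F statistic.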
Data Scientist has been ranked the number one job on Glassdoor, and the average salary of a data scientist is over $120,000 in the United States according to Indeed! Data Science is a rewarding career that allows you to solve some of the world's most interesting problems! This course is designed for both complete beginners with no programming experience and experienced developers looking to make the jump to Data Science! This comprehensive course is comparable to other ML bootcamps that usually cost thousands of dollars, but now you can learn all that information at a fraction of the cost! This is one of the most comprehensive courses for data science and machine learning. We'll teach you how to program with R, how to create amazing data visualizations, and how to use Machine Learning with R!
Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to and often overlaps with computational statistics, a discipline that also specializes in prediction-making. This training is an introduction to the concept of machine learning and its application using R.
The training will include the following:
Introducing Machine Learning
a. The origins of machine learning
b. Uses and abuses of machine learning
Ethical considerations
How do machines learn?
Steps to apply machine learning to your data
Choosing a machine learning algorithm
Using R for machine learning
Forecasting Numeric Data – Regression Methods
Understanding regression
Example – predicting medical expenses using linear regression
a. collecting data
b. exploring and preparing the data
c. training a model on the data
d. evaluating model performance
e. improving model performance
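
The five steps listed for the medical expenses example can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the data are simulated, and the column names (age, bmi, smoker, expenses) and every coefficient used to generate them are assumptions made only so the script runs on its own.

```r
set.seed(42)

# a. Collecting data: a simulated stand-in for a medical expenses table
#    (hypothetical columns and effect sizes)
n <- 500
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = rnorm(n, mean = 30, sd = 6),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
is_smoker <- as.numeric(insurance$smoker == "yes")
insurance$expenses <- 2000 + 250 * insurance$age + 300 * insurance$bmi +
  15000 * is_smoker + 1500 * is_smoker * (insurance$bmi - 30) +
  rnorm(n, sd = 3000)

# b. Exploring and preparing the data: summary statistics, 70/30 split
print(summary(insurance))
train_idx <- sample(n, size = round(0.7 * n))
train <- insurance[train_idx, ]
test  <- insurance[-train_idx, ]

# c. Training a model on the data
base_model <- lm(expenses ~ age + bmi + smoker, data = train)

# d. Evaluating model performance: root mean squared error on held-out data
rmse <- function(model, data) {
  sqrt(mean((data$expenses - predict(model, newdata = data))^2))
}
rmse_base <- rmse(base_model, test)

# e. Improving model performance: add a bmi-by-smoker interaction term
improved_model <- lm(expenses ~ age + bmi * smoker, data = train)
rmse_improved <- rmse(improved_model, test)

print(c(base = rmse_base, improved = rmse_improved))
```

Because the simulated expenses include a bmi-by-smoker effect, the interaction model should fit the held-out data better here; on real data, whether such a term helps is something to check rather than assume.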