Regression, Data Mining, Text Mining, Forecasting using R

2,607 students enrolled


Learn Regression Techniques, Data Mining, Forecasting, Text Mining using R

Best Seller


Current price: $12
Original price: $100
Discount: 88% off

30-Day Money-Back Guarantee

- 32.5 hours on-demand video
- 1 Article
- 32 Supplemental Resources
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion

What Will I Learn?

- Learn basic statistics, including measures of central tendency, dispersion, skewness, kurtosis, graphical representation, probability, probability distributions, etc.
- Learn about scatter diagrams, the correlation coefficient, confidence intervals, and the Z and t distributions, all of which are required to understand linear regression
- Learn to use R to build linear regression models
- Learn about the K-Means clustering algorithm and how to implement it using R
- Learn the science behind text mining, word clouds, and sentiment analysis, and how to perform them using R

Requirements

- Download R & RStudio before starting this tutorial
- Download the datasets folder (a zip file uploaded at the start of the sections)

Description

Data Science using R is designed to cover the majority of R's capabilities from an analytics and data science perspective, including the following:

- Learn basic statistics, including measures of central tendency, dispersion, skewness, kurtosis, graphical representation, probability, probability distributions, etc.
- Learn about scatter diagrams, the correlation coefficient, confidence intervals, and the Z and t distributions, all of which are required to understand linear regression
- Learn to use R to build regression models
- Learn about the K-Means clustering algorithm and how to implement it using R
- Learn the science behind text mining, word clouds, and sentiment analysis, and how to perform them using R
- Learn about forecasting models, including AR, MA, ES, ARMA, ARIMA, etc., and how to build them using R
- Learn about logistic regression and how to perform it using R

Who is the target audience?

- IT professionals at any experience level, including complete beginners, are eligible to take this course. Professionals from data analysis, data warehousing, data mining, business intelligence, reporting, data science, etc. will fit in especially well.


Curriculum For This Course

172 Lectures

32:16:51

Introduction To Data Science
4 Lectures
36:00

The trainer is introduced, and the agenda of concepts you will learn in Data Science using R, XLMiner, and Tableau is discussed.

Preview
05:44

Get inspired by the importance of data science, and discover the power of analytics today and in the future

Preview
06:12

Discover how much data is being generated by social media, e-commerce, and various other interesting sources

Big Data And Getting Drenched In Data

12:20

Learn how data science has emerged over the years as the number one profession, the shortage of professionals with these skills, the tools that are the best bet for data scientists, and many more interesting insights.

Why Data Science?

11:44


Basic Statistics
29 Lectures
04:52:52

A brief introduction to basic statistics and the overall agenda of this program, covering the basic statistical concepts predominantly used in data analytics

Data Types And Preliminaries

09:30

Learn about random variables, the stepping stone to success in the statistical world.

Random Variable

06:01

Understand the probability of outcomes based on possible events, using a playing-card example

What Is Probability?

03:59

An aperitif to “Internet of Things”

Understanding random variables and their probabilities; Notation for Random variable; Understanding Discrete probability distribution

Probability Distribution

08:57

A quick recap of data types, random variables, probability, and probability distributions; real-life applications of probability

Recap Of Concepts And Probability Applications

07:55

Understanding the various facets of the Sampling Funnel, Standard notation for ‘Population Parameter’ and ‘Sample Statistic’

Sampling Funnel

04:59

Learn about rudimentary statistical measures: the mean, median, and mode (the “First Moment Business Decision”); outliers and how to deal with them; unimodal, bimodal, and multimodal data.

Measures Of Central Tendency

15:53
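The three measures, and the outlier effect this lecture mentions, can be sketched in Python for illustration (the course itself works in R). The sample values are hypothetical; `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Hypothetical sample; 250 is a deliberate outlier
values = [12, 15, 15, 18, 20, 22, 250]

print(mean(values))       # pulled upward by the outlier
print(median(values))     # middle value, robust to the outlier
print(multimode(values))  # every most-frequent value; more than one means multimodal
```

Here the mean lands near 50 while the median stays at 18, which is why the median is often preferred when outliers are present.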

The next step in statistical measures: understanding the concept of dispersion using a “control chart”

Measures Of Dispersion

07:13

As part of the “Second Moment Business Decision”: variance, standard deviation, and range; understanding why the formulae differ for population and sample

Measures Of Dispersion Part-2

15:29
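The population-versus-sample distinction comes down to the divisor. This is a minimal Python sketch with hypothetical data (the course uses R). Dividing by n - 1 (Bessel's correction) compensates for the sample mean being estimated from the same data.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical observations
data = [4, 8, 6, 5, 3, 7]

# Population formulas divide the squared deviations by N ...
pop_var, pop_sd = pvariance(data), pstdev(data)
# ... sample formulas divide by n - 1 (Bessel's correction)
samp_var, samp_sd = variance(data), stdev(data)
data_range = max(data) - min(data)

print(pop_var, samp_var, pop_sd, samp_sd, data_range)
```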

Learn about expected value and variance for discrete data: computing the mean and variance of probabilistic data

Expected Value And Variance For Discrete Data

03:29
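For a discrete random variable, the expected value weights each outcome by its probability, and the variance weights each squared deviation from that expected value the same way. The Python sketch below uses a hypothetical distribution (the course uses R):

```python
# Hypothetical discrete distribution: outcome -> probability
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
assert abs(sum(dist.values()) - 1.0) < 1e-9  # probabilities must sum to 1

expected = sum(x * p for x, p in dist.items())               # E[X]
var = sum((x - expected) ** 2 * p for x, p in dist.items())  # Var(X)
print(expected, var)
```

Equivalently, Var(X) = E[X^2] - (E[X])^2, which here is 3.7 - 1.7^2 = 0.81.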

Learn the preliminaries of R and R Studio: an introduction to R, its origin and installation; the GUI-based tool ‘R Studio’

Preliminaries Of R And R Studio

06:17

Preliminaries of R and R Studio: an introduction to R, its origin and installation; the GUI-based tool ‘R Studio’

Various Components And Basics Of R Studio

06:51

Visualizing data for better insights; improved understanding using histograms; understanding skewness from a histogram; what is a long tail?

“AI in daily life – Amazon’s Echo”

Data Visualization Using R - Barplot, Histogram, Skewness

14:54

Study of Skewness - positive or right, negative or left, Long tail; Kurtosis - positive is thin peak, negative is wide peak; analysis based on the skewness and kurtosis values.

3rd And 4th Moment Business Decision

08:47
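The third and fourth moments can be computed directly from their definitions. This Python sketch (the course uses R) applies the population moment formulas through a hypothetical helper. It reports excess kurtosis (kurtosis minus 3), so positive values indicate a thin, sharp peak, consistent with this lecture. Statistical packages may apply small-sample corrections, so their values can differ slightly.

```python
def moment_skew_kurtosis(data):
    """Hypothetical helper: moment-based skewness and excess kurtosis."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mu) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

print(moment_skew_kurtosis([1, 2, 3, 4, 5]))   # symmetric: skewness 0
print(moment_skew_kurtosis([1, 1, 1, 2, 10]))  # long right tail: positive skewness
```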

Recap of the first, second, third, and fourth moment business decisions; what is EDA?; study of the “Box Plot”, creating it using R and judging skewness from it

Recap And Box Plot

12:12

Understanding the distribution of people’s heights vs. the probability associated with it; the rules of the normal distribution.

Normal Distribution-Part 1

08:54

Learn about the characteristics of the normal distribution curve; understanding specification limits; ‘Probability Distribution Function’ curves for various combinations of mean and standard deviation; standardizing variables to give them an equivalent, unitless representation; introduction to Z values

Normal Distribution-Part 2 & Standard Normal Distribution

18:02

Learn about the mean and standard deviation of the standard normal distribution; what is the Z distribution?; introduction to the standard normal distribution

Standard Normal Distribution -Part 2

07:43

Learn to use the “Standard Table” to compute probability from a Z value; finding Z values using Z tables, R, and MS Excel; computing the probability of an interval

Calculating Probabilities From Z-Distribution

14:13
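Z-table lookups and interval probabilities can be reproduced in Python with `statistics.NormalDist` (Python 3.8+). In R, the corresponding functions are `pnorm` and `qnorm`. The heights distribution below is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# P(Z < 1.96) from a Z value (what a Z table lookup gives)
p_below = std_normal.cdf(1.96)

# Z value for a given cumulative probability (the reverse lookup)
z_975 = std_normal.inv_cdf(0.975)

# Probability of an interval for a hypothetical variable X ~ N(170, 10)
heights = NormalDist(mu=170, sigma=10)
p_interval = heights.cdf(180) - heights.cdf(160)  # P(160 < X < 180)

print(round(p_below, 4), round(z_975, 2), round(p_interval, 4))
```

The interval 160 to 180 is one standard deviation either side of the mean, so its probability is the familiar 68%.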

Learn about Various possibilities of samples from the same population, Standard Error Vs Standard Deviation, ‘Size’ of Sample

Sampling Variation, Sample Size & Central Limit Theorem

13:14

Learn to determine whether data is normal; transforming data that is not normal; various types of data transformation

Normal Quantile Plot (Q-Q Plot)

07:13

A recap of the box plot, normal distribution, Z distribution, and central limit theorem; point estimates; interval estimates with a confidence level; alpha error

Recap Confidence Interval

12:28

Case Study on Confidence Interval; Formula for Interval estimate

Confidence Interval Z-Distribution Part-1

10:45

Learn about alpha, which sets the confidence level (1 - alpha); the formula for the ‘Confidence interval’ when the population standard deviation is known

Confidence Interval Z-Distribution Part-2

16:35
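The interval estimate x̄ ± z(α/2) · σ/√n can be sketched in Python (the course uses R). The function name and case-study numbers are hypothetical, and σ is assumed to be known. When it is not known, the t distribution from the later lecture applies instead.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Hypothetical helper: two-sided CI when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 95%
    margin = z * sigma / sqrt(n)             # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical case: sample mean 50, known sigma 10, n = 100
low, high = z_interval(50, 10, 100)
print(round(low, 2), round(high, 2))
```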

Understanding what the obtained range signifies and how to state the same.

Confidence Interval Interpretation

03:33

Learn the formula for the confidence interval when the population standard deviation is unknown; rudiments of “degrees of freedom”; the T distribution vs. the standard normal distribution; reading the T table; using R to find the T value

Confidence Interval T-Distribution

18:57

Learn about analysis of variance vs. analysis of means; what the desirable intergroup and intragroup variances are

ANOVA Part 1

09:54

Learn the math behind ANOVA through an illustrative approach

ANOVA Part 2

09:35
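The comparison of intergroup and intragroup variance reduces to the F ratio. This Python sketch (the course uses R) uses a hypothetical helper. A large F suggests that the group means differ; the p-value would come from the F distribution with (k - 1, n - k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """Hypothetical helper: F statistic for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Intergroup (between) variation: how far group means sit from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Intragroup (within) variation: spread of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical groups with well-separated means -> large F
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```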

Data Types, Probability, Probability distribution, Normal Distribution, Standard Normal Distribution

Recap Basic Statistics

09:20

Quiz-2

17 questions


Hypothesis Testing Introduction
3 Lectures
26:17

The cardinal ‘formula’ for a data scientist; The outcomes relating to Null Hypothesis and Alternate Hypothesis

Hypothesis Testing Introduction

11:02

Null Hypothesis statement and Alternate hypothesis statement; formulating them with some examples

Hypothesis Testing Formulation

10:59

The difference between one-tailed and two-tailed tests on a distribution curve; the four main types of tests and the testing flowpath; performing the sample t-test using a case study

Hypothesis Testing- Various Types Of Tests

04:16


Hypothesis Testing- Parametric
16 Lectures
02:35:25

Defining the null and alternate hypothesis statements for the given case study; defining Ho and Ha for the various comparative tests that are part of the t-test, using Minitab.

2 Sample T-Test Part-1

18:13

Conducting the comparative tests, viz. the normality test and the equal variance test, using Minitab

2 Sample T-Test Part-2

07:36

Learn the hypothesis statements for the 2 sample t-test and how to run the 2 sample t-test using Minitab; iterative hypothesis testing

2 Sample T-Test Part-3

08:50

Formulating the null and the alternate hypothesis statements based on the test flowpath; conduct of Normality test for all samples

One Way ANOVA Part-1

08:33

Learn how to conduct Normality test for all samples, Variance test for more than 2 populations

One Way ANOVA Part-2

04:40

Formulation of Null and alternate hypothesis statement for ANOVA test for comparisons; Conduct of one way ANOVA using Minitab.

One Way ANOVA Part-3

03:16

The distinction between one way ANOVA, Two Way ANOVA and Multiple ANOVA or MANOVA.

ANOVA-1,2 Multiple Way

02:54

Application of 2 proportion test based on the output & input data types; Formulating the hypothesis statements; Iterative testing of 2 proportion test

2 Proportion Test

11:54

A recap of the 2 sample t-test, ANOVA (1 way, 2 way, multiple), the 2 proportion test, and the Chi Square test; hypothesis testing to sift out the statistically significant inputs

Recap Hypothesis Testing

04:59

1 Sample Z Test Part-1

08:28

1 Sample Z Test Part-2

13:13

Defining the null and alternate hypothesis statements for the given case study; defining Ho and Ha for the various comparative tests that are part of the t-test, using Minitab.

1 Sample T Test

15:36

1 Sample Sign Test

06:26

Conducting the comparative tests, viz. the normality test and the equal variance test, using Minitab

Paired T Test

11:59

Tukey Pairwise Comparisons Part-1

15:33

Tukey Pairwise Comparisons Part-2

13:15

Quiz 3

9 questions


Hypothesis Testing-Non Parametric
4 Lectures
38:33

Choosing the correct test based on the number of inputs; formulating the combined null and alternate hypothesis statements; conducting the Chi Square test using Minitab

Chi Square Test

08:52

Mann Whitney Test

10:59

Paired T Test Assumption

09:19

Mood's Median Test

09:23


BASICS OF R-PROGRAMMING
15 Lectures
03:22:18

Learn to use R as a calculator; some rudiments of using R; the ubiquitous “Hello World” exercise; defining and assigning a value to a variable; mathematical functions with variables; installing and loading a package

Basic Programming Using R Part 1

15:30

Reading files with different extensions using various R commands; the default working directory; viewing the dataset as a data frame

Basic Programming Using R Part 2

10:46

Using help in R, Installing different packages for working with different databases; working with ODBC databases; Reading data from webpages; Details of packages using vignettes

Basic Programming Using R Part 3

13:13

What is a vector?; merging lists; the ease of using R Studio over R

Basic Programming Using R Part 4

16:14

Sorting of Data in a frame in ascending and descending orders

R Programming Case Study Part 1

05:38

Combining two data sets using ‘Row bind’ & ‘merge’ function; understanding the concept of “array”

R Programming Case Study Part 2

15:46

The Default Data sets that are a part of R and its packages; using help to understand the dataset; various data manipulation techniques to structure the raw data

R Programming Case Study Part 3

21:26

Using the ‘if then else’ function to replace values in a data frame; using mean, sum, and other functions on columns

R Programming Case Study Part 4

13:50

Reading files saved from other statistical packages into R; Reading databases from the Web

Statistical Packages In R

08:07

Using inbuilt databases to understand various aspects of Exploratory Data Analysis (EDA)

R Programming Case Study Using Inbuilt Datasets

16:39

Basic Plots in R; The box plot using R; Scatter diagram of all variables in one view; Various plot styles

Case Study On Data Visualization Using R Part 1

18:58

Various types of visualization using R; obtaining multiple plots of the same feature; coloring the plots

Case Study On Data Visualization Using R Part 2

07:10

Histograms and their variants with color options; plots with 3 attributes; understanding ‘Tableplot’

Case Study On Data Visualization Using R Part 3

10:35

Analysing the ‘Mosaic’ plot using the Titanic case; heat map of correlated features; country maps using various physical attributes, e.g., altitude; political demarcation data; working with string data

Case Study On Data Visualization Using R Part 4

13:14

Recap of EDA: First, Second, Third and fourth moment business decision; Dichotomies between population and sample; Graphical representation of data; Need for interval estimate with confidence

Recap Exploratory Data Analysis

15:12

Quiz 4

1 question


Predictive Analytics
23 Lectures
05:41:54

Understand linearity, direction, and strength using a scatter diagram; correlation does not imply causation

Scatter Diagram

10:48

Learn about Correlation Analysis; measure of correlation - correlation coefficient ‘r’; calculating correlation coefficient using the formula

Correlation Coefficient

09:10
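The formula for the correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), can be sketched in Python as follows. The course uses R, where `cor` computes the same quantity. The helper name and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # always between -1 and +1

print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative line
print(pearson_r([1, 2, 3], [1, 3, 2]))        # weaker positive relationship
```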

Learn the analogy between the equation of a line and the linear regression equation; formulae for the coefficients; Euclidean distance; the least squares technique for the prediction line

Simple Linear Regression Introduction

12:18
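The least squares technique chooses the slope and intercept that minimize the sum of squared errors, which yields closed-form coefficients when there is one input. The Python sketch below uses a hypothetical helper and data. In R, `lm(y ~ x)` estimates the same coefficients. R squared is included because the next lecture uses it to judge model strength.

```python
def simple_linear_regression(x, y):
    """Hypothetical helper: least-squares fit of y = b0 + b1 * x, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx     # slope
    b0 = my - b1 * mx  # intercept: the line passes through (x-bar, y-bar)

    predicted = [b0 + b1 * a for a in x]
    sse = sum((b - p) ** 2 for b, p in zip(y, predicted))  # sum of squared errors
    sst = sum((b - my) ** 2 for b in y)                    # total sum of squares
    return b0, b1, 1 - sse / sst

b0, b1, r2 = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 2), round(b1, 2), round(r2, 2))
```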

Learn about Strength of a prediction model - R Squared; Confidence interval vs prediction interval; the rules for simple linear regression; Understanding a business problem; EDA of the loaded database in R; Linear regression modelling in R

Simple Linear Regression Using R -Part 1

25:37

Study the regression model’s output; p-values for the significance of coefficients (“SIGNIFICANTLY DIFFERENT THAN ZERO”); the multiple R squared value for the model’s strength; rules for improving the model; the confidence interval for the regression model with 95% confidence; study of the predicted values using the three equations

Simple Linear Regression Using R - Part 2

16:21

Learn about Transformations for improving the strength of the model; Various ways of transformation; Domain knowledge in prediction modelling; implication of the estimate value with respect to the change in output

Simple Linear Regression Using R - Part 3

12:47

Learn to use the predict function in R for prediction intervals and confidence intervals; study of the predicted values using the three equations for existing inputs in the database; computation of errors ‘ε’; regression using transformed output values; steps involved in linear regression modelling

Simple Linear Regression Using R - Part 4

14:19

Discrete variable to Dummy variable for Multiple Linear Regression; Multiple linear regression using an example; The Regression equation; Model Assumption for regression - parameters are linear with output, Assumptions with regards to errors, Assumptions with regards to inputs, Assumptions with regard to each record;

Multiple Linear Regression Introduction

13:14

Learn How to Attach a dataset to the current workbook; EDA - Correlations analysis of all inputs and output, the ‘Pairs’ command in R; the correlation coefficient matrix; Recording the inferences from EDA

Multiple Linear Regression Using R - Part 1

17:13

Learn about Partial correlation matrix; pure correlation between the inputs; Running multiple linear regression model in R; Analysis of Coefficient of the regression model with the output and correlation with Domain knowledge; Analysis of probability values of each of the coefficients; Iterative analysis of regression modeling with individual inputs;

Multiple Linear Regression Using R - Part 2

11:00

Learn to identify the collinearity problem with 2 inputs; a function that shows the scatter diagram and correlation coefficient values in one visualization; identifying and removing influential records; analysis of diagnostic plots (Cook’s distance, studentized residuals, Bonferroni p-values, hat values); the variance inflation factor and added-variable plots for identifying the column that needs to be removed from the regression model; multiple R squared vs. adjusted R squared; evaluating the LINE assumptions using plots

Multiple Linear Regression Using R - Part 3

20:16

A recap of simple linear regression with discrete inputs and their treatment; steps in linear regression: perform EDA; view the scatter diagram to judge correlation, strength, and the presence of clusters; analyse the correlation and covariance matrices; model building: p-values of estimates, R squared value, model assumptions (LINE); perform deletion diagnostics to identify influential records; use the added-variable plot to identify the least significant input

Recap Linear Regression

19:21

Different regression techniques based on output type; various nomenclature of discrete data; Example of binary outcome; the three techniques for discrete analysis; output of logistic regression is a probabilistic value; linear regression line vs logistic regression curve (Sigmoid curve); The Logistic regression model and the probability function; the three sub types of logistic regression; Steps and assumption for performing logistic regression

Understanding Logistic Regression Concepts

19:43

A simple logistic regression walk-through with a case study; multiple logistic regression with an example; using the output to compute the probability value; drawing inferences from the probability function outputs; the claimants database; understanding factor variables and their levels; continuous variables.

Logistic Regression Part 1

17:07

Formulating a regression equation with the coefficients; understanding why the ‘lm’ function cannot be used for logistic regression; introduction to missingness

Logistic Regression Part 2

15:06

Confusion matrix for measure of accuracy; the math behind the confusion matrix

Logistic Regression Part 3

11:23

Logistic Regression Part 4

13:51

The logistic functions; The odds ratio and its formula and logit model; interpreting the odds; relationship between odds and probability; interpretation of βx values with the odds ratio; unit increase in input value and its effect on the odds ratio

Logistic Regression-Logistic Function Representation

17:29
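The relationships among probability, odds, and the logit can be checked numerically. In this Python sketch (the course uses R's `glm`), the coefficients b0 and b1 are hypothetical. The sketch shows that a one-unit increase in x multiplies the odds by e^b1, regardless of the starting value of x.

```python
from math import exp, log

def logistic(z):
    """Sigmoid: converts log-odds z into a probability."""
    return 1 / (1 + exp(-z))

def odds(p):
    return p / (1 - p)

# Hypothetical fitted model: log(odds) = b0 + b1 * x
b0, b1 = -3.0, 0.5

p_at_4 = logistic(b0 + b1 * 4)
p_at_5 = logistic(b0 + b1 * 5)

odds_ratio = odds(p_at_5) / odds(p_at_4)  # effect of a one-unit increase in x
print(round(odds_ratio, 4), round(exp(b1), 4))  # the two agree
```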

The glm model; interpreting the coefficient values and their influence on the final model; null deviance, residual deviance, and the Akaike information criterion as measures of a model’s strength

Logistic Regression Part 5

15:38

Other measures of goodness of fit; Building confusion matrix in R; Sensitivity, specificity and accuracy

Logistic Regression- Confusion Matrix

11:13
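Sensitivity, specificity, and accuracy are all derived from the four cells of the confusion matrix. The Python sketch below illustrates this (the course builds the matrix in R). The labels, probabilities, and 0.5 cutoff are hypothetical. Moving the cutoff trades sensitivity against specificity, and that trade-off is what an ROC curve traces.

```python
def confusion_metrics(actual, predicted):
    """Hypothetical helper: 2x2 confusion-matrix rates (1 = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives caught
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical predicted probabilities, classified with a 0.5 cutoff
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.45, 0.6, 0.55]
predicted = [1 if p > 0.5 else 0 for p in probs]

print(confusion_metrics(actual, predicted))
```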

Logistic Regression-ROC

13:23

Logistic Regression-ROC Case studies part 1

10:00

Logistic Regression-ROC Case studies part 2

14:37

Quiz 5

11 questions


Data Mining/Clustering Using R
20 Lectures
03:27:02

Why do we need clustering, and how does it help in making business decisions? What is the primary objective of clustering? Learn about it with an example.

Introduction To Clustering

12:23

Data mining in a nutshell! What are the two main branches of data mining and their distributaries? The two approaches to clustering, and an introduction to the principle of hierarchical clustering.

Types Of Data Mining Techniques

11:32

Visualisation of Hierarchical clustering, Grouping of records & division of cluster based on distance measure. Rules for the measure of distance

Hierarchical Clustering Introduction

09:44

Hierarchical clustering using a case study; computing the distance between two records with multiple inputs using the Euclidean method; understanding the need to standardize data; Z-scores; other types of distance measures

Hierarchical Clustering Case Study

09:41
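Euclidean distance is the square root of the summed squared differences across inputs, so unscaled inputs with large units dominate it. The Python sketch below illustrates this (the course works in R) using hypothetical records and a hypothetical `standardize` helper. `math.dist` requires Python 3.8+.

```python
from math import dist
from statistics import mean, stdev

def standardize(column):
    """Hypothetical helper: convert a column to Z-scores (sample standard deviation)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

# Hypothetical records with two inputs on very different scales
income = [30000, 45000, 60000, 90000]
age = [25, 35, 45, 30]

raw = list(zip(income, age))
scaled = list(zip(standardize(income), standardize(age)))

# Raw distance is dominated by income; after scaling both inputs contribute
print(dist(raw[0], raw[1]))
print(dist(scaled[0], scaled[1]))
```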

Learn how to measure the distance between records with categorical data; various measurements using a binary matrix for two-category data; the distance measurement rule for more than 2 categories.

Calculating Distance For Categorical Data

09:36

Learn how to measure the distance between records having both numerical and categorical variables; creating dummy variables from categorical data; the need for, and method of, standardizing numerical data to the same scale as the categorical dummy variables.

Calculating Distance For Mixed Data With Case Study

15:13

Learn about Measure of distance using Gower’s General Similarity coefficient for mixed data using weighted means.

Calculating Distance For Mixed Data Case Study Part 2

05:49

Learn about various criteria for measuring the distance between clusters; a “hands-on” exercise on hierarchical clustering, summarized using a dendrogram

Calculating Distance Between Clusters With Case Study

13:01

Learn How Clustering helps - the consumer perspective and the supplier perspective; Other insights from clusters and their labeling;

Hierarchical Clustering Synopsis

06:06

Learn about Reading an XLSX file into R; Understanding the Database (MBA); Scaling the data to make it unitless using Z Scores

Hierarchical Clustering Using R Part 1

10:45

Learn about the ‘hclust’ function with complete linkage; visualizing the clustering as a dendrogram using the ‘plot’ function; splitting the dendrogram into ‘k’ clusters using the ‘cutree’ function; assigning the records to their respective clusters

Hierarchical Clustering Using R Part 2

11:02

The main difference between Hierarchical and non-hierarchical clustering; similarity within, dissimilarity amongst clusters; Algorithm (or iterative steps) for K means clustering.

K Means Clustering Introduction

04:26
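The iterative steps of K means (choose random initial centroids, assign each record to its nearest centroid, recompute the centroids, and repeat until nothing changes) can be sketched in Python. The course uses R's `kmeans`. The function, its parameters, and the data are hypothetical, and a fixed `seed` makes the random start reproducible. The result depends on the starting centroids, so a common guard against the local minima problem mentioned later in this section is to rerun with several seeds.

```python
import random
from math import dist

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: basic K means (Lloyd's algorithm) on tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-input data with two obvious groups
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
print(centroids)
```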

R code for K means clustering using only two inputs for ease of visualization; computing the ideal number of clusters; viewing the iterative process of clustering (giving and receiving clusters) as an animation.

K Means Clustering Using R - Part 1

13:41

Learn R code for K means clustering using only two inputs for ease of visualization; computing the ideal number of clusters; viewing the iterative process of clustering (giving and receiving clusters) as an animation

K Means Clustering Using R - Part 2

18:12

Learn about random generation of centroid at the first iteration; The explanation of ‘receiving and giving clusters’ using the euclidean distance; understanding the math of centroid and the clustering based on distance; understanding the various attributes of the k-means output.

K Means Clustering Using R - Part 3

15:35

Learn to find the best value for ‘k’ using a scree plot; determining ‘k’ based on information gain; the elbow point; the ‘aggregate’ function for viewing each cluster as one record and analysing it; labeling clusters

K Means Clustering Using R - Part 4

11:54

Learn about Selection of ‘k’ based on the simplicity or adequacy; risks with the ‘k’ value - Local Minima problem; Cross checking of clusters for consistency;

Summary Of K Means Clustering

03:31

Learn about The Pros and Cons of K means clustering and Hierarchical Clustering

Difference Between K Means And Hierarchical

03:32

Learn about an accelerometer dataset for analysing the steps taken by different users as tracked by their smartphones; the importance of domain knowledge

K Means Clustering Case Study

08:49

Learn about unsupervised data mining; hierarchical clustering; non-hierarchical clustering; distance measures for continuous and discrete data; types of linkage; dendrograms; sum of squared distances between and within clusters

Recap Data Mining Clustering

12:30

Quiz 6

21 questions


Dimension Reduction
9 Lectures
01:34:09

Learn why dimension reduction is needed and the types of dimension reduction

Dimension Reduction Introduction

02:41

Learn about Computational speeds; Face Recognition (Facebook’s Deepface); Image Compression

Dimension Reduction Applications

14:02

Learn about Reduction in number of columns; Analyse relations between columns; Visualisation of Multidimensional data in 2D; From ‘All information’ to ‘most of the information’

PCA Key Benefits

10:11

Learn About analogy of multiple school quizzes and capturing most of information from many in one

PCA Intuition

09:34

Learn why the number of PCs equals the number of columns; the difference between PCs and the original columns; the rationale for selecting the weights used to compute each PC (the maximization of variance principle)

PCA Preliminaries And Weights

21:09

Learn why to standardize; the math for obtaining the principal components from the principal component weights

Standardize Variables And PCA Calculation

04:39

Learn about Data Compression; How much information is enough - Consult with Domain experts; understanding data compression with matrices;

PCA First Goal

14:21

Learn about Labelling of Principal components - detailed study of the Principal component weights;

PCA Second Goal

06:37

Learn Visualization in 2D; possibility of visualizing the clustering; Batch processing; Analysis of multivariate data; Visualization to spot outliers; Brain Gym

PCA Third Goal And Additional Benefits

10:55

Quiz-7

9 questions


Association Rules
9 Lectures
01:33:19

Learn about Market Basket Analysis, Relationship mining, Affinity analysis; The analogy of supermarket; How is it different from online recommender systems

Association Rules Introduction

06:28

Learn how data is populated through POS (point-of-sale) systems, and the definition of a transaction

Market Basket Analysis

03:26

Learn What goes with what; do any pairs of groups exist among the products; how can this information be used

Association Rules Part 1

06:31

Learn about Product bundling; Stocking; Racking; Association Rules in other than retail stores

Association Rules Part 2

11:55

Learn about converting list-format data to binary data; listing possible rules; antecedent and consequent

Association Rules-Case Study And Terminology

07:52

Learn about The Performance measure - Support, Confidence, Lift; The formula for Support; support criterion is based on frequency; The Apriori Algorithm

Association Rules-Performance Measures And Support Calculation

15:49

Confidence, a better measure than support; the formula for confidence; the weakness of confidence

Association Rules-Confidence Calculation

13:32

‘Lift Ratio’, a variant of the confidence measure: the ratio of a rule’s confidence to the confidence expected if the antecedent and the consequent were independent

Association Rules-Lift Calculation

15:32
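Support, confidence, and lift for a single rule can be computed directly from their definitions. The Python sketch below uses hypothetical transactions and helper names (the course uses R). The rule {bread} → {milk} has a lift below 1. This means that, in this toy data, bread buyers are slightly less likely than the average shopper to buy milk.

```python
# Hypothetical transactions, each a set of items bought together
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support (1 means independence)."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"milk"})
print(support(rule[0] | rule[1]), confidence(*rule), lift(*rule))
```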

Learn the flow path for formulating association rules; drawbacks of association rules (they may produce absurd and interesting rules, and a profusion of rules); other applications of association rules

Association Rules-Rules Selection Process And Applications

12:14

Quiz-8

9 questions

5 More Sections

About the Instructor