Regression, Data Mining, Text Mining, Forecasting using R
Data Science using R is designed to cover the majority of the capabilities of R from an Analytics & Data Science perspective, which includes the following:
Introduction to the trainer and the agenda of the various concepts that you will learn as part of Data Science using R, XLMiner and Tableau.
Get inspired by the importance of data science, and see the strength of analytics in the present generation and the future world
See how much data is generated by social media, e-commerce and various other interesting sources
Learn how Data Science emerged over the years as the number one profession, about the dearth of professionals with these skills, which tools are the best bet for a data scientist, and many more interesting insights.
Brief introduction to basic statistics and the overall agenda of this program, covering the concepts of basic statistics that are predominantly used in data analytics
Learn about the random variable, which happens to be the stepping stone for success in the statistical world.
Learn to understand the probability of outcomes based on possible events, using the playing card example
An aperitif to “Internet of Things”
Understanding random variables and their probabilities; Notation for Random variable; Understanding Discrete probability distribution
Quick recap of data types, random variables, probability and probability distributions; real-life applications of probability
Understanding the various facets of the Sampling Funnel, Standard notation for ‘Population Parameter’ and ‘Sample Statistic’
Learn about rudimentary statistical measures (mean, median, mode) as the “First Moment Business Decision”; outliers and how to deal with them; unimodal, bimodal and multimodal data.
Learn the next step in statistical measures: understanding the concept of dispersion using a “control chart”
As part of the “Second Moment Business Decision”: variance, standard deviation and range; understanding why the formulae differ for population and sample
Learn about expected value and variance for discrete data; computing the mean and variance for probabilistic data
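The expected value and variance of a discrete random variable can be illustrated with a short sketch. It is written in Python rather than R purely for illustration, and the fair six-sided die is an assumed example, not data from the course.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, stored as outcome -> probability.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(dist):
    """E[X] = sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of (x - E[X])^2 * P(x) over all outcomes."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mean = expected_value(distribution)
var = variance(distribution)
print(mean, var)  # exact fractions; convert with float() if needed
```

Using exact fractions keeps the arithmetic free of rounding, which makes it easier to compare with a hand calculation.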
Learn about the preliminaries of R and RStudio: introduction to R, its origin and installation; the GUI-based tool ‘RStudio’
Visualization of data for better insights, Improved understanding using Histogram, Understanding Skewness from a histogram, what is a long tail?
“AI in daily life – Amazon’s Echo”
Study of Skewness - positive or right, negative or left, Long tail; Kurtosis - positive is thin peak, negative is wide peak; analysis based on the skewness and kurtosis values.
Recap of the first, second, third and fourth moment business decisions; what is EDA?; study of the “Box Plot”, creating it using R and judging skewness from it
Understanding distribution of heights of people Vs the probability associated with it; the rules of normal distribution.
Learn about the characteristics of the normal distribution curve; understanding specification limits; ‘Probability Distribution Function’ curves for various combinations of mean and standard deviation; obtaining an equivalent, unitless representation of variables by standardization; introduction to Z values
Learn about the mean and standard deviation of the standard normal distribution; what is the Z distribution?; introduction to the standard normal distribution
Learn to use the standard normal table to compute probability from the Z value; finding the Z value using Z tables, R and MS Excel; computing the probability of an interval
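As a rough illustration of computing a Z value and the probability of an interval, here is a minimal Python sketch (the course itself uses Z tables, R and MS Excel). The mean of 170 and standard deviation of 10 are assumed numbers chosen only for the example.

```python
from statistics import NormalDist

def z_value(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

def interval_probability(low, high, mean, sd):
    """P(low < X < high) for a normal X, via the standard normal CDF."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z_value(high, mean, sd)) - std_normal.cdf(z_value(low, mean, sd))

# Assumed example: heights with mean 170 and standard deviation 10.
p = interval_probability(160, 180, mean=170, sd=10)
print(round(p, 4))  # probability of falling within one standard deviation
```

The call to `cdf` plays the role of looking up the Z value in the table.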
Learn about the various possible samples from the same population; standard error vs standard deviation; the ‘size’ of a sample
Learn to determine whether data is normal; transformation of data that is not normal; various types of data transformation
Recap of box plot, normal distribution, Z distribution and Central Limit Theorem; point estimate; interval estimate with confidence level; alpha error
Case Study on Confidence Interval; Formula for Interval estimate
Learn about alpha, the significance level that sets the confidence level (1 - alpha); formula for the ‘confidence interval’ with a known population parameter
Understanding what the obtained range signifies and how to state the same.
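The interval estimate with a known population standard deviation can be sketched as follows. This is a Python illustration rather than the course's R or Minitab workflow, and the sample mean, standard deviation and sample size are assumed values.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, population_sd, n, confidence=0.95):
    """Interval estimate: sample mean +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical Z value
    margin = z * population_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed example: sample mean 50, known sigma 8, sample size 64.
low, high = confidence_interval(sample_mean=50.0, population_sd=8.0, n=64)
print(round(low, 2), round(high, 2))
```

A common way to state the result: the procedure that produced this range captures the true population mean in about 95% of repeated samples.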
Learn the formula for the confidence interval without the population parameter; rudiments of ‘degrees of freedom’; t-distribution vs standard normal distribution; reading the t table; using R for the t value
Learn why analysis of variance is used over analysis of means; what the desirable intergroup and intragroup variances are
Learn about the math behind ANOVA, using an illustrative approach
Data Types, Probability, Probability distribution, Normal Distribution, Standard Normal Distribution
The cardinal ‘formula’ for a data scientist; The outcomes relating to Null Hypothesis and Alternate Hypothesis
Null Hypothesis statement and Alternate hypothesis statement; formulating them with some examples
The difference between one tail and two tails in a distribution curve; the four main types of tests and the testing flow path; performing the sample t-test using a case study
Defining the null and alternate hypothesis statements for the given case study; defining the H0 and Ha for the various comparative tests that are part of the t-test, using Minitab.
Conducting the comparative tests, viz. normality test and equal variance test, using Minitab
Learn the hypothesis statements for the 2-sample t-test and run the 2-sample t-test using Minitab; iterative hypothesis testing
Formulating the null and the alternate hypothesis statements based on the test flowpath; conduct of Normality test for all samples
Learn how to conduct Normality test for all samples, Variance test for more than 2 populations
Formulation of Null and alternate hypothesis statement for ANOVA test for comparisons; Conduct of one way ANOVA using Minitab.
The distinction between one way ANOVA, Two Way ANOVA and Multiple ANOVA or MANOVA.
Application of 2 proportion test based on the output & input data types; Formulating the hypothesis statements; Iterative testing of 2 proportion test
Recap of the 2-sample t-test, ANOVA (1 way, 2 way, multiple), 2 proportion test and Chi-Square test; hypothesis testing to sift the statistically significant inputs
Choosing the correct test based on the number of inputs; formulating the combined hypothesis statement for null and alternate; conducting the Chi-Square test using Minitab
Learn to use R as a calculator; some rudiments of using R; the ubiquitous “Hello World” exercise; defining and assigning a value to a variable; mathematical functions with variables; installing and invoking a package
Reading files with different extensions using various commands in R; the default working directory; viewing the database as a data frame
Using help in R, Installing different packages for working with different databases; working with ODBC databases; Reading data from webpages; Details of packages using vignettes
What is a vector?; merging of lists; ease of use of RStudio over R
Sorting of Data in a frame in ascending and descending orders
Combining two data sets using ‘Row bind’ & ‘merge’ function; understanding the concept of “array”
The Default Data sets that are a part of R and its packages; using help to understand the dataset; various data manipulation techniques to structure the raw data
Using the ‘if then else’ function to replace values in a data frame; using mean, sum and other functions on columns
Reading files saved from other statistical packages into R; Reading databases from the Web
Using inbuilt databases to understand various aspects of Exploratory Data Analysis (EDA)
Basic Plots in R; The box plot using R; Scatter diagram of all variables in one view; Various plot styles
Various types of visualization using R; obtaining multiplots of the same feature; coloring the plots
Histograms and their varieties with color options; plot with 3 attributes; understanding the ‘Tableplot’
Analysing the ‘Mosaic’ plot using the Titanic case; heat map of correlated features; country map using various physical attributes, e.g. altitude; political demarcation data; working with string data
Recap of EDA: First, Second, Third and fourth moment business decision; Dichotomies between population and sample; Graphical representation of data; Need for interval estimate with confidence
Learn to understand linearity, direction and strength with a scatter diagram; correlation does not imply causation
Learn about correlation analysis; the measure of correlation, the correlation coefficient ‘r’; calculating the correlation coefficient using the formula
Learn about the analogy between the equation of a line and the linear regression equation; formula for the coefficients; Euclidean distance; least squares technique for the prediction line
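The correlation coefficient and the least squares coefficients come from the same sums of squares, which a small sketch can make concrete. It is written in Python rather than R, and the five data points are made up for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least squares slope and intercept, plus the correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                      # b1 = Sxy / Sxx
    intercept = mean_y - slope * mean_x    # b0 = mean(y) - b1 * mean(x)
    r = sxy / sqrt(sxx * syy)              # correlation coefficient
    return slope, intercept, r

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3), round(r, 3), round(r ** 2, 3))
```

For a simple linear regression, the square of `r` equals the R squared value reported for the model.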
Learn about the strength of a prediction model, R squared; confidence interval vs prediction interval; the rules for simple linear regression; understanding a business problem; EDA of the loaded database in R; linear regression modelling in R
Learn about the study of the regression model’s output; p value for the significance of coefficients; “significantly different from zero”; multiple R squared value for the model’s strength; rules for improving the model; confidence interval for the regression model with 95% confidence; study of the predicted values using the three equations
Learn about transformations for improving the strength of the model; various ways of transformation; domain knowledge in prediction modelling; implication of the estimate value with respect to the change in output
Learn to use the predict function in R for prediction intervals and confidence intervals; study of the predicted values using the three equations for existing inputs in the database; computation of errors ‘ε’; regression using transformed output values; steps involved in linear regression modelling
Discrete variable to dummy variable for multiple linear regression; multiple linear regression using an example; the regression equation; model assumptions for regression: parameters are linear with the output, assumptions with regard to errors, assumptions with regard to inputs, assumptions with regard to each record
Learn how to attach a dataset to the current workbook; EDA: correlation analysis of all inputs and the output; the ‘pairs’ command in R; the correlation coefficient matrix; recording the inferences from EDA
Learn about the partial correlation matrix; pure correlation between the inputs; running a multiple linear regression model in R; analysis of the coefficients of the regression model against the output and their agreement with domain knowledge; analysis of the probability values of each of the coefficients; iterative analysis of regression modelling with individual inputs
Learn to identify the collinearity problem with 2 inputs; a function for the scatter diagram and the correlation coefficient values in one visualization; identifying and removing influential records; analysis of diagnostic plots: Cook’s distance, studentized residuals, Bonferroni p-values, hat values; Variance Inflation Factor and added variable plots for identifying the column that needs to be removed from the regression modelling; multiple R squared value vs adjusted R squared value; evaluating the LINE assumptions using plots
Recap of simple linear regression with discrete inputs and their treatment; steps in linear regression (perform EDA; view the scatter diagram to judge the correlation, strength and presence of clusters; analyse the correlation and covariance matrix); model building: p values of estimates, R squared value, model assumptions (LINE); perform deletion diagnostics to identify the influential records; added variable plot for identifying the least significant input
Different regression techniques based on output type; various nomenclature of discrete data; Example of binary outcome; the three techniques for discrete analysis; output of logistic regression is a probabilistic value; linear regression line vs logistic regression curve (Sigmoid curve); The Logistic regression model and the probability function; the three sub types of logistic regression; Steps and assumption for performing logistic regression
Simple logistic regression walk-through with a case study; multiple logistic regression with an example; using the output to compute the probability value; drawing inferences from the probability function outputs; the claimants database; understanding factor variables and their levels; continuous variables.
Formulating a regression equation with the coefficients; understanding why the ‘lm’ function cannot be used for logistic regression; introduction to missingness
Confusion matrix for measure of accuracy; the math behind the confusion matrix
The logistic functions; The odds ratio and its formula and logit model; interpreting the odds; relationship between odds and probability; interpretation of βx values with the odds ratio; unit increase in input value and its effect on the odds ratio
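The relationship between probability, odds and the logit can be checked numerically. The sketch below is in Python rather than R, and the probability of 0.8 and the coefficient of 0.7 are assumed values, not output from the course's model.

```python
from math import exp, log

def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds, the quantity that logistic regression models linearly."""
    return log(odds(p))

def probability_from_logit(z):
    """Inverse of the logit: the sigmoid curve."""
    return 1 / (1 + exp(-z))

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the input."""
    return exp(beta)

p = 0.8                      # assumed probability
print(odds(p))               # about 4, i.e. odds of 4 to 1
print(probability_from_logit(logit(p)))  # recovers about 0.8
print(odds_ratio(0.7))       # assumed coefficient; the odds roughly double
```

A coefficient of zero gives an odds ratio of 1, meaning the input leaves the odds unchanged.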
The glm model; interpreting the coefficient values and their influence on the final model; null deviance, residual deviance and Akaike information criterion as measures of the model’s strength
Other measures of goodness of fit; Building confusion matrix in R; Sensitivity, specificity and accuracy
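The confusion matrix measures listed above reduce to four counts. The following Python sketch (the course builds the matrix in R) uses made-up labels and predicted probabilities, with an assumed cutoff of 0.5.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

# Made-up data: actual classes and predicted probabilities.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.1, 0.2, 0.4, 0.3, 0.7, 0.6]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]  # assumed 0.5 cutoff

tp, tn, fp, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)             # share of actual positives caught
specificity = tn / (tn + fp)             # share of actual negatives caught
accuracy = (tp + tn) / len(actual)       # share of all records classified correctly
print(tp, tn, fp, fn, sensitivity, round(specificity, 3), accuracy)
```

Moving the cutoff trades sensitivity against specificity, which is why accuracy alone can be misleading.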
Why do we need clustering, and how does it help in making a business decision? What is the primary objective of clustering? Learn about it with an example.
Data Mining in a nutshell! What are the two main branches of data mining and their sub-branches? The two approaches to clustering, and an introduction to the principle of hierarchical clustering.
Visualisation of Hierarchical clustering, Grouping of records & division of cluster based on distance measure. Rules for the measure of distance
Hierarchical clustering using a case study; computation of distance between two records having multiple inputs using the Euclidean method; understanding the need for standardization of data; Z-score; other types of distance measures
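A small Python sketch (not the course's R code) shows why standardization matters before computing Euclidean distance. The three income and age records are invented, and the Z-scores use the sample standard deviation, which is an assumption about the convention.

```python
from math import sqrt
from statistics import mean, stdev

def z_scores(values):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def euclidean(a, b):
    """Euclidean distance between two records with the same inputs."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented records: (income, age).
incomes = [30000, 60000, 90000]
ages = [25, 45, 35]
raw = list(zip(incomes, ages))
standardized = list(zip(z_scores(incomes), z_scores(ages)))

raw_distance = euclidean(raw[0], raw[1])                      # dominated by income
scaled_distance = euclidean(standardized[0], standardized[1])  # both inputs contribute
print(round(raw_distance, 2), round(scaled_distance, 3))
```

On the raw scale the age difference barely registers; after standardization both inputs influence the distance.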
Learn how to measure distance between categorical data; various measurements using a binary matrix for two-category data; distance measurement rule for more than 2 categories.
Learn about measuring distance between records having both numerical and categorical variables; creating dummy variables from categorical data; the need for, and the method of, standardizing the numerical data to the same scale as the categorical dummy variable data.
Learn about Measure of distance using Gower’s General Similarity coefficient for mixed data using weighted means.
Learn about the various criteria for measuring distance between clusters; “hands-on” exercise on hierarchical clustering and summarizing using a dendrogram
Learn how clustering helps: the consumer perspective and the supplier perspective; other insights from clusters and their labeling
Learn about Reading an XLSX file into R; Understanding the Database (MBA); Scaling the data to make it unitless using Z Scores
Learn about the ‘hclust’ function with complete linkage; visualization of the clustering as a dendrogram using the ‘plot’ function; splitting the dendrogram into ‘k’ clusters using the ‘cutree’ function; assigning the records to their respective clusters
The main difference between Hierarchical and non-hierarchical clustering; similarity within, dissimilarity amongst clusters; Algorithm (or iterative steps) for K means clustering.
R code for K means clustering using only two inputs for ease of visualization; computing the ideal number of clusters; viewing the iterative process of clustering (giving and receiving clusters) as an animation.
Learn about the random generation of centroids at the first iteration; the explanation of ‘receiving and giving clusters’ using the Euclidean distance; understanding the math of the centroid and the clustering based on distance; understanding the various attributes of the k-means output.
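The iterative steps of k-means (assign each record to its nearest centroid, then recompute each centroid as the mean of its cluster) can be sketched in a few lines. This is a simplified Python illustration, not the course's R code; the points are invented, and the starting centroids are fixed rather than random so that the run is reproducible.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, max_iter=100):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so the clusters are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Invented two-input data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

With random starting centroids, different runs can settle on different clusterings.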
Learn to find the best value for ‘k’ using a scree plot; determining the value of ‘k’ based on information gain; the point of the elbow; the ‘aggregate’ function for viewing each cluster as one record and analysing it; labeling of clusters
Learn about the selection of ‘k’ based on simplicity or adequacy; risks with the ‘k’ value, such as the local minima problem; cross-checking of clusters for consistency
Learn about The Pros and Cons of K means clustering and Hierarchical Clustering
Learn about an accelerometer data set for analysing the steps taken by different users as tracked by their smartphones; importance of domain knowledge
Learn about unsupervised data mining; hierarchical clustering; non-hierarchical clustering; distance measures for continuous and discrete data; types of linkage; dendrogram; sum of squares distances between and within clusters
Learn why dimension reduction is needed; the types of dimension reduction
Learn about Computational speeds; Face Recognition (Facebook’s Deepface); Image Compression
Learn about Reduction in number of columns; Analyse relations between columns; Visualisation of Multidimensional data in 2D; From ‘All information’ to ‘most of the information’
Learn about the analogy of multiple school quizzes and capturing most of the information from many in one
Learn that the number of PCs is equal to the number of columns; the difference between PCs and the original columns; the rationale behind the selection of the weights for computing the PC: the maximization of variance principle
Learn why to standardize; the math for obtaining the principal component from the principal component weights
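To make the quiz analogy concrete, the sketch below standardizes two invented quiz-score columns, finds the first set of principal component weights, and combines the columns into one score. It is written in Python rather than R, and it uses power iteration as a simple stand-in for whatever routine a statistics package would use.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(column):
    """Rescale a column to mean 0 and sample standard deviation 1."""
    mu, sd = mean(column), stdev(column)
    return [(v - mu) / sd for v in column]

def covariance_matrix(columns):
    """Sample covariance matrix of the given columns."""
    n = len(columns[0])
    means = [mean(c) for c in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]

def first_pc_weights(matrix, iterations=200):
    """Power iteration: converges to the direction of maximum variance."""
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Invented scores of four students on two quizzes.
quiz1 = [2, 4, 6, 8]
quiz2 = [1, 3, 2, 6]
z = [standardize(quiz1), standardize(quiz2)]
cov = covariance_matrix(z)
weights = first_pc_weights(cov)

# Variance captured by the first PC, as a share of the total variance.
pc1_variance = sum(wi * sum(m * wj for m, wj in zip(row, weights)) for wi, row in zip(weights, cov))
share = pc1_variance / sum(cov[i][i] for i in range(len(cov)))

# The principal component itself: a weighted sum of the standardized columns.
pc1_scores = [sum(w * col[i] for w, col in zip(weights, z)) for i in range(len(quiz1))]
print([round(w, 4) for w in weights], round(share, 4))
```

Here one combined score retains most of the variance of the two quizzes, which is the sense in which most of the information is kept.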
Learn about Data Compression; How much information is enough - Consult with Domain experts; understanding data compression with matrices;
Learn about Labelling of Principal components - detailed study of the Principal component weights;
Learn about visualization in 2D; possibility of visualizing the clustering; batch processing; analysis of multivariate data; visualization to spot outliers; Brain Gym
Learn about Market Basket Analysis, Relationship mining, Affinity analysis; The analogy of supermarket; How is it different from online recommender systems
Learn about the population of data through POS, and the definition of a transaction
Learn what goes with what; whether any pairs or groups exist among the products; how this information can be used
Learn about Product bundling; Stocking; Racking; Association Rules in other than retail stores
Learn about converting list-format data to binary data; listing possible rules; antecedent and consequent
Learn about The Performance measure - Support, Confidence, Lift; The formula for Support; support criterion is based on frequency; The Apriori Algorithm
Confidence, a better measure than support; the formula for confidence; the weakness of confidence
‘Lift Ratio’, a variant of the confidence measure; the lift ratio compares the confidence of a rule with the confidence expected if the antecedent and the consequent were independent
Learn the flow path for formulating association rules; drawbacks of association rules: may produce absurd and interesting rules, profusion of rules; other applications of association rules
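The support, confidence and lift measures listed above can be computed directly from a list of transactions. The sketch below is in Python rather than R, and the five supermarket baskets are invented for illustration.

```python
# Invented transactions: each basket is a set of products.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent (the benchmark confidence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule = ({"bread"}, {"milk"})  # "if bread, then milk"
print(support(rule[0] | rule[1], transactions))
print(confidence(*rule, transactions))
print(lift(*rule, transactions))
```

A lift above 1 suggests the antecedent and consequent occur together more often than independence would predict; a lift below 1, as in this invented data, suggests the opposite.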
Certifications:
Certified Six Sigma Master Black Belt
Project Management Professional (PMP)
Agile Certified Practitioner (PMI - ACP)
Risk Management Professional (PMI-RMP)
Certified Scrum Master
Agile Project Management – Foundation & Practitioner from APMG
Bharani Kumar is an alumnus of premier institutions like IIT & ISB with 15+ years of professional experience. He has worked in various MNCs such as HSBC, ITC, Infosys and Deloitte in various capacities such as Data Scientist, Project Manager, Service Delivery Manager, Process Consultant and Delivery Head.
He has trained over 1500 professionals across the globe on Business Analytics, Agile, PMP, Lean Six Sigma and the like.
He has 8 years of extensive experience in corporate, open house and online training.
He is a thorough implementer with abilities in Business Analytics and Agile projects.
He worked in Delivery management focusing on maximizing business value articulation.
He has comprehensive experience in leading teams and multiple projects.
Quality Management: A thorough implementer with abilities in Quality management focusing on maximizing customer satisfaction, process compliance and business value articulation; comprehensive experience in leading teams & multiple projects. A result-oriented leader with expertise in devising strategies aimed at enhancing overall organizational growth, sustained profitability of operations and improved business performance.
Project Management: Project Management Professional involved in Initiation, Planning, Execution, Monitoring & Controlling and Closing phases of project activities. Devising and implementing project plans within preset budgets and deadlines and managing the projects towards successful delivery of project deliverables and meeting project objectives.
Training: Close to 8 years of training experience, having conducted multiple trainings in PMP, Agile, Six Sigma, Business Analytics and Process Excellence across the globe. Understands the individual differences of the attendees, possesses excellent training skills and is considered one of the best trainers in his areas of expertise.