Learn By Example: Statistics and Data Science in R

Rating: 3.7 (377 reviews)

A gentle yet thorough introduction to Data Science, Statistics and R using real life examples

Created by Loony Corn

Last updated 12/2016

English

English [Auto]

What you'll learn

Harness R and R packages to read, process and visualize data
Understand linear regression and use it confidently to build models
Understand the intricacies of all the different data structures in R
Use linear regression in R to overcome the difficulties of LINEST() in Excel
Draw inferences from data and support them using tests of significance
Use descriptive statistics to perform a quick study of some data and present results

Course content

14 sections • 82 lectures • 9h 7m total length

You, This course and Us (2:32)
This course is a gentle yet thorough introduction to Data Science, Statistics and R using real life examples.
Top Down vs Bottoms Up : The Google vs McKinsey way of looking at data (13:11)
Q. How do companies make decisions?
A. Using data
We talk about what it takes to go from raw data to a decision. This sets the agenda for the rest of the course: each step of this journey is covered in the upcoming sections.
R and RStudio installed (5:10)
Get set up with R and RStudio. All the examples that follow in this course have source code attached. Download and run them in RStudio.

Descriptive Statistics : Mean, Median, Mode (10:07)
Bosses are impatient. They often want you to cut to the chase and give them an answer that is good enough, quickly. Descriptive statistics are the place to start - they are often the 10-second answer to any question about the data.
Our first foray into R : Frequency Distributions (6:06)
Computing a frequency distribution using R
Draw your first plot : A Histogram (3:11)
A histogram is a good visual summary of your data.
Computing Mean, Median, Mode in R (2:21)
Computing the Mean, Median, Mode in R
What is IQR (Inter-quartile Range)? (8:08)
The mean, median and mode are single-number summaries of where your data is centered. The IQR is a measure of how spread out the data is.
Box and Whisker Plots (3:11)
Visualize the IQR and outliers using box and whisker plots
The Standard Deviation (10:24)
The standard deviation measures the spread of a dataset, and it turns out to be a surprisingly profound quantity.
Computing IQR and Standard Deviation in R (6:06)
Compute the IQR and standard deviation in R using built-in functions, compare medians, quartiles, and ranges via box plots, and observe outlier effects across datasets.
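
To make the descriptive statistics above concrete, here is a minimal sketch in base R. The `sales` vector is made-up illustration data, not data from the course, and `stat_mode` is a hypothetical helper name (base R has no built-in function for the statistical mode, so it is derived from the frequency table).

```r
# Hypothetical sample: daily sales counts with one extreme value (made up)
sales <- c(12, 15, 12, 18, 20, 15, 12, 95)

# Frequency distribution: how often each value occurs
freq <- table(sales)

# Statistical mode: the most frequent value, read off the frequency table
stat_mode <- as.numeric(names(freq)[which.max(freq)])

sales_mean   <- mean(sales)    # pulled upward by the extreme value 95
sales_median <- median(sales)  # robust to the extreme value
sales_iqr    <- IQR(sales)     # spread of the middle 50% of the data
sales_sd     <- sd(sales)      # sample standard deviation

# Points a box and whisker plot would flag as outliers
outliers <- boxplot.stats(sales)$out

print(freq)
cat("mean:", sales_mean, " median:", sales_median, " mode:", stat_mode, "\n")
cat("IQR:", sales_iqr, " sd:", sales_sd, " outliers:", outliers, "\n")
```

In an interactive session, `hist(sales)` and `boxplot(sales)` would draw the corresponding histogram and box and whisker plot.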

Drawing inferences from data (3:25)
Drawing inferences from data is key to being able to make decisions using data. There is a science to this, whose foundation is in random variables, probability distributions, and performing tests of statistical significance.
Random Variables are ubiquitous (16:54)
Random variables are everywhere. Any data that you'll study is a random variable whose behaviour is determined by a probability distribution.
The Normal Probability Distribution (9:31)
The Normal Distribution is arguably the most well-known and commonly seen probability distribution. It is characterized by its probability density function, mean and standard deviation.
Sampling is like fishing (6:14)
Sampling is a little like fishing. Sampling is crucial to induction - drawing conclusions about something by looking at some evidence.
Sample Statistics and Sampling Distributions (9:25)
A sample is described by sample statistics like the sample mean. The sampling distribution of the mean is the probability distribution of the sample means you would get across repeated samples.
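
The idea of a sampling distribution can be sketched with a small simulation. This is an illustration under stated assumptions, not material from the course: the population is synthetic (normally distributed, with an arbitrary mean of 170 and standard deviation of 10), and the sample size and number of repetitions are chosen only for convenience.

```r
set.seed(42)  # make the simulation reproducible

# Synthetic population (assumption: normal, mean 170, sd 10)
population <- rnorm(100000, mean = 170, sd = 10)

n <- 25  # size of each sample

# Draw 2000 samples and record the mean of each one
sample_means <- replicate(2000, mean(sample(population, n)))

# The sample means cluster around the population mean...
cat("population mean:     ", mean(population), "\n")
cat("mean of sample means:", mean(sample_means), "\n")

# ...and their spread is close to sd(population) / sqrt(n)
cat("sd of sample means:  ", sd(sample_means), "\n")
cat("sd(pop) / sqrt(n):   ", sd(population) / sqrt(n), "\n")
```

Increasing `n` shrinks the spread of `sample_means`, which is why larger samples give tighter estimates.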

Case Study 1 : Football Players (Estimating Population Mean from a Sample) (6:45)
Find a point estimate for the average weight of all football players using a sample of football players in one college team.
Case Study 2 : Election Polling (Estimating Population Proportion from a Sample) (7:50)
Find a point estimate for the percentage of voters in favor of a candidate.
Case Study 3 : A Medical Study (Hypothesis Test for the Population Mean) (13:53)
A test of significance is an important step in building support for your findings and inferences. Here is the first example of a test of significance - is the population mean equal to a given value?
Case Study 4 : Employee Behavior (Hypothesis Test for the Population Proportion) (9:49)
Perform a test of significance to check whether the population proportion is equal to a certain value.
Case Study 5: A/B Testing (Comparing the means of two populations) (17:18)
Perform a test of significance to compare two population means. The example used is A/B testing, which is widely used in internet companies to test out product features.
Case Study 6: Customer Analysis (Comparing the proportions of 2 populations) (11:50)
Perform a test of significance to compare two population proportions.
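
The four kinds of test in these case studies can be sketched with base R's `t.test()` and `prop.test()`. This is one possible approach, not necessarily the one the lectures use, and every number below is invented for illustration.

```r
set.seed(1)  # reproducible made-up data

# Test for a population mean: is the mean equal to 95? (hypothetical readings)
readings <- rnorm(30, mean = 100, sd = 12)
one_mean <- t.test(readings, mu = 95)

# Test for a population proportion: is it 50%? (540 of 1000, hypothetical)
one_prop <- prop.test(x = 540, n = 1000, p = 0.5)

# A/B test: compare the means of two groups (Welch two-sample t-test)
group_a <- rnorm(200, mean = 5.0, sd = 1.5)
group_b <- rnorm(200, mean = 5.4, sd = 1.5)
two_means <- t.test(group_a, group_b)

# Compare two proportions: 120 of 1000 vs 150 of 1000 (hypothetical counts)
two_props <- prop.test(x = c(120, 150), n = c(1000, 1000))

# Each result carries a point estimate, a confidence interval and a p-value
cat("one mean p-value: ", one_mean$p.value, "\n")
cat("one prop p-value: ", one_prop$p.value, "\n")
cat("two means p-value:", two_means$p.value, "\n")
cat("two props p-value:", two_props$p.value, "\n")
```

A small p-value means the data would be unlikely if the null hypothesis (for example, "the two means are equal") were true.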

Harnessing the power of R (7:26)
The next few sections dive deep into all the data processing, slicing and dicing ability that R provides. The wide variety of R packages available is one reason why R is popular among many data scientists.
Assigning Variables (8:47)
Let's start with the basics. What are variables and how do we assign variables in R?
Printing an output (13:03)
print(), show(), message(), cat() are different ways to print something to the screen.
Numbers are of type numeric (5:24)
Numbers in R are of type numeric.
Characters and Dates (7:30)
R has built-in datatypes for dates and timestamps.
Logicals (3:24)
Logical is a datatype that is the result of conditional tests in R.
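
A short sketch of the basics covered in this section, using only base R. The variable names and values are arbitrary examples.

```r
# Assigning variables: <- is the conventional assignment operator
x <- 42
greeting <- "hello"

# Printing: print() shows the value, cat() concatenates and writes text,
# message() writes to the error stream (useful for status notes)
print(x)
cat("x is", x, "\n")
message("this goes to stderr")

# Numbers are of type numeric unless marked as integer with an L suffix
x_class   <- class(x)    # "numeric"
int_class <- class(5L)   # "integer"

# Characters and dates
chr_class <- class(greeting)        # "character"
d <- as.Date("2016-12-01")          # a calendar date
later <- d + 30                     # date arithmetic works in days
ts <- as.POSIXct("2016-12-01 10:30:00", tz = "UTC")  # a timestamp

# Logicals are the result of conditional tests
is_big <- x > 10                    # TRUE
lgl_class <- class(is_big)          # "logical"

print(later)
print(is_big)
```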

Data Structures are the building blocks of R (8:24)
The wide variety of built-in data structures is what sets R apart from other standard programming languages. These include vectors, arrays, matrices, data frames and lists.
Creating a Vector (2:22)
The Mode of a Vector (4:18)
The mode of a vector is the datatype of all its elements.
Vectors are Atomic (2:24)
Learn how vectors in R are atomic, meaning each element cannot be broken down, and how concatenating vectors with c() yields a single flat vector rather than a nested one.
Doing something with each element of a Vector (3:09)
Aggregating Vectors (1:28)
Finding the sum, product, or mean of a vector
Operations between vectors of the same length (5:39)
Operations between vectors of different length (5:30)
Generating Sequences (6:25)
Generate sequences using the : operator, rep() and seq()
Using conditions with Vectors (2:04)
Find the lengths of multiple strings using Vectors (2:22)
Generate a complex sequence (using recycling) (2:49)
Vector Indexing (using numbers) (6:56)
Access elements based on their position in the vector.
Vector Indexing (using conditions) (6:18)
Access elements based on whether they pass a conditional test.
Vector Indexing (using names) (2:27)
Assign names to the elements of a vector
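
The vector behaviour described in this section can be sketched in a few lines of base R. The values are arbitrary; the comments note what each expression evaluates to.

```r
v <- c(3, 8, 1, 9)

# Mode: the datatype shared by all elements
v_mode <- mode(v)                   # "numeric"

# Atomic: mixed inputs are coerced to one type, and c() flattens
mixed <- c(1, "a", TRUE)            # all three become character strings
flat  <- c(c(1, 2), c(3, 4))        # 1 2 3 4, not a nested structure

# Doing something with each element, and aggregating
doubled <- v * 2                    # 6 16 2 18
total   <- sum(v)                   # 21
product <- prod(v)                  # 216
average <- mean(v)                  # 5.25

# Vectors of the same length combine element by element
same_len <- v + c(10, 20, 30, 40)   # 13 28 31 49

# With different lengths, the shorter vector is recycled
recycled <- v + c(100, 200)         # 103 208 101 209

# Sequences
one_to_five <- 1:5
quarters    <- seq(0, 1, by = 0.25) # 0 0.25 0.5 0.75 1
repeated    <- rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"

# Lengths of multiple strings in one call
lengths_of <- nchar(c("apple", "fig"))      # 5 3

# Indexing by position, by condition, and by name
second   <- v[2]                    # 8
not_1st  <- v[-1]                   # 8 1 9
over_two <- v[v > 2]                # 3 8 9
names(v) <- c("a", "b", "c", "d")
by_name  <- v["c"]                  # the element named "c", which is 1

print(recycled)
print(over_two)
```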

Introducing Lists (5:11)
Lists are fundamentally different from vectors, arrays and matrices - which are all homogeneous data structures.
Introducing Data Frames (4:28)
Data Frames are how R stores data read from files and databases.
Reading Data from files (4:52)
Indexing a Data Frame (5:38)
Aggregating and Sorting a Data Frame (6:28)
Using the aggregate() and order() functions
Merging Data Frames (3:29)
Merge data frames based on one or more common columns
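
A minimal sketch of lists and data frames in base R. The CSV contents, column names and manager names are invented for illustration; the file is written to a temporary path so the example is self-contained.

```r
# A list can hold elements of different types and lengths
person <- list(name = "Asha", scores = c(90, 85), active = TRUE)
first_score <- person$scores[1]     # 90

# Write a small hypothetical CSV to a temp file, then read it back
path <- tempfile(fileext = ".csv")
writeLines(c("region,product,units",
             "East,A,10",
             "West,A,7",
             "East,B,4",
             "West,B,12"), path)
sales <- read.csv(path)

# Indexing: rows by condition, columns by name
big_orders <- sales[sales$units > 5, c("region", "units")]

# Aggregating: total units per region
totals <- aggregate(units ~ region, data = sales, FUN = sum)

# Sorting: largest total first
totals_sorted <- totals[order(-totals$units), ]

# Merging on a common column
managers <- data.frame(region = c("East", "West"),
                       manager = c("Lee", "Kim"))
merged <- merge(sales, managers, by = "region")

print(totals_sorted)
print(merged)
```

`merge()` behaves like a database join here: each row of `sales` picks up the `manager` whose `region` matches.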

Requirements

No prerequisites : We start from the basics and cover everything you need to know. We will install R and RStudio as part of the course and use them for most of the examples. Excel is used for one of the examples, and basic knowledge of Excel is assumed.

Description

Taught by a Stanford-educated ex-Googler and an IIT- and IIM-educated ex-Flipkart lead analyst. This team has decades of practical experience in quant trading, analytics and e-commerce.

This course is a gentle yet thorough introduction to Data Science, Statistics and R using real life examples.

Let’s parse that.

Gentle, yet thorough: This course does not require a prior quantitative or mathematics background. It starts by introducing basic concepts such as the mean and median, and eventually covers all aspects of an analytics or data science career, from analysing and preparing raw data to visualising your findings.

Data Science, Statistics and R: This course is an introduction to Data Science and Statistics using the R programming language. It covers both the theoretical aspects of Statistical concepts and the practical implementation using R.

Real life examples: Every concept is explained with the help of examples, case studies and source code in R wherever necessary. The examples cover a wide array of topics and range from A/B testing in an Internet company context to the Capital Asset Pricing Model in a quant finance context.

What's Covered:

Data Analysis with R: Datatypes and Data structures in R, Vectors, Arrays, Matrices, Lists, Data Frames, Reading data from files, Aggregating, Sorting & Merging Data Frames

Linear Regression: Regression, Simple Linear Regression in Excel, Simple Linear Regression in R, Multiple Linear Regression in R, Categorical variables in regression, Robust regression, Parsing regression diagnostic plots

Data Visualization in R: Line plot, Scatter plot, Bar plot, Histogram, Scatterplot matrix, Heat map, Packages for Data Visualisation : RColorBrewer, ggplot2

Descriptive Statistics: Mean, Median, Mode, IQR, Standard Deviation, Frequency Distributions, Histograms, Boxplots

Inferential Statistics: Random Variables, Probability Distributions, Uniform Distribution, Normal Distribution, Sampling, Sampling Distribution, Hypothesis testing, Test statistic, Test of significance

Who this course is for:

Yep! MBA graduates or business professionals who are looking to move to a heavily quantitative role
Yep! Engineers who want to understand basic statistics and lay a foundation for a career in Data Science
Yep! Analytics professionals who have mostly worked in Descriptive analytics and want to make the shift to being modelers or data scientists
Yep! Folks who've worked mostly with tools like Excel and want to learn how to use R for statistical analysis

Course content

Introduction (3 lectures • 21min)

The 10 second answer : Descriptive Statistics (8 lectures • 50min)

Inferential Statistics (5 lectures • 45min)

Case studies in Inferential Statistics (6 lectures • 1hr 7min)

Diving into R (6 lectures • 46min)

Vectors (15 lectures • 1hr 3min)

Arrays (5 lectures • 31min)

Matrices (5 lectures • 17min)

Factors (5 lectures • 17min)

Lists and Data Frames (6 lectures • 30min)
