Python for Statistical Analysis
4.5 (1,316 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
47,195 students enrolled

Master applied Statistics with Python by solving real-world problems with state-of-the-art software and libraries
Last updated 7/2020
English
English [Auto]
This course includes
  • 8.5 hours on-demand video
  • 1 article
  • 36 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Gain deeper insights into data
  • Use Python to tackle common and complex statistical and Machine Learning-related projects
  • Interpret and visualize outcomes, integrating visual output and graphical exploration
  • Learn hypothesis testing and how to efficiently implement tests in Python
Course content
54 lectures 08:36:49
+ Introduction
7 lectures 31:52

A general course overview - what we'll cover, how we'll cover it, and where you can get help if things go wrong!

To join the Facebook group, check this link out: https://www.facebook.com/groups/superdatascience/


For the Python 2v3 links, see:

https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/

Preview 08:10

Let's talk about setting everything up: which Python version we'll use and the different ways you can get it.


If you've downloaded Anaconda, you should have everything you need to get started right away. If not, here is the updated link to the Anaconda tutorial I've hosted online (apologies, the link has changed from the one in the presentation):

https://cosmiccoding.com.au/tutorial/2018/07/30/anaconda.html


If you've picked Miniconda, you'll need to use conda to install dependencies. To do that in your base environment, execute:

conda install numpy scipy matplotlib pandas jupyter scikit-learn

If you want a new environment for this course (called 'stats'), try this out:

conda create -n stats python=3.7 numpy scipy matplotlib pandas jupyter scikit-learn

conda activate stats

Preview 05:23
BONUS: Learning Path
00:34

Let's do a live run-through of installing Anaconda - the best way of getting a scientific distribution of Python on your machine.


Anaconda download link: https://www.anaconda.com/distribution/

Miniconda download link: https://docs.conda.io/en/latest/miniconda.html

Live Install and Verification
04:27

Now that we've got Python installed, we need to figure out how we should write our code. There are a lot of options, so let's touch on them quickly so you can find something that works well for you!

Coding Editors
04:31

Better than just talking about editors, let's run a few so you can see better how they work and how you can use them.

Live Coding Editor Comparison
06:04

Finally, let's discuss how to keep track of your code. No one wants to lose work by accident, and there are a few ways around this. One way is far superior to the others, as you'll see inside the video!

File Management
02:43
+ Exploring Data Analysis
15 lectures 02:16:33

We'll be working with a lot of datasets in the coming lectures. So before we jump into that, let's discuss the different ways we can load data into our code. No coding in this one, let's focus on the higher level for just a moment!

Preview 07:00

Jumping into the code, let's have a look at all the different ways we can load data into our code, using numpy, pandas and pickling!

Loading Data - Practical Example
14:42
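
As a rough sketch of the three loading routes mentioned above (this is not the lecture's own code; the file names and the tiny height/weight table are made up for illustration), a round trip through numpy, pandas and pickle could look like this:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical example file, written to a temporary folder so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "example.csv"
csv_path.write_text("height,weight\n1.70,65.0\n1.82,80.5\n1.65,58.2\n")

# 1. numpy: fine for purely numeric tables, but the header row has to be skipped.
array = np.loadtxt(csv_path, delimiter=",", skiprows=1)

# 2. pandas: keeps the column names and infers the column types.
df = pd.read_csv(csv_path)

# 3. pickle: stores a Python object as-is, which is handy for caching prepared data.
pickle_path = workdir / "example.pkl"
with open(pickle_path, "wb") as f:
    pickle.dump(df, f)
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

print(array.shape)
print(df.columns.tolist())
print(restored.equals(df))
```

Only unpickle files you created yourself or fully trust, since loading a pickle can execute arbitrary code.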

Loading data into our code is the easy part. The vast majority of our time will be spent sanitising, cleaning and preparing the data. Let's run through some basic tools you can use to do this, and hope that your first project goes as simply as this example!

Dataset Preparation - Practical Example
15:45

Sometimes our data doesn't just contain NaNs; it also has outlying points that we want to identify and potentially remove. Let's look at how.

Dealing with Outliers - Practical Example
14:09
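
One common way to flag outliers, which may or may not be the approach taken in the lecture, is a simple z-score cut. The sketch below assumes roughly Gaussian data, uses a 3-sigma threshold picked purely for illustration, and injects two hypothetical outliers by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 well-behaved points plus two injected outliers.
data = np.concatenate([rng.normal(loc=0.0, scale=1.0, size=1000), [10.0, -12.0]])

# Flag anything more than 3 standard deviations away from the mean.
z_scores = (data - data.mean()) / data.std()
is_outlier = np.abs(z_scores) > 3

# Keep only the points that were not flagged.
cleaned = data[~is_outlier]
print(f"Flagged {is_outlier.sum()} of {data.size} points")
```

Note that the outliers themselves inflate the mean and standard deviation used in the cut, so a z-score cut is a blunt first pass rather than a robust method.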

A brief conceptual overview of a bunch of ways we can visualise one dimensional data before jumping into the code!

1D Distribution Overview
01:07

One dimensional histograms are easy to make, and by far the most common way of visualising a distribution. You'll see why in the video.

1D Histograms - Practical Example
15:57

For a bit of flair, we can look at bee swarm plots. Great for presentations!

1D Bee Swarm - Practical Example
06:44

Other useful tools are box and violin plots. Violin plots can be elegant and useful in direct comparisons, and are used a lot in scientific publications.

1D Box and Violin - Practical Example
09:59

Empirical CDFs aren't the most useful visualisation tool, but boy will they come in handy later when we apply statistical tests, so let's cover them here. On top of that, let's also quickly look at the pandas describe function, which will quickly become a staple of your workflow.

1D Empirical CDF and Pandas Describe - Practical Example
08:08
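
To make the idea concrete, here is a minimal sketch of an empirical CDF next to pandas' describe. The helper name empirical_cdf and the five data points are made up for illustration and are not taken from the lecture:

```python
import numpy as np
import pandas as pd


def empirical_cdf(values):
    """Return the sorted values and the fraction of points at or below each one."""
    xs = np.sort(np.asarray(values, dtype=float))
    ys = np.arange(1, xs.size + 1) / xs.size
    return xs, ys


data = [3.1, 1.2, 4.8, 2.2, 3.9]
xs, ys = empirical_cdf(data)
print(xs)
print(ys)

# describe() reports count, mean, std, min, quartiles and max in one call.
summary = pd.Series(data).describe()
print(summary)
```

Plotting ys against xs as a step function gives the empirical CDF curve.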

What do we do when we need to go beyond a single dimension? How do we visualise multivariate distributions and data?

Higher Dimensional Distributions Overview
01:22

The most common, and probably most useful, visualisation for higher dimensional data is a scatter matrix. And lucky for us, pandas has one built in!

ND Scatter Matrix - Practical Example
09:50

If we want something a bit smaller and faster to make than a scatter matrix, we can get basic information out of a correlation plot! We'll cover correlation mathematically a bit later in the course, so don't worry if the underlying math isn't intuitive!

ND Correlation - Practical Example
05:20

Let's look at 2D data briefly, and work with some examples on how to plot 2D histograms, contour plots and utilise the power of kernel density estimation!

2D Histograms, Contours and KDE - Practical Example
11:20

Let's mix some probability into things and talk about likelihood contours!

ND Scatter Probability - Practical Example
12:13

Time to put everything back together for a quick summary! Don't forget to download the attached cheat sheet!

Exploratory Data Analysis Summary
02:57
+ Characterising
7 lectures 53:58

We almost always need some measure of the central value in our data or a distribution. Unfortunately, there are many ways of doing this, and we need to figure out which methods we should use in which circumstance.

Mean Median Mode - Practical Example
14:36
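
A tiny sketch of why the choice of central value matters, using made-up numbers with one large value (the data and variable names are illustrative only):

```python
import numpy as np

# Hypothetical sample with a single large value dragging the mean upwards.
data = np.array([1, 2, 2, 3, 14])

mean = data.mean()
median = np.median(data)

# Mode of discrete data: the value that occurs most often.
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]

print(mean, median, mode)
```

Here the median and mode stay at the bulk of the data while the mean is pulled towards the large value.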

After finding a central value, we almost always need to characterise the width of the distribution. This one has less freedom, which simplifies things!

Widths - Practical Example
05:09

Finally, sometimes our distributions are asymmetric, and this needs to be quantified if we wish to approximate our data.

Skewness and Kurtosis - Practical Example
06:57
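
As a small illustration (the two arrays are invented for this sketch), scipy.stats can quantify asymmetry and tail weight directly:

```python
import numpy as np
from scipy import stats

symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right_tailed = np.array([1.0, 1.0, 1.0, 2.0, 10.0])

# Skewness is zero for symmetric data and positive for a long right tail.
skew_sym = stats.skew(symmetric)
skew_right = stats.skew(right_tailed)

# scipy reports excess kurtosis by default, so a normal distribution scores 0.
kurt_sym = stats.kurtosis(symmetric)

print(skew_sym, skew_right, kurt_sym)
```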

What if we don't want a few standardised numbers and are happy to compress our distribution to an arbitrary number of points? Why, then we'd use percentiles!

Percentiles - Practical Example
07:52
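
A short sketch of compressing a distribution down to a handful of percentiles. The sample is simulated standard-normal data and the chosen levels are just one reasonable set, not necessarily the ones used in the lecture:

```python
import numpy as np

rng = np.random.default_rng(5)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Five levels that roughly correspond to -2, -1, 0, +1 and +2 sigma for a normal.
levels = [2.5, 16, 50, 84, 97.5]
summary = np.percentile(samples, levels)

for level, value in zip(levels, summary):
    print(f"{level:>5}% -> {value:+.2f}")
```

Adding more levels keeps more of the shape, at the cost of more numbers to carry around.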

Let's move onto multivariate distributions again, just like in the EDA section. Let's quantify covariance and correlation.

Multivariate Distributions - Practical Example
12:56
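
As a sketch of how covariance and correlation relate, the snippet below builds two correlated variables (simulated, with an arbitrary slope and noise level) and checks that the correlation is just the covariance rescaled by the two standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical variables: y follows x with some added noise.
x = rng.normal(size=5000)
y = 2.0 * x + rng.normal(scale=0.5, size=5000)

cov = np.cov(x, y)        # 2x2 covariance matrix
corr = np.corrcoef(x, y)  # 2x2 correlation matrix

# Correlation = covariance divided by the product of the standard deviations.
manual = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(cov)
print(corr[0, 1], manual)
```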

Time to wrap it all up for this chapter! Don't forget to download the attached cheat sheet!

Summary
03:45
+ Probability
9 lectures 01:46:08

Let's refresh some basic probability theory, probabilistic identities and the difference between a probability density function and a probability mass function.

Preview 05:17

What are common PDFs and PMFs? What are their forms, their parametrisation and when should we use them?

Introduction to Probability Distributions
16:28

Let's take the functions from the previous video and learn how to invoke them in code!

Probability Distributions - Practical Example
10:14
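
For a flavour of what invoking these functions looks like, here is a minimal sketch with scipy.stats. The normal and binomial distributions and their parameters are picked for illustration and may differ from the ones covered in the video:

```python
import numpy as np
from scipy import stats

# Continuous case: probability density of a standard normal on a small grid.
xs = np.linspace(-3, 3, 7)
normal_pdf = stats.norm.pdf(xs, loc=0, scale=1)

# Discrete case: probability mass of 0..10 heads in 10 fair coin flips.
ks = np.arange(0, 11)
binom_pmf = stats.binom.pmf(ks, n=10, p=0.5)

print(normal_pdf)
print(binom_pmf)
```

A PMF sums to one over its outcomes, whereas a PDF integrates to one.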

What are cumulative distribution functions and survival functions, and how can we use probability theory when our distributions have no analytic form?

Probability Functions and Empirical Distributions
06:51
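
For distributions that do have an analytic form, scipy.stats exposes these functions directly. A minimal sketch, assuming a standard normal and an arbitrary threshold of 1.96:

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)
x = 1.96

below = dist.cdf(x)  # P(X <= x)
above = dist.sf(x)   # survival function, P(X > x) = 1 - CDF

# The percent point function (ppf) inverts the CDF.
recovered = dist.ppf(below)

print(below, above, recovered)
```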

Empirical probability distributions in code! Let's discuss different interpolation and integration methods that come hand-in-hand with using an arbitrary function as a PDF.

Empirical Distributions - Practical Example
25:45
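
The sketch below shows one way to treat an arbitrary function as a PDF: normalise it numerically with the trapezoid rule, build a CDF, and interpolate. The function shape, grid and variable names are all invented for this example, and the lecture may use different interpolation and integration routines:

```python
import numpy as np


def unnormalised_pdf(x):
    # Hypothetical shape with no convenient closed-form CDF.
    return np.exp(-0.5 * x**2) * (1.0 + np.sin(3.0 * x) ** 2)


xs = np.linspace(-5, 5, 2001)
ys = unnormalised_pdf(xs)

# Trapezoid rule: area of each segment, then the running total gives the CDF.
segment_areas = 0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)
area = segment_areas.sum()
pdf = ys / area
cdf = np.concatenate([[0.0], np.cumsum(segment_areas)]) / area

# Inverse-transform sampling: push uniform numbers through the interpolated inverse CDF.
rng = np.random.default_rng(2)
samples = np.interp(rng.uniform(size=10_000), cdf, xs)

print(area, samples.mean(), samples.std())
```

A finer grid or a higher-order integration rule would reduce the numerical error if the function varies quickly.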

Now that we've got all these probability density functions, how can we sample from them to generate our own random numbers, and what on Earth is the Central Limit Theorem, and why is it so important?

Introduction to Sampling and the Central Limit Theorem
09:57

Now that we've covered the concepts in the previous video, let's power through the code!

Sampling Distributions - Practical Example
17:42

If you're still a bit confused over the central limit theorem, not to worry, let's dig a little deeper!

Central Limit Theorem - Practical Example
10:39
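
A quick numerical check of the Central Limit Theorem, sketched with an exponential distribution (chosen here only because it is clearly not Gaussian; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_experiments, sample_size = 10_000, 50

# Each row is one experiment: 50 draws from an exponential with mean 1 and std 1.
draws = rng.exponential(scale=1.0, size=(n_experiments, sample_size))
sample_means = draws.mean(axis=1)

# The CLT predicts the means scatter around 1 with spread sigma / sqrt(n).
expected_spread = 1.0 / np.sqrt(sample_size)

print(sample_means.mean(), sample_means.std(), expected_spread)
```

A histogram of sample_means looks close to a bell curve even though the individual draws are strongly skewed.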

The main takeaways from probability theory.

Summary
03:15
+ Hypothesis Testing
9 lectures 01:23:13

An introduction to hypothesis testing. After all, what does the phrase even mean?

Preview 02:17

A short motivating example about detecting loaded dice!

Motivation Loaded Die - Practical Example
08:24

Let's talk about the simplest forms of tests - one-tailed and two-tailed tests.

Basic Tests
13:11

Let's answer a fun question about the fate of the planet from asteroid impacts using a one-tailed test.

Basic Tests Example - Asteroid Impacts
20:17

Proportion testing is a special case of one- and two-tailed testing, so when would we use it and why?

Introduction to Proportion Testing
02:44

A fun election rigging example of when proportion testing is useful.

Proportion Testing Example - Election Rigging
12:01
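
As a rough sketch of a one-tailed proportion test, suppose (hypothetically) a candidate expected to get 50% of the vote receives 550 of 1000 votes in one district. The numbers are invented, and an exact binomial tail is used here, which may differ from the approach in the lecture:

```python
from scipy import stats

n_votes = 1000
observed = 550
expected_share = 0.5

# One-tailed p-value: chance of seeing at least this many votes under a fair count.
# sf(k) is P(X > k), so P(X >= observed) is sf(observed - 1).
p_value = stats.binom.sf(observed - 1, n_votes, expected_share)

print(f"p-value: {p_value:.5f}")
```

A small p-value only says the result is unlikely under the assumed share; it does not by itself say why.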

Pearson's Chi2 test is a broad and powerful statistical check for discrete outcomes. Let's see how it works and apply it to our loaded dice example.

Pearsons Chi2 Test - Practical Example
09:08
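
A minimal sketch of Pearson's chi-squared test applied to a die, using scipy.stats.chisquare. The roll counts below are made up to represent a die that strongly favours six:

```python
import numpy as np
from scipy import stats

# Hypothetical counts for faces 1..6 after 100 rolls.
observed = np.array([10, 10, 10, 10, 10, 50])

# A fair die would give equal expected counts for every face.
expected = np.full(6, observed.sum() / 6)

chi2, p_value = stats.chisquare(observed, f_exp=expected)

print(f"chi2 = {chi2:.1f}, p-value = {p_value:.2e}")
```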

If we want to compare entire distributions against each other, then we need other tests. Let's look at the original test - the KS test, and its improved version - the AD test.

Comparing Distributions - Kolmogorov-Smirnov and Anderson-Darling Tests
12:41
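
The sketch below covers only the KS half of this lecture, using simulated samples with a deliberate shift of one standard deviation (the data and sample sizes are assumptions for illustration). scipy.stats also provides Anderson-Darling tests, which are not shown here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
reference = rng.normal(loc=0.0, scale=1.0, size=500)
shifted = rng.normal(loc=1.0, scale=1.0, size=500)

# One-sample KS test: compare a sample against an analytic standard normal CDF.
one_sample = stats.kstest(shifted, "norm")

# Two-sample KS test: compare two empirical CDFs against each other.
two_sample = stats.ks_2samp(reference, shifted)

print(one_sample.statistic, one_sample.pvalue)
print(two_sample.statistic, two_sample.pvalue)
```

The KS statistic is the largest vertical gap between the two CDFs, which is where the empirical CDFs from the exploratory section come back in.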

Putting it all back together. Don't forget to download the attached cheat sheet!

Summary
02:30
+ Conclusion
7 lectures 01:45:05

A brief summary of each chapter, highlighting the main points of each.

Conclusion
09:56

A case example of exactly what not to do when you're hypothesis testing.

Extra: Significance Hunting - What not to do!
06:12

An introduction to Gaussian processes.

Extra: Introduction to Gaussian Processes
25:41

An extra prac looking at relative rates for disparate distributions.

Extra Prac - Cosmic Impact
11:21

An extra prac looking at low-number statistics.

Extra Prac: Car Emission Standards
08:39

An extra prac looking at multivariate-gaussian modelling of relative rates.

Extra Prac: Diagnosing Diabetes
16:20

An example on how to perform numerical uncertainty analysis that can be applied to almost any statistical problem.

Extra Prac: Numerical Uncertainty on Sales
26:56
Requirements
  • Python basics
Description

Welcome to Python for Statistical Analysis!


This course is designed to position you for success by diving into the real world of statistics and data science.


  1. Learn through real-world examples: Instead of sitting through hours of theoretical content and struggling to connect it to real-world problems, we'll focus entirely on applied statistics. We'll take theory and immediately apply it to common problems in Python, giving you the knowledge and skills you need to excel.


  2. Presentation-focused outcomes: Crunching the numbers is easy, and quickly becoming the domain of computers and not people. Where people add value is in interpreting and visualising outcomes, so we focus heavily on this, integrating visual output and graphical exploration in our workflows. Plus, there's extra bonus content on great ways to spice up visuals for reports, articles and presentations, so that you can stand out from the crowd.


  3. Modern tools and workflows: This isn't school, where we spend hours grinding through problems by hand to reinforce learning. No, we'll solve our problems using state-of-the-art techniques and code libraries, utilising features from the very latest software releases to make us as productive and efficient as possible. Don't reinvent the wheel when the industry has moved to rockets.

Who this course is for:
  • Data Scientists who want to add statistical analysis to their skillset
  • Data Scientists who want to do machine learning but want some more statistical foundations before jumping in
  • Students wanting to learn applied statistics for research, coursework or business