Data visualization and Descriptive Statistics with Python 3
4.4 (22 ratings)
160 students enrolled

# Data visualization and Descriptive Statistics with Python 3

Using practical real-world datasets to showcase how to visualize and analyze data with Python Pandas, scipy and numpy
Created by Luc Zio
Last updated 11/2018
English
English [Auto]
This course includes
• 5.5 hours on-demand video
• 1 Practice Test
• Access on mobile and TV
• Certificate of Completion

What you'll learn
• Create professional charts with real world data using Python 3
• Understand Python 3 visual analysis tools and how to use them
• Understand how and why some charting types are used to explore data in data science and Python
• Understand how the different Python libraries treat missing values in data
• Be able to use Python statistical libraries effectively to compute descriptive statistics
Course content
34 lectures 05:25:28
+ Getting started with the course
2 lectures 20:44

In this section, we will show you how to obtain Anaconda for Python 3 and how to launch the Jupyter notebooks.

Preview 07:35

In this lecture, you will learn how the course is organized. In particular, you will learn how to easily download the files necessary for each lecture as well as the projects files.

Course organization, Jupyter notebooks, data and project files using Python 3
13:09
+ Exploratory data analysis using Python 3 graphical libraries.
13 lectures 01:58:54

In this lecture, we use Python matplotlib graphical library to create a pie chart of the Ebola data. We illustrate how the chart is created and most importantly how to interpret the chart.

Preview 12:26

Quick check about displaying a chart in Python

Showing the graphical display in Python
1 question

In this video, we learn how to construct side by side pie charts of the Ebola data using the matplotlib library in Python 3.

Side by Side Pie charts using matplotlib library in Python 3
14:15

Subplot grid parameters can be written as a single three-digit integer such as 131 or as three separate arguments such as (1, 3, 1). For example, 131 means "1x3 grid, first subplot" and 234 means "2x3 grid, fourth subplot". So (1, 2, 1) means a figure with 1 row and 2 columns (charts), with the first subplot selected. As an example, the Python code looks like: plt.subplot(1, 3, 2)

Understanding subplotting in Python data visualization
1 question
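
As a rough illustration of the subplot notation described in this quiz, the sketch below draws two pie charts side by side. The labels and counts are hypothetical placeholders, not the Ebola data used in the lecture, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt

# Hypothetical category counts, for illustration only
labels = ["A", "B", "C"]
left_counts = [50, 30, 20]
right_counts = [20, 45, 35]

fig = plt.figure(figsize=(8, 4))

# Three separate arguments: 1 row, 2 columns, first subplot
ax_left = plt.subplot(1, 2, 1)
ax_left.pie(left_counts, labels=labels, autopct="%1.0f%%")
ax_left.set_title("Group 1")

# Same grid written as a single integer: 1 row, 2 columns, second subplot
ax_right = plt.subplot(122)
ax_right.pie(right_counts, labels=labels, autopct="%1.0f%%")
ax_right.set_title("Group 2")
```

In a notebook, calling `plt.show()` after the last line displays both charts in one figure.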

In this lecture, we showcase the creation of the stacked area plot using Washington DC crime statistics data and how to interpret it.

Creating a stacked area plot using Python seaborn library
12:26

Stacked area plots, like stacked bar plots, are used to visualize tabular data, for example the number of victims by year and crime type. The syntax for an area plot is df.plot(kind="area", stacked=True), where df is a crosstab table of the data.

Understanding stacked area and stacked bar plots in Python
1 question
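
The `df.plot(kind="area", stacked=True)` syntax mentioned in this quiz can be sketched as follows. The records are made up for illustration (they are not the Washington DC crime data), and the column names `year` and `crime_type` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import pandas as pd

# Hypothetical incident records, one row per incident
records = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2017, 2017, 2017],
    "crime_type": ["theft", "assault", "theft", "theft",
                   "assault", "assault", "theft", "assault"],
})

# Crosstab: rows are years, columns are crime types, cells are counts
table = pd.crosstab(records["year"], records["crime_type"])

# Stacked area plot of the crosstab; kind="bar" would give a stacked bar plot instead
ax = table.plot(kind="area", stacked=True)
ax.set_ylabel("Number of incidents")
```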

In this lecture, we use the seaborn graphical library to create a scatterplot of the systolic blood pressure data and show how to interpret the chart.

Creating a scatterplot chart in Python 3 using seaborn library.
08:48

We can use regplot in seaborn to create a scatter plot (XY plot): sns.regplot(x="Age", y="SBP", fit_reg=True, data=df)
By default, fit_reg is set to True, in which case a regression line is drawn through the data. If we don't want a regression line, we set fit_reg=False.

Understanding regplot for creating scatterplot in seaborn.
1 question
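
To make the effect of `fit_reg` concrete, here is a minimal sketch that draws the same scatter plot with and without the regression line. The `Age` and `SBP` values are invented for illustration and are not the course dataset; `ci=None` is an extra choice made here to skip the confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical ages and systolic blood pressure readings
df = pd.DataFrame({
    "Age": [25, 32, 41, 48, 55, 63, 70],
    "SBP": [112, 118, 125, 131, 138, 144, 152],
})

fig, (ax_fit, ax_plain) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot with a fitted regression line (the default behaviour)
sns.regplot(x="Age", y="SBP", data=df, fit_reg=True, ci=None, ax=ax_fit)

# Scatter plot only, no regression line
sns.regplot(x="Age", y="SBP", data=df, fit_reg=False, ax=ax_plain)
```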

In this lecture, we look at pairwise relationships between quantitative variables using the pairplot function in seaborn.

Creating a pairplot using Python seaborn graphical library
05:09

The pairplot function is available in the seaborn library. It helps to assess relationships between quantitative variables and shows the distribution of each variable using histograms.

Understanding the use of pairplot in Seaborn
1 question

In this lecture, we use seaborn boxplots to analyze corruption perception index data and compare the score by continent.

Preview 07:24

In this video, we use line plots in pandas to show the trend of life expectancy in countries of the world.

Creating a line plot trend of the data using Python pandas library
10:07

In this lecture, we use Python seaborn library to create a histogram about the world corruption perception index data

Creating a histogram using Python seaborn to analyze data
06:43

In this lecture, we use a Barplot to analyze the EBOLA data in Sierra Leone and Guinea

Creating a Barplot using Python seaborn library to analyze the Ebola data
07:44

In this lecture, we create a professional-looking bar chart using color palettes with the Python seaborn graphical library.

Creating a Barplot using colors palettes with Python seaborn library
05:04

In this lecture, we construct a stacked bar using a real world dataset about missing migrants.

Creating a Stacked bar of the missing migrants data using Python seaborn library
10:26

In this lecture, we construct an ordered bar chart of Duncan's prestige data. This ordered bar chart is equivalent to a Pareto chart.

Creating a Pareto type barchart using Python seaborn library
08:19

In this lecture, you will learn how to construct a heatmap in Python seaborn using the Washington DC crime statistics data

Creating a heatmap plot using Python seaborn library
10:03
+ Projects and hands on applications
1 lecture 05:30

The purpose of this lecture is to help you get hands on practice with real-world data

Hands on project about visualizations in Python
05:30
+ Analyzing descriptive statistics using Pandas libraries in Python 3
7 lectures 01:33:22

In this lecture, we use Pandas in Python 3 to compute various descriptive statistics on real-world datasets, including the mean, standard deviation and mean absolute deviation.

Computing descriptive statistics in Python Pandas Part 1
18:29
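
A minimal sketch of the three statistics named in this lecture, using a small made-up Series rather than the course data. The mean absolute deviation is computed by hand here because recent pandas releases no longer provide the `mad()` method that older versions had.

```python
import pandas as pd

# Hypothetical scores, for illustration only
scores = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = scores.mean()

# pandas uses the sample standard deviation (ddof=1) by default;
# pass ddof=0 for the population standard deviation
sample_std = scores.std()
population_std = scores.std(ddof=0)

# Mean absolute deviation around the mean, computed explicitly
mean_abs_dev = (scores - mean).abs().mean()

print(mean, sample_std, population_std, mean_abs_dev)
```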

In this lecture, we show how to use Pandas in Python 3 to aggregate data by computing the number of observations, mean absolute deviation, mean, min, max, median, standard deviation and skewness coefficient.

Analyzing Baseball players data with Pandas in Python 3
12:41
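
The kind of grouped summary described in this lecture might look like the sketch below. The `team` and `salary` columns and their values are hypothetical stand-ins for the baseball players data.

```python
import pandas as pd

# Hypothetical player records
players = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B", "B"],
    "salary": [1.0, 2.0, 6.0, 3.0, 3.0, 3.0, 7.0],
})

# One row per team, one column per statistic
summary = players.groupby("team")["salary"].agg(
    ["count", "mean", "min", "max", "median", "std", "skew"]
)

print(summary)
```

The mean absolute deviation is not one of the built-in aggregation names in recent pandas versions, so it would need a custom function passed to `agg`.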

In addition to computing Pearson, Kendall and Spearman correlation using Python Pandas library, in this lecture we compute skewness coefficient, percentiles and ranks using real world datasets.

Computing descriptive statistics in Python Pandas Part 2
19:20
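
As a small illustration of the skewness, percentile and rank computations this lecture covers, the following sketch uses a made-up Series. The values are assumptions chosen so that the ties and the long right tail are easy to see.

```python
import pandas as pd

# Hypothetical values with one tie and one large value
values = pd.Series([10, 20, 20, 40, 100])

skewness = values.skew()                         # positive for a long right tail
quartiles = values.quantile([0.25, 0.5, 0.75])   # 25th, 50th and 75th percentiles
ranks = values.rank()                            # ties receive the average rank by default

print(skewness)
print(quartiles)
print(ranks.tolist())
```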

In this lecture, you will learn how to compute Kendall tau coefficient of correlation, Pearson Correlation coefficient and Spearman Rank correlation using real world datasets.

Computing correlation coefficients with Python Scipy library
05:37
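
The three coefficients named in this lecture are available in `scipy.stats`. The sketch below uses invented data where `y` grows monotonically but not linearly with `x`, which is one way to see how the rank-based coefficients differ from Pearson's.

```python
from scipy import stats

# Hypothetical data: y = x cubed, so the relationship is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

pearson_r, pearson_p = stats.pearsonr(x, y)      # linear association
spearman_rho, spearman_p = stats.spearmanr(x, y) # rank-based association
kendall_tau, kendall_p = stats.kendalltau(x, y)  # concordance of pairs

print(pearson_r, spearman_rho, kendall_tau)
```

Because the ordering of `y` matches the ordering of `x` exactly, the Spearman and Kendall coefficients both equal 1, while the Pearson coefficient is below 1.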

In this lecture, we compute the coefficient of variation in scipy stats and explain how it's useful in our daily work.

Computing the coefficient of variation in Python scipy statistics library
04:55
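
A minimal sketch of the coefficient of variation with `scipy.stats.variation`, on made-up numbers. By default this function divides the population standard deviation (ddof=0) by the mean.

```python
from scipy import stats

# Hypothetical measurements: mean 5, population standard deviation 2
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Coefficient of variation = standard deviation / mean
cv = stats.variation(data)

print(cv)
```

Since the result is unitless, it can be used to compare the relative spread of variables measured on different scales.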

At the end of this video, students will be able to tackle real-world applications using the classification techniques taught in the course with Pandas in Python 3.

Classifying World literacy rate using Pandas libraries in Python
20:50

In this lecture, we show how to use Pandas quantile functions to detect outliers in a real-world infant mortality dataset.

Finding outliers in data using Python Pandas library with quantiles functions
11:30
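
One common way to flag outliers with quantiles is Tukey's 1.5 × IQR rule. The sketch below assumes that rule; the lecture may use different thresholds, and the rates shown are invented rather than taken from the infant mortality dataset.

```python
import pandas as pd

# Hypothetical rates with one unusually large value
rates = pd.Series([4, 5, 5, 6, 6, 7, 7, 8, 40])

q1 = rates.quantile(0.25)
q3 = rates.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (an assumed convention)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = rates[(rates < lower_fence) | (rates > upper_fence)]
print(outliers.tolist())
```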
+ Computing descriptive statistics in Python Scipy library
4 lectures 36:47

In this lecture, we use functions in Python scipy stats to compute various means such as the geometric and harmonic means.

Using Python Scipy library to compute various measures of center of the data
10:25
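
The geometric and harmonic means mentioned in this lecture can be computed with `scipy.stats.gmean` and `scipy.stats.hmean`. The sketch uses three made-up positive values and compares the results with the arithmetic mean.

```python
from scipy import stats

# Hypothetical positive values (both means require positive data)
data = [1.0, 2.0, 4.0]

arithmetic = sum(data) / len(data)  # (1 + 2 + 4) / 3
geometric = stats.gmean(data)       # cube root of 1 * 2 * 4
harmonic = stats.hmean(data)        # 3 / (1/1 + 1/2 + 1/4)

print(arithmetic, geometric, harmonic)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.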

In this lecture, you learn how to compute and interpret the Z score from the world corruption perception data.

Computing the Z score using Python Scipy library
07:48
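
A minimal sketch of the Z score computation with `scipy.stats.zscore`, using invented scores rather than the corruption perception data. By default the function standardizes with the population standard deviation (ddof=0).

```python
import numpy as np
from scipy import stats

# Hypothetical scores: mean 5, population standard deviation 2
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# z = (value - mean) / standard deviation
z = stats.zscore(scores)

print(z)
```

A Z score of 2 for the last value means it lies two standard deviations above the mean.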

In this lecture, we use the percentileofscore function in Python Scipy to compute percentiles of the data as well as to rank the data.

Computing percentiles of scores and IQR using Python Scipy library
07:39
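
The percentile-of-score and IQR computations in this lecture can be sketched with `scipy.stats.percentileofscore` and `scipy.stats.iqr`. The four scores below are placeholders chosen to keep the arithmetic easy to follow.

```python
from scipy import stats

# Hypothetical scores
scores = [1, 2, 3, 4]

# Percentile rank of the score 3 within the data (default kind="rank")
pct = stats.percentileofscore(scores, 3)

# Interquartile range: 75th percentile minus 25th percentile
spread = stats.iqr(scores)

print(pct, spread)
```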

In this lecture, we showcase how to compute trimmed means using Scipy stats library in Python

Computing trimmed statistics using Python 3 scipy statistics library
10:55
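
As an illustration of a trimmed mean, the sketch below uses `scipy.stats.trim_mean` on made-up data containing one extreme value. The 10% cut from each end is an arbitrary choice for this example.

```python
from scipy import stats

# Hypothetical data with one extreme value
data = [1, 2, 3, 4, 100, 5, 6, 7, 8, 9]

plain_mean = sum(data) / len(data)

# Drop the lowest 10% and highest 10% of the sorted values, then average the rest
trimmed = stats.trim_mean(data, proportiontocut=0.1)

print(plain_mean, trimmed)
```

With ten values, a 10% cut removes one observation from each end (here 1 and 100), so the trimmed mean is much less affected by the extreme value.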
+ Computing Descriptive Statistics using the Statistics library in Python
3 lectures 25:34

In this lecture, you will learn about missing values and their effect on statistical computations within the statistics library in Python.

Computing statistics with missing values using the statistics library in Python
06:59

In this lecture, we showcase how to handle missing values to perform proper statistical computations using the statistics library in Python

Handling missing values using the statistics library in Python
08:52
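
The two lectures above concern missing values in the standard-library `statistics` module. A minimal sketch, assuming missing values are represented as `float("nan")`: the module does not skip NaN on its own, so one option is to filter them out first.

```python
import math
import statistics

# Hypothetical measurements with one missing value encoded as NaN
values = [2.0, 4.0, float("nan"), 6.0]

# The NaN propagates into the result
raw_mean = statistics.mean(values)

# Remove missing values before computing
clean = [v for v in values if not math.isnan(v)]
clean_mean = statistics.mean(clean)

print(raw_mean, clean_mean)
```

Order-based functions such as `statistics.median` can give misleading results when NaN is present, because NaN does not sort consistently; filtering first avoids that.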

In this lecture, we showcase how to compute the median, median low and median high using the statistics library in Python

Computing various medians using the Statistics library in Python
09:43
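
The three medians named in this lecture differ only when the number of observations is even. A short sketch with four made-up values:

```python
import statistics

# Hypothetical data with an even number of observations
data = [1, 3, 5, 7]

middle = statistics.median(data)      # average of the two middle values
low = statistics.median_low(data)     # the lower of the two middle values
high = statistics.median_high(data)   # the higher of the two middle values

print(middle, low, high)
```

Unlike `median`, the low and high medians always return a value that actually occurs in the data.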
+ Computing Descriptive Statistics using the Numpy library in Python
2 lectures 21:02

In this lecture, we show how to handle missing data in NumPy, and specifically how to compute descriptive statistics when missing values are present.

Handling missing values in Numpy library in Python
09:23
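
A minimal sketch of how NumPy treats missing values encoded as `np.nan`, on a made-up array: the regular functions propagate NaN, while the `nan`-prefixed variants ignore it.

```python
import numpy as np

# Hypothetical data with one missing value
data = np.array([1.0, np.nan, 3.0])

plain_mean = np.mean(data)    # NaN propagates into the result
safe_mean = np.nanmean(data)  # NaN is ignored
safe_std = np.nanstd(data)    # population standard deviation of the non-missing values

print(plain_mean, safe_mean, safe_std)
```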

In this lecture we illustrate how to use numpy to compute various statistical functions including weighted means

Computing Descriptive Statistics using the Numpy library in Python
11:39
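
Weighted means, mentioned in this lecture, can be computed with `np.average`. The grades and weights below are hypothetical.

```python
import numpy as np

# Hypothetical grades, with the last one counting twice as much
grades = np.array([80.0, 90.0, 100.0])
weights = np.array([1.0, 1.0, 2.0])

simple_mean = np.mean(grades)

# Weighted mean = sum(weights * grades) / sum(weights)
weighted_mean = np.average(grades, weights=weights)

print(simple_mean, weighted_mean)
```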

The purpose of this practice test is to measure your understanding of how numpy computes statistics with missing values.

Understanding numpy computations when missing values are in the data
3 questions
+ Hands on analysis of Descriptive statistics data in Python 3
1 lecture 01:36

After this lecture, you will understand how to use exploratory data analysis techniques in Python to analyze data.

Analyzing life expectancy data using exploratory data analysis in Python
01:36
+ Conclusion for the course
1 lecture 01:59

Concluding remarks and next steps

Conclusion for the course
01:59
Requirements
• Python Anaconda 3.6 using Jupyter notebook
• Introductory level of Python
• Introductory statistics
Description

This course is designed to teach analysts, statisticians, data scientists and students interested in data science how to analyze real-world data by creating professional-looking charts and using numerical descriptive statistics techniques in Python 3. You will learn how to use charting libraries in Python 3 to analyze real-world data about corruption perception, infant mortality rate, life expectancy, the Ebola virus, alcohol and liver disease, world literacy rate, violent crime in the USA, the soccer World Cup, migrant deaths, etc.

You will also learn how to effectively use the various statistical libraries in Python 3 such as numpy, scipy.stats, pandas and statistics to create all descriptive statistics summaries that are necessary for analyzing real world data.
In this course, you will understand how each library handles missing values and you will learn how to compute the various statistics properly when missing values are present in the data.

The course will teach you what you need to know in order to analyze real-world data hands-on using Python 3. You will be able to create appropriate visualizations using the seaborn, matplotlib or pandas libraries in Python 3.

Using a wide variety of world datasets, we will analyze each dataset using these tools within pandas, matplotlib and seaborn:

• Correlation plots

• Box plots for comparing group distributions

• Time series and line plots

• Side by side comparative pie charts

• Area charts

• Stacked bar charts

• Histograms of continuous data

• Bar charts

• Regression plots

• Statistical measures of the center of the data

• Statistical measures of spread in the data

• Statistical measures of relative standing in the data

• Calculating Correlation coefficients

• Ranking and relative standing in data

• Determining outliers in datasets

• Binning data in terciles, quartiles, quintiles, deciles, etc.
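
As an example of the last item in this list, binning into quartiles can be sketched with `pd.qcut`. The values and the labels `Q1` to `Q4` are assumptions for illustration; passing 3, 5 or 10 bins instead would give terciles, quintiles or deciles.

```python
import pandas as pd

# Hypothetical values: the integers 1 through 12
values = pd.Series(range(1, 13))

# Split into four equal-sized groups based on the quartiles of the data
quartile_bins = pd.qcut(values, 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Number of observations in each bin, in bin order
counts = quartile_bins.value_counts().sort_index()
print(counts)
```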

The course is taught using Anaconda Jupyter notebooks to support reproducible research: we use markdown cells to clearly document the code so that it is easy to understand and share.

This is what some students are saying:

"I really like the tips that you share in every unit in the course sections. This was a well delivered course."

"I am a Data Scientist with many years using Python /Big Data. The content of this course provides a rich resource to students interested in learning hands on data visualization in Python and the analysis of descriptive statistics. I will recommend this course anyone trying to come into this domain."

Who this course is for:
• Anyone interested in charting real world datasets with Python 3
• Anyone interested in Exploratory Data Analysis using Python 3
• Anyone interested in understanding how the Pandas, Numpy, statistics and Scipy libraries treat missing values in Python and how they affect data science computations
• Anyone interested in understanding how to compute descriptive statistics using Python libraries
• Anyone interested in understanding how to effectively use the different statistical libraries for computing descriptive statistics