Exploratory Data Analysis with Pandas and Python 3.x
3.7 (45 ratings)
149 students enrolled

# Exploratory Data Analysis with Pandas and Python 3.x

Extract and transform your data to gain valuable insights
Created by Packt Publishing
Last updated 8/2019
English
English [Auto-generated]
Current price: \$86.99 Original price: \$124.99 Discount: 30% off
This course includes
• 5 hours on-demand video
• Access on mobile and TV
• Certificate of Completion

What you'll learn
• Improve your understanding of descriptive statistics and apply them over a dataset.
• Learn how to deal with missing data and outliers to resolve data inconsistencies.
• Explore various visualization techniques for bivariate and multivariate analysis.
• Enhance your programming skills and master data exploration and visualization in Python.
• Learn multidimensional analysis and reduction techniques.
Course content
32 lectures 05:03:49
+ Descriptive Statistics
6 lectures 37:44

This video will give you an overview of the course.

Preview 04:38

Before moving on to the coding part of the course, we must lay the foundation of descriptive statistics, which will be used heavily throughout the course.

•  Explore the various statistical measures such as mean, median, and mode

•  Understand the various properties of these measures

•  Learn how to calculate these statistical measures

Basic Statistical Measures
07:39
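As a rough illustration of the measures named above, here is a minimal sketch using Pandas. The `scores` values are made up for this example and are not taken from the course material.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration.
scores = pd.Series([2, 3, 3, 5, 7, 10])

mean = scores.mean()            # arithmetic average: 30 / 6
median = scores.median()        # middle of the sorted data: (3 + 5) / 2
mode = scores.mode().tolist()   # most frequent value(s); there can be several

print(mean, median, mode)
```

Note that `mode()` returns a Series rather than a single number, because a dataset can have more than one most frequent value.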

Once we have learned how to calculate these statistical measures, we move on to visualizing them in the form of graphs for better understanding.

•  Explore the various graphs through which we can visualize the statistical measures

•  Understand how the visualization changes as the values of these measures change

•  Explore alternate graphs for visualizations

Variance and Standard Deviation
04:10

We must understand the importance of variance in data and how it ties in with the measures of central tendency.

•  Explore the concept of variance

•  Visualize variance in data

•  Understand how it depends on other statistical measures

Visualizing Statistical Measures
09:03
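The dependence of variance on the mean can be sketched with NumPy. The numbers below are hypothetical, and the example assumes the usual definitions: population variance divides by n, sample variance by n - 1.

```python
import numpy as np

# Hypothetical data chosen so the results are round numbers.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Variance is the average squared distance from the mean,
# so it cannot be computed without the mean.
mean = data.mean()
pop_var = np.mean((data - mean) ** 2)   # same as np.var(data)
pop_std = np.sqrt(pop_var)              # same as np.std(data)

# ddof=1 divides by n - 1 instead of n (sample variance).
sample_var = np.var(data, ddof=1)

print(mean, pop_var, pop_std, sample_var)
```

Pandas uses `ddof=1` by default in `Series.var` and `Series.std`, while NumPy uses `ddof=0`, so the two libraries give different results unless the argument is set explicitly.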

Percentiles allow us to interpret data in a more readable format. We will explore how they are calculated and what information they give regarding the dataset.

•  Understand what percentiles are

•  Explore how percentiles are calculated

•  Learn what information percentiles give about the dataset

Calculating Percentiles
05:10
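One possible way to calculate percentiles is NumPy's `percentile` function. This is only a sketch on made-up data; the course may use a different function or interpolation rule.

```python
import numpy as np

# Hypothetical data: the integers 1 through 10.
data = np.arange(1, 11)

# By default np.percentile interpolates linearly between
# the two nearest values in the sorted data.
p25, p50, p90 = np.percentile(data, [25, 50, 90])

print(p25, p50, p90)
```

The 50th percentile is the median, which ties percentiles back to the basic statistical measures.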

Once we are done with percentiles and how they can be calculated, we move on to the concept of Quartiles and how to visualize them using box plots.

•  Understand the concept of Quartiles

•  Visualize percentiles and Quartiles using box plots

•  Get a better understanding of box plots

Quartiles and Box Plots
07:04
+ Dealing with Missing Data
4 lectures 44:35

Most real-world datasets contain missing values for various reasons. In this video, we find out how to check whether a dataset has missing values using the Pandas library in Python.

•  Explore the various reasons for the missing values in datasets

•  Understand the various Pandas functions that can be used to find the missing values

•  Learn about the different types of missing values and how Pandas does type conversion for them

Preview 11:25
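A minimal sketch of finding missing values with Pandas, and of the type conversion mentioned above. The `age` and `city` columns are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [22, None, 35],
    "city": ["Pune", "Delhi", None],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = df.isna().sum()

# The integers in "age" were converted to float64, because the
# missing value is stored as NaN, which is a floating-point value.
age_dtype = df["age"].dtype

print(missing_per_column)
print(age_dtype)
```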

Once we have learned how to find missing values in the dataset, we move on to discussing the different ways to deal with missing values.

•  First, we discuss why simply ignoring rows with missing values might not work

•  Understand how we can impute missing values with measures of central tendency

•  Demonstrate via an example how we can fill missing values based on other columns

Dealing with Missing Values
06:18

Now, we move on to using the Pandas library to deal with missing data.

•  Explore the df.dropna function and its various parameters

•  Explore the various ways of filling missing values via df.fillna, df.ffill, and df.bfill

•  Implement an example in which we fill missing values based on values in other columns

Hands-on with Dealing with Missing Values
14:43
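The functions listed above can be sketched as follows. The DataFrame and its `group` and `value` columns are hypothetical, and the group-wise mean is just one possible way of filling values based on another column.

```python
import pandas as pd

# Hypothetical data: "value" has gaps, "group" is fully populated.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, None, 3.0, None],
})

# Drop rows where "value" is missing.
dropped = df.dropna(subset=["value"])

# Fill with a measure of central tendency (here, the column mean).
filled_mean = df["value"].fillna(df["value"].mean())

# Propagate the previous or the next valid observation.
forward = df["value"].ffill()
backward = df["value"].bfill()   # the last row has no later value, so it stays NaN

# Fill based on another column: use the mean of each group.
group_mean = df.groupby("group")["value"].transform("mean")
by_group = df["value"].fillna(group_mean)

print(by_group.tolist())
```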

We now apply the concepts that we have learned in this section to the real-world Titanic Dataset.

•  Load the Titanic Dataset and explore the various columns

•  Find out the descriptive statistics of the dataset

•  Impute missing values in the dataset

Case Study: Missing Data in Titanic Dataset
12:09
+ Dealing with Outliers
4 lectures 28:39

Sometimes we might encounter values in our dataset which are abnormally high, low, or simply weird as compared to other values in the dataset. We must understand what outliers are and what causes them to occur.

•  Understand what outliers are

•  Understand the causes of outliers

•  Explore the different types of outliers through examples

What are Outliers?
05:22

Z-scores are one of the most commonly used methods to identify outliers. In this video, we understand the idea behind Z-scores and how they can be used to identify outliers.

•  Discuss what Z-scores are and what they signify

•  Visualize Z-scores over a normal distribution for more clarity

•  Implement Z-scores to find outliers in a dummy dataset

Using Z-scores to Find Outliers
06:50
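A minimal sketch of Z-score outlier detection on a dummy dataset. The data is made up, and the cutoff of 3 standard deviations is a common convention assumed here, not a value stated in the course description.

```python
import numpy as np

# Hypothetical dummy data: 19 ordinary readings and one extreme value.
data = np.array([10, 11, 12, 11, 10, 12, 11, 13, 12, 11,
                 10, 11, 12, 11, 10, 12, 11, 13, 12, 50])

# A Z-score is the distance from the mean in units of standard deviation.
z_scores = (data - data.mean()) / data.std()

threshold = 3  # assumed cutoff; adjust for your data
outliers = data[np.abs(z_scores) > threshold]

print(outliers)
```

With very small samples the largest possible Z-score is bounded by the sample size, so a fixed cutoff of 3 may never be reached.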

Z-scores are sometimes not very effective, since the mean and standard deviation they rely on are themselves distorted by outliers. In this video, we use a modified version of the Z-score which is based on the median.

•  Understand why Z-score might fail in some cases

•  Understand the idea of Median, Standard Deviation, and Modified Z-scores

•  Implement an example in which we find outliers using Modified Z-scores

Modified Z-scores
07:41
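The sketch below contrasts the two approaches on a small made-up sample. It assumes the commonly used definition of the modified Z-score, 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation, together with the customary cutoff of 3.5. The course may define or tune these differently.

```python
import numpy as np

# Hypothetical small sample with one extreme value.
data = np.array([10, 11, 12, 11, 10, 12, 11, 50])

# Ordinary Z-scores: the extreme value inflates both mean and std,
# so it does not reach the usual cutoff of 3.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

# Modified Z-scores use the median and the median absolute deviation,
# which are barely affected by a single extreme value.
median = np.median(data)
mad = np.median(np.abs(data - median))   # assumes mad != 0 for this data
modified_z = 0.6745 * (data - median) / mad
modified_outliers = data[np.abs(modified_z) > 3.5]

print(z_outliers, modified_outliers)
```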

Finally, we also learn how to use Interquartile Range (IQR) to detect outliers in a dataset and visualize them via box plots.

•  Explore the concept of IQR and how it can be used to identify outliers

•  Visualize IQR and outliers over a box plot

•  Implement an example using IQR and box plots to detect outliers

Using IQR to Detect Outliers
08:46
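The IQR rule can be sketched numerically as below. The data is hypothetical, and the 1.5 * IQR fences are the convention used by standard box plots.

```python
import numpy as np

# Hypothetical data with one suspiciously large value.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# The IQR is the distance between the first and third quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers;
# a box plot draws these as individual points beyond the whiskers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q3, iqr, outliers)
```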
+ Univariate Analysis
4 lectures 39:47

Before moving on to analyzing the various types of variables in a dataset, we must understand the different variables that might occur in a dataset.

•  Understand what the different types of variables are

•  Explore the different types of numeric variables

•  Explore the different types of categorical variables

Types of Variables
17:25

Now that we have understood the different types of variables, let’s take a look at the different ways of analyzing variables using Python.

•  Create dummy data for our analysis

•  Implement code for plotting different types of graphs in Python

•  Explore the different graphs and libraries available in Python

Introduction to Univariate Analysis
06:27

After learning about the various graphs that we can use to explore columns in Python, we must first understand the concept of Skewness and Kurtosis in Statistics and how they affect the shape of a distribution.

•  Understand what Skewness is

•  Understand the idea behind Kurtosis

•  Explore how Skewness and Kurtosis affect the shape of the curve

Skewness and Kurtosis
04:16
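As a rough sketch, Pandas can report both quantities directly. The two series are made up, and `Series.skew` and `Series.kurt` return bias-corrected sample estimates, with `kurt` reporting excess kurtosis, so the numbers can differ slightly from those of other libraries.

```python
import pandas as pd

# Hypothetical data: one symmetric series and one with a long right tail.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 1, 2, 10])

# Skewness is about 0 for symmetric data and positive for a right tail.
sym_skew = symmetric.skew()
right_skew = right_skewed.skew()

# Excess kurtosis compares tail weight against a normal distribution.
sym_kurt = symmetric.kurt()
right_kurt = right_skewed.kurt()

print(sym_skew, right_skew, sym_kurt, right_kurt)
```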

Finally, we will apply the different techniques that we have learned for Univariate Analysis over the Olympics Dataset.

•  Explore the different columns in Olympics Dataset

•  Draw density plots, histograms, and other graphs over various columns

•  Find Skewness of the data using the SciPy module in Python

Univariate Analysis over Olympics Dataset
11:39
+ Bivariate Analysis
5 lectures 42:08

Now that we have explored univariate analysis, we move ahead to bivariate analysis where we explore two variables at the same time.

•  Understand what bivariate analysis is

•  Understand how bivariate analysis helps us understand our data better

•  List the various graphs used for bivariate analysis

Introduction to Bivariate Analysis
02:25

Before moving on to doing practical bivariate analysis, we must understand the theoretical concept behind correlation coefficients.

•  Explore the concept of correlation coefficient

•  Understand the different types of correlation coefficient

•  Understand what correlation coefficient signifies for our data

Correlation Coefficient
04:21
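Two common types of correlation coefficient are Pearson and Spearman; which types the course covers is an assumption here. In the sketch below, on made-up data, Spearman is computed as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical data: y grows with x, but not along a straight line.
x = pd.Series([1, 2, 3, 4, 5])
y = x ** 3

# Pearson measures how close the relationship is to a straight line.
pearson = x.corr(y)

# Spearman measures whether the relationship is monotonic;
# it equals the Pearson correlation of the ranked values.
spearman = x.rank().corr(y.rank())

print(pearson, spearman)
```

Both coefficients lie between -1 and 1. Here Spearman is 1 because `y` always increases with `x`, while Pearson is below 1 because the relationship is curved.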

After understanding the theoretical concepts behind correlation coefficients, we now move on to visualizing correlation between two sets of variables.

•  Implement code for positive and negative correlation

•  Use the Seaborn library to visualize scatterplots

•  Use heatmaps to visualize correlation between multiple pairs of columns at once

Scatter Plots and Heatmaps
08:25
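A heatmap of correlations is drawn from a correlation matrix, which Pandas can compute for all column pairs at once. The columns `a`, `b`, and `c` below are hypothetical and constructed to show perfect positive and negative correlation.

```python
import pandas as pd

# Hypothetical columns: b rises with a, c falls as a rises.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [8, 6, 4, 2],
})

# Pairwise Pearson correlation between every pair of columns.
corr = df.corr()

print(corr)
```

Passing `corr` to a plotting function such as Seaborn's `heatmap` then colours each cell by its coefficient.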

In this video, we will apply various techniques of bivariate analysis over the Titanic Dataset.

•  Implement bivariate graphs using Seaborn

•  Identify trends if they exist in the data

Bivariate Analysis: Titanic Dataset
08:32

In this video, we will apply various techniques of bivariate analysis over the video game sales dataset.

•  Load the video game sales dataset and understand the various columns

•  Implement interactive graphs using the Bokeh library in Python

•  Identify trends if they exist in the data using bivariate graphs

Bivariate Analysis: Video Game Sales
18:25
+ Multivariate Analysis
5 lectures 41:23

Now that we have explored univariate and bivariate analysis, we move ahead to multivariate analysis where we explore more than two variables at the same time.

•  Understand what multivariate analysis is

•  Understand the various advantages of multivariate analysis

•  Visualize a graph depicting multivariate analysis

Introduction to Multivariate Analysis
03:01

In this video, we will apply various techniques of multivariate analysis over the Titanic Dataset.

•  Load the Titanic Dataset and find descriptive statistics of the various variables

•  Implement multivariate graphs using Seaborn

•  Identify trends if they exist in the data

Multivariate Analysis over Titanic Dataset
10:06

In this video, we will apply various techniques of multivariate analysis over the Pokemon Dataset.

•  Load the Pokemon Dataset and find descriptive statistics of the various variables

•  Implement interactive graphs using Bokeh

•  Identify trends if they exist in the data using multivariate graphs

Multivariate Analysis over Pokemon Dataset
18:57

Simpson’s Paradox is a phenomenon that may occur in real-world data, leading to conflicting results. We examine why it happens and what we can do to prevent it.

•  Understand what Simpson’s Paradox is

•  Understand what causes it and how we can prevent it from happening

•  Demonstrate Simpson’s Paradox using an example

04:33
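Simpson's Paradox can be reproduced with a few lines of Pandas. The counts below are illustrative only and the column names are hypothetical: treatment A has the higher success rate within each stone size, yet B looks better once the groups are pooled.

```python
import pandas as pd

# Illustrative counts of successful outcomes per treatment and subgroup.
df = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "stone_size": ["small", "large", "small", "large"],
    "successes": [81, 192, 234, 55],
    "patients": [87, 263, 270, 80],
})

# Success rate within each subgroup: A is ahead in both.
by_group = df.assign(rate=df["successes"] / df["patients"])

# Success rate after pooling the subgroups: B is ahead.
totals = df.groupby("treatment")[["successes", "patients"]].sum()
overall = totals["successes"] / totals["patients"]

print(by_group)
print(overall)
```

The reversal happens because A was mostly given to the harder cases (large stones), so the subgroup sizes differ sharply between treatments. Comparing within subgroups, rather than only on pooled totals, exposes the effect.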

This is one of the most widely misinterpreted phenomena in the real world. We examine why it happens and what we can do to prevent it.

•  Understand why Correlation does not necessarily imply causation

•  Understand what causes it and how we can prevent it from happening

•  Demonstrate that correlation does not imply causation using various examples

Correlation Is Not Causation
04:46
+ Bringing It All Together
4 lectures 01:09:33

In this video, we will apply all the different techniques that we have learned in the previous sections to a real-world dataset.

•  Explore the different variables in the dataset

•  Create a set of questions that we will answer through our analysis

Wine Data Analysis: Initial Setup
04:49

Here we will do Exploratory Data Analysis over Red Wine Data.

•  Explore the different variables in the dataset

•  Identify trends if they exist in the data

Red Wine Analysis
24:35

In this video, we will do Exploratory Data Analysis over White Wine Data.

•  Explore the different variables in the dataset

•  Identify trends if they exist in the data

White Wine Analysis
21:49

Here, we will do a comparative analysis of how these wines differ from each other.

•  Explore the different variables in the dataset based on the type of wines

•  Identify trends if they exist in the data

White Wine versus Red Wine: Analysis
18:20
Requirements
• Basic Python programming experience required.
Description

How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on course shows non-programmers how to process information that’s initially too messy or difficult to access. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently.

This course will take you from Python basics to exploring many different types of data. Throughout the course, you will be working with real-world datasets to retrieve insights from data. You'll be exposed to different kinds of data structures and data-related problems. You'll learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more!