Exploratory Data Analysis with Pandas and Python 3.x

Extract and transform your data to gain valuable insights

Created by Packt Publishing

Last updated 8/2019

English

What you'll learn

Improve your understanding of descriptive statistics and apply them to a dataset.
Learn how to deal with missing data and outliers to resolve data inconsistencies.
Explore various visualization techniques for bivariate and multivariate analysis.
Enhance your programming skills and master data exploration and visualization in Python.
Learn multidimensional analysis and reduction techniques.
Master advanced visualization techniques (such as heatmaps) for better analysis and rapidly broaden your understanding.

Course content

7 sections • 32 lectures • 5h 3m total length

The Course Overview • 4:38
This video gives you an overview of the course.
Basic Statistical Measures • 7:39
Before moving on to the coding part of the course, we must lay the foundation of descriptive statistics, which will be used heavily throughout the course.
• Explore various statistical measures such as mean, median, and mode
• Understand the various properties of these measures
• Learn how to calculate these statistical measures
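As a rough illustration of these three measures, the sketch below uses Python's standard-library `statistics` module on made-up data. The `ages` list is hypothetical and does not come from the course, which may well compute the same values with Pandas instead.

```python
import statistics

# Hypothetical sample: ages of seven survey respondents.
ages = [23, 25, 25, 29, 31, 35, 70]

mean_age = statistics.mean(ages)      # arithmetic average, pulled upward by the 70
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```

Note how the single large value moves the mean well above the median, which is one of the properties that distinguishes these measures.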
Variance and Standard Deviation • 4:10
We must understand the importance of variance in data and how it ties in with the measures of central tendency.
• Explore the concept of variance
• Visualize variance in data
• Understand how it depends on other statistical measures
Visualizing Statistical Measures • 9:03
Once we have learned how to calculate these statistical measures, we move on to visualizing them in the form of graphs for better understanding.
• Explore the various graphs through which we can visualize the statistical measures
• Understand how the visualization changes as the values of these measures change
• Explore alternate graphs for visualizations
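For the variance and standard deviation mentioned above, a minimal sketch on invented numbers might look like this. The `scores` list is an assumption made for illustration, and the standard-library `statistics` module stands in for whatever tooling the course actually uses.

```python
import statistics

# Hypothetical data set whose mean is exactly 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: mean of squared deviations from the mean.
population_variance = statistics.pvariance(scores)
# Population standard deviation: square root of the population variance.
population_std = statistics.pstdev(scores)
# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(scores)

print(population_variance, population_std, sample_variance)
```

Both quantities are defined relative to the mean, which is how variance depends on the other statistical measures.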
Calculating Percentiles • 5:10
Percentiles allow us to interpret data in a more readable format. We will explore how they are calculated and what information they give regarding the dataset.
• Understand what percentiles are and how they make data easier to interpret
• Explore how percentiles are calculated
• Learn what information percentiles give about a dataset
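One common way to calculate a percentile is linear interpolation between the two nearest sorted values. The sketch below is an illustration under that assumption, not necessarily the method the course teaches. The helper `percentile` and the sample `data` are hypothetical names introduced here.

```python
def percentile(values, p):
    """Return the p-th percentile (0 to 100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


data = [15, 20, 35, 40, 50]
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))
```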
Quartiles and Box Plots • 7:04
Once we are done with percentiles and how they can be calculated, we move on to the concept of Quartiles and how to visualize them using box plots.
• Understand the concept of Quartiles
• Visualize percentiles and Quartiles using box plots
• Get a better understanding of box plots
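A box plot is essentially a drawing of the five-number summary. The sketch below computes that summary with the standard library, assuming the "inclusive" quartile convention. The `data` values and the helper `five_number_summary` are made up for illustration, and other quartile conventions give slightly different results.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical measurements.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
summary = five_number_summary(data)
print(summary)
```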

Finding Missing Values • 11:25
Most real-world datasets contain missing values for various reasons. In this video, we find out how to check whether a dataset has missing values using the Pandas library in Python.
• Explore the various reasons for the missing values in datasets
• Understand the various Pandas functions that can be used to find the missing values
• Learn about the different types of missing values and how Pandas does type conversion for them
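A small sketch of how missing values can be located with Pandas is shown below. The DataFrame contents and column names are invented for illustration. `isna`, `sum`, and `any` are standard Pandas methods, but the course may demonstrate different ones.

```python
import pandas as pd

nan = float("nan")

# Hypothetical records with a gap in each column.
df = pd.DataFrame({
    "age": [22, nan, 35, 41],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

# Count of missing cells per column.
missing_per_column = df.isna().sum()
# Rows that contain at least one missing cell.
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
# The integer ages are stored as floats because NaN is a float value.
print(df["age"].dtype)
```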
Dealing with Missing Values • 6:18
Once we have learned how to find missing values in the dataset, we move on to discussing the different ways to deal with missing values.
• First, we discuss why simply ignoring rows with missing values might not work
• Understand how we can impute missing values with measures of central tendency
• Demonstrate via an example how we can fill missing values based on other columns
Hands-on with Dealing with Missing Values • 14:43
Now, we move on to using the Pandas library to deal with missing data.
• Explore the df.dropna function and its various parameters
• Explore the various ways of filling missing values via df.fillna, df.ffill, and df.bfill
• Implement an example in which we fill missing values based on values in other columns
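The functions listed above could be exercised roughly as follows. The `team` and `score` columns are hypothetical, and filling by group mean is only one possible way to fill values based on another column; the course's own example may differ.

```python
import pandas as pd

nan = float("nan")

# Hypothetical scores with two gaps.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "score": [10.0, nan, nan, 40.0],
})

# Drop rows where "score" is missing.
dropped = df.dropna(subset=["score"])
# Fill every gap with the overall column mean.
filled_mean = df["score"].fillna(df["score"].mean())
# Carry the previous valid value forward, or the next one backward.
forward = df["score"].ffill()
backward = df["score"].bfill()
# Fill each gap with the mean score of the row's own team.
by_team = df["score"].fillna(df.groupby("team")["score"].transform("mean"))

print(by_team.tolist())
```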
Case Study: Missing Data in Titanic Dataset • 12:09
We apply the concepts learned in this section to the real-world Titanic Dataset.
• Load the Titanic Dataset and explore the various columns
• Find out the descriptive statistics of the dataset
• Impute missing values in the dataset

What are Outliers? • 5:22
Sometimes we might encounter values in our dataset which are abnormally high, low, or simply weird as compared to other values in the dataset. We must understand what outliers are and what causes them to occur.
• Understand what outliers are
• Understand the causes of outliers
• Explore, via examples, the different types of outliers
Using Z-scores to Find Outliers • 6:50
Z-scores are one of the most commonly used methods to identify outliers. In this video, we explore the idea behind Z-scores and how they can be used to identify outliers.
• Discuss what Z-scores are and what they signify
• Visualize Z-scores over a normal distribution for more clarity
• Implement Z-scores to find outliers in a dummy dataset
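A minimal sketch of Z-score outlier detection on a dummy dataset follows. The threshold of 3, the helper name `zscore_outliers`, and the data are assumptions chosen for illustration rather than values taken from the course.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [x for x in values if abs((x - mean) / std) > threshold]


# Hypothetical readings clustered around 50, plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [95]
outliers = zscore_outliers(data)
print(outliers)
```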
Modified Z-scores • 7:41
Z-scores can be unreliable because the mean and standard deviation they depend on are themselves distorted by outliers. In this video, we use a modified version of the Z-score that is based on the median.
• Understand why Z-score might fail in some cases
• Understand the idea of Median, Standard Deviation, and Modified Z-scores
• Implement an example in which we find outliers using Modified Z-scores
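The sketch below assumes the commonly cited formulation of the modified Z-score, 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation, with a cutoff of 3.5. Whether the course uses exactly this formula and cutoff is an assumption. The data is invented so that the ordinary Z-score of the extreme value stays just under 3.

```python
import statistics


def modified_zscores(values):
    """Return modified Z-scores based on the median and the MAD."""
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (x - median) / mad for x in values]


# Hypothetical small sample with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
scores = modified_zscores(data)
outliers = [x for x, m in zip(data, scores) if abs(m) > 3.5]

# Ordinary Z-score of the extreme value, for comparison.
plain_z = (50 - statistics.mean(data)) / statistics.pstdev(data)
print(outliers, round(plain_z, 2))
```

Here the extreme value inflates the standard deviation enough that an ordinary Z-score with a cutoff of 3 would miss it, while the median-based score flags it clearly.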
Using IQR to Detect Outliers • 8:46
Finally, we also learn how to use Interquartile Range (IQR) to detect outliers in a dataset and visualize them via box plots.
• Explore the concept of IQR and how it can be used to identify outliers
• Visualize IQR and outliers over a box plot
• Implement an example using IQR and box plots to detect outliers
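As an illustration of the IQR rule, the sketch below flags values lying more than 1.5 × IQR outside the quartiles, which is also where box plot whiskers are conventionally cut off. The multiplier 1.5, the "inclusive" quartile method, and the sample data are assumptions made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical data: Q1 is 4 and Q3 is 8, so the fences are -2 and 14.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
outliers = iqr_outliers(data)
print(outliers)
```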

Types of Variables • 17:25
Before moving on to analyzing the various types of variables in a dataset, we must understand the different variables that might occur in a dataset.
• Understand the different types of variables
• Explore the different types of numeric variables
• Explore the different types of categorical variables
Introduction to Univariate Analysis • 6:27
Now that we have understood the different types of variables, let’s take a look at the different ways of analyzing variables using Python.
• Create dummy data for our analysis
• Implement code for plotting different types of graphs in Python
• Explore the different graphs and libraries available in Python
Skewness and Kurtosis • 4:16
After learning about the various graphs that we can use to explore columns in Python, we must understand the concepts of Skewness and Kurtosis in Statistics and how they affect the shape of a distribution.
• Understand what Skewness is
• Understand the idea behind Kurtosis
• Explore how Skewness and Kurtosis affect the shape of the curve
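To make these two ideas concrete, the sketch below computes moment-based skewness and excess kurtosis by hand. It assumes the population (biased) definitions, and the helper names and sample lists are hypothetical. Library functions such as those in SciPy offer the same quantities with additional options.

```python
def central_moment(values, k):
    """Return the k-th central moment of the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Third standardized moment: 0 for symmetric data, positive for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), skewness(right_skewed), excess_kurtosis(symmetric))
```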
Univariate Analysis over Olympics Dataset • 11:39
Finally, we will apply the different techniques that we have learned for Univariate Analysis over the Olympics Dataset.
• Explore the different columns in Olympics Dataset
• Draw density plots, histograms, and other graphs over various columns
• Find Skewness of the data using the SciPy module in Python

Introduction to Bivariate Analysis • 2:25
Now that we have explored univariate analysis, we move ahead to bivariate analysis where we explore two variables at the same time.
• Understand what bivariate analysis is
• Understand how bivariate analysis helps us understand our data better
• List out various graphs used for bivariate analysis
Correlation Coefficient • 4:21
Before moving on to doing practical bivariate analysis, we must understand the theoretical concept behind correlation coefficients.
• Explore the concept of correlation coefficient
• Understand the different types of correlation coefficient
• Understand what correlation coefficient signifies for our data
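As one concrete instance, the Pearson correlation coefficient can be sketched as below. Pearson is only one of several correlation coefficients, and choosing it here is an assumption about what the course covers. The function name and data are hypothetical.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Hypothetical data: study hours against test scores and against error counts.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
errors = [10, 8, 6, 4, 2]

print(pearson(hours, scores), pearson(hours, errors))
```

A value near +1 signals that the two variables rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.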
Scatter Plots and Heatmaps • 8:25
After understanding the theoretical concepts behind correlation coefficients, we now move on to visualizing correlation between two sets of variables.
• Implement code for positive and negative correlation
• Use the Seaborn library to visualize scatter plots
• Use heatmaps to visualize correlation between multiple pairs of columns at once
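A correlation heatmap is usually drawn from a correlation matrix. The sketch below builds such a matrix with Pandas on invented columns; the plotting call is left as a comment because it assumes Seaborn is installed, and the course's own plotting code may differ.

```python
import pandas as pd

# Hypothetical measurements: weight rises with height, lap time falls.
df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 66, 74],
    "lap_time": [80, 75, 70, 65],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()
print(corr)

# With Seaborn available, the matrix could be drawn as a heatmap:
# import seaborn as sns
# sns.heatmap(corr, annot=True)
```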
Bivariate Analysis: Titanic Dataset • 8:32
In this video, we will apply various techniques of bivariate analysis over the Titanic Dataset.
• Load the Titanic Dataset
• Implement bivariate graphs using Seaborn
• Identify trends if they exist in the data
Bivariate Analysis: Video Game Sales • 18:25
In this video, we will apply various techniques of bivariate analysis over the video game sales dataset.
• Load the video game sales dataset and understand the various columns
• Implement interactive graphs using the Bokeh library in Python
• Identify trends if they exist in the data using bivariate graphs

Introduction to Multivariate Analysis • 3:01
Now that we have explored univariate and bivariate analysis, we move ahead to multivariate analysis where we explore more than two variables at the same time.
• Understand what multivariate analysis is
• Understand the various advantages of multivariate analysis
• Visualize a graph depicting multivariate analysis
Multivariate Analysis over Titanic Dataset • 10:06
In this video, we will apply various techniques of multivariate analysis over the Titanic Dataset.
• Load the Titanic Dataset and find descriptive statistics of the various variables
• Implement multivariate graphs using Seaborn
• Identify trends if they exist in the data
Multivariate Analysis over Pokemon Dataset • 18:57
In this video, we will apply various techniques of multivariate analysis over the Pokemon Dataset.
• Load the Pokemon Dataset and find descriptive statistics of the various variables
• Implement interactive graphs using Bokeh
• Identify trends if they exist in the data using multivariate graphs
Simpson’s Paradox • 4:33
Simpson’s Paradox is a phenomenon that may occur in real-world data, leading to conflicting results. We understand why it happens and what we can do to prevent it.
• Understand what Simpson’s Paradox is
• Understand what causes it and how we can prevent it from happening
• Demonstrate Simpson’s Paradox using an example
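One way to see the paradox is with the widely cited kidney-stone treatment figures, reproduced below from memory as an illustrative assumption; the course may use an entirely different example. Treatment A has the higher success rate within each stone-size group, yet treatment B looks better once the groups are pooled.

```python
# (successes, patients) for each treatment within each stone-size group.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def group_rate(treatment, group):
    """Success rate of a treatment within one stone-size group."""
    successes, patients = results[treatment][group]
    return successes / patients


def overall_rate(treatment):
    """Success rate of a treatment with both groups pooled together."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(n for _, n in results[treatment].values())
    return successes / patients


for group in ("small", "large"):
    print(group, round(group_rate("A", group), 3), round(group_rate("B", group), 3))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```

The reversal arises because treatment A was given mostly to the harder large-stone cases, so the group sizes, not the treatments alone, drive the pooled rates.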
Correlation Is Not Causation • 4:46
This is one of the most widely misinterpreted phenomena that occur in the real world. We understand why it happens and what we can do to prevent it.
• Understand why Correlation does not necessarily imply causation
• Understand what causes it and how we can prevent it from happening
• Demonstrate that correlation does not imply causation using various examples

Wine Data Analysis: Initial Setup • 4:49
In this video, we will apply all the different techniques that we have learned in the previous sections to a real-world dataset.
• Download and load the dataset
• Explore the different variables in the dataset
• Create a set of questions that we will answer through our analysis
Red Wine Analysis • 24:35
Here we will do Exploratory Data Analysis over Red Wine Data.
• Download and load the dataset
• Explore the different variables in the dataset
• Identify trends if they exist in the data
White Wine Analysis • 21:49
In this video, we will do Exploratory Data Analysis over White Wine Data.
• Download and load the dataset
• Explore the different variables in the dataset
• Identify trends if they exist in the data
White Wine versus Red Wine: Analysis • 18:20
Here, we will do a comparative analysis of how these wines differ from each other.
• Download and load the dataset
• Explore the different variables in the dataset based on the type of wines
• Identify trends if they exist in the data

Requirements

Basic Python programming experience required.

Description

How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on course shows non-programmers how to process information that’s initially too messy or difficult to access. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently.

This course takes you from Python basics to exploring many different types of data. Throughout the course, you will be working with real-world datasets to retrieve insights from data. You'll be exposed to different kinds of data structures and data-related problems. You'll learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more!

About the Author

Mohammed Kashif works as a Data Scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in Computer Science at IIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics. In his spare time, he likes to answer questions on StackOverflow and help other people debug their way out of misery. He is also an experienced teaching assistant with a demonstrated history of working in the Higher-Education industry.

Who this course is for:

This course is for Python developers, data analysts, and IT professionals who want to move toward a career as a full-fledged data scientist/analytics expert; anyone who wants to use data analytics/machine learning to enrich their current personal or professional projects will also benefit from it.

Course content

Descriptive Statistics • 6 lectures • 38min

Dealing with Missing Data • 4 lectures • 45min

Dealing with Outliers • 4 lectures • 29min

Univariate Analysis • 4 lectures • 40min

Bivariate Analysis • 5 lectures • 42min

Multivariate Analysis • 5 lectures • 41min

Bringing It All Together • 4 lectures • 1hr 10min
