Exploratory Data Analysis with Pandas and Python 3.x
- 5 hours on-demand video
- 1 downloadable resource
- Improve your understanding of descriptive statistics and apply them over a dataset.
- Learn how to deal with missing data and outliers to resolve data inconsistencies.
- Explore various visualization techniques for bivariate and multivariate analysis.
- Enhance your programming skills and master data exploration and visualization in Python.
- Learn multidimensional analysis and reduction techniques.
- Master advanced visualization techniques (such as heatmaps) for better analysis and a broader understanding of your data.
Before moving on to the coding part of the course, we must lay the foundation of descriptive statistics which will be used heavily throughout the course.
• Explore the various statistical measures such as mean, median, and mode
• Understand the various properties of these measures
• Learn how to calculate these statistical measures
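As a minimal sketch of the measures listed above, the following uses pandas on a small, made-up series of exam scores (the data and variable names are hypothetical, not taken from the course). It also shows one property of these measures: a single extreme value pulls the mean far more than the median.

```python
import pandas as pd

# Hypothetical exam scores, invented for illustration.
scores = pd.Series([55, 60, 60, 70, 75, 80, 90])

mean_score = scores.mean()      # arithmetic average -> 70.0
median_score = scores.median()  # middle value of the sorted data -> 70.0
mode_scores = scores.mode()     # most frequent value(s), returned as a Series

# Add one extreme value and recompute to compare sensitivity.
with_outlier = pd.concat([scores, pd.Series([1000])], ignore_index=True)
mean_shift = with_outlier.mean() - mean_score        # large change
median_shift = with_outlier.median() - median_score  # small change

print(mean_score, median_score, mode_scores.tolist())
print(mean_shift, median_shift)
```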
Once we have learned how to calculate these statistical measures, we move on to visualizing them in the form of graphs for better understanding.
• Explore the various graphs through which we can visualize the statistical measures
• Understand how the visualization changes as the values of these measures change
• Explore alternate graphs for visualizations
Percentiles allow us to interpret data in a more readable format. We will explore how they are calculated and what information they give regarding the dataset.
• Understand what percentiles are
• Learn how percentiles are calculated
• Interpret what percentiles tell us about the dataset
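To make the percentile calculation described in this section concrete, here is a small sketch using pandas' `Series.quantile` on invented data. It assumes the default linear interpolation between neighbouring values; other interpolation rules give slightly different answers.

```python
import pandas as pd

# Hypothetical data: the integers 1 to 11.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# quantile() takes fractions in [0, 1]; 0.25 is the 25th percentile.
p25, p50, p90 = values.quantile([0.25, 0.50, 0.90]).tolist()

# Share of observations at or below a given value (a simple percentile rank).
share_at_or_below_8 = (values <= 8).mean()

print(p25, p50, p90, share_at_or_below_8)
```

The 50th percentile is the median, so `p50` matches `values.median()`.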
Once we are done with percentiles and how they can be calculated, we move on to the concept of Quartiles and how to visualize them using box plots.
• Understand the concept of Quartiles
• Visualize percentiles and Quartiles using box plots
• Get a better understanding of box plots
Most real-world datasets contain missing values for various reasons. In this video, we find out how to check whether our dataset has missing values using the Pandas library in Python.
• Explore the various reasons for the missing values in datasets
• Understand the various Pandas functions that can be used to find the missing values
• Learn about the different types of missing values and how Pandas does type conversion for them
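The checks described above might look like the following sketch. The DataFrame is made up for illustration; `isna`, `any`, and `reindex` are standard pandas calls. The last part shows the type conversion mentioned in the final bullet: an integer column becomes floating point once a `NaN` appears in it.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with gaps in two columns.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Delhi", None, "Pune", "Agra"],
    "income": [50000, 62000, 58000, 71000],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Select the rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]

# Type conversion: an integer Series is upcast to float when NaN is introduced.
ints = pd.Series([1, 2, 3])
with_gap = ints.reindex([0, 1, 2, 3])  # label 3 does not exist, so it becomes NaN

print(missing_per_column)
print(ints.dtype, "->", with_gap.dtype)
```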
Once we have learned how to find missing values in the dataset, we move on to discussing the different ways to deal with missing values.
• First, we discuss why simply ignoring rows with missing values might not work
• Understand how we can impute missing values with measures of central tendency
• Demonstrate via an example how we can fill missing values based on other columns
Now, we move on to using the Pandas library to deal with missing data.
• Explore the df.dropna function and its various parameters
• Explore the various ways of filling missing values via df.fillna, df.ffill, and df.bfill
• Implement an example in which we fill missing values based on values in other columns
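A short sketch of the functions named above, on an invented salary table. The column names and the choice of the per-department mean as the fill value are assumptions made for illustration; the course may fill from other columns in a different way.

```python
import numpy as np
import pandas as pd

# Hypothetical data: salaries with gaps, grouped by department.
df = pd.DataFrame({
    "department": ["A", "A", "B", "B", "B"],
    "salary": [100.0, np.nan, 200.0, np.nan, 400.0],
})

# Drop rows where salary is missing.
dropped = df.dropna(subset=["salary"])

# Fill with a single measure of central tendency (overall mean).
filled_mean = df["salary"].fillna(df["salary"].mean())

# Propagate the previous / next valid observation.
forward = df["salary"].ffill()
backward = df["salary"].bfill()

# Fill based on another column: use the mean salary of the same department.
group_mean = df.groupby("department")["salary"].transform("mean")
filled_by_group = df["salary"].fillna(group_mean)

print(filled_by_group.tolist())
```

Filling by group keeps the imputed value closer to comparable rows than a single global mean does, which is the usual motivation for using other columns.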
Sometimes we might encounter values in our dataset which are abnormally high, low, or simply weird as compared to other values in the dataset. We must understand what outliers are and what causes them to occur.
• Understand what outliers are
• Understand the causes of outliers
• Explore the different types of outliers via examples
Z-scores are one of the most commonly used methods to identify outliers. In this video, we understand the idea behind Z-scores and how they can be used to identify outliers.
• Discuss what Z-scores are and what they signify
• Visualize Z-scores over a normal distribution for more clarity
• Implement Z-scores to find outliers in a dummy dataset
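As a rough sketch of the Z-score approach, the snippet below computes z = (x - mean) / standard deviation on a dummy dataset and flags values whose absolute Z-score exceeds 3. The data are invented, and the cutoff of 3 is a common convention assumed here, not necessarily the one used in the course.

```python
import numpy as np

# Dummy data: eleven ordinary values and one extreme value.
data = np.array([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95], dtype=float)

# Z-score: distance from the mean in units of (population) standard deviation.
z_scores = (data - data.mean()) / data.std()

# Flag points more than 3 standard deviations from the mean (assumed cutoff).
threshold = 3.0
outliers = data[np.abs(z_scores) > threshold]

print(outliers)
```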
Z-scores are not always reliable, since the mean and standard deviation they depend on are themselves distorted by outliers. In this video, we use a modified version of the Z-score which is based on the median.
• Understand why Z-scores might fail in some cases
• Understand the idea of Median, Median Absolute Deviation, and Modified Z-scores
• Implement an example in which we find outliers using Modified Z-scores
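The sketch below illustrates one common formulation of the modified Z-score, 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation, with a cutoff of 3.5. Both the formula constants and the cutoff are assumptions based on the widely used convention; the data are invented. Two extreme values inflate the standard deviation enough that the ordinary Z-score misses them, while the median-based version does not.

```python
import numpy as np

# Dummy data with two extreme values.
data = np.array([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95, 90], dtype=float)

# Ordinary Z-scores: the outliers inflate the standard deviation.
z_scores = (data - data.mean()) / data.std()
flagged_by_z = data[np.abs(z_scores) > 3]

# Modified Z-scores: based on the median and the median absolute deviation.
median = np.median(data)
mad = np.median(np.abs(data - median))  # assumed non-zero for this data
modified_z = 0.6745 * (data - median) / mad
flagged_by_modified_z = data[np.abs(modified_z) > 3.5]

print(flagged_by_z, flagged_by_modified_z)
```

If more than half of the values are identical, the MAD is zero and this formula cannot be used as written; real code would need to handle that case.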
Finally, we also learn how to use Interquartile Range (IQR) to detect outliers in a dataset and visualize them via box plots.
• Explore the concept of IQR and how it can be used to identify outliers
• Visualize IQR and outliers over a box plot
• Implement an example using IQR and box plots to detect outliers
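A minimal sketch of IQR-based detection on invented data. It uses the conventional fences at 1.5 times the IQR below the first quartile and above the third quartile, which are also where a standard box plot draws its whisker limits; the multiplier 1.5 is an assumption here.

```python
import pandas as pd

# Dummy data with one extreme value.
s = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95])

q1 = s.quantile(0.25)  # first quartile
q3 = s.quantile(0.75)  # third quartile
iqr = q3 - q1          # interquartile range

# Conventional fences: points outside them are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = s[(s < lower_fence) | (s > upper_fence)]

print(q1, q3, iqr, outliers.tolist())
```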
Before moving on to analyzing the various types of variables in a dataset, we must understand the different variables that might occur in a dataset.
• Understand the different types of variables
• Explore the different types of numeric variables
• Explore the different types of categorical variables
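One way these variable types can show up in pandas is sketched below. The table and column names are hypothetical. Numeric columns are picked out by dtype, an unordered text column stands in for a nominal variable, and an ordered `Categorical` stands in for an ordinal one.

```python
import pandas as pd

# Hypothetical table mixing numeric and categorical variables.
df = pd.DataFrame({
    "age": [23, 35, 41],                 # numeric, discrete
    "height_cm": [170.2, 165.5, 180.1],  # numeric, continuous
    "city": ["Pune", "Delhi", "Pune"],   # categorical, nominal (no order)
})

# Categorical, ordinal: the categories have a meaningful order.
df["size"] = pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,
)

numeric_columns = df.select_dtypes(include="number").columns.tolist()
distinct_cities = df["city"].nunique()
largest_size = df["size"].max()  # works because the categories are ordered
sorted_sizes = df["size"].sort_values().tolist()

print(numeric_columns, distinct_cities, largest_size, sorted_sizes)
```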
Now that we have understood the different types of variables, let’s take a look at the different ways of analyzing variables using Python.
• Create dummy data for our analysis
• Implement code for plotting different types of graphs in Python
• Explore the different graphs and libraries available in Python
After learning about the various graphs that we can use to explore columns in Python, we now turn to the concepts of Skewness and Kurtosis in Statistics and how they affect the shape of a distribution.
• Understand what Skewness is
• Understand the idea behind Kurtosis
• Explore how Skewness and Kurtosis affect the shape of the curve
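To illustrate how these two measures respond to the shape of the data, the sketch below compares a symmetric series with a right-skewed one. It uses pandas' built-in `Series.skew` and `Series.kurt` as a convenient stand-in, not necessarily the routines used in the course. Note that `kurt` reports excess kurtosis, so a normal distribution would score about 0.

```python
import pandas as pd

# Invented data: one symmetric series, one with a long right tail.
symmetric = pd.Series([1, 2, 3, 4, 5, 6, 7])
right_skewed = pd.Series([1, 1, 2, 2, 3, 4, 20])

# Skewness: about 0 for symmetric data, positive for a long right tail.
skew_symmetric = symmetric.skew()
skew_right = right_skewed.skew()

# Excess kurtosis: negative for flat distributions, positive for heavy tails.
kurt_symmetric = symmetric.kurt()
kurt_right = right_skewed.kurt()

print(skew_symmetric, skew_right)
print(kurt_symmetric, kurt_right)
```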
Finally, we will apply the different techniques that we have learned for Univariate Analysis over the Olympics Dataset.
• Explore the different columns in Olympics Dataset
• Draw density plots, histograms, and other graphs over various columns
• Find Skewness of the data using the SciPy module in Python
Now that we have explored univariate analysis, we move ahead to bivariate analysis where we explore two variables at the same time.
• Understand what bivariate analysis is
• Understand how bivariate analysis helps us understand our data better
• List out various graphs used for bivariate analysis
Before moving on to doing practical bivariate analysis, we must understand the theoretical concept behind correlation coefficients.
• Explore the concept of correlation coefficient
• Understand the different types of correlation coefficients
• Understand what correlation coefficient signifies for our data
After understanding the theoretical concepts behind correlation coefficients, we now move on to visualizing correlation between two sets of variables.
• Implement code for positive and negative correlation
• Use the Seaborn library to visualize scatterplots
• Use heatmaps to visualize correlation between multiple pairs of columns at once
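The numbers behind such plots can be sketched without any plotting library. The snippet below builds a small, invented table with one positively and one negatively correlated pair and computes the Pearson correlation matrix with `DataFrame.corr`. A matrix like this is the kind of input a heatmap function (for example, Seaborn's `heatmap`) would render.

```python
import pandas as pd

# Hypothetical data: study time, exam score, and gaming time per student.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_gaming": [9, 7, 6, 4, 2],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

positive = corr.loc["hours_studied", "exam_score"]    # close to +1
negative = corr.loc["hours_studied", "hours_gaming"]  # close to -1

print(corr.round(2))
```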
In this video, we will apply various techniques of bivariate analysis over the video game sales dataset.
• Load the video game sales dataset and understand the various columns
• Implement interactive graphs using the Bokeh library in Python
• Identify trends if they exist in the data using bivariate graphs
Now that we have explored univariate and bivariate analysis, we move ahead to multivariate analysis where we explore more than two variables at the same time.
• Understand what multivariate analysis is
• Understand the various advantages of multivariate analysis
• Visualize a graph depicting multivariate analysis
In this video, we will apply various techniques of multivariate analysis over the Titanic Dataset.
• Load the Titanic Dataset and find descriptive statistics of the various variables
• Implement multivariate graphs using Seaborn
• Identify trends if they exist in the data
In this video, we will apply various techniques of multivariate analysis over the Pokemon Dataset.
• Load the Pokemon Dataset and find descriptive statistics of the various variables
• Implement interactive graphs using Bokeh
• Identify trends if they exist in the data using multivariate graphs
Simpson’s Paradox is a phenomenon that may occur in real-world data, leading to conflicting results. We understand why it happens and what we can do to prevent it.
• Understand what Simpson’s Paradox is
• Understand what causes it and how we can prevent it from happening
• Demonstrate Simpson’s Paradox using an example
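A small numeric sketch of the paradox, using counts invented purely for illustration. Treatment A has the higher recovery rate within each severity group, yet treatment B looks better once the groups are pooled, because A was given mostly to severe cases.

```python
import pandas as pd

# Made-up counts: two treatments, two severity groups.
df = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "severity": ["mild", "severe", "mild", "severe"],
    "patients": [100, 1000, 1000, 100],
    "recovered": [90, 500, 800, 40],
})

# Recovery rate within each severity group.
rates = df.assign(rate=df["recovered"] / df["patients"])
by_group = rates.pivot(index="severity", columns="treatment", values="rate")

# Recovery rate after pooling the groups.
totals = df.groupby("treatment")[["recovered", "patients"]].sum()
overall = totals["recovered"] / totals["patients"]

print(by_group)
print(overall)
```

Comparing rates within each group, and not only in aggregate, is what exposes the reversal here.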
The relationship between correlation and causation is one of the most widely misinterpreted ideas in real-world data analysis. We understand why the confusion arises and what we can do to avoid it.
• Understand why correlation does not necessarily imply causation
• Understand what causes this misinterpretation and how we can avoid it
• Demonstrate that correlation does not imply causation using various examples
In this video, we will apply all the different techniques that we have learned in the previous sections to a real-world dataset.
• Download and load the dataset
• Explore the different variables in the dataset
• Create a set of questions that we will answer through our analysis
- Basic Python programming experience required.
How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on course shows non-programmers how to process information that’s initially too messy or difficult to access. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently.
This course will take you from Python basics to exploring many different types of data. Throughout the course, you will be working with real-world datasets to retrieve insights from data. You'll be exposed to different kinds of data structures and data-related problems. You'll learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more!
About the Author
Mohammed Kashif works as a Data Scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in Computer Science at IIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics. In his spare time, he likes to answer questions on StackOverflow and help other people debug their way out of misery. He is also an experienced teaching assistant with a demonstrated history of working in the higher-education industry.
- This course is for Python developers, data analysts, and IT professionals who want to move toward a career as a full-fledged data scientist/analytics expert; anyone who wants to use data analytics/machine learning to enrich their current personal or professional projects will also benefit from it.