Data Analysis with Pandas and Python
4.6 (10,461 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
134,054 students enrolled

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!
Created by Boris Paskhaver
Last updated 8/2020
English [Auto], French [Auto], 5 more
  • German [Auto]
  • Italian [Auto]
  • Polish [Auto]
  • Portuguese [Auto]
  • Spanish [Auto]
This course includes
  • 19 hours on-demand video
  • 2 articles
  • 4 downloadable resources
  • 7 coding exercises
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
Course content
162 lectures 18:48:37
+ Installation and Setup
22 lectures 02:46:31

Welcome to Data Analysis with Pandas and Python. In this lesson, we

  • introduce the pandas library, including its history and purpose

  • introduce Jupyter Notebook, the environment in which we'll be writing our code

  • explore sample Jupyter Notebooks to showcase some of the technology's features

The datasets for this course are available in a single file. Download and unpack the file in the directory of your choice.

Preview 12:15

Get to know a little about your instructor.

About Me

This lesson includes the completed Jupyter Notebooks that were created during the recording of the course.

Completed Course Files

The next batch of lessons focuses on installing and configuring the Anaconda distribution on a MacOS machine. When downloading the distribution, choose the latest version of the language; it will have the highest version number. In this lesson, we also discuss the differences between Python 2 and 3.

MacOS - Download the Anaconda Distribution, our Python development environment

In this lesson, we install the Anaconda distribution on a MacOS machine. The setup installs Python and over 100 of the most popular libraries for data science in a central directory on your computer. We also explore the Anaconda Navigator program, a visual application for interacting with Anaconda.

MacOS - Install Anaconda Distribution

The Terminal is an application for issuing text-based commands to your MacOS operating system. In this lesson, you'll learn two ways to access the Terminal. We also verify that Anaconda has been successfully installed and update the version of the conda environment manager.

MacOS - Access the Terminal Application

In this lesson, we use the Terminal to create a new Anaconda environment and install Pandas (and some other libraries) within it. We also learn how to activate and deactivate conda environments, and update packages.

MacOS - Create conda Environment and Install pandas and Jupyter Notebook

The course materials, a collection of datasets in .csv and .xlsx file formats, are available for download in a single zip file attached to this lesson. I strongly recommend following along with my tutorials by practicing the syntax on your end. In this lesson, we walk through the startup and shutdown process for a Jupyter Notebook session. We also execute our first line of Python code!

MacOS - Unpack Course Materials + The Start and Shutdown Process

In this lesson, we download the Anaconda distribution, a software bundle that includes Python and the conda environment manager, for our Windows computers. We discuss the differences between Python 2 and 3 and also determine which version of the distribution to download (32-bit vs 64-bit).

Windows - Download the Anaconda Distribution

In this lesson, we install the Anaconda distribution on our Windows machines. The executable installs Python, pandas, Jupyter Notebook and over 100 popular libraries for data analysis in a standard "base" environment. We conclude by launching the Anaconda prompt.

Windows - Install Anaconda Distribution

Access the Command Prompt on a Windows machine. The prompt (also known as the command line) is used to interact with the computer with text-based commands. We'll use it to download additional Python libraries for the course and update all installed Anaconda libraries.

Windows - Create conda Environment and Install pandas and Jupyter Notebook

In this lesson, we extract our .csv and .xlsx datasets, which are available in a single .zip file attached to this lesson. We also walk through the startup and shutdown process for a study session, which includes

  • activating the correct Anaconda environment

  • launching the Jupyter Notebooks application

  • opening and closing a Jupyter Notebook

  • shutting down the Jupyter server

Windows - Unpack Course Materials + The Startup and Shutdown Process

In this lesson, we explore the Jupyter Notebook interface including the toolbars and menus. We also dive into the ways we can restart the Notebook in case of slowness or unresponsiveness.

Intro to the Jupyter Notebook Interface

In this lesson, we learn about the two different modes (Edit Mode and Command Mode) within a Jupyter Notebook. Edit Mode modifies the contents of a single cell while Command Mode enables keyboard shortcuts that work on the Notebook as a whole.

Cell Types and Cell Modes in Jupyter Notebook

Learn the multiple keyboard shortcuts to execute code cells and Markdown cells. We'll also learn how Jupyter Notebook chooses what to output below a cell that has multiple commands.

Code Cell Execution in Jupyter Notebook

In this lesson, we practice using keyboard shortcuts to add and delete cells from the Jupyter Notebook. We also see how to access a helpful cheatsheet of available commands.

Popular Keyboard Shortcuts in Jupyter Notebook

In this lesson, we discuss how to use the import keyword to import libraries like pandas and numpy into a Jupyter Notebook. We also talk about the as keyword to assign an alias to an import, as well as the popular community aliases for pandas and numpy.

Import Libraries into Jupyter Notebook
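The import convention described above can be sketched in two lines. The `pd` and `np` aliases are the community standards the lesson refers to:

```python
# Community-standard aliases for the two core data libraries
import pandas as pd
import numpy as np

print(pd.__name__, np.__name__)  # pandas numpy
```

Any pandas or numpy call in the rest of the course assumes these two aliases are in place.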

This next batch of lessons offers a quick crash course on the Python programming language. In this lesson, we'll review Python comments, the built-in type function, and variables. 

Python Crash Course, Part 1 - Data Types and Variables

In this lesson, we'll review Python lists and how to extract values from them by index position. A list is the equivalent of an array in other programming languages. It is used to store an ordered collection of objects.

Python Crash Course, Part 2 - Lists

Review the Python dictionary object which associates keys with values. The keys must be unique; the values can be duplicated. Dictionaries are created with curly braces and pairs of comma-separated key value pairs.

Python Crash Course, Part 3 - Dictionaries

Review Python's mathematical and equality operators. These will be critical for pandas filtering processes later in the course.

Python Crash Course, Part 4 - Operators

Define and call a sample Python function. A function is a reusable chunk of code that can accept inputs (arguments) and return outputs. We'll use custom functions later on our pandas object to apply operations to all values in a dataset.

Python Crash Course, Part 5 - Functions
+ Series
22 lectures 02:18:44

In this lesson, we create a new Jupyter Notebook for the Series section of the course. The pandas Series object is a one-dimensional labelled array that combines the best features of a Python list and a Python dictionary.

Create Jupyter Notebook for the Series Module

A pandas Series can be created with the pd.Series() constructor method. In this lesson, we'll practice creating a few sample Series by feeding in Python lists as inputs to the constructor method.

Create A Series Object from a Python List

The pd.Series constructor method accepts a variety of inputs, including native Python objects. In this lesson, we'll create a Series from a Python dictionary. We'll also explore the differences between the Series and Python's built-in objects, and understand how the index operates in a Series.

Create A Series Object from a Python Dictionary
Create a Series Object
1 question

Objects in pandas have attributes and methods. Methods actively interact with and modify the object while attributes return information about the object's state. In this lesson, we'll use the .values, .index, and .dtype attributes on a Series object.

Intro to Attributes on a Series Object

In this lesson, we'll continue our exploration of methods on pandas objects. We'll utilize the .sum(), .product(), and .mean() mathematical methods on a sample Series.

Intro to Methods on a Series Object
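The attribute/method distinction from these two lessons in one sketch, on an invented Series of prices:

```python
import pandas as pd

prices = pd.Series([2.99, 4.45, 1.36])

# Attributes describe the object's state without changing it
print(prices.index)  # RangeIndex(start=0, stop=3, step=1)
print(prices.dtype)  # float64

# Methods perform calculations and return new values
print(prices.sum())      # 8.8
print(prices.product())
print(prices.mean())
```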

Parameters are the options that a method has. Arguments are the choices we choose for those options. In this lesson, we'll learn the syntax of supplying arguments to parameters on pandas methods.

Parameters and Arguments

The time has come to import our first datasets into our Jupyter Notebook work environment. In this lesson, we use the pd.read_csv method to import datasets of Pokemon and Google stock prices. We also explore the squeeze parameter, which coerces an imported one-column DataFrame into a Series object.

Create Series from Dataset with the pd.read_csv Method
Import Series with the read_csv Method
1 question
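A sketch of the import step, using an in-memory stand-in for the course's pokemon.csv file. Note that recent pandas versions removed the squeeze= parameter from read_csv in favor of the .squeeze() method, so this uses the newer spelling:

```python
import io
import pandas as pd

# Stand-in for pokemon.csv; in the course you'd pass a file path instead
csv_data = io.StringIO("Pokemon\nBulbasaur\nIvysaur\nVenusaur\n")

# read_csv returns a one-column DataFrame; .squeeze("columns") coerces it to a Series
pokemon = pd.read_csv(csv_data).squeeze("columns")

print(type(pokemon).__name__)     # Series
print(pokemon.head(2).tolist())   # ['Bulbasaur', 'Ivysaur']
```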

Use the .head() and .tail() methods to return a specified number of rows from the beginning or end of a Series. The methods return a brand new Series.

Use the head and tail Methods to Return Rows from Beginning and End of Dataset

See how the Series interacts with Python's built-in functions including len, type, sorted, list, dict, max, and min. pandas works seamlessly with all of them.

Passing pandas Objects to Python Built-In Functions

Explore some more attributes on the pandas Series object, including .is_unique. Attributes return information about the object's state; methods directly modify the object.

Accessing More Series Attributes

Call the .sort_values() method on a Series to sort the values in ascending or descending order. We'll see how this command operates on both a numeric and alphabetical dataset.

Preview 06:04
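The sorting behavior described above, sketched on a small invented Series:

```python
import pandas as pd

s = pd.Series([5, 1, 3], index=["a", "b", "c"])

ascending = s.sort_values()                   # default: smallest first
descending = s.sort_values(ascending=False)   # largest first

print(ascending.tolist())         # [1, 3, 5]
print(descending.index.tolist())  # ['a', 'c', 'b'] -- labels travel with values
```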

Modify the argument to the inplace parameter on a Series method to permanently modify the object it is called on. This is an alternative to reassigning the new object to the same variable.

Use the inplace Parameter to permanently mutate a pandas data structure

Call the .sort_index() method on a pandas Series to sort it by the index instead of its values.

Use the sort_index Method to Sort the Index of a pandas Series object
The sort_values and sort_index Methods
1 question

Use Python's in keyword and attributes to check if a value exists in either the values or index of a Series. If the .index or .values attribute is not included, pandas will default to searching among the Series index.

Use Python's in Keyword to Check for Inclusion in Series values or index
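A sketch of the default-to-index behavior the lesson describes, with invented values:

```python
import pandas as pd

s = pd.Series(["Blue", "Green"], index=["sky", "grass"])

print("sky" in s)          # True  -- `in` searches the index by default
print("Blue" in s)         # False -- "Blue" is a value, not an index label
print("Blue" in s.values)  # True  -- search the values explicitly
```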

In this lesson, we walk through how to use square bracket notation to extract one or more Series values by their index position. The index position represents the order of the row within the Series.

Extract Series Values by Index Position

In this lesson, we explore how to use bracket notation to extract one or more values from a Series by their index labels.

Extract Series Values by Index Label
Extract Series Values by Index Position or Index Label
1 question

In this lesson, we explore an alternative approach to extracting one or more values from a Series by index position or index label. The get method accepts the key to search for in the index as well as a fallback value to return if the key is not found.

Use the get Method to Retrieve a Value for an index label in a Series
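The three extraction styles from the last few lessons side by side, on an invented Series. (Modern pandas prefers the explicit .iloc accessor for positional access, so that spelling is used here.)

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.iloc[0])          # 10    -- by index position
print(s["b"])             # 20    -- by index label
print(s.get("z", "N/A"))  # N/A   -- fallback when the label is missing
```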

In this lesson, we invoke common mathematical methods including .count(), .sum(), and .mean() on a Series object.

Math Methods on Series Objects

Call the .idxmax() and .idxmin() methods to extract the index labels of the highest or lowest values in a Series. We'll see how these can be used to extract the highest / lowest values as well.

Use the idxmax and idxmin Methods to Find Index of Greatest or Smallest Value

Call the .value_counts() method to count the number of the times each unique value occurs in a Series. The result will be a brand new Series where each unique value from the original Series serves as an index label.

Use the value_counts Method to See Counts of Unique Values within a Series
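The idxmax/idxmin and value_counts lessons above, sketched on invented data:

```python
import pandas as pd

s = pd.Series([3, 7, 7, 1], index=["a", "b", "c", "d"])

print(s.idxmax())  # b -- label of the (first) largest value
print(s.idxmin())  # d -- label of the smallest value

# Each unique value becomes an index label in the result
counts = s.value_counts()
print(counts.loc[7])  # 2 -- the value 7 occurs twice
```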

Call the .apply() method and feed it a Python function as an argument to use the function on every Series value. This is helpful for executing custom operations that are not included in pandas or numpy.

Use the apply Method to Invoke a Function on Every Series Value

Call the .map() method to tie together the values from one object to another. We'll practice with (a) two Series and (b) a Series and a dictionary object.

The Series#map Method
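A sketch of apply versus map with invented values: apply runs an arbitrary function per value, while map ties values to another object's entries:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# apply: run a custom function on every value
doubled = s.apply(lambda x: x * 2)
print(doubled.tolist())  # [2, 4, 6]

# map: look each value up in a dictionary (or another Series)
colors = pd.Series(["red", "blue"]).map({"red": "#FF0000", "blue": "#0000FF"})
print(colors.tolist())   # ['#FF0000', '#0000FF']
```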

Review the pandas Series concepts you explored in this module with this action-packed quiz!

A Review of the Series Module
7 questions
+ DataFrames I: Introduction
15 lectures 01:45:19

In this lesson, we create a new Jupyter Notebook for this section, our first to cover the 2-dimensional DataFrame object. We walk through what dimensions are and also introduce the nba.csv dataset we'll be exploring throughout the section.

Intro to DataFrames I Module

The pandas Series and DataFrame objects share many attributes and methods. In this lesson, we'll review attributes like .index, .values, .shape, .ndim, and .dtypes and see what they return on a 2D DataFrame. We'll also introduce new attributes including .columns and .axes that are exclusive to DataFrames.

Shared Methods and Attributes between Series and DataFrames

Series and DataFrame may share attributes and methods but they are still different objects. In this lesson, we'll see how identical methods operate differently depending on the pandas object they are called on.

Differences between Shared Methods

Use two syntactical options to extract a single column from a pandas DataFrame. I prefer the square bracket approach because it works 100% of the time. The alternative option is using dot syntax, which treats the columns as attributes of the larger DataFrame object.

Preview 07:57
Select One Column from a DataFrame
1 question

In this lesson, we'll select two or more columns from a pandas DataFrame. We'll still need bracket syntax to extract but now we'll include a Python list to specify the specific columns we'd like to pull out. The result will be a new DataFrame.

Select Two or More Columns from a DataFrame
Select Two or More Columns from a DataFrame
1 question

In addition to extracting existing columns, bracket syntax can be used to create a new column on the right end of a DataFrame and populate it with values. In this lesson, we'll also dive into the alternate .insert() method to insert a column into the middle of a DataFrame.

Add New Column to DataFrame

A broadcasting operation performs an operation on all values within a pandas object. In this lesson, we'll apply several mathematical operations to values in a DataFrame column (i.e. a Series) including the .add(), .sub(), .mul() and .div() methods. We'll also cover the operator shortcuts for these methods.

Broadcasting Operations on DataFrames
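Column selection, column creation, and broadcasting from the lessons above in one sketch. The tiny two-row frame stands in for the course's nba.csv dataset:

```python
import pandas as pd

nba = pd.DataFrame({"Name": ["A", "B"], "Salary": [100, 200]})

salaries = nba["Salary"]             # one column -> Series
subset = nba[["Name", "Salary"]]     # list of columns -> new DataFrame

# Broadcasting: the operation hits every value in the column
nba["Salary in K"] = nba["Salary"].div(1000)  # same as nba["Salary"] / 1000

print(salaries.tolist())            # [100, 200]
print(nba["Salary in K"].tolist())  # [0.1, 0.2]
```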

Refresh your memory on the .value_counts() Series method, which counts the number of times each unique value occurs within the Series. The result is a brand new Series.

A Review of the value_counts Method

Null values are represented with a NaN marker in pandas. In this lesson, we'll delete rows with null (NaN) values by calling the .dropna() method. We'll also modify the arguments of the method to specify how to select the rows to be deleted.

Drop DataFrame Rows with Null Values with the dropna Method

One alternative to dropping null values is populating them with a predefined value. In this lesson, we'll call the .fillna() method to accomplish this. We'll practice the method on both DataFrame and Series objects.

Fill in Null DataFrame Values with the fillna Method

Data types in a Series will not always be the types we want or the types that are best for efficiency. In this lesson, we'll convert the data types in a Series with the .astype() method. We'll also show how to overwrite an old Series with a Series of new data values.

Convert DataFrame Column Types with the astype Method
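The dropna / fillna / astype trio from the last three lessons, sketched on an invented one-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

print(len(df.dropna()))            # 2 -- the NaN row is removed
print(df["a"].fillna(0).tolist())  # [1.0, 0.0, 3.0]

# fillna first, so astype(int) doesn't choke on the NaN
print(df["a"].fillna(0).astype(int).tolist())  # [1, 0, 3]
```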

Call the .sort_values() method to sort the values in a DataFrame based on the values in a single column. The method is a bit more complex than when called on a single-dimensional pandas Series.

Preview 05:46

In this lesson, we'll explore additional parameters to the .sort_values() method to sort the values in a DataFrame based on the values in multiple columns. We'll also cover how to specify different sort orders (ascending vs. descending) on different columns.

Sort a DataFrame with the sort_values Method, Part II
The sort_values Method on a DataFrame
1 question
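Multi-column sorting with per-column directions, as the Part II lesson describes, on invented data:

```python
import pandas as pd

df = pd.DataFrame({"team": ["B", "A", "A"], "score": [5, 9, 2]})

# Sort by team ascending, then score descending within each team
result = df.sort_values(by=["team", "score"], ascending=[True, False])

print(result["score"].tolist())  # [9, 2, 5]
```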

Call the .sort_index() method to sort the values in a DataFrame based on their index positions or labels instead of their values.

Sort DataFrame Index with the sort_index Method

Values in a Series can be ranked in order with the .rank() method. In this lesson, we'll practice this method on a numeric Series and then confirm the results through our own sort test.

Rank Series Values with the rank Method
+ DataFrames II: Filtering Data
10 lectures 01:22:02

In this lesson, we create the Jupyter Notebook for our new section, our second focusing on the 2D DataFrame object. The focus of this module is filtering data or, in other words, how we extract rows based on one or more conditions. We also introduce the employees.csv dataset that we'll be working with.

This Module's Dataset + Memory Optimization

In this lesson, we'll filter rows from the DataFrame based on a single condition. The logic involves creating a Boolean Series of True and False values, then passing it in square brackets after our DataFrame.

Preview 12:57

In this lesson, we'll explore more complex row filtering based on multiple conditions. The syntax requires some additional symbols (&) to specify that we want to check the truthiness of multiple conditions.

Filter DataFrame with More than One Condition (AND - &)

In this lesson, we'll continue filtering rows from the DataFrame based on multiple conditions. However, this time we'll use a new symbol ( | ) to specify an OR check. This requires only one of the tested conditions to evaluate to True in order to include the row.

Filter DataFrame with More than One Condition (OR - |)
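The AND / OR filtering pattern from the last three lessons in one sketch. The frame is an invented stand-in for employees.csv; note the required parentheses around each condition:

```python
import pandas as pd

df = pd.DataFrame({"dept": ["IT", "HR", "IT"], "salary": [90, 60, 40]})

# Each comparison yields a Boolean Series; combine with & (AND) or | (OR)
both = df[(df["dept"] == "IT") & (df["salary"] > 50)]
either = df[(df["dept"] == "HR") | (df["salary"] > 50)]

print(len(both))    # 1 -- only the 90-salary IT row matches both conditions
print(len(either))  # 2
```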

One common problem in data analysis is extracting rows whose values fall within a collection of values. Instead of writing multiple OR statements, we can use the isin method and pass in a list of values to match against.

Check for Inclusion with the isin Method

Call the .isnull() and .notnull() methods to create Boolean Series for extracting rows with null or non-null values. Both methods return a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.

Check for Null and Present DataFrame Values with the isnull and notnull Methods

Call the .between() method to extract rows where a column value falls in between a predefined range. This is another method that returns a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.

Check For Inclusion Within a Range of Values with the between Method
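The isin, isnull, and between filters above share the same Boolean-Series pattern. A sketch on invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"team": ["A", "B", None], "score": [5.0, 9.0, np.nan]})

print(len(df[df["team"].isin(["A", "B"])]))  # 2 -- membership in a collection
print(len(df[df["score"].between(4, 6)]))    # 1 -- only 5.0 falls in [4, 6]
print(len(df[df["score"].isnull()]))         # 1 -- the NaN row
```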

Call the .duplicated() method to create a Boolean Series and use it to extract rows that have duplicate values. This is another example of a method that returns a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.

Check for Duplicate DataFrame Rows with the duplicated Method

An alternative option to identifying duplicate rows and removing them through filtering is the .drop_duplicates() method. In this lesson, we'll invoke the method to remove rows with duplicate values in a DataFrame. We'll also provide custom arguments to modify how the method operates.

Delete Duplicate DataFrame Rows with the drop_duplicates Method

Call the .unique() and .nunique() methods on a Series to extract the unique values and a count of the unique values. These methods are one letter apart but return completely different results. In addition, the .nunique() method requires an additional argument to include null values in its count.

Identify and Count Unique Values with the unique and nunique Methods
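Duplicates and uniqueness, covering the last three lessons, sketched on an invented Series:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])

print(s.duplicated().tolist())       # [False, False, True, False]
print(s.drop_duplicates().tolist())  # ['a', 'b', 'c']
print(s.nunique())                   # 3 -- count of unique values
```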
+ DataFrames III: Data Extraction
16 lectures 01:53:36

In this lesson, we introduce the third DataFrame-focused section of the course. The upcoming lessons cover how to:

  • set and reset an index in a DataFrame

  • retrieve DataFrame rows by index position or index label

  • set new values for one or more cells in the DataFrame

  • rename or delete rows or columns

  • extract a random sample of rows / columns

and more!

Intro to the DataFrames III Module + Import Dataset

Pandas will default to assigning a data structure a numeric index starting at 0. In this lesson, we'll explore how we can use the set_index and reset_index methods to customize and reset the index labels of a DataFrame object.

Use the set_index and reset_index methods to define a new DataFrame index
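A round trip between a column-based and default numeric index, as the lesson describes, with invented columns:

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "FR"], "price": [5.7, 4.8]})

indexed = df.set_index("country")  # 'country' values become the index labels
print(indexed.index.tolist())      # ['US', 'FR']

restored = indexed.reset_index()   # back to a default numeric index
print(restored.columns.tolist())   # ['country', 'price']
```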

In this lesson, we'll use the .loc[] accessor to retrieve DataFrame rows based on index label. We also look at providing multiple index labels within a list.

Retrieve Rows by Index Label with loc Accessor

In this lesson, we'll use the .iloc[] accessor to retrieve DataFrame rows based on index position. We also look at providing multiple index positions within a list.

Retrieve Rows by Index Position with iloc Accessor

The .loc[] and .iloc[] accessors can take second arguments to specify the column(s) that should be extracted. In this lesson, we'll practice extracting movies from our dataset with this syntax.

Passing second arguments to the loc and iloc Accessors
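The row-plus-column accessor syntax from the last three lessons, sketched with invented movie data:

```python
import pandas as pd

df = pd.DataFrame(
    {"year": [1994, 1999], "rating": [9.3, 8.8]},
    index=["Shawshank", "Matrix"],
)

print(df.loc["Matrix", "rating"])  # 8.8  -- row by label, column by name
print(df.iloc[0, 0])               # 1994 -- row and column by position
```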

In this lesson, we'll discuss how to assign a new value to one cell in a DataFrame. We first target the cell by using the .loc[] accessor with a row and column argument, then reset its value with the assignment operator (=).

Set New Value for a Specific Cell or Cells In a Row

In this lesson, we explore how we can overwrite multiple values in a DataFrame by passing a Boolean Series to the loc accessor. We also discuss how we can accidentally overwrite values on a slice of data rather than the original DataFrame itself.

Set Multiple Values in a DataFrame

In this lesson, we invoke the rename method on a DataFrame to change the names of the index labels or column names. We can either combine the mapper and axis parameters, or target the columns and index parameters exclusively. In either case, we provide an argument of a dictionary where the keys represent the current label names and the values represent the desired label names.

Preview 09:33

In this lesson, we practice 3 different syntactical options to delete rows or columns from a DataFrame. They include the .drop() method, the .pop() method, and Python's built-in del keyword.

Delete Rows or Columns from a DataFrame
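Renaming and the deletion options above, sketched on an invented three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

df = df.rename(columns={"a": "alpha"})  # keys: current names, values: new names
df = df.drop(columns="b")               # remove a column with .drop()
del df["c"]                             # or with Python's del keyword

print(df.columns.tolist())  # ['alpha']
```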

In this lesson, we'll call the .sample() method to pull out a random sample of rows or columns from a DataFrame. We'll specify the number of values to include by modifying the n parameter.

Create Random Sample with the sample Method

There is a shortcut available to pull out the rows with the smallest or largest values in a column. Instead of sorting the rows and using the .head() method, we can call the .nsmallest() and .nlargest() methods. We'll dive into these methods and their parameters in this lesson.

Use the nsmallest / nlargest methods to get rows with smallest / largest values.
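Random sampling and the nlargest/nsmallest shortcuts, sketched on an invented score column:

```python
import pandas as pd

df = pd.DataFrame({"score": [5, 9, 2, 7]})

print(len(df.sample(n=2)))                         # 2 random rows
print(df.nlargest(2, "score")["score"].tolist())   # [9, 7]
print(df.nsmallest(1, "score")["score"].tolist())  # [2]
```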

Sometimes, you'll want to retain the structure of the original DataFrame when you extract a subset. In this lesson, we'll call the .where() method to return a modified DataFrame that holds NaN values for all rows that don't match our provided condition.

Filter A DataFrame with the where method

Our filtration process so far has involved using official pandas syntax. In this lesson, I'll introduce the .query() method, an alternate string-based syntax for extracting a subset from a DataFrame.

Filter A DataFrame with the query method
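A sketch contrasting where (keeps the shape, masks non-matches with NaN) and query (string-based filtering), on invented data:

```python
import pandas as pd

df = pd.DataFrame({"score": [5, 9, 2]})

# where preserves the original index; non-matching rows become NaN
masked = df["score"].where(df["score"] > 4)
print(masked.isnull().sum())  # 1 -- the row with 2 was masked out

# query filters with a string expression instead of bracket syntax
print(df.query("score > 4")["score"].tolist())  # [5, 9]
```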

In this review of a lesson from our Series Module, we'll call the .apply() method on a Series to apply a Python function on every value within it. This will act as a foundation for the next lesson, where we'll invoke the same method on a DataFrame.

A Review of the apply Method on a pandas Series Object

The .apply() method applies a Python function on a row-by-row basis in a DataFrame. In this example, we'll create a custom ranking function for our films, then demonstrate how it can be applied to a DataFrame.

Apply a Function to every DataFrame Row with the apply Method
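The row-wise apply pattern the lesson describes. The ranking function and film data here are invented, not the course's own example:

```python
import pandas as pd

films = pd.DataFrame({"title": ["X", "Y"], "rating": [8.9, 6.1]})

# Hypothetical classifier, invoked once per row via axis="columns"
def classify(row):
    return "Great" if row["rating"] > 8 else "OK"

films["verdict"] = films.apply(classify, axis="columns")
print(films["verdict"].tolist())  # ['Great', 'OK']
```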

The default bracket syntax extracts a component of the larger DataFrame. Any operations on that component will affect the larger DataFrame. If we want to separate the two objects, we can use the .copy() method, which creates an independent copy of a pandas object.

Create a Copy of a DataFrame with the copy Method
+ Working with Text Data
9 lectures 59:56

Datasets can arrive with plenty of improperly formatted text data. The Working with Text Data section introduces the methods available in pandas to clean your data. In this introductory lesson, we create a Jupyter Notebook for this section and import a CSV file with public data on employees in the city of Chicago. We also optimize the DataFrame for speed and efficiency.

Intro to the Working with Text Data Section

String methods in pandas require a .str prefix to operate properly. In this lesson, we'll explore four popular string methods we can invoke on all values in a Series:

  • str.lower() to convert a string's characters to lowercase

  • str.upper() to convert a string's characters to uppercase

  • str.title() to capitalize the first letter of every word in a string

  • str.len() to return a count of the number of characters in a string

Preview 07:14
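The four .str methods listed above in one sketch, on invented names rather than the Chicago employees data:

```python
import pandas as pd

names = pd.Series(["garry SCOTT", "rachel maddow"])

print(names.str.title().tolist())  # ['Garry Scott', 'Rachel Maddow']
print(names.str.upper()[0])        # GARRY SCOTT
print(names.str.len().tolist())    # [11, 13] -- character counts
```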

The str.replace() method replaces a substring within a string with another value for all Series values. In this lesson, we use it to convert our Employee Annual Salary column to store numeric values instead of text ones.

Use the str.replace method to replace all occurrences of character with another
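The salary-cleanup pattern the lesson describes, with invented salary strings. regex=False is passed explicitly because newer pandas versions treat the pattern as a regular expression by default:

```python
import pandas as pd

salaries = pd.Series(["$90,744.00", "$84,450.00"])

# Strip the characters that block numeric conversion, then convert
cleaned = (salaries.str.replace("$", "", regex=False)
                   .str.replace(",", "", regex=False)
                   .astype(float))
print(cleaned.tolist())  # [90744.0, 84450.0]
```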

In this lesson, we'll introduce the .str.contains(), .str.startswith(), and .str.endswith() methods. All three create a Boolean Series, which can be used to extract rows from a DataFrame. We'll also discuss case normalization to increase the accuracy of our results.

Filter a DataFrame's Rows with String Methods

In this lesson, we'll invoke the .str.strip() family of methods to remove leading and trailing whitespace from strings in a Series. The .str.lstrip() method removes whitespace from the left side (beginning) of a string, the .str.rstrip() method removes whitespace from the right side (end) of a string, and the .str.strip() method does both.

More DataFrame String Methods - strip, lstrip, and rstrip
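A sketch of the strip family alongside a case-normalized contains filter, on invented job titles:

```python
import pandas as pd

s = pd.Series(["  Water Rate Taker ", "Police Officer"])

print(s.str.strip()[0])   # 'Water Rate Taker'  -- both sides trimmed
print(s.str.lstrip()[0])  # 'Water Rate Taker ' -- trailing space kept

# case=False normalizes case before matching
print(s.str.contains("officer", case=False).tolist())  # [False, True]
```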

The past few lessons focused on calling string methods on the values in a column of our dataset. In this lesson, we'll familiarize ourselves with calling the same string methods on the index labels and column names of a DataFrame.

Invoke String Methods on DataFrame Index and Columns

Strings can often contain multiple pieces of information that are separated by a common delimiter. In this lesson, we'll introduce the .str.split() method, which can split a string value based on an occurrence of a user-specified value. This is equivalent to the Text to Columns feature in Microsoft Excel.

Split Strings by Characters with the str.split Method

In this lesson, we'll utilize additional parameters on the .str.split() method to modify its performance. We'll extract the first names of all the employees in our dataset, a slightly more challenging puzzle than the one in the previous lesson.

More Practice with the str.split method on a Series

In this lesson, we'll explore even more parameters on the .str.split() method. The expand parameter allows us to expand the generated Python list into DataFrame columns while the n parameter limits the total number of splits.

Exploring the expand and n Parameters of the str.split Method
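The expand and n parameters together, sketched on invented "Last, First" names:

```python
import pandas as pd

names = pd.Series(["Scott, Garry M", "Maddow, Rachel"])

# n=1 limits splits to one; expand=True spreads the pieces into columns
parts = names.str.split(", ", n=1, expand=True)

print(parts[0].tolist())  # ['Scott', 'Maddow']
print(parts[1].tolist())  # ['Garry M', 'Rachel']
```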
+ MultiIndex
15 lectures 01:38:55

A DataFrame or Series can hold multiple levels or layers in its index. The object that stores this index is called a MultiIndex. In this lesson, we create a Jupyter Notebook for this section and explore a new bigmac.csv dataset.

Intro to the MultiIndex Module

In this lesson, we'll create a multi-layer MultiIndex on a DataFrame with the .set_index() method. The method can be passed a list instead of a string to transfer multiple columns to the index.

Create a MultiIndex on a DataFrame with the set_index Method

The index attribute returns the underlying object that makes up the index of a DataFrame. In this lesson, we invoke the get_level_values method on the index to extract the values from one of its levels. We show how this can be done either by the layer's index position or by its name.

Extract Index Level Values with the get_level_values Method

The levels or layers of a MultiIndex can be changed. In this lesson, we'll call the .set_names() method on a MultiIndex object to rename its levels.

Change Index Level Name with the set_names Method

In this lesson, we explore how the sort_index method operates on a MultiIndex DataFrame. We show how to sort all levels in the same order as well as how to vary up the sort order for different levels.

The sort_index Method on a MultiIndex DataFrame

In this lesson, we review the familiar .loc[] and .iloc[] accessors for extracting rows from a MultiIndex DataFrame. We discuss how to package up multiple level values within a tuple to be more precise in communicating what we want to extract.

Extract Rows from a MultiIndex DataFrame
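The MultiIndex lessons above, sketched on an invented stand-in for the bigmac.csv data: building the index from a list of columns, reading one level's values, and extracting a row with a tuple via .loc:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2020-01", "2020-01"],
    "Country": ["US", "FR"],
    "Price": [5.7, 4.8],
}).set_index(["Date", "Country"])  # pass a list to build a MultiIndex

print(df.index.nlevels)                               # 2
print(df.index.get_level_values("Country").tolist())  # ['US', 'FR']

# Package the level values in a tuple to pinpoint one row
print(df.loc[("2020-01", "FR"), "Price"])             # 4.8
```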

In this lesson, we invoke the transpose method on a MultiIndex DataFrame to swap its row and column axes. We then discuss how to use the loc accessor to extract a column from a MultiIndex on the column axis.

The transpose Method on a MultiIndex DataFrame

The swaplevel method swaps two levels within a MultiIndex. In this lesson, we practice moving around levels in the bigmac dataset. If the MultiIndex consists of only two levels, no additional arguments are required.

The .swaplevel() Method

The .stack() method stacks an index from the column axis to the row axis. It essentially transfers the columns to the row index. In this lesson, we'll see a live example on our bigmac dataset.

Preview 06:01

The .unstack() method does the exact opposite of the .stack() method. It moves an index level from the rows to the columns. In this lesson, we'll call the method without any arguments.

The .unstack() Method, Part 1
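A round trip between stack and unstack, on an invented medals table rather than the bigmac data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Gold": [10, 8], "Silver": [5, 9]},
    index=["US", "FR"],
)

stacked = df.stack()  # columns move into a new inner row-index level
print(stacked.loc[("US", "Gold")])  # 10

unstacked = stacked.unstack()  # and back to columns again
print(unstacked.loc["FR", "Silver"])  # 9
```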

In this lesson, we'll continue our exploration of the .unstack() method. We'll introduce the numerous argument types we can feed it, including positive integers, negative integers, and index level names.

The .unstack() Method, Part 2

Multiple levels of the row-based MultiIndex can be shifted with the .unstack() method. In this lesson, we'll explore how to provide a list argument to the level parameter to move multiple layers at a time. We'll also introduce the fill_value parameter to plug in missing values in the resulting DataFrame.

The .unstack() Method, Part 3

In this lesson, we'll reorganize the unique values in a DataFrame column as the column headers with the .pivot() method. This can be a particularly effective method for shortening the length of the DataFrame.

The pivot Method
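A sketch with hypothetical long-format sales data:

```python
import pandas as pd

# Made-up long-format data: one row per (Date, Salesman) pair
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "Salesman": ["Bob", "Jeb", "Bob", "Jeb"],
    "Revenue": [100, 200, 150, 250],
})

# Unique Salesman values become the column headers
wide = df.pivot(index="Date", columns="Salesman", values="Revenue")
print(wide.loc["2020-01-02", "Jeb"])  # 250
```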

In this lesson, we'll emulate Excel's Pivot Table functionality with the .pivot_table() method. We'll explore the values, index, columns, and aggfunc parameters. We'll also discuss the variety of aggregation functions that we can use, including sum, count, max, and min.

Use the pivot_table method to create an aggregate summary of a DataFrame
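A sketch of all four parameters on a made-up salary table:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "City": ["NY", "NY", "LA", "LA"],
    "Salary": [100, 120, 90, 110],
})

# Aggregate Salary by Gender (rows) and City (columns)
table = df.pivot_table(
    values="Salary", index="Gender", columns="City", aggfunc="sum"
)
print(table.loc["F", "NY"])  # 120
```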

The pd.melt() method can effectively perform anti-pivot operations. In this lesson, we'll call the method on a DataFrame to convert its current data structure into a more tabular format. We'll also explore the optional parameters available to modify the resulting column names in the new DataFrame.

Preview 05:59
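A sketch of the anti-pivot, using invented wide-format data:

```python
import pandas as pd

# Hypothetical wide-format data: one column per month
df = pd.DataFrame({
    "Salesman": ["Bob", "Jeb"],
    "Jan": [100, 200],
    "Feb": [150, 250],
})

# The month columns collapse into variable/value pairs;
# var_name and value_name rename the generated columns
melted = pd.melt(df, id_vars="Salesman", var_name="Month", value_name="Revenue")
print(list(melted.columns))  # ['Salesman', 'Month', 'Revenue']
print(len(melted))           # 4
```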
+ The GroupBy Object
7 lectures 49:33

The pandas DataFrameGroupBy object allows us to create groupings of data based on common values in one or more DataFrame columns. In this lesson, we'll set up a new Jupyter Notebook in preparation for this module.

Intro to the Groupby Module

The GroupBy object does not offer us much of substance until we call a method on it. In this lesson, we'll call the .first(), .last(), and .size() methods on a GroupBy object to gain a better understanding of its internal data structure.

First Operations with groupby Object
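A sketch of two of these methods, with a made-up stand-in for the companies dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Energy", "Tech"],
    "Company": ["Apple", "Exxon", "Google"],
})
sectors = df.groupby("Sector")

# .size() counts the rows in each group
print(sectors.size()["Tech"])  # 2

# .first() keeps the first row encountered per group
print(sectors.first().loc["Tech", "Company"])  # 'Apple'
```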

The .get_group() method extracts a grouping from a GroupBy object. In this lesson, we'll practice pulling out a few groups from our companies dataset.

Retrieve a group from a GroupBy object with the get_group Method
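A minimal sketch of the extraction (hypothetical rows, not the companies dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Energy", "Tech"],
    "Company": ["Apple", "Exxon", "Google"],
})

# Pull out every row belonging to one group
tech = df.groupby("Sector").get_group("Tech")
print(list(tech["Company"]))  # ['Apple', 'Google']
```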

Aggregation methods allow us to perform calculations on all groupings within a GroupBy object. In this lesson, we'll call some mathematical methods on the groups, including the .sum(), .mean(), and .max() methods.

Methods on the Groupby Object and DataFrame Columns
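A sketch of one aggregation, on made-up revenue figures:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Energy", "Tech"],
    "Revenue": [300, 200, 100],
})

# Select the column first to limit the aggregation's output
totals = df.groupby("Sector")["Revenue"].sum()
print(totals["Tech"])    # 400
print(totals["Energy"])  # 200
```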

A GroupBy object does not have to be made up of values from a single column. In this lesson, we'll create a new GroupBy object based on unique value combinations from two of our DataFrame columns.

Grouping by Multiple Columns
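A sketch of a two-column grouping, again on invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Tech"],
    "Country": ["US", "US", "China"],
    "Revenue": [300, 100, 200],
})

# Each unique (Sector, Country) combination forms one group
totals = df.groupby(["Sector", "Country"])["Revenue"].sum()
print(totals[("Tech", "US")])  # 400
```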

Certain situations may require different aggregation methods on different columns within our groupings. In this lesson, we'll invoke the .agg() method on our GroupBy object to apply a different aggregation operation to each inner column.

Preview 06:11
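A sketch of per-column aggregation with .agg(), using hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Energy"],
    "Revenue": [300, 100, 200],
    "Employees": [10, 30, 20],
})

# A dictionary maps each column to its own aggregation
summary = df.groupby("Sector").agg({"Revenue": "sum", "Employees": "max"})
print(summary.loc["Tech", "Revenue"])    # 400
print(summary.loc["Tech", "Employees"])  # 30
```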

A standard Python for loop can be used to iterate over the groups in a pandas GroupBy object. In this lesson, we'll loop over all of our groupings to extract selected rows from each inner DataFrame. We'll append these rows to a running DataFrame and then view the final result.

Iterating through Groups
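A sketch of the loop on made-up data. Note that this sketch combines the collected rows with pd.concat rather than DataFrame.append, since .append() was removed in pandas 2.0:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Energy"],
    "Revenue": [300, 100, 200],
})

# Collect the top row of each group, then combine the pieces
pieces = []
for sector, group in df.groupby("Sector"):
    pieces.append(group.head(1))
result = pd.concat(pieces)
print(list(result["Sector"]))  # ['Energy', 'Tech'] -- groups come back sorted
```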
+ Merging, Joining, and Concatenating DataFrames
11 lectures 01:23:35

Welcome to the Merging, Joining, and Concatenating section! In this module, we'll cover how to combine data from multiple DataFrames into one. In this section, we create a new Jupyter Notebook and introduce the 4 CSV files that we will be using.

Intro to the Merging, Joining, and Concatenating Section

The pd.concat method concatenates two or more DataFrames together. The process is simple when the DataFrames have an identical structure (i.e. the same column names). In this lesson, we also explore how to replace the merged index with a newly generated one.

The pd.concat Method, Part 1
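A sketch of concatenation with a regenerated index (hypothetical weekly data):

```python
import pandas as pd

week1 = pd.DataFrame({"Customer": [1, 2], "Spend": [10, 20]})
week2 = pd.DataFrame({"Customer": [3], "Spend": [30]})

# ignore_index=True replaces the merged index with a fresh 0..n-1 range
combined = pd.concat([week1, week2], ignore_index=True)
print(list(combined.index))  # [0, 1, 2]
```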

In this lesson, we use the keys parameter on the pd.concat method to label each concatenated DataFrame with a unique identifier. This parameter yields a MultiIndex DataFrame where the outermost layer holds the keys and the innermost layer holds each DataFrame's original index values.

The pd.concat Method, Part 2
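A sketch of the keys parameter on the same kind of hypothetical data:

```python
import pandas as pd

week1 = pd.DataFrame({"Spend": [10, 20]})
week2 = pd.DataFrame({"Spend": [30]})

# keys labels each frame, producing a two-level row index:
# the outer level holds the key, the inner level the original index
combined = pd.concat([week1, week2], keys=["Week 1", "Week 2"])
print(combined.loc[("Week 2", 0), "Spend"])  # 30
```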

An inner join merges the values in two DataFrames based on common values across one or more columns. In this lesson, we'll explore the concept by merging on identical values in a single column.

Inner Joins, Part 1
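A sketch of the single-column inner join, with invented customers and orders:

```python
import pandas as pd

customers = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ann", "Bob", "Cat"]})
orders = pd.DataFrame({"ID": [2, 3, 4], "Total": [50, 75, 20]})

# An inner join keeps only the IDs present in both DataFrames
matched = customers.merge(orders, how="inner", on="ID")
print(list(matched["ID"]))  # [2, 3]
```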

This lesson continues our exploration of the .merge() method. This time, we'll merge the values in two DataFrames based on common values in multiple columns. We'll also validate the data with some filtering.

Inner Joins, Part 2

An outer join combines values that exist in either DataFrame into a central DataFrame. In this lesson, we'll invoke the .merge() method with a modified argument to the how parameter to perform an outer join on our weekly sales data sets.

Outer Joins
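A sketch of an outer join on the same kind of hypothetical tables:

```python
import pandas as pd

customers = pd.DataFrame({"ID": [1, 2], "Name": ["Ann", "Bob"]})
orders = pd.DataFrame({"ID": [2, 3], "Total": [50, 75]})

# An outer join keeps every ID from either DataFrame; gaps become NaN
everything = customers.merge(orders, how="outer", on="ID")
print(len(everything))  # 3
```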

A left join establishes one of the DataFrames as the base dataset for the merge. It looks up each of that DataFrame's values in the other DataFrame and pulls over the other DataFrame's rows wherever there's a value match. In this lesson, we'll practice executing this join with the .merge() method.

Preview 09:19
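A sketch of a left join, again with invented tables:

```python
import pandas as pd

customers = pd.DataFrame({"ID": [1, 2], "Name": ["Ann", "Bob"]})
orders = pd.DataFrame({"ID": [2, 3], "Total": [50, 75]})

# Every customer survives; customers without orders get NaN for Total
result = customers.merge(orders, how="left", on="ID")
print(result.loc[1, "Total"])  # 50.0
```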

DataFrames may come equipped with different names for columns that represent the same data. In this lesson, we'll talk about how to utilize the left_on and right_on parameters to specify how to match values in differently named columns across two DataFrames.

The left_on and right_on Parameters

Our merges so far have involved matches based on common column values. In this lesson, we'll explore how to merge DataFrames based on common index labels.

Merging by Indexes with the left_index and right_index Parameters
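A sketch of an index-based merge, with made-up frames:

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Ann", "Bob"]}, index=[1, 2])
right = pd.DataFrame({"Total": [50]}, index=[2])

# Match on index labels instead of column values
result = left.merge(right, how="inner", left_index=True, right_index=True)
print(list(result.index))  # [2]
```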

Call the .join() method, a simple method to combine two DataFrames side by side when they share the same index. This is a shortcut to a more explicit .merge() call.

The .join() Method

Call the pd.merge() function at the top level of the pandas library to merge two DataFrames. This is an alternate syntax to calling the .merge() method directly on a DataFrame.

The pd.merge() Method
+ Working with Dates and Times in Datasets
17 lectures 02:23:09

The Working with Dates and Times section offers a review of Python's built-in datetime objects as well as a comprehensive introduction to similar tools in the pandas library. In this lesson, we set up our Jupyter Notebook and import Python's datetime module.

Intro to the Working with Dates and Times Module

Python includes built-in date and datetime objects for working with dates and times. This lesson offers a review of how we can create these objects as well as some of the attributes (.year, .month, .day etc) that are available on them.

Review of Python's datetime Module

The pandas library includes its own Timestamp object to represent moments in time. In this lesson, we'll use the pd.Timestamp() constructor method with a variety of inputs (strings, date objects, datetime objects) to create some Timestamp objects.

The pandas Timestamp Object

The DatetimeIndex is a pandas object for storing multiple Timestamp objects. In this lesson, we'll create a few DatetimeIndex objects from Python lists.

The pandas DateTimeIndex Object

The pd.to_datetime() method is a convenience method to convert various inputs to pandas-focused objects. In this lesson, we'll pass a variety of inputs (date objects, datetime objects, strings, lists) to the constructor method to see what it returns.

The pd.to_datetime() Method
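A sketch of two of the input types:

```python
import pandas as pd

# A single string yields a Timestamp
ts = pd.to_datetime("2015-03-31")
print(ts.year, ts.month, ts.day)  # 2015 3 31

# A list of strings yields a DatetimeIndex
idx = pd.to_datetime(["2015-01-01", "2015-02-01"])
print(type(idx).__name__)  # DatetimeIndex
```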

Over the course of the next three lessons, we'll call the pd.date_range() method to generate a DatetimeIndex of Timestamp objects. This constructor method includes 3 critical parameters (start, end, and periods); we need to provide 2 of these 3 for it to function. In this lesson, we'll see how the pd.date_range() method operates with arguments for the start and end parameters.

Preview 10:22
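A sketch of two of the parameter combinations:

```python
import pandas as pd

# start + end: one Timestamp per day between the bounds (daily is the default)
days = pd.date_range(start="2020-01-01", end="2020-01-05")
print(len(days))  # 5

# start + periods: a fixed number of dates from a starting point
trio = pd.date_range(start="2020-01-01", periods=3)
print(len(trio))  # 3
```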

In this lesson, we'll see how the pd.date_range() method operates with arguments for the start and periods parameters. This approach creates a set number of dates beginning from a specific point.

Create Range of Dates with the pd.date_range() Method, Part 2

In this lesson, we'll see how the pd.date_range() method operates with arguments for the end and periods parameters. This approach creates a set number of dates, proceeding backwards from a specified date point. We'll also continue our exploration of the freq parameter to vary the durations between each Timestamp.

Create Range of Dates with the pd.date_range() Method, Part 3

The .dt accessor on a Series of Timestamp objects allows us to access specific datetime properties, much like the .str accessor allows us to call specific methods on a Series of strings. In this lesson, we'll explore popular attributes like .day, .weekday_name, and .month.

The .dt Accessor
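A sketch of the accessor on made-up dates. One caveat: the .weekday_name attribute mentioned above was removed in later pandas releases; the .day_name() method is its modern replacement, and that is what the sketch uses:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2020-01-15", "2020-03-02"]))

# .dt exposes datetime attributes element-wise
print(list(s.dt.day))    # [15, 2]
print(list(s.dt.month))  # [1, 3]

# In current pandas, weekday names come from the .day_name() method
print(list(s.dt.day_name()))  # ['Wednesday', 'Monday']
```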

Upcoming lessons rely on the pandas-datareader library to fetch financial datasets from Yahoo Finance. In this lesson, we'll install the pandas-datareader library.

On a Mac system, open the Terminal. On a Windows machine, look for the Anaconda Prompt from the Start Menu.

Once the application is open, run the following commands.

  1. conda activate followed by your environment name (for example, conda activate pandas_playground)

  2. conda install pandas-datareader

Install pandas-datareader Library

In this lesson, we use the pandas-datareader library to fetch stock data for Microsoft. The result arrives in a DataFrame object with a DatetimeIndex.

Import Financial Data Set with pandas_datareader Library

Extracting rows from a DataFrame with a DatetimeIndex is no different than in previous sections. In this lesson, we review the familiar .loc and .iloc accessors. As a reminder, these methods use a pair of square brackets to target one or more rows by either index label or index position.

Selecting Rows from a DataFrame with a DateTimeIndex

In this lesson, we'll explore some of the attributes and methods on a pandas Timestamp object. We'll also practice extracting similar information from a complete DatetimeIndex of Timestamps.

Timestamp Object Attributes and Methods

In this lesson, we'll use the pd.DateOffset object to add hours, days, weeks, months, and years to each value in a DatetimeIndex.

The pd.DateOffset Object
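A sketch of a one-month shift on a small, invented DatetimeIndex:

```python
import pandas as pd

dates = pd.DatetimeIndex(["2020-01-15", "2020-06-15"])

# Shift every date forward by one calendar month
shifted = dates + pd.DateOffset(months=1)
print(shifted[0])  # 2020-02-15
```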

In this lesson, we'll explore how we can use timeseries offsets to arrive at specific datetime values (such as the end of the month or the start of the year).

Timeseries Offsets

Over the next two lessons, we'll explore the pandas Timedelta object which represents durations. A Timedelta represents a distance of time while a Timestamp represents a specific moment in time.

The Timedelta Object

In this lesson, we'll create a Series of Timedelta objects by calculating the duration differences between two columns of Timestamps. Time difference operations can be easily performed with the subtraction ( - ) sign.

Timedeltas in a Dataset
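A sketch of the column subtraction, on hypothetical order and delivery dates:

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2020-01-01", "2020-01-03"]),
    "delivery_date": pd.to_datetime(["2020-01-05", "2020-01-04"]),
})

# Subtracting two Timestamp columns yields a Series of Timedeltas
df["wait"] = df["delivery_date"] - df["order_date"]
print(df["wait"].iloc[0])  # 4 days
```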
  • Basic / intermediate experience with Microsoft Excel or another spreadsheet software (common functions, vlookups, Pivot Tables etc)
  • Basic experience with the Python programming language
  • Strong knowledge of data types (strings, integers, floating points, booleans), etc.

Student Testimonials:

  • The instructor knows the material, and has detailed explanation on every topic he discusses. Has clarity too, and warns students of potential pitfalls. He has a very logical explanation, and it is easy to follow him. I highly recommend this class, and would look into taking a new class from him. - Diana

  • This is excellent, and I cannot compliment the instructor enough. Extremely clear, relevant, and high quality - with helpful practical tips and advice. Would recommend this to anyone wanting to learn pandas. Lessons are well constructed. I'm actually surprised at how well done this is. I don't give many 5 stars, but this has earned it so far. - Michael

  • This course is very thorough, clear, and well thought out. This is the best Udemy course I have taken thus far. (This is my third course.) The instruction is excellent! - James

Welcome to the most comprehensive Pandas course available on Udemy! An excellent choice for both beginners and experts looking to expand their knowledge on one of the most popular Python libraries in the world!

Data Analysis with Pandas and Python offers 19+ hours of in-depth video tutorials on the most powerful data analysis toolkit available today. Lessons include:

  • installing

  • sorting

  • filtering

  • grouping

  • aggregating

  • de-duplicating

  • pivoting

  • munging

  • deleting

  • merging

  • visualizing

and more!

Why learn pandas?

If you've spent time in a spreadsheet software like Microsoft Excel, Apple Numbers, or Google Sheets and are eager to take your data analysis skills to the next level, this course is for you! 

Data Analysis with Pandas and Python introduces you to the popular Pandas library built on top of the Python programming language. 

Pandas is a powerhouse tool that allows you to do anything and everything with colossal data sets -- analyzing, organizing, sorting, filtering, pivoting, aggregating, munging, cleaning, calculating, and more! 

I call it "Excel on steroids"!

Over the course of more than 19 hours, I'll take you step-by-step through Pandas, from installation to visualization! We'll cover hundreds of different methods, attributes, features, and functionalities packed away inside this awesome library. We'll dive into tons of different datasets, short and long, broken and pristine, to demonstrate the incredible versatility and efficiency of this package.

Data Analysis with Pandas and Python is bundled with dozens of datasets for you to use. Dive right in and follow along with my lessons to see how easy it is to get started with pandas!

Whether you're a new data analyst or have spent years (*cough* too long *cough*) in Excel, Data Analysis with pandas and Python offers you an incredible introduction to one of the most powerful data toolkits available today!

Who this course is for:
  • Data analysts and business analysts
  • Excel users looking to learn a more powerful software for data analysis