
This course includes our updated coding exercises so you can practice your skills as you learn.
Welcome to Data Analysis with Pandas and Python! In this lesson, we'll introduce the pandas library, the Python language, the structure of the course, the prerequisites, and the setup process.
In this lesson, we introduce the Terminal application for issuing commands to the system via a command line. We also introduce the ls, pwd, cd, and clear commands and the Tab Autocompletion features.
In this lesson, we install the uv command-line tool for managing Python projects and dependencies. We also set up autocompletion for uv commands and discuss how to uninstall the tool.
It's time to get the course materials! In this lesson, we download the course repository and set up Python and our dependencies (Pandas, JupyterLab, and more) with the uv sync command.
In this lesson, we introduce the PowerShell/Terminal application for issuing commands to the system via a command line. We also introduce the ls, pwd, cd, and clear commands and the Tab Autocompletion features.
In this lesson, we install the uv command-line tool for managing Python projects and dependencies. We also set up autocompletion for uv commands and discuss how to uninstall the tool.
It's time to get the course materials! In this lesson, we download the course repository and set up Python and our dependencies (Pandas, JupyterLab, and more) with the uv sync command.
In this lesson, we walk through the process of starting up and shutting down Jupyter Lab, our coding environment. We open some sample Jupyter Notebooks and describe how a Python server runs continuously in the background, waiting to execute the contents of a code cell.
In this lesson, we walk through the Jupyter Lab interface. A Notebook consists of cells, which can have different types (such as Markdown). We introduce some common actions like adding cells, deleting cells, restarting the kernel, and more.
In this lesson, we configure our Jupyter Notebook settings to enable Ruff, a code formatter that styles our Python code.
To conserve memory, Jupyter won't load Python modules into your Notebook automatically. In this lesson, we use the import keyword to bring the pandas library into a Notebook. We also talk about assigning aliases with the as keyword.
A comment is a line ignored by the Python interpreter when the program/cell runs. Declare a comment with a hash (#) symbol.
In this lesson, we introduce common data types in Python including integers, floating-points, strings, Booleans, and None.
In this lesson, we discuss common mathematical and logical operators including addition, subtraction, multiplication, two types of division, concatenation, and modulo.
In this lesson, we focus on the equality (==) and inequality (!=) operators for comparing two values against each other.
A variable is a name we assign to a value in our program. In this lesson, we practice declaring variables and discuss Python community conventions for naming them.
A function is a reusable procedure, a sequence of steps to follow in order. In this lesson, we introduce Python's built-in functions and the syntax for invoking them. We cover len, str, int, and more.
Now it's time to build our own custom functions! In this lesson, we define a custom temperature conversion function from start to finish.
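As a rough sketch of what such a function might look like: the lesson blurb does not say which direction the conversion runs, so the Fahrenheit-to-Celsius direction and the name fahrenheit_to_celsius below are assumptions made for illustration.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9


# Invoke the function with an argument and capture the return value.
freezing = fahrenheit_to_celsius(32)   # 0.0
boiling = fahrenheit_to_celsius(212)   # 100.0
print(freezing, boiling)
```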
A method is a function attached to an object. It's a command or action we can ask the object to take. In this lesson, we explore some common string methods including lower, upper, strip, and replace.
A list is a mutable collection of ordered values. In this lesson, we learn the square bracket syntax for declaring lists as well as some common methods like pop and append.
Python assigns each list element and each string character an index position that reflects its place in line. In this lesson, we learn how to extract elements and characters from their lists/strings using square bracket notation. The index starts counting from 0!
A tuple is an immutable list. It's an ordered sequence of values that cannot be modified after creation. We technically declare a tuple with a comma-separated sequence of values, but the community convention is to wrap the sequence in parentheses.
A dictionary is a mutable collection of key-value pairs. A key serves as a unique identifier for a value. The keys must be unique, while the values can contain duplicates. In this lesson, we practice declaring some dictionary objects.
A class is a blueprint/template for creating an object, which we call an instance. The class defines the attributes and methods that all objects/instances will have. In this lesson, we walk through the terminology and provide a real-world analogy.
In this lesson, we review the import keyword for importing either a Python module or a library like pandas. We import the datetime module and use the as keyword to assign it an alias of dt.
In this lesson, we utilize the same import keyword to bring the pandas library into our Jupyter Notebook. We'll have to repeat this step in every Jupyter Notebook.
A Series is a one-dimensional labelled array that combines the best features of a list and a dictionary. In this lesson, we instantiate our first Series objects and introduce the index, the collection of identifiers for the Series's values.
In this lesson, we practice creating Series objects with dictionaries as the data source. Pandas will use the keys for the Series's index labels and the values for the Series's values.
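A minimal sketch of the dictionary-to-Series idea. The calories dictionary is made-up sample data, not one of the course datasets.

```python
import pandas as pd

# Hypothetical sample data: fruit names mapped to calorie counts.
calories = {"apple": 95, "banana": 105, "cherry": 4}

fruit = pd.Series(calories)

# The dictionary keys become the index labels...
labels = fruit.index.tolist()   # ["apple", "banana", "cherry"]
# ...and the dictionary values become the Series values.
values = fruit.tolist()         # [95, 105, 4]
print(fruit)
```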
In this lesson, we invoke some sample methods like sum, product, and mean on Series objects. Methods utilize a dot, then the method name and a pair of parentheses.
An attribute is a piece of data that lives on an object. It's a fact, a detail, a characteristic of the object. In this lesson, we access various attributes on the Series and introduce the concept of composition, where an object is made up of many smaller objects.
A parameter is the name for an expected input to a function/method/class instantiation. An argument is the concrete value we provide for a parameter during invocation. In this lesson, we discuss the data and index parameters of the Series constructor.
A CSV is a plain text file that uses line breaks to separate rows and commas to separate row values. In this lesson, we use the pd.read_csv function to import 2 CSV datasets into pandas. We also introduce the 2-dimensional DataFrame object and learn how to convert it to a 1-dimensional Series with the squeeze method.
The head method returns a number of rows from the beginning of the Series. The complementary tail method returns a number of rows from the end of the Series.
In this lesson, we pass a Series to Python's built-in functions including len, type, list, dict, sorted, max, and min.
In this lesson, we practice using Python's in and not in keywords to check for inclusion among the Series's values and index labels. We utilize the index and values attributes to make sure we perform the search within the right collection.
The sort_values method sorts a Series's values in order. In this lesson, we invoke the method on both our alphabetical and numeric Series and also learn how to customize the sort type with the ascending parameter.
In this lesson, we set a custom index on our Series with the read_csv function's index_col parameter and learn how to sort an index using the sort_index method.
In this lesson, we use the iloc accessor to extract a Series value by its index position. iloc is short for "integer location" and requires a special square bracket syntax. It supports single values, Python lists, and slices as well.
In this lesson, we use the loc accessor to extract a Series value by its index label. loc requires a special square bracket syntax. Like the iloc accessor, it supports single values, Python lists, and slices.
In this lesson, we introduce the get method for retrieving a Series value by index label and providing a fallback value in case the label does not exist. The default fallback value is None.
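The three extraction styles from the last few lessons can be compared side by side. This is only a sketch: the prices Series and its labels are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with string index labels.
prices = pd.Series(
    [1.25, 0.5, 3.0, 2.0],
    index=["apple", "banana", "cherry", "date"],
)

first_price = prices.iloc[0]           # by position
cherry_price = prices.loc["cherry"]    # by label

# Position slices exclude the end position (positions 1 and 2 here).
by_position = prices.iloc[1:3]
# Label slices include both endpoints ("banana" and "cherry" here).
by_label = prices.loc["banana":"cherry"]

# get returns a fallback instead of raising when the label is missing.
missing_default = prices.get("kiwi")        # None
missing_custom = prices.get("kiwi", 0.0)    # 0.0
```

One design point worth noticing: iloc slices follow Python's usual end-exclusive rule, while loc slices keep the final label.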
In this lesson, we show the syntax to overwrite a Series value. We first target it with the iloc/loc accessor, then provide an equal sign and the value to overwrite the original value with.
In this lesson, we discuss the Copy-on-Write principle that Pandas 3 applies by default. Pandas defers creating a copy until a mutational operation occurs. We can treat any filtered subset or targeted segment as effectively a copy, even though Pandas will try to reuse the same memory chunks under the hood.
In this lesson, we run through some common mathematical methods on Series including count, sum, product, mean, max, min, median, mode, and more.
Broadcasting describes the process of applying a consistent arithmetic operation to an array. We can combine mathematical operators with a Series to apply the mathematical operation to every value. In this lesson, we practice adding and subtracting a consistent value from every Series entry.
In this lesson, we show how Pandas uses index labels to align multiple Series together when performing mathematical operations between them.
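A small sketch of label-based alignment, using two invented Series whose indexes only partly overlap. The fill_value argument at the end is an extra that the lesson blurb does not mention.

```python
import pandas as pd

# Hypothetical data: the labels "y" and "z" appear in both Series.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Values are paired by label, not by position.
# Labels found in only one Series ("x" and "w") produce NaN.
total = a + b

# The add method accepts fill_value to substitute for the absent side.
total_filled = a.add(b, fill_value=0)
print(total)
print(total_filled)
```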
In this lesson, we explore the value_counts method, which returns the number of times each distinct value occurs in the Series. The normalize parameter returns the relative frequencies/percentages of the values instead of the counts.
In this lesson, we use the apply method to invoke a function for every Series value. Pandas collects the results in a new Series. The advantage of apply is that we can utilize basic Python code to achieve whatever manipulation we want. If we don't know a specific Series method but can accomplish the same result with Python constructs, apply can be a useful tool.
The map method connects each Series value to a complementary value from another data structure. It provides a connection/association to the other value. In this lesson, we practice using the method with arguments of a dictionary and a Series.
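To illustrate the idea, here is a sketch with made-up state codes. A value with no match in the lookup structure comes back as a missing value.

```python
import pandas as pd

# Hypothetical data: state codes and a lookup of full names.
codes = pd.Series(["NY", "CA", "NY", "TX"])
state_names = {"NY": "New York", "CA": "California"}

# Each code is swapped for its associated dictionary value.
# "TX" has no entry in the dictionary, so its result is missing.
from_dict = codes.map(state_names)

# A Series works as the lookup too: its index labels act as the keys.
from_series = codes.map(pd.Series(state_names))
print(from_dict)
```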
A DataFrame is a 2-dimensional table with an index. In this lesson, we introduce this new data structure and explore some of the methods and attributes it shares with the Series object. We also identify some unique attributes that exist only on one object but not the other.
In this lesson, we do a deeper dive into the sum method and how it operates differently between Series and DataFrame objects.
In this lesson, we introduce two syntax options to extract a column from a DataFrame: attribute access and square brackets. We also discuss the tradeoffs between the two approaches.
In this lesson, we learn how to extract multiple DataFrame columns by passing a list inside the square brackets. Pandas returns a copy/new DataFrame when extracting multiple columns.
In this lesson, we add a new column to a DataFrame using square bracket notation. We show how to populate the new Series with a single value or a dynamic calculation from performing an operation on another Series's values.
In this lesson, we utilize the assign method to return a new DataFrame with new columns. Each keyword argument's name becomes the new column name, and its value provides the contents to populate the new column with.
In this lesson, we practice using the dropna method to remove DataFrame rows containing missing/NaN values. We discuss how to target rows that only hold missing values as well as rows with a missing value in a target column.
In this lesson, we explore an alternative approach for dealing with missing values: using the fillna method to populate missing values with a static value. We invoke the method on both a DataFrame and a Series.
In this lesson, we use the ffill and bfill methods to forward-fill and back-fill the previous/next present value whenever there is a missing value. We also discuss how to set a max limit on the number of replaced consecutive values.
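A quick sketch of the two fill directions and the limit parameter, using an invented Series with a run of three missing values.

```python
import pandas as pd

# Hypothetical data: three consecutive missing values between 1.0 and 5.0.
readings = pd.Series([1.0, None, None, None, 5.0])

# Forward-fill copies the last present value downward.
forward = readings.ffill()          # 1, 1, 1, 1, 5
# Back-fill copies the next present value upward.
backward = readings.bfill()         # 1, 5, 5, 5, 5

# limit caps how many consecutive missing values get replaced.
limited = readings.ffill(limit=1)   # 1, 1, NaN, NaN, 5
print(limited)
```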
In this lesson, we introduce the astype method for converting the data types in a Series. We practice converting our floating-point columns to store integers.
In this lesson, we practice using the pd.to_numeric function to convert a Series's values into numeric types. One of the advantages of to_numeric over astype is the ability to react to errors in conversion.
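One way to react to conversion errors is the errors parameter. The sketch below assumes a made-up Series of strings with one invalid entry.

```python
import pandas as pd

# Hypothetical data: numeric strings plus one value that cannot be parsed.
raw = pd.Series(["10", "20", "oops", "40"])

# errors="coerce" replaces unparseable values with NaN instead of raising.
cleaned = pd.to_numeric(raw, errors="coerce")
print(cleaned)

invalid_count = cleaned.isna().sum()   # 1
```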
In this lesson, we introduce the select_dtypes method to target DataFrame columns by their data type.
In this lesson, we introduce the category type, which is ideal when you have a small number of distinct values within a column. Categories help reduce total memory consumption.
In this lesson, we introduce attributes that expose objects with additional methods and attributes. There's often one such attribute for a specific data type. For example, string columns have string methods underneath a str attribute/namespace, and datetime columns have datetime methods underneath a dt attribute/namespace.
In this lesson, we explore the sort_values method on a DataFrame. The default sort order is ascending (smallest to greatest, alphabetical), but we can customize the order with the ascending parameter. We also discuss the na_position parameter for placing the NaN values at the beginning or end of the sorted values.
In this lesson, we sort a DataFrame by multiple columns by passing a list of column names to the by parameter. We also customize the sort order for each column by passing a list to the ascending parameter.
In this lesson, we showcase how to use the category data type to set a custom sort order for the values in a column.
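A sketch of the technique under the assumption that the column holds clothing sizes (invented data). An ordered CategoricalDtype tells pandas the order to sort by instead of the alphabetical one.

```python
import pandas as pd

# Hypothetical data: alphabetical order would be large, medium, small.
sizes = pd.Series(["medium", "small", "large", "small"])

# Declare the categories in the order we want them sorted.
size_type = pd.CategoricalDtype(
    categories=["small", "medium", "large"],
    ordered=True,
)

ordered_sizes = sizes.astype(size_type).sort_values()
print(ordered_sizes.tolist())   # ['small', 'small', 'medium', 'large']
```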
The sort_index method sorts a DataFrame by the index labels. In this lesson, we explore the method and a few of its parameters.
In this lesson, we learn the rank method for ordering and ranking the values in a Series. We use it to rank our NBA players by their salaries, with the top player earning a rank of #1. We also show various approaches for dealing with ties.
Welcome to the next section of the course! In this lesson, we import and introduce the new employees DataFrame. We also convert some columns to their optimal formats (Booleans, categories, etc) and introduce the to_datetime function at the top level of pandas.
To filter a DataFrame, we must first generate a Boolean Series, then pass it in square brackets after the DataFrame. In this lesson, we practice extraction using a variety of data types and operations (equality, less than, greater than, and more).
In this lesson, we introduce the & operator for combining two Boolean Series with AND logic. We use this technique to filter a subset of DataFrame rows that fit multiple conditions.
In this lesson, we introduce the | operator for combining two Boolean Series with OR logic. We use this technique to filter a subset of DataFrame rows that fit either one of several conditions. We also discuss caveats when combining & and | in the extraction syntax.
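The filtering pattern from these lessons might look like the sketch below. The employees data and its column names here are invented stand-ins, not the course dataset.

```python
import pandas as pd

# Hypothetical stand-in for the employees DataFrame.
employees = pd.DataFrame({
    "name": ["Ava", "Ben", "Cal", "Dee"],
    "team": ["Marketing", "Sales", "Marketing", "Finance"],
    "salary": [120000, 130000, 80000, 60000],
})

# Step 1: build Boolean Series, one True/False per row.
in_marketing = employees["team"] == "Marketing"
high_earner = employees["salary"] > 100000

# Step 2: combine them and pass the result in square brackets.
both = employees[in_marketing & high_earner]     # AND logic
either = employees[in_marketing | high_earner]   # OR logic

# When writing conditions inline, wrap each comparison in parentheses
# because & and | bind more tightly than == and >.
inline = employees[
    (employees["team"] == "Marketing") & (employees["salary"] > 100000)
]
```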
The isin method checks whether each Series value is present in a predefined list. It returns a Boolean Series; a True indicates the row's value is found within the collection.
In this lesson, we discuss the isnull and notnull methods. They generate Boolean Series that validate whether a row's value is NaN (missing/absent) or non-NaN.
In this lesson, we utilize the between method to check if each Series value exists within a range/boundary of values. Both endpoints are inclusive. We utilize the resulting Boolean Series to filter our DataFrame.
The duplicated method marks a row's record as a duplicate when pandas encounters the value for a second time (and beyond). The first occurrence is not marked as a duplicate. In this lesson, we discuss the nuances of this method and the parameters we can customize to target the first duplicate, the last duplicate or all duplicates.
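The effect of the keep parameter is easiest to see on a tiny example. The letters Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: "a" occurs three times.
letters = pd.Series(["a", "b", "a", "c", "a"])

# Default (keep="first"): the first occurrence is NOT flagged.
after_first = letters.duplicated().tolist()
# [False, False, True, False, True]

# keep="last": the last occurrence is NOT flagged.
before_last = letters.duplicated(keep="last").tolist()
# [True, False, True, False, False]

# keep=False: every occurrence of a repeated value is flagged.
all_repeats = letters.duplicated(keep=False).tolist()
# [True, False, True, False, True]
```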
In this lesson, we explore the drop_duplicates method for removing rows with duplicate values from a DataFrame. We discuss how to declare a subset of columns to search for the duplicates within and also review the keep parameter options from the previous lesson.
In this lesson, we use the where method, which accepts a Boolean Series but returns a DataFrame with the same dimensions as the original one. Rows that meet the condition are retained, and rows that do not meet the condition are populated with NaNs (missing values).
The query method enables extracting a subset of DataFrame rows using a string expression that reads much like natural language. We explore a few scenarios (equality, inequality, greater-than, inclusion, and even referencing an external Python variable).
Welcome to the DataFrames: Data Extraction section. In this lesson, we introduce the James Bond movie dataset we'll be using throughout the section.
The set_index method sets a column as the new DataFrame index, replacing the current index. The complementary reset_index method brings the current index into the table as a regular column and generates the standard numeric index.
The iloc accessor extracts a DataFrame row by its numeric index position. The complementary loc accessor extracts a DataFrame row by its index label. For both accessors, we discuss the variety of filtering options (single value, list, slice, and slice shortcuts).
In this lesson, we pass a second value inside the square brackets for loc and iloc to target columns in addition to rows. We also discuss the variety of filtering options (single value, list, slice, and slice shortcuts).
In this lesson, we overwrite a single value in the DataFrame and discuss a warning you may encounter when working with filtered DataFrames in pandas 3.0. We discuss a solution to the problem: passing a Boolean Series directly to the loc accessor.
In this lesson, we use the rename method for renaming one or more index labels on the row or column axis. We also discuss overwriting the DataFrame's columns attribute directly.
In this lesson, we discuss the drop method, the pop method, and Python's del keyword for deleting DataFrame columns.
In this lesson, we introduce the sample method for extracting one or more random rows or columns from the DataFrame. We can specify a percentage of total rows to target.
In this lesson, we discuss the nsmallest and nlargest methods for extracting rows with the smallest or largest values from a given DataFrame column. This is a faster, simpler alternative to using the sort_values method.
The clip method bounds values in two ways: values that fall below a lower threshold are raised to it, and values that fall above an upper threshold are lowered to it. NaN (missing) values will remain NaN.
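A short sketch with invented numbers and thresholds of 10 and 100 shows the behavior, including the untouched missing value.

```python
import pandas as pd

# Hypothetical data: one value below 10, one above 100, one missing.
amounts = pd.Series([5, 50, 500, None])

clipped = amounts.clip(lower=10, upper=100)
print(clipped)   # 10.0, 50.0, 100.0, NaN
```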
In this lesson, we re-introduce the apply method for invoking a function once per every DataFrame row. We write a custom function for categorizing the Bond movies based on my own arbitrary film preferences. We then pass the function to the apply method; Pandas supplies each row as a Series into the function.
Welcome to the Working with Text Data section! In this lesson, we import and optimize our chicago.csv dataset. It contains data for public employees in the city of Chicago (name, title, department, salary).
In this lesson, we access the str attribute on a Series to access the StringMethods object. This object enables string-based operations across all Series's values. We practice using common string methods like lower, upper, title, and strip.
We can filter a subset of DataFrame rows as long as we have a Boolean Series. In this lesson, we introduce some string-based methods for generating those Series including contains, startswith, and endswith. We also talk about normalizing data before performing our inclusion checks.
In this lesson, we apply string-based operations to the row and column indexes of the DataFrame. The process remains the same -- access the str property to get access to the StringMethods object, then invoke the correct method on the nested object.
In this lesson, we review the split method on Python strings and then apply it to a column within the chicago DataFrame. We find the most common first word among job titles in the city of Chicago.
In this lesson, we introduce two additional parameters to the split method: the expand parameter expands the list into new DataFrame columns and the n parameter limits the maximum number of splits. We use these strategies to find the most common first name among the employees.
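Here is a sketch of the two parameters working together. The names are invented and the space delimiter is an assumption; the course dataset may format names differently.

```python
import pandas as pd

# Hypothetical data: one name has two spaces in it.
names = pd.Series(["Ada Lovelace", "Grace Brewster Hopper"])

# n=1 stops after the first split; expand=True spreads the pieces
# across DataFrame columns labeled 0 and 1.
parts = names.str.split(" ", n=1, expand=True)

first_names = parts[0].tolist()   # ["Ada", "Grace"]
remainders = parts[1].tolist()    # ["Lovelace", "Brewster Hopper"]

# value_counts then reveals the most common first name.
most_common = parts[0].value_counts()
```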
The explode method extracts every element within a list into a separate row in the resulting Series. In this lesson, we practice extracting every employee skill from a Series of lists.
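As an illustration, the sketch below uses a made-up Series of skill lists. Each list element lands on its own row, and the original index label repeats for every element.

```python
import pandas as pd

# Hypothetical data: each employee has a list of skills.
skills = pd.Series(
    [["python", "sql"], ["excel"]],
    index=["Ann", "Bob"],
)

exploded = skills.explode()

skill_values = exploded.tolist()        # ["python", "sql", "excel"]
skill_owners = exploded.index.tolist()  # ["Ann", "Ann", "Bob"]
```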
Welcome to the MultiIndex section. A MultiIndex is an index that consists of multiple levels or tiers. In this lesson, we create our first MultiIndex DataFrame with both the set_index method and the index_col parameter of the read_csv function. We discuss the benefits of a MultiIndex and best strategies for determining which layer to place first.
In this lesson, we utilize the get_level_values method on the MultiIndex object to pull out the values from a certain level within the larger MultiIndex. We also use the set_names method on the MultiIndex to rename one or more levels of the MultiIndex.
In this lesson, we review the iloc and loc accessors for extracting DataFrame rows and columns. We practice new syntax that incorporates the levels of a MultiIndex when targeting a specific row or a slice of rows.
The xs (cross-section) method allows us to extract values based on a match in a MultiIndex level. The benefit over loc is that we can target based on a nested level.
The swaplevel method swaps two levels of the MultiIndex. The order in which we pass the two levels doesn't matter.
In this lesson, we learn the transpose method for swapping the axes of the DataFrame. The method will move the row axis to the column axis and move the column axis to the row axis.
In this lesson, we use the stack method to move an index level from the column axis to the row axis. This action will automatically create a MultiIndex on the row axis.
In this lesson, we learn the unstack method to move an index level from the row axis to the column axis. This action will automatically create a MultiIndex on the column axis.
In this lesson, we tackle the pivot method for reshaping a DataFrame. The pivot method converts a long dataset to a wide one by distributing row values across multiple columns. It's ideal when you want to summarize a long collection of values.
In this lesson, we introduce the complementary melt method for reshaping a DataFrame. It acts as an inverse of the pivot method. It converts a wide dataset to a long one by consolidating multiple columns' values into a single column. The column headers are placed in one column, and the values are placed in another one.
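The long-to-wide and wide-to-long round trip can be sketched as follows. The sales data and the column names city, year, and sales are invented for illustration.

```python
import pandas as pd

# Hypothetical long dataset: one row per city per year.
long_sales = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 7, 9],
})

# pivot: distinct year values become column headers.
wide_sales = long_sales.pivot(index="city", columns="year", values="sales")

# melt: the year columns are folded back into a single column.
back_to_long = wide_sales.reset_index().melt(
    id_vars="city",
    var_name="year",
    value_name="sales",
)
print(wide_sales)
print(back_to_long)
```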
The pivot_table method offers similar functionality to Excel's Pivot Table feature. It organizes a table of data based on grouping distinct values on either the row axis or column axis (or both), then applies an aggregation function to each collection. Aggregation functions include sum, count, average, max, min, and more.
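A minimal sketch of the grouping-then-aggregating behavior, assuming a made-up sales table with region, product, and revenue columns.

```python
import pandas as pd

# Hypothetical data: several sales per region/product combination.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "A", "A"],
    "revenue": [100, 200, 150, 50, 300],
})

# Group by region on the row axis and product on the column axis,
# then sum the revenue within each combination.
table = sales.pivot_table(
    values="revenue",
    index="region",
    columns="product",
    aggfunc="sum",
)
print(table)
# West never sold product B, so that cell is NaN.
```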
In this lesson, we introduce the Fortune 1000 dataset that we'll be utilizing throughout the section. We employ the groupby method to create a DataFrameGroupBy object holding a collection of nested DataFrames, one for each distinct value in the Sector column.
In this lesson, we learn the get_group method to extract a nested DataFrame from a GroupBy object. The GroupBy object will store 20+ groups, one DataFrame for each distinct Sector value.
In this lesson, we apply aggregation operations like sum, average, max, and min to the GroupBy object. We target the specific columns we want to apply the operations to.
In this lesson, we describe the agg method, an alternative strategy for performing aggregation calculations (sum, mean, count, etc) on the nested DataFrames in a GroupBy object.
In this lesson, we pass a list of columns to the groupby method to create a DataFrameGroupBy object that accounts for each unique combination of sector and industry.
In this lesson, we introduce two ways to iterate over the groups within a GroupBy object. The first is with the traditional Python for loop, which yields the group name and DataFrame within a two-element tuple. The second approach is using the apply method to invoke a function upon every nested group and capture the function's return value in a new Series.
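Both iteration styles can be sketched on a tiny stand-in for the Fortune 1000 data. The Sector column name comes from the lessons above; the Revenue column and all the values are invented, and the apply call here runs on a single selected column rather than on whole nested DataFrames.

```python
import pandas as pd

# Hypothetical stand-in for the course dataset.
companies = pd.DataFrame({
    "Sector": ["Technology", "Retail", "Technology", "Retail", "Energy"],
    "Revenue": [200, 150, 100, 50, 75],
})

groups = companies.groupby("Sector")

# Approach 1: a for loop unpacks each (group name, DataFrame) pair.
row_counts = {}
for sector, frame in groups:
    row_counts[sector] = len(frame)

# Approach 2: apply invokes a function once per group and collects
# the return values in a new Series indexed by group name.
revenue_spread = groups["Revenue"].apply(
    lambda revenue: revenue.max() - revenue.min()
)
print(row_counts)
print(revenue_spread)
```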
In this section, we'll explore various ways to merge or join two DataFrames together including concatenation, inner joins, left joins, outer joins, and more. To kick things off, we introduce the 4 datasets we'll be using throughout the lessons. They model a library management system and include books, members, and checkouts.
The pd.concat function appends one DataFrame to the end of another. In this lesson, we combine our jan and feb checkouts DataFrames in a vertical/row-axis direction.
In this lesson, we utilize the pd.concat function for concatenating on the column axis. Pandas glues the columns from the second DataFrame to the right of the first DataFrame's columns.
A left join brings in rows from a right DataFrame whenever there is a match with a column value in the left DataFrame. In this lesson, we introduce the merge method, the primary approach for joining DataFrames in pandas, and apply a left join to our checkouts and books tables.
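A sketch of a left join on two miniature tables. The column names member_id, book_id, and title are assumptions; the real checkouts and books datasets may use different ones.

```python
import pandas as pd

# Hypothetical stand-ins for the checkouts and books tables.
checkouts = pd.DataFrame({
    "member_id": [1, 2, 3],
    "book_id": [10, 20, 99],
})
books = pd.DataFrame({
    "book_id": [10, 20, 30],
    "title": ["Dune", "Emma", "Ulysses"],
})

# Every checkouts row is kept. Book details are pulled in on a match;
# book_id 99 has no match, so its title is missing.
merged = checkouts.merge(books, how="left", on="book_id")
print(merged)
```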
The left_on and right_on parameters perform a join where the column names do not match between the two DataFrames. Each one accepts the column name from the respective DataFrame. We can only use the on parameter if the two column names match between the DataFrames.
An inner join identifies the shared values between two DataFrames. Values that exist in one DataFrame but not the other are excluded. In this lesson, we use an inner join to identify the members who took out a book in both January and February.
A join is not limited to a single column. In this lesson, we pass a list of columns to the on parameter to perform a join based on values across multiple columns. We use this technique to identify the members who came in both January and February and who took out the same book each month.
A full join keeps all rows from both tables. Wherever there is a match, pandas will combine the row values together. Whenever a value only exists in one table, it will still be kept -- with complementary NaN values for the other table's columns.
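A full (outer) join might be sketched like this, with two invented monthly tables that share only one member. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: member 2 appears in both months.
jan = pd.DataFrame({"member_id": [1, 2], "jan_book": ["Dune", "Emma"]})
feb = pd.DataFrame({"member_id": [2, 3], "feb_book": ["Emma", "Ulysses"]})

# how="outer" keeps rows from both tables, matched or not.
full = jan.merge(feb, how="outer", on="member_id")
print(full)

# Member 1 has no February row and member 3 has no January row,
# so each of the two book columns has one missing value.
missing_jan = full["jan_book"].isna().sum()
missing_feb = full["feb_book"].isna().sum()
```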
The left_index and right_index parameters of the merge method perform a join based on matching values in the indices of the respective tables. In this lesson, we practice joining the books and members tables together.
The join method offers a convenient shortcut when merging two DataFrames together using shared index labels.
** Newly recorded in 2026 for the release of Pandas 3 **
Student Testimonials:
The instructor knows the material, and has detailed explanation on every topic he discusses. Has clarity too, and warns students of potential pitfalls. He has a very logical explanation, and it is easy to follow him. I highly recommend this class, and would look into taking a new class from him. - Diana
This is excellent, and I cannot complement the instructor enough. Extremely clear, relevant, and high quality - with helpful practical tips and advice. Would recommend this to anyone wanting to learn pandas. Lessons are well constructed. I'm actually surprised at how well done this is. I don't give many 5 stars, but this has earned it so far. - Michael
This course is very thorough, clear, and well thought out. This is the best Udemy course I have taken thus far. (This is my third course.) The instruction is excellent! - James
Welcome to the most comprehensive Pandas course available on Udemy! An excellent choice for both beginners and experts looking to expand their knowledge on one of the most popular Python libraries in the world! This course has been re-recorded from scratch in 2026 for the release of Pandas 3.
Data Analysis with Pandas and Python offers 19+ hours of in-depth video tutorials on the most powerful data analysis toolkit available today. Lessons include:
installing
sorting
filtering
grouping
aggregating
de-duplicating
pivoting
munging
deleting
merging
visualizing
and more!
Why learn pandas?
If you've spent time in spreadsheet software like Microsoft Excel, Apple Numbers, or Google Sheets and are eager to take your data analysis skills to the next level, this course is for you!
Data Analysis with Pandas and Python introduces you to the popular Pandas library built on top of the Python programming language.
Pandas is a powerhouse tool that allows you to do anything and everything with colossal data sets -- analyzing, organizing, sorting, filtering, pivoting, aggregating, munging, cleaning, calculating, and more!
I call it "Excel on steroids"!
Over the course of more than 19 hours, I'll take you step-by-step through Pandas, from installation to visualization! We'll cover hundreds of different methods, attributes, features, and functionalities packed away inside this awesome library. We'll dive into tons of different datasets, short and long, broken and pristine, to demonstrate the incredible versatility and efficiency of this package.
Data Analysis with Pandas and Python is bundled with dozens of datasets for you to use. Dive right in and follow along with my lessons to see how easy it is to get started with pandas!
Whether you're a new data analyst or have spent years (*cough* too long *cough*) in Excel, Data Analysis with Pandas and Python offers you an incredible introduction to one of the most powerful data toolkits available today!