The Ultimate Pandas Bootcamp: Advanced Python Data Analysis

Rating: 4.5 (2717 reviews)

Master the powerful pandas library to analyze, manipulate and visualize data. More than 10 datasets & bonuses included!

Bestseller

Created by Andy Bek

Last updated 1/2026

English

What you'll learn

Learn everything there is to know about pandas - from absolute scratch!
Gain a deep and hands-on understanding of pandas data structures.
Transform, clean, filter, groupby, pivot, and otherwise manipulate any dataset.
Understand related computer science topics like random-number generators, binary operators, memory pointers, and more!
Practice reading data from the web, pickles, Excel files right within pandas.
Discover and learn hundreds of methods, attributes, and techniques to manipulate data in pandas and python.

Course content

15 sections • 333 lectures • 32h 8m total length

Course Structure1:25
Explore the core pandas data structures, starting with series and dataframes, to build a solid foundation and prepare for advanced data analysis across the bootcamp.
Pandas Is Not Single1:58
Master pandas for advanced data analysis in Python by learning its fast, flexible data manipulation capabilities, its dependency on numpy and matplotlib, and starting in a Python environment.
Anaconda3:20
Learn to install and use the Anaconda data science distribution, manage environments with the Navigator, and work with pandas and Jupyter Notebook, plus compare Google Colab as a cloud alternative.
Jupyter Notebooks11:36
Launch a Jupyter notebook from the Anaconda Navigator and learn to run Python code in an interactive environment with code and markdown cells, using shortcuts like Shift+Enter.
Cloud vs Local5:40
Compare cloud versus local data science setups using Anaconda or Miniconda, Jupyter notebooks, and Google Colab, highlighting free GPU access and cross-platform code applicability.
Hello, Python4:16
Master pandas by understanding that Python runs via an interpreter, with CPython as the standard implementation, and use Appendix A to cover fundamentals and data types.
NumPy11:42
Explore NumPy, the numerical Python library behind pandas, by learning about ndarrays, fast ufuncs, and contiguous memory storage that enables performance.
All Course Resources0:13

Section Intro1:06
Explore the fundamentals of Pandas series, including their attributes and methods, and learn data selection and indexing techniques that underpin the rest of the course.
What Is A Series?3:57
Learn how Pandas Series are one-dimensional labeled arrays that store values of any data type, built from Python lists with the Series constructor, and support mixed types.
Parameters vs Arguments2:48
Discover the difference between parameters and arguments and see how Python passes data to functions and Pandas constructors, using the data parameter and actual arguments with concrete examples.
What’s In The Data?5:48
Explore how pandas series derive labels from Python lists or dictionaries, compare list and dictionary inputs, and understand automatic integer indexing when labels are absent.
The .dtype Attribute2:15
Pandas infers a series dtype from data and lets you specify a dtype manually, such as floats for numbers, while strings yield object dtype; use type to inspect data type.
BONUS: What Is dtype('o'), Really?3:23
Discover how NumPy's fixed-size arrays influence Pandas, showing why strings become dtype('o') and how Pandas stores string data as pointers rather than actual values.
Index And RangeIndex7:10
Explore how pandas automatically aligns data by the index, and learn to create custom labels for series using the index parameter, including range index, start-stop semantics, and immutability for performance.
Series And Index Names5:20
Name a Pandas series to assign a readable label, explore the name attribute, and see how a series and its index names become column names and labels in data frames.
Skill Challenge2:03
Create a four-item list of actor names and a corresponding ages list, then build a pandas series with ages labeled by actor names.
Solution4:13
Create a labeled pandas series of actor ages from Python lists, using the long form with keyword arguments, labeling ages by actor names, then explore dictionary and zip methods.
Another Solution3:02
Explore dictionary comprehension to pair actor names with ages from zip outputs, creating a dict suitable for pandas series, a concise, pythonic pattern beyond loops.
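The three construction routes named in the challenge solutions can be sketched as follows. This is only an illustration: the actor names and ages are invented placeholders, not the values used in the course.

```python
import pandas as pd

# Placeholder names and ages, invented for illustration only.
actor_names = ["Actor A", "Actor B", "Actor C", "Actor D"]
actor_ages = [41, 35, 58, 27]

# Long form: keyword arguments to the Series constructor.
by_keywords = pd.Series(data=actor_ages, index=actor_names)

# zip pairs each name with an age; dict() turns the pairs into label -> value.
by_dict = pd.Series(dict(zip(actor_names, actor_ages)))

# The same mapping built with a dictionary comprehension.
by_comprehension = pd.Series(
    {name: age for name, age in zip(actor_names, actor_ages)}
)
```

All three series carry the same labels and values, so `equals` should report them as identical.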
The head() And tail() Methods5:43
Explore data quickly with the head and tail methods on pandas series, learning how to preview the first or last records, and control output size with the n parameter.
Extracting By Index Position7:34
Access and extract values from a pandas series by index position using square bracket indexing, zero-based indexes, and slices. Explore using negative indices and custom labels to retrieve items efficiently.
Accessing Elements By Label6:59
Learn to access items in a pandas series by labeled index using square brackets, contrasting label-based with position-based indexing. See how a labeled alphabet demonstrates inclusive slicing by label.
BONUS: The add_prefix() And add_suffix() Methods3:30
Learn how to use pandas add_prefix and add_suffix to modify index labels for a series or column labels for a dataframe, with copies and reassignment to apply the changes.
Using Dot Notation3:45
Learn how dot notation cleanly accesses label values in a pandas series, but beware its limitations with slices and invalid identifiers.
Boolean Masks And The .loc Indexer8:16
Learn how boolean masks enable label-based extraction with the loc indexer and square brackets, ensuring the mask length matches the series to selectively return items.
Extracting By Position With .iloc3:58
Explore iloc, the integer locator for position-based indexing with zero-based indexing, including positional slices and index lists. Learn how square brackets reflect either positions or labels like iloc or loc.
BONUS: Using Callables With .loc And .iloc9:53
Explore indexing pandas series with callables using .loc and .iloc to emit labels, positions, slices, or boolean masks. Build functions that take a series and return the appropriate indexing output.
Selecting With .get()5:19
Learn how the get method retrieves values from a series by label, supports a custom default when a label is missing, and can index by position like square bracket indexing.
Selection Recap5:33
Master pandas data selection using label with square brackets and loc, or by position with iloc, slices, boolean masks, and callables, with loc favored for labels and iloc for positions.
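A compact sketch of the selection styles in this recap, using a small made-up series (the labels and values are assumptions for illustration):

```python
import pandas as pd

# A tiny labeled series; values are arbitrary.
letters = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

# Label-based slice with .loc: both endpoints are included.
by_label = letters.loc["b":"d"]

# Position-based slice with .iloc: the stop position is excluded.
by_position = letters.iloc[1:3]

# Boolean mask: same length as the series, True marks items to keep.
mask = letters > 25
by_mask = letters.loc[mask]

# Callable: receives the series and returns something .loc can index with.
by_callable = letters.loc[lambda s: s % 20 == 0]
```

Note how the label slice returns three items while the positional slice over the "same" range returns two.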
Skill Challenge1:50
Create a 100-item series of squares, then extract the last three items by indexing and by tail, and compare results with the equals method to formalize equivalence.
Solution5:22
Create a pandas series of squares from 0 to 99 using a list comprehension. Compare last elements with head, tail, iloc, and equals to illustrate element-wise versus boolean results.
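One possible way to work through this challenge is sketched below; the variable names are assumptions, not the course's own.

```python
import pandas as pd

# Squares of 0..99 built with a list comprehension.
squares = pd.Series([i ** 2 for i in range(100)])

# Two routes to the last three items.
last_by_iloc = squares.iloc[-3:]
last_by_tail = squares.tail(3)

# == compares element by element and returns a boolean series.
elementwise = last_by_iloc == last_by_tail

# equals() collapses the comparison into a single True/False.
same = last_by_iloc.equals(last_by_tail)
```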
Section Recap Notebook0:04

Section Intro1:49
Develop skills in pandas series methods and manipulation: handle NaNs and missing values, read an external dataset, assess data structure, and apply descriptive statistics, sorting, and filtering transformations.
Reading In Data With read_csv()8:58
Learn to import csv data with pandas read_csv, explore data frames and series, and tailor reads with usecols, index_col, and squeeze using a FiveThirtyEight alcohol dataset.
Series Sizing With .size, .shape, And len()4:35
Explore how to size a pandas series using size, shape, and len, confirming equal lengths for values and index, and noting the one-dimensional nature of series.
Unique Values And Series Monotonicity5:20
Explore pandas series attributes to check uniqueness with is_unique and nunique, handle NaNs with dropna, and assess monotonicity using is_monotonic_increasing and is_monotonic_decreasing, illustrated with real examples.
The count() Method2:16
Learn how pandas' count method counts non-null values in a series, revealing gaps where NA values exist, and contrasts with size, which counts all elements.
Accessing And Counting NAs9:26
Identify and count null values in a pandas series using isnull (alias isna) and boolean masking, then verify totals with size, count, and sum.
BONUS: Another Approach5:16
Explore a numpy ufunc approach to isolating nulls in pandas, leveraging vectorization for performance and seamless numpy-pandas interoperability.
The Other Side: notnull() And notna()3:01
Use notnull or notna to create a boolean mask of non-null records, then sum to count non-missing values; notnull and notna are aliases.
BONUS: Booleans Are Literally Numbers In Python3:19
Discover how booleans map to one and zero in Python, with bool as a subclass of int, and how arithmetic and method resolution order reveal their behavior.
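Because booleans behave as 1 and 0, summing a boolean mask counts its True entries. A minimal sketch with invented servings values:

```python
import numpy as np
import pandas as pd

# Invented values with two gaps.
servings = pd.Series([11.0, np.nan, 54.0, np.nan, 312.0])

# isna() marks missing entries; sum() counts the True values.
n_missing = servings.isna().sum()

# notna() is the complement; its sum matches count().
n_present = servings.notna().sum()

# In plain Python, bool is a subclass of int.
bool_is_int = issubclass(bool, int)
two = True + True
```

The missing and present counts should add up to `servings.size`.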
Skill Challenge1:20
Isolate non-nulls in the alcohol series as wine_servings and compute wine consumed by countries. For the challenge, apply a boolean mask for countries with less than 100 servings and sum.
Solution2:52
Isolate non-nulls in the alcohol series, create a boolean mask for wine servings less than 100, and sum to compute total wine servings across countries in 2010.
Dropping And Filling NAs4:34
Learn how dropna and fillna handle missing values in a pandas series. Drops create copies by default, while inplace toggles can modify the original series or require reassignment.
Descriptive Statistics8:23
Apply descriptive statistics in pandas to summarize data with sum, mean, median, quantiles, IQR, min, max, std, and var. Understand how the mean exceeds the median in a right-skewed wine data distribution.
The describe() Method2:31
Apply the describe() method to quickly get descriptive statistics as a pandas series, giving a quick numerical sense of your data, with optional percentiles and include/exclude filters by type.
mode() And value_counts()7:21
Explore mode and value_counts in pandas: compute the most frequent item, compare single-value frequency with all unique values, and learn raw and normalized counts in a series.
idxmax() And idxmin()5:49
Sorting With sort_values()5:16
Learn how to sort a pandas series by values using sort_values, control ascending order, place NaNs with na_position, and compare copy versus in-place sorting, including default quicksort and alternatives.
nlargest() And nsmallest()2:49
Explore the nlargest and nsmallest methods to quickly extract top or bottom n values from a sorted series, avoiding explicit sorting and slicing.
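The relationship between sorting and nlargest can be sketched like this. The labels and values are placeholders, not the course's wine data.

```python
import numpy as np
import pandas as pd

# Placeholder labels and values, including one missing entry.
wine = pd.Series({"A": 5.0, "B": np.nan, "C": 370.0, "D": 45.0})

# Descending sort, with NaNs placed at the top instead of the bottom.
descending = wine.sort_values(ascending=False, na_position="first")

# nlargest skips the explicit sort-then-slice step.
top_two = wine.nlargest(2)

# The longer route that nlargest replaces.
top_two_sorted = wine.sort_values(ascending=False).head(2)
```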
Sorting With sort_index()3:41
sort_index sorts a series by its index labels, ascending by default, with optional inplace updates; compare to sort_values, and note na_position handling and quicksort as the underlying algorithm.
Skill Challenge1:02
Practice a pandas skill challenge: filter countries with wine servings over 50 into a '50 plus' variable, then select the smallest 21 from '50 plus' and compute mean, median, and standard deviation.
Solution2:09
Create a new pandas series '50 plus' by boolean masking values over 50, then select the 20 smallest, and compute mean, standard deviation, and median from that subset.
Series Arithmetics And fill_value()8:26
Perform series arithmetic in pandas: add, subtract, multiply, and divide with scalar and series operands; leverage automatic index alignment and the fill_value option to preserve data when labels differ.
BONUS: Calculating Variance And Standard Deviation4:33
Compute variance and standard deviation in pandas by applying var method on a series, averaging squared differences from mean, and taking square root after adjusting for n minus one.
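The n minus one calculation described here can be checked by hand against pandas. The numbers below are arbitrary sample values.

```python
import math
import pandas as pd

# Arbitrary sample values.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Squared differences from the mean.
squared_diffs = (values - values.mean()) ** 2

# Sample variance: divide by n - 1 rather than n.
manual_var = squared_diffs.sum() / (len(values) - 1)

# Standard deviation is the square root of the variance.
manual_std = math.sqrt(manual_var)
```

Both results should agree with `values.var()` and `values.std()`, which use n minus one by default.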
Cumulative Operations5:02
Explore cumulative operations in pandas, including sum, cumsum, prod, and cumulative min and max, with NA handling and practical examples on a wine servings series.
Pairwise Differences With diff()3:42
Learn how the pandas diff() method computes discrete, element-wise differences between pairs in a series, with the periods parameter adjusting the lag. This is foundational for time series analysis.
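A short sketch of diff() and its periods parameter on an arbitrary series:

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16])

# Default lag of 1: each item minus the one before it; the first is NaN.
step_one = s.diff()

# periods=2 compares each item with the one two positions earlier.
step_two = s.diff(periods=2)
```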
Series Iteration4:11
Discover how to iterate a pandas series using for loops, iterate over index labels, and the items method (or iteritems) for lazy, zip-based tuples.
Filtering: filter(), where(), And mask()11:41
Filter a pandas series with filter() using regex to show countries starting with v, such as Vietnam, Venezuela, and Vanuatu. Compare index-based filtering with values-based methods, including where and mask.
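The contrast between index-based filter() and value-based where() and mask() might look like this. The country names come from the description above; the servings values are invented.

```python
import pandas as pd

# Servings values are invented for illustration.
wine = pd.Series(
    {"Vietnam": 1.0, "Venezuela": 3.0, "France": 370.0, "Vanuatu": 11.0}
)

# filter() looks at index labels: keep labels that start with "V".
v_countries = wine.filter(regex="^V")

# where() looks at values: keep those meeting the condition, NaN elsewhere.
kept = wine.where(wine > 5)

# mask() is the inverse: NaN where the condition is True.
hidden = wine.mask(wine > 5)
```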
Transforming With update(), apply() And map()13:26
Master transforming a pandas series with update, apply, and map for spot and global transformations. Apply in-place changes, lambda functions, and parameterized inputs for flexible results.
Skill Challenge2:24
Practice creating a pandas series from beer servings, compute mean, median, and std, assess skewness, compare the first ten countries, and explore z-scores and standardized scores.
Solution I - Reading Data2:07
Create a data URL for a CSV, read it with pandas, select beer_servings and country, set country as the index, and convert the result to a series using squeeze.
Solution II - Mean, Median, And Standard Deviation3:19
Compute mean, median, and standard deviation of beer servings in pandas. Explore quantile methods and numpy to verify results, and assess right skew using describe and a quick histogram.
Solution III - Z-scores8:04
Compute z-scores for a Pandas series by subtracting the mean and dividing by the standard deviation, then interpret deviations and identify the largest absolute z-score (Namibia).
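The z-score calculation can be sketched as below. The labels and values are placeholders and do not reproduce the course's beer servings data.

```python
import math
import pandas as pd

# Placeholder labels and values with one obvious outlier.
beer = pd.Series({"A": 10.0, "B": 20.0, "C": 30.0, "D": 200.0})

# Subtract the mean, then divide by the standard deviation.
z_scores = (beer - beer.mean()) / beer.std()

# Label of the observation farthest from the mean in either direction.
most_extreme = z_scores.abs().idxmax()
```

Standardized scores have a mean of zero and a standard deviation of one, which is a quick sanity check on the result.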
Section Recap Notebook0:04

Section Intro1:40
Explore the Pandas DataFrame, its relation to series, and essential data cleaning for numerical analysis. Practice dataframe manipulation, learn regular expressions, and work with a 9000-item nutritional dataset.
What Is A DataFrame10:31
Explore how a pandas data frame extends series to two dimensions, with labeled indices and columns, as a collated, heterogeneous collection of series.
Creating A DataFrame4:40
Create a data frame by passing a dictionary of column labels to the pandas constructor, using equal-length lists for names, ages, and married to build the frame column by column.
BONUS - Four More Ways To Build DataFrames16:08
Explore four ways to build data frames in pandas—dictionary of tuples, dictionary of series, dictionary of dictionaries, and row-wise construction with a list of dicts—highlighting the library's flexible constructor.
The info() Method4:29
Explore the pandas info method to review a data frame’s index, columns, data types, non-null values, and memory usage, with verbose, max_cols, and deep options.
Reading In Nutrition Data3:55
Read a 9,000-item nutrition data set with pandas read_csv, inspect its 77 columns and 9,000 records, and note memory usage around 40 MB and embedded units.
Some Cleanup: Removing The Duplicated Index5:38
Identify a duplicated index in the nutrition dataframe and remove it by dropping the column, setting it as the index, or using read_csv with index_col; prefer the latter.
The sample() Method4:14
Explore the data frame sample() method to draw random records, learn how a fixed random state yields deterministic results, and use n or frac to control the sample size.
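A minimal sketch of deterministic sampling, assuming a small stand-in data frame (the column names and values are placeholders):

```python
import pandas as pd

# Stand-in data frame with ten rows.
foods = pd.DataFrame(
    {"food": [f"food_{i}" for i in range(10)], "calories": range(100, 200, 10)}
)

# The same random_state yields the same rows every time.
first_draw = foods.sample(n=3, random_state=42)
second_draw = foods.sample(n=3, random_state=42)

# frac draws a share of the rows instead of a fixed count.
fifth = foods.sample(frac=0.2, random_state=0)
```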
BONUS - Sampling With Replacement Or Weights7:37
Explore sampling with and without replacement in pandas, using the replace parameter and bootstrapping concepts, and bias samples with weights via a pandas series and index labels.
BONUS - How Are Random Numbers Generated?5:40
Discover how true randomness via natural entropy differs from computer-generated pseudo randomness, and how pandas sample and numpy use the Mersenne Twister PRNG with seeds to produce repeatable results.
DataFrame Axes4:29
Discover how pandas dataframes have two axes, rows and columns, and use the axes attribute to access row and column labels and coordinates.
Changing The Index7:41
Explore changing a pandas dataframe index from an int64 range to a meaningful name using set_index, with the drop and verify_integrity options, and discover multi-index possibilities.
Extracting From DataFrames By Label7:22
DataFrame Extraction by Position8:37
Explore dataframe extraction by position using iloc and loc on a food nutrition dataset, including selecting specific rows and columns, boolean masks, and handling non-consecutive indices for efficient data access.
Single Value Access With .at And .iat5:24
Learn how to extract a single value from a pandas DataFrame using at and iat, compare them with loc and iloc, and understand their speed advantages for single-value access.
BONUS - The get_loc() Method6:28
Learn to convert between labels and integer positions with the get_loc method and use loc, iloc, and at to fetch a single value.
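Converting labels to positions and fetching a single value can be sketched as follows, on a made-up three-row frame:

```python
import pandas as pd

# Made-up nutrition values.
nutrition = pd.DataFrame(
    {"calories": [52, 89, 41], "total_fat": [0.2, 0.3, 0.2]},
    index=["apple", "banana", "carrot"],
)

# get_loc translates a label into its integer position.
row_pos = nutrition.index.get_loc("banana")
col_pos = nutrition.columns.get_loc("calories")

# Three ways to reach the same cell.
via_at = nutrition.at["banana", "calories"]      # label-based, single value
via_iat = nutrition.iat[row_pos, col_pos]        # position-based, single value
via_loc = nutrition.loc["banana", "calories"]    # general label-based indexer
```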
Skill Challenge1:17
Practice extracting data from a nutrition data frame in pandas: randomly select ten foods, pull total fat and cholesterol, and retrieve calories for the third food using attribute-based accessors.
Solution7:49
Learn Pandas techniques by sampling ten rows from a nutrition data frame and using label-based and location-based indexing with loc, iloc, and iat to extract total fat, cholesterol, and calories.
More Cleanup: Going Numeric3:25
Convert all unit-containing columns from strings to numeric values to enable accurate analysis in pandas, noting that 73 columns require casting before further steps.
The astype() Method5:56
Use the pandas astype method to cast dataframes and series to new types, reassign changes, and selectively convert columns, addressing non-numeric values in a nutrition dataframe.
DataFrame replace() + A Glimpse At Regex10:18
Apply DataFrame.replace and regular expressions to strip units from the nutrition data, converting string values to numeric types and inspecting dtype changes from object to 64-bit int.
Part I: Collecting The Units12:59
Isolate the units from each column by removing all numeric values with a regex. Then use mode to derive the most common unit per column for consistent labeling.
The rename() Method7:51
Learn to rename index and column labels in a pandas data frame using dictionaries or a mapper, control with axis and inplace, and rename both axes when needed.
DataFrame dropna()10:53
Learn how to use dataframe dropna with axis, how (any or all), and thresh to drop rows or columns containing NaN, and apply inplace changes.
BONUS - dropna() With Subset7:55
Learn to use dropna with the subset parameter to limit drops to selected rows or columns, illustrated with gender and age examples.
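The main dropna options can be compared side by side. The gender and age columns echo the example mentioned above, but the values are invented.

```python
import numpy as np
import pandas as pd

# Invented values: row 0 is complete, row 1 is empty, row 2 lacks age.
people = pd.DataFrame(
    {"gender": ["F", None, "M"], "age": [30.0, np.nan, np.nan]}
)

# Default how="any": drop a row if any value is missing.
complete_rows = people.dropna()

# how="all": drop a row only if every value is missing.
not_empty = people.dropna(how="all")

# thresh: keep rows with at least this many non-missing values.
at_least_two = people.dropna(thresh=2)

# subset: only consider the listed columns when deciding what to drop.
has_gender = people.dropna(subset=["gender"])
```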
Part II: Merging Units With Column Names11:32
Merge units into column labels by building a mapper from the units data frame and renaming columns, preserving unit information while moving toward a numeric data frame.
Part III: Removing Units From Values6:35
Remove units from values using the dataframe replace method with a regex pattern. Cast all values to float for a pure numeric dataset ready for numerical analysis.
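The three-part cleanup (collect units, merge them into column names, strip them from values) can be sketched on a toy frame. The column names, values, and regex patterns below are assumptions for illustration and may differ from those used in the course.

```python
import pandas as pd

# Toy stand-in for the nutrition data: numbers with embedded units.
raw = pd.DataFrame(
    {"protein": ["9.17 g", "3.47 g"], "sodium": ["0.00 mg", "4.00 mg"]}
)

# Part I: remove digits, dots, and whitespace so only the unit remains,
# then take the most common unit per column.
units_only = raw.replace(r"[\d.\s]", "", regex=True)
common_units = units_only.mode().iloc[0]

# Part II: build a mapper that appends each unit to its column name.
mapper = {col: f"{col}_{unit}" for col, unit in common_units.items()}

# Part III: remove everything except digits and dots, cast to float, rename.
numeric = (
    raw.replace(r"[^\d.]", "", regex=True)
    .astype(float)
    .rename(columns=mapper)
)
```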
Filtering in 2D9:15
Enable two-dimensional filtering in pandas by applying the filter method to both index and columns, using like, regex, and items to slice data efficiently.
DataFrame Sorting7:59
Sort dataframes with sort_values by calories and by multiple columns using an ascending flag list. Filter grams to assess brain composition, noting fat, protein, and water content.
Using Series between() With DataFrames6:09
Choose a one-dimensional series, apply the between method to calories to create a boolean mask, and use boolean indexing to extract four randomly sampled rows from the data frame.
BONUS - Min, Max and Idx[MinMax], And Good Foods9:15
Learn to compute min, max, and idxmin/idxmax across columns or rows in Pandas dataframes, apply to nutrition data like potassium and sodium, and filter with between for dietary insights.
DataFrame nlargest() And nsmallest()5:49
Learn how to use the dataframe methods nlargest and nsmallest to extract top and bottom records, choosing columns and applying to series or dataframes for efficient data selection.
Skill Challenge1:26
Identify the ten foods with the highest vitamin B12 using Pandas, isolate eggplant-related items to find the one with the most sodium, and sample four random rows and two random columns.
Solution5:53
Explore pandas techniques to identify the top ten vitamin B12 foods from a data frame, compare n largest and sorting, filter for eggplant, and sample four rows by two columns.
Another Skill Challenge2:00
Apply pandas to remove all items with any NA values, then identify foods with 20–40 mg vitamin C and count those between 2 and 3 standard deviations above the mean.
Solution6:48
Apply pandas to clean and analyze nutrition data by dropping rows in place, filtering vitamin C with between, and using mean and standard deviation to identify outliers.
Section Recap Notebook0:04

Section Intro2:23
Dive into advanced pandas DataFrame concepts, including boolean indexing and bitwise operators, sorting and lookup, pruning duplicates, reshaping, and powerful transformation techniques using pandas, numpy, and Python.
Introducing A New Dataset3:56
Explore a new dataset of English Premier League players, load it with pandas and numpy, and inspect its structure, data types, and memory usage.
Quick Review: Indexing With Boolean Masks3:42
Use pandas boolean masking to generate a boolean sequence from a column comparison and index the dataframe to filter players with a market value over 40 million.
More Approaches To Boolean Masking10:27
Generate boolean series on the fly with the isin method, the between method, and comparator wrappers, then filter defenders by market-value ranges and age using boolean masks.
Binary Operators With Booleans10:29
Explore how booleans combine with binary operators in pandas series, using bitwise OR (|) and AND (&), and learn that alignment is by label, not order, when indexing data.
BONUS - XOR and Complement Binary Ops12:57
Explore XOR and the tilde-based complement operator in Python, learn how XOR differs from or, and see how to use boolean negation in Pandas and NumPy for dataframe indexing.
Combining Conditions7:52
Master pandas indexing with multiple boolean conditions using and, or, and not, applying filters like left backs, age 25 or younger, and market value at least 10 million.
Conditions As Variables4:44
Refactor complex conditions into standalone boolean variables to simplify data frame indexing. Use parentheses, assignment and comparison operators to filter Arsenal players who are right backs or Chelsea goalkeepers.
Skill Challenge1:07
Identify English players whose market value exceeds twice the league average and who have either more than 4000 views or are a new signing, but not both, with refactored conditions as variables.
Solution6:58
Learn to build boolean conditions in pandas: filter English players with market value over twice the league mean, using xor for page views or new signings, and apply boolean indexing.
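A sketch of this challenge on invented data follows. The column names and player rows are assumptions; the real dataset's labels may differ.

```python
import pandas as pd

# Invented players; column names are hypothetical.
players = pd.DataFrame(
    {
        "name": ["P1", "P2", "P3", "P4", "P5", "P6"],
        "nationality": ["England", "England", "Spain", "England", "France", "England"],
        "market_value": [100.0, 90.0, 5.0, 5.0, 5.0, 5.0],
        "page_views": [5000, 5000, 6000, 4500, 100, 200],
        "new_signing": [True, False, True, False, False, True],
    }
)

# Each condition is stored in its own variable for readability.
is_english = players["nationality"] == "England"
high_value = players["market_value"] > 2 * players["market_value"].mean()
popular = players["page_views"] > 4000
is_new = players["new_signing"]

# ^ is XOR: True when exactly one side is True.
popular_or_new_not_both = popular ^ is_new

selected = players[is_english & high_value & popular_or_new_not_both]
```

In this toy data, P1 satisfies both halves of the XOR and is excluded, leaving only P2.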
2d Indexing10:00
Master two-dimensional indexing in pandas by filtering Chelsea players aged 23 and under with boolean conditions, then select relevant columns, including those starting with P, using the label-based indexer.
Fancy Indexing With lookup()8:30
Master fancy indexing in pandas with the lookup method to retrieve values by row and column labels. See lookup returns values for label coordinates and compare it to basic indexing.
Sorting By Index Or Column6:59
Sort a data frame by values and by index with sort_values and sort_index, using ascending or descending and inplace options, and reset the index to demote it to a column.
Sorting vs. Reordering12:29
Learn how to precisely reorder dataframe rows and columns using reindex, beyond basic sort_values and sort_index, including alphabetical column ordering and slicing possibilities.
BONUS - Another Way2:13
Explore using any array-like object for Pandas reindex and columns, not just Python lists. See how sorting a Pandas index with sort_values offers a valid alternative, yielding the same result.
BONUS - Please Avoid Sorting Like This3:37
Avoid the antipattern of sorting columns by transposing the data frame; learn a direct approach in pandas using sort_index with axis=1 to achieve alphabetically ordered columns efficiently.
Skill Challenge1:17
Tackle a skill challenge: sort a dataframe by age to find youngest EPL player, reindex by club and sort by index, then sort values by club and market value descending.
Solution4:03
Sort by age to reveal the youngest player using sort_values, or the min method with an indexer. Set club as the index, then perform a two-key sort with different directions.
Identifying Dupes10:40
Use pandas' duplicated method to identify duplicates, customize what counts as a duplicate with subset, and control which occurrence is treated as the original with keep (first, last, or False) for accurate aggregates.
Removing Duplicates6:13
Identify and remove duplicate records in the EPL players dataset using the duplicated and drop_duplicates methods, then recalculate the mean market value to reveal the true league-average market value.
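How duplicates distort an aggregate can be shown on a few invented rows (names, clubs, and values are placeholders):

```python
import pandas as pd

# Invented rows; the third repeats the first exactly.
players = pd.DataFrame(
    {
        "name": ["A", "B", "A", "C"],
        "club": ["X", "Y", "X", "Z"],
        "market_value": [10.0, 20.0, 10.0, 30.0],
    }
)

# Default keep="first": later repeats are flagged, the first is not.
later_repeats = players.duplicated()

# keep=False flags every member of a duplicate group; subset narrows the test.
all_copies = players.duplicated(subset=["name"], keep=False)

# Dropping duplicates changes the mean.
mean_with_dupes = players["market_value"].mean()
mean_deduped = players.drop_duplicates()["market_value"].mean()
```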
Removing DataFrame Rows2:58
Use the drop method to remove rows by index labels or axis, including multiple labels, and return a copy without changing the original.
BONUS - Removing Columns3:02
Learn how to remove columns in pandas using drop with axis 1, specify column labels, or pass them directly to the columns parameter, returning a copy.
BONUS - Another Way: pop()4:13
Learn how to remove a column with the Pandas pop method, which returns the removed column as a series and modifies the dataframe in place.
BONUS - A Sophisticated Alternative5:12
Explore the reindex method to exclude rows and columns by computing set differences and creating a new data frame, noting that drop is often more durable for data cleaning.
Null Values In DataFrames7:13
Identify and count NaN values in data frames with isna, then locate missing-data records. Index with boolean arrays, convert to values, and use drop_duplicates to manage duplicates.
Dropping And Filling DataFrame NAs7:49
Learn to handle NAs in data frames with fillna and dropna, using column-specific defaults or a dictionary, and axis-based removal for rows or columns.
BONUS - Methods And Axes With fillna()10:05
Apply fillna with the method parameter to fill missing values using forward fill or backward fill, and understand axis 0 for index-wide and axis 1 for column-wide filling.
Skill Challenge1:37
Practice pandas data wrangling: create a copy DF2 by removing rows and a column, check for NA values and unique nationalities, then extract unique age–position pairs by club, excluding club.
Solution7:00
Apply pandas techniques to clean a dataframe: drop rows and a column, check NA values, count unique values, and extract age and position combinations with a subset, returning age and position.
Calculating Aggregates With agg()9:23
Apply the pandas agg method to compute aggregates like mean or min across numeric columns, reshape data into a series or dataframe, and filter with select_dtypes.
Same-shape Transforms14:43
Learn how pandas transform applies a function to a dataframe without changing its shape, illustrated with currency conversion and a random string capitalization using choice and the str accessor.
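The difference between aggregating and same-shape transforming can be sketched like this. The data and the conversion rate of 1.1 are made up for illustration.

```python
import pandas as pd

# Invented player data.
players = pd.DataFrame(
    {"name": ["a", "b"], "age": [20, 30], "market_value": [10.0, 50.0]}
)

# Keep only numeric columns before aggregating.
numeric = players.select_dtypes(include="number")

# One aggregate per column gives a series.
means = numeric.agg("mean")

# Several aggregates give a data frame, one row per function.
summary = numeric.agg(["mean", "min"])

# transform keeps the original shape; 1.1 is a hypothetical conversion rate.
converted = numeric[["market_value"]].transform(lambda col: col * 1.1)
```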
More Flexibility With apply()13:14
Explore the pandas data frame apply method as a flexible tool that handles both aggregations and in-place transforms. Learn axis choices, type checks, and practical rounding of floating point columns.
Element-wise Operations With applymap()13:35
Explore vectorized operations in numpy and pandas for fast data processing. Learn when to use applymap for element-wise transformations and how it handles logging and inflation adjustments.
Skill Challenge2:07
Create a numeric classification function that maps inputs to popularity labels using predefined bounds. Apply it to the players' page views with vectorized operations, add a popularity column, and count super popular players.
Solution4:56
Create a pandas get popularity function with thresholds for labeling page views. Apply it to the players page views, add a popularity column, and count super popular players.
Setting DataFrame Values6:54
Explore spot value changes in dataframes using label-based and integer-position indexers to modify a single cell. Learn that at and iat are faster than loc and iloc for single-value assignments.
The SettingWithCopy Warning7:15
Explore the SettingWithCopy warning in pandas, focusing on chained indexing, copy versus view, and how inplace and drop_duplicates can influence updates.
View vs Copy9:00
Always assume pandas returns a copy, and use iloc or loc indexers to guarantee a view when updating the underlying data frame.
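A minimal sketch of updating through a single loc call rather than chained indexing, using invented player rows:

```python
import pandas as pd

# Invented rows.
players = pd.DataFrame(
    {"name": ["A", "B", "C"], "age": [21, 30, 19], "market_value": [5.0, 40.0, 8.0]}
)

# Chained indexing such as
#     players[players["age"] < 23]["market_value"] = 0.0
# may assign into a temporary copy and leave players unchanged.

# A single .loc call addresses rows and column together on the original frame.
players.loc[players["age"] < 23, "market_value"] = 0.0
```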
Adding DataFrame Columns8:02
Learn practical techniques to add columns to a data frame using assignment, insert, and assign methods, including placing new columns like nicknames and career goals.
Adding Rows To DataFrames9:58
Add rows to dataframes using the append method with a series or a dataframe, and learn why setting with enlargement is inefficient and not in place.
BONUS - How Are DataFrames Stored In Memory4:07
Learn how pandas stores dataframes in memory as column blocks managed by a block manager, and why appending rows is slow; optimize by operating on columns for better performance.
Skill Challenge1:25
Create a 4x4 data frame assigned to DF_random, then perform two separate operations: add a new row and add a new column, and compare their speeds using timeit.
Solution5:53
Create a 4x4 data frame by random sampling from players, then append a row and add a column, comparing performance; row additions are about seven times slower than column additions.
Section Recap Notebook0:05

Section Intro1:09
Explore how to combine multiple datasets with pandas by concatenating and performing join operations, including inner, outer, left, and right joins, focusing on structure and merge rules.
Introducing (Five?) New Datasets5:41
Load five US college salary datasets by region and major with pandas read_csv from URLs, forming engineering, state, party, liberal arts, and Ivy League groups, then merge them later.
Concatenating DataFrames7:39
Concatenate five data frames into a master data frame, inspect shapes, and resolve duplicates by using the duplicated method and dropping the party data frame, yielding 249 schools.
The Duplicated Index Issue7:54
Fix duplicated indices after concatenation by resetting the index with drop=True. Alternatively, pass ignore_index=True to pd.concat to create a unique range index for reliable slicing.
Enforcing Unique Indices7:36
Enforce unique indices when concatenating data frames with pandas by enabling verify_integrity. Preserve meaningful indices like school name while avoiding ignore_index.
BONUS - Creating Multiple Indices With concat()4:30
Create multi index dataframes with concat by using the keys parameter to label origin, forming a two-level index, and select with tuple-based labels or iloc.
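The index-handling options of pd.concat can be compared on two tiny frames. School names and salaries are placeholders.

```python
import pandas as pd

# Placeholder school data.
engineering = pd.DataFrame({"school": ["E1", "E2"], "median_salary": [70000, 72000]})
ivy = pd.DataFrame({"school": ["I1", "I2"], "median_salary": [60000, 65000]})

# Plain concat keeps each frame's own index, so labels repeat.
stacked = pd.concat([engineering, ivy])

# ignore_index=True replaces it with a fresh range index.
renumbered = pd.concat([engineering, ivy], ignore_index=True)

# verify_integrity=True raises ValueError when index labels overlap.
try:
    pd.concat([engineering, ivy], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

# keys adds an outer index level naming each frame's origin.
labeled = pd.concat([engineering, ivy], keys=["engineering", "ivy"])
first_ivy = labeled.loc[("ivy", 0), "school"]
```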
Column Axis Concatenation4:17
Concatenate data frames along the column axis with pd.concat by setting axis to 1, enabling side-by-side comparisons of the top five engineering and Ivy League schools by median salary.
The append() Method: A Special Case Of concat()2:32
Explore how append and pd.concat yield identical results in simple cases, then compare differences: append is an instance method with fixed axis, while concat is a flexible module function.
Concat On Different Columns4:48
Learn how pandas concat handles dataframes with extra columns, using the stem column example, and control results with join set to inner versus outer to manage missing values.
Skill Challenge1:42
Demonstrate a pandas data frame task: concatenate liberal arts and state schools, compute unique names and average median starting salary, then compare top earners with nested column labels.
Solution10:29
Concatenate liberal and state frames to compute unique school names; compute mean mid-career median salary, and display top three liberal arts and top three state schools side-by-side with nested labels.
The merge() Method6:12
Learn how the Pandas merge method joins data frames on a common key, using an inner join on the school name via the on parameter, and contrast merging with concatenation.
The left_on And right_on Params4:44
Learn to merge two dataframes with different key names using left_on and right_on, then drop the redundant key to extend schools with mid-career income percentile data.
Inner vs Outer Joins5:37
Explore how the Pandas merge how parameter selects inner or outer joins, showing that inner yields the intersection of keys and outer yields the union with NaN for missing data.
Left vs Right Joins3:58
Master left and right joins in pandas using merge, preserving left or right keys, discarding the rest, and handling NaNs. Flipping input order makes left joins equivalent to right joins.
One-to-One and One-to-Many Joins9:32
Identify 1-to-1 and 1-to-many joins in pandas using merge and key uniqueness. Check unique values and duplicates, and use drop_duplicates to control merge outcomes.
Many-to-Many Joins8:34
Explore many-to-many joins by merging data with duplicate key values, observe how Cartesian products arise, and contrast 1-to-1, 1-to-many, and many-to-many join cardinalities.
Merging By Index5:38
Learn to merge data frames by index in pandas, using left_index and right_index as the join keys. Discover mixed merges that combine index and column keys.
The join() Method3:05
Explore the Pandas join method to merge data frames by index or a column key, using a concise instance method that behind the scenes calls merge, for shorter code.
Skill Challenge1:11
Merge the liberal arts dataframe with the regions dataframe and assign the result to the fme variable to identify the region with the most liberal arts schools.
Solution6:36
Learn to merge data frames with pandas, compute region distributions of liberal arts schools with value_counts, and assess 1-to-many joins after setting school name as the index.
Section Recap Notebook0:04
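
The concatenation and merge lectures in this section can be illustrated with a minimal sketch. The three small frames and their values below are hypothetical stand-ins, not the course's college salary datasets; only the pandas calls themselves (pd.concat, merge) come from the lecture descriptions.

```python
import pandas as pd

# Hypothetical stand-ins for the college salary data (values are made up)
engineering = pd.DataFrame(
    {"School Name": ["Tech A", "Tech B"], "Starting Median Salary": [72000, 68000]}
)
ivy = pd.DataFrame(
    {"School Name": ["Ivy A", "Ivy B"], "Starting Median Salary": [60000, 62000]}
)
regions = pd.DataFrame(
    {
        "School Name": ["Tech A", "Ivy A", "Ivy C"],
        "Region": ["West", "Northeast", "Northeast"],
    }
)

# Row-wise concatenation; ignore_index=True builds a fresh RangeIndex
# instead of repeating 0, 1, 0, 1
schools = pd.concat([engineering, ivy], ignore_index=True)

# keys= labels each frame's origin and yields a two-level index
labeled = pd.concat([engineering, ivy], keys=["engineering", "ivy"])

# Inner join: only school names present in both frames survive
inner = schools.merge(regions, on="School Name", how="inner")

# Outer join: union of school names, with NaN where one side has no match
outer = schools.merge(regions, on="School Name", how="outer")
```

Here the inner join keeps two schools and the outer join keeps five, which is the intersection-versus-union behavior described in the Inner vs Outer Joins lecture.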

Section Intro1:51
Explore advanced pandas indexing with the multi index to represent hierarchical relationships within dataframes. Modify the index to support multiple label levels, enabling efficient analysis of multidimensional data.
Introducing New Data4:49
Explore a brand-new dataset of daily stock prices for Apple, Facebook, Microsoft, Google, and Amazon, with open, high, low, close, and volume over about five and a half years.
Index And RangeIndex4:29
Review index and range index in pandas, and how series and data frames use label-based indexing. Learn to set meaningful labels with set_index, such as dates, to enable date-based selection.
Creating A MultiIndex3:45
Learn to create a multi-index in pandas by passing a list of labels to set_index, producing a two-level hierarchical index with date and stock name, and applying changes in place.
MultiIndex From read_csv()3:53
Learn to create a multi-index dataframe in pandas in one step by using read_csv with the index_col parameter to define date and name as the two-level index.
Indexing Hierarchical DataFrames8:06
Extract values from multi-index dataframes by date and stock ticker, using label-based indexing with loc and label-agnostic, position-based indexing with iloc to access open and close prices.
Indexing Ranges And Slices11:56
Master advanced indexing in a two-level pandas multi-index dataframe by selecting date and stock slices with lists, tuples, and the slice object, including slice(None) for all dates.
BONUS - Use : With pd.IndexSlice!4:13
Master using pd.IndexSlice to index hierarchical pandas dataframes with the index slice object, enabling concise high-to-low selections across dates and companies.
Cross Sections With xs()5:30
Explore xs(), the cross section method for hierarchical data frames, compare with the loc indexer, and learn to select multiple levels using tuples, with drop_level and axis options.
Skill Challenge1:13
Complete a skill challenge practicing dataframe slicing: create tech_df2 by date-slicing, sample ten random Apple days from it, and extract intraday high and low prices for Apple and Google.
Solution7:23
Create df2 by slicing the tech data frame to extract stock prices between dates, sample ten Apple days, and extract intraday high and low for Apple and Google via multiindex cross-section.
The Anatomy Of A MultiIndex Object7:52
Explore the anatomy of a pandas multi index: its names, levels, and values, and how the two-level hierarchy of dates and stock names spans 1421 dates by five tickers.
Adding Another Level5:58
Add a new level to a pandas multi-index with set_index and append, creating a three-level index (date, stock ticker, volume type) and learn selecting with multi-index tuples and cross section.
Shuffling Levels4:29
Master reordering a pandas multi-index by swapping two levels with swaplevel and using reorder_levels for broader ordering, returning a new index.
Removing MultiIndex Levels6:02
Remove multi-index levels using droplevel or reset_index to reshape data frames, with reset_index optionally restoring levels as columns or discarding them.
MultiIndex sort_index()6:11
Sort multi-index dataframes efficiently by using sort_index in place, learn to handle unsorted index errors, and tailor sorting with level parameters for optimized slicing and retrieval.
More MultiIndex Methods8:06
Master multi-index management with standalone methods: check lexicographic sorting, sort levels without altering data, set names, and convert to a flat index for clear labeling.
Reshaping With stack()5:51
Reshape a multi-index dataframe with the stack method, moving the column axis to the innermost index. Label the new level with set_names to complete the multi-index series transformation.
The Flipside: unstack()6:35
Unstack pivots the innermost level of a multi-index back to the column axis, reversing stack. Use the level parameter to target a specific level.
BONUS: Creating MultiLevel Columns Manually10:59
Learn to manually create a two-level multi-index of columns in pandas by building a cartesian product of volume and ticker, then assemble a data frame of ten records.
An Easier Way: transpose()2:53
Combine set_index and transpose to reshape a dataframe into a two-level multi-index column axis, turning the index into columns and trading date and volume category into the levels.
BONUS - What About Panels?3:28
Recognize that panels are deprecated and not covered in this course. Prefer multi-index dataframes for multidimensional data, and consult documentation when working with legacy panels and axis concepts.
Skill Challenge1:31
Transform the tech data frame into a four-level index (year, month, day, column axis), assign to tag_df_three, form tag_series for 2019 trading days, then compute mean and std of close.
Solution7:47
Learn to build three- and four-level multi-indices, select 2019 with label-based indexing, and compute mean and standard deviation of close prices, including an apply-based option.
Section Recap Notebook0:04
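
As a rough illustration of the MultiIndex techniques covered in this section, the sketch below builds a tiny two-level frame and selects from it. The four rows and their prices are hypothetical and the column names are assumptions; the course's actual stock dataset is much larger.

```python
import pandas as pd

# Hypothetical miniature of the daily stock price data (values are made up)
prices = pd.DataFrame(
    {
        "Date": ["2019-01-02", "2019-01-02", "2019-01-03", "2019-01-03"],
        "Name": ["AAPL", "GOOGL", "AAPL", "GOOGL"],
        "Open": [154.9, 1027.2, 144.0, 1050.7],
        "Close": [157.9, 1054.7, 142.2, 1025.5],
    }
)

# Passing a list to set_index creates a two-level index;
# sorting it keeps slicing on the index reliable
tech = prices.set_index(["Date", "Name"]).sort_index()

# A tuple addresses one (date, ticker) row; the second argument picks a column
aapl_close = tech.loc[("2019-01-02", "AAPL"), "Close"]

# pd.IndexSlice allows ":" inside a multi-level selection: all dates, one ticker
idx = pd.IndexSlice
googl = tech.loc[idx[:, "GOOGL"], :]

# xs() takes a cross section on a named level and drops that level by default
aapl = tech.xs("AAPL", level="Name")

# unstack() pivots an index level to the column axis
wide = tech["Close"].unstack(level="Name")
```

The loc selection with pd.IndexSlice keeps both index levels, while xs() returns a frame indexed by date only, which is one practical difference between the two approaches.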

Section Intro1:15
Learn the split apply combine pattern in data analysis by mastering the groupby method, exploring aggregation functions, and grouping by multiple keys with transform, filter, or generic apply.
New Data: Game Sales3:04
Import pandas and numpy, load a 3000-game sales dataset across Xbox and PlayStation, and learn groupby to summarize regional sales by console, year, and publisher.
Simple Aggregations Review5:13
Review how simple aggregations such as sum, mean, standard deviation, and variance operate behind the scenes in pandas, and how axis control switches between vertical and horizontal aggregation.
Conditional Aggregates5:29
Explore conditional aggregates by computing platform-specific regional sales using boolean indexing and per-group sums, and contrast with the verbose approach that leads to a future groupby solution.
The Split-Apply-Combine Pattern4:43
Discover the split-apply-combine pattern behind groupby, splitting data into groups, applying an aggregation, and combining results to form a single summary dataframe.
The groupby() Method4:34
Explore the groupby method to split a data frame by platform, apply aggregations such as sum, mean, or median, and recombine results in a single, powerful command.
The DataFrameGroupBy Object4:06
Explore the dataframe groupby object as a lazy, intermediate view that splits data into four subgroups and awaits the next apply step to produce results.
Customizing Index To Group Mappings4:32
Map index labels to platform groups with a dictionary, then group by the mapped labels to aggregate PlayStation and Xbox totals without altering the underlying data.
BONUS - Series groupby()4:48
Explore the groupby pattern, epitome of split apply combine, on Series and DataFrame, using genre and global sales to compute mean by genre and sort results to reveal top genres.
Skill Challenge1:01
Apply concepts by creating a publishers dataset from the games data frame and identifying top publishers by North America sales, plus the platform with the most sales.
Solution5:42
Create a smaller dataframe, group by publisher to sum North American sales, rank top ten publishers, and identify the platform with the highest sales in North America using pandas.
Iterating Through Groups3:15
Explore how pandas groupby creates a lazily evaluated object and iterate over each platform subgroup to inspect labels and mini dataframes before aggregation.
Handpicking Subgroups4:52
Learn to selectively access subgroups in pandas groupby objects, diagnose issues with nulls or invalid data, and use get_group for efficient, direct retrieval.
MultiIndex Grouping5:46
Master multi-key grouping with pandas by using groupby on genre and publisher to analyze top publishers within each genre by global sales, revealing a two-level hierarchical index.
Fine-tuned Aggregates8:59
Harness the aggregate function (alias agg) to apply multiple metrics (sum, mean, std, and count) over grouped data, yielding a multi index by genre and publisher that is sortable by global sales sum.
Named Aggregations7:08
The filter() Method5:35
Learn to combine group by and filter in pandas to exclude records based on aggregated subgroup totals, such as publishers selling over 50 million in North America within each genre.
GroupBy Transformations8:19
Discover how groupby and transform apply subgroup-level calculations while preserving the shape of the original data, converting raw sales into within-genre z-scores using genre mean and standard deviation.
BONUS - There's Also apply()7:48
Learn how to use groupby with apply in pandas to run custom functions on genre subgroups, returning transforms, scalars, or lists, and assess solid or weak sales and variability.
Skill Challenge1:11
Apply pandas to the games dataset to compute total sales by year and identify the top three years. Identify Europe's top-selling genre/year/platform and the groups where Japan sales are higher than Europe's.
Solution4:53
Analyze the games dataframe in pandas to compute yearly global sales and top three years, then find Europe's top genre-platform and filter platform-genre groups where Japan sales exceed Europe's.
Section Recap Notebook0:04
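
The split-apply-combine ideas from this section (aggregation, named aggregations, filter, and transform) can be sketched on a toy frame. The six rows, the column names, and the 10-unit threshold are all hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical miniature of the game sales data (values are made up)
games = pd.DataFrame(
    {
        "Platform": ["PS4", "PS4", "XOne", "XOne", "PS4", "XOne"],
        "Genre": ["Action", "Sports", "Action", "Sports", "Action", "Action"],
        "NA_Sales": [3.0, 1.0, 2.0, 4.0, 5.0, 1.0],
        "EU_Sales": [2.0, 2.0, 1.0, 1.0, 3.0, 0.5],
    }
)

# Split by platform, apply sum, combine into one Series
na_by_platform = games.groupby("Platform")["NA_Sales"].sum()

# Named aggregations: output column = (input column, function)
summary = games.groupby("Genre").agg(
    total_na=("NA_Sales", "sum"),
    mean_eu=("EU_Sales", "mean"),
)

# filter() keeps or drops whole groups based on a group-level condition
big_genres = games.groupby("Genre").filter(lambda g: g["NA_Sales"].sum() > 10)

# transform() returns a result aligned with the original rows,
# which makes within-genre z-scores a one-liner
by_genre = games.groupby("Genre")["NA_Sales"]
na_z = (games["NA_Sales"] - by_genre.transform("mean")) / by_genre.transform("std")
```

Note that filter() returns rows of the original frame (here only the Action rows), whereas agg() returns one row per group.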

Section Intro1:12
Learn pivoting in pandas, inspired by Excel pivot tables, and use a concise functional interface for grouping, aggregation, and multi-index pivots with customizations.
New Data: New York City SAT Scores4:39
Analyze New York City high schools' SAT scores from a curated subset. Use pandas to read the CSV and convert the percent tested values to floats.
Pivoting Data7:19
Pivoting reshapes a dataframe by turning rows into columns using index, columns, and values, converting long datasets like SAT scores into a wide, readable table.
Undoing Pivots6:01
What About Aggregates?5:47
Learn how to compute average SAT scores by borough using pandas pivot_table to perform aggregation when pivoting, turning many school records into five borough aggregates.
The pivot_table()6:42
Learn how to use pandas pivot_table to aggregate school-level data by borough, specifying values, index, and columns, with mean by default or custom functions like std.
BONUS: The Problem With Average Percentage7:57
Show why averages of percentages are misleading and how weighting by enrollment yields the true SAT takers rate by borough, using takers and enrollment ratios.
Replicating Pivot Tables With GroupBy2:51
Replicate pivot tables using groupby and aggregate on multi-index dataframes by applying mean to the score, then unstack to move the section level to columns for a clean pivot-like view.
Adding Margins5:06
Set the margins parameter to True to add total rows and columns to pivot tables, tag city and borough averages, and verify results by comparing with raw data.
MultiIndex Pivot Tables3:13
Explore how pivot tables create multi-index dataframes by using a list of index labels, swap index and columns to move the hierarchy, or transpose to switch axes.
Applying Multiple Functions4:13
Explore applying multiple aggregation functions in pandas pivots by passing a list to the aggfunc parameter, generating mean, minimum, and maximum scores by borough in one pivot.
Skill Challenge1:25
Apply pivot tables to summarize total and average enrollment across five boroughs. Build a Queens high school pivot with SAT section scores as columns, sorted by math scores.
Solution6:02
Build pandas pivot tables to sum borough enrollments and show mean; then rank Queens schools by SAT section scores as columns with names as index, sorted by math scores.
Section Recap Notebook0:04
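
A small sketch of the pivoting workflow from this section follows. The four schools and their scores are hypothetical, and using melt to produce the long format is an assumption of this example rather than something the lecture descriptions confirm.

```python
import pandas as pd

# Hypothetical miniature of the SAT score data (values are made up)
scores = pd.DataFrame(
    {
        "Borough": ["Queens", "Queens", "Bronx", "Bronx"],
        "School": ["Q1", "Q2", "B1", "B2"],
        "Math": [500, 460, 420, 440],
        "Reading": [480, 470, 410, 430],
    }
)

# Long format: one row per (school, section) pair (melt is this sketch's choice)
long = scores.melt(
    id_vars=["Borough", "School"], var_name="Section", value_name="Score"
)

# pivot() reshapes without aggregating; each index/column pair must be unique
wide = long.pivot(index="School", columns="Section", values="Score")

# pivot_table() aggregates when several rows fall into the same cell
by_borough = long.pivot_table(
    index="Borough", columns="Section", values="Score", aggfunc="mean"
)

# The same table built with groupby, an aggregate, and unstack
same = long.groupby(["Borough", "Section"])["Score"].mean().unstack("Section")

# A list of functions yields one block of columns per function
multi = long.pivot_table(
    index="Borough", values="Score", aggfunc=["mean", "min", "max"]
)
```

The groupby version makes the split-apply-combine steps explicit, while pivot_table expresses the same result in a single call.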

Section Intro1:13
Explore Pandas techniques to store and manipulate times and dates, from pure Python foundations to NumPy enhancements, then master datetime indices, resampling, interpolation, and moving averages.
The Python datetime Module9:39
Explore Python's datetime module to manipulate dates and times by importing date, time, and datetime, creating objects, and using attributes and isoformat.
Parsing Dates From Text10:19
Learn to parse dates from text with Python's datetime strptime by defining format codes, convert strings to datetime objects, access year, month, day, and ISO format.
Even Better: dateutil4:54
From Datetime To String5:31
Convert a datetime object to a string using strftime with format codes, including locale-aware %c, and explore an alternative templated string approach that formats the date via substitution.
Performant Datetimes With Numpy8:59
Explore NumPy datetime64 for efficient large-scale date operations, including 64-bit encoding, time units, vectorized array arithmetic, rescaling to daily, and business day offsets as a stepping stone to Pandas.
The Pandas Timestamp5:10
The Pandas Timestamp merges Python datetime simplicity with NumPy datetime64 performance, enabling string-to-timestamp conversion and handling day-month ambiguity with the dayfirst option.
Our Dataset: Brent Prices4:29
Explore a 19-year Brent crude price time series in pandas by reading a csv, inspecting the data structure, and examining dates and prices in usd per barrel.
Date Parsing And DatetimeIndex5:54
Convert the date column from object to datetime64 to unlock date-time operations and reduce memory usage, then set it as a datetime index with time-series attributes.
A Cool Shortcut: read_csv() With parse_dates4:01
Learn how to parse dates at read time with pandas read_csv by setting the index to the first column and enabling parse_dates to create a datetime index.
Indexing Dates5:39
Explore indexing data in a datetime index dataframe using pandas: label-based indexing with the loc indexer, slices, and partial string indexing to retrieve January 2019 data, with date parsing.
Skill Challenge1:06
Filter the Brent time series for Dec 1, 2015 to Mar 31, 2016; use partial string indexing; compute Brent price standard deviation; compare Feb 2018 mean to Mar 2017 median.
Solution3:52
Select price data in a date range with pandas using the loc indexer and string indexing, spanning Dec 2015 to Mar 2016, then compute standard deviation, mean, and compare to median.
DateTimeIndex Attribute Accessors8:54
Explore DatetimeIndex attribute accessors to extract quarter, week, month, and day name, build boolean masks, and compute means for targeted periods such as leap year Februarys.
Creating Date Ranges6:33
Learn to generate flexible date ranges with pandas date_range, creating a DatetimeIndex from start and end dates, or from start dates and periods, with various frequencies.
Shifting Dates With pd.DateOffset7:27
Learn to shift and adjust dates with the date offset object in pandas by subtracting 18 days and adding 18 hours, enabling precise time-aware data arithmetic.
BONUS: Timedeltas And Absolute Time6:55
Resampling Timeseries8:12
Learn to resample time series data in pandas by changing observation frequency from daily to monthly, using resample with an aggregation (median or mean) to downsample.
Upsampling And Interpolation10:20
Upsample time series data using resample, then fill gaps with linear interpolation in pandas to create eight-hour observations from daily data.
What About asfreq()?9:21
Explore how pandas asfreq resamples time series to a new frequency, fill gaps with forward or backward fill and fill_value options, and contrast it with resample’s aggregation capabilities.
BONUS: Rolling Windows11:26
Learn how rolling windows create moving averages to smooth time series data using pandas, visualize with matplotlib, and explore weighting schemes like Bartlett and Blackman.
Skill Challenge1:24
Add a quarter column to the Brent data frame, compute average price and standard deviation for 2014 with groupby, then reproduce the same results using resample on the raw data.
Solution5:18
Add a quarter column from date, then compute mean and standard deviation by quarter for 2014 using groupby and aggregation, and reproduce results with resample without a quarter column.
Section Recap Notebook0:04
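
To tie together the date and time lectures in this section, here is a minimal sketch that moves from the standard library to pandas. The date string and the 60-day synthetic price series are hypothetical; the course itself works with Brent crude prices read from a CSV.

```python
from datetime import datetime

import pandas as pd

# strptime parses text using explicit format codes; strftime formats it back
parsed = datetime.strptime("05/01/2019", "%d/%m/%Y")
iso_text = parsed.strftime("%Y-%m-%d")

# pandas resolves the day/month ambiguity of the same string with dayfirst
ts = pd.to_datetime("05/01/2019", dayfirst=True)

# Hypothetical daily series: 60 consecutive days with values 0.0 .. 59.0
dates = pd.date_range(start="2019-01-01", periods=60, freq="D")
prices = pd.Series(range(60), index=dates, dtype="float64")

# Partial string indexing: every observation in January 2019
january = prices.loc["2019-01"]

# Downsampling from daily to weekly observations with an aggregation
weekly = prices.resample("W").mean()

# A 7-day rolling window produces a moving average
smooth = prices.rolling(window=7).mean()

# DateOffset shifts a timestamp by calendar units
shifted = dates[0] + pd.DateOffset(days=18)
```

The first six values of the rolling average are NaN because a full 7-day window is not yet available, which is worth keeping in mind when plotting smoothed series.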

Requirements

A computer (Windows/Mac/Linux). That's all!
No prior knowledge of python is required.
No prior knowledge of pandas is required.

Description

Welcome to the best resource online for learning and mastering data analysis with pandas and python.

Over 32 hours, 10+ datasets, and 50+ skill challenges, you will gain hands-on mastery of not only pandas 1.x but also dozens of computer science, statistics, and programming concepts.

We will break down, understand, and practice hundreds of methods, attributes, and techniques in pandas and python that will fundamentally change the way you work with data.

In The Ultimate Pandas Bootcamp (2022) you won’t be working with outdated versions of pandas, writing repetitive commands on the same boring dataset. Instead, you’ll learn pandorable and pythonic solutions to interesting, real-world data problems, while working with many diverse datasets that range from wine servings, video game sales, and SAT scores to stock prices, college salaries and more!

Data analysis is an applied science, which is why in each section, you’ll stop and practice what you learn in dedicated skill challenges, followed by detailed solutions where we often consider and compare alternative solutions.

Data analysis is one of the most in-demand skills across all industries and in an increasing number of roles. And python is increasingly the language of choice.

Pandas is the wonderful open-source library that is the embodiment of those trends: based on the python programming language, pandas is the de facto data analysis library in the python data science community.

––––– Structure & Curriculum –––––

Over more than 32 hours, we'll cover everything that pandas has to offer, from manipulating series and dataframes, to merging datasets, handling time series, aggregations, filtering, sorting and much more!

The first four sections of the bootcamp constitute the core curriculum. You'll get acquainted with series and dataframes and develop an in-depth understanding of pandas data structures.

· Series at a Glance

· Series Methods and Handling

· Introducing DataFrames

· DataFrames More In Depth

In the next eight sections, you will dive into more advanced topics and take your pandas skills to another level, learning how to work with multiple datasets, manipulate time series, visualize data, write custom functions to transform data and much more.

· Working With Multiple DataFrames

· Going MultiDimensional

· GroupBy And Aggregates

· Reshaping With Pivots

· Working With Dates And Time

· Regular Expressions And Text Manipulation

· Visualizing Data

· Data Formats And I/O

Pandas and python go hand in hand, which is why this bootcamp also includes a full-length introduction to the python programming language, to get you up and running writing pythonic code in no time.

This is the ultimate course on one of the most valuable skills today. I hope you commit to mastering data analysis with pandas.

See you inside!

Who this course is for:

Anyone looking to deeply understand and master pandas
Anyone interested in mastering data analysis with python

Course content

Introduction: 8 lectures • 40min

Series At A Glance: 24 lectures • 1hr 49min

Series Methods And Handling: 33 lectures • 2hr 39min

Working With DataFrames: 37 lectures • 4hr 6min

DataFrames In Depth: 44 lectures • 4hr 50min

Working With Multiple DataFrames: 22 lectures • 1hr 53min

Going MultiDimensional: 25 lectures • 2hr 15min

GroupBy And Aggregates: 22 lectures • 1hr 42min

Reshaping With Pivots: 14 lectures • 1hr 3min

Handling Date And Time: 24 lectures • 2hr 27min
