Data Manipulation in Python: A Pandas Crash Course

Name: Data Manipulation in Python: A Pandas Crash Course
Rating: 4.6 (1889 reviews)

Learn how to use Python and Pandas for data analysis and data manipulation. Transform, clean and merge data with Python.

Created by Samuel Hinton, SuperDataScience Team, Ligency

Last updated 2/2025

English

English [Auto]

What you'll learn

Visualise data using methods from histograms to dimensionality reduction.
Create, save and serialise data frames in and out of multiple formats.
Clean and format data easily.
Detect and intelligently fill missing values.
Group, aggregate and summarise your data.
Merge data sources into a beautiful whole.
Pivot and cross-tabulate data like a pro.
Intersplice, summarise and investigate time series data.
Seamlessly work with data from different time zones.
Learn the common pitfalls and traps that ensnare beginners and how to avoid them.

Course content

10 sections • 58 lectures • 8h 48m total length

Introduction5:50
Learn data manipulation in Python using the pandas library to explore, clean, transform, and visualize datasets, merge and pivot data, and prepare time series for statistical analysis and machine learning.
Who Am I? And how to get help5:28
Meet the instructor, learn who he is, and how to get help while exploring data manipulation in Python with pandas. Access Udemy Q&A, Facebook group, and Stack Overflow for support.
EXTRA: Learning Path0:47
Setting up python and editors9:33
Install Python with Anaconda, set up editors, and create virtual environments to keep your data manipulation projects organized and safe.
Live Install6:30
Perform a live install of Anaconda on Windows, verify Python 3.7, activate conda environments, and launch Jupyter notebooks; learn to install and update essential packages for pandas data manipulation.
Get the materials0:17

Finding Datasets4:29
Explore finding and loading datasets, saving work, and inspecting data with text and plots. Learn to find data on Kaggle and other open sources, and load it with pandas read_csv.
Jupyter Notebooks and Loading Data12:14
Open Jupyter notebooks, load data with pandas read_csv to inspect a heart disease dataset, then preview with head and consider encoding, delimiter, and optional pickle speedups.
Pandas vs Numpy7:38
Compare Pandas and NumPy, focusing on data structures, loading data, and key operations. Learn when to use Pandas, why .values is discouraged, and how reshaping and copying differ.
Creating DataFrames4:20
Create data frames in pandas using arrays, dictionaries, and structured arrays, set columns and optionally a custom index, then save the frames for later use.
Saving and Serialising9:13
Learn to save and serialize a pandas data frame using csv and pickle formats, tune index and float formatting, and compare performance and file sizes.
Inspecting DataFrames6:57
Inspect a pandas data frame using head, tail, and sample; then summarize with info, describe, shape, and corr, and inspect value counts for data quality insights.
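
As a rough illustration of the creating, saving and inspecting steps described in this section, here is a minimal sketch. The column names and values are made up for the example (the course itself works with a heart disease dataset), and the CSV is written to an in-memory buffer rather than a file.

```python
import io

import pandas as pd

# Hypothetical toy data; not taken from the course materials.
df = pd.DataFrame(
    {
        "age": [63, 37, 41, 56],
        "sex": ["M", "M", "F", "M"],
        "chol": [233.0, 250.0, 204.0, 236.0],
    }
)

# Save to CSV without the index and with one decimal place for floats,
# then load it back with read_csv.
buffer = io.StringIO()
df.to_csv(buffer, index=False, float_format="%.1f")
buffer.seek(0)
restored = pd.read_csv(buffer)

# Quick inspection: first rows, dimensions, summary statistics, category counts.
first_rows = df.head(2)
shape = df.shape
summary = df.describe()
sex_counts = df["sex"].value_counts()
```

Swapping the buffer for a file path works the same way; df.to_pickle and pd.read_pickle are the pickle counterparts of to_csv and read_csv.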

Introduction and super basic plots10:04
Explore the basics of data visualization in Python using pandas, matplotlib, and seaborn. Learn bar, scatter, and line plots, group by operations, and plotting fundamentals for distributions and higher dimensions.
Pandas vs Matplotlib8:56
Explore how to combine pandas with matplotlib for advanced plotting, integrate seaborn, save figures to PNG or PDF, tune styles, and create multi-axis layouts while avoiding pie charts.
Visualising 1D distributions13:12
Explore visualizing one-dimensional data with histograms, box plots, and violin plots using Pandas, Matplotlib, and Seaborn, with practical examples on heart disease data.
Visualising 2D distributions14:16
Explore visualizing two-dimensional distributions with NASA meteorite data using histograms, contour plots, and kernel density estimation, plus 2D joint plots in Python with matplotlib and seaborn.
Styling Pandas Table outputs8:56
Explore styling pandas table outputs using df.style to color-code negatives in red, highlight max values in gold, apply color gradients, and render bar plots for data in HTML tables.
Higher dimension visualisations13:08
Visualize high-dimensional data with scatter matrix and correlation heat maps, then explore manifold learning techniques to reveal nonlinear relationships in heart disease data and synthetic data using pandas and seaborn.
Summary1:54
Use histograms for unambiguous, easy-to-digest data, explore 2d and higher-dimensional histograms, and apply correlation matrix plots to reveal relationships; try the Pokemon dataset practice with the attached notebooks.

Introduction, Labelling and Ordering12:21
Master indexing, labeling, and sorting in pandas with set_index, reset_index, sort_index, and sort_values, plus unique, value_counts, and rank for data ordering.
Slicing and Filtering13:11
Master slicing and filtering in pandas by selecting columns, building boolean masks, and combining conditions with logical operators, while understanding views versus copies and using loc for precise data access.
Replacing and Thresholding7:07
Learn to replace and threshold data in pandas, handling NaN (not a number) values with dropna, fillna, and clip, using Airbnb dataset examples.
Removing and adding data13:57
Explore removing and adding data in pandas dataframes, converting birth dates to datetime, extracting year, using categorical data, dropping or appending rows, inserting and assigning columns, and transposing data.
Apply, map and vectorised functions14:44
Learn how to apply, map, and vectorize functions in pandas, compare apply vs map, use vectorized operations, and access string and date time vectorized functions for efficient data manipulation.
Summary2:31
Apply boolean masks to filter rows and select the columns you care about, then use dropna or fillna to handle missing values while keeping a copy to avoid mutating data.
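
The filtering and missing-value steps summarised above might look something like the following sketch. The listings data is hypothetical, loosely in the spirit of the Airbnb examples the lectures mention, and the threshold of 500 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical listings; not the course dataset.
df = pd.DataFrame(
    {
        "city": ["Paris", "Rome", "Paris", "Oslo"],
        "price": [120.0, np.nan, 950.0, 80.0],
        "reviews": [10, 3, 0, 25],
    }
)

# Boolean mask: combine conditions with & or |, each wrapped in parentheses.
mask = (df["city"] == "Paris") & (df["reviews"] > 5)

# .loc picks rows by mask and columns by label in one step; .copy() gives an
# independent frame so later edits do not touch the original.
paris_reviewed = df.loc[mask, ["city", "price"]].copy()

# Work on a copy: fill missing prices with the median, then cap them at 500.
cleaned = df.copy()
cleaned["price"] = cleaned["price"].fillna(cleaned["price"].median()).clip(upper=500)
```

Because the cleaning happens on a copy, the original frame still holds its missing value, which makes it easy to compare before and after.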

Introduction and motivation1:42
Master data grouping and aggregation in pandas by learning groupby syntax, imputation strategies, and custom aggregations to extract precise insights for data scientists and analysts.
Basic grouping syntax13:30
Master grouping by store and day of week in pandas, perform aggregations, and visualize the resulting average sales with simple plots.
Intelligent imputation10:19
Apply intelligent imputation in pandas by replacing missing or corrupted values with groupby-based medians and transform-driven fills, preserving data integrity for downstream analysis.
Grouping aggregation8:56
Master grouping and aggregation in pandas, applying single and multiple aggregates via group by and the aggregation function, including mean and other stats.
Summary3:10
Discover pandas data manipulation techniques, including multi-level grouping, smart imputation, and aggregation functions, then learn to merge data frames from diverse sources for real-world analysis.
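
One possible reading of the group-based imputation and aggregation described in this section is sketched below. The store and sales figures are invented for the example, and filling with the per-store median is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with gaps; not the course dataset.
sales = pd.DataFrame(
    {
        "store": [1, 1, 1, 2, 2, 2],
        "day_of_week": [1, 2, 3, 1, 2, 3],
        "sales": [100.0, np.nan, 140.0, 50.0, 70.0, np.nan],
    }
)

# transform returns one value per original row, so the per-store median
# lines up with the frame and can be used directly as a fill value.
store_median = sales.groupby("store")["sales"].transform("median")
sales["sales_filled"] = sales["sales"].fillna(store_median)

# Several aggregates at once with agg.
per_store = sales.groupby("store")["sales_filled"].agg(["mean", "max"])

# Grouping by more than one key yields a MultiIndex on the result.
per_store_day = sales.groupby(["store", "day_of_week"])["sales_filled"].mean()
```

The difference between agg and transform is the shape of the result: agg returns one row per group, while transform keeps the original row count.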

Introduction and basic syntax14:00
Learn how to merge real-world data using pandas: build basic syntax, concatenate and append data frames, and perform merges and joins to combine tables via keys or indices.
Different types of merging16:14
Explore how to merge data in Pandas using inner, left, right, and outer joins, handle duplicates and suffixes, and create cross joins and composite keys.
Helpful merging functions9:10
Explore how to merge time series data with pandas using merge_ordered and merge_asof, interpolate across cadences, and align temperatures across Australia, US, and Brisbane with forward fill and nearest matching.
Summary2:14
Master pandas merging with inner, left, right, and outer joins, including merge_asof for ordered data, load restaurant and user data from CSVs, and identify the top restaurant.
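
To make the join types and merge_asof a little more concrete, here is a small sketch. The restaurant names, ratings and temperature readings are all hypothetical stand-ins for the CSV data the lectures use.

```python
import pandas as pd

# Hypothetical tables sharing a place_id key.
restaurants = pd.DataFrame(
    {"place_id": [1, 2, 3], "name": ["Alpha", "Bravo", "Charlie"]}
)
ratings = pd.DataFrame({"place_id": [1, 1, 2, 4], "rating": [2, 1, 2, 0]})

# Inner join keeps only keys present in both tables.
inner = restaurants.merge(ratings, on="place_id", how="inner")

# Outer join keeps everything; indicator=True records where each row came from.
outer = restaurants.merge(ratings, on="place_id", how="outer", indicator=True)

# Highest average rating among matched restaurants.
top_restaurant = inner.groupby("name")["rating"].mean().idxmax()

# merge_asof matches each event to the most recent earlier reading.
# Both frames must be sorted by the key.
readings = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 06:00"]),
        "temp": [20.0, 26.0],
    }
)
events = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 01:00", "2020-01-01 07:30"]),
        "event": ["a", "b"],
    }
)
aligned = pd.merge_asof(events, readings, on="time", direction="backward")
```

Passing direction="nearest" instead picks the closest reading on either side, which corresponds to the nearest matching mentioned in the lecture description.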

Introduction and basic MultiIndexes12:20
Learn how to create and use multi-indexes in pandas, including cross section and index slice selections, with a focus on efficient lookups using a sorted index and flight data.
MultiIndex II - MultiIndex Strikes Back13:29
Master multi indexes and multi index columns in pandas. Learn creation methods, from set index to from arrays and from product, plus naming and level-based access.
Stacking and Unstacking13:30
Master stacking and unstacking in pandas to transform data from long format to wide format with a multi-index, control levels, handle missing values, and visualize trends with heatmaps.
Pivoting15:45
Explore pivoting and crosstabs in pandas by transforming election data and world happiness data, showing how to create pivot tables, perform aggregation, and compare countries across years.
Pivot Margins15:29
Explore pivot margins for rows and columns, customize aggregations with mean, sum, and lambda, and plot grouped data; apply cut to create star ratings.
Crosstab9:25
Explore crosstab in pandas, a wrapper around pivot table that counts occurrences and enables normalization across all, rows, or columns using gender and eye color data.
Melting7:26
Explore converting wide frames to long format in Pandas with melt and stack, preserving gender and alignment as IDs, and compare crosstab with pivot_table for counting.
Summary5:11
Master advanced Pandas reshaping methods, including pivot, unstack, melt, stack, and pivot tables, with multi-indexing, crosstab, and group by, and prepare for time series data next.
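
The long-to-wide and wide-to-long conversions covered in this section can be sketched roughly as follows. The country scores and the gender and eye colour records are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per country and year.
long_df = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2018, 2019, 2018, 2019],
        "score": [7.0, 7.2, 5.5, 5.9],
    }
)

# Long to wide: one row per country, one column per year.
wide = long_df.pivot_table(
    index="country", columns="year", values="score", aggfunc="mean"
)

# Wide back to long: melt turns the year columns into rows again.
melted = wide.reset_index().melt(
    id_vars="country", var_name="year", value_name="score"
)

# crosstab counts co-occurrences of two categorical columns.
people = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M"],
        "eye_colour": ["blue", "brown", "brown", "brown"],
    }
)
counts = pd.crosstab(people["gender"], people["eye_colour"])

# normalize="index" converts the counts into row-wise proportions.
row_share = pd.crosstab(people["gender"], people["eye_colour"], normalize="index")
```

pivot_table aggregates when several rows share the same index and column pair, which is why an aggfunc is supplied even though each pair appears only once here.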

Introduction and the Datetime Index9:47
Explore time series data with pandas, learn datetime indexing, slicing, resampling, multi-index handling, and basic visualization using stock market data and timezone tricks in Python.
Reindexing11:22
Master reindexing in pandas for time series data and date ranges. Learn forward, back, and nearest fill methods, and how to handle multi-index securities.
Resampling10:50
Learn pandas resampling to aggregate data by year or month using resample rules, group by with multi-index, and apply custom functions like mean or median.
Rolling functions12:08
Explore rolling functions in pandas using a five-day window to compute rolling means and centered statistics on stock data, and compare boxcar versus gaussian smoothing and alignment.
Time Zones9:39
Discover how pandas localizes and converts time zones to analyze tweet timestamps, comparing Istanbul and Los Angeles times and visualizing hourly patterns with daylight saving considerations.
Summary3:28
Review time series concepts in pandas, including datetime index, resampling, partial-date slicing, rolling windows, and imputation, then apply to cryptocurrency data with open, high, low, close, volume, and market cap.
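
A compact sketch of the time series tools reviewed above follows. The daily series is synthetic, and the window length and time zone are illustrative choices echoing the lecture descriptions rather than values taken from the course notebooks.

```python
import pandas as pd

# Synthetic daily "closing prices" on a DatetimeIndex (ten days from 1 Jan 2021).
index = pd.date_range("2021-01-01", periods=10, freq="D")
close = pd.Series(range(10), index=index, dtype="float64", name="close")

# Partial-date slicing: string labels select an inclusive date range.
first_days = close.loc["2021-01-02":"2021-01-04"]

# Resampling: aggregate daily values into weekly means (weeks end on Sunday).
weekly = close.resample("W").mean()

# Rolling window: five-day centred mean; the edges are NaN because the
# window is incomplete there.
rolling = close.rolling(window=5, center=True).mean()

# Time zones: attach UTC to the naive index, then convert to Istanbul time.
utc = close.tz_localize("UTC")
istanbul = utc.tz_convert("Europe/Istanbul")
```

tz_localize only labels existing timestamps with a zone, while tz_convert shifts already-localized timestamps to another zone, so the two are normally used in that order.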

A recap and a thank you5:09
Review how to set up the environment, load data frames, clean and merge, and group data, then work with time series and hierarchical indexes in Pandas.
Extra - Customising Jupyter Notebooks8:54
Learn to customize Jupyter notebooks with the Dupatta dark theme by installing Jupyter extensions and themes, and setting matplotlib defaults via startup scripts.
Extra - Chapter 2 Data Runthrough4:46
Explore optional exercises from chapter two to deepen familiarity with data frames in pandas, including loading pickle data and csv formats, whitespace delimiters, and index and header handling.
Extra - Chapter 3 Visualisation Runthrough14:47
Explore visual data analysis with pandas and seaborn on a Pokemon dataset, comparing attack and defense, breaking down by type, and examining battle stat distributions.
Extra - Chapter 4 Basics Runthrough6:50
Apply pandas data manipulation to cost of living data: rename the index to location, split into city and country, find country with the most cities, and sort by housing cost.
Extra - Chapter 5 Grouping Runthrough14:13
Group video game sales data in pandas to identify the genre with the highest average global sales and to compare mean and standard deviation of European sales across genres.
Extra - Chapter 6 Merging Runthrough11:25
Use pandas to merge restaurant and user data, handle inner joins and duplicates, then group by place id to identify top five restaurants by average rating.
Extra - Chapter 7 Advanced Runthrough12:44
Use pandas to load airline satisfaction data, convert to numeric, create pivot tables of average satisfaction by gender and class, and analyze correlations among online features.
Extra - Chapter 8 TimeSeries Runthrough12:25
Explore time series analysis of cryptocurrency market data in pandas, including setting a multi-index, identifying top market cap symbols, and plotting closing prices with rolling smoothing.

Requirements

Basic knowledge of Python

Description

In the real world, data is anything but clean, which is why Python libraries like Pandas are so valuable.

If data manipulation is holding your data analysis workflow back, then this course is the key to taking your power back.

Own your data, don’t let your data own you!

When data manipulation and preparation account for up to 80% of your work as a data scientist, learning data munging techniques that take raw data to a final product for analysis as efficiently as possible is essential for success.

Data analysis with the Python library Pandas makes it easier for you to achieve better results, increase your productivity, spend more time problem-solving and less time data-wrangling, and communicate your insights more effectively.

This course prepares you to do just that!

With Pandas DataFrame, prepare to learn advanced data manipulation, preparation, sorting, blending, and data cleaning approaches to turn chaotic bits of data into a final pre-analysis product. This is exactly why Pandas is the most popular Python library in data science and why data scientists at Google, Facebook, JP Morgan, and nearly every other major company that analyzes data use Pandas.

If you want to learn how to efficiently utilize Pandas to manipulate, transform, pivot, stack, merge and aggregate your data for preparation of visualization, statistical analysis, or machine learning, then this course is for you.

Here’s what you can expect when you enroll with your instructor, Samuel Hinton, Ph.D.:

Learn common and advanced Pandas data manipulation techniques to take raw data to a final product for analysis as efficiently as possible.
Achieve better results by spending more time problem-solving and less time data-wrangling.
Learn how to shape and manipulate data to make statistical analysis and machine learning as simple as possible.
Utilize the latest version of Python and the industry-standard Pandas library.

Performing data analysis with Python’s Pandas library can help you do a lot, but it does have its downsides. And this course helps you beat them head-on:

1. Pandas has a steep learning curve: As you dive deeper into the Pandas library, the learning slope becomes steeper and steeper. This course guides beginners and intermediate users smoothly into every aspect of Pandas.

2. Inadequate documentation: Without proper documentation, it’s difficult to learn a new library. When it comes to advanced functions, Pandas documentation is rarely helpful. This course helps you grasp advanced Pandas techniques easily and saves you time in searching for help.

After this course, you will feel comfortable delving into complex and heterogeneous datasets knowing with absolute confidence that you can produce a useful result for the next stage of data analysis.

Here’s a closer look at the curriculum:

Loading and creating Pandas DataFrames
Displaying your data with basic plots, and 1D, 2D and multidimensional visualizations.
Performing basic DataFrame manipulations: indexing, labeling, ordering, slicing, filtering and more.
Performing advanced Pandas DataFrame manipulations: MultiIndexing, stacking, hierarchical indexing, pivoting, melting and more.
Carrying out DataFrame grouping: aggregation, imputation, and more.
Mastering time series manipulations: reindexing, resampling, rolling functions, method chaining and filtering, and more.
Merging Pandas DataFrames

Lastly, this course comes with a cheat sheet and practical exercises based on real-life examples. So not only will you learn the theory, but you will also get hands-on practice with Pandas.

Who this course is for:

Python students who want to learn how to manipulate data professionally.
Aspiring data analysts and scientists looking to upgrade their skillset.
People who would prefer to spend more time solving interesting problems than formatting data.
Old hands at programming who want to see what new methods and industry-leading tools are at their fingertips in the new decade.

Course content

Introduction • 6 lectures • 28min

Dataset Basics • 6 lectures • 45min

Visual exploration • 7 lectures • 1hr 10min

Basic Data Manipulations • 6 lectures • 1hr 4min

Grouping • 5 lectures • 38min

Merging • 4 lectures • 42min

Advanced Manipulation - MultiIndex, Pivoting and more • 8 lectures • 1hr 33min

Time Series Data • 6 lectures • 57min

Conclusion • 9 lectures • 1hr 31min

Congratulations!! Don't forget your Prize :) • 1 lecture • 1min