
Examine the two core data structures in Pandas: the Series, which is one-dimensional with an index, and the DataFrame, which is two-dimensional with rows and columns. Learn how selecting a single column of a DataFrame returns a Series, so both fit within one unified framework.
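As a minimal sketch of the relationship described above, the snippet below builds a small DataFrame and selects one column from it. The column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical data, made up for this example.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]},
    index=["a", "b", "c"],
)

# Selecting a single column with square brackets returns a Series.
ages = df["age"]

print(type(df).__name__)    # the two-dimensional container
print(type(ages).__name__)  # the one-dimensional container
# The Series keeps the same index labels as the DataFrame it came from.
print(ages.index.equals(df.index))
```

Because the column keeps the row labels of its parent, values in the Series can still be looked up by label, for example `ages["b"]`.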
Explain zero-indexed range notation in Pandas, where the start of a range is inclusive and the end is exclusive, using the range 9 to 25 for rows and 2 to 5 for columns as the example.
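The range example above can be sketched with position-based selection. This assumes the lesson refers to `iloc`, which is the Pandas accessor that follows Python's start-inclusive, end-exclusive slicing. The frame itself is placeholder data.

```python
import numpy as np
import pandas as pd

# Placeholder frame: 30 rows and 6 columns of consecutive integers.
df = pd.DataFrame(np.arange(30 * 6).reshape(30, 6), columns=list("ABCDEF"))

# Rows at positions 9..24 and columns at positions 2..4.
# The end of each range (25 and 5) is excluded.
subset = df.iloc[9:25, 2:5]

print(subset.shape)
print(subset.index[0], subset.index[-1])
print(list(subset.columns))
```

Note that label-based slicing with `loc` behaves differently: it includes the end label.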
Explore Pandas aggregations and summary statistics, including mean calculations, group by operations, multiple aggregations with agg, and counting with value_counts for passenger class analysis.
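A short sketch of these aggregations follows. The passenger table and its column names (`pclass`, `age`, `fare`) are invented stand-ins for the dataset used in the lesson.

```python
import pandas as pd

# Hypothetical passenger data, made up for this example.
passengers = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 3, 3, 3],
        "age": [38.0, 35.0, 27.0, 22.0, 26.0, 30.0],
        "fare": [71.0, 53.0, 13.0, 7.0, 8.0, 9.0],
    }
)

# A single summary statistic over one column.
mean_age = passengers["age"].mean()

# Split by passenger class, then average the fare within each group.
mean_fare_by_class = passengers.groupby("pclass")["fare"].mean()

# Several aggregations at once with agg.
fare_stats = passengers.groupby("pclass")["fare"].agg(["min", "max", "mean"])

# Count how many passengers fall in each class, most frequent first.
class_counts = passengers["pclass"].value_counts()

print(mean_age)
print(mean_fare_by_class)
print(fare_stats)
print(class_counts)
```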
Sort values with sort_values by column and order, then pivot data between long and wide formats and build pivot tables with index, columns, and mean aggregation.
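The operations in this lesson can be sketched on a tiny long-format table. The cities, years, and temperatures are invented, and `melt` is used here as one possible way to go from wide back to long.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) observation.
long_df = pd.DataFrame(
    {
        "city": ["Rome", "Rome", "Oslo", "Oslo"],
        "year": [2020, 2021, 2020, 2021],
        "temp": [16.0, 17.0, 6.0, 7.0],
    }
)

# Sort by a column, highest value first.
sorted_df = long_df.sort_values(by="temp", ascending=False)

# Long to wide: one row per city, one column per year.
wide_df = long_df.pivot(index="city", columns="year", values="temp")

# Wide back to long.
back_to_long = wide_df.reset_index().melt(
    id_vars="city", var_name="year", value_name="temp"
)

# A pivot table aggregates when several rows share the same index/column pair.
table = long_df.pivot_table(
    index="city", columns="year", values="temp", aggfunc="mean"
)

print(sorted_df)
print(wide_df)
print(back_to_long)
print(table)
```

With one observation per city and year, `pivot` and `pivot_table` give the same numbers. `pivot_table` becomes necessary when duplicates need to be aggregated.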
Read data from files, JSON, and HTML into Pandas DataFrames, explore it with info, head, and describe, handle object-typed columns, and visualize it with Seaborn pair plots, joint plots, and heat maps.
Practice the data manipulation prompts in a hands-on Pandas lab exercise: try the listed operations, then check your progress.
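A minimal sketch of the loading and exploration steps follows. To stay self-contained it reads from in-memory text rather than real files, and the data is invented. Converting the text column with `pd.to_numeric` is one possible way to handle a column that was not parsed as numbers, not necessarily the approach taken in the lesson. Plotting is left out of this sketch.

```python
import io

import pandas as pd

# Hypothetical CSV content; "unknown" prevents age from being parsed as numeric.
csv_text = "name,age,fare\nAnn,31,71.5\nBob,unknown,8.0\nCy,47,13.0\n"
df = pd.read_csv(io.StringIO(csv_text))

df.info()         # column names, non-null counts, and dtypes
print(df.head())  # first rows
print(df.dtypes)  # age is stored as text, not as a number

# Coerce the text column to numbers; unparseable entries become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

print(df.describe())  # summary statistics for the numeric columns

# JSON records load the same way.
json_text = '[{"name": "Ann", "age": 31}, {"name": "Bob", "age": 25}]'
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```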
Drop latitude and longitude, group by country and region, aggregate, and transpose to a date-indexed matrix of total cases; convert the index to datetime for series plotting and day-to-day differences.
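The pipeline above can be sketched on a toy table. The column names (`Province/State`, `Country/Region`, `Lat`, `Long`) and the date-string format are assumptions about the layout of the cases dataset, and the numbers are invented.

```python
import pandas as pd

# Hypothetical cumulative-case table: one row per province, one column per date.
raw = pd.DataFrame(
    {
        "Province/State": ["A", "B", None],
        "Country/Region": ["X", "X", "Y"],
        "Lat": [1.0, 2.0, 3.0],
        "Long": [4.0, 5.0, 6.0],
        "1/22/20": [1, 2, 0],
        "1/23/20": [3, 5, 1],
        "1/24/20": [6, 9, 4],
    }
)

# Drop the coordinates, then sum the provinces within each country.
by_country = (
    raw.drop(columns=["Lat", "Long"])
    .groupby("Country/Region")
    .sum(numeric_only=True)
)

# Transpose so that dates become the index and countries become columns.
cases = by_country.T
cases.index = pd.to_datetime(cases.index, format="%m/%d/%y")

# Day-to-day differences turn cumulative totals into new cases per day.
daily = cases.diff()

print(cases)
print(daily)
```

With a datetime index in place, a single country's series such as `cases["X"]` can be plotted with its `plot` method when Matplotlib is available.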
Identify the top 20 countries by the most recent total cases in the cumulative data, then transpose, select the series, sort descending, and plot a horizontal bar chart.
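A sketch of the ranking step, assuming a cumulative table with one row per country and one column per date. The 25 country names and their counts are synthetic.

```python
import pandas as pd

# Synthetic cumulative data: 25 hypothetical countries over three dates.
countries = [f"C{i:02d}" for i in range(25)]
dates = pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"])
by_country = pd.DataFrame(
    [[(i + 1) * day for day in (1, 2, 3)] for i in range(25)],
    index=countries,
    columns=dates,
)

# Transpose to a date-indexed frame and take the most recent row as a Series.
latest = by_country.T.iloc[-1]

# Sort descending and keep the 20 largest totals.
top20 = latest.sort_values(ascending=False).head(20)
print(top20)

# With Matplotlib installed, top20.plot.barh() draws the horizontal bar chart.
```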
This masterclass introduces you to concepts and practices for building compelling analyses and dashboards on datasets of any size. It is designed to be self-contained and to be consumed quickly in a single session. It will take you from zero knowledge of Pandas to understanding how the library operates and using it in several different scenarios.
You will learn:
What tabular data is and where you find it
How Pandas allows you to load from, and save to, multiple data formats
How to use two main components of Pandas: the Series and the DataFrame
The main methods to select, group and summarize your data using Pandas
How to perform complex operations such as pivot tables and split-apply-combine
How to create compelling visualizations using Seaborn and Matplotlib directly from Pandas
The masterclass is designed to maximize the learning experience for everyone and includes 50% theory and 50% hands-on practice. It includes a lab with hands-on exercises and solutions.
No software installation is required. You can run the code on Google Colab and get started right away.
This class is the fastest way to get up to speed in Pandas.
Why Pandas?
Pandas is one of the best-known data manipulation libraries, and it is used by millions of people every day to analyze and manipulate large datasets. It is mature, robust, and easy to use, and it has extensive documentation, so it is a good entry point for beginners and pros alike.