Data Manipulation with Pandas Masterclass

Rating: 4.3 (34 reviews)

Learn the main functions of Pandas for data analysis and visualization in less than 2 hours. Theory and hands-on.

Created by Francesco Mosconi, Data Weekends

Last updated 5/2021

English

What you'll learn

This is a short masterclass in Pandas, the most famous library for data manipulation in Python.
You will learn what Pandas is, and how it can help you load, manage, and transform tabular data.
Learn to analyze real-world data using Python & Pandas.
Import data from multiple sources, then clean, reshape, impute, and visualize it.
Use Python and Pandas to select, group and summarize your data.
Decide what data to keep and what to ignore.
Create compelling visualizations using Seaborn and Matplotlib.

Course content

2 sections • 26 lectures • 1h 6m total length

Introduction • 4:02
Agenda • 0:49
Tabular Data • 6:19
Data Manipulation with Pandas • 2:33
Data Structures • 1:25
Examine data structures in Pandas, comparing the Series (one-dimensional, with an index) and the DataFrame (two-dimensional, with rows and columns). Learn how selecting a column from a DataFrame returns a Series within a unified framework.
Pandas IO • 1:16
Selections & Filters • 3:17
Question: Numpy & Pandas • 1:29
Question: Indexes • 0:55
Explains zero-indexed range notation in Pandas, with the start inclusive and the end exclusive, using the range 9 to 25 for rows and 2 to 5 for columns as the example.
Feature Engineering • 1:16
Aggregations • 2:42
Explore Pandas aggregations and summary statistics, including mean calculations, group by operations, multiple aggregations with agg, and counting with value_counts for passenger class analysis.
Sort & Pivot • 2:30
Sort values with sort_values by column and order, then pivot data between long and wide formats and build pivot tables with index, columns, and mean aggregation.
Joins • 1:50
Time Series • 2:04
Question: in memory • 0:52
Other Commands • 2:31
Data Visualization • 3:46
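
As a rough illustration of the Data Structures and Indexes lectures listed above, here is a minimal sketch. It is not taken from the course materials: the 30x6 table and its column names are made up, and only the 9 to 25 and 2 to 5 ranges come from the lecture description.

```python
import numpy as np
import pandas as pd

# Hypothetical 30x6 table standing in for a real dataset (an assumption).
df = pd.DataFrame(
    np.arange(180).reshape(30, 6),
    columns=["a", "b", "c", "d", "e", "f"],
)

# Selecting a single column returns a Series that shares the DataFrame's index.
col = df["c"]
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True

# Selecting a list of columns returns a DataFrame, even with one column.
sub = df[["c"]]
print(type(sub).__name__)          # DataFrame

# Positional slicing with iloc is zero-indexed, start inclusive, end exclusive:
# rows at positions 9..24 and columns at positions 2..4.
block = df.iloc[9:25, 2:5]
print(block.shape)                 # (16, 3)
print(list(block.columns))         # ['c', 'd', 'e']
```

Note that label-based slicing with loc behaves differently: it includes the end label.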

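The Aggregations and Sort & Pivot lectures in the Theory section can be sketched along the following lines. This is an illustrative example, not the course's own code: the passenger records, the temperature table, and all column names are invented for the purpose.

```python
import pandas as pd

# Made-up passenger records; the column names are illustrative assumptions.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "sex": ["female", "male", "female", "male", "female", "male"],
    "fare": [80.0, 60.0, 20.0, 8.0, 10.0, 6.0],
    "age": [30, 40, 25, 20, 22, 35],
})

# Summary statistic over one column.
overall_mean = df["fare"].mean()

# Split-apply-combine: mean fare per passenger class.
by_class = df.groupby("pclass")["fare"].mean()

# Several aggregations at once with agg (named aggregation).
multi = df.groupby("pclass").agg(
    mean_fare=("fare", "mean"),
    max_age=("age", "max"),
)

# Count rows per class; the most frequent class comes first.
counts = df["pclass"].value_counts()

# Sort by a column, choosing the order explicitly.
cheapest_first = df.sort_values("fare", ascending=True)

# Pivot table with index, columns, and a mean aggregation.
table = df.pivot_table(index="pclass", columns="sex", values="fare", aggfunc="mean")

# Long to wide with pivot, and back to long with melt.
long = pd.DataFrame({
    "day": ["d1", "d1", "d2", "d2"],
    "city": ["x", "y", "x", "y"],
    "temp": [1, 2, 3, 4],
})
wide = long.pivot(index="day", columns="city", values="temp")
back = wide.reset_index().melt(id_vars="day", var_name="city", value_name="temp")

print(by_class)
print(table)
print(wide)
```

In the pivot table, a class and sex combination with no rows shows up as NaN rather than zero.
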
Lab Start • 2:38
Lab Part 1 • 6:44
Read data from files, JSON, and HTML into Pandas DataFrames; explore with info, head, and describe; handle object types; and visualize with Seaborn pair plots, joint plots, and heat maps.
Lab Part 2 • 0:19
Question: Python & R • 0:38
Lab Part 2 • 7:52
Lab Exercise 1 - Prompt • 0:23
Practice data manipulation in a hands-on lab exercise for Pandas: try the listed operations, then check your progress.
Lab Exercise 1 - Solution • 1:56
Lab Part 3 • 4:11
Drop latitude and longitude, group by country and region, aggregate, and transpose to a date-indexed matrix of total cases; convert the index to datetime for series plotting and day-to-day differences.
Lab Exercise 2 • 2:02
Identify the top 20 countries by the most recent total cases in the cumulative data, then transpose, select the series, sort descending, and plot a horizontal bar chart.
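
The steps described for Lab Part 3 and Lab Exercise 2 can be sketched roughly as follows. This is a hedged illustration, not the lab's solution: the table, the column names ("Country/Region", "Lat", "Long"), and the date format are assumptions, and the sketch keeps the top 2 rather than the top 20 because the made-up data has only three countries.

```python
import pandas as pd

# Tiny made-up cumulative table in a wide layout, one column per date.
# Column names and values are assumptions, not the lab's actual dataset.
raw = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "C"],
    "Lat": [0.0, 1.0, 2.0, 3.0],
    "Long": [0.0, 1.0, 2.0, 3.0],
    "1/1/21": [1, 2, 10, 0],
    "1/2/21": [3, 4, 15, 1],
    "1/3/21": [6, 5, 30, 1],
})

# Drop the coordinates, then sum the regional rows for each country.
totals = raw.drop(columns=["Lat", "Long"]).groupby("Country/Region").sum()

# Transpose to a date-indexed matrix and convert the index to datetime.
by_date = totals.T
by_date.index = pd.to_datetime(by_date.index, format="%m/%d/%y")

# Day-to-day differences of the cumulative totals (the first row is NaN).
daily = by_date.diff()

# Most recent totals per country, sorted descending, top N.
latest = by_date.iloc[-1]
top = latest.sort_values(ascending=False).head(2)
print(top)

# With Matplotlib installed, top.plot.barh() would draw the horizontal bar chart.
```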

Requirements

Previous experience programming in Python is advised to make the best use of the masterclass.
Some prior experience with tabular data formats such as CSV or Excel is also encouraged.

Description

This masterclass introduces you to concepts and practices for building compelling analyses and dashboards on datasets of any size. It is designed to be self-contained and to be consumed quickly in a single session. It will take you from zero knowledge of Pandas to understanding how the library operates and using it in several different scenarios.

You will learn:

What tabular data is and where you find it
How Pandas allows you to load from, and save to, multiple data formats
How to use two main components of Pandas: the Series and the DataFrame
The main methods to select, group and summarize your data using Pandas
How to perform complex operations such as pivot tables and split-apply-combine
How to create compelling visualizations using Seaborn and Matplotlib directly from Pandas
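
To give a flavor of the loading and saving item in the list above, here is a minimal sketch of a round trip through two formats. It is an illustration only: the small table is made up, and in-memory text buffers stand in for real files so the snippet runs anywhere.

```python
import io

import pandas as pd

# Made-up table used only for this illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

# Save to CSV text, then load it back (a file path would work the same way).
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Save to JSON as a list of records, then load it back.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# Quick exploration of what was loaded.
print(from_csv.head())
summary = from_csv.describe()
print(summary)
```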

The masterclass is designed to maximize the learning experience for everyone and includes 50% theory and 50% hands-on practice. It includes a lab with hands-on exercises and solutions.

No software installation is required. You can run the code on Google Colab and get started right away.

This class is the fastest way to get up to speed in Pandas.

Why Pandas?

Pandas is the most famous data manipulation library, and it is used by millions of people every day to analyze and manipulate large datasets. It is mature, robust, and easy to use, and it has extensive documentation, so it's the perfect entry point for beginners and pros alike.

Who this course is for:

Python enthusiasts who want to deepen their knowledge of data analysis, data manipulation, and data visualization.
Analysts in finance, insurance, and consulting who are proficient in Excel and want to start migrating to Python and Pandas to scale their work.

Theory • 17 lectures • 40min

Lab & Exercises • 9 lectures • 27min
