
Learn data manipulation in Python using the pandas library to explore, clean, transform, and visualize datasets, merge and pivot data, and prepare time series for statistical analysis and machine learning.
Install Python with Anaconda, set up editors, and create virtual environments to keep your data manipulation projects organized and safe.
Explore how to combine pandas with matplotlib for advanced plotting, integrate seaborn, save figures to PNG or PDF, tune styles, and create multi-axis layouts while avoiding pie charts.
Explore visualizing two-dimensional distributions with NASA meteorite data using histograms, contour plots, and kernel density estimation, plus 2D joint plots in Python with matplotlib and seaborn.
Visualize high-dimensional data with scatter matrix and correlation heat maps, then explore manifold learning techniques to reveal nonlinear relationships in heart disease data and synthetic data using pandas and seaborn.
Master indexing, labeling, and sorting in pandas with set_index, reset_index, sort_index, and sort_values, plus unique, value_counts, and rank for data ordering.
Master slicing and filtering in pandas by selecting columns, building boolean masks, and combining conditions with logical operators, while understanding views versus copies and using loc for precise data access.
Learn to replace and threshold data in pandas, handling NaN (not-a-number) values with dropna, fillna, and clip, using Airbnb dataset examples.
Explore removing and adding data in pandas dataframes, converting birth dates to datetime, extracting year, using categorical data, dropping or appending rows, inserting and assigning columns, and transposing data.
Discover pandas data manipulation techniques, including multi-level grouping, smart imputation, and aggregation functions, then learn to merge data frames from diverse sources for real-world analysis.
Learn how to merge real-world data using pandas: build basic syntax, concatenate and append data frames, and perform merges and joins to combine tables via keys or indices.
Explore how to merge time series data with pandas using merge_ordered and merge_asof, interpolate across cadences, and align temperatures across Australia, US, and Brisbane with forward fill and nearest matching.
Explore crosstab in pandas, a wrapper around pivot_table that counts occurrences and enables normalization across all values, rows, or columns, using gender and eye color data.
Explore converting wide frames to long format in Pandas with melt and stack, preserving gender and alignment as IDs, and compare crosstab with pivot_table for counting.
Learn pandas resampling to aggregate data by year or month using resample rules, group by with multi-index, and apply custom functions like mean or median.
Review how to set up the environment, load data frames, clean and merge, and group data, then work with time series and hierarchical indexes in Pandas.
Explore optional exercises from chapter two to deepen familiarity with data frames in pandas, including loading pickle data and csv formats, whitespace delimiters, and index and header handling.
Explore visual data analysis with pandas and seaborn on a Pokemon dataset, comparing attack and defense, breaking down by type, and examining battle stat distributions.
Apply pandas data manipulation to cost of living data: rename the index to location, split into city and country, find country with the most cities, and sort by housing cost.
Use pandas to load airline satisfaction data, convert to numeric, create pivot tables of average satisfaction by gender and class, and analyze correlations among online features.
Explore time series analysis of cryptocurrency market data in pandas, including setting a multi-index, identifying top market cap symbols, and plotting closing prices with rolling smoothing.
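As a rough illustration of the time series merging lesson described above, here is a minimal sketch of pandas' merge_asof. The two small frames, their column names, and their values are invented for this example and are not taken from the course datasets.

```python
import pandas as pd

# Hypothetical readings at two different cadences (values invented for illustration).
hourly = pd.DataFrame({
    "time": pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"]
    ),
    "temp_a": [20.0, 21.0, 22.0],
})
irregular = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 00:50", "2021-01-01 01:40"]),
    "temp_b": [15.0, 16.0],
})

# Both frames must be sorted by the key column before calling merge_asof.
# Default direction is "backward": each irregular row takes the most recent
# hourly row at or before its timestamp (similar in spirit to a forward fill).
backward = pd.merge_asof(irregular, hourly, on="time")

# "nearest" picks whichever hourly row is closest in time, before or after.
nearest = pd.merge_asof(irregular, hourly, on="time", direction="nearest")

print(backward)
print(nearest)
```

With the backward match, the 00:50 reading pairs with the 00:00 row; with nearest matching it pairs with the 01:00 row instead, since that is only ten minutes away.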
In the real world, data is anything but clean, which is why Python libraries like Pandas are so valuable.
If data manipulation is slowing down your data analysis workflow, this course is the key to taking your power back.
Own your data, don’t let your data own you!
When data manipulation and preparation account for up to 80% of your work as a data scientist, it is essential to learn data munging techniques that take raw data to a final product for analysis as efficiently as possible.
Data analysis with the Python library Pandas makes it easier for you to achieve better results, increase your productivity, spend more time problem-solving and less time data-wrangling, and communicate your insights more effectively.
This course prepares you to do just that!
With Pandas DataFrame, prepare to learn advanced data manipulation, preparation, sorting, blending, and data cleaning approaches to turn chaotic bits of data into a final pre-analysis product. This is exactly why Pandas is the most popular Python library in data science and why data scientists at Google, Facebook, JP Morgan, and nearly every other major company that analyzes data use Pandas.
If you want to learn how to use Pandas efficiently to manipulate, transform, pivot, stack, merge, and aggregate your data in preparation for visualization, statistical analysis, or machine learning, then this course is for you.
Here’s what you can expect when you enroll with your instructor, Samuel Hinton, Ph.D.:
Learn common and advanced Pandas data manipulation techniques to take raw data to a final product for analysis as efficiently as possible.
Achieve better results by spending more time problem-solving and less time data-wrangling.
Learn how to shape and manipulate data to make statistical analysis and machine learning as simple as possible.
Utilize the latest version of Python and the industry-standard Pandas library.
Performing data analysis with Python’s Pandas library can help you do a lot, but it does have its downsides. And this course helps you beat them head-on:
1. Pandas has a steep learning curve: As you dive deeper into the Pandas library, the learning slope becomes steeper and steeper. This course guides beginners and intermediate users smoothly into every aspect of Pandas.
2. Inadequate documentation: Without proper documentation, it’s difficult to learn a new library. When it comes to advanced functions, Pandas documentation is rarely helpful. This course helps you grasp advanced Pandas techniques easily and saves you time in searching for help.
After this course, you will feel comfortable delving into complex and heterogeneous datasets knowing with absolute confidence that you can produce a useful result for the next stage of data analysis.
Here’s a closer look at the curriculum:
Loading and creating Pandas DataFrames
Displaying your data with basic plots, and 1D, 2D and multidimensional visualizations.
Performing basic DataFrame manipulations: indexing, labeling, ordering, slicing, filtering, and more.
Performing advanced Pandas DataFrame manipulations: multi-indexing, stacking, hierarchical indexing, pivoting, melting, and more.
Carrying out DataFrame grouping: aggregation, imputation, and more.
Mastering time series manipulations: reindexing, resampling, rolling functions, method chaining and filtering, and more.
Merging Pandas DataFrames
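To give a flavor of the grouping item in the curriculum above, the sketch below shows one common way to combine group-wise imputation and aggregation in pandas. The frame, its column names, and the choice of filling with the group mean are assumptions made for illustration; the course may use different data and strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (invented for illustration).
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, np.nan, 10.0, 30.0],
})

# Imputation: replace each missing value with the mean of its own group.
# transform("mean") returns a Series aligned with the original rows,
# and the mean skips NaN by default.
group_means = df.groupby("group")["value"].transform("mean")
df["value_filled"] = df["value"].fillna(group_means)

# Aggregation: summarize the imputed column per group.
summary = df.groupby("group")["value_filled"].agg(["mean", "max"])

print(df)
print(summary)
```

Using transform rather than agg for the imputation step keeps the result the same length as the original frame, so it can be passed directly to fillna.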
Lastly, this course comes with a cheat sheet and practical exercises based on real-life examples. So not only will you learn the theory, you will also get hands-on practice with Pandas.