
This course includes our updated coding exercises so you can practice your skills as you learn.
Master real-world Pandas workflows from cleaning and wrangling messy data to machine learning prep, using dozens of Pandas methods, merging, joining, grouping, and exploratory data analysis with real datasets.
Explore seven tips to maximize learning in pandas bootcamp: navigate course content and prerequisites, use the AI assistant and Q&A, practice with downloads and coding exercises, and avoid skipping sections.
Explore Titanic survival factors with pandas and seaborn: analyze class, age, and gender via logistic regression, transforming raw csv data into graphs and tables with simple code.
Install the Anaconda distribution to set up Python and the data science ecosystem with pre-installed packages that simplify dependency management and support Jupyter notebooks and IDE options.
Navigate the Anaconda navigator, launch Jupyter notebook, create your first notebook, and run simple Python calculations while exploring base environments and packages like numpy and pandas using conda list.
Learn the major features and pitfalls of using Jupyter notebooks as an interactive Python environment, including edit and command modes, cells, markdown, headers, and keyboard shortcuts.
Engage in an AI-powered roleplay to practice data science with Python and pandas, focusing on loading, cleaning, and analyzing large datasets while refining workplace communication.
Download all course materials, notebooks, datasets, and exercises; open notebooks with Anaconda, load dataframes from CSVs, and learn pandas basics from zero to hero.
Explore pandas dataframes, tabular data concepts, and how rows, columns, and labels organize observations and features. Learn data types per column and the role of the index.
Explore how pandas and Python coding benefit from ChatGPT as a coding assistant, with insights on free and plus versions that can run Python code and enhance code quality.
Enable GPT-4 advanced data analysis to write and execute Python and Pandas code, upload Titanic CSV, and perform high-level inspection, data types, missing values, and descriptive statistics.
Create your first pandas DataFrame in a Jupyter notebook by importing Titanic.csv with read_csv. Ensure the notebook and CSV are in the same folder for successful import.
Learn how pandas dataframes behave as objects with attributes and methods, use built-in functions and method chaining to inspect shape, columns, and numeric data, and handle mixed types with numeric_only.
Boost your Pandas workflow with tab completion and tooltips, learning to use read_csv and sort_values efficiently while exploring parameters and keyboard shortcuts for faster coding.
Practice data analysis in pandas with Jupyter notebooks by completing coding exercises on the cars dataset, filling in code gaps, inspecting the dataframe, and using hints and solutions.
Import pandas, load cars.csv into a dataframe, and use head, tail, info, and describe to explore 398 cars across 9 columns, including mpg, horsepower, weight, and origin.
Learn to select one or more columns in pandas by using a column label or a list, producing a series for a single column and a dataframe for multiple columns.
Compare dot notation and square bracket notation to select a single column in pandas, using the age column from the Titanic dataset, and verify equivalence with the equals method.
Explore position-based indexing with iloc on the Summer Olympics dataset, loading data with pandas from summer.csv. Master zero-based indexing, negative indexing, and slicing to select specific rows efficiently.
Explore label-based row selection with pandas loc, contrast with iloc, and retrieve single or multiple rows by label, as shown with Dimitrios Drivas and Michael Phelps.
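As a minimal sketch of the iloc versus loc contrast, the snippet below builds a tiny stand-in dataframe instead of loading summer.csv. The index labels and values are illustrative assumptions, not rows from the course dataset.

```python
import pandas as pd

# Tiny stand-in for the Summer Olympics data (illustrative values only)
summer = pd.DataFrame(
    {"Year": [1896, 2008, 2012], "Medal": ["Gold", "Gold", "Silver"]},
    index=["DRIVAS, Dimitrios", "PHELPS, Michael", "PHELPS, Michael"],
)

# iloc selects by integer position (zero-based)
first_by_position = summer.iloc[0]

# loc selects by label; a unique label returns a single row as a Series
drivas = summer.loc["DRIVAS, Dimitrios"]

# a label that occurs more than once returns all matching rows as a DataFrame
phelps = summer.loc["PHELPS, Michael"]

print(first_by_position.equals(drivas))
print(phelps)
```

Here position 0 and the label of the first row point at the same row, so both selections agree, while the duplicated label returns two rows.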
Learn to use reindex for label-based indexing and slicing in pandas, including time series with datetime index and handling missing labels with implications of unique versus non-unique indices.
Import pandas as pd, read a csv, and inspect data with info. Use a loc operation to select rows and columns, avoiding chained indexing, and note DataFrame, Series, and Index.
Practice pandas dataframe indexing and slicing on the cars dataset using iloc and loc, select columns by name or position, and view specific fields such as name and weight.
Load the cars dataset with ChatGPT, save it as cars, inspect it, and query the car at index 393 and Torino’s model year and origin using dot notation.
Learn advanced indexing and slicing in pandas using the Summer Olympics dataset, including position and label based selection, combining rows and columns, and workarounds after removing the ix operator.
Explore two new role play scenarios in the course, provide feedback on the Q&A board, and help decide whether to add more role plays throughout the pandas bootcamp.
Learn to debug more than 90% of your errors in no time by embracing trial and error, reading, and common sense to fix issues under a minute.
Boost your debugging skills with a hands-on Python exercise on a medal dataset. Fix key errors, type errors, and indexing mistakes, and learn pandas series creation.
Identify the three major error categories and prioritize code issues first, then Python installation and external factors. Learn examples like typos, wrong indents, and mixing numbers and strings.
Learn the most common, easy-to-debug errors in Python data work, from dictionary key mistakes to syntax and name errors. See practical fixes using simple examples, including Pandas series assignments.
Learn how index errors occur in pandas dataframes and lists, with practical examples of out-of-bounds access, negative indexing, and debugging by inspecting previous cells.
Master Python indentation by diagnosing and fixing unexpected indents and missing indented blocks in multi-line cells, for loops, and dictionary iterations, to ensure clean, error-free code.
Explore Python type errors and value errors, contrasting incorrect types with invalid values. See concise examples: adding int and string, concatenating string and int, and sqrt of negative numbers.
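The difference between the two error classes can be sketched with plain Python. The variable names are only for illustration.

```python
import math

# TypeError: the operand types do not fit together (int + str)
try:
    1 + "1"
except TypeError as err:
    type_error_name = type(err).__name__

# ValueError: the type is fine (a number), but the value is not allowed
try:
    math.sqrt(-1)
except ValueError as err:
    value_error_name = type(err).__name__

print(type_error_name, value_error_name)
```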
Use ChatGPT to diagnose and fix pandas key errors, demonstrating indexing issues and solutions like boolean indexing or setting the athlete column as the dataframe index.
Explore using Google and Stack Overflow to fix coding errors, understand positional vs keyword arguments in pandas, and use DataFrame nlargest to find the three tallest players.
Learn to use Python traceback to locate errors in complex code, tracing from oldest to latest calls, with a backtester example using pandas and numpy.
Identify common Python installation issues that cause import errors, including missing packages, outdated versions, and corrupted setups. Learn to manage Conda and pip installations and reinstall cleanly when problems persist.
Explore external factors that disrupt Python data work, including web API calls, data scraping, loading financial data, and API key authentication, along with troubleshooting and firewall or admin rights considerations.
Download notebooks, watch videos, and run code to reduce transcription errors, recognize three error categories—incorrect notebook code, corrupted Python, and external factors, and practice creating your own code.
Master debugging with a flowchart: read the bottom error, inspect code and previous cells, restart the kernel, run all cells, and compare with the course notebook.
Explore pandas series as a one-dimensional labeled array, learn to select a single column, inspect dtype and shape, and convert a series to a dataframe for further analysis.
Analyze non-numerical values in a Pandas series with a Summer Olympics medals dataset, using unique, nunique, and value_counts to reveal counts and frequencies of athletes.
Learn to create pandas series from data frames and from scratch using pd.Series. Discover selecting a single column or row via brackets or attribute notation, with custom index and name.
Create pandas series from numpy arrays, lists, tuples, and dictionaries, customize the index and series name, and observe how new labels affect alignment and introduce missing values.
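A short sketch of how a custom index aligns with dictionary keys. The data and the names `sales` and `s` are made up for illustration.

```python
import pandas as pd

# Dictionary keys become index labels by default
sales = {"Mon": 10, "Tue": 25, "Wed": 6}

# Passing an explicit index aligns on labels:
# "Wed" is not requested and is dropped, "Thu" has no data and becomes NaN
s = pd.Series(sales, index=["Mon", "Tue", "Thu"], name="sales")

print(s)
```

Because a missing value is introduced, the series is stored as floating point rather than integer.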
Learn to index and slice pandas series with square brackets, iloc, and loc, distinguishing position-based from label-based indexing. Explore examples with age and event columns and range index behavior.
Sort pandas series by values or by index using sort_values and sort_index, and learn how the inplace parameter toggles in-place changes, with ascending and NaN handling.
Explore how to use Pandas nlargest and nsmallest on the Titanic age series to obtain the three oldest or youngest passengers, comparing with sort_values and the head method.
Identify the index labels of the oldest and youngest passengers using idxmax and idxmin on a pandas series, then extract full rows with those labels for quick data insights.
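A minimal sketch of this pattern on a small stand-in dataframe. The values are illustrative and are not taken from the Titanic file.

```python
import pandas as pd

# Small stand-in for the Titanic data (illustrative values)
titanic = pd.DataFrame(
    {"age": [22.0, 80.0, None, 0.42, 35.0], "fare": [7.25, 30.0, 8.05, 8.52, 53.1]}
)

# idxmax / idxmin return the index LABEL of the extreme value (NaN is skipped)
oldest_label = titanic["age"].idxmax()
youngest_label = titanic["age"].idxmin()

# Use the labels with loc to pull the complete rows
extremes = titanic.loc[[oldest_label, youngest_label]]
print(extremes)
```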
Manipulate pandas series with label-based and position-based indexing, handle missing values, overwrite duplicates (all occurrences), and perform vectorized operations like converting dollars to euros with rounding.
Analyze the mpg column from the cars dataframe as a pandas series, using head, describe, min, max, value counts, and conversion to liters per 100 km to explore fuel efficiency.
Explore pandas index objects, learn to inspect and customize row and column indices, and perform position-based and label-based slicing with unique vs duplicate labels, including converting index to list.
Discover how to create index objects from scratch using pandas, including integer, string, and range indices, assign a name like days, and attach them to a pandas series.
Set a data frame index with set_index using year or athlete columns, then reset to a range index with reset_index, exploring drop and inplace options for preserving columns.
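The round trip can be sketched as follows, using a small made-up dataframe in place of the course data.

```python
import pandas as pd

# Illustrative data, not the course dataset
df = pd.DataFrame(
    {"Year": [1896, 1900], "Athlete": ["A", "B"], "Medal": ["Gold", "Silver"]}
)

# set_index moves the column into the index (it is no longer a regular column)
by_year = df.set_index("Year")

# reset_index restores a RangeIndex and keeps the old index as a column
restored = by_year.reset_index()

# drop=True restores a RangeIndex and discards the old index entirely
dropped = by_year.reset_index(drop=True)

print(restored.columns.tolist())
print(dropped.columns.tolist())
```

Without `inplace=True`, each call returns a new dataframe and leaves the original untouched, which is why the results are assigned to new names here.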
Learn how to rename a data frame's column labels with pandas by replacing the full columns index and setting index names, while noting that index objects are immutable.
Rename row and column labels in pandas with the rename method. Use a mapper dictionary or direct index/columns dictionary, with optional inplace updates to apply changes.
Learn index operations in pandas with the cars dataframe: set and reset index, inspect range and named indexes, use is_unique and value_counts, and rename or drop the index inplace.
Learn to filter dataframes by column conditions using boolean masks and loc notation in pandas, illustrated with the Titanic dataset and numeric column selection.
Learn to filter dataframes with multiple conditions in pandas by creating boolean masks and combining them with the and operator (&), then analyze survival rates in the Titanic data.
Filter Titanic data with pandas by combining two boolean masks (female or under 14) with the or operator (|); survival turns out to be higher for women or children, at about 72% versus 38% overall.
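A minimal sketch of combining boolean masks. The five rows below are invented for illustration, so the resulting rates do not match the real Titanic figures.

```python
import pandas as pd

# Invented rows, only to show the mechanics
titanic = pd.DataFrame(
    {
        "sex": ["male", "female", "female", "male", "male"],
        "age": [30.0, 25.0, 8.0, 5.0, 40.0],
        "survived": [0, 1, 1, 0, 0],
    }
)

mask_female = titanic["sex"] == "female"
mask_child = titanic["age"] < 14

# | keeps rows where at least one condition holds, & where both hold
women_or_children = titanic[mask_female | mask_child]
women_and_children = titanic[mask_female & mask_child]

# The mean of a 0/1 column is the share of ones
rate_subset = women_or_children["survived"].mean()
rate_overall = titanic["survived"].mean()
print(rate_subset, rate_overall)
```

Note that pandas uses the element-wise operators `&` and `|` here; the Python keywords `and` and `or` raise an error on a whole series.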
Learn advanced filtering in pandas using between, isin, and the tilde operator on the Olympic Games medals dataset to select year ranges, specific years, and exclusions.
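These three tools can be sketched on a single made-up column of years.

```python
import pandas as pd

# Made-up years standing in for the medals dataset
summer = pd.DataFrame({"Year": [1960, 1972, 1988, 1996, 2012]})

# between() is inclusive on both ends by default
in_range = summer[summer["Year"].between(1970, 1990)]

# isin() matches any value from a list
picked = summer[summer["Year"].isin([1972, 1996])]

# ~ inverts a boolean mask, which turns the selection into an exclusion
excluded = summer[~summer["Year"].isin([1972, 1996])]

print(in_range["Year"].tolist(), picked["Year"].tolist(), excluded["Year"].tolist())
```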
Learn how to use pandas any and all on a boolean series from Titanic data frame, checking any true, all true, exactly 80 years old, and fares greater than zero.
Remove single or multiple columns from a data frame using the pandas drop method, control changes with inplace, and compare dropping versus selecting columns with explicit label or axis options.
Learn to remove complete rows in a data frame using drop with index or labels, and use boolean indexing with masks to drop by year or sport, with inplace options.
Learn how to add a new column to a pandas dataframe using broadcasting, and see how square bracket notation can create columns while attribute notation cannot.
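A small sketch of the difference, using invented data and column names.

```python
import pandas as pd

titanic = pd.DataFrame({"age": [22.0, 38.0, 26.0]})

# Square brackets create a new column; the scalar is broadcast to every row
titanic["zeros"] = 0

# Attribute notation only sets a plain Python attribute on the object;
# no column named "ones" is created
titanic.ones = 1

print(titanic.columns.tolist())
```

The attribute assignment does not raise an error, which is what makes it easy to overlook.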
Use pandas to create columns via vectorized operations from data, calculate year of birth as 1912 minus age, add a relatives column by summing SibSp and parch, and inflate fare.
Learn to insert a new column at a specific index with pandas insert. Create relatives by summing the SibSp column and the parch column, then insert at index six.
Learn how to create pandas dataframes from scratch with pd.DataFrame, using column data, row tuples with index and columns, or by building a series and converting to a frame.
Add single or multiple rows to a pandas data frame with a hands-on approach, using manual row creation and pd.concat to concatenate while ignoring the index.
Filter a pandas dataframe using boolean masks to select Europe-origin cars, then apply between 10 and 15 mpg, drop US-filtered rows, and add liters per 100 kilometers using mpg.
Apply best practices to change and manipulate single and multiple values in a pandas DataFrame using label-based indexing and position-based indexing, filtering, and the replace method on the Titanic dataset.
Explore the dangers of chained indexing in pandas, illustrating how improper slicing and assignment can trigger a SettingWithCopyWarning and misleading results, and introduce safe alternatives.
Chained indexing in pandas often fails to update the original dataframe and triggers a SettingWithCopyWarning; the lecture demonstrates boolean indexing and best practices to ensure changes take effect.
Explore how pandas handles data frame slices as views or copies using the Titanic dataset, and learn rules to avoid chained indexing and unintended changes.
Filter the cars dataframe for mpg above 40, create an independent copy, cap the mpg values at 40, and avoid chained indexing by using loc to prevent copy warnings.
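One way to sketch this exercise, under the assumption that the cap is applied to the original dataframe while the copy preserves the unmodified rows. The car names and values are invented.

```python
import pandas as pd

# Invented rows standing in for the cars dataset
cars = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "mpg": [18.0, 43.1, 44.6, 27.0]}
)

# .copy() makes the filtered result independent of the original
high_mpg = cars.loc[cars["mpg"] > 40].copy()

# A single loc call with row mask and column label changes the original
# directly, with no chained indexing involved
cars.loc[cars["mpg"] > 40, "mpg"] = 40.0

print(cars["mpg"].tolist())
print(high_mpg["mpg"].tolist())
```

The copy keeps the uncapped values, while the original dataframe now tops out at 40.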
Learn to use pandas nunique() to count columnwise unique values and nlargest() and nsmallest() to rank rows by columns like fare or age in the Titanic data.
Explore summary statistics and accumulations for the Titanic dataset with pandas, using describe, count, mean, std, min, max, percentiles, and correlation to understand distributions and relationships.
Learn how the agg() method performs customized data aggregations on numerical columns in pandas, enabling one-line summaries, multiple statistics, and per-column specifications via a dictionary.
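A minimal sketch of both call styles on invented numbers.

```python
import pandas as pd

# Invented values, only to show the shape of the results
titanic = pd.DataFrame({"age": [22.0, 38.0, 26.0], "fare": [7.25, 71.0, 8.0]})

# A list of function names applies every statistic to every column
summary = titanic.agg(["mean", "max"])

# A dictionary chooses the statistics per column;
# combinations that were not requested show up as NaN
per_column = titanic.agg({"age": ["min", "max"], "fare": ["mean"]})

print(summary)
print(per_column)
```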
Sort the cars data frame by mpg in descending order and rank horsepower. Compute summary statistics and a correlation matrix; weight is the strongest negative correlate of mpg.
Learn to apply user defined functions to pandas dataframes with apply, map, and applymap for row, column, and element-wise operations on real datasets.
Explore hierarchical indexing with multi indices in pandas by transforming a Titanic dataframe using set_index, multi-column keys, and sorting, then resetting and swapping levels.
Explore hierarchical indexing with a Titanic data slice by building a two-level multi-index on Pclass and sex, then use label-based indexing to slice by outer and inner labels and columns.
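A short sketch of a two-level index and label-based access to it. The rows and the lowercase column names are assumptions for illustration.

```python
import pandas as pd

# Invented rows; column names are assumed, not taken from the course file
titanic = pd.DataFrame(
    {
        "pclass": [1, 1, 3, 3],
        "sex": ["female", "male", "female", "male"],
        "age": [38.0, 54.0, 26.0, 22.0],
        "fare": [71.3, 51.9, 7.9, 7.25],
    }
)

# Two-level index; sorting it makes slicing reliable and fast
indexed = titanic.set_index(["pclass", "sex"]).sort_index()

# Outer label only: all rows of first class
first_class = indexed.loc[1]

# Outer and inner label as a tuple, plus a column label
age_value = indexed.loc[(3, "female"), "age"]

# slice(None) means "all outer labels"; here it selects every female row
all_females = indexed.loc[(slice(None), "female"), :]

print(first_class)
print(age_value)
print(all_females)
```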
Explore pandas string operations with the str accessor on series and data frames, applying lower, upper, title, split, and contains to clean and filter the Summer Olympics dataset.
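A brief sketch of these string methods on a made-up series of athlete names.

```python
import pandas as pd

# Made-up names with inconsistent capitalization
athletes = pd.Series(["PHELPS, Michael", "bolt, usain", "Biles, Simone"])

# Vectorized string methods live under the .str accessor
titled = athletes.str.title()

# split() returns lists; .str[0] picks the first piece of each list
surnames = athletes.str.split(",").str[0]

# Normalize case first so the match does not depend on capitalization
is_phelps = athletes.str.lower().str.contains("phelps")

print(titled.tolist())
print(athletes[is_phelps].tolist())
```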
Compute the range of all numerical columns in the cars dataset, create and sort a two-level index by model year and origin, and perform string operations to extract manufacturers.
Plot the Titanic dataframe using the pandas plot method to visualize numerical columns such as survived, Pclass, age, SibSp, parch, and fare with Matplotlib, adjusting subplots and figure size.
Learn to customize a matplotlib line plot by adjusting font size, color, line style, titles, axis labels, legends, grid, and axis limits; explore styles like seaborn and ggplot.
Create histograms of the Titanic age column with pandas, matplotlib, and seaborn to visualize age frequency and adjust bins.
Explore three ways to plot a histogram with pandas and matplotlib, compare bins, labels, and missing value handling, and learn when to use density or cumulative options.
Create bar charts and pie charts from aggregated data using pandas and matplotlib, with seaborn styling, to visualize medal totals by country in the 2012 Olympics.
Create two-dimensional scatter plots to explore relationships between numerical features like age and fare in the Titanic dataset, and personalize visuals with color, markers, and size.
Plot and interpret cars data: line plots for numerical columns, a 40-bin mpg histogram, and a horsepower vs mpg scatter plot colored by cylinders show rising mpg and falling horsepower.
**Now with ChatGPT for Pandas and more than 20 Udemy Online Coding Exercises - NEW Feature**
Welcome to the web's most comprehensive Pandas Bootcamp. This is the only Pandas course you'll ever need:
most comprehensive course with 36+ hours of video content
new AI features like Pandas Coding and Advanced Data Analysis with ChatGPT
150+ Coding Exercises (Online and Offline Exercises)
Practical Case Studies for Data Scientists and Finance Professionals
Fully updated to Pandas 2.2 and already anticipating Pandas 3.x
This course has one goal: Bringing your data handling skills to the next level to build your career in Data Science, Machine Learning, Finance & co. It has five parts:
Pandas Basics - from Zero to Hero (Part 1).
The complete data workflow A-Z with Pandas: Importing, Cleaning, Merging, Aggregating, and Preparing Data for Machine Learning. (Part 2)
Two Comprehensive Project Challenges that are frequently used in Data Science job recruiting/assessment centers: Test your skills! (Part 3).
Application 1: Pandas for Finance, Investing and other Time Series Data (Part 4)
Application 2: Machine Learning with Pandas and scikit-learn (Part 5)
Why should you learn Pandas?
The world is getting more and more data-driven. Data Scientists are gaining ground with $100k+ salaries. It's time to switch from soapbox cars (spreadsheet software like Excel) to highly tuned racing cars (Pandas)!
Python is a great platform/environment for Data Science with powerful Tools for Science, Statistics, Finance, and Machine Learning. The Pandas Library is the Heart of Python Data Science. Pandas enables you to import, clean, join/merge/concatenate, manipulate, and deeply understand your Data and finally prepare/process Data for further Statistical Analysis, Machine Learning, or Data Presentation. In reality, all of these tasks require a high proficiency in Pandas! Data Scientists typically spend up to 85% of their time manipulating Data in Pandas.
Can you start right now?
A question Python beginners frequently ask is: "Do I need to become an expert in Python coding before I can start working with Pandas?"
The clear answer is: "No! Do you need to become a Microsoft Software Developer before you can start with Excel? Probably not!"
You require some Python Basics like data types, simple operations/operators, lists and numpy arrays. In the Appendix of this course, you can find a Python crash course. This Python Introduction is tailor-made and sufficient for Data Science purposes!
In addition, this course covers fundamental statistical concepts (coding with scipy).
In Summary, if you primarily want to use Python for Data Science or as a replacement for Excel, this course is a perfect match!
Why should you take this Course?
It is the most relevant and comprehensive course on Pandas.
It is the most up-to-date course and the first that covers Pandas Version 2.x. The Pandas Library has experienced massive improvements in the last couple of months. Working with and relying on outdated code can be painful.
Pandas isn't an isolated tool. It is used together with other Libraries: Matplotlib and Seaborn for Data Visualization | Numpy, Scipy and Scikit-Learn for Machine Learning, scientific, and statistical computing. This course covers all these Libraries.
ChatGPT for Pandas Coding and advanced Data Analytics included!
In real-world projects, coding and the business side of things are equally important. This is probably the only Pandas course that teaches both: in-depth Pandas Coding and Big-Picture Thinking.
It serves as a Pandas Encyclopedia covering all relevant methods, attributes, and workflows for real-world projects. If you have problems with any method or workflow, you will most likely get help and find a solution in this course.
It shows and explains the full real-world Data Workflow A-Z: from importing messy data, cleaning data, merging and concatenating data, grouping and aggregating data, and Exploratory Data Analysis through to preparing and processing data for Statistics, Machine Learning, Finance, and Data Presentation.
It explains Pandas Coding on real Data and real-world Problems. No toy data! This is the best way to learn and understand Pandas.
It gives you plenty of opportunities to practice and code on your own. Learning by doing. In the exercises, you can select the level of difficulty with optional hints and guidance/instruction.
Pandas is a very powerful tool. But it also has pitfalls that can lead to unintended and undiscovered errors in your data. This course also focuses on commonly made mistakes and errors and teaches you what you should not do.
Guaranteed Satisfaction: Otherwise, get your money back with a 30-day money-back guarantee.
I am looking forward to seeing you in the course!