Data Analysis with Polars

Name: Data Analysis with Polars
Rating: 4.7 (609 reviews)

"A thorough introduction to Polars" - Ritchie Vink, creator of Polars - over 3,000 learners to date!

Created by Liam Brannigan

Last updated 6/2025

English

What you'll learn

Taking advantage of parallel and optimised analysis with Polars
Working with larger-than-memory data
Using Polars expressions for analysis that is easy to read and write
Loading data from a wide variety of data sources
Combining data from different datasets using fast join operations
Grouping and parallel aggregations
Deriving insight from time series
Preparing data for machine learning pipelines
Visualising data with Matplotlib, Seaborn, Altair & Plotly
Using Polars with Scikit-learn

Course content

9 sections • 65 lectures • 4h 46m total length

Course introduction 1:38
Why use Polars instead of Pandas? 4:05
How can you make best use of the course materials? 1:03
Course materials 1:14
Polars quickstart 7:03
Polars quickstart

This video and notebook introduce some of the key concepts that make Polars a powerful data analysis tool.

The key concepts we meet are:
- fast flexible analysis with the Expression API in Polars
- easy parallel computations
- automatic query optimisation in lazy mode
- streaming to work with larger-than-memory datasets in Polars
Lazy mode: Introducing lazy mode 12:40
Lazy mode 1: Introducing lazy mode

Notebook: intro/02-EagerAndLazyPolars.ipynb
By the end of this lecture you will be able to:
- create a `LazyFrame` from a CSV file
- explain the difference between a `DataFrame` and a `LazyFrame`
- print the optimized query plan
Lazy mode: evaluating queries 0:11
Lazy mode 2: evaluating queries
Notebook: intro/03-LazyPolarsEvaluating.ipynb

By the end of this lecture you will be able to:
- trigger evaluation of a `LazyFrame`
- evaluate a `LazyFrame` in streaming mode
- convert a `DataFrame` to a `LazyFrame`

The notebook contains an additional section introducing streaming larger-than-memory queries.
Introduction to Data types 3:22
Data types & Apache Arrow
Notebook: intro/04-DataTypesAndArrow.ipynb

By the end of this lecture you will be able to:
- get the data type schema of a `DataFrame`
- get the data type of a `Series`
- explain the relationship between Polars and Apache Arrow

We cover data types in more detail in the Data Types and Missing Values section of the course.
Series and DataFrame 4:49
`Series` and `DataFrame`
Notebook: intro/05-SeriesAndDataFrame.ipynb
By the end of this lecture you will be able to:
- create a `Series` from a `DataFrame` column
- create a `Series` from a `list`
Converting to and from Pandas & Numpy 8:15
Conversion to & from Numpy and Pandas
Notebook: intro/06-ConversionPandasNumpy.ipynb
By the end of this lecture you will be able to:
- convert between Polars and Numpy
- convert between Polars and Pandas
Visualisation 19:38
By the end of this lecture you will be able to:
- create charts from Polars with the built-in plot method
- configure charts created with the built-in plot method
- create charts with external plotting libraries such as Matplotlib, Seaborn, Plotly and Altair
Lazy mode

Filtering rows 1: Filtering rows with square brackets 3:09
Filtering rows 1: Indexing with []
Notebook: filtering_rows/01-SelectingRows.ipynb

By the end of this lecture you will be able to:
- select single rows with [] indexing
- select multiple rows with [] indexing
Filtering rows 2: Using `filter` and the Expression API 8:24
Filtering rows 2: Using `filter` and the Expression API
Notebook: filtering_rows/02-SelectingRowsFilter.ipynb

By the end of this lecture you will be able to:
- select rows with the `filter` method
- partition a DataFrame
- explain the difference between `[]` and `filter`
Filtering rows 3: using `filter` in lazy mode 5:48
Filtering rows 3: using `filter` in lazy mode
Notebook: filtering_rows/03-SelectingRowsLazy.ipynb

By the end of this lecture you will be able to:
- use `filter` in lazy mode
- understand the optimized and non-optimized query plans
- combine multiple conditions in lazy mode
Filtering rows based on values from another DataFrame 0:05
Filtering rows

Selecting columns 1: using square brackets 3:07
Selecting columns 1: using `[]`
Notebook: selecting_transformations/01-SelectingColumnsSquareBrackets.ipynb

By the end of this lecture you will be able to:
- select a column or columns with `[]` indexing
- select rows and columns with `[]` indexing
Selecting columns 2: using select and expressions 5:06
Selecting columns 2: using `select` and expressions
Notebook: selecting_transformations/02-SelectingColumnsSelect
By the end of this lecture you will be able to:
- select a column or columns with `select`
- transform a column while selecting it
- select a column in lazy mode
Selecting columns 3: choosing multiple columns 6:49
Selecting columns 3: selecting multiple columns
Notebook: 03_selecting_transformations/03-SelectingColumnsMultipleColumns.ipynb

By the end of this lecture you will be able to:
- select and exclude columns using the expression API
- select and exclude columns using the selectors API
Selecting columns 4: transforming and adding columns 4:14
Selecting columns 4: Transforming and adding a column
Notebook: selecting_transformations/04-AddingANewColumn.ipynb

By the end of this lecture you will be able to:
- transform an existing column in place using `with_columns`
- add a new column with an expression
- add a new column with column arithmetic
- add a column with constant values using `pl.lit`
Selecting columns 5: Transforming and adding multiple columns 0:09
Selecting columns 5: Transforming and adding multiple columns
Notebook: selecting_transformations/05-AddingMultipleColumns.ipynb

By the end of this lecture you will be able to:
- transform multiple columns in-place
- add multiple columns
- transform and add multiple columns based on dtype
Selecting columns 6: Adding a column based on a condition or mapping 5:49
Adding a new column based on a condition or mapping from an existing column
Notebook: selecting_transformations/06-AddANewColumnConditionally.ipynb

In this lecture we learn how to:
- add a new column with a dict mapping from an existing column
- add a new column with an `if-else` condition using `pl.when`
- add a new column with a condition on multiple columns
- add a new column with multiple `if-elif` conditions
Sorting and fast-track algorithms 4:11
Sorting and fast-track algorithms
Notebook: selecting_transformations/07-Sorting.ipynb
By the end of this lesson you will be able to:
- sort a `DataFrame`
- sort a column with an expression
- take advantage of fast-track algorithms `set_sorted`

In this lecture we learn how to sort both on a `DataFrame` and within an expression. We also introduce the fast-track algorithms on sorted data. The fast-track algorithms are optimisations that cannot be included as part of the built-in query optimiser, so we show how to trigger them on simple problems here.
Transforming a DataFrame 5:06
Transforming a `DataFrame`

In this lecture you will learn how to:
- re-order the columns of a `DataFrame` using `pipe`
- rename columns from a `DataFrame`
- transform a `DataFrame` in a function using `pipe`
Iterating through a DataFrame 0:12
Iterating through a DataFrame
New lecture - download the notebook from this lecture if it was not included in your single download

By the end of this lecture you will be able to:
- iterate through a column row-by-row
- iterate through multiple columns row-by-row
- understand the performance effect of the different options
Selecting columns
Adding new columns
Adding a new column

Missing values 4:52
Missing values
By the end of this lecture you will be able to:
- identify missing values in a `DataFrame`
- count the number of missing values in a column
- find and drop `null` or non-`null` values
Replacing missing values 6:16
Replacing missing values
By the end of this lecture you will be able to:
- replace missing values with a constant
- replace missing values with a strategy
Replacing missing values with expressions 5:28
Replacing missing values with expressions

By the end of this lecture you will be able to:
- replace missing values with an expression on the same column
- replace missing values based on other columns
- replace missing values with interpolation
Missing values
Numerical dtypes and precision 0:20
Numerical dtypes and precision

By the end of this lecture you will be able to:
- get the upper and lower bounds you can represent at a given precision
- estimate the size of a `DataFrame` in memory
- compare the effect of working with 32-bit and 64-bit representations

In this lecture we examine the effect of varying the numerical precision on computational speed, memory usage and precision. In some use cases this can be a simple way of improving performance and reducing memory usage.
Introducing categorical data 4:42
Introducing categorical data

By the end of this lecture you will be able to:
- convert from string to categorical dtype
- get the integer mapping values
- sort categorical data
Categoricals and the string cache 4:42
Categoricals and the string cache
By the end of this lecture you will be able to:
- filter a categorical column
- coordinate categorical mappings across objects with the string cache

We introduce the string cache here but we will see that it is essential when combining `DataFrames` with categorical columns. We will see that using categoricals can lead to much faster join operations than with strings.
Introduction to nested dtypes: List, Struct and Object 0:09
Introduction to nested dtypes: List, Struct and Object

By the end of this lesson you will be able to:
- create columns with List, Struct and Object dtypes
- explain the difference between the List, Struct and Object dtypes
- unnest the fields in a Struct dtype
List dtype 1: Creating and transforming List columns 10:23
List dtype 1: Creating and transforming List columns
By the end of this lecture you will be able to:
- select `pl.List` columns
- explode each `pl.List` row into its own row
- convert a `pl.List` column to a `pl.Struct` or `pl.DataFrame`
- convert a `pl.List` column to a Numpy array
List dtype 2: using expressions on List columns 8:37
Text transformation 0:09
Transforming text data
By the end of this lecture you will be able to:
- modify text data in Polars
- split text data
- merge text columns to create a new column
Nested dtypes

Statistics 8:54
Statistics
Notebook: statistics_aggregation/01-Statistics.ipynb
By the end of this lecture you will be able to:
- calculate statistics on a `DataFrame` or expression
- calculate cumulative, rolling and exponentially-weighted statistics
- calculate horizontal statistics
Value counts 4:20
Value counts
Notebook: statistics_aggregation/02-ValueCounts.ipynb

By the end of this lecture you will be able to:
- count occurrences in a column with `value_counts`
- create a bar chart of the outputs
- use `value_counts` in an expression
- use `value_counts` in lazy mode
Groupby 1: Key concepts 12:06
Notebook: 05_statistics_aggregation/03-GroupbyObject.ipynb
By the end of this lecture you will be able to:
- do a group by-aggregation
- group by multiple columns
- group by expressions
- sort group by outputs
- use group by in lazy mode
- do fast-track grouping on a sorted column
Groupby 2: Iterating and group values 9:15
Groupby 2: Group iteration and aggregations
Notebook: 05_statistics_aggregation/04-GroupbyAggregations.ipynb
By the end of this lecture you will be able to:
- iterate over groups
- get group values
- do multiple aggregations
- apply user-defined functions on aggregations
Counting values
Working with GroupBy and groups
Grouping and aggregations
Quantiles and histograms 8:51
Quantiles
Notebook: statistics_aggregation/05-Quantiles.ipynb

By the end of this lecture you will be able to:
- calculate a quantile on a `DataFrame`
- calculate multiple quantiles efficiently
- create and visualise a histogram
Introduction to group operations with over() 0:10
Introduction to group operations

By the end of this lecture you will be able to:
- do group operations by a single column
- do group operations by multiple columns
- calculate percentage breakdowns within groups
- cache group operations with the query optimiser
Pivot & melt 11:02
Pivoting and melting
Notebook: 05_statistics_aggregation/09-PivotMelt.ipynb
By the end of this lecture you will be able to:
- make a `DataFrame` wide with `pivot`
- make a `DataFrame` long with `melt`

Concatenating DataFrames 12:42
Concatenation
Notebook: 06_joins_and_concats/01-Concatenation.ipynb

By the end of this lecture you will be able to:
- vertically concatenate a list of `DataFrames`
- horizontally concatenate a list of `DataFrames`
- diagonally concatenate a list of `DataFrames`
Concatenating DataFrames 0:07
Concatenation

By the end of this lecture you will be able to:
- vertically concatenate a list of `DataFrames`
- horizontally concatenate a list of `DataFrames`
- diagonally concatenate a list of `DataFrames`
Joins 17:43
Joins

By the end of this lecture you will be able to:
- do left, inner, cross and outer joins
- validate joins
- do fast-track joins on sorted columns
Joins on string, categorical and enum columns 4:24
Joins on string, categorical and enum columns

By the end of this lecture you will be able to:
- join on categorical columns
- join on enum columns
- assess whether joins on categorical or enum columns would be faster
Filtering a DataFrame by another DataFrame 2:40
By the end of this lecture you will be able to:
- filter a `DataFrame` to include values present in another `DataFrame`
- filter a `DataFrame` to include values not present in another `DataFrame`
- compare the performance of these `join` operations with a `filter` operation
Inequality joins 0:05
In this notebook you will learn how to:
- do asof joins on nearest neighbours
- do inequality joins based on one or more conditions

Read a single CSV file 11:53
By the end of this lecture you will be able to:
- set the column names when reading a CSV file
- parse a CSV file
- specify a dtype schema when reading a CSV file
- modify CPU and memory usage when reading a CSV file
Excel files 10:48
Excel files
By the end of this lecture you will be able to:
- read from an Excel file
- pass arguments to the XML parser
- pass arguments to the CSV parser
Read JSON and serialize a DataFrame 7:28
Read JSON and serialize a DataFrame
By the end of this lecture you will be able to:
- read JSON
- read & scan newline delimited JSON
- serialize and deserialize a DataFrame while keeping type information
CSV files 3: reading larger-than-memory CSV files in batches 0:22
CSV files 3: reading larger-than-memory CSV files in batches
By the end of this lecture you will be able to:
- read larger-than-memory datasets with batching

In the coming lectures we see how to process larger-than-memory datasets using *streaming*. Streaming is the better option because Polars takes care of the batching and has algorithms to combine the chunks correctly for many operations such as groupbys and joins.

We cover manual batching in this lecture to allow you to:
- understand how Polars carries out streaming underneath the hood
- create your own custom batching algorithms
CSV files 4: streaming larger-than-memory datasets 0:14
CSV files 4: streaming larger-than-memory datasets
By the end of this lecture you will be able to:
- process larger-than-memory datasets from CSVs with streaming

With streaming Polars can process a full query on a larger-than-memory dataset by:
- reading each CSV file in batches
- adapting its standard operations to work on batches instead of the full dataset at once
Parquet files 1: single Parquet files 0:10
Parquet files 1: Single Parquet files
By the end of this lecture you will be able to:
- read from a Parquet file
- use query optimisation to read a subset of columns
- get the schema of a Parquet file
- write a Parquet file with compression
Reading from a database 0:08

Introduction to time series dtypes 0:09
Introduction to time series dtypes

By the end of this lecture you will be able to:
- create a datetime series with `pl.date_range`
- explain the difference between Polars datetime dtypes
- extract the integer representation underlying datetime dtypes
Time zones 0:16
Working with time zones

By the end of this lecture you will be able to:
- add a time zone to a datetime
- change the time zone
- explain the use case of the different time zone functions
- get the time difference between time zones

Working with time zones can be tricky. In this lecture we break it down to understand how the different time zone functions work.
Time zones quiz
Parsing datetime strings 0:07
Parsing datetime strings

By the end of this lecture you will be able to:
- parse datetime strings from a CSV file
- convert datetime strings into time series dtypes
Adjusting datetimes 0:08
Adjusting datetimes

By the end of this lecture you will be able to:
- add an offset to a datetime
- truncate a datetime to the start of an interval
- round a datetime to an interval
Parsing and adjusting datetimes quiz
Extracting datetime components 0:09
Extracting datetime components

By the end of this lecture you will be able to:
- extract date components from a datetime dtype
- extract week-of-year and day-of-year from a datetime dtype
- extract time components from a datetime dtype
Filtering time series 0:06
Filtering time series
By the end of this lecture you will be able to:
- filter by datetimes
- filter by a date range
- filter on a duration
Temporal groupby - introduction to groupby_dynamic 0:07
Introduction to `groupby_dynamic`

By the end of this session you will be able to:
- do groupby and aggregations using `groupby_dynamic`
- use `groupby_dynamic` on multiple columns
- use `groupby_dynamic` in lazy mode
Controlling the `groupby_dynamic` window 0:08
The `groupby_dynamic` window

By the end of this lecture you will be able to:
- set the frequency, length and offset of windows
- control the closure of windows
- set the display value for each window

Visualisations with Plotly 0:10
Visualisation with Plotly

By the end of this lecture you will be able to:
- work with Plotly via Pandas or directly from Polars
- create bar, row, grouped bar and scatter charts with Plotly
- create time series charts with Plotly
Visualisations with Matplotlib 0:16
Visualisation with Matplotlib

By the end of this lecture you will be able to:
- work with Matplotlib directly from Polars
- create bar, row, grouped bar and scatter charts with Matplotlib
- create time series charts with Matplotlib

Note: the examples in this notebook follow the examples in the Plotly notebook adapted for Matplotlib
Visualisations with Seaborn 0:07
Visualisation with Seaborn
By the end of this lecture you will be able to:
- work with Seaborn directly from Polars and via Pandas
- create a range of charts with Seaborn

Requirements

Computer with Windows/Linux/MacOS and a Python installation

Description

In this course I show you how to take advantage of Polars - the fast-growing open source dataframe library that is becoming the go-to dataframe library for data scientists in Python. I am a Polars contributor with a focus on making Polars accessible to new users, and I keep this course up to date with new releases of Polars. It is currently updated to version 1.30.0.

"A thorough introduction to Polars" - Ritchie Vink, creator of Polars

"Thank you for your great work with this course - I've optimized some code thanks to it already!" Maiia Bocharova

The course is for data scientists who have some familiarity with a dataframe library like Pandas but who want to move to Polars because it is easier to write and faster to run. The core materials are Jupyter notebooks that examine each topic in depth. Each notebook comes with a set of exercises to help you develop your understanding of the core concepts. For many key topics this course is the only source of documentation for learners and comes from my time examining the Polars source code.

An important note about videos: this is primarily a notebook course and not a video course. Not all of the lectures have videos, and some of the videos may have components that are not up to date. Why? Because the Polars API has changed too often to allow me to keep the videos up to date. Instead I focus on keeping the notebooks up to date with an extensive automated testing system that alerts me to changes in the API. I release an updated version of the course about twice a month in response to changes in Polars.

The course introduces the syntax of Polars and shows you the many ways that Polars allows you to produce queries that are easy to read and write. However, the course also delves deeper to help you understand and exploit the algorithms that drive the outstanding performance of Polars.

By the end of the course you will have optimised ways to:

load and transform your data from CSV, Excel, Parquet, cloud storage or a database
run your analysis in parallel
understand optimal patterns for building queries
work with larger-than-memory datasets
carry out aggregations on your data
combine your datasets with joins and concatenations
work with nested dtypes including lists and structs
optimise the speed and memory usage of your queries
work with string and categorical data
visualise your outputs with Matplotlib, Seaborn, Plotly, hvPlot & Altair
prepare your data for machine learning pipelines with sklearn

Who this course is for:

Data scientists who have no familiarity with Polars and want to get up and running
Data scientists who have some familiarity with Polars but want a deeper understanding
Pandas or other dataframe library users

Course content

Up and running with Polars • 11 lectures • 1hr 4min

Filtering rows • 4 lectures • 17min

Selecting columns and transforming dataframes • 9 lectures • 35min

Data types and missing values • 10 lectures • 46min

Grouping and aggregation • 7 lectures • 55min

Combining dataframes • 6 lectures • 38min

Input/Output • 7 lectures • 31min

Time series analysis • 8 lectures • 1min

Nested dtypes • 3 lectures • 1min
