Data Analysis with Polars and Python

Rating: 4.7 (24 reviews)

Master data analysis with the powerful Polars library! Up-to-date for 2026. All datasets included; beginners welcome!

Created by Boris Paskhaver

Last updated: 02/2026

English

What you'll learn

Master data manipulation operations in Polars including sorting, filtering, grouping, pivoting, joining and more!
Understand Polars' functional, expression-based syntax for building up complex chains of logic
Use LazyFrames to create complex query plans that Polars can optimize for efficiency
Work with a variety of data including text, temporal, numeric, nested structures, and more

Coding exercises

This course includes our updated coding exercises so you can practice your skills as you learn.

Course content

18 sections • 191 lessons • Total length 22 h 12 min

Welcome to Polars • 8:45
Welcome to Data Analysis with Polars in Python. Polars is a data analysis library written in the Rust programming language with support for Python bindings. In this lesson, we introduce the features and functionalities of the library. We also discuss our setup steps which involve installing the uv Python manager, downloading the data sets, reviewing Python, and then getting started with Polars.
Download Course Materials (Datasets and Jupyter Notebooks) • 0:21
Download the datasets and Jupyter Notebooks for the course from GitHub.
[macOS] Intro to Terminal • 7:12
Welcome to a sequence of videos dedicated to installing Python and Polars on a macOS computer. First up, we'll need to get acquainted with the Terminal, a command-line interface where the user can issue text commands to the operating system. We practice with the pwd, ls, and cd commands.
[macOS] Install uv, a Python package and project manager • 5:10
In this lesson, we install the uv command-line program for managing Python projects. uv will help us download Python, Polars, and Jupyter Lab (our coding environment).
[macOS] Download Course Materials and Setup Project • 3:11
In this lesson, we download the Jupyter Notebooks and datasets from the course's public repository on GitHub. We also use the uv sync command to set up Python, Polars, JupyterLab, and all other project dependencies within our course folder.
[Windows] Intro to PowerShell • 8:04
Welcome to a sequence of videos dedicated to installing Python and Polars on a Windows computer. First up, we'll need to get acquainted with PowerShell, a command-line interface where the user can issue text commands to the operating system. We practice with the pwd, ls, and cd commands.
[Windows] Install uv, a Python package and project manager • 4:14
In this lesson, we install the uv command-line program for managing Python projects. uv will help us download Python, Polars, and Jupyter Lab (our coding environment).
[Windows] Download Course Materials and Setup Project • 4:29
In this lesson, we download the Jupyter Notebooks and datasets from the course's public repository on GitHub. We also use the uv sync command to set up Python, Polars, JupyterLab, and all other project dependencies within our course folder.
Jupyter Lab Startup and Shutdown • 7:03
In this lesson, we discuss the startup and shutdown process for JupyterLab. We execute the uv run jupyter-lab command from the Terminal. We work within a Jupyter Notebook, then save our work, then shut down the Python kernel for the Notebook, and finally close the Jupyter Lab server.
Intro to Jupyter Lab • 12:10
In this lesson, we introduce the interface of JupyterLab, including how to create cells, delete cells, execute cells, restart the kernel, and more.
Setting Up Ruff Formatter in Jupyter Lab • 2:16
In this lesson, we configure our Jupyter Lab settings to run the Ruff formatter upon every cell's execution. Ruff will format the code to ensure a consistent (and pretty!) aesthetic standard for our Python code.
Import Libraries into Jupyter Lab • 4:04
Use the import keyword to bring libraries like Polars into the Jupyter notebook. We can assign an alias (an alternate name) to a library with the as keyword. The popular community convention for Polars is pl.
Quiz

Comments • 7:03
A comment is a line of code ignored by the Python interpreter. We create a comment with a hash symbol (#), which effectively disables the line. Developers use comments to provide documentation, metadata, diagrams, and more.
Data Types • 10:19
In this lesson, we review the primitive data types in Python: integers, floating-point numbers, strings, Booleans, and the None object. We also introduce operators, symbols that perform operations on the values.
Operators • 12:28
In this lesson, we expand our study of operators by introducing various symbols for addition, subtraction, multiplication, two types of division, exponentiation, modulo, and more.
Equality and Inequality Operators • 8:53
The equality operator (==) compares whether two values are equal/identical. The complementary inequality operator (!=) confirms that two values are not equal. In this lesson, we practice deriving Booleans from various equality comparisons.
Variables • 7:01
A variable is a name for a value in the program. It serves as a placeholder for the value that also provides context on what the value represents. In this lesson, we declare variables and reassign values to them.
Declare Variables
Built-In Functions • 12:03
A function is a reusable procedure, a sequence of steps that execute in order. It can accept inputs (parameters) and produce an output (return value). In this lesson, we invoke some of Python's top-level functions including len, int, float, and type.
Built-In Functions
Custom Functions • 15:27
Custom functions can encapsulate reusable business logic. In this lesson, we define a convert_to_fahrenheit function that converts a Celsius temperature to Fahrenheit. We define a parameter and produce a return value.
Custom Functions
String Methods • 18:23
A method is a function attached to an object. An object is just a data value in our program. We invoke methods with a dot, the method name, and a pair of parentheses. Like functions, methods can accept arguments and produce a return value. In this lesson, we practice with string methods like upper, lower, strip, and startswith. We also introduce the in keyword to check for inclusion.
String Methods
Lists • 13:22
A list is a mutable data structure for storing elements in order. We declare it with a pair of square brackets ([]). The length of a list is a count of its elements. In this lesson, we instantiate lists and practice adding and removing elements from them.
Creating Lists
Index Positions and Slicing • 17:51
An index position is the numeric position that Python assigns to each element within a list. The index starts counting from 0. In this lesson, we practice extracting elements/characters from lists/strings using their index position.
Index Positions and Slicing
Tuples • 7:40
A tuple is an immutable collection of zero or more elements in sequence. We declare tuples by separating multiple values with commas. The community convention is to wrap the tuple in parentheses.
Dictionaries • 13:37
A dictionary is a collection of key-value pairs. A key serves as a unique identifier for a value. In this lesson, we practice creating dictionaries, as well as reading and writing key-value pairs.
Creating Dictionaries
Classes and Objects • 6:32
A class is a blueprint for creating one or more objects (digital data structures). We provide an analogy for blueprints and houses in the real world. We also discuss how Python offers shortcuts for creating common objects like lists, dictionaries, and strings.
Importing Modules • 11:02
A module is a Python file that holds code (classes, functions, constants, etc). Developers use modules to organize related code. We can import modules into our Jupyter Notebook with the import keyword to gain access to additional functionalities within the language. In this lesson, we practice bringing in functionality from the datetime module, which has classes for working with temporal data (datetimes, dates, etc).
Importing Libraries • 3:36
The import keyword can import both modules and libraries within the Python ecosystem. In this lesson, we import the polars library for data analysis and assign it the alias pl.
Unsigned and Signed Integers • 11:02
In this lesson, we discuss the different types of integers available in Rust, including unsigned integers (zero or positive values) and signed integers (negative or positive values). We also introduce the smallest unit of memory, the bit.
Quiz
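
As a rough illustration of the Custom Functions lesson in this section, the sketch below defines a convert_to_fahrenheit function with one parameter and a return value. The formula is the standard Celsius-to-Fahrenheit conversion; the parameter name and the exact body are assumptions, not the course's own code.

```python
def convert_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    # Standard conversion: multiply by 9/5, then add 32.
    return celsius * 9 / 5 + 32


# Invoke the function with an argument and capture its return value.
boiling_point = convert_to_fahrenheit(100)
print(boiling_point)  # 212.0
```

The parameter (celsius) is the input and the returned number is the output, so the same logic can be reused for any temperature.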

Import the Polars Series • 5:20
In this section, we'll start exploring the Series, a one-dimensional column of ordered, homogeneous data (data of the same type). We kick things off by importing Polars and exploring how we can see its version.
Create a Series • 10:23
Let's create a few Series! In this lesson, we use the pl.Series constructor to accept a list of values and instantiate a Chips Series. We also explore the name and values parameters to customize the Series name and the source of data.
Data Type Inference • 10:10
Polars can be strict in its analysis of data. In this lesson, we use the dtype parameter of the Series constructor to customize the data type of the Series' values. We introduce the data types available at the top level of the Polars library. We also use the strict parameter to ask Polars to be more permissive with mismatches in data types.
Attributes • 3:50
An attribute is a piece of data/information that lives on an object. We access an attribute with dot syntax, then the attribute name. In this lesson, we access some Series attributes including name, dtype, and shape.
Missing Values • 6:43
Polars uses a null value to represent missing data. null is the equivalent of Python's None or pandas's NaN. Most operations on null values will produce null values. We also discuss the differences between null and not a number (NaN) values.
The alias Method • 4:25
A method is a functionality available on an object. We invoke a method with a dot, the method name, and a pair of parentheses. Methods can accept arguments and produce a return value. In this lesson, we use the alias method to rename a column/Series.
Import a CSV File with the read_csv Function • 11:09
It's time to bring in some data from the outside world! In this lesson, we use the read_csv function to bring in a comma-separated values (CSV) file of data. We also discuss how a CSV text file stores its data. Polars imports a CSV as a 2-dimensional DataFrame, so we also discuss how to convert it to a Series.
The head and tail Methods • 7:07
In this lesson, we introduce the head and tail methods to extract a specified number of rows from the beginning and end of a Polars data structure.
Memory Optimization and the schema_overrides Parameter • 11:26
In this lesson, we introduce the schema and schema_overrides parameters to the read_csv function. They allow the developer to customize the inferred data types of columns. Both parameters accept dictionary arguments. The schema parameter requires a complete mapping of columns to desired data types; the schema_overrides parameter only needs the columns that will replace Polars' default inferred types.
Sorting a Series • 6:36
In this lesson, we use the sort method to sort a Series in both ascending and descending order. We discuss how Polars sorts columns of numeric values vs. strings. Most methods in Polars will return a new object rather than mutate the existing one.
Mathematical Methods • 6:48
In this lesson, we introduce common Series methods for mathematical operations including sum, mean, len, count, null_count, max, min, product, and more.
Rounding Methods • 4:02
In this lesson, we introduce several Series methods for rounding including ceil (round up), floor (round down), and round (round up or down depending on proximity).
How Polars Differs from Pandas • 12:33
For data analysts experienced with Pandas, this lesson offers a comparison between the Series object in Pandas vs. Polars.
Quiz

Intro to DataFrames • 3:09
A DataFrame is a 2-dimensional table consisting of rows and columns. In this lesson, we discuss its basic mechanics with a simple example.
Create a DataFrame from Scratch • 2:55
In this lesson, we instantiate a DataFrame from scratch by passing a dictionary to the constructor. The keys serve as column names, and the values are lists of data to populate the columns.
Read a DataFrame from CSV • 7:06
In this lesson, we review the pl.read_csv function to read in a DataFrame from a CSV file. We practice shared methods like head and tail, then introduce DataFrame-specific attributes like columns and dtypes.
No Index, No Problem • 4:27
Unlike in Pandas, a Polars DataFrame does not have an index. In this lesson, we show how to create an index from scratch if you'd like an experience closer to Pandas.
Intro to Expressions • 6:02
An expression is a building block, a step in a reusable computation that will be executed at a later point. In this lesson, we build up two expressions with the pl.col function. The first targets a column, and the second calculates the mean of its values.
The select Method I • 6:25
The select method executes one or more expressions, returning the computed columns in a new DataFrame. In this lesson, we pass the expression objects from the previous lesson to select and observe the results!
Renaming Columns • 8:42
In this lesson, we get reacquainted with the alias method to rename a column. The alias method is critical because the select method will throw an error if multiple columns end up with the same name.
The select Method II • 11:13
In this lesson, we explore the variations of the arguments we can pass to the select method. We expand on the pl.col syntax to show the different inputs it can accept including multiple strings and lists of strings.
The select Method III: Targeting by Data Type • 4:12
The pl.col function is even more flexible than we thought! In this lesson, we target DataFrame columns by their data types.
Expressions as Building Blocks • 5:40
An expression is not coupled to a specific DataFrame or a data type. In this lesson, we apply the same expression to two different DataFrames to prove this point.
Expressions that Count Values • 7:21
In this lesson, we introduce methods for counting the number of present and missing values within a column. We also show off the helpful describe method for generating a summary of various stats about the DataFrame.
Extracting One or More Rows • 6:15
In this lesson, we practice using the item and slice methods to pull out one or multiple rows from the DataFrame. We also show off the slice method's flexibility with negative values!
List Slicing Syntax • 6:17
A Polars DataFrame supports Python's list slicing syntax (although the development team advises against it!). In this lesson, we review the syntax option as well as the shortcuts to pull from the beginning of the DataFrame and to the end of the DataFrame.
Expressions that Target Row Values • 5:46
In this lesson, we introduce a complementary approach, the get method on an expression, to extract the row values for one or more columns at a specific row index.
Extracting a Single Value from DataFrame with the item Method • 3:31
If you're looking to locate a value at the numeric intersection of a row and column, the item method can help you accomplish that! We'll learn about it in this lesson.
Extracting Rows by Index Positions with the gather and gather_every Methods • 3:58
In this lesson, we use the gather method to pull out multiple rows by index position and the gather_every method to pull out rows at a consistent interval.
Extracting a Random Set of Values • 2:30
The sample method extracts a random collection of rows from the DataFrame. In this lesson, we practice using it to target both a fixed number of rows and a total percentage of rows.
Casting Columns to Different Types • 6:43
In this lesson, we use the cast method to convert the values in a column from one data type to another. We review how choosing a smaller numeric data type can reduce the memory footprint of the data structure.
Customizing the DataFrame Schema • 5:55
In this lesson, we review the schema_overrides and schema parameters to customize the data types of the columns in a DataFrame. Both parameters accept dictionaries, but schema requires the complete mapping of columns to types while schema_overrides only needs the columns where we want to replace Polars' default inference.
Renaming Columns • 3:50
In this lesson, we use both the alias and rename methods to name columns within a new DataFrame. We also discuss how to alter column names at the point of dataset import.
The name Attribute • 5:16
Polars nests additional expression methods under attributes/namespaces. The name attribute/namespace holds methods for adjusting column names. In this lesson, we introduce methods for changing the casing of column names and concatenating a string to the beginning/end of each column name.
Dropping Columns • 3:17
In this lesson, we use the drop method to remove one or more columns from a DataFrame. We also review the different options for creating an expression targeting multiple columns.
Replacing Values • 3:24
In this lesson, we learn about the replace method, which swaps the values in a column. We demonstrate two syntax options, parameters and dictionaries, to specify old values and replacement values.
Mathematical Operations I • 10:16
Time to math! In this lesson, we tackle the symbols and methods for common mathematical operations including addition, subtraction, division, multiplication, exponentiation, and remainder. We also learn the equals method for comparing the equality of DataFrames.
Mathematical Operations II • 3:56
We can use multiple columns within an expression! In this lesson, we calculate the product of elements across two columns. We also discuss how Polars handles types in calculations.
Cumulative Mathematical Operations • 6:59
In this lesson, we introduce a family of methods for cumulative operations (tallying the value up until the current row in the DataFrame).
The with_columns Method • 7:30
The with_columns method creates a new DataFrame that keeps all existing columns and adds new columns from the expressions on the right side. It allows us to keep our old work and expand on it with new calculations. This is a powerful method!
The all and exclude Functions • 5:15
In this lesson, we review two top-level Polars functions for creating expressions that target multiple columns: all for targeting all columns and exclude for targeting all columns except for the ones specified.
Quiz

The fill_null Method • 11:47
Polars uses the null keyword to represent a missing value. In this lesson, we cover the first strategy for dealing with null values: replacing them! We use both constants and forward/backward strategies to populate the missing data. Don't miss out!
Interpolation • 5:02
Interpolation replaces missing values using linear interpolation, which draws a straight line between two values and fills in the gaps along that line. It's a convenient way to fill missing gaps in data that follows a linear pattern.
Dropping Missing Data • 10:42
The other option for missing data is to remove it entirely. The drop_nulls method removes null values from the target column. We talk about the limitations of the method when combined with the with_columns method.
Sorting by a Single Column • 11:35
Sorting changes the order of rows based on one or more columns' values. In this lesson, we invoke the sort method to sort columns with a variety of different data types.
Sorting by Multiple Columns I • 6:33
We may want to sort within a group of equal values. In this lesson, we show how to sort a DataFrame by multiple columns. We also pass the descending parameter to customize the sort order per column.
Sorting by Multiple Columns II • 7:13
In this lesson, we expand on the sorting concepts by passing the descending parameter to customize the sort order per column. Unlike Pandas, Polars requires a list with a length equal to the number of sorted columns.
Characters vs Bytes • 9:41
Length can be tricky! In this lesson, we discuss the differences between characters and bytes. We also introduce the complementary len_bytes and len_chars methods underneath the str namespace.
Sorting based on Expressions • 7:28
We can sort a column using the results of another expression! In this lesson, we sort a DataFrame column using the lengths of its string values.
The top_k and bottom_k Methods • 3:53
top_k and bottom_k are convenience methods to extract a specific number of rows with the largest/smallest values in a given column.
The rank Method • 4:49
The rank method assigns each row value a position in line based on its numeric ranking.
The shuffle Method • 5:05
The shuffle method randomizes the order of elements in a column. If you call this a totally random lesson, you'd be right!
Counting and Extracting Unique Values • 6:17
In this lesson, we learn the n_unique method for counting the number of distinct values in a column and the unique method to pull them out. We explore these methods on expressions as well as the top-level Polars library.
The value_counts Method • 8:24
The value_counts method counts the number of occurrences of each unique value. It returns a column of structs, a data structure consisting of key-value pairs that is comparable to a Python dictionary. We discuss how to extract the struct's contents into separate columns.
Quiz

Introducing the Dataset • 4:10
In this lesson, we introduce the coffee_sales dataset that we'll use throughout this section. It is a collection of transactions from a coffee chain with a wide variety of data types.
The filter Method • 10:06
This lesson introduces the filter method, which is the primary way to extract rows that satisfy a condition. We'll explore how the filter method relies on expressions that produce Boolean values.
Filtering with Mathematical Operators • 5:31
This lesson walks through filtering rows using mathematical comparison operators like >, <, ==, and !=. Polars applies the logical comparison operation on every row value to produce a Boolean column.
Filtering with Missing Values • 8:25
This lesson covers how to filter rows based on missing data. You’ll learn to use methods like is_null and is_not_null instead of direct equality checks with Python's None object.
Filtering with Boolean Columns • 2:33
In this lesson, we filter directly on Boolean columns. This pattern is common when working with precomputed flags or conditions stored in the DataFrame.
Applying And Logic (Multiple Boolean Expressions) • 12:26
In this lesson, we combine multiple filter conditions using the logical AND operator ( & ). You’ll see how chaining conditions allows for more precise row selection.
Keyword Argument Filtering • 3:24
This lesson shows an alternative approach for filtering using keyword arguments. It provides a concise alternative when filtering on exact column values (although it is not recommended by the Polars team!).
Applying Or Logic • 5:57
In this lesson, we introduce the complementary OR operator ( | ) for filtering rows that match one of several conditions.
Operator Precedence • 5:28
This lesson explains how operator precedence affects filter expressions. You’ll learn when parentheses are required to ensure Polars evaluates conditions correctly.
Applying Exclusive OR (xor) Logic • 3:04
The next operator in line is XOR ( ^ ), which ensures that one condition is true but not the other one.
Filtering for Unique and Duplicate Values • 6:20
This lesson covers filtering rows based on uniqueness or duplication. Polars identifies a value as a duplicate if it occurs more than once in the column.
Filtering with Datetimes • 9:07
In this lesson, we filter rows using datetime/temporal values. We use Python's native datetime module to create the datetime objects to compare row values against.
The is_between Method • 7:19
This lesson introduces is_between, which simplifies filtering for values that fall within a range. The method accepts the lower and upper bounds of the interval, which are both inclusive by default.
The is_in Method • 3:04
The is_in method filters rows based on inclusion in a list of values. The method offers a shortcut to declaring multiple OR conditions.
The remove Method • 6:19
This lesson shows off the remove method, which excludes rows that match a given condition. It’s conceptually the inverse of the filter method.
Negation with Tilde Symbol • 7:27
In this lesson, we apply logical negation using the tilde (~) operator. Trues become Falses, and Falses become Trues. The operator allows you to elegantly invert filter conditions.
When, Then, Otherwise • 7:16
In this lesson, we introduce conditional logic using the when, then, and otherwise methods. These methods are designed to be chained in sequence and they model the if/else if/else paradigm from programming languages.
Partitioning DataFrames • 10:17
This final lesson shows how to partition/split a DataFrame into multiple subsets based on a filter condition. The partition method is a nice prerequisite to the groupby object that we'll introduce later in the course.
Quiz

Introducing the Datasets • 5:44
Welcome to the Joins section. A join merges two DataFrames based on shared values across specified columns. In this lesson, we demonstrate the datasets for our fictional movie streaming service and discuss their logical relationships.
Inner Joins • 13:23
An inner join matches rows with equal values in both DataFrames. Polars will exclude a key if it does not exist in the other DataFrame. In this lesson, we join the users and watch_history DataFrames, looking for the user IDs that are found in both tables.
The on Parameter • 2:54
The on parameter specifies the column whose values will be compared across the two DataFrames to be joined. In this lesson, we cover the complementary left_on and right_on parameters for when the column names differ.
Full Joins • 8:49
A full join merges two DataFrames, joining rows where there is a match on values but also keeping rows where there is no match. In this lesson, we perform a full join on the users and plans DataFrames to identify both the orphan users and orphan plans across the datasets.
Left and Right Joins • 6:19
A left join keeps all the records from the left DataFrame and merges matching rows (where possible) from the right DataFrame. Polars will substitute null for values in the right DataFrame's columns when there is no match.
Semi Join • 2:53
A semi join keeps only the left DataFrame rows that have a match in the right DataFrame. However, Polars does not concatenate the right DataFrame's columns to the new DataFrame. The join is closer to a filter operation than a proper join.
Anti Join • 2:37
In this lesson, we introduce the complement to the semi join, the anti join. An anti join keeps the left DataFrame rows that do not have a match in the right DataFrame. We join the users and support DataFrames to identify the users who did not file a ticket/complaint.
Cross Joins/Cartesian Products • 4:19
A cross join matches every row from the left DataFrame with every row from the right DataFrame. The strategy is called a Cartesian product. The resulting DataFrame's length will be equal to the product of the two DataFrames' lengths.
Joining on Multiple Columns • 9:00
Polars can join DataFrames based on matching values across multiple columns. Values must match across both columns in order for the rows to be paired together in the joined DataFrame.
The validate Parameter • 12:31
The validate parameter to the join method asserts on the uniqueness of the join keys in both DataFrames. Think of validate as a safety check before the join. In this lesson, we explore the syntax for specifying unique join keys and multiple join keys in the left and right DataFrames.
The join_asof Method I • 11:04
The join_asof method matches values on the nearest match rather than an exact match. It is ideal for time series data, when we care about proximity rather than perfect equality.
The join_asof Method II: Tolerance • 6:31
The tolerance parameter sets the constraint/boundary by which the join_asof match can occur in the given search direction. In this lesson, we explore how setting a different time window affects the results of joining our outages and uptime_checks DataFrames.
The join_asof Method III: The by Parameter • 11:24
Some datasets require a join by exact keys before performing an approximate match. In this lesson, we expand the join_asof method to apply the by parameter to designate the exact join column between two joined DataFrames.
Quiz

Vertical Concatenation • 4:02
Concatenation stacks/glues two DataFrames together in a specified direction. We kick this section off by performing vertical concatenation, which adds the second DataFrame's rows to the end of the first DataFrame.
Horizontal Concatenation • 4:27
Horizontal concatenation merges the second DataFrame's columns on the right side of the first DataFrame.
Diagonal Concatenation • 3:40
In this lesson, we practice diagonal concatenation, which adds both rows and columns to the end of the first DataFrame. Diagonal concatenation expands a DataFrame in both height (rows) and width (columns).
Align Concatenation • 9:37
Align concatenation joins rows together based on shared column values, then performs a diagonal concatenation. When there are no matches, Polars fills the missing cells with null.
Relaxed Concatenation • 9:13
Relaxed concatenation is a less strict form of concatenation that coerces columns to their supertypes. The supertype is a type with the capacity to model all of the original types. In this lesson, we explore different arguments to the how parameter to make our concatenations relaxed.
Rechunking • 10:27
Rechunking is the process of merging multiple chunks of data together so that it is stored contiguously in memory. Rechunking requires an upfront cost (Polars must copy data) but improves performance in future queries. In this lesson, we introduce the rechunk method and the n_chunks method for seeing how many chunks each column occupies in memory.
The vstack Method • 11:33
The pl.concat function requires the complete list of DataFrames to merge upfront. In this lesson, we introduce the vstack method on a DataFrame, which allows us to concatenate one DataFrame at a time.
The extend Method • 7:20
In this lesson, we introduce the extend method for concatenation. We also compare it to the vstack method, including each method's capacity for rechunking.
The hstack Method • 2:41
In this quick lesson, we cover the complementary hstack method to horizontally concatenate a DataFrame on the right side of another.
Quiz

Wide vs. Long DataFrames • 6:00
Wide and long describe two ways of organizing data in a table. Wide DataFrames store the same variable across multiple columns. They expand horizontally with more data. Long DataFrames store each variable in a single column. They expand vertically with more data.
The unpivot Method to Convert a Wide DataFrame to a Long DataFrame • 6:54
In this lesson, we use the unpivot method to transform a DataFrame from a wide format to a long format. This is equivalent to the melt method in Pandas.
The pivot Method to Convert a Long DataFrame to a Wide DataFrame • 4:22
Next up is the pivot method, which converts a long DataFrame into a wide DataFrame. The distinct values from a column become the new column headers, and Polars spreads out the values across the correct intersection of index and column.
Pivot Tables I • 6:36
A pivot table reshapes data by turning unique values into new rows or columns, then summarizing corresponding values with an aggregation operation. In this lesson, we practice with simple operations like pulling out the first and last value for each intersection of row and column.
Pivot Tables II • 5:22
The aggregate functions from the previous lesson chose one value from a set of possible values. In this lesson, we introduce additional functions that perform aggregate operations across all values.
The transpose Method3:33
The transpose method swaps the axes of a DataFrame. The column headers become row entries, and the row values become column headers. We also discuss some additional parameters to ensure all data is brought over.
Quiz

Arrays and Lists8:41
Polars has two collection types: the array and the list. Each row in a list column stores a homogeneous collection of zero or more elements in order. In this lesson, we practice creating a list column from scratch.
The str.split Method4:45
This lesson offers a more realistic way you might arrive at a list column: the str.split method, which splits a string based on every occurrence of a delimiter.
The list Namespace5:50
Polars nests list operations underneath a list attribute/namespace. In this lesson, we explore some convenience methods to calculate the lengths of the lists and pull out one or multiple elements from each list.
Sorting the Lists2:59
In this lesson, we review the sort method on a DataFrame and contrast it with the list.sort method on a column of lists.
The explode Method3:50
The list.explode method creates a row entry for every list value. It is sometimes called a "flatten" operation; it creates a one-dimensional sequence of values from a collection of nested lists.
Exploding with Multiple Columns of Lists8:42
In this lesson, we practice exploding multiple columns to find every combination of values across two columns of lists.
Mathematical Operations5:25
In this lesson, we introduce more methods underneath the list namespace, including mathematical operations like sum, max, min, and mean.
The list.eval, list.any, and list.all Methods8:43
In this lesson, we utilize the list.eval method to map each list element to a new value. We combine it with the pl.element function to perform a comparison on every list element.
Concatenating Column Values10:02
In this lesson, we introduce three top-level Polars functions for concatenating column values: pl.format, pl.concat_str, and pl.concat_list. We also cover the list.join method for concatenating the contents of a string list with a separator.
Arrays6:21
An array is nearly identical to a list; it's an ordered container for elements. The difference is that every row's array must have the same length. If this condition can be met, columns of arrays will be more performant than columns of lists.
The arr Attribute4:07
In this lesson, we review the familiar methods we covered earlier underneath the list attribute, but now under the complementary arr attribute for array columns.
Quiz

Requirements

Basic/intermediate experience with spreadsheet software like Microsoft Excel/Google Sheets (common functions, vlookups, countif, pivot tables, etc.)
Basic experience with the Python programming language (we'll cover the basics if you're brand new!)
Strong knowledge of data types (strings, integers, floating points, booleans, etc.)

Description

Welcome to the most comprehensive Polars course on Udemy!

Data Analysis with Polars and Python offers 22+ hours of in-depth video tutorials on the powerful Polars data analysis library. The course also includes a wide collection of datasets, quizzes, and coding challenges to aid your learning.

Why Polars?

The core of Polars is written in Rust, one of the fastest programming languages in the world. At the same time, the library enables us to write our code in Python, the most popular language in the world. We gain the best of both worlds -- the speed and efficiency of Rust and the simplicity and elegance of Python.

Who is this Course For?

The course is designed for learners of all skill levels, from experienced data analysts to students who have never programmed before. Lessons include:

installing Python and Polars on your computer
understanding the core mechanics of Python
working with the Jupyter Lab coding environment

Whether you've spent time in spreadsheet software like Microsoft Excel/Google Sheets or another data analysis library like Pandas, Polars can help take your data analysis skills to the next level.

What Topics Will We Cover?

We'll cover the core objects of Polars including:

Series
DataFrames
LazyFrames

Most of our work will focus on the DataFrame, a 2-dimensional table of rows and columns. We'll cover data manipulation operations including:

sorting
filtering
grouping
aggregating
de-duplicating
pivoting
deleting
joining
replacing
working with text data
working with temporal/datetime data

We'll also cover some of Polars' unique column data types including:

lists
arrays
structs

and more!

I'm excited to share everything I've learned about Polars, a powerful library that is quickly emerging as a dominant competitor in Python's data science ecosystem. I look forward to seeing you in the course!

Who this course is for:

Data analysts and business analysts
Excel/Google Sheets users who are looking to learn a more powerful software for data analysis
Developers familiar with Pandas who want to explore the rising entrant in the Python data science ecosystem

Course content

Introduction: 12 lessons • 1 h 7 min

Python Crash Course: 16 lessons • 2 h 56 min

Series: 13 lessons • 1 h 41 min

DataFrames I: 28 lessons • 2 h 38 min

DataFrames II: 13 lessons • 1 h 38 min

DataFrames III - Filtering: 18 lessons • 1 h 58 min

Joins: 13 lessons • 1 h 37 min

Concatenation: 9 lessons • 1 h 3 min

Reshaping: 6 lessons • 33 min

Arrays and Lists: 11 lessons • 1 h 9 min
