
Answer data questions with Polars in Python by using real world data sets, write readable Polars code for reuse, and create visualizations with Matplotlib to communicate insights.
Compute the median budget for the movies by selecting the budget column and applying .median() to it. The result shows a median budget of 15 million USD.
Load the Polars data frame, compute decades from release years, group by decade to count movies, and visualize the results with a bar plot.
Use Polars in Python to group by director, sum profits, alias the profit column, and obtain the top five directors by profit.
Use Polars in Python to select the title column, compute its length, sort by length descending, and identify the longest movie title.
Inspect a data frame to reveal language distribution, using the original language column and value_counts with sort enabled to display languages in descending order, highlighting English as most represented.
Learn to answer data questions with Polars in Python by chaining a dataframe, selecting title and cast, and filtering where the cast contains the actor to reveal relevant movie titles.
Replace the director column with vote average to answer the favorite movie rating question using Polars in Python. The example shows that Catch Me If You Can has a rating of 7.7.
Calculate the popularity of Catch Me If You Can and compare it to the dataset mean with Polars in Python, then create a boolean column Popularity above average.
Use Polars in Python to convert movie runtimes from minutes to hours, select title and runtime, and reveal Catch Me If You Can is 2.5 hours.
Answer question three by calculating the average number of likes per comment using Polars in Python, selecting the votes column, and showing the result as 11 likes.
Clean the time column in a Polars data frame by removing the word edited, group by time, count occurrences, convert to pandas, and plot a bar plot of comments per year.
Identify the top five authors by reply count using Polars in Python, by applying top_k on author counts and selecting five.
Compute the average number of likes per reply using Polars in Python, refining existing code to select the relevant data and produce an average of two likes per reply.
Load and explore the massive Stack Overflow survey data set with Polars in Python, inspect 89,184 rows and 84 columns, and focus on key columns to answer questions.
Use Polars in Python to clean the YearsCode column, cast it to integers, compute average years of coding by profession, and reveal senior executive as having the highest coding experience.
Add the LanguageHaveWorkedWith column to the data frame and filter for respondents under 18 years old. Compute the percentage of under-18 respondents who have worked with Python, which comes to 68 percent.
Use Polars in Python to compute the median years of coding experience for JavaScript developers by selecting relevant columns, converting strings to numbers, and handling NA values.
Use Polars in Python to chain filter on the country column by a G7 list and show the unique G7 countries.
Learn to answer country question 3 with Polars in Python by filtering for India and Python, then calculate the India Python knowledge percentage of 49.65%.
Create a beautiful pie chart showing database usage for MySQL, Microsoft SQL Server, and Oracle, converting Polars to pandas and styling with matplotlib.
Description
Data science and analysis is all about asking questions about your data. This course is deliberately designed for that purpose. We’ll be answering a variety of questions based on three real-world datasets. You’ll learn effective patterns for data manipulation and how to write easy-to-read code that can be re-used. To do the analysis, we’ll use Polars, a blazingly fast DataFrame library for Python that enables you to handle large datasets effortlessly. Additionally, you’ll learn how to create beautiful visualizations to effectively communicate the message of your analysis.
Real-world Datasets
This course uses three real-world datasets: 1) IMDB Movies, 2) YouTube Comments, 3) Stack Overflow Survey. You will tackle a variety of questions related to these datasets, employing diverse approaches to extract meaningful information. From statistical analysis to data visualization, you will develop a versatile skill set that will empower you to address complex data-related inquiries. This hands-on experience with real-world datasets will not only enhance your technical proficiency but also prepare you for the multifaceted challenges that arise in the world of data science and analysis.
Beautiful Visualizations
You will learn how to create visually appealing charts using code, enabling you to adeptly convey the insights derived from your analysis. The focus is on the art and science of designing compelling visuals that effectively communicate the intended message, enhancing your ability to present and share your analytical findings in a clear and impactful manner.