A Crash Course In PySpark
- Python familiarity, which can be learned through my 'No Nonsense Python' course
Spark is one of the most in-demand Big Data processing frameworks right now.
This course will take you through the core concepts of PySpark. We will work to enable you to do most of the things you'd do in SQL or with Python's Pandas library, namely:
Getting hold of data
Handling missing data and cleaning data up
Aggregating your data
And writing it back out
All of these things will enable you to leverage Spark on large datasets and start getting value from your data.
Let’s get started.
- People wanting to leverage their big data with Spark
- How is this course structured? (00:55)
- Introduction to our development environment (02:22)
- Introduction to our dataset & dataframes (02:10)
- Environment configuration code snippet (02:15)
- Ingesting & Cleaning Data (17:31)
- Answering our scenario questions (10:21)
- Bringing data into dataframes (06:11)
- Inspecting A Dataframe (03:39)
- Handling Null & Duplicate Values (05:31)
- Selecting & Filtering Data (05:09)
- Applying Multiple Filters (02:19)
- Running SQL on Dataframes (02:10)
- Adding Calculated Columns (03:19)
- Group By And Aggregation (03:22)
- Writing Dataframe To Files (00:59)
- Challenge Overview (02:18)
- Challenge Solution (03:24)
- Thanks for joining me to learn PySpark! (00:20)
Hey guys! I'm a data engineer by trade, specializing in Python, SQL, Spark, Hive, MongoDB and more. I've come to Udemy to make simple, short crash courses on these technologies, as I personally find longer courses too drawn out and often lose interest. The idea is to keep it short and sharp!
For loads of advanced Spark, Python & Big Data topics, please visit my website (the button on this page will take you there), where I talk about scaling up to enterprise-grade solutions.