A Crash Course In PySpark

Learn all the fundamentals of PySpark
Rating: 4.6 out of 5 (90 ratings)
3,526 students
PySpark, Apache Spark, Big Data Analytics, Big Data Processing, Python

Requirements

  • Familiarity with Python, which you can learn in my 'No Nonsense Python' course

Description

Spark is one of the most in-demand Big Data processing frameworks right now.


This course will take you through the core concepts of PySpark. The aim is to enable you to do most of the things you'd do in SQL or the Python pandas library, namely:

  • Getting hold of data

  • Handling missing data and cleaning data up

  • Aggregating your data

  • Filtering it

  • Pivoting it

  • And writing it back

All of these things will enable you to leverage Spark on large datasets and start getting value from your data.

Let’s get started.

Who this course is for:
  • People who want to use Spark to get value from their big data

Course content
5 sections • 19 lectures • 1h 15m total length
  • Introduction (00:47)
  • How is this course structured (00:55)
  • Introduction to our development environment (02:22)
  • Introduction to our dataset & dataframes (02:10)
  • Environment configuration code snippet (02:15)
  • Ingesting & Cleaning Data (17:31)
  • Answering our scenario questions (10:21)
  • Bringing data into dataframes (06:11)
  • Inspecting A Dataframe (03:39)
  • Handling Null & Duplicate Values (05:31)
  • Selecting & Filtering Data (05:09)
  • Applying Multiple Filters (02:19)
  • Running SQL on Dataframes (02:10)
  • Adding Calculated Columns (03:19)
  • Group By And Aggregation (03:22)
  • Writing Dataframe To Files (00:59)
  • Challenge Overview (02:18)
  • Challenge Solution (03:24)
  • Thanks for joining me to learn PySpark! (00:20)

Instructor
Kieran Keene
Data Engineer at Kodey
  • 4.5 Instructor Rating
  • 376 Reviews
  • 18,338 Students
  • 3 Courses

Hey guys! I am a data engineer by trade and specialize in Python, SQL, Spark, Hive, MongoDB and more. I've come to Udemy to make simple, short crash courses on these technologies, because I personally find longer courses too drawn out and often lose interest. The idea is to keep it short and sharp!


For many more advanced Spark, Python and Big Data topics, including scaling up to enterprise-grade solutions, please visit my website.