PySpark: Python, Spark and Hadoop Coding Framework & Testing

Rating: 4.3 (217 reviews)

PyCharm: Big Data Python Spark, PySpark Coding Framework, Logging, Error Handling, Unit Testing, PostgreSQL, Hive

Created by FutureX Skills

Last updated 12/2025

English

What you'll learn

PySpark industry-standard coding practices: logging, error handling, reading configuration, unit testing
Building a data pipeline using Hive, Spark and PostgreSQL
Python Spark Hadoop development using PyCharm

Course content

10 sections • 53 lectures • 4h 25m total length

Introduction • 1:40
What is Big Data Spark? • 2:23

What is Spark? • 5:25
Running Spark on Google Cloud Dataproc • 5:01
Running Python Spark 3 on Google Colab • 1:38
Spark for Data Transformation • 1:39
What is a DataFrame? • 1:30
RDDs - The fundamental building block • 1:59
Python basics • 6:08
PySpark - Creating RDDs • 4:53
Python functions and lambda expressions • 2:57
RDD - Transformation & Action • 10:45
PySpark Data Engineering: Solve Real Business Problems • 22:04
Spark SQL and Temporary Views - Querying DataFrames with SQL • 14:31

Requirements

Basic programming skills
Basic database skills
Entry-level Hadoop knowledge

Description

This course will bridge the gap between academic learning and real-world applications, preparing you for an entry-level Big Data Python Spark developer role. You will gain hands-on experience and learn industry-standard best practices for developing Python Spark applications. Covering both Windows and Mac environments, this course ensures a smooth learning experience regardless of your operating system.

You will learn Python Spark coding best practices to write clean, efficient, and maintainable code. Logging techniques will help you track application behavior and troubleshoot issues effectively, while error handling strategies will ensure your applications are robust and fault-tolerant. You will also learn how to read configurations from a properties file, making your code more adaptable and scalable.

Key Modules:

Python Spark coding best practices for clean, efficient, and maintainable code using PyCharm
Implementing logging to track application behavior and troubleshoot issues
Error handling strategies to build robust and fault-tolerant applications
Reading configurations from a properties file for flexible and scalable code
Developing applications using PyCharm in both Windows and Mac environments
Setting up and using your local environment as a Hadoop Hive environment
Reading and writing data to a Postgres database using Spark
Working with Python unit testing frameworks to validate your Spark applications
Building a complete data pipeline using Hadoop, Spark, and Postgres
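
As a rough illustration of three of the modules above (reading configuration, logging, and error handling), here is a minimal sketch using only the Python standard library. It is not taken from the course: the section name `DB`, the keys `host` and `port`, the logger name `pipeline`, and the helper `load_properties` are all hypothetical, and it assumes an INI-style file with section headers, since `configparser` does not accept header-less Java-style properties files.

```python
import configparser
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def load_properties(text: str, section: str = "DB") -> dict:
    """Parse INI-style properties text and return one section as a dict."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
        settings = dict(parser[section])  # KeyError if the section is absent
    except (configparser.Error, KeyError) as exc:
        # Log the failure, then re-raise as a single, predictable error type.
        logger.error("Could not load section %r: %s", section, exc)
        raise ValueError(f"invalid configuration: {exc}") from exc
    logger.info("Loaded %d settings from section %r", len(settings), section)
    return settings


# Hypothetical configuration content; a real pipeline would read it from a file.
SAMPLE = """
[DB]
host = localhost
port = 5432
"""

config = load_properties(SAMPLE)
```

Values come back as strings, so a caller would convert `config["port"]` to an integer before using it in a connection setting. Wrapping parser errors in one exception type keeps the calling code's error handling simple, which also makes the function straightforward to cover with a unit test.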

Prerequisites:

Basic programming skills
Basic database knowledge
Entry-level understanding of Hadoop

This course uses high-quality AI-generated text-to-speech narration to complement the powerful visuals and enhance your learning experience.

Who this course is for:

Students looking to move from an academic Big Data Spark background to a real-world developer role

Course content

Introduction • 2 lectures • 4min

Setting up Hadoop Spark development environment • 3 lectures • 23min

Creating a PySpark coding framework • 5 lectures • 21min

Logging and Error Handling • 5 lectures • 33min

Creating a Data Pipeline with Hadoop Spark and PostgreSQL • 6 lectures • 25min

Reading configuration from properties file • 2 lectures • 5min

Unit testing PySpark application • 3 lectures • 9min

spark-submit • 3 lectures • 3min

Appendix - Big Data Hadoop Hive for beginners • 12 lectures • 1hr 4min

Appendix - PySpark on Colab and DataFrame deep dive • 12 lectures • 1hr 19min
