Spark components: partitions, transformations, and actions
Working with the DataFrame API
Working with columns and rows
Working with dates
Working with RDDs
Understanding of Python and Spark
PySpark brings together Apache Spark and Python. This course covers the fundamentals of Apache Spark with Python and teaches you what you need to know to develop Spark applications using PySpark, the Python API for Spark. By the end of this course, you will have in-depth knowledge of Apache Spark and of big data analysis in general.
This course helps you get comfortable with PySpark, explaining what it has to offer and how it can enhance your data science work. We'll first explore the Spark ecosystem, detailing its advantages over other data science platforms, APIs, and toolsets.
Next, we'll look at the DataFrame API and how it addresses many big data challenges. We'll also go over Resilient Distributed Datasets (RDDs), the building blocks of Spark.