Apache Spark Core and Structured Streaming 3.0 In-Depth
What you'll learn
- Strong focus on practicality through hands-on work with plenty of examples
- Develop an in-depth understanding of the underlying concepts at the core of Apache Spark
- Know the ways to get the best performance from Spark in production
- Avoid the common pitfalls when writing Spark applications
- In-depth exploration of Spark Structured Streaming 3.0 using the Python API
- Get a high-level introduction to Apache Kafka along the way
- Understand the nuances of Stream Processing in Apache Spark
- Discover various features Spark provides out of the box for Stream Processing
Requirements
- We'll be using the Python API for Spark programming. All programs are explained in detail, but a fundamental knowledge of Python will be beneficial.
Description
Apache Spark has turned out to be the most sought-after skill for any big data engineer. An evolution of the MapReduce programming paradigm, Spark provides unified data processing, from writing SQL to performing graph processing to implementing machine learning algorithms. It uses cluster nodes effectively and manages memory better, spreading the load across a cluster of nodes to deliver faster results. Apache Spark drives the mission of data-driven decision-making in thousands of organizations.
In order to fully appreciate the benefits of Apache Spark's libraries, it is essential to get the foundations right. This course aims at exactly that. It starts at the beginner level and gradually explains all the complex concepts in an easy-to-follow manner. It gives a thorough description of the features and inner workings of the framework through five different use cases with detailed hands-on implementations. In fact, some hands-on sessions and use-case solutions are explained in full classroom mode, with videos extending over 40 minutes. After taking this course, you will have gained expertise in Spark Core, and further libraries such as Spark SQL, Structured Streaming, Spark ML and GraphX will be much easier to visualize, implement and optimize.
This illustrative course builds your foundational knowledge. You will learn the differences between batch and stream processing, the programming model, the APIs, and the challenges specific to stream processing. We'll then move on to the concepts of stream processing with a wide variety of examples and hands-on exercises, examining the inner workings and tackling a use case towards the end. All of this activity will be on the cloud using Spark 3.0.
Who this course is for:
- Data engineers and developers who wish to leverage fast analytics with Apache Spark in production environments.
Instructor
With over a decade of industry experience in distributed computation, Amit has been involved in a variety of big data engineering projects across different domains. He is a certified Hadoop Developer with skills in architecting, designing, developing, programming, administering and instructing.
Currently working as a Senior Data Engineer, he deals with data, its processing challenges and optimizations on a daily basis. In the past he has worked with multiple companies, including one of the biggest e-commerce companies, the largest on-demand cab provider, the biggest social networking product and the largest on-demand music and video provider. He has been a mentor to hundreds of professionals, and his videos have helped hundreds of thousands of learners.