Spark Structured Streaming 3.0: All You Need to Know
What you'll learn
- In-depth exploration of Spark Structured Streaming 3.0 using the Python API.
- Get a high-level introduction to Apache Kafka along the way.
- Understand the nuances of stream processing in Apache Spark.
- Discover the features Spark provides out of the box for stream processing.
Requirements
- An understanding of Spark SQL and Python (or PySpark) will be helpful
Description
Many industries need to act on their data faster, and stream processing helps them do just that. But it comes with its own set of theories, challenges and best practices.
Apache Spark has seen tremendous development in stream processing. The rich features of Spark Structured Streaming introduce a learning curve, and this course aims to present all those concepts in a friendly, easy-to-follow manner. Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a batch computation on static data. The Spark SQL engine takes care of running it incrementally and continuously, updating the final result as streaming data continues to arrive. It allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis.
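To make the idea of "running it incrementally" concrete, here is a toy sketch in plain Python. It deliberately does not use Spark's API; the function and class names (batch_word_count, IncrementalWordCount) and the sample data are hypothetical, chosen only for illustration. It shows the same counting logic applied once to a static dataset and again to data arriving in small batches, with a running result that is updated on each arrival.

```python
from collections import Counter
from typing import Iterable


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch view: count words over a complete, static dataset."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Streaming view: keep running state and fold in each micro-batch."""

    def __init__(self) -> None:
        self.state = Counter()

    def process(self, micro_batch: Iterable[str]) -> Counter:
        # Reuse the batch logic on only the newly arrived lines,
        # then add the partial counts to the running state.
        self.state.update(batch_word_count(micro_batch))
        return self.state


# Hypothetical input: the same data, seen all at once or in two arrivals.
all_lines = ["spark streaming", "spark sql", "kafka spark"]
micro_batches = [all_lines[:2], all_lines[2:]]

stream = IncrementalWordCount()
for micro_batch in micro_batches:
    stream.process(micro_batch)

batch_result = batch_word_count(all_lines)
```

Once every micro-batch has been processed, the running state matches the batch result over the full dataset. In Structured Streaming the engine manages this kind of state for you, along with fault tolerance, which this sketch does not attempt to model.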
This illustrative course will build your foundational knowledge. You will learn the differences between batch and stream processing, the programming model, the APIs and the challenges specific to stream processing. We'll quickly move on to the concepts of stream processing through a wide variety of examples and hands-on exercises, looking at the inner workings and working through a use case towards the end. All of this will be done in the cloud using Spark 3.0.
Who this course is for:
- Data Engineers looking to expand their skill set, Data Scientists who want hands-on experience with stream processing, and Technical Architects who want to evaluate Spark Structured Streaming for their use cases
Instructor
With over a decade of industry experience in distributed computation, Amit has been involved in a variety of big data engineering projects across different domains. He is a certified Hadoop Developer with skills in architecting, designing, developing, programming, administering and instructing.
Currently working as a Senior Data Engineer, he deals with data, its processing challenges and optimizations on a daily basis. In the past he has worked with multiple companies, including one of the biggest e-commerce companies, the largest on-demand cab provider, the biggest social networking product and the largest on-demand music and video provider. He has been a mentor to hundreds of professionals, and his videos have helped hundreds of thousands of learners.