Master Big Data Realtime Streaming

Rating: 3.9 (19 reviews)

Learn the core concepts of big data real-time streaming analytics and work through hands-on examples

Created by Shruti Mantri, Aditya S

Last updated 7/2021

English

What you'll learn

Learn to design an end-to-end real-time streaming pipeline for big data using the latest technologies.
Understand the different components in a big data streaming pipeline.
Use Kafka as the connecting tool between ETL components in the real-time streaming pipeline.
Use Apache Flink, Spark Streaming and Kafka Streams to perform different transformations and aggregations.
Use Druid and Pinot as OLAP technologies in the streaming pipeline.
Use Superset to explore and visualize the transformed data as it arrives in real time.
Work through hands-on practicals that build each component and assemble a complete end-to-end pipeline.
Learn multiple technologies used in real-time streaming pipelines, so you can choose the one that best suits your use case.

Course content

9 sections • 39 lectures • 4h 51m total length

Introduction • 3:37
Learn real-time streaming by capturing, transforming, and analyzing data from websites, apps, and smart devices to drive on-the-fly insights in areas like e-commerce and fraud detection.
Course Objectives • 1:37
Explore how to ingest data into Kafka, perform real-time processing with Flink, Spark Streaming, and Kafka Streams, and analyze with Pinot and Druid, ending with dashboards in Superset.
Walkthrough of the Real Time Pipeline • 1:59
Walk through the real-time pipeline: a data generator emits records and pushes them to a Kafka topic; Spark Streaming, Flink, and Kafka Streams process them; Druid and Pinot store the results; and Superset visualizes them.
Additional System Requirements Needed • 0:28

Installing Kafka on Mac / Linux • 9:38
Install Apache ZooKeeper and Kafka on Mac or Linux: configure ZooKeeper with the default zoo.cfg on port 2181, then download Kafka and set zookeeper.connect to localhost:2181 before starting the Kafka server.
Introduction to Kafka Basics • 3:45
Explore Kafka basics, an open source distributed high-throughput streaming platform, and learn how topics, partitions, and offsets organize data across brokers with replicas for fault-tolerant pipelines.
Helpful Kafka Commands • 6:45
Learn to manage Kafka topics end-to-end by creating topics with partitions, listing and describing topics, producing messages via the console producer, consuming from beginning, and deleting topics.
Generating Data into Kafka • 7:25
Develop a Python data generator that streams California real estate transactions into a Kafka topic by serializing lines as JSON with a processing timestamp, emitting every two seconds.
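
The generator idea described above can be sketched as follows. This is a minimal illustration, not the course's actual script: the column names, the `processing_time` field name, and the `send` callable are assumptions, and the Kafka client is left out so the logic runs with the standard library alone.

```python
import json
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


def to_message(line: str, columns: List[str], now: Optional[datetime] = None) -> str:
    """Turn one comma-separated line into a JSON string with a processing timestamp."""
    values = [value.strip() for value in line.rstrip("\n").split(",")]
    record = dict(zip(columns, values))
    stamp = now or datetime.now(timezone.utc)
    # "processing_time" is a hypothetical field name chosen for this sketch.
    record["processing_time"] = stamp.isoformat()
    return json.dumps(record)


def emit(
    lines: Iterable[str],
    columns: List[str],
    send: Callable[[str], None],
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each line as JSON, pausing between messages; returns the number sent."""
    count = 0
    for line in lines:
        send(to_message(line, columns))
        count += 1
        sleep(interval_seconds)  # two seconds by default, as in the lecture description
    return count
```

In a real pipeline, `send` would wrap a Kafka producer client (for example, a small function that encodes the string to bytes and hands it to the producer for the target topic); here any callable that accepts a string will do, which also makes the generator easy to try out with `print`.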

Introduction to Apache Flink • 3:51
Explore Apache Flink, a distributed, stateful streaming engine that processes bounded and unbounded data with in-memory speed, scales to thousands of tasks, and runs on clusters or standalone.
Creating a Simple Flink Job • 19:11
Connect Apache Pinot data to Superset via a docker-based Superset setup and Pinot connector, while illustrating Flink's stateful, large-scale streaming with in-memory state and low-latency processing.
Data Transformations Using Flink • 11:07
Transform real estate transaction data in Flink by normalizing city to lowercase, converting state to California, calculating price per square foot, and deriving a house type enum.
Writing Output to Kafka • 8:04
Write the transformed real estate transaction payload to a Kafka topic using a Flink Kafka producer and a serialization schema for the transformed real estate transaction.
Data Aggregations Using Flink • 14:15
Develop a real estate transaction aggregation pipeline using Flink, creating a real estate transaction aggregate keyed by city and type, serialization schemas, and Kafka-based streaming.
Installing Flink on Mac / Linux • 4:10
Install Flink on Mac or Linux by downloading the Flink package, uncompressing it, configuring the files, and starting the cluster to access the Flink dashboard at localhost:8081.
Running Job On Flink Cluster • 8:05
Learn to deploy Flink jobs onto a local Flink cluster by adding dependencies and plugins, and by executing flink run commands to run and manage map and reduce streaming apps.
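
The per-record transformations named in the Flink lectures above (lowercasing the city, mapping the state to California, computing price per square foot, and deriving a house type enum) can be sketched in plain Python. This is a language-neutral illustration rather than the course's Flink job: the field names (`city`, `state`, `sq_ft`, `price`, `type`), the enum members, and the "CA" abbreviation are all assumptions made for the example.

```python
from enum import Enum
from typing import Any, Dict


class HouseType(Enum):
    # Hypothetical categories; the course's actual enum may differ.
    RESIDENTIAL = "residential"
    CONDO = "condo"
    MULTI_FAMILY = "multi-family"
    UNKNOWN = "unknown"


def parse_house_type(raw: str) -> HouseType:
    """Map a free-text type to an enum member, falling back to UNKNOWN."""
    try:
        return HouseType(raw.strip().lower())
    except ValueError:
        return HouseType.UNKNOWN


def transform(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new record with the normalized and derived fields."""
    sq_ft = float(record["sq_ft"])
    price = float(record["price"])
    out = dict(record)
    out["city"] = str(record["city"]).strip().lower()
    # Assumes the raw data abbreviates the state as "CA".
    if record.get("state") == "CA":
        out["state"] = "California"
    # Guard against a zero size so a bad record does not crash the stream.
    out["price_per_sq_ft"] = round(price / sq_ft, 2) if sq_ft > 0 else None
    out["house_type"] = parse_house_type(str(record.get("type", ""))).name
    return out
```

In a streaming engine, a function like `transform` would typically be applied to each event by a map-style operator, with the result serialized and written to the output topic.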

Introduction to Spark Streaming • 3:12
This lecture introduces Spark Streaming for real-time processing: reading from Kafka, applying Structured Streaming, and writing results back to Kafka or other systems, with micro-batch and continuous modes and fault recovery.
Setting up the Project • 13:01
Set up a real-time streaming project by configuring Kafka brokers and topics for real estate data. Create the Spark streaming project with Scala and a real estate input schema.
Data Transformation using Spark Streaming • 19:40
Build a Spark streaming app that reads Kafka data, applies schema-based transformations (city lowercase, price per square foot), and writes results back to Kafka with checkpointing in micro-batches.
Data Aggregation using Spark Streaming • 6:46
Expose Apache Pinot data to Superset by installing Superset with Docker and using the Pinot connector. The lecture notes that Druid connects with Superset via JDBC.

Introduction to Kafka Streams • 3:08
Explore Kafka Streams for real-time processing by reading data from a topic, performing transformations and aggregations, and writing results back to a topic with exactly-once semantics.
Setting up the Project • 21:00
Set up a real-time streaming project by verifying the broker, creating raw, transformed, and aggregated data topics, and configuring Jackson-based serializers for real estate transaction data.
Data Transformation using Kafka Streams • 25:34
Transform real estate transaction data with Kafka Streams. Build a topology that reads input, applies city lowercase and state mapping, computes price per square foot, and writes to an output topic.
Data Aggregation using Kafka Streams • 16:50
Master how to perform aggregations with Kafka Streams by transforming raw data into aggregate objects, grouping by city and state, and writing aggregated results to an aggregate topic.
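
The grouped aggregation described above can be illustrated with a small, engine-independent sketch. It keeps a running aggregate per (city, state) key, which is conceptually what a grouped aggregation over a stream maintains in its state. The `Aggregate` fields and the record field names are assumptions for the example, not the course's classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass
class Aggregate:
    """Running totals for one (city, state) key."""
    count: int = 0
    total_price: float = 0.0

    def add(self, price: float) -> None:
        self.count += 1
        self.total_price += price

    @property
    def average_price(self) -> float:
        return self.total_price / self.count if self.count else 0.0


def aggregate(records: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], Aggregate]:
    """Fold records into one Aggregate per (city, state) key."""
    state: Dict[Tuple[str, str], Aggregate] = defaultdict(Aggregate)
    for record in records:
        key = (record["city"], record["state"])
        state[key].add(float(record["price"]))
    return dict(state)
```

In Kafka Streams the same shape is usually expressed by grouping the stream on a key and then aggregating, with the library holding the per-key state and emitting updated aggregates to the output topic as new records arrive.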

Introduction to Apache Pinot • 4:11
Discover Apache Pinot as a real-time distributed OLAP system for low-latency streaming analytics. Understand data freshness, segments, brokers, and controller and zookeeper roles in building enterprise dashboards.
Installing Apache Pinot on Mac / Linux • 5:26
Download the latest Pinot binary, extract it, and start zookeeper, the Pinot controller, and broker on Mac or Linux; then verify all components via the data explorer at localhost:9000.
Ingesting data from Kafka into Pinot • 9:42
Ingest real-time data from Kafka into Pinot by defining a real estate transaction guestrooms schema, creating a real-time table, and validating data flow from Kafka to Pinot.
Querying data from Pinot • 2:03
Learn how to query data from Pinot with basic selector queries on streaming records, group by city, order by county, and apply transformations for realtime insights.

Introduction to Apache Druid • 2:53
Apache Druid is a high-performance real-time analytics database that ingests batch and streaming data, supports encryption at rest and in transit along with authentication and authorization, and serves low-latency, high-concurrency queries.
Installing Apache Druid on Mac / Linux • 8:07
Download the latest Apache Druid binary for Mac or Linux, unzip it, and use the micro quickstart single-server configuration; adjust ZooKeeper port 2181 and verify at localhost:18888.
Ingesting data from Kafka into Druid • 6:17
Ingest real estate data from a Kafka topic into Apache Druid using a Flink transform, configuring the bootstrap servers, starting a Flink cluster, and loading latest-offset data into Druid.
Querying data from Druid • 1:48
Query data from Druid after ingesting from Kafka to observe real-time updates in the query tab, tracking the live growth from 40 records.

Introduction to Apache Superset • 3:46
Explore data quickly with Apache Superset, a modern, fast, and easy-to-use data exploration and visualization platform with rich visualizations and dashboards, and broad database integration.
Installing Superset on Mac / Linux • 5:20
Install and run Apache Superset on macOS or Linux using Docker to connect Pinot's datastore through a Pinot connector, exposing Pinot data via Superset dashboards.
Exploring Pinot data on Superset • 7:33
Connect to Pinot on Superset, configure the broker and controller, and create real-time charts showing citywise real estate transactions and average price per square foot.
Changes for Druid-Superset Connection • 6:50
Configure Druid basic security and authentication, update the broker IP, set admin password, restart Druid, and validate the Druid-Superset connection to access datasets.
Exploring Druid data on Superset • 7:17
Explore Druid data on Superset by connecting to Druid, adding datasets, and building realtime charts to analyze city-level metrics and data flowing from Flink.
Creating Dashboards on Superset • 3:35
Learn to create real-time dashboards in Superset, add charts, and configure auto refresh with a 10-second cadence, then start a data generator to see live updates.

Requirements

Some exposure to the big data world will help you better appreciate real-time streaming pipelines, but it is completely optional.
Basic knowledge of Java and Scala will be helpful, but not mandatory.

Description

Getting real-time insights from huge volumes of data is very important for a majority of companies today.

Big data real-time streaming is used by some of the biggest companies in the world, including e-commerce companies, video streaming companies, banks, and ride-hailing companies.

Knowing the concepts of real-time streaming and the various real-time streaming technologies will be a great addition to your skillset and will enable you to build some of the most cutting-edge solutions that exist today.

We have created this hands-on course so that you get a good understanding of how real-time streaming systems can be built.

This course will ensure that you get hands-on experience with Apache Kafka, Apache Flink, Spark Streaming, Kafka Streams, Apache Pinot, Apache Druid, and Apache Superset.

This course covers the following topics:

An Introduction to Kafka with hands-on Kafka setup
Understanding basic transformations and aggregations which can be done in a real-time system
Learn how transformations and aggregations can be done using Apache Flink with hands-on coding exercises
Learn how transformations and aggregations can be done using Spark Streaming with hands-on coding exercises
Learn how Kafka Streams can be used to perform transformations and aggregations with hands-on coding exercises
Ingest data into Apache Pinot, which is an OLAP technology
Ingest data into Apache Druid, which is also an OLAP technology
Using Apache Superset to create some insightful dashboards

If you are interested in learning how all these technologies can be connected to build an end-to-end real-time streaming system, then this course is for you.

Who this course is for:

Students who want to learn to build real-time streaming pipelines from scratch through to a live project implementation.
Students who want to learn the latest technologies used in big data engineering.
Developers who want to learn different well-known tools to build streaming pipelines.
Students who want to pursue and grow a career in data engineering.

Course content

Introduction • 4 lectures • 8min

Ingesting data into Kafka • 4 lectures • 28min

Real Time Data Processing • 2 lectures • 3min

Processing events using Flink • 7 lectures • 1hr 9min

Processing events using Spark Streaming • 4 lectures • 43min

Processing events using Kafka Streams • 4 lectures • 1hr 7min

Putting data into Apache Pinot • 4 lectures • 21min

Putting data into Apache Druid • 4 lectures • 19min

Dashboarding • 6 lectures • 34min
