
Learn real-time streaming by capturing, transforming, and analyzing data from websites, apps, and smart devices to drive on-the-fly insights in areas like e-commerce and fraud detection.
Explore how to ingest data into Kafka, perform real-time processing with Flink, Spark Streaming, and Kafka Streams, and analyze with Pinot and Druid, ending with dashboards in Superset.
Walk through building a real-time pipeline: a data generator emits data and pushes it to a Kafka topic; the data is processed with Spark Streaming, Flink, and Kafka Streams, stored in Druid and Pinot, and visualized in Superset.
Install Apache ZooKeeper and Kafka on Mac or Linux: configure ZooKeeper with the default zoo.cfg and port 2181, then download Kafka and set zookeeper.connect to localhost:2181 before starting the Kafka server.
Explore the basics of Kafka, an open-source, distributed, high-throughput streaming platform, and learn how topics, partitions, and offsets organize data across brokers with replicas for fault-tolerant pipelines.
Learn to manage Kafka topics end-to-end by creating topics with partitions, listing and describing topics, producing messages via the console producer, consuming from the beginning, and deleting topics.
Develop a Python data generator that streams California real estate transactions into a Kafka topic by serializing lines as JSON with a processing timestamp, emitting every two seconds.
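As a rough illustration of the generator idea, here is a minimal sketch. The column names in FIELDS, the processing_ts field name, and the sample row are assumptions for illustration, not the course's actual dataset. The send argument stands in for whatever Kafka producer call the course uses, so the sketch runs without a broker.

```python
import json
import time
from typing import Callable, Iterable

# Hypothetical column names; the real dataset's columns are not listed in the course description.
FIELDS = ["street", "city", "zip", "state", "beds", "baths", "sq_ft", "type", "sale_date", "price"]


def build_payload(csv_line: str) -> str:
    """Turn one comma-separated line into a JSON string with a processing timestamp."""
    values = [v.strip() for v in csv_line.split(",")]
    record = dict(zip(FIELDS, values))
    # Processing time as epoch milliseconds (the field name is an assumption).
    record["processing_ts"] = int(time.time() * 1000)
    return json.dumps(record)


def stream_lines(
    lines: Iterable[str],
    send: Callable[[str], None],
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each non-empty line as JSON, pausing interval_s seconds after each one."""
    sent = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        send(build_payload(line))
        sent += 1
        sleep(interval_s)
    return sent


if __name__ == "__main__":
    # Made-up sample row; print stands in for a Kafka producer.
    sample = ["1 EXAMPLE ST,SACRAMENTO,95800,CA,2,1,800,Residential,2008-05-21,60000"]
    stream_lines(sample, send=print, interval_s=0.0)
```

In a real pipeline, send would be a thin wrapper around a Kafka client's producer that publishes the JSON string to the target topic.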
Explore payload transformations: lowercase the city, convert the state code CA to California, compute price per square foot from price and area, and derive a beds-based house type attribute.
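These transformations can be sketched in a few lines of plain Python. The field names (city, state, price, sq_ft, beds), the reading of the state rule as "CA becomes California", and the bedroom thresholds in house_type_from_beds are all assumptions made for illustration; the course may define them differently.

```python
from typing import Any, Dict, Optional


def house_type_from_beds(beds: int) -> str:
    """Derive a coarse house type from the bedroom count (hypothetical thresholds)."""
    if beds <= 1:
        return "small"
    if beds <= 3:
        return "medium"
    return "large"


def transform(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a transformed copy of the payload; the input is left unchanged."""
    out = dict(payload)
    # 1. Normalize the city to lowercase.
    out["city"] = str(payload["city"]).lower()
    # 2. Expand the state code to the full name (assumed rule: CA -> California).
    if str(payload["state"]).upper() == "CA":
        out["state"] = "California"
    # 3. Price per square foot; guard against a zero or missing area.
    price = float(payload["price"])
    area = float(payload["sq_ft"])
    ppsf: Optional[float] = round(price / area, 2) if area > 0 else None
    out["price_per_sq_ft"] = ppsf
    # 4. Beds-based type attribute.
    out["house_type"] = house_type_from_beds(int(payload["beds"]))
    return out
```

For example, a payload with price 60000 and sq_ft 800 yields a price_per_sq_ft of 75.0.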
Aggregate data by city and type to count occurrences and track the latest payload time for each combination.
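A minimal sketch of this aggregation, kept in plain Python so it runs without a streaming engine. It assumes each record carries city, type, and a numeric processing_ts (epoch milliseconds); those field names are illustrative, and a streaming engine would hold the same per-key state in its own keyed state store rather than a dict.

```python
from typing import Any, Dict, Iterable, Tuple

Key = Tuple[str, str]


def aggregate(records: Iterable[Dict[str, Any]]) -> Dict[Key, Dict[str, int]]:
    """Count records per (city, type) and keep the latest timestamp seen for each key."""
    state: Dict[Key, Dict[str, int]] = {}
    for rec in records:
        key = (rec["city"], rec["type"])
        ts = int(rec["processing_ts"])
        entry = state.setdefault(key, {"count": 0, "latest_ts": ts})
        entry["count"] += 1
        # Records may arrive out of order, so take the maximum rather than the last value.
        entry["latest_ts"] = max(entry["latest_ts"], ts)
    return state
```

Taking the maximum timestamp, instead of overwriting with the most recent record, keeps the result correct when events arrive out of order.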
Explore Apache Flink, a distributed, stateful streaming engine that processes bounded and unbounded data with in-memory speed, scales to thousands of tasks, and runs on clusters or standalone.
Connect Apache Pinot data to Superset via a Docker-based Superset setup and Pinot connector, while illustrating Flink's stateful, large-scale streaming with in-memory state and low-latency processing.
Transform real estate transaction data in Flink by normalizing city to lowercase, converting state to California, calculating price per square foot, and deriving a house type enum.
Write the transformed real estate transaction payload to a Kafka topic using a Flink Kafka producer and a serialization schema for the transformed real estate transaction.
Develop a real estate transaction aggregation pipeline using Flink, creating a real estate transaction aggregate with city and type keys, serialization schemas, and Kafka-based streaming.
Install Flink on Mac or Linux by downloading the Flink package, uncompressing it, configuring the files, and starting the cluster to access the Flink dashboard at localhost:8081.
Learn to deploy Flink jobs onto a local Flink cluster by adding dependencies and plugins, and by executing flink run commands to run and manage map and reduce streaming apps.
This lecture introduces Spark Streaming for real-time processing: reading from Kafka, applying Structured Streaming, and writing results back to Kafka or other systems, with micro-batch and continuous modes and fault recovery.
Set up a real-time streaming project by configuring Kafka brokers and topics for real estate data. Create the Spark streaming project with Scala and a real estate input schema.
Build a Spark streaming app that reads Kafka data, applies schema-based transformations (city lowercase, price per square foot), and writes results back to Kafka with checkpointing in micro-batches.
Expose Apache Pinot data to Superset by installing Superset with Docker and using the Pinot connector. The lecture notes that Druid connects with Superset via JDBC.
Explore Kafka Streams for real-time processing by reading data from a topic, performing transformations and aggregations, and writing results back to a topic with exactly-once semantics.
Set up a real-time streaming project by verifying the broker, creating raw, transformed, and aggregated data topics, and configuring Jackson-based serializers for real estate transaction data.
Transform real estate transaction data with Kafka Streams. Build a topology that reads input, applies city lowercase and state mapping, computes price per square foot, and writes to an output topic.
Master how to perform aggregations with Kafka Streams by transforming raw data into aggregate objects, grouping by city and state, and writing aggregated results to an aggregate topic.
Discover Apache Pinot as a real-time distributed OLAP system for low-latency streaming analytics. Understand data freshness, segments, brokers, and the roles of the controller and ZooKeeper in building enterprise dashboards.
Download the latest Pinot binary, extract it, and start ZooKeeper, the Pinot controller, and the broker on Mac or Linux; then verify all components via the data explorer at localhost:9000.
Ingest real-time data from Kafka into Pinot by defining a real estate transaction schema, creating a real-time table, and validating data flow from Kafka to Pinot.
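To give a feel for what a Pinot schema contains, here is a minimal sketch that builds one as a Python dict and serializes it to JSON. The schema name and every field name are hypothetical; only the top-level keys (schemaName, dimensionFieldSpecs, metricFieldSpecs, dateTimeFieldSpecs) follow Pinot's schema layout. The real-time table configuration, which points Pinot at the Kafka topic, is a separate document and is not shown.

```python
import json

# Hypothetical schema for the transformed real estate payload.
schema = {
    "schemaName": "real_estate_transactions",
    # Dimensions: columns used for filtering and grouping.
    "dimensionFieldSpecs": [
        {"name": "city", "dataType": "STRING"},
        {"name": "state", "dataType": "STRING"},
        {"name": "house_type", "dataType": "STRING"},
    ],
    # Metrics: numeric columns used in aggregations.
    "metricFieldSpecs": [
        {"name": "price", "dataType": "DOUBLE"},
        {"name": "price_per_sq_ft", "dataType": "DOUBLE"},
    ],
    # Time column: epoch milliseconds written by the data generator (assumed).
    "dateTimeFieldSpecs": [
        {
            "name": "processing_ts",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}

schema_json = json.dumps(schema, indent=2)

if __name__ == "__main__":
    print(schema_json)
```

The printed JSON can be saved to a file and registered with the Pinot controller alongside a matching real-time table definition.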
Learn how to query data from Pinot with basic select queries on streaming records, group by city, order by count, and apply transformations for real-time insights.
Apache Druid is a high-performance real-time analytics database that ingests batch and streaming data and supports encryption at rest and in transit, authentication and authorization, and low-latency, high-concurrency queries.
Download the latest Apache Druid binary for Mac or Linux, unzip it, and use the micro-quickstart single-server configuration; adjust ZooKeeper port 2181 and verify at localhost:18888.
Ingest real estate data from a Kafka topic into Apache Druid using a Flink transform, configuring the bootstrap server, starting a Flink cluster, and loading latest-offset data into Druid.
Query data from Druid after ingesting from Kafka to observe real-time updates in the query tab, tracking the live growth from 40 records.
Explore data quickly with Apache Superset, a modern, fast, and easy-to-use data exploration and visualization platform with rich visualizations and dashboards, and broad database integration.
Install and run Apache Superset on macOS or Linux using Docker to connect Pinot's datastore through a Pinot connector, exposing Pinot data via Superset dashboards.
Connect to Pinot on Superset, configure the broker and controller, and create real-time charts showing citywise real estate transactions and average price per square foot.
Configure Druid basic security and authentication, update the broker IP, set admin password, restart Druid, and validate the Druid-Superset connection to access datasets.
Explore Druid data on Superset by connecting to Druid, adding datasets, and building real-time charts to analyze city-level metrics and data flowing from Flink.
Learn to create real-time dashboards in Superset, add charts, and configure auto refresh with a 10-second cadence, then start a data generator to see live updates.
Getting real-time insights from huge volumes of data is very important for a majority of companies today.
Big data real-time streaming is used by some of the biggest companies in the world, such as e-commerce companies, video streaming companies, banks, and ride-hailing companies.
Knowing the concepts of real-time streaming and the various real-time streaming technologies will be a great addition to your skill set and will enable you to build some of the most cutting-edge solutions that exist today.
We have created this hands-on course so that you get a good understanding of how real-time streaming systems can be built.
This course will ensure that you get a hands-on experience with Apache Kafka, Apache Flink, Spark Streaming, Kafka Streams, Apache Pinot, Apache Druid, and Apache Superset.
This course covers the following topics:
An Introduction to Kafka with hands-on Kafka setup
Understanding basic transformations and aggregations that can be done in a real-time system
Learn how transformations and aggregations can be done using Apache Flink with hands-on coding exercises
Learn how transformations and aggregations can be done using Spark Streaming with hands-on coding exercises
Learn how Kafka Streams can be used to perform transformations and aggregations with hands-on coding exercises
Ingest data into Apache Pinot, which is an OLAP technology
Ingest data into Apache Druid, which is also an OLAP technology
Using Apache Superset to create some insightful dashboards
If you are interested in learning how all these technologies can be connected to build an end-to-end real-time streaming system, then this course is for you.