
Start by installing Java on Ubuntu: run sudo apt-get update to refresh the package index, install Java, and verify the installation with java -version. Then proceed with installing Kafka, Hadoop, Spark, and Cassandra for an end-to-end streaming pipeline.
Install and configure Apache Kafka with ZooKeeper to enable real-time streaming pipelines: create the directories, download and extract Kafka, set the machine's IPv4 address in the Kafka configuration, then start ZooKeeper and Kafka and test the setup.
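The configuration step above can be sketched in a few lines of Python. The summary does not say which settings receive the IPv4 address; this sketch assumes they are the broker's listeners and advertised.listeners properties in server.properties, and the function name set_listener_address and the address 192.0.2.10 are hypothetical, chosen only for illustration.

```python
import re


def set_listener_address(properties_text: str, ipv4: str, port: int = 9092) -> str:
    """Return the properties text with both listener settings pointing at ipv4:port.

    Assumption: the address belongs in `listeners` and `advertised.listeners`.
    """
    wanted = {
        "listeners": f"PLAINTEXT://{ipv4}:{port}",
        "advertised.listeners": f"PLAINTEXT://{ipv4}:{port}",
    }
    seen = set()
    out = []
    for line in properties_text.splitlines():
        # Match "key=value" or a commented-out "#key=value" at the start of the line.
        # Explanatory comments such as "#   listeners = ..." are left untouched.
        match = re.match(r"^#?([A-Za-z0-9_.]+)=", line)
        key = match.group(1) if match else None
        if key in wanted:
            if key not in seen:
                out.append(f"{key}={wanted[key]}")
                seen.add(key)
            # Later duplicates of the same key are dropped.
            continue
        out.append(line)
    # Append any setting that did not appear in the file at all.
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "broker.id=0\n#listeners=PLAINTEXT://:9092\nlog.dirs=/tmp/kafka-logs\n"
    print(set_listener_address(sample, "192.0.2.10"))
```

Working on the file's text rather than the file itself keeps the sketch easy to try out; in practice the result would be written back to the server.properties file in the Kafka config directory before starting the broker.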
Install Apache Cassandra, a NoSQL database, on Ubuntu: add the Cassandra repository, install apt-transport-https, import the GPG key, update the package lists, and enable the service. Then connect to Cassandra from the command line with the CQL shell (cqlsh), which accepts SQL-like CQL statements.
Create a Cassandra keyspace and a table whose columns mirror the JSON fields of the Game of Thrones dataset (character name, actor name, house name, and nickname), ready for Spark Streaming in later lessons.
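As a rough illustration of the keyspace and table described above, the Python sketch below builds the two CQL statements as strings. The keyspace name got, the table name characters, the column names, the choice of character_name as primary key, and the single-node SimpleStrategy replication are all assumptions made for this example; the course may use different names and settings.

```python
def create_keyspace_cql(keyspace: str, replication_factor: int = 1) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (assumed single-node setup)."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


def create_table_cql(keyspace: str, table: str, columns: dict, primary_key: str) -> str:
    """Build a CREATE TABLE statement from a mapping of column name to CQL type."""
    if primary_key not in columns:
        raise ValueError(f"primary key {primary_key!r} is not a declared column")
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        f"({column_defs}, PRIMARY KEY ({primary_key}));"
    )


# Hypothetical column names derived from the four dataset fields.
GOT_COLUMNS = {
    "character_name": "text",
    "actor_name": "text",
    "house_name": "text",
    "nickname": "text",
}

if __name__ == "__main__":
    print(create_keyspace_cql("got"))
    print(create_table_cql("got", "characters", GOT_COLUMNS, "character_name"))
```

The printed statements can be pasted into cqlsh. Keeping every column as text matches the string-valued JSON fields and leaves any type conversion to the streaming job.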
This course provides a comprehensive understanding of real-time big data processing using Kafka, Spark, and Cassandra. Data is now produced at an unprecedented rate, and the ability to process and analyze it in real time is critical for making informed decisions. The course focuses on the fundamental concepts and architecture of Kafka, Spark, and Cassandra, and on how they work together to form a robust big data processing pipeline.
Students will learn how to set up Kafka clusters and work with Kafka producers and consumers. Students will also learn about Kafka Streams, a client library for building real-time streaming applications that process data directly within Kafka.
Throughout the course, students will gain hands-on experience through practical exercises and projects that simulate real-world scenarios. By the end of the course, students will have an understanding of how to use Kafka, Spark, and Cassandra to build real-time big data processing systems.
Course Objectives:
Understand the fundamental concepts of real-time big data processing
Learn the architecture and setup of Kafka, Spark, and Cassandra
Understand how Kafka, Spark, and Cassandra work together to create a real-time big data processing pipeline
Gain hands-on experience with Kafka, Spark, and Cassandra through practical exercises and projects
Learn how to build a real-time big data processing pipeline from scratch
This course is intended for software engineers, data engineers, and data analysts who have a basic understanding of programming concepts and are familiar with SQL.