Apache Spark Project for Beginners: A Complete Project Guide

Rating: 4.1 (62 reviews)

Real-Time Message Processing Application

Created by PARI MARGU

Last updated 2/2020

English

What you'll learn

End to End Apache Spark Project Development
How a Real-Time Streaming Application Works
Features of Spark Structured Streaming using Scala
How Apache Kafka works well with Apache Spark
How to use a NoSQL database such as MongoDB and an RDBMS such as MySQL in a Real-Time Streaming Application
How to build a Visualisation Dashboard using Python

Course content

1 section • 7 lectures • 2h 32m total length

Architecture | Part 1 (10:30)
Develop an end-to-end stream processing architecture using Spark Streaming, Kafka, and a persistence layer with MongoDB, then visualize results with a Python dashboard.
Environment Setup | Part 2 (18:54)
Hi friends, please find below the links for Apache Hadoop, Hive and Spark setup.
Step 1: Install Apache Hadoop - https://youtu.be/FzH7K4N0kT8
Step 2: Install Apache Hive - https://youtu.be/Mi0m78a5W4U
Step 3: Apache Spark 2.4.4 Installation | Part 1 - https://youtu.be/hXLr1dMWVVE
Step 4: Apache Spark 2.4.4 Installation | Part 2 - https://youtu.be/690DO_lXkns
Kafka Producer using Python | Part 3 (17:45)
Build a real-time streaming workflow by consuming Meetup RSVP data and publishing each message to a Kafka topic with a Python producer.
Spark Structured Streaming | MongoDB | Part 4 (35:07)
Spark Structured Streaming | MySQL | Part 5 (30:03)
Visualisation using Python Dash | Part 6 (30:05)
Demo | Part 7 (10:30)
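
To give a feel for the producer step in Part 3, here is a minimal sketch of how one RSVP event might be turned into a keyed Kafka message. It is not the course's code: the topic name, the RSVP field names and the `publish` helper are all assumptions made for illustration, and a plain Python list stands in for the Kafka client so the example runs without a broker.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical topic name; the course listing does not state the real one.
TOPIC = "meetup-rsvp"


def encode_rsvp(rsvp: Dict) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes for one RSVP event.

    The field names ("group", "group_id") are assumed, not taken from the course.
    Keying by group id would keep all RSVPs of one group in the same partition.
    """
    key = str(rsvp["group"]["group_id"]).encode("utf-8")
    value = json.dumps(rsvp, sort_keys=True).encode("utf-8")
    return key, value


def publish(events: Iterable[Dict], send: Callable[[str, bytes, bytes], None]) -> int:
    """Encode each event and hand it to `send`; return how many were published."""
    count = 0
    for rsvp in events:
        key, value = encode_rsvp(rsvp)
        send(TOPIC, key, value)
        count += 1
    return count


# Stand-in for a Kafka client: collect the messages in a list instead of sending them.
sent: List[Tuple[str, bytes, bytes]] = []
sample = [
    {"rsvp_id": 1, "response": "yes",
     "group": {"group_id": 42, "group_name": "Scala Users"}},
]
published = publish(sample, lambda topic, key, value: sent.append((topic, key, value)))
```

With the kafka-python package, the `send` argument could wrap `KafkaProducer.send(topic, key=key, value=value)`; whether the course uses that client is not stated here.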

Requirements

Basic knowledge of any programming language
Basic understanding of Apache Spark

Description

End to End Project Development of a Real-Time Message Processing Application: In this Apache Spark project, we are going to build a Meetup RSVP Stream Processing Application using Apache Spark with the Scala API, Spark Structured Streaming, Apache Kafka, Python, Python Dash, MongoDB and MySQL. We will build a data pipeline that takes data from a streaming source (the Meetup.com RSVP Stream API) all the way to data visualisation, using Apache Spark and other big data frameworks.
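
The stream-processing stage of such a pipeline can be pictured as a running aggregation that is updated as each small batch of messages arrives. The sketch below illustrates that idea in plain Python only; it does not use the Spark API, and the field names (`group_name`, `response`) and the `update_counts` helper are assumptions for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def update_counts(totals: Counter, batch: Iterable[bytes]) -> Counter:
    """Fold one batch of JSON-encoded RSVP messages into the running totals.

    Each total is keyed by (group_name, response), similar in spirit to a
    group-by count that a streaming engine keeps up to date.
    """
    for raw in batch:
        event = json.loads(raw)
        totals[(event["group_name"], event["response"])] += 1
    return totals


# Two hypothetical micro-batches arriving one after the other.
batches = [
    [b'{"group_name": "Scala Users", "response": "yes"}',
     b'{"group_name": "Scala Users", "response": "no"}'],
    [b'{"group_name": "Scala Users", "response": "yes"}'],
]

totals: Counter = Counter()
for batch in batches:
    update_counts(totals, batch)
```

In the course itself this role is played by Spark Structured Streaming with the Scala API, which additionally handles distribution and fault tolerance; the aggregated results are what get written to the databases and read by the dashboard.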

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system, written in Java and Scala, developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
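
As a rough illustration of the difference, the sketch below holds one RSVP as a nested document, the shape a document store such as MongoDB can keep as-is, and then flattens it into the separate rows a relational schema would typically need. The field names, table names and the `to_rows` helper are hypothetical and not taken from the course.

```python
from typing import Dict, List, Tuple

# One RSVP as a nested document (hypothetical fields).
rsvp_document = {
    "rsvp_id": 1,
    "response": "yes",
    "group": {
        "group_id": 42,
        "group_name": "Scala Users",
        "topics": ["scala", "spark"],
    },
}


def to_rows(doc: Dict) -> Dict[str, List[Tuple]]:
    """Split a nested RSVP document into rows for three hypothetical tables."""
    group = doc["group"]
    return {
        # rsvp(rsvp_id, response, group_id)
        "rsvp": [(doc["rsvp_id"], doc["response"], group["group_id"])],
        # group(group_id, group_name)
        "group": [(group["group_id"], group["group_name"])],
        # group_topic(group_id, topic): one row per list element
        "group_topic": [(group["group_id"], topic) for topic in group["topics"]],
    }


rows = to_rows(rsvp_document)
```

The nested list of topics is the part that has no direct tabular equivalent: the document keeps it inline, while the relational form needs a separate table joined on `group_id`.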

Who this course is for:

Beginners who want to learn the end-to-end Apache Spark/Big Data project development process and architecture