Getting Started with Apache Flink

Rating: 3.8 (17 reviews)

An Overview of Apache Flink

Created by Anurag Kaushik

Last updated 11/2022

English

English [Auto]

What you'll learn

Architecture of Apache Flink
Distributed Execution
Job Manager & Task Manager
How to download and install Flink on different machines

Course content

1 section • 18 lectures • 54m total length

What is Apache Flink (2:27)
Explore Apache Flink's streaming-first architecture, its true stream processing model that treats batch as a special case, and key topics like history, architecture, and running a sample application.
History of Apache Flink (1:42)
Trace the history of Apache Flink from the Stratosphere project to an Apache Incubator project, showing its evolution from a Java API to a platform for batch, stream, and graph processing and machine learning.
Architecture of Apache Flink (2:12)
Explore the architecture of Apache Flink: its layered components, the DataStream and DataSet APIs, the runtime with job graphs, and deployment options ranging from local to YARN or cloud.
Features of Apache Flink (4:11)
Discover Apache Flink's high-performance, low-latency features: exactly-once stateful computation, flexible time and session windows, fault-tolerant distributed snapshots, and memory-efficient stream and batch data processing.
Distributed Execution (1:36)
Coordinate resources through the job client, job manager, and task manager. Submit the job via the job client; the job manager allocates resources and orchestrates tasks.
Job Manager (6:34)
Explore how Flink's job managers coordinate task execution, scheduling, and fault-tolerant checkpoints with the Akka actor system enabling leader election and communication with task managers.
Task Manager (1:04)
Explore how task managers serve as worker nodes that execute tasks in the JVM, allocate memory per task slot, balance parallelism across slots, and share TCP connections and heartbeat messages.
Job Client (1:35)
Accept the user program, transform it into a data flow, submit it to the job manager, and return results after execution, as Flink partitions operators and streams for parallel, distributed processing.
Download JDK (1:54)
Download and install the Java Development Kit (JDK) from Oracle, accept the license, and install JDK 10.0.2 on Windows x64, optionally changing the path and creating a new folder.
Set Path on Enviro. Variables (1:28)
Learn to set the path in environment variables by navigating from Control Panel to Advanced system settings, adding JAVA_HOME, and updating the Path variable accordingly.
Downloading Flink (1:35)
Download Flink binaries to get started on Windows or Linux, and choose the binary compatible with your Hadoop version.
Installation of Flink (2:16)
Install Flink on Windows by extracting the files, navigating to the Flink bin directory, and running start-cluster.bat with administrator rights. Confirm the local instance and web UI at localhost:8081.
Download VMware Workstation Player & Ubuntu ISO image (2:09)
Learn how to download VMware Workstation Player and the Ubuntu ISO image, including navigating to the official sites and initiating the downloads.
Installation of VMware Workstation Player (1:26)
Install VMware Workstation Player by double-clicking, accepting terms, choosing default settings, creating shortcuts in the Start Menu folder, and completing the installation.
Ubuntu Installation (2:50)
Install Ubuntu inside VMware Workstation Player by creating a new virtual machine, configuring a username and password, and finishing setup; set memory to 2048 MB.
Ubuntu Installation Part-2 (0:40)
Start the virtual machine, enter the password, and sign in; if it does not start, go to Player, then Manage, then Virtual Machine Settings, and set memory to 2048 MB.
Multiple Java Installation (9:59)
Install Java on Ubuntu: update the package index, install the Java Runtime Environment and JDK, add the Oracle Java 8 installer, configure the default Java with update-alternatives, set JAVA_HOME in /etc/environment, and verify.
Installation of Flink on Ubuntu (9:02)
Learn to install Apache Flink on Ubuntu: prepare Java, download and extract Flink, set permissions, configure the Flink home directory and environment, and start Flink in local mode with the dashboard.
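
The lecture descriptions above mention that Flink partitions operators and streams for parallel, distributed processing. As a rough illustration of that idea, the sketch below routes keyed records to parallel subtasks so that every record with the same key reaches the same subtask. It is plain Python, not the Flink API: the names `subtask_for` and `partition_by_key` are hypothetical, and the CRC32-modulo rule is an assumption chosen for simplicity rather than the partitioning scheme Flink actually uses.

```python
import zlib
from collections import defaultdict


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a subtask index for a key (assumed rule: CRC32 modulo parallelism)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def partition_by_key(records, parallelism):
    """Route (key, value) records so that each key lands on exactly one subtask."""
    subtasks = defaultdict(list)
    for key, value in records:
        subtasks[subtask_for(key, parallelism)].append((key, value))
    return dict(subtasks)


# Hypothetical sensor readings, split across two parallel subtasks.
records = [("sensor-a", 1), ("sensor-b", 5), ("sensor-a", 2),
           ("sensor-c", 7), ("sensor-b", 1)]
partitions = partition_by_key(records, parallelism=2)
for index, assigned in sorted(partitions.items()):
    print(index, assigned)
```

Because all records for one key stay together, each subtask can keep per-key state locally without coordinating with the others.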

Requirements

Basic knowledge of SQL

Description

Apache Flink is an open source framework and distributed engine for stateful stream and batch processing. It is shipped by vendors such as Cloudera, MapR, Oracle, and Amazon. The examples provided in this course have been developed using Cloudera Apache Flink. This course is intended for those who want to learn Apache Flink.

Apache Flink is used to process large volumes of data at high speed, and its Table API and SQL support let you apply traditional SQL knowledge.

To make the most of this course, you should have a good understanding of the basics of Hadoop and HDFS commands. It is also recommended to have a basic knowledge of SQL before going through this course.

Apache Flink is a next-generation Big Data tool, sometimes described as the 4G of Big Data.

It is a true stream processing framework: it processes events one at a time instead of cutting the stream into micro-batches.
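
To make the difference concrete, here is a minimal sketch in plain Python (not Flink code) that contrasts the two models on a running total. The function names are hypothetical, and the micro-batch variant closes a batch by event count, which is an assumption made for brevity; real micro-batch systems usually close batches on a time interval.

```python
def process_per_event(events):
    """Emit an updated running total after every single event."""
    total = 0
    for value in events:
        total += value
        yield total


def process_micro_batches(events, batch_size):
    """Buffer events and emit a total only when a batch fills or the input ends."""
    total = 0
    batch = []
    for value in events:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            batch.clear()
            yield total
    if batch:  # flush the final, partially filled batch
        total += sum(batch)
        yield total


events = [3, 1, 4, 1, 5]
per_event = list(process_per_event(events))        # [3, 4, 8, 9, 14]
micro = list(process_micro_batches(events, 2))     # [4, 9, 14]
print(per_event, micro)
```

Both versions reach the same final total, but the per-event version produces a result as soon as each event arrives, while the micro-batch version makes every result wait for its batch to close.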

Flink’s kernel (core) is a streaming runtime that also provides distributed processing and fault tolerance.
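
One common way a streaming runtime achieves fault tolerance is to snapshot its state periodically and, after a failure, restore the last snapshot and replay the input from that point. The sketch below simulates this in plain Python for a single per-key counter. It is a simplified, single-process illustration under stated assumptions, not Flink's distributed snapshot mechanism: `run_with_checkpoints` and its parameters are hypothetical, and the input is assumed to be replayable from any offset.

```python
import copy


def run_with_checkpoints(events, checkpoint_every, fail_at=None):
    """Count events per key, snapshotting (offset, state) every few events.

    If fail_at is set, simulate one crash just before that offset is processed:
    the live state is discarded, the last snapshot is restored, and the input
    is replayed from the snapshot's offset.
    """
    state = {}
    snapshot = (0, {})  # (next offset to read, copy of state at that point)
    offset = 0
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            offset, saved = snapshot          # rewind the source
            state = copy.deepcopy(saved)      # restore the counters
            continue
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, copy.deepcopy(state))
    return state


events = ["a", "b", "a", "c", "a"]
clean = run_with_checkpoints(events, checkpoint_every=2)
recovered = run_with_checkpoints(events, checkpoint_every=2, fail_at=3)
print(clean, recovered)
```

Because the snapshot stores the state together with the input offset, the replayed events are counted exactly once in the restored state, so the run with a failure ends with the same counts as the run without one.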

Flink is a large-scale data processing framework that can handle data generated at very high velocity, processing events at a consistently high speed with low latency.

Flink is an alternative to MapReduce and is claimed to process data more than 100 times faster. It is independent of Hadoop, but it can read data from and write data to HDFS. Flink does not provide its own data storage system; it takes its data from distributed storage.

Who this course is for:

Students, Programmers, Learners