
Trace the history of Apache Flink from the Stratosphere research project to an Apache Incubator project, showing its evolution from a Java API into a platform for batch, stream, and graph processing and machine learning.
Discover Apache Flink's high-performance, low-latency features: exactly-once stateful computation, flexible time and session windows, fault-tolerant distributed snapshots, and memory-efficient stream and batch data processing.
Explore how Flink's job managers coordinate task execution, scheduling, and fault-tolerant checkpoints with the Akka actor system enabling leader election and communication with task managers.
Explore how task managers serve as worker nodes that execute tasks in the JVM, allocate memory per task slot, balance parallelism across slots, and share TCP connections and heartbeat messages.
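To make the session windows mentioned among the features above more concrete, here is a toy sketch in plain Python. It does not use Flink's API; the function name `session_windows` and the `gap` parameter are illustrative assumptions, and the sketch sorts a finite list of timestamps rather than consuming a live stream.

```python
from typing import List


def session_windows(timestamps: List[float], gap: float) -> List[List[float]]:
    """Group event timestamps into sessions.

    A new session starts whenever the time since the previous event
    exceeds `gap` (a hypothetical inactivity threshold).
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            # Still within the inactivity gap: extend the current session.
            sessions[-1].append(ts)
        else:
            # Gap exceeded (or first event): open a new session.
            sessions.append([ts])
    return sessions


clicks = [1.0, 2.5, 3.0, 10.0, 11.0, 30.0]
print(session_windows(clicks, gap=5.0))
# → [[1.0, 2.5, 3.0], [10.0, 11.0], [30.0]]
```

Unlike fixed-size time windows, the window boundaries here are determined by the data itself: a burst of activity stays together, and a pause longer than the gap closes the session.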
Apache Flink is an open source framework for distributed, stateful processing of streaming and batch data, and it is often deployed alongside Apache Hadoop. Distributions and managed services based on it are offered by vendors such as Cloudera and Amazon. The examples provided in this course were developed using Cloudera's distribution of Apache Flink. This course is intended for those who want to learn Apache Flink.
Apache Flink is used to process large volumes of data with low latency, and its SQL support lets you build on traditional SQL knowledge.
To make the most of this course, you should have a good understanding of the basics of Hadoop and HDFS commands. It is also recommended to have a basic knowledge of SQL before going through this course.
Apache Flink is a next-generation Big Data tool, sometimes described as the 4G (fourth generation) of Big Data.
It is a true stream processing framework: it handles each event as it arrives instead of cutting the stream into micro-batches.
Flink’s kernel (core) is a streaming runtime that also provides distributed processing, fault tolerance, and related services.
Flink processes events at consistently high throughput with low latency.
It is a large-scale data processing framework that can handle data generated at very high velocity.
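The latency difference between event-at-a-time processing and micro-batching noted above can be illustrated with a small model in plain Python. This is only a sketch under simplifying assumptions: it is not Flink code, it ignores actual processing time, and the function names and the one-second batch interval are made up for the example.

```python
import math
from typing import List


def per_event_emit_times(arrivals: List[float]) -> List[float]:
    """Event-at-a-time model: each event is handled the moment it arrives."""
    return list(arrivals)


def micro_batch_emit_times(arrivals: List[float], interval: float) -> List[float]:
    """Micro-batch model: an event waits until its batch interval closes."""
    return [(math.floor(t / interval) + 1) * interval for t in arrivals]


def mean_latency(arrivals: List[float], emits: List[float]) -> float:
    """Average time between an event's arrival and its result being emitted."""
    return sum(e - a for a, e in zip(arrivals, emits)) / len(arrivals)


arrivals = [0.25, 0.5, 1.75, 2.0]  # arrival times in seconds (illustrative)
streaming = mean_latency(arrivals, per_event_emit_times(arrivals))
batched = mean_latency(arrivals, micro_batch_emit_times(arrivals, interval=1.0))
print(streaming, batched)
# → 0.0 0.625
```

In this model the micro-batch approach adds a waiting time of up to one batch interval per event, while the event-at-a-time approach adds none; real systems also incur processing and network costs in both cases.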
Flink is an alternative to MapReduce and is claimed to process data more than 100 times faster, although the actual speedup depends on the workload. It is independent of Hadoop, but it can read data from and write data to HDFS. Flink does not provide its own data storage system; it takes its data from distributed storage.