What is Kafka Streams?

A free video tutorial from Stephane Maarek | AWS Certified Cloud Practitioner,Solutions Architect,Developer
Best Selling Instructor, 10x AWS Certified, Kafka Guru
Lecture description
Learn what Kafka Streams is at a high level
Learn more from the full course
Apache Kafka Series - Kafka Streams for Data Processing
Learn the Kafka Streams API with Hands-On Examples, Learn Exactly Once, Build and Deploy Apps with Java 8
04:49:11 of on-demand video • Updated September 2023
Write four Kafka Streams applications in Java 8
Configure Kafka Streams to use Exactly Once Semantics
Scale Kafka Streams applications
Program with the High Level DSL of Kafka Streams
Build and package your application
Write tests for your Kafka Streams Topology
And so much more!
Hi, and welcome to this new course in the Apache Kafka series. This course is on Kafka Streams. I'm glad you're joining me on this course, so get ready: it's going to be lots of learning. So first, of course, the introduction. We're going to get started with Kafka Streams. We're going to see how to run a first application. We're going to see what Kafka Streams is, understand how it fits in the ecosystem, and so on. With this brief introduction, I really hope you can get a takeaway from it and understand what we're going to do for the rest of the course.
So the first question is: what is Kafka Streams? Kafka Streams is an easy data processing and transformation library within Kafka. It ships with the Kafka binary. It's within the Kafka project, so it's not an external library created by a third party. So here you have Kafka, and you can create Kafka Streams applications of any kind. It could be to transform data, to enrich data, or to perform, for example, threat detection or monitoring and alerting. So there are basically lots of applications. The idea is that Kafka Streams is a library that you set on top of Kafka and that you create your applications on.
So what is Kafka Streams, really? It's a standard Java application. It's just a Java library, and you launch it like any Java application; we'll see this during the course. You don't need to create a separate cluster for a Kafka Streams application like you would for Spark, Flink, or NiFi, and I'll have a lecture that goes over the differences. But the easy thing is that it's just a Java application, no clusters. It's highly scalable, elastic, and fault tolerant, because it's integrated with Kafka and inherits the benefits that Kafka provides. And that makes it really, really awesome. It has exactly-once capabilities, and there is a section in this course about what exactly once means. But this is the first library in the world that provides streaming exactly-once capabilities tied in with Kafka.
And that's a huge thing in the streaming world. It processes records one at a time, so there is no batching. So this is true streaming; some other libraries, like Spark Streaming, process things in batches. And then it works for any application size. So even if you have a small project or a very, very large project, you write the same code, you get the same application, and it scales the same way. So it's really awesome.
So let's look at the architecture design. Okay, so you've seen this slide if you looked at my Kafka Connect course, but let's go over it again. You have a Kafka cluster, and it has several brokers, in this case four, but it can be from 1 to 100 or whatever you want. And you have your sources, and usually the way you onboard the sources in the perfect Kafka architecture design is that you have a Connect cluster. If you don't know what a Connect cluster is, I recommend you look at my Connect course. You can find a link in the last lecture of this course. So you have your sources, and your Connect cluster basically onboards them onto Kafka. Now your data is in Kafka and you want to process it. That's where you have your Streams applications. Kafka Streams applications basically sit on the right-hand side, and they go from Kafka to Kafka. And that's really cool, because all the data processing, all the data transformation, is tightly integrated with Kafka. Finally, you want to expose this transformed data to your sinks, for example a database, Elasticsearch, or whatever. Then you again use your Connect cluster for this. And this is all described in my Connect course. So in the Connect course you saw the left-hand side, and in this course we're really going to see the right-hand side, to do data transformation and processing using Kafka Streams.
So a bit of history about Kafka Streams. This API was introduced as part of Kafka 0.10, sometime in 2016, and became fully mature as part of Kafka 0.11.0.0, which is June 2017.
So this is a really new library. Again, the API can change and it will change, but what you are learning here is still very applicable in case of any changes. As I said before, it's the only library that can leverage the new exactly-once capability from Kafka 0.11, and I have a whole section on this. And then it is a serious contender to other stream processing frameworks such as Spark, Flink, or NiFi, or any other streaming library. So it's really, really good to learn it, and I'm glad you are taking this journey with me. And then finally, as I said, it's a new library, so it's prone to changes. So don't be afraid if things change in the future. What you need to learn is the ideas behind it; the changes to the API should be somewhat minor in the future.
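To make the "one record at a time, no batching" idea from this lecture a bit more concrete, here is a minimal sketch in plain Java. It deliberately does not use the Kafka Streams API and needs no Kafka dependency. The class name RecordAtATimeSketch and both method names are hypothetical, chosen only for illustration. It simply contrasts emitting a result for every record as it arrives with buffering records and emitting them in groups, which is roughly how a micro-batch engine behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: this is NOT the Kafka Streams API.
class RecordAtATimeSketch {

    // Record-at-a-time: every incoming record is transformed and emitted
    // immediately, so there is exactly one emission per input record.
    static List<String> processPerRecord(List<String> input,
                                         Function<String, String> transform) {
        List<String> emissions = new ArrayList<>();
        for (String record : input) {
            emissions.add(transform.apply(record));
        }
        return emissions;
    }

    // Micro-batch: transformed records are buffered and only emitted once
    // the buffer holds batchSize records (or the input is exhausted).
    static List<List<String>> processInBatches(List<String> input,
                                               int batchSize,
                                               Function<String, String> transform) {
        List<List<String>> emissions = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String record : input) {
            buffer.add(transform.apply(record));
            if (buffer.size() == batchSize) {
                emissions.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            emissions.add(new ArrayList<>(buffer)); // flush the partial batch
        }
        return emissions;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("click", "view", "purchase");
        // Three separate emissions, one per record.
        System.out.println(processPerRecord(input, String::toUpperCase));
        // Two emissions: a full batch of two, then the leftover record.
        System.out.println(processInBatches(input, 2, String::toUpperCase));
    }
}
```

In the per-record version a downstream consumer can see each result without waiting for other records; in the batched version a record waits until its batch is complete. That waiting is the latency difference the lecture is pointing at when it calls record-at-a-time processing "true streaming".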