Running your first Kafka Streams Application: WordCount

A free video tutorial from Stephane Maarek | AWS Certified Cloud Practitioner,Solutions Architect,Developer
Best Selling Instructor, 10x AWS Certified, Kafka Guru
Rating: 4.7 out of 5 (instructor rating)
65 courses
2,405,454 students

Lecture description

Full end to end run of your first Kafka Streams application.

We will download Kafka, start our own cluster, and run producers, consumers, and our first Kafka Streams application

Learn more from the full course

Apache Kafka Series - Kafka Streams for Data Processing

Learn the Kafka Streams API with Hands-On Examples, Learn Exactly Once, Build and Deploy Apps with Java 8

04:49:11 of on-demand video • Updated February 2024

Write four Kafka Streams applications in Java 8
Configure Kafka Streams to use Exactly Once Semantics
Scale Kafka Streams applications
Program with the High Level DSL of Kafka Streams
Build and package your application
Write tests for your Kafka Streams Topology
And so much more!
Okay, so let's get into the heart of the subject. We're going to run our first Kafka Streams application. We're not going to code it or analyze code right now; we're just going to see how to run it, see what it does, and understand how Kafka Streams works at a high level, just as an introduction to ease into it. So first we're going to download the Kafka binaries, because we're going to run Kafka in a different way: before, we ran Kafka using Landoop, and in this course I want to show you running Kafka using the binaries. It's very important that you know how both work, so you know which one you prefer, etc. It really matters for you to understand the full extent of how these things work. So we're going to start Zookeeper and Kafka using these binaries, and it's going to be very easy. We're then going to create our input and output topics using the kafka-topics command that you already know. Then we're going to publish data to the input topic, and we're going to run the WordCount example. It's a Kafka Streams application that's provided with Kafka as an example, and WordCount does what it says: it just counts words in sentences. So we're going to see this example, and then we're going to stream the output topic using a Kafka console consumer. So it's a full rundown of everything you've seen, except the WordCount example and launching Kafka using the binaries. But it's going to be very easy, very hands-on already, because that's what this course is going to be: hands-on, and you're going to learn something already. So let's get going. If you go on Google, type Apache Kafka, and take the first link, you'll be redirected to the Apache Kafka page. What you have to do here is go ahead and download Apache Kafka using the bottom left button. Once you get to the downloads, it is extremely important that you choose 0.11.0.1 or above. Do not choose 0.11.0.0, because otherwise this first part of the tutorial will not work in the future lectures.
It's fine if you choose whichever afterwards, but for this part of the lecture, please download this one. So you go ahead, and under binary downloads you can choose Scala 2.11 or Scala 2.12, but the web page recommends Scala 2.11, so I'll just click on the first link. What we'll do here is basically download the Kafka binaries. It's a different approach from what we've done in the other courses, but it's good for you to have an overview of how to use Kafka using the raw binaries directly. So here we go: on this page you just take the first link, and you should take the first link too. What this will do is download Apache Kafka; as you see, the download has started in the bottom left. This is a compressed archive, so really what you have to do is just decompress it using your favorite tool, maybe WinRAR, maybe some command line, whatever you feel like; it's pretty easy. So once Kafka is downloaded, you will extract it and end up in a directory. That directory will contain six different things: two files and four directories. We have LICENSE, NOTICE, bin, config, libs and site-docs. bin is basically where the binaries are, config is where the configs are, libs is where the jar dependencies are, and site-docs is just documentation. So what we'll do now is follow through on the code that you can download as a resource of this lecture, and we'll just basically launch our first Kafka Streams application. We won't go into the code, but we'll look at what the output and the outcome of that Kafka Streams application is. By the way, if you're a Windows user, you can use the course intro Windows batch file and follow along; the commands are very, very similar, and it's worth it for you to watch this video and just apply the commands in that file instead of the other one. Okay, so first, what we have to do before starting Kafka is to start Zookeeper.
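The download-and-extract step described above can be sketched as follows. This is a minimal sketch: the archive name assumes the Kafka 0.11.0.1 / Scala 2.11 build mentioned in the lecture, so adjust it to the file you actually downloaded.

```shell
# Extract the downloaded Kafka archive (file name assumes 0.11.0.1 / Scala 2.11)
tar -xzf kafka_2.11-0.11.0.1.tgz
cd kafka_2.11-0.11.0.1

# Two files and four directories: LICENSE, NOTICE, bin, config, libs, site-docs
ls
```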
So the command for this is bin/zookeeper-server-start.sh config/zookeeper.properties. The reason we do bin/zookeeper-server-start.sh is because that's where the Zookeeper script is located: it's in that bin folder. Finally, the config we're interested in for Zookeeper is this file right here. I won't go into the details, but if you're interested in learning about Zookeeper, you can definitely check out my Kafka Setup course, which tells you more about it. Anyway, the defaults are fine for now; we press enter, and as you can see we get some log output right here, but at the end we see that it says binding to port 2181, and Zookeeper is launched. That's great. Once Zookeeper is launched, the next thing we have to do is start Kafka, fairly easy. We use another command, bin/kafka-server-start.sh, and then we give it the config/server.properties file for Kafka; these are default, basic server properties. So I'll just go ahead and paste that, and you can see that Kafka has started. There was some log output on Zookeeper as well, but what we see at the bottom right is that it says Kafka server started, and the Kafka version is 0.11.0.1. Perfect. Make sure that it is that one or above, not 0.11.0.0. Okay, so now we have Kafka and Zookeeper started. I'll just go into two other shells. Do not ever close these two, okay, because if you close these shells, then you'll stop Kafka or you'll stop Zookeeper, and we definitely want them to be running all the time, so we keep them open. So in another two shells I'll go and run my other commands. The first thing we have to do is create the input topic for our Streams application. That's fairly easy; I will just copy the command and explain it to you. We use the kafka-topics command again; it's in the bin directory. --create, as we've seen before in the beginners class, and --zookeeper at localhost:2181, which will be localhost:2181 for everyone.
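The two startup commands from this step look like the following when typed from the extracted Kafka directory. Run each in its own shell and leave both running; the script and config paths match the stock 0.11.x distribution.

```shell
# Shell 1: start Zookeeper with the default config (binds to port 2181)
bin/zookeeper-server-start.sh config/zookeeper.properties

# Shell 2: start the Kafka broker with the default config
# (connects to Zookeeper on localhost:2181, listens on localhost:9092)
bin/kafka-server-start.sh config/server.properties
```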
The replication factor is one and the partitions are one, just to make it very educative. And then finally the topic is named streams-plaintext-input. We press enter, and we're greeted with the response that the topic was created. We will go ahead and also create the output topic. It's very important in a Kafka Streams application that you do create the input and the output topics ahead of time, otherwise the application will not do this for you. So now what we can do is verify that the topics were indeed created. We do bin/kafka-topics.sh, and then --zookeeper localhost:2181 --list. And as we see, the two topics listed are the ones we created before. Perfect, so far so good. Now what we'll do is publish data to the input topic. For this, I will start a console producer; it takes an argument of --broker-list, and your broker is going to be at localhost:9092. By the way, localhost is the exact same as 127.0.0.1; this is exactly the same thing, and it will be the same for everyone again. Finally, we publish to a specific topic, and the topic is streams-plaintext-input. When we're there, we see that there's a little arrow, which means that some data is ready to be input. So we'll just go ahead and enter some data. I'll enter some text: "kafka streams udemy" as one sentence, "kafka data processing" as the second sentence, and then "kafka streams course" as the third sentence. So let's go ahead: kafka streams udemy, enter; then kafka data processing, enter; and then kafka streams course, enter. To break out of here, I'll just do Control+C, and this is done. So far so good, right? What we see already in that text, because we'll do a word count, is that kafka appears three times, streams appears two times, and udemy, data, processing and course each appear one time.
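The topic-creation, verification and producer commands described above can be sketched like this. The flags and topic names follow the standard Kafka WordCount quickstart for the 0.11.x CLI (which still uses --zookeeper for topic management).

```shell
# Create the input topic (1 partition, replication factor 1 for this demo)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-plaintext-input

# Create the output topic ahead of time as well
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-wordcount-output

# Verify that both topics exist
bin/kafka-topics.sh --zookeeper localhost:2181 --list

# Publish the three sentences, then Ctrl+C to exit the producer
bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-plaintext-input
```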
So we expect the count of kafka to be three, the count of streams to be two, the count of udemy to be one, etc. Let's just verify that some data was indeed published to the topic. For this we start a console consumer; the bootstrap server is the same, the topic is the same, but we add the --from-beginning flag to say that we want to fetch all the data in that topic. We press enter, and as you can see, the data was indeed retrieved. So that means that Kafka does hold our data. Then, because the consumer is a long-running process, you need to stop it, and to stop it you do Control+C; it says it processed all three messages. So that's awesome, right? Our input topic has the data; now we just need to go ahead and process it. But before we do so, I want to start a consumer on the output topic. So let me just copy the command and describe it to you; I'll copy it on the right-hand side. That command basically starts a consumer, but it's a little bit more tricky than what you had before. It connects to the bootstrap server at localhost:9092, and the topic is streams-wordcount-output, which is the output topic we plan on writing to. --from-beginning we've seen before, but then there is a formatter, and that formatter will basically go ahead and print the keys and the values in the console. Not too much to worry about, but this distinction is important, because we will see the keys and the values being printed in the log, which is what we want. So I go ahead and press enter, and the console consumer has started, but obviously, because there is no data being written to the output topic yet, we don't see anything. So now the magic moment is when we start the Streams application, and for this we do bin/kafka-run-class.sh and then the WordCount demo class. Let's go ahead and copy that command.
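The formatted consumer and the WordCount launch command look like the following; this matches the Kafka Streams quickstart for this version. The deserializer properties are needed because WordCount writes its counts as longs, not strings.

```shell
# Consume the output topic, printing keys (words) and values (counts);
# counts are serialized as longs, hence the LongDeserializer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

# In another shell: run the WordCount demo application bundled with Kafka
bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
```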
And I want you to pay attention to what's going to happen on the right when I press enter. Here we go. As you can see, the WordCount application is still running, but on the right there was a lot of information that got printed out. That's the output of our Kafka Streams application, so let's go ahead and look at it. The first thing is kafka 1, then we have streams 1 and udemy 1. They basically correspond to the first sentence: kafka 1, streams 1, udemy 1. So the application goes and looks at this one sentence. Then we have kafka 2; that's because the Kafka Streams application reads kafka again, and it already encountered kafka once, so it says kafka 2. Then we get data, which is one, and processing, which is one. Finally we get kafka 3, because we were at two before; then we get streams, and streams obviously isn't going to be one anymore, so we have a two now; and course, one. It's a classic word count, but it's been done in a streaming way, meaning that every single computation was pushed back into Kafka, which is really cool. So what I can do now is stop the Kafka Streams application by doing Control+C, and stop my console consumer. So here we go: we just read nine messages, and each message was an update in my streaming computation for my Kafka Streams application, which is really, really cool. To summarize what we did right here: we basically created an input topic, created an output topic, then we ran the Kafka Streams application, and it automatically went ahead and created that word count for us in a streaming fashion. That was real time; that was awesome. It was not a batch. If it was done in a batch fashion, it would only have given us kafka 3, but never kafka 2 or kafka 1, and that's the power of Kafka Streams: right here you can do streaming. Finally, I want to draw your attention to something. If you do the kafka-topics command again to get the list of the topics, we see now that there are more topics.
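The incremental-update behaviour described above (kafka 1, then kafka 2, then kafka 3, rather than a single final kafka 3) can be simulated locally with plain shell tools, no Kafka needed: each occurrence of a word emits an updated running count, just like the nine update messages on the output topic.

```shell
# Emit a running count per word, one update per occurrence,
# mimicking the streaming updates seen on the output topic.
# Prints nine updates, starting with "kafka 1" and ending with "course 1".
printf 'kafka streams udemy\nkafka data processing\nkafka streams course\n' \
  | tr ' ' '\n' \
  | awk '{ count[$0]++; print $0, count[$0] }'
```

A batch word count, by contrast, would only ever print the final totals (kafka 3, streams 2, ...), never the intermediate states.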
Obviously there's __consumer_offsets, because we had a consumer, but there are also streams-wordcount-counts-changelog and streams-wordcount-counts-repartition. And you ask me, what are those? Well, these are internal Kafka Streams topics, and Kafka Streams will go ahead and create them based on the computation you do. You don't manage them, but they contain basically a lot of good data, which allows Kafka Streams to do its magic. Okay, so we'll understand everything in more detail in the next sections, but here I really wanted to give you a little taster of how Kafka Streams works, how it's being used, and what we can do with it. So I hope you're excited, I hope you really liked that WordCount example, and I'll see you soon.