Running your first Kafka Streams Application: WordCount

A free video tutorial from Stephane Maarek | AWS Certified Cloud Practitioner, Solutions Architect, Developer
Best Selling Instructor, Kafka Guru, 9x AWS Certified
4.7 instructor rating • 41 courses • 922,316 students

Lecture description

A full end-to-end run of your first Kafka Streams application.

We will download Kafka, start our own cluster, and run producers, consumers, and our first Kafka Streams application.

Learn more from the full course

Apache Kafka Series - Kafka Streams for Data Processing

Learn the Kafka Streams API with Hands-On Examples, Learn Exactly Once, Build and Deploy Apps with Java 8

04:48:46 of on-demand video • Updated July 2021

  • Write four Kafka Streams applications in Java 8
  • Configure Kafka Streams to use Exactly Once Semantics
  • Scale Kafka Streams applications
  • Program with the High Level DSL of Kafka Streams
  • Build and package your application
  • Write tests for your Kafka Streams Topology
  • And so much more!
English [Auto] OK, so let's get into the heart of the subject. We're going to run our first Kafka Streams application. We're not going to code it, and we're not going to analyze code right now; I'm just going to show you how to run it, see what it does, and understand how Kafka Streams works at a high level, just as an introduction, OK, to ease into it. So first, we're going to download the Kafka binaries, because we're going to run Kafka in a different way than before. In this course, I want to show you how to run Kafka using the binaries. It's very important that you know how both approaches work, so you know which one you prefer, etc. It really matters for you to understand the full extent of how these things work. So we're going to start Zookeeper and Kafka using these binaries, and it's going to be very easy. Then we're going to create the input and output topics using the kafka-topics command that you already know, and then we will publish data to the input topic. We're going to run the WordCount example. It's a Kafka Streams application that's provided with Kafka as an example, and WordCount does what it says: it just counts words in sentences. So we're going to see this example, and then we're going to stream the output topic using a Kafka console consumer. OK, so a full rundown of everything you've seen, except for the WordCount example and launching Kafka using the binaries. But it's going to be very easy and very hands-on already, because that's how this course is: it's going to be hands-on, and you're going to learn something right away. So let's get going. If you go to Google, type "Apache Kafka" and take the first link, you'll be redirected to the Apache Kafka page. All you have to do here is go ahead and download Apache Kafka using the bottom-left button. Once you get to the downloads, it is extremely important that you choose 0.11.0.1 or above.
OK, do not choose 0.11.0.0, because otherwise this first part of the tutorial will not work. For the future lectures it's fine if you choose whichever, but for this part of the lecture, please download this one, OK? So you go ahead, and you have two binary downloads. You can do Scala 2.11 or 2.12, but the webpage recommends Scala 2.11, so I'll just click on the first link. What we'll do here is basically download the Kafka binaries. This is a different approach from what we've done in the other courses, but it's good for you to have an overview of how to use Kafka using the raw binaries directly. So here we go. We are on this page, and you should take the first mirror link. What this will do is download Apache Kafka; you see the download has started in the bottom left. This is a compressed archive, so really all you have to do is decompress it using your favorite tool, maybe a GUI tool, maybe some command line, whatever you feel like; it's pretty easy. So once Kafka is downloaded, you will extract it and you will end up in a directory, and that directory will contain six different things: two files and four directories. OK, so we have LICENSE, NOTICE, bin, config, libs and site-docs. bin is basically where the binaries are, config is where the configurations are, libs is where the dependencies are, and site-docs is just documentation. So what we'll do now is follow through on the code you can download as a resource of this lecture. OK, we'll follow through and we'll just basically launch our first Kafka Streams application. We won't go into the code, but we'll look at what the output and the outcome of that Kafka Streams application is. By the way, if you're a Windows user, you can use the course's Windows .bat commands file and follow along. The commands are very, very similar, and it's worth it for you to watch this video and just apply the commands from that file instead of the other one.
OK, so first, what we have to do before starting Kafka is to start Zookeeper. The command for this is bin/zookeeper-server-start.sh config/zookeeper.properties. The reason it's bin/zookeeper-server-start.sh is because that's where the Zookeeper script is located: it's in that bin folder. Finally, the config we're interested in to launch Zookeeper is this file right here. I won't go into the details, but if you're interested in learning about Zookeeper, you can definitely check out my Kafka cluster setup course, which tells you more about Zookeeper. Anyway, the defaults are fine for now. We press Enter, and as you can see, we get some log output right here, but at the end we see that it says "binding to port 2181", and Zookeeper has launched. That's great. Once Zookeeper is launched, the next thing we have to do is to start Kafka. Fairly easy: we use another command, bin/kafka-server-start.sh, and we give it the config/server.properties file for Kafka. These are default, basic server properties, so just go ahead and use those, and you can see that Kafka has started. There was some log output on the Zookeeper side as well, but what we see on the bottom right is that it says the Kafka server started, and the Kafka version is 0.11.0.1. Perfect. Make sure it is that one or above, OK, not 0.11.0.0. So now we have Kafka and Zookeeper started. I'll just go into two other shells. Do not ever close these two, OK, because if you close these shells, then you'll stop Kafka and stop Zookeeper. We definitely want them to be running all the time, so we keep them open. So in another two shells, I'll go and run my other commands. The first thing we have to do is to create the input topic for our Streams application. That's fairly easy. I will just copy that command and explain it to you: we use the kafka-topics command again.
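The two startup steps described above can be sketched as follows, assuming you run them from the root of the extracted Kafka 0.11 directory (these are the scripts and config files shipped in the bin/ and config/ folders of the binary download):

```shell
# Terminal 1: start Zookeeper with the default config (binds to port 2181)
bin/zookeeper-server-start.sh config/zookeeper.properties

# Terminal 2: start the Kafka broker with the default config (listens on port 9092)
bin/kafka-server-start.sh config/server.properties
```

Keep both terminals open for the rest of the lecture; closing either one stops the corresponding server. Windows users would run the matching .bat scripts in bin\windows instead.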
It's in the bin directory. We pass --create, as we've seen before in the beginners course, then --zookeeper localhost:2181, and it will be localhost:2181 for everyone. The replication factor is one and the partitions are one, just to keep it very simple. And then finally, the topic is named streams-plaintext-input. Press Enter, and we're greeted with the response that the topic was created. We will go ahead and also create the output topic, streams-wordcount-output. And it's very important in a Kafka Streams application that you create the input and the output topics ahead of time, OK; otherwise the application will not do this for you. So now what we can do is verify that the topics were indeed created. We run bin/kafka-topics.sh with --zookeeper localhost:2181 and --list, and as we see, the two topics listed are the ones we created before. Perfect. So far, so good. Now, what we're doing is publishing data to the input topic. For this, I will start a Kafka console producer, and it takes a --broker-list argument, and your broker is going to be at localhost:9092. By the way, localhost is the exact same as 127.0.0.1; this is exactly the same thing, and it will be the same for everyone again. Finally, we publish to a specific topic, and the topic is streams-plaintext-input. Once we're there, we see that there's a little arrow; that means that some data is ready to be input. So we'll just go ahead and enter some text: "kafka streams udemy" as the first sentence, "kafka data processing" as the second sentence, and then "kafka streams course" as the third sentence. So let's go ahead: "kafka streams udemy", Enter, then "kafka data processing", Enter, and then "kafka streams course", Enter. To break out of here, I'll just do Ctrl+C, and this is done. So far so good, right?
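The topic-creation and producer steps above can be written out like this (the topic names and flags follow the Kafka 0.11 quickstart, run from the Kafka directory against the local broker started earlier):

```shell
# Create the input and the output topics ahead of time
# (1 partition, replication factor 1 -- a single local broker)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-plaintext-input
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-wordcount-output

# Verify that both topics exist
bin/kafka-topics.sh --zookeeper localhost:2181 --list

# Publish the three test sentences to the input topic
# (type each sentence, press Enter, then Ctrl+C to exit)
bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-plaintext-input
```

Note that the console producer reads from standard input: each line you type becomes one message in the topic.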
So what we see already is that in that text, because we'll do our counts, we have "kafka" appearing three times, "streams" appearing two times, "udemy" one time, "data" one time, "processing" one time and "course" one time. So we expect the count of "kafka" to be three, the count of "streams" to be two, and so on, with a count of one for each of the others. Let's just verify that some data was indeed published to the topic. For this we start a console consumer. The broker is the same, the topic is the same, but we add the --from-beginning flag to say that we want to fetch all the data in the topic. We press Enter, and as you can see, the data was indeed retrieved. So that means that Kafka does hold our data. Because a consumer is a long-running process, you need to stop it, and to stop it you do Ctrl+C. It says it processed a total of three messages. So that's awesome, right? Our input topic has the data. Now we just need to go ahead and process it. But before we do so, I want to start a consumer on the output topic. So let me just copy the command and describe it to you. I'll copy it on the right-hand side. That command basically starts a consumer, but it's a little bit more tricky than what you had before. That command connects to the bootstrap server at localhost:9092, the topic is streams-wordcount-output, which is the output topic, and we read from the beginning; we've seen this before. But then there's a formatter, and that formatter will basically go ahead and print the keys as keys and the values as values in the console. Nothing for you to worry about too much, but that distinction is important, because we will see the keys and the values being printed in the output, which is what we want. So I go ahead and press Enter, and the console consumer has started, but obviously, because there is no data written to the output topic yet, we don't see anything. So now the magic moment is when we start the Streams application.
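The two remaining commands described above can be sketched as follows. The consumer command is the one from the Kafka quickstart: WordCount emits String keys (the words) and Long values (the running counts), so the formatter needs matching deserializers; the demo class name is the one shipped with the Kafka 0.11 binaries:

```shell
# Terminal 3: consumer on the output topic; the formatter prints the
# String key (the word) and the Long value (its running count)
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

# Terminal 4: run the WordCount demo application shipped with Kafka
bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
```

Without the LongDeserializer, the counts would be printed as unreadable raw bytes, since Kafka stores the Long values in a binary encoding.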
And for this we just use bin/kafka-run-class.sh and then the WordCountDemo class. I'll go ahead and copy the command, and I want you to pay attention to what's going to happen on the right when I press Enter. Here we go. So, as you can see, the WordCount application is still running, but on the right a lot of information got printed out. That's the output of our Kafka Streams application. So let's go ahead and look at the output. The first thing is "kafka 1", then we have "streams 1" and "udemy 1". They correspond to the first sentence: kafka one, streams one, udemy one. So the application looks at that one sentence. Then we have "kafka 2"; that's because the Kafka Streams application reads "kafka" again and it already had a count of one for kafka, so it says kafka, two. Then we get "data", which is one, and "processing", which is one. Finally we get "kafka 3", because we were at two before, then "streams 2", because streams was at one before, so we are at two now, and "course 1". So it is a classic word count, but it's done in a streaming way, meaning that every single computation update was pushed back into Kafka, which is really cool. So what I can do now is stop the Kafka Streams application by hitting Ctrl+C, and stop my console consumer. So here we go. We just read nine messages, and each message was an update in my streaming computation from my Kafka Streams application, which is really, really cool. So to summarize what we did right here: we basically created an input topic, created an output topic, then we ran the Streams application, and automatically it went ahead and produced that word count for us in a streaming fashion. That was real time. That was awesome. OK, it was not a batch. If it was done in a batch fashion, it would only have given us "kafka 3", but never "kafka 1" or "kafka 2". And that's the power of Kafka Streams right here: you can do streaming. Finally, I want to draw your attention to something.
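To make the batch-versus-streaming contrast above concrete, here is a hypothetical purely local batch word count over the same three sentences, using plain shell tools and no Kafka at all. Being a batch job, it can only ever emit the final totals, never the intermediate updates like "kafka 1" and "kafka 2" that the Streams application produced:

```shell
# Batch-style word count: split the three sentences into words, then count.
# Only the FINAL counts appear (e.g. "3 kafka"), with no intermediate updates.
printf 'kafka streams udemy\nkafka data processing\nkafka streams course\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Running this prints one line per distinct word with its total count: 3 for kafka, 2 for streams, and 1 each for udemy, data, processing and course, which is exactly the end state the streaming application converged to, minus the history of updates.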
If you run the kafka-topics command again to get the list of topics, we see now that there are more topics. Obviously there's __consumer_offsets, because we ran a consumer, but there are also streams-wordcount-Counts-changelog and streams-wordcount-Counts-repartition. And you ask me, what are those? Well, these are internal Kafka Streams topics, and Kafka Streams will go ahead and create those based on the computation you do. You don't manage them, but they contain a lot of good data which allows Kafka Streams to do its magic. OK, so we'll understand everything in more detail in the next sections. But here I really wanted to give you a little taste of how Kafka Streams works, how it's used and what we can do with it. So I hope you're excited, I hope you really liked that WordCount example, and I'll see you soon.
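The final listing can be reproduced with the same kafka-topics command as before. The internal topic names shown as comments are what the 0.11 WordCount demo creates; treat them as illustrative, since they depend on the Kafka version, the application id and the state-store name:

```shell
bin/kafka-topics.sh --zookeeper localhost:2181 --list
# Example output (internal topics are created and managed by Kafka Streams itself):
# __consumer_offsets
# streams-plaintext-input
# streams-wordcount-Counts-changelog
# streams-wordcount-Counts-repartition
# streams-wordcount-output
```

The changelog topic backs up the state store holding the running counts, and the repartition topic redistributes records by word so that all occurrences of the same word land in the same partition before counting.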