Running your first Kafka Streams Application: WordCount

A free video tutorial from Stephane Maarek | AWS Certified Solutions Architect & Developer Associate
Best Selling Instructor, Kafka Guru, 9x AWS Certified
4.7 instructor rating • 36 courses • 536,116 students

Lecture description

A full end-to-end run of your first Kafka Streams application.

We will download Kafka, start our own cluster, and run producers, consumers, and our first Kafka Streams application.

Learn more from the full course

Apache Kafka Series - Kafka Streams for Data Processing

Learn the Kafka Streams API with Hands-On Examples, Learn Exactly Once, Build and Deploy Apps with Java 8

04:48:46 of on-demand video • Updated September 2020

  • Write four Kafka Streams applications in Java 8
  • Configure Kafka Streams to use Exactly Once Semantics
  • Scale Kafka Streams applications
  • Program with the High Level DSL of Kafka Streams
  • Build and package your application
  • Write tests for your Kafka Streams Topology
  • And so much more!
English [Auto]

OK, so let's get into the heart of the subject. We're going to run our first Kafka Streams application. We're not going to code it or analyze code right now; this lecture will show how to run it, see what it does, and understand how Kafka Streams works at a higher level, just as an introduction, to ease into it.

First, we're going to download the Kafka binaries, because we're going to run Kafka in a different way: before, we ran Kafka using Landoop, and in this course I want to run Kafka using the binaries. It's very important that you know how both work and which one you prefer; it really matters for you to understand the full extent of how these things work. So we're going to start Zookeeper and Kafka using these binaries, and it's going to be very easy. We're going to create an input and an output topic using the kafka-topics command that you already know. Then we'll publish data to the input topic and run the WordCount example. It's a Kafka Streams application that's provided with Kafka as an example, and WordCount does what it says: it just counts words in sentences. We're going to see this example, and then we're going to read the output topic using a Kafka console consumer. So it's a full rundown of everything you've already seen, except for the WordCount example and launching Kafka using the binaries. It's going to be very easy and very hands-on, because that's how this course is, and you're going to learn something already. So let's get going.

If you go to Google, type "Kafka", and take the first link, you'll be redirected to the Apache Kafka page. All you have to do here is go ahead and download Apache Kafka using the download button. Once you get to the downloads, it is extremely important that you choose version 0.11.0.1 or above. Do not choose 0.11.0.0, because otherwise this first part of the tutorial will not work. In the future lectures it's fine to choose whichever version you like, but for this part of the lecture, please download this one. You then have two binary downloads to choose from, Scala 2.11 or Scala 2.12, but the web page recommends Scala 2.11, so I just click on that first link. What we'll do here is download the binaries directly; this is a different approach from what we've done in the other courses, but it's good for you to have an overview of how to use Kafka using the raw binaries. So here we are on the mirror page; I just take the first link, and you should take that first link too. This downloads Apache Kafka; you can see the download has started in the bottom left. It's a compressed archive, so all you have to do is decompress it using your favorite tool, maybe WinRAR or the command line, whatever you feel like; it's pretty easy.

Once Kafka is downloaded, you extract it, and you end up in a directory that contains six things: two files and four directories. We have LICENSE, NOTICE, bin, config, libs, and site-docs. bin is where the binaries are, config is where the configuration files are, libs is where the dependencies are, and site-docs is just documentation. What we'll do now is follow along with the commands file you can download as a resource of this lecture, and we'll launch our first Kafka Streams application.
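If you're on Mac or Linux, here's a minimal sketch of the extraction step from a terminal; the archive name assumes the Scala 2.11 / Kafka 0.11.0.1 download from this lecture, so adjust it to whatever you actually downloaded:

    # decompress the downloaded archive and look inside
    tar -xzf kafka_2.11-0.11.0.1.tgz
    cd kafka_2.11-0.11.0.1
    ls
    # LICENSE  NOTICE  bin  config  libs  site-docs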
OK, we won't go into the code, but we'll look at what the output and the outcome of that Kafka Streams application is. By the way, if you're a Windows user, you can of course use the corresponding .bat files and follow along; the commands are very, very similar, so it's worth it for you to watch this video and just apply the commands from that file instead of the other one.

So first, what we have to do before starting Kafka is to start Zookeeper. The command for this is bin/zookeeper-server-start.sh config/zookeeper.properties, and the reason we write bin/zookeeper-server-start.sh is because that's where the Zookeeper script is located: in that bin folder. The config we're interested in for Zookeeper is this file right here. I won't go into the details, but if you're interested in learning about Zookeeper, you can definitely check out my Kafka Setup course, which tells you more about it. Anyway, the defaults are fine for now. We press Enter, and as you can see, we get some log output, but at the end it says it's binding to port 2181, and Zookeeper is launched. That's great.

Once Zookeeper is launched, the next thing we have to do is start Kafka. Fairly easy: we use another command, bin/kafka-server-start.sh, and we give it the server.properties file for Kafka; these are the default, basic server properties. So just go ahead and paste that, and you can see that Kafka gets started. There was some logging about Zookeeper as well, but what we see at the bottom is that the Kafka server started, and the Kafka version is 0.11.0.1. Make sure it is 0.11.0.1 or above, not 0.11.0.0.

So now we have Kafka and Zookeeper started. I'll just go into two other shells. Do not ever close these first two, because if you close those shells, you'll stop Kafka or you'll stop Zookeeper, and we definitely want them to be running all the time, so we keep them open. In the other two shells, I'll go and run my other commands.

The first thing we have to do is create the input topic for our Streams application. That's fairly easy; I'll just copy the command and explain it to you. We use the kafka-topics command, again in the bin directory, with --create as we've seen before in the beginners course, and --zookeeper at localhost:2181, which will be localhost:2181 for everyone. The replication factor is one and the partitions are one, just to keep it educational, and finally the topic is named streams-plaintext-input. You press Enter, and you're greeted with the response that the topic was created. We'll go ahead and also create the output topic, streams-wordcount-output. It's very important for a Kafka Streams application that you create the input and the output topics ahead of time; otherwise, the application will not do this for you. Now we can verify that the topics were indeed created: we do bin/kafka-topics.sh, then --zookeeper localhost:2181 --list, and as we see, the two topics we created before are there. Perfect, so far so good.

Now what we're doing is publishing data to the input topic. For this, I'll start a Kafka console producer. It takes a --broker-list argument, and your broker is going to be at localhost:9092. By the way, localhost is the exact same as 127.0.0.1; they're exactly the same thing, and it will be the same for everyone again. Finally, we publish to a specific topic, and the topic is streams-plaintext-input.
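To recap the commands so far, here they are in one place; these are the standard scripts that ship in Kafka's bin directory (Windows users would run the equivalent .bat files under bin\windows):

    # shell 1: start Zookeeper (binds to port 2181)
    bin/zookeeper-server-start.sh config/zookeeper.properties

    # shell 2: start the Kafka broker (listens on port 9092)
    bin/kafka-server-start.sh config/server.properties

    # shell 3: create the input and output topics ahead of time
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
        --replication-factor 1 --partitions 1 --topic streams-plaintext-input
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
        --replication-factor 1 --partitions 1 --topic streams-wordcount-output

    # verify that both topics exist
    bin/kafka-topics.sh --zookeeper localhost:2181 --list

    # start a console producer on the input topic
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input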
Once we're in the producer, we see a little arrow, which means it's waiting for input, so we'll just go ahead and enter some data. I'll enter some text: "kafka streams udemy" as one sentence, "kafka data processing" as the second sentence, and then "kafka streams course" as the third sentence. So let's go ahead: kafka streams udemy, Enter; then kafka data processing, Enter; then kafka streams course, Enter. To break out of here, I'll just do Control-C, and this is done. So far so good, right?

What we see already is that in that text, because we'll do our counts, we have kafka appearing three times, streams appearing two times, and udemy, data, processing, and course appearing one time each. So we expect the count of kafka to be three, the count of streams to be two, the count of udemy to be one, et cetera.

Let's just verify that some data was indeed published to that topic. For this, we start a console consumer: the bootstrap server is the same, the topic is the same, but we add the --from-beginning flag to say that we want to fetch all the data in the topic. You press Enter, and as you can see, the data was indeed retrieved, so the topic does hold our data. Because a consumer is a long-running process, you need to stop it, and to stop it you do Control-C; it says it processed a total of three messages. So it's awesome, right? Our input topic has the data; now we just need to go ahead and process it.

But before we do so, I want to start a consumer on the output topic. Let me just copy the command and describe it to you; I'll put it on the right-hand side. That command basically starts a consumer, but it's a little bit more involved than what you had before. It connects to the bootstrap server at localhost:9092, and the topic is streams-wordcount-output, which is the output topic. We read from the beginning, as before, but then there is a formatter, and that formatter will basically go ahead and print the keys and the values in the console. It's nothing for you to worry about, but the distinction is important, because we will see the keys and the values being printed in the log, which is what we want. I go ahead and press Enter, and the console consumer has started; but obviously, because no data has been written yet to the output topic, we don't see anything.

So now the magic moment is when we start the Streams application. For this, we use bin/kafka-run-class.sh and then the WordCount demo class. I'll just go ahead and copy that command, and I want you to pay attention to what's going on on the right when I press Enter. Here we go. As you can see, the WordCount application is still running, but on the right a lot of information got printed out: that's the output of our Kafka Streams application.

So let's go ahead and look at the output. The first thing is kafka 1, then we have streams 1 and udemy 1; these correspond to the first sentence, kafka streams udemy. The application goes and looks at this one sentence. Then we have kafka 2, because the application reads kafka again, and we already had it once, so it says kafka 2; then we get data, which is 1, and processing, which is 1. Finally, we get kafka 3, because we were at two before, and we get streams 2, because streams was at one before.
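For reference, here is the shape of these last two commands, mirroring the WordCount walkthrough in the Kafka 0.11 quickstart; the formatter properties tell the console consumer to decode each record's key as a String (the word) and its value as a Long (the running count):

    # shell 4: consumer on the output topic, printing both keys and values
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic streams-wordcount-output \
        --from-beginning \
        --formatter kafka.tools.DefaultMessageFormatter \
        --property print.key=true \
        --property print.value=true \
        --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
        --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

    # shell 5: run the WordCount demo application that ships with Kafka
    bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

With the three sentences entered above, the consumer ends up printing nine key-value updates:

    kafka       1
    streams     1
    udemy       1
    kafka       2
    data        1
    processing  1
    kafka       3
    streams     2
    course      1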
And course 1. So it's a classic word count, but it's been done in a streaming way, meaning that every single computation update was pushed back into Kafka, which is really cool. So what I can do now is stop the Kafka Streams application by doing Control-C, and stop my console consumer as well. So here we go: we just read nine messages, and each message was an update in my streaming computation from my Kafka Streams application, which is really, really cool.

To summarize what we did right here: we created an input topic and an output topic, then we ran the Kafka Streams application, and it automatically went ahead and computed that word count for us in a streaming fashion. That was real time; that was awesome. It was not a batch: in a batch fashion, it would only have given us kafka 3, but never kafka 2 or kafka 1. And that's the whole point here: you can do streaming.

Finally, I want to draw your attention to something. If you do the kafka-topics --list command again to get the list of the topics, we see that there are now more topics. Obviously there's __consumer_offsets, because we ran a consumer, but there's also streams-wordcount-counts-changelog and streams-wordcount-counts-repartition, and you may ask me, what are those? Well, these are internal Kafka Streams topics, and Kafka Streams will go ahead and create them based on the computations you do. You don't manage them, but they contain a lot of the data which allows Kafka Streams to do its magic. We'll understand everything in more detail in the next sections; here, I really wanted to give you a little taster of how Kafka Streams works, how it's being used, and what we can do with it. So I hope you're excited, I hope you liked that WordCount example, and I'll see you soon.
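Listing the topics again looks roughly like this; the exact names of the two internal topics depend on the demo's application ID and state store name, so treat the ones below as illustrative:

    bin/kafka-topics.sh --zookeeper localhost:2181 --list
    # __consumer_offsets
    # streams-plaintext-input
    # streams-wordcount-counts-changelog
    # streams-wordcount-counts-repartition
    # streams-wordcount-output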