What is Avro?

A free video tutorial from Stephane Maarek | AWS Certified Solutions Architect & Developer Associate
Best Selling Instructor, Kafka Guru, 9x AWS Certified
4.7 instructor rating • 37 courses • 574,934 students

Lecture description

Learn what Apache Avro is and how it came to be

Learn more from the full course

Apache Kafka Series - Confluent Schema Registry & REST Proxy

Kafka - Master Avro, the Confluent Schema Registry and Kafka REST Proxy. Build Avro Producers/Consumers, Evolve Schemas

04:23:56 of on-demand video • Updated October 2020

  • Write simple and complex Avro Schemas
  • Create, Write and Read Avro objects in Java
  • Write a Java Producer and Consumer leveraging Avro data and the Schema Registry
  • Learn about Schema Evolution
  • Perform Schema evolution using the command line and in Java
  • Utilize the REST Proxy using a REST Client
OK, so welcome to the section on mastering Avro. We'll learn all about schemas, data types, and real recommendations on Avro. But first you may be like, what is Avro? What even is it? So let me just walk you through an evolution of data formats, from the most basic all the way to Avro. So we have comma-separated values, or CSV, and this is very basic. We have a set of columns and then we add a row. So for example, column one would be John, going to column three, 25, and so on; column five is a Boolean, true. And then in row two we have Mary Poppins. But look at this: column three went from 25 to 60. So what is it? Is it an int or is it a string? Who knows, right? Then we go to row three, and row three is missing data. We don't have the last two columns. What the heck? We wanted them. We were expecting them. So those are most likely problems you've already had. With CSV, the advantages are: it's easy to parse, it's easy to read, it's easy to make sense of. But the disadvantages are big. The data types of elements have to be inferred, so you need to guess what column is what, and it's not a guarantee. Anyone can do anything. Parsing becomes very tricky when the data contains commas. The column names may or may not be there, and so on. So, you know, columns may change. It's horrible.
So now we have relational tables, and relational tables in databases basically add types. So here is a create table statement for a database, and it just creates the table distributors. By the way, did will be an integer and name will be a varchar. And the database will refuse any data that does not comply with these types. OK, so that's very important, because here we've just defined types and the database will refuse anything that doesn't comply. On top of this, we have named columns. OK, there is no such concept as an order. Every column has a name, and that's how we call it.
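
To make the point about typed, named columns concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (distributors, did, name) come from the lecture; the sample values are made up. One loud assumption: SQLite is loosely typed by default, so the CHECK constraints below stand in for the type enforcement that most relational databases apply on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# did is an integer and name is a varchar, as in the lecture's statement.
# Assumption: the CHECK constraints emulate strict typing, because SQLite
# would otherwise store a non-numeric string in an INTEGER column.
conn.execute(
    """
    CREATE TABLE distributors (
        did  INTEGER     CHECK (typeof(did) = 'integer'),
        name VARCHAR(40) CHECK (typeof(name) = 'text')
    )
    """
)

# A row that complies with the declared types is accepted.
conn.execute("INSERT INTO distributors (did, name) VALUES (?, ?)", (1, "John"))

# A row whose did is not an integer is refused by the database.
rejected = False
try:
    conn.execute(
        "INSERT INTO distributors (did, name) VALUES (?, ?)",
        ("not a number", "Mary"),
    )
except sqlite3.IntegrityError:
    rejected = True

# Columns are addressed by name, so the order we ask for them is up to us.
rows = conn.execute("SELECT name, did FROM distributors").fetchall()
print(rejected, rows)
```

Only the compliant row ends up in the table, which is exactly the guarantee CSV cannot give.
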
So the data is fully typed, it's amazing. Data fits in a table, that's really cool. But the disadvantages of this are that, number one, your data has to be flat. OK, rows and columns. Number two, the data is stored in a database, and the definition will be different for each database. So it will be extremely hard to access the data across databases and across languages: you need to have a driver for each specific database. So that becomes a bit tricky. It's not something that you share with others in that format.
Then we have JSON, and JSON is short for JavaScript Object Notation. The JSON format is awesome because it can be shared across the network as much as you want. So here's an example of a JSON object. We have an id, type, name, image, and thumbnail, and you see there are some nested values: an image has a URL, a width, and a height, and a thumbnail also has a URL, a width, and a height. So it's pretty cool, right? Because it's all text based and it can hold nested data and stuff. So JSON has advantages. Number one, the data can take any form you want: it could be an array, could be a nested element, could be whatever you want. JSON is widely accepted on the web. I mean, every single language has a library to parse JSON, so that's awesome. And then it can easily be shared over a network. It's just a text string. It's easy, right? But there are some inconveniences. One of them is that the data has no schema being enforced. You could easily turn a string into an integer; JSON will just accept anything. OK. And then finally, JSON objects can be really big in size because of the repeated keys. In the example I showed you before, we had url repeated twice, width repeated twice, height repeated twice, and so on. That takes a lot of space if you have a lot of images or a lot of thumbnails. OK. So for all these disadvantages there is one answer.
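
The JSON trade-offs described above can be sketched with Python's built-in json module. The field names (id, type, name, image, thumbnail, url, width, height) follow the lecture's example; every value, and the 1000-record list, is a made-up assumption for illustration.

```python
import json

# Hypothetical document shaped like the lecture's example; values are invented.
doc = json.loads(
    """
    {
      "id": 1,
      "type": "donut",
      "name": "Cake",
      "image": {"url": "images/0001.jpg", "width": 200, "height": 200},
      "thumbnail": {"url": "images/thumbnails/0001.jpg", "width": 32, "height": 32}
    }
    """
)

# Advantage: nested data is easy to express and to read back.
nested_width = doc["thumbnail"]["width"]

# Disadvantage 1: no schema is enforced. Both payloads are valid JSON,
# but the "age" field silently changes from an integer to a string.
first = json.loads('{"name": "John", "age": 25}')
second = json.loads('{"name": "Mary", "age": "60"}')
age_types = (type(first["age"]).__name__, type(second["age"]).__name__)

# Disadvantage 2: every record repeats its keys, which costs space.
records = [
    {"url": f"images/{i}.jpg", "width": 200, "height": 200} for i in range(1000)
]
with_keys = len(json.dumps(records))
values_only = len(json.dumps([list(r.values()) for r in records]))
key_overhead = with_keys - values_only

print(nested_width, age_types, key_overhead)
```

The key_overhead figure is the number of characters spent purely on repeating url, width, and height in each record.
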
It's Avro. Avro is defined by a schema, and the schema itself is written in JSON. So to get started, you can view Avro as JSON with a schema attached to it. So this is what an Avro schema will look like. And we are going, over the course, to fully understand what that schema represents and what it means. Don't worry about it too much. OK. But Avro, just remember, has a schema and then it has a payload. And so what are the advantages? Well, the data is fully typed. OK, before, we defined that our user name was a string and that our age was an integer. So data is fully typed, and it's named as well. It's compressed automatically. So by the way, if the column name is very, very long, it doesn't matter. It will be compressed. So less CPU usage. The schema comes along with the data. So there is no data out there just lonely. It always has its schema nearby. And that means the data itself is self-explanatory. You can embed the documentation in the schema, so you can document your schema, so that if anyone were to use your data and take your schema from your data, they will know exactly what your data represents. The data can be read across any language; it's just a binary protocol. Compatibility with languages can differ, but usually it's pretty good, especially for Java. And then your schema can evolve over time in a safe manner. We can add rows, we can add columns, we can add elements and fields and types. Your schema can evolve based on some rules, but you can make it change over time, because your data may change over time and your schema may evolve with it.
The disadvantages, though: some languages may have some trouble supporting Avro, and then you can't really see or print the data without using the Avro tools. OK, and that's because it's compressed and serialized. So with a JSON document, you just double click and there you go, you read it. With Avro, you can't just double click and read it. You need to use some tools to read it.
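
As a rough illustration of "a schema plus a payload", the sketch below writes a small Avro schema as JSON and hand-encodes one record following the Avro binary encoding rules for int and string (zigzag variable-length integers, and length-prefixed UTF-8). The User schema, its doc strings, and the helper names are hypothetical; the lecture only says that a user name is a string and an age is an integer. This covers a tiny subset of Avro for demonstration only, and real code would use an Avro library rather than these helpers.

```python
import json

# Hypothetical schema: only "name is a string, age is an int" comes from the lecture.
SCHEMA = json.loads(
    """
    {
      "type": "record",
      "name": "User",
      "doc": "A user record (hypothetical example).",
      "fields": [
        {"name": "name", "type": "string", "doc": "Full name of the user"},
        {"name": "age",  "type": "int",    "doc": "Age in years"}
      ]
    }
    """
)


def encode_long(n: int) -> bytes:
    """Zigzag then variable-length encode, as Avro does for int and long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Length (as an Avro long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"int": (int, encode_long), "string": (str, encode_string)}


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field values in schema order; field names are never written."""
    out = bytearray()
    for field in schema["fields"]:
        expected_type, encoder = ENCODERS[field["type"]]
        value = record[field["name"]]
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise TypeError(f"field {field['name']!r} must be {field['type']}")
        out += encoder(value)
    return bytes(out)


user = {"name": "John", "age": 25}
payload = encode_record(SCHEMA, user)
as_json = json.dumps(user).encode("utf-8")
print(len(payload), len(as_json))
```

Because the payload holds only the values in schema order, a very long field name costs nothing per record, and a value of the wrong type is rejected before anything is written. It also shows why the bytes are not readable without the schema and some tooling.
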
So before I go on, some of you will of course ask: but what about Protobuf, or Thrift, or Parquet, or ORC, or whatever format you like? Overall, they are all pretty much doing the same thing, which is to compress data and lay it out in some way. OK, I won't get into debates, but at the Kafka level what we care about is one message being self-explicit and fully describable, because we're dealing with streaming. OK, so there's no ORC, no Parquet, no column-based format. Avro has really good support already from Hadoop technologies like Hive and others. So Avro is a really good candidate in the Hadoop ecosystem, but it has also been chosen as the only supported format for the Confluent Schema Registry so far. So there is no choice; we'll just go along with that, and it's fine. It's been working for ages.
OK, and then finally, don't go and be like, "But what about the performance of Avro versus Protocol Buffers? Oh my God." Unless you start doing one million messages per second, you're fine. OK, I've done programs using Avro that have been reaching insane volumes without even worrying once about performance. Performance is great with Avro. Don't worry about it. OK. I hope you're just liking the format, understanding it, and going along with it. You're not in an optimization phase, you're in a development phase. All right, so that was it for introducing Avro. I promise we're going to get deep into Avro next. But thanks for watching all the way. See you in the next lecture.