What is Oozie?

Loony Corn
A free video tutorial from Loony Corn
An ex-Google, Stanford and Flipkart team
4.2 instructor rating • 75 courses • 121,508 students

Lecture description

A first-principles discussion of why you would want to use Oozie.

Learn more from the full course

From 0 to 1: The Oozie Orchestration Framework

A first-principles guide to working with Workflows, Coordinators and Bundles in Oozie

04:01:46 of on-demand video • Updated February 2018

  • Install and set up Oozie
  • Configure Workflows to run jobs on Hadoop
  • Configure time-triggered and data-triggered Workflows
  • Configure data pipelines using Bundles

If you're taking this class, you're probably working in the Hadoop ecosystem. You've probably written a couple of MapReduce jobs. You may have queried your data using Hive. You may have used Pig. Now you have a whole bunch of ways to process the data that you have in hand. You have a whole bunch of tasks, and you're finding it hard to manage all of them, when somebody suggests Oozie. The first thing you think is that Oozie is a cute name, but what does it do? How will it help me manage all these dependent processes? And given that I have all these other cool tools at my disposal, where does Oozie fit in? You've come to the right class. This class will look at Oozie from the very first principles.

Before we dive into the nitty-gritty, let's take a slight detour and consider a factory which manufactures tires for cars. So you have these super cool cars that need tires, and there is a factory which manufactures them. Now, this is not a single-step manufacturing process. There's a whole bunch of things that go into producing tires. The factory needs to manufacture the tire itself: the rubber, the vulcanization and so on. It then manufactures the hubcaps; hubcaps are a must, especially in cool cars. The company also needs to procure tools; these don't manufacture themselves. The company needs workmen who use those tools. All of these must come together to get a tire.

Now, to be completely honest with you, there's a wee bit of simplification in this entire setup that I just mentioned. The process is slightly simplified, but you get the idea: there are many tasks that go into even the simplest operations. The task of making a tire is accomplished by completing a series of tasks. Some of these tasks may be done serially, meaning one after the other. Some of these tasks are independent of one another and can be done in parallel: you can start them off at the same time.
Some of the tasks in the manufacture of tires may be dependent on other tasks: you need the rubber before you can vulcanize it. Other tasks may be completely independent: making the rubber portion of the tire and making the hubcap are independent. So you have serial and parallel, dependent and independent. What does this resemble? That's right. If you've done any computer science, this should be very familiar to you. This is a graph. There are nodes and edges which make up this graph. Every node in the graph represents a task that you have to perform to achieve your ultimate aim, whether it's building a tire or doing anything else.

Nodes which are laid out one after the other are serial in nature, and they are dependent on one another. The blue node requires that the node before it complete before it can execute. The red node requires that both that node and the blue node be complete before it can execute its own task. These are dependent. You can also have nodes which execute in parallel and are independent of each other, as in the example you see on screen. This is a directed acyclic graph: there is a direction to the edges, and it is acyclic because it doesn't have any cycles. Such a graph is called a workflow, and this is how Oozie comes into the picture.

A more formal definition of a workflow would be: a set of actions, and the order and conditions under which those actions should be performed. That is a workflow. It specifies a whole bunch of actions and the order and conditions under which those actions can be performed. What are the actions in a workflow? An action could be a MapReduce job. It could be a Hive query. It could be a shell script, using Python or some other scripting language. It could be a Pig query or a simple Java program. All of these are actions which can be part of an Oozie workflow.

Tires are what we've spoken about so far. But if you're manufacturing cars, getting tires alone is not enough; tires are just one part of a car.
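
The idea of a workflow as a directed acyclic graph can be sketched in a few lines of Python. This is only an illustration of the concept, not how Oozie itself defines or runs workflows; the task names and the execution_waves helper are hypothetical, made up for the tire example. It groups tasks into "waves": tasks in the same wave do not depend on each other and could run in parallel, while later waves must wait for earlier ones.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks for the tire example. Each task maps to the set of
# tasks that must complete before it can start.
tire_workflow = {
    "procure_rubber": set(),
    "make_hubcap": set(),
    "vulcanize_rubber": {"procure_rubber"},
    "assemble_tire": {"vulcanize_rubber", "make_hubcap"},
}


def execution_waves(dag):
    """Group tasks into waves; tasks within one wave are independent."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete so dependents become ready
    return waves


waves = execution_waves(tire_workflow)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Procuring the rubber and making the hubcap land in the same wave because neither depends on the other, while assembling the tire comes last because it needs both the vulcanized rubber and the hubcap.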
In addition to tires, you need steering wheels. You need the wheels. You need the engine of the car. You need rear-view mirrors. You need windshields, seats, and many, many other things which are not present in the diagram. You need to be able to paint the car, and then you need to put it all together: fit the engine in the car, and the other components as well. Each of the actions that I just mentioned is an individual workflow by itself. There are many subtasks which go into each of these individual workflows. For example, painting a car requires procuring paint, getting a workman, and getting everything set up so the job can be done. Each of these can be considered to be a directed acyclic graph by itself. So there are many DAGs, directed acyclic graphs, that you can see in this picture.

Now, it's totally possible to process and execute each of these individual workflows manually: you look after each of them by hand and have no automation built in to launch these processes. As you can imagine, this is pretty onerous. It's much easier to have some kind of controlling mechanism for individual workflows, which ensures that all the conditions are ready and the environment is set up for a task or a workflow to execute.

You might want workflows to run at a certain time and frequency: the first shift to fit engines in the new cars starts at 8:00 a.m. every day. This could be a rule of your assembly line. However, you have to ensure that the input to the workflow is available before you kick-start it. For example, if you are fitting engines in cars, there are a whole bunch of things which have to be ready before you start. You have to make sure that the workman is available, the engine is built and has been delivered, and the car body is complete. Only then can fitting the engine begin. So a whole bunch of things have to happen before this workflow can be executed. This needs a coordinator.
You can think of a coordinator as some kind of controlling mechanism which ensures that the workflow is executed at a specified time or a specified frequency. You want something to be done every day? You have a coordinator for that. You want something to be done every 20 minutes? You have a coordinator for that. The coordinator also checks for one more thing: the availability of input data. If the input data is not available, then the workflow is delayed until the input data is present. If you have a workflow which does not depend on any input data, then it is purely time-triggered: it only runs at the specified time with the specified frequency.

Remember, the coordinator does two things, both of which are important. It kick-starts the workflow at a certain time with a certain frequency. It also checks that the input data for that workflow is available. If the input data is not available, the coordinator is responsible for delaying the start of the workflow until the data is present.

Each of these workflows, which builds one piece of a car, has its own coordinator, and that coordinator is responsible for managing that workflow. All of these workflows come together to build a car. Building a car is made up of many individual workflows. Each workflow runs at its own time, at its own frequency, with its own input. But they all come together to give you a car. So you can think of a car as being built by a collection of coordinator jobs, with each coordinator managing one workflow. The timing of each of these coordinators can be different. The data that each of the workflows requires can be different. For example, building a wheel will require different input than building an engine. So each of these workflows has its own coordinator, which manages its frequency and its input.

A collection of coordinator jobs which work together, and which can be started, stopped and modified together, is called a bundle. This is the Oozie term for a collection of coordinators.
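
The two jobs of a coordinator described above, running on a schedule and waiting for input data, can be sketched as follows. This is a conceptual illustration in Python under assumed names (nominal_times, coordinator_decision, and the input labels are all hypothetical); it is not Oozie's API, and real Oozie coordinators are configured rather than written like this.

```python
from datetime import datetime, timedelta


def nominal_times(start, frequency, end):
    """Yield each scheduled run time from start up to, but not including, end."""
    current = start
    while current < end:
        yield current
        current += frequency


def coordinator_decision(nominal_time, now, available_inputs, required_inputs):
    """Decide what to do with one scheduled run of a workflow."""
    if now < nominal_time:
        return "too_early"  # the scheduled time has not arrived yet
    if not set(required_inputs) <= set(available_inputs):
        return "waiting_for_data"  # delay the workflow until its inputs exist
    return "run"  # time has come and all inputs are present


# The engine-fitting shift: 8:00 a.m. every day, here for three days.
start = datetime(2018, 2, 1, 8, 0)
runs = list(nominal_times(start, timedelta(days=1), start + timedelta(days=3)))

required = {"engine", "car_body", "workman"}
print(coordinator_decision(runs[0], runs[0], {"engine"}, required))
print(coordinator_decision(runs[0], runs[0], required, required))
```

A workflow with no required inputs always passes the data check, so it behaves as a purely time-triggered job.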
If you think about all these complicated tasks and dependencies which come together to build a car, it's pretty obvious that the output of one coordinator job managing a workflow can be the input to another coordinator job. So these coordinator jobs, representing workflows, are chained together: the output of one is the input to another. When you set up the processing in this way, you have a data pipeline. A data pipeline involves transforming data in phases until you reach the final output.

Workflows, coordinators, bundles: these are the basic building blocks of Oozie. And this is how Oozie lets you manage complex systems with dependencies. This is what Oozie is all about. If you want a formal definition, you can think of Oozie as an orchestration system for Hadoop jobs. Every action within a workflow is a Hadoop job, and Oozie orchestrates all of these using workflows, coordinators and bundles.
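
To make the chaining idea concrete, here is a small Python sketch of a bundle as a group of coordinators where the output of one becomes the input of another. The Coordinator and Bundle classes and the car-part names are hypothetical, invented for this illustration; they are not Oozie classes, and the sketch ignores timing entirely to focus on data dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    name: str
    inputs: set   # datasets this coordinator's workflow needs
    output: str   # dataset its workflow produces


@dataclass
class Bundle:
    coordinators: list
    available: set = field(default_factory=set)  # datasets produced so far

    def run_until_idle(self):
        """Run every coordinator whose inputs are present; return the run order."""
        order = []
        pending = list(self.coordinators)
        progressed = True
        while pending and progressed:
            progressed = False
            for coord in list(pending):  # iterate over a copy while removing
                if coord.inputs <= self.available:
                    self.available.add(coord.output)  # output feeds later coordinators
                    order.append(coord.name)
                    pending.remove(coord)
                    progressed = True
        return order


bundle = Bundle([
    Coordinator("fit_engine", {"engine", "car_body"}, "car"),
    Coordinator("build_engine", set(), "engine"),
    Coordinator("build_body", set(), "car_body"),
])
order = bundle.run_until_idle()
print(order)
```

Even though fit_engine is listed first, it only runs after the engine and the car body have been produced, which is the pipeline behaviour the lecture describes.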