From 0 to 1: The Oozie Orchestration Framework

Name: From 0 to 1: The Oozie Orchestration Framework
Rating: 4.5 (596 reviews)

A first-principles guide to working with Workflows, Coordinators and Bundles in Oozie

Created by Loony Corn

Last updated 2/2018

English

What you'll learn

Install and set up Oozie
Configure Workflows to run jobs on Hadoop
Configure time-triggered and data-triggered Workflows
Configure data pipelines using Bundles

Course content

8 sections • 24 lectures • 4h 1m total length

You, This Course and Us • 1:38

Running MapReduce on the command line • 4:40
Run a simple MapReduce job using the command line. If you're comfortable running MR jobs, you can simply skip this!
The attached zip files have a lot of MR examples; we just run the simplest one.
The lifecycle of a Workflow • 6:12
Workflows are the basic building blocks of Oozie. A brief introduction to how Workflows work.
Running our first Oozie Workflow MapReduce application • 11:15
It's real when you can run stuff! Running our very first MapReduce Workflow on Oozie.
The job.properties file • 8:45
The properties specified to configure a Workflow.
The workflow.xml file • 12:06
The actual code (well, it's XML, but that is code as far as Oozie is concerned).
A Shell action Workflow • 7:46
Control nodes, Action nodes and Global configurations within Workflows • 9:57
Workflows have advanced control structures to determine which action to execute, and ways to specify global configuration for all actions.

Running our first Coordinator application • 12:27
Coordinators manage Workflows and run them at a specified time and frequency, provided the input data is available.
A time-triggered Coordinator definition • 8:52
A time-triggered Coordinator is very similar to a Unix cron job.
Coordinator control mechanisms • 7:09
Oozie allows fairly fine-grained control over the running of Workflows: you can specify timeouts, throttling, concurrency and the execution order of Workflows materialized by the same Coordinator.
Data availability triggers • 10:03
Workflow actions might depend on input data. Coordinators can be configured so that Workflows are not launched until the right data is available for them. Such triggers are called data availability triggers.
Running a Coordinator which waits for input data • 6:11
A running example of a Coordinator which launches multiple Workflows, some of which have input data available and others which do not.
Coordinator configuration to use data triggers • 15:25
Configuring data input triggers is slightly complicated. We have to make sure that we specify the right data instances that the Workflow is interested in.

Hadoop Install Modes • 8:32
Hadoop has 3 different install modes: Standalone, Pseudo-Distributed and Fully Distributed. Get an overview of when to use each.
Hadoop Install Step 1: Standalone Mode • 15:46
How to set up Hadoop in the Standalone mode. Windows users need to install a virtual Linux instance before this video.
Hadoop Install Step 2: Pseudo-Distributed Mode • 11:44
Set up Hadoop in the Pseudo-Distributed mode. All Hadoop services will be up and running!

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables • 8:25
If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful for you. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.
Setting up a Virtual Linux Instance - For Windows Users • 15:58
Hadoop is built primarily for Linux/Unix systems. If you are on Windows, you can set up a Linux virtual machine on your computer and use that for the install.

Requirements

Students should have basic knowledge of the Hadoop eco-system and should be able to run MapReduce jobs on Hadoop

Description

Prerequisites: Working with Oozie requires some basic knowledge of the Hadoop eco-system and running MapReduce jobs

Taught by a team which includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with large-scale data processing jobs.

Oozie is like the formidable, yet super-efficient admin assistant who can get things done for you, if you know how to ask

Let's parse that

formidable, yet super-efficient: Oozie is formidable because its jobs are specified entirely in XML, which is hard to debug when things go wrong. However, once you've figured out how to work with it, it's like magic. Complex dependencies, a multitude of jobs on different time schedules and entire data pipelines are all made easy to manage with Oozie.

get things done for you: Oozie allows you to manage Hadoop jobs as well as Java programs, scripts and any other executable with the same basic setup. It manages your dependencies cleanly and logically.

if you know how to ask: Knowing the right configuration parameters to get the job done is the key to mastering Oozie.

What's Covered:

Workflow Management: Workflow specifications, Action nodes, Control nodes, Global configuration, real examples with MapReduce and Shell actions which you can run and tweak

Time-based and data-based triggers for Workflows: Coordinator specification, mimicking simple cron jobs, specifying time and data availability triggers for Workflows, dealing with backlog, running time-triggered and data-triggered Coordinator actions

Data Pipelines using Bundles: Bundle specification, the kick-off time for bundles, running a bundle on Oozie
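
To make the trigger idea concrete, here is a minimal Python sketch of the logic a Coordinator applies: it materializes one action per frequency interval and only lets a Workflow run once its input data is there. This is not Oozie code or the Oozie API. The function names, the RUN/WAIT labels, the sample dates and the choice to treat the end time as exclusive are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def materialize(start, end, frequency):
    """Yield the nominal time of each action in [start, end).

    Treating `end` as exclusive is an assumption of this sketch.
    """
    current = start
    while current < end:
        yield current
        current += frequency


def plan_actions(start, end, frequency, available_inputs):
    """Pair each nominal time with RUN (input present) or WAIT (input missing).

    RUN and WAIT are hypothetical labels, not Oozie status names.
    """
    plan = []
    for nominal in materialize(start, end, frequency):
        status = "RUN" if nominal in available_inputs else "WAIT"
        plan.append((nominal, status))
    return plan


# Hypothetical schedule: hourly actions over a four-hour window.
start = datetime(2018, 2, 1, 0, 0)
end = datetime(2018, 2, 1, 4, 0)
hourly = timedelta(hours=1)

# Hypothetical: input data has landed only for the 00:00 and 01:00 instances.
available = {datetime(2018, 2, 1, 0, 0), datetime(2018, 2, 1, 1, 0)}

for nominal, status in plan_actions(start, end, hourly, available):
    print(nominal.isoformat(), status)
```

With an empty `available` set every action waits, which is roughly the backlog situation mentioned above. A purely time-triggered Coordinator corresponds to skipping the availability check altogether.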

Who this course is for:

Yep! Engineers, analysts and sysadmins who are interested in big data processing on Hadoop
Nope! Beginners who have no knowledge of the Hadoop eco-system

Course content

Introduction • 1 lecture • 2min

A Brief Overview Of Oozie • 2 lectures • 22min

Oozie Install And Set Up • 1 lecture • 16min

Workflows: A Directed Acyclic Graph Of Tasks • 7 lectures • 1hr 1min

Coordinators: Managing Workflows • 6 lectures • 1hr

Bundles: A Collection Of Coordinators For Data Pipelines • 2 lectures • 20min

Installing Hadoop in a Local Environment • 3 lectures • 36min

Appendix • 2 lectures • 24min
