From 0 to 1: The Oozie Orchestration Framework
4.6 (43 ratings)
1,804 students enrolled

A first-principles guide to working with Workflows, Coordinators and Bundles in Oozie
Best Selling
Created by Loony Corn
Last updated 11/2016
English
Includes:
  • 4 hours on-demand video
  • 36 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Install and set up Oozie
  • Configure Workflows to run jobs on Hadoop
  • Configure time-triggered and data-triggered Workflows
  • Configure data pipelines using Bundles
Requirements
  • Students should have basic knowledge of the Hadoop eco-system and should be able to run MapReduce jobs on Hadoop
Description

Prerequisites: Working with Oozie requires some basic knowledge of the Hadoop eco-system and running MapReduce jobs

Taught by a team that includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience working with large-scale data processing jobs.

Oozie is like the formidable, yet super-efficient admin assistant who can get things done for you, if you know how to ask.

Let's parse that:

Formidable, yet super-efficient: Oozie is formidable because its jobs are defined entirely in XML, which is hard to debug when things go wrong. However, once you've figured out how to work with it, it's like magic. Complex dependencies, many jobs running on different time schedules, and entire data pipelines all become easy to manage with Oozie.

Get things done for you: Oozie allows you to manage Hadoop jobs as well as Java programs, scripts and any other executable with the same basic setup. It manages your dependencies cleanly and logically.

If you know how to ask: Knowing the right configuration parameters to get the job done is the key to mastering Oozie.

What's Covered: 

Workflow Management: Workflow specifications, Action nodes, Control nodes, Global configuration, real examples with MapReduce and Shell actions which you can run and tweak

Time-based and data-based triggers for Workflows: Coordinator specification, mimicking simple cron jobs, specifying time and data availability triggers for Workflows, dealing with backlog, running time-triggered and data-triggered Coordinator actions

Data Pipelines using Bundles: Bundle specification, the kick-off time for bundles, running a bundle on Oozie


Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!


Who is the target audience?
  • Yep! Engineers, analysts and sysadmins who are interested in big data processing on Hadoop
  • Nope! Beginners who have no knowledge of the Hadoop eco-system
Curriculum For This Course
24 Lectures
04:01:48
Introduction
1 Lecture 01:38
A Brief Overview Of Oozie
2 Lectures 22:01

A first-principles discussion of why you would want to use Oozie.

Preview 11:16

Basic Oozie component overview, and where Oozie fits in the Hadoop ecosystem.

Oozie architectural components
10:45
Oozie Install And Set Up
1 Lecture 16:29

Time to install Oozie and run some Workflows. Do use the attached text file, which has detailed instructions and all the commands you'll need.

Installing Oozie on your machine
16:29
Workflows: A Directed Acyclic Graph Of Tasks
7 Lectures 01:00:41

Run a simple MapReduce job using the command line. If you're comfortable running MR jobs you can simply skip this!

The attached zip file has a lot of MR examples; we just run the simplest one.

Preview 04:40

Workflows are the basic Oozie building blocks. This is a brief introduction to how Workflows work.

Preview 06:12

It's real when you can run stuff! Running our very first MapReduce Workflow on Oozie.

Running our first Oozie Workflow MapReduce application
11:15

The properties specified to configure a Workflow.

The job.properties file
08:45

The actual code (well, it's XML, but that is code as far as Oozie is concerned).

The workflow.xml file
12:06

A Shell action Workflow
07:46

Workflows have advanced control structures to determine which action to execute and ways to specify global configuration for all actions.

Control nodes, Action nodes and Global configurations within Workflows
09:57
Coordinators: Managing Workflows
6 Lectures 01:00:07

Coordinators manage Workflows and run them at a specified time and frequency, provided the input data is available.

Running our first Coordinator application
12:27

A time-triggered Coordinator is very similar to a Unix cron job

A time-triggered Coordinator definition
08:52

Oozie allows fine-grained control over how Workflows run: you can specify timeouts, throttling, concurrency and the execution order of Workflows materialized by the same Coordinator.

Coordinator control mechanisms
07:09

Workflow actions might depend on input data. Coordinators can be configured so that Workflows are not launched until the right data is available for them. Such triggers are called data availability triggers.

Data availability triggers
10:03

A running example of a Coordinator which launches multiple Workflows, some of which have input data available and others which do not.

Running a Coordinator which waits for input data
06:11

Configuring data input triggers is slightly complicated. We have to make sure that we specify the right data instances that the Workflow is interested in.

Coordinator configuration to use data triggers
15:25
Bundles: A Collection Of Coordinators For Data Pipelines
2 Lectures 20:27

Bundles can be used to define data pipelines where multiple coordinators need to be managed together as a single Oozie job

Bundles and why we need them
09:15

The Bundle kick-off time determines when the Bundle's Coordinators start running on Oozie.

The Bundle kick-off time
11:12
Installing Hadoop in a Local Environment
3 Lectures 36:02

Hadoop has 3 different install modes - Standalone, Pseudo-distributed and Fully Distributed. Get an overview of when to use each

Hadoop Install Modes
08:32

How to set up Hadoop in the standalone mode. Windows users need to install a Virtual Linux instance before this video. 

Hadoop Install Step 1 : Standalone Mode
15:46

Set up Hadoop in the Pseudo-Distributed mode. All Hadoop services will be up and running! 

Hadoop Install Step 2 : Pseudo-Distributed Mode
11:44
Appendix
2 Lectures 24:23

If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most shell-based software on Linux/Mac.

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables
08:25

Hadoop is designed primarily for Linux/Unix systems. If you are on Windows, you can set up a Linux Virtual Machine on your computer and use that for the install.

Setting up a Virtual Linux Instance - For Windows Users
15:58
About the Instructor
Loony Corn
4.3 Average rating
5,043 Reviews
39,320 Students
76 Courses
An ex-Google, Stanford and Flipkart team

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and have spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!

We hope you will try our offerings, and think you'll like them :-)