Hadoop Developer In Real World

Free Cluster Access * HDFS * MapReduce * YARN * Pig * Hive * Flume * Sqoop * AWS * EMR * Optimization * Troubleshooting
4.7 (359 ratings) · 1,030 students enrolled · Bestselling in Data Science · $200
  • Lectures 76
  • Length 15.5 hours
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 10/2015 · English

Course Description

From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. This course is designed for anyone who aspires to a career as a Hadoop developer. It covers all the concepts that every aspiring Hadoop developer must know to survive in real-world Hadoop environments.

The course covers all the must-know topics like HDFS, MapReduce, YARN, Apache Pig, and Hive, and goes deep into each concept. We don't stop at the easy concepts; we take it a step further and cover important and complex topics like file formats, custom Writables, input/output formats, troubleshooting, and optimizations.

All concepts are backed by interesting hands-on projects: analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages using page dumps from Wikipedia, and simulating Facebook's mutual friends functionality, just to name a few.

What are the requirements?

  • Although you don't have to be an expert in Java, basic knowledge of Java programming is required as we will be looking at programs in Java.
  • Basic Linux commands

What am I going to get from this course?

  • Understand what Big Data is, the challenges it brings, and how Hadoop proposes a solution to the Big Data problem
  • Work with and navigate a Hadoop cluster with ease
  • Install and configure a Hadoop cluster on cloud services like Amazon Web Services (AWS)
  • Understand the different phases of MapReduce in detail
  • Write optimized Pig Latin instructions to perform complex data analysis
  • Write optimized Hive queries to perform data analysis on simple and nested datasets
  • Work with file formats like SequenceFile, Avro, etc.
  • Understand Hadoop architecture, Single Points Of Failure (SPOF), Secondary/Checkpoint/Backup nodes, HA configuration, and YARN
  • Tune and optimize slow-running MapReduce jobs, Pig instructions, and Hive queries
  • Understand how joins work behind the scenes and write optimized join statements
  • Wherever possible, get introduced to the difficult questions that are asked in real Hadoop interviews

What is the target audience?

  • This course is for anyone who aspires to a career as a Hadoop developer
  • This course is for anyone who wants to learn and understand Hadoop and Big Data in depth


Curriculum

Section 1: Thank You and Let's Get Started
Course Structure
11:03
Tools & Setup (Windows)
09:09
Tools & Setup (Linux)
07:42
Section 2: Introduction To Big Data
What is Big Data?
17:47
Understanding Big Data Problem
14:24
History of Hadoop
03:46
Test your understanding of Big Data
6 questions
Section 3: HDFS
HDFS - Why Another Filesystem?
13:20
Blocks
07:50
Working With HDFS
16:09
HDFS - Read & Write
09:31
HDFS - Read & Write (Program)
08:38
Test your understanding of HDFS
5 questions
HDFS Assignment
Article
Section 4: MapReduce
Introduction to MapReduce
08:51
Dissecting MapReduce Components
18:03
Dissecting MapReduce Program (Part 1)
12:00
Dissecting MapReduce Program (Part 2)
16:09
Combiner
06:20
Counters
06:43
Facebook - Mutual Friends
17:38
New York Times - Time Machine
15:43
Test your understanding of MapReduce
12 questions
MapReduce Assignment
Article
Section 5: Apache Pig
Introduction to Apache Pig
12:52
Loading & Projecting Datasets
13:41
Solving a Problem
13:32
Complex Types
21:12
Pig Latin - Joins
19:53
Million Song Dataset (Part 1)
10:29
Million Song Dataset (Part 2)
15:01
Page Ranking (Part 1)
08:11
Page Ranking (Part 2)
19:26
Page Ranking (Part 3)
12:17
Test your understanding of Apache Pig
13 questions
Apache Pig Assignment
Article
Section 6: Apache Hive
Introduction to Apache Hive
09:58
Dissect a Hive Table
10:14
Loading Hive Tables
11:17
Simple Selects
06:07
Managed Table vs. External Table
06:20
Order By vs. Sort By vs. Cluster By
09:44
Partitions
19:31
Buckets
07:27
Hive QL - Joins
09:21
Twitter (Part 1)
09:33
Twitter (Part 2)
08:43
Test your understanding of Apache Hive
18 questions
Apache Hive Assignment
Article
Section 7: Architecture
HDFS Architecture
12:46
Secondary Namenode
11:24
Highly Available Hadoop
08:48
MRv1 Architecture
10:49
YARN
11:22
Test your understanding of Hadoop Architecture
10 questions
Section 8: Cluster Setup
Vendors & Hosting
06:35
Cluster Setup (Part 1)
23:43
Cluster Setup (Part 2)
25:35
Cluster Setup (Part 3)
18:01
Amazon EMR
15:46

With Amazon EMR we can start a brand-new Hadoop cluster and run MapReduce jobs in a matter of minutes. This lecture walks step by step through setting up a Hadoop cluster and running MapReduce jobs in it.
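As a rough sketch of that workflow with the AWS CLI (the cluster name, key pair, bucket paths, and cluster id below are all placeholders), launching a cluster and submitting a job looks something like this:

```shell
# Launch a small 3-node Hadoop cluster on EMR
# (name, key pair, and log bucket are illustrative placeholders)
aws emr create-cluster \
  --name "hadoop-in-real-world" \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --log-uri s3://my-bucket/emr-logs/

# Once the cluster is up, submit a MapReduce job packaged as a JAR
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=WordCount,Jar=s3://my-bucket/jars/wordcount.jar,Args=[s3://my-bucket/input/,s3://my-bucket/output/]
```

The cluster reads its input from and writes its output to S3, so it can be terminated as soon as the steps finish.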

Test your understanding of Cluster Setup
7 questions
Section 9: Hadoop Administrator In Real World (Upcoming Course)
13:08

In this lecture we will learn about the benefits of Cloudera Manager, the differences between Packages and Parcels, and the lifecycle of Parcels.

24:07

In this lecture we will see how to install a 3-node Hadoop cluster on AWS using Cloudera Manager.

Section 10: File Formats
Compression
14:55
Sequence File
18:32
AVRO
19:08
File Formats - Pig
18:08
File Formats - Hive
11:03
Test your understanding of File Formats
10 questions
Section 11: Troubleshooting and Optimizations
Exploring Logs
08:49
MRUnit
09:31
MapReduce Tuning
12:39
Pig Join Optimizations (Part 1)
17:17
Pig Join Optimizations (Part 2)
13:56
Hive Join Optimizations
19:36
Test your understanding of Troubleshooting & Optimizations
15 questions
Section 12: Apache Sqoop
13:59

This lecture will give an introduction to Apache Sqoop and demonstrate Sqoop imports to bring data from a traditional database like MySQL into HDFS.
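A minimal import along those lines might look like the following sketch (the connection string, credentials, and table name are illustrative):

```shell
# Pull the customers table from MySQL into HDFS with 4 parallel map tasks;
# -P prompts for the password instead of putting it on the command line
sqoop import \
  --connect jdbc:mysql://mysql.example.com/retail \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 4
```

Under the hood Sqoop generates a map-only MapReduce job, splitting the table across mappers by its primary key.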

09:01

This lecture will cover custom Sqoop imports and how Sqoop can be used to export tables in different file formats.
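For instance, a free-form query import written out as Avro files could be sketched like this (connection details and column names are illustrative):

```shell
# Free-form query import; the $CONDITIONS token is required so Sqoop
# can inject per-mapper WHERE clauses and split the work
sqoop import \
  --connect jdbc:mysql://mysql.example.com/retail \
  --username sqoop_user -P \
  --query 'SELECT id, name, city FROM customers WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/customers_query \
  --as-avrodatafile
```

Swapping `--as-avrodatafile` for `--as-sequencefile` stores the same data as SequenceFiles instead.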

15:33

This lecture will cover Sqoop jobs & incremental imports.
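A sketch of a saved incremental-import job (the job name, check column, and paths are hypothetical):

```shell
# Save an incremental import as a named job; Sqoop records the
# last imported value of the check column between runs
sqoop job --create customers_incremental -- import \
  --connect jdbc:mysql://mysql.example.com/retail \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --incremental append \
  --check-column id \
  --last-value 0

# Each execution imports only rows with id greater than the stored last value
sqoop job --exec customers_incremental
```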

07:30

This lecture will demonstrate how Sqoop can be used to create and populate a Hive table directly, and also how to export data from HDFS to a MySQL table.
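A sketch of both directions (database, table, and path names are illustrative):

```shell
# Import a MySQL table straight into a Hive table,
# creating the Hive table if it does not exist yet
sqoop import \
  --connect jdbc:mysql://mysql.example.com/retail \
  --username sqoop_user -P \
  --table customers \
  --hive-import \
  --hive-table retail.customers

# Export results from an HDFS directory back into an
# existing MySQL table
sqoop export \
  --connect jdbc:mysql://mysql.example.com/retail \
  --username sqoop_user -P \
  --table customer_summary \
  --export-dir /user/hadoop/customer_summary
```

Note that `sqoop export` expects the target MySQL table to already exist with a matching schema.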

Section 13: Apache Flume
14:03

In this lecture, we will see an introduction to Flume and look in detail at the different Flume components - source, channel, and sink. We will also look at a very simple Flume configuration to ingest log messages into HDFS.
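A minimal agent definition along those lines (the agent name, log path, and HDFS path are illustrative) might look like:

```properties
# One source, one channel, one sink: tail a log file and write events to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: run tail -F and turn each line into a Flume event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to date-partitioned HDFS paths
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```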

07:19

In this lecture we will ingest log messages from a single source and replicate the Flume events into both HDFS and the local file system.
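A sketch of the relevant pieces of such a configuration (names and paths are illustrative); the replicating selector is Flume's default, shown explicitly here for clarity:

```properties
# One source fanned out to two channels; every event is copied to both
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# One sink per channel: HDFS copy and local file copy
a1.sinks = hdfs_sink local_sink
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.hdfs.path = /flume/logs
a1.sinks.hdfs_sink.channel = c1
a1.sinks.local_sink.type = file_roll
a1.sinks.local_sink.sink.directory = /var/flume/local-copy
a1.sinks.local_sink.channel = c2
```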

16:23

In this lecture we will simulate ingesting logs from multiple data centers using Avro sources and sinks, consolidate the Flume events into a centralized location, and segregate the events using a concept called multiplexing.
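The multiplexing selector routes each event to a channel based on one of its header values; a sketch (the header name and mappings are hypothetical):

```properties
# Route events to different channels based on a "datacenter" header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = datacenter
a1.sources.r1.selector.mapping.us-east = c1
a1.sources.r1.selector.mapping.eu-west = c2
# Events with no matching header value fall back to the default channel
a1.sources.r1.selector.default = c1
```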

08:09

In this lecture we will see how to write a custom Flume source to stream live tweets from Twitter.

Section 14: Bonus
Preparing For Hadoop Interviews
19:18


Instructor Biography

Hadoop In Real World, Expert Big Data Consultants

We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. We have experience across several key domains from finance and retail to social media and gaming. We have worked with Hadoop clusters ranging from 50 all the way to 800 nodes.

We have been teaching Hadoop for several years now. Check out our free and successful Hadoop Starter Kit course on Udemy.
