From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. This course is designed for anyone who aspires to a career as a Hadoop developer. It covers the concepts that every aspiring Hadoop developer must know to SURVIVE in REAL WORLD Hadoop environments.
The course covers the must-know topics, such as HDFS, MapReduce, YARN, Apache Pig and Hive, and explores each of them in depth. We don't stop at the easy concepts; we go a step further and cover important and complex topics such as file formats, custom Writables, input/output formats, troubleshooting and optimizations.
All concepts are backed by hands-on projects, such as analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages using page dumps from Wikipedia, and simulating the mutual friends functionality in Facebook, to name a few.
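To give a flavor of the mutual friends project, here is a minimal sketch of the usual map/shuffle/reduce approach, written in plain Python rather than as a Hadoop job. It is not the course's implementation: the function names (`map_friends`, `reduce_friends`, `mutual_friends`) and the sample graph are assumptions made up for illustration, and the in-memory dictionary only stands in for the shuffle phase that Hadoop performs.

```python
from collections import defaultdict


def map_friends(person, friends):
    """Map step: emit one record per friendship.

    The key is the sorted pair of names, so both sides of a friendship
    produce the same key. The value is the emitting person's friend set.
    """
    for friend in friends:
        yield tuple(sorted((person, friend))), set(friends)


def reduce_friends(pair, friend_sets):
    """Reduce step: intersect the friend sets collected for one pair."""
    return pair, sorted(set.intersection(*friend_sets))


def mutual_friends(graph):
    """Run map, a simulated shuffle, and reduce over a friendship graph."""
    grouped = defaultdict(list)  # hypothetical stand-in for Hadoop's shuffle
    for person, friends in graph.items():
        for key, value in map_friends(person, friends):
            grouped[key].append(value)
    return dict(reduce_friends(pair, sets) for pair, sets in grouped.items())


# Made-up sample data: each person maps to their list of friends.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["A", "B"],
}
result = mutual_friends(graph)
print(result[("A", "B")])
```

The sketch assumes friendships are symmetric, so every pair receives two friend sets. If a friendship is listed on one side only, the reducer sees a single set and returns it unchanged, which a real job would need to handle.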
|Section 1: Thank You and Let's Get Started|
Tools & Setup (Windows)
Tools & Setup (Linux)
|Section 2: Introduction To Big Data|
What is Big Data?
Understanding Big Data Problem
History of Hadoop
Test your understanding of Big Data
|Section 3: HDFS|
HDFS - Why Another Filesystem?
Working With HDFS
HDFS - Read & Write
HDFS - Read & Write (Program)
Test your understanding of HDFS
|Section 4: MapReduce|
Introduction to MapReduce
Dissecting MapReduce Components
Dissecting MapReduce Program (Part 1)
Dissecting MapReduce Program (Part 2)
Facebook - Mutual Friends
New York Times - Time Machine
Test your understanding of MapReduce
|Section 5: Apache Pig|
Introduction to Apache Pig
Loading & Projecting Datasets
Solving a Problem
Pig Latin - Joins
Million Song Dataset (Part 1)
Million Song Dataset (Part 2)
Page Ranking (Part 1)
Page Ranking (Part 2)
Page Ranking (Part 3)
Test your understanding of Apache Pig
Apache Pig Assignment
|Section 6: Apache Hive|
Introduction to Apache Hive
Dissect a Hive Table
Loading Hive Tables
Managed Table vs. External Table
Order By vs. Sort By vs. Cluster By
Hive QL - Joins
Twitter (Part 1)
Twitter (Part 2)
Test your understanding of Apache Hive
Apache Hive Assignment
|Section 7: Architecture|
Highly Available Hadoop
Test your understanding of Hadoop Architecture
|Section 8: Cluster Setup|
Vendors & Hosting
Cluster Setup (Part 1)
Cluster Setup (Part 2)
Cluster Setup (Part 3)
With Amazon EMR, we can start a brand new Hadoop cluster and run MapReduce jobs in a matter of minutes. This lecture walks through, step by step, how to set up a Hadoop cluster and run MapReduce jobs on it.
Test your understanding of Cluster Setup
|Section 9: Hadoop Administrator In Real World (Upcoming Course)|
In this lecture we will learn about the benefits of Cloudera Manager, the differences between Packages and Parcels, and the lifecycle of Parcels.
In this lecture we will see how to install a three-node Hadoop cluster on AWS using Cloudera Manager.
|Section 10: File Formats|
File Formats - Pig
File Formats - Hive
Test your understanding of File Formats
|Section 11: Troubleshooting and Optimizations|
Pig Join Optimizations (Part 1)
Pig Join Optimizations (Part 2)
Hive Join Optimizations
Test your understanding of Troubleshooting & Optimizations
|Section 12: Apache Sqoop|
This lecture gives an introduction to Apache Sqoop and demonstrates Sqoop imports that bring data from a traditional database like MySQL into HDFS.
This lecture covers custom Sqoop imports and how Sqoop can be used to export tables in different file formats.
This lecture covers Sqoop jobs and incremental imports.
This lecture demonstrates how Sqoop can be used to create and populate a Hive table directly, and how to export data from HDFS to a MySQL table.
|Section 13: Apache Flume|
In this lecture we will introduce Flume and look in detail at its components: source, channel and sink. We will also look at a very simple Flume configuration that ingests log messages into HDFS.
In this lecture we will ingest log messages from a single source and replicate the Flume events into HDFS and the local file system.
In this lecture we will simulate ingesting logs from multiple data centers using Avro sources and sinks, consolidate the Flume events in a centralized location, and segregate them using a concept called multiplexing.
In this lecture we will see how to write a custom source to stream live tweets from Twitter using Flume.
|Section 14: Bonus|
Preparing For Hadoop Interviews
We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. We have experience across several key domains, from finance and retail to social media and gaming. We have worked with Hadoop clusters ranging from 50 to 800 nodes.
We have been teaching Hadoop for several years now. Check out our FREE and successful Hadoop Starter Kit course on Udemy.