Hadoop Starter Kit

Hadoop learning made easy and fun. Learn HDFS and MapReduce, get an introduction to Pig and Hive, and practice with FREE cluster access.

Created by Big Data In Real World

Last updated 2/2017

English

What you'll learn

Understand the Big Data problem in terms of storage and computation
Understand how Hadoop approaches the Big Data problem and provides a solution
Understand the need for another file system like HDFS
Work with HDFS
Understand the architecture of HDFS
Understand the MapReduce programming model
Understand the phases in MapReduce
Envision a problem in MapReduce
Write a MapReduce program with complete understanding of program constructs
Write Pig Latin instructions
Create and query Hive tables

Course content

8 sections • 15 lectures • 3h 19m total length

Course Introduction • 2:55
In this short video, we will give an overview of the course and walk through each section.

What Is Big Data? • 17:54
In this lesson, we will learn -

What is Big Data and some examples of Big Data

The problems that come with Big Data in terms of storage and computation

What Hadoop can offer in terms of solutions to the Big Data problems

Compare traditional solutions with Hadoop

Understanding Big Data Problem • 14:46
In this lesson, we will -

Take a sample Big Data problem

Analyze the problem and understand the complexities in terms of storage and computation

Finally, we will work on a solution together

Test your understanding of Big Data

HDFS - Why Another Filesystem? • 13:29
In this lesson, we will learn -

What a file system is and its features

Existing file systems

Limitations of existing file systems in distributed computing

How HDFS differs from a local file system

Basics of HDFS

Benefits of HDFS

Working With HDFS • 17:26
In this lesson, we will see -

Practical differences between HDFS and a local file system

How to manipulate files and directories in HDFS

Commands to check or update permissions and replication, and to run a file system check

Physical block locations and a preview of hdfs-site.xml

Location of the HDFS commands in the cluster - /hirw-starterkit/hdfs/commands

HDFS Architecture • 12:50
In this lesson, we will learn about -

Data Node

Name Node

Information held by Name Node

HDFS configuration files

Topology - Node, Rack, Cluster
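
The HDFS lessons above mention blocks and replication; the arithmetic behind them can be sketched in a few lines. This is a plain-Java illustration, not Hadoop code and not part of the course material. It assumes a 128 MB block size and a replication factor of 3, which are commonly documented defaults for dfs.blocksize and dfs.replication but may differ on any given cluster. The class name, method names and the 300 MB file are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; no Hadoop classes are used here.
class HdfsBlockSketch {

    static final long MB = 1024L * 1024L;

    // Assumed values (common defaults, not read from any real cluster).
    static final long BLOCK_SIZE = 128 * MB;
    static final int REPLICATION = 3;

    /** Number of blocks a file occupies: fileSize / blockSize, rounded up. */
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Size of each block in order; only the last block can be smaller. */
    static List<Long> blockSizes(long fileSize, long blockSize) {
        List<Long> sizes = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            long size = Math.min(blockSize, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    /** Raw disk space used across the cluster once every block is replicated. */
    static long rawStorage(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long fileSize = 300 * MB; // hypothetical file
        System.out.println("blocks: " + blockCount(fileSize, BLOCK_SIZE));
        for (long size : blockSizes(fileSize, BLOCK_SIZE)) {
            System.out.println("block of " + (size / MB) + " MB");
        }
        System.out.println("raw storage: "
                + (rawStorage(fileSize, REPLICATION) / MB) + " MB");
    }
}
```

Under these assumptions a 300 MB file occupies three blocks (128 MB, 128 MB and 44 MB) and consumes 900 MB of raw storage. Where each replica is placed depends on the cluster topology, which this sketch does not model.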

Test your understanding of HDFS

Introduction To MapReduce • 8:51
In this lesson, we will learn MapReduce through an illustrative example. We promise not to bore you with the Word Count problem! This lesson covers the following -

The basics of MapReduce

Introduction to the phases of MapReduce

Introduction to technical terms like Mapper, Reducer, InputSplit etc.

Dissecting MapReduce Components • 18:05
In this lesson we will -

Dive deeper into each phase of MapReduce

Learn the difference between an InputSplit and a Block

Understand the significance of the Shuffle phase

Look at the Partitioner, Combiner etc.
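
To make the Partitioner's role concrete, here is a minimal plain-Java sketch of hash partitioning: each map output key is hashed, the sign bit is masked off, and the result is taken modulo the number of reducers, so every occurrence of a key lands on the same reducer. This mirrors the idea behind Hadoop's default hash partitioner but is not Hadoop code. It uses String.hashCode, so the partition numbers it produces will not match what a real job computes for Hadoop's own key types. The class name, method names and stock symbols are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are used here.
class PartitionSketch {

    /** Pick a reducer for a key: non-negative hash modulo reducer count. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Group keys by the reducer they would be sent to. */
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numReducers);
            byReducer.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Hypothetical map output keys; repeated keys share a partition.
        List<String> keys = Arrays.asList("AAA", "BBB", "CCC", "AAA", "BBB");
        for (Map.Entry<Integer, List<String>> e : assign(keys, 2).entrySet()) {
            System.out.println("reducer " + e.getKey() + " <- " + e.getValue());
        }
    }
}
```

Because the partition depends only on the key and the reducer count, a reducer is guaranteed to see all values for the keys assigned to it, which is what makes a per-key aggregate such as a maximum possible.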

Dissecting MapReduce Program (Part 1) • 12:05
In this lesson we will write a MapReduce program in Java to calculate the maximum closing price for each stock symbol in a stocks dataset. We will walk through every single line of code and understand the programming concepts involved in writing MapReduce code.

Location of the code, jar and readme file in the cluster - /hirw-starterkit/mapreduce/stocks

Dissecting MapReduce Program (Part 2) • 17:13
In this lesson we will write a MapReduce program in Java to calculate the maximum closing price for each stock symbol in a stocks dataset. We will walk through every single line of code and understand the programming concepts involved in writing MapReduce code.

Location of the code, jar and readme file in the cluster - /hirw-starterkit/mapreduce/stocks
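
For a feel of how the maximum closing price problem maps onto the MapReduce model, the following plain-Java sketch imitates the map, shuffle and reduce phases in memory. It is not the course's program from /hirw-starterkit/mapreduce/stocks and does not use the Hadoop API. The input layout (symbol,date,close), the sample rows and all class and method names are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of map -> shuffle -> reduce; no Hadoop classes are used.
class MaxCloseSketch {

    /** Map phase: turn one input line into a (symbol, close) pair. */
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(","); // assumed layout: symbol,date,close
        return new SimpleEntry<>(fields[0], Double.parseDouble(fields[2]));
    }

    /** Shuffle phase: group all values by key, with keys in sorted order. */
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: collapse all closing prices for one symbol to the maximum. */
    static double reduce(List<Double> closes) {
        double max = Double.NEGATIVE_INFINITY;
        for (double close : closes) {
            max = Math.max(max, close);
        }
        return max;
    }

    /** Run the three phases in sequence over the input lines. */
    static Map<String, Double> run(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows, not taken from the course dataset.
        List<String> lines = Arrays.asList(
                "AAA,2017-01-03,10.50",
                "BBB,2017-01-03,20.25",
                "AAA,2017-01-04,11.75",
                "BBB,2017-01-04,19.00");
        System.out.println(run(lines)); // prints {AAA=11.75, BBB=20.25}
    }
}
```

In a real job the map and reduce calls run on different nodes and the framework performs the shuffle; the sketch only shows how the data changes shape between phases.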

Test your understanding of MapReduce

Requirements

Basic Linux commands
Basic Java knowledge is needed only to understand MapReduce programming in Java. Pig, Hive and the other lessons do not need Java knowledge.

Description

The objective of this course is to walk you step by step through all the core components of Hadoop and, more importantly, to make learning Hadoop easy and fun.

By enrolling in this course you can also get free access to our multi-node Hadoop training cluster so you can try out what you learn right away in a real multi-node distributed environment.

ABOUT INSTRUCTOR(S)

We are a group of Hadoop consultants who are passionate about Hadoop and Big Data technologies. Four years ago, when we were looking for Big Data consultants to work on our own projects, we could not find qualified candidates because the Big Data industry was very new. We therefore set out to train qualified candidates ourselves, giving them deep, real-world insight into Hadoop.

WHAT YOU WILL LEARN IN THIS COURSE

In the first section you will learn what Big Data is, with examples. We will discuss the factors to weigh when deciding whether a problem is a Big Data problem. We will talk about the challenges existing technologies face when it comes to Big Data computation. We will break down the Big Data problem in terms of storage and computation and understand how Hadoop approaches the problem and provides a solution.

In the HDFS section, you will learn about the need for another file system like HDFS. We will compare HDFS with traditional file systems and discuss its benefits. We will also work with HDFS and discuss its architecture.

In the MapReduce section, you will learn the basics of MapReduce and the phases involved. We will go over each phase in detail and understand what happens in it. Then we will write a MapReduce program in Java to calculate the maximum closing price for each stock symbol in a stocks dataset.

In the next two sections, we will introduce you to Apache Pig & Hive. We will try to calculate the maximum closing price for stock symbols from a stock dataset using Pig and Hive.

Who this course is for:

This course is for anyone who wants to learn about Big Data technologies.
No advanced programming knowledge is needed
This course is for anyone who wants to learn about distributed computing and Hadoop

Course content

Welcome & Let's Get Started • 1 lecture • 3min

Introduction to Big Data • 2 lectures • 33min

HDFS • 3 lectures • 44min

MapReduce • 4 lectures • 56min

Apache Pig • 1 lecture • 12min

Apache Hive • 1 lecture • 8min

Hadoop Administrator In Real World (Upcoming Course) • 2 lectures • 37min

Our Hadoop Developer course • 1 lecture • 6min