Java Parallel Computation on Hadoop

Learn to write real, working data-driven Java programs that can run in parallel on multiple machines by using Hadoop.

Created by Ivan Ng, Frahaan Hussain

Last updated 8/2014

English

What you'll learn

Know the essential concepts about Hadoop
Know how to set up a Hadoop cluster in pseudo-distributed mode
Know how to set up a Hadoop cluster in distributed mode (3 physical nodes)
Know how to develop Java programs to parallelize computations on Hadoop

Course content

12 sections • 43 lectures • 2h 46m total length

Welcome! (1:01)
Explore how to use the Java language to write programs that run on multiple machines with Hadoop, covering parallel computation concepts and the core framework.

Cloudera VM (1:22)
Demonstration: Using the VM (1:02)
Shared Folders between your host OS and VM (3:31)
Tips about Shared Folders (0:32)
Accessing HDFS (1:33)
Running MapReduce (2:44)
Wire a driver, a mapper, and a reducer, package them into a jar, and specify Shakespeare input and output paths.
Demonstration: Accessing HDFS (4:35)
Demonstration: Running MapReduce (2:52)
Demonstration: Web Console for HDFS (2:58)
Use the web console for HDFS to browse the file system, locate outputs in user directories, and verify word counts for a Shakespeare article.
Demonstration: Web Console for MapReduce (1:33)
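
As a rough illustration of how a driver, a mapper, and a reducer fit together in a word count, here is a minimal sketch in plain Java. It is not the course's source code and does not use the Hadoop API. The class name WordCountSketch, the method names, and the tokenization rule are all assumptions made for illustration, and the in-memory shuffle step stands in for the grouping that Hadoop performs between the map and reduce phases. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of word count's map, shuffle, and reduce phases (no Hadoop dependency). */
final class WordCountSketch {

    /** Map phase: emit one (word, 1) pair per token on a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumed tokenization: lowercase, split on anything that is not a letter or apostrophe.
        for (String token : line.toLowerCase().split("[^a-z']+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group the emitted values by key, keeping keys in sorted order. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Driver: run map over every line, then shuffle, then reduce each group. */
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("To be, or not to be", "that is the question")));
    }
}
```

On a real cluster the map and reduce steps would run on different machines and read from and write to HDFS paths; the sketch only shows the data flow between the three roles.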

The Problem and Design (4:48)
Learn to perform large-scale data sorting with Java on the Hadoop framework. The lecture outlines the problem and a two-phase map and reduce design that deduplicates words across documents.
Demonstration: Develop and Run the program (12:59)
Develop and run a Hadoop MapReduce program in Java, from importing the Eclipse project to configuring the driver and map-reduce classes and executing the job with input and output paths.
Data Sorting - Source Code (0:05)
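
The two-phase idea of deduplicating and sorting words across documents can be sketched in plain Java as follows. This is an illustrative guess at the shape of such a design, not the course's source code, and it does not use the Hadoop API. The class DataSortingSketch and its method names are hypothetical, and a TreeSet stands in for the key sorting and grouping that Hadoop performs between map and reduce. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java simulation of a map/reduce design that sorts and deduplicates words. */
final class DataSortingSketch {

    /** Map phase: emit every word of a document as a key (the value is unused). */
    static List<String> map(String document) {
        List<String> keys = new ArrayList<>();
        // Assumed tokenization: lowercase, split on non-word characters.
        for (String token : document.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                keys.add(token);
            }
        }
        return keys;
    }

    /**
     * Shuffle and reduce: the sorted set orders the keys and collapses duplicates,
     * so each distinct word is emitted exactly once.
     */
    static List<String> shuffleAndReduce(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    /** Driver: map every document, then shuffle and reduce the combined keys. */
    static List<String> run(List<String> documents) {
        List<String> keys = new ArrayList<>();
        for (String document : documents) {
            keys.addAll(map(document));
        }
        return shuffleAndReduce(keys);
    }

    public static void main(String[] args) {
        // Prints [apple, fig, pear]
        System.out.println(run(List.of("pear apple", "apple fig pear")));
    }
}
```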

Requirements

An understanding of the Java programming language

Description

Build your essential knowledge with this hands-on, introductory course on Java parallel computation using the popular Hadoop framework:

- Getting Started with Hadoop

- HDFS working mechanism

- MapReduce working mechanism

- An anatomy of the Hadoop cluster

- Hadoop VM in pseudo-distributed mode

- Hadoop VM in distributed mode

- Detailed examples of using MapReduce

Learn the Widely-Used Hadoop Framework

Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.

Who is using Hadoop for data-driven applications?

Many companies have already adopted Hadoop. Companies such as Alibaba, eBay, Facebook, LinkedIn, and Yahoo! use this proven technology to harvest their data, discover insights, and power their applications.

Contents and Overview

As a software developer, you may have found that your program takes too long to run against a large amount of data. If you are looking for a way to scale out your data processing, this course is designed for you. It builds your knowledge and use of the Hadoop framework through modules covering the following:

- Background about parallel computation

- Limitations of parallel computation before Hadoop

- Problems solved by Hadoop

- Core projects under Hadoop - HDFS and MapReduce

- How HDFS works

- How MapReduce works

- How a cluster works

- How to leverage the VM for Hadoop learning and testing

- How the starter program works

- How the data sorting works

- How the pattern searching works

- How the word co-occurrence works

- How the inverted index works

- How the data aggregation works

- All the examples come with full source code and explanations
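
To give a flavor of one of the examples listed above, here is a minimal plain-Java sketch of an inverted index, which maps each word to the set of documents that contain it. It is not taken from the course and does not use the Hadoop API. The class InvertedIndexSketch, the document ids, and the tokenization rule are assumptions for illustration. In a MapReduce setting the map phase would emit (word, document id) pairs and the reduce phase would collect the ids per word; here both happen in one in-memory loop. It assumes Java 9 or later.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java simulation of building an inverted index (word -> ids of documents containing it). */
final class InvertedIndexSketch {

    /** Builds the index from a map of document id to document text. */
    static Map<String, SortedSet<String>> build(Map<String, String> documents) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // "Map" step: each token yields a (word, documentId) pair.
            for (String token : doc.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // "Reduce" step: collect the ids for each word, sorted and without duplicates.
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Prints {big=[d1, d2], cluster=[d2], data=[d1]}
        System.out.println(build(Map.of("d1", "big data", "d2", "big cluster")));
    }
}
```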

Come and join us! With this structured course, you can learn this prevalent technology for handling Big Data.

Who this course is for:

IT Practitioners
Software Developers
Software Architects
Programmers
Data Analysts
Data Scientists

Course content

Overview: 1 lecture • 1min

Background knowledge about Hadoop: 3 lectures • 15min

The Hadoop Ecosystem: 3 lectures • 19min

Get Ready in pseudo-distributed mode: 10 lectures • 23min

Get Ready in distributed mode: 5 lectures • 18min

Large-scale Word Counting: 3 lectures • 18min

Large-scale Data Sorting: 3 lectures • 18min

Large-scale Pattern Searching: 3 lectures • 17min

Large-scale Item Co-occurrence: 3 lectures • 16min

Large-scale Inverted Index: 3 lectures • 20min
