Java Parallel Computation on Hadoop

Learn to write real, working data-driven Java programs that can run in parallel on multiple machines by using Hadoop.
4.0 (53 ratings)
4,626 students enrolled
Instructed by Ivan Ng
  • Lectures 43
  • Length 3 hours
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 8/2014 English

Course Description

Build your essential knowledge with this hands-on, introductory course on Java parallel computation using the popular Hadoop framework:

- Getting Started with Hadoop

- HDFS working mechanism

- MapReduce working mechanism

- An anatomy of the Hadoop cluster

- Hadoop VM in pseudo-distributed mode

- Hadoop VM in distributed mode

- Detailed examples of using MapReduce

Learn the Widely-Used Hadoop Framework

Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
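
As a rough illustration of the MapReduce idea, the sketch below walks a word count through map, shuffle and reduce steps in plain Java. It deliberately does not use the Hadoop API, it runs in a single JVM rather than on a cluster, and every class and method name in it (WordCountSketch, map, shuffle, reduce, wordCount) is hypothetical, so treat it as a teaching aid rather than the course's own source code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// No Hadoop classes are used; all names here are hypothetical.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(List.of("to be or not to be", "to do")));
    }
}
```

On a real cluster the map and reduce steps would run on different machines and the framework would perform the grouping between them; here everything happens in memory so the data flow is easy to follow.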

Who is using Hadoop for data-driven applications?

You may be surprised to learn how many companies have already adopted Hadoop. Companies such as Alibaba, eBay, Facebook, LinkedIn and Yahoo! use this proven technology to harvest their data, discover insights and power their applications.

Contents and Overview

As a software developer, you may have found that your program takes too long to run against a large amount of data. If you are looking for a way to scale out your data processing, this course is designed for you. It builds your knowledge and use of the Hadoop framework through modules covering the following:

- Background about parallel computation

- Limitations of parallel computation before Hadoop

- Problems solved by Hadoop

- Core projects under Hadoop - HDFS and MapReduce

- How HDFS works

- How MapReduce works

- How a cluster works

- How to leverage the VM for Hadoop learning and testing

- How the starter program works

- How the data sorting works

- How the pattern searching works

- How the word co-occurrence works

- How the inverted index works

- How the data aggregation works

- All the examples come with full source code and explanations
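
To give a feel for one of the examples listed above, here is a minimal sketch of an inverted index, which maps each word to the documents that contain it. It is plain Java with no Hadoop dependency, and the class name, method name and sample documents are all hypothetical; the course's actual program may be structured quite differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java illustration of an inverted index (word -> document IDs).
// No Hadoop classes are used; all names here are hypothetical.
class InvertedIndexSketch {

    // Builds a sorted map from each word to the sorted set of IDs of the
    // documents that contain it.
    static Map<String, Set<String>> build(Map<String, String> documents) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Map-like step: produce a (word, docId) pair for each word.
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // Reduce-like step: collect the document IDs seen for the same word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "hadoop stores data");
        docs.put("doc2", "hadoop processes data in parallel");
        // Each printed entry is a word followed by the IDs of the documents containing it.
        System.out.println(build(docs));
    }
}
```

Using a set for the document IDs means a word that appears several times in the same document is still listed once for that document.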

Come and join us! With this structured course, you can learn this widely used technology for handling Big Data.

What are the requirements?

  • An understanding of the Java programming language

What am I going to get from this course?

  • Know the essential concepts about Hadoop
  • Know how to set up a Hadoop cluster in pseudo-distributed mode
  • Know how to set up a Hadoop cluster in distributed mode (3 physical nodes)
  • Know how to develop Java programs to parallelize computations on Hadoop

Who is the target audience?

  • IT Practitioners
  • Software Developers
  • Software Architects
  • Programmers
  • Data Analysts
  • Data Scientists


Section 1: Overview
Section 2: Background knowledge about Hadoop
Existing Technical Limitations
Requirements for the new approach
Hadoop solving the limitations
Section 3: The Hadoop Ecosystem
Overview of HDFS
Overview of MapReduce
Overview of Hadoop clusters
Section 4: Get Ready in pseudo-distributed mode
Cloudera VM
Demonstration: Using the VM
Shared Folders between your host OS and VM
Tips about Shared Folders
Accessing HDFS
Running MapReduce
Demonstration: Accessing HDFS
Demonstration: Running MapReduce
Demonstration: Web Console for HDFS
Demonstration: Web Console for MapReduce
Section 5: Get Ready in distributed mode
About the Environment
Setup the Master node - Exercise Manual
6 pages
Setup the Slave node - Exercise Manual
6 pages
Start the Master node - Exercise Manual
2 pages
Start the Slave node - Exercise Manual
2 pages
Section 6: Large-scale Word Counting
The Problem and Design
Demonstration: Develop and Run the program
Word Counting - Source Code
Section 7: Large-scale Data Sorting
The Problem and Design
Demonstration: Develop and Run the program
Data Sorting - Source Code
Section 8: Large-scale Pattern Searching
The Problem and Design
Demonstration: Develop and Run the program
Pattern Searching - Source Code
Section 9: Large-scale Item Co-occurrence
The Problem and Design
Demonstration: Develop and Run the program
Item Co-occurrence - Source Code
Section 10: Large-scale Inverted Index
The Problem and Design
Demonstration: Develop and Run the program
Inverted Index - Source Code
Section 11: Large-scale Data Aggregation
The Problem and Design
Demonstration: Develop and Run the program
Data Aggregation - Source Code
Section 12: Data Preparation
Dataset 0
Dataset 1
Dataset 2

Instructor Biography

Ivan Ng, Instructor on Emerging Technologies

Over the last 15 years I have worked as a software architect on products such as learning management systems, online games, RFID-based warehousing systems and high-frequency advertising systems, for companies like Prudential, AXA and Bank of China. For more than 10 years I have also delivered training on a wide range of IT topics, including Big Data, Mobility, Front-end Engineering, Cloud Computing, Server Architecture and Data Analytics, for institutes like HP Education, Oracle Education, the Open University of Hong Kong and the Chinese University of Hong Kong.

I enjoy interacting with participants and understanding the practical requirements that arise from their different needs.

My first master's degree is in Information Technology and my second is in Quantitative Finance.
