Java Parallel Computation on Hadoop
4.3 (107 ratings)
14,885 students enrolled

Learn to write real, working data-driven Java programs that can run in parallel on multiple machines by using Hadoop.
Last updated 8/2014
English
English [Auto]
This course includes
  • 3 hours on-demand video
  • 10 articles
  • 13 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Know the essential concepts about Hadoop
  • Know how to setup a Hadoop cluster in pseudo-distributed mode
  • Know how to setup a Hadoop cluster in distributed mode (3 physical nodes)
  • Know how to develop Java programs to parallelize computations on Hadoop
Course content
43 lectures 03:02:42
+ Background knowledge about Hadoop
3 lectures 15:10
Requirements for the new approach
03:17
Hadoop solving the limitations
06:00
+ The Hadoop Ecosystem
3 lectures 18:38
Overview of HDFS
06:28
Overview of MapReduce
08:12
Overview of Hadoop clusters
03:58
+ Get Ready in pseudo-distributed mode
10 lectures 22:42
Cloudera VM
01:22
Demonstration: Using the VM
01:02
Tips about Shared Folders
00:32
Accessing HDFS
01:33
Running MapReduce
02:44
Demonstration: Accessing HDFS
04:35
Demonstration: Running MapReduce
02:52
Demonstration: Web Console for HDFS
02:58
Demonstration: Web Console for MapReduce
01:33
+ Get Ready in distributed mode
5 lectures 02:19
About the Environment
02:19
Setup the Master node - Exercise Manual
6 pages
Setup the Slave node - Exercise Manual
6 pages
Start the Master node - Exercise Manual
2 pages
Start the Slave node - Exercise Manual
2 pages
+ Large-scale Word Counting
3 lectures 18:15
The Problem and Design
04:39
Demonstration: Develop and Run the program
13:31
Word Counting - Source Code
00:05
+ Large-scale Data Sorting
3 lectures 17:52
The Problem and Design
04:48
Demonstration: Develop and Run the program
12:59
Data Sorting - Source Code
00:05
+ Large-scale Pattern Searching
3 lectures 17:01
The Problem and Design
05:04
Demonstration: Develop and Run the program
11:52
Pattern Searching - Source Code
00:05
+ Large-scale Item Co-occurrence
3 lectures 15:36
The Problem and Design
04:37
Demonstration: Develop and Run the program
10:54
Item Co-occurrence - Source Code
00:05
+ Large-scale Inverted Index
3 lectures 19:52
The Problem and Design
04:22
Demonstration: Develop and Run the program
15:25
Inverted Index - Source Code
00:05
Requirements
  • An understanding of the Java programming language
Description

Build your essential knowledge with this hands-on, introductory course on Java parallel computation using the popular Hadoop framework:

- Getting Started with Hadoop

- HDFS working mechanism

- MapReduce working mechanism

- An anatomy of the Hadoop cluster

- Hadoop VM in pseudo-distributed mode

- Hadoop VM in distributed mode

- Elaborated examples of using MapReduce

Learn the Widely-Used Hadoop Framework

Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework. Apache Hadoop's MapReduce and HDFS components originally derived respectively from Google's MapReduce and Google File System (GFS) papers.
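To make the MapReduce model concrete, here is a minimal plain-Java sketch of the word-count flow the course builds up to. It has no Hadoop dependency: on a real cluster the two phases are expressed as Hadoop `Mapper` and `Reducer` classes, and the framework performs the shuffle between them. All class and method names below are illustrative only.

```java
import java.util.*;

// Plain-Java sketch of the MapReduce word-count flow (no Hadoop dependency).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(Map.entry(token, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts,
    // which is what Hadoop does between and inside the reducers.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("to be or not to be", "to see or not to see");
        System.out.println(reduce(map(input)));  // {be=2, not=2, or=2, see=2, to=4}
    }
}
```

The key idea this sketch preserves is that the map and reduce steps only see local data, so Hadoop can run many copies of each on different machines and merge the results.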

Who is using Hadoop for data-driven applications?

You may be surprised by how many companies have already adopted Hadoop. Companies like Alibaba, eBay, Facebook, LinkedIn, and Yahoo! are using this proven technology to harvest their data, discover insights, and power their applications!

Contents and Overview

As a software developer, you may have encountered a situation where your program takes too long to run against a large amount of data. If you are looking for a way to scale out your data processing, this course is designed for you. It builds your knowledge and use of the Hadoop framework through modules covering the following:

- Background about parallel computation

- Limitations of parallel computation before Hadoop

- Problems solved by Hadoop

- Core projects under Hadoop - HDFS and MapReduce

- How HDFS works

- How MapReduce works

- How a cluster works

- How to leverage the VM for Hadoop learning and testing

- How the starter program works

- How the data sorting works

- How pattern searching works

- How word co-occurrence works

- How the inverted index works

- How the data aggregation works

- All examples come with full source code and detailed explanations
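As a taste of the last module, the inverted index maps each word to the set of documents containing it. Below is a hedged plain-Java sketch of that idea; on Hadoop the mapper would emit (word, docId) pairs and the reducer would collect the ids per word. The class and method names are illustrative, not the course's actual source code.

```java
import java.util.*;

// Plain-Java sketch of an inverted index: word -> documents containing it.
public class InvertedIndexSketch {

    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String token : doc.getValue().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    // Record that this document contains this word.
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
            "doc1", "hadoop stores data",
            "doc2", "hadoop processes data in parallel");
        System.out.println(buildIndex(docs));
    }
}
```

Because the per-document tokenizing is independent, this is exactly the shape of problem MapReduce parallelizes well.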

Come and join us! With this structured course, you can learn this prevalent Big Data technology.

Who this course is for:
  • IT Practitioners
  • Software Developers
  • Software Architects
  • Programmers
  • Data Analysts
  • Data Scientists