Hadoop Developer Course with MapReduce and Java

Name: Hadoop Developer Course with MapReduce and Java
Rating: 3.0 (34 reviews)

Learn Basics of Hadoop and MapReduce with Java

Created by Inflame Tech

Last updated 2/2019

English

What you'll learn

Introduction to Hadoop
Basics of MapReduce
MapReduce with YARN

Course content

9 sections • 77 lectures • 7h 58m total length

1.1 Introduction • 3:35
Explore how Hadoop handles large-scale data processing with map and reduce tasks, leveraging data locality, distributing work across a cluster, and shuffling and sorting by key to produce output.
1.2 Prerequisites • 0:23
1.3 What You Will Learn • 2:06
Explore MapReduce basics, Hadoop components and architecture limitations, data inputs, sequence files and Avro, custom partitioning, joining techniques, and Java API indexing.
1.4 Need of MapReduce • 1:24

2.1 What is Hadoop • 7:00
2.2 Hadoop History • 3:41
2.3 Comparison of HDFS with RDBMS • 8:02
2.4 Hadoop Cluster • 4:24
2.5 Hadoop Features • 3:10
Explore Hadoop features that deliver robust clustering, scalable performance, and data safety across distributed clusters for handling large data workloads.
2.6 Cluster Modes in Hadoop • 2:25
2.7 Hadoop Core Components • 1:44
2.8 What is HDFS • 3:53
Discover how HDFS stores data as 64 MB blocks, replicates them across a distributed cluster to improve throughput and availability, and offers a Unix-like interface for efficient reads and writes.
2.9 Block Replication in HDFS • 1:00
Demonstrate block replication by creating replicas on different machines and verify that the correct replica is produced.
2.10 HDFS and MapReduce • 2:19
Discover how HDFS stores big data in blocks and tunes block size across a cluster. Learn how MapReduce applies functions to distributed data to map, process, and reduce results.
2.11 HDFS Daemons • 6:36
Explore how HDFS daemons coordinate with the name node and data nodes, manage blocks and replication, maintain the namespace, and perform heartbeats and snapshots for reliable storage.

3.1 What is MapReduce • 5:38
3.2 Why MapReduce • 3:08
Explain why MapReduce matters in Hadoop by examining data replication and locality, locating data nodes, and expressing problems in any language through map and reduce.
3.3 History of MapReduce • 3:30
3.4 Use Cases to Illustrate Advantages of MapReduce • 2:06
Explore how MapReduce handles both structured and unstructured data to analyze customer behavior from web and retail data, unlocking opportunities to boost business.
3.5 MapReduce Applications • 6:43
Survey MapReduce applications in data analysis: extracting and moving data out of databases, analyzing logs, validating data for accuracy and consistency, and deriving meaningful insights from complex data.
3.6 Anatomy of MapReduce Program • 1:25
3.7 Map and Reduce Function • 4:19
Explore how the map and reduce functions transform input into key-value pairs, group by keys like words, and produce aggregated outputs.
3.8 Hands-On Session • 18:59
Build a MapReduce program in Java to compute the maximum temperature from text input in this hands-on session, using tokenization, input/output handling, and map and reduce logic.
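
The map, group-by-key, and reduce steps described in lectures 3.7 and 3.8 can be illustrated with a small plain-Java sketch. It is not the course's program and does not use the Hadoop API: the class name `MaxTemperatureSketch`, the `run` helper that stands in for the framework's grouping, and the "year temperature" line layout are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class MaxTemperatureSketch {

    // Map step: tokenize one input line and emit a (year, temperature) pair.
    // The "year temperature" layout is an assumption made for this sketch.
    static Map.Entry<String, Integer> map(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        String year = tokens.nextToken();
        int temperature = Integer.parseInt(tokens.nextToken());
        return new AbstractMap.SimpleEntry<>(year, temperature);
    }

    // Reduce step: collapse all temperatures seen for one year into their maximum.
    static int reduce(List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Stand-in for the framework: group map output by key in sorted order,
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1949 111", "1950 22", "1949 78", "1950 -11");
        System.out.println(run(input)); // prints {1949=111, 1950=22}
    }
}
```

In a real job the grouping done here by `run` is performed by the framework's shuffle and sort, and the map and reduce logic would live in Mapper and Reducer classes.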

4.1 Dataflow in MapReduce • 2:03
Explore the dataflow in MapReduce, detailing how map outputs become input for reduce with key-value elements, and discuss reliability on commodity hardware.
4.2 Job Submission Flow of MapReduce • 3:14
Explore the end-to-end job submission flow in MapReduce, from input through map to shuffle, and finally to the reduce phase and output.
4.3 MapReduce Example • 5:54
Explore a MapReduce style example that counts occurrences by splitting input into regions, mapping each element to a value, and reducing to a final output.
4.4 MapReduce Daemons • 6:06
Explore how MapReduce daemons coordinate jobs across commodity hardware to process input data in parallel, manage job lifecycles, and produce reliable outputs while reducing costs.
4.5 Job Tracker • 3:57
4.6 Task Tracker • 2:05
Explore the task tracker concept, learning how to schedule and monitor a job, manage notifications, and ensure successful completion in a data processing workflow.
4.7 Task Assignment by JobTracker • 1:45
Follow how the job tracker handles task assignment by locating an available slot on a machine, preferring the same or nearest machine when possible and switching if unavailable.
4.8 Submission of MapReduce Job • 2:02
Learn how to prepare, configure, and submit a MapReduce job, including setting input and output, packaging the code, and moving data.
4.9 Hands-On • 11:50
4.10 Combiner and Partitioner • 7:22
Explore how the combiner reduces intermediate data and how the partitioner directs data into partitions before the shuffle in MapReduce.
4.11 Dataflow with a Single, Multiple and No Reduce Task • 4:38
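
As a rough illustration of lecture 4.10, the sketch below shows the two ideas in plain Java without the Hadoop API: a combiner that pre-aggregates one mapper's output locally, and a hash-based partitioner that picks a reduce partition for each key. The class and method names are hypothetical. The partition formula (hash code with the sign bit masked off, modulo the number of partitions) mirrors the approach of Hadoop's default hash partitioner.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerPartitionerSketch {

    // Combiner: sum the counts for each key within a single mapper's output,
    // so fewer (key, count) pairs have to be shuffled across the network.
    static Map<String, Integer> combine(List<String> mapperKeys) {
        Map<String, Integer> localCounts = new TreeMap<>();
        for (String key : mapperKeys) {
            localCounts.merge(key, 1, Integer::sum);
        }
        return localCounts;
    }

    // Partitioner: choose which reduce partition receives a key.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        List<String> mapperKeys = Arrays.asList("a", "b", "a");
        Map<String, Integer> combined = combine(mapperKeys);
        System.out.println(combined); // prints {a=2, b=1}
        for (String key : combined.keySet()) {
            System.out.println(key + " -> partition " + partition(key, 2));
        }
    }
}
```

Because the same key always hashes to the same partition, every value for that key ends up at the same reduce task, which is what makes the per-key aggregation correct.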

5.1 Hadoop 1.x Architecture • 7:12
5.2 Hadoop 1.x Problems • 6:10
This lecture outlines Hadoop 1.x problems, including namespace and metadata limits, scalability with billions of items, and the absence of high availability, which necessitates manual recovery using the secondary NameNode.
5.3 NameNode-No Horizontal Scalability • 1:45
5.4 No High Availability in NameNode • 3:41
Explains why there is no high availability in the NameNode, detailing fsimage and edit log roles, recovery implications, and the need to switch to a different framework.
5.5 JobTracker-Overburdened • 2:35
5.6 MRv1 • 3:17
Explore the key ideas of transaction processing, including online and financial transactions, and how system design impacts performance and profitability in complex workflows.
5.7 Hadoop 2.x New Features • 4:04
Explore Hadoop 2.x new features, including federation and high availability, as the architecture evolves to improve scalability, reliability, and API use.
5.8 Hadoop 2.x Architecture • 2:44
Analyze the Hadoop 2.x architecture, focusing on how data is handled and recovery processes within the system. Explore how context and resource visibility shape understanding of the architecture.
5.9 HDFS High Availability in Hadoop 2.x Architecture • 4:02
Demonstrate high availability in Hadoop 2.x architecture with HDFS, using standby and secondary mechanisms to keep data accessible even when components go down, including snapshots.
5.10 YARN-Moving Beyond MapReduce • 5:02
5.11 Different Processing Applications in YARN • 3:25
5.12 MRv2 (YARN) • 2:31
Explore how MRv2 (YARN) manages resources and schedules MapReduce jobs, using a resource manager to negotiate for job execution, monitor status, and ensure applications restart when needed.
5.13 YARN MR Application Execution Flow • 4:30
Explore how a YARN MapReduce application submits to the resource manager, the scheduler allocates resources to the application, and core work runs inside containers.
5.14 YARN Workflow • 7:23
5.15 MapReduce 2.x Cluster Architecture • 4:02
Explore MapReduce 2.x cluster architecture by examining how the resource manager and application master coordinate monitoring, resource management, and client jobs.
5.16 Hands-On • 37:50
Explore how the resource manager, application master, and YARN coordinate distributed apps, with hands-on CDH configuration, delegation security, and map-based graph processing concepts.

6.1 InputSplit and RecordReader • 5:13
6.2 Mapper, Reducer and Driver Class • 2:12
6.3 New vs Old API • 8:06
6.4 Generic Option Parser, Tool and ToolRunner • 2:44
Explore the generic option parser in Hadoop, showing how it simplifies running MapReduce jobs by parsing standard command line arguments into a configuration object, the driver class, and arguments.
6.5 GenericOptionsParser and ToolRunner Options • 3:33
Configure Hadoop jobs with GenericOptionsParser and ToolRunner by specifying configuration files and hyphen options such as -D property=value and -conf. Copy and archive files across file systems to prepare resources.
6.6 Writables in Hadoop • 2:04
Explore writables in Hadoop and how to serialize data as byte streams for MapReduce. Implement primitive Java types like long and use equals and toString for data handling.
6.7 Serialization and Deserialization • 19:11
Learn how an object's state becomes a sequence of bytes through serialization for network transmission or long-term storage, and how deserialization reconstructs the object from streams.
6.9 Chaining of Jobs • 18:02
6.10 Listing and Killing Jobs • 1:15
6.11 Distributed Cache • 14:17
Discover how the Hadoop distributed cache makes common data available to all map tasks, improving performance, scalability, and data consistency.
6.12 Counters • 17:00
Explore how Hadoop MapReduce counters track statistics about the entire job, using grouped and dynamic counters in Java to tally records across tasks and retrieve aggregated results.
6.13 Test Cases in Hadoop • 12:21
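
The serialization round trip described in lectures 6.6 and 6.7 can be sketched with the standard `java.io` data streams. The `TemperatureRecord` class below is hypothetical and does not implement Hadoop's Writable interface. Its `write` and `readFields` methods only mirror the shape of that interface so the sketch stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TemperatureRecord {
    String year;
    int temperature;

    TemperatureRecord() { }

    TemperatureRecord(String year, int temperature) {
        this.year = year;
        this.temperature = temperature;
    }

    // Serialization: write the object's fields to a byte stream in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(year);
        out.writeInt(temperature);
    }

    // Deserialization: read the fields back in exactly the same order.
    void readFields(DataInput in) throws IOException {
        year = in.readUTF();
        temperature = in.readInt();
    }

    // Helper: turn a record into bytes held in memory.
    static byte[] serialize(TemperatureRecord record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            record.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a record from bytes produced by serialize.
    static TemperatureRecord deserialize(byte[] data) {
        try {
            TemperatureRecord record = new TemperatureRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new TemperatureRecord("1950", 22));
        TemperatureRecord copy = deserialize(data);
        System.out.println(data.length + " bytes -> " + copy.year + " " + copy.temperature);
    }
}
```

The key design point is that the reader and writer must agree on field order and types, since the byte stream carries no field names.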

7.1 Schedulers • 8:19
7.2 Implement Fair Scheduler in CDH • 5:11
Implement the fair scheduler in CDH by configuring Hadoop with a pre-configured distribution and writing code to enable fair scheduling.
7.3 Data Compression in Hadoop • 4:10
7.4 Different Compression Techniques in Hadoop • 8:00
Explore Hadoop compression techniques, including block-based and record-level approaches, their impact on speed and storage, and how compatibility and full support influence MapReduce workflows.
7.5 Hands-On • 9:31
Experience a data compression workflow and compute the maximum temperature from an input file through a hands-on session, tracing the program flow from compression to final output.
7.6 Multiple Inputs • 13:30
Learn to handle multiple inputs in MapReduce using Java, including reading input files, joining data, and producing outputs with practical map and reduce strategies.
7.7 Tuning • 2:04
7.8 Profiling Map and Reduce Task • 6:24
Explore profiling map and reduce tasks using a Java profiler to optimize map and reduce phases, reduce shuffle overhead, and lower network utilization.
7.9 Filtering and Projection in Map Phase • 3:12
Learn how filtering and projection in the map phase reduce shuffled data by omitting unneeded fields and keeping only relevant records, enhancing map output efficiency and later shuffle performance.
7.10 Use Combiner Class • 11:32
Learn how the combiner reduces data transfer by applying a local reduction to mapper outputs before they reach the reducer, with input and output types matching.
7.11 Analyze XML Data Using MapReduce Framework • 8:40
Explore how to analyze XML data with the MapReduce framework in Hadoop, configuring jobs, processing XML inputs, and producing text-indexed outputs.
7.12 Custom Partitioner in MapReduce • 13:26
Learn to implement a custom partitioner in MapReduce that routes records by month using IP address keys, with driver setup to produce month-based outputs.
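
The filtering and projection idea from lecture 7.9 can be sketched in plain Java as a map step that drops irrelevant records and emits only the fields the reducer needs. The record layout "id,name,department,salary", the class name, and the choice of fields are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterProjectSketch {

    // Map step: keep only records of the wanted department (filtering) and
    // emit just the id and salary fields (projection), so less data is shuffled.
    static List<String> map(List<String> lines, String wantedDepartment) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 4) {
                continue; // skip malformed records
            }
            if (!fields[2].equals(wantedDepartment)) {
                continue; // filtering: drop records from other departments
            }
            output.add(fields[0] + "\t" + fields[3]); // projection: id and salary only
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1,alice,eng,100", "2,bob,ops,90", "bad-record", "3,carol,eng,120");
        for (String record : map(input, "eng")) {
            System.out.println(record);
        }
    }
}
```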

8.1 Joining in MapReduce • 15:14
Learn how joining in MapReduce merges datasets on a common key, such as department id, using employee and department records, with map, reduce, and driver configuration to produce joined output.
8.2 Different Input and Output Formats in MapReduce • 14:12
Explore different input and output formats in MapReduce, including text input format and sequence file formats (text and binary), with keys, values, headers, and compression.
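
The join on department id described in lecture 8.1 can be sketched as a reduce-side join in plain Java. This is one common way to join in MapReduce and not necessarily the method the course uses. The record layouts ("name,deptId" for employees, "deptId,deptName" for departments), the "E:" and "D:" source tags, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Reduce step: for one department id, pair the department name with
    // every employee tagged for that id (inner join).
    static List<String> reduce(String deptId, List<String> taggedValues) {
        String deptName = null;
        List<String> employees = new ArrayList<>();
        for (String value : taggedValues) {
            if (value.startsWith("D:")) {
                deptName = value.substring(2);
            } else {
                employees.add(value.substring(2));
            }
        }
        List<String> joined = new ArrayList<>();
        if (deptName != null) {
            for (String employee : employees) {
                joined.add(employee + "," + deptId + "," + deptName);
            }
        }
        return joined;
    }

    // Map both inputs to (deptId, tagged value), group by deptId, then reduce.
    static List<String> join(List<String> employeeLines, List<String> departmentLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : employeeLines) {
            String[] f = line.split(","); // name,deptId
            grouped.computeIfAbsent(f[1], k -> new ArrayList<>()).add("E:" + f[0]);
        }
        for (String line : departmentLines) {
            String[] f = line.split(","); // deptId,deptName
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add("D:" + f[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.addAll(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> employees = Arrays.asList("alice,10", "bob,20", "carol,10");
        List<String> departments = Arrays.asList("10,Sales", "20,HR");
        for (String row : join(employees, departments)) {
            System.out.println(row);
        }
    }
}
```

Tagging each value with its source is what lets the reduce step tell employee records from department records once they arrive under the same key.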

Requirements

Basics of Java
Knowledge of programming would be beneficial.

Description

This course will help you understand MapReduce programming, how to set up an environment for it, and how to submit and execute MapReduce applications. We will start with the basics and then dig deeper into the advanced concepts of MapReduce. By the end of the course, you will have skills in:

Processing unstructured data.

Analysing complex and large data sets in the Hadoop framework.

YARN - NextGen MapReduce.

Designing and implementing complex queries using the MapReduce approach.

Breaking Big Data into meaningful information, processing data in parallel on a Hadoop cluster, and making it available to users.

Extracting patterns and business trends.

Who this course is for:

Hadoop Beginners
Professionals who want to learn Hadoop
Graduates looking to build a career in Big Data Analytics
Aspiring Data Scientists

Course content

Module-1 Introduction to Course • 4 lectures • 7min

Module-2 A Look at Hadoop • 11 lectures • 44min

Module-3 MapReduce Basics • 8 lectures • 46min

Module-4 Understanding MapReduce • 11 lectures • 51min

Module-5 MapReduce with YARN • 16 lectures • 1hr 40min

Module-6 Advanced MapReduce Concepts - I • 12 lectures • 1hr 46min

Module-7 Advanced MapReduce Concepts - II • 12 lectures • 1hr 34min

Module-8 Advanced MapReduce Concepts - III • 2 lectures • 29min

Program & Projects for full Course • 1 lecture • 1min
