Hadoop Basic Course for Beginners to Professionals

Getting Started with Hadoop: An open source framework to handle Big data

Created by Edulearners Technologies

Last updated 2/2019

English

What you'll learn

Basics of big data
History of Hadoop
Difference between RDBMS and Hadoop
Cluster Modes in Hadoop
HDFS Daemons and MapReduce Daemons
Hadoop Cluster Architecture
HDFS Commands
Combiner & Partitioner
MapReduce

Course content

4 sections • 31 lectures • 2h 33m total length

1.1 What is Big Data (3:04)
Discover what big data is and its four Vs (volume, velocity, variety, and veracity), and identify the three data types (structured, semi-structured, and unstructured), with attention to data quality.
1.2 Facts about Big Data (3:17)
Learn facts about the big data generated by social media and mobile devices, including billions of users, millions of posts and videos per minute, and data from sensors and geolocation.
1.3 Big Data Scenarios (2:17)
1.4 Apache Hadoop Framework (3:05)
1.5 Hadoop Users (1:53)
Explore how enterprise Hadoop users rely on managed platforms such as Amazon Elastic MapReduce and IBM BigInsights to run real-time analytics on large data sets, and how they integrate tools such as Cloudera and Cassandra.
1.6 Hadoop History (2:01)
1.7 Difference between RDBMS and Hadoop (1:07)
1.8 Cluster Modes in Hadoop (0:59)
Explore the three Hadoop cluster modes: standalone (local) mode, which uses the local file system; pseudo-distributed mode, which runs all daemons on a single machine for development; and fully distributed mode, used for production deployment.
1.9 Hadoop Ecosystem (4:30)
Explore the Hadoop ecosystem, including HDFS, MapReduce, and Pig Latin for distributed data processing. Learn how ZooKeeper coordinates distributed services and jobs, and how data is imported and exported between an RDBMS and Hadoop.
1.10 HDFS Daemons and MapReduce Daemons (2:36)
1.11 Hadoop Cluster Architecture (2:45)
Explore the Hadoop cluster architecture and its master and slave roles: the NameNode, Secondary NameNode, and DataNodes, which manage metadata and store data, and the JobTracker and TaskTrackers, which coordinate job execution.
1.12 Learning Hadoop (7:41)
Discover why Hadoop drives big data analytics across industries, how it enables scalable, cost-effective data processing, and the diverse career paths it opens in big data technologies.
1.13 Hadoop Distributions and Compatibilities (0:57)
Explore Hadoop distributions and their compatibility, including Cloudera CDH, Greenplum, and Hortonworks, along with the operating systems they support.
1.14 Hadoop Ecosystem in Detail (18:25)

2.1 Hadoop Distributed File System (6:47)
2.2 HDFS Files and Blocks (5:20)
Learn how HDFS stores files in fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x), distributes the blocks across nodes, and replicates each block (three copies by default) for fault tolerance, all managed by the NameNode.
2.3 HDFS Components and Architecture (9:24)
Explore the HDFS architecture: the NameNode as the metadata master, DataNodes that store the blocks, replication for fault tolerance, and the Secondary NameNode (checkpoint node), which periodically merges the edit log into the fsimage.
2.4 HDFS File Read-Write (6:29)
Learn how HDFS handles writes (file creation, block placement, and replication) and reads, in which a client obtains block locations from the NameNode and then reads the data directly from the DataNodes.
2.5 Installation of Apache Hadoop (3:43)
Install Apache Hadoop using the Cloudera distribution CDH 5.2 on VMware Player: download the VM image and make sure the host has adequate RAM and free disk space.
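The block, replication, and read behavior described in lectures 2.2 to 2.4 can be sketched as a small in-memory Java model. This is not the Hadoop API. The class and method names (`HdfsBlockSketch`, `splitIntoBlocks`, `placeReplicas`, `readPlan`) and the DataNode names are hypothetical. Replicas are placed round-robin on distinct nodes rather than with HDFS's rack-aware placement policy, and the reader simply uses the first listed replica.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of HDFS block storage. NOT the Hadoop API; all names are hypothetical.
public class HdfsBlockSketch {
    static final long MB = 1024L * 1024L;

    // Cut a file into fixed-size blocks; only the last block may be smaller.
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(Math.min(blockSize, fileSize - offset));
        }
        return blocks;
    }

    // "NameNode" metadata: block index -> DataNodes holding a replica.
    // Round-robin placement on distinct nodes; real HDFS uses a rack-aware policy.
    static Map<Integer, List<String>> placeReplicas(int numBlocks, List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("need at least as many DataNodes as replicas");
        }
        Map<Integer, List<String>> locations = new HashMap<>();
        for (int b = 0; b < numBlocks; b++) {
            List<String> nodes = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                nodes.add(dataNodes.get((b + r) % dataNodes.size()));
            }
            locations.put(b, nodes);
        }
        return locations;
    }

    // Read path: the client looks up block locations, then reads each block from a
    // DataNode. Picking the first replica is a simplification of "closest replica".
    static List<String> readPlan(Map<Integer, List<String>> locations, int numBlocks) {
        List<String> plan = new ArrayList<>();
        for (int b = 0; b < numBlocks; b++) {
            plan.add(locations.get(b).get(0));
        }
        return plan;
    }

    public static void main(String[] args) {
        long fileSize = 300 * MB;
        long blockSize = 128 * MB; // Hadoop 2.x default; Hadoop 1.x used 64 MB
        int replication = 3;       // HDFS default replication factor
        List<String> dataNodes = List.of("dn1", "dn2", "dn3", "dn4");

        List<Long> blocks = splitIntoBlocks(fileSize, blockSize);
        Map<Integer, List<String>> locations = placeReplicas(blocks.size(), dataNodes, replication);
        for (int b = 0; b < blocks.size(); b++) {
            System.out.println("block " + b + ": " + blocks.get(b) / MB + " MB on " + locations.get(b));
        }
        System.out.println("read from: " + readPlan(locations, blocks.size()));
        System.out.println("raw storage: " + fileSize * replication / MB + " MB");
    }
}
```

For the 300 MB example file, `main` reports blocks of 128, 128, and 44 MB, reads them from dn1, dn2, and dn3, and shows 900 MB of raw storage. A replication factor of three triples the disk space a file consumes.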

3.1 MapReduce (4:23)
3.2 MapReduce Operation (2:30)
3.3 MapReduce Example (2:18)
Walk through a MapReduce word count example in Hadoop: tokenize the text, map each word to a count of one, shuffle and sort by key, and reduce to final counts.
3.4 HDFS Input Splits (2:20)
Learn how input splits are logical divisions of data laid over physical HDFS blocks, and how split size determines the number of map tasks: larger splits create fewer map tasks, and smaller splits create more.
3.5 MapReduce Architecture (5:46)
Explore the MapReduce architecture in Hadoop, including two-stage processing with map and reduce tasks, data splits, partitioning, shuffling, and storage of intermediate results on local disk.
3.6 Combiners (3:32)
Learn how the combiner processes map output before the reduce phase, cutting the volume of intermediate data and the time spent transferring it, and weigh this advantage against its main drawback: Hadoop does not guarantee that the combiner will run.
3.7 Partitioner (4:21)
Explore how the partitioner routes map output to reducers by hashing keys, so that all values for a key reach the same reducer, and learn how to balance load with custom partitioners.
3.8 Shuffling and Sorting in Hadoop (3:03)
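The word count pipeline covered in lectures 3.3 to 3.8 (map, combine, partition, shuffle and sort, reduce) can be simulated in plain Java to make the data flow concrete. This is a minimal in-memory sketch, not Hadoop's `Mapper`/`Reducer` API, and the class and method names are hypothetical. The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, applied here to Java `String` hash codes rather than Hadoop `Text` keys. The `splitSize` method follows the formula used by `FileInputFormat`, and `numMapTasks` ignores the small extra allowance that class makes for the final split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a MapReduce word count. NOT Hadoop's Mapper/Reducer API;
// all class and method names here are hypothetical.
public class WordCountSketch {

    // Input splits: FileInputFormat computes max(minSize, min(maxSize, blockSize)).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // One map task per split (simplified: ignores the extra allowance for the last split).
    static long numMapTasks(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    // Map: emit (word, 1) for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Combiner: sum counts locally within one map task. Hadoop may run a combiner
    // zero or more times, so it must not change the final result (summing is safe).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partial.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return partial;
    }

    // Partitioner: the HashPartitioner formula, applied to Java String hash codes.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static List<TreeMap<String, Integer>> run(List<String> splits, int numReducers) {
        // Shuffle and sort: one TreeMap per reducer groups values by key in sorted order.
        List<TreeMap<String, List<Integer>>> inboxes = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            inboxes.add(new TreeMap<>());
        }
        for (String split : splits) { // each split is handled by its own map task
            for (Map.Entry<String, Integer> kv : combine(map(split)).entrySet()) {
                inboxes.get(partition(kv.getKey(), numReducers))
                       .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        // Reduce: sum the grouped values for each key.
        List<TreeMap<String, Integer>> results = new ArrayList<>();
        for (TreeMap<String, List<Integer>> inbox : inboxes) {
            TreeMap<String, Integer> counts = new TreeMap<>();
            inbox.forEach((word, values) ->
                    counts.put(word, values.stream().mapToInt(Integer::intValue).sum()));
            results.add(counts);
        }
        return results;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long split = splitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println("map tasks for 300 MB: " + numMapTasks(300 * mb, split));

        List<String> splits = List.of("the quick brown fox", "the lazy dog the end");
        List<TreeMap<String, Integer>> results = run(splits, 2);
        for (int r = 0; r < results.size(); r++) {
            System.out.println("reducer " + r + ": " + results.get(r));
        }
    }
}
```

With two reducers, every occurrence of a word lands on the same reducer, so `the` is counted as 3 in exactly one reducer's output. The combiner shrinks the second split's output from five pairs to four before the shuffle. Which reducer receives which word depends on the hash codes.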

4.1 CDH 5.2 Installation (10:59)
Download CDH 5.2, extract the file, and run it in a virtual machine to start CDH 5.2, then learn how to use Tolleson.
4.2 HDFS Commands Part 1 (5:35)
Explore essential HDFS commands for beginners: leaving safe mode, checking file system health with the file system check utility, creating directories, and listing their contents, while interpreting the replication factor and directory status shown in the listing.
4.3 HDFS Commands Part 2 (9:46)
Explore more HDFS commands for beginners: copying files from the local file system, listing directories recursively, and displaying file contents with cat.
4.4 HDFS Commands Part 3 (12:19)
Learn core HDFS commands for copying, moving, and deleting files across directories and for creating directories, and verify the results with practical syntax examples.

Requirements

Basics of big data
Basics of NoSQL databases
Basics of programming
Familiarity with programming terminology

Description

Hadoop is an open-source framework that lets you store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

This basic course provides a quick introduction to big data, the MapReduce programming model, and the Hadoop Distributed File System (HDFS).


Before you start this course, we assume that you have prior exposure to Core Java, database concepts, and at least one Linux distribution.

Who this course is for:

This course has been prepared for professionals aspiring to learn the basics of big data analytics using the Hadoop framework and to become Hadoop developers.
Software professionals, analytics professionals, and ETL developers are the key beneficiaries of this course.

Course content by module

Module 1: Introduction to Big Data & Hadoop (14 lectures • 55 min)

Module 2: Introduction to HDFS (Hadoop Distributed File System) (5 lectures • 32 min)

Module 3: Basics of the MapReduce Model (8 lectures • 28 min)

Module 4: Hadoop Installation & Practical (4 lectures • 39 min)