Hadoop emerged in response to the ever-growing volumes of data collected by organizations, offering a robust way to store, process, and analyze what has commonly become known as Big Data. It comprises a comprehensive stack of components designed to perform these tasks at a distributed scale, across clusters ranging from a handful of servers to thousands of machines.
Learning Hadoop 2 introduces you to the powerful system synonymous with Big Data, demonstrating how to create an instance and leverage the Hadoop ecosystem's many components to store, process, manage, and query massive data sets with confidence.
We open this course by providing an overview of the Hadoop component ecosystem, including HDFS, Sqoop, Flume, YARN, MapReduce, Pig, and Hive, before installing and configuring our Hadoop environment. We also take a look at Hue, the web-based graphical user interface for working with Hadoop.
We will then discover HDFS, the distributed file system Hadoop uses to store data, and learn how to import and export data both manually and automatically. Afterward, we turn our attention toward running computations using MapReduce and get to grips with Hadoop's scripting language, Pig. Lastly, we will load data from HDFS into Hive and demonstrate how it can be used to structure and query data sets.
About The Author
Randal Scott King is the Managing Partner of Brilliant Data, a consulting firm specializing in data analytics. In his 16 years of consulting, Scott has amassed an impressive list of clients, from mid-market leaders to Fortune 500 household names. Scott lives just outside Atlanta, GA, with his children.
This video will offer an overview of the course.
This video will introduce you to the basic concepts of Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN), which are the two core components of Hadoop.
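For reference, day-to-day interaction with HDFS and YARN happens largely from the command line; the directory and file names below are purely illustrative.

    hdfs dfs -mkdir -p /user/hue/data        # create a directory in HDFS
    hdfs dfs -put sales.csv /user/hue/data   # copy a local file into HDFS
    hdfs dfs -ls /user/hue/data              # list the directory contents
    yarn application -list                   # show applications currently managed by YARN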
An introduction to the basic concepts of Sqoop and Flume, two tools for the automation of data import into Hadoop.
An introduction to the basic concepts of MapReduce, the computation engine of Hadoop.
An introduction to the basic concepts of Pig, a scripting language for Hadoop.
An introduction to the basic concepts of Hive, Hadoop’s data warehousing solution.
This video will explain how to get data from databases into HDFS.
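As a minimal sketch of such an import, the following Sqoop command pulls a table from a relational database into HDFS; the MySQL connection string, credentials, and table name are assumptions for illustration only.

    sqoop import \
      --connect jdbc:mysql://dbserver/sales \
      --username analyst -P \
      --table customers \
      --target-dir /user/hue/customers \
      -m 1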
This video will cover how to import streaming data using the Flume tool.
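In outline, a Flume agent is described in a properties file (source, channel, and HDFS sink) and then started with flume-ng; the agent name and configuration file path below are assumptions for illustration.

    # conf/agent1.conf defines the agent's source, channel, and HDFS sink
    flume-ng agent --conf conf --conf-file conf/agent1.conf --name agent1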
This video will explore how to build "Word Count" in Eclipse, then package it as a .jar file and run it with MapReduce.
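Once the jar has been exported from Eclipse, running it on the cluster looks roughly like this; the jar name, main class, and paths are illustrative.

    hadoop jar wordcount.jar WordCount /user/hue/input /user/hue/wordcount_out
    hdfs dfs -cat /user/hue/wordcount_out/part-r-00000   # inspect the results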
Coding the same word counting program, but this time in Pig.
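For comparison, a Pig Latin word count takes only a few lines; the sketch below embeds the script in a shell here-document, with input and output paths that are assumptions.

    pig <<'EOF'
    lines   = LOAD '/user/hue/input' AS (line:chararray);
    words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
    STORE counts INTO '/user/hue/pig_wordcount';
    EOF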
This video will discuss how to use Pig to perform common Extract, Transform, and Load functions on data.
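As a hedged example of the kind of transform covered here, the following sketch loads a delimited file, filters bad rows, and aggregates by a column; the column names and paths are made up for illustration.

    pig <<'EOF'
    raw      = LOAD '/user/hue/sales.csv' USING PigStorage(',')
                   AS (id:int, region:chararray, amount:double);
    valid    = FILTER raw BY amount > 0;
    byregion = GROUP valid BY region;
    summary  = FOREACH byregion GENERATE group AS region, SUM(valid.amount) AS total;
    STORE summary INTO '/user/hue/sales_by_region' USING PigStorage(',');
    EOF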
This video will explore how to use pre-built code packaged as User Defined Functions (UDFs) in Pig scripts.
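Using a pre-built UDF typically means registering its jar and calling the function by its full class name. The Piggybank example below is a sketch; the jar path varies by installation, and the input file is illustrative.

    pig <<'EOF'
    REGISTER '/usr/lib/pig/piggybank.jar';
    raw   = LOAD '/user/hue/names.txt' AS (name:chararray);
    upper = FOREACH raw GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(name);
    DUMP upper;
    EOF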
This video will cover how to get data into Hive from a database without going to HDFS first.
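Sqoop can create and load the Hive table in a single step using --hive-import; the connection string, credentials, and table names below are assumptions for illustration.

    sqoop import \
      --connect jdbc:mysql://dbserver/sales \
      --username analyst -P \
      --table customers \
      --hive-import \
      --hive-table customers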
Using HiveQL queries in Hive to extract information from data sets.
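A typical HiveQL query, run here through the hive command-line client, looks like the following; the table and column names are assumptions for illustration.

    hive -e "
      SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
      FROM sales
      GROUP BY region
      ORDER BY revenue DESC
      LIMIT 10;
    "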
A quick summary of what the viewer has learned in the entire course.
Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look ahead at the trends and tools defining the way we work and live, and at how to put them to work.
With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt guides software professionals in every field to what's important to them now.
From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.