Building Hadoop Clusters
Deploy multi-node Hadoop clusters to harness the Cloud for storage and large-scale data processing
3.5 (3 ratings)
53 students enrolled
Created by Packt Publishing
Last updated 9/2015
30-Day Money-Back Guarantee
  • 2.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion

What Will I Learn?
  • Explore Amazon's Web Services to manage big data
  • Configure network and security settings when deploying instances to the cloud
  • Explore methods to connect to cloud instances using your client machine
  • Set up Linux environments and configure settings for services and package installations
  • Examine Hadoop's general architecture and what each service brings to the table
  • Harness and navigate Hadoop's file storage and processing mechanisms
  • Install and master HUE (Hadoop User Experience), Apache Hadoop's web UI
Requirements
  • This video series assumes no prior knowledge of any cloud technologies, Hadoop, or Linux.

Hadoop is an Apache top-level project that allows the distributed processing of large data sets across clusters of computers using simple programming models. It allows you to deliver a highly available service on top of a cluster of computers, each of which may be prone to failures. While Big Data and Hadoop have seen a massive surge in popularity over the last few years, many companies still struggle with trying to set up their own computing clusters.

This video series will turn you from a faltering first-timer into a Hadoop pro through clear, concise descriptions that are easy to follow.

We'll begin this course with an overview of Amazon's cloud services and how they're used. We'll then deploy Linux compute instances, and you'll see how to connect your client machine to Linux hosts and configure your systems to run Hadoop. Finally, you'll install Hadoop, download data, and examine how to run a query.

This video series will go beyond just Hadoop; it will cover everything you need to get your own clusters up and running. You will learn how to make network configuration changes as well as modify Linux services. After you've installed Hadoop, we'll then go over installing HUE—Hadoop's UI. Using HUE, you will learn how to download data to your Hadoop clusters, move it to HDFS, and finally query that data with Hive.

Learn everything you need to deploy Hadoop clusters to the Cloud through these videos. You'll grasp all you need to know about handling large data sets over multiple nodes.

About the Author

Sean Mikha is a technical architect who specializes in implementing large-scale data warehouses using Massively Parallel Processing (MPP) technologies. Sean has held roles at multiple companies that specialize in MPP technologies, where he was a part of implementing one of the largest commercial clinical data warehouses in the world. Sean is currently a solution architect, focusing on architecting Big Data solutions while also educating customers on Hadoop technologies. Sean graduated from UCLA with a BS in Computer Engineering, and currently lives in Southern California.

Who is the target audience?
  • If you are a system administrator or anyone interested in building a Hadoop cluster to process large sets of data, this video course is for you.
Curriculum For This Course
24 Lectures
Deploying Cloud Instances for Hadoop 2.0
3 Lectures 14:39

There are many choices when building a Hadoop cluster. We will give you the tools, education, and skills to build your own at a low cost.

Introduction to the Cloud and Hadoop

When deploying an Amazon instance for Hadoop, you will have to choose the correct AMI and set it up properly. This video will show you how.

Deploying a Linux Amazon Machine Image

Learn how to set up a static IP address for Amazon instances, and manage instances by starting and terminating.

Preview 04:12
Setting Up Network and Security Settings
3 Lectures 16:23

The goal of this video is to help you identify the correct inbound and outbound IP address and port specifications for Hadoop so that you can set up your security settings properly.

Preview 04:54

Security must be defined when building clusters. We need to make sure our security settings are compatible with Hadoop.

Identifying and Allocating Security Groups
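
The videos above cover choosing which inbound ports the cluster's security group must allow. As a rough baseline (these port numbers are standard Hadoop 2 / Ambari / HUE defaults, not taken from the course), a sketch of the list might look like:

```shell
# Common inbound ports for a small Hadoop 2 cluster's security group.
# This is a generic baseline, not the course's exact list; written to a
# demo file here so it can be inspected.
cat > ports.demo <<'EOF'
22      SSH (remote administration)
8080    Ambari web UI
50070   HDFS NameNode web UI
8088    YARN ResourceManager web UI
8888    HUE web UI
EOF
cat ports.demo
```

Internal cluster traffic (RPC between NameNode and DataNodes, for example) is usually handled by allowing all traffic within the security group itself rather than opening those ports to the internet.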

Connecting to Amazon instances requires special tools and configuration settings. This video will show you how to prepare to connect through Windows.

Configuration of Private Keys in a Windows Environment
Connecting to Cloud Instances
3 Lectures 13:58

There are multiple ways to connect to Amazon instances. We show you how to use each of these methods.

Overview of the Connectivity Options for Windows to the Amazon Cloud

PuTTY is a free utility for connecting to Amazon instances, but it can be tricky to set up. We will show you how to set it up in detail.

Installing and Using PuTTY for Connectivity to Windows Clients

You will need special tools and settings to get your private key onto the Amazon instances. This video will show you how.

Transferring Files to Linux Nodes with PSCP
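
The general shape of a PSCP transfer looks like the sketch below. The key file names and hostname are invented placeholders, and the command is echoed rather than executed so PuTTY does not need to be installed to follow along:

```shell
# Sketch of a PSCP invocation for copying a file to a Linux node.
# All names below are illustrative placeholders, not from the course.
PPK="mykey.ppk"   # PuTTY-format private key (converted with PuTTYgen)
HOST="ec2-user@ec2-198-51-100-1.compute-1.amazonaws.com"  # example DNS name
CMD="pscp -i $PPK mykey.pem $HOST:.ssh/"
echo "$CMD"       # on a real Windows client you would run this directly
```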
Setting Up Network Connectivity and Access for Hadoop Clusters
3 Lectures 22:58

Before we begin setting up our Hadoop cluster, we will need to gain a better understanding of the overall architecture. We will cover the key Hadoop components so you know how it works.

Defining the Hadoop Cluster

We need to set up SSH properly so that we will not have to verify our credentials each time we log in.

Setting Up Password-less SSH on the Head Node
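
The core of password-less SSH is a key pair with no passphrase, whose public half is listed in each node's authorized_keys. A minimal local sketch (using a demo directory rather than the real ~/.ssh, and assuming OpenSSH is installed):

```shell
# Generate a passphrase-less key pair and authorize it; paths use a demo
# directory so nothing in the real ~/.ssh is touched.
mkdir -p demo_ssh
ssh-keygen -t rsa -b 2048 -N "" -f demo_ssh/id_rsa -q   # -N "" = no passphrase
cat demo_ssh/id_rsa.pub >> demo_ssh/authorized_keys      # authorize the key
chmod 600 demo_ssh/authorized_keys                       # sshd rejects looser modes
# On a real cluster you would append id_rsa.pub to ~/.ssh/authorized_keys
# on every data node (ssh-copy-id automates this).
```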

To install Hadoop properly, we will need to configure the network details on each node.

Gathering Network Details and Setting Up the HOSTS File
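
Hadoop services find each other by hostname, so every node needs identical name-to-address mappings. A hypothetical four-node HOSTS file (addresses and hostnames invented, written to a demo file rather than /etc/hosts):

```shell
# Example /etc/hosts entries for a head node plus three data nodes.
# Addresses are invented; on a real cluster these are the instances'
# private IPs, and the same entries go on every node.
cat > hosts.demo <<'EOF'
10.0.0.10   headnode
10.0.0.11   datanode1
10.0.0.12   datanode2
10.0.0.13   datanode3
EOF
cat hosts.demo
```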
Setting Up Configuration Settings across Hadoop Clusters
3 Lectures 21:34

To have a proper Hadoop installation, we need to make sure we install all dependencies. Having the proper software repositories set up will go a long way toward a smooth Hadoop installation.

Setting Up Linux Software Repositories
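
On RPM-based distributions, a repository is just a file under /etc/yum.repos.d/. A sketch of what such a file contains (the repository name and URL are placeholders, not the course's actual repo; written to a demo file):

```shell
# Illustrative yum repository definition; baseurl is a placeholder and
# depends on which Hadoop distribution you install. On a node this file
# would live in /etc/yum.repos.d/.
cat > ambari.repo.demo <<'EOF'
[ambari]
name=Ambari Repository
baseurl=http://example.com/ambari/centos6/
gpgcheck=0
enabled=1
EOF
cat ambari.repo.demo
```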

It is very difficult to manage a Hadoop cluster if you have to reissue the same commands over and over again to every instance. The pdsh utility will allow us to run a command once and apply it many times to the data nodes in our cluster.

Using the Parallel Shell Utility (pdsh)
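
pdsh takes a host list and fans a single command out to every host in it. Since pdsh may not be installed locally, the sketch below builds a host file, shows the pdsh invocation in a comment, and runs a serial stand-in over the same list (hostnames and package are illustrative):

```shell
# Build a host list (hostnames invented for illustration).
printf '%s\n' datanode1 datanode2 datanode3 > nodes.demo

# With pdsh installed, one command fans out to every node, e.g.:
#   pdsh -w ^nodes.demo "sudo yum -y install ntp"

# Portable serial stand-in that just shows the fan-out shape:
while read -r host; do
  echo "[$host] sudo yum -y install ntp"
done < nodes.demo
```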

There are a series of steps we will need to take for a proper Hadoop installation. In this video, we will show you how to prep the data nodes, remove any conflicting software, and set up the daemon processes needed.

Prepping for Hadoop Installation
Creating a Hadoop Cluster
3 Lectures 19:30

To install Hadoop, our environment has to be set up correctly. We will check the Linux environment and download Ambari.

Building a Hadoop Cluster

We will take you step-by-step through the first half of installing Hadoop. This will make sure you configure all of the settings correctly.

Installing Hadoop 2 – Part 1

We continue stepping through the Hadoop installation process. Here, we take the detailed steps needed to configure Hadoop services.

Installing Hadoop 2 – Part 2
Loading and Navigating the Hadoop File System (HDFS)
3 Lectures 20:58

Before you are able to effectively use the Hadoop file system, you will have to understand its architecture and how it works. This section will show you exactly how HDFS is configured.

Understanding the Hadoop File System

To get a file transferred to HDFS, you will need to take a set of steps that distribute data across nodes.

Preview 07:36
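
Loading a file into HDFS typically follows a short sequence of standard Hadoop CLI commands (paths and file names below are invented, and the commands are recorded to a demo file rather than run, since no cluster is assumed here):

```shell
# Typical local-to-HDFS sequence; once the file is in HDFS, its blocks
# are replicated across the data nodes automatically.
cat > hdfs_steps.demo <<'EOF'
hdfs dfs -mkdir -p /user/demo          # create a target directory in HDFS
hdfs dfs -put weblogs.csv /user/demo/  # copy from local disk into HDFS
hdfs dfs -ls /user/demo                # confirm the file landed
EOF
cat hdfs_steps.demo
```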

Hadoop can be difficult and complex to troubleshoot. Ambari reduces this complexity with a rich graphical UI.

Ambari Server and Dashboard
Hadoop Tools and Processing Files
3 Lectures 23:56

Hadoop comes with many components like Hive and Pig. However, using them is difficult because they use a command-line interface. In this section, we install HUE, the Apache Hadoop UI that solves our interface problems.

Preview 10:16

For us to proceed with using HUE, we need to make multiple Hadoop configuration changes as well as install the Hadoop code on our servers.

Preview 07:37

We can now use HUE and do not need to deal with the command line. We will use HUE to load data into a table and query that data – one of the most common use cases for Hadoop.
Using HUE
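
The load-and-query pattern described above boils down to a few HiveQL statements submitted from HUE's query editor. A sketch with an invented table and columns (recorded to a demo file, since no Hive service is assumed here):

```shell
# Illustrative HiveQL for the load-then-query workflow; table and column
# names are invented, not from the course.
cat > hive_query.demo <<'EOF'
CREATE TABLE weblogs (ip STRING, url STRING, hits INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/demo/weblogs.csv' INTO TABLE weblogs;
SELECT url, SUM(hits) AS total
  FROM weblogs GROUP BY url ORDER BY total DESC LIMIT 10;
EOF
cat hive_query.demo
```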
About the Instructor
Packt Publishing
3.9 Average rating
8,274 Reviews
59,300 Students
689 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look ahead at the trends and tools defining the way we work and live, and how to put them to work.

With an extensive library of content - more than 4,000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.