Real World Hadoop - Deploying Hadoop with Cloudera Manager
3.9 (21 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
158 students enrolled

Once you have a running Cloudera Manager installation, we walk through the installation and logic of the Hadoop daemons
Created by Toyin Akin
Last updated 12/2016
English
Includes:
  • 2 hours on-demand video
  • 2 Articles
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • See Cloudera Manager at work, easily installing a distributed Hadoop cluster.
  • Learn the concepts behind splitting the various Hadoop services across cluster nodes.
  • Get a picture of how one can operate a Hadoop cluster in Production.
Requirements
  • The software needed for this course is freely available
  • This course is not recommended if you have no desire to work with/in distributed computing
Description

If you already have a running Cloudera Manager installation, this course follows on with the logic behind the placement of the Hadoop master/slave daemons across your cluster. We discuss the placement and then go ahead and perform the installation of Hadoop.

If you do not have a Cloudera Manager installation and you want to follow along hands-on, you can complete the course "Real World Vagrant - Automate a Cloudera Manager Build - Toyin Akin" beforehand.

"Big Data" technology is a hot and highly valuable skill to have – and this course will teach you how to quickly deploy a Hadoop Cluster using the Cloudera stack. 

Cloudera allows you to download a QuickStart virtual machine, which is great for developers, but it is of no use to the operations team when they need to start planning and building out DEV / UAT and PROD environments within their organizations. What assumptions were made when the QuickStart VM was put together?

In addition, hosting all of Cloudera's processes as well as Hadoop's processes on one VM is not a model that any large organization can or should follow. The Hadoop services need to be split out across multiple VMs/servers. In fact, that's the whole point of Hadoop!

Distributed Data and Distributed Compute.
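
To make the idea of splitting services across nodes concrete, here is a minimal sketch of a placement plan with a simple sanity check. The host names and the five-node layout are purely hypothetical, and the rules checked (masters kept apart from workers, DataNode and NodeManager placed together) are common conventions used as assumptions here, not the exact layout used in the course.

```python
# Hypothetical five-node DEV cluster; host names are illustrative only.
PLACEMENT = {
    "master1.example.com": ["NameNode", "ResourceManager", "ZooKeeper Server"],
    "master2.example.com": ["SecondaryNameNode", "JobHistory Server"],
    "worker1.example.com": ["DataNode", "NodeManager"],
    "worker2.example.com": ["DataNode", "NodeManager"],
    "worker3.example.com": ["DataNode", "NodeManager"],
}

MASTER_ROLES = {
    "NameNode",
    "SecondaryNameNode",
    "ResourceManager",
    "JobHistory Server",
    "ZooKeeper Server",
}
WORKER_ROLES = {"DataNode", "NodeManager"}


def check_placement(placement):
    """Return a list of human-readable problems found in a placement plan."""
    problems = []
    for host, roles in placement.items():
        role_set = set(roles)
        has_master = bool(role_set & MASTER_ROLES)
        worker_roles_here = role_set & WORKER_ROLES
        # Assumed rule 1: master and worker daemons live on separate hosts.
        if has_master and worker_roles_here:
            problems.append(f"{host}: mixes master and worker roles")
        # Assumed rule 2: storage (DataNode) and compute (NodeManager) sit together.
        elif worker_roles_here and worker_roles_here != WORKER_ROLES:
            problems.append(f"{host}: DataNode and NodeManager are not co-located")
    return problems


# A single-VM layout in the style of a QuickStart VM, for comparison.
SINGLE_VM = {"quickstart.example.com": sorted(MASTER_ROLES | WORKER_ROLES)}

print(check_placement(PLACEMENT))
print(check_placement(SINGLE_VM))
```

The single-VM layout fails the first rule, which is the point made above: everything on one machine is not a distributed cluster.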

After all, if you are developing against or operating a distributed environment, it needs to be tested. That means forcing various failure modes within the cluster and ensuring that the cluster can still respond to user requests. Killing the QuickStart VM destroys the entire cluster!
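
The difference between losing a node in a distributed cluster and losing the only VM can be sketched with a toy simulation. This is a simplified model, not how HDFS actually places blocks (real placement is rack-aware); the round-robin placement, the worker names and the block count are assumptions for illustration. The replication factor of 3 matches the HDFS default.

```python
import itertools


def place_blocks(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Toy placement only; real HDFS uses a rack-aware policy.
    """
    replication = min(replication, len(datanodes))
    ring = itertools.cycle(datanodes)
    return {
        block: {next(ring) for _ in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(block_map, dead_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    dead = set(dead_nodes)
    return {block for block, nodes in block_map.items() if nodes - dead}


# Hypothetical four-worker cluster versus a single QuickStart-style VM.
cluster = place_blocks(6, ["worker1", "worker2", "worker3", "worker4"])
single_vm = place_blocks(6, ["quickstart"])

print(len(readable_blocks(cluster, ["worker2"])), "of 6 blocks readable after losing one worker")
print(len(readable_blocks(single_vm, ["quickstart"])), "of 6 blocks readable after losing the only VM")
```

In a real DEV cluster the equivalent test would be to stop a worker node and confirm that reads and jobs still succeed.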

You'll learn the same techniques that large enterprises use to move to the next step in building out an enterprise-grade Hadoop cluster.

If you are a developer, the operations team can build out a centralized cluster so that you are truly testing against a distributed cluster. Testing code against the QuickStart VM may work, but as any experienced distributed developer knows, verifying code against a pseudo cluster on a single machine is different from verifying code against a truly distributed cluster.

As an example, bottlenecks in the network or in CPU cycles will come to light. In addition, this will also assist in capacity planning for the UAT / PROD cluster, as initial metrics can be acquired.

If you are in operations, this gives the team an environment in which to start learning how to jointly operate the cluster. Here the team can start to understand cluster metrics, adding/removing cluster nodes, managing the various Hadoop services (Zookeeper, HDFS, YARN and Spark) and a lot more. We also look at managing Cloudera Hadoop Parcels as well as changing Hadoop versions once a cluster is deployed.

The operations team can start to develop procedures and change management documentation, ready for Production operation of a Hadoop cluster.

Here I present a suggested curriculum based on the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.

For those with little or no knowledge of the Hadoop ecosystem. Udemy course : Big Data Intro for IT Administrators, Devs and Consultants

I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop. You don't want to corrupt your physical servers if you do not understand the steps or make a mistake. Udemy course : Real World Vagrant For Distributed Computing

I would then, on the virtual servers, deploy Cloudera Manager plus agents. Agents are the guys that will sit on all the slave nodes ready to deploy your Hadoop services. Udemy course : Real World Vagrant - Automate a Cloudera Manager Build

Then deploy the Hadoop services across your cluster (via the installed Cloudera Manager in the previous step or your own Cloudera Manager installation). We look at the logic regarding the placement of master and slave services. Udemy course : Real World Hadoop - Deploying Hadoop with Cloudera Manager

You may then want to play around with HDFS commands (hands-on distributed file manipulation). Udemy course : Real World Hadoop - Hands on Enterprise Distributed Storage.

You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). But this is an advanced step and thus I would make sure that you understand how to manually deploy the Hadoop services first. Udemy course : Real World Hadoop - Automating Hadoop install with Python!
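
As a taste of what automation against Cloudera Manager looks like, here is a minimal sketch that only builds (and does not send) an authenticated request listing the clusters known to a Cloudera Manager server. It talks to the REST API that the Python client wraps, rather than using the Python client itself. The host name, the credentials and the API version "v13" are assumptions for illustration; check which API version your own Cloudera Manager supports. Port 7180 is the default Cloudera Manager web port.

```python
import base64
import urllib.request


def build_clusters_request(host, username, password, port=7180, api_version="v13"):
    """Build a GET request for the Cloudera Manager 'clusters' REST endpoint.

    Nothing is sent over the network here. `api_version` is an assumed value;
    use the version supported by your own Cloudera Manager installation.
    """
    url = f"http://{host}:{port}/api/{api_version}/clusters"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    # Cloudera Manager's REST API accepts HTTP Basic authentication.
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and placeholder credentials, for illustration only.
request = build_clusters_request("cm.example.com", "admin", "admin")
print(request.get_method(), request.full_url)

# To actually call a running Cloudera Manager, you would do something like:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Deploying services through the API involves many more calls than this, which is why it is worth understanding the manual deployment first.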

There is also the upgrade step. Once you have a running cluster, how do you upgrade to a newer version (both of Cloudera Manager and of the Hadoop services)? Udemy course : Real World Hadoop - Upgrade Cloudera and Hadoop hands on



Who is the target audience?
  • Software engineers who want to expand their skills into the world of distributed computing
  • System engineers who want to expand their skillsets beyond the single Hadoop server
  • Developers who want to write/test their Hadoop code against a centralized, distributed Hadoop environment
Curriculum For This Course
24 Lectures
01:54:33
Introduction
5 Lectures 15:09

Here we discuss the benefits of moving from Cloudera's QuickStart VM to a small development environment for the enterprise. This gives the operations team the opportunity to work with a centralized environment where the operation of the cluster can be tested in DEV.

Preview 04:42

Suggested course curriculum to follow ...

Preview 00:00

If you want to follow the course hands on ...

Preview 00:21

We discuss a minimal set of VMs or servers that can be put into place to build a small DEV Hadoop Cluster.

Development Topology I
04:54

Second part of the new Development Topology

Development Topology II
05:11
Setup - Hadoop Cluster
5 Lectures 30:50

Here we select the nodes across which the various Hadoop processes will be distributed.

Preview 04:38

Here we configure Cloudera Manager with the Cloudera parcel location. A Cloudera parcel is simply a packaged version of a Hadoop distribution.

Setting up Parcel location
06:42

Here we select the core services needed to run a distributed Data / Compute Hadoop Cluster.

Hadoop Services
06:24

Here we summarize this Distributed Data Service and discuss the placement of the various processes that underlie the HDFS service.

HDFS - process placement and overview
08:22

Here we summarize this Distributed "Compute" Service and discuss the placement of the various processes that underlie the YARN service.

YARN - process placement and overview
04:44
Setup - Cloudera Manager Services
2 Lectures 06:52

Quick summary and placement of Cloudera Manager's processes.

Cloudera Manager Services Summary
04:22

Database selection for the Cloudera Manager services.


Database selection
02:30
Hadoop Cluster Navigation
4 Lectures 18:23

Running the new Hadoop Cluster for the first time.

Starting the Hadoop Cluster
06:00

Quick navigation of the Hadoop Cluster.

Hadoop Cluster Services Navigation
04:43

A quick look at some of the charts provided by Cloudera Manager.

Cluster Charts
03:30

An example of aggregating the logs across all the Hadoop processes (including Cloudera Manager processes) and searching for keywords

Distributed Logging
04:10
Switching to a different version of Hadoop
3 Lectures 17:35

Deploying an additional Cloudera Parcel that houses a different version of the Hadoop binaries.

Preview 08:44

Here we update the running cluster with the changed Hadoop version.

Restart cluster with changed Hadoop version
04:08

Here we discuss how a development team can package their application into a Parcel and deploy using Cloudera Manager's distribution engine.

Custom Services
04:43
Cluster Operations
4 Lectures 20:09

Here we look at some Cluster Events and see how we can respond to them.

Cluster Events
05:08

Even though an organization can use Cloudera Express in Production, this does not imply that running a Hadoop cluster is free!

People cost of Operating a Cloudera stack.
04:41

Here we look at some of the hundreds of configuration elements that have to be reviewed and approved before moving a cluster into DEV / PROD.

Cluster Configuration Elements
05:10

Here we look at expanding the capacity of the cluster

Adding a new Cluster Node
05:10
Conclusion
1 Lecture 05:36

Summary

Summary
05:36
About the Instructor
Toyin Akin
3.9 Average rating
135 Reviews
1,367 Students
15 Courses
Big Data Engineer, Capital Markets FinTech Developer

I spent 6 years at "Royal Bank of Scotland" and 5 years at the investment bank "BNP Paribas" developing and managing Interest Rate Derivatives services as well as engineering and deploying In Memory DataBases (Oracle Coherence), NoSQL and Hadoop clusters (Cloudera) into production.

In 2016, I left to start my own training, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (In Memory Database), NoSQL, BigData and DevOps technology.

From Q3 2017, this will also include FinTech Training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).

I have a YouTube Channel, publishing snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed computing ideas.

Check out my website and/or YouTube for more info

See you inside ...