Real World Hadoop - Automating Hadoop install with Python!
Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Cloudera Manager's Python API. Hands on.
4.1 (12 ratings)
233 students enrolled
Created by Toyin Akin
Last updated 2/2017
English
Includes:
  • 4 hours on-demand video
  • 3 Articles
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Run a single command on your desktop, go for a coffee, and come back to a running distributed environment for cluster deployment
  • Quickly build an environment where Cloudera and Hadoop software can be installed
  • Automate the installation of software across multiple virtual machines
Requirements
  • Basic programming or scripting experience is required.
  • You will need a desktop PC and an Internet connection. The course is created with Windows in mind.
  • The software needed for this course is freely available
  • You will need a computer whose chipset supports hardware virtualization (VT-x). Most computers purchased over the last five years should be good enough
  • Optional : Some exposure to Linux and/or Bash shell environment
  • 64-bit Windows operating system required (Would recommend Windows 7 or above)
  • This course is not recommended if you have no desire to work with/in distributed computing
  • This course is built on top of - "Real World Vagrant - Automate a Cloudera Manager Build"
Description

Note : This course is built on top of the "Real World Vagrant - Automate a Cloudera Manager Build - Toyin Akin" course

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Python! Instruct Cloudera Manager to do the work! Hands on. Here we use Python to instruct an already installed Cloudera Manager to deploy your Hadoop Services.

The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager itself. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console.
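
As a rough illustration of the point about host, port and HTTP Basic Authentication, here is a minimal sketch that builds (but does not send) an authenticated request using only the Python standard library. The host name, the `v10` API version and the `admin`/`admin` credentials are placeholders, and port 7180 is assumed to be the Admin Console port of your installation. The course itself works through the Cloudera Manager Python API rather than raw HTTP.

```python
import base64
import urllib.request


def build_cm_request(host, path, username, password, port=7180, version="v10"):
    """Build (but do not send) a GET request for the Cloudera Manager REST API.

    The port and API version defaults are assumptions; adjust them to match
    your own Cloudera Manager installation.
    """
    url = "http://{}:{}/api/{}/{}".format(host, port, version, path.lstrip("/"))
    # HTTP Basic Authentication: base64("user:password") in the Authorization header.
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder host and credentials, for illustration only.
request = build_cm_request("cm-host.example.com", "clusters", "admin", "admin")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it to a live Cloudera Manager. The sketch stops short of that so it can be read and run without a cluster.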

Here are some of the cool things you can do with Cloudera Manager via the API:

    Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.
    Configure various Hadoop services and get config validation.
    Take admin actions on services and roles, such as start, stop, restart, failover, etc. Also available are the more advanced workflows, such as setting up high availability and decommissioning.
    Monitor your services and hosts, with intelligent service health checks and metrics.
    Monitor user jobs and other cluster activities.
    Retrieve timeseries metric data.
    Search for events in the Hadoop system.
    Administer Cloudera Manager itself.
    Download the entire deployment description of your Hadoop cluster in a JSON file.
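
To make the last item in the list above more concrete, the sketch below summarizes a downloaded deployment description. It assumes the JSON has a top-level `clusters` list whose entries carry a `name` and a `services` list with a `type` field. That layout is an assumption for illustration, and the sample document is made up.

```python
import json


def summarize_deployment(deployment):
    """Map each cluster name to the sorted list of service types it contains."""
    summary = {}
    for cluster in deployment.get("clusters", []):
        service_types = sorted(
            service.get("type", "UNKNOWN") for service in cluster.get("services", [])
        )
        summary[cluster.get("name", "unnamed")] = service_types
    return summary


# Made-up sample standing in for a downloaded deployment file.
sample = json.loads("""
{
  "clusters": [
    {
      "name": "cluster1",
      "services": [
        {"name": "zookeeper", "type": "ZOOKEEPER"},
        {"name": "hdfs", "type": "HDFS"},
        {"name": "yarn", "type": "YARN"}
      ]
    }
  ]
}
""")

summary = summarize_deployment(sample)
print(summary)
```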

Additionally, with the appropriate licenses, the API lets you:

    Perform rolling restart and rolling upgrade.
    Audit user activities and accesses in Hadoop.
    Perform backup and cross data-center replication for HDFS and Hive.
    Retrieve per-user HDFS usage report and per-user MapReduce resource usage report.

Here is a suggested curriculum, based on the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.

For those with little or no knowledge of the Hadoop ecosystem:
Udemy course : Big Data Intro for IT Administrators, Devs and Consultants

I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop. You don't want to corrupt your physical servers if you do not understand the steps or make a mistake.
Udemy course : Real World Vagrant For Distributed Computing

I would then, on the virtual servers, deploy Cloudera Manager plus agents. The agents sit on all the slave nodes, ready to deploy your Hadoop services.
Udemy course : Real World Vagrant - Automate a Cloudera Manager Build

Then deploy the Hadoop services across your cluster (via the installed Cloudera Manager in the previous step). We look at the logic regarding the placement of master and slave services.
Udemy course : Real World Hadoop - Deploying Hadoop with Cloudera Manager

If you want to play around with HDFS commands (Hands on distributed file manipulation).
Udemy course : Real World Hadoop - Hands on Enterprise Distributed Storage.

You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). But this is an advanced step and thus I would make sure that you understand how to manually deploy the Hadoop services first.
Udemy course : Real World Hadoop - Automating Hadoop install with Python!

There is also the upgrade step. Once you have a running cluster, how do you upgrade it to a newer release (both Cloudera Manager and the Hadoop services)?
Udemy course : Real World Hadoop - Upgrade Cloudera and Hadoop hands on

Who is the target audience?
  • Software engineers who want to expand their skills into the world of distributed computing
  • System engineers who want to expand their skillsets beyond the single server
  • Developers who want to write/test their code against a valid distributed environment
Curriculum For This Course
21 Lectures
03:48:34
Rationale
2 Lectures 12:51

Even though we can automate the installation of Cloudera or Ambari components, how can we automate the installation of the Hadoop Services themselves? This course looks at automating Hadoop Services.

Preview 12:51

Suggested course curriculum to follow ...

Preview 00:00
Anaconda (Python) Setup within Vagrant
6 Lectures 01:14:51
Course Notes and Resources
00:31

Walking over the Cloudera / Hadoop Cluster Topology that we will be working against.

Walking over the Cloudera / Hadoop Cluster Topology
13:34

Part I - Booting up the Virtual Machines and installing Hadoop

Part I - Installing Hadoop
11:53

Part II - Booting up the Virtual Machines and installing Hadoop

Part II - Installing Hadoop
19:49

Here, we install Anaconda.

NOTE : Within the vagrant script we use the "else" clause. It is more correct to use the "elsif" clause and so this has been reflected within the resources file.

Automating the Installation of Anaconda
19:25

Destroy our Hadoop Cluster. Python will now be taking over ...
09:39
Python - Automate Deployment of Cloudera Management Services
5 Lectures 51:30

Here we connect to Cloudera Manager via Python. In addition, we make some basic changes to some Cloudera Manager GUI elements as well as configure some Administration settings.

Preview 12:35
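
As a hedged sketch of what changing Administration settings can look like at the REST level, the helper below builds a JSON body of the form `{"items": [{"name": ..., "value": ...}]}`, which is assumed here to be the shape a Cloudera Manager config update expects. The setting names and values are illustrative only. Check the names your Cloudera Manager version actually exposes before sending anything.

```python
import json


def build_config_payload(settings):
    """Turn a {name: value} dict into a config-list style JSON body."""
    items = [
        {"name": name, "value": str(value)}
        for name, value in sorted(settings.items())
    ]
    return json.dumps({"items": items}, sort_keys=True)


# Illustrative setting names and values; not taken from the course material.
payload = build_config_payload({
    "SESSION_TIMEOUT": 3600,
    "CUSTOM_BANNER_HTML": "Training cluster",
})
print(payload)
```

Sorting the settings keeps the generated body stable between runs, which makes it easier to diff against what was sent last time.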

Via our Cloudera Manager API object, we obtain a handle to a newly created Cloudera Manager Services Container. Via this container, we will start to manipulate the Cloudera Manager Services.

Python - Obtain a handle to a new Cloudera Manager Services Container
16:14
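
The services container can be pictured as a single service of type `MGMT` that holds one role per Cloudera Manager service. The sketch below builds such a description as a plain dictionary. The role types listed, the role naming scheme and the `hostRef`/`hostId` layout are assumptions for illustration, not a transcript of the code used in the lecture.

```python
# Assumed subset of management role types; your licence and version may differ.
MGMT_ROLE_TYPES = ("ALERTPUBLISHER", "EVENTSERVER", "HOSTMONITOR", "SERVICEMONITOR")


def build_mgmt_service_setup(host_id, role_types=MGMT_ROLE_TYPES, service_name="mgmt"):
    """Describe a management service with one role per role type on a single host."""
    roles = [
        {
            "name": "{}-{}".format(service_name, role_type),  # hypothetical naming scheme
            "type": role_type,
            "hostRef": {"hostId": host_id},
        }
        for role_type in role_types
    ]
    return {"name": service_name, "type": "MGMT", "roles": roles}


# "cm-host-id" is a placeholder for the host id reported by Cloudera Manager.
setup = build_mgmt_service_setup("cm-host-id")
for role in setup["roles"]:
    print(role["name"], "->", role["hostRef"]["hostId"])
```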

Here we log onto the Cloudera Manager Virtual Machine in order to acquire the database credentials needed to configure our Cloudera Manager Services. Some of these services place entries within the database. This step is NOT for the Hadoop Cluster. Just for Cloudera Manager.

Python - Acquire database credentials in order to configure the CM Services
11:14
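
A minimal sketch of the credential-gathering step, assuming the credentials live in a Java-style properties file on the Cloudera Manager machine with keys of the form `com.cloudera.cmf.<ROLE>.db.<field>`. Both the key format and the sample content below are assumptions for illustration. The values are made up.

```python
def parse_db_properties(text):
    """Group 'com.cloudera.cmf.<ROLE>.db.<field>=<value>' lines by role."""
    prefix = "com.cloudera.cmf."
    credentials = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if not key.startswith(prefix) or ".db." not in key:
            continue
        role, field = key[len(prefix):].split(".db.", 1)
        credentials.setdefault(role, {})[field] = value.strip()
    return credentials


# Made-up sample content; real files hold generated passwords.
sample_properties = """
# sample content for illustration
com.cloudera.cmf.ACTIVITYMONITOR.db.type=postgresql
com.cloudera.cmf.ACTIVITYMONITOR.db.host=cm-host.example.com:7432
com.cloudera.cmf.ACTIVITYMONITOR.db.name=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.user=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.password=example-password
"""

db_credentials = parse_db_properties(sample_properties)
print(db_credentials["ACTIVITYMONITOR"]["user"])
```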

Python - Deploy and start the Cloudera Manager Services
09:10

Python - Verify the Cloudera Manager Services
02:17
Python - Automate Deployment of the Hadoop Cluster
7 Lectures 01:20:54
Python - Create Hadoop Cluster Container and deploy parcels
11:17
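
Parcel deployment is a staged process (download, distribute, activate), so automation usually has to poll until a stage is reached. The helper below is a minimal sketch of that polling loop. The stage names are assumed to match what Cloudera Manager reports, and `fetch_stage` is a hypothetical callable that you would back with a real API call.

```python
import time

# Assumed ordering of parcel stages, from not-yet-downloaded to active.
PARCEL_STAGES = [
    "AVAILABLE_REMOTELY",
    "DOWNLOADING",
    "DOWNLOADED",
    "DISTRIBUTING",
    "DISTRIBUTED",
    "ACTIVATING",
    "ACTIVATED",
]


def wait_for_stage(fetch_stage, target, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_stage() until the parcel reaches (or passes) the target stage."""
    target_index = PARCEL_STAGES.index(target)
    for _ in range(max_polls):
        stage = fetch_stage()
        if stage in PARCEL_STAGES and PARCEL_STAGES.index(stage) >= target_index:
            return stage
        sleep(poll_seconds)
    raise RuntimeError("parcel did not reach stage {!r} in time".format(target))


# Simulated stage sequence standing in for repeated API calls.
simulated = iter(["DOWNLOADING", "DOWNLOADING", "DOWNLOADED"])
reached = wait_for_stage(lambda: next(simulated), "DOWNLOADED", sleep=lambda seconds: None)
print(reached)
```

The `sleep` parameter exists only so the loop can be exercised without real delays, as in the simulated run above.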

Complete Cloudera Parcel setup - Deploying the Packaged Hadoop Binaries
08:58

Creating directories in advance
00:34

Python - Configure and Install a Multinode ZooKeeper Cluster.
15:10
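
In a multinode ZooKeeper ensemble every server needs its own unique id, and an odd number of servers is the usual recommendation so that a majority can always be formed. The sketch below plans the server roles for a list of hosts along those lines. The host names, role naming scheme and dictionary layout are hypothetical.

```python
def plan_zookeeper_servers(hosts):
    """Assign a unique server id to each host, insisting on an odd ensemble size."""
    if len(hosts) % 2 == 0:
        raise ValueError("use an odd, non-zero number of ZooKeeper hosts")
    return [
        {
            "host": host,
            "role_name": "zookeeper-server-{}".format(server_id),  # hypothetical naming
            "server_id": server_id,
        }
        for server_id, host in enumerate(hosts, start=1)
    ]


# Placeholder host names for a three-node ensemble.
zk_plan = plan_zookeeper_servers(["node1.example.com", "node2.example.com", "node3.example.com"])
for entry in zk_plan:
    print(entry["server_id"], entry["host"])
```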

Python - Configure and Install a Multinode HDFS Cluster.
19:04

Python - Configure and Install a Multinode YARN Cluster.
19:36

Python - Configure and Install a Multinode SPARK Cluster.
06:15
Conclusion
1 Lecture 08:29
Conclusion
08:29
About the Instructor
Toyin Akin
3.9 Average rating
135 Reviews
1,369 Students
15 Courses
Big Data Engineer, Capital Markets FinTech Developer

I spent 6 years at "Royal Bank of Scotland" and 5 years at the investment bank "BNP Paribas" developing and managing Interest Rate Derivatives services as well as engineering and deploying In Memory DataBases (Oracle Coherence), NoSQL and Hadoop clusters (Cloudera) into production.

In 2016, I left to start my own training company, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (In Memory Database), NoSQL, BigData and DevOps technology.

From Q3 2017, this will also include FinTech Training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).

I have a YouTube Channel, publishing snippets of my videos. These are not courses. Simply ad-hoc videos discussing various distributed computing ideas.

Check out my website and/or YouTube for more info

See you inside ...