Practical Guide to Setting Up a Hadoop and Spark Cluster Using CDH
What you'll learn
- Learn Hadoop and Spark Administration using CDH
- Provision a cluster on GCP (Google Cloud Platform) to set up Hadoop and Spark using CDH
- Set up Ansible for server automation and use it to install the cluster prerequisites
- Set up an 8-node cluster from scratch using CDH
- Understand the architecture of HDFS, YARN, Spark, Hive, Hue, and more
Requirements
- Basic Linux skills
- A 64-bit computer with a minimum of 4 GB RAM
- Operating system: Windows 10, macOS, or a Linux flavor
Cloudera is one of the leading vendors of Hadoop and Spark distributions. In this practical guide, you will learn the step-by-step process of setting up a Hadoop and Spark cluster using CDH.
Install - Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.
- Set up a local CDH repository
- Perform OS-level configuration for Hadoop installation
- Install Cloudera Manager server and agents
- Install CDH using Cloudera Manager
- Add a new node to an existing cluster
- Add a service using Cloudera Manager
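As a rough sketch of the local-repository objective: after mirroring the Cloudera Manager RPMs onto an internal web server (the host name `repohost.example.com` and path below are placeholders), each cluster node would point at the mirror with a yum repo file along these lines:

```ini
# /etc/yum.repos.d/cloudera-manager.repo (hypothetical local mirror)
[cloudera-manager]
name=Cloudera Manager (local mirror)
baseurl=http://repohost.example.com/cloudera-repos/cm5/
enabled=1
gpgcheck=0
```

With this in place, `yum install cloudera-manager-server` resolves against the local mirror rather than Cloudera's public archive, which matters for air-gapped clusters.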
Configure - Perform basic and advanced configuration needed to effectively administer a Hadoop cluster.
- Configure a service using Cloudera Manager
- Create an HDFS user's home directory
- Configure NameNode HA
- Configure ResourceManager HA
- Configure a proxy for HiveServer2/Impala
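For the proxy objective, one common approach (not the only one) is HAProxy in TCP mode in front of multiple HiveServer2 instances. A minimal sketch, with hypothetical host names:

```haproxy
# Hypothetical /etc/haproxy/haproxy.cfg fragment for HiveServer2
listen hiveserver2
    bind 0.0.0.0:10000
    mode tcp
    balance source          # keep a given client on the same backend
    server hs2_1 hs2node1.example.com:10000 check
    server hs2_2 hs2node2.example.com:10000 check
```

Clients then connect to the proxy host on port 10000, and the proxy spreads sessions across the HiveServer2 instances; an analogous block works for the Impala daemons.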
Manage - Maintain and modify the cluster to support day-to-day operations in the enterprise.
- Rebalance the cluster
- Set up alerting for excessive disk fill
- Define and install a rack topology script
- Install a new type of I/O compression library in the cluster
- Revise YARN resource assignment based on user feedback
- Commission/decommission a node
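To make the rack topology item concrete, here is a minimal sketch of a topology script; the subnet-to-rack mapping is an assumption and would be adapted to your network. Hadoop (via `net.topology.script.file.name`) invokes the script with one or more host names or IPs and reads one rack path per line from stdout:

```shell
#!/bin/sh
# Hypothetical rack topology script: Hadoop passes host names/IPs as
# arguments and expects one rack path per line on stdout.

resolve_rack() {
  case "$1" in
    10.0.1.*) echo "/rack1" ;;          # first rack's subnet (assumed)
    10.0.2.*) echo "/rack2" ;;          # second rack's subnet (assumed)
    *)        echo "/default-rack" ;;   # fallback for unknown hosts
  esac
}

for host in "$@"; do
  resolve_rack "$host"
done
```

In Cloudera Manager you can also assign racks per host in the UI; the script approach is the generic Hadoop mechanism.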
Secure - Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices.
- Configure HDFS ACLs
- Install and configure Sentry
- Configure Hue user authorization and authentication
- Enable/configure log and query redaction
- Create encrypted zones in HDFS
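The HDFS ACL and encryption-zone items translate into commands like the following; user, path, and key names are placeholders, and the commands are run against a live cluster (encryption zones additionally require a KMS):

```shell
# Grant a named user read/execute on a directory via an ACL
hdfs dfs -setfacl -m user:alice:r-x /data/reports
hdfs dfs -getfacl /data/reports

# Create an encryption zone on an empty directory, backed by a new key
hadoop key create reports-key
hdfs crypto -createZone -keyName reports-key -path /secure/reports
hdfs crypto -listZones
```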
Test - Benchmark the cluster's operational metrics and test system configuration for operation and efficiency.
- Execute file system commands via HttpFS
- Efficiently copy data within a cluster/between clusters
- Create/restore a snapshot of an HDFS directory
- Get/set ACLs for a file or directory structure
- Benchmark the cluster (I/O, CPU, network)
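The testing objectives above map to commands along these lines; host names and paths are placeholders, and the exact test-jar location and flags vary by CDH version:

```shell
# List a directory through HttpFS (WebHDFS REST API, default port 14000)
curl "http://httpfs01.example.com:14000/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice"

# Copy data between clusters with DistCp
hadoop distcp hdfs://nn1.example.com:8020/data hdfs://nn2.example.com:8020/data

# Snapshot an HDFS directory, then restore a file from the snapshot
hdfs dfsadmin -allowSnapshot /data
hdfs dfs -createSnapshot /data before-change
hdfs dfs -cp /data/.snapshot/before-change/file1 /data/file1

# Benchmark HDFS I/O with TestDFSIO (jar path is an assumption)
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test*.jar TestDFSIO \
  -write -nrFiles 10 -fileSize 1000
```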
Troubleshoot - Demonstrate the ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios.
- Resolve errors/warnings in Cloudera Manager
- Resolve performance problems/errors in cluster operation
- Determine the reason for application failure
- Configure the Fair Scheduler to resolve application delays
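For the Fair Scheduler item, application delays are often addressed through the allocation file (fair-scheduler.xml, managed in CDH through Cloudera Manager's Dynamic Resource Pools). A hypothetical sketch, where the queue names and numbers are examples only, giving production jobs twice the share of ad-hoc jobs:

```xml
<?xml version="1.0"?>
<!-- Hypothetical fair-scheduler.xml: queue names and values are examples -->
<allocations>
  <queue name="production">
    <weight>2.0</weight>
    <minResources>20480 mb, 10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>
```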
You will start by creating a Cloudera QuickStart VM (provided you have a laptop with 16 GB RAM and a quad-core processor). This will help you get comfortable with Cloudera Manager.
You can sign up for GCP and receive up to $300 in credits while the offer lasts. Credits are valid for up to a year.
You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach an additional disk to each node, to be configured for HDFS later.
Once the servers are provisioned, you will set up Ansible for server automation.
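As an illustration of what that automation might look like (the inventory group name and tasks below are assumptions, not the course's exact playbooks), a small Ansible playbook covering two common Hadoop prerequisites:

```yaml
# Hypothetical playbook: OS prerequisites on all cluster nodes
- hosts: cdh_cluster          # inventory group name is an assumption
  become: yes
  tasks:
    - name: Lower swappiness, as recommended for Hadoop nodes
      sysctl:
        name: vm.swappiness
        value: "1"
        state: present

    - name: Install NTP to keep cluster clocks in sync
      package:
        name: ntp
        state: present
```

Running the same playbook against all 7 to 8 nodes is what makes an 8-node install tractable for one person.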
You will set up a local repository for Cloudera Manager and the Cloudera Distribution of Hadoop (CDH) using packages.
You will then set up Cloudera Manager with a custom database, and then install CDH using the wizard that comes as part of Cloudera Manager.
As part of setting up the Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN high availability, understand schedulers, set up Spark, transition to parcels, set up Hive and Impala, and set up HBase and Kafka.
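The HDFS-commands portion boils down to practicing the `hdfs dfs` CLI; a few representative commands (user and path names are placeholders), including creating a user's home directory as the hdfs superuser:

```shell
# Create a home directory for a new user (run as the hdfs superuser)
sudo -u hdfs hdfs dfs -mkdir -p /user/alice
sudo -u hdfs hdfs dfs -chown alice:alice /user/alice

# Everyday file operations
hdfs dfs -put data.csv /user/alice/
hdfs dfs -ls /user/alice
hdfs dfs -cat /user/alice/data.csv
```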
Who this course is for:
- System administrators who want to understand the Big Data ecosystem and set up clusters
- Experienced Big Data administrators who want to learn how to manage Hadoop and Spark clusters set up using CDH
- Entry-level professionals who want to learn the basics and set up Big Data clusters
Your instructor has 20+ years of experience executing complex projects using a vast array of technologies, including Big Data and the Cloud.
ITVersity, Inc. is a US-based organization that provides quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.
Helping people build IT careers by providing the required tools, such as high-quality material, labs, and live support, to upskill and cross-skill is paramount for our organization.
At this time our training offerings are focused on the following areas:
* Application Development using Python and SQL
* Big Data and Business Intelligence
* Data Warehousing and Databases