Setup lab for Hadoop and Spark using cloud
This course sets up a lab for practicing Hadoop and Spark development skills.
4.4 (4 ratings)
60 students enrolled
Last updated 7/2016
English
30-Day Money-Back Guarantee
Includes:
  • 2 hours on-demand video
  • 1 article
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • By the end of the course, students will be able to set up a single-node Cloudera lab on a public cloud such as AWS to practice developing applications on Hadoop and Spark.
Requirements
  • There are no prerequisites for this training. Any IT professional can take it.
Description

Following are the highlights of the course:

  • Understand the setup process for the Cloudera Distribution of Hadoop on the cloud
  • Useful for anyone looking into topics such as Hadoop, Spark, Hive, Pig, or lab setup
  • Videos as well as code snippets are provided
  • The course takes approximately 3 hours to complete
  • The course is structured around leveraging the cloud to learn Hadoop
  • Provides a hands-on lab for practicing Hadoop and Spark

Learning Big Data in a proper environment is very important. Setting up a lab using the Cloudera QuickStart VM or the Hortonworks Sandbox on a PC requires a well-configured machine, and not everyone can afford a brand new laptop with 16 GB of RAM. The alternative to using a PC is to set up the lab in the cloud. As part of this course, I will provide step-by-step instructions on how to set up a reliable Big Data lab in the cloud. I am planning to come up with many more Big Data courses in the future, and this lab will be the foundation for them.

Who is the target audience?
  • Anyone who wants hands-on practice developing applications using technologies such as Hadoop, Spark, Hive, Pig, Sqoop, Oozie, etc.
Curriculum For This Course
12 Lectures
02:07:32
Setup lab for Hadoop and Spark using cloud
5 Lectures 38:02

Introduction

This course covers how to set up a single-node lab in the cloud (e.g. AWS).

  • Sign up for the cloud service
  • Provision an EC2 instance from AWS
  • Set up the MySQL database
  • Set up OS-level prerequisites for Hadoop
  • Install Cloudera Manager
  • Install the Cloudera Distribution of Hadoop
  • Validate HDFS and YARN+MR2
  • Validate Hive, Pig, Sqoop, etc.
  • Set up the retail_db database (for Sqoop)
  • Set up gen_logs (for streaming)
03:25

Here are the steps that need to be followed (a scripted alternative is sketched after the list):

  • Sign up for an AWS account
  • Choose the N. Virginia region in the top right corner
  • Click on Launch Instance
  • Go to community AMIs and search for ami-8997afe0
  • Choose the m3.xlarge instance type
  • Create a VPC
  • Add storage - 50 GB
  • Launch the instance and create a keypair
  • Make sure to download the pem file and change its permissions to read-only on Linux or Mac OS
  • For Windows, the pem file needs to be imported into PuTTY (converted to ppk format using PuTTYgen)
  • Make sure the server is up and running
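For those who prefer scripting the provisioning, here is a minimal AWS CLI sketch of the same steps. It assumes the AWS CLI is installed and configured; my-keypair, the root device name, and other placeholders are assumptions, not values from the course:

# Launch an m3.xlarge instance from the community AMI in N. Virginia (us-east-1)
aws ec2 run-instances \
  --region us-east-1 \
  --image-id ami-8997afe0 \
  --instance-type m3.xlarge \
  --key-name my-keypair \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50}}]'

# Restrict the downloaded pem file to read-only (Linux or Mac OS)
chmod 400 my-keypair.pem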
Provision EC2 instance from AWS
18:00

For this AMI, even though we provision 50 GB, the default root file system is allocated only 8 GB. It can be increased by running the "resize2fs /dev/xvde" command, as sketched below.
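A minimal sketch of the resize, assuming the root device is /dev/xvde as stated above (verify the device name with df if unsure):

df -h /                    # root file system shows only ~8 GB before the resize
sudo resize2fs /dev/xvde   # grow the file system to fill the 50 GB volume
df -h /                    # root file system should now show ~50 GB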

Resize root file system
03:34

Following are the steps to install MySQL and set up the MySQL databases (a non-interactive version follows the list):

  • yum -y install mysql-server
  • service mysqld start
  • chkconfig mysqld on
  • mysql -u root
  • The mysql CLI will be launched; create the rman, hive and oozie databases along with their users:
  • create database rman;
  • create user rman identified by 'rman';
  • grant all on rman.* to rman;
  • create database hive;
  • create user hive identified by 'hive';
  • grant all on hive.* to hive;
  • create database oozie;
  • create user oozie identified by 'oozie';
  • grant all on oozie.* to oozie;
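The same setup can be run non-interactively with a heredoc. This is a sketch assuming a fresh install where the MySQL root account has no password yet:

sudo service mysqld start
mysql -u root <<'EOF'
create database rman;
create user rman identified by 'rman';
grant all on rman.* to rman;
create database hive;
create user hive identified by 'hive';
grant all on hive.* to hive;
create database oozie;
create user oozie identified by 'oozie';
grant all on oozie.* to oozie;
EOF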
Setup MySQL Database
05:23

Copy the code below to a prepareNode.sh file on the cloud instance

chmod +x prepareNode.sh

Run ./prepareNode.sh

Here is the code to be copied and run:

echo "****************************"
echo "Starting Prepare Host"
echo "****************************"

#set umask
echo -e "\nSetting Umask to 022 in .bashrc"
umask 022
echo "umask 022" >> ~/.bashrc

#disable SELinux
echo -e "\nDisabling SELinux"
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

#Turn on NTPD
echo "Setting up NTPD and syncing time"
#Need to add a check to see if NTPD is installed.  If not install it
sudo yum -y install ntp
sudo chkconfig ntpd on
sudo ntpd -q
sudo service ntpd start

# Turn off autostart of iptables and ip6tables
echo -e "\nChecking ipTables and ip6table are off"
sudo service iptables stop
sudo chkconfig iptables off
sudo service ip6tables stop
sudo chkconfig ip6tables off

#Set Swappiness
echo -e "\nSetting Swappiness to 0"
echo 0 | sudo tee /proc/sys/vm/swappiness
echo vm.swappiness = 0 | sudo tee -a /etc/sysctl.conf

#Turn on NSCD
#echo -e "\nTurning on NSCD"
#chkconfig --level 345 nscd on
#nscd -g

#Set File Handle Limits
echo -e "\nSetting File Handle Limits"
sudo -- sh -c 'echo hdfs - nofile 32768 >> /etc/security/limits.conf'
sudo -- sh -c 'echo mapred - nofile 32768 >> /etc/security/limits.conf'
sudo -- sh -c 'echo hbase - nofile 32768 >> /etc/security/limits.conf'
sudo -- sh -c 'echo hdfs - nproc 32768 >> /etc/security/limits.conf'
sudo -- sh -c 'echo mapred - nproc 32768 >> /etc/security/limits.conf'
sudo -- sh -c 'echo hbase - nproc 32768 >> /etc/security/limits.conf'

echo -e "\n****************************"
echo "Prepare Nodes COMPLETE!"
echo "****************************"


Setup Pre-requisites for Hadoop
07:40
Setup Cloudera Distribution of Hadoop
2 Lectures 28:29

Following are the steps to download and install Cloudera Manager (a note on opening port 7180 follows the list):

  • Login to the host using ssh command or putty
  • wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
  • chmod +x cloudera-manager-installer.bin
  • ./cloudera-manager-installer.bin
  • Once installed, validate by opening the instance's public DNS on port 7180, e.g. http://ec2-xx-xx-xxx-xx.compute-1.amazonaws.com:7180, in the browser
  • Login using admin/admin for username/password
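Note that port 7180 must be open in the instance's security group for the browser validation to work. A hedged AWS CLI sketch, where sg-xxxxxxxx is a placeholder for your security group ID:

# Allow inbound traffic to the Cloudera Manager web UI (port 7180)
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 7180 \
  --cidr 0.0.0.0/0   # fine for a throwaway lab; restrict to your own IP otherwise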
Install Cloudera Manager
08:23

Following are the steps to set up the Cloudera Distribution of Hadoop:

  • Launch browser
  • Enter http://ec2-xx-xx-xxx-xx.compute-1.amazonaws.com:7180 into address bar and hit go
  • Login using admin/admin for username/password
  • Accept enterprise trial for 60 days
  • Follow video to set up the cluster
  • Make sure you change the database connectivity to the MySQL database and enter the correct usernames and passwords used while setting up the rman, hive and oozie databases earlier (a quick credential check follows this list).
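Before filling in the wizard, it is worth confirming from the shell that the credentials created earlier actually work. A quick check (passwords match the usernames, as set up in the MySQL lecture):

mysql -u rman -prman -e 'show databases;'     # should list the rman database
mysql -u hive -phive -e 'show databases;'     # should list the hive database
mysql -u oozie -poozie -e 'show databases;'   # should list the oozie database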


Install Cloudera Distribution of Hadoop
20:06
Validate cluster setup
2 Lectures 41:04
Validate cluster setup - HDFS and YARN+MR2
21:51

Validate cluster setup - Hive, Pig, Sqoop and Spark
19:13
Setup additional tools
3 Lectures 19:57

Following are the steps to set up retail_db (verification queries follow the list):

  • yum -y install git
  • git clone https://github.com/dgadiraju/code.git (to download the code repository containing the SQL script)
  • cp code/hadoop/edw/database/retail_db.sql .
  • rm -rf code (remove the code repository)
  • Login to MySQL using "mysql -u root -p"
  • create database retail_db;
  • create user retail_dba identified by 'itversity';
  • grant all on retail_db.* to retail_dba;
  • exit
  • mysql -u retail_dba -p
  • use retail_db;
  • source retail_db.sql
  • Run "select * from departments;" to validate that the tables are successfully created and the data is loaded as expected
  • Run "show tables;"; it should return the departments, categories, products, order_items, orders and customers tables.


Setup environment for practice - Add retail_db
10:01

Here are the steps to set up gen_logs (a sample session follows the list):

  • yum -y install git
  • git clone https://github.com/dgadiraju/code.git (to download the code repository containing the scripts)
  • cp -rf code/hadoop/edw/scripts/gen_logs /opt
  • rm -rf code (remove the code repository)
  • ln -s /opt/gen_logs/start_logs.sh /usr/bin/start_logs.sh
  • ln -s /opt/gen_logs/stop_logs.sh /usr/bin/stop_logs.sh
  • ln -s /opt/gen_logs/tail_logs.sh /usr/bin/tail_logs.sh
  • start_logs.sh (to start generating web logs)
  • tail_logs.sh (to preview logs while they are being generated; hit Ctrl-C to exit)
  • stop_logs.sh (to stop generating web logs)
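A typical session with these scripts looks like the following; the ps check is just a generic way to confirm the generator is running, not something the course prescribes:

start_logs.sh            # begin generating web logs in the background
ps -ef | grep gen_logs   # confirm the log generator process is up
tail_logs.sh             # watch the logs as they are generated; hit Ctrl-C to exit
stop_logs.sh             # stop generating web logs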
Setup environment for practice - Add gen_logs to generate streaming logs
09:41

By the time you finish this course with hands-on practice, you will have a single-node lab in the cloud to practice Hadoop and Spark, and also to prepare for most of the Hadoop and Spark developer certifications.

Conclusion
00:15
About the Instructor
Durga Viswanatha Raju Gadiraju
4.5 Average rating
149 Reviews
8,026 Students
4 Courses
Technology Adviser and Evangelist

13+ years of experience in executing complex projects using a vast array of technologies including Big Data and Cloud.

I founded itversity, llc - a US-based startup providing quality training for IT professionals, as well as staffing and consulting solutions for enterprise clients. I have trained thousands of IT professionals in a vast array of technologies including Big Data and Cloud.

Building IT careers for people and providing quality services to clients are paramount to our organization.

As an entry strategy, itversity will provide quality training in the areas of ABCD:

* Application Development
* Big Data and Business Intelligence
* Cloud
* Data warehousing, Databases