Setup Big Data Development Environment
Setup Big Data Development Environment for free on Mac or Windows
4.5 (85 ratings)
6,563 students enrolled
Last updated 8/2016
English
Price: Free
Includes:
  • 5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand how to set up a development environment to learn Big Data technologies.
Requirements
  • Students need a modern laptop with a 64-bit OS and at least 16 GB of RAM
Description

Big Data tooling is open source, and there are many technologies one needs to learn to be proficient with ecosystem tools such as Hadoop, Spark, Hive, Pig, and Sqoop. This course covers how to set up a development environment on a personal computer or laptop using distributions such as Cloudera or Hortonworks. Both Cloudera and Hortonworks provide virtual machine images that come with all the Big Data ecosystem tools packaged. This free course provides:

  • A comparison of virtualization software such as VirtualBox and VMware
  • Step-by-step instructions to set up virtualization software (VirtualBox or VMware)
  • Guidance on choosing the Cloudera or Hortonworks image
  • Step-by-step instructions to set up a VM using the chosen image
  • Setup of additional components such as a MySQL database and a log generation tool
  • A review of HDFS, MapReduce, Sqoop, Pig, Hive, Spark, etc.
Who is the target audience?
  • Anyone who wants to learn multiple technologies in the Big Data ecosystem; basic programming skills are required.
Curriculum For This Course
37 Lectures
05:12:30
Introduction
6 Lectures 26:37
Getting Started
04:26

Overview of Big Data sandboxes or virtual machine images
05:11

Pre-requisites
03:24

Choosing Virtualization Software (very important)
05:52

Installing VMWare Fusion on Mac
03:34

Installing Oracle VirtualBox on Mac
04:10
Cloudera Quickstart VM on VMWare Fusion
2 Lectures 22:24
Setup Cloudera Quickstart VM - VMWare image
10:16

Review retail_db and gen_logs in Cloudera Quickstart VM
12:08
Cloudera Quickstart VM on Virtual Box
3 Lectures 33:38
Download Cloudera Quickstart VM for Virtualbox
03:35

Setup Cloudera Quickstart VM for Virtualbox
17:55

Review retail_db and gen_logs in Cloudera Quickstart VM
12:08
Hortonworks Sandbox on VMWare Fusion
3 Lectures 28:52
Setup Hortonworks Sandbox on VMWare - Mac
12:32

Setup MySQL Database - retail_db
10:08

Setup gen_logs application to generate logs
06:12
Hortonworks Sandbox on Virtual Box
4 Lectures 33:36
Setup Hortonworks Sandbox on Virtual Box
12:33

Reset admin password
04:43

Setup MySQL Database - retail_db
10:08

Setup gen_logs application to generate logs
06:12
Setup IDE for Map Reduce
9 Lectures 01:06:22

As part of this topic, we will see how to set up and validate an IDE for developing MapReduce applications.

  • Prerequisites
  • Download Eclipse with the Maven plugin
  • Install Eclipse with the Maven plugin
  • Create a Java application as a Maven project
  • Run the default program of the simple Java application
Setup Eclipse with Maven Plugin - Introduction
02:11

Following are the installation steps to set up Eclipse with the Maven plugin:

  • Make sure Java 1.7 or later is installed (1.8 recommended)
  • Set up Eclipse
  • Set up Maven
  • STS (Spring Tool Suite) comes as Eclipse bundled with the Maven plugin
  • We recommend STS
  • If you already have Eclipse, just add the Maven plugin from the Eclipse Marketplace
Setup Eclipse with Maven Plugin
08:07

This class demonstrates how to create a simple Maven project with Eclipse.

  • Open Eclipse with the Maven plugin (STS)
  • The first time, create a new workspace named simpleapps
  • File -> New -> Maven Project
  • Give the artifact id and group id. Make sure you give the correct package name.
  • It will create a Maven project with App.java
  • Run the application and validate the output


Create java application using Maven Project
08:49

This is an introduction to developing a word count program with Hadoop MapReduce, using Java and Eclipse.

  • Create a new workspace and a new Maven project
  • Update the pom file with dependencies
  • Generate test data
  • Copy the existing map reduce job for word count
  • Go to "Run Configurations" and add parameters
  • Run the program and validate the results
Develop word count program introduction
02:17

Following are the steps:

  • Create a new workspace directory bigdata-mr for all MapReduce applications
  • Launch STS with the new workspace directory
  • Create a new Maven project
    • groupId: org.itversity
    • artifactId: mr
    • Name: demomr
  • Open pom.xml
  • In pom.xml, if the <name> tag shows something else, make sure to replace it with demomr
  • Also rename the project to demomr (from mr)
  • Define repositories in pom.xml (if necessary)
  • Define dependencies in pom.xml
  • Save and wait so that Maven downloads all the necessary packages
  • Make sure there are no failures
  • Develop the wordcount program
    • Create a package wordcount
    • Create a Java program WordCount in the package wordcount
Develop word count program
11:33

Following are the steps to run the word count program:

  • Make sure there are no errors
  • Generate test data as demonstrated
  • Pass the input path and output path as arguments
  • Run the program
  • Go to the output path
  • Validate the files in the output path
Run word count program
07:44

As part of this topic, we will see how to download and configure a sample GitHub project covering the MapReduce APIs.

  • Understand the resources available to learn the MapReduce APIs in detail
  • Download the sample GitHub project
  • Import the GitHub project as a Maven project
  • Make sure no errors are highlighted in Eclipse
  • Run and validate the GitHub project
Setup github project - Introduction
05:59

Following are the steps to download and set up the GitHub project:

  • Our project created earlier is named demomr; delete it from STS.
  • Go to GitHub and download the repository, or run the git clone command
  • Make sure the downloaded directory is in the right location
  • Open STS pointing to the correct workspace
  • Import it as a new project
  • Make sure there are no errors
Download and setup github project
12:36

Following are the steps to validate the GitHub project:

  • Make sure there are no errors
  • Run the word count program as demonstrated, using Eclipse
  • Go to the output directory and check whether files are created
  • Validate the output files created
Validate github project
07:06
Setup IDE for Scala and Spark
10 Lectures 01:41:01

Even though we have virtual machine images from Cloudera and Hortonworks with all the necessary tools installed, it is a good idea to set up a development environment on our PC along with an IDE. Building Scala-based Spark applications requires the following to be installed:

  • Java
  • Scala
  • sbt
  • Eclipse with Scala IDE

Here is the development life cycle:

  • Develop using an IDE (e.g., Eclipse with Scala IDE)
  • Use code versioning tools such as SVN or GitHub for team development
  • Use sbt to build the jar file
  • Ship the jar file (the application) to the remote servers
  • Schedule it through a scheduler

As part of this topic:

  • We will see how Scala and sbt are installed on the PC
  • We will treat the virtual machines as test or production servers
  • We will not be covering code versioning tools

The following topics will be covered to set up Scala and sbt:

  • Make sure Java is installed
  • Download and install the Scala binaries
  • Launch the Scala CLI/interpreter and run a simple Scala program
  • Download and install sbt
  • Write a simple Scala application
  • Build it using sbt
  • Run the application using sbt
Setup scala and sbt - Introduction
04:28

Here are the instructions to set up Scala:

  • Download the Scala binaries
  • Install (untar) the Scala binaries
  • Update the PATH environment variable
  • Launch the Scala interpreter/CLI and run a simple Scala program
  • Copy the code snippet below and paste it into the Scala interpreter/CLI
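
A minimal snippet along these lines exercises the interpreter (the exact code from the lecture may differ; the names and values here are illustrative):

    // paste line by line at the scala> prompt
    val nums = (1 to 10).toList
    def square(i: Int): Int = i * i
    nums.map(square).sum  // evaluates to 385
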
Setup and Validate Scala
14:19

Following are the steps to create a simple Scala application:

  • Make sure you are in the right directory
  • Create src/main/scala: mkdir -p src/main/scala
  • Create the file hw.scala under src/main/scala
  • Paste the code shown below (see the sketch after this list), save and exit
  • Run it using: scala src/main/scala/hw.scala
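
A minimal hw.scala along these lines works with the scala script runner and, later, with sbt (the object name and message are illustrative, not necessarily the lecture's exact listing):

    // src/main/scala/hw.scala
    object hw {
      def main(args: Array[String]): Unit = {
        println("Hello World from Scala")
      }
    }
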
Run simple scala application
05:50

Here are the instructions to set up sbt:

  • Download sbt
  • Install sbt
  • Go to the directory where you have the Scala source code
  • Create build.sbt (see the sketch after this list)
  • Package and run using sbt
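
A minimal build.sbt for this hello-world project could look like the following (the project name and versions are assumptions, not necessarily the lecture's exact file):

    // build.sbt, placed at the project root (the directory containing src/)
    name := "hw"
    version := "1.0"
    scalaVersion := "2.11.8"

With this in place, sbt package builds a jar under target/, and sbt run executes the main class.
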
Setup sbt and run scala application
10:54

We are in the process of setting up a development environment on our PC/Mac so that we can develop the modules assigned. The following tasks are complete:

  • Make sure Java is installed
  • Setup Eclipse (as part of Setup Eclipse with Maven)
  • Setup Scala
  • Setup sbt
  • Validate all the components

As part of this topic, we will see:

  • Installation of Scala IDE for Eclipse
  • Development of a simple application using Scala IDE
  • Addition of the Eclipse plugin for sbt (sbteclipse)
  • Validation of the integration of Eclipse, Scala and sbt
Setup Scala IDE for Eclipse - Introduction
03:18

Before setting up Scala IDE, let us understand the advantages of having an IDE:

  • The Scala IDE for Eclipse project lets you edit Scala code in Eclipse
  • Syntax highlighting
  • Code completion
  • Debugging, and many other features
  • It makes Scala development in Eclipse a pleasure

Steps to install Scala IDE for Eclipse

  • Launch Eclipse
  • Go to "Help" in the top menu -> "Eclipse Marketplace"
  • Search for Scala IDE
  • Click Install
  • Once installed, restart Eclipse
  • Go to File -> New -> and check whether "Scala Application" is available
Install Scala IDE for Eclipse
11:00

Now we will see how sbt can be integrated with Scala IDE for Eclipse.

Steps to integrate sbt with Scala IDE for Eclipse:

  • Check whether ~/.sbt/<version>/plugins/plugins.sbt exists; if not:
  • Create the plugins directory under ~/.sbt/<version>: mkdir -p ~/.sbt/0.13/plugins
  • cd ~/.sbt/0.13/plugins
  • Add addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") to plugins.sbt
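
The resulting global plugins file is just that one line:

    // ~/.sbt/0.13/plugins/plugins.sbt
    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")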

Advantages of integrating sbt with Eclipse:

  • Ability to generate Eclipse projects
  • Ability to build a deployable jar file
  • Works well with Scala

There are two ways to create a jar file with sbt and Scala IDE for Eclipse:

  • First create the project in Eclipse, then integrate with sbt
    • Go to the project directory
    • Create build.sbt and add name, version and scalaVersion
    • Run sbt package
    • Now the jar file will be created
  • Use sbt to generate the Eclipse project
    • Go to the working directory and create a directory for the project
    • Run mkdir -p src/main/scala
    • Run sbt eclipse
    • Now the project layout is ready
    • Use Scala IDE to import the project
    • Create the code using Scala IDE
    • Test it and build it using sbt package
    • Now the jar file will be created
Integrate sbt with Scala IDE for Eclipse
17:56

As part of this topic, we will see how to:

  • Create an sbt project for Spark using Scala
  • Integrate it with Eclipse
  • Develop a simple Spark application using Scala (see the sketch after the list below)
  • Run it on the cluster

To perform this, we need:

  • Hadoop cluster or Virtual Machine
  • Scala IDE for Eclipse
  • Integration of sbt with Scala IDE (sbteclipse plugin)
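
For reference, a simple Spark application of the kind developed here could look like the following word count, a sketch against the Spark 1.6 core API (the object name and argument handling are illustrative, not the course's exact code):

    // src/main/scala/WordCount.scala
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // input path is args(0), output path is args(1)
        val conf = new SparkConf().setAppName("Word Count")
        val sc = new SparkContext(conf)
        sc.textFile(args(0))
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile(args(1))
        sc.stop()
      }
    }
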
Develop Spark applications using Scala IDE - Introduction
01:37

Here we will see how to set up a Spark Scala project using sbt:

  • Create a working directory for the new project
  • Under the working directory, create the src/main/scala directory structure
  • Create build.sbt with name, version and scalaVersion
  • Also update build.sbt with the libraryDependencies for Spark:
    • libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
  • The complete build.sbt for Spark 1.6.2 and Scala 2.11 is provided below
  • Run sbt eclipse
  • Run sbt package
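
A complete build.sbt for Spark 1.6.2 and Scala 2.11, reconstructed from the steps above (the project name and Scala patch version are assumptions):

    // build.sbt
    name := "sparkdemo"
    version := "1.0"
    scalaVersion := "2.11.8"
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
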
Develop Spark applications using Scala IDE and sbt
15:16

Now that the program is successfully developed, we will see how to run it on the cluster:

  • Build the jar file using sbt: sbt package
  • Make sure you have the environment ready, with a VM or a cluster
  • If not, set up the environment on a PC or on AWS first
  • scp the jar file to the target environment (the VM, in this case)
  • Run it on the remote cluster using spark-submit
Run Spark applications on cluster
16:23
About the Instructor
Durga Viswanatha Raju Gadiraju
4.5 Average rating
147 Reviews
7,915 Students
4 Courses
Technology Adviser and Evangelist

13+ years of experience in executing complex projects using a vast array of technologies, including Big Data and Cloud.

I founded itversity, LLC, a US-based startup, to provide quality training for IT professionals as well as staffing and consulting solutions for enterprise clients. I have trained thousands of IT professionals in a vast array of technologies, including Big Data and Cloud.

Building IT careers for people and providing quality services to clients are paramount to our organization.

As an entry strategy, itversity will be providing quality training in the areas of ABCD:

* Application Development
* Big Data and Business Intelligence
* Cloud
* Data Warehousing and Databases