Setup Big Data Development Environment

Setup Big Data Development Environment for free on Mac or Windows
4.1 (265 ratings)
18,052 students enrolled
Last updated 3/2018
English
English [Auto-generated]
Price: Free
This course includes
  • 6.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Understand how to set up a development environment to learn Big Data technologies.
Requirements
  • Students need a modern laptop with a 64-bit OS and at least 16 GB of RAM
Description

Big Data technologies are open source, and there are many tools one needs to learn to be proficient in the Big Data ecosystem, such as Hadoop, Spark, Hive, Pig, and Sqoop. This course covers how to set up a development environment on a personal computer or laptop using distributions such as Cloudera or Hortonworks. Both Cloudera and Hortonworks provide virtual machine images that come with all the Big Data ecosystem tools packaged. This free course provides:

  • Comparison of virtualization software such as VirtualBox and VMWare
  • Step-by-step instructions to set up virtualization software such as VirtualBox or VMWare
  • Choosing a Cloudera or Hortonworks image
  • Step-by-step instructions to set up a VM using the chosen image
  • Setup of additional components such as the MySQL database and a log generation tool
  • Review of HDFS, MapReduce, Sqoop, Pig, Hive, Spark, etc.
Who this course is for:
  • Anyone who wants to learn multiple technologies in the Big Data ecosystem. Basic programming skills are required.
Course content
51 lectures 06:19:04
+ Introduction
6 lectures 26:37
Getting Started
04:26
Overview of Big Data sandboxes or virtual machine images
05:11
Pre-requisites
03:24
Choosing Virtualization Software (very important)
05:52
Installing VMWare Fusion on Mac
03:34
Installing Oracle VirtualBox on Mac
04:10
+ Cloudera Quickstart VM on VMWare Fusion
2 lectures 22:24
Setup Cloudera Quickstart VM - VMWare image
10:16
Review retail_db and gen_logs in Cloudera Quickstart VM
12:08
+ Cloudera Quickstart VM on Virtual Box
3 lectures 33:38
Download Cloudera Quickstart VM for Virtualbox
03:35
Setup Cloudera Quickstart VM for Virtualbox
17:55
Review retail_db and gen_logs in Cloudera Quickstart VM
12:08
+ Hortonworks Sandbox on VMWare Fusion
3 lectures 28:52
Setup Hortonworks Sandbox on VMWare - Mac
12:32
Setup MySQL Database - retail_db
10:08
Setup gen_logs application to generate logs
06:12
+ Hortonworks Sandbox on Virtual Box
4 lectures 33:36
Setup Hortonworks Sandbox on Virtual Box
12:33
Reset admin password
04:43
Setup MySQL Database - retail_db
10:08
Setup gen_logs application to generate logs
06:12
+ Setup Eclipse IDE for Map Reduce
9 lectures 01:06:22

As part of this topic we will see how to set up and validate an IDE to develop MapReduce applications.

  • Pre-requisites
  • Download Eclipse with the Maven plugin
  • Install Eclipse with the Maven plugin
  • Create a Java application as a Maven project
  • Run the default program of the simple Java application
Setup Eclipse with Maven Plugin - Introduction
02:11

Following are the installation steps to set up Eclipse with the Maven plugin:

  • Make sure Java 1.7 or later is installed (1.8 recommended)
  • Setup Eclipse
  • Setup Maven
  • STS (Spring Tool Suite) comes as Eclipse with the Maven plugin bundled
  • We recommend STS
  • If you already have Eclipse, just add the Maven plugin from the marketplace
Setup Eclipse with Maven Plugin
08:07

This class will walk through creating a simple Maven project with Eclipse.

  • Open Eclipse with the Maven plugin (STS)
  • For the first time, create a new workspace simpleapps
  • File -> New -> Maven Project
  • Give the artifact id and group id. Make sure you give the correct package name.
  • It will create a Maven project with App.java
  • Run the application and validate


Create java application using Maven Project
08:49

This is an introduction to developing a word count program with Hadoop MapReduce in Java using Eclipse.

  • Create a new workspace and a new Maven project
  • Update the pom file with dependencies
  • Generate test data
  • Copy the existing MapReduce job for word count
  • Go to "Run Configurations" and add parameters
  • Run the program and validate the results
Develop word count program introduction
02:17

Following are the steps

  • Create new workspace directory bigdata-mr for all map reduce applications
  • Launch STS with new workspace directory
  • Create new maven project
    • groupId: org.itversity
    • artifactId: mr
    • Name: demomr
  • Open pom.xml
  • In pom.xml, if the <name> tag shows something else, make sure to replace it with demomr
  • Also rename the project to demomr (from mr)
  • Define repositories in pom.xml (if necessary)
  • Define dependencies in pom.xml (see below)
  • Save and wait, so that maven downloads all the necessary packages
  • Make sure there are no failures
  • Develop wordcount program
    • Create package wordcount
    • Create java program WordCount in package wordcount
    • See the code below
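The word count code referenced above is not included in this listing, and the course implements it in Java against the Hadoop MapReduce API. As an illustration of the same map and reduce logic only (no Hadoop involved, and the function name is ours), here is a plain Scala sketch:

```scala
// Sketch of WordCount's two phases in plain Scala (not the Hadoop API).
// map phase: emit a (word, 1) pair for every word in the input
// reduce phase: group the pairs by word and sum the counts
def wordCount(lines: Seq[String]): Map[String, Int] = {
  val mapped = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1))
  mapped.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }
}

println(wordCount(Seq("to be or not to be", "be quick")))
```

In the actual job, the map phase runs in the Mapper, the grouping and summing run in the Reducer, and the framework shuffles the pairs by key between the two.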
Develop word count program
11:33

Following are the steps to run word count program

  • Make sure there are no errors
  • Generate test data as demonstrated
  • Pass input path and output path as arguments
  • Run the program
  • Go to output path
  • Validate files in output path
Run word count program
07:44

As part of this topic we will see how to download and configure a sample GitHub project covering the MapReduce APIs.

  • Understand resources available to learn the MapReduce APIs in detail
  • Download the sample GitHub project
  • Import the GitHub project as a Maven project
  • Make sure there are no errors highlighted in Eclipse
  • Run and validate the GitHub project
Setup github project - Introduction
05:59

Following are the steps to download and set up the GitHub project:

  • The project created earlier is named demomr. Delete it from STS.
  • Go to GitHub and download the repository, or run the git clone command
  • Make sure the downloaded directory is in the right location
  • Open STS pointing to the correct workspace
  • Import it as a new project
  • Make sure there are no errors
Download and setup github project
12:36

Following are the steps to validate the GitHub project:

  • Make sure there are no errors
  • Run the word count program as demonstrated using Eclipse
  • Go to the output directory and check whether the files are created
  • Validate the output files created
Validate github project
07:06
+ Setup Eclipse IDE for Scala and Spark
10 lectures 01:41:01

Even though we have virtual machine images from Cloudera and Hortonworks with all the necessary tools installed, it is a good idea to set up a development environment on our PC along with an IDE. The following need to be installed to set up a development environment for building Scala-based Spark applications:

  • Java
  • Scala
  • sbt
  • Eclipse with Scala IDE

Here is the development life cycle:

  • Develop using an IDE (e.g., Eclipse with Scala IDE)
  • Use code versioning tools such as SVN or GitHub for team development
  • Use sbt to build the jar file
  • Ship the jar file (the application) to the remote servers
  • Schedule it through a scheduler

As part of this topic,

  • we will see how Scala and sbt are installed on the PC
  • we will treat the virtual machines as test or production servers
  • we will not be covering code versioning tools

Following topics will be covered to set up Scala and sbt:

  • Make sure Java is installed
  • Download and install the Scala binaries
  • Launch the Scala CLI/interpreter and run a simple Scala program
  • Download and install sbt
  • Write a simple Scala application
  • Build it using sbt
  • Run the application using sbt
Setup scala and sbt - Introduction
04:28

Here are the instructions to set up Scala:

  • Download the Scala binaries
  • Install (untar) the Scala binaries
  • Update the PATH environment variable
  • Launch the Scala interpreter/CLI and run a simple Scala program
  • Copy the code snippet below and paste it in the Scala interpreter/CLI
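The snippet to paste is not reproduced in this listing; any short expression will confirm the interpreter is working. For example (an assumed stand-in, not the course's exact snippet):

```scala
// Quick sanity check for the Scala interpreter/CLI
val nums = (1 to 10).toList
val total = nums.reduce(_ + _) // sums the numbers 1 through 10
println(s"Sum of 1 to 10 is $total") // prints 55
```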
Setup and Validate Scala
14:19

Following are the steps to create a simple Scala application:

  • Make sure you are in the right directory
  • Create src/main/scala: mkdir -p src/main/scala
  • Create the file hw.scala under src/main/scala
  • Paste the code, save, and exit
  • Run it using scala src/main/scala/hw.scala
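The contents of hw.scala are not shown in this listing; a minimal version that works with the scala script runner could look like this (the object name and message are assumptions):

```scala
// hw.scala — minimal "hello world", runnable with: scala src/main/scala/hw.scala
object hw {
  def message: String = "Hello World"
  def main(args: Array[String]): Unit = println(message)
}
hw.main(Array.empty) // the script runner executes this top-level call
```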
Run simple scala application
05:50

Here are the instructions to set up sbt:

  • Download sbt
  • Install sbt
  • Go to the directory where you have the Scala source code
  • Create build.sbt
  • Package and run using sbt
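A minimal build.sbt for this step might look like the following (the project name and versions are placeholders, not from the course):

```scala
// build.sbt — minimal build definition (sbt 0.13-era syntax)
name := "hw"
version := "1.0"
scalaVersion := "2.11.8"
```

With this in place, sbt package produces target/scala-2.11/hw_2.11-1.0.jar, and sbt run executes the main class.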
Setup sbt and run scala application
10:54

We are in the process of setting up a development environment on our PC/Mac so that we can develop the modules assigned. The following tasks are completed:

  • Make sure Java is installed
  • Setup Eclipse (as part of Setup Eclipse with Maven)
  • Setup Scala
  • Setup sbt
  • Validate all the components

As part of this topic we will see:

  • Installation of Scala IDE for Eclipse
  • Developing a simple application using Scala IDE
  • Adding the Eclipse plugin for sbt (sbteclipse)
  • Validating the integration of Eclipse, Scala, and sbt
Setup Scala IDE for Eclipse - Introduction
03:18

Before setting up Scala IDE, let us understand the advantages of having an IDE:

  • The Scala IDE for Eclipse project lets you edit Scala code in Eclipse.
  • Syntax highlighting
  • Code completion
  • Debugging, and many other features
  • It makes Scala development in Eclipse a pleasure.

Steps to install Scala IDE for Eclipse

  • Launch Eclipse
  • Go to "Help" in top menu -> "Eclipse Marketplace"
  • Search for Scala IDE
  • Click on Install
  • Once installed restart Eclipse
  • Go to File -> New and check whether "Scala Application" is available
Install Scala IDE for Eclipse
11:00

Now we will see how sbt can be integrated with Scala IDE for Eclipse.

Steps to integrate sbt with Scala IDE for Eclipse:

  • Check whether ~/.sbt/<version>/plugins/plugins.sbt exists; if it does not:
  • Create the plugins directory under ~/.sbt/<version>: mkdir -p ~/.sbt/0.13/plugins
  • cd ~/.sbt/0.13/plugins
  • Add addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") to plugins.sbt

Advantages of integrating sbt with Eclipse:

  • Ability to generate Eclipse projects
  • Build a deployable jar file
  • Works well with Scala

There are 2 ways to create a jar file with sbt and Scala IDE for Eclipse:

  • First create the project in Eclipse and then integrate it with sbt
    • Go to the project directory
    • Create build.sbt and add name, version, and scalaVersion
    • Run sbt package
    • Now the jar file will be created
  • Use sbt to generate the Eclipse project
    • Go to the working directory and create a directory for the project
    • Run mkdir -p src/main/scala
    • Run sbt eclipse
    • Now the project layout is ready
    • Use Scala IDE to import the project
    • Create code using Scala IDE
    • Test it and build it using sbt package
    • Now the jar file will be created
Integrate sbt with Scala IDE for Eclipse
17:56

As part of this topic we will see how to:

  • Create an sbt project for Spark using Scala
  • Integrate it with Eclipse
  • Develop a simple Spark application using Scala
  • Run it on the cluster

To perform this, we need

  • Hadoop cluster or Virtual Machine
  • Scala IDE for Eclipse
  • Integration of sbt with Scala IDE (sbteclipse plugin)
Develop Spark applications using Scala IDE - Introduction
01:37

Here we will see how to set up a Spark Scala project using sbt:

  • Create a working directory for the new project
  • Under the working directory, create the src/main/scala directory structure
  • Create build.sbt with name, version, and scalaVersion
  • Also update build.sbt with libraryDependencies for Spark
    • libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
  • The complete build.sbt for Spark 1.6.2 and Scala 2.11 is provided below
  • Run sbt eclipse
  • Run sbt package
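A complete build.sbt along those lines could look like this (the project name is a placeholder, not from the course):

```scala
// build.sbt — Spark 1.6.2 built against Scala 2.11
name := "sparkdemo"            // placeholder project name
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
```

Note that spark-core_2.11 pins the Scala binary version explicitly; the equivalent "org.apache.spark" %% "spark-core" % "1.6.2" lets sbt append the suffix from scalaVersion.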
Develop Spark applications using Scala IDE and sbt
15:16

As the program has been successfully developed, we will see how to run it on the cluster:

  • Build the jar file using sbt: sbt package
  • Make sure you have the environment ready with a VM or cluster
  • If not, follow this to set up the environment on a PC or AWS
  • scp the jar file to the target environment (the VM in this case)
  • Run it on the remote cluster
Run Spark applications on cluster
16:23
+ Setup Development Environment for Scala and Spark using IntelliJ
14 lectures 01:06:34
Introduction
03:06
Setup Java and JDK
05:16
Install Scala with IntelliJ IDE
06:53
Develop Hello World Program using Scala
09:07
Setup sbt and run application HelloWorld
04:18
Add spark dependencies to the application
04:31
Setting up winutils.exe on Windows (64 bit)
04:37
Setup Data Sets - retail_db
03:16
Develop first spark application - Get revenue for each order from order_items
07:46
Build Jar file using sbt
02:07
Download and install Spark using 7z on Windows
04:07
Configure environment variables for Spark on Windows
02:12
Running spark job using spark-shell
03:03
Validating spark job from jar file using spark-submit
06:15