Real World Vagrant - Build an Apache Spark Development Env!
4.1 (13 ratings)
290 students enrolled

With a single command, build an IDE, Scala and Spark (1.6.2 or 2.0.1) Development Environment! Run in under 3 minutes!!
Created by Toyin Akin
Last updated 1/2017
This course includes
  • 3 hours on-demand video
  • 2 articles
  • 5 downloadable resources
What you'll learn
  • Simply run a single command on your desktop, go for a coffee, and come back to a running distributed environment for cluster deployment
  • Ability to automate the installation of software across multiple Virtual Machines
Course content
15 lectures 03:13:03
+ Justification
2 lectures 14:01

Want to test or play around with Apache Spark? Don't contaminate your computer! Keep your PC or Mac clean by building and running a custom Spark development environment within a Virtual Machine.

Preview 14:01

Suggested Spark Udemy curriculum courses to follow ...

Preview 00:00
+ First Steps
4 lectures 48:10
Base Vagrant file

Here we make sure that we are all on the same page. We boot up a vanilla virtual machine. A good grasp of basic Vagrant commands is essential.

Quick Overview of your Vagrant Environment

Here I show you how to modify your Vagrantfile to switch to a graphical CentOS (RHEL) Linux Virtual Machine. We will navigate briefly within this Virtual Machine. This is the image we will be configuring. Amazingly, you get a graphical O/S in under 1.2GB!

Boot up a Vanilla CentOS Desktop on your own Desktop or Laptop

Here we tune the Development Virtual Machine by configuring the clock (synced with external servers), hostname, firewall and some low-level O/S settings.

Tune the Virtual Machine
+ Automation Steps to Build Your Spark Development Environment
6 lectures 01:26:57

Java has already been configured to be installed. Here we add Maven to the picture.

Automate the Installation of Maven. The Java Build Tool

Here, the Scala programming language will be installed. 

Automate the installation of Scala. It's like the Java that should have been!

At last! We install the Eclipse tool for Scala and show that the same development environment that you know and love on Windows/Mac is the same on CentOS. Oh yes, you can still code in Java!

Automate the Installation of ScalaIDE. Eclipse for Scala!

Scala has already been configured to be installed. Here we add sbt to the picture.

Automate the Installation of sbt. The Scala Build Tool

We go through downloading the Spark binaries and adding the automation and configuration of Spark within Vagrant.

Automate the Installation of Spark. You get to choose. 1.6.2 or 2.0.1

Here we add a working Spark example. This example uses a combination of tools: sbt, sbt-eclipse, ScalaIDE and Spark.

Automate the Installation of a Spark 1.6.2 Example.
+ Build our Virtual Machine Image
1 lecture 17:51

Here we make some cosmetic changes: we change the keyboard layout and create a link to the ScalaIDE, then generate our final development Virtual Machine. All the hard work above will be contained in a ~2.3GB Vagrant box file.

Build the Final Development Environment
+ Conclusion
2 lectures 26:03

Hard to believe... This new environment will now boot up in under 2.5 minutes (on my machine anyway!). We also execute the Spark example within the ScalaIDE to ensure everything works. You can now give this final box and Vagrantfile to your colleagues, and they can have a Spark environment up and running in under 2.5 minutes.

Preview 19:39
Requirements
  • Basic programming or scripting experience is required.
  • You will need a desktop PC and an Internet connection. The course is created with Windows in mind.
  • The software needed for this course is freely available
  • This course builds on my previous course, "Real World Vagrant For Distributed Computing"
  • You will need a computer whose chipset supports hardware virtualization (VT-x). Most computers purchased over the last five years should be good enough
  • Optional: some exposure to Linux and/or the Bash shell environment
  • 64-bit Windows operating system required (Windows 7 or above recommended)
  • This course is not recommended if you have no desire to work with/in distributed computing
Description

Note: This course is built on top of the "Real World Vagrant For Distributed Computing - Toyin Akin" course.

This course enables you to package a complete Spark development environment into your own custom 2.3GB Vagrant box.

Once built, you no longer need to manipulate your Windows machine in order to get a fully fledged Spark environment to work. With the final solution, you can boot up a complete Apache Spark environment in under 3 minutes!!
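
To give a feel for what booting from the packaged box could look like, here is a minimal Vagrantfile sketch. It is not the course's actual file: the box name `spark-dev`, the file name `spark-dev.box`, the memory/CPU figures, the `dev_vm_settings` helper and the choice of the VirtualBox provider are all assumptions made for illustration.

```ruby
# Vagrantfile (sketch): boot the packaged Spark development box.
# Assumed one-time setup on the host, using hypothetical file and box names:
#   vagrant package --output spark-dev.box        # run where the VM was built
#   vagrant box add --name spark-dev spark-dev.box
#   vagrant up

# Hypothetical settings; adjust to your own box and hardware.
def dev_vm_settings
  {
    box:      "spark-dev",  # name given to `vagrant box add`
    hostname: "spark-dev",
    memory:   4096,         # MB of RAM for the guest
    cpus:     2,
    gui:      true          # show the CentOS desktop window
  }
end

# The guard lets plain Ruby load this file (for a quick syntax check)
# as well as Vagrant itself.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    settings = dev_vm_settings

    config.vm.box      = settings[:box]
    config.vm.hostname = settings[:hostname]

    # Assumes the VirtualBox provider.
    config.vm.provider "virtualbox" do |vb|
      vb.gui    = settings[:gui]
      vb.memory = settings[:memory]
      vb.cpus   = settings[:cpus]
    end
  end
end
```

With a file like this next to the box, a colleague would only need to add the box once and run `vagrant up`. No provisioning happens at boot because everything is already baked into the box.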

Install any version of Spark you prefer. We have codified the installation for 1.6.2 and 2.0.1, but it's pretty easy to extend this to a new version.
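
One way such a version switch might be wired up is sketched below. This is an assumption about the approach, not the course's own code: the helper names, the `SPARK_VERSION` environment variable, the `/opt/spark` install path and the pre-built Hadoop package names are hypothetical choices, and the URLs simply follow the layout of the Apache release archive.

```ruby
# Sketch: choose the Spark release to install from a single setting.

# Supported versions mapped to the pre-built package for each release.
def spark_packages
  {
    "1.6.2" => "spark-1.6.2-bin-hadoop2.6",
    "2.0.1" => "spark-2.0.1-bin-hadoop2.7"
    # To support another release, add one line here.
  }
end

# Build the download URL for a version, failing early on unknown ones.
def spark_download_url(version)
  package = spark_packages.fetch(version) do
    raise ArgumentError, "Unsupported Spark version: #{version}"
  end
  "https://archive.apache.org/dist/spark/spark-#{version}/#{package}.tgz"
end

# SPARK_VERSION is an assumed environment variable; defaults to 2.0.1.
def selected_spark_version
  ENV.fetch("SPARK_VERSION", "2.0.1")
end

# Only runs when Vagrant loads the file.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    # Pass the chosen URL into the guest and unpack Spark under /opt/spark.
    config.vm.provision "shell",
      env: { "SPARK_URL" => spark_download_url(selected_spark_version) },
      inline: <<-SHELL
        curl -fsSL "$SPARK_URL" -o /tmp/spark.tgz
        mkdir -p /opt/spark
        tar -xzf /tmp/spark.tgz -C /opt/spark --strip-components=1
      SHELL
  end
end
```

With this layout, switching versions would mean setting `SPARK_VERSION` before running `vagrant up` (for example `set SPARK_VERSION=1.6.2` in a Windows command prompt), and adding a release is a one-line change to the table.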

Why Apache Spark ...

Apache Spark can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Apache Spark has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing.
Apache Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells.
Apache Spark can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Who this course is for:
  • Software engineers who want to expand their skills into the world of distributed computing
  • Developers who want to write/test their code against Scala / Spark