Real World Vagrant - Build an Apache Spark Development Env!
What you'll learn
- Simply run a single command on your desktop, go for a coffee, and come back to a running distributed environment ready for cluster deployment
- Ability to automate the installation of software across multiple Virtual Machines
Requirements
- Basic programming or scripting experience is required.
- You will need a desktop PC and an Internet connection. The course is created with Windows in mind.
- The software needed for this course is freely available
- This course builds on my previous course, "Real World Vagrant For Distributed Computing"
- You will require a computer with virtualization support (Intel VT-x); most computers purchased in the last five years should be sufficient
- Optional: some exposure to Linux and/or the Bash shell environment
- A 64-bit Windows operating system is required (Windows 7 or above recommended)
- This course is not recommended if you have no desire to work in distributed computing
Description
Note: This course builds on the "Real World Vagrant For Distributed Computing" course by Toyin Akin.
This course enables you to package a complete Spark development environment into your own custom 2.3GB Vagrant box.
Once built, you no longer need to manipulate your Windows machine to get a fully-fledged Spark environment working. With the final solution, you can boot a complete Apache Spark environment, via a single "vagrant up", in under 3 minutes!
Install any version of Spark you prefer. We have codified versions 1.6.2 and 2.0.1, but it's easy to extend this to a new version.
Why Apache Spark ...
Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.
Apache Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells.
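For a taste of those operators, here is a minimal word-count sketch of the kind you could type into the Scala shell (spark-shell) that such an environment provides; the sample strings are purely illustrative:

```scala
// Inside spark-shell, `sc` (the SparkContext) is already provided.
// A classic word count built from a few of Spark's high-level operators.
val lines = sc.parallelize(Seq(
  "spark makes clusters simple",
  "spark keeps data in memory"))     // illustrative sample data

val counts = lines
  .flatMap(_.split(" "))             // split each line into words
  .map(word => (word, 1))            // pair every word with a count of 1
  .reduceByKey(_ + _)                // sum the counts per word, in parallel

counts.collect().foreach(println)    // e.g. (spark,2), (memory,1), ...
```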
Apache Spark can combine SQL, streaming, and complex analytics.
Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
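As a sketch of that combination (assuming the Spark 2.x API, i.e. the 2.0.1 build mentioned above; the object name and sample data are hypothetical), the same data can be queried with SQL and with the DataFrame API in a single application:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: SQL and DataFrames combined in one Spark 2.x application.
object SqlAndDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-and-dataframes")
      .master("local[*]")  // point this at your cluster's master URL instead
      .getOrCreate()
    import spark.implicits._

    // Build a DataFrame from in-memory sample rows (illustrative data).
    val trades = Seq(("EURUSD", 1.08), ("GBPUSD", 1.24), ("EURUSD", 1.09))
      .toDF("symbol", "price")

    // Query it with SQL ...
    trades.createOrReplaceTempView("trades")
    spark.sql("SELECT symbol, avg(price) AS avg_price FROM trades GROUP BY symbol").show()

    // ... and with the DataFrame API, interchangeably.
    trades.groupBy("symbol").avg("price").show()

    spark.stop()
  }
}
```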
Who this course is for:
- Software engineers who want to expand their skills into the world of distributed computing
- Developers who want to write/test their code against Scala / Spark
Instructor
I spent six years at the Royal Bank of Scotland and five years at the investment bank BNP Paribas, developing and managing interest rate derivatives services, as well as engineering and deploying in-memory databases (Oracle Coherence), NoSQL, and Hadoop (Cloudera) clusters into production.
In 2016, I left to start my own training company, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (in-memory database), NoSQL, Big Data and DevOps technology.
From Q3 2017, this will also include FinTech training in capital markets using Microsoft Excel (Windows), JVM languages (Java/Scala), as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).
I have a YouTube channel where I publish snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed computing ideas.
Check out my website and/or YouTube for more info
See you inside ...