Note: This course is built on top of the "Real World Vagrant - Build an Apache Spark Development Env! - Toyin Akin" course. If you do not already have a Spark + ScalaIDE environment installed (within a VM or directly on your machine), you can take the course named above first.
Scala IDE provides advanced editing and debugging support for the development of pure Scala and mixed Scala-Java applications.
Now with a shiny Scala debugger, semantic highlighting, a more reliable JUnit test finder, an ecosystem of related plugins, and much more.
Scala debugger: step through closures and view debugging information in a Scala-aware display.
Spark Monitoring and Instrumentation
While creating RDDs, performing transformations and executing actions, you will be working heavily within the monitoring view of the Web UI.
Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:
A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Information about the running executors
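As a rough illustration of what feeds those UI pages, here is a minimal sketch assuming Spark 2.x (spark-core) on the classpath and a local master (`local[2]`); the object and method names (`WebUiMonitoringSketch`, `runJobs`) are hypothetical. Each action below becomes a job under the Jobs and Stages tabs, and the persisted RDD shows up under the Storage tab once an action has materialized it. Setting `spark.ui.port` to 4040 only restates the default; if that port is taken, Spark binds to the next free one (4041, 4042, ...).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object WebUiMonitoringSketch {
  // Local configuration for a single-VM setup (assumption); spark.ui.port 4040 is already the default.
  def localConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .setMaster("local[2]")
      .set("spark.ui.port", "4040")

  // Runs two actions over a cached RDD so the Jobs, Stages and Storage tabs have something to show.
  def runJobs(sc: SparkContext): (Long, Long) = {
    // Transformation + persist: nothing is computed or cached yet.
    val numbers = sc.parallelize(1L to 1000L, 4).persist(StorageLevel.MEMORY_ONLY)
    // Action 1: computes the partitions and stores them in memory (visible under Storage).
    val total = numbers.reduce(_ + _)
    // Action 2: a second job that reads the cached partitions.
    val evens = numbers.filter(_ % 2 == 0).count()
    (total, evens)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(localConf("web-ui-monitoring-sketch"))
    try {
      val (total, evens) = runJobs(sc)
      println(s"total=$total evens=$evens")
      // Optionally keep the driver alive (milliseconds from the first argument) so the UI can be browsed.
      args.headOption.foreach(ms => Thread.sleep(ms.toLong))
    } finally {
      sc.stop() // the web UI shuts down together with the SparkContext
    }
  }
}
```

Passing e.g. `120000` as the first argument keeps the application up for two minutes, long enough to open `http://localhost:4040` from the VM.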
Why Apache Spark ...
Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Apache Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells. Apache Spark can combine SQL, streaming, and complex analytics.
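To make the DAG idea concrete, here is a small word-count sketch, assuming Spark 2.x (spark-core) and a local master; `LazyDagSketch` and `wordCounts` are hypothetical names. The high-level operators (`flatMap`, `filter`, `map`, `reduceByKey`) are transformations that only build up lineage; `toDebugString` prints that lineage, and nothing executes until an action such as `collect()` runs. The shuffle introduced by `reduceByKey` is where the scheduler splits the DAG into two stages.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LazyDagSketch {
  // Pure transformations: this only describes the computation, no tasks are launched here.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))          // one record per word
      .filter(_.nonEmpty)                // drop empty tokens from leading whitespace
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)                // shuffle => stage boundary in the DAG

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lazy-dag-sketch").setMaster("local[2]"))
    try {
      val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
      val counts = wordCounts(lines)
      // Shows the RDD lineage the DAG scheduler will turn into stages.
      println(counts.toDebugString)
      // collect() is an action: only now does Spark schedule and run tasks.
      counts.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```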
Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
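As one example of mixing libraries in a single application, the sketch below starts from an RDD, queries it with Spark SQL, and drops back to the RDD API for the result. It assumes Spark 2.x with spark-sql on the classpath and a local master; the `Trade` case class, the `trades` view name and `MixedLibrariesSketch` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; a top-level case class lets Spark derive an encoder for toDF().
case class Trade(desk: String, notional: Double)

object MixedLibrariesSketch {
  def deskTotals(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._ // enables rdd.toDF()

    // 1. Core API: a plain RDD of case-class records.
    val tradesRdd = spark.sparkContext.parallelize(Seq(
      Trade("rates", 100.0), Trade("rates", 50.0), Trade("fx", 25.0)))

    // 2. Spark SQL: register the data as a temporary view and query it.
    tradesRdd.toDF().createOrReplaceTempView("trades")
    val totals = spark.sql("SELECT desk, SUM(notional) AS total FROM trades GROUP BY desk")

    // 3. Back to the RDD API to post-process the result rows.
    totals.rdd.map(row => (row.getString(0), row.getDouble(1))).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mixed-libraries-sketch")
      .master("local[2]")
      .getOrCreate()
    try println(deskTotals(spark))
    finally spark.stop()
  }
}
```

Both the SQL query and the RDD steps run on the same underlying SparkContext, so they appear as jobs in the same web UI.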
Suggested Spark Udemy curriculum courses to follow. You do not need to take/purchase the first three courses if you already have Spark
Spark job compensation for those in this field.
Recommended Hardware for Spark and Hadoop labs ...
Resource files for the course
Walking through the Base Vagrant Spark Box
Upgrade and Package the Vagrant Box to Spark 2
Register the updated Vagrant Spark Box
Boot up and Walkthrough of Spark ScalaIDE Environment
Configure and Startup a Spark Environment for Distributed Computing
Scala Spark RDD, Transformations, Actions and Monitoring I
Scala Spark RDD, Transformations, Actions and Monitoring II
Scala Spark RDD, Transformations, Actions and Monitoring III
Scala Spark RDD, Transformations, Actions and Monitoring IV
Scala Spark RDD, Transformations, Actions and Monitoring V
Scala Spark RDD, Transformations, Actions and Monitoring VI
Scala Spark RDD, Transformations, Actions and Monitoring VII
Scala Spark RDD, Transformations, Actions and Monitoring VIII
I spent 6 years at "Royal Bank of Scotland" and 5 years at the investment bank "BNP Paribas", developing and managing interest rate derivatives services as well as engineering and deploying in-memory databases (Oracle Coherence), NoSQL and Hadoop clusters (Cloudera) into production.
In 2016, I left to start my own training company, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (In-Memory Database), NoSQL, Big Data and DevOps technology.
From Q3 2017, this will also include FinTech training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).
I have a YouTube channel where I publish snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed computing ideas.
Check out my website and/or YouTube for more info
See you inside ...