Real World Spark 2 - Spark Core Overview
3.4 (4 ratings)
79 students enrolled

Why you should take a look at Spark 2, the easiest open-source, modern cluster-computation engine to write code against
Created by Toyin Akin
Last updated 12/2016
English
Curiosity Sale
Current price: $10 Original price: $90 Discount: 89% off
30-Day Money-Back Guarantee
Includes:
  • 2.5 hours on-demand video
  • 2 Articles
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Grasp why "Spark" alongside Scala, Python, Java or R is a perfect combination for distributed computing
  • Understand some of the common ways of interacting with Spark
  • Take a sneak peek at Spark in action
Requirements
  • Some development background is nice to have
Description

Why Apache Spark ...

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Apache Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells. Apache Spark can combine SQL, streaming, and complex analytics.
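A short, Spark-free sketch of the operator style described above, using nothing beyond the Python standard library so it runs without a cluster. The word-count pipeline is the classic example of these high-level operators:

```python
# Word count in the map/reduce style that Spark's high-level
# operators expose, run here on a plain Python list (no cluster).
lines = ["spark makes parallel apps easy", "spark runs in memory"]

# flatMap: split each line into words
words = [w for line in lines for w in line.split()]

# map + reduceByKey: pair each word with a count of 1, then sum per key
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(counts["spark"])  # "spark" appears once in each line: 2
```

In actual PySpark the same pipeline would read roughly `sc.parallelize(lines).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, with the work distributed across the cluster.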

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Spark Overview

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Jupyter Notebook

Jupyter Notebook is a system, similar to Mathematica, that allows you to create "executable documents". Notebooks integrate formatted text (Markdown), executable code (such as Scala or Python), and the output that code produces into a single document.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

The Jupyter Notebook is based on a set of open standards for interactive computing. Think HTML and CSS for interactive computing on the web. These open standards can be leveraged by third party developers to build customized applications with embedded interactive computing.

Spark shell

Spark’s shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use existing Java libraries) or Python.
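A minimal sketch of the kind of interactive analysis the shell enables, with plain Python standing in for a cluster: count the lines of a text that mention "Spark". In the PySpark shell the equivalent one-liner would be roughly `sc.textFile("README.md").filter(lambda l: "Spark" in l).count()`.

```python
# Stand-in for an interactive shell session: filter and count lines,
# the classic Spark quick-start example, without a running cluster.
lines = [
    "Apache Spark is a fast and general-purpose cluster computing system.",
    "It provides high-level APIs in Java, Scala, Python and R.",
    "Spark's shell makes it easy to analyze data interactively.",
]

spark_lines = [l for l in lines if "Spark" in l]  # filter
print(len(spark_lines))  # 2 of the 3 lines mention "Spark"
```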

ScalaIDE

Scala IDE provides advanced editing and debugging support for the development of pure Scala and mixed Scala-Java applications.

Spark Monitoring and Instrumentation

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

  • A list of scheduler stages and tasks
  • A summary of RDD sizes and memory usage
  • Environmental information
  • Information about the running executors

Who is the target audience?
  • Software engineers who want to expand their skills into the world of distributed computing
  • Application Developers, Data Scientists, Statisticians, Hadoop Administrators
  • Developers who want to write and develop distributed systems
Curriculum For This Course
18 Lectures
02:34:10
Why Spark
4 Lectures 22:58

Suggested Udemy Spark curriculum courses to follow. You do not need to take or purchase the first two courses.

Preview 00:00


Why Spark - For Developers
04:10
Author and Compensation
2 Lectures 18:37

My experience within the Enterprise
11:28

Spark job compensation for those in this field.
07:09
Spark Overview - Open Source Distributed Computing
6 Lectures 31:15
Spark Concepts
06:53

Why not IBM SPSS ?
05:26

Why not SAS - an awesome analytical computation platform?
07:35

Spark Deployment modes
06:21

Cloudera, Hortonworks, MapR
04:48

Examples of Spark Environments
00:12
Spark Environments for Interactive Analysis
2 Lectures 23:51
Spark Shell
10:56

PySpark
12:55
Spark Environments for Developers
1 Lecture 09:53
ScalaIDE
09:53
Spark Environments for Data Scientists - Publish your reports for distribution.
2 Lectures 29:21
Jupyter Python
12:33

Jupyter Scala
16:48
Conclusion
1 Lecture 18:15
Enterprise Deployments
18:15
About the Instructor
Toyin Akin
3.8 Average rating
135 Reviews
1,374 Students
15 Courses
Big Data Engineer, Capital Markets FinTech Developer

I spent 6 years at Royal Bank of Scotland and 5 years at the investment bank BNP Paribas, developing and managing Interest Rate Derivatives services as well as engineering and deploying in-memory databases (Oracle Coherence), NoSQL, and Hadoop clusters (Cloudera) into production.

In 2016, I left to start my own training company, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (in-memory database), NoSQL, Big Data and DevOps technology.

From Q3 2017, this will also include FinTech training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).

I have a YouTube channel publishing snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed-computing ideas.

Check out my website and/or YouTube for more info

See you inside ...