
In this lecture we will go over the prerequisites and the structure of the course.
In this lecture we will explain how to get Spark running on your computer in less than 10 minutes.
In this lecture, we will compare Hadoop and Spark on four fundamental aspects: storage, computation, computational speed, and resource management.
To understand a solution like Spark, we first need to understand the problems Spark solves. In this lesson, we will talk about the pain points, challenges, and inefficiencies that Spark addresses in two areas: iterative machine learning and interactive data mining.
If you plan to learn Spark, or any technology, you need a clear understanding of why it is better than similar technologies in the ecosystem. Beyond knowing why it is better, you should also understand how it is better, and that is exactly the goal of this lesson.
The purpose of this lesson is to help you understand an important issue with in-memory distributed computation on big datasets: fault tolerance. This lesson will help you understand the need for RDDs.
In this lesson, we are going to go one level deeper and understand what an RDD is. Most aspiring Spark learners have looked at the RDD paper, but don't worry: we will explain RDDs without asking you to refer to the paper.
We see a lot of misconceptions about RDDs, especially among new Spark learners. We can’t let that happen to our students, so we created this separate lesson to address those misconceptions.
In this lesson, we are going to calculate the maximum volume of each stock symbol in our stocks dataset. Our goal in this lesson is to understand the types of operations we can perform on RDDs.
In this lesson, we will explore the types of dependencies between RDDs. More importantly, we will see why these dependencies are important to understand.
In this lesson, we will learn how a logical plan in Spark gets converted to a physical plan and finally ends up as jobs, stages, and tasks in Spark.
Spark's speciality is in-memory computing. In this lesson we will explore what is kept in memory and what is not and how Spark manages memory.
Most Spark learners don’t have a good grasp of fault tolerance, and we don’t blame them: fault tolerance is an abstract concept, and you can’t get a handle on it until you see it in action. So in this lesson we are going to come full circle and demonstrate how fault tolerance works in Spark.
In this lesson, we will explore object-oriented programming principles vs. functional programming principles. We will also see the need for a new programming language like Scala.
In this lesson, we will learn about some important concepts and features in Scala. Don't worry, we won't use the HelloWorld sample because it is not cool :-)
In this lesson we will explore functions in Scala. We will talk about anonymous and higher-order functions.
When our students asked us to create a course on Spark, we looked at other Spark-related courses on the market and at the common questions students ask on websites like Stack Overflow and other forums when they try to learn Spark, and we saw a recurring theme.
Most courses and other online resources, including Spark's documentation, do not do a good job of helping students understand the foundational concepts. They explain what Spark is, what an RDD is, what "this" is and what "that" is, but students were most interested in understanding the core fundamentals and, more importantly, in answering questions like:
and that is exactly what you will learn in this free Spark Starter Kit course. The aim of this course is to give you a strong foundation in Spark.