Learning Path: SMACK: Getting Started with the SMACK Stack
Build scalable and efficient data processing platforms
0.0 (0 ratings)
2 students enrolled
Created by Packt Publishing
Last updated 7/2017
English
30-Day Money-Back Guarantee
Includes:
  • 11 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Basic concepts of Scala
  • Analysing data using Spark in Scala
  • Creating fast data processing systems using the SMACK Stack
View Curriculum
Requirements
  • Experience with Scala is essential
  • Basic knowledge of data processing concepts
Description

If you want to outrun your competitors by making business decisions based on your data, then this course is for you.

SMACK is an open source full stack for big data architecture. It is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. This stack is the newest technique developers have begun to use to tackle critical real-time analytics for big data. 

SMACK: Getting Started with Scala, Spark, and the SMACK Stack gets you familiar with Scala and the various features it offers. You will also come to understand the process of data analysis using Spark. Finally, you will be introduced to the SMACK Stack, which helps us process data blazingly fast. Development using these technologies can be summarized as: more data, less time.

This Learning Path is learner-focused material, and the curriculum is planned to meet your learning needs. It starts with the basics of Apache Spark, one of the trending big data processing frameworks on the market today. It then moves on to Scala, which has emerged as an important tool for performing various data analysis tasks efficiently, and shows you how to leverage popular Scala libraries and tools to perform core data analysis tasks with ease in Spark. In the last part, we will teach you how to integrate the SMACK stack to create a highly efficient data analysis system for fast data processing.

By the end of the course, you’ll be able to analyze and process data more swiftly and efficiently than with traditional data analytics systems.

About the Author:

For this course, we have combined the best works of these esteemed authors:

Nishant Garg has over 16 years of software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as GreenPlum). He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a senior technical architect for the Big Data R&D Labs at Impetus Infotech Pvt. Ltd. Nishant has undertaken many speaking engagements on big data technologies and is also the author of Learning Apache Kafka and HBase Essentials, both published by Packt Publishing.

Anatolii Kmetiuk has been working with Scala-based technologies for four years. He has experience in Deep Learning models for text processing. He is interested in Category Theory and Type-level programming in Scala. Another field of interest is Chaos and Complexity Theory and Artificial Life, and ways to implement them in programming languages. 

Raúl Estrada Aparicio has been a programmer since 1996 and a Java developer since 2001. He loves functional languages such as Scala, Elixir, Clojure, and Haskell, and all topics related to computer science. With more than 12 years of experience in high availability and enterprise software, he has designed and implemented architectures since 2003. He specializes in systems integration and has participated in projects mainly related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys mobile programming and game development. He considers himself a programmer before an architect, engineer, or developer.

Who is the target audience?
  • Data Analysts, Data Scientists, and Business Analysts can use this course to build highly precise and fast data models.
Curriculum For This Course
73 Lectures
10:53:44
Apache Spark Fundamentals
19 Lectures 02:18:10

This video provides an overview of the entire course.

Preview 03:44

What are the origins of Apache Spark and what are its uses?

Spark Introduction
04:53

What are the various components in Apache Spark?

Spark Components
06:02

This video gets us familiar with the tools used in Apache Spark.

Getting Started
10:41

This video explains the complete historical journey from the Nutch project to Apache Hadoop—how the Hadoop project was started, what research papers influenced the project, and so on. In the end, the various goals achieved by developing Hadoop are explained.

Preview 06:49

In this video, we are going to look at the JVM processes that run in the background of Apache Hadoop—name node, data node, resource manager, and node manager. It also provides an overview of the Hadoop components—HDFS, YARN, and the MapReduce programming model.

Hadoop Processes and Components
07:24

This video shares more details about the Hadoop Distributed File System—its goals, the HDFS components, and how HDFS works. It also explains another Hadoop component, YARN—its components, lifecycle, and use cases.

HDFS and YARN
07:10

This video provides an overview of MapReduce—the Hadoop programming model and its execution behavior at various stages.

Map Reduce
06:47

The aim of this video is to introduce the Scala language and its features, and by the end of this video, you should be able to get started with Scala.

Introduction to Scala
07:16

The aim of this video is to explain the fundamentals of Scala Programming, such as Scala classes, fields, methods, and the different types of arguments, such as default and named arguments passed to class constructors and methods.

Scala Programming Fundamentals
07:42
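To make the ideas in the previous lecture concrete, here is a minimal sketch of a Scala class with fields, a method, and default and named constructor arguments; the Server class and its fields are hypothetical and not taken from the course.

```scala
// Hypothetical example class with default argument values.
class Server(val host: String, val port: Int = 8080, val secure: Boolean = false) {
  private val scheme = if (secure) "https" else "http"
  def url: String = s"$scheme://$host:$port"   // a simple method built from the fields
}

object ScalaFundamentalsDemo extends App {
  val a = new Server("example.com")                 // default arguments for port and secure
  val b = new Server("example.com", secure = true)  // named argument; port keeps its default
  println(a.url)  // http://example.com:8080
  println(b.url)  // https://example.com:8080
}
```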

The aim of this video is to explain the objects in Scala language, singleton object in Scala, and outline the usages of objects in Scala applications. It also describes companion objects.

Objects in Scala
06:22
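As a minimal sketch of a singleton acting as a companion object (the Temperature class and its factory method are invented for illustration):

```scala
// A class with a private constructor and a companion object exposing a factory method.
class Temperature private (val celsius: Double)

object Temperature {
  // The companion object can call the class's private constructor.
  def fromFahrenheit(f: Double): Temperature = new Temperature((f - 32) / 1.8)
}

object ObjectsDemo extends App {
  val t = Temperature.fromFahrenheit(98.6)
  println(t.celsius)  // ~37.0
}
```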

The aim of this video is to explain the structure of the Scala collections hierarchy. It looks at examples of different collection types, such as Array, Set, and Map. It also covers how to apply functions to data in collections and outlines the basics of structural sharing.

Collections
08:36
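A small, self-contained sketch of the collection types and higher-order functions mentioned in this lecture; all of the data is invented:

```scala
object CollectionsDemo extends App {
  val numbers = Array(1, 2, 3, 4)                  // indexed, fixed-size collection
  val vowels  = Set('a', 'e', 'i', 'o', 'u')       // no duplicates
  val ports   = Map("http" -> 80, "https" -> 443)  // key-value pairs

  // Applying functions to data in collections
  val doubled = numbers.map(_ * 2)           // Array(2, 4, 6, 8)
  val even    = numbers.filter(_ % 2 == 0)   // Array(2, 4)

  println(doubled.mkString(", "))
  println(even.mkString(", "))
  println(vowels.contains('e'))              // true
  println(ports.getOrElse("ftp", 21))        // 21 (default, key not present)
}
```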

The aim of this video is to start your learning of Apache Spark fundamentals. It introduces you to the Spark component architecture and how different components are stitched together for Spark execution.

Spark Execution
07:39

The aim of this video is to take the first step towards Spark programming. It explains the SparkContext and the need for Resilient Distributed Datasets, called RDDs. It also explains how RDDs change the execution approach compared with MapReduce.

Understanding RDD
07:06

The aim of this video is to explain the operations that can be applied to RDDs. These operations come in the form of transformations and actions, and the video explains various operations in both categories with examples.

RDD Operations
09:06
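A minimal local sketch of the transformation/action distinction covered above; the word list and the filter are invented for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOperationsDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("rdd-ops").setMaster("local[*]"))

  val words = sc.parallelize(Seq("spark", "mesos", "akka", "cassandra", "kafka"))

  // Transformations are lazy: they only describe a new RDD.
  val longWords = words.map(w => (w, w.length)).filter(_._2 > 4)

  // Actions trigger execution and return results to the driver.
  println(longWords.collect().mkString(", "))
  println(words.count())

  sc.stop()
}
```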

The aim of this video is to explain and demonstrate loading and storing data in Spark from different file types, such as text, CSV, JSON, and sequence files; different filesystems, such as the local filesystem, Amazon S3, and HDFS; and different databases, such as MySQL, PostgreSQL, HBase, and so on.

Loading and Saving Data in Spark
10:15
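The lecture may use either the RDD or the DataFrame API for these demonstrations; as a hedged sketch, here is one way to read and write a few of the listed formats with current Spark APIs (all file paths are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object LoadSaveDemo extends App {
  val spark = SparkSession.builder().appName("load-save").master("local[*]").getOrCreate()

  // Plain text through the underlying SparkContext
  val lines = spark.sparkContext.textFile("data/input.txt")

  // CSV and JSON through the DataFrame readers
  val csv  = spark.read.option("header", "true").csv("data/input.csv")
  val json = spark.read.json("data/input.json")

  // Writing back out
  csv.write.mode("overwrite").parquet("data/output.parquet")
  lines.saveAsTextFile("data/output-text")

  spark.stop()
}
```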

The aim of this video is to explain the motivations behind key-value-based RDDs and how to create them. Next, it explains the various transformations and actions that can be applied to key-value-based RDDs. Finally, it explains data partitioning techniques in Spark.

Managing Key-Value Pairs
06:56
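A short sketch of a key-value (pair) RDD with a per-key aggregation and explicit partitioning; the sales records are invented:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PairRddDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("pair-rdd").setMaster("local[*]"))

  val sales = sc.parallelize(Seq(("books", 10.0), ("music", 5.0), ("books", 7.5)))

  // reduceByKey is a transformation available only on key-value RDDs
  val totals = sales.reduceByKey(_ + _)

  // Explicit data partitioning by key
  val partitioned = totals.partitionBy(new HashPartitioner(4))

  println(partitioned.collect().mkString(", "))  // (books,17.5), (music,5.0)
  sc.stop()
}
```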

The aim of this video is to explain a few more advanced concepts, such as accumulators, broadcast variables, and passing data to external programs using pipes.

Accumulators
06:56
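A brief sketch of an accumulator and a broadcast variable in a local job; the stop-word set and the input words are invented, and note that accumulator updates made inside transformations can be counted more than once if tasks are retried:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVariablesDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("shared-vars").setMaster("local[*]"))

  val emptyRecords = sc.longAccumulator("emptyRecords")   // write-only counter, read on the driver
  val stopWords    = sc.broadcast(Set("the", "a", "an"))  // read-only value shipped to executors

  val words = sc.parallelize(Seq("the", "spark", "", "stack", "a"))
  val kept = words.filter { w =>
    if (w.isEmpty) emptyRecords.add(1)
    w.nonEmpty && !stopWords.value.contains(w)
  }

  println(kept.collect().mkString(", "))  // spark, stack
  println(emptyRecords.value)             // 1
  sc.stop()
}
```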

The aim of this video is to demonstrate writing Spark jobs using the Eclipse-based Scala IDE, creating Spark job JAR files, and, finally, copying and executing the Spark job on a Hadoop cluster.

Writing a Spark Application
06:46

Test Your Knowledge
5 questions
Spark for Data Analysis in Scala
12 Lectures 02:05:25

This video will give an overview of the entire course.

Preview 03:45

We need a dataset to practice the skills learned in this course. We download the House Prices dataset from Kaggle for this.

Downloading the Competition Dataset
03:53

Spark Notebook is a convenient environment for data analysis and reproducible research. We need to install it.

Installing Spark Notebook
13:29

Before proceeding to load the data, we need to understand how Spark represents and handles it. This theoretical part covers it.

Spark Abstractions – RDD, DataFrame
07:48

Now that we know the theory, we need to actually see how to load the example dataset in Spark.

Loading CSV data into DataFrame
12:41
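One plausible way to load the Kaggle House Prices training CSV into a DataFrame; the file path and the selected columns are assumptions based on the public competition data, not taken from the course notebook:

```scala
import org.apache.spark.sql.SparkSession

object LoadHousePrices extends App {
  val spark = SparkSession.builder().appName("house-prices").master("local[*]").getOrCreate()

  val df = spark.read
    .option("header", "true")       // first line holds the column names
    .option("inferSchema", "true")  // infer numeric types instead of reading everything as strings
    .csv("data/train.csv")          // hypothetical local path to the Kaggle file

  df.printSchema()
  df.select("SalePrice", "LotArea").show(5)

  spark.stop()
}
```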

Before building a statistical model of a dataset, one must have some understanding of that dataset. This video provides tools to build a visual intuition about the data in the dataset.

Different Types of Widgets Supported by Spark Notebook for DataFrame Visualization
10:49

Another way to draw insights from the data is to look at its statistical metrics. This video describes how to compute them with Spark.

Statistical Functions Supported by Spark
15:43

We need to preprocess the data before feeding it to an ML algorithm. This video describes how to do that with standard SQL/collections methods.

Operations on DataFrame
12:53

SparkSQL operations are powerful, but SparkML supports some common ML operations out of the box. Learning them may greatly reduce the work to be done.

Feature Transformers
13:10
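A minimal sketch of two common SparkML feature transformers, StringIndexer and VectorAssembler, applied to a couple of invented rows standing in for the competition data:

```scala
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object FeatureTransformerDemo extends App {
  val spark = SparkSession.builder().appName("feature-transformers").master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq(("Pave", 8450, 208500.0), ("Grvl", 9600, 181500.0))
    .toDF("Street", "LotArea", "SalePrice")   // toy rows, not the real dataset

  // Encode a categorical column as numeric indices
  val indexer = new StringIndexer().setInputCol("Street").setOutputCol("StreetIndex")

  // Collect numeric columns into the single vector column SparkML expects
  val assembler = new VectorAssembler()
    .setInputCols(Array("StreetIndex", "LotArea"))
    .setOutputCol("features")

  val indexed   = indexer.fit(df).transform(df)
  val assembled = assembler.transform(indexed)
  assembled.select("features", "SalePrice").show(truncate = false)

  spark.stop()
}
```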

A particular kind of operation on data that is commonly used is slicing the features (taking a subset of them) based on a predicate.

Feature Selectors
05:21

Before proceeding to concrete examples of using SparkML, we need to understand its structure.

Architecture
08:32

The result of data analysis is usually a model of the data in question. This video explains how to do data modeling with the ML algorithms that Spark has.

Algorithms: Linear Regression and Regression Trees
17:21

Test Your Knowledge
3 questions
Fast Data Processing Systems with SMACK Stack
42 Lectures 06:30:09

This video gives an overview of the entire course.

Preview 05:19

To find an efficient solution, we need to learn about the data processing challenges first.

Modern Data-Processing Challenges
06:28

It is important to know the process or pipeline of SMACK to use it better.

The Data-Processing Pipeline Architecture
07:09

To use the stack effectively, you need to understand each of its technologies.

SMACK Technologies
07:04

Now learn about data expert profiles and how data processing can be a data center operation.

Understanding Data Expert Profiles and Changing the Data Center Operations
08:36

We need to understand the Scala collections hierarchy and how to select the right collection when working with Scala. This video will teach you that.

Scala Collections
07:44

Iterators are an important part of Scala. This video uses iterators and shows their importance.

Iterators in Scala
03:43

This video shows a host of functions in Scala, including filtering, merging, and sorting, as well as sets, arrays, queues, and stacks.

More Functions with Scala
18:52

This video compares the Actor Model with traditional OOP, and then describes the actor system and actor references.

Actor Model In a Nutshell
13:32
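A minimal classic Akka sketch of an actor system and an actor reference, assuming the akka-actor dependency; the Greeter actor is hypothetical:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// A hypothetical actor that keeps private state and handles messages one at a time.
class Greeter extends Actor {
  private var greeted = 0
  def receive: Receive = {
    case name: String =>
      greeted += 1
      println(s"Hello, $name ($greeted greetings so far)")
  }
}

object ActorDemo extends App {
  val system  = ActorSystem("demo")                        // the actor system owns all actors
  val greeter = system.actorOf(Props[Greeter], "greeter")  // an ActorRef, not the actor itself

  greeter ! "SMACK"   // messages are sent asynchronously with !
  greeter ! "Scala"

  Thread.sleep(500)   // crude wait so the asynchronous messages are processed before shutdown
  system.terminate()
}
```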

Here, we will be learning about the functioning of actors using various katas.

Working with Actors
09:39

Apache Spark cluster-based installations can become a complex task when we integrate Mesos, Kafka, and Cassandra, since they draw on knowledge from databases, telecommunications, operating systems, and infrastructure.

Spark Concepts
06:44

Spark has four design goals: in-memory data storage (Hadoop is not in-memory), distribution across a cluster, fault tolerance, and speed and efficiency.

Resilient Distributed Datasets
22:00

Apache Spark has its own built-in standalone cluster manager, but it can also run on other cluster managers, including Apache Mesos, Hadoop YARN, and Amazon EC2.

Spark in Cluster Mode
20:26

Spark Streaming is the module for managing data flows. Much of Spark is built on the concept of RDDs, and Spark Streaming adds the concept of DStreams, or Discretized Streams.

Spark Streaming
20:02
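A short DStream sketch, assuming lines arriving on a local socket (for example via `nc -lk 9999`); the batch interval and the word count are purely illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDemo extends App {
  val conf = new SparkConf().setAppName("dstream-demo").setMaster("local[2]")
  val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second micro-batches

  val lines  = ssc.socketTextStream("localhost", 9999)   // hypothetical source
  val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  counts.print()   // each DStream batch is backed by an RDD

  ssc.start()
  ssc.awaitTermination()
}
```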

NoSQL databases are distributed databases with an emphasis on scalability, high availability, and ease of administration, in contrast to established relational databases.

NoSQL
04:33

Cassandra addresses the task of creating a massively decentralized, scalable database that is optimized for read operations and lets you modify data structures painlessly. The solution was found by combining two existing technologies: Google's BigTable and Amazon's Dynamo.

Apache Cassandra Installation
09:50

Cassandra lets you create a backup on the local computer by taking a snapshot of the database; it is possible to snapshot all the keyspaces. Compression increases the capacity of cluster nodes by reducing the data size on disk.

Backup and Compression
04:17

If you use an incremental backup, it is also necessary to provide the incremental backups created after the snapshot. There are multiple ways to perform a recovery from the snapshot.

Recovery Techniques
03:32

This video shows how to work with DBMS optimization.

Recovery Techniques – DBMS Optimization, Bloom Filter, and More
15:08

The Spark Cassandra connector is a client used to achieve this connection, but this client is special because it has been designed specifically for Spark and not for a specific language.

The Spark Cassandra Connector
04:47

In this video, you will learn the basics of the Spark Cassandra connector.

Introduction to the Spark Cassandra Connector
05:20
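A hedged sketch of reading from and writing to Cassandra with the Spark Cassandra Connector; it assumes a local Cassandra node and a hypothetical demo.users table:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraConnectorDemo extends App {
  val conf = new SparkConf()
    .setAppName("cassandra-demo")
    .setMaster("local[*]")
    .set("spark.cassandra.connection.host", "127.0.0.1")  // assumes a local Cassandra node

  val sc = new SparkContext(conf)

  // Keyspace "demo" and table "users" are assumptions for this sketch.
  val users = sc.cassandraTable("demo", "users")
  println(users.count())

  // Writing a Scala collection back to Cassandra
  sc.parallelize(Seq(("alice", 30), ("bob", 25)))
    .saveToCassandra("demo", "users", SomeColumns("name", "age"))

  sc.stop()
}
```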

Spark Streaming allows for high-throughput, fault-tolerant handling and processing of live data streams. In this video, you will learn about Spark Cassandra streaming and create a stream.

Cassandra and Spark Streaming Basics
03:35

Once our Spark Cassandra is set up, we'll look at the different operations we can perform with Cassandra.

Functions with Cassandra
11:57

In this video, we will use the Akka Cassandra connector to build a simple Akka application, make HTTP requests, and store the data in Cassandra.

Akka and Cassandra
10:54

Growing volumes of data require better data processing systems, and this is where Kafka comes into the picture. In this video, you will learn about the basics and features of Kafka.

Introducing Kafka
10:46

We need to install Kafka to work with it. This video will enable you to do that.

Installation
02:16

Clusters form Kafka’s publish-subscribe messaging system. In this video, you will learn to program with them.

Cluster
13:14

In this video, we will look at how the Kafka architecture is designed and understand the components that make it what it is.

Architecture
09:55

Producers are applications that create messages and publish them to the broker. You need to understand how producers work.

Producers
05:59
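A minimal producer sketch in Scala using the standard Kafka clients API; the broker address, topic name, and message are assumptions for this sketch:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerDemo extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")  // assumes a local broker
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)

  // Topic "events" is a hypothetical name for this sketch.
  producer.send(new ProducerRecord[String, String]("events", "user-1", "clicked"))
  producer.flush()
  producer.close()
}
```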

Consumers are applications that consume the messages published by the broker. So they are the next step in the Kafka architecture.

Consumers
07:19
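And a matching minimal consumer sketch, assuming the same local broker and hypothetical topic; the Duration-based poll call targets Kafka clients 2.x or later:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object ConsumerDemo extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")  // assumes a local broker
  props.put("group.id", "demo-group")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("auto.offset.reset", "earliest")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("events"))  // same hypothetical topic as the producer

  val records = consumer.poll(Duration.ofSeconds(1))
  records.forEach(r => println(s"${r.key} -> ${r.value}"))  // ConsumerRecord exposes key() and value()
  consumer.close()
}
```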

To process large volumes of data, we need to integrate Kafka with other big data tools; the integration part of this video teaches us that. Kafka also provides numerous tools for managing its features, which we will learn about in the administration part.

Integration and Administration
14:01

In this video, we will be looking at the relation between Akka and Spark and Kafka and Akka.

Akka, Spark, and Kafka
08:52

In this video, we will review the connectors between Kafka and Cassandra.

Kafka and Cassandra
02:08

In this video, you will be introduced to Mesos and learn about the Mesos architecture.

The Apache Mesos Architecture
16:28

The resource allocation module of Mesos decides how many resources are allocated to each framework, so it is important to understand resource allocation in Mesos.

Resource Allocation
20:34

If you don’t want to use cloud services from Amazon, Google, or Microsoft, you can set up a cluster in your own private data center. This video will teach you how to do that.

Running a Mesos Cluster on a Private Data Center
10:01

We need frameworks to deploy, discover, balance load, and handle failure of services. In this video, we will look at the frameworks that are used for service management.

Scheduling and Managing the Frameworks
15:15

Aurora is a Mesos framework for long-running services and cron jobs. Learn about job scheduling with Aurora.

Apache Aurora
04:53

Singularity is a platform that enables deploying and running services and scheduled jobs in the cloud or in data centers. Combined with Apache Mesos, it provides efficient management of the underlying process lifecycles and effective use of cluster resources. Let's see what it is all about.

Singularity
03:42

In this video, you will learn how to run Apache Spark on Mesos.

Apache Spark on Apache Mesos
04:56

In this video, we will deploy Apache Cassandra on Apache Mesos with the help of Marathon.

Apache Cassandra on Apache Mesos
02:12

In this video, we will deploy Apache Kafka on Apache Mesos.

Apache Kafka on Apache Mesos
06:27

Test Your Knowledge
5 questions
About the Instructor
Packt Publishing
3.9 Average rating
7,336 Reviews
52,405 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live. And how to put them to work.

With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you to develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.