SMACK is an open source full stack for big data architecture. It is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. This stack is one of the newest approaches developers have begun to use to tackle critical real-time analytics for big data. This highly practical tutorial will teach you how to integrate these technologies to create a highly efficient data analysis system for fast data processing.

We’ll start off with an introduction to SMACK and show you when to use it. First, you’ll get to grips with functional thinking and problem solving using Scala. Next, you’ll come to understand the Akka architecture. Then you’ll learn how to improve your data structure architecture and optimize resources using Apache Spark. Moving forward, you’ll learn how to achieve linear scalability in databases with Apache Cassandra. You’ll get a firm grasp of high-throughput distributed messaging using Apache Kafka. We’ll show you how to build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will dive deep into the different aspects of SMACK through two practical case studies. By the end of this video course, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective, fast data processing.
About The Author
Raúl Estrada Aparicio has been a programmer since 1996 and a Java developer since 2001. He loves functional languages such as Scala, Elixir, Clojure, and Haskell, and all topics related to computer science. With more than 12 years of experience in high availability and enterprise software, he has been designing and implementing architectures since 2003.
He specializes in systems integration and has participated in projects mainly related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys mobile programming and game development. He considers himself a programmer before an architect, engineer, or developer.
He is also a CrossFitter in the San Francisco Bay Area and is now focused on open source projects related to data pipelining, such as Apache Flink, Apache Kafka, and Apache Beam.
Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods.
To find an efficient solution, we need to learn about the data processing challenges first.
It is important to know the process or pipeline of SMACK to use it better.
To use each technology well, you need to understand how it works.
Now learn about data expert profiles and how data processing can be a data center operation.
We need to understand the Scala collections hierarchy and how to choose the right collection to work with in Scala. This video will teach you that.
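As a small illustration of the collections hierarchy, the sketch below shows that the main immutable collections (sequences, sets, and maps) share `Iterable` as a common supertype, so one generic function accepts all of them. The object name `CollectionHierarchy` and the `total` helper are hypothetical, chosen only for this example.

```scala
// Hypothetical demo object: the main immutable collections (Seq, Set, Map)
// all share Iterable as a common supertype.
object CollectionHierarchy {
  // Generic code written against Iterable accepts any of the collections below.
  def total(xs: Iterable[Int]): Int = xs.sum

  val list: List[Int]       = List(3, 1, 2)           // Seq: linked list, fast head/tail
  val vector: Vector[Int]   = Vector(3, 1, 2)         // Seq: fast indexed access and updates
  val set: Set[Int]         = Set(3, 1, 2, 3)         // Set: duplicates are dropped
  val map: Map[String, Int] = Map("a" -> 1, "b" -> 2) // Map: key/value pairs

  def main(args: Array[String]): Unit = {
    println(total(list))       // 6
    println(total(vector))     // 6
    println(total(set))        // 6 (only one 3 is kept)
    println(total(map.values)) // 3
    println(vector(1))         // 1: indexed access, where Vector is the better choice
  }
}
```

Choosing a collection is then mostly about access patterns: `List` for head/tail recursion, `Vector` for random access, `Set` for uniqueness, and `Map` for lookup by key.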
Iterators are an important part of Scala. This video uses iterators and shows their importance.
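A minimal sketch of two iterator properties worth knowing: an `Iterator` is traversed with `hasNext`/`next()` and can be consumed only once, and iterator pipelines are lazy, so even an infinite source is safe as long as you only pull the elements you need. The object and function names here are hypothetical.

```scala
// Hypothetical demo object illustrating single-pass, lazy iterators.
object IteratorDemo {
  val components = List("spark", "mesos", "akka", "cassandra", "kafka")

  // Walks an iterator explicitly with hasNext/next and sums the string lengths.
  def totalLength(it: Iterator[String]): Int = {
    var total = 0
    while (it.hasNext) total += it.next().length
    total
  }

  // Lazy pipeline: Iterator.from(1) is infinite, but map/filter only run
  // until next() finds the first matching element.
  def firstSquareAbove(limit: Int): Int =
    Iterator.from(1).map(n => n * n).filter(_ > limit).next()

  def main(args: Array[String]): Unit = {
    val it = components.iterator
    println(totalLength(it))      // 28
    println(it.hasNext)           // false: the iterator is now exhausted
    println(firstSquareAbove(50)) // 64
  }
}
```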
This video shows a host of functions in Scala, including filtering, merging, and sorting, as well as sets, arrays, queues, and stacks.
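The operations listed above can be sketched with the Scala standard library as follows. This is an illustrative example, not the course's code; the object name `CollectionOps` and its helpers are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical demo object covering filtering, merging, sorting,
// sets, arrays, queues, and stacks.
object CollectionOps {
  val xs = List(5, 3, 8, 1)
  val ys = List(8, 2, 7)

  val evens: List[Int]  = xs.filter(_ % 2 == 0)       // filtering: List(8)
  val merged: List[Int] = xs ++ ys                    // merging: List(5, 3, 8, 1, 8, 2, 7)
  val sorted: List[Int] = merged.sorted               // sorting: List(1, 2, 3, 5, 7, 8, 8)
  val unique: Set[Int]  = merged.toSet                // set: duplicates removed
  val common: Set[Int]  = xs.toSet intersect ys.toSet // set intersection: Set(8)

  // Arrays are mutable and fixed-size, so this works on a copy.
  def doubled(arr: Array[Int]): Array[Int] = {
    val copy = arr.clone()
    for (i <- copy.indices) copy(i) *= 2
    copy
  }

  // Queue: first in, first out.
  def fifo(items: Seq[Int]): List[Int] = {
    val q = mutable.Queue[Int]()
    items.foreach(x => q.enqueue(x))
    val out = mutable.ListBuffer[Int]()
    while (q.nonEmpty) out += q.dequeue()
    out.toList
  }

  // Stack: last in, first out.
  def lifo(items: Seq[Int]): List[Int] = {
    val s = mutable.Stack[Int]()
    items.foreach(x => s.push(x))
    val out = mutable.ListBuffer[Int]()
    while (s.nonEmpty) out += s.pop()
    out.toList
  }
}
```

For example, `CollectionOps.fifo(Seq(1, 2, 3))` returns the elements in insertion order, while `CollectionOps.lifo(Seq(1, 2, 3))` returns them reversed.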
Cluster-based Apache Spark installations can become complex when we integrate Mesos, Kafka, and Cassandra, because the work draws on several domains: databases, telecommunications, operating systems, and infrastructure.
Spark has four design goals: keep data in memory (Hadoop MapReduce is not in-memory), distribute processing across a cluster, be fault tolerant, and be fast and efficient.
Apache Spark has its own built-in standalone cluster manager, but it can also run on other cluster managers, including Apache Mesos and Hadoop YARN, or on Amazon EC2.
Spark Streaming is the Spark module for processing data streams. Just as much of Spark is built on the concept of the RDD (Resilient Distributed Dataset), Spark Streaming is built on DStreams, or Discretized Streams: a continuous stream represented as a sequence of RDDs, one per batch interval.
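To make "discretized" concrete, the plain-Scala sketch below simulates the idea behind a DStream: a timestamped event stream is cut into consecutive fixed-length batch intervals, and the same computation (here a count per value) runs on each batch. It does not use the Spark API. `MicroBatching`, `Event`, `discretize`, and `countsPerBatch` are hypothetical names. Unlike Spark Streaming, which produces an (empty) RDD for every interval, this sketch simply omits intervals with no events.

```scala
// Hypothetical, Spark-free simulation of micro-batching (the DStream idea).
object MicroBatching {
  final case class Event(timeMs: Long, value: String)

  // Groups events by the batch interval they fall into and returns
  // (batch start time, values in arrival order), ordered by batch.
  def discretize(events: Seq[Event], intervalMs: Long): Seq[(Long, Seq[String])] =
    events
      .groupBy(e => e.timeMs / intervalMs) // batch index
      .toSeq
      .sortBy(_._1)
      .map { case (batch, es) => (batch * intervalMs, es.map(_.value)) }

  // Applies the same per-batch computation to every batch: a count per value.
  def countsPerBatch(events: Seq[Event], intervalMs: Long): Seq[(Long, Map[String, Int])] =
    discretize(events, intervalMs).map { case (start, values) =>
      (start, values.groupBy(identity).map { case (k, vs) => k -> vs.size })
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(100, "a"), Event(900, "b"), Event(1200, "a"), Event(1500, "a"), Event(2100, "c"))
    countsPerBatch(events, 1000).foreach(println)
  }
}
```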
NoSQL databases are typically distributed databases with an emphasis on scalability, high availability, and ease of administration, in contrast to established relational databases.
The goal was to create a massively decentralized, scalable database, optimized for read operations, whose data structures could be modified painlessly. The solution was found by combining two existing technologies: Google's BigTable and Amazon's Dynamo.
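Cassandra takes its data model from BigTable and its data distribution from Dynamo: nodes own tokens on a hash ring, a partition key is hashed to a token, and replicas are placed on the next nodes clockwise. The sketch below is a simplified, hypothetical model of that ring placement (similar in spirit to Cassandra's SimpleStrategy). It uses the standard library's `MurmurHash3.stringHash` as a stand-in hash, so its tokens will not match the 64-bit tokens of Cassandra's Murmur3Partitioner; `TokenRing`, `Node`, `tokenOf`, and `replicasFor` are illustrative names.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical model of Dynamo-style consistent hashing with replication.
object TokenRing {
  // A node owns the ring range (previous node's token, its own token].
  final case class Node(name: String, token: Int)

  // Stand-in partitioner: hashes a partition key to an Int token.
  def tokenOf(key: String): Int = MurmurHash3.stringHash(key)

  // Replicas: the first node whose token >= keyToken (wrapping around the ring),
  // followed by the next rf - 1 nodes clockwise.
  def replicasFor(keyToken: Int, ring: Seq[Node], rf: Int): Seq[String] = {
    val sorted = ring.sortBy(_.token)
    val start = sorted.indexWhere(_.token >= keyToken) match {
      case -1 => 0 // beyond the largest token: wrap to the first node
      case i  => i
    }
    (0 until math.min(rf, sorted.size)).map(i => sorted((start + i) % sorted.size).name)
  }

  def main(args: Array[String]): Unit = {
    val ring = Seq(Node("A", -1000), Node("B", 0), Node("C", 1000))
    println(replicasFor(500, ring, 2))                   // Vector(C, A)
    println(replicasFor(tokenOf("user42"), ring, 2))
  }
}
```

Because only the neighbors of a joining or leaving node change ownership, this scheme lets the cluster grow without reshuffling most of the data, which is the property behind Cassandra's linear scalability.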
Cassandra can create backups on the local machine. It copies the database by taking a snapshot, and a snapshot can cover all the keyspaces at once. Compression increases the capacity of the cluster nodes by reducing the size of the data on disk.
If you use incremental backups, restoring also requires the incremental backups created after the snapshot. There are multiple ways to recover from a snapshot.
In this video, you will work with DBMS optimization.
The Spark Cassandra connector is a client used to connect Spark to Cassandra. This client is special because it has been designed specifically for Spark rather than for a particular programming language.
In this video, you will learn the basics of the Spark Cassandra connector.
Spark Streaming enables high-throughput, fault-tolerant processing of live data streams. In this video, you will learn about Spark Cassandra streaming and create a stream.
Once the Spark Cassandra connector is set up, we'll look at the different operations we can perform with Cassandra.
In this video, we will use the Akka Cassandra connector to build a simple Akka application, make HTTP requests, and store the data in Cassandra.
Growing data volumes require better data processing systems, and this is where Kafka comes into the picture. In this video, you will learn the features and basics of Kafka.
We need to install Kafka to work with it. This video will enable you to do that.
Kafka is a publish-subscribe messaging system that runs as a cluster of one or more brokers. In this video, you will learn to program with Kafka clusters.
In this video, we will look at how the Kafka architecture is designed and understand the components that make it what it is.
Producers are applications that create messages and publish them to the broker. You need to understand how producers work.
Consumers are applications that read the messages stored by the broker, so they are the next step in the Kafka architecture.
To process large volumes of data, we need to integrate Kafka with other big data tools; the integration video covers this. Kafka also provides numerous tools to manage its features, which we cover in the administration video.
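A key point about producers is how a keyed message chooses its partition: Kafka's default partitioner hashes the key (the Java client uses murmur2) modulo the number of partitions, so all messages with the same key land in the same partition and keep their relative order. The in-memory sketch below models only that idea. It is not the Kafka client API, and it uses the standard library's `MurmurHash3` rather than murmur2, so its partition numbers will differ from Kafka's. `ToyProducer`, `Topic`, `partitionFor`, and `send` are hypothetical names.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical in-memory model of a keyed producer; not the Kafka API.
object ToyProducer {
  // A topic with a fixed number of partitions, each an append-only log.
  final class Topic(val partitions: Int) {
    val logs: Vector[mutable.ArrayBuffer[String]] =
      Vector.fill(partitions)(mutable.ArrayBuffer.empty[String])
  }

  // Same key -> same partition (floorMod keeps the result non-negative).
  def partitionFor(key: String, partitions: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key), partitions)

  // Appends the message and returns (partition, offset), like a broker acknowledgment.
  def send(topic: Topic, key: String, value: String): (Int, Int) = {
    val p   = partitionFor(key, topic.partitions)
    val log = topic.logs(p)
    log += value
    (p, log.size - 1)
  }
}
```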
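On the consumer side, the broker keeps the messages and each consumer tracks its own position (offset) in a partition, so different consumers can read the same log independently and at their own pace. The sketch below is a hypothetical, immutable model of that offset tracking, not the Kafka consumer API; `ToyConsumer`, `Consumer`, `nextOffset`, and `poll` are illustrative names.

```scala
// Hypothetical model of offset-based consumption from one partition.
object ToyConsumer {
  // A consumer's only state is the offset of the next message to read.
  final case class Consumer(nextOffset: Int = 0) {
    // Reads up to maxRecords messages starting at nextOffset and returns
    // them with their offsets, plus the consumer advanced past them.
    def poll(log: IndexedSeq[String], maxRecords: Int): (Seq[(Int, String)], Consumer) = {
      val end   = math.min(nextOffset + maxRecords, log.size)
      val batch = (nextOffset until end).map(o => (o, log(o)))
      (batch, copy(nextOffset = nextOffset + batch.size))
    }
  }

  def main(args: Array[String]): Unit = {
    val log = Vector("m0", "m1", "m2", "m3", "m4")
    val (first, c1) = Consumer().poll(log, 2)
    val (rest, _)   = c1.poll(log, 10)
    println(first) // offsets 0 and 1
    println(rest)  // offsets 2 to 4
  }
}
```

Keeping the position in the consumer rather than deleting messages on read is what lets a new consumer replay a partition from offset 0.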
In this video, you will be introduced to Mesos and learn about the Mesos architecture.
The resource allocation module of Mesos decides how many resources are allocated to each framework, so it is important to understand how resource allocation works in Mesos.
If you don’t want to use cloud services from Amazon, Google, or Microsoft, you can set up your cluster in your private data center. This video will teach you how to do that.
We need frameworks to deploy and discover services, balance load, and handle service failures. In this video, we will look at the frameworks that are used for service management.
Aurora is a Mesos framework for long-running services and cron jobs. Learn about job scheduling with Aurora.
Singularity is a platform that enables deploying and running services and scheduled jobs in the cloud or in data centers. Combined with Apache Mesos, it provides efficient management of the life cycle of the underlying processes and effective use of cluster resources. Let's see what it is all about.
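Mesos's default allocation module is based on Dominant Resource Fairness (DRF): a framework's dominant share is the largest fraction of any single resource it holds, and resources go next to the framework with the lowest dominant share. The sketch below is a simplified, hypothetical model of that progressive filling. Real Mesos makes resource offers that frameworks accept or decline, which this ignores. `Drf`, `Resources`, `dominantShare`, and `allocate` are illustrative names, and the numbers in the usage note are illustrative.

```scala
// Hypothetical, simplified model of Dominant Resource Fairness (DRF).
object Drf {
  final case class Resources(cpus: Double, memGb: Double) {
    def +(o: Resources): Resources = Resources(cpus + o.cpus, memGb + o.memGb)
    def fitsWithin(o: Resources): Boolean = cpus <= o.cpus && memGb <= o.memGb
  }

  // Dominant share: the largest fraction of any one resource a framework holds.
  def dominantShare(used: Resources, total: Resources): Double =
    math.max(used.cpus / total.cpus, used.memGb / total.memGb)

  // Repeatedly gives one task's worth of resources to the framework with the
  // lowest dominant share; a framework drops out once its next task no longer fits.
  def allocate(total: Resources, demands: Map[String, Resources]): Map[String, Int] = {
    var used     = demands.map { case (f, _) => f -> Resources(0, 0) }
    var tasks    = demands.map { case (f, _) => f -> 0 }
    var consumed = Resources(0, 0)
    var active   = demands.keySet
    while (active.nonEmpty) {
      val f    = active.minBy(fw => dominantShare(used(fw), total))
      val next = consumed + demands(f)
      if (next.fitsWithin(total)) {
        consumed = next
        used += f -> (used(f) + demands(f))
        tasks += f -> (tasks(f) + 1)
      } else active -= f
    }
    tasks
  }
}
```

With 9 CPUs and 18 GB in total, framework A needing 1 CPU and 4 GB per task and framework B needing 3 CPUs and 1 GB per task, `allocate` gives A three tasks and B two, so both end up with a dominant share of 2/3 (A of memory, B of CPU).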
In this video, you will learn how to run Apache Spark on Mesos.
In this video, we will deploy Apache Cassandra on Apache Mesos with the help of Marathon.
In this video, we will deploy Apache Kafka on Apache Mesos.
Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.
With an extensive library of content (more than 4000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.