Apache Spark is highly configurable and is rapidly gaining popularity in the Big Data market because its in-memory data processing makes it a high-speed data processing engine. It also has well-built libraries for machine learning and graph analytics algorithms. This makes Apache Spark well suited to solving scalable machine learning problems and to working with high-velocity, real-time streaming data. If you want to get the most out of this popular Big Data framework for all your data processing and machine learning needs, then this course is for you.
This course focuses on performing data streaming, data analytics, and machine learning with Apache Spark. You will learn to load data from a variety of structured sources such as JSON, Hive, and Parquet using Spark SQL and schema RDDs. You will also build streaming applications and learn best practices for managing high-velocity streaming and external data sources. Next, you will explore the Spark machine learning libraries and GraphX, where you will perform graph processing and analysis. Finally, you will build projects that help you put what you have learned into practice and gain a firm grasp of the topic.
Contents and Overview
This training program includes 4 complete courses, carefully chosen to give you the most comprehensive training possible.
The first course, Apache Spark in 7 Days, is designed to give you a fundamental understanding of, and hands-on experience in, writing basic code and running applications on a Spark cluster. You will work on interesting examples and assignments that demonstrate and help you understand basic operations, querying, machine learning, and streaming.
In the second course, Big Data Processing using Apache Spark, you will learn how to use Apache Spark to process big data quickly. You will learn the basics of the Spark API and its architecture in detail. You will then learn about data mining and data cleaning, where you will understand the input data structure and how input data is loaded. You will also write actual jobs that analyze data.
The third course, Big Data Analytics Projects with Apache Spark, contains various projects built on real-world examples. The first project is to find the top-selling products for an e-commerce business by efficiently joining data sets in the MapReduce paradigm. Next, a Market Basket Analysis will help you identify items likely to be purchased together and find correlations between items in a set of transactions. Moving on, you will learn about probabilistic logistic regression by identifying the author of a post. Next, you will build a content-based recommendation system for movies to predict whether an action will happen, which you will do by building a trained model. Finally, you will use a MapReduce-style Spark program to calculate mutual friends on a social network.
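To make the mutual-friends project mentioned above more concrete, here is a minimal sketch of the map and reduce steps in plain Python, without Spark. It is not the course's actual code: the toy data in FRIENDS and the function names are hypothetical, and the sketch assumes friendships are symmetric (if A lists B, then B lists A).

```python
from collections import defaultdict

# Hypothetical toy data: user -> set of direct friends (assumed symmetric).
FRIENDS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"ann", "cat"},
}


def map_phase(user, friends):
    """For each friendship, emit (sorted pair, the user's full friend set)."""
    for friend in friends:
        yield tuple(sorted((user, friend))), friends


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Intersect the friend sets collected for one pair of users."""
    return key, set.intersection(*values)


def mutual_friends(graph):
    """Return {(user_a, user_b): mutual friends} for every pair of friends."""
    mapped = (
        kv for user, friends in graph.items() for kv in map_phase(user, friends)
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for pair, mutual in sorted(mutual_friends(FRIENDS).items()):
        print(pair, sorted(mutual))
```

In a Spark job, the map step would typically be expressed with an RDD's flatMap and the grouping and intersection with reduceByKey; the plain-Python version only illustrates the data flow.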
In the fourth course, Hands-On Machine Learning with Scala and Spark, you will go through the day-to-day challenges that programmers face while implementing ML pipelines and consider different approaches and models for solving complex problems. You will learn about the most effective machine learning techniques and apply them to your advantage. You will also implement algorithms in practical hands-on projects, where you will build data models and understand how they work by using different types of algorithms.
By the end of this course, you will be able to process large datasets, extract features from them, and apply a machine learning model that is well suited to your problem.
Meet Your Expert(s):
We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:
Karen Yang has been a passionate self-learner in computer science for over 6 years. She has programming, big data processing, and engineering experience. Her recent interests include cloud computing. She previously taught for 5 years in a college evening adult program.
Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly programs in Java and Scala. He dedicates his time and effort to getting better at everything. He is currently diving into Big Data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland, including Confitura and JDD, and at the Krakow Scala User Group. He has also conducted a live coding session at Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.