Hands-On Big Data Analysis with Hadoop 3

Build effective analytics solutions with Hadoop 3
0.0 (0 ratings)
8 students enrolled
Created by Packt Publishing
Last updated 9/2019
English
English [Auto-generated]
This course includes
  • 1.5 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Store data with HDFS and learn in detail about HBase
  • Share and access data stored in HDFS through a SQL-like interface
  • Analyze real-time events using Spark Streaming
  • Perform complex big data analytics using MapReduce
  • Analyze data to perform complex processing with Hive and Pig
  • Explore functional programming using Spark
  • Learn to import data using Sqoop
Requirements
  • Basic knowledge of the Hadoop ecosystem and of Java programming is assumed.
Description

This course is your guide to performing real-time data analytics and stream processing with Spark. You will use components and tools such as HDFS and HBase to process raw data, and learn how Hive and Pig aid in this process.

In this course, you will start off by learning data analysis techniques with Hadoop, using tools such as Hive, and then apply these techniques to real-world big data applications. From there, you will delve into Spark and its related tools to perform real-time data analytics, streaming, and batch processing in your applications.

Finally, you'll learn how to extend your analytics solutions to the cloud.

Please note that this course is based on Hadoop 3.0, but the code used in the course is compatible with Hadoop 3.2.

About the Author

Tomasz Lelek is a software engineer, programming mostly in Java and Scala. He has been working with the Spark and ML APIs for the past 5 years, with production experience in processing petabytes of data.

He is passionate about nearly everything associated with software development and believes that we should always consider different solutions and approaches before attempting to solve a problem. He has recently spoken at conferences in Poland, including Confitura and JDD (Java Developers Day), and at the Krakow Scala User Group, and has conducted a live coding session at the Geecon Conference.

Who this course is for:
  • This course is for big data professionals looking to build quick and efficient data analytics solutions for their big data applications with Hadoop 3.
Course content
18 lectures 01:36:46
+ HDFS and HBase – The Hadoop Database
6 lectures 26:35

This video provides an overview of the entire course.

Preview 01:51

In this video, you will learn the purpose of using HBase.

   •  Go through column-oriented database concepts

   •  Look at the pros of HBase

Why HBase?
04:29

In this video, we will be looking at HDFS and HBase.

   •  Explore the HBase architecture

   •  Explore the HBase data structure

HDFS and HBase
02:56

In this video, we will be reviewing the concepts of column-oriented databases.

   •  Know what a column-oriented DB is

   •  Explore the concepts

   •  Learn when to use a column-oriented DB

Column-Oriented Database Concepts
04:02

In this video, you will learn how to create an HBase database using the HBase Java client library.

   •  Learn how to create a table and family

   •  Learn how to create rows and columns

   •  Learn how to retrieve data

Creating an HBase Database – Using HBase from Java
08:27
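
To give a flavor of what this lecture covers, here is a minimal sketch of the HBase client API (the same API the video drives from Java, shown here in Scala for consistency with the Spark examples below, and assuming the HBase 2.x client). The table name "users", the column family "profile", and the row data are illustrative only.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Get, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HBaseQuickstart {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath (for example, the Cloudera sandbox used in the course)
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val admin = connection.getAdmin
      val tableName = TableName.valueOf("users") // illustrative table name

      // Create the table with a single column family, "profile"
      if (!admin.tableExists(tableName)) {
        val descriptor = TableDescriptorBuilder.newBuilder(tableName)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("profile"))
          .build()
        admin.createTable(descriptor)
      }

      val table = connection.getTable(tableName)

      // Insert a row: row key "user1", column profile:name
      val put = new Put(Bytes.toBytes("user1"))
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice"))
      table.put(put)

      // Retrieve the value we just wrote
      val result = table.get(new Get(Bytes.toBytes("user1")))
      val name = Bytes.toString(result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name")))
      println(s"profile:name = $name")

      table.close()
    } finally {
      connection.close()
    }
  }
}
```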

In this video, we will be using a tool called Sqoop to import data into an HDFS cluster.

   •  Set up the Cloudera sandbox and start the Cloudera VM in VirtualBox

   •  Look at the tools that are available

   •  Use Sqoop to import data to HDFS

Using Sqoop to Import Data to HDFS
04:50
+ Data Processing Using MapReduce
5 lectures 28:18

In this video, we will be looking at the MapReduce job architecture.

   •  Explore what a MapReduce job is

   •  Learn how to calculate word count in a distributed way

Preview 05:06
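
The canonical way to see what a MapReduce job is, before moving on to Spark, is the distributed word count. Below is a hedged sketch written against the org.apache.hadoop.mapreduce API (in Scala, for consistency with the other examples on this page); the video may present it differently. The mapper emits (word, 1) pairs, and the reducer sums the counts for each word.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map phase: each mapper reads one split of the input and emits (word, 1) for every word
class TokenizerMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      context.write(word, one)
    }
}

// Reduce phase: all pairs for the same word reach one reducer, which sums them
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    context.write(key, new IntWritable(sum))
  }
}

object WordCountJob {
  def main(args: Array[String]): Unit = {
    // args(0) = input directory in HDFS, args(1) = output directory (must not exist yet)
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(classOf[TokenizerMapper])
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```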

In this video, we will delve into Spark’s key concepts: the Spark Context, the Driver, and RDDs.

   •  Explore the Spark architecture

   •  Explore RDD and learn how it is the main building block of a Spark program

   •  Perform partitioning and distinguish between transformations and actions

Learning Spark’s Key Concepts – Spark Context, Driver, and RDD
06:14
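
As a rough sketch of these concepts (the application name, data, and the local[*] master are only for illustration), the driver program below creates a SparkContext and a small partitioned RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextAndRdd {
  def main(args: Array[String]): Unit = {
    // The driver program creates the SparkContext; "local[*]" keeps this demo in a single JVM
    val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // An RDD is an immutable collection split into partitions that executors process in parallel
    val tools = sc.parallelize(Seq("hdfs", "hbase", "hive", "pig", "spark"), numSlices = 2)
    println(s"number of partitions: ${tools.getNumPartitions}")

    // Each partition is handled by its own task
    tools.foreachPartition(partition => println(partition.mkString("[", ", ", "]")))

    sc.stop()
  }
}
```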

In this video, we will delve into functional programming using Spark.

   •  Experiment with the RDD API

   •  Experiment with the DataFrame API

   •  Experiment with the Dataset API

Spark API – Functional Programming Using Spark
05:02
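
A minimal sketch of the three APIs side by side, assuming Spark 2.x or later and an illustrative Person record (the names and data are made up for the example):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type used only for this illustration
case class Person(name: String, age: Int)

object SparkApis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-apis").master("local[*]").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Alice", 29), Person("Bob", 41), Person("Carol", 35))

    // RDD API: functional operations directly on JVM objects
    val rdd = spark.sparkContext.parallelize(people)
    println(rdd.filter(_.age > 30).map(_.name).collect().mkString(", "))

    // DataFrame API: relational, column-based operations
    val df = people.toDF()
    df.filter($"age" > 30).select("name").show()

    // Dataset API: typed like an RDD, optimized like a DataFrame
    val ds = people.toDS()
    ds.filter(_.age > 30).map(_.name).show()

    spark.stop()
  }
}
```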

In this video, we will be looking at Spark transformations and actions.

   •  Use transformations API methods

   •  Use actions API methods

Preview 06:17
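
In Spark, transformations only build up an RDD lineage and are evaluated lazily, while actions trigger execution and return (or write) results. A small sketch with illustrative data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationsVsActions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("transformations-actions").setMaster("local[*]"))

    val tools = sc.parallelize(Seq("hdfs", "hbase", "hive", "pig", "spark", "spark"))

    // Transformations are lazy: each call only records a step in the lineage
    val unique = tools.distinct()          // transformation
    val sorted = unique.sortBy(identity)   // transformation
    val upper  = sorted.map(_.toUpperCase) // transformation

    // Nothing has run yet; each action below triggers execution of the pipeline
    println(upper.count())                 // action: returns a count to the driver
    println(upper.take(3).mkString(", "))  // action: fetches the first three elements
    upper.collect().foreach(println)       // action: brings the whole RDD back to the driver

    sc.stop()
  }
}
```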

In this video, we will be writing MapReduce jobs using Apache Spark.

   •  Calculate word count in a MapReduce fashion

   •  Create a Spark job

   •  Test the job

Writing MapReduce Jobs Using Apache Spark
05:39
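
A compact sketch of word count "in a MapReduce fashion" with Spark's RDD API; the input and output paths are placeholders passed in via spark-submit, and the tokenization rule is an assumption:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // args(0) = input path (for example, an HDFS directory), args(1) = output directory
    val sc = new SparkContext(new SparkConf().setAppName("spark-word-count"))

    sc.textFile(args(0))
      .flatMap(_.toLowerCase.split("\\W+"))   // the "map" side: lines -> words
      .filter(_.nonEmpty)
      .map(word => (word, 1))                 // emit (word, 1) pairs
      .reduceByKey(_ + _)                     // the "reduce" side: sum counts per word
      .saveAsTextFile(args(1))

    sc.stop()
  }
}
```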
+ Analyzing Data Using Hive and Pig
4 lectures 24:56

In this video, you will delve into the Pig tool.

   •  Explore what Apache Pig is

   •  Explore an example Pig job

   •  Learn when to use Pig

Preview 06:27

In this video, you will delve into Hive.

   •  Learn what Hive is and when to use it

   •  Learn how Hive uses HDFS

   •  Know what a Metastore is

Hive Architecture and Use Cases
05:45

In this video, you will delve into HQL, the Hive Query Language.

   •  Learn how to create a table on top of HDFS

   •  Learn how to define partitions

   •  Learn how to query a table using HQL

Hive Query Language
05:39
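
In the lecture, the HQL is run in Hive itself; purely as an illustration, the same kind of statements can also be issued from Scala through Spark's Hive support. The table name, columns, and partition value below are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object HiveQlExamples {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark execute HiveQL against the Hive metastore
    val spark = SparkSession.builder()
      .appName("hql-examples")
      .enableHiveSupport()
      .getOrCreate()

    // Create a partitioned table whose data files live in HDFS
    spark.sql(
      """CREATE TABLE IF NOT EXISTS page_views (
        |  user_id STRING,
        |  url     STRING
        |)
        |PARTITIONED BY (view_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Query a single partition; partition pruning means only that directory is read
    spark.sql(
      """SELECT url, COUNT(*) AS views
        |FROM page_views
        |WHERE view_date = '2019-09-01'
        |GROUP BY url
        |ORDER BY views DESC""".stripMargin).show()

    spark.stop()
  }
}
```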

In this video, we will be using Hive and Pig to perform MapReduce queries.

   •  Create a Pig query that fetches data from HDFS

   •  Create a Hive query that fetches data from HDFS

Using Hive and Pig to Perform MapReduce Query
07:05
+ Performing Real-Time Events Analysis Using Spark Streaming
3 lectures 16:57

In this video, you will learn how to handle time in high-velocity streams.

   •  Learn what event time and processing time are

   •  Explore ingestion time

   •  Explore how to handle them

Handling Time in High-Velocity Streams
05:57
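
As a hedged sketch of event-time handling (using Spark Structured Streaming, a socket source, and an assumed "yyyy-MM-dd HH:mm:ss,page" line format), the watermark below tells Spark how long to keep waiting for late events before finalizing each event-time window:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

object EventTimeWindows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-time").master("local[*]").getOrCreate()
    import spark.implicits._

    // Socket source keeps the sketch self-contained; each line is
    // "<yyyy-MM-dd HH:mm:ss>,<page>" (the format is an assumption)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    val views = lines.as[String].map { line =>
      val Array(ts, page) = line.split(",", 2)
      (java.sql.Timestamp.valueOf(ts), page)
    }.toDF("event_time", "page")

    // Aggregate on event time (when the view happened), not processing time (when Spark saw it);
    // the watermark bounds how long we wait for late, out-of-order events
    val counts = views
      .withWatermark("event_time", "10 minutes")
      .groupBy(window(col("event_time"), "5 minutes"), col("page"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```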

You will learn how to build a streaming application.

   •  Handle out-of-order events

   •  Learn how to verify the order of events

   •  Implement sorting in a stream of events

Building Streaming Application
04:09
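
A plain-Scala sketch of the kind of order checks this lecture builds; the PageView type and its fields are hypothetical stand-ins for the events used in this section:

```scala
import java.time.Instant

// Hypothetical event type for the page-view stream used in this section
case class PageView(userId: String, url: String, eventTime: Instant)

object OrderHandling {
  // Verify whether a batch of events arrived in event-time order
  def isInOrder(events: Seq[PageView]): Boolean =
    events.sliding(2).forall {
      case Seq(a, b) => !a.eventTime.isAfter(b.eventTime)
      case _         => true // zero or one event is trivially ordered
    }

  // Restore order inside a buffered window before downstream processing
  def sortByEventTime(events: Seq[PageView]): Seq[PageView] =
    events.sortBy(_.eventTime.toEpochMilli)

  def main(args: Array[String]): Unit = {
    val now = Instant.now()
    val events = Seq(
      PageView("u1", "/home", now.plusSeconds(2)),
      PageView("u1", "/cart", now),                 // arrived late, out of order
      PageView("u2", "/home", now.plusSeconds(5))
    )
    println(s"in order: ${isInOrder(events)}")
    sortByEventTime(events).foreach(println)
  }
}
```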

In this video, you will be filtering bots from a stream of page view events.

   •  Implement stream processing that filters out bots

   •  Apply the deduplication we implemented to make stream processing robust

   •  Apply the order verification we implemented to make stream processing robust

Filtering Bots from a Stream of Page View Events
06:51
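
A minimal Structured Streaming sketch along the same lines (socket source, an assumed "eventId,userAgent,url" line format, and a naive user-agent regex): it filters out bot traffic and deduplicates replayed events by event ID.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterBots {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("filter-bots").master("local[*]").getOrCreate()
    import spark.implicits._

    // Socket source keeps the sketch self-contained; each line is
    // "<eventId>,<userAgent>,<url>" (the format and field names are assumptions)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    val views = lines.as[String].map { line =>
      val Array(id, agent, url) = line.split(",", 3)
      (id, agent, url)
    }.toDF("event_id", "user_agent", "url")

    val cleaned = views
      .filter(!col("user_agent").rlike("(?i)bot|crawler|spider"))  // drop bot traffic
      .dropDuplicates("event_id")                                  // deduplicate replayed events
      // note: in production a watermark would bound the deduplication state

    cleaned.writeStream
      .outputMode("append")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```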