Solving 10 Hadoop'able Problems
4.4 (2 ratings)
63 students enrolled

Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop.
Created by Packt Publishing
Last updated 5/2019
English
English [Auto]
This course includes
  • 3 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Explore the Hadoop big data Ecosystem in a nutshell
  • Process payment data from an event stream using the streaming API: Payment Analyzer
  • Detect BOT traffic using Spark Streaming, make log data queryable, and investigate customer data
  • Supply chain analysis: find top-selling items with a streaming approach and enrich them with additional information
  • Analyze Customer churn amounts quantitatively with DataFrame queries
  • Perform IoT sensor data analysis with device response to system failures and data streams
  • High-performance computation with neighborhood aggregations
  • Page ranking using Spark GraphX
  • Threat Analysis – Analyzing weblogs for suspicious activity and anomalies in network traffic
  • Extract information from unstructured text via Spark DataFrames
  • Perform sentiment analysis of posts using Logistic Regression, and find the author of a post
  • Find what product users want to buy using Cloudera Sandbox Toolkit
  • Use movie history to suggest content, and test and experiment with a Recommendation Engine
Course content
40 lectures 03:12:23
+ Core Components
3 lectures 14:37

This video gives an overview of the entire course.

Preview 02:52

In this video, we will see what HDFS is.

Hadoop Distributed File System (HDFS)
06:59

In this video, we will learn about YARN.

Distributed Compute Capability YARN
04:46
+ Downstream Ecosystem
5 lectures 28:07

In this video, we will see what Hive is.

Preview 07:23

In this video, we will see what publish-subscribe (pub-sub) messaging is.

Message Queuing and Data Ingestion Kafka
03:50

In this video, we will see some column-oriented database concepts.

NoSQL Datastores – Hadoop HBase, Accumulo
05:32

In this video, we will see Spark architecture.

Machine Learning – Spark and Spark MLlib
06:41

In this video, we will explain Spark Streaming architecture.

Stream Processing – Spark Streaming
04:41
+ Financial, Trade, and Time Series Applications – Trade Surveillance
3 lectures 16:16

In this video, we will process payment data.

Processing Payment Data from an Event Stream
04:50

In this video, we will implement real-time logic on a stream of events.

Advanced Aggregations Using Streaming API – PaymentAnalyzer
04:28

In this video, we will save data to HBase.

Storing Time Series Data in HBase
06:58
+ AdTech – Ad Targeting
3 lectures 17:15

In this video, we will implement a streaming job that filters out bots.

Detecting BOT Traffic Using Spark Streaming
06:08

In this video, we will implement an HDFS sink that saves data into HDFS.

Make Web Log Data Queryable – Hive Sink
06:48

In this video, we will investigate customer data in Hive.

Investigating Customers Data in Hive
04:19
+ Business/Point of Sale – Transaction Analysis
2 lectures 13:18

In this video, we will use a streaming approach to find the top-selling item.

Trending Supply Chain – Finding Top Seller Item in a Streaming Way
08:01

In this video, we will enrich transactions with additional information.

Enriching Top Sellers with Additional Information
05:17
+ Customer Churn Analysis
2 lectures 10:32

In this video, we will perform a quantitative analysis of customer churn.

Analyzing Customer Churn (Quantitative) Using DataFrame Queries
05:36
In this video, we will analyze the amounts of customer churn based on transactional amounts.
Analyzing Customer Churn (Amounts) Using DataFrame Queries
04:56
+ Internet of Things
3 lectures 19:07

In this video, we will take a look at stream processing of sensor data.

Storing Low Granularity Structured Sensor Data in HBase
08:41

In this video, we will insert data into HBase from a Spark Streaming job.

Consuming Sensor Data Stored in HBase – Scan and Count
03:51
In this video, we will calculate statistics from sensors.
Building Summaries on Data Streaming from Devices
06:35
+ Scientific and High Performance Computing
6 lectures 20:22

In this video, we will see how to represent a graph.

Introducing Spark GraphX – How to Represent a Graph?
02:13

In this video, we will perform operations on a graph using GraphX.

Perform Graph Operations Using GraphX
03:56
In this video, we will count the degrees of vertices.
Counting Degree of Vertices
03:20

In this video, we will calculate the average of a neighborhood.

Neighborhood Aggregations – Collecting Neighbors
03:45

In this video, we will see what connected components are.

Structural Operators – Connected Components
02:09

In this video, we will find the page rank using Spark GraphX.

Page Rank Using Spark GraphX
04:59
+ Security Concerns Intrusion Detection – Threat Analysis
4 lectures 12:37
In this video, we will see what an anomaly is and how to detect it.
Anomaly Detection
02:16
The aim of this video is to analyse web logs for suspicious activity and load data into Spark.
Analyzing Web Logs for Suspicious Activity and Loading into Spark
02:11

In this video, we will implement clustering in Spark.

Implementing Clustering – Choosing Number of Clusters
03:59

In this video, we will detect anomalies in network traffic.

Detecting Anomalies in Network Traffic
04:11
+ Text Analysis
5 lectures 13:59

In this video, we will analyse a post for an author.

Analyzing Post for an Author
03:23
In this video, we will extract information from unstructured text.
Extracting Information from Unstructured Text
01:01
In this video, we will get to know the algorithms for transforming text into a vector of numbers.
Extracting Information Via Spark DataFrame
03:36

In this video, we will see what supervised and unsupervised ML are.

Sentiment Analysis of Posts Using Logistic Regression
03:36

In this video, we will find the author of a post.

Finding an Author of a Post
02:23
Requirements
  • Knowledge of solving data problems is required
Description

The Apache Hadoop ecosystem is a popular and powerful tool to solve big data problems. With so many competing tools to process data, many users want to know which particular problems are well suited to Hadoop, and how to implement those solutions.

To know what types of problems are Hadoop-able, it is good to start with a basic understanding of the core components of Hadoop. You will learn about the ecosystem designed to run on top of Hadoop as well as software that is deployed alongside it. These tools give us the building blocks to build data processing applications. This course covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, presented as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.

By the end of this course, you will have been exposed to a wide variety of Hadoop software and examples of how it is used to solve common big data problems.

About the Author

Tomasz Lelek is a Software Engineer who programs mostly in Java and Scala. He is a fan of microservice architectures and functional programming. He dedicates considerable time and effort to becoming better every day. Recently, he's been delving into big data technologies such as Apache Spark and Hadoop. He is passionate about nearly everything associated with software development.

Tomasz thinks that we should always try to consider different solutions and approaches to solving a problem. Recently, he spoke at several conferences in Poland, among them Confitura and JDD (Java Developer's Day), as well as at the Krakow Scala User Group.

He also conducted a live coding session at Geecon Conference.

Who this course is for:
  • Data Engineers, and Machine Learning and Data analysts