Flume and Sqoop for Ingesting Big Data
4.1 (43 ratings)
1,609 students enrolled

Import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL
Created by Loony Corn
Last updated 10/2016
English
Includes:
  • 2.5 hours on-demand video
  • 29 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Use Flume to ingest data to HDFS and HBase
  • Use Sqoop to import data from MySQL to HDFS and Hive
  • Ingest data from a variety of sources including HTTP, Twitter and MySQL
Requirements
  • Knowledge of HDFS is a prerequisite for the course
  • HBase and Hive examples assume basic understanding of HBase and Hive shells
  • Most of the examples require a working HDFS installation
Description

Taught by a team that includes two Stanford-educated ex-Googlers, with decades of practical experience working with Java and with billions of rows of data.

Use Flume and Sqoop to import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL

Let’s parse that.

Import data: Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP, MySQL and Twitter, to data stores like HDFS, HBase and Hive. Both tools come with built-in functionality and shield users from the complexity of transporting data between these systems.

Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase. 

Sqoop: Use Sqoop to bulk-import data from traditional relational databases to Hadoop storage architectures like HDFS or Hive.
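
To make that concrete: a Flume agent is a long-running process started from a configuration file, while Sqoop is a one-shot command run per transfer. Launching an agent looks roughly like this (the config file and agent name here are illustrative, not taken from the course):

  # Start a Flume agent named "agent1", defined in agent.conf,
  # logging to the console for easy debugging
  flume-ng agent --conf ./conf --conf-file agent.conf --name agent1 \
      -Dflume.root.logger=INFO,console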

What's Covered:

Practical implementations for a variety of sources and data stores:

  • Sources: Twitter, MySQL, Spooling Directory, HTTP
  • Sinks: HDFS, HBase, Hive

Flume features:

Flume Agents, Flume Events, Event bucketing, Channel selectors, Interceptors

Sqoop features:

Sqoop import from MySQL, Incremental imports using Sqoop Jobs


Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!


Who is the target audience?
  • Yep! Engineers building an application with HDFS/HBase/Hive as the data store
  • Yep! Engineers who want to port data from legacy data stores to HDFS
Curriculum For This Course
17 Lectures
02:16:04
You, This Course and Us
1 Lecture 01:46

Let's start with an introduction to the course and what we'll know by the end of it.

Preview 01:46
Why do we need Flume and Sqoop?
1 Lecture 18:23

Let's understand Flume and Sqoop and their role in the Hadoop ecosystem.

Preview 18:23
Flume
11 Lectures 01:33:51

Installing Flume is pretty straightforward. 

Installing Flume
02:43

A Flume Agent is the most basic unit that can exist independently in Flume. An Agent is made up of Sources, Sinks and Channels.

Preview 10:57
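
As a sketch of how those three pieces are declared and wired together in an agent's configuration file (names like agent1, src1, ch1 and snk1 are placeholders):

  # Declare the agent's components
  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = snk1

  # Wire them up: a source writes to one or more channels,
  # a sink drains exactly one channel
  agent1.sources.src1.channels = ch1
  agent1.sinks.snk1.channel    = ch1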

Our first example of a Flume Agent using a Spooling Directory Source, a File Channel and a Logger Sink

Example 1 : Spool to Logger
14:34
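
A minimal configuration along these lines captures the idea (the spool directory path is illustrative):

  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = snk1

  # Spooling Directory source: ingests files dropped into a watched directory;
  # Flume renames each file once it has been fully consumed
  agent1.sources.src1.type     = spooldir
  agent1.sources.src1.spoolDir = /tmp/spool
  agent1.sources.src1.channels = ch1

  # File channel: buffers events durably on disk between source and sink
  agent1.channels.ch1.type = file

  # Logger sink: prints events to Flume's log, handy for testing
  agent1.sinks.snk1.type    = logger
  agent1.sinks.snk1.channel = ch1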

A Flume event represents one record of data. Flume events consist of event headers and an event body.

Flume Events are how data is transported
06:07
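
For instance, in the JSON representation accepted by Flume's HTTP source (used in Example 3 below), a single event looks something like this (the header names and values are made up):

  [{
    "headers": { "host": "web-01", "timestamp": "1480000000000" },
    "body": "one record of data"
  }]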

Learn how to use HDFS as a sink with Flume

Example 2 : Spool to HDFS
09:08
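
Relative to Example 1, only the sink definition needs to change; a sketch, with an illustrative HDFS path:

  # HDFS sink: writes incoming events into files under the given path
  agent1.sinks.snk1.type      = hdfs
  agent1.sinks.snk1.hdfs.path = hdfs://localhost:9000/flume/events

  # Write plain text rather than the default SequenceFile format
  agent1.sinks.snk1.hdfs.fileType    = DataStream
  agent1.sinks.snk1.hdfs.writeFormat = Text

  agent1.sinks.snk1.channel = ch1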

HTTP Sources can be pretty handy when you have an application capable of making POST requests.

Example 3: HTTP to HDFS
09:24
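
A sketch of the source side (the port number is arbitrary):

  # HTTP source: accepts events POSTed as JSON on the given port
  agent1.sources.src1.type     = http
  agent1.sources.src1.port     = 44444
  agent1.sources.src1.channels = ch1

An application, or plain curl, can then post events to it:

  curl -X POST -d '[{"headers": {}, "body": "hello flume"}]' http://localhost:44444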

Event Headers in Flume carry useful metadata. Use event headers to bucket events in HDFS.

Example 4: HTTP to HDFS with Event Bucketing
05:40
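
Concretely, the HDFS sink's path can reference header values and timestamps through escape sequences; in this sketch the "topic" header is hypothetical:

  # Events land in one directory per topic value, per day
  agent1.sinks.snk1.type      = hdfs
  agent1.sinks.snk1.hdfs.path = /flume/%{topic}/%y-%m-%d

  # The date escapes need a timestamp: either use local time, as here,
  # or attach a timestamp header upstream (e.g. with the timestamp interceptor)
  agent1.sinks.snk1.hdfs.useLocalTimeStamp = true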

Let's see how to use an HBase sink as the endpoint of a Flume Agent.

Example 5: Spool to HBase
06:22
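
Once more, only the sink changes; a sketch, where the table and column family are illustrative and must already exist in HBase:

  # HBase sink: writes each event into the given table and column family
  agent1.sinks.snk1.type         = hbase
  agent1.sinks.snk1.table        = flume_events
  agent1.sinks.snk1.columnFamily = cf1
  agent1.sinks.snk1.channel      = ch1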

Send events from HTTP to HDFS and a Logger at the same time. See how to route events using channel selectors.

Example 6: Using multiple sinks and Channel selectors
09:43
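
The fan-out looks roughly like this: one source feeding two channels, each drained by its own sink. A replicating selector copies every event to all channels; a multiplexing selector would instead route events by a header value.

  agent1.sources  = src1
  agent1.channels = ch1 ch2
  agent1.sinks    = snk1 snk2

  # Replicating selector: every event is copied to both channels
  agent1.sources.src1.channels      = ch1 ch2
  agent1.sources.src1.selector.type = replicating

  agent1.channels.ch1.type = memory
  agent1.channels.ch2.type = memory

  # HDFS gets one copy of each event, the logger gets the other
  agent1.sinks.snk1.type      = hdfs
  agent1.sinks.snk1.hdfs.path = /flume/events
  agent1.sinks.snk1.channel   = ch1
  agent1.sinks.snk2.type      = logger
  agent1.sinks.snk2.channel   = ch2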

Connect to the Twitter API using Flume. Use an Interceptor to do regex filtering within Flume itself!

Example 7: Twitter Source with Interceptors
10:48
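
A sketch of the source definition, using Flume's built-in Twitter source; the four credentials come from a Twitter developer app (the values and the regex below are placeholders):

  # Experimental built-in Twitter source
  agent1.sources.src1.type              = org.apache.flume.source.twitter.TwitterSource
  agent1.sources.src1.consumerKey       = YOUR_CONSUMER_KEY
  agent1.sources.src1.consumerSecret    = YOUR_CONSUMER_SECRET
  agent1.sources.src1.accessToken       = YOUR_ACCESS_TOKEN
  agent1.sources.src1.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET

  # Regex-filtering interceptor: with excludeEvents = false,
  # only events whose body matches the regex are kept
  agent1.sources.src1.interceptors                  = i1
  agent1.sources.src1.interceptors.i1.type          = regex_filter
  agent1.sources.src1.interceptors.i1.regex         = hadoop
  agent1.sources.src1.interceptors.i1.excludeEvents = false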

If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables
08:25
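
The gist, for a bash shell (the install path is illustrative):

  # Append Flume's bin directory to PATH in ~/.bashrc (or ~/.bash_profile on a Mac)
  export PATH="$PATH:/usr/local/flume/bin"

  # Reload the file so the current shell picks up the change
  source ~/.bashrc
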
Sqoop
4 Lectures 22:04

Install Sqoop and the connector for Sqoop to MySQL

Installing Sqoop
04:25
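
The connector step usually amounts to making the MySQL JDBC driver visible to Sqoop (the jar version and paths are illustrative):

  # Copy the MySQL JDBC connector jar into Sqoop's lib directory
  cp mysql-connector-java-5.1.40-bin.jar /usr/local/sqoop/lib/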

Example 8: Sqoop Import from MySQL to HDFS
07:49
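
The core command looks along these lines (connection details, table and paths are illustrative):

  # Import the "customers" table from MySQL into an HDFS directory;
  # -P prompts for the password, -m 1 runs a single map task
  sqoop import \
      --connect jdbc:mysql://localhost/mydb \
      --username root -P \
      --table customers \
      --target-dir /user/hadoop/customers \
      -m 1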

Example 9: Sqoop Import from MySQL to Hive
04:26
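
Much the same command; --hive-import tells Sqoop to create and load a Hive table instead (by default named after the source table):

  # Import the same table straight into Hive
  sqoop import \
      --connect jdbc:mysql://localhost/mydb \
      --username root -P \
      --table customers \
      --hive-import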

Example 10: Incremental Imports using Sqoop Jobs
05:24
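
A sketch of a saved incremental job: Sqoop stores the last imported value of the check column between runs, so each execution picks up only new rows (the job, table and column names are illustrative):

  # Create a named job that appends rows with id greater than the stored last value
  sqoop job --create customers_incr -- import \
      --connect jdbc:mysql://localhost/mydb \
      --username root -P \
      --table customers \
      --target-dir /user/hadoop/customers \
      --incremental append \
      --check-column id \
      --last-value 0

  # Run it; each execution updates the stored last value
  sqoop job --exec customers_incr
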
About the Instructor
Loony Corn
4.3 Average rating
4,985 Reviews
38,981 Students
77 Courses
An ex-Google, Stanford and Flipkart team

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!

We hope you will try our offerings, and think you'll like them :-)