Flume and Sqoop for Ingesting Big Data

Import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL
Best Seller
4.4 (49 ratings)
1,825 students enrolled
Created by Loony Corn
Last updated 10/2016
  • 2.5 hours on-demand video
  • 29 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Use Flume to ingest data to HDFS and HBase
  • Use Sqoop to import data from MySQL to HDFS and Hive
  • Ingest data from a variety of sources including HTTP, Twitter and MySQL
Requirements
  • Knowledge of HDFS is a prerequisite for the course
  • HBase and Hive examples assume basic understanding of HBase and Hive shells
  • HDFS is required to run most of the examples, so you'll need to have a working installation of HDFS

Taught by a team that includes two Stanford-educated ex-Googlers, with decades of practical experience working with Java and with billions of rows of data.

Use Flume and Sqoop to import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL

Let’s parse that.

Import data: Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP endpoints, MySQL and Twitter, to data stores like HDFS, HBase and Hive. Both tools come with built-in functionality that abstracts away the complexity of transporting data between these systems.

Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase. 
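A minimal agent definition gives a feel for how this wiring works. The sketch below, in Flume's standard properties-file format, describes a Spooling Directory source feeding a File channel and a Logger sink; the agent name `a1` and all directory paths are illustrative, not from the course:

```properties
# a1 is the agent name; it is passed to flume-ng with --name a1
a1.sources  = src1
a1.channels = ch1
a1.sinks    = snk1

# Source: watch a local directory for new files (path is hypothetical)
a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /tmp/spool
a1.sources.src1.channels = ch1

# Channel: durable file-backed buffer between source and sink
a1.channels.ch1.type          = file
a1.channels.ch1.checkpointDir = /tmp/flume/checkpoint
a1.channels.ch1.dataDirs      = /tmp/flume/data

# Sink: log each event to the console (useful for debugging)
a1.sinks.snk1.type    = logger
a1.sinks.snk1.channel = ch1
```

An agent like this would be started with something like `flume-ng agent --conf-file spool-logger.conf --name a1 -Dflume.root.logger=INFO,console`.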

Sqoop: Use Sqoop to bulk import data from traditional RDBMS to Hadoop storage architectures like HDFS or Hive. 
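A typical bulk import is a single command. In the sketch below, the database name, table, credentials and target directory are all placeholders:

```shell
# Placeholders: mydb, dbuser, orders, /user/hadoop/orders
# -P prompts for the password; -m 1 runs a single map task
sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 1
```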

What's Covered:

Practical implementations for a variety of sources and data stores:

  • Sources: Twitter, MySQL, Spooling Directory, HTTP
  • Sinks: HDFS, HBase, Hive

Flume features: Flume Agents, Flume Events, event bucketing, channel selectors, interceptors

Sqoop features: Sqoop import from MySQL, incremental imports using Sqoop Jobs

Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!

Who is the target audience?
  • Yep! Engineers building an application with HDFS/HBase/Hive as the data store
  • Yep! Engineers who want to port data from legacy data stores to HDFS
Curriculum For This Course
17 Lectures
You, This Course and Us
1 Lecture 01:46

Let's start with an introduction to the course, and what you'll know by the end of it.

Preview 01:46
Why do we need Flume and Sqoop?
1 Lecture 18:23

Let's understand Flume and Sqoop and their role in the Hadoop Ecosystem

Preview 18:23
Flume
11 Lectures 01:33:51

Installing Flume is pretty straightforward. 

Installing Flume

A Flume Agent is the most basic unit that can exist independently in Flume. An Agent is made up of Sources, Sinks and Channels.

Preview 10:57

Our first example of a Flume Agent using a Spooling Directory Source, a File Channel and a Logger Sink

Example 1: Spool to Logger

A Flume event represents one record of data. Flume events consist of event headers and the event body.

Flume Events are how data is transported

Learn how to use HDFS as a sink with Flume

Example 2: Spool to HDFS
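Switching the sink from Logger to HDFS is mostly a matter of sink properties. A sketch, where the agent/sink names and the HDFS URL are illustrative:

```properties
# Write events as plain text files under an HDFS directory
a1.sinks.snk1.type             = hdfs
a1.sinks.snk1.hdfs.path        = hdfs://localhost:9000/flume/events
a1.sinks.snk1.hdfs.fileType    = DataStream
a1.sinks.snk1.hdfs.writeFormat = Text
a1.sinks.snk1.channel          = ch1
```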

HTTP Sources can be pretty handy when you have an application capable of making POST requests.

Example 3: HTTP to HDFS
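To sketch the idea: an HTTP source listens on a port, and its default JSON handler accepts a JSON array of events, each with `headers` and `body`. The port and agent names below are illustrative:

```shell
# Corresponding source config (properties file):
#   a1.sources.src1.type = http
#   a1.sources.src1.port = 44444
# An application can then POST events to the agent:
curl -X POST -H 'Content-Type: application/json' \
  -d '[{"headers": {"type": "click"}, "body": "user 42 clicked checkout"}]' \
  http://localhost:44444
```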

Event Headers in Flume carry useful metadata. Use event headers to bucket events in HDFS.

Example 4: HTTP to HDFS with Event Bucketing
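Bucketing works by referencing header values in the HDFS sink's path with `%{header}` escapes; time escapes like `%y-%m-%d` need a timestamp header, or `useLocalTimeStamp`. Paths below are illustrative:

```properties
# Events land in per-type, per-day directories based on their headers
a1.sinks.snk1.type = hdfs
a1.sinks.snk1.hdfs.path = hdfs://localhost:9000/flume/events/%{type}/%y-%m-%d
a1.sinks.snk1.hdfs.useLocalTimeStamp = true
```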

Let's see how to use an HBase sink as the endpoint of the Flume Agent

Example 5: Spool to HBase

HTTP to HDFS and Logger at the same time. See how to route events using channel selectors.

Example 6: Using multiple sinks and Channel selectors
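A multiplexing channel selector routes each event to a channel based on a header value, with unmatched events falling through to a default. Names and the header below are illustrative:

```properties
# One source feeding two channels, routed by the "type" header
a1.channels = ch1 ch2
a1.sources.src1.channels = ch1 ch2
a1.sources.src1.selector.type          = multiplexing
a1.sources.src1.selector.header        = type
a1.sources.src1.selector.mapping.click = ch1
a1.sources.src1.selector.default       = ch2
```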

Connect with the Twitter API using Flume. Use an Interceptor to do Regex filtering within Flume itself! 

Example 7: Twitter Source with Interceptors
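A regex-filtering interceptor sits on the source and keeps (or drops) events whose body matches a pattern. A sketch, with an illustrative pattern:

```properties
# Keep only events whose body mentions "hadoop" (case-sensitive)
a1.sources.src1.interceptors = i1
a1.sources.src1.interceptors.i1.type  = regex_filter
a1.sources.src1.interceptors.i1.regex = .*hadoop.*
a1.sources.src1.interceptors.i1.excludeEvents = false
```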

If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables
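For instance, if Flume were unpacked under `/usr/local/flume` (a hypothetical path), making `flume-ng` visible to the shell is one line:

```shell
# Append the (hypothetical) Flume bin directory to the PATH
export PATH="$PATH:/usr/local/flume/bin"

# The last PATH entry is now the directory we just added
echo "$PATH" | tr ':' '\n' | tail -n 1
```

To make the change permanent, the same `export` line goes in `~/.bashrc` or `~/.zshrc`.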
Sqoop
4 Lectures 22:04

Install Sqoop and the MySQL connector for Sqoop

Installing Sqoop

Example 8: Sqoop Import from MySQL to HDFS

Example 9: Sqoop Import from MySQL to Hive
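A Hive import reuses the MySQL import flags and adds `--hive-import`; the database, table and credential values below are placeholders:

```shell
# --hive-import creates/loads a Hive table after the HDFS import step
sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username dbuser -P \
  --table orders \
  --hive-import \
  --hive-table orders \
  -m 1
```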

Example 10: Incremental Imports using Sqoop Jobs
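A saved Sqoop job can remember the last imported value of a check column, so re-running it picks up only new rows. All names and values below are placeholders:

```shell
# Create a saved job that appends rows with id greater than the stored last-value
sqoop job --create incr_orders -- import \
  --connect jdbc:mysql://localhost/mydb \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 0 \
  -m 1

# Each execution updates the stored last-value automatically
sqoop job --exec incr_orders
```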
About the Instructor
Loony Corn
4.3 Average rating
5,428 Reviews
42,398 Students
75 Courses
An ex-Google, Stanford and Flipkart team

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!

We hope you will try our offerings, and think you'll like them :-)