Taught by a team that includes two Stanford-educated ex-Googlers, with decades of practical experience working with Java and with billions of rows of data.
Use Flume and Sqoop to import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL
Let’s parse that.
Import data: Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP endpoints, MySQL and Twitter, to data stores like HDFS, HBase and Hive. Both tools come with built-in functionality and abstract away the complexity of moving data between these systems.
Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase.
Sqoop: Use Sqoop to bulk import data from traditional RDBMS to Hadoop storage architectures like HDFS or Hive.
Practical implementations for a variety of sources and data stores:
Flume features: Flume Agents, Flume Events, event bucketing, channel selectors, interceptors
Sqoop features: Sqoop import from MySQL, incremental imports using Sqoop Jobs
Using discussion forums
Please use the discussion forums in this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(
We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.
The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.
We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.
It is a hard trade-off.
Thank you for your patience and understanding!
Installing Flume is pretty straightforward.
A Flume Agent is the most basic unit that can exist independently in Flume. An Agent is made up of Sources, Sinks and Channels.
Our first example of a Flume Agent uses a Spooling Directory Source, a File Channel and a Logger Sink.
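As a rough sketch of what such an agent looks like on disk, here is a minimal configuration file. The agent and component names (`agent1`, `src1`, `ch1`, `sink1`) and all paths are placeholders, not values from the course:

```properties
# agent1: spooling-directory source -> file channel -> logger sink
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: watches a local directory for new files to ingest
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /tmp/flume-spool
agent1.sources.src1.channels = ch1

# Channel: durable, file-backed buffer between source and sink
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /tmp/flume/checkpoint
agent1.channels.ch1.dataDirs = /tmp/flume/data

# Sink: logs each event, handy for verifying the pipeline works
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```

You would then start the agent with something like `flume-ng agent --conf conf --conf-file spool-agent.conf --name agent1 -Dflume.root.logger=INFO,console`.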
A Flume event represents one record of data. Flume events consist of event headers and an event body.
Learn how to use HDFS as a sink with Flume
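A hedged sketch of what an HDFS sink definition can look like; the NameNode URI, path and roll settings are examples, and `agent1`/`ch1` are placeholder names:

```properties
# Write events as plain text files into HDFS, rolling to a new file every 60 seconds
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://localhost:9000/flume/events
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text
agent1.sinks.hdfsSink.hdfs.rollInterval = 60
agent1.sinks.hdfsSink.channel = ch1
```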
HTTP Sources can be pretty handy when you have an application capable of making POST requests.
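Assuming an agent whose HTTP source listens on port 44444 (an arbitrary choice), the source's default JSONHandler accepts a JSON array of events, each with `headers` and a `body` — which is also a nice illustration of the event structure above:

```shell
# POST one event to a (hypothetical) Flume HTTP source on localhost:44444
curl -X POST http://localhost:44444 \
  -d '[{"headers": {"state": "CA"}, "body": "an example event body"}]'
```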
Event Headers in Flume carry useful metadata. Use event headers to bucket events in HDFS.
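For instance, the HDFS sink can substitute header values into its output path with `%{headerName}` escapes. The sketch below assumes events carry a hypothetical `state` header; the base path is an example:

```properties
# Events with header state=CA land under .../events/CA, state=NY under .../events/NY
agent1.sinks.hdfsSink.hdfs.path = hdfs://localhost:9000/flume/events/%{state}

# Time-based escapes like %Y/%m/%d also work, given a timestamp:
# agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```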
Let's see how to use an HBase sink as the endpoint of a Flume Agent.
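A minimal sketch of an HBase sink definition; the table name `flume_events` and column family `cf1` are hypothetical, and the table must already exist in HBase:

```properties
# Write events into a pre-created HBase table
agent1.sinks.hbaseSink.type = hbase
agent1.sinks.hbaseSink.table = flume_events
agent1.sinks.hbaseSink.columnFamily = cf1
agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
agent1.sinks.hbaseSink.channel = ch1
```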
Send events from an HTTP source to HDFS and a Logger sink at the same time. See how to route events using channel selectors.
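The fan-out itself is configured on the source. A sketch, with placeholder channel names `memCh` and `fileCh` (each channel would feed its own sink):

```properties
# One source writing into two channels
agent1.sources.httpSrc.channels = memCh fileCh

# Replicating selector (the default): every event goes to both channels
agent1.sources.httpSrc.selector.type = replicating

# Multiplexing alternative: route by a header value instead
# agent1.sources.httpSrc.selector.type = multiplexing
# agent1.sources.httpSrc.selector.header = state
# agent1.sources.httpSrc.selector.mapping.CA = memCh
# agent1.sources.httpSrc.selector.default = fileCh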
Connect with the Twitter API using Flume. Use an Interceptor to do Regex filtering within Flume itself!
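A hedged sketch of the two pieces together: Flume's built-in Twitter source plus a regex filtering interceptor. The credentials are placeholders you would obtain from the Twitter developer portal, and the pattern is an example:

```properties
# Twitter source (credential values are placeholders)
agent1.sources.twitterSrc.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.twitterSrc.consumerKey = YOUR_CONSUMER_KEY
agent1.sources.twitterSrc.consumerSecret = YOUR_CONSUMER_SECRET
agent1.sources.twitterSrc.accessToken = YOUR_ACCESS_TOKEN
agent1.sources.twitterSrc.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
agent1.sources.twitterSrc.channels = ch1

# Regex filtering interceptor: keep only events whose body matches the pattern
agent1.sources.twitterSrc.interceptors = i1
agent1.sources.twitterSrc.interceptors.i1.type = regex_filter
agent1.sources.twitterSrc.interceptors.i1.regex = (?i)hadoop
agent1.sources.twitterSrc.interceptors.i1.excludeEvents = false
```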
If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful for you. It explains how to update the PATH environment variable, which is needed to set up most shell-based software on Linux and Mac.
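The core of it is one line. A minimal sketch, assuming you unpacked a tool into a directory under your home folder (the `apache-flume` path below is an example; point it at your actual install):

```shell
# Append the tool's bin directory to the PATH for the current shell session
export PATH="$PATH:$HOME/apache-flume/bin"

# Verify the directory is now on the PATH
echo "$PATH" | tr ':' '\n' | grep "apache-flume/bin"
```

To make the change permanent, add the `export` line to your `~/.bashrc` or `~/.zshrc`.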
Install Sqoop and the connector for Sqoop to MySQL
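Once installed, a plain import and an incremental Sqoop job look roughly like the sketch below. The database, table and column names (`retail_db`, `customers`, `id`) and the HDFS path are examples, not values from the course:

```shell
# One-off import of a MySQL table into HDFS (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  -m 1

# Saved job that appends only rows whose id exceeds the last value seen
sqoop job --create customers_incr -- import \
  --connect jdbc:mysql://localhost/retail_db \
  --username sqoop_user -P \
  --table customers \
  --incremental append \
  --check-column id \
  --last-value 0

# Re-run any time; Sqoop updates the stored last-value between runs
sqoop job --exec customers_incr
```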
Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and have spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.
Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft
Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too
We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!
We hope you will try our offerings, and think you'll like them :-)