Big Data Internship Program - Data Ingestion-Sqoop and Flume

Rating: 3.9 (100 reviews)

Complete Reference for Apache Sqoop and Flume

Created by Big Data Trunk

Last updated 5/2019

English

What you'll learn

Gain knowledge and understanding of data ingestion.
Develop a solid understanding of the Apache Sqoop and Flume tools through hands-on experience.
Understand how a project works in a real-world scenario.

Course content

5 sections • 30 lectures • 2h 20m total length

Introduction to Data Ingestion • 5:53
This video explains what data ingestion is, how data is processed, the challenges of data ingestion, and its key functions.
Recap - Big Data Internship Program - Part 1 Foundation • 3:52
The Part 1 course focuses on the foundations of Big Data. It covers technical items such as:

Technical Foundation

Refresh your knowledge of Unix
Java as it is used in Big Data
Understand Git/GitHub, which most companies use for source control
Hadoop installation

Part 1 is free here:

https://www.udemy.com/big-data-internship-program-part-1-foundation
Data Ingestion Tools • 4:48
This video explains what data ingestion is and the tools available on the market.
Some more Data Ingestion Tools • 5:38
This video covers further data ingestion tools such as Kafka, Chukwa, and Storm.

Introduction to File Formats • 4:04
This video shows the different types of file formats supported in Hadoop.
Text/CSV file formats • 3:01
CSV and text files are quite common and are often used for exchanging data between Hadoop and external systems.
Binary File Formats - Sequence Files • 2:47
This video shows that sequence files store data in a binary format with a structure similar to CSV. Like CSV, sequence files do not store metadata with the data, so the only schema evolution option is appending new fields.
Binary File Formats - Avro • 4:24
Avro files are quickly becoming the best multi-purpose storage format within Hadoop. Avro files store metadata with the data but also allow an independent schema to be specified for reading the file. This video covers the format in detail.
Columnar formats - RC and ORC files • 4:31
RC Files, or Record Columnar Files, were the first columnar file format adopted in Hadoop. Like columnar databases, the RC file format offers significant compression and query performance benefits. ORC Files, or Optimized RC Files, were invented to optimize performance in Hive and are primarily backed by Hortonworks. This video covers both file formats.
Columnar format - Parquet Files • 4:10
Parquet Files are yet another columnar file format, one that originated from Hadoop creator Doug Cutting’s Trevni project. Like RC and ORC, Parquet offers compression and query performance benefits and is generally slower to write than non-columnar file formats. This video explains the format in more detail.
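As a rough illustration of why columnar formats such as RC, ORC, and Parquet help query performance, the Python sketch below pivots a few rows into columns so that a query touching one field reads only that field's values. The record fields are hypothetical, and this is only a conceptual model, not how any of these formats is actually encoded on disk.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"user_id": 1, "book_id": 10, "rating": 4},
    {"user_id": 2, "book_id": 11, "rating": 5},
    {"user_id": 3, "book_id": 10, "rating": 3},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs a single field reads only that column,
# instead of scanning every field of every row.
ratings = columns["rating"]
average_rating = sum(ratings) / len(ratings)
print(average_rating)
```

Values of the same type stored next to each other also tend to compress better, which is the other benefit the lecture descriptions mention.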

Introduction to Sqoop • 5:53
This video explains what Sqoop is, what Flume is, the Sqoop workflow, and the Sqoop architecture.
Sqoop Import • 4:23
This video explains what the import command is and how a Sqoop import command is executed.
Import data from MySQL to HDFS • 6:38
This video explains how to execute commands in the terminal, how to list databases, how to list tables, and how to import data into HDFS.
Other variations of Sqoop Import Command • 5:32
This video explains how to run Sqoop commands, how Sqoop commands are structured, and which parameters are used when executing them.
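To give a feel for how a Sqoop import command is structured, here is a minimal Python sketch that assembles the argument list for a basic import from MySQL into HDFS. The flags used are standard Sqoop import arguments; the JDBC URL, user name, password file, table, and target directory are hypothetical placeholders, not values from the course.

```python
import shlex


def build_sqoop_import(connect, username, password_file, table,
                       target_dir, num_mappers=1):
    """Return the argument list for a basic `sqoop import` invocation."""
    return [
        "sqoop", "import",
        "--connect", connect,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to import
        "--target-dir", target_dir,        # HDFS directory for the output
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
command = build_sqoop_import(
    connect="jdbc:mysql://localhost:3306/bookdb",
    username="student",
    password_file="/user/student/.mysql-password",
    table="ratings",
    target_dir="/user/student/ratings",
    num_mappers=2,
)

# Print the command as it would be typed in a terminal.
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list rather than a single string keeps each parameter and its value paired, which makes variations of the import command easier to compare.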
Running a Sqoop Export Command • 5:50
This video explains what Sqoop export is and how it is used.
Sqoop Jobs • 5:41
This video explains what Sqoop jobs are, how and when they are used, how to create jobs, and how to list the available Sqoop jobs.
Sqoop incremental import • 5:46
This video explains what an incremental Sqoop import is, how it works, and which parameters control it.
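The incremental import parameters can be sketched as follows. In append mode, Sqoop imports only rows whose check column is greater than the supplied last value; the Python below builds those arguments and mimics the row selection on a small in-memory table. The column name `id` and the sample rows are assumptions for illustration.

```python
def incremental_args(check_column, last_value, mode="append"):
    """Return the Sqoop arguments that turn an import into an incremental one."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def select_new_rows(rows, check_column, last_value):
    """Mimic append mode: keep rows whose check column exceeds last_value."""
    return [row for row in rows if row[check_column] > last_value]


# Hypothetical source table: rows 1-3 were imported earlier, 4-5 are new.
table = [{"id": n, "rating": n % 5 + 1} for n in range(1, 6)]

args = incremental_args(check_column="id", last_value=3)
new_rows = select_new_rows(table, check_column="id", last_value=3)

# The highest imported value becomes the last value for the next run.
next_last_value = max(row["id"] for row in new_rows)

print(args)
print([row["id"] for row in new_rows], next_last_value)
```

A saved Sqoop job can keep track of the last value between runs, which is one reason incremental imports are often run as jobs.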
Lab: Sqoop incremental Import • 5:29
This video shows how incremental import works and how to append data to a table.
Test Your Sqoop Knowledge

What is Flume? • 2:31
This video explains what Flume is, where it is used, and how Flume differs from Sqoop.
Data Flow Model • 4:24
This video explains how Flume works, what a Flume agent is, what its components are, and how data flows between the various components of Flume.
Flume Configuration File • 4:25
This video explains the components of Flume and how they are configured, i.e., how a Flume agent is configured.
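A Flume agent is configured through a properties file that names its sources, channels, and sinks and then wires them together. The Python sketch below renders a minimal single-agent configuration in that format, assuming a netcat source, a memory channel, and a logger sink; the component names `a1`, `r1`, `c1`, and `k1` and the port are illustrative choices, not taken from the course.

```python
def render_agent_config(agent="a1", source="r1", channel="c1", sink="k1"):
    """Render a minimal Flume agent configuration as properties text."""
    lines = [
        # Name the components of this agent.
        f"{agent}.sources = {source}",
        f"{agent}.sinks = {sink}",
        f"{agent}.channels = {channel}",
        # Source: listen for lines of text on a local port.
        f"{agent}.sources.{source}.type = netcat",
        f"{agent}.sources.{source}.bind = localhost",
        f"{agent}.sources.{source}.port = 44444",
        # Sink: write each event to the agent's log.
        f"{agent}.sinks.{sink}.type = logger",
        # Channel: buffer events in memory between source and sink.
        f"{agent}.channels.{channel}.type = memory",
        # Wiring: a source can feed several channels, a sink reads from one.
        f"{agent}.sources.{source}.channels = {channel}",
        f"{agent}.sinks.{sink}.channel = {channel}",
    ]
    return "\n".join(lines) + "\n"


config = render_agent_config()
print(config)
```

Note the asymmetry in the last two properties: the source uses the plural `channels`, while the sink uses the singular `channel`.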
HelloWorld example in Flume • 6:09
This video shows how to run a Flume agent and get a result.
Multi Agent flow • 2:37
This video explains what a multi-agent Flume flow is and what consolidation means in Flume.
Multiplexing • 5:05
This video explains what multiplexing is, how it is used, and how channel selectors work.
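The idea behind a multiplexing channel selector can be sketched in a few lines of Python: each event carries headers, and the selector routes the event to one or more channels based on the value of a chosen header, falling back to a default channel when no mapping matches. This is a simplified in-memory model, not Flume code; the header name, mapping, and channel names are hypothetical.

```python
from collections import defaultdict


def multiplex(events, header, mapping, default):
    """Route each event body to channels chosen by one header's value."""
    channels = defaultdict(list)
    for event in events:
        value = event["headers"].get(header)
        # Unmapped or missing header values fall back to the default channels.
        for channel in mapping.get(value, default):
            channels[channel].append(event["body"])
    return dict(channels)


# Hypothetical events tagged with a "type" header.
events = [
    {"headers": {"type": "rating"}, "body": "user 1 rated book 10: 4"},
    {"headers": {"type": "click"}, "body": "user 2 viewed book 11"},
    {"headers": {}, "body": "untagged event"},
]

routed = multiplex(
    events,
    header="type",
    mapping={"rating": ["c1"], "click": ["c2"]},
    default=["c3"],
)
print(routed)
```

Because each mapping value is a list, one header value can also fan an event out to several channels at once.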
Interceptors in Flume • 2:30
This video explains what an interceptor is, why it is used, how it is configured, how it runs, and what types of interceptors exist.
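Conceptually, an interceptor sits between a source and its channels and can modify or drop each event in flight, and several interceptors can be chained in order. The Python sketch below models two common kinds, one that adds a timestamp header and one that filters events by a regular expression. It is a simplified illustration rather than Flume's actual API; the event structure and function names are assumptions.

```python
import re
import time


def make_timestamp_interceptor(clock=time.time):
    """Return an interceptor that stamps events with the time in milliseconds."""
    def intercept(event):
        headers = dict(event["headers"])
        headers["timestamp"] = str(int(clock() * 1000))
        return {"headers": headers, "body": event["body"]}
    return intercept


def make_regex_filter(pattern):
    """Return an interceptor that drops events whose body matches pattern."""
    compiled = re.compile(pattern)

    def intercept(event):
        return None if compiled.search(event["body"]) else event
    return intercept


def run_chain(events, interceptors):
    """Apply interceptors in order; an event is dropped once one returns None."""
    kept = []
    for event in events:
        for intercept in interceptors:
            event = intercept(event)
            if event is None:
                break
        if event is not None:
            kept.append(event)
    return kept


# Hypothetical events; a fixed clock keeps the example reproducible.
events = [
    {"headers": {}, "body": "INFO rating saved"},
    {"headers": {}, "body": "DEBUG heartbeat"},
]
chain = [make_regex_filter(r"^DEBUG"), make_timestamp_interceptor(clock=lambda: 1.5)]
result = run_chain(events, chain)
print(result)
```

Placing the filter first means dropped events never reach the later interceptors, so ordering in the chain matters.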
Test Flume Knowledge
Book recommendation Project Overview • 3:04
This video explains what a recommendation is, using book recommendation concepts.

Book recommendation Project - Sqoop Work Part 1 • 11:27
This video shows how to load data into MySQL and then import it into HDFS through Sqoop commands.
Book recommendation Project - Sqoop Work Part 2 • 2:37
This video explains what a script is and how to execute our job using a shell script.
Book recommendation Project - Flume Work • 7:05
This video shows how the book recommendation works and how the ratings are generated in HDFS through Flume.
Bonus Lecture • 0:20

Requirements

You should know the basics of Big Data concepts such as HDFS and MapReduce, and have some knowledge of RDBMS.
You should take our free Part 1 course to understand these concepts better (not mandatory but desirable).

Description

This course is part of the “Big Data Internship Program”, which is aligned to the stages of a typical Big Data project life cycle:

Foundation
Ingestion
Storage
Processing
Visualization

This course focuses on the Ingestion stage of Big Data.

Our course is divided into two parts: 1) Technical Knowledge with examples and 2) Work on a project.

Technical Knowledge

Big Data ingestion concepts and methods
Sqoop concepts and features
Good understanding of Sqoop tools and their arguments
Flume concepts and configuration
Flume features: multiplexing, Flume agents, interceptors, etc.
Understanding of the different file formats supported by Hadoop

Project Part

Get access to our private GitHub repository
Build the first part of our Book Recommendation project using Sqoop and Flume

Who this course is for:

This course is for anyone who wants to learn about data ingestion in the Hadoop ecosystem with Sqoop and Flume.
Students who want to do an internship.
Big Data analytics professionals.

Course content

Introduction • 4 lectures • 20min

Different types of File Formats in Hadoop • 6 lectures • 23min

Sqoop • 8 lectures • 45min

Flume • 8 lectures • 31min

Project Work • 4 lectures • 21min
