WHY APACHE SQOOP
Apache Sqoop is designed to import data from relational databases such as Oracle and MySQL into Hadoop. Hadoop is the industry standard for batch processing of huge volumes of data. In real-world scenarios, you use Sqoop to transfer data from relational tables into Hadoop, leverage Hadoop's parallel processing capabilities to generate meaningful insights from that data, and then store the results back into relational tables using Sqoop's export functionality.
Big data analytics starts with data ingestion, and that's where Apache Sqoop comes into the picture: it is the first step in getting the data ready.
ABOUT THIS COURSE
In this course, you will learn, step by step, everything you need to know about Apache Sqoop and how to integrate it within the Hadoop ecosystem. With every concept explained through real-world examples, you will learn how to create data pipelines to move data into and out of Hadoop. The course covers the following major concepts in detail (a short command sketch follows each topic list below):
APACHE SQOOP - IMPORT TOPICS << MySQL to Hadoop/Hive >>
Importing into the default Hadoop storage location
Importing into a specific target directory on Hadoop storage
Overwriting existing data
Loading specific columns from a MySQL table
Controlling the data-splitting logic
Defaulting to a single mapper when needed
Using Sqoop options files
Debugging Sqoop operations
Importing data in various file formats - TEXT, SEQUENCE, AVRO, PARQUET & ORC
Compressing data while importing
Executing custom queries
Handling null string and non-string values
Setting delimiters for imported data files
Setting escape characters
Loading data incrementally
Writing directly to a Hive table
Using HCatalog parameters
Importing all tables from a MySQL database
Importing an entire MySQL database into a Hive database
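To give a feel for what these lessons build up to, here is a minimal sketch of a few import variants. The connection string, credentials, table, and paths (retail_db, retail_user, orders, /user/hadoop/...) are hypothetical placeholders, not values taken from the course:

    # Import selected columns into a specific HDFS directory, overwriting
    # existing data and compressing the Avro output (hypothetical names throughout).
    sqoop import \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table orders \
      --columns "order_id,order_date,order_status" \
      --split-by order_id \
      --target-dir /user/hadoop/orders \
      --delete-target-dir \
      --as-avrodatafile \
      --compress

    # Incremental append: import only rows beyond the last seen order_id.
    sqoop import \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table orders \
      --target-dir /user/hadoop/orders_incr \
      --incremental append --check-column order_id --last-value 68883

    # Import straight into a Hive table, using a single mapper.
    sqoop import \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table orders \
      --hive-import --hive-table retail.orders \
      -m 1

Note how --split-by tells Sqoop which column to use when dividing the table across parallel mappers, while -m 1 falls back to a single mapper for tables without a suitable split column.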
APACHE SQOOP - EXPORT TOPICS << Hadoop/Hive to MySQL >>
Moving data from Hadoop to a MySQL table
Moving specific columns from Hadoop to a MySQL table
Avoiding partial-export issues
Performing update operations while exporting
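Here is a minimal sketch of the export side, again with hypothetical table and path names (order_summary, order_summary_stage):

    # Export HDFS data into a MySQL table, routing rows through a staging
    # table so a failed job does not leave a partial export behind.
    sqoop export \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table order_summary \
      --staging-table order_summary_stage --clear-staging-table \
      --export-dir /user/hadoop/order_summary

    # Update existing rows (and insert new ones) keyed on order_id.
    sqoop export \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table order_summary \
      --export-dir /user/hadoop/order_summary \
      --update-key order_id --update-mode allowinsert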
APACHE SQOOP - JOBS TOPICS << Automation >>
Creating a Sqoop job
Listing existing Sqoop jobs
Checking metadata about Sqoop jobs
Executing a Sqoop job
Deleting a Sqoop job
Enabling password storage for easy execution in production
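As a rough sketch of the job workflow (the job and table names are hypothetical):

    # Save an incremental import as a named job; Sqoop's metastore
    # remembers the last-value between runs.
    sqoop job --create daily_orders_import -- import \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user --password-file /user/hadoop/.mysql-pass \
      --table orders \
      --incremental append --check-column order_id --last-value 0

    sqoop job --list                        # list existing jobs
    sqoop job --show daily_orders_import    # show a job's saved metadata
    sqoop job --exec daily_orders_import    # run the job
    sqoop job --delete daily_orders_import  # remove the job

For unattended production runs, one common approach is setting sqoop.metastore.client.record.password to true in sqoop-site.xml so saved jobs store the database password and --exec does not prompt for it.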
WHAT YOU WILL ACHIEVE AFTER COMPLETING THIS COURSE
After completing this course, you will have covered a topic that features heavily in the certifications below. You will need to take other lessons as well to fully prepare for these exams; we will be launching other courses soon.
1. CCA Spark and Hadoop Developer Exam (CCA175)
2. Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)
WHO ARE YOUR INSTRUCTORS
This course is taught by professionals with extensive experience building big data applications for Fortune 100 companies. They have created data pipelines that extract, transform, and process hundreds of terabytes of data a day, providing data analytics for their clients' user services. After the successful launch of their course Complete ElasticSearch with LogStash, Hive, Pig, MR & Kibana, the same team brings you a complete course on Apache Sqoop with Hadoop, Hive, and MySQL.
You will also get step-by-step instructions for installing all required tools and components on your machine so you can run every example provided in this course. Each video explains the entire process in a detailed and easy-to-understand manner.
You will get access to working code that you can play with and expand on. All code examples work and are demonstrated in the video lessons.
Windows users will need to install a virtual machine on their device to set up a single-node Hadoop cluster, while macOS and Linux users can install the Hadoop and Sqoop components directly on their machines. The step-by-step process is illustrated within the course.