
Mastering Apache SQOOP with Hadoop, Hive, MySQL (Mac & Win)

The Complete Course on Apache SQOOP. Great for CCA175 Spark & Hortonworks Big Data Hadoop Developer Certifications.
3.9 (31 ratings)
328 students enrolled
Created by DataShark Academy
Last updated 12/2018
English
English [Auto-generated]
This course includes
  • 3.5 hours on-demand video
  • 5 articles
  • 4 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Get Ready for CCA Spark and Hadoop Developer Exam (CCA175)
  • Get Ready for Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)
  • Advance your career by applying for high-paying Big Data jobs
  • Install & configure Hortonworks Data Platform (HDP) Sandbox on Windows Machine
  • Crack Big Data Developer Interviews
  • Develop a sound understanding of the data ingestion process from a relational system (MySQL) to the Hadoop ecosystem & vice versa
Requirements
  • Basic knowledge of computers and SQL queries will help. Detailed explanations are provided wherever needed in the course.
Description

WHY APACHE SQOOP

Apache SQOOP is designed to import data from relational databases such as Oracle and MySQL into Hadoop. Hadoop is ideal for batch processing of huge amounts of data and is an industry standard nowadays. In real-world scenarios, you use SQOOP to transfer data from relational tables into Hadoop, then leverage Hadoop's parallel processing capabilities to process that data and generate meaningful insights. The results of Hadoop processing can then be stored back into relational tables using SQOOP's export functionality.
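
To make the round trip concrete, here is a minimal sketch of an import followed by an export. The JDBC URL, credentials, table names, and paths below are placeholders, not values from the course:

    # Pull a MySQL table into HDFS (connection details are placeholders)
    sqoop import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table orders \
      --target-dir /user/hadoop/orders

    # Push processed results from HDFS back into a MySQL table
    sqoop export \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table order_summaries \
      --export-dir /user/hadoop/order_summaries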


Big data analytics starts with data ingestion, and that's where Apache SQOOP comes into the picture. It is the first step in getting the data ready.


ABOUT THIS COURSE

In this course, you will learn step by step everything that you need to know about Apache Sqoop and how to integrate it within the Hadoop ecosystem. With every concept explained through real-world-style examples, you will learn how to create data pipelines to move data into and out of Hadoop. The course covers the following major concepts in great detail, with a short command sketch after each topic list:


APACHE SQOOP - IMPORT TOPICS   << MySQL to Hadoop/Hive >>

  1. Importing into default Hadoop storage

  2. Importing to a specific target location on Hadoop

  3. Controlling parallelism

  4. Overwriting existing data

  5. Appending data

  6. Loading specific columns from a MySQL table

  7. Controlling the data-splitting logic

  8. Defaulting to a single mapper when needed

  9. SQOOP option files

  10. Debugging SQOOP operations

  11. Importing data in various file formats - TEXT, SEQUENCE, AVRO, PARQUET & ORC

  12. Compressing data while importing

  13. Executing custom queries

  14. Handling null string and non-string values

  15. Setting delimiters for imported data files

  16. Setting escape characters

  17. Incremental loading of data

  18. Writing directly to a Hive table

  19. Using HCATALOG parameters

  20. Importing all tables from a MySQL database

  21. Importing an entire MySQL database into a Hive database
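
As a taste of the options above, here is a hedged sketch combining parallelism (3), file format and compression (11-12), incremental loading (17), and HCATALOG/ORC (19). All connection details, column names, and paths are placeholders; writing directly to a Hive table (18) similarly just needs --hive-import in place of a target directory:

    # 4-way parallel import as Snappy-compressed Avro, split on order_id
    sqoop import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table orders \
      --target-dir /user/hadoop/orders_avro \
      --num-mappers 4 \
      --split-by order_id \
      --as-avrodatafile \
      --compress \
      --compression-codec org.apache.hadoop.io.compress.SnappyCodec

    # Incremental append: fetch only rows whose order_id exceeds the last value
    sqoop import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table orders \
      --target-dir /user/hadoop/orders_incr \
      --incremental append \
      --check-column order_id \
      --last-value 10000

    # ORC output goes through HCatalog (plain imports have no ORC option)
    sqoop import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table products \
      --hcatalog-database retail \
      --hcatalog-table products_orc \
      --create-hcatalog-table \
      --hcatalog-storage-stanza 'stored as orcfile'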


APACHE SQOOP - EXPORT TOPICS  << Hadoop/Hive to MySQL >>

  1. Moving data from Hadoop to a MySQL table

  2. Moving specific columns from Hadoop to a MySQL table

  3. Avoiding partial-export issues

  4. Updating records while exporting
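
On the export side, a staging table is the usual guard against partial exports (3), and an update key turns the export into an update (4). A sketch with placeholder names; note that SQOOP does not allow combining --staging-table with --update-key, hence the two separate commands:

    # Stage rows first so a failed export leaves the target table untouched
    sqoop export \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table order_summaries \
      --staging-table order_summaries_stage \
      --clear-staging-table \
      --export-dir /user/hive/warehouse/retail.db/order_summaries

    # Update existing rows (and insert new ones) keyed on order_id
    sqoop export \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --password-file /user/hadoop/.mysql_password \
      --table order_summaries \
      --update-key order_id \
      --update-mode allowinsert \
      --export-dir /user/hive/warehouse/retail.db/order_summaries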


APACHE SQOOP - JOBS TOPICS  << Automation >>

  1. Creating a SQOOP job

  2. Listing existing SQOOP jobs

  3. Checking metadata about SQOOP jobs

  4. Executing a SQOOP job

  5. Deleting a SQOOP job

  6. Enabling password storage for easy execution in production
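
The whole job lifecycle above maps onto one command per operation. A minimal sketch with a placeholder job name (note the space after the bare -- that separates the job definition from the embedded import):

    # Create a saved job wrapping an incremental import
    sqoop job --create daily_orders_import \
      -- import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username retail_user \
      --table orders \
      --incremental append \
      --check-column order_id \
      --last-value 0

    sqoop job --list                        # list saved jobs
    sqoop job --show daily_orders_import    # inspect definition and stored last-value
    sqoop job --exec daily_orders_import    # run it; the metastore updates last-value
    sqoop job --delete daily_orders_import  # remove the saved job

For unattended production runs, one common way to avoid the password prompt on each --exec is setting sqoop.metastore.client.record.password to true in sqoop-site.xml, or supplying --password-file in the job definition.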


WHAT YOU WILL ACHIEVE AFTER COMPLETING THIS COURSE

After completing this course, you will have covered one of the topics that is heavily tested in the certifications below. You will need to take other lessons as well to fully prepare for the exams. We will be launching other courses soon.

1. CCA Spark and Hadoop Developer Exam (CCA175)

2. Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)


WHO ARE YOUR INSTRUCTORS

This course is taught by professionals with extensive experience in handling big data applications for Fortune 100 companies. They have built data pipelines for extracting, transforming & processing hundreds of terabytes of data per day for clients providing data analytics for user services. After the successful launch of their course - Complete ElasticSearch with LogStash, Hive, Pig, MR & Kibana - the same team brings you a complete course on Apache Sqoop with Hadoop, Hive, MySQL.


You will also get step-by-step instructions for installing all required tools and components on your machine so that you can run all the examples provided in this course. Each video explains the entire process in a detailed and easy-to-understand manner.

You will get access to working code that you can play with and expand on. All code examples work and are demonstrated in the video lessons.

Windows users will need to install a virtual machine on their device to set up a single-node Hadoop cluster, while MacBook or Linux users can install the Hadoop and SQOOP components directly on their machines. The step-by-step process is illustrated within the course.

Who this course is for:
  • This will be an excellent course for anyone who wants to learn Big Data technologies.
  • Anyone looking to pass the CCA 175 Spark Certification exam in the future
  • Anyone looking to pass Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)
Course content
47 lectures 03:33:46
+ Environment Setup
8 lectures 10:32
Install Hortonworks Data Platform Sandbox - ( FOR WINDOWS PC USERS ONLY )
00:04
Install Hadoop & SQOOP on Machine - ( FOR MAC/LINUX USERS)
00:03
Connect to HDP Sandbox Shell
00:03
Get to know SQOOP CLI
02:03

Follow the instructions to load data for the exercises.

Load Data into MySQL Database
01:20
Data Setup for Exercises
01:56
Let's Understand Your Data
01:56
+ Apache SQOOP - IMPORT
26 lectures 02:19:40
Import a Simple MySQL Table into Hadoop HDFS
06:24
Import a MySQL Table with Custom Name into Hadoop
04:14
Controlling Parallelism in SQOOP Import Flow
04:47
Overwrite Existing Data on Hadoop while Importing
07:31
Append to Existing Data on Hadoop while Importing
04:07
Only load specific columns from MySQL table into Hadoop
03:34
Import MySQL tables with No Primary keys in them - 1st Approach
05:54
Import MySQL tables with No Primary keys in them - 2nd Approach
03:24
Using SQOOP Option files to simplify CLI Commands
05:01
Running SQOOP Import in Debug mode
04:38
Importing & Storing Data in Textual Format on Hadoop
04:18
Importing & Storing Data in AVRO Format on Hadoop
07:13
Importing & Storing Data in SEQUENCE Format on Hadoop
03:07
Importing & Storing Data in PARQUET Format on Hadoop
03:16
Compressing Imported Data
04:50
Running Custom MySQL Queries on Source Tables
05:48
Handling NULL values in Source Dataset
03:46
Setting Custom Field Separators in Imported Data
02:55
Handling Escape Characters while Importing
10:39
Avoid Enclosing all Data Values while Importing
04:14
Incremental Loading of Delta data while Importing - Part 1
08:19
Incremental Loading of Delta data while Importing - Part 2
04:51
Importing Data Directly into Hive Table
06:08
Using HCATALOG to Load Data in ORC File Format
08:26
Load ALL tables from MySQL to Hadoop
05:47
Load ALL tables from MySQL to Hive Database
06:29
+ Apache SQOOP - EXPORT
4 lectures 35:58
Export a Hive table to MySQL table
13:33
Export Specific Columns from Hive table to a MySQL table
07:52
Avoid Partial Data Exports in SQOOP
10:00
When Update Record is OK in SQOOP Export
04:33
+ Apache SQOOP - JOBS
2 lectures 16:04
SQOOP Jobs - Create, List, Show, Execute & Delete Operations
12:52
Make SQOOP job remember MySQL Database Password For Subsequent executions
03:12
+ Conclusion
2 lectures 01:44
What Next
01:18
Continue Learning
00:26