CCA 159 - Data Analyst using Sqoop, Hive and Impala
4.2 (304 ratings)
2,381 students enrolled

Cloudera Certified Associate - Data Analyst using Technologies like Sqoop, Hive and Impala
Bestseller
Last updated 6/2020
English
English [Auto]
Price: $199.99
30-Day Money-Back Guarantee
This course includes
  • 20.5 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Assignments
  • Certificate of Completion
What you'll learn
  • Data Ingestion using Apache Sqoop
  • Writing Queries using Apache Hive
  • Using Impala to execute Hive Queries
  • Prepare for CCA 159 Data Analyst Certification Exam
Requirements
  • A 64-bit computer with at least 8 GB RAM is highly recommended
  • Access to a multinode cluster or our ITVersity Labs (paid subscription required)
  • Set up Cloudera QuickStart VM on a high-end laptop (16 GB RAM and quad core); instructions are provided but not supported
  • Basic computer skills
  • Ability to write basic SQL queries and work in a Linux-based environment
Description

CCA 159 Data Analyst is one of the well-recognized Big Data certifications. This scenario-based certification exam demands in-depth knowledge of Hive and Sqoop, as well as basic knowledge of Impala.

This comprehensive course covers all aspects of the certification with real-world examples and data sets.

  • Overview of Big Data ecosystem

  • HDFS Commands

  • Creating Tables in Hive

  • Loading/Inserting data into Hive tables

  • Overview of functions in Hive

  • Writing Basic Queries in Hive

  • Joining Data Sets and Set Operations in Hive

  • Windowing or Analytics Functions in Hive

  • Importing data from MySQL to HDFS

  • Performing Hive Import

  • Exporting Data from HDFS/Hive to MySQL

  • Submitting Sqoop Jobs and Incremental Imports

  • and more

Exercises are provided to help you prepare before taking the certification exam. The course is intended to build your confidence to take the exam.

All demos are run on our state-of-the-art Big Data cluster. If you do not have a multinode cluster, you can sign up for our labs and practice on ours.

Who this course is for:
  • Any Big Data professional or aspirant who wants to learn about databases and query interfaces in Big Data
  • Any Business Intelligence professional who wants to understand data analysis in the Big Data ecosystem
  • Any IT professional who wants to prepare for the CCA 159 Data Analyst exam
Course content
233 lectures 20:31:07
+ Introduction
4 lectures 15:54
Tools for preparation
02:32
Getting Details about the Exam
06:40
Signing up for the Exam
02:26
+ Using Cloudera QuickStart VM
8 lectures 01:12:34
Setup Cloudera QuickStart VM
04:57
Overview of Cloudera QuickStart VM
07:19
Overview of MySQL Databases
06:22
Setup NYSE Database in MySQL
12:07
Overview of HDFS and Setup Datasets
10:51
Overview of Hive and Create External Table
14:23
Validate Sqoop
09:00
+ Using ITVersity labs
6 lectures 19:48
Signing up for the labs
02:33
Connecting to the gateway node of the cluster
03:38
Overview of HDFS in the cluster
02:33
Using Hive in the cluster
05:04
Understanding MySQL in the cluster
02:51
Running Sqoop Commands in the cluster
03:09
+ Overview of Big Data ecosystem
13 lectures 01:10:16
Hadoop Distributed File System - Quick Overview
03:54
Distributed Computing using YARN and Map Reduce 2 - Quick Overview
06:06
Submitting Map Reduce Job in YARN Framework
04:06
Determining Number of Mappers and Reducers
04:28
Understanding YARN and Map Reduce Configuration Properties
05:58
Reviewing and Overriding Map Reduce Job Run Time Properties
08:32
Reviewing Map Reduce Job Logs - using Resource Manager and Job History Server UI
06:34
Map Reduce Job Counters
05:11
Overview of Hive
03:40
Databases in Big Data and Query Engines
02:34
Overview of Data Ingestion Tools in Big Data
04:25
+ Overview of HDFS Commands
15 lectures 01:20:13
Overview of HDFS and Properties Files
10:18
Overview of "hadoop fs" or "hdfs dfs" command
05:16
Listing Files in HDFS
06:42
User Spaces or Home Directories in HDFS
04:47
Creating Directory in HDFS
05:27
Copying Files and Directories into HDFS
08:02
File and Directory Permissions Overview
04:16
Getting Files and Directories from HDFS
04:37
Previewing Text Files in HDFS - cat and tail
03:46
Copying or Moving Files from one HDFS location to other HDFS location
05:41
Understanding Size of the File System and Data Sets - using df and du
05:01
Overview of Block Size and Replication Factor
05:41
Getting metadata of files using "hdfs fsck"
04:55
Resources and Exercises
03:57
+ Apache Hive - Getting Started
12 lectures 58:53
Overview of Hive Language Manual
04:44
Launching and Using Hive CLI
06:28
Overview of Hive Properties - SET and .hiverc
07:28
Hive CLI History and .hiverc
04:55
Running HDFS Commands using Hive CLI
02:28
Understanding Warehouse Directory
03:56
Creating Database in Hive and Switching to the Database
03:42
Creating First Table in Hive and Listing the Tables
06:52
Retrieve metadata of Hive Tables using DESCRIBE (EXTENDED and FORMATTED)
03:45
Role of Hive Metastore
05:04
Overview of beeline - Alternative to Hive CLI
04:41
Running Hive Queries using Beeline
04:50
+ Apache Hive - Managing Tables in Hive
14 lectures 01:18:33
Create tables in Hive - orders
09:35
Overview of Data Types in Hive
04:40
Adding Comments to Columns and Tables
03:05
Loading Data into Hive Tables from Local File System
09:29
Loading Data into Hive Tables from HDFS Location
05:51
Loading Data into Hive Tables - Overwrite vs. Append
03:26
Creating External Tables in Hive
04:40
Specifying Location for Hive Tables
06:44
Managed Tables vs. External Tables
03:51
Default Delimiters in Hive Tables using Text File Format
07:32
Overview of File Formats - STORED AS Clause
02:47
Differences between Hive and RDBMS
04:43
Truncating and Dropping tables in Hive
07:59
Resources and Exercises
04:11
+ Apache Hive - Managing Tables in Hive - Partitioning and Bucketing
16 lectures 01:34:50
Introduction to Partitioning and Bucketing in Hive
02:35
Creating Tables using orc File Format - order_items
06:42
Inserting Data into order_items using stage table
05:04
Can we use LOAD Command to get data into order_items with orc file format?
09:28
Creating Partitioned Tables in Hive - orders_part with order_month as key
05:51
Adding Partitions to Tables in Hive
04:44
Loading into Partitions in Hive Tables
08:40
Inserting Data into Partitions in Hive Tables
04:20
Inserting data into Partitioned Tables - Using dynamic partition mode
06:56
Creating Bucketed Tables - orders_buck and order_items_buck
03:33
Inserting Data Into Bucketed Tables
04:35
Bucketing with Sorting
04:13
Overview of ACID Transactions in Hive
04:19
Create Tables for ACID Transactions
08:04
Inserting individual records into Hive Tables
07:56
Updating and Deleting data in Hive Bucketed Tables
07:50
+ Apache Hive - Overview of Functions
16 lectures 01:12:39
Overview of Functions
03:05
Validating Functions
03:23
String Manipulation - Case Conversion and Length
04:19
String Manipulation - substr and split
07:41
String Manipulation - trimming and padding Functions
06:27
String Manipulation - Reverse and Concatenating multiple strings
06:32
Date Manipulation - Getting Current Date and Timestamp
02:17
Date Manipulation - Date Arithmetic
05:00
Date Manipulation - trunc
02:41
Date Manipulation - Extracting information using date_format
05:13
Date Manipulation - Extracting information using year, month, day etc
02:38
Date Manipulation - Dealing with Unix Timestamp
04:38
Overview of Numeric Functions
07:21
Type Cast Functions for Data Type Conversion
03:47
Handling null values using nvl
01:52
Query Example - Get Word Count
05:45
+ Apache Hive - Writing Basic Queries
17 lectures 01:28:45
Overview of SQL
05:40
Hive Query - Execution Life Cycle
04:38
Reviewing Logs for Hive Queries
05:09
Projecting Data using SELECT and Overview of FROM Clause
04:18
Using CASE and WHEN as part of SELECT Clause
03:29
Projecting DISTINCT Values
04:45
Filtering Data using WHERE Clause
04:11
Boolean Operations such as OR and AND using multiple fields
07:03
Boolean OR vs. IN
03:50
Filtering data using LIKE Operator
03:31
Basic Aggregations using Aggregate Functions
03:41
Performing basic aggregations such as SUM, MIN, MAX etc using GROUP BY
08:05
Filtering post aggregation using HAVING
03:34
Global Sorting using ORDER BY
04:28
Overview of DISTRIBUTE BY
05:22
Sorting Data within Groups using DISTRIBUTE BY and SORT BY
12:06
Overview of CLUSTER BY
04:55