Hive to ADVANCE Hive (Real time usage) :Hadoop querying tool

Name: Hive to ADVANCE Hive (Real time usage) :Hadoop querying tool
Rating: 4.7 (6383 reviews)

In and Out of Apache Hive - From Basic to Advance Hive (Real-world concepts) + Use cases asked in Hive interviews

Highest Rated

Created by J Garg

Last updated 5/2025

English

What you'll learn

Learn the complete in-and-out details of Apache Hive from Basic to Advance level.
Start by exploring the fundamentals of Hive including Hive's Introduction, Architecture, Installation, SQL vs Hive etc.
Learn Basic Hive concepts to Create Databases & Tables, Insert data, Joins, Views, Mathematical functions, String functions, Conditional statements etc.
The strength of this course is ADVANCE HIVE, covering the Hive features that are actively used in Real-time projects.
Partitioning, Bucketing, Explode & Lateral views, Tablesampling, Variables, Optimized Map Joins, User defined functions (UDFs)
ACID features of Hive, Different types of files in Hive, Custom input formatter, Archiving and Compression techniques etc.
Learn about various Hive Tableproperties - Skip header & footer, Immutable table, Purge, Null format, Parallelism and ORC table properties.
Implement Slowly changing Dimensions (SCD 1) using Hive queries.
How to create a Hive table for XML data and load data in it.
Learn the Hive questions and use cases asked in interviews.

Course content

19 sections • 77 lectures • 6h 58m total length

Introduction to Hive • 3:44
This video gives a brief description of Hive.
Announcement • 0:41
Motivation of Hive • 1:21
This video explains why Hive was developed.
Sql vs Hive • 1:32
The syntax of Hive is similar to that of SQL. This video explains the similarities and differences between SQL and Hive.
Trailer - Working of Hive • 2:08
This video explains the general working of Hive. Hive can process and store structured data only. It does this by linking the metadata of its table to the file in HDFS.
Important Note* - Hive is not a database. After data is loaded into a Hive table, the HDFS file is not moved into Hive; rather, Hive now sees that file in a tabular way.
Architecture of Hive • 4:58
Hive has these components in its architecture:
UI
Driver
Compiler
Metastore
Execution engine

HADOOP AND HIVE INSTALLATION • 8:17
This PDF contains a step-by-step procedure to install Hadoop and Hive along with other resources like Java, VirtualBox and Ubuntu.
Create databases • 7:15
Hive is not a database, but it uses a database to store the metadata of its tables. By default Hive provides the Derby database, but in real-time projects we use more robust databases like MySQL.
This video explains how to create a database in various ways using different options.
Table creation and loading data into it | Part 1 • 8:15
Since Hive stores data in a structured format, we create tables. In this lecture we will create tables in Hive. Tables in Hive can be created in many ways with a lot of options.
After table creation we have to load the data into those Hive tables. Note that loading does not mean transferring data into Hive, because Hive is not a database; rather, it just links the metadata of the Hive table to the corresponding HDFS file.
Table creation and loading data into it | Part 2 • 3:32
Part 2 of the table creation and data loading lecture.
Internal vs External table - Explained • 10:28
There are two types of tables we can create in Hive: Internal (or Managed) tables and External tables.
The main difference between the two appears when we drop a table. When an Internal table is dropped, both the schema and the data are lost, since Hive is responsible for both. When an External table is dropped, only the schema (the table's metadata) is lost; the data remains in the same HDFS location and can still be accessed by other applications.
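The drop behaviour described above can be pictured with a small plain-Python model. This is only an illustrative sketch, not Hive code: the two dictionaries stand in for the metastore and HDFS, and the table names and paths are hypothetical.

```python
# Toy model of managed (internal) vs external tables; not Hive code.
# `metastore` holds schema/location metadata, `hdfs` holds the data files.
metastore = {}
hdfs = {}

def create_table(name, kind, location, rows):
    # kind is "managed" or "external"; location is a hypothetical HDFS path.
    metastore[name] = {"kind": kind, "location": location}
    hdfs[location] = rows

def drop_table(name):
    # Dropping always removes the metadata; only managed tables lose their data.
    table = metastore.pop(name)
    if table["kind"] == "managed":
        hdfs.pop(table["location"], None)

create_table("emp_managed", "managed", "/warehouse/emp_managed", ["row1"])
create_table("emp_external", "external", "/data/emp_external", ["row2"])
drop_table("emp_managed")
drop_table("emp_external")
print(sorted(hdfs))  # only the external table's location is left
```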
Create Tables
Insert statement • 6:30
The Insert statement is used to load data from one Hive table into another Hive table.
Multi insert statement (Advance) • 2:44
The Multi insert statement is used to load data from one table into multiple tables.
Alter Table Schema • 6:53
Once a Hive table is created, its schema can be changed according to new requirements. This lecture explains how a Hive table's schema can be changed in various ways.
Sorting -- sort by, order by, distribute by, cluster by • 7:23
Order by - In Hive, ORDER BY guarantees total ordering of data, and for that all the data has to be passed to a single reducer.
Sort by - Sort by does not ensure full ordering of the data; rather, it ensures ordering of data within each reducer.
Distribute by - Distribute by ensures that all rows with the same Distribute By columns go to the same reducer.
Cluster by - Cluster By is a shortcut for Distribute By plus Sort By. Distribute By first sends all rows with the same column values to a single reducer, and Sort By then sorts those rows inside the reducer.
The lecture also explains:
the difference between order by and sort by
order by with a limit clause
the behaviour of the order by command in strict and non-strict mode
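The four clauses can be compared with a small plain-Python model. This is an illustrative sketch rather than Hive itself: the two "reducers" are plain lists, and both the round-robin split used for sort by and the modulo routing used for distribute by are assumptions made for the example.

```python
# Toy model of Hive's sorting clauses (illustrative only, not Hive code).
# Assumption: 2 reducers; DISTRIBUTE BY routes a row with key % NUM_REDUCERS.
NUM_REDUCERS = 2
rows = [5, 2, 8, 2, 7, 1]

def order_by(rows):
    # ORDER BY: a single reducer sees every row, so the output is totally ordered.
    return sorted(rows)

def sort_by(rows):
    # SORT BY: rows are spread over reducers (round-robin here, as an assumption)
    # and each reducer sorts only its own share.
    parts = [rows[i::NUM_REDUCERS] for i in range(NUM_REDUCERS)]
    return [sorted(p) for p in parts]

def distribute_by(rows):
    # DISTRIBUTE BY: equal keys always land on the same reducer; no sorting.
    parts = [[] for _ in range(NUM_REDUCERS)]
    for r in rows:
        parts[r % NUM_REDUCERS].append(r)
    return parts

def cluster_by(rows):
    # CLUSTER BY = DISTRIBUTE BY + SORT BY on the same column.
    return [sorted(p) for p in distribute_by(rows)]

print(order_by(rows))    # one fully ordered list
print(sort_by(rows))     # each reducer's list is ordered on its own
print(cluster_by(rows))  # equal keys share a reducer and are sorted there
```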

Date and Mathematical functions • 6:07
This video covers the Date and Mathematical functions that Hive provides.
String functions • 5:39
This video explains various string functions in Hive.
Split(), Substr(), instr() functions • 3:05
Split() function in Hive
Substr() function in Hive
instr() function in Hive
These string functions are widely used in Hive.

Conditional statements • 4:56

The following conditional statements are used in Hive.

if(boolean testCondition, T valueTrue, T valueFalseOrNull)	Returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.
isnull( a )	Returns true if a is NULL and false otherwise.
isnotnull ( a )	Returns true if a is not NULL and false otherwise.
nvl(T value, T default_value)	Returns default value if value is null else returns value (as of Hive 0.11).
COALESCE(T v1, T v2, ...)	Returns the first v that is not NULL, or NULL if all v's are NULL.
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END	When a = b, returns c; when a = d, returns e; else returns f.
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END	When a = true, returns b; when c = true, returns d; else returns e.
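
The behaviour of a few of these functions can be mimicked in plain Python, with None playing the role of NULL. This is a sketch for intuition only; the helper names mirror the Hive functions but are otherwise hypothetical, and `case_when` takes (condition, result) pairs as an assumption of this example.

```python
# Plain-Python stand-ins for a few Hive conditional functions (None plays NULL).
def nvl(value, default_value):
    # nvl: default_value when value is NULL, otherwise value.
    return default_value if value is None else value

def coalesce(*values):
    # COALESCE: first non-NULL argument, or NULL if every argument is NULL.
    for v in values:
        if v is not None:
            return v
    return None

def case_when(*branches, default=None):
    # CASE WHEN a THEN b ... ELSE e END, given as (condition, result) pairs.
    for condition, result in branches:
        if condition:
            return result
    return default

print(nvl(None, 0))
print(coalesce(None, None, 3, 4))
print(case_when((False, "a"), (True, "b"), default="c"))
```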

Explode and Lateral view (Advance functions) • 7:21
These are advanced Hive concepts: the explode function and lateral view.
explode() takes in an array as an input and outputs the elements of the array as separate rows.
Lateral view is used to select other table columns alongside the exploded columns.
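A plain-Python model can make the row-multiplying effect concrete. This is a sketch under stated assumptions, not Hive code: the table, its column names and the helper functions are hypothetical, and for brevity the model drops the array column from the output rows.

```python
# Toy model of explode() with LATERAL VIEW (illustrative, not Hive code).
# Hypothetical table: each row has a name and an array of skills.
table = [
    {"name": "asha", "skills": ["hive", "pig"]},
    {"name": "ravi", "skills": ["spark"]},
]

def explode(array):
    # explode(): one output row per array element.
    return list(array)

def lateral_view_explode(rows, array_col, alias):
    # LATERAL VIEW: pair each source row with the rows produced from its own
    # array, so ordinary columns can be selected next to the exploded value.
    out = []
    for row in rows:
        for element in explode(row[array_col]):
            new_row = {k: v for k, v in row.items() if k != array_col}
            new_row[alias] = element
            out.append(new_row)
    return out

for r in lateral_view_explode(table, "skills", "skill"):
    print(r)
```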
Rlike function (Advance) • 3:04
This is an advanced function in Hive: A RLIKE B evaluates to true if any substring of A matches B.
Example: 'Super' RLIKE 'Su' -> True
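The substring-matching behaviour can be approximated in Python. This is a rough sketch: Hive evaluates the pattern as a Java regular expression, while the hypothetical `rlike` helper below uses Python's `re` module as a stand-in, so the two can differ on less common pattern syntax.

```python
import re

def rlike(a, pattern):
    # True when any substring of `a` matches the regular expression `pattern`.
    # Python's `re` is a stand-in here for the Java regex engine Hive uses.
    return re.search(pattern, a) is not None

print(rlike("Super", "Su"))
print(rlike("Super", "^per"))
```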
Rank(), Dense_rank(), Row_number() (Advance) • 10:12
These are advanced Hive functions that come under Hive analytical functions.
In the rank() function, equal values are given the same rank, which leaves gaps in the ranking after ties.
In the dense_rank() function, equal values are given the same rank, but there are no gaps as there are in rank().
The row_number() function gives every row a distinct number, even when the values are equal.
The lecture also covers the difference between rank() and dense_rank().
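The difference between the three functions is easiest to see side by side. The following plain-Python sketch (a hypothetical helper, not Hive code) computes all three numbers over a list of scores ordered from highest to lowest.

```python
def rank_functions(values):
    # Returns (value, rank, dense_rank, row_number) tuples, highest value first.
    ordered = sorted(values, reverse=True)
    result = []
    rank = dense = 0
    previous = None
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the row position, leaving gaps after ties
            dense += 1         # dense_rank always increases by exactly one
            previous = value
        result.append((value, rank, dense, row_number))
    return result

for row in rank_functions([90, 80, 90, 70]):
    print(row)
```

In the output, the two scores of 90 share rank 1 and dense rank 1; the score of 80 then gets rank 3 but dense rank 2, while the row number simply counts 1 to 4.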
Practice Mathematical Functions

What is Partitioning? • 1:42
Static partitioning • 7:02
Partitioning is a data organizing technique in Hive. It is a way of organizing tables into smaller partitions based on the values of columns in the table.
Partitioning can be done in two ways.

Static Partitioning: In static partitioning the partition value is hardcoded, and we have to mention the partition name manually while loading data into it.
Dynamic Partitioning: In dynamic partitioning Hive automatically decides the partitions based on the values of the partition column.
This lecture covers:
Static partition in Hive table
How to load data into partitioned table in Hive
Dynamic partitioning • 4:38
Dynamic Partitioning in Hive: In dynamic partitioning Hive automatically decides the partitions based on the values of the partition column.
This lecture covers:
What is dynamic partition
Load data into dynamic partitioned table.
Difference between static and dynamic partition.
Set dynamic partition property
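The contrast between static and dynamic partitioning can be sketched in plain Python. This is an illustrative model, not Hive code: the rows, the `country` partition column and both helper functions are hypothetical, and the dictionary keys imitate the `column=value` directory names Hive uses for partitions.

```python
from collections import defaultdict

# Hypothetical rows of (name, country); `country` is the partition column.
rows = [("asha", "IN"), ("tom", "US"), ("ravi", "IN")]

def load_static(rows, country):
    # Static partitioning: the caller names the partition, and every loaded
    # row goes into that one partition directory.
    return {f"country={country}": [name for name, _ in rows]}

def load_dynamic(rows):
    # Dynamic partitioning: the partition is derived from each row's own value.
    partitions = defaultdict(list)
    for name, country in rows:
        partitions[f"country={country}"].append(name)
    return dict(partitions)

print(load_static(rows[:1], "IN"))
print(load_dynamic(rows))
```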
Alter Partitioned Table and MSCK Repair command (Advance) • 6:25
This is an advanced Hive concept usually asked about in interviews.
A partitioned table's schema can also be altered, for example by changing a partition's location, adding a new partition, or dropping a partition.
The MSCK repair table command in Hive is used to update the table's metadata when a partition has been added manually in the HDFS location.

What is Bucketing? • 2:27
Bucketing is another data organizing technique in Hive. While partitioning organizes a table into a number of directories, bucketing organizes a table into files.
This video explains:
Bucketing in Hive
Difference between Partitioning and Bucketing in Hive.
How to do bucketing.
Properties to be set to do bucketing.
Where to do bucketing
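The idea of spreading rows over a fixed number of files can be sketched in plain Python. This is a simplified model under stated assumptions: the table is assumed to have 4 buckets, and an integer key is assumed to hash to itself, whereas the hash function Hive actually applies depends on the column type and Hive version.

```python
NUM_BUCKETS = 4  # assumption: the table is declared with 4 buckets

def bucket_for(key, num_buckets=NUM_BUCKETS):
    # Assumption for this sketch: an integer key hashes to itself.
    return key % num_buckets

def write_buckets(keys, num_buckets=NUM_BUCKETS):
    # Each bucket corresponds to one file inside the table (or partition) directory.
    files = {b: [] for b in range(num_buckets)}
    for key in keys:
        files[bucket_for(key, num_buckets)].append(key)
    return files

print(write_buckets([1, 4, 5, 8, 10]))
```

Because the bucket count is fixed when the table is created, the number of files stays constant no matter how many distinct key values arrive, which is the main contrast with partitioning.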
Create Bucketed Table • 8:11
Tablesampling (Advance) • 5:09
Tablesampling is an advanced Hive concept and a provision of bucketing.
The video explains:
What is tablesampling.
Difference between tablesampling and the limit operator in Hive.
No_drop, Offline command (Advance) • 5:04
Advanced Hive concept:
With the No_drop and Offline commands we can prevent a Hive table or partition from being queried or dropped.
Partitioning
Quiz 1

Inner Joins on 2 Tables • 3:38
Joins in Hive behave the same as in SQL, i.e. two tables are joined based on a join condition.
This video explains how to join 2 tables in Hive.
These are the types of joins supported by Hive: Inner join, Left outer join, Right outer join and Full outer join.
Outer Joins on 2 Tables • 4:35
Joins in Hive behave the same as in SQL, i.e. two tables are joined based on a join condition.
This video explains how to join 2 tables in Hive.
These are the types of joins supported by Hive: Inner join, Left outer join, Right outer join and Full outer join.
Join 3 Tables in Hive • 4:32
Advanced Hive concept
We can also join 3 tables in a single query in Hive.
This video contains:
How to join 3 tables in Hive
Memory organisation while joining tables.
Memory Management & Optimization of Joins • 2:23
Map Joins (Advance) • 5:47
Map join is an advanced Hive join.
The logic behind Map join is that the join operation is executed entirely on the Map side. No reducer is used in Map joins.
This video contains:
What is Map join in Hive.
How Map join executes
When to use Map join in Hive
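The map-side idea can be sketched as a hash join in plain Python. This is an illustrative model only, not Hive's implementation: the function and the sample tables are hypothetical, and the small table is assumed to fit in memory.

```python
def map_join(big_table, small_table):
    # Build phase: the small table is loaded into an in-memory hash table.
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)
    # Probe phase: each "mapper" streams rows of the big table and joins them
    # against the hash table directly, so no shuffle or reduce step is needed.
    joined = []
    for key, value in big_table:
        for match in lookup.get(key, []):
            joined.append((key, value, match))
    return joined

orders = [(1, "book"), (2, "pen"), (3, "ink")]  # hypothetical big table
customers = [(1, "asha"), (2, "ravi")]          # hypothetical small table
print(map_join(orders, customers))
```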

Creation of Indexes (Compact and Bitmap) • 9:52
Indexing in Hive is an optimization technique to reduce the execution time of a query. A separate index table is created in which the indexes of all indexed columns are stored.
In this lecture we will learn:
What is Indexing in Hive
How to create indexing in Hive
Types of Indexing in Hive
With Deferred rebuild command in Indexing
Advantages of Indexing
Where and When to use Indexing in Hive
When not to use Indexing in Hive
Multiple Indexes on same table • 8:16
As we know, there are 2 types of indexes in Hive. The question is: can we create both indexes on the same table at the same time? The answer is yes. We can create multiple indexes on the same table. More is explained in the video.
When and When not to use Indexing • 1:33
Indexing should not be used blindly everywhere in Hive, since indexing can also be disadvantageous in some cases. This video explains when and when not to use indexing in Hive.

What is UDF • 1:47
UDFs (user defined functions) are the backbone of any Real time Hive project, because in live projects the requirements are complicated and cannot be met with built-in functions alone. With UDFs we can write our own functions according to the requirement and then use those functions in Hive queries.
UDF Implementation - Practical • 7:40
This video explains how to create a UDF in Java and how to use it in Hive.

Skipping Header and Footer records while loading in table • 8:02
Advanced Hive table property:
This property is used in Hive tables while loading data into them. With it we can skip some rows at the start or end of the file being loaded into our Hive table.
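The effect of skipping header and footer rows can be pictured with a short plain-Python sketch. This is only a model of the behaviour, not Hive code: the helper function and sample file lines are hypothetical, and the two counts correspond to the `skip.header.line.count` and `skip.footer.line.count` table properties.

```python
def skip_header_footer(lines, header_count=0, footer_count=0):
    # Drop the first `header_count` and last `footer_count` lines of a file,
    # mirroring what skip.header.line.count / skip.footer.line.count request.
    end = max(header_count, len(lines) - footer_count)
    return lines[header_count:end]

file_lines = ["id,name", "1,asha", "2,ravi", "total=2"]  # hypothetical file
print(skip_header_footer(file_lines, header_count=1, footer_count=1))
```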
Immutable Table property • 10:28
The immutable property is also used in Real time Hive projects. This lecture shows the behaviour of the Insert statement with the into and overwrite options when the immutable property is set to true.
Purge property + Difference between Drop and Truncate • 5:12
The purge property is used in Hive when we don't want our data going to the trash. When the purge property is set to true, the data is completely gone and cannot be recovered.
This Advance Hive lecture contains:
Purge property in Hive
How to drop a table in Hive
How to truncate table in Hive
Difference between dropping a table and truncating a table in Hive
Null Format property • 7:26
Advanced Hive table property:
By default, an empty value between delimiters in a file is not treated as NULL by Hive. This property tells Hive which value should be considered NULL.
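How a configurable null marker changes the parsed result can be sketched in plain Python. This is a simplified model for string columns only, not Hive code: the helper is hypothetical, its `null_format` argument plays the role of the `serialization.null.format` table property, and `\N` is used as the default marker.

```python
def parse_line(line, delimiter=",", null_format="\\N"):
    # Split one text row and map the configured null marker to None (NULL).
    return [None if field == null_format else field
            for field in line.split(delimiter)]

print(parse_line("1,,asha"))                  # empty field stays an empty string
print(parse_line("1,,asha", null_format=""))  # empty field now becomes NULL
```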
ACID/Transactional features of Hive (Advance) • 10:12
ACID properties in Hive - Advanced Hive concept
ORC Table properties • 4:01
Advanced Hive table property
Hive supports several file formats, such as RC, Parquet and Textfile; ORC is another of them. This video explains the table properties of a Hive table that stores data in ORC format.

Requirements

Basic Knowledge of Hadoop file system (HDFS)
Basic Knowledge of SQL
Rest everything is covered in this course (Hive + Advance Hive)

Description

"Apache Hive is an open source data processing tool on Hadoop that enables programmers to analyze large datasets stored in HDFS by using SQL like queries."

"Basic Hive is not sufficient if you want to work on Real-world projects."
Prepare yourself to work on Real time Big data and Hive projects by learning Advance Hive from this course. In this course you will get end-to-end knowledge of Basic + ADVANCE Hive + Interview asked Use cases. This course is rare of its kind and includes even very fine details of Hive which are not available anywhere online.

What, in a nutshell, is included in the course?

Learn the Basic Hive concepts to kickstart your learning including :

Introduction to Hive tool, Architecture, Installation, SQL vs Hive.
Create Databases & Tables and insert data into them.
Internal vs External tables concept in Hive
Joins, Views, Mathematical functions, Date functions, String functions, Conditional statements etc.

Learn the ADVANCE Hive concepts that are actively used in Real-world projects like:

Variables in Hive
Partitioning and Bucketing in Hive
Explode & Lateral view in Hive
Table properties of Hive
Custom Input Formatter
Map and Bucketed Joins
Advance functions in Hive
ACID features of Hive
User defined functions (UDFs) in Hive
Compression techniques in Hive
Configuration settings of Hive
Working with Multiple tables in Hive
Loading Unstructured data in Hive

and many more...

This course is a complete package explaining even rarely used commands and concepts in Hive. After completing this course you won't find any topic left.

Apart from this, I have included one more section: Use cases asked in Interviews. Students can usually answer the direct questions asked by interviewers but get stuck on use cases. For that, I have explained the frequently asked use cases along with how they work in practice in Hive.

Who this course is for:

Data Engineers who want to learn Basic Hive + ADVANCE HIVE (Real Project Oriented).
Candidates who want to gain knowledge of Real-world Hive Use cases asked in Interviews.

Course content

Introduction (Theory) • 6 lectures • 14min

Hive Basic Commands • 9 lectures • 1hr 1min

Functions in Hive • 7 lectures • 40min

Partitioning in Hive • 4 lectures • 20min

Bucketing in Hive • 4 lectures • 21min

Joins in Hive • 5 lectures • 21min

Views in Hive • 3 lectures • 12min

Indexing (Advance) • 3 lectures • 20min

UDF's (User defined functions) Advance • 2 lectures • 9min

Table Properties (Advance) • 6 lectures • 45min