Hive to ADVANCE Hive (Real time usage) :Hadoop querying tool
4.4 (30 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
107 students enrolled

In and Out of Hive - Starting with Basic Hive to Advance Hive along with Use cases asked in Interviews
Best Selling
Created by J Garg
Last updated 8/2017
English
Curiosity Sale
Current price: $10 Original price: $100 Discount: 90% off
30-Day Money-Back Guarantee
Includes:
  • 6 hours on-demand video
  • 41 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Learn the ins and outs of Hive, from basic to advanced level
  • The strength of this course is Advanced Hive: the areas of Hive that are actually used in real-time projects
  • Query and manage large datasets that reside in distributed storage
  • Learn the questions and use cases asked in interviews
  • The course will be updated frequently, with new topics added each time
  • Datasets and Hive queries are available in the Resources tab, which saves you typing effort
Requirements
  • Basic Knowledge of Hadoop file system (HDFS)
  • Basic Knowledge of SQL
  • Everything else is covered in this course (Hive + Advanced Hive)
Description

Hive is a data processing tool on Hadoop. It is a querying tool for data stored in HDFS, and the syntax of its queries is very similar to standard SQL. Hive is open-source software that lets programmers analyze large data sets on Hadoop.

Benefits of this course:

"Basic Hive is not sufficient if you want to work on real-time projects."

Prepare yourself to work on real-time Big Data and Hive projects by learning Advanced Hive in this course. Enroll and get end-to-end knowledge of basic Hive, Advanced Hive, and the use cases asked in interviews. This course is rare of its kind and covers even the finer details of Hive.

In this course you will learn step by step, from very basic Hive to Advanced Hive (which is what is actually used in real-time projects), including:

  • Variables in Hive 
  • Table properties of Hive
  • Map and Bucketed Joins
  • Advance functions in Hive
  • Compression techniques in Hive 
  • Configuration settings of Hive
  • Working with Multiple tables in Hive
  • Loading Unstructured data in Hive

And many more.

This course is a full package that explains even rarely used commands and concepts in Hive. After completing this course you should not find any Hive topic left uncovered. The course was made with the real implementation of Hive in live projects in mind.

Apart from this, I have included one more section on use cases asked in interviews. Students can usually answer the direct questions asked by interviewers but get stuck on use cases. For that reason I have explained the frequently asked use cases along with their practical working in Hive.

Additionally, you can download the step-by-step installation guide (PDF) to install Hadoop and Hive.

Who is the target audience?
  • Students who want to learn basic Hive + Advanced Hive (real-project oriented)
  • Any student who wants to learn what Hive queries are used in real-time projects, and how
  • Students who want to learn the real-time use cases asked in interviews
Curriculum For This Course
61 Lectures
06:09:37
Introduction (Theory)
6 Lectures 15:09

This video gives a brief description of Hive.

Preview 01:22

Since the syntax of Hive is similar to that of SQL, this video explains the similarities and differences between SQL and Hive.

Sql vs Hive
01:51

What Hive is and What Hive is not

What Hive is Vs What Hive is not
02:21

This video explains the general working of Hive. Hive can process and store structured data only. It does this by linking the metadata of its table to the file in HDFS.

Important note: Hive is not a database. After data is loaded into a Hive table, the data remains a file in HDFS rather than moving into a separate Hive store; Hive simply presents that file in a tabular way.

Preview 02:02

Architecture of Hive:

Hive has these components in its architecture:

  • UI
  • Driver
  • Compiler
  • Metastore
  • Execution engine
Architecture of Hive
03:24

Hive works in the following 3 modes:

  • Embedded mode
  • Local mode
  • Remote mode
Modes of Hive
04:09
Hive Basics
8 Lectures 47:28

Hive is not a database, but it uses a database to store the metadata of its tables. By default Hive provides the Derby database, but in real-time projects we use more robust databases such as MySQL.

This video explains how to create a database in various ways using different options.

Create databases
05:34

Since Hive stores data in a structured format, we create tables. In this lecture we create tables in Hive. Tables in Hive can be created in many ways and with many options.

After creating a table we have to load data into it. Note that loading does not mean transferring data into Hive, because Hive is not a database; rather, it links the metadata of the Hive table to the corresponding HDFS file.

Table creation and loading data into it
07:00

There are two types of tables we can create in Hive: Internal (or Managed) tables and External tables.

The main difference between the two appears when we drop a table. Dropping an Internal table loses both the schema and the data, since Hive is responsible for both. Dropping an External table loses only the schema (the table's metadata); the data is not lost and remains in the same HDFS location, where it can still be accessed by other applications.
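The drop behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not Hive code: the two dictionaries stand in for the metastore and for HDFS, and the table names and paths are hypothetical.

```python
def drop_table(metastore, hdfs, table_name):
    """Model DROP TABLE: the metadata always goes, the files only for managed tables."""
    table = metastore.pop(table_name)        # the schema/metadata is lost in both cases
    if table["type"] == "MANAGED":
        hdfs.pop(table["location"], None)    # Hive owns the data, so it is deleted too
    # For an EXTERNAL table the files stay where they are in HDFS.


# Hypothetical table names and paths, used only for illustration.
hdfs = {
    "/warehouse/emp_internal": ["1,asha", "2,ben"],
    "/data/emp_external": ["1,asha", "2,ben"],
}
metastore = {
    "emp_internal": {"type": "MANAGED", "location": "/warehouse/emp_internal"},
    "emp_external": {"type": "EXTERNAL", "location": "/data/emp_external"},
}

drop_table(metastore, hdfs, "emp_internal")
drop_table(metastore, hdfs, "emp_external")
print(sorted(hdfs))  # only the external table's files remain
```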

Internal vs External table- Explained
10:28

The insert statement is used to load data from one Hive table into another Hive table.

Insert statement
05:18

The multi insert statement is used to load data from one table into multiple tables.

Multi insert statement (Advance)
03:58

Once a Hive table has been created, its schema can be changed according to new requirements. This lecture explains the various ways in which a Hive table's schema can be changed.

Alter Table Schema
07:24

Order by: In Hive, ORDER BY guarantees a total ordering of the data, and to achieve that all rows have to be passed to a single reducer.

Sort by: SORT BY does not guarantee a total ordering of the data; it only orders the rows within each reducer.

Distribute by: DISTRIBUTE BY ensures that all rows with the same values in the DISTRIBUTE BY columns go to the same reducer.

Cluster by: CLUSTER BY is a shortcut for DISTRIBUTE BY plus SORT BY on the same columns. It first sends all rows with the same column values to a single reducer and then sorts those rows inside the reducer.

The lecture also explains:

  • The difference between order by and sort by
  • Order by with the limit clause
  • The behaviour of the order by command in strict and non-strict mode
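To make the four clauses concrete, here is a small Python simulation in which each reducer is just a list. It is a sketch under assumptions: `reducer_for` uses a CRC32 checksum as a stand-in for Hive's real partitioning hash, and the column names are made up for the example.

```python
import zlib


def reducer_for(value, num_reducers):
    # Deterministic stand-in for the partitioning hash (an assumption, not Hive's own hash).
    return zlib.crc32(str(value).encode("utf-8")) % num_reducers


def distribute_by(rows, key, num_reducers):
    """DISTRIBUTE BY: rows with the same key value land in the same reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[reducer_for(row[key], num_reducers)].append(row)
    return reducers


def sort_by(reducers, key):
    """SORT BY: each reducer sorts only its own rows, so there is no global order."""
    return [sorted(part, key=lambda r: r[key]) for part in reducers]


def cluster_by(rows, key, num_reducers):
    """CLUSTER BY: DISTRIBUTE BY followed by SORT BY on the same column."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


def order_by(rows, key):
    """ORDER BY: a single reducer sees every row, which gives a total order."""
    return sorted(rows, key=lambda r: r[key])


rows = [
    {"dept": d, "salary": s}
    for d, s in [("hr", 300), ("it", 500), ("hr", 100), ("ops", 200), ("it", 400)]
]
clustered = cluster_by(rows, "dept", 2)
for number, part in enumerate(clustered):
    print(number, part)
```

Because ORDER BY funnels everything through one reducer it is the slowest option on large data, which is why the other three clauses exist.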


Sorting -- sort by, order by, distribute by, cluster by
07:24

This PDF contains a step-by-step procedure to install Hadoop and Hive, along with other resources such as Java, VirtualBox and Ubuntu.

HADOOP AND HIVE INSTALLATION
00:22
Functions
7 Lectures 44:05

Hive provides us with these Date and Mathematical functions.

Date and Mathematical functions
06:36

This video explains various string functions in Hive.

String functions
05:39

  • Split() function in Hive
  • Substr() function in Hive
  • instr() function in Hive

These string functions are widely used in Hive.

Split(), Substr(), instr() functions
03:22

The following conditional functions are used in Hive.

if(boolean testCondition, T valueTrue, T valueFalseOrNull)

Returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.

isnull( a )

Returns true if a is NULL and false otherwise.

isnotnull ( a )

Returns true if a is not NULL and false otherwise.

nvl(T value, T default_value)

Returns default_value if value is NULL, else returns value (as of Hive 0.11).

COALESCE(T v1, T v2, ...)

Returns the first v that is not NULL, or NULL if all v's are NULL.

CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END

When a = b, returns c; when a = d, returns e; else returns f.

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END

When a = true, returns b; when c = true, returns d; else returns e.
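The NULL-handling rules listed above can be mirrored in Python, with `None` playing the role of NULL. This is a sketch of the semantics only; `hive_if` is a name chosen here because `if` is a Python keyword.

```python
def hive_if(test_condition, value_true, value_false_or_null):
    # A NULL (None) condition is not true, so it falls through to the second value.
    return value_true if test_condition is True else value_false_or_null


def isnull(a):
    return a is None


def isnotnull(a):
    return a is not None


def nvl(value, default_value):
    return default_value if value is None else value


def coalesce(*values):
    # First non-NULL argument, or NULL when every argument is NULL.
    for v in values:
        if v is not None:
            return v
    return None


print(nvl(None, 0), coalesce(None, None, 3), hive_if(None, "yes", "no"))
```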

Conditional statements
04:54

The explode function and lateral view are Advanced Hive concepts.

explode() takes an array as input and outputs the elements of the array as separate rows.

Lateral view is used to select the table's other columns alongside the exploded column.
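As a rough Python picture of what explode() combined with a lateral view produces: every array element becomes its own output row, and the other columns of the source row are repeated next to it. The table and column names below are hypothetical, and rows with an empty array produce no output, as with a plain (non-OUTER) lateral view.

```python
def lateral_view_explode(rows, array_column, alias):
    """One output row per array element, carrying the source row's other columns."""
    out = []
    for row in rows:
        for element in row[array_column] or []:   # empty or NULL arrays yield no rows
            new_row = dict(row)
            new_row[alias] = element
            out.append(new_row)
    return out


# Hypothetical data for illustration.
employees = [
    {"name": "asha", "skills": ["hive", "pig"]},
    {"name": "ben", "skills": []},
]
result = lateral_view_explode(employees, "skills", "skill")
for r in result:
    print(r["name"], r["skill"])
```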


Explode and Lateral view ( Advance functions)
10:18

This is an Advanced Hive function: A RLIKE B evaluates to true if any substring of A matches B, where B is treated as a regular expression.

Example: 'Super' RLIKE 'Su' -> true
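The same check can be tried out in Python with `re.search`, which also succeeds when any substring matches the pattern. Treat it as an approximation: Hive uses Java regular expressions, and Python's syntax differs from Java's in some details.

```python
import re


def rlike(a, b):
    """True if any substring of a matches the regular expression b; NULL in, NULL out."""
    if a is None or b is None:
        return None
    return re.search(b, a) is not None


print(rlike("Super", "Su"))     # substring match anywhere in the string
print(rlike("Super", "^per"))   # anchored at the start, so no match
```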


Rlike function (Advance)
03:04

These are Advanced Hive functions and belong to Hive's analytical functions.

rank() gives tied rows the same rank value and leaves a gap in the sequence after each tie.

dense_rank() also gives tied rows the same rank value, but leaves no gaps as rank() does.

row_number() never repeats a value: tied rows still receive different, consecutive numbers.

The lecture also covers the difference between rank() and dense_rank().
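The difference between the three functions is easiest to see side by side. The following Python sketch computes all three numberings for a list of scores ordered from highest to lowest; it imitates the results only and is not how Hive evaluates window functions.

```python
def rank_functions(scores):
    """Return (score, rank, dense_rank, row_number) tuples, highest score first."""
    ordered = sorted(scores, reverse=True)
    result = []
    rank = dense = 0
    previous = object()              # sentinel that equals no real score
    for row_number, score in enumerate(ordered, start=1):
        if score != previous:
            rank = row_number        # rank jumps to the row position, leaving gaps
            dense += 1               # dense_rank moves up by exactly one
            previous = score
        result.append((score, rank, dense, row_number))
    return result


for line in rank_functions([90, 80, 90, 70]):
    print(line)
# (90, 1, 1, 1)
# (90, 1, 1, 2)
# (80, 3, 2, 3)
# (70, 4, 3, 4)
```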

Rank(), Dense_rank(), Row_number() (Advance)
10:12
Partitioning and Bucketing
6 Lectures 47:52

Partitioning is a data organizing technique in Hive. It is a way of organizing a table into smaller partitions based on the values of columns in the table.

Partitioning can be done in two ways.

Static Partitioning: In static partitioning the partition value is hardcoded, and we have to specify the partition name manually while loading data into it.

Dynamic Partitioning: In dynamic partitioning Hive automatically decides the partitions based on the values of the partition column.

This lecture covers:

  • Static partition in Hive table
  • How to load data into partitioned table in Hive
Static partitioning
07:31

Dynamic Partitioning in Hive: In dynamic partitioning Hive automatically decides the partitions based on the values of the partition column.

This lecture covers: 

  • What is dynamic partition
  • Load data into dynamic partitioned table.
  • Difference between static and dynamic partition.
  • Set dynamic partition property
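The contrast between static and dynamic partitioning can be sketched in Python by treating a table as a mapping from partition directory name to rows. The function names and the `country` column are assumptions made for this example.

```python
from collections import defaultdict


def load_static(table, rows, partition_column, partition_value):
    """Static partitioning: the caller names the target partition explicitly."""
    table[f"{partition_column}={partition_value}"].extend(rows)


def load_dynamic(table, rows, partition_column):
    """Dynamic partitioning: the partition is derived from each row's column value."""
    for row in rows:
        data = {k: v for k, v in row.items() if k != partition_column}
        table[f"{partition_column}={row[partition_column]}"].append(data)


table = defaultdict(list)
load_static(table, [{"id": 1}], "country", "IN")
load_dynamic(table, [{"id": 2, "country": "US"}, {"id": 3, "country": "IN"}], "country")
print(dict(table))
```

In both cases the partition column is not stored inside the data rows; its value lives in the directory name.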
Dynamic partitioning
07:46

This is an Advanced Hive concept that is often asked about in interviews.

The schema of a partitioned table can also be altered, for example by changing a partition's location, adding a new partition, or dropping a partition.

The MSCK repair table command in Hive is used to update the table's metadata when a partition has been added manually in the HDFS location.
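Conceptually, the repair step compares the partition directories that exist in HDFS with the partitions the metastore knows about, and registers the missing ones. A minimal Python sketch of that comparison, using hypothetical partition names:

```python
def msck_repair(metastore_partitions, hdfs_directories):
    """Register partitions that exist as directories but are unknown to the metastore."""
    missing = sorted(set(hdfs_directories) - set(metastore_partitions))
    metastore_partitions.extend(missing)
    return missing


# Hypothetical state: one partition directory was created by hand in HDFS.
known = ["country=IN", "country=US"]
on_disk = ["country=IN", "country=US", "country=UK"]
added = msck_repair(known, on_disk)
print(added)  # ['country=UK']
```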

Alter Partitioned Table and MSCK Repair command (Advance)
10:34

Bucketing is another data organizing technique in Hive. While partitioning organizes a table into a number of directories, bucketing organizes a table into files.

This video explains:

  • Bucketing in Hive
  • Difference between Partitioning and Bucketing in Hive.
  • How to do bucketing.
  • Properties to be set to do bucketing.
  • Where to do bucketing
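The core idea of bucketing, assigning each row to one of a fixed number of files by hashing a column, can be sketched in Python as follows. This is a simplified model that assumes a non-negative integer column whose hash is the value itself; the hash Hive actually applies depends on the column type and Hive version.

```python
def bucket_for(value, num_buckets):
    # Simplifying assumption: non-negative integer key whose hash is the value itself.
    return value % num_buckets


def write_buckets(rows, column, num_buckets):
    """Split rows into num_buckets lists, each standing in for one bucket file."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


rows = [{"user_id": i} for i in range(1, 7)]
buckets = write_buckets(rows, "user_id", 3)
for number, bucket in enumerate(buckets):
    print(number, [r["user_id"] for r in bucket])
# 0 [3, 6]
# 1 [1, 4]
# 2 [2, 5]
```

Unlike partitions, the number of buckets is fixed up front, so the number of files does not grow with the number of distinct values.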
Bucketing
09:27

Tablesampling is an Advanced Hive concept and a feature that bucketing makes possible.

Video explains:

  • What is tablesampling.
  • Difference between tablesampling and limit operator in hive


Tablesampling (Advance)
03:36

Advance Hive concept:

With the No_drop and Offline commands we can prevent a Hive table or partition from being dropped or queried.

No_drop, Offline command (Advance)
08:58
Joins and Views
5 Lectures 24:30

Joins in Hive behave the same as in SQL, i.e. they join 2 tables based on a join condition.

This video explains how to join 2 tables in Hive.

These are the types of joins supported by Hive: inner join, left outer join, right outer join and full outer join.

Join 2 Tables
04:33

Advance Hive concept

We can also join 3 tables in a single query in Hive.

This video contains:

  • How to join 3 tables in Hive
  • Memory organisation while joining tables.

 

Join 3 Tables + Memory management in Joins (Advance)
05:28

Map join is an Advanced Hive join.

The logic behind a map join is that the join operation is executed entirely on the map side. No reducer is used in map joins.

This video contains:

  • What is Map join in Hive.
  • How Map join executes
  • When to use Map join in Hive
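A map-side join can be pictured as a hash join: the small table is loaded into an in-memory lookup, and each row of the large table is matched against it as it streams past, so no shuffle to a reducer is needed. The Python sketch below shows an inner join of this kind; the table contents and column names are hypothetical.

```python
from collections import defaultdict


def map_join(large_rows, small_rows, key):
    """Inner join done in one pass over the large table, using an in-memory lookup."""
    lookup = defaultdict(list)
    for small in small_rows:                 # build phase: small table fits in memory
        lookup[small[key]].append(small)
    joined = []
    for large in large_rows:                 # probe phase: stream the large table
        for small in lookup.get(large[key], []):
            joined.append({**small, **large})
    return joined


# Hypothetical tables for illustration.
employees = [
    {"dept_id": 1, "emp": "asha"},
    {"dept_id": 2, "emp": "ben"},
    {"dept_id": 9, "emp": "carl"},
]
departments = [{"dept_id": 1, "dept": "hr"}, {"dept_id": 2, "dept": "it"}]
result = map_join(employees, departments, "dept_id")
print(result)
```

This only pays off when one side is small enough to be held in memory, which is the usual condition for choosing a map join.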
Preview 02:38

Views in Hive serve the same purpose as in SQL. A view on a table can be thought of as an image of that table.

This video contains:

  • What are views in Hive
  • How to create views
  • Different ways to create views
  • Dropping views in Hive
Views (Creation By different ways)
06:06

Advantages of Views in Hive.

Where to use Views in Hive

Advantages of Views
05:45
Indexing (Advance)
3 Lectures 19:41

Indexing in Hive is an optimization technique that reduces query execution time. A separate index table is created, in which the indexes of all indexed columns are stored.

In this lecture we will learn:

  • What is Indexing in Hive
  • How to create indexing in Hive
  • Types of Indexing in Hive
  • The With Deferred Rebuild option in indexing
  • Advantages of Indexing
  • Where and When to use Indexing in Hive
  • When not to use Indexing in Hive
Creation of Indexes (Compact and Bitmap)
09:52

As we know, there are 2 types of indexes in Hive. The question is: can we create both indexes on the same table at the same time? The answer is yes; we can create multiple indexes on the same table. More is explained in the video.

Multiple Indexes on same table
08:16

Indexing should not be used blindly everywhere in Hive, since it can also be disadvantageous in some cases. This video explains when and when not to use indexing in Hive.

When and When not to use Indexing
01:33
UDF's (User defined functions) Advance
2 Lectures 09:34

UDFs (user-defined functions) are the backbone of any real-time Hive project, because in live projects the requirements are complicated and cannot always be met with built-in functions. With UDFs we can write our own functions according to the requirement and then use those functions in Hive queries.

What is UDF
01:47

This video explains how to create a UDF in Java and how to use it in Hive.

UDF Implementation -Practical
07:47
Table Properties (Advance)
5 Lectures 35:09

The purge property is used in Hive when we don't want our data to go to the trash. When the purge property is set to true, the data is removed completely and cannot be recovered.

This Advance Hive lecture contains:

  • Purge property in Hive
  • How to drop a table in Hive
  • How to truncate table in Hive
  • Difference between dropping a table and truncating a table in Hive
Preview 05:12

The immutable property is also used in real-time Hive projects. This lecture shows the behaviour of the insert statement with the INTO and OVERWRITE options when the immutable property is set to true.

Immutable Table property
10:28

Advance Hive table property:

This property is used when loading data into Hive tables. It lets us skip some rows of the file (header and footer records) so that they are not loaded into the Hive table.
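The effect of skipping header and footer records can be illustrated with a short Python function. It is a sketch of the outcome only; the parameter names `skip_header` and `skip_footer` and the sample file contents are chosen for this example.

```python
def rows_to_load(lines, skip_header=0, skip_footer=0):
    """Drop the first skip_header and the last skip_footer lines of a file."""
    end = len(lines) - skip_footer
    return lines[skip_header:end]


# Hypothetical file with one header line and one footer line.
file_lines = ["id,name", "1,asha", "2,ben", "total=2"]
print(rows_to_load(file_lines, skip_header=1, skip_footer=1))  # ['1,asha', '2,ben']
```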

Skipping Header and Footer records while loading in table
08:02

Advance Hive table property:

By default, Hive does not treat an empty field between delimiters as NULL. This property tells Hive which value should be considered NULL.
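The following Python sketch shows the idea of a configurable NULL marker when parsing a delimited line. It assumes a comma delimiter and uses the two-character sequence `\N` as the default marker; the function and parameter names are made up for the illustration.

```python
def parse_line(line, delimiter=",", null_format="\\N"):
    """Split a line into fields, turning fields equal to the NULL marker into None."""
    return [None if field == null_format else field for field in line.split(delimiter)]


print(parse_line("1,,x"))                  # the empty field stays an empty string
print(parse_line("1,,x", null_format=""))  # now the empty field is read as NULL
```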

Preview 07:26

Advance Hive table property

Hive supports several file formats. It can store data in RC, Parquet and Textfile formats, among others, and one more of these is the ORC file format. This video explains the table properties of a Hive table that stores data in ORC format.

ORC Table properties
04:01
Configurations & Settings in Hive (Advance)
4 Lectures 23:22

This lecture covers the first set of configurations and settings that can be made in Hive.

Part 1
10:17

This lecture covers the second set of configurations and settings that can be made in Hive.

Part 2
03:33

Hive creates its own small files during and after query execution. By setting some Hive properties we can tell Hive to merge these files automatically. This is an optimization technique and an Advanced Hive concept.

Merge files in Hive
02:55

Parallelism in Hive means executing independent portions of a query in parallel. This is done to reduce the execution time.

Note: Parallelism should be used wisely, as it may lead to a deadlock situation.

This video contains:

  • Parallel processing in Hive.
  • hive.exec.parallel property
  • Hive parallel join
Parallelism Property
06:37
Variables in Hive (Advance)
4 Lectures 22:14

Advance Hive:

Hive variables are widely used in real-time live Hive projects. Variables in Hive behave the same way as in any programming language. We can declare variables in two ways: hiveconf and hivevar.
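Variables are referenced inside a query as `${hiveconf:name}` or `${hivevar:name}` and replaced with their values before the query runs. The Python sketch below imitates that textual substitution; the `substitute` function, the variable names and the query are illustrative assumptions, not part of Hive.

```python
import re


def substitute(query, hiveconf=None, hivevar=None):
    """Replace ${hiveconf:name} and ${hivevar:name} references with their values."""
    namespaces = {"hiveconf": hiveconf or {}, "hivevar": hivevar or {}}

    def replace(match):
        namespace, name = match.group(1), match.group(2)
        # Unknown variables are left untouched in this sketch.
        return namespaces[namespace].get(name, match.group(0))

    return re.sub(r"\$\{(hiveconf|hivevar):([^}]+)\}", replace, query)


query = "SELECT * FROM ${hivevar:tbl} WHERE id = ${hiveconf:max_id}"
print(substitute(query, hiveconf={"max_id": "10"}, hivevar={"tbl": "emp"}))
# SELECT * FROM emp WHERE id = 10
```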

Set Hive variable (hiveconf)
04:29

Advance Hive concept:

We can also pass values to Hive variables from the Bash shell. These Hive variables can be passed to a query or to a Hive script.

Using variables in bash shell
05:37

Advance Hive:

We can also run Unix and Hadoop commands from the Hive shell. This is a good approach because the commands run from within the Hive shell rather than opening a new JVM instance.

Running Unix commands from Hive Shell
09:46

Advance Hive concept

This Hive lecture explains how a variable gets its value from another variable, and which Hive property must be set to true for this to work.

Substituting value of a Variable
02:22
2 More Sections
About the Instructor
J Garg
4.4 Average rating
30 Reviews
107 Students
1 Course
Technical Lead in a prestigious MNC

I work as a Technical Lead in a well-known MNC and have 5 years of experience in Hadoop technologies, with Hive being my strength. I have implemented many end-to-end customer projects in the role of Technical Lead. I am also engaged in classroom training and in training freshers in my company.