From 0 to 1: Hive for Processing Big Data
4.1 (239 ratings)
2,741 students enrolled

End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes
Best Selling
Created by Loony Corn
Last updated 11/2016
English
Includes:
  • 15.5 hours on-demand video
  • 137 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Write complex analytical queries on data in Hive and uncover insights
  • Use partitioning and bucketing to optimize queries in Hive
  • Customize Hive with user-defined functions in Java and Python
  • Understand what goes on under the hood of Hive with HDFS and MapReduce
Requirements
  • Hive requires knowledge of SQL. If you don't know SQL, please head to the SQL primer at the end of the course first.
  • You'll need to know Java if you are interested in the sections on custom user-defined functions.
  • No other prerequisites: The course covers everything you need to install Hive and run queries!
Description

Prerequisites: Hive requires knowledge of SQL. The course includes an SQL primer at the end; please do that first if you don't know SQL. You'll need to know Java if you want to follow the sections on custom functions.

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with large-scale data.

Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing.

Let's parse that:

A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive.

End-to-End: The course is an end-to-end guide for using Hive. Whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you'll need is right here. New to SQL? No need to look elsewhere. The course has a primer on all the basic SQL constructs.

Practical: Everything is taught using real-life examples, working queries and code.

What's Covered: 

Analytical Processing: Joins, Subqueries, Views, Table Generating Functions, Explode, Lateral View, Windowing and more

Tuning and Extending Hive: Partitioning, Bucketing, Join Optimizations, Map-Side Joins, Indexes, writing custom user-defined functions in Java (UDF, UDAF, GenericUDF, GenericUDTF), custom functions in Python, and the MapReduce implementation of Select, Group By and Join

For SQL Newbies: SQL In Great Depth


Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!


Who is the target audience?
  • Yep! Analysts who want to write complex analytical queries on large scale data
  • Yep! Engineers who want to know more about managing Hive as their data warehousing solution
Curriculum For This Course
87 Lectures
15:15:27
You, Us & This Course
1 Lecture 02:02

We start with an introduction. What is the course about? What will you know at the end of the course? 

Preview 02:02
Introducing Hive
4 Lectures 43:30

Data warehousing systems, which have become the rage with the rise of 'Big Data', are quite different from traditional transaction processing systems. Hive is a prototypical data warehousing system.

Preview 12:59

Hive is built atop Hadoop, and can even be characterized as the SQL skin atop Hadoop MapReduce.

Hive and Hadoop
09:19
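To make the "SQL skin atop MapReduce" idea concrete, here is a minimal Python sketch of how a query such as SELECT dept, COUNT(*) FROM emp GROUP BY dept can be broken into map, shuffle and reduce steps. The emp table and its columns are hypothetical, and everything runs in a single process. This illustrates the pattern; it is not Hive's actual query compiler.

```python
from collections import defaultdict

def map_phase(rows, key_col):
    # A mapper emits one (key, 1) pair per input row, as COUNT(*) needs.
    for row in rows:
        yield row[key_col], 1

def shuffle_phase(pairs):
    # Group all values by key, standing in for Hadoop's shuffle-and-sort.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    # A reducer sums the ones for each key to get the per-group count.
    return {key: sum(values) for key, values in groups.items()}

def group_by_count(rows, key_col):
    return reduce_phase(shuffle_phase(map_phase(rows, key_col)))

# Hypothetical rows of an "emp" table.
emp = [
    {"name": "a", "dept": "eng"},
    {"name": "b", "dept": "ops"},
    {"name": "c", "dept": "eng"},
]
print(group_by_count(emp, "dept"))  # {'eng': 2, 'ops': 1}
```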

Hive tries really hard - and mostly succeeds - at pretending to be a relational DBMS, but under the hood it's quite different. Understand how, and understand schema-on-read.

Hive vs Traditional Relational DBMS
13:52
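Schema-on-read means the stored file is just text, and the table's declared types are applied only when a query reads it. Below is a minimal Python sketch that assumes a comma-delimited file and a hypothetical two-column schema. A field that does not parse as its declared type, or is missing, comes back as None. This mirrors how Hive's default text format typically surfaces such values as NULL at query time instead of rejecting them at write time.

```python
# Raw "file" contents: nothing validated these lines when they were written.
RAW_LINES = [
    "1,alice",
    "two,bob",   # 'two' is not an int
    "3",         # the second field is missing
]

# Hypothetical table schema, applied only at read time.
SCHEMA = [("id", int), ("name", str)]

def read_with_schema(lines, schema, delimiter=","):
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        row = {}
        for i, (col, typ) in enumerate(schema):
            if i >= len(fields):
                row[col] = None          # missing field -> NULL
                continue
            try:
                row[col] = typ(fields[i])
            except ValueError:
                row[col] = None          # unparseable field -> NULL
        rows.append(row)
    return rows

for row in read_with_schema(RAW_LINES, SCHEMA):
    print(row)
```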

Now that we understand the differences between Hive and a traditional RDBMS, the differences between HiveQL and SQL will seem a lot less annoying and arbitrary.

HiveQL and SQL
07:20
Hadoop and Hive Install
5 Lectures 54:31

Before we install Hive, we need to install Hadoop. Hadoop has 3 different install modes: Standalone, Pseudo-distributed and Fully Distributed. Get an overview of when to use each.

Hadoop Install Modes
08:32

How to set up Hadoop in the standalone mode. Windows users need to install a Virtual Linux instance before this video. 

Hadoop Install Step 1 : Standalone Mode
15:46

Set up Hadoop in the Pseudo-Distributed mode. All Hadoop services will be up and running! 

Hadoop Install Step 2 : Pseudo-Distributed Mode
11:44

If you are all set with Hadoop, let's go ahead and install Hive. 

Hive install
12:05

Let's run a few basic queries on Hive. Head on over to the SQL primer section at the end of the course if you have no previous experience with SQL.

Preview 06:24
Hadoop and HDFS Overview
2 Lectures 18:25

What exactly is Hadoop? Its origins and its logical components explained.

Preview 07:25

HDFS, based on GFS (the Google File System), is the storage layer within Hadoop. By default, it stores files in blocks of 128MB.

HDFS or the Hadoop Distributed File System
11:00
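The arithmetic behind block storage is simple: a file is cut into fixed-size blocks, and only the last block may be partially filled. Here is a small Python sketch that assumes the 128MB default. The real size is configurable per cluster and per file through the dfs.blocksize setting.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, the usual default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full
    if remainder:
        blocks.append(remainder)  # the last block holds only the leftover bytes
    return blocks

one_gb_plus_one = 1024 * 1024 * 1024 + 1
blocks = split_into_blocks(one_gb_plus_one)
print(len(blocks), blocks[-1])  # 9 1
```

HDFS does not pad the final block, so the 1-byte tail block above does not take up 128MB on disk.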
Hive Basics
11 Lectures 01:45:27

Let's cycle through primitive datatypes in Hive.

Preview 17:07

Hive has some really cool datatypes - collections that make it feel like there is a real programming language under the hood. Oh, and btw - there is!

Collections: Arrays and Maps
09:28

Structs and unions are yet another bit of Hive that seem more at home in a programming language.

Structs and Unions
05:57

Let's get into the nitty-gritty - starting with creating tables. Remember schema-on-read? 

Create Table
13:15

Inserting into tables has a few quirks in Hive because, after all, all writes are just data dumps that know nothing about the schema.

Insert Into Table
12:04

More on inserts - remember that no schema checking happens during database writes!

Insert into Table 2
06:51

Alter table works in Hive - understand how.

Alter Table
07:22

Hive data is stored as files on HDFS, the distributed file system that is an integral part of Hadoop. Understanding the physical layout of Hive tables will make many advanced concepts, such as bucketing and partitioning, far clearer.

HDFS
09:25

Learn how to interact with HDFS. This comes in handy if you want to understand what's going on under the hood of your Hive Queries. 

HDFS CLI - Interacting with HDFS
10:58

Let's create a few tables and see how to insert data. We'll see external tables as well and what happens under the hood in HDFS with each of these activities. 

Code-Along: Create Table
09:54

Hive CLI allows you to run scripts and execute queries directly from the command line rather than the hive shell. 

Code-Along : Hive CLI
03:06
Built-in Functions
4 Lectures 34:29

Hive has a whole bunch of useful functions available out-of-the-box. This is an introduction to the 3 types of functions available: standard, aggregate and table generating functions.

Preview 06:45

The case-when statement is very useful to populate columns by evaluating conditions. Size() and Cast() are other useful built-in functions.

The Case-When statement, the Size function, the Cast function
10:09

explode() is a very interesting table generating function which expands an array to produce a row for every element in the array.

The Explode function
13:07
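explode() turns one row holding an array into one output row per array element. When it is used with LATERAL VIEW, each generated row also keeps the other columns of its source row. Here is a minimal Python sketch of those semantics, using a hypothetical posts table with a tags array. The comment shows the HiveQL shape being imitated. Without LATERAL VIEW OUTER, a source row with an empty array produces no output.

```python
# Imitates:  SELECT name, tag FROM posts LATERAL VIEW explode(tags) t AS tag;
posts = [
    {"name": "p1", "tags": ["hive", "sql"]},
    {"name": "p2", "tags": ["hdfs"]},
    {"name": "p3", "tags": []},   # empty array: contributes no output rows
]

def explode(values):
    # One output item per array element.
    for v in values:
        yield v

def lateral_view_explode(rows, array_col, alias):
    out = []
    for row in rows:
        for element in explode(row[array_col]):
            # Keep the source row's other columns and add the exploded element.
            new_row = {k: v for k, v in row.items() if k != array_col}
            new_row[alias] = element
            out.append(new_row)
    return out

for r in lateral_view_explode(posts, "tags", "tag"):
    print(r)
```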

Code-Along : Hive Built-in Functions
04:28
Sub-Queries
5 Lectures 46:03

Sub-queries in Hive are rather quirky. For instance, union is fine, but intersect is not. 

Preview 07:13

Sub-queries have a few rather arcane rules - no equality signs, and some rather specific rules on exists and in.

More on subqueries: Exists and In
15:13

It is possible to insert data into a table using subqueries - just don't try to specify any schema information!

Inserting via subqueries
05:23

Code-Along : Use Subqueries to work with Collection Datatypes
05:56

Views are an awesome bit of functionality in Hive - use them. Oh, btw, views are non-materialized, if that means anything to you. If not - never mind!

Views
12:18
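"Non-materialized" means a view stores only its query, and that query runs against the base table every time the view is read. Below is a minimal Python sketch of the difference, with a hypothetical orders table and a view modelled as a function. For contrast, a materialized copy is modelled as a one-time snapshot.

```python
# A hypothetical base "table" that changes over time.
orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}]

def big_orders_view():
    # Non-materialized: the filter is re-evaluated on every read.
    return [o for o in orders if o["amount"] > 100]

# Contrast: a materialized copy is computed once and then goes stale.
snapshot = big_orders_view()

orders.append({"id": 3, "amount": 900})

print(len(big_orders_view()))  # 2: the view sees the new row
print(len(snapshot))           # 1: the snapshot does not
```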
Partitioning
7 Lectures 51:38

Indices are just a lot less important in Hive than they are in SQL. Understand why, and also how they can be used.

Indices
06:40

Partitioning in Hive is conceptually similar to indexing in a traditional DBMS: a way to quickly look up rows with specific values in a particular column.

Preview 06:36

Let's understand the why of partitioning.

The Rationale for Partitioning
06:16

Partitioning needs to be specified at the time of table creation. Understand the syntax.

How Tables are Partitioned
09:52
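Each partition of a Hive table is stored as its own subdirectory, named column=value, under the table's directory. A WHERE clause on the partition column therefore lets Hive skip whole directories. Here is a minimal Python sketch of that layout and of the pruning step. It assumes the default warehouse path and a hypothetical sales table partitioned by country and dt.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse directory

def partition_path(table, partition_spec):
    """Build the directory for one partition, e.g. .../sales/country=IN/dt=..."""
    parts = [f"{col}={val}" for col, val in partition_spec]
    return posixpath.join(WAREHOUSE, table, *parts)

def prune(partition_dirs, col, value):
    """Keep only directories containing a col=value path segment."""
    wanted = f"{col}={value}"
    return [d for d in partition_dirs if wanted in d.split("/")]

dirs = [
    partition_path("sales", [("country", "IN"), ("dt", "2016-11-01")]),
    partition_path("sales", [("country", "US"), ("dt", "2016-11-01")]),
    partition_path("sales", [("country", "IN"), ("dt", "2016-11-02")]),
]
# Imitates: SELECT ... FROM sales WHERE country = 'IN';
print(prune(dirs, "country", "IN"))
```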

Once a table has been partitioned appropriately, using it is not a lot of work.

Using Partitioned Tables
05:27

Inserting data into partitioned tables can be a bit tedious - understand how dynamic partitioning can help!

Dynamic Partitioning: Inserting data into partitioned tables
12:44
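With dynamic partitioning, you do not name each partition value in the INSERT statement. Instead, Hive reads the value from each row and routes the row to the matching partition directory. In HiveQL, the partition column goes last in the SELECT list. When no partition value is given statically, the insert usually also needs SET hive.exec.dynamic.partition.mode=nonstrict. Here is a minimal Python sketch of the routing, with hypothetical staging and sales tables.

```python
from collections import defaultdict

# Imitates: INSERT OVERWRITE TABLE sales PARTITION (country)
#           SELECT id, amount, country FROM staging;
staging = [
    {"id": 1, "amount": 10, "country": "IN"},
    {"id": 2, "amount": 20, "country": "US"},
    {"id": 3, "amount": 30, "country": "IN"},
]

def dynamic_partition_insert(rows, partition_col):
    """Route each row to the partition named by its own partition_col value."""
    partitions = defaultdict(list)
    for row in rows:
        directory = f"{partition_col}={row[partition_col]}"
        # The partition value lives in the directory name, not in the data file.
        data = {k: v for k, v in row.items() if k != partition_col}
        partitions[directory].append(data)
    return dict(partitions)

for directory, contents in dynamic_partition_insert(staging, "country").items():
    print(directory, contents)
```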

Let's see partitioning in action! 

Code-Along : Partitioning
04:03
Bucketing
5 Lectures 48:01

Bucketing is conceptually quite close to partitioning - and indeed to Indexing in traditional RDBMS - but with a key difference. 

Preview 11:56

Bucketing has an important advantage over partitioning - the metastore is unlikely to be taken down by it.

The Advantages of Bucketing
04:54

Bucketing needs to be specified at the time of table creation. Understand how.

How Tables are Bucketed
12:36
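Bucketing assigns each row to one of a fixed number of buckets by hashing the bucketing column and taking the result modulo the bucket count. Here is a minimal Python sketch for a table declared with CLUSTERED BY (user_id) INTO 4 BUCKETS. As a labelled assumption, an integer value serves as its own hash; Hive's actual hash depends on the column type.

```python
NUM_BUCKETS = 4  # as in: CLUSTERED BY (user_id) INTO 4 BUCKETS

def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    # Assumption: an integer value serves as its own hash.
    # Python's % with a positive divisor never gives a negative bucket number.
    return user_id % num_buckets

def assign_buckets(user_ids, num_buckets=NUM_BUCKETS):
    # One list per bucket: the number of buckets (files) is fixed up front.
    buckets = [[] for _ in range(num_buckets)]
    for uid in user_ids:
        buckets[bucket_for(uid, num_buckets)].append(uid)
    return buckets

print(assign_buckets([1, 2, 5, 8, 10, 13]))  # [[8], [1, 5, 13], [2, 10], []]
```

However many distinct user_id values arrive, there are still exactly 4 buckets. This is why bucketing does not flood the metastore the way a partition per distinct value can.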

Once a table has been bucketed, using it is not that difficult.

Using Bucketed Tables
07:22

Sampling is a very handy technique in a data warehouse, and bucketing helps power this functionality.

Sampling
11:13
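TABLESAMPLE(BUCKET x OUT OF y ON col) returns the rows that fall into bucket x when the data is hashed on col into y buckets. If the table is already bucketed on that column into y buckets, Hive can read just that bucket's file instead of scanning the whole table. Here is a minimal Python sketch of the row selection, using a hypothetical users table. As before, integer values serve as their own hash, which is an assumption.

```python
# Imitates: SELECT * FROM users TABLESAMPLE(BUCKET 2 OUT OF 4 ON user_id);
def tablesample_bucket(rows, col, x, y):
    """Return rows in bucket x (1-based) when rows are split into y buckets on col."""
    if not 1 <= x <= y:
        raise ValueError("bucket number x must be between 1 and y")
    # Assumption: an integer value serves as its own hash.
    return [r for r in rows if r[col] % y == x - 1]

users = [{"user_id": i} for i in range(1, 13)]
sample = tablesample_bucket(users, "user_id", 2, 4)
print([r["user_id"] for r in sample])  # [1, 5, 9]
```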
Windowing
4 Lectures 49:39

Windowing functions start to get at the real number-crunching power of Hive. In effect, they help tack on a new column to a query result - and that column contains the results of aggregate functions on a window of rows.

Windowing Introduced
12:59
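The "tack on a new column" idea separates windowing from GROUP BY: every input row survives, and the aggregate over that row's window is added alongside it. Here is a minimal Python sketch of SUM(...) OVER (PARTITION BY ...), using a hypothetical sales table with region and revenue columns.

```python
from collections import defaultdict

# Imitates: SELECT region, revenue,
#                  SUM(revenue) OVER (PARTITION BY region) AS region_total
#           FROM sales;
sales = [
    {"region": "north", "revenue": 100},
    {"region": "south", "revenue": 40},
    {"region": "north", "revenue": 60},
]

def sum_over_partition(rows, partition_col, value_col, out_col):
    totals = defaultdict(int)
    for r in rows:
        totals[r[partition_col]] += r[value_col]
    # Unlike GROUP BY, no rows are collapsed: each one gets its window's total.
    return [{**r, out_col: totals[r[partition_col]]} for r in rows]

for r in sum_over_partition(sales, "region", "revenue", "region_total"):
    print(r)
```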

Let's use windowing to set up a running total, aka a cumulative sum, for revenues in a sales table.

Windowing - A Simple Example: Cumulative Sum
09:39
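A running total is a window that grows from the first row up to the current row in the chosen order. Here is a minimal Python sketch, with a hypothetical sales table ordered by a ts column.

```python
from itertools import accumulate

# Imitates: SELECT ts, revenue,
#   SUM(revenue) OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
# FROM sales;
sales = [
    {"ts": 3, "revenue": 30},
    {"ts": 1, "revenue": 10},
    {"ts": 2, "revenue": 20},
]

def running_total(rows, order_col, value_col):
    ordered = sorted(rows, key=lambda r: r[order_col])
    sums = accumulate(r[value_col] for r in ordered)
    return [{**r, "running_total": s} for r, s in zip(ordered, sums)]

print([r["running_total"] for r in running_total(sales, "ts", "revenue")])  # [10, 30, 60]
```

The explicit ROWS frame matters when two rows share the same ts. With ORDER BY and no frame clause, Hive defaults to a RANGE frame, which gives tied rows the same total.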

Let's now make that running sum reset each day, combining the power of windowing and the power of partitioning.

Windowing - A More Involved Example: Partitioning
11:54
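Adding PARTITION BY day to the window makes the running sum restart at the first row of each day. Here is a minimal Python sketch, using a hypothetical sales table with day, ts and revenue columns.

```python
from itertools import accumulate, groupby

# Imitates: SUM(revenue) OVER (PARTITION BY day ORDER BY ts
#                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
sales = [
    {"day": "mon", "ts": 1, "revenue": 10},
    {"day": "mon", "ts": 2, "revenue": 20},
    {"day": "tue", "ts": 1, "revenue": 5},
    {"day": "tue", "ts": 2, "revenue": 15},
]

def running_total_per_day(rows):
    out = []
    ordered = sorted(rows, key=lambda r: (r["day"], r["ts"]))
    for _, day_rows in groupby(ordered, key=lambda r: r["day"]):
        day_rows = list(day_rows)
        # The accumulation starts fresh for each day's partition.
        sums = accumulate(row["revenue"] for row in day_rows)
        for r, s in zip(day_rows, sums):
            out.append({**r, "running_total": s})
    return out

print([r["running_total"] for r in running_total_per_day(sales)])  # [10, 30, 5, 20]
```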

ROW_NUMBER, RANK, LEAD and LAG - Hive places really nifty windowing functions at your disposal.

Windowing - Special Aggregation Functions
15:07
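The difference between ROW_NUMBER and RANK only shows up on ties. ROW_NUMBER always counts 1, 2, 3, and so on, while RANK gives tied rows the same number and then skips ahead. LAG looks back one row in the window order, and LEAD is its mirror image. Here is a minimal Python sketch over a hypothetical results table of (name, score) pairs. In Hive, the relative order of tied rows, and therefore their ROW_NUMBER, is not guaranteed; the sketch simply keeps input order.

```python
# Imitates: SELECT name, score,
#   ROW_NUMBER() OVER (ORDER BY score DESC),
#   RANK()       OVER (ORDER BY score DESC),
#   LAG(score)   OVER (ORDER BY score DESC)
# FROM results;
results = [("a", 90), ("b", 80), ("c", 90), ("d", 70)]

def window_functions(rows):
    # sorted() is stable, so tied rows keep their input order here.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    out = []
    rank = 0
    for i, (name, score) in enumerate(ordered):
        row_number = i + 1                              # always unique
        if i == 0 or score != ordered[i - 1][1]:
            rank = i + 1                                # ties share a rank; gaps follow
        lag = ordered[i - 1][1] if i > 0 else None      # previous score, NULL for the first row
        out.append((name, score, row_number, rank, lag))
    return out

for row in window_functions(results):
    print(row)
```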
9 More Sections
About the Instructor
Loony Corn
4.3 Average rating
5,071 Reviews
39,359 Students
78 Courses
An ex-Google, Stanford and Flipkart team

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and have spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!

We hope you will try our offerings, and think you'll like them :-)