Mastering Hive: From Basics to Advanced Big Data Analysis
Rating: 4.0 out of 5 (11 ratings)
503 students

Unlock the power of Hive for big data management and analytics, from beginner to expert level!
Last updated 7/2024
English

What you'll learn

  • Introduction to Hive: Understand the fundamentals of Hive and its role in the Hadoop ecosystem.
  • Hive Database Management: Learn how to create and manage Hive databases and tables.
  • Data Loading and Manipulation: Master the techniques for loading data into Hive and performing data manipulation operations.
  • Advanced Querying: Execute complex queries using HiveQL, including joins, partitions, and bucketing.
  • Hive Functions: Utilize built-in Hive functions for data processing and analysis.
  • User Defined Functions (UDFs): Create and implement custom UDFs to extend Hive's capabilities.
  • Hive Integration with HBase: Explore the integration of Hive with HBase for efficient data storage and retrieval.
  • Real-World Case Studies: Apply Hive knowledge to practical case studies in various industries, such as telecom and social media.
  • Hive with Other Big Data Tools: Learn to use Hive in conjunction with Pig, MapReduce, and Sqoop for comprehensive data analysis.
  • Sensor Data Analysis: Gain hands-on experience in processing and analyzing sensor data using Hive and Pig.

Course content

7 sections, 190 lectures, 23h 42m total length
  • Introduction to HIVE (10:45)

    Explore how Apache Hive enables SQL-like analytics on Hadoop without Java, using HiveQL to query data stored in HDFS; learn about the metastore, data types, and creating databases.

  • HIVE Data Base (10:21)

    Master Hive database basics: create, use, show, and drop databases, then create an employee table with tab-delimited fields (row format delimited) and load data from the local file system or HDFS.

  • Load Data Command (5:37)

    Learn to load data into Hive with the load data command from the local file system or HDFS, use overwrite to avoid duplicate rows, and use alter table to rename a table and add or drop columns.

  • How to Replace Column (4:17)

    Learn to alter Hive tables by replacing columns, changing data types, and adding a city column; use show create table, show tables, and drop table; and compare external, managed, and temporary tables.

  • External Table (6:26)

    Understand external tables in Hive, when to use them to preserve data, how to create them with the external keyword, and the basics of embedded, local, and remote metastores.

  • HIVE Metastore (3:25)
  • What is Hive Partition (9:45)
  • Creating Partition Table (8:30)
  • Insert Overwrite Table (3:55)

    Demonstrates insert overwrite into a partitioned table named emp_prtn, loading only the records whose year of joining is 2011 from the employee table, and explains static versus dynamic partitioning.

  • Dynamic Partition True (1:57)
  • Hive Bucketing (5:24)
  • Decomposing Data Sets (5:30)
  • Hive Joins (8:51)
  • Hive Joins Continue (9:45)

    Master Hive joins, including shuffle join, map join, bucket map join, sort merge bucket join, and skew join, with optimization strategies and Hive properties to boost performance on large data.

  • Skew Join (2:54)

    Explains how Hive joins handle skewed keys: rows with skewed keys are routed to an in-memory hash table while the remaining data is joined via a reducer, and map-side joins are used when one side is small.

  • What is Serde (7:29)

    Explore how a SerDe in Hive handles JSON data, the roles of the serializer and deserializer, and how data is serialized and deserialized when it is stored in and retrieved from Hive tables.

  • Serde in Hive (8:55)
  • Hive UDF (9:46)

    Explore Hive user defined functions (UDF and UDAF) by building a UDF in a Maven project, packaging it as a jar, adding the jar to Hive, and invoking it as a temporary function in queries.

  • Hive UDF Continues (7:28)
  • More Hive UDF (6:58)

    Learn how a Hive UDAF computes a maximum value using the init, iterate, terminatePartial, merge, and terminate methods, demonstrated with two files and the Math.max approach.

  • Maxcale Function (3:01)
  • Hive Example Use Case (12:04)
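
The max-aggregation lifecycle named in the "More Hive UDF" lecture can be pictured with a small simulation. This is only a sketch in plain Python, not Hive's Java API and not the course's code: the class name MaxEvaluator, the aggregate helper, and the sample values are all hypothetical, and the two input lists simply stand in for the two files mentioned above.

```python
class MaxEvaluator:
    """Plain-Python stand-in for a max aggregate evaluator (hypothetical, not Hive's API)."""

    def __init__(self):
        self.init()

    def init(self):
        # Reset the state; None means "no value seen yet".
        self.current = None

    def iterate(self, value):
        # Called once per input row; NULL-like values are skipped.
        if value is not None:
            self.current = value if self.current is None else max(self.current, value)

    def terminate_partial(self):
        # Partial result produced for one chunk of the input.
        return self.current

    def merge(self, partial):
        # Folding in a partial maximum is the same as seeing one more value.
        self.iterate(partial)

    def terminate(self):
        # Final result after all partials have been merged.
        return self.current


def aggregate(splits):
    """Run one evaluator per split, then merge the partial results."""
    partials = []
    for rows in splits:
        evaluator = MaxEvaluator()
        for value in rows:
            evaluator.iterate(value)
        partials.append(evaluator.terminate_partial())

    final = MaxEvaluator()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


# Two lists stand in for two input files (made-up values).
print(aggregate([[3, 9, 4], [7, 2]]))  # 9
```

Keeping the partial and final steps separate is what lets each chunk of input be reduced independently before the results are combined.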

Requirements

  • Basic understanding of SQL.
  • Familiarity with Hadoop ecosystem and big data concepts.
  • Basic programming knowledge, preferably in Python or Java.
  • Access to a computer with internet connectivity for practical exercises.

Description

Students will gain a comprehensive understanding of Hive, from the fundamentals to advanced topics. They will learn how to create and manage Hive databases, perform data loading and manipulation, execute complex queries, and use Hive's powerful features for data partitioning, bucketing, and indexing. Additionally, students will explore practical case studies and projects, applying their knowledge to real-world scenarios such as telecom industry analysis, customer complaint analysis, social media analysis, and sensor data analysis.

Section 1: Hive - Beginners

In this section, students will be introduced to Hive, an essential tool for managing and querying large datasets stored in Hadoop. They will learn the basics of Hive, including how to create databases, load data, and manipulate tables. Topics such as external tables, the Hive Metastore, and partitions will be covered, along with practical examples of creating partition tables, using dynamic partitions, and performing Hive joins. Students will also explore the concept of Hive UDFs (User Defined Functions) and how to implement them.
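
As a rough illustration of the partitioning idea covered in this section, the sketch below simulates in plain Python how rows can be grouped into one directory per partition value, and how a query that filters on the partition column only needs to read the matching directory. It is not Hive code: the table directory, the column names, and the sample rows are assumptions made up for the example, and only the column=value directory naming follows Hive's convention.

```python
from collections import defaultdict


def dynamic_partition(rows, partition_column, table_dir="/warehouse/emp_prtn"):
    """Group rows by partition value, keyed by a Hive-style 'column=value' path.

    table_dir is a hypothetical location used only for illustration.
    """
    partitions = defaultdict(list)
    for row in rows:
        row = dict(row)                     # copy so the caller's data is untouched
        value = row.pop(partition_column)   # the partition value lives in the path
        partitions[f"{table_dir}/{partition_column}={value}"].append(row)
    return dict(partitions)


def prune(partitions, partition_column, value):
    """Read only the partition directory that matches the filter value."""
    suffix = f"/{partition_column}={value}"
    return [row
            for path, rows in partitions.items()
            if path.endswith(suffix)
            for row in rows]


# Made-up employee rows; 'year' plays the role of the year of joining.
employees = [
    {"name": "asha", "year": 2011},
    {"name": "ben", "year": 2012},
    {"name": "chen", "year": 2011},
]

parts = dynamic_partition(employees, "year")
print(sorted(parts))
print(prune(parts, "year", 2011))
```

In a static load the partition value would be named up front by the person running the load; here it is derived from each row, which is the idea behind dynamic partitioning.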

Section 2: Hive - Advanced

Building on the foundational knowledge, this section delves into advanced Hive concepts. Students will learn about internal and external tables, inserting data, and various Hive functions. The section covers advanced partitioning techniques, bucketing, table sampling, and indexing. Practical demonstrations include creating views, using Hive variables, and understanding Hive architecture. Students will also explore Hive's parallelism capabilities, table properties, and how to manage and compress files in Hive.
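
Bucketing and table sampling, both mentioned in this section, can be sketched in a few lines. The example below is a simplified Python simulation rather than Hive itself: it assumes integer keys whose hash is the value itself, assigns each row to a bucket by taking that hash modulo the bucket count, and treats bucket numbers as 1-based when sampling. The function names and the sample rows are hypothetical.

```python
def bucket_for(key, num_buckets):
    """Bucket index (0-based) for an integer key: non-negative hash mod bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(rows, key, num_buckets):
    """Distribute rows into a fixed number of buckets by the key column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return buckets


def sample_bucket(rows, key, x, y):
    """Keep the rows that fall into bucket x out of y (x is 1-based in this sketch)."""
    return [row for row in rows if bucket_for(row[key], y) + 1 == x]


# Ten made-up rows with ids 0..9.
rows = [{"id": i} for i in range(10)]

buckets = bucketize(rows, "id", 4)
print([len(b) for b in buckets])        # [3, 3, 2, 2]
print(sample_bucket(rows, "id", 1, 4))  # rows with id 0, 4, 8
```

Because the bucket count is fixed, a sample of one bucket out of four reads roughly a quarter of the data without scanning the rest.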

Section 3: Project 1 - HBase Managed Hive Tables

This section focuses on integrating Hive with HBase, a distributed database. Students will learn how to create and manage Hive tables, both managed and external, and understand the nuances of static and dynamic partitions. They will gain hands-on experience in creating joins, views, and indexes, and explore complex data types in Hive. The section culminates in practical implementation projects involving Hive and HBase, showcasing real-world applications and use cases.

Section 4: Project 2 - Case Study on Telecom Industry using Hive

Students will apply their Hive knowledge to a case study in the telecom industry. This project involves working with simple and complex data types, creating and managing tables, and using partitions and bucketing to organize data. Students will learn how to perform various data operations, understand table control services, and create contract tables. This hands-on project provides valuable insights into how Hive can be used for industry-specific data analysis.

Section 5: Project 3 - Customer Complaints Analysis using Hive - MapReduce

In this section, students will analyze customer complaints data using Hive and MapReduce. They will learn how to create driver files, process data from specific locations, and group complaints by location. This project highlights the power of Hive and MapReduce for handling large datasets and provides practical experience in data processing and analysis.

Section 6: Project 4 - Social Media Analysis using Hive/Pig/MapReduce/Sqoop

This section explores the integration of Hive with other big data tools like Pig, MapReduce, and Sqoop for social media analysis. Students will learn how to process and analyze social media data, transfer data from an RDBMS to HDFS, and execute MapReduce programs. The project includes practical exercises in processing XML files, analyzing book reviews and performance, and working with complex datasets using Hive and Pig.

Section 7: Project 5 - Sensor Data Analysis using Hive/Pig

The final section focuses on sensor data analysis using Hive and Pig. Students will learn the basics of big data and MapReduce, and how to convert JSON files into text format. They will perform various data analysis tasks, including calculating ratios, generating reports, and processing data using Pig functions. This project provides comprehensive hands-on experience in processing and analyzing sensor data, showcasing the practical applications of Hive and Pig in real-world scenarios.

Conclusion

This course provides a complete journey from understanding the basics of Hive to mastering advanced big data analysis techniques. Through a combination of theoretical knowledge and practical projects, students will gain the skills needed to manage, analyze, and derive insights from large datasets using Hive. Whether you're an aspiring data engineer, a data analyst, or a tech entrepreneur, this course will equip you with the tools and knowledge to excel in the world of big data.

Who this course is for:

  • Aspiring Data Engineers: Individuals aiming to build a career in data engineering and big data analytics.
  • Big Data Enthusiasts: Anyone with a passion for big data technologies and analytics.
  • Data Analysts: Professionals seeking to enhance their data analysis skills with Hive.
  • Students: Computer science and engineering students interested in learning about big data technologies.
  • IT Professionals: IT professionals looking to upskill and transition into big data roles.
  • Software Developers: Developers wanting to integrate Hive capabilities into their applications.
  • Tech Entrepreneurs: Entrepreneurs looking to implement big data solutions in their startups.