Mastering Hive: From Basics to Advanced Big Data Analysis

Rating: 4.0 (11 reviews)

Unlock the power of Hive for big data management and analytics, from beginner to expert level!

Created by EDUCBA Bridging the Gap

Last updated 7/2024

English

What you'll learn

Introduction to Hive: Understand the fundamentals of Hive and its role in the Hadoop ecosystem.
Hive Database Management: Learn how to create and manage Hive databases and tables.
Data Loading and Manipulation: Master the techniques for loading data into Hive and performing data manipulation operations.
Advanced Querying: Execute complex queries using HiveQL, including joins, partitions, and bucketing.
Hive Functions: Utilize built-in Hive functions for data processing and analysis.
User Defined Functions (UDFs): Create and implement custom UDFs to extend Hive's capabilities.
Hive Integration with HBase: Explore the integration of Hive with HBase for efficient data storage and retrieval.
Real-World Case Studies: Apply Hive knowledge to practical case studies in various industries, such as telecom and social media.
Hive with Other Big Data Tools: Learn to use Hive in conjunction with Pig, MapReduce, and Sqoop for comprehensive data analysis.
Sensor Data Analysis: Gain hands-on experience in processing and analyzing sensor data using Hive and Pig.

Course content

7 sections • 190 lectures • 23h 42m total length

Introduction to HIVE (10:45)
Explore how Apache Hive enables SQL-like analytics on Hadoop without Java, using HiveQL to query data stored in HDFS; learn about the metastore, data types, and creating databases.
HIVE Database (10:21)
Master Hive database basics by creating, using, showing, and dropping databases, then create an employee table with a tab-delimited row format and load data from the local file system or HDFS.
Load Data Command (5:37)
Learn to load data into Hive with the load data command, using the local and HDFS options and overwrite to avoid duplicates, then use alter table to rename, add, or drop columns.
How to Replace Column (4:17)
Learn to alter Hive tables by replacing columns, changing data types, and adding a city column, using show create table, show tables, and drop table, plus external versus managed and temporary tables.
External Table (6:26)
Understand external tables in Hive, when to use them to preserve data, how to create them with the external keyword, and the basics of embedded, local, and remote metastores.
HIVE Metastore (3:25)
What is Hive Partition (9:45)
Creating Partition Table (8:30)
Insert Overwrite Table (3:55)
Demonstrates insert overwrite into a partitioned table named emp_prtn, loading only records with a 2011 year of joining from the employee table, and explains static versus dynamic partitioning.
Dynamic Partition True (1:57)
Hive Bucketing (5:24)
Decomposing Data Sets (5:30)
Hive Joins (8:51)
Hive Joins Continue (9:45)
Master Hive joins, including shuffle join, map join, bucket map join, sort merge bucket join, and skew join, with optimization strategies and Hive properties to boost performance on large data.
Skew Join (2:54)
Explains how skewed keys are handled in Hive joins: skewed data is routed to an in-memory hash table while the remaining data is joined via a reducer, with map-side joins used for small data.
What is SerDe (7:29)
Explore how SerDe in Hive handles JSON data, the serializer and deserializer roles, and how to serialize and deserialize data when storing to and retrieving from Hive tables.
SerDe in Hive (8:55)
Hive UDF (9:46)
Explore Hive user defined functions (UDF and UDAF) by building a UDF in a Maven project, packaging it as a jar, uploading it to Hive, and invoking a temporary function in queries.
Hive UDF Continues (7:28)
More Hive UDF (6:58)
Learn how Hive UDAFs compute maximum values using the init, iterate, terminatePartial, merge, and terminate methods, demonstrated with two files and the Math.max approach.
Maxcale Function (3:01)
Hive Example Use Case (12:04)
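
The aggregate-function lecture above names a five-step lifecycle (init, iterate, terminatePartial, merge, terminate). As a rough illustration only, the Python sketch below mimics that flow for a maximum over two input files. The class, the function, and the sample values are hypothetical; a real Hive UDAF is written in Java against Hive's own classes, and the course's code is not reproduced here.

```python
from typing import Iterable, List, Optional


class MaxEvaluator:
    """Toy stand-in for a UDAF evaluator that tracks a running maximum."""

    def __init__(self) -> None:
        self.init()

    def init(self) -> None:
        # Reset the state before a new group is aggregated.
        self.current: Optional[int] = None

    def iterate(self, value: Optional[int]) -> bool:
        # Called once per input row; NULL values (None) are ignored.
        if value is not None:
            self.current = value if self.current is None else max(self.current, value)
        return True

    def terminate_partial(self) -> Optional[int]:
        # Hand the partial result of one task to the next stage.
        return self.current

    def merge(self, partial: Optional[int]) -> bool:
        # Fold a partial result from another task into this one.
        return self.iterate(partial)

    def terminate(self) -> Optional[int]:
        # Produce the final value for the group.
        return self.current


def aggregate_max(splits: Iterable[Iterable[Optional[int]]]) -> Optional[int]:
    """Run one evaluator per split, then merge the partial results."""
    partials: List[Optional[int]] = []
    for rows in splits:
        evaluator = MaxEvaluator()
        for value in rows:
            evaluator.iterate(value)
        partials.append(evaluator.terminate_partial())

    final = MaxEvaluator()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


# Two hypothetical input files, each processed by its own task.
file_a = [12, 7, None, 30]
file_b = [25, 41, 3]
print(aggregate_max([file_a, file_b]))  # prints 41
```

Splitting the work into per-file partial results and a final merge is what lets an aggregate run across several tasks before the results are combined.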

Introduction to Hive Concepts and Hands-on Demonstration (5:59)
Explore Hive as a SQL-like tool over HDFS that translates queries to MapReduce for batch processing. Learn that Hive stores metadata, not data, and practice creating databases with if not exists.
Internal Table and External Table (6:19)
Inserting Data Into Tables (7:25)
Date and Mathematical Functions (9:00)
Conditional Statements (6:40)
Mastering Hive teaches conditional statements like if and case, and functions such as isnull, coalesce, and nvl, with examples and string tools like split, substring, and instr.
Explode and Lateral View (7:59)
Master Hive explode and lateral view to transform array elements into rows, launch a MapReduce job, join with other columns, and extract map keys and values.
Sorting (6:18)
Join (8:44)
Explore how Hive joins combine tables using equality conditions, covering left, right, and full outer joins, with two- and three-table joins and memory management via streaming versus buffering.
Map Join (2:10)
Use map join to load small tables into memory for a mapper-only join, avoiding the reduce phase. Configure bucket map join with the same bucketed column and same number of buckets.
Static and Dynamic Partitioning (7:17)
More on Dynamic Partitioning (6:59)
Alter Command (6:15)
Explore Hive alter commands to modify table schemas and structures. Rename tables, change column names and types, add or replace columns, and set properties.
MSCK Command (8:44)
Bucketing (8:08)
Table Sampling (3:05)
Master table sampling, a bucketing feature that draws samples from across partitions, unlike limit, using bucket 1 out of 4, or the percent, size, or rows parameters.
Archiving (2:44)
Hive archiving transfers older data to less frequently used storage to reduce NameNode load in HDFS, with archive and unarchive commands and notes that space is not salvaged.
Ranks (8:44)
Explore the rank function, dense rank function, and row number function and how they rank data within partitions, handle ties, and produce top-n results using partition by and order by.
Creating Views (8:39)
Advantages of views and Altering Views (6:50)
Learn how Hive views function as tables from SQL results, selecting columns and filtering rows. See how changes to base data don't reflect in views, and how to create them.
What is Indexing (5:49)
Learn how to create, alter, and drop compact and bitmap indexes on a table, compare performance, and determine when indexing helps or hinders queries in large datasets.
Compact and Bitmap Index Running Time (5:25)
Hive Commands in Bash Shell (5:24)
Hive Variables - Hiveconf (4:10)
Hive Variables - Hiveconf in Bash Shell (5:08)
Configuring a Hive Var Variable (8:57)
Configure and use Hive variables in queries with hivevar and hiveconf, run scripts via source, and execute Unix and HDFS commands from the Hive shell.
Variable Substitution (2:14)
Word Count (5:47)
Hive Architecture (3:14)
Parallelism in Hive (6:14)
Explore how Hive parallelism enables executing independent stages in parallel to speed up joins of partial data from two tables, noting the built-in parallel property hive.exec.parallel and potential deadlocks.
Table Properties in Hive (6:06)
Master Hive table properties to control data loading, distinguishing active from passive. Use skip.header.line.count and skip.footer.line.count to omit header and footer lines during load.
Null Format Properties (5:31)
Learn to use Hive's null format property to treat empty fields as null values by setting serialization.null.format during table creation and validating with queries.
Null Format Properties Continues (3:39)
Demonstrate setting table properties for RC and ORC formats, including compression codecs (zlib, snappy, bzip2, gzip) and options like stripe size, row index stride, and bloom filters.
Purge Commands in Hive (4:41)
Explore Hive purge commands: differentiate drop vs truncate, distinguish internal vs external tables, and understand purge behavior that permanently deletes data (no trash) since version 0.14.
Slowly Changing Dimension (6:56)
Explore how to capture data changes in Hive using slowly changing dimensions, detailing type zero, type one, and type two approaches for preserving history.
Implement the SCD (8:57)
Example of the SCD (4:02)
Demonstrate a Hive-based slowly changing dimension (SCD) example using a full outer join of two tables and CDC codes to label new, update, or no change records.
How to Load XML Data in Hive (5:11)
How to Load XML Data in Hive Continue (8:48)
Create a Hive table book_details with title, author, country, company, price, and year using an XML row format and XPath mappings. Load local books.xml and verify two rows via select.
No Drop and Offline in Hive (8:09)
Learn to protect Hive tables and partitions from being dropped or queried by enabling no drop and offline protections, and then remove those protections with the disable commands.
Immutable Table (9:09)
How to Create Hive RC File (8:38)
Learn how to configure Hive with the .hiverc file, manage session versus global settings, and enable headers and map join, then explore cartesian products or cross joins.
Multiple Tables (6:25)
Discover how Hive links metadata to a single data file to support multiple tables, including cases with fewer, equal, or more columns than the file, where extra columns become null.
Merging Hive Created Files and Function rLike (5:32)
Master Hive file merging to reduce small files by configuring hive.merge settings, and learn the rlike function for substring and regex matching.
Various Configuration Settings in Hive (9:07)
Various Configuration Settings in Hive Continues (3:12)
Explore Hive configurations: speculative execution reduces slow tasks on multiple nodes, with map and reducer options; enable bucketing and hive.auto.convert.join for map joins on small tables.
Compressing Various Files in Hive (5:45)
Learn how Hive compression reduces storage and speeds data transfer in a Hadoop cluster by compressing input and map output, with codecs such as gzip, bzip2, lzma2, and snappy.
Different Modes in Hive (3:54)
Explore the three Hive modes: embedded, local, and remote, and learn their use cases. Understand how Metastore, Derby database, and JDBC shape performance and parallel sessions.
File Compression in Hive (5:30)
Learn to enable compression in Hive to reduce storage and network traffic across input, map output, and reducer output, using gzip, bzip2, lzo, or snappy.
Type of Mode in Hive (3:56)
Explore Hive's embedded, local, and remote modes, detailing metastore and Derby database setups, how to switch modes, and when to use each for small-scale to production environments.
Comparison of Internal and External Table (8:19)
Compare internal and external tables in Hive, showing that internal tables are controlled by Hive and lose data and metadata when dropped, while external tables retain data in HDFS.
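
The Ranks lecture in this section contrasts rank, dense rank, and row number within partitions. The sketch below is a plain-Python approximation of those three window functions, ordering by a value in descending order. The column names and sample rows are made up for illustration, and how tied rows are ordered here is a property of this sketch rather than a statement about Hive.

```python
from itertools import groupby
from operator import itemgetter


def rank_rows(rows, partition_key, order_key):
    """Add row_number, rank, and dense_rank per partition, highest value first."""
    out = []
    # groupby needs its input sorted by the partition key.
    ordered = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        members = sorted(group, key=itemgetter(order_key), reverse=True)
        rank = dense = 0
        previous = object()  # sentinel that never equals a real value
        for row_number, row in enumerate(members, start=1):
            if row[order_key] != previous:
                rank = row_number  # rank jumps past tied rows
                dense += 1         # dense_rank never leaves gaps
                previous = row[order_key]
            out.append({**row, "row_number": row_number,
                        "rank": rank, "dense_rank": dense})
    return out


# Hypothetical employee rows: two tied salaries in department A.
employees = [
    {"name": "chen", "dept": "A", "salary": 200},
    {"name": "asha", "dept": "A", "salary": 300},
    {"name": "dev", "dept": "B", "salary": 100},
    {"name": "ben", "dept": "A", "salary": 300},
]
ranked = rank_rows(employees, "dept", "salary")
for row in ranked:
    print(row["dept"], row["name"], row["row_number"], row["rank"], row["dense_rank"])

# A top-n result keeps only rows whose rank is at most n.
top_one = [row["name"] for row in ranked if row["rank"] <= 1]
```

With the tie in department A, both top earners get rank 1, the next row gets rank 3 but dense rank 2, and row numbers stay unique throughout.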

Introduction to Hive (11:53)
Creating Hive Tables (8:24)
Managed Tables in Hive (4:22)
Explore Hive managed tables by creating a sample database and employee table, loading comma-delimited data, and querying with select * from employee in a safe mode Hive environment.
External Tables in Hive (6:45)
More on External Tables in Hive (6:08)
Explore creating and managing external tables in Hive. Load data into employee details, view with show tables and show create table, and understand drop table behavior on the warehouse entry.
Tables with Location (11:16)
Static Partitions (11:36)
Explains how Hive partitions optimize queries by data segregation and reducing full-table scans, highlighting static and dynamic partitions and their use cases.
Dynamic Partitions (6:54)
Learn dynamic partitions in Hive to avoid static partitions and prevent table scans. Implement by creating a temp table, loading data, setting hive.exec.dynamic.partition.mode to nonstrict, then inserting into the partitioned table.
Dynamic Partitions Continues (6:59)
Adding Partitions (12:00)
File Formats (12:05)
Explore common Hive file formats, including text, RC, and ORC, and learn practical steps to create tables, load CSV data, and work with multiple formats.
Bucketing and its Code in Hive (11:24)
Introduction to Joins in Hive (5:52)
Explore how Hive joins combine common data from the employee and department tables using inner join, left outer join, right outer join, and full outer join on the department number.
Example of Joins in Hive (9:14)
Creating a Join Space in Hive (6:02)
Creating a Join Space in Hive Continue (6:49)
Learn to perform inner, left outer, right outer, and full outer joins in Hive, create and query join tables, and understand how join semantics affect results.
Views and its Example in Hive (11:27)
Mastering Hive shows how to create normal and complex views to read only selected columns from existing tables, avoiding storage waste and improving performance.
Indexes (6:25)
Examples of Index (7:29)
Complex Data Types (12:00)
Explore Hive's data types, comparing primitive and complex types. Learn how arrays, maps, and structs store data, with a practical example of loading an array table from a CSV.
Complex Data Types Continues (7:33)
Explore Hive complex data types, focusing on maps as key-value structures, and learn to create a map table, load map.csv, and query results.
Examples of Data Types in Hive (5:14)
Explore Hive data types with practical map datatype examples, including creating a map-based table for pay details, loading CSV data, and querying by map keys like salary.
Three Types Data (11:49)
Explore Hive's struct data type and load CSV data into a table with a nested address field, then query address.street and other struct elements.
Hive Scripts and its Example (14:05)
Learn to create and run Hive scripts by writing a sample ql, creating databases and tables, loading data, and executing scripts with hive -f and parameterized configs.
User Defined Function And its Advantages in Hive (5:46)
Example of User Defined Function in Hive (8:06)
Create a Hive user defined function in Java to compute bonuses, package it as bonus.jar, and register it as a temporary function to add a bonus column to the employee table.
Practical Implementation of UDF (6:27)
Practical Implementation of UDF Continues (8:08)
Type of Tables in Hive (5:22)
Learn the two types of tables in Hive (external and managed), how dropping or converting between them works, and how to create a hive tutorial database with a managed employee table.
Example of Type of Table in Hive (7:12)
Load data into Hive tables, convert between managed and external tables, and describe table formats. Explore why external tables protect data and remain accessible after deletions, with performance benefits.
Creating Tables Using Hive and HBase (6:02)
Advantages and Disadvantages in Hive and HBase (6:39)
Creation of HBase Table Using Multiple Columns (7:56)
Create an HBase employee table with cf1 for personal data and cf2 for working info. Insert rows with name, gender, age, job, and salary, then read the table using the scan command.
Example of Creation of HBase Table Using Multiple Columns (5:07)
HBase Managed Hive Tables (7:26)
Learn to create an HBase-managed Hive table and use HBase to manage Hive tables, addressing Hive and HBase disadvantages like no updates, null value space, and inserts not possible.
HBase Managed Hive Tables Continues (6:01)
Syntax of HBase Managed Hive Tables (7:11)
Example of HBase Managed Hive Tables (5:34)

Introduction of Hive (7:48)
Explore Hive as a data warehouse on Hadoop, enabling OLAP-style data summarization and analysis through HiveQL, with a metastore, various data formats, and MapReduce execution on Hadoop.
Simple and Complex Datatype in Hive (8:48)
Explore Hive data types from primitive boolean, int, bigint, float, double, and string to complex struct, map, and array, with implicit conversions. Understand partitions and buckets for faster queries.
Clusters (0:29)
Database Command in Hive (11:51)
Explore Hive database and table commands, including create, show, describe, alter, and drop databases; manage tables with create, show, describe, and cascade drops, including managed and external tables.
Tables Commands in Hive (5:39)
Master Hive table creation and database modification by adding properties, and define columns such as name (string), salary (float), subordinates (array<string>), deductions (map type), and address (struct).
Manage Tables (6:29)
Discover how to manage tables in Hive by showing tables, using describe and describe extended, and altering tables, including rename, add or drop columns, and drop tables, with managed and external tables.
External Tables (1:31)
Learn the difference between managed and external tables in Hive. Dropping a managed table deletes data and metadata, while dropping an external table preserves the data and removes only metadata.
Introduction to Partitioning (7:08)
Explore partitioning and bucketing in Hive to optimize queries. Learn to create and manage partitions across managed (internal) and external tables with partition commands in a telecom data case study.
Partition Command (6:55)
Bucketing (8:01)
Learn how to create and manage partitions with alter table, rename and drop partitions, and apply bucketing using hashing and clustered by id into 50 buckets for telecom data analysis.
Table Contr Services in Hive (11:06)
Example of Contr Services (6:36)
Construct and modify Hive tables with partitioning, defining columns like unique customer id, sequence number, status, activation and deactivation dates, and manage table lifecycle by dropping and recreating with partitions.
Example of Contr Services Continues (5:04)
Create Hive tables for telecom data, focusing on the control services table with call_id, cs_sequence_number, change_of_service, status, and bill_date, using tab-delimited text storage.
Creating Contract All Table (10:42)
Create the contract all and customer all tables with if not exists, and define bigint, string, float, and int columns. Partition by ko user last mod and tm code.
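
The Bucketing lecture in this section mentions hashing an id column into 50 buckets. As a simplified model of that idea, the sketch below assigns integer ids to buckets by masking the value to a non-negative number and taking the remainder. This is an assumption made for illustration: the hash function Hive actually applies depends on the column type and the Hive version, and the customer ids are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_BUCKETS = 50  # matches the "into 50 buckets" clause described above


def bucket_for(customer_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified bucket assignment: non-negative hash modulo bucket count."""
    # For this sketch the integer id is used directly as its own hash value.
    return (customer_id & 0x7FFFFFFF) % num_buckets


def assign_buckets(ids: Iterable[int], num_buckets: int = NUM_BUCKETS) -> Dict[int, List[int]]:
    """Group ids by bucket number, the way rows would land in bucket files."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for customer_id in ids:
        buckets[bucket_for(customer_id, num_buckets)].append(customer_id)
    return dict(buckets)


# Hypothetical customer ids; 1, 51, and 101 all fall into bucket 1.
layout = assign_buckets([1, 51, 101, 7, 123])
for bucket, members in sorted(layout.items()):
    print(bucket, members)
```

Because equal ids always map to the same bucket, two tables bucketed the same way on the join column can be joined bucket by bucket, which is the basis for sampling and bucket map joins covered elsewhere in the course.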

Introduction to Customer Complaint Project in Big Data (11:34)
Complaint Filed Under Each File (9:57)
Creating Driver Files and Jar Manifest (10:21)
Create and configure a Hadoop driver class with a main method, set the job, mapper, reducer, map output, and input/output paths, then build and run the jar.
Creating Driver Files and Jar Manifest Continues (2:05)
Run a Hadoop MapReduce job, monitor map and reduce progress, and verify the output with hadoop fs, observing the part-r output and sample product complaint counts.
Complaint Filed from Particular Location (5:34)
Learn to compute location-wise counts of complaints and compare them with product-level totals using Hive, and prepare a grouped list of complaints by location.
User Defined Location (7:35)
Filter counts by user-specified location in Hive, taking a third-argument input, comparing with locations like Delhi or Mumbai, and writing location-specific results to the output folder.
List of Complaint Grouped By Location (6:09)
Group complaints by location using MapReduce, count delayed responses, and flag not on time versus on time to produce a location-based delay report, and explore future Hive and Pig implementations.

Introduction to Social Media Industry (8:35)
Analyze bookmarking website data in the social media industry by moving data from RDBMS to HDFS with Sqoop. Process XML with MapReduce, Pig, and Hive to analyze reviews and location.
Book Marking Website (7:39)
Process bookmarking site data from RDBMS to HDFS using Sqoop, convert flat files to XML, then analyze with MapReduce, Pig, and Hive, including custom data types and aggregation.
Book Marking Website Continues (5:19)
Understanding Sqoop (7:22)
Show how to move data from RDBMS to HDFS with Sqoop, converting flat files to XML via an ETL tool, using a MySQL DB demonstration for MapReduce, Pig, and Hive.
Get Data from RDBMS to HDFS (8:51)
Learn to import data from an RDBMS to HDFS using Sqoop. Specify a target directory, verify the schema, and run a MapReduce job on the resulting flat file.
Execute Map Reduce Program in order to Process XML File (12:06)
Export the MapReduce jar, deploy to the Hadoop cluster, and run a mapper-driven XML processing job with the driver, noting there is no reducer, to generate a flat output file.
Analyze Book Performance By Reviews Using Code (7:07)
Transform the XML into a flat, star-delimited file and use a mapper to tally positive, average, and negative reviews for each book based on comma-separated feedback.
Analyze Book Performance By Reviews Using Code Continues (8:48)
Analyze book reviews with a Hadoop MapReduce job that counts positive, negative, and average comments by converting input to lowercase and matching keywords such as awesome and bad.
Analyse Book By Location (7:24)
Example of Analyse Book By Location (6:50)
Demonstrates a MapReduce reduce workflow to count bookmarks by location, aggregating values per key and producing a location-keyed output.
Analyse Book Reader Against Author (10:22)
Analyze book data by author with a MapReduce workflow, counting how many books each author has written, from an XML to flat-file input and using author as the key.
How to process XML File in PIG (6:25)
How to process XML File in PIG Continues (8:06)
Analyze Book Performance in XML File in PIG (10:04)
Analyze book performance by loading an XML file with Piggybank's XML loader, extracting book name, author, review, and location, and applying a UDF for positive, negative, and average reviews.
More on Analyze Book Performance in XML File in PIG (10:04)
Pig XML File Output Using Book (9:28)
Register and use a Piggybank jar to process XML with Pig and the XML loader. Employ tokenize and flatten to derive book id, category, and locations stored in HDFS.
Pig XML File Output Using Location (9:32)
Pig XML File Output Using Location Continues (8:36)
Understanding Complex Data Set Using Hive (11:37)
Explore how to analyze a Pig output in Hive to handle location data, using Hive's complex data types like arrays, maps, and structs.
Understanding Complex Data Set Using Hive Continues (9:40)
Create Array in Map Reduce Using Hive (9:59)
Book Marking Type Data Set Using Complex Type (8:31)
Explore processing Pig output into Hive by converting a pipe-delimited location array and loading it into a table in the book analysis database, with collection items terminated by pipe.
Output of Book Marking Type Data Set (9:46)
Load Hive data with append or overwrite, explode array-typed locations into separate rows, and count by location for simple analysis. Convert complex types to simple types for Hadoop analytics.

Introduction to Sensor Data Analysis (7:26)
Introduction to Sensor Data Analysis Continues (10:20)
Explore the production workflow of big data processing with DFS, HDFS, MapReduce, and Hive, showing how JSON data moves from local to HDFS, MapReduce to Hive, and reports.
Example of Sensor Data Analysis (10:46)
Explore how to transform demographic data into meaningful insights using Hadoop, data science, and data analytics. The session demonstrates use cases for policy decisions, tax analysis, and population trends.
Understanding Basics of Big Data and MapReduce (8:08)
Explore big data basics with IBM’s four V's framework: volume, velocity, variety, and veracity. Understand data growth toward 40 zettabytes and MapReduce's role in distributed, fault-tolerant processing of XML and JSON.
More on Big Data and MapReduce (10:30)
Explore big data processing of sensor data, addressing data veracity and uncertainty, with MapReduce architecture, Hadoop HDFS and YARN, and the mapper, reducer, and driver workflow for scalable analytics.
Converting Json File into Simple Text Format (7:55)
Convert a JSON input file with MapReduce in Hadoop to a simple comma-delimited flat file, using a JSON library or a custom parser.
Converting Json File into Simple Text Format Continues (6:33)
Output for Json File format (4:29)
Ratio of Male and Female in MapReduce (5:59)
Convert JSON to plain text, then use MapReduce to count males and females in the Philippines by aggregating the gender field from a comma-delimited input.
Output of Ratio of Male and Female (6:56)
Generate Old Aged Woman Count (6:33)
Generate the count of females over 45 who are widowed or divorced using a MapReduce workflow, filtering by age, education, and marital status, with a mapper and reducer.
Expected Income Tax in MapReduce (9:04)
Actual Income Tax in MapReduce (5:01)
Examine a MapReduce workflow to compute total income and tax data, fix data types for decimals, and generate government-ready tax filer insights.
Coming Year Income Tax In MapReduce (5:20)
Run a MapReduce job to count people aged 18 and above with non-zero income, using a mapper and reducer to project next year’s income tax from HDFS data.
Educated Vs Non Educated Ratio in MapReduce (6:26)
Native People Reports in MapReduce (9:33)
Report Against the Child Labour law in MapReduce (7:57)
Develop a MapReduce workflow to count records where age is under 18 and income > 0, enabling government reporting on child labor using age and income data.
Difference Between Pig, MapReduce, and Hive (9:55)
Compare MapReduce, Pig, and Hive to reveal abstraction differences: MapReduce is low-level and code-heavy, Pig uses Pig Latin for data flow, while Hive offers SQL-like declarative queries.
More on Pig, MapReduce, and Hive (7:22)
Compare Pig, MapReduce, and Hive for data processing and analysis, focusing on execution, performance, joins, and how each supports structured vs unstructured data.
Sensor Data Processing in Pig (10:34)
Learn how to process sensor data in JSON format on HDFS with Pig instead of MapReduce, using Pig Latin to analyze data and store results in Hive for reporting.
Working With Pig Function (8:10)
Types of Function in Pig (8:24)
Example of Pig Function (8:03)
Create and export a Pig function jar, register it in the Pig grunt shell, and apply a generic function to analyze gender, age, and marital status on HDFS.
Working on Use Cases Using Functions in PIG (8:59)
Learn how Pig enables data flow processing for big data use cases, loading JSON data into HDFS with Pig storage and running grunt to compare Pig, MapReduce, and Hive.
Use Case Data Flow in Pig (6:51)
Explore data flow with Pig by loading data, registering a jar with a user defined function, and extracting values for a given key using foreach and generate.
Ratio Data Flow in Pig (8:11)
More on Use Case in Pig (9:16)
Learn a Pig Latin use case to select females over 45 who are divorced or widowed, using data cleaning with trim, UDFs, and filters before grouping.
More on Use Case in Pig Continues (8:57)
Advance Pig use cases by filtering S3 data with trim and equality checks for female, divorced, or widowed, then dump or store results to HDFS and prepare for UDF integration.
Example of Ratio Education in Pig (8:34)
Approach Process the Json File in Hive (10:30)
Explore Hive as a data analytics tool built on Hadoop, using Hive query language similar to SQL to load data into HDFS, create tables, and generate reports through MapReduce-backed queries.
Features and Query in Hive (10:30)
Explore Hive features, the Hive metastore, and HiveQL for analytics on HDFS; learn internal and external tables and how JSON data is processed with built-in functions.
Work on Json Use Cases Using Hive (7:10)
Explore JSON use cases in Hive by creating a demo database, a tbl_json table with a single json_data string column, and loading input text data with a star delimiter.
Work on Json Use Cases Using Hive Continues (6:11)
Explore loading JSON data into Hive tables, troubleshoot missing files, and use Unix and Hadoop commands to manage JSON data in a Hive workflow.
Output of Json Use Cases Using Hive (11:01)
Use Hive's get_json_object to extract gender, age, and marital status from JSON data and count by gender. Filter females over 45 who are divorced or widowed using JSON extraction.
More on Json Use Cases in Hive (10:11)
Summary of Sensor Data Processing (9:26)
Explore end-to-end big data processing of JSON data, comparing MapReduce, Pig, and Hive, and deriving meaningful insights for government decision making from raw data.
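
Several lectures in this section describe the same two reports over JSON records: a count by gender, and a filter for females over 45 who are divorced or widowed. The Python sketch below mirrors that logic on a few invented records, roughly in the spirit of extracting one field at a time from a JSON string column. The field names (gender, age, marital_status) and the sample values are assumptions, since the course dataset's actual keys are not listed here.

```python
import json
from collections import Counter
from typing import Any, Iterable, List, Optional


def get_field(line: str, key: str) -> Optional[Any]:
    """Return one top-level field from a JSON string, or None if unavailable."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return None  # malformed rows yield no value
    return doc.get(key) if isinstance(doc, dict) else None


def count_by_gender(lines: Iterable[str]) -> Counter:
    """Count records per gender, skipping rows without a gender value."""
    counts: Counter = Counter()
    for line in lines:
        gender = get_field(line, "gender")
        if gender is not None:
            counts[gender] += 1
    return counts


def older_single_women(lines: Iterable[str]) -> List[str]:
    """Keep females over 45 whose marital status is divorced or widowed."""
    kept = []
    for line in lines:
        age = get_field(line, "age")
        if (get_field(line, "gender") == "female"
                and isinstance(age, (int, float)) and age > 45
                and get_field(line, "marital_status") in ("divorced", "widowed")):
            kept.append(line)
    return kept


# Hypothetical input rows, one JSON document per line.
records = [
    '{"gender": "female", "age": 52, "marital_status": "widowed"}',
    '{"gender": "male", "age": 30, "marital_status": "married"}',
    '{"gender": "female", "age": 47, "marital_status": "divorced"}',
    '{"gender": "female", "age": 40, "marital_status": "divorced"}',
    'not json',
]
print(count_by_gender(records))
print(len(older_single_women(records)))
```

The same two steps, extract then filter or group, are what the MapReduce, Pig, and Hive versions of the use case each express in their own way.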

Requirements

Basic understanding of SQL.
Familiarity with Hadoop ecosystem and big data concepts.
Basic programming knowledge, preferably in Python or Java.
Access to a computer with internet connectivity for practical exercises.

Description

Students will gain a comprehensive understanding of Hive, from the fundamentals to advanced topics. They will learn how to create and manage Hive databases, perform data loading and manipulation, execute complex queries, and use Hive's powerful features for data partitioning, bucketing, and indexing. Additionally, students will explore practical case studies and projects, applying their knowledge to real-world scenarios such as telecom industry analysis, customer complaint analysis, social media analysis, and sensor data analysis.

Section 1: Hive - Beginners

In this section, students will be introduced to Hive, an essential tool for managing and querying large datasets stored in Hadoop. They will learn the basics of Hive, including how to create databases, load data, and manipulate tables. Topics such as external tables, the Hive Metastore, and partitions will be covered, along with practical examples of creating partition tables, using dynamic partitions, and performing Hive joins. Students will also explore the concept of Hive UDFs (User Defined Functions) and how to implement them.

Section 2: Hive - Advanced

Building on the foundational knowledge, this section delves into advanced Hive concepts. Students will learn about internal and external tables, inserting data, and various Hive functions. The section covers advanced partitioning techniques, bucketing, table sampling, and indexing. Practical demonstrations include creating views, using Hive variables, and understanding Hive architecture. Students will also explore Hive's parallelism capabilities, table properties, and how to manage and compress files in Hive.

Section 3: Project 1 - HBase Managed Hive Tables

This section focuses on integrating Hive with HBase, a distributed database. Students will learn how to create and manage Hive tables, both managed and external, and understand the nuances of static and dynamic partitions. They will gain hands-on experience in creating joins, views, and indexes, and explore complex data types in Hive. The section culminates in practical implementation projects involving Hive and HBase, showcasing real-world applications and use cases.

Section 4: Project 2 - Case Study on Telecom Industry using Hive

Students will apply their Hive knowledge to a case study in the telecom industry. This project involves working with simple and complex data types, creating and managing tables, and using partitions and bucketing to organize data. Students will learn how to perform various data operations, understand table control services, and create contract tables. This hands-on project provides valuable insights into how Hive can be used for industry-specific data analysis.

Section 5: Project 3 - Customer Complaints Analysis using Hive - MapReduce

In this section, students will analyze customer complaints data using Hive and MapReduce. They will learn how to create driver files, process data from specific locations, and group complaints by location. This project highlights the power of Hive and MapReduce for handling large datasets and provides practical experience in data processing and analysis.

Section 6: Project 4 - Social Media Analysis using Hive/Pig/MapReduce/Sqoop

This section explores the integration of Hive with other big data tools like Pig, MapReduce, and Sqoop for social media analysis. Students will learn how to process and analyze social media data, perform data transfers from RDBMS to HDFS, and execute MapReduce programs. The project includes practical exercises in processing XML files, analyzing book reviews and performance, and working with complex datasets using Hive and Pig.

Section 7: Project 5 - Sensor Data Analysis using Hive/Pig

The final section focuses on sensor data analysis using Hive and Pig. Students will learn the basics of big data and MapReduce, and how to convert JSON files into text format. They will perform various data analysis tasks, including calculating ratios, generating reports, and processing data using Pig functions. This project provides comprehensive hands-on experience in processing and analyzing sensor data, showcasing the practical applications of Hive and Pig in real-world scenarios.

Conclusion

This course provides a complete journey from understanding the basics of Hive to mastering advanced big data analysis techniques. Through a combination of theoretical knowledge and practical projects, students will gain the skills needed to manage, analyze, and derive insights from large datasets using Hive. Whether you're an aspiring data engineer, a data analyst, or a tech entrepreneur, this course will equip you with the tools and knowledge to excel in the world of big data.

Who this course is for:

Aspiring Data Engineers: Individuals aiming to build a career in data engineering and big data analytics.
Big Data Enthusiasts: Anyone with a passion for big data technologies and analytics.
Data Analysts: Professionals seeking to enhance their data analysis skills with Hive.
Students: Computer science and engineering students interested in learning about big data technologies.
IT Professionals: IT professionals looking to upskill and transition into big data roles.
Software Developers: Developers wanting to integrate Hive capabilities into their applications.
Tech Entrepreneurs: Entrepreneurs looking to implement big data solutions in their startups.

Course content

Hive - Beginners: 22 lectures • 2hr 33min

Hive - Advanced: 50 lectures • 5hr 12min

Project 1 - HBase Managed HIVE Tables: 38 lectures • 5hr 7min

Project 2 - Case Study on Telecom Industry using HIVE: 14 lectures • 1hr 38min

Project 3 - Customers Complaints Analysis using HIVE - MapReduce: 7 lectures • 53min

Project 4 - Social Media Analysis using HIVE/PIG/MapReduce/Sqoop: 23 lectures • 3hr 22min

Project 5 - Sensor Data Analysis using HIVE/PIG: 36 lectures • 4hr 57min