New in Big Data: Hive, HiveMall, AWS Lambda, Solr, Kibana
4.4 (4 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
42 students enrolled
Big Data ETL, Machine Learning and Data Visualization. Concise hands-on course with full code examples. Learn to excel!
Last updated 6/2017
English
Includes:
  • 2.5 hours on-demand video
  • 12 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Craft a solution to your Big Data tasks using the building blocks shown in the course
  • Create deliverables for your work in the form of an Amazon microservice, a Search Engine web service, or a Dashboard
  • Learn new technologies: AWS Lambda, the Hivemall Machine Learning library, the HyperLogLog cardinality estimation technique, connecting Apache Hive to Solr and Hue, writing custom Hive UDFs, and a few other things
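
To give a flavour of the HyperLogLog cardinality estimation technique listed above, here is a minimal Python sketch of the idea. It is not the third-party Hive UDF used in the course; the class name, the precision parameter `p`, and the use of SHA-1 as the hash are assumptions made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical class, not a Hive UDF)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash (an assumption of this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                       # first p bits select the register
        rest = h & ((1 << width) - 1)          # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(50000):
        hll.add(f"user-{i}")
    print("estimated distinct values:", hll.count())
```

The counter keeps only 4096 small registers however many values are added, which is why the technique suits distinct counts over large tables. Adding the same value again never changes the estimate.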
Requirements
  • Access to an AWS account, or access to a machine where Hadoop is installed, or access to a machine where you have root privileges.
  • Some exposure to Big Data, Hive, Hadoop, and Machine Learning.
  • Some programming experience with Java (required for Sections 4 and 6 only).
  • An understanding of SQL.
Description

This course is for people who want to learn how to do things, not just to fill their heads with important concepts, paradigms, and heaps of information they kind of know but have no idea how to use. 

This course walks you through the full Big Data process:

  • Data Input
  • ETL
  • Predictive Modelling using Machine Learning
  • Data Visualization 
  • Deployment to AWS using AWS Lambda together with Amazon EMR


Apache Hive is an easy-to-use, SQL-based tool for processing large amounts of data on Hadoop. Hive gained popularity soon after Hadoop MapReduce became widely used, because it lets you work with data by means of SQL queries. Many organisations use it to process their data. This course shows a number of interesting Hive queries and explains what Hive UDFs are.

Apache HiveMall is a Machine Learning library of tomorrow. Like Hive, it lets you use complex machine learning algorithms knowing only SQL. No need to code, compile and debug! It is really easy to use for programmers and non-programmers alike. The library implements many useful Machine Learning algorithms (supervised classification, LDA, RandomForest, etc.) as Hive UDFs. This course focuses on Text Classification when presenting HiveMall.

Hive + HiveMall is no less attractive and efficient than Spark + Spark MLlib, and maybe even more so. Also, because HiveQL is more or less SQL, knowing SQL, and only SQL, will allow many non-developers to enter the Big Data world.

AWS Lambda is a must-know technology now. I show how to use it with Java so that it can serve as part of a Big Data pipeline. The combination of AWS Lambda, Amazon EMR, and Hive is also explained.
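
As a rough illustration of the Lambda + EMR + Hive combination, here is a minimal sketch of a Lambda handler that submits a Hive script as a step to an EMR cluster that is already running. The course itself uses Java; this sketch is in Python and is not the course's code. The event keys `cluster_id` and `script_uri`, the helper `build_hive_step`, and the optional `emr_client` parameter are all hypothetical names introduced for the example.

```python
def build_hive_step(script_uri, name="Run Hive script"):
    """Build an EMR step definition that runs a Hive script stored in S3."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script", "--args", "-f", script_uri],
        },
    }


def lambda_handler(event, context, emr_client=None):
    """Entry point invoked by AWS Lambda with (event, context).

    Assumes the event carries 'cluster_id' and 'script_uri' (hypothetical keys).
    The emr_client parameter exists only so the function can be tried locally
    with a stand-in client.
    """
    if emr_client is None:
        import boto3  # available in the AWS Lambda Python runtime
        emr_client = boto3.client("emr")

    step = build_hive_step(event["script_uri"])
    response = emr_client.add_job_flow_steps(
        JobFlowId=event["cluster_id"],
        Steps=[step],
    )
    return {"step_ids": response["StepIds"]}
```

Keeping the step definition in its own function means the pipeline logic can be checked without touching AWS, while the handler stays a thin wrapper around a single EMR call.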

Solr and Hue form a search engine and visualisation dashboard combination; ElasticSearch and Kibana form another. Both combinations use the same idea: connectors push data from Hive or Spark directly to Solr or ElasticSearch, and Hue or Kibana then uses the properties and internal data representation of its search engine to display the data on a dashboard. This course shows how to integrate Hive with both.

Rather than trying to be comprehensive, this course assumes a bit of prior knowledge of the topic. It teaches by presenting solutions to problems that came up repeatedly while I worked on different Big Data projects. It shows how mastering small things gives you the ability to create a simple solution to almost any problem, from concept to delivery.

We start with importing data into Apache Hive correctly, and gradually progress to being able to quickly deliver the results of your work as an AWS service, a Search Engine service, or a Hue dashboard.

The course covers data processing with Hive, including how to write User Defined Functions of different levels of complexity (UDF, GenericUDF, UDAF, and UDTF); applying Machine Learning to Text Classification using HiveMall; and exporting data from Hive to Solr & Hue or ElasticSearch & Kibana. You will also learn how to write an AWS Lambda function that runs Hive.
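
The kinds of Hive function named above differ mainly in how many rows go in and how many come out. The sketch below models those shapes in plain Python so the difference is easy to see. It is not Hive's Java API, and all function and class names are hypothetical. A GenericUDF has the same row-level shape as a simple UDF but handles more flexible argument types, so it is not modelled separately here.

```python
from typing import Iterator, Optional


def udf_normalize(text: Optional[str]) -> Optional[str]:
    """UDF shape: one row in, one value out."""
    return None if text is None else text.strip().lower()


def udtf_explode_words(text: Optional[str]) -> Iterator[str]:
    """UDTF shape: one row in, zero or more rows out."""
    if text:
        for word in text.split():
            yield word


class UdafAverage:
    """UDAF shape: many rows in, one value out.

    Method names loosely mirror the phases of a Hive UDAF: rows are
    consumed one at a time, partial states from different workers are
    merged, and a final value is produced at the end.
    """

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        if value is not None:       # ignore NULLs, as SQL aggregates do
            self.total += value
            self.count += 1

    def merge(self, other):
        self.total += other.total
        self.count += other.count

    def terminate(self):
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    print(udf_normalize("  Apache Hive "))
    print(list(udtf_explode_words("big data pipeline")))
    left, right = UdafAverage(), UdafAverage()
    for v in (1, 2, 3):
        left.iterate(v)
    right.iterate(6)
    left.merge(right)
    print(left.terminate())
```

The mergeable partial state is what lets an aggregate be computed in parallel across many mappers and reducers, which is also why a UDAF tends to be the most involved of the function kinds to write.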

Altogether, this gives you the ability to build a data processing pipeline that is simple, robust, and ready to be delivered and used in no time.

Who is the target audience?
  • Anyone who has done at least one introductory BigData course.
  • People who think they know it all will most probably learn something new too.
Curriculum For This Course
23 Lectures
02:28:50
Introduction
1 Lecture 01:43
Apache Hive
2 Lectures 09:58
Useful Third-Party UDFs for Apache Hive
5 Lectures 24:19
HyperLogLog
05:47

Integrating Hive and Solr
05:46

Row numbering and Ranking 1
06:52

Row numbering and Ranking 2
02:51

Integrating Hive and ElasticSearch
03:03
Custom Hive UDFs
5 Lectures 47:00

Hive UDF: Simple UDF
07:23

Hive UDF: GenericUDF
07:24

Hive UDF: GenericUDTF
07:19

Hive UDF: UDAF
16:55
Machine Learning on Hadoop
3 Lectures 19:41

Text Classification 1
11:43

Text Classification 2
06:42
AWS Lambda
3 Lectures 19:26

Simple AWS Lambda function
05:54

AWS Lambda as part of BigData pipeline
09:13
Visualization
4 Lectures 26:43

Solr
06:55

Hue
08:26

Kibana
08:15
About the Instructor
Dr. Elena Akhmatova
4.5 Average rating
10 Reviews
1,459 Students
2 Courses
Data Scientist

Elena works in the field of Natural Language Processing. She first graduated from Saint-Petersburg State University in Russia and then earned a PhD from Macquarie University in Sydney, Australia, where she currently works. She now applies theoretical concepts developed in the field of Natural Language Processing to solve business problems for enterprises big and small.

As an early adopter of Big Data tools and concepts, she finds existing Big Data frameworks an attractive means of working with data. She started using such tools and advising other people to adopt Big Data concepts long before Hadoop, Spark and other related technologies became “must-know” tools for many IT professionals.

Sharing knowledge is something Elena enjoys doing. She believes that sharing knowledge enriches her as much as other people.