This course is for people who want to learn how to do things, not just to fill their heads with important concepts, paradigms, and heaps of information they kind of know but have no idea how to use.
This course walks you through the full Big Data process:
Apache Hive is an easy-to-use, SQL-based tool for processing large amounts of data on Hadoop quickly. Hive gained popularity soon after Hadoop MapReduce became widely used because it lets you work with data through SQL queries, and it is used by many organisations to process their data. This course shows a number of interesting Hive queries and explains what Hive UDFs are.
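To give a taste of what working with Hive looks like, here is a minimal HiveQL sketch; the table, columns, jar name, and UDF class are all illustrative, not taken from the course:

```sql
-- Hypothetical log table (names are illustrative).
CREATE TABLE page_views (user_id STRING, url STRING, view_time TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A typical analytic query: the ten most-viewed URLs.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;

-- A custom UDF is compiled into a jar and registered like this
-- (my_udfs.jar and com.example.hive.CleanUrlUDF are hypothetical):
ADD JAR my_udfs.jar;
CREATE TEMPORARY FUNCTION clean_url AS 'com.example.hive.CleanUrlUDF';
SELECT clean_url(url) FROM page_views;
```

The point of the pattern is that once the UDF is registered, it is called exactly like a built-in SQL function.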
Apache HiveMall is a Machine Learning library of tomorrow. Like Hive, it lets you use complex machine-learning algorithms knowing only SQL. No need to code, compile and debug! It is really easy to use for programmers and non-programmers alike. HiveMall implements many useful Machine Learning algorithms (supervised classification, LDA, Random Forest, etc.) as Hive UDFs. This course focuses on Text Classification when presenting HiveMall.
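The "machine learning in SQL" idea can be sketched like this, assuming a table `docs(doc_id, label, content)` and a HiveMall installation whose jar and functions are already registered; the function names follow HiveMall's documented logistic-regression API, but check them against your HiveMall version:

```sql
-- Turn raw text into hashed feature vectors (tokenize and
-- feature_hashing are HiveMall UDFs; the tables are illustrative).
CREATE TABLE train AS
SELECT doc_id, label,
       feature_hashing(tokenize(content, true)) AS features
FROM docs;

-- Train a logistic-regression classifier: the trainer UDTF emits
-- (feature, weight) pairs, which are averaged into a model table.
CREATE TABLE model AS
SELECT feature, avg(weight) AS weight
FROM (
  SELECT train_logregr(add_bias(features), label) AS (feature, weight)
  FROM train
) t
GROUP BY feature;
```

Everything, including training, is an ordinary Hive query producing ordinary Hive tables, which is what makes the approach accessible to SQL-only users.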
Hive + HiveMall is no less (or maybe even more) attractive and efficient than Spark + Spark MLlib. And since HiveQL is more or less SQL, knowing SQL, and only SQL, will allow many non-developers to enter the Big Data world.
AWS Lambda is a must-know now. I show how to use it with Java so that it can become part of a Big Data pipeline. The AWS Lambda + Amazon EMR + Hive combination is also explained.
Solr and Hue form a search-engine and visualisation-dashboard combination; ElasticSearch and Kibana form another. Both follow the same idea: connectors push data from Hive or Spark directly into Solr or ElasticSearch, and Hue and Kibana then use the properties and internal data representations of their respective search engines to display that data on a dashboard. This course shows how to integrate Hive with both stacks.
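As one concrete instance of the connector idea, the elasticsearch-hadoop project provides a Hive storage handler, so an ElasticSearch index can be exposed as an external Hive table; the sketch below assumes the connector jar is on Hive's classpath, and the index name, node address, and columns are illustrative:

```sql
-- External table backed by an ElasticSearch index via the
-- elasticsearch-hadoop storage handler.
CREATE EXTERNAL TABLE es_articles (id STRING, title STRING, body STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'articles/doc',
               'es.nodes'    = 'localhost:9200');

-- Writing to the table pushes rows straight into ElasticSearch,
-- where Kibana can immediately visualise them.
INSERT OVERWRITE TABLE es_articles
SELECT id, title, body FROM articles;
```

Solr has analogous Hive storage handlers; the pattern (map an index to a table, then INSERT into it) is the same.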
Instead of being comprehensive, this course assumes a bit of prior knowledge of the topic. It teaches by presenting solutions to problems that came up repeatedly during my work on different Big Data projects. It shows how mastering small things gives you the ability to take a simple solution to almost any problem from concept to delivery.
We start with importing data into Apache Hive correctly, and progress step by step to quickly delivering the results of your work as an AWS service, a search-engine service, or a Hue dashboard.
The course covers data processing with Hive (including how to write User Defined Functions for Hive at different levels of complexity: UDF, GenericUDF, UDAF and UDTF), applies Machine Learning to Text Classification using HiveMall, and then exports data from Hive to Solr & Hue or ElasticSearch & Kibana. You will also learn how to write an AWS Lambda that runs Hive.
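To make the UDF/UDAF/UDTF distinction concrete, Hive's built-in functions use the same three call sites that custom ones plug into (a GenericUDF is called exactly like a plain UDF); the `words(line STRING)` table is illustrative:

```sql
-- UDF (and GenericUDF): one row in, one value out.
SELECT upper(line) FROM words;

-- UDAF: many rows in, one aggregated value out.
SELECT count(*), avg(length(line)) FROM words;

-- UDTF: one row in, many rows out, used with LATERAL VIEW.
SELECT word
FROM words
LATERAL VIEW explode(split(line, ' ')) w AS word;
```

A custom function of each kind is registered with CREATE TEMPORARY FUNCTION and then used in the matching position above.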
Altogether, this gives you the ability to build a data processing pipeline that is simple, robust, and ready to be delivered and used in no time.
Elena works in the field of Natural Language Processing. She first graduated from Saint-Petersburg State University in Russia and then earned a PhD from Macquarie University in Sydney, Australia, where she currently works. She now applies theoretical concepts from Natural Language Processing to solve business problems for enterprises large and small.
As an early adopter of Big Data tools and concepts, she finds existing Big Data frameworks to be attractive means of working with data. She started using such tools, and advising others to adopt Big Data concepts, well before Hadoop, Spark and related technologies became must-know tools for many IT professionals.
Sharing knowledge is something Elena enjoys doing. She believes that sharing knowledge enriches her as much as other people.