Apache Kylin : Implementing OLAP on the Hadoop platform

Name: Apache Kylin : Implementing OLAP on the Hadoop platform
Rating: 4.1 (193 reviews)

Build and query online analytical processing (OLAP) big data structures on your Hadoop platform

Created by Michael Enudi

Last updated 6/2018

English

What you'll learn

Understand how OLAP Cube structures are created
Build and query OLAP Cubes on the Hadoop big data platform
Perform analytical queries on streaming data
Integrate your big data cube with external tools or applications
Secure your OLAP Cube on the cluster

Course content

7 sections • 43 lectures • 6h 8m total length

Introduction • 4:26
Explore how Apache Kylin provides a single unified layer on Hadoop to run low-latency OLAP queries, enabling conceptual data modeling and fast analytics across data-lake-style environments.
What is Kylin? • 5:53
How Kylin Works. I • 10:59
Explain Kylin's end-to-end workflow and its reliance on HDFS, YARN, MapReduce, Hive, Kafka, Calcite, Spark, and ZooKeeper for storage, processing, ingestion, and coordination.
How Kylin Works. II • 10:51
Discover how Kylin builds OLAP cubes from star or snowflake models by identifying dimensions and measures to enable sub-second queries via JDBC or REST APIs.
How Kylin Works. III • 4:53
Installing Kylin in Hortonworks HDP Sandbox • 13:26
Installing Kylin in Cloudera CDH Sandbox • 9:41
Installing Kylin in a Custom Hadoop Environment • 14:47
Our First Taste of a Kylin Cube • 11:03
Exploring the web console • 4:14
Resources • 0:02

Introduction to AdventureWorks DW Dataset • 3:40
AdventureWorks Dataset Preparation • 12:19
Prepare AdventureWorks data for OLAP on Hadoop by importing it into MySQL, transferring it to HDFS with Sqoop, loading it into Hive, and testing sample queries to guide the design for Apache Kylin.
Create Your Data Sources • 5:53
Implementing The Data Model • 8:29
Create The Cube • 14:30
Building The Cube • 4:57
Querying the Cube • 8:14
Troubleshooting tips • 12:22

Introduction to the Airline On-Time Performance Dataset • 3:18
Dataset Preparation • 7:25
Incremental Build • 20:54
Running Incremental Cube Building • 6:23
Single Fact/Dimension Table Model • 10:57
Learn the single fact/dimension table model, where facts and dimensions reside in one flat table with no lookup tables, using the flat flight data example to build a cube.
Cube Optimization/Tuning I • 16:32
Cube Optimization/Tuning II • 8:56
Summary • 3:50

Introduction to the Use Case • 3:11
Explore how Apache Kylin enables OLAP on streaming data by building cube segments for each time slot from a Kafka stream, answering questions about peak hits and hourly patterns.
How Does Kylin Work with Streaming Tables? • 6:25
Data Preparation & Kafka Setup • 10:04
Implementing OLAP Cube Over Streaming Dataset in Kafka • 19:13
Building The Cube With Streaming Logs • 6:30
Query the Cube • 6:21
Query the built cube in the Apache Kylin OLAP setup to count hits, measure size, and drill down by month, day, hour, and status code.
Troubleshooting • 8:12
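
The streaming use case above describes building one cube segment per time slot and then asking about peak hits and hourly patterns. The following is a minimal Python sketch of that idea only, not of Kylin's actual implementation: the `EVENTS` list, the one-hour slot size, and the helper names (`hour_slot`, `build_segments`, `hits_per_hour`) are all assumptions made for illustration, and the events are hard-coded rather than read from Kafka.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical access-log events: (epoch seconds, HTTP status code).
EVENTS = [
    (1528365600, 200),
    (1528366200, 200),
    (1528367400, 404),
    (1528369500, 200),
]


def hour_slot(epoch_seconds):
    """Map a timestamp to the label of the one-hour slot it falls in (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:00")


def build_segments(events):
    """Group events into per-hour 'segments', counting hits per status code."""
    segments = {}
    for timestamp, status in events:
        segments.setdefault(hour_slot(timestamp), Counter())[status] += 1
    return segments


def hits_per_hour(segments):
    """Total hits in each hourly segment, regardless of status code."""
    return {slot: sum(counts.values()) for slot, counts in segments.items()}


segments = build_segments(EVENTS)
hourly = hits_per_hour(segments)
peak_slot = max(hourly, key=hourly.get)  # the slot with the most hits
```

Because each slot is aggregated on its own, new events only touch the newest segment, which is roughly why a per-time-slot layout suits a continuously arriving stream.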

Requirements

The ability to write a SQL query or use a SQL query tool is required to be a Kylin user.
A good understanding of the Hadoop big data platform is required to be a Kylin developer or administrator.
Knowledge of Hadoop technologies like MapReduce, Hive and HBase is helpful but not mandatory.

Description

A Comprehensive Course for Learning How to Build and Query Big Data OLAP Cubes Using Apache Kylin.

Apache Kylin is an Apache top-level project that brings OLAP to big data. This simply means that we can now write complex queries with different levels of aggregation and expect a response within seconds, or even in under a second.
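
To make "different levels of aggregation" concrete, here is a minimal Python sketch of the general OLAP cube idea: pre-aggregate a measure for every combination of dimensions so that a later GROUP BY becomes a lookup. It is an illustration under stated assumptions, not Kylin's implementation; the `ROWS` data, the dimension names, and the `build_cube` and `query` helpers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, country, product, sales_amount).
ROWS = [
    (2017, "US", "Bike", 100.0),
    (2017, "US", "Helmet", 20.0),
    (2017, "DE", "Bike", 80.0),
    (2018, "US", "Bike", 120.0),
]
DIMENSIONS = ("year", "country", "product")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales_amount) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # the measure is the last column
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY by looking up the matching pre-built aggregate."""
    return cube[tuple(d for d in DIMENSIONS if d in group_by)]


cube = build_cube(ROWS, DIMENSIONS)
total = query(cube, [])                            # grand total
by_year = query(cube, ["year"])                    # one level of aggregation
by_year_country = query(cube, ["year", "country"])  # a finer level
```

With three dimensions the sketch stores 2^3 = 8 aggregates. The build step does the expensive scan once, and each query afterwards is a dictionary lookup, which is the trade-off that makes low-latency responses plausible on large datasets.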

Online analytical processing (OLAP) has been a staple of traditional business intelligence for years, but it has not been easy on the Hadoop platform, which has become the data lake solution for many organizations. These data lakes often hold hundreds of millions or even billions of records that organizations want to slice and dice for insights. However, the high query latency of SQL-on-Hadoop technologies like Apache Hive or Apache Drill has often meant that data architects opted to transfer their data back to traditional systems that allow real-time responses to queries.

Kylin solves all of this.

With Apache Kylin, anyone with the skills can now build OLAP, ROLAP or MOLAP structures using a web UI, deploy them, and expect to query these structures with response times measured in seconds. You can also connect your applications or favorite visualization tools to Kylin to integrate data for either system processing or visualization.
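
As one illustration of connecting an application to Kylin, the sketch below builds an HTTP request for Kylin's REST query endpoint using only the Python standard library. Everything specific in it is an assumption to adapt to your own setup: the host and port (`localhost:7070`), the `ADMIN`/`KYLIN` credentials, the `learn_kylin` project, the `kylin_sales` table and its columns, and the `build_kylin_query` helper name. The request is constructed but not sent, so the code runs without a cluster.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request for the Kylin query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders, not settings taken from the course.
request = build_kylin_query(
    base_url="http://localhost:7070",
    user="ADMIN",
    password="KYLIN",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
)

# To send it to a reachable Kylin server, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Visualization tools would more typically use the JDBC or ODBC drivers mentioned in the course, while a REST call like this suits lightweight application integration.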

In this course, we are going to review:

What Kylin is
How it works
How to build OLAP cubes in batch and streaming modes
How to deploy the cubes
How to query cubes
How to connect external tools and applications to Kylin

... and much more

What is the target audience?

Big Data Engineers/Developers
Data Architects
Data Analysts
Anyone who wishes to write simple to complex aggregation queries on large datasets and wants low-latency response times.

What are the requirements?

You need access to a big data sandbox like the Cloudera QuickStart VM, the Hortonworks HDP Sandbox or a cloud-based Hadoop environment with at least 10 GB of RAM.
You should have some familiarity with SQL and be able to use ODBC- or JDBC-based tools.
Some familiarity with Linux will be helpful.

What do I need to know to get the best out of this course?

Because Kylin relies on other Hadoop projects in its design, a fair understanding of projects like Apache Hive, Apache Kafka, Apache HBase, and MapReduce is helpful for this course. However, one can still use Kylin without any knowledge of these technologies.

It is also worth knowing that no prior knowledge of any big data technology is required to query Kylin or to use its data integrations for running reports or data visualizations.

Who this course is for:

Data Analysts
Big Data/Hadoop Data Engineers
Data Architects
Anyone who wants to be able to perform complex aggregate/OLAP queries on large datasets.

Course content

Course Introduction • 11 lectures • 1hr 30min

Use Case 1: AdventureWorks DW • 8 lectures • 1hr 10min

Use Case 2: Analyzing Flight Delays • 8 lectures • 1hr 18min

Use Case 3: Access Log Files • 7 lectures • 1hr

Kylin Client Integration • 5 lectures • 43min

Other Features • 3 lectures • 26min

Conclusion • 1 lecture • 1min
