Hadoop on Azure. An Introduction to Big Data Using HDInsight
3.0 (41 ratings)
418 students enrolled

A Pragmatic Introduction To HDInsight
Created by Mike West
Last updated 12/2015
Current price: $10 Original price: $20 Discount: 50% off
30-Day Money-Back Guarantee
  • 26 mins on-demand video
  • 5 mins on-demand audio
  • 14 Articles
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand the basic concepts of big data.
  • Understand the most widely used tool for working with big data: Hadoop.
  • Learn what Microsoft is doing to help organizations create Hadoop clusters faster and more affordably than ever before.
Requirements
  • Only the desire to learn Microsoft's direction for Big Data.
  • The basic concepts behind cloud architecture and SQL Server would be beneficial.

Massive amounts of data are being collected on just about everything and only a small part of that data is being analyzed.

In 2014, more than 5,700 tweets were sent and 870 Facebook links were shared every second.

In 2013, about 4.4 zettabytes of data were created and approximately 5% of it was analyzed.

By 2020, it’s estimated that we will collect 44 zettabytes of data and the share we analyze will rise to 40%.

One of the most overused terms in recent years is “big data.”

But what does the term really mean?

Big data refers to data being collected in ever-escalating volumes, at increasingly high velocities, and in a widening variety of unstructured formats and variable semantic contexts.

Big data describes any large body of digital information, from the text in a Twitter feed, to the sensor information from industrial equipment, to information about customer browsing and purchases in an online catalog.

Big data can be historical (meaning stored data) or real-time (meaning streamed directly from the source).

For big data to provide actionable intelligence or insight, it is not enough to ask the right questions and collect data relevant to the issues; the data must also be accessible, cleaned, analyzed, and then presented in a useful way.

HDInsight is a cloud implementation, on Microsoft Azure, of the rapidly expanding Apache Hadoop technology stack, the go-to solution for big data analysis.

It includes implementations of Storm, HBase, Pig, Hive, Sqoop, Oozie, Ambari, and so on. HDInsight also integrates with business intelligence (BI) tools such as Excel, SQL Server Analysis Services, and SQL Server Reporting Services.

Note: This is not a hands-on course. It builds a knowledge foundation for my next course in this series, which uses what we've learned here to create a real-world, end-to-end big data solution with Azure HDInsight.

Who is the target audience?
  • You are a developer, DBA, or Windows admin seeking to learn how Microsoft is changing the big data landscape.
  • You are curious about big data and want to learn more.
Curriculum For This Course
An Introduction To Big Data on Azure
6 Lectures 06:54

What's the course about?

In this course we are going to cover Hadoop on Azure.

Microsoft's take on Big Data is a game changer.

Preview 01:18

Approximately 90% of all organizational data is unstructured.

That means only 10% is stored in traditional relational databases.

The amount of data stored and analyzed will grow exponentially in the coming years.

Preview 01:20

This is a very specific course.

This course will focus on Microsoft's approach to big data.

We will be learning Azure HDInsight.

What Are We Going To Cover In This Course

I want to make sure you are in the right place.

If you are looking to learn Microsoft's approach to big data then this course is for you.

This is not a traditional Hadoop course.

Is This Course Right For You?

Let's cover some of the key terminology associated with this section.


Let's wrap up what we've learned.


9 questions
Hadoop High Level Overview
9 Lectures 11:13

Big data is an ecosystem, and Hadoop is a framework within it.

Let's learn what Hadoop really is.

Preview 01:08

There's a lot of new terminology here.

Let's learn what's involved and what the key components of Hadoop are.

The High Level Hadoop Ecosystem

One of the core components of Hadoop is the NameNode.

Let's learn what it is in this lecture.

Preview 02:03

MapReduce is both an engine and a programming model.

Let's learn about the map and the reduce.

Preview 01:27
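To make "the map and the reduce" concrete, here is a minimal single-process sketch of the MapReduce programming model in Python, using the classic word-count example. It is an illustration only, not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the in-memory `shuffle` stands in for the grouping that the framework performs across the cluster between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical name): emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step (hypothetical name): group all emitted values by key.
    In Hadoop the framework does this between the map and reduce phases;
    here it is simulated with an in-memory dictionary."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical name): collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Apply map to every input record; on a cluster, each mapper would process one split of the input.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Apply reduce once per distinct key.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "hadoop processes big data"]
    print(word_count(docs))  # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```

The split into three functions mirrors the model's contract: mappers see one record at a time and never each other's output, while each reducer sees every value for exactly one key, which is what lets the engine run both phases in parallel.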

The Pig programming language is designed to handle any kind of data, hence the name.

Let's learn about the two most prevalent Hadoop languages.


MapReduce underwent a complete overhaul in hadoop-0.23, and we now have what is called MapReduce 2.0 (MRv2), or YARN.

Let's learn about this new feature.

Preview 01:38

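YARN was designed for separation of duties: it splits resource management and job scheduling/monitoring into separate daemons, a global ResourceManager and a per-application ApplicationMaster.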

In this short lecture let's look at a visual representation of YARN.

YARN - Separation of Duties

Let's learn the new vernacular introduced in this section.


Let's wrap up what we've covered so far in this section.


11 questions
HDInsight In Your Data Lake
9 Lectures 12:03

At this juncture, Azure Data Lake is made up of three different services.

Let's learn about them in this lecture.

Preview 00:48

The data lake concept is fairly new.

Let's learn what a data lake is and, more importantly, whether we need one.

Preview 02:05

Let's talk about Microsoft's Azure Data Lake in the Cloud.

Azure Data Lake Store

In this lecture we will learn about Hadoop in the cloud.


A new service built on Apache YARN that dynamically scales distributed infrastructure.

Azure Data Lake Analytics

Azure Data Factory

Microsoft's new language for big data.

U-SQL. The New Language For Working With Big Data

Let's cover the key terms used in this section.


Let's look at some bullet points on what we've covered in this section.


10 questions
HDInsight on Azure
6 Lectures 07:19

In order to start our work with big data in the cloud we will need an account.

In this lecture we navigate to the URL to create one.

Create Azure Account

In order to start working with our HDInsight clusters, we have one major dependency.

We need a storage account.

In this lesson we will learn how to create a storage account and provision our first cluster.

Create Storage and Provision Our First Cluster

In this lesson we will learn the basics of managing our cluster.

We will also look at the new portal, which provides a much more granular view of our clusters.

The HDInsight Management Dashboard

In this lesson let's learn how to remote into our cluster.

RDP Into Cluster

Let's go over the key words in this lesson.


Let's go over the high points on what we've covered in this section.


10 questions
2 Lectures 00:36

We've covered a lot of new information in this course.

My next course will provide real-world examples of working with big data.

We will create native MapReduce jobs and learn more about U-SQL.

What's Next?

Let's wrap up what we've covered in this course.

About the Instructor
Mike West
4.2 Average rating
2,916 Reviews
49,024 Students
42 Courses
SQL Server and Machine Learning Evangelist

I've been a production SQL Server DBA most of my career.

I've worked with databases for over two decades. I've worked for or consulted with over 50 different companies as a full-time employee or consultant, ranging from Fortune 500 firms to several small and mid-size companies. Some include: Georgia Pacific, SunTrust, Reed Construction Data, Building Systems Design, NetCertainty, The Home Shopping Network, SwingVote, Atlanta Gas and Light and Northrop Grumman.

Experience, education and passion

I learn something almost every day. I work with insanely smart people. I'm a voracious learner of all things SQL Server and I'm passionate about sharing what I've learned. My area of concentration is performance tuning. SQL Server is like an exotic sports car: it will run just fine in anyone's hands, but put it in the hands of a skilled tuner and it will perform like a race car.


Certifications are like college degrees: they are a great starting point for learning. I'm a Microsoft Certified Database Administrator (MCDBA), Microsoft Certified Systems Engineer (MCSE) and Microsoft Certified Trainer (MCT).


Born in Ohio, raised and educated in Pennsylvania, I currently reside in Atlanta with my wife and two children.