Hadoop on Azure. An Introduction to Big Data Using HDInsight

A Pragmatic Introduction To HDInsight
4.1 (19 ratings)
205 students enrolled
Instructed by Mike West IT & Software / Other
  • Lectures 32
  • Length 38 mins
  • Skill Level Beginner Level
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 12/2015 English

Course Description

Massive amounts of data are being collected on just about everything and only a small part of that data is being analyzed.

In 2014, more than 5,700 tweets were sent and 870 Facebook links were shared every second.

In 2013, about 4.4 zettabytes of data were created and approximately 5% of it was analyzed.

By 2020, it’s estimated that we will collect 44 zettabytes of data and the amount we analyze will jump to 40%.
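To put those percentages in absolute terms, here is a small arithmetic sketch in Python. The totals and percentages are the figures quoted above; the function name `analyzed_zettabytes` is only an illustrative choice.

```python
def analyzed_zettabytes(total_zb, analyzed_fraction):
    """Return how many zettabytes were analyzed, given a total and a fraction."""
    return total_zb * analyzed_fraction


# Figures quoted in the course description (2020 is an estimate).
analyzed_2013 = analyzed_zettabytes(4.4, 0.05)
analyzed_2020 = analyzed_zettabytes(44, 0.40)

# How much more data would be analyzed in 2020 than in 2013.
growth_factor = analyzed_2020 / analyzed_2013

print(f"2013: {analyzed_2013:.2f} ZB analyzed")
print(f"2020: {analyzed_2020:.2f} ZB analyzed")
print(f"Growth: about {growth_factor:.0f}x")
```

On these numbers, the analyzed volume goes from roughly 0.22 zettabytes to roughly 17.6 zettabytes, an increase of about 80 times, even though the total data collected grows only tenfold.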

One of the most overused terms in recent times is “Big Data.”

But what does the term really mean?

Big data refers to data being collected in ever-escalating volumes, at increasingly high velocities, and in a widening variety of unstructured formats and variable semantic contexts.

Big data describes any large body of digital information, from the text in a Twitter feed, to the sensor information from industrial equipment, to information about customer browsing and purchases on an online catalog.

Big data can be historical (meaning stored data) or real-time (meaning streamed directly from the source).

For big data to provide actionable intelligence or insight, it is not enough to ask the right questions and collect data relevant to the issues. The data must also be accessible, cleaned, analyzed, and then presented in a useful way.

HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache Hadoop technology stack, which is the go-to solution for big data analysis.

It includes implementations of Storm, HBase, Pig, Hive, Sqoop, Oozie, Ambari, and so on. HDInsight also integrates with business intelligence (BI) tools such as Excel, SQL Server Analysis Services, and SQL Server Reporting Services.

Note: This is not a hands-on course. It builds a knowledge foundation for my next course in this series, which uses what we've learned to create a real-world, end-to-end big data solution with Azure HDInsight.

What are the requirements?

  • Only the desire to learn Microsoft's direction for Big Data.
  • Familiarity with the basic concepts behind cloud architecture and SQL Server would be beneficial.

What am I going to get from this course?

  • Understand the basic concepts of Big Data.
  • Understand the most widely used tool for working with big data: Hadoop.
  • Learn what Microsoft is doing to help organizations create Hadoop clusters faster and more affordably than ever before.

Who is the target audience?

  • You are a developer, DBA or Windows admin seeking to learn how Microsoft is changing the Big Data landscape.
  • You are curious about Big Data and want to learn more.


Section 1: An Introduction To Big Data on Azure

What's the course about?

In this course we are going to cover Hadoop on Azure.

Microsoft's take on Big Data is a game changer.


Approximately 90% of all organizational data is unstructured.

That means only 10% is stored in traditional relational databases.

The amount of data stored and analyzed will grow exponentially in the coming years.


This is a very specific course.

This course will focus on Microsoft's approach to big data.

We will be learning Azure HDInsight.


I want to make sure you are in the right place.

If you are looking to learn Microsoft's approach to big data then this course is for you.

This is not a traditional Hadoop course.


Let's cover some of the key terminology associated with this section.


Let's wrap up what we've learned.

9 questions
Section 2: Hadoop High Level Overview

Big Data is an ecosystem and Hadoop is a product.

Let's learn what Hadoop really is.


There's a lot of new terminology here.

Let's learn what's involved and what the key components to Hadoop are.


One of the core components of Hadoop is the NameNode.

Let's learn what it is in this lecture.


MapReduce is both an engine and a programming model.

Let's learn about the map and the reduce.
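As a rough illustration of the programming model only, the sketch below counts words in plain Python. It is a single-process toy, not Hadoop code and not taken from the course: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The map step works on each line independently and the reduce step works on each key independently, which is the property that lets an engine spread both steps over many machines.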


The Pig programming language is designed to handle any kind of data, hence the name.

Let's learn about the two most prevalent Hadoop languages.


MapReduce underwent a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2), or YARN.

Let's learn about this new feature.


YARN was designed for separation of duties.

In this short lecture let's look at a visual representation of YARN.


Let's learn the new vernacular introduced in this section.


Let's wrap up what we've covered so far in this section.

11 questions
Section 3: HDInsight In Your Data Lake

At this juncture, Azure Data Lake is made up of three different services.

Let's learn about them in this lecture.


The data lake concept is fairly new.

Let's learn what a data lake is and, more importantly, whether we need one.


Let's talk about Microsoft's Azure Data Lake in the Cloud.


In this lecture we will learn about Hadoop in the cloud.


A new service built on Apache YARN that dynamically scales distributed infrastructure.

Azure Data Factory

Microsoft's new language for big data.


Let's cover the key terms used in this section.


Let's look at some bullet points on what we've covered in this section.

10 questions
Section 4: HDInsight on Azure

In order to start our work with big data in the cloud we will need an account.

In this lecture we navigate to the URL to create one.


In order to start working with our HDInsight clusters we have one major dependency.

We need a storage account.

In this lesson we will learn how to create a storage account and provision our first cluster.


In this lesson we will learn the basics of managing our cluster.

We will also look at the new portal, which gives us a much more granular view of our clusters.


In this lesson let's learn how to remote into our cluster.


Let's go over the key words in this lesson.


Let's go over the high points on what we've covered in this section.

10 questions
Section 5: Conclusion

We've covered a lot of new information in this course.

My next course will provide real-world examples of working with big data.

We will create native MapReduce jobs and learn more about U-SQL.


Let's wrap up what we've covered in this course.

Instructor Biography

Mike West, SQL Server Evangelist

I've been a production SQL Server DBA most of my career.

I've worked with databases for over two decades. I've worked for or consulted with over 50 different companies as a full-time employee or consultant, including Fortune 500 firms as well as several small to mid-size companies. Some include: Georgia Pacific, SunTrust, Reed Construction Data, Building Systems Design, NetCertainty, The Home Shopping Network, SwingVote, Atlanta Gas and Light and Northrop Grumman.

Experience, education and passion

I learn something almost every day. I work with insanely smart people. I'm a voracious learner of all things SQL Server and I'm passionate about sharing what I've learned. My area of concentration is performance tuning. SQL Server is like an exotic sports car: it will run just fine in anyone's hands, but put it in the hands of a skilled tuner and it will perform like a race car.


Certifications are like college degrees: they are a great starting point for learning. I'm a Microsoft Certified Database Administrator (MCDBA), Microsoft Certified Systems Engineer (MCSE) and Microsoft Certified Trainer (MCT).


Born in Ohio, raised and educated in Pennsylvania, I currently reside in Atlanta with my wife and two children.
