Udemy
30-Day Money-Back Guarantee

This course includes:

  • 14.5 hours on-demand video
  • 5 articles
  • 2 downloadable resources
  • Full lifetime access
  • Access on mobile and TV

The Ultimate Hands-On Hadoop: Tame your Big Data!

Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more! Over 25 technologies.
Bestseller
Rating: 4.5 out of 5 (23,012 ratings)
124,447 students
Created by Sundog Education by Frank Kane, Frank Kane
Last updated 10/2020
English
English, French [Auto], 

What you'll learn

  • Design distributed systems that manage "big data" using Hadoop and related technologies.
  • Use HDFS and MapReduce for storing and analyzing data at scale.
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
  • Analyze relational data using Hive and MySQL
  • Analyze non-relational data using HBase, Cassandra, and MongoDB
  • Query data interactively with Drill, Phoenix, and Presto
  • Choose an appropriate data storage technology for your application
  • Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
  • Consume streaming data using Spark Streaming, Flink, and Storm
Curated for the Udemy for Business collection

Requirements

  • You will need access to a PC running 64-bit Windows, macOS, or Linux with an Internet connection and at least 8 GB of *free* (not total) RAM if you want to participate in the hands-on activities and exercises. If your PC does not meet these requirements, you can still follow along in the course without doing the hands-on activities.
  • Some activities will require some prior programming experience, preferably in Python or Scala.
  • A basic familiarity with the Linux command line will be very helpful.

Description

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end! 

Please note that the focus of this course is application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!


  • "The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations." - Aldo Serrano

  • "I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities." - Tyler Buck

Who this course is for:

  • Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
  • Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  • System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

Featured review

Aayush Gautam
8 courses
5 reviews
Rating: 4.5 out of 5
7 months ago
Well thank you Frank for sharing your such great knowledge and giving me a right path towards Excellence. This course taught me soo many things like patience,hard-work and many more as this was a new field for me. The content was great and was beautifully explained by Frank. This course covers mostly all the topics which a beginner needs to learn. Thanx once again for this course i hope ill surely implement all this knowledge and land up in a good job profile thanxxxx :)

Course content

12 sections • 101 lectures • 14h 38m total length

  • Preview 02:10
  • Tips for Using This Course
    01:09
  • If you have trouble downloading Hortonworks Data Platform...
    00:31
  • Installing Hadoop [Step by Step]
    Preview 19:03
  • Preview 03:01
  • Hadoop Overview and History
    Preview 07:44
  • Overview of the Hadoop Ecosystem
    16:46

  • HDFS: What it is, and how it works
    13:53
  • Installing the MovieLens Dataset
    Preview 06:20
  • [Activity] Install the MovieLens dataset into HDFS using the command line
    07:50
  • MapReduce: What it is, and how it works
    10:40
  • How MapReduce distributes processing
    12:57
  • MapReduce example: Break down movie ratings by rating score
    11:35
  • Troubleshooting tips: installing pip and mrjob
    00:26
  • [Activity] Installing Python, MRJob, and nano
    07:43
  • [Activity] Code up the ratings histogram MapReduce job and run it
    Preview 07:36
  • [Exercise] Rank movies by their popularity
    07:06
  • [Activity] Check your results against mine!
    08:23

  • Introducing Ambari
    09:49
  • Introducing Pig
    06:25
  • Example: Find the oldest movie with a 5-star rating using Pig
    15:07
  • Preview 09:40
  • More Pig Latin
    07:34
  • [Exercise] Find the most-rated one-star movie
    01:56
  • Pig Challenge: Compare Your Results to Mine!
    05:37

  • Why Spark?
    10:06
  • The Resilient Distributed Dataset (RDD)
    10:13
  • [Activity] Find the movie with the lowest average rating - with RDDs
    15:33
  • Preview 06:28
  • [Activity] Find the movie with the lowest average rating - with DataFrames
    10:00
  • [Activity] Movie recommendations with MLLib
    Preview 12:16
  • [Exercise] Filter the lowest-rated movies by number of ratings
    02:51
  • [Activity] Check your results against mine!
    06:40

  • What is Hive?
    06:31
  • [Activity] Use Hive to find the most popular movie
    10:45
  • How Hive works
    Preview 09:10
  • [Exercise] Use Hive to find the movie with the highest average rating
    01:55
  • Compare your solution to mine.
    04:10
  • Integrating MySQL with Hadoop
    08:00
  • [Activity] Install MySQL and import our movie data
    07:45
  • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
    07:31
  • [Activity] Use Sqoop to export data from Hadoop to MySQL
    07:16

  • Why NoSQL?
    13:54
  • What is HBase?
    12:55
  • [Activity] Import movie ratings into HBase
    13:28
  • [Activity] Use HBase with Pig to import data at scale.
    11:19
  • Cassandra overview
    14:50
  • If you have trouble installing Cassandra...
    00:58
  • [Activity] Installing Cassandra
    11:18
  • [Activity] Write Spark output into Cassandra
    11:00
  • MongoDB overview
    17:19
  • [Activity] Install MongoDB, and integrate Spark with MongoDB
    12:44
  • [Activity] Using the MongoDB shell
    07:48
  • Choosing a database technology
    Preview 15:59
  • [Exercise] Choose a database for a given problem
    05:00

  • Overview of Drill
    07:55
  • [Activity] Setting up Drill
    10:58
  • Preview 07:07
  • Overview of Phoenix
    08:55
  • [Activity] Install Phoenix and query HBase with it
    07:08
  • [Activity] Integrate Phoenix with Pig
    11:45
  • Overview of Presto
    06:39
  • [Activity] Install Presto, and query Hive with it.
    12:26
  • Preview 09:01

  • YARN explained
    Preview 10:01
  • Tez explained
    04:56
  • [Activity] Use Hive on Tez and measure the performance benefit
    08:35
  • Mesos explained
    07:13
  • ZooKeeper explained
    13:10
  • [Activity] Simulating a failing master with ZooKeeper
    06:47
  • Oozie explained
    11:56
  • Import setup step for Oozie on HDP 2.6.5!
    00:13
  • [Activity] Set up a simple Oozie workflow
    16:46
  • Zeppelin overview
    05:01
  • [Activity] Use Zeppelin to analyze movie ratings, part 1
    12:28
  • [Activity] Use Zeppelin to analyze movie ratings, part 2
    09:46
  • Hue overview
    08:07
  • Other technologies worth mentioning
    04:35

  • Kafka explained
    09:48
  • [Activity] Setting up Kafka, and publishing some data.
    07:24
  • [Activity] Publishing web logs with Kafka
    10:21
  • Flume explained
    10:16
  • [Activity] Set up Flume and publish logs with it.
    07:46
  • [Activity] Set up Flume to monitor a directory and store its data in HDFS
    Preview 09:12

  • Spark Streaming: Introduction
    14:27
  • [Activity] Analyze web logs published with Flume using Spark Streaming
    14:20
  • [Exercise] Monitor Flume-published logs for errors in real time
    02:02
  • Exercise solution: Aggregating HTTP access codes with Spark Streaming
    04:24
  • Apache Storm: Introduction
    09:27
  • [Activity] Count words with Storm
    14:35
  • Preview 06:53
  • [Activity] Counting words with Flink
    10:20

Instructors

Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro
  • 4.5 Instructor Rating
  • 96,488 Reviews
  • 434,577 Students
  • 22 Courses

Sundog Education's mission is to make highly valuable career skills in big data, data science, and machine learning accessible to everyone in the world. Our consortium of expert instructors shares our knowledge in these emerging fields with you, at prices anyone can afford. 

Sundog Education is led by Frank Kane and owned by Frank's company, Sundog Software LLC. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Due to our volume of students we are unable to respond to private messages; please post your questions within the Q&A of your course. Thanks for understanding.

Frank Kane
Founder, Sundog Education
  • 4.5 Instructor Rating
  • 93,154 Reviews
  • 390,480 Students
  • 14 Courses

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Due to our volume of students, I am unable to respond to private messages; please post your questions within the Q&A of your course. Thanks for understanding.

© 2021 Udemy, Inc.