Udemy
Hands-On with Hadoop 2: 3-in-1
Rating: 4.2 out of 5 (6 ratings)
177 students

Run your own Hadoop clusters on your own machine or in the cloud
Last updated 11/2020
English

What you'll learn

  • Understand the Hadoop 2.x Architecture
  • Create Map-reduce jobs
  • Plan, install and configure core Hadoop services on a Cluster
  • Validate the Cluster using HDFS, Map Reduce and Spark
  • Understand Cluster Life-Cycle and Performance tuning of a Hadoop Cluster
  • Hands-on solutions to your perplexing, real-world big data problems

Course content

3 sections, 96 lectures, 10h 56m total length
  • The Course Overview (3:43)

    This video gives an overview of the entire course.

  • Installing Hadoop in Local (22:50)

    In this video we’ll learn how to install Hadoop on our local system.

    • Prerequisites to the Hadoop installation
    • Hadoop Installation
    • Testing our Hadoop installation
  • Bring Process to Data (4:43)

    An important part of selecting the Hadoop framework for your own solution is understanding why it is a good fit for your application.

    • Understand the Hadoop way of execution
    • Fit your own application and see if it will benefit from it
  • NameNode Versus DataNode (4:15)

    Understand the difference between the two node types in HDFS: NameNode and DataNode.

    • Understand data resiliency
    • Distributed data source
  • Map and Reduce Operations (7:59)

    The new term Map-Reduce… what does it mean and how does it solve a problem?

    • Understand what a map-reduce operation is
    • Dig deep into the different parts of map-reduce in the Hadoop world
    • Devise a way to implement your own problem with a map-reduce operation


  • Order of Execution and Parallel Thinking (4:39)

    When jumping to parallel programming from serial programming, it is always hard to plan the computation.

    • Figure out the pitfalls of parallel data
    • See how the Hadoop parallel process works
    • Start parallel thinking
  • Formatting a HDFS (6:38)

    Prepare your HDD for use with HDFS.

    • Get an HDFS ready to use
    • Fit your own application and see if it will benefit from it
  • Formatting a HDFS (4:34)

    Copy data to/from HDFS.

    • Copy data to HDFS
    • Copy data from HDFS
  • Some Helpful Commands to Communicate with the HDFS (3:35)

    Using the HDFS commands in the shell.

    • Find the basic difference between generic shell commands and HDFS commands
    • Get used to the different ways of writing the commands
  • HDFS Protocol and Using It in Applications (11:11)

    How do we access HDFS files from a Java program?

    • Connect to the file system using the HDFS protocol
    • Fetch data from HDFS
    • Put data in HDFS
  • Hadoop Jobs Versus Tasks (4:47)

    What are Hadoop jobs and tasks?

    • Understand how tasks are communicated
    • Understand how jobs are run
  • The Hadoop UI for Task Progress (4:06)

    How to see the process flow and progress of a Hadoop job.

    • Open the Hadoop UI
    • Assess memory and process usage
  • Running a Couple of Example Jobs (10:08)

    Run Hadoop jobs.

    • Run Hadoop jobs directly
    • Run Hadoop jobs on YARN
  • Analyze the Work Flow/Data Flow/Process Flow (7:26)

    In this video, we are going to look at how map and reduce get executed.

    • Get to know about the data flow
    • Discover how map gets executed
    • Discover how reduce gets executed
  • Introduction to the Movie Dataset (4:04)
  • Data Transformation and Storing to HDFS (17:55)

    Prepare the data so that it fits our algorithm.

    • Split the data to be transformed
    • Transform the data using Hadoop
    • Merge the data using a basic Java application
  • Devise a Simple Algorithm for Recommendation (4:06)

    Devise a simple algorithm for recommendation

    • Create an algorithm to prepare data for recommendation by genre
    • Understand the data format for that output
  • Implement the Algorithm in Hadoop Map-Reduce Way and Analyze Performance (10:39)

    Implement the map-reduce for the transformation of the movie -> genre context

    • Create a map-reduce job for this problem
    • Try different splits to compare performance
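
The map, shuffle, and reduce steps named in the lectures above can be sketched in plain Python, without a Hadoop installation. This is only an illustration of the idea behind the movie -> genre transformation: the sample records, the pipe-separated genre format, and every function name below are assumptions, not the course's actual dataset or Java implementation.

```python
from collections import defaultdict

# Hypothetical sample records: (movie title, pipe-separated genres).
MOVIES = [
    ("Movie A", "Comedy|Drama"),
    ("Movie B", "Drama"),
    ("Movie C", "Comedy"),
]


def map_movie(title, genres):
    """Map step: emit one (genre, title) pair for each genre of a movie."""
    for genre in genres.split("|"):
        yield genre, title


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_genre(genre, titles):
    """Reduce step: collapse all titles of one genre into a sorted list."""
    return genre, sorted(titles)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    pairs = (
        pair
        for title, genres in records
        for pair in map_movie(title, genres)
    )
    return dict(reduce_genre(g, t) for g, t in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(MOVIES))
```

In a real cluster the map and reduce calls would run in parallel on separate splits of the data, and the grouping would be done by the framework rather than by a local dictionary.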

Requirements

  • Good knowledge of Java

Description

Hadoop is one of the most popular, reliable, and scalable distributed computing and storage frameworks for Big Data solutions. It comprises components designed to enable tasks on a distributed scale, across multiple servers and thousands of machines.

This comprehensive 3-in-1 training course gives you a strong foundation by exploring the Hadoop ecosystem with real-world examples. You’ll discover how to set up an HDFS cluster, format it, and transfer data between your local storage and the Hadoop filesystem. You’ll also get hands-on solutions to 10 real-world use cases using Hadoop.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Getting Started with Hadoop 2.x, opens with an introduction to the world of Hadoop, where you will learn about nodes, data sets, and operations such as map and reduce. The second section deals with HDFS, Hadoop's file system used to store data. Further on, you’ll discover the differences between jobs and tasks, and get to know the Hadoop UI. After this, we turn our attention to storing data in HDFS and data transformations. Lastly, we will learn how to implement an algorithm in the Hadoop map-reduce way and analyze the overall performance.

The second course, Hadoop Administration and Cluster Management, starts by installing Apache Hadoop on a cluster and configuring the required services. You will learn various cluster operations such as validation, and expanding and shrinking Hadoop services. You will then move on to gain a better understanding of administrative tasks like planning your cluster, monitoring, logging, security, troubleshooting, and best practices. Techniques to keep your Hadoop clusters highly available and reliable are also covered in this course.

The third course, Solving 10 Hadoop'able Problems, covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, presented as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.

By the end of this Learning Path, you’ll be able to plan, deploy, manage, monitor, and performance-tune your Hadoop Cluster with Apache Hadoop.

About the Author

A K M Zahiduzzaman is a software engineer with NewsCred Dhaka. He is a software developer and technology enthusiast. He was a Ruby on Rails developer, but now works with NodeJS, AngularJS, and Python. He is also working with a much wider vision as a technology company. The next goal is introducing SOA within the current applications to scale development via microservices. Zahiduzzaman has a lot of experience with Spark and is passionate about it. He is also a guitarist and has a band too. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.

Gurmukh Singh is a technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo and has authored the book Monitoring Hadoop.

Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly does programming in Java and Scala. He dedicates his time and efforts to get better at everything. He is currently delving into big data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group. He has also conducted a live coding session at Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.

Who this course is for:

  • This course is perfect for budding data scientists and data analysts who have a firm understanding of Java and want to get started with Hadoop