Real World Hadoop - Hands on Enterprise Distributed Storage.
4.5 (5 ratings)
172 students enrolled

Master the art of manipulating files within a distributed storage enterprise platform with hands-on Hadoop HDFS.
Created by Toyin Akin
Last updated 12/2016
English
Current price: $10 Original price: $90 Discount: 89% off
30-Day Money-Back Guarantee
Includes:
  • 2.5 hours on-demand video
  • 1 Article
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Learn how to navigate the HDFS file system
  • If you want to build an HDFS stack, simply run a single command on your desktop, go for a coffee, and come back to a running distributed environment ready for cluster deployment
  • Quickly build an environment where Cloudera and HDFS software can be installed.
  • Ability to automate the installation of software across multiple Virtual Machines
Requirements
  • Basic programming or scripting experience is required.
  • You will need a desktop PC and an Internet connection. The course is created with Windows in mind.
  • The software needed for this course is freely available
  • You will require a computer with virtualization chipset support (VT-x). Most computers purchased in the last five years should be good enough
  • Optional : Some exposure to Linux and/or Bash shell environment
  • 64-bit Windows operating system required (Windows 7 or above recommended)
  • This course is not recommended if you have no desire to work with/in distributed computing
  • Optional : This course is built on top of - "Real World Vagrant - Automate a Cloudera Manager Build"
Description

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences are significant.

We will be manipulating the HDFS file system, but it is worth asking first: why are enterprises interested in HDFS at all?

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. 

HDFS provides high throughput access to application data and is suitable for applications that have large data sets. 

HDFS relaxes a few POSIX requirements to enable streaming access to file system data. 

HDFS is part of the Apache Hadoop Core project.

Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.


Here is a suggested curriculum showing the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.


For those with little or no knowledge of the Hadoop ecosystem, start with the Udemy course: Big Data Intro for IT Administrators, Devs and Consultants


I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop; you don't want to corrupt your physical servers if you do not understand the steps or make a mistake. Udemy course: Real World Vagrant For Distributed Computing


I would then, on the virtual servers, deploy Cloudera Manager plus agents. Agents sit on all the slave nodes, ready to deploy your Hadoop services. Udemy course: Real World Vagrant - Automate a Cloudera Manager Build


Then deploy the Hadoop services across your cluster (via the Cloudera Manager installed in the previous step). We look at the logic regarding the placement of master and slave services. Udemy course: Real World Hadoop - Deploying Hadoop with Cloudera Manager


If you want to play around with HDFS commands (hands-on distributed file manipulation), this is the course for you. Udemy course: Real World Hadoop - Hands on Enterprise Distributed Storage.


You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). This is an advanced step, so make sure you understand how to manually deploy the Hadoop services first. Udemy course: Real World Hadoop - Automating Hadoop install with Python!


There is also the upgrade step. Once you have a running cluster, how do you upgrade to a newer Hadoop cluster (both for Cloudera Manager and the Hadoop services)? Udemy course: Real World Hadoop - Upgrade Cloudera and Hadoop hands on


Who is the target audience?
  • Software engineers who want to expand their skills into the world of distributed computing
  • System Engineers that want to expand their skillsets beyond the single server
  • Developers who want to write/test their code against a valid distributed environment
Curriculum For This Course
18 Lectures
02:34:09
Navigating the HDFS File System
2 Lectures 08:07

Simple introduction to the course.

Preview 08:07

Suggested course curriculum to follow ...
Preview 00:00
HDFS Theory and Installation
3 Lectures 42:25

Here we whiteboard by stepping through the topology and benefits of HDFS

Walking through the topology and benefits of HDFS
17:47

HDFS needs to be installed and for our Cluster, we are using Cloudera Manager. The HDFS binaries are contained within Cloudera's Parcel and thus we break down how the binaries are distributed to all the Hadoop nodes.

Here we break down how HDFS can be installed
10:52

Here, we step through the installation of HDFS using Cloudera Manager

We step through the installation of HDFS using Cloudera Manager
13:46
Navigating the Distributed Storage using - hdfs dfs
5 Lectures 34:22

Here, we start to compare the "hdfs dfs" commands to our regular "bash" commands. They look pretty similar!!

Comparing "hdfs dfs" commands to our regular "bash" commands
05:46
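The parallel this lecture draws can be sketched side by side. Below, real bash commands run on the left, with the (assumed) HDFS twin shown in the trailing comment; the `/demo` HDFS path and `/tmp/hdfs_demo` local path are made up for illustration.

```shell
# Local bash on the left; equivalent "hdfs dfs" command in the comment
# (run those against a live cluster; /demo is an assumed HDFS path).
mkdir -p /tmp/hdfs_demo                    # hdfs dfs -mkdir -p /demo
echo "hello" > /tmp/hdfs_demo/f.txt        # echo "hello" | hdfs dfs -appendToFile - /demo/f.txt
ls -l /tmp/hdfs_demo                       # hdfs dfs -ls /demo
cat /tmp/hdfs_demo/f.txt                   # hdfs dfs -cat /demo/f.txt
```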

Here we use the hdfs "superuser" account in order to create a userspace for regular users to access the distributed file system in order to read/write files.

Creating a userspace within hdfs for users to read/write files
07:17
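The superuser step can be sketched as follows. This assumes a cluster where HDFS runs as the `hdfs` Unix user (the Cloudera default); the user name `alice` is made up for illustration.

```shell
# Only the HDFS superuser may create and chown directories under /user.
sudo -u hdfs hdfs dfs -mkdir -p /user/alice
sudo -u hdfs hdfs dfs -chown alice:alice /user/alice
# Any user can now confirm the new home directory exists:
hdfs dfs -ls /user
```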

Here we upload a file into HDFS and view some details
08:45

It sounds naughty, but here we look at - hdfs fsck
05:53
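A minimal sketch of `hdfs fsck`, which must run against a live NameNode; the file path is an assumption for illustration.

```shell
# Health report for the whole namespace: total blocks, replication,
# and any corrupt or missing blocks.
hdfs fsck /
# Drill into one file: list its blocks and which DataNodes hold each replica.
hdfs fsck /user/alice/data.txt -files -blocks -locations
```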

Here we look at hdfs - ls, rm and expunge
06:41
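A sketch of the trash behaviour behind `-rm`, with assumed paths; how long trashed files are retained is governed by the `fs.trash.interval` property in core-site.xml.

```shell
hdfs dfs -ls -h /user/alice          # -h prints human-readable sizes
hdfs dfs -rm /user/alice/old.txt     # moved to trash, not yet deleted
hdfs dfs -ls /user/alice/.Trash/Current/user/alice   # recoverable from here
hdfs dfs -expunge                    # force-empty checkpointed trash now
```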
OK, so how can one Add or Remove Files within a Distributed System?
4 Lectures 33:22

We take a closer look at deleting files along with the skipTrash option
06:40
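The difference the `-skipTrash` flag makes, sketched with assumed paths:

```shell
hdfs dfs -rm -r /user/alice/tmp1             # goes to .Trash: recoverable
hdfs dfs -rm -r -skipTrash /user/alice/tmp2  # blocks freed immediately: unrecoverable
```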

Here we look at the hdfs commands - mkdir, appendToFile, cat and tail
10:44
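Those four commands chain together naturally; a sketch with assumed paths (the `-` argument tells `-appendToFile` to read from stdin):

```shell
hdfs dfs -mkdir -p /user/alice/logs
echo "line 1" | hdfs dfs -appendToFile - /user/alice/logs/app.log
echo "line 2" | hdfs dfs -appendToFile - /user/alice/logs/app.log
hdfs dfs -cat /user/alice/logs/app.log    # prints both lines
hdfs dfs -tail /user/alice/logs/app.log   # shows the last 1KB of the file
```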

Here we learn to search for files within hdfs
06:05
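On recent Hadoop versions (2.7+), `hdfs dfs -find` does this directly; the path and glob pattern below are assumptions.

```shell
hdfs dfs -find /user/alice -name "*.log"      # names matching a glob
hdfs dfs -ls -R /user/alice | grep "\.log$"   # fallback on older Hadoop: recursive ls piped to grep
```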

Here we look at the hdfs get and getmerge commands

Preview 09:53
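A sketch of the two download styles, with assumed paths:

```shell
hdfs dfs -get /user/alice/data.txt ./data.txt     # copy one HDFS file to local disk
hdfs dfs -getmerge /user/alice/logs ./merged.log  # concatenate every file in the dir into one local file
```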
So how easy (or hard!) is it to Manage a Large Distributed Cluster?
3 Lectures 22:11

Here we look at how we can count files and directories within hdfs
06:23
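A counting sketch with an assumed path; the column order of `-count` output is worth memorising.

```shell
hdfs dfs -count -h /user/alice   # columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs dfs -du -s -h /user/alice   # total bytes under the path, human-readable
```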

Here we look at how we can copy and move files within hdfs
08:16
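The distinction between the two matters at scale; a sketch with assumed paths:

```shell
hdfs dfs -cp /user/alice/a.txt /user/alice/backup/   # re-writes the blocks: a real copy
hdfs dfs -mv /user/alice/a.txt /user/alice/archive/  # metadata-only rename on the NameNode: cheap even for huge files
```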

Here we combine touchz and appendToFile to simulate increasing DataSet size
07:32
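The simulation can be sketched as a loop, with assumed paths:

```shell
hdfs dfs -touchz /user/alice/grow.txt        # create a zero-byte file
for i in 1 2 3; do
  echo "batch $i" | hdfs dfs -appendToFile - /user/alice/grow.txt
done
hdfs dfs -du -h /user/alice/grow.txt         # watch the reported size grow
```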
Conclusion
1 Lecture 13:42

Conclusion
13:42
About the Instructor
Toyin Akin
3.8 Average rating
135 Reviews
1,374 Students
15 Courses
Big Data Engineer, Capital Markets FinTech Developer

I spent 6 years at "Royal Bank of Scotland" and 5 years at the investment bank "BNP Paribas", developing and managing Interest Rate Derivatives services as well as engineering and deploying In Memory Databases (Oracle Coherence), NoSQL and Hadoop clusters (Cloudera) into production.

In 2016, I left to start my own training company, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (In Memory Database), NoSQL, BigData and DevOps technology.

From Q3 2017, this will also include FinTech training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).

I have a YouTube channel publishing snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed computing ideas.

Check out my website and/or YouTube for more info

See you inside ...