
This video gives an overview of the entire course.
In this video we’ll learn how to install Hadoop on our local system.
The important part of selecting the Hadoop framework for your own solution is to understand why it is a good fit for your application.
Understand the difference between the two node types in HDFS: DataNode and NameNode.
The new term Map-Reduce… what does it mean and how does it solve a problem?
When jumping to parallel programming from serial programming, it is always hard to plan the computation.
Prepare your HDD for use with HDFS.
Copy data to/from HDFS.
Using the HDFS commands in the shell.
How do we access HDFS files from a Java program?
What are Hadoop jobs and tasks?
How to see the process flow and progress of a Hadoop job.
Run Hadoop jobs.
In this video, we are going to look at how the map and reduce phases get executed.
Prepare the data to fit our algorithm.
Devise a simple algorithm for recommendation.
Implement the map-reduce for the transformation of the movie -> genre context.
This video will give you an overview of the course.
In this video, we will understand the use of Git Bash and how it helps in streamlining the code.
The aim of this video is to learn how to manage virtual machines, especially when you have a large cluster.
The aim of this video is to understand the virtualization tool you use in your environment, since you may not have physical nodes on which to practice with a Hadoop cluster.
The aim of this video is to build on our understanding in order to scale to large clusters.
The aim of this video is to install Hadoop and understand its components and the role they play in the Hadoop ecosystem.
The aim of this video is to explore the Hadoop ecosystem and look at its various tools and frameworks.
The aim of this video is to learn how Hadoop stores data and how it differs from a traditional file system.
The aim of this video is to understand the YARN framework and the problems it solves.
The aim of this video is to learn what MapReduce is, how it evolved, and how its simplicity helps address large-scale data.
The aim of this video is to study how to plan the layout of the various Hadoop services to improve availability and performance, and to achieve a balanced distribution of compute and memory across the cluster.
The aim of this video is to demonstrate why coordination, locking, and dynamic configuration are critical in a distributed system. Also, learn how to handle split-brain scenarios.
The aim of this video is to show how important it is to address failures and make sure that services keep running even when a few components or pieces of hardware fail.
The aim of this video is to address failures and make sure that services keep running even when a few components or pieces of hardware fail.
The aim of this video is to understand the use cases for Spark and how to execute it across the cluster. It is important to know that the Spark history server is the only Spark daemon in YARN mode.
The aim of this video is to show how HDFS, as a filesystem, stores data by splitting it into blocks of a specific size. The blocks are replicated for redundancy and performance.
The aim of this video is to show an important use of HDFS as a filesystem: copying data from the local filesystem to HDFS, or vice versa.
The aim of this video is to understand what an administrator must do day to day to maintain the health of the cluster and keep its users happy.
The aim of this video is to show how a MapReduce job works and the stages it executes, when a reducer runs, and how to size reducers.
The aim of this video is to show how Spark jobs work and where their libraries are pulled from.
The aim of this video is to show how we can start/stop an individual service in a cluster or restart all services across the cluster.
The aim of this video is to learn to manage services using the Ambari web UI.
The aim of this video is to show how to maintain stability and roll out the latest patches, since it is important to keep versions updated to the latest stable releases.
The aim of this video is to show how to scale the cluster according to its needs as it grows, and how to gracefully remove bad hardware or replace nodes, using Apache Hadoop.
The aim of this video is to show how to scale the cluster according to its needs as it grows, and how to gracefully remove bad hardware or replace nodes, using HDP.
The aim of this video is to understand the role of the HDFS masters and their types. What if they fail? Can we recover them or fail over?
The aim of this video is to set up NameNode HA using the QJM. It is important to understand its role, its usage, and the steps to be performed.
The aim of this video is to set up HA for YARN using HDP and see how much easier it is than setting things up manually, as we did for HDFS HA.
The aim of this video is to examine the additional permissions and access controls a user can have. Can users in the same group have different access rights on the native Linux filesystem?
The aim of this video is to learn how a user is identified and whether Hadoop is secure.
The aim of this video is to study how many users in an organization can access a Hadoop cluster. Do we add them manually or use a centralized user management system?
The aim of this video is to learn how to determine the state of the cluster. Is it healthy, or are there issues? What is the total capacity of the cluster, and how many nodes does it have?
The aim of this video is to study how, in a multi-tenant cluster with many users, we can find out who accessed a file or data. Were they authorized to execute or read a file?
The aim of this video is to find out whether the cluster is optimally used or whether we need to add more resources. How are the jobs performing, and do they need optimization?
The aim of this video is to show why it is a good habit to log and monitor for proactive resolution. Each service has its own logging mechanism and verbosity, and gives information about its state.
The aim of this video is to show that, despite having all the checks and best practices in place, things go wrong. How quickly we can identify and resolve the problem is a key factor.
The aim of this video is to study the case where the cluster is up and running and all services are healthy, yet jobs are failing.
This video gives an overview of the entire course.
In this video, we will see what HDFS is.
In this video, we will learn about YARN.
In this video, we will see what Hive is.
In this video, we will see what a pub-sub is.
In this video, we will see some column-oriented database concepts.
In this video, we will see Spark architecture.
In this video, we will explain Spark Streaming architecture.
In this video, we will process payment data.
In this video, we will implement real-time logic on a stream of events.
In this video, we will save data to HBase.
In this video, we will implement a bot-filtering streaming job.
In this video, we will implement an HDFS sink that saves data into HDFS.
In this video, we will investigate customer data in Hive.
In this video, we will use a streaming approach to find the top-selling item.
In this video, we will enrich transactions with additional information.
In this video, we will perform a quantitative analysis of customer churn.
In this video, we will analyze customer churn based on transaction amounts.
In this video, we will take a look at stream processing of sensor data.
In this video, we will insert data into HBase from a Spark Streaming job.
In this video, we will calculate statistics from sensors.
In this video, we will see how to represent a graph.
In this video, we will perform operations on a graph using GraphX.
In this video, we will count degrees of vertices.
In this video, we will calculate the average of a neighborhood.
In this video, we will see what connected components are.
In this video, we will find the PageRank using Spark GraphX.
In this video, we will see what an anomaly is and how to detect it.
The aim of this video is to analyse web logs for suspicious activity and load data into Spark.
In this video, we will implement clustering in Spark.
In this video, we will detect anomalies in network traffic.
In this video, we will analyse an author's posts.
In this video, we will extract information from unstructured text.
In this video, we will get to know the algorithms for transforming text into vectors of numbers.
In this video, we will see what supervised and unsupervised ML are.
In this video, we will find an author of a post.
In this video, we will download and set up the Cloudera Sandbox.
In this video, we will find out what products the users want to buy.
In this video, we will use movies to suggest interesting content to the viewer.
In this video, we will test and experiment with the recommendation engine.
Hadoop is a popular, reliable, and scalable distributed computing and storage framework for Big Data solutions. It comprises components designed to enable tasks on a distributed scale, across multiple servers and thousands of machines.
This comprehensive 3-in-1 training course gives you a strong foundation by exploring the Hadoop ecosystem with real-world examples. You’ll discover how to set up an HDFS cluster, format it, and transfer data between your local storage and the Hadoop filesystem. You’ll also get hands-on solutions to 10 real-world use cases using Hadoop.
Contents and Overview
This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.
The first course, Getting Started with Hadoop 2.x, opens with an introduction to the world of Hadoop, where you will learn about nodes, data sets, and operations such as map and reduce. The second section deals with HDFS, Hadoop's filesystem used to store data. Further on, you’ll discover the differences between jobs and tasks, and get to know the Hadoop UI. After this, we turn our attention to storing data in HDFS and data transformations. Lastly, we will learn how to implement an algorithm in the Hadoop map-reduce way and analyze the overall performance.
The second course, Hadoop Administration and Cluster Management, starts by installing Apache Hadoop on a cluster and configuring the required services. Learn various cluster operations such as validation, and expanding and shrinking Hadoop services. You will then move on to gain a better understanding of administrative tasks like planning your cluster, monitoring, logging, security, troubleshooting, and best practices. Techniques to keep your Hadoop clusters highly available and reliable are also covered in this course.
The third course, Solving 10 Hadoop'able Problems, covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, presented as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.
By the end of this Learning Path, you’ll be able to plan, deploy, manage, monitor, and performance-tune your Hadoop cluster with Apache Hadoop.
About the Author
A K M Zahiduzzaman is a software engineer with NewsCred Dhaka. He is a software developer and technology enthusiast. He was a Ruby on Rails developer, but now works with Node.js, AngularJS, and Python. He is also working with a much wider vision as a technology company. The next goal is introducing SOA within the current applications to scale development via microservices. Zahiduzzaman has a lot of experience with Spark and is passionate about it. He is also a guitarist and has a band too. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.
Gurmukh Singh is a technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo and has authored the book Monitoring Hadoop.
Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly programs in Java and Scala. He dedicates his time and efforts to get better at everything. He is currently delving into big data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland, including Confitura and JDD, and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.