What you'll learn

Importance of the Hadoop framework in big data analytics
Understanding the Hadoop framework in detail
Hands-on experience with data ingestion techniques: Apache Sqoop and Apache Flume
Hands-on experience with MapReduce programming and its underlying concepts
Hands-on experience with Apache Hive programming, performance tuning, and UDFs
Understand and work with Apache Pig
Real-time data streaming analysis with Apache Spark and its ecosystem
Understand and work with Apache Kafka
Process workflow automation using Apache Oozie
Understand and work with MongoDB
Case studies, practical explanations, and interview questions

Course content

17 sections • 96 lectures • 18h 31m total length

Course Introduction (2:27)
This video provides a detailed overview of the course and the topics it covers. It also includes a brief note about the trainer's profile and experience.

Big Data Introduction and Sources of Big Data (12:28)
This video introduces big data in detail, covering its challenges and its sources, with real-time example scenarios.
Hadoop Introduction (13:48)
This video introduces Hadoop, the different roles in Hadoop, and Hadoop distributors. A real-time example is used to demonstrate the roles in Hadoop.
Hadoop Ecosystems Overview (13:00)
This video gives an overview of the Hadoop ecosystem, introducing each component and its importance in the big data stack.

HDFS Architecture 1 (15:31)
This video explains distributed architecture, how HDFS addresses the storage problem of big data, and the daemon services of the Hadoop 1 architecture.
HDFS Architecture 2 (13:18)
Continuing from the previous video, this video explains the HDFS architecture in detail.
HDFS Architecture 3 (16:00)
Continuing from the previous video, this video explains the concepts of edge nodes and cluster nodes and the responsibilities of the JobTracker and the NameNode in detail.
Hadoop 2 - YARN Architecture (7:35)
This video explains the disadvantages of the Hadoop 1 architecture and introduces the YARN architecture and its daemon services. It also explains in detail how the Hadoop 2 architecture overcomes the limitations of Hadoop 1.
HDFS High Availability (2:00)
This video explains in detail how Hadoop handles NameNode failure.
HDFS Architecture Quiz
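The HDFS architecture lectures describe how a large file is split into fixed-size blocks that are replicated across DataNodes, while the NameNode keeps track of where each block lives. The Python sketch below is a minimal model of that bookkeeping, not HDFS code. The 128 MB block size and replication factor of 3 are the Hadoop 2 defaults (`dfs.blocksize`, `dfs.replication`). The round-robin placement and the function names `split_into_blocks` and `place_replicas` are hypothetical. Real HDFS uses a rack-aware placement policy instead.

```python
# Simplified model of HDFS block splitting and replica placement.
# Assumption: Hadoop 2 defaults for block size (dfs.blocksize) and
# replication (dfs.replication); placement here is a hypothetical round-robin.
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Map each block id to `replication` distinct DataNodes (round-robin).

    This plays the role of the NameNode's block map in a very reduced form;
    real HDFS chooses nodes with a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    for block, nodes in place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

A 300 MB file therefore occupies two full blocks and one 44 MB block. With four DataNodes, the three replicas of each block land on three different nodes, so losing any single node leaves at least two copies of every block.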

Environment Setup and Hadoop Ecosystems (14:20)
This video explains how to set up Hadoop in pseudo-distributed mode. I also explain which software you need to download to set up the Hadoop QuickStart virtual environment.
Hadoop Linux Commands (20:30)
This video covers the Linux commands used to interact with HDFS. With these basic commands, users can store big data in HDFS and apply their business logic to the data stored there.
Remote Desktop Connection to a Cluster Node via PuTTY and File Transfer via WinSCP (13:03)
This video shows how to connect remotely to a cluster node or edge node from a Windows desktop using putty.exe, and how to transfer files from a Windows machine to a DataNode using WinSCP.

Data Ingestion from Local to HDFS (8:59)
Data Ingestion from a Remote Machine to an Edge Node or Cluster Node (10:35)
This video explains a second approach to ingesting data from a remote machine into an edge node or cluster node: using the SFTP protocol on Linux and WinSCP on Windows.
Sqoop Introduction (16:09)
This video introduces Apache Sqoop and the common and control arguments used in Sqoop commands.
Data Ingestion Using Apache Sqoop (20:31)
This video provides hands-on experience ingesting data from an RDBMS (MySQL) into HDFS.
Incremental Append in Sqoop (15:54)
This video provides a practical demo of the incremental append scenario in Sqoop.
Sqoop --enclosed-by and --escaped-by Options (7:56)
Sqoop Commands and Other Attributes (16:06)
This video provides a practical demonstration of Sqoop commands such as querying, importing selected columns, and importing all the tables from a database.
Apache Flume Introduction (8:53)
This video covers Apache Flume: its components, architecture, and commonly used properties.
Apache Flume Demo (19:17)
This video provides a practical demo of ingesting streaming data from an external source folder into HDFS using Flume's spooling directory (spoolDir) source.
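Sqoop's incremental append mode (`--incremental append` with `--check-column` and `--last-value`) imports only the rows whose check column is greater than the last value already imported. The Python sketch below models that selection logic and uses an in-memory SQLite table as a stand-in for the MySQL source. It is not how Sqoop is implemented internally. The function `incremental_append` and the `emp` table are hypothetical.

```python
import sqlite3


def incremental_append(conn: sqlite3.Connection, table: str, check_column: str,
                       last_value: int) -> tuple[list[tuple], int]:
    """Return the rows added since `last_value` and the value to save for the next run.

    Models the idea behind Sqoop's --incremental append mode. Table and column
    names are formatted into the SQL, so they must come from trusted input.
    """
    # Fix an upper bound first; rows inserted while the import runs are
    # left for the next run instead of being picked up partially.
    upper = conn.execute(f"SELECT MAX({check_column}) FROM {table}").fetchone()[0]
    if upper is None or upper <= last_value:
        return [], last_value  # nothing new since the previous run
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? AND {check_column} <= ? "
        f"ORDER BY {check_column}",
        (last_value, upper),
    ).fetchall()
    return rows, upper


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL source table
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    rows, last = incremental_append(conn, "emp", "id", 0)     # initial import: ids 1-3
    conn.executemany("INSERT INTO emp VALUES (?, ?)", [(4, "d"), (5, "e")])
    rows, last = incremental_append(conn, "emp", "id", last)  # only the new ids 4-5
    print(rows, last)  # [(4, 'd'), (5, 'e')] 5
```

In a real job, the corresponding `sqoop import` options are `--incremental append`, `--check-column id`, and `--last-value 3`. A saved Sqoop job can record the new last value between runs, so you do not have to track it by hand.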

Hive Introduction and Managed Tables (15:55)
This video introduces Hive and shows how to create managed tables in Hive and load data into them.
External Tables in Hive (5:42)
This video provides a hands-on demonstration of creating external tables in Hive.
Hive Architecture (7:19)
This video provides a diagrammatic explanation of the Hive architecture and its components. It also shows how to execute Hive queries in GUI mode using Hue.
Hive Partitioning (20:53)
This video provides a hands-on demo of partitioning data in Hive and the advantages of partitioning. It also discusses the types of partitioning available in Hive.
Hive Bucketing (14:37)
This video provides a detailed hands-on demo of dividing data into buckets and the properties to set when loading data into bucketed tables.
SET Properties in Hive (4:43)
Certain Hive features are disabled by default and must be enabled by setting properties. For example, dynamic partitioning and loading data into bucketed tables both require extra properties. This video covers these properties.
XML Parsing in Hive (11:41)
Input datasets can also be XML documents. In this demo, we learn how to process XML documents in Hive.
JSON File Processing in Hive (7:49)
This video teaches JSON file processing in Hive through a detailed example.
Beeline Mode in Hive (8:24)
Besides the deprecated Hive CLI, Hive provides the Beeline shell for connecting to HiveServer2. This video demonstrates, hands-on, how to connect with the Beeline shell and use it.
Various File Formats in Hive (Text, RC, ORC, Sequence) (13:52)
Hive supports various file formats that differ in compression technique and storage size. This video explains each of these file formats in detail.
Demo for File Formats in Hive (6:03)
This video demonstrates the various file formats in Hive with an example.
Complex Data Types in Hive (9:32)
This video provides a hands-on demonstration of complex data types in Hive, such as STRUCT, UNIONTYPE, ARRAY, and MAP.
Update and Delete Operations in Hive (8:38)
This video explains the properties that must be set to enable update and delete operations in Hive.
Hive Joins (17:09)
Hive converts a join over multiple tables into a single MapReduce job if the same column is used in the join clauses for every table. This video demonstrates map-side joins in Hive.
Hive Join Demo 2 (12:46)
Hive UDFs (13:40)
This video demonstrates how to execute Hive scripts.
Performance Tuning Techniques in Apache Hive (8:23)
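Partitioning and bucketing both decide where a row is physically stored. With partitioning, each partition value becomes a `col=value` subdirectory under the table location, so a query that filters on the partition column reads only the matching directories. With bucketing, a hash of the bucketing column modulo the number of buckets selects one of a fixed number of files. The Python sketch below models that placement for illustration only. It assumes an INT bucketing column and the legacy Hive bucketing hash, under which an INT value hashes to itself. The function names are hypothetical, and `/user/hive/warehouse` appears only because it is Hive's usual default warehouse directory.

```python
from collections import defaultdict

INT_MASK = 0x7FFFFFFF  # same value as Java's Integer.MAX_VALUE


def bucket_for(key: int, num_buckets: int) -> int:
    """Bucket id for a 32-bit INT bucketing column.

    Assumes the legacy Hive bucketing hash, where an INT hashes to its own
    value; masking with INT_MASK keeps negative keys non-negative.
    """
    return (key & INT_MASK) % num_buckets


def partition_path(table_location: str, partition_cols: list[str], row: dict) -> str:
    """Directory a row is written to, one col=value level per partition column."""
    levels = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_location.rstrip("/")] + levels)


def layout(rows: list[dict], table_location: str, partition_cols: list[str],
           bucket_col: str, num_buckets: int) -> dict[tuple[str, int], list[dict]]:
    """Group rows by (partition directory, bucket id), as partitioning plus bucketing would."""
    files = defaultdict(list)
    for row in rows:
        key = (partition_path(table_location, partition_cols, row),
               bucket_for(row[bucket_col], num_buckets))
        files[key].append(row)
    return dict(files)


if __name__ == "__main__":
    sales = [
        {"id": 1, "country": "IN", "amount": 10},
        {"id": 5, "country": "IN", "amount": 20},
        {"id": 2, "country": "US", "amount": 30},
    ]
    for (path, bucket), part in layout(sales, "/user/hive/warehouse/sales",
                                       ["country"], "id", 4).items():
        print(path, "bucket", bucket, [r["id"] for r in part])
```

Here, ids 1 and 5 both fall into bucket 1 of the `country=IN` directory (1 mod 4 = 5 mod 4 = 1), and id 2 lands in bucket 2 of `country=US`. Rows with equal bucketing keys always share a bucket, which is what lets Hive use bucket map joins and efficient sampling on bucketed tables.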

Pig Introduction (7:16)
This video explains the importance of Pig for data cleansing operations in Hadoop and introduces its basic commands.
Working with Pig Commands (8:47)
GROUP and COGROUP in Pig (16:02)
This video explains how to use the GROUP and COGROUP commands in Apache Pig.
SPLIT Command in Pig (13:26)
FILTER, JOIN, ORDER BY, RANK, and FLATTEN Commands in Pig (11:47)
This video explains the FILTER, JOIN, RANK, FLATTEN, ORDER BY, and DISTINCT commands in Pig in detail.
Executing a Pig Script (3:33)
In this video, you will learn how to create and execute a Pig script.
FLATTEN: Un-nesting Tuples and Bags (14:42)
This video demonstrates how to un-nest tuples and bags in the input dataset.
JSON File Processing in Pig (8:34)
This video demonstrates how to process a JSON file using Pig functions.
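GROUP, COGROUP, and FLATTEN are easiest to understand through Pig's nested data model:

- GROUP turns a relation into one tuple per key that holds a bag of the matching tuples.
- COGROUP does the same for two relations at once.
- FLATTEN un-nests a bag back into ordinary rows.

The Python sketch below imitates these semantics, with lists standing in for bags. The function names and the `emp`/`dept_info` data are hypothetical. Results are sorted to keep them readable, whereas Pig does not guarantee any output order.

```python
from collections import defaultdict


def group_by(relation, key):
    """Pig: grouped = GROUP emp BY dept;  ->  (group, {bag of emp tuples})."""
    bags = defaultdict(list)
    for t in relation:
        bags[key(t)].append(t)
    return sorted(bags.items())  # sorted only to make the output deterministic


def cogroup(left, right, left_key, right_key):
    """Pig: cg = COGROUP emp BY dept, dept_info BY dept;  ->  (group, {emp}, {dept_info}).

    Every key from either input appears once; a side with no match gets an empty bag.
    """
    left_bags, right_bags = defaultdict(list), defaultdict(list)
    for t in left:
        left_bags[left_key(t)].append(t)
    for t in right:
        right_bags[right_key(t)].append(t)
    keys = sorted(set(left_bags) | set(right_bags))
    return [(k, left_bags.get(k, []), right_bags.get(k, [])) for k in keys]


def flatten(grouped):
    """Pig: FOREACH grouped GENERATE group, FLATTEN(emp);  ->  one row per bag element."""
    return [(k,) + t for k, bag in grouped for t in bag]


if __name__ == "__main__":
    emp = [("alice", "hr"), ("bob", "it"), ("carol", "hr")]
    dept_info = [("hr", "floor1"), ("ops", "floor3")]
    grouped = group_by(emp, key=lambda t: t[1])
    print(grouped)
    print(cogroup(emp, dept_info, lambda t: t[1], lambda t: t[0]))
    print(flatten(grouped))  # [('hr', 'alice', 'hr'), ('hr', 'carol', 'hr'), ('it', 'bob', 'it')]
```

Unlike an inner JOIN, COGROUP keeps keys that appear in only one input (`it` and `ops` above) and gives them an empty bag on the missing side.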

Introduction to Core Java Programming and Its Importance in Hadoop (2:57)
This video introduces the basic building blocks of programming and core Java concepts.
Java Programming Basics (13:36)
This video explains the basic building blocks of core Java programming in detail, along with how to use the Eclipse IDE.
Object-Oriented Programming Features in Java (9:34)
This video explains inheritance, polymorphism, abstraction, and encapsulation in Java with a hands-on sample demo. Interfaces are also explained with an example program.
Access Specifiers, final and static Keywords, and Exception Handling in Java (12:06)

Requirements

Familiarity with SQL concepts and programming basics
Download the Cloudera QuickStart VM (CDH 5.8) and install VMware Workstation Player. Environment setup guidance is covered in the lectures.

Description

Data analytics is the practice of using data to drive business strategy and performance. It includes a range of approaches and solutions, from looking backward to evaluate what happened in the past to looking forward to do scenario planning and predictive modelling. Data analytics spans all functional areas of a business and addresses a continuum of opportunities in information management, performance optimisation, and analytic insights. Organizations now realize the inherent value of transforming big data into actionable insights. Data science is often regarded as the most advanced form of big data analytics, because it aims to identify what will happen next and what to do about it.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, large-scale processing power, and the ability to run a very large number of concurrent tasks or jobs. Hadoop is not just an effective distributed storage system for large amounts of data but also, importantly, a distributed computing environment that can execute analyses where the data is stored.
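Running analyses where the data is stored is the idea behind Hadoop's MapReduce model. Map tasks process input splits (typically one per HDFS block) close to the data. A shuffle then groups the intermediate key/value pairs by key, and reduce tasks aggregate each group. The classic word-count job is sketched below as a single-process Python simulation of those three phases. It does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> list[tuple[str, list[int]]]:
    """Shuffle and sort: collect all values emitted for the same key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # On a cluster, each split would be handled by its own map task, ideally on a
    # node that already stores that block; here the splits are processed in a loop.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count([["the cat", "the dog"], ["The end"]]))
    # {'cat': 1, 'dog': 1, 'end': 1, 'the': 3}
```

Only the small (word, count) pairs travel between the map and reduce phases. The input blocks themselves stay on the nodes that store them, which is why moving computation to the data scales better than moving the data.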

This course explains the Hadoop framework and its ecosystem in detail. All concepts are explained with examples, and business use cases are presented as case studies. The course also covers recent technologies in the big data space, such as Apache Spark, Apache Kafka, and MongoDB. In addition, it includes interview questions for each ecosystem component and resume preparation tips.

Who this course is for:

Students with some prior knowledge of programming and SQL concepts
Anyone interested in pursuing a career as a Hadoop developer

Course content

Course Introduction • 1 lecture • 2min

Big Data Introduction • 3 lectures • 39min

HDFS Architecture • 5 lectures • 54min

Environment Setup and Hadoop Linux Commands • 3 lectures • 48min

Hortonworks Environment Setup in Azure Cloud • 4 lectures • 23min

Data Ingestion Using Apache Sqoop and Apache Flume • 9 lectures • 2hr 4min

Apache Hive • 17 lectures • 3hr 7min

Quiz 20

Apache Pig • 8 lectures • 1hr 24min

Core Java Programming • 4 lectures • 38min
