
Introduction to the Big Data course: fundamentals, unstructured data concepts, and a hands-on path to building a machine learning workflow with Hadoop that produces movie recommendations.
Explore how the MapReduce daemons handle job tracking and task execution, ship computation to the nodes that hold the data, and coordinate the map and reduce phases with a distributed file system.
Learn how MapReduce solves the word count problem: mapping each word to a count, shuffling the intermediate key-value pairs, and reducing them to final totals.
Explore how Hadoop reads and writes files, divides data into blocks and input splits, and executes jobs as map and reduce tasks coordinated by a JobTracker and TaskTrackers.
Get a hands-on overview of MapReduce word count in Hadoop: write Java map and reduce functions, plus a driver, that process input text and produce word counts.
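Explore how the combiner and partitioner optimize Hadoop MapReduce. The combiner acts as a local reducer that cuts the intermediate data shuffled across the network, and the partitioner decides which reducer receives each map output key. Includes practical examples of local reducers, shuffling, and custom partitioning.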
Explore why a higher-level language is so valuable on Hadoop, and get an introduction to Hive and Pig, the data warehousing and scripting tools that simplify data transformation on Hadoop.
Explore Hive architecture and how Hive translates SQL-like queries into MapReduce jobs, from parsing and abstract syntax trees to compilation, optimization, and execution.
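Learn how Hive managed and external tables differ, including creating tables at a specified location, loading data, and what dropping each kind of table does to storage (dropping a managed table deletes its data, while dropping an external table removes only the table's metadata).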
Get an introduction to Pig on Hadoop: use simple language constructs to define a sequence of data processing steps and perform basic operations such as grouping.
Learn Pig Latin relational operations and Pig setup modes, and use the Grunt shell to run and manage Hadoop jobs, including loading, grouping, sorting, and outputting data.
Work through a hands-on Pig example that processes a temperature dataset on Hadoop: load the data, transform the records, group them by year, calculate each year's maximum temperature, and store the results.
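The word count, combiner/partitioner, and yearly maximum temperature lectures above all follow the same map, shuffle, and reduce flow. The sketch below simulates that flow in memory in plain Python. It is not Hadoop's Java API or the course's own code. All names (`run_job`, `wc_map`, `temp_map`, and so on) are hypothetical. Each inner list stands in for one input split handled by one map task, and temperature records are assumed to be `year<TAB>temperature` lines. CRC32 stands in for the Java `hashCode()` that Hadoop's default `HashPartitioner` uses.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

KV = Tuple[str, int]
Mapper = Callable[[str], Iterable[KV]]
Reducer = Callable[[str, List[int]], Iterable[KV]]


def partition(key: str, num_reducers: int) -> int:
    # Hash partitioning: the same key always goes to the same reducer.
    # CRC32 is a deterministic stand-in (Python's str hash is randomized).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def group_sorted(pairs: Iterable[KV]) -> List[Tuple[str, List[int]]]:
    # Group values by key and return the groups in key order.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def run_job(splits: List[List[str]], mapper: Mapper, reducer: Reducer,
            combiner: Optional[Reducer] = None,
            num_reducers: int = 2) -> Dict[str, int]:
    buckets: List[List[KV]] = [[] for _ in range(num_reducers)]
    for split in splits:  # one simulated map task per input split
        map_out = [kv for record in split for kv in mapper(record)]
        if combiner is not None:
            # Combiner: a local reduce over this map task's output only,
            # so fewer pairs would cross the network in a real cluster.
            map_out = [kv for k, vs in group_sorted(map_out)
                       for kv in combiner(k, vs)]
        for key, value in map_out:
            buckets[partition(key, num_reducers)].append((key, value))
    result: Dict[str, int] = {}
    for bucket in buckets:  # one simulated reduce task per partition
        # Shuffle/sort: this reducer sees its keys grouped and in order.
        for key, values in group_sorted(bucket):
            for out_key, out_value in reducer(key, values):
                result[out_key] = out_value
    return result


def wc_map(line: str) -> Iterator[KV]:
    # Word count map: emit (word, 1) for each whitespace-separated token.
    for word in line.split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> Iterator[KV]:
    # Word count reduce: total the counts for one word.
    yield word, sum(counts)


def temp_map(line: str) -> Iterator[KV]:
    # Parse an assumed "year<TAB>temperature" record into (year, temp).
    year, _, temp = line.partition("\t")
    if temp:
        yield year, int(temp)


def max_reduce(year: str, temps: List[int]) -> Iterator[KV]:
    # Yearly maximum: keep the highest temperature seen for the year.
    yield year, max(temps)
```

For example, `run_job([["the quick fox", "the dog"], ["the fox"]], wc_map, wc_reduce, combiner=wc_reduce)` counts "the" three times and "fox" twice. Reusing the reducer as the combiner only works here because sum and max are associative and commutative. A job that averages values, for instance, could not reuse its reducer this way.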
This course teaches you Hadoop, Pig, Hive, and Apache Mahout from scratch with an example-based, hands-on approach.
"From Scratch to Practical"
-----------------------------------------
"This course is hell awesome, if you are new to Hadoop this course is for you, from theory to hands on experience, plus a Mahout and recommended system as Project. This course is a five star.!!!" - Aakash
======================================================================
"Easy to understand, makes Hadoop & Mahout simple"
--------------------------------------------------------------------------------
"This course has helped me crack a couple of Big Data engineer interviews as the basics are well explained here. The video/audio quality is fine and the instructor knows his stuff!"- Shipra
======================================================================
"Brilliant course for Data Engineers"
--------------------------------------------------------------------------------
"This course is well structured. I would like to call this Big Data and Hadoop for Dummies. It covers basics as well as advanced concepts in a very unique way. Hands on examples gave me clear direction about how to use Hadoop in production environment. I strongly recommend this course to all levels of data engineers and Big data enthusiasts. Production quality is good." - Ashrith
======================================================================
Master the Fundamental Concepts of Big Data, Hadoop and Mahout with ease
Big Data and Data Science Foundation to empower you with the most specialized skills
The course stresses the core concepts and focuses on building a solid foundation in the key Hadoop, MapReduce, and collaborative filtering concepts, from which you can go on to learn just about any other technology in the same space. Basic Java and Unix knowledge is expected.
Contents & Overview
Through 47 lectures and 8 hours of content, we will take a step-by-step approach to understanding Big Data and related concepts from scratch.
The first few topics will focus on the rise of Big Data and how Apache Hadoop fits in. We will cover the fundamentals of Hadoop and its core components, HDFS and MapReduce, then set up and experiment with Hadoop and HDFS before diving deep into MapReduce programming with hands-on examples. We will also look at Combiners and Partitioners and how they help, by cutting the data shuffled between map and reduce tasks and by controlling which reducer handles each key. Finally, we will cover Hadoop Streaming: a tool that lets non-Java professionals leverage the power of Hadoop by writing map and reduce logic in other languages, and build proofs of concept (POCs) on it.
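Hadoop Streaming runs any executable as the mapper or reducer, provided it reads lines from standard input and writes tab-separated key/value lines to standard output. By default the text up to the first tab is the key, and the framework sorts map output by key before it reaches the reducer. Below is a minimal word-count sketch in Python that relies on those conventions. The file name `wc.py` and the `map`/`reduce` command-line argument are hypothetical choices for this example and are not something Hadoop requires.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word<TAB>1" for every word on every input line.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input is assumed sorted by key, so lines for one word are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: "python3 wc.py map" or "python3 wc.py reduce".
    role = mapper if sys.argv[1] == "map" else reducer
    for out in role(sys.stdin):
        print(out)
```

You can check the script locally without a cluster with `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce`, where `sort` plays the role of the shuffle. On a cluster, the same commands would be passed to the streaming jar through its `-mapper` and `-reducer` options. The jar's path depends on the Hadoop version and installation.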
Once we have a solid foundation in HDFS and MapReduce, the next couple of topics will explore higher-level components of the Hadoop ecosystem: Hive and Pig. We will go into the details of both by installing them and working through examples. Hive and Pig can make your life easier by shielding you from the complexity of writing MapReduce jobs while still leveraging the parallel processing power of the Hadoop framework.
In the next few lectures we will look at something very interesting: Apache Mahout and Machine Learning. Apache Mahout is a Java library that lets you write machine learning applications with ease. We will learn the basics of Machine Learning and go deeper into Collaborative Filtering and recommender systems, something Mahout excels at.
We will look at several similarity algorithms, understand their real-life implications, and apply them as we build a real-world movie recommender system together using Mahout and Hadoop.
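Pearson correlation is one common user-similarity measure in collaborative filtering, and Mahout provides a `PearsonCorrelationSimilarity` among its similarity classes. The sketch below is a plain-Python illustration of the formula, not Mahout's implementation. It compares two users only over the movies both have rated. By a convention chosen for this example, it returns 0.0 when fewer than two movies overlap or when either user's overlapping ratings are all the same.

```python
import math
from typing import Dict

Ratings = Dict[str, float]  # movie id -> rating given by one user


def pearson(a: Ratings, b: Ratings) -> float:
    # Only movies rated by both users take part in the comparison.
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / n
    mean_b = sum(b[m] for m in common) / n
    # Covariance-style numerator over the co-rated movies.
    num = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    den = (math.sqrt(sum((a[m] - mean_a) ** 2 for m in common))
           * math.sqrt(sum((b[m] - mean_b) ** 2 for m in common)))
    # Zero variance for either user leaves the correlation undefined.
    return num / den if den else 0.0
```

A user-based recommender would compute this score between the target user and every other user, keep the most similar users as neighbours, and suggest movies those neighbours rated highly that the target user has not yet seen.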
After taking this course, which includes slides, examples, code, and data sets, you will be at ease working with HDFS, writing MapReduce jobs, analyzing data with Hive and Pig, and building a recommender system using Apache Mahout. So go ahead and enroll to crack that Big Data/Data Science interview and clear that certification exam!