Projects in Hadoop and Big Data - Learn by Building Apps

A Practical Course to Learn Big Data Technologies While Developing Professional Projects

Created by Eduonix Learning Solutions, Eduonix-Tech, Eduonix Support

Last updated 12/2018

English

What you'll learn

Understand the Hadoop Ecosystem and Associated Technologies
Learn Concepts to Solve Real World Problems
Learn the Latest Changes in Hadoop
Use the Code Examples Provided Here to Create Your Own Big Data Services
Get Fully Functional VMs, Fine-Tuned and Created Specifically for This Course

Course content

12 sections • 44 lectures • 9h 59m total length

Introduction (3:32)
Virtual Machines for the Projects (13:00)
Source VMs for the Projects

Project Setup (15:30)
Explore visual analytics on big data by building a Spark-on-Yarn workflow using PySpark, Seaborn, and Spark SQL APIs to perform in-memory iterative machine learning.
Setting Up Java Dependencies (15:24)
Spark Analytics with PySpark (15:36)
Explore PySpark analytics on Yarn, bridging Python to HDFS, converting Spark RDDs to DataFrames, and using Spark SQL to run SQL queries.
Bringing it all together (13:50)

Requirements

Working knowledge of Hadoop is expected before starting this course
Basic programming knowledge of Java and Python is helpful

Description

The most awaited Big Data course on the planet is here. The course covers all the major big data technologies within the Hadoop ecosystem and weaves them together in real-life projects. While taking the course, you not only learn the nuances of Hadoop and its associated technologies but also see how they solve real-world problems and how companies worldwide use them.

This course will help you take a quantum leap and build Hadoop solutions that solve real-world problems. However, we must warn you that this course is not for the faint-hearted: it will test your abilities and knowledge while helping you build cutting-edge know-how in one of the most active technology spaces. The course focuses on the following topics:

Add Value to Existing Data - Learn how technologies such as MapReduce apply to clustering problems. The project focuses on removing duplicate or equivalent values from a very large data set with MapReduce.
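
The deduplication idea can be sketched in plain Python without a Hadoop cluster. This is a minimal illustration and not the course's own code: the mapper emits a normalized key for each record, the shuffle step groups records by key, and the reducer keeps one representative per group. The equivalence rule in `normalize` (ignore case and extra whitespace) and all function names are assumptions chosen for the example.

```python
from collections import defaultdict


def normalize(record):
    """Hypothetical equivalence rule: ignore case and extra whitespace."""
    return " ".join(record.lower().split())


def map_phase(records):
    """Mapper: emit (key, value) pairs keyed by the normalized record."""
    for record in records:
        yield normalize(record), record


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: keep the first value seen for each key and drop the rest."""
    return [values[0] for _, values in sorted(groups.items())]


def deduplicate(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["Apache Hadoop", "apache  hadoop", "Apache Pig", " APACHE PIG "]
    print(deduplicate(sample))
```

On a real cluster the shuffle is done by the framework, so only the mapper and reducer logic would need to be written.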

Hadoop Analytics and NoSQL - Parse a Twitter stream with Python, extract keywords with Apache Pig and map them to HDFS, pull from HDFS and push to MongoDB with Pig, and visualise the data with Node.js. Learn all this in this cool project.
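
As a rough picture of the parsing and keyword steps, here is a plain Python stand-in. In the project the keyword extraction is done with Apache Pig, so this sketch only illustrates the logic. It assumes the stream arrives as one JSON object per line with the tweet body in a `text` field; the stop-word list and the function names are hypothetical.

```python
import json
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "with", "to", "of", "in", "on"}


def extract_keywords(text):
    """Lowercase the text and return word tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def count_keywords(json_lines):
    """Count keywords across tweets given as one JSON object per line."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        tweet = json.loads(line)
        counts.update(extract_keywords(tweet.get("text", "")))
    return counts


if __name__ == "__main__":
    stream = [
        '{"text": "Learning Hadoop with Pig"}',
        "",
        '{"text": "Hadoop and MongoDB on the cluster"}',
    ]
    print(count_keywords(stream).most_common(1))
```

The resulting counts are the kind of small, structured record that could then be written to HDFS or pushed to MongoDB.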

Kafka Streaming with Yarn and Zookeeper - Set up a Twitter stream with Python, set up a Kafka stream with Java code for producers and consumers, and package and deploy the Java code with Apache Samza.

Real-Time Stream Processing with Apache Kafka and Apache Storm - This project also focuses on Twitter streaming but uses Kafka and Apache Storm, and you will learn to use each of them effectively.

Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr - Set up the relational schema for a health care data dictionary used by the US Dept of Veterans Affairs, and demonstrate the underlying technology and conceptual framework. Demonstrate issues with certain join queries that fail on MySQL, map the technology to a Hadoop/Hive stack with Sqoop and HCatalog, and show how this stack can perform the query successfully.

Log collection and analytics with the Hadoop Distributed File System using Apache Flume and Apache HCatalog - Use Apache Flume and Apache HCatalog to map a real-time log stream to HDFS and tail this file as a Flume event stream. Map data from HDFS to Python with Pig, and use Python modules for analytic queries.

Data Science with Hadoop Predictive Analytics - Create structured data with MapReduce, map data from HDFS to Python with Pig, run Python machine learning logistic regression, and use Python modules for regression matrices and supervised training.
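
For readers new to logistic regression, the training step can be sketched in a few lines of dependency-free Python. This is an illustrative sketch only: the course uses Python machine learning modules that are not named here, and the single-feature model, learning rate, epoch count, and toy data below are all assumptions.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


if __name__ == "__main__":
    # Toy labelled data: small feature values are class 0, large ones class 1.
    xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
    ys = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic(xs, ys)
    print(predict(w, b, 0.5), predict(w, b, 4.0))
```

In practice the feature rows would come from the structured data produced upstream, and a library implementation would replace the hand-written loop.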

Visual Analytics with Apache Spark on Yarn - Create structured data with MapReduce, map data from HDFS to Python with Spark, convert Spark DataFrames and RDDs to Python data structures, and perform Python visualisations.

Customer 360 degree view, Big Data Analytics for e-commerce - Demonstrate use of the e-commerce tool ‘Datameer’ to perform many of the analytic queries from parts 6, 7 and 8. Perform queries in the context of sentiment analysis and a Twitter stream.

Putting it all together Big Data with Amazon Elastic Map Reduce - Run the clustering code on an AWS MapReduce cluster. Using the AWS Java SDK, spin up a dedicated task cluster with the same attributes.

So after this course you can confidently build almost any system within the Hadoop family of technologies. This course comes with complete source code and fully operational virtual machines, which will help you build the projects quickly without wasting too much time on system setup. The course also comes with English captions. So buckle up and join us on our journey into Big Data.

Who this course is for:

Students who want to use Hadoop and Big Data in their workplace and want to learn the implementation details of big data technologies.

Course content

Introduction: 2 lectures • 17min

Add Value to Existing Data with Mapreduce: 4 lectures • 58min

Hadoop Analytics and NoSQL: 4 lectures • 55min

Kafka Streaming with Yarn and Zookeeper: 4 lectures • 1hr 1min

Real Time Stream processing with Apache Kafka and Apache Storm: 4 lectures • 1hr

Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr: 4 lectures • 59min

Log collection and analytics with the Hadoop Distributed File System using Apache Flume and Apache HCatalog: 4 lectures • 59min

Data Science with Hadoop Predictive Analytics: 4 lectures • 1hr

Visual Analytics with Apache Spark on Yarn: 4 lectures • 1hr

Customer 360 degree view, Big Data Analytics for e-commerce: 4 lectures • 1hr
