The most awaited Big Data course on the planet is here. The course covers all the major big data technologies within the Hadoop ecosystem and weaves them together in real-life projects. So while doing the course you not only learn the nuances of Hadoop and its associated technologies but also see how they solve real-world problems and how they are being used by companies worldwide.
This course will help you take a quantum leap and build Hadoop solutions that solve real-world problems. However, we must warn you that this course is not for the faint-hearted: it will test your abilities and knowledge while helping you build cutting-edge know-how in the most happening technology space. The course focuses on the following topics:
Add Value to Existing Data - Learn how technologies such as MapReduce apply to clustering problems. The project focuses on removing duplicate or equivalent values from a very large data set with MapReduce.
Analytics and NoSQL - Parse a Twitter stream with Python, extract keywords with Apache Pig and map them to HDFS, pull from HDFS and push to MongoDB with Pig, and visualise the data with Node.js. Learn all this in this cool project.
Kafka Streaming with Yarn and Zookeeper - Set up a Twitter stream with Python, set up a Kafka stream with Java code for producers and consumers, and package and deploy the Java code with Apache Samza.
Real-Time Stream Processing with Apache Kafka and Apache Storm - This project also focuses on Twitter streaming but uses Kafka and Apache Storm, and you will learn to use each of them effectively.
Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr - Set up the relational schema for a health care data dictionary used by the US Dept of Veterans Affairs, and demonstrate the underlying technology and conceptual framework. Demonstrate issues with certain join queries that fail on MySQL, map the technology to a Hadoop/Hive stack with Sqoop and HCatalog, and show how this stack can perform the query successfully.
Log collection and analytics with the Hadoop Distributed File System using Apache Flume and Apache HCatalog - Use Apache Flume and Apache HCatalog to map a real-time log stream to HDFS and tail this file as a Flume event stream. Map data from HDFS to Python with Pig, and use Python modules for analytic queries.
Data Science with Hadoop Predictive Analytics - Create structured data with MapReduce, map data from HDFS to Python with Pig, run Python machine learning logistic regression, and use Python modules for regression matrices and supervised training.
Visual Analytics with Apache Spark on Yarn - Create structured data with MapReduce, map data from HDFS to Python with Spark, convert Spark DataFrames and RDDs to Python data structures, and perform Python visualisations.
Customer 360 degree view, Big Data Analytics for e-commerce - Demonstrate use of the e-commerce tool ‘Datameer’ to perform many of the analytic queries from parts 6, 7 and 8. Perform queries in the context of sentiment analysis and a Twitter stream.
Putting it all together Big Data with Amazon Elastic Map Reduce - Run the clustering code on an AWS MapReduce cluster. Using the AWS Java SDK, spin up a dedicated task cluster with the same attributes.
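To give a flavour of the first project, here is a minimal, single-machine sketch of the map, shuffle and reduce steps behind removing duplicate or equivalent values. It is not the course's source code: the function names and the equivalence rule (ignore case and extra whitespace) are assumptions made purely for illustration, and a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def normalize(record: str) -> str:
    """Hypothetical equivalence rule: ignore case and extra whitespace."""
    return " ".join(record.lower().split())


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map: emit (normalized key, original record) for every input record."""
    for record in records:
        yield normalize(record), record


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group values by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[str]]) -> Iterator[str]:
    """Reduce: keep one representative record per group of equivalents."""
    for values in groups.values():
        yield values[0]


def deduplicate(records: Iterable[str]) -> List[str]:
    """Run the three phases in sequence on a local collection."""
    return list(reduce_phase(shuffle(map_phase(records))))


if __name__ == "__main__":
    sample = ["Big Data", "big  data", "Hadoop", "HADOOP ", "Spark"]
    print(deduplicate(sample))
```

The key design choice is the normalization function: whatever it maps to the same key is treated as equivalent, so changing that one function changes what counts as a duplicate.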
So after this course you can confidently build almost any system within the Hadoop family of technologies. This course comes with complete source code and fully operational virtual machines, which will help you build the projects quickly without wasting too much time on system setup. The course also comes with English captions. So buckle up and join us on our journey into Big Data.
|Section 1: Introduction|
|Lecture 2||13 pages|
Source VMs for the Projects
|Section 2: Add Value to Existing Data with Mapreduce|
Introduction to the Project
Build and Run the Basic Code
Understanding the Code
Dependencies and packages
|Section 3: Hadoop Analytics and NoSQL|
Introduction to Hadoop Analytics
Introduction to NoSQL Database
Installing the Solution
|Section 4: Kafka Streaming with Yarn and Zookeeper|
Introduction to Kafka Yarn and Zookeeper
Creating Kafka Streams
Yarn Job with Samza
|Section 5: Real Time Stream processing with Apache Kafka and Apache Storm|
Real Time Streaming
Hortonbox Virtual Machine
Running in Cluster Mode
Submitting the Storm Jar
|Section 6: Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr|
Introduction to the Project
Introduction to HDDAccess
Sqoop, Hive and Solr
|Section 7: Log collection and analytics with the Hadoop Distributed File System using Apache Flume and Apache HCatalog|
Apache Flume and HCatalog
Install and Configure Apache Flume
Visualisation of the Data
Embedded Pig Scripts
|Section 8: Data Science with Hadoop Predictive Analytics|
Introduction to Data Science
Source Code Review
Setting Up the Machine
|Section 9: Visual Analytics with Apache Spark on Yarn|
Setting Up Java Dependencies
Spark Analytics with PySpark
Bringing it all together
|Section 10: Customer 360 degree view, Big Data Analytics for e-commerce|
Ecommerce and Big Data
Analytics and Visualizations
|Section 11: Putting it all together Big Data with Amazon Elastic Map Reduce|
Introduction to the Project
Setting Up Cluster on EMR
Dedicated Task Cluster on EMR
|Section 12: Summary|
Eduonix creates and distributes high-quality technology training content. Our team of industry professionals has been training manpower for more than a decade. We aim to teach technology the way it is used in industry and the professional world. We have a professional team of trainers for technologies ranging from Mobility and Web to Enterprise, Database and Server Administration.