Learn Big Data: The Hadoop Ecosystem Masterclass
- 6 hours on-demand video
- 1 article
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion
- Process Big Data using batch processing
- Process Big Data using realtime data
- Be familiar with the technologies in the Hadoop Stack
- Be able to install and configure the Hortonworks Data Platform (HDP)
- You will need to have a background in IT. The course is aimed at Software Engineers, System Administrators, DBAs who want to learn about Big Data
- Knowing any programming language will enhance your course experience
- The course contains demos you can try out on your own machine. To run the Hadoop cluster on your own machine, you will need to run a virtual machine; 8 GB or more of RAM is recommended.
In this course you will learn Big Data using the Hadoop Ecosystem. Why Hadoop? It is one of the most sought-after skills in the IT industry. The average salary in the US is $112,000 per year, rising to an average of $160,000 in San Francisco (source: Indeed).
The course is aimed at Software Engineers, Database Administrators, and System Administrators who want to learn about Big Data. Other IT professionals can also take this course, but might have to do some extra research to understand some of the concepts.
You will learn how to use the most popular software in the Big Data industry at the moment, using batch processing as well as realtime processing. This course will give you enough background to discuss real problems and solutions with experts in the industry. Adding these technologies to your LinkedIn profile will attract recruiters and help you land interviews at some of the most prestigious companies in the world.
The course is very practical, with more than 6 hours of lectures. You will want to try everything out yourself, adding multiple hours of hands-on learning. If you get stuck with the technology while trying, support is available: I answer questions on the message boards, and we have a Facebook group where you can post questions.
- This course is for anyone that wants to know how Big Data works, and what technologies are involved
- The main focus is on the Hadoop ecosystem. We don't cover any technologies not on the Hortonworks Data Platform Stack
- The course compares MapR, Cloudera, and Hortonworks, but we only use the Hortonworks Data Platform (HDP) in the demos
What is Big Data? Some examples of companies using Big Data, like Spotify, Amazon, Google, and Tesla
This lecture gives an introduction to Resilient Distributed Datasets (RDDs). This abstraction allows you to perform transformations and actions in Spark. I give an example using a filter transformation, and explain how shuffle RDDs impact disk and network IO.
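To illustrate the idea behind transformations and actions, here is a minimal plain-Python sketch (not Spark itself, and no cluster required): like an RDD transformation, the filter below is lazy and builds up a computation, while the action at the end forces the work to actually run.

```python
# Conceptual sketch in plain Python (not the Spark API): transformations
# are lazy, actions trigger computation -- mirroring how RDDs behave.

def parallelize(data):
    # Stand-in for sc.parallelize: wrap the data in a lazy iterable
    return iter(data)

def rdd_filter(rdd, predicate):
    # "Transformation": returns a new lazy dataset; nothing runs yet
    return (x for x in rdd if predicate(x))

def count(rdd):
    # "Action": forces evaluation of the whole pipeline
    return sum(1 for _ in rdd)

numbers = parallelize(range(10))
evens = rdd_filter(numbers, lambda x: x % 2 == 0)  # lazy, no work done yet
print(count(evens))  # the action runs the filter -> prints 5
```

In real Spark, the same shape is `sc.parallelize(range(10)).filter(lambda x: x % 2 == 0).count()`; the key point is that `filter` alone does nothing until an action such as `count` is called.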
Kafka guarantees at-least-once message delivery, but can also be configured for at-most-once. Log Compaction is a technique Kafka provides to maintain a full dataset in the commit log. This lecture shows an example of a customer dataset fully kept in Kafka, and explains the Log Tail, Cleaner Point, and Log Head and how they impact consumers.
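The core idea of log compaction can be sketched in a few lines of plain Python (this is not Kafka's implementation, just the retention rule it applies): for each key, only the most recent record survives, and a record with a null value acts as a tombstone that deletes the key. The keys and values below are made up for illustration.

```python
# Conceptual sketch (plain Python, not Kafka): log compaction keeps only
# the latest record per key, so the compacted log converges to a full
# snapshot of the dataset -- e.g. one record per customer.

def compact(log):
    # log: list of (key, value) records in append order.
    # A value of None is a tombstone that deletes the key.
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # tombstone removes the key entirely
        else:
            latest[key] = value     # later records overwrite earlier ones
    return latest

commit_log = [
    ("customer-1", "alice@example.com"),
    ("customer-2", "bob@example.com"),
    ("customer-1", "alice@newmail.com"),  # update: old address is obsolete
    ("customer-2", None),                 # tombstone: customer-2 deleted
]
print(compact(commit_log))  # {'customer-1': 'alice@newmail.com'}
```

Consumers reading the head of a compacted topic still see every recent record, while older segments behind the cleaner point have been reduced to this latest-value-per-key form.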