Hands-On Big Data Analysis with Hadoop 3
- 1.5 hours on-demand video
- 1 downloadable resource
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion
- Store data with HDFS and learn in detail about HBase
- Share and access data in a SQL-like interface for HDFS
- Analyze real-time events using Spark Streaming
- Perform complex big data analytics using MapReduce
- Analyze data to perform complex processing with Hive and Pig
- Explore functional programming using Spark
- Learn to import data using Sqoop
- Basic knowledge of the Hadoop ecosystem and of Java programming is assumed.
This course is your guide to performing real-time data analytics and stream processing with Spark. Use different components and tools such as HDFS, HBase, and Hive to process raw data. Learn how tools such as Hive and Pig aid in this process.
In this course, you will start off by learning data analysis techniques with Hadoop, using tools such as Hive, and then apply these techniques to real-world big data applications. You will also delve into Spark and its related tools to perform real-time data analytics, streaming, and batch processing in your applications.
Finally, you'll learn how to extend your analytics solutions to the cloud.
Please note that this course is based on Hadoop 3.0, but the code used in the course is compatible with Hadoop 3.2.
About the Author
Tomasz Lelek is a software engineer, programming mostly in Java and Scala. He has been working with the Spark and ML APIs for the past 5 years, with production experience in processing petabytes of data.
He is passionate about nearly everything associated with software development and believes that we should always consider different solutions and approaches before attempting to solve a problem. He has recently spoken at conferences in Poland (Confitura and JDD, Java Developers Day) and at the Krakow Scala User Group, and has conducted a live coding session at the Geecon Conference.
- This course is for big data professionals looking to build quick and efficient data analytics solutions for their big data applications with Hadoop 3.
In this video, we will be looking at the MapReduce job architecture.
• Explore what a MapReduce job is
• Learn how to calculate word count in a distributed way
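The word-count flow described above can be sketched in a few lines of plain Python. This is a minimal, single-process illustration of the map, shuffle, and reduce phases, not the Hadoop Java API; the function names are invented for clarity.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["the quick brown fox", "the lazy dog"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(intermediate))
```

In a real cluster, the map phase runs on many nodes in parallel and the shuffle moves data over the network, but the dataflow is the same.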
In this video, we will delve into Spark's key concepts: the Spark context, the driver, and RDDs.
• Explore the Spark architecture
• Explore RDD and learn how it is the main building block of a Spark program
• Perform partitioning and distinguish between transformations and actions
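The partitioning idea above can be illustrated without Spark: an RDD's data is split into partitions, and each partition is processed independently (in Spark, on different executors). The following is a toy round-robin partitioner in plain Python, not the Spark API.

```python
def partition(data, num_partitions):
    # Round-robin split of the dataset into partitions, roughly
    # what a partitioner achieves in Spark.
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(data):
        parts[i % num_partitions].append(item)
    return parts

data = list(range(10))
parts = partition(data, 3)

# Each partition is mapped independently (in Spark, in parallel),
# then the per-partition results are combined.
mapped = [[x * x for x in part] for part in parts]
total = sum(sum(part) for part in mapped)
```

Because each partition is self-contained, the map step needs no coordination between workers; only the final combine brings results together.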
In this video, we will be looking at Spark transformations and actions.
• Use transformations API methods
• Use actions API methods
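The key distinction in this video is that transformations are lazy while actions trigger computation. Python generators give a rough single-machine analogy, sketched below; this is not the Spark API itself.

```python
data = range(1, 6)

# "Transformations": build a lazy pipeline; no work is done yet.
squared = (x * x for x in data)            # like rdd.map(...)
evens = (x for x in squared if x % 2 == 0) # like rdd.filter(...)

# "Action": materialise the result, which forces the whole
# pipeline to run, analogous to rdd.collect().
result = list(evens)
```

In Spark, this laziness lets the engine optimise the whole chain of transformations before any data is touched.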
In this video, you will delve into the Pig tool.
• Explore what Apache Pig is
• Explore an example Pig job
• Learn when to use Pig
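Pig Latin expresses data flows declaratively with statements such as LOAD, GROUP, and FOREACH. As a hedged illustration, the plain-Python snippet below computes roughly what a small grouping script would; the schema and data are invented for the example.

```python
from collections import Counter

# Roughly what a Pig script like
#   views  = LOAD 'views' AS (user, url);
#   byUser = GROUP views BY user;
#   counts = FOREACH byUser GENERATE group, COUNT(views);
# computes: page-view counts per user.
views = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart")]
counts = Counter(user for user, _url in views)
```

Pig shines when such group-and-aggregate pipelines would otherwise require hand-written MapReduce jobs.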
In this video, you will be filtering bots from a stream of page view events.
• Implement stream processing that filters out bots
• Add deduplication to make the stream processing robust
• Implement order verification to further harden the stream processing
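The filtering and deduplication steps for this video can be sketched as a small Python generator. The event fields and the bot heuristic (matching on the user agent) are illustrative assumptions, not the course's exact implementation.

```python
def process(events):
    # Filter bot traffic, then deduplicate by event id so that
    # redelivered or replayed events are not counted twice.
    seen_ids = set()
    for event in events:
        if event["user_agent"].lower().startswith("bot"):
            continue                     # drop bot page views
        if event["id"] in seen_ids:
            continue                     # deduplicate redeliveries
        seen_ids.add(event["id"])
        yield event

events = [
    {"id": 1, "user_agent": "Mozilla/5.0"},
    {"id": 2, "user_agent": "BotCrawler"},
    {"id": 1, "user_agent": "Mozilla/5.0"},  # duplicate delivery
]
clean = list(process(events))
```

Keeping the seen-id set makes the pipeline idempotent: reprocessing the same stream yields the same clean output, which is the robustness property the video is after.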