A Big Data Hadoop and Spark project for absolute beginners
3.9 (76 ratings)
4,532 students enrolled

Hadoop, Spark, Python, Scala, Dataproc, AWS S3 Data Lake, Glue, Athena, Machine Learning, IntelliJ, Maven, QuickStart VM
Created by FutureX Skill
Last updated 5/2020
English
Current price: $13.99 Original price: $19.99 Discount: 30% off
30-Day Money-Back Guarantee
This course includes
  • 5 hours on-demand video
  • 30 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Big Data, Hadoop and Spark from scratch using Python and Scala. You will also learn how to use free cloud tools to get started with Hadoop and Spark programming in minutes. Two bonus projects are included: an AWS data lake solution and a Machine Learning classification model.
Requirements
  • Students should have some programming background and some knowledge of SQL queries.
Description

A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.

It has received prospect data from various internal and third-party sources. The data has several issues, such as missing or unknown values in certain fields. It needs to be cleansed before any kind of analysis can be done.

Since the data set is huge, with billions of records, the bank has asked you to use Big Data Hadoop and Spark technology to cleanse, transform and analyze it.
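
To make the cleansing step concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The field names, the sample records and the set of "missing" markers are all assumptions made up for illustration; they are not taken from the course data. The idea is simply to treat empty and "unknown" values as missing and to drop records that lack a required field.

```python
from typing import Optional

# Hypothetical prospect records: names, fields and values are invented
# for illustration and do not come from the course data set.
RAW_PROSPECTS = [
    {"name": "Asha", "age": "34", "salary": "52000", "country": "India"},
    {"name": "Ben", "age": "", "salary": "61000", "country": "unknown"},
    {"name": "Chen", "age": "45", "salary": None, "country": "China"},
    {"name": "Dana", "age": "29", "salary": "48000", "country": "UNKNOWN"},
]

# Assumed markers that stand for "no usable value" (compared case-insensitively).
MISSING_MARKERS = {"", "unknown", "n/a", "null"}


def normalize(value: Optional[str]) -> Optional[str]:
    """Return the stripped value, or None if it is missing or marked unknown."""
    if value is None:
        return None
    text = value.strip()
    return None if text.lower() in MISSING_MARKERS else text


def cleanse(records, required_fields):
    """Normalize every field, then keep records whose required fields are present."""
    cleaned = []
    for record in records:
        normalized = {key: normalize(val) for key, val in record.items()}
        if all(normalized.get(field) is not None for field in required_fields):
            cleaned.append(normalized)
    return cleaned


clean_prospects = cleanse(RAW_PROSPECTS, required_fields=["age", "salary"])
for prospect in clean_prospects:
    print(prospect)
```

In a Spark job the same logic would typically be expressed with DataFrame operations so that it scales across the cluster; the plain-Python version above only shows the rule being applied.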

What you will learn:

  • Big Data, Hadoop concepts

  • How to create a free Hadoop and Spark cluster using Google Dataproc

  • Hadoop hands-on - HDFS, Hive

  • Why there was a need for Spark

  • Python basics

  • PySpark RDD - hands-on

  • PySpark SQL, DataFrame - hands-on

  • Project work using PySpark and Hive

  • Scala basics

  • Spark Scala DataFrame

  • Project work using Spark Scala

  • Google Colab environment

  • Bonus project - Applying Spark transformations to data stored in AWS S3 using Glue and viewing the data using Athena

  • Bonus project - Build your first Machine Learning model using Python and scikit-learn to predict whether a customer will buy or not.


Prerequisites:

  • Some basic programming skills

  • Some knowledge of SQL queries

Who this course is for:
  • Beginners who want to learn Big Data or experienced people who want to transition to a Big Data role
Course content
37 lectures 04:58:48
+ Hadoop - Hands-On
2 lectures 25:01
Creating a free Hadoop and Spark cluster using Google Dataproc
11:28
Storing data in HDFS and querying with Hive
13:33
+ Spark concepts and hands-on
6 lectures 56:03
Spark concepts
04:45
Python basics
12:59
PySpark RDD
13:55
PySpark - Spark SQL and DataFrame
11:05
Running PySpark on a Hadoop Cluster
06:57
+ Project - Bank prospects marketing data cleansing using Spark
2 lectures 27:43
Project - Bank prospects marketing data transformation using Hadoop and Spark
12:18
Rapid Revision - Big Data, Hadoop and Spark concepts
15:25
+ Running the project in Scala
4 lectures 35:30
Scala basics
08:25
Spark SQL DataFrame using Scala
05:39
Bank prospects marketing project in Scala
02:48
Bonus - Running Spark Scala Hive on Windows using IntelliJ, Maven and winutils
18:38
+ Advanced Spark
3 lectures 10:48
Advanced Spark datasets
01:22
User Defined Function (UDF)
03:36
Joins - Left, Right, Inner, Outer
05:50
+ Bonus - Running Spark and Hive on a Cloudera QuickStart VM on GCP
2 lectures 13:13
Cloudera QuickStart VM Installation on GCP
08:22
Running Spark 2 with Hive on Cloudera QuickStart VM
04:51
+ Bonus - Bank prospects data transformation using AWS S3, Glue and Athena
7 lectures 40:02

Learn the advantages of a serverless data lake solution over a Hadoop platform
04:40
AWS data lake - S3, Glue and Athena introduction
02:40
Create a data lake on AWS S3
02:14
AWS Glue crawler and AWS Athena query tool
06:53
ETL transformation using AWS Glue
06:18
Triggering AWS Glue job with a serverless AWS Lambda function
07:17

Run the bank transformation code using AWS Glue. Store the prospects data in an S3 bucket and apply the transformation using the same code that you ran in the Colab and Dataproc environments.

Project - Bank prospects data transformation using S3, Glue & Athena services
10:00