Big Data, Hadoop and Spark from scratch by solving a real-world use case using Python and Scala
Spark Scala & PySpark real-world coding framework.
Real-world coding best practices: logging, error handling and configuration management using both Scala and Python.
Serverless big data solution using AWS Glue, Athena and S3
Students should have some programming background and some knowledge of SQL queries.
This course will prepare you for a real-world Data Engineer role!
Get started with Big Data quickly by leveraging a free cloud cluster and solving a real-world use case! Learn Hadoop, Hive and Spark (both Python and Scala) from scratch!
Learn to code Spark Scala & PySpark like a real-world developer. Understand real-world coding best practices: logging, error handling and configuration management using both Scala and Python.
A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.
It has received prospect data from various internal and third-party sources. The data has issues such as missing or unknown values in certain fields, and it needs to be cleansed before any kind of analysis can be done.
Since the data volume is huge, with billions of records, the bank has asked you to use Big Data technologies such as Hadoop and Spark to cleanse, transform and analyze this data.
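The cleansing step the project calls for can be sketched in plain Python before scaling it out with Spark. This is only an illustration of the rule "replace missing or unknown values with sensible defaults"; the field names (`Gender`, `Age`, `City`) and default values are hypothetical, not taken from the actual course dataset.

```python
# A minimal sketch of a per-record cleansing rule. In the course project the
# same idea is applied at scale with Spark; here plain Python keeps it simple.
# Field names and defaults below are illustrative assumptions.

def cleanse(record, defaults):
    """Replace missing or 'unknown' values with per-field defaults."""
    cleaned = {}
    for field, value in record.items():
        if value is None or str(value).strip().lower() in ("", "unknown", "na"):
            cleaned[field] = defaults.get(field, value)
        else:
            cleaned[field] = value
    return cleaned

prospect = {"Gender": "unknown", "Age": 35, "City": ""}
defaults = {"Gender": "Not Specified", "City": "Unknown City"}
print(cleanse(prospect, defaults))
# {'Gender': 'Not Specified', 'Age': 35, 'City': 'Unknown City'}
```

Keeping the rule as a pure function like this also makes it easy to unit test, which mirrors the testing approach used later in the course.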
What you will learn:
Big Data, Hadoop concepts
How to create a free Hadoop and Spark cluster using Google Dataproc
Hadoop hands-on - HDFS, Hive
PySpark RDD - hands-on
PySpark SQL, DataFrame - hands-on
Project work using PySpark and Hive
Spark Scala DataFrame
Project work using Spark Scala
Spark Scala real-world coding framework and development using Winutils, Maven and IntelliJ
Python Spark Hadoop Hive coding framework and development using PyCharm
Building a data pipeline using Hive, PostgreSQL and Spark
Logging, error handling and unit testing of PySpark and Spark Scala applications
Spark Scala Structured Streaming
Applying Spark transformations to data stored in AWS S3 using Glue, and viewing the data using Athena
Requirements:
Some basic programming skills
Some knowledge of SQL queries
Who this course is for:
Beginners who want to learn Big Data or experienced people who want to transition to a Big Data role
Big data beginners who want to learn how to code in the real world
18 sections • 113 lectures • 9h 30m total length
Big Data concepts
Hadoop Distributed File System (HDFS)
MapReduce and YARN
Querying HDFS data using Hive
Deleting the Cluster
Analyzing a billion records with Hive
What is Spark?
Spark for data transformation
What is a DataFrame?
RDDs - The fundamental building block
PySpark - Creating RDDs
Python functions and lambda expressions
RDD - Transformation & Action
PySpark - SparkSQL and DataFrame
Project - Bank prospects marketing data transformation using Hadoop and Spark
Rapid Revision - Big Data, Hadoop and Spark concepts
Fast queries with Hive Partitioning
Fast queries with Hive Bucketing
Advanced Spark datasets
User Defined Function (UDF)
Joins - Left, Right, Inner, Outer
Spark SQL DataFrame using Scala
Bank prospects marketing project in Scala
Installing JDK on a local Machine
Installing IntelliJ IDEA
Adding Scala Plugin to IntelliJ
Scala basics using IntelliJ
Hello World Spark Scala using IntelliJ
Configuring HADOOP_HOME on Windows using Winutils
Special instructions for Mac users
Enabling Hive Support in Spark Session
psql command line interface for PostgreSQL
Importing a project into IntelliJ
Organizing code with Objects and Methods
Implementing Log4j and SLF4J Logging
Exception Handling with try, catch, Option, Some and None
Reading from Hive and Writing to Postgres
Reading Configuration from JSON using Typesafe
Reading command-line arguments and debugging in IntelliJ
Writing data to a Hive Table
Managing input parameters using a Scala Case Class
IntelliJ Maven troubleshooting tips
Scala Unit Testing using JUnit & ScalaTest
Spark Transformation unit testing using ScalaTest
Unit testing to catch an Exception
Catching Exception using assertThrows
Throwing Custom Error and Intercepting Error Message
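The coding-framework lectures above (logging, exception handling, unit testing) share one structural pattern: wrap the transformation in a function that logs progress and surfaces failures instead of swallowing them. A minimal Python sketch of that pattern is below; the function and logger names are illustrative, and in the actual project the same structure is wired around a SparkSession (and around Log4j/SLF4J on the Scala side).

```python
# A minimal sketch of the logging / error-handling structure built up in the
# coding-framework sections. Names are illustrative assumptions; the real
# project applies this pattern to Spark jobs rather than plain lists.
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("prospects_pipeline")

def run_pipeline(transform, records):
    """Apply a transformation to each record, logging progress and failures."""
    logger.info("Starting pipeline with %d records", len(records))
    try:
        result = [transform(r) for r in records]
        logger.info("Pipeline finished successfully")
        return result
    except Exception:
        # logger.exception records the stack trace; re-raising lets the
        # caller (or a unit test) assert on the failure.
        logger.exception("Pipeline failed")
        raise

cleaned = run_pipeline(str.strip, ["  alice ", "bob"])
# cleaned == ["alice", "bob"]
```

Re-raising after logging is what makes the "unit testing to catch an Exception" lectures work: a test can assert both that the error is raised and that the failure was logged.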