Udemy
2020-12-12 18:05:59
30-Day Money-Back Guarantee

This course includes:

  • 8.5 hours on-demand video
  • 1 article
  • 78 downloadable resources
  • Full lifetime access
  • Access on mobile and TV

A Big Data Hadoop and Spark project for absolute beginners

Hadoop, Spark, Python, PySpark, Scala, Hive, coding framework, testing, IntelliJ, Maven, PyCharm, Glue, AWS, Streaming
Rating: 4.4 out of 5 (106 ratings)
4,674 students
Created by FutureX Skill
Last updated 1/2021
English

What you'll learn

  • Big Data, Hadoop and Spark from scratch by solving a real-world use case using Python and Scala
  • Spark Scala & PySpark real-world coding framework
  • Real-world coding best practices: logging, error handling and configuration management using both Scala and Python
  • Serverless big data solution using AWS Glue, Athena and S3

Requirements

  • Students should have some programming background and some knowledge of SQL queries.

Description

Get started with Big Data quickly by using a free cloud cluster and solving a real-world use case! Learn Hadoop, Hive and Spark (both Python and Scala) from scratch!


Learn to code Spark Scala & PySpark like a real-world developer. Understand real-world coding best practices: logging, error handling and configuration management using both Scala and Python.


Project

A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.

It has received prospect data from various internal and third-party sources. The data has several issues, such as missing or unknown values in certain fields, and needs to be cleansed before any kind of analysis can be done.

Because the data volume is huge, with billions of records, the bank has asked you to use Big Data technology (Hadoop and Spark) to cleanse, transform and analyze this data.
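
To make the cleansing step concrete, here is a minimal sketch in plain Python of the kind of rule the project describes. It is not the course's code: the field names (`name`, `age`, `country`), the list of "unknown" markers and the two rules (drop rows without an age, default a missing country) are all assumptions made for illustration. In the course the same idea is applied with PySpark and Spark Scala so that it scales to billions of records.

```python
from typing import Dict, List, Optional

# Hypothetical placeholder strings treated as "unknown"; real data may differ.
UNKNOWN_MARKERS = {"", "unknown", "n/a", "null"}


def normalize(value: Optional[str]) -> Optional[str]:
    """Return a trimmed value, or None when it is missing or an unknown marker."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.lower() in UNKNOWN_MARKERS else stripped


def cleanse(records: List[Dict[str, Optional[str]]]) -> List[dict]:
    """Normalize every field and drop records that have no usable age."""
    cleaned = []
    for record in records:
        row: dict = {key: normalize(val) for key, val in record.items()}
        if row.get("age") is None:
            continue  # assumed rule: age is required for targeting
        row["age"] = int(row["age"])
        # Assumed rule: default a missing country instead of dropping the row.
        row["country"] = row.get("country") or "Unspecified"
        cleaned.append(row)
    return cleaned


# Made-up sample rows, not taken from the course data set.
prospects = [
    {"name": "Ana", "age": "34", "country": "unknown"},
    {"name": "Raj", "age": "", "country": "IN"},
    {"name": "Li", "age": " 41 ", "country": "CN"},
]
result = cleanse(prospects)
```

With these sample rows, the record with an empty age is dropped and the "unknown" country is replaced by the default label. Whether to drop or default a field is a business decision, so treat both rules as placeholders.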

What you will learn:

  • Big Data, Hadoop concepts

  • How to create a free Hadoop and Spark cluster using Google Dataproc

  • Hadoop hands-on - HDFS, Hive

  • Python basics

  • PySpark RDD - hands-on

  • PySpark SQL, DataFrame - hands-on

  • Project work using PySpark and Hive

  • Scala basics

  • Spark Scala DataFrame

  • Project work using Spark Scala

  • Spark Scala real-world coding framework and development using Winutils, Maven and IntelliJ

  • Python Spark Hadoop Hive coding framework and development using PyCharm

  • Building a data pipeline using Hive, PostgreSQL and Spark

  • Logging, error handling and unit testing of PySpark and Spark Scala applications

  • Spark Scala Structured Streaming

  • Applying Spark transformations to data stored in AWS S3 using Glue, and viewing the data using Athena


Prerequisites:

  • Some basic programming skills

  • Some knowledge of SQL queries

Who this course is for:

  • Beginners who want to learn Big Data or experienced people who want to transition to a Big Data role
  • Big data beginners who want to learn how to code in the real world

Course content

19 sections • 97 lectures • 8h 39m total length

  • Preview
    03:16

  • Big Data concepts
    05:42
  • Hadoop concepts
    09:26

  • Preview
    01:28
  • Preview
    01:28
  • Preview
    05:36
  • Storing data in HDFS and querying with Hive
    13:33

  • Spark concepts
    04:45
  • Preview
    06:22
  • Python basics
    12:59
  • PySpark RDD
    13:55
  • PySpark - Spark SQL and DataFrame
    11:05
  • Running PySpark on a Hadoop Cluster
    06:57

  • Project - Bank prospects marketing data transformation using Hadoop and Spark
    12:18
  • Rapid Revision - Big Data, Hadoop and Spark concepts
    15:25

  • Scala basics
    08:25
  • Spark SQL DataFrame using Scala
    05:39
  • Bank prospects marketing project in Scala
    02:48

  • Preview
    04:40
  • AWS data lake - S3, Glue and Athena introduction
    02:40
  • Create a data lake on AWS S3
    02:14
  • AWS Glue crawler and AWS Athena query tool
    06:53
  • ETL transformation using AWS Glue
    06:18
  • Triggering AWS Glue job with a serverless AWS Lambda function
    07:17
  • Project - Bank prospects data transformation using S3, Glue & Athena services
    10:00

  • Fast queries with Hive Partitioning
    15:52
  • Fast queries with Hive Bucketing
    03:00

  • Advanced Spark datasets
    01:22
  • User Defined Function (UDF)
    03:36
  • Joins - Left, Right, Inner, Outer
    05:50

  • Spark Scala real world coding introduction
    00:44
  • Installing JDK on a local Machine
    01:34
  • Installing IntelliJ IDEA
    00:48
  • Adding Scala Plugin to IntelliJ
    00:28
  • Preview
    06:24
  • Scala basics using IntelliJ
    13:51
  • Hello World Spark Scala using IntelliJ
    05:41
  • Configuring HADOOP_HOME on Windows using Winutils
    01:20
  • Enabling Hive Support in Spark Session
    05:27
  • Installing PostgreSQL
    04:53
  • psql command line interface for PostgreSQL
    02:08
  • Preview
    04:43
  • Importing a project into IntelliJ
    04:39
  • Organizing code with Objects and Methods
    09:51
  • Implementing Log4j and SLF4J Logging
    05:32
  • Exception Handling with try, catch, Option, Some and None
    06:19

Instructor

FutureX Skill
Big Data, Cloud and AI Solution Architects
  • 4.3 Instructor Rating
  • 733 Reviews
  • 26,695 Students
  • 6 Courses

We are a group of Solution Architects and Developers with expertise in Java, Python, Scala, Big Data, Machine Learning and Cloud.

We have years of experience in building Data and Analytics solutions for global clients.

Our primary goal is to simplify learning for our students.

We take a practical, use-case-based approach in all our courses.

© 2021 Udemy, Inc.