Udemy

Data Engineering, Serverless ETL & BI on Amazon Cloud

Data warehousing & ETL on AWS Cloud
Bestseller
Rating: 4.3 out of 5 (153 ratings)
1,320 students
Created by Siddharth Raghunath
Last updated 8/2020
English
English [Auto]
30-Day Money-Back Guarantee

What you'll learn

  • Set up a data warehouse on Amazon Cloud from scratch using Redshift
  • Understand AWS Athena and when to use it
  • Store data in S3 data lakes using the Parquet columnar file format and optimize data scans with Athena
  • Automate ETL processes using serverless components such as AWS Glue, Data Pipeline and Lambda functions
  • Centralize data using Redshift Spectrum
  • Trigger and automate Glue jobs using Lambda functions
  • Pull data into QuickSight, the BI reporting and visualization offering from AWS
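
As a rough illustration of the Parquet and Athena point above: one common way to reduce the data Athena scans is to lay out Parquet files under Hive-style partition prefixes (`year=.../month=.../day=...`), which Athena can prune when a query filters on those columns. The sketch below only builds such an object key. The table name, file name and partition columns are hypothetical and are not taken from the course material.

```python
from datetime import date


def partitioned_key(table: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 object key for a Parquet file.

    The year/month/day partition columns are an assumption made for
    illustration; a real data lake may partition on other columns.
    """
    return (
        f"{table}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


# Hypothetical table and file names.
key = partitioned_key("orders", date(2020, 8, 15), "part-0000.snappy.parquet")
print(key)  # orders/year=2020/month=08/day=15/part-0000.snappy.parquet
```

A query that filters on `year`, `month` or `day` would then only need to read the matching prefixes instead of the whole table.
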
Curated for the Udemy for Business collection

Requirements

  • Hands-on expertise in Python and SQL is a must
  • A technical background or prior experience with PySpark (at least beginner level)
  • Basic understanding of cloud components (AWS, GCP or Azure)

Description

The AWS Cloud can seem intimidating and overwhelming because of its vast ecosystem, but this course makes it easier for anyone who wants hands-on expertise in setting up a data warehouse in Redshift or a BI infrastructure from scratch.

Data scientists, data analysts and business analysts will soon be expected (if they are not already) to become all-rounders and handle the technical aspects of data ingestion, engineering and warehousing.

Anyone who has a basic understanding of how the cloud works can benefit from this course because:

- This course is designed with the end-to-end life cycle of a typical data engineering project in mind

- It provides practical solutions to real-world use cases

This course covers:

  • Setting up a data warehouse in AWS Redshift from scratch

  • Basic Data Warehousing Concepts

  • Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing

  • AWS Athena for ad-hoc analysis (when to use Athena)

  • AWS Data Pipeline to sync incremental data

  • Lambda functions to trigger and automate ETL/Data Syncing processes

  • QuickSight setup, analyses and dashboards
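
To give a feel for the "Lambda functions to trigger and automate ETL" item in the list above, here is a minimal sketch of a Lambda handler that starts a Glue job through boto3's `start_job_run` call. It is not the course's own code: the job name, the `--trigger_event` argument key and the optional `glue_client` parameter (added so the handler can be exercised without AWS access) are all assumptions made for illustration.

```python
import json

# Hypothetical Glue job name; replace with a job that exists in your account.
GLUE_JOB_NAME = "incremental-load-job"


def lambda_handler(event, context, glue_client=None):
    """Start a Glue job run and return its run id.

    Lambda calls the handler with (event, context) only, so glue_client
    falls back to a real boto3 client when it is not supplied.
    """
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue_client = boto3.client("glue")

    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        # Glue job arguments are string-to-string; the key is an assumption.
        Arguments={"--trigger_event": json.dumps(event)},
    )
    return {"job_run_id": response["JobRunId"]}
```

Passing the client in as a parameter is only a convenience for local testing with a stub; in a deployed function the default path creates the client itself.
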

Prerequisites for this course are:

  • Python / SQL (an absolute must)

  • PySpark (you should know how to write some basic PySpark scripts)

  • Willingness to explore, learn and put in the extra effort to succeed

  • An active AWS Account

Important note: This course makes use of the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free tier usage, which should be more than enough to practice everything in this course.

Also, this course uses the AWS UI in the browser for creating clusters and setting up jobs; there is no bash scripting involved. You can use any operating system to perform the lab sessions in this course.

This course is not code-heavy: only 35% of it involves coding, and the rest is execution, understanding and chaining different components together. The whole purpose of this course is to make everyone aware of, and comfortable with, all the tools and features it uses.

Some tips:

  • Try to watch the videos at 1.2X speed

  • Every time you work on a new component or feature, do some research on the other tools that are meant for the same purpose and see how they differ and in what aspects, e.g. Redshift/Athena vs Snowflake or BigQuery, QuickSight vs Power BI vs MicroStrategy


Who this course is for:

  • Data scientists and analysts who need hands-on implementation experience with AWS ETL tools
  • Software developers who are curious to learn data engineering
  • Anyone with coding experience who wants to get into the field of data engineering, analytics and data science

Course content

8 sections • 49 lectures • 6h 32m total length

  • Preview
    03:28
  • Preview
    03:27
  • Preview
    03:29

  • Redshift Overview
    05:23
  • Redshift vs BigQuery
    09:06
  • Redshift - Data Consistency
    02:44
  • Lab: Setup Mysql RDS Instance on AWS Cloud
    09:07
  • Lab: Mysql RDS Database Import
    05:38
  • Load Data into Mysql RDS using DBeaver
    03:35
  • Lab : Redshift Cluster Setup
    09:21
  • Lab : Sql Client for Redshift and RDS Mysql
    04:10

  • Introduction - Flow of Data
    02:24
  • Understanding the different Components and their Roles
    06:26
  • Designing your Data Warehouse - Basic Concepts
    05:53
  • Lab - AWS DataPipeline - Getting Started with the first import Job
    18:26
  • Lab : One-time Load Historical Data into Redshift Tables using Copy Command
    09:39
  • AWS Glue - Overview and Walkthrough
    10:35
  • Lab - AWS DataPipeline - Setup our first Hourly Jobs for Incremental Data Loads
    09:18
  • Lab - AWS Glue - First Python Shell Job for incremental Data loads into Redshift
    25:56
  • Lab - AWS Lambda Function to Trigger our Glue Job
    08:29
  • Lab - AWS DataPipeline - Second import Job
    07:15
  • Lab : One-time Load Historical Data into Redshift Tables using Copy Command
    11:42
  • Lab - AWS Glue - Python Shell Job for incremental Data loads into Redshift
    13:15
  • AWS Glue Python - Capacity
    05:47
  • Important - Data Syncing Approach and the Bigger picture
    03:14
  • Redshift - Cluster Snapshot and restoring
    05:09
  • Sync the Other Mysql Tables
    2 questions

  • Section Overview and Introduction
    06:13
  • Lab - AWS Glue Crawler Setup
    11:40
  • Lab - Athena - Data and Table Scan Explanation
    06:03
  • Lab - Pyspark Development Local
    15:33
  • Lab - Port Local Pyspark Script to AWS Glue
    08:08
  • Lab - AWS Glue Pyspark - Parquet File Format & Snappy Compression
    09:43
  • Lab - AWS Lambda to Trigger Glue Jobs
    11:19
  • Lab - Glue Crawler Run - Populate Partitions in Data Catalog
    03:16

  • Preview
    06:21
  • Lab - Redshift Spectrum | Create External Schema
    10:26
  • Lab - Redshift Spectrum | Cross Database Joins
    02:56

  • Quicksight - Introduction
    05:42
  • Lab - Connecting with Redshift and Create Dashboards/Analyses
    12:24
  • Lab - Run Custom Sql Queries for QuickSight Analyses and Dashboards
    12:36

  • Redshift - Sort Keys and Compound Sort Keys
    05:56
  • Redshift - Interleaved Sort Keys
    04:29
  • Redshift - Vacuum Operations
    07:13
  • Redshift - Choosing Keys
    04:35
  • Redshift - Distribution Keys
    06:03
  • Lab - Parameter Group | Redshift Cluster Modification
    07:25
  • Lab - Sort and Dist Keys & Vacuuming | Alter Table Commands
    11:52

  • Lab - AWS Glue Pyspark - Insert External Data into Redshift
    12:22
  • Lab - AWS Glue - Pyspark - Connect to RDS directly
    07:33

Instructor

Siddharth Raghunath
Data Engineer / Cloud Data Engineer / Passionate Techie
  • 4.3 Instructor Rating
  • 260 Reviews
  • 1,999 Students
  • 2 Courses

I am a Data Engineer with extensive experience in software development, distributed processing and data engineering on the cloud. I have worked on cloud platforms such as AWS and GCP, and also with on-prem Hadoop clusters. I also give seminars on distributed processing using Spark, real-time streaming and analytics, and best practices for ETL and data governance. I am also a passionate coder who loves writing and building optimal data pipelines for robust data processing and streaming solutions.

© 2021 Udemy, Inc.