Data Engineering for Beginner using Google Cloud & Python

Rating: 4.5 (329 reviews)

Basic data engineering: Python, pandas, Google Cloud Platform (GCP), BigQuery, Spark on Dataproc, GCS, data warehouse

Created by Timotius P

Last updated 11/2024

English

What you'll learn

Basic data engineering: what it is, why it is needed, and how to do it from zero
Relational database model: database modelling and normalization design, with hands-on practice using PostgreSQL and Python / pandas
NoSQL database model: denormalization design, with hands-on practice using Elasticsearch and Python / pandas
Introduction to Spark and Spark clusters on Google Cloud Platform

Course content

11 sections • 77 lectures • 8h 4m total length

Welcome to This Course • 1:37
Course Structure & Coverage • 3:35
How To Get Maximum Value From This Course • 5:56
Use step-by-step learning, pause to understand code, and replicate it locally to absorb knowledge. Turn on accurate subtitles, adjust playback speed, and seek help via Q&A, Stack Overflow, or Google.

What is Database • 2:35
Relational Database • 20:17
When Not To Use Relational Database? • 8:10
NoSQL Database • 5:29
Demo : Postgresql • 11:46
Explore a practical PostgreSQL setup demo by installing and connecting a PostgreSQL database locally or on Google Cloud, using tools like DBeaver or pgAdmin and understanding cloud costs.
Demo : Python for Postgresql • 4:29
Demo : Elasticsearch • 7:50
Demo : Python for Elasticsearch • 4:02
Connect Python to Elasticsearch with a Python client installed via pip, storing documents as JSON in hero-index. Query with Elasticsearch's JSON query language and explore Kibana discovery and index patterns.

The Importance of Relational Data Model • 1:33
OLTP vs OLAP • 3:10
Differentiate OLTP and OLAP to understand real-time transaction processing and historical data analysis. Learn how OLTP captures and maintains individual transactions while OLAP aggregates data for analytics and business intelligence.
Database Normalization • 2:55
First Normal Form (1NF) • 3:40
Second Normal Form (2NF) • 9:11
Third Normal Form (3NF) • 1:44
Normalization Python Demo • 3:19
Normalization Tips • 4:52
Learn third normal form basics: when to separate location into its own table versus keeping it as a descriptive field, and how header-detail tables with start and end active dates manage lookups.
Database Denormalization • 4:38
Denormalization Python Demo • 6:53
Fact & Dimension Tables • 3:43
Star Schema • 2:43
Star Schema Python Demo • 3:28
Snowflake Schema • 2:24
Learn how the snowflake schema builds on the star schema with deeper dimension relationships and normalization, using less storage and offering better data integrity at the cost of added complexity and more joins.
Galaxy Schema • 2:12
Extract Transform Load (ETL) & Staging Tables • 7:04
ETL & Staging Tables - Demo Overview • 3:13
ETL & Staging Tables - Python Demo 1 • 2:31
ETL & Staging Tables - Python Demo 2 • 11:59
To Insert or To Update? • 3:01
ETL & Staging Tables - Python Demo 3 • 6:06
ETL & Staging Tables - Python Demo 4 • 6:49
ETL & Staging Tables - Tips • 1:30
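
As a rough illustration of the normalization idea covered in the lectures above (separating location into its own table), here is a minimal sketch in plain Python. The sample rows, the field names, and the `normalize` helper are hypothetical and are not taken from the course material; the course itself demonstrates this with PostgreSQL and pandas.

```python
# Hypothetical flat table: every row repeats the location details.
flat_rows = [
    {"employee": "Ani", "location": "Jakarta", "country": "Indonesia"},
    {"employee": "Budi", "location": "Jakarta", "country": "Indonesia"},
    {"employee": "Citra", "location": "Singapore", "country": "Singapore"},
]


def normalize(rows):
    """Split flat rows into a location table and an employee table."""
    location_ids = {}  # (location, country) -> surrogate key
    employee_table = []
    for row in rows:
        key = (row["location"], row["country"])
        # Assign a new surrogate key the first time a location is seen.
        location_id = location_ids.setdefault(key, len(location_ids) + 1)
        employee_table.append(
            {"employee": row["employee"], "location_id": location_id}
        )
    location_table = [
        {"location_id": loc_id, "location": name, "country": country}
        for (name, country), loc_id in location_ids.items()
    ]
    return location_table, employee_table


location_table, employee_table = normalize(flat_rows)
print(location_table)
print(employee_table)
```

Each location is now stored once, and employees refer to it by `location_id`, so a change to a location's details touches a single row.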

Basic NoSQL Concept • 3:44
CAP Theorem • 2:24
Denormalization on Elasticsearch • 2:47
Elasticsearch Basic Usage • 5:38
Elasticsearch Index & Document • 3:17
Explore how Elasticsearch stores data in an index, uses explicit or dynamic mappings, and indexes JSON documents via the REST API or clients, with auto-generated or self-defined IDs and update semantics.
Elasticsearch ETL - Overview • 6:55
Elasticsearch Query DSL • 2:04
Elasticsearch ETL - Python Demo • 5:34
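
To give a feel for denormalization for a document store, the sketch below turns two relational-style tables into self-contained JSON documents keyed by a self-defined ID, and builds a Query DSL `match` query as a plain dictionary. The hero and team data, the field names, and the `to_documents` helper are assumptions made for illustration. Sending the documents or the query to Elasticsearch needs a running cluster and a client, which is not shown here.

```python
import json

# Hypothetical relational-style source tables.
heroes = [
    {"hero_id": 1, "name": "Hero A", "team_id": 10},
    {"hero_id": 2, "name": "Hero B", "team_id": 10},
]
teams = {10: {"name": "Team X", "base": "Jakarta"}}


def to_documents(heroes, teams):
    """Embed each hero's team so a document needs no join at query time."""
    documents = {}
    for hero in heroes:
        team = teams[hero["team_id"]]
        # The hero_id is reused as a self-defined document ID.
        documents[hero["hero_id"]] = {"name": hero["name"], "team": dict(team)}
    return documents


documents = to_documents(heroes, teams)

# A Query DSL "match" query on the embedded team name, as a plain dict.
query = {"query": {"match": {"team.name": "Team X"}}}

print(json.dumps(documents[1], indent=2))
print(json.dumps(query))
```

The team details are duplicated in every hero document, which trades extra storage for reads that do not require a join.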

Business Perspective • 8:16
Technical Perspective • 8:51
More Fact & Dimension Table • 12:15
OLAP Cube • 3:58
On-Premise or Cloud? • 6:15
Explore the choice between on-premises and cloud data warehouses, weighing construction costs, maintenance, and speed to insight, with cloud offerings such as Google Cloud and Amazon Redshift.
Various Techniques • 6:52
Demo Overview • 8:06
See how to perform ETL from OLTP to a data warehouse using dummy procurement data; apply the Kimball approach to build a data mart in BigQuery and visualize it with Data Studio.
Demo 1 - PostgreSQL Data Warehouse • 16:18
Demo 2 - BigQuery Data Warehouse • 9:02
Demo 3 - Data Warehouse Operations • 4:43
Explore OLAP cube operations such as roll up, drill down, slicing, and dicing. Use BigQuery or PostgreSQL to analyze vendor distribution by invoice month, vendor name, and invoice payment status.
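
The OLAP cube operations named above can be sketched as ordinary SQL aggregations. The example below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL or BigQuery; the `fact_invoice` table, its columns, and the sample rows are assumptions for illustration, not the course's actual procurement data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_invoice ("
    "invoice_month TEXT, vendor_name TEXT, payment_status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_invoice VALUES (?, ?, ?, ?)",
    [
        ("2024-01", "Vendor A", "PAID", 100.0),
        ("2024-01", "Vendor B", "UNPAID", 50.0),
        ("2024-02", "Vendor A", "PAID", 70.0),
        ("2024-02", "Vendor B", "PAID", 30.0),
    ],
)

# Roll up: aggregate to a coarser level (month only).
roll_up = conn.execute(
    "SELECT invoice_month, SUM(amount) FROM fact_invoice "
    "GROUP BY invoice_month ORDER BY invoice_month"
).fetchall()

# Drill down: add a finer dimension (month and vendor).
drill_down = conn.execute(
    "SELECT invoice_month, vendor_name, SUM(amount) FROM fact_invoice "
    "GROUP BY invoice_month, vendor_name ORDER BY invoice_month, vendor_name"
).fetchall()

# Slice: fix one dimension to a single value (paid invoices only).
slice_paid = conn.execute(
    "SELECT vendor_name, SUM(amount) FROM fact_invoice "
    "WHERE payment_status = 'PAID' GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()

# Dice: restrict two dimensions at once (paid invoices in February).
dice = conn.execute(
    "SELECT vendor_name, SUM(amount) FROM fact_invoice "
    "WHERE payment_status = 'PAID' AND invoice_month = '2024-02' "
    "GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()

print(roll_up)
print(drill_down)
print(slice_paid)
print(dice)
```

The same queries should carry over to PostgreSQL or BigQuery with little change, since they rely only on `GROUP BY`, `WHERE`, and `SUM`.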

Hadoop Ecosystem • 5:03
Explore the Hadoop ecosystem and Spark, highlighting HDFS, MapReduce, YARN, and Hadoop Common. Describe MapReduce steps (map, shuffle, reduce), in-memory Spark processing, and higher-level abstractions like Pig Latin and Hive.
Introducing Spark • 2:58
Spark Programming • 7:29
Data Formats • 3:58
Hello Spark • 2:50
Spark Demo - Dataframe • 5:35
Spark Demo - Spark SQL • 6:24
Spark & BigQuery - Setting Environment • 8:28
Spark & BigQuery - ETL Movies • 16:33
Spark & BigQuery - Lesson Learned • 6:33
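
The MapReduce steps mentioned in this section (map, shuffle, reduce) can be imitated on a single machine with a word count. This is a toy sketch in plain Python with hypothetical function names and input lines. It does not use Hadoop or Spark, which distribute the same steps across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the grouped values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["spark and hadoop", "spark on dataproc"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts)
```

In a real cluster, the map and reduce functions run in parallel on different nodes, and the shuffle moves data between them over the network.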

Requirements

Understanding of basic SQL statements (SELECT, INSERT, UPDATE, DELETE is sufficient)
Understanding of basic Python / pandas
The course uses Google Cloud Platform. If you want to do the hands-on exercises, you need to provide credit card details for payment on Google Cloud. If you don't, you can still watch the course videos.

Description

"Data is the new oil".

You might have heard this quote before. Data in the digital era is as valuable as oil was in the industrial era. However, just like crude oil, raw data is not usable by itself. Rather, value is created when data is gathered completely and accurately, connected to other relevant data, and delivered in a timely manner.

Data engineers design and build pipelines that transform and transport data into a usable format. A different role, such as a data scientist or machine learning engineer, is then able to turn the data into valuable business insight, just as crude oil is transformed into petrol through a complex process.

Becoming a data engineer requires a lot of data literacy and practice. This course is a first step for those who want to learn about data engineering. It combines theory and hands-on practice to introduce you to the field. Because the data field is very wide, this course covers basic, entry-level knowledge of data engineering processes and tools.

This course is well suited to building a foundation for entering the data field. In this course, we will learn about:

Introduction to data engineering
Relational & non-relational databases
Relational & non-relational data models
Table normalization
Fact & dimension tables
Table denormalization for data warehouse
ETL (Extract Transform Load) & data staging using Python pandas
Elasticsearch basics
Data warehouse
Numbers every engineer should know & how they relate to big data
Hadoop
Spark cluster on Google Cloud Dataproc
Data lake

Important Notes

The data field is HUGE! This course will be continuously updated, but for the time being it contains an introduction to the concepts and sample hands-on exercises for data engineering.

For now, this course is intended for beginners in data engineering.

If you have some programming experience and are curious about data engineering, this course is for you.

If you have experience in the data engineering field, this course might be too basic for you (although I'm very happy if you still purchase the course).

If you have never written Python or SQL before, this course is not for you. To understand the course, you must have basic knowledge of SQL and Python.

Who this course is for:

Beginner Python developers curious about data engineering
Software engineers who want to become data engineers
Technical architects and engineering managers who want an overview of data engineering

Course content

Introduction • 3 lectures • 11min

Introduction to Data Engineering • 3 lectures • 15min

Database • 8 lectures • 1hr 5min

Relational Database Model • 23 lectures • 1hr 39min

NoSQL Database Model • 8 lectures • 32min

Data Warehouse • 10 lectures • 1hr 25min

Numbers Every Engineer Should Know • 3 lectures • 16min

Hadoop & Spark • 10 lectures • 1hr 6min

Spark Cluster on Google Cloud (Dataproc) • 3 lectures • 39min

Data Lake • 4 lectures • 51min
