
Use step-by-step learning, pause to understand code, and replicate it locally to absorb knowledge. Turn on accurate subtitles, adjust playback speed, and seek help via Q&A, Stack Overflow, or Google.
Define data engineering as the tasks that make data available to business users for analytics, reporting, and machine learning, illustrated by the data hierarchy of needs from collection to AI.
Follow a practical PostgreSQL setup demo: install a PostgreSQL database locally or on Google Cloud, connect to it with tools like DBeaver or pgAdmin, and understand the cloud costs involved.
Connect Python to Elasticsearch with the Elasticsearch client installed via pip, storing documents as JSON in hero-index. Query with Elasticsearch's JSON-based query language and explore Kibana's Discover view and index patterns.
Differentiate OLTP and OLAP to understand real-time transaction processing and historical data analysis. Learn how OLTP captures and maintains individual transactions while OLAP aggregates data for analytics and business intelligence.
Learn third normal form (3NF) basics: when to separate location into its own table versus keeping it as a descriptive field, and how header-detail lookup tables work with start and end active dates.
Learn how the snowflake schema builds on the star schema with deeper dimension relationships and normalization, using less storage and improving data integrity at the cost of added complexity and more joins.
Explore how Elasticsearch stores data in an index, uses explicit or dynamic mappings, and indexes JSON documents via the REST API or client libraries, with auto-generated or self-defined IDs and update semantics.
Explore the choice between on-premises and cloud data warehouses, weighing construction costs, maintenance, and speed to insight, with cloud options such as Google Cloud and Amazon Redshift.
See how to perform ETL from an OLTP database to a data warehouse using dummy procurement data; apply the Kimball approach to build a data mart in BigQuery and visualize it with Data Studio.
Explore OLAP cube operations such as roll-up, drill-down, slicing, and dicing. Use BigQuery or PostgreSQL to analyze vendor distribution by invoice month, vendor name, and invoice payment status.
Explore the Hadoop ecosystem and Spark, highlighting HDFS, MapReduce, YARN, and Hadoop Common. Describe MapReduce steps (map, shuffle, reduce), in-memory Spark processing, and higher-level abstractions like Pig Latin and Hive.
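As a rough illustration of the map, shuffle, and reduce steps mentioned above, here is a minimal single-process sketch in plain Python. It is a word count that only mimics the three phases; it is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases, as a MapReduce job would do across a cluster."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read blocks from HDFS.
    sample = ["data is the new oil", "raw data is not usable"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data over the network; here everything runs in one process purely to show the data flow.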
"Data is the new oil".
You might have heard this quote before. Data in the digital era is as valuable as oil was in the industrial era. However, just like oil, raw data is not usable by itself. Its value is created when it is gathered completely and accurately, connected to other relevant data, and delivered in a timely manner.
Data engineers design and build pipelines that transform and transport data into a usable format. Other roles, such as data scientists or machine learning engineers, can then turn that data into valuable business insight, just as crude oil is refined into petrol through a complex process.
Becoming a data engineer requires a lot of data literacy and practice. This course is a first step for anyone who wants to learn about data engineering. It combines theory and hands-on exercises to introduce you to the field. Because the data field is very wide, this course covers basic, entry-level knowledge of data engineering processes and tools.
This course is well suited to building a foundation for a career in data. In this course, we will learn about:
Introduction to data engineering
Relational & non-relational databases
Relational & non-relational data models
Table normalization
Fact & dimension tables
Table denormalization for data warehouse
ETL (Extract, Transform, Load) & data staging using Python pandas
Elasticsearch basics
Data warehouse
Numbers every engineer should know & how they relate to big data
Hadoop
Spark cluster on Google Cloud Dataproc
Data lake
Important Notes
The data field is HUGE! This course will be continuously updated, but for the time being it contains an introduction to the concepts and sample hands-on exercises for data engineering.
For now, this course is intended for beginners in data engineering.
If you have some programming experience and are curious about data engineering, this course is for you.
If you have experience in the data engineering field, this course might be too basic for you (although I'm very happy if you still purchase the course).
If you have never written Python or SQL before, this course is not for you. To follow the course, you must have basic knowledge of SQL and Python.