The Complete Hands-on Introduction to Airbyte

Name: The Complete Hands-on Introduction to Airbyte
Rating: 4.7 (485 reviews)

Get started with Airbyte and learn how to use it with Apache Airflow, Snowflake, dbt and more

Created by Marc Lamberti

Last updated 8/2024

English

What you'll learn

Understand what Airbyte is, its architecture, concepts, and its role in the modern data stack (MDS)
Install and set up Airbyte locally with Docker
Connect Airbyte to different data sources (databases, cloud storage, etc.)
Configure Airbyte to send data to various destinations (data warehouses, databases)
Develop a data pipeline from scratch with Airbyte, dbt, Soda, Airflow, Postgres, and Snowflake to run your first data syncs
Set up monitoring and notifications with Airbyte

Course content

6 sections • 54 lectures • 3h 44m total length

Welcome! (2:35)
Prerequisites (1:22)
Who am I? (1:31)
Learning Advice (2:16)

Why Airbyte? (3:22)
What is Airbyte? (5:04)
The Core Concepts (2:28)
Discover Airbyte core concepts: source, destination, and connectors; learn how streams and fields organize data, and configure connections with replication options like full overwrite and incremental updates.
The Core Components (3:01)
Explore Airbyte’s core components, including the config database, web app, API, scheduler, discover, and workers, and see how the UI creates sources and manages sync jobs through the API.
Why not Airbyte? (1:20)
Airbyte Cloud or OSS? (0:45)
Explore Airbyte's three deployment paths: the open-source, self-managed Community version; Self-Managed Enterprise; and the fully managed Airbyte Cloud, so you can match features and support to your needs.
Quiz!
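The replication options named in The Core Concepts lecture (full overwrite versus incremental updates) can be illustrated with a small in-memory simulation. This is a minimal sketch, not Airbyte code: the Row type, the functions, and the use of updated_at as the cursor field are all hypothetical. It also uses a strict ">" cursor comparison for simplicity, which may differ from how real connectors compare cursor values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    """A hypothetical source row; updated_at plays the role of the cursor field."""
    id: int
    name: str
    updated_at: int  # simplified cursor value, e.g. an epoch timestamp


def full_refresh_overwrite(source: List[Row]) -> List[Row]:
    """Full refresh, overwrite: re-read every row and replace the destination contents."""
    return list(source)


def incremental_append(
    source: List[Row], destination: List[Row], cursor: Optional[int]
) -> Tuple[List[Row], Optional[int]]:
    """Incremental, append: read only rows newer than the saved cursor and append them."""
    new_rows = [r for r in source if cursor is None or r.updated_at > cursor]
    # The new state is the highest cursor value seen so far (unchanged if nothing new arrived).
    new_cursor = max((r.updated_at for r in new_rows), default=cursor)
    return destination + new_rows, new_cursor


if __name__ == "__main__":
    source = [Row(1, "alice", 100), Row(2, "bob", 100)]
    dest, state = incremental_append(source, [], None)  # first sync reads both rows
    source = [Row(1, "alice", 100), Row(2, "bobby", 200)]  # bob renamed, cursor bumped
    dest, state = incremental_append(source, dest, state)  # second sync reads only the change
    print(len(dest), state)  # 3 200: the old "bob" row is still there (append, no dedup)
    print(len(full_refresh_overwrite(source)))  # 2: overwrite keeps only the current rows
```

The trade-off shows up in the output: overwrite always mirrors the source but re-reads everything, while incremental append reads less but keeps every historical version of a changed row.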

Introduction to Docker (Optional) (3:48)
Discover how Docker creates isolated containers that run applications with their own dependencies, using Dockerfiles and images, and how Airbyte is executed with docker build and docker run.
Running Airbyte with Docker (5:01)
Learn to run Airbyte with Docker: install Docker Desktop, manage containers and resources, start the Airbyte components with Docker Compose, and then access the web UI on port 8000.
Potential issues when running Airbyte, and fixes (0:12)
The Airbyte UI tour (8:18)
The Bank Pipeline (1:25)
Create your first source (Google Sheets) (5:07)
Create your first Airbyte source with Google Sheets by configuring a service account, enabling the Google Sheets API, and sharing the spreadsheet for access.
Create your first destination (BigQuery) (3:31)
Create your first Airbyte destination, BigQuery, to receive the data replicated from Google Sheets: attach billing, set the project and a customers dataset, choose a loading method, and verify the connection.
Configure your first connection (8:02)
Make your first sync! (3:28)
Make your first sync by replicating data from Google Sheets into BigQuery, then verify the results in the BigQuery tables (cc and wire) and through the job history and logs.
Raw tables and additional columns? (5:27)
Airbyte creates an internal namespace with raw tables that store each record along with _airbyte_raw_id, _airbyte_extracted_at, _airbyte_loaded_at, _airbyte_meta, and _airbyte_data; these tables are used for schema migrations and to recreate the final table.
Connector classifications (Certified, Community, etc.) (2:43)
Quiz!
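The Raw tables lecture explains that raw tables keep each record as JSON plus metadata columns, which is what lets the final table be recreated after a schema change. The sketch below is a hypothetical illustration of that idea only: the column names come from the lecture summary, while the functions, the {"changes": []} shape of _airbyte_meta, and the string timestamps are simplifying assumptions rather than Airbyte's actual implementation.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, List


def to_raw_row(record: Dict) -> Dict:
    """Wrap a source record in the raw-table column layout (sketch, not Airbyte code)."""
    return {
        "_airbyte_raw_id": str(uuid.uuid4()),
        "_airbyte_extracted_at": datetime.now(timezone.utc).isoformat(),
        "_airbyte_loaded_at": None,  # the destination fills this in once the row reaches the final table
        "_airbyte_meta": json.dumps({"changes": []}),  # assumed shape, for illustration only
        "_airbyte_data": json.dumps(record),  # the untouched source record as JSON
    }


def rebuild_final_table(raw_rows: List[Dict], columns: List[str]) -> List[Dict]:
    """Recreate a final table from raw rows: unpack the JSON blob into one column per field."""
    final = []
    for raw in raw_rows:
        data = json.loads(raw["_airbyte_data"])
        row = {col: data.get(col) for col in columns}  # fields missing from a record become None (NULL)
        for meta_col in ("_airbyte_raw_id", "_airbyte_extracted_at", "_airbyte_meta"):
            row[meta_col] = raw[meta_col]
        final.append(row)
    return final


if __name__ == "__main__":
    raw = [to_raw_row({"id": 1, "email": "a@example.com", "plan": "pro"}), to_raw_row({"id": 2})]
    print(rebuild_final_table(raw, ["id", "email"]))  # "plan" not exposed yet
    print(rebuild_final_table(raw, ["id", "email", "plan"]))  # new column recovered from raw JSON
```

Because the full record survives in _airbyte_data, adding a column only requires rebuilding the final table from the raw rows, not re-extracting from the source.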

How does a sync work? (5:26)
Discover how Airbyte performs a source-to-destination sync via a sync workflow, using connectors, a queue and workers, with config checks, schema validation, and loading into destinations like BigQuery.
Create a new source (Postgres)
Side notes for Postgres (0:50)
For Postgres databases other than the Airbyte Postgres database, create a dedicated read-only user so Airbyte can transfer the data, granting it read-only access to the relevant schemas and tables; the Airbyte Postgres database already has this user.
Add some data (Postgres)
Discover the sync modes (14:16)
Handling schema changes (8:19)
Explore how Airbyte detects schema changes before syncing, handles non-breaking and breaking changes, and propagates changes with manual approval.
What is Change Data Capture (CDC)? (2:34)
Enable CDC with Postgres (5:58)
Configure the Postgres connector to replicate data from Postgres to BigQuery using CDC, including enabling logical replication, creating a replication slot and publication, and switching to CDC in the UI.
Syncing data between Postgres and BigQuery using CDC (7:40)
Configure Postgres with CDC to replicate data into BigQuery using Airbyte, verify the replication, and learn what CDC fields such as _ab_cdc_lsn and _ab_cdc_updated_at mean, how deletes are handled, and what the limitations of CDC are.
The Sync Modes cheat sheet (1:00)
CDC under the hood (0:05)
What happens behind the scenes for CDC
Quiz!
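The CDC lectures mention the _ab_cdc_lsn and _ab_cdc_updated_at fields and delete handling. One way to picture what those fields make possible is to replay change events in log order and keep one row per primary key. This is a hypothetical sketch: the function name, the "id" key, and the rule that a non-null _ab_cdc_deleted_at removes the row from a deduplicated view are assumptions for illustration, not a description of how Airbyte's destinations are implemented.

```python
from typing import Dict, List


def apply_cdc_events(events: List[Dict], key: str = "id") -> Dict:
    """Replay CDC change events and keep the latest version of each row (deduplicated view).

    Assumption for this sketch: a non-null _ab_cdc_deleted_at marks a row deleted at the
    source, so it is removed from the result.
    """
    table: Dict = {}
    # Apply changes in the order they were written to the Postgres WAL (the LSN only grows).
    for event in sorted(events, key=lambda e: e["_ab_cdc_lsn"]):
        if event.get("_ab_cdc_deleted_at") is not None:
            table.pop(event[key], None)  # deletion: drop the row if we have it
        else:
            table[event[key]] = event  # insert or update: the newer version wins
    return table


if __name__ == "__main__":
    events = [
        {"id": 1, "name": "alice", "_ab_cdc_lsn": 10, "_ab_cdc_deleted_at": None},
        {"id": 2, "name": "bob", "_ab_cdc_lsn": 20, "_ab_cdc_deleted_at": None},
        {"id": 1, "name": "alicia", "_ab_cdc_lsn": 30, "_ab_cdc_deleted_at": None},
        {"id": 2, "name": "bob", "_ab_cdc_lsn": 40, "_ab_cdc_deleted_at": "2024-08-01T00:00:00Z"},
    ]
    print(apply_cdc_events(events))  # only id 1 remains, with name "alicia"
```

Sorting by the LSN rather than by arrival order is the design point here: the LSN reflects the order in which the database committed the changes.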

Project overview (3:23)
Learn to build a data pipeline with Airbyte that loads fake transactions, runs data quality checks with Soda, and produces a customer risk score.
Learning recommendations (1:45)
Take your time to learn: watch the video first, then replicate. Focus on the tasks, not the setup; the section includes brief introductions to Snowflake, Airflow, dbt, and Soda.
The Setup and requirements (3:13)
The Data Generators (Python Scripts) (2:07)
Generate fake customer transactions with two Python scripts: a Postgres-backed transaction generator for Airbyte, and a detection generator that labels fraud and stores the results in MySQL, triggered by the Airflow pipeline.
Quick introduction to Apache Airflow (3:31)
Explore how Apache Airflow acts as a data orchestrator to program, monitor, and schedule workflows, using DAGs, operators, and providers like Airbyte to sync data.
Let's generate some data! (4:11)
Set up the S3 bucket with the user (3:46)
Create and configure an AWS S3 bucket and an IAM user, attach a tailored policy, generate access keys, and prepare Airbyte to access the bucket.
Create the AWS S3 destination with Airbyte (2:49)
Create the Postgres to S3 connection with Airbyte (1:13)
Create the MySQL source with Airbyte (1:21)
Create the MySQL to S3 connection with Airbyte (0:52)
Create an Airbyte connection from MySQL to S3 to transfer the labeled transactions table; name it "load labeled transactions" and set the schedule to manual.
Let's try to sync data!
What's the Write-Audit-Publish pattern? (1:33)
Learn the Write-Audit-Publish (WAP) pattern, which stages data, audits its quality with a framework like Soda, and only then moves valid records to production for downstream consumers.
Create the AWS S3 source with Airbyte (4:49)
Create the Snowflake destination with Airbyte (3:01)
Create the Raw to Staging connection with Airbyte (1:38)
Let's write the data into staging!
Quick introduction to Soda (1:42)
Learn to validate staged data with Soda, a data quality framework, by defining checks in a YAML file with Soda Core and SodaCL, for example to ensure tables are not empty.
Write data quality checks with Soda (10:36)
Create the customer_metrics table with dbt (5:57)
Create the Fraud Data Pipeline with Airflow (16:03)
The pipeline in action! (2:47)
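The Write-Audit-Publish flow described in this section can be sketched end to end with an in-memory SQLite database. This is a minimal illustration under stated assumptions: the table names, columns, and functions are hypothetical, and the two checks only mimic SodaCL-style checks (row_count > 0, missing_count(amount) = 0) in plain SQL rather than calling Soda itself. In the course, the staging and production layers live in Snowflake and the audit step is run by Soda.

```python
import sqlite3
from typing import List, Tuple


def audit(conn: sqlite3.Connection, table: str) -> List[str]:
    """Run simple quality checks on the staged table; return the names of failed checks."""
    failures = []
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if row_count == 0:
        failures.append("row_count > 0")
    (null_amounts,) = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE amount IS NULL").fetchone()
    if null_amounts > 0:
        failures.append("missing_count(amount) = 0")
    return failures


def write_audit_publish(conn: sqlite3.Connection, rows: List[Tuple]) -> List[str]:
    # Write: land the new batch in a staging table that downstream consumers never read.
    conn.execute("DROP TABLE IF EXISTS staging_transactions")
    conn.execute("CREATE TABLE staging_transactions (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO staging_transactions VALUES (?, ?)", rows)
    # Audit: check the staged data before anyone downstream can see it.
    failures = audit(conn, "staging_transactions")
    if failures:
        return failures  # nothing is published; production keeps its last good data
    # Publish: move the validated batch into the production table and commit.
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS transactions (id INTEGER, amount REAL)")
        conn.execute("INSERT INTO transactions SELECT * FROM staging_transactions")
    return []


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(write_audit_publish(conn, [(1, 10.0), (2, None)]))  # fails the missing_count check
    print(write_audit_publish(conn, [(1, 10.0), (2, 25.5)]))  # passes and is published
```

The key property is that a failed audit returns before the publish step, so consumers of the production table never see the bad batch.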

Requirements

Prior experience with Python
Access to Docker on a local machine
A Google Cloud account with a billing account (for BigQuery)

Description

Welcome to the Complete Hands-On Introduction to Airbyte!

Airbyte is an open-source data integration engine that helps you consolidate data in your data warehouses, lakes, and databases. It is an alternative to Stitch and Fivetran and provides hundreds of connectors, mainly built by the community.

Airbyte has many connectors (300+) and is extensible: you can create your own connector if the one you need doesn't exist.
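To give a feel for what "creating your own connector" involves: Airbyte connectors communicate with the platform through JSON messages written to standard output, as defined by the Airbyte Protocol. The sketch below is a deliberately tiny, hypothetical source that emits RECORD messages and answers a check command; it skips spec, discover, configuration, and state handling, and the stream name and data are made up. For a real connector, the official Airbyte connector development tools (such as the Python CDK) are the intended route, and the protocol documentation is the authority on message fields.

```python
import json
import sys
import time
from typing import Dict, Iterable, Iterator


def read_records(stream: str, rows: Iterable[Dict]) -> Iterator[str]:
    """Yield one Airbyte Protocol RECORD message (a single JSON line) per row."""
    for row in rows:
        message = {
            "type": "RECORD",
            "record": {
                "stream": stream,
                "data": row,
                "emitted_at": int(time.time() * 1000),  # milliseconds since the epoch
            },
        }
        yield json.dumps(message)


def check() -> str:
    """Answer the check command; a real source would first try to reach its API or database."""
    return json.dumps({"type": "CONNECTION_STATUS", "connectionStatus": {"status": "SUCCEEDED"}})


if __name__ == "__main__":
    command = sys.argv[1] if len(sys.argv) > 1 else "read"
    if command == "check":
        print(check())
    else:
        # Hypothetical hard-coded data standing in for an API or database call.
        for line in read_records("customers", [{"id": 1, "name": "alice"}]):
            print(line)
```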

In this course, you will learn everything you need to get started with Airbyte:

What Airbyte is, where it fits in the data stack, and why it is helpful to you.
Essential concepts such as source, destination, connections, normalization, etc.
How to create a source and a destination to synchronize data with ease.
Airbyte best practices to efficiently move data between endpoints.
How to set up and run Airbyte locally with Docker and Kubernetes.
Build a data pipeline from scratch using Airflow, dbt, Postgres, Snowflake, Airbyte and Soda.

And more.

At the end of the course, you will fully understand Airbyte and be ready to use it with your data stack!

If you need any help, don't hesitate to ask in the Q&A section on Udemy. I will be more than happy to help!

See you in the course!

Who this course is for:

Data Engineers
Analytics Engineers
Data Architects

Course content by section

Welcome! (4 lectures • 8min)

Airbyte Fundamentals (6 lectures • 16min)

Getting started with Airbyte (11 lectures • 47min)

Advanced Concepts (9 lectures • 46min)

The Fraud Project (20 lectures • 1hr 16min)

Airbyte for Python users! (4 lectures • 28min)
