
Discover how this course helps data engineers, analysts, and scientists build clean, automated, production-ready pipelines using Snowflake for speed and DBT for version-controlled coding, with hands-on resources.
Welcome to the Course: Snowflake & dbt – A Hands-On Guide
In this introduction, you’ll get an overview of what this course covers and how it will help you master data ingestion and transformation using Snowflake and dbt.
We’ll go step by step, starting with ingesting data from AWS S3 and a local drive, creating a database, schema, and stages in Snowflake, and using SnowSQL. Then, we’ll move on to dbt, where you’ll learn to build and transformations.
This course is fully hands-on, meaning you’ll be working with real data side by side with me.
What You Will Be Able to Do After This Lecture:
Understand the structure and objectives of this course.
Gain real Snowflake and dbt knowledge ready to be used in real life projects.
Gain a clear roadmap of how the course progresses from data ingestion to transformation.
Feel confident about what you’ll achieve by the end of the course.
Let's dive in ! ?
Learn how to set up a Snowflake account, start a 30-day free trial with $400 compute credit, and ingest data from S3 buckets by selecting a cloud provider and region.
Explore Snowflake's new workspaces, where worksheets live inside private workspaces, organize SQL files, and run queries with Ctrl+Enter, while tracking query history.
Set up your snowflake workspace, understand roles and privileges, and learn to use warehouses and worksheets to run queries and manage objects in the UI.
Create a demo database, a demo schema, and a large warehouse, showing how objects live in schemas inside databases and how to grant privileges and set up a stage.
Create an external stage in the demo schema to connect to a weather S3 bucket, list files, and set the json file format for copying into a Snowflake table.
Explore the variant data type to store and analyze JSON from semi-structured data, create a variant column, copy data from stage into a table, and inspect JSON payloads.
Extract fields from a JSON payload in Snowflake using variant to populate a weather table with city, coordinates (lat, lon), clouds, humidity, pressure, temperature, and timestamp.
Master copying json data from a stage into a Snowflake table using copy into, navigate json payloads in a variant column, and populate a weather table end-to-end.
Learn to load semi-structured json into the pets table in Snowflake by creating an external stage, inspecting the json payload, and using a copy into command to map fields.
Explore managing semi-structured data in Snowflake by creating a database and schema, configuring an external stage with a JSON file format, and loading selective fields into a table.
Learn to ingest city bike trips into Snowflake with SnowSQL, unzipping 2023 CitiBike data from zip files and loading it via an internal stage.
Push local csv files to a Snowflake internal stage with the put command, after creating the stage and selecting database and schema, using Windows path handling and wildcards.
Create a role analyst, grant it to a user, and assign read and write access to a stage while navigating database and schema permissions in Snowflake.
Check your Snowflake stage file structure before loading by reviewing uploaded files, overwrite option, define a 13-column schema, and prepare a final copy into a table.
Create a final Snowflake table, load 35 million rows with a string-based base layer, and handle errors with onerror options before moving to dbt-driven transformations.
Learn to automate Snowflake SQL tasks with scheduled cron expressions. Create tasks in a schema, attach a warehouse, and run staged data loads into target tables.
Learn to create a child Snowflake task that runs after a weather task, configure a warehouse, load data with copy into the bike table, and view DAG and run history.
Manage Snowflake tasks with alter task resume, show tasks, execute task, and task history to control schedules, predecessors, and manual runs.
Complete Snowflake section one to define the sources dbt will build upon, establishing a foundation for the workflow. If you've defined the bike and weather tables, you can skip.
Explore dbt, data build tool, turning SQL into models and materializing tables or views, revealing lineage from Snowflake and integrating version control, tests, and yaml with dbt core and cloud.
Create a dbt cloud account and link your Snowflake data warehouse to GitHub by configuring credentials, verifying your email, and setting up the repository.
Link your GitHub account to dbt cloud, configure Snowflake credentials, and create dev and prod environments with separate schemas to safely develop dbt models.
Link your dbt cloud to Snowflake, initialize a dbt project, and run all models to materialize tables in your development environment; or target a specific model with dbt run --select.
Explore how Jinja in dbt uses expressions, variables, for loops, and macros to enhance SQL, and how compiled and run folders, plus the ref function, drive model execution.
Learn how dbt materialization precedence works, with tables vs views, and how model-level config overrides project-wide defaults to determine the final materialization.
Create a sources.yaml file to define data sources for traceability and lineage, referencing demo.bike and demo.weather in a snowflake schema, and clarify distinctions between sources and models.
Master dbt's ref and source functions to trace model lineage in Snowflake, with hands-on examples and practical steps for making your first commit.
Use Snowflake date functions to extract date, hour, and day name from timestamps, classify days and seasons with case statements, and build a date dimension in dbt.
Refactor the station dimension to replace the station year calculation with a dbt macro from a date utils macro that outputs the season from a date string.
In this macro refactor walkthrough, learn to convert dbt logic into reusable macros like get season and day type, using double curly brackets and raw sql to reuse across models.
Apply snowflake window functions to select the daily weather category with the most occurrences using row_number and qualify, then compute and round averages for temperature, pressure, humidity, and clouds.
Build a station dimension in dbt by extracting start station name, id, latitude, and longitude from bike data using a CTE and distinct values.
build the trip fact table in dbt, convert id fields to timestamps, derive trip date, include start and end station ids, and compute trip duration with timestamp diff.
Create a dbt model weather_bike_correlation by joining the trip facts with the daily weather table using a left join and a CTE, previewing results and addressing year mismatches.
Replace the 2023 Citi Bike data with 2018 data to fix the year mismatch with the weather dataset. If you’d rather, jump straight to the prod deployment video.
Create a new Snowflake stage, load CitiBike 2018 trip data, adapt to a schema change by mapping columns, and perform a 35 million-row copy into the bike table.
Create a staging bike model to align 2018 data with current schema, update dependent models, and run dbt to push corrected analytics to production.
Merge corrected models from dev to main via a pull request, then run a prod dbt deploy job that materializes models in the prod schema for end users.
Automate the end-to-end workflow from Snowflake ingestion to DBT job runs, using weather and bike tasks, scheduling on free tier, and explore API-driven and full orchestration with Airflow or Lambda.
wrap copy into weather and bike tasks, qualify the demo schema, assign a warehouse and cron schedule, then resume and order tasks in a dag to prepare the dbt job.
Schedule dbt jobs with cron in utc, chaining weather and bike tasks, and set prod to run one hour after weather; configure deploy jobs to use cron-based daily timing.
Ingest data into Snowflake from S3 and local CSVs via NoSQL CLI tool, work with semi-structured JSON, and build dbt models to deliver clean data for Power BI or Tableau.
Configure a Snowflake external access integration to trigger dbt cloud tasks via API. Configure the network rule and secret API token, and weave it into the workflow.
Explore Snowflake functions, comparing inline and script-based definitions, with multi-language support and Python runtime, then deploy via stage uploads or inline code using a handler.
Build a complete automation that triggers dbt cloud from a Snowflake dag by creating a Snowflake function, using an external access integration and python code to call the API.
Learn to connect Snowflake to a private AWS S3 bucket with a storage integration, replacing public stage access and configuring the bucket, allowed locations, and AWS role ARN.
Create an AWS role for a Snowflake storage integration to access a private S3 bucket, configure external ID and trust policy, and create a Snowflake stage using the storage integration.
Explore setting up a storage integration with Azure to access private blob containers from Snowflake, including tenant ID configuration, endpoint setup, and granting blob data contributor rights.
Learn to configure a GCS storage integration in Snowflake, including bucket setup, permissions, and the service account IAM. Then create a Snowflake stage and access the GCS bucket data.
Explore dbt core on your laptop to ingest bitcoin data from a public s3 bucket into Snowflake, using source freshness, tests, and seed incremental models.
Create a Snowflake external stage to connect AWS S3 Bitcoin blockchain data using parquet files, with a BTC database and schema, and prepare for dbt modeling.
Understand bitcoin data by outlining blocks and transactions, and prepare a parquet-based dataset from a stage as a source for dbt transformations, including transaction and block hashes.
Finish setting up the bitcoin table as a dbt source and copy into data using a regular expression pattern to retrieve only the current day’s blocks from snappy parquet files.
Wrap your copy into a Snowflake task named Bitcoin load task, scheduled every two hours in the Bitcoin schema to load the latest blocks into the Bitcoin table for dbt.
Install Python and set up a virtual environment. Then install dbt core and the Snowflake extension and verify with dbt version.
Set up git on your command line by downloading and installing it for Windows or Mac, and ensure it is in your path, install Visual Studio Code as your IDE.
Install and explore visual studio code, set up a project workspace, manage files and folders, use the integrated terminal, and customize with extensions for efficient coding.
initialize a dbt project and set up a snowflake connection by creating a profiles file, then activate the virtual environment and make the first git commit.
learn to push local changes to a GitHub repository by linking a remote origin, configuring master and a new dev branch, and authenticating during remote pushes.
Install the dbt core extension in VS Code, configure the virtual environment Python interpreter, and view model lineage in Snowflake to explore relationships between dbt models.
Explore the core DBT project files, create a sources YAML file, and define a BTC source with database, schema, and tables, then test with a staging BTC query.
Learn to use dbt source freshness to verify BTC data freshness by defining a freshness block with a timestamp column and loaded_field, and set warn after and error after thresholds.
Learn a quick overview of DBT tests and implement a first simple test for the staging BTC model, including unique and not null checks using a schema yaml configuration.
Learn how incremental models in dbt optimize large projects by loading only new data, using is_incremental and max timestamp filters, with append and merge strategies in Snowflake.
Implement an incremental model with the merge strategy in dbt for Snowflake, configuring materialization to incremental and using a unique hash key to insert or update rows.
Explode transaction outputs in snowflake using flatten and lateral to unnest the outputs variant into separate rows, extracting address and value with proper typing, for Bitcoin transactions.
Transform a previously exploded model into an incremental model using the append strategy, applying the incremental macro to staging btc outputs and tracking changes with a where clause.
Build ephemeral dbt models to filter bitcoin transactions and power a whale alert through downstream models. Use a staging BTC transactions layer and CTEs, following the DRY principle.
Clean the workspace to enforce dbt project organization with staging and marts. Consider ephemeral models from the video, but use views for large datasets and explore bitcoin data for dashboards.
Upload csv data into Snowflake with dbt seed to enrich whale alert bitcoin transactions with usd price from CoinGecko, then debug and validate total_sent_usd via joins.
Refactor code to create a btc_utils macro in dbt for btc to usd conversion, using current date price and jinja syntax to make the calculation reusable across models.
Materialize your dbt models into a production Snowflake schema by updating profiles to prod, seeding data, and running targeted dbt commands to deploy from dev to production.
Explore how the manifest.json file acts as a compiled blueprint of a dbt project, detailing models, sources, tests, configs, and the project state.
Walks through creating a new dbt profile for a new team member, adding a distinct snowflake schema and git branch, and planning to reuse prod data while avoiding full reruns.
Learn to use the defer argument in dbt to run models against production artifacts without materializing upstream models, leveraging a prod manifest.json and a state folder.
Demonstrate using dbt clone to duplicate a project with zero copy cloning in Snowflake, using a separate manifest.json and state for safe, pointer-based clones.
Compare the defer flag and the dbt clone command to manage models, highlighting how defer avoids data drift while clone allows editing upstream models and saves recomputation.
Master python dbt models in snowflake with snowpark, define a dbt function, and materialize as tables or increments using pandas for json array explosions.
Walks through refactoring a python dbt model to convert snowpark data frames to pandas, explode the outputs column, json normalize, and extract address and value.
Discover a practical workaround to debug Python dbt models without print support by raising an exception to reveal the pandas data frame type and diagnose issues.
Explain DBT packages and their reusable macros, highlight the DBT packages hub and the dbt_utils package, and show how to declare and install packages via packages.yaml.
Learn how to use dbt utils macros, including star and equal row count, to select upstream columns, run tests, and validate row counts across models.
Learn to generate dbt documentation using the dbt docs generate command, which creates a catalog.json and serves a local docs site to view project lineage, sources, and models.
Explore dbt exposures by building a Looker Studio dashboard from a Snowflake data source and exposing it via dbt, then view the dashboard lineage in docs.
implement a custom dbt generic test for the whale alert model to verify bitcoin addresses begin with 1, 3, or bc1, using a crypto utils macro and yaml configuration.
Dbt contracts guarantee that the whale alert data stays stable by enforcing column data types and constraints, protecting Looker dashboard consumers from breaking changes.
Learn how to version dbt models to balance evolving logic with stable contracts for downstream consumers, using v1 and v2, deprecation dates, and latest-version views.
Set up key pair authentication for snowflake with dbt by creating a private/public rsa key pair, assigning the public key to a snowflake user, and preparing github actions integration.
Set up continuous integration for a dbt project on Snowflake with GitHub Actions, using private key authentication, profiles YAML, and a workflow that runs dbt debug on pull requests.
Review project two materials and solidify topics needed for the DBT analytics certification, then start the final project: analyze the Ethereum blockchain to extract whale activity data.
Apply Ethereum data analysis in Snowflake by loading parquet stage, parsing transactions, and building a whale model to identify recipient addresses with at least 100 Ethereum.
Examine how dbt logs document each command with an invocation ID, revealing info and debug lines, log levels, and tests for not null and uniqueness in Snowflake workflows.
Discover how to use dbt query comments in Snowflake, append comments after queries, and review query history in information schema to verify what ran.
Apply dbt hooks, including pre and post hooks, to run SQL around model materialization and learn how to add comments to Snowflake tables.
Explore native dbt jinja functions to power an audit log table, pulling invocation id, run started at, and invocation args dictionary to illuminate dbt executions.
Create an audit table in Snowflake using a dbt macro with Jinja variables, inserting invocation id, run started, profile, user, and version via dbt run operation and run_query, then commit.
Master dbt audit trails by using an on run end hook and the log run macro to populate a centralized audit table with invocation IDs and row-level logs.
Learn to enhance dbt audit logging by leveraging the results object to capture invocation id, model, status, and execution time, with run results json and on run and hook.
Discover how to use run results artifacts with dbt retry to re-run only the failing node by inspecting the run results JSON and invocation IDs.
Learn to track historical data with dbt snapshots using the timestamp strategy, updating with updated_at and recording previous and current rows in a snapshot table.
Discover Snowflake features like snowpipe, streams, merge, and tasks, plus time travel, cloning, and failsafe. Learn to load data to S3, share accounts, and apply column masking and row-level security.
Discover how Snowpipe enables continuous data ingestion from a private S3 bucket via an external stage, with storage integration and S3 events triggering near real-time loads of JSON data.
Set up a Snowflake storage integration and AWS trust policy to enable Snowpipe events in S3, linking an external stage, S3 bucket, and IAM role.
Configure Snowpipe with auto ingest to load new S3 files into the heroes table via a Snowflake pipe, using S3 event notifications and an SQS queue for near real-time ingestion.
Master the snowflake merge command for upserting data by combining inserts, updates, and deletes in one statement, using a json payload from raw superheroes table to a processed superheroes table.
Explore Snowflake streams and change data capture, attach streams to tables, and track inserts, updates, and deletes. Learn how to consume stream data into a target table using merge.
Consume the stream to keep the superheroes target table in sync with the source, using a Snowflake merge to handle inserts, updates, and deletes based on stream metadata actions.
Create and schedule a Snowflake task that runs only when a stream has data, enabling event-driven, cost-efficient merges from the stream into a table via a Snowpipe workflow.
Learn how Snowflake time travel lets you recover past data within the retention window, using before a statement, an offset, or a timestamp to access historical states.
Explore snowflake time travel and retention time checks with show tables and show tables history. Filter by retention and name, and alter table to set retention to 30 days.
Learn zero copy cloning in snowflake using the clone command to copy databases, schemas, or tables with no storage increase until modifications, and clone from the past with time travel.
Understand failsafe as the seven-day, non-configurable window after time travel, where recovery is only possible via a Snowflake support ticket on a best-effort basis as last resort.
Export data from Snowflake to S3 by unloading a table into a bucket using a storage integration and an unload stage, with CSV format options.
Learn how to customize Snowflake copy into exports to S3 by exporting CSV or JSON with format options, headers, compression, and using object_construct to create JSON variants.
Explore Snowflake secure data sharing, sharing data between accounts without copying, using shares and privileges, with commands to create shares, grant usage and select, and consume via inbound shares.
Learn how to use Snowflake masking policies to hide a sensitive power column, create a squad commander role, apply the mask to the superheroes table, and test access by role.
Implement row level security in Snowflake using row access policies and a mapping table to filter superhero data by franchise, granting Marvel manager access to Marvel rows only.
Apply row level security and data masking to a five-row characters table in a Snowflake dbt hands-on exercise, using roles north watch, commander, and small council.
Explore role based access control in Snowflake, focusing on the role hierarchy and how to grant and revoke roles to assign privileges across analysts and managers.
Learn how Snowflake RBAC attaches privileges to roles and grants those roles to users, and explore direct grants and secondary roles in the Snowflake dbt hands-on guide.
Create professor x as a user, grant the squad commander role, and provide usage and select access on heroes db, including future schemas, tables, and views.
Discover how Snowflake RBAC assigns roles like security admin, user admin, sysadmin, and account admin to grant access, create databases, warehouses, tables, and views, and show grants.
Practice securely dropping AWS resources, deleting the Udemy role, cleaning S3 buckets used for Snowpipe JSON ingestion, and dropping course databases to prevent incurred costs.
Explore snowflake cache layers: remote disk storage, warehouse local disk cache, and the result cache that lasts 24 hours, and learn how warehouse sharing enhances cache reuse for faster queries.
Discover how snowflake stores data in micro partitions, each 50mb to 500mb uncompressed, with per-partition max/min metadata, enabling selective scans and clustering to rewrite partitions and speed up queries.
Explore micro partitions and clustering by order priority, showing how clustering reduces partition scans and speeds queries, with notes on reclustering implications.
Want to build real, production-ready data pipelines with two of the most in-demand tools in the modern data stack ? This all-levels, project-based course takes you from the basics to advanced workflows with Snowflake and dbt (data build tool).
With over 11 hours of content, you’ll not just learn the concepts — you’ll apply them step by step in real projects, including a Bitcoin blockchain data pipeline designed to mirror real-world challenges.
In this course, you will:
Set up and manage Snowflake environments (databases, schemas, stages).
Ingest data from local files, S3 buckets, and external sources.
Write modular SQL with dbt models, CTEs, and window functions.
Apply ELT best practices and build maintainable data models.
Use dbt contracts, versioning, and generic tests for reliability.
Set up CI/CD with GitHub Actions and key pair authentication.
Optimize Snowflake with caching, micropartitions, and clustering.
Prepare data for analytics tools like Power BI.
This course is packed with real-world use cases, code walkthroughs, and tips I’ve implemented in production environments. By the end, you’ll be confident in building scalable, maintainable data pipelines — skills you can immediately apply in your current role or future projects. These lessons are designed to give you both technical expertise and practical confidence when working with modern data tools.