
Navigate the Snowflake Workspaces UI to organize projects, create and rename workspaces, manage files and folders, and securely reference and run SQL scripts from the workspace.
Manage multiple Snowflake workspaces, move or copy files between ETL_DEV and ETL_QA, and use migrate.sql to create a database, schema, table, and rows.
Navigate the workspace sql editor in learn workspaces, rename explore.sql to iexplore.sql, and run semicolon-separated queries. Set role and warehouse, query tpch_sf1 customer and supplier data.
Explore how to use the Snowflake results window to search all columns, perform partial and unique-value searches, select columns, review query details, and download results as csv.
Learn to use Snowflake workspaces to view multiple query results in a split, tabbed window. Pin and unpin tabs to lock results and keep earlier data while you run queries.
Explore how to compare data before and after updates in Snowflake using Workspaces split results pane for side-by-side viewing, with pinning and active window behavior.
Practice splitting the sql editor to compare two files side by side, rename duplicates, run independent queries, and merge or resize splits to review code versions efficiently.
Understand the four Snowflake cost types: compute, storage, cloud services, and data transfer, and how credits, editions, and storage plans influence your cloud data costs.
Execute a lab that creates the HRMS database and HR schema, drops existing tables with if exists, and recreates seven tables, then verifies them with show tables.
Populate the HR data model tables by executing inserts after setting the HRMS database context and schema, truncating each table to avoid duplicates, and verifying counts with a final union.
Explore how columnar storage, used in data warehouses and big data analytics, delivers faster queries, better compression, and efficient parallel loading by storing data column-wise.
Explore Snowflake architecture across database storage, query processing, and cloud services, including micro-partitions, virtual warehouses, AWS S3 storage, and authentication, access control, and security.
Master storage integration, file formats, and stages in Snowflake and apply labs to create a storage integration, a csv file format, and a stage for loading data.
Create a storage integration to connect Snowflake with AWS S3, control access via allowed and blocked locations, use an AWS role ARN, and validate connectivity for loading data from learn2cloud-snowflake.
Learn how a Snowflake file format stores metadata about data files, dictates parsing for the COPY command, and supports CSV, JSON, AVRO, ORC, PARQUET, and XML.
Learn how CSV file formats in Snowflake handle field delimiter, record delimiter, headers, encoding, and error controls, enabling schema evolution and resilient data loads.
Create a CSV file format in Snowflake tailored to the AWS file, with delimiter comma, no header, NULL_IF handling for SQL nulls, double-quoted fields, and trim space, stored in hrms.etl.
Leverage a Snowflake stage as a temporary storage for data files loaded with COPY; external stages link to cloud storage, internal stages reside in Snowflake, and both can unload data.
Explore how storage integration connects AWS and Snowflake, configure roles with a trust policy, and use an optional external ID. Learn about AWS storage access, stage concepts, and file formats.
Define data latency as the time from source change to its availability in target tables or data warehouse. Learn how generation, extraction, and load timing, plus pauses, drive architectural solution.
Explore Snowflake copy options, including on error set to continue, to load all valid rows while rejecting bad ones, and review other copy options through a lab exercise.
Load data from AWS into Snowflake using copy into a target table from a CSV stage; learn force option to reload data and why default behavior prevents duplicates.
Load a subset of file columns into a Snowflake table with COPY and a SELECT, align fields by position, optionally include file names, and support auto-increment surrogate keys in warehousing.
Learn how to use the copy option files to load only the relevant file from a stage with two files, avoiding duplicates and preserving production access constraints.
Explore how the COPY option TRUNCATECOLUMNS=TRUE prevents load failures by truncating data that exceeds column length. The demo contrasts a false setting with a successful, truncated load into EMPLOYEES_COPY_TRUNCATECOLUMNS.
Learn how Snowflake's copy on_error options, from abort_statement to continue and skip_file, control load behavior and rollback.
Learn how skip_file_% dynamically adjusts tolerated errors based on file row counts in Snowflake's copy command, improving on skip_file_num by using a percentage value in quotes to set adaptive thresholds.
Explore how copy with on_error=continue enables loading all good rows from Redshift-to-Snowflake migrations, with no error limit, while rejected rows are left for later analysis and reprocessing.
Learn how the copy option size_limit in Snowflake loads files in chunks, with at least one file loaded before checking the limit, to manage large migrations.
Demonstrates loading data from S3 into Snowflake without a stage by using AWS keys in the COPY statement, with FILE_FORMAT and a target table EMPLOYEES_NOSTAGE.
Use the copy statement to transform data during load with a select, including case, cast of salary, and try_cast, while controlling errors and applying multiple copy options.
Teach non-technical users to load data into Snowflake via Snowsight Web UI Wizard, supporting CSV, JSON, ORC, Avro, Parquet with a 250 MB limit, creating a table and viewing SQL.
Explore data latency and three loading modes—bulk, near real time, real time—and copy options for Avro and JSON formats, plus Snowsight's add data wizard for loading from S3 and stages.
Discover how Snowflake handles semi-structured data with JSON, Parquet, Avro, ORC, and XML, enabling flexible loading into Snowflake from data lakes.
Explore JSON as a human-readable data exchange and storage format built on key-value pairs, with nested structures and wide use in IoT data transmission.
Explain the three semi-structured data types in Snowflake, object, array, and variant, and demonstrate how to create, query, and access nested data stored in these types.
Configure AWS S3 event notifications to trigger Snowpipe for csv files with a prefix and optional suffix, and use alter pipe refresh and copy history to load and verify data.
Learn how Snowpipe restricts copy options, how ON_ERROR choices such as SKIP_FILE or CONTINUE affect loads, and why FILES, FORCE, PURGE, and SIZE_LIMIT are not permitted.
Data engineers troubleshoot snowpipe failures by creating an error table and a dedicated pipe. They surface all issues with information_schema.validate_pipe_load, fix the CSV data, and re-upload for successful load.
Monitor snowpipe costs by analyzing credits used and files inserted in information_schema.pipe_usage_history (14 days retention). Use date_range_start and date_range_end to review recent activity, lag up to 3 hours.
Learn to track snowpipe costs with information_schema.pipe_usage_history, reviewing credits_used, bytes_inserted, and files_inserted. Note information_schema stores 14 days; for older data or pipe-specific details, use account_usage.pipe_usage_history.
Leverage Snowpipe, a serverless, continuous ingestion service that auto loads files into Snowflake on arrival. Supports csv, json, parquet, ORC, xml and provides latency and error checks.
Monitor Snowflake tasks in real time with information_schema.task_history and manage dependencies and schedules, plus manual execution using execute task to ensure reliable, non overlapping runs and automatic retries.
Create and manage a Snowflake task named TASK_CREATE_TABLE in the COMPUTE_WH warehouse with a 10-minute schedule to run a SQL statement that creates SCHEDULE.EMPLOYEES and verify task details.
Learn how to create and run a Snowflake task without a warehouse, using serverless compute by default, and adjust initial compute size with user_task_managed_initial_warehouse_size for compute-intensive tasks.
Discover how to execute multiple sql statements in a single Snowflake task using anonymous blocks, stored procedures, or multiple tasks, with examples creating and querying employees tables.
Discover how to document task dependencies in Snowflake using the task_dependents information schema, capturing parent and child relationships, and use recursive true to reveal the full DAG tree.
Use merge with a task to update or insert into the employees_dim_merge table, using hr.employees as the source, and track changes with last_updated_ts and last_inserted_ts.
Discover three stream types in Snowflake: standard (default) tracks all dml changes, including inserts, updates, deletes, and truncates; append-only captures inserts only and is faster; insert-only applies to external tables.
Explore merging data from more than one table by joining the EMPLOYEES stream with the DEPARTMENTS table and merging into the target EMP_DEPT_DIM_CDC using a merge statement.
Learn how streams capture inserts, updates, and deletes, propagate changes to a target via merge, create table, or insert, and use metadata$action and metadata$isupdate to show before and after values.
Learn how data unloading transfers data from Snowflake tables to AWS S3 and why it matters. Explore use cases, features, and labs, including JSON, Parquet formats and partitioned unloading.
Export data from Snowflake to external storage such as Amazon S3, Google Cloud Storage, or Azure Blob, enabling applications, data feeds via SFTP, and ad-hoc sharing with business users.
Unload Snowflake data to S3 as csv using a dedicated stage and file format, control compression, set a file name prefix and overwrite option for reproducible file naming.
Unload Snowflake data to S3 as partitioned Parquet files to improve storage and query performance, and highlight Parquet format creation, stage setup, and copy unload.
Elevate your data engineering and cloud analytics skills with this comprehensive Snowflake Master Class, specifically tailored for professionals leveraging Amazon Web Services (AWS). This intensive course delves deep into the intricacies of the Snowflake Data Cloud, equipping you with the practical knowledge and hands-on experience to design, build, optimize, and manage robust data solutions on AWS.
Starting with a foundational understanding of Snowflake's unique architecture and its seamless integration with the AWS ecosystem, you will progress through critical data engineering workflows. Learn to efficiently ingest diverse data sources, including structured and semi-structured formats, utilizing powerful tools like Snowpipe for continuous data loading. Master automation techniques with Snowflake Tasks and track data changes effectively with Streams.
Explore Snowflake's innovative features for data management and resilience, including Time Travel and Fail-Safe, and gain a thorough understanding of various Snowflake table types and their optimal use cases. Discover the power of Zero-Copy Cloning for agile development and testing.
Crucially, you will learn how to secure your Snowflake environment with granular Roles and Access Controls and implement Dynamic Data Masking for sensitive information. Understand the principles and practicalities of secure Data Sharing both within and outside your organization.
The course further explores advanced topics such as optimizing query performance and managing costs effectively using Materialized Views and various tuning strategies. You will also learn techniques for Data Sampling and how to integrate with external data sources via External Tables. Finally, we will explore the exciting new capabilities of Dynamic Tables, Event Tables, Hybrid Tables, and Iceberg Tables, preparing you for the future of data management in Snowflake.
This Master Class culminates with a dedicated module focused on preparing you for Snowflake Data Engineer interviews, covering key concepts and practical scenarios. By the end of this course, you will possess the expertise to architect and implement sophisticated, scalable, and cost-efficient data solutions using Snowflake on AWS.
Course Topics:
Introduction to Snowflake: Overview of the Snowflake Data Cloud, its value proposition, and integration with AWS services.
Getting Started with Snowflake on AWS: Account setup, connecting via Snowsight, Workspaces UI and other clients, navigating the Snowflake interface.
Snowflake Architecture: Understanding Snowflake's unique multi-cluster shared data architecture, virtual warehouses, and cloud services layer.
Storage Integration with AWS S3: Configuring and managing external stages for seamless data access and loading from AWS S3.
Loading Data to Snowflake: Best practices and techniques for bulk loading structured data using COPY INTO statements.
Loading Semi-Structured Data to Snowflake: Efficiently loading and querying JSON, Avro, Parquet, and other semi-structured data formats.
Snowpipe: Implementing continuous data ingestion pipelines for real-time and near real-time data loading.
Tasks: Automating data processing workflows, scheduling SQL statements, and managing dependencies.
Streams: Tracking data changes in tables for efficient ETL/ELT processes and incremental updates.
Time Travel & Fail-Safe: Understanding and utilizing Snowflake's data recovery and historical data access features.
Snowflake Table Types: Deep dive into Permanent, Transient, and Temporary tables and their use cases.
Zero Copy Cloning: Leveraging instant, zero-cost cloning for development, testing, and disaster recovery.
Roles and Access Controls: Implementing robust security models using Snowflake's role-based access control (RBAC) framework.
Dynamic Data Masking: Protecting sensitive data with dynamic masking policies based on user roles.
Data Sharing: Securely sharing data with internal and external stakeholders without copying or moving data.
Materialized Views: Optimizing query performance by creating and managing materialized views.
Performance Tuning and Cost Optimization: Strategies for analyzing query performance, optimizing SQL, and managing warehouse costs.
Data Sampling: Techniques for extracting representative subsets of data for analysis and testing.
External Tables: Querying data directly from external storage locations like AWS S3 without loading.
Dynamic Tables: Understanding and implementing declarative data transformation pipelines with automatic refresh.
Event Tables: Capturing and analyzing event data within Snowflake.
Hybrid Tables: Exploring the capabilities and use cases of Snowflake's Hybrid Tables.
Iceberg Tables: Working with Iceberg tables in Snowflake for enhanced data lake functionality.
Snowflake Data Engineer Interview : Snowflake interview for those who are trying to get a Snowflake job and want to know what a Snowflake interview sounds like.