
Set up prerequisites and a virtual machine, then orchestrate data loading from Airflow to Snowflake using Airflow operators, Snowflake Python API, and Talend, including S3 to Snowflake transfer and scheduling.
Learn to install and configure Visual Studio Code, install the remote development extension, and connect to the Airflow server to access and run code from the mounted Airflow folder.
Configure Snowflake connection details across Airflow, Talend, and Python projects by updating account, username, password, and region in multiple config files to enable ELT workflows.
Airflow introduces a batch oriented orchestration framework using Python to build scheduled data pipelines as directed acyclic graphs with rich scheduling and an open source web interface for monitoring.
Explore Airflow architecture coordinating python-based data workflows stored in a bag folder, scheduled by the flow scheduler, executed by flow workers, and monitored via metadata db and web server.
Start the flow server and scheduler in Airflow, connect to the metadata database, and verify the first job entry. Learn how tasks are extracted and scheduled based on dependencies.
Explore how start date and schedule interval drive Airflow DAG execution, including first run timing, daily intervals, and what null intervals and end dates mean for stopping or delaying runs.
Learn how to load data from Airflow to Snowflake using Airflow, and how to copy and process data with Snowflake Operator in upcoming lectures.
Offload data from Airflow to Snowflake by using a Snowflake operator in a DAG to create, load, and drop a table with templated names.
Create a Snowflake connection in Airflow, run a DAG to create a test table, insert data, and drop it, then review the Snowflake operator output and history.
Demonstrates using a Snowflake hook in an Airflow ELT workflow to create a table, insert data, and fetch a raw count, printing results to logs and guiding task flow decisions.
Copy data from a three location stage to a Snowflake table using the three to Snowflake operator, then process and load it back, filtering by city.
Explore the Snowflake Python API fundamentals, including connection and cursor objects, methods like execute and fetch, attributes, exception handling, bindings, and integration with Pandas and Talend.
Connect to Snowflake using the Snowflake Python connector, create a connection and cursor, run select current_version(), and note that credentials will be stored separately in a future lecture.
Learn how to securely manage Snowflake credentials by storing them in a separate credentials json file, reading them with a helper, and establishing a connection to obtain a cursor.
Explore cursor attributes in Snowflake, including the query and the description of the statement, the current connection, and how to use the describe option to fetch metadata without executing.
Learn to execute asynchronous queries in Snowflake, monitor progress with is still running and get query status, and fetch results using fetch all, fetch one, and get results from cursor.
Learn to execute asynchronous statements using a context object's cursor and connection, monitor query status with get query status through if error, and handle results across new connections.
Learn to use the connection object's is in error and get query status methods to manage errors and update the audit table during Snowflake queries.
Explore pyformat in Snowflake: use q mark binding, numeric binding, and Python format bindings to parameterize queries, insert data, and run in clause and select statements.
Explore numeric and q-mark bindings in Snowflake, compare BI format to q-mark binding, and learn to insert data using parameters with proper date-time mapping and prepared statements.
The lecture demonstrates how the cursor's describe method returns a result metadata object, detailing column attributes, type codes, and internal sizes, and shows how to access them via indexing.
Explore how a result metadata object uses a type code to indicate a column's data type. It discusses values 0 and 2 and their link to the description method.
Explore how a context manager controls Snowflake transactions by wrapping multiple statements, automatically rolling back on error when auto commit is off.
Learn to inspect and materialize Snowflake get result batches, select a batch and chunk, and use properties like compressed size and uncompressed size to gauge data size.
Fetch pandas batches demonstrates returning Snowflake query results as pandas dataframes in batches, using an iterator to iterate over each batch and assemble the full dataset.
Learn how to convert an arrow result batch back to a pandas dataframe or an arrow table using two dedicated methods, with a quick practical example.
Explore dictionary cursor usage in Snowflake, querying data by column name or index, and managing connections, transactions, and exceptions in an ELT workflow with Airflow, Python, and Talend.
Copy data from S3 to Snowflake using a Python program that builds copy commands from a parameter file. It captures rejects and audits each load with a copy audit table.
Submit copy commands in parallel from s3 to snowflake using async execution, monitor status, and audit rejected records for each copy command.
Explore Talend as an open source etl tool and its data integration, data preparation, governance, and big data capabilities, including cloud data fabric, Gartner leadership, and industry adoption.
Explore the Talend workspace for Snowflake ELT workflows, including job designs, variables, routines, templates, and metadata connections; learn drag-and-drop components, run, document, and export capabilities.
Set up the Talend studio and create a first job. Configure a Snowflake elt pipeline that loads a local employee file after defining the input schema.
Build and run a Snowflake elt job from the command line, parameterizing the connection context. Generate executable files for Windows and Linux and verify data loading.
Schedule and run a talent job with Airflow by unzipping the build and loading data into a table using Python and Bash operators. In production, GitHub and Jenkins deploy jobs.
A non-technical analyst requests a simple, automated method to load files from S3 into Snowflake without manual copy commands, while the developer ensures command sequencing, cost auditing, and failure tracking.
Demonstrates copying data from S3 to Snowflake with a jar app, using connection and job parameter files to specify table, stage, file format, and rejects threshold.
Explore how an elt app demo uses Snowflake connection details and job parameters with GitHub stored queries to load data into a control table and execute the queries.
Develop generic dag code for Snowflake ELT workflows by standardizing job-scoped parameter folders, naming tasks like copy_task and easier_task, and replacing hardcoded commands with reusable python-driven commands.
Learn to build generic DAG code in Airflow using a Python-based task builder, replacing bash operators with functions, and balance generalization with default arguments to create flexible workflows.
In our previous course Snowflake Masterclass [Real time demos+Best practices+Labs] we deep-dived and understood the fundamentals of snowflake, solved lot of assignments, and understood best practices to load data and unload data.
Also, we closely evaluated most of the snowflake features understanding how they work under the hood. Through these discussions, you realized how to use Snowflake efficiently.
There was one missing piece, How to build and orchestrate ETL workflows on Snowflake. This course is just about that.
In this course, we are going to learn,
Build workflows in Airflow.
We will leverage Talend capabilities to build generic code to ingest data and process data in snowflake.
We will build audit tables and record every command we fire on the snowflake. We will record the time consumed for each task and capture snowflake credits.
Once we build the framework we will build a workflow to process and transform 250 + GB volume of NYC traffic data.
At last, we will connect the Snowflake with python and write code to capture stats of data we loaded to the snowflake.
you will also get access to preconfigured Jupyter notebook to run your python code on the Snowflake.
If you have previously not worked with Talend, Airflow and Python don't worry they are very simple tools I will provide the necessary introduction.
I am sure you will learn a lot from this journey. See you in the course!!