
Compare Airflow and Prefect as task orchestration tools for data pipelines, highlighting DAGs versus flows and the machine learning workflow from data collection to deployment and monitoring.
Experiment with a simple Python hello world flow in Prefect, deploy it on a client machine, and monitor its minute-by-minute execution and dashboard status via the Prefect UI.
Explore Prefect workflow documentation to learn core concepts like flows and tasks, and master deployment and scheduling via cron jobs, intervals, and dashboards.
Learn how compartments define a boundary in Oracle Cloud to isolate resources such as databases, VMs, and VCNs, and how to create or delete compartments with the removal rule.
Connect to the Oracle Autonomous Database via the web edition of SQL Developer, explore the Always Free tier options, and run sample queries such as selecting all rows from the customers table.
Learn how to retrieve data from an Oracle Autonomous Database using a Python script, including connection setup, cursor usage, and executing a select query to fetch table data.
Compare webhooks, MQTT, and WebSockets by communication style. Webhooks push events to a URL; MQTT uses a broker for publish-subscribe messaging in IoT; WebSockets enable bidirectional communication.
Automate GitHub issue events by deploying a webhook-driven workflow with Prefect, using a programmatic deployment, a work pool, and JSON parameters to trigger automated events.
Data engineering is the process of designing and building systems that let people collect and analyze raw data from multiple sources and formats. These systems empower people to find practical applications of the data, which businesses can use to thrive.
Companies of all sizes have huge amounts of disparate data to comb through to answer critical business questions. Data engineering is designed to support the process, making it possible for consumers of data, such as analysts, data scientists and executives, to reliably, quickly and securely inspect all of the data available.
About a decade ago, data analysis was limited to structured data held in a relational database or an ERP system. Decisions were based on analysis of historical data, and ETL (extract, transform, and load) tools were used to feed data warehousing systems. In today's fast-changing world, however, information from non-relational sources also needs to be used for quick analysis.
So, apart from database transactions, other sources of information such as CSV files, webhooks, HTTP, and MQTT need to be handled as appropriate.
Furthermore, the ETL process has evolved into data pipelines. A data pipeline is a method in which raw data is ingested from various data sources and then ported to a data store, such as a data lake or data warehouse, for analysis. In a data pipeline, dependencies can be built between different tasks. A task can also be triggered by an event, such as an order being booked or an issue being raised. Webhooks are used for this.
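As a rough illustration of task dependencies in a pipeline, the sketch below orders a handful of tasks so that each one runs only after the tasks it depends on. It uses Python's standard graphlib module, not Prefect; the task names and the run_pipeline helper are hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
}


def run_pipeline(deps):
    """Visit each task after its dependencies and return the execution order."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        # A real pipeline would call the task's function here.
        order.append(name)
    return order


execution_order = run_pipeline(dependencies)
print(execution_order)
```

The two extract tasks may appear in either order, but the transform step always follows both of them and the load step always comes last.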
Prefect is one such recently developed data pipeline or workflow tool. With it, one can build not only static task dependencies but also task dependencies driven by events as they happen.
This course uses the cloud version of the Prefect workflow tool, which can be invoked from a cloud-based virtual machine. Knowledge of Python and shell scripting is essential.
This course covers the following topics:
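The idea of an event triggering a task can be sketched in plain Python. This is not Prefect's API and not the payload format of any real service: the on_event decorator, the handle_webhook function, the "issue.opened" event name, and the JSON shape are all assumptions made for illustration.

```python
import json

# Hypothetical registry mapping an event name to the tasks it should trigger.
_handlers = {}


def on_event(event_name):
    """Decorator that registers a function to run when event_name arrives."""
    def register(func):
        _handlers.setdefault(event_name, []).append(func)
        return func
    return register


@on_event("issue.opened")
def notify_team(payload):
    # Illustrative task: build a message from the event payload.
    return f"New issue: {payload['title']}"


def handle_webhook(body):
    """Parse a JSON webhook body and run every task registered for its event."""
    message = json.loads(body)
    tasks = _handlers.get(message["event"], [])
    return [task(message["payload"]) for task in tasks]


results = handle_webhook(
    '{"event": "issue.opened", "payload": {"title": "Login fails"}}'
)
print(results)
```

An event with no registered task simply produces an empty result list, so unknown events are ignored rather than raising an error.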
•Difference between Data Engineering Vs Data Analysis Vs Data Science
•An Overview about Data Science, Machine Learning & Data Science.
•Extract, Transform, Load vs Data pipeline.
•Provisioning an Oracle Linux Virtual Machine on Oracle Cloud Infrastructure.
•Prefect Cloud Data pipeline and Client VM Setup.
•Documentation reference - Prefect Workflow / Data pipelines.
•Hands-on Demonstration of Prefect Flow with Task dependencies.
•Building Prefect dataflow pipeline for Oracle Database extract using Python.
•Introduction to Webhooks and Hands-on Demonstration with Prefect & GitHub.
•Career Path for Data Engineers
Happy Learning!