
Explore the modern data ops landscape and the customer data platform, covering data lineage, cataloging, ETL/ELT, streaming, activation, and governance.
Learn to set up an AWS EC2 instance: pick an AMI, choose an instance type, create a key pair, configure a security group, and attach storage.
Connect to your AWS EC2 instance from the terminal using ssh with your private key (permissions restricted with chmod 400) and the instance DNS name or IP address, then navigate with cd and ls.
Install and configure Docker on an EC2 instance: update packages, install dependencies, add Docker's official GPG key and repository, start Docker, and verify the install with hello-world to prepare for Airbyte.
Install Airbyte on an EC2 instance using Docker Compose, configure disk storage, and launch the Airbyte UI on port 8000 to manage data transfers.
Secure the Airbyte instance on an EC2 server by updating the .env file with a new basic auth username and password, enabling secure access for ELT data transfers.
Add an Airbyte source by connecting a CSV file via URL and naming the source and dataset. Test the connection on the sources tab to validate the setup.
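As an illustration of the basic auth step described above, here is a minimal Python sketch that rewrites the two credential entries in a .env file. The key names `BASIC_AUTH_USERNAME` and `BASIC_AUTH_PASSWORD`, the helper names, and the example values are assumptions made for this sketch; check them against the .env file that ships with your Airbyte install before relying on it.

```python
from pathlib import Path


def set_basic_auth(env_text: str, username: str, password: str) -> str:
    """Return env_text with the basic auth entries set to the given values."""
    # Assumed key names; confirm them in the .env file of your Airbyte install.
    updates = {
        "BASIC_AUTH_USERNAME": username,
        "BASIC_AUTH_PASSWORD": password,
    }
    seen = set()
    out = []
    for line in env_text.splitlines():
        # Everything before the first "=" is the variable name.
        key = line.split("=", 1)[0].strip()
        if key in updates:
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            # Comments, blank lines, and unrelated settings pass through as-is.
            out.append(line)
    # Append any credential key that was not already present in the file.
    for key, value in updates.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


def update_env_file(path: str, username: str, password: str) -> None:
    """Rewrite the .env file at `path` in place (hypothetical helper)."""
    env_path = Path(path)
    env_path.write_text(set_basic_auth(env_path.read_text(), username, password))
```

Calling `update_env_file(".env", "admin", "a-strong-password")` from the Airbyte directory would apply the change. The running containers typically need to be restarted before new credentials take effect.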
This course takes a skills-based approach to teaching Data Engineering. Through these courses, the student is introduced to progressively more complex concepts and exercises commonly encountered by Data Engineers.
The lecture series will cover:
Creating AWS Instances
Intro to the Unix Command Line
Installing Docker Images
Loading CSV into Postgres using Airbyte
Introduction to SQL in Metabase
The approach is built around:
Hands-on demos leveraging Medium Articles
Video walk-through of the Medium Articles
Background demos that equip you to tackle the main demos
How to use Docker, EC2, SSH Keys, …
Leverage data to build the Demo
ELT data sources
Transform the data using SQL
Demonstrate your knowledge building a Data Project
Background on the Instructor:
Tom has been creating innovative products around HTTP-based distributed client architectures for 20+ years. Recent experience covers AdTech and MarTech, including Incentivized Mobile Apps, Real-Time Programmatic Bidding, and Customer Data Platforms. Current focus is tooling to simplify the operation and management of large-scale customer data platform projects. My experience is in the teams, processes, and tooling needed to iteratively solve complex service problems.
I am currently enjoying the challenges of creating tooling that combines end-to-end customer success insight, principles of DataOps, and managing product through continuous improvement. I am proud to lead a team of Full Stack developers, Data Engineers, and UI Engineers constructing tooling using a React UI, a Java service layer, and the Azure data stack. The problem space includes Analytics, Anomaly Detection with Statistical and ML methods, Data Lineage, and Streaming data (Pipelines, HTTP,...).
Osvaldo Valedez was my daughter's Python instructor, and he expressed interest in DataOps as a career. He recently graduated from the University of California, Berkeley and is getting started in his software career. We are working together to explore the different Data Engineering products on the market and have assembled these tutorials to help others on the same journey.
As with any new material, coaching and training can help overcome challenges. Tutoring is available for 1:1 help.