Udemy

Writing production-ready ETL pipelines in Python / Pandas

Learn how to write professional ETL pipelines using best practices in Python and Data Engineering.
Bestseller
Rating: 4.4 out of 5 (309 ratings)
2,519 students
Created by Jan Schwarzlose
Last updated 4/2022
English
English [Auto]

What you'll learn

  • How to write professional ETL pipelines in Python.
  • The steps to write production-level Python code.
  • How to apply functional programming in data engineering.
  • How to design object-oriented code properly.
  • How to use a meta file for job control.
  • Coding best practices for Python in ETL/data engineering.
  • How to implement a Python pipeline that extracts data from an AWS S3 source, transforms it, and loads it into another AWS S3 target.
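
A meta file for job control can be as simple as a CSV that records which source dates have already been processed, so that each scheduled run only extracts the dates that are still missing. The sketch below assumes that layout, with the hypothetical columns `source_date` and `processed_at`, and works on the file content as a string; reading and writing the file to S3 is left out. The function names are illustrative and are not taken from the course code.

```python
import csv
import io
from datetime import date, datetime, timedelta

# Hypothetical meta file layout: one row per source date that was already processed.
META_COLUMNS = ["source_date", "processed_at"]


def read_processed_dates(meta_csv: str) -> set[date]:
    """Parse the meta file content and return the set of processed source dates."""
    reader = csv.DictReader(io.StringIO(meta_csv))
    return {date.fromisoformat(row["source_date"]) for row in reader}


def dates_to_process(first_date: date, today: date, processed: set[date]) -> list[date]:
    """Return every date from first_date to today (inclusive) that is not yet processed."""
    days = (today - first_date).days
    candidates = (first_date + timedelta(days=i) for i in range(days + 1))
    return [d for d in candidates if d not in processed]


def append_to_meta(meta_csv: str, new_dates: list[date], processed_at: datetime) -> str:
    """Return updated meta file content with one row per newly processed date."""
    if meta_csv and not meta_csv.endswith("\n"):
        meta_csv += "\n"  # make sure new rows start on their own line
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if not meta_csv.strip():
        writer.writerow(META_COLUMNS)  # an empty meta file gets a header first
    for d in new_dates:
        writer.writerow([d.isoformat(), processed_at.isoformat(timespec="seconds")])
    return meta_csv + out.getvalue()


if __name__ == "__main__":
    meta = "source_date,processed_at\n2022-04-01,2022-04-02T06:00:00\n"
    todo = dates_to_process(date(2022, 4, 1), date(2022, 4, 3), read_processed_dates(meta))
    print(todo)  # [datetime.date(2022, 4, 2), datetime.date(2022, 4, 3)]
```

After a successful load, the job would write the result of `append_to_meta` back to wherever the meta file lives, so a failed run leaves the missing dates in place for the next attempt.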

Requirements

  • Basic Python and Pandas knowledge is desirable.
  • Basic ETL and AWS S3 knowledge is desirable.

Description

This course shows each step of writing an ETL pipeline in Python, from scratch to production, using tools such as Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage and memory-profiler.

Two different approaches to coding in data engineering will be introduced and applied: functional and object-oriented programming.
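
To make the contrast concrete, here is the same small transformation written in both styles. It is a hypothetical example (a percentage change between two price columns), not code from the course. In the functional version the logic lives in a pure function that takes data and returns new data; in the object-oriented version a class bundles the column configuration with the behaviour.

```python
from dataclasses import dataclass


# Functional style: a pure function without side effects; input in, new output out.
def add_change_pct(rows: list[dict], start_col: str, end_col: str) -> list[dict]:
    """Return new rows with a 'change_pct' field; the input rows are left unchanged.

    Assumes the start value is never zero.
    """
    return [
        {**row, "change_pct": round((row[end_col] - row[start_col]) / row[start_col] * 100, 2)}
        for row in rows
    ]


# Object-oriented style: the column configuration is stored on the object
# and the transformation is exposed as a method.
@dataclass
class ChangeCalculator:
    start_col: str = "start_price"
    end_col: str = "end_price"

    def transform(self, rows: list[dict]) -> list[dict]:
        # Delegate to the pure function so both styles produce identical results.
        return add_change_pct(rows, self.start_col, self.end_col)
```

The functional version is easy to test in isolation, while the class becomes useful once several steps share configuration such as column names or bucket settings.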

Best practices for developing Python code will be introduced and applied:

  • design principles

  • clean coding

  • virtual environments

  • project/folder setup

  • configuration

  • logging

  • exception handling

  • linting

  • dependency management

  • performance tuning with profiling

  • unit testing

  • integration testing

  • dockerization
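
As an illustration of the configuration and logging practices listed above, the sketch below configures logging from a nested dictionary with the standard library's `logging.config.dictConfig`. It assumes the dictionary would normally be loaded from a YAML file with PyYAML's `yaml.safe_load`; the dictionary is inlined here so the example runs without extra packages, and the `source` and `target` keys are hypothetical.

```python
import logging
import logging.config

# In a real project this dict would typically be loaded from a YAML file, e.g.
#   with open("config.yml") as f: config = yaml.safe_load(f)
# It is inlined here so the example needs only the standard library.
CONFIG = {
    "source": {"bucket": "example-source-bucket"},  # hypothetical keys
    "target": {"bucket": "example-target-bucket"},
    "logging": {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "simple",
                "level": "INFO",
            }
        },
        "root": {"level": "INFO", "handlers": ["console"]},
    },
}


def setup_logging(config: dict) -> logging.Logger:
    """Apply the logging section of the config and return the pipeline's logger."""
    logging.config.dictConfig(config["logging"])
    return logging.getLogger("etl_pipeline")


if __name__ == "__main__":
    logger = setup_logging(CONFIG)
    logger.info("Reading from %s", CONFIG["source"]["bucket"])
```

Keeping log format and levels in the configuration file means they can be changed per environment without touching the application code.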


What is the goal of this course?

In the course we are going to use the Xetra dataset. Xetra stands for Exchange Electronic Trading and is the trading platform of Deutsche Börse Group. The dataset is derived in near real time, on a minute-by-minute basis, from Deutsche Börse’s trading system and stored in an AWS S3 bucket that is publicly available free of charge.

The ETL pipeline we are going to create will extract the Xetra dataset from the AWS S3 source bucket on a scheduled basis, create a report by applying transformations, and load the transformed data into another AWS S3 target bucket.
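
The extract, transform and load steps described above can be sketched with the standard library alone. This is a simplified illustration, not the course's Pandas/boto3 implementation: plain dictionaries stand in for the source and target S3 buckets, the input column names (`ISIN`, `Date`, `Time`, `StartPrice`, `EndPrice`, `TradedVolume`) are assumed to resemble the Xetra CSV files, and the report contents (opening price, closing price and traded volume per ISIN and day) are a hypothetical example.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an S3 bucket: maps object keys to CSV text. A real pipeline would
# read and write through boto3 (e.g. list_objects_v2, get_object, put_object).
Bucket = dict[str, str]

# Columns of the hypothetical report produced by transform().
REPORT_COLUMNS = ["ISIN", "Date", "opening_price", "closing_price", "traded_volume"]


def extract(source: Bucket, prefix: str) -> list[dict]:
    """Read all CSV objects whose key starts with prefix into one list of rows."""
    rows: list[dict] = []
    for key in sorted(k for k in source if k.startswith(prefix)):
        rows.extend(csv.DictReader(io.StringIO(source[key])))
    return rows


def transform(rows: list[dict]) -> list[dict]:
    """Aggregate rows into one report line per ISIN and trading date."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for row in rows:
        groups[(row["ISIN"], row["Date"])].append(row)
    report = []
    for (isin, day), group in sorted(groups.items()):
        group.sort(key=lambda r: r["Time"])  # zero-padded "HH:MM" sorts chronologically
        report.append({
            "ISIN": isin,
            "Date": day,
            "opening_price": float(group[0]["StartPrice"]),
            "closing_price": float(group[-1]["EndPrice"]),
            "traded_volume": sum(int(r["TradedVolume"]) for r in group),
        })
    return report


def load(report: list[dict], target: Bucket, key: str) -> None:
    """Serialize the report as CSV and store it in the target under key."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(report)
    target[key] = out.getvalue()
```

With boto3, `extract` would list the keys under a date prefix and read each object's body, and `load` would upload the CSV string. Keeping the three steps as separate functions makes each one testable on its own, for example with moto mocking S3.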

The pipeline will be written so that it can be deployed easily to almost any production environment that runs containerized applications. The production environment we are going to write the ETL pipeline for consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes, and an orchestration tool such as Argo Workflows (a container-native workflow engine for Kubernetes) or Apache Airflow.
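
Because the container is started by an orchestrator such as Argo Workflows or Apache Airflow, it helps to give the pipeline a single command-line entry point whose settings arrive as arguments instead of being hard-coded. The minimal sketch below uses the standard library's `argparse`; the `--config` and `--date` option names are hypothetical, and the actual ETL call is left as a placeholder.

```python
import argparse
import sys
from datetime import date
from typing import Optional


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the options an orchestrator passes to the container command."""
    parser = argparse.ArgumentParser(description="Run the ETL job once.")
    parser.add_argument("--config", required=True, help="Path to the YAML config file")
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # argparse reports a usage error on invalid dates
        default=None,
        help="Optional source date (YYYY-MM-DD); defaults to today",
    )
    return parser.parse_args(argv)


def main(argv: Optional[list[str]] = None) -> int:
    args = parse_args(sys.argv[1:] if argv is None else argv)
    run_date = args.date or date.today()
    # Placeholder for the real job: load config, extract, transform, load.
    print(f"Running ETL job with {args.config} for {run_date.isoformat()}")
    return 0  # a zero exit code tells the orchestrator the step succeeded


if __name__ == "__main__":
    sys.exit(main())
```

In the Docker image this script would typically serve as the container's entry point, so the orchestrator only needs to supply the arguments for each scheduled run.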


So what can you expect in the course?

The course consists primarily of practical, interactive lessons in which you code and implement the pipeline yourself, with theory lessons where needed. You will also get the Python code for each lesson in the course material, the complete project on GitHub, and a ready-to-use Docker image with the application code on Docker Hub.

Each theory lesson comes with downloadable PowerPoint slides, and each topic and step includes useful links to more information if you want to dive deeper.


Who this course is for:

  • Data engineers, scientists and developers who want to write professional production-ready data pipelines in Python.
  • Anyone interested in writing data pipelines in Python that are ready for production.

Instructor

Jan Schwarzlose
Passionate Data Engineer
  • 4.5 Instructor Rating
  • 662 Reviews
  • 10,896 Students
  • 7 Courses


There are so many cool tools out there, especially in the small/large/big data space. One lifetime is not enough to know all of them and become proficient with them. But even with a small, well-chosen toolset, you can implement great projects that deliver real value.

In 2012 I graduated from university as an engineer in mechatronics (Diplom-Ingenieur). Programming, especially in the embedded field, was an important part of my education. During my first years working as an engineer, I increasingly discovered my passion for Python, especially in combination with small/large/big data.

After a few hobby projects, I took the step in 2016 of working professionally in this field. I have now been working as a data engineer for several years and have had the opportunity to be involved in great projects.

I would like to pass this knowledge on through courses in data engineering and data science with a strong focus on practice.

© 2022 Udemy, Inc.