GCP Data Engineering-End to End Project-Healthcare Domain
Rating: 4.6 out of 5 (503 ratings)
3,356 students

Industry-standard project in the healthcare domain using GCP services such as GCS, BigQuery, Dataproc, and Composer, together with GitHub and CI/CD
Created by Saidhul Shaik
Last updated 4/2025
English

What you'll learn

  • Understand an end-to-end data engineering project
  • Design and implement scalable ETL pipelines for healthcare data
  • Implement key techniques such as incremental data loading, SCD2, a metadata-driven approach, Medallion architecture, error handling, CDM, CI/CD, and more
  • Develop and deploy data solutions with CI/CD practices

Course content

2 sections • 10 lectures • 7h 47m total length
  • Important Links (0:02)
  • Introductory Lecture to Understand Project (29:17)

Requirements

  • Basic knowledge of Python and SQL

Description

  • This project focuses on building a data lake in Google Cloud Platform (GCP) for Revenue Cycle Management (RCM) in the healthcare domain.

  • The goal is to centralize, clean, and transform data from multiple sources, enabling healthcare providers and insurance companies to streamline billing, claims processing, and revenue tracking.

  • GCP Services Used:

    • Google Cloud Storage (GCS): Stores raw and processed data files.

    • BigQuery: Serves as the analytical engine for storing and querying structured data.

    • Dataproc: Used for large-scale data processing with Apache Spark.

    • Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.

    • Cloud SQL (MySQL): Stores transactional Electronic Medical Records (EMR) data.

    • GitHub & Cloud Build: Enables version control and CI/CD implementation.

    • CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.

  • Techniques involved:

    • Metadata Driven Approach

    • SCD type 2 implementation

    • CDM (Common Data Model)

    • Medallion Architecture

    • Logging and Monitoring

    • Error Handling

    • Optimizations

    • CI/CD implementation

    • Many more best practices

  • Data Sources

    • EMR (Electronic Medical Records) data from two hospitals

    • Claims files

    • CPT (Current Procedural Terminology) codes

    • NPI (National Provider Identifier) Data

  • Expected Outcomes

    • Efficient Data Pipeline: Automating the ingestion and transformation of RCM data.

    • Structured Data Warehouse: Gold-layer tables in BigQuery for analytical queries.

    • KPI Dashboards: Insights into revenue collection, claims processing efficiency, and financial trends.
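
As an illustration of the SCD type 2 technique listed above, here is a minimal sketch in plain Python. It is not taken from the course material: the `PatientVersion` record, its fields, and the `apply_scd2` helper are hypothetical names chosen for this example, and the course itself would typically run such logic in BigQuery or Spark rather than in memory. The idea shown is that a changed attribute closes the current row and appends a new current row, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PatientVersion:
    """One versioned row of a hypothetical patient dimension."""
    patient_id: str
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the row is still open
    is_current: bool = True


def apply_scd2(
    history: List[PatientVersion],
    incoming: Dict[str, str],
    load_date: date,
) -> List[PatientVersion]:
    """Apply a snapshot of {patient_id: address} to the history, SCD2 style."""
    result: List[PatientVersion] = []
    ids_with_current_row = set()

    for row in history:
        if row.is_current:
            ids_with_current_row.add(row.patient_id)
        new_address = incoming.get(row.patient_id)
        changed = new_address is not None and new_address != row.address
        if row.is_current and changed:
            # Close the old version instead of overwriting it.
            result.append(replace(row, valid_to=load_date, is_current=False))
            # Open a new current version that starts at the load date.
            result.append(PatientVersion(row.patient_id, new_address, load_date))
        else:
            result.append(row)

    # Patients without a current row are inserted as new current versions.
    for patient_id, address in incoming.items():
        if patient_id not in ids_with_current_row:
            result.append(PatientVersion(patient_id, address, load_date))

    return result


history = [PatientVersion("P1", "Old St", date(2024, 1, 1))]
updated = apply_scd2(history, {"P1": "New Ave", "P2": "Elm Rd"}, date(2025, 1, 1))
```

After this call, `updated` holds three rows: the closed "Old St" version of P1, a new current "New Ave" version of P1, and a first version of P2. Re-applying a snapshot with no changes leaves the history untouched, which is what makes the load safe to repeat.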

Who this course is for:

  • Aspiring data engineers and data professionals
  • Anyone preparing for data engineering interviews