Name: VBM Portfolio Projects: SQL Data Cleaning (e-commerce data)
Rating: 4.9 (23 reviews)

Created by Matthew Barr

Last updated 12/2025

English

What you'll learn

Build a complete portfolio project you can publish: an end-to-end SQL data cleaning + KPI pipeline
Turn a messy e-commerce table into a trusted clean_table that’s safe for reporting and dashboards
Profile data like a professional: row counts, null/completeness checks, category profiling, and “how bad is it?” diagnostics
Build a typed silver layer in SQL: safe casting, mixed-format date parsing, and text normalisation (without silently corrupting results)
Enforce a real business contract: filter invalid orders (amounts, costs, flags, hour ranges) and quantify exactly what each rule removes
Detect and remove duplicates using a business key, and understand the real-world risk of defining that key incorrectly
Implement 10 dashboard-ready KPIs in SQL using CTEs, aggregates, and window functions where needed
Standardise outputs into a single kpi_results table with one consistent schema a dashboard (or platform) can read
Debug KPI mismatches properly: trace issues back to the right layer (source → silver → clean → KPI) instead of guessing
Package the project professionally: clean SQL files, a strong README, evidence screenshots, and a LinkedIn-ready project summary

Course content

10 sections • 71 lectures • 1h 49m total length

The Verulam Blue Environment (3:19)
Overview of Sections (5:32)

Section Introduction (1:43)
The Initial Data Sampling - (a) Core Query (1:58)
Launch the SQL workbench, create a clean workspace, and run a simple core query to inspect the raw e-commerce data for mixed date formats, missing values, and extraction issues.
The Initial Data Sampling - (b) Interpreting Results (2:01)
Data Completeness Assessment - (a) Core Query (1:47)
Data Completeness Assessment - (b) Interpreting Results (1:41)
Categorical Value Distribution Analysis - (a) Core Query (1:49)
Categorical Value Distribution Analysis - (b) Interpreting Results (1:24)
Payment Method Consistency Check - (a) Core Query (1:22)
Payment Method Consistency Check - (b) Interpreting Results (1:07)
End of Section (0:51)
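
The profiling steps in this section (row counts, completeness, category distribution) can be sketched roughly as follows. This is a minimal illustration, not the course's own queries: it uses Python's built-in sqlite3 module as a stand-in for the SQL workbench, and the table name `raw_orders`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw table: every column arrives as TEXT, as messy extracts often do.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount TEXT, payment_method TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("2024-12-01", "120.50", "card"),
        ("01/12/2024", "", "Card "),
        (None, "80", "paypal"),
        ("2024-12-03", "abc", None),
    ],
)

# Completeness: total rows plus NULL-or-blank counts per column.
completeness = conn.execute(
    """
    SELECT COUNT(*) AS total_rows,
           SUM(CASE WHEN order_date IS NULL OR TRIM(order_date) = '' THEN 1 ELSE 0 END),
           SUM(CASE WHEN amount IS NULL OR TRIM(amount) = '' THEN 1 ELSE 0 END),
           SUM(CASE WHEN payment_method IS NULL OR TRIM(payment_method) = '' THEN 1 ELSE 0 END)
    FROM raw_orders
    """
).fetchone()

# Category profile: raw spellings side by side, so variants such as 'Card ' stand out.
categories = conn.execute(
    """
    SELECT payment_method, COUNT(*) AS n
    FROM raw_orders
    GROUP BY payment_method
    ORDER BY n DESC, payment_method
    """
).fetchall()

print(completeness)
print(categories)
```

Grouping on the raw, untrimmed value is deliberate here: the point of profiling is to see the inconsistent spellings before any normalisation hides them.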

Section Introduction (0:50)
Silver Layer - Parsing & Health Check - (a) Core Query (1:51)
Silver Layer - Parsing & Health Check - (b) Interpreting Results (1:56)
Silver Layer - Sample Inspection - (a) Core Query (1:04)
Silver Layer - Sample Inspection - (b) Interpreting Results (1:14)
Silver Layer - Normalisation & Visual Check - (a) Core Query (1:38)
Normalisation Health Check - (a) Core Query (1:39)
End of Section (1:04)
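
A typed silver layer of the kind this section covers might look something like the sketch below. It is an assumption-laden illustration rather than the course's code: the table `raw_orders`, the view `silver_orders`, the two accepted date formats (ISO and DD/MM/YYYY), and the numeric guard are all hypothetical. SQLite has no `TRY_CAST`, and a plain `CAST('abc' AS REAL)` silently returns 0.0, so the guard turns unparseable values into NULL instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount TEXT, payment_method TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-12-01", "120.50", " Card"),
        (2, "03/12/2024", "abc", "PAYPAL"),
        (3, "next week", " 80 ", "card"),
    ],
)

# Silver view: parse dates from two assumed formats, cast amounts only when they
# look like non-negative decimals, and normalise text. Anything else becomes NULL.
conn.execute(
    """
    CREATE TEMP VIEW silver_orders AS
    SELECT
        order_id,
        CASE
            WHEN TRIM(order_date) GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
                THEN TRIM(order_date)
            WHEN TRIM(order_date) GLOB '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]'
                THEN SUBSTR(TRIM(order_date), 7, 4) || '-' ||
                     SUBSTR(TRIM(order_date), 4, 2) || '-' ||
                     SUBSTR(TRIM(order_date), 1, 2)
        END AS order_date,
        CASE
            WHEN TRIM(amount) GLOB '*[0-9]*'
             AND TRIM(amount) NOT GLOB '*[^0-9.]*'
             AND TRIM(amount) NOT GLOB '*.*.*'
                THEN CAST(TRIM(amount) AS REAL)
        END AS amount,
        LOWER(TRIM(payment_method)) AS payment_method
    FROM raw_orders
    """
)

rows = conn.execute(
    "SELECT order_id, order_date, amount, payment_method FROM silver_orders ORDER BY order_id"
).fetchall()

# Health check: how many values failed to parse (or were missing to begin with)?
health = conn.execute(
    "SELECT SUM(order_date IS NULL), SUM(amount IS NULL) FROM silver_orders"
).fetchone()

print(rows)
print(health)
```

Surfacing failures as NULL keeps them countable in a health check, which is the controlled behaviour the silver layer is meant to provide.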

Section Introduction (0:54)
Business Rule Impact Assessment - (a) Core Query (1:05)
Check the silver normalize temp view against the rules in the brief, counting how many rows would fail each rule to produce a one-row data-quality summary.
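
A one-row rule-impact summary of this kind could be sketched as below. The rules shown (positive amount, non-negative cost, a 0/1 return flag, an hour between 0 and 23) and the table `silver_orders` are assumptions chosen for illustration; the actual brief defines its own thresholds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_orders (amount REAL, cost REAL, is_return INTEGER, order_hour INTEGER)"
)
conn.executemany(
    "INSERT INTO silver_orders VALUES (?, ?, ?, ?)",
    [
        (100.0, 60.0, 0, 9),
        (-5.0, 2.0, 0, 14),   # fails the amount rule
        (50.0, 30.0, 1, 23),
        (None, 10.0, 0, 25),  # fails the amount rule (NULL) and the hour rule
    ],
)

# One conditional SUM per rule gives a single-row summary; a row can fail several rules.
summary = conn.execute(
    """
    SELECT COUNT(*) AS total_rows,
           SUM(CASE WHEN amount IS NULL OR amount <= 0 THEN 1 ELSE 0 END) AS fail_amount,
           SUM(CASE WHEN cost IS NULL OR cost < 0 THEN 1 ELSE 0 END) AS fail_cost,
           SUM(CASE WHEN is_return IS NULL OR is_return NOT IN (0, 1) THEN 1 ELSE 0 END) AS fail_flag,
           SUM(CASE WHEN order_hour IS NULL OR order_hour NOT BETWEEN 0 AND 23 THEN 1 ELSE 0 END) AS fail_hour,
           SUM(CASE WHEN amount > 0 AND cost >= 0 AND is_return IN (0, 1)
                     AND order_hour BETWEEN 0 AND 23 THEN 1 ELSE 0 END) AS rows_kept
    FROM silver_orders
    """
).fetchone()

print(summary)
```

Because the per-rule counts can overlap, they do not have to add up to the number of rejected rows; the `rows_kept` column is the figure to compare against the filtered table.
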
Business Rule Impact Assessment - (b) Interpreting Results (1:32)
Business Rule Enforcement & Baseline Count - (a) Core Query (1:48)
Business Rule Enforcement & Baseline Count - (b) Interpreting Results (1:32)
Baseline count keeps 9978 rows after filtering, with 308 rows rejected by the amount rule and zero failures elsewhere, validating the filter and establishing a future monitoring anchor.
Duplicate Detection & Business Key Analysis - (a) Core Query (1:18)
Duplicate Detection & Business Key Analysis - (b) Interpreting Results (1:52)
Clean Dataset – Deduplication & Final Inspection - (a) Core Query (1:43)
Define the clean table as one row per unique combination of eight business fields, deduplicating identical rows for final inspection and KPI-ready dashboards.
Clean Dataset – Deduplication & Final Inspection - (b) Interpreting Results (1:42)
Section End (0:46)
Clean the messy legacy orders table by profiling raw data, normalizing the silver layer, enforcing explicit business rules, and removing duplicates to enable KPIs and tell the business story.
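
The duplicate detection and deduplication steps listed above might be sketched like this. The three-column business key used here is a hypothetical stand-in for the eight business fields the course uses, and `valid_orders` is an assumed name for the rule-filtered table. The window function requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_orders (order_date TEXT, customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO valid_orders VALUES (?, ?, ?)",
    [
        ("2024-12-01", "c1", 100.0),
        ("2024-12-01", "c1", 100.0),  # exact duplicate on the business key
        ("2024-12-01", "c2", 100.0),
        ("2024-12-02", "c1", 40.0),
    ],
)

# Detection: which business-key combinations appear more than once?
dup_groups = conn.execute(
    """
    SELECT order_date, customer_id, amount, COUNT(*) AS copies
    FROM valid_orders
    GROUP BY order_date, customer_id, amount
    HAVING COUNT(*) > 1
    """
).fetchall()

# Removal: keep the first row of each business-key group.
conn.execute(
    """
    CREATE TABLE clean_table AS
    SELECT order_date, customer_id, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY order_date, customer_id, amount
                   ORDER BY rowid
               ) AS rn
        FROM valid_orders
    )
    WHERE rn = 1
    """
)
clean_count = conn.execute("SELECT COUNT(*) FROM clean_table").fetchone()[0]

print(dup_groups, clean_count)
```

The choice of key matters: dropping `customer_id` from the partition in this toy data would also merge the `c2` order into the `c1` group, silently deleting a genuine order.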

Introducing the KPIs - Introduction (0:33)
Introducing the KPIs - KPI 1 (0:39)
Introducing the KPIs - KPI 2 (0:29)
Introducing the KPIs - KPI 3 (0:35)
Introducing the KPIs - KPI 4 (0:33)
Introducing the KPIs - KPI 5 (0:34)
Introducing the KPIs - KPI 6 (0:42)
Calculate KPI six as the premium and platinum share of GMV. Sum order amounts from premium or platinum orders and divide by total GMV across all segments.
Introducing the KPIs - KPI 7 (0:42)
Introducing the KPIs - KPI 8 (0:26)
Introducing the KPIs - KPI 9 (0:30)
Introducing the KPIs - KPI 10 (0:50)
Introducing the KPIs - Submission Format (0:35)

Section Introduction (2:05)
KPI 1 - Average Order Value (AOV) - (a) Code and Results (1:58)
Compute the first KPI, average order value (AOV), by building a temp KPI view, rounding to two decimals, and casting to varchar for a unified API schema.
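
As a rough sketch of that pattern, the snippet below builds a temp view holding one KPI row with a text value. The view name `kpi_aov`, the column names `kpi_name` and `kpi_value`, and the sample amounts are hypothetical; SQLite's `TEXT` stands in for varchar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (amount REAL)")
conn.executemany("INSERT INTO clean_table VALUES (?)", [(100.0,), (50.0,), (25.0,)])

# One row, two text-friendly columns: every KPI view can share this shape.
conn.execute(
    """
    CREATE TEMP VIEW kpi_aov AS
    SELECT 'aov' AS kpi_name,
           CAST(ROUND(AVG(amount), 2) AS TEXT) AS kpi_value
    FROM clean_table
    """
)

aov_row = conn.execute("SELECT kpi_name, kpi_value FROM kpi_aov").fetchone()
print(aov_row)
```

Casting to text means numeric and non-numeric KPIs (a month label, for instance) can later sit in the same column.
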
KPI 1 - Average Order Value (AOV) - (b) Business Context and Deep Dive (1:18)
KPI 2 - Overall Gross Margin - (a) Code and Results (1:30)
KPI 2 - Overall Gross Margin - (b) Business Context and Deep Dive (1:50)
Assess KPI two, the gross margin, to gauge profitability after expenses and pricing power. Compute (revenue − cost) / revenue on the clean dataset, weighted by GMV and reported to six decimals.
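
To illustrate what "weighted by GMV" means in practice, the sketch below contrasts the ratio of sums with a simple average of per-order margins. The table and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (amount REAL, cost REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [(100.0, 60.0), (50.0, 40.0)],  # per-order margins: 0.40 and 0.20
)

weighted, unweighted = conn.execute(
    """
    SELECT ROUND(SUM(amount - cost) / SUM(amount), 6),   -- GMV-weighted: ratio of sums
           ROUND(AVG((amount - cost) / amount), 6)       -- unweighted: average of ratios
    FROM clean_table
    """
).fetchone()

print(weighted, unweighted)
```

The two figures differ (about 0.333 versus 0.3 here) because the ratio of sums lets larger orders count for more, which is usually what an overall margin is meant to express.
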
KPI 3 - Return Rate - (a) Code and Results (1:12)
Compute KPI three, the return rate, by summing is_return to count returns and dividing by total valid unique orders, yielding 0.079938 (about 7.99%).
KPI 3 - Return Rate - (b) Business Context and Deep Dive (1:34)
KPI 4 - Median Order Amount - (a) Code and Results (1:29)
Compute the median order amount KPI by querying the 50th percentile of order values, rounding to two decimals, casting to varchar, yielding 6522 as the overall metric.
KPI 4 - Median Order Amount - (b) Business Context and Deep Dive (1:28)
Determine the median order amount to reveal the typical customer spend, using the 50th percentile to minimize outliers, and compare with average order value to inform forecasting and leadership insights.
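
The contrast between median and mean can be seen in a small sketch. SQLite has no built-in percentile function, so this example swaps in a window-function approach (rank the rows, then average the middle one or two); engines such as PostgreSQL offer `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ...)` instead. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (amount REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?)",
    [(10.0,), (20.0,), (30.0,), (1000.0,)],  # one large outlier
)

# Rank rows by amount; with integer division, (n+1)/2 and (n+2)/2 pick the
# single middle row for odd n and the two middle rows for even n.
median = conn.execute(
    """
    SELECT ROUND(AVG(amount), 2)
    FROM (
        SELECT amount,
               ROW_NUMBER() OVER (ORDER BY amount) AS rn,
               COUNT(*) OVER () AS n
        FROM clean_table
    )
    WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
    """
).fetchone()[0]

mean = conn.execute("SELECT ROUND(AVG(amount), 2) FROM clean_table").fetchone()[0]
print(median, mean)
```

With these four values the median is 25.0 while the mean is 265.0, which shows how a single outlier moves the average but not the median.
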
KPI 5 - Return Rate by Payment Method - (a) Code and Results (1:33)
KPI 5 - Return Rate by Payment Method - (b) Business Context and Deep Dive (1:49)
Assess KPI five, the return rate by payment method, to see if certain methods attract more returns and what that reveals about customer experience and risk.
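
KPI 3 and KPI 5 rest on the same ratio, returns divided by orders, computed once overall and once per group. A minimal sketch under assumed names (`clean_table`, `payment_method`, and a 0/1 `is_return` flag) with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (payment_method TEXT, is_return INTEGER)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [
        ("card", 0), ("card", 1), ("card", 0), ("card", 0),
        ("paypal", 1), ("paypal", 0),
    ],
)

# Overall return rate: multiply by 1.0 to avoid integer division.
overall = conn.execute(
    "SELECT ROUND(SUM(is_return) * 1.0 / COUNT(*), 6) FROM clean_table"
).fetchone()[0]

# The same ratio, grouped by payment method.
by_method = conn.execute(
    """
    SELECT payment_method,
           ROUND(SUM(is_return) * 1.0 / COUNT(*), 6) AS return_rate
    FROM clean_table
    GROUP BY payment_method
    ORDER BY payment_method
    """
).fetchall()

print(overall, by_method)
```

Small groups produce volatile rates, so it is worth reading the per-method figures alongside their order counts.
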
KPI 6 - High-Value Segment GMV Share - (a) Code and Results (1:31)
KPI 6 - High-Value Segment GMV Share - (b) Business Context and Deep Dive (1:47)
Evaluate KPI six by analyzing the high-value segment GMV share to show how much revenue comes from premium and platinum customers, about 64.9% of total GMV, indicating concentration risk.
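
The share calculation can be sketched with a conditional sum over one pass of the table. The lowercase segment labels, the column names, and the amounts below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (segment TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [("premium", 300.0), ("platinum", 200.0), ("standard", 500.0)],
)

# Numerator: GMV from premium or platinum orders. Denominator: GMV across all segments.
share = conn.execute(
    """
    SELECT ROUND(
        SUM(CASE WHEN segment IN ('premium', 'platinum') THEN amount ELSE 0 END)
        / SUM(amount),
        6
    )
    FROM clean_table
    """
).fetchone()[0]

print(share)
```

This only works if segment labels were normalised upstream; a stray 'Premium ' would fall out of the numerator without raising any error.
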
KPI 7 - Below-Target Margin Rate - (a) Code and Results (2:11)
KPI 7 - Below-Target Margin Rate - (b) Business Context and Deep Dive (1:27)
Interpret KPI seven by calculating realized margin per order, attaching the correct target margin for each segment, and flagging failures to reveal a 0.29% below-target margin rate.
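
One way to picture those three steps (per-order margin, segment target lookup, failure flag) is the sketch below. The `segment_targets` lookup table and its target values are invented for the example; the course brief specifies the real targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (segment TEXT, amount REAL, cost REAL)")
conn.execute("CREATE TABLE segment_targets (segment TEXT PRIMARY KEY, target_margin REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?, ?)",
    [
        ("standard", 100.0, 80.0),  # margin 0.20, target 0.15: meets target
        ("premium", 100.0, 80.0),   # margin 0.20, target 0.25: below target
        ("premium", 200.0, 100.0),  # margin 0.50, target 0.25: meets target
        ("standard", 50.0, 45.0),   # margin 0.10, target 0.15: below target
    ],
)
conn.executemany(
    "INSERT INTO segment_targets VALUES (?, ?)",
    [("standard", 0.15), ("premium", 0.25)],
)

# Flag each order against its own segment's target, then average the 0/1 flags.
rate = conn.execute(
    """
    WITH flagged AS (
        SELECT CASE WHEN (o.amount - o.cost) / o.amount < t.target_margin
                    THEN 1 ELSE 0 END AS below_target
        FROM clean_table AS o
        JOIN segment_targets AS t ON t.segment = o.segment
    )
    SELECT ROUND(AVG(below_target), 6) FROM flagged
    """
).fetchone()[0]

print(rate)
```

An inner join drops orders whose segment has no target row, so a quick count before and after the join is a sensible sanity check.
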
KPI 8 - Top GMV Month - (a) Code and Results (1:26)
Identify KPI eight, the top GMV month, by converting order dates to year-month, aggregating total GMV per month, and selecting December 2024 as the top result.
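
A compact sketch of the year-month aggregation follows. It assumes dates are already ISO-formatted text in the clean table and uses SQLite's `strftime`; the sample rows are invented and merely happen to produce the same top month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [
        ("2024-11-15", 100.0),
        ("2024-12-01", 120.0),
        ("2024-12-20", 180.0),
        ("2025-01-05", 90.0),
    ],
)

# Convert each date to YYYY-MM, total GMV per month, keep the largest.
top_month = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS order_month,
           SUM(amount) AS gmv
    FROM clean_table
    GROUP BY order_month
    ORDER BY gmv DESC, order_month
    LIMIT 1
    """
).fetchone()

print(top_month)
```

The secondary sort on `order_month` makes the result deterministic if two months tie on GMV.
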
KPI 8 - Top GMV Month - (b) Business Context and Deep Dive (1:24)
KPI 9 - Latest Month-over-Month GMV Growth Percentage - (a) Code and Results (1:57)
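
The listing gives no formula for KPI 9, so the sketch below assumes the conventional definition: (latest month GMV − previous month GMV) / previous month GMV × 100. It uses a CTE and the `LAG` window function (SQLite 3.25 or later); all names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [
        ("2024-10-10", 100.0),
        ("2024-11-03", 200.0),
        ("2024-12-01", 150.0),
        ("2024-12-15", 100.0),
    ],
)

latest = conn.execute(
    """
    WITH monthly AS (
        SELECT strftime('%Y-%m', order_date) AS order_month,
               SUM(amount) AS gmv
        FROM clean_table
        GROUP BY order_month
    ),
    with_prev AS (
        SELECT order_month,
               gmv,
               LAG(gmv) OVER (ORDER BY order_month) AS prev_gmv
        FROM monthly
    )
    SELECT order_month,
           ROUND((gmv - prev_gmv) * 100.0 / prev_gmv, 2) AS mom_growth_pct
    FROM with_prev
    ORDER BY order_month DESC
    LIMIT 1
    """
).fetchone()

print(latest)
```

`LAG` compares adjacent rows, not adjacent calendar months, so a month with no orders would need to be filled in first for the comparison to stay honest.
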
KPI 10 - Max Payment Mix Shift - (a) Code and Results (1:56)
KPI 10 - Max Payment Mix Shift - (b) Business Context and Deep Dive (1:24)
End of Section (1:21)

Section Introduction (1:25)
Consolidate ten KPI views into a single canonical KPI results table via a UNION ALL pattern on a clean deduplicated dataset, enabling automated grading and a concise business narrative.
Final KPI Results Consolidation (2:35)
Results Page (1:32)
Detailed Results (1:28)
Audit the detailed results page to confirm every KPI value matches the reference, with a pass status, 100% score, and per-channel metrics.
End of Section (2:59)
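
The consolidation pattern described in this section can be sketched with two KPI views instead of ten. The view names, the `kpi_name` and `kpi_value` columns, and the sample data are hypothetical; only the `kpi_results` table name comes from the course description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_table (amount REAL, is_return INTEGER)")
conn.executemany(
    "INSERT INTO clean_table VALUES (?, ?)",
    [(100.0, 0), (50.0, 1), (30.0, 0), (20.0, 0)],
)

# Each KPI view exposes the same two text columns.
conn.execute(
    """
    CREATE TEMP VIEW kpi_1 AS
    SELECT 'aov' AS kpi_name,
           CAST(ROUND(AVG(amount), 2) AS TEXT) AS kpi_value
    FROM clean_table
    """
)
conn.execute(
    """
    CREATE TEMP VIEW kpi_3 AS
    SELECT 'return_rate' AS kpi_name,
           CAST(ROUND(SUM(is_return) * 1.0 / COUNT(*), 6) AS TEXT) AS kpi_value
    FROM clean_table
    """
)

# UNION ALL stacks the views without deduplicating or reordering their rows.
conn.execute(
    """
    CREATE TABLE kpi_results AS
    SELECT kpi_name, kpi_value FROM kpi_1
    UNION ALL
    SELECT kpi_name, kpi_value FROM kpi_3
    """
)

results = conn.execute(
    "SELECT kpi_name, kpi_value FROM kpi_results ORDER BY kpi_name"
).fetchall()
print(results)
```

Because every view shares one schema, adding an eleventh KPI is a matter of appending one more `UNION ALL` branch rather than changing the table design.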

Requirements

Basic SQL knowledge (you should be comfortable with SELECT, WHERE, GROUP BY, and basic joins)
A laptop/PC and a modern web browser
You can use any SQL tool you already have
In the videos, I use Verulam Blue Mint (a free-to-use, browser-based SQL workbench) to keep everything in one notebook workflow and to support KPI checking and feedback, but the SQL approach is transferable

Description

This course is built to give you a publishable portfolio project as the end product — a complete SQL data-cleaning and KPI pipeline you can put on GitHub, link on LinkedIn, and confidently talk through in interviews.

It’s a real-world simulation built around one messy dataset and a business brief with a clear target: deliver ten KPIs that are trustworthy enough to go on a dashboard.

Most SQL “data cleaning” courses either stay at the level of syntax drills, or they use clean toy datasets where nothing breaks. That’s not what you face in real data teams.

In this course you’ll work through the same workflow you’d use on a real project:

Read the brief properly so you know what “correct” means
Explore the raw schema and spot the mess early (mixed date formats, typos in categories, missing values, duplicates)
Build a typed, safer silver layer where errors surface in a controlled way
Enforce the business rules and deduplicate into one trusted clean_table
Compute and standardise all KPI outputs into a consistent results table
Validate results, understand tolerances/rounding, and debug mismatches like a professional
Finish by turning the whole pipeline into a portfolio-ready GitHub project, with a clean repo structure, a strong README, and proof of results

Course outline (high level):

Section 00: Course Introduction
Section 01: The Verulam Blue Mint Environment
Section 02: Understanding the Challenge Brief
Section 03: Exploring Source Data Schema
Section 04: Data Cleaning I – Sampling & Completeness
Section 05: Data Cleaning II – Silver Layer & Normalisation
Section 06: Data Cleaning III – Business Rules & Deduplication
Section 07: Understanding the KPIs
Section 08: Computing KPIs
Section 09: Results
Section 10: Portfolio project deployment (repo + README + LinkedIn-style project story)

By the end, you won’t just know “how to clean data using SQL”. You’ll have an end-to-end portfolio project you can explain clearly: what was wrong with the data, what you changed, what rules you enforced, and why your KPIs can be trusted.

Who this course is for:

Anyone who wants a portfolio project they can publish: a complete SQL cleaning + KPI pipeline you can put on GitHub and confidently explain in interviews
Data analysts, BI developers, and aspiring analytics/data engineers who already know basic SQL and want a serious, employer-facing project (not toy examples)
Learners who can write queries but haven’t yet built a layered workflow end-to-end (raw → silver → clean → KPIs → standardised results)
Job seekers who want proof-of-skill in the areas employers actually care about: data quality reasoning, business-rule enforcement, deduplication, and metric reliability
Not ideal if you’re brand new to SQL and need a fundamentals-first course.

Course content

The Verulam Blue Mint Environment: 2 lectures • 9min

Understanding the Challenge Brief: 1 lecture • 4min

Exploring Source Data Schema: 1 lecture • 4min

Data Cleaning I – Sampling & Completeness: 10 lectures • 16min

Data Cleaning II – Silver Layer & Normalisation: 8 lectures • 11min

Data Cleaning III – Business Rules & Deduplication: 10 lectures • 14min

Understanding the KPIs: 12 lectures • 7min

Computing KPIs: 21 lectures • 34min

The Results: 5 lectures • 10min

Portfolio project deployment: 1 lecture • 1min
