
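# Sample rows containing exact duplicates and one repeated name.
# `spark` and `display` are predefined in Databricks and Fabric notebooks.
data = [
    (1, "Alice", 23),
    (2, "Bob", 34),
    (3, "Charlie", 29),
    (1, "Alice", 23),   # duplicate row
    (2, "Bob", 34),     # duplicate row
    (6, "Alice", 30),   # same name, different age
]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)
display(df)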
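The comments on the sample rows ("duplicate row", "same name, different age") point at two kinds of deduplication: on the whole row and on a single column. In PySpark these would usually be written as `df.dropDuplicates()` and `df.dropDuplicates(["name"])`. The sketch below is not Spark code. It is a plain-Python mirror of the same logic on the same tuples, so the expected rows can be checked without a Spark session. The helper `drop_duplicates` is a hypothetical name introduced only for this illustration, and it keeps the first row it sees, whereas Spark does not guarantee which row survives when deduplicating on a subset of columns.

```python
# Same rows as the sample DataFrame, kept as plain tuples (id, name, age).
rows = [
    (1, "Alice", 23),
    (2, "Bob", 34),
    (3, "Charlie", 29),
    (1, "Alice", 23),   # duplicate row
    (2, "Bob", 34),     # duplicate row
    (6, "Alice", 30),   # same name, different age
]


def drop_duplicates(rows, key=None):
    """Keep the first row seen for each key; key=None compares whole rows.

    Hypothetical helper for illustration only, not a PySpark API.
    """
    seen = set()
    result = []
    for row in rows:
        k = row if key is None else key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


# Mirrors df.dropDuplicates(): only exact duplicate rows are removed,
# so (6, "Alice", 30) survives because its id and age differ.
full_dedup = drop_duplicates(rows)

# Mirrors df.dropDuplicates(["name"]): one row per distinct name.
by_name = drop_duplicates(rows, key=lambda r: r[1])

print(len(full_dedup), full_dedup)
print(len(by_name), by_name)
```

Deduplicating on the whole row leaves four rows, while deduplicating on `name` leaves three, which is why the sixth sample row is useful for telling the two behaviours apart.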
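# Second sample dataset: people with a state and a salary.
# Note: this cell reuses the names `data`, `columns`, and `df` from the cell above.
data = [
    (1, "Alice", "NY", 2000),
    (2, "Bob", "CA", 1500),
    (3, "Charlie", "NY", 3000),
    (4, "David", "CA", 2500),
    (5, "Eve", "TX", 1800),
    (6, "Frank", "TX", 2200),
]
columns = ["id", "name", "state", "salary"]
df = spark.createDataFrame(data, columns)
display(df)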
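The state and salary columns in this dataset lend themselves to grouped aggregation and to a per-group ranking. In PySpark that would typically be `df.groupBy("state").agg(F.sum("salary"), F.avg("salary"))` with `from pyspark.sql import functions as F`, and `F.row_number()` over `Window.partitionBy("state").orderBy(F.desc("salary"))` for the ranking. The original notes do not say which operations were intended, so treat this as an assumption. The sketch below mirrors both in plain Python on the same tuples so the numbers can be checked without a Spark session. The variable names `summary` and `top_earner` are illustrative.

```python
from collections import defaultdict

# Same rows as the sample DataFrame: (id, name, state, salary).
rows = [
    (1, "Alice", "NY", 2000),
    (2, "Bob", "CA", 1500),
    (3, "Charlie", "NY", 3000),
    (4, "David", "CA", 2500),
    (5, "Eve", "TX", 1800),
    (6, "Frank", "TX", 2200),
]

# Group rows by state, mirroring groupBy("state").
rows_by_state = defaultdict(list)
for row in rows:
    rows_by_state[row[2]].append(row)

# Total and average salary per state, mirroring agg(sum, avg).
summary = {}
for state, group in rows_by_state.items():
    salaries = [r[3] for r in group]
    summary[state] = {
        "total": sum(salaries),
        "average": sum(salaries) / len(salaries),
    }

# Highest-paid person per state, mirroring row_number() == 1 over a
# window partitioned by state and ordered by salary descending.
top_earner = {
    state: max(group, key=lambda r: r[3])[1]
    for state, group in rows_by_state.items()
}

for state in sorted(summary):
    print(state, summary[state], top_earner[state])
```

With six rows spread evenly over three states, each group has exactly two salaries, which makes the totals and averages easy to verify by hand.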
https://github.com/manish040596/FABRIC-PROJECTS/tree/main/SALES%20ANALYTICS%20PROJEC%20T-%20FABRIC%20DATASET
Become a Job-Ready Azure Data Engineer
Master real-world data engineering with this hands-on, beginner-to-advanced course designed for aspiring and working professionals.
What You Will Learn:
Azure Data Factory (ADF):
Design & orchestrate ETL/ELT pipelines
Integrate data from SQL, Blob, REST APIs & more
Build Data Flows and automate with Triggers
Azure Databricks & PySpark:
Process big data using PySpark on Azure
Implement Delta Lake & optimize Spark jobs
Real-world notebook-based projects
Azure Synapse Analytics:
Create tables and query data using Dedicated & Serverless SQL Pools
Use Synapse Studio for data transformation & analytics
Integrate Synapse with ADF and Databricks
SQL for Data Engineering:
Write efficient SQL queries for transformations
Use joins, window functions, and aggregations
Practice with real datasets and assignments
Python for Data Engineering:
Automate tasks and create clean, modular scripts
Use Python in ADF and Databricks notebooks
Real-World Projects + Interview Prep:
Industry-based mini projects for each topic
Azure Data Engineer interview questions and answers
Resume & career guidance included
Who Should Take This Course?
Aspiring Data Engineers & Cloud Developers
SQL Developers transitioning to cloud data platforms
Anyone preparing for the Azure Data Engineer Associate certification
Professionals looking to gain hands-on Azure data stack skills
Tools Covered:
Azure Data Factory, Databricks, Synapse Analytics, SQL Server, Blob Storage, Delta Lake, Python, Git, Event Hubs, and more.
No prior cloud experience required! All concepts are explained from scratch.