Data Architecture for Data Scientists

Name: Data Architecture for Data Scientists
Rating: 4.5 (1447 reviews)

Datawarehouse, Data Lake, Data Lakehouse, Data Mesh, Kafka, Lambda & Kappa architecture, Feature Store, Vector DB & more

Created by Biju Krishnan

Last updated 5/2024

English

What you'll learn

Data architecture in general, so that you can navigate your organization's data landscape
Develop an understanding of topics like data lakes, data warehousing and data lakehouses, so that you can communicate with data engineering teams
Understand the principles of data governance topics like Data Mesh to better navigate the data governance paradigm
Get introduced to technologies related to machine learning-specific data infrastructure, like feature stores and vector databases
What is data architecture? What is a data warehouse (DWH)? What is a data lake? What is a data lakehouse? What is a data mesh?
How is streaming data used in data science? What is a feature store? How is a feature store used in machine learning? What are vector databases?

Course content

9 sections • 37 lectures • 2h 16m total length

Why enroll in this course? (1:48)
Course contents (2:17)
About the course creator (1:28)
Million dollar slide (1:42)

Introduction to Data Mesh (3:57)
Data Mesh principles: Domain ownership and data as a product (5:26)
Data Mesh principles: Self service and federated governance (3:27)
Data Catalog (5:39)
Data Contracts (10:17)
The "Data Contracts" chapter covers the role of data contracts in the data mesh. A data contract is a set of rules or agreements between a source domain and its consumer domains, designed so that a change to a data schema or format in one domain does not adversely affect another. This preserves each domain's autonomy while reducing the risk of disruption. The chapter describes three forms of data contract (verbal, written, and automated) and their importance for data quality and consistency across domains. It then walks through an example contract written in a YAML-like declarative syntax that validates a data set, for instance by checking that email addresses are valid, that the data is fresh, and that relationships between variables are consistent. It closes with how these contracts are enforced, so that data stays compliant from creation to consumption.
Data Fabric (9:49)
Module review quiz - Data Governance with the Data Mesh
Resources and Slides (0:32)
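
As a rough illustration of the kind of checks described for the "Data Contracts" lecture (email validity, data freshness, and consistency between variables), here is a minimal Python sketch. It is not the contract shown in the course: the contract layout, the check names (`valid_email`, `freshness`, `less_or_equal`), the column names and the thresholds are all assumptions made for this example, and a Python dictionary stands in for the YAML-like syntax.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical contract: dataset, column names, check types and thresholds
# are illustrative assumptions, not taken from the course material.
CONTRACT = {
    "dataset": "customers",
    "checks": [
        {"type": "valid_email", "column": "email"},
        {"type": "freshness", "column": "updated_at", "max_age_hours": 24},
        {"type": "less_or_equal", "left": "discount", "right": "price"},
    ],
}

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(rows, contract, now):
    """Return a list of (row_index, check_type) for every violated check."""
    violations = []
    for i, row in enumerate(rows):
        for check in contract["checks"]:
            kind = check["type"]
            if kind == "valid_email":
                ok = bool(EMAIL_RE.match(row[check["column"]]))
            elif kind == "freshness":
                age = now - row[check["column"]]
                ok = age <= timedelta(hours=check["max_age_hours"])
            elif kind == "less_or_equal":
                ok = row[check["left"]] <= row[check["right"]]
            else:
                raise ValueError(f"unknown check type: {kind}")
            if not ok:
                violations.append((i, kind))
    return violations


# Fixed reference time so the example is reproducible.
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"email": "ana@example.com", "updated_at": now - timedelta(hours=2),
     "price": 100.0, "discount": 10.0},
    {"email": "not-an-email", "updated_at": now - timedelta(hours=48),
     "price": 50.0, "discount": 80.0},
]
print(validate(rows, CONTRACT, now))
# [(1, 'valid_email'), (1, 'freshness'), (1, 'less_or_equal')]
```

Keeping the rules in a declarative structure, separate from the validation code, is what would let a source domain and a consumer domain review and version the contract without reading the code that enforces it.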

Requirements

Basic understanding of the data science project workflow, such as model training and model deployment
Basic understanding of why data is needed for training and deploying models
Understanding of the difference between batch and real-time use cases

Description

Machine learning models are only as good as the data they are trained on, which is why understanding data architecture is critical for data scientists building machine learning models.

This course will teach you:

The fundamentals of data architecture
A refresher on data types, including structured, unstructured, and semi-structured data
Data warehouse fundamentals
Data lake fundamentals
The differences between data warehouses and data lakes
Data lakehouse fundamentals
Data Mesh fundamentals for decentralized governance of data, including topics like data catalogs, data contracts and data fabric
The challenges of incorporating streaming data in data science
Some machine learning-specific data infrastructure, such as feature stores and vector databases

The course will help you:

Make informed decisions about the architecture of your data infrastructure to improve the accuracy and effectiveness of your models
Adopt modern technologies and practices to improve workflows
Develop a better understanding and empathy for data engineers
Improve your reputation as an all-around data scientist

Think of data architecture as the framework that supports the construction of a machine learning model. Just as a building needs a strong framework to support its structure, a machine learning model needs a solid data architecture to support its accuracy and effectiveness. Without a strong framework, the building is at risk of collapsing, and without a strong data architecture, machine learning models are at risk of producing inaccurate or biased results. By understanding the principles of data architecture, data scientists can ensure that their data infrastructure is robust, reliable, and capable of supporting the training and deployment of accurate and effective machine learning models.

By the end of this course, you'll have the knowledge to help guide your team and organization in creating the right data architecture for deploying data science use cases.

Who this course is for:

Data Scientists who are transitioning from academia or business domains
Junior data scientists who would like to understand the topics surrounding data infrastructure
Citizen data scientists who wish to deploy machine learning models in production
Anyone who wishes to learn the basics of data architecture in a very short time
BI Analysts and BI developers who would like a quick overview of the enterprise data landscape
Anyone who wants a quick overview of the data architecture components in an enterprise

Course content

Introduction: 4 lectures • 7min

Data Types: 6 lectures • 20min

Datawarehouse: 4 lectures • 13min

Data Lake: 4 lectures • 13min

Data Lakehouse: 3 lectures • 12min

Data Governance with the Data Mesh: 7 lectures • 39min

Streaming data in Data Science: 5 lectures • 15min

Data infrastructure for Machine Learning: 2 lectures • 9min

Flowchart and Use case examples: 2 lectures • 8min