Data Warehouse Concepts: Basic to Advanced concepts

Data Warehouse Concepts: Learn the BI/Data Warehouse/Big Data concepts from scratch and become an expert.
4.4 (95 ratings)
1,500 students enrolled
Instructed by Sid Inf IT & Software / Other
$19
$30
37% off
Take This Course
  • Lectures 123
  • Length 7.5 hours
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

How taking a course works

Discover

Find online courses made by experts from around the world.

Learn

Take your courses with you and learn anywhere, anytime.

Master

Learn and practice real-world skills and achieve your goals.

About This Course

Published 4/2016 English

Course Description

****May 2015 Update****

A new section with a data-centric implementation approach for a Data Warehouse project has been added.

In this course, you will learn all the concepts and terminologies related to the data warehouse, such as OLTP, OLAP, dimensions, and facts, along with related concepts such as what is meant by a Star Schema, a Snowflake Schema, the other options available, and their differences. The course also explains how data is managed within the data warehouse and walks through the process of reading and writing data to the warehouse. Later in the course you will learn the basics of data modeling and how to start with it logically and physically. You will also learn the concepts related to facts, dimensions, and aggregations, and the commonly used techniques of ETL. Upon completion of this course, you will have a clear idea of all the concepts related to the data warehouse, sufficient to help you take the next step of becoming an ETL developer or administering a data warehouse environment with the help of various tools. All the best and happy learning!

What are the requirements?

  • Basic understanding of the IT (Software) industry
  • Familiar with the basic concept of Database/RDBMS.

What am I going to get from this course?

  • In this course, you will learn all the concepts and terminologies related to the data warehouse, such as OLTP, OLAP, dimensions, and facts, along with related concepts such as what is meant by a Star Schema, a Snowflake Schema, the other options available, and their differences.
  • The course also explains how data is managed within the data warehouse and walks through the process of reading and writing data to the warehouse. Later in the course you will learn the basics of data modeling and how to start with it logically and physically, as well as the concepts related to facts, dimensions, aggregations, and commonly used ETL techniques.
  • Upon completion of this course, you will have a clear idea of all the concepts related to the data warehouse, sufficient to help you take the next step of becoming an ETL developer or administering a data warehouse environment with the help of various tools.
  • All the best and happy learning!

What is the target audience?

  • This course is intended for freshers who are new to the data warehouse world: application/ETL developers, mainframe developers, database administrators, system administrators, and database application developers who design, maintain, and use data warehouses.
  • Recent graduates looking to get a job in the IT Industry
  • Finance professionals wanting to learn about Data Warehouses for reporting and analysis purposes, businesses looking to update and improve on data storage, analysis, and reporting processes.

What do you get with this course?

Not for you? No problem.
30 day money back guarantee.

Forever yours.
Lifetime access.

Learn on the go.
Desktop, iOS and Android.

Get rewarded.
Certificate of completion.

Curriculum

Section 1: Introduction
06:20

Thank you for choosing this course. In this lecture you will learn how to navigate and get the best out of this course.


All the Best and Happy Learning!! :)

Section 2: Brief about the Data warehouse
04:25

The original intent of the data warehouse was to segregate analytical operations from mainframe transaction processing in order to avoid slowdowns in transaction response times, and minimize the increased CPU costs accrued by running ad hoc queries and creating and distributing reports. Over time, the enterprise data warehouse became a core component of information architectures, and it's now rare to find a mature business that doesn't employ some form of an EDW or a collection of smaller data marts to support business intelligence, reporting and analytics applications.

In this lecture we see what the future of the data warehouse will be in the age of Big Data.

05:26

The concept of data warehousing is not hard to understand. The notion is to create a permanent storage space for the data needed to support reporting, analysis, and other BI functions. In this lecture we look at the main reasons behind creating a data warehouse and its benefits.

This long list of benefits is what makes data warehousing an essential management tool for businesses that have reached a certain level of complexity.

05:48

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.


05:43

A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William (Bill) Inmon:

  • Subject Oriented
  • Integrated
  • Nonvolatile
  • Time Variant

3 questions

Test your understanding on what is a Data Warehouse

Section 3: Business Intelligence
05:37

Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions.

03:34

Business intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data - like sales revenue, products, costs and incomes.

BI technologies provide current, historical and predictive views of internally structured data for products and departments, enabling more effective decision-making and strategic operational insights through functions like online analytical processing (OLAP), reporting, predictive analytics, data/text mining, benchmarking and Business Performance Management (BPM). These technologies and functions are often referred to as information management.

08:02
  • Business operations reporting
  • Forecasting
  • Dashboard
  • Multidimensional analysis
  • Finding correlation among different factors
06:24

The most common tools used for business intelligence are discussed in this lecture.

3 questions

Test your understanding on Business Intelligence

Section 4: Data Warehouse Architectures
04:46

In this lecture we see how the centralized architecture is set up, in which there exists only one data warehouse that stores all the data necessary for business analysis.

03:05

In a federated architecture the data is logically consolidated but stored in separate physical databases, at the same or at different physical sites. The local data marts store only the information relevant to a department.

The amount of data is reduced in contrast to a central data warehouse, while the level of detail is enhanced in this kind of model.

03:13

A multi-tiered architecture is a distributed data approach. This process cannot be done in one step, because many sources have to be integrated into the warehouse.

03:57

Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.

In general, all data warehouse systems have the following layers:

  • Data Source Layer
  • Data Extraction Layer
  • Staging Area
  • ETL Layer
  • Data Storage Layer
  • Data Logic Layer
  • Data Presentation Layer
  • Metadata Layer
  • System Operations Layer
04:43

This is where data is stored prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration. Based on the business architecture and design there can be more than one staging area which can be termed with different naming conventions. 

03:35


02:30

Data warehouses tend to have a high query success rate, as they have complete control over the four main areas of data management systems:

  • Clean data
  • Indexes: multiple types
  • Query processing: multiple options
  • Security: data and access
06:01

There are considerable disadvantages involved in moving data from multiple, often highly disparate, data sources to one data warehouse; these translate into long implementation times, high cost, lack of flexibility, dated information, and limited capabilities.

Other disadvantages are discussed in the video. 

3 questions

Test your understanding on DWH Architectures

Section 5: ODS - Operational Data Store
02:20

An ODS is designed for relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse. 

An ODS is similar to your short term memory in that it stores only very recent information; in comparison, the data warehouse is more like long term memory in that it stores relatively permanent information.

08:41

The typical definition of an operational data store (ODS) is that it’s a set of logically related data structures within a database. The data within an ODS is integrated, volatile and at a non-historical granular level that is designed to address a set of operational functions for a specific business purpose. The ODS must also be based on the enterprise standards for data management for the organization.

02:18

To understand the purpose of the ODS and when it is an appropriate solution, its characteristics must first be defined.

Characteristics of an Operational Data Store

Subject Oriented: The ODS contains specific data that is unique to a set of business functions. The data therefore represents a specific subject area.

Integrated: Data in the ODS is sourced from various legacy applications. The source data is taken through a set of ETL operations that include cleansing and transformative processes. These processes are based on rules that have been created through business requirements for data quality and standardization.

Current (non-historical): The data in the ODS is up to date and reflects the current status of data from the sourcing applications.

Detail: Data in the ODS is primarily used to support operational business functions. This means that there is a specific level of granularity, based on business requirements, that dictates the level of detail that data in the ODS will have.

04:01

Please refer to the additional resources of this section which contains the Info-graphic on the differences between the ODS, DWH, OLTP, OLAP, DSS and DM (Data Mart).

3 questions

Test your understanding on ODS

Section 6: OLAP
05:54

OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning.

04:33

- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing and maintaining data integrity in multi-access environments, with effectiveness measured by the number of transactions per second. An OLTP database contains detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF).

- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database contains aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
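The contrast can be sketched with a toy example using sqlite3 from the Python standard library; the table, columns, and data here are illustrative, not from the course:

```python
import sqlite3

# One in-memory database stands in for both styles; real deployments
# would use separate, differently tuned systems.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "customer TEXT, amount REAL, order_date TEXT)")
rows = [(1, "alice", 120.0, "2016-01-05"),
        (2, "bob",    80.0, "2016-01-06"),
        (3, "alice",  40.0, "2016-02-01")]
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# OLTP-style: a short, targeted transaction touching a single row.
con.execute("UPDATE orders SET amount = 85.0 WHERE order_id = 2")

# OLAP-style: an aggregation that scans the whole table.
monthly = con.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month").fetchall()
print(monthly)  # [('2016-01', 205.0), ('2016-02', 40.0)]
```

The OLTP statement cares about one row and data integrity; the OLAP query cares only about aggregate response time over many rows.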

06:01


06:36

MOLAP (multidimensional online analytical processing) is online analytical processing (OLAP) that indexes directly into a multidimensional database.

03:40

Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs dynamic multidimensional analysis of data stored in a relational database rather than in a multidimensional database (which is usually considered the OLAP standard).

02:28

HOLAP (hybrid online analytical processing) is a combination of ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) which are other possible implementations of OLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store, allowing a tradeoff of the advantages of each.

01:28

DOLAP (Desktop OLAP or Database OLAP) provides multidimensional analysis locally on the client machine, on data collected from relational or multidimensional database servers.

4 questions

Test your understanding on OLAP

Section 7: Data Mart
01:47

The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.

00:36

Data Warehouse:

  • Holds multiple subject areas
  • Holds very detailed information
  • Works to integrate all data sources
  • Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart:

  • Often holds only one subject area - for example, Finance or Sales
  • May hold more summarized data (although many hold full detail)
  • Concentrates on integrating information from a given subject area or set of source systems
  • Is built focused on a dimensional model using a star schema.
02:40

The primary advantages are:

  • Data Segregation: Each box of information is developed without changing the other ones. This boosts information security and the quality of data.
  • Easier Access to Information: These data structures provide an easier way of interpreting the information stored in the database
  • Faster Response: Derived from the adopted structure
  • Simple queries: Based on the structure and size of the data
  • Subject-specific detailed data: Might also provide summarization of the information
  • Specific to User Needs: This set of data is focused on the end user's needs
  • Easy to Create and Maintain
04:00
  • Easy access to frequently needed data
  • Creates collective view by a group of users
  • Improves end-user response time
  • Ease of creation
  • Lower cost than implementing a full data warehouse
  • Potential users are more clearly defined than in a full data warehouse
  • Contains only business essential data and is less cluttered.
03:13

Like any other system, data marts have many issues, including functionality, data size, scalability, performance, data access, and consolidation. Since data marts can be broken out by department to focus on individual needs, this approach makes data access, consolidation, and cleansing very difficult.

For instance, a company may have a data mart for each of its departments, including sales, inventory, tracking, shipping, receiving, and production. Combining revenue information from each of these departments into a single data mart can be overwhelming and confusing, due to the volume of data to be analyzed.

02:15

This lecture talks about the mistakes and the misconceptions one has with regard to the data warehouse.

3 questions

Test your understanding on Data marts

Section 8: Metadata
01:57

Metadata is data that describes other data. Meta is a prefix that in most information technology usages means "an underlying definition or description." Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier.

01:46

Metadata provides a number of very important benefits to the enterprise, including:


Consistency of definitions: Metadata contains information about data that helps reconcile the difference in terminology such as "clients" and "customers," "revenue" and "sales," etc.


Clarity of relationships: Metadata helps resolve ambiguity and inconsistencies when determining the associations between entities stored throughout the data environment.


Clarity of data lineage: Metadata contains information about the origins of a particular data set and can be granular enough to define information at the attribute level; metadata may maintain allowed values for a data attribute, its proper format, location, owner, and steward. 

05:32

There are three broad categories of metadata:


Business metadata: includes definitions of data files and attributes in business terms. It may also contain definitions of business rules that apply to these attributes, data owners and stewards, data quality metrics, and similar information that helps business users to navigate the "information ocean." 


Technical metadata: is the most common form of metadata.  Technical metadata is a key metadata type used to build and maintain the enterprise data environment. Technical metadata typically includes database system names, table and column names and sizes, data types and allowed values, and structural information such as primary and foreign key attributes and indices.


Operational metadata: contains information that is available in operational systems and run-time environments. It may contain data file size, date and time of last load, updates, and backups, names of the operational procedures and scripts that have to be used to create, update, restore, or otherwise access data, etc.


All these types of metadata have to be persistent and available in order to provide necessary and timely information to manage often heterogeneous and complex data environments such as those represented by various Data Hub architectures.

05:34

In this lecture, we will see what is to be considered when setting up projects with metadata.

01:30

The standard best practices are discussed in this lecture. Though these may vary from project to project and enterprise to enterprise, they can be considered the ground rules.

03:16

A quick summary on what is discussed on Metadata. 

3 questions

Test your understanding on Metadata

Section 9: Data Modeling
02:13

Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. 

03:53

Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering. 

Section 10: Entity Relational Data Model
03:41

An entity–relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database.

01:59

An entity can be a real-world object, either animate or inanimate, that is easily identifiable.

For example, in a university database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities with attributes sharing similar values.

For example, a Students set may contain all the students of a university; likewise a Teachers set may contain all the teachers of a university from all faculties. Entity sets need not be disjoint.

04:01

Entity Types: The following types are discussed in this lecture

  • Independent/Fundamental Entity (Strong)
  • Dependent/Attributive Entity (Weak)
  • Associative Entity
02:09

Entity Types: The following types are discussed in this lecture

  • Super type Entity
  • Sub type Entity

01:54

Entities are represented by means of their properties, called attributes. All attributes have values. 

04:00
  • Simple attribute
  • Composite attribute
  • Derived attribute
  • Single-value attribute 
  • Multi-value attribute
02:44

A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.

  • Super Key
  • Candidate Key
  • Primary Key
02:02

One or more attributes uniquely identify an instance of an entity.

01:15

The association among entities is called a relationship. For example, an employee works_at a department, a customer enrolls for an offer. Here, Works_at and Enrolls are called relationships.

02:37

An ER diagram is a visual representation of data that describes how entities are related to each other.

01:29

A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:

  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity is specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.

The steps for designing the logical data model are as follows:

  1. Specify primary keys for all entities.
  2. Find the relationships between different entities.
  3. Find all attributes for each entity.
  4. Resolve many-to-many relationships.
  5. Normalization.
02:26

A physical data model represents how the model will be built in the database. It shows all table structures, including column names, column data types, column constraints, primary keys, foreign keys, and relationships between tables. Features of a physical data model include:

  • Specification of all tables and columns.
  • Foreign keys are used to identify relationships between tables.
  • Denormalization may occur based on user requirements.
  • Physical considerations may cause the physical data model to be quite different from the logical data model.
  • The physical data model will be different for different RDBMSs. For example, the data type for a column may be different between MySQL and SQL Server.

The steps for physical data model design are as follows:

  1. Convert entities into tables.
  2. Convert relationships into foreign keys.
  3. Convert attributes into columns.
  4. Modify the physical data model based on physical constraints / requirements.
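The first three steps can be sketched for a small two-entity model (a hypothetical Student/Course design whose many-to-many relationship has already been resolved in the logical model); all names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Step 1 and 3: entities become tables, attributes become columns.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Step 2: the Student/Course relationship becomes a table whose
-- foreign keys implement the relationship.
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['course', 'enrollment', 'student']
```

The column data types here are SQLite's; step 4 is where such type choices would be adjusted per RDBMS.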
03:30

Here we compare these three types of data models. 

We can see that the complexity increases from conceptual to logical to physical. This is why we always start with the conceptual data model (so we understand at a high level what the different entities in our data are and how they relate to one another), then move on to the logical data model (so we understand the details of our data without worrying about how it will actually be implemented), and finally the physical data model (so we know exactly how to implement our data model in the database of choice). In a data warehousing project, sometimes the conceptual data model and the logical data model are considered a single deliverable.

03:33

Following are advantages of an E-R Model:

  • Straightforward relation representation
  • Easy conversion from E-R to other data models
  • Graphical representation for better understanding

Disadvantages of E-R Data Model:

  • No industry standard for notation
  • Popular for high-level design
  • Software cannot usefully query a general ER model
  • Non-intuitive 
3 questions

Test your understanding on ER Models

Section 11: Dimensional Model
04:39

A dimensional model is a database structure that is optimized for online queries and data warehousing tools. It is made up of "fact" and "dimension" tables. A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point for getting at the facts.

01:50

Benefits of the dimensional model:

  • Understandability
  • Query performance
  • Extensibility
02:48

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place, and time. In a data warehouse, dimensions provide structured labeling information to otherwise unordered numeric measures.

02:05

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema, surrounded by dimension tables.

There are four types of facts:

  • Additive - Measures that can be added across all dimensions.
  • Non-additive - Measures that cannot be added across any dimension.
  • Semi-additive - Measures that can be added across some dimensions but not others.
  • Factless fact tables - The fact table does not have aggregate numeric values or information.
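The additivity distinction can be illustrated in plain Python on a toy balance fact; account balances are the classic semi-additive measure (they add across accounts but not across time), and the rows below are made up for illustration:

```python
# (account, month, balance) rows from a hypothetical balance fact table.
facts = [
    ("checking", "2016-01", 100.0),
    ("savings",  "2016-01", 300.0),
    ("checking", "2016-02", 150.0),
    ("savings",  "2016-02", 250.0),
]

# Valid: summing across the account dimension at one point in time.
jan_total = sum(b for a, m, b in facts if m == "2016-01")
print(jan_total)  # 400.0

# Invalid for a semi-additive measure: summing across time
# double-counts the same money.
naive_total = sum(b for a, m, b in facts)  # 800.0 -- not a real balance

# The usual fix: take the latest snapshot per account, then sum.
latest = {}
for account, month, balance in sorted(facts, key=lambda r: r[1]):
    latest[account] = balance
ending_total = sum(latest.values())
print(ending_total)  # 400.0
```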
01:45

02:28

01:30

02:29

04:01

A surrogate key is any column or set of columns that can be declared as the primary key instead of a "real" or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys. So a surrogate is a candidate key.
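A minimal sketch of surrogate key assignment during a dimension load, in plain Python; the function name and the customer numbers are illustrative:

```python
# Natural key -> surrogate key mapping, as an ETL lookup might keep it.
key_map = {}
next_key = 1

def surrogate_for(natural_key):
    """Return the surrogate key, assigning a new one on first sight."""
    global next_key
    if natural_key not in key_map:
        key_map[natural_key] = next_key
        next_key += 1
    return key_map[natural_key]

# Incoming source rows keyed by a natural key (a customer number).
for cust_no in ["C-100", "C-200", "C-100"]:
    print(cust_no, surrogate_for(cust_no))
# C-100 1 / C-200 2 / C-100 1 -- the repeat reuses its surrogate
```

In a real warehouse the surrogate is a meaningless integer generated by the load process, which insulates the model from changes in the source's natural keys.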

05:23

A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
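A minimal star schema can be sketched with sqlite3: one fact table surrounded by two dimensions. All table and column names, and the data, are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
-- The fact table holds numeric measures plus a key into each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_date VALUES (1, 2015), (2, 2016);
INSERT INTO dim_product VALUES (10, 'widget'), (11, 'gadget');
INSERT INTO fact_sales VALUES (1, 10, 5.0), (2, 10, 7.0), (2, 11, 3.0);
""")
# A typical dimensional query: constrain by dimension attributes,
# aggregate the fact's measure.
result = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE d.year = 2016
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(result)  # [('gadget', 3.0), ('widget', 7.0)]
```

Every query follows the same shape: filter on dimensions, join through the keys, aggregate the facts.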

03:25

The snowflake schema is diagrammed with each fact surrounded by its associated dimensions (as in a star schema), and those dimensions are further related to other dimensions, branching out into a snowflake pattern.

02:20

A galaxy schema, also known as a fact constellation schema, is the combination of a star schema and a snowflake schema.

05:27

When choosing a database schema for a data warehouse, snowflake and star schemas tend to be popular choices. This comparison discusses the suitability of the star vs. snowflake schema in different scenarios, and their characteristics.

06:38

A conformed dimension is a dimension that has exactly the same meaning and content when referred to from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization.

03:06

In a junk dimension, we combine miscellaneous low-cardinality indicator fields into a single dimension. This way, we only need to build a single dimension table, and the number of fields in the fact table, as well as its size, can be decreased.
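The idea can be sketched in plain Python: enumerate every combination of the low-cardinality flags once, and let the fact table carry a single key. The flag names here are made up for illustration:

```python
from itertools import product

# Two yes/no indicator fields that would otherwise each sit in the
# fact table (or each get a tiny dimension of its own).
flags = {"is_gift": ["Y", "N"], "is_rush": ["Y", "N"]}

# The junk dimension: one row per combination, keyed by a small integer.
junk_dim = {i + 1: dict(zip(flags, combo))
            for i, combo in enumerate(product(*flags.values()))}

# A reverse lookup lets the ETL replace the two flags with one key.
reverse = {tuple(row.values()): key for key, row in junk_dim.items()}
fact_row_flags = ("Y", "N")          # is_gift=Y, is_rush=N
print(reverse[fact_row_flags])       # the single junk-dimension key: 2
print(len(junk_dim))                 # 4 combinations in total
```

Two two-valued flags yield only four dimension rows, so the whole dimension can be pre-populated up front.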

03:35

According to Ralph Kimball, who originated the term, a degenerate dimension is a dimension key in the fact table that does not have its own dimension table, because all the interesting attributes have been placed in analytic dimensions.

05:30

We start with the basic definitions of a dimension and a fact, and then move on to Slowly Changing Dimensions.

12:30

There are many approaches to dealing with SCDs. The most popular are:

  • Type 0 - The passive method
  • Type 1 - Overwriting the old value
  • Type 2 - Creating a new additional record
  • Type 3 - Adding a new column
  • Type 4 - Using a historical table
  • Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6)
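A Type 2 change (creating a new additional record) can be sketched in plain Python; the row layout, effective dates, and customer data below are illustrative:

```python
# Customer dimension rows: Type 2 keeps history with effective dates
# and a current-row flag.
dim = [
    {"surrogate_key": 1, "customer": "C-100", "city": "Austin",
     "valid_from": "2015-01-01", "valid_to": "9999-12-31", "current": True},
]

def apply_type2(dim, natural_key, new_city, change_date):
    """Expire the current row and append a new version (SCD Type 2)."""
    for row in dim:
        if row["customer"] == natural_key and row["current"]:
            row["valid_to"] = change_date
            row["current"] = False
    dim.append({"surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
                "customer": natural_key, "city": new_city,
                "valid_from": change_date, "valid_to": "9999-12-31",
                "current": True})

apply_type2(dim, "C-100", "Dallas", "2016-04-01")
print(len(dim))                                  # 2 -- history preserved
print([r["city"] for r in dim if r["current"]])  # ['Dallas']
```

A Type 1 change would instead overwrite `city` in place, losing the Austin history; that trade-off is exactly what the type numbers encode.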
03:33

Dimension, Fact and SCD Type 1, 2 and 3 are reviewed in this lecture. 

Step by Step approach to set up the Dimensional Model using a retail case study
07:28
05:50

In this lecture we talk about the differences between ER model and the Dimensional Model.

10 questions

Test your understanding on Dimensional Model

Section 12: DWH Indexes
02:12

Indexing the data warehouse can reduce the amount of time it takes to see query results. When indexing dimensions, you'll want to index on the dimension key. When indexing the fact table, you'll want to index on the date key or the combined date plus time.

04:17

A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely or relative to the number of records that contain the data.
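The mechanics can be sketched with Python integers standing in for bitmaps (one bit per table row); the column and its values are illustrative:

```python
# A low-cardinality column, one value per table row.
region = ["east", "west", "east", "north", "west", "east"]

# Build one bitmap per distinct value: bit i is set when row i matches.
bitmaps = {}
for row_id, value in enumerate(region):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)

# Queries become bitwise operations.  Rows where region = 'east':
east = bitmaps["east"]
print(bin(east))                     # 0b100101 -> rows 0, 2, 5

# region = 'east' OR region = 'west' is a single bitwise OR:
either = bitmaps["east"] | bitmaps["west"]
rows = [i for i in range(len(region)) if either >> i & 1]
print(rows)  # [0, 1, 2, 4, 5]
```

With few distinct values, each bitmap is tiny relative to the table, and AND/OR predicates combine in one pass, which is why the structure suits warehouse-style queries.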


Instructor Biography

Sid Inf, Data/ETL Architect

Business Intelligence consultant and trainer with 13+ years of extensive work experience on various client engagements. Has delivered many large data warehousing projects and trained numerous professionals in business intelligence technologies. Has worked extensively on all facets of data warehousing, including requirements gathering, gap analysis, database design, data integration, data modeling, enterprise reporting, data analytics, data quality, data visualization, and OLAP.

Has worked on a broad range of business verticals and holds exceptional expertise in various ETL tools such as Informatica PowerCenter, SSIS, ODI and IDQ, as well as Data Virtualization, DVO, and MDM.

Ready to start learning?
Take This Course