Data Warehousing - PowerPoint PPT Presentation

1 / 23
About This Presentation
Title:

Data Warehousing

Description:

data marts fit together like interlocking blocks. conformed dimensions & facts ... These are facts that are aggregated / summarized over Orders, Shippers, ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 24
Provided by: ronmcf
Category:
Tags: data | warehousing

less

Transcript and Presenter's Notes

Title: Data Warehousing


1
Section 26.1
  • Data Warehousing
  • Data warehouse
  • Comparison to traditional databases
  • Architecture
  • Dimensional Modelling
  • Northwind example
  • StarTracker (if time permits)

2
Data Warehouse
  • A Data Warehouse (many definitions)
  • is a subject-oriented, integrated, non-volatile,
    time-variant collection of data in support of
    managements decisions
  • is an integrated data collection supporting
    sophisticated analysis
  • is the union of its constituent data marts. These
    data marts are organized in a Bus Architecture
    around conformed dimensions and conformed facts.
    A data mart is a collection of dimensional models
    for one area of the enterprise, such as payroll,
    human resources, sales, ...
  • Data warehousing is a collection of decision
    support technologies, aimed at enabling the
    knowledge worker

3
Traditional and Data Warehouse Databases
Traditional (OLTP) event processing data entry ER
model normalized, complex current
(twinkling) repetitive usage transaction
throughput sub-second response data consistency
Data Warehouse decision support, analytic
processing data loads Star, snowflake,
... de-normalized, simple historical (periodic
updates) ad-hoc query throughput seconds,
minutes accuracy
4
Warehouse Architecture
Extract Transform Load
ETL
OLAP
DSS
Extract
Load
Data Warehouse
Transform
Repair
Data mining
Operational Systems (heterogeneous systems)
Metadata
5
Warehouse Architecture (from 4904 text)
6
Dimensional Modeling - Star Schemas
  • Dimensional Modeling
  • a technique for designing databases
  • very constrained
  • two types of entity sets
  • 1 dimensions
  • 2 facts
  • the Entity Relationship Diagram (ERD) has a
    star-like appearance
  • Dimensional models are referred to as star schemas

7
Northwind Database
Categories
OrderDetails
Suppliers
Products
Orders
Shippers
Customers
  • An OLTP database
  • used to run the business
  • useful for managing the day-to-day operations of
    handling orders for customers
  • these designs tend to be very complicated

Employees
8
Star Schema
  • Dimensional Model
  • Example
  • In the Northwind database, each order comprises
    several order detail lines. On each line, a
    product, price and quantity are specified

Day
Order Detail
Customer
Order
Shipper
Employee
Product
Consider a star schema where the price and
quantity of each order detail line is stored in
the fact table. Related to the fact table are
dimensions for Customer, Employee, Product,
Shipper, Order, and Day
9
Star Schema
Typically a star schema is shown as many
one-to-many relationships
Day
1
n
  • Each day has many order details
  • Each customer has many order details
  • Each product has many order details
  • Each order detail line item is for one order, one
    customer, one employee, one product, one shipper
    and occurs on one day

Order Detail
1
n
Customer
10
Star Schema
But, equivalently it can be shown as
one many-to-many-to-many- relationship
Day
n
Order Detail
m
o
Order
Customer
p
r
Employee
Shipper
q
Now, well examine a Star Schema in SQL Server
Product
11
Star Schema
  • Dimensional Modeling
  • basic model contains
  • one fact table
  • many dimension tables
  • relationships are between the fact table and the
    dimensions only

12
Star Schema
  • Dimensional Modeling
  • relationships can be viewed either as
  • many one-to-many relationships between the fact
    entity set and the dimension entity sets
  • or one many-to-many-to-many-to relationship
    (this is an n-ary relationship).

13
Star Schema
  • Why are Star Schemas being used in warehousing?
  • predictable
  • simplifying
  • extensible
  • performance

14
Star Schema
  • Predictable
  • general structure is known in advance
  • star schema pattern is repeated
  • easier to comprehend

15
Star Schema
  • Simplifying
  • simple models
  • models the data the way the analyst sees the data
  • becomes easy to formulate queries
  • users are less likely to submit incorrect queries
  • time is simplified

Strive for simplicity Business issues must
come first, simplicity second, and normalization
third (if necessary) Robert J. Hocutt in The
Official Guide to Informix/Red Brick Data
Warehousing
16
Star Schema
  • Performance
  • Performance improvement because
  • fewer tables to join
  • dimensions are de-normalized
  • facts or measures are centralized
  • requires different join algorithms (star-join)
  • requires different data structures (bit vector
    indexes)

17
Star Schema
  • Extensible
  • we can gracefully add
  • new attributes fact or dimension
  • new fact tables
  • new dimension tables

18
Data Mart
  • Data Mart
  • a data mart is a collection of star schemas for
    one area of the enterprise, such as
  • payroll,
  • human resources,
  • sales, ...

Recall that some define a data warehouse as the
union of its constituent data marts.
19
Bus Architecture
  • Bus Architecture
  • data marts fit together like interlocking blocks
  • conformed dimensions facts
  • a dimension that means the same in every star
    where it is used
  • facts that have the same units of measurement

20
Bus Architecture
Dimensions
Matrix useful for planning the evolution of the
warehouse
Day
Customer
Product
Employee
Order
Shipper
JobPosition

Data Marts
OrderTracking
X
X
X
X
X
X


HumanResources
X


X


X










21
Aggregates
For various reasons, we have summary
tables Principle created by collapsing
dimensions, or, summing across some
dimension Aggregate tables can be transparent to
the end-user
22
Aggregates
existing dimensions are reused
Day
Day
Order Detail dp
These are facts that are aggregated / summarized
over Orders, Shippers, Employees and Customers
Order Detail
Customer
Order
Shipper
Employee
Product
Product
How would you generate this summary table using
SQL?
23
Aggregates
The Month dimension is called a shrunken Day
dimension
Day
Month
Order Detail mp
Order Detail
Customer
Order
Shipper
Employee
Product
Product
How would you generate this summary table using
SQL?
Write a Comment
User Comments (0)
About PowerShow.com