Advanced Topics in Dimensional Modeling - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

Advanced Topics in Dimensional Modeling

Description:

Often a large number of specific key figures can be represented by a single ... The old attribute value is overwritten so the dimension always reflects the most ... – PowerPoint PPT presentation

Number of Views:333
Avg rating:3.0/5.0
Slides: 40
Provided by: busi210
Category:

less

Transcript and Presenter's Notes

Title: Advanced Topics in Dimensional Modeling


1
Advanced Topics in Dimensional Modeling
2
Generic Key Figures
  • Often a large number of specific key figures can
    be represented by a single generic key figure
  • For example, a business may receive revenue
    through a variety of means such as products and
    services
  • In this case a single key figure, revenue item,
    is created and the type of revenue item is
    indicated in a dimension
  • Accommodates changes easily since new types can
    be added by simply adding a record to the
    dimension

3
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

4
Snowflaking
  • Snowflaking occurs when the low cardinality
    fields in dimension tables are removed into
    separate tables
  • Normalizing the dimension tables
  • Snowflaking is tempting to database types who
    like to normalize but it may be a bad idea
  • May decrease performance
  • Decreases clarity of database
  • Prevents the use of bitmap indexes which increase
    performance dramatically
  • Empirical tests show that effect of snowflaking
    on performance depends on specific DBMS

5
Snowflake Order Item Schema
6
Starflake schema
7
Starflake Schema for Order Item Transaction
8
Permissible Snowflaking
  • Creating subdimensions is permissible under
    specific circumstances
  • If some attributes in a dimension are at a
    different granularity than the others
  • If we save a significant amount of storage space
    (very large dimension)
  • If we often want to browse among the attributes
    in the subdimension

9
Demographic Subdimension of Customer Dimension
Aggregated at County Level
Individual Customers
10
Time Dimension
Subdimension created in this casebecause we want
to use the sameTime Dimension in
different countries
Time dimension is critical in every data
warehouse. This dimension has adaily
granularity and has enough detail that
applications dont have to haveany hard coded
date logic.
11
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

12
Slowly Changing Dimensions
  • Dimensions are usually fairly static but
    sometimes can change slowly
  • A well designed data warehouse will specify a
    change strategy for every attribute of every
    dimension
  • Three strategies for handling changing dimensions

13
Type 1 Overwrite the value
  • The old attribute value is overwritten so the
    dimension always reflects the most recent value

Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Education ABC922-Z
Assume IntelliKidz 1.0 is reclassified as
Strategy. The Type 1 responsesimply updates the
existing record
Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Strategy ABC922-Z
14
Type 1 Overwrite the value
  • Advantage is that its quick and easy
  • Disadvantage is that there is no historical
    record
  • So if the sales if IntelliKidz 1.0 takes off
    because of the reclassification we have no
    historical record of the reason
  • Correct strategy if were simply correcting a bad
    value

15
Type 2 Add a Dimension Row
  • A new record is created and the old record is kept
  • Product
    SKU Number
  • Product Key Description Department
    (Natural Key)
  • IntelliKidz 1.0 Education
    ABC922-Z
  • 67890 IntelliKidz 1.0 Strategy
    ABC922-Z

May include an effective date field but it isnt
necessary
16
Type 2 Add a Dimension Row
  • An example of why we cant use the natural key as
    our primary key
  • Type 2 changes are said to perfectly partition
    history because fact table records that are added
    after the change will reference record 67890
  • We can look at the entire history using the SKU
    or can look at before and after using the Product
    Key
  • Most common approach

17
Time Stamped Dimension
A record is added for each changeto the
employees HR profile This is a very large
dimension but may be worth it
18
Type 3 Add a Dimension Column
  • Adding a column is useful if we want the ability
    to see fact data as if the change never occurred
  • Sales district boundaries are altered but users
    want to still see sales in terms of the old
    boundaries

Product
Prior SKU
Number Product Key Description
Department Department (Natural Key) 12345
IntelliKidz 1.0 Strategy
Education ABC922-Z
19
Type 3 Add a Dimension Column
  • Useful when we want to maintain multiple
    alternate realities
  • We can look at all the fact data by either value
    of the attribute
  • Rarely used

20
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

21
Rapidly Changing Dimensions
  • If changes to dimensions are rapid then
  • If the dimension is small use the Type 2 approach
  • If the dimension is large then create a
    subdimension with the changing attributes

22
Rapidly Changing Dimensions
Demographics are convertedinto banded values
(e.g.0-12,000, 12,000-20,000,etc.) and then
we have a recordfor every combination of
demographic attributes. If we have five
attributes with 10 levels each then we have 105
100,000 records. If this gets too large
createmultiple demographic dimensions
(i.e.demographics and purchase-demographicss) W
henever a change occurs we adda record to the
fact table and sinceeach fact table record
includes thecustomer and demographic keys,
wellalways be able to classify customers.
23
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

24
Junk Dimensions
  • Sometimes when all the descriptive attributes are
    assigned to dimensions there are data elements
    left over
  • e.g. miscellaneous flags like order type
  • These can be gathered into a junk dimension

25
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

26
Many-to-Many Dimensions
  • Many-to-Many relationships are difficult to model
    in star schemas
  • Time-dependent (history) relationships
  • Generic (multiple role) relationships
  • Multi-valued dependencies

27
Time-dependent (history) relationships
  • This refers to an attribute that has a single
    value at any specific time but a history is kept
    so that there are many values saved
  • Stock levels
  • Employment position

28
Employment Position History
Breaks hierarchical chain
29
Time-dependent (history) relationships
  • Ignore history
  • Keep only current value
  • Slowly changing dimension
  • Add a new record in dimension for each value

30
Generic (Multiple Role) Relationships
  • Many-to-many relationship is used to represent a
    fixed number of different types of relationships
    between the same two entities
  • These correspond to different roles that an
    entity may play in relation to an entity

31
Generic (Multiple Role) Relationships
Employee may - receive the order -
approve the order - dispatch the order
32
Multi-Valued Dependency
  • When many entities of one type can be associated
    with many entities of another type at the same
    point in time

33
Multi-Valued Dependency
A single patientmay have zero, one or
morediagnoses
34
Bridge Table
Bridge table contains a record for each diagnosis
a patient has.The weighting factor is used to
partition facts among the diagnoses. The
weighting factor should sum to one for each
diagnosis group.
35
Account Bridge Table
Customers in a bank have multiple accounts and
there may be multiple signatories onthe set of
accounts so there is a many-to-many relationship
between the Customer Dimension and the Account
Balance Fact Table. The Account-to-Customer
Bridge Table contains account groups for each
customer. The Account Balance Fact Table
contains the total of all balances and the
weighting factor allocates the total among the
individual accounts.
36
Agenda
  • Snowflaking
  • Slowly Changing Dimension
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Many-to-Many Dimensions
  • Organization and Parts Hierarchies

37
Hierarchies
  • Complex hierarchies among records in a dimension
    are difficult to represent
  • Organizational structure
  • Bill of materials
  • Traditional method is to use a recursive pointer
    but this wont work because it cant be used with
    SQL to traverse the hierarchy

38
Recursive Join
Recursive join allows the use of the ManagerID
to be used to match anemployee with their
immediate supervisor whose record exists inthe
same table.
39
Organizational Hierarchy
In this case we want to be able traverse the
organizational structure of a commercialcustomer
. The Navigation Bridge Table containsa record
for parent and subsidiary in the figure at left.
There is also a zero length record of the
pathway of acustomer to itself. Each record
contains customerkey of the parent and the
subsidiary. This allowsthe use of the SQL GROUP
BY clause to traverse the hierarchy.
Write a Comment
User Comments (0)
About PowerShow.com