Title: Advanced Topics in Dimensional Modeling
1Advanced Topics in Dimensional Modeling
2Generic Key Figures
- Often a large number of specific key figures can
be represented by a single generic key figure - For example, a business may receive revenue
through a variety of means such as products and
services - In this case a single key figure, revenue item,
is created and the type of revenue item is
indicated in a dimension - Accommodates changes easily since new types can
be added by simply adding a record to the
dimension
3Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
4Snowflaking
- Snowflaking occurs when the low cardinality
fields in dimension tables are removed into
separate tables - Normalizing the dimension tables
- Snowflaking is tempting to database types who
like to normalize but it may be a bad idea - May decrease performance
- Decreases clarity of database
- Prevents the use of bitmap indexes which increase
performance dramatically - Empirical tests show that effect of snowflaking
on performance depends on specific DBMS
5Snowflake Order Item Schema
6Starflake schema
7Starflake Schema for Order Item Transaction
8Permissible Snowflaking
- Creating subdimensions is permissible under
specific circumstances - If some attributes in a dimension are at a
different granularity than the others - If we save a significant amount of storage space
(very large dimension) - If we often want to browse among the attributes
in the subdimension
9Demographic Subdimension of Customer Dimension
Aggregated at County Level
Individual Customers
10Time Dimension
Subdimension created in this casebecause we want
to use the sameTime Dimension in
different countries
Time dimension is critical in every data
warehouse. This dimension has adaily
granularity and has enough detail that
applications dont have to haveany hard coded
date logic.
11Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
12Slowly Changing Dimensions
- Dimensions are usually fairly static but
sometimes can change slowly - A well designed data warehouse will specify a
change strategy for every attribute of every
dimension - Three strategies for handling changing dimensions
13Type 1 Overwrite the value
- The old attribute value is overwritten so the
dimension always reflects the most recent value
Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Education ABC922-Z
Assume IntelliKidz 1.0 is reclassified as
Strategy. The Type 1 responsesimply updates the
existing record
Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Strategy ABC922-Z
14Type 1 Overwrite the value
- Advantage is that its quick and easy
- Disadvantage is that there is no historical
record - So if the sales if IntelliKidz 1.0 takes off
because of the reclassification we have no
historical record of the reason - Correct strategy if were simply correcting a bad
value
15Type 2 Add a Dimension Row
- A new record is created and the old record is kept
- Product
SKU Number - Product Key Description Department
(Natural Key) - IntelliKidz 1.0 Education
ABC922-Z - 67890 IntelliKidz 1.0 Strategy
ABC922-Z
May include an effective date field but it isnt
necessary
16Type 2 Add a Dimension Row
- An example of why we cant use the natural key as
our primary key - Type 2 changes are said to perfectly partition
history because fact table records that are added
after the change will reference record 67890 - We can look at the entire history using the SKU
or can look at before and after using the Product
Key - Most common approach
17Time Stamped Dimension
A record is added for each changeto the
employees HR profile This is a very large
dimension but may be worth it
18Type 3 Add a Dimension Column
- Adding a column is useful if we want the ability
to see fact data as if the change never occurred - Sales district boundaries are altered but users
want to still see sales in terms of the old
boundaries
Product
Prior SKU
Number Product Key Description
Department Department (Natural Key) 12345
IntelliKidz 1.0 Strategy
Education ABC922-Z
19Type 3 Add a Dimension Column
- Useful when we want to maintain multiple
alternate realities - We can look at all the fact data by either value
of the attribute - Rarely used
20Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
21Rapidly Changing Dimensions
- If changes to dimensions are rapid then
- If the dimension is small use the Type 2 approach
- If the dimension is large then create a
subdimension with the changing attributes
22Rapidly Changing Dimensions
Demographics are convertedinto banded values
(e.g.0-12,000, 12,000-20,000,etc.) and then
we have a recordfor every combination of
demographic attributes. If we have five
attributes with 10 levels each then we have 105
100,000 records. If this gets too large
createmultiple demographic dimensions
(i.e.demographics and purchase-demographicss) W
henever a change occurs we adda record to the
fact table and sinceeach fact table record
includes thecustomer and demographic keys,
wellalways be able to classify customers.
23Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
24Junk Dimensions
- Sometimes when all the descriptive attributes are
assigned to dimensions there are data elements
left over - e.g. miscellaneous flags like order type
- These can be gathered into a junk dimension
25Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
26Many-to-Many Dimensions
- Many-to-Many relationships are difficult to model
in star schemas - Time-dependent (history) relationships
- Generic (multiple role) relationships
- Multi-valued dependencies
27Time-dependent (history) relationships
- This refers to an attribute that has a single
value at any specific time but a history is kept
so that there are many values saved - Stock levels
- Employment position
28Employment Position History
Breaks hierarchical chain
29Time-dependent (history) relationships
- Ignore history
- Keep only current value
- Slowly changing dimension
- Add a new record in dimension for each value
30Generic (Multiple Role) Relationships
- Many-to-many relationship is used to represent a
fixed number of different types of relationships
between the same two entities - These correspond to different roles that an
entity may play in relation to an entity
31Generic (Multiple Role) Relationships
Employee may - receive the order -
approve the order - dispatch the order
32Multi-Valued Dependency
- When many entities of one type can be associated
with many entities of another type at the same
point in time
33Multi-Valued Dependency
A single patientmay have zero, one or
morediagnoses
34Bridge Table
Bridge table contains a record for each diagnosis
a patient has.The weighting factor is used to
partition facts among the diagnoses. The
weighting factor should sum to one for each
diagnosis group.
35Account Bridge Table
Customers in a bank have multiple accounts and
there may be multiple signatories onthe set of
accounts so there is a many-to-many relationship
between the Customer Dimension and the Account
Balance Fact Table. The Account-to-Customer
Bridge Table contains account groups for each
customer. The Account Balance Fact Table
contains the total of all balances and the
weighting factor allocates the total among the
individual accounts.
36Agenda
- Snowflaking
- Slowly Changing Dimension
- Rapidly Changing Dimensions
- Junk Dimensions
- Many-to-Many Dimensions
- Organization and Parts Hierarchies
37Hierarchies
- Complex hierarchies among records in a dimension
are difficult to represent - Organizational structure
- Bill of materials
- Traditional method is to use a recursive pointer
but this wont work because it cant be used with
SQL to traverse the hierarchy
38Recursive Join
Recursive join allows the use of the ManagerID
to be used to match anemployee with their
immediate supervisor whose record exists inthe
same table.
39Organizational Hierarchy
In this case we want to be able traverse the
organizational structure of a commercialcustomer
. The Navigation Bridge Table containsa record
for parent and subsidiary in the figure at left.
There is also a zero length record of the
pathway of acustomer to itself. Each record
contains customerkey of the parent and the
subsidiary. This allowsthe use of the SQL GROUP
BY clause to traverse the hierarchy.