Chapt. 7 Multidimensional Hierarchical Clustering

Prof. Bayer, DWH, Ch.7, SS2002.

1
Chapt. 7 Multidimensional Hierarchical Clustering
Fig. 3.1 Hierarchies in the Juice and More schema
2
(b)
3
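Size of completely aggregated Cube
$$\frac{(6 \cdot 9 \cdot 20 \cdot 11)(9 \cdot 8 \cdot 3 \cdot 8)(6 \cdot 4)(4 \cdot 13)}{(5 \cdot 8 \cdot 19 \cdot 10)(8 \cdot 7 \cdot 2 \cdot 7)(5 \cdot 3)(3 \cdot 12)} = \frac{4 \cdot 6 \cdot 6 \cdot 9 \cdot 11 \cdot 13}{5 \cdot 5 \cdot 7 \cdot 7 \cdot 19} = \frac{185328}{23275} \approx 7.96$$
i.e. 7.96 times larger than the base cube.
Base Cube has 2.245.024.000 cells; at 4 B per cell this is about 9 GB.
Number of available facts: 26 million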
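The ratio on this slide can be re-derived with a short Python sketch. It assumes that the four factor groups in the denominator are the branching degrees of the four dimension hierarchies, and that complete aggregation adds one extra member per level (each degree plus one), which is how the numerator reads. The names `branching` and `aggregation_ratio` are illustrative only.

```python
from fractions import Fraction
from math import prod

# Branching degrees per hierarchy level, one list per dimension,
# read off the denominator of the fraction on the slide.
branching = [
    [5, 8, 19, 10],
    [8, 7, 2, 7],
    [5, 3],
    [3, 12],
]

def aggregation_ratio(dims):
    """Ratio of the completely aggregated cube to the base cube."""
    base = prod(b for dim in dims for b in dim)
    full = prod(b + 1 for dim in dims for b in dim)  # assumption: one extra member per level
    return Fraction(full, base)

ratio = aggregation_ratio(branching)
print(ratio, round(float(ratio), 2))
```

Using `Fraction` keeps the cancellation exact, so the reduced fraction can be compared directly with the intermediate step on the slide.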
4
Sparsity: $\frac{26 \cdot 10^{6}}{2.245 \cdot 10^{9}} \approx 0.0116$
$100\% - 1.16\% = 98.84\%$ sparsity
5
Hierarchically aggregated Cube
$(1 + 5 + 40 + 760 + 7600) = 8406$
$(1 + 8 + 56 + 112 + 784) = 961$
$(1 + 5 + 15) = 21$
$(1 + 3 + 24) = 28$
Product: $8406 \cdot 961 \cdot 21 \cdot 28$ = 4.749.961.608
Size of base cube: 2.145.024.000
Number of aggregate cells: 2.504.937.608
=> The Juice and More database has 96 times more hierarchically aggregated cells than occupied base cells!
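As a check on the product above, here is a minimal Python sketch. It reads each parenthesised group on the slide as the sum of the member counts of one hierarchy's levels (root included), for example 1 + 5 + 40 + 760 + 7600 = 8406. The variable names are illustrative.

```python
from math import prod

# Members per hierarchy level (root ALL included), one list per dimension,
# as listed in the parenthesised sums on the slide.
level_sizes = [
    [1, 5, 40, 760, 7600],
    [1, 8, 56, 112, 784],
    [1, 5, 15],
    [1, 3, 24],
]

# Every member of every level is a possible coordinate in its dimension.
members_per_dimension = [sum(levels) for levels in level_sizes]

# Cell count of the hierarchically aggregated cube.
total_cells = prod(members_per_dimension)
print(members_per_dimension, total_cells)
```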
6
Star-Joins: restrictions on several dimension tables, which are then joined with the fact table.
In addition: grouping, computation of aggregates, sorting of results.
Example:
7
(No Transcript)
8
  • Key Question: How to compute star-joins efficiently?
  • Secondary indexes on foreign keys of the fact
    table (standard B-trees), see chapter 5 for
    details
      - intersect result lists
      - retrieve tuples from the fact table randomly
  • Bitmaps

9






Bitmap Index Intersection

                                Page 1      Page 2      Page 3      Page 4      Page 5
bitmap for organization TM      1.....1.11  1.1...1.1.  1.1...1.1.  ...1.1....  ..1.1...1.   34% of tuples
bitmap for region Asia          11.1......  1.11.....1  .1.1..1...  1.1.1.....  .1..1.1...   32% of tuples
result of bitmap intersection   1.........  1.1.......  ......1...  ..........  ....1.....   10% of tuples
accessed disk pages (shaded)    80% of pages
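The bitmap intersection in this figure can be replayed in a few lines of Python. This is a sketch under the assumption that each group of ten bits in the figure corresponds to one disk page; the helper name `parse_bitmap` and the constant `TUPLES_PER_PAGE` are illustrative.

```python
TUPLES_PER_PAGE = 10  # assumption: one 10-bit group in the figure = one disk page

def parse_bitmap(text):
    """Turn '1.....1.11 1.1...' into a list of booleans ('.' is a 0 bit)."""
    return [ch == "1" for ch in text.replace(" ", "")]

tm = parse_bitmap("1.....1.11 1.1...1.1. 1.1...1.1. ...1.1.... ..1.1...1.")
asia = parse_bitmap("11.1...... 1.11.....1 .1.1..1... 1.1.1..... .1..1.1...")

# Bitwise AND of the two bitmaps gives the tuples satisfying both restrictions.
hits = [a and b for a, b in zip(tm, asia)]

# A page must be fetched as soon as it contains at least one hit.
pages = sorted({i // TUPLES_PER_PAGE + 1 for i, hit in enumerate(hits) if hit})
n_pages = len(hits) // TUPLES_PER_PAGE

print(f"organization TM: {sum(tm) / len(tm):.0%} of tuples")
print(f"region Asia:     {sum(asia) / len(asia):.0%} of tuples")
print(f"intersection:    {sum(hits) / len(hits):.0%} of tuples")
print(f"pages fetched:   {pages} = {len(pages) / n_pages:.0%} of pages")
```

Only 10% of the tuples qualify, yet 4 of the 5 pages have to be read, because the hits are scattered over the pages.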
10
Problem: for small result sets of a few %, almost all pages of the fact table must be fetched from disk if the hits in the result set are not clustered on disk.
Ex.: with 8 KB pages there are 20 to 400 tuples per page, i.e. at 0.25% to 5% hits in the result almost all pages must be fetched.
At least tuple clustering, preferably page clustering, is desirable, but how?
Goal: code hierarchies in such a way that for star-joins with the fact table we have to join only with a query box on the fact table.
11
Basic Idea for Multidimensional Clustering
[Figure: Example Hierarchy in Member Set Representation. Root All (All Products); level Product Category with Orange Juice and Apple Juice; leaf level with package sizes 0,33L, 0,5L, 0,7L and 1L. The children of each node are numbered starting at 0.]
12
Dimension $D$ consists of:
Value Set $V = \{v_1, v_2, \ldots, v_n\}$
Hierarchy $H$ of height $h$ consisting of $h+1$ hierarchy levels, $H = \{L_0, L_1, \ldots, L_h\}$
Level $L_i$ is a set of sets $\{m^i_1, \ldots, m^i_j\}$ with $m^i_k \subseteq V$
The $m^i_k$ get names, e.g. Orange Juice as $label(m^1_1)$, in general $label(m^i_k)$
Constraint: every $m^{i+1}_l$ must be a subset of some $m^i_k$
13
Hierarchic Relationships
The children of $m^i_k$ are all those sets $m^{i+1}_l$ of the lower level $i+1$ with the property $m^{i+1}_l \subseteq m^i_k$, formally:
$children(m^i_k) = \{ m^{i+1}_l \in L_{i+1} : m^{i+1}_l \subseteq m^i_k \}$
$parent(m^i_k) = \{ m^{i-1}_l \in L_{i-1} : m^{i-1}_l \supseteq m^i_k \}$
Principle: the children of $m$ are numbered by the bijective function $ord_m$ starting at 1 or 0
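The member set representation and the children/parent relations can be sketched in Python with `frozenset`. The value set, the labels and the assignment of package sizes to categories below are hypothetical, and numbering the children in sorted label order starting at 0 is only one possible choice for the bijection $ord_m$.

```python
# Hypothetical value set V: five juice products (labels are illustrative).
V = frozenset({"OJ 0,5L", "OJ 1L", "AJ 0,33L", "AJ 0,7L", "AJ 1L"})

# levels[i] maps label(m) -> member set m; every m is a subset of V.
levels = [
    {"All Products": V},
    {
        "Orange Juice": frozenset({"OJ 0,5L", "OJ 1L"}),
        "Apple Juice": frozenset({"AJ 0,33L", "AJ 0,7L", "AJ 1L"}),
    },
    {v: frozenset({v}) for v in V},
]

def children(i, label):
    """Labels of all sets on level i+1 that are subsets of the given set."""
    m = levels[i][label]
    return sorted(l for l, c in levels[i + 1].items() if c <= m)

def parent(i, label):
    """Label of the set on level i-1 that contains the given set."""
    m = levels[i][label]
    return next(l for l, p in levels[i - 1].items() if m <= p)

def ord_m(i, label):
    """Position of a member among its siblings (assumed: sorted order, from 0)."""
    return children(i - 1, parent(i, label)).index(label)

print(children(1, "Apple Juice"), parent(2, "OJ 1L"), ord_m(2, "AJ 1L"))
```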
15
Enumeration and Surrogate Functions
Let $A$ be an enumeration type, $A = \{a_0, a_1, \ldots, a_k\}$
$f: A \to \{0, 1, \ldots, k\}$ defined as $f(a_i) = i$
then $i$ is called the surrogate of $a_i$
16
Hierarchies and composite Surrogates
Basic Idea: concatenate the surrogates of successive hierarchy levels (compound surrogates cs)
Note: the root ALL of the hierarchy is not encoded
Def.: compound surrogate cs for hierarchy $H$
$ord_m: children(m) \to \{0, 1, \ldots, |children(m)| - 1\}$
$cs(H, m^i) = ord_{father(m^i)}(m^i)$ if $i = 1$
$cs(H, m^i) = cs(H, father(m^i)) \circ ord_{father(m^i)}(m^i)$ otherwise
where $\circ$ denotes concatenation
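The recursive definition translates almost literally into Python. In this sketch the hierarchy is a hypothetical juice example, the compound surrogate is returned as a tuple of per-level surrogates (concatenation becomes tuple concatenation), and the names `children`, `father` and `ord_in_father` are illustrative.

```python
ROOT = "ALL"

# Hypothetical hierarchy: each inner node maps to its ordered list of children.
children = {
    "ALL": ["Orange Juice", "Apple Juice"],
    "Orange Juice": ["OJ 0,5L", "OJ 1L"],
    "Apple Juice": ["AJ 0,33L", "AJ 0,7L", "AJ 1L"],
}

# Inverse relation: father of every non-root node.
father = {child: node for node, kids in children.items() for child in kids}

def ord_in_father(node):
    """ord_father(node): position of node among its siblings, starting at 0."""
    return children[father[node]].index(node)

def cs(node):
    """Compound surrogate of node; the root ALL itself is not encoded."""
    if father[node] == ROOT:            # level 1: a single surrogate
        return (ord_in_father(node),)
    return cs(father[node]) + (ord_in_father(node),)   # concatenation

print(cs("Orange Juice"), cs("AJ 1L"))
```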
17
Example
18
Surrogates for Region and the entire Customer Hierarchy
19
Example: the path North America --> USA --> Retail --> Bar has the compound surrogate $4 \circ 1 \circ 1 \circ 2$
Next Idea: for every hierarchy level determine the highest branching degree (plus a safety margin for future extensions) and code it with a fixed number of bits.
$surrogates(H, i) = \max \{ cardinality(children(H, m)) : m \in level(H, i-1) \}$

20
let $l_i = \lceil \log_2 surrogates(H, i) \rceil$, then $l_i$ bits are needed for the surrogates of level $i$
let $\pi$ be a path $\pi = m_0 \to m_1 \to m_2 \to \ldots \to m_h$ to a leaf $m_h$ of hierarchy $H$
21
$cs(H, \pi) = cs(H, m_h)$
...
22
Example: $cs(H, \text{Bar}) = 100\;001\;1\;010_2 = 538$
$l_1 = 3,\ l_2 = 3,\ l_3 = 1,\ l_4 = 3$: number of bits needed at each level
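The fixed-length encoding of this example can be sketched in Python as plain bit shifting. The per-level surrogates (4, 1, 1, 2) and bit lengths (3, 3, 1, 3) are taken from the slides; the function names `bits_needed` and `pack` are illustrative, and the branching degrees passed to `bits_needed` in the usage lines are hypothetical values that happen to give these lengths.

```python
def bits_needed(max_children):
    """l_i = ceil(log2(max_children)), computed exactly on integers."""
    return (max_children - 1).bit_length()

def pack(ordinals, bit_lengths):
    """Concatenate per-level surrogates into one fixed-length integer."""
    value = 0
    for ordinal, length in zip(ordinals, bit_lengths):
        if ordinal >= 1 << length:
            raise ValueError(f"surrogate {ordinal} does not fit into {length} bits")
        value = (value << length) | ordinal   # append `length` bits on the right
    return value

bit_lengths = (3, 3, 1, 3)
bar = pack((4, 1, 1, 2), bit_lengths)          # North America, USA, Retail, Bar
print(bar, format(bar, "010b"))

# Hypothetical branching degrees: 5 to 8 children need 3 bits, 2 children need 1 bit.
print(bits_needed(8), bits_needed(5), bits_needed(2))
```

Because every level occupies a fixed number of bits, the integer order of the packed values matches the lexicographic order of the surrogate tuples.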
23
  • Properties of MHC Encoding
  • very compact coding of fixed length
  • lexicographic order of composite keys is preserved,
    i.e. it is isomorphic to integer ordering
  • point restrictions on arbitrary hierarchy levels
    lead to interval restrictions on the compound
    surrogates

24
Example: the path to USA is North America --> USA, with $4 = 100_2$ and $1 = 001_2$
This leads to the range on cs from $100\;001\;0\;000_2$ to $100\;001\;1\;111_2$, i.e. to the decimal range 528 to 543, or [528, 543]
=> A star join with the restriction North America.USA leads to an interval restriction on the fact table
=> Point restrictions on arbitrary hierarchy levels of several dimensions lead to Query Boxes on the fact table.
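How a point restriction on an inner hierarchy level becomes an interval can be sketched in Python: the restricted levels fix a bit prefix, and the remaining bits range from all zeros to all ones. The bit lengths (3, 3, 1, 3) and the surrogates 4 and 1 come from the slides; the function name `cs_range` is illustrative.

```python
def cs_range(prefix_ordinals, bit_lengths):
    """Interval of compound surrogates sharing the given path prefix."""
    prefix = 0
    # zip stops after the restricted levels, so only the prefix is packed here.
    for ordinal, length in zip(prefix_ordinals, bit_lengths):
        prefix = (prefix << length) | ordinal
    free_bits = sum(bit_lengths[len(prefix_ordinals):])  # bits of the unrestricted levels
    low = prefix << free_bits                 # remaining bits all 0
    high = low | ((1 << free_bits) - 1)       # remaining bits all 1
    return low, high

bit_lengths = (3, 3, 1, 3)
print(cs_range((4, 1), bit_lengths))   # North America.USA
print(cs_range((4,), bit_lengths))     # North America
```

Applying the same function to one restriction per dimension yields one interval per dimension; together these intervals form the query box on the fact table.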
25
  • Complex Hierarchies
  • time with months and weeks: both restrictions
    lead to intervals on the level of days
  • Example of Fig. 4-4
  • proposal for multiple hierarchies: choose the
    most useful (depending on the query profile) or
    consider multiple hierarchies as several
    independent hierarchies. Caution, this increases
    the number of dimensions!
  • Time-variant hierarchies: extend by the time
    interval of validity, see example in Fig. 4-5

26
Fig. 4-4 Complex Hierarchy Graphs
[Figure with panels (a) and (b). Node labels: REGION, NATION, CUSTOMER TYPE, TRADE TYPE, CUSTOMER SIZE, CUSTOMER; YEAR, MONTH, WEEK, DAY.]
27
Fig. 4-5 Change of a hierarchy over time
[Figure: CUSTOMER hierarchy with the nodes South Europe, North America, ..., USA, Canada, Retail, Wholesale, Bar, Restaurant and the customer Joe's Sports Bar; edges are labelled Year < 1997 and Year > 1997.]
28
(No Transcript)
29
Processing a query box in sort order with the
Tetris algorithm