Title: The Impact of Global Clustering on Spatial Database Systems
1The Impact of Global Clustering on Spatial
Database Systems
- VLDB Conference 94
- Thomas Brinkhoff Hans-Peter Kriegel
- 2000/5/16
2Contents
- Introduction
- Queries in Spatial Database Systems
- The Storage of Spatial Objects
- The Cluster Organization
- Evaluation
3Introduction(1/3)
- Characteristics of Spatial Databases
- Manage large numbers of objects
- Objects vary widely in complexity
- Need selective spatial access
- Spatial Access Methods
- Pages holding spatially adjacent objects are arbitrarily
distributed on disk
4Introduction(2/3)
- Global Clustering
- A set of data pages representing spatially
adjacent objects is stored on consecutive pages
of the disk.
- Problem
- A global reorganization of all objects in the
database is not reasonable in a dynamic
environment
5Introduction(3/3)
- Goals of this paper
- Evaluate the importance of several
techniques for global clustering
- The impact of global clustering on spatial joins
has not been investigated so far
6Queries in Spatial DB Systems
- Point Query
- Window Query
- Spatial Join
- A ⋈ B
- Intersection join: all pairs (a_i, b_j) with
a_i ∩ b_j ≠ ∅
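The three query types can be illustrated on axis-aligned rectangles. The following is a minimal sketch, assuming each object is represented only by a bounding rectangle and that a plain scan replaces the spatial access method; the names `Rect`, `intersects`, `point_query`, `window_query` and `intersection_join` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Assumption: a rectangle is (xmin, ymin, xmax, ymax) and stands in for an object.
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(objects: List[Rect], window: Rect) -> List[int]:
    """Indices of all objects whose rectangle intersects the query window."""
    return [i for i, r in enumerate(objects) if intersects(r, window)]


def point_query(objects: List[Rect], x: float, y: float) -> List[int]:
    """A point query treated as a window query with zero extent."""
    return window_query(objects, (x, y, x, y))


def intersection_join(A: List[Rect], B: List[Rect]) -> List[Tuple[int, int]]:
    """All index pairs (i, j) whose rectangles intersect (naive nested loop)."""
    return [(i, j) for i, a in enumerate(A) for j, b in enumerate(B)
            if intersects(a, b)]


if __name__ == "__main__":
    A = [(0, 0, 2, 2), (5, 5, 6, 6)]
    B = [(1, 1, 3, 3), (10, 10, 11, 11)]
    print(intersection_join(A, B))
    print(window_query(A, (4, 4, 7, 7)))
    print(point_query(A, 1, 1))
```

A real system would run these tests through the index and then check the exact geometry; the nested loop only shows what the join result contains.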
7- Introduction
- Queries in Spatial Database Systems
- The Storage of Spatial Objects
- Spatial Access Method
- Clustering
- Organization Model
- The Cluster Organization
- Evaluation
8Spatial Access Method(1/2)
- Access Methods
- Organize a dynamic set of objects in secondary
storage
- B-tree or linear hashing are not suitable
- SAM
- Store objects which are close to each
other in data space close to each other in the
data pages
9Spatial Access Method(2/2)
10Clustering(1/2)
- Access time
- Seek time + latency time + transfer time
- Goal
- Minimize the number of seek operations and the
rotational delay in order to reduce access cost.
11Clustering(2/2)
- Internal Clustering
- Local clustering
- Global clustering
(Figure: objects O1 to O6 and their assignment to pages)
12Organization Models(1/3)-secondary organization
- Primary index for approximation
- Second index for spatial objects
13Organization Models(2/3)-Primary organization
- The exact representation of the objects is stored on
the data pages.
14Organization Models(3/3)-for Global Clustering
- Combine sets of pages into larger storage units,
called cluster units
15- Introduction
- Queries in Spatial Database Systems
- The Storage of Spatial Objects
- The Cluster Organization
- Requirements
- SAM: R*-tree
- Cluster Organization
- Modification of R-tree
- Evaluation
16The Cluster Organization(1/5)
- Requirements
- SAM with a high-quality space partitioning scheme
- Support insertion and deletion
- All three query types (point query, window query, spatial join) should be supported efficiently
- A maximum cluster size exists
- A reasonable storage utilization
- SAM
- R*-tree
- One of the most efficient variants of the R-tree
17The Cluster Organization(2/5)-The Cluster
Organization
- The Cluster Organization
- Static definition of the size of a cluster unit
- Store all objects whose MBRs reside in one data page
together in one cluster unit.
18The Cluster Organization(3/5)-The Cluster
Organization
- Page size: 4 KB
- Entry size: 46 bytes
- Smax = 1.5 M Sobj
- Storage utilization: 66%
- Per cluster unit: 58 objects
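One way to read these figures is sketched below. It assumes that 4 KB means 4096 bytes, that a data page holds only fixed-size entries, and that the average fill equals the stated utilization; under those assumptions the result matches the 58 objects per cluster unit on the slide. The function names are hypothetical.

```python
PAGE_SIZE_BYTES = 4 * 1024    # page size from the slide, assuming 4 KB = 4096 bytes
ENTRY_SIZE_BYTES = 46         # entry size from the slide
UTILIZATION_PERCENT = 66      # storage utilization from the slide


def max_entries_per_page(page_size: int, entry_size: int) -> int:
    """Largest number of whole entries that fit into one page."""
    return page_size // entry_size


def average_entries_per_page(max_entries: int, utilization_percent: int) -> int:
    """Whole entries stored on average at the given utilization."""
    return max_entries * utilization_percent // 100


if __name__ == "__main__":
    capacity = max_entries_per_page(PAGE_SIZE_BYTES, ENTRY_SIZE_BYTES)
    print(capacity)                                                   # 89
    print(average_entries_per_page(capacity, UTILIZATION_PERCENT))    # 58
```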
19The Cluster Organization(4/5)-Modification of
the R-tree
- If the size of all objects in one cluster unit exceeds the
maximum cluster size Smax
- Split the data page and the cluster unit
- Splitting
- No re-insertion
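The additional split trigger can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `ClusterUnit` and its fields are hypothetical, and combining the size check with the usual page-overflow check is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterUnit:
    """Hypothetical model: one data page of entries plus the objects it refers to."""
    max_entries: int                 # page capacity in entries
    max_size_bytes: int              # maximum cluster size (Smax)
    object_sizes: List[int] = field(default_factory=list)

    def needs_split_after_insert(self, new_object_size: int) -> bool:
        """True if adding the object overflows the page or exceeds Smax."""
        too_many = len(self.object_sizes) + 1 > self.max_entries
        too_large = sum(self.object_sizes) + new_object_size > self.max_size_bytes
        return too_many or too_large


if __name__ == "__main__":
    unit = ClusterUnit(max_entries=3, max_size_bytes=1000, object_sizes=[400, 300])
    print(unit.needs_split_after_insert(200))   # fits by count and by size
    print(unit.needs_split_after_insert(400))   # total size would exceed Smax
```

In this model a few very large objects can force a split long before the page runs out of entries, which is the case the modification addresses.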
20The Cluster Organization(5/5)-Processing
21Evaluation (1/9)
- TEST DATA
- Seek time: 9 msec on average
- Latency time: 6 msec on average
- Transfer time: 1 msec on average
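To see why these parameters favor clustering, the sketch below compares reading pages that are scattered over the disk with reading the same number of consecutive pages. It assumes that the transfer time is per page, that every scattered page pays a full seek and rotational delay, and that a consecutive run pays them only once; the function names are hypothetical.

```python
SEEK_MS = 9.0                  # average seek time from the slide
LATENCY_MS = 6.0               # average latency time from the slide
TRANSFER_MS_PER_PAGE = 1.0     # average transfer time, assumed to be per page


def cost_scattered(pages: int) -> float:
    """Each page needs its own seek, rotational delay and transfer."""
    return pages * (SEEK_MS + LATENCY_MS + TRANSFER_MS_PER_PAGE)


def cost_consecutive(pages: int) -> float:
    """One seek and one rotational delay, then a sequential transfer."""
    if pages == 0:
        return 0.0
    return SEEK_MS + LATENCY_MS + pages * TRANSFER_MS_PER_PAGE


if __name__ == "__main__":
    print(cost_scattered(10))     # 160.0
    print(cost_consecutive(10))   # 25.0
```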
22Evaluation (2/9)
- Storage Utilization
- Poor storage utilization of the cluster organization
23Evaluation (3/9)
- Buddy System
- Each physical unit has the size Smax * 2^-i (i >= 0)
- Each cluster unit occupies the buddy of the smallest
possible size
- Cluster > Buddy
- Move into a bigger buddy
- Split
- Move into a smaller buddy
- Restricted Sizes
- (Smax, 0.5 Smax, 0.25 Smax)
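Choosing the buddy for a cluster unit can be sketched as picking the smallest allowed size that still holds the unit. The function `buddy_size` and its parameters are hypothetical; the three levels correspond to the restricted sizes listed above, and sizes are assumed to be whole bytes.

```python
from typing import Optional


def buddy_size(cluster_bytes: int, s_max: int, levels: int = 3) -> Optional[int]:
    """Smallest allowed buddy that holds cluster_bytes, or None if it exceeds s_max.

    With levels=3 the allowed sizes are s_max, s_max/2 and s_max/4.
    """
    if cluster_bytes > s_max:
        return None  # the cluster unit would have to be split first
    sizes = [s_max // (2 ** i) for i in range(levels)]
    return min(size for size in sizes if size >= cluster_bytes)


if __name__ == "__main__":
    S_MAX = 1024
    for used in (100, 300, 600, 2000):
        print(used, buddy_size(used, S_MAX))
```

A unit that grows past its current buddy gets the next larger size, and the halves produced by a split may drop to a smaller one.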
24Evaluation (4/9)
- Window Queries
- Transfer the complete cluster unit
- This has been the handicap so far
25Evaluation (5/9)
- Geometric Threshold
- Use the degree of overlap between the region of a
cluster unit and the query window
- Use threshold T
- If (T < degree) send page-by-page
- If (T > degree) send the cluster unit
- The SLM-Technique
- Reading requested and non-requested pages
26Evaluation (6/9)
- The SLM-technique is the best choice
27Evaluation (7/9)
28Evaluation (8/9)
- Spatial Join
- (a)
- 86,094 intersecting pairs
- (b)
- 1.2 million intersecting pairs
29Evaluation (9/9)
- Impact of Global Clustering on the performance of
a Complete Spatial Join
30Conclusion
- Global clustering speeds up access to spatial
objects for large window queries as well as for
spatial joins.
- With a buddy system, it achieves good storage
utilization.
- The SLM-technique is the best choice for window
queries.