Title: File Processing : Multi-dimensional Index
1 File Processing: Multi-dimensional Index
- 2008, Spring
- Pusan National University
- Ki-Joune Li
2 Multi-Dimensional Index
- Multi-Attribute Query vs. Single-Attribute Query
- Single attribute: only ONE attribute specifies the query condition
- Example: Find students whose record is in [3.5, 4.5]
- Multi-attribute: several attributes specify the query condition
- Example: Find students whose height is greater than 180 cm and weight is less than 70 kg
- Each attribute corresponds to a dimension
- Multi-Attribute Query = Multi-Dimensional Query
3 Processing Multi-dimensional Queries
- Example: Find students whose height > 180 cm and weight < 70 kg
- Method 1: Using a B-tree
- Step 1: Apply the B-tree to search for students taller than 180 cm
- Step 2: Search for students lighter than 70 kg among the result of step 1
- Height and then Weight, or Weight and then Height?
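Method 1 can be sketched as follows. This is a minimal illustration, not the lecture's implementation: a sorted list searched with `bisect` stands in for the B-tree on height, and the record layout and the names `students` and `method1` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical records: (name, height_cm, weight_kg)
students = [
    ("Ann", 185, 65), ("Bob", 178, 60), ("Cho", 190, 82),
    ("Dan", 182, 69), ("Eve", 170, 55),
]

# A sorted list of (height, record) pairs stands in for a B-tree on height.
height_index = sorted((s[1], s) for s in students)
heights = [h for h, _ in height_index]

def method1(min_height, max_weight):
    # Step 1: range search on the height index (height > min_height).
    start = bisect_right(heights, min_height)
    candidates = [rec for _, rec in height_index[start:]]
    # Step 2: filter the candidates by weight, without using any index.
    return [rec for rec in candidates if rec[2] < max_weight]

result = method1(180, 70)
```

Which attribute to index first is a matter of selectivity: starting with the condition that leaves fewer candidates makes step 2 cheaper.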
4 Processing Multi-dimensional Queries
- Method 2: Using Two B-trees
- Step 1: Result1 = students taller than 180 cm, by the B-tree on height
- Step 2: Result2 = students lighter than 70 kg, by the B-tree on weight
- Step 3: Result = Result1 ∩ Result2
- Comparison of Method 1 and Method 2?
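Method 2 can be sketched in the same spirit. Two sorted lists stand in for the two B-trees; the data and the names `taller_than`, `lighter_than`, and `method2` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: student id -> (height_cm, weight_kg)
students = {
    1: (185, 65), 2: (178, 60), 3: (190, 82), 4: (182, 69), 5: (170, 55),
}

# Two sorted (key, id) lists stand in for the two B-trees.
by_height = sorted((h, sid) for sid, (h, w) in students.items())
by_weight = sorted((w, sid) for sid, (h, w) in students.items())

def taller_than(limit):
    keys = [k for k, _ in by_height]
    return {sid for _, sid in by_height[bisect_right(keys, limit):]}

def lighter_than(limit):
    keys = [k for k, _ in by_weight]
    return {sid for _, sid in by_weight[:bisect_left(keys, limit)]}

def method2(min_height, max_weight):
    result1 = taller_than(min_height)    # Step 1
    result2 = lighter_than(max_weight)   # Step 2
    return result1 & result2             # Step 3: set intersection

result = method2(180, 70)
```

Both conditions are answered by an index here, but Result1 and Result2 are each built in full before they are intersected.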
5 Processing Multi-dimensional Queries
- Method 3: Unified Index for Several Attributes
- One index for several attributes
- Multi-Dimensional Space
- Two approaches
- Extending B-tree
- Extending Dynamic Hashing
[Figure: two-dimensional space with axes Weight and Height]
6 Extending Hashing: Grid Approach
7 Extending Hashing: Grid File
[Figure: grid file directory; each cell, given by (x1, y1) and (x2, y2), holds a block pointer; a query region is shown]
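A grid directory with block pointers can be sketched as below. This is a simplification under stated assumptions: the cells are uniform with a hypothetical width `CELL`, and every cell has its own block, whereas a real grid file adapts its cell boundaries and may let several cells share one block.

```python
from collections import defaultdict

CELL = 10  # hypothetical cell width along both axes

class GridFile:
    def __init__(self):
        # Directory: cell coordinates -> block (a list of points).
        self.directory = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // CELL), int(y // CELL))

    def insert(self, x, y):
        self.directory[self._cell(x, y)].append((x, y))

    def range_query(self, x1, y1, x2, y2):
        """Return (points inside the query rectangle, cells visited)."""
        (i1, j1), (i2, j2) = self._cell(x1, y1), self._cell(x2, y2)
        hits, visited = [], 0
        for i in range(i1, i2 + 1):
            for j in range(j1, j2 + 1):
                visited += 1  # one block access per overlapping cell
                for (x, y) in self.directory.get((i, j), []):
                    if x1 <= x <= x2 and y1 <= y <= y2:
                        hits.append((x, y))
        return hits, visited

g = GridFile()
for p in [(3, 4), (12, 18), (25, 7), (14, 11), (38, 33)]:
    g.insert(*p)
hits, visited = g.range_query(10, 10, 19, 19)
```

The `visited` counter makes the cost visible: every cell overlapping the query is accessed, whether or not it holds matching points.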
8 Problem 1: Dead Space
[Figure: a query area that contains no objects but still costs 5 block accesses]
- Dead space: empty space with no objects
- How to reduce dead space?
9 Minimum Bounding Rectangle
- MBR (Minimum Bounding Rectangle)
[Figure: with MBRs, the query needs only 1 disk access]
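The effect of MBRs on dead space can be illustrated with a small sketch. The two blocks, their cell regions, and the query rectangle are hypothetical; rectangles are written as `(xmin, ymin, xmax, ymax)`.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if rectangles a and b (xmin, ymin, xmax, ymax) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical blocks: the grid region each one covers, plus its points.
blocks = [
    {"cell": (0, 0, 10, 10), "points": [(1, 1), (2, 3), (3, 2)]},
    {"cell": (10, 0, 20, 10), "points": [(18, 8), (19, 9)]},
]
query = (6, 4, 14, 7)  # lies entirely in dead space

# Number of blocks that must be read under each scheme.
by_cell = sum(intersects(b["cell"], query) for b in blocks)
by_mbr = sum(intersects(mbr(b["points"]), query) for b in blocks)
```

With these values the query overlaps both grid cells but neither MBR, so the cell-based test reads two blocks for nothing while the MBR-based test reads none.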
10 Problem 2: Non-Point Object
- Where to store this object?
11 Minimum Bounding Rectangle
- MBR (Minimum Bounding Rectangle, also called Minimum Bounding Box)
- Two-dimensional geometric simplification of objects
- Covers not the whole space, but only the region occupied by objects
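For non-point objects, the MBR is what the index stores as the key. A minimal sketch follows; the `Rect` class and the two sample objects are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @staticmethod
    def of(vertices):
        """MBR of an object given as a list of (x, y) vertices."""
        xs, ys = zip(*vertices)
        return Rect(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

# Hypothetical non-point objects.
objects = {
    "road": [(0, 0), (8, 6)],                           # line segment
    "lake": [(10, 10), (14, 10), (15, 13), (11, 14)],   # polygon
}
mbrs = {name: Rect.of(v) for name, v in objects.items()}

window = Rect(9, 9, 12, 12)
candidates = [name for name, r in mbrs.items() if r.intersects(window)]
```

Since the MBR is a simplification, an MBR hit only makes the object a candidate; the exact geometry still has to be checked afterwards.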
12 Extending B-tree: R-tree
- B-tree vs. R-tree
- B-tree: Interval (1-D rectangle)
- R-tree: Multi-Dimensional Interval (Rectangle)
- R-tree = Rectangle B-tree
- Each Node
- MBR (Minimum Bounding Rectangle) instead of Interval (or Delimiter)
- No Linked-List for External Nodes
- A certain amount of overlapping is unavoidable
13 Extending B-tree: R-tree
[Figure: R-tree searched from the Root for a Query rectangle]
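Searching an R-tree from the root can be sketched as below. The `Node` class and the tiny two-leaf tree are hypothetical; rectangles are `(xmin, ymin, xmax, ymax)`.

```python
class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        # Leaf: entries are (mbr, object_id); internal: (mbr, child Node).
        self.entries = entries

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query, visited=None):
    """Return ids of objects whose MBR intersects the query rectangle."""
    if visited is not None:
        visited.append(node)  # record each node access
    found = []
    for mbr, item in node.entries:
        if not intersects(mbr, query):
            continue  # prune: nothing below this entry can match
        if node.leaf:
            found.append(item)
        else:
            found.extend(search(item, query, visited))
    return found

leaf_a = Node(True, [((0, 0, 2, 2), "a1"), ((3, 1, 5, 4), "a2")])
leaf_b = Node(True, [((8, 8, 9, 9), "b1"), ((6, 7, 7, 9), "b2")])
root = Node(False, [((0, 0, 5, 4), leaf_a), ((6, 7, 9, 9), leaf_b)])

visited = []
result = search(root, (4, 3, 6, 5), visited)
```

Unlike a B-tree search, which follows a single path, this search descends into every child whose MBR intersects the query, so overlapping MBRs mean more nodes are visited.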
14 Upward Split like B-tree
- Split the MBR in the case of overflow
- Line sweeping: compare Cost-X and Cost-Y
[Figure: splitting line]
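The line-sweeping split can be sketched as follows. The slide does not define Cost-X and Cost-Y, so this sketch assumes the cost is the total area of the two resulting MBRs and that the sweep cuts the sorted entries in half; the function names are hypothetical.

```python
def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def sweep_split(rects, axis):
    """Sort by the lower coordinate on `axis` (0 = x, 1 = y) and cut in half."""
    ordered = sorted(rects, key=lambda r: r[axis])
    half = len(ordered) // 2
    left, right = ordered[:half], ordered[half:]
    cost = area(mbr(left)) + area(mbr(right))  # assumed cost measure
    return cost, left, right

def split(rects):
    cost_x, *groups_x = sweep_split(rects, 0)
    cost_y, *groups_y = sweep_split(rects, 1)
    # Keep the axis whose splitting line gives the lower cost.
    return groups_x if cost_x <= cost_y else groups_y

# Hypothetical overflowing node: four entries forming two columns.
entries = [(0, 0, 1, 1), (0, 8, 1, 9), (10, 0, 11, 1), (10, 8, 11, 9)]
left, right = split(entries)
```

For these entries the split along x produces two tall, thin MBRs, which cover less area than the two wide MBRs produced by the split along y, so the x split is kept.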
15 Splitting Strategy
- 50:50 Split
- Instead of a 50:50 split, other cost measures
- Area
- Perimeter
- Overlapping Area
1. Make them as COMPACT as possible
2. Preserve spatial proximity as much as possible
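The three cost measures can be compared with a small sketch that tries every cut position instead of only the 50:50 one. The `COSTS` table, `best_split`, and the `min_fill` parameter are assumptions made for illustration, not the lecture's algorithm.

```python
def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

# Each cost takes the MBRs of the two groups produced by a candidate split.
COSTS = {
    "area": lambda a, b: area(a) + area(b),
    "perimeter": lambda a, b: perimeter(a) + perimeter(b),
    "overlap": overlap_area,
}

def best_split(rects, cost, min_fill=1):
    """Try every cut position on both axes; keep the cheapest one."""
    best = None
    for axis in (0, 1):
        ordered = sorted(rects, key=lambda r: r[axis])
        for k in range(min_fill, len(ordered) - min_fill + 1):
            left, right = ordered[:k], ordered[k:]
            c = COSTS[cost](mbr(left), mbr(right))
            if best is None or c < best[0]:
                best = (c, left, right)
    return best

entries = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 0, 4, 2), (20, 0, 22, 2)]
cost, left, right = best_split(entries, "area")
```

With the area measure, the three nearby rectangles stay together and the distant one gets its own group: a 3:1 split that is more compact than any 50:50 split of the same entries.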
16 R*-tree: An Improvement of R-tree
- Re-Insertion Strategy on Overflow
- Most popular index for multi-dimensional data
[Figure: overflow]
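The re-insertion idea can be sketched as choosing which entries of an overflowing node to take out and insert again. This sketch covers only that choice, not the insertion itself. It assumes that the entries whose centers lie farthest from the center of the node's MBR are removed; the 30% default and the name `pick_for_reinsertion` are likewise assumptions.

```python
import math

def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsertion(entries, percent=30):
    """Split an overflowing node's entries into (kept, to_reinsert)."""
    cx, cy = center(mbr(entries))

    def dist(r):
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    ordered = sorted(entries, key=dist)            # closest to the center first
    p = max(1, len(entries) * percent // 100)      # how many entries to remove
    return ordered[:-p], ordered[-p:]

# Hypothetical overflowing node: four nearby entries and one outlier.
entries = [(0, 0, 1, 1), (5, 2, 6, 3), (6, 3, 7, 4), (5, 4, 6, 5), (7, 2, 8, 3)]
kept, reinsert = pick_for_reinsertion(entries)
```

Removing the outlier shrinks the node's MBR from `(0, 0, 8, 5)` to `(5, 2, 8, 5)`, and the removed entry may then fit better in another node when it is inserted again.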