Title: Spatial%20Indexing
1Spatial Indexing
- Many slides taken from George Kollios, Boston
University
2Spatial Indexing
- Point Access Methods can index only points. What
about regions? - Use the transformation technique
- Z-ordering and quadtrees
- New methods Spatial Access Methods SAMs
- R-tree and variations
3Problem
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer spatial queries
(range, nn, etc)
4R-tree
- z-ordering cuts regions to pieces -gt dup. elim.
- how could we avoid that?
- Idea try to extend/merge B-trees and k-d trees
- R-tree a generalization of the B-tree for
multidimensional spaces
5Kd-Btrees
- Robinson, 81 if f is the fanout, split
point-set in f parts and so on, recursively
6Kd-Btrees
- But insertions/deletions are tricky (splits may
propagate downwards and upwards) - no guarantee on space utilization
7R-trees
- Guttman 84 Main idea allow parents to overlap!
- gt guaranteed 50 utilization
- gt easier insertion/split algorithms.
- (only deal with Minimum Bounding Rectangles -
MBRs)
8R-trees
- A multi-way external memory tree
- Index nodes and data (leaf) nodes
- All leaf nodes appear on the same level
- Every node contains between m and M entries
- The root node has at least 2 entries (children)
9Example
- eg., w/ fanout 4 group nearby rectangles to
parent MBRs each group -gt disk page
I
C
A
G
H
F
B
J
E
D
10Example
P1
P3
I
C
A
G
H
F
B
J
E
P4
D
P2
11Example
P1
P3
I
C
A
G
H
F
B
J
E
P4
D
P2
12R-trees - format of nodes
- (MBR obj_ptr) for leaf nodes
x-low x-high y-low y-high ...
obj ptr
...
13R-trees - format of nodes
- (MBR node_ptr) for non-leaf nodes
x-low x-high y-low y-high ...
node ptr
...
14R-treesSearch
P1
P3
I
C
A
G
H
F
B
J
E
P4
D
P2
15R-treesSearch
P1
P3
I
C
A
G
H
F
B
J
E
P4
D
P2
16R-treesSearch
- Main points
- every parent node completely covers its
children - a child MBR may be covered by more than one
parent - it is stored under ONLY ONE of them.
(ie., no need for dup. elim.) - a point query may follow multiple branches.
- everything works for any(?) dimensionality
17R-treesInsertion
Insert X
P1
P3
I
C
A
G
H
F
B
X
J
E
P4
D
P2
X
18R-treesInsertion
Insert Y
P1
P3
I
C
A
G
H
F
B
J
E
P4
Y
D
P2
19R-treesInsertion
P1
P3
I
C
A
G
H
F
B
J
E
P4
Y
D
P2
Y
20R-treesInsertion
- How to find the next node to insert the new
object? - Using ChooseLeaf Find the entry that needs the
least enlargement to include Y. Resolve ties
using the area (smallest) - Other methods (later)
21R-treesInsertion
- If node is full then Split ex. Insert w
P1
P3
K
I
C
A
G
W
H
F
B
J
K
E
P4
D
P2
22R-treesSplit
- Split node P1 partition the MBRs into two groups.
- (A1 plane sweep,
- until 50 of rectangles)
- A2 linear split
- A3 quadratic split
- A4 exponential split
- 2M-1 choices
P1
K
C
A
W
B
23R-treesSplit
- pick two rectangles as seeds
- assign each rectangle R to the closest seed
seed1
24R-treesSplit
- pick two rectangles as seeds
- assign each rectangle R to the closest
seed - closest the smallest increase in area
seed1
25R-treesSplit
- How to pick Seeds
- LinearFind the highest and lowest side in each
dimension, normalize the separations, choose the
pair with the greatest normalized separation - Quadratic For each pair E1 and E2, calculate the
rectangle JMBR(E1, E2) and d J-E1-E2. Choose
the pair with the largest d
26R-treesInsertion
- Use the ChooseLeaf to find the leaf node to
insert an entry E - If leaf node is full, then Split, otherwise
insert there - Propagate the split upwards, if necessary
- Adjust parent nodes
27R-TreesDeletion
- Find the leaf node that contains the entry E
- Remove E from this node
- If underflow
- Eliminate the node by removing the node entries
and the parent entry - Reinsert the orphaned (other entries) into the
tree using Insert - Other method (later)
28R-trees Variations
- R-tree DO not allow overlapping, so split the
objects (similar to z-values) - R-tree change the insertion, deletion
algorithms (minimize not only area but also
perimeter, forced re-insertion ) - Hilbert R-tree use the Hilbert values to insert
objects into the tree
29R-trees - variations
- what about static datasets (no ins/del/upd)?
- Q Best way to pack points?
30R-trees - variations
- what about static datasets (no ins/del/upd)?
- Q Best way to pack points?
- A1 plane-sweep
- great for queries on x
- terrible for y
31R-trees - variations
- what about static datasets (no ins/del/upd)?
- Q Best way to pack points?
- A1 plane-sweep
- great for queries on x
- bad for y
32R-trees - variations
- what about static datasets (no ins/del/upd)?
- Q Best way to pack points?
- A1 plane-sweep
- great for queries on x
- terrible for y
- Q how to improve?
33R-trees - variations
- A plane-sweep on HILBERT curve!
34R-trees - variations
- A plane-sweep on HILBERT curve!
- In fact, it can be made dynamic (how?), as well
as to handle regions (how?)
35R-trees - variations
- Dynamic (Hilbert R-tree)
- each point has an h-value (hilbert value)
- insertions like a B-tree on the h-value
- but also store MBR, for searches
36R-trees - variations
- what about other bounding shapes? (and why?)
- A1 arbitrary-orientation lines (cell-tree,
Guenther - A2 P-trees (polygon trees) (MB polygon 0, 90,
45, 135 degree lines)
37R-trees - variations
- A3 L-shapes holes (hB-tree)
- A4 TV-trees Lin, VLDB-Journal 1994
- A5 SR-trees Katayama, SIGMOD97 (used in
Informedia)
38R-trees - conclusions
- Popular method like multi-d B-trees
- guaranteed utilization
- good search times (for low-dim. at least)
- Informix ships DataBlade with R-trees
39Spatial Queries
40Spatial Queries
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
41Spatial Queries
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
42Spatial Queries
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
43Spatial Queries
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
44Spatial Queries
- Given a collection of geometric objects (points,
lines, polygons, ...) - organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
45R-trees - Range search
- pseudocode
- check the root
- for each branch,
- if its MBR intersects the query rectangle
- apply range-search (or print out, if
this - is a leaf)
46R-trees - NN search
47R-trees - NN search
- Q How? (find near neighbor refine...)
48R-trees - NN search
- A1 depth-first search then, range query
P1
I
P3
C
A
G
H
F
B
J
E
P4
q
D
P2
49R-trees - NN search
- A1 depth-first search then, range query
P1
P3
I
C
A
G
H
F
B
J
E
P4
q
D
P2
50R-trees - NN search
- A1 depth-first search then, range query
P1
P3
I
C
A
G
H
F
B
J
E
P4
q
D
P2
51R-trees - NN search
- A2 Roussopoulos, sigmod95
- priority queue, with promising MBRs, and their
best and worst-case distance - main idea Every face of any MBR contains at
least one point of an actual spatial object!
52R-trees - NN search
consider only P2 and P4, for illustration
q
53R-trees - NN search
best of P4
gt P4 is useless for 1-nn
worst of P2
H
J
E
P4
q
D
P2
54R-trees - NN search
- what is really the worst of, say, P2?
worst of P2
E
q
D
P2
55R-trees - NN search
- what is really the worst of, say, P2?
- A the smallest of the two red segments!
q
P2
56Nearest-neighbor searching
- Branch and bound strategy
- Compute MINDIST and MINMAXDIST RKV95
- MINDIST(p,R) is the minimum distance between p
and R with corner points l and u - the closest point in R is at least this distance
away
R
u
Sqrt sum (pi-ri)2 where
ri li if pi lt li ui if pi gt ui pi
otherwise
p
p
MINDIST 0
l
p
57Nearest-neighbor searching
- MINMAXDIST(p,R) is the minimum of the maximum
distance to each pair of faces of R - MaxDistanceToFace(p,R,k) distance between p and
Maxk (M1,M2,..,Mk-1mk,MK1,..,Mn) - mi closer of the two boundary points along i
axis - Mi farther of the two boundary points along i
axis
Max2
u
p
l
Max1
58Nearest-neighbor searching
- MINMAXDIST(p,R) is the minimum of the maximum
distance to each pair of faces of R - MaxDistanceToFace(p,R,k) distance between p and
Maxk (M1,M2,..,Mk-1mk,MK1,..,Mn) - mi closer of the two boundary points along i
axis - Mi farther of the two boundary points along i
axis
Max1
u
p
l
Max2
59Pruning
- ESTIMATE smallest MINMAXDIST(p,R)
- Prune an MBR R for which MINDIST(p,R) is
greater than ESTIMATE. - Generalize to k-nearest neighbor searching
- Maintain kth largest MINMAXDIST
- Prune an MBR if MINDIST to it is larger than the
current estimate of kth MINMAXDIST - Can use objects to refine estimate
60Order of searching
- Depth first order
- Inspect children in MINDIST order
- For each node in the tree keep a list of nodes to
be visited - Prune some of these nodes in the list
- Continue until the lists are empty
61Another NN search
- Global order HS99
- Maintain distance to all entries in a common list
- Order the list by MINDIST
- Repeat
- Inspect the next MBR in the list
- Add the children to the list and reorder
- Until all remaining MBRs can be pruned
62Spatial Join
- Find all parks in a city
- Find all trails that go through a forest
- Basic operation
- find all pairs of objects that overlap
- Single-scan queries
- nearest neighbor queries, range queries
- Multiple-scan queries
- spatial join
63Algorithms
- No existing index structures
- Transform data into 1-d space O89
- z-transform sensitive to size of pixel
- Partition-based spatial-merge join PW96
- partition into tiles that can fit into memory
- plane sweep algorithm on tiles
- Spatial hash joins LR96, KS97
- Sort data BBKK01
- With index structures BKS93, HJR97
- k-d trees and grid files
- R-trees
64R-tree based Join BKS93
S
R
65Join1(R,S)
- Repeat
- Find a pair of intersecting entries E in R and F
in S - If R and S are leaf pages then add (E,F) to
result-set - Else Join1(E,F)
- Until all pairs are examined
- CPU and I/O bottleneck
66Reducing CPU bottleneck
S
R
67Join2(R,S,IntersectedVol)
- Repeat
- Find a pair of intersecting entries E in R and F
in S that overlap with IntersectedVol - If R and S are leaf pages then add (E,F) to
result-set - Else Join2(E,F,CommonEF)
- Until all pairs are examined
- 146 comparisons instead of 49
- In general, number of comparisons equals
- size(R) size(S) relevant(R)relevant(S)
- Reduce the product term
68Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Consider the extents along x-axis Start with the
first entry r1 sweep a vertical line
69Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if (r1,s1) intersect along y-dimension Add
(r1,s1) to result set
70Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if (r1,s2) intersect along y-dimension Add
(r1,s2) to result set
71Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reached the end of r1 Start with next entry r2
72Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reposition sweep line
73Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if r2 and s1 intersect along y Do not add
(r2,s1) to result
74Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reached the end of r2 Start with next entry s1
75Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Total of 2(r1) 1(r2) 0 (s1) 1(s2) 0(r3)
4 comparisons
76Reducing I/O
- Read schedule r1, s1, s2, r3
- Every subtree examined only once
- Consider a slightly different layout
77Reducing I/O
S
R
s1
r2
r1
s2
r3
Read schedule is r1, s2, r2, s1, s2, r3
Subtree s2 is examined twice
78Pinning of nodes
- After examining a pair (E,F), compute the degree
of intersection of each entry - degree(E) is the number of intersections between
E and unprocessed rectangles of the other dataset - If the degrees are non-zero, pin the pages of the
entry with maximum degree - Perform spatial joins for this page
- Continue with plane sweep
79Reducing I/O
S
R
s1
r2
r1
s2
r3
After computing join(r1,s2), degree(r1)
0 degree(s2) 1 So, examine s2 next Read
schedule r1, s2, r3, r2, s1 Subtree s2
examined only once
80References
- SK98 Optimal multi-step k-nearest neighbor
search, T. Seidl and H. Kriegel, SIGMOD 1998
154--165. - BBKK01 Epsilon Grid Order An Algorithm for the
Similarity Join on Massive High-Dimensional Data,
C. Bohm, B. Braunmüller, F. Krebs and H.-P.
Kriegel, SIGMOD 2001. - RKV95 Roussopoulos N., Kelley S., Vincent F.
Nearest Neighbor Queries. Proceedings of the
ACM-SIGMOD International Conference on Management
of Data, pages 71-79, 1995.
81References
- HS99 G. R. Hjaltason and H. Samet, Distance
browsing in spatial databases, ACM Transactions
on Database Systems 24, 2 (June 1999), 265-318 - O89 Jack A. Orenstein Redundancy in Spatial
Databases. SIGMOD Conference 1989 294-305 - PW96 Jignesh M. Patel, David J. DeWitt
Partition Based Spatial-Merge Join. SIGMOD
Conference 1996 259-270 - LR96 Ming-Ling Lo, Chinya V. Ravishankar
Spatial Hash-Joins. SIGMOD Conference 1996
247-258
82References
- KS97 Nick Koudas, Kenneth C. Sevcik Size
Separation Spatial Join. SIGMOD Conference 1997
324-335 - HJR97 Yun-Wu Huang, Ning Jing, Elke A.
Rundensteiner Spatial Joins Using R-trees
Breadth-First Traversal with Global
Optimizations. VLDB 1997, 396-405 - BKS93 Thomas Brinkhoff, Hans-Peter Kriegel,
Bernhard Seeger Efficient Processing of Spatial
Joins Using R-Trees. SIGMOD Conference 1993
237-246