Spatial Indexing - PowerPoint PPT Presentation

Provided by: Valued1314
Learn more at: http://www.cs.ucr.edu


1
Spatial Indexing
  • Many slides taken from George Kollios, Boston
    University

2
Spatial Indexing
  • Point Access Methods can index only points. What
    about regions?
  • Use the transformation technique
  • Z-ordering and quadtrees
  • New methods: Spatial Access Methods (SAMs)
  • R-tree and variations

3
Problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer spatial queries
    (range, NN, etc.)

4
R-tree
  • z-ordering cuts regions into pieces -> duplicate
    elimination
  • how could we avoid that?
  • Idea: try to extend/merge B-trees and k-d trees
  • R-tree: a generalization of the B-tree for
    multidimensional spaces

5
Kd-Btrees
  • [Robinson, 81]: if f is the fanout, split the
    point set into f parts and so on, recursively
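A minimal sketch of the recursive partitioning idea, assuming 2-d points, a data-page capacity equal to the fanout f, and split axes that alternate between x and y. It only builds a static partition; it does not implement K-D-B-tree page splits on insertion, and the names `kdb_partition` and `leaf_pages` are hypothetical.

```python
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


def kdb_partition(points: List[Point], f: int, axis: int = 0) -> list:
    """Recursively split a 2-d point set into at most f slabs per level.

    Assumptions: a data page holds at most f points, and the split axis
    alternates between x (0) and y (1) from one level to the next.
    """
    if f < 2:
        raise ValueError("fanout must be at least 2")
    if len(points) <= f:
        return list(points)  # small enough: this becomes a data page
    pts = sorted(points, key=lambda p: p[axis])
    size = -(-len(pts) // f)  # ceiling division: points per slab
    return [kdb_partition(pts[i:i + size], f, 1 - axis)
            for i in range(0, len(pts), size)]


def leaf_pages(tree: list) -> Iterator[List[Point]]:
    """Yield every data page (a list of point tuples) of a partition tree."""
    if not tree or isinstance(tree[0], tuple):
        yield tree
    else:
        for child in tree:
            yield from leaf_pages(child)
```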

6
Kd-Btrees
  • But insertions/deletions are tricky (splits may
    propagate downwards and upwards)
  • no guarantee on space utilization

7
R-trees
  • [Guttman 84] Main idea: allow parents to overlap!
  • => guaranteed 50% utilization
  • => easier insertion/split algorithms
  • (only deal with Minimum Bounding Rectangles,
    MBRs)

8
R-trees
  • A multi-way external memory tree
  • Index nodes and data (leaf) nodes
  • All leaf nodes appear on the same level
  • Every node contains between m and M entries
  • The root node has at least 2 entries (children)

9
Example
  • e.g., with fanout 4: group nearby rectangles into
    parent MBRs; each group -> one disk page

[Figure: data rectangles A to J]
10
Example
  • F=4 (fanout)

[Figure: rectangles A to J grouped under parent MBRs P1 to P4]
12
R-trees - format of nodes
  • (MBR, obj_ptr) for leaf nodes

Entry layout: x-low, x-high, y-low, y-high, ..., obj ptr
(further entries follow, ...)
13
R-trees - format of nodes
  • (MBR, node_ptr) for non-leaf nodes

Entry layout: x-low, x-high, y-low, y-high, ..., node ptr
(further entries follow, ...)
14
R-trees: Search
[Figure: rectangles A to J under parent MBRs P1 to P4]
16
R-trees: Search
  • Main points:
  • every parent node completely covers its
    children
  • a child MBR may be covered by more than one
    parent; it is stored under ONLY ONE of them
    (i.e., no need for duplicate elimination)
  • a point query may follow multiple branches
  • everything works for any(?) dimensionality

17
R-trees: Insertion
Insert X
[Figure: new rectangle X added among rectangles A to J under P1 to P4]
18
R-trees: Insertion
Insert Y
[Figure: new rectangle Y added among rectangles A to J under P1 to P4]
19
R-trees: Insertion
  • Extend the parent MBR

[Figure: a parent MBR enlarged to cover Y]
20
R-trees: Insertion
  • How to find the next node to insert the new
    object?
  • Using ChooseLeaf: find the entry that needs the
    least enlargement to include Y. Resolve ties
    using the area (smallest)
  • Other methods (later)
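One descent step of ChooseLeaf might look like the sketch below, assuming 2-d rectangles stored as `(x_low, y_low, x_high, y_high)` tuples; `choose_subtree` is a hypothetical name. ChooseLeaf repeats this step from the root until it reaches a leaf.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(entry_mbrs: List[Rect], new: Rect) -> int:
    """Index of the entry needing the least area enlargement to include
    `new`; ties are resolved by the smallest current area."""
    def cost(i: int) -> Tuple[float, float]:
        r = entry_mbrs[i]
        return (area(union(r, new)) - area(r), area(r))
    return min(range(len(entry_mbrs)), key=cost)
```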

21
R-trees: Insertion
  • If the node is full, then split; e.g., insert W

[Figure: inserting W into a full node; rectangles A to K under P1 to P4]
22
R-trees: Split
  • Split node P1: partition the MBRs into two groups
  • (A1: plane sweep,
    until 50% of rectangles)
  • A2: linear split
  • A3: quadratic split
  • A4: exponential split
    (2^(M-1) choices)

[Figure: overflowing node P1 with entries A, B, C, K, W]
23
R-trees: Split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed

[Figure: seed1 marked among the entries of the node]
24
R-trees: Split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest
    seed
  • closest: the smallest increase in area

[Figure: seed1 marked among the entries of the node]
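A minimal sketch of the seed-based distribution, assuming the two seeds are already chosen and measuring the area increase against each group's current MBR (not just the seed). It omits the minimum-fill rule (each group needs at least m entries) and Guttman's PickNext ordering; `assign_to_seeds` is a hypothetical name.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def assign_to_seeds(seed1: Rect, seed2: Rect,
                    rects: List[Rect]) -> Tuple[List[Rect], List[Rect]]:
    """Put each rectangle into the group whose MBR grows least in area."""
    groups = [[seed1], [seed2]]
    mbrs = [seed1, seed2]
    for r in rects:
        growth = [area(union(m, r)) - area(m) for m in mbrs]
        # tie-break: prefer the group whose MBR is currently smaller
        g = 0 if (growth[0], area(mbrs[0])) <= (growth[1], area(mbrs[1])) else 1
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
    return groups[0], groups[1]
```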
25
R-trees: Split
  • How to pick seeds:
  • Linear: find the highest low side and the lowest
    high side in each dimension, normalize the
    separations, choose the pair with the greatest
    normalized separation
  • Quadratic: for each pair E1 and E2, calculate the
    rectangle J = MBR(E1, E2) and
    d = area(J) - area(E1) - area(E2). Choose
    the pair with the largest d
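The two seed-picking rules can be sketched as follows, assuming 2-d rectangles as `(x_low, y_low, x_high, y_high)` tuples and at least two entries; the function names are hypothetical. The quadratic rule tries all pairs; the linear rule makes one pass per dimension.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pair (i, j) maximizing d = area(J) - area(E_i) - area(E_j)."""
    def waste(ij: Tuple[int, int]) -> float:
        a, b = rects[ij[0]], rects[ij[1]]
        return area(union(a, b)) - area(a) - area(b)
    return max(combinations(range(len(rects)), 2), key=waste)


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Per dimension, pair the entry with the highest low side and a
    different entry with the lowest high side; keep the pair with the
    greatest separation, normalized by the width of the whole set."""
    best, pair = -float("inf"), (0, 1)
    for dim in range(2):
        lo, hi = dim, dim + 2  # positions of the low and high side in a Rect
        high_low = max(range(len(rects)), key=lambda i: rects[i][lo])
        low_high = min((i for i in range(len(rects)) if i != high_low),
                       key=lambda i: rects[i][hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        sep = (rects[high_low][lo] - rects[low_high][hi]) / (width or 1.0)
        if sep > best:
            best, pair = sep, tuple(sorted((high_low, low_high)))
    return pair
```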

26
R-trees: Insertion
  • Use the ChooseLeaf to find the leaf node to
    insert an entry E
  • If leaf node is full, then Split, otherwise
    insert there
  • Propagate the split upwards, if necessary
  • Adjust parent nodes
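A compact in-memory sketch of this insertion loop (ChooseLeaf, insert, split, propagate, adjust), assuming a hypothetical node capacity `M = 4`. To keep it short, the split is a stand-in that sorts entries by x-center and cuts the node in half (closer to the plane-sweep option A1 than to the linear or quadratic split); all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)
M = 4  # hypothetical maximum number of entries per node


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (Rect, obj_ptr or child Node)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def mbr_of(node: Node) -> Rect:
    out = node.entries[0][0]
    for r, _ in node.entries[1:]:
        out = union(out, r)
    return out


def split(node: Node) -> Node:
    """Stand-in split: sort entries by x-center and cut the node in half."""
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    half = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def insert(root: Node, rect: Rect, obj) -> Node:
    """Insert (rect, obj); returns the root, which changes if the root splits."""
    path: List[Node] = []
    node = root
    while not node.leaf:  # ChooseLeaf: least enlargement, ties by smallest area
        path.append(node)
        node = min(node.entries,
                   key=lambda e: (area(union(e[0], rect)) - area(e[0]), area(e[0])))[1]
    node.entries.append((rect, obj))
    sibling = split(node) if len(node.entries) > M else None
    child = node
    while path:  # AdjustTree: refresh parent MBRs and propagate splits upwards
        parent = path.pop()
        for k, (_, c) in enumerate(parent.entries):
            if c is child:
                parent.entries[k] = (mbr_of(child), child)
                break
        if sibling is not None:
            parent.entries.append((mbr_of(sibling), sibling))
            sibling = split(parent) if len(parent.entries) > M else None
        child = parent
    if sibling is not None:  # the root itself split: the tree grows one level
        return Node(False, [(mbr_of(child), child), (mbr_of(sibling), sibling)])
    return root
```

Because a split only ever adds a sibling next to an existing node, and the tree grows only by creating a new root, all leaves stay on the same level.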

27
R-trees: Deletion
  • Find the leaf node that contains the entry E
  • Remove E from this node
  • If underflow:
  • Eliminate the node by removing the node entries
    and the parent entry
  • Reinsert the orphaned (other) entries into the
    tree using Insert
  • Other method (later)

28
R-trees - variations
  • R+-tree: do not allow overlapping, so split the
    objects (similar to z-values)
  • R*-tree: change the insertion, deletion
    algorithms (minimize not only area but also
    perimeter; forced re-insertion)
  • Hilbert R-tree: use the Hilbert values to insert
    objects into the tree

29
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: best way to pack points?

30
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: best way to pack points?
  • A1: plane-sweep
  • great for queries on x
  • terrible for y

31
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: best way to pack points?
  • A1: plane-sweep
  • great for queries on x
  • bad for y

32
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: best way to pack points?
  • A1: plane-sweep
  • great for queries on x
  • terrible for y
  • Q: how to improve?
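To make "great for x, terrible for y" concrete, here is a sketch of plane-sweep packing, assuming points are sorted by x (ties by y) and cut into pages of M points; `pack_by_x` is a hypothetical name. On an 8 x 8 grid with M = 8, every page MBR is a vertical segment spanning the whole y-range, so a query that is selective only in y still touches every page.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def pack_by_x(points: List[Point], M: int) -> List[Rect]:
    """Plane-sweep packing: sort by x, cut into pages of M points, and
    return the MBR of each page."""
    pts = sorted(points)  # tuple order: by x, then by y
    pages = [pts[i:i + M] for i in range(0, len(pts), M)]
    return [(min(x for x, _ in pg), min(y for _, y in pg),
             max(x for x, _ in pg), max(y for _, y in pg)) for pg in pages]
```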

33
R-trees - variations
  • A: plane-sweep on the HILBERT curve!

34
R-trees - variations
  • A: plane-sweep on the HILBERT curve!
  • In fact, it can be made dynamic (how?), as well
    as to handle regions (how?)

35
R-trees - variations
  • Dynamic (Hilbert R-tree):
  • each point has an h-value (Hilbert value)
  • insertions like a B-tree on the h-value
  • but also store MBRs, for searches
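A sketch of Hilbert packing, assuming integer coordinates on an n x n grid with n a power of two; `hilbert_d` follows the widely used iterative rotate-and-flip formulation, and both function names are hypothetical. On an 8 x 8 grid with M = 4, every page becomes a 2 x 2 square instead of a long strip, because each run of 4 consecutive Hilbert values covers one aligned 2 x 2 block.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x_low, y_low, x_high, y_high)


def hilbert_d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip so the sub-quadrant is traversed correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_by_hilbert(points: List[Point], M: int, n: int) -> List[Rect]:
    """Sort points by Hilbert value, cut into pages of M, return page MBRs."""
    pts = sorted(points, key=lambda p: hilbert_d(n, p[0], p[1]))
    pages = [pts[i:i + M] for i in range(0, len(pts), M)]
    return [(min(x for x, _ in pg), min(y for _, y in pg),
             max(x for x, _ in pg), max(y for _, y in pg)) for pg in pages]
```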

36
R-trees - variations
  • what about other bounding shapes? (and why?)
  • A1: arbitrary-orientation lines (cell-tree,
    [Guenther])
  • A2: P-trees (polygon trees) (MB polygon: 0, 90,
    45, 135 degree lines)

37
R-trees - variations
  • A3: L-shapes; holes (hB-tree)
  • A4: TV-trees [Lin, VLDB Journal 1994]
  • A5: SR-trees [Katayama, SIGMOD97] (used in
    Informedia)

38
R-trees - conclusions
  • Popular method; like multi-d B-trees
  • guaranteed utilization
  • good search times (for low-dim. at least)
  • Informix ships DataBlade with R-trees

39
Spatial Queries
40
Spatial Queries
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer
  • point queries
  • range queries
  • k-nn queries
  • spatial joins (all pairs queries)

45
R-trees - Range search
  • pseudocode:
  • check the root
  • for each branch,
  • if its MBR intersects the query rectangle,
  • apply range-search to it (or print it out, if
    this is a leaf)
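The pseudocode above as a runnable sketch, assuming 2-d rectangles as `(x_low, y_low, x_high, y_high)` tuples and a hypothetical `Node` class with a `leaf` flag and a list of `(MBR, pointer)` entries. Results are candidates by MBR only; an exact geometry test on each object would follow.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


@dataclass
class Node:
    leaf: bool
    entries: list  # (Rect, obj_ptr) in leaves, (Rect, child Node) otherwise


def intersects(a: Rect, b: Rect) -> bool:
    """Closed rectangles overlap iff they overlap along every axis."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_search(node: Node, query: Rect, out: List) -> List:
    for mbr, ptr in node.entries:
        if intersects(mbr, query):  # descend only into overlapping branches
            if node.leaf:
                out.append(ptr)  # "print out" the qualifying object
            else:
                range_search(ptr, query, out)
    return out
```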

46
R-trees - NN search
47
R-trees - NN search
  • Q: How? (find a near neighbor; refine...)

48
R-trees - NN search
  • A1: depth-first search; then, range query

[Figure: query point q among rectangles A to J under parent MBRs P1 to P4]
51
R-trees - NN search
  • A2: [Roussopoulos, SIGMOD95]
  • priority queue, with promising MBRs, and their
    best and worst-case distance
  • main idea: every face of any MBR contains at
    least one point of an actual spatial object!

52
R-trees - NN search
consider only P2 and P4, for illustration
[Figure: query point q with MBRs P2 and P4]
53
R-trees - NN search
best of P4 > worst of P2
=> P4 is useless for 1-NN
[Figure: query point q, MBRs P2 and P4, with rectangles D, E, H, J]
54
R-trees - NN search
  • what is really the worst of, say, P2?

[Figure: worst of P2, with q, MBR P2 and rectangles D, E]
55
R-trees - NN search
  • what is really the worst of, say, P2?
  • A: the smaller of the two red segments!

[Figure: q and MBR P2]
56
Nearest-neighbor searching
  • Branch and bound strategy
  • Compute MINDIST and MINMAXDIST [RKV95]
  • MINDIST(p,R) is the minimum distance between p
    and R with corner points l and u
  • the closest point in R is at least this distance
    away

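[Figure: rectangle R with corners l and u; query points p outside and inside R]

$$\mathrm{MINDIST}(p,R) = \sqrt{\sum_{i=1}^{n} (p_i - r_i)^2}, \qquad r_i = \begin{cases} l_i & \text{if } p_i < l_i \\ u_i & \text{if } p_i > u_i \\ p_i & \text{otherwise} \end{cases}$$

If p lies inside R, MINDIST = 0.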
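The MINDIST formula transcribes directly; a sketch for any dimensionality, with `l` and `u` the low and high corners of R (the function name is hypothetical):

```python
import math
from typing import Sequence


def mindist(p: Sequence[float], l: Sequence[float], u: Sequence[float]) -> float:
    """MINDIST between point p and the rectangle with corners l (low) and u (high)."""
    total = 0.0
    for pi, li, ui in zip(p, l, u):
        ri = li if pi < li else ui if pi > ui else pi  # clamp p_i into [l_i, u_i]
        total += (pi - ri) ** 2
    return math.sqrt(total)
```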
57
Nearest-neighbor searching
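  • MINMAXDIST(p,R) is the minimum of the maximum
    distance to each pair of faces of R
  • MaxDistanceToFace(p,R,k): distance between p and
    Max_k = (M_1, ..., M_{k-1}, m_k, M_{k+1}, ..., M_n)
  • m_i: the closer of the two boundary points along
    axis i
  • M_i: the farther of the two boundary points along
    axis i

$$\mathrm{MINMAXDIST}(p,R) = \min_{1 \le k \le n} \sqrt{(p_k - m_k)^2 + \sum_{i \ne k} (p_i - M_i)^2}$$

[Figure: rectangle R with corners l and u, query point p, and the points Max1 and Max2]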
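A matching sketch of MINMAXDIST, following the m_i / M_i definitions above; the midpoint test that picks the closer boundary mirrors [RKV95], and the function name is hypothetical. For p = (0, 0) and R = [1, 2] x [1, 2], MINDIST is sqrt(2) while MINMAXDIST is sqrt(5): some object is guaranteed to lie within sqrt(5) of p.

```python
import math
from typing import Sequence


def minmaxdist(p: Sequence[float], l: Sequence[float], u: Sequence[float]) -> float:
    n = len(p)
    # m_i: the closer boundary along axis i; M_i: the farther one
    m = [l[i] if p[i] <= (l[i] + u[i]) / 2 else u[i] for i in range(n)]
    M = [l[i] if p[i] >= (l[i] + u[i]) / 2 else u[i] for i in range(n)]
    # MaxDistanceToFace(p, R, k): coordinate k from m, all others from M
    return math.sqrt(min(
        sum((p[i] - (m[i] if i == k else M[i])) ** 2 for i in range(n))
        for k in range(n)))
```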
59
Pruning
  • ESTIMATE = smallest MINMAXDIST(p,R) seen so far
  • Prune an MBR R for which MINDIST(p,R) is
    greater than ESTIMATE
  • Generalize to k-nearest neighbor searching:
  • Maintain the k-th smallest MINMAXDIST
  • Prune an MBR if MINDIST to it is larger than the
    current estimate of the k-th MINMAXDIST
  • Can use objects to refine the estimate

60
Order of searching
  • Depth first order
  • Inspect children in MINDIST order
  • For each node in the tree keep a list of nodes to
    be visited
  • Prune some of these nodes in the list
  • Continue until the lists are empty
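A sketch of the depth-first branch-and-bound search for 1-NN over 2-d points, combining the MINDIST ordering of this slide with the ESTIMATE pruning of the previous one. Assumptions: leaves store points, every MBR is tight (required for the MINMAXDIST guarantee), and the hypothetical `pack` helper bulk-loads a tree by sorting on x only so there is something to search.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


@dataclass
class Node:
    leaf: bool
    entries: list  # (Rect, Point) in leaves, (Rect, Node) otherwise


def mindist(p: Point, r: Rect) -> float:
    return math.hypot(max(r[0] - p[0], 0.0, p[0] - r[2]),
                      max(r[1] - p[1], 0.0, p[1] - r[3]))


def minmaxdist(p: Point, r: Rect) -> float:
    lo, hi = (r[0], r[1]), (r[2], r[3])
    near = [lo[i] if p[i] <= (lo[i] + hi[i]) / 2 else hi[i] for i in (0, 1)]
    far = [lo[i] if p[i] >= (lo[i] + hi[i]) / 2 else hi[i] for i in (0, 1)]
    return min(math.hypot(p[k] - near[k], p[1 - k] - far[1 - k]) for k in (0, 1))


def nn_depth_first(root: Node, q: Point) -> Tuple[Optional[Point], float]:
    best: List = [None, math.inf]  # best point so far and its distance
    bound = [math.inf]  # ESTIMATE: an upper bound on the NN distance

    def visit(node: Node) -> None:
        if node.leaf:
            for _, pt in node.entries:
                d = math.dist(q, pt)
                if d < best[1]:
                    best[0], best[1] = pt, d
                    bound[0] = min(bound[0], d)
            return
        # each child's MINMAXDIST guarantees an object within that distance
        bound[0] = min([bound[0]] + [minmaxdist(q, r) for r, _ in node.entries])
        for r, child in sorted(node.entries, key=lambda e: mindist(q, e[0])):
            if mindist(q, r) <= bound[0]:  # otherwise prune this branch
                visit(child)

    visit(root)
    return best[0], best[1]


def pack(points: List[Point], M: int = 4) -> Node:
    """Hypothetical bulk loader: sort by x, cut into pages of M, repeat upwards."""
    def box(rects):
        return (min(r[0] for r in rects), min(r[1] for r in rects),
                max(r[2] for r in rects), max(r[3] for r in rects))
    pts = sorted(points)
    level = [Node(True, [((x, y, x, y), (x, y)) for x, y in pts[i:i + M]])
             for i in range(0, len(pts), M)]
    while len(level) > 1:
        level = [Node(False, [(box([r for r, _ in n.entries]), n) for n in level[i:i + M]])
                 for i in range(0, len(level), M)]
    return level[0]
```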

61
Another NN search
  • Global order [HS99]
  • Maintain distance to all entries in a common list
  • Order the list by MINDIST
  • Repeat
  • Inspect the next MBR in the list
  • Add the children to the list and reorder
  • Until all remaining MBRs can be pruned
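A sketch of the global-order (best-first) search in the spirit of [HS99], assuming a hypothetical `Node` layout with points in the leaves and the same hypothetical `pack` loader as a test harness. Nodes and points share one priority queue keyed by MINDIST (for points, the exact distance); when a point reaches the head of the queue nothing left can be closer, so taking the first k items yields the k nearest neighbors.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


@dataclass
class Node:
    leaf: bool
    entries: list  # (Rect, Point) in leaves, (Rect, Node) otherwise


def mindist(p: Point, r: Rect) -> float:
    return math.hypot(max(r[0] - p[0], 0.0, p[0] - r[2]),
                      max(r[1] - p[1], 0.0, p[1] - r[3]))


def nearest_iter(root: Node, q: Point) -> Iterator[Tuple[Point, float]]:
    """Yield (point, distance) pairs in increasing distance from q."""
    tie = itertools.count()  # tie-breaker so heapq never compares Nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        d, _, is_point, item = heapq.heappop(heap)
        if is_point:
            yield item, d  # everything still queued is at least this far
        elif item.leaf:
            for _, pt in item.entries:
                heapq.heappush(heap, (math.dist(q, pt), next(tie), True, pt))
        else:
            for r, child in item.entries:
                heapq.heappush(heap, (mindist(q, r), next(tie), False, child))


def pack(points: List[Point], M: int = 4) -> Node:
    """Hypothetical bulk loader: sort by x, cut into pages of M, repeat upwards."""
    def box(rects):
        return (min(r[0] for r in rects), min(r[1] for r in rects),
                max(r[2] for r in rects), max(r[3] for r in rects))
    pts = sorted(points)
    level = [Node(True, [((x, y, x, y), (x, y)) for x, y in pts[i:i + M]])
             for i in range(0, len(pts), M)]
    while len(level) > 1:
        level = [Node(False, [(box([r for r, _ in n.entries]), n) for n in level[i:i + M]])
                 for i in range(0, len(level), M)]
    return level[0]
```

For k-NN, `list(itertools.islice(nearest_iter(root, q), k))` stops after k points, so the rest of the queue is never expanded.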

62
Spatial Join
  • Find all parks in a city
  • Find all trails that go through a forest
  • Basic operation
  • find all pairs of objects that overlap
  • Single-scan queries
  • nearest neighbor queries, range queries
  • Multiple-scan queries
  • spatial join

63
Algorithms
  • No existing index structures
  • Transform data into 1-d space [O89]
  • z-transform; sensitive to size of pixel
  • Partition-based spatial-merge join [PW96]
  • partition into tiles that can fit into memory
  • plane sweep algorithm on tiles
  • Spatial hash joins [LR96, KS97]
  • Sort data [BBKK01]
  • With index structures [BKS93, HJR97]
  • k-d trees and grid files
  • R-trees

64
R-tree based Join [BKS93]
[Figure: two R-trees, R and S]
65
Join1(R,S)
  • Repeat
  • Find a pair of intersecting entries E in R and F
    in S
  • If R and S are leaf pages, then add (E,F) to
    the result set
  • Else Join1(E,F)
  • Until all pairs are examined
  • Drawback: CPU and I/O bottleneck
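Join1 as a sketch, assuming both R-trees have the same height and 2-d MBRs as `(x_low, y_low, x_high, y_high)` tuples; `Node` and `join1` are hypothetical names. Every pair of entries of two matched nodes is tested, which is where the CPU cost comes from.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


@dataclass
class Node:
    leaf: bool
    entries: list  # (Rect, obj_ptr) in leaves, (Rect, child Node) otherwise


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def join1(nr: Node, ns: Node, out: List) -> List:
    """Synchronized traversal of two R-trees of the same height."""
    for e_mbr, e_ptr in nr.entries:
        for f_mbr, f_ptr in ns.entries:  # all |R| * |S| pairs are tested
            if intersects(e_mbr, f_mbr):
                if nr.leaf and ns.leaf:
                    out.append((e_ptr, f_ptr))  # candidate pair (MBR test only)
                else:
                    join1(e_ptr, f_ptr, out)
    return out
```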

66
Reducing CPU bottleneck
[Figure: R-trees R and S]
67
Join2(R,S,IntersectedVol)
  • Repeat
  • Find a pair of intersecting entries E in R and F
    in S that overlap with IntersectedVol
  • If R and S are leaf pages, then add (E,F) to
    the result set
  • Else Join2(E,F,CommonEF), where CommonEF is the
    intersection of the MBRs of E and F
  • Until all pairs are examined
  • 14 + 6 comparisons instead of 49
  • In general, the number of comparisons equals
  • size(R) + size(S) + relevant(R) × relevant(S)
  • Reduce the product term
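A sketch of Join2 that counts rectangle tests, using a hypothetical pair of 7-entry leaves in which 3 entries of R and 2 of S overlap the intersected volume: the count is 7 + 7 + 3 × 2 = 20, versus 7 × 7 = 49 for Join1. Names and data are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


@dataclass
class Node:
    leaf: bool
    entries: list  # (Rect, obj_ptr) in leaves, (Rect, child Node) otherwise


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def common(a: Rect, b: Rect) -> Rect:
    """Intersection of two overlapping rectangles (CommonEF)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def join2(nr: Node, ns: Node, vol: Rect, out: List) -> int:
    """Join restricted to entries overlapping vol; returns the number of tests."""
    tests = 0
    rel_r, rel_s = [], []
    for e in nr.entries:  # size(R) tests
        tests += 1
        if intersects(e[0], vol):
            rel_r.append(e)
    for f in ns.entries:  # size(S) tests
        tests += 1
        if intersects(f[0], vol):
            rel_s.append(f)
    for e in rel_r:  # relevant(R) * relevant(S) tests
        for f in rel_s:
            tests += 1
            if intersects(e[0], f[0]):
                if nr.leaf and ns.leaf:
                    out.append((e[1], f[1]))
                else:
                    tests += join2(e[1], f[1], common(e[0], f[0]), out)
    return tests
```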

68
Using Plane Sweep
[Figure: entries r1, r2, r3 of R and s1, s2 of S, with their extents along the x-axis]
Consider the extents along the x-axis. Start with the
first entry, r1, and sweep a vertical line.
69
Using Plane Sweep
Check if (r1,s1) intersect along the y-dimension. Add
(r1,s1) to the result set.
70
Using Plane Sweep
Check if (r1,s2) intersect along the y-dimension. Add
(r1,s2) to the result set.
71
Using Plane Sweep
Reached the end of r1. Start with the next entry, r2.
72
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reposition sweep line
73
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if r2 and s1 intersect along y Do not add
(r2,s1) to result
74
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reached the end of r2 Start with next entry s1
75
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Total of 2(r1) 1(r2) 0 (s1) 1(s2) 0(r3)
4 comparisons
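A sketch of the sorted (plane-sweep) intersection test, assuming entries are `(name, (x_low, y_low, x_high, y_high))` pairs; `sweep_join` is a hypothetical name. The coordinates in the check below are made up so that the sweep reproduces the walkthrough: r1 is tested twice, r2 once, s1 not at all, s2 once and r3 not at all, 4 tests in total.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)
Entry = Tuple[str, Rect]


def y_overlap(a: Rect, b: Rect) -> bool:
    return a[1] <= b[3] and b[1] <= a[3]


def sweep_join(R: List[Entry], S: List[Entry]) -> Tuple[List[Tuple[str, str]], int]:
    """Plane sweep on x_low; returns (intersecting (r, s) pairs, y-tests)."""
    R = sorted(R, key=lambda e: e[1][0])
    S = sorted(S, key=lambda e: e[1][0])
    pairs, tests, i, j = [], 0, 0, 0
    while i < len(R) and j < len(S):
        if R[i][1][0] < S[j][1][0]:  # the sweep line reaches an R entry first
            name, rect = R[i]
            k = j
            while k < len(S) and S[k][1][0] <= rect[2]:  # overlap along x
                tests += 1
                if y_overlap(rect, S[k][1]):
                    pairs.append((name, S[k][0]))
                k += 1
            i += 1
        else:  # the sweep line reaches an S entry first
            name, rect = S[j]
            k = i
            while k < len(R) and R[k][1][0] <= rect[2]:
                tests += 1
                if y_overlap(R[k][1], rect):
                    pairs.append((R[k][0], name))
                k += 1
            j += 1
    return pairs, tests
```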
76
Reducing I/O
  • Read schedule: r1, s1, s2, r3
  • Every subtree is examined only once
  • Consider a slightly different layout

77
Reducing I/O
[Figure: a different layout of r1, r2, r3 and s1, s2]
Read schedule is r1, s2, r2, s1, s2, r3
Subtree s2 is examined twice
78
Pinning of nodes
  • After examining a pair (E,F), compute the degree
    of intersection of each entry
  • degree(E) is the number of intersections between
    E and unprocessed rectangles of the other dataset
  • If the degrees are non-zero, pin the pages of the
    entry with maximum degree
  • Perform spatial joins for this page
  • Continue with plane sweep

79
Reducing I/O
[Figure: the same layout, processed with pinning]
After computing join(r1,s2): degree(r1) = 0,
degree(s2) = 1. So, examine s2 next. Read
schedule: r1, s2, r3, r2, s1. Subtree s2 is
examined only once.
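A toy simulation of the pinning rule, assuming the intersecting pairs are already listed in plane-sweep order and that the buffer keeps only the two pages of the most recently joined pair; `read_schedule` is a hypothetical name. With the pairs of the layout above it reproduces both read schedules.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def read_schedule(pairs: List[Pair], pin: bool = True) -> List[str]:
    """Pages read while joining `pairs`, with or without pinning."""
    remaining = list(pairs)
    reads: List[str] = []
    buffer: Set[str] = set()

    def process(pair: Pair) -> None:
        nonlocal buffer
        for page in pair:
            if page not in buffer:  # page fault: the subtree is (re)read
                reads.append(page)
        buffer = set(pair)

    while remaining:
        e, f = remaining.pop(0)
        process((e, f))
        if not pin:
            continue
        # degree: number of still-unprocessed pairs involving the entry
        degree = {x: sum(x in p for p in remaining) for x in (e, f)}
        pinned = max((e, f), key=degree.__getitem__)
        if degree[pinned] == 0:
            continue
        # keep the pinned page and join it with all its remaining partners now
        for p in [p for p in remaining if pinned in p]:
            remaining.remove(p)
            process(p)
    return reads
```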
80
References
  • [SK98] T. Seidl and H. Kriegel. Optimal multi-step
    k-nearest neighbor search. SIGMOD 1998, 154-165.
  • [BBKK01] C. Bohm, B. Braunmüller, F. Krebs and
    H.-P. Kriegel. Epsilon Grid Order: An Algorithm for
    the Similarity Join on Massive High-Dimensional
    Data. SIGMOD 2001.
  • [RKV95] N. Roussopoulos, S. Kelley and F. Vincent.
    Nearest Neighbor Queries. SIGMOD 1995, 71-79.

81
References
  • [HS99] G. R. Hjaltason and H. Samet. Distance
    browsing in spatial databases. ACM Transactions
    on Database Systems 24(2), June 1999, 265-318.
  • [O89] J. A. Orenstein. Redundancy in Spatial
    Databases. SIGMOD 1989, 294-305.
  • [PW96] J. M. Patel and D. J. DeWitt. Partition
    Based Spatial-Merge Join. SIGMOD 1996, 259-270.
  • [LR96] M.-L. Lo and C. V. Ravishankar. Spatial
    Hash-Joins. SIGMOD 1996, 247-258.

82
References
  • [KS97] N. Koudas and K. C. Sevcik. Size
    Separation Spatial Join. SIGMOD 1997, 324-335.
  • [HJR97] Y.-W. Huang, N. Jing and E. A.
    Rundensteiner. Spatial Joins Using R-trees:
    Breadth-First Traversal with Global
    Optimizations. VLDB 1997, 396-405.
  • [BKS93] T. Brinkhoff, H.-P. Kriegel and
    B. Seeger. Efficient Processing of Spatial
    Joins Using R-Trees. SIGMOD 1993, 237-246.