Temple University CIS Dept. CIS616 Principles of Data Management - PowerPoint PPT Presentation

1
Temple University CIS Dept. CIS616 Principles
of Data Management
  • V. Megalooikonomou
  • Spatial Access Methods (SAMs)
  • (based on notes by Silberschatz, Korth, and
    Sudarshan and notes by C. Faloutsos at CMU)

2
General Overview
  • Multimedia Indexing
  • Spatial Access Methods (SAMs)
  • k-d trees
  • Point Quadtrees
  • MX-Quadtree
  • z-ordering
  • R-trees

3
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

4
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer spatial queries
    (like??)

5
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer
  • point queries
  • range queries
  • k-nn queries
  • spatial joins (all pairs queries)

9
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer
  • point queries
  • range queries
  • k-nn queries
  • spatial joins (all pairs within distance ε)

10
SAMs - motivation
  • Q applications?

11
SAMs - motivation
[Figure: traditional DB (axes: age, salary) and GIS]
13
SAMs - motivation
CAD/CAM
find elements too close to each other
14
SAMs - motivation
CAD/CAM
15
SAMs - motivation
[Figure: sequences S1, ..., Sn (day 1 to 365) mapped by F() to points F(S1), ..., F(Sn); feature axes e.g., avg and e.g., std]
16
SAMs solutions
  • K-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees
  • (grid files)
  • Q how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)

17
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

18
k-d trees
  • Used to store k-dimensional point data
  • It is not used to store region data
  • A 2-d tree (i.e., for k = 2) stores 2-dimensional
    point data, while a 3-d tree stores 3-dimensional
    point data, etc.

19
2-d trees: node structure
  • Binary trees
  • Info: information field
  • Xval, Yval: coordinates of a point associated with
    the node
  • Llink, Rlink: pointers to children
  • Properties (N: node)
  • If level(N) is even:
  • for all nodes M in the subtree rooted at N.Llink,
    M.Xval < N.Xval
  • for all nodes P in the subtree rooted at N.Rlink,
    P.Xval >= N.Xval
  • If level(N) is odd:
  • Similarly, use Yvals

20
2-d trees Example
21
2-d trees: Insertion/Search
  • To insert a node N into the tree pointed to by T:
  • If N and T agree on Xval, Yval, then overwrite T
  • Else, branch left if N.Xval < T.Xval, right
    otherwise (even levels)
  • Similarly for odd levels (branching on Yvals)
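
The insertion and search rules above can be sketched in Python. This is a minimal illustration, not code from the course: the `Node` class mirrors the Info/Xval/Yval/Llink/Rlink fields of the slides, while the function names `insert` and `search` and the sample coordinates in the usage note are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    info: str
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def insert(t, info, x, y, level=0):
    """Insert point (x, y) into the 2-d tree rooted at t; return the root."""
    if t is None:
        return Node(info, x, y)
    if (t.xval, t.yval) == (x, y):
        t.info = info  # same point: overwrite
        return t
    # Even levels branch on x, odd levels branch on y.
    go_left = x < t.xval if level % 2 == 0 else y < t.yval
    if go_left:
        t.llink = insert(t.llink, info, x, y, level + 1)
    else:
        t.rlink = insert(t.rlink, info, x, y, level + 1)
    return t


def search(t, x, y, level=0):
    """Return the node holding (x, y), or None if it is not in the tree."""
    while t is not None:
        if (t.xval, t.yval) == (x, y):
            return t
        go_left = x < t.xval if level % 2 == 0 else y < t.yval
        t = t.llink if go_left else t.rlink
        level += 1
    return None
```

Usage (hypothetical points): starting from `root = None`, call `root = insert(root, "A", 19, 45)` for each point, then `search(root, 19, 45)` to retrieve a node.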

22
2-d trees Example of Insertion
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
23
2-d trees: Deletion
  • Deletion of point (x,y) from T (let N be the node
    holding (x,y))
  • If N is a leaf node: easy
  • Otherwise, either Tl (left subtree) or Tr (right
    subtree) is non-empty
  • Find a candidate replacement node R in Tl or
    Tr
  • Replace all of N's non-link fields by those of R
  • Recursively delete R from Ti (the subtree, Tl or
    Tr, that contains R)
  • Recursion guaranteed to terminate - Why?

24
2-d trees Deletion
  • Finding candidate replacement nodes for deletion
  • Replacement node R must bear same spatial
    relation to all nodes in Tl and Tr as node N

25
2-d trees Range Queries
  • Q: Given a point (xc, yc) and a distance r, find
    all points in the 2-d tree that lie within the
    circle
  • A: Each node N in a 2-d tree implicitly
    represents a region RN. If the circle (specified
    by the query) has no intersection with RN, then
    there is no point in searching the subtree rooted
    at node N
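
A possible sketch of this pruning rule in Python follows. It is an illustration under assumptions: the region RN is tracked explicitly as a tuple `(xlo, ylo, xhi, yhi)` while descending, the left child takes the side with the smaller coordinate, and the names `circle_hits` and `range_query` are made up for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

INF = math.inf


@dataclass
class Node:
    info: str
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def circle_hits(region, xc, yc, r):
    """True if the circle (xc, yc, r) intersects the rectangle `region`."""
    xlo, ylo, xhi, yhi = region
    dx = max(xlo - xc, 0.0, xc - xhi)  # horizontal gap between center and region
    dy = max(ylo - yc, 0.0, yc - yhi)  # vertical gap between center and region
    return dx * dx + dy * dy <= r * r


def range_query(t, xc, yc, r, level=0, region=(-INF, -INF, INF, INF)):
    """Return the info fields of all points within distance r of (xc, yc)."""
    if t is None or not circle_hits(region, xc, yc, r):
        return []  # empty subtree, or RN misses the circle: prune
    found = []
    if (t.xval - xc) ** 2 + (t.yval - yc) ** 2 <= r * r:
        found.append(t.info)
    xlo, ylo, xhi, yhi = region
    if level % 2 == 0:  # node splits its region with a vertical line
        left, right = (xlo, ylo, t.xval, yhi), (t.xval, ylo, xhi, yhi)
    else:               # node splits its region with a horizontal line
        left, right = (xlo, ylo, xhi, t.yval), (xlo, t.yval, xhi, yhi)
    found += range_query(t.llink, xc, yc, r, level + 1, left)
    found += range_query(t.rlink, xc, yc, r, level + 1, right)
    return found
```

The region bounds are treated as closed on both sides, which can only cause an extra visit, never a missed point.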

26
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • z-ordering
  • R-trees

27
Point Quadtrees
  • Represent point data
  • Always split regions into 4 parts
  • 2-d tree: a node N splits a region into two by
    drawing one line through the point (N.xval,
    N.yval)
  • Point quadtree: a node N splits a region by
    drawing a horizontal and a vertical line through
    the point (N.xval, N.yval)
  • Four parts: NW, SW, NE, and SE quadrants
  • Q: Quadtree nodes have 4 children?

28
Point Quadtrees
  • Nodes in point quadtrees represent regions

29
Point quadtrees - Insertion
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
Splitting of region by Tuzla
30
Point Quadtrees - Insertion
32
Point quadtrees Deletion
  • Deletion of point (x,y) from T (let N be the node
    holding (x,y))
  • If N is a leaf node: easy
  • Otherwise, a subtree (N.NW, N.SW, N.NE, N.SE) is
    non-empty
  • Find a candidate replacement node R in one of
    the subtrees such that
  • Every other node R1 in N.NW is to the NW of R
  • Every other node R2 in N.SW is to the SW of R
  • etc.
  • Replace all of N's non-link fields by those of R
  • Recursively delete R from Ti (the subtree that
    contains R)
  • In general, it may not always be possible to find
    such a replacement node
  • Q: What happens in the worst case? A: May require
    all nodes to be reinserted

33
Point quadtrees Range Searches
  • Each node in a point quadtree represents a region
  • Do not search regions that do not intersect the
    circle defined by the query

34
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

35
MX-Quadtrees
  • Drawbacks of 2-d trees, point quadtrees:
  • shape of tree depends upon the order in which
    objects are inserted into the tree
  • splits may be uneven depending upon where the
    point (N.xval, N.yval) is located inside the
    region (represented by N)
  • MX-quadtrees: shape (and height) of tree
    independent of number of nodes and order of
    insertion

36
MX-Quadtrees
  • Assumption: the map is represented as a grid of
    size (2^k x 2^k) for some k
  • When a region gets split, it splits down the
    middle
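
As a rough illustration of "split down the middle", the sketch below inserts a grid cell into an MX-quadtree stored as nested Python dicts. The dict representation, the quadrant keys "NW"/"NE"/"SW"/"SE", the choice of (0, 0) as the south-west corner, and the name `mx_insert` are all assumptions for the example. Every point ends up exactly k levels below the root, whatever the insertion order.

```python
def mx_insert(root, k, x, y, info):
    """Insert cell (x, y) of a 2^k x 2^k grid into the nested-dict tree `root`."""
    node = root
    x0, y0, size = 0, 0, 2 ** k  # current region: south-west corner and side length
    while size > 1:
        half = size // 2         # regions always split down the middle
        east = x >= x0 + half
        north = y >= y0 + half
        quad = ("N" if north else "S") + ("E" if east else "W")
        if east:
            x0 += half
        if north:
            y0 += half
        size = half
        if size == 1:
            node[quad] = info    # leaf level: store the point itself
        else:
            node = node.setdefault(quad, {})
    return root
```

Usage: `mx_insert({}, 2, 0, 3, "A")` places A two levels down on a 4x4 grid.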

37
MX-Quadtrees - Insertion
After insertion of A, B, C, and D respectively
39
MX-Quadtrees - Deletion
  • Fairly easy: why?
  • All points are represented at the leaf level
  • Total time for deletion: O(k)

40
MX-Quadtrees Range Queries
  • Same as in point quadtrees
  • One difference
  • Checking to see if a point is in the circle
    defined by the range query needs to be performed
    at the leaf level (points are stored at the leaf
    level)

41
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

42
z-ordering
  • Q: how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)
  • Hint: reduce the problem to 1-d points (!!)
  • Q1: why?
  • A:
  • Q2: how?

43
z-ordering
  • Q: how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)
  • Hint: reduce the problem to 1-d points (!!)
  • Q1: why?
  • A: B-trees!
  • Q2: how?

44
z-ordering
  • Q2: how?
  • A: assume finite granularity; z-ordering =
    bit-shuffling = N-trees = Morton keys =
    geo-coding ...

45
z-ordering
  • Q2: how?
  • A: assume finite granularity (e.g., 2^32 x 2^32;
    4x4 here)
  • Q2.1: how to map n-d cells to 1-d cells?

46
z-ordering
  • Q2.1: how to map n-d cells to 1-d cells?

47
z-ordering
  • Q2.1: how to map n-d cells to 1-d cells?
  • A: row-wise
  • Q: is it good?

48
z-ordering
  • Q: is it good?
  • A: great for x axis; bad for y axis

49
z-ordering
  • Q: How about the snake curve?

50
z-ordering
  • Q: How about the snake curve?
  • A: still problems

[Figure: snake curve on a 2^32 x 2^32 grid]
51
z-ordering
  • Q: Why are those curves bad?
  • A: no distance preservation (~ clustering)
  • Q: solution?

[Figure: 2^32 x 2^32 grid]
52
z-ordering
  • Q: solution? (w/ good clustering, and easy to
    compute, for 2-d and n-d?)

53
z-ordering
  • Q: solution? (w/ good clustering, and easy to
    compute, for 2-d and n-d?)
  • A: z-ordering/bit-shuffling/linear-quadtrees
  • looks better:
  • few long jumps
  • scoops out the whole quadrant
    before leaving it
  • a.k.a. space filling curves

54
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

55
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A1: z (or N) shapes, RECURSIVELY

[Figure: z-curves of order-1, order-2, ..., order (n+1)]
56
z-ordering
  • Notice:
  • self similar (we'll see about fractals, soon)
  • method is hard to use: z =? f(x,y)

[Figure: z-curves of order-1 and order-2]
57
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

Method 2?
58
z-ordering
  • bit-shuffling

[Figure: 4x4 grid; x and y coordinates labeled with 2-bit values 00, 01, 10, 11]
59
z-ordering
  • bit-shuffling

How about the reverse: (x,y) = g(z)?
[Figure: 4x4 grid; x and y coordinates labeled with 2-bit values 00, 01, 10, 11]
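
Bit-shuffling and its reverse can be sketched as below. This is an illustration, not the slides' own code: the function names `z_value` and `z_inverse` are made up, and the sketch assumes that each x bit is placed before the corresponding y bit, most significant bits first. Other conventions (y first) give a mirrored but equally valid curve.

```python
def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first) into one z-value."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def z_inverse(z, bits):
    """Recover (x, y) from a z-value produced by z_value."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)  # odd positions hold x bits
        y = (y << 1) | ((z >> (2 * i)) & 1)      # even positions hold y bits
    return x, y
```

With 2 bits per coordinate, x = 11 and y = 10 in binary interleave to 1110, i.e., `z_value(3, 2, 2)` is 14. The same loop extends to n-d spaces by cycling through n coordinates instead of two.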
60
z-ordering
  • bit-shuffling

How about n-d spaces?
[Figure: 4x4 grid; x and y coordinates labeled with 2-bit values 00, 01, 10, 11]
61
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

Method 3?
62
z-ordering
  • linear-quadtrees: assign N -> 1, S -> 0, etc.

[Figure: grid halves labeled W = 0, E = 1 and N = 1, S = 0]
63
z-ordering
  • ... and repeat recursively. E.g., z(gray cell) =
    WN;WN = (0101)_2 = 5

[Figure: 4x4 grid with a gray cell; halves labeled W = 0, E = 1 and N = 1, S = 0; quadrants labeled 00 and 11]
64
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?

[Figure: 4x4 grid with a grey cell; halves labeled W = 0, E = 1 and N = 1, S = 0]
65
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?

method1: 14
method2: shuffle(11, 10) = (1110)_2 = 14
66
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?

method1: 14
method2: shuffle(11, 10) = (1110)_2 = 14
method3: EN;ES = ... = 14
67
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees; algorithms (range, knn queries
    ...)
  • non-point (e.g., region) data
  • analysis; variations
  • R-trees

68
z-ordering - usage algos
  • Q1: How to store on disk?
  • A:
  • Q2: How to answer range queries etc.

69
z-ordering - usage algos
  • Q1: How to store on disk?
  • A: treat z-value as primary key; feed to B-tree

[Figure: map with cities PGH and SF]
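
To make "treat z-value as primary key" concrete, here is a small sketch in which a sorted Python list with `bisect` stands in for the B-tree. The stand-in, the class name `ZIndex`, and the grid coordinates used for PGH and SF in the usage note are assumptions for illustration; a real system would hand the z-value to its existing B-tree index.

```python
import bisect


def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first, an assumed convention)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


class ZIndex:
    """Sorted list of (z-value, info) pairs, standing in for a B-tree."""

    def __init__(self, bits):
        self.bits = bits
        self.keys = []  # z-values, kept sorted
        self.vals = []  # info stored under each key

    def insert(self, x, y, info):
        z = z_value(x, y, self.bits)
        i = bisect.bisect_left(self.keys, z)
        self.keys.insert(i, z)
        self.vals.insert(i, info)

    def lookup(self, x, y):
        """Point query: return the info stored at cell (x, y), or None."""
        z = z_value(x, y, self.bits)
        i = bisect.bisect_left(self.keys, z)
        if i < len(self.keys) and self.keys[i] == z:
            return self.vals[i]
        return None
```

Usage (hypothetical cells): `idx = ZIndex(2); idx.insert(3, 2, "PGH"); idx.insert(0, 3, "SF")`, after which `idx.lookup(3, 2)` finds PGH through its 1-d key.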
70
z-ordering - usage algos
  • MAJOR ADVANTAGES w/ B-tree:
  • already inside commercial systems (no coding /
    debugging!)
  • concurrency and recovery are ready

71
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees; algorithms (range, knn queries
    ...)
  • non-point (e.g., region) data
  • analysis; variations
  • R-trees

72
z-ordering - variations
  • Q: is z-ordering the best we can do?

73
z-ordering - variations
  • Q: is z-ordering the best we can do?
  • A: probably not - occasional long jumps
  • Q: then?

74
z-ordering - variations
  • Q: is z-ordering the best we can do?
  • A: probably not - occasional long jumps
  • Q: then? A1: Gray codes

75
z-ordering - variations
  • A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)

76
z-ordering - variations
  • Looks better (never long jumps). How to derive
    it?

77
z-ordering - variations
  • Looks better (never long jumps). How to derive
    it?

[Figure: Hilbert curves of order-1, order-2, ..., order (n+1)]
78
z-ordering - variations
  • Q: function for the Hilbert curve (h = f(x,y))?
  • A: bit-shuffling, followed by post-processing,
    to account for rotations. Linear in the number of bits.
  • See textbook, for pointers to
    code/algorithms (e.g., [Jagadish, 90])
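
One widely circulated iterative formulation of h = f(x,y) is sketched below. It is not the code from [Jagadish, 90]; the names `hilbert_d` and `_rot` and the curve orientation (starting at cell (0, 0)) are assumptions of this example. It processes one bit of x and y per step and rotates/flips the remaining coordinates, so the work is linear in the number of bits.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip the quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_d(n, x, y):
    """Hilbert value of cell (x, y) on an n x n grid; n must be a power of 2."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d
```

Sorting all cells of a grid by `hilbert_d` visits them so that consecutive cells are always grid neighbors, which is the "never long jumps" property.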

79
z-ordering - variations
  • Q: how about Hilbert curve in 3-d? n-d?
  • A: Exists (and is not unique!). E.g., 3-d, order-1
    Hilbert curves (Hamiltonian paths on cube)

[Figure: two order-1 Hilbert curves on a cube, labeled 1 and 2]
80
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees; algorithms (range, knn queries
    ...)
  • non-point (e.g., region) data
  • analysis; variations
  • R-trees
  • ...

81
z-ordering - analysis
  • Q: How many pieces (quad-tree blocks) per
    region?
  • A: proportional to perimeter (surface etc.)

82
z-ordering - analysis
  • (How long is the coastline, say, of England?
  • Paradox: The answer changes with the yard-stick
    -> fractals ...)

83
z-ordering - analysis
  • Q: Should we decompose a region to full detail
    (and store in B-tree)?

84
z-ordering - analysis
  • Q: Should we decompose a region to full detail
    (and store in B-tree)?
  • A: NO! approximation with 1-3 pieces/z-values is
    best [Orenstein90]

85
z-ordering - analysis
  • Q: how to measure the goodness of a curve?

86
z-ordering - analysis
  • Q: how to measure the goodness of a curve?
  • A: e.g., avg. # of runs, for range queries

[Figure: two curves over a range query, one giving 4 runs and one giving 3 runs]
(# of runs ~ # of disk accesses on B-tree)
87
z-ordering - analysis
  • Q: So, is Hilbert really better?
  • A: 27% fewer runs, for 2-d (similar for 3-d)
  • Q: are there formulas for # of runs, # of quadtree
    blocks etc.?
  • A: Yes ([Jagadish; Moon+ etc.], see textbook)

88
z-ordering - fun observations
  • Hilbert and z-ordering curves: space filling
    curves: eventually, they visit every point
    in n-d space - therefore

89
z-ordering - fun observations
  • ... they show that the plane has as many points
    as a line (-> headaches for 1900's
    mathematics/topology). (fractals, again!)

90
z-ordering - fun observations
  • Observation 2: Hilbert (like) curve for video
    encoding [Y. Matias+, CRYPTO 87]
  • Given a frame, visit its pixels in randomized
    Hilbert order; compress; and transmit

91
z-ordering - fun observations
  • In general, Hilbert curve is great for preserving
    distances, clustering, vector quantization etc

92
Conclusions
  • z-ordering is a great idea (n-d points -> 1-d
    points; feed to B-trees)
  • used by TIGER system and (most probably) by other
    GIS products
  • works great with low-dim points

93
SAMs Detailed Outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

94
SAMs - more detailed outline
  • R-trees
  • main idea; file structure
  • (algorithms: insertion/split)
  • (deletion)
  • (search: range, nn, spatial joins)
  • variations (packed; Hilbert; ...)

95
R-trees
  • z-ordering cuts regions to pieces -> dup. elim.
  • how could we avoid that?
  • Idea: Minimum Bounding Rectangles

96
R-trees
  • [Guttman 84] Main idea: allow parents to overlap!
  • -> guaranteed 50% utilization
  • -> easier insertion/split algorithms.
  • (only deal with Minimum Bounding Rectangles -
    MBRs)

97
R-trees
  • e.g., w/ fanout 4: group nearby rectangles to
    parent MBRs; each group -> disk page

[Figure: data rectangles A-J]
98
R-trees
  • e.g., w/ fanout 4:

[Figure: rectangles A-J grouped under parent MBRs P1-P4]
100
R-trees - format of nodes
  • (MBR; obj-ptr) for leaf nodes

[Figure: leaf entry layout: x-low, x-high, y-low, y-high, ..., obj ptr]
101
R-trees - format of nodes
  • (MBR; node-ptr) for non-leaf nodes

102
R-trees - range search?
[Figure: rectangles A-J grouped under parent MBRs P1-P4]
104
R-trees - range search
  • Observations
  • every parent node completely covers its
    children
  • a child MBR may be covered by more than one
    parent - it is stored under ONLY ONE of them.
    (i.e., no need for dup. elim.)
  • a point query may follow multiple branches.
  • everything works for any dimensionality

105
SAMs - more detailed outline
  • R-trees
  • main idea; file structure
  • algorithms: insertion/split
  • deletion
  • search: range, nn, spatial joins
  • performance analysis
  • variations (packed; Hilbert; ...)

106
R-trees - insertion
  • e.g., rectangle X

[Figure: rectangles A-J under parent MBRs P1-P4, plus new rectangle X]
107
R-trees - insertion
  • e.g., rectangle X

[Figure: same layout after inserting X]
108
R-trees - insertion
  • e.g., rectangle Y

[Figure: rectangles A-J under parent MBRs P1-P4, plus new rectangle Y]
109
R-trees - insertion
  • e.g., rectangle Y: extend suitable parent.

[Figure: same layout after inserting Y]
110
R-trees - insertion
  • e.g., rectangle Y: extend suitable parent.
  • Q: how to measure suitability?

111
R-trees - insertion
  • e.g., rectangle Y: extend suitable parent.
  • Q: how to measure suitability?
  • A: by increase in area (volume) (more details
    later, under performance analysis)
  • Q: what if there is no room? how to split?
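
The "increase in area" criterion might look as follows in Python. This is a sketch under assumptions: rectangles are tuples `(xlo, ylo, xhi, yhi)`, the helper names are invented, and ties are broken in favor of the smaller parent MBR, which is one common choice rather than something stated on the slide.

```python
def area(r):
    xlo, ylo, xhi, yhi = r
    return (xhi - xlo) * (yhi - ylo)


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr, r):
    """How much the area of mbr grows if it is extended to cover r."""
    return area(union(mbr, r)) - area(mbr)


def choose_parent(parents, r):
    """parents: dict name -> MBR. Pick the parent needing least enlargement."""
    return min(parents,
               key=lambda name: (enlargement(parents[name], r), area(parents[name])))
```

Usage (hypothetical MBRs): with `parents = {"P1": (0, 0, 4, 4), "P2": (6, 0, 10, 4)}`, a rectangle lying just left of P2 is routed to P2 because P2 grows less than P1 would.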

112
R-trees - insertion
  • e.g., rectangle W

[Figure: rectangles A-K under parent MBRs P1-P4, plus new rectangle W]
113
R-trees - insertion
  • e.g., rectangle W - focus on P1 - how to
    split?

[Figure: parent MBR P1 with rectangles A, B, C, K and new rectangle W]
114
R-trees - insertion
  • e.g., rectangle W - focus on P1 - how to
    split?
  • (A1: plane sweep,
    until 50% of rectangles)
  • A2: linear split
  • A3: quadratic split
  • A4: exponential split

[Figure: parent MBR P1 with rectangles A, B, C, K and new rectangle W]
115
R-trees - insertion split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed

[Figure: rectangles with seed1 marked]
116
R-trees - insertion split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed
  • Q: how to measure closeness?

117
R-trees - insertion split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed
  • Q: how to measure closeness?
  • A: by increase of area (volume)

120
R-trees - insertion split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed
  • smart idea: pre-sort rectangles according to
    delta of closeness (i.e., schedule easiest choices
    first!)
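
A seed-based split along these lines can be sketched as follows. It is a simplified illustration in the spirit of a quadratic split, not a faithful copy of any published algorithm: seeds are the pair that would waste the most area if kept together, the next rectangle handled is the one whose preference between the two groups is clearest, and `min_fill` (an assumed parameter) forces leftovers into a group that would otherwise stay too small.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def split(rects, min_fill):
    """Split a list of rectangles into two groups; return (groups, mbrs)."""
    # Seeds: the pair whose common MBR wastes the most area.
    s1, s2 = max(combinations(range(len(rects)), 2),
                 key=lambda p: area(union(rects[p[0]], rects[p[1]]))
                 - area(rects[p[0]]) - area(rects[p[1]]))
    groups = [[rects[s1]], [rects[s2]]]
    mbrs = [rects[s1], rects[s2]]
    rest = [r for i, r in enumerate(rects) if i not in (s1, s2)]

    def growth(r, g):
        # Closeness of r to group g: increase of area of the group's MBR.
        return area(union(mbrs[g], r)) - area(mbrs[g])

    while rest:
        # A group that needs every remaining rectangle to reach min_fill gets them all.
        for g in (0, 1):
            if rest and len(groups[g]) + len(rest) == min_fill:
                for r in rest:
                    groups[g].append(r)
                    mbrs[g] = union(mbrs[g], r)
                rest = []
        if not rest:
            break
        # Easiest choice first: largest difference in closeness to the two seeds.
        r = max(rest, key=lambda q: abs(growth(q, 0) - growth(q, 1)))
        g = 0 if growth(r, 0) <= growth(r, 1) else 1
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
        rest.remove(r)
    return groups, mbrs
```

Re-evaluating the "delta of closeness" after each assignment costs more than a single pre-sort, but keeps the deltas consistent with the growing MBRs.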

121
R-trees - insertion - pseudocode
  • decide which parent to put new rectangle into
    (closest parent)
  • if overflow, split to two, using (say,) the
    quadratic split algorithm
  • propagate the split upwards, if necessary
  • update the MBRs of the affected parents.

122
R-trees - insertion - observations
  • many more split algorithms exist (next!)

123
SAMs - more detailed outline
  • R-trees
  • main idea; file structure
  • algorithms: insertion/split
  • deletion
  • search: range, nn, spatial joins
  • performance analysis
  • variations (packed; Hilbert; ...)

124
R-trees - deletion
  • delete rectangle
  • if underflow:
  • ??

125
R-trees - deletion
  • delete rectangle
  • if underflow:
  • temporarily delete all siblings (!);
  • delete the parent node; and
  • re-insert them

126
SAMs - more detailed outline
  • R-trees
  • main idea; file structure
  • algorithms: insertion/split
  • deletion
  • search: range, nn, spatial joins
  • performance analysis
  • variations (packed; Hilbert; ...)

127
R-trees - range search
  • pseudocode:
  • check the root
  • for each branch,
  • if its MBR intersects the query rectangle,
  • apply range-search (or print out, if this
    is a leaf)
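
The pseudocode above translates almost line by line into Python. The node layout used here, a dict with a `"leaf"` flag and a list of `(MBR, child-or-object)` entries, and the function names are assumptions for the sketch; MBRs are tuples `(xlo, ylo, xhi, yhi)`.

```python
def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_search(node, query):
    """Return all objects whose MBR intersects the query rectangle."""
    hits = []
    for mbr, child in node["entries"]:
        if intersects(mbr, query):
            if node["leaf"]:
                hits.append(child)                       # leaf: report the object
            else:
                hits.extend(range_search(child, query))  # descend this branch
    return hits
```

Because parent MBRs may overlap, several branches can qualify at the same level, so the search may follow more than one path.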

128
R-trees - nn search
skip
129
R-trees - nn search
skip
  • Q: How? (find near neighbor; refine...)

130
R-trees - nn search
skip
  • A1: depth-first search; then, range query

[Figure: rectangles A-J under parent MBRs P1-P4, with query point q]
133
R-trees - nn search
skip
  • A2: [Roussopoulos+, SIGMOD 95]
  • priority queue, with promising MBRs, and their
    best and worst-case distance
  • main idea:

134
R-trees - nn search
skip
consider only P2 and P4, for illustration
[Figure: query point q]
135
R-trees - nn search
skip
[Figure: q, P2, P4 and rectangles D, E, H, J; best of P4 vs. worst of P2: P4 is useless for 1-nn]
136
R-trees - nn search
skip
  • what is really the worst of, say, P2?

[Figure: q, P2 and rectangles D, E, with the worst of P2 marked]
137
R-trees - nn search
skip
  • what is really the worst of, say, P2?
  • A: the smallest of the two red segments!

[Figure: q and P2, with two red segments]
138
R-trees - nn search
skip
  • variations: [Hjaltason & Samet] incremental nn:
  • build a priority queue
  • scan enough of the tree, to make sure you have
    the k nn
  • to find the (k+1)-th, check the queue, and scan
    some more of the tree
  • optimal (but, may need too much memory)
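
A priority-queue search in this spirit can be sketched as below. It is an illustration, not the published algorithm: nodes are dicts with a `"leaf"` flag and `(MBR, child-or-object)` entries (an assumed layout), objects are treated as their MBRs, and only the best-case distance (`mindist`, an invented helper name) is used to order the queue.

```python
import heapq
import itertools
import math


def mindist(q, r):
    """Smallest possible distance from point q to rectangle r (0 if q is inside)."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def nearest(root, q, k=1):
    """Return up to k (distance, object) pairs, closest first."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), False, root)]
    out = []
    while heap and len(out) < k:
        d, _, is_obj, item = heapq.heappop(heap)
        if is_obj:
            out.append((d, item))  # nothing left in the queue can be closer
            continue
        for mbr, child in item["entries"]:
            # Children of a leaf are objects; children of other nodes are nodes.
            heapq.heappush(heap, (mindist(q, mbr), next(tie), item["leaf"], child))
    return out
```

Asking for one more neighbor simply resumes popping from the queue, which is what makes the scheme incremental; the queue itself is what may grow large in memory.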

139
SAMs - more detailed outline
skip
  • R-trees
  • main idea; file structure
  • algorithms: insertion/split
  • deletion
  • search: range, nn, spatial joins
  • performance analysis
  • variations (packed; Hilbert; ...)

140
R-trees - spatial joins
skip
  • Spatial joins: find (quickly) all
    counties intersecting lakes

141
R-trees - spatial joins
skip
  • Assume that they are both organized in R-trees

142
R-trees - spatial joins
skip
  • for each parent P1 of tree T1
  • for each parent P2 of tree T2
  • if their MBRs intersect,
  • process them recursively (i.e., check their
    children)
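
The nested loop above can be sketched recursively in Python. The sketch assumes both trees have the same height and uses the same hypothetical node layout as a dict with a `"leaf"` flag and `(MBR, child-or-object)` entries; the county and lake names in the usage note are invented.

```python
def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(n1, n2):
    """Return all (obj1, obj2) pairs whose MBRs intersect.

    Assumes the two trees have equal height, so leaves are reached together.
    """
    pairs = []
    for mbr1, c1 in n1["entries"]:
        for mbr2, c2 in n2["entries"]:
            if intersects(mbr1, mbr2):
                if n1["leaf"] and n2["leaf"]:
                    pairs.append((c1, c2))
                else:
                    pairs.extend(spatial_join(c1, c2))  # check their children
    return pairs
```

Usage: build one tree over county MBRs and one over lake MBRs, then call `spatial_join(counties_root, lakes_root)`. The result lists candidate pairs by MBR only; exact geometry would still need to be checked.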

143
R-trees - spatial joins
skip
  • Improvements - variations
  • - [Seeger+, SIGMOD 92]: do some pre-filtering; do
    plane-sweeping to avoid N1 * N2 tests for
    intersection
  • - [Lo & Ravishankar, SIGMOD 94]: seeded R-trees
  • (FYI, many more papers on spatial joins, without
    R-trees: [Koudas & Sevcik], etc.)

144
SAMs - more detailed outline
  • R-trees
  • main idea; file structure
  • algorithms: insertion/split
  • deletion
  • search: range, nn, spatial joins
  • variations (packed; Hilbert; ...)

145
R-trees - variations
  • Guttman's R-trees sparked much follow-up work
  • can we do better splits?
  • what about static datasets (no ins/del/upd)?
  • what about other bounding shapes?

146
R-trees - variations
  • Guttman's R-trees sparked much follow-up work
  • can we do better splits?
  • i.e., defer splits?

147
R-trees - variations
  • A: R*-trees [Kriegel+, SIGMOD90]
  • defer splits, by forced-reinsert, i.e., instead
    of splitting, temporarily delete some entries,
    shrink overflowing MBR, and re-insert those
    entries
  • Which ones to re-insert?
  • How many?

148
R-trees - variations
  • A: R*-trees [Kriegel+, SIGMOD90]
  • defer splits, by forced-reinsert, i.e., instead
    of splitting, temporarily delete some entries,
    shrink overflowing MBR, and re-insert those
    entries
  • Which ones to re-insert?
  • How many? A: 30%

149
R-trees - variations
  • Q: Other ways to defer splits?

150
R-trees - variations
  • Q: Other ways to defer splits?
  • A: Push a few keys to the closest sibling node
  • (closest = ??)

151
R-trees - variations
  • R*-trees: Also try to minimize area AND
    perimeter, in their split.
  • Performance: higher space utilization; faster
    than plain R-trees. One of the most successful
    R-tree variants.

152
R-trees - variations
  • Guttman's R-trees sparked much follow-up work
  • can we do better splits?
  • what about static datasets (no ins/del/upd)?
  • Hilbert R-trees
  • what about other bounding shapes?

153
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: Best way to pack points?

154
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: Best way to pack points?
  • A1: plane-sweep
  • great for queries on x;
  • terrible for y

155
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: Best way to pack points?
  • A1: plane-sweep
  • great for queries on x;
  • bad for y

156
R-trees - variations
  • what about static datasets (no ins/del/upd)?
  • Q: Best way to pack points?
  • A1: plane-sweep
  • great for queries on x;
  • terrible for y
  • Q: how to improve?

157
R-trees - variations
  • A: plane-sweep on HILBERT curve!

158
R-trees - variations
  • A: plane-sweep on HILBERT curve!
  • In fact, it can be made dynamic (how?), as well
    as to handle regions (how?)
  • A: [Kamel+, VLDB94]

159
R-trees - variations
  • Guttman's R-trees sparked much follow-up work
  • can we do better splits?
  • what about static datasets (no ins/del/upd)?
  • what about other bounding shapes?

160
R-trees - variations
  • what about other bounding shapes? (and why?)
  • A1: arbitrary-orientation lines (cell-tree,
    [Guenther])
  • A2: P-trees (polygon trees) (MB polygon: 0, 90,
    45, 135 degree lines)

161
R-trees - variations
  • A3: L-shapes; holes (hB-tree)
  • A4: TV-trees [Lin+, VLDB-Journal 1994]
  • A5: SR-trees [Katayama+, SIGMOD97] (used in
    Informedia)

162
R-trees - conclusions
  • Popular method; like multi-d B-trees
  • guaranteed utilization
  • good search times (for low-dim. at least)
  • R*-, Hilbert- and SR-trees: still used
  • IBM (Informix) ships DataBlade with R-trees

163
References
  • Guttman, A. (June 1984). R-Trees: A Dynamic Index
    Structure for Spatial Searching. Proc. ACM
    SIGMOD, Boston, Mass.
  • Jagadish, H. V. (May 23-25, 1990). Linear
    Clustering of Objects with Multiple Attributes.
    ACM SIGMOD Conf., Atlantic City, NJ.
  • Lin, K.-I., H. V. Jagadish, et al. (Oct. 1994).
    The TV-tree: An Index Structure for
    High-dimensional Data. VLDB Journal 3: 517-542.

164
References, cont'd
  • Pagel, B., H. Six, et al. (May 1993). Towards an
    Analysis of Range Query Performance. Proc. of ACM
    SIGACT-SIGMOD-SIGART Symposium on Principles of
    Database Systems (PODS), Washington, D.C.
  • Robinson, J. T. (1981). The k-D-B-Tree: A Search
    Structure for Large Multidimensional Dynamic
    Indexes. Proc. ACM SIGMOD.
  • Roussopoulos, N., S. Kelley, et al. (May 1995).
    Nearest Neighbor Queries. Proc. of ACM-SIGMOD,
    San Jose, CA.