Title: Spatial Database Query Processing using Indices
1Spatial Database Query Processing using Indices
- Donghui Zhang
- CCIS Ph.D. Seminar
- CCIS, Northeastern University
2Content
- Spatial Database
- Selection Query
- R-tree
- k-d-B-tree
- Aggregation Query
- Nearest Neighbor Query
3Spatial Database
- Stores a set of spatial data.
- E.g. hotels, cities, roads, ...
- Objects have spatial properties!
- Supports queries: selection query, nearest neighbor query, aggregation query, join query, closest-pair query, ...
- Need to build spatial indices.
4Applications
- Geographical Information Systems: e.g. a selection query retrieves the objects to show on a map.
- Navigation Systems: e.g. a nearest neighbor query finds the nearest hospital.
- Environmental Systems: e.g. an aggregation query finds the total precipitation.
- etc.
5Selection Query
6Selection Query
- If no index, scan through the objects.
- Object volume may be large!
- A traditional index structure (B-tree) does not work, since it clusters on only one dimension.
- Tons of spatial index structures exist.
- R-tree family
- k-d-B-tree family
7Content
- Spatial Database
- Selection Query
- R-tree
- k-d-B-tree
- Aggregation Query
- Nearest Neighbor Query
8R-tree [Guttman84]
- Cluster close-by objects into data pages.
- Cluster references to the data pages into index pages.
- Recursively cluster until a single root is left.
- Store an MBR (Minimum Bounding Rectangle) along with each reference.
- Rationale: if a query does not intersect an MBR, it does not intersect any object in the sub-tree.
9R-tree
[Figure: R-tree example; labels R, I1, I2, A]
10R-tree
[Figure: R-tree with root R, index pages I1 and I2, and data pages A through F]
11R-tree More Details
- Balanced. Height typically ≤ 5.
- External index: every disk page (index or data) has a fixed size, say 8KB.
- Except for the root, every page is at least half full.
- An index page contains index entries of the form (MBR, child-page ref).
- A selection query starts from the root and examines only the sub-trees whose MBRs intersect the query region.
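The selection query described above can be sketched in a few lines. This is a minimal in-memory illustration, not the original implementation: the `Node` class, the tuple layout of a rectangle, and the sample data are assumptions made for the example, and disk pages are modeled as plain Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Data page: (MBR, object id). Index page: (MBR, child Node).
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def range_search(node: Node, query: Rect) -> List[Any]:
    """Collect ids of objects whose MBR intersects the query rectangle."""
    found = []
    for mbr, item in node.entries:
        if not intersects(mbr, query):
            continue  # prune: nothing below this entry can intersect the query
        if node.is_leaf:
            found.append(item)
        else:
            found.extend(range_search(item, query))
    return found


# Tiny hand-built tree (hypothetical data, not taken from the slides).
page_1 = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
page_2 = Node(True, [((8, 8, 9, 9), "c"), ((6, 7, 7, 8), "d")])
root = Node(False, [((0, 0, 3, 3), page_1), ((6, 7, 9, 9), page_2)])

print(range_search(root, (2.5, 2.5, 6.5, 7.5)))  # ['b', 'd']
```

A query that misses both top-level MBRs returns immediately without touching any data page, which is the whole point of storing MBRs next to the references.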
12R-tree More Details
- To insert an object, choose a sub-tree whose MBR can hold it.
- If no sub-tree can hold it, choose the sub-tree with minimum area expansion.
- If a page overflows, split it into two.
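The sub-tree choice during insertion might look roughly like the sketch below. The function names and the rectangle layout are assumptions for illustration. The slides do not say how to break ties, so picking the smallest MBR among candidates is also an assumption.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def choose_subtree(child_mbrs: List[Rect], obj: Rect) -> int:
    """Return the index of the child entry that should receive obj."""
    holding = [i for i, m in enumerate(child_mbrs) if contains(m, obj)]
    if holding:
        # Assumption: among MBRs that already hold obj, prefer the smallest.
        return min(holding, key=lambda i: area(child_mbrs[i]))

    def expansion(i: int) -> float:
        return area(union(child_mbrs[i], obj)) - area(child_mbrs[i])

    # Otherwise pick the minimum area expansion (ties: smaller MBR, an assumption).
    return min(range(len(child_mbrs)), key=lambda i: (expansion(i), area(child_mbrs[i])))


mbrs = [(0, 0, 4, 4), (5, 5, 6, 6)]
print(choose_subtree(mbrs, (1, 1, 2, 2)))  # 0: the first MBR already holds the object
print(choose_subtree(mbrs, (6, 6, 7, 7)))  # 1: expanding the second MBR costs less area
```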
13Best Variation: R*-tree [BKS90]
- With dynamic insertion/deletion, the R-tree may end up with large MBRs.
- When a page overflows, use forced reinsertion to shrink the area.
- Some objects are picked and re-inserted.
14Our Improvement to R*-tree
- Fact 1: the R*-tree uses forced reinsertion to (1) shrink the area of the MBR and (2) make the MBR square-like.
- Fact 2: upon page overflow, it picks the objects whose distances to the center are the largest.
- Observation: the action in Fact 2 does not achieve the goal in Fact 1!
15[Figure: objects a, b, c, d]
16[Figure: objects a, b, c, d]
17[Figure: objects a, b, c, d]
18Quality
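[Formula and figure not recoverable from the extraction: a quality measure with a parameter in [0,1] (e.g. 0.5), illustrated with the values 0.25, 0.5, 1 and the labels Q1, Q2, Q4]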
19Gain
20 Our Solution
- D. Zhang and T. Xia, A Novel Improvement to the R*-tree Spatial Index using Gain/Loss Metrics, ACM GIS, 2004.
- Proposed algorithms to identify the set of objects which, if removed, brings the maximum gain.
- Symmetrically, proposed the concept of loss. When no sub-tree can hold a new object, choose the one with minimum loss!
21Another Idea
- Motivation: if an object is far from the other objects, any leaf page that contains it will have a large MBR.
- Idea: store such objects at higher levels of the tree!
- Status: rejected from ICDE. Will discuss bulk-loading and re-submit.
22Content
- Spatial Database
- Selection Query
- R-tree
- k-d-B-tree
- Aggregation Query
- Nearest Neighbor Query
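23k-d-B-tree
[Figure: page A]
24k-d-B-tree
[Figure: page A]
25k-d-B-tree
[Figure: pages A, B]
26k-d-B-tree
[Figure: root R; pages A, B]
27k-d-B-tree
[Figure: root R; pages A, B, C]
28k-d-B-tree
[Figure: root R; pages A, B, C]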
29Index Node Split
[Figure: index node split; regions A through F]
30Problem
[Figure: regions A through G]
31Problem
[Figure: regions A through G]
32Problem
[Figure: regions A through G]
33Best Variation: hB-tree [LS90]
- Idea: maintain the references in an index page as a binary tree (kd-tree).
- Theorem: in a binary tree, it is always possible to identify a sub-tree containing between 1/3 and 2/3 of the leaf nodes.
- Make the corresponding references a new index page.
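The theorem has a short constructive argument: start at the root and keep moving to the child with more leaves until the current sub-tree holds at most 2/3 of the leaves. Since the larger child always holds at least half of its parent's leaves, the sub-tree found also holds more than 1/3. The sketch below illustrates this under two assumptions: every internal node has exactly two children, and the tree has at least two leaves. The `KD` class and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KD:
    label: str = ""
    left: Optional["KD"] = None
    right: Optional["KD"] = None


def count_leaves(t: KD) -> int:
    if t.left is None and t.right is None:
        return 1
    return count_leaves(t.left) + count_leaves(t.right)


def find_split_subtree(root: KD) -> Tuple[KD, int]:
    """Return a sub-tree holding between 1/3 and 2/3 of all leaves, with its size."""
    n = count_leaves(root)
    node, size = root, n
    while 3 * size > 2 * n:  # still more than 2/3 of the leaves: descend
        l, r = count_leaves(node.left), count_leaves(node.right)
        node, size = (node.left, l) if l >= r else (node.right, r)
    return node, size


# A maximally skewed tree with 7 leaves A..G (hypothetical example).
tree = KD("G")
for name in "FEDCBA":
    tree = KD("", KD(name), tree)

subtree, size = find_split_subtree(tree)
print(size, count_leaves(tree))  # 4 7
```

Recounting leaves at every step keeps the sketch short; a real index page would presumably cache the counts.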
34Best Variation: hB-tree [LS90]
[Figure: kd-tree inside an index page; split lines x1, x2, x3, y1, y2, y3 and regions A through G]
35Best Variation: hB-tree [LS90]
[Figure: kd-tree example, continued; split lines x1, x2, x3, y1, y2, y3 and regions A through G]
36Content
- Spatial Database
- Selection Query
- R-tree
- k-d-B-tree
- Aggregation Query
- Nearest Neighbor Query
37Why Aggregation?
- Aggregation: compute the total value over a subset of records which satisfy some selection condition (e.g. located in an interesting region).
- An important operator for data mining, on-line query processing, data warehousing, etc.
- Data volume is large. With aggregation, the user can get a good summary quickly.
38Point Aggregation
- How many restaurants are in Boston?
39Straightforward Approach
- Index the objects using an R-tree [Guttman84].
- Reduce to range search.
- Optimize by storing aggregate information at internal nodes [LM01].
- Nevertheless, worst-case query time is still O(n).
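The optimization of storing aggregates at internal nodes can be illustrated as follows. This is only a sketch in the spirit of that idea, not the method of the cited work: the `AggNode` layout, the helper names, and the data are assumptions. When a child's MBR lies entirely inside the query, its stored subtotal is used and the sub-tree is not visited. When the MBR only partially overlaps, the search still has to descend, which is why the worst case stays linear.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contained(inner: Rect, outer: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class AggNode:
    is_leaf: bool
    # Leaf: ((x, y), weight). Index: (MBR, subtotal of the sub-tree, child AggNode).
    entries: list


def range_sum(node: AggNode, q: Rect) -> float:
    total = 0
    if node.is_leaf:
        for (x, y), w in node.entries:
            if q[0] <= x <= q[2] and q[1] <= y <= q[3]:
                total += w
        return total
    for mbr, subtotal, child in node.entries:
        if contained(mbr, q):
            total += subtotal              # whole sub-tree qualifies: no descent
        elif intersects(mbr, q):
            total += range_sum(child, q)   # partial overlap: must look inside
    return total


# Hypothetical data: two leaves with precomputed subtotals.
leaf_1 = AggNode(True, [((1, 1), 2), ((2, 3), 3)])
leaf_2 = AggNode(True, [((6, 6), 4), ((8, 9), 1)])
root = AggNode(False, [((1, 1, 2, 3), 5, leaf_1), ((6, 6, 8, 9), 5, leaf_2)])

print(range_sum(root, (0, 0, 7, 7)))  # 9
```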
40Challenge
Can we compute the aggregate faster?
- Our approach: a specialized index; query time reduces to O(log_B^2 n).
- box-sum reduces to dominance-sum
- BA-tree for dominance-sum
- D. Zhang, V. J. Tsotras and D. Gunopulos, Efficient Aggregation over Objects with Extent, PODS, 2002.
41Dominance-Sum
- A set of weighted point objects
- Given query point p, compute total weight of
objects dominated by p (i.e. to the lower left of
p).
42Dominance-Sum
- A set of weighted point objects
- Given query point p, compute total weight of
objects dominated by p (i.e. to the lower left of
p).
dominance-sum = 18
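A brute-force version of the dominance-sum makes the definition concrete, and also shows how a box-sum over point objects can be expressed with four dominance-sums by inclusion-exclusion. This sketch covers point objects only (the cited paper deals with objects with extent), and the boundary convention is an assumption: a point on the border of the lower-left quadrant counts as dominated, so the box is half-open, (x1, x2] by (y1, y2]. The data is made up.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominance_sum(points: List[Tuple[Point, float]], p: Point) -> float:
    """Total weight of points (x, y) with x <= p.x and y <= p.y (assumed convention)."""
    return sum(w for (x, y), w in points if x <= p[0] and y <= p[1])


def box_sum(points, x1, y1, x2, y2) -> float:
    """Total weight in the half-open box (x1, x2] x (y1, y2] via four dominance-sums."""
    return (dominance_sum(points, (x2, y2))
            - dominance_sum(points, (x1, y2))
            - dominance_sum(points, (x2, y1))
            + dominance_sum(points, (x1, y1)))


# Hypothetical weighted points.
pts = [((1, 1), 3), ((2, 4), 5), ((4, 2), 2), ((5, 5), 7), ((3, 3), 1)]

print(dominance_sum(pts, (4, 4)))  # 11
print(box_sum(pts, 1, 1, 4, 4))    # 8
```

The scan here is linear in the number of points; the point of a specialized index is to answer each of the four dominance-sums much faster.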
43BA-tree (for dominance-sum)
- 1-dimensional augmented B-tree.
- Along with each child pointer in an index node, store the total weight of points in the sub-tree.
- Query, update: O(log(n)).
Total value of objects whose keys < 100? Follow a single path!
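The single-path query can be sketched with a small static multiway tree. This is a simplified stand-in for an augmented B-tree, not the BA-tree itself: it is bulk-built once from sorted data, it does not support updates, and the class names, `fanout`, and sample data are all assumptions. Each index node stores, per child, the total weight below that child. A query adds the stored totals of the children entirely below the query key and descends into only one child.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, items):
        self.items = items                       # sorted list of (key, weight)
        self.min_key = items[0][0]
        self.total = sum(w for _, w in items)


class Index:
    def __init__(self, children):
        self.children = children
        self.min_keys = [c.min_key for c in children]
        self.sums = [c.total for c in children]  # the augmentation
        self.min_key = self.min_keys[0]
        self.total = sum(self.sums)


def build(items, fanout=4):
    """Bulk-build a static tree from a non-empty list of (key, weight) pairs."""
    items = sorted(items)
    level = [Leaf(items[i:i + fanout]) for i in range(0, len(items), fanout)]
    while len(level) > 1:
        level = [Index(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def sum_below(node, q):
    """Total weight of entries with key < q, following one root-to-leaf path."""
    total = 0
    while isinstance(node, Index):
        k = bisect_left(node.min_keys, q)  # children 0..k-1 start below q
        if k == 0:
            return total
        total += sum(node.sums[:k - 1])    # these children lie entirely below q
        node = node.children[k - 1]        # only this child can straddle q
    return total + sum(w for key, w in node.items if key < q)


# Hypothetical data: keys 0, 10, ..., 190 with weights 1, 2, ..., 20.
data = [(k, k // 10 + 1) for k in range(0, 200, 10)]
tree = build(data)
print(sum_below(tree, 100))  # 55
```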
44BA-tree (higher dimensions)
- Indexes point objects.
- Each index record corresponds to a rectangular region.
- The region of a parent is fully partitioned by the regions of its children.
45k-d-B-tree
R
- Compute the dominance-sum regarding point p by examining all children that intersect the rectangle [origin, p].
- In this example: A, C, D, E, F, H.
46BA-tree
R
- Motivation for augmentation: examine a single child!
- The rectangle [origin, p] can be divided into four parts...
47BA-tree
R
- dominated by F's lower-left corner
48BA-tree
R
49BA-tree
R
50BA-tree
R
51BA-tree
R
- Compute the total weight of points in these four
regions separately and add them up!
52BA-tree
R
- Total weight of objects in this region: a single value (independent of where p is). Augment F with this value (called subtotal).
53BA-tree
R
- Total weight of objects in this region: computed via a 1-dimensional BA-tree (called y-border) over the y values of all objects to the left of F.
54BA-tree
R
- Total weight of objects in this region: computed via a 1-dimensional BA-tree (called x-border) over the x values of all objects below F.
55BA-tree
R
- For this part, examine the sub-tree rooted at F.
- Only one child is examined at each level, thus a single path from root to leaf.
56BA-tree
R
- Insertion: besides the k-d-B-tree insertion (into the sub-tree of C), update the subtotal (of F, G), the x-border (of B) and the y-border (of H).
58Content
- Spatial Database
- Selection Query
- R-tree
- k-d-B-tree
- Aggregation Query
- Nearest Neighbor Query
59Nearest Neighbor Query
- Given a query location q, find the object closest to q.
The nearest neighbor is e.
60NN Query using R-tree
- Maintain a priority queue of references and objects, sorted by distance to q.
- Initially, insert a reference to the root.
- At every step, pop the entry closest to q.
- If a reference is popped, fetch the page and insert its children.
- If an object is popped, stop: it is the nearest neighbor.
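The steps above can be sketched with Python's `heapq` as the priority queue. The `Node` class, the rectangle layout, and the data are assumptions for illustration. Objects are taken to be points stored as degenerate rectangles, so the distance from q to an object's rectangle is its exact distance. For page references, the distance used is the minimum distance from q to the MBR.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def mindist(q: Tuple[float, float], r: Rect) -> float:
    """Smallest distance from point q to any point of rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


@dataclass
class Node:
    is_leaf: bool
    # Data page: (point as degenerate Rect, object id). Index page: (MBR, child Node).
    entries: List[Tuple[Rect, Any]]


def nearest(root: Node, q: Tuple[float, float]):
    """Best-first search: returns (object id, distance), or None for an empty tree."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(0.0, next(tie), False, root)]  # (distance, tie, is_object, item)
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            return item, dist  # nothing left in the queue can be closer
        for mbr, child in item.entries:
            heapq.heappush(heap, (mindist(q, mbr), next(tie), item.is_leaf, child))
    return None


# Hypothetical point data.
page_1 = Node(True, [((1, 1, 1, 1), "a"), ((2, 3, 2, 3), "b")])
page_2 = Node(True, [((6, 6, 6, 6), "c"), ((8, 9, 8, 9), "d")])
root = Node(False, [((1, 1, 2, 3), page_1), ((6, 6, 8, 9), page_2)])

print(nearest(root, (5, 5))[0])  # c
```

In this example the search never opens `page_1`: its MBR is farther from q than object c, so c is popped first and the search stops.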
61MinDist
[Figure: MinDist; labels a, b]
62R-tree
[Figure: R-tree (root R, index pages I1 and I2, data pages A through F) over objects a through h]
63R-tree
[Figure: same R-tree and objects; additional labels I2, I1, F, D, E]
64R-tree
[Figure: same R-tree and objects; additional labels I2, I1, B, F, D, C, E, A]
65R-tree
[Figure: same R-tree and objects, with the priority queue PQ; labels F, d, c, D, b, C, a, E, A]
...... Finish if e is popped!
67Optimization using MinExistDist
[Figure: MinExistDist(B) for MBR B]
- MinExistDist(B) guarantees that there exists an object in B within this distance.
- If MinExistDist(B) ≤ MinDist(A), then A can be pruned!
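The slides do not define MinExistDist, so the sketch below substitutes a well-known bound with the same kind of guarantee: MINMAXDIST from the nearest neighbor search literature on R-trees. Whether it coincides with the MinExistDist used here is an assumption. It relies on B being a minimum bounding rectangle, so that every face of B touches at least one object. The pruning test uses ≤, which is safe when any one nearest neighbor suffices. Function names and data are hypothetical.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def mindist(q: Tuple[float, float], r: Rect) -> float:
    """Smallest distance from point q to any point of rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def minmaxdist(q: Tuple[float, float], r: Rect) -> float:
    """Upper bound on the distance from q to the closest object inside MBR r."""
    lo, hi = (r[0], r[1]), (r[2], r[3])
    mid = [(lo[i] + hi[i]) / 2 for i in range(2)]
    near = [lo[i] if q[i] <= mid[i] else hi[i] for i in range(2)]  # nearer face
    far = [lo[i] if q[i] >= mid[i] else hi[i] for i in range(2)]   # farther face
    best = math.inf
    for k in range(2):
        # Nearer face in dimension k, farthest point along that face.
        s = (q[k] - near[k]) ** 2 + sum((q[i] - far[i]) ** 2 for i in range(2) if i != k)
        best = min(best, s)
    return math.sqrt(best)


def can_prune(q, mbr_a: Rect, mbr_b: Rect) -> bool:
    """A cannot beat the object guaranteed to exist in B."""
    return minmaxdist(q, mbr_b) <= mindist(q, mbr_a)


q = (0.0, 0.0)
b = (1.0, 1.0, 3.0, 2.0)
print(can_prune(q, (4.0, 0.0, 5.0, 1.0), b))  # True: A is at least 4 away
print(can_prune(q, (1.0, 0.0, 2.0, 0.5), b))  # False: A may hold a closer object
```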
68Summary
- Spatial Databases
- Three queries: selection query, aggregation query, nearest neighbor query.
- Two indices: R-tree, k-d-B-tree.
- A lot more interesting research! E.g. finding the fastest path on a road network, spatial data mining, spatio-temporal, ......
Thank you!