Spatial Database Query Processing using Indices

1
Spatial Database Query Processing using Indices
  • Donghui Zhang
  • CCIS Ph.D. Seminar
  • CCIS, Northeastern University

2
Content
  • Spatial Database
  • Selection Query
  • R-tree
  • k-d-B-tree
  • Aggregation Query
  • Nearest Neighbor Query

3
Spatial Database
  • Stores a set of spatial data.
  • E.g. hotels, cities, roads, ...
  • Objects have spatial properties!
  • Supports queries: selection query, nearest
    neighbor query, aggregation query, join query,
    closest-pair query, ...
  • Needs spatial indices to answer them efficiently.

4
Applications
  • Geographical Information Systems: e.g. a selection
    query retrieves the objects for a map.
  • Navigation Systems: e.g. a nearest neighbor query
    finds the nearest hospital.
  • Environmental Systems: e.g. an aggregation query
    finds the total precipitation.
  • etc.

5
Selection Query
(figure only)

6
Selection Query
  • Without an index, every object must be scanned.
  • Object volume may be large!
  • A traditional index structure (B-tree) does not
    work, since it clusters on only one dimension.
  • Many spatial index structures exist, e.g.:
  • R-tree family
  • k-d-B-tree family

7
Content
  • Spatial Database
  • Selection Query
  • R-tree
  • k-d-B-tree
  • Aggregation Query
  • Nearest Neighbor Query

8
R-tree [Guttman84]
  • Cluster close-by objects into data pages.
  • Cluster references to the data pages into index
    pages.
  • Recursively cluster until a single root page is left.
  • Store an MBR (Minimum Bounding Rectangle) along with
    each reference.
  • Rationale: if a query does not intersect an MBR,
    it does not intersect any object in the
    sub-tree.
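
The rationale above can be made concrete with a small sketch. It assumes rectangles are stored as (xmin, ymin, xmax, ymax) tuples; the helper names mbr and intersects and the sample data are illustrative, not taken from the slides.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed layout


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-aligned rectangle that encloses all given rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


page = [(1, 1, 2, 2), (3, 2, 4, 5), (2, 4, 3, 6)]  # objects in one data page
page_mbr = mbr(page)                               # (1, 1, 4, 6)
query = (5, 0, 7, 3)

# The query misses the MBR, so it cannot hit any object stored in the page.
can_skip_page = not intersects(query, page_mbr)
```

Because every object lies inside the page MBR, one failed intersection test rules out the whole page without reading it.
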

9
R-tree
(figure: example R-tree; labels R, I1, I2, A)

10
R-tree
(figure: example R-tree with root R, index pages I1 and I2, and data pages A to F)

11
R-tree: More Details
  • Balanced. Height typically ≤ 5.
  • External index: every disk page (index or data)
    has a fixed size, say 8 KB.
  • Except for the root, every page is at least half full.
  • An index page contains index entries of the
    form (MBR, child-page ref).
  • A selection query starts from the root and
    examines only the sub-trees whose MBRs intersect the
    query region.
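
A minimal in-memory sketch of this search follows. The Node class, the tuple layout of entries, and the sample tree are assumptions made for illustration; a real R-tree would read fixed-size pages from disk.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (MBR, object id); index entries are (MBR, child Node).
    entries: List[tuple] = field(default_factory=list)


def range_search(node: Node, query: Rect, out: List[str]) -> int:
    """Append ids of objects intersecting `query` to `out`; return pages visited."""
    visited = 1
    for rect, item in node.entries:
        if not intersects(rect, query):
            continue                      # prune: nothing below can match
        if node.is_leaf:
            out.append(item)
        else:
            visited += range_search(item, query, out)
    return visited


leaf_a = Node(True, [((1, 1, 2, 2), "a"), ((2, 3, 3, 4), "b")])
leaf_b = Node(True, [((8, 8, 9, 9), "c"), ((7, 6, 8, 7), "d")])
root = Node(False, [((1, 1, 3, 4), leaf_a), ((7, 6, 9, 9), leaf_b)])

hits: List[str] = []
pages = range_search(root, (0, 0, 2.5, 3.5), hits)
```

In this toy tree the second leaf is never visited, because its MBR does not intersect the query region.
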

12
R-tree: More Details
  • To insert an object, choose a sub-tree whose MBR
    already contains it.
  • If there is none, choose the sub-tree whose MBR needs
    the minimum area expansion.
  • If a page overflows, split it into two.
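
The sub-tree choice can be sketched as a single minimization, since a sub-tree that already contains the object needs zero expansion. The function name choose_subtree, the tie-break on smaller area, and the sample rectangles are assumptions for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(child_mbrs: List[Rect], obj: Rect) -> int:
    """Index of the child whose MBR needs the least area expansion to hold obj."""
    def cost(i: int) -> Tuple[float, float]:
        m = child_mbrs[i]
        # Primary key: area expansion. Secondary key (assumed): smaller area wins ties.
        return (area(union(m, obj)) - area(m), area(m))
    return min(range(len(child_mbrs)), key=cost)


children = [(0, 0, 4, 4), (5, 5, 6, 6), (10, 0, 12, 2)]
inside = choose_subtree(children, (1, 1, 2, 2))    # child 0 already covers it
outside = choose_subtree(children, (6, 6, 7, 7))   # child 1 grows the least
```
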

13
Best Variation: R*-tree [BKS90]
  • With dynamic insertion/deletion, the R-tree may
    end up with large MBRs.
  • When a page overflows, use forced reinsertion to
    shrink the area.
  • Some objects are picked, removed from the page, and re-inserted.
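
As a rough sketch of the picking step, the code below removes the k entries whose centers lie farthest from the center of the page MBR. The function name pick_for_reinsertion, the parameter k, and the data are hypothetical; the re-insertion itself is omitted.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsertion(rects: List[Rect], k: int) -> Tuple[List[int], List[int]]:
    """Return (removed, kept) index lists; removed holds the k farthest entries."""
    cx, cy = center(mbr(rects))

    def dist(i: int) -> float:
        x, y = center(rects[i])
        return math.hypot(x - cx, y - cy)

    by_distance = sorted(range(len(rects)), key=dist, reverse=True)
    removed = sorted(by_distance[:k])
    kept = [i for i in range(len(rects)) if i not in removed]
    return removed, kept


page = [(0, 0, 1, 1), (4, 4, 5, 5), (5, 4, 6, 5), (4, 5, 5, 6), (9, 9, 10, 10)]
removed, kept = pick_for_reinsertion(page, k=2)
shrunk = mbr([page[i] for i in kept])
```

In this hand-picked example the two corner objects are removed and the page MBR shrinks from (0, 0, 10, 10) to (4, 4, 6, 6); other data sets need not behave this well.
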

14
Our Improvement to the R*-tree
  • Fact 1: the R*-tree uses forced reinsertion to (1)
    shrink the area of the MBR and (2) make the MBR square-like.
  • Fact 2: upon page overflow, it picks the objects
    whose distances to the center are the largest.
  • Observation: the action in Fact 2 does not
    achieve the goal in Fact 1!

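15-17
(figure sequence: four objects a, b, c, d)

18
Quality
(figure: worked examples labeled Q1, Q2 and Q4 with values such as 0.25, 0.5 and 1; the defining formula, which involves a parameter in [0, 1], e.g. 0.5, did not survive extraction)

19
Gain
(figure only)
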
20
Our Solution
  • D. Zhang and T. Xia, A Novel Improvement to the
    R*-tree Spatial Index using Gain/Loss Metrics,
    ACM GIS, 2004.
  • Proposed algorithms to identify the set of
    objects whose removal brings the maximum
    gain.
  • Symmetrically, proposed the concept of loss: when
    no sub-tree can hold a new object, choose the one
    with the minimum loss!

21
Another Idea
  • Motivation: if an object is far from the other
    objects, any leaf page that contains it will have
    a large MBR.
  • Idea: store such objects at higher levels of the
    tree!
  • Status: rejected from ICDE. Will add a discussion of
    bulk-loading and resubmit.

22
Content
  • Spatial Database
  • Selection Query
  • R-tree
  • k-d-B-tree
  • Aggregation Query
  • Nearest Neighbor Query

23-28
k-d-B-tree
(figure sequence: a k-d-B-tree shown step by step, from a single page A, to pages A and B, to a root R over pages A, B and C)

29
Index Node Split
(figure: index node split over regions A to F)

30-32
Problem
(figure sequence: regions A to G)

33
Best Variation: hB-tree [LS90]
  • Idea: maintain the references in an index page as
    a binary tree (kd-tree).
  • Theorem: in a binary tree, it is always possible
    to identify a sub-tree that holds between 1/3 and
    2/3 of the leaf nodes.
  • Move the corresponding references into a new index
    page.
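
One way to see why such a sub-tree always exists: walk down from the root, always into the child with more leaves, and stop at the first node that holds at most 2/3 of the leaves. Its parent held more than 2/3 and it is the heavier child, so it holds more than 1/3. The sketch below assumes at least two leaves; the KD class and the tree shape are hypothetical and are not the tree from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KD:
    label: str
    left: Optional["KD"] = None
    right: Optional["KD"] = None


def leaves(t: KD) -> int:
    """Number of leaf nodes in the sub-tree rooted at t."""
    if t.left is None and t.right is None:
        return 1
    return sum(leaves(c) for c in (t.left, t.right) if c is not None)


def split_subtree(root: KD) -> KD:
    """Return a sub-tree holding more than 1/3 and at most 2/3 of the leaves."""
    n = leaves(root)
    if n < 2:
        raise ValueError("need at least two leaves to split")
    node = root
    while 3 * leaves(node) > 2 * n:
        children = [c for c in (node.left, node.right) if c is not None]
        node = max(children, key=leaves)   # descend into the heavier child
    return node


# Hypothetical kd-tree over seven regions A..G.
x2 = KD("x2", KD("B"), KD("C"))
y2 = KD("y2", KD("D"), KD("E"))
y1 = KD("y1", KD("F"), KD("G"))
x3 = KD("x3", y2, y1)
tree = KD("x1", KD("A"), KD("y3", x2, x3))

picked = split_subtree(tree)   # the node labeled "x3", with 4 of the 7 leaves
```
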

34
Best Variation: hB-tree [LS90]
(figure: an index page with regions A to G and its kd-tree with split values x1, x2, x3, y1, y2, y3)

35
Best Variation: hB-tree [LS90]
(figure: the kd-tree and regions A to G after the split; labels x1, x2, x3, y1, y2, y3)

36
Content
  • Spatial Database
  • Selection Query
  • R-tree
  • k-d-B-tree
  • Aggregation Query
  • Nearest Neighbor Query

37
Why Aggregation?
  • Aggregation: compute the total value over a
    subset of records which satisfy some selection
    condition (e.g. located in a region of
    interest).
  • An important operator for data mining, on-line
    query processing, data warehousing, etc.
  • Data volume is large. With aggregation, users can
    get a good summary quickly.

38
Point Aggregation
  • How many restaurants are in Boston?

39
Straightforward Approach
  • Index the objects using an R-tree [Guttman84].
  • Reduce to range search.
  • Optimize by storing aggregate information at
    internal nodes [LM01].
  • Nevertheless, query time is still O(n) in the worst case.
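
The optimization can be sketched as follows: each node keeps a count of the points below it, and a query adds that count directly whenever the node's MBR lies entirely inside the query box. The Node layout and names are assumptions for illustration, not the structure of [LM01].

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    mbr: Rect
    count: int                                   # stored aggregate: points below this node
    children: List["Node"] = field(default_factory=list)
    points: List[Tuple[float, float]] = field(default_factory=list)


def range_count(node: Node, q: Rect) -> int:
    if not intersects(node.mbr, q):
        return 0                                 # disjoint: contributes nothing
    if contains(q, node.mbr):
        return node.count                        # fully covered: use the aggregate
    if node.children:
        return sum(range_count(c, q) for c in node.children)
    return sum(1 for x, y in node.points
               if q[0] <= x <= q[2] and q[1] <= y <= q[3])


leaf1 = Node((1, 1, 2, 2), 2, points=[(1, 1), (2, 2)])
leaf2 = Node((5, 5, 6, 8), 2, points=[(5, 5), (6, 8)])
leaf3 = Node((9, 1, 10, 2), 3, points=[(9, 1), (10, 2), (9, 2)])
root = Node((1, 1, 10, 8), 7, children=[leaf1, leaf2, leaf3])

restaurants_in_region = range_count(root, (0, 0, 6, 6))
```

Nodes that only partially overlap the query must still be opened, which is why the worst case stays linear.
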

40
Challenge
Can we compute the aggregate faster?
  • Our approach: a specialized index; query time
    reduces to O(log_B^2(n)).
  • box-sum reduces to dominance-sum
  • BA-tree for dominance-sum
  • D. Zhang, V. J. Tsotras and D. Gunopulos,
    Efficient Aggregation over Objects with Extent,
    PODS, 2002.
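
For point objects, the reduction from box-sum to dominance-sum is plain inclusion-exclusion over the four corners of the box. The sketch below uses a brute-force dominance_sum as a stand-in for the index and assumes that "dominated" means x ≤ px and y ≤ py, so the box is half-open; the reduction for objects with extent in the cited paper is more involved.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, weight)


def dominance_sum(points: List[Point], px: float, py: float) -> float:
    """Total weight of points with x <= px and y <= py (brute-force stand-in)."""
    return sum(w for x, y, w in points if x <= px and y <= py)


def box_sum(points: List[Point], x1: float, y1: float, x2: float, y2: float) -> float:
    """Total weight of points with x1 < x <= x2 and y1 < y <= y2."""
    return (dominance_sum(points, x2, y2)
            - dominance_sum(points, x1, y2)
            - dominance_sum(points, x2, y1)
            + dominance_sum(points, x1, y1))


pts = [(1, 1, 5), (2, 3, 7), (4, 2, 6), (5, 5, 4), (3, 4, 2)]
total = box_sum(pts, 1, 1, 4, 4)   # 20 - 5 - 5 + 5 = 15
```

Four dominance-sum queries answer one box-sum query, so a fast dominance-sum index gives a fast box-sum.
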

41
Dominance-Sum
  • A set of weighted point objects
  • Given query point p, compute total weight of
    objects dominated by p (i.e. to the lower left of
    p).

42
Dominance-Sum
  • A set of weighted point objects
  • Given query point p, compute total weight of
    objects dominated by p (i.e. to the lower left of
    p).

(figure: example in which the dominance-sum is 18)

43
BA-tree (for dominance-sum)
  • 1-dimensional augmented B-tree
  • Along with each child pointer in an index node,
    store the total weight of points in the sub-tree
  • Query, update: O(log(n)).

Total value of objects whose keys < 100? Follow a
single path!
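
The single-path idea can be sketched with a plain binary search tree in which every node stores the total weight of its sub-tree. This is a simplified stand-in for the augmented B-tree: it is binary, in memory, and not balanced, so the O(log(n)) bound only holds when the tree happens to be balanced. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    weight: float
    total: float                      # weight of every point in this sub-tree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: float, weight: float) -> Node:
    """Insert a weighted key, updating sub-tree totals along the path."""
    if node is None:
        return Node(key, weight, weight)
    node.total += weight
    if key < node.key:
        node.left = insert(node.left, key, weight)
    elif key > node.key:
        node.right = insert(node.right, key, weight)
    else:
        node.weight += weight
    return node


def sum_below(node: Optional[Node], bound: float) -> float:
    """Total weight of keys < bound, visiting one node per level."""
    total = 0.0
    while node is not None:
        if bound <= node.key:
            node = node.left          # this node and its right side are >= bound
        else:
            # This node and its whole left sub-tree are < bound.
            total += node.weight + (node.left.total if node.left else 0.0)
            node = node.right
    return total


root = None
for k, w in [(50, 3), (20, 4), (120, 5), (100, 6), (70, 2), (10, 1)]:
    root = insert(root, k, w)

answer = sum_below(root, 100)   # keys 10, 20, 50, 70 -> 1 + 4 + 3 + 2 = 10
```
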
44
BA-tree (higher dimensions)
  • An augmented k-d-B-tree.
  • The k-d-B-tree [Robinson81]:
  • indexes point objects;
  • each index record corresponds to a rectangular
    region;
  • the region of a parent is fully partitioned by the
    regions of its children.

45
k-d-B-tree
(figure: root index page R and its child regions)
  • Compute the dominance-sum for point p by
    examining all children that intersect the
    rectangle [origin, p].
  • In this example: A, C, D, E, F, H.

46
BA-tree
  • Motivation for augmentation: examine a single
    child!
  • The rectangle [origin, p] can be divided into
    four parts, relative to F, the child region that
    contains p...

47
BA-tree
  • Part 1: the region dominated by F's lower-left corner

48
BA-tree
  • Part 2: the region to the left of F

49
BA-tree
  • Part 3: the region below F

50
BA-tree
  • Part 4: the intersection with F

51
BA-tree
  • Compute the total weight of points in these four
    regions separately and add them up!
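
The decomposition can be checked by brute force on a toy data set. The sketch below assumes p lies inside F and that "dominated" means x ≤ px and y ≤ py; the part names corner, left, below and inside, and the data, are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]            # (x, y, weight)
Rect = Tuple[float, float, float, float]      # (xmin, ymin, xmax, ymax)


def dominance_sum(points: List[Point], p: Tuple[float, float]) -> float:
    return sum(w for x, y, w in points if x <= p[0] and y <= p[1])


def four_parts(points: List[Point], f: Rect, p: Tuple[float, float]) -> Dict[str, float]:
    """Split the weight dominated by p into four parts relative to region F."""
    fx1, fy1, fx2, fy2 = f
    if not (fx1 <= p[0] <= fx2 and fy1 <= p[1] <= fy2):
        raise ValueError("p must lie inside F")
    parts = {"corner": 0, "left": 0, "below": 0, "inside": 0}
    for x, y, w in points:
        if x > p[0] or y > p[1]:
            continue                          # not dominated by p
        if x < fx1 and y < fy1:
            parts["corner"] += w              # dominated by F's lower-left corner
        elif x < fx1:
            parts["left"] += w                # to the left of F
        elif y < fy1:
            parts["below"] += w               # below F
        else:
            parts["inside"] += w              # intersection with F
    return parts


pts = [(1, 1, 4), (2, 6, 3), (6, 2, 5), (6, 6, 2), (7, 7, 1), (9, 9, 8), (1, 8, 6)]
F = (5, 5, 10, 10)
p = (8, 8)
parts = four_parts(pts, F, p)
```

The four parts are disjoint and together cover the rectangle [origin, p], so their sum equals the dominance-sum of p.
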

52
BA-tree
  • Total weight of objects in the region dominated by
    F's lower-left corner: a single value (independent
    of where p is). Augment F with this value (called
    subtotal).

53
BA-tree
  • Total weight of objects in the region to the left of F:
    computed via a 1-dimensional BA-tree (called y-border)
    over the y values of all objects to the left of F.

54
BA-tree
  • Total weight of objects in the region below F:
    computed via a 1-dimensional BA-tree (called x-border)
    over the x values of all objects below F.

55
BA-tree
  • For the intersection with F, examine the sub-tree rooted at F.
  • Only one child! Thus a single path from root to
    leaf.

56-57
BA-tree
  • Insertion: besides the k-d-B-tree insertion (into the
    sub-tree of C), update the subtotal (of F, G), the
    x-border (of B) and the y-border (of H).

58
Content
  • Spatial Database
  • Selection Query
  • R-tree
  • k-d-B-tree
  • Aggregation Query
  • Nearest Neighbor Query

59
Nearest Neighbor Query
  • Given a query location q, find the object
    closest to q.

(figure: query point q among the objects) The nearest neighbor is e.

60
NN Query using R-tree
  • Maintain a priority queue of references and
    objects, sorted by their distance to q.
  • Initially, insert a reference to the root.
  • At every step, pop the entry closest to q.
  • If a reference is popped, fetch the page and insert its
    children.
  • If an object is popped, stop: it is the nearest neighbor.
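
The steps above can be sketched with Python's heapq. A reference is ordered by the minimum distance from q to its MBR (computed by the mindist helper below), and an object by its distance to q. The Node class, the point objects stored as degenerate rectangles, and the sample tree are assumptions for illustration.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: Tuple[float, float], r: Rect) -> float:
    """Smallest distance from point q to any point of rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


@dataclass
class Node:
    # Each entry is (MBR, child Node) in an index page or (MBR, object id) in a data page.
    entries: List[Tuple[Rect, Union["Node", str]]]


def nearest_neighbor(root: Node, q: Tuple[float, float]) -> Optional[Tuple[str, float, int]]:
    """Return (object id, distance, pages read), or None for an empty tree."""
    tie = itertools.count()               # tie-breaker so heap never compares Nodes
    heap = [(0.0, next(tie), root)]
    pages_read = 0
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, str):         # an object was popped: stop
            return item, dist, pages_read
        pages_read += 1                   # a reference was popped: read the page
        for rect, child in item.entries:
            heapq.heappush(heap, (mindist(q, rect), next(tie), child))
    return None


leaf_a = Node([((1, 1, 1, 1), "a"), ((2, 3, 2, 3), "b"), ((4, 4, 4, 4), "e")])
leaf_b = Node([((8, 8, 8, 8), "c"), ((7, 6, 7, 6), "d")])
root = Node([((1, 1, 4, 4), leaf_a), ((7, 6, 8, 8), leaf_b)])

result = nearest_neighbor(root, (5, 5))
```

Stopping at the first popped object is safe because every entry still in the queue is at least as far from q as that object. Here the second data page is never read.
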

61
MinDist
(figure: MinDist; labels a, b)

62-66
R-tree
(figure sequence: NN search traced on an example R-tree with root R, index pages I1 and I2, data pages A to F, and objects a to h; the priority queue PQ is shown at each step)
  • The search finishes when e is popped!

67
Optimization using MinExistDist
(figure: page B with MinExistDist(B))
  • MinExistDist(B) guarantees that there exists an object
    in B within this distance.
  • If MinExistDist(B) ≤ MinDist(A), then A can be
    pruned!
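
The slides do not define how MinExistDist is computed. One classical bound with the same kind of guarantee is MINMAXDIST from the R-tree nearest-neighbor literature, which assumes every face of an MBR touches at least one object. The sketch below uses it as a stand-in, with squared distances, 2-D rectangles, and hypothetical sample data.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Pt = Tuple[float, float]


def mindist_sq(q: Pt, r: Rect) -> float:
    """Squared distance from q to the closest point of r."""
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return dx * dx + dy * dy


def minmaxdist_sq(q: Pt, r: Rect) -> float:
    """Squared MINMAXDIST: some object in r is guaranteed within this distance."""
    lo, hi = (r[0], r[1]), (r[2], r[3])
    mid = [(lo[i] + hi[i]) / 2 for i in range(2)]
    near = [lo[i] if q[i] <= mid[i] else hi[i] for i in range(2)]  # nearer face
    far = [lo[i] if q[i] >= mid[i] else hi[i] for i in range(2)]   # farther face
    far_total = sum((q[i] - far[i]) ** 2 for i in range(2))
    # For each axis: nearest face on that axis, farthest corner on the other axis.
    return min(far_total - (q[i] - far[i]) ** 2 + (q[i] - near[i]) ** 2
               for i in range(2))


q = (0, 0)
B = (1, 1, 3, 2)
A_far = (4, 0, 5, 1)
A_near = (2, 0, 3, 0.5)

bound = minmaxdist_sq(q, B)                      # 5
prune_far = bound <= mindist_sq(q, A_far)        # 5 <= 16: A_far can be pruned
prune_near = bound <= mindist_sq(q, A_near)      # 5 <= 4 fails: A_near must be kept
```
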

68
Summary
  • Spatial Databases
  • Three queries: selection query, aggregation
    query, nearest neighbor query.
  • Two indices: R-tree, k-d-B-tree.
  • A lot more interesting research! E.g. finding the
    fastest path on a road network, spatial data
    mining, spatio-temporal, ...

Thank you!