The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree

1
  • Lars Arge¹, Mark de Berg², Herman Haverkort³ and Ke Yi¹
  • ¹Department of Computer Science, Duke University
  • ²Department of Computer Science, TU Eindhoven
  • ³Institute of Information and Computing Sciences, Utrecht University

2
Problem Definition
  • Input
    • N rectangles in the plane
    • Window query Q
  • Output
    • All rectangles intersecting Q
  • Applications
    • Spatial databases
    • GIS
    • CAD
    • Computer vision
    • Robotics
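As a reference point, the problem can be pinned down by a brute-force scan. The following is a minimal Python sketch. It assumes closed, axis-parallel rectangles stored as (xmin, ymin, xmax, ymax) tuples. The names `intersects` and `window_query_scan` are illustrative and do not come from the talk.

```python
from typing import Iterable, List, Tuple

# A closed, axis-parallel rectangle stored as (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def intersects(r: Rect, q: Rect) -> bool:
    """True iff r and q share at least one point (x- and y-intervals overlap)."""
    return r[0] <= q[2] and q[0] <= r[2] and r[1] <= q[3] and q[1] <= r[3]


def window_query_scan(rects: Iterable[Rect], q: Rect) -> List[Rect]:
    """Report all rectangles intersecting the window q by checking every one.

    On disk this reads all N/B blocks, regardless of the output size T.
    """
    return [r for r in rects if intersects(r, q)]


rects = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 0.5, 2.5, 2.5)]
print(window_query_scan(rects, (0.9, 0.9, 1.1, 1.1)))
# [(0, 0, 1, 1), (0.5, 0.5, 2.5, 2.5)]
```

An index such as an R-tree exists to avoid this linear cost when T is small.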

3
R-Tree
  • Definition [Guttman 84]
  • Advantages
    • Little redundancy
    • Multi-purpose
    • Easy to update

Fanout $\Theta(B)$, where B is the disk block size
[Figure: example R-tree with nodes labeled A to I, shown both in the plane and as a tree]
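A window query on an R-tree starts at the root and descends into every child whose bounding box intersects Q. The following is a minimal in-memory sketch of that descent. The names `Node`, `make_leaf`, `make_internal` and `query` are hypothetical. Fanout is not enforced here, whereas a disk-based R-tree would store each node in one block of size B.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(boxes: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


@dataclass
class Node:
    box: Rect                                             # MBR of everything below
    entries: List[Rect] = field(default_factory=list)     # data rectangles (leaf)
    children: List["Node"] = field(default_factory=list)  # subtrees (internal node)


def make_leaf(rects: List[Rect]) -> Node:
    return Node(mbr(rects), entries=list(rects))


def make_internal(children: List[Node]) -> Node:
    return Node(mbr([c.box for c in children]), children=list(children))


def query(node: Node, q: Rect, out: List[Rect]) -> int:
    """Append rectangles intersecting q to out; return the number of nodes visited."""
    visited = 1
    if node.children:
        for child in node.children:
            if intersects(child.box, q):  # prune subtrees whose MBR misses q
                visited += query(child, q, out)
    else:
        out.extend(r for r in node.entries if intersects(r, q))
    return visited


left = make_leaf([(0, 0, 1, 1), (1, 1, 2, 2)])
right = make_leaf([(10, 10, 11, 11), (11, 11, 12, 12)])
root = make_internal([left, right])
found: List[Rect] = []
print(query(root, (0.5, 0.5, 1.5, 1.5), found), found)
# 2 [(0, 0, 1, 1), (1, 1, 2, 2)]
```

The query enters every child whose MBR meets Q. If many bounding boxes overlap Q without containing answers, it can therefore visit many nodes while reporting nothing.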
4
How to Build an R-Tree
  • Repeated insertions
    • [Guttman 84]
    • R+-tree [Sellis et al. 87]
    • R*-tree [Beckmann et al. 90]
  • Bulkloading
    • Hilbert R-tree [Kamel and Faloutsos 94]
    • Top-down Greedy Split [Garcia et al. 98]
  • Advantages of bulkloading
    • Much faster than repeated insertions
    • Better space utilization
    • Usually produces higher-quality R-trees

5
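R-Tree Variant: Hilbert R-Tree
[Figure: Hilbert curve]
  • To build a Hilbert R-tree (cost $O((N/B)\log_{M/B} N)$ I/Os):
    • Sort the rectangles by the Hilbert values of their centers
    • Build a B-tree on top
  • 4D Hilbert R-tree: sort by the Hilbert values of the rectangles viewed as 4D points (xmin, ymin, xmax, ymax)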
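The leaf level of a 2D Hilbert R-tree can be sketched as "sort by Hilbert value, then pack B per leaf". This is a minimal sketch that assumes integer coordinates on an n x n grid with n a power of two. It uses the standard iterative (x, y) to Hilbert-distance conversion. The function names are hypothetical. The in-memory `sorted` call stands in for the external-memory sort that gives the I/O bound above. Packing the leaf MBRs the same way, level by level, would produce the upper levels.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # integer (xmin, ymin, xmax, ymax) in [0, n)


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along the Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, smaller quadrant is in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def hilbert_leaves(rects: List[Rect], n: int, B: int) -> List[Tuple[Rect, List[Rect]]]:
    """Sort by the Hilbert value of each center, then pack B rectangles per leaf."""
    def center_key(r: Rect) -> int:
        return hilbert_index(n, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2)

    ordered = sorted(rects, key=center_key)
    groups = [ordered[i:i + B] for i in range(0, len(ordered), B)]
    return [(mbr(g), g) for g in groups]
```

The 4D variant would apply a 4D Hilbert curve to (xmin, ymin, xmax, ymax) instead of the 2D center. That variant is not shown here.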

6
R-Tree Variant: TGS R-Tree (Top-down Greedy Split)
  • To build a TGS R-tree:
    • Start from the root and build the tree top-down
    • To build one node, use binary cuts until the desired fanout is reached
    • To make a binary cut, consider 4 orderings of the rectangles: xmin, ymin, xmax, ymax
    • In each ordering, consider the B cutting positions
    • Choose the cut that minimizes the sum of the areas of the two resulting bounding boxes
  • Typical bulk-load cost: $O((N/B)\log_2 N)$ I/Os
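A single TGS binary cut, as described above, can be sketched as follows. It assumes that cut positions are multiples of a hypothetical `step` parameter, for example the capacity of one subtree, and that ties keep the first ordering tried. The sketch only makes one cut. It does not show the recursion that repeats cuts until the desired fanout is reached.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def best_binary_cut(rects: List[Rect], step: int):
    """One TGS-style binary cut (sketch).

    Tries the 4 orderings (by xmin, ymin, xmax, ymax) and, in each, cut
    positions at multiples of `step`. Returns (cost, axis, left, right) for the
    cut minimizing area(mbr(left)) + area(mbr(right)), or None if no cut exists.
    """
    best: Optional[Tuple[float, int, List[Rect], List[Rect]]] = None
    for axis in range(4):
        order = sorted(rects, key=lambda r: r[axis])
        # Recomputing both MBRs per cut is quadratic; prefix/suffix MBRs make it linear.
        for cut in range(step, len(order), step):
            left, right = order[:cut], order[cut:]
            cost = area(mbr(left)) + area(mbr(right))
            if best is None or cost < best[0]:
                best = (cost, axis, left, right)
    return best
```

For two well-separated pairs of unit squares, the cheapest cut separates the pairs. Each side then has a bounding box of area 2, for a total cost of 4.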

7
Our Results
  • None of the existing R-tree variants has a worst-case query performance guarantee!
    • In the worst case, a query can visit all nodes in the tree even when the output size is zero
  • Priority R-Tree
    • The first R-tree variant that answers a query by visiting $O(\sqrt{N/B} + T/B)$ nodes in the worst case
    • T: output size
  • It is optimal!
    • There exists a dataset such that for any R-tree, there is an empty query that visits $\Omega(\sqrt{N/B})$ nodes [Kanth and Singh 99, Agarwal et al. 02]
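To give a sense of scale, take the hypothetical values N = 10^7 rectangles and B = 100 rectangles per block. These numbers are illustrative only and are not the experimental setup. Ignoring the constants hidden in the O-notation:

```latex
\frac{N}{B} = \frac{10^7}{10^2} = 10^5 \ \text{leaves},
\qquad
\sqrt{N/B} = \sqrt{10^5} \approx 316 .
```

An empty query on a PR-tree is then bounded by a constant times a few hundred node visits. An R-tree variant without a guarantee may touch all $10^5$ leaves.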

8
Roadmap
  • Pseudo-PR-Tree
    • Has the desired worst-case guarantee
    • Not a real R-tree
  • Transform a pseudo-PR-tree into a PR-tree
    • A real R-tree
    • Maintains the worst-case guarantee
  • Experiments
    • PR-tree
    • Hilbert R-tree (2D and 4D)
    • TGS R-tree

9
Building a Pseudo-PR-Tree
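[Figure: the root and its priority leaves]
Step 1: take out the B extreme rectangles in each direction (the B with the smallest xmin, then the smallest ymin, the largest xmax and the largest ymax) and put them into priority leaves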
10
Building a Pseudo-PR-Tree
Step 2: divide the remaining rectangles into two halves by their xmin coordinates and build the subtrees recursively. Divisions are performed on xmin, ymin, xmax, ymax in round-robin fashion, like a 4D kd-tree
[Figure: recursive division below the root]
Analysis sketch:
  • Nodes with at least one priority leaf completely reported: $O(T/B)$
  • Nodes with no priority leaf completely reported: $O(\sqrt{N/B})$
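The two construction steps can be sketched recursively. This is a minimal in-memory sketch, not the authors' implementation, and it rests on three assumptions. First, a set of at most B rectangles becomes a plain leaf. Second, each split is at the median of the current coordinate. Third, priority leaves are filled in the order xmin, ymin, xmax, ymax from whatever remains. The names `PseudoNode` and `build_pseudo_pr` are hypothetical. The MBRs that a query would need are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class PseudoNode:
    leaf: Optional[List[Rect]] = None  # set only for a plain leaf
    priority_leaves: List[List[Rect]] = field(default_factory=list)
    children: List["PseudoNode"] = field(default_factory=list)


# (coordinate index, take the largest values?) for the xmin, ymin, xmax, ymax directions
DIRECTIONS = ((0, False), (1, False), (2, True), (3, True))


def build_pseudo_pr(rects: List[Rect], B: int, depth: int = 0) -> PseudoNode:
    if len(rects) <= B:
        return PseudoNode(leaf=list(rects))
    node = PseudoNode()
    remaining = list(rects)
    # Step 1: peel off the B most extreme rectangles in each direction.
    for coord, largest in DIRECTIONS:
        if not remaining:
            break
        remaining.sort(key=lambda r: r[coord], reverse=largest)
        node.priority_leaves.append(remaining[:B])
        remaining = remaining[B:]
    # Step 2: split the rest in half on xmin, ymin, xmax, ymax in round-robin order.
    if remaining:
        coord = depth % 4
        remaining.sort(key=lambda r: r[coord])
        mid = len(remaining) // 2
        for part in (remaining[:mid], remaining[mid:]):
            if part:
                node.children.append(build_pseudo_pr(part, B, depth + 1))
    return node
```

Every rectangle ends up in exactly one leaf, either a priority leaf or a plain leaf. The 4D kd-tree shape comes from cycling `depth % 4` through the four coordinates.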
11
Pseudo-PR-Tree to a Real R-tree
12
Query Complexity Remains Unchanged
[Figure: nodes visited on the leaf level and on the next level]
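Here is one way to see why the query bound survives the transformation. Our reading of the construction is that each level above the leaves is formed from the leaves of a pseudo-PR-tree built on the bounding boxes of the level below. Let $V_i$ be the number of nodes visited on level $i$, with $i = 0$ the leaf level. The level-$i$ nodes visited are exactly the output of the query on the pseudo-PR-tree built for level $i+1$, whose input has $N/B^{i+1}$ boxes. This gives the following sketch, treating the hidden constants as small compared with B:

```latex
V_0 = O\!\left(\sqrt{N/B} + T/B\right),
\qquad
V_{i+1} = O\!\left(\sqrt{N/B^{\,i+2}} + V_i/B\right),

\sum_{i \ge 0} V_i
  = O\!\left(\sqrt{N/B}\,\sum_{i \ge 0} B^{-i/2} + \frac{T}{B}\sum_{i \ge 0} B^{-i}\right)
  = O\!\left(\sqrt{N/B} + T/B\right).
```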
13
PR-Tree: Bulkload and Updates
  • Bulkload
    • Improved from $O((N/B)\log_2 N)$ to $O((N/B)\log_{M/B} N)$ I/Os using the grid method [Agarwal et al. 01]
    • The same as the Hilbert R-tree, but with a larger constant
  • Updates
    • Can use any previous heuristic to update in $O(\log_B N)$ I/Os
      • Without the worst-case query guarantee
    • Use the logarithmic method
      • Insert: $O\left(\log_B N + \frac{1}{B}\log_{M/B} N \cdot \log_2(N/M)\right)$ I/Os
      • Delete: $O(\log_B N)$ I/Os
  • Extending to d dimensions
    • Query bound $O((N/B)^{1-1/d} + T/B)$, still optimal
    • Bulkload and update bounds remain the same
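The logarithmic method keeps static structures of geometrically increasing sizes. An insertion merges full structures and rebuilds one larger structure, in the same way as incrementing a binary counter. The following is a generic minimal sketch under several assumptions. A list produced by a caller-supplied `build` function stands in for a bulk-loaded PR-tree. An in-memory buffer of `buffer_size` items (hypothetical, playing the role of the memory of size M) absorbs insertions. Deletions and I/O accounting are not modeled, and all names are illustrative.

```python
from typing import Callable, Generic, List, Optional, TypeVar

Item = TypeVar("Item")


class LogarithmicMethod(Generic[Item]):
    """Dynamize a static structure with the logarithmic method (sketch).

    slots[i] is None or a static structure built from buffer_size * 2**i items,
    so the occupied slots mirror the binary digits of the number of flushes.
    """

    def __init__(self, build: Callable[[List[Item]], List[Item]], buffer_size: int = 1):
        self.build = build              # stand-in for bulk-loading a PR-tree
        self.buffer_size = buffer_size  # hypothetical in-memory buffer capacity
        self.buffer: List[Item] = []
        self.slots: List[Optional[List[Item]]] = []

    def insert(self, item: Item) -> None:
        self.buffer.append(item)
        if len(self.buffer) < self.buffer_size:
            return
        carry, self.buffer = self.buffer, []
        i = 0
        # Like a binary-counter increment: merge every full slot into the carry.
        while i < len(self.slots) and self.slots[i] is not None:
            carry = carry + self.slots[i]
            self.slots[i] = None
            i += 1
        if i == len(self.slots):
            self.slots.append(None)
        self.slots[i] = self.build(carry)  # rebuild one larger static structure

    def query(self, pred: Callable[[Item], bool]) -> List[Item]:
        """Query the buffer and every static structure, concatenating results."""
        out = [x for x in self.buffer if pred(x)]
        for s in self.slots:
            if s is not None:
                out.extend(x for x in s if pred(x))
        return out
```

Each item takes part in $O(\log_2(N/M))$ rebuilds, which is where the $\log_2(N/M)$ factor in the insertion bound comes from. A query must search all $O(\log_2(N/M))$ structures.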

14
Experiments
  • Implemented with TPIE
    • Priority R-tree
    • Hilbert R-tree
    • 4D Hilbert R-tree
    • TGS R-tree
  • Real-life data
    • TIGER datasets
    • 16 million rectangles
  • Synthetic data
    • Varying from normal to extreme data
    • 10 million rectangles

15
Experiments with Real-Life Data
  • Query performance on the TIGER datasets

[Figure: I/Os spent in answering a query, compared with T/B]
16
Experiments with Synthetic Data: SIZE
Each side of a rectangle is uniformly distributed in [0, max_side]
Queries are squares with area 1
17
Experiments with Synthetic Data: ASPECT
Fix the area, vary the aspect ratio
18
Experiments with Synthetic Data: SKEWED
Randomly place points, then apply $y \mapsto y^c$ to the y-coordinates
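The SIZE, ASPECT and SKEWED distributions above might be generated as follows. Only three rules come from the slides: side lengths uniform in [0, max_side], a fixed area with a varying aspect ratio, and $y \mapsto y^c$. The square data space of side `extent`, the uniform placement of centers, the axis-aligned orientation for ASPECT, and all function names are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def _centered(cx: float, cy: float, w: float, h: float) -> Rect:
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def size_data(n: int, max_side: float, rng: random.Random, extent: float = 1.0) -> List[Rect]:
    """SIZE: each side length uniform in [0, max_side]; centers uniform (assumption)."""
    return [_centered(rng.uniform(0, extent), rng.uniform(0, extent),
                      rng.uniform(0, max_side), rng.uniform(0, max_side))
            for _ in range(n)]


def aspect_data(n: int, area: float, aspect: float, rng: random.Random,
                extent: float = 1.0) -> List[Rect]:
    """ASPECT: fixed area, width/height = aspect (axis-aligned, assumption)."""
    w, h = math.sqrt(area * aspect), math.sqrt(area / aspect)
    return [_centered(rng.uniform(0, extent), rng.uniform(0, extent), w, h)
            for _ in range(n)]


def skewed_points(n: int, c: float, rng: random.Random) -> List[Rect]:
    """SKEWED: uniform points in the unit square, then y -> y**c (degenerate rectangles)."""
    pts = [(rng.random(), rng.random() ** c) for _ in range(n)]
    return [(x, y, x, y) for x, y in pts]
```

With c > 1, the map $y \mapsto y^c$ pushes points toward y = 0, and a larger c gives more skew.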
19
Experiments with Synthetic Data: CLUSTER
20
Conclusions
  • In theory
    • The PR-tree is the first R-tree variant that answers a window query in $O(\sqrt{N/B} + T/B)$ I/Os in the worst case, which is optimal
  • In practice
    • Performs roughly the same as the previous best R-trees on real-life and relatively nicely distributed data
    • Outperforms them significantly on more extreme data
  • Future work
    • How the previous update heuristics may affect the performance of the PR-tree in the dynamic case

21
Lower Bound Construction
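  • Each bounding box intersects at least $\Omega(\sqrt{B})$ queries
  • $N/B$ bounding boxes
  • $O(\sqrt{N})$ queries
  • There exists a query that intersects at least $\Omega\left(\frac{(N/B)\sqrt{B}}{\sqrt{N}}\right) = \Omega(\sqrt{N/B})$ bounding boxes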

22
Pseudo-PR-Tree Query Complexity
  • Nodes v visited where all rectangles in at least one of the priority leaves of v's parent are reported: $O(T/B)$
  • Let v be a node visited where none of the priority leaves at its parent are reported completely; consider v's parent u

[Figure: Q in 2D and the corresponding query region in 4D; labeled constraints ymin ≤ ymax(Q) and xmax ≥ xmin(Q)]
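The figure's labels reflect a simple equivalence. Viewed as a 4D point (xmin, ymin, xmax, ymax), a rectangle intersects Q exactly when the point satisfies four one-sided constraints. Each constraint is bounded by a 3-dimensional hyperplane such as ymin = ymax(Q), which is what the kd-tree argument that follows counts. This small sketch checks the equivalence against the 2D test. The function names are hypothetical.

```python
import random
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(r: Rect, q: Rect) -> bool:
    """2D test: the x- and y-intervals of r and q overlap."""
    return r[0] <= q[2] and q[0] <= r[2] and r[1] <= q[3] and q[1] <= r[3]


def in_4d_query_region(p: Rect, q: Rect) -> bool:
    """Same test with the rectangle read as the 4D point p = (xmin, ymin, xmax, ymax).

    One-sided constraint per axis:
    xmin <= xmax(Q), ymin <= ymax(Q), xmax >= xmin(Q), ymax >= ymin(Q).
    """
    xmin, ymin, xmax, ymax = p
    return xmin <= q[2] and ymin <= q[3] and xmax >= q[0] and ymax >= q[1]


def random_rect(rng: random.Random) -> Rect:
    x1, x2 = sorted((rng.random(), rng.random()))
    y1, y2 = sorted((rng.random(), rng.random()))
    return (x1, y1, x2, y2)
```

This is why a 4D kd-tree on the rectangles' corner coordinates can serve as the pseudo-PR-tree's skeleton. A window query becomes an orthogonal range query on those 4D points.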
23
Pseudo-PR-Tree Query Complexity
  • The cell in the 4D kd-tree of u is intersected by two different 3-dimensional hyperplanes defined by Q
  • The intersection of each pair of such 3-dimensional hyperplanes is a 2-dimensional hyperplane
  • Lemma: the number of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyperplane is $O((N/B)^{f/d})$
  • So there are $O((N/B)^{2/4}) = O(\sqrt{N/B})$ such cells in a 4D kd-tree
  • Total number of nodes visited: $O(\sqrt{N/B} + T/B)$

24
Experiments with Real-Life Data
  • Datasets: TIGER/Line data
  • Bulk-loading