The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree presentation


1
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree

Lars Arge (1), Mark de Berg (2), Herman Haverkort (3) and Ke Yi (1)
(1) Department of Computer Science, Duke University
(2) Department of Computer Science, TU Eindhoven
(3) Institute of Information and Computing Sciences, Utrecht University

2
Problem Definition

Input
N rectangles in the plane
Window query Q
Output
All rectangles intersecting Q
Applications
Spatial databases
GIS
CAD
Computer vision
Robotics
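
The window query defined on this slide can be pinned down with a brute-force baseline. This is only a sketch: it assumes rectangles are stored as (xmin, ymin, xmax, ymax) tuples with closed boundaries, and the names `intersects` and `window_query` are illustrative, not from the talk.

```python
from typing import List, Tuple

# Assumed representation: (xmin, ymin, xmax, ymax), boundaries included.
Rect = Tuple[float, float, float, float]


def intersects(r: Rect, q: Rect) -> bool:
    """True if rectangles r and q share at least one point."""
    return r[0] <= q[2] and r[2] >= q[0] and r[1] <= q[3] and r[3] >= q[1]


def window_query(rects: List[Rect], q: Rect) -> List[Rect]:
    """Report all rectangles intersecting the query window q by a linear scan."""
    return [r for r in rects if intersects(r, q)]
```

A linear scan reads all N rectangles for every query; the R-tree variants on the following slides try to touch far fewer blocks.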

3
R-Tree

Definition [Guttman 84]
Advantages
Little redundancy
Multi-purpose
Easy to update

Fanout Θ(B), where B is the disk block size
[Figure: example R-tree over rectangles A through I]
4
How to Build an R-Tree

Repeated insertions
[Guttman 84]
R+-tree [Sellis et al. 87]
R*-tree [Beckmann et al. 90]
Bulkloading
Hilbert R-tree [Kamel and Faloutsos 94]
Top-down Greedy Split [Garcia et al. 98]
Advantages of bulkloading
Much faster than repeated insertions
Better space utilization
Usually produces R-trees of higher quality

5
R-Tree Variant: Hilbert R-Tree
Hilbert Curve

To build a Hilbert R-tree (cost O((N/B) log_{M/B} N) I/Os)
Sort the rectangles by the Hilbert values of their centers
Build a B-tree on top
4D Hilbert R-tree: sort by the 4D Hilbert values of the points (xmin, ymin, xmax, ymax) instead
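
The two build steps for the 2D variant can be sketched in memory as follows. This is not the I/O-efficient external-memory version from the talk: it sorts with Python's built-in sort, snaps centers to an assumed 2^order grid, and packs B entries per node bottom-up. The names `hilbert_index` and `hilbert_pack` and the `order` parameter are illustrative assumptions.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def bbox(rects):
    """Minimal bounding box of (xmin, ymin, xmax, ymax) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def hilbert_pack(rects, B, order=16):
    """Sort by Hilbert value of the centers, then pack B entries per node.

    Returns the root as (bounding_box, children); input rectangles appear
    as (rect, None) entries at the bottom level.
    """
    assert B >= 2 and rects
    world = bbox(rects)
    n = 1 << order
    w = max(world[2] - world[0], 1e-12)
    h = max(world[3] - world[1], 1e-12)

    def key(r):
        # Snap the center to the integer grid before taking its Hilbert value.
        gx = int(((r[0] + r[2]) / 2 - world[0]) / w * (n - 1))
        gy = int(((r[1] + r[3]) / 2 - world[1]) / h * (n - 1))
        return hilbert_index(order, gx, gy)

    nodes = [(r, None) for r in sorted(rects, key=key)]
    while len(nodes) > 1:
        nodes = [(bbox([c[0] for c in nodes[i:i + B]]), nodes[i:i + B])
                 for i in range(0, len(nodes), B)]
    return nodes[0]
```

Packing consecutive runs of the sorted order plays the role of "build a B-tree on top": every node is full except possibly the last one on each level.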

6
R-Tree Variant: TGS R-Tree
(Top-down Greedy Split)

To build a TGS R-tree
Start from the root and build the tree top-down
To build one node, use binary cuts until the desired fan-out is reached
To make a binary cut, consider 4 orderings of the rectangles: xmin, ymin, xmax, ymax
In each ordering, consider the B cutting positions
Choose the one that minimizes the sum of the areas of the two resulting bounding boxes
Typical bulk-load cost: O((N/B) log_2 N) I/Os
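
The choice of a single binary cut can be sketched as below. This is a simplified in-memory illustration, not the TGS implementation used in the experiments: `step` is a hypothetical parameter standing in for the spacing between candidate cut positions, and bounding boxes are recomputed from scratch for clarity rather than speed.

```python
def bbox(rects):
    """Minimal bounding box of (xmin, ymin, xmax, ymax) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def best_binary_cut(rects, step):
    """Try the 4 orderings (xmin, ymin, xmax, ymax) and every cut position
    that is a multiple of `step`; return (cost, left, right) for the cut
    minimizing the summed area of the two bounding boxes, or None if the
    input is too small to cut."""
    best = None
    for dim in range(4):
        ordered = sorted(rects, key=lambda r: r[dim])
        for k in range(step, len(ordered), step):
            left, right = ordered[:k], ordered[k:]
            cost = area(bbox(left)) + area(bbox(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best
```

Applying this cut recursively to the larger part until a node has the desired number of children gives the greedy top-down construction described on the slide.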

7
Our Results

No existing R-tree variant has a worst-case query performance guarantee!
In the worst case, a query can visit all nodes in the tree even when the output size is zero
Priority R-Tree
The first R-tree variant that answers a query by visiting O(sqrt(N/B) + T/B) nodes in the worst case
T: output size
It is optimal!
There exists a dataset such that for any R-tree, there is an empty query that visits Ω(sqrt(N/B)) nodes [Kanth and Singh 99, Agarwal et al. 02]

8
Roadmap

Pseudo-PR-Tree
Has the desired
worst-case guarantee
Not a real R-tree
Transform a pseudo-PR-Tree into a PR-tree
A real R-tree
Maintains the worst-case guarantee
Experiments
PR-tree
Hilbert R-tree (2D and 4D)
TGS-R-tree

9
Building a Pseudo-PR-Tree
[Figure: root with its priority leaves]
Step 1: Take out the B most extreme rectangles in each direction and put them into priority leaves
10
Building a Pseudo-PR-Tree
Step 2: Divide the remaining rectangles by their xmin coordinates and build subtrees recursively. Division is performed using xmin, ymin, xmax, ymax in a round-robin fashion, like a 4D kd-tree
Analysis sketch:
Number of nodes with at least one priority leaf completely reported: O(T/B)
Number of nodes with no priority leaf completely reported: O(sqrt(N/B))
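
Steps 1 and 2 can be sketched in memory as follows. This is a simplified illustration under stated assumptions: rectangles are (xmin, ymin, xmax, ymax) tuples, "extreme" means smallest xmin, smallest ymin, largest xmax and largest ymax, and the split is at the median. The class and function names are hypothetical, and the sketch ignores the I/O-efficient construction.

```python
def intersects(a, b):
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


class Node:
    def __init__(self):
        self.priority = []  # up to 4 priority leaves: (bounding box, rects)
        self.children = []  # up to 2 subtrees: (bounding box, Node)


def build_pseudo_pr_tree(rects, B, depth=0):
    node = Node()
    rest = list(rects)
    # Step 1: peel off the B most extreme rectangles in each direction.
    for dim in range(4):
        if not rest:
            break
        # dims 0, 1: smallest xmin / ymin first; dims 2, 3: largest xmax / ymax first.
        rest.sort(key=lambda r: r[dim], reverse=(dim >= 2))
        leaf, rest = rest[:B], rest[B:]
        node.priority.append((bbox(leaf), leaf))
    # Step 2: split the remainder at the median, cycling through the
    # four coordinates by depth as in a 4D kd-tree.
    if rest:
        rest.sort(key=lambda r: r[depth % 4])
        mid = len(rest) // 2
        for part in (rest[:mid], rest[mid:]):
            if part:
                node.children.append(
                    (bbox(part), build_pseudo_pr_tree(part, B, depth + 1)))
    return node


def query(node, q, out):
    """Append all rectangles intersecting q to out; return internal nodes visited."""
    visited = 1
    for box, leaf in node.priority:
        if intersects(box, q):
            out.extend(r for r in leaf if intersects(r, q))
    for box, child in node.children:
        if intersects(box, q):
            visited += query(child, q, out)
    return visited
```

Each rectangle is stored exactly once, either in a priority leaf or further down in one of the two subtrees, so the query reports no duplicates.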
11
Pseudo-PR-Tree to a Real R-tree
12
Query Complexity Remains Unchanged
Next level
O(sqrt(N/B) + T/B) nodes visited on leaf level
13
PR-Tree: Bulkload and Updates

Bulkload
O((N/B) log_2 N) I/Os → O((N/B) log_{M/B} N) I/Os, using the grid method [Agarwal et al. 01]
The same as the Hilbert R-tree, but with a larger constant
Updates
Can use any previous heuristic to update in O(log_B N) I/Os
Without worst-case query guarantee
Use the logarithmic method
Insert: O(log_B N + (1/B) log_{M/B} N log_2 (N/M)) I/Os
Delete: O(log_B N) I/Os
Extending to d dimensions
Query bound O((N/B)^(1-1/d) + T/B), still optimal
Bulkload and update bounds remain the same
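
The idea behind the logarithmic method can be sketched with the classic in-memory scheme: keep static structures whose sizes are powers of two and rebuild them on insertion like carries in a binary counter. This is an assumption-laden illustration, not the external-memory variant behind the bounds above: a plain tuple stands in for a bulk-loaded PR-tree, deletions are omitted, and `LogMethodIndex` and its `build` parameter are hypothetical names.

```python
def intersects(a, b):
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


class LogMethodIndex:
    def __init__(self, build=tuple):
        # build: turns a list of rectangles into a static, iterable structure.
        # In the setting of the talk this would be a PR-tree bulk-load.
        self.build = build
        # levels[i] is None or a static structure over exactly 2**i rectangles.
        self.levels = []

    def insert(self, r):
        carry = [r]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # Free slot: bulk-load everything carried so far into it.
                self.levels[i] = self.build(carry)
                return
            # Occupied slot: absorb its contents and carry to the next level.
            carry.extend(self.levels[i])
            self.levels[i] = None
            i += 1

    def query(self, q):
        # A query must consult every non-empty level.
        return [r for lvl in self.levels if lvl is not None
                for r in lvl if intersects(r, q)]
```

Because every structure is produced by a fresh bulk-load, each level keeps the worst-case query guarantee of the static tree, at the price of querying a logarithmic number of structures.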

14
Experiments

Implemented with TPIE
Priority R-tree
Hilbert R-tree
4D Hilbert R-tree
TGS R-tree
Real-life data
TIGER datasets
16 million rectangles
Synthetic data
Varying from normal to extreme data
10 million rectangles

15
Experiments with Real-Life Data

Query performance on the TIGER datasets

Shown: (I/Os spent in answering a query) / (T/B)
16
Experiments with Synthetic Data: SIZE
Each side of a rectangle is uniformly distributed in [0, max_side]
Queries are squares with area 1
17
Experiments with Synthetic Data: ASPECT
Fix the area, vary aspect ratio
18
Experiments with Synthetic Data: SKEWED
Randomly place points, then map y → y^c on the y-coordinates
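
A generator for this kind of data might look as follows. It is a sketch that assumes the points are drawn uniformly from the unit square before the y-coordinates are transformed; the function name and the `seed` parameter are illustrative.

```python
import random


def skewed_points(n, c, seed=0):
    """n points with x uniform in [0, 1) and y replaced by y**c.

    Assumes the unit square as the domain. Larger c pushes the points
    toward the line y = 0, making the distribution more skewed.
    """
    rng = random.Random(seed)
    return [(rng.random(), rng.random() ** c) for _ in range(n)]
```

With c = 1 the data stays uniform; increasing c makes the dataset progressively more extreme.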
19
Experiments with Synthetic Data: CLUSTER
20
Conclusions

In theory
The PR-tree is the first R-tree variant that answers a window query in O(sqrt(N/B) + T/B) I/Os in the worst case, which is optimal
In practice
Roughly the same as previous best R-trees on
real-life and relatively nicely distributed data
Outperforms them significantly on more extreme
data
Future work
How previous heuristics may affect the
performance of the PR-tree in the dynamic case

21
Lower Bound Construction

Each bounding box intersects at least
queries
N/B bounding boxes
queries
There exists a query that intersects at least Ω(sqrt(N/B)) bounding boxes

22
Pseudo-PR-Tree Query Complexity

Nodes v visited where all rectangles in at least one of the priority leaves of v's parent are reported: O(T/B)
Let v be a visited node such that none of the priority leaves at its parent is reported completely, and consider v's parent u

[Figure: the query Q in 2D and in 4D, with the hyper-planes ymin = ymax(Q) and xmax = xmin(Q)]
23
Pseudo-PR-Tree Query Complexity

The cell in the 4D kd-tree of u is intersected by two different 3-dimensional hyper-planes
The intersection of each pair of such 3-dimensional hyper-planes is a 2-dimensional hyper-plane
Lemma: the number of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyper-plane is O((N/B)^(f/d))
So, O(sqrt(N/B)) such cells in a 4D kd-tree (f = 2, d = 4)
Total nodes visited: O(sqrt(N/B) + T/B)
