1
The R-tree Index
  • Rula Alnaser
  • Rob Bracero
  • CIS 6930 Database Systems
  • USF, Spring 09

2
R-Tree
  • Tree-structured index that handles spatial
    data
  • Height-balanced data structure
  • Data entries are stored in leaf nodes
  • Nodes can be kept 50% full (except the root)

3
  • Search key is a collection of intervals, with one
    interval per dimension
  • Leaf entry: <n-dimension box, rid>
  • Non-leaf entry: <n-dimension box, pointer to
    child node>
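The two entry formats can be sketched as plain Python types. This is a minimal sketch, assuming each box is a tuple of (low, high) intervals, one per dimension, and that a rid is an integer; the names `LeafEntry`, `NonLeafEntry`, `Node` and `cover` are hypothetical, not taken from the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A box is one (low, high) interval per dimension.
Box = Tuple[Tuple[float, float], ...]

@dataclass
class LeafEntry:
    box: Box   # n-dimension bounding box of the data object
    rid: int   # record id of the object (hypothetical int ids)

@dataclass
class NonLeafEntry:
    box: Box       # box covering every entry stored in the child node
    child: Node    # pointer to the child node

@dataclass
class Node:
    is_leaf: bool
    entries: List[Union[LeafEntry, NonLeafEntry]] = field(default_factory=list)

def cover(boxes: List[Box]) -> Box:
    """Smallest box enclosing all given boxes, computed interval by interval."""
    dims = len(boxes[0])
    return tuple((min(b[d][0] for b in boxes), max(b[d][1] for b in boxes))
                 for d in range(dims))

# A 2-D leaf with two entries, and the parent entry that points to it.
leaf = Node(is_leaf=True, entries=[
    LeafEntry(box=((0, 2), (0, 1)), rid=101),
    LeafEntry(box=((1, 4), (3, 5)), rid=102),
])
parent_entry = NonLeafEntry(box=cover([e.box for e in leaf.entries]), child=leaf)
print(parent_entry.box)  # ((0, 4), (0, 5))
```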

5
Search Queries
  • May have to search several subtrees at each node
  • Search can be improved by using convex polygons to
    approximate the query region more accurately
  • Can detect that there is no node overlap even
    when the bounding boxes of the query and the node intersect
  • Potential I/O cost savings (testing node overlap
    costs only CPU time)
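One way to test a convex query polygon against a node's box is the separating axis theorem; that choice is ours, since the slides do not name a test. The sketch below assumes 2-D points and uses hypothetical helper names. It shows a triangular query whose bounding box intersects a node's box even though the triangle itself does not, so that node would not have to be read.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def box_corners(xmin, ymin, xmax, ymax) -> List[Point]:
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]

def _axes(poly: List[Point]) -> List[Point]:
    """Edge normals of a convex polygon: the candidate separating axes."""
    axes = []
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        axes.append((y1 - y2, x2 - x1))
    return axes

def _project(poly: List[Point], axis: Point) -> Tuple[float, float]:
    dots = [p[0] * axis[0] + p[1] * axis[1] for p in poly]
    return min(dots), max(dots)

def convex_overlap(a: List[Point], b: List[Point]) -> bool:
    """Two convex polygons are disjoint iff some edge normal separates them."""
    for axis in _axes(a) + _axes(b):
        amin, amax = _project(a, axis)
        bmin, bmax = _project(b, axis)
        if amax < bmin or bmax < amin:
            return False  # separating axis found: no overlap
    return True

def bbox(poly: List[Point]):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(r, s) -> bool:
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

query = [(0, 0), (10, 0), (0, 10)]  # triangular query region
node_box = (8, 8, 10, 10)           # node box in the corner the triangle misses

print(boxes_overlap(bbox(query), node_box))           # True: bounding boxes intersect
print(convex_overlap(query, box_corners(*node_box)))  # False: node can be skipped
```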

6
Insert Entry <B, ptr>
  • Start at the root and go down to the best-fit leaf L.
  • Go to the child whose box needs the least enlargement to
    cover B; resolve ties by going to the child with the
    smallest area.
  • If best-fit leaf L has space, insert the entry and
    stop. Otherwise, split L into L1 and L2.
  • Adjust the entry for L in its parent so that the box
    now covers (only) L1.
  • Add an entry (in the parent node of L) for L2.
    (This could cause the parent node to split
    recursively.)
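The choice made at each level of the descent can be sketched as follows, assuming 2-D rectangles stored as (xmin, ymin, xmax, ymax) tuples; `choose_child` and the helpers are hypothetical names. Going down to the best-fit leaf repeats this choice at every level.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(r: Rect, s: Rect) -> Rect:
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))

def enlargement(r: Rect, b: Rect) -> float:
    """Extra area box r needs in order to also cover b."""
    return area(union(r, b)) - area(r)

def choose_child(child_boxes: List[Rect], b: Rect) -> int:
    """Index of the child needing least enlargement; ties go to the smallest area."""
    return min(range(len(child_boxes)),
               key=lambda i: (enlargement(child_boxes[i], b), area(child_boxes[i])))

children = [(0, 0, 4, 4), (5, 5, 6, 6), (0, 0, 10, 10)]
# Children 0 and 2 both cover B already (enlargement 0); child 0 is smaller.
print(choose_child(children, (1, 1, 2, 2)))  # 0
```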

7
Splitting a Node During Insertion
  • The entries in node L plus the newly inserted
    entry must be distributed between L1 and L2.
  • Goal is to reduce likelihood of both L1 and L2
    being searched on subsequent queries.
  • Idea: Redistribute so as to minimize the area of L1
    plus the area of L2.

[Figure: example of a good split and a bad split]
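A minimal sketch of the redistribution idea, assuming a small node and 2-D rectangles: try every split that leaves at least m entries in each group and keep the one with the smallest total area. This exhaustive search is exponential in the node size; Guttman's 1984 paper also describes quadratic-cost and linear-cost heuristics for realistic capacities. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def cover(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all of rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def exhaustive_split(rects: List[Rect], m: int):
    """Return (cost, L1 indices, L2 indices) minimizing area(L1) + area(L2),
    with at least m entries in each group."""
    n = len(rects)
    best = None
    for k in range(m, n - m + 1):
        for g1 in combinations(range(n), k):
            if 0 not in g1:
                continue  # keep entry 0 in L1 so each split is tried only once
            g2 = [i for i in range(n) if i not in g1]
            cost = (area(cover([rects[i] for i in g1]))
                    + area(cover([rects[i] for i in g2])))
            if best is None or cost < best[0]:
                best = (cost, list(g1), g2)
    return best

# Overflowing node: two clusters of entries, listed in mixed order.
entries = [(0, 0, 1, 1), (8, 8, 9, 9), (1, 0, 2, 1), (9, 8, 10, 9)]
print(exhaustive_split(entries, m=2))  # (4, [0, 2], [1, 3])
```

Grouping each cluster separately gives two boxes of area 2; any mixed grouping produces two large boxes that a later query would likely both have to visit.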
8
Delete Entry
  • Deletion consists of searching for the entry to
    be deleted, removing it, and if the node becomes
    under-full, deleting the node and then
    re-inserting the remaining entries.
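The delete-and-reinsert rule can be sketched on a one-level tree (a root over a list of leaves), assuming a hypothetical minimum fill `MIN_FILL = 2` and 2-D rectangles. A full implementation would also shrink ancestor boxes, propagate the under-full check upward, and split leaves that overflow during re-insertion; all of that is left out of this sketch.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Entry = Tuple[Rect, int]                  # (box, rid)
MIN_FILL = 2                              # hypothetical minimum fill m

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def cover(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def insert(leaves: List[List[Entry]], entry: Entry) -> None:
    """Add entry to the leaf needing least enlargement (no splitting here)."""
    if not leaves:
        leaves.append([entry])
        return
    def cost(leaf: List[Entry]):
        box = cover([b for b, _ in leaf])
        grown = cover([box, entry[0]])
        return (area(grown) - area(box), area(box))
    min(leaves, key=cost).append(entry)

def delete(leaves: List[List[Entry]], rid: int) -> None:
    """Find and remove rid; if its leaf becomes under-full, drop the leaf
    and re-insert the entries that remained in it."""
    for li, leaf in enumerate(leaves):
        for ei, (_, r) in enumerate(leaf):
            if r == rid:
                del leaf[ei]
                if len(leaf) < MIN_FILL:
                    orphans = leaves.pop(li)
                    for e in orphans:
                        insert(leaves, e)
                return
    raise KeyError(rid)

leaves = [[((0, 0, 1, 1), 1), ((1, 1, 2, 2), 2)],
          [((5, 5, 6, 6), 3), ((6, 6, 7, 7), 4), ((5, 6, 6, 7), 5)]]
delete(leaves, 1)  # first leaf drops to one entry, so entry 2 is re-inserted
print([[rid for _, rid in leaf] for leaf in leaves])  # [[3, 4, 5, 2]]
```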

9
Types of Queries
  • Spatial Range Queries
      • Find all hotels within a 10-mile radius.
  • Nearest Neighbor Queries
      • Find the 5 parks closest to the USF campus.
  • Spatial Join Queries
      • Find all pairs of cities within 100 miles of each other.

10
Spatial Range Queries
[Figure: example R-tree with bounding boxes R1 to R14]
  • At each node, follow all children whose bounding
    box overlaps the query bounding box.
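The traversal rule can be sketched as a short recursive function, assuming 2-D rectangles and a hypothetical `Node` type whose leaf entries are (box, rid) pairs and whose internal entries are (box, child) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    is_leaf: bool
    # Leaf: (box, rid) pairs. Internal: (box, child Node) pairs.
    entries: List[Tuple[Rect, object]] = field(default_factory=list)

def overlaps(r: Rect, s: Rect) -> bool:
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def range_search(node: Node, q: Rect, out: Optional[List[int]] = None) -> List[int]:
    """Collect rids of all leaf entries whose boxes overlap the query box q."""
    if out is None:
        out = []
    for box, item in node.entries:
        if overlaps(box, q):
            if node.is_leaf:
                out.append(item)
            else:
                range_search(item, q, out)  # several children may qualify
    return out

a = Node(True, [((0, 0, 1, 1), 1), ((2, 2, 3, 3), 2)])
b = Node(True, [((6, 6, 7, 7), 3), ((8, 8, 9, 9), 4)])
root = Node(False, [((0, 0, 3, 3), a), ((6, 6, 9, 9), b)])
print(range_search(root, (2.5, 2.5, 6.5, 6.5)))  # [2, 3]
```

Here the query overlaps both root entries, so both subtrees are searched; a query falling between the two boxes would stop at the root.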

14
Spatial Range Queries
[Figure: query region Q over the example R-tree (R1 to R14)]
  • At each node, follow all children whose bounding
    box overlaps the query bounding box.

N = number of records in the index, M = maximum entries
per node, m = minimum entries per node, m < M/2
  • Best case: O(log_m N)
  • Worst case: O(N)
  • In practice, only a small portion of the tree is
    searched.
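The best case corresponds to following a single root-to-leaf path, so it is bounded by the tree height. A short derivation, assuming levels are numbered from the root (level 1) to the leaves (level h ≥ 2), the root has at least two children, and every other node holds at least m ≥ 2 entries:

```latex
\begin{aligned}
\#\{\text{nodes at level } k\} &\ge 2\,m^{k-2}, \qquad 2 \le k \le h,\\
N &\ge m \cdot 2\,m^{h-2} = 2\,m^{h-1},\\
\Rightarrow\quad h &\le 1 + \log_m \frac{N}{2} = O(\log_m N).
\end{aligned}
```

In the worst case every box overlaps the query and every node is read, which gives the O(N) bound.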

17
Nearest Neighbor Queries
[Figure: query Q and the example R-tree (R1 to R14)]
  • Calculate distance to all objects in leaf nodes
    that overlap the query region.
  • If no leaf nodes overlap the query region, expand
    the query bounding box and try again.

  • Complexity is similar to that of range search and
    depends on the data set and the query.
  • The algorithm may have to run multiple times
    until an overlapping leaf node is encountered.
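The expanding-box strategy can be sketched over a flat list of leaves, each given as (MBR, points); the starting half-width and the doubling factor are arbitrary assumptions. The nearest object in the first overlapping leaves is not guaranteed to be the true nearest neighbor, so the sketch adds one more pass with a box just large enough to contain any closer object. That refinement is our addition, not part of the slides.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def overlaps(r: Rect, s: Rect) -> bool:
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def window(q: Point, half: float) -> Rect:
    """Square query box of half-width `half` centred on q."""
    return (q[0] - half, q[1] - half, q[0] + half, q[1] + half)

def expanding_window_nn(leaves: List[Tuple[Rect, List[Point]]], q: Point,
                        half: float = 1.0) -> Point:
    # Assumes every leaf holds at least one point; otherwise this can fail.
    while True:
        hits = [pts for mbr, pts in leaves if overlaps(mbr, window(q, half))]
        if hits:
            break
        half *= 2  # no leaf overlaps the query box: enlarge it and try again
    best = min((p for pts in hits for p in pts), key=lambda p: math.dist(p, q))
    # Extra pass (our addition): any closer object lies inside a box of
    # half-width dist(best, q), so scanning the leaves it overlaps is exact.
    d = math.dist(best, q)
    cands = [p for mbr, pts in leaves if overlaps(mbr, window(q, d)) for p in pts]
    return min(cands, key=lambda p: math.dist(p, q))

leaves = [((0, 0, 2, 2), [(0, 0), (2, 1)]),
          ((4, 0, 10, 3), [(9, 0), (10, 3)])]
print(expanding_window_nn(leaves, (3.2, 1)))  # (2, 1)
```

In this example the first box reaches only the right-hand leaf, whose best object (9, 0) is far away; the extra pass finds (2, 1) in the left-hand leaf.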

18
Nearest Neighbor Queries
  • Alternative: Depth-First
  • Starting at the root, move to the child with the minimum
    distance to the query region.
  • Repeat recursively until an object is reached, and
    store the distance to that object.
  • Move back up the tree and visit only those
    children whose distance from the query
    region is smaller than the smallest distance
    to an object found so far.
  • Once no such child node exists, the algorithm
    terminates.
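A minimal sketch of this depth-first (branch-and-bound) search, assuming 2-D points stored in the leaves and using the minimum distance from the query point to a child's box (`mindist`, a hypothetical helper) as the distance to that child:

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    is_leaf: bool
    entries: list  # leaf: list of Points; internal: list of (Rect, Node)

def mindist(q: Point, r: Rect) -> float:
    """Smallest distance from q to any point of rectangle r (0 if q is inside)."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)

def dfs_nn(node: Node, q: Point, best=(math.inf, None)):
    """Return (distance, point) of the nearest object in node's subtree."""
    if node.is_leaf:
        for p in node.entries:
            d = math.dist(p, q)
            if d < best[0]:
                best = (d, p)
        return best
    # Visit children closest-first; skip any child that cannot beat the best so far.
    for r, child in sorted(node.entries, key=lambda e: mindist(q, e[0])):
        if mindist(q, r) < best[0]:
            best = dfs_nn(child, q, best)
    return best

leaf1 = Node(True, [(1, 1), (2, 2)])
leaf2 = Node(True, [(8, 8), (9, 9)])
root = Node(False, [((1, 1, 2, 2), leaf1), ((8, 8, 9, 9), leaf2)])
print(dfs_nn(root, (3, 3))[1])  # (2, 2); leaf2 is pruned without being read
```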

19
Nearest Neighbor Queries
[Figure: query Q and the example R-tree (R1 to R14)]
  • Depth-first search would visit R1, R3, R7, R4,
    R9.

20
Nearest Neighbor Queries
  • Alternative: Best-First
  • Starting at the root node, insert all children of the
    current node into a priority queue ordered by
    distance to the query region.
  • Choose the entry at the top of the priority queue
    (the one with the smallest distance to the query
    region) and repeat recursively.
  • Once an entry at the leaf level (an object) is
    chosen, it is the nearest neighbor and the
    algorithm terminates.
  • Better performance than depth-first search, but
    larger space requirements.
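The best-first variant can be sketched with Python's `heapq` as the priority queue, assuming 2-D points in the leaves and the same hypothetical `Node` and `mindist` helpers as a depth-first sketch would use. Objects are pushed with their exact distance and nodes with their `mindist` lower bound, so the first object popped is the nearest neighbor. The counter only breaks ties so that nodes are never compared with each other.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    is_leaf: bool
    entries: list  # leaf: list of Points; internal: list of (Rect, Node)

def mindist(q: Point, r: Rect) -> float:
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)

def best_first_nn(root: Node, q: Point):
    """Return (distance, point) of the nearest object, or (inf, None) if empty."""
    tie = itertools.count()
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for e in item.entries:
                if item.is_leaf:
                    heapq.heappush(heap, (math.dist(e, q), next(tie), e))
                else:
                    box, child = e
                    heapq.heappush(heap, (mindist(q, box), next(tie), child))
        else:
            return d, item  # first object popped: nothing left can be closer
    return math.inf, None

leaf1 = Node(True, [(1, 1), (2, 2)])
leaf2 = Node(True, [(8, 8), (9, 9)])
root = Node(False, [((1, 1, 2, 2), leaf1), ((8, 8, 9, 9), leaf2)])
print(best_first_nn(root, (10, 10))[1])  # (9, 9)
```

The queue can hold entries from many different subtrees at once, which is where the larger space requirement comes from.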

21
Nearest Neighbor Queries
[Figure: query Q and the example R-tree (R1 to R14)]
  • Best-first search would visit R1, R3, R4, R9.

22
R-Tree Variants
  • R+-Tree
  • R*-Tree

23
R+-Tree
  • Avoids overlap by inserting an entry into multiple
    leaves if necessary.
  • Bounding boxes of entries in internal nodes do not
    overlap.
  • A point query now follows only one path down the
    tree.
  • Drawbacks: redundant leaf node entries and greater
    tree height.
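The idea of storing an object in every leaf it crosses can be illustrated with a flat sketch that uses two fixed, non-overlapping leaf regions instead of a real tree. The cell layout, the names, and the half-open boundary convention are all assumptions. A real R+-tree also has internal levels and must split regions as they fill.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Hypothetical leaf regions that tile the plane without overlapping.
# Each cell is half-open: [xmin, xmax) x [ymin, ymax).
CELLS: List[Rect] = [(0, 0, 5, 10), (5, 0, 10, 10)]
buckets: Dict[int, List[Tuple[Rect, int]]] = {i: [] for i in range(len(CELLS))}

def touches_cell(box: Rect, cell: Rect) -> bool:
    """Overlap test between a closed box and a half-open cell."""
    return (box[0] < cell[2] and cell[0] <= box[2]
            and box[1] < cell[3] and cell[1] <= box[3])

def insert(box: Rect, rid: int) -> None:
    # An object spanning a cell boundary is stored in every cell it touches,
    # so the cells never have to grow into each other.
    for i, cell in enumerate(CELLS):
        if touches_cell(box, cell):
            buckets[i].append((box, rid))

def point_query(x: float, y: float) -> List[int]:
    # The cells are disjoint, so at most one contains (x, y): a single path.
    for i, c in enumerate(CELLS):
        if c[0] <= x < c[2] and c[1] <= y < c[3]:
            return [rid for b, rid in buckets[i]
                    if b[0] <= x <= b[2] and b[1] <= y <= b[3]]
    return []

insert((4, 1, 6, 2), rid=1)  # crosses x = 5, so it is stored twice
insert((1, 1, 2, 2), rid=2)
print(point_query(5.5, 1.5), point_query(1.5, 1.5))  # [1] [2]
```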

24
R-Tree
25
R*-Tree
  • Uses forced re-inserts to try to minimize overlap
    of minimum bounding rectangles.
  • On an insert, instead of splitting an overflowed
    node, remove some of its entries and re-insert them.
  • The re-inserted entries may fit in existing nodes and
    eliminate the need for a split.
  • When splitting, minimize box areas, perimeters, and
    overlap rather than box areas alone.
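The choice of which entries to force out can be sketched as below. It follows the R*-tree rule of removing the entries whose centers lie farthest from the center of the node's bounding box. The R*-tree paper reports that removing about 30% of M entries works well; here p is simply a parameter, and the function name is hypothetical. In the R*-tree this forced re-insert happens at most once per level for each inserted rectangle; a second overflow at that level is handled by a split.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def cover(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def pick_reinsert(entries: List[Rect], p: int):
    """Split an overflowing node's entries into (kept, to_reinsert), where
    to_reinsert holds the p entries whose centers are farthest from the
    center of the node's bounding box."""
    c = center(cover(entries))
    by_dist = sorted(entries, key=lambda r: math.dist(center(r), c), reverse=True)
    return by_dist[p:], by_dist[:p]

# A node with capacity M = 3 has overflowed to 4 entries; take p = 1.
entries = [(0, 0, 2, 2), (3, 3, 5, 5), (4, 4, 6, 6), (9, 9, 10, 10)]
kept, reinsert = pick_reinsert(entries, p=1)
print(reinsert)  # [(9, 9, 10, 10)]
```

The removed entries then go back through the normal insertion path from the root, where they may land in a different, better-fitting node.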
