SDRtree: A Scalable Distributed Rtree - PowerPoint PPT Presentation

Provided by: lri47
1
SD-Rtree: A Scalable Distributed Rtree
  • Witold Litwin
  • Cédric du Mouza
  • Philippe Rigaux

2
Plan
  • Introduction
  • SDDS
  • R-tree
  • SD-Rtree Evolution
  • Balancing
  • Spatial Rotations
  • Overlapping
  • Redundant Coverage
  • Queries
  • Performance
  • Conclusion

3
SDDS Principles (1993)
  • Data are at server nodes
  • Communicating through point-to-point messaging
  • Overloaded servers split over new servers
  • Queries are issued at client nodes, which use
    local images of the SDDS
  • No central addressing component
  • A node can be client and server (peer)

4
SDDS Principles (1993)
  • An outdated image may send a query to an
    incorrect server
  • Servers forward such a query to the correct
    server
  • Image gets adjusted
  • Image Adjustment Message (IAM) comes back
  • Client does not repeat the same error twice
  • Data are basically in the RAM of the servers
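
The forwarding and image-adjustment principle above can be sketched in a few lines. This is only an illustration under simplifying assumptions: keys are integers, each server owns a one-dimensional range, and the hypothetical `Cluster.owner` lookup stands in for server-to-server forwarding (a real SDDS has no such central lookup). All class and method names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server owning the keys in the half-open range [low, high)."""
    name: str
    low: int
    high: int
    data: dict = field(default_factory=dict)


class Cluster:
    """Toy network; `owner` stands in for server-to-server forwarding."""

    def __init__(self, servers):
        self.servers = {s.name: s for s in servers}

    def owner(self, key):
        return next(s for s in self.servers.values() if s.low <= key < s.high)

    def handle(self, contacted, key):
        """Return (value, forwards, iam); iam is None when no addressing error occurred."""
        server = self.servers[contacted]
        if server.low <= key < server.high:
            return server.data.get(key), 0, None
        correct = self.owner(key)
        # The IAM carries the true ranges of both servers involved.
        iam = {server.name: (server.low, server.high),
               correct.name: (correct.low, correct.high)}
        return correct.data.get(key), 1, iam


class Client:
    def __init__(self, cluster, image):
        self.cluster = cluster
        self.image = dict(image)  # server name -> (low, high), possibly outdated

    def get(self, key):
        # Assumes the image covers the key; pick the server the image points to.
        guess = next(n for n, (lo, hi) in self.image.items() if lo <= key < hi)
        value, forwards, iam = self.cluster.handle(guess, key)
        if iam is not None:
            self.image.update(iam)  # image adjustment
        return value, forwards


cluster = Cluster([Server("s0", 0, 50, {20: "a"}), Server("s1", 50, 100, {70: "b"})])
client = Client(cluster, {"s0": (0, 100)})  # image predates the split of s0
first = client.get(70)   # addressed to s0, forwarded once, image adjusted
second = client.get(70)  # addressed directly to s1
```

The second call needs no forwarding because the IAM corrected the client's image, which is the sense in which a client does not repeat the same error twice.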

5
SD-Rtree: a Spatial SDDS
  • Distributed Spatial Data
6
SD-Rtree: a Spatial SDDS
  • Distributed Index
  • No central component

7
SD-Rtree: a Spatial SDDS
  • Point and Window Queries
  • kNN queries (future)

8
SD-Rtree Generalizes R-tree
  • R-tree
  • Nodes are minimal bounding boxes
  • Leaf nodes point to data
  • Internal nodes bound subtrees
  • May overlap
  • Split when overflow
  • Generate balanced m-ary tree

9
SD-Rtree Generalizes R-tree
  • R-tree
  • An insert may go through multiple paths
  • Ends up in the smallest bounding box that
    contains it, if there is any
  • Otherwise, one of the boxes gets enlarged
  • Box may split

10
SD-Rtree Generalizes R-tree
  • R-tree
  • Search may go through multiple paths
  • All paths may bring relevant objects
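
As a minimal sketch of why a search may follow several paths, the example below runs a window query over a tiny centralized tree whose two leaves have overlapping bounding boxes. The `Node` class and helper names are assumptions made for illustration; rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Rect, b: Rect) -> Rect:
    """Minimal bounding box of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Node:
    mbb: Rect
    children: List["Node"] = field(default_factory=list)       # internal node
    objects: List[Tuple[str, Rect]] = field(default_factory=list)  # leaf


def window_query(node: Node, window: Rect) -> List[str]:
    """Visit every subtree whose box intersects the window."""
    if not intersects(node.mbb, window):
        return []
    hits = [name for name, rect in node.objects if intersects(rect, window)]
    for child in node.children:
        hits.extend(window_query(child, window))
    return hits


left = Node((0, 0, 6, 6), objects=[("a", (1, 1, 2, 2)), ("b", (5, 5, 6, 6))])
right = Node((4, 4, 10, 10), objects=[("c", (4, 4, 5, 5)), ("d", (9, 9, 10, 10))])
root = Node(union(left.mbb, right.mbb), children=[left, right])

# The window lies in the region where both leaf boxes overlap,
# so both paths are followed and both return an object.
result = window_query(root, (4.5, 4.5, 5.5, 5.5))
```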

11
Distribution issues
  • First issue: adapt the structure to the context
  • Cost model based on messages
  • No paging! The degree M of the tree is not a
    constraint
  • => split and balancing algorithms must be
    reviewed
  • Second issue: distribute the tree over the
    servers
  • Balance the load evenly
  • Do not overload the root node
  • => search algorithms must be reviewed as well

12
SD-Rtree: a Balanced Binary Tree
  • Each split generates a new edge
  • Half of data moves to the new server
  • Each server hosts exactly one leaf and one
    internal node of SD-Rtree
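
A split along these lines might look as follows. The slides do not say how the objects are divided, so the median cut along the wider axis is an assumption, as are the `Server` and `RoutingNode` names; objects are reduced to points for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbb(points: List[Point]) -> Rect:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class RoutingNode:
    """New internal node created by a split; it links the two data nodes."""
    left_server: int
    right_server: int
    left_mbb: Rect
    right_mbb: Rect


@dataclass
class Server:
    id: int
    points: List[Point]                    # the data node (leaf)
    routing: Optional[RoutingNode] = None  # the routing node, if any


def split(overloaded: Server, new_id: int) -> Server:
    """Move half of the points to a new server (assumed median split)."""
    box = mbb(overloaded.points)
    axis = 0 if (box[2] - box[0]) >= (box[3] - box[1]) else 1
    ordered = sorted(overloaded.points, key=lambda p: p[axis])
    half = len(ordered) // 2
    overloaded.points = ordered[:half]
    moved = ordered[half:]
    routing = RoutingNode(overloaded.id, new_id, mbb(overloaded.points), mbb(moved))
    return Server(new_id, moved, routing)


s0 = Server(0, [(0, 0), (9, 4), (1, 5), (8, 1)])
s1 = split(s0, new_id=1)
```

In this sketch the new server receives both the moved data and the routing node that becomes the parent of the two leaves, so the first server is left with a data node only.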

13
SD-Rtree: a Balanced Binary Tree
  • The SD-Rtree is a balanced binary tree,
    distributed on a set of servers, such that
  • Each internal node (or routing node) has exactly
    two sons
  • Each leaf node stores a subset of the indexed
    dataset
  • At each node, the heights of the two subtrees
    differ by at most one
  • Each server stores one data node and one routing
    node

14
SD-Rtree: Binary Tree Structure
  • d_i: data node (leaf)
  • r_i: routing node (internal node)

15
SD-Rtree: Tree Distribution
16
SD-Rtree: Evolution
17
SD-Rtree Balancing
  • The binary tree should be height-balanced
  • The heights of the two subtrees rooted at any
    node should not differ by more than 1 (cf. AVL
    trees)
  • The tree height is then logarithmic in the number
    of leaves
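
The balance condition can be checked mechanically. In the sketch below a tree is a nested 2-tuple and a leaf is any non-tuple value, which is a representation chosen only for this example. `min_leaves` gives the smallest number of leaves a height-balanced binary tree of height h can have; it grows like the Fibonacci numbers, which is why the height stays logarithmic in the number of leaves.

```python
def height(tree):
    """Height of a nested-tuple binary tree; a leaf has height 0."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + max(height(left), height(right))


def is_balanced(tree):
    """True if, at every node, the two subtree heights differ by at most 1."""
    if not isinstance(tree, tuple):
        return True
    left, right = tree
    return (abs(height(left) - height(right)) <= 1
            and is_balanced(left) and is_balanced(right))


def min_leaves(h):
    """Fewest leaves in a height-balanced binary tree of height h."""
    a, b = 1, 2  # values for heights 0 and 1
    for _ in range(h):
        a, b = b, a + b
    return a


balanced = (("d0", "d1"), "d2")
chain = ((("d0", "d1"), "d2"), "d3")  # root sees heights 2 and 0
```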

18
SD-Rtree Balancing
  • SD-Rtree balancing occurs during splits
  • Messages are sent bottom-up to adjust the height
    of the ancestor nodes
  • Rotation occurs if an ancestor is imbalanced
  • SD-Rtree rotations are spatial: they change the
    rectangles of internal nodes
  • Best rotation minimizes rectangle overlapping
  • Tie breaking minimizes the dead space
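
The selection criterion (least overlap first, least dead space as tie-breaker) can be illustrated with a small scoring function. The slides do not list the candidate rotations, so the enumeration below is a hypothetical stand-in: it takes three subtree rectangles, tries each way of grouping two of them under one routing node with the third as sibling, and ranks the groupings. Dead space is taken here as the bounding-box area not covered by the two rectangles it encloses.

```python
from itertools import combinations


def area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a, b):
    """Area of the intersection of two rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def dead_space(a, b):
    """Area of the bounding box of a and b that neither of them covers."""
    return area(union(a, b)) - (area(a) + area(b) - overlap(a, b))


def best_pairing(rects):
    """rects maps three names to rectangles; return ((x, y), z) with x and y grouped."""
    best = None
    for x, y in combinations(sorted(rects), 2):
        (z,) = set(rects) - {x, y}
        grouped = union(rects[x], rects[y])
        score = (overlap(grouped, rects[z]),                      # primary criterion
                 dead_space(grouped, rects[z]) + dead_space(rects[x], rects[y]))
        if best is None or score < best[0]:
            best = (score, (x, y), z)
    return best[1], best[2]


rects = {"c": (0, 0, 2, 2), "d": (1, 1, 3, 3), "f": (10, 10, 12, 12)}
choice = best_pairing(rects)  # groups the two nearby rectangles c and d
```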

19
SD-Rtree Spatial Rotations
20
Rotation Pattern
  • Properties
  • The sons of a node are not ordered
  • => more freedom for reorganizing the tree
  • Any imbalanced node matches a rotation pattern
  • A rotation pattern is a subtree a(b(e(f,g),d),c)
    such that
  • h(c) = h(d) = h(f) = n - 1 (n > 0)
  • h(g) = max(0, n - 2)
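
A small checker for the pattern, written as a sketch: trees are nested 2-tuples with non-tuple leaves (a representation assumed for this example), and because the sons of a node are not ordered, both orders are tried at each level. The function returns the n for which the height conditions hold, or `None`.

```python
def height(tree):
    """Height of a nested-tuple binary tree; a leaf has height 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))


def match_rotation_pattern(a):
    """Return n if a has the shape a(b(e(f,g),d),c) with
    h(c) = h(d) = h(f) = n - 1 and h(g) = max(0, n - 2); else None."""
    if not isinstance(a, tuple):
        return None
    for b, c in (a, a[::-1]):          # sons are unordered: try both orders
        if not isinstance(b, tuple):
            continue
        for e, d in (b, b[::-1]):
            if not isinstance(e, tuple):
                continue
            for f, g in (e, e[::-1]):
                n = height(f) + 1      # so n > 0 always holds
                if height(c) == height(d) == n - 1 and height(g) == max(0, n - 2):
                    return n
    return None


# n = 1: f, g, d and c are all leaves.
small = ((("f", "g"), "d"), "c")

# n = 2, with the sons deliberately written in a different order.
f = ("x", "y")
large = (("r", "s"), (("p", "q"), ("g", f)))

balanced = (("d0", "d1"), "d2")
```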

21
SD-Rtree Spatial Rotation
22
Rotation Cost
  • Constant number of messages (3 or 6, depending on
    the choice)
  • Few rotations in practice
  • In particular when the dataset is uniformly
    distributed
  • See our experiments

23
SD-Rtree Images
  • Each image defines the addressing structure
  • Resides as cache on a client or on a peer
  • Starts with the address of the contact server
  • IAMs make it a subtree
  • Splits make images outdated
  • IAMs adjust it incrementally

24
Image Adjustment
  • Client contacts a server with a query
  • Each incorrect server initiates a traversal of
    the tree
  • During the traversal, the description of the
    nodes is collected
  • The correct server sends the up-to-date tree
    structure
  • The client updates its image

25
Image Construction
  • Using the image
  • The client first searches its local image and
    chooses the server that best corresponds to the
    query
  • The correct server is found in O(log n) in the
    worst case

26
Out-of-range situation
27
Insertion of objects
28
Overlapping management
  • The directory rectangles in an Rtree may overlap
  • A local subtree does not suffice for locating all
    the nodes that contain the point (point query)
    or intersect the window (window query) searched for
  • SD-Rtree servers maintain data on node
    overlapping
  • Redundant Coverage
  • It avoids systematically accessing the root node

29
Redundant Coverage
  • Example
  • The region common to A and B is stored on both
    nodes
  • If a point query sent to A falls in the region
    shared with B, A sends a point query message to B
  • For D, we must keep its intersection with C or B
    (here empty)
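
The example above can be mimicked with a minimal sketch in which each data node remembers the region it shares with a neighbour and forwards a point query only when the point falls inside that region. The `DataNode` class, the `overlaps` dictionary and the message counter are assumptions made for illustration, and the rectangles are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None


@dataclass
class DataNode:
    name: str
    mbb: Rect
    points: List[Point]
    overlaps: Dict[str, Rect] = field(default_factory=dict)  # neighbour -> shared region


def register_overlap(a: DataNode, b: DataNode) -> None:
    """Store the common region on both nodes (nothing if it is empty)."""
    shared = intersection(a.mbb, b.mbb)
    if shared is not None:
        a.overlaps[b.name] = shared
        b.overlaps[a.name] = shared


def point_query(nodes: Dict[str, DataNode], start: str, p: Point):
    """Return (names of nodes holding p, number of forwarded messages)."""
    first = nodes[start]
    hits = [first.name] if p in first.points else []
    messages = 0
    for other, region in first.overlaps.items():
        if contains(region, p):  # forward only inside the shared region
            messages += 1
            if p in nodes[other].points:
                hits.append(other)
    return hits, messages


A = DataNode("A", (0, 0, 6, 6), [(1, 1), (5, 5)])
B = DataNode("B", (4, 4, 10, 10), [(4.5, 5.5), (9, 9)])
D = DataNode("D", (20, 20, 30, 30), [(25, 25)])
register_overlap(A, B)
register_overlap(D, B)  # empty intersection, nothing stored
nodes = {n.name: n for n in (A, B, D)}

forwarded = point_query(nodes, "A", (4.5, 5.5))
local = point_query(nodes, "A", (1, 1))
```

Neither query touches a root node: the first is answered after one message from A to B, the second by A alone.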

30
Queries
  • Point queries and window queries. The technique
    is similar to the insertion algorithm
  • Search the client image for a server whose MBB
    contains the point or intersects the window
  • Send the query to this server
  • If the server actually covers the point or the
    window, it answers the client; otherwise it
    forwards the query to its parent node
  • A server uses the overlapping information to
    transmit the query
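
The steps above can be sketched for point queries as follows. This is a simplified illustration, not the presentation's algorithm: it leaves out the overlapping information, assumes points only, and invents the `TreeNode` class, the `contact` fallback and the forward counter.

```python
def contains(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class TreeNode:
    """Node of the distributed tree; a node without children is a data node."""

    def __init__(self, name, mbb, children=(), points=()):
        self.name, self.mbb = name, mbb
        self.children, self.points = list(children), list(points)
        self.parent = None
        for child in self.children:
            child.parent = self


def descend(node, p):
    """Collect the data nodes under `node` that hold point p."""
    if not contains(node.mbb, p):
        return []
    if not node.children:
        return [node.name] if p in node.points else []
    return [hit for child in node.children for hit in descend(child, p)]


def point_query(image, contact, p):
    """Return (data nodes holding p, number of forwards to a parent)."""
    # Step 1: look in the client image for a node whose MBB contains p.
    node = next((n for n in image if contains(n.mbb, p)), contact)
    forwards = 0
    # Step 2: a node that does not cover p forwards the query to its parent.
    while not contains(node.mbb, p) and node.parent is not None:
        node = node.parent
        forwards += 1
    return descend(node, p), forwards


d0 = TreeNode("d0", (0, 0, 4, 4), points=[(1, 1)])
d1 = TreeNode("d1", (6, 6, 10, 10), points=[(7, 7)])
r1 = TreeNode("r1", (0, 0, 10, 10), children=[d0, d1])

stale = point_query([d0], d0, (7, 7))      # image knows d0 only
fresh = point_query([d0, d1], d0, (7, 7))  # image already knows d1
```

With the smaller image the query climbs once to the routing node before reaching `d1`; with the larger image it is addressed to `d1` directly.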

31
Experiments
  • Synthetic data (points and rectangles) generated
    with GSTD
  • 50,000 to 500,000 objects
  • 0 to 3,000 queries
  • Server capacity: 3,000 objects
  • Comparison of three SD-Rtree variants
  • BASIC: no image; every query is processed
    top-down from the root
  • IMSERVER: no IAMs among the servers
  • IMCLIENT: client images

32
Cumulative Insert cost
33
Per Insert Cost
34
Cost of balancing
35
Image convergence
36
Distribution of messages
37
Cost per Query
38
Conclusion
  • SD-Rtree is an efficient scalable distributed
    Rtree
  • For very large spatial data collections
  • Can be processed in distributed RAM
  • Access time much faster than to disk data
  • Load balancing
  • Spatial rotations
  • Overlapping management
  • Redundant coverage
  • O(log n) worst-case insert cost
  • Future work
  • kNN-queries
  • Objects distribution balancing on servers

39
SD-Rtree
  • Thank you for your attention
  • Questions? First.Last_at_dauphine.fr