Roger Zimmermann, COMPSAC 2004, September 30

1
Spatial Data Query Support in Peer-to-Peer Systems
  • Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang
  • Computer Science Department
  • University of Southern California
  • Los Angeles, CA 90089
  • COMPSAC 2004

2
Outline
  • Motivation
  • Introduction to DHTs (CAN)
  • Technical Approach
  • Results
  • Conclusions and Future Research

3
Motivation
  • Spatial data sets are used in many applications,
    e.g., GIS and CAD.
  • P2P systems provide a distributed platform that
    is very scalable.
  • Pros:
  • Scalability, no central point of failure
  • Cons:
  • Very dynamic (unreliable), topology maintenance
    required

4
Motivation (cont.)
  • Question: how can P2P systems be used for spatial
    data sharing?
  • Query challenges:
  • Unstructured P2P systems: querying by flooding is
    not efficient
  • Structured P2P systems based on DHTs (Chord,
    CAN): only exact-match queries are supported
    efficiently
  • E.g., searching for files by name/title:
    put(key, value); get(key) returns value

5
Motivation (cont.)
  • Spatial queries are usually range queries
  • Intersect, overlap
  • Nearest neighbor(s) (kNN)
  • DHTs are not suitable without modification

6
Distributed Hash Tables (DHT)
  • DHT systems: Content Addressable Network (CAN),
    Chord, Pastry, etc.
  • A DHT allocates large data sets to many
    nodes with no central control
  • Data objects are distributed nearly uniformly
    by a hash function, resulting in superb
    scalability and load balance
  • Each node maintains only a small routing table
    describing its neighbors
  • Locating a particular data object requires
    O(log N) search steps on average (Chord, Pastry;
    a d-dimensional CAN needs O(d N^(1/d)) hops)

7
Content Addressable Network (CAN)
  • A scalable indexing mechanism in a P2P network
  • Creates a logical d-dimensional Cartesian
    coordinate space
  • Divides the space into zones, where each zone is
    controlled by a node in the system
  • Zones are dynamically partitioned or merged as
    nodes join and leave
  • Each Zone is addressed with a Virtual Identifier
    (VID), which is deterministically calculated from
    the location of the zone

8
Content Addressable Network (CAN)
Example: a 2-D space partitioned into 7 CAN zones
9
Content Addressable Network (cont)
  • Node operations (e.g., insertion):
  • 1. Find a bootstrap node first

10
Content Addressable Network (cont)
  • 2. Randomly choose a point in the CAN plane and
    route the new node from the bootstrap node to the
    chosen location

11
Content Addressable Network (cont)
  • 3. The new node arrives at the destination zone
    covering that point. The destination zone is
    split into two zones, each controlled by one node
    (old and new)

12
Content Addressable Network (cont)
  • 4. Update the neighborhood zone routing
    information
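
The join steps on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CAN protocol itself: the space is the 2-D unit square, a linear scan replaces hop-by-hop routing from the bootstrap node, the zone is split along its longer side, and step 4 (neighbor table updates) is omitted. The names Zone, join, and build are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Zone:
    """Axis-aligned rectangle [x0, x1) x [y0, y1) owned by one node."""
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def join(zones: list, new_owner: str, rng: random.Random) -> Zone:
    """Steps 2 and 3: pick a random point, find its zone, split the zone in half."""
    x, y = rng.random(), rng.random()
    # Assumption: a linear scan stands in for routing from the bootstrap node.
    target = next(z for z in zones if z.contains(x, y))
    # Assumption: split along the longer side (ties go to x).
    if (target.x1 - target.x0) >= (target.y1 - target.y0):
        mid = (target.x0 + target.x1) / 2
        new_zone = Zone(new_owner, mid, target.y0, target.x1, target.y1)
        target.x1 = mid  # the old node keeps the lower half
    else:
        mid = (target.y0 + target.y1) / 2
        new_zone = Zone(new_owner, target.x0, mid, target.x1, target.y1)
        target.y1 = mid
    zones.append(new_zone)
    return new_zone


def build(n_nodes: int, seed: int = 0) -> list:
    """Start with one node owning the whole space, then join the rest."""
    rng = random.Random(seed)
    zones = [Zone("node0", 0.0, 0.0, 1.0, 1.0)]
    for i in range(1, n_nodes):
        join(zones, f"node{i}", rng)
    return zones
```

Calling build(7) yields seven zones that still tile the whole unit square, one per node.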

13
Content Addressable Network (CAN)
  • Data object operations (e.g., insertion)
  • Generate a key based on the object identification
    and insert the data object as a <key, value> pair
  • Map the key to a point P in the CAN plane by
    using a uniform hash function
  • Store the <key, value> pair at the node that
    owns the zone within which the point P is located
  • To retrieve the value, the same hash function is
    applied to the key to regenerate the point P and
    find the zone that owns that point; the node
    owning that zone returns the value to the client
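
A rough sketch of the insert and retrieve operations described on this slide. It assumes a fixed 4 x 4 grid of equal zones in place of dynamically split zones, and SHA-256 as the uniform hash function; GRID, key_to_point, zone_of, and ToyCAN are hypothetical names chosen for illustration.

```python
import hashlib

GRID = 4  # assumption: a fixed 4 x 4 grid of equal zones, one node per zone


def key_to_point(key: str) -> tuple:
    """Hash a key to a point P in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits per coordinate so the float result is exact and below 1.0.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return x, y


def zone_of(point: tuple) -> tuple:
    """Return the (column, row) of the zone that contains the point."""
    x, y = point
    return int(x * GRID), int(y * GRID)


class ToyCAN:
    def __init__(self):
        # One dictionary per zone stands in for the node that owns it.
        self.store = {(i, j): {} for i in range(GRID) for j in range(GRID)}

    def put(self, key: str, value):
        self.store[zone_of(key_to_point(key))][key] = value

    def get(self, key: str):
        # The same hash regenerates P, which identifies the owning zone.
        return self.store[zone_of(key_to_point(key))].get(key)
```

Because get recomputes the same point P from the key, only exact-match lookups work; a key that differs by one character lands in an unrelated zone.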

14
Storing Spatial Data w/ DHTs
  • A hash function distributes data objects evenly
    within the space to achieve a balanced load
  • Spatial locality information needs to be
    preserved for range queries, but applying a hash
    function to spatial data destroys locality
  • Related work explored storing R-tree or
    quad-tree based indexes on a DHT:
  • Harwood et al.: Hashing Spatial Content over
    Peer-to-Peer Networks
  • Mondal et al.: P2PR-tree: An R-tree-based Spatial
    Index for Peer-to-Peer Environments
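
The loss of locality noted on this slide can be illustrated with a small experiment. The sketch below assumes SHA-256 as the uniform hash and measures how far apart the hashed images of two nearly identical coordinates land; hash_point and mean_hashed_distance are hypothetical helper names.

```python
import hashlib
import math


def hash_point(x: float, y: float) -> tuple:
    """Map a coordinate pair to a pseudo-random point in the unit square."""
    digest = hashlib.sha256(f"{x:.6f},{y:.6f}".encode("utf-8")).digest()
    hx = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    hy = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return hx, hy


def mean_hashed_distance(n_pairs: int = 200, gap: float = 0.001) -> float:
    """Average distance between the hashed images of points `gap` apart."""
    total = 0.0
    for i in range(n_pairs):
        x = y = i / n_pairs
        a = hash_point(x, y)
        b = hash_point(x + gap, y + gap)
        total += math.dist(a, b)
    return total / n_pairs
```

The input points in each pair are about 0.0014 apart, yet their hashed images are typically separated by a large fraction of the unit square, so a range query cannot be answered by visiting a small set of adjacent zones.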

15
Spatial Range Query Design for P2P Systems
  • Mapping a physical space to a CAN space
  • Propose a new hash function to map spatial data
    objects onto nodes in a modified CAN system
  • Purpose: allow efficient spatial data query
    execution while at the same time considering load
    balance
  • Calculating the location of zones in the logical
    space: a Virtual Identifier (VID) tree is used
    for mapping purposes

16
Spatial Range Query Design for P2P Systems
  • Approach
  • The object key is generated from three different
    components:
  • (a) Scatter region address: based on the spatial
    location of the object; preserves spatial
    locality
  • (b) Zone address: randomized; achieves load
    balance
  • (c) Object identifier (hashed)
  • The scatter region size is fixed and predetermined
17
Spatial Range Query Design for P2P Systems
(cont.)
  • The value of the zone bit string is chosen
    randomly, and the object identifier is the hash
    of the data content
  • The VID tree is created with its height
    determined by the scatter region size
  • The maximum number of zones is 2^(a+b), where a
    and b are the bit lengths of components (a)
    and (b)
  • The trade-off between data locality and load
    balance can be adjusted along a spectrum

18
Spatial Range Query Design for P2P Systems
(cont.)
  • Scatter region (11000), e.g., a = 5 bits
  • [Figure: grid of scatter regions; one axis is
    labeled 000 to 111, the other 00 to 11]

19
Spatial Range Query Design for P2P Systems
(cont.)
  • Zones, e.g., b = 4 bits
  • [Figure: grid of zones; one axis is labeled
    000 to 111, the other 00 to 11]

20
Spatial Range Query Design for P2P Systems
(cont.)
  • System Operation and Spatial Range Query
  • Node Operation
  • Bootstrap mechanism
  • Node join mechanism
  • Zone split and the search threshold
  • Balance the number of data objects in each zone
  • The zone selected for a split must be larger than
    the minimum zone size (1/2^(a+b))
  • The threshold is the upper bound on the number of
    search hops to find a zone to split
  • Data Object Insertion
  • Data Object Deletion
  • Spatial Range Query
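
The slide does not spell out how a zone is chosen for splitting, so the following is only one possible reading: a breadth-first search over neighboring zones, bounded by the hop threshold, that returns the most heavily loaded zone still larger than the minimum zone size. The function name and all parameters are hypothetical.

```python
from collections import deque


def choose_zone_to_split(start, neighbors, load, size, min_size, threshold):
    """Return the most loaded splittable zone within `threshold` hops, or None.

    neighbors: dict zone -> list of adjacent zones
    load:      dict zone -> number of stored data objects
    size:      dict zone -> fraction of the space covered by the zone
    """
    best = None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        zone, hops = queue.popleft()
        # Only zones above the minimum size may be split further.
        if size[zone] > min_size and (best is None or load[zone] > load[best]):
            best = zone
        # The threshold bounds how far the search may travel.
        if hops < threshold:
            for nxt in neighbors[zone]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return best
```

A larger threshold lets a joining node find a more heavily loaded zone at the cost of more search hops; a threshold of 0 always splits the zone the node first arrives at, provided it is large enough.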

21
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query, Step 1: The querying node
launches a spatial range query.
22
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query, Step 2: The node determines
the overlapping scatter regions.
23
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query, Step 3: The node multicasts
the query to the overlapping scatter regions.
24
Spatial Range Query Design for P2P Systems
(cont.)
  • Step 4
  • The range query is multicast within all
    overlapping scatter regions (M-CAN).
  • Recall: data is randomized within each scatter
    region, so an exhaustive search is necessary
  • Choice of scatter region size:
  • Large: good load balance (data is uniform within
    a scatter region)
  • Small: the exhaustive search covers less area
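
The query steps can be sketched as follows, assuming scatter regions form a fixed grid over the unit square and are addressed by column bits followed by row bits. A dictionary lookup per region stands in for the multicast, and the inner loop stands in for the exhaustive search inside each region. X_BITS, Y_BITS, overlapping_regions, and range_query are hypothetical names.

```python
X_BITS, Y_BITS = 3, 2  # assumption: 8 columns x 4 rows of scatter regions


def overlapping_regions(xmin, ymin, xmax, ymax):
    """Step 2: addresses of scatter regions intersecting the query rectangle."""
    cols, rows = 2**X_BITS, 2**Y_BITS
    c0 = min(int(xmin * cols), cols - 1)
    c1 = min(int(xmax * cols), cols - 1)
    r0 = min(int(ymin * rows), rows - 1)
    r1 = min(int(ymax * rows), rows - 1)
    return [
        format(c, f"0{X_BITS}b") + format(r, f"0{Y_BITS}b")
        for c in range(c0, c1 + 1)
        for r in range(r0, r1 + 1)
    ]


def range_query(regions, rect):
    """Steps 3 and 4: visit each overlapping region and test every object.

    regions: dict region address -> list of (x, y, obj) tuples
    rect:    (xmin, ymin, xmax, ymax)
    """
    xmin, ymin, xmax, ymax = rect
    hits = []
    for addr in overlapping_regions(*rect):
        # Objects are randomized inside a region, so all of them are checked.
        for x, y, obj in regions.get(addr, []):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                hits.append(obj)
    return hits
```

With more bits per axis the regions shrink, so fewer objects outside the query rectangle are examined, but each region then has fewer zones over which to balance load.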

25
Conclusions and Future Research Directions
  • We proposed a hash function that preserves
    spatial locality information while maintaining
    constrained load balance
  • The proposed mechanism works well with the CAN P2P
    architecture
  • We are currently running simulations to test our
    approach

26
Thank you! Questions?