Roger Zimmermann, COMPSAC 2004, September 30

1
Spatial Data Query Support in Peer-to-Peer Systems

Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang
Computer Science Department
University of Southern California
Los Angeles, CA 90089
COMPSAC 2004

2
Outline

Motivation
Introduction to DHTs (CAN)
Technical Approach
Results
Conclusions and Future Research

3
Motivation

Spatial data sets are used in many applications,
e.g., GIS and CAD
P2P systems provide a distributed platform that
is very scalable
Pros: scalability, no central point of failure
Cons: very dynamic (unreliable), topology maintenance
required

4
Motivation (cont.)

Question: how can P2P systems be used for spatial data
sharing?
Query challenges:
Unstructured P2P systems: querying by flooding is
not efficient
Structured P2P systems based on DHTs (Chord,
CAN): only exact-match queries are supported
efficiently
E.g., searching for files based on their
names/titles: put(key, value); get(key) returns
the value
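
The exact-match limitation can be illustrated with a minimal sketch. This is not any real DHT implementation: the class name ToyDHT, the modulo-based node assignment, and the sample key and value are all assumptions made for illustration.

```python
import hashlib


class ToyDHT:
    """Toy stand-in for a DHT: keys are hashed onto a fixed set of nodes."""

    def __init__(self, num_nodes=4):
        # Each "node" is just a local dict in this sketch.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, key):
        # Hash the key and map the digest to a node index.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)


dht = ToyDHT()
dht.put("song.mp3", "peer 10.0.0.7")   # hypothetical file name and location
exact = dht.get("song.mp3")            # the exact key is found
partial = dht.get("song")              # a partial key is not: hashing hides similarity
```

Only a lookup with the identical key reaches the stored value; there is no way to ask for "all keys near this one" without visiting every node.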

5
Motivation (cont.)

Spatial queries are usually range queries
Intersect, overlap
Nearest neighbor(s) (kNN)
DHTs are not suitable without modification

6
Distributed Hash Tables (DHT)

DHT systems: Content Addressable Network (CAN),
Chord, Pastry, etc.
A DHT allocates large data sets to many
nodes with no central control
Data objects are distributed nearly uniformly
by a hash function, resulting in good
scalability and load balance
Each node maintains only a small routing table
describing its neighbors
Locating a particular data object requires
O(log N) search steps on average

7
Content Addressable Network (CAN)

A scalable indexing mechanism in a P2P network
Creates a logical d-dimensional Cartesian
coordinate space
Divides the space into zones, where each zone is
controlled by a node in the system
Zones are dynamically partitioned or merged as
nodes join and leave
Each zone is addressed with a Virtual Identifier
(VID), which is deterministically calculated from
the location of the zone

8
Content Addressable Network (CAN)
Example: a 2-D space partitioned into 7 CAN zones
9
Content Addressable Network (cont.)

Node Operations
(e.g., Insertion)
1. Find a bootstrap node first

10
Content Addressable Network (cont.)

2. Randomly choose a point in the CAN plane and
route the new node from the bootstrap node to the
chosen location

11
Content Addressable Network (cont.)

3. The new node arrives at the destination zone
covering that point. The destination zone is
split into two zones, each controlled by one node
(old and new)

12
Content Addressable Network (cont.)

4. Update the routing information of the
neighboring zones
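
Steps 2 to 4 of the node insertion can be sketched as follows. This is a simplified illustration under stated assumptions, not the CAN protocol itself: routing from the bootstrap node is skipped, the space is the unit square without wrap-around, zones are split along their longer side, and the names Zone, join, and are_neighbors are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Zone:
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def join(zones, new_node, point):
    """Split the zone covering `point`; the new node takes one half."""
    x, y = point
    old = next(z for z in zones if z.contains(x, y))
    # Assumption: split along the longer side so zones stay roughly square.
    if (old.x1 - old.x0) >= (old.y1 - old.y0):
        mid = (old.x0 + old.x1) / 2
        new = Zone(new_node, mid, old.y0, old.x1, old.y1)
        old.x1 = mid
    else:
        mid = (old.y0 + old.y1) / 2
        new = Zone(new_node, old.x0, mid, old.x1, old.y1)
        old.y1 = mid
    zones.append(new)
    return new


def are_neighbors(a, b):
    """True if two zones share a border segment of positive length."""
    overlap_x = min(a.x1, b.x1) - max(a.x0, b.x0)
    overlap_y = min(a.y1, b.y1) - max(a.y0, b.y0)
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)


rng = random.Random(0)
zones = [Zone("n0", 0.0, 0.0, 1.0, 1.0)]      # the first node owns everything
for i in range(1, 7):                          # six more nodes join
    join(zones, f"n{i}", (rng.random(), rng.random()))

# Step 4: each node's routing table would list the zones bordering its own.
routing = {z.owner: [o.owner for o in zones if o is not z and are_neighbors(z, o)]
           for z in zones}
```

After six joins the unit square is partitioned into seven zones, one per node, and every node has at least one neighbor in its routing table.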

13
Content Addressable Network (CAN)

Data Object Operation (e.g., Insertion)
Generate a key based on the object identification
and insert the data object as a <key, value> pair
Map the key to a point P in the CAN plane by
using a uniform hash function
Store the <key, value> pair at the node which
owns the zone within which the point P is located
To retrieve the value, the same hash function is
applied to the key to regenerate the point P and
find the zone that owns that point; the node that
owns the zone returns the value to the client
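
The insertion and retrieval steps above can be sketched as follows. The fixed 2 x 2 grid of equal zones, the use of SHA-256 as the uniform hash, and the names key_to_point, owner_zone, insert, and retrieve are assumptions for illustration; a real CAN has irregular zones and routes the request hop by hop.

```python
import hashlib

GRID = 2  # hypothetical layout: 2 x 2 equal zones, one node per zone
stores = {(i, j): {} for i in range(GRID) for j in range(GRID)}


def key_to_point(key):
    """Hash a key to a point P in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 53 bits per coordinate so the float result is always below 1.0.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return x, y


def owner_zone(point):
    """Return the grid index of the zone that contains the point."""
    x, y = point
    return int(x * GRID), int(y * GRID)


def insert(key, value):
    zone = owner_zone(key_to_point(key))
    stores[zone][key] = value
    return zone


def retrieve(key):
    # The same hash regenerates P, which leads to the same zone.
    return stores[owner_zone(key_to_point(key))].get(key)


zone = insert("parcel-42", {"type": "polygon"})   # hypothetical object
value = retrieve("parcel-42")
```

Because the point P depends only on the key, any node can recompute it and locate the owning zone without a central directory.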

14
Storing Spatial Data w/ DHTs

Hash function distributes data objects evenly
within the space to achieve a balanced load
Spatial locality information needs to be
preserved for range queries. Applying a hash
function to spatial data will destroy locality
Related work explored storing R-tree- or
quadtree-based indexes on a DHT
Harwood et al.: Hashing Spatial Content over
Peer-to-Peer Networks
Mondal et al.: P2PR-tree: An R-tree-based Spatial
Index for Peer-to-Peer Environments
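
The loss of locality under hashing can be seen in a small experiment. This sketch assumes an 8 x 8 grid of cells, SHA-256 as the hash, and hypothetical helper names hashed_cell and spatial_cell; it compares where 100 nearly identical coordinates end up under each addressing scheme.

```python
import hashlib


def hashed_cell(x, y, bits=3):
    """Cell chosen by hashing the coordinates (uniform, ignores locality)."""
    digest = hashlib.sha256(f"{x},{y}".encode("utf-8")).digest()
    side = 1 << bits
    return digest[0] % side, digest[1] % side


def spatial_cell(x, y, bits=3):
    """Cell chosen directly from the coordinates (keeps neighbors together)."""
    side = 1 << bits
    return int(x * side), int(y * side)


# 100 points packed into a strip about 0.01 wide.
points = [(0.50 + i * 0.0001, 0.50) for i in range(100)]

cells_hashed = {hashed_cell(x, y) for x, y in points}
cells_spatial = {spatial_cell(x, y) for x, y in points}
```

The directly addressed points all fall into one cell, while the hashed points are scattered over many cells, so a small range query would have to contact many nodes.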

15
Spatial Range Query Design for P2P Systems

Mapping a physical space to a CAN space
Propose a new hash function to map spatial data
objects onto nodes over a modified CAN system
Purpose: allow efficient spatial data query
execution while at the same time considering load
balance
Calculating the location of zones in the logical
space: a Virtual Identifier (VID) tree is used for
the mapping

16
Spatial Range Query Design for P2P Systems

Approach
The object key is generated from three different
components
(a) Scatter region address: based on the spatial
location of the object; preserves spatial
locality
(b) Zone address: randomized; achieves load
balance
(c) Object identifier (hashed)
The scatter region size is fixed and predetermined

17
Spatial Range Query Design for P2P Systems
(cont.)

The value of the zone bit string is chosen randomly,
and the object identifier is the hash of the data
content
The VID tree is created with its height
determined by the scatter region size
The maximum number of zones is 2^(a+b), where a is
the number of scatter region bits and b the number
of zone bits
The trade-off between data locality and load
balance can be tuned along a spectrum
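
The three-part key can be sketched as follows. The slides do not spell out how the scatter region address is derived from the object's location, so this sketch assumes an address built by alternately halving the unit square along x and y; the names region_address and make_key, the use of SHA-1 for the object identifier, and the sample objects are also assumptions. The values a = 5 and b = 4 follow the examples given on the next slides.

```python
import hashlib
import random

A_BITS = 5  # scatter region bits (a)
B_BITS = 4  # zone bits (b)


def region_address(x, y, a=A_BITS):
    """a-bit address from alternately halving the unit square along x and y."""
    lo, hi, p = [0.0, 0.0], [1.0, 1.0], (x, y)
    bits = []
    for i in range(a):
        d = i % 2                      # 0 = split along x, 1 = split along y
        mid = (lo[d] + hi[d]) / 2
        if p[d] >= mid:
            bits.append("1")
            lo[d] = mid
        else:
            bits.append("0")
            hi[d] = mid
    return "".join(bits)


def make_key(x, y, content, rng=random):
    region = region_address(x, y)                            # (a) keeps locality
    zone = format(rng.getrandbits(B_BITS), f"0{B_BITS}b")    # (b) random, spreads load
    obj_id = hashlib.sha1(content).hexdigest()               # (c) hashed identifier
    return region, zone, obj_id


rng = random.Random(1)
k1 = make_key(0.80, 0.10, b"building A", rng)
k2 = make_key(0.81, 0.11, b"building B", rng)
max_zones = 2 ** (A_BITS + B_BITS)   # 2^(a+b)
```

Two nearby objects receive the same scatter region address but independently drawn zone bits, so they stay in the same region while being spread over its zones. Increasing a relative to b favors locality; increasing b favors load balance.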

18
Spatial Range Query Design for P2P Systems
(cont.)

Scatter region (11000), e.g., a = 5 bits
[Figure: 2-D space with axis labels 000 to 111 and 00 to 11]
19
Spatial Range Query Design for P2P Systems
(cont.)

Zones, e.g., b = 4 bits
[Figure: 2-D space with axis labels 000 to 111 and 00 to 11]
20
Spatial Range Query Design for P2P Systems
(cont.)

System Operation and Spatial Range Query
Node Operation
Bootstrap mechanism
Node join mechanism
Zone split and the search threshold
Balance the number of data objects in each zone
The zone being selected must be larger than the
minimum zone size (1/2^(a+b))
The threshold is the upper bound on the number of
search hops to find a zone to split
Data Object Insertion
Data Object Deletion
Spatial Range Query

21
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 1: The querying node
launches a spatial range query.
22
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 2: The node determines
the overlapping scatter regions.
23
Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 3: The node multicasts
the query to the overlapping scatter regions.
24
Spatial Range Query Design for P2P Systems
(cont.)

Step 4:
The range query is multicast within all
overlapping scatter regions (M-CAN).
Recall: data is randomized within each scatter
region, so an exhaustive search of each
overlapping region is necessary
Choice of scatter region size:
Large: good load balance (uniform within a
scatter region)
Small: exhaustive search covers less area

25
Conclusions and Future Research Directions

We proposed a hash function that preserves
spatial locality information while maintaining
a constrained load balance
The proposed mechanism works well with the CAN P2P
architecture
We are currently running simulations to test our
approach

26
Thank you! Questions?
