Title: Roger Zimmermann, COMPSAC 2004, September 30
1Spatial Data Query Support in Peer-to-Peer Systems
- Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang
- Computer Science Department
- University of Southern California
- Los Angeles, CA 90089
- COMPSAC 2004
2Outline
- Motivation
- Introduction to DHTs (CAN)
- Technical Approach
- Results
- Conclusions and Future Research
3Motivation
- Spatial data sets are used for many applications, e.g., GIS, CAD
- P2P systems provide a distributed platform that is very scalable.
- Pros
- Scalability, no central point of failure
- Cons
- Very dynamic (unreliable), topology maintenance
required
4Motivation (cont.)
- Question: how to use P2P systems for spatial data sharing.
- Query Challenges
- Unstructured P2P systems: querying by flooding is not efficient
- Structured P2P systems based on DHTs (Chord, CAN): only exact-match queries are supported efficiently
- E.g., search files based on their names/titles: put(key, value); get(key) returns value
5Motivation (cont.)
- Spatial queries are usually range queries
- Intersect, overlap
- Nearest neighbor(s) (kNN)
- DHTs are not suitable without modification
6Distributed Hash Tables (DHT)
- DHT systems: Content Addressable Network (CAN), Chord, Pastry, etc.
- Using DHT to allocate large data sets to many nodes with no central control
- Data objects are near uniformly distributed through a hash function, resulting in superb scalability and load balance
- Each node only maintains a small routing table to know its neighbors
- Locating a particular data object requires O(log N) search steps on average (Chord, Pastry); CAN needs O(d N^(1/d)) hops
7Content Addressable Network (CAN)
- A scalable indexing mechanism in a P2P network
- Creates a logical d-dimensional Cartesian coordinate space
- Divides the space into zones, where each zone is controlled by a node in the system
- Zones are dynamically partitioned or merged as nodes join and leave
- Each zone is addressed with a Virtual Identifier (VID), which is deterministically calculated from the location of the zone
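A minimal sketch of how a VID could be derived from a zone's location. The slides do not give the rule, so the following is an assumption: the unit space is halved one dimension at a time, cycling through the dimensions, and each halving contributes one bit. The function name `vid_for_point` is hypothetical.

```python
def vid_for_point(point, depth):
    """Return the bit-string VID of the depth-level zone containing point.

    Assumption (not from the slides): the unit space is halved one
    dimension at a time, cycling through the dimensions; '0' marks the
    lower half and '1' the upper half.
    """
    lo = [0.0] * len(point)
    hi = [1.0] * len(point)
    bits = []
    for level in range(depth):
        dim = level % len(point)           # dimension split at this level
        mid = (lo[dim] + hi[dim]) / 2
        if point[dim] < mid:
            bits.append("0")
            hi[dim] = mid                  # keep the lower half
        else:
            bits.append("1")
            lo[dim] = mid                  # keep the upper half
    return "".join(bits)


print(vid_for_point((0.9, 0.1), 2))  # 10
```

Under this assumption, zones that share a long VID prefix are also close together in the coordinate space.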
8Content Addressable Network (CAN)
Example: A 2-D space partitioned into 7 CAN zones
9Content Addressable Network (cont)
- Node Operations (e.g., Insertion)
- 1. Find a bootstrap node first
10Content Addressable Network (cont)
- 2. Randomly choose a point in the CAN plane and
route the new node from the bootstrap node to the
chosen location
11Content Addressable Network (cont)
- 3. The new node arrives at the destination zone
covering that point. The destination zone is
split into two zones, each controlled by one node
(old and new)
12Content Addressable Network (cont)
- 4. Update the neighborhood zone routing
information
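The node insertion steps can be sketched roughly as follows, for a 2-D space. This is a toy model under stated assumptions: routing from the bootstrap node is replaced by a linear scan over all zones, the zone is split along its longer side, and the neighbor-table update of step 4 is omitted. `Zone` and `join` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Zone:
    lo: tuple      # lower corner (inclusive)
    hi: tuple      # upper corner (exclusive)
    owner: str

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def join(zones, new_owner, rng):
    # Step 2: choose a random point; a linear scan stands in for routing.
    p = (rng.random(), rng.random())
    target = next(z for z in zones if z.contains(p))
    # Step 3: split the destination zone in half along its longer side.
    dim = max(range(2), key=lambda i: target.hi[i] - target.lo[i])
    mid = (target.lo[dim] + target.hi[dim]) / 2
    lower_hi = list(target.hi)
    lower_hi[dim] = mid
    upper_lo = list(target.lo)
    upper_lo[dim] = mid
    zones.remove(target)
    zones.append(Zone(target.lo, tuple(lower_hi), target.owner))  # old node
    zones.append(Zone(tuple(upper_lo), target.hi, new_owner))     # new node
    return p


zones = [Zone((0.0, 0.0), (1.0, 1.0), "node0")]
rng = random.Random(0)
for i in range(1, 7):
    join(zones, f"node{i}", rng)
print(len(zones))  # 7
```

Each join adds exactly one zone, so six joins on a single initial zone give the seven zones of the earlier example, although not necessarily with the same shapes.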
13Content Addressable Network (CAN)
- Data Object Operation (e.g., Insertion)
- Generate a key based on the object identification and insert the data object as a <key, value> pair
- Map the key to a point P in the CAN plane by using a uniform hash function
- Store the <key, value> pair at the node that owns the zone within which the point P is located
- To retrieve the value, the same hash function is applied to the key to regenerate the point P and find the zone that owns that point; the node owning that zone returns the value to the client
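A minimal sketch of the insert and retrieve path, assuming SHA-256 as the uniform hash function and a fixed set of four zones. `key_to_point`, `Zone`, and `ToyCAN` are hypothetical names, and the owner lookup is a linear scan rather than real CAN routing.

```python
import hashlib


def key_to_point(key, d=2):
    """Hash a key to a point P in the unit d-dimensional space."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Four digest bytes per coordinate, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(d)
    )


class Zone:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.store = {}          # <key, value> pairs held by the owner node

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


class ToyCAN:
    def __init__(self, zones):
        self.zones = zones

    def _owner(self, key):
        p = key_to_point(key)
        return next(z for z in self.zones if z.contains(p))

    def put(self, key, value):
        self._owner(key).store[key] = value

    def get(self, key):
        # The same hash regenerates P, so the same zone is found again.
        return self._owner(key).store.get(key)


zones = [Zone((0.0, 0.0), (0.5, 0.5)), Zone((0.5, 0.0), (1.0, 0.5)),
         Zone((0.0, 0.5), (0.5, 1.0)), Zone((0.5, 0.5), (1.0, 1.0))]
can = ToyCAN(zones)
can.put("map_tile_17.png", "stored at some peer")
print(can.get("map_tile_17.png"))  # stored at some peer
```

Only the exact key finds the object; a similar key hashes to an unrelated point, which is why this interface alone does not support range queries.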
14Storing Spatial Data w/ DHTs
- Hash function distributes data objects evenly within the space to achieve a balanced load
- Spatial locality information needs to be preserved for range queries. Applying a hash function to spatial data will destroy locality
- Related work explored storing R-tree or Quad-tree based index on DHT
- Harwood et al., "Hashing Spatial Content over Peer-to-Peer Networks"
- Mondal et al., "P2PR-tree: An R-tree-based Spatial Index for Peer-to-Peer Environments"
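The loss of locality under hashing can be illustrated with a small experiment. This is a sketch under assumptions: an 8 x 8 grid of cells stands in for the zones, and SHA-256 stands in for the uniform hash function. `locality_cell` and `hashed_cell` are hypothetical names.

```python
import hashlib

GRID = 8  # hypothetical 8 x 8 grid of cells over the unit square


def locality_cell(x, y):
    """Cell chosen directly from the coordinates (keeps neighbours together)."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))


def hashed_cell(x, y):
    """Cell chosen by hashing the coordinates (spreads neighbours apart)."""
    digest = hashlib.sha256(f"{x:.6f},{y:.6f}".encode()).digest()
    return (digest[0] % GRID, digest[1] % GRID)


# 1000 objects packed into one small neighbourhood near (0.10, 0.10).
points = [(0.10 + i * 1e-5, 0.10 + i * 1e-5) for i in range(1000)]

print(len({locality_cell(x, y) for x, y in points}))  # 1
print(len({hashed_cell(x, y) for x, y in points}))    # many distinct cells
```

With the direct mapping, a range query over that neighbourhood touches one cell but that cell carries the whole load; with hashing the load is spread out but the query has to visit cells all over the space.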
15Spatial Range Query Design for P2P Systems
- Mapping a physical space to a CAN space
- Propose a new hash function to map spatial data objects onto nodes over a modified CAN system
- Purpose: allow efficient spatial data query execution while at the same time considering load balance
- Calculating the location of zones in the logical space: Virtual Identifier (VID) tree for mapping purposes
16Spatial Range Query Design for P2P Systems
- Approach
- Object key is generated with three different components
- (a) Scatter region address: derived from the object's spatial location, which preserves spatial locality
- (b) Zone address: randomized, which achieves load balance
- (c) Object identifier (hashed)
- The scatter region size is fixed and predetermined
17Spatial Range Query Design for P2P Systems
(cont.)
- The value of the zone bit string is decided randomly, and the object identifier is the hash of the data content
- The VID tree is created with its height determined by the scatter region size
- The maximum number of zones is 2^(a+b), where a and b are the bit lengths of the scatter region address and the zone address
- The relationship between data locality and load balance can be determined along a spectrum
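A minimal sketch of the three-part key, reading the maximum zone count as 2^(a+b) with a scatter-region bits and b zone bits. Several details are assumptions rather than the authors' definitions: the a = 5 region bits are split into 2 row bits followed by 3 column bits, b = 2, and SHA-1 is used for the object identifier. `scatter_region_bits` and `make_key` are hypothetical names.

```python
import hashlib
import random

X_BITS, Y_BITS = 3, 2   # hypothetical split of a = 5 scatter-region bits
B = 2                   # hypothetical number of zone-address bits


def scatter_region_bits(x, y):
    """(a) Region address from the location: row bits, then column bits."""
    col = min(int(x * 2**X_BITS), 2**X_BITS - 1)
    row = min(int(y * 2**Y_BITS), 2**Y_BITS - 1)
    return format(row, f"0{Y_BITS}b") + format(col, f"0{X_BITS}b")


def make_key(x, y, content, rng=random):
    region = scatter_region_bits(x, y)               # (a) keeps locality
    zone = format(rng.getrandbits(B), f"0{B}b")      # (b) random, for balance
    object_id = hashlib.sha1(content).hexdigest()    # (c) hash of the content
    return region, zone, object_id


region, zone, object_id = make_key(0.01, 0.99, b"parcel-42")
print(region)                        # 11000
print(2 ** (X_BITS + Y_BITS + B))    # 128, the maximum number of zones
```

Moving bits from b to a shrinks the scatter regions, which favors locality; moving them the other way favors load balance.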
18Spatial Range Query Design for P2P Systems
(cont.)
- Scatter region (11000), e.g., a = 5 bits
[Figure: grid of scatter regions; columns labeled 000 to 111, rows labeled 11 to 00]
19Spatial Range Query Design for P2P Systems
(cont.)
[Figure: grid of scatter regions; columns labeled 000 to 111, rows labeled 11 to 00]
20Spatial Range Query Design for P2P Systems
(cont.)
- System Operation and Spatial Range Query
- Node Operation
- Bootstrap mechanism
- Node join mechanism
- Zone split and the search threshold
- Balance the number of data objects in each zone
- The zone being selected must be larger than the minimum zone size (1/2^(a+b))
- The threshold is the upper bound on the number of search hops to find a zone to split
- Data Object Insertion
- Data Object Deletion
- Spatial Range Query
21Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 1: The querying node launches a spatial range query.
22Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 2: The node determines the overlapping scatter regions.
23Spatial Range Query Design for P2P Systems
(cont.)
Spatial Range Query Step 3: The node multicasts the query to the overlapping scatter regions.
24Spatial Range Query Design for P2P Systems
(cont.)
- Step 4
- The range query is multicast within all overlapping scatter regions (M-CAN).
- Recall: data is randomized within each scatter region, so an exhaustive search is necessary
- Choice of scatter region size
- Large: good load balance (uniform within a scatter region)
- Small: exhaustive search covers less area
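Determining the overlapping scatter regions for a rectangular query can be sketched as follows. The layout is an assumption, not taken from the slides: an 8 x 4 grid of regions over the unit square, addressed by 2 row bits followed by 3 column bits. `overlapping_regions` is a hypothetical name, and a query that only touches a region's edge is conservatively counted as overlapping it.

```python
X_BITS, Y_BITS = 3, 2   # hypothetical: 8 columns x 4 rows of scatter regions


def _cell(v, bits):
    """Index of the grid cell containing coordinate v, clamped to the grid."""
    n = 2 ** bits
    return min(max(int(v * n), 0), n - 1)


def overlapping_regions(x0, y0, x1, y1):
    """Addresses of all scatter regions touched by the query rectangle."""
    cols = range(_cell(x0, X_BITS), _cell(x1, X_BITS) + 1)
    rows = range(_cell(y0, Y_BITS), _cell(y1, Y_BITS) + 1)
    return [
        format(r, f"0{Y_BITS}b") + format(c, f"0{X_BITS}b")
        for r in rows
        for c in cols
    ]


print(overlapping_regions(0.0, 0.8, 0.2, 0.9))  # ['11000', '11001']
```

Every zone inside each returned region then has to be searched, so fewer address bits (larger regions) mean fewer regions to contact but more zones to visit per region.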
25Conclusions and Future Research Directions
- We proposed a hash function to preserve both spatial locality information and constrained load balance
- The proposed mechanism works well with the CAN P2P architecture
- We are currently running simulations to test our approach
26Thank you! Questions?