Title: A Peer-to-peer Framework for Caching Range Queries
1A Peer-to-peer Framework for Caching Range Queries
- Ozgur D. Sahin
- Abhishek Gupta
- Divyakant Agrawal
- Amr El Abbadi
Department of Computer Science, University of California at Santa Barbara
2Outline
- Motivation
- Range mapping
- System overview
- Experimental results
- Conclusion and future work
3Motivation
- All queries are answered by the server
- Server is overloaded
- Scalability, availability
- Same/similar queries are evaluated multiple times
Central Data Server
Clients
4Motivation
- Users share their cached answers
- Server is contacted only if the P2P layer cannot
find an answer
Central Data Server
P2P Cache
Clients
5P2P Systems
- File sharing: Napster, Gnutella, KaZaA, ...
- Central index or flooding
- Structured P2P systems: CAN, Chord, Pastry, Tapestry, ...
- DHT/DOLR
- Efficient routing: logarithmic/sublinear
6CAN
- Uses a d-dimensional virtual space for routing and object location
- Virtual space is partitioned into zones and each zone is maintained by a peer
- Every peer is responsible for the objects that are hashed into its zone
2-dimensional CAN
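As a rough illustration of how an object ends up at a peer in a 2-dimensional CAN, the sketch below hashes a key to a point and looks up the zone that contains it. Everything here is an assumption made for illustration: the unit-square coordinate space, the use of SHA-1, the function names `key_to_point` and `owner_of`, and the four-peer layout in `ZONES`.

```python
import hashlib
from typing import Dict, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)


def key_to_point(key: str) -> Point:
    """Hash a key to a point in the unit square [0,1) x [0,1) (assumed space)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[0:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return (x, y)


def owner_of(point: Point, zones: Dict[str, Rect]) -> str:
    """Return the peer whose zone contains the point."""
    for peer, (x_lo, x_hi, y_lo, y_hi) in zones.items():
        # Half-open zones so that a point belongs to exactly one peer.
        if x_lo <= point[0] < x_hi and y_lo <= point[1] < y_hi:
            return peer
    raise LookupError("no zone covers this point")


# Hypothetical layout: four peers, one quadrant each.
ZONES: Dict[str, Rect] = {
    "peer1": (0.0, 0.5, 0.0, 0.5),
    "peer2": (0.5, 1.0, 0.0, 0.5),
    "peer3": (0.0, 0.5, 0.5, 1.0),
    "peer4": (0.5, 1.0, 0.5, 1.0),
}
```

Calling `owner_of(key_to_point("some-object"), ZONES)` returns the peer responsible for that object in this toy layout.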
7Extending DHT functionality
- DHTs are designed for exact-match queries
- Piazza (Univ. of Washington), Hyperion (Univ. of Toronto), PIER (UC Berkeley)
- Extend DHTs for supporting range queries
- Selection of ranges is a primary operation for any kind of data analysis
- Main goal: utilize a DHT in order to materialize and locate cached answers of range queries
8Range Queries
- Given a range query, find the cached answers that can be used to compute the query answer
- Example: if the result of <20,35> is already cached in the system, then the query <22,30> can be answered using the cached result
- <22,30> is subsumed by <20,35>, so the cached result is a superset of the answer
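The subsumption test in this example can be sketched in a few lines. This is only an illustration: the names `subsumes` and `filter_answer` are hypothetical, ranges are assumed to be inclusive integer pairs, and the cached result is modeled as a plain list of attribute values.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), assumed inclusive on both sides


def subsumes(cached: Range, query: Range) -> bool:
    """True if the cached range covers every value the query asks for."""
    return cached[0] <= query[0] and query[1] <= cached[1]


def filter_answer(cached_values: List[int], query: Range) -> List[int]:
    """Trim a cached superset down to the values inside the query range."""
    return [v for v in cached_values if query[0] <= v <= query[1]]
```

With the cached range `(20, 35)`, `subsumes((20, 35), (22, 30))` holds, so the query can be answered locally by filtering the cached values.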
9DHTs for locating ranges
- Can we use original DHTs?
Finds exact answers but not the similar ones!
10Extending CAN
- For a single attribute, the virtual space is a 2-dimensional CAN
- The boundaries are determined by the domain of the range attribute
Virtual space when attribute domain is [20,80]
11Mapping Scheme
- Range <x,y> is mapped to point (x,y)
- Super-ranges are only in the upper-left region
Figure axes: Start value (horizontal), End value (vertical)
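A minimal sketch of this mapping, assuming ranges are given as (start, end) pairs with start on the horizontal axis and end on the vertical axis. The names `range_to_point` and `in_upper_left` are hypothetical helpers, not part of the system.

```python
from typing import Tuple

Point = Tuple[float, float]


def range_to_point(start: float, end: float) -> Point:
    """Map the range <start, end> to the point (start, end)."""
    if start > end:
        raise ValueError("a range needs start <= end")
    return (start, end)


def in_upper_left(candidate: Point, query: Point) -> bool:
    """True if `candidate` lies up and to the left of `query`.

    A smaller start (left) and a larger end (up) is exactly the
    condition for the candidate range to be a super-range.
    """
    return candidate[0] <= query[0] and candidate[1] >= query[1]
```

For example, `<20,35>` maps to (20,35), which lies in the upper-left region of (22,30), the point for `<22,30>`.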
12Space Partitioning
- Virtual space is partitioned into rectangular zones
- Each zone is assigned to an active peer
- With this mapping, the data source is responsible for the top-left zone
13Space Partitioning
14Space Partitioning
S
15Space Partitioning
S
16Space Partitioning
S
- Each active peer keeps a list of passive peers
- Passive peers register with active peers
17Zone Split
- An active peer splits its zone when it is overloaded
- Load can be due to storage or bandwidth, etc.
- Split line is selected by the owner of the zone
- Even partitioning of the zone and the cached results
- New zone is assigned to a passive peer
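One possible shape of a zone split is sketched below. The slides leave the choice of split line to the zone owner, so the rule used here (halve the longer side at its midpoint) is an assumption, as are the `Zone` class and the `split_zone` function. The second zone returned is the one that would be handed to a passive peer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Zone:
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    points: List[Point] = field(default_factory=list)  # indexed cached ranges

    def contains(self, p: Point) -> bool:
        return self.x_lo <= p[0] < self.x_hi and self.y_lo <= p[1] < self.y_hi


def split_zone(zone: Zone) -> Tuple[Zone, Zone]:
    """Split a zone in half along its longer side (assumed split rule)."""
    width = zone.x_hi - zone.x_lo
    height = zone.y_hi - zone.y_lo
    if width >= height:
        mid = zone.x_lo + width / 2
        kept = Zone(zone.x_lo, mid, zone.y_lo, zone.y_hi)
        new = Zone(mid, zone.x_hi, zone.y_lo, zone.y_hi)
    else:
        mid = zone.y_lo + height / 2
        kept = Zone(zone.x_lo, zone.x_hi, zone.y_lo, mid)
        new = Zone(zone.x_lo, zone.x_hi, mid, zone.y_hi)
    # Redistribute the index entries between the two halves.
    for p in zone.points:
        (kept if kept.contains(p) else new).points.append(p)
    return kept, new
```

A midpoint split divides the zone evenly but not necessarily the cached results; splitting at the median coordinate of the stored points would be an alternative if an even share of entries matters more.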
18Routing
- Same as in CAN (Greedy routing)
- Each zone passes the message to the neighbor
closest to the destination
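Greedy routing can be sketched as follows. The zone layout, the neighbor table, and the use of Euclidean point-to-rectangle distance as the measure of closeness are assumptions for illustration; `distance_to_rect` and `greedy_route` are hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)


def distance_to_rect(p: Point, r: Rect) -> float:
    """Euclidean distance from a point to a zone (0 if the point is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[1])
    dy = max(r[2] - p[1], 0.0, p[1] - r[3])
    return (dx * dx + dy * dy) ** 0.5


def greedy_route(start: str, dest: Point, zones: Dict[str, Rect],
                 neighbors: Dict[str, List[str]]) -> List[str]:
    """Return the sequence of zones a message visits on its way to `dest`."""
    path = [start]
    current = start
    while distance_to_rect(dest, zones[current]) > 0:
        nxt = min(neighbors[current],
                  key=lambda n: distance_to_rect(dest, zones[n]))
        if distance_to_rect(dest, zones[nxt]) >= distance_to_rect(dest, zones[current]):
            break  # no neighbor is closer; stop instead of looping
        path.append(nxt)
        current = nxt
    return path
```

Each hop only needs the coordinates of the current zone's neighbors, which is what keeps per-peer routing state small.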
19Sharing cached answers
- Map the range to a point and send a notification message towards that point
- Destination peer keeps the index information
A
20Querying
- Map the range to a point and send a query message towards that point
- Destination peer searches the local index
A
P
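The index kept at a destination peer might look like the sketch below. It covers only the local bookkeeping, not the routing of notification and query messages. The `ZoneIndex` class, its method names, and the assumption that a search returns every indexed range that subsumes the query are illustrative choices, not details taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end), assumed inclusive


class ZoneIndex:
    """Index held by the active peer of one zone (hypothetical structure)."""

    def __init__(self) -> None:
        # Maps an indexed range to the peers that cache its result.
        self._holders: Dict[Range, List[str]] = defaultdict(list)

    def notify(self, cached: Range, holder: str) -> None:
        """Record that `holder` has cached the answer for `cached`."""
        self._holders[cached].append(holder)

    def search(self, query: Range) -> List[Tuple[Range, str]]:
        """Return (range, holder) pairs whose range subsumes the query."""
        hits = []
        for (start, end), holders in self._holders.items():
            if start <= query[0] and query[1] <= end:
                hits.extend(((start, end), h) for h in holders)
        return hits
```

The index stores only pointers to the peers holding a result, so the querying peer would still fetch the data from the holder it picks.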
21Forwarding
- If no result is found at the destination, then
- The zones on the upper-left region may have super-ranges
- Destination zone forwards the request to upper-left zones
22Acceptable Fit
- Forwarding is controlled by a parameter, AcceptableFit
- It is a real value in [0,1]
- offset = AcceptableFit × |domain|
- Acceptable range for a range query <low, high> is then <low - offset, high + offset>
Figure: query point (50,55) with its two offset margins
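A small worked sketch of the acceptable range, assuming that offset is AcceptableFit times the length of the attribute domain and that the acceptable range widens the query by offset on both sides. The function names and the example values (domain [20,80], AcceptableFit 0.25) are chosen for illustration only.

```python
from typing import Tuple

Range = Tuple[float, float]


def acceptable_range(query: Range, domain: Range, acceptable_fit: float) -> Range:
    """Widen the query by offset = acceptable_fit * |domain| on each side."""
    if not 0.0 <= acceptable_fit <= 1.0:
        raise ValueError("AcceptableFit must lie in [0, 1]")
    offset = acceptable_fit * (domain[1] - domain[0])
    return (query[0] - offset, query[1] + offset)


def is_acceptable(cached: Range, query: Range, domain: Range,
                  acceptable_fit: float) -> bool:
    """True if `cached` subsumes the query and stays within the acceptable range."""
    low, high = acceptable_range(query, domain, acceptable_fit)
    return low <= cached[0] <= query[0] and query[1] <= cached[1] <= high
```

With these values, the query `<50,55>` has offset 15 and acceptable range `<35,70>`, so a cached `<40,60>` is acceptable while `<30,60>` is rejected as too loose a fit.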
23Forwarding Schemes
- Two schemes for forwarding
- Flooding: flood to all candidate zones
- Directed forwarding: iteratively forward to a single neighbor, the one that has the largest overlap with the acceptable region
- Stop if a result is found or a certain number of peers are contacted (DirectedLimit)
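Directed forwarding could be sketched as below. The acceptable region is modeled as a rectangle in the virtual space, overlap is measured by area, and already visited zones are skipped; all three are assumptions, as are the names `overlap_area`, `directed_forward`, and the `has_result` callback standing in for a peer's local index search.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a[1], b[1]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[2], b[2])
    return max(width, 0.0) * max(height, 0.0)


def directed_forward(start: str, region: Rect, zones: Dict[str, Rect],
                     neighbors: Dict[str, List[str]],
                     has_result: Callable[[str], bool],
                     directed_limit: int) -> Optional[str]:
    """Forward one neighbor at a time; return the zone with a result, if any."""
    current = start
    visited = {start}
    contacted = 0
    while contacted < directed_limit:
        candidates = [n for n in neighbors[current]
                      if n not in visited and overlap_area(zones[n], region) > 0]
        if not candidates:
            return None  # nowhere left to forward
        current = max(candidates, key=lambda n: overlap_area(zones[n], region))
        visited.add(current)
        contacted += 1
        if has_result(current):
            return current
    return None  # DirectedLimit reached without finding a result
```

Compared with flooding, this contacts at most `directed_limit` peers per query, at the cost of possibly missing a cached result that sits in a zone off the chosen path.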
24Flooding vs. Directed Forwarding
25Updates
- Tuple with value 40 is updated!
Go to the corresponding point, (40,40), and
flood to the upper-left region
- Costly, so we need better solutions
- Batching updates
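The reason for flooding the upper-left region of (40,40) is that a cached range contains the updated value exactly when its start is at most 40 and its end is at least 40. A minimal sketch of that check, assuming inclusive ranges and a hypothetical helper name `affected_by_update`:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), assumed inclusive


def affected_by_update(cached_ranges: List[Range], value: int) -> List[Range]:
    """Return the cached ranges that contain the updated value.

    In the virtual space these are the points in the upper-left
    region of (value, value).
    """
    return [(s, e) for (s, e) in cached_ranges if s <= value <= e]
```

For an update to the value 40, the cached ranges `<30,45>` and `<40,40>` are affected, while `<20,35>` and `<41,60>` are not.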
26Multiple range attributes
- Each attribute maps to two dimensions
- A range query over k attributes is mapped to a
point in 2k-dimensional CAN
(20 < A < 40, 50 < B < 60)
(- , - , - , -)
(20,40,50,60)
- Forwarding
- Decreasing coordinates along odd dimensions
- Increasing coordinates along even dimensions
(10 < A < 50, 40 < B < 70) → (10,50,40,70)
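The multi-attribute mapping and the forwarding directions can be sketched together. The helper names `ranges_to_point` and `is_super_range_point` are hypothetical; dimensions are counted from 1, so odd dimensions hold start values and even dimensions hold end values.

```python
from typing import Sequence, Tuple

Range = Tuple[float, float]


def ranges_to_point(ranges: Sequence[Range]) -> Tuple[float, ...]:
    """Map k ranges to a point in 2k dimensions: (start1, end1, ..., startk, endk)."""
    point = []
    for start, end in ranges:
        point.extend((start, end))
    return tuple(point)


def is_super_range_point(candidate: Sequence[float], query: Sequence[float]) -> bool:
    """True if `candidate` may hold a super-range of `query`.

    Odd dimensions (index 0, 2, ...) are starts and must not exceed the
    query's; even dimensions (index 1, 3, ...) are ends and must not be smaller.
    """
    if len(candidate) != len(query):
        return False
    return all(c <= q if i % 2 == 0 else c >= q
               for i, (c, q) in enumerate(zip(candidate, query)))
```

Here (10,50,40,70) qualifies as a super-range point for (20,40,50,60), which is why forwarding decreases coordinates along odd dimensions and increases them along even ones.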
27Experiment Settings
- Single attribute with domain [0,500]
- The system is initially empty
- Range queries are selected uniformly at random
- For every zone
- Split Point = 5, Routing Threshold = 3
28Flooding vs. Directed Forwarding
Performance with flood forwarding
Performance with directed forwarding
29Routing is scalable
Visited zones with Flood forwarding
Visited zones with Directed forwarding
30Load Distribution
1000 peers, 10000 queries
31Conclusion and Future Work
- We presented a simple yet powerful mapping for ranges which allows us to leverage DHT infrastructure for range queries
- Limitations/Future Work
- Number of attributes should be fixed
- Does not work with other DHTs
- Assumes the existence of passive peers for load balancing
32Questions?
odsahin_at_cs.ucsb.edu
http://www.cs.ucsb.edu/dsl/gaia.html