Title: Scalable P2P Search
1Scalable P2P Search
- Daniel A. Menascé
- George Mason University
2Outline
- Motivation behind P2P systems
- Considerations
- Resource-Location Problem
- Probabilistic Search Protocol
- Protocol Performance
- Conclusion
- Other P2P Efforts
3Motivation (1)
- Client-server model
- Underuse of the Internet's bandwidth
- Increasing load on dedicated servers
- Servers vulnerable to attacks
- Single point of failure
4Motivation (2)
- P2P systems rely on individual computers' computing power and storage capacity
- Better utilize bandwidth
- Distribute load in a self-organizing manner
- Robust to random attacks
- Especially if the P2P system exhibits the small-world property (most peers have few links to other peers)
- Enhance reliability, leading to fault tolerance
- Do not rely on dedicated servers
5Motivation (3)
- Some application areas of P2P systems
- Distributed directory systems
- E-commerce models
- Web service discovery
6Considerations
- P2P nodes
- act as both clients and servers
- form an application network and route messages
- e.g., to locate a service
- The design of communication (messaging) protocols is crucial for efficiency
- Naive approaches usually fail
- e.g., Gnutella's flood routing adds traffic overhead
7Resource Location Problem
- Deterministic approach: given a resource name, find the node or nodes that manage the resource
- May not be feasible in a very large network
- Probabilistic approach: given a resource name, find with a given probability the node or nodes that manage the resource
8Probabilistic Search Protocol (1)
- Trades off performance and scalability against the probability Pf that a resource will be located
- Goal: achieve a probability Pf close to 1 at a much lower cost than in the deterministic case
9Probabilistic Search Protocol (2)
- Cost Measurement
- Number of messages exchanged
- Bandwidth used
- Number of peers contacted
- Abbreviations and Terms
- LD: Local Directory
- DC: Directory Cache
- N(s): Neighborhood of peer s
- Nodes that are one hop away or in the same LAN
10Probabilistic Search Protocol (3)
- The LD can be managed without intervention of the P2P system
- Each resource has a unique, location-independent global identifier (GUID)
- Can be computed with a hash function (e.g., SHA-1) or assigned according to the resource managed (e.g., the ISBN in the case of a book)
- The DC points to the presumed location of resources managed by other peers (it also contains the mapping from GUID to physical (network) address)
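The GUID and DC ideas above can be illustrated with a minimal Python sketch. It is not the author's implementation: the names `guid_for` and `DirectoryCache`, the choice to hash the resource name, and the example addresses are all hypothetical. The LRU eviction is borrowed from the experimental model listed later in these slides; the slides do not prescribe it as the cache policy.

```python
import hashlib
from collections import OrderedDict


def guid_for(resource_name: str) -> str:
    """Derive a location-independent GUID by hashing the resource name with SHA-1.

    Assumption: the name is the hashed input; the slides only say a hash
    function such as SHA-1 can be used.
    """
    return hashlib.sha1(resource_name.encode("utf-8")).hexdigest()


class DirectoryCache:
    """Hypothetical DC: maps a GUID to the presumed network address of its manager.

    Entries are evicted in least-recently-used order once `capacity` is exceeded.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()  # GUID -> address, oldest first

    def put(self, guid: str, address: str) -> None:
        if guid in self._entries:
            self._entries.move_to_end(guid)  # refresh recency
        self._entries[guid] = address
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def lookup(self, guid: str):
        """Return the cached address, or None if the GUID is not cached."""
        if guid not in self._entries:
            return None
        self._entries.move_to_end(guid)  # a hit counts as a use
        return self._entries[guid]
```

A peer would call `dc.put(guid_for("some-resource"), "192.0.2.7:4000")` when it learns where a resource lives, and `dc.lookup(...)` before deciding whether to forward a search. The entry is only a presumed location, so a hit still has to be confirmed by the peer it points to.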
11Probabilistic Search Protocol (4)
A peer-to-peer computing system
12Probabilistic Search Protocol (5)
The algorithm
13Probabilistic Search Protocol (6)
An example scenario (SearchRequest messages
propagation)
14Probabilistic Search Protocol (7)
An example scenario (ResourceFound messages
propagation)
15Probabilistic Search Protocol (8)
- Two basic operations
- SearchRequest(src, res, RevPath, TTL)
- ResourceFound(src, res, RevPath, v)
- res, being searched for by source src, was found at peer v
- Broadcast probability, p
- Can be adjusted according to the path traversed
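The two operations can be approximated in a small simulation. The sketch below is a simplification under stated assumptions, not the algorithm from the slides: the function name `probabilistic_search` and its data structures are hypothetical, the DC lookup is omitted, the source is assumed to always broadcast, p is held fixed rather than adjusted along the path, and a `contacted` set suppresses repeated processing (something the slides list as an unaddressed issue). The returned `rev_path` stands in for the route a ResourceFound message would follow back to the source.

```python
import random
from collections import deque


def probabilistic_search(graph, holders, src, res, p, ttl, rng):
    """Simulate SearchRequest propagation with broadcast probability p.

    graph:   dict mapping each peer to an iterable of its neighbours
    holders: dict mapping a peer to the set of resources in its local directory
    Returns (peer_where_found_or_None, rev_path, contacted_peers).
    """
    contacted = set()
    queue = deque([(src, [src], ttl)])  # (peer, path from source, remaining TTL)
    while queue:
        peer, rev_path, remaining = queue.popleft()
        if peer in contacted:
            continue  # assumption: each peer handles a request only once
        contacted.add(peer)
        if res in holders.get(peer, ()):
            # ResourceFound(src, res, RevPath, v) would travel back along rev_path
            return peer, rev_path, contacted
        if remaining == 0:
            continue  # TTL exhausted, drop the request
        # Assumption: the source always forwards; other peers forward with probability p
        if peer == src or rng.random() < p:
            for neighbour in graph[peer]:
                queue.append((neighbour, rev_path + [neighbour], remaining - 1))
    return None, [], contacted
```

Repeating the call over many random sources and resources and counting successes gives an empirical estimate of Pf, and the size of `contacted` relative to the number of peers gives the fraction of peers involved in a search.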
16Protocol Performance (1)
Model: 120 peers, average node degree 5
17Protocol Performance (2)
- When p becomes 0.6, Pf reaches the value of 0.9 while only 10.5% of the peers are involved in the search
- A relatively small p can generate a reasonable Pf for finding a resource at a very low cost
- A small fraction of the peer nodes are involved in the search
18Protocol Performance (3)
- Implementation on a randomly generated topology
- Each computer on a LAN simulates multiple peer nodes
- For N peers, generate a regular graph with each node having k neighbours
- Rewire the graph
- If the graph becomes disconnected, restart the process
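The generation steps above might look roughly like the following Python sketch. It rests on assumptions the slides do not confirm: the regular graph is taken to be a ring lattice (so k must be even), rewiring follows a Watts-Strogatz-style rule in which each edge is reattached to a random endpoint with a fixed probability, and all function names are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Regular graph: each node is linked to its k nearest ring neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k // 2 + 1):
            w = (v + d) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, prob, rng):
    """With probability prob, move the far end of an edge to a random non-neighbour."""
    n = len(adj)
    for v in range(n):
        for w in sorted(adj[v]):  # snapshot, so adj[v] can be modified safely
            if w > v and rng.random() < prob:  # w > v: visit each lattice edge once
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if not candidates:
                    continue
                u = rng.choice(candidates)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(u)
                adj[u].add(v)


def is_connected(adj):
    """Depth-first traversal from node 0; connected if every node is reached."""
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def generate_topology(n, k, prob, seed=None):
    """Build and rewire a lattice, restarting whenever the result is disconnected."""
    rng = random.Random(seed)
    while True:
        adj = ring_lattice(n, k)
        rewire(adj, prob, rng)
        if is_connected(adj):
            return adj
```

For instance, `generate_topology(30, 4, 0.1)` produces a 30-peer graph in which rewiring preserves the total number of links, so the average degree remains 4 even though individual degrees vary.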
19Protocol Performance (4)
Model: N = 30, k = 4, P(rewiring) = 0.1, size(DC) = 1% of Res, LRU
20Protocol Performance (5)
- The maximum value of the curve occurs when p = 0.5
- At this value, a resource will be found with
probability 0.84 within 3.4 hops from the source
on average
21Conclusion
- Issues that have not been addressed
- Cache replacement policies
- Cache invalidation options
- Provision to avoid broadcasting a message more than once
- Optimization
- Sending the result directly to the search source before traversing the path backwards would improve the response time
22Other P2P Efforts
- Gnutella (Used mainly for file sharing)
- Uses flood routing to broadcast queries
- Gridella (A Gnutella-compatible system)
- Reduces bandwidth by superimposing a binary tree on top of the P2P network
- Freenet (Creates a virtual file system)
- Pools unused disk space across many computers
- JXTA (A suite of protocols)
- Uses specialized peers that can register with each other
- A middle ground between centralized and decentralized approaches