Scalable P2P Search - PowerPoint PPT Presentation

1
Scalable P2P Search
  • Daniel A. Menascé
  • George Mason University

2
Outline
  • Motivation behind P2P systems
  • Considerations
  • Resource-Location Problem
  • Probabilistic Search Protocol
  • Protocol Performance
  • Conclusion
  • Other P2P Efforts

3
Motivation (1)
  • Limitations of the client-server model:
  • Underuse of the Internet's bandwidth
  • Increasing load on dedicated servers
  • Servers are vulnerable to attacks
  • Single point of failure

4
Motivation (2)
  • P2P systems rely on the computing power and
    storage capacity of individual computers
  • Better utilize bandwidth
  • Distribute load in a self-organizing manner
  • Robust to random attacks
  • Especially if the P2P system exhibits the
    small-world property (most peers have few links
    to other peers)
  • Enhance reliability and fault tolerance
  • Do not rely on dedicated servers

5
Motivation (3)
  • Some application areas of P2P systems
  • Distributed directory systems
  • E-commerce models
  • Web service discovery

6
Considerations
  • P2P nodes:
  • act as both clients and servers
  • form an application network and route messages
    (e.g., to locate a service)
  • The design of the communication (messaging)
    protocols is crucial for efficiency
  • Naive approaches usually fail
  • e.g., Gnutella's flood routing adds traffic
    overhead

7
Resource Location Problem
  • Deterministic approach: given a resource name,
    find the node or nodes that manage the resource
  • May not be feasible in a very large network
  • Probabilistic approach: given a resource name,
    find, with a given probability, the node or nodes
    that manage the resource

8
Probabilistic Search Protocol (1)
  • Trades some of the probability Pf that a resource
    will be located for better performance and
    scalability
  • Goal: achieve a Pf close to 1 at a much lower
    cost than in the deterministic case

9
Probabilistic Search Protocol (2)
  • Cost metrics:
  • Number of messages exchanged
  • Bandwidth used
  • Number of peers contacted
  • Abbreviations and terms:
  • LD: local directory
  • DC: directory cache
  • N(s): neighborhood of peer s
  • Nodes that are one hop away or on the same LAN

10
Probabilistic Search Protocol (3)
  • The LD can be managed without intervention of
    the P2P system
  • Each resource has a unique, location-independent
    global identifier (GUID)
  • The GUID can be computed with a hash function
    (e.g., SHA-1) or assigned according to the
    resource managed (e.g., the ISBN of a book)
  • The DC points to the presumed location of
    resources managed by other peers (it also maps
    each GUID to a physical, i.e., network, address)
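The two directories can be illustrated with a minimal Python sketch. It assumes a GUID is derived by SHA-1 hashing of a resource name, using the standard `hashlib` module, or is taken from an external identifier such as an ISBN. The `Peer` class, its methods, the "isbn:" prefix and the example addresses are hypothetical. The sketch shows that an LD lookup is purely local, while the DC only holds GUID-to-address hints that may be stale.

```python
from __future__ import annotations

import hashlib


def guid_from_name(resource_name: str) -> str:
    """Location-independent GUID: SHA-1 digest of the resource name (40 hex chars)."""
    return hashlib.sha1(resource_name.encode("utf-8")).hexdigest()


def guid_from_isbn(isbn: str) -> str:
    """Alternative: reuse an identifier assigned to the resource, e.g. a book's ISBN.

    The "isbn:" prefix is a hypothetical convention that keeps the two ID spaces apart.
    """
    return "isbn:" + isbn.replace("-", "")


class Peer:
    def __init__(self, address: str):
        self.address = address  # network address of this peer (hypothetical format)
        self.local_directory: set[str] = set()  # LD: GUIDs managed by this peer
        self.directory_cache: dict[str, str] = {}  # DC: GUID -> presumed holder's address

    def publish(self, guid: str) -> None:
        """Register a locally managed resource in the LD (no P2P interaction needed)."""
        self.local_directory.add(guid)

    def lookup(self, guid: str) -> str | None:
        """Address of the peer believed to manage guid: LD first, then DC; None if unknown."""
        if guid in self.local_directory:
            return self.address
        return self.directory_cache.get(guid)
```

For example, a peer at "10.0.0.1:4000" can publish `guid_from_name("book.pdf")` in its LD. Another peer that has cached that GUID in its DC then resolves it to "10.0.0.1:4000" without contacting anyone.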

11
Probabilistic Search Protocol (4)
A peer-to-peer computing system

12
Probabilistic Search Protocol (5)
The algorithm

13
Probabilistic Search Protocol (6)
An example scenario (propagation of SearchRequest
messages)

14
Probabilistic Search Protocol (7)
An example scenario (propagation of ResourceFound
messages)

15
Probabilistic Search Protocol (8)
  • Two basic operations:
  • SearchRequest(src, res, RevPath, TTL)
  • ResourceFound(src, res, RevPath, v): resource
    res, searched for by source src, was found at
    peer v
  • Broadcast probability p
  • Can be adjusted according to the path traversed
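The algorithm itself appears only as a figure on slide 12, so the following is a minimal Python sketch reconstructed from the two message types above. It is not the author's code, and it rests on several assumptions:

- A peer receiving a SearchRequest checks its LD, then its DC.
- On a hit, the peer answers with a ResourceFound sent back along RevPath, and every peer on that path caches res -> v in its DC.
- Otherwise, while the TTL allows, the peer forwards the request to each neighbor independently with probability p.

Never forwarding to a peer already on RevPath is an extra assumption. A fixed p is also a simplification, since the slide allows p to vary with the path. The names `Peer`, `search` and `resource_found` are hypothetical.

```python
from __future__ import annotations

import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    neighbors: list[str] = field(default_factory=list)
    local_directory: set[str] = field(default_factory=set)  # LD: resources managed here
    directory_cache: dict[str, str] = field(default_factory=dict)  # DC: res -> presumed holder


def resource_found(peers: dict[str, Peer], res: str, rev_path: list[str], v: str) -> int:
    """Send ResourceFound(src, res, RevPath, v) back along RevPath.

    Every peer on the reverse path (except the one that answered) caches res -> v
    in its DC. Returns the number of messages, i.e. hops back to src.
    """
    for hop in reversed(rev_path[:-1]):
        peers[hop].directory_cache[res] = v
    return len(rev_path) - 1


def search(peers: dict[str, Peer], src: str, res: str, ttl: int, p: float,
           rng: random.Random) -> tuple[str | None, int]:
    """Process SearchRequest(src, res, RevPath, TTL) messages breadth-first.

    Returns (peer v where res was found, or None; total messages sent).
    """
    messages = 0
    queue = deque([(src, [src], ttl)])  # (current peer, RevPath so far, TTL left)
    while queue:
        current, rev_path, ttl_left = queue.popleft()
        peer = peers[current]
        # Assumption: a DC hit is answered like an LD hit, using the cached address.
        v = current if res in peer.local_directory else peer.directory_cache.get(res)
        if v is not None:
            messages += resource_found(peers, res, rev_path, v)
            return v, messages
        if ttl_left == 0:
            continue
        for n in peer.neighbors:
            # Assumption: never forward to a peer already on the reverse path.
            if n not in rev_path and rng.random() < p:
                messages += 1
                queue.append((n, rev_path + [n], ttl_left - 1))
    return None, messages
```

With p = 1 the search behaves like TTL-limited flooding. Lowering p sends fewer messages at the price of a lower chance (Pf) of reaching a peer that knows the resource. After a successful search, peers on the reverse path can answer later requests from their DC.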

16
Protocol Performance (1)
Model: 120 peers, average node degree 5

17
Protocol Performance (2)
  • When p reaches 0.6, Pf reaches 0.9 while only
    10.5% of the peers are involved in the search
  • A relatively small p can yield a reasonable Pf
    for finding a resource at a very low cost
  • Only a small fraction of the peer nodes is
    involved in the search

18
Protocol Performance (3)
  • Implementation on a randomly generated topology
  • Each computer on a LAN simulates multiple peer
    nodes
  • For N peers, generate a regular graph in which
    each node has k neighbors
  • Rewire the graph
  • If the graph becomes disconnected, restart the
    process
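Slide 18 describes the topology generator only in outline. A plausible reading, assumed here, is the Watts-Strogatz small-world construction:

1. Start from a ring lattice in which each of the N nodes is linked to its k nearest neighbors (k even).
2. With probability P(rewiring), move one end of each edge to a random node.
3. Regenerate from scratch if the result is disconnected.

The function names are hypothetical, and the exact rewiring rule used in the original experiments may differ.

```python
from __future__ import annotations

import random
from collections import deque


def ring_lattice(n: int, k: int) -> dict[int, set[int]]:
    """Regular graph: node v is linked to the k/2 nearest nodes on each side of a ring."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and 0 < k < n")
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            u = (v + offset) % n
            graph[v].add(u)
            graph[u].add(v)
    return graph


def is_connected(graph: dict[int, set[int]]) -> bool:
    """Breadth-first search from an arbitrary node; connected if every node is reached."""
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(graph)


def rewire(graph: dict[int, set[int]], p_rewire: float,
           rng: random.Random) -> dict[int, set[int]]:
    """With probability p_rewire, move the far end of each original edge to a random node."""
    edges = [(v, u) for v in graph for u in graph[v] if v < u]
    for v, u in edges:
        if rng.random() < p_rewire:
            # New endpoint: any node other than v that is not already v's neighbor.
            candidates = [w for w in graph if w != v and w not in graph[v]]
            if candidates:
                w = rng.choice(candidates)
                graph[v].discard(u)
                graph[u].discard(v)
                graph[v].add(w)
                graph[w].add(v)
    return graph


def generate_topology(n: int, k: int, p_rewire: float,
                      seed: int | None = None) -> dict[int, set[int]]:
    """Build a rewired lattice, restarting whenever the result is disconnected."""
    rng = random.Random(seed)
    while True:
        graph = rewire(ring_lattice(n, k), p_rewire, rng)
        if is_connected(graph):
            return graph
```

Rewiring keeps the number of edges (N·k/2) constant, so the average node degree stays at k. Under this reading, the model on the next slide would correspond to `generate_topology(30, 4, 0.1)`.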

19
Protocol Performance (4)
Model: N = 30, k = 4, P(rewiring) = 0.1,
size(DC) = 1% of resources, LRU replacement

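The model on slide 19 uses a DC with least-recently-used (LRU) replacement, whose size is tied to the number of resources. One minimal way to sketch such a cache in Python is with `collections.OrderedDict`. The class and helper names are hypothetical, and reading the garbled caption as "1% of the resources" is an assumption.

```python
from collections import OrderedDict


class LRUDirectoryCache:
    """Directory cache (DC): GUID -> presumed peer address, with LRU replacement."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def get(self, guid):
        """Return the cached address (refreshing its recency) or None on a miss."""
        if guid not in self._entries:
            return None
        self._entries.move_to_end(guid)  # now the most recently used entry
        return self._entries[guid]

    def put(self, guid, address):
        """Insert or update an entry, evicting the least recently used one if full."""
        self._entries[guid] = address
        self._entries.move_to_end(guid)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # first entry = least recently used

    def __len__(self):
        return len(self._entries)


def dc_capacity(num_resources: int, percent: int = 1) -> int:
    """DC size as a percentage of the number of resources (at least one entry)."""
    return max(1, num_resources * percent // 100)
```

The OrderedDict keeps entries in recency order: `move_to_end` marks an entry as recently used, and `popitem(last=False)` evicts the oldest. Both run in constant time.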
20
Protocol Performance (5)
  • The curve reaches its maximum at p = 0.5
  • At this value, a resource is found with
    probability 0.84 within 3.4 hops of the source,
    on average

21
Conclusion
  • Issues that have not been addressed:
  • Cache replacement policies
  • Cache invalidation options
  • A provision to avoid broadcasting a message more
    than once
  • Optimization:
  • Sending the result directly to the search source,
    before traversing the path backwards, would
    improve (reduce) the response time
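For the first open issue, avoiding repeated broadcasts, one possible approach is for each peer to remember which requests it has already forwarded and to drop later copies. A minimal sketch follows. It assumes a hypothetical per-source request identifier (SearchRequest as listed on slide 15 has no such field) and a bounded memory that forgets the oldest keys. The class name is also hypothetical.

```python
from collections import OrderedDict


class SeenRequests:
    """Bounded memory of (src, request_id) keys a peer has already forwarded."""

    def __init__(self, max_entries: int = 10_000):
        self.max_entries = max_entries
        self._seen: OrderedDict = OrderedDict()

    def first_time(self, src: str, request_id: int) -> bool:
        """Return True (and remember the key) only the first time it is seen."""
        key = (src, request_id)
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # forget the oldest key
        return True
```

A peer would forward a SearchRequest only when `first_time` returns True. Because old keys are forgotten, a very late duplicate could still be forwarded. The TTL still bounds that case.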

22
Other P2P Efforts
  • Gnutella (used mainly for file sharing)
  • Uses flood routing to broadcast queries
  • Gridella (a Gnutella-compatible system)
  • Reduces bandwidth by superimposing a binary tree
    on top of the P2P network
  • Freenet (creates a virtual file system)
  • Pools unused disk space across many computers
  • JXTA (a suite of protocols)
  • Uses specialized peers that can register with
    each other
  • A middle ground between centralized and
    decentralized approaches