GIA: Making Gnutella-like P2P Systems Scalable

1
GIA: Making Gnutella-like P2P Systems Scalable
  • Yatin Chawathe
  • Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau

2
The Peer-to-peer Phenomenon
  • Internet-scale distributed system
  • Distributed file-sharing applications
  • E.g., Napster, Gnutella, KaZaA
  • File sharing is the dominant P2P app
  • Mass-market
  • Mostly music, some video, software

3
The Problem
  • Potentially millions of users
  • Wide range of heterogeneity
  • Large transient user population
  • Existing search solutions cannot scale
  • Flooding-based solutions limit capacity
  • Distributed Hash Tables (DHTs) not necessarily
    appropriate

4
Our Solution: GIA
  • Scalable Gnutella-like P2P system
  • Design principles
  • Explicitly account for node heterogeneity
  • Query load proportional to node capacity
  • Results
  • GIA outperforms Gnutella by 3 to 5 orders of
    magnitude

5
Outline
  • Existing approaches
  • GIA: Scalable Gnutella
  • Results: Simulations, Experiments
  • Conclusion

6
Gnutella
  • Distributed search and download
  • Unstructured ad-hoc topology
  • Peers connect to random nodes
  • Random search
  • Flood queries across network
  • Scaling problems
  • As network grows, search overhead increases

[Figure: Gnutella flooding example with peers P1 to P6. Query: "who has madonna?" P2 has madonna-ray-of-light.mp3; P4 has madonna-american-life.mp3.]

7
Distributed Hash Tables (DHTs)
  • Structured solution
  • Given a filename, find its location
  • Can DHTs do file sharing?
  • Probably, but with lots of extra work: caching,
    keyword searching
  • Do we need DHTs?
  • Not necessarily: DHTs are great at finding rare files, but
    most queries are for popular files

8
Other Solutions
  • Supernodes (KaZaA)
  • Classify nodes as low- or high-capacity
  • Only pushes the problem to a bigger scale
  • Random walks (Lv et al.)
  • Forwarding is blind
  • Queries can get stuck in overloaded nodes
  • Biased random walks (Adamic et al.)
  • Right idea, but exacerbates overloaded-node
    problem

9
Outline
  • Existing approaches
  • GIA: Scalable Gnutella
  • Results: Simulations, Experiments
  • Conclusion

10
GIA: 10,000-foot view
  • Unstructured, but takes node capacity into account
  • High-capacity nodes have room for more queries,
    so send most queries to them
  • Will work only if high-capacity nodes:
  • Have correspondingly more answers, and
  • Are easily reachable from other nodes
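
The idea of steering queries toward high-capacity nodes can be sketched as a single forwarding step plus a walk that repeats it. This is a minimal illustration, not the GIA implementation: the function names, the `capacity` dictionary, and the rule of avoiding already-visited nodes are all assumptions made for the example.

```python
import random


def next_hop(neighbors, capacity, visited=frozenset(), rng=random):
    """Pick the neighbor to forward a query to, preferring high capacity.

    neighbors: list of node ids
    capacity:  dict mapping node id -> advertised capacity (hypothetical units)
    visited:   nodes this query has already passed through
    """
    neighbors = list(neighbors)
    # Prefer unvisited neighbors; fall back to all neighbors if none remain.
    candidates = [n for n in neighbors if n not in visited] or neighbors
    if not candidates:
        return None
    best = max(capacity[n] for n in candidates)
    # Break ties randomly so equal-capacity neighbors share the load.
    return rng.choice([n for n in candidates if capacity[n] == best])


def walk(graph, capacity, has_answer, start, max_hops=50):
    """Walk from `start` until a node in `has_answer` is reached.

    Returns the path taken, or None if no answer is found within max_hops.
    """
    path, node = [start], start
    for _ in range(max_hops):
        if node in has_answer:
            return path
        node = next_hop(graph[node], capacity, visited=set(path))
        if node is None:
            break
        path.append(node)
    return path if node in has_answer else None


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
capacity = {"a": 1, "b": 10, "c": 1, "d": 100}
path = walk(graph, capacity, has_answer={"d"}, start="a")
```

In this toy graph the query leaves `a` through `b` (the higher-capacity neighbor) and then reaches `d`. The sketch also shows why the two conditions on the slide matter: the bias only pays off if the high-capacity node is reachable and actually holds an answer.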

11
GIA Design
  • Make high-capacity nodes easily reachable: dynamic topology adaptation
  • Make high-capacity nodes have more answers: one-hop replication
  • Search efficiently: biased random walks
  • Prevent overloaded nodes: active flow control

12
Dynamic Topology Adaptation
  • Make high-capacity nodes have high degree (i.e.,
    more neighbors)
  • Per-node level of satisfaction, S
  • S = 0: no neighbors; S = 1: enough neighbors
  • Function of
  • Node's capacity
  • Neighbors' capacities
  • Neighbors' degrees
  • Their age
  • When S << 1, look for neighbors aggressively
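
One plausible way to turn these inputs into a satisfaction level is sketched below. The formula is an assumption for illustration, since the slide only lists the inputs: each neighbor is assumed to split its capacity evenly across its own neighbors, the shares are summed and normalized by the node's own capacity, and neighbor age is ignored. The names `satisfaction`, `adaptation_interval`, `max_neighbors`, `max_interval`, and `aggressiveness` are hypothetical.

```python
def satisfaction(own_capacity, neighbors, max_neighbors=128):
    """Return S in [0, 1]; `neighbors` is a list of (capacity, degree) pairs."""
    if not neighbors:
        return 0.0  # no neighbors: completely unsatisfied
    if len(neighbors) >= max_neighbors:
        return 1.0  # neighbor table is full: stop looking
    # Assumption: a neighbor of degree d offers capacity / d to each peer.
    share = sum(cap / deg for cap, deg in neighbors if deg > 0)
    return min(1.0, share / own_capacity)


def adaptation_interval(s, max_interval=10.0, aggressiveness=256.0):
    """Seconds to wait before the next search for neighbors.

    Assumed shape: the wait shrinks exponentially as S falls below 1,
    so a node with S << 1 looks for neighbors aggressively.
    """
    return max_interval * aggressiveness ** -(1.0 - s)


s_low = satisfaction(100, [(100, 10)])   # one shared neighbor, large own capacity
s_full = satisfaction(10, [(100, 10)])   # same neighbor, small own capacity
```

Under these assumptions a high-capacity node stays unsatisfied longer than a low-capacity node with the same neighbors, which pushes it toward a higher degree.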

13
Active Flow Control
  • Accept queries based on capacity
  • Actively allocate tokens to neighbors
  • Send a query to a neighbor only if we have received
    a token from it
  • Incentives for advertising true capacity
  • High capacity neighbors get more tokens
  • Allocate tokens with weighted fair queuing
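
The token mechanism can be sketched in two parts: a receiver that splits its token budget among neighbors, and a sender that forwards a query only when it holds a token. This sketch substitutes a simple capacity-proportional split (largest-remainder rounding) for weighted fair queuing, so it illustrates the incentive, not the scheduler. `allocate_tokens`, `TokenGate`, and the integer budget are hypothetical names and simplifications.

```python
def allocate_tokens(token_budget, neighbor_capacity):
    """Split an integer token budget across neighbors by advertised capacity."""
    total = sum(neighbor_capacity.values())
    if total <= 0 or token_budget <= 0:
        return {n: 0 for n in neighbor_capacity}
    exact = {n: token_budget * c / total for n, c in neighbor_capacity.items()}
    tokens = {n: int(x) for n, x in exact.items()}
    leftover = token_budget - sum(tokens.values())
    # Hand out the remaining tokens to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - tokens[n], reverse=True)
    for n in by_remainder[:leftover]:
        tokens[n] += 1
    return tokens


class TokenGate:
    """Sender-side bookkeeping: a query needs a token from its next hop."""

    def __init__(self):
        self.tokens = {}

    def receive_token(self, neighbor, count=1):
        self.tokens[neighbor] = self.tokens.get(neighbor, 0) + count

    def try_send(self, neighbor):
        """Consume one token and return True, or return False if none is held."""
        if self.tokens.get(neighbor, 0) > 0:
            self.tokens[neighbor] -= 1
            return True
        return False


grants = allocate_tokens(10, {"a": 1, "b": 2})
gate = TokenGate()
gate.receive_token("b")
```

Because a neighbor that advertises more capacity receives more tokens, and tokens are what allow it to send its own queries, under-reporting capacity costs a node sending opportunities.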

14
Practical Considerations
  • Query resilience against node death
  • Periodic keep-alive messages
  • Query responses are implicit keep-alives
  • Determining node capacity
  • Function of bandwidth and age of node
  • Finding rare items
  • Bifurcate the random walk every 10 hops


15
Outline
  • Existing approaches
  • GIA: Scalable Gnutella
  • Results: Simulations, Experiments
  • Conclusion

16
Simulation Results
  • Compare four systems
  • FLOOD: TTL-scoped, random topologies
  • RWRT: Random walks, random topologies
  • SUPER: Supernode-based search
  • GIA: Search using GIA protocol suite
  • Metric
  • Collapse point: aggregate throughput that the
    system can sustain

17
Questions
  • What is the relative performance of the four
    algorithms?
  • Which of the GIA components matters the most?
  • How does the system behave in the face of
    transient nodes?

18
System Performance
19
Factor Analysis
Algorithm        Collapse point
RWRT             0.0005
RWRT + OHR       0.005
RWRT + BIAS      0.0015
RWRT + TADAPT    0.001
RWRT + FLWCTL    0.0006

Algorithm        Collapse point
GIA              7
GIA - OHR        0.004
GIA - BIAS       6
GIA - TADAPT     0.2
GIA - FLWCTL     2
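
The collapse points above can be compared with a little arithmetic. The sketch below reads the entries as RWRT plus one GIA component and GIA minus one component, which is an assumption about the table's labels; the dictionary keys and the helper `orders_of_magnitude` are illustrative names.

```python
import math

# Collapse points copied from the factor-analysis table.
collapse_point = {
    "RWRT": 0.0005,
    "RWRT + OHR": 0.005,
    "GIA": 7,
    "GIA - OHR": 0.004,
    "GIA - TADAPT": 0.2,
}


def orders_of_magnitude(a, b):
    """How many powers of ten separate a from b."""
    return math.log10(a / b)


gia_vs_rwrt = orders_of_magnitude(collapse_point["GIA"], collapse_point["RWRT"])
ohr_added = collapse_point["RWRT + OHR"] / collapse_point["RWRT"]
ohr_removed = collapse_point["GIA"] / collapse_point["GIA - OHR"]
```

Here `gia_vs_rwrt` comes to roughly 4.1, so full GIA sustains about four orders of magnitude more load than plain random walks. Adding one-hop replication alone to RWRT gives about a 10x gain, while removing it from GIA costs about 1750x, which suggests the components matter most in combination.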
20
Transient Behavior
[Figure: transient behavior; legend includes "Static SUPER" and "Static RWRT (1 repl)"]
21
Deployment
  • Prototype client implementation using C++
  • Deployed on PlanetLab
  • 100 machines spread across 4 continents
  • Measured the progress of topology adaptation

22
Progress of Topology Adaptation
23
Outline
  • Existing approaches
  • GIA: Scalable Gnutella
  • Results: Simulations, Experiments
  • Conclusion

24
Summary
  • GIA: scalable Gnutella
  • 3 to 5 orders of magnitude improvement in system
    capacity
  • Unstructured approach is good enough!
  • DHTs may be overkill
  • Incremental changes to deployed systems
  • Status: prototype implementation deployed on
    PlanetLab