Exploring Tradeoffs in Failure Detection in P2P Networks

1
Exploring Tradeoffs in Failure Detection in P2P Networks
  • Shelley Zhuang, Ion Stoica, Randy Katz
  • HIIT Short Course
  • August 18-20, 2003

2
Problem Statement
  • One of the key challenges in achieving robustness
    in overlay networks is quickly detecting node
    failures
  • Canonical solution: each node periodically pings
    its neighbors
  • Propose keep-alive techniques
  • Study the fundamental limitations and tradeoffs
    between detection time, control overhead, and
    probability of false positives

3
Outline
  • Motivation
  • Network Model and Assumptions
  • Keep-alive Techniques
  • Performance Evaluation
  • Conclusion

4
Network Model and Assumptions
  • P2P system with n nodes
  • Each node A knows d other nodes
  • Average path length l
  • Node up-time $T$: i.i.d. exponential with rate
    $\lambda_f$, i.e. $T \sim \mathrm{Exp}(\lambda_f)$
  • Fail-stop failures
  • If a neighbor is lost, a node can use another
    neighbor to route the packet without affecting the
    path length

5
Packet Loss Probability
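  • $d$: average time it takes a node to detect that a
    neighbor has failed
  • Probability that a node forwards a packet to a
    neighbor that has failed is
    $1 - e^{-\lambda_f d} \approx \lambda_f d$ (for $\lambda_f d \ll 1$)
  • By memorylessness of the exponential,
    $P(T - t < d \mid T > t) = P(T < d)$
  • Probability that the packet is lost over $l$ hops is
    $p_l \approx l \, d \, \lambda_f$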

[Figure: pdf of node up-time T, with detection time d marked]
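The loss-probability estimate can be checked numerically. This is a hedged sketch assuming the exponential up-time model with rate $\lambda_f$ and independent hops; the function names and the demo values (one-hour mean up-time, 30 s detection time, 5 hops) are hypothetical, not taken from the slides. It compares the exact path loss $1 - e^{-\lambda_f d l}$ with the first-order approximation $l \, d \, \lambda_f$, and uses Monte Carlo sampling to illustrate that $P(T - t < d \mid T > t) = P(T < d)$.

```python
import math
import random


def per_hop_failure_prob(lam_f: float, d: float) -> float:
    """Exact P(T < d) for an exponential up-time with rate lam_f."""
    return 1.0 - math.exp(-lam_f * d)


def path_loss_exact(lam_f: float, d: float, l: int) -> float:
    """Probability that at least one of l independent hops hits a failed neighbor."""
    return 1.0 - (1.0 - per_hop_failure_prob(lam_f, d)) ** l


def path_loss_approx(lam_f: float, d: float, l: int) -> float:
    """First-order approximation p_l ~= l * d * lam_f, accurate when l * d * lam_f << 1."""
    return l * d * lam_f


def conditional_residual_prob(lam_f: float, d: float, t: float,
                              samples: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(T - t < d | T > t) via rejection sampling."""
    rng = random.Random(seed)
    accepted = hits = 0
    while accepted < samples:
        up_time = rng.expovariate(lam_f)  # expovariate takes the rate, not the mean
        if up_time > t:  # condition on the node still being up at time t
            accepted += 1
            if up_time - t < d:
                hits += 1
    return hits / accepted


if __name__ == "__main__":
    lam_f, d, l = 1 / 3600, 30.0, 5  # hypothetical example values
    print("exact  p_l:", path_loss_exact(lam_f, d, l))
    print("approx p_l:", path_loss_approx(lam_f, d, l))
    print("P(T < d):", per_hop_failure_prob(lam_f, d))
    print("P(T - t < d | T > t):", conditional_residual_prob(lam_f, d, t=600.0))
```

Because $1 - e^{-x} \le x$, the approximation slightly overestimates the loss probability, which errs on the conservative side.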
6
Outline
  • Motivation
  • Network Model and Assumptions
  • Keep-alive Techniques
  • Performance Evaluation
  • Conclusion

7
Aliveness Techniques
  • Baseline
  • Each node sends a ping message to each of its
    neighbors every $\Delta$ seconds

[Figure: example overlay with nodes A, B, C, and D]
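To make the baseline's tradeoff concrete, here is a minimal simulation sketch, writing the ping period as $\Delta$ and a probe timeout as $T_{to}$. It assumes (hypothetically; the slides do not specify this) that a neighbor fails at a time uniformly random relative to the ping schedule and that the failure is declared $T_{to}$ seconds after the first unanswered ping. Under these assumptions the expected detection time is about $\Delta/2 + T_{to}$, while each node sends $d/\Delta$ pings per second, so a shorter period buys faster detection at proportionally higher overhead. All function names are illustrative.

```python
import math
import random


def baseline_detection_delay(delta: float, t_to: float,
                             fail_time: float, phase: float) -> float:
    """Seconds from a neighbor's failure until it is declared failed.

    Pings go out at phase, phase + delta, phase + 2*delta, ...; the first ping
    sent at or after fail_time is never acknowledged, and the failure is
    declared t_to seconds after that ping (single-miss rule, an assumption).
    """
    k = max(math.ceil((fail_time - phase) / delta), 0)
    next_ping = phase + k * delta
    return next_ping - fail_time + t_to


def mean_detection_delay(delta: float, t_to: float,
                         trials: int = 100_000, seed: int = 7) -> float:
    """Average delay when the failure time is random relative to the ping schedule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        phase = rng.uniform(0.0, delta)
        fail_time = rng.uniform(0.0, 1000.0)
        total += baseline_detection_delay(delta, t_to, fail_time, phase)
    return total / trials


def pings_per_second(degree: int, delta: float) -> float:
    """Each node pings each of its `degree` neighbors once every delta seconds."""
    return degree / delta


if __name__ == "__main__":
    for delta in (0.5, 1.0, 2.0):
        print(delta, mean_detection_delay(delta, t_to=0.5), pings_per_second(8, delta))
```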
8
Aliveness Techniques
  • Information Sharing
  • Piggyback failures of neighbors in
    acknowledgement messages
  • Best case: completely connected graph of degree d

[Figure: example overlay with nodes A, B, C, and D]
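Information sharing can be sketched as a merge rule on acknowledgements. In this minimal sketch (class and field names are hypothetical; the slides do not specify a message format), every ack carries the set of nodes the sender believes have failed, and a receiver adopts only the failures that concern its own neighbors. In the best case of a completely connected graph, one node's detection reaches every other neighbor of the failed node within roughly one probe period, as each of them pings the detector.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ack:
    sender: str
    failed: frozenset  # nodes the sender currently believes have failed


@dataclass
class Node:
    name: str
    neighbors: set
    failed: set = field(default_factory=set)

    def detect(self, peer: str) -> None:
        """Record a failure found by this node's own probes."""
        self.failed.add(peer)

    def make_ack(self) -> Ack:
        """Piggyback current failure knowledge on an outgoing ack."""
        return Ack(self.name, frozenset(self.failed))

    def on_ack(self, ack: Ack) -> frozenset:
        """Adopt reported failures of our own neighbors; return the newly learned ones."""
        learned = (ack.failed & self.neighbors) - self.failed
        self.failed |= learned
        return learned


if __name__ == "__main__":
    names = {"A", "B", "C", "D"}
    nodes = {n: Node(n, names - {n}) for n in names}  # complete graph, degree 3
    nodes["B"].detect("D")  # B's own probe to D times out
    print(nodes["A"].on_ack(nodes["B"].make_ack()))  # A learns that D failed from B's ack
```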
9
Aliveness Techniques
  • Boosting
  • When a node detects the failure of a neighbor D, it
    announces the failure to all other nodes that have
    D as a neighbor
  • Best case: completely connected graph of degree d

[Figure: example overlay with nodes A, B, C, and D]
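Boosting requires knowing which nodes have D as a neighbor. The sketch below assumes backpointer state (for each node, the set of nodes that list it as a neighbor), in line with the "Boosting w/ bptr" variant evaluated later; the class and method names are hypothetical. For simplicity it keeps a global view; in a real overlay the detector would rely on a copy of D's backpointer list obtained earlier, for example piggybacked on D's probe acks.

```python
from collections import defaultdict


class BoostingOverlay:
    """Toy overlay with backpointers: backpointers[x] = nodes that have x as a neighbor."""

    def __init__(self, neighbors: dict[str, set[str]]):
        self.neighbors = {n: set(ns) for n, ns in neighbors.items()}
        self.backpointers: dict[str, set[str]] = defaultdict(set)
        for node, ns in self.neighbors.items():
            for nb in ns:
                self.backpointers[nb].add(node)
        self.known_failed: dict[str, set[str]] = defaultdict(set)

    def detect(self, detector: str, failed: str) -> list[tuple[str, str]]:
        """`detector` finds `failed` dead and boosts the news to every other
        node that has `failed` as a neighbor. Returns the (src, dst) boost messages."""
        self.known_failed[detector].add(failed)
        boosts = []
        for other in sorted(self.backpointers[failed] - {detector}):
            boosts.append((detector, other))
            self.known_failed[other].add(failed)  # assume the boost is delivered
        return boosts


if __name__ == "__main__":
    names = {"A", "B", "C", "D"}
    overlay = BoostingOverlay({n: names - {n} for n in names})  # complete graph
    print(overlay.detect("A", "D"))  # A boosts D's failure to B and C
```

In the completely connected best case, a single detection reaches all other neighbors of D after one message each, instead of waiting for their own probes to time out.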
10
Outline
  • Motivation
  • Network Model and Assumptions
  • Keep-alive Techniques
  • Performance Evaluation
  • Conclusion

11
Performance Evaluation
  • Case studies
  • d-regular network
  • Chord lookup protocol
  • Chord event-driven simulator
  • Gnutella join/leave trace
  • Packet loss rate
  • Control overhead
  • PlanetLab experiments
  • PlanetLab event-driven simulator
  • False positives

12
Loss Rate: Gnutella
  • Loss rate = lookup timeouts / lookups
  • 20 lookups per second

[Figure: loss-rate results, annotated "Boosting (simple): no additional state"]
13
Loss Rate: Gnutella
  • Wait $T_{to}$ seconds before deciding that a probe is lost
  • Multiple losses before deciding that a neighbor
    has failed
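The two rules on this slide (a per-probe timeout $T_{to}$ and several consecutive losses before declaring failure) can be combined into a small per-neighbor detector. This is a sketch under assumptions: the class name and the `threshold` parameter are hypothetical, and a late ack is treated the same as a missing one. Requiring more consecutive losses suppresses false positives caused by isolated packet drops, at the cost of roughly one extra probe period of detection time per additional required loss.

```python
from typing import Optional


class NeighborMonitor:
    """Per-neighbor detector: a probe is lost if its ack is missing or later than t_to;
    the neighbor is declared failed after `threshold` consecutive lost probes."""

    def __init__(self, t_to: float, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.t_to = t_to
        self.threshold = threshold
        self.consecutive_losses = 0
        self.declared_failed = False

    def record_probe(self, ack_delay: Optional[float]) -> bool:
        """ack_delay: round-trip time in seconds, or None if no ack arrived.
        Returns True once the neighbor has been declared failed."""
        if self.declared_failed:
            return True  # fail-stop model: a declared failure is final
        if ack_delay is not None and ack_delay <= self.t_to:
            self.consecutive_losses = 0  # a timely ack clears any suspicion
        else:
            self.consecutive_losses += 1
            if self.consecutive_losses >= self.threshold:
                self.declared_failed = True
        return self.declared_failed


if __name__ == "__main__":
    monitor = NeighborMonitor(t_to=0.5, threshold=3)
    for rtt in (None, 0.9, 0.1, None, None, None):
        print(rtt, monitor.record_probe(rtt))
```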

14
Overhead (count): Gnutella
  • Constant probing overhead (1 probe/second)
  • Small difference due to boost messages

15
Overhead (bps): Gnutella
  • Boosting with backpointers (bptr): 1.29 times the
    baseline overhead

16
Overhead (bps): Gnutella
  • Send backpointers every 10 probe acks
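Sending backpointers on only every $k$-th probe ack amortizes their cost across $k$ probes. The sketch below computes average keep-alive bandwidth per node under that scheme; the function name and all message sizes are hypothetical placeholders rather than measurements from these experiments, so it shows the shape of the tradeoff and does not reproduce the reported 1.29x ratio.

```python
def control_overhead_bps(probes_per_sec: float, probe_bytes: int, ack_bytes: int,
                         bptr_bytes: int, every_k: int) -> float:
    """Average keep-alive bandwidth in bits/s for one node when the backpointer
    list (bptr_bytes) rides on one out of every `every_k` probe acks."""
    if every_k < 1:
        raise ValueError("every_k must be at least 1")
    bytes_per_probe = probe_bytes + ack_bytes + bptr_bytes / every_k
    return 8 * probes_per_sec * bytes_per_probe


if __name__ == "__main__":
    # Hypothetical sizes: 40-byte probe, 40-byte ack, 200-byte backpointer list.
    for k in (1, 5, 10, 20):
        print(k, control_overhead_bps(1.0, 40, 40, 200, k))
```

The backpointer term shrinks as $1/k$, so beyond a modest $k$ the probe and ack sizes dominate; the price is that backpointer lists are refreshed less often.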

17
False Positives: PlanetLab
  • Propagation of positive information
  • Most false positives occur at a timeout threshold
    (TO) of 0 or 1, so increase the probe timeout threshold
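A rough model shows why both remedies help. Assume (hypothetically) that each probe to a live neighbor is lost independently with probability $p$; then $k$ specific consecutive probes are all lost with probability $p^k$, so raising the threshold $k$ cuts false positives sharply. Positive information is modeled here, as a simplification, by letting an ack received by any of a few other nodes in the same round reset the local loss counter. The function name and parameters are illustrative.

```python
import random


def false_positive_rates(p: float, k: int, peers: int,
                         rounds: int = 100_000, seed: int = 3) -> tuple[float, float]:
    """Simulate probes to a neighbor that never fails and count how often it is
    wrongly declared dead, per probe round.

    Returns (rate_without_positive_info, rate_with_positive_info). Both
    detectors see the same random losses, so the comparison is paired.
    """
    rng = random.Random(seed)
    run_plain = run_shared = 0
    fp_plain = fp_shared = 0
    for _ in range(rounds):
        own_lost = rng.random() < p
        # Did at least one other node get an ack from the neighbor this round?
        peer_heard = any(rng.random() >= p for _ in range(peers))

        run_plain = run_plain + 1 if own_lost else 0
        if run_plain >= k:
            fp_plain += 1
            run_plain = 0  # start a fresh count after a (false) declaration

        run_shared = run_shared + 1 if (own_lost and not peer_heard) else 0
        if run_shared >= k:
            fp_shared += 1
            run_shared = 0
    return fp_plain / rounds, fp_shared / rounds


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, false_positive_rates(p=0.1, k=k, peers=3))
```

Under this model, positive information means a false positive requires the neighbor to look unreachable from several vantage points at once, which is far less likely than a local run of drops.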

18
Overhead (bps): PlanetLab
  • Overhead from boost messages and positive
    information correlates with the loss rate

19
Outline
  • Motivation
  • Network Model and Assumptions
  • Keep-alive Techniques
  • Performance Evaluation
  • Conclusion

20
Conclusion
  • Examined three keep-alive techniques in Chord
    with a Gnutella join/leave trace
  • By carefully designing keep-alive algorithms, it
    is possible to significantly reduce packet loss
    probability
  • Probability of false positives for boosting with
    backpointers is < 0.01 at a loss rate of 8.6%,
    achieved by propagating positive information and
    increasing the probe timeout threshold

21
Future Work
  • Evaluate keep-alive schemes under massive
    failures and churn
  • Optimal control resource allocation strategy for
    a given network topology, failure rate, and load
    distribution
  • Other applications of keep-alive techniques?