Peer Pressure: Distributed Recovery in Gnutella

1
Peer Pressure: Distributed Recovery in Gnutella
  • Pedram Keyani
  • Brian Larson
  • Muthukumar Senthil
  • Computer Science Department
  • Stanford University

2
Introduction
  • Gnutella is a P2P file sharing protocol
  • The issue we are addressing is distributed
    recovery from malicious attacks in Gnutella
  • Our solution is a mechanism for proactive failure
    detection and recovery
  • Our experimental process and models
  • The fruits of our labor: results!

3
Failure in Gnutella
  • Failure of nodes in Gnutella can have any
    number of causes
  • Failure of 4 of the most highly connected nodes
    in Gnutella fragments the network to the point
    where it is unusable by anyone
  • The exact details of this are outlined in work
    done by Stefan Saroiu

4
Scale Free Networks (Gnutella, Internet)
  • Abide by a power law, where the number of nodes
    of degree N is proportional to N^(-lambda)
  • Lambda is observed to be roughly 2.3
  • Scale Free networks are highly resilient to large
    scale random failures but vulnerable to malicious
    attacks on the most highly connected, well-known
    nodes
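
As a rough illustration of the power law above, the sketch below computes the share of nodes at each degree for lambda = 2.3. The function name and the degree cutoff are assumptions made for this example and do not come from the presentation.

```python
def degree_distribution(lam=2.3, max_degree=100):
    """Return {degree: probability} with P(N) proportional to N**(-lam).

    max_degree is an arbitrary cutoff chosen for this sketch.
    """
    weights = {n: n ** (-lam) for n in range(1, max_degree + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


dist = degree_distribution()
# Compare the share of degree-1 nodes with the share of hubs (degree >= 10).
low = dist[1]
high = sum(p for n, p in dist.items() if n >= 10)
print(f"degree 1: {low:.2f}, degree >= 10: {high:.3f}")
```

Most nodes sit at very low degree while a small tail of hubs carries many links, which is why removing a handful of hubs is so damaging.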

5
Exponential Networks
  • Connections between nodes are random
  • The absence of preferential connections ensures
    that no single node holds the entire network
    together
  • They react the same way to malicious attacks and
    random failures

6
Scale Free and Exponential

7
Our Hypothesis
  • In order to allow Gnutella to recover from
    malicious attacks nodes must plan for failures by
    discovering and maintaining backup connections to
    form an exponential network. These backups will
    be used to replace dead neighbors in the case of
    a malicious attack.

8
Recovery Method
  • Build and maintain a virtual exponential network
    connecting all the nodes
  • Accomplish this through random node discovery
  • Detect malicious attacks on active network
  • Switch over to exponential network

9
Random Node Discovery
  • Problem: no centralized name authority to give a
    truly random node
  • Solution: use random walks through the network to
    arrive at a random node
  • Random Discovery Ping (RDP) is forwarded to only
    one of a node's neighbors, selected in such a way
    as to give a random distribution
  • RDPs use a hop count of 20, roughly equal to the
    network diameter
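
A minimal sketch of the random walk behind an RDP, assuming the network is held as a dictionary mapping each node to a list of neighbors. The slide does not say how the next neighbor is selected, so the uniform choice here is an assumption; a plain uniform walk tends to end at high-degree nodes more often, and the real selection rule may correct for that. The function name is hypothetical.

```python
import random


def random_discovery_ping(graph, start, hops=20, rng=random):
    """Forward a ping to one randomly chosen neighbor per hop.

    Returns the node at which the walk ends. Uniform neighbor choice
    is an assumption of this sketch, not the presenters' rule.
    """
    current = start
    for _ in range(hops):
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: the walk stops early
        current = rng.choice(neighbors)
    return current


# Toy network: a ring of six nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
found = random_discovery_ping(ring, start=0, rng=random.Random(0))
print("discovered node:", found)
```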

10
Maintenance of Virtual Exponential Network
  • Each node discovers N random nodes, where N is
    the minimum number of connections the node wants
    to maintain
  • Then periodically ping these nodes to make sure
    they are alive
  • Discover new neighbors to replace them should
    they die
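
The maintenance loop above might look roughly like the following. The `is_alive` and `discover` callbacks are hypothetical stand-ins for the periodic ping and the random discovery ping, and the retry cap is an assumption added so the sketch always terminates.

```python
def maintain_backups(backups, target, is_alive, discover):
    """Drop dead backup nodes, then discover replacements up to `target`.

    is_alive(node) -> bool and discover() -> node (or None) are
    hypothetical callbacks supplied by the caller.
    """
    live = {n for n in backups if is_alive(n)}
    attempts = 0
    while len(live) < target and attempts < 10 * target:
        candidate = discover()
        attempts += 1
        if candidate is not None and is_alive(candidate):
            live.add(candidate)
    return live


# Toy run: "b" has died and discovery yields "d", "b", "e" in turn.
alive = {"a", "c", "d", "e"}
pool = iter(["d", "b", "e"])
result = maintain_backups({"a", "b"}, 3, alive.__contains__,
                          lambda: next(pool, None))
print(sorted(result))
```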

11
Failure Detection
  • Random failures result in loss of 1st degree
    neighbors
  • Malicious attacks result in greater loss of 2nd
    degree neighbors than 1st degree
  • Keep a history (30 seconds) of 1st and 2nd degree
    neighbor loss
  • If 2nd degree loss exceeds 1st degree loss and a
    threshold (50), mark as malicious
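
The detection rule above can be sketched as a sliding-window counter. The slide does not state the unit of the threshold, so this example treats 50 as a raw count of lost 2nd degree neighbors; that reading, along with the class and method names, is an assumption.

```python
from collections import deque


class LossHistory:
    """Track neighbor losses over a sliding time window."""

    def __init__(self, window=30.0, threshold=50):
        self.window = window        # seconds of history to keep
        self.threshold = threshold  # assumed to be a count of losses
        self.events = deque()       # (timestamp, degree), degree is 1 or 2

    def record(self, timestamp, degree):
        # Events are assumed to arrive in time order.
        self.events.append((timestamp, degree))

    def is_malicious(self, now):
        # Discard events older than the window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        first = sum(1 for _, d in self.events if d == 1)
        second = sum(1 for _, d in self.events if d == 2)
        return second > first and second > self.threshold


history = LossHistory()
for _ in range(2):
    history.record(10.0, 1)
for _ in range(60):
    history.record(10.0, 2)
print(history.is_malicious(now=20.0))
```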

12
Reacting to Failures
  • For each neighbor lost, replace it with a node
    from the virtual exponential network
  • Only nodes local to an attack will switch,
    preserving the rest of the network structure
  • Do not attempt to discover additional random
    nodes during an attack
  • When attack is deemed to be over, return to
    normal operations
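
A small sketch of the replacement step, assuming neighbors and backups are kept as simple lists. The function name and data layout are illustrative only.

```python
def replace_lost_neighbors(neighbors, lost, backups):
    """Swap in backup nodes for lost neighbors.

    Returns (new_neighbors, remaining_backups). No new discovery is
    attempted here, matching the rule of not probing during an attack.
    """
    new_neighbors = [n for n in neighbors if n not in lost]
    usable = [b for b in backups if b not in lost and b not in new_neighbors]
    needed = len(neighbors) - len(new_neighbors)
    new_neighbors.extend(usable[:needed])
    return new_neighbors, usable[needed:]


current, spare = replace_lost_neighbors(
    ["a", "b", "c"], lost={"b", "c"}, backups=["x", "c", "y", "z"]
)
print(current, spare)
```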

13
P2P Simulator
  • Generalized P2P network simulator
  • Handles message routing, time management
  • Support for bringing nodes up or down, injecting
    failures, logging
  • Also created a compatible Gnutella client, and
    our enhanced Gnutella client
  • About 5k lines of Java

14
Modeling Gnutella
  • No standard way to do this
  • Protocol only specifies message formats
  • Clients free to implement other aspects
  • Some degree of standardization
  • We used the most common client in our simulation
    model - Bearshare

15
Bootstrapping
  • How do nodes connect in our simulation?
  • Defunct www.gnutellahosts.com
  • Maintain list of highly-available, well-connected
    nodes
  • Clients connect by receiving one of these nodes
  • Bearshare clients do something similar
  • Connect to service public.bearshare.net
  • Keep a range of neighbors (3-10)

16
Uptime Distribution
  • How long do nodes stay up in our simulation?
  • Modeled by a power law function
  • Most nodes are up for a short period of time, few
    are up for a long period
  • Many users just sign off after getting their
    content
  • Most users are dialup users
  • Within a reasonable time slice, nodes have
    uptimes following the power law distribution
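
One way to draw power-law uptimes for a simulation is a Pareto distribution, sketched below with Python's standard `random.paretovariate`. The minimum uptime and the shape parameter are placeholder values assumed for this example; the presentation does not give the parameters it used.

```python
import random


def sample_uptimes(count, min_uptime=60.0, alpha=1.5, seed=None):
    """Draw `count` uptimes (seconds) from a Pareto distribution.

    min_uptime and alpha are assumed values for illustration.
    """
    rng = random.Random(seed)
    return [min_uptime * rng.paretovariate(alpha) for _ in range(count)]


uptimes = sample_uptimes(1000, seed=1)
short = sum(1 for u in uptimes if u < 120.0)
print(f"{short} of {len(uptimes)} nodes stay up for under two minutes")
```

With these settings most sampled nodes leave quickly while a few stay up far longer, which matches the qualitative shape described above.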

17
Our Experiments
  • Ran with recovery method and without
  • No failures: ran our simulator without removing
    any nodes (control)
  • Malicious attack on most highly connected nodes

18
Malicious Attack
  • Ran the experiment for 10 minutes
  • We removed 5 of the most highly connected nodes
    over a 5 minute interval in the middle
  • Representative of a coordinated distributed
    attack on the network
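
The attack can be approximated by deleting the highest-degree nodes from an adjacency map, as in the sketch below. It removes all targets at once, whereas the experiment spread the removals over a 5 minute interval; the function name and graph layout are assumptions.

```python
def remove_most_connected(graph, count=5):
    """Delete the `count` highest-degree nodes from an undirected graph.

    graph maps node -> set of neighbors. Returns (new_graph, removed).
    """
    removed = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:count]
    gone = set(removed)
    survivors = {
        n: {m for m in nbrs if m not in gone}
        for n, nbrs in graph.items()
        if n not in gone
    }
    return survivors, removed


toy = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
}
after, removed = remove_most_connected(toy, count=1)
print(removed, after)
```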

19
Metrics
  • Large number of metrics that we could have used
  • We picked metrics that measure:
  • How partitioned the network is
  • How useful the network is in sending queries

20
Size of Largest Connected Component
  • Largest set of nodes V, where any vm and vn ∈ V
    have a path between each other
  • Measures the number of nodes that can potentially
    communicate with each other
  • Can get any data from any other node
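
This metric can be computed with a breadth-first search over each unvisited node. The sketch assumes an undirected graph stored as a dictionary of neighbor sets with every node present as a key; the function name is illustrative.

```python
from collections import deque


def largest_component_size(graph):
    """Return the node count of the largest connected component."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


net = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(largest_component_size(net))
```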

21
Number of Connected Components
  • Number of separate pieces of the network
  • If number of CCs is large then the network is
    heavily partitioned
  • Not possible to retrieve content between CCs
  • Want to monitor this number to make sure it is
    not increasing
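
Counting the pieces uses the same kind of traversal: each time an unvisited node is found, a new component starts. As before, the dictionary-of-sets graph layout and the function name are assumptions of this sketch.

```python
def count_components(graph):
    """Return the number of connected components in an undirected graph."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1  # an unvisited node starts a new component
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components


pieces = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(count_components(pieces))
```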

22
Nodes Reachable Within 6 Hops
  • Sum of the number of 1st, 2nd, ..., 6th degree
    neighbors of a node
  • End to end measurement of how many nodes you can
    reach with a query
  • Typically queries are forwarded about 6 hops
  • Rough estimate of the number of nodes a user can
    search.
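
A depth-limited breadth-first search gives this count directly. The sketch assumes the same dictionary-of-sets graph layout as a convenience; the function name is hypothetical.

```python
from collections import deque


def reachable_within(graph, source, max_hops=6):
    """Count nodes other than `source` reachable in at most `max_hops` hops."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not forward past the hop limit
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return len(depth) - 1


# Toy network: a straight line of ten nodes, 0 through 9.
line = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 9} for i in range(10)}
print(reachable_within(line, 0), reachable_within(line, 5))
```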

23
Results: Largest CC

24
Results: Number of CCs

25
Results: Number of nodes within 6 hops

26
Failure Detection Results

27
Random Node Distribution

28
Messages Per Node Results

29
Conclusions
  • By planning for and detecting failures our
    recovery method can drastically increase the
    likelihood that the network will not become
    partitioned
  • It lessens the impact of malicious attacks on the
    querying capability of the network

30
Further Work
  • Investigating other techniques for random node
    discovery
  • Restoring network to a scale free topology
    immediately following failures
  • How the Gnutella network has changed over time

31
Thanks
  • Stefan Saroiu and Steven Gribble for letting us
    use their data and giving us advice
  • Armando Fox, George Candea, Dave Patterson, Aaron
    Brown

Bling-Bling Industries, 2001