
Title: Peer Pressure: Distributed Recovery in Gnutella

1
Peer Pressure: Distributed Recovery in Gnutella

Pedram Keyani
Brian Larson
Muthukumar Senthil
Computer Science Department
Stanford University

2
Introduction

Gnutella is a P2P file-sharing protocol
The issue we are addressing is distributed
recovery from malicious attacks in Gnutella
Our solution is a mechanism for proactive failure
detection and recovery
Our experimental process and models
The fruits of our labor: results!

3
Failure in Gnutella

Nodes in Gnutella can fail for any number of
reasons
Failure of 4 of the most highly connected nodes
in Gnutella fragments the network to the point
where it is unusable by anyone
The details are given in work by Stefan Saroiu

4
Scale-Free Networks (Gnutella, Internet)

Follow a power law: the number of nodes of degree
$N$ is proportional to $N^{-\lambda}$
$\lambda$ is observed to be roughly 2.3
Scale-free networks are highly resilient to
large-scale random failures but vulnerable to
malicious attacks on the most highly connected,
well-known nodes
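The sketch below shows what $\lambda \approx 2.3$ implies. It normalizes $N^{-\lambda}$ over degrees 1 up to a cutoff. The cutoff of 100 and the helper name `power_law_pmf` are illustrative assumptions and are not taken from the measurements.

```python
def power_law_pmf(max_degree: int, lam: float = 2.3) -> dict:
    """Fraction of nodes with degree N (1 <= N <= max_degree), assuming P(N) ∝ N^-lam."""
    weights = {n: n ** -lam for n in range(1, max_degree + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


pmf = power_law_pmf(100)
# Degree-1 nodes outnumber degree-2 nodes by a factor of 2**2.3 (about 4.9).
ratio = pmf[1] / pmf[2]
# Share of nodes with degree 10 or more: the small set of hubs an attacker would target.
hub_share = sum(p for n, p in pmf.items() if n >= 10)
print(f"{ratio:.2f} {hub_share:.3f}")
```

Most of the probability mass sits on low-degree nodes. This is why random failures rarely hit a hub, while a targeted attack on the few high-degree nodes removes most of the connectivity.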

5
Exponential Networks

Connections between nodes are random
With no preferential connections, no single node
holds the entire network together
They react the same way to malicious attacks and
random failures
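This small generator sketch makes the contrast between the two topologies concrete. Each new node links to `m` existing nodes. Targets are chosen either in proportion to their degree, which is preferential attachment and yields a scale-free network, or uniformly at random, which yields the exponential case. The growth model, the name `grow_network`, and the parameters are illustrative assumptions, not the simulator used in this work.

```python
import random


def grow_network(n: int, m: int, preferential: bool, seed: int = 0) -> dict:
    """Return an undirected graph {node: set(neighbors)} with n nodes.

    Each new node attaches to m distinct existing nodes. With preferential=True the
    targets are chosen proportionally to degree (scale free); otherwise they are
    chosen uniformly at random, so no node is favored (exponential).
    """
    rng = random.Random(seed)
    # Start the growth from a small complete graph on m + 1 nodes.
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # Each node appears here once per incident edge, so sampling from this
    # list picks a node with probability proportional to its degree.
    edge_ends = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(edge_ends) if preferential else rng.randrange(new))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            edge_ends += [new, t]
    return adj


scale_free = grow_network(2000, 2, preferential=True)
exponential = grow_network(2000, 2, preferential=False)
# Large hubs emerge only under preferential attachment.
print(max(len(s) for s in scale_free.values()), max(len(s) for s in exponential.values()))
```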

6
Scale-Free and Exponential
7
Our Hypothesis

For Gnutella to recover from malicious attacks,
nodes must plan for failures by discovering and
maintaining backup connections that form an
exponential network. These backups will be used
to replace dead neighbors in the case of a
malicious attack.

8
Recovery Method

Build and maintain a virtual exponential network
connecting all the nodes
Accomplish this through random node discovery
Detect malicious attacks on the active network
Switch over to the exponential network

9
Random Node Discovery

Problem: there is no centralized name authority
to supply a truly random node
Solution: use random walks through the network to
arrive at a random node
A Random Discovery Ping (RDP) is forwarded to only
one of a node's neighbors, selected so as to give
a random distribution
RDPs use a hop count of 20, roughly equal to the
network diameter
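An RDP walk can be sketched as follows. The slides do not say how the next hop is chosen. A plain random walk favors highly connected nodes, so this sketch assumes a Metropolis-Hastings rule: propose a uniformly chosen neighbor w, move to it with probability min(1, deg(v)/deg(w)), and otherwise stay. This is a standard correction whose stationary distribution is uniform over nodes. The function name and the graph representation are hypothetical.

```python
import random


def random_discovery_ping(adj: dict, start, hops: int = 20, rng=None):
    """Forward an RDP for `hops` steps and return the node where it ends.

    adj maps each node to the set of its current neighbors. At each hop the
    holder proposes one uniformly chosen neighbor w and forwards to it with
    probability min(1, deg(v) / deg(w)); otherwise it keeps the message for
    this hop. This Metropolis-Hastings rule removes the bias a plain random
    walk has toward highly connected nodes.
    """
    rng = rng or random.Random()
    v = start
    for _ in range(hops):
        neighbors = list(adj[v])
        if not neighbors:
            break  # isolated node: nowhere to forward
        w = rng.choice(neighbors)
        if rng.random() < min(1.0, len(neighbors) / len(adj[w])):
            v = w
    return v


# A star: hub 0 connected to leaves 1..5.
star = {0: {1, 2, 3, 4, 5}, **{leaf: {0} for leaf in range(1, 6)}}
rng = random.Random(42)
ends = [random_discovery_ping(star, 0, rng=rng) for _ in range(6000)]
print(ends.count(0) / len(ends))  # close to 1/6, the uniform share of 6 nodes
```

For comparison, a plain random walk of 20 hops that starts at the hub of this star always ends back at the hub.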

10
Maintenance of Virtual Exponential Network

Each node discovers N random nodes, where N is
the minimum number of connections the node wants
to maintain
It periodically pings these nodes to make sure
they are alive
It discovers new random nodes to replace any that
die
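One maintenance round for the backup list might look like the sketch below. `discover` and `is_alive` are hypothetical hooks that stand in for an RDP walk and a ping. The class name and the cap on discovery attempts are assumptions.

```python
from typing import Callable, Hashable, Set


class BackupNeighbors:
    """Maintains n live backup nodes for the virtual exponential network.

    discover() should return a random node (for example the endpoint of an RDP);
    is_alive(node) should return True if the node answers a ping. Both are
    hypothetical hooks standing in for the real network operations.
    """

    def __init__(self, n: int, discover: Callable[[], Hashable],
                 is_alive: Callable[[Hashable], bool]):
        self.n = n
        self.discover = discover
        self.is_alive = is_alive
        self.backups: Set[Hashable] = set()

    def refresh(self, max_attempts: int = 100) -> None:
        """Ping every backup, drop the dead ones, then discover replacements
        until n backups are known (or the attempt budget runs out)."""
        self.backups = {b for b in self.backups if self.is_alive(b)}
        attempts = 0
        while len(self.backups) < self.n and attempts < max_attempts:
            candidate = self.discover()
            if self.is_alive(candidate):
                self.backups.add(candidate)
            attempts += 1
```

Calling `refresh()` on a timer gives the periodic ping described above. The interval is not stated in the slides.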

11
Failure Detection

Random failures result in loss of 1st-degree
neighbors
Malicious attacks result in greater loss of
2nd-degree neighbors than of 1st-degree neighbors
Keep a 30-second history of 1st- and 2nd-degree
neighbor loss
If 2nd-degree loss exceeds both 1st-degree loss
and a threshold (50), mark the failure as malicious
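The detection rule can be expressed as a sliding-window check. The 30-second window and the threshold of 50 come from the slide. The class name, the event format (per-event counts of lost 1st- and 2nd-degree neighbors), and the choice to compare totals summed over the window are assumptions.

```python
from collections import deque


class AttackDetector:
    """Flags malicious attacks from recent 1st- and 2nd-degree neighbor loss."""

    WINDOW_SECONDS = 30.0  # history length from the slide
    THRESHOLD = 50         # 2nd-degree loss threshold from the slide

    def __init__(self):
        self.events = deque()  # (timestamp, 1st-degree lost, 2nd-degree lost)

    def record(self, now: float, first_lost: int, second_lost: int) -> None:
        self.events.append((now, first_lost, second_lost))

    def is_malicious(self, now: float) -> bool:
        # Forget losses older than the window.
        while self.events and now - self.events[0][0] > self.WINDOW_SECONDS:
            self.events.popleft()
        first = sum(e[1] for e in self.events)
        second = sum(e[2] for e in self.events)
        return second > first and second > self.THRESHOLD
```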

12
Reacting to Failures

For each neighbor lost, replace it with a node
from the virtual exponential network
Only nodes local to an attack will switch,
preserving the rest of the network structure
Do not attempt to discover additional random
nodes during an attack
When the attack is deemed over, return to
normal operations
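The reaction step might look like the following. The class and method names are hypothetical. The sketch assumes a node reacts only after its detector has flagged an attack, and leaves two decisions to the caller: how lost neighbors are reported and when the attack is over.

```python
class RecoveringNode:
    """Reaction to a detected attack (hypothetical sketch)."""

    def __init__(self, neighbors, backups):
        self.neighbors = set(neighbors)  # active Gnutella connections
        self.backups = list(backups)     # nodes from the virtual exponential network
        self.under_attack = False

    def handle_attack(self, lost) -> list:
        """Replace each lost neighbor with one unused backup; return the promoted nodes."""
        self.under_attack = True
        lost = set(lost)
        self.neighbors -= lost
        promoted = []
        for _ in range(len(lost)):
            while self.backups:
                candidate = self.backups.pop(0)
                # Skip backups that are already neighbors or were themselves lost.
                if candidate not in self.neighbors and candidate not in lost:
                    self.neighbors.add(candidate)
                    promoted.append(candidate)
                    break
        return promoted

    def attack_over(self) -> None:
        self.under_attack = False

    def may_discover_random_nodes(self) -> bool:
        """No new random-node discovery while an attack is in progress."""
        return not self.under_attack
```

Only nodes whose detector fires call `handle_attack`. Nodes far from the attack therefore keep their existing connections.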

13
P2P Simulator

Generalized P2P network simulator
Handles message routing and time management
Supports bringing nodes up or down, injecting
failures, and logging
We also created a compatible Gnutella client and
our enhanced Gnutella client
About 5k lines of Java
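The simulator itself was about 5k lines of Java. The sketch below, in Python like the other examples here, shows only the kind of event-queue core such a simulator needs for time management and message routing. The class and method names are hypothetical and are not taken from the authors' code.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Simulator:
    """Minimal discrete-event loop: events run in timestamp order."""

    def __init__(self):
        self.now = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker keeps same-time events FIFO

    def schedule(self, delay: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def send(self, latency: float, handler: Callable[..., None], *message) -> None:
        """Route a message by delivering it to handler after the link latency."""
        self.schedule(latency, lambda: handler(*message))

    def run(self, until: float) -> None:
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()
        self.now = max(self.now, until)


log = []
sim = Simulator()
sim.send(0.05, lambda src, text: log.append((sim.now, src, text)), "A", "ping")
sim.schedule(0.01, lambda: log.append((sim.now, "sim", "node B down")))
sim.run(until=1.0)
print(log)  # the failure at t=0.01 is processed before the message delivered at t=0.05
```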

14
Modeling Gnutella

No standard way to do this
The protocol only specifies message formats
Clients are free to implement other aspects
Still, there is some degree of standardization
We modeled the most common client in our
simulation: Bearshare

15
Bootstrapping

How do nodes connect in our simulation?
The now-defunct www.gnutellahosts.com maintained
a list of highly available, well-connected nodes
Clients connected by receiving one of these nodes
Bearshare clients do something similar
They connect to the service public.bearshare.net
They keep a range of neighbors (3-10)
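The bootstrapping model can be sketched as a host cache that hands out well-connected nodes, plus a client that keeps its neighbor count between 3 and 10. The names `HostCache`, `bootstrap`, and `accept_incoming` are assumptions for illustration. So is the policy of querying the cache repeatedly until the minimum is reached.

```python
import random


class HostCache:
    """Stand-in for a host cache such as www.gnutellahosts.com: it hands out
    one of a list of highly available, well-connected nodes."""

    def __init__(self, well_connected, seed: int = 0):
        self.hosts = list(well_connected)
        self.rng = random.Random(seed)

    def get_host(self):
        return self.rng.choice(self.hosts)


MIN_NEIGHBORS, MAX_NEIGHBORS = 3, 10  # neighbor range from the slide


def bootstrap(cache: HostCache, max_requests: int = 50) -> set:
    """Ask the cache for hosts until the client has at least MIN_NEIGHBORS."""
    neighbors = set()
    for _ in range(max_requests):
        if len(neighbors) >= MIN_NEIGHBORS:
            break
        neighbors.add(cache.get_host())
    return neighbors


def accept_incoming(neighbors: set, peer) -> bool:
    """Accept an incoming connection only while below MAX_NEIGHBORS."""
    if len(neighbors) >= MAX_NEIGHBORS:
        return False
    neighbors.add(peer)
    return True
```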

16
Uptime Distribution

How long do nodes stay up in our simulation?
Modeled by a power-law function
Most nodes are up for a short time; a few are up
for a long time
Many users sign off as soon as they get their
content
Most users are on dial-up connections
Within a reasonable time slice, node uptimes
follow the power-law distribution
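A power-law uptime model can be sampled with Python's `random.paretovariate`. The shape `alpha = 1.5` and the 60-second minimum session are illustrative assumptions, not parameters fitted to any measured data. With these values, the probability that a session lasts under five minutes is $1 - (60/300)^{1.5} \approx 0.91$.

```python
import random


def sample_uptimes(count: int, x_min: float = 60.0, alpha: float = 1.5,
                   seed: int = 0) -> list:
    """Draw session lengths (seconds) from a Pareto distribution:
    P(uptime > t) = (x_min / t) ** alpha for t >= x_min."""
    rng = random.Random(seed)
    return [x_min * rng.paretovariate(alpha) for _ in range(count)]


uptimes = sample_uptimes(10_000)
short = sum(t < 300 for t in uptimes) / len(uptimes)
longest = max(uptimes)
print(f"{short:.2f} of sessions under 5 minutes; longest {longest / 3600:.1f} h")
```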

17
Our Experiments

Ran each experiment with and without our recovery
method
No failures: ran the simulator without removing
any nodes (control)
Malicious attack on the most highly connected nodes

18
Malicious Attack

Ran the experiment for 10 minutes
We removed 5 of the most highly connected nodes
over a 5-minute interval in the middle of the run
Representative of a coordinated, distributed
attack on the network
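The attack timing can be written down as a schedule. The slide does not say how the five removals were spaced within the 5-minute window. This sketch assumes they are evenly spaced from the start of the window to its end, with the window centered in the 10-minute run. The function names are hypothetical.

```python
def attack_schedule(run_seconds: float = 600.0, window_seconds: float = 300.0,
                    victims: int = 5) -> list:
    """Times (seconds) at which to remove one hub, evenly spread from the start
    to the end of a window centered in the run."""
    start = (run_seconds - window_seconds) / 2
    if victims == 1:
        return [start]
    step = window_seconds / (victims - 1)
    return [start + i * step for i in range(victims)]


def top_hubs(adj: dict, k: int) -> list:
    """The k most highly connected nodes, highest degree first."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


print(attack_schedule())  # [150.0, 225.0, 300.0, 375.0, 450.0]
```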

19
Metrics

There are many metrics we could have used
We picked metrics that measure
How partitioned the network is
How useful the network is for sending queries

20
Size of Largest Connected Component

Largest set of nodes $V$ such that any
$v_m, v_n \in V$ have a path between them
Measures the number of nodes that can potentially
communicate with each other
Any node in it can get data from any other node
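The largest connected component can be computed with a breadth-first search from every node not yet visited. This is a generic sketch over an adjacency map from which failed nodes have already been removed. It is not the authors' measurement code.

```python
from collections import deque


def largest_component_size(adj: dict) -> int:
    """Size of the largest set of nodes that are pairwise connected by paths."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


g = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(largest_component_size(g))  # 3: the component {1, 2, 3}
```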

21
Number of Connected Components

Number of separate pieces of the network
If the number of CCs is large, the network is
heavily partitioned
Content cannot be retrieved across CCs
We monitor this number to make sure it is
not increasing
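Counting components does not require tracking their sizes. This sketch uses union-find (disjoint sets) over the edge list, starting from one component per node and decrementing on each successful merge. The function name and the edge-list input are illustrative choices.

```python
def count_components(nodes, edges) -> int:
    """Number of connected components, using union-find with path halving."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(parent)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1  # each successful union merges two components
    return components


print(count_components(range(1, 7), [(1, 2), (2, 3), (4, 5)]))  # 3: {1,2,3}, {4,5}, {6}
```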

22
Nodes Reachable Within 6 Hops

Sum of the numbers of 1st-, 2nd-, ..., 6th-degree
neighbors of a node
End-to-end measure of how many nodes you can
reach with a query
Queries are typically forwarded about 6 hops
Rough estimate of the number of nodes a user can
search
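This metric amounts to a breadth-first search cut off at 6 hops. Each node is counted once, at its shortest distance, which matches summing the distinct 1st- through 6th-degree neighbors. The function name is hypothetical. The slide does not say whether the value was averaged over all nodes.

```python
from collections import deque


def reachable_within(adj: dict, source, max_hops: int = 6) -> int:
    """Count nodes at distance 1..max_hops from source (the source itself is
    excluded), i.e. the sum of its 1st- through max_hops-degree neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # a query would not be forwarded past this hop
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return len(dist) - 1


# A path 0 - 1 - ... - 9.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
print(reachable_within(path, 0))  # 6: nodes 1 through 6
```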

23
Results: Largest CC
24
Results: Number of CCs
25
Results: Number of Nodes Within 6 Hops
26
Failure Detection Results
27
Random Node Distribution
28
Messages Per Node Results
29
Conclusions

By planning for and detecting failures, our
recovery method can drastically increase the
likelihood that the network will not become
partitioned
It lessens the impact of malicious attacks on the
querying capability of the network

30
Further Work

Investigating other techniques for random node
discovery
Restoring the network to a scale-free topology
immediately after failures
How the Gnutella network has changed over time

31
Thanks

Stefan Saroiu and Steven Gribble for letting us
use their data and giving us advice
Armando Fox, George Candea, Dave Patterson, Aaron
Brown

Bling-Bling Industries, 2001
