Title: Peer-to-Peer Systems
1 Peer-to-Peer Systems
Fundamentals and Design of Distributed Systems
D.H.J. Epema
Parallel and Distributed Systems Group
2 Peer-to-Peer (P2P) systems
- In most distributed systems, some nodes have more authority or functionality than others (e.g., in a client-server system)
- In peer-to-peer systems, all nodes have the same authority and functionality, i.e., these systems are decentralized
- Some of the first P2P systems were amateur systems for information storage and retrieval
- P2P systems are dynamic: nodes can come and go
- Perhaps P2P systems aren't that new: Usenet has existed since 1979 (a news propagation mechanism among UNIX machines)
3 A Peer
- Pronunciation: pir
- Etymology: Middle English, from Middle French per, from per, adjective, equal, from Latin par
- Date: 13th century
- Meaning: one that is of equal standing with another, especially one belonging to the same societal group, especially based on age, grade, or status
- So in Dutch: een gelijke (an equal)
4 Applications of P2P systems
- Information storage/content distribution
  - e.g., for music or video files
  - file-oriented or search-oriented
  - examples: Napster, Freenet, Gnutella, Chord, Pastry, CAN, BitTorrent, KaZaA, eDonkey
- Computation
  - tap unused processing capacity
  - usually embarrassingly parallel applications
  - example: SETI@home (Search for Extra-Terrestrial Intelligence)
- Collaboration
  - games, virtual meetings
5 Issues in P2P systems
- Searching (routing, locating): how to find (new) files/nodes, and how to route the replies back
- Downloading: get the required contents efficiently
- Growth: nodes can join the system
- Shrinking: nodes can leave the system (fail or disconnect)
- Performance
- Scalability
- Freeriding: deter users from only downloading
- Security: e.g., anonymity
- The network of nodes participating in a P2P system as part of the Internet is an example of an overlay network
[Figure: peers forming an overlay network]
6 Levels of P2P-ness
- True P2P
  - a true lack of any central authority
  - examples: Freenet, Gnutella, Chord
- Central P2P
  - central components in the system
  - examples:
    - Napster (has a central location database)
    - SETI@home (has a single central machine handing out the work)
    - BitTorrent (search is centralized)
- Question: can a P2P system exist without a centrally managed component?
7 Topics
- Anonymity
- Graph structures
  - graph properties
  - three graph types
- Routing/searching
- Case studies
  - Freenet
  - Gnutella
  - Chord/CFS
  - CAN
  - Pastry
  - BitTorrent
8 Anonymity
- Some P2P systems have anonymity as one of their goals
- There are different forms of anonymity w.r.t. a document:
  - of the author
  - of the publisher
  - of the servers storing the document
  - of the readers
  - of the document: servers do not know what documents they are storing
  - of the query: a server cannot tell what document it is using as a response
9 Small-world effect (1)
- In 1967, Stanley Milgram did the following experiment:
  - he gave the same letter to 160 random people in Omaha, Nebraska
  - he asked them to get the letters to a stockbroker in Boston, MA
  - intermediaries had to know each other on a first-name basis
  - 42 letters made it to the stockbroker
  - with a median number of 5.5 intermediaries
  - at the time, the US had a population of about 200 million
10 Small-world effect (2)
- Apparently, path lengths in social networks tend to be short
- Many people only know people in a small social circle, but a few have connections in faraway places
- These highly connected nodes are very important
- Similar phenomenon in the WWW
  - number of clicks to get from any page to any other page
  - is unidirectional
  - portals play an important role here
11 Graph concepts
- Given a connected graph (directed or not)
- Degree of a node: the number of nodes it is connected to
- In a regular graph, all nodes have equal degree
- The average path length is the average of the lengths of the (a) shortest path across all pairs of nodes
- The neighborhood of a node is the set of nodes it is connected to
- The clustering coefficient of a node is the fraction of potential links among the nodes in its neighborhood that is actually present
- The clustering coefficient of a graph is the average of the clustering coefficients of its nodes
[Figure: example graph with the labels "degree = 3", "neighborhood", and "6 potential links"]
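The definitions above can be checked on a small graph. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the function names and the example graph (a triangle with one extra node attached) are illustrative choices, not part of the original slides.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of potential links among v's neighbors that are present."""
    nbrs = adj[v]
    potential = len(nbrs) * (len(nbrs) - 1) // 2
    if potential == 0:
        return 0.0
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / potential


def graph_clustering(adj):
    """Average of the clustering coefficients of all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical example: triangle 0-1-2 with node 3 attached to node 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(example, 0))
print(graph_clustering(example))
print(average_path_length(example))
```

In this example node 0 has three neighbors, so three potential links among them, of which only the link 1-2 is present.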
12 Graph types
- We will consider three graph types
- Regular graphs
  - high clustering coefficient, long paths
- Random graphs
  - low clustering coefficient, short paths
- Power-law graphs
  - varying clustering coefficients, short paths
  - found in practice
- (Small-world graphs: high clustering coefficient, short paths)
13 Regular graphs: lattices
- Characterized by three numbers
  - dimension: d
  - number of nodes along one dimension: n
  - number of links in each of the 2d directions: k/2
- Structure
  - start with a torus of some dimension
  - add (k/2)-1 connections in each direction to the nearest nodes
[Figure: example lattices, labeled "k = 4" and "d = 1, k = 2"]
14 Regular graphs: 1-lattices
[Figure: neighborhoods in 1-lattices. Clustering coefficient: 0 for k = 2, 0.5 for k = 4]
15 Regular graphs: 2-lattices
[Figure: 2-lattices with k = 2 (degree 4) and k = 4 (degree 8). Clustering coefficient: 0 for k = 2, 6/28 for k = 4, where 28 is the number of potential connections]
16 Properties of lattices
- Degree: dk
- Average path length in a d-lattice is approximately
  $d \cdot \frac{1}{2} \cdot \frac{n/2}{k/2} = \frac{dn}{2k}$
  (d: dimension; 1/2: average; n/2: maximum distance; k/2: step size)
- Clustering coefficient of a 1-lattice: $\frac{3}{4} \cdot \frac{k-2}{k-1}$
- Regular graphs have a
  - high average path length
  - high clustering coefficient
- So no small-world effect
17 Random graphs
- Number of nodes: n
- Probability of any potential link being present: p
- Average degree: k = p(n-1)
- Expected total number of links: pn(n-1)/2
- May not be connected
- Experimental result: for large n and k ≥ 5, the largest component of a random graph is almost equal to the whole graph
[Figure: a node with n-1 potential links, each present with probability p]
18 Properties of random graphs
- Clustering coefficient: p
- Average path length: approximately log(n)/log(k)
- Intuitive explanation
  - start with a single node
  - add its k neighbors
  - add the $k^2$ neighbors of those neighbors
  - do this l times, with l the average path length
  - then we have all nodes, so $n = k^l$, i.e., l = log(n)/log(k)
- Random graphs have a
  - low average path length
  - low clustering coefficient
- So no small-world effect
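The two properties above (clustering coefficient close to p, path length close to log(n)/log(k)) can be explored numerically. This is a minimal sketch, assuming the random graph is generated by including each of the n(n-1)/2 potential links independently with probability p; the function names, the seed, and the values n = 300 and p = 0.05 are illustrative assumptions.

```python
import math
import random
from itertools import combinations


def random_graph(n, p, seed=0):
    """Include every potential link independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def graph_clustering(adj):
    """Average over all nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        potential = len(nbrs) * (len(nbrs) - 1) // 2
        if potential:
            present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += present / potential
    return total / len(adj)


n, p = 300, 0.05
g = random_graph(n, p)
k = average_degree(g)                       # expected value: p * (n - 1)
predicted_path_length = math.log(n) / math.log(k)
print(k, graph_clustering(g), predicted_path_length)
```

The measured clustering coefficient should land near p, and the predicted path length is only the rough estimate from the intuitive argument, not a measured value.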
19 Degree distribution of random graphs
- The degree in a random graph with n nodes has a binomial distribution:
  $P(\text{degree} = k) = \binom{n-1}{k}\, p^k (1-p)^{n-1-k}$
  (the probability that the degree is equal to k: k links out of n-1 potential ones)
- This decreases very fast for large k
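20 Comparison regular-random
- Example with n = 10,000 nodes and degree k = 100 (P has 100 neighbors)
- Regular graph (1-lattice): clustering coefficient 0.74, average path length 50
- Random graph: clustering coefficient 0.01 (a neighbor Q of P has 1 of P's neighbors as neighbor), average path length 2.00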
21 Power-law graphs (1): definition
- The fraction of nodes $P_k$ with k links satisfies
  - $P_k = C k^{-\gamma}$
  - for some C, k, γ > 0
- $P_k$ decreases very slowly
- Usually the degree exponent γ is between 2 and 3
- Properties
  - there exist a few nodes with many connections
  - a low average path length (order log(n))
  - random failures don't have a large impact
- Power-law graphs occur often in nature
- Other name: scale-free networks
22 Power-law graphs (2): construction
- Deterministic method to construct power-law graphs
  1. start with a graph G consisting of a single node
  2. add two identical copies G' and G'' of G at a lower level
  3. connect the root of G with all leaves of G' and G''
  4. repeat steps 2 and 3 as often as you like
[Figure: after two steps the graph has 9 nodes, with 4 leaves at the lowest level; after three steps it has 27 nodes, with 8 leaves at the lowest level]
23 Power-law graphs (3): degree exponent
- Graph $G_n$ after n steps
  - number of nodes: $3^n$
  - number of leaves at lowest level: $2^n$
  - degree of root: $2^{n+1} - 2$ (by induction, equal to $(2^n - 2) + 2 \cdot 2^{n-1}$: the degree of the root in $G_{n-1}$ plus the links to the new leaves)
- Hub: root of the graph at some level
- For the computation of the degree exponent, concentrate on hubs
- Consider graph $G_i$ after i steps
- After n steps (n > i), there are $3^{n-i}$ copies of $G_i$ (and of its root)
- Of these roots, a fraction of 2/3 never increase their degree
[Figure: 1 copy of $G_i$ for n = i, 3 copies for n = i+1, 9 copies for n = i+2, with roots marked "degree changed" or "degree unchanged"]
24 Power-law graphs (4): degree exponent
- So there are $(2/3)\,3^{n-i}$ nodes (hubs) in $G_n$ with degree $2^{i+1} - 2 \approx 2^{i+1}$ (the degree of the root of $G_i$)
- So a fraction of $(2/3)\,3^{-i} = 2 \cdot 3^{-(i+1)}$ has this degree (divide by $3^n$)
- Might as well say a relative fraction of $3^{-i}$ has degree $2^i$
- As a consequence, with $k = 2^i$ and thus $3^{-i} = k^{-\ln 3/\ln 2}$, the degree distribution satisfies
  - $P_k \sim k^{-\ln 3/\ln 2}$
- So the degree exponent is $\gamma = \ln 3 / \ln 2 \approx 1.585$
- Clustering coefficient is 0 (no triangles)
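The counts used in this derivation can be verified by building the graph explicitly. The sketch below is one possible reading of the deterministic construction: node 0 is the root, each step adds two shifted copies of the current graph, and the root is linked to the lowest-level leaves of both copies. The function names and the node numbering are assumptions made for illustration.

```python
from collections import Counter


def build_hierarchical_graph(steps):
    """Return (number of nodes, set of edges, lowest-level leaves) after `steps` steps."""
    num_nodes = 1
    edges = set()
    leaves = [0]  # in the single-node graph, the node itself plays the role of leaf
    for _ in range(steps):
        new_edges = set(edges)
        new_leaves = []
        for copy in (1, 2):
            offset = copy * num_nodes
            # Identical copy of the current graph, shifted by `offset`.
            new_edges |= {(u + offset, v + offset) for u, v in edges}
            copy_leaves = [leaf + offset for leaf in leaves]
            # Connect the root (node 0) with all leaves of the copy.
            new_edges |= {(0, leaf) for leaf in copy_leaves}
            new_leaves += copy_leaves
        edges, leaves = new_edges, new_leaves
        num_nodes *= 3
    return num_nodes, edges, leaves


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


num_nodes, edges, leaves = build_hierarchical_graph(3)
deg = degrees(edges)
print(num_nodes, len(leaves), deg[0])
```

For three steps this gives 27 nodes, 8 lowest-level leaves, and a root of degree 14, in line with $3^n$, $2^n$, and $2^{n+1} - 2$.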
25 Power-law graphs (5)
- Stochastic methods for generating power-law graphs
  - incremental growth: add nodes with links or sets of links one by one
  - preferential attachment: connect new nodes to highly connected ones
- Node i has degree $d_i$
- Node i is chosen with probability $P_i = d_i / \sum_j d_j$
- Start with some graph with $m_0$ nodes and $m_0 - 1$ edges
- In each step
  - with probability p, add $m \le m_0$ edges, with edges chosen according to probability distribution $P_i$
  - with probability 1-p, add a single node with m links, also according to the distribution $P_i$
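The growth procedure can be sketched as follows. Several details are assumptions, since the slide leaves them open: the initial graph is taken to be a path (which has one edge fewer than it has nodes), both endpoints of an added edge are drawn by degree, parallel edges are allowed, and self-loops are avoided by redrawing. The function and parameter names are hypothetical.

```python
import random


def grow_power_law_graph(m0, m, p, steps, seed=0):
    """Return (edge list, degree list); requires m0 >= 2 so that degrees are positive."""
    rng = random.Random(seed)
    # Initial graph: a path on m0 nodes, which has m0 - 1 edges.
    edges = [(i, i + 1) for i in range(m0 - 1)]
    degree = [0] * m0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def pick(exclude=None):
        # Draw node i with probability d_i / sum_j d_j, redrawing if it is `exclude`.
        while True:
            i = rng.choices(range(len(degree)), weights=degree)[0]
            if i != exclude:
                return i

    for _step in range(steps):
        if rng.random() < p:
            # Add m edges between existing nodes.
            for _edge in range(m):
                u = pick()
                v = pick(exclude=u)
                edges.append((u, v))
                degree[u] += 1
                degree[v] += 1
        else:
            # Add a single new node with m links to existing nodes.
            targets = [pick() for _link in range(m)]
            new = len(degree)
            degree.append(0)
            for t in targets:
                edges.append((new, t))
                degree[new] += 1
                degree[t] += 1
    return edges, degree


edges, degree = grow_power_law_graph(m0=3, m=2, p=0.3, steps=200, seed=1)
print(len(degree), len(edges), max(degree))
```

Every step adds exactly m edges, so the edge count is deterministic even though the shape of the graph is not.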
26 Power-law graphs (6)
- From T. Bu and D. Towsley, "On Distinguishing between Internet Power Law Topology Generators," IEEE Infocom 2002:
  - The topology of the network of Autonomous Systems in the Internet has a power-law structure with a degree exponent of 2.18 (i.e., $P_k \sim k^{-2.18}$)
  - The characteristic path length L of a graph is the median of the means of the shortest path lengths connecting every node to all other nodes
  - Comparison of the Internet AS graph with a random graph with the same characteristic path length
[Figure: Internet AS graph versus random graph]
27 Routing/searching (1)
- In some P2P systems both nodes and files are identified by random numbers
  - node ids: for instance a hash of IP addresses
  - file keys: for instance a hash of contents or keywords
- File search may be by
  - file name
  - keywords
  - file key
- Reply to a search can be
  - unique (search for specific file)
  - not unique (multiple replies satisfying keywords)
28 Routing/searching (2)
- Distributed hash tables (DHTs)
  - use of random numbers for nodes and files
  - main problem: mapping files to nodes
  - mapping is fixed
- Unstructured P2P systems
  - mapping of files to nodes is not fixed
- Guaranteed routing
  - search for existing file is always successful
  - examples: Chord, DHTs in general
- Probabilistic routing
  - search for existing file may fail
  - examples: Freenet, Gnutella
29 Probabilistic routing
- Messages contain a Time-To-Live (TTL, or hops-to-live) value, which determines the maximum number of hops a request message can travel
- The TTL value is decremented in every node visited
- When the TTL value reaches zero, the message is not forwarded anymore
- Messages contain a pseudo-random, pseudo-unique identifier, to determine whether a message has visited a node previously
- If so, the message may be discarded or sent one hop back
[Figure: a request traveling along a chain of nodes with TTL values 3, 2, 1, 0]
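The TTL and duplicate-detection rules can be illustrated with a small flooding simulation. This is a sketch under simplifying assumptions: a single shared `seen` table stands in for the per-node record of message identifiers, duplicates are discarded rather than sent one hop back, and the TTL is interpreted as the maximum number of hops. The function name and the example topologies are hypothetical.

```python
from collections import deque


def flood(adj, origin, ttl):
    """Return {node: hop count at which the message first reached it}."""
    seen = {origin: 0}            # nodes that already saw this message identifier
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue              # TTL reached zero: do not forward anymore
        for nbr in adj[node]:
            if nbr in seen:
                continue          # message visited this node before: discard
            seen[nbr] = seen[node] + 1
            queue.append((nbr, remaining - 1))
    return seen


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(flood(chain, 0, 3))   # node 4 is out of reach with TTL 3
print(flood(ring, 0, 5))    # each node is reached once despite the cycle
```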
30 Freenet: design goals
- Freenet is a cooperative distributed file system for anonymous information storage and retrieval with location independence and lazy replication
- Nodes volunteer disk space; there are no guarantees for permanent availability of files
- Design goals
  - anonymity of both producers and consumers of information
  - efficient dynamic storage and routing of information
  - decentralization of network functions
- No
  - broadcast searches for files
  - centralized location indexes of files
31 Freenet: file keys (1)
- Files with a keyword-signed key
  - user chooses a descriptive text string for the file
  - from this keyword, a public/private key pair is derived
  - the public half is hashed to yield the file key
  - the private half is used to sign the file
  - the publisher of a file publishes the descriptive text string
  - user uses the file key to locate the file
  - user uses the signature to check the file (do I get what I asked for?)
- Problems
  - two users may choose the same descriptive keyword
  - users may insert junk files under popular keywords (key-squatting)
32 Freenet: file keys (2)
- Files with a signed-subspace key
  - enables users to have personal name spaces, identified by a randomly generated public/private key pair
  - only the owner of a name space can add files (requires the private key)
33 Freenet: retrieving data (1)
- Every node maintains a routing cache with (file-key, node-id) pairs of files it expects to be at specific nodes
[Figure: example routing cache with (file key, node-id) entries, involving files F387, F783, F456, and F124 and nodes 5, 8, and 1]
34 Freenet: retrieving data (2)
- A request for a file is sent to the node that stores the file with the closest file key in the routing cache
- This is repeated along a chain of nodes
[Figure: a request for file id 398 (F398) is forwarded along a chain involving nodes 1, 8, and 5]
35 Freenet: retrieving data (3)
- A request travels a chain of length at most equal to the TTL
- In the message, also a depth is maintained
  - increased at every hop
  - used as TTL for the way back
- Nodes can play with these values to increase anonymity
- When the TTL reaches zero, the request is not forwarded, and a failure message is sent back
36 Freenet: retrieving data (4)
- When a node holding the file is found, the file is sent back along the chain traveled by the request
- Each node along this chain
  - caches the file
  - records the pair (file key, id of originating node) in its routing cache
- To support anonymity, nodes along the way may pretend that they are the originator of the file
[Figure: file F398 is returned from node 5 to the requester; each node on the chain stores F398 and the pair (398,5)]
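The retrieval steps of the last slides can be combined into a small simulation. This is a simplified sketch, not Freenet's actual protocol: closeness of keys is assumed to be plain numeric distance, a request that would revisit a node simply fails instead of backtracking, and the depth counter and anonymity tricks are left out. The data layout and the three-node example are hypothetical.

```python
def freenet_request(nodes, start, key, ttl):
    """nodes: {node_id: {"files": set of keys, "cache": {file key: node id}}}.

    Returns (id of the node that held the file, chain of visited nodes),
    or (None, chain) on failure.
    """
    chain = [start]
    current = start
    while key not in nodes[current]["files"]:
        cache = nodes[current]["cache"]
        if ttl == 0 or not cache:
            return None, chain                       # failure message is sent back
        closest = min(cache, key=lambda k: abs(k - key))
        nxt = cache[closest]
        if nxt in chain:
            return None, chain                       # request revisits a node: give up
        chain.append(nxt)
        current = nxt
        ttl -= 1
    holder = current
    for node_id in chain[:-1]:                       # way back along the chain
        nodes[node_id]["files"].add(key)             # cache the file
        nodes[node_id]["cache"][key] = holder        # record (file key, node id)
    return holder, chain


def make_example():
    # Hypothetical network; node 5 holds file 398. Nodes 2 and 3 appear only
    # in routing caches and are never contacted in the requests below.
    return {
        1: {"files": set(), "cache": {124: 3, 456: 8}},
        8: {"files": set(), "cache": {387: 5, 783: 2}},
        5: {"files": {398}, "cache": {}},
    }


net = make_example()
print(freenet_request(net, start=1, key=398, ttl=5))
print(net[1]["cache"][398], net[8]["cache"][398])
```

After the first request, nodes 1 and 8 both hold the file, so a repeated request from node 1 is answered locally.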
37 Freenet: storing data (1)
- A request for insert contains
  - a file key
  - a hops-to-live value (number of copies to create)
- The request for insert is routed as a request for a file
- Each node along the way checks whether a key collision occurs
- If so
  - an error message is sent back
  - the user has to choose a new file keyword and try again
- When no collision is detected, the file is stored along the whole chain traveled by the insert request
38 Freenet: storing data (2)
- When no collision is detected
  - a success message is sent back
  - the file is stored along the whole chain traveled by the insert request
  - the pair (file key, id of inserter) is entered in the routing caches of the nodes along the chain
[Figure: node 1 is the inserter of file F398; each node along the chain stores F398 and the pair (398,1)]
39 Freenet: effects of retrieving and storing
- The keys in a node's routing table tend to get clustered
  - a node attracts requests for similar keys
  - a node attracts inserts for similar keys
- Popular data will get stored (logically) close to the requesters
- Requests and inserts increase connectivity, because nodes get to know about previously unknown nodes
40 Freenet: remarks
- Both the routing tables and the file storage may implement LRU (least recently used) replacement
- In general, it seems that Freenet in operation tends to a power-law network (connections are entries in the routing caches)
41 Gnutella (1)
- Gnutella is search-oriented: a request may get multiple answers
- Requests are broadcast
- Requests carry a TTL (typical initial value 7)
- When a request revisits a node, it is discarded
- Nodes are usually connected to about 3-5 other nodes
- As a result, messages get to about 10,000 nodes ($4^7 \approx 16{,}000$)
[Figure: a request spreading over hops 1, 2, 3]
42 Gnutella (2)
- Replies travel back along the route of the request
- Replies do not contain contents, but the IP address of the location found
- Downloads go through HTTP
- Gnutella should be self-organizing: fast nodes in the core of the network, slow ones at the fringes
- Was very popular in 1999-2002
- There are modifications to the protocol that make it more efficient
43 Distributed hash tables (DHTs)
- Node ids and file keys are random numbers of some number of bits
- Usually assumed to be uniformly distributed
- So nodes with ids that are close are with high probability diverse in geography, ownership, etc.
- Some mechanism is used to map (hash) keys to nodes in a distributed system
- Main question: how to route requests for keys to the proper node
- DHTs usually
  - have guaranteed routing
  - do not support anonymity
44 Chord: hashing (1)
- Chord uses what is called consistent hashing
- Both file keys and node ids are m bits wide (e.g., m = 128)
- Assume the numbers 0 through $2^m - 1$ to be arranged clockwise along a ring
- Some of these represent actual nodes
- For a number k, successor(k) is the first node in the clockwise direction equal to or larger than k
- Assume a network of N nodes
[Figure: ring of ids 0, 1, 2, ..., $2^m - 1$ with the actual nodes marked; successor(k) is the first actual node at or after k, with no nodes in between]
45 Chord: hashing (2)
- A file with key k is mapped to the node with id successor(k)
- So every node is responsible for the files with keys that map to a segment of the ring
[Figure: ring showing key k mapped to successor(k), the segment of file keys mapped to its responsible node, and a case where node id and file key coincide]
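The mapping of keys to nodes can be written down in a few lines. This is a minimal sketch, assuming the set of actual node ids is available as a sorted list; the ring size (m = 6) and the node ids are hypothetical example values.

```python
from bisect import bisect_left


def successor(node_ids, k, m):
    """First actual node clockwise from k (inclusive) on a ring of size 2**m.

    node_ids must be a sorted list of the actual node ids.
    """
    k %= 2 ** m
    i = bisect_left(node_ids, k)            # index of the first node id >= k
    return node_ids[i] if i < len(node_ids) else node_ids[0]   # wrap around


M = 6                                       # hypothetical small ring: ids 0..63
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

for key in (10, 24, 30, 38, 54, 60):
    print(key, "->", successor(NODES, key, M))
```

Keys 24 and 30 both fall in the segment for which node 32 is responsible, key 38 coincides with a node id, and key 60 wraps around to node 1.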
46 Chord: locating (1)
- In principle easy
  - every node keeps the id and the IP address of its successor in the ring
  - when locating the file with key k, a node sends the file key along the ring
  - when the file key reaches node successor(k), it is clear if the file is present in the system
  - reply sent to originator of request
- This is not efficient (of order N/2 hops)
[Figure: key k and successor(k) on the ring]
47 Chord: locating (2)
- Additional data structure in every node
  - the finger table
  - contains entries of (node-id, IP-address) pairs at distances equal to powers of 2
- In node n,
  - finger[i] = successor($n + 2^{i-1}$), i = 1, 2, ..., m
- Many of the first elements of the finger table will be equal: gaps between successive nodes in the ring are of expected size $2^m/N$
- With a million nodes ($2^{20}$) and m = 128, the gaps between nodes are of size $2^{108}$, and the first 108 elements of the finger table are expected to coincide
[Figure: fingers of node n pointing to successor(n+1), successor(n+2), successor(n+4), successor(n+8); finger[i] is the first node at or after $n + 2^{i-1}$]
48 Chord: locating (3)
- When node n wants to locate key k
  - if k = finger[i] for some i, check with node k
  - if k < finger[1], check with node finger[1]
  - else
    - send a request to the largest node n' = finger[i] with k > n'
    - n' repeats all of this
[Figure: positions of n and k on the ring for case 1, case 2, and case 3a]
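The lookup procedure can be simulated on a small ring. The sketch below follows the usual formulation of finger-based routing (forward to the farthest finger that still precedes the key), which is assumed here to match the cases above. Finger tables are recomputed from a global node list for simplicity, whereas a real node would store its own table. The ring size and node ids are hypothetical.

```python
from bisect import bisect_left


def successor(node_ids, k, m):
    """First actual node clockwise from k (inclusive); node_ids is sorted."""
    k %= 2 ** m
    i = bisect_left(node_ids, k)
    return node_ids[i] if i < len(node_ids) else node_ids[0]


def finger_table(n, node_ids, m):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..m (stored at index i-1)."""
    return [successor(node_ids, n + 2 ** (i - 1), m) for i in range(1, m + 1)]


def in_interval(x, a, b, size):
    """True if x lies in the clockwise ring interval (a, b]."""
    dist = (x - a) % size or size
    span = (b - a) % size or size
    return dist <= span


def lookup(start, k, node_ids, m):
    """Return (node responsible for key k, list of nodes that handled the request)."""
    size = 2 ** m
    n, path = start, [start]
    while True:
        fingers = finger_table(n, node_ids, m)
        if in_interval(k, n, fingers[0], size):
            return fingers[0], path          # k lies between n and its successor
        to_key = (k - n) % size or size
        # Fingers that lie strictly between n and k, going clockwise.
        preceding = [f for f in fingers if 0 < (f - n) % size < to_key]
        n = max(preceding, key=lambda f: (f - n) % size)   # farthest such finger
        path.append(n)


M = 6
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(8, 54, NODES, M))
```

Starting at node 8, the request for key 54 is handled by nodes 8, 42, and 51 before node 56 is identified as responsible; each hop at least halves the remaining distance to the key.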
49 Chord: locating (4)
- Two possibilities
  - replies are sent back to the originating node
  - replies are sent along to the next nodes
[Figure: the two possibilities for locating file key k, starting at node n and proceeding via finger[1], finger[i], and finger[i+1]]
50 Chord: locating (5)
- Complexity of look-up: log(N)
- First: complexity is of order at most m (= log(number of ids))
- Let node n look for key k
- Let node p be the last node preceding k (last step in search)
- Consider the interval $[n + 2^{i-1}, n + 2^i)$ that contains p
- In n, let finger[i] = f = successor($n + 2^{i-1}$)
- Then f ≤ p
- Distance between n and f is at least $2^{i-1}$
- Distance between f and p is at most $2^{i-1}$
- So f is closer to p than to n: the distance is cut at least in half!
[Figure: n, f, p, and k on the ring, with the distances $2^{i-1}$ marked]
51 Chord: locating (6)
- Assume nodes are uniformly distributed along the ring
- So a node covers a segment of the ring of length $2^m/N$
- Why is the complexity log(N) instead of m?
- Reason: many of the first elements in a finger table coincide because nodes are sparsely distributed
- So many of the last steps are covered by a single step
[Figure: the successor of node n is finger[k] for k not small]
52 Chord: locating (7)
- On average, on a segment of the ring of length L, there are $N (L/2^m)$ nodes ($L/2^m$ is the fraction of the ring considered)
- After A steps, the distance between the current query node and the key k will be reduced to at most $2^m/2^A$ (previous slide)
- Take $A = 2\log_2(N) = \log_2(N^2)$: the distance between the current query node and the key k will be reduced to $2^m/N^2$ (since $2^A = N^2$)
- The probability of having a node on a segment of length $2^m/N^2$ is 1/N, which can be neglected
- So only A steps are needed
[Figure: after A steps, there are no nodes left between the current query node and key k]
53 Chord: node joins (1)
- In principle easy
  1. a new node n has to know any existing node n' in the ring
  2. and ask it to find the elements in its finger table by searching for successor($n + 2^i$), for i = 0, 1, ...
  3. node n can then announce its presence to its successor (n2) to get the keys it should store
- However, the finger tables of other nodes may have to be adapted (in particular, of its predecessor (n1))
- In general, when data structures are suspected to be inconsistent, a node can execute steps 1 and 2
[Figure: new node n joins between n1 and n2; finger[1] of n points to n2; the file keys up to n are relocated from n2 to n]
54 Chord: node joins (2)
- Additional data structure: predecessor
- Steps for n to join
  - node n gets finger[1] = n2 from n'
  - node n contacts n2
  - n2 sends its predecessor (n1) back, which is now the predecessor of n
  - n2 sets its predecessor to n
  - node n contacts n1 so n1 can set its successor to n
[Figure: successor and predecessor pointers among n1, n, and n2]
55 Content Addressable Network (CAN) (1)
- CAN is a DHT with a d-dimensional uniform hash function
- Keys are deterministically mapped to a point in a d-dimensional torus T
- T is split up into zones (dynamically as peers join and leave)
- Every peer is responsible for one zone
[Figure: the hash function maps a key to a point in T (d = 2); the peer owning the zone that contains the point is responsible for the key]
56 CAN (2): routing
- Every node maintains a routing table with
  - the coordinates of the neighboring zones
  - the IP addresses of the responsible peers
- Use greedy routing algorithm
- Complexity (assume n peers/zones): average path length is $(d/4)\, n^{1/d}$ (the number of zones along one dimension is $n^{1/d}$)
[Figure: greedy routing toward the zone holding the requested key; the example peer has 5 neighbors]
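The path-length estimate can be checked on an idealized CAN. The sketch below assumes all zones have equal size, so that the torus becomes a regular grid with `side` zones per dimension and every zone has exactly two neighbors per dimension; a real CAN with unevenly split zones would deviate from this. All names are illustrative.

```python
from itertools import product


def torus_distance(a, b, side):
    """Hop distance between zones a and b on a torus with `side` zones per dimension."""
    return sum(min((x - y) % side, (y - x) % side) for x, y in zip(a, b))


def greedy_route(src, dst, side):
    """Repeatedly forward to the neighboring zone closest to dst."""
    path = [src]
    current = src
    while current != dst:
        neighbors = []
        for dim in range(len(current)):
            for step in (-1, 1):
                nbr = list(current)
                nbr[dim] = (nbr[dim] + step) % side
                neighbors.append(tuple(nbr))
        current = min(neighbors, key=lambda z: torus_distance(z, dst, side))
        path.append(current)
    return path


def average_path_length(d, side):
    """Average number of greedy hops over all ordered pairs of zones."""
    zones = list(product(range(side), repeat=d))
    hops = [len(greedy_route(a, b, side)) - 1 for a in zones for b in zones]
    return sum(hops) / len(hops)


d, side = 2, 4                       # n = side**d = 16 zones
print(average_path_length(d, side))  # compare with (d / 4) * side
```

With d = 2 and 16 zones the estimate $(d/4)\,n^{1/d}$ evaluates to 2, which is what the simulation yields when pairs with identical source and destination are included.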
57 CAN (3): joining
- A peer A that wants to join
  - randomly chooses a point P in T
  - contacts any CAN peer B
  - and asks B to search for the zone/peer (say C) that holds P
- Then the zone of C is split up equally between A and C
[Figure: peer A contacts B, which locates the zone of C containing point P]
58 CAN (4): leaving
- When a peer wants to leave, its zone is merged with that of a neighbor if that leads to a valid zone
- Otherwise, the neighbor with the smallest zone is temporarily responsible for two zones
59 Pastry (1)
- Node ids and file keys are K bits, e.g., K = 128 bits
- Let $N = 2^K$
- Node ids and file keys are assumed to be integers with digits in base $2^b$ for some b
- If b = 4 and K = 128, a node id consists of 32 digits $a_0 a_1 \ldots a_{31}$ of 4 bits each (e.g., 1001 1101 ... 0101)
- Pastry maps a file key to the node with id numerically closest to it
- Pastry uses hypercube-like routing
- Pastry uses a notion of (scalar) proximity (round-trip time, number of hops in the Internet)
60 Intermezzo: Hypercubes
- Nodes that differ in 1 bit are connected
- $n \cdot 2^{n-1}$ connections
- Maximum distance: n
- Routing
  - scan bits from right to left
  - if a bit differs from the destination's, send to the neighbor that differs in that bit
  - repeat until end
  - example: 000 -> 111
[Figure: hypercubes for n = 2 (nodes 00, 01, 10, 11) and n = 3 (nodes 000 through 111)]
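The routing rule translates directly into bit operations. This is a small sketch, assuming node ids are stored as integers whose binary representation is the hypercube label; the function names are illustrative.

```python
def hypercube_route(src, dst, n):
    """Scan bits from right to left and hop across every differing bit."""
    path = [src]
    current = src
    for bit in range(n):                 # bit 0 is the rightmost bit
        mask = 1 << bit
        if (current ^ dst) & mask:       # this bit differs from the destination
            current ^= mask              # move to the neighbor differing in this bit
            path.append(current)
    return path


def as_bits(path, n):
    """Render node ids as n-bit strings."""
    return [format(node, f"0{n}b") for node in path]


print(as_bits(hypercube_route(0b000, 0b111, 3), 3))
```

For the example 000 -> 111 the route visits 000, 001, 011, and 111, which uses the maximum distance of n = 3 hops.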
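61 Pastry (2): routing table
- Assume b = 2, K = 16
- Notation: common prefix - next digit - rest of node id
- Routing table in node 10233102 (IP addresses omitted)
[Figure: routing table of node 10233102. First row: ids with no common prefix; second row: ids that coincide in the first digit; third row: in the first 2 digits; fourth row: in the first 3 digits; and so on. The columns hold all other possible values of the next digit; an empty entry means that no node with a suitable id is known]
- Number of rows?
- Number of columns?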
62 Pastry (3): routing table
- Size of routing table: $(\log_{2^b} N) \times (2^b - 1)$
- A node tries to fill all rows in its routing table
- The lower in the table, the more difficult this is (fewer and fewer nodes have the right prefix)
- Routing/searching for file key k in node n
  - send the request to a node which has a longer common prefix with k than n
  - if such a node does not appear in the routing table, send k to a node with a common prefix of equal length which is closer to k
  - repeat this until found
- For most entries in the routing table, there are many choices: choose one with good proximity
63 Pastry (4): routing table
- In node 10233102, search for key 10211011
- Routing complexity: $\log_{2^b} N$
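The first routing step for this example can be worked out mechanically. The sketch below assumes that ids are written as digit strings in base $2^b$ and that row r of the routing table (counting from 1) holds nodes sharing the first r-1 digits with the local node; the function names are hypothetical.

```python
def common_prefix_length(a, b):
    """Number of leading digits that the ids a and b share."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def routing_table_slot(node_id, key):
    """Row (1-based) and column digit of the routing-table entry to use for `key`."""
    shared = common_prefix_length(node_id, key)
    if shared == len(node_id):
        return None                      # the key equals the node id
    return shared + 1, key[shared]


def routing_table_shape(b, key_bits):
    """(rows, columns): log base 2**b of N = 2**key_bits, and 2**b - 1."""
    return key_bits // b, 2 ** b - 1


print(common_prefix_length("10233102", "10211011"))
print(routing_table_slot("10233102", "10211011"))
print(routing_table_shape(b=2, key_bits=16))
```

Node 10233102 and key 10211011 share the prefix 102, so the request goes to the entry in the fourth row for next digit 1, if such a node is known. With b = 2 and K = 16 the table has 8 rows and 3 columns.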
64 Pastry (5): leaf set
- It is a rare coincidence that a routing table is completely filled
- The leaf set of a node is a set of nodes of size L, for some L
- The leaf set of node n contains the L/2 numerically closest smaller node ids and the L/2 numerically closest larger node ids
- Smallest (largest) of all these: S (T)
- When routing for key k, always first check whether S ≤ k ≤ T
- If so, map k to the closest of these
- Otherwise, use the routing table
[Figure: node n with L/2 leaf-set nodes on either side, ranging from S to T, and key k within this range]
65 Pastry (6): node joins
- Assume a joining node n knows at least one node n1
- Problem: how to fill the routing table of n
- Solution
  - let n ask n1 to look for n itself!
  - if n and n1 do not have a common prefix, n1 will send the request to a node n2 which has the first digit in common with n
  - so the second row of the routing table of n2 will have the first digit equal to the first digit of n
  - so n uses the second row of the routing table of n2
  - and n uses the third row of the routing table of n3
  - etc.
[Figure: the request travels from n1 via n2 to z; n uses the first row of n1, the second row of n2, and the leaf set of z, as n is numerically close to z]
66 BitTorrent (1): searching
- BitTorrent is only a file downloading protocol
- It depends for file searching on other components (web sites with content lists, and trackers)
- Trackers: central components that keep track, for each file, of which peers are currently downloading it
- A peer that wants to download a file gets a .torrent file from a web site with the address of a tracker
[Figure: a web site (Suprnova), a tracker, and the peers downloading the same file]
67 BitTorrent (2): downloading
- Files are split up in chunks (pieces of the file) of size, e.g., 256 KByte (on the order of 1000 chunks per file)
- A swarm is the set of peers that have all or part of a file and are still online
- Peers in a swarm exchange the ids of the chunks they have
- Leechers have only a part of a file and barter for chunks of the file (tit-for-tat)
- Seeders have the complete file and upload for free
[Figure: a swarm with downloaders and a seeder]
68 BitTorrent (3): downloading
- Downloaders barter with a small number of peers (e.g., 4)
- Optimistic unchoking: every 30 seconds, a downloader contacts a fifth peer to see whether it has better download performance
- If so, it replaces one of the four peers
- BitTorrent implements
  - the rarest-first policy, to ensure a uniform distribution of pieces among downloaders
  - endgame mode, to prevent peers who have all but a few pieces from waiting too long to finish
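The rarest-first policy can be sketched as a simple selection function. This is an illustration only: it assumes each peer's pieces are known as a set of piece indices and breaks ties by lowest index, whereas real clients typically break ties randomly. The names and the example swarm are hypothetical.

```python
from collections import Counter


def rarest_first(my_pieces, peer_pieces):
    """Pick the missing piece held by the fewest connected peers (ties: lowest index)."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - my_pieces)    # only pieces we still need
    if not counts:
        return None                          # no connected peer has anything we need
    return min(counts, key=lambda piece: (counts[piece], piece))


mine = {0}
peers = {"A": {0, 1, 2}, "B": {1, 2, 3}, "C": {1}}
print(rarest_first(mine, peers))
```

Here piece 1 is held by three peers, piece 2 by two, and piece 3 by only one, so piece 3 is requested first.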
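69 BitTorrent: analysis (1)
- A fluid model of the operation of BitTorrent for a single file
- Parameters of the model
  - x(t): the number of downloaders
  - y(t): the number of seeders
  - λ: the arrival rate of peers wanting to download the file
  - θ: the rate with which downloaders abort the download
  - γ: the rate with which seeders leave the system
  - µ: the upload bandwidth of all peers
  - c: the download bandwidth of all peers, c ≥ µ
[Figure: peers arrive at rate λ and become downloaders x(t); downloaders abort at rate θ or become seeders y(t); seeders leave at rate γ]
70 BitTorrent: analysis (2)
- Maximum download capacity: c x(t)
- Maximum upload capacity: µ(η x(t) + y(t))
- The parameter η accounts for the reduced download effectiveness of downloaders because they do not have the complete file
- Arrival and departure distributions are assumed to be exponential
- System evolution is governed by two differential equations:
  $\frac{dx(t)}{dt} = \lambda - \theta x(t) - \min\{c\,x(t),\ \mu(\eta x(t) + y(t))\}$ (number of downloaders: arrivals, minus aborts, minus downloaders that become a seeder)
  $\frac{dy(t)}{dt} = \min\{c\,x(t),\ \mu(\eta x(t) + y(t))\} - \gamma y(t)$ (number of seeders: downloaders that become a seeder, minus seeders that leave)
71 BitTorrent: analysis (3)
- In steady state: dx(t)/dt = dy(t)/dt = 0
- Then, with $\bar{x}$ and $\bar{y}$ the steady-state values of x(t) and y(t):
  $0 = \lambda - \theta\bar{x} - \min\{c\,\bar{x},\ \mu(\eta\bar{x} + \bar{y})\}$
  $0 = \min\{c\,\bar{x},\ \mu(\eta\bar{x} + \bar{y})\} - \gamma\bar{y}$
- Suppose η is positive
- Then with $\beta^{-1} = \max\{c^{-1},\ \eta^{-1}(\mu^{-1} - \gamma^{-1})\}$,
  $\bar{x} = \frac{\lambda}{\beta(1 + \theta/\beta)}$, $\bar{y} = \frac{\lambda}{\gamma(1 + \theta/\beta)}$
72 BitTorrent: analysis (4)
- Let T be the average download time of the downloaders that become seeders
- Arrival rate of downloaders that become seeders: $\lambda - \theta\bar{x}$
- Fraction of downloaders that become seeders: $(\lambda - \theta\bar{x})/\lambda$
- Little's formula
  - average number of jobs in system: N
  - arrival rate: λ
  - average time in system: W
  - then N = λW
- Apply here: $\frac{\lambda - \theta\bar{x}}{\lambda}\,\bar{x} = (\lambda - \theta\bar{x})\,T$
73 BitTorrent: analysis (5)
- So $T = \frac{1}{\theta + \beta}$
- with $\beta^{-1} = \max\{c^{-1},\ \eta^{-1}(\mu^{-1} - \gamma^{-1})\}$
- Consequences
  - BitTorrent scales well: T does not depend on the arrival rate λ
  - When download effectiveness η increases, T decreases
  - When the departure rate of seeders γ increases, T increases
  - Up to some point, increasing the download bandwidth c decreases T (until it is no bottleneck anymore)
  - Similar statement for the upload bandwidth µ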
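The consequences listed above can be explored numerically. The sketch below assumes that the steady-state result of the fluid model reads T = 1/(abort rate + β), with 1/β the larger of 1/c and (1/effectiveness)(1/upload bandwidth - 1/seeder departure rate); this reading of the formulas, the parameter names, and the example values are assumptions made for illustration.

```python
def steady_state(arrival, abort, seeder_leave, upload, download, effectiveness):
    """Return (downloaders, seeders, average download time T) in steady state.

    Assumes effectiveness > 0 and strictly positive rates and bandwidths.
    """
    beta_inv = max(1 / download,
                   (1 / effectiveness) * (1 / upload - 1 / seeder_leave))
    beta = 1 / beta_inv
    downloaders = arrival / (beta * (1 + abort / beta))
    seeders = arrival / (seeder_leave * (1 + abort / beta))
    download_time = 1 / (abort + beta)
    return downloaders, seeders, download_time


base = dict(abort=0.0, seeder_leave=2.0, upload=1.0, download=2.0, effectiveness=1.0)
print(steady_state(arrival=1.0, **base))
print(steady_state(arrival=10.0, **base))                         # same T, more peers
print(steady_state(arrival=1.0, **{**base, "seeder_leave": 4.0})) # seeders leave faster
```

Raising the arrival rate scales the number of downloaders and seeders but leaves T unchanged, while a higher seeder departure rate lengthens T.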
74 BitTorrent: analysis (6)
- Suppose downloader D is connected to a set O of k other downloaders
- The download effectiveness is η = 1 - P(D has no piece that any peer in O needs)
- Assume that the piece distributions at the peers are identical and independent
- Then η = 1 - P(D has no piece that peer j in O needs)^k
[Figure: downloader D connected to the set O of other downloaders]
75 BitTorrent: analysis (7)
- Assume that
  - the numbers of pieces in the downloaders are uniformly distributed on {0, 1, ..., N-1}, with N the number of pieces in the file
  - the pieces at any downloader are taken at random and uniformly from the pieces of the file (rarest-piece first)
- Then a computation shows that $\eta \approx 1 - \left(\frac{\log N}{N}\right)^k$
- So in practice, η will be close to 1