1
Peer-to-Peer Systems
Fundamentals and Design of Distributed Systems
D.H.J. Epema
Parallel and Distributed Systems Group
2
Peer-to-Peer (P2P) systems
  • In most DSs, some nodes have more authority or
    functionality than others (e.g., in a
    client-server system)
  • In peer-to-peer systems, all nodes have the same
    authority and functionality, i.e., these systems
    are decentralized
  • Some of the first P2P systems were amateur
    systems for information storage and retrieval
  • P2P systems are dynamic: nodes can come and go
  • Perhaps P2P systems aren't that new: Usenet (a
    news propagation mechanism among UNIX machines)
    has been around since 1979

3
A Peer
  • Pronunciation: 'pir'
  • Etymology: Middle English, from Middle French
    per, from per (adjective, equal), from Latin par
  • Date: 13th century
  • Meaning: one that is of equal standing with
    another; especially, one belonging to the same
    societal group, especially based on age, grade,
    or status
  • So in Dutch: "een gelijke" (an equal)

4
Applications of P2P systems
  • Information storage/content distribution
  • e.g., for music or video files
  • file-oriented or search-oriented
  • examples: Napster, Freenet, Gnutella, Chord,
    Pastry, CAN, BitTorrent, KaZaA, eDonkey
  • Computation
  • tap unused processing capacity
  • usually embarrassingly parallel applications
  • example: SETI@HOME (Search for Extra-Terrestrial
    Intelligence)
  • Collaboration
  • games, virtual meetings

5
Issues in P2P systems
  • Searching (routing, locating): how to find (new)
    files/nodes, and how to route the replies back
  • Downloading: get the required contents
    efficiently
  • Growth: nodes can join the system
  • Shrinking: nodes can leave the system (fail or
    disconnect)
  • Performance
  • Scalability
  • Freeriding: deter users from only downloading
  • Security: e.g., anonymity
  • The network of nodes participating in a P2P
    system as part of the Internet is an example of
    an overlay network

6
Levels of P2P-ness
  • True P2P
  • a true lack of any central authority
  • examples: Freenet, Gnutella, Chord
  • Central P2P
  • central components in the system
  • examples:
  • Napster (has a central location database)
  • SETI@HOME (has a single central machine handing
    out the work)
  • BitTorrent (search is centralized)
  • Question: can a P2P system exist without a
    centrally managed component?

7
Topics
  • Anonymity
  • Graph structures
  • graph properties
  • three graph types
  • Routing/searching
  • Case studies
  • Freenet
  • Gnutella
  • Chord/CFS
  • CAN
  • Pastry
  • BitTorrent

8
Anonymity
  • Some P2P systems have as one of their goals
    anonymity
  • There are different forms of anonymity w.r.t. a
    document
  • of the author
  • of the publisher
  • of the servers storing the document
  • of the readers
  • of the document: servers do not know what
    documents they are storing
  • of the query: a server cannot tell what document
    it is returning as a response

9
Small-world effect (1)
  • In 1967, Stanley Milgram did the following
    experiment
  • he gave the same letter to 160 random people in
    Omaha, Nebraska
  • he asked them to get the letter to a stockbroker
    in Boston, MA
  • intermediaries had to know each other on a
    first-name basis
  • 42 letters made it to the stockbroker
  • with a median number of 5.5 intermediaries
  • at the time, the US had a population of about 200
    million

10
Small-world effect (2)
  • Apparently, path lengths in social networks tend
    to be short
  • Many people only know people in a small social
    circle, but a few have connections in far-away
    places
  • These highly connected nodes are very important
  • A similar phenomenon occurs in the WWW
  • number of clicks to get from any page to any
    other page
  • links are unidirectional
  • portals play an important role here

11
Graph concepts
  • Given a connected graph (directed or not)
  • Degree of a node: the number of nodes it is
    connected to
  • In a regular graph, all nodes have equal degree
  • The average path length is the average of the
    lengths of the shortest paths over all pairs of
    nodes
  • The neighborhood of a node is the set of nodes it
    is connected to
  • The clustering coefficient of a node is the
    fraction of potential links among the nodes in
    its neighborhood that are actually present
  • The clustering coefficient of a graph is the
    average of the clustering coefficients of its
    nodes

(Figure: an example node of degree 3, its neighborhood, and the potential links among the neighbors)
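As an added illustration of these definitions (not part of the original slides), a minimal Python sketch that computes the clustering coefficient of a node and of a graph given as an adjacency dictionary:

  from itertools import combinations

  def clustering_coefficient(adj, node):
      # fraction of the potential links among node's neighbors that are present
      neighbors = list(adj[node])
      if len(neighbors) < 2:
          return 0.0
      potential = len(neighbors) * (len(neighbors) - 1) / 2
      present = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
      return present / potential

  def graph_clustering_coefficient(adj):
      # average of the clustering coefficients of all nodes
      return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

  # example: a triangle with one extra node attached to it
  adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
  print(graph_clustering_coefficient(adj))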
12
Graph types
  • We will consider three graph types
  • Regular graphs
  • high clustering coefficient, long paths
  • Random graphs
  • low clustering coefficient, short paths
  • Power-law graphs
  • varying clustering coefficients, short paths,
  • found in practice
  • (Small-world graphs high clustering coefficient,
    short paths)

13
Regular graphs: lattices
  • Characterized by three numbers
  • dimension d
  • number of nodes along one dimension: n
  • number of links in each of the 2d directions: k/2
  • Structure
  • start with a torus of some dimension
  • add (k/2)-1 connections in each direction to the
    nearest nodes

(Figure: a lattice with k=4, and a 1-lattice with d=1, k=2)
14
Regular graphs: 1-lattices
(Figure: 1-lattices with k=2 and k=4 and the neighborhood of a node; the clustering coefficient is 0 for k=2 and 0.5 for k=4)
15
Regular graphs: 2-lattices
(Figure: 2-lattices with k=2 (degree 4) and k=4 (degree 8); the clustering coefficient is 0 for k=2 and 6/28 for k=4, with 28 the number of potential connections in the neighborhood)
16
Properties of lattices
  • Degree: dk
  • Average path length in a d-lattice is
    approximately d x (1/2) x (n/2) / (k/2) = dn/2k
    (dimension x average x maximum distance / step
    size)
  • Clustering coefficient of a 1-lattice:
    (3/4)(k-2)/(k-1)
  • Regular graphs have a
  • high average path length
  • high clustering coefficient
  • So no small-world effect
17
Random graphs
  • Number of nodes: n
  • Probability of any potential link being present:
    p
  • Average degree: k = p(n-1)
  • Total number of links: pn(n-1)/2
  • May not be connected
  • Experimental result: for large n and k >= 5, the
    largest component of a random graph contains
    almost all nodes

(Figure: a node with n-1 potential links, each present with probability p)
18
Properties of random graphs
  • Clustering coefficient: p
  • Average path length: approximately log(n)/log(k)
  • Intuitive explanation
  • start with a single node
  • add its k neighbors
  • add the k^2 neighbors of those neighbors
  • do this l times, with l the average path length
  • then we have all nodes, so n is about k^l, i.e.,
    l is about log(n)/log(k)
  • Random graphs have a
  • low average path length
  • low clustering coefficient
  • So no small-world effect
19
Degree distribution of random graphs
  • The degree in a random graph with n nodes has a
    binomial distribution:
    P(degree = k) = C(n-1, k) p^k (1-p)^(n-1-k)
    (k links present out of the n-1 potential ones)
  • This decreases very fast for large k
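A small Python sketch of this binomial degree distribution (an added illustration, not from the slides; n and p are the parameters defined above, and the computation is done in log space to avoid overflow):

  from math import comb, exp, log

  def degree_pmf(n, p, k):
      # P(degree = k): k links present out of the n-1 potential ones
      return exp(log(comb(n - 1, k)) + k * log(p) + (n - 1 - k) * log(1 - p))

  n, p = 10000, 0.01            # average degree about p(n-1) = 100
  for k in (50, 100, 150, 200):
      print(k, degree_pmf(n, p, k))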
20
Comparison: regular vs. random graphs
  • Example: n = 10,000 nodes, every node P has
    k = 100 neighbors
  • 1-lattice: clustering coefficient 0.74, average
    path length 50
  • random graph (p = 0.01): clustering coefficient
    0.01, average path length 2.00
  • in the random graph, a neighbor Q has on average
    only 1 of P's 100 neighbors as its own neighbor
21
Power-law graphs (1) definition
  • The fraction of nodes P_k with k links satisfies
    P_k = C k^(-γ)
    for some C, γ > 0
  • P_k decreases very slowly
  • Usually the degree exponent γ is between 2 and 3
  • Properties
  • there exist a few nodes with many connections
  • a low average path length (order log(n))
  • random failures don't have a large impact
  • Power-law graphs occur often in nature
  • Other name scale-free networks

22
Power-law graphs (2): construction
  • Deterministic method to construct power-law
    graphs (a small code sketch follows below)
  • start with a graph G consisting of a single node
  • add two identical copies G' and G'' at a lower
    level
  • connect the root of G with all leaves of G' and
    G''
  • repeat steps 2 and 3 as often as you like

(Figure: G1; G2 with 9 nodes and 4 leaves at the lowest level; G3 with 27 nodes and 8 leaves at the lowest level)
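A minimal Python sketch of this deterministic construction (an added illustration; the representation as an adjacency dictionary plus a root and a list of leaves is my own):

  def grow(adj, root, leaves):
      # one step: add two copies of the graph and connect the root of the
      # original to all leaves of the two copies
      n = len(adj)
      new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
      new_leaves = []
      for copy in (1, 2):                      # the two copies G' and G''
          offset = copy * n                    # relabel the nodes of the copy
          for v, nbrs in adj.items():
              new_adj[v + offset] = {u + offset for u in nbrs}
          for leaf in leaves:                  # root of G -- leaves of the copy
              new_adj[root].add(leaf + offset)
              new_adj[leaf + offset].add(root)
              new_leaves.append(leaf + offset)
      return new_adj, root, new_leaves

  # G1: a root connected to two leaves (3 nodes, 2 leaves, root degree 2)
  adj, root, leaves = {0: {1, 2}, 1: {0}, 2: {0}}, 0, [1, 2]
  for _ in range(3):                           # build G2, G3, G4
      adj, root, leaves = grow(adj, root, leaves)
  print(len(adj), len(leaves), len(adj[root])) # 81 nodes, 16 leaves, root degree 30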
23
Power-law graphs (3): degree exponent
  • Graph Gn after n steps
  • number of nodes: 3^n
  • number of leaves at the lowest level: 2^n
  • degree of root: 2^(n+1) - 2 (by induction: equal
    to (2^n - 2) + 2*2^(n-1), i.e., the degree of the
    root in G(n-1) plus the links to the new leaves)
  • Hub: the root of the graph at some level
  • Concentrate for the computation of the degree
    exponent on hubs
  • Consider graph Gi after i steps
  • After n steps (n > i), there are 3^(n-i) copies
    of Gi (and of its root)
  • Of these roots, a fraction of 2/3 never increase
    their degree

(Figure: at n=i there is 1 copy of Gi, at n=i+1 there are 3, at n=i+2 there are 9; at each step the degree of 1 in 3 roots changes while that of the other 2 remains unchanged)
24
Power-law graphs (4): degree exponent
  • The degree of the root of Gi is 2^(i+1) - 2,
    approximately 2^(i+1)
  • So there are (2/3)*3^(n-i) nodes (hubs) in Gn
    with degree 2^(i+1) - 2, approximately 2^(i+1)
  • So a fraction of (2/3)*3^(-i) = 2*3^(-(i+1)) has
    this degree (divide by 3^n)
  • Might as well say: a relative fraction of 3^(-i)
    has degree 2^i
  • As a consequence, the degree distribution
    satisfies P_k ~ k^(-ln 3 / ln 2)
  • So the degree exponent is γ = ln 3 / ln 2
    (about 1.58)
  • Clustering coefficient is 0 (no triangles)

25
Power-law graphs (5)
  • Stochastic methods for generating power-law
    graphs (see the sketch below)
  • incremental growth: add nodes with links or sets
    of links one by one
  • preferential attachment: connect new nodes to
    highly connected ones
  • Node i has degree d_i
  • Node i is chosen with probability
    P_i = d_i / Σ_j d_j
  • Start with some graph with m0 nodes and m0 - 1
    edges
  • In each step
  • with probability p, add m (m <= m0) edges, with
    the endpoints chosen according to the probability
    distribution P_i
  • with probability 1-p, add a single node with m
    links, also according to the distribution P_i
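A minimal Python sketch of preferential attachment (an added illustration; it implements only the 'add a node with m links' step of the model above):

  import random

  def preferential_attachment(n_nodes, m=2):
      # grow a graph node by node; each new node attaches m links to
      # existing nodes chosen with probability proportional to their degree
      adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
      targets = [v for v, nbrs in adj.items() for _ in nbrs]  # degree-weighted
      for new in range(m + 1, n_nodes):
          chosen = set()
          while len(chosen) < m:                  # m distinct neighbors,
              chosen.add(random.choice(targets))  # picked ~ degree
          adj[new] = set()
          for v in chosen:
              adj[new].add(v)
              adj[v].add(new)
              targets += [v, new]                 # update the degree weights
      return adj

  graph = preferential_attachment(10000, m=2)
  degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
  print(degrees[:10])                             # a few highly connected hubs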

26
Power-law graphs (6)
  • From T. Bu and D. Towsley, On Distinguishing
    between Internet Power Law Topology Generators,
    IEEE Infocom 2002
  • The topology of the network of Autonomous
    Systems in the Internet has a power-law structure
    with a degree exponent of -2.18
  • The characteristic path length L of a graph is
    the median of the means of the shortest path
    lengths connecting every node to all other nodes
  • Comparison of Internet AS graph with random
    graph with same characteristic pathlength

(Table comparing the Internet AS graph with a random graph omitted)
27
Routing/searching (1)
  • In some P2P systems, both nodes and files are
    identified by random numbers
  • node ids: for instance, a hash of the IP address
  • file keys: for instance, a hash of the contents
    or of keywords
  • File search may be by
  • file name
  • key words
  • file key
  • Reply to a search can be
  • unique (search for a specific file)
  • not unique (multiple replies satisfying key words)

28
Routing/searching (2)
  • Distributed hash tables (DHTs)
  • use of random numbers for nodes and files
  • main problem: mapping files to nodes
  • the mapping is fixed
  • Unstructured P2P systems
  • the mapping of files to nodes is not fixed
  • Guaranteed routing
  • a search for an existing file is always
    successful
  • examples: Chord, DHTs in general
  • Probabilistic routing
  • a search for an existing file may fail
  • examples: Freenet, Gnutella

29
Probabilistic routing
  • Messages contain a Time-To-Live (TTL, or
    hops-to-live) value which determines the maximum
    number of hops a request message can travel
  • The TTL value is decremented in every node
    visited
  • When the TTL value reaches zero, the message is
    not forwarded anymore
  • Messages contain a (pseudo-random,
    pseudo-unique) identifier, so that a node can
    determine whether it has seen the message before
  • If so, the message may be discarded or sent one
    hop back

(Figure: the TTL is decremented from 3 to 2 to 1 to 0 along the chain of nodes)
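A minimal Python sketch of such TTL-limited forwarding with duplicate detection (an added illustration; the network representation and the node_has check are assumptions, not part of any particular protocol):

  import uuid

  def make_request(key, ttl=7):
      # every request carries a (pseudo-)unique id and a hops-to-live counter
      return {"id": uuid.uuid4().hex, "key": key, "ttl": ttl}

  def forward(node, msg, network, seen, node_has):
      # forward msg from node to its neighbors until the TTL runs out;
      # a node that has already seen the message id simply discards it
      if msg["id"] in seen.setdefault(node, set()):
          return
      seen[node].add(msg["id"])
      if node_has(node, msg["key"]):
          print("hit at node", node)
      if msg["ttl"] == 0:
          return                               # not forwarded anymore
      for neighbor in network[node]:
          forward(neighbor, {**msg, "ttl": msg["ttl"] - 1},
                  network, seen, node_has)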
30
Freenet design goals
  • Freenet is a cooperative distributed file system
    for anonymous information storage and retrieval
    with location independence and lazy replication
  • Nodes volunteer disk space, no guarantees for
    permanent availability of files
  • Design goals
  • anonymity of both producers and consumers of
    information
  • efficient dynamic storage and routing of
    information
  • decentralization of network functions
  • No
  • broadcast searches for files
  • centralized location indexes of files

31
Freenet file keys (1)
  • Files with a keyword-signed key
  • user chooses a descriptive text string for the
    file
  • from this keyword, a public/private key pair is
    derived
  • the public half is hashed to yield the file key
  • the private half is used to sign the file
  • the publisher of a file publishes the
    descriptive text string
  • the user uses the file key to locate the file
  • the user uses the signature to check the file
    (did I get what I asked for?)
  • Problems
  • two users may choose the same descriptive keyword
  • users may insert junk files under popular
    keywords (key-squatting)

32
Freenet file keys (2)
  • Files with a signed-subspace key
  • enables users to have personal name spaces,
    identified by a randomly generated public/private
    key
  • only owner of name space can add files (requires
    private key)

33
Freenet retrieving data (1)
  • Every node maintains a routing cache with
    (file-key, node-id) pairs of files it expects to
    be at specific nodes

(Figure: an example routing cache with (file key, node id) pairs, e.g., file keys F387, F783, F456, and F124 pointing to nodes 5, 8, and 1)
34
Freenet retrieving data (2)
  • A request for a file is forwarded to the node
    whose entry in the routing cache has the file key
    closest to the requested key
  • This is repeated along a chain of nodes

(Figure: a request for file key 398 is forwarded along nodes 1, 8, and 5)
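A minimal Python sketch of this closest-key forwarding decision (an added illustration; real Freenet keys are hashes, here plain integers and absolute distance are used):

  def next_hop(routing_cache, requested_key):
      # routing_cache maps file keys to node ids; pick the node whose cached
      # key is closest to the requested key
      closest_key = min(routing_cache, key=lambda k: abs(k - requested_key))
      return routing_cache[closest_key]

  cache = {387: 5, 783: 8, 124: 1}     # (file key, node id) pairs
  print(next_hop(cache, 398))          # forwards to node 5 (key 387 is closest)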
35
Freenet retrieving data (3)
  • A request travels a chain of length at most equal
    to the TTL
  • In the message, also a depth is maintained
  • increased at every hop
  • used as TTL for the way back
  • Nodes can play with these values to increase
    anonymity
  • When the TTL reaches zero, the request is not
    forwarded, and a failure message is sent back

36
Freenet retrieving data (4)
  • When a node holding the file is found, the file
    is sent back along the chain traveled by the
    request
  • Each node along this chain
  • caches the file
  • records the pair (file key, id of originating
    node) in its routing cache
  • To support anonymity, nodes along the way may
    pretend that they are the originator of the file

(Figure: file F398 is sent back along the chain to the requester; every node on the chain caches the file and records the pair (398, 5) in its routing cache)
37
Freenet storing data (1)
  • A request for insert contains
  • a file key
  • a hops-to-live value (number of copies to
    create)
  • The request for insert is routed as a request for
    a file
  • Each node along the way checks whether a key
    collision occurs
  • If so
  • an error message is sent back
  • the user has to choose a new file keyword and try
    again
  • When no collision is detected, the file is stored
    along the whole chain traveled by the insert
    request

38
Freenet storing data (2)
  • When no collision is detected
  • a success message is sent back
  • the file is stored along the whole chain traveled
    by the insert request
  • the pair (file key, id of inserter) is entered in
    the routing caches of the nodes along the chain

(Figure: node 1 inserts file F398; the file is stored along the whole chain, and every node on the chain records the pair (398, 1) in its routing cache)
39
Freenet effects of retrieving and storing
  • The keys in a node's routing table tend to get
    clustered
  • a node attracts requests for similar keys
  • a node attracts inserts for similar keys
  • Popular data will get stored (logically) close to
    the requesters
  • Requests and inserts increase connectivity,
    because nodes get to know about previously
    unknown nodes

40
Freenet remarks
  • Both the routing tables and the file storage may
    implement LRU
  • In general, it seems that Freenet in operation
    tends to a power-law network (connections are
    entries in the routing caches)

41
Gnutella (1)
  • Gnutella is search-oriented: a request may get
    multiple answers
  • Requests are broadcast
  • Requests carry a TTL (typical initial value: 7)
  • When a request revisits a node, it is discarded
  • Nodes are usually connected to about 3-5 other
    nodes
  • As a result, messages reach on the order of
    10,000 nodes (4^7 is about 16,000)

42
Gnutella (2)
  • Replies travel back along the route of the
    request
  • Replies do not contain contents, but the IP
    address of the location found
  • Downloads through HTTP
  • Gnutella should be self-organizing: fast nodes in
    the core of the network, slow ones at the fringes
  • Was very popular in 1999-2002
  • There are modifications to the protocol that make
    it more efficient

43
Distributed hash tables (DHTs)
  • Nodes and file keys are random numbers of some
    number of bits
  • Usually assumed to be uniformly distributed
  • So nodes with ids that are close are with high
    probability diverse in geography, ownership, etc.
  • Some mechanism is used to map (hash) keys to
    nodes in a distributed system
  • Main question: how to route requests for keys to
    the proper node
  • DHTs usually
  • have guaranteed routing
  • do not support anonymity

44
Chord hashing (1)
  • Chord uses what is called consistent hashing
  • Both file keys and node ids are m bits wide
    (e.g., m = 128)
  • Assume the numbers 0 through 2^m - 1 to be
    arranged clockwise along a ring
  • Some of these represent actual nodes
  • For a number k, successor(k) is the first actual
    node in the clockwise direction with id equal to
    or larger than k (so there are no nodes in
    between)
  • Assume a network of N nodes

(Figure: the ring of identifiers 0, 1, 2, ..., 2^m - 1, the actual nodes on it, and the successor of a key)
45
Chord hashing (2)
  • A file with key k is mapped to the node with id
    successor(k)
  • So every node is responsible for the files with
    keys that map to a segment of the ring

(Figure: a file with key k is mapped to node successor(k); each node is responsible for the segment of the ring preceding it; a file key that coincides with a node id is mapped to that node)
46
Chord locating (1)
  • In principle easy:
  • every node keeps the id and the IP address of
    its successor in the ring
  • when locating the file with key k, a node sends
    the file key along the ring
  • when the file key reaches node successor(k), it
    is clear whether the file is present in the
    system
  • a reply is sent to the originator of the request
  • but this is not efficient (of order N/2 hops)

47
Chord locating (2)
  • Additional data structure in every node:
  • the finger table
  • contains entries of (node-id, IP-address) pairs
    at distances equal to powers of 2
  • In node n:
    finger[i] = successor(n + 2^(i-1)), i = 1, 2, ..., m
  • Many of the first elements of the finger table
    will be equal: gaps between successive nodes in
    the ring are of expected size 2^m / N
  • With a million nodes (2^20) and m = 128, the gaps
    between nodes are of size about 2^108, and the
    first 108 elements of the finger table are
    expected to coincide

(Figure: node n and its finger-table entries successor(n+1), successor(n+2), successor(n+4), successor(n+8), i.e., the successors of n + 2^(i-1))
48
Chord locating (3)
  • When node n wants to locate key k:
  • if k = finger[i] for some i, check with node k
  • if k < finger[1], check with node finger[1]
  • else
  • send a request to the largest node n' = finger[i]
    with k > n'
  • n' repeats all of this

(Figure: the three cases for the position of key k relative to node n and its fingers)
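A minimal Python sketch of a Chord-style lookup (an added illustration; it builds the finger tables from a global list of node ids instead of via the join protocol, and forwards to the closest preceding finger as described above):

  import random
  from bisect import bisect_left

  M = 16                                  # identifier space of 2^M ids
  RING = 1 << M

  def successor(nodes, k):
      # first actual node clockwise with id >= k (nodes is a sorted list)
      i = bisect_left(nodes, k % RING)
      return nodes[i % len(nodes)]

  def fingers(nodes, n):
      # finger[i] = successor(n + 2^(i-1)) for i = 1, ..., M
      return [successor(nodes, (n + (1 << (i - 1))) % RING)
              for i in range(1, M + 1)]

  def dist(a, b):
      # clockwise distance from a to b on the ring
      return (b - a) % RING

  def lookup(nodes, n, k):
      # route a request for key k starting at node n; return (owner, hops)
      hops = 0
      while True:
          if dist(n, k) == 0:
              return n, hops              # the key coincides with a node id
          ft = fingers(nodes, n)
          succ = ft[0]                    # finger[1] is n's successor
          if dist(n, k) <= dist(n, succ):
              return succ, hops + 1       # k lies between n and its successor
          # closest preceding finger: the largest finger not overshooting k
          preceding = [f for f in ft if 0 < dist(n, f) < dist(n, k)]
          nxt = max(preceding, key=lambda f: dist(n, f)) if preceding else succ
          n, hops = nxt, hops + 1

  random.seed(1)
  nodes = sorted(random.sample(range(RING), 1000))
  key = random.randrange(RING)
  owner, hops = lookup(nodes, nodes[0], key)
  print(owner == successor(nodes, key), hops)   # True, about log2(1000) hops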
49
Chord locating (4)
  • Two possibilities for returning the reply:
  • the reply is sent back directly to the
    originating node
  • the reply is passed along to the next nodes

(Figure: the two options illustrated for a request from node n for file key k that is forwarded via finger entries)
50
Chord locating (5)
  • Complexity of a look-up: O(log N)
  • First: the complexity is of order at most m
    (the number of bits in the ids)
  • Let node n look for key k
  • Let node p be the last node preceding k (the last
    step in the search)
  • Consider the interval [n + 2^(i-1), n + 2^i) that
    contains p
  • In n, let finger[i] = f = successor(n + 2^(i-1))
  • Then f <= p
  • The distance between n and f is at least 2^(i-1)
  • The distance between f and p is at most 2^(i-1)
  • So f is closer to p than to n: the distance is
    cut at least in half!

51
Chord locating (6)
  • Assume nodes are uniformly distributed along the
    ring
  • So a node covers a segment of the ring of
    length about 2^m / N
  • Why is the complexity log(N) instead of m?
  • Reason: many of the first elements in a finger
    table coincide because nodes are sparsely
    distributed
  • So many of the last steps are covered by a single
    step

52
Chord locating (7)
  • On average, on a segment of the ring of length L,
    there are N(L/2^m) nodes (L/2^m is the fraction
    of the ring considered)
  • After A steps, the distance between the current
    query node and the key k will be reduced to at
    most 2^m/2^A (previous slide)
  • Take A = 2 log(N) = log(N^2): the distance
    between the current query node and the key k will
    be reduced to 2^m/N^2 (since 2^A = N^2)
  • The probability of having a node on a segment of
    length 2^m/N^2 is 1/N, which can be neglected
  • So only about A = 2 log(N) steps are needed

53
Chord node joins (1)
  • In principle easy:
  • a new node n has to know some existing node n' in
    the ring
  • and asks n' to find the elements of its finger
    table by searching for successor(n + 2^i), for
    i = 0, 1, ...
  • node n can then announce its presence to its
    successor (n2 in the figure) to get the file keys
    it should store (these keys are relocated from n2
    to n)
  • However, the finger tables of other nodes may
    have to be adapted (in particular, that of n's
    predecessor, n1)
  • In general, when data structures are suspected to
    be inconsistent, a node can execute steps 1 and 2

(Figure: the new node n joins the ring between its predecessor n1 and its successor n2)
54
Chord node joins (2)
  • Additional data structure: the predecessor
  • Steps for n to join:
  • node n gets finger[1] = n2 from n'
  • node n contacts n2
  • n2 sends its predecessor (n1) back, which now
    becomes the predecessor of n
  • n2 sets its predecessor to n
  • node n contacts n1 so that n1 can set its
    successor to n

(Figure: the successor and predecessor pointers among n1, n, and n2 after the join)
55
Content Addressable Network (CAN) (1)
  • CAN is a DHT with a d-dimensional uniform hash
    function
  • Keys are deterministically mapped to a point in a
    d-dimensional torus T
  • T is split up into zones (dynamically as peers
    join and leave)
  • Every peer is responsible for one zone

(Figure: a key is hashed to a point in the torus T (here d = 2); the peer responsible for the zone containing that point stores it)
56
CAN (2) Routing
  • Every node maintains a routing table with
  • the coordinates of the neighboring zones
  • the IP addresses of the responsible peers
  • Use a greedy routing algorithm: forward the
    request to the neighbor whose zone is closest to
    the key's point (see the sketch below)
  • Complexity (assume n peers/zones): the average
    path length is (d/4) n^(1/d)

(Figure: a grid of zones, about n^(1/d) per dimension; a peer with 5 neighbors; the point of the requested key)
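A minimal Python sketch of greedy routing on a d-dimensional torus (an added illustration; zones are abstracted to the points their peers own):

  def torus_distance(a, b, size=1.0):
      # Euclidean distance between points a and b with wrap-around
      return sum(min(abs(x - y), size - abs(x - y)) ** 2
                 for x, y in zip(a, b)) ** 0.5

  def greedy_route(points, neighbors, start, target):
      # hop from peer to peer, always to the neighbor closest to the target
      # points: peer id -> owned point; neighbors: peer id -> list of peer ids
      current, path = start, [start]
      while True:
          best = min(neighbors[current] + [current],
                     key=lambda p: torus_distance(points[p], target))
          if best == current:        # no neighbor is closer: this zone holds the key
              return path
          current = best
          path.append(current)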
57
CAN (3) joining
  • A peer A that wants to join
  • randomly chooses a point P in T
  • contacts any CAN peer B
  • and asks B to search for the zone/peer (say C)
    that holds P
  • Then the zone of C is split up equally between A
    and C

58
CAN (4) leaving
  • When a peer wants to leave, its zone is merged
    with a neighbor if that leads to a valid zone
  • Otherwise, the neighbor with the smallest zone is
    temporarily responsible for two zones

59
Pastry (1)
  • Node ids and file keys are K bits long, e.g.,
    K = 128 bits
  • Let N = 2^K
  • Node ids and file keys are assumed to be integers
    with digits in base 2^b for some b
  • If b = 4 and K = 128, an id consists of 32 digits
    of 4 bits each
  • Pastry maps a file key to the node with id
    numerically closest to it
  • Pastry uses hypercube-like routing
  • Pastry uses a notion of (scalar) proximity
    (round-trip time, number of hops in the Internet)

(Figure: a node id as a string of digits a0 a1 ... a31, e.g., 1001 1101 ... 0101)
60
Intermezzo: hypercubes
  • An n-dimensional hypercube has 2^n nodes and
    n*2^(n-1) connections
  • maximum distance: n
  • nodes whose ids differ in exactly 1 bit are
    connected
  • Routing (e.g., from 000 to 111):
  • scan the bits from right to left
  • if the current bit differs from the destination,
    send the message to the neighbor that has that
    bit flipped
  • repeat until all bits agree
(Figure: hypercubes for n = 2 (nodes 00, 01, 10, 11) and n = 3 (nodes 000 through 111))
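A minimal Python sketch of this bit-fixing routing (an added illustration):

  def hypercube_route(src, dst, n):
      # fix the differing bits from right (bit 0) to left, recording each hop
      path, current = [src], src
      for bit in range(n):
          if (current ^ dst) & (1 << bit):     # bits differ at this position
              current ^= 1 << bit              # go to the neighbor with it flipped
              path.append(current)
      return path

  print([format(v, "03b") for v in hypercube_route(0b000, 0b111, 3)])
  # ['000', '001', '011', '111']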
61
Pastry (2) routing table
  • Assume b = 2, K = 16
  • Notation: common prefix | next digit | rest of
    the node id
  • Routing table in node 10233102 (IP addresses
    omitted)

(Figure: the routing table of node 10233102, with one row per length of the common prefix with a key (no common prefix, first digit in common, first 2 digits, first 3 digits, ...) and one column per possible value of the next digit; an entry is empty when no node with a suitable id is known. Questions: how many rows and how many columns does the table have?)
62
Pastry (3) routing table
  • Size of the routing table: log_{2^b}(N) rows x
    (2^b - 1) columns
  • A node tries to fill all rows in its routing
    table
  • The lower in the table, the more difficult this
    is (fewer and fewer nodes have the right prefix)
  • Routing/searching for file key k in node n (see
    the sketch below):
  • send the request to a node which has a longer
    common prefix with k than n has
  • if such a node does not appear in the routing
    table, send k to a node with a common prefix of
    equal length which is numerically closer to k
  • repeat this until the responsible node is found
  • For most entries in the routing table, there are
    many choices: choose one with good proximity
63
Pastry (4) routing table
  • In node 10233102, search for key 10211011
  • Routing complexity: log_{2^b}(N)

64
Pastry (5) leaf set
  • It is a rare coincidence that a routing table is
    completely filled
  • The leaf set of a node is a set of nodes of size
    L, for some L
  • The leaf set of node n contains the L/2
    numerically closest smaller node ids and the L/2
    numerically closest larger node ids
  • The smallest (largest) of all these is S (T)
  • When routing for key k, always first check
    whether S <= k <= T
  • If so, map k to the numerically closest node in
    the leaf set
  • Otherwise, use the routing table

(Figure: the leaf set of node n: the L/2 numerically closest smaller and the L/2 numerically closest larger node ids, ranging from S to T, and a key k)
65
Pastry (6) node joins
  • Assume a joining node n knows at least one node
    n1
  • Problem: how to fill the routing table of n
  • Solution:
  • let n ask n1 to look for n itself!
  • if n and n1 do not have a common prefix, n1 will
    send the request to a node n2 which has the first
    digit in common with n
  • so the second row of the routing table of n2 will
    have its first digit equal to the first digit of
    n
  • so n uses the second row of the routing table of
    n2
  • and n uses the third row of the routing table of
    n3
  • etc.

(Figure: the join request travels from n1 via n2, ... to the node z that is numerically closest to n; n takes its first row from n1, its second row from n2, and so on, and its leaf set from z)
66
BitTorrent (1) searching
  • BitTorrent is only a file downloading protocol
  • It depends for file searching on other components
    (web sites with content lists and trackers)
  • Trackers: central components that keep track, for
    each file, of which peers are currently
    downloading it
  • A peer that wants to download a file gets a
    .torrent file from a web site with the address of
    a tracker

(Figure: a web site such as Suprnova provides the .torrent file; the tracker knows the peers downloading the same file)
67
BitTorrent (2) downloading
  • Files are split up into chunks (pieces of the
    file) of size, e.g., 256 KByte (on the order of
    1000 chunks per file)
  • A swarm is the set of peers that have all or part
    of a file and are still online
  • Peers in a swarm exchange the ids of the chunks
    they have
  • Leechers have only a part of a file and barter
    for chunks of the file (tit-for-tat)
  • Seeders have the complete file and upload for free

(Figure: a swarm with downloaders and a seeder)
68
BitTorrent (3) downloading
  • Downloaders barter with a small number of peers
    (e.g., 4)
  • Optimistic unchoking: every 30 seconds, a
    downloader contacts a fifth peer to see whether
    it offers better download performance
  • If so, it replaces one of the four peers
  • BitTorrent implements
  • the rarest-first policy, to ensure a uniform
    distribution of pieces among downloaders (see the
    sketch below)
  • endgame mode, to prevent peers that have all but
    a few pieces from waiting too long to finish

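A minimal Python sketch of rarest-first piece selection (an added illustration; real clients combine this with randomization and the bartering policies above):

  from collections import Counter

  def rarest_first(my_pieces, peers_pieces):
      # pick a piece we are still missing that occurs in the fewest peers
      availability = Counter(p for pieces in peers_pieces for p in pieces)
      candidates = [p for p in availability if p not in my_pieces]
      if not candidates:
          return None
      return min(candidates, key=lambda p: availability[p])

  print(rarest_first({0, 1}, [{0, 1, 2, 3}, {1, 2, 3}, {2, 3}, {3, 4}]))  # piece 4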
69
BitTorrent analysis (1)
  • A fluid model of the operation of BitTorrent for
    a single file
  • Parameters of the model
  • x(t): the number of downloaders
  • y(t): the number of seeders
  • λ: the arrival rate of peers wanting to download
    the file
  • θ: the rate at which downloaders abort the
    download
  • γ: the rate at which seeders leave the system
  • μ: the upload bandwidth of all peers
  • c: the download bandwidth of all peers, c >= μ

(Figure: peers arrive at rate λ; x(t) downloaders become seeders; y(t) seeders leave at rate γ)
70
BitTorrent analysis (2)
  • Maximum download capacity: c x(t)
  • Maximum upload capacity: μ(ηx(t) + y(t))
  • The parameter η accounts for the reduced
    download effectiveness of downloaders because
    they do not have the complete file
  • Arrival and departure distributions are assumed
    to be exponential
  • System evolution is governed by two differential
    equations:
    dx(t)/dt = λ - θx(t) - min{cx(t), μ(ηx(t) + y(t))}
               (arrivals - aborts - downloaders that become seeders)
    dy(t)/dt = min{cx(t), μ(ηx(t) + y(t))} - γy(t)
               (new seeders - seeders that leave)
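A minimal Python sketch that integrates these two equations with a simple Euler scheme (an added illustration; the parameter values are arbitrary):

  def simulate(lam, theta, gamma, mu, c, eta, dt=0.01, t_end=1000.0):
      # Euler integration of
      #   dx/dt = lam - theta*x - min(c*x, mu*(eta*x + y))
      #   dy/dt = min(c*x, mu*(eta*x + y)) - gamma*y
      x, y = 0.0, 0.0
      for _ in range(int(t_end / dt)):
          completed = min(c * x, mu * (eta * x + y))  # downloads finished per time unit
          x += (lam - theta * x - completed) * dt
          y += (completed - gamma * y) * dt
      return x, y                                     # approximate steady state

  x_bar, y_bar = simulate(lam=1.0, theta=0.01, gamma=0.1, mu=0.005, c=0.01, eta=0.5)
  print(x_bar, y_bar)      # steady-state numbers of downloaders and seeders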
71
BitTorrent analysis (3)
  • In steady state: dx(t)/dt = dy(t)/dt = 0
  • Then the steady-state values x and y satisfy
    λ = θx + γy and γy = min{cx, μ(ηx + y)}
  • Suppose θ is positive
  • Then, with 1/β = max{1/c, (1/η)(1/μ - 1/γ)},
    x = λ/(θ + β) and y = (λ - θx)/γ

72
BitTorrent analysis (4)
  • Let T be the average download time of the
    downloaders that become seeders
  • Arrival rate of downloaders that become seeders:
    λ - θx = γy
  • Fraction of downloaders that become seeders:
    (λ - θx)/λ
  • Little's formula:
  • average number of jobs in the system: N
  • arrival rate: λ
  • average time in the system: W
  • then N = λW
  • Apply this here

73
BitTorrent analysis (5)
  • So T = 1/(θ + β)
  • with 1/β = max{1/c, (1/η)(1/μ - 1/γ)}
  • Consequences
  • BitTorrent scales well: T does not depend on the
    arrival rate λ
  • When the download effectiveness η increases, T
    decreases
  • When the departure rate of seeders γ increases, T
    increases
  • Up to some point, increasing the download
    bandwidth c decreases T (until it is no longer
    the bottleneck)
  • A similar statement holds for the upload
    bandwidth μ

74
BitTorrent analysis (6)
  • Suppose downloader D is connected to a set O of k
    other downloaders
  • η = 1 - P(D has no piece that any peer in O
    needs)
  • Assume that the piece distributions at the peers
    are identical and independent
  • Then η = 1 - P(D has no piece that peer j in O
    needs)^k

75
BitTorrent analysis (7)
  • Assume that
  • the numbers of pieces in the downloaders are
    uniformly distributed on {0, 1, ..., N-1}, with N
    the number of pieces in the file
  • the pieces at any downloader are taken at random
    and uniformly from the pieces of the file
    (rarest-piece first)
  • Then a computation shows that
  • So in practice, η will be close to 1