Title: IEG5270 Advanced Topics in P2P Networking Modeling and analysis of p2p content distribution algorith
1 IEG5270 Advanced Topics in P2P Networking
Modeling and analysis of p2p content distribution algorithms, Part 1
- Dah Ming Chiu
- Chinese University of Hong Kong
2 Recap
- The secrets of P2P content distribution
- Use multiple trees!
- Tree-based (push)
- Data-driven (pull)
- Utilize peers' resources to satisfy the goal
- File downloading (as fast as possible)
- Streaming (as continuous as possible at a given playback rate)
- BitTorrent
- A working, data-driven, p2p file-downloading algorithm
- Bootstrapping: metainfo, tracker
- Peer selection algorithm
- Chunk selection algorithm
3 Models
- Why do we need models?
- To understand the limits of a p2p system
- To understand the key factors, for designing better algorithms
- To help operate a p2p system
- The complexity of modeling a p2p system
- It is a large-scale distributed system
- A large number of peers, each with its own state
- The complexity of modeling the network conditions
- Links, routers, ISPs, bottlenecks
- The complexity of modeling the peer dynamics
- How do they arrive and leave the system?
- Accuracy versus simplicity/usefulness
4 Two types of models
- Throughput models
- Try to understand the capacity (maximum throughput) of BT-like systems
- A large number of models in this category
- Different levels of abstraction
- Different peer dynamics
- Different variations of the BT algorithm
- Exploring different metrics
- P2P streaming models
- YP Zhou et al is the first (?)
- Model the playback buffer to study playback performance
5 The uplink sharing model
- Assume away the network complexity
- Only the uplink can be the bottleneck in peer communications
- Practically all models make this assumption
- It's like the Poisson arrival assumption in studying a queue
- The term is coined in Mundinger's thesis
6 Mundinger's thesis
- Summarized in
- Optimal Scheduling of P2P File Dissemination
- To appear in Journal of Scheduling
- Study the problem using a discrete model
- Closer to real situation
- Linkage to previous work on scheduling
- The results can be easily appreciated using the fluid assumption
- Most other models make the fluid assumption: there are many chunks, each very small
- Key assumptions
- Uplink sharing model
- Static peer population
7 Related work
- The broadcast problem
- Makespan: the number of rounds for all N nodes to receive M messages from a sender
- Unidirectional communication: 2M + log2(N)
- Bidirectional communication: M + log2(N)
- In general, if peer i's uplink capacity is Ci (the server's is C0), what is the minimum makespan?
- In this case, peers serve each other asynchronously
8 Convert to a synchronous problem
- First show that the minimum makespan can be achieved using a synchronous system
- Upload to one peer at a time
- Upload chunks in discrete time slots
9 Prove equal capacity case by construction
The M=2, N=3 case
The first phase of the M>2, N>3 case
In the subsequent phase, peers start back-filling
Construct a strategy to finish all pieces
See paper for details
10 Discussion
- The meaning of the formula
- For sufficiently large M, the minimum makespan
changes very slowly with increasing peer
population
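The formula itself is not shown on this slide. A minimal sketch, assuming the equal-capacity result from Mundinger's paper with the file size and every uplink capacity normalized to 1, so that the minimum makespan is 1 + floor(log2 N)/M; the function name and the sample values are illustrative only:

```python
def min_makespan(num_peers, num_chunks):
    """Minimum makespan 1 + floor(log2 N) / M, assuming equal unit uplink
    capacities and a unit-size file split into M chunks."""
    if num_peers < 1 or num_chunks < 1:
        raise ValueError("need at least one peer and one chunk")
    # bit_length() - 1 gives an exact integer floor(log2 N).
    return 1 + (num_peers.bit_length() - 1) / num_chunks


# Compare a single-chunk file (M = 1) with a finely chunked one (M = 1000).
for n in (10, 1000, 100000):
    print(n, min_makespan(n, 1), min_makespan(n, 1000))
```

With M = 1 the makespan grows with log2 N, while with M = 1000 it stays close to 1 over the same range of N, which is the slow growth the slide refers to.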
11 The unequal capacity case
- What is the minimum makespan for peers with different uplink capacities?
- It is a MILP (mixed integer linear programming) problem
- See paper for details
- The following lemma is used to discretize the problem
But there is a simple closed-form solution, asymptotically, when M is very large!
12 Consider simple examples (small N and M)
- C1 = C2 = ... = CN, but CS can be different
- Example 1: N=2, M=1, two cases
- Both peers download from the server
- Peer 1 downloads from the server, peer 2 downloads from peer 1
13 Example 2
- N=2, M=2, four cases
- Everything downloaded from the server
- One peer downloads from the server, the 2nd peer downloads from the 1st
- One peer downloads from the server, the 2nd peer downloads partly from the 1st and partly from the server
- Each peer downloads exactly one chunk from the server, and the other chunk from each other
14 Example 2 analysis
- N=2, M=2, four cases
- Everything downloaded from the server
- One peer downloads from the server, the 2nd peer downloads from the 1st
- One peer downloads from the server, the 2nd peer downloads partly from the 1st and partly from the server
- Each peer downloads exactly one chunk from the server, and the other chunk from each other
15 Minimum makespan for N=M=2
16 The M → ∞ case: fluid assumption
- Instead of makespan, think in terms of throughput
- Given the uplink bandwidth from the server and peers, how do we allocate it so content can reach each peer at the (same) maximum rate?
- Mundinger proved something more general
For us, it is more intuitive, and more immediately useful, to look at the single-server case
17 The simpler/more useful result
- Given the uplink sharing model (right figure) and very large M, the maximum throughput is R = min{C0, (C0 + sum_i(Ci))/n}
This is a special case of Theorem 4.
DM Chiu et al, Can network coding help in P2P networks, invited paper at 2nd NETCOD workshop, 2006
18 Proof step 1: R is an upper bound
- C0 is the uplink bandwidth from the server
- The server must be able to send the content out at least once, so C0 is a clear upper bound
- C0 + sum_j(Cj) is the total uplink bandwidth from the server and all peers
- Each peer must receive the server's content from somewhere.
- The total demand is therefore nR, which cannot exceed the total supply
19 Proof step 2: Realizing R by construction
- One 1-hop tree (from sender)
- N 2-hop trees (from each peer)
- Assign rates optimally, satisfying the uplink constraints
- Two cases
- C0 > C/(n-1), where C = sum_i(Ci)
- C0 < C/(n-1)
20 Maximum rate achieved
- Case 1: C0 > C/(n-1), where C = sum_i(Ci)
- Assign rate Ci/(n-1) to the ith 2-hop spanning tree
- The ith spanning tree can deliver Ci/(n-1) to each other peer
- Assign rate C0 - C/(n-1) to the 1-hop spanning tree
- The server can deliver (C0 - C/(n-1))/n to each peer
- Each peer receives (C0 + C)/n
- Case 2: C0 < C/(n-1)
- Assign rate C0*Ci/C to the ith 2-hop spanning tree, which can then deliver to all other peers at that rate
- Each peer receives sum_i(C0*Ci/C) = C0
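The two cases can be checked numerically. The following is a minimal sketch, assuming n >= 2 peers, that each 2-hop tree passes through exactly one peer, and that the 1-hop tree goes directly from the server to every peer; the function names and sample capacities are illustrative, not taken from the paper:

```python
def max_throughput(c0, peer_caps):
    """Upper bound R = min(C0, (C0 + sum Ci) / n) from proof step 1."""
    n = len(peer_caps)
    return min(c0, (c0 + sum(peer_caps)) / n)


def tree_rates(c0, peer_caps):
    """Per-receiver rate of each 2-hop tree and of the 1-hop tree (proof step 2)."""
    n, total = len(peer_caps), sum(peer_caps)
    if c0 >= total / (n - 1):
        # Case 1: peer uplinks are saturated; the server's surplus feeds the 1-hop tree.
        two_hop = [ci / (n - 1) for ci in peer_caps]
        one_hop = (c0 - total / (n - 1)) / n
    else:
        # Case 2: the server is the bottleneck; split C0 in proportion to Ci.
        two_hop = [c0 * ci / total for ci in peer_caps]
        one_hop = 0.0
    return two_hop, one_hop


def received_rate(c0, peer_caps):
    """Rate each peer receives, after checking every uplink constraint."""
    n = len(peer_caps)
    two_hop, one_hop = tree_rates(c0, peer_caps)
    server_load = sum(two_hop) + n * one_hop      # one stream per 2-hop tree, n on the 1-hop tree
    peer_loads = [(n - 1) * r for r in two_hop]   # peer i forwards to the other n-1 peers
    assert server_load <= c0 + 1e-9
    assert all(load <= ci + 1e-9 for load, ci in zip(peer_loads, peer_caps))
    return sum(two_hop) + one_hop                 # every peer is reached by every tree


for c0 in (10.0, 3.0):  # case 1, then case 2, with C = 12 and n = 3
    caps = [2.0, 4.0, 6.0]
    print(c0, received_rate(c0, caps), max_throughput(c0, caps))
```

In both runs the rate produced by the construction coincides with the upper bound, which is the content of the proof.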
21 Summary
- Simple static model
- Uplink sharing model
- Large M: fluid assumption
- Static population (1 server, n peers)
- Peers always able to help each other
- One simple extension: more seeders
- m seeders and n downloaders
- Reason: some peers stay around after becoming seeders
- Try to work this out yourself
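22 Dynamic peers model
- Qiu and Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, Sigcomm 2004 (Google Scholar 200)
[Table: parameters of the Qiu-Srikant model and their values in Mundinger's model. The parameter symbols are missing; the surviving values, in their original order, are: a constant n; a constant 1; 0; constant Ci; unbounded; 0. One parameter carries the note "this is assumed to be 1".]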
23 A fluid model of population
- System equations
- In steady state
- where
η > 0 assumed
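The equations for this slide are not shown. A reconstruction, assuming the notation of the Qiu-Srikant paper (x(t) downloaders, y(t) seeds, arrival rate λ, downloader abort rate θ, download bandwidth c, upload bandwidth μ, sharing efficiency η, seed departure rate γ), would read as follows; it should be checked against the paper before being relied on:

```latex
% Fluid model (system equations)
\begin{align*}
\frac{dx}{dt} &= \lambda - \theta\,x(t) - \min\{c\,x(t),\ \mu(\eta\,x(t) + y(t))\},\\
\frac{dy}{dt} &= \min\{c\,x(t),\ \mu(\eta\,x(t) + y(t))\} - \gamma\,y(t).
\end{align*}
% Steady state, obtained by setting both derivatives to zero
\begin{align*}
\bar{x} = \frac{\lambda}{\beta\,(1 + \theta/\beta)}, \qquad
\bar{y} = \frac{\lambda}{\gamma\,(1 + \theta/\beta)}, \qquad
\text{where}\quad
\frac{1}{\beta} = \max\left\{\frac{1}{c},\ \frac{1}{\eta}\left(\frac{1}{\mu} - \frac{1}{\gamma}\right)\right\}.
\end{align*}
```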
24 Downloading time
Average number of peers who will complete
downloads
Average rate downloads are completed
Average downloading time
where
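The three labels above appear to be the terms of Little's law. A sketch of that step, assuming the Qiu-Srikant notation (steady-state number of downloaders x̄, arrival rate λ, abort rate θ, download bandwidth c, upload bandwidth μ, sharing efficiency η, seed departure rate γ); the slide's own formula is not shown, so this is a reconstruction rather than a quotation:

```latex
\begin{align*}
\underbrace{\frac{\lambda - \theta\bar{x}}{\lambda}\,\bar{x}}_{\text{peers who will complete}}
  &= \underbrace{(\lambda - \theta\bar{x})}_{\text{completion rate}}\; T
  \quad\Longrightarrow\quad
  T = \frac{\bar{x}}{\lambda} = \frac{1}{\theta + \beta},\\
\text{where}\quad
\frac{1}{\beta} &= \max\left\{\frac{1}{c},\ \frac{1}{\eta}\left(\frac{1}{\mu} - \frac{1}{\gamma}\right)\right\}.
\end{align*}
```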
25 Observations
- T is not dependent on the arrival rate λ!
- When the sharing efficiency η increases, T decreases
- When the seeders' departure rate γ increases, T increases
- Initially, when the downloading rate c increases, T decreases, but once c is large enough (uplinks become the bottleneck), c no longer affects T
- Same observation about the peer uplink rate μ
- Normally c > μ for a given peer, but if γ < μ, then there will be abundant uplink capacity (due to helpers), and the downlink will be the bottleneck
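These observations can be tried out numerically. The sketch below assumes the Qiu-Srikant fluid model dx/dt = λ - θx - min{cx, μ(ηx + y)}, dy/dt = min{cx, μ(ηx + y)} - γy, with average downloading time T = 1/(θ + β) and 1/β = max{1/c, (1/η)(1/μ - 1/γ)}; the parameter values, the forward-Euler step size, and the function names are arbitrary illustrative choices:

```python
def beta(c, mu, eta, gamma):
    """Effective service rate: 1/beta = max(1/c, (1/eta) * (1/mu - 1/gamma))."""
    return 1.0 / max(1.0 / c, (1.0 / eta) * (1.0 / mu - 1.0 / gamma))


def download_time(theta, c, mu, eta, gamma):
    """Average downloading time T = 1 / (theta + beta); the arrival rate does not appear."""
    return 1.0 / (theta + beta(c, mu, eta, gamma))


def simulate(lam, theta, c, mu, eta, gamma, dt=0.01, steps=40000):
    """Forward-Euler integration of the fluid model, starting from an empty system."""
    x = y = 0.0
    for _ in range(steps):
        served = min(c * x, mu * (eta * x + y))  # rate at which downloads complete
        dx = lam - theta * x - served
        dy = served - gamma * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


params = dict(theta=0.1, c=2.0, mu=0.5, eta=0.8, gamma=1.0)
T = download_time(**params)
for lam in (5.0, 10.0):
    x, y = simulate(lam, **params)
    # By Little's law x_bar = lam * T, so x / lam should approach T for every lam.
    print(lam, round(x / lam, 4), round(T, 4))
```

Changing `eta`, `gamma`, or `c` in `params` shows the other trends listed above, and setting `gamma` below `mu` makes the downlink term 1/c the binding one.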
26 Effect of sharing efficiency η
- In the capacity equation (Mundinger), η=1 is assumed
- The analysis so far assumes 0<η<1
- When η=0 (no sharing by downloaders), two cases
- If the rate of seeds leaving (γ) is less than the rate at which a seed can help a peer download a file (μ), then the system is limited by the downloading capacity: T = 1/c
- If otherwise
- This equation means y will decrease to 0, and the system dies.
- Note: in this model, there is no server, only seeders.
27 Stability
- It is not sufficient to derive a steady-state population by setting dx/dt = dy/dt = 0
- Must also show that the system will converge to the steady state
- See detailed discussion of local stability in paper
28 Simulation results
29 Comparison to a queuing system
- A regular queue models a client-server system: when the arrival rate exceeds the service rate, the system blows up (queue goes to infinity)
- The analysis of a p2p system is similar to that for a queue, except the customers are also servers
- The arrival of a customer also brings some additional service capacity (η>0 case)
- This makes T stay finite: the p2p queue is infinitely scalable!
- A queue may have different service models: FCFS, LCFS, priority, PS
- What is the effect of different service models in a p2p system?
30 Tradeoff in rate allocation in uplink sharing model
- B Fan, DM Chiu and J Lui, The delicate tradeoffs in BitTorrent-like file sharing protocol design, ICNP 2006
- Similar to Qiu and Srikant's model
- But multiple classes of peers
- Arrival rate
- Uplink capacity
- Downlink capacity
- "Fat" and "thin" peers if 2 classes
- No seeders
31 Feasible rate allocation space
32 Performance metric 1: average downloading time
- In steady state
- Number of type i peers
- Total upload capacity
- C??
- ciui/di ?
- ci is the share ratio of peer i
- Download time of type i peers
33 Performance metric 2: fairness
- Fairness index
- It can be applied to any quantity x. In this case, let xi = ci
- Therefore, fairness index
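The formula for the fairness index is not shown above. A minimal sketch, assuming the index meant here is Jain's fairness index F(x) = (sum_i xi)^2 / (n * sum_i xi^2), which is 1 when all xi are equal and 1/n when only one is nonzero; the unweighted form and the sample share ratios are assumptions, and the paper may weight each class by its population:

```python
def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), a value in (0, 1]."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    if n == 0 or squares == 0:
        raise ValueError("need at least one nonzero value")
    return total * total / (n * squares)


equal_shares = [1.0, 1.0, 1.0, 1.0]    # every peer uploads as much as it downloads
skewed_shares = [2.0, 0.5, 0.5, 0.5]   # one peer subsidizes the other three
print(jain_fairness(equal_shares))
print(jain_fairness(skewed_shares))
```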
34 Tradeoff
- Each rate allocation (u,d) yields a different T and F
- The space of different strategies and achievable (T,F)
- Consider three specific allocations
- Optimal downloading time
- Optimal fairness
- in terms of share ratio
- Max-min rate allocation
- Equalize the downloading rate of all peers, unless it is limited by the downlink capacity
35 Optimizing downloading time
- Problem definition
- Note
- assuming uploading rate = uplink capacity
- Smaller type number means higher uplink capacity
- Applying standard Lagrange multiplier methods and KKT conditions
- Type 1 peers get less downloading rate, proportionally speaking
- Type 1 peers may even get less downloading rate than thin peers
36 Optimizing fairness
- This means equalizing the share ratios
- Therefore
- By same Lagrange multiplier methods
37 Max-min rate allocation
- Problem definition
- Can use the water-filling algorithm to find the solution.
- If none of the downlinks are bottlenecks
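A minimal sketch of water-filling for a max-min allocation, assuming a total upload capacity `total` is divided among individual peers whose downlink capacities are listed in `downlinks`; these names, the per-peer (rather than per-class) formulation, and the sample numbers are illustrative assumptions, not the paper's notation. Every peer gets the same rate unless its downlink caps it, and capacity left unused by capped peers is shared among the rest:

```python
def max_min_allocation(total, downlinks):
    """Water-filling: raise a common rate level until the capacity is used up,
    freezing each peer at its downlink capacity once the level reaches it."""
    rates = [0.0] * len(downlinks)
    remaining = float(total)
    # Visit peers from the smallest downlink to the largest.
    order = sorted(range(len(downlinks)), key=lambda i: downlinks[i])
    for pos, i in enumerate(order):
        fair_share = remaining / (len(order) - pos)
        rates[i] = min(downlinks[i], fair_share)
        remaining -= rates[i]
    return rates


print(max_min_allocation(10.0, [1.0, 2.0, 8.0, 8.0]))  # two peers are downlink-limited
print(max_min_allocation(4.0, [8.0, 8.0, 8.0, 8.0]))   # no downlink is a bottleneck
```

When no downlink is a bottleneck, as in the second call, every peer simply receives total/n.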
38 Compare the results
- Since we have the closed-form results for these cases
- The concept of Pareto optimal, or non-inferior, allocations
39 Apply it to distributed algorithms
- Rate allocation analysis so far assumes knowledge only available in centralized algorithms
- Examples of distributed algorithms found in BT
- Tit-for-tat
- This yields the most fair rate allocation, in equilibrium.
- Peers form clusters
- Random
- This gives a solution similar to max-min
- Each peer gets the average uploading rate in steady state
- Possible to adopt a weighted average of the two
- Select ns neighbors using tit-for-tat, and na neighbors randomly
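A minimal sketch of the mixed strategy, assuming each peer tracks the rate at which every candidate neighbor has recently uploaded to it; the function name `select_neighbors`, the `rates_received` dictionary, and the sample values are hypothetical, and this is a simplification rather than BitTorrent's actual choking algorithm. The ns tit-for-tat slots go to the best recent uploaders, and the na remaining slots are filled uniformly at random from everyone else:

```python
import random


def select_neighbors(rates_received, ns, na, rng=random):
    """Pick ns neighbors by tit-for-tat (highest observed upload rate to us)
    and na further neighbors uniformly at random from the remaining peers."""
    ranked = sorted(rates_received, key=rates_received.get, reverse=True)
    tft = ranked[:ns]
    others = ranked[ns:]
    randomly_chosen = rng.sample(others, min(na, len(others)))
    return tft, randomly_chosen


# Hypothetical observed upload rates from five candidate neighbors.
rates = {"p1": 50.0, "p2": 10.0, "p3": 80.0, "p4": 0.0, "p5": 25.0}
tft, rnd = select_neighbors(rates, ns=2, na=1, rng=random.Random(0))
print(tft, rnd)
```

Setting na = 0 gives pure tit-for-tat and ns = 0 gives pure random selection, so the ratio of the two acts as the weight between the fair and the max-min-like allocations.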
40 Different mix of TFT and random
For small na, the mixed strategy is not able to find the best-matched neighbors (total number of peers = 100)
41 Recap
- So far, we considered three models
- Mundinger's model
- Gives throughput capacity for small N, M, or very large M
- But static peer population
- Qiu-Srikant model
- Gives average delay (hence also capacity) for dynamic peer population, and various other parameters
- But sharing efficiency modeled by one parameter η
- Single class of peers
- Fan-Chiu-Lui model
- Multiple classes of peers, and the tradeoff of throughput and fairness
- Assumes perfect sharing efficiency
- How to more accurately model sharing efficiency?