Title: A Step Back: Reflections on P2P
1A Step Back: Reflections on P2P
- Boris Capitanu
- Ellick Chan
- 10/12/2004
22 P2P or Not 2 P2P?
- Mema Roussopoulos
- Mary Baker
- David S. H. Rosenthal
- TJ Giuli
- Petros Maniatis
- Jeff Mogul
3Ideal P2P properties
- Self Organizing
- P2P routing
- Discovery
- Symmetric communication
- Peers are approximately equal
- Decentralized control
- No single point of failure
4Constraints
- Budget
- Resource relevance to participants
- Trust
- Rate of system change
- Criticality
- Accountability
- Fault tolerance
5Candidate problems
- Routing Problems
- Internet Routing (RON)
- Ad hoc Routing in Disaster Recovery
- Metropolitan-area Cell Phone Forwarding
- Backup
- Internet Backup
- Corporate Backup
- Distributed Monitoring
6Candidate problems
- Data Sharing
- File sharing
- Censorship Resistance
- Data Dissemination
- Usenet
- Non-critical Content Distribution
- Critical Flash Crowds
- Auditing
- Digital Preservation
- Distributed Time Stamping
7P2P models
Usenet
Gnutella
Images from http://www.cybergeography.org/atlas/more_topology.html
82 P2P or not P2P
Budget
Relevance
Trust
9Budget
Low
Effect
High
- Lowest possible cost per peer, rather than lowest global cost
- Bit Torrent, Gnutella, Freenet, etc.
- SETI@home
- Dictates how many peers join
- Decides if P2P is viable for problem
- Worries less about performance criticality
- Favors centralized approaches, P2P irrelevant
- Clusters, High performance computing
10Relevance
Low
Effect
High
- Personal data
- Private data
- Internet backup
- Corporate backup
- Web caching
- Relevance of resources encourages peers to join
- When resource relevance is high, cooperation in
a P2P solution evolves naturally
- File sharing
- Freenet
- Content distribution
- Internet routing
- Bit Torrent
- Gnutella
- Kazaa
11Trust
Low
Effect
High
- Encryption
- Anonymity
- Freenet
- Oceanstore
- Ivy
- Timestamping
- MojoNation
- Gnutella
- Napster
- Overlays
- File sharing
- Usenet
12Rate of Change
Low
Effect
High
- Tangler
- Freenet
- LOCKSS
- Time stamping
- Content distribution
- Usenet
- Flash crowds
- Churn
- Timeliness
- Consistency
- Internet routing
- Online net monitoring
13Criticality
Low
Effect
High
- Usenet
- Content distribution
- Offline net study
- File sharing
- Centralized control
- Accountability
- Fault tolerance
- Ad hoc disaster recovery
- Flash crowds
- Internet monitoring
- Routing
142 P2P or not P2P
Budget
Relevance
Trust
15Conclusion
- Framework for analyzing P2P applications
- Captures constraints and app requirements
- Limited budget is motivating factor
- Problems with low relevance are inappropriate for
P2P
16Critique
- Strengths
- Quantifies application requirements and suitable use cases
- Generically describes suitability of classes of P2P apps
- Weaknesses
- Incomplete view of requirements
- Fuzzy requirements not accounted for
17Service Capacity of Peer to Peer Networks
- Xianying Yang
- Gustavo de Veciana
18Service Capacity
- # of peers available to serve a document
- Throughput of P2P system
- Average delay
- Rate of dissemination
- What factors govern the effectiveness of a system
to scale?
19Research problem
- Analyze behavior of P2P systems
- Describe and model capacity behavior
- Transient regime
- Steady state
- Analyze conditions
- Does system scale as modeled?
- Are delays and throughput bounded?
20Throughput
21Throughput
22Service capacity model
- Steady state
- Impact of peer join/departure
- Performance
- Factors
- Peer selection
- Data management
- Multipart downloads
- Size of parts
- Admission and scheduling
- Traffic
23Analysis
- 2 States
- Transient (branching process model)
- Steady state
- Deterministic
- Branching process
- Markov chain
24Deterministic
[Figure: dissemination tree at time step 0 (axes: Time, Rate, Count)]
- $n - 1$ users want a document
- $n = 2^k$
- $s$ bits per request
- $s(n-1)$ bits total
- Time interval $\tau = s/b$ seconds
- Exponential growth
- Ability to serve large bursts
- Average delay scales as $\log_2 n$
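The doubling argument on this slide can be checked with a short simulation. This is a minimal sketch, assuming every peer that already holds the document uploads one full copy per interval; the function name `deterministic_spread` and the parameter `tau` are illustrative and do not come from the paper.

```python
def deterministic_spread(k, tau=1.0):
    """Simulate the doubling model for k intervals of length tau.

    Assumption: each holder serves exactly one new peer per interval.
    Returns the final population and the completion time of each downloader.
    """
    holders = 1                      # the original source
    completions = []                 # finish time of every downloader
    for step in range(1, k + 1):
        newly_served = holders       # one upload per current holder
        completions += [step * tau] * newly_served
        holders += newly_served      # population doubles
    return holders, completions


n, delays = deterministic_spread(k=10)
avg_delay = sum(delays) / len(delays)
print(n, max(delays), round(avg_delay, 3))
```

With k = 10 the population reaches n = 1024, and the mean completion time stays just below k intervals, which is consistent with delay growing as log2(n) rather than linearly in n.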
25Deterministic
[Figure: dissemination tree at time step 1 (axes: Time, Rate, Count)]
26Deterministic
[Figure: dissemination tree at time step 2 (axes: Time, Rate, Count)]
27Deterministic
[Figure: dissemination tree at time step 3 (axes: Time, Rate, Count)]
28Multipart
- $m$ identical-size chunks
- Service completions every $\tau/m = s/(mb)$ seconds
- Optimization: peers favor others with no chunks
- At time $k$, the system is partitioned into $k$ sets $A_i$, $i = 1, \dots, k$
- $|A_i| = 2^{k-i}$
- $A_i$ corresponds to peers who have received only the $i$th chunk
[Figure: peer sets A1 to A4 at time slot k]
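As a quick sanity check of the partition, the sketch below assumes the set sizes read |A_i| = 2^(k-i) for i = 1..k and confirms that, together with the source, they account for all n = 2^k peers. The helper name `chunk_sets` is hypothetical.

```python
def chunk_sets(k):
    """Sizes of the sets A_1..A_k, assuming |A_i| = 2**(k - i)."""
    return {i: 2 ** (k - i) for i in range(1, k + 1)}


sizes = chunk_sets(4)
# The source plus every peer holding exactly one chunk gives the population.
total = 1 + sum(sizes.values())
print(sizes, total)
```

The largest set is the one holding the oldest chunk, since that chunk has had the most time slots in which to double.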
29Multipart
[Figure: peer sets A2, A3, A4 at time slot k]
30Peer groups
[Figure: source S and peer groups A1 to A4 across time slots t0, t1, t2, ..., tk]
31Multipart
- Delay is in effect reduced by a factor of m
- Large values of m are better, but require more network overhead
- Congestion and bandwidth bottlenecks are ignored in this model
32Branching Process Model
- Let $N_d(t)$ = number of peers serving document $d$ at time $t$
- $T_i$ is a random variable: the transfer time
- ET?1/?
- Age-dependent branching process model, $v = 2$
33Branching Process Model
34Branching Process Model
- ????are growth characteristics
- If T is exponentially distributed, the growth exponent is $1/E[T]$
- If T is deterministic, the growth exponent is $\ln 2 / E[T]$
- Exponential distribution increases growth exponent
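The two growth exponents can be recovered numerically. The sketch below assumes the standard condition for the Malthusian parameter of an age-dependent branching process, v * E[exp(-beta * T)] = 1, with v = 2 offspring per completed transfer; that this is the exponent the slide refers to is an assumption, and the function name `growth_exponent` is illustrative.

```python
import math


def growth_exponent(laplace, v=2, lo=1e-9, hi=50.0, iters=200):
    """Solve v * E[exp(-beta * T)] = 1 for beta by bisection.

    laplace(beta) must return E[exp(-beta * T)], which decreases in beta.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if v * laplace(mid) > 1:
            lo = mid          # root lies to the right
        else:
            hi = mid          # root lies to the left
    return (lo + hi) / 2


mean_t = 1.0  # hypothetical mean transfer time
# Exponential T with mean mean_t: E[exp(-beta*T)] = rate / (rate + beta)
beta_exp = growth_exponent(lambda b: (1 / mean_t) / (1 / mean_t + b))
# Deterministic T = mean_t: E[exp(-beta*T)] = exp(-beta * mean_t)
beta_det = growth_exponent(lambda b: math.exp(-b * mean_t))
print(round(beta_exp, 4), round(beta_det, 4))
```

For a unit mean transfer time this gives an exponent of 1 in the exponential case and ln 2 in the deterministic case, so the more variable transfer time yields the faster growth.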
35Effect of v on Growth
Theorem II
- ??is inversely proportional to v
- Large fanout decreases growth exponent
- Intuition: limit the number of downloads at each peer
36Peer Churn
- Peers exit system with probability 1-??upon
completion - If v??
- When peers exit, allowing multiple upload ensures
document availability and system growth
System increases slowly with increasing v
37Effect of m
- Allowing multipart downloads increases performance by a factor of m
- Growth rate is increased by a factor of m
- Delay is reduced by a factor of m
- Assumes peers are not simultaneously sharing multiple parts of files
38Summary
Deterministic
- Time interval $\tau$ for transfer
- $n = 2^k$
- Delays bounded by $\tau \log n$
- Exponential growth
Multipart
- Time interval $\tau/m$
- Delays bounded by $(\tau/m) \log n$
- Space partitioned into sets
- More chunks is faster
- Network overhead is high
Branching
- Time interval ? is a random variable
- Delays bounded by log ?
- Parameters ??? determine operation
- Accounts for congestion, churn
39Markov Chain
- Distant past is irrelevant given knowledge of the recent past
- Sequence of random variables $X_1, \dots, X_n$
- Transition matrix
- Eigenvectors determine stable state conditions
40Markov Chain
[Figure: two-state weather Markov chain with states Sunny and Rainy and transition probabilities P(Sunny|Sunny), P(Rainy|Sunny), P(Sunny|Rainy), P(Rainy|Rainy), unrolled over days 0, 1, 2, ..., n]
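For a two-state weather chain like this one, the steady state is the left eigenvector of the transition matrix with eigenvalue 1. The sketch below finds it by power iteration; the transition probabilities are hypothetical values chosen for illustration, not taken from the slides.

```python
def stationary(P, iters=1000):
    """Power iteration on a row-stochastic matrix P.

    Repeatedly pushes a distribution through P until it settles on the
    stationary distribution (the left eigenvector for eigenvalue 1).
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical probabilities; rows are today's weather, columns tomorrow's.
P = [[0.9, 0.1],   # Sunny -> Sunny, Sunny -> Rainy
     [0.5, 0.5]]   # Rainy -> Sunny, Rainy -> Rainy
sunny, rainy = stationary(P)
print(round(sunny, 4), round(rainy, 4))
```

With these numbers the chain spends 5/6 of its days sunny in the long run, regardless of the weather on day 0, which is the sense in which the distant past becomes irrelevant.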
41Markov Model
- Poisson process r
- State
- x: # of peers requesting
- y: # of peers hosting
- Multipart files
- Partial peers contribute at rate
- Total rate
[Figure: Markov model state diagram; labels: exponentially distributed, full service rate, exit rate]
42Markov Model
43Markov Model
44Performance
- Seeds/downloaders
- ? is upload ratio of downloader to seed
- System with high ? leverages capacity
- Marginal change of system performance low when
offered load is high
45Bit Torrent
- Multipart download
- Chunk size 1 MB
- Credit system
- Updates every 5 min
- 150-200 file insertions
Service capacity
Throughput
Delay
46Total Throughput
47Average Throughput
48Offered load and Delay
49Throughput
50Conclusions
- Credit system, growth are diametric
- Offered load linearly scales with number of peers
- Large multi-part files spread better
- Peer churn reduces throughput to constant
- Delays decrease with offered load
51Scooped, Again
- Jonathan Ledlie, Jeff Shneidman, Margo Seltzer,
John Huth
52Outline
- Introduction
- Grid Computing
- P2P Systems
- Fallacies preventing cooperation
- Shared and Disjoint Problems
- Conclusions
What they are, Goals, Manifestations
53Introduction
- Background, Motivation
- Peer-to-Peer vs. Grid Computing
- Overlapping problem domain
- P2P focuses on research
- Grid is concerned with concrete, tangible solutions
- History repeated: the Web
54Introduction cont.
- Current trends
- Divergent, parallel development
- Duplication of work
- Grid: risk of non-optimal solutions
- Missing out on P2P's strong achievements (search and storage scalability, decentralization, anonymity, denial-of-service prevention)
- Cooperation is the key
55Grids
- What is the Grid?
- A type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across multiple administrative domains based on the resources' availability, capability, performance, cost, and users' QoS requirements
- Short version: virtualizing computer resources
- Large-scale heterogeneous resource sharing (different platforms, hardware/software architectures, and computer languages)
- Functional classification
- Computational grids (run batch jobs during idle times)
- Data grids
56Grid Layout
57Grid Goals
- Design goal
- Solve problems too big for a single supercomputer, but retain the flexibility to work on multiple smaller problems
- Self-configuring, self-tuning, self-healing
- Allow data sharing and support computation across administrative domains
- Standardized programming interface
- GGF (Global Grid Forum)
- Globus Toolkit: the de facto standard for grid middleware
58Grid Manifestations
- Protocols
- Resource management
- Grid Resource Allocation Management Protocol (GRAM)
- Information services
- Monitoring and Discovery Service (MDS)
- Security services
- Grid Security Infrastructure (GSI)
- Data movement and management
- Global Access to Secondary Storage (GASS), GridFTP
- Tools
- Grid Portal Software (GridPort, OGCE)
- Grid Packaging Toolkit
- Grid-enabled MPI (MPICH-G2)
- Network Weather Service
- Condor (CPU cycle scavenging) and Condor-G (job submission)
- APIs
- Web Services: Open Grid Services Architecture (OGSA)
59P2P
- What is P2P?
- A class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet
- Decentralized, non-hierarchical node organization
- Inherently untrusted
60P2P Goals
- Cost sharing / reduction
- Every peer is responsible for its own costs
- Reduction of file storage costs
- Reduction of computation costs
- Improved scalability / reliability
- Lack of centralization allows new algorithms (CAN, Chord, etc.) to be designed for improved scalability
- Resource aggregation
- Every peer lends its own resources to the network
- Increased autonomy
- Tasks are performed locally; no central service provider
61P2P Goals cont.
- Anonymity / Privacy
- Freenet
- Dynamism
- Nodes enter and leave the system in a transparent way
- Ad hoc communication
- Members can join and leave based on their physical location or interests
62Summary
- Grids
- Parallel, distributed systems concerned with resource sharing, selection, aggregation
- Resource availability, capability, performance, cost, and user QoS requirements are considered
- Self-configuring, self-tuning, self-healing
- Idle cycle and storage utilization
- P2P
- Distributed systems that take advantage of resources scattered throughout the Internet
- Decentralized, non-hierarchical node organization
- Concerned with fault tolerance, scalability, availability, etc.
- Idle cycle and storage utilization
63Summary cont.
- Grid
- Distributed computation
- distributed.net
- SETI@home
- Data production / aggregation
- P2P
- Distributed file sharing
- Gnutella, KaZaA
- Distributed computation
- distributed.net
- Anonymity
- Freenet, Publius
64Outline
- Introduction
- Grid Computing
- P2P Systems
- Fallacies preventing cooperation
- Shared and Disjoint Problems
- Conclusions
What they are, Goals, Manifestations
65Fallacies preventing cooperation
- The technical problems in Grid systems are different from those in P2P systems
- Usage misconception: Grid for computing problems, P2P for file sharing
- Data handling and data production in Grid systems have become important
- P2P is used in desktop collaboration and network computation
- Open problems in both camps have striking similarities
66Fallacies preventing cooperation
- While the technical problems are similar, the architectures (physical topology, bandwidth availability and use, trust model, etc.) demand that the specific solutions be fundamentally different
- Solving common problems through sharing good ideas from each community
- Application-dependent special requirements are tailored to application needs; however, the technical approaches for solving a particular problem could benefit both communities
67Fallacies preventing cooperation
- Grid projects do not have the flexibility to try new algorithms/ideas because they have to get real work done. P2P research is all about this flexibility
- Grid has room for flexible research, too
- Testing new applications and protocols
- Users willing to adopt different technologies to get the work done
68Outline
- Introduction
- Grid Computing
- P2P Systems
- Fallacies preventing cooperation
- Shared and Disjoint Problems
- Conclusions
What they are, Goals, Manifestations
69Shared problems
- Topology Formation
- Node join and neighbor discovery
- Work has been done by both groups
- Grid: "On Fully Decentralized Resource Discovery in Grid Environments"
- P2P: "Self-Organization in Peer-to-Peer Systems"
- Grid infrastructure is not flexible (hard-coded)
- Could benefit from P2P research prototypes
70Shared problems cont.
- Utilization
- Resource discovery, data retrieval
- P2P hash-based look-up schemes are useful
- Resource management / optimization
- How to best utilize resources in a network
- Data replication/caching examined by both communities
- Scheduling and handling of contention
- P2P focus: bandwidth usage (e.g. Gnutella)
- Grid focus: scheduling
- Load balancing: break large tasks into distributed smaller ones
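To make the hash-based lookup bullet concrete, here is a minimal consistent-hashing sketch in the spirit of schemes such as Chord. It is not any particular system's algorithm: the class `HashRing`, the node names, and the key are all hypothetical, and real systems add routing tables, replication, and churn handling on top.

```python
import bisect
import hashlib


def ring_position(key, bits=32):
    """Map a string to a point on a ring of size 2**bits using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.ring]

    def lookup(self, key):
        """Return the first node at or after the key's position, wrapping."""
        idx = bisect.bisect_left(self.positions, ring_position(key))
        return self.ring[idx % len(self.ring)][1]


nodes = ["node-a", "node-b", "node-c"]   # hypothetical resource holders
ring = HashRing(nodes)
owner = ring.lookup("dataset-42.dat")    # hypothetical resource name
print(owner)
```

The property that makes this attractive for resource discovery is locality of change: removing a node that does not own a key leaves that key's owner untouched, so only a fraction of keys move when membership changes.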
71Shared problems cont.
- Coping with Failure
- P2P: lossy storage model (Freenet, Gnutella)
- Considerations for Grid adaptability
- Different common loss model
- Storage size (on the order of half a petabyte/month)
- Security-related issues
- Authenticity: verification of data/computation
- Availability: resilience to DoS attacks
- Authorization: ACLs
72Shared problems cont.
- Maintenance
- P2P: essentially no standards or APIs
- Efforts by Berkeley BOINC, Google Compute, overlay standardization
- Grid pushes for a standardized API
- GGF (Global Grid Forum)
- OGSA (Open Grid Services Architecture)
- Web-services-oriented API; Globus as reference implementation
73Disjoint Problems
- Anonymity
- Not really useful for Grid systems, yet
74Conclusions
- A lot of overlap between the goals and research interests of the two communities
- P2P community needs to consider the needs of Grid users to see how existing research can be applied successfully to Grid problems
- Aim for common standards as much as possible
75The Capacity of Wireless Networks
- Piyush Gupta, P. R. Kumar
76Outline
- Introduction
- Arbitrary Networks
- Protocol and Physical Model
- Upper bound on transport capacity
- Constructive lower bound on transport capacity
- Random Networks
- Protocol and Physical Model
- Constructive lower bound on throughput capacity
- Possible Implications
- Discussion of tradeoffs
- Conclusion
77Introduction
- Ad-hoc wireless networks
- No centralized control
- Each node involved in routing scheme
- Problems
- Network layer routing
- MAC: varying network topology, decentralization
- TDMA: too complex, no centralized control
- FDMA: inefficient in dense networks
- CDMA: difficult to implement
- Random access preferred
78Introduction cont.
- Sharing channels: hidden and exposed terminal problems; MACA and MACAW use handshake signals to alleviate part of these problems
- Physical layer
- Power regulated to minimize interference
- Exploring the capacity of wireless networks
- n nodes deployed in a 1 sq. meter region
- Average distance between source and destination is $\bar{L}$
- Bandwidth: each node can transmit at W bps over a common wireless channel
- Multi-hop transmission with buffering
- Two types
- Arbitrary Networks
- Random Networks
79Arbitrary Networks
- Nodes are arbitrarily distributed over a disc of unit area
- Destination is arbitrary
- Rate is arbitrary
- Transmission range is arbitrary
- How can we model whether a transmission was received successfully by the receiver?
- Two models: Protocol Model, Physical Model
80Protocol Model
- Transmission from $X_i$ to $X_j$ is successful if, for every node $X_k$ transmitting simultaneously,
  $|X_k - X_j| \ge (1 + \Delta)\,|X_i - X_j|$
- where $X_i$ denotes the location of a node, and $\Delta$ is the guarding zone specified by the protocol
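The Protocol Model test is a pure distance comparison, so it is easy to sketch. The code below assumes the Gupta-Kumar condition that every simultaneous transmitter must be at least (1 + Delta) times farther from the receiver than the intended sender; the function name `protocol_ok` and the coordinates are illustrative.

```python
import math


def protocol_ok(xi, xj, interferers, delta):
    """True if the transmission xi -> xj survives all interferers.

    Condition per interferer xk: |xk - xj| >= (1 + delta) * |xi - xj|.
    """
    d_ij = math.dist(xi, xj)
    return all(math.dist(xk, xj) >= (1 + delta) * d_ij for xk in interferers)


sender, receiver = (0.0, 0.0), (1.0, 0.0)
far_ok = protocol_ok(sender, receiver, [(3.0, 0.0)], delta=0.5)
near_ok = protocol_ok(sender, receiver, [(2.0, 0.0)], delta=0.5)
print(far_ok, near_ok)
```

A larger guard zone Delta excludes more concurrent transmitters around each receiver, which is why Delta appears in the denominator of the capacity bounds.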
81Physical Model
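- $\{X_k;\ k \in \mathcal{T}\}$: subset of nodes transmitting simultaneously at some time instant over a certain sub-channel
- $P_k$: power level chosen at node $X_k$
- A transmission originating at node $X_i$, $i \in \mathcal{T}$, is successfully received at node $X_j$ if
  $\dfrac{P_i / |X_i - X_j|^{\alpha}}{N + \sum_{k \in \mathcal{T},\, k \ne i} P_k / |X_k - X_j|^{\alpha}} \ge \beta$
- $\beta$: minimum signal-to-interference ratio necessary for successful reception
- $N$: ambient noise power level
- $\alpha > 2$
- Signal power decays with distance as $1/r^{\alpha}$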
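The Physical Model replaces the distance test with a signal-to-interference-plus-noise check. The sketch below assumes the usual form of that ratio, with received power P / r^alpha; the function name `sinr_ok` and all numeric values are hypothetical.

```python
import math


def sinr_ok(i, receiver, positions, powers, noise, alpha, beta):
    """True if transmitter i is received at `receiver` with SINR >= beta.

    positions and powers list every node transmitting at the same time.
    """
    signal = powers[i] / math.dist(positions[i], receiver) ** alpha
    interference = sum(
        powers[k] / math.dist(positions[k], receiver) ** alpha
        for k in range(len(positions))
        if k != i
    )
    return signal / (noise + interference) >= beta


receiver = (1.0, 0.0)
# Interferer far away: its contribution is negligible.
clear = sinr_ok(0, receiver, [(0.0, 0.0), (10.0, 0.0)], [1.0, 1.0],
                noise=0.01, alpha=3, beta=10)
# Interferer as close to the receiver as the sender: reception fails.
jammed = sinr_ok(0, receiver, [(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0],
                 noise=0.01, alpha=3, beta=10)
print(clear, jammed)
```

Because interference falls off as r^(-alpha), a larger alpha shrinks the footprint of each transmission and lets more of them proceed at once.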
82Transport Capacity of Arbitrary Networks
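- Bit-meter: a bit transported a distance of 1 m
- Used as an indicator of a network's transport capacity
- Protocol Model
- Main result: transport capacity is $\Theta(W\sqrt{n})$ bit-meters per second
- If this capacity is divided between the n nodes, we have $\Theta(W/\sqrt{n})$ bit-meters per second for each node
- For equidistant destinations, the throughput capacity is $\Theta(W/\sqrt{n})$ bits per second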
83- Physical Model
- Main result
- $cW\sqrt{n}$ bit-meters per second is feasible
- $c'Wn^{(\alpha-1)/\alpha}$ bit-meters per second is not feasible
- for appropriate values of $c$, $c'$
84Upper bound on transport capacity
- Assumptions
- There are n nodes arbitrarily located in a planar disk of unit area
- The network transports $\lambda n T$ bits over T seconds (each node generates bits at rate $\lambda$ bps)
- The average distance between source and destination of a bit is $\bar{L}$
- Transmissions are slotted into synchronized slots of length $\tau$ seconds
85Upper bound on transport capacity
- Protocol Model
- Physical Model
- If Pmax/Pmin
86Constructive lower bound on transport capacity
- Theorems and lemmas show a scenario in which the order of the upper bound presented earlier is achieved
- There exists a placement of nodes and an assignment of traffic patterns such that the network can achieve
- under Protocol Model, and
- under Physical Model
- Proofs are in the paper
87Outline
- Introduction
- Arbitrary Networks
- Protocol and Physical Model
- Upper bound on transport capacity
- Constructive lower bound on transport capacity
- Random Networks
- Protocol and Physical Model
- Constructive lower bound on throughput capacity
- Possible Implications
- Discussion of tradeoffs
- Conclusion
88Random Networks
- n nodes randomly located on the surface of a sphere of area 1 sq. meter (S²), or on a disk of area 1 sq. meter in the plane
- Independently and uniformly distributed
- Randomly chosen destination with send rate $\lambda(n)$ bps
- Assumption: all nodes are homogeneous (all transmissions employ the same nominal range or power)
- Two models
- Protocol Model, Physical Model
89Protocol Model
- A transmission from $X_i$ reaches $X_j$ successfully if the following hold, the second for every other $X_k$ transmitting:
- 1. $|X_i - X_j| \le r$
- 2. $|X_k - X_j| \ge (1 + \Delta)\, r$
- where $X_i$ represents the location of a node and $r$ is the common range
90Physical Model
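- $\{X_k;\ k \in \mathcal{T}\}$: subset of nodes transmitting simultaneously at some time instant over a certain sub-channel
- Let $P$ be the common power level
- Then, a transmission from a node $X_i$, $i \in \mathcal{T}$, is successfully received by node $X_j$ if
  $\dfrac{P / |X_i - X_j|^{\alpha}}{N + \sum_{k \in \mathcal{T},\, k \ne i} P / |X_k - X_j|^{\alpha}} \ge \beta$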
91Throughput Capacity of Random Networks
- Feasible throughput
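- If a transmission schedule can be achieved such that every node can send $\lambda(n)$ bits/sec on average to its destination node
- Depends on the location of nodes (random)
- Result
- Protocol model: $\lambda(n) = \Theta\!\left(W/\sqrt{n \log n}\right)$ bits/sec
- Physical model: $cW/\sqrt{n \log n}$ is feasible and $c'W/\sqrt{n}$ is not, with probability approaching 1 as $n \to \infty$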
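To get a feel for how quickly per-node throughput falls, the sketch below tabulates the two scaling laws from Gupta and Kumar with all constants dropped: W / sqrt(n) for optimally placed (arbitrary) networks and W / sqrt(n log n) for random networks. The function name and the sample sizes are illustrative, and the numbers show orders of growth only, not achievable rates.

```python
import math


def per_node_throughput(n, w=1.0, random_placement=False):
    """Order-of-magnitude per-node throughput with constants dropped."""
    if random_placement:
        return w / math.sqrt(n * math.log(n))
    return w / math.sqrt(n)


for n in (10, 100, 1000, 10000):
    arbitrary = per_node_throughput(n)
    random_net = per_node_throughput(n, random_placement=True)
    print(n, round(arbitrary, 4), round(random_net, 4))
```

The extra sqrt(log n) factor is the price of random placement: the range must grow enough to keep the network connected with high probability.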
92Constructive lower bound on throughput capacity
- Goal: show that each source-destination pair of randomly located nodes can be guaranteed a virtual channel of capacity $\lambda(n) = \dfrac{cW}{(1+\Delta)^2\sqrt{n \log n}}$ with probability approaching 1 as $n \to \infty$, for some $c > 0$
- Steps
- Define a Voronoi tessellation of S² where each cell is carefully chosen in relation to the number of nodes
- Bound the number of interfering neighbors of a Voronoi cell
- Bound the length of an all-cell transmission schedule
- Define the routes of a packet in the Voronoi tessellation
- Prove that each cell contains at least one node
- Calculate the expected number of routes that pass through a cell and infer the expected traffic of each node
93Outline
- Introduction
- Arbitrary Networks
- Protocol and Physical Model
- Upper bound on transport capacity
- Constructive lower bound on transport capacity
- Random Networks
- Protocol and Physical Model
- Constructive lower bound on throughput capacity
- Possible Implications
- Discussion of tradeoffs
- Conclusion
94Possible Implications
- Results allow for a perfect scheduling algorithm that knows the location of all nodes and traffic demands, and coordinates the wireless transmissions temporally and spatially to avoid collisions (however, if the nodes are mobile or location information is not available, the capacity can only be smaller)
- As the number of nodes increases, the per-node throughput decreases
- Feasible scenario: if communication occurs only between nearby nodes, the bit rate does not decrease with n
- Scaled distance between source and destination is O(1/sqrt(n)) meters
- Power consumption
- Faster rate of decay of signal power with distance allows greater transport and throughput capacity
95Implications cont.
- Division of labor is possible
- One node in a cell can be designated to relay multi-hop packets, if desired
- Tradeoffs: upper bound on throughput
- Conflict between reducing the number of hops and increasing spatial concurrency and frequency reuse
- Must reduce r(n) to the smallest value possible without losing connectivity
96Tradeoffs cont.
- Arbitrary Networks under the Protocol model
- Constraints that determine the transport capacity to be $O(W\sqrt{n})$ bit-meters per second
- The length of routes
- Consumption of two-dimensional area by transmission
- Total number of nodes
97Conclusions
- Designers may want to consider designing networks with a small number of nodes
- Communication with nearby nodes at constant bit rates can be provided in dense clusters of nodes, since the source-destination distance shrinks as O(1/sqrt(n))
98Appendix A spatial tessellation
- Voronoi tessellation of the surface of the sphere S²
- A Voronoi cell is the set of all points that are closer to $a_i$ than to any of the other $a_j$'s
- Adjacent cells share a common point
- Every node in a cell is within distance r(n) of every node in its own cell or an adjacent cell
- Interfering neighbors: a point in one cell is within a distance $(2 + \Delta)\,r(n)$ of some point in the other cell
99Tessellation Properties
- For each $\varepsilon > 0$, there is a Voronoi tessellation such that each cell contains a disk of radius $\varepsilon$ and is contained in a disk of radius $2\varepsilon$
- Every Voronoi cell contains a disk of area $100 \log n / n$
- with radius $\rho(n)$
- Every Voronoi cell is contained in a disk of radius $2\rho(n)$
100Bound on number of interfering neighbors of a cell
- Every cell in $V_n$ has no more than $c_1$ interfering neighbors
- $c_1 = f(\Delta)$ and grows linearly in $(1+\Delta)^2$
- Allows construction of a schedule of bounded length
- Each cell in the tessellation $V_n$ has an opportunity to transmit every $(1 + c_1)$ slots such that transmission is successful within a distance r(n) from the transmitter (in the Protocol Model)
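The bounded schedule length follows from a graph-colouring argument: if no cell has more than c1 interfering neighbours, greedy colouring needs at most 1 + c1 slots. The sketch below illustrates that argument on a hypothetical ring of five cells; it is not the paper's construction, and the function name `schedule_slots` is illustrative.

```python
def schedule_slots(neighbors):
    """Greedy colouring of the interference graph.

    Each cell gets the lowest slot not already used by an interfering
    neighbour, so a cell with d neighbours never needs a slot above d.
    """
    slot = {}
    for cell in neighbors:
        taken = {slot[nb] for nb in neighbors[cell] if nb in slot}
        slot[cell] = next(s for s in range(len(neighbors)) if s not in taken)
    return slot


# Hypothetical interference graph: five cells in a ring, so c1 = 2.
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
slots = schedule_slots(neighbors)
schedule_length = max(slots.values()) + 1
print(slots, schedule_length)
```

Here the schedule uses three slots, matching the 1 + c1 bound, and every cell transmits once per cycle without an interfering neighbour sharing its slot.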