Title: Replication Strategies in Unstructured Peer-to-Peer Networks
1Replication Strategies in Unstructured
Peer-to-Peer Networks
- Edith Cohen @ AT&T
- Scott Shenker @ ICSI
- March 26, 2003
- UNC COMP 290-86
2What are the types of P2P Networks?
- Three Structural Classifications
- Centralized
- Structured Decentralized
- Unstructured Decentralized
3Centralized
- The network has a central index that all peers contact
- Ex. Napster
- Think of it as DNS / a directory
- Advantages: Minimize network traffic, search whole network quickly
- Disadvantages: Central point of failure, scalability, not a true p2p
(Diagram labels: Query, Response, Request)
4Structured Decentralized
- No central point
- Nodes organize themselves in relation to data
- Ex. Chord, Pastry, CAN
- Advantages: true p2p, expected search times, load balancing
- Disadvantages: Extra overhead for structure, discovering keys
5Unstructured Decentralized
- No central point
- No relation between topology and data
- Most popular system
- Ex. Gnutella, Kazaa
- Advantages: true p2p, low cost to create
- Disadvantages: No overall knowledge of the data, efficiency
6Fundamental Point of Unstructured
- Hosts receive a query with no relationship to their content
- Searches are equivalent to randomly probing the network
- Goal is to minimize the number of hosts contacted to resolve a query
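The random-probing view of search can be checked with a small simulation. This is only an illustrative sketch: the function name, the node counts, and the assumption that probes are drawn uniformly at random with replacement are choices made here, not details from the slides.

```python
import random


def mean_probes(num_nodes, num_copies, trials, seed=0):
    """Average number of random probes until a node holding the item is hit."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        probes = 1
        # Assumption: nodes 0..num_copies-1 are the ones holding the item.
        while rng.randrange(num_nodes) >= num_copies:
            probes += 1
        total += probes
    return total / trials


if __name__ == "__main__":
    # With 50 of 1000 nodes holding the item, roughly 1000 / 50 = 20 probes are expected.
    print(mean_probes(1000, 50, trials=5000))
```

Adding copies of the item lowers the average in proportion, which is why replication reduces the number of hosts contacted.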
7How to minimize?
- One way: Replication!!
- Can replicate either when an item is stored or searched for
- FastTrack protocol uses super nodes that replicate indices of other nodes
- Note: Paper says Gnutella does not support this, but several clients have UltraPeer / SuperNode / Reflector technology
8Replication Strategies
- Even though FastTrack searches are faster, it has the same replication strategy as vanilla Gnutella
- Number of index entries per item is proportional to number of hosts with item
- Fundamental Question: What is the optimal replication strategy?
9Metric for Replication Strategies
- Expected search size for successful queries
- Performance on insoluble queries
- Related to maximum search size
10Two Natural Strategies
- Uniform - replicate everything equally
- Minimizes maximum search size
- Proportional - replicate based on search popularity
- Minimizes search size for more queries
- Minimizes maximum utilization rate
11Optimal Strategy
- Proportional and Uniform are the two extremes of the strategy range
- Both have the same expected search size!
- However, all the strategies in between are better
- Will show Square-root minimizes expected successful search size
- Optimal strategy is between Uniform and Square-root
12Model Legend
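- $n$: Number of nodes in network
- $\rho$: Capacity of node (homogeneous)
- $R = n\rho$: Total capacity of network
- $m$: Number of distinct items in network
- $r_i$: Number of copies of ith item
- $p_i = r_i / R$: Fraction of total capacity allocated to the ith item
- $p = (p_1, \dots, p_m)$: allocation
- $q_i$: Fraction of queries for ith item
- $q = (q_1, \dots, q_m)$: query rates with $q_1 \ge q_2 \ge \dots \ge q_m$ and $\sum_i q_i = 1$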
13Model (cont.)
- A replication strategy is a mapping from the query rate distribution $q$ to the allocation $p$
- $\ell$: Minimum fraction of total copies
- $u$: Maximum fraction of total copies
- $\ell \le p_i \le u$
14Expected Search Size (ESS)
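- The search size for an item follows a geometric distribution with success probability $\rho p_i$ (the fraction of nodes holding item $i$, with $\rho$ the node capacity) and therefore has an expected value of $A_i = 1 / (\rho p_i)$
- ESS: $A = \sum_i q_i A_i = \frac{1}{\rho} \sum_i q_i / p_i$
- Want to find the allocation that minimizes $\sum_i q_i / p_i$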
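The ESS of a given allocation is easy to evaluate numerically. The sketch below assumes each probe finds item i with probability rho * p_i, where rho is the per-node capacity and p_i the fraction of total capacity given to item i. The function name and the example numbers are made up for illustration.

```python
def expected_search_size(q, p, rho):
    """ESS = (1 / rho) * sum_i q_i / p_i for query rates q and allocation p."""
    if abs(sum(q) - 1) > 1e-9 or abs(sum(p) - 1) > 1e-9:
        raise ValueError("q and p must each sum to 1")
    return sum(qi / pi for qi, pi in zip(q, p)) / rho


if __name__ == "__main__":
    # Three items, nodes that each hold two copies (rho = 2).
    print(expected_search_size([0.5, 0.3, 0.2], [0.5, 0.25, 0.25], rho=2))
```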
15Bounded Search Size
- $L$: Maximum search size
- An item is locatable if found by a search with high probability
- The probability for an unsuccessful search of a locatable object is $(1 - \rho p_i)^L$, where $\rho p_i$ is the fraction of nodes holding the item
- The cost of insoluble queries is related to $L$
- Uniform is the solution that minimizes $L$
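To get a feel for how the maximum search size relates to locatability, the sketch below treats each probe as an independent trial that hits the item with a fixed probability (the fraction of nodes holding it). The function names and the 1% failure target are assumptions chosen for illustration.

```python
import math


def failure_probability(hit_fraction, max_probes):
    """Chance that max_probes independent random probes all miss the item."""
    return (1.0 - hit_fraction) ** max_probes


def probes_needed(hit_fraction, max_failure):
    """Smallest L with (1 - hit_fraction) ** L <= max_failure."""
    if not 0.0 < hit_fraction < 1.0 or not 0.0 < max_failure < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(max_failure) / math.log(1.0 - hit_fraction))


if __name__ == "__main__":
    # An item held by 5% of the nodes, with at most a 1% chance of a failed search.
    print(probes_needed(0.05, 0.01))
```

The rarest item dictates the required L, which is one way to see why spreading copies evenly keeps the maximum search size small.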
16Heterogeneity
- Model assumes constant capacity and bandwidths
- Variations can be accounted for by using mean of
capacities weighted by visitation
17Two Allocation Strategies
- An allocation is defined when every $p_i$ lies between the minimum and maximum fractions
- An allocation is uniform when $p_i = 1/m$
- An allocation is proportional when $p_i = q_i$
- Lemma 3.1: Both have the same ESS independent of query distribution
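Lemma 3.1 can be checked numerically for any query distribution. The sketch below uses an ESS of the form (1 / rho) * sum_i q_i / p_i, with rho set to 1. The helper names and the example query rates are illustrative assumptions.

```python
def ess(q, p, rho=1.0):
    """Expected search size (1 / rho) * sum_i q_i / p_i."""
    return sum(qi / pi for qi, pi in zip(q, p)) / rho


def uniform_allocation(q):
    return [1.0 / len(q)] * len(q)


def proportional_allocation(q):
    return list(q)


if __name__ == "__main__":
    q = [0.5, 0.25, 0.125, 0.125]
    # An allocation halfway between the two extremes.
    halfway = [(u + v) / 2 for u, v in zip(uniform_allocation(q), proportional_allocation(q))]
    print(ess(q, uniform_allocation(q)))       # equals m / rho
    print(ess(q, proportional_allocation(q)))  # also equals m / rho
    print(ess(q, halfway))                     # smaller than both
```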
18Characterizing Allocations
- Choose two items
- Let $q_1 = q$, $q_2 = 1 - q$ and $p_1 = p$, $p_2 = 1 - p$
- Uniform: $p = 1/2$; Proportional: $p = q$
- ESS proportional to $q/p + (1 - q)/(1 - p)$
- Use 1st derivative to find minimum
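The first-derivative step for two items can be written out explicitly. This worked sketch assumes the two items have query rates q and 1 - q and allocations p and 1 - p, so the ESS is proportional to f(p) below. The symbol names are chosen here for illustration.

```latex
\begin{align*}
f(p) &= \frac{q}{p} + \frac{1-q}{1-p}, \qquad 0 < p < 1 \\
f'(p) &= -\frac{q}{p^{2}} + \frac{1-q}{(1-p)^{2}} = 0
  \;\Longrightarrow\; \frac{p}{1-p} = \sqrt{\frac{q}{1-q}}
  \;\Longrightarrow\; p^{*} = \frac{\sqrt{q}}{\sqrt{q} + \sqrt{1-q}} \\
f\!\left(\tfrac{1}{2}\right) &= 2, \qquad f(q) = 2, \qquad
f(p^{*}) = \left(\sqrt{q} + \sqrt{1-q}\right)^{2} \le 2
\end{align*}
```

Since f is convex on (0, 1), the stationary point is the minimum: uniform (p = 1/2) and proportional (p = q) give the same value, and the square-root split does at least as well.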
20Allocation Space
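- An allocation lies between uniform and proportional if $q_i / q_j \ge p_i / p_j \ge 1$ for every pair of items $i < j$ (items ordered by decreasing query rate)
- Theorem 3.1: ESS is less for allocations between uniform and proportional
- Lemma 3.2: ESS is larger for allocations outside this range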
21Sketch of Theorem 3.1
- The solution space is the bounded region of allocations lying between uniform and proportional, constrained by the uniform-side and proportional-side inequalities
- The ESS function is at a maximum when at a vertex of this region
- Uniform or Proportional at that point
- If a vertex were to be a hybrid of these, allocations with larger ESS could be constructed iteratively until Uniform or Proportional is reached
22Square-root Allocation
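- A square-root allocation has $p_i / p_j = \sqrt{q_i / q_j}$
- Square-root allocation minimizes ESS
- $\frac{1}{\rho} \left( \sum_i \sqrt{q_i} \right)^2$ is the exact value
- The gain factor over Uniform or Proportional is $m / \left( \sum_i \sqrt{q_i} \right)^2$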
- Lemma 4.2
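The square-root allocation and its gain over uniform or proportional are straightforward to compute. The sketch below takes the gain factor to be the uniform ESS divided by the square-root ESS, which simplifies to m / (sum_i sqrt(q_i))^2. The function names and the Zipf-like helper are illustrative assumptions.

```python
import math


def square_root_allocation(q):
    """Allocate capacity in proportion to the square root of each query rate."""
    roots = [math.sqrt(qi) for qi in q]
    total = sum(roots)
    return [r / total for r in roots]


def gain_factor(q):
    """ESS of uniform (or proportional) divided by ESS of square-root."""
    return len(q) / sum(math.sqrt(qi) for qi in q) ** 2


def zipf_rates(m, w):
    """Query rates with the ith item proportional to i ** -w, normalized to sum to 1."""
    weights = [i ** -w for i in range(1, m + 1)]
    total = sum(weights)
    return [x / total for x in weights]


if __name__ == "__main__":
    print(square_root_allocation([0.64, 0.36]))
    for w in (0.0, 0.5, 1.0, 1.5):
        print(w, gain_factor(zipf_rates(1000, w)))
```

With w = 0 all items are equally popular and the gain is 1. The gain grows as the query distribution becomes more skewed.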
23Gain Factors for Query Distributions
Zipf-like query distributions, with the query rate of the ith item proportional to $i^{-w}$
24Gain Factor for Real-life Distributions
25Square-root Allocation
- Fix the set of locatable items and maximum search size
- Square-root may not be defined
- Square-root minimizes the ESS for these constraints
- Closer to Uniform if insoluble queries dominate
- Closer to Square-root if there are few insoluble queries
26Square-root Construction
- Lemma 5.1: There exists a unique allocation p s.t. 1) 2). This allocation minimizes the ESS
27Proportional Allocation
- When the locatable items and maximum search size are fixed as before, proportional may not be defined
- Proportional allocation is the unique allocation satisfying the following conditions
-
-
28Replication Algorithms
- These protocols perform two main tasks
- Creation: Requesting node creates C copies of item after successfully querying for it
- Deletion: Mechanisms may vary, but the more recent of two copies has the lower probability of being deleted at any instant
29Deletion in Existing Networks
- Copy deletion occurs in two ways
- Node going offline - content unavailable
- Replacement Policy
- Replacement should be independent of queries
- Least Recently Used (LRU) and Least Frequently Used (LFU) do not satisfy this
- First In First Out (FIFO), fixed lifetime, and random do satisfy this
30Steady State
- System is in steady state when lifetime distributions of items do not change with time
- Let $\langle C_i \rangle$ be the average value of $C$ for item $i$
- Claim 6.1: If $\langle C_i \rangle / \langle C_j \rangle$ remains fixed, then $p_i / p_j$ approaches $q_i \langle C_i \rangle / (q_j \langle C_j \rangle)$
- Proportional when all items use the same C value
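The steady-state claim can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the authors' experiment: storage is modeled as a flat pool of slots, every query succeeds, each query for item i writes a fixed number of copies C_i, and the evicted copies are chosen uniformly at random (a query-independent replacement policy). All names and parameters are hypothetical.

```python
import random


def simulate_steady_state(q, copies_per_query, capacity, num_queries, seed=0):
    """Return the fraction of storage slots held by each item after num_queries."""
    rng = random.Random(seed)
    items = range(len(q))
    # Start from a uniform allocation; each slot stores one item id.
    slots = [i % len(q) for i in range(capacity)]
    for _ in range(num_queries):
        item = rng.choices(items, weights=q)[0]
        for _ in range(copies_per_query[item]):
            # Random replacement: the evicted copy is chosen independently of queries.
            slots[rng.randrange(capacity)] = item
    return [slots.count(i) / capacity for i in items]


if __name__ == "__main__":
    q = [0.6, 0.3, 0.1]
    c = [1, 2, 4]
    weights = [qi * ci for qi, ci in zip(q, c)]
    print(simulate_steady_state(q, c, capacity=10000, num_queries=100000))
    print([w / sum(weights) for w in weights])  # q_i * C_i, normalized
```

With equal C values the same simulation settles near the proportional allocation.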
31Designing Square-root Allocations
- The challenging task is designing square-root algorithms with only local information and no exterior protocol
- Corollary 6.1: If $\langle C_i \rangle \propto 1/\sqrt{q_i}$, then $p_i / p_j$ approaches $\sqrt{q_i / q_j}$
- Propose three algorithms to achieve this
- Path Replication
- Replication with Sibling-Number Memory
- Probe Memory
32Path Replication
- Remember the ESS $A_i$ for an item is inversely proportional to $p_i$, which is proportional to $q_i \langle C_i \rangle$
- If $C_i$ is set to $A_i$, then at the fixed point they are proportional to $1/\sqrt{q_i}$
- Let $\langle A_i \rangle$ be the fixed point value. Then new copies are generated at $A_i / \langle A_i \rangle$ times the optimal rate
- May overshoot or converge slowly
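The fixed-point argument can be illustrated with a deterministic mean-field iteration. This is an idealized sketch, not the path replication protocol itself: copies per query are taken to be proportional to the search size (that is, to 1 / p_i), deletions are assumed to remove a fixed fraction of every item's copies per step, and the `step` parameter is a hypothetical damping factor.

```python
import math


def path_replication_fixed_point(q, step=0.2, iterations=200):
    """Iterate p <- (1 - step) * p + step * normalize(q_i / p_i), starting from uniform."""
    m = len(q)
    p = [1.0 / m] * m
    for _ in range(iterations):
        # Copies per query are proportional to search size, i.e. to 1 / p_i,
        # so item i gains copies at a rate proportional to q_i / p_i.
        inflow = [qi / pi for qi, pi in zip(q, p)]
        total = sum(inflow)
        p = [(1 - step) * pi + step * f / total for pi, f in zip(p, inflow)]
    return p


if __name__ == "__main__":
    q = [0.64, 0.32, 0.04]
    roots = [math.sqrt(qi) for qi in q]
    print(path_replication_fixed_point(q))
    print([r / sum(roots) for r in roots])  # square-root allocation
```

In this toy model, setting `step=1.0` makes the allocation flip between proportional and uniform on alternate steps without settling, a simple picture of overshooting.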
33Replication with Sibling-Number Memory
- At a given time, search size may not be a good indicator for query rate
- Need additional bookkeeping to stay close to fixed point
- Sibling-Number Memory (SNM)
- Each copy has a record of the number of sibling copies generated at the time and a timestamp
- Age of copy is related to survival rate
34Replication w/ SNM (cont.)
- Let $d_a$ be the number of sibling copies generated after some past query $a$ and $\lambda_a$ be the survival rate, with each copy storing these values
- Over some time $T$, $\sum_{c \in P_T} 1 / (\lambda_c d_c)$ is an estimator for the number of requests for the item, with $P_T$ the set of copies within this range
- is the expectation of over copies
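One plausible way to turn the stored sibling counts and survival rates into a request-count estimate is sketched below. The weighting of each surviving copy by 1 / (sibling count * survival rate) is an assumption made here for illustration, as are the function name and the data layout. The slides do not preserve the exact estimator.

```python
def estimate_request_count(copies):
    """copies: iterable of (sibling_count, survival_rate) pairs stored with each copy."""
    total = 0.0
    for sibling_count, survival_rate in copies:
        if sibling_count <= 0 or not 0.0 < survival_rate <= 1.0:
            raise ValueError("need sibling_count > 0 and 0 < survival_rate <= 1")
        # A request that made d copies, each surviving with probability s, leaves
        # d * s copies on average, so each surviving copy counts for 1 / (d * s).
        total += 1.0 / (sibling_count * survival_rate)
    return total


if __name__ == "__main__":
    # Two surviving copies from a request that made 4 (survival rate 0.5),
    # plus two surviving copies from a request that made 2 (survival rate 1.0).
    print(estimate_request_count([(4, 0.5), (4, 0.5), (2, 1.0), (2, 1.0)]))
```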
35Replication with Probe Memory
- Each node records the number of recent probes per item and the cumulative search size
- Probe rate $r_i \propto q_i A_i$
- Can estimate $q_i$ based on estimates for $A_i$ and $r_i$ (could aggregate over several nodes)
- The natural node set is a query path
36Replication w/ PM (cont.)
- Let $k_v$ be the number of probes $v$ has witnessed and $S_v$ be the sum of search sizes. Stored for each item.
- Let $V$ be a set of nodes
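A minimal sketch of how the per-node records might be aggregated over a node set V for one item is shown below. The slides do not preserve the combining formulas, so the estimates used here are assumptions: average search size as total search size over total probes, probe rate as probes per node, and query rate as probe rate divided by search size (from the probe rate being proportional to q_i * A_i). The function name is hypothetical.

```python
def estimate_from_probe_memory(records):
    """records: list of (k_v, S_v) pairs, one per node v in the chosen set V.

    Returns (search-size estimate, per-node probe-rate estimate, relative query rate).
    """
    total_probes = sum(k for k, _ in records)
    total_size = sum(s for _, s in records)
    if total_probes == 0:
        raise ValueError("no probes witnessed for this item")
    avg_search_size = total_size / total_probes          # estimate of A_i
    probe_rate = total_probes / len(records)             # estimate of r_i per node
    relative_query_rate = probe_rate / avg_search_size   # q_i up to a constant factor
    return avg_search_size, probe_rate, relative_query_rate


if __name__ == "__main__":
    # Three nodes on a query path, each reporting (probes witnessed, summed search sizes).
    print(estimate_from_probe_memory([(2, 40), (1, 10), (3, 70)]))
```

Only the ratios between items matter for choosing how many copies to create, so the unknown constant factor in the query-rate estimate is harmless in this sketch.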
37Optimal Algorithm
- Remember that Square-root is not the optimal algorithm
- The optimal algorithm is a hybrid of uniform and square-root
- Uniform marks which nodes should have permanent copies of an item
- Square-root indicates which nodes should have transient copies
- Parameters need to be adjusted to balance the cost of insoluble / soluble queries
38Simulation
- The square-root allocation algorithms for path replication and replication w/ SNM were simulated to illustrate convergence
- 10,000 nodes with fixed lifetimes and queries issued at fixed intervals
41Summary
- In unstructured peer-to-peer networks, the natural strategies of uniform and proportional perform equally, with all strategies between performing better
- For soluble queries, Square-root performs best, with the optimal between this and uniform in general
- Despite its nonlinear and non-local nature, three Square-root algorithms are given
42Realism
- Like most peer-to-peer algorithms, question of performance in a realistic environment
- Is their heterogeneity model accurate? How would this correspond to a true Internet environment? Mobile environment?
- What are the effects of a dynamic network?
- How could this be related to structured p2p networks?