Replication Strategies in Unstructured Peer-to-Peer Networks

1
Replication Strategies in Unstructured
Peer-to-Peer Networks

Edith Cohen, AT&T
Scott Shenker, ICSI
March 26, 2003
UNC COMP 290-86

2
What are the types of P2P Networks?

Three Structural Classifications
Centralized
Structured Decentralized
Unstructured Decentralized

3
Centralized

The network has a central index that all peers
contact
Ex. Napster
Think of it as a DNS-style directory
Advantages: minimizes network traffic, searches the
whole network quickly
Disadvantages: central point of failure, limited
scalability, not true P2P

[Figure: message flow labeled Query, Response, Request]

4
Structured Decentralized

No central point
Nodes organize themselves in relation to data
Ex. Chord, Pastry, CAN
Advantages: true P2P, expected search times, load
balancing
Disadvantages: extra overhead for maintaining the
structure, discovering keys

5
Unstructured Decentralized

No central point
No relation between topology and data
Most popular type of system
Ex. Gnutella, Kazaa
Advantages: true P2P, low cost to create
Disadvantages: no overall knowledge of where data is,
search efficiency

6
Fundamental Point of Unstructured

A host receives queries that bear no relationship to
the content it stores
Searches are equivalent to randomly probing the
network
Goal is to minimize the number of hosts contacted to
resolve a query
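
The random-probing view above can be sketched as a tiny simulation. This is only an illustration under stated assumptions: probes pick nodes uniformly at random with replacement, and the names `random_probe_search` and `average_search_size` and all parameter values are hypothetical, not taken from the paper.

```python
import random


def random_probe_search(n_nodes, holders, rng):
    """Probe uniformly random nodes until one holds the item.

    Returns the number of nodes contacted (the search size).
    """
    probes = 0
    while True:
        probes += 1
        if rng.randrange(n_nodes) in holders:
            return probes


def average_search_size(n_nodes, n_copies, trials, seed=0):
    """Average search size over many independent searches."""
    rng = random.Random(seed)
    # Place the copies on distinct, randomly chosen nodes.
    holders = set(rng.sample(range(n_nodes), n_copies))
    total = sum(random_probe_search(n_nodes, holders, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # With 50 copies among 1000 nodes, roughly 1000 / 50 = 20 probes
    # are expected per search.
    print(average_search_size(n_nodes=1000, n_copies=50, trials=5000))
```

Doubling the number of copies should roughly halve the average, which is why replication is the natural lever for reducing the number of hosts contacted.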

7
How to minimize?

One way: replication!
Can replicate either when an item is stored or
when it is searched for
The FastTrack protocol uses super nodes that
replicate the indices of other nodes
Note: the paper says Gnutella does not support this,
but several clients have UltraPeer / SuperNode /
Reflector technology

8
Replication Strategies

Even though FastTrack searches are faster, it has
the same replication strategy as vanilla
Gnutella: the number of index entries per item is
proportional to the number of hosts with the item
Fundamental question: what is the optimal
replication strategy?

9
Metric for Replication Strategies

Expected search size for successful queries
Performance on insoluble queries
Related to maximum search size

10
Two Natural Strategies

Uniform - replicate everything equally
Minimizes maximum search size
Proportional - replicate based on search
popularity
Minimizes search size for more queries
Minimizes maximum utilization rate

11
Optimal Strategy

Proportional and Uniform are the two extremes of
strategy range
Both have same expected search size!
However, all the strategies in between are better
Will show that Square-root minimizes the expected
search size of successful queries
The optimal strategy lies between Uniform and Square-root

12
Model Legend

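$n$: number of nodes in the network
$\rho$: capacity of a node (homogeneous)
$R = n\rho$: total capacity of the network
$m$: number of distinct items in the network
$r_i$: number of copies of the $i$th item
$p_i = r_i / R$: fraction of total capacity allocated to item $i$;
$p = (p_1, \dots, p_m)$ is the allocation
$q_i$: fraction of queries for the $i$th item;
query rates $q = (q_1, \dots, q_m)$ with $q_1 \ge \dots \ge q_m$ and $\sum_i q_i = 1$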

13
Model (cont.)

A replication strategy is a mapping from the
query rate distribution $q$ to the allocation $p$
$\ell$: minimum fraction of total copies for an item
$u$: maximum fraction of total copies for an item

14
Expected Search Size (ESS)

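The search size for item $i$ is geometrically distributed
with per-probe success probability $\rho p_i$ (node capacity
$\rho$ times the item's share $p_i$ of total capacity) and
therefore has an expected value of $A_i = 1/(\rho p_i)$
ESS: $\sum_i q_i A_i = \frac{1}{\rho} \sum_i q_i / p_i$
Want to find the allocation that minimizes $\sum_i q_i / p_i$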
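
A minimal sketch of the ESS computation, assuming the usual notation for this model: `q[i]` is the fraction of queries for item i, `p[i]` is the fraction of total capacity given to item i, and `rho` is the per-node capacity, so a random probe finds item i with probability `rho * p[i]`. The function names are illustrative.

```python
def item_search_size(p_i, rho):
    """Expected probes to find one item: mean of a geometric distribution."""
    return 1.0 / (rho * p_i)


def expected_search_size(q, p, rho):
    """ESS = sum_i q_i * A_i with A_i = 1 / (rho * p_i)."""
    if abs(sum(q) - 1.0) > 1e-9 or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("q and p must each sum to 1")
    return sum(q_i * item_search_size(p_i, rho) for q_i, p_i in zip(q, p))


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]   # query rates
    p = [0.4, 0.4, 0.2]   # allocation
    # 0.5/0.4 + 0.3/0.4 + 0.2/0.2 = 1.25 + 0.75 + 1.0
    print(expected_search_size(q, p, rho=1.0))
```

Since `rho` only scales the result, comparing allocations comes down to comparing the sums of `q[i] / p[i]`.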

15
Bounded Search Size

$L$: maximum search size
An item is locatable if a search for it succeeds with
high probability
The probability of an unsuccessful search for a
locatable item $i$ is $(1 - \rho p_i)^L$
The cost of insoluble queries is related to $L$
Uniform ($p_i = 1/m$) is the solution that minimizes $L$
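
To make the link between allocation and maximum search size concrete, here is a hedged sketch. It assumes probes are independent and that a fraction `rho * p_i` of nodes holds item i, so that L probes all miss with probability `(1 - rho * p_i) ** L`. The miss target of 1% and the example allocations are arbitrary choices for illustration.

```python
import math


def miss_probability(hit_fraction, max_search):
    """Probability that max_search independent random probes all miss.

    hit_fraction is the fraction of nodes holding the item (rho * p_i).
    """
    return (1.0 - hit_fraction) ** max_search


def min_search_size(hit_fraction, miss_target):
    """Smallest L whose miss probability is at most miss_target."""
    if not 0.0 < hit_fraction < 1.0:
        raise ValueError("hit_fraction must be strictly between 0 and 1")
    return math.ceil(math.log(miss_target) / math.log(1.0 - hit_fraction))


def required_search_size(allocation, rho, miss_target):
    """L needed so that even the least-replicated item is locatable."""
    return min_search_size(rho * min(allocation), miss_target)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.55, 0.25, 0.15, 0.05]
    for name, p in (("uniform", uniform), ("skewed", skewed)):
        print(name, required_search_size(p, rho=1.0, miss_target=0.01))
```

The least-replicated item dictates L, and the uniform allocation is the one that makes the smallest share as large as possible.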

16
Heterogeneity

Model assumes constant capacity and bandwidths
Variations can be accounted for by using mean of
capacities weighted by visitation

17
Two Allocation Strategies

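An allocation is defined when $\sum_i p_i = 1$, with each $p_i$
between the minimum and maximum fraction of total copies
An allocation is uniform when $p_i = 1/m$ for all $i$
An allocation is proportional when $p_i = q_i$ for all $i$
Lemma 3.1: Both have the same ESS, $m/\rho$ (number of items
over node capacity), independent of the query distribution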
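
Lemma 3.1 is easy to check numerically. The sketch below assumes ESS is `(1 / rho) * sum(q_i / p_i)`, with `q` the query rates, `p` the allocation, and `rho` the node capacity; the example query distribution is made up.

```python
def uniform_allocation(q):
    """Every item gets the same share of capacity."""
    m = len(q)
    return [1.0 / m] * m


def proportional_allocation(q):
    """Every item's share equals its share of queries."""
    total = sum(q)
    return [q_i / total for q_i in q]


def expected_search_size(q, p, rho):
    """ESS = (1 / rho) * sum_i q_i / p_i."""
    return sum(q_i / p_i for q_i, p_i in zip(q, p)) / rho


if __name__ == "__main__":
    q = [0.6, 0.3, 0.1]
    rho = 2.0
    # Uniform:      sum of q_i * m = m.
    # Proportional: sum of q_i / q_i = m.
    # Both give m / rho = 3 / 2.
    print(expected_search_size(q, uniform_allocation(q), rho))
    print(expected_search_size(q, proportional_allocation(q), rho))
```

Changing `q` to any other distribution with positive entries leaves both values at `m / rho`.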

18
Characterizing Allocations

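Choose two items with $q_1 \ge q_2$ and $q_1 + q_2 = 1$
Let $p = p_1$, so that $p_2 = 1 - p$
Uniform: $p = 1/2$; Proportional: $p = q_1$
ESS proportional to $q_1/p + q_2/(1-p)$
Use the 1st derivative to find the minimum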
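
The two-item case can be worked out explicitly. The parametrization is an assumption made for this sketch: item 1 gets share $p$, item 2 gets $1 - p$, and the query rates satisfy $q_1 + q_2 = 1$. The constant factor $1/\rho$ is dropped.

```latex
\begin{align*}
f(p) &= \frac{q_1}{p} + \frac{q_2}{1-p}, \qquad 0 < p < 1 \\
f'(p) &= -\frac{q_1}{p^2} + \frac{q_2}{(1-p)^2} = 0
  \;\Longrightarrow\; \frac{1-p}{p} = \sqrt{\frac{q_2}{q_1}}
  \;\Longrightarrow\; p^{*} = \frac{\sqrt{q_1}}{\sqrt{q_1} + \sqrt{q_2}} \\
f(p^{*}) &= \left(\sqrt{q_1} + \sqrt{q_2}\right)^2 = 1 + 2\sqrt{q_1 q_2} \\
f(1/2) &= 2 q_1 + 2 q_2 = 2 \quad \text{(uniform)}, \qquad
f(q_1) = 1 + 1 = 2 \quad \text{(proportional)}
\end{align*}
```

Since $2\sqrt{q_1 q_2} \le q_1 + q_2 = 1$, the minimum is at most 2, with equality only when $q_1 = q_2$. The minimizer $p^{*}$ lies between $1/2$ and $q_1$, and it is the square-root split.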

20
Allocation Space

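An allocation lies between uniform and
proportional if $q_i/q_j \ge p_i/p_j \ge 1$ for every pair of
items $i < j$ (items ordered by decreasing query rate)
Theorem 3.1: ESS is lower for allocations between
uniform and proportional
Lemma 3.2: ESS is larger for allocations outside
this range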
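
One way to explore the space between the two extremes is the one-parameter family `p_i` proportional to `q_i ** alpha`. This family is an assumption chosen for illustration and is not named in the slides. For `0 <= alpha <= 1` each ratio `p_i / p_j` sits between 1 (uniform, `alpha = 0`) and `q_i / q_j` (proportional, `alpha = 1`) whenever `q_i >= q_j`.

```python
def power_allocation(q, alpha):
    """Allocation with p_i proportional to q_i ** alpha."""
    weights = [q_i ** alpha for q_i in q]
    total = sum(weights)
    return [w / total for w in weights]


def scaled_ess(q, p):
    """ESS multiplied by the node capacity rho: sum_i q_i / p_i."""
    return sum(q_i / p_i for q_i, p_i in zip(q, p))


if __name__ == "__main__":
    q = [0.5, 0.25, 0.15, 0.10]
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        value = scaled_ess(q, power_allocation(q, alpha))
        print(f"alpha={alpha:.2f}  scaled ESS={value:.4f}")
```

The endpoints both give the number of items (4 here), every interior value of `alpha` gives less, and the smallest value on this grid is at `alpha = 0.5`.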

21
Sketch of Theorem 3.1

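The solution space is the bounded region of allocations $p$
constrained by $\sum_i p_i = 1$ and $1 \le p_i/p_j \le q_i/q_j$ for $i < j$
The ESS function $\sum_i q_i/p_i$ is at a maximum
when at a vertex of this region
The allocation is Uniform or Proportional at that point
If a vertex were to be a hybrid of these,
allocations with larger ESS could be constructed
iteratively until Uniform or Proportional is reached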

22
Square-root Allocation

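A square-root allocation has $p_i / p_j = \sqrt{q_i / q_j}$, i.e. $p_i = \sqrt{q_i} / \sum_j \sqrt{q_j}$
Square-root allocation minimizes ESS
$\frac{1}{\rho} \left( \sum_i \sqrt{q_i} \right)^2$ is the exact value
The gain factor over Uniform and Proportional is $m / \left( \sum_i \sqrt{q_i} \right)^2$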
Lemma 4.2
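
A short sketch of the square-root allocation and its gain factor. It assumes ESS is `(1 / rho) * sum(q_i / p_i)`, so the gain over uniform or proportional is the number of items divided by the squared sum of `sqrt(q_i)`. The two-item example is made up.

```python
import math


def square_root_allocation(q):
    """Allocation with p_i proportional to sqrt(q_i)."""
    roots = [math.sqrt(q_i) for q_i in q]
    total = sum(roots)
    return [r / total for r in roots]


def scaled_ess(q, p):
    """ESS multiplied by the node capacity rho: sum_i q_i / p_i."""
    return sum(q_i / p_i for q_i, p_i in zip(q, p))


def gain_factor(q):
    """ESS of uniform (or proportional) divided by ESS of square-root."""
    return len(q) / sum(math.sqrt(q_i) for q_i in q) ** 2


if __name__ == "__main__":
    q = [0.64, 0.36]
    p = square_root_allocation(q)   # sqrt values 0.8 and 0.6, so p = [4/7, 3/7]
    print(p)
    print(scaled_ess(q, p))         # (0.8 + 0.6) ** 2 = 1.96, up to rounding
    print(gain_factor(q))           # 2 / 1.96
```

The gain is modest for two items with similar rates and grows as the query distribution becomes more skewed.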

23
Gain Factors for Query Distributions
Zipf-like query distributions, with the query rate of the $i$th item
proportional to $i^{-w}$
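
The gain factor for Zipf-like query rates can be tabulated with a few lines. This is a sketch, not a reproduction of the slide's plot: the number of items and the exponents are arbitrary, and the gain is taken to be the number of items divided by the squared sum of `sqrt(q_i)`.

```python
import math


def zipf_rates(m, w):
    """Query rates with the i-th item's rate proportional to i ** -w."""
    weights = [i ** -w for i in range(1, m + 1)]
    total = sum(weights)
    return [x / total for x in weights]


def gain_factor(q):
    """ESS of uniform (or proportional) divided by ESS of square-root."""
    return len(q) / sum(math.sqrt(q_i) for q_i in q) ** 2


if __name__ == "__main__":
    m = 1000  # hypothetical number of distinct items
    for w in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"w={w:.1f}  gain={gain_factor(zipf_rates(m, w)):.2f}")
```

With `w = 0` all items are equally popular and the gain is 1. The gain grows as `w` increases, since a more skewed distribution leaves more room for improvement.
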
24
Gain Factor for Real-life Distributions
25
Square-root Allocation

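Fix the set of locatable items and the maximum search
size
The pure Square-root allocation may not be defined under
these constraints
The constrained form of Square-root minimizes the ESS
subject to these constraints
The optimum is closer to Uniform if insoluble queries dominate
The optimum is closer to Square-root if few queries are insoluble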

26
Square-root Construction

Lemma 5.1: There exists a unique allocation p satisfying
two conditions (conditions not transcribed); this allocation
minimizes the ESS

27
Proportional Allocation

With the same quantities fixed as before, Proportional
may not be defined
Proportional allocation is the unique allocation
satisfying the following conditions

28
Replication Algorithms

These protocols perform two main tasks
Creation: the requesting node creates C copies of an
item after successfully querying for it
Deletion: mechanisms may vary, but the more
recent of two copies has the lower probability of
being deleted at any instant

29
Deletion in Existing Networks

Copy deletion occurs in two ways
A node going offline: its content becomes unavailable
A replacement policy evicting the copy
Replacement should be independent of queries
Least Recently Used (LRU) and Least Frequently
Used (LFU) do not satisfy this
First In First Out (FIFO), fixed lifetime, and
random replacement do satisfy it

30
Steady State

System is in steady state when lifetime
distributions of items do not change with time
Let $\langle C_i \rangle$ be the average value of C for item i
Claim 6.1: If $\langle C_i \rangle / \langle C_j \rangle$ remains fixed, then $p_i / p_j$
approaches $q_i \langle C_i \rangle / (q_j \langle C_j \rangle)$
Proportional when C has the same value for all items
31
Designing Square-root Allocations

The challenging task is designing square-root
algorithms with only local information and no
exterior protocol
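Corollary 6.1: If $\langle C_i \rangle \propto 1/\sqrt{q_i}$, then $p_i / p_j$ approaches $\sqrt{q_i / q_j}$
The paper proposes three algorithms to achieve this: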
Path Replication
Replication with Sibling-Number Memory
Probe Memory

32
Path Replication

Remember that the ESS $A_i$ for an item is inversely
proportional to $p_i$, which is proportional to
$q_i \langle C_i \rangle$
If $C_i$ is set to $A_i$, then at the fixed point they
are proportional to $1/\sqrt{q_i}$
Let $\langle A_i \rangle$ be the fixed-point value. Then new
copies are generated at $A_i / \langle A_i \rangle$ times the optimal rate
May overshoot and converge slowly
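
The fixed-point argument can be sketched as a deterministic, mean-field iteration. This is a simplification and not the protocol itself: in each step a fraction `turnover` of all copies is replaced, and the replacements are split among items in proportion to `q_i * A_i`, with `A_i` taken as `1 / p_i`. The names and the turnover value are hypothetical.

```python
def path_replication_step(p, q, turnover):
    """One mean-field update of the allocation p.

    New copies of item i are created in proportion to q_i / p_i
    (query rate times search size); a `turnover` fraction of all
    existing copies is evicted uniformly to make room.
    """
    created = [q_i / p_i for q_i, p_i in zip(q, p)]
    total = sum(created)
    return [(1.0 - turnover) * p_i + turnover * c_i / total
            for p_i, c_i in zip(p, created)]


def run_path_replication(q, turnover=0.2, steps=200):
    """Iterate from the uniform allocation and return the final allocation."""
    p = [1.0 / len(q)] * len(q)
    for _ in range(steps):
        p = path_replication_step(p, q, turnover)
    return p


if __name__ == "__main__":
    q = [0.64, 0.36]
    # The fixed point satisfies p_i ** 2 proportional to q_i,
    # i.e. p proportional to [0.8, 0.6].
    print(run_path_replication(q))
```

With `turnover=1.0` the same update flips between the proportional and uniform allocations instead of settling, which gives a toy picture of the overshoot problem.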

33
Replication with Sibling-Number Memory

At a given time, search size may not be a good
indicator of query rate
Need additional bookkeeping to stay close to the
fixed point
Sibling-Number Memory (SNM)
Each copy records the number of sibling copies
generated along with it and a timestamp
Age of a copy is related to its survival rate

34
Replication w/ SNM (cont.)

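Let $d_a$ be the number of sibling copies generated
after some past query $a$ and $\lambda_a$ be the survival
rate of those copies, with each copy storing these values
Over some time interval $T$, $\sum_{c \in P_T} 1/(\lambda_c d_c)$ is an estimator
for the number of requests for the item, with $P_T$
the set of surviving copies created within this range
In expectation over copies, each query contributes
$d_a \cdot \lambda_a \cdot 1/(\lambda_a d_a) = 1$ to this sum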
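
A sketch of why such an estimator works, under assumptions chosen for the example: every query creates `d` sibling copies, each copy independently survives with a known probability, and each surviving copy stores both numbers. Summing `1 / (d * survival_rate)` over surviving copies then recovers the number of queries on average. The range of `d` and the survival rate are arbitrary.

```python
import random


def surviving_copy_records(n_queries, survival_rate, rng):
    """Simulate queries and return the records kept by surviving copies.

    Each record is (d, survival_rate): the sibling count of the query that
    created the copy and the probability that such a copy is still alive.
    """
    records = []
    for _ in range(n_queries):
        d = rng.randint(1, 5)  # hypothetical number of siblings per query
        for _ in range(d):
            if rng.random() < survival_rate:
                records.append((d, survival_rate))
    return records


def estimate_query_count(records):
    """Sum of 1 / (d * survival_rate) over the surviving copies."""
    return sum(1.0 / (d * rate) for d, rate in records)


if __name__ == "__main__":
    rng = random.Random(1)
    records = surviving_copy_records(20_000, survival_rate=0.5, rng=rng)
    # Should land close to the true count of 20000 queries.
    print(estimate_query_count(records))
```

Each query contributes `d` copies, each surviving with probability `survival_rate` and weighted by `1 / (d * survival_rate)`, so its expected contribution to the sum is exactly 1.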

35
Replication with Probe Memory

Each node records the number of recent probes
per item and the cumulative search size
Probe rate: $r_i \propto q_i A_i$
Can estimate $q_i$ from estimates of $A_i$ and $r_i$
(could aggregate over several nodes)
The natural node set is a query path

36
Replication w/ PM (cont.)

Let $k_v$ be the number of probes node $v$ has witnessed
and $S_v$ be the sum of their search sizes; both are stored
for each item.
Let V be a set of nodes
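
One plausible way to combine these counters over a node set is sketched below. The aggregation is an assumption of this example, since the estimator itself does not appear in the slides: the search size is estimated as total search size over total probes, the probe rate as probes per node per unit time, and the query rate (up to a constant factor) as probe rate divided by search size, following the relation that probe rate is proportional to `q_i * A_i`. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeRecord:
    """Per-item counters kept by one node."""
    probes: int        # k_v: probes for the item witnessed by node v
    search_total: int  # S_v: sum of the search sizes of those probes


def estimate_search_size(records):
    """Estimate A_i as total search size divided by total probes."""
    probes = sum(r.probes for r in records)
    if probes == 0:
        return None  # no probes seen, nothing to estimate
    return sum(r.search_total for r in records) / probes


def estimate_probe_rate(records, window):
    """Estimate r_i as probes per node per unit time over the window."""
    return sum(r.probes for r in records) / (len(records) * window)


def relative_query_rate(records, window):
    """q_i up to a constant factor, using r_i proportional to q_i * A_i."""
    search_size = estimate_search_size(records)
    if search_size is None:
        return 0.0
    return estimate_probe_rate(records, window) / search_size


if __name__ == "__main__":
    # Counters from two nodes on one query path (made-up numbers).
    path = [ProbeRecord(probes=4, search_total=80),
            ProbeRecord(probes=6, search_total=120)]
    print(estimate_search_size(path))             # 200 / 10 = 20.0
    print(relative_query_rate(path, window=10.0))
```

Only ratios between items matter when deciding how many copies to create, so the unknown constant factor in the query-rate estimate is harmless here.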

37
Optimal Algorithm

Remember that Square-root is not the optimal
algorithm
The optimal algorithm is a hybrid of uniform and
square-root
Uniform marks which nodes should have permanent
copies of an item
Square-root indicates which nodes should have
transient copies
Parameters need to be adjusted to balance the cost of
insoluble and soluble queries

38
Simulation

The square-root allocation algorithms for path
replication and replication w/ SNM were simulated
to illustrate convergence
10,000 nodes with fixed lifetimes and queries
issued at fixed intervals

41
Summary

In unstructured peer-to-peer networks, the
natural strategies of uniform and proportional
perform equally, with all strategies between
performing better
For soluble queries, Square-root performs best
with the optimal between this and uniform in
general
Despite its nonlinear and non-local nature, three
Square-root algorithms are given

42
Realism

As with most peer-to-peer algorithms, there is the question of
performance in a realistic environment
Is their heterogeneity model accurate? How would
this correspond to a true Internet environment?
A mobile environment?
What are the effects of a dynamic network?
How could this be related to structured P2P
networks?
