Title: p2p06
1Topics in Database Systems Data Management in
Peer-to-Peer Systems
2Agenda for today
1. Brief summary of replication in unstructured p2p systems
2. Epidemic algorithms for replica updates (the Demers et al. paper as one application)
3. Other examples of unstructured systems
a. GIA b. KAZAA c. BitTorrent
Next Thursday: Freenet, Pastry, eDonkey, and an example of a p2p database (PIER)
3Reasons for Replication
- Performance
- load balancing
- locality: place copies close to the requestor
- geographic locality (more choices for the next step in search)
- reduce number of hops
- Availability
- In case of failures
- Peer departures
4Replication Theory: Replica Allocation Policies
in Unstructured P2P Systems
E. Cohen and S. Shenker, Replication Strategies in Unstructured Peer-to-Peer Networks, SIGCOMM 2002
Q. Lv et al., Search and Replication in Unstructured Peer-to-Peer Networks, ICS 2002 (replication part)
Both papers focus on performance.
5Replication Allocation Scheme
Question: how to use replication to improve search efficiency in unstructured networks?
How many copies of each object should be kept so that the search overhead for the object is minimized, assuming that the total amount of storage for objects in the network is fixed?
6Replication Theory - Model
Assume $m$ objects and $n$ nodes.
Each node has capacity $\rho$, so the total capacity is $R = n\rho$.
How to allocate $R$ among the $m$ objects?
Determine $r_i$, the number of copies of object $i$ (that is, the number of distinct nodes that hold a copy of $i$), with $\sum_{i=1}^{m} r_i = R$ ($R$ is the total capacity).
Also, $p_i = r_i / R$ is the fraction of the total capacity allocated to $i$.
An allocation is represented by the vector $(p_1, p_2, \ldots, p_m) = (r_1/R, r_2/R, \ldots, r_m/R)$.
7Replication Theory - Model
Assume that object $i$ is requested with relative rate $q_i$; we normalize by setting $\sum_{i=1}^{m} q_i = 1$.
For convenience, assume $1 \ll r_i \le n$ and that $q_1 \ge q_2 \ge \ldots \ge q_m$.
Map the query distribution $q$ to an allocation vector $p$.
Bounds for $p_i$:
At least one copy, $r_i \ge 1$: lower value $l = 1/R$.
At most $n$ copies, $r_i \le n$: upper value $u = n/R$.
8Replication Theory
Assume that searches go on until a copy is found.
We want to determine the $r_i$ that minimize the average search size (number of nodes probed) to locate an item $i$.
We need to compute the average search size per item.
Searches consist of randomly probing sites until the desired object is found: at each step, the search draws a node uniformly at random and asks whether it has a copy.
9Replication Theory
$A_i$: the expected (average) search size for object $i$ is the inverse of the fraction of sites that hold replicas of the object: $A_i = n / r_i$.
The average search size $A$ over all objects (average number of nodes probed per object query) is $A = \sum_i q_i A_i = n \sum_i q_i / r_i$.
Minimize $A = n \sum_i q_i / r_i$.
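The objective above can be evaluated directly. The following is a minimal Python sketch; the function name and the example numbers are illustrative assumptions, not taken from the slides.

```python
def average_search_size(q, r, n):
    """A = n * sum_i q_i / r_i for query rates q, replica counts r, and n nodes."""
    assert len(q) == len(r)
    assert abs(sum(q) - 1.0) < 1e-9, "query rates must be normalized"
    return n * sum(qi / ri for qi, ri in zip(q, r))

# Hypothetical example: 100 nodes, three objects.
n = 100
q = [0.5, 0.3, 0.2]   # relative query rates
r = [50, 20, 10]      # number of replicas of each object
A = average_search_size(q, r, n)
```

Each term n / r_i is the mean of a geometric distribution: every probe hits a replica of object i with probability r_i / n.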
10Replication Theory
Minimize $\sum_i q_i / p_i$ subject to $\sum_i p_i = 1$ and $l \le p_i \le u$.
Monotonicity: since $q_1 \ge q_2 \ge \ldots \ge q_m$, we must have $p_1 \ge p_2 \ge \ldots \ge p_m$.
More copies go to the more popular objects, but how many?
11Uniform Replication
Create the same number of replicas for each object: $r_i = R/m$.
Average search size for uniform replication: $A_i = n / r_i = m/\rho$, so
$A_{\text{uniform}} = \sum_i q_i \, m/\rho = m/\rho$ (that is, $m n / R$),
which is independent of the query distribution.
13Proportional Replication
Create a number of replicas for each object proportional to its query rate: $r_i = R\,q_i$.
Average search size for proportional replication: $A_i = n / r_i = n / (R\,q_i)$, so
$A_{\text{proportional}} = \sum_i q_i \, n / (R\,q_i) = m n / R = m/\rho = A_{\text{uniform}}$,
again independent of the query distribution.
Why? Objects whose query rate is greater than average ($q_i > 1/m$) do better with proportional replication, and the others do better with uniform.
The weighted average balances out to be the same.
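A quick numerical check of this equality, as a Python sketch. The values of n, rho, and q are made-up assumptions, and the integrality of replica counts and the bounds on r_i are ignored.

```python
def average_search_size(q, r, n):
    """A = n * sum_i q_i / r_i."""
    return n * sum(qi / ri for qi, ri in zip(q, r))

n, rho = 100, 2            # hypothetical: 100 nodes, 2 slots per node
R = n * rho                # total capacity
q = [0.6, 0.3, 0.1]        # normalized query rates
m = len(q)

uniform = [R / m for _ in q]           # r_i = R / m
proportional = [R * qi for qi in q]    # r_i = R * q_i

A_uniform = average_search_size(q, uniform, n)
A_proportional = average_search_size(q, proportional, n)

# Per-object search sizes differ even though the weighted averages agree.
per_object_uniform = [n / ri for ri in uniform]
per_object_proportional = [n / ri for ri in proportional]
```

Both averages come out as m / rho, while the per-object values show popular objects gaining and unpopular ones losing under proportional replication.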
14Uniform and Proportional Replication
- Summary
- Uniform Allocation: $p_i = 1/m$
- Simple, resources are divided equally
- Proportional Allocation: $p_i = q_i$
- Fair, resources per item proportional to demand
- Reflects current P2P practices
15Space of Possible Allocations
- Definition: an allocation $p_1, p_2, p_3, \ldots, p_m$ is in-between Uniform and Proportional if
- for $1 \le i < m$, $q_{i+1}/q_i < p_{i+1}/p_i < 1$
- (the ratio $p_{i+1}/p_i$ is 1 for uniform and $q_{i+1}/q_i$ for proportional; we want to favor popular objects, but not too much)
- Theorem 1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional
Theorem 2: $p$ is worse than Uniform/Proportional if for all $i$, $p_{i+1}/p_i > 1$ (popular gets less), OR for all $i$, $q_{i+1}/q_i > p_{i+1}/p_i$ (less popular gets less than its fair share)
Proportional and Uniform are the worst reasonable strategies
16Square-Root Replication
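Find the $r_i$ that minimize $A = \sum_i q_i A_i = n \sum_i q_i / r_i$.
The minimum is attained for $r_i = \lambda \sqrt{q_i}$, where $\lambda = R / \sum_i \sqrt{q_i}$.
Then the average search size is $A_{\text{optimal}} = \frac{1}{\rho} \left( \sum_i \sqrt{q_i} \right)^2$.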
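The three allocations can be compared numerically. The sketch below is an illustration in Python under assumed parameters (n, rho, m, and a Zipf exponent of 1 are all hypothetical); it ignores integrality and the bounds 1 <= r_i <= n.

```python
import math

def allocate(q, R, kind):
    """Return replica counts r_i for the named allocation policy."""
    m = len(q)
    if kind == "uniform":
        return [R / m for _ in q]
    if kind == "proportional":
        return [R * qi for qi in q]
    if kind == "square-root":
        total = sum(math.sqrt(qi) for qi in q)
        return [R * math.sqrt(qi) / total for qi in q]
    raise ValueError(kind)

def average_search_size(q, r, n):
    return n * sum(qi / ri for qi, ri in zip(q, r))

def zipf(m, alpha=1.0):
    """Zipf-like query rates: q_i proportional to 1 / i**alpha."""
    w = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(w)
    return [x / total for x in w]

n, rho, m = 1000, 10, 100      # assumed sizes
R = n * rho
q = zipf(m)

A_uniform = average_search_size(q, allocate(q, R, "uniform"), n)
A_proportional = average_search_size(q, allocate(q, R, "proportional"), n)
A_sr = average_search_size(q, allocate(q, R, "square-root"), n)
gain = A_uniform / A_sr        # how much square-root replication saves
```

The value `A_sr` should match the closed form (1/rho) * (sum of sqrt(q_i))^2, which is a convenient sanity check.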
17How much can we gain by using SR?
[Figure: ratio $A_{\text{uniform}} / A_{\text{SR}}$ for Zipf-like query rates]
18Other Metrics Discussion
- Utilization rate: the rate of requests that a replica of an object $i$ receives
- $U_i = R\,q_i / r_i$
- For uniform replication,
- all objects have the same average search size,
- but replicas have utilization rates proportional to their query rates
- Proportional replication achieves perfect load balancing, with all replicas having the same utilization rate,
- but average search sizes vary, with more popular objects having smaller average search sizes than less popular ones
19Replication Summary
20Assumption: there is at least one copy per object
- A query is soluble if there are sufficiently many copies of the item.
- A query is insoluble if the item is rare or nonexistent.
- What is the search size of a query?
- Soluble queries: number of probes until an answer is found.
- Insoluble queries: maximum search size
21- SR is best for soluble queries
- Uniform minimizes the cost of insoluble queries
What is the optimal strategy?
OPT is a hybrid of Uniform and SR, tuned to balance the cost of soluble and insoluble queries: uniformly allocate a minimum number of copies per item, and use SR for the rest
22We now know what we need.
How do we get there?
23Replication Algorithms
- Uniform and Proportional are easy
- Uniform: when an item is created, replicate its key in a fixed number of hosts.
- Proportional: for each query, replicate the key in a fixed number of hosts (need to know or estimate the query rate)
Desired properties of the algorithm
- Fully distributed, where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search.
- Converge to/obtain SR allocation when query rates remain steady.
26Achieving Square-Root Replication
- How can we achieve square-root replication in practice?
- Assume that each query keeps track of the search size
- Each time a query is finished, the object is copied to a number of sites proportional to the number of probes
- On average, object $i$ will be replicated $c\,n/r_i$ times each time a query is issued (for some constant $c$)
- It can be shown that this gives square-root replication
What about replica deletion?
Steady state: the replica creation rate equals the deletion rate.
The lifetime of replicas must be independent of object identity or query rate.
FIFO or random deletion is OK; LRU or LFU is not.
28Replication
Thus, for Square-root replication an object
should be replicated at a number of nodes that
is proportional to the number of probes that the
search required
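The convergence claim can be illustrated with a simplified deterministic model. The Python sketch below is not the algorithm of either paper: it assumes expected-value dynamics, in which each step adds q_i * c * n / r_i copies of object i, and it models identity-independent deletion by rescaling all counts back to the fixed capacity R. All parameter values are hypothetical.

```python
import math

def simulate_replication(q, n=1000, R=500.0, c=1.0, steps=5000):
    """Iterate expected replica creation and identity-independent deletion."""
    m = len(q)
    r = [R / m for _ in q]                      # start from a uniform allocation
    for _ in range(steps):
        # Creation: object i is queried at rate q_i, and each query
        # creates c * n / r_i new copies on average.
        grown = [ri + qi * c * n / ri for ri, qi in zip(r, q)]
        # Deletion: every replica is equally likely to be removed,
        # so all counts shrink by the same factor.
        scale = R / sum(grown)
        r = [g * scale for g in grown]
    return r

q = [0.8, 0.2]
r = simulate_replication(q)
ratio = r[0] / r[1]      # compare with sqrt(q[0] / q[1])
```

At the fixed point of this model r_i^2 is proportional to q_i, so the replica ratio approaches the square root of the query-rate ratio.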
29Replication - Implementation
Two strategies are popular:
Owner Replication: when a search is successful, the object is stored at the requestor node only (used in Gnutella)
Path Replication: when a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node, following the reverse path back to the requestor (used in Freenet)
30Replication - Implementation
If a p2p system uses k walkers, the number of nodes between the requestor and the provider node is 1/k of the total nodes visited (number of probes).
Then, path replication should result in square-root replication.
Problem: it tends to replicate objects at nodes that are topologically along the same path.
31Replication - Implementation
Random Replication: when a search succeeds, we count the number of nodes on the path between the requestor and the provider, say p.
Then, we randomly pick p of the nodes that the k walkers visited and replicate the object there.
Harder to implement.
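The three strategies differ only in which nodes receive a copy after a successful search. A minimal Python sketch follows; the function name, the representation of a path as a list of node ids, and the choice to exclude the provider from the candidates are assumptions made for illustration.

```python
import random

def replica_sites(strategy, path, visited, rng=random):
    """Pick the nodes that store a new copy after a successful search.

    path:    nodes from the requestor (first) to the provider (last)
    visited: all nodes probed by the k walkers during the search
    """
    requestor, provider = path[0], path[-1]
    if strategy == "owner":
        return [requestor]
    if strategy == "path":
        # Every node on the reverse path; the provider already has the object.
        return list(path[:-1])
    if strategy == "random":
        p = len(path) - 1                       # same number of copies as path replication
        candidates = [v for v in visited if v != provider]
        return rng.sample(candidates, min(p, len(candidates)))
    raise ValueError(strategy)

path = ["a", "b", "c", "d"]                     # "a" asked, "d" answered
visited = ["a", "b", "c", "d", "e", "f", "g", "h"]
owner_copies = replica_sites("owner", path, visited)
path_copies = replica_sites("path", path, visited)
random_copies = replica_sites("random", path, visited, random.Random(0))
```

Random replication creates as many copies as path replication but spreads them over everything the walkers touched, which avoids clustering copies along one path.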
32Experimental Evaluation
Both path and random replication generate replication ratios quite close to the square root of the query rates.
Path replication and random replication reduce the overall message traffic by factors of 3 and 4, respectively.
Much of the traffic reduction comes from reducing the number of hops.
Path and random are better than owner. For example, the share of queries that finish within 4 hops: 71% owner, 86% path, 91% random.
33Replication in Unstructured P2P: epidemic algorithms
34Reasons for Replication
Besides storage, there is another cost associated with replication: consistency maintenance
35Methods for spreading updates
Push: updates originate from the site where the update appeared, in order to reach the sites that hold copies
Pull: the sites holding copies contact the master site
Epidemics for spreading updates
36A. Demers et al., Epidemic Algorithms for Replicated Database Maintenance, SOSP 87
Update at a single site.
Randomized algorithms for distributing updates and driving replicas towards consistency.
Ensure that the effect of every update is eventually reflected in all replicas.
Sites become fully consistent only when all updating activity has stopped and the system has become quiescent.
Analogous to epidemics.
37Methods for spreading updates
Direct mail (server-initiated): each new update is immediately mailed from its originating site to all other sites
(+) Timely and reasonably efficient
(-) Not all sites know all other sites (stateless)
(-) Mails may be lost
Anti-entropy: every site regularly chooses another site at random and, by exchanging content, resolves any differences between them
(+) Extremely reliable, but requires exchanging content and resolving updates
(-) Propagates updates much more slowly than direct mail
38- Methods for spreading updates
- Rumor mongering
- Sites are initially ignorant; when a site receives a new update, it becomes a hot rumor
- While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update
- When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further
- Rumor cycles can be more frequent than anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites
39- Anti-entropy and rumor spreading are examples of epidemic algorithms
- Three types of sites
- Infective: a site that holds an update that it is willing to share
- Susceptible: a site that has not yet received an update
- Removed: a site that has received an update but is no longer willing to share it
- Anti-entropy: a simple epidemic where all sites are always either infective or susceptible
40The paper refers to exchanging the entire content of a node
A set $S$ of $n$ sites, each storing a copy of a database.
The database copy at site $s \in S$ is a time-varying partial function
$s.\mathrm{ValueOf}: K \to (u: V \times t: T)$
where $K$ is a set of keys, $V$ a set of values, and $T$ a set of timestamps (totally ordered by $<$).
$V$ contains the element NIL: $s.\mathrm{ValueOf}[k] = (\mathrm{NIL}, t)$ means that the item with key $k$ has been deleted from the database.
Assume just one item, so $s.\mathrm{ValueOf} \in (u: V \times t: T)$,
thus an ordered pair consisting of a value and a timestamp.
The first component may be NIL, indicating that the item was deleted by the time indicated by the second component.
41- The goal of the update distribution process is to drive the system towards
- $\forall s, s' \in S: s.\mathrm{ValueOf} = s'.\mathrm{ValueOf}$
- Operation invoked to update the database
- $\mathrm{Update}[u: V] \equiv s.\mathrm{ValueOf} \leftarrow (u, \mathrm{Now}[\,])$
42Direct Mail
At the site $s$ where an update occurs:
For each $s' \in S$: PostMail[to: $s'$, msg: ("Update", $s.\mathrm{ValueOf}$)]
($s$: originator of the update, $s'$: receiver of the update)
Each site $s'$ receiving the update message ("Update", $(u, t)$):
If $s'.\mathrm{ValueOf}.t < t$ then $s'.\mathrm{ValueOf} \leftarrow (u, t)$
- The complete set $S$ must be known to $s$ (stateful server)
- PostMail messages are queued so that the server is not delayed (asynchronous), but delivery may fail when queues overflow or destinations are inaccessible for a long time
- $n$ (number of sites) messages per update
- traffic proportional to $n$ and the average distance between sites
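A minimal Python sketch of direct mail for a single item. The `Site` class, the `lost` parameter that models undelivered mail, and the decision to skip the originator (so n - 1 messages are sent) are illustrative assumptions; real mail queues are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    value: object = None
    t: float = 0.0          # timestamp of the current value

    def on_update(self, u, t):
        # Receiver rule: accept the mailed value only if it is strictly newer.
        if self.t < t:
            self.value, self.t = u, t

def direct_mail(origin, sites, u, now, lost=()):
    """Apply an update at `origin` and mail it to every other site."""
    origin.value, origin.t = u, now      # the local Update operation
    sent = 0
    for other in sites:
        if other is origin:
            continue
        sent += 1
        if other.name not in lost:       # hypothetical model of lost mail
            other.on_update(u, now)
    return sent

sites = [Site("a"), Site("b"), Site("c")]
sent = direct_mail(sites[0], sites, "v1", now=1.0, lost={"c"})
```

Site "c" never learns the update in this run, which is the weakness that anti-entropy is meant to repair.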
43Anti-Entropy
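At each site $s$, periodically execute:
For some $s' \in S$: ResolveDifference[$s$, $s'$]
Three ways to execute ResolveDifference:
Push (sender (server)-driven): $s$ pushes its value to $s'$
If $s.\mathrm{ValueOf}.t > s'.\mathrm{ValueOf}.t$ then $s'.\mathrm{ValueOf} \leftarrow s.\mathrm{ValueOf}$
Pull (receiver (client)-driven): $s$ pulls from $s'$ and gets the value of $s'$
If $s.\mathrm{ValueOf}.t < s'.\mathrm{ValueOf}.t$ then $s.\mathrm{ValueOf} \leftarrow s'.\mathrm{ValueOf}$
Push-Pull:
$s.\mathrm{ValueOf}.t > s'.\mathrm{ValueOf}.t \Rightarrow s'.\mathrm{ValueOf} \leftarrow s.\mathrm{ValueOf}$
$s.\mathrm{ValueOf}.t < s'.\mathrm{ValueOf}.t \Rightarrow s.\mathrm{ValueOf} \leftarrow s'.\mathrm{ValueOf}$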
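The three variants can be written as one function for the single-item case. This is a Python sketch; the `Replica` class and the string-valued `mode` parameter are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    value: object
    t: int                  # timestamp of the value

def resolve_difference(s, other, mode="push-pull"):
    """Reconcile the single item held by s with the one held by other."""
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(mode)
    if mode in ("push", "push-pull") and s.t > other.t:
        other.value, other.t = s.value, s.t      # s pushes its newer value
    elif mode in ("pull", "push-pull") and s.t < other.t:
        s.value, s.t = other.value, other.t      # s pulls the newer value

a = Replica("old", 1)
b = Replica("new", 2)
resolve_difference(a, b, "push")    # a has nothing newer to push
a_after_push = a.value
resolve_difference(a, b, "pull")    # a pulls b's newer value
a_after_pull = a.value
```

With push alone the stale site `a` stays stale until some other site contacts it, whereas a pull by `a` fixes it immediately.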
44Anti-Entropy
- Assume that
- Site $s'$ is chosen uniformly at random from the set $S$
- Each site executes the anti-entropy algorithm once per period
- It can be shown that
- An update will eventually infect the entire population
- Starting from a single infected node, this is achieved in time proportional to the log of the population size
- The constant of proportionality depends on whether push or pull is used
45Anti-Entropy
Let $p_i$ be the probability of a site remaining susceptible (it has not received the update) after the $i$-th cycle of anti-entropy (we want it to tend to 0 as fast as possible).
For pull: a site remains susceptible after cycle $i+1$ if (a) it was susceptible after cycle $i$ and (b) it contacted a susceptible site in cycle $i+1$:
$p_{i+1} = (p_i)^2$
For push: a site remains susceptible after cycle $i+1$ if (a) it was susceptible after cycle $i$ and (b) no infectious site chose to contact it in cycle $i+1$:
$p_{i+1} = p_i \,(1 - 1/n)^{n(1 - p_i)}$, which for small $p_i$ and large $n$ is approximately $p_{i+1} = p_i\, e^{-1}$
Here $1 - 1/n$ is the probability that the site is not contacted by a given node, and $n(1 - p_i)$ is the number of infectious nodes at cycle $i$.
Pull is preferable to push.
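Iterating the two recurrences shows the difference in the final phase, when few susceptible sites remain. This is a Python sketch; the starting probability, the population size, and the stopping threshold are arbitrary assumptions.

```python
def pull_step(p):
    """p_{i+1} = p_i ** 2"""
    return p * p

def push_step(p, n):
    """p_{i+1} = p_i * (1 - 1/n) ** (n * (1 - p_i))"""
    return p * (1.0 - 1.0 / n) ** (n * (1.0 - p))

def cycles_until(step, p0, eps):
    """Number of cycles until the susceptible probability drops to eps or below."""
    p, cycles = p0, 0
    while p > eps:
        p = step(p)
        cycles += 1
    return cycles

n = 1000          # hypothetical population size
p0 = 0.1          # 10% of the sites are still susceptible
eps = 1e-9

pull_cycles = cycles_until(pull_step, p0, eps)
push_cycles = cycles_until(lambda p: push_step(p, n), p0, eps)
```

Pull squares the probability every cycle, while push only multiplies it by roughly 1/e, so push needs many more cycles to reach the same threshold.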
46Anti-Entropy
- The algorithm as described has nodes exchange their entire content, which raises the question of what we send over the network and how we compare the database copies
- Use checksums
- OK if the checksums usually agree
- Keep a list of recent updates, those for which (now - timestamp) < threshold $\tau$
- Compare the recent updates first, update the databases and the checksums, and then compare the updated checksums (choice of $\tau$)
- Maintain an inverted list of updates ordered by timestamp
- Perform anti-entropy by exchanging updates in reverse timestamp order until the checksums agree
47Complex Epidemics Rumor Spreading
- Initial state: $n$ individuals, initially inactive (susceptible)
- Rumor planting and spreading
- We plant a rumor with one person, who becomes active (infective), phoning other people at random and sharing the rumor
- Every person bearing the rumor also becomes active and likewise shares the rumor
- When an active individual makes an unnecessary phone call (the recipient already knows the rumor), then with probability 1/k the active individual loses interest in sharing the rumor (becomes removed)
- We would like to know
- How fast the system converges to an inactive state (no one is infective)
- The percentage of people that know the rumor when the inactive state is reached
48Complex Epidemics Rumor Spreading
Let $s$, $i$, $r$ be the fractions of individuals that are susceptible, infective, and removed: $s + i + r = 1$
$ds/dt = -s\,i$
$di/dt = +s\,i - \frac{1}{k}(1 - s)\,i$ (the second term accounts for unnecessary phone calls)
$s = e^{-(k+1)(1 - s)}$
An exponential decrease of $s$ with $k$.
For $k = 1$, 20% miss the rumor; for $k = 2$, only 6% miss it.
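The quoted residues can be reproduced by solving the implicit equation for s numerically. The Python sketch below uses plain fixed-point iteration started at s = 0; this choice of solver is an assumption, and it avoids the trivial solution s = 1.

```python
import math

def residue(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) for the nontrivial root, 0 < s < 1."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s

residues = {k: residue(k) for k in (1, 2, 3, 4)}
```

For k = 1 and k = 2 the values should come out near 0.20 and 0.06, matching the percentages above, and they keep shrinking as k grows.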
49Criteria to characterize epidemics
- Residue
- The value of $s$ when $i$ is zero: the remaining susceptibles when the epidemic finishes
- Traffic
- $m$ = Total update traffic / Number of sites
- Delay
- Average delay ($t_{avg}$): the difference between the time of the initial injection of an update and the arrival of the update at a given site, averaged over all sites
- The delay until the reception by the last site that will receive the update during an epidemic ($t_{last}$)
50Simple variations of rumor spreading
Blind vs. Feedback
Feedback variation: a sender loses interest only if the recipient already knows the rumor
Blind variation: a sender loses interest with probability 1/k regardless of the recipient
Counter vs. Coin
Instead of losing interest with probability 1/k, use a counter so that we lose interest only after k unnecessary contacts
$s = e^{-m}$, where $m$ is the traffic
There are $nm$ updates sent; the probability that a single site misses all these updates is $(1 - 1/n)^{nm}$
All variations share the same relationship between traffic and residue
Counters and feedback improve the delay, with counters playing a more significant role
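These variations are easy to try out in a small simulation. The Python sketch below models push rumor mongering in synchronous cycles; the cycle structure, the parameter names, and the fixed seed are assumptions, and results vary from run to run with other seeds.

```python
import random

def rumor_mongering(n, k, feedback=True, counter=True, seed=0):
    """Simulate push rumor mongering; return (residue, traffic per site)."""
    rng = random.Random(seed)
    knows = [False] * n
    knows[0] = True
    hot = {0: 0}            # infective site -> contacts counted against it so far
    sent = 0
    while hot:
        for site in list(hot):              # sites infective at the start of the cycle
            other = rng.randrange(n - 1)
            if other >= site:
                other += 1                  # uniform choice among the other n - 1 sites
            sent += 1
            already_knew = knows[other]
            if not already_knew:
                knows[other] = True
                hot[other] = 0              # the recipient becomes infective
            # Feedback: only unnecessary contacts count. Blind: every contact counts.
            if already_knew or not feedback:
                if counter:
                    hot[site] += 1
                    if hot[site] >= k:
                        del hot[site]       # removed after k counted contacts
                elif rng.random() < 1.0 / k:
                    del hot[site]           # coin: removed with probability 1/k
    residue = knows.count(False) / n
    traffic = sent / n
    return residue, traffic

residue_k1, traffic_k1 = rumor_mongering(1000, k=1)
residue_k5, traffic_k5 = rumor_mongering(1000, k=5)
```

Comparing `residue` against `math.exp(-traffic)` for several settings is one way to examine the traffic and residue relationship stated above.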
51Simple variations of rumor spreading
Push vs. Pull
Pull converges faster.
If there are numerous independent updates, a pull request is likely to find a source with a non-empty rumor list.
If the database is quiescent, push ceases to introduce traffic overhead, while pull continues to inject useless requests for updates.
Counter, feedback and pull work better
52- Minimization
- Use push and pull together; if both sites know the update, only the site with the smaller counter is incremented
- Connection Limit
- A site can be the recipient of more than one push in a cycle, while for pull, a site can service an unlimited number of requests
- What if we set a limit?
- Push gets better (reduced traffic; since the spread grows exponentially, most traffic occurs at the end)
- Pull gets worse
53Hunting
If a connection is rejected, then the choosing site can hunt for alternate sites
Push and pull are similar if the connection limit is 1 and the hunt limit is infinite
54Complex Epidemic and Anti-entropy
Anti-entropy can be run infrequently to back up a complex epidemic, so that every update eventually reaches (or is superseded at) every site.
What happens when an update is discovered during anti-entropy? Use rumor mongering (e.g., make it a hot rumor) or direct mail.
55Deletion and Death Certificates
Replace deleted items with death certificates, which carry timestamps and spread like ordinary data.
When old copies of deleted items meet death certificates, the old items are removed.
But when should death certificates be deleted?
56Dormant Death Certificates
Define some threshold (but some items may be resurrected, i.e., re-appear).
If the death certificate is older than the expected time required to propagate it to all sites, then the existence of an obsolete copy of the corresponding data item is unlikely.
Delete very old certificates at most sites, retaining dormant copies at only a few sites (like antibodies).
Use two thresholds, $t_1$ and $t_2$, and attach to each death certificate a list of $r$ retention site names (chosen at random when the death certificate is created).
Once $t_1$ is reached, all servers except those in the retention list delete the death certificate.
Dormant death certificates are deleted when $t_1 + t_2$ is reached.
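The two-threshold rule amounts to a small predicate evaluated at each site. This is a Python sketch; the class, the function names, and the representation of time as plain numbers are illustrative assumptions.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class DeathCertificate:
    key: str
    created: float
    retention_sites: frozenset      # names of the r sites that keep a dormant copy

def make_certificate(key, now, all_sites, r, rng=random):
    """Create a certificate and choose its r retention sites at random."""
    chosen = rng.sample(sorted(all_sites), r)
    return DeathCertificate(key, now, frozenset(chosen))

def keeps_certificate(site, cert, now, t1, t2):
    """Should `site` still hold `cert` at time `now`?"""
    age = now - cert.created
    if age < t1:
        return True                              # active everywhere
    if age < t1 + t2:
        return site in cert.retention_sites      # dormant at retention sites only
    return False                                 # discarded everywhere

cert = DeathCertificate("item-42", created=0.0, retention_sites=frozenset({"s1"}))
```

For example, with t1 = 10 and t2 = 100, site "s2" drops the certificate at age 10 while "s1" keeps a dormant copy until age 110.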
57Anti-Entropy with Dormant Death Certificates
Whenever a dormant death certificate encounters
an obsolete data item, it must be activated
58Spatial Distribution
How to choose partners?
Consider spatial distributions in which the choice tends to favor nearby servers.
59Spatial Distribution
The cost of sending an update to a nearby site is much lower than the cost of sending the update to a distant site.
Favor nearby neighbors.
Trade-off between average traffic per link and convergence time.
Example: in a linear network, contacting only the nearest neighbor gives O(1) traffic per link and O(n) convergence time, versus O(n) and O(log n) for uniform random connections.
Determine the probability of connecting to a site at distance $d$.
For spreading updates on a line, use a $d^{-2}$ distribution: the probability of connecting to a site at distance $d$ is proportional to $d^{-2}$.
In general, each site $s$ independently chooses connections according to a distribution that is a function of $Q_s(d)$, where $Q_s(d)$ is the cumulative number of sites at distance $d$ or less from $s$.
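For sites placed on a line, the d^-2 rule can be sketched as follows. The Python code assumes sites sit at integer positions 0..n-1 and that distance is the absolute difference of positions; both are assumptions made for illustration.

```python
import random

def partner_distribution(position, n, a=2.0):
    """Probability of choosing each other site, proportional to distance ** -a."""
    weights = {j: abs(j - position) ** -a for j in range(n) if j != position}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def pick_partner(position, n, a=2.0, rng=random):
    """Draw one exchange partner according to the distribution above."""
    dist = partner_distribution(position, n, a)
    sites = list(dist)
    return rng.choices(sites, weights=[dist[j] for j in sites], k=1)[0]

dist = partner_distribution(position=5, n=11)
partner = pick_partner(5, 11, rng=random.Random(0))
```

Setting a = 0 recovers the uniform choice, so the exponent moves between low link traffic and fast convergence.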
60Spatial Distribution and Anti-Entropy
Extensive simulation on the actual topology with a number of different spatial distributions.
A different class of distributions is less sensitive to sudden increases of $Q_s(d)$:
let each site $s$ build a list of the other sites sorted by their distance from $s$, and
select anti-entropy exchange partners from the sorted list according to a function $f(i)$, where $i$ is the site's position on the list (averaging the probabilities of selecting equidistant sites).
Non-uniform distributions induce less load on critical links.
61Spatial Distribution and Rumors
Anti-entropy converges with probability 1 for a spatial distribution such that for every pair $(s, s')$ of sites there is a nonzero probability that $s$ will choose to exchange data with $s'$.
However, rumor mongering is less robust against changes in spatial distributions and network topology.
As the spatial distribution is made less uniform, we can increase the value of $k$ to compensate.