p2p06 - PowerPoint PPT Presentation

Slides: 62
Provided by: ep58
1
Topics in Database Systems: Data Management in
Peer-to-Peer Systems
2
Agenda for today
1. Brief summary of replication in unstructured p2p systems
2. Epidemic algorithms for replica updates (Demers et al. paper, one application)
3. Other examples of unstructured systems
   a. GIA
   b. KAZAA
   c. BitTorrent
Next Thursday: Freenet, Pastry, eDonkey and, as an example, a p2p database (PIER)
3
Reasons for Replication
  • Performance
    • load balancing
    • locality: place copies close to the requestor
    • geographic locality (more choices for the next
      step in search)
    • reduce the number of hops
  • Availability
    • in case of failures
    • peer departures

4
Replication Theory: Replica Allocation Policies
in Unstructured P2P Systems
E. Cohen and S. Shenker, Replication Strategies
in Unstructured Peer-to-Peer Networks, SIGCOMM 2002
Q. Lv et al., Search and Replication in
Unstructured Peer-to-Peer Networks, ICS'02
(replication part)
Both address performance
5
Replication Allocation Scheme
Question: how can replication be used to improve
search efficiency in unstructured networks?
How many copies of each object should there be so
that the search overhead for the object is
minimized, assuming that the total amount of
storage for objects in the network is fixed?
6
Replication Theory - Model
Assume m objects and n nodes
Each node has capacity $\rho$; the total capacity is $R = n\rho$
How should R be allocated among the m objects?
Determine $r_i$, the number of copies of object i
(the number of distinct nodes that hold a copy of i)
$\sum_{i=1}^{m} r_i = R$ (R is the total capacity)
Also, $p_i = r_i / R$ is the fraction of the total
capacity allocated to object i
An allocation is represented by the vector
$p = (p_1, p_2, \ldots, p_m) = (r_1/R, r_2/R, \ldots, r_m/R)$
7
Replication Theory - Model
Assume that object i is requested with relative
rate $q_i$, normalized so that $\sum_{i=1}^{m} q_i = 1$
For convenience, assume $1 \ll r_i \le n$ and
$q_1 \ge q_2 \ge \cdots \ge q_m$
Map the query distribution q to an allocation
vector p
Bounds for $p_i$
At least one copy, $r_i \ge 1$: lower value $l = 1/R$
At most n copies, $r_i \le n$: upper value $u = n/R$
8
Replication Theory
Assume that searches go on until a copy is found
We want to determine the $r_i$ that minimize the
average search size (number of nodes probed) to
locate an item
For this we need the average search size per item
Searches consist of randomly probing sites until
the desired object is found: at each step, the
search draws a node uniformly at random and asks
whether it has a copy
9
Replication Theory
$A_i$: the expected (average) search size for
object i is the inverse of the fraction of sites
that hold replicas of the object, $A_i = n / r_i$
The average search size A over all objects (average
number of nodes probed per object query) is
$A = \sum_i q_i A_i = n \sum_i q_i / r_i$
Minimize $A = n \sum_i q_i / r_i$
10
Replication Theory
Minimize $\sum_i q_i / p_i$
subject to $\sum_i p_i = 1$ and $l \le p_i \le u$
Monotonicity: since $q_1 \ge q_2 \ge \cdots \ge q_m$, we
must have $p_1 \ge p_2 \ge \cdots \ge p_m$
More copies for the more popular objects, but how many?
11
Uniform Replication
Create the same number of replicas for each
object: $r_i = R / m$
Average search size for uniform replication:
$A_i = n / r_i = m / \rho$
$A_{uniform} = \sum_i q_i \, m/\rho = m/\rho = mn/R$,
which is independent of the query distribution
12
Proportional Replication
Create a number of replicas for each object
proportional to the query rate: $r_i = R q_i$
13
Proportional Replication
Number of replicas for each object: $r_i = R q_i$
Average search size for proportional replication:
$A_i = n / r_i = n / (R q_i)$
$A_{proportional} = \sum_i q_i \, n/(R q_i) = mn/R = m/\rho = A_{uniform}$,
again independent of the query distribution
Why? Objects whose query rates are greater than
average ($q_i > 1/m$) do better with proportional
replication, and the others do better with uniform
The weighted average balances out to be the same
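As a quick numeric check of the claim above, the following minimal sketch evaluates the average search size n * sum(q_i / r_i) for a uniform and a proportional allocation. The function names and the sample values (100 nodes, capacity 2 per node, four objects) are assumptions chosen for illustration only.

```python
def average_search_size(q, r, n):
    """Expected number of random probes per query: n * sum_i q_i / r_i."""
    return n * sum(qi / ri for qi, ri in zip(q, r))


def uniform_allocation(q, R):
    """Every object gets the same number of copies, R / m."""
    return [R / len(q)] * len(q)


def proportional_allocation(q, R):
    """Object i gets R * q_i copies."""
    return [R * qi for qi in q]


# Hypothetical example values, not taken from the slides.
n, rho = 100, 2
R = n * rho
q = [0.5, 0.25, 0.15, 0.10]  # normalized query rates

a_uniform = average_search_size(q, uniform_allocation(q, R), n)
a_proportional = average_search_size(q, proportional_allocation(q, R), n)
print(a_uniform, a_proportional)  # both should be close to m / rho
```

Changing q while keeping it normalized should leave both values at m / rho, which is the independence from the query distribution stated on the slide.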
14
Uniform and Proportional Replication
  • Summary
  • Uniform allocation: $p_i = 1/m$
  • Simple, resources are divided equally
  • Proportional allocation: $p_i = q_i$
  • Fair, resources per item proportional to demand
  • Reflects current P2P practices

15
Space of Possible Allocations
  • Definition: an allocation $p_1, p_2, p_3, \ldots, p_m$ is
    in-between Uniform and Proportional if
  • for $1 \le i < m$, $q_{i+1}/q_i < p_{i+1}/p_i < 1$
  • (the ratio is 1 for Uniform and $q_{i+1}/q_i$ for
    Proportional; we want to favor popular objects,
    but not too much)
  • Theorem 1: all (strictly) in-between strategies
    are (strictly) better than Uniform and
    Proportional

Theorem 2: p is worse than Uniform/Proportional
if for all i, $p_{i+1}/p_i > 1$ (the more popular gets
less), OR for all i, $q_{i+1}/q_i > p_{i+1}/p_i$ (the less
popular gets less than its fair share)
Proportional and Uniform are the worst
reasonable strategies
16
Square-Root Replication
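Find the $r_i$ that minimize A, where
$A = \sum_i q_i A_i = n \sum_i q_i / r_i$
The minimum is attained for $r_i = \lambda \sqrt{q_i}$,
where $\lambda = R / \sum_i \sqrt{q_i}$
Then the average search size is
$A_{optimal} = \frac{1}{\rho} \left( \sum_i \sqrt{q_i} \right)^2$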
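The square-root allocation and its closed-form search size can be checked numerically. This is a minimal sketch under assumed example values (100 nodes, capacity 2 per node, four objects); the function names are illustrative and not from the cited papers.

```python
import math


def square_root_allocation(q, R):
    """r_i = lam * sqrt(q_i), with lam chosen so that the copies sum to R."""
    lam = R / sum(math.sqrt(qi) for qi in q)
    return [lam * math.sqrt(qi) for qi in q]


def average_search_size(q, r, n):
    """Expected number of random probes per query: n * sum_i q_i / r_i."""
    return n * sum(qi / ri for qi, ri in zip(q, r))


# Hypothetical example values, not taken from the slides.
n, rho = 100, 2
R = n * rho
q = [0.5, 0.25, 0.15, 0.10]

r_sr = square_root_allocation(q, R)
a_sr = average_search_size(q, r_sr, n)
a_closed_form = sum(math.sqrt(qi) for qi in q) ** 2 / rho
a_uniform = len(q) / rho  # m / rho, the uniform (and proportional) value

print(a_sr, a_closed_form, a_uniform)
```

For any non-uniform q the square-root value is strictly below m / rho, which follows from the Cauchy-Schwarz inequality applied to the sum of square roots.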
17
How much can we gain by using SR?
[Figure: ratio $A_{uniform} / A_{SR}$ for Zipf-like query rates]
18
Other Metrics Discussion
  • Utilization rate: the rate of requests that a
    replica of an object i receives
  • $U_i = R q_i / r_i$
  • For uniform replication,
  • all objects have the same average search size,
  • but replicas have utilization rates proportional
    to their query rates
  • Proportional replication achieves perfect load
    balancing with all replicas having the same
    utilization rate,
  • but average search sizes vary with more popular
    objects having smaller average search sizes than
    less popular ones
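The contrast between the two policies can be seen by evaluating the utilization rate R * q_i / r_i directly. This is a small sketch with assumed example values; the names are illustrative.

```python
def utilization_rates(q, r, R):
    """Per-replica request rate for each object: U_i = R * q_i / r_i."""
    return [R * qi / ri for qi, ri in zip(q, r)]


# Hypothetical example values, not taken from the slides.
R = 200
q = [0.5, 0.25, 0.15, 0.10]

uniform = [R / len(q)] * len(q)          # r_i = R / m
proportional = [R * qi for qi in q]      # r_i = R * q_i

u_uniform = utilization_rates(q, uniform, R)            # proportional to q_i
u_proportional = utilization_rates(q, proportional, R)  # the same for every object
print(u_uniform, u_proportional)
```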

19
Replication Summary
20
Assumption that there is at least one copy per
object
  • A query is soluble if there are sufficiently many
    copies of the item.
  • A query is insoluble if the item is rare or
    nonexistent.
  • What is the search size of a query?
  • Soluble queries: number of probes until an answer
    is found.
  • Insoluble queries: maximum search size

21
  • SR is best for soluble queries
  • Uniform minimizes cost of insoluble queries

What is the optimal strategy?
OPT is a hybrid of Uniform and SR, tuned to
balance the cost of soluble and insoluble queries:
uniformly allocate a minimum number of copies
per item, and use SR for the rest
22
We now know what we need.
How do we get there?
23
Replication Algorithms
  • Uniform and Proportional are easy
  • Uniform: when an item is created, replicate its
    key in a fixed number of hosts.
  • Proportional: for each query, replicate the key
    in a fixed number of hosts (need to know or
    estimate the query rate)

Desired properties of algorithm
  • Fully distributed, where peers communicate through
    random probes, with minimal bookkeeping and no
    more communication than what is needed for search.
  • Converge to/obtain SR allocation when query rates
    remain steady.

26
Achieving Square-Root Replication
  • How can we achieve square-root replication in
    practice?
  • Assume that each query keeps track of the search
    size
  • Each time a query is finished, the object is
    copied to a number of sites proportional to the
    number of probes
  • On average, object i will be replicated $c \cdot n / r_i$
    times each time a query is issued (for some
    constant c)
  • It can be shown that this gives square-root
    replication
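One way to see why probe-proportional copying leads to a square-root allocation is a deterministic "fluid" iteration, sketched below. It is an illustration under stated assumptions, not the procedure of the cited papers: each step adds q_i * c * n / r_i copies of object i (query rate times copies per query), then rescales all counts to the fixed capacity R, which stands in for deletions that ignore object identity. The step count and the value of c are arbitrary choices.

```python
import math


def fluid_probe_proportional(q, n, R, c=0.1, steps=20000):
    """Iterate expected replica counts under probe-proportional copying."""
    r = [R / len(q)] * len(q)  # start from the uniform allocation
    for _ in range(steps):
        # Creation: object i is queried at rate q_i; each query probes about
        # n / r_i nodes and creates c * n / r_i new copies.
        r = [ri + qi * c * n / ri for qi, ri in zip(q, r)]
        # Deletion independent of object identity: scale back to capacity R.
        scale = R / sum(r)
        r = [ri * scale for ri in r]
    return r


# Hypothetical example values, not taken from the slides.
n, R = 100, 200
q = [0.5, 0.25, 0.15, 0.10]
r_final = fluid_probe_proportional(q, n, R)

lam = R / sum(math.sqrt(qi) for qi in q)
r_square_root = [lam * math.sqrt(qi) for qi in q]
print(r_final, r_square_root)
```

At a fixed point of this iteration, r_i squared is proportional to q_i, so the counts settle at the square-root allocation regardless of the uniform starting point.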

27
Achieving Square-Root Replication
What about replica deletion?
In steady state, the replica creation rate equals
the deletion rate
The lifetime of replicas must be independent of
object identity or query rate
FIFO or random deletion is OK; LRU or LFU is not
28
Replication
Thus, for Square-root replication an object
should be replicated at a number of nodes that
is proportional to the number of probes that the
search required
29
Replication - Implementation
Two strategies are popular
Owner Replication: when a search is successful, the
object is stored at the requestor node only (used
in Gnutella)
Path Replication: when a search succeeds, the
object is stored at all nodes along the path from
the requestor node to the provider node, following
the reverse path back to the requestor (used in
Freenet)
30
Replication - Implementation
If a p2p system uses k-walkers, the number of
nodes between the requestor and the provider node
is 1/k of the total number of nodes visited
(number of probes)
Then, path replication should result in
square-root replication
Problem: it tends to replicate objects on nodes
that are topologically along the same path
31
Replication - Implementation
Random Replication: when a search succeeds, we
count the number of nodes on the path between the
requestor and the provider, say p
Then, we randomly pick p of the nodes that the k
walkers visited to replicate the object
Harder to implement
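The three strategies differ only in which nodes receive a copy after a successful search. The sketch below makes that explicit. The function name, the node identifiers, and the convention that `path` lists the nodes from the requestor up to (but excluding) the provider are assumptions made for illustration.

```python
import random


def replication_targets(strategy, requestor, path, visited, rng=random):
    """Return the nodes that should store a copy after a successful search.

    path    -- nodes on the successful walk (assumed to exclude the provider)
    visited -- every node probed by any of the k walkers
    """
    if strategy == "owner":
        return [requestor]
    if strategy == "path":
        return list(path)
    if strategy == "random":
        # Same number of copies as path replication, spread over all
        # visited nodes instead of one topological path.
        return rng.sample(sorted(visited), len(path))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical search: requestor 0 found the object after a walk 0 -> 3 -> 7,
# while the other walkers visited nodes 1, 2, 5 and 9.
path = [0, 3, 7]
visited = {0, 1, 2, 3, 5, 7, 9}
targets = replication_targets("random", 0, path, visited, random.Random(0))
print(targets)
```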
32
Experimental Evaluation
Both path and random replication generate
replication ratios quite close to the square root
of the query rates
Path replication and random replication reduce
the overall message traffic by a factor of 3 to 4
Much of the traffic reduction comes from reducing
the number of hops
Path and random are better than owner. For example,
queries that finish within 4 hops: 71% owner, 86%
path, 91% random
33
Replication in Unstructured P2P: Epidemic
Algorithms
34
Reasons for Replication
Besides storage, there is a cost associated with
replication: consistency maintenance
35
Methods for spreading updates
Push: updates originate from the site where the
update appeared, to reach the sites that hold copies
Pull: the sites holding copies contact the master site
Epidemics for spreading updates
36
A. Demers et al., Epidemic Algorithms for
Replicated Database Maintenance, SOSP '87
Update at a single site
Randomized algorithms for distributing updates and
driving replicas towards consistency
Ensure that the effect of every update is
eventually reflected in all replicas
Sites become fully consistent only when all
updating activity has stopped and the system has
become quiescent
Analogous to epidemics
37
Methods for spreading updates
Direct mail (server-initiated): each new update is
immediately mailed from its originating site to
all other sites
(+) Timely and reasonably efficient
(-) Not all sites know all other sites (stateless)
(-) Mails may be lost
Anti-entropy: every site regularly chooses another
site at random and, by exchanging content, resolves
any differences between them
(+) Extremely reliable, but requires exchanging
content and resolving updates
(-) Propagates updates much more slowly than
direct mail
38
  • Methods for spreading updates
  • Rumor mongering
  • Sites are initially ignorant; when a site
    receives a new update, the update becomes a hot
    rumor
  • While a site holds a hot rumor, it periodically
    chooses another site at random and ensures that
    the other site has seen the update
  • When a site has tried to share a hot rumor with
    too many sites that have already seen it, the
    site stops treating the rumor as hot and retains
    the update without propagating it further
  • Rumor cycles can be more frequent than
    anti-entropy cycles, because they require fewer
    resources at each site, but there is a chance
    that an update will not reach all sites

39
  • Anti-entropy and rumor spreading are examples of
    epidemic algorithms
  • Three types of sites
  • Infective: a site that holds an update that it is
    willing to share
  • Susceptible: a site that has not yet received an
    update
  • Removed: a site that has received an update but
    is no longer willing to share it
  • Anti-entropy: a simple epidemic where all sites
    are always either infective or susceptible

40
The paper refers to exchanging the entire content
of a node
A set S of n sites, each storing a copy of a
database
The database copy at site $s \in S$ is a
time-varying partial function
$s.\text{ValueOf}: K \to (v: V \times t: T)$
where K is the set of keys, V the set of values, and
T the set of timestamps (totally ordered by <)
V contains the element NIL
$s.\text{ValueOf}[k] = (\text{NIL}, t)$: the item with key k
has been deleted from the database
Assume just one item:
$s.\text{ValueOf} \in (v: V \times t: T)$
thus, an ordered pair consisting of a value and a
timestamp
The first component may be NIL, indicating that the
item was deleted by the time indicated by the
second component
41
  • The goal of the update distribution process is to
    drive the system towards
  • $\forall s, s' \in S: s.\text{ValueOf} = s'.\text{ValueOf}$
  • Operation invoked to update the database
  • $\text{Update}[v: V] \equiv s.\text{ValueOf} \leftarrow (v, \text{Now}[\,])$

42
Direct Mail
At the site s where an update occurs:
FOR EACH $s' \in S$ DO PostMail[to: $s'$, msg: ("Update", s.ValueOf)]
(s: originator of the update; s': receiver of the
update)
Each site s' receiving the update message
("Update", (v, t)):
IF $s'.\text{ValueOf}.t < t$ THEN $s'.\text{ValueOf} \leftarrow (v, t)$
  • The complete set S must be known to s (stateful
    server)
  • PostMail messages are queued so that the server
    is not delayed (asynchronous), but may fail when
    queues overflow or their destinations are
    inaccessible for a long time
  • n (number of sites) messages per update
  • traffic proportional to n and the average
    distance between sites

43
Anti-Entropy
At each site s, periodically execute:
FOR SOME $s' \in S$ DO ResolveDifference[s, s']
Three ways to execute ResolveDifference:
Push (sender (server) driven; s pushes its value
to s')
IF $s.\text{ValueOf}.t > s'.\text{ValueOf}.t$ THEN $s'.\text{ValueOf} \leftarrow s.\text{ValueOf}$
Pull (receiver (client) driven; s pulls from s' and
gets the value of s')
IF $s.\text{ValueOf}.t < s'.\text{ValueOf}.t$ THEN $s.\text{ValueOf} \leftarrow s'.\text{ValueOf}$
Push-Pull
$s.\text{ValueOf}.t > s'.\text{ValueOf}.t \Rightarrow s'.\text{ValueOf} \leftarrow s.\text{ValueOf}$
$s.\text{ValueOf}.t < s'.\text{ValueOf}.t \Rightarrow s.\text{ValueOf} \leftarrow s'.\text{ValueOf}$
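The three variants of the difference-resolution step can be written as one small function. This is a sketch for the single-item case described earlier; the `Site` class, its field names, and the `mode` strings are assumptions for illustration rather than notation from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    value: Optional[str]  # None plays the role of NIL (deleted item)
    timestamp: int


def resolve_difference(s: Site, other: Site, mode: str = "push-pull") -> None:
    """Reconcile the single-item copies held by s and other."""
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(f"unknown mode: {mode}")
    if mode in ("push", "push-pull") and s.timestamp > other.timestamp:
        # s holds the newer value and pushes it to the other site.
        other.value, other.timestamp = s.value, s.timestamp
    elif mode in ("pull", "push-pull") and s.timestamp < other.timestamp:
        # The other site holds the newer value and s pulls it.
        s.value, s.timestamp = other.value, other.timestamp


a = Site("new", 2)
b = Site("old", 1)
resolve_difference(a, b, "pull")  # a is already newer, so nothing changes
after_pull = (b.value, b.timestamp)
resolve_difference(a, b, "push")  # a pushes its newer value to b
after_push = (b.value, b.timestamp)
print(after_pull, after_push)
```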
44
Anti-Entropy
  • Assume that
  • Site s' is chosen uniformly at random from the
    set S
  • Each site executes the anti-entropy algorithm
    once per period
  • It can be shown that
  • An update will eventually infect the entire
    population
  • Starting from a single infected site, this is
    achieved in time proportional to the log of
    the population size
  • The constant of proportionality depends on
    whether push or pull is used

45
Anti-Entropy
Let $p_i$ be the probability of a site remaining
susceptible (it has not received the update) after
the i-th cycle of anti-entropy (we want it to tend
to 0 as quickly as possible)
For pull: a site remains susceptible after cycle
i+1 if (a) it was susceptible after cycle i and (b)
it contacted a susceptible site in cycle i+1
$p_{i+1} = (p_i)^2$
For push: a site remains susceptible after cycle
i+1 if (a) it was susceptible after cycle i and (b)
no infectious site chose to contact it in cycle i+1
$p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}$
$p_{i+1} \approx p_i e^{-1}$ (for small $p_i$ and large n)
where $1 - 1/n$ is the probability that the site is
not contacted by a given node and $n(1 - p_i)$ is the
number of infectious nodes at cycle i
Pull is preferable to push
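Iterating the two recurrences shows how much faster the pull recurrence drives the susceptible fraction towards zero once few susceptible sites remain. This is a minimal sketch; the starting fraction 0.1, the threshold, and the population size are arbitrary example values.

```python
def cycles_until(p0, threshold, step):
    """Count cycles until the susceptible probability drops below threshold."""
    p, cycles = p0, 0
    while p >= threshold:
        p = step(p)
        cycles += 1
    return cycles


n = 1000  # hypothetical population size


def pull_step(p):
    return p * p  # p_{i+1} = p_i^2


def push_step(p):
    return p * (1 - 1 / n) ** (n * (1 - p))  # p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}


pull_cycles = cycles_until(0.1, 1e-6, pull_step)
push_cycles = cycles_until(0.1, 1e-6, push_step)
print(pull_cycles, push_cycles)
```

Pull squares the probability each cycle, while push only shrinks it by a roughly constant factor near 1/e, so the push count grows with the logarithm of the target threshold.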
46
Anti-Entropy
  • This assumes that sites exchange their entire
    content, so the issue is what to send over the
    network and how to compare the copies
  • Use checksums
  • OK if the checksums usually agree
  • A list of recent updates for which (now −
    timestamp) < threshold $\tau$
  • Compare the recent updates first, update the
    databases and the checksums, and then compare the
    updated checksums; the choice of $\tau$ matters
  • Maintain an inverted list of updates ordered by
    timestamp
  • Perform anti-entropy by exchanging updates in
    reverse timestamp order until their checksums
    agree

47
Complex Epidemics Rumor Spreading
  • Initial state: n individuals, initially inactive
    (susceptible)
  • Rumor planting and spreading
  • We plant a rumor with one person who becomes
    active (infective), phoning other people at
    random and sharing the rumor
  • Every person bearing the rumor also becomes
    active and likewise shares the rumor
  • When an active individual makes an unnecessary
    phone call (the recipient already knows the
    rumor), then with probability 1/k the active
    individual loses interest in sharing the rumor
    (becomes removed)
  • We would like to know
  • How fast the system converges to an inactive
    state (no one is infective)
  • The percentage of people that know the rumor
    when the inactive state is reached
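The process in the list above can be simulated directly to estimate both quantities of interest. The sketch below is one possible reading, with assumptions: calls are made one at a time by a randomly chosen active individual (an asynchronous approximation), and the names and parameters are illustrative.

```python
import random

SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2


def spread_rumor(n, k, rng):
    """Simulate rumor spreading; return (residue, calls per individual)."""
    state = [SUSCEPTIBLE] * n
    state[0] = INFECTIVE          # plant the rumor with one person
    infective = [0]
    calls = 0
    while infective:
        idx = rng.randrange(len(infective))
        caller = infective[idx]
        callee = rng.randrange(n - 1)
        if callee >= caller:      # pick anyone except the caller
            callee += 1
        calls += 1
        if state[callee] == SUSCEPTIBLE:
            state[callee] = INFECTIVE
            infective.append(callee)
        elif rng.random() < 1.0 / k:
            # Unnecessary call: lose interest with probability 1/k.
            state[caller] = REMOVED
            infective[idx] = infective[-1]
            infective.pop()
    residue = state.count(SUSCEPTIBLE) / n
    return residue, calls / n


residue, traffic = spread_rumor(n=2000, k=1, rng=random.Random(1))
print(residue, traffic)
```

The returned residue is the fraction that never heard the rumor when no one is active any more; larger k trades more calls for a smaller residue.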

48
Complex Epidemics Rumor Spreading
Let s, i, r be the fractions of individuals that
are susceptible, infective and removed
$s + i + r = 1$
$ds/dt = -s\,i$
$di/dt = s\,i - \frac{1}{k}(1 - s)\,i$ (the second term
accounts for unnecessary phone calls)
$s = e^{-(k+1)(1-s)}$
An exponential decrease of s with k
For k = 1, 20% miss the rumor
For k = 2, only 6% miss it
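The quoted residues can be reproduced by solving the implicit equation s = exp(-(k+1)(1-s)) numerically. The sketch below uses plain fixed-point iteration from s = 0.5, which is an assumption of convenience; it converges to the nontrivial solution for the small k values shown (s = 1 is also a solution but is not reached from this start).

```python
import math


def rumor_residue(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) for the residue s below 1."""
    s = 0.5
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s


for k in (1, 2, 3):
    print(k, rumor_residue(k))  # k = 1 gives about 20%, k = 2 about 6%
```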
49
Criteria to characterize epidemics
  • Residue
  • The value of s when i is zero: the remaining
    susceptibles when the epidemic finishes
  • Traffic
  • m = total update traffic / number of sites
  • Delay
  • Average delay ($t_{avg}$): the difference between
    the time of the initial injection of an update
    and the arrival of the update at a given site,
    averaged over all sites
  • The delay ($t_{last}$) until the reception by the
    last site that will receive the update during an
    epidemic

50
Simple variations of rumor spreading
Blind vs. Feedback
Feedback variation: a sender loses interest only if
the recipient knows the rumor
Blind variation: a sender loses interest with
probability 1/k regardless of the recipient
Counter vs. Coin
Instead of losing interest with probability 1/k,
use a counter so that we lose interest only after
k unnecessary contacts
$s = e^{-m}$, where m is the traffic
There are nm updates sent; the probability that a
single site misses all these updates is
$(1 - 1/n)^{nm}$
The variations have the same relationship between
traffic and residue
Counters and feedback improve the delay, with
counters playing a more significant role
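The link between the miss probability (1 - 1/n)^(nm) and the residue e^(-m) is the usual limit for large n, which a short numeric check makes concrete. The values of n and m below are arbitrary examples.

```python
import math


def miss_probability(n, m):
    """Probability that one site misses all n*m randomly addressed updates."""
    return (1 - 1 / n) ** (n * m)


n, m = 10**6, 2.0  # hypothetical population size and traffic per site
exact = miss_probability(n, m)
approx = math.exp(-m)
print(exact, approx)
```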
51
Simple variations of rumor spreading
Push vs. Pull
Pull converges faster
If there are numerous independent updates, a pull
request is likely to find a source with a
non-empty rumor list
If the database is quiescent, the push phase
ceases to introduce traffic overhead, while pull
continues to inject useless requests for updates
Counter, feedback and pull work better
52
  • Minimization
  • Use push and pull together; if both sites know
    the update, only the site with the smaller
    counter is incremented
  • Connection Limit
  • A site can be the recipient of more than one push
    in a cycle, while for pull, a site can service an
    unlimited number of requests
  • What if we set a limit?
  • Push gets better (reduced traffic: since the
    spread grows exponentially, most traffic occurs
    at the end)
  • Pull gets worse

53
Hunting
If a connection is rejected, then the choosing
site can hunt for alternate sites
Push and pull are similar if the connection limit
is 1 and the hunt limit is infinite
54
Complex Epidemic and Anti-entropy
Anti-entropy can be run infrequently to back up a
complex epidemic, so that every update eventually
reaches (or is superseded at) every site
What happens when an update is discovered during
anti-entropy? Use rumor mongering (e.g., make it
a hot rumor) or direct mail
55
Deletion and Death Certificates
Replace deleted items with death certificates,
which carry timestamps and spread like ordinary
data
When old copies of deleted items meet death
certificates, the old items are removed
But when should death certificates be deleted?
56
Dormant Death Certificates
Define some threshold (but some items may be
resurrected, i.e., re-appear)
If the death certificate is older than the expected
time required to propagate it to all sites, then
the existence of an obsolete copy of the
corresponding data item is unlikely
Delete very old certificates at most sites,
retaining dormant copies at only a few sites (like
antibodies)
Use two thresholds, $\tau_1$ and $\tau_2$, and a list of r
retention site names attached to each death
certificate (chosen at random when the death
certificate is created)
Once $\tau_1$ is reached, all servers except those in
the retention list delete the death certificate
Dormant death certificates are deleted when
$\tau_1 + \tau_2$ is reached
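The two-threshold rule amounts to a small decision function evaluated at each site. The sketch below is one way to express it; the function name, the argument names, and the use of a plain age in arbitrary time units are assumptions for illustration.

```python
def keeps_death_certificate(site, age, retention_sites, tau1, tau2):
    """Decide whether a site still holds a death certificate of a given age.

    Before tau1 every site keeps it; between tau1 and tau1 + tau2 only the
    retention sites keep a dormant copy; afterwards nobody keeps it.
    """
    if age < tau1:
        return True
    if age < tau1 + tau2:
        return site in retention_sites
    return False


# Hypothetical certificate with retention sites "b" and "d".
retention = {"b", "d"}
print(keeps_death_certificate("a", 5, retention, tau1=10, tau2=100))
print(keeps_death_certificate("a", 50, retention, tau1=10, tau2=100))
print(keeps_death_certificate("b", 50, retention, tau1=10, tau2=100))
print(keeps_death_certificate("b", 200, retention, tau1=10, tau2=100))
```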
57
Anti-Entropy with Dormant Death Certificates
Whenever a dormant death certificate encounters
an obsolete data item, it must be activated
58
Spatial Distribution
How to choose partners?
Consider spatial distributions in which the choice
tends to favor nearby servers
59
Spatial Distribution
The cost of sending an update to a nearby site is
much lower than the cost of sending the update to
a distant site
Favor nearby neighbors
Trade-off between average traffic per link and
convergence time
Example: on a linear network, contacting only the
nearest neighbor gives O(1) traffic per link and
O(n) convergence time, whereas uniform random
connections give O(n) and O(log n)
Determine the probability of connecting to a site
at distance d
For spreading updates on a line, use a $d^{-2}$
distribution: the probability of connecting to a
site at distance d is proportional to $d^{-2}$
In general, each site s independently chooses
connections according to a distribution that is a
function of $Q_s(d)$, where $Q_s(d)$ is the cumulative
number of sites at distance d or less from s
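For the line example, the partner-selection distribution can be tabulated directly. This is a minimal sketch assuming sites sit at integer positions 0..n-1 on a line and distance is the absolute difference of positions; the function name and the exponent parameter `a` are illustrative.

```python
def line_partner_probabilities(position, n, a=2.0):
    """Probability of choosing each other site, proportional to distance^-a."""
    weights = {
        j: abs(j - position) ** -a
        for j in range(n)
        if j != position
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


probs = line_partner_probabilities(position=5, n=11)
print(probs[4], probs[3], probs[0])  # nearer sites are chosen more often
```

With a = 2 a site at distance 1 is four times as likely to be picked as a site at distance 2, so most exchanges stay local while distant sites are still reached occasionally.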
60
Spatial Distribution and Anti-Entropy
Extensive simulation on the actual topology with
a number of different spatial distributions
A different class of distributions is less
sensitive to sudden increases of $Q_s(d)$
Let each site s build a list of the other sites
sorted by their distance from s
Select anti-entropy exchange partners from the
sorted list according to a function f(i), where i
is the partner's position on the list (averaging
the probabilities of selecting equidistant sites)
Non-uniform distributions induce less overload on
critical links
61
Spatial Distribution and Rumors
Anti-entropy converges with probability 1 for a
spatial distribution such that for every pair
(s, s') of sites there is a nonzero probability
that s will choose to exchange data with s'
However, rumor mongering is less robust against
changes in spatial distributions and network
topology
As the spatial distribution is made less uniform,
we can increase the value of k to compensate