1
Topics in Database Systems Data Management in
Peer-to-Peer Systems
PART 1 Replication and other issues
2
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured Networks (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
3
Term Projects
  • Projects of three types
  • They have a research flavor; you are expected
    to think for yourselves
  • There is no single solution (more than one
    group may work on the same project)
  • I would like 3 people per group
  • If you have some other idea, it can be accepted,
    but not automatically
  • You will build a web page for your project,
    which you will send to me
  • replicate content and not index (for
    durability)!!!

4
Term Projects
TYPE I PROJECTS: You will choose one topic from a
list of topics. The topics concern data management
problems either in centralized systems or in
distributed systems that lack the properties of
peer-to-peer systems. The goal of the project is
to design a version of the problem suitable for a
system of peer nodes. Your report must include
some form of evaluation of your approach. This may
be theoretical (e.g., an estimate of the
complexity of the solution, a proof of its
correctness or of other properties, such as load
balancing) and/or it may include a small
implementation. You will hand in a report in the
form of a research paper (instructions will be
given). You will also present your work in class
(instructions will be given).
5
Term Projects
  • Topics for Type I projects
  • 1-3 Pick any one of sections 3, 4 or 5 from M.
    Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer,
    A. Sah, J. Sidell, C. Staelin and A. Yu.
    Mariposa: A Wide-Area Distributed Database
    System. VLDB J., 5(1), 1996, 48-63.
  • 4 Study how the following paper, which we
    discussed in class, can be adapted to p2p: A. J.
    Demers, D. H. Greene, C. Hauser, W. Irish, J.
    Larson, S. Shenker, H. E. Sturgis, D. C.
    Swinehart, D. B. Terry. Epidemic Algorithms for
    Replicated Database Maintenance. PODC 1987, 1-12.
  • 5 Design a distributed (p2p) version of a
    bitmap index.
  • For bitmap indexes you may consult any database
    textbook and/or the following:
  • P. E. O'Neil and D. Quass. Improved Query
    Performance with Variant Indexes. Proc. SIGMOD
    Conference, 1997, 38-49.
  • 6 Study how the following paper on sensor
    networks can be applied to p2p systems: D.
    Zeinalipour-Yazti, Z. Vagena, D. Gunopulos,
    V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas,
    D. Srivastava. The Threshold Join Algorithm for
    Top-k Queries in Distributed Sensor Networks,
    DMSN Workshop, 2005.

6
Term Projects
TYPE II PROJECTS: You will choose a topic from the
area of peer-to-peer systems that we have not
covered in class, specifically (i) security,
(ii) trust/reputation, (iii) incentives, (iv)
publish-subscribe systems. You will present the
topic in class. Then, either (a) you propose some
extension of the topic, e.g., applying it to
another type of overlay, improving some of its
characteristics, etc. In this case, you must also
include some form of evaluation of the extension,
which may be theoretical (e.g., an estimate of the
complexity of the solution) and/or include a small
implementation; or (b) you implement a substantial
part of the topic. You will hand in a report in
the form of a research paper (instructions will be
given). You will also give a second presentation
in class, this time of your own work (instructions
will be given).
7
Term Projects
  • Topics for Type II projects
  • Security E. Sit and R. Morris Security
    Considerations for Peer-to-Peer Distributed Hash
    Tables. IPTPS 2002 261-269 D. S. Wallach A
    Survey of Peer-to-Peer Security Issues. ISSS
    2002 42-57
  • Incentives M. Feldman, K. Lai, I. Stoica and J.
    Chuang Robust incentive techniques for
    peer-to-peer networks. ACM Conference on
    Electronic Commerce 2004 102-111
  • Trust/Reputation S. D. Kamvar, M. T. Schlosser,
    H. Garcia-Molina The Eigentrust algorithm for
    reputation management in P2P networks. WWW 2003
    640-651
  • Publish/subscribe M. Bender, S. Michel, S.
    Parkitny, and G. Weikum A Comparative Study of
    Pub/Sub Methods in Structured P2P Networks.
    DBISP2P 2006, Seoul, South Korea, Springer, 2006

8
Term Projects
TYPE III PROJECTS: You will choose one of the
systems that provide peer-to-peer software. You
must install the software and build a small
application with it. You will hand in a report
that includes a short manual for the system and a
description of your application. You will also
present your work in class (instructions will be
given). The presentation must include a short
demo.
9
Term Projects
The systems for Type III projects:
1 OpenDHT: a publicly accessible distributed hash
table (DHT) service.
2 P2 Declarative Networking: a system which uses
a high-level declarative language to express
overlay networks in a highly compact and reusable
form.
3 PeerSim: a simulation environment for P2P
protocols in Java.
10
Term Projects
Deadlines
Dec 7: Form groups and choose a project
Dec 14: 1-2 page "project proposal" (instructions
will be given)
Dec 21: we may have a short presentation/discussion
of the projects in the last week before Christmas
Jan 11: Topic presentations (Type II groups)
Jan 18: " "
Jan 25: Project submission (the report; instructions
will be given)
There will be a final workshop where the projects
of all groups will be presented.
11
Agenda for today
  • 1. Description of the course projects
  • 2. Replication in general
  • 3. Replication Theory for Unstructured Networks
    (Cohen et al paper)
  • 4. Epidemic Algorithms for Updates (Demers et al
    paper)

12
Types of Replication
  • Two types of replication
  • Metadata/Index: replicate index entries
  • Data/Document replication: replicate the actual
    data (e.g., music files)
  • Metadata vs Data
  • (+) Lighter storage- and bandwidth-wise
  • (+) Sizes of replicated objects more uniform
  • (-) Adds an extra hop for actually getting the
    data
  • (-) More frequent updates
  • (-) Less durability/availability

13
Types of Replication
Caching vs Replication
Cache: store data retrieved from a previous
request (client-initiated)
Replication: more proactive; a copy of a data item
may be stored at a node even if the node has not
requested it
14
Reasons for Replication
  • Reasons for replication
  • Performance
  • load balancing
  • locality: place copies close to the requestors
  • geographic locality (more choices for the next
    step in search)
  • reduce the number of hops
  • Availability
  • In case of failures
  • Peer departures

15
Reasons for Replication
Besides storage, there is a cost associated with
replication: consistency maintenance. Replication
makes reads faster at the expense of slower
writes.
16
  • No proactive replication (Gnutella)
  • Hosts store and serve only what they requested
  • A copy can be found only by probing a host with a
    copy
  • Proactive replication of keys (metadata,
    pointers) for search efficiency (FastTrack, DHTs)
  • Proactive replication of copies for search
    and download efficiency, and anonymity (Freenet)

17
Issues
Which items (data/metadata) to replicate? Based
on popularity. In traditional distributed systems,
also on the rate of reads/writes; the cost-benefit
ratio: read-savings/write-increase.
Where to replicate (allocation scheme)?
18
Issues
How/when to update? Both data items and metadata.
19
Database-Flavored Replication Control Protocols
Let's assume the existence of a data item x with
copies x1, x2, ..., xn.
x: the logical data item; the xi's: the physical
data items.
A replication control protocol is responsible for
mapping each read/write on a logical data item
(R(x)/W(x)) to a set of reads/writes on a
(possibly proper) subset of the physical copies
of x.
20
One Copy Serializability
Correctness: a DBMS for a replicated database
should behave like a DBMS managing a one-copy
(i.e., non-replicated) database, insofar as users
can tell.
One-copy schedule: replace operations on data
copies with operations on data items.
One-copy serializable (1SR): the schedule of
transactions on a replicated database is
equivalent to a serial execution of those
transactions on a one-copy database.
21
ROWA
Read One/Write All (ROWA): a replication control
protocol that maps each read to only one copy of
the item and each write to a set of writes on all
physical data item copies.
If even one of the copies is unavailable, an
update transaction cannot terminate.
22
Write-All-Available
Write-all-available: a replication control
protocol that maps each read to only one copy of
the item and each write to a set of writes on all
available physical data item copies.
23
Quorum-Based Voting
  • A read quorum Vr and a write quorum Vw are
    needed to read or write a data item
  • If a given data item has a total of V votes, the
    quorums have to obey the following rules
  • Vr + Vw > V
  • Vw > V/2

Rule 1 ensures that a data item is not read and
written by two transactions concurrently
(R/W conflicts). Rule 2 ensures that two write
operations from two transactions cannot occur
concurrently on the same data item (W/W
conflicts).
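The two rules can be checked mechanically; a minimal sketch in Python (the vote counts below are illustrative, not from the slides):

```python
# Sketch: checking the two quorum rules for a data item with V total votes.
# Rule 1 (Vr + Vw > V) forces every read quorum to intersect every write
# quorum; Rule 2 (Vw > V/2) forces any two write quorums to intersect.

def valid_quorums(V: int, Vr: int, Vw: int) -> bool:
    """Return True if (Vr, Vw) obeys both quorum rules."""
    return Vr + Vw > V and 2 * Vw > V

# With V = 5: Vr = 3, Vw = 3 works, since any 3-of-5 read set shares at
# least one copy with any 3-of-5 write set.
assert valid_quorums(5, 3, 3)
# ROWA is the special case Vr = 1, Vw = V.
assert valid_quorums(5, 1, 5)
# Vr = 2, Vw = 3 violates Rule 1 (2 + 3 = 5 is not > 5): a read could
# miss the most recent write.
assert not valid_quorums(5, 2, 3)
```

Quorum intersection is what guarantees that every read sees at least one copy carrying the newest write, which is why ROWA (Vr = 1, Vw = V) sits at one extreme of this trade-off.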
24
Distributing Writes
Immediate writes
Deferred writes: access only one copy of the data
item and delay the distribution of writes to other
sites until the transaction has terminated and is
ready to commit. The transaction maintains an
intention list of deferred updates; after it
terminates, it sends the appropriate portion of
the intention list to each site that contains
replicated copies.
Optimizations: aborts cost less; may delay
commitment; delays the detection of conflicts.
Primary or master copy: updates at a single copy
per item.
25
Eager vs Lazy Replication
Eager replication keeps all replicas synchronized
by updating all of them in a single transaction.
Lazy replication asynchronously propagates replica
updates to other nodes after the replicating
transaction commits.
In p2p, lazy replication (or soft state) is used.
26
Update Propagation
  • Stateless or stateful (the item owners know
    which nodes hold copies of the item)
  • Who initiates the update
  • Push: by the server holding the item (copy) that
    changes
  • Pull: by the client holding the copy

27
Update Propagation
  • When
  • Periodic
  • Immediate
  • Lazy: when an inconsistency is detected
  • Threshold-based: freshness (e.g., number of
    updates or actual time) or value
  • Expiration-Time: items expire (become invalid)
    after that time (most often used in p2p)
  • Adaptive periodic
  • Reduce or increase the period based on the
    updates seen between two successive updates

28
Summary Design parameters and performance (CAN)
Parameter              Path length       Neighbor state    Total path latency  Per-hop latency  Volume            Multiple routes  Replicas
Dimensions (d)         O(d*n^(1/d))      O(d)              reduced             -                -                 yes              -
Realities (r)          reduced           O(r)              reduced             -                O(r)              yes              O(r)
MAXPEERS (p)           O(1/p)            O(p)              reduced             reduced          O(p)              yes              O(p)
Hash functions (k)     -                 -                 reduced*            -                O(k)              -                O(k)
RTT-weighted routing   -                 -                 reduced             reduced          -                 -                -
Uniform partitioning   reduced variance  reduced variance  -                   -                reduced variance  -                -
* Only on replicated data
29
CHORD Failures
  • Replication
  • Each node maintains a successor list of its r
    nearest successors
  • Upon failure, use the next successor in the list
  • Modify stabilize to fix the list

Other nodes may attempt to send requests through
the failed node Use alternate nodes found in the
routing table of preceding nodes or in the
successor list
30
CHORD Failures
  • Theorem: If we use a successor list of length
    r = Ω(log N) in an initially stable network, and
    then every node fails with probability 1/2, then
  • with high probability, find_successor returns
    the closest living successor
  • the expected time to execute find_successor in
    the failed network is O(log N)

A lookup fails only if all r nodes in the
successor list fail. With independent failures,
this happens with probability 2^-r, which is 1/N
for r = log2 N.
31
CHORD Replication
Store replicas of a key at the k nodes succeeding
the key. The successor list helps to keep the
number of replicas per item known.
Another approach: store a copy per region.
32
BATON Failures
There is routing redundancy
  • Upon node departure or failure, the parent can
    reconstruct the entries
  • Assume node x fails; any detected failures of x
    are reported to its parent y
  • y regenerates the routing tables of x (Theorem 2)
  • Messages are routed
  • Sideways (redundancy similar to CHORD)
  • Up-down (a node can find its parent through its
    neighbors)

33
Replication - Beehive
  • Proactive, model-driven replication
  • vs passive (demand-driven) replication, such as
    caching objects along a lookup path
  • Hint for BATON
  • Beehive
  • The length of the average query path is reduced
    by one when an object is proactively replicated
    at all nodes logically preceding that node on all
    query paths
  • BATON
  • Range queries
  • Many paths to data
Any ideas?
34
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured Networks (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
35
Replication Theory Replica Allocation Policies
in Unstructured P2P Systems
E. Cohen and S. Shenker, Replication Strategies
in Unstructured Peer-to-Peer Networks. SIGCOMM
2002 Q. Lv et al, Search and Replication in
Unstructured Peer-to-Peer Networks, ICS02
Replication Part
36
Replication Allocation Scheme
Question: how can we use replication to improve
search efficiency in unstructured networks?
How many copies of each object should there be, so
that the search overhead for the object is
minimized, assuming that the total amount of
storage for objects in the network is fixed?
37
Replication Theory - Model
Assume m objects and n nodes. Each node has
capacity ρ; total capacity R = n·ρ.
How do we allocate R among the m objects?
Determine ri, the number of copies (on distinct
nodes) of object i, with
Σi=1..m ri = R (R = total capacity)
Also, pi = ri/R: the fraction of the total
capacity allocated to i. An allocation is
represented by the vector (p1, p2, ..., pm) =
(r1/R, r2/R, ..., rm/R).
38
Replication Theory - Model
Assume that object i is requested with relative
rate qi, normalized by setting Σi=1..m qi = 1.
For convenience, assume 1 << ri ≤ n and that
q1 ≥ q2 ≥ ... ≥ qm.
Map the query distribution q to an allocation
vector p
39
Replication Theory - Model
Assume all nodes have equal capacity ρ, ρ = R/n.
R ≥ m (at least one copy per item); m > ρ (else
the problem is trivial: maintain copies of all
items everywhere).
Bounds for pi: at least one copy, ri ≥ 1, gives
the lower value l = 1/R; at most n copies,
ri ≤ n, gives the upper value u = n/R.
40
Replication Theory
Assume that searches go on until a copy is found.
We want to determine the ri that minimize the
average search size (number of nodes probed) to
locate an item i. We first need the average search
size per item. Searches consist of randomly
probing sites until the desired object is found:
each step draws a node uniformly at random and
asks whether it has a copy.
41
Search Example
[Figure: one search finds a copy after 2 probes,
another after 4 probes]
42
Replication Theory
The probability Pr(k) that object i is found at
the kth probe is given by
Pr(k) = Pr(not found in the previous k-1 probes) ×
Pr(found at the kth probe) = (1 - ri/n)^(k-1) · ri/n
k (the search size, i.e., the step at which the
item is found) is a random variable with a
geometric distribution and p = ri/n, so its
expectation is n/ri.
43
Replication Theory
Ai, the expectation (average search size) for
object i, is the inverse of the fraction of sites
that hold replicas of the object: Ai = n/ri.
The average search size A over all objects
(average number of nodes probed per query):
A = Σi qi·Ai = n·Σi qi/ri
Goal: minimize A = n·Σi qi/ri
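The expectation Ai = n/ri can be checked with a small Monte Carlo sketch (n and ri below are illustrative values, not from the paper):

```python
import random

# Sketch: estimate the average search size by probing nodes uniformly at
# random (with replacement) until one of the r replica holders is hit,
# and compare with the predicted expectation n / r.

def search_size(n: int, r: int, rng: random.Random) -> int:
    holders = set(range(r))        # replicas placed on nodes 0 .. r-1
    probes = 0
    while True:
        probes += 1
        if rng.randrange(n) in holders:
            return probes

rng = random.Random(0)
n, r = 1000, 50                    # predicted average search size: n/r = 20
trials = 20000
avg = sum(search_size(n, r, rng) for _ in range(trials)) / trials
assert abs(avg - n / r) / (n / r) < 0.05   # within 5% of n/r
```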
44
Replication Theory
If we had no limit on ri, we would replicate
everything everywhere: the average search size
becomes Ai = n/ri = 1 and search is trivial.
Assume instead a limit on R, so that the average
number of replicas per site ρ = R/n is fixed.
How do we allocate these R replicas among the m
objects, i.e., how many replicas per object?
45
Replication Theory
Minimize Σi qi/pi subject to Σi pi = 1 and
l ≤ pi ≤ u.
Monotonicity: since q1 ≥ q2 ≥ ... ≥ qm, we must
have p1 ≥ p2 ≥ ... ≥ pm. More copies for the more
popular, but how many more?
46
Uniform Replication
Create the same number of replicas for each
object: ri = R/m.
Average search size for uniform replication:
Ai = n/ri = m/ρ, so Auniform = Σi qi·m/ρ = m/ρ
(= m·n/R), which is independent of the query
distribution.
47
Proportional Replication
It makes sense to allocate more copies to objects
that are frequently queried; this should reduce
the search size for the more popular objects.
Create a number of replicas for each object
proportional to its query rate: ri = R·qi.
48
Proportional Replication
Number of replicas for each object: ri = R·qi.
Average search size for proportional replication:
Ai = n/ri = n/(R·qi), so
Aproportional = Σi qi·n/(R·qi) = m·n/R = m/ρ =
Auniform: again independent of the query
distribution. Why? Objects whose query rates are
greater than the average (> 1/m) do better with
proportional, and the others do better with
uniform; the weighted average balances out to be
the same.
49
Uniform and Proportional Replication
  • Summary
  • Uniform Allocation: pi = 1/m
  • Simple, resources are divided equally
  • Proportional Allocation: pi = qi
  • Fair, resources per item proportional to demand
  • Reflects current P2P practices

50
Space of Possible Allocations
So what is the optimal way to allocate replicas
so that A is minimized?
  • How should the ratio pi+1/pi of allocated
    replicas behave as the query rate decreases?
  • Reasonable range: qi+1/qi ≤ pi+1/pi ≤ 1
  • (pi+1/pi = 1 for uniform, = qi+1/qi for
    proportional)
51
Space of Possible Allocations
  • Definition: Allocation p1, p2, p3, ..., pm is
    in-between Uniform and Proportional if
  • for 1 ≤ i < m, qi+1/qi < pi+1/pi < 1
  • (the ratio is 1 for uniform and qi+1/qi for
    proportional; we want to favor the popular, but
    not too much)
  • Theorem 1: All (strictly) in-between strategies
    are (strictly) better than Uniform and
    Proportional

Theorem 2: p is worse than Uniform/Proportional
if for all i, pi+1/pi > 1 (the more popular get
fewer copies) OR for all i, qi+1/qi > pi+1/pi
(the less popular get less than their fair share).
Proportional and Uniform are the worst
"reasonable" strategies.
52
Space of allocations on 2 items
[Figure: the space of allocations on 2 items,
plotting p2/p1 against q2/q1; Uniform (p2/p1 = 1)
and Proportional (p2/p1 = q2/q1) are the two
extreme lines]
53
So, what is the best strategy?
54
Square-Root Replication
Find the ri that minimize A = Σi qi·Ai =
n·Σi qi/ri. The minimum is attained for
ri = λ·√qi, where λ = R/Σi √qi. The average
search size is then Aoptimal = (1/ρ)·(Σi √qi)².
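A quick numeric sketch comparing the three allocations under a Zipf-like query distribution (m, n, ρ are illustrative; copies are kept fractional to match the continuous analysis above):

```python
# Sketch: A = n * sum(q_i / r_i) for uniform, proportional, and
# square-root allocations under a Zipf-like query distribution.

m, n = 100, 10000
rho = 10                          # per-node capacity, so R = n * rho
R = n * rho
q = [1 / (i + 1) ** 1.2 for i in range(m)]
total = sum(q)
q = [x / total for x in q]        # normalize: the q_i now sum to 1

def avg_search_size(r):
    return n * sum(qi / ri for qi, ri in zip(q, r))

A_uni = avg_search_size([R / m] * m)
A_prop = avg_search_size([R * qi for qi in q])
lam = R / sum(qi ** 0.5 for qi in q)
A_sr = avg_search_size([lam * qi ** 0.5 for qi in q])

assert abs(A_uni - m / rho) < 1e-9    # both equal m/rho = 10 ...
assert abs(A_prop - m / rho) < 1e-9   # ... regardless of the q_i
assert A_sr < A_uni                   # square-root strictly wins
assert abs(A_sr - sum(qi ** 0.5 for qi in q) ** 2 / rho) < 1e-9
```

The last assertion checks the closed form Aoptimal = (1/ρ)·(Σ√qi)², which by Cauchy-Schwarz is below m/ρ for any non-uniform query distribution.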
55
How much can we gain by using SR ?
[Figure: Auniform/ASR as a function of the Zipf
parameter, for Zipf-like query rates]
56
Other Metrics Discussion
  • Utilization rate: the rate of requests that a
    replica of an object i receives,
  • Ui = R·qi/ri
  • For uniform replication,
  • all objects have the same average search size,
  • but replicas have utilization rates proportional
    to their query rates
  • Proportional replication achieves perfect load
    balancing, with all replicas having the same
    utilization rate,
  • but average search sizes vary, with more popular
    objects having smaller average search sizes than
    less popular ones

57
Replication Summary
58
Pareto Distribution (for the queries)
59
Pareto Distribution (for the queries)
Both model power-law distributions.
Zipf: what is the size (popularity) of the r-th
ranked item: y ∝ r^(-b)
Pareto: how many items have size > x (look at the
frequency distribution): P[X > x] ∝ x^(-k),
P[X = x] ∝ x^(-(k+1)) = x^(-a)
"The r-th hottest item has n queries" is
equivalent to saying "r items have n or more
queries". This is exactly the definition of the
Pareto distribution, except that the x and y axes
are flipped. Whereas for Zipf we are given r
(rank) and compute n, in Pareto we are given n and
compute r (rank).
Reference: http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
60
Replication (summary)
Each object i is replicated on ri nodes and the
total number of copies stored is R, that is
Σi=1..m ri = R
  • (1) Uniform: all objects are replicated at the
    same number of nodes
  • ri = R/m
  • (2) Proportional: the replication of an object is
    proportional to the query probability of the
    object
  • ri ∝ qi
  • (3) Square-root: the replication of an object i
    is proportional to the square root of its query
    probability qi
  • ri ∝ √qi

61
Assumption: there is at least one copy per object.
  • A query is soluble if there are sufficiently
    many copies of the item.
  • A query is insoluble if the item is rare or
    nonexistent.
  • What is the search size of a query?
  • Soluble queries: the number of probes until an
    answer is found.
  • Insoluble queries: the maximum search size.

62
  • SR is best for soluble queries
  • Uniform minimizes cost of insoluble queries

What is the optimal strategy?
63
[Figure: 10^4 items, Zipf-like query rates
(parameter 1.5); average search size for the
Uniform and SR allocations when all queries are
soluble, 85% are soluble, and all are insoluble]
64
We now know what we need.
How do we get there?
65
Replication Algorithms
  • Uniform and Proportional are easy
  • Uniform: when an item is created, replicate its
    key at a fixed number of hosts.
  • Proportional: for each query, replicate the key
    at a fixed number of hosts (need to know or
    estimate the query rate)

Desired properties of an algorithm
  • Fully distributed, where peers communicate
    through random probes; minimal bookkeeping and no
    more communication than what is needed for
    search.
  • Converges to/obtains the SR allocation when
    query rates remain steady.

66
Replication - Implementation
Two strategies are popular.
Owner replication: when a search is successful,
the object is stored at the requestor node only
(used in Gnutella).
Path replication: when a search succeeds, the
object is stored at all nodes along the path from
the requestor node to the provider node, following
the reverse path back to the requestor (used in
Freenet).
67
Achieving Square-Root Replication
  • How can we achieve square-root replication in
    practice?
  • Assume that each query keeps track of its search
    size
  • Each time a query is finished, the object is
    copied to a number of sites proportional to the
    number of probes
  • On average, object i will be copied to c·n/ri
    sites each time a query for it is issued (for
    some constant c)
  • It can be shown that this yields square-root
    replication
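The fixed-point argument behind this claim can be illustrated with a deterministic sketch (all constants illustrative): each step creates copies of item i in proportion to qi·n/ri (probe-proportional copying) and deletes copies in proportion to ri (lifetimes independent of identity), which drives ri toward ∝ √qi.

```python
# Sketch: iterate creation proportional to q_i * n / r_i and deletion
# proportional to r_i, keeping the total number of copies fixed at R.
# The fixed point satisfies r_i^2 proportional to q_i, i.e. the
# square-root allocation.

m, n, R = 50, 1000, 5000
q = [1 / (i + 1) ** 1.2 for i in range(m)]
total = sum(q)
q = [x / total for x in q]

r = [R / m] * m                   # start from the uniform allocation
for _ in range(5000):
    created = [qi * n / ri for qi, ri in zip(q, r)]
    c_total = sum(created)
    # delete the same total mass, spread proportionally to r_i
    r = [ri + ci - ri * c_total / R for ri, ci in zip(r, created)]

sqrt_total = sum(qi ** 0.5 for qi in q)
target = [R * qi ** 0.5 / sqrt_total for qi in q]
assert max(abs(ri - ti) / ti for ri, ti in zip(r, target)) < 0.02
```

At the fixed point, the creation rate qi·n/ri equals the deletion rate ri·(c_total/R), so ri² ∝ qi; this is the same balance that identity-blind deletion (FIFO or random) preserves in the actual protocol.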

68
Replication - Conclusion
Thus, for square-root replication, an object
should be replicated at a number of nodes
proportional to the number of probes that the
search required.
69
Replication - Implementation
If a p2p system uses k walkers, the number of
nodes on the path between the requestor and the
provider node is 1/k of the total number of nodes
visited (probes). Then path replication should
result in square-root replication.
Problem: it tends to place replicas on nodes that
lie along the same topological path.
70
Replication - Implementation
Random replication: when a search succeeds, count
the number of nodes on the path between the
requestor and the provider, say p. Then randomly
pick p of the nodes that the k walkers visited
and replicate the object there.
Harder to implement.
71
Achieving Square-Root Replication
What about replica deletion? In the steady state,
the creation rate must equal the deletion rate.
The lifetime of replicas must be independent of
object identity and query rate: FIFO or random
deletion is OK; LRU or LFU are not.
72
Replication Evaluation
  • Study the three replication strategies in the
    Random graph network topology
  • Simulation Details
  • Place the m distinct objects randomly into the
    network
  • A query generator generates queries according to
    a Poisson process at 5 queries/sec
  • Zipf distribution of queries among the m objects
    (with α = 1.2)
  • For each query, the initiator is chosen randomly
  • Then a 32-walker random walk with state keeping
    and checking every 4 steps
  • Each site stores at most objAllow (= 40) objects
  • Random deletion
  • Warm-up period of 10,000 secs
  • Snapshots every 2,000 query chunks

73
Replication Evaluation
  • For each replication strategy
  • What kind of replication ratio distribution does
    the strategy generate?
  • What is the average number of messages per node
    in a system using the strategy
  • What is the distribution of number of hops in a
    system using the strategy

74
Evaluation Replication Ratio
Both path and random replication generate
replication ratios quite close to the square root
of the query rates
75
Evaluation Messages
Path replication and random replication reduce
the overall message traffic by a factor of 3 to 4
76
Evaluation Hops
Much of the traffic reduction comes from reducing
the number of hops
Path and random replication do better than owner
replication. For example, the fraction of queries
that finish within 4 hops: 71% for owner, 86% for
path, 91% for random replication.
77
Summary
  • Random search/replication model: probes to
    random hosts
  • Proportional allocation: current practice
  • Uniform allocation best for insoluble queries
  • Soluble queries
  • Proportional and Uniform allocations are two
    extremes with same average performance
  • Square-Root allocation minimizes Average Search
    Size
  • OPT (all queries) lies between SR and Uniform
  • SR/OPT allocation can be realized by simple
    algorithms.

78
Discussion
Cohen et al paper: path replication overshoots or
undershoots the fixed point if queries arrive in
large bursts or if the time between a search and
the subsequent copy generation is large; this
motivates more involved algorithms than plain path
replication. Extensions exist for variable-size
objects and for nodes with heterogeneous
capacities. Many open issues: other types of
graphs, adaptability, etc.
79
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured Networks (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
80
Replication in Unstructured P2P: Epidemic
Algorithms
81
  • Replication Policy
  • How many copies
  • Where (owner, path, random path)
  • Update Policy
  • Synchronous vs Asynchronous
  • Master Copy

82
Methods for spreading updates
Push: updates originate from the site where the
update appeared, and must reach the sites that
hold copies
Pull: the sites holding copies contact the master
site
Expiration times
Epidemics for spreading updates
83
A. Demers et al, Epidemic Algorithms for
Replicated Database Maintenance, SOSP 87
Updates occur at a single site. Randomized
algorithms distribute the updates and drive the
replicas towards consistency, ensuring that the
effect of every update is eventually reflected in
all replicas. Sites become fully consistent only
when all updating activity has stopped and the
system has become quiescent. Analogous to
epidemics.
84
Methods for spreading updates
Direct mail: each new update is immediately mailed
from its originating site to all other sites.
(+) Timely, reasonably efficient
(-) Not all sites know all other sites (stateless)
(-) Mail may be lost
Anti-entropy: every site regularly chooses another
site at random and, by exchanging content,
resolves any differences between them.
(+) Extremely reliable, but requires exchanging
content and resolving updates
(-) Propagates updates much more slowly than
direct mail
85
  • Methods for spreading updates
  • Rumor mongering
  • Sites are initially ignorant when a site
    receives a new update it becomes a hot rumor
  • While a site holds a hot rumor, it periodically
    chooses another site at random and ensures that
    the other site has seen the update
  • When a site has tried to share a hot rumor with
    too many sites that have already seen it, the
    site stops treating the rumor as hot and retains
    the update without propagating it further
  • Rumor cycles can be more frequent than
    anti-entropy cycles, because they require fewer
    resources at each site, but there is a chance
    that an update will not reach all sites
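A small simulation sketch of the counter variant described above (parameters illustrative: a site stops treating the rumor as hot after k = 3 meetings with sites that already knew it):

```python
import random

# Sketch of rumor mongering: infective sites pick random partners; after
# k "they already knew it" meetings, a site retains the update but stops
# spreading it. Unlike anti-entropy, some sites may never hear the rumor.

def rumor_mongering(n: int, k: int, rng: random.Random) -> int:
    infected = {0}                 # site 0 originates the update
    hot = [0]                      # sites still treating the rumor as hot
    stale = {0: 0}                 # per-site count of unproductive meetings
    while hot:
        s = rng.choice(hot)
        t = rng.randrange(n)       # partner chosen uniformly at random
        if t in infected:
            stale[s] += 1
            if stale[s] >= k:      # rumor no longer hot at s
                hot.remove(s)
        else:
            infected.add(t)
            stale[t] = 0
            hot.append(t)
    return len(infected)           # sites that ever received the update

rng = random.Random(1)
reached = [rumor_mongering(1000, 3, rng) for _ in range(20)]
assert all(r > 800 for r in reached)   # almost everyone gets the update;
# typically a small residue of sites is missed, so coverage is not certain
```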

86
  • Anti-entropy and rumor spreading are examples of
    epidemic algorithms
  • Three types of sites
  • Infective: a site that holds an update and is
    willing to share it
  • Susceptible: a site that has not yet received
    the update
  • Removed: a site that has received the update but
    is no longer willing to share it
  • Anti-entropy is a simple epidemic, where all
    sites are always either infective or susceptible

87
A set S of n sites, each storing a copy of a
database. The database copy at site s ∈ S is a
time-varying partial function
s.ValueOf: K → V × T
where K is the set of keys, V the set of values,
and T the set of timestamps (totally ordered by
<). V contains the element NIL:
s.ValueOf[k] = (NIL, t) means the item with key k
was deleted from the database at time t.
Assume just one item, so
s.ValueOf ∈ V × T,
an ordered pair consisting of a value and a
timestamp. The first component may be NIL,
indicating that the item was deleted by the time
indicated by the second component.
88
  • The goal of the update distribution process is to
    drive the system towards
  • ∀ s, s' ∈ S: s.ValueOf = s'.ValueOf
  • Operation invoked to update the database
  • Update[u: V] ≡ s.ValueOf ← (u, Now)
89
Direct Mail
At the site s where an update occurs:
  For each s' ∈ S: PostMail[to: s', msg: (Update,
  s.ValueOf)]
  (s: originator of the update, s': receiver of
  the update)
Each site s' receiving the update message
(Update, (u, t)):
  If s'.ValueOf.t < t then s'.ValueOf ← (u, t)
  • The complete set S must be known to s (stateful
    server)
  • PostMail messages are queued so that the sender
    is not delayed (asynchronous), but may fail when
    queues overflow or their destinations are
    inaccessible for a long time
  • n (number of sites) messages per update
  • traffic proportional to n and to the average
    distance between sites

90
Anti-Entropy
At each site s, periodically execute:
  For some s' ∈ S: ResolveDifference[s, s']
Three ways to execute ResolveDifference:
Push (sender (server) driven): s pushes its value
to s'
  If s.ValueOf.t > s'.ValueOf.t then
  s'.ValueOf ← s.ValueOf
Pull (receiver (client) driven): s pulls s' and
gets the value of s'
  If s.ValueOf.t < s'.ValueOf.t then
  s.ValueOf ← s'.ValueOf
Push-Pull:
  s.ValueOf.t > s'.ValueOf.t ⇒ s'.ValueOf ←
  s.ValueOf
  s.ValueOf.t < s'.ValueOf.t ⇒ s.ValueOf ←
  s'.ValueOf
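The push-pull variant can be simulated directly; a minimal single-item sketch (n and the values are illustrative):

```python
import random

# Sketch: each site holds a (value, timestamp) pair; one round of
# anti-entropy lets every site push-pull with a random partner, keeping
# whichever pair carries the newer timestamp.

def anti_entropy_round(db, rng):
    n = len(db)
    for s in range(n):
        t = rng.randrange(n)          # partner chosen uniformly at random
        if db[s][1] > db[t][1]:       # push: s is newer
            db[t] = db[s]
        elif db[s][1] < db[t][1]:     # pull: t is newer
            db[s] = db[t]

n = 256
db = [("old", 0)] * n
db[0] = ("new", 1)                    # the update appears at a single site
rng = random.Random(0)
rounds = 0
while any(v != "new" for v, _ in db):
    anti_entropy_round(db, rng)
    rounds += 1

assert all(v == "new" for v, _ in db) # every site is eventually infected
assert rounds < 30                    # convergence takes ~log(n) rounds
```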
91
Anti-Entropy
  • Assume that
  • Site s' is chosen uniformly at random from the
    set S
  • Each site executes the anti-entropy algorithm
    once per period
  • It can be proved that
  • An update will eventually infect the entire
    population
  • Starting from a single infected site, this can
    be achieved in time proportional to the log of
    the population size

92
Anti-Entropy
Let pi be the probability that a site remains
susceptible (has not received the update) after
the i-th cycle of anti-entropy.
Pull: a site remains susceptible after the
(i+1)-st cycle if (a) it was susceptible after the
i-th cycle and (b) it contacted a susceptible site
in the (i+1)-st cycle:
  pi+1 = (pi)^2
Push: a site remains susceptible after the
(i+1)-st cycle if (a) it was susceptible after the
i-th cycle and (b) no infectious site chose to
contact it in the (i+1)-st cycle:
  pi+1 = pi·(1 - 1/n)^(n(1-pi))
(1 - 1/n: probability that the site is not
contacted by one given node; n(1-pi): number of
infectious nodes at cycle i)
For small pi, pull is preferable to push.
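Iterating the two recurrences shows the difference numerically; a sketch with n = 1000 and a single initially infected site (illustrative):

```python
# Sketch: susceptible probability after each anti-entropy cycle.
#   pull: p_{i+1} = p_i ** 2
#   push: p_{i+1} = p_i * (1 - 1/n) ** (n * (1 - p_i))
# Pull squares the susceptible fraction once most sites are infected,
# while push only shrinks it by a roughly constant factor per cycle.

n = 1000
p_pull = p_push = 1 - 1 / n           # one infected site to start
for _ in range(20):
    p_pull = p_pull ** 2
    p_push = p_push * (1 - 1 / n) ** (n * (1 - p_push))

assert p_pull < p_push < 1e-3         # both converge, pull much faster
```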
93
Anti-Entropy
More next week