Title: Epidemic Protocols
1. Epidemic Protocols
- CS614
- March 7th, 2002
- Ashish Motivala
2. Papers
- "Epidemic algorithms for replicated database maintenance", Alan Demers, Dan Greene, Carl Hauser, Wes Irish and John Larson. Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, 1987.
- "Bimodal multicast", Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky. ACM Trans. Comput. Syst. 17, 2 (May 1999).
- "Managing update conflicts in Bayou, a weakly connected replicated storage system", D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser. SOSP, 1995.
- "Flexible update propagation for weakly consistent replication", Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers. SOSP, 1997.
- "Fighting fire with fire: using randomized gossip to combat stochastic scalability limits", Indranil Gupta, Kenneth P. Birman, Robbert van Renesse. To appear, March 2002.
- "The Dangers of Replication and a Solution", Jim Gray, Pat Helland, Patrick O'Neil, Dennis Shasha. SIGMOD 1996 (read in CS632).
3. Simple Epidemic
- Assume a fixed population of size n
- For simplicity, assume homogeneous spreading
- Simple epidemic: anyone can infect anyone with equal probability
- Assume that k members are already infected
- Infection occurs in rounds
4Probability of Infection
- Probability Pinfect(k,n) that a particular
uninfected member is infected in a round if k are
already in a round if k are already infected? - Pinfect(k,n) 1 P(nobody infects member)
- 1 (1 1/n)k
- E(newly infected members) (n-k)x Pinfect(k,n)
- Basically its a Binomial Distribution
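The formulas above are easy to check numerically. A minimal sketch (assuming, as on the slide, that each of the k infected members contacts one uniformly random member per round; function names are invented here):

```python
def p_infect(k, n):
    """Probability that a particular uninfected member is infected
    in one round, given k already-infected members out of n:
    P_infect(k, n) = 1 - (1 - 1/n)^k."""
    return 1.0 - (1.0 - 1.0 / n) ** k

def expected_new(k, n):
    """Expected number of newly infected members in one round:
    E = (n - k) * P_infect(k, n)."""
    return (n - k) * p_infect(k, n)

# For k = n/2 and large n this approaches 1 - 1/sqrt(e) ~ 0.39:
print(round(p_infect(50, 100), 3))   # 0.395
```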
5. 2 Phases
- Intuition: 2 phases
- Infection
  - Initial growth factor is very high, about 2
  - Exponential growth
- Uninfection
  - Slow death of the uninfected population to start
  - Exponential decline
- Number of rounds necessary to infect the entire population is O(log n)
  - First half, 1 → n/2: Phase 1
  - Second half, n/2 → n: Phase 2
- For large n, P_infect(n/2, n) ≈ 1 - (1/e)^0.5 ≈ 0.4
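A quick simulation, a sketch under the same uniform-target assumption (the function name is invented here), shows both phases and the O(log n) round count:

```python
import random

def rounds_to_infect(n, seed=0):
    """Simulate simple push gossip: each round, every infected member
    infects one member chosen uniformly at random. Returns the number
    of rounds until all n members are infected."""
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n:
        infected |= {rng.randrange(n) for _ in range(len(infected))}
        rounds += 1
    return rounds

# Roughly doubles per round early (phase 1), then slowly clears the
# remaining stragglers (phase 2); total rounds grow like log n.
print(rounds_to_infect(1000))
```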
6. Applications for Epidemic Protocols
- Reliable multicast: virtual synchrony, randomized rumour spreading
- Systems (database replication): Clearinghouse, Grapevine, Bayou
- Membership and failure detection: SWIM, SCAMP
- Data aggregation
- Other distributed protocols: leader election, Lightweight Probabilistic Broadcast (delta reliability), Li Li's work, Kempe and Kleinberg's work
- Our focus today
7. Grapevine and Clearinghouse
- Weakly consistent replication was used at Xerox PARC
- Grapevine and Clearinghouse name services
- Updates are propagated by unreliable multicast ("direct mail")
- Periodic anti-entropy exchanges among replicas ensure that they eventually converge, even if updates are lost
- Arbitrary pairs of replicas periodically establish contact and resolve all differences between their databases
- Various mechanisms (e.g., MD5 digests and update logs) reduce the volume of data exchanged in the common case
- Deletions are handled as a special case via "death certificates" recording the delete operation as an update
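As a sketch of the digest idea (all names and the timestamp scheme are invented here; the real systems use more elaborate mechanisms): compare a hash of each whole database first, and ship entries only when the hashes differ, keeping the newer timestamp per key.

```python
import hashlib

def digest(db):
    """MD5 digest over a replica's (key -> (timestamp, value)) map."""
    h = hashlib.md5()
    for key in sorted(db):
        ts, val = db[key]
        h.update(f"{key}:{ts}:{val}".encode())
    return h.hexdigest()

def anti_entropy(a, b):
    """Resolve all differences between two replicas in place."""
    if digest(a) == digest(b):
        return                      # common case: nothing to exchange
    for key in set(a) | set(b):
        newer = max(a.get(key, (0, None)), b.get(key, (0, None)))
        a[key] = b[key] = newer     # keep the newer entry per key

a = {"alice": (2, "host1"), "bob": (1, "host9")}
b = {"alice": (1, "host0"), "carol": (3, "host5")}
anti_entropy(a, b)
print(a == b)   # True: both replicas now hold the newest entries
```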
8. Epidemic Algorithm: Rumour Mongering
- Each replica periodically touches a selected "susceptible" peer site and infects it with updates
- Transfer every update known to the carrier but not the victim (in pull; vice versa in push). Rumours are dropped using "counter" or "coin" schemes
- Partner selection is randomized, using a variety of heuristics. Distance vs. convergence tradeoff:
  - i.e., if only neighbours are updated, then link traffic is O(1) but convergence traffic is O(n)
  - Sites connect to others at distance d with probability proportional to d^(-a)
- Theory shows that the epidemic will eventually reach the entire population (assuming it is connected)
- Heuristics (push vs. pull) affect traffic load and the expected time to convergence. Pull converges faster than push:
  - Pull: p_{i+1} = (p_i)^2
  - Push: p_{i+1} = p_i / e
  - where p_i = prob. of a site being susceptible after i rounds (cycles)
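Iterating the two recurrences makes the difference concrete (a sketch; the starting value p_0 = 0.5 and the convergence threshold are arbitrary choices here):

```python
import math

def rounds_until(p0, step, eps=1e-6):
    """Count rounds until the susceptible probability drops below eps."""
    p, i = p0, 0
    while p > eps:
        p = step(p)
        i += 1
    return i

pull = rounds_until(0.5, lambda p: p * p)        # p_{i+1} = p_i^2
push = rounds_until(0.5, lambda p: p / math.e)   # p_{i+1} = p_i / e
print(pull, push)   # 5 14 -- pull squares the residue, push only divides it
```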
9. Recap
- Two reliable multicast models
- SRM
  - Local repair of problems, but no end-to-end guarantees
- Virtual synchrony model (Isis, Horus, Ensemble)
  - All-or-nothing message delivery with ordering
  - Membership managed on behalf of the group
  - State transfer to a joining member
- Great performance for small systems. In large groups, under perturbations (heavy load, applications acting a little flaky), performance is very hard to maintain
10. Multicast scaling issue (SRM)
11. Multicast scaling issue (Ensemble)
12. Bimodal Multicast
- 2 sub-protocols
- Unreliable data distribution (IP multicast)
  - Upon arrival, a message enters the receiver's message buffer
  - Messages are delivered to the application layer in FIFO order, and are garbage collected out of the message buffer after some period of time
- The second sub-protocol is used to repair gaps in the message delivery record
  - Processes maintain a list of a random subset of the full system membership. In practice, we weight this list to contain primarily processes close by, accessible over low-latency links
13. Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So the initial state involves partial distribution of the multicast(s).
14. Periodically (e.g. every 100 ms) each process sends a digest describing its state to some randomly selected group member. The digest identifies messages; it doesn't include them.
15. The recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip.
16. Processes respond to solicitations received during a round of gossip by retransmitting the requested message. The round lasts much longer than a typical RPC time.
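The digest-and-repair loop of the last few slides can be caricatured in a few lines (a toy model with invented names; it collapses digest, solicitation, and retransmission into one bidirectional exchange rather than the protocol's separate rounds):

```python
import random

class Process:
    def __init__(self):
        self.have = {}                        # seqno -> message body

    def digest(self):
        return set(self.have)                 # identifies messages only

    def exchange(self, peer):
        for seq in peer.digest() - self.digest():
            self.have[seq] = peer.have[seq]   # solicit missing copies
        for seq in self.digest() - peer.digest():
            peer.have[seq] = self.have[seq]   # retransmit to the peer

rng = random.Random(42)
procs = [Process() for _ in range(8)]
for seq in range(5):
    procs[0].have[seq] = f"msg{seq}"          # the sender keeps every message
    for p in procs[1:]:
        if rng.random() < 0.7:                # unreliable multicast: ~70% arrive
            p.have[seq] = f"msg{seq}"

rounds = 0
while not all(len(p.have) == 5 for p in procs):
    for p in procs:
        p.exchange(rng.choice(procs))         # gossip with a random partner
    rounds += 1
print(rounds)   # gaps are repaired within a handful of gossip rounds
```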
17. Optimizations
- Request retransmissions most-recent-multicast-first
- The idea is to catch up quickly, leaving at most one gap in the retrieved sequence
- Participants bound the amount of data they will retransmit during any given round of gossip. If too much is solicited, they ignore the excess requests
18. Optimizations
- Label each gossip message with the sender's gossip round number
- Ignore solicitations that have an expired round number, reasoning that they arrived very late and hence are probably no longer correct
- Don't retransmit the same message twice in a row to any given destination (the copy may still be in transit, hence the request may be redundant)
19. Optimizations
- Use IP multicast when retransmitting a message if several processes lack a copy
  - For example, if solicited twice
  - Also, if a retransmission is received from far away
  - Tradeoff: excess messages versus low latency
- Use a regional TTL to restrict multicast scope
20. Bimodal Multicast and SRM with system-wide constant noise, tree topology (graph of repair requests per sec)
21. (No transcript)
22. Two Predicates
- Predicate I: A faulty outcome is one where more than 10% but less than 90% of the processes get the multicast
- Predicate II: A faulty outcome is one where roughly half get the multicast and failures might conceal the true outcome
23. Bimodal Multicast is amenable to formal analysis
24. Unlimited scalability!
- Probabilistic gossip routes around congestion
- And the probabilistic reliability model lets the system move on if a computer lags behind
- Results in:
  - Constant communication costs
  - Constant loads on links
  - Steady behavior even under stress
25. Good things?
- Overcome Internet limitations using randomized P2P gossip
- However, Internet routing can defeat our clever solutions unless we know the network topology
- Both have great scalability and can survive under stress
- And both are backed by formal models as well as real code and experimental data
26. Further Work
27. Research Locations
- Cornell Spinglass: http://www.cs.cornell.edu/Info/Projects/Spinglass/index.html
- SWIM: http://www.cs.cornell.edu/gupta/swim
- MSR Cambridge (Kermarrec): http://research.microsoft.com/camdis/gossip.htm
- EPFL (Guerraoui): http://lpdwww.epfl.ch/publications
28. Bayou Basics
- The motivation for Bayou comes from observations of mobile computing
- Connections are expensive, infrequent, and often intermittent
- Collaborating agents are unlikely to be guaranteed simultaneous connections
- Bayou accommodates these applications by helping them manage weakly consistent data. Bayou does not attempt to be transparent
29. Bayou Basics (cont.)
- Applications should use specific knowledge of their data, along with the knowledge that data may be stale, to detect and resolve conflicts
- Applications detect and resolve conflicts differently
- Bayou allows for arbitrary dependencies, constraints, and detection of write/write and read/write conflicts
- Programs resolve conflicts with each write. Resolution may involve cascading back-outs
- Procedures must be deterministic so that they may be replayed on multiple machines
- A write is considered tentative until committed at the primary server
- A global ordering is used by the primary server to dictate which of several conflicting writes wins
- A modification is stable once it reaches the primary server
- Primary servers have authority, a tradeoff that allows data to become stable without hearing responses from all clients and servers
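A toy model of the tentative/committed distinction (class and method names are invented here; real Bayou writes also carry dependency checks and merge procedures):

```python
class Replica:
    def __init__(self):
        self.committed = []   # writes in the primary's global order
        self.tentative = []   # writes not yet committed at the primary

    def write(self, w):
        self.tentative.append(w)

    def commit(self, w):
        # the primary decided w's position: move it to the stable prefix
        self.tentative.remove(w)
        self.committed.append(w)

    def view(self):
        # applications see committed writes first, then tentative ones,
        # so tentative results may be reordered when commits arrive
        return self.committed + self.tentative

r = Replica()
r.write("A"); r.write("B")
r.commit("B")                 # primary committed B first
print(r.view())               # ['B', 'A'] -- A is still tentative
```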
30. Implementation
- Two applications are studied: a bibliographic database and a meeting-room scheduler
- Anti-entropy: a client may connect to any server for reading and writing data
- Servers replicate all data, and synchronize using pair-wise communication
- Anti-entropy ensures eventual consistency of the database (the servers "gossip"). A primary server is the authoritative source of consistency
- Implementation: each server logs committed and tentative data. Anti-entropy sessions update these logs accordingly
- Access control and security: security is achieved with public-key cryptography, access control by allowing users to grant and revoke privileges. Primary servers are responsible for managing revocation lists