Epidemics - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Epidemics


1
Epidemics
  • by Charles Yang
  • Ted Pongthawornkamol
  • 9/16/20

2
Prelude: Multicasting
  • Many protocols
  • MBONE, 6BONE, XTP, etc.
  • Principally designed for scalability
  • Fault tolerance really isn't addressed

3
Multicasting (cont)
  • Scalable Reliable Multicast
  • But, as the graph shows, not that scalable

4
So now what? Epidemics!
  • Recap from Indy's 1st lecture
  • Definitions
  • Infective: node with an update it wants to share
  • Susceptible: node which has not yet received the
    update
  • Removed: previously infective node which is no
    longer sharing

5
Recap (cont)
  • An infective node n receives a msg and forwards it
    with probability p to a susceptible node
  • It can be shown that the update spreads quickly with
    high probability
  • Lightweight
  • Highly fault-tolerant

6
Outline of Presentation
  • Epidemic Algorithms for Replicated Database
    Maintenance
  • Bimodal Multicast
  • Gossip-Based Ad Hoc Routing

7
Epidemic Algorithms for Replicated Database
Maintenance
  • Xerox's Corporate Internet (CIN), Clearinghouse
    servers, about 1986-1987
  • Name resolution service
  • several hundred ethernets, connected by gateways
    and phone lines
  • DBs were filling up bandwidth for replication

8
The Problem
  • Inject an update at one server, and have it
    propagate to all other servers
  • How to make it robust and scale well?
  • Important factors
  • convergence time: time required for an update to
    propagate to all sites
  • network traffic: traffic required to propagate a
    single update (want to minimize!)

9
3 Methods for Spreading Updates
  • direct mail (basically multicast or flooding)
  • anti-entropy (epidemic)
  • rumor mongering/gossiping (epidemic)

10
CIN's Initial Configuration
  • Direct Mail to send updates
  • Anti-entropy to bring DBs into sync
  • Re-mailing if the previous anti-entropy found a
    disagreement
  • Anti-entropy run once/day between 12am and 6am
  • Eventually, anti-entropy couldn't complete in the
    allowed time due to traffic
  • For instance, for a domain stored at 300 sites,
    90,000 messages might be introduced in one night

11
Direct Mail
(diagram only)
12
Direct Mail
(diagram only)
13
Direct Mail
(diagram only)
14
Direct Mail Issues
  • a lot of bandwidth: n messages per update
  • not quite reliable: messages can be lost (crashes,
    buffer overflows)
  • s may also not have current knowledge of S (the set
    of all sites)

15
Anti-Entropy
  • Run in the background to recover from errors
  • initially from direct mail, later from rumor
    mongering
  • Executed periodically (a push-pull sketch follows
    below)
  • FOR SOME s' ∈ S DO
  •   ResolveDifference[s, s']
  • ENDLOOP
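A minimal sketch of one push-pull anti-entropy exchange; the dict of
(value, timestamp) entries, the newest-timestamp-wins rule, and the helper
names are illustrative assumptions, not the Clearinghouse implementation:

    import random

    def resolve_difference(local_db, remote_db):
        # Push-pull: both replicas end up with the newer copy of every entry.
        # Each db maps key -> (value, timestamp).
        for key in set(local_db) | set(remote_db):
            local, remote = local_db.get(key), remote_db.get(key)
            if remote is None or (local is not None and local[1] >= remote[1]):
                remote_db[key] = local
            else:
                local_db[key] = remote

    def anti_entropy_cycle(my_db, peer_dbs):
        # "FOR SOME s' in S DO ResolveDifference[s, s']"
        resolve_difference(my_db, random.choice(peer_dbs))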

16
Anti-Entropy (after direct mail)
(diagram only)
17
Anti-Entropy (Cycle 1, start)
(diagram only)
18
Anti-Entropy (Cycle 1, end)
(diagram only)
19
Anti-Entropy (Cycle 2, start)
(diagram only)
20
Anti-Entropy (Cycle 2, end)
(diagram only)
21
Anti-Entropy (Cycle 3, start)
(diagram only)
22
Anti-Entropy (Cycle 3, end)
(diagram only)
23
Anti-Entropy (cont)
  • Assume s' is chosen uniformly at random (spatial
    distributions are discussed later)
  • slow and expensive, but reliable
  • since it is usually used as a backup, the number of
    susceptible sites is small
  • Pull, push-pull, push

24
Pull
  • p_i is the probability that a site remains susceptible
    after the i-th cycle
  • A site remains susceptible after the (i+1)-st cycle if
  • it was susceptible after the i-th cycle
  • and it contacted a susceptible site in the (i+1)-st
    cycle
  • ⇒ p_{i+1} = (p_i)^2
  • converges rapidly to 0 when p_i is small
  • In other words, it is very unlikely that susceptible
    sites will remain after a while

25
Push
  • A site remains susceptible after the (i+1)-st cycle if
  • it was susceptible after the i-th cycle
  • and no infectious site contacted it in the (i+1)-st
    cycle
  • p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}
  • Approximately p_{i+1} ≈ p_i e^{-1}
  • Converges too, but not nearly as quickly as pull
    (see the numeric sketch below)
  • Hence pull, or push-pull, is preferred to push alone
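A small numeric sketch iterating the two recurrences side by side makes the
difference concrete; n = 1000 sites and p_0 = 0.9 are illustrative choices,
not values from the paper:

    n, p_pull, p_push = 1000, 0.9, 0.9
    for cycle in range(1, 11):
        p_pull = p_pull ** 2                                  # pull: p_{i+1} = p_i^2
        p_push = p_push * (1 - 1 / n) ** (n * (1 - p_push))   # push recurrence
        print(f"cycle {cycle:2d}  pull={p_pull:.3e}  push={p_push:.3e}")
    # pull collapses super-exponentially; push shrinks only by roughly a
    # factor of e per cycle, which is why pull (or push-pull) is preferred.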

26
Some Anti-Entropy Optimizations
  • Comparing whole DBs is expensive, but most DBs are
    pretty similar, so
  • Could maintain a checksum of the DB
  • compare checksums
  • If they don't match, then start comparing DBs
  • Naïve!

27
Optimizations (cont)
  • Define a time window τ (the time by which updates
    should have spread)
  • Keep a checksum of the database AND a recent update
    list with age < τ
  • Two sites first exchange checksums and recent
    update lists (see the sketch below)
  • compute new checksums, and then compare
  • τ must be chosen well
  • If n grows too much
  • expected time for a message to spread > τ
  • recent update lists are likely to differ
  • Another variation: an inverted index of the DB by
    timestamp
  • sites exchange updates in reverse timestamp
    order until the checksums match
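One way the checksum-plus-recent-update-list exchange could be sketched; the
hashing scheme, the τ value, and the (value, timestamp) entry layout are
assumptions for illustration, not the paper's format:

    import hashlib, time

    TAU = 600.0  # illustrative update window, in seconds

    def checksum(db):
        # Order-independent digest over (key, (value, timestamp)) entries.
        h = hashlib.sha256()
        for item in sorted(db.items()):
            h.update(repr(item).encode())
        return h.hexdigest()

    def apply_newer(dst, src):
        # Keep whichever copy of each entry carries the newer timestamp.
        for key, (value, ts) in src.items():
            if key not in dst or dst[key][1] < ts:
                dst[key] = (value, ts)

    def exchange(local_db, remote_db, now=None):
        now = time.time() if now is None else now
        # 1. Swap only the recent-update lists (age < TAU) and merge them.
        apply_newer(remote_db, {k: v for k, v in local_db.items() if now - v[1] < TAU})
        apply_newer(local_db, {k: v for k, v in remote_db.items() if now - v[1] < TAU})
        # 2. Recompute checksums; only if they still differ do the expensive
        #    full comparison (here simply merging everything both ways).
        if checksum(local_db) != checksum(remote_db):
            apply_newer(remote_db, local_db)
            apply_newer(local_db, remote_db)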

28
Complex Epidemics / Rumor Mongering / Gossip
  • Replaces multicasting
  • At the expense of a slightly larger convergence
    time
  • And a distinct, though very small, probability of
    failure
  • Called complex just to distinguish from simple
    epidemics like anti-entropy

29
Basic (Complex) Epidemic
  • A susceptible site receives a hot rumor and becomes
    infective
  • Randomly shares it with another site
  • chosen uniformly at random
  • When it contacts a site that knows the rumor already,
  • with probability 1/k it loses interest in sharing the
    rumor (and becomes removed)
  • After a while, there is a high probability that everyone
    knows (see the simulation sketch below)
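A rough simulation of this push rumor-mongering rule (the feedback/coin
variant with loss probability 1/k); n, k, and the round cap are illustrative
parameters, and the rumor normally dies out well before the cap:

    import random

    def rumor_monger(n=1000, k=2, rounds=50):
        # 0 = susceptible, 1 = infective, 2 = removed
        state = [0] * n
        state[0] = 1                        # one initial hot rumor
        for _ in range(rounds):
            for node in range(n):
                if state[node] != 1:
                    continue
                target = random.randrange(n)
                if state[target] == 0:
                    state[target] = 1       # susceptible peer becomes infective
                elif random.random() < 1.0 / k:
                    state[node] = 2         # lose interest: become removed
        return state.count(0)               # residue: sites that never heard it

    print(rumor_monger())                   # typically only a few percent for k = 2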

30
Can model with differential equations (fun!)
  • s + i + r = 1
  • Differentiate (worked out below)
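Spelled out (a reconstruction following Demers et al.; s, i, r are the
fractions of susceptible, infective, and removed sites, and 1/k is the
loss-of-interest probability):

    s + i + r = 1
    \frac{ds}{dt} = -si, \qquad
    \frac{di}{dt} = +si - \frac{1}{k}(1-s)\,i
    \Rightarrow \frac{di}{ds} = -\frac{k+1}{k} + \frac{1}{ks}
    \Rightarrow i(s) = -\frac{k+1}{k}\,s + \frac{1}{k}\ln s + c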

31
The constant c is determined by i(1 − ε) = ε
  • For large n, ε goes to zero
  • Giving a solution
  • i(s) is zero when s = e^{-(k+1)(1-s)}
  • Yeah, yeah, so what does it mean?
  • an implicit equation for s
  • s decreases exponentially with k (the 1/k probability
    that a site becomes removed)
  • k = 1: about 20% will miss the rumor
  • k = 2: about 6% will miss
  • So after a few consecutive rounds, with high probability
    there will be no susceptibles left

32
Can vary complex epidemics
  • Concerned with
  • Residue: when i is zero, what is s? (people who
    never heard the rumor)
  • Traffic
  • Delay
  • t_avg: time for a random node to receive the msg
  • t_last: time for the last node who will receive
    the msg to receive it

33
Variations (cont)
  • Blind vs. feedback
  • blind: loses interest with probability 1/k whether or
    not the contacted node already knew the msg
  • Counter vs. coin
  • with a counter, lose interest only after k
    unnecessary contacts
  • Push vs. pull
  • the basic scheme used push, but pull also works
  • works well if there is a high number of independent
    updates
  • but when the DB is quiescent, pull has more useless
    overhead than push

34
Variations (cont)
  • Minimization
  • Use push and pull together; if both sides already
    know the update, the site with the smaller counter
    is incremented (on equality, both are incremented)
  • Connection limit
  • If there are a lot of updates, a connection
    limit is needed
  • Pull gets worse but push gets better!
  • Hunting
  • If one connection is rejected, try another

35
So instead of direct mail + anti-entropy
  • Use rumor mongering
  • And back up with anti-entropy

36
Death Certificates
  • With anti-entropy, deletion doesn't really work
  • the absence of an entry will be replaced by an old
    version
  • Death Certificates
  • carry timestamps
  • when compared with an older entry, the older entry
    is deleted
  • they take up space
  • but if you delete them, you risk seeing old data
    resurrected
  • Enter Dormant Death Certificates

37
Dormant Death Certificates
  • Two thresholds τ1 and τ2
  • Each server retains a DC for time τ1
  • After τ1, most sites delete the DC, while a few keep
    it (dormant)
  • If old data meets a dormant DC, the DC is propagated
    again
  • After τ1 + τ2, delete the dormant DC

38
Dormant DCs (cont)
  • Does not scale indefinitely
  • if n grows too much, the time to propagate DCs
    exceeds τ1
  • More likely to activate dormant DCs, which are then
    propagated, adding to overhead
  • The ultimate result is catastrophic failure.

39
Dormant DCs (cont)
  • Don't spread dormant DCs
  • And if reactivated, could reset the timestamp
  • But this is wrong (it might cancel a legitimate
    update)
  • So use a second timestamp, called the activation
    timestamp, which is set when the DC is reactivated
    (see the record sketch below)
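As a rough sketch, a death certificate record carrying both timestamps might
look like this; the field names and expiry helper are illustrative
assumptions, not the Clearinghouse schema:

    from dataclasses import dataclass

    @dataclass
    class DeathCertificate:
        key: str
        ordinary_ts: float     # when the delete was issued (never reset)
        activation_ts: float   # reset only when a dormant DC is reactivated
        dormant: bool = False

        def dormant_expired(self, now, tau1, tau2):
            # Dormant copies are finally discarded tau1 + tau2 after activation.
            return self.dormant and now - self.activation_ts > tau1 + tau2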

40
Spatial Distributions
  • networks aren't homogeneous
  • some links are slower than others
  • can be broken up into different types of zones
  • we want to favor locality as we spread updates, to
    minimize traffic

41
Spatial Distributions (cont)
  • probability of connecting to a site at distance d is
    proportional to 1/d^a, where a is to be determined
  • intuitively, a indicates the amount of locality
    you're going to be connecting at
  • So an increase in a → an increase in locality
  • with increased locality, need to compensate in
    order to break out of locality
  • more connections
  • more rounds
  • Also generalized to more dimensions: d^{-2D}
    (a sampling sketch follows below)
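A minimal sketch of choosing an anti-entropy partner with probability
proportional to 1/d^a; the 1-D distance function and the site list are
illustrative assumptions:

    import random

    def choose_partner(my_pos, sites, a=2.0, dist=None):
        # Pick a peer with probability proportional to 1 / distance^a.
        dist = dist or (lambda p, q: abs(p - q))   # illustrative 1-D distance
        others = [s for s in sites if s != my_pos]
        weights = [1.0 / max(dist(my_pos, s), 1) ** a for s in others]
        return random.choices(others, weights=weights, k=1)[0]

    # e.g. on a line of 100 sites, site 0 mostly gossips with nearby sites:
    print(choose_partner(0, list(range(100)), a=2.0))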

42
Spatial Distribution
  • Anti-Entropy
  • notice the Bushey (trans-Atlantic) traffic
  • uniform (75.74) vs. a = 2 (2.38)
  • For gossiping
  • since rumors eventually become inactive, a rumor
    needs to spread a lot in the beginning
  • hence, pump up k

43
Summary for Demers et al
  • Direct Mailing
  • Rumor Mongering
  • Anti-Entropy
  • Issues
  • Research into the effect of, and optimizing for,
    topology
  • Need to know S
  • Scalability with n
  • Churn
  • Bimodal Multicast will address
  • What about throughput stability?
  • What about a higher rate of msgs?

44
Bimodal multicast
  • A technique that applies the epidemic concept to
    achieve scalable and reliable multicast
  • Uses epidemics in the form of anti-entropy
  • Randomly choose members in the group
  • Synchronize state

45
Two classes of multicast
  • strong reliability
  • atomicity
  • delivery ordering
  • virtual synchrony
  • security
  • real-time
  • more overhead, unpredictable behavior under some
    situations
  • best-effort reliability
  • scalable
  • provides no end-to-end delivery guarantee
  • No strong membership view
  • A certain level of failure discovery
  • SRM, MUSE, RMTP, etc.

46
Multicast Examples
  • Virtual synchrony
  • Strongly reliable
  • significant degradation with even just a few node
    failures
  • suitable for small groups, limited to short
    bursts of multicasts
  • SRM
  • Best-effort reliable
  • Error-prone under stochastic failures
  • Meltdown can occur in a large network
  • Neither of them addresses the stability problem
    under failures

47
Fault-tolerance problem
  • Virtual synchrony performs badly under failures

48
Bimodal multicast
  • Also called probabilistic broadcast (pbcast)
  • fills the gap between the two approaches
  • scalable
  • predictably reliable even under bad conditions
  • Complements existing mechanisms, such as
    virtual synchrony
  • Atomic
  • Provides stability
  • Throughput stability
  • Multicast stability

49
Pbcast protocol
  • consists of two concurrent subprotocols
  • an optimistic dissemination protocol, such as
    IP multicast
  • a two-phase anti-entropy protocol to deal with the
    synchronization problem
  • first phase detects message loss
  • second phase corrects losses

50
Optimistic dissemination protocol
  • each node must possess the list of all members
  • generate a set of spanning trees
  • Simple algorithm
  • Randomly choose a spanning tree
  • every node uses the same spanning tree to forward
    the message
  • The set of spanning trees needs to be recalculated
    each time nodes join or leave

51
Two-Phase Anti-Entropy Protocol
  • detects and corrects any inconsistencies by
    gossiping
  • First, nodes randomly choose members to which they
    forward message histories
  • Also called a digest
  • Second, recipient nodes may ask for missing
    messages from the sender nodes (see the sketch below)
  • Emphasizes the most recent history over old messages
  • Prevents system degradation by faulty nodes trying
    to fetch every message in the history
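A very rough sketch of the two gossip phases; the fan-out, digest depth,
message format, and the `send` callback are illustrative assumptions, not
the pbcast wire protocol. `node` is assumed to expose an `id` and a set
`delivered` of message ids:

    import random

    GOSSIP_FANOUT = 3      # illustrative
    DIGEST_DEPTH = 32      # only gossip about the most recent messages

    def gossip_round(node, members, send):
        # Phase 1: send a digest (recent message ids) to a few random members.
        digest = sorted(node.delivered)[-DIGEST_DEPTH:]
        peers = [m for m in members if m != node.id]
        for peer in random.sample(peers, min(GOSSIP_FANOUT, len(peers))):
            send(peer, ("digest", node.id, digest))

    def on_digest(node, sender, digest, send):
        # Phase 2: solicit retransmission of messages we are missing,
        # most recent first, so a lagging node cannot drag everyone back.
        missing = [m for m in digest if m not in node.delivered]
        for msg_id in sorted(missing, reverse=True):
            send(sender, ("solicit", node.id, msg_id))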

52
Example
  • Some processes do not get the message from the
    unreliable multicast
  • Process P misses message M0, Q misses M1
  • P gets M0 in the first round of anti-entropy, Q gets
    M1 in the next round

53
Optimizations
  • Several optimizations are used with bimodal
    multicast to gain better performance
  • Soft-Failure Detection
  • Round Retransmission Limit
  • Cyclic Retransmissions
  • Most-Recent-First Retransmission
  • Independent Numbering of Rounds
  • Random Graphs for Scalability
  • Multicast for some retransmission

54
Computational Result
55
Throughput Stability
Number of susceptible processes versus number of
gossip rounds when the initial multicast fails
(left) and when it reaches 90% of processes
(right; note the scale). Both runs assume 1000
processes.
56
Latency
  • The expected number of rounds increases as a
    function of log(n)
  • The variance of latency increases as a function of
    sqrt(n)
  • Scalable

57
Performance comparison
  • Compare bimodal multicast with two multicast
    protocols
  • Virtual Synchrony
  • SRM
  • Bimodal multicast beats both protocols under
    failures

58
Pbcast VS Virtual Synchrony
  • Pbcast reacts to failures better

59
Bandwidth comparison
60
Stability
61
Pbcast VS SRM
  • Pbcast incurs much less overhead under failures

62
Optimizations
63
Conclusion
  • Bimodal Multicast uses anti-entropy to achieve
    both reliability and scalability
  • performs well under failure
  • Predictable traffic overhead
  • Can be used with strongly reliable multicast,
    such as virtual synchrony
  • Suggestions
  • Incurs constant overhead (even when the network is
    in good condition)
  • Trade-off
  • Lacks a membership management mechanism
  • Nodes join, nodes leave
  • Cannot handle churn

64
Gossip-based Ad Hoc Routing
  • The epidemic concept can be adapted for ad hoc
    routing
  • Shown to be more efficient than the traditional
    flooding method [Haas et al.]
  • Exhibits bimodal behavior (from percolation
    theory [Grimmett et al.])
  • Implemented and tested with the AODV routing protocol

65
Problem
  • In a mobile ad hoc network with no fixed
    infrastructure
  • Routes to other nodes constantly change
  • Each node can communicate only by broadcasting
  • How to find routes to other nodes in the network?
  • GPS
  • expensive
  • Flooding
  • More overhead
  • Gossip-based

66
Gossiping concept
  • When an arbitrary node receives a message, with
    probability p it forwards the message to all of
    its neighbors by broadcasting
  • On the other hand, with probability 1-p it
    discards the message

67
Flooding VS Gossip
68
Percolation Theory
  • Gossiping exhibits bimodal behavior
  • Given a probability to gossip, p
  • θ_S(p) is the probability that the gossip does not
    die out
  • If a gossip does not die out, there is a probability
    θ_R(p) that a node will get the message
  • In most cases, θ_R(p) ≈ 1
  • How to find the lowest such p?
  • Also called the percolation threshold (p_c)

What is the maximum number of nodes that can be dropped
with the graph still connected?
Answer: (1 − p_c) · n
69
How to choose p ?
  • If p is too small (p → 0)
  • Little traffic overhead
  • The communication probably dies out and many
    nodes will not get the message
  • If p is too big (p → 1)
  • More reliable (almost all nodes get the message)
  • More traffic overhead (flooding if p = 1)

70
GOSSIP1
  • In some cases, the message dies out near the source
    because the source has few neighbors
  • To deal with such cases, the message is sent by
    flooding at the beginning, and gossiping continues
    later
  • GOSSIP1(p, k) forwards for the first k hops with
    probability 1 and then continues forwarding with
    probability p (see the sketch below)
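A minimal sketch of the GOSSIP1(p, k) forwarding decision a node might make
when it first receives a route request; the parameter values and the
rebroadcast call are illustrative assumptions:

    import random

    def gossip1_should_forward(hop_count, p, k):
        # GOSSIP1(p, k): always forward within the first k hops,
        # then forward with probability p.
        if hop_count < k:
            return True
        return random.random() < p

    # Example (illustrative parameters):
    if gossip1_should_forward(hop_count=5, p=0.7, k=4):
        pass  # rebroadcast the request to all neighbors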

71
GOSSIP1 Example
  • k = 3

3-hop flooding
72
(No Transcript)
73
Simulation on 1000x1000 grid
74
Dropoff
  • Most real graphs are finite
  • Some nodes are close to the boundary
  • From the result, such nodes have a lower
    probability of getting the message
  • The reasons are
  • Boundary nodes have few neighbors
  • Back-propagation is not possible for boundary
    nodes

75
Optimizations
  • Some techniques are adopted to boost the
    performance of gossip routing
  • Two-threshold scheme
  • Preventing premature gossip death
  • Retries
  • Zones

76
Two-threshold scheme
  • Nodes whose sender has few neighbors should gossip
    with a higher probability
  • GOSSIP2(p1, k, p2, n)
  • GOSSIP1(p1, k) if the sender node has at least n
    neighbors
  • GOSSIP1(p2, k) otherwise (with p2 > p1)
  • Useful for sparse graphs (see the sketch below)
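A compact sketch of the two-threshold rule; the neighbor-count argument and
parameter names are illustrative assumptions:

    import random

    def gossip2_should_forward(hop_count, sender_neighbor_count, p1, k, p2, n):
        # GOSSIP2(p1, k, p2, n): flood for the first k hops; afterwards gossip
        # with the higher p2 if the sender had fewer than n neighbors, else p1.
        if hop_count < k:
            return True
        p = p1 if sender_neighbor_count >= n else p2
        return random.random() < p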

77
GOSSIP2 Example
  • k = 3, n = 3, p2 = 1

3-hop flooding
78
GOSSIP1 VS GOSSIP2
79
Preventing premature gossip death
  • If a node receives a message and decides not to
    forward it
  • But then, after a period, it notices that it has
    received very few copies of the gossip from its
    neighbors
  • Probably because the broadcast died out
  • That node finally decides to broadcast
  • GOSSIP3(p, k, m)
  • GOSSIP1(p, k), plus:
  • in the 1-p case, if fewer than m copies of the
    message are received from neighbors (a sign of gossip
    death), forward with probability 1 (see the sketch
    below)
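A rough sketch of the GOSSIP3(p, k, m) rule; the timeout bookkeeping and the
duplicate counter are illustrative assumptions about how a node might track
this:

    import random

    def gossip3_initial_decision(hop_count, p, k):
        # Same initial decision as GOSSIP1(p, k).
        return hop_count < k or random.random() < p

    def gossip3_timeout_check(forwarded, copies_heard, m):
        # If we chose not to forward, but later heard fewer than m copies of
        # the message from neighbors, the gossip may be dying: forward now.
        return (not forwarded) and copies_heard < m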

80
GOSSIP3 Example
  • k = 3, m = 1

3-hop flooding
81
GOSSIP1 VS GOSSIP3
  • For the same performance, GOSSIP3 uses less
    overhead

82
Retries
  • With a gossiping protocol, there will always be a
    chance that an existing route cannot be found
  • Retries suit the bimodal behavior of the
    communication
  • Most transmissions succeed
  • every node gets the message
  • No retries needed
  • A few transmissions die out
  • Almost no one gets the message (and almost no
    overhead)
  • Use retries

83
Zones
  • Each node maintains a list of members within its
    k-hop zone
  • a route to a member in its zone can be found without
    broadcasting
  • Suitable for small networks
  • Solves boundary-node problems
  • Solves intermediate cases (non-bimodal effect)
  • Requires each node to maintain the member list

84
Performance analysis
  • AODV
  • A well-known ad hoc routing protocol
  • Node u requests a route to node v
  • Flooding with a small radius
  • If a route to v is not found, try again with a
    larger and larger radius
  • If that fails, finally flood throughout the network
  • AODV+G
  • Instead of the final flood, gossiping is used

85
AODV VS AODV+G
86
Conclusion
  • The epidemic concept can be adapted for ad hoc
    routing
  • Scalable
  • Fault-tolerant
  • Less overhead than flooding
  • Offers a good level of reliability
  • Suggestions
  • How to find p ≈ p_c?
  • Can we use feedback?

87
Epidemics Summary
  • Epidemic algorithms in DB replication
  • gossip + anti-entropy
  • spatial distributions
  • Bimodal Multicast
  • multicast + anti-entropy
  • uniform random choice of gossip partners
  • bimodal behavior
  • Gossip-Based Ad Hoc Routing
  • percolation effect
  • bimodal behavior
  • How to find the threshold?

88
Additional Papers
  • Efficient Epidemic-style Protocols for Reliable
    and Scalable Multicast (Gupta, Kermarrec, and
    Ganesh)
  • Topologically sensitive
  • Reduces overhead in quieter systems

89
References
  • I. Gupta. CS 598 IG, Fall 2004, First Lecture.
  • A. Demers, D. Greene, C. Hauser, W. Irish, J.
    Larson, S. Shenker, H. Sturgis, D. Swinehart, and
    D. Terry. Epidemic Algorithms for Replicated
    Database Maintenance.
  • K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M.
    Budiu, and Y. Minsky. Bimodal Multicast.
  • Z. Haas, J. Halpern, and L. Li. Gossip-Based Ad Hoc
    Routing.
  • I. Gupta, A. Kermarrec, and A. Ganesh. Efficient
    Epidemic-style Protocols for Reliable and
    Scalable Multicast.