Epidemics - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Epidemics


1
Epidemics
  • by Charles Yang
  • Ted Pongthawornkamol
  • 9/16/20

2
Prelude: Multicasting
  • Many protocols
  • MBONE, 6BONE, XTP, etc.
  • Principally designed for scalability
  • Fault tolerance really isn't addressed

3
Multicasting (cont)
  • Scalable Reliable Multicast
  • But, as the graph shows, not that scalable

4
So now what? Epidemics!
  • Recap from Indy's 1st lecture
  • Definitions
  • Infective: node with an update it wants to share
  • Susceptible: node which has not yet received the
    update
  • Removed: previously infective node which is no
    longer sharing

5
Recap (cont)
  • An infective node n receives a msg and forwards it
    with probability p to a susceptible node
  • It can be shown that the update spreads quickly with
    high probability
  • Lightweight
  • Highly fault-tolerant

6
Outline of Presentation
  • Epidemic Algorithms for Replicated Database
    Maintenance
  • Bimodal Multicast
  • Gossip-Based Ad Hoc Routing

7
Epidemic Algorithms for Replicated Database
Maintenance
  • Xerox's Corporate Internet (CIN), Clearinghouse
    servers, about 1986-1987
  • Name resolution service
  • several hundred ethernets, connected by gateways
    and phone lines
  • DBs were filling up bandwidth for replication

8
The Problem
  • Inject an update at one server, and have it
    propagate to all other servers
  • How to make it robust and scale well?
  • Important factors
  • convergence time: time required for an update to
    propagate to all sites
  • network traffic: traffic required to propagate a
    single update (want to minimize!)

9
3 Methods for Spreading Updates
  • direct mail (basically multicast or flooding)
  • anti-entropy (epidemic)
  • rumor mongering/gossiping (epidemic)

10
CIN's Initial Configuration
  • Direct Mail to send updates
  • Anti-entropy to bring DBs into sync
  • Re-mailing if the previous anti-entropy found a
    disagreement
  • Anti-entropy run once/day between 12am and 6am
  • Eventually, anti-entropy couldn't complete in the
    allowed time due to traffic
  • For instance, for a domain stored at 300 sites,
    90,000 messages might be introduced in one night

11
Direct Mail
(diagram only)
12
Direct Mail
(diagram only)
13
Direct Mail
(diagram only)
14
Direct Mail Issues
  • a lot of bandwidth: n messages per update
  • not quite reliable: messages can be lost (crashes,
    buffer overflows)
  • s may also not have current knowledge of S (the set
    of all sites)

15
Anti-Entropy
  • Run in the background to recover from errors
  • initially from direct mail, later from rumor
    mongering
  • Executed periodically (a push-pull sketch follows
    below)
  • FOR SOME s' ∈ S DO
  •   ResolveDifference[s, s']
  • ENDLOOP
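A minimal sketch of one push-pull anti-entropy exchange; the dict of
(value, timestamp) entries, the newest-timestamp-wins rule, and the helper
names are illustrative assumptions, not the Clearinghouse implementation:

    import random

    def resolve_difference(local_db, remote_db):
        # Push-pull: both replicas end up with the newer copy of every entry.
        # Each db maps key -> (value, timestamp).
        for key in set(local_db) | set(remote_db):
            local, remote = local_db.get(key), remote_db.get(key)
            if remote is None or (local is not None and local[1] >= remote[1]):
                remote_db[key] = local
            else:
                local_db[key] = remote

    def anti_entropy_cycle(my_db, peer_dbs):
        # "FOR SOME s' in S DO ResolveDifference[s, s']"
        resolve_difference(my_db, random.choice(peer_dbs))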

16
Anti-Entropy (after direct mail)
(diagram only)
17
Anti-Entropy (Cycle 1, start)
(diagram only)
18
Anti-Entropy (Cycle 1, end)
(diagram only)
19
Anti-Entropy (Cycle 2, start)
(diagram only)
20
Anti-Entropy (Cycle 2, end)
(diagram only)
21
Anti-Entropy (Cycle 3, start)
(diagram only)
22
Anti-Entropy (Cycle 3, end)
(diagram only)
23
Anti-Entropy (cont)
  • Assume s' is chosen uniformly at random (spatial
    distributions are discussed later)
  • slow and expensive, but reliable
  • since it is usually used as a backup, the number of
    susceptible sites is small
  • Pull, push-pull, push

24
Pull
  • p_i is the probability that a site remains susceptible
    after the i-th cycle
  • A site remains susceptible after the (i+1)-st cycle if
  • it was susceptible after the i-th cycle
  • and it contacted a susceptible site in the (i+1)-st
    cycle
  • ⇒ p_{i+1} = (p_i)^2
  • converges rapidly to 0 when p_i is small
  • In other words, it is very unlikely that susceptible
    sites will remain after a while

25
Push
  • A site remains susceptible after the (i+1)-st cycle if
  • it was susceptible after the i-th cycle
  • and no infectious site contacted it in the (i+1)-st
    cycle
  • p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}
  • Approximately p_{i+1} ≈ p_i e^{-1}
  • Converges too, but not nearly as quickly as pull
    (see the numeric sketch below)
  • Hence pull, or push-pull, is preferred to push alone
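A small numeric sketch iterating the two recurrences side by side makes the
difference concrete; n = 1000 sites and p_0 = 0.9 are illustrative choices,
not values from the paper:

    n, p_pull, p_push = 1000, 0.9, 0.9
    for cycle in range(1, 11):
        p_pull = p_pull ** 2                                  # pull: p_{i+1} = p_i^2
        p_push = p_push * (1 - 1 / n) ** (n * (1 - p_push))   # push recurrence
        print(f"cycle {cycle:2d}  pull={p_pull:.3e}  push={p_push:.3e}")
    # pull collapses super-exponentially; push shrinks only by roughly a
    # factor of e per cycle, which is why pull (or push-pull) is preferred.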

26
Some Anti-Entropy Optimizations
  • Comparing whole DBs is expensive, but most DBs are
    pretty similar, so
  • Could maintain a checksum of the DB
  • compare checksums
  • If they don't match, then start comparing DBs
  • Naïve!

27
Optimizations (cont)
  • Define a time window τ (the time by which updates
    should have spread)
  • Keep a checksum of the database AND a recent update
    list with age < τ
  • Two sites first exchange checksums and recent
    update lists (see the sketch below)
  • compute new checksums, and then compare
  • τ must be chosen well
  • If n grows too much
  • expected time for a message to spread > τ
  • recent update lists are likely to differ
  • Another variation: an inverted index of the DB by
    timestamp
  • sites exchange updates in reverse timestamp
    order until the checksums match
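One way the checksum-plus-recent-update-list exchange could be sketched; the
hashing scheme, the τ value, and the (value, timestamp) entry layout are
assumptions for illustration, not the paper's format:

    import hashlib, time

    TAU = 600.0  # illustrative update window, in seconds

    def checksum(db):
        # Order-independent digest over (key, (value, timestamp)) entries.
        h = hashlib.sha256()
        for item in sorted(db.items()):
            h.update(repr(item).encode())
        return h.hexdigest()

    def apply_newer(dst, src):
        # Keep whichever copy of each entry carries the newer timestamp.
        for key, (value, ts) in src.items():
            if key not in dst or dst[key][1] < ts:
                dst[key] = (value, ts)

    def exchange(local_db, remote_db, now=None):
        now = time.time() if now is None else now
        # 1. Swap only the recent-update lists (age < TAU) and merge them.
        apply_newer(remote_db, {k: v for k, v in local_db.items() if now - v[1] < TAU})
        apply_newer(local_db, {k: v for k, v in remote_db.items() if now - v[1] < TAU})
        # 2. Recompute checksums; only if they still differ do the expensive
        #    full comparison (here simply merging everything both ways).
        if checksum(local_db) != checksum(remote_db):
            apply_newer(remote_db, local_db)
            apply_newer(local_db, remote_db)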

28
Complex Epidemics / Rumor Mongering / Gossip
  • Replaces multicasting
  • At the expense of a slightly larger convergence
    time
  • And a distinct, though very small, probability of
    failure
  • Called complex just to distinguish from simple
    epidemics like anti-entropy

29
Basic (Complex) Epidemic
  • A susceptible site receives a hot rumor and becomes
    infective
  • Randomly shares it with another site
  • chosen uniformly at random
  • When it contacts a site that knows the rumor already,
  • with probability 1/k it loses interest in sharing the
    rumor (and becomes removed)
  • After a while, there is a high probability that everyone
    knows (see the simulation sketch below)
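A rough simulation of this push rumor-mongering rule (the feedback/coin
variant with loss probability 1/k); n, k, and the round cap are illustrative
parameters, and the rumor normally dies out well before the cap:

    import random

    def rumor_monger(n=1000, k=2, rounds=50):
        # 0 = susceptible, 1 = infective, 2 = removed
        state = [0] * n
        state[0] = 1                        # one initial hot rumor
        for _ in range(rounds):
            for node in range(n):
                if state[node] != 1:
                    continue
                target = random.randrange(n)
                if state[target] == 0:
                    state[target] = 1       # susceptible peer becomes infective
                elif random.random() < 1.0 / k:
                    state[node] = 2         # lose interest: become removed
        return state.count(0)               # residue: sites that never heard it

    print(rumor_monger())                   # typically only a few percent for k = 2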

30
Can model with differential equations (fun!)
  • s + i + r = 1
  • Differentiate (worked out below)
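Spelled out (a reconstruction following Demers et al.; s, i, r are the
fractions of susceptible, infective, and removed sites, and 1/k is the
loss-of-interest probability):

    s + i + r = 1
    \frac{ds}{dt} = -si, \qquad
    \frac{di}{dt} = +si - \frac{1}{k}(1-s)\,i
    \Rightarrow \frac{di}{ds} = -\frac{k+1}{k} + \frac{1}{ks}
    \Rightarrow i(s) = -\frac{k+1}{k}\,s + \frac{1}{k}\ln s + c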

31
The constant c is determined by i(1 − ε) = ε
  • For large n, ε goes to zero
  • Giving a solution
  • i(s) is zero when s = e^{-(k+1)(1-s)}
  • Yeah, yeah, so what does it mean?
  • an implicit equation for s
  • s decreases exponentially with k (the 1/k probability
    that a site becomes removed)
  • k = 1: about 20% will miss the rumor
  • k = 2: about 6% will miss
  • So after a few consecutive rounds, with high probability
    there will be no susceptibles left

32
Can vary complex epidemics
  • Concerned with
  • Residue: when i is zero, what is s? (people who
    never heard the rumor)
  • Traffic
  • Delay
  • t_avg: time for a random node to receive the msg
  • t_last: time for the last node who will receive
    the msg to receive it

33
Variations (cont)
  • Blind vs. feedback
  • blind: loses interest with probability 1/k whether or
    not the contacted node already knew the msg
  • Counter vs. coin
  • with a counter, lose interest only after k
    unnecessary contacts
  • Push vs. pull
  • the basic scheme used push, but pull also works
  • works well if there is a high number of independent
    updates
  • but when the DB is quiescent, pull has more useless
    overhead than push

34
Variations (cont)
  • Minimization
  • Use push and pull together; if both sides already
    know the update, the site with the smaller counter
    is incremented (on equality, both are incremented)
  • Connection limit
  • If there are a lot of updates, a connection
    limit is needed
  • Pull gets worse but push gets better!
  • Hunting
  • If one connection is rejected, try another

35
So instead of direct mail + anti-entropy
  • Use rumor mongering
  • And back up with anti-entropy

36
Death Certificates
  • With anti-entropy, deletion doesn't really work
  • the absence of an entry will be replaced by an old
    version
  • Death Certificates
  • carry timestamps
  • when compared with an older entry, the older entry
    is deleted
  • they take up space
  • but if you delete them, you risk seeing old data
    resurrected
  • Enter Dormant Death Certificates

37
Dormant Death Certificates
  • Two thresholds τ1 and τ2
  • Each server retains a DC for time τ1
  • After τ1, most sites delete the DC, while a few keep
    it (dormant)
  • If old data meets a dormant DC, the DC is propagated
    again
  • After τ1 + τ2, delete the dormant DC

38
Dormant DCs (cont)
  • Does not scale indefinitely
  • if n grows too much, the time to propagate DCs
    exceeds τ1
  • More likely to activate dormant DCs, which are then
    propagated, adding to overhead
  • The ultimate result is catastrophic failure.

39
Dormant DCs (cont)
  • Don't spread dormant DCs
  • And if reactivated, could reset the timestamp
  • But this is wrong (it might cancel a legitimate
    update)
  • So use a second timestamp, called the activation
    timestamp, which is set when the DC is reactivated
    (see the record sketch below)
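As a rough sketch, a death certificate record carrying both timestamps might
look like this; the field names and expiry helper are illustrative
assumptions, not the Clearinghouse schema:

    from dataclasses import dataclass

    @dataclass
    class DeathCertificate:
        key: str
        ordinary_ts: float     # when the delete was issued (never reset)
        activation_ts: float   # reset only when a dormant DC is reactivated
        dormant: bool = False

        def dormant_expired(self, now, tau1, tau2):
            # Dormant copies are finally discarded tau1 + tau2 after activation.
            return self.dormant and now - self.activation_ts > tau1 + tau2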

40
Spatial Distributions
  • networks aren't homogeneous
  • some links are slower than others
  • can be broken up into different types of zones
  • we want to favor locality as we spread updates, to
    minimize traffic

41
Spatial Distributions (cont)
  • probability of connecting to a site at distance d is
    proportional to 1/d^a, where a is to be determined
  • intuitively, a indicates the amount of locality
    you're going to be connecting at
  • So an increase in a → an increase in locality
  • with increased locality, need to compensate in
    order to break out of locality
  • more connections
  • more rounds
  • Also generalized to more dimensions: d^{-2D}
    (a sampling sketch follows below)
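A minimal sketch of choosing an anti-entropy partner with probability
proportional to 1/d^a; the 1-D distance function and the site list are
illustrative assumptions:

    import random

    def choose_partner(my_pos, sites, a=2.0, dist=None):
        # Pick a peer with probability proportional to 1 / distance^a.
        dist = dist or (lambda p, q: abs(p - q))   # illustrative 1-D distance
        others = [s for s in sites if s != my_pos]
        weights = [1.0 / max(dist(my_pos, s), 1) ** a for s in others]
        return random.choices(others, weights=weights, k=1)[0]

    # e.g. on a line of 100 sites, site 0 mostly gossips with nearby sites:
    print(choose_partner(0, list(range(100)), a=2.0))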

42
Spatial Distribution
  • Anti-Entropy
  • notice the Bushey (trans-Atlantic) traffic
  • uniform (75.74) vs. a = 2 (2.38)
  • For gossiping
  • since rumors eventually become inactive, a rumor
    needs to spread a lot in the beginning
  • hence, pump up k

43
Summary for Demers et al
  • Direct Mailing
  • Rumor Mongering
  • Anti-Entropy
  • Issues
  • Research into the effect of, and optimizing for,
    topology
  • Need to know S
  • Scalability with n
  • Churn
  • Bimodal Multicast will address
  • What about throughput stability?
  • What about a higher rate of msgs?

44
Bimodal multicast
  • A technique that applies the epidemic concept to
    achieve scalable and reliable multicast
  • Uses epidemics in the form of anti-entropy
  • Randomly choose members in the group
  • Synchronize state

45
Two classes of multicast
  • strong reliability
  • atomicity
  • delivery ordering
  • virtual synchrony
  • security
  • real-time
  • more overhead, unpredictable behavior under some
    situations
  • best-effort reliability
  • scalable
  • provides no end-to-end delivery guarantee
  • No strong membership view
  • A certain level of failure discovery
  • SRM, MUSE, RMTP, etc.

46
Multicast Examples
  • Virtual synchrony
  • Strongly reliable
  • significant degradation with even just a few node
    failures
  • suitable for small groups, limited to short
    bursts of multicasts
  • SRM
  • Best-effort reliable
  • Error-prone under stochastic failures
  • Meltdown can occur in a large network
  • Neither of them addresses the stability problem
    under failures

47
Fault-tolerance problem
  • Virtual synchrony performs badly under failures

48
Bimodal multicast
  • Also called probabilistic broadcast (pbcast)
  • fills the gap between the two approaches
  • scalable
  • predictably reliable even under bad conditions
  • Complements existing mechanisms, such as
    virtual synchrony
  • Atomic
  • Provides stability
  • Throughput stability
  • Multicast stability

49
Pbcast protocol
  • consists of two concurrent subprotocols
  • an optimistic dissemination protocol, such as
    IP multicast
  • a two-phase anti-entropy protocol to deal with the
    synchronization problem
  • first phase detects message loss
  • second phase corrects losses

50
Optimistic dissemination protocol
  • each node must possess the list of all members
  • generate a set of spanning trees
  • Simple algorithm
  • Randomly choose a spanning tree
  • every node uses the same spanning tree to forward
    the message
  • The set of spanning trees needs to be recalculated
    each time nodes join or leave

51
Two-Phase Anti-Entropy Protocol
  • detects and corrects any inconsistencies by
    gossiping
  • First, nodes randomly choose members to which they
    forward message histories
  • Also called a digest
  • Second, recipient nodes may ask for missing
    messages from the sender nodes (see the sketch below)
  • Emphasizes the most recent history over old messages
  • Prevents system degradation by faulty nodes trying
    to fetch every message in the history
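A very rough sketch of the two gossip phases; the fan-out, digest depth,
message format, and the `send` callback are illustrative assumptions, not
the pbcast wire protocol. `node` is assumed to expose an `id` and a set
`delivered` of message ids:

    import random

    GOSSIP_FANOUT = 3      # illustrative
    DIGEST_DEPTH = 32      # only gossip about the most recent messages

    def gossip_round(node, members, send):
        # Phase 1: send a digest (recent message ids) to a few random members.
        digest = sorted(node.delivered)[-DIGEST_DEPTH:]
        peers = [m for m in members if m != node.id]
        for peer in random.sample(peers, min(GOSSIP_FANOUT, len(peers))):
            send(peer, ("digest", node.id, digest))

    def on_digest(node, sender, digest, send):
        # Phase 2: solicit retransmission of messages we are missing,
        # most recent first, so a lagging node cannot drag everyone back.
        missing = [m for m in digest if m not in node.delivered]
        for msg_id in sorted(missing, reverse=True):
            send(sender, ("solicit", node.id, msg_id))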

52
Example
  • Some processes do not get the message from the
    unreliable multicast
  • Process P misses message M0, Q misses M1
  • P gets M0 in the first round of anti-entropy, Q gets
    M1 in the next round

53
Optimizations
  • Several optimizations are used with bimodal
    multicast to gain better performance
  • Soft-Failure Detection
  • Round Retransmission Limit
  • Cyclic Retransmissions
  • Most-Recent-First Retransmission
  • Independent Numbering of Rounds
  • Random Graphs for Scalability
  • Multicast for some retransmission

54
Computational Result
55
Throughput Stability
Number of susceptible processes versus number of
gossip rounds when the initial multicast fails
(left) and when it reaches 90% of processes
(right; note the scale). Both runs assume 1000
processes.
56
Latency
  • The expected number of rounds increases as a
    function of log(n)
  • The variance of latency increases as a function of
    sqrt(n)
  • Scalable

57
Performance comparison
  • Compare bimodal multicast with two multicast
    protocols
  • Virtual Synchrony
  • SRM
  • Bimodal multicast beats both protocols under
    failures

58
Pbcast VS Virtual Synchrony
  • Pbcast reacts to failures better

59
Bandwidth comparison
60
Stability
61
Pbcast VS SRM
  • Pbcast incurs much less overhead under failures

62
Optimizations
63
Conclusion
  • Bimodal Multicast uses anti-entropy to achieve
    both reliability and scalability
  • performs well under failure
  • Predictable traffic overhead
  • Can be used with strongly reliable multicast,
    such as virtual synchrony
  • Suggestions
  • Incurs constant overhead (even when the network is
    in good condition)
  • Trade-off
  • Lacks a membership management mechanism
  • Nodes join, nodes leave
  • Cannot handle churn

64
Gossip-based Ad Hoc Routing
  • The epidemic concept can be adapted for ad hoc
    routing
  • Shown to be more efficient than the traditional
    flooding method [Haas et al.]
  • Exhibits bimodal behavior (from percolation
    theory [Grimmett et al.])
  • Implemented and tested with the AODV routing protocol

65
Problem
  • In a mobile ad hoc network with no fixed
    infrastructure
  • Routes to other nodes constantly change
  • Each node can communicate only by broadcasting
  • How to find routes to other nodes in the network?
  • GPS
  • expensive
  • Flooding
  • More overhead
  • Gossip-based

66
Gossiping concept
  • When an arbitrary node receives a message, with
    probability p it forwards the message to all of
    its neighbors by broadcasting
  • On the other hand, with probability 1-p it
    discards the message

67
Flooding VS Gossip
68
Percolation Theory
  • Gossiping exhibits bimodal behavior
  • Given a probability to gossip, p
  • θ_S(p) is the probability that the gossip does not
    die out
  • If a gossip does not die out, there is a probability
    θ_R(p) that a node will get the message
  • In most cases, θ_R(p) ≈ 1
  • How to find the lowest such p?
  • Also called the percolation threshold (p_c)

What is the maximum number of nodes that can be dropped
with the graph still connected?
Answer: (1 − p_c) · n
69
How to choose p ?
  • If p is too small (p → 0)
  • Little traffic overhead
  • The communication probably dies out and many
    nodes will not get the message
  • If p is too big (p → 1)
  • More reliable (almost all nodes get the message)
  • More traffic overhead (flooding if p = 1)

70
GOSSIP1
  • In some cases, the message dies out near the source
    because the source has few neighbors
  • To deal with such cases, the message is sent by
    flooding at the beginning, and gossiping continues
    later
  • GOSSIP1(p, k) forwards for the first k hops with
    probability 1 and then continues forwarding with
    probability p (see the sketch below)
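A minimal sketch of the GOSSIP1(p, k) forwarding decision a node might make
when it first receives a route request; the parameter values and the
rebroadcast call are illustrative assumptions:

    import random

    def gossip1_should_forward(hop_count, p, k):
        # GOSSIP1(p, k): always forward within the first k hops,
        # then forward with probability p.
        if hop_count < k:
            return True
        return random.random() < p

    # Example (illustrative parameters):
    if gossip1_should_forward(hop_count=5, p=0.7, k=4):
        pass  # rebroadcast the request to all neighbors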

71
GOSSIP1 Example
  • k = 3

3-hop flooding
72
(No Transcript)
73
Simulation on 1000x1000 grid
74
Dropoff
  • Most real graphs are finite
  • Some nodes are close to the boundary
  • From the result, such nodes have a lower
    probability of getting the message
  • The reasons are
  • Boundary nodes have few neighbors
  • Back-propagation is not possible for boundary
    nodes

75
Optimizations
  • Some techniques are adopted to boost the
    performance of gossip routing
  • Two-threshold scheme
  • Preventing premature gossip death
  • Retries
  • Zones

76
Two-threshold scheme
  • Nodes whose sender has few neighbors should gossip
    with a higher probability
  • GOSSIP2(p1, k, p2, n)
  • GOSSIP1(p1, k) if the sender node has at least n
    neighbors
  • GOSSIP1(p2, k) otherwise (with p2 > p1)
  • Useful for sparse graphs (see the sketch below)
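A compact sketch of the two-threshold rule; the neighbor-count argument and
parameter names are illustrative assumptions:

    import random

    def gossip2_should_forward(hop_count, sender_neighbor_count, p1, k, p2, n):
        # GOSSIP2(p1, k, p2, n): flood for the first k hops; afterwards gossip
        # with the higher p2 if the sender had fewer than n neighbors, else p1.
        if hop_count < k:
            return True
        p = p1 if sender_neighbor_count >= n else p2
        return random.random() < p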

77
GOSSIP2 Example
  • k = 3, n = 3, p2 = 1

3-hop flooding
78
GOSSIP1 VS GOSSIP2
79
Preventing premature gossip death
  • If a node receives a message and decides not to
    forward it
  • But then, after a period, it notices that it has
    received very few copies of the gossip from its
    neighbors
  • Probably because the broadcast died out
  • That node finally decides to broadcast
  • GOSSIP3(p, k, m)
  • GOSSIP1(p, k), plus:
  • in the 1-p case, if fewer than m copies of the
    message are received from neighbors (a sign of gossip
    death), forward with probability 1 (see the sketch
    below)
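A rough sketch of the GOSSIP3(p, k, m) rule; the timeout bookkeeping and the
duplicate counter are illustrative assumptions about how a node might track
this:

    import random

    def gossip3_initial_decision(hop_count, p, k):
        # Same initial decision as GOSSIP1(p, k).
        return hop_count < k or random.random() < p

    def gossip3_timeout_check(forwarded, copies_heard, m):
        # If we chose not to forward, but later heard fewer than m copies of
        # the message from neighbors, the gossip may be dying: forward now.
        return (not forwarded) and copies_heard < m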

80
GOSSIP3 Example
  • k = 3, m = 1

3-hop flooding
81
GOSSIP1 VS GOSSIP3
  • For the same performance, GOSSIP3 uses less
    overhead

82
Retries
  • With a gossiping protocol, there will always be a
    chance that an existing route cannot be found
  • Retries suit the bimodal behavior of the
    communication
  • Most transmissions succeed
  • every node gets the message
  • No retries needed
  • A few transmissions die out
  • Almost no one gets the message (and almost no
    overhead)
  • Use retries

83
Zones
  • Each node maintains a list of members within its
    k-hop zone
  • a route to a member in its zone can be found without
    broadcasting
  • Suitable for small networks
  • Solves boundary-node problems
  • Solves intermediate cases (non-bimodal effect)
  • Requires each node to maintain the member list

84
Performance analysis
  • AODV
  • A well-known ad hoc routing protocol
  • Node u requests a route to node v
  • Flooding with a small radius
  • If a route to v is not found, try again with a
    larger and larger radius
  • If that fails, finally flood throughout the network
  • AODV+G
  • Instead of the final flood, gossiping is used

85
AODV VS AODV+G
86
Conclusion
  • The epidemic concept can be adapted for ad hoc
    routing
  • Scalable
  • Fault-tolerant
  • Less overhead than flooding
  • Offers a good level of reliability
  • Suggestions
  • How to find p ≈ p_c?
  • Can we use feedback?

87
Epidemics Summary
  • Epidemic algorithms in DB replication
  • gossip + anti-entropy
  • spatial distributions
  • Bimodal Multicast
  • multicast + anti-entropy
  • uniform random choice of gossip partners
  • bimodal behavior
  • Gossip-Based Ad Hoc Routing
  • percolation effect
  • bimodal behavior
  • How to find the threshold?

88
Additional Papers
  • Efficient Epidemic-style Protocols for Reliable
    and Scalable Multicast (Gupta, Kermarrec, and
    Ganesh)
  • Topologically sensitive
  • Reduces overhead in quieter systems

89
References
  • I. Gupta. CS 598 IG, Fall 2004, First Lecture.
  • A. Demers, D. Greene, C. Hauser, W. Irish, J.
    Larson, S. Shenker, H. Sturgis, D. Swinehart, and
    D. Terry. Epidemic Algorithms for Replicated
    Database Maintenance.
  • K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M.
    Budiu, and Y. Minsky. Bimodal Multicast.
  • Z. Haas, J. Halpern, and L. Li. Gossip-Based Ad Hoc
    Routing.
  • I. Gupta, A. Kermarrec, and A. Ganesh. Efficient
    Epidemic-style Protocols for Reliable and
    Scalable Multicast.