Title: Epidemics
1Epidemics
- Presented By
- Lucas Cook and Wade Fagen
- CS 525, The University of Illinois (UIUC)
- 6 February 2007
2History
- Two schools of multicast algorithms
- Proactive
- Reactive
- Existing Algorithms/Implementations
- SRM
- IP Multicast (best-effort)
- NNTP (gossip)
- IRC (hierarchical multicast)
- PSYC (multicast more web-like than IRC)
3Multicast Routing
- Reliable Multicast Routing
- Ensure that a message sent from any node is received by all other nodes within any distributed system.
- But we don't live in an ideal world.
4Multicast Routing
- Three general categories
- Algorithms that provide strong reliability properties
- Ex: atomic multicasts
[Figure: timeline of rounds 1, 2, 3]
5Atomic Multicast
- Nodes only process at the beginning of rounds
- New rounds don't start until all previous messages are received
[Figure: timeline of rounds 1, 2, 3]
6Multicast Routing
- Three general categories
- strong reliability algorithms
- best-effort reliability algorithms
- Ex: MUSE algorithm
- Provides no assurance of end-to-end reliability
- Problems and solutions exist both at the physical network layer and at the overlay layer
- Focus of distributed systems: the overlay
7Best Effort Multicast Routing
- Many algorithms implement some neighbor-based
approach
8Best Effort Multicast Routing
- End-to-end assurance may be lost through a single node's failure
9Multicast Routing
- Three general categories
- strong reliability algorithms
- best-effort reliability algorithms
- proactively probabilistic multicast algorithms
- Provides predictable reliability
- Goal: achieve better reliability than best-effort without the overhead of strong reliability
- Method: epidemics
10Epidemic Algorithms
- Epidemics help ensure probabilistic end-to-end reliability with an "almost all or almost none" assurance
- Tradeoff between scale and reliability: epidemics allow for expansive scale with near-perfect reliability
11Epidemic Algorithms
- To be less verbose, the following citations are used throughout the presentation:
- [1] Bimodal multicast, K. Birman et al., ACM TOCS, 1999
- [2] Epidemic algorithms for replicated database maintenance, A. Demers et al., PODC 1987
- [3] Gossip-based ad hoc routing, Z. Haas et al., Infocom 2002
12Epidemic Algorithms in Databases
- Site updating has been a key problem since the beginning of distributed database work
- Data is injected at one site
- Data needs to be updated at every site
[Figure: an incoming transaction arrives at one site]
13Epidemic Algorithms in Databases
- Classic Examples
- NNTP
- First use of e-mail servers
- etc
14Epidemic Algorithms in Databases
- Three core concepts
- Direct Communication
- Bottleneck!
- Anti-Entropy Measure
- Possible full comparison (slow!)
- Rumor Management
- Only updates!
15Epidemic Algorithms in Databases
- Three states of a message at a node
- Susceptible: message not received at node
- Infective: message is actively propagated by node
- Removed: message is no longer actively propagated by node
16Epidemic Algorithms in Databases
- Decide on a two-phase algorithm
- Phase 1: Rumor Mongering
- Probabilistic spread of messages to (hopefully) nearly all nodes
- Considerations between push and pull models
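The rumor-mongering phase can be sketched as a small simulation. This is only an illustration under stated assumptions: sites are numbered 0 to n-1 with n at least 2, every site can reach every other site, pushes happen in synchronous cycles, and an infective site loses interest with probability 1/k after pushing to a site that already knows the rumor (one of several loss-of-interest rules discussed by Demers et al.). The function name and parameters are hypothetical.

```python
import random

def rumor_mongering(n, k=2, seed=0):
    """Push-style rumor mongering over n sites (n >= 2).

    Each site is susceptible, infective, or removed. In every cycle each
    infective site pushes the rumor to one random other site. If that site
    already knew the rumor, the sender becomes removed with probability 1/k.
    Returns the number of sites that never heard the rumor (the residue).
    """
    rng = random.Random(seed)
    state = ["susceptible"] * n
    state[0] = "infective"  # the update is injected at site 0
    while "infective" in state:
        # Snapshot the infective sites so newly infected ones wait a cycle.
        for site in [i for i, s in enumerate(state) if s == "infective"]:
            target = rng.choice([j for j in range(n) if j != site])
            if state[target] == "susceptible":
                state[target] = "infective"
            elif rng.random() < 1.0 / k:
                state[site] = "removed"
    return state.count("susceptible")
```

A nonzero residue is why a second, anti-entropy phase is still needed: rumor mongering alone can stop before every site has been reached.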
17Epidemic Algorithms in Databases
- Decide on a two-phase algorithm
- Phase 1: Rumor Mongering
- Phase 2: Epidemic (Anti-Entropy)
- Runs periodically in the background
- Runs at each node
18Epidemic Push/Pull
- Generic Epidemic Message
- An epidemic message contains a summary of recent events
- Two types: push and pull
- The different types of messages allow formalization of the mathematics
19Epidemic Push/Pull
- A push is a message sent from some infected site to a susceptible site.
[Figure: push after an incoming transaction]
20Epidemic Push/Pull
- A pull is a message sent from some susceptible site to an infected site.
[Figure: pushes and a pull after an incoming transaction]
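The difference between push and pull can be sketched with two replicas held as dictionaries. This is a minimal sketch, assuming each entry is stored as key -> (value, timestamp) and that the newer timestamp wins; the function names and the storage layout are hypothetical and not taken from the slides.

```python
def push(sender, receiver):
    """The infected site sends its entries; the receiver keeps newer ones."""
    for key, (value, ts) in sender.items():
        if key not in receiver or receiver[key][1] < ts:
            receiver[key] = (value, ts)

def pull(requester, source):
    """The susceptible site asks; newer entries flow back from the source."""
    push(source, requester)

def push_pull(a, b):
    """Both directions in one exchange."""
    push(a, b)
    push(b, a)
```

With `a = {"x": ("new", 2)}` and `b = {"x": ("old", 1)}`, calling `push(a, b)` updates `b`, while `pull(a, b)` leaves both unchanged because `a` already holds the newer entry.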
21Epidemic Algorithms in Databases
- With the general idea in place, the specifics of [2] relate to databases
- Two primary distributed operations
- Data insertion: INSERT, UPDATE
- Data deletion: DELETE
- Epidemic messages for DELETE are augmented with a death certificate
- In [2], SELECT is simply done locally at each distributed end point of the database
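Why a DELETE needs a death certificate can be seen in a small sketch: without a record of the deletion, a stale copy of the old value arriving later by gossip would resurrect the item. The sketch assumes timestamped entries where the newer timestamp wins; `DEAD`, `apply_update`, and `select` are hypothetical names used only for illustration.

```python
DEAD = object()  # hypothetical marker standing in for a death certificate

def apply_update(db, key, value, ts):
    """Apply INSERT/UPDATE (a value) or DELETE (value is DEAD) if newer."""
    current = db.get(key)
    if current is None or current[1] < ts:
        db[key] = (value, ts)

def select(db, key):
    """Local read: unknown or deleted keys return None."""
    entry = db.get(key)
    if entry is None or entry[0] is DEAD:
        return None
    return entry[0]
```

Because the certificate stays in the replica with its timestamp, a late re-delivery of the older INSERT is ignored instead of undoing the DELETE.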
22Epidemic Algorithms in Databases
- Results published in [2]
- Key result: replacing deterministic algorithms for database consistency
- Actual results: simulation-based only
- Showed internal-based results
- Traditional schemes weren't simulated, so there is no accurate comparison
23Bimodal Multicast
- [1] presents a bimodal multicast algorithm called pbcast
- pbcast: probabilistic broadcast
24The pbcast Algorithm (from [1])
- Six Properties
- Atomicity (probabilistically)
- Throughput Stability
- Ordering (FIFO)
- Multicast Stability
- Detection of Lost Messages
- Scalability
- Acceptability of soft failures
25The pbcast Algorithm (from [1])
- Two sub-protocols
- Part 1: Hierarchical broadcast
- Unreliable, best-effort approach
- Part 2: Anti-entropy to correct packet loss if needed
- Results in predictable end-to-end assurances
26The pbcast Algorithm (from [1])
- Basic Hierarchical Broadcast
[Figure: broadcast of m1. Messages received: Node 1: 1; Node 2: 1; Node 3: 1; Node 4: 1]
27The pbcast Algorithm (from [1])
- Basic Hierarchical Broadcast
[Figure: broadcast of m1 and m2. Messages received: Node 1: 1, 2; Node 2: 1, 2; Node 3: 1, 2; Node 4: 1, 2]
28The pbcast Algorithm (from [1])
- Basic Hierarchical Broadcast
[Figure: broadcast of m1, m2, and m3. Messages received: Node 1: 1, 2; Node 2: 1, 2; Node 3: 1, 2, 3; Node 4: 1, 2]
29The pbcast Algorithm (from [1])
- Basic Hierarchical Broadcast
[Figure: broadcast of m1 through m4. Messages received: Node 1: 1, 2, 4; Node 2: 1, 2, 4; Node 3: 1, 2, 3, 4; Node 4: 1, 2, 4]
30The pbcast Algorithm (from [1])
- Basic Hierarchical Broadcast
[Figure: broadcast of m1 through m5. Messages received: Node 1: 1, 2, 4; Node 2: 1, 2, 4, 5; Node 3: 1, 2, 3, 4, 5; Node 4: 1, 2, 4, 5]
31The pbcast Algorithm (from [1])
- The anti-entropy protocol runs simultaneously with the broadcast messages
- Protocol runs in rounds
- Runs at every process
- Rounds are longer than the round-trip time
- Paper suggests 100 ms
- Maybe a traffic-based metric would be better?
32The pbcast Algorithm (from [1])
[Figure: per-node timelines of messages m1 through m5]
Rounds need not be synchronized across nodes!
33The pbcast Algorithm (from [1])
[Figure: per-node timelines of messages m1 through m5]
For the sake of example, we'll assume the rounds happen to occur at the same time across all nodes
34The pbcast Algorithm (from [1])
- Anti-entropy round
- Gossip Messages
- Each process chooses another random process and
sends a summary of its recent messages
35The pbcast Algorithm (from [1])
[Figure: per-node timelines of messages m1 through m5, with gossip messages between nodes]
Gossip summaries (sender → receiver: messages): Node 1 → Node 3: 1; Node 2 → Node 1: 1, 2; Node 3 → Node 2: 1, 2; Node 4 → Node 2: 1
36The pbcast Algorithm (from [1])
[Figure: per-node timelines of messages m1 through m5, with gossip messages between nodes]
Gossip summaries (sender → receiver: messages): Node 1 → Node 4: 1, 2; Node 2 → Node 1: 1, 2, 4; Node 3 → Node 4: 1, 2, 3, 4; Node 4 → Node 3: 1, 2
37The pbcast Algorithm (from [1])
- Summary contains missed messages
[Figure: timeline of messages m1 through m5]
Gossip summaries (sender → receiver: messages): Node 1 → Node 4: 1, 2; Node 2 → Node 1: 1, 2, 4; Node 3 → Node 4: 1, 2, 3, 4; Node 4 → Node 3: 1, 2
38The pbcast Algorithm (from [1])
- Anti-entropy round
- Solicitation Messages
- Messages sent back to the sender of the gossip message (not necessarily the original source), requesting a resend of a given set of messages
- Message Resend
- Upon reception of a solicitation message, the sender of the gossip message resends the requested messages
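The three message types of an anti-entropy round (gossip summary, solicitation, resend) can be sketched as follows. This is a simplified illustration, not the pbcast implementation: it assumes an in-memory list of processes, a summary that lists every message id a process holds, and loss-free delivery of the gossip traffic itself. The class and method names are hypothetical.

```python
import random

class Process:
    def __init__(self, name, messages):
        self.name = name
        self.store = dict(messages)  # message id -> payload

    def digest(self):
        """Gossip message: a summary (ids only) of received messages."""
        return set(self.store)

    def solicit(self, digest):
        """Ids named in a gossip summary that this process is missing."""
        return digest - set(self.store)

    def resend(self, wanted):
        """Answer a solicitation with the requested payloads."""
        return {mid: self.store[mid] for mid in wanted if mid in self.store}

def anti_entropy_round(processes, rng):
    """Each process gossips its summary to one randomly chosen peer."""
    for sender in processes:
        receiver = rng.choice([p for p in processes if p is not sender])
        wanted = receiver.solicit(sender.digest())
        receiver.store.update(sender.resend(wanted))
```

Note that the receiver asks the gossiping process for the missing messages, which need not be the process that originally multicast them.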
39The pbcast Algorithm (from [1])
- Summary contains missed messages
[Figure: a node solicits "What was m3?" and receives the resend "This is m3"]
Gossip summaries (sender → receiver: messages): Node 1 → Node 4: 1, 2; Node 2 → Node 1: 1, 2, 4; Node 3 → Node 4: 1, 2, 3, 4; Node 4 → Node 3: 1, 2, 3
40The pbcast Algorithm (from [1])
- Summary contains missed messages
[Figure: timeline of messages m1 through m5]
41The pbcast Algorithm (from [1])
- Anti-entropy Protocol
- [1] suggests a number of optimizations
- Reduce the number of rounds required to gossip about messages
- Reduce redundant messages
- [1] also suggests a number of extensions
- Gossip about a message to a fraction of all nodes (ex: 100 in a 10,000-node system)
42The pbcast Algorithm (from [1])
43The pbcast Algorithm (from [1])
44The pbcast Algorithm (from [1])
45The pbcast Algorithm (from [1])
46Epidemics
- With [1] and [2], a good overview of key epidemic behavior and strategies has been established.
- In [1] and [2], domain-specific optimizations were often applied.
47Ad-Hoc Routing Epidemics
- [3] focuses on reachability of epidemics for wireless ad hoc routing
- Need to broadcast (multicast) to find routes
- Theoretical/abstract simulation analysis
- 1 basic protocol, 4 extensions
- Goal: find optimal configurations based on reachability vs. "load"
48Ad-Hoc Routing Epidemics
- GOSSIP1(p, k)
- k: the number of rounds of gossiping about the message with 100% probability
- p: the probability of gossiping about the message after k rounds
- Optimal
- 1000 nodes: (0.75, 4) to (0.65, 4)
- Backpropagation effects
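The GOSSIP1(p, k) rule can be sketched as a breadth-first spread over a neighbor graph. This is a simplified model under stated assumptions: the source always broadcasts, "rounds" are counted as hops from the source, delivery to neighbors is loss-free, and the 4-connected grid is only a stand-in topology. The helper names are hypothetical.

```python
import random
from collections import deque

def gossip1(neighbors, source, p, k, rng):
    """Return the set of nodes that receive a message under GOSSIP1(p, k).

    neighbors maps each node to a list of adjacent nodes. A node that first
    receives the message at hop count h rebroadcasts with probability 1
    if h < k, and with probability p otherwise.
    """
    received = {source}
    queue = deque([(source, 0)])  # (node, hops from the source)
    while queue:
        node, hops = queue.popleft()
        forwards = node == source or hops < k or rng.random() < p
        if not forwards:
            continue  # this node decides not to gossip
        for nxt in neighbors[node]:
            if nxt not in received:
                received.add(nxt)
                queue.append((nxt, hops + 1))
    return received

def grid(rows, cols):
    """4-connected grid used here only as an example topology."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in steps
                 if 0 <= r + dr < rows and 0 <= c + dc < cols]
        for r in range(rows) for c in range(cols)
    }
```

Running `gossip1(grid(30, 30), (0, 0), 0.7, 4, random.Random(1))` for several seeds and counting the reached nodes gives a rough feel for how reachability depends on p.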
49Ad-Hoc Routing Epidemics
50Ad-Hoc Routing Epidemics
- GOSSIP2(p1, k, p2, n)
- Just like GOSSIP1(p1, k), unless the number of neighbors is less than n; then GOSSIP1(p2, k)
- Intuition
- Nodes with low degrees will have the hardest time receiving information
- Optimal
- GOSSIP1(0.8, 4) performs like GOSSIP2(0.6, 4, 1, 6) but with 13% more messages
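The GOSSIP2 rule amounts to choosing the forwarding probability from the neighbor count. The sketch below reads the rule as "fewer than n neighbors", which is an assumption, and it takes the relevant neighbor count as an argument because the slide does not say whose count is tested. The function name is hypothetical.

```python
def gossip2_probability(hops, neighbor_count, p1, k, p2, n):
    """Forwarding probability under one reading of GOSSIP2(p1, k, p2, n)."""
    if hops < k:
        return 1.0  # first k hops: always gossip, as in GOSSIP1
    # Sparse neighborhoods use the higher probability p2.
    return p2 if neighbor_count < n else p1
```

With the slide's parameters (0.6, 4, 1, 6), a neighborhood with 3 neighbors forwards with probability 1 beyond hop 4, while one with 8 neighbors forwards with probability 0.6.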
51Ad-Hoc Routing Epidemics
- GOSSIP3(p, k, m)
- Same as GOSSIP1(p, k), but if a node receives fewer than m copies of the message, then the message should be gossiped about
- Intuition
- If a node is not seeing many copies of a message, the network is likely not completely infected
- Optimal
- GOSSIP3(0.65, 4, 1) is better than GOSSIP1(0.75, 4) with 8% fewer messages
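The extra GOSSIP3 check can be sketched as a single predicate evaluated after a node has already made its GOSSIP1 decision. It assumes the node counts the copies it overhears during some waiting period; the waiting period, the function name, and the argument names are assumptions of this sketch.

```python
def gossip3_should_rebroadcast(did_broadcast, copies_heard, m):
    """A node that stayed silent reconsiders once its waiting period ends.

    did_broadcast: whether the node already gossiped under GOSSIP1(p, k).
    copies_heard: copies of the message overheard from other nodes.
    """
    return (not did_broadcast) and copies_heard < m
```

With m = 1, a silent node gossips only if it heard no copy at all from anyone else, which is the case where the spread has most likely died out nearby.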
52Ad-Hoc Routing Epidemics
- GOSSIP4(p, k, k')
- Just like GOSSIP1(p, k), but the node has knowledge of all nodes within a k'-node radius (its zone).
- Intuition
- Flooding of gossip messages is only required at the boundary of your zone.
- The creation of zones will eliminate the backpropagation effect that GOSSIP1(p, k) is prone to
- Optimal
- Balanced zones are required
- A larger zone requires more overhead
53Ad-Hoc Routing Epidemics
- Similar to site percolation
- Problem Overview
- There exists some number of nodes lined up in a rectangular grid
54Ad-Hoc Routing Epidemics
- Similar to site percolation
- Problem Overview
- Each node (except those on the edges) connects to all of its neighbors (N, E, S, and W)
55Ad-Hoc Routing Epidemics
- Similar to site percolation
- Problem Overview
- Question: if a fraction p of the nodes is removed, are the remaining nodes still connected?
[Figure: grid with p = 40% of nodes removed]
56Ad-Hoc Routing Epidemics
- Similar to site percolation
- Problem Overview
- Question: if a fraction p of the nodes is removed, are the remaining nodes still connected?
[Figure: grid with p = 40% of nodes removed]
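The percolation question can be explored with a short experiment. This sketch assumes each grid node is removed independently with probability p (so about a fraction p of the nodes disappears) and checks with a breadth-first search whether the survivors form one connected component. The function names are hypothetical.

```python
import random
from collections import deque

def remove_nodes(rows, cols, p, rng):
    """Return the set of surviving grid nodes after random removal."""
    return {(r, c) for r in range(rows) for c in range(cols)
            if rng.random() >= p}

def is_connected(alive):
    """True if every surviving node is reachable from any other one."""
    if not alive:
        return True  # vacuously connected
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in alive and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == alive
```

Calling `is_connected(remove_nodes(20, 20, 0.4, random.Random(seed)))` over many seeds estimates how often a grid with 40% of its nodes removed stays in one piece.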
57Overview
- [1] introduced bimodal multicast and provided the pbcast algorithm
- [2] applied epidemic and gossip protocols to a distributed database
- Finally, [3] provided well-founded results examining reachability of nodes
58Discussion
- In both [1] and [2], a two-layer approach was applied
- Phase 1: best-attempt broadcasting
- Phase 2: anti-entropy
- [2] chooses an epidemic broadcast model, and justifies it by considering rebroadcast
- However, anti-entropy will run in the background no matter what happened in Phase 1
- Maybe a metric based on messages seen for initiating Phase 2?
59Discussion
- Epidemic algorithms are not appropriate in all cases
- The epidemic approach means messages will be received multiple times at a given site
60Discussion
- Example: bandwidth-restrictive environments
- Streaming video?
- Downlink rate: 3 Mbps
- Assume n = 1,000 nodes
- Maximum unique content (average case)
- 3 Mbps / lg(1,000) ≈ 0.3 Mbps
- Is this at all ideal?
- And this assumes a constant factor of 1.0.
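The arithmetic on this slide can be checked directly. The sketch assumes, as the slide does, that each message is received about c · lg(n) times, so only 1 / (c · lg(n)) of the downlink carries unique content; the function name and the parameter c are hypothetical.

```python
import math

def unique_rate_mbps(downlink_mbps, n, c=1.0):
    """Downlink share left for unique content when every message is
    received about c * lg(n) times."""
    return downlink_mbps / (c * math.log2(n))
```

For a 3 Mbps downlink and n = 1,000, lg(n) is just under 10, so the result is about 0.3 Mbps; a constant factor of 2 would halve it again.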
61Discussion
- In worst cases, epidemics require a near-symmetrical connection
- Server sends video to Client and Phone
- Server: high download and high upload
- Client: moderate download and low upload
- Phone: low download and near-zero upload
[Figure: Server, Client, and Phone. Phone has 1/2 the bandwidth of Client]
62Discussion
- In worst cases, epidemics require a near-symmetrical connection
- Server sends video to Client and Phone
- Server: high download and high upload
- Client: moderate download and low upload
- Phone: low download and near-zero upload
[Figure: Server, Client, and Phone. Transmission fails from Server → Client]
63Discussion
- In worst cases, epidemics require a near-symmetrical connection
- Server sends video to Client and Phone
- Server: high download and high upload
- Client: moderate download and low upload
- Phone: low download and near-zero upload
[Figure: Server, Client, and Phone. Client gossips (to Phone) for any missed messages]
64Discussion
- In worst cases, epidemics require a near-symmetrical connection
- Server sends video to Client and Phone
- Server: high download and high upload
- Client: moderate download and low upload
- Phone: low download and near-zero upload
[Figure: Server, Client, and Phone. Phone relays the missed message to Client]
65Discussion
- In worst cases, epidemics require a near-symmetrical connection
- Server sends video to Client and Phone
- Server: high download and high upload
- Client: moderate download and low upload
- Phone: low download and near-zero upload
[Figure: Server, Client, and Phone. Phone's bandwidth is so low that the Client falls behind]
66Discussion
- Epidemic Security?
- A message may be seen by 20 nodes before it reaches you.
- Digital signature from sender?
- Requires a large collection of private/public keys (one per node)
- Distributed trust mechanism?
- Other solutions?
67Discussion
- Human-Centric Computing?
- Epidemic algorithms assume each node knows of either the entire system or a large set of neighbors.
- What if User A doesn't want to communicate with User B?
- User-specific information may be inferred by an unknown user
- More prevalent in partial multicast situations
68Discussion
- In [3], what about metrics besides reachability?
- Amount of worth?
- Latency?
- Bandwidth?
- Can a model be mathematically formal and sound
without explicitly taking those factors into
account?