Title: PLATO: Predictive Latency-Aware Total Ordering
1. PLATO: Predictive Latency-Aware Total Ordering
- Mahesh Balakrishnan
- Ken Birman
- Amar Phanishayee
2. Total Ordering
- a.k.a. Atomic Broadcast
- Delivering messages to a set of nodes in the same order (illustrated below):
  - Messages arrive at nodes in different orders
  - Nodes agree on a single delivery order
  - Messages are delivered at nodes in the agreed order
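To make the property concrete, here is a tiny Python illustration (the message and node names are hypothetical): arrival orders differ across nodes, yet every node delivers the single agreed order.

    # Hypothetical illustration of the total-ordering property:
    # arrival order may differ per node, delivery order may not.
    arrivals = {
        "node1": ["m2", "m1", "m3"],   # the network reordered m1 and m2 here
        "node2": ["m1", "m2", "m3"],
    }
    agreed_order = ["m1", "m2", "m3"]  # the single order the nodes agree on

    # Every node delivers in the agreed order, whatever the arrival order.
    deliveries = {node: list(agreed_order) for node in arrivals}
    assert all(seq == agreed_order for seq in deliveries.values())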
3. Modern Datacenters
- Applications
  - E-tailers, Finance, Aerospace
  - Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification
  - Totally Ordered Multicast!
- Hardware
  - Fast high-capacity networks
  - Failure-prone commodity nodes
4. Total Ordering in a Datacenter
(Figure: clients apply totally ordered updates to a Replicated Service.)
- Totally Ordered Multicast is used to consistently update Replicated Services
- Latency of Multicast → System Consistency
- Requirement: order multicasts consistently, rapidly, robustly
5. Multicast Wishlist
- Low Latency!
- High (stable) throughput
- Minimal, proactive overheads
- Leverage hardware properties
  - HW Multicast/Broadcast is fast, unreliable
- Handle varying data rates
  - Datacenter workloads have sharp spikes and extended troughs!
6. State-of-the-Art
- Traditional Protocols
  - Conservative: latency-overhead tradeoff
  - Example: Fixed Sequencer (simple, works well; a sketch follows this list)
- Optimistic Total Ordering
  - Deliver optimistically, roll back if incorrect
  - Why this works: no out-of-order arrival in LANs
- Optimistic total ordering for datacenters?
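As a reference point, here is a minimal Python sketch of a fixed-sequencer scheme (class and field names are hypothetical, not from the paper): a designated node stamps each message with a sequence number, and receivers deliver strictly in stamp order.

    # Minimal fixed-sequencer sketch (hypothetical names). A designated
    # sequencer node stamps each message ID with a sequence number and
    # multicasts the stamp; receivers deliver in stamp order.
    import heapq

    class Sequencer:
        def __init__(self):
            self.next_seq = 0

        def order(self, msg_id):
            stamp = (self.next_seq, msg_id)  # multicast this to all receivers
            self.next_seq += 1
            return stamp

    class Receiver:
        def __init__(self):
            self.next_expected = 0
            self.holdback = []  # min-heap keyed on sequence number

        def on_stamp(self, seq, msg_id):
            heapq.heappush(self.holdback, (seq, msg_id))
            deliverable = []
            while self.holdback and self.holdback[0][0] == self.next_expected:
                deliverable.append(heapq.heappop(self.holdback)[1])
                self.next_expected += 1
            return deliverable  # message IDs now safe to deliver, in order

Every message waits for at least one extra network hop through the sequencer before delivery; that built-in wait is the latency the rest of the deck attacks.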
7. PLATO: Predictive Ordering
- In a datacenter, broadcast/multicast occurs almost instantaneously
- Most of the time, messages arrive in the same order at all nodes
- Some of the time, messages arrive in different orders at different nodes
- Can we predict out-of-order arrival?
8. Reasons for Disorder: Swaps
- Typical datacenter diameter: 50-500 microseconds
- Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network (expressed in code below)
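In code form, the condition reads as follows (the function and the numbers are illustrative, using a diameter inside the 50-500 µs range quoted above):

    # Hypothetical check of the swap condition stated above: a later
    # multicast can overtake an earlier one at some receiver only if it
    # is sent before the earlier one has crossed the network.
    def swap_possible(intersend_us: float, diameter_us: float) -> bool:
        return intersend_us < diameter_us

    print(swap_possible(100, 200))   # True: sends 100 µs apart, 200 µs diameter
    print(swap_possible(1000, 200))  # False: the first send has already settled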
9. Reasons for Disorder: Loss
- Datacenter networks are over-provisioned
  - Loss almost never occurs inside the network
- Datacenter nodes are cheap
  - Loss occurs due to end-host buffer overflows caused by CPU contention
10. Emulab Testbed (Utah)
11. Cornell Testbed
12. Disorder (Emulab)
- The percentage of swaps and losses goes up with the data rate
- At 2800 packets per second, 2% of all packet pairs are swapped and 0.5% of packets are lost
13. Disorder
14. Predicting Disorder
- Predictor: inter-arrival time of consecutive packets into user-space (a sketch follows)
- Why?
  - Swaps: simultaneous multicasts → low inter-arrival time
  - Loss: kernel buffer overflow → a sequence of low inter-arrival times
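A minimal Python sketch of this predictor (the class and parameter names are hypothetical; the 128 µs default anticipates the measurement on the next slide):

    class DisorderPredictor:
        """Flags a packet when it arrives too soon after its predecessor."""

        def __init__(self, delta_us=128):
            self.delta_us = delta_us      # suspicion threshold, microseconds
            self.last_arrival_us = None

        def suspicious(self, arrival_us):
            # A small gap hints at simultaneous multicasts (swap risk) or
            # a filling kernel buffer (loss risk).
            prev = self.last_arrival_us
            self.last_arrival_us = arrival_us
            return prev is not None and (arrival_us - prev) < self.delta_us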
15. Predicting Disorder
- 95% of swaps and 14% of all pairs are within 128 µsecs
(Figure: inter-arrival time distributions of swapped pairs vs. all pairs; Cornell datacenter, 400 multicasts/sec.)
16. Predicting Disorder
17. PLATO Design
- Heuristic: if two packets arrive within Δ µsecs, possibility of disorder
- PLATO = Heuristic + Lazy Fixed Sequencer (dispatch sketched below)
  - Heuristic works → near-zero latency
  - Heuristic fails → fixed-sequencer latency
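A sketch of the resulting dispatch (helper names such as hold_for_sequencer are hypothetical): suspected packets wait for the sequencer's order, everything else is delivered optimistically at once.

    # Hypothetical glue between the heuristic and the lazy sequencer.
    def on_arrival(pkt, predictor, layer):
        if predictor.suspicious(pkt.arrival_us):
            layer.hold_for_sequencer(pkt)  # heuristic fires: sequencer latency
        else:
            layer.optdeliver_now(pkt)      # common case: near-zero latency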
18PLATO Design
API optdeliver, confirm, revoke Ordering
Layer Pending Queue Packets suspected to be
out-of-order, or queued behind suspected
packets Suspicious Queue Packets optdelivered
to the application, not yet confirmed
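A structural sketch of this layer in Python (only optdeliver, confirm, and revoke come from the slide; every other name is an assumed shape): the sequencer's lazy ordering stream either confirms optimistic deliveries or revokes them and redelivers in the agreed order.

    from collections import deque

    class OrderingLayer:
        """Hypothetical sketch of the queues behind the PLATO API."""

        def __init__(self, app):
            self.app = app
            self.pending = deque()     # suspected, or queued behind suspects
            self.suspicious = deque()  # optdelivered, awaiting confirmation
            self.by_id = {}

        def on_packet(self, pkt, flagged):
            self.by_id[pkt.msg_id] = pkt
            if flagged or self.pending:
                self.pending.append(pkt)   # wait for the sequencer's order
            else:
                self.suspicious.append(pkt)
                self.app.optdeliver(pkt)   # optimistic, near-zero latency

        def on_sequencer_order(self, msg_id):
            # The authoritative order arrives lazily from the sequencer.
            pkt = self.by_id.pop(msg_id, None)
            if self.suspicious and self.suspicious[0] is pkt:
                self.app.confirm(self.suspicious.popleft())  # guess was right
                return
            while self.suspicious:         # guess was wrong: roll back
                self.app.revoke(self.suspicious.pop())
            if pkt is not None:
                if pkt in self.pending:
                    self.pending.remove(pkt)
                self.app.optdeliver(pkt)   # redeliver in the agreed order
                self.app.confirm(pkt)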
19. PLATO Design
20. Performance
(Figure: delivery latency, Fixed Sequencer vs. PLATO.)
- At small values of Δ: very low latency of delivery, but more rollbacks
21. Performance
- Latency of both Fixed Sequencer and PLATO decreases as throughput increases
22. Performance
- Traffic spike: PLATO is insensitive to data rate, while Fixed Sequencer depends on data rate
23. Performance
- Latency is as good as with static Δ parameterization
- Δ is varied adaptively in reaction to rollbacks (one possible rule is sketched below)
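One way such adaptation could look (the multiplicative-increase/additive-decrease rule here is an illustrative guess, not the paper's rule): widen Δ after a rollback to be more conservative, and shrink it slowly while deliveries keep confirming.

    class AdaptiveDelta:
        """Hypothetical controller for the suspicion threshold (µs)."""

        def __init__(self, delta_us=128.0, lo=16.0, hi=2048.0):
            self.delta_us, self.lo, self.hi = delta_us, lo, hi

        def on_rollback(self):
            # Mispredicted order: suspect more packet pairs next time.
            self.delta_us = min(self.delta_us * 2, self.hi)

        def on_confirm(self):
            # Guesses keep confirming: relax toward lower latency.
            self.delta_us = max(self.delta_us - 1.0, self.lo)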
24. Conclusion
- First optimistic total order protocol that predicts out-of-order delivery
- Slashes ordering latency in datacenter settings
- Stable at varying loads
- Ordering layer of a time-critical protocol stack for Datacenters