Title: Understanding the Limitations of Totally Ordered Communications
1 Understanding the Limitations of Totally Ordered Communications
- David Cheriton, Stanford University
- Dale Skeen, Teknekron Software Systems
- SOSP 1993
2 A Battle Fought with Technical Papers
- K. P. Birman, A. Schiper and P. Stephenson, Lightweight Causal and Atomic Group Multicast, ACM Transactions on Computer Systems, August 1991
- D. R. Cheriton and D. Skeen, Understanding the Limitations of Causally and Totally Ordered Communications, Symposium on Operating Systems Principles, December 1993
- K. Birman, A Response to Cheriton and Skeen's Criticism of Causal and Totally Ordered Communication, ACM SIGOPS Operating Systems Review, October 1993
- R. van Renesse, Causal Controversy at Le Mont St.-Michel, ACM SIGOPS Operating Systems Review, October 1993
3 The Motivation Behind the Battle
- Commercial interests!!
- State machine approach
  - Cheriton, Skeen: Teknekron Information Bus (TIB), TIBCO Software Inc., VITRIA
  - BASE and other Byzantine quorum systems (more for security)
- Virtual synchrony approach
  - K. Birman: ISIS Distributed Toolkit, Stratus Technologies Inc.
  - IBM WebSphere and Microsoft Windows Clusters products
- Both solutions aim at building loosely coupled distributed environments and addressing problems of distributed consistency, cooperative distributed algorithms and fault tolerance
4 Some Technical Trivia
- Virtual synchrony
  - Relaxes synchronization requirements wherever possible
  - Much higher performance (orders of magnitude)
  - Requires that a majority of the system remain connected; partitioning failures force protocols to wait for repair
- Quorum-based state machine protocols
  - Because of close synchrony, execution is limited by the slowest process in the group
  - Slower (by orders of magnitude)
  - Can sometimes make progress in partitioning situations where virtual synchrony can't
5 Organization
- Why is this important?
- Attacks on Causally and Totally Ordered Communication Support (CATOCS)
- Open House
6 Why are we studying this?
- Several important distributed computing concepts
  - Lamport logical clocks
  - Consistent cuts (stable state detection)
  - Chandy-Lamport distributed snapshots: determining global states of distributed systems
- The multitude of arguments and counterarguments strengthens insight into both systems
7 Recap: CATOCS
- Causally ordered communication: messages are delivered in an order consistent with potential causal dependencies between messages, following the logical clock model of imposing an overall partial ordering on events in a system.
- ABCAST: totally ordered communication support
- Atomicity of message delivery: messages are delivered to all or none
- Failure notification, which is causally ordered with respect to message traffic
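The causal delivery rule above can be illustrated with vector timestamps. The following is a minimal sketch in the style of vector-clock causal multicast, not the ISIS implementation; the class name `CausalReceiver`, its methods, and the message layout are assumptions made for illustration.

```python
from typing import List, Tuple

# (sender index, vector timestamp, payload); layout is illustrative
Message = Tuple[int, List[int], str]


class CausalReceiver:
    """One group member that delivers multicasts in causal order."""

    def __init__(self, group_size: int) -> None:
        # delivered[j] counts the messages from member j delivered so far
        self.delivered = [0] * group_size
        self.pending: List[Message] = []
        self.log: List[str] = []

    def _deliverable(self, sender: int, stamp: List[int]) -> bool:
        # Must be the next message from the sender ...
        if stamp[sender] != self.delivered[sender] + 1:
            return False
        # ... and every message it causally depends on must already be delivered
        return all(stamp[k] <= self.delivered[k]
                   for k in range(len(stamp)) if k != sender)

    def receive(self, sender: int, stamp: List[int], payload: str) -> None:
        self.pending.append((sender, stamp, payload))
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for msg in list(self.pending):
                if self._deliverable(msg[0], msg[1]):
                    self.pending.remove(msg)
                    self.delivered[msg[0]] += 1
                    self.log.append(msg[2])
                    progress = True


r = CausalReceiver(3)
r.receive(1, [1, 1, 0], "m2")  # m2 was sent after its sender delivered m1
r.receive(0, [1, 0, 0], "m1")
print(r.log)  # m2 is buffered until m1 arrives
```

Note that the receiver has to buffer m2 even though it arrived first; this buffering is what the later slides on false causality and scalability are concerned with.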
8 Questions and Claims
- Is there a class of applications for which these facilities are
  - Sufficient?
  - Efficient?
  - Not satisfied by alternative general-purpose mechanisms?
- Solving state problems at the communication level violates the end-to-end argument.
  - End-to-end argument: functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level (J. H. Saltzer, D. P. Reed and D. D. Clark, End-to-End Arguments in System Design)
  - CATOCS operates at the communication level, but consistency requirements are expressed in terms of application state
9 Issue of Durability
- Atomic message delivery: every process in a process group buffers the message until it is sure the message is stable (the sender has determined that all processes have received the message)
- This does not include processes that fail during a CATOCS multicast.
- In CATOCS, message delivery is atomic, NOT durable!
- An action is durable only if its changes survive failures and recoveries
10 Contd.
- If the sender fails during CATOCS protocol execution, before the message is stable, there is no guarantee that the remaining operational processes ever receive and deliver the message
- A process can send a message to its process group, receive and act on it locally, and then fail without any other member of the process group receiving the message, so its state will potentially be inconsistent with that of the other members.
- The lack of durable message delivery is (was?) a significant deficiency, and durability appears expensive to provide
11 Ordering
- In CATOCS: incidental ordering, based on incidents of communication in a process group
- This is not necessarily consistent with semantic ordering, which is determined by information in the message, e.g. notifications of changes to a database should be delivered in the order committed by the database system
- Prescriptive ordering: delivery order is based on ordering constraints explicitly specified by a process at the time it sends the message
- It is provided by using state-level clocks, temporal or logical, as an alternative to CATOCS
12 Limitation 1: Unrecognized Causality (can't say for sure)
- Causal relationships between messages at the semantic level cannot be recognized by CATOCS when they arise from communication channels that are
  - a) external, or
  - b) hidden
13 External Channel
- Unrecognized, and hence unenforced, causal relationships.
- The shared database orders all requests made to the shop floor control (SFC) system, but the ordering is unknown to the communication substrate
14 Hidden Channel
- Two concurrent threads in a process. The shared state of the address space constitutes the hidden channel.
- Thread 1 updates the shared state but, because of scheduling, is delayed in sending its multicast message. The second update, from thread 2, is multicast first and is delivered first by CATOCS, out of order with respect to the true causal dependency.
- Why CATOCS can't be used
  - Inter-thread communication via messages is not fast and adds unnecessary delays
  - Using a causal graph between threads would add code complexity and performance overhead
15 External Channel in a RTS
- Interactions in many applications take place through shared resources and external channels
- Solution: use prescriptive ordering, e.g. version numbers.
- That obviates the need for CATOCS
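As an illustration of prescriptive ordering with version numbers, the sketch below has the receiver keep only the update with the highest version, whatever order the messages arrive in. The class `VersionedReplica`, its `apply` method, and the "start"/"stop" payloads are hypothetical names chosen for the example; the version is assumed to be assigned by the shared resource (e.g. the database) when the update is made.

```python
class VersionedReplica:
    """Holds the most recent value of one item, ordered by version number."""

    def __init__(self) -> None:
        self.version = 0   # version of the value currently held
        self.value = None

    def apply(self, version: int, value: str) -> bool:
        """Apply an update; return False if it is stale or a duplicate."""
        if version <= self.version:
            return False
        self.version, self.value = version, value
        return True


replica = VersionedReplica()
# The later update (version 2) overtakes the earlier one in the network
accepted_late = replica.apply(2, "stop")
accepted_early = replica.apply(1, "start")
print(replica.value, accepted_late, accepted_early)
```

The ordering decision is made from state carried in the message, so no ordering guarantee is required from the communication layer.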
16 Limitation 2: Lack of Serialization (can't say together)
- Consider two processors updating shared memory. Consistency is ensured by locking.
- CATOCS causal ordering for delivery of the groups of messages corresponding to the updates is not sufficient.
- The additional mechanisms needed obviate CATOCS, since the relative ordering of updates from different processors becomes irrelevant in the face of locking.
- The same holds for transactional systems
17 Inability to Handle Higher-Level Error Conditions
- Consider updating replicated state distributed across a group of server processes when one or more servers reject an operation.
- A server may reject it because of lack of storage, protection problems, or state/application-level constraints
- Transaction models allow for abort and rollback
- Solutions with CATOCS: the rejecting process must effectively fail (expensive), or employ a separate rollback mechanism that obviates CATOCS.
18 Limitation 3: Unexpressed Semantic Ordering Constraints (can't say the whole story)
- Semantic ordering constraints
- The weakest constraint is causal memory, defined in M. Ahamad et al., Implementing and Programming Causal Distributed Shared Memory, ICDCS, 1991
- An abstraction that ensures that processes in a system agree on the relative ordering of operations that are causally related. Such memory consistency allows more concurrency than either atomic or sequentially consistent memories.
- This can't be ensured by causal multicast. It can be enforced by totally ordered multicast, which is more expensive than protocols using state-level logical clocks.
19 Limitation 4: Lack of Efficiency Gain over State-Level Techniques (can't say efficiently)
- CATOCS doesn't eliminate the need for prescriptive ordering
- CATOCS can delay messages based on false causality. The happens-before relationship indicates potential causality, not actual causality.
- If m1 is sent before m2 and m2 was not, in a semantic sense, caused by m1, then CATOCS reduces performance by unnecessarily delaying m2. Buffering costs are also involved
- What are the overheads of false causality?
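To get a feel for the cost of false causality, consider the simplest case: two semantically unrelated messages from one sender. Because the second is sent after the first, causal order forces it to wait for the first. The sketch below computes delivery times under that rule; the function name `causal_delivery_times` and the arrival times are made up for illustration.

```python
from typing import Dict, List, Tuple


def causal_delivery_times(arrivals: List[Tuple[int, float]]) -> Dict[int, float]:
    """Delivery time of each message from a single sender.

    arrivals holds (send sequence number, arrival time) pairs. A message
    can be delivered only once it and all earlier messages have arrived.
    """
    times: Dict[int, float] = {}
    latest = 0.0
    for seq, arrived in sorted(arrivals):
        latest = max(latest, arrived)
        times[seq] = latest
    return times


# m1 (seq 1) is lost and retransmitted, arriving at t=50;
# m2 (seq 2), unrelated to m1, arrives at t=5.
arrivals = [(1, 50.0), (2, 5.0)]
delivered = causal_delivery_times(arrivals)
extra_delay_m2 = delivered[2] - dict(arrivals)[2]
print(delivered, extra_delay_m2)
```

Here m2 is held for 45 time units and has to be buffered for that whole period, although nothing in the application required it to follow m1.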
20 Summary of Limitations
- (1) It can't say for sure: some causality might go unnoticed
- (2) It can't say the whole story: not all semantic ordering constraints can be expressed by the happens-before relationship that CATOCS enforces
- (3) It can't say together: a serializable ordering between operations that correspond to a group of messages cannot be ensured
- (4) It can't say efficiently: CATOCS offers no efficiency gain over state-level techniques, and it is far less scalable.
21 Classes of Distributed Applications
- Data dissemination applications
- Netnews (newsgroups)
  - False causality (4) between unrelated inquiries and their responses. A new causal group for each inquiry?
  - The state-machine-based approach has a globally unique ID for each inquiry and its corresponding response.
22 Trading Application Example
- (2) The semantic relationship is not captured by the CATOCS happens-before relationship
- (4) Each unique stock and instrument would have to be assigned its own group
- The state-machine-based approach puts version numbers on security prices.
23 Global Predicate Evaluation
- Detect stable conditions: deadlock detection, distributed garbage collection and such.
- Solution by R. van Renesse, Causal Controversy at Le Mont St.-Michel, Operating Systems Review: use a monitor process which maintains a wait-for graph, with deadlock detection at each message.
- For a two-phase protocol, a cyclic wait is a necessary and sufficient condition for a deadlock. Each process can multicast its wait-for graph, and no ordering properties are required. This can be periodic (at a lower frequency than per message)
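The order-independent approach can be sketched as follows: each process reports its local wait-for edges, the reports are merged in whatever order they arrive, and a cycle search is run on the result. This is a minimal illustration, not code from either paper; `merge_wait_for`, `has_cycle`, and the report format are assumptions.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # waiter -> set of processes it waits for


def merge_wait_for(reports: Iterable[Graph]) -> Graph:
    """Union the local wait-for graphs; the result does not depend on order."""
    graph: Graph = {}
    for report in reports:
        for waiter, holders in report.items():
            graph.setdefault(waiter, set()).update(holders)
    return graph


def has_cycle(graph: Graph) -> bool:
    """Depth-first search for a cyclic wait."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


reports = [{"p": {"q"}}, {"q": {"r"}}, {"r": {"p"}}]
deadlocked = has_cycle(merge_wait_for(reports))
same_if_reordered = has_cycle(merge_wait_for(reversed(reports)))
print(deadlocked, same_if_reordered)
```

Because a cyclic wait is a stable condition, merging the reports by set union is enough; the arrival order of the reports does not affect the outcome.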
24 Transactional Applications
[Figure: two-phase commit among processes p, q, r, s and t. Phase 1: the 2PC initiator asks "Vote?" and all vote commit. Phase 2: the initiator sends "Commit!".]
25 Contd.
- The prepare-to-commit phase of the protocol needs end-to-end acknowledgements.
- Because the commit protocol is executed by a single site (the commit coordinator), delivery of commit-phase messages is ordered by conventional transport mechanisms
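A minimal sketch of a two-phase commit coordinator makes both points concrete: the votes returned in phase 1 act as the end-to-end acknowledgements, and all phase 2 messages originate from the single coordinator. The `Participant` class and `two_phase_commit` function are hypothetical and omit timeouts, logging and recovery.

```python
from typing import List


class Participant:
    """A site taking part in the transaction (illustrative only)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # The returned vote doubles as an end-to-end acknowledgement
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        self.state = decision


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1: collect a vote from every participant
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: the single coordinator announces the outcome
    for p in participants:
        p.finish(decision)
    return decision


group = [Participant(n) for n in "pqrst"]
outcome = two_phase_commit(group)
print(outcome, [p.state for p in group])
```

No multicast ordering guarantee is used anywhere: the coordinator sends the decision only after it has every vote.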
26 Replicated Data
- Frequently cited in favor of CATOCS
- The asynchrony achieved by CATOCS is limited, as seen in the Deceit file system implemented on CATOCS.
- CBCAST waits for k acknowledgements with a so-called write safety level of k. A write safety level of 0 is asynchronous, but written data could be lost across replicas
- The HARP file server, based on highly optimized transaction techniques, is claimed to provide better performance than Deceit
27 Replication in the Large
- Large-scale naming services: Lampson's design.
- Resolving duplicate name bindings with an undo operation is preferable to having directory operations significantly delayed by message losses and reordering
28 Distributed Real-Time Applications
- Semantic relationships, e.g. a moving temperature sensor generating an alarm from a different sensor, are not understood by causality.
- No support for executing groups of operations at the same real time to achieve a desired effect: lighting the pilot should be grouped with opening the gas valve
- Delaying message delivery because of false causality detracts from the performance and correctness of a real-time system. In a monitored system, correctness is maximized by minimizing the difference between the computer-stored state and the actual state of the monitored system
- State-based systems use real-time timestamps and clock synchronization methods.
29 CATOCS Scalability
- Consider scaling a system of N processes using CATOCS.
- Active causal graph: messages are nodes, arcs denote the happened-before relationship, and nodes are deleted when messages are stable and delivered.
- Time T to propagate a message across the system
  - Grows roughly in proportion to the square root of the number of processes
- Number of nodes = (message rate per process) × (number of processes) × T
- Arcs in the active causal graph grow quadratically: a process that multicasts a new message after receiving a message potentially introduces N new arcs in the graph
- Quadratic causal graph
- The amount of buffering is proportional to the number of arcs
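The node count can be worked through numerically. The sketch below assumes, as the slide states, that T grows with the square root of N, and models it as T = c · sqrt(N); the constant `c`, the function name `active_nodes`, and the sample numbers are illustrative assumptions, not measurements.

```python
import math


def active_nodes(rate_per_process: float, n_processes: int, c: float) -> float:
    """Messages in the active causal graph: rate x N x T, with T = c * sqrt(N)."""
    propagation_time = c * math.sqrt(n_processes)
    return rate_per_process * n_processes * propagation_time


small = active_nodes(rate_per_process=10.0, n_processes=100, c=0.01)
large = active_nodes(rate_per_process=10.0, n_processes=400, c=0.01)
print(small, large, large / small)
```

Under this assumption, quadrupling the number of processes multiplies the number of active nodes by eight (N to the power 3/2), before any arcs are counted.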
30 Buffering
- Grows quadratically with the number of processes.
- It increases further if atomic message delivery is provided since, to protect against sender failure, each node must keep a copy of each message it references in any message it sends until the referenced message is stable.
- The message buffering is a valid point
- Is it true that buffering requirements are absent in state-based approaches, or are they merely moved?
31 Open House
32 Additional Reference
- Ken Birman's lecture notes on Virtual Synchrony in course CS614 at Cornell, prepared by Justin