Title: TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM
- L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, B. K. Budhia, C. A. Lingley-Papadopoulos
- University of California, Santa Barbara
INTRODUCTION
- Totem provides reliable, totally ordered multicasting of messages over LANs
- Intended for complex applications with critical requirements for
  - fault tolerance
  - real-time performance
- Exploits the hardware broadcast capability of most LANs
TOTEM SERVICES
- Built as a hierarchy of protocols:
  - Application layer
  - Process group interface
  - Multiple-ring protocol
  - Single-ring protocol
  - Physical medium
Single-ring protocol
- Built on top of a best-effort multicast service, using UDP to exploit the hardware broadcast capability of the LAN
- Converts these multicasts into a service of reliable, totally ordered message delivery on a single LAN
- Also provides fault detection, recovery and configuration change services
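The step from best-effort multicasts to reliable, ordered delivery can be sketched as a hold-back buffer keyed by sequence number. This is a minimal sketch under stated assumptions. The class name `ReceiveBuffer` and its methods are hypothetical. Each message is assumed to carry the ring-wide sequence number assigned by its sender. Retransmission itself is not modelled: `missing()` only reports the gaps that a processor would need to have resent.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiveBuffer:
    """Hypothetical per-processor buffer: multicasts may arrive out of order,
    duplicated, or not at all, but are delivered strictly by sequence number."""

    next_seq: int = 1                               # next sequence number to deliver
    highest_seen: int = 0                           # highest sequence number received so far
    pending: dict = field(default_factory=dict)     # seq -> payload, not yet delivered

    def receive(self, seq: int, payload: str) -> list:
        """Store one multicast and return every message that is now deliverable."""
        self.highest_seen = max(self.highest_seen, seq)
        if seq >= self.next_seq:                    # drop copies of already-delivered messages
            self.pending.setdefault(seq, payload)   # keep only the first copy
        delivered = []
        while self.next_seq in self.pending:        # deliver the contiguous prefix only
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> list:
        """Gaps that would have to be retransmitted before delivery can continue."""
        return [s for s in range(self.next_seq, self.highest_seen + 1)
                if s not in self.pending]
```

Delivering only the contiguous prefix is what turns an unreliable, reordering transport into an ordered one. A later message never overtakes a gap, and duplicates of messages already delivered are discarded.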
Multiple-ring protocol
- Uses information from the process group interface above it
- Provides total ordering of messages as well as network topology maintenance services
Process group interface
- Delivers messages to the application processes in the appropriate process groups
- Provides process group membership services
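A rough sketch of this layer follows. It assumes that each message names its destination groups and arrives from the layer below already in total order. `ProcessGroupInterface`, `join`, `leave` and `deliver` are hypothetical names chosen for illustration; they are not Totem's interface.

```python
from collections import defaultdict


class ProcessGroupInterface:
    """Hypothetical per-processor layer that hands totally ordered messages
    to the local application processes in the destination groups."""

    def __init__(self) -> None:
        self.members: dict = defaultdict(set)    # group -> local member processes
        self.inboxes: dict = defaultdict(list)   # process -> messages delivered to it

    def join(self, process: str, group: str) -> None:
        self.members[group].add(process)

    def leave(self, process: str, group: str) -> None:
        self.members[group].discard(process)

    def deliver(self, msg: str, dest_groups: list) -> None:
        """Called once per message, in total order, by the layer below.
        A process in several destination groups receives the message once."""
        recipients = set()
        for group in dest_groups:
            recipients |= self.members.get(group, set())
        for process in sorted(recipients):
            self.inboxes[process].append(msg)
```

This layer only filters a stream that is already totally ordered. As a result, two processes that share several groups still see their common messages in the same relative order.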
Services provided by Totem
- Two reliable, totally ordered message delivery services:
  - Agreed delivery
  - Safe delivery
- Both services deliver messages in a single system-wide total order that respects Lamport's causal order
Agreed Delivery
- Guarantees that a processor will not deliver a message before it has delivered all prior messages that
  - have been issued by processors in the current configuration, and
  - have timestamps within the duration of that configuration
- All processors deliver all messages in the same total order, consistent with the order in which they were sent
Safe Delivery
- Further guarantees that a processor will not deliver a message unless all processors in its configuration have received it (everyone or nobody)
- All processors deliver all messages in the same order, and only after every processor in the configuration has received them
Why Lamport's causal order?
- Otherwise, processes that belong to two or more groups could receive messages from different groups in different orders
  - A and B are both in groups G and H
  - A receives m1 from group G, then m3 from group H, and finally m2 from group G
  - B could receive m3 from group H, then m1 from group G, and finally m2 from group G
Example
Group G sends messages m1 and m2 to A and B
Group H sends message m3 to A and B
Both A and B will receive m1, m2 and m3 in the same order
Without total ordering, A could receive m1 before m3, and B could receive m3 before m1
Delivery guarantees
- Extended virtual synchrony ensures that these guarantees are honored within every configuration
- When a fault occurs, Totem forms a transitional configuration with a reduced membership
- Message order is guaranteed even in the presence of network partitions
Extended virtual synchrony (I)
- We want to ensure that
  - messages are received in the same order by all processes
  - all processes share the same view of the process group to which they belong
Extended virtual synchrony (II)
- The virtual synchrony model (K. Birman, ISIS) orders group membership changes along with the regular messages
- Ensures that failures do not result in
  - incomplete delivery of multicast messages
  - holes in the causal delivery order
- Problems remain if the network can partition
Extended virtual synchrony (III)
- The extended virtual synchrony model (Totem) extends the virtual synchrony model to systems in which
  - processes can fail and recover
  - the network can partition and remerge
- Guarantees that messages delivered in two or more components of a partitioned network are delivered in a consistent order in all of these components
Ordering of messages
- Messages are born-ordered
- Each message includes a timestamp
- The relative order of messages is determined by the messages themselves, as created by their senders
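Born ordering can be illustrated with Lamport clocks, which the multiple-ring protocol uses for its timestamps. The class `LamportClock` is hypothetical. Breaking timestamp ties by processor identifier is an assumption of this sketch, made so that the `(time, pid)` pairs form a single total order.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Hypothetical per-processor Lamport clock used to stamp outgoing messages."""

    pid: int        # processor identifier, used only to break timestamp ties (assumption)
    time: int = 0

    def stamp(self) -> tuple:
        """Timestamp for a new message: advance the clock, then stamp."""
        self.time += 1
        return (self.time, self.pid)

    def observe(self, ts: tuple) -> None:
        """On receipt of a message, move the clock past its timestamp."""
        self.time = max(self.time, ts[0]) + 1


a, b = LamportClock(pid=1), LamportClock(pid=2)
m1 = a.stamp()          # sent by processor 1
b.observe(m1)           # processor 2 receives m1 ...
m2 = b.stamp()          # ... so m2 causally follows m1
m3 = a.stamp()          # concurrent with m2
```

Here m2 causally follows m1 and therefore carries a larger timestamp, while m3 is concurrent with m2. The tuple comparison still gives every message a fixed position, and that position depends only on data the sender put in the message.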
The single-ring protocol (I)
- Uses a circulating token that contains, among other fields,
  - a seq field with the sequence number of the last message that was sent
  - an aru field with the sequence number up to which all messages have been received by all processors
- Only the processor that holds the token can send a message
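A token rotation can be sketched as follows. This is a simplification. `Token`, `rotate` and the `max_per_visit` limit (a hypothetical stand-in for Totem's flow control) are illustrative names. The token is modelled as a shared object rather than a network message, and only the `seq` field is tracked.

```python
from dataclasses import dataclass


@dataclass
class Token:
    seq: int = 0   # sequence number of the last message sent on the ring


def rotate(ring: list, backlog: dict, token: Token, max_per_visit: int = 2) -> list:
    """Pass the token once around the ring. Only the processor currently
    holding the token sends, and each message takes the next sequence number."""
    sent = []
    for holder in ring:                        # the token visits processors in ring order
        queue = backlog[holder]
        for _ in range(min(max_per_visit, len(queue))):
            token.seq += 1                     # take the next number from the token
            sent.append((token.seq, holder, queue.pop(0)))
    return sent
```

Only the token holder sends, and each message takes the next value of `seq`. The sequence numbers alone therefore define one total order for the ring.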
The single-ring protocol (II)
- The aru field is used to implement safe delivery
  - It tells processors which messages have been received by every processor on the ring
- The token also provides information about the aggregate message backlog of the processors on the ring
  - This results in a fairer allocation of bandwidth among processors than in FDDI
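The role of aru in safe delivery can be sketched by computing it directly from the messages each processor holds. The function names and example data are hypothetical. Taking the minimum of every processor's local aru is a simplification: the real token field is lowered and raised as the token circulates.

```python
def local_aru(received: set) -> int:
    """Highest n such that messages 1..n have all been received."""
    n = 0
    while n + 1 in received:
        n += 1
    return n


def ring_aru(received_by: dict) -> int:
    """Simplified token aru: the lowest local aru on the ring."""
    return min(local_aru(r) for r in received_by.values())


def safe_deliverable(received: set, aru: int) -> list:
    """Messages held locally that every processor is known to have received."""
    return sorted(s for s in received if s <= aru)


# Hypothetical state of a three-processor ring
received_by = {"P1": {1, 2, 3, 4}, "P2": {1, 2, 4}, "P3": {1, 2, 3}}
aru = ring_aru(received_by)
```

P1 holds messages 1 to 4 and could deliver all of them as agreed messages. However, only messages 1 and 2 are known to have reached every processor, so only those may be delivered as safe.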
Local membership protocol (I)
- Part of the single-ring protocol
- Allows
  - inclusion of new or recovering processors
  - deletion of faulty processors
Local membership protocol (II)
- Ensures
  - consensus: all members of a configuration agree on the configuration membership
  - termination: each configuration is installed on every processor within a bounded time, or not at all
The multiple-ring protocol (I)
- Operates over several LANs linked by gateways
- Each LAN is organized as a virtual token ring and managed by the single-ring protocol
- Offers the same services and the same guarantees as the single-ring protocol
[Figure: Ring A, Ring B and Ring C]
The multiple-ring protocol (II)
- Uses Lamport's timestamps and delivers messages in timestamp order
- When a gateway forwards a message from one ring to another, it gives the message a new sequence number for the new ring
- Processor faults and network partitions are detected by the single-ring protocol
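The renumbering step can be sketched as follows. The sketch assumes that the sender's timestamp travels unchanged and that only the ring-local sequence number is replaced, which is what delivery in timestamp order implies. `Message`, `forward` and the `ring_seq` dictionary (a stand-in for each ring's token) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    ring: str        # ring on which this copy is sequenced
    seq: int         # sequence number on that ring
    timestamp: int   # Lamport timestamp set by the original sender
    payload: str


def forward(msg: Message, dest_ring: str, ring_seq: dict) -> Message:
    """Re-sequence a message for dest_ring; ring_seq stands in for each ring's token."""
    ring_seq[dest_ring] += 1                   # next sequence number on the new ring
    return replace(msg, ring=dest_ring, seq=ring_seq[dest_ring])   # timestamp kept
```

The sequence number keeps delivery reliable and ordered within each ring. The untouched timestamp is what lets processors on any ring merge messages from different rings into one order.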
Message delivery (I)
- For each ring from which it can receive messages, each processor maintains a recv_msgs list of messages received but not yet delivered
Message delivery (II)
- A processor delivers a message as an agreed message as soon as
  - the message has the lowest timestamp of all the messages in its recv_msgs lists, and
  - none of these lists is empty
- Were it not for guaranteed vector messages, a processor could wait forever for rings that have no messages to send
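The delivery rule can be sketched as a merge over the per-ring lists. The sketch rests on several assumptions. Each recv_msgs list is assumed to be sorted by timestamp, ties are broken by ring name, and the timestamps are hypothetical. Names beginning with `gvm` stand for guaranteed vector messages: they take part in the ordering but are not passed to the application.

```python
from collections import deque


def deliver_agreed(recv_msgs: dict) -> list:
    """Deliver the lowest-timestamp message across all recv_msgs lists,
    repeatedly, stopping as soon as any list is empty.
    Each list holds (timestamp, name) pairs in timestamp order."""
    delivered = []
    while recv_msgs and all(recv_msgs.values()):     # no list may be empty
        # ring whose head message has the lowest timestamp (ties: ring name)
        ring = min(recv_msgs, key=lambda r: (recv_msgs[r][0][0], r))
        _, name = recv_msgs[ring].popleft()
        if not name.startswith("gvm"):               # control message: ordered, not delivered
            delivered.append(name)
    return delivered


recv_msgs = {
    "A": deque([(3, "mA1"), (13, "mA2")]),
    "B": deque([(5, "mB1"), (12, "mB2")]),
    "C": deque([(7, "mC1")]),
}
first = deliver_agreed(recv_msgs)        # ["mA1", "mB1", "mC1"]
recv_msgs["C"].append((14, "gvm-C"))     # ring C has nothing to send but reports in
second = deliver_agreed(recv_msgs)       # ["mB2"]
```

The first call stops after mC1 because ring C's list is then empty. A later message from ring C could carry a timestamp such as 11, which would precede mB2 and mA2. The guaranteed vector message from ring C then unblocks delivery of mB2.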
Example (I)
- Consider the following recv_msgs lists, where the numbers in parentheses indicate the message timestamps.
Example (II)
- We can deliver messages mA1, mB1 and mC1
Example (III)
- We cannot deliver any more messages, because we might miss a message mCn from ring C with an earlier timestamp, say ts 11.
Related issues
- Totem periodically sends guaranteed vector messages
  - They specify, among other things, which rings have sent messages
- Processor faults and network partitions are detected by the single-ring protocol
- Configuration and topology change messages have timestamps and are delivered in strict timestamp order