Title: Revisiting failure detectors
1Revisiting failure detectors
- Some of you asked about implementing consensus using S: how does it differ from reaching consensus using P? Here it is.
- Recall the definition of the S (strong) failure detector: strong completeness and weak accuracy.
2Consensus using S
- Program for process p (vector entries are indexed 1..n; ⊥ denotes an undefined entry)
- init Vp := (⊥, ⊥, ..., ⊥); Vp[p] := input of p; Dp := Vp
- (Phase 1) Same as phase 1 of consensus with P
- (Phase 2)
- send (Vp, p) to all
- receive (Vq, q) from all q, or q is a suspect
- k := 1
- do k ≤ n →
- if ∃ Vq[k] : Vp[k] ≠ ⊥ ∧ Vq[k] = ⊥ → Vp[k] := ⊥; Dp[k] := ⊥ fi
- k := k + 1
- od
- (Phase 3)
- Decide on the first element Vp[j] : Vp[j] ≠ ⊥
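Phases 2 and 3 above can be sketched in Python. This is a minimal illustration, not the lecture's own code: `BOTTOM`, `phase2`, and `phase3` are hypothetical names, `None` stands in for ⊥, and the vectors in the usage note are made-up values. Phase 1 and the failure detector itself are assumed to have run already.

```python
from typing import Dict, List, Optional

BOTTOM = None  # stands for the undefined entry (⊥) of a vector

Vector = List[Optional[int]]


def phase2(v_p: Vector, received: Dict[int, Vector]) -> Vector:
    """Blank out every entry that some received vector does not know.

    `received` holds the vectors of all processes that answered;
    suspected processes simply have no entry in it.
    """
    result = list(v_p)
    for k in range(len(result)):
        if any(v_q[k] is BOTTOM for v_q in received.values()):
            result[k] = BOTTOM
    return result


def phase3(v_p: Vector) -> Optional[int]:
    """Decide on the first entry that is not BOTTOM (None if there is none)."""
    for value in v_p:
        if value is not BOTTOM:
            return value
    return BOTTOM
```

With the hypothetical vectors `v0 = [10, None, 30, 40, None]`, `v1 = [10, 20, None, 40, None]`, and `v2 = [10, 20, 30, 40, None]`, each process ends Phase 2 with the same vector, because every entry missing anywhere is blanked everywhere, and Phase 3 then picks the same first element.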
3Example
[Figure: five processes, numbered 0 to 4. For each process the figure shows its list of suspects, its vector V after Phase 1, and its vector V after Phase 2. One process has crashed, and one process is never suspected.]
4Atomic Commit Protocols
- Network of servers (S1, S2, S3 in the figure). Servers may crash.
- The initiator of a transaction is called the coordinator, and the remaining servers are participants.
5Requirements of Atomic Commit Protocols
- Network of servers (S1, S2, S3). Servers may crash.
- Termination. All non-faulty servers must eventually reach an irrevocable decision.
- Agreement. If any server decides to commit, then every server must have voted to commit.
- Validity. If all servers vote commit and there is no failure, then all servers must commit.
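The agreement and validity requirements can be written as checks over one finished run. The sketch below is only an illustration under assumed conventions: votes are the strings "yes" / "no", decisions are "commit" / "abort", and the function names are hypothetical. Termination is a liveness property and cannot be checked on a finite record this way.

```python
from typing import Dict


def agreement_holds(votes: Dict[str, str], decisions: Dict[str, str]) -> bool:
    """If any server decided commit, every server must have voted yes."""
    if "commit" in decisions.values():
        return all(v == "yes" for v in votes.values())
    return True


def validity_holds(votes: Dict[str, str], decisions: Dict[str, str],
                   failure_occurred: bool) -> bool:
    """If all voted yes and nothing failed, every server must commit."""
    if all(v == "yes" for v in votes.values()) and not failure_occurred:
        return all(d == "commit" for d in decisions.values())
    return True
```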
6One-phase Commit
[Figure: a client, a coordinator server, and three participant servers. The coordinator sends Commit / abort to the participants.]
If a participant deadlocks or runs into a problem, the coordinator may never find out about it. Too simplistic.
7Two-phase commit (2PC)
- Phase 1. The coordinator sends VOTE to the participants and receives yes / no from them.
- Phase 2.
- if ∀ server j : vote(j) = yes → multicast COMMIT to all servers
- [] ∃ server j : vote(j) = no → multicast ABORT to all servers
- fi
- What if failures occur?
8Failure scenarios in 2PC
- (Phase 1)
- Fault: the coordinator did not receive YES / NO, or a participant did not receive VOTE.
- Solution: broadcast ABORT and abort local transactions.
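The coordinator's decision rule, including the Phase 1 failure case, can be sketched as follows. This is a minimal sketch under stated assumptions: a missing reply (timeout) is represented by `None`, and `coordinator_decision` is a hypothetical name, not part of any library.

```python
from typing import Dict, Optional


def coordinator_decision(votes: Dict[str, Optional[str]]) -> str:
    """Return the message the coordinator multicasts in Phase 2.

    `votes` maps each participant to "yes", "no", or None, where None
    means no reply arrived before the timeout (assumed encoding).
    """
    if all(v == "yes" for v in votes.values()):
        return "COMMIT"
    # A "no" vote or a missing vote both lead to ABORT.
    return "ABORT"
```

Treating a missing vote like a "no" is safe because no participant can have committed before Phase 2 begins.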
9Failure scenarios in 2PC
- (Phase 2)
- (Fault) A participant does not receive a COMMIT or ABORT message from the coordinator (the coordinator may have crashed after sending ABORT or COMMIT to only a fraction of the servers). The participant then remains undecided until the coordinator is repaired and reinstalled into the system.
- This blocking is a known weakness of 2PC.
10Coping with blocking in 2PC
- A non-faulty participant can ask the other participants which message (COMMIT or ABORT) they received from the coordinator, and take appropriate action.
- But what if no non-faulty participant received anything? Who knows whether the coordinator committed or aborted the local transaction before crashing? Continue to wait.
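The "ask the other participants" step can be sketched as a small function. The names and the encoding (`None` for a peer that received nothing, "WAIT" for the blocked outcome) are assumptions made for illustration only.

```python
from typing import List, Optional


def resolve_by_asking(peer_answers: List[Optional[str]]) -> str:
    """Adopt the coordinator's decision if any reachable peer has seen it.

    Each answer is "COMMIT", "ABORT", or None (the peer received nothing).
    """
    for answer in peer_answers:
        if answer in ("COMMIT", "ABORT"):
            return answer
    # Nobody knows the outcome: the participant stays blocked.
    return "WAIT"
```

The "WAIT" branch is exactly the blocking case: the function has no safe decision to return when every peer is as uninformed as the caller.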
11Non-blocking Atomic Commit
- A blocking protocol has the potential to prevent non-faulty participants from reaching a final decision.
- A solution to the atomic commitment problem is called non-blocking if, in spite of server crashes, every non-faulty participant eventually decides.
- One solution is to impose the requirement of uniform agreement.
12Uniform agreement
- If any participant (faulty or not) delivers a message m (commit or abort), then all correct processes eventually deliver m.
- To implement uniform agreement, no server should deliver a COMMIT or ABORT message until it has relayed it to all other servers.
- If a process times out in phase 2, then it decides abort.
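The relay-before-deliver rule can be illustrated with a toy, single-threaded simulation. Everything here is a simplifying assumption: the `Server` class is hypothetical, message passing is modeled as a direct method call, and crashes are not simulated.

```python
from typing import List, Optional


class Server:
    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Server"] = []
        self.seen = False                      # has the decision arrived yet?
        self.delivered: Optional[str] = None   # decision acted upon, if any

    def receive(self, decision: str) -> None:
        if self.seen:
            return                    # already handling this decision
        self.seen = True
        for peer in self.peers:       # relay to all other servers first ...
            peer.receive(decision)
        self.delivered = decision     # ... and only then deliver

    def on_phase2_timeout(self) -> None:
        if self.delivered is None:
            self.delivered = "ABORT"  # timed out in phase 2: decide abort
```

Because delivery happens only after the relay loop, a server that has delivered a decision has already passed it on to every other server.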
13Recovery: Stable storage
Stable storage creates the illusion of incorruptible storage, even if a writer or a disk crashes at any time.
The implementation uses at least two independent disks.
[Figure: two disks, A0 and A1, accessed through update and inspect operations.]
14Stable storage
- To write, do the following
- copy on disk A0
- record timestamp T0
- compute checksum S0
- copy on disk A1
- record timestamp T1
- compute checksum S1
- Readers check four cases
- Both checksums OK and T1 > T0
- Both checksums OK and T1 < T0
- Checksum on A0 wrong
- Checksum on A1 wrong
- (Which copy to accept in each case?)
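The write and read rules can be sketched with two in-memory copies. This is one plausible answer to the question of which copy to accept, not the lecture's stated answer. Assumptions: a logical counter replaces real timestamps, CRC-32 from `zlib` serves as the checksum, and the class and method names are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Copy:
    data: bytes = b""
    timestamp: int = 0
    checksum: int = 0  # CRC-32 of b"" is 0, so an empty copy is valid

    def ok(self) -> bool:
        return zlib.crc32(self.data) == self.checksum


class StableStorage:
    def __init__(self) -> None:
        self.a0 = Copy()
        self.a1 = Copy()
        self.clock = 0  # logical time, assumed instead of wall-clock time

    def write_copy(self, copy: Copy, data: bytes) -> None:
        self.clock += 1
        copy.data = data
        copy.timestamp = self.clock
        copy.checksum = zlib.crc32(data)

    def write(self, data: bytes) -> None:
        self.write_copy(self.a0, data)  # A0 first
        self.write_copy(self.a1, data)  # then A1

    def read(self) -> bytes:
        ok0, ok1 = self.a0.ok(), self.a1.ok()
        if ok0 and ok1:
            # T1 > T0: the write completed, both copies hold the same data.
            # T1 < T0: the writer crashed after updating A0; A0 is newer.
            newer = self.a0 if self.a0.timestamp > self.a1.timestamp else self.a1
            return newer.data
        if ok0:
            return self.a0.data  # checksum on A1 wrong
        if ok1:
            return self.a1.data  # checksum on A0 wrong
        raise IOError("both copies are corrupted")
```

Calling `write_copy` on A0 alone imitates a writer that crashes between the two disk updates; `read` then returns the newer A0 contents.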
15Checkpointing
- Mechanism for (backward) error recovery. Transaction states are periodically stored on stable storage. Following a failure, the transaction rolls back to the nearest checkpoint.
- Independent (unsynchronized) or coordinated (synchronized) checkpointing.
16Classification of checkpointing
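Coordinated checkpointing takes a consistent snapshot, which has some overhead.
Uncoordinated checkpointing apparently has no such overhead, but it may have efficiency problems: independently taken checkpoints need not form a consistent global state, so recovery can force cascading rollbacks (the domino effect).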
17Checkpointing (continued)
- Some actions can be reversed, but some cannot (like dispensing cash from an ATM or printing a document).
- Such actions are logged, and during replay the logs substitute for the real actions.
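The idea of substituting logs for real actions during replay can be sketched as follows. The `Transaction` class, its `dispense` callback, and the in-memory log are hypothetical; in practice the log would have to live on stable storage to survive the failure.

```python
from typing import Callable, List


class Transaction:
    def __init__(self, dispense: Callable[[int], int]) -> None:
        self.dispense = dispense   # the irreversible real-world action
        self.log: List[int] = []   # outcomes of actions already performed
        self.position = 0          # next log entry to consult

    def dispense_cash(self, amount: int) -> int:
        if self.position < len(self.log):
            # Replay after a rollback: reuse the logged outcome.
            result = self.log[self.position]
        else:
            # First execution: perform the action and log its outcome.
            result = self.dispense(amount)
            self.log.append(result)
        self.position += 1
        return result

    def rollback(self) -> None:
        """Return to the checkpoint; the log is kept."""
        self.position = 0
```

After `rollback()`, repeating the same call returns the logged result without dispensing cash a second time, while a genuinely new call still reaches the real action.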
18Group Communication
- Group oriented activities are steadily increasing.
- There are many types of groups:
- Open and closed groups
- Peer-to-peer and hierarchical groups
19Major issues
- Atomic multicast
- Ordered multicast
- Dynamic groups
- Failure handling
20Atomic multicast
- A multicast is called atomic when the message is delivered to every correct (i.e. functioning) member, or to no member at all.
- Sometimes, certain features available in the infrastructure of a distributed system simplify the implementation of multicast. Examples are (1) multicast on an Ethernet LAN and (2) IP multicast.
21Basic vs. reliable multicast
- Basic multicast does not consider crash failures.
- Reliable multicast does.
- Three criteria for basic multicast:
- Liveness. Each process must receive every message.
- Integrity. No spurious message is received.
- No duplicates. Each process accepts exactly one copy of a message.
22Reliable atomic multicast
- Sender's program:
- i := 0
- do i ≠ n →
- send message to i
- i := i + 1
- od
- Receiver's program:
- if m is new → accept it; multicast m
- [] m is duplicate → discard m
- fi
Tolerates process crashes.
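The sender and receiver programs can be tried out in a toy simulation. The sketch makes several assumptions that are not in the slides: message passing is a direct method call, a crash is modeled by the hypothetical `crash_after` parameter (the sender stops after that many sends), and the sender counts its own message as accepted.

```python
from typing import List, Optional


class Process:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.group: List["Process"] = []  # all members, set after creation
        self.accepted: List[str] = []     # messages accepted so far
        self.crashed = False

    def multicast(self, m: str, crash_after: Optional[int] = None) -> None:
        """Send m to the other members, one at a time."""
        if m not in self.accepted:
            self.accepted.append(m)       # the sender knows its own message
        others = [q for q in self.group if q is not self]
        for sent, member in enumerate(others):
            if sent == crash_after:
                self.crashed = True       # simulated crash in mid-multicast
                return
            member.receive(m)

    def receive(self, m: str) -> None:
        if self.crashed or m in self.accepted:
            return                        # duplicate: discard m
        self.accepted.append(m)           # new: accept it ...
        self.multicast(m)                 # ... and multicast it again
```

Even if the sender crashes after reaching a single member, that member relays the message, so every functioning member still accepts exactly one copy.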