Title: Impossibility of Distributed Consensus with One Faulty Process
1. Impossibility of Distributed Consensus with One Faulty Process
- Michael J. Fischer
- Nancy A. Lynch
- Michael S. Paterson
- Presentation by Scott McManus
- February 14, 2007
2. Overview
- Consider the general problem of transaction management.
- A system model is described that makes the impossibility theorem as widely applicable as possible.
- Consensus protocols are generalized and simplified, and (a lot of) definitions are given.
- The lemmas and the impossibility theorem are given.
- A problem is defined which is solvable as long as a majority of the processes are not faulty.
3. Problem Definition and Result
- Remote processes want to reach agreement.
- In the database transaction commit problem, processes want to know whether to commit or discard results.
- This is straightforward if processes and the network are completely reliable, but real systems suffer process crashes, network partitioning, and other faults.
- Surprising result: no completely asynchronous commit protocol can tolerate even a single unannounced process death.
4. Problem Solution?
- The problem considers the crash failure in Schneider's failure models, where a processor's halting may not be detectable.
- It does not even go so far as to assume that faults are Byzantine, so this is an even simpler model. It assumes channels are reliable.
- Complete lack of information to processes
  - No assumptions about delays or relative speeds are made, and there is no information about process availability.
5. System Model
- The goal is to make the result as widely applicable as possible.
- Processes are just state machines.
- No timeouts or time information is available.
- Lamport clocks are also excluded, as they could allow messages to be logically ordered.
- Just one process must make a decision.
- The process must decide on whether a bit is 0 or 1.
- Processes communicate with messages.
- No assumptions are made about message length, latency, or ordering.
6. Why does only one process have to reach a decision?
- We are running a consensus protocol.
- Eventually, one process will reach a decision.
- The protocol will then gather consensus with the other processes.
- So we only consider a single process to make the model as widely applicable as possible.
7. Computation Model
- A simple computation model is used.
- In one atomic step, a process receives a message (if one is delivered), performs some computation based on it, and sends a set of messages.
- Atomic broadcast must be available: once one nonfaulty process receives a message, all nonfaulty processes receive it.
8. Consensus Protocols - Definitions
- There are many components and definitions for the sake of rigor. Many are intuitive, but a few to watch for are:
  - partially correct consensus protocol
  - admissible run
  - deciding run
  - totally correct in spite of one fault
9. Definitions - Input/Output registers
- Input and output registers for a single bit are used.
- The proof uses only a single bit so that it can be generalized to any system with two decisions to make (without complicating the terminology).
- Once an output register is written, it cannot be changed (so a decision has been made).
- An output register has the special value b to indicate that no decision has been made.
10. Definitions (Cont.)
- Internal State: input register, output register, program counter, internal storage
- Initial State
  - Initial Internal State
  - Input register is unspecified (so no input yet)
  - Output register uses special symbol b to indicate that no decision has been made. (Otherwise, an arbitrary decision will be forced at initialization.)
- Transition Function: describes how states are transitioned
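As an illustration of the state components listed above, here is a minimal Python sketch. The names ProcessState, decide, and BLANK are hypothetical and chosen for this example only; the behaviour taken from the slides is that the output register starts at b and can be written exactly once.

```python
from dataclasses import dataclass, field

BLANK = "b"  # special output value: no decision has been made yet


@dataclass
class ProcessState:
    """Hypothetical container for one process's internal state."""
    input_register: int                  # the single input bit (0 or 1)
    output_register: object = BLANK      # starts undecided
    program_counter: int = 0
    storage: dict = field(default_factory=dict)

    def decide(self, value: int) -> None:
        """Write the output register; a second write is rejected."""
        if value not in (0, 1):
            raise ValueError("decision must be 0 or 1")
        if self.output_register != BLANK:
            raise RuntimeError("output register is write-once")
        self.output_register = value


state = ProcessState(input_register=1)
state.decide(0)
```

Starting the register at BLANK rather than at 0 or 1 is what keeps initialization from forcing an arbitrary decision.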
11. Definitions (Cont.)
- Configuration: the internal state of every process plus the contents of the message buffer.
- Step: receipt of a message and a transition of the internal state.
- Event: receipt of a message by a process.
- Schedule: a sequence of events applied from a starting configuration.
- The configuration that results from applying a schedule is reachable from the starting configuration.
- A configuration reachable from some initial configuration is accessible.
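The definitions above can be sketched in Python. This is a toy model under stated assumptions, not the paper's formalism: process states are plain integers, the type aliases and the functions make_config, apply_event, apply_schedule, and toy_step are hypothetical names, and the transition function is invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Message = Tuple[str, int]            # (destination process, value)
Event = Tuple[str, Optional[int]]    # (process, delivered value or None)
Config = Tuple[Tuple[Tuple[str, int], ...], Tuple[Message, ...]]
Transition = Callable[[str, int, Optional[int]], Tuple[int, List[Message]]]


def make_config(states: Dict[str, int], buffer: List[Message]) -> Config:
    """Canonical form (sorted) so that equal configurations compare equal."""
    return tuple(sorted(states.items())), tuple(sorted(buffer))


def apply_event(config: Config, event: Event, step: Transition) -> Config:
    """One step: deliver a message (or none) to a process and update its state."""
    states, buffer = dict(config[0]), list(config[1])
    proc, value = event
    if value is not None:
        buffer.remove((proc, value))  # raises ValueError if not in the buffer
    new_state, sent = step(proc, states[proc], value)
    states[proc] = new_state
    return make_config(states, buffer + sent)


def apply_schedule(config: Config, schedule: List[Event], step: Transition) -> Config:
    """Apply a finite schedule; the result is reachable from `config`."""
    for event in schedule:
        config = apply_event(config, event, step)
    return config


def toy_step(proc: str, state: int, value: Optional[int]) -> Tuple[int, List[Message]]:
    """Invented transition: add any delivered value to the state, send nothing."""
    return state + (value or 0), []


start = make_config({"p": 0, "q": 0}, [("p", 5), ("q", 7)])
end = apply_schedule(start, [("p", 5), ("q", None)], toy_step)
```

Here `end` is reachable from `start`, and it would be accessible if `start` were an initial configuration.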
12. Schedule Commutativity Lemma
- Lemma 1: Suppose two schedules s1 and s2 are applied to the same configuration C, leading to configurations C1 and C2, and the sets of processes taking steps in s1 and s2 are disjoint (say, only p1 takes steps in s1 and only p2 takes steps in s2). Then applying s2 to C1 and s1 to C2 yields the same configuration.
- In other words, two nonconflicting schedules can be applied in any order.
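The lemma can be checked on a small example. The sketch below is illustrative only: the two-process system, the function run, and the toy transition (accumulate the delivered value, then report the new total to the peer) are assumptions made for this example and do not come from the paper.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, int]            # (destination process, value)
Event = Tuple[str, Optional[int]]    # (process, delivered value or None)


def run(states: Dict[str, int], buffer: List[Message], schedule: List[Event]):
    """Apply events in order to a copy of (states, buffer); return a canonical result."""
    states, buffer = dict(states), list(buffer)
    for proc, value in schedule:
        if value is not None:
            buffer.remove((proc, value))       # deliver one buffered message
        states[proc] += value or 0             # toy transition: accumulate
        other = "q" if proc == "p" else "p"
        buffer.append((other, states[proc]))   # toy transition: report total to peer
    return sorted(states.items()), sorted(buffer)


start_states = {"p": 0, "q": 0}
start_buffer = [("p", 1), ("q", 2)]
s1 = [("p", 1), ("p", None)]   # only p takes steps
s2 = [("q", 2)]                # only q takes steps

c_12 = run(start_states, start_buffer, s1 + s2)   # s2 applied after s1
c_21 = run(start_states, start_buffer, s2 + s1)   # s1 applied after s2
print(c_12 == c_21)
```

The two orders agree because each schedule only consumes messages addressed to its own process and only changes that process's state; messages produced by the other schedule simply wait in the buffer.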
13. More Definitions
- A decision value v is reached if some process's output register records v.
- Partially correct consensus protocol
  - No accessible configuration has more than one decision value. (I.e., processes never decide on different values.)
  - For each decision value, some accessible configuration has an output register with that decision value.
- In other words, the first condition says that decisions never conflict, and the second says that both decisions are possible somehow.
14. What Else - More Definitions!
- Nonfaulty
  - A process is nonfaulty if it takes infinitely many steps.
- Faulty
  - A process is faulty if it is not nonfaulty.
  - So processes that halt are faulty. Note that making a decision is not modelled as requiring a halt.
- Admissible run
  - At most one process is faulty, and all nonfaulty processes eventually receive all messages sent to them.
  - But an admissible run need not reach a decision!
- Deciding run
  - Some process reaches a decision state in the run.
15. One More Definition!
- A consensus protocol is totally correct in spite of one fault if
  - the protocol is partially correct
  - every admissible run is a deciding run
- In other words
  - each configuration yields an unambiguous decision (if any)
  - each decision can be reached in some accessible configuration
  - whenever at most one process is at fault and all nonfaulty processes eventually receive their messages, some process decides.
- So the protocol forces a decision in any such situation.
16. Main Theorem
- No consensus protocol is totally correct in spite of one fault. In other words, there is always the possibility that the protocol will remain indecisive.
- Proof idea
  - Lemma 2: Some initial configuration allows both decisions to be reached.
  - Lemma 3: From each configuration for which both decisions can be reached, there is a reachable configuration which still allows both decisions.
  - Finally: we can apply Lemma 3 indefinitely to find configurations that always allow both decisions, so no decision is ever made.
17. Questions/Discussion Before Next Section
- So there is some sequence of operations that prevents processes from ever deciding on a value when one process fails, but it does not mean that such a situation is probable.
- Is the complete lack of timeouts (and even Lamport clocks) specious, especially with heartbeats used in many systems?
- Although this paper came out at least 4 years before Schneider's work in Distributed Computing was published, it uses a lot of the same modeling granularity and assumes a crash failure mode.
18. Solving the Consensus Problem
- There is a solution to the consensus problem if at least a majority of the processes are nonfaulty. In terms of modelling, the constraints have been relaxed to make the consensus problem feasible.
- No process knows in advance whether the processes are initially dead or not (i.e., a crash failure mode in Schneider's work is assumed).
- The problem is referred to as Initially Dead Processes.
19. Algorithm for Solving Consensus Problem
- First Stage
  - Every process is a vertex (node) in a directed graph.
  - Every process broadcasts its process number. (So atomic broadcasts are required.)
  - Every process listens for L - 1 responses, where L = ceiling((N + 1)/2).
    - N even: L - 1 = N/2, a simple majority of the other N - 1 processes
    - N odd: L - 1 = (N - 1)/2, one less than a simple majority of the other N - 1 processes
  - If process p heard from process q, then q is an ancestor of p. We also draw a directed edge in the directed graph from q to p.
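To make the first stage concrete, here is a small Python sketch. It assumes a simplified, centralized view in which the order of arriving broadcasts is given as input; the function names quorum_size and first_stage and the sample arrival orders are hypothetical.

```python
import math
from typing import Dict, List, Set


def quorum_size(n: int) -> int:
    """L = ceiling((N + 1)/2), the size of a simple majority of N processes."""
    return math.ceil((n + 1) / 2)


def first_stage(arrival_order: Dict[int, List[int]], n: int) -> Dict[int, Set[int]]:
    """Each live process keeps the first L - 1 senders it hears from as its ancestors.

    arrival_order[p] lists, in arrival order, the process numbers whose
    broadcasts reached p (assumed to contain at least L - 1 entries).
    """
    need = quorum_size(n) - 1
    return {p: set(order[:need]) for p, order in arrival_order.items()}


for n in (4, 5, 6, 7):
    print(n, quorum_size(n), quorum_size(n) - 1)

# N = 5 with processes 4 and 5 initially dead: each live process waits for 2 messages.
ancestors = first_stage({1: [2, 3], 2: [3, 1], 3: [1, 2]}, n=5)
```

An edge q -> p in the graph corresponds to q being a member of `ancestors[p]`.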
20. Algorithm for Consensus Problem - First Stage (Cont.)
- At this point, each process p knows its own ancestors, but p does not know which processes chose to listen to the message that p broadcast.
- The next goal for each process is to have at least a simple majority of the processes know about one another.
21. Algorithm for Consensus Problem - Second Stage
- Second Stage
  - Every process broadcasts its own id and the L - 1 ancestors it found while listening.
  - Note that all nonfaulty processes will receive the message.
  - Each process now knows its own ancestors (from the first stage) and every process's ancestors (from the second stage).
  - How can a process p know every other process's ancestors if p only heard from L - 1 processes? As processes send out their own lists of ancestors, p will add those processes to its own list of known processes.
22. Algorithm for Consensus Problem - Second Stage (Cont.)
- If a process q is an ancestor of p, and if a process r is an ancestor of q, then r is an ancestor of p (i.e., transitivity holds).
- This property gives us a transitive closure on the graph. We want to draw edges on the graph from r to p if r is an ancestor of p, either directly or indirectly. That way, we can reach the goal of having L processes know each other.
- If p and q are mutual ancestors (either directly or indirectly), then they will have bidirectional edges between them.
- We want to find an initial clique so that all processes in the initial clique are mutual ancestors of one another.
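The transitive closure and the initial clique described above can be sketched as follows. This is a simplified illustration that computes everything from a global view of the ancestor lists, whereas in the protocol each process works only from the lists it has received; the function names and the sample graph are hypothetical. A process k is treated as a member of the initial clique when k is itself an ancestor of every one of its own ancestors.

```python
from typing import Dict, Set


def transitive_ancestors(direct: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """direct[p] = processes p heard from. Returns direct plus indirect ancestors."""
    closure = {p: set(a) for p, a in direct.items()}
    changed = True
    while changed:
        changed = False
        for p in closure:
            extra: Set[int] = set()
            for q in closure[p]:
                extra |= closure.get(q, set())   # ancestors of an ancestor
            if not extra <= closure[p]:
                closure[p] |= extra
                changed = True
    return closure


def initial_clique(direct: Dict[int, Set[int]]) -> Set[int]:
    """Processes that are mutual ancestors with every one of their ancestors."""
    closure = transitive_ancestors(direct)
    return {
        k for k in closure
        if all(k in closure.get(j, set()) for j in closure[k])
    }


# N = 5, L = 3, process 5 initially dead; every live process has L - 1 = 2 ancestors.
direct = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {1, 2}}
clique = initial_clique(direct)
```

In this sample graph, process 4 hears from 1 and 2 but nobody hears from 4, so 4 has edges into it from the clique without being a member.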
23. Algorithm for Consensus Problem
- How big is the initial clique?
  - Each process has at least L - 1 ancestors, so if a process is part of an initial clique then the clique has to be at least (L - 1) + 1 = L processes large. That gives a simple majority.
- Can there be multiple initial cliques?
  - No. If there were multiple (disjoint) initial cliques, then together they would contain at least L + L = 2L processes. But that gives 2 * ceiling((N + 1)/2) processes, which is at least N + 1, more than the N processes that exist.
- So there is exactly one such clique. It has at least a simple majority of the processes, so a consensus can be formed.
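The counting step in the uniqueness argument can be written out as a worked inequality, assuming (as the argument does) that two distinct initial cliques share no process:

```latex
\[
2L \;=\; 2\left\lceil \frac{N+1}{2} \right\rceil
\;\ge\; 2\cdot\frac{N+1}{2} \;=\; N+1 \;>\; N .
\]
```

For example, with N = 5 we get L = 3, so two disjoint cliques would need at least 6 processes when only 5 exist.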
24. Questions/Discussion
- Again, the Consensus Problem is solvable when
  - a majority of processes are not dead
  - no process fails while the protocol is running
  - the number of processes is known.
- The point is that, by relaxing the constraints in modeling a problem, infeasible problems may become feasible.
- Questions?