Impossibility of Distributed Consensus with One Faulty Process

1
Impossibility of Distributed Consensus with One
Faulty Process

Michael J. Fischer
Nancy A. Lynch
Michael S. Paterson
Presentation by Scott McManus
February 14, 2007

2
Overview

Consider the general problem of transaction
management.
A system model is described that makes the
impossibility theorem as widely applicable as
possible.
Consensus protocols are generalized and
simplified, and (a lot of) definitions are given.
Lemmas and the impossibility theorem are given.
A related problem is defined that is solvable as
long as a majority of the processes are not
faulty.

3
Problem Definition and Result

Remote processes want to reach agreement.
In the database transaction commit problem, the
processes want to know whether to commit or
discard results.
Agreement is straightforward if processes and the
network are completely reliable, but real systems
suffer process crashes, network partitioning, and
other faults.
Surprising result: no completely asynchronous
commit protocol can tolerate even a single
unannounced process death.

4
Problem Solution?

The problem considers a crash failure in
Schneider's failure models, where a processor's
halting may not be detectable.
It does not even go so far as to assume that
faults are Byzantine, so this is an even simpler
model.
Channels are assumed to be reliable.
Processes have a complete lack of information:
no assumptions are made about delays or relative
speeds, and there is no information about process
availability.

5
System Model

The goal is to make the result as widely
applicable as possible.
Processes are just state machines.
No timeouts or time information is available.
Lamport clocks are also excluded, as they could
allow messages to be logically ordered.
Just one process must make a decision.
The process must decide on whether a bit is 0 or
1.
Processes communicate with messages.
No assumptions are made about message length,
latency, or ordering.

6
Why does only one process have to reach a
decision?

We are running a consensus protocol.
Eventually, one process will reach a decision.
The protocol will then gather consensus with the
other processes.
So we only consider a single process to make the
model as widely applicable as possible.

7
Computation Model

A simple computation model is used.
In one atomic step, a process receives a message
(possibly none), performs some computation based
on it, and sends a set of messages.
Atomic broadcast is assumed to be available: if
one nonfaulty process receives the message, then
all nonfaulty processes receive it.

8
Consensus Protocols - Definitions

There are many components and definitions for the
sake of rigor. Many are intuitive, but a few to
watch for are
partially correct consensus protocol
admissible run
deciding run
totally correct in spite of one fault

9
Definitions - Input/Output registers

Input and output registers for a single bit are
used.
The proof uses only a single bit so that it can
be generalized to any system with two decisions
to make (without complicating the terminology).
Once an output register is written, it cannot be
changed (so a decision has been made).
An output register has the special value b to
indicate that no decision has been made.
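
The write-once behaviour of the output register can be sketched in a few lines of Python. This is only an illustration: the class name OutputRegister, the method decide, and the use of None to stand in for the special value b are assumptions of this sketch, not notation from the paper or the slides.

```python
class OutputRegister:
    """Write-once register; None stands in for the special value b."""

    def __init__(self):
        self.value = None  # b: no decision has been made yet

    @property
    def decided(self):
        return self.value is not None

    def decide(self, bit):
        """Record a decision; a second write is rejected."""
        if bit not in (0, 1):
            raise ValueError("a decision must be 0 or 1")
        if self.decided:
            raise RuntimeError("output register was already written")
        self.value = bit
```

Starting from b rather than from 0 or 1 is what lets "undecided" be told apart from either decision.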

10
Definitions (Cont.)

Internal State: input register, output register,
program counter, internal storage
Initial State:
Initial internal state
Input register is unspecified (so no input yet)
Output register uses the special symbol b to
indicate that no decision has been made.
(Otherwise, an arbitrary decision would be forced
at initialization.)
Transition Function: describes how a process
moves from one state to the next

11
Definitions (Cont.)

Configuration: the internal state of each process
plus the contents of the message buffer
Step: receipt of a message and a transition of
the internal state
Event: receipt of a message by a process
Schedule: a sequence of events applied starting
from a given configuration
A configuration that results from applying a
schedule is reachable from the starting
configuration.
A configuration reachable from some initial
configuration is accessible.

12
Schedule Commutativity Lemma

Lemma 1: Suppose two schedules, s1 and s2, are
each applied to the same configuration C,
yielding configurations C1 and C2. If the sets of
processes taking steps in s1 and in s2 are
disjoint, then applying s2 to C1 and applying s1
to C2 yield the same configuration.
In other words, two schedules that involve
disjoint sets of processes can be applied in
either order.
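
Lemma 1 can be checked on a toy system. The sketch below is not the paper's formalism: the four-process ring, the integer states, and the rule "add the received number to the state and forward one less to the next process" are all invented for illustration. A configuration is a pair (states, message buffer), and an event (p, m) delivers message m to process p.

```python
from collections import Counter

N = 4  # number of processes in this toy system (assumption)


def apply_event(config, event):
    """Apply one event (p, m): process p atomically receives m and reacts."""
    states, buffer = config
    p, m = event
    states, buffer = list(states), Counter(buffer)
    if m is not None:
        if buffer[(p, m)] == 0:
            raise ValueError("event not applicable: message not in buffer")
        buffer[(p, m)] -= 1
        states[p] += m                          # toy transition function
        if m > 0:
            buffer[((p + 1) % N, m - 1)] += 1   # toy send to the next process
    return tuple(states), +buffer               # unary + drops zero counts


def apply_schedule(config, schedule):
    """Apply a finite sequence of events, in order."""
    for event in schedule:
        config = apply_event(config, event)
    return config


# Starting configuration C: all states 0, two messages in the buffer.
C = ((0, 0, 0, 0), Counter({(0, 2): 1, (2, 1): 1}))
s1 = [(0, 2), (1, 1)]   # only processes 0 and 1 take steps
s2 = [(2, 1), (3, 0)]   # only processes 2 and 3 take steps

C1 = apply_schedule(C, s1)
C2 = apply_schedule(C, s2)
print(apply_schedule(C1, s2) == apply_schedule(C2, s1))  # prints True
```

Both orders end in the same configuration because s2 only consumes messages that were already in the buffer at C, and s1 never changes the state of a process that takes steps in s2.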

13
More Definitions

A configuration has decision value v if some
process's output register records v.
Partially correct consensus protocol:
No accessible configuration has more than one
decision value. (I.e., processes never decide on
conflicting values.)
For each decision value (0 and 1), some
accessible configuration has that decision value.
In other words, the first condition says that
decisions never conflict, and the second says that
both decisions are possible somehow.

14
What Else - More Definitions!

Nonfaulty:
A process is nonfaulty if it takes infinitely
many steps.
Faulty:
A process is faulty if it is not nonfaulty.
So processes that halt are faulty. Note that
making a decision is not modelled as requiring a
halt.
Admissible run:
At most one process is faulty, and all messages
sent to nonfaulty processes are eventually
received.
Note that an admissible run need not reach a
decision!
Deciding run:
Some process reaches a decision state in the run.

15
One More Definition!

A consensus protocol is totally correct in spite
of one fault if
the protocol is partially correct
every admissible run is a deciding run
In other words
decisions never conflict
each decision value is reached in some accessible
configuration
whenever at most one process is faulty and all
nonfaulty processes eventually receive their
messages, some process reaches a decision.
So the protocol forces a decision in every
admissible run.

16
Main Theorem

No consensus protocol is totally correct in spite
of one fault. In other words, there is always the
possibility that the protocol remains indecisive.
Proof idea:
Lemma 2: Some initial configuration allows both
decisions to be reached.
Lemma 3: From each configuration for which both
decisions can be reached, there is a reachable
configuration which still allows both decisions.
Finally: We can apply Lemma 3 repeatedly to build
an admissible run in which every configuration
still allows both decisions, so no decision is
ever made.

17
Questions/Discussion Before Next Section

So there is some sequence of operations that
prevents processes from ever deciding on a value
when one process fails, but it does not mean that
such a situation is probable.
Is the complete lack of timeouts (and even
Lamport clocks) specious, especially with
heartbeats used in many systems?
Although this paper came out at least 4 years
before Schneider's work in Distributed Computing
was published, it uses a lot of the same modeling
granularity and assumes a crash failure mode.

18
Solving the Consensus Problem

There is a solution to the consensus problem if
at least a majority of the processes are
nonfaulty. In terms of modelling, the constraints
have been relaxed to make the consensus problem
feasible.
No process knows in advance whether the processes
are initially dead or not (i.e., a crash failure
mode in Schneider's work is assumed).
The problem is referred to as Initially Dead
Processes.

19
Algorithm for Solving Consensus Problem

First Stage
Every process is a vertex (node) in a directed
graph.
Every process broadcasts its process number. (So
atomic broadcasts are required.)
Every process listens for L - 1 responses, where
L = ceiling( (N+1)/2 ).
N even: L - 1 = N/2, exactly half of the
processes
N odd: L - 1 = (N-1)/2, one less than a simple
majority
If process p heard from process q, then q is an
ancestor of p. We also draw a directed edge in
the directed graph from q to p.
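
The arithmetic behind the listening threshold can be worked through for a few values of N. The helper name listen_threshold is an assumption of this sketch. It computes L = ceiling((N+1)/2) with integer arithmetic and prints it next to the smallest strict majority of N processes.

```python
def listen_threshold(n):
    """Return L = ceiling((n + 1) / 2) using integer arithmetic."""
    return (n + 2) // 2


for n in range(4, 8):
    L = listen_threshold(n)
    majority = n // 2 + 1  # smallest strict majority of n processes
    print(f"N={n}: L={L}, listens for L-1={L - 1}, majority={majority}")
```

In every case L equals the smallest strict majority, so a process together with the L - 1 processes it heard from always forms a majority-sized group.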

20
Algorithm for Consensus Problem - First Stage
(Cont.)

At this point, each process p knows its own
ancestors, but p does not know which processes
chose to listen to the message that p broadcast.
The next goal for each process is to have at
least a simple majority of the processes know
about one another.

21
Algorithm for Consensus Problem - Second Stage

Second Stage
Every process broadcasts out its own id and the
L-1 ancestors it found while listening.
Note that all nonfaulty processes will receive
the message.
Each process now knows its own ancestors (from
the first stage) and every process's ancestors
(from the second stage).
How can a process p know every other process's
ancestors if p only heard from L-1 processes? As
processes send out their own list of ancestors, p
will add those processes to its own list of known
processes.

22
Algorithm for Consensus Problem - Second Stage
(Cont.)

If a process q is an ancestor of p, and if a
process r is an ancestor of q, then r is an
ancestor of p (i.e., transitivity holds).
This property gives us a transitive closure on
the graph. We want to draw edges on the graph
from r to p if r is an ancestor of p, either
directly or indirectly. That way, we can reach
the goal of having L processes know each other.
If p and q are mutual ancestors (either directly
or indirectly), then they will have bidirectional
edges between them.
We want to find an initial clique so that all
processes in the initial clique are mutual
ancestors of one another.
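
The transitive closure and the initial clique can be sketched as follows. This is a centralized illustration, not the distributed protocol: it assumes the first-stage graph is already available as a dictionary heard_from that maps each live process to the processes it heard from. The function names and the example data (N = 5, L = 3, process 4 initially dead) are hypothetical. A process is placed in the initial clique when it is itself an ancestor of every one of its own ancestors, which is one way to express the "mutual ancestors" condition.

```python
def all_ancestors(heard_from):
    """Transitive closure: map each process to its direct and indirect ancestors."""
    closure = {}
    for p in heard_from:
        seen, stack = set(), list(heard_from[p])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(heard_from.get(q, ()))
        closure[p] = seen
    return closure


def initial_clique(heard_from):
    """Processes that are ancestors of each of their own ancestors.

    Assumes every ancestor is a live process with its own entry in heard_from.
    """
    anc = all_ancestors(heard_from)
    return {k for k in anc if all(k in anc[j] for j in anc[k])}


# Hypothetical first stage with N = 5, L = 3: each live process heard from
# L - 1 = 2 others; process 4 is initially dead and never appears.
heard_from = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {0, 1}}
print(initial_clique(heard_from))  # {0, 1, 2}
```

Process 3 is left out because nobody heard from it, so it is not an ancestor of its own ancestors. The remaining three processes are mutual ancestors and form a clique of size L.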

23
Algorithm for Consensus Problem

How big is the initial clique?
Each process has at least L-1 ancestors, so if a
process is part of an initial clique, then the
clique has to be at least (L-1)+1 = L processes
large. That gives a simple majority.
Can there be multiple initial cliques?
No. Distinct initial cliques are disjoint, so two
of them would contain at least L + L = 2L
processes. But 2L = 2 ceiling( (N+1)/2 ), which
is at least N+1, more than the N processes that
exist.
So there is exactly one such clique. It has at
least a simple majority of the processes, so a
consensus can be formed.

24
Questions/Discussion

Again, the Consensus Problem is solvable when
a majority of processes are not dead
no process fails while the protocol is running
the number of processes is known.
The point is that, by relaxing the constraints in
modeling a problem, infeasible problems may
become feasible.
Questions?
