Fault Tolerance and Consensus

Fundamentals and Design of Distributed Systems. ASCIa9/November 2006.

1
Fault Tolerance and Consensus

Problem definitions
Stopping failures
Byzantine failures
Randomized solutions
Impossibility in asynchronous systems

Fundamentals and Design of Distributed Systems
D.H.J. Epema
Parallel and Distributed Group
2
Fault tolerance and consensus (1)

Two persons (Alice and Bob) try to make an appointment
Two propositions:
PA: Alice wants to have the appointment
PB: Bob wants to have the appointment
Alice sends message A1 that she wants the appointment
Bob receives A1 and so knows PA: KB(PA) holds
Bob sends back a message B1 that he wants to go too
Alice receives B1, and so KA(PB) and KA(KB(PA)) hold
Alice sends confirmation A2 back
This continues forever: every message needs a confirmation of its own
Problem: messages may have arbitrary delays and may get lost

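[Figure: A sends A1 to B, B replies with B1, A replies with A2; after A1 is received KB(PA) holds, after B1 is received KA(PB) and KA(KB(PA)) hold]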
3
Fault tolerance and consensus (2)

Processors may need to reach consensus
Applications
commit a transaction in a database
all participating sites have to agree on
committing the results
in a distributed database with replication
when a record is to be modified, the database
servers holding the replicas have to agree on the
modification
in a replicated computation
processes have to start with the same input value
(e.g., from a sensor), so they have to agree on
this value
Agreement modeled as agreeing on the value of a
single bit
Reaching consensus is a problem in the face of
failures

4
Fault classification

Possible processor failures
fail-stop (crash) failures
a process just stops
when in a round in a synchronous system a process
should send a set of messages, it may only send a
subset
omission failures
fail to send or receive a message
performance failures
not meeting timing specifications
Byzantine failures
arbitrary (possibly malicious) behavior

5
Model aspects

Synchronous versus asynchronous
reaching agreement is much more difficult in asynchronous systems: the difference between a long delay and a processor/link failure cannot be detected
Authentication
without: messages can be forged or altered by a process before passing them along to others
with: messages cannot be forged or modified
agreement is much more difficult to reach when messages are non-authenticated
Network connectivity
we assume a complete network

6
Agreement with stopping failures

All processes start with an initial value from some set V
Every process has to decide on a value in V such that
Agreement: no two processes decide on different values
Validity: if all processes start with the same value v, then no process decides on a value different from v
Termination: all non-faulty processes decide within finite time

7
Byzantine generals
[Figure labels: attack, no attack]

City surrounded by armies
Armies have to attack simultaneously in order to
conquer the city
Communication between generals by means of
messengers
Some generals of the armies are traitors

8
The Byzantine agreement problem

One process (the source or commander) starts with a binary value
Each of the remaining processes (the lieutenants) has to decide on a binary value such that
Agreement: all non-faulty processes agree on the same value
Validity: if the source is non-faulty, then all non-faulty processes agree on the initial value of the source
Termination: all non-faulty processes decide within finite time
So if the source is faulty, the non-faulty processes can agree on any value
It is irrelevant on what value a faulty process decides

[Figure: commander C with initial value 0/1]
9
Two variations

All generals start with a value
Variation 1:
all non-faulty generals have to agree on a vector with a value for every general
solution: run a copy of an algorithm for the previous problem for every general
Variation 2:
all non-faulty generals have to agree on a single value
solution: apply the same decision rule on the vector in every general (e.g., majority function)

(v1, v2, ..., vn)
majority(v1, v2, ..., vn)
10
A solution for stopping failures (1)

Solution by flooding decision values
No more than f failing processes
Every process starts with a value v
Every process maintains a set W (with decision values seen so far)
Initially W = {v}
Then, do f+1 rounds:
broadcast current value of W to all other processes
receive all these sets and set W to the union of them all and W
Finally,
if W contains only a single element v, decide(v)
else decide(default)
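
The flooding scheme above can be tried out with a small simulation. The following Python sketch is an illustration only: the function name floodset, the crash_plan argument (which process crashes in which round, and which receivers still get its last broadcast), and the default value 0 are assumptions that do not come from the slides.

```python
def floodset(initial, f, crash_plan=None, default=0):
    """Simulate flooding consensus for stopping failures.

    initial:    list with the initial value of every process.
    f:          number of crashes to tolerate; f+1 rounds are executed.
    crash_plan: hypothetical dict {process: (round, receivers)}; the process
                crashes in that round after sending only to `receivers`.
    Returns {process: decision} for the processes that never crashed.
    """
    crash_plan = crash_plan or {}
    n = len(initial)
    W = [{v} for v in initial]            # initially W = {v}
    crashed = set()
    for rnd in range(1, f + 2):           # rounds 1 .. f+1
        inbox = [set() for _ in range(n)]
        for p in range(n):
            if p in crashed:
                continue
            if p in crash_plan and crash_plan[p][0] == rnd:
                receivers = crash_plan[p][1]   # partial broadcast, then stop
                crashed.add(p)
            else:
                receivers = range(n)
            for q in receivers:
                inbox[q] |= W[p]
        for q in range(n):
            if q not in crashed:
                W[q] |= inbox[q]          # union of W and everything received
    return {p: (next(iter(W[p])) if len(W[p]) == 1 else default)
            for p in range(n) if p not in crashed}


# Process 0 crashes in round 1 and reaches only process 1.
plan = {0: (1, [1])}
print(floodset([0, 1, 1, 1], 1, plan))   # f+1 = 2 rounds: survivors agree
print(floodset([0, 1, 1, 1], 0, plan))   # only 1 round: survivors disagree
```

With two rounds the value 0 that only process 1 saw is flooded to everybody, so all survivors fall back to the default. With a single round, process 1 decides the default while processes 2 and 3 decide 1, which is why f+1 rounds are used.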

11
A solution for stopping failures (2)

Validity and termination are trivially satisfied
For agreement:
enough to show that all processes that are still active at the end of round f+1 then have the same set W
because there are f+1 rounds and at most f failing processes, there is at least one round r in which no process fails
in round r all active processes exchange their sets W, and so have identical sets W at the end of the round
from then on, all sets W in all active processes are identical

[Figure: in round r nobody fails, so afterwards all active processes have the same W]
12
A solution for stopping failures (3)

Optimization
processes only need to know whether at the end |W| = 1 or |W| > 1
so let processes only broadcast at most two values:
their initial value
the first different value they receive

13
Conditions for a solution for Byzantine agreement

Number of processes: n
Maximum number of possibly failing processes: f
Necessary and sufficient condition for a solution to Byzantine agreement (without authentication):
f < n/3
Minimal number of rounds in a deterministic solution:
f+1
There exist randomized solutions with a lower expected number of rounds

14
Example three generals (1)

Scenario 1: Lieutenant L2 is a traitor

[Figure: C sends 0 to L1 and 0 to L2; L1 relays 0 to L2; L2 relays 1 to L1. Note all messages sent and received by L1]
15
Example three generals (2)

Scenario 2: Commander C is a traitor

[Figure: C sends 0 to L1 and 1 to L2; L1 relays 0 to L2; L2 relays 1 to L1. Same messages sent and received by L1]
16
Example three generals (3)

L1 has to decide 0 in scenario 1, because both L1
and C are loyal and C starts with a 0
Lieutenant L1 cannot distinguish the two
scenarios
So L1 also has to decide 0 in scenario 2
So a loyal lieutenant (L1) always has to follow
the commander
The same holds for L2, so L2 has to decide 1 in
scenario 2
Contradiction: L1 and L2 are both loyal in scenario 2, but decide on different values!
This is an example of an impossibility result

17
A solution for Byzantine agreement (1)

Algorithm is recursive with f+1 levels
Without authentication, modeled with Oral Messages (OM)
When a message is supposed to be sent according to the algorithm, but a process does not send it, this is detected, and a default value (e.g., 0) is assumed
Bottom case of the recursion: OM(0) (no failures)
the commander broadcasts its initial value
every other process decides on the value it receives

18
A solution for Byzantine agreement (2)

OM(f), f > 0 (resilient to f failures)
the commander broadcasts its initial value
process numbering: commander = 0, lieutenants 1, 2, ..., n-1
let vi be the value received from the commander by lieutenant Li, or the default if no value is received
recursive step:
Li executes OM(f-1), acting as the commander for the other lieutenants (L1, ..., Li-1, Li+1, ..., Ln-1)
let vj be the value on which Li decides in the recursive step with Lj as the commander (for i, j = 1, 2, ..., n-1, i ≠ j)
Li decides on majority(v1, ..., vi, ..., vn-1)
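
The recursion of OM(f) can be written down almost literally. The following Python sketch is only an illustration under stated assumptions: the names om, majority and parity_liar are made up here, traitors are modeled by a hypothetical function that picks the value a traitor sends to each receiver, and ties in the majority are resolved to the default 0.

```python
from collections import Counter


def majority(values, default=0):
    """Most frequent value; the default on a tie."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def om(f, commander, lieutenants, value, traitors, traitor_says):
    """Return {lieutenant: decision} for OM(f) with the given commander."""
    # The commander broadcasts; a traitor may send something else to everyone.
    received = {}
    for l in lieutenants:
        if commander in traitors:
            received[l] = traitor_says(commander, l, value)
        else:
            received[l] = value
    if f == 0:
        return received                   # OM(0): decide on the value received
    # Every lieutenant Lj acts as commander in OM(f-1) for the others.
    sub = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        sub[j] = om(f - 1, j, others, received[j], traitors, traitor_says)
    # Li takes the majority of its own vi and its decisions in the sub-runs.
    return {i: majority([received[i]] +
                        [sub[j][i] for j in lieutenants if j != i])
            for i in lieutenants}


def parity_liar(sender, receiver, value):
    """Hypothetical traitor strategy: tell odd receivers 1, even receivers 0."""
    return receiver % 2


# n = 4, f = 1: first lieutenant 3 is the traitor, then the commander.
print(om(1, 0, [1, 2, 3], 1, {3}, parity_liar))
print(om(1, 0, [1, 2, 3], 1, {0}, parity_liar))
```

In the first call the loyal lieutenants 1 and 2 both decide 1, the value of the loyal commander. In the second call the commander sends conflicting values, but all lieutenants still agree on one value. The entries for traitors in the result are meaningless.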

19
A solution for Byzantine agreement (3)

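[Figure: in OM(f) the commander sends v1, v2, ..., vi, ..., vn-1 to L1, L2, ..., Li, ..., Ln-1; in OM(f-1) lieutenant L1 acts as commander for L2, ..., Li, ..., Ln-1; here Li decides on its own v1 as a lieutenant of L1]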
20
A solution for Byzantine agreement (4)

So a lieutenant does not decide on the majority of all values it receives!
But Li decides on majority(majority(...), majority(...), ..., vi, ..., majority(...), ..., majority(...))
every majority(...) is computed as the decision when acting as a lieutenant in OM(f-1)
vi is obtained directly from the commander
21
A solution for Byzantine agreement (5)

Number of executions:
OM(f): 1 time
OM(f-1): (n-1) times
OM(k): (n-1)(n-2)...(n-f+k) times, for k = 0, 1, ..., f-1
Total number of messages is of order n^(f+1):
OM(f): n-1
OM(f-1): (n-1)(n-2)
OM(k): (n-1)(n-2)...(n-(f-k))(n-(f-k+1))
OM(0): (n-1)(n-2)...(n-(f+1)) (f+1 factors, this dominates)
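
As a quick check of these products, the message count per recursion level can be computed directly. The helper name om_message_counts is an assumption made for this sketch.

```python
def om_message_counts(n, f):
    """Messages sent at each level: {k: number of messages in all OM(k) runs}."""
    counts = {}
    product = 1
    for depth in range(f + 1):        # depth 0 is OM(f), depth f is OM(0)
        product *= n - 1 - depth      # next factor (n-1), (n-2), ...
        counts[f - depth] = product
    return counts


counts = om_message_counts(7, 2)
print(counts, sum(counts.values()))
```

For n = 7 and f = 2 this gives 6 messages in OM(2), 30 in the OM(1) runs and 120 in the OM(0) runs, so the leaf level indeed dominates the total of 156.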

22
A solution for Byzantine Agreement (6)
[Figure: the tree in lieutenant L6 for n = 7, f = 2, i = 6: root 0 at level 0; nodes 1, 2, 3, 4, 5 at level 1; at level 2, e.g., nodes 2, 3, 4, 5 below node 1]

In order to decide, every lieutenant Li creates a labelled tree with f+1 levels
level 0: the root with label 0 (the commander)
level 1: n-2 children with all labels except 0 and i
at every subsequent level: all ids that have not yet occurred on the path from the root and are different from i
the degree decreases by 1 at every level
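
The shape of this tree can be generated mechanically. In the following sketch every node is represented by the path of ids from the root to it; the function name tree_paths and this path representation are assumptions made for illustration.

```python
def tree_paths(n, f, i):
    """Levels 0..f of the tree in lieutenant i; a node is its path from the root."""
    levels = [[(0,)]]                       # level 0: the commander
    for _ in range(f):
        children = [path + (j,)
                    for path in levels[-1]
                    for j in range(1, n)    # lieutenant ids 1 .. n-1
                    if j != i and j not in path]
        levels.append(children)
    return levels


levels = tree_paths(7, 2, 6)
print([len(level) for level in levels])     # nodes per level
print(levels[2][:4])                        # children of node 1
```

For n = 7, f = 2 and i = 6 the levels have 1, 5 and 20 nodes, and the children of node 1 are labelled 2, 3, 4 and 5.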

23
A solution for Byzantine Agreement (7)
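[Figure: the same tree in lieutenant L6 (n = 7, f = 2, i = 6): the root carries v6; a level-1 node such as node 1 carries the value received from L1; a level-2 node carries a value received through two lieutenants, e.g., through L1 and L5]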

Label the nodes of the tree with additional labels
level 0: vi (value received from the commander)
level 1: the value that Lj told Li that the commander told him
label of any node: the value that was passed to Li from the commander through the chain of lieutenants on the path from the root to the node

24
A solution for Byzantine Agreement (8)
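[Figure: in lieutenant L6 (n = 7, f = 2, i = 6): a leaf at level 2 with value z decides z; a level-1 node with local value v and child decisions w, x, y, z takes majority(v,w,x,y,z); the root holds v6]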

Decide by propagating the result up with the majority function
at the leaf level: decide on the value received (OM(0))
at every next higher level: take the majority of the local value and the decisions at child nodes
the final value at the root is the final decision

25
Example four generals (1)
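Lieutenant is a traitor
[Figure: in OM(1) the commander C sends v to all three lieutenants; in the 3 x OM(0) steps the loyal lieutenants relay v and the traitor relays arbitrary values (?)]
Every loyal lieutenant receives v,v,?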
26
Example four generals (2)
Commander is a traitor
[Figure: C sends x, y, and z to the three lieutenants, who relay these values to each other]
Every loyal lieutenant receives x,y,z
27
Byzantine agreem. with authentication (1)

Every message carries a signature
The signature of a loyal general cannot be forged
Alteration of the contents of a signed message can be detected
Every (loyal) general can verify the signature of any other (loyal) general
Any number f of traitors can be allowed
Commander is process 0
Structure of message from (and signed by) the commander, and subsequently signed and sent by lieutenants Li1, Li2, ..., Lik:
(v : s0 : si1 : ... : sik)
Every lieutenant maintains a set of orders V
Some choice function on V for deciding (e.g., majority, minimum)

28
Byzantine agreem. with authentication (2)

Algorithm in commander:
  send(v : s0) to every lieutenant
Algorithm in every lieutenant Li:
  upon receipt of (v : s0 : si1 : ... : sik) do
    if (v not in V) then
      V := V ∪ {v}
      if (k < f) then          /* sign and propagate messages long enough */
        for (j in {1, 2, ..., n-1} \ {i, i1, ..., ik}) do
          send(v : s0 : si1 : ... : sik : si) to Lj
  if (Li will not receive any more messages) then
    decide(choice(V))
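
A round-based simulation of this signed-message scheme is sketched below. Several simplifications are assumptions of the sketch and not part of the slides: a signature chain is modeled as a tuple of process ids (so forging is simply not expressible), traitorous lieutenants stay silent, a traitorous commander is described by a hypothetical function commander_lie that returns the value sent to each lieutenant, and the choice function is the minimum.

```python
def signed_agreement(n, f, order, traitors, commander_lie=None):
    """Simulate agreement with signed messages; process 0 is the commander.

    Returns {lieutenant: choice(V)} for the loyal lieutenants.
    """
    V = {i: set() for i in range(1, n)}
    # The commander sends (v : s0); the chain (0,) stands for its signature.
    msgs = []
    for i in range(1, n):
        v = commander_lie(i) if 0 in traitors else order
        msgs.append((i, v, (0,)))
    for _ in range(f + 1):                 # chains grow up to f+1 signatures
        forwarded = []
        for receiver, v, chain in msgs:
            if receiver in traitors:
                continue                   # assumption: traitors stay silent
            if v not in V[receiver]:
                V[receiver].add(v)
                k = len(chain) - 1         # number of lieutenant signatures
                if k < f:
                    for j in range(1, n):
                        if j != receiver and j not in chain:
                            forwarded.append((j, v, chain + (receiver,)))
        msgs = forwarded
    return {i: min(V[i]) for i in V if i not in traitors}


# Three generals, traitorous commander sends 1 to L1 and 0 to L2.
print(signed_agreement(3, 1, None, {0}, lambda i: i % 2))
```

Both lieutenants end with V = {0, 1} and apply the same choice function, so they agree even though n = 3 and f = 1.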
29
Example three generals

Commander C is traitor

Format: value:signature(s)
[Figure: C sends 1:0 to L1 and 0:0 to L2; L1 sends 1:0:1 to L2 and L2 sends 0:0:2 to L1; both lieutenants end with V = {0,1}]
30
Randomized Byzantine agreement (1)

Solution for synchronous and asynchronous systems!
n processes, of which at most f fail, n > 5f
Every process has an initial value v
The algorithm proceeds in rounds consisting of three phases:
a notification phase (messages contain an N)
a proposal phase (messages contain a P)
a decision phase
When a process expects messages from all other processes, it is no use waiting for more than n-f messages
When not enough processors support a possible decision, a process starts the next round with a new random input value v

31
Randomized Byzantine agreement (2)

r := 1; decided := false
do forever
  /* notification phase */
  broadcast(N,r,v)
  await (n-f) messages of the form (N,r,*)
  /* proposal phase */
  if (> (n+f)/2 messages (N,r,w), w = 0,1) then      /* conditions explained later */
    broadcast(P,r,w)
  else broadcast(P,r,?)
  if decided then STOP
  else await (n-f) messages of the form (P,r,*)
  /* decision phase */
  if (> f messages (P,r,w), w = 0,1) then
    v := w
    if (> 3f messages (P,r,w)) then
      decide(w)
      decided := true
  else v := random(0,1)
  r := r+1
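
The three phases can be exercised in a lock-step simulation. The sketch below is an illustration under several assumptions that are not in the slides: Byzantine processes send an independent random bit to every receiver, asynchrony is modeled by letting every correct process look at a random sample of n-f of the n messages of a phase, and the run ends once every correct process has decided (instead of modeling STOP). The names ben_or_style, deliver, seed and max_rounds are hypothetical.

```python
import random


def ben_or_style(n, f, inputs, byzantine, seed=0, max_rounds=10_000):
    """Return ({correct process: decision}, rounds used); requires n > 5f."""
    rng = random.Random(seed)
    correct = [p for p in range(n) if p not in byzantine]
    v = {p: inputs[p] for p in correct}
    decision = {}

    def deliver(sent):
        # Values from correct processes plus random bits from Byzantine ones;
        # the receiver waits for only n-f of them.
        pool = list(sent.values()) + [rng.randint(0, 1) for _ in byzantine]
        return rng.sample(pool, n - f)

    for r in range(1, max_rounds + 1):
        # Notification phase, then choice of the proposal (None stands for ?).
        proposal = {}
        for p in correct:
            got = deliver(v)
            proposal[p] = next(
                (w for w in (0, 1) if got.count(w) > (n + f) / 2), None)
        # Proposal messages are delivered; decision phase.
        for p in correct:
            got = deliver(proposal)
            w = next((w for w in (0, 1) if got.count(w) > f), None)
            if w is None:
                v[p] = rng.randint(0, 1)        # no support: flip a coin
            else:
                v[p] = w
                if got.count(w) > 3 * f:
                    decision.setdefault(p, w)
        if len(decision) == len(correct):
            return decision, r
    return decision, max_rounds


print(ben_or_style(6, 1, [1, 1, 1, 1, 1, 0], {5}))
```

With n = 6, f = 1 and all correct processes starting with 1, every correct process decides 1 in the first round, whatever the Byzantine process sends. With mixed inputs the number of rounds depends on the coin flips.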
32
Randomized Byzantine agreement (3)

No simultaneous contradicting proposals by correct processes
Lemma 1
If a correct process proposes v in round r, then no other correct process proposes 1-v in round r
Proof
the process has received more than (n+f)/2 messages (N,r,v)
of these, more than (n-f)/2 are from correct processes, which is a majority of the correct processes
so there cannot also be a majority of the correct processes that sent (N,r,1-v)

33
Randomized Byzantine agreement (4)

When all correct processes have the same value, immediate decision
Lemma 2
If at the start of round r all correct processes have the same value v, then they all decide v in round r
Proof
each correct process receives at least n-f notification messages, at least n-2f of which are from correct processes, and so of the form (N,r,v)
because n > 5f, we have n-2f = n/2 + n/2 - 2f > n/2 + 5f/2 - 2f = (n+f)/2
so each correct process proposes v
so, each correct process receives at least n-2f messages of the form (P,r,v)
because n > 5f, we have n-2f > 3f, and so each correct process decides v

34
Randomized Byzantine agreement (5)

Decision of any correct process immediately followed by others
Lemma 3
If a correct process decides v in round r, all correct processes decide v in round r+1
Proof
it is enough to show that all correct processes propose v in round r+1
if a process decides v in round r, it must have received more than 3f proposals for v, m of which are from correct processes for some m > 2f
so every other correct processor receives at least m-f > f proposals for v, so it starts the next round with this value
now use Lemma 2

35
Randomized Byzantine agreement (6)

Theorem
If n > 5f, the algorithm guarantees agreement, validity, and terminates with probability 1
Proof
with probability 1, enough processors will pick a common value v to have at least one correct process decide
Expected number of rounds is of order 2^n (in fact, slightly better)
Remark: randomization is used only if there is not enough initial support for any decision anyway

36
Randomized coordinated attack (1)

Synchronous system
Coordinated-attack problem
Complete graph
System runs for a fixed number r of rounds
Messages may get lost (all links may exhibit
failures)
Processes do not exhibit failures

37
Randomized coordinated attack (2)

Validity:
if all processes start with 0, they all decide 0
if all processes start with 1 and all messages are received, they all decide 1
Agreement with some probability:
P[some process decides 0 and some process decides 1] ≤ ε, for some 0 ≤ ε ≤ 1 (probability of disagreement)
Termination: trivial

38
Adversaries

Faults modeled with an adversary who can on
purpose try to deceive the system/processors
Here, the adversary can choose
the input values of the processors
the communication pattern (can omit arbitrary
messages)
In the algorithm, we get ε = 1/r

39
Communication patterns (1)

Communication pattern: a subset of the set {(i,j,k) : (i,j) an edge in the processor graph, k a round number}
We will define an ordering ≤γ for pairs (i,k) for a communication pattern γ
Interpretation: (i,k) ≤γ (j,l) means that j has at least the same knowledge in round l as i had in round k

[Figure: a message sent in round k from i to j]
40
Communication patterns (2)

Ordering ≤γ for pairs (i,k) for a communication pattern γ
Knowledge is monotonic:
(i,k) ≤γ (i,l) if k ≤ l
All knowledge is transferred in messages:
if (i,j,k) in γ, then (i,k-1) ≤γ (j,k)
Transitive closure:
if (i,k) ≤γ (i',k') and (i',k') ≤γ (i'',k''), then (i,k) ≤γ (i'',k'')

41
Information level (1)

The information level on pairs (i,k) is defined as
k = 0: levelγ(i,0) = 0
k > 0: if there is a j ≠ i such that (j,0) ≤γ (i,k) does not hold, then levelγ(i,k) = 0
k > 0, otherwise: let lj = max{levelγ(j,k') : (j,k') ≤γ (i,k)}; then levelγ(i,k) = 1 + min{lj : j ≠ i}
The information level
starts at 0
indicates what a process knows about other processes
is incremented when a process has heard about the previous level of all other processes
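
The definition can be evaluated directly for a given communication pattern. The sketch below assumes processes are numbered 1..n with n at least 2, and represents a pattern as a set of triples (i, j, k) meaning that the round-k message from i to j is delivered; the function name information_levels and the helper dictionary know are assumptions made for illustration.

```python
def information_levels(n, r, pattern):
    """Return {(i, k): information level of process i after round k}."""
    procs = range(1, n + 1)
    # know[(i, k)] holds all pairs (j, k') that are below (i, k) in the ordering.
    know = {(i, 0): {(i, 0)} for i in procs}
    level = {(i, 0): 0 for i in procs}
    for k in range(1, r + 1):
        for i in procs:
            pairs = set(know[(i, k - 1)]) | {(i, k)}     # monotonic knowledge
            for j in procs:
                if (j, i, k) in pattern:                 # message j -> i arrives
                    pairs |= know[(j, k - 1)]
            know[(i, k)] = pairs
        for i in procs:
            heard = {}                                   # j -> highest known level
            for (j, kk) in know[(i, k)]:
                if j != i:
                    heard[j] = max(heard.get(j, 0), level[(j, kk)])
            if len(heard) < n - 1:
                level[(i, k)] = 0                        # some j never heard of
            else:
                level[(i, k)] = 1 + min(heard.values())
    return level


complete = {(i, j, k) for i in (1, 2, 3) for j in (1, 2, 3) if i != j
            for k in (1, 2)}
print(information_levels(3, 2, complete))
print(information_levels(2, 2, {(1, 2, 1)}))
```

With a complete pattern every process reaches level k after round k. If only the round-1 message from process 1 to process 2 arrives, process 2 ends at level 1 and process 1 stays at level 0.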

42
Information level (2)

It can be shown that
the information levels of different processes in the same round never differ by more than 1
if the communication pattern is complete (all triples (i,j,k) appear), then levelγ(i,k) = k for all i and k

[Figure: information levels in an example communication pattern]
3
43
The algorithm (1)

Ideas
Process 1 picks a random number k between 1 and r
Full information distribution in every round (on
correct links)
Processes maintain information on the initial
values v and the levels of all processes
Messages are of the form (L,V,k), with
L a vector with the levels as far as known by the
sending process
V a vector with the initial values of all
processes
k the round number picked by process 1
Levels and initial values of other processes, and
k initially undefined

44
The algorithm (2)

Picking a round number in process 1:
  if ((i = 1) and (round = 0)) then
    key := random(1,r)
Sending a message in every round in every process:
  send(L,V,key) to all j        /* all locally known information */
45
The algorithm (3)

Receiving all messages in a round in process i:
  rounds := rounds + 1
  for (j = 1 to i-1, i+1 to N) do
    receive(Lj,Vj,kj) from j                 /* on correct links */
    if (kj ≠ undefined) then key := kj       /* round number picked by 1 */
    for all l: if (Vj(l) ≠ undefined) then Vi(l) := Vj(l)    /* copy init. vals */
    for all l: if (Lj(l) > Li(l)) then Li(l) := Lj(l)        /* copy levels */
  Li(i) := 1 + min{Li(j) : j ≠ i}            /* compute own level */
  if (rounds = r) then
    if (key ≠ undefined) and (Li(i) ≥ key) and
       (Vi(j) = 1 for all j) then            /* all processes started with 1 */
      decision := 1
    else decision := 0
46
Use of levels and key

In a sense, processes agree on their levels,
i.e., on the actual round they have reached at
the end of the algorithm
The key chosen by process 1 is a guess of this
level

47
Why do we get ε = 1/r?

Sketch
Let li be the value of Li(i) in round r
The levels li differ by at most 1
If key > max li or at least one process has initial value 0, all processes decide 0 (agreement)
If key ≤ min li and all processes have initial value 1, all processes decide 1 (again agreement)
So the only case where disagreement is possible is when key = max li, which has probability 1/r, since max li is determined by the adversary and key is uniform on {1,...,r}
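
The 1/r argument can be checked on a concrete run. The following sketch simulates the algorithm for one fixed communication pattern and every possible key. It rests on assumptions that are not in the slides: processes are numbered 1..n, a pattern is a set of triples (sender, receiver, round) of delivered messages, an undefined level is encoded as -1 (so that an own level of 1 + min stays 0 until all others are known), and the function name random_attack is made up.

```python
def random_attack(n, r, inputs, pattern, key):
    """Run r rounds; return {process: decision} for the given key of process 1."""
    procs = range(1, n + 1)
    L = {i: {j: (0 if j == i else -1) for j in procs} for i in procs}
    V = {i: {j: (inputs[j - 1] if j == i else None) for j in procs}
         for i in procs}
    K = {i: (key if i == 1 else None) for i in procs}
    for k in range(1, r + 1):
        # Snapshot of what every process sends at the start of the round.
        sent = {i: (dict(L[i]), dict(V[i]), K[i]) for i in procs}
        for i in procs:
            for j in procs:
                if j != i and (j, i, k) in pattern:      # message j -> i arrives
                    Lj, Vj, kj = sent[j]
                    if kj is not None:
                        K[i] = kj
                    for l in procs:
                        if Vj[l] is not None:
                            V[i][l] = Vj[l]              # copy initial values
                        if Lj[l] > L[i][l]:
                            L[i][l] = Lj[l]              # copy levels
            L[i][i] = 1 + min(L[i][j] for j in procs if j != i)
    return {i: int(K[i] is not None and L[i][i] >= K[i]
                   and all(V[i][j] == 1 for j in procs))
            for i in procs}


# Two processes, r = 3; the adversary drops only the last message from 1 to 2.
pattern = {(i, j, k) for i in (1, 2) for j in (1, 2) if i != j
           for k in (1, 2, 3)} - {(1, 2, 3)}
for key in (1, 2, 3):
    print(key, random_attack(2, 3, [1, 1], pattern, key))
```

In this run process 1 ends at level 3 and process 2 at level 2. For key 1 and key 2 both decide 1; only for key 3, the maximum level, does process 1 decide 1 while process 2 decides 0, which is one of the r = 3 equally likely keys.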

48
We can't do much better

It can be shown that
Any r-round algorithm for the randomized coordinated attack problem has probability of disagreement at least equal to 1/(r+1)

49
Impossibility of consensus in asynchronous systems

The consensus problem (weak form)
Consistency: all correct processors that take a decision, take the same decision
Validity: both 0 and 1 are possible decisions from possibly different initial configurations (to avoid trivial solutions)
Termination: at least one correct processor eventually takes a decision
Theorem: there is no such solution that tolerates even a single faulty processor (FLP: Fischer-Lynch-Paterson)

50
The model (1)

Every processor has
an input variable x
an output variable y with possible values 0 and 1
Messages are of the form (P,m), with
P the destination processor
m the message contents
Message passing is not necessarily FIFO
This is modeled with a single global message
buffer

[Figure: processors P and Q, each with variables x and y, and a message (P,m)]
51
The model (2)

Every processor has a deterministic transition
function
In a single step, a processor
receives a message (any from the message buffer
meant for it)
does a local computation
sends a finite number of messages
A configuration is defined by
the internal state of every processor
the contents of the message buffer

52
The model (3)

In an initial configuration
every processor is in an initial state (with some
value for x)
the message buffer is empty
A step
takes the system from one configuration to the
next
is defined by a single step of one processor
is associated with an event (P,m)
receiving the message m by P
performing the ensuing computation
and sending a finite number of messages
A schedule s from C is a sequence of events applied to C
The configuration s(C) with s a finite schedule is said to be reachable from C

[Figure: an event e = (P,m) takes C to C' = e(C); a schedule s takes C to C' = s(C)]
53
The model (4)

A run is a sequence of configurations associated with a schedule
A processor is in a decision state if it has given y a value
A run is a deciding run if at least one processor reaches a decision state
A processor is non-faulty in a run if it performs infinitely many steps
A run is admissible if at most one processor is faulty, and all messages to non-faulty processors are eventually received
A configuration is bivalent if both decision values are possible (for a correct processor)
A configuration is univalent if only one of the decision values is possible (0-valent or 1-valent)

54
Outline impossibility proof

Suppose there exists a 1-resilient consensus protocol
Step 1. There exists a bivalent initial configuration
Step 2. For every bivalent configuration C and processor P, there is a finite schedule s such that s(C) is bivalent and P takes a step in s
Step 3. Keep the system in bivalent configurations forever:
apply step 1: let B1 be a bivalent initial configuration
for every i, apply step 2: there exists an s such that Bi+1 = s(Bi) is bivalent with process Pj taking the final step in s, j = i mod n

[Figure: B1, B2, ..., Bi, Bi+1, ... connected by schedules s, with all processes taking steps in turn]

55
Proof step 1
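[Figure: initial configurations C0 (0-valent) and C1 (1-valent)]

Suppose there is no bivalent initial configuration, so every initial configuration is univalent
Consider a 0-valent initial configuration C0 and a 1-valent initial configuration C1
We can assume that C0 and C1 differ in the initial value of only one process, say process P (by flipping one value at a time)
An admissible deciding run starting in C0 in the schedule s of which P does not take any step can also be applied to C1
Decision is 0 in the run starting in C0 and 1 in the run starting in C1, although no process other than P can tell the two runs apart: contradiction

[Figure: C0 (0-valent) and C1 (1-valent) differ only in P; the same schedule s of the other processes applies to both]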
56
Proof step 2 (1)
Let C be a bivalent configuration in which e = (P,m) is applicable
Let 𝒞 (script C) be the set of configurations reachable from C without ever applying e
Let D be the set of configurations obtained by applying e to every configuration in 𝒞
To prove: D contains a bivalent configuration

[Figure: applying e to every configuration in 𝒞 gives D = e(𝒞)]
57
Proof step 2 (2)
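Assume D does not contain bivalent configurations
D does contain both 0-valent and 1-valent configurations:
Let Ei be an i-valent configuration reachable from C, i = 0,1
If Ei in 𝒞 (the set of configurations reachable from C without applying e), let Fi = e(Ei)
If Ei not in 𝒞, event e was used in getting from C to Ei, and so there exists an Fi in D such that Ei can be reached from Fi
Fi is i-valent, i = 0,1, because Fi is univalent by assumption and is connected to the i-valent Ei

[Figure: the two cases: Fi = e(Ei) reached from Ei in 𝒞, or Ei reached from Fi in D]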
58
Proof step 2 (3)
So D must have neighboring 0- and 1-valent configurations D0 and D1
So there exist C0 and C1 in the set 𝒞 such that
C1 = e'(C0) for some step e' = (P',m')
Di = e(Ci) is i-valent, i = 0,1
Steps in different processes commute
Two cases:
P' ≠ P: then e and e' commute and D1 = e'(D0), so the 1-valent D1 is reachable from the 0-valent D0: contradiction!

[Figure: C0 and C1 = e'(C0) in 𝒞; applying e = (P,m) gives D0 and D1 in D]
59
Proof step 2 (4)

P' = P:
consider a finite deciding run from C0 in which P does not do any steps; let s be the schedule and let A = s(C0)
s is applicable to D0 and D1
let Ei = s(Di), i = 0,1; Ei is i-valent
then E0 = e(A) and E1 = e(e'(A)), where e' is the step from C0 to C1
so A is bivalent: contradiction with the run to A being deciding!