Title: Fault Tolerance in Distributed Systems
1 Fault Tolerance in Distributed Systems
- Why
  - an increasing number of components raises the probability that some of them fail
  - the spread of applications with high reliability requirements
- Main Approaches
  - robust algorithms: guarantee correct behaviour in the presence of failures
  - self-stabilization: guarantees recovery from the effects of failures
- Failure Models
  - initially dead processes
  - crash failures
  - Byzantine (malign) failures
  - omission failures
  - link failures
  - timing failures
2 Decision Problems
- Each correct process irrevocably writes a decision value to its output variable.
- Requirements
  - termination: all correct processes decide
  - consistency: the decision values satisfy some consistency conditions
  - non-triviality: excludes algorithms whose decisions do not depend on the input
- Examples
  - commit-abort
  - consensus on input
  - election
  - approximate agreement
3 Asynchronous Setting: Preliminaries
- t-crash fair execution: at least n-t processes execute infinitely many events, and each message sent to a correct process is eventually received.
- An algorithm A is a 1-crash-robust consensus algorithm if it satisfies the following three properties:
  - termination: all correct processes decide
  - consistency: all correct processes decide on the same value
  - non-triviality: both 0 and 1 are possible decision values
- Lemma: events occurring at disjoint sets of processes commute.
4 Asynchronous Setting: Preliminaries II
- v-decided configuration
- decided configuration
- v-valent configuration
- bivalent configuration
- fork configuration (actions of at most t processes can force both outcomes)
- Lemma (a crash of at most t processes must be survived by the remaining processes): for each reachable configuration g of a t-robust algorithm and for each subset S of at least n-t processes, there exists a decided configuration d such that $g \leadsto_S d$ (d is reachable from g using only steps of processes in S).
- Lemma: there exists no reachable fork.
5 Impossibility of Asynchronous Consensus
Theorem: There exists no asynchronous, deterministic, 1-crash-robust consensus algorithm.
Let A be a 1-crash-robust consensus algorithm.
Init-Lemma: There exists a bivalent initial configuration for A.
- from non-triviality
Cont-Lemma: Let g be a reachable bivalent configuration and s an applicable step for process p in g. There exists a sequence r of events such that s is applicable in r(g) and s(r(g)) is bivalent.
6 Impossibility of Asynchronous Consensus II
- Proof of the theorem
  - start with the initial bivalent configuration
  - repeatedly choose s to be the event that has been pending longest and apply the Cont-Lemma (this keeps the execution fair)
  - the resulting computation never decides
- Discussion: possible remedies
  - weaker fault model (e.g. initially dead processes)
  - weaker coordination requirements (e.g. renaming)
  - randomization
  - weaker termination requirements (Byzantine broadcast: termination required only when the source is correct)
  - synchrony
7 Initially Dead Processes
- Observation
  - solvable only if $t < n/2$
  - if a message is received from a process, that process is correct and will remain correct
- Algorithm for process p, with $L = \lceil (n+1)/2 \rceil$
  - shout <p>
  - wait for at least L shout messages and add the received ids to the list $\mathit{Succ}_p$
  - shout <p, $\mathit{Succ}_p$>
  - $\mathit{Alive}_p := \mathit{Succ}_p$, $\mathit{Recv}_p := \emptyset$
  - while $\mathit{Alive}_p \not\subseteq \mathit{Recv}_p$
    - receive <q, $\mathit{Succ}_q$>
    - add q to $\mathit{Recv}_p$ and add $\mathit{Succ}_q$ to $\mathit{Alive}_p$
  - compute a knot in G
8 Initially Dead Processes II
- Correctness
  - receiving a name defines an oriented edge
  - strongly connected components of this graph without outgoing edges are called knots
  - each knot has at least L nodes
  - since $2L > n$, there can be at most one knot
  - after termination, each process has received the successors of each of its descendants and can compute the knot
- Analysis/Discussion
  - every correct process computes the same knot, so a leader can easily be selected from it and its input value decided on
  - $O(n^2)$ messages of length L
  - more efficient solutions exist ($O(n(t + \log n))$)
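The knot computation can be sketched as follows. This is only an illustration, not the algorithm from the slides: it assumes the graph is given as a hypothetical dictionary `succ` mapping each process id to its successor set, and it uses plain reachability instead of an optimized strongly-connected-components algorithm.

```python
from typing import Dict, Hashable, Set


def reachable(succ: Dict[Hashable, Set[Hashable]], start: Hashable) -> Set[Hashable]:
    """Return all nodes reachable from `start` (including `start` itself)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_knot(succ: Dict[Hashable, Set[Hashable]], start: Hashable) -> Set[Hashable]:
    """Return the nodes below `start` that lie in a component without outgoing edges.

    A node v is in such a component iff every node reachable from v can reach v back.
    Assumption: the knot is unique (as argued from 2L > n); otherwise the union
    of all reachable knots is returned.
    """
    descendants = reachable(succ, start)
    reach = {v: reachable(succ, v) for v in descendants}
    return {v for v in descendants if all(v in reach[u] for u in reach[v])}
```

Every process that runs `find_knot` on its own view obtains the same set, provided the views contain the successor lists of all descendants.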
9 Relaxing Coordination Requirements
- a problem T to be solved can be described by a mapping from possible inputs $X^n$ to sets of allowed outputs $P(D^n)$
- examples
  - consensus: $\{(0,0,\ldots,0),\ (1,1,\ldots,1)\}$
  - election: $\{(1,0,\ldots,0),\ (0,1,0,\ldots,0),\ \ldots\}$
  - approximate agreement: $\{(d_0, d_1, \ldots, d_{n-1}) : |d_i - d_j| < \varepsilon\}$
  - renaming: $\{(d_0, d_1, \ldots, d_{n-1}) : i \ne j \Rightarrow d_i \ne d_j\}$
- t-crash robust solution for task T
  - termination: in every t-crash fair execution, all correct processes decide
  - consistency: if all processes are correct, the decision vector is among the allowed ones for the given input
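The example tasks can be read as predicates on the decision vector. The sketch below is illustrative only; the function names are hypothetical, and the dependence on the input vector is left out for brevity.

```python
from typing import Sequence


def is_consensus(d: Sequence[int]) -> bool:
    """All entries are the same bit."""
    return len(set(d)) == 1 and d[0] in (0, 1)


def is_election(d: Sequence[int]) -> bool:
    """Exactly one entry is 1 (the leader), all others are 0."""
    return sorted(d) == [0] * (len(d) - 1) + [1]


def is_approximate_agreement(d: Sequence[float], eps: float) -> bool:
    """Any two entries differ by less than eps (d must be non-empty)."""
    return max(d) - min(d) < eps


def is_renaming(d: Sequence[int]) -> bool:
    """All entries are pairwise distinct."""
    return len(set(d)) == len(d)
```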
10 Probabilistic Consensus
- Relaxed termination condition
  - $\lim_{k \to \infty} \Pr[\text{a correct process has not decided after } k \text{ steps}] = 0$
- Fair scheduling: the algorithm works in rounds. In round k a process shouts a message and waits for n-t messages. Let R(q,p,k) be the event that in round k process p receives the (round k) message from q among the first n-t messages. Fair scheduling means
  - $\exists \varepsilon > 0\ \forall p, q, k:\ \Pr[R(q,p,k)] \ge \varepsilon$
  - $\forall k, p, q, r$ with $p \ne r$: $R(q,p,k)$ and $R(q,r,k)$ are independent
- Upper bound on t: there is no t-crash-robust consensus protocol for $t \ge n/2$.
  - similar structure to the impossibility proof; assume P is such a protocol
  - P has a bivalent initial configuration
  - let S and T be two disjoint sets of processes of size n-t (they exist because $t \ge n/2$)
  - each reachable configuration g is either both S-0-valent and T-0-valent or both S-1-valent and T-1-valent (from commutativity of disjoint events)
  - there is no reachable bivalent configuration
11 Probabilistic Consensus: Crash Faults
- The algorithm
  - shout (vote, p, round, weight)
  - count how many times you have received each vote and note whether you have received any votes with weight $> n/2$ (witness votes)
  - if a witness vote has been received, that is what you will vote in the next round; otherwise choose the majority (the weight is how many times this vote has been received)
  - if more than t witnesses have been received, decide on their value, shout your decided value for two more rounds and terminate
- Correctness
  - in any round, no two processes witness for two different values
  - if a process decides, then all correct processes decide on the same value, at most two rounds later
  - $\lim_{k \to \infty} \Pr[\text{no decision is taken in rounds} \le k] = 0$
    - from fair scheduling there is a positive probability that the n-t correct processes communicate with each other in three consecutive rounds
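The end-of-round step of the crash-robust voting algorithm can be sketched as a pure function. This is a hedged illustration: the function name and the `(value, weight)` representation of votes are assumptions, and so is the tie-break towards 1 when no witness exists and both values were received equally often (the slides only say "choose majority").

```python
from typing import List, Optional, Tuple


def end_of_round(votes: List[Tuple[int, int]], n: int, t: int) -> Tuple[int, int, Optional[int]]:
    """Process the first n-t votes of a round, given as (value, weight) pairs.

    Returns (next_value, next_weight, decision); decision is None if the
    process does not decide in this round.
    """
    msgs = [0, 0]      # how often each value was received
    witness = [0, 0]   # how many of those votes had weight > n/2
    for value, weight in votes:
        msgs[value] += 1
        if weight > n / 2:
            witness[value] += 1

    if witness[0] > 0:
        value = 0
    elif witness[1] > 0:
        value = 1
    else:
        value = 0 if msgs[0] > msgs[1] else 1   # assumed tie-break: 1

    decision = value if witness[value] > t else None
    return value, msgs[value], decision
```

With n = 5 and t = 1, a round in which three of four received votes are 1 yields weight 3, which is a witness vote in the next round.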
12 Probabilistic Consensus: Byzantine Faults
- Note: a process must be able to distinguish the source of a received message, otherwise a single malicious node can fool everybody
- Upper bound on t: there is no t-Byzantine-robust protocol for $t \ge n/3$
  - there exists an initial bivalent configuration (non-triviality)
  - there exist sets of processes S and T such that $|S| \ge n-t$, $|T| \ge n-t$, $|S \cap T| \le t$ (from $t \ge n/3$)
  - each reachable configuration g is either both S-0-valent and T-0-valent or both S-1-valent and T-1-valent
    - by contradiction: if the correct processes of S and T alone can reach different decisions, then the malicious nodes in the intersection can simulate that to both correct parts of S and T and force them to decide differently
  - there is no reachable bivalent configuration
    - by contradiction: assume g is bivalent and (wlog) S-1-valent and T-1-valent. There is a 0-valent d reachable from g. On the path from g to d there are consecutive configurations $g_1$ and $g_0$ such that $g_v$ is S-v-valent and T-v-valent. The node at which the event transforming $g_1$ into $g_0$ occurred can be neither in S nor in T.
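One concrete choice of the sets S and T used in the bound is sketched below. The construction (first n-t and last n-t process ids) and the function name are assumptions made for illustration; any pair with the stated sizes works.

```python
from typing import Set, Tuple


def overlapping_sets(n: int, t: int) -> Tuple[Set[int], Set[int]]:
    """Return (S, T) over process ids 0..n-1 with |S| = |T| = n-t and |S & T| <= t.

    Such sets exist only when t >= n/3, which is exactly the case the
    impossibility argument covers.
    """
    if not (0 <= t < n and 3 * t >= n):
        raise ValueError("construction requires n/3 <= t < n")
    s = set(range(0, n - t))   # first n-t processes
    u = set(range(t, n))       # last n-t processes
    return s, u
```

The overlap has size max(0, n-2t), which is at most t precisely when $n \le 3t$.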
13 Probabilistic Consensus: Byzantine Faults
- Idea of the algorithm
  - rounds with voting
  - a process decides when it accepts more than $(n+t)/2$ votes for the same value
  - receiving a vote does not mean directly receiving a message with that vote (the adversary nodes could fool the correct ones by sending different votes to different nodes); it is replaced by a two-step protocol
    - shout your vote
    - forward (echo) received votes to all neighbours
    - accept a vote if its echo has been received from more than $(n+t)/2$ nodes
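The echo-based acceptance rule for a single round can be sketched as a small counter. The class and method names are hypothetical; the sketch assumes the threshold $\lfloor (n+t)/2 \rfloor + 1$ and that only the first echo per (sender, origin) pair is counted, so a sender cannot echo two different votes of the same process.

```python
from collections import defaultdict
from typing import Hashable


class EchoCounter:
    """Counts echoes of votes within one round and reports acceptance."""

    def __init__(self, n: int, t: int) -> None:
        self.threshold = (n + t) // 2 + 1   # strictly more than (n+t)/2 echoes
        self.seen = set()                   # (sender, origin) pairs already counted
        self.echoes = defaultdict(int)      # (origin, value) -> number of echoes

    def on_echo(self, sender: Hashable, origin: Hashable, value: int) -> bool:
        """Record an echo; return True exactly when it makes the vote accepted."""
        if (sender, origin) in self.seen:
            return False                    # repeated echo: ignore it
        self.seen.add((sender, origin))
        self.echoes[(origin, value)] += 1
        return self.echoes[(origin, value)] == self.threshold
```

Returning True only when the count equals the threshold ensures each vote is accepted at most once.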
14 Probabilistic Consensus: Byzantine Faults
The algorithm for processor p
round_p := 0                      /* value_p initially holds the input of p */
while (true) do
    reset arrays msgs_p and echos_p to 0
    shout(init-vote, p, value_p, round_p)
    /* accept the first n-t votes in this round */
    while (msgs_p[0] + msgs_p[1] < n-t) do
        receive(msgtype, r, v, rn) from q
        if ((msgtype, r, *, rn) has already been received from q
            or ((msgtype = init-vote) and (q <> r))) then
            ignore it             /* q repeats or lies, it is Byzantine */
        else if (rn > round_p) then
            process the message when round_p reaches rn
        else if (msgtype = init-vote) then
            shout(echo-vote, r, v, rn)
        else if ((msgtype = echo-vote) and (round_p = rn)) then
            echos_p[r,v]++
            if (echos_p[r,v] = floor((n+t)/2) + 1) then msgs_p[v]++
    endwhile
    value_p := majority of msgs_p
    if (msgs_p[value_p] > (n+t)/2) then decide value_p
    round_p++
endwhile
15 Probabilistic Consensus: Byzantine Faults
- Correctness
  - if a correct process p accepts a vote v for correct process r in round k, then r indeed voted for v
    - at least one of the received echoes must have been forwarded by a correct process
  - if correct processes p and q accept a vote for process r in round k, they both accept the same vote
    - p and q have received the echo of r's vote through at least one common correct process
  - if all correct processes start round k, then all correct processes complete round k
    - since $n-t > (n+t)/2$, a vote from a correct process will be accepted, unless n-t votes have already been accepted
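The counting behind these claims can be written out as follows. The symbols $E_p$ and $E_q$ are introduced here only for illustration: they denote the sets of processes whose echoes made p and q accept a vote for r.

```latex
% Both acceptance sets contain more than (n+t)/2 of the n processes:
|E_p| > \frac{n+t}{2}, \qquad |E_q| > \frac{n+t}{2}
% Inclusion-exclusion gives the size of their overlap:
|E_p \cap E_q| \;\ge\; |E_p| + |E_q| - n \;>\; (n+t) - n \;=\; t
% So the overlap contains at least one correct process, which echoes
% only one vote per origin and round; hence p and q accept the same vote.
% Liveness uses the resilience bound:
n - t > \frac{n+t}{2} \iff 2n - 2t > n + t \iff n > 3t
```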
16 Probabilistic Consensus: Byzantine Faults
- Correctness II
  - if a correct process decides on v in round k, then all correct processes choose v in round k and in all later rounds
  - $\lim_{k \to \infty} \Pr[\text{correct } p \text{ has not decided before round } k] = 0$
    - with positive probability the correct processes all accept votes in round k from the same collection S of n-t processes, and in round k+1 only votes for processes in S
  - if all correct processes start the algorithm with input v, a decision for v is eventually taken
17 FT in Synchronous Systems
- crashes can be identified and easily dealt with
- we focus on Byzantine failures
- bounded-delay discussion
- Byzantine broadcast problem
- the n/3 upper bound on t
- Byzantine broadcast algorithm
  - exponential
  - polynomial
- Byzantine broadcast with signatures
  - exponential
  - polynomial
18 Byzantine Broadcast
- one process is the general, holding the initial value $x_g$ from some set V
- at most t processes may fail in an arbitrary (Byzantine) way, including the general
- the goals of Byzantine broadcast are
  - Termination. Every correct process p will decide on a value $y_p$ from V.
  - Agreement. All correct processes decide on the same value.
  - Dependence. If the general is correct, all correct processes decide on $x_g$.
- Theorem: there is no t-Byzantine-robust broadcast protocol for $t \ge n/3$.
19 Byzantine Broadcast with Authentication
- each process can sign messages: <msg>:p, <msg>:p:q
- a Byzantine process cannot forge others' signatures
- 2 algorithms
  - KeepSigning [Lamport, Shostak, Pease] (exponential communication, t+1 steps)
  - FirstTwo [Dolev, Strong] (quadratic communication, t+1 steps)
- KeepSigning
  - Step 1: the general shouts <value, $x_g$>:g
  - Step i: receive messages <value, x>:g:$p_2$:$p_3$:...:$p_i$
    - if a message is correct, add your signature and shout it
    - remember the values seen in correct messages (in set $W_p$)
  - Step t+1
    - if $W_p$ contains a single value v, decide v; otherwise decide the default (0)
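The per-message handling of KeepSigning can be sketched without real cryptography. Everything here is a simplifying assumption: signatures are modelled as a tuple of signer ids (unforgeability is taken for granted), the function names are hypothetical, and "correct message" is interpreted as: exactly `step` distinct signatures, the first one by the general, and none by the receiver.

```python
from typing import Hashable, Optional, Set, Tuple

Message = Tuple[int, Tuple[Hashable, ...]]   # (value, chain of signer ids)


def is_correct(msg: Message, step: int, general: Hashable, me: Hashable) -> bool:
    """Check the assumed validity conditions for a message received in `step`."""
    _, chain = msg
    return (
        len(chain) == step
        and len(chain) > 0
        and chain[0] == general
        and len(set(chain)) == len(chain)
        and me not in chain
    )


def handle(msg: Message, step: int, general: Hashable, me: Hashable,
           seen_values: Set[int]) -> Optional[Message]:
    """Record the value of a correct message; return the signed copy to shout."""
    if not is_correct(msg, step, general, me):
        return None
    value, chain = msg
    seen_values.add(value)            # this set plays the role of W_p
    return (value, chain + (me,))     # append own signature


def decide(seen_values: Set[int], default: int = 0) -> int:
    """Decide the single value seen, otherwise the default."""
    return next(iter(seen_values)) if len(seen_values) == 1 else default
```

In step t+1 a process would only record values and call `decide`; the returned message is no longer shouted.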
20 Byzantine Broadcast with Authentication
- KeepSigning: Correctness
  - Termination: in step t+1
  - Dependence: if the general is correct with value v, only value v will appear in the sets $W_p$ and all correct processes will decide on v.
  - Agreement: if the general is not correct, all correct processes will have the same sets $W_p$ and decide on the same value.
    - let p insert v into $W_p$ after seeing the message <value, v>:g:$p_2$:$p_3$:...:$p_i$ in round i
    - if q's signature is listed there, q has already inserted v into $W_q$
    - if $i < t+1$, p will send the message to q and q will insert v in round i+1
    - if $i = t+1$, at least one of the signatures is of a correct process, which has sent v to q
21 Byzantine Broadcast with Authentication
- Algorithm FirstTwo
  - it is not necessary to forward all messages, just to detect whether there are two different values signed by the general
  - shout only the first two different values you learn about
  - if, after t+1 steps, two values have been seen, decide "general lies"; otherwise decide on the single value
- Message complexity $O(n^2)$
- Note
  - up to n-1 faults can be tolerated
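The bookkeeping of FirstTwo can be sketched as follows. The class name and the treatment of the case where no value was seen at all (it returns the same default as "general lies") are assumptions; signature checking is left out.

```python
from typing import Hashable, List


class FirstTwo:
    """Tracks the first two distinct values learned from correctly signed messages."""

    def __init__(self) -> None:
        self.seen: List[Hashable] = []

    def learn(self, value: Hashable) -> bool:
        """Record a value; return True if it is new and must be shouted on."""
        if value in self.seen or len(self.seen) >= 2:
            return False
        self.seen.append(value)
        return True

    def decide(self, default: Hashable = "general lies") -> Hashable:
        """After t+1 steps: the single value seen, otherwise the default."""
        return self.seen[0] if len(self.seen) == 1 else default
```

Because each process relays at most two values to every other process, the total number of messages stays quadratic in n.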
22 Synchronous Byzantine Broadcast
- Preliminaries
  - the algorithm works in rounds, shouting either the value 1 or the processors' identities
  - $R_p[q,v]$ means p has received value v from q; initialized to false
  - idea: gather enough evidence that the general said 1
  - low threshold $L = t+1$, high threshold $H = 2t+1$
- Three activities at process p
  - supporting (set $S_p$): p decides to support q iff
    - p has received value 1 directly from q (direct support), or
    - $|\{r : R_p[r,q]\}| \ge L$ (indirect support)
    - p supports q by shouting <q>
  - confirming: set $C_p = \{q : |\{r : R_p[r,q]\}| \ge H\}$
  - initiating
    - the general with value 1 initiates
    - a process receiving 1 from the general in step 1 initiates
    - a process in step i initiates if it has confirmed at least $Th(i) = L + \max(0, \lfloor i/2 \rfloor - 1)$ lieutenants
    - initiation means shouting the value 1
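The thresholds and the supporting and confirming sets can be sketched as below. The data layout is hypothetical: `received[r]` is the set of items p has received from r so far, where the marker `VALUE` stands for the message <1> and every other item is a process id. Integer (floor) division in `th` is an assumption.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

VALUE = "<1>"   # marker for the broadcast value; process ids must differ from it


def thresholds(t: int) -> Tuple[int, int]:
    """Return (L, H) = (t+1, 2t+1)."""
    return t + 1, 2 * t + 1


def th(i: int, t: int) -> int:
    """Number of confirmed lieutenants needed to initiate in step i."""
    low, _ = thresholds(t)
    return low + max(0, i // 2 - 1)


def supporting_and_confirmed(received: Dict[Hashable, Set[Hashable]],
                             processes: Iterable[Hashable],
                             t: int) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Compute (S_p, C_p) from what p has received from each process."""
    procs = list(processes)
    low, high = thresholds(t)
    supporting, confirmed = set(), set()
    for q in procs:
        supporters = sum(1 for r in procs if q in received.get(r, ()))
        direct = VALUE in received.get(q, ())
        if direct or supporters >= low:
            supporting.add(q)
        if supporters >= high:
            confirmed.add(q)
    return supporting, confirmed
```

With this definition Th(1) = Th(2) = Th(3) = L, and Th(i) + 1 = Th(i+2) for every step i of at least 2.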
23 Synchronous Byzantine Broadcast II
The algorithm for process p
Initialization:
    boolean R_p[*,*] := false
    boolean ini_p := (p = g) and (x_p = 1)
Step i:
    /* sending */
    if ini_p then shout <1>
    shout <q> for all q in your supporting set S_p
    receive all messages of this step
    /* processing, update S_p and C_p */
    if (i = 1) and R_p[g,1] then ini_p := true
    if (number of lieutenants in C_p) >= Th(i) then ini_p := true
    if (i = 2t+3) then
        /* deciding */
        if |C_p| >= H then decide 1 else decide 0
24 Synchronous Byzantine Broadcast III
- Correctness
  - if a correct p supports q, then q has sent <1>
    - if p supports q indirectly, then some correct process supported q
    - therefore there must be a first, direct support, hence q sent <1>
  - if a correct process p confirms q, then all correct processes confirm q at most one step later
    - at least L correct processes supported q, therefore each correct process received at least L supports for q; so in the next step all correct processes support q, and after that each correct process confirms q
  - simultaneous termination follows directly from the algorithm (step 2t+3)
  - dependence
    - if the general shouts <1>, every correct process will initiate, then all of them will support each other, then confirm, and at step 2t+3 decide on 1
    - if the general does not have 1, only faulty processes can start shouting <1>, but there are not enough of them to break the L threshold needed for support
25 Synchronous Byzantine Broadcast IV
- Correctness (agreement)
  - sufficient to consider the case of a Byzantine general
  - Lemma: if at least L correct processes (forming a set A) initiate by the end of pulse i, for $i < 2t$, then all correct processes decide on the value 1
    - at step i+2 all correct processes have confirmed the processes in A
    - we show that all of them initiate as well
      - if $i = 1$, $|A| \ge L = Th(3)$
      - if $i > 1$
        - there is a process r that initiated for the first time in round i, therefore it has confirmed at least Th(i) lieutenants
        - these lieutenants will be confirmed by all correct processes by the end of round i+1
        - r is not among them, as it shouts <1> for the first time in round i+1
        - but all correct processes confirm r by the end of pulse i+2
        - therefore at the end of pulse i+2 each correct process confirms $Th(i)+1 = Th(i+2)$ lieutenants and initiates
    - by step i+4 all correct processes are confirmed
    - since $i < 2t$, $i+4 \le 2t+3$, and at time 2t+3 every correct process decides
26 Synchronous Byzantine Broadcast V
- Lemma: suppose at least L correct processes initiate, and let i be the first step in which L correct processes have initiated. Then $i < 2t$.
  - for a correct process to initiate in step 2t or later, it must confirm at least $Th(2t) = L + (t-1)$ lieutenants, of which at most t-1 are faulty (the general is faulty). Hence at least L correct lieutenants must have been confirmed, i.e. they initiated in an earlier step.
- Note
  - each process sends each of the n+1 possible messages at most once over each link, resulting in $O(n^3)$ message complexity ($O(\log n)$-bit messages)
  - if $n > 3t+1$, a subset of 3t fixed lieutenants is chosen and they announce the result to the remaining passive processes
  - a passive process accepts a value received with multiplicity at least t+1
  - complexity $O(t^3 + tn)$
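The acceptance rule of a passive process can be sketched in a few lines. The function name and the fallback to a default when no value reaches multiplicity t+1 are assumptions; the slides do not say what happens in that case.

```python
from collections import Counter
from typing import Hashable, Iterable


def passive_accept(announcements: Iterable[Hashable], t: int,
                   default: Hashable = 0) -> Hashable:
    """Accept a value announced by at least t+1 active processes.

    At most t announcers are faulty, so a value with multiplicity t+1
    was announced by at least one correct active process.
    """
    for value, count in Counter(announcements).items():
        if count >= t + 1:
            return value
    return default   # assumed fallback, not specified in the source
```

Each of the active lieutenants sends one announcement to each passive process, which accounts for the tn term in the complexity.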