Fault Tolerance and Consensus

ASCI a9 / November 2006

1
Fault Tolerance and Consensus
  • Problem definitions
  • Stopping failures
  • Byzantine failures
  • Randomized solutions
  • Impossibility in asynchronous systems

Fundamentals and Design of Distributed Systems
D.H.J. Epema
Parallel and Distributed Group
2
Fault tolerance and consensus (1)
  • Two persons (Alice and Bob) try to make an
    appointment
  • Two propositions
  • PA: Alice wants to have the appointment
  • PB: Bob wants to have the appointment
  • Alice sends message A1 that she wants the
    appointment
  • Bob receives A1, and so knows PA: KB(PA) holds
  • Bob sends back a message B1 that he wants to go
    too
  • Alice receives B1, and so KA(PB) and KA(KB(PA))
    hold
  • Alice sends confirmation A2 back
  • This continues forever
  • Problem: messages may have arbitrary delays and
    may get lost

[Figure: messages A1, B1, A2 exchanged between A and B; KB(PA) holds at B after A1, and KA(PB) and KA(KB(PA)) hold at A after B1]
3
Fault tolerance and consensus (2)
  • Processors may need to reach consensus
  • Applications
  • commit a transaction in a database
  • all participating sites have to agree on
    committing the results
  • in a distributed database with replication
  • when a record is to be modified, the database
    servers holding the replicas have to agree on the
    modification
  • in a replicated computation
  • processes have to start with the same input value
    (e.g., from a sensor), so they have to agree on
    this value
  • Agreement modeled as agreeing on the value of a
    single bit
  • Reaching consensus is a problem in the face of
    failures

4
Fault classification
  • Possible processor failures
  • fail-stop (crash) failures
  • a process just stops
  • in a synchronous system, a process that crashes
    during a round may have sent only a subset of the
    messages it should send in that round
  • omission failures
  • fail to send or receive a message
  • performance failures
  • not meeting timing specifications
  • Byzantine failures
  • arbitrary (possibly malicious) behavior

5
Model aspects
  • Synchronous versus asynchronous
  • reaching agreement is much more difficult in
    asynchronous systems: the difference between a
    long delay and a processor/link failure cannot be
    detected
  • Authentication
  • without: messages can be forged or altered by a
    process before passing them along to others
  • with: messages cannot be forged or modified
  • agreement is much more difficult to reach when
    messages are non-authenticated
  • Network connectivity
  • we assume a complete network

6
Agreement with stopping failures
  • All processes start with an initial value from
    some set V
  • Every process has to decide on a value in V such
    that
  • Agreement: no two processes decide on different
    values
  • Validity: if all processes start with the same
    value v, then no process decides on a value
    different from v
  • Termination: all non-faulty processes decide
    within finite time

7
Byzantine generals
[Figure: generals around the city, each choosing between "attack" and "no attack"]
  • City surrounded by armies
  • Armies have to attack simultaneously in order to
    conquer the city
  • Communication between generals by means of
    messengers
  • Some generals of the armies are traitors

8
The Byzantine agreement problem
  • One process (the source or commander) starts with
    a binary value
  • Each of the remaining processes (the lieutenants)
    has to decide on a binary value such that
  • Agreement: all non-faulty processes agree on the
    same value
  • Validity: if the source is non-faulty, then all
    non-faulty processes agree on the initial
    value of the source
  • Termination: all non-faulty processes decide
    within finite time
  • So if the source is faulty, the non-faulty
    processes can agree on any value
  • It is irrelevant on what value a faulty process
    decides

[Figure: commander C with initial value 0 or 1]
9
Two variations
  • All generals start with a value
  • Variation 1
  • all non-faulty generals have to agree on a vector
    with a value for every general
  • solution: run a copy of an algorithm for the
    previous problem for every general
  • Variation 2
  • all non-faulty generals have to agree on a single
    value
  • solution: apply the same decision rule on the
    vector in every general (e.g., majority function)

[Figure: agreed vector (v1, v2, ..., vn), reduced to majority(v1, v2, ..., vn)]
10
A solution for stopping failures (1)
  • Solution by flooding decision values
  • No more than f failing processes
  • Every process starts with a value v
  • Every process maintains a set W (with decision
    values seen so far)
  • Initially, W = {v}
  • Then, do f+1 rounds
  • broadcast current value of W to all other
    processes
  • receive all these sets and set W to the union of
    them all and W
  • Finally,
  • if W contains only a single element v, decide(v)
  • else decide(default)
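
The flooding rounds above can be sketched as a small synchronous simulation. This is only an illustration under stated assumptions: the function name `floodset`, the `crashes` map (process id to the round in which it crashes and the recipients it still reaches in that round), and the process numbering 0..n-1 are all hypothetical and not part of the slides.

```python
def floodset(initial, f, crashes=None, default=0):
    """Simulate f+1 flooding rounds; return {process: decision} for survivors.

    initial: list with the initial value of each process (ids 0..n-1).
    crashes: hypothetical failure script {pid: (round, recipients_reached)}.
    """
    n = len(initial)
    crashes = crashes or {}
    W = [{v} for v in initial]          # initially W = {v}
    alive = set(range(n))
    for rnd in range(1, f + 2):         # rounds 1 .. f+1
        inbox = [set() for _ in range(n)]
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == rnd:
                targets = crashes[p][1]  # crash mid-broadcast: only a subset is reached
            else:
                targets = range(n)
            for q in targets:
                inbox[q] |= W[p]
        # processes that crashed in this round take no further part
        alive -= {p for p in alive if p in crashes and crashes[p][0] == rnd}
        for q in alive:
            W[q] |= inbox[q]             # union of all received sets and W
    return {q: (next(iter(W[q])) if len(W[q]) == 1 else default) for q in alive}


# Process 0 crashes in round 1 after reaching only process 1.
print(floodset([0, 1, 1], f=1, crashes={0: (1, [1])}))  # {1: 0, 2: 0}
```

In this run process 2 would see only the value 1 after a single round, while process 1 already holds {0,1}; the second round is what makes their sets equal.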

11
A solution for stopping failures (2)
  • Validity and termination are trivially satisfied
  • For agreement
  • enough to show that all processes that are still
    active at the end of round f+1 then have the same
    set W
  • because there are f+1 rounds and at most f
    failing processes, there is at least one round r
    in which no process fails
  • in round r all active processes exchange their
    sets W, and so have identical sets W at the end
    of the round
  • from then on, all sets W in all active processes
    are identical

[Figure: timeline of rounds; in round r nobody fails, and from then on all active processes have the same W]
12
A solution for stopping failures (3)
  • Optimization
  • processes only need to know whether at the end
    |W| = 1 or |W| > 1
  • so let processes only broadcast at most two
    values
  • their initial value
  • the first different value they receive

13
Conditions for a solution for Byzantine agreement
  • Number of processes n
  • Maximum number of possibly failing processes f
  • Necessary and sufficient condition for a solution
    to Byzantine agreement
  • f < n/3
  • Minimal number of rounds in a deterministic
    solution
  • f+1
  • There exist randomized solutions with a lower
    expected number of rounds

14
Example three generals (1)
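  • Scenario 1: Lieutenant L2 is a traitor

[Figure: C sends 0 to L1 and 0 to L2; L1 tells L2 that it received 0; the traitor L2 tells L1 that it received 1. Note all messages sent and received by L1.]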
15
Example three generals (2)
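  • Scenario 2: Commander C is a traitor

[Figure: C sends 0 to L1 and 1 to L2; L1 tells L2 that it received 0; L2 tells L1 that it received 1. L1 sends and receives the same messages as in scenario 1.]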
16
Example three generals (3)
  • L1 has to decide 0 in scenario 1, because both L1
    and C are loyal and C starts with a 0
  • Lieutenant L1 cannot distinguish the two
    scenarios
  • So L1 also has to decide 0 in scenario 2
  • So a loyal lieutenant (L1) always has to follow
    the commander
  • The same holds for L2, so L2 has to decide 1 in
    scenario 2
  • Contradiction: L1 and L2 are both loyal in
    scenario 2, but decide on different values!
  • This is an example of an impossibility result:
    no algorithm solves the problem for n = 3, f = 1

17
A solution for Byzantine agreement (1)
  • Algorithm is recursive with f+1 levels
  • Without authentication, modeled with Oral
    Messages (OM)
  • When a message is supposed to be sent according
    to the algorithm, but a process does not send it,
    this is detected, and a default value (e.g., 0)
    is assumed
  • Bottom case of the recursion: OM(0) (no failures)
  • the commander broadcasts its initial value
  • every other process decides on the value it
    receives

18
A solution for Byzantine agreement (2)
  • OM(f), f > 0 (resilient to f failures)
  • the commander broadcasts its initial value
  • process numbering: commander = 0, lieutenants
    1, 2, ..., n-1
  • let vi be the value received from the commander
    by lieutenant Li, or the default if no value is
    received
  • recursive step
  • Li executes OM(f-1) with value vi, acting as the
    commander for the other lieutenants
    (L1, ..., Li-1, Li+1, ..., Ln-1)
  • let vj be the value on which Li decides in the
    recursive step with Lj as the commander (for
    i, j = 1, 2, ..., n-1, i ≠ j)
  • Li decides on majority(v1, ..., vi, ..., vn-1)
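
The recursion of OM(f) can be written down directly. The sketch below is an illustration, not the presenter's code: the names `om`, `majority`, `traitors` and `lie` are hypothetical, traitor behavior is reduced to a caller-supplied function `lie(sender, receiver, value)`, and a missing strict majority falls back to the default value 0 (the slides only say "majority").

```python
from collections import Counter


def majority(values, default=0):
    """Strict majority of values, or the default if there is none (assumption)."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


def om(f, commander, lieutenants, value, traitors, lie):
    """Return {lieutenant: decision} for OM(f) with the given commander."""
    # The commander sends its value to every lieutenant (a traitor may lie).
    received = {
        i: (lie(commander, i, value) if commander in traitors else value)
        for i in lieutenants
    }
    if f == 0:
        return received                  # OM(0): decide on the value received
    # Recursive step: every lieutenant j relays its value with OM(f-1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(f - 1, j, others, received[j], traitors, lie)
    # Li takes the majority of its own vi and its decisions in the sub-runs.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# Four generals, f = 1, lieutenant 3 is a traitor that flips every value.
flip = lambda sender, receiver, value: 1 - value
decisions = om(1, 0, [1, 2, 3], 1, traitors={3}, lie=flip)
print(decisions[1], decisions[2])        # 1 1
```

The decision of the traitor itself also appears in the returned dictionary, but as the problem statement says, it is irrelevant.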

19
A solution for Byzantine agreement (3)
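[Figure: in OM(f), commander C sends v1, v2, ..., vi, ..., vn-1 to lieutenants L1, L2, ..., Li, ..., Ln-1; in OM(f-1), L1 acts as commander for L2, ..., Li, ..., Ln-1. Here Li decides on its own v1 as a lieutenant of L1.]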
20
A solution for Byzantine agreement (4)
  • So a lieutenant does not decide on the majority
    of all values it receives!!!
  • But Li decides on majority(majority(...),
    majority(...), ..., vi, ..., majority(...))
  • each inner majority(...) is computed as the
    decision when acting as a lieutenant in OM(f-1)
  • vi is obtained directly from the commander
21
A solution for Byzantine agreement (5)
  • Number of executions
  • OM(f): 1 time
  • OM(f-1): (n-1) times
  • OM(k): (n-1)(n-2)...(n-f+k) times for
    k = 0, 1, ..., f-1
  • Total number of messages is of order n^(f+1)
  • OM(f): n-1
  • OM(f-1): (n-1)(n-2)
  • OM(k): (n-1)(n-2)...(n-(f-k))(n-(f-k+1))
  • OM(0): (n-1)(n-2)...(n-(f+1)) (f+1 factors, this
    dominates)
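
As a quick check of these products, the following sketch computes the number of messages per recursion level. The function name `om_messages` is hypothetical; the arithmetic follows the products listed above.

```python
def om_messages(n, f):
    """Messages sent at levels OM(f), OM(f-1), ..., OM(0), in that order."""
    per_level = []
    product = 1
    for depth in range(f + 1):      # depth 0 is OM(f), depth f is OM(0)
        product *= n - 1 - depth    # next factor: (n-1), (n-2), ...
        per_level.append(product)
    return per_level


levels = om_messages(7, 2)
print(levels, sum(levels))          # [6, 30, 120] 156
```

For n = 7 and f = 2 the last level, with f+1 = 3 factors, already accounts for 120 of the 156 messages.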

22
A solution for Byzantine Agreement (6)
[Figure: the tree built in lieutenant L6 for n = 7, f = 2, i = 6. Level 0: root 0. Level 1: nodes 1, 2, 3, 4, 5. Level 2: under each level-1 node, the ids not yet on the path and different from 6.]
  • In order to decide, every lieutenant Li creates a
    labelled tree with f+1 levels
  • level 0: the root with label 0 (the commander)
  • level 1: n-2 children with all labels except 0
    and i
  • at every subsequent level: all ids that have not
    yet occurred on the path from the root and are
    different from i
  • the degree decreases by 1 at every level

23
A solution for Byzantine Agreement (7)
[Figure: the same tree in lieutenant L6 (n = 7, f = 2, i = 6). The root carries v6; node 1 at level 1 carries the value received from L1; node 5 below node 1 at level 2 carries the value received through L1 and L5.]
  • Label the nodes of the tree with additional
    labels
  • level 0: vi (value received from the commander)
  • level 1: the value that Lj told Li that the
    commander told him
  • label of any node: the value that was passed to
    Li from the commander through the chain of
    lieutenants on the path from the root to the node

24
A solution for Byzantine Agreement (8)
[Figure: the tree in lieutenant L6 (n = 7, f = 2, i = 6) with root value v6. A level-1 node with value v and leaf children w, x, y, z gets majority(v,w,x,y,z); a leaf with value z decides z.]
  • Decide by propagating the result up with the
    majority function
  • at the leaf level: decide on the value received
    (OM(0))
  • at every next higher level: take the majority of
    the local value and the decisions at child nodes
  • the final value at the root is the final decision

25
Example four generals (1)
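  • A lieutenant is a traitor

[Figure: in OM(1), C sends v to all three lieutenants; in the 3 x OM(0), the loyal lieutenants forward v and the traitor sends arbitrary values (?)]
  • Every loyal lieutenant receives v, v, ? and so
    decides majority(v, v, ?) = v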
26
Example four generals (2)
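  • The commander is a traitor

[Figure: C sends x, y and z to the three lieutenants; in the 3 x OM(0), every lieutenant forwards the value it received]
  • Every loyal lieutenant receives x, y, z and so
    all decide majority(x, y, z)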
27
Byzantine agreement with authentication (1)
  • Every message carries a signature
  • The signature of a loyal general cannot be forged
  • Alteration of the contents of a signed message
    can be detected
  • Every (loyal) general can verify the signature of
    any other (loyal) general
  • Any number f of traitors can be allowed
  • Commander is process 0
  • Structure of message from (and signed by) the
    commander, and subsequently signed and sent by
    lieutenants Li1, Li2, ..., Lik
  • (v : s0 : si1 : ... : sik)
  • Every lieutenant maintains a set of orders V
  • Some choice function on V for deciding (e.g.,
    majority, minimum)

28
Byzantine agreement with authentication (2)
  • Algorithm in commander
  • send(v : s0) to every lieutenant
  • Algorithm in every lieutenant Li
  • upon receipt of (v : s0 : si1 : ... : sik) do
  •   if (v not in V) then
  •     V := V ∪ {v}
  •     if (k < f) then
  •       for (j in {1, 2, ..., n-1} \ {i, i1, ..., ik}) do
  •         send(v : s0 : si1 : ... : sik : si) to Lj
  • if (Li will not receive any more messages) then
  •   decide(choice(V))

Note: sign and propagate messages long enough
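
The lieutenant algorithm can be simulated in f+1 synchronous rounds. This is a rough sketch under several assumptions that are not in the slides: signatures are modeled simply as the tuple of signer ids (so forgery is not modeled at all), traitorous lieutenants just stay silent, every lieutenant receives some order from the commander, and `choice` defaults to `min`. The name `signed_messages` is hypothetical.

```python
def signed_messages(n, f, orders, traitors=frozenset(), choice=min):
    """Return {loyal lieutenant: decision}.

    orders: {lieutenant: value} as sent by commander 0; a loyal commander
    sends the same value to everyone, a traitor may send different values.
    """
    V = {i: set() for i in range(1, n)}
    # A message is (value, chain of signer ids); chain (0,) is signed by the commander only.
    inbox = {i: [(v, (0,))] for i, v in orders.items()}
    for _ in range(f + 1):
        outbox = {i: [] for i in range(1, n)}
        for i, msgs in inbox.items():
            if i in traitors:
                continue                      # assumption: silent traitors
            for v, chain in msgs:
                if v not in V[i]:
                    V[i].add(v)
                    if len(chain) - 1 < f:    # k < f, k = number of lieutenant signatures
                        for j in range(1, n):
                            if j != i and j not in chain:
                                outbox[j].append((v, chain + (i,)))
        inbox = outbox
    return {i: choice(V[i]) for i in range(1, n) if i not in traitors}


# Three generals, traitorous commander: 1 to L1 and 0 to L2.
print(signed_messages(3, 1, {1: 1, 2: 0}, traitors={0}))  # {1: 0, 2: 0}
```

Both lieutenants end with V = {0, 1} and therefore apply `choice` to the same set, which is why agreement holds even with only three generals.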
29
Example three generals
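  • Commander C is a traitor
  • Format: value:signature(s)

[Figure: C sends 1:0 to L1 and 0:0 to L2; L1 forwards 1:0:1 to L2 and L2 forwards 0:0:2 to L1; both lieutenants end with V = {0,1}]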
30
Randomized Byzantine agreement (1)
  • Solution for synchronous and asynchronous
    systems!!
  • n processes, of which at most f fail, n > 5f
  • Every process has an initial value v
  • The algorithm proceeds in rounds consisting of
    three phases
  • a notification phase (messages contain an N)
  • a proposal phase (messages contain a P)
  • a decision phase
  • When a process expects messages from all other
    processes, it is no use waiting for more than n-f
    messages
  • When not enough processors support a possible
    decision, a process starts the next round with a
    new random input value v

31
Randomized Byzantine agreement (2)
  • r := 1; decided := false
  • do forever
  •   /* notification phase */
  •   broadcast(N,r,v)
  •   await (n-f) messages of the form (N,r,*)
  •   /* proposal phase */
  •   if (> (n+f)/2 messages (N,r,w), w = 0,1) then
  •     broadcast(P,r,w)
  •   else broadcast(P,r,?)
  •   if decided then STOP
  •   else await (n-f) messages of the form (P,r,*)
  •   /* decision phase */
  •   if (> f messages (P,r,w), w = 0,1) then
  •     v := w
  •     if (> 3f messages (P,r,w)) then
  •       decide(w)
  •       decided := true
  •   else v := random(0,1)
  •   r := r+1

Note: the threshold conditions are explained later
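
The round structure can be tried out in a small simulation. This sketch makes strong simplifying assumptions that are not in the slides: no process actually behaves maliciously, asynchrony is modeled by letting each process hear from a random set of n-f senders in each phase, and processes that have decided keep participating with their decided value instead of stopping. The name `randomized_agreement` and its parameters are hypothetical.

```python
import random


def randomized_agreement(initial, f, seed=0, max_rounds=1000):
    """Return (decisions, rounds_used) for the three-phase round structure."""
    rng = random.Random(seed)
    n = len(initial)
    assert n > 5 * f
    v = list(initial)
    decision = [None] * n
    for r in range(1, max_rounds + 1):
        # Notification phase: hear n-f values, propose w if > (n+f)/2 agree.
        proposals = []
        for i in range(n):
            heard = rng.sample(v, n - f)
            w = next((b for b in (0, 1) if heard.count(b) > (n + f) / 2), None)
            proposals.append(w)          # None plays the role of "?"
        # Proposal and decision phase.
        for i in range(n):
            heard = rng.sample(proposals, n - f)
            if decision[i] is not None:
                continue                 # already decided: keep the value
            w = next((b for b in (0, 1) if heard.count(b) > f), None)
            if w is not None:
                v[i] = w
                if heard.count(w) > 3 * f:
                    decision[i] = w
            else:
                v[i] = rng.choice((0, 1))
        if all(d is not None for d in decision):
            return decision, r
    return decision, max_rounds


print(randomized_agreement([1] * 6, f=1))   # ([1, 1, 1, 1, 1, 1], 1)
```

With unanimous inputs every process decides in the first round, which is the situation of Lemma 2; with mixed inputs the number of rounds depends on the random choices.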
32
Randomized Byzantine agreement (3)
  • No simultaneous contradicting proposals by
    correct processes
  • Lemma 1
  • If a correct process proposes v in round r, then
    no other correct process proposes 1-v in round r
  • Proof
  • the process has received more than (n+f)/2
    messages (N,r,v)
  • of these, more than (n-f)/2 are from correct
    processes (at most f come from faulty ones),
    which is a majority of the correct processes

33
Randomized Byzantine agreement (4)
  • When all correct processes have the same value,
    immediate decision
  • Lemma 2
  • If at the start of round r all correct processes
    have the same value v, then they all decide v in
    round r
  • Proof
  • each correct process receives at least n-f
    notification messages, at least n-2f of which are
    from correct processes, and so of the form
    (N,r,v)
  • because n > 5f, we have n-2f = n/2 + n/2 - 2f >
    n/2 + 5f/2 - 2f = (n+f)/2
  • so each correct process proposes v
  • so, each correct process receives at least n-2f
    messages of the form (P,r,v)
  • because n > 5f, we have n-2f > 3f, and so each
    correct process decides v

34
Randomized Byzantine agreement (5)
  • Decision of any correct process immediately
    followed by others
  • Lemma 3
  • If a correct process decides v in round r, all
    correct processes decide v in round r+1
  • Proof
  • it is enough to show that all correct processes
    propose v in round r+1
  • if a process decides v in round r, it must have
    received more than 3f proposals for v, m of which
    are from correct processes for some m > 2f
  • so every other correct processor receives at
    least m-f > f proposals for v, so it starts the
    next round with this value
  • now use Lemma 2

35
Randomized Byzantine agreement (6)
  • Theorem
  • If n > 5f, the algorithm guarantees agreement,
    validity, and terminates with probability 1
  • Proof
  • with probability 1, enough processors will pick a
    common value v to have at least one correct
    process decide
  • Expected number of rounds is of order 2^n (in
    fact, slightly better)
  • Remark: randomization is used only if there is
    not enough initial support for any decision
    anyway

36
Randomized coordinated attack (1)
  • Synchronous system
  • Coordinated-attack problem
  • Complete graph
  • System runs for a fixed number r of rounds
  • Messages may get lost (all links may exhibit
    failures)
  • Processes do not exhibit failures

37
Randomized coordinated attack (2)
  • Validity
  • if all processes start with 0, they all decide 0
  • if all processes start with 1 and all messages
    are received, they all decide 1
  • Agreement with some probability
  • P[some process decides 0 and some process decides
    1] ≤ ε, for some 0 ≤ ε ≤ 1 (probability of
    disagreement)
  • Termination: trivial

38
Adversaries
  • Faults modeled with an adversary who can on
    purpose try to deceive the system/processors
  • Here, the adversary can choose
  • the input values of the processors
  • the communication pattern (can omit arbitrary
    messages)
  • In the algorithm, we get ε = 1/r

39
Communication patterns (1)
  • Communication pattern: a subset of the set
  • {(i,j,k) | (i,j) an edge in the processor graph,
    k a round number}
  • We will define an ordering ≤γ for pairs (i,k) for
    a communication pattern γ
  • Interpretation: (i,k) ≤γ (j,l) means that j has
    at least the same knowledge in round l as i had
    in round k

[Figure: a triple (i,j,k) represents the message from i to j in round k]
40
Communication patterns (2)
  • Ordering ≤γ for pairs (i,k) for a communication
    pattern γ
  • Knowledge is monotonic
  • (i,k) ≤γ (i,l) if k ≤ l
  • All knowledge is transferred in messages
  • if (i,j,k) in γ, then (i,k-1) ≤γ (j,k)
  • Transitive closure
  • if (i,k) ≤γ (i',k') and (i',k') ≤γ (i'',k''),
    then (i,k) ≤γ (i'',k'')

41
Information level (1)
  • The information level on pairs (i,k) is defined
    as
  • k = 0: levelγ(i,0) = 0
  • k > 0: if there is a j ≠ i such that (j,0) ≤γ (i,k)
    does not hold, then levelγ(i,k) = 0
  • k > 0, otherwise: for every j ≠ i, let
    lj = max{levelγ(j,k') | (j,k') ≤γ (i,k)}
  • then, levelγ(i,k) = 1 + min{lj | j ≠ i}
  • The information level
  • starts at 0
  • indicates what a process knows about other
    processes
  • is incremented when a process has heard about the
    previous level of all other processes
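
The definition can be evaluated mechanically for a given communication pattern. The sketch below is one possible way to do so; the function name `information_levels` and the numbering of processes as 0..n-1 (instead of 1..n) are assumptions made for the example.

```python
def information_levels(n, r, pattern):
    """Return level[k][i] for rounds k = 0..r and processes i = 0..n-1.

    pattern: set of triples (i, j, k), meaning the round-k message
    from i to j is delivered.
    """
    # latest[i][j]: largest k' with (j,k') <= (i,k), or -1 if there is none.
    latest = [[0 if i == j else -1 for j in range(n)] for i in range(n)]
    level = [[0] * n]                           # level(i,0) = 0
    for k in range(1, r + 1):
        new = [row[:] for row in latest]
        for (i, j, kk) in pattern:
            if kk == k:
                # (i,k-1) <= (j,k): j learns all that i knew after round k-1
                new[j] = [max(a, b) for a, b in zip(new[j], latest[i])]
        for i in range(n):
            new[i][i] = k                       # knowledge is monotonic
        latest = new
        row = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            if any(latest[i][j] < 0 for j in others):
                row.append(0)                   # some (j,0) is not known to i
            else:
                row.append(1 + min(
                    max(level[kp][j] for kp in range(latest[i][j] + 1))
                    for j in others))
        level.append(row)
    return level


complete = {(i, j, k) for i in range(3) for j in range(3) if i != j for k in (1, 2, 3)}
print(information_levels(3, 3, complete)[3])    # [3, 3, 3]
```

With a complete pattern every process reaches level k in round k; dropping messages from the pattern lowers the levels.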

42
Information level (2)
  • It can be shown that
  • the information levels of different processes in
    the same round never differ by more than 1
  • if the communication pattern is complete (all
    triples (i,j,k) appear), then levelγ(i,k) = k for
    all i and k

[Figure: example communication pattern with the information levels of the processes per round]
43
The algorithm (1)
  • Ideas
  • Process 1 picks a random number k between 1 and r
  • Full information distribution in every round (on
    correct links)
  • Processes maintain information on the initial
    values v and the levels of all processes
  • Messages are of the form (L,V,k), with
  • L a vector with the levels as far as known by the
    sending process
  • V a vector with the initial values of all
    processes
  • k the round number picked by process 1
  • Levels and initial values of other processes, and
    k initially undefined

44
The algorithm (2)
  • Picking a round number in process 1
  • if ((i = 1) and (rounds = 0)) then
    key := random(1,r)
  • Sending a message in every round in every
    process
  • send(L,V,key) to all j   /* all locally known
    information */
45
The algorithm (3)
  • Receiving all messages in a round in process i
  • rounds := rounds + 1
  • for (j = 1 to i-1, i+1 to N) do
  •   receive(Lj,Vj,kj) from j   /* on correct links */
  •   if (kj ≠ undefined) then key := kj
      /* round number picked by 1 */
  •   for all l: if (Vj(l) ≠ undefined) then
      Vi(l) := Vj(l)   /* copy init. vals */
  •   for all l: if (Lj(l) > Li(l)) then
      Li(l) := Lj(l)   /* copy levels */
  • Li(i) := 1 + min{Li(j) | j ≠ i}   /* compute own
    level */
  • if (rounds = r) then
  •   if (key ≠ undefined) and (Li(i) ≥ key) and
      (Vi(j) = 1 for all j) then   /* all processes
      started with 1 */
  •     decision := 1
  •   else decision := 0
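
The per-round bookkeeping can be simulated as follows. This is a sketch under assumptions not found in the slides: processes are numbered 0..n-1 with process 0 in the role of "process 1", the key is passed in as a parameter rather than drawn with random(1,r) so that runs are reproducible, undefined levels are represented by -1 and undefined values by None, and `delivered(sender, receiver, round)` is a hypothetical callback that plays the adversary.

```python
def coordinated_attack(inputs, r, delivered, key):
    """Return the list of decisions after r rounds."""
    n = len(inputs)
    L = [[0 if j == i else -1 for j in range(n)] for i in range(n)]      # levels
    V = [[inputs[i] if j == i else None for j in range(n)] for i in range(n)]
    K = [key if i == 0 else None for i in range(n)]                      # known key
    for k in range(1, r + 1):
        # Snapshot of what every process sends in this round.
        sent = [(L[i][:], V[i][:], K[i]) for i in range(n)]
        for i in range(n):
            for j in range(n):
                if j == i or not delivered(j, i, k):
                    continue
                Lj, Vj, kj = sent[j]
                if kj is not None:
                    K[i] = kj                   # round number picked by process 0
                for l in range(n):
                    if Vj[l] is not None:
                        V[i][l] = Vj[l]         # copy initial values
                    if Lj[l] > L[i][l]:
                        L[i][l] = Lj[l]         # copy levels
            L[i][i] = 1 + min(L[i][j] for j in range(n) if j != i)
    return [
        1 if K[i] is not None and L[i][i] >= K[i] and all(x == 1 for x in V[i]) else 0
        for i in range(n)
    ]


# The adversary drops the round-2 message from process 1.
lossy = lambda sender, receiver, k: not (sender == 1 and k == 2)
print(coordinated_attack([1, 1], 2, lossy, key=2))   # [0, 1]
print(coordinated_attack([1, 1], 2, lossy, key=1))   # [1, 1]
```

In this two-round example the final levels are 1 and 2, so the processes disagree only when the key equals the maximum level 2.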
46
Use of levels and key
  • In a sense, processes agree on their levels,
    i.e., on the actual round they have reached at
    the end of the algorithm
  • The key chosen by process 1 is a guess of this
    level

47
Why do we get ε = 1/r?
  • Sketch
  • Let li be the value of Li(i) in round r
  • The levels li differ by at most 1
  • If key > max li or at least one process has
    initial value 0, all processes decide 0
    (agreement)
  • If key ≤ min li and all processes have initial
    value 1, all processes decide 1 (again agreement)
  • So the only case where disagreement is possible,
    is when key = max li, which has probability 1/r,
    since max li is determined by the adversary and
    key is uniform on [1,r]

48
We can't do much better
  • It can be shown that
  • Any r-round algorithm for the randomized
    coordinated attack problem has probability of
    disagreement at least equal to 1/(r+1)

49
Impossibility of consensus in asynchronous systems
  • The consensus problem (weak form)
  • Consistency: all correct processors that take a
    decision, take the same decision
  • Validity: both 0 and 1 are possible decisions
    from possibly different initial configurations
    (to avoid trivial solutions)
  • Termination: at least one correct processor
    eventually takes a decision
  • Theorem: there is no such solution, even if at
    most one processor may fail
    (FLP: Fischer-Lynch-Paterson)

50
The model (1)
  • Every processor has
  • an input variable x
  • an output variable y with possible values 0 and 1
  • Messages are of the form (P,m), with
  • P the destination processor
  • m the message contents
  • Message passing is not necessarily FIFO
  • This is modeled with a single global message
    buffer

[Figure: processors P and Q, each with variables x and y, and a message (P,m) in the global message buffer]
51
The model (2)
  • Every processor has a deterministic transition
    function
  • In a single step, a processor
  • receives a message (any from the message buffer
    meant for it)
  • does a local computation
  • sends a finite number of messages
  • A configuration is defined by
  • the internal state of every processor
  • the contents of the message buffer

52
The model (3)
  • In an initial configuration
  • every processor is in an initial state (with some
    value for x)
  • the message buffer is empty
  • A step
  • takes the system from one configuration to the
    next
  • is defined by a single step of one processor
  • is associated with an event e = (P,m)
  • receiving the message m by P
  • performing the ensuing computation
  • and sending a finite number of messages
  • A schedule s from C is a sequence of events
    applied to C
  • The configuration s(C) with s a finite schedule
    is said to be reachable from C

[Figure: event e = (P,m) takes C to C' = e(C); schedule s takes C to C' = s(C)]
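
The notions of configuration, event and schedule can be made concrete in a few lines. This is an illustrative sketch only: the names `apply_event`, `apply_schedule` and `toy_transition` are hypothetical, the message buffer is modeled as a multiset, and the event (p, None) stands for a step in which p receives no message, which is how the original FLP model lets a processor take a step on an empty buffer (the slides do not mention it).

```python
from collections import Counter


def apply_event(config, event, transition):
    """Apply event (p, m) to a configuration (states, buffer)."""
    states, buffer = config
    p, m = event
    if m is not None and buffer[(p, m)] == 0:
        raise ValueError("event is not applicable")
    new_state, sent = transition(p, states[p], m)   # deterministic local step
    new_buffer = buffer - Counter([(p, m)]) if m is not None else buffer.copy()
    new_buffer.update(sent)                         # finitely many new messages
    new_states = dict(states)
    new_states[p] = new_state
    return new_states, new_buffer


def apply_schedule(config, schedule, transition):
    """s(C): apply the events of a finite schedule one by one."""
    for event in schedule:
        config = apply_event(config, event, transition)
    return config


def toy_transition(p, state, m):
    """Toy protocol: on a null step send x to the other processor; on a message set y."""
    x, y = state
    if m is None:
        return (x, y), [("Q" if p == "P" else "P", x)]
    return (x, min(x, m)), []


c0 = ({"P": (0, None), "Q": (1, None)}, Counter())  # initial configuration
a = apply_schedule(c0, [("P", None), ("Q", None)], toy_transition)
b = apply_schedule(c0, [("Q", None), ("P", None)], toy_transition)
print(a == b)                                       # True
```

The last line illustrates the fact used later in the proof: steps of different processors commute.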
53
The model (4)
  • A run is a sequence of configurations associated
    with a schedule
  • A processor is in a decision state if it has
    given y a value
  • A run is a deciding run if at least one
    processor reaches a decision state
  • A processor is non-faulty in a run if it performs
    infinitely many steps
  • A run is admissible if at most one processor is
    faulty, and all messages to non-faulty processors
    are eventually received
  • A configuration is bivalent if both decision
    values are possible (for a correct processor)
  • A configuration is univalent if only one of the
    decision values is possible (0-valent or 1-valent)

54
Outline impossibility proof
  • Suppose there exists a 1-resilient consensus
    protocol
  • Step 1. There exists a bivalent initial
    configuration
  • Step 2. For every bivalent configuration C and
    processor P, there is a finite schedule s such
    that s(C) is bivalent and P takes a step in s
  • Step 3. Keep the system in bivalent
    configurations forever
  • apply step 1: let B1 be a bivalent initial
    configuration
  • for every i, apply step 2: there exists an s such
    that Bi+1 = s(Bi) is bivalent with process Pj
    taking the final step in s, j = i mod n

[Figure: B1 → B2 → ... → Bi → Bi+1 → ..., each arrow a schedule s; all processes take steps in turn]

55
Proof step 1
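  • Suppose that every initial configuration is
    univalent; by validity, there are a 0-valent
    initial configuration C0 and a 1-valent initial
    configuration C1
  • We can assume that C0 and C1 differ in the
    initial value of only one process, say process P
    (by flipping one value at a time)
  • Consider an admissible deciding run starting in
    C0 in whose schedule s P does not take any step;
    s can also be applied to C1
  • The two runs reach the same decision, but the
    decision is 0 in the run starting in C0 and 1 in
    the run starting in C1: contradiction

[Figure: the 0-valent C0 and the 1-valent C1 differ only in the initial value of P; the same schedule s of the other processes is applied to both]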
56
Proof step 2 (1)
  • Let C be a bivalent configuration in which
    e = (P,m) is applicable
  • Let 𝒞 (script C) be the set of configurations
    reachable from C without ever applying e
  • Let D be the set of configurations obtained by
    applying e to every configuration in 𝒞: D = e(𝒞)
  • To prove: D contains a bivalent configuration

[Figure: the set 𝒞 of configurations reachable from C without e; applying e to 𝒞 gives D = e(𝒞)]
57
Proof step 2 (2)
  • Assume D does not contain bivalent configurations
  • Then D contains both 0-valent and 1-valent
    configurations:
  • Let Ei be an i-valent configuration reachable
    from C, i = 0,1 (these exist because C is
    bivalent)
  • If Ei is in 𝒞, let Fi = e(Ei)
  • If Ei is not in 𝒞, event e was used in getting
    from C to Ei, and so there exists an Fi in D such
    that Ei can be reached from Fi
  • In both cases Fi is in D, so it is univalent, and
    hence Fi is i-valent, i = 0,1

[Figure: Ei either lies in 𝒞, with Fi = e(Ei) in D, or is reached from C through some Fi in D = e(𝒞)]
58
Proof step 2 (3)
  • So D must have neighboring 0- and 1-valent
    configurations D0 and D1
  • So there exist C0 and C1 in 𝒞 such that
  • C1 = e'(C0) for some step e' = (P',m')
  • Di = e(Ci) is i-valent, i = 0,1
  • Steps in different processes commute
  • Two cases
  • P' ≠ P: then e and e' commute and D1 = e'(D0):
    contradiction, because D0 is 0-valent and D1 is
    1-valent!!

[Figure: C0 and C1 = e'(C0) in 𝒞, with e' = (P',m'); applying e gives D0 = e(C0) and D1 = e(C1) in D = e(𝒞)]
59
Proof step 2 (4)
  • P' = P
  • consider a finite deciding run from C0 in which P
    does not take any steps; let s be the schedule and
    let A = s(C0)
  • s is applicable to D0 and D1
  • let Ei = s(Di), i = 0,1; Ei is i-valent
  • then E0 = e(A) and E1 = e(e'(A))
  • so A is bivalent: contradiction with the run to A
    being deciding!!

[Figure: e takes C0 to D0, e' takes C0 to C1 and e takes C1 to D1; s takes C0 to A, D0 to E0 and D1 to E1, so that E0 = e(A) and E1 = e(e'(A))]