1
EEC 693/793 Special Topics in Electrical Engineering: Secure and Dependable Computing
  • Lecture 16
  • Wenbing Zhao
  • Department of Electrical and Computer Engineering
  • Cleveland State University
  • wenbing@ieee.org

2
Outline
  • Reminder
  • Midterm 2, May 1, 4-6pm
  • May 3, no class
  • Project presentation May 8 4-8pm, attendance
    mandatory
  • Project report due May 8 midnight
  • Review
  • Byzantine generals problem
  • Practical Byzantine fault tolerance
  • By Miguel Castro and Barbara Liskov, OSDI '99
  • http://www.pmg.csail.mit.edu/papers/osdi99.pdf

3
Byzantine Generals Problem
  • A commanding general must send an order to his n-1 lieutenants such that:
  • IC1. All loyal lieutenants obey the same order
  • IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends

4
Byzantine Agreement Protocol
  • Round 1: the commander sends a value to each of the lieutenants
  • Round 2: each of the lieutenants sends the value it received to its peers
  • At the end of round 2, each lieutenant checks to see if there is a majority opinion (attack or retreat). We have a solution if there is
  • Question: can you find a counterexample to show that the above protocol does not work if f ≥ 2?
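The two-round protocol above amounts to a majority tally at each lieutenant. A minimal sketch in Python (the function name and the fixed "retreat" default on a tie are illustrative assumptions, not from the slides):

```python
from collections import Counter

def lieutenant_decision(values):
    """Round-2 tally at one lieutenant: majority vote over the
    commander's value plus the values relayed by peer lieutenants.
    Falls back to a fixed default ('retreat') when there is no
    strict majority."""
    value, count = Counter(values).most_common(1)[0]
    if count > len(values) // 2:
        return value
    return "retreat"  # deterministic default, assumed for this sketch

# n = 4, f = 1: commander ordered 'attack', one faulty peer relays a lie.
print(lieutenant_decision(["attack", "attack", "retreat"]))  # attack
```

With f = 1 and n = 4, the single faulty value can never outvote the two honest copies, which is why the two-round protocol succeeds in that case.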

5
Introduction to BFT Paper
  • The growing reliance of industry and government
    on online information services
  • Malicious attacks become more serious and
    successful
  • More software errors due to increased size and
    complexity of software
  • This paper presents a practical algorithm for state machine replication that works in asynchronous systems like the Internet

6
Assumptions
  • Asynchronous distributed system
  • The network may fail to deliver messages, delay them, duplicate them, or deliver them out of order
  • Faulty nodes may behave arbitrarily
  • Independent node failures
  • The adversary cannot delay correct nodes
    indefinitely
  • All messages are cryptographically signed by
    their sender and these signatures cannot be
    subverted by the adversary

7
Service Properties
  • A (deterministic) service is replicated among 3f+1 processors; resilient to f failures
  • Safety: all replicas are guaranteed to process the same requests in the same order
  • Liveness: clients eventually receive replies to their requests

8
Optimal Resiliency
  • Imagine non-faulty processors trying to agree
    upon a piece of data by telling each other what
    they believe the data to be
  • A non-faulty processor must be sure about a piece
    of data before it can proceed
  • f replicas may refuse to send messages, so each
    processor must be ready to proceed after having
    received (n-1)-f messages
  • Total of n-1 other replicas

9
Optimal Resiliency
  • But what if f of the (n-1)-f messages come from faulty replicas?
  • To avoid confusion, the majority of messages must come from non-faulty nodes, i.e., (n-f-1)/2 ≥ f
  • => Need a total of 3f+1 replicas
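The counting argument can be checked mechanically. A small sketch (the function name is illustrative):

```python
def min_replicas(f):
    """Smallest n satisfying (n - f - 1) / 2 >= f: among the n-1-f
    messages a replica waits for, the at-most-f faulty senders must
    not form a majority.  Solving the inequality gives n >= 3f + 1."""
    n = 1
    while (n - f - 1) / 2 < f:
        n += 1
    return n

print(min_replicas(1))  # 4
print(min_replicas(2))  # 7
```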

10
BFT Algorithm in a Nutshell
[Diagram: the client sends a request to the primary, which forwards it to the backups; the client accepts the result once f+1 replies match (OK).]
11
Replicas and Views
Set of replicas R, with |R| = 3f + 1
[Diagram: replicas R0, R1, R2, ..., R|R|-1; the primary rotates across views, e.g. R0 leads view 0 and R1 leads view 1.]
For view v, the primary p is assigned such that p = v mod |R|
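The primary assignment is simple modular arithmetic, so every replica can compute the current primary locally. A one-line sketch (function name is illustrative):

```python
def primary_of_view(v, n_replicas):
    """Primary of view v with n_replicas = |R|: p = v mod |R|."""
    return v % n_replicas

print(primary_of_view(0, 4))  # 0  (R0 leads view 0)
print(primary_of_view(5, 4))  # 1  (the role rotates round-robin)
```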
12
Safeguards
  • If the client does not receive replies soon
    enough, it broadcasts the request to all replicas
  • If the request has already been processed, the
    replicas simply re-send the reply (replicas
    remember the last reply message they sent to each
    client)
  • If the primary does not multicast the request to
    the group, it will eventually be suspected to be
    faulty by enough replicas to cause a view change

13
Normal Case Operation
The client sends <REQUEST, o, t, c> to the primary, where:
  • o: operation
  • t: timestamp
  • c: client id
  • Timestamps are totally ordered such that later requests have higher timestamps than earlier ones

14
Normal Case Operation
  • When primary p receives a client request m, it starts a three-phase protocol
  • The three phases are:
  • pre-prepare
  • prepare
  • commit

15
Pre-Prepare Phase
The primary multicasts <<PRE-PREPARE, v, n, d>, m> to the backups, where:
  • v: view number
  • n: sequence number
  • d: digest of the message, D(m)
  • m: the client's request message
16
Prepare Phase
  • A backup accepts the PRE-PREPARE message only if
  • The signatures are valid and the digest matches m
  • It is in view v
  • It has not accepted a PRE-PREPARE for the same v
    and n
  • Sequence number is within accepted bounds
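These acceptance checks can be sketched as follows (illustrative Python: the `state` dict layout, the watermark pair for the sequence-number bounds, and the SHA-256 stand-in for D(m) are all assumptions of this sketch):

```python
import hashlib

def digest(m):
    """Stand-in for the digest D(m): SHA-256 over the request bytes."""
    return hashlib.sha256(m).hexdigest()

def accept_preprepare(state, v, n, d, m, sig_ok):
    """Backup-side checks from the slide: valid signature, digest
    matches m, message is for the current view, no conflicting
    PRE-PREPARE already accepted for (v, n), and n inside the
    accepted sequence-number bounds."""
    if not sig_ok or digest(m) != d:
        return False                       # bad signature or digest
    if v != state["view"]:
        return False                       # wrong view
    if state["preprepared"].get((v, n), d) != d:
        return False                       # conflicting pre-prepare
    low, high = state["watermarks"]
    if not (low < n <= high):
        return False                       # sequence number out of bounds
    state["preprepared"][(v, n)] = d       # log the accepted pre-prepare
    return True
```

A backup that passes all four checks logs the message and moves to the prepare phase, as the next slide describes.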

17
Prepare Phase
  • If backup i accepts the pre-prepare message, it enters the prepare phase by multicasting
  • <PREPARE, v, n, d, i>
  • to all other replicas, and adds both messages to its log
  • Otherwise it does nothing
  • Each replica (including the primary) accepts PREPARE messages and adds them to its log, provided that:
  • Signatures are correct
  • View numbers match the current view
  • Sequence number is within accepted bounds

18
Prepare Phase
  • At replica i, prepared(m, v, n, i) = true iff the log holds the pre-prepare for m plus 2f PREPARE messages from different backups (not including replica i) that match the pre-prepare
  • When prepared becomes true, replica i multicasts
  • <COMMIT, v, n, d, i> to the other replicas
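The prepared predicate can be sketched as a check over replica i's log (illustrative Python; the dict-of-sets log layout is an assumption of this sketch):

```python
def prepared(log, v, n, d, f):
    """prepared(m, v, n, i) from the slide: the log holds the
    pre-prepare for (v, n) with digest d, plus matching PREPARE
    messages from at least 2f distinct backups other than i.
    When this first becomes true, i multicasts its COMMIT."""
    if log["pre-prepare"].get((v, n)) != d:
        return False                       # no matching pre-prepare logged
    senders = log["prepares"].get((v, n, d), set())
    return len(senders) >= 2 * f           # 2f distinct matching PREPAREs
```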

19
Agreement Achieved
  • If the primary is non-faulty, then all 2f+1 non-faulty replicas agree on the sequence number
  • If the primary is faulty:
  • Either f+1 non-faulty replicas (a majority) agree on some other sequence number and the rest realize that the primary is faulty
  • Or, all non-faulty replicas will suspect the primary is faulty
  • When a faulty primary is replaced, the minority of confused non-faulty replicas are brought up to date by the majority

20
Commit Phase
  • Replicas accept COMMIT messages and insert them in their log, provided the signatures are valid
  • Define the committed and committed-local predicates as:
  • committed(m, v, n) = true iff prepared(m, v, n, i) is true for all i in some set of f+1 non-faulty replicas
  • committed-local(m, v, n, i) = true iff the replica has accepted 2f+1 COMMIT messages from different replicas that match the pre-prepare for m
  • If committed-local(m, v, n, i) is true for some non-faulty replica i, then committed(m, v, n) is true
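The committed-local predicate mirrors the prepared check, with 2f+1 COMMITs in place of 2f PREPAREs (illustrative Python; same assumed dict-of-sets log layout as above):

```python
def committed_local(log, v, n, d, f):
    """committed-local(m, v, n, i) from the slide: replica i has
    accepted 2f+1 COMMIT messages from distinct replicas that match
    the logged pre-prepare for m (view v, sequence n, digest d)."""
    if log["pre-prepare"].get((v, n)) != d:
        return False                       # no matching pre-prepare logged
    senders = log["commits"].get((v, n, d), set())
    return len(senders) >= 2 * f + 1       # quorum of matching COMMITs
```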

21
Commit Phase
  • Replica i executes the operation requested by m after committed-local(m, v, n, i) becomes true and its state reflects the sequential execution of all requests with lower sequence numbers
  • The PRE-PREPARE and PREPARE phases of the protocol ensure agreement on the total order of requests within a view
  • The PREPARE and COMMIT phases ensure total ordering across views

22
Normal Operation Reply
  • All replicas send the reply <REPLY, v, t, c, i, r> directly to the client
  • v: current view number
  • t: timestamp of the corresponding request
  • c: client id
  • i: replica number
  • r: result of executing the requested operation
  • The client waits for f+1 replies with valid signatures from different replicas, with the same t and r, before accepting the result r
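The client-side wait can be sketched as vote counting over (t, r) pairs (illustrative Python; the tuple layout of a reply is an assumption of this sketch):

```python
from collections import defaultdict

def accept_reply(replies, f):
    """Client side: collect <REPLY, v, t, c, i, r> messages, group
    them by (t, r), and accept result r once f+1 distinct replicas
    agree -- at least one of those replicas must be non-faulty."""
    votes = defaultdict(set)
    for (v, t, c, i, r) in replies:
        votes[(t, r)].add(i)               # distinct replica ids per (t, r)
        if len(votes[(t, r)]) >= f + 1:
            return r
    return None                            # not enough matches yet
```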

23
Normal Case Operation Summary
[Diagram: message flow through the Request, Pre-prepare, Prepare, Commit, and Reply phases among client C, primary 0, backups 1 and 2, and faulty replica 3, which sends nothing.]
24
Garbage Collection
  • Used to discard messages from the log
  • For the safety condition to hold, messages must be kept in a replica's log until it knows that the requests have been executed by at least f+1 non-faulty replicas
  • Achieved using checkpoints, which occur when a request whose sequence number n is divisible by some constant is executed

25
Garbage Collection
  • When a replica i produces a checkpoint, it multicasts a message <CHECKPOINT, n, d, i> to the other replicas
  • Each replica collects checkpoint messages in its log until it has 2f+1 of them for sequence number n with the same digest d
  • This creates a stable checkpoint, and the replica discards all the pre-prepare, prepare and commit messages with sequence numbers up to n
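The stability test can be sketched as counting matching checkpoint certificates (illustrative Python; the tuple layout of a checkpoint message is an assumption of this sketch):

```python
def stable_checkpoint(msgs, f):
    """A checkpoint becomes stable once 2f+1 <CHECKPOINT, n, d, i>
    messages from distinct replicas agree on (n, d); the replica may
    then garbage-collect protocol messages up to sequence number n.
    Returns the stable (n, d) pair, or None if no quorum exists."""
    senders = {}
    for (n, d, i) in msgs:
        senders.setdefault((n, d), set()).add(i)   # distinct senders per (n, d)
    for (n, d), ids in senders.items():
        if len(ids) >= 2 * f + 1:
            return (n, d)
    return None
```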

26
View Changes
  • Triggered by timeouts that prevent backups from waiting indefinitely for a request to execute
  • If the timer of a backup expires in view v, the backup starts a view change to move to view v+1 by:
  • Not accepting messages (other than checkpoint, view-change, and new-view messages)
  • Multicasting a VIEW-CHANGE message

27
View Changes
  • The VIEW-CHANGE message is defined as
  • <VIEW-CHANGE, v+1, n, C, P, i>
  • where:
  • C: 2f+1 checkpoint messages proving the stable checkpoint with sequence number n
  • P: a set of sets Pm
  • Pm: a PRE-PREPARE message plus all matching PREPARE messages
  • for every message m that prepared at i but has not yet committed

28
View Change - Primary
  • When primary p of view v+1 receives 2f valid VIEW-CHANGE messages, it:
  • Multicasts a <NEW-VIEW, v+1, V, O> message to all other replicas, where
  • V: the set of 2f valid VIEW-CHANGE messages
  • O: the set of reissued PRE-PREPARE messages
  • Moves to view v+1

29
View Changes - Backups
  • Accepts the NEW-VIEW message after checking V and O
  • Sends PREPARE messages for everything in O
  • Moves to view v+1

30
Events Before the View Change
  • Before the view change we have two groups of non-faulty replicas: the Confused minority and the Agreed majority
  • A non-faulty replica becomes Confused when the faulty replicas keep it from agreeing on a sequence number for a request
  • It can't process this request, so it will time out, causing the replica to vote for a new view

31
Events Before the View Change
  • The minority of Confused replicas send a VIEW-CHANGE message and drop off the network
  • The majority of Agreed replicas continue working as long as the faulty replicas help with agreement
  • The two groups can go out of sync, but the majority keeps working until the faulty replicas cease helping with agreement

32
System State: Faulty Primary
Is an erroneous view change possible?
[Diagram: the replicas split into a Confused minority of f non-faulty replicas, an Agreed majority of f+1 non-faulty replicas, and f faulty replicas controlled by the adversary, including the primary P.]
No: 2f replicas are NOT enough to change views
33
Events Before the View Change
  • Given f+1 non-faulty replicas that are trying to agree, the faulty replicas can either help that or hinder that
  • If they help, then agreement on request ordering is achieved and the clients get f+1 matching replies for all requests, with the faulty replicas' help
  • If they hinder, then the f+1 non-faulty replicas will time out and demand a new view
  • When the new majority is in favor of a view change, we can proceed to the new view

34
System State: Faulty Primary
Is it possible to continue processing requests?
[Diagram: the same split -- a Confused minority of f non-faulty replicas, an Agreed majority of f+1 non-faulty replicas, and f faulty replicas controlled by the adversary, including the primary P.]
YES: 2f+1 replicas are enough for agreement
35
System State: Faulty Primary
Once the faulty replicas cease helping with agreement, the majority becomes large enough to independently move to a new view:
  • Confused majority: 2f+1 non-faulty replicas
  • Enough to agree to change views
[Diagram: the 2f+1 non-faulty replicas opposite the f faulty replicas, including the primary P.]
36
Liveness
  • Replicas must move to a new view if they are unable to execute a request
  • To avoid starting a view change too soon, a replica that multicasts a view-change message for view v+1 waits for 2f+1 view-change messages and then starts a timer T
  • If the timer T expires before it receives a new-view message, it starts the view change for view v+2
  • The timer will wait 2T before starting a view change from v+2 to v+3
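The doubling schedule described above can be written as a small formula (the function name and argument convention are illustrative assumptions of this sketch):

```python
def new_view_timeout(T, k):
    """Timeout used while waiting for the NEW-VIEW of the k-th
    consecutive view-change attempt (k = 1 for v+1, k = 2 for v+2,
    and so on): T, 2T, 4T, ...  The exponential backoff means the
    timer eventually outlasts any bounded network delay."""
    return T * (2 ** (k - 1))
```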

37
Liveness
  • If a replica receives f+1 valid view-change messages from other replicas for views greater than its current view, it sends a view-change message for the smallest view in the set, even if its timer T has not expired
  • Faulty replicas alone cannot cause a view change by sending view-change messages, since a view change will happen only if at least f+1 replicas send view-change messages
  • The above three techniques guarantee liveness, unless message delays grow faster than the timeout period indefinitely