Title: EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing
1 EEC 693/793 Special Topics in Electrical Engineering: Secure and Dependable Computing
- Lecture 16
- Wenbing Zhao
- Department of Electrical and Computer Engineering
- Cleveland State University
- wenbing_at_ieee.org
2 Outline
- Reminder
- Midterm 2, May 1, 4-6pm
- May 3, no class
- Project presentation May 8, 4-8pm; attendance mandatory
- Project report due May 8, midnight
- Review
- Byzantine generals problem
- Practical Byzantine fault tolerance
- By Miguel Castro and Barbara Liskov, OSDI '99
- http://www.pmg.csail.mit.edu/papers/osdi99.pdf
3 Byzantine Generals Problem
- A commanding general must send an order to his n-1 lieutenants such that:
- IC1: All loyal lieutenants obey the same order
- IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends
4 Byzantine Agreement Protocol
- Round 1: the commander sends a value to each of the lieutenants
- Round 2: each of the lieutenants sends the value it received to its peers
- At the end of round 2, each lieutenant checks to see if there is a majority opinion (attack or retreat). We have a solution if there is
- Question: can you find a counterexample to show that the above protocol does not work if f > 2?
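As a rough illustration of the round-2 majority vote, here is a hypothetical simulation (not from the lecture); a faulty lieutenant here simply flips the value it forwards, whereas a real Byzantine node could also send different values to different peers:

```python
from collections import Counter

def lieutenant_decisions(commander_values, faulty=frozenset()):
    """commander_values[i] is what lieutenant i received in round 1.
    In round 2 each lieutenant forwards its value; faulty ones lie by
    flipping it. Each lieutenant then takes a majority vote."""
    n = len(commander_values)
    decisions = []
    for i in range(n):
        seen = [commander_values[i]]          # its own round-1 value
        for j in range(n):
            if j != i:
                v = commander_values[j]
                if j in faulty:               # faulty lieutenant lies
                    v = "retreat" if v == "attack" else "attack"
                seen.append(v)
        decisions.append(Counter(seen).most_common(1)[0][0])
    return decisions

# Loyal commander, one faulty lieutenant: the loyal ones still agree.
print(lieutenant_decisions(["attack"] * 4, faulty={3})[:3])  # ['attack', 'attack', 'attack']
```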
5 Introduction to the BFT Paper
- The growing reliance of industry and government on online information services
- Malicious attacks are becoming more serious and successful
- More software errors due to the increased size and complexity of software
- This paper presents a practical algorithm for state machine replication that works in asynchronous systems like the Internet
6 Assumptions
- Asynchronous distributed system
- The network may fail to deliver messages, delay them, duplicate them, or deliver them out of order
- Faulty nodes may behave arbitrarily
- Independent node failures
- The adversary cannot delay correct nodes indefinitely
- All messages are cryptographically signed by their sender, and these signatures cannot be subverted by the adversary
7 Service Properties
- A (deterministic) service is replicated among 3f+1 processors; it is resilient to f failures
- Safety: all replicas are guaranteed to process the same requests in the same order
- Liveness: clients eventually receive replies to their requests
8 Optimal Resiliency
- Imagine non-faulty processors trying to agree upon a piece of data by telling each other what they believe the data to be
- A non-faulty processor must be sure about a piece of data before it can proceed
- f replicas may refuse to send messages, so each processor must be ready to proceed after having received (n-1)-f messages
- There is a total of n-1 other replicas
9 Optimal Resiliency
- But what if f of the (n-1)-f messages come from faulty replicas?
- To avoid confusion, the majority of the values must come from non-faulty nodes: counting the processor's own value, the n-2f non-faulty values must outnumber the f faulty ones, i.e., n - 2f > f, so n > 3f
- ⇒ Need a total of 3f+1 replicas
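The counting argument can be spelled out in a few lines (a hypothetical sketch; the helper names are illustrative, not from the paper):

```python
def min_replicas(f):
    """Minimum replicas to tolerate f Byzantine faults: n = 3f + 1."""
    return 3 * f + 1

def quorum(f):
    """Quorum size 2f+1: any two quorums intersect in at least f+1
    replicas, hence in at least one non-faulty replica."""
    return 2 * f + 1

for f in range(1, 4):
    n, q = min_replicas(f), quorum(f)
    # two quorums of size q overlap in 2q - n = f + 1 replicas
    assert 2 * q - n == f + 1
    print(f, n, q)
```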
10 BFT Algorithm in a Nutshell
[Figure: the client sends a request to the primary, which relays it to the three backups; the client accepts the result once f+1 replies match (OK)]
11 Replicas and Views
[Figure: the set of replicas R = {R0, R1, R2, ..., R|R|-1}, with |R| = 3f+1; the primary role rotates as the system moves from View 0 to View 1, and so on]
For view v, the primary p is assigned such that p = v mod |R|
12 Safeguards
- If the client does not receive replies soon enough, it broadcasts the request to all replicas
- If the request has already been processed, the replicas simply re-send the reply (replicas remember the last reply message they sent to each client)
- If the primary does not multicast the request to the group, it will eventually be suspected to be faulty by enough replicas to cause a view change
13 Normal Case Operation
[Figure: the client sends ⟨REQUEST, o, t, c⟩ to the primary]
- o: operation; t: timestamp; c: client
- Timestamps are totally ordered such that later requests have higher timestamps than earlier ones
14 Normal Case Operation
- When primary p receives a client request m, it starts a three-phase protocol
- The three phases are:
- pre-prepare
- prepare
- commit
15 Pre-Prepare Phase
[Figure: the primary multicasts ⟨⟨PRE-PREPARE, v, n, d⟩, m⟩ to the three backups]
- v: view number
- n: sequence number
- d: digest of the message, D(m)
- m: message
16 Prepare Phase
- A backup accepts the PRE-PREPARE message only if:
- The signatures are valid and the digest matches m
- It is in view v
- It has not accepted a PRE-PREPARE for the same v and n
- The sequence number is within accepted bounds
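The acceptance test could be sketched as follows (hypothetical helper names and data layout; signature and digest checks are stubbed out):

```python
# Hypothetical sketch of a backup's PRE-PREPARE acceptance check.
def accept_pre_prepare(msg, m, state):
    """state holds the current view, low/high sequence-number watermarks,
    and the set of (v, n) pairs already accepted."""
    v, n, d = msg["v"], msg["n"], msg["d"]
    return (state["sig_ok"](msg) and d == state["digest"](m)  # valid signature, digest matches m
            and v == state["view"]                            # backup is in view v
            and (v, n) not in state["accepted"]               # no PRE-PREPARE yet for this (v, n)
            and state["low"] < n <= state["high"])            # n within accepted bounds

state = {"view": 0, "low": 0, "high": 200, "accepted": set(),
         "sig_ok": lambda msg: True, "digest": lambda m: hash(m)}
msg = {"v": 0, "n": 1, "d": hash("op")}
print(accept_pre_prepare(msg, "op", state))  # True
```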
17 Prepare Phase
- If backup i accepts the pre-prepare message, it enters the prepare phase by multicasting ⟨PREPARE, v, n, d, i⟩ to all other replicas and adds both messages to its log
- Otherwise it does nothing
- A replica (including the primary) accepts prepare messages and adds them to its log, provided that:
- The signatures are correct
- The view number matches the current view
- The sequence number is within accepted bounds
18 Prepare Phase
- At replica i, prepared(m, v, n, i) = true iff replica i has logged 2f PREPARE messages from different backups (not including replica i) that match the pre-prepare
- When prepared = true, replica i multicasts ⟨COMMIT, v, n, d, i⟩ to the other replicas
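The prepared predicate amounts to counting matching PREPARE messages; a minimal sketch (assuming a log keyed by (v, n, d), with names invented for illustration):

```python
# Hypothetical sketch of the prepared predicate. The log maps
# (v, n, d) -> set of replica ids that sent a matching PREPARE.
def prepared(log, pre_prepare, i, f):
    """True iff 2f matching PREPARE messages from distinct replicas
    other than i are in the log for this pre-prepare."""
    key = (pre_prepare["v"], pre_prepare["n"], pre_prepare["d"])
    senders = log.get(key, set()) - {i}
    return len(senders) >= 2 * f

log = {(0, 1, "d1"): {1, 2}}            # replicas 1 and 2 sent PREPARE
pp = {"v": 0, "n": 1, "d": "d1"}
print(prepared(log, pp, 0, f=1))        # True: 2f = 2 matching PREPAREs
```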
19 Agreement Achieved
- If the primary is non-faulty, then all 2f+1 non-faulty replicas agree on the sequence number
- If the primary is faulty:
- Either f+1 non-faulty replicas (a majority) agree on some other sequence number and the rest realize that the primary is faulty
- Or all non-faulty replicas will suspect that the primary is faulty
- When a faulty primary is replaced, the minority of confused non-faulty replicas are brought up to date by the majority
20 Commit Phase
- Replicas accept commit messages and insert them in their log provided the signatures are valid
- Define the committed and committed-local predicates as:
- committed(m, v, n) = true iff prepared(m, v, n, i) is true for all i in some set of f+1 non-faulty replicas
- committed-local(m, v, n, i) = true iff the replica has accepted 2f+1 commit messages from different replicas that match the pre-prepare for m
- If committed-local(m, v, n, i) is true for some non-faulty replica i, then committed(m, v, n) is true
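Committed-local is the same kind of count over COMMIT messages; a minimal sketch under the same assumed log layout:

```python
# Hypothetical sketch of committed-local. COMMIT messages are logged
# by (v, n, d) -> set of sender replica ids (including i itself).
def committed_local(commit_log, pre_prepare, f):
    """True iff 2f+1 COMMIT messages from different replicas match
    the pre-prepare (view, sequence number, digest)."""
    key = (pre_prepare["v"], pre_prepare["n"], pre_prepare["d"])
    return len(commit_log.get(key, set())) >= 2 * f + 1

commit_log = {(0, 1, "d1"): {0, 1, 2}}       # COMMITs from replicas 0, 1, 2
pp = {"v": 0, "n": 1, "d": "d1"}
print(committed_local(commit_log, pp, f=1))  # True: 2f+1 = 3 COMMITs
```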
21 Commit Phase
- Replica i executes the operation requested by m after committed-local(m, v, n, i) = true and its state reflects the sequential execution of all requests with lower sequence numbers
- The PRE-PREPARE and PREPARE phases of the protocol ensure agreement on the total order of requests within a view
- The PREPARE and COMMIT phases ensure total ordering across views
22 Normal Operation: Reply
- All replicas send the reply ⟨REPLY, v, t, c, i, r⟩ directly to the client
- v: current view number
- t: timestamp of the corresponding request
- c: client id
- i: replica number
- r: result of executing the requested operation
- The client waits for f+1 replies with valid signatures from different replicas, and with the same t and r, before accepting the result r
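The client-side wait for f+1 matching replies can be sketched like this (hypothetical structure; signatures are assumed to have been verified already):

```python
from collections import defaultdict

# Hypothetical sketch of the client-side quorum check on replies.
def accept_result(replies, f):
    """Return result r once f+1 replies from distinct replicas agree
    on (t, r); return None if no such quorum exists yet."""
    votes = defaultdict(set)
    for rep in replies:
        votes[(rep["t"], rep["r"])].add(rep["i"])
    for (t, r), senders in votes.items():
        if len(senders) >= f + 1:
            return r
    return None

replies = [{"t": 7, "r": "ok", "i": 0},
           {"t": 7, "r": "ok", "i": 1},
           {"t": 7, "r": "bad", "i": 3}]    # one faulty reply
print(accept_result(replies, f=1))          # ok  (f+1 = 2 matching replies)
```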
23 Normal Case Operation: Summary
[Figure: timeline of the request, pre-prepare, prepare, commit, and reply phases among client C, primary 0, replicas 1 and 2, and faulty replica 3 (marked X, which sends nothing)]
24 Garbage Collection
- Used to discard messages from the log
- For the safety condition to hold, messages must be kept in a replica's log until it knows that the requests have been executed by at least f+1 non-faulty replicas
- This is achieved using checkpoints, which occur when a request whose sequence number n is divisible by some constant is executed
25 Garbage Collection
- When a replica i produces a checkpoint, it multicasts a message ⟨CHECKPOINT, n, d, i⟩ to the other replicas
- Each replica collects checkpoint messages in its log until it has 2f+1 of them for sequence number n with the same digest d
- This creates a stable checkpoint, and the replica discards all the pre-prepare, prepare, and commit messages with sequence number at most n
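A minimal sketch of the stable-checkpoint rule (illustrative names; a real implementation would also retain the checkpoint proof itself):

```python
# Hypothetical sketch of garbage collection at a stable checkpoint.
def try_garbage_collect(checkpoints, log, f):
    """checkpoints: (n, d) -> set of replica ids that sent CHECKPOINT.
    Once 2f+1 replicas agree on (n, d), the checkpoint is stable and
    protocol messages with sequence number <= n can be discarded."""
    stable = [n for (n, d), who in checkpoints.items() if len(who) >= 2 * f + 1]
    if not stable:
        return log            # no stable checkpoint yet; keep everything
    hi = max(stable)
    return {seq: msgs for seq, msgs in log.items() if seq > hi}

checkpoints = {(100, "d"): {0, 1, 2}}          # 2f+1 = 3 matching CHECKPOINTs
log = {99: ["..."], 100: ["..."], 101: ["..."]}
print(sorted(try_garbage_collect(checkpoints, log, f=1)))  # [101]
```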
26 View Changes
- Triggered by timeouts, which prevent backups from waiting indefinitely for a request to execute
- If the timer of a backup expires in view v, the backup starts a view change to move to view v+1 by:
- Not accepting messages (other than checkpoint, view-change, and new-view messages)
- Multicasting a VIEW-CHANGE message
27 View Changes
- The VIEW-CHANGE message is defined as
- ⟨VIEW-CHANGE, v+1, n, C, P, i⟩
- where:
- C: 2f+1 checkpoint messages
- P: a set of sets Pm
- Pm: a PRE-PREPARE msg + all matching PREPARE messages
- for all messages with committed = false
28 View Change - Primary
- When primary p of view v+1 receives 2f valid VIEW-CHANGE messages, it:
- Multicasts a ⟨NEW-VIEW, v+1, V, O⟩ message to all other replicas, where:
- V: set of 2f valid VIEW-CHANGE messages
- O: set of reissued PRE-PREPARE messages
- Moves to view v+1
29 View Changes - Backups
- A backup accepts NEW-VIEW by checking V and O
- It sends PREPARE messages for everything in O
- It then moves to view v+1
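The bookkeeping on the new primary's side reduces to the rotation rule and a message count; a hypothetical sketch (names invented for illustration):

```python
# Hypothetical sketch of view-change bookkeeping at the new primary.
def new_primary(v, n_replicas):
    """For view v, the primary is replica v mod |R|."""
    return v % n_replicas

def ready_for_new_view(view_changes, v_new, f):
    """The primary of view v_new issues NEW-VIEW once it holds 2f valid
    VIEW-CHANGE messages for v_new from other replicas."""
    senders = {m["i"] for m in view_changes if m["v"] == v_new}
    return len(senders) >= 2 * f

vcs = [{"v": 1, "i": 1}, {"v": 1, "i": 2}]
print(new_primary(1, 4), ready_for_new_view(vcs, 1, f=1))  # 1 True
```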
30 Events Before the View Change
- Before the view change we have two groups of non-faulty replicas: the confused minority and the agreed majority
- A non-faulty replica becomes confused when it is kept by the faulty replicas from agreeing on a sequence number for a request
- It can't process this request, so it will time out, causing the replica to vote for a new view
31 Events Before the View Change
- The minority of confused replicas send a VIEW-CHANGE message and drop off the network
- The majority of agreed replicas continue working as long as the faulty replicas help with agreement
- The two groups can go out of sync, but the majority keeps working until the faulty replicas cease helping with agreement
32 System State: Faulty Primary
[Figure: is an erroneous view change possible? The system splits into a confused minority of f non-faulty replicas, an agreed majority of f+1 non-faulty replicas, and f faulty replicas controlled by the adversary (including the primary P). The confused minority and the faulty replicas together total only 2f replicas - NOT enough to change views]
33 Events Before the View Change
- Given f+1 non-faulty replicas that are trying to agree, the faulty replicas can either help or hinder that agreement
- If they help, then agreement on request ordering is achieved and the clients get f+1 matching replies for all requests, with the faulty replicas' help
- If they hinder, then the f+1 non-faulty replicas will time out and demand a new view
- When the new majority is in favor of a view change, we can proceed to the new view
34 System State: Faulty Primary
[Figure: is it possible to continue processing requests? The agreed majority of f+1 non-faulty replicas together with the f faulty replicas controlled by the adversary (including the primary P) total 2f+1 replicas - YES, enough for agreement, even though the confused minority of f non-faulty replicas has dropped off]
35 System State: Faulty Primary
[Figure: once the faulty replicas cease helping with agreement, the confused group grows to 2f+1 non-faulty replicas - a majority now large enough to independently agree to move to a new view, despite the adversary's f faulty replicas (including the primary P)]
36 Liveness
- Replicas must move to a new view if they are unable to execute a request
- To avoid starting a view change too soon, a replica that multicasts a view-change message for view v+1 waits for 2f+1 view-change messages and then starts a timer T
- If timer T expires before it receives a new-view message, it starts the view change for view v+2
- The timer will then wait 2T before starting a view change from v+2 to v+3
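The doubling rule implies an exponential backoff on the view-change timeout, which can be sketched as:

```python
# Sketch of the view-change timeout doubling rule: wait T before
# moving from v+1 to v+2, 2T from v+2 to v+3, and so on.
def view_change_timeout(base_t, attempts):
    """Timeout before the (attempts+1)-th consecutive view change."""
    return base_t * (2 ** attempts)

print([view_change_timeout(1.0, k) for k in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```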
37 Liveness
- If a replica receives f+1 valid view-change messages from other replicas for views greater than its current view, it sends a view-change message for the smallest view in the set, even if its timer T has not expired
- Faulty replicas cannot force a view change on their own, since a view change happens only if at least f+1 replicas send view-change messages
- The above three techniques guarantee liveness, unless message delays grow faster than the timeout period indefinitely