Title: Byzantine Techniques II
1Byzantine Techniques II
- Presenter: Georgios Piliouras
- Partly based on slides by Justin W. Hart and Rodrigo Rodrigues
2Papers
- Practical Byzantine Fault Tolerance. Miguel Castro et al. (OSDI 1999)
- BAR Fault Tolerance for Cooperative Services. Amitanand S. Aiyer et al. (SOSP 2005)
3Motivation
- Computer systems provide crucial services
client
server
4Problem
- Computer systems provide crucial services
- Computer systems fail
- natural disasters
- hardware failures
- software errors
- malicious attacks
client
server
Need highly-available services
6Replication
unreplicated service
replicated service
client
server replicas
- Replication algorithm
- masks a fraction of faulty replicas
- high availability if replicas fail independently
8Assumptions are a Problem
- Replication algorithms make assumptions
- behavior of faulty processes
- synchrony
- bound on number of faults
- Service fails if assumptions are invalid
- attacker will work to invalidate assumptions
Most replication algorithms assume too much
9Contributions
- Practical replication algorithm
- weak assumptions ⇒ tolerates attacks
- good performance
- Implementation
- BFT: a generic replication toolkit
- BFS: a replicated file system
- Performance evaluation
BFS is only 3% slower than a standard file system
10Talk Overview
- Problem
- Assumptions
- Algorithm
- Implementation
- Performance
- Conclusions
12Bad Assumption: Benign Faults
- Traditional replication assumes
- replicas fail by stopping or omitting steps
- Invalid with malicious attacks
- compromised replica may behave arbitrarily
- single fault may compromise service
- decreased resiliency to malicious attacks
13BFT Tolerates Byzantine Faults
- Byzantine fault tolerance
- no assumptions about faulty behavior
- Tolerates successful attacks
- service available when hacker controls replicas
15Byzantine-Faulty Clients
- Bad assumption: client faults are benign
- clients easier to compromise than replicas
- BFT tolerates Byzantine-faulty clients
- access control
- narrow interfaces
- enforce invariants
attacker replaces client's code
server replicas
Support for complex service operations is important
16Bad Assumption: Synchrony
- Synchrony = known bounds on
- delays between steps
- message delays
- Invalid with denial-of-service attacks
- bad replies due to increased delays
- Assumed by most Byzantine fault tolerance algorithms
19Asynchrony
- No bounds on delays
- Problem: replication is impossible
- Solution in BFT
- provide safety without synchrony
- guarantees no bad replies
- assume eventual time bounds for liveness
- may not reply with active denial-of-service attack
- will reply when denial-of-service attack ends
20Talk Overview
- Problem
- Assumptions
- Algorithm
- Implementation
- Performance
- Conclusions
21Algorithm Properties
- Arbitrary replicated service
- complex operations
- mutable shared state
- Properties (safety and liveness)
- system behaves as correct centralized service
- clients eventually receive replies to requests
- Assumptions
- 3f+1 replicas to tolerate f Byzantine faults (optimal)
- strong cryptography
- only for liveness: eventual time bounds
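The replica bound (3f+1 replicas to tolerate f Byzantine faults) is easy to turn into arithmetic. A minimal sketch; the helper names `replicas_needed` and `max_faults` are hypothetical and not part of BFT:

```python
def replicas_needed(f: int) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f with 3f + 1 <= n, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3
```

For example, 4 replicas tolerate 1 fault, and going from 4 to 6 replicas does not buy any extra fault tolerance; the next step is 7 replicas for 2 faults.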
23Algorithm
- State machine replication
- deterministic replicas start in same state
- replicas execute same requests in same order
- correct replicas produce identical replies
replicas
client
Hard: ensure requests execute in same order
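Why the order matters can be seen with a toy deterministic state machine. This is only an illustration; the `KVReplica` class and the request format are made up for the example and are not the BFT interface:

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """A deterministic replica: same start state, same requests, same replies."""
    state: dict = field(default_factory=dict)

    def execute(self, request):
        op, key, value = request
        if op == "put":
            self.state[key] = value
            return "ok"
        if op == "get":
            return self.state.get(key)
        raise ValueError(f"unknown operation: {op}")


def run(requests):
    """Execute requests on a fresh replica and return the list of replies."""
    replica = KVReplica()
    return [replica.execute(r) for r in requests]


log = [("put", "x", 1), ("put", "x", 2), ("get", "x", None)]
same_order = [run(log) for _ in range(3)]       # three replicas, same order
reordered = run([log[1], log[0], log[2]])       # one replica swaps two requests
```

The three replicas in `same_order` produce identical replies, while the replica that executed the two writes in the opposite order returns a different value for the read.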
24Ordering Requests
- Primary-Backup
- View designates the primary replica
- Primary picks ordering
- Backups ensure primary behaves correctly
- certify correct ordering
- trigger view changes to replace faulty primary
replicas
client
primary
backups
view
28Rough Overview of Algorithm
- A client sends a request for a service to the primary
- The primary multicasts the request to the backups
- Replicas execute the request and send a reply to the client
- The client waits for f+1 replies from different replicas with the same result; this is the result of the operation
f+1 matching replies
replicas
client
primary
backups
view
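The client-side rule (wait for f+1 matching replies from different replicas) can be sketched as follows. The function name and the dictionary representation of replies are assumptions made for the example; message authentication is omitted:

```python
from collections import Counter


def accept_result(replies, f):
    """Return the result vouched for by at least f+1 distinct replicas, else None.

    replies maps replica id -> result, so each replica is counted at most once.
    """
    votes = Counter(replies.values())
    for result, count in votes.items():
        if count >= f + 1:
            return result
    return None
```

With at most f faulty replicas, any result backed by f+1 replicas is backed by at least one correct replica, which is why the client can trust it.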
29Quorums and Certificates
quorums have at least 2f+1 replicas
quorum A
quorum B
3f+1 replicas
quorums intersect in at least one correct replica
- Certificate = set with messages from a quorum
- Algorithm steps are justified by certificates
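The intersection claim follows from counting: two quorums of 2f+1 out of 3f+1 replicas share at least 2(2f+1) - (3f+1) = f+1 replicas, and at most f of those are faulty. The brute-force check below is only a sanity test for small f; the function name is hypothetical:

```python
from itertools import combinations


def min_quorum_overlap(f):
    """Smallest intersection over all pairs of quorums of size 2f+1 among 3f+1 replicas."""
    n, q = 3 * f + 1, 2 * f + 1
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)
```

For f = 1 and f = 2 the minimum overlap is f+1, so even if every faulty replica sits in the intersection, one correct replica remains.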
30Algorithm Components
- Normal case operation
- View changes
- Garbage collection
- Recovery
All have to be designed to work together
31Normal Case Operation
- Three phase algorithm
- pre-prepare picks order of requests
- prepare ensures order within views
- commit ensures order across views
- Replicas remember messages in log
- Messages are authenticated
- ⟨·⟩σk denotes a message sent by k
32Pre-prepare Phase
assign sequence number n to request m in view v
request m
multicast ⟨PRE-PREPARE, v, n, m⟩σ0
primary replica 0
replica 1
replica 2
fail
replica 3
- backups accept pre-prepare if
- in view v
- never accepted pre-prepare for v,n with
different request
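The two acceptance conditions for a pre-prepare can be sketched as a small check on the backup. This is a simplification under stated assumptions: the `Backup` class is hypothetical, and authentication and any sequence-number bounds are left out:

```python
class Backup:
    """Tracks which request was accepted for each (view, sequence number)."""

    def __init__(self, view):
        self.view = view
        self.accepted = {}  # (view, seq) -> request

    def accept_pre_prepare(self, v, n, m):
        # Condition 1: the backup must be in view v.
        if v != self.view:
            return False
        # Condition 2: no different request was accepted earlier for (v, n).
        key = (v, n)
        if key in self.accepted and self.accepted[key] != m:
            return False
        self.accepted[key] = m
        return True
```

A faulty primary that sends two different requests with the same view and sequence number gets the second one rejected by every backup that already accepted the first.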
33Prepare Phase
digest of m
multicast ⟨PREPARE, v, n, D(m), 1⟩σ1
m
prepare
pre-prepare
replica 0
replica 1
replica 2
replica 3
accepted ⟨PRE-PREPARE, v, n, m⟩σ0
all collect pre-prepare and 2f matching prepares
P-certificate(m,v,n)
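A P-certificate is a pre-prepare plus 2f matching prepares. The check below is a sketch: the tuple layout is invented for the example, SHA-256 stands in for the digest function D (an assumption, the talk does not name one), and signatures are not verified:

```python
import hashlib


def digest(m: bytes) -> str:
    """Stand-in for D(m); SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(m).hexdigest()


def has_p_certificate(pre_prepare, prepares, f):
    """pre_prepare is (v, n, m); prepares is an iterable of (v, n, d, replica_id).

    True when 2f distinct replicas sent prepares matching the pre-prepare.
    """
    v, n, m = pre_prepare
    d = digest(m)
    senders = {i for (pv, pn, pd, i) in prepares if (pv, pn, pd) == (v, n, d)}
    return len(senders) >= 2 * f
```

Counting distinct senders matters: a single replica repeating its prepare must not count twice.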
34Order Within View
No P-certificates with the same view and sequence number and different requests
replicas
quorum for P-certificate(m,v,n)
quorum for P-certificate(m',v,n)
one correct replica in common ⇒ m = m'
35Commit Phase
multicast ⟨COMMIT, v, n, D(m), 2⟩σ2
replies
m
commit
pre-prepare
prepare
replica 0
replica 1
replica 2
fail
replica 3
replica has P-certificate(m,v,n)
all collect 2f+1 matching commits
C-certificate(m,v,n)
- Request m executed after
- having C-certificate(m,v,n)
- executing requests with sequence number less
than n
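The execution rule (run m only after it has a C-certificate and all lower sequence numbers have executed) can be sketched as a buffer that releases requests in order. The `Executor` class is hypothetical, and the sketch assumes sequence numbers start at 1 with no gaps:

```python
class Executor:
    """Executes committed requests strictly in sequence-number order."""

    def __init__(self):
        self.last_executed = 0   # assumption: sequence numbers start at 1
        self.committed = {}      # seq -> request that has a C-certificate
        self.executed = []       # requests in the order they were executed

    def on_c_certificate(self, n, m):
        """Record that request m committed at sequence number n, then execute what is ready."""
        self.committed[n] = m
        while self.last_executed + 1 in self.committed:
            self.last_executed += 1
            self.executed.append(self.committed.pop(self.last_executed))
```

A request that commits early simply waits in `committed` until the requests before it have run.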
37View Changes
- Provide liveness when primary fails
- timeouts trigger view changes
- select new primary (= view number mod (3f+1))
- But also need to
- preserve safety
- ensure replicas are in the same view long enough
- prevent denial-of-service attacks
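Reading the selection rule as primary = view number mod (3f+1), the new primary is a pure function of the view, so replicas need no extra agreement to find it. A one-line sketch with a hypothetical function name:

```python
def primary_of(view: int, f: int) -> int:
    """Index of the primary replica for a view, with 3f+1 replicas numbered from 0."""
    return view % (3 * f + 1)
```

With f = 1, views 0, 1, 2, 3 are led by replicas 0, 1, 2, 3, and view 4 wraps around to replica 0.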
38View Change Protocol
send P-certificates ⟨VIEW-CHANGE, v+1, P, 2⟩σ2
fail
replica 0 = primary v
2f VC messages
replica 1 = primary v+1
replica 2
replica 3
primary collects VC-messages in X
⟨NEW-VIEW, v+1, X, O⟩σ1
pre-prepares messages for v+1 view in O with the same sequence number
backups multicast prepare messages for pre-prepares in O
39View Change Safety
Goal: No C-certificates with the same sequence number and different requests
- Intuition: if replica has C-certificate(m,v,n) then
quorum for C-certificate(m,v,n)
any quorum Q
correct replica in Q has P-certificate(m,v,n)
40Garbage Collection
- Truncate log with certificate
- periodically checkpoint state (K)
- multicast ⟨CHECKPOINT, n, D(checkpoint), i⟩σi
- all collect 2f+1 checkpoint messages
- send checkpoint in view-changes
S-certificate(h,checkpoint)
discard messages and checkpoints
Log
sequence numbers
H = h + 2K
h
reject messages
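Reading the diagram label as H = h + 2K, the log only holds sequence numbers between the low mark h (the last stable checkpoint) and the high mark H. A sketch of that window; the function names and the exact boundary handling (h excluded, H included) are assumptions:

```python
def in_window(n: int, h: int, K: int) -> bool:
    """True if sequence number n falls inside the log window (h, h + 2K]."""
    H = h + 2 * K
    return h < n <= H


def advance_low_mark(h: int, stable_checkpoint_seq: int) -> int:
    """After an S-certificate for a checkpoint, the low mark moves up (never down)."""
    return max(h, stable_checkpoint_seq)
```

Messages outside the window are rejected, and everything at or below h can be discarded once the checkpoint is stable.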
41Formal Correctness Proofs
- Complete safety proof with I/O automata
- invariants
- simulation relations
- Partial liveness proof with timed I/O automata
- invariants
44Communication Optimizations
- Digest replies: send only one reply to client with result
- Optimistic execution: execute prepared requests
- Read-only operations: executed in current state
2f+1 replies
client
Read-write operations execute in two round-trips
client
2f+1 replies
Read-only operations execute in one round-trip
45Talk Overview
- Problem
- Assumptions
- Algorithm
- Implementation
- Performance
- Conclusions
46BFS: A Byzantine-Fault-Tolerant NFS
replica 0
snfsd
replication library
replication library
relay
kernel NFS client
replica n
- No synchronous writes: stability through replication
47Talk Overview
- Problem
- Assumptions
- Algorithm
- Implementation
- Performance
- Conclusions
48 Andrew Benchmark
- Configuration
- 1 client, 4 replicas
- Alpha 21064, 133 MHz
- Ethernet 10 Mbit/s
Elapsed time (seconds)
- BFS-nr is exactly like BFS but without replication
- 30 times worse with digital signatures
49 BFS is Practical
- Configuration
- 1 client, 4 replicas
- Alpha 21064, 133 MHz
- Ethernet 10 Mbit/s
- Andrew benchmark
Elapsed time (seconds)
- NFS is the Digital Unix NFS V2 implementation
50 BFS is Practical 7 Years Later
- Configuration
- 1 client, 4 replicas
- Pentium III, 600MHz
- Ethernet 100 Mbit/s
- 100x Andrew benchmark
Elapsed time (seconds)
- NFS is the Linux 2.2.12 NFS V2 implementation
51Conclusions
- Byzantine fault tolerance is practical
- Good performance
- Weak assumptions ⇒ improved resiliency
53What happens if we go MAD?
- Several useful cooperative services span Multiple Administrative Domains.
- Internet routing
- File distribution
- Cooperative backup, etc.
- Dealing only with Byzantine behaviors is not enough.
56Why?
- Nodes are under control of multiple administrators
- Broken: Byzantine behaviors.
- Misconfigured, or configured with malicious intent.
- Selfish: Rational behaviors
- Alter the protocol to increase local utility
57Talk Overview
- Problem
- Model
- 3 Level Architecture
- Performance
- Conclusions
59It is time to raise the BAR
- Byzantine
- Behaving arbitrarily or maliciously
- Altruistic
- Execute the proposed program, whether it benefits them or not
- Rational
- Deviate from the proposed program for purposes of local benefit
61Protocols
- Incentive-Compatible Byzantine Fault Tolerant (IC-BFT)
- It is in the best interest of rational nodes to follow the protocol exactly
- Byzantine Altruistic Rational Tolerant (BART)
- Guarantees a set of safety and liveness properties despite the presence of rational nodes
- IC-BFT ⊆ BART
62General idea
- Extend/modify the Practical Byzantine Fault Tolerance model in a way that combats the negative effects of rational (greedy) behavior.
- We will achieve that by using game-theoretic tools.
63A taste of Nash Equilibrium
Payoffs are (row player, column player)
              Go Straight         Swerve
Swerve        -1, 1               0, 0
Go Straight   -100, -100 (X_X)    1, -1
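The pure-strategy Nash equilibria of this game of chicken can be found by checking every outcome for a profitable unilateral deviation. A small sketch using the payoffs from the slide, read as (row player, column player); the function name is hypothetical:

```python
from itertools import product

ACTIONS = ("Swerve", "Go Straight")

# (row action, column action) -> (row payoff, column payoff)
PAYOFF = {
    ("Swerve", "Swerve"): (0, 0),
    ("Swerve", "Go Straight"): (-1, 1),
    ("Go Straight", "Swerve"): (1, -1),
    ("Go Straight", "Go Straight"): (-100, -100),
}


def pure_nash(payoff, actions):
    """Return outcomes where neither player gains by changing only their own action."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_row, u_col = payoff[(a, b)]
        row_ok = all(payoff[(x, b)][0] <= u_row for x in actions)
        col_ok = all(payoff[(a, y)][1] <= u_col for y in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


equilibria = pure_nash(PAYOFF, ACTIONS)
```

The two equilibria are the outcomes where exactly one driver swerves: given that the other goes straight, swerving (-1) beats crashing (-100), and given that the other swerves, going straight (1) beats swerving (0).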
64Naughty nodes are punished
- Nodes require access to a state machine in order to complete their objectives
- Protocol contains methods for punishing rational nodes, including denying them access to the state machine
65Talk Overview
- Problem
- BAR Model
- 3 Level Architecture
- Performance
- Conclusions
66Three-Level Architecture
- Layered design
- simplifies analysis/construction of systems
- isolates classes of misbehavior at appropriate
levels of abstraction
67Level 1 Basic Primitives
- Goals
- Provide IC-BFT versions of key abstractions
- Ensure long-term benefit to participants
- Limit non-determinism
- Mitigate the effects of residual non-determinism
- Enforce predictable communication patterns
69Level 1 Basic Primitives
- BART-RSM based on PBFT
- Differences: use TRB instead of consensus
- 3f+2 nodes required for f faulty
71Level 1 Basic Primitives
- The RSM rotates the leadership role among participants.
- Participants want to stay in the system in order to control the RSM and complete their protocols
- Ultimately, incentives stem from the higher level service
73Level 1 Basic Primitives
- Self-interested nodes could hide behind non-determinism to shirk work.
- Tit-for-Tat policy
- Communicate proofs of misconduct, leads to global punishment
- Use Terminating Reliable Broadcast, rather than consensus.
- In TRB, only the sender can propose a value
- Other nodes can only adopt this value, or choose a default value
75Level 1 Basic Primitives
- Balance costs
- No incentive to make the wrong choice
- Encourage timeliness
- By allowing nodes to judge unilaterally whether other nodes' messages are late and inflict sanctions on them (Penance)
77Level 1 Basic Primitives
- Nodes have to have participated at every step in order to have the opportunity to issue a command
- Message queues
[Diagram: message queues holding entries for x and y; the node is waiting for a message from x]
78Level 1 Basic Primitives
- Nodes have to have participated at every step in order to have the opportunity to issue a command
- Message queues
[Diagram: message queues holding entries for x and y, with an arrow labeled "message"]
79Level 1 Basic Primitives
- Nodes have to have participated at every step in order to have the opportunity to issue a command
- Message queues
[Diagram: message queues holding entries for x and y; the node is waiting for a message from y]
80Theorems
- Theorem 1: The TRB protocol satisfies Termination, Agreement, Integrity and Non-Triviality
- Theorem 2: No node has a unilateral incentive to deviate from the protocol
81Level 2
- State machine replication is sufficient to support a backup service, but the overhead is unacceptable
- 100 participants, each backing up 100 MB ⇒ 10 GB of drive space on every node
- Assign work to individual nodes, using erasure codes to provide low-overhead fault-tolerant storage
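The overhead argument is plain arithmetic. The sketch below compares full state machine replication with a k-of-n erasure code; treating the coding scheme as a k-of-n erasure code is an assumption, and the values of k and n are purely illustrative, not taken from the talk:

```python
def rsm_storage_per_node_mb(participants: int, mb_each: int) -> int:
    """With state machine replication every node stores everyone's backup."""
    return participants * mb_each


def erasure_storage_total_mb(mb: float, k: int, n: int) -> float:
    """A k-of-n erasure code stores n fragments of size mb / k in total."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return mb * n / k
```

With 100 participants and 100 MB each, replication costs 10,000 MB (10 GB) on every node. With a hypothetical 10-of-20 code, the same 100 MB costs 200 MB in total, spread across 20 nodes.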
82Guaranteed Response
- Direct communication is insufficient when nodes can behave rationally
- We introduce a witness that overhears the conversation
- This eliminates ambiguity
- Messages are routed through this intermediary
84Guaranteed Response
- Node A sends a request to Node B through the witness
- The witness stores the request, and enters RequestReceived state
- Node B sends a response to Node A through the witness
- The witness stores the response, and enters ResponseReceived
85Guaranteed Response
- Deviation from this protocol will cause the
witness to either notice the timeout from Node B
or lying on the part of Node A
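The witness in the guaranteed response protocol behaves like a small state machine. The sketch below is a simplification: the `Witness` class, the `IDLE` and `NO_RESPONSE` state names, and the numeric `timeout` are assumptions made for the example; in the real system time would come from the authoritative time service:

```python
from enum import Enum, auto


class WitnessState(Enum):
    IDLE = auto()
    REQUEST_RECEIVED = auto()
    RESPONSE_RECEIVED = auto()
    NO_RESPONSE = auto()


class Witness:
    """Relays one request/response exchange and records what actually happened."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = WitnessState.IDLE
        self.request = None
        self.response = None
        self.request_time = None

    def on_request(self, request, now):
        if self.state is not WitnessState.IDLE:
            raise RuntimeError("a request is already recorded")
        self.request, self.request_time = request, now
        self.state = WitnessState.REQUEST_RECEIVED

    def check_timeout(self, now):
        """Node B missed the deadline: record that no response arrived."""
        if (self.state is WitnessState.REQUEST_RECEIVED
                and now - self.request_time > self.timeout):
            self.state = WitnessState.NO_RESPONSE
        return self.state

    def on_response(self, response, now):
        self.check_timeout(now)
        if self.state is WitnessState.REQUEST_RECEIVED:
            self.response = response
            self.state = WitnessState.RESPONSE_RECEIVED
        return self.state
```

Because the witness holds both the request and the response (or the timeout), Node A cannot claim it never got an answer and Node B cannot claim it answered in time when it did not.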
87Optimization through Credible Threats
- Returns to game theory
- Protocol is optimized so nodes can communicate directly. Add a fast path
- If recipient does not respond, nodes proceed to the unoptimized case
- Analogous to a game of chicken
88Periodic Work Protocol
- Witness checks that periodic tasks, such as system maintenance, are performed
- It is expected that, with a certain frequency, each node in the system will perform such a task
- Failure to perform one will generate a POM (proof of misbehavior) from the witness
89Authoritative Time Service
- Maintains authoritative time
- Binds messages sent to that time
- Guaranteed response protocol relies on this for
generating NoResponses
90Authoritative Time Service
- Each submission to the state machine contains the timestamp of the proposer
- Timestamp is taken to be the maximum of the median of timestamps of the previous f+1 decisions
- If no decision is decided, then the timestamp is the previous authoritative time
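One way to read the timestamp rule is sketched below. Two details are assumptions, since the slide does not spell them out: the maximum is taken against the previous authoritative time (so time never runs backwards), and the median is the lower median when the window has an even number of entries. The function name is hypothetical:

```python
from statistics import median_low


def authoritative_time(previous_time, decided_timestamps, f):
    """Next authoritative time from the proposer timestamps of recent decisions.

    decided_timestamps lists the timestamps of decided values, oldest first.
    """
    if len(decided_timestamps) < f + 1:
        # No (or too few) decisions: keep the previous authoritative time.
        return previous_time
    window = decided_timestamps[-(f + 1):]
    # Assumption: never move backwards relative to the previous time.
    return max(previous_time, median_low(window))
```

Using a median means a single proposer with a wildly wrong clock cannot drag the authoritative time far ahead, as in `authoritative_time(10, [12, 50, 11], 2)`, which yields 12 rather than 50.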
91Level 3 BAR-B
- BAR-B is a cooperative backup system
- Three operations
- Store
- Retrieve
- Audit
92Storage
- Nodes break files up into chunks
- Chunks are encrypted
- Chunks are stored on remote nodes
- Remote nodes send signed receipts and store
StoreInfos
93Retrieval
- A node storing a chunk can respond to a request for a chunk with
- The chunk
- A demonstration that the chunk's lease has expired
- A more recent StoreInfo
94Auditing
- Receipts constitute audit records
- Nodes will exchange receipts in order to verify
compliance with storage quotas
95Talk Overview
- Problem
- BAR Model
- 3 Level Architecture
- Performance
- Conclusions
96Evaluation
- Performance is inferior to protocols that do not make these guarantees, but acceptable (?)
97Impact of additional nodes
98Impact of rotating leadership
99Impact of fast path optimization
100Talk Overview
- Problem
- BAR Model
- 3 Level Architecture
- Performance
- Conclusions
102Conclusions
- More useful as a proof of concept but certainly explores a very interesting common ground between systems and game theory as a way of exploring the performance of real-life systems.
- CS 614 is over