Title: From Byzantine Agreement to Practical Survivability
1From Byzantine Agreement to Practical Survivability
- Dahlia Malkhi
- The Hebrew University of Jerusalem
2Why Replicate?
- Cache data
- NFS, DNS, WWW
- Fault tolerance
- Clusters
- Remote backup
3A replication system
- Servers store a value x and a timestamp t, both initially empty
[Figure: five servers, each holding x = -, t = -]
4A replication system
- Clients write new values; other clients obtain them
[Figure: x = 7; x = 7, t = 1]
5Active State Machine Replication
- Method for consistently replicating arbitrarily typed objects
- Start from the same initial state
- Apply operations in the same order without gaps at each replica
[Figure: replicas applying operations a, b, c in the same order]
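The two rules on this slide (same initial state, same order without gaps) can be illustrated with a minimal sketch. The key-value state, the `Replica` class, and the slot numbering are assumptions made for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic state machine: a key-value store plus a slot counter."""
    state: dict = field(default_factory=dict)
    next_slot: int = 0  # index of the next operation this replica may apply

    def apply(self, slot: int, op: tuple) -> bool:
        # Refuse out-of-order delivery so the applied sequence never has gaps.
        if slot != self.next_slot:
            return False
        key, value = op
        self.state[key] = value
        self.next_slot += 1
        return True


# Hypothetical agreed-upon order of three operations (a, b, c).
log = [("x", 7), ("y", 1), ("x", 3)]

replicas = [Replica() for _ in range(3)]
for replica in replicas:
    for slot, op in enumerate(log):
        replica.apply(slot, op)

# A fresh replica rejects slot 1 because slot 0 has not been applied yet.
gap_rejected = not Replica().apply(1, ("x", 9))
```

Because every replica starts empty and applies the same log in the same order, all three end in the same state. How the replicas agree on the log is the subject of the following slides.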
6A replication system
- The decentralized approach
- Each server contends for next operation
- When all proposals are collected, everyone decides
[Figure: servers contending with x = 7, t = 1 and x = 3, t = 1]
7A simple ordering protocol
- Each server proposes a value in each round
[Figure: five servers over two rounds. Proposals X7, X8, none, none, none yield X7,t1 X8,t2. Proposals none, none, X1, X3, X9 yield X9,t3 X1,t4 X3,t5]
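A minimal sketch of such a round-based ordering follows. It assumes that every server sees all proposals of a round and that ties are broken by server id; the slide does not state the tie-breaking rule, so that rule and the assignment of values to servers are hypothetical.

```python
from typing import Optional


def decide_round(proposals: dict[int, Optional[str]]) -> list[str]:
    """Order one round's proposals by server id, skipping servers that proposed nothing."""
    return [proposals[sid] for sid in sorted(proposals) if proposals[sid] is not None]


def run(rounds: list[dict[int, Optional[str]]]) -> list[tuple[str, int]]:
    """Assign consecutive timestamps to the decided operations of all rounds."""
    delivered = []
    t = 0
    for proposals in rounds:
        for op in decide_round(proposals):
            t += 1
            delivered.append((op, t))
    return delivered


# Two rounds with five servers; None stands for "none" in the figure.
rounds = [
    {1: "x=7", 2: "x=8", 3: None, 4: None, 5: None},
    {1: None, 2: None, 3: "x=9", 4: "x=1", 5: "x=3"},
]
order = run(rounds)
```

Every server that runs `run` on the same collected proposals computes the same order, which is why the protocol has to wait for all proposals (including "none") before deciding.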
8Analysis of simple protocol
- Each delivery may cost N messages
- cost is amortized during busy times
- Need to respond to each failure
- Reconfiguration requires costly agreement on membership
9Searching for efficient replication
- The leader-based approach
- Client sends write to leader, leader broadcasts to everybody
- Leader may rotate, dynamically change, etc.
[Figure: x = 7, t = 1]
10Group communication
- Leader sets order
- Variations: revolving token, dynamic leader
[Figure: X7,t1]
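The leader-based ordering can be sketched as a fixed sequencer. This is a failure-free illustration only: the `Server` and `Leader` classes are hypothetical, there is no rotation or token, and leader failure (the costly case discussed on the next slide) is not handled.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.x = None  # replicated value
        self.t = 0     # timestamp of the last delivered write

    def deliver(self, value, timestamp):
        # Apply only the next timestamp in sequence; ignore duplicates and gaps.
        if timestamp == self.t + 1:
            self.x, self.t = value, timestamp
            return True
        return False


class Leader(Server):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def client_write(self, value):
        # The leader alone picks the order by assigning the next timestamp,
        # then broadcasts the write to everybody.
        timestamp = self.t + 1
        self.deliver(value, timestamp)
        for follower in self.followers:
            follower.deliver(value, timestamp)
        return timestamp


followers = [Server("s2"), Server("s3")]
leader = Leader("s1", followers)
first = leader.client_write(7)
second = leader.client_write(3)
```

Each write costs one broadcast instead of a proposal from every server, which is the efficiency gain of the leader-based approach.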
11Analysis of Group Communication
- Without further interaction, delivery is optimistic and may lead to inconsistency
- Need to respond to leader failure
- Costly agreement on membership
- Virtual synchrony simplifies recovery from partitioned views
12Where the costs hurt
- Servers need to monitor for failures
- Reconfiguration
- Recovery from optimistic delivery
13Some design choices
- Scale
- Survivability
- Trusted clients
14Why scalability?
- Yesterday
- NFS
- Fault tolerant replicated file system (cluster)
- Four computers flying a shuttle
- Today
- Digital archiving: Anderson's Eternity Service
- Ubiquitous computing
- Peer-to-peer resource sharing
- eCommerce and eApplication on the Internet
[Figure: a mobile user]
15Electronic voting system
- Develop an electronic voting system for national elections
- Build on experience with Costa Rica Project (with AT&T Secure Systems Research Dept)
- Goals
- vote from any polling station
- usable by all voters
- security
- double voting, voter privacy, vote coercion, ...
16Preventing double voting
- Scope: 3,000,000 voters, 1000 polling stations
- Simple problem: prevent using a voter id twice
- For privacy, the binding between vote and voter id is not kept
- detecting double voting afterwards doesn't help
- Globally and permanently lock the voter id when a vote is cast
- A centralized server or global protocol is no good
- lose availability, performance
- Need scalable, survivable solution!
17Why survivability?
- Yesterday
- Closely coupled, locally administered system
- Today
- Widespread computing
- Internet hackers
- More
20Survivable systems
- The last frontier of protection
- Component penetrations will occur, so we should build systems to anticipate them
- A survivable system makes meaningful progress when components fail to behave as expected, even when they conspire to undermine the operation of the system as a whole
22Could clients be faulty?
- Benign faults: yes
- Byzantine faults: no
- Employ access control
- If bypassed, who cares?
- A malicious client can mess up the data anyway
23Summary of design choices
- Scaling
- thousands of servers, millions of clients
- Survivability
- Servers may be penetrated, hence use voting
- Trusted clients
24From replicated process to replicated storage model
- Fault-tolerant computing in Storage Area Networks
- Fault-tolerant client/server computing with passive servers
- Servers are playing the role of data stores
- Servers are not communicating with one another
- Protocols are carried out by clients
25Byzantine quorum systems example [Malkhi and Reiter 98]
- At most one server can be penetrated
- Read/write safe register
26Byzantine quorum systems example
- At most one server can be penetrated
- Read/write safe register
27Masking quorums
- A b-masking quorum system over a universe U of
servers is a set such that
- Justification: let B be the set of actually faulty servers
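The formulas on this slide did not survive in the transcript. The sketch below therefore assumes the threshold form of the definition commonly given in the masking-quorum literature: any two quorums intersect in at least 2b+1 servers, and for every set of b faulty servers some quorum avoids it. Under that assumption, removing the faulty set B from an intersection still leaves at least b+1 correct servers, more than the at most b faulty ones. The function names and the quorum size formula are part of the same assumption.

```python
from itertools import combinations
from math import ceil


def threshold_quorums(n: int, b: int) -> list[frozenset[int]]:
    """All subsets of size ceil((n + 2b + 1) / 2) out of n servers (assumed construction)."""
    q = ceil((n + 2 * b + 1) / 2)
    if q > n - b:
        raise ValueError("need n >= 4b + 1 so that some quorum avoids any b faulty servers")
    return [frozenset(c) for c in combinations(range(n), q)]


def is_b_masking(quorums, n: int, b: int) -> bool:
    # Consistency: every pair of quorums shares at least 2b + 1 servers.
    consistent = all(len(q1 & q2) >= 2 * b + 1 for q1 in quorums for q2 in quorums)
    # Availability: for every possible faulty set of size b, some quorum is disjoint from it.
    available = all(
        any(not (q & set(faulty)) for q in quorums)
        for faulty in combinations(range(n), b)
    )
    return consistent and available


quorums = threshold_quorums(5, 1)  # five servers, at most one penetrated
majorities = [frozenset(c) for c in combinations(range(5), 3)]
```

With five servers and b = 1 the quorums are the five subsets of size four. Plain majorities of size three fail the check because two of them can share a single server, which might be the faulty one.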
28Replication using masking quorum systems
- Write(v)
- Read timestamps from quorum
- Choose higher, unique timestamp
- Read()
- Read (value, timestamp) pairs from quorum
- Identify correct values that appear b+1 identical times
- Return highest-timestamp correct value
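The read and write steps above can be sketched as follows. This is a single-process simulation under stated assumptions: b = 1, timestamps are (counter, client id) pairs to keep them unique, and the write ends by storing the pair at every server of the quorum (a step the slide does not list but which the read relies on). The `LyingServer` class is a hypothetical penetrated server.

```python
from collections import Counter

B = 1  # assumed bound on the number of penetrated servers


class Server:
    def __init__(self):
        self.value, self.ts = None, (0, 0)

    def read(self):
        return (self.value, self.ts)

    def write(self, value, ts):
        if ts > self.ts:  # keep only the newest pair
            self.value, self.ts = value, ts


class LyingServer(Server):
    """A penetrated server that always reports a forged pair with a huge timestamp."""
    def read(self):
        return ("forged", (10**9, 0))


def write(quorum, value, client_id):
    highest = max(s.read()[1][0] for s in quorum)  # read timestamps from the quorum
    ts = (highest + 1, client_id)                  # higher, and unique per client
    for s in quorum:
        s.write(value, ts)                         # assumed final step: store the pair
    return ts


def read(quorum):
    counts = Counter(s.read() for s in quorum)
    # A pair reported by b+1 servers is vouched for by at least one correct server.
    vouched = [pair for pair, n in counts.items() if n >= B + 1]
    if not vouched:
        return None
    return max(vouched, key=lambda pair: pair[1])[0]


servers = [Server(), Server(), Server(), Server(), LyingServer()]
write(servers[0:4], 7, client_id=1)
first_read = read(servers[1:5])   # this quorum includes the lying server
write(servers[1:5], 3, client_id=2)
second_read = read(servers[0:4])
```

The forged pair appears only once, so it never reaches the b+1 threshold, while the genuine pair is reported by the three correct servers that the read and write quorums share.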
29Byzantine Quorums - surprisingly efficient [Malkhi, Reiter and Wool 98]
30Quorum-based replication [Fleet, Malkhi and Reiter 00]
- No centralized management
- No locking
- No server-to-server interaction
- No client-to-client interaction
- Quorum tuning: benign/Byzantine faults, strict/probabilistic guarantees
- Simple, secure, modular
[Figure: Client 1 and Client 2 each run an application over an object-stub over Q-RPC, accessing persistent object servers Server 1 through Server 5]
31Universal object emulation
Servers are data containers
Determine total order of object-operations
x.a()
x.b()
32The Approach: PAXOS [Lamport 98]
- Assume a weak leader election primitive
- Eventually there is a unique leader
- Ω failure detector, partially synchronous/timed asynchronous systems, etc.
- To order operations, the leader invokes an instance of the agreement protocol
- Never disagree on the operation order
- Might fail to make progress if there is no unique leader
33Identifying an agreement building-block
34Adding Ranks (ballots)
35Ranked Register [Boichat et al. 02, Chockler and Malkhi 02]
- The interface
- rr-read(R), returns <r,v>
- rr-write(R,v), commits or aborts
- The Paxos Agreement
- Collect proposals with a rank
- Make a new proposal with a rank
- If rr-write with rank R1 commits, then rr-read with rank R2 > R1 must see it
- It returns the value written by this rr-write (or by a write with rank R > R1)
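One way to picture the interface is a single, non-replicated copy of the register. The sketch below is an assumption about how the two operations could interact, not the implementation from the cited papers: the register remembers the highest rank it has been read with, and a write aborts if a higher-ranked read or write has already been seen. The field names and the use of a lock for atomicity are hypothetical.

```python
import threading


class RankedRegister:
    def __init__(self):
        self._lock = threading.Lock()
        self._read_rank = 0    # highest rank passed to rr_read so far
        self._write_rank = 0   # rank of the last committed write
        self._value = None     # None stands for "nothing written yet"

    def rr_read(self, rank):
        with self._lock:
            self._read_rank = max(self._read_rank, rank)
            return (self._write_rank, self._value)

    def rr_write(self, rank, value):
        with self._lock:
            # Abort if a read or write with a higher rank has already been seen.
            if rank < max(self._read_rank, self._write_rank):
                return "abort"
            self._write_rank, self._value = rank, value
            return "commit"


reg = RankedRegister()
r0 = reg.rr_read(1)          # nothing written yet
w1 = reg.rr_write(1, "a")    # commits: no higher rank seen
r1 = reg.rr_read(3)          # must see the committed write of rank 1
w2 = reg.rr_write(2, "b")    # aborts: a read with rank 3 came first
r2 = reg.rr_read(4)
```

The abort of the rank-2 write is what preserves the property on the slide: a read with rank 3 returned "a", so no lower-ranked write may commit a different value afterwards.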
36Agreement using RR
Shared: a single ranked register
propose(inp)
  while (true) do
    choose a unique, monotonically increasing rank R
    <r,v> <- rr-read(R)
    if (v = ⊥) v <- inp
    if (rr-write(R,v) = commit) return v
  od
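The propose loop translates almost line for line into Python. The sketch assumes a compact single-copy register as a stand-in for the shared ranked register, ranks of the form (counter, process id) to make them unique and increasing, and `None` in place of the empty value; all three are illustrative choices.

```python
import itertools


class RankedRegister:
    """Compact single-copy stand-in for the shared ranked register (assumption)."""
    def __init__(self):
        self.read_rank = (0, 0)
        self.write_rank = (0, 0)
        self.value = None

    def rr_read(self, rank):
        self.read_rank = max(self.read_rank, rank)
        return self.write_rank, self.value

    def rr_write(self, rank, value):
        if rank < max(self.read_rank, self.write_rank):
            return False  # abort
        self.write_rank, self.value = rank, value
        return True       # commit


def propose(register, pid, inp):
    for counter in itertools.count(1):
        rank = (counter, pid)            # unique, monotonically increasing rank
        _, v = register.rr_read(rank)
        if v is None:                    # nothing chosen yet: use our own input
            v = inp
        if register.rr_write(rank, v):   # a commit fixes the decision
            return v


register = RankedRegister()
decision_1 = propose(register, pid=1, inp="a")
decision_2 = propose(register, pid=2, inp="b")
```

The second proposer reads the value committed by the first and re-proposes it, so both return the same decision. Under contention a proposer may abort and retry with a higher rank, which is why progress depends on eventually having a unique leader.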
37The complete system
[Figure: the complete system, built on ranked registers (RR)]
38Implementability of RR
39Some historical quotes
- "The Byzantine agreement problem has received more attention from the computer science community than any other problem"
- Chor and Dwork, 89
- "Essay on the Application of Analysis to the Probability of Majority Decisions"
- Condorcet, 85
- "Only five computers would be needed for the entire country"
- Thomas Watson Sr., 1943
40More quotes
- "The challenge of reliability in distributed computing is perhaps the unavoidable challenge of the coming decade, just as performance was the challenge of the past decade"
- Ken Birman, 1996
41Why is this working?
42Replication models
43Operation ordering
- Easy if there are no failures and/or the system is synchronous
- E.g., leader based, LTS based
- Real systems are both asynchronous and failure-prone
- Accurate failure detection is impossible: multiple or no leaders might exist at times
- Solution: PAXOS
44Fault-tolerant client/server (1)
- Scalable dynamic fault-tolerant services (e.g., Fleet)
- Replication groups are created on-the-fly by clients out of a dynamic server universe
- Servers need neither monitor nor be aware of each other
- Accommodates Byzantine failures
- Database servers
- Client-server middleware
45Paxos in Replicated Storage Model
[Figure: three clients]
46Disk Paxos Pros and Cons
- Assumes very weak memory objects
- Regular registers
- Not suitable for dynamic environments
- The number of clients and their IDs are known a priori
- Employs a data structure whose size grows with the number of clients
47Our Contribution
- Identified an abstract building block for Paxos agreement
- Ranked register (RR)
- Follows deconstruction of BG.. (round-based register)
- Implemented RR in a setting with infinitely many dynamic client processes
- Proven a lower bound on number of R/W registers
48Identifying an agreement building block
Propose(V)
  Begin RMW
    if (val = ⊥) val <- V
    return val
  End RMW
[Figure: consensus built on RMW]
- Agreement is trivial with a single read-modify-write (RMW) object
- RMW cannot be emulated out of faulty memory objects of any type [Jayanti, Chandra, Toueg 9?]
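The RMW-based Propose can be tried out directly. In this sketch a `threading.Lock` stands in for the atomicity of a single, reliable read-modify-write object, and `None` stands for the empty value; the class name and the thread harness are hypothetical.

```python
import threading


class RMWConsensus:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the atomicity of RMW
        self._val = None               # None means no value decided yet

    def propose(self, v):
        # Begin RMW: read, conditionally modify, and return in one atomic step.
        with self._lock:
            if self._val is None:
                self._val = v
            return self._val
        # End RMW


results = []
results_lock = threading.Lock()


def worker(obj, v):
    decided = obj.propose(v)
    with results_lock:
        results.append(decided)


consensus = RMWConsensus()
inputs = [f"v{i}" for i in range(8)]
threads = [threading.Thread(target=worker, args=(consensus, v)) for v in inputs]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread enters first fixes the value, and every later proposer returns it. The sketch relies on one reliable object, which is exactly what the second bullet says cannot be built from faulty memory objects.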
49Conclusions and Future Work
- Paxos with infinitely many processes based on new ranked register abstraction
- Fault-tolerant replication in SANs
- Fault-tolerant client/server applications
- Future work
- Handling Byzantine memory failures (NR-arbitrary)
- Specifying/implementing the leader election primitive