Title: Fault-Tolerant SemiFast Implementations of Atomic Read/Write Registers
1Fault-Tolerant SemiFast Implementations of
Atomic Read/Write Registers
- Nicolas Nicolaou, University of Connecticut
- Joint work with
- C. Georgiou, University of Cyprus
- A. A. Shvartsman, University of Connecticut
2What is an Atomic R/W Register?
Figure: a register accessed through Read, Write(7) and Write(0) operations.
3Prior Results
- Attiya et al. 1995
- Single Writer Multiple Reader (SWMR) model where < 1/2 of the processes may crash
- Pairs <value, tag> are used for ordering operations
- Writer increases the tag and sends <value, tag> to a majority
- Reader
- Phase 1: obtains the maximum tag from a majority
- Phase 2: propagates the tag to a majority and then returns the value associated with that tag
- Lynch, Shvartsman 1997 and Englert, Shvartsman 2000 extend the above result for MWMR
- Quorums instead of majorities
- 2-round protocols for read/write operations
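The two-phase SWMR scheme above can be sketched in a few lines. This is a minimal single-process illustration, not the authors' code: the names `Replica`, `write`, `read` and the `reachable` list (the replicas that answer an operation) are assumptions made for the example, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    tag: int = 0
    value: object = None

    def store(self, tag, value):
        # Adopt the pair only if it is newer than what the replica holds.
        if tag > self.tag:
            self.tag, self.value = tag, value


def majority(n):
    return n // 2 + 1


def write(replicas, reachable, tag, value):
    """Writer: send <value, tag> to the reachable replicas (must be a majority)."""
    if len(reachable) < majority(len(replicas)):
        raise ValueError("a majority of replicas must answer")
    for i in reachable:
        replicas[i].store(tag, value)


def read(replicas, reachable):
    """Reader: phase 1 finds the maximum tag, phase 2 propagates it."""
    if len(reachable) < majority(len(replicas)):
        raise ValueError("a majority of replicas must answer")
    # Phase 1: obtain the maximum tag (and its value) from a majority.
    best = max((replicas[i] for i in reachable), key=lambda r: r.tag)
    tag, value = best.tag, best.value
    # Phase 2: propagate the pair to a majority, then return the value.
    for i in reachable:
        replicas[i].store(tag, value)
    return value


replicas = [Replica() for _ in range(5)]
write(replicas, [0, 1, 2], tag=1, value=7)
first = read(replicas, [2, 3, 4])   # overlaps the write majority at replica 2
second = read(replicas, [0, 3, 4])
```

Any two majorities intersect, which is why the first read is guaranteed to see tag 1 even though it contacts only one replica that the writer reached.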
4Fast Implementations
- Dutta, Guerraoui, Levy, Chakraborty 2004
- SWMR model
- Single communication round for all write and read operations
- Requires R < (S/t) - 2
- R readers, S servers, t max server failures
- Not applicable to MWMR
Question: Can one introduce SemiFast implementations (with fast reads or fast writes) to relax the bound on the number of readers?
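To get a feel for how restrictive the bound on readers is, the condition R < (S/t) - 2 can be evaluated with integer arithmetic. The function names below are hypothetical, and the rewriting of the bound as (R + 2) * t < S is simply an equivalent form that avoids fractions.

```python
def fast_possible(readers, servers, max_failures):
    """True if R < S/t - 2, checked as (R + 2) * t < S."""
    if max_failures <= 0:
        raise ValueError("the bound is stated for t >= 1")
    return (readers + 2) * max_failures < servers


def max_fast_readers(servers, max_failures):
    """Largest R with R < S/t - 2 (a negative result means no R qualifies)."""
    if max_failures <= 0:
        raise ValueError("the bound is stated for t >= 1")
    return (servers - 1) // max_failures - 2
```

For example, with S = 5 servers and t = 1 failure the bound admits at most 2 readers, which is the motivation for relaxing it.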
5Our Contributions
- Formally define semifast implementations
- Develop a semifast implementation
- Based on Fast implementation of Dutta et al. 04
- Introduce the notion of virtual nodes
- Bounds On the Number of Virtual Nodes
- Show that no SemiFast implementations are possible for MWMR
- Simulation Results
- A small percentage of read operations require a second communication round.
6Model
Writer
Reliable communication channels (for performance, not for safety)
Servers s1, s2, ..., sS
Up to t failures, t < S/2
Readers r1, r2, ..., rR
Any subset of the readers and the writer may fail by crash.
Virtual nodes vr1, vr2, ..., vrV, with V < (S/t) - 2
Siblings: readers that are mapped to the same virtual node
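The model parameters can be checked mechanically. The sketch below validates t < S/2 and V < (S/t) - 2 and groups readers into virtual nodes. The round-robin assignment is an assumption made for illustration only (the slides do not say how readers are mapped), and all function names are hypothetical.

```python
def check_model(servers, max_failures, virtual_nodes):
    """True if t < S/2 and V < S/t - 2 (both checked without fractions)."""
    failures_ok = 2 * max_failures < servers
    virtual_ok = (virtual_nodes + 2) * max_failures < servers
    return failures_ok and virtual_ok


def assign_virtual_nodes(readers, virtual_nodes):
    """Map r1..rR onto vr1..vrV round-robin (an assumed policy)."""
    return {f"r{i + 1}": f"vr{i % virtual_nodes + 1}" for i in range(readers)}


def siblings(assignment, reader):
    """Readers sharing the virtual node of `reader`, excluding itself."""
    node = assignment[reader]
    return sorted(r for r, v in assignment.items() if v == node and r != reader)


assignment = assign_virtual_nodes(readers=5, virtual_nodes=2)
```

With S = 5 and t = 1 only V = 2 virtual nodes are allowed, yet any number of readers can share them.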
7Semifast Implementations
- Def. An implementation I is semifast if it satisfies the following properties (informally)
- All writes are fast
- All complete read operations perform one or two communication rounds
- If a read operation ρ1 performs two communication rounds, then all read operations that precede or succeed ρ1 and return the same value as ρ1 are fast
- There exists some execution of I which contains only fast read and write operations
- Assuming all written values are unique
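The first three properties can be checked on a recorded execution. The sketch below is a hypothetical trace checker, assuming each operation is logged with invocation and response times, the value it wrote or returned, and the number of rounds it used. The fourth property is about the existence of some execution, so it cannot be verified from a single trace and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "read" or "write"
    start: float   # invocation time
    end: float     # response time
    value: int     # value written or returned
    rounds: int    # communication rounds used


def precedes(a, b):
    return a.end < b.start


def looks_semifast(ops):
    reads = [o for o in ops if o.kind == "read"]
    writes = [o for o in ops if o.kind == "write"]
    if any(w.rounds != 1 for w in writes):
        return False                      # all writes must be fast
    if any(r.rounds not in (1, 2) for r in reads):
        return False                      # reads use one or two rounds
    for slow in (r for r in reads if r.rounds == 2):
        for other in reads:
            ordered = precedes(other, slow) or precedes(slow, other)
            if ordered and other.value == slow.value and other.rounds != 1:
                return False              # a second slow read of the same value
    return True


good = [Op("write", 0, 1, 8, 1), Op("read", 2, 3, 8, 2), Op("read", 4, 5, 8, 1)]
bad = [Op("write", 0, 1, 8, 1), Op("read", 2, 3, 8, 2), Op("read", 4, 5, 8, 2)]
overlapping = [Op("write", 0, 1, 8, 1), Op("read", 2, 5, 8, 2), Op("read", 3, 6, 8, 2)]
```

Two concurrent slow reads of the same value are accepted, because the property only constrains reads that precede or succeed the slow one.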
8SF Implementation
- Replica consists of
- Timestamp associated with 2 values
- Current value and Previous value
- Writer
- Send the new timestamp to S-t servers
- Increase its own timestamp
Figure: write example on servers s1 to s5. The writer sends (WRITE, ts=1, w); a server that receives it moves from (ts=0, ps=0) to (ts=1, ps=0, seen={w}) and replies (WACK, ts=1). Once S-t WACKs are received, the writer updates ts and returns ret(O.K.).
9SF Implementation (Cont.)
- Reader
- Inquires the timestamp from S-t servers
- Server, on receipt of a read/write message carrying timestamp ts'
- Records the virtual identifiers (vid) of the nodes that inquire the server's timestamp ts into a set (the seen set)
- If ts < ts' => seen = {vid}, ts = ts'
- If ts ≥ ts' => seen = seen ∪ {vid}
- If msgType = Inform => postit = ts'
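The server rule can be written as a small state machine. This is a sketch under stated assumptions: the lost comparison operators are read as "server timestamp smaller than the message timestamp" versus "greater or equal", the message type names are placeholders, and the reply format (ts, postit, seen) is a guess at what a server sends back.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    ts: int = 0
    postit: int = 0
    seen: set = field(default_factory=set)

    def receive(self, msg_type, msg_ts, vid):
        if self.ts < msg_ts:
            # Newer timestamp: adopt it and restart the seen set.
            self.ts = msg_ts
            self.seen = {vid}
        else:
            # Same or older timestamp: just remember who asked.
            self.seen.add(vid)
        if msg_type == "INFORM":
            self.postit = msg_ts
        # Assumed reply: a snapshot of the server state.
        return self.ts, self.postit, frozenset(self.seen)


s = Server()
s.receive("READ", 0, "vr1")           # reader vr1 inquires before the write
s.receive("WRITE", 1, "w")            # the write resets seen to {w}
reply = s.receive("READ", 0, "vr1")   # vr1 inquires again, still holding ts=0
```

After these three messages the server holds ts=1, ps=0 and seen={w, vr1}, the state shown in the slide examples.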
10The servers(Example)
11The servers(Example)
12SF Implementation (Cont.)
- Reader
- Considers the timestamps returned by the servers
- Returns a timestamp as follows
- If Predicate = True, return the maximum timestamp
- If postit = maxTS, return the maximum timestamp
- Otherwise return the maximum timestamp - 1
- The definition of the Predicate will be given later
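The reader's choice between the maximum timestamp and its predecessor fits in one function. The sketch below takes the outcome of the predicate as a boolean input and assumes that "postit = maxTS" means at least one replying server reported a postit equal to the maximum timestamp. The function and parameter names are hypothetical.

```python
def timestamp_to_return(max_ts, predicate_holds, postits):
    """postits: the postit values reported by the servers that replied."""
    if predicate_holds or max_ts in postits:
        return max_ts
    return max_ts - 1
```

Returning max_ts - 1 is possible because each replica keeps both the current and the previous value for its timestamp.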
13Predicate(Key Idea)
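Figure: servers s1 to s5 after the write of ts=1 has completed; servers hold states such as (ts=1, ps=0, seen={w, vr1}), (ts=1, ps=0, seen={w}) and (ts=0, ps=0, seen={vr1}). Reader r1 (virtual node vr1, local ts=0) finds S-2t servers with ts=1 and concludes that it has to return 1.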
14Predicate(Key Idea)
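Figure: servers s1 to s5 with states such as (ts=1, ps=0, seen={w, vr1, vr2}), (ts=1, ps=0, seen={w, vr1}), (ts=0, ps=0, seen={vr2}) and (ts=0, ps=0, seen={vr1, vr2}). Reader r1 (vr1, ts=1) has completed and returned 1. Reader r2 (vr2, local ts=0) finds |MS| = S-3t and concludes that it has to return 1.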
15Predicate(Final Form)
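- Predicate is true if a read operation
- Receives maxTS from a set MS of at least S-at servers (e.g. S-3t for a = 3)
- Observes that the seen sets of the servers in MS have at least a identifiers in common
- Formally: there exist a and MS such that |MS| ≥ S-at and |∩ m.seen over m in MS| ≥ a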
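A brute-force version of the predicate is sketched below. The formula did not survive on the slide, so the form used here is a reconstruction from the surrounding examples and should be read as an assumption: the predicate holds if, for some a between 1 and V+1, there are S - a*t servers that replied with maxTS and whose seen sets share at least a identifiers. Enumerating subsets is exponential and is only meant for small examples.

```python
from itertools import combinations


def predicate(max_ts_seen_sets, S, t, V):
    """max_ts_seen_sets: seen sets from the servers that replied with maxTS."""
    for a in range(1, V + 2):
        size = S - a * t
        if size < 1 or size > len(max_ts_seen_sets):
            continue
        # A subset of exactly `size` servers suffices: shrinking a larger
        # set can only enlarge the intersection of its seen sets.
        for ms in combinations(max_ts_seen_sets, size):
            common = set.intersection(*map(set, ms))
            if len(common) >= a:
                return True
    return False


# Two servers with maxTS seen by the writer and by two distinct virtual nodes.
distinct = [{"w", "vr1", "vr2"}, {"w", "vr1", "vr2"}]
# Two servers with maxTS where the second reader is a sibling (also vr1).
sibling = [{"w", "vr1"}, {"w", "vr1"}]
```

With S = 5, t = 1 and V = 2, the first case satisfies the predicate for a = 3, while the sibling case does not, which is the situation the sibling problem slide describes.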
16Sibling Problem
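Figure: servers s1 to s5, three holding (ts=1, ps=0, seen={w, vr1}) and two holding (ts=0, ps=0, seen={vr1}). Reader r1 (vr1, ts=1) has completed and returned 1. Its sibling r2 (also vr1, local ts=0) finds |MS| = S-3t, but the predicate is false, so it returns 0.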
17SF Implementation (Cont.)
- Reader must perform a second comm. round if
- Predicate = True and |∩ m.seen| = a
- postit = maxTS and the number of postits < t+1
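The two triggers for a second round can be combined into one decision function. The reading used here is an assumption, since the operators were lost on the slide: a second round is needed when the predicate holds only tightly (the common seen set has exactly a identifiers), or when some postit equals maxTS but fewer than t+1 servers reported it. All names are hypothetical.

```python
def needs_second_round(predicate_holds, common_seen, a, postits, max_ts, t):
    """common_seen: size of the intersection of the seen sets in MS."""
    tight_predicate = predicate_holds and common_seen == a
    matching_postits = sum(1 for p in postits if p == max_ts)
    weak_postit = 0 < matching_postits < t + 1
    return tight_predicate or weak_postit
```

The threshold t+1 guarantees that at least one non-faulty server holds the postit, so a later reader will still find it.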
18Which readers must write?
- Observation
- Two read operations r1 and r2
- MS1 => servers that replied with maxTS to r1
- MS2 => servers that replied with maxTS to r2
- Then |MS1| - |MS2| ≤ t => if |MS1| = S-at then |MS2| ≥ S-(a+1)t
- If r1 and r2 are siblings
- Let si ∈ MS1 ∩ MS2
- si sent m1 and m2 to r1 and r2 resp.
- It may be the case that m1.seen = m2.seen
- Thus if then
19Correctness
- We need to show the following
- Writes are globally ordered
- If a read() returns a value x then a write(x) operation immediately precedes or is concurrent with that read
- A read operation does not return an older value than a preceding read operation
- Reads done by sibling readers
- Reads done by non-sibling readers
20Impossibility
- Consider Algorithms
- With no virtual nodes
- With grouping mechanisms similar to our approach
- Theorem: There is no semifast implementation if the number of virtual nodes is V ≥ (S/t) - 2.
21MWMR model
- Theorem: There is no semifast implementation for the MWMR model
- Proved in the case of 2 writers, 2 readers and 1 failure
- We consider n communication rounds
22Simulation Results
- NS2 Simulator
- Only 10% of read operations need to perform a 2nd communication round
- Stochastic Environment
- Fixed Interval Environment
23Conclusions
- Semifast implementation is defined
- Only one complete read operation has to perform 2 comm. rounds for every write operation
- SF implementation presented
- Virtual nodes < (S/t) - 2
- No semifast implementation possible for MWMR model
24References
- Partha Dutta, Rachid Guerraoui, Ron R. Levy and Arindam Chakraborty. How Fast can a Distributed Atomic Read be? In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC 2004), pp. 236-245, ACM Press, 2004.
- S. Dolev, S. Gilbert, N. A. Lynch, A. A. Shvartsman, J. L. Welch. GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks. Technical Report LCS-TR-900, MIT, 2003.
- Nancy Lynch and Alex Shvartsman. RAMBO: A reconfigurable atomic memory service for dynamic networks. In Proceedings of the 16th International Symposium on Distributed Computing, pages 173-190, 2002.
- H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. Journal of the ACM, January 1995.
- B. Englert and A. A. Shvartsman. Graceful quorum reconfiguration in a robust emulation of shared memory. In International Conference on Distributed Computing Systems, pages 454-463, 2000.
- N. A. Lynch and A. A. Shvartsman. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts. In Symposium on Fault-Tolerant Computing, pages 272-281, 1997.
26Atomicity
- Lynch96
- Valid Executions
- Invalid Executions
Figure: four timelines, each showing a write(8) (with or without its ack) together with read operations that return 0 or 8; two timelines illustrate valid executions and two illustrate invalid executions.
27Definitions
- Each process invokes 1 operation at a time.
- Each operation consists of
- Invocation step
- Matching response step
- Incomplete operation: no matching response for the invocation. Complete operation: otherwise.
- op1 precedes op2 => the response for op1 precedes the invocation for op2.
- If op is a read we write rd
- If op is a write we write wr
28Definitions (Cont.)
- Algorithm implements a register => it satisfies the termination and atomicity properties
- Termination: Every operation by a correct process completes.
- Atomicity (SWMR, wr_k = kth write)
- If rd returns x then there is wr_k s.t. val_k = x
- If wr_k precedes rd and rd returns val_j, then j ≥ k
- If rd returns val_k then wr_k precedes or is concurrent to rd
- If rd1 returns val_k and a succeeding rd2 returns val_j then j ≥ k
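The SWMR atomicity conditions can be tested on a recorded history. The checker below is a sketch with assumed conventions: each operation carries invocation and response times, a write carries its index k, a read carries the index of the write whose value it returned, and index 0 stands for the initial value. The inequalities are read as j ≥ k, the usual form of these conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str     # "wr" or "rd"
    start: float  # invocation time
    end: float    # response time, float("inf") if incomplete
    k: int        # write index, or index of the value a read returned


def precedes(a, b):
    return a.end < b.start


def is_atomic(ops):
    writes = {o.k: o for o in ops if o.kind == "wr"}
    reads = [o for o in ops if o.kind == "rd"]
    for rd in reads:
        if rd.k != 0 and rd.k not in writes:
            return False    # returned a value that was never written
        if rd.k in writes and precedes(rd, writes[rd.k]):
            return False    # wr_k must precede or be concurrent to rd
        if any(precedes(w, rd) and rd.k < w.k for w in writes.values()):
            return False    # rd missed a write that precedes it
    for r1 in reads:
        for r2 in reads:
            if precedes(r1, r2) and r2.k < r1.k:
                return False    # a later read returned an older value
    return True


valid = [Op("wr", 0, 10, 1), Op("rd", 1, 2, 0), Op("rd", 3, 4, 1)]
inverted = [Op("wr", 0, 10, 1), Op("rd", 1, 2, 1), Op("rd", 3, 4, 0)]
stale = [Op("wr", 0, 1, 1), Op("rd", 2, 3, 0)]
```

The `inverted` history is the classic new/old inversion: both reads overlap the write, but the second one returns the older value.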
29Atomic vs Shared Register
- Shared Register
- Accessible from Single Process
- Write(v): Stores the value v and returns OK
- Read(): Reads the last value stored
- Atomic Register
- A distributed data structure
- Accessed by multiple processes concurrently
- Behaves as a sequential register.
- (Recall Atomicity)
30Atomic vs Shared Register(Graphical)
- Sequential Register
- Atomic Register
Figure: the sequential register goes from Register=0 to Register=8, with Read(0), WriteAck() and Read(8); the atomic register shows Write(8)/WriteAck() together with Read1()/ReadAck1(8) and Read2()/ReadAck2(0).
31Non-Triviality
- A semifast implementation is not trivial if
- For any execution of , if contains the operations and some , performs 2 comm. rounds, then any , , must be fast.
- For any execution of , if two read operations rd1 and rd2 return the same value and rd
32Strict Communication Scheme
- Only messages from the invoking processes to the servers are delivered.
- No messages between any servers
- No messages between any invoking processes
33When a SemiFast Impl. is Impossible?
- When V < (S/t)-2
- If V(S/t)-1 then no fast implementation even in the case of a skip-free write operation (violates non-triviality Property 3)
- If V(S/t)-2 then there is an execution where we need 2 complete read operations to perform 2 comm. rounds (violates Property 1)
- When V(S/t)-2
- There exists an execution where 2 read operations return the same value and they both perform 2 comm. rounds (violates Property 2)
34No Semifast for MWMR model.
- Proof Sketch
- Split multiple round operations into
- Read phases
- Write phases
- Show that as soon as an operation performs a write phase it cannot change its return value.
- Show a construction where W=2, R=2 and t=1 and atomicity is violated.
35Challenge
- How fast can a general implementation of an atomic register be?
- Dynamic Environment (Mobility)
- Hybrid implementations where some read and write operations perform multiple round trips.
- Communication overhead in such implementations?
- Quorum-based algorithms. How fast can they be?