Title: Reconfigurable Distributed Storage for Dynamic Networks
- Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M. Musial, Alexander A. Shvartsman
2Goals
- Reconfigurable Distributed Storage (RDS)
- Atomic consistency (read/write)
- Fault Tolerance
- in Dynamic and Asynchronous Systems.
4Distributed Storage
Data is replicated at several network locations
5Distributed Storage
Read
Write
Operation policy
10Distributed Storage in Dynamic Networks
requires a reconfiguration process.
11Distributed Storage in Dynamic Networks
by achieving agreement.
12Model
- Distributed
- Connected set of processors
- Each processor has a unique id $i \in I$
- MWMR, any processor is a potential client
- Asynchronous
- Asynchronous processors
- Point-to-point asynchronous unreliable channels
- Dynamic
- Processors join and leave the system
- Processors may crash
13What is a configuration?
- Configuration: <members, read-quorums, write-quorums>
  - members is a set of processors,
  - read-quorums, write-quorums: two sets of quorums
  - $\forall\, RQ \in$ read-quorums, $\forall\, WQ \in$ write-quorums:
    - $RQ \subseteq$ members
    - $WQ \subseteq$ members
    - $RQ \cap WQ \neq \emptyset$ (only for a given configuration)
- Every client maintains a set of configurations,
initially containing the default one.
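The quorum conditions on this slide can be checked mechanically. The following is a minimal Python sketch, not the authors' code: the class name `Configuration` and the helper `majority_config` are hypothetical, and majority quorums are used only because the experiments slide mentions majority-based configurations.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Configuration:
    """Hypothetical container for <members, read-quorums, write-quorums>."""
    members: frozenset
    read_quorums: frozenset   # set of frozensets of processor ids
    write_quorums: frozenset  # set of frozensets of processor ids

    def is_valid(self) -> bool:
        # Every quorum must be a subset of members.
        quorums = self.read_quorums | self.write_quorums
        if not all(q <= self.members for q in quorums):
            return False
        # Every read quorum must intersect every write quorum.
        return all(rq & wq
                   for rq in self.read_quorums
                   for wq in self.write_quorums)


def majority_config(members):
    """Build a configuration whose read and write quorums are all majorities."""
    members = frozenset(members)
    size = len(members) // 2 + 1
    majorities = frozenset(
        frozenset(c) for c in combinations(sorted(members), size))
    return Configuration(members, majorities, majorities)


default = majority_config({1, 2, 3})
# A deliberately broken configuration: {1} and {2, 3} do not intersect.
broken = Configuration(frozenset({1, 2, 3}),
                       frozenset({frozenset({1})}),
                       frozenset({frozenset({2, 3})}))
print(default.is_valid())  # True
print(broken.is_valid())   # False
```

The intersection check is performed within one configuration only, matching the remark that the property holds "only for a given configuration".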
14Single Object Operations Overview After ABD95
- tag = <c, i> $\in \mathbb{N} \times I$; val: a possible value
- val $\leftarrow$ Read()$_i$
  - (<c, j>, val) $\leftarrow$ query(); prop(<c, j>, val)
- Write(val)$_i$
  - (<c, j>, val') $\leftarrow$ query(); prop(<c+1, i>, val)
- (tag, val) $\leftarrow$ query(NULL): gathers the (tag, val) pairs of all processors of an RQ and returns the one with the largest tag.
- NULL $\leftarrow$ prop(tag, val): updates the (tag, val) pairs at all processors of a WQ.
Read tag
Write tag
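The two-phase structure of reads and writes can be illustrated with a small single-process simulation. This is a sketch under stated assumptions rather than the RDS implementation: the `replicas` dictionary stands in for the network, quorums are passed in explicitly, and the counter increment on writes follows the standard ABD scheme.

```python
# In-memory stand-in for replicas: processor id -> (tag, value).
# A tag is a pair (counter, writer_id); tuples compare lexicographically.
replicas = {p: ((0, 0), None) for p in range(1, 6)}


def query(read_quorum):
    """Return the (tag, value) pair with the largest tag in a read quorum."""
    return max((replicas[p] for p in read_quorum), key=lambda pair: pair[0])


def prop(write_quorum, tag, val):
    """Install (tag, val) at each write-quorum member holding a smaller tag."""
    for p in write_quorum:
        if replicas[p][0] < tag:
            replicas[p] = (tag, val)


def read(rq, wq):
    tag, val = query(rq)
    prop(wq, tag, val)         # write back so that later reads see this value
    return val


def write(i, val, rq, wq):
    (c, _), _ = query(rq)      # only the counter of the largest tag is needed
    prop(wq, (c + 1, i), val)  # new tag: larger counter, writer id breaks ties


write(2, "x", rq={1, 2, 3}, wq={3, 4, 5})
write(4, "y", rq={1, 3, 5}, wq={1, 2, 4})
print(read(rq={2, 3, 5}, wq={1, 2, 3}))  # y
```

Because every read quorum intersects every write quorum, the second write observes the tag of the first and the final read observes the tag of the second.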
15Reconfiguration Design Goals
- Sound
- Totally ordered configurations
- Flexible
- No dependences between configurations
- Non-intrusive
- Makes possible concurrent read/write operations
- Fast
- Strengthening fault tolerance
16Decoupling Reconfiguration
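- Reconfiguration = replacing configurations
  - I: installing a new configuration
  - R: removing old configuration(s)
- If $R \rightarrow I$ (removal precedes installation) $\Rightarrow$ operations are delayed
- If $I \rightarrow R$ (installation precedes removal) $\Rightarrow$ a stronger configuration viability assumption is required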
17Solution
- $\neg(R \rightarrow I) \wedge \neg(I \rightarrow R) \;\Rightarrow\; I \parallel R$
- Tighter coupling between removal and
installation
18RDS Reconfiguration
- Reconfiguration is based on Paxos
  - (a three-phase, leader-based consensus algorithm)
- l is the leader
- c is the current configuration
- configs is the set of active configurations
- A ballot has a unique identifier b and a value v, which is a configuration
- Paxos phases
  - Prepare: l creates a new ballot and chooses/gets the value to propose.
  - Propose: l proposes <b, v> and gathers votes from a majority.
  - Propagate: l propagates the decision.
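Two ingredients of these phases are easy to make concrete: creating a ballot identifier larger than any seen so far, and deciding when a majority has voted. The sketch below is illustrative only. Representing a ballot identifier as a `(round, leader_id)` pair is an assumption borrowed from common Paxos practice, and both function names are hypothetical.

```python
def next_ballot(seen, leader_id):
    """Return a ballot id (round, leader_id) larger than every id in `seen`."""
    highest_round = max((r for r, _ in seen), default=0)
    return (highest_round + 1, leader_id)


def has_majority(voters, members):
    """True when the voters that belong to `members` form a strict majority."""
    return len(set(voters) & set(members)) > len(members) // 2


seen = {(1, 3), (2, 5)}
b = next_ballot(seen, leader_id=4)
print(b)                                   # (3, 4)
print(all(b > old for old in seen))        # True
print(has_majority({1, 2}, {1, 2, 3}))     # True
print(has_majority({1}, {1, 2, 3}))        # False
```

Tuple comparison gives a total order on ballot identifiers, with the leader id breaking ties between equal rounds.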
19RDS Reconfiguration
Recon(c, c')
Diagram: leader l, a read quorum RQ and a write quorum WQ.
20RDS Reconfiguration
Recon(c, c'), Prepare phase
- Creates a new larger ballot b
21RDS Reconfiguration
Recon(c, c'), Prepare phase
New message: <1a, b>
22RDS Reconfiguration
Recon(c, c'), Prepare phase
New message: <1b, b, configs, <b, c>>
- Updates its ballot's value v with the one received
- Updates its configs set
23RDS Reconfiguration
Recon(c, c'), Propose phase
New message: <2a, b, c, v>
24RDS Reconfiguration
Recon(c, c'), Propose phase
New message: <2b, b, c, v, tag, val>
- Update their tag and val
- Add v to their configs set
25RDS Reconfiguration
Recon(c, c'), Propagation phase
New message: <3a, c, v, tag, val>
- Update their tag and val
- Remove configuration c from their configs set
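The effect of the last two phases on a processor's local state can be sketched as follows. This is a simplified illustration, not the RDS code: the `Node` class and its handler names are hypothetical, a single incoming message triggers each update (the slides do not say how many messages a processor waits for), and configurations are represented by plain strings.

```python
class Node:
    """Hypothetical per-processor state touched by reconfiguration messages."""

    def __init__(self, default_config):
        self.configs = {default_config}   # set of active configurations
        self.tag, self.val = (0, 0), None

    def _adopt(self, tag, val):
        # Keep the (tag, val) pair with the larger tag.
        if tag > self.tag:
            self.tag, self.val = tag, val

    def on_2b(self, b, c, v, tag, val):
        # Propose phase: update tag/val and add the new configuration v.
        self._adopt(tag, val)
        self.configs.add(v)

    def on_3a(self, c, v, tag, val):
        # Propagation phase: update tag/val and remove the old configuration c.
        self._adopt(tag, val)
        self.configs.discard(c)


node = Node("c0")
node.on_2b(b=(1, 7), c="c0", v="c1", tag=(3, 2), val="x")
during = set(node.configs)            # both configurations are active here
node.on_3a(c="c0", v="c1", tag=(3, 2), val="x")
print(sorted(during), sorted(node.configs), node.val)
# ['c0', 'c1'] ['c1'] x
```

Between the two messages both configurations are active, which is the window during which operations must contact quorums of each.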
26Proving Atomicity
- Ordering configurations
- Ordering operations
Theorem 1: The set of installed configurations in the system is totally ordered.
Theorem 2: If operation $\pi_1$ precedes operation $\pi_2$, then the tag of $\pi_1$ is not larger than the tag of $\pi_2$.
27Additional Assumptions
- Eventual stabilization with
  - Unique leader l
  - Message delay bound d (unknown to the algorithm)
  - Gossip every d time units
- Restricted reconfiguration rate
- Some quorums remain alive in active configurations
$t_s$: system stabilization time
$t_l$: algorithm stabilization time
Timeline (figure): $t_s$, $2d$, $t_l$
Let $t_r$ be the request time.
28Reconfiguration Latency
Worst-case scenario: the last reconfiguration was done by a different leader.
29Reconfiguration Latency
Other cases: the leader made the previous reconfiguration.
Timeline (figure): Propose and Propagate phases, starting at $\max(t_l, t_r)$ and ending at $t_e$; interval labels $d$, $2d$, $3d$.
$t_e$: end time, when reconfiguration is complete.
30Operation Latency
- Phase latency
  - $2d$ is sufficient for the phase round trip.
  - In some cases (pending reconfiguration), a phase needs a second round trip once a new configuration is discovered, for $2d + 2d = 4d$ in total.
- Operation latency
  - Operations are bounded by $8d$.
  - In some cases, the propagation phase of the read operation can be skipped, leading to a possible bound of $2d$.
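The bounds on this slide follow from simple arithmetic over phases and round trips, which the sketch below spells out. The function names and the sample value of `d` are hypothetical, and the 4d figure for an undisturbed two-phase operation is derived from the per-phase bound rather than stated on the slide.

```python
def phase_latency(d, pending_reconfiguration=False):
    """One round trip costs 2d; discovering a new configuration adds a second."""
    round_trips = 2 if pending_reconfiguration else 1
    return round_trips * 2 * d


def operation_latency(d, pending_reconfiguration=False, skip_propagation=False):
    """An operation is a query phase followed by a propagation phase."""
    phases = 1 if skip_propagation else 2
    return phases * phase_latency(d, pending_reconfiguration)


d = 10  # hypothetical message delay bound, in milliseconds
print(operation_latency(d, pending_reconfiguration=True))  # 80, i.e. 8d
print(operation_latency(d))                                 # 40, i.e. 4d
print(operation_latency(d, skip_propagation=True))          # 20, i.e. 2d
```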
31Experimental Results
- IOA specification translated to Java code following a set of rules.
- Implementation of the Attiya, Bar-Noy, and Dolev algorithm, ABD (without reconfiguration), and of RDS, which shares parts of the ABD code.
- Using majority-based configurations.
- Measuring operation latency
  - While varying configuration size
  - While varying algorithm instances
32Experimental Results
- Operation latency of RDS is competitive with ABD, confirming the theory.
- Reconfiguration messages contain operation information, which might accelerate operations in RDS.
33Conclusion
- RDS, Reconfigurable Distributed Storage.
- With sound, flexible, non-intrusive and fast reconfiguration.
- It solves two problems in one: configuration replacement and consensus.
- Reconfiguration is inexpensive (time).
- Fault tolerance is strengthened.
- RAMBO can become more aggressive: it is exactly what we did here!