Consistent and Automatic Replica Regeneration

1
Consistent and Automatic Replica Regeneration
  • Networked Systems Design and Implementation (NSDI) 2004
  • Haifeng Yu
  • Amin Vahdat

2
Outline
  • Introduction
  • System Architecture Overview
  • Normal Case Operations
  • Reconfiguration
  • Single Replica Regeneration
  • Experimental Evaluation
  • Conclusions

3
Introduction
  • This paper presents Om, based on PAST
  • Challenge: maintaining consistency when the
    composition of the replica group changes

4
PAST
  • PAST example: object key 100
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

5
PAST
  • PAST example: object key 100
  • Replication
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

6
PAST
  • PAST example: object key 100
  • Replication
  • Replica crash
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

7
PAST
  • PAST example: object key 100
  • Replication
  • Replica crash
  • Regeneration
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

8
Introduction
  • This paper presents Om, based on PAST
  • Challenge: maintaining consistency when the
    composition of the replica group changes

9
Inconsistency
  • Node 101 is overloaded
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

10
Inconsistency
  • Node 101 is overloaded
  • Nodes 100 and 99 detect node 101 as failed
  • A new replica is created on node 98
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

11
Inconsistency
  • Nodes 100, 99, and 98 become overloaded too
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

12
Inconsistency
  • Nodes 100, 99, and 98 become overloaded too
  • They are considered dead by node 101
  • New replicas are created on nodes 103 and 104
  • Inconsistency
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

13
Introduction
  • Three novel techniques in Om
  • Single-replica regeneration instead of requiring
    a majority
  • Distinguishing between failure-free and
    failure-induced reconfiguration
  • A lease graph among all replicas and a two-phase
    write protocol, which avoid executing a
    consensus protocol for normal writes

14
System Architecture Overview
15
Normal Case Operation
  • Read-one / write-all approach
  • Writes serialized via the primary
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120; labels: write, read, primary]

16
Normal Case Operation
  • Two major anomalies
  • The first anomaly arises when replicas from old
    configurations are slow in detecting failures,
    and continue servicing stale data after
    reconfiguration
  • A second problem results from a read seeing a
    write that has not been applied to all replicas,
    and the write may be lost in reconfiguration. In
    other words, the read observes temporary,
    inconsistent state.

17
Normal Case Operation
  • Solution to the first anomaly: leases
  • In traditional client-server architectures, each
    client holds a lease from the server. However,
    since Om can regenerate from any replica, a
    replica needs to hold valid leases from all other
    replicas
  • Solution to the second anomaly: a two-phase
    write protocol
  • First, a prepare round: the primary propagates
    the write to the replicas
  • Second, a commit round: the primary sends
    commits to all replicas
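
The two solutions above can be sketched together as a toy model. This is a minimal illustration under stated assumptions, not Om's implementation: the `Replica` class, its field names, and the use of plain numbers as clock times are all hypothetical, and message loss and failures are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica; field names are illustrative, not Om's."""
    name: str
    leases: dict = field(default_factory=dict)     # peer name -> lease expiry time
    prepared: dict = field(default_factory=dict)   # key -> value, not yet readable
    committed: dict = field(default_factory=dict)  # key -> value, readable

    def holds_all_leases(self, group, now):
        """True only if every other replica's lease is still valid at `now`."""
        return all(self.leases.get(peer, 0.0) > now
                   for peer in group if peer != self.name)

    def read(self, key, group, now):
        """Serve a read from committed state, but only under a full set of leases."""
        if not self.holds_all_leases(group, now):
            raise RuntimeError(f"{self.name}: a lease expired, refusing to serve")
        return self.committed.get(key)


def two_phase_write(replicas, key, value):
    """Primary-driven write: prepare on all replicas, then commit on all."""
    for replica in replicas:            # round 1: prepare
        replica.prepared[key] = value
    for replica in replicas:            # round 2: commit
        replica.committed[key] = replica.prepared.pop(key)
```

In this sketch a read never sees a value that is only prepared, and a replica whose lease from any peer has lapsed refuses to serve, which is the behaviour the two bullets describe.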

18
Failure Detection and Regeneration
  • Failures are detected in Om via timeouts on
    messages
  • A new configuration is proposed to exclude the
    failed replicas
  • The new configuration must be unique

19
A Simple Design that Needs Majority
  • Acquire votes from a majority of replicas before
    regeneration
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

20
A Simple Design that Needs Majority
  • Acquire votes from a majority of replicas before
    regeneration
  • Create new replica
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

21
A Simple Design that Needs Majority
  • Acquire votes from a majority of replicas before
    regeneration
  • Deadlock
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

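One way to see where the majority design gets stuck is to write the voting rule down. The sketch below is an illustration only: the function names are hypothetical, and `reachable` is assumed to count the voting replica itself.

```python
def majority(group_size):
    """Smallest number of votes that is a strict majority of the group."""
    return group_size // 2 + 1


def can_regenerate(group_size, reachable):
    """Regeneration is allowed only with votes from a strict majority."""
    return reachable >= majority(group_size)


# A group of 3 where one replica is unreachable can still regenerate ...
print(can_regenerate(3, 2))
# ... but a lone survivor cannot, and neither can either half of a 2/2 split.
print(can_regenerate(3, 1), can_regenerate(4, 2))
```

Under this rule, once fewer than a majority of replicas can reach each other, no side is ever allowed to create a new replica, so the group stays stuck.
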
22
Voting with Witnesses
  • Use other random nodes (witnesses) for the quorum
    system
  • But we still need a majority of witnesses
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

23
Witness Model
  • The witness model utilizes the following limited
    view divergence property
  • Intuitively, the property says that two replicas
    are unlikely to have a completely different view
    regarding the reachability of a set of
    randomly-placed witnesses.

24
Witness Model
  • To utilize the limited view divergence property,
    all replicas logically organize the witnesses
    into an m × t matrix
  • The number of rows, m, determines the probability
    of intersection
  • The number of columns, t, protects against the
    failure of individual witnesses, so that each row
    has at least one functioning witness with high
    probability
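
The role of t can be made concrete with a small calculation. This is a sketch under an assumption the slides do not state: each witness fails independently with the same probability `p_fail`. Under that assumption a row of t witnesses is entirely dead with probability `p_fail ** t`, so all m rows keep at least one functioning witness with probability `(1 - p_fail ** t) ** m`. The function names are hypothetical.

```python
def arrange_witnesses(witnesses, m, t):
    """Lay a flat list of m * t witnesses out as m rows of t columns."""
    if len(witnesses) != m * t:
        raise ValueError("expected exactly m * t witnesses")
    return [witnesses[row * t:(row + 1) * t] for row in range(m)]


def prob_every_row_has_live_witness(m, t, p_fail):
    """Probability that no row is entirely failed, assuming independent failures."""
    return (1.0 - p_fail ** t) ** m


matrix = arrange_witnesses(["w%d" % i for i in range(6)], m=2, t=3)
print(matrix)
print(prob_every_row_has_live_witness(m=2, t=3, p_fail=0.1))
```

Adding columns drives the per-row failure probability down geometrically, which is why t is the knob for tolerating individual witness failures. The intersection probability governed by m is not modelled here.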

25
Witness Model
26
Witness Model
  • Limited view divergence
  • Reach one common witness with good probability
[Figure: node IDs 80, 90, 98, 99, 100, 101, 103, 104, 120]

27
Reconfiguration
  • A configuration is a public class
  • Fields: valid, sequenceNum, primary, secondary,
    consensusID
  • Failure-free reconfiguration
  • Only the primary does this, because the other
    replicas are passive
  • Failure-induced reconfiguration
  • All replicas transmit configuration notices to
    aid in completing reconfiguration earlier
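
The configuration record could be written down roughly as follows. Only the five field names come from the slide; the types, the reading of `secondary` as the tuple of non-primary replicas, the reading of `valid` as an enabled/disabled flag, and the helpers `members` and `disable` are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Configuration:
    valid: bool                  # assumed: False once the configuration is disabled
    sequence_num: int            # sequenceNum on the slide
    primary: str
    secondary: Tuple[str, ...]   # assumed: the non-primary replicas
    consensus_id: int            # consensusID on the slide

    def members(self):
        """All replicas in the group, primary first."""
        return (self.primary,) + self.secondary


def disable(config):
    """Return a copy marked invalid; the frozen original is left untouched."""
    return replace(config, valid=False)
```

Making the record immutable is a design choice of this sketch: a reconfiguration then always produces a new object with its own sequence number rather than editing the old one in place.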

28
Failure-free Reconfiguration
  • Only the primary may initiate failure-free
    reconfiguration
  • After transferring data to the new replicas in
    two stages (snapshot followed by logged writes),
    the primary constructs a configuration for the
    new desired membership
  • The primary then informs the other replicas of
    the new configuration and waits for acks
  • If a timeout occurs, a failure-induced
    reconfiguration will follow
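
The steps on this slide can be sketched as a single function. This is a hedged illustration, not Om's code: `send(node, message)` is a hypothetical transport that returns True when the node acknowledges before the timeout, the dictionary layout of the configuration is invented, and informing every old and new member other than the primary is an assumption.

```python
def failure_free_reconfiguration(primary, old_members, new_members,
                                 snapshot, logged_writes, seq_num, send):
    """Sketch of a primary-driven reconfiguration (names are hypothetical)."""
    # Transfer data to joining replicas in two stages: snapshot, then logged writes.
    for node in new_members:
        if node not in old_members:
            send(node, ("snapshot", snapshot))
            send(node, ("logged_writes", list(logged_writes)))

    # Construct the configuration for the new desired membership.
    new_config = {"sequenceNum": seq_num + 1, "primary": primary,
                  "members": list(new_members)}

    # Inform the other replicas and wait for their acks.
    others = [n for n in dict.fromkeys(list(old_members) + list(new_members))
              if n != primary]
    if all(send(node, ("new_config", new_config)) for node in others):
        return "installed", new_config
    # A missing ack means a failure-induced reconfiguration has to follow.
    return "failure-induced", None
```

The function returns a status and the new configuration, so a caller can fall through to the failure-induced path when any acknowledgement times out.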

29
Failure-induced Reconfiguration
  • A replica initiates the reconfiguration and first
    disables the current configuration
  • It performs another round of failure detection
    for all members of the configuration
  • The result (the current replicas) is used as a
    proposal for the new configuration
  • The replica then invokes a consensus protocol
  • Before adopting a decision, each replica needs to
    wait for all leases to expire with respect to
    the old configuration
  • Finally, the primary of the new configuration
    collects and re-applies any pending writes

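The sequence above can be sketched as one function with the hard parts injected as callbacks. Everything here is an assumption made for illustration: `is_alive` stands in for the failure detector, `decide` stands in for the consensus protocol (its internals are not modelled), the configuration is a plain dictionary, and picking the first decided member as the new primary is arbitrary.

```python
def failure_induced_reconfiguration(config, is_alive, decide,
                                    lease_expiry, now, pending_writes):
    """Sketch of the failure-induced path (all names are hypothetical)."""
    # 1. Disable the current configuration.
    config["valid"] = False
    # 2-3. Re-run failure detection; the live members form the proposal.
    proposal = [m for m in config["members"] if is_alive(m)]
    # 4. Invoke consensus on the proposal.
    decision = decide(proposal)
    if not decision:
        raise ValueError("consensus returned an empty configuration")
    # 5. The decision may be adopted only after every old lease has expired.
    adopt_at = max(now, max(lease_expiry.values(), default=now))
    new_config = {"valid": True,
                  "sequenceNum": config["sequenceNum"] + 1,
                  "primary": decision[0],   # assumption: first member leads
                  "members": list(decision)}
    # 6. The new primary collects and re-applies any pending writes.
    reapplied = list(pending_writes)
    return new_config, adopt_at, reapplied
```

Returning `adopt_at` rather than sleeping keeps the sketch deterministic; a real replica would wait until that time before serving under the new configuration.
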
30
Performance Evaluation
31
Conclusions
  • Single replica regeneration that enables Om to
    achieve high availability with a small number of
    replicas
  • Failure-free reconfigurations allowing
    common-case reconfigurations to proceed within a
    single round of communication
  • A lease graph and two-phase write protocol to
    avoid expensive consensus for normal writes and
    also to allow reads to be processed by any replica