Consistent and Automatic Replica Regeneration - PowerPoint PPT Presentation

About This Presentation

Title:

Consistent and Automatic Replica Regeneration

Description:

Single-replica regeneration instead of majority ... Before adopting a decision, each replica needs to wait for all leases to expire ...

Slides: 32

Provided by: ds42

Transcript and Presenter's Notes

Title: Consistent and Automatic Replica Regeneration

1
Consistent and Automatic Replica Regeneration

Symposium on Networked Systems Design and Implementation (NSDI), 2004
Haifeng Yu
Amin Vahdat

2
Outline

Introduction
System Architecture Overview
Normal Case Operations
Reconfiguration
Single Replica Regeneration
Experimental Evaluation
Conclusions

3
Introduction

This paper presents Om, which is based on PAST
Challenge: maintaining consistency when the composition of the replica group changes

4
PAST
PAST example: object key 100
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
5
PAST
PAST example: object key 100
Replication
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
6
PAST
PAST example: object key 100
Replication
Replica crash
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
7
PAST
PAST example: object key 100
Replication
Replica crash
Regeneration
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
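
The placement and regeneration steps in the PAST example can be sketched as follows. This is a minimal illustration, not PAST's actual code: it assumes a replication factor of 3 (the slides do not state one), ignores wrap-around on the ring, and the name replica_set is hypothetical. PAST places replicas on the live nodes whose IDs are numerically closest to the object key.

```python
def replica_set(key, live_nodes, k=3):
    """Return the k live node IDs numerically closest to key.

    Assumptions for this sketch: k=3, ties go to the smaller ID,
    and ring wrap-around is ignored.
    """
    return sorted(live_nodes, key=lambda n: (abs(n - key), n))[:k]


nodes = [80, 90, 98, 99, 100, 101, 103, 104, 120]

# Initial placement for object key 100.
before = replica_set(100, nodes)

# Node 101 crashes: regenerate by recomputing over the survivors.
after = replica_set(100, [n for n in nodes if n != 101])
```

Under these assumptions the object starts on nodes 100, 99, and 101, and node 98 takes over once 101 is gone.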
8
Introduction

This paper presents Om, which is based on PAST
Challenge: maintaining consistency when the composition of the replica group changes

9
Inconsistency
Node 101 overloaded
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
10
Inconsistency
Node 101 overloaded
Nodes 100 and 99 detect node 101 as failed
New replica created on node 98
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
11
Inconsistency
Nodes 100, 99, and 98 become overloaded too
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
12
Inconsistency
Nodes 100, 99, and 98 become overloaded too
They are considered dead by node 101
New replicas created on nodes 103 and 104
Result: inconsistency
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
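
The scenario on these slides can be reproduced with a small sketch. It assumes replicas live on the 3 nodes numerically closest to the key among the nodes a given observer believes to be alive; the replication factor and the helper name replica_set are illustrative assumptions, not taken from the slides.

```python
def replica_set(key, believed_live, k=3):
    """k node IDs closest to key among the nodes one observer believes alive."""
    return sorted(believed_live, key=lambda n: (abs(n - key), n))[:k]


nodes = {80, 90, 98, 99, 100, 101, 103, 104, 120}

# View held by nodes 100 and 99: node 101 looks dead (it is only overloaded).
view_a = replica_set(100, nodes - {101})

# View held by node 101: nodes 100, 99, and 98 look dead.
view_b = replica_set(100, nodes - {100, 99, 98})

# No replica is shared, so each side can accept writes the other never sees.
disjoint = set(view_a).isdisjoint(view_b)
```

The two views yield replica groups with no member in common, which is the inconsistency the slide points at.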
13
Introduction

Three novel techniques in Om:
Single-replica regeneration instead of majority-based regeneration
Distinguishing between failure-free and failure-induced reconfiguration
A lease graph among all replicas and a two-phase write protocol, which avoid executing a consensus protocol for normal writes

14
System Architecture Overview
15
Normal Case Operation
Read-one / write-all approach
Writes serialized via the primary
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120; labels: write, read, primary]
16
Normal Case Operation

Two major anomalies:
The first anomaly arises when replicas from old configurations are slow to detect failures and continue serving stale data after reconfiguration.
The second arises when a read sees a write that has not yet been applied to all replicas and that write is then lost in reconfiguration. In other words, the read observes temporary, inconsistent state.

17
Normal Case Operation

Solution to the first anomaly: leases
In traditional client-server architectures, each client holds a lease from the server. However, since Om can regenerate from any replica, a replica needs to hold valid leases from all other replicas.
Solution to the second anomaly: a two-phase write protocol
First, a prepare round: the primary propagates the write to the replicas
Second, a commit round: the primary sends commits to all replicas
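
The two fixes above can be sketched in a few lines. This is an in-memory illustration only: message passing, timeouts, and failures are left out, and the class and function names (Replica, two_phase_write, holds_all_leases) are hypothetical rather than Om's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    pending: dict = field(default_factory=dict)       # prepared, not yet readable
    committed: dict = field(default_factory=dict)     # visible to reads
    lease_expiry: dict = field(default_factory=dict)  # peer name -> expiry time

    def holds_all_leases(self, group, now):
        # A replica may serve reads only while it holds a valid lease
        # from every other replica in the group.
        return all(self.lease_expiry.get(p.name, 0) > now
                   for p in group if p is not self)

    def read(self, key, group, now):
        if not self.holds_all_leases(group, now):
            raise RuntimeError("lease expired: data may be stale")
        return self.committed.get(key)


def two_phase_write(group, key, value):
    """Run by the primary over the whole group (itself included)."""
    for r in group:          # round 1: prepare
        r.pending[key] = value
    for r in group:          # round 2: commit
        r.committed[key] = r.pending.pop(key)


a, b, c = Replica("a"), Replica("b"), Replica("c")
group = [a, b, c]
for r in group:
    r.lease_expiry = {p.name: 10 for p in group if p is not r}

two_phase_write(group, "x", 1)
value_at_b = b.read("x", group, now=5)   # leases are valid until time 10
```

A read never sees a value until the commit round, by which point every replica has the write in its prepared state. A replica whose leases have lapsed refuses to answer instead of returning possibly stale data.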

18
Failure Detection and Regeneration

Failures are detected in Om via timeouts on messages
A new configuration is proposed to exclude the failed replicas
Requirement: uniqueness of the new configuration

19
A Simple Design that Needs Majority
Acquire votes from a majority of replicas before regeneration
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
20
A Simple Design that Needs Majority
Acquire votes from a majority of replicas before regeneration
Create new replica
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
21
A Simple Design that Needs Majority
Acquire votes from a majority of replicas before regeneration
Deadlock
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
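
The majority rule, and one way it can get stuck, can be illustrated with a short sketch. The group size of 3 and the function name has_majority are assumptions for illustration; the exact deadlock scenario drawn on the slide is not recoverable from the transcript.

```python
def has_majority(votes, group_size):
    """True if votes form a strict majority of the replica group."""
    return votes > group_size // 2


# Group of 3 replicas, one crashed: the two survivors can still regenerate.
one_failed = has_majority(votes=2, group_size=3)

# Two of three unreachable: the lone survivor can never collect a majority,
# so regeneration cannot proceed.
two_failed = has_majority(votes=1, group_size=3)
```

With a small replica group, losing a majority is not a rare event, which is the motivation for single-replica regeneration.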
22
Voting with witness
Use other random nodes (witnesses) for the quorum system
But we still need a majority of witnesses
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
23
Witness Model

The witness model utilizes the following limited
view divergence property

Intuitively, the property says that two replicas
are unlikely to have a completely different view
regarding the reachability of a set of
randomly-placed witnesses.

24
Witness Model

To utilize the limited view divergence property,
all replicas logically organize the witnesses
into an m × t matrix
The number of rows, m, determines the probability
of intersection
The number of columns, t, protects against the
failure of individual witnesses, so that each row
has at least one functioning witness with high
probability
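
The role of the column count t can be made concrete with a small calculation. It assumes that each witness fails independently with the same probability p_fail, which is a simplification introduced here and not a claim from the slides; the function name is hypothetical. Under that assumption a row of t witnesses is entirely down with probability p_fail^t, so all m rows have at least one working witness with probability (1 - p_fail^t)^m.

```python
def prob_every_row_alive(m, t, p_fail):
    """P(each of m rows has at least one working witness among its t).

    Assumes independent witness failures with probability p_fail each.
    """
    return (1 - p_fail ** t) ** m


# Adding columns makes a fully dead row far less likely.
narrow = prob_every_row_alive(m=4, t=1, p_fail=0.1)
wide = prob_every_row_alive(m=4, t=3, p_fail=0.1)
```

Here wide is larger than narrow: each extra column multiplies the chance that a whole row is dead by p_fail.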

25
Witness Model
26
Witness Model
Limited view divergence
Reach one common witness with good probability
[Figure: ring of nodes 80, 90, 98, 99, 100, 101, 103, 104, 120]
27
Reconfiguration

Public class Configuration
Fields: valid, sequenceNum, primary, secondary, consensusID
Failure-free reconfiguration
Only the primary does this, because the other
replicas are passive
Failure-induced reconfiguration
All replicas transmit configuration notices to
aid in completing reconfiguration earlier
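
The configuration record listed on this slide might look like the following. Only the field names come from the slide; the types, the optional consensus ID, and the rule that each new configuration bumps the sequence number by one are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Configuration:
    valid: bool                      # False once the configuration is disabled
    sequence_num: int                # assumed to grow with each reconfiguration
    primary: int                     # node ID of the primary
    secondaries: Tuple[int, ...]     # node IDs of the other replicas
    consensus_id: Optional[int] = None  # assumed to be set only when consensus ran

    def members(self):
        return (self.primary, *self.secondaries)

    def successor(self, primary, secondaries, consensus_id=None):
        # Hypothetical helper: build the next configuration in sequence.
        return Configuration(True, self.sequence_num + 1,
                             primary, tuple(secondaries), consensus_id)


old = Configuration(True, 1, 100, (99, 101))
new = old.successor(primary=100, secondaries=[99, 98])
```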

28
Failure-free Reconfiguration

Only the primary may initiate failure-free
reconfiguration
After transferring data to the new replicas in
two stages (snapshot followed by logged writes),
the primary constructs a configuration for the
new desired membership
The primary then informs the other replicas of
the new configuration and waits for acks
If a timeout occurs, a failure-induced
reconfiguration will follow
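
The last two steps, informing the other replicas and falling back on a timeout, can be sketched as below. The data transfer stage is omitted, and send_and_wait_ack is a hypothetical callback that returns False when the ack times out.

```python
def failure_free_reconfigure(others, send_and_wait_ack):
    """Primary-side sketch: 'done' if every replica acks, else fall back."""
    for replica in others:
        if not send_and_wait_ack(replica):   # False models a timeout
            return "failure-induced"
    return "done"


# Every replica acknowledges the new configuration.
ok = failure_free_reconfigure([99, 98], lambda r: True)

# Node 98 never answers, so the failure-induced path takes over.
timed_out = failure_free_reconfigure([99, 98], lambda r: r != 98)
```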

29
Failure-induced Reconfiguration
A replica initiates the reconfiguration and first disables the current configuration
It performs another round of failure detection for all members of the configuration
The result (the replicas currently alive) is used as the proposal for the new configuration
The replica then invokes a consensus protocol
Before adopting a decision, each replica needs to wait for all leases to expire with respect to the old configuration
Finally, the primary of the new configuration collects and re-applies any pending writes
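
Two of these steps lend themselves to a short sketch: building the proposal from a fresh round of failure detection, and working out when a decision may be adopted. The consensus round itself is not shown, and both function names are hypothetical.

```python
def propose_members(members, is_alive):
    """Proposal for the new configuration: members that pass failure detection."""
    return [m for m in members if is_alive(m)]


def earliest_adoption_time(now, old_lease_expiries):
    """A decision may be adopted only after every old-configuration lease expires."""
    return max([now, *old_lease_expiries])


# Node 101 fails the second round of failure detection.
proposal = propose_members([99, 100, 101], lambda n: n != 101)

# Leases granted under the old configuration expire at times 40, 65, and 58.
adopt_at = earliest_adoption_time(now=50, old_lease_expiries=[40, 65, 58])
```

Waiting until time 65 guarantees that no replica of the old configuration can still be serving reads under a valid lease when the new configuration takes effect.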
30
Performance Evaluation
31
Conclusions

Single replica regeneration that enables Om to
achieve high availability with a small number of
replicas
Failure-free reconfigurations allowing
common-case reconfigurations to proceed within a
single round of communication
A lease graph and two-phase write protocol to
avoid expensive consensus for normal writes and
also to allow reads to be processed by any replica
