Title: CS556: Distributed Systems
1. CS-556 Distributed Systems
High Availability Replication (II)
- Manolis Marazakis
- maraz_at_csd.uoc.gr
2. Transactions with Replicated Data
- Better performance
- Concurrent service
- Reduced latency
- Higher availability
- Fault tolerance
- What if a replica fails or becomes isolated?
- Upon rejoining, it must catch up
- Replicated transaction service
- Data replicated at a set of replica managers
- Replication transparency
- One-copy serializability
- Read one, write all
- Failures must be observed to have happened before any active Txs at other servers
3. Fault Tolerance?
- Define correctness criteria
- When two replicas are separated by a network partition:
- Both are deemed incorrect; both stop serving.
- One (the master) continues; the other ceases service.
- One (the master) continues to accept updates; both continue to supply reads (of possibly stale data).
- Both continue service; they subsequently synchronise.
4. Linearizability
- Sequence of client i operations: Oi0, Oi1, Oi2, ...
- A single server would serialize client operations in some order
- e.g., O10, O11, O20, O21, O12, ...
- This is a virtual interleaving of the client operations in a server holding a single copy of the data
- A replicated shared-object service is linearizable if, for any execution, there is an interleaving of the client operations that satisfies:
- The interleaved sequence of operations meets the specification of a single correct copy of the objects
- The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution
5. Sequential consistency
- Linearizability is hard to achieve in practice without precise clock synchronization
- A replicated shared-object service is sequentially consistent if, for any execution, there is an interleaving of the client operations that satisfies:
- The interleaved sequence of operations meets the specification of a single correct copy of the objects
- The order of operations in the interleaving is consistent with the program order in which each client executed them
- ATTENTION: no total ordering between clients
- Every linearizable service is sequentially consistent (the converse is not true)
6. Example
- Client 1: setBalance_B(x,1)
- Client 2: getBalance_A(y) → 0
- Client 2: getBalance_A(x) → 0
- Client 1: setBalance_A(y,2)
- The real-time criterion of linearizability is not satisfied
- Find an interleaving that satisfies both criteria for sequential consistency (see the sketch below)
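A minimal Python sketch of this example (illustrative only; the operation labels, the valid_interleaving helper, and the encoding of the four operations are assumptions, not from the slides). It checks whether a proposed interleaving respects each client's program order and reads like a single copy of the data:

    # Slide-6 example: Client 1 writes x:=1 then y:=2; Client 2 reads y -> 0, x -> 0.
    ops = {
        "w_x": ("C1", "write", "x", 1),   # setBalance_B(x,1)
        "w_y": ("C1", "write", "y", 2),   # setBalance_A(y,2)
        "r_y": ("C2", "read",  "y", 0),   # getBalance_A(y) -> 0
        "r_x": ("C2", "read",  "x", 0),   # getBalance_A(x) -> 0
    }
    program_order = [("w_x", "w_y"), ("r_y", "r_x")]  # per-client order

    def valid_interleaving(order):
        pos = {op: i for i, op in enumerate(order)}
        # each client's program order must be preserved
        if any(pos[a] > pos[b] for a, b in program_order):
            return False
        # single-copy semantics: replay the writes and check every read
        store = {"x": 0, "y": 0}
        for op in order:
            _, kind, var, val = ops[op]
            if kind == "write":
                store[var] = val
            elif store[var] != val:
                return False
        return True

    # Valid for sequential consistency even though it violates real time
    # (setBalance_B(x,1) actually finished before the two reads):
    print(valid_interleaving(["r_y", "r_x", "w_x", "w_y"]))   # True
    print(valid_interleaving(["w_x", "r_y", "r_x", "w_y"]))   # False: r_x would see x = 1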
7. Passive Replication (I)
- At any time, the system has a single primary RM
- One or more secondary (backup) RMs
- Front ends communicate with the primary; the primary executes requests and sends the response to all backups
- If the primary fails, one backup is promoted to primary
- The new primary starts from the Coordination phase for each new request
- What happens if the primary crashes before/during/after the agreement phase?
8. Passive Replication (II)
9. Passive replication (III)
- Satisfies linearizability
- The front end looks up a new primary when the current primary does not respond
- The primary RM is a performance bottleneck
- Can tolerate F failures with F+1 RMs
- Variation: clients can access backup RMs (linearizability is lost, but clients get sequential consistency)
- Sun NIS (yellow pages) uses passive replication: clients can contact primary or backup servers for reads, but only the primary server for updates
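A toy primary-backup sketch in Python (the class names and single-process direct calls are assumptions for illustration; a real deployment needs failure detection, view changes, and the agreement phase discussed above):

    class ReplicaManager:
        def __init__(self, name):
            self.name, self.state, self.alive = name, {}, True

        def apply(self, op, key, value=None):
            if op == "write":
                self.state[key] = value
            return self.state.get(key)

    class PrimaryBackupService:
        def __init__(self, rms):
            self.rms = rms                          # rms[0] starts as the primary

        def primary(self):
            return next(rm for rm in self.rms if rm.alive)   # promote first live backup

        def request(self, op, key, value=None):
            p = self.primary()
            result = p.apply(op, key, value)        # 1. primary executes the request
            if op == "write":                       # 2. primary propagates the update
                for rm in self.rms:
                    if rm.alive and rm is not p:
                        rm.apply(op, key, value)
            return result                           # 3. reply goes back to the front end

    svc = PrimaryBackupService([ReplicaManager("P"), ReplicaManager("B1"), ReplicaManager("B2")])
    svc.request("write", "x", 42)
    svc.rms[0].alive = False                        # primary crashes
    print(svc.request("read", "x"))                 # 42, served by the promoted backup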
10. Active replication (I)
- RMs are state machines with equivalent roles
- Front ends communicate client requests to the RM group using totally ordered, reliable multicast
- RMs process requests independently and reply to the front end (correct RMs process each request identically)
- The front end can synthesize the final response to the client (tolerating Byzantine failures)
- Active replication provides sequential consistency if the multicast is reliable and ordered
- Byzantine failures (F out of 2F+1): the front end waits until it gets F+1 identical responses (see the voting sketch below)
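A minimal sketch of that voting rule (synthesize_response is a made-up name; a real front end would also match request IDs and authenticate replies):

    from collections import Counter

    def synthesize_response(replies, f):
        """Return the first value reported by at least F+1 replicas, else None."""
        for value, count in Counter(replies).most_common():
            if count >= f + 1:
                return value
        return None

    # 2F+1 = 5 replica managers, at most F = 2 of them Byzantine:
    print(synthesize_response([7, 7, 7, 0, 99], f=2))   # 7    (F+1 identical replies)
    print(synthesize_response([7, 7, 0, 0, 99], f=2))   # None (no F+1 quorum yet)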
11. Active replication (II)
12. Replication Architectures
- How many replicas are required?
- All or majority ?
- Forward all updates as soon as received.
- Two-phase commit protocol.
- Contacted replica acts as coordinator
- What if one of the replicas isn't available?
- Primary copy replication
(Diagram: transaction operations getBalance(A) and deposit(B) issued to the replica managers)
13. Available Copies Replication
(Diagram: transactions T and U issue getBalance(A), deposit(B), getBalance(B), deposit(A) at the available replica managers)
- Not all copies will always be available.
- Failures
- Timeout at failed replica
- Rejected by recovering, unsynchronised replica
14. Local Validation
- Failure and recovery events do not occur during a Tx.
- Example:
- T reads A before server X's failure, therefore T → fail_X
- T observes server N's failure when it writes B, therefore fail_N → T
- fail_N → T.getBalance(A) → T.deposit(B) → fail_X
- fail_X → U.getBalance(B) → U.deposit(A) → fail_N
- Combined: server X fails, followed by transaction U, which is followed by server N's failure, which is followed by transaction T, which is followed by server X's failure. This is inconsistent, so the transactions must not be allowed to commit.
- Failure and recovery must be serialised just like a Tx: they occur before or after a Tx, but not during.
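One way to see the check, as a small Python sketch (the has_cycle helper and the edge labels are illustrative, not from the slides): treat the observed before/after constraints as edges of a graph and reject the transactions if the edges form a cycle.

    def has_cycle(edges):
        # build adjacency lists, making sure every node appears
        graph = {}
        for a, b in edges:
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, [])
        WHITE, GREY, BLACK = 0, 1, 2
        color = {n: WHITE for n in graph}

        def visit(n):
            color[n] = GREY
            for m in graph[n]:
                if color[m] == GREY or (color[m] == WHITE and visit(m)):
                    return True       # found a back edge: cycle
            color[n] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in graph)

    # fail_N -> T -> fail_X  (from T's observations)
    # fail_X -> U -> fail_N  (from U's observations)
    edges = [("fail_N", "T"), ("T", "fail_X"), ("fail_X", "U"), ("U", "fail_N")]
    print(has_cycle(edges))   # True: inconsistent ordering, so T and U must not commit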
15. Network Partitions
- Separate but viable groups of servers
- Optimistic schemes validate on recovery
- Available copies with validation
- Pessimistic schemes limit availability until
recovery
16. Fault Tolerance
- Design to recover after a failure w/o loss of (committed) data.
- Designs for fault tolerance:
- Single server, fail and recover
- Primary server with trailing backups
- Replicated service
17. Ordered Multicast
- FIFO ordering: if a correct process issues multicast(g, m1) followed by multicast(g, m2), then every correct process that delivers m2 will deliver m1 before m2.
- Causal ordering: if multicast(g, m1) happened-before multicast(g, m2), then any correct process that delivers m2 will deliver m1 before m2.
- Total ordering: if a correct process delivers m1 before it delivers m2, then any other correct process that delivers m2 will deliver m1 before m2.
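A concrete illustration of FIFO ordering only (per-sender sequence numbers with a hold-back queue; the FifoReceiver class is a made-up sketch, and causal or total ordering need vector timestamps or a sequencer on top of this):

    class FifoReceiver:
        def __init__(self):
            self.next_seq = {}      # sender -> next expected sequence number
            self.hold_back = {}     # sender -> {seq: message}
            self.delivered = []

        def receive(self, sender, seq, msg):
            self.hold_back.setdefault(sender, {})[seq] = msg
            expected = self.next_seq.setdefault(sender, 1)
            # deliver as many consecutive messages from this sender as possible
            while expected in self.hold_back[sender]:
                self.delivered.append(self.hold_back[sender].pop(expected))
                expected += 1
            self.next_seq[sender] = expected

    r = FifoReceiver()
    r.receive("p", 2, "m2")         # arrives early: held back
    r.receive("p", 1, "m1")         # now both can be delivered, in order
    print(r.delivered)              # ['m1', 'm2']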
18. Total / Causal Ordering
- C3: no happened-before indication; it is delivered in different orders at P2, P3, P4
19. Synch Ordering
- All copies of a given request are processed either before the synch request or after it, consistently at every replica.
- Essentially, a synch request serves to flush the system.
20. Causal Ordering - Vector Timestamps
- Lamport's logical clock does not capture causality
- Vector timestamp: one event count per source process
- The FE keeps the timestamp of the RM at the last read
21. Group membership service
For each group, this service delivers to any member process a series of views.
22. View delivery constraints
- Order
- If p delivers v(g) and then v'(g), then no other process delivers v'(g) before v(g)
- Integrity
- If p delivers v(g), then p ∈ v(g)
- Non-triviality
- If q has joined the group and is (or becomes) indefinitely reachable from p, then eventually q is always in the views that p delivers
- If a group is partitioned, then eventually the views delivered in any one partition will exclude processes in another partition
23. View-synchronous guarantees (I)
- Agreement
- If p delivers message m in view v(g) and then delivers v'(g), then all processes that survive to deliver v'(g) also deliver m in view v(g)
- Integrity
- If p delivers message m, then it will not deliver m again.
- Validity
- Correct processes always deliver the messages that they send.
- If the system fails to deliver a message to any process q, then it notifies the surviving processes by delivering a new view, with q excluded, immediately after the view in which any of them delivered the message.
24. View-synchronous guarantees (II)
- p sends a message while in view (p, q, r), then crashes
25. View-synchronous guarantees (III)
- State transfer to a new group member
- Delivery of first view containing the new process
- Group representative captures its state
- Send state to new member (one-to-one)
- Suspend execution
- All (previous) group members suspend their execution as well
- New member delivers state
- Integrate new state
- Multicast Commence message to the group
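A toy sketch of these steps in Python, with all group communication replaced by direct calls (Member, admit, and the "commence" step are illustrative names only):

    class Member:
        def __init__(self, name, state=None):
            self.name, self.state, self.suspended = name, state or {}, False

    def admit(group, representative, newcomer):
        # first view containing the new process has been delivered
        snapshot = dict(representative.state)   # representative captures its state
        for m in group:                         # all previous members suspend execution
            m.suspended = True
        newcomer.state = snapshot               # new member delivers and integrates the state
        group.append(newcomer)
        for m in group:                         # "Commence" message: everyone resumes
            m.suspended = False

    group = [Member("p", {"x": 1}), Member("q", {"x": 1})]
    admit(group, group[0], Member("r"))
    print([m.state for m in group])             # all three members now share the state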
26. The gossip architecture (I)
- Replicate data close to the points where groups of clients need it
- Periodic exchange of messages among RMs
- Front-ends send queries and updates to any RM they choose
- Any RM that is available can provide acceptable response times
- Consistent service over time
- Relaxed consistency between replicas
27. The gossip architecture (II)
- Causal update ordering
- Forced ordering
- Causal + total
- A forced-order and a causal-order update that are related by the happened-before relation may be applied in different orders at different RMs!
- Immediate ordering
- Updates are applied in a consistent order relative to any other update at all RMs
28. The gossip architecture (III)
- Bulletin board application example
- Posting items → causal order
- Adding a subscriber → forced order
- Removing a subscriber → immediate order
- Gossip messages carry updates among RMs
- Front-ends maintain prev vector timestamp
- One entry per RM
- RMs respond with new vector timestamp
29. State components of a gossip RM
30. Query operations in gossip
- The RM must return a value that is at least as recent as the request's timestamp
- q.prev ≤ valueTS
- List of pending query operations
- Held back until the above condition is satisfied
- The RM can wait for the missing updates
- or request the updates from the RMs concerned
- The RM's response includes valueTS
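Sketch of this query path (the class and method names are made up; the vector-timestamp helpers from the earlier sketch are repeated so the snippet stands alone): a query is answered only once q.prev ≤ valueTS, otherwise it is held back until the missing updates arrive.

    def vt_leq(a, b):    # q.prev <= valueTS, element-wise
        return all(x <= y for x, y in zip(a, b))

    def vt_merge(a, b):  # element-wise maximum
        return [max(x, y) for x, y in zip(a, b)]

    class GossipRM:
        def __init__(self, n_rms):
            self.value, self.value_ts = {}, [0] * n_rms
            self.pending = []                       # held-back query operations

        def query(self, q_prev, key):
            if vt_leq(q_prev, self.value_ts):       # recent enough: answer now
                return self.value.get(key), list(self.value_ts)
            self.pending.append((q_prev, key))      # hold back until updates arrive
            return None

        def apply_update(self, ts, key, val):
            self.value[key] = val
            self.value_ts = vt_merge(self.value_ts, ts)
            ready = [(p, k) for p, k in self.pending if vt_leq(p, self.value_ts)]
            self.pending = [q for q in self.pending if q not in ready]
            return [self.query(p, k) for p, k in ready]   # answer the released queries

    rm = GossipRM(n_rms=3)
    print(rm.query([0, 1, 0], "x"))             # None: update 1 from RM 1 not seen yet
    print(rm.apply_update([0, 1, 0], "x", 5))   # [(5, [0, 1, 0])]: held-back query released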
31. Updates in causal order
- RM i checks whether the operation ID is in its executed table or in its log
- Discard the update if it has already been seen
- Increment the i-th element of the replica timestamp
- Count of updates received from front-ends
- Assign a vector timestamp ts to the update
- Replace the i-th element of u.prev by the i-th element of the replica timestamp
- Insert a log entry
- <i, ts, u.op, u.prev, u.id>
- Stability condition: u.prev ≤ valueTS
- All updates on which u depends have been applied
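A rough sketch of this causal-update path at RM i (all names are illustrative, not from the paper): duplicate check against the executed table and log, replica-timestamp increment, log entry <i, ts, u.op, u.prev, u.id>, and the stability condition u.prev ≤ valueTS before applying.

    def vt_leq(a, b): return all(x <= y for x, y in zip(a, b))      # as in the earlier sketch
    def vt_merge(a, b): return [max(x, y) for x, y in zip(a, b)]

    class CausalGossipRM:
        def __init__(self, i, n_rms):
            self.i = i
            self.replica_ts = [0] * n_rms   # updates received at this RM
            self.value_ts = [0] * n_rms     # updates applied to the value
            self.value, self.log, self.executed = {}, [], set()

        def receive_update(self, u_id, u_op, u_prev):
            if u_id in self.executed or any(e[4] == u_id for e in self.log):
                return                       # already seen: discard
            self.replica_ts[self.i] += 1     # count updates received from front ends
            ts = list(u_prev)
            ts[self.i] = self.replica_ts[self.i]
            self.log.append((self.i, ts, u_op, u_prev, u_id))
            self._apply_stable()
            return ts                        # returned to the front end

        def _apply_stable(self):
            applied = True
            while applied:
                applied = False
                for entry in list(self.log):
                    _, ts, op, prev, uid = entry
                    if vt_leq(prev, self.value_ts):   # stability condition holds
                        op(self.value)                # all dependencies already applied
                        self.value_ts = vt_merge(self.value_ts, ts)
                        self.executed.add(uid)
                        self.log.remove(entry)
                        applied = True

    rm = CausalGossipRM(i=0, n_rms=2)
    rm.receive_update("u1", lambda v: v.update(balance=100), [0, 0])
    print(rm.value, rm.value_ts)    # {'balance': 100} [1, 0]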
32. Forced / immediate order
- A unique sequence number is appended to update timestamps
- The primary RM acts as sequencer
- Another RM can be elected to take over consistently as sequencer
- A majority of RMs (including the primary) must record which update is next in the sequence
- Immediate ordering: the primary also places these updates in the sequence (along with forced updates), considering causal updates as well
- Agreement protocol on the sequence
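A bare-bones sequencer sketch for forced ordering (Sequencer and ForcedOrderRM are illustrative names; electing a replacement sequencer and the majority-agreement step are omitted): the primary assigns consecutive sequence numbers, and every RM applies forced updates strictly in that order.

    class Sequencer:
        def __init__(self):
            self.next_seq = 0
        def assign(self, update):
            self.next_seq += 1               # primary appends a unique sequence number
            return (self.next_seq, update)

    class ForcedOrderRM:
        def __init__(self):
            self.expected, self.hold_back, self.applied = 1, {}, []
        def receive(self, seq, update):
            self.hold_back[seq] = update
            while self.expected in self.hold_back:   # apply strictly in sequence order
                self.applied.append(self.hold_back.pop(self.expected))
                self.expected += 1

    seq = Sequencer()
    rm = ForcedOrderRM()
    u1 = seq.assign("add subscriber A")      # sequence number 1
    u2 = seq.assign("add subscriber B")      # sequence number 2
    rm.receive(*u2)                          # arrives first: held back
    rm.receive(*u1)
    print(rm.applied)                        # ['add subscriber A', 'add subscriber B']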
33. References
- K.P. Birman, "The process group approach to reliable distributed computing," CACM, vol. 36, no. 12, pp. 36-53, 1993.
- R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat, "Providing high availability using lazy replication," ACM Trans. Computer Systems, vol. 10, no. 4, pp. 360-391, 1992.