CS556: Distributed Systems

Transcript and Presenter's Notes
1
CS-556 Distributed Systems
High Availability Replication (II)
  • Manolis Marazakis
  • maraz_at_csd.uoc.gr

2
Transactions with Replicated Data
  • Better performance
  • Concurrent service
  • Reduced latency
  • Higher availability
  • Fault tolerance
  • What if a replica fails or becomes isolated?
  • Upon rejoining, it must "catch up"
  • Replicated transaction service
  • Data replicated at a set of replica managers
  • Replication transparency
  • One-copy serializability
  • Read one, write all

Failures must be observed to have happened
before any active Txs at other servers
3
Fault Tolerance?
  • Define correctness criteria
  • When 2 replicas are separated by a network partition:
  • Both are deemed incorrect → both stop serving.
  • One (the master) continues → the other ceases service.
  • One (the master) continues to accept updates → both continue to supply reads (of possibly stale data).
  • Both continue service → subsequently synchronise.

4
Linearizability
  • Sequence of client i's operations: Oi0, Oi1, Oi2, ...
  • A single server would serialize client operations in some order
  • e.g., O10, O11, O20, O21, O12, ...
  • This is a virtual interleaving of client operations in a server with a single copy of the data
  • A replicated shared-object service is linearizable if for any execution there is an interleaving of the client operations that satisfies both:
  • The interleaved sequence of operations meets the specification of a single correct copy of the objects
  • The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution

5
Sequential consistency
  • Linearizability is hard to achieve in practice without precise clock synchronization
  • A replicated shared-object service is sequentially consistent if for any execution there is an interleaving of the client operations that satisfies both:
  • The interleaved sequence of operations meets the specification of a single correct copy of the objects
  • The order of operations in the interleaving is consistent with the program order in which each client executed them
  • ATTENTION: no total ordering between clients
  • Every linearizable service is sequentially consistent (the converse is not true)

6
Example
  • Client 1: setBalance-B(x,1), then setBalance-A(y,2)
  • Client 2: getBalance-A(y) → 0, then getBalance-A(x) → 0
  • Real-time order: setBalance-B(x,1); getBalance-A(y) → 0; getBalance-A(x) → 0; setBalance-A(y,2)
  • The real-time criterion of linearizability is not satisfied
  • Find an interleaving that satisfies both criteria for sequential consistency
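The example can be checked mechanically: the sketch below (an illustrative Python fragment, not from the slides) replays one candidate interleaving against a single correct copy and confirms that both reads return 0 while each client's program order is preserved.

```python
# Sketch: replay a candidate interleaving of the slide's example against
# a single correct copy. Operation encoding is an assumption made for
# this illustration; the interleaving shown is one that happens to work.
def replay(ops):
    store = {"x": 0, "y": 0}           # single-copy specification
    results = []
    for op in ops:
        kind, var, *val = op
        if kind == "set":
            store[var] = val[0]
        else:                           # "get"
            results.append(store[var])
    return results

# Client 2's reads first, then client 1's writes: each client's program
# order is preserved, and both reads legitimately return 0.
interleaving = [("get", "y"), ("get", "x"),
                ("set", "x", 1), ("set", "y", 2)]
print(replay(interleaving))             # -> [0, 0], matching the example
```

This interleaving satisfies sequential consistency; linearizability fails because getBalance-A(x) returns 0 even though setBalance-B(x,1) completed earlier in real time.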

7
Passive Replication (I)
  • At any time, system has single primary RM
  • One or more secondary backup RMs
  • Front ends communicate with the primary; the primary executes requests and distributes the updated data to all backups
  • If primary fails, one backup is promoted to
    primary
  • New primary starts from Coordination phase for
    each new request
  • What happens if primary crashes
    before/during/after agreement phase?

8
Passive Replication (II)
9
Passive replication (III)
  • Satisfies linearizability
  • Front end looks up the new primary when the current primary does not respond
  • Primary RM is a performance bottleneck
  • Can tolerate F failures with F+1 RMs
  • Variation: clients can access backup RMs (linearizability is lost, but clients get sequential consistency)
  • SUN NIS (yellow pages) uses passive replication: clients can contact primary or backup servers for reads, but only the primary server for updates
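As a rough illustration of the primary-backup flow described above (all class and method names are invented for this sketch):

```python
# Minimal sketch of primary-backup (passive) replication over an
# in-memory key-value state; names are illustrative only.
class ReplicaManager:
    def __init__(self):
        self.state = {}

class PrimaryBackupGroup:
    def __init__(self, n_backups):
        self.primary = ReplicaManager()
        self.backups = [ReplicaManager() for _ in range(n_backups)]

    def update(self, key, value):
        # Primary executes the request, then pushes the updated data
        # to every backup before acknowledging (agreement phase).
        self.primary.state[key] = value
        for b in self.backups:
            b.state[key] = value
        return "ok"

    def fail_over(self):
        # If the primary crashes, promote one backup; with F+1 RMs
        # the group tolerates F crash failures.
        self.primary = self.backups.pop(0)

group = PrimaryBackupGroup(n_backups=2)
group.update("A", 100)
group.fail_over()                       # primary crashes, backup takes over
print(group.primary.state["A"])         # -> 100, no committed data lost
```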

10
Active replication (I)
  • RMs are state machines with equivalent roles
  • Front ends communicate client requests to the RM group, using totally ordered reliable multicast
  • RMs process requests independently and reply to the front end (correct RMs process each request identically)
  • Front end can synthesize the final response to the client (tolerating Byzantine failures)
  • Active replication provides sequential consistency if the multicast is reliable and totally ordered
  • Byzantine failures (F out of 2F+1): the front end waits until it gets F+1 identical responses
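The F+1 voting rule can be sketched as follows; the vote-counting logic is a minimal illustration, not a complete front end.

```python
from collections import Counter

# Sketch: a front end synthesizing the final response from 2F+1 active
# replicas, at most F of which may be Byzantine. It waits for F+1
# identical replies; the values used below are purely illustrative.
def synthesize(responses, f):
    counts = Counter()
    for r in responses:
        counts[r] += 1
        if counts[r] == f + 1:    # F+1 matching replies cannot all be faulty
            return r
    return None                    # not enough agreement yet

# F = 1, so 2F+1 = 3 replicas; one of them lies.
print(synthesize([42, 99, 42], f=1))   # -> 42
```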

11
Active replication (II)
12
Replication Architectures
(diagram: client transactions issuing getBalance(A) and deposit(B) to replica managers)
  • How many replicas are required?
  • All, or a majority?
  • Forward all updates as soon as received.
  • Two-phase commit protocol.
  • Contacted replica acts as coordinator
  • What if one of the replicas isn't available?
  • Primary-copy replication
13
Available Copies Replication
(diagram: transactions issuing getBalance(A), deposit(B), deposit(A), getBalance(B) to replica managers)
  • Not all copies will always be available.
  • Failures:
  • Timeout at a failed replica
  • Rejected by a recovering, unsynchronised replica
14
Local Validation
  • Failure and recovery events must not occur during a Tx.
  • Example:
  • T reads A before server X's failure, therefore T → failX
  • T observes server N's failure when it writes B, therefore failN → T
  • failN → T.getBalance(A) → T.deposit(B) → failX
  • failX → U.getBalance(B) → U.deposit(A) → failN

Server X fails, followed by transaction U, which is
followed by server N's failure, which is followed by
transaction T, which is followed by server X's
failure. This ordering is cyclic and therefore
inconsistent, so the transactions must not both be
allowed to commit.
Failure and recovery must be serialised just like
a Tx: they occur before or after a Tx, but
not during.
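One way to mechanise local validation is to build a precedence graph over transactions and failure events and reject commits when it contains a cycle; the sketch below (illustrative names only) detects exactly the cycle in the example above.

```python
# Sketch: local validation as a precedence check. Each transaction's
# observations impose an order between failure events and the
# transaction; a cycle means the orders are inconsistent, so abort.
def has_cycle(edges):
    # Tiny DFS cycle detector over an adjacency-list graph.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    def dfs(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and dfs(w)):
                return True
        color[v] = BLACK
        return False
    return any(color[v] == WHITE and dfs(v) for v in graph)

# T implies failN -> T -> failX; U implies failX -> U -> failN: a cycle.
edges = [("failN", "T"), ("T", "failX"), ("failX", "U"), ("U", "failN")]
print(has_cycle(edges))          # -> True, so T and U must not both commit
```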
15
Network Partitions
  • Separate but viable groups of servers
  • Optimistic schemes validate on recovery
  • Available copies with validation
  • Pessimistic schemes limit availability until
    recovery

16
Fault Tolerance
  • Design to recover after a failure w/o loss of
    (committed) data.
  • Designs for fault tolerance
  • Single server, fail and recover
  • Primary server with trailing backups
  • Replicated service

17
Ordered Multicast
  • FIFO ordering If a correct process issues
    multicast(g, m1) followed by multicast(g,m2) then
    every correct process that delivers m2 will
    deliver m1 before m2.
  • Causal ordering If multicast(g, m1) happened
    before multicast(g, m2) then any correct process
    that delivers m2 will deliver m1 before m2.
  • Total ordering If a correct process delivers m1
    before it delivers m2, then any other correct
    process that delivers m2 will deliver m1 before
    m2.
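A common way to implement the FIFO condition is per-sender sequence numbers with a hold-back queue; the sketch below is a minimal, single-process illustration (class and method names are assumptions, not from the slides).

```python
# Sketch: FIFO delivery via per-sender sequence numbers and a hold-back
# queue, one standard way to meet the FIFO-ordering condition above.
class FifoReceiver:
    def __init__(self):
        self.expected = {}     # sender -> next sequence number to deliver
        self.holdback = {}     # sender -> {seq: message}
        self.delivered = []

    def receive(self, sender, seq, msg):
        self.holdback.setdefault(sender, {})[seq] = msg
        nxt = self.expected.get(sender, 0)
        # Deliver every consecutively numbered message now available.
        while nxt in self.holdback[sender]:
            self.delivered.append(self.holdback[sender].pop(nxt))
            nxt += 1
        self.expected[sender] = nxt

r = FifoReceiver()
r.receive("p", 1, "m2")        # arrives out of order: held back
r.receive("p", 0, "m1")        # now both can be delivered, in FIFO order
print(r.delivered)             # -> ['m1', 'm2']
```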

18
Total Causal Ordering
C3 carries no happened-before indication, so it is
delivered in different orders at P2, P3, P4
19
Synch Ordering
All replicas process a given request either
before the synch request or after it.
- Essentially, a synch request serves to
flush the system.
20
Causal Ordering - Vector Timestamps
  • Lamport's logical clock does not capture causality
  • Vector timestamp: one entry per process, counting the events seen from each source
  • FE keeps the timestamp of the RM from its last read
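The basic vector-timestamp operations can be sketched as component-wise maximum (merge) and component-wise comparison; the variable names below are illustrative.

```python
# Sketch: vector-timestamp operations used by front ends and RMs;
# merge and leq are the usual component-wise definitions.
def merge(a, b):
    # Component-wise maximum: the combined knowledge of two timestamps.
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    # a <= b iff every component of a is <= the matching one in b.
    return all(x <= y for x, y in zip(a, b))

fe_prev = [2, 0, 1]            # what the front end has seen so far
rm_new  = [2, 3, 1]            # timestamp returned by the RM
fe_prev = merge(fe_prev, rm_new)
print(fe_prev, leq([2, 0, 1], fe_prev))   # -> [2, 3, 1] True
```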

21
Group membership service
For each group, this service delivers to any
member process a series of views.
22
View delivery constraints
  • Order
  • If p delivers v(g) and then v'(g), then no other process delivers v'(g) before v(g)
  • Integrity
  • If p delivers v(g), then p ∈ v(g)
  • Non-triviality
  • If q joins a group and remains reachable from p, then eventually q is always in the views that p delivers
  • If a group is partitioned, then eventually the views delivered in any one partition will exclude the processes in another partition

23
View-synchronous guarantees (I)
  • Agreement
  • If p delivers message m in view v(g) and then delivers v'(g), then all processes that survive to deliver v'(g) also deliver m in view v(g)
  • Integrity
  • If p delivers message m, then it will not deliver m again
  • Validity
  • Correct processes always deliver the messages that they send
  • If the system fails to deliver a message to any process q, then it notifies the surviving processes by delivering a new view, with q excluded, immediately after the view in which any of them delivered the message

24
View-synchronous guarantees (II)
p sends a message while in view (p, q, r), then crashes
25
View-synchronous guarantees (III)
  • State transfer to a new group member
  • Delivery of the first view containing the new process
  • Group representative captures its state
  • Sends state to the new member (one-to-one)
  • Suspends execution
  • All (previous) group members suspend their execution as well
  • New member delivers the state
  • Integrates the new state
  • Multicast a "Commence" message to the group
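The transfer sequence above can be sketched roughly as follows (a toy in-memory model; all names are invented for illustration):

```python
# Sketch: state transfer to a joining member. The representative
# captures a snapshot, everyone suspends, the snapshot is handed over
# one-to-one, and "Commence" resumes the whole group.
class Member:
    def __init__(self, state=None):
        self.state = dict(state or {})
        self.suspended = False

def state_transfer(group, new_member):
    representative = group[0]
    for m in group:                       # all previous members suspend
        m.suspended = True
    snapshot = dict(representative.state)  # representative captures state
    new_member.state = snapshot            # one-to-one transfer
    group.append(new_member)               # new member integrates the state
    for m in group:                        # "Commence": everyone resumes
        m.suspended = False

group = [Member({"A": 100}), Member({"A": 100})]
joiner = Member()
state_transfer(group, joiner)
print(joiner.state, len(group))            # -> {'A': 100} 3
```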

26
The gossip architecture (I)
  • Replicate data close to points where groups of
    clients need it
  • Periodic exchange of messages among RMs
  • Front ends send queries and updates to any RM they choose
  • Any RM that is available can provide acceptable response times
  • Consistent service over time
  • Relaxed consistency between replicas

27
The gossip architecture (II)
  • Causal update ordering
  • Forced ordering
  • Causal + total
  • A forced-order update and a causal-order update that are related by the happened-before relation may be applied in different orders at different RMs!
  • Immediate ordering
  • Updates are applied in a consistent order relative to any other update at all RMs

28
The gossip architecture (III)
  • Bulletin-board application example
  • Posting items → causal order
  • Adding a subscriber → forced order
  • Removing a subscriber → immediate order
  • Gossip messages exchange updates among RMs
  • Front ends maintain a prev vector timestamp
  • One entry per RM
  • RMs respond with a new vector timestamp

29
State components of a gossip RM
30
Query operations in gossip
  • RM must return a value that is at least as recent as the request's timestamp
  • q.prev ≤ valueTS
  • List of pending query operations
  • Held back until the above condition is satisfied
  • RM can wait for missing updates
  • or request updates from the RMs concerned
  • RM's response includes valueTS
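A minimal sketch of the hold-back test, assuming vector timestamps represented as plain lists (function names are illustrative):

```python
# Sketch: a gossip RM holding back a query until q.prev <= valueTS,
# i.e. the replica's value reflects every update the client has seen.
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

def try_query(q_prev, value_ts, value):
    if leq(q_prev, value_ts):
        # Safe to answer; the response carries valueTS for the front end
        # to merge into its own prev timestamp.
        return value, value_ts
    return None                 # pend until the missing updates arrive

print(try_query([1, 0], [2, 1], "balance=5"))   # -> ('balance=5', [2, 1])
print(try_query([3, 0], [2, 1], "balance=5"))   # -> None: RM must catch up
```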

31
Updates in causal order
  • RM i checks whether the operation ID is in its executed table or in its log
  • Discard the update if it has already been seen
  • Increment the i-th element of the replica timestamp
  • Count of updates received from front ends
  • Assign a vector timestamp ts to the update
  • Replace the i-th element of u.prev by the i-th element of the replica timestamp
  • Insert a log entry:
  • ⟨i, ts, u.op, u.prev, u.id⟩
  • Stability condition: u.prev ≤ valueTS
  • All updates on which u depends have been applied
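The steps above can be sketched as a toy RM; the data structures (executed table, log, replica and value timestamps) follow the slides, but the code itself is an illustrative assumption:

```python
# Sketch of the causal-update steps at RM i, with illustrative
# data structures (executed table, log, replica/value timestamps).
class GossipRM:
    def __init__(self, i, n):
        self.i = i
        self.replica_ts = [0] * n    # updates received from front ends
        self.value_ts = [0] * n      # updates applied to the value
        self.executed = set()        # operation IDs already applied
        self.log = []

    def receive_update(self, u_id, u_prev, op):
        if u_id in self.executed:                # duplicate: discard
            return None
        self.replica_ts[self.i] += 1             # count this update
        ts = list(u_prev)
        ts[self.i] = self.replica_ts[self.i]     # assign a unique ts
        self.log.append((self.i, ts, op, u_prev, u_id))
        return ts

    def apply_stable(self):
        # Stability: u.prev <= valueTS, i.e. all causal predecessors applied.
        for entry in list(self.log):
            _, ts, op, u_prev, u_id = entry
            if all(p <= v for p, v in zip(u_prev, self.value_ts)):
                self.value_ts = [max(a, b) for a, b in zip(self.value_ts, ts)]
                self.executed.add(u_id)
                self.log.remove(entry)

rm = GossipRM(i=0, n=2)
ts = rm.receive_update("u1", [0, 0], "deposit(A,10)")
rm.apply_stable()
print(ts, rm.value_ts)           # -> [1, 0] [1, 0]
```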

32
Forced immediate order
  • A unique sequence number is appended to update timestamps
  • Primary RM acts as sequencer
  • Another RM can be elected to take over consistently as sequencer
  • A majority of RMs (including the primary) must record which update is next in the sequence
  • Immediate ordering: the primary orders these updates in the sequence along with forced updates, taking causal updates into account as well
  • Agreement protocol on the sequence

33
References
  • K.P. Birman, "The process group approach to reliable distributed computing", CACM, vol. 36, no. 12, pp. 36-53, 1993.
  • R. Ladin, B. Liskov, L. Shrira and S. Ghemawat, "Providing High Availability Using Lazy Replication", ACM Trans. Computer Systems, vol. 10, no. 4, pp. 360-391, 1992.