Formal Models for Distributed Negotiations: Commit Protocols

XVII Escuela de Ciencias Informaticas (ECI 2003), Buenos Aires, July 21-26 2003 ... Commits have to be coordinated among participants to preserve data consistency ...



1
Formal Models for Distributed Negotiations
Commit Protocols
XVII Escuela de Ciencias Informaticas (ECI 2003),
Buenos Aires, July 21-26 2003
Roberto Bruni, Dipartimento di Informatica,
Università di Pisa
2
Distributed Databases
  • Data can be inherently distributed
      • e.g. customers' accounts in different branches of the same bank
  • Data are distributed to achieve failure independence
      • e.g. replicated file systems
  • Partial failures can lead to inconsistent results
  • Commits have to be coordinated among participants to preserve data consistency

3
Distributed Databases
[Figure: users accessing a database, centralized vs. distributed]
4
Atomic Commitment Problem
  • Reach a globally consistent state despite failures
  • Each participant has two possible decision values
      • commit: all participants will make the transaction's updates permanent
      • abort: all participants will roll back
  • Individual decisions are irreversible
  • A commit decision requires unanimity of YES votes

5
Atomic Commitment Properties
  • Consensus
      • All participants that decide reach the same decision
      • If any participant decides commit, then all participants must have voted YES
      • If all participants have voted YES and no failures occur, then commit is decided
  • Irreversibility
      • Each participant decides at most once
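The properties can be checked mechanically on a finished run. A minimal sketch, assuming a run is summarized by each participant's vote plus the list of decisions it took (a representation chosen here, not taken from the slides); the property labels in the output are likewise hypothetical:

```python
def check_ac_properties(votes: dict[str, str],
                        decisions: dict[str, list[str]],
                        failures_occurred: bool) -> list[str]:
    """Return the labels of the atomic commitment properties a run violates.

    votes:     participant -> "YES" or "NO"
    decisions: participant -> decisions it took, in order ([] if undecided)
    """
    violations = []
    # Irreversibility: each participant decides at most once
    if any(len(ds) > 1 for ds in decisions.values()):
        violations.append("irreversibility")
    taken = {d for ds in decisions.values() for d in ds}
    # All participants that decide reach the same decision
    if len(taken) > 1:
        violations.append("agreement")
    # Commit only if every participant voted YES
    if "commit" in taken and any(v != "YES" for v in votes.values()):
        violations.append("commit-needs-all-yes")
    # All YES and no failures: every participant must decide commit
    all_yes = all(v == "YES" for v in votes.values())
    if all_yes and not failures_occurred and any(
            decisions.get(p) != ["commit"] for p in votes):
        violations.append("commit-if-all-yes-and-no-failures")
    return violations


votes = {"p": "YES", "q": "NO"}
print(check_ac_properties(votes, {"p": ["commit"], "q": ["abort"]}, False))
# ['agreement', 'commit-needs-all-yes']
```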

6
Commitment Protocols
  • Atomic commitment protocol (ACP)
      • satisfies all atomic commitment properties
      • ensures that transactions terminate consistently at all participating sites of a distributed database, even in the presence of failures
  • Non-blocking: an ACP is non-blocking if it permits transaction termination to proceed at correct participants despite failures of others
  • is the activity of ensuring that Sw and Hw
    failures do not corrupt persistent data
  • can limit time intervals of resource locking

7
Some Assumptions
  • One of the participants acts as unique coordinator (centralized version)
      • At most one coordinator (if there are no failures, then there is exactly one)
  • A participant assumes the role of coordinator within a fixed time interval from the beginning of the transaction
  • The transaction begins at a single participant, called the invoker
      • it sends start messages to the other participants
  • Only undeliverable messages are dropped
  • All participants can communicate (useful later)

8
Generic ACP Coordinator
  // Phase 1
  send VOTE-REQ[Tid] to all participants
  set-timeout
  wait-for vote[Tid] from all participants
    // Phase 2
    if (all votes are YES) then
      broadcast(commit[Tid], participants)
    else  // at least one vote is NO
      broadcast(abort[Tid], participants)
  on-timeout  // escape blocking wait-for
    broadcast(abort[Tid], participants)
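As a simplified sketch, the coordinator can be written as a pure function, assuming (hypothetically) that vote arrival times are known in advance instead of coming from real network I/O. It returns the decision that would then be broadcast:

```python
def coordinator_decision(participants: list[str],
                         vote_arrivals: dict[str, tuple[str, float]],
                         timeout: float) -> str:
    """Decision of the generic ACP coordinator for one transaction.

    vote_arrivals: participant -> (vote, arrival time after VOTE-REQ was sent);
    a participant whose vote never arrives is simply absent.
    """
    # Phase 1: only the votes that arrive before the timeout count
    votes = {p: v for p, (v, t) in vote_arrivals.items() if t <= timeout}
    if not all(p in votes for p in participants):
        return "abort"   # on-timeout: escape the blocking wait-for
    # Phase 2: the decision that is broadcast to all participants
    if all(votes[p] == "YES" for p in participants):
        return "commit"
    return "abort"       # at least one vote is NO
```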
9
Generic ACP Participants
  set-timeout
  wait-for VOTE-REQ[Tid] from coordinator  // 1
    send vote[Tid] to coordinator
    if (vote = NO) then  // unilateral abort
      decide abort
    else
      set-timeout
      wait-for decision from coordinator  // 2
        if (decision = abort) then decide abort
        else decide commit
      on-timeout termination-protocol  // escape 2
  on-timeout decide abort  // escape 1
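The participant side can be sketched the same way. This assumes, for illustration only, that each message is described by its arrival time, that the vote is sent as soon as VOTE-REQ arrives, and that the termination protocol is a caller-supplied function that may return None when it cannot reach a decision:

```python
from typing import Callable, Optional


def participant_decision(
        vote_req_time: Optional[float],
        my_vote: str,
        decision: Optional[tuple[str, float]],
        timeout1: float,
        timeout2: float,
        termination_protocol: Callable[[], Optional[str]]) -> Optional[str]:
    """Decision of a generic ACP participant (None means still blocked).

    vote_req_time: arrival time of VOTE-REQ, or None if it never arrives
    decision:      (decision, arrival time) from the coordinator, or None
    """
    # wait-for VOTE-REQ (1)
    if vote_req_time is None or vote_req_time > timeout1:
        return "abort"                    # escape 1: decide abort
    # the vote is assumed to be sent as soon as VOTE-REQ arrives
    if my_vote == "NO":
        return "abort"                    # unilateral abort
    # wait-for decision (2), with a second timeout set after voting YES
    if decision is not None and decision[1] <= vote_req_time + timeout2:
        return decision[0]
    return termination_protocol()         # escape 2: uncertain, ask the others
```

Passing `lambda: "abort"` as `termination_protocol` reproduces escape 2 of Non-Blocking ACP I below; that shortcut relies on the relaying broadcast of Non-Blocking ACP II.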

10
Simple Broadcast
  broadcast(m, S)
    // Broadcaster
    send m to all processes in S
    deliver m
    // other processes in S
    upon-receipt m  // non-blocking
      deliver m
  • The generic ACP combined with this simple broadcast corresponds to the 2PC protocol
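A sketch of why simple broadcast is fragile, assuming a hypothetical `crash_after` parameter that makes the broadcaster crash part way through its sends:

```python
from typing import Optional


def simple_broadcast(m: str, broadcaster: str, others: list[str],
                     crash_after: Optional[int] = None) -> dict[str, str]:
    """Return process -> delivered message for one simple broadcast.

    crash_after (hypothetical): the broadcaster crashes after sending m to
    that many processes in `others`, before delivering m itself.
    """
    delivered: dict[str, str] = {}
    for i, q in enumerate(others):
        if crash_after is not None and i >= crash_after:
            return delivered        # the remaining processes never receive m
        delivered[q] = m            # q: upon-receipt m -> deliver m
    if crash_after is None:
        delivered[broadcaster] = m  # broadcaster delivers after sending to all
    return delivered
```

With `crash_after=1` only p delivers commit; if p then crashes as well, q and r are left uncertain, which is the blocking scenario of the termination protocol slide.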

11
Timeout Actions
  • Participants must wait for VOTE_REQ from the coordinator
      • If this takes too long, a participant can simply decide abort
  • The coordinator collects votes
      • No global decision has been made yet
      • The coordinator can decide abort
  • Participants must wait for commit / abort from the coordinator
      • The participant has already voted YES
      • It is now uncertain
      • It must consult other participants according to the termination protocol

12
Termination Protocol (TP)
  • What if a participant that voted YES times out waiting for the response from the coordinator?
  • It invokes a termination protocol to contact
      • the coordinator
      • other participants (cooperative TP), which may or may not have voted already
  • There are failure scenarios for which no termination protocol can lead to a decision
  • Blocking scenario: correct participants cannot decide
      • e.g. the coordinator crashes during the broadcast
      • all faulty participants deliver the decision and crash
      • no correct participant delivers the decision
      • if the faulty participants do not recover, any decision taken by the correct ones could contradict the decision of a participant that crashed

13
Non-Blocking ACP I
  set-timeout
  wait-for VOTE-REQ[Tid] from coordinator  // 1
    send vote[Tid] to coordinator
    if (vote = NO) then  // unilateral abort
      decide abort
    else
      set-timeout
      wait-for decision from coordinator  // 2
        if (decision = abort) then decide abort
        else decide commit
      on-timeout decide abort  // escape 2
  on-timeout decide abort  // escape 1

14
Non-Blocking ACP II
  broadcast(m, S)
    // Broadcaster: as before
    // other processes in S
    upon-first-receipt m
      send m to all processes in S  // S can be sent along with VOTE_REQ
      deliver m
  • Any process receiving m relays m to all others
      • if any correct process receives m, all correct processes receive m, even if the broadcaster crashes
  • m is delivered only after relaying
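The relaying broadcast can be simulated with a queue of messages in transit. A sketch under stated assumptions: `crash_after` (hypothetical) makes the broadcaster crash after its first few sends, processes in `down` receive nothing, and a relayer crashing halfway through its own relay is not modeled:

```python
from collections import deque
from typing import Optional


def relay_broadcast(m: str, broadcaster: str, others: list[str],
                    crash_after: Optional[int] = None,
                    down: frozenset[str] = frozenset()) -> dict[str, str]:
    """Return process -> delivered message for the relaying broadcast."""
    delivered: dict[str, str] = {}
    received: set[str] = set()
    first_sends = others if crash_after is None else others[:crash_after]
    in_transit = deque(first_sends)
    while in_transit:
        q = in_transit.popleft()
        if q in down or q in received:
            continue                     # only the first receipt matters
        received.add(q)
        # relay to everyone else in S first (the broadcaster is skipped: it
        # has either delivered m already or crashed)
        in_transit.extend(p for p in others if p != q)
        delivered[q] = m                 # deliver only after relaying
    if crash_after is None:
        delivered[broadcaster] = m
    return delivered
```

Unlike the simple broadcast, a single message that escapes the crashing broadcaster is enough for every process that is up to deliver m, and if no message escapes, nobody delivers it.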

15
Recovery
  • Participant p is recovering from a failure
      • It must reach a consistent decision
  • Suppose p remembers its state at the time it failed
  • Before voting
      • it can unilaterally abort
  • After deciding abort
      • it can unilaterally abort
  • After receiving commit / abort from the coordinator
      • it had already decided and must behave accordingly
  • During the uncertainty period (voted YES)
      • Independent recovery is not possible!
      • The termination protocol is needed

16
Distributed Transaction Log
  • The DTL is kept in stable storage at each site
      • Its content must survive failures
  • Coordinators and participants at that site can record information about transactions
      • Before/after sending VOTE_REQ, the coordinator C writes start2PC(S,Tid)
      • Before voting YES, a participant writes yes(C,S,Tid)
      • Before/after voting NO, a participant writes abort(Tid)
      • Before C sends commit, it writes commit(Tid)
      • Before/after C sends abort, it writes abort(Tid)
      • After receiving the decision, a participant writes commit(Tid) or abort(Tid)

17
Recovery From DTL
  • If the DTL contains start2PC (the site hosted the coordinator)
      • If it also contains commit / abort: the coordinator decided before the failure
      • Otherwise: the coordinator can decide abort (and record it in the DTL)
  • Otherwise, if the DTL
      • contains commit / abort: the participant reached a decision before the failure
      • does not contain yes: the participant either failed before voting or voted NO, so it can unilaterally abort
      • contains yes but not commit / abort: the participant failed in its uncertainty period and must use the termination protocol
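The case analysis above translates into a small decision function. A minimal sketch, assuming the DTL for one transaction is given as a list of record kinds (`"start2PC"`, `"yes"`, `"commit"`, `"abort"`) with their arguments dropped:

```python
def recover(dtl: list[str]) -> str:
    """Action of a site recovering from a failure, for one transaction.

    dtl: kinds of the records found for Tid, e.g. ["start2PC"] or ["yes"]
    (the record arguments such as S, C and Tid are omitted here).
    """
    decided = "commit" in dtl or "abort" in dtl
    if "start2PC" in dtl:                 # the site hosted the coordinator
        if decided:
            return "commit" if "commit" in dtl else "abort"  # decided before failing
        dtl.append("abort")               # decide abort and record it in the DTL
        return "abort"
    if decided:
        return "commit" if "commit" in dtl else "abort"      # decided before failing
    if "yes" not in dtl:
        return "abort"                    # failed before voting or voted NO
    return "termination-protocol"         # failed in its uncertainty period
```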

18
Cooperative TP Initiator
  send DECISION_REQ[Tid] to all processes in S
  wait-for decision[Tid] from any process
    if (decision = commit) then
      write commit in DTL
    else  // decision = abort
      write abort in DTL

19
Cooperative TP Responder
  wait-for DECISION_REQ[Tid] from any process p
    if (abort(Tid) in DTL) then
      send abort to p
    else if (commit(Tid) in DTL) then
      send commit to p
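Initiator and responder can be sketched together. This assumes, for illustration, that each DTL is a set of record kinds and that replies arrive in the order of the `others` list; a responder with no recorded decision stays silent, as on the slide:

```python
from typing import Optional


def respond(dtl: set[str]) -> Optional[str]:
    """Cooperative TP responder: reply only if Tid's decision is in its DTL."""
    if "abort" in dtl:
        return "abort"
    if "commit" in dtl:
        return "commit"
    return None                  # no decision recorded: no reply


def initiate(my_dtl: set[str], others: list[set[str]]) -> Optional[str]:
    """Cooperative TP initiator; replies are assumed to arrive in list order."""
    for dtl in others:           # DECISION_REQ has been sent to all of them
        reply = respond(dtl)
        if reply is not None:    # the first decision from any process wins
            my_dtl.add(reply)    # write commit / abort in the DTL
            return reply
    return None                  # nobody knows the decision: still blocked
```

If every other process is uncertain, `initiate` returns None: the initiator is blocked, which is exactly the situation in which no termination protocol can help.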

20
Evaluation of 2PC
  • Criteria: reliability vs. efficiency
  • Resiliency
      • What failures can be tolerated?
  • Blocking
      • Can processes be blocked?
      • Under which conditions?
  • Time complexity
      • How long does it take to reach a decision?
  • Message complexity
      • How many messages are exchanged to reach a decision?
      • What are their sizes?

21
Balancing
  • Reliability and efficiency are conflicting goals
      • each can be achieved at the expense of the other
  • The choice of protocol depends on which goal is more important for a specific application
  • Whatever protocol is chosen, we should optimize for the case with no failures
      • hopefully the normal operating state of the system

22
Measuring Time Complexity
  • A round is the maximum time for a message to reach its destination
      • Timeouts are based on the assumption that such a delay is known
      • Note that many messages can be sent in a single round
      • Two messages belong to different rounds iff one cannot be sent before the other is received
  • Rounds are taken as time units
      • We count the number of rounds needed for unblocked sites to reach a decision, in the worst case
  • This neglects the time needed to process messages
      • Reasonable, since message delays usually exceed processing delays
      • Two other factors can be relevant: DTL management (on stable storage) and broadcasting preparation (to a large number of processes)

23
Measuring Message Complexity
  • Number of messages sent during the whole protocol
      • A reasonable measure if individual messages are not very large
      • Otherwise we should measure the length of messages, not merely their number
  • Here messages are short, so we abstract away from their lengths

24
Reliability of 2PC
  • Resiliency
      • 2PC is resilient to site failures and communication failures
      • In fact, the cause of timeouts is not important
  • Blocking
      • 2PC is subject to blocking
      • A probabilistic analysis can be performed, depending on the probability distribution of failures

25
Time Complexity of 2PC
  • In the absence of failures, 2PC requires 3 rounds
      • Broadcast VOTE-REQ
      • Collect votes
      • Broadcast the global decision
  • If failures happen, the TP may need 2 additional rounds
      • Broadcast DECISION_REQ
      • Reply from a process outside its uncertainty period
      • Note that several TPs can be initiated separately in the same round
  • Up to 5 rounds, independently of the number of failures!
      • But processes may remain blocked for an unbounded period of time

26
Message Complexity of 2PC
  • Let N+1 be the number of participants, including the coordinator
  • In each round of 2PC, N messages are sent
      • Hence, in the absence of failures 2PC uses 3N messages
  • The cooperative TP is invoked by all participants that voted YES but did not receive commit / abort
      • Let there be M such participants
      • M initiators, each sending N DECISION_REQ messages (MN messages)
      • At most N-M+1 processes will respond to the first request
      • In the worst case, each reply lets only one more process (the initiator that received it) abandon its uncertainty and respond to the next initiator: (N-M+1) + (N-M+2) + ... + N replies

27
Calculating the Message Complexity of 2PC
  • In the worst case the total number of TP messages will be
    $$NM + \sum_{i=1}^{M} (N-M+i) = NM + NM - M^2 + \frac{M(M+1)}{2} = 2NM - \frac{M^2}{2} + \frac{M}{2}$$
  • This quantity is maximal when M = N, giving
    $$\frac{N(3N+1)}{2} \text{ messages}$$
  • 2PC together with the worst-case TP amounts to
    $$3N + \frac{N(3N+1)}{2} = \frac{N(3N+7)}{2} \text{ messages}$$
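The derivation can be cross-checked numerically. A small sketch that counts messages with the argument of the previous slide and compares the counts with the closed forms above (function names are hypothetical):

```python
def tp_messages_worst_case(n: int, m: int) -> int:
    """Worst-case cooperative TP messages: n+1 processes, m initiators."""
    requests = m * n                                   # each initiator sends n DECISION_REQ
    replies = sum(n - m + i for i in range(1, m + 1))  # (N-M+1) + ... + N
    return requests + replies


def total_worst_case(n: int) -> int:
    """Failure-free 2PC (3N) plus the worst-case TP, reached at m = n."""
    return 3 * n + tp_messages_worst_case(n, n)


# Cross-check the closed forms (integer arithmetic: all numerators are even)
for n in range(1, 30):
    for m in range(n + 1):
        assert tp_messages_worst_case(n, m) == (4 * n * m - m * m + m) // 2
    assert max(tp_messages_worst_case(n, m) for m in range(n + 1)) == n * (3 * n + 1) // 2
    assert total_worst_case(n) == n * (3 * n + 7) // 2

print(total_worst_case(10))  # 185
```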
28
Communication Topology
  • The communication topology of a protocol is the specification of who sends messages to whom
      • e.g. in 2PC without TP, the coordinator sends messages to participants and vice versa
      • Participants do not send messages directly to each other
  • The topology is described as a tree of height 1
[Figure: the coordinator at the root of a tree of height 1, with one edge to each participant]
29
Alternative 2PCs
  • To reduce the time and message complexity of centralized 2PC, two variations have been proposed, based on different communication topologies
  • Decentralized 2PC
      • Communication topology is a complete graph
      • Improves time complexity
  • Linear 2PC (aka Nested 2PC)
      • Linearly ordered processes
      • Reduces the number of messages

30
Decentralized 2PC
  • Depending on its own vote, the coordinator sends YES or NO to all participants
      • This informs them that it is time to vote
      • and tells them the coordinator's vote
  • If the message is NO
      • Each participant decides abort and stops
  • Otherwise, each participant sends its vote to ALL OTHER PARTICIPANTS
  • After receiving all votes, each process can decide autonomously
      • If all are YES and its own vote is YES, it decides commit
      • Otherwise it decides abort
  • Timeouts can be employed as in centralized 2PC

31
Evaluation of Decentralized 2PC
  • In the absence of failures, only 2 rounds are necessary
      • Coordinator voting YES / NO
      • Each participant voting YES / NO
  • More messages are needed: N² + N
      • N messages in the first round
      • N² messages in the second round
      • (and this is just in the absence of failures)
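A sketch of a failure-free run, assuming each participant's second-round vote goes to the N-1 other participants and to the coordinator, which is how the N² figure arises (function and parameter names are hypothetical):

```python
def decentralized_2pc(coordinator_vote: str,
                      votes: dict[str, str]) -> tuple[str, int, int]:
    """Failure-free decentralized 2PC with N = len(votes) participants.

    Returns (common outcome, messages sent, rounds used).
    """
    n = len(votes)
    messages, rounds = n, 1                # round 1: coordinator's YES / NO to all
    if coordinator_vote == "NO":
        return "abort", messages, rounds   # everybody decides abort and stops
    messages += n * n                      # round 2: each participant -> n other processes
    rounds += 1
    # every process sees the same set of votes, so all decide the same way
    all_yes = all(v == "YES" for v in votes.values())
    return ("commit" if all_yes else "abort"), messages, rounds
```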

32
Linear 2PC
  • Each participant can communicate only with its left / right neighbors
  • The coordinator is the leftmost process
      • It sends its vote YES / NO to its right neighbor
      • This message has a dual meaning, as in decentralized 2PC
  • Each participant p waits for the vote from its left neighbor
      • If it is YES and p votes YES, then p tells YES to its right neighbor
      • Otherwise, p tells NO to its right neighbor
  • When the rightmost participant receives the vote, it makes the final decision commit / abort
      • The decision is propagated from right to left
      • When the coordinator receives it, the protocol ends
  • Timeout periods depend on the position of each process in the chain

33
Evaluation of Linear 2PC
  • Only 2N messages are needed
      • N votes from left to right
      • N decisions from right to left
      • (and this is just in the absence of failures)
  • Unfortunately, the same number of rounds is needed: 2N
      • No two messages are sent concurrently
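A failure-free run of linear 2PC can be sketched as a left-to-right fold over the votes; the function and parameter names are hypothetical:

```python
def linear_2pc(coordinator_vote: str, votes: list[str]) -> tuple[str, int, int]:
    """Failure-free linear 2PC; `votes` lists the N participants left to right.

    Returns (outcome, messages sent, rounds used).
    """
    carried = coordinator_vote          # sent rightwards by the leftmost process
    for vote in votes:                  # each participant combines its own vote
        carried = "YES" if carried == "YES" and vote == "YES" else "NO"
    outcome = "commit" if carried == "YES" else "abort"  # rightmost decides
    n = len(votes)
    messages = n + n                    # N votes rightwards, N decisions leftwards
    rounds = messages                   # strictly sequential: one message per round
    return outcome, messages, rounds
```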

34
Comparison of 2PC Variants
  • Hybrid communication topologies are also possible
      • e.g. linear for voting, complete for conveying the decision
      • 2N messages, N+1 rounds
  • The choice of the protocol might be influenced by the available communication topology
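Collecting the failure-free figures stated so far in one place makes the trade-off easy to tabulate. A small sketch; the dictionary keys are labels chosen here:

```python
def failure_free_costs(n: int) -> dict[str, tuple[int, int]]:
    """(messages, rounds) without failures, for N = n participants plus a coordinator."""
    return {
        "centralized 2PC": (3 * n, 3),
        "decentralized 2PC": (n * n + n, 2),
        "linear 2PC": (2 * n, 2 * n),
        "hybrid 2PC": (2 * n, n + 1),   # linear voting, complete decision
    }


for name, (msgs, rounds) in failure_free_costs(10).items():
    print(f"{name:18} {msgs:4} messages {rounds:3} rounds")
```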

35
From 2PC to 3PC
  • In 2PC, if all operational participants are uncertain, they are blocked
      • They cannot decide abort even if they are aware that the processes they cannot communicate with have failed, because some of those could have decided commit before failing
  • 3PC is an ACP designed to rule out this situation
      • It guarantees that if any operational process is uncertain, then no process (operational or failed) can have decided commit
      • Thus, if p realizes that some operational process is uncertain, then p can decide abort
  • Why does 2PC violate this property?
      • A participant p can receive commit while q is still uncertain

36
Sketch of 3PC: The Idea
  • After the coordinator has found that all votes were YES, it sends pre-commit messages to all participants
  • When a participant p receives pre-commit, it knows that all participants voted YES
      • p is no longer uncertain, but does not decide commit yet
      • p knows that it will decide commit unless it fails
      • p acknowledges the receipt of pre-commit
  • When the coordinator has collected all acks, it knows that no participant is uncertain
      • The coordinator sends commit to all participants
      • When a participant receives commit, it decides commit
  • If a participant voted NO, then 3PC behaves as 2PC
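A failure-free 3PC run can be sketched as a sequence of rounds. This sketch assumes the coordinator votes YES and models neither timeouts nor the termination and election protocols; the function name is hypothetical:

```python
def three_pc(votes: dict[str, str]) -> tuple[dict[str, str], list[tuple[str, int]]]:
    """Failure-free 3PC run (the coordinator is assumed to vote YES).

    Returns (final state of each participant, [(message kind, count) per round]).
    """
    n = len(votes)
    rounds = [("VOTE-REQ", n), ("vote", n)]
    if any(v == "NO" for v in votes.values()):
        rounds.append(("abort", n))           # a NO vote: 3PC behaves as 2PC
        return {p: "Aborted" for p in votes}, rounds
    # all votes are YES; every participant is now Uncertain
    rounds.append(("pre-commit", n))          # participants become Committable
    rounds.append(("ack", n))                 # coordinator: nobody is uncertain
    rounds.append(("commit", n))              # participants decide commit
    return {p: "Committed" for p in votes}, rounds
```

The five rounds and 5N messages of the all-YES run match the figures given for 3PC in the absence of failures.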

37
Sketch of 3PC: Some Notes
  • In the absence of failures, 3PC involves 5 rounds and up to 5N messages
  • Participants have four possible states
      • Aborted, Uncertain, Committable, Committed
      • For any two participants p and q, only certain combinations of their states are possible
  • Timeouts can occur in five situations
      • 3 are trivially handled
      • 2 require a complex termination protocol
  • Election protocol (for a new coordinator) based on a linear ordering of participants
      • The new coordinator checks the states of all operational participants
      • Timeouts are again necessary

38
Recap
  • We have seen
      • the Atomic Commitment Problem
      • several ACP protocols
          • Generic ACP
          • Centralized 2PC (a good middle ground)
          • Non-Blocking ACP
          • Decentralized 2PC (OK if end-to-end delays must be minimized)
          • Linear 2PC (OK if messages are expensive)
          • 3PC (sketched)
  • We learned some criteria to evaluate and compare protocols
      • usually also dependent on the communication topology
