Title: Formal Models for Distributed Negotiations: Transactions
1Models and Languages for Coordination and Orchestration
IMT - Institutions Markets Technologies - Alti Studi Lucca
Nominal Calculi for Transactions: ZSN in JOIN II
Roberto Bruni, Dipartimento di Informatica, Università di Pisa
2Distributed 2PC
- The distributed 2PC is a variant of the decentralized 2PC with a finite but unknown number of participants
- When a participant P is ready to commit, it has only partial knowledge of the whole set of participants
- Only those who directly cooperated with P
- To commit, P must contact all its neighbours and possibly learn the identity of other participants from them
3Commit in Distributed DataBases
- Data can be inherently distributed
- e.g. customers' accounts in different branches of the same bank
- Data are distributed to achieve failure independence
- e.g. replicated file systems
- Partial failures can lead to inconsistent results
- Commits have to be coordinated among participants to preserve data consistency
4Distributed DataBases
[Figure: users and DB in a Centralized versus a Distributed arrangement]
5Atomic Commitment Problem
- Reach a globally consistent state despite failures
- Each participant has two possible decision values
- commit
- All participants will make the transaction's updates permanent
- abort
- All will roll back
- Individual decisions are irreversible
- A commit decision requires unanimity of YES votes
6Atomic Commitment Properties
- Consensus
- All participants that decide reach the same decision
- If any participant decides commit, then all participants must have voted YES
- If all participants have voted YES and no failures occur, then commit is decided
- Irreversibility
- Each participant decides at most once
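The properties above can be read as a check on a finished run. The following is a minimal sketch, assuming a run is summarized by two dictionaries (votes and final decisions) and a flag saying whether failures occurred; the function name and the data layout are illustrative assumptions, not part of the slides. Irreversibility is about the history of a participant, so it cannot be checked from the final decisions alone and is left out.

```python
from typing import Dict, Optional


def check_atomic_commitment(votes: Dict[str, str],
                            decisions: Dict[str, Optional[str]],
                            failures: bool) -> bool:
    """Check the consensus properties on one finished run.

    votes maps each participant to "YES" or "NO"; decisions maps it to
    "commit", "abort", or None if it has not decided (hypothetical encoding).
    """
    decided = {d for d in decisions.values() if d is not None}
    # All participants that decide reach the same decision.
    if len(decided) > 1:
        return False
    all_yes = all(v == "YES" for v in votes.values())
    # If any participant decides commit, all must have voted YES.
    if "commit" in decided and not all_yes:
        return False
    # If all voted YES and no failures occur, commit is decided.
    if all_yes and not failures and decided != {"commit"}:
        return False
    return True
```

A run in which one participant commits while another aborts is rejected, while a run that aborts after a failure is accepted even if every vote was YES.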
7Commitment Protocols
- Atomic commitment protocol
- satisfies all atomic commitment properties
- ensures that transactions terminate consistently at all participating sites of a distributed database, even in presence of failures
- Non-blocking
- if it permits transaction termination to proceed at correct participants despite failures of others
- is the activity of ensuring that software and hardware failures do not corrupt persistent data
- can limit time intervals of resource locking
8Some Assumptions
- One of the participants acts as unique coordinator (centralized version)
- At most one (if no failures, then there is one coordinator)
- A participant assumes the role of coordinator within a fixed time interval from the beginning of the transaction
- The transaction begins at a single participant called the invoker (not necessarily the coordinator)
- sends start messages to other participants
- Only undeliverable messages are dropped
- All participants can communicate (useful later)
9Generic ACP Coordinator
- send VOTE-REQ(Tid) to all participants // Phase 1
- set-timeout
- wait-for vote(Tid) from all participants
  - if (all votes are YES) then // Phase 2
    - broadcast (commit(Tid), participants)
  - else // at least one vote is NO
    - broadcast (abort(Tid), participants)
- on-timeout // escape blocking wait-for
  - broadcast (abort(Tid), participants)
10Generic ACP Participants
- set-timeout
- wait-for VOTE-REQ(Tid) from coordinator // 1
  - send vote(Tid) to coordinator
  - if (vote = NO) then // unilateral abort
    - decide abort
  - else
    - set-timeout
    - wait-for decision from coordinator // 2
      - if (decision = abort) then decide abort
      - else decide commit
    - on-timeout termination-protocol // escape 2
- on-timeout decide abort // escape 1
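The coordinator and participant steps of the generic ACP can be sketched as two pure decision functions. This is a minimal sketch under stated assumptions: a missing message is modelled as `None` (standing for a timeout), and the function names are hypothetical, not taken from the slides. Message passing itself is not modelled.

```python
from typing import Dict, Optional


def coordinator_decision(votes: Dict[str, Optional[str]]) -> str:
    """votes[p] is "YES", "NO", or None if p's vote did not arrive in time."""
    if any(v is None for v in votes.values()):
        return "abort"                  # on-timeout: escape the blocking wait-for
    if all(v == "YES" for v in votes.values()):
        return "commit"
    return "abort"                      # at least one vote is NO


def participant_decision(vote_req_arrived: bool, my_vote: str,
                         decision: Optional[str]) -> str:
    """decision is the coordinator's message, or None if it timed out."""
    if not vote_req_arrived:
        return "abort"                  # escape 1: timeout before voting
    if my_vote == "NO":
        return "abort"                  # unilateral abort
    if decision is None:
        return "termination-protocol"   # escape 2: uncertain, cannot decide alone
    return decision
```

Note the asymmetry: a participant that has not voted yet may abort on its own, while a participant that voted YES must not.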
11Simple Broadcast
- broadcast(m,S)
- // Broadcaster
- send m to all processes in S
- deliver m
- // other processes in S
- upon-receipt m // non-blocking
- deliver m
- With this simple broadcast, the generic ACP corresponds to the 2PC protocol
12Timeout Actions
- Participants must wait for
  - VOTE_REQ from coordinator
    - If this takes too long, the participant can just decide abort
  - commit / abort from coordinator
    - The participant has already voted YES
    - It is now uncertain
    - It must consult other participants according to the termination protocol
- Coordinator collects votes
  - No global decision is yet made
  - Coordinator can decide abort
13Termination Protocol (TP)
- What if a participant that voted YES times out waiting for the response from coordinator?
- It invokes a termination protocol to contact
- the coordinator
- other participants (cooperative TP)
- can have already voted or not yet voted
- There are failure scenarios for which no termination protocol can lead to a decision
- Blocking scenario: correct participants cannot decide
- e.g. coordinator crashes during broadcast
- all faulty participants deliver and crash
- all correct participants do not deliver the decision
- if faulty participants do not recover, any decision could contradict the decision of a participant that crashed
14Non-Blocking ACP I
- set-timeout
- wait-for VOTE-REQ(Tid) from coordinator // 1
  - send vote(Tid) to coordinator
  - if (vote = NO) then // unilateral abort
    - decide abort
  - else
    - set-timeout
    - wait-for decision from coordinator // 2
      - if (decision = abort) then decide abort
      - else decide commit
    - on-timeout decide abort // escape 2
- on-timeout decide abort // escape 1
15Non-Blocking ACP II
- broadcast(m,S)
- // Broadcaster as before
- // other processes in S
- upon-first-receipt m
- send m to all processes in S // S can be sent along with VOTE_REQ
- deliver m
- any process receiving m relays m to all others (if any correct process receives m, all correct processes receive m, even if broadcaster crashes)
- m is delivered only after relaying
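The relaying rule can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the set `reached` stands for the processes the broadcaster managed to send to before (possibly) crashing, processes in `crashed` never relay, and messages between correct processes are never lost. All names are hypothetical.

```python
from typing import Set


def relay_broadcast(processes: Set[str], broadcaster: str,
                    reached: Set[str], crashed: Set[str]) -> Set[str]:
    """Return the set of correct processes that deliver m."""
    delivered: Set[str] = set()
    if broadcaster not in crashed:
        delivered.add(broadcaster)
    pending = list(reached)              # receipts of m still to be handled
    while pending:
        p = pending.pop()
        if p in crashed or p in delivered:
            continue                     # crashed, or not the first receipt
        # upon-first-receipt: relay m to all others, then deliver m
        pending.extend(q for q in processes if q != p)
        delivered.add(p)
    return delivered
```

If the broadcaster `a` crashes after reaching only `b`, and `b` is correct, every correct process still delivers `m`; if `b` crashes too before relaying, nobody delivers.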
16Recovery
- Participant p is recovering from a failure
- Must reach a consistent decision
- Suppose p remembers its state at the time it failed
- Before voting
- it can unilaterally abort
- After deciding abort
- it can unilaterally abort
- After receiving commit / abort from coordinator
- it had already decided and must behave accordingly
- During the uncertainty period (voted YES)
- Independent recovery is not possible!
- Termination protocol is needed
17Distributed Transaction Log
- DTL is kept in stable storage at each site
- Its content must survive failures
- Coordinators and participants at that site can record information about transactions
- Before/after sending VOTE_REQ, the coordinator C writes start2PC(S,Tid)
- Before voting YES, a participant writes yes(C,S,Tid)
- Before/after voting NO, a participant writes abort(Tid)
- Before C sends commit, it writes commit(Tid)
- Before/after C sends abort, it writes abort(Tid)
- After receiving the decision, participant writes commit/abort
18Recovery From DTL
- If DTL contains start2PC (the site hosted the coordinator)
  - If it also contains commit/abort
    - The coordinator decided before failure
  - Otherwise
    - The coordinator can decide abort (and record it in DTL)
- Otherwise
  - It contains commit/abort
    - The participant has reached decision before the failure
  - Does not contain yes
    - Either failed before voting or voted no
    - The participant can unilaterally abort
  - Otherwise (it contains yes but not commit/abort)
    - The participant failed in its uncertainty period
    - Must use the termination protocol
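The case analysis above can be sketched as a single function over the log records of one transaction. This is a minimal sketch, assuming the DTL is given as a list of record names (`"start2PC"`, `"yes"`, `"commit"`, `"abort"`) with their arguments dropped; the encoding and the function name are assumptions made for illustration.

```python
from typing import List


def recover(dtl: List[str]) -> str:
    """Return the recovery action for one transaction from its DTL records."""
    decided = [r for r in dtl if r in ("commit", "abort")]
    if decided:
        return decided[-1]               # decision was reached before the failure
    if "start2PC" in dtl:
        return "abort"                   # coordinator without a decision can decide abort
    if "yes" not in dtl:
        return "abort"                   # failed before voting, or voted NO
    return "termination-protocol"        # failed during the uncertainty period
```

In the coordinator case the caller would also record the abort in the DTL, as the slide requires.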
19Cooperative TP Initiator
- send DECISION_REQ(Tid) to all processes in S
- wait-for decision(Tid) from any process
- if (decision = commit) then
  - write commit in DTL
- else // decision = abort
  - write abort in DTL
20Cooperative TP Responder
- wait-for DECISION_REQ(Tid) from any process p
- if (abort(Tid) in DTL) then
  - send abort to p
- else if (commit(Tid) in DTL) then
  - send commit to p
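The initiator and responder roles can be sketched together. This is a minimal sketch, assuming each peer's DTL for the transaction is available as a list of record names and that a responder with no recorded decision simply does not reply (the only two cases the slide shows); function names are hypothetical.

```python
from typing import Iterable, List, Optional


def respond(dtl: List[str]) -> Optional[str]:
    """Responder: answer a DECISION_REQ only if a decision is in the local DTL."""
    if "abort" in dtl:
        return "abort"
    if "commit" in dtl:
        return "commit"
    return None                          # no recorded decision: no reply


def cooperative_tp(peer_dtls: Iterable[List[str]]) -> Optional[str]:
    """Initiator: take the first decision any peer returns, else stay blocked."""
    for dtl in peer_dtls:
        answer = respond(dtl)
        if answer is not None:
            return answer                # the caller then writes it in its own DTL
    return None                          # every reachable peer is uncertain too
```

A `None` result corresponds to the blocking scenario: all reachable processes are themselves in their uncertainty period.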
21Evaluation of 2PC
- Criteria: Reliability vs Efficiency
- Resiliency
- What failures can be tolerated?
- Blocking
- Can processes be blocked?
- Under which conditions?
- Time Complexity
- How long does it take to reach a decision?
- Message Complexity
- How many messages are exchanged to reach a decision?
- What are their dimensions?
22Balancing
- Reliability and Efficiency are conflicting goals
- each can be achieved at the expense of the other
- The choice of protocol depends on which goal is more important for a specific application
- Whatever protocol is chosen, we should optimize for the case with no failures
- Hopefully the normal operating state of the system
23Measuring Time Complexity
- A round is the max time for a message to reach its destination
- Timeouts are based on the assumption that such a delay is known
- Note that many messages can be sent in a single round
- Two messages must belong to different rounds iff one cannot be sent before the other is received
- Rounds are taken as time units
- We count the number of rounds needed for unblocked sites to reach a decision, in the worst case
- This neglects the time needed to process messages
- Reasonable: message delays usually exceed processing delays
- Two other factors can be relevant
- DTL management (on stable storage)
- Broadcasting preparation (to a large number of processes)
24Measuring Message Complexity
- Total number of messages sent during the whole protocol
- Reasonable measure if individual messages are not very large
- Otherwise we should measure the length of messages, not merely their number
- Here messages are short, so we abstract away from their lengths
25Reliability of 2PC
- Resiliency
- 2PC is resilient to
- site failures
- communication failures
- In fact, the cause of timeouts is not important
- Blocking
- 2PC is subject to blocking
- Probabilistic analysis can be performed depending on the probability distribution of failures
26Time Complexity of 2PC
- In absence of failure, 2PC requires 3 rounds
- Broadcast VOTE-REQ
- Collect votes
- Broadcast global decision
- If failures happen, the TP may need 2 additional rounds
- Broadcast DECISION_REQ
- Reply from a process outside its uncertainty period
- Note that several TPs can be initiated separately in the same round
- Up to 5 rounds, independently of the number of failures!
- But processes may remain blocked for an unbounded period of time
27Message Complexity of 2PC
- Let N+1 be the number of participants, including the coordinator
- In each round of 2PC, there are N messages sent
- Hence, in absence of failures 2PC uses 3N messages
- Cooperative TP is invoked by all participants that voted YES but did not receive commit / abort
- Let there be M such participants
- M initiators, each sending N DECISION_REQ (MN messages)
- At most N-M+1 processes will respond to the first request
- In the worst case only one process abandons its uncertainty and will respond to another initiator: (N-M+1) + (N-M+2) + ... + N
28Calculating the Message Complexity of 2PC
- In the worst case the total number of TP messages will be
- $NM + \sum_{i=1}^{M} (N-M+i) = NM + NM - M^2 + M(M+1)/2$
- $= 2NM - M^2/2 + M/2$ messages
- This quantity is maximum when $M = N$
- $N(3N+1)/2$ messages
- The 2PC together with worst-case TP amount to
- $3N + N(3N+1)/2 = N(3N+7)/2$ messages
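The worst-case count can be checked numerically. The sketch below, with hypothetical function names, computes the number of termination-protocol messages once by summing requests and replies as described, and once with the closed form 2NM - M^2/2 + M/2, so the two can be compared for small values of N and M.

```python
def tp_messages(n: int, m: int) -> int:
    """Worst-case TP messages: M initiators among N+1 processes."""
    requests = m * n                                   # each initiator sends N DECISION_REQ
    replies = sum(n - m + i for i in range(1, m + 1))  # (N-M+1) + ... + N
    return requests + replies


def tp_messages_closed_form(n: int, m: int) -> int:
    """2NM - M^2/2 + M/2, written over a common denominator."""
    return (4 * n * m - m * m + m) // 2


def total_worst_case(n: int) -> int:
    """Failure-free 2PC (3N messages) plus worst-case TP with M = N."""
    return 3 * n + tp_messages(n, n)
```

For example, with N = 4 and M = 4 there are 16 requests and 1 + 2 + 3 + 4 = 10 replies, which is N(3N+1)/2 = 26 messages.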
29Communication Topology
- The communication topology of a protocol is the specification of who sends messages to whom
- e.g. in 2PC without TP, the coordinator sends messages to participants and vice versa
- Participants do not send messages directly to each other
- The topology is described as a tree of height 1
[Figure: tree of height 1 with the Coordinator at the root and four Participants as leaves]
30Alternative 2PCs
- To reduce time and message complexity of centralized 2PC, two variations have been proposed, based on different communication topologies
- Decentralized 2PC
- Communication topology is a complete graph
- Improves time complexity
- Linear 2PC (aka Nested 2PC)
- Linearly ordered processes
- Reduces the number of messages
31Decentralized 2PC
- Depending on its own vote, the coordinator sends YES or NO to all participants
- Informs that it is time to vote
- Tells the coordinator's vote
- If the message is NO
- Each participant decides abort and stops
- Otherwise, each participant sends back its vote to ALL OTHER PARTICIPANTS
- After receiving all votes each process can decide autonomously
- If all are YES and its own vote is YES, decide commit
- Otherwise it decides abort
- Timeouts can be employed as in the centralized 2PC
32Evaluation of Decentralized 2PC
- In the absence of failures, only 2 rounds are necessary
- Coordinator voting YES / NO
- Each participant voting YES / NO
- More messages are needed: N^2 + N messages
- N messages in the first round
- N^2 messages in the second round
- (and this is just in absence of failures)
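A failure-free run of the decentralized variant can be sketched as follows, counting messages along the way. This is a sketch under assumptions: there are N participants plus the coordinator, each participant sends its vote to the N other processes (coordinator included), and the function name and vote encoding are hypothetical.

```python
from typing import Dict, Tuple


def decentralized_2pc(coordinator_vote: str,
                      participant_votes: Dict[str, str]) -> Tuple[str, int]:
    """Return the common decision and the number of messages sent."""
    n = len(participant_votes)
    messages = n                         # round 1: coordinator's vote to every participant
    if coordinator_vote == "NO":
        return "abort", messages         # each participant decides abort and stops
    messages += n * n                    # round 2: each participant sends its vote to N processes
    all_yes = all(v == "YES" for v in participant_votes.values())
    return ("commit" if all_yes else "abort"), messages
```

With three participants that all vote YES the run uses 3 + 9 = 12 messages, matching N^2 + N for N = 3.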
33Linear 2PC
- Each participant can communicate only with its left / right neighbors
- The coordinator is the leftmost process
- It sends its vote YES / NO to its right neighbor
- This message has a dual meaning as in decentralized 2PC
- Each participant p waits for the vote from its left neighbor
- If it is YES, and p votes YES, then p tells YES to its right neighbor
- Otherwise, p tells NO to its right neighbor
- When the rightmost participant receives the vote, it makes the final decision commit / abort
- The decision is propagated from right to left
- When the coordinator receives it, the protocol ends
- Timeout periods are influenced by positions
34Evaluation of Linear 2PC
- Only 2N messages needed
- N votes from left to right
- N decisions from right to left
- (and this is just in absence of failures)
- Unfortunately the same number of rounds is needed: 2N rounds
- No two messages are sent concurrently
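The linear variant can be sketched as a single pass over the ordered processes. This is a minimal failure-free sketch, assuming the votes are given as a list whose first entry is the coordinator (the leftmost process); the function name is hypothetical.

```python
from typing import List, Tuple


def linear_2pc(votes: List[str]) -> Tuple[str, int, int]:
    """Return (decision, messages, rounds) for N+1 linearly ordered processes."""
    messages = 0
    carried = votes[0]                   # the coordinator's own vote
    for v in votes[1:]:                  # votes travel from left to right
        messages += 1                    # left neighbor tells `carried` to this process
        carried = "YES" if carried == "YES" and v == "YES" else "NO"
    decision = "commit" if carried == "YES" else "abort"
    messages += len(votes) - 1           # the decision travels from right to left
    return decision, messages, messages  # no two messages are concurrent
```

With four processes (N = 3) the run takes 6 messages and 6 rounds, in line with the 2N figures above.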
35Comparison of 2PC Variants
                   Rounds  Messages
Centralized 2PC    3       3N
Decentralized 2PC  2       N^2 + N
Linear 2PC         2N      2N
- Hybrid communication topologies are also possible
- e.g. Linear for voting, complete for conveying decision
- 2N messages, N+1 rounds
- The choice of the protocol might be influenced by the available communication topology
36From 2PC to 3PC
- In 2PC, if all operational participants are uncertain, they are blocked
- They cannot decide abort even if aware that processes they cannot communicate with have failed, because some of them could have decided commit before failure
- The 3PC is an ACP designed to rule out this situation
- It guarantees that if any operational process is uncertain, then no (operational / failed) process can have decided commit
- Thus, if p realizes that any operational site is uncertain, then p can decide abort
- Why does 2PC violate this property?
- A participant p can receive commit while q is still uncertain
37Sketch of 3PC The Idea
- After the coordinator has found that all votes were YES, it sends pre-commit messages to all participants
- When a participant p receives pre-commit, it knows that all participants voted YES
- p is no longer uncertain, but does not decide commit yet
- p knows that it will decide commit unless it fails
- p acknowledges the receipt of pre-commit
- When the coordinator collects all acks it knows that no participant is uncertain
- The coordinator sends commit to all participants
- When a participant receives commit, it decides commit
- If a participant voted NO, then 3PC behaves as 2PC
38Sketch of 3PC Some Notes
- In absence of failures, 3PC involves 5 rounds and up to 5N messages
- Participants have four possible states
- Aborted, Uncertain, Committable, Committed
- For p and q any two participants, only certain combinations of their states are possible
- Timeouts can occur in five situations
- 3 are trivially handled
- 2 require a complex termination protocol
- Election protocol (for a new coordinator) based on a linear ordering of participants
- The new coordinator checks the states of all operational participants
- Timeouts are again necessary
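The round and message counts of a failure-free 3PC run can be sketched by following the message exchanges one by one. This is only an illustration: the function name is hypothetical, and in the abort case the count assumes the abort is sent to all N participants, which is an upper bound.

```python
from typing import Dict, Tuple


def three_pc(votes: Dict[str, str]) -> Tuple[str, int, int]:
    """Failure-free 3PC with N participants; returns (decision, rounds, messages)."""
    n = len(votes)
    rounds, messages = 2, 2 * n          # VOTE-REQ out, votes back
    if any(v == "NO" for v in votes.values()):
        # behaves as 2PC: the coordinator broadcasts abort
        return "abort", rounds + 1, messages + n
    # pre-commit out, acks back, commit out
    return "commit", rounds + 3, messages + 3 * n
```

For three participants that all vote YES this gives 5 rounds and 15 messages, i.e. 5N.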
39Some References
- Concurrency control and recovery in database systems (Addison-Wesley 1987)
- P. Bernstein, N. Goodman, V. Hadzilacos
- Transaction processing: concepts and techniques (Morgan Kaufmann 1993)
- J. Gray, A. Reuter
- Sagas (Proc. SIGMOD'87, ACM, pp. 249-259)
- H. Garcia-Molina, K. Salem
- Non-blocking atomic commitment (Chapter 6 of Distributed Systems, Addison-Wesley 1995)
- O. Babaoglu, S. Toueg
40D2PC
- Every participant P acts as coordinator
- During the transaction P builds its own synchronization set L_P of cooperating agents
- When P is ready to commit, P asks the processes in L_P whether they are ready (if L_P is empty, P was isolated and can commit)
- In doing so, P sends them the set L_P
- Other participants will send to P
- either a successful reply with their own synchronization sets
- or a failure message
- (in this case, failure is then propagated)
- Successful replies are added to L_P
- The protocol terminates when L_P is transitively closed
41Example D2PC
[Figure: three processes annotated with their synchronization sets. P1: {P2}; P2: {P1,P3}; P3: {P2}]
42Example P3 Enters ACP Phase
[Figure: P1: {P2}; P2: {P1,P3}; P3: {P2} ()]
43Example P3 Contacts Known Parties
[Figure: P1: {P2}; P2: {P1,P3}; P3: {P2} {P2} (). P3 sends ⟨P3,{P2}⟩: "Hi, I am P3. I am ready to commit. I know P2."]
44Example P2 Enters ACP Phase
[Figure: P1: {P2}; P2: {P1,P3} (); P3: {P2} {P2} (). Pending: ⟨P3,{P2}⟩]
45Example P2 Contacts Known Parties
[Figure: P1: {P2}; P2: {P1,P3} {P1,P3} (); P3: {P2} {P2} (). Pending: ⟨P3,{P2}⟩. P2 sends two copies of ⟨P2,{P1,P3}⟩: "Hi, I am P2. I am ready to commit. I know P1 and P3."]
46Example Some Pending Messages Around
[Figure: P1: {P2}; P2: {P1,P3} {P1,P3} (); P3: {P2} {P2} (). Pending: ⟨P3,{P2}⟩ and two copies of ⟨P2,{P1,P3}⟩]
47Example P2 Reads a Pending Vote
[Figure: P1: {P2}; P2: {P1,P3} {P1,P3} (P3); P3: {P2} {P2} (). Pending: two copies of ⟨P2,{P1,P3}⟩]
48Example P3 Reads a Pending Vote
[Figure: P1: {P2}; P2: {P1,P3} {P1,P3} (P3); P3: {P1,P2} {P2} (P2). Pending: ⟨P2,{P1,P3}⟩]
49Example P3 Contacts the Newly Known Party
[Figure: P1: {P2}; P2: {P1,P3} {P1,P3} (P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P2,{P1,P3}⟩, ⟨P3,{P1,P2}⟩]
50Example P1 Enters ACP Phase
[Figure: P1: {P2} (); P2: {P1,P3} {P1,P3} (P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩, ⟨P2,{P1,P3}⟩]
51Example P1 Contacts Known Parties
[Figure: P1: {P2} {P2} (); P2: {P1,P3} {P1,P3} (P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩, ⟨P2,{P1,P3}⟩. P1 sends ⟨P1,{P2}⟩]
52Example Some Pending Messages Around
[Figure: P1: {P2} {P2} (); P2: {P1,P3} {P1,P3} (P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P1,{P2}⟩, ⟨P3,{P1,P2}⟩, ⟨P2,{P1,P3}⟩]
53Example P2 Reads a Pending Vote
[Figure: P1: {P2} {P2} (); P2: {P1,P3} {P1,P3} (P1,P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩, ⟨P2,{P1,P3}⟩]
54Example P2 can Commit!
[Figure: P2 has committed and is now shown as Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2} {P2} (); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩, ⟨P2,{P1,P3}⟩]
55Example P1 Reads a Pending Vote
[Figure: Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2,P3} {P2} (P2); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩]
56Example P1 Contacts the Newly Known Party
[Figure: Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2,P3} {P2} (P2); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩. P1 sends ⟨P1,{P2,P3}⟩]
57Example Some Pending Messages Around
[Figure: Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2,P3} {P2} (P2); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P3,{P1,P2}⟩, ⟨P1,{P2,P3}⟩]
58Example P1 Reads a Pending Vote
[Figure: Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2,P3} {P2,P3} (P2,P3); P3: {P1,P2} {P1,P2} (P2). Pending: ⟨P1,{P2,P3}⟩]
59Example P3 Reads a Pending Vote
[Figure: Q2: {P1,P3} {P1,P3} (P1,P3); P1: {P2,P3} {P2,P3} (P2,P3); P3: {P1,P2} {P1,P2} (P1,P2). No pending messages]
60Example P1 and P3 Commit!
[Figure: all processes have committed. Q1: {P2,P3} {P2,P3} (P2,P3); Q2: {P1,P3} {P1,P3} (P1,P3); Q3: {P1,P2} {P1,P2} (P1,P2)]
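The run shown in the example can be replayed with a small message-queue simulation. This is a sketch under several assumptions not stated in the slides: every participant is ready to commit, no failure messages occur, messages are handled in FIFO order, and a receiver also learns the identity of the sender. The function name and data layout are hypothetical.

```python
from collections import deque
from typing import Dict, Set, Tuple


def d2pc(sync_sets: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Run D2PC to completion.

    sync_sets[P] is the initial synchronization set of P (without P itself).
    Returns the final synchronization sets and the set of committed processes.
    """
    known = {p: set(s) for p, s in sync_sets.items()}
    replied: Dict[str, Set[str]] = {p: set() for p in sync_sets}
    queue = deque()                       # pending messages (sender, set, receiver)
    for p in sync_sets:                   # P contacts the parties it knows
        for q in sorted(known[p]):
            queue.append((p, frozenset(known[p]), q))
    while queue:
        sender, their_set, p = queue.popleft()
        replied[p].add(sender)
        new = (set(their_set) | {sender}) - known[p] - {p}
        known[p] |= new
        for q in sorted(new):             # P contacts the newly known parties
            queue.append((p, frozenset(known[p]), q))
    # P can commit once everybody in its synchronization set has replied.
    committed = {p for p in known if replied[p] == known[p]}
    return known, committed
```

Starting from P1: {P2}, P2: {P1,P3}, P3: {P2}, the simulation ends with P1: {P2,P3}, P2: {P1,P3}, P3: {P1,P2} and all three processes committed, as in the last slide of the example.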
61ZS nets in Join
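We encode basic nets, which are expressive enough
given a net (T,S) we define an agent $\text{def } [\![T]\!] \text{ in } [\![S]\!]$, where
$[\![E \text{ open } e]\!] = E \triangleright \text{def } D \text{ in } e(put, \{lock\}) \mid state(\{E\})$
$[\![e \text{ calc } e']\!] = e(p, L) \triangleright e'(p, L)$
$[\![e \text{ fork } e', e'']\!] = e(p, L) \triangleright \text{def } D \text{ in } e'(p, L \cup \{lock\}) \mid e''(put, L \cup \{lock\}) \mid state(\emptyset)$
$[\![e', e'' \text{ join } e]\!] = e'(p', L') \mid e''(p'', L'') \triangleright e(p', L' \cup L'') \mid p''(L' \cup L'', \emptyset)$
$[\![e \text{ close } E]\!] = e(p, L) \triangleright p(L, \{E\})$
default compensation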
62DTC in JOIN
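the definition D is the following
$state(H) \mid put(L, F) \triangleright commit(L \setminus \{lock\}, L, \{lock\}, F, H)$
$state(H) \triangleright failed() \mid release(H)$
$commit(\{l\} \cup L, L', L'', F, H) \triangleright commit(L, L', L'', F, H) \mid l(L', lock, fail)$
$commit(L, L', L'', F, H) \mid lock(L''', l, f) \triangleright commit(L \cup (L''' \setminus L'), L' \cup L''', L'' \cup \{l\}, F, H)$
$commit(\emptyset, L, L, F, H) \triangleright release(F)$
$commit(\emptyset, L, L', F, H) \mid fail() \triangleright failed() \mid release(H)$
$failed() \mid put(L, F) \triangleright failed()$
$failed() \mid lock(L, l, f) \triangleright failed() \mid f()$
$failed() \mid fail() \triangleright failed()$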
63Opening
a new thread is created and two tokens released
open
$[\![E \text{ open } e]\!] = E \triangleright \text{def } D \text{ in } e(put, \{lock\}) \mid state(\{E\})$
the state of the new thread: it contains the consumed resources to be returned in case of failure
64Progress
calc
$[\![e \text{ calc } e']\!] = e(p, L) \triangleright e'(p, L)$
- p: return address where to send the set of threads and the generated stable resources, if any
- L: set of threads, including self, the token has been in contact with
65Joining
join
$[\![e', e'' \text{ join } e]\!] = e'(p', L') \mid e''(p'', L'') \triangleright e(p', L' \cup L'') \mid p''(L' \cup L'', \emptyset)$
the first thread continues
the second thread terminates
66Splitting
fork
$[\![e \text{ fork } e', e'']\!] = e(p, L) \triangleright \text{def } D \text{ in } e'(p, L \cup \{lock\}) \mid e''(put, L \cup \{lock\}) \mid state(\emptyset)$
the original thread continues but its contact set is augmented with the name of the new thread
a new thread is created and three tokens released
the new thread starts with the augmented contact set
the state of the new thread contains no consumed resource
67Pre-commit
$state(H) \mid put(L, F) \triangleright commit(L \setminus \{lock\}, L, \{lock\}, F, H)$
- $put(L, F)$: the thread receives the signal that it can commit locally and evolves to a committed state
- $L \setminus \{lock\}$: list of known threads to be contacted yet
- $L$: list of known threads including self
- $\{lock\}$: list of contacted threads already synchronized with
- $F$: generated stable resources to be unfrozen in case of success
- $H$: consumed resources to be returned in case of failure
68Commitment
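$commit(\{l\} \cup L, L', L'', F, H) \triangleright commit(L, L', L'', F, H) \mid l(L', lock, fail)$
$commit(L, L', L'', F, H) \mid lock(L''', l, f) \triangleright commit(L \cup (L''' \setminus L'), L' \cup L''', L'' \cup \{l\}, F, H)$
$commit(\emptyset, L, L, F, H) \triangleright release(F)$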
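The three set arguments of the commit message evolve by simple set operations, which the following sketch makes explicit. It is an illustration only: the class and function names are hypothetical, thread names stand in for lock channels, and the F and H arguments are omitted because the commitment rules leave them unchanged.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Commit:
    to_contact: FrozenSet[str]   # known threads still to be contacted
    known: FrozenSet[str]        # known threads, including self
    synced: FrozenSet[str]       # threads already synchronized with


def on_put(known: FrozenSet[str], self_lock: str) -> Commit:
    """Pre-commit: everybody but self is still to be contacted."""
    return Commit(known - {self_lock}, known, frozenset({self_lock}))


def contact_all(c: Commit) -> Commit:
    """First rule, applied until the first argument is empty."""
    return Commit(frozenset(), c.known, c.synced)


def on_lock(c: Commit, their_known: FrozenSet[str], their_lock: str) -> Commit:
    """Second rule: merge the sender's set and mark the sender as synchronized."""
    return Commit(c.to_contact | (their_known - c.known),
                  c.known | their_known,
                  c.synced | {their_lock})


def can_release(c: Commit) -> bool:
    """Third rule: nothing left to contact and everybody known has synchronized."""
    return not c.to_contact and c.known == c.synced
```

Replaying the thread P3 of the earlier D2PC example: it starts knowing {P2, P3}, learns about P1 from the message of P2, and can release only after P1 has synchronized as well.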
69Failures
$state(H) \triangleright failed() \mid release(H)$
$commit(\emptyset, L, L', F, H) \mid fail() \triangleright failed() \mid release(H)$
$failed() \mid put(L, F) \triangleright failed()$
$failed() \mid lock(L, l, f) \triangleright failed() \mid f()$
$failed() \mid fail() \triangleright failed()$
consumed resources are given back
local failure
handling global failure
70Correctness
- The join version of the D2PC protocol comes with a correctness theorem
- for success
- for failure
- The result is exploited to state the adequacy of the ZSN encoding
- complex statement because of garbage terms
71Playful Digression5 Methods for Complex Proofs
- Proof by calculus
- "This proof requires calculus, so we'll skip it."
- Proof by postponement
- "The proof for this is long and arduous, so it
will be written in a technical annex."
- Proof by mumbo-jumbo
- "(B ? P ) , (C ? W ) QED"
- Proof by illegibility
- "scribble, scribble QED"
- Proof by poor analogy
- "Well, it's just like the other day in the bus
..."
72D2PC in Jocaml
- JoCaml is an extension of Objective Caml
- Supports the Join calculus programming model
- Functional language
- Support of OO and imperative paradigms
- A running implementation of the controllers in JoCaml has been developed by Hernán Melgratti
- Given a description of a ZS net, it automatically generates a JoCaml program that simulates the net behavior
73Thread Coordinator in Jocaml
let def new_coordinator () =
  let def
      state! h | timeout! () = failed() | release h | deact timeout
   or failed!() | fail! () = failed ()
   or failed!() | lock! (ll, l, f) = failed () | f ()
   or failed!() | put!(l,f) = failed ()
   or commit!(l, l1, l2, f, h) | fail!() = failed() | release h | deact timeout
   or commit0!(l, l1, l2, f, h) | fail!() = failed() | release h | deact timeout
   or commit0!(l,l1,l2,f,h) =
        match l with
          [] -> if (equiv l1 l2) then release f else commit(l,l1,l2,f,h)
        | t::ts -> t (l1,lock,fail) | commit0(ts,l1,l2,f,h)
   or commit!(l,l1,l2,f,h) | lock!(l3,ll,f) =
        let lnew = union l (difference l3 l1) in
        commit0 (lnew, union l1 l3, union l2 [ll], f, h)
   or state! h | put! (l,f) = commit0 (del lock l, l, [lock], f, h) | deact timeout
74A ZSNet in Jocaml
let def aZSNet () =
  let def
      (* OPEN *)
      placeA!() = let newthread = new_coordinator() in
                    (state newthread) [placeA]
                  | placeB (put newthread, [lock newthread])
                  | act (timeout newthread)
      (* CALC *)
   or placeB!(p,l) = placeC!(p,l)
      (* FORK *)
   or placeB!(p,l) = let newthread = create_thread() in
                    (state newthread) []
                  | placeC (p, union l [lock newthread])
                  | placeD (put newthread, union l [lock newthread])
                  | act (timeout newthread)
   or ...
  (* Initial Marking *)
  in reply placeA, place H .....
75Example (Problem description)
- Apartment Rentals
- There is an offer of apartments that can be rented immediately
- There are requests from persons to rent offered apartments
- Also, there are requests to change apartment, i.e., a person takes a new apartment only when somebody else rents her apartment
76Example (ZS Net)
77Implementation
[Figure: implementation tool chain with the labels Initial State Matrix, Preferences Matrix, ZS Net Generator, ZsNet, Reflexive encoding, JoCaml Source Code, Jogc]
78References
- Orchestrating transactions in join calculus (Proc. CONCUR'02, LNCS 2421, pp. 321-336)
- R. Bruni, C. Laneve, U. Montanari