Title: Flat and nested distributed transactions
1Chapter 13 Distributed Transactions
- Introduction
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency control in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Summary
2Introduction
- Distributed transaction
- A flat or nested transaction that accesses
objects managed by multiple servers
- Atomicity of transaction
- All or nothing for all involved servers
- Two-phase commit
- Concurrency control
- Each server serializes transactions locally; the local orders must agree globally
- Distributed deadlock
4Flat and nested distributed transactions
- Flat transaction
- Nested transaction
- Nested banking transaction
- The four subtransactions run in parallel
5The architecture of distributed transactions
- The coordinator
- Accept client request
- Coordinate behaviors on different servers
- Send result to client
- Record a list of references to the participants
- The participant
- One participant per server
- Keep track of all recoverable objects at each
server
- Cooperate with the coordinator
- Record a reference to the coordinator
- Example
7One-phase atomic commit protocol
- The protocol
- The client requests the end of a transaction
- The coordinator communicates the commit or abort
request to all of the participants and keeps
repeating the request until all of them have
acknowledged that they have carried it out
- The problem
- Some servers commit, some servers abort
- How should the protocol deal with servers that
decide to abort?
8Introduction to two-phase commit protocol
- Allow for any participant to abort
- First phase
- Each participant votes to commit or abort
- The second phase
- All participants reach the same decision
- If any one participant votes to abort, then all
abort
- If all participants vote to commit, then all
commit
- The challenge
- Work correctly when failures happen
- Failure model
- Servers may crash and messages may be lost, but there
are no arbitrary failures
9The two-phase commit protocol
- When the client requests to abort
- The coordinator informs all participants to abort
- When the client requests to commit
- First phase
- The coordinator asks all participants whether they
are prepared to commit
- If a participant is prepared to commit, it saves in
permanent storage all of the objects that it
has altered in the transaction and replies Yes.
Otherwise, it replies No
- Second phase
- The coordinator tells all participants to commit
(or abort)
10The two-phase commit protocol continued
- Operations for two-phase commit protocol
- The two-phase commit protocol
- Record updates that are prepared to commit in
permanent storage
- When the server crashes, the information can be
retrieved by a new process
- If the coordinator decides to commit, all
participants will commit eventually
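The two phases described above can be sketched in a single process. This is only an illustration, not a networked implementation: the class `Participant`, the flag `can_commit`, the dictionary `stable_storage` (standing in for permanent storage) and the function `two_phase_commit` are assumptions made for the example. Only the operation names canCommit, doCommit and doAbort come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool = True   # hypothetical flag: how this participant will vote
    state: str = "active"
    stable_storage: dict = field(default_factory=dict)  # stands in for permanent storage

    def canCommit(self, trans, tentative):
        """Phase 1: save the altered objects before voting Yes; abort on No."""
        if self.can_commit:
            self.stable_storage[trans] = dict(tentative)
            self.state = "prepared"
            return "Yes"
        self.state = "aborted"
        return "No"

    def doCommit(self, trans):
        self.state = "committed"

    def doAbort(self, trans):
        self.stable_storage.pop(trans, None)
        self.state = "aborted"


def two_phase_commit(trans, participants, updates):
    """Coordinator side: collect the votes, then send one decision to everyone."""
    votes = {p.name: p.canCommit(trans, updates.get(p.name, {})) for p in participants}
    decision = "commit" if all(v == "Yes" for v in votes.values()) else "abort"
    for p in participants:
        if decision == "commit":
            p.doCommit(trans)
        elif votes[p.name] == "Yes":   # participants that voted No have already aborted
            p.doAbort(trans)
    return decision
```

A single No vote is enough to abort every participant, which is the property the one-phase protocol lacks.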
11Timeout actions in the two-phase commit protocol
- Communication in two-phase commit protocol
- New processes to mask crash failure
- Crashed coordinator and participant processes
are replaced by new processes
- Timeout for the participant
- Timeout while waiting for canCommit?: abort
- Timeout while waiting for doCommit
- Uncertain status: keep updates in permanent
storage
- Send a getDecision request to the coordinator
- Timeout for the coordinator
- Timeout while waiting for votes: abort
- Timeout while waiting for haveCommitted: do nothing
- The protocol can work correctly without the
confirmation
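The timeout rules above amount to a small decision table. The sketch below encodes it under stated assumptions: the state names ("active", "prepared") and the returned action strings are hypothetical labels chosen for the example, not part of the protocol's interface.

```python
def participant_on_timeout(state):
    """Action a participant takes when a wait times out, given its state."""
    if state == "active":
        # Still waiting for canCommit?: no vote has been cast, so it may abort alone.
        return "abort"
    if state == "prepared":
        # Voted Yes and is waiting for doCommit: the outcome is uncertain, so it
        # must keep its updates and ask the coordinator for the decision.
        return "keep updates, send getDecision"
    return "none"  # already committed or aborted


def coordinator_on_timeout(waiting_for):
    """Action the coordinator takes when a wait times out."""
    if waiting_for == "votes":
        return "abort"   # a missing vote is treated like a No vote
    if waiting_for == "haveCommitted":
        return "none"    # the confirmation is not needed for correctness
    raise ValueError(f"unknown wait: {waiting_for}")
```

The asymmetry is the point: a participant can abort on its own only before it has voted Yes.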
12Two-phase commit protocol for nested transactions
- Nested transaction semantics
- Subtransaction
- Commit provisionally
- Abort
- Parent transaction
- Abort: all of its subtransactions abort
- Commit: aborted subtransactions are excluded
- Distributed nested transaction
- When a subtransaction completes
- Provisionally committed updates are not saved in
permanent storage
13Distributed nested transactions commit protocol
- The coordinator of a subtransaction provides
an operation to open a subtransaction
- openSubTransaction(trans) → subTrans
- Opens a subtransaction whose parent is trans and
returns a unique subtransaction identifier.
- getStatus(trans) → committed, aborted, provisional
- Asks the coordinator to report on the status of
the transaction trans. Returns a value
representing one of the following: committed,
aborted, provisional
14Distributed nested transactions commit protocol
- Each subtransaction
- If it commits provisionally
- Report the status of it and its descendants to
its parent
- If it aborts
- Report abort to its parent
- Top level transaction
- Receive a list of status of all subtransactions
- Start two-phase commit protocol on all
subtransactions that have committed provisionally
15Example of a distributed nested transactions
- The execution process
- The information held by each coordinator
- Top level coordinator
- The participant list: the coordinators of all the
subtransactions in the tree that have
provisionally committed and do not have an aborted
ancestor
- Two-phase commit protocol
- Conducted on the participants of T, T1 and T12
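The rule for building the participant list can be sketched as a walk up the transaction tree. The tree and statuses below follow the example used in these slides (T1, T12, T21 and T22 provisionally committed; T11 and T2 aborted). The names `PARENT`, `STATUS`, `has_aborted_ancestor` and `commit_participants` are assumptions made for the illustration.

```python
# Parent of each subtransaction, and the status each one reported.
PARENT = {"T1": "T", "T2": "T", "T11": "T1", "T12": "T1", "T21": "T2", "T22": "T2"}
STATUS = {
    "T1": "provisional", "T11": "aborted", "T12": "provisional",
    "T2": "aborted", "T21": "provisional", "T22": "provisional",
}


def has_aborted_ancestor(trans, parent, status):
    """Walk up the tree and report whether any ancestor has aborted."""
    node = parent.get(trans)
    while node is not None:
        if status.get(node) == "aborted":
            return True
        node = parent.get(node)
    return False


def commit_participants(top, parent, status):
    """Top-level transaction plus every provisionally committed
    subtransaction that has no aborted ancestor."""
    eligible = [
        t for t, s in status.items()
        if s == "provisional" and not has_aborted_ancestor(t, parent, status)
    ]
    return [top] + sorted(eligible)


print(commit_participants("T", PARENT, STATUS))  # ['T', 'T1', 'T12']
```

T21 and T22 committed provisionally but are excluded because their parent T2 aborted.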
16Different two-phase commit protocol
- Hierarchic two-phase commit protocol
- Messages are transferred according to the
hierarchic relationship between successful
participants
- The interface
17Different two-phase commit protocol
- Flat two-phase commit protocol (The interface)
- Messages are transferred from the top-level
coordinator to all successful participants
directly
- If the participant has any provisionally
committed transactions that are descendants of
the top-level transaction, trans
- Check that they do not have aborted ancestors in
the abortList, then prepare to commit
- Those with aborted ancestors are aborted
- Send a Yes vote to the coordinator
- If the participant doesn't have a provisionally
committed descendant, send No to the coordinator
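The participant's side of the flat protocol can be sketched as a filter over its provisionally committed subtransactions. The function name `can_commit_flat`, its return shape and the `parent` lookup table are assumptions for the example. The data mirrors the participant in the slides' example that holds T12 and T21, with T11 and T2 in the abort list.

```python
def can_commit_flat(provisional, abort_list, parent):
    """Participant side of canCommit?(trans, abortList) in the flat protocol.

    provisional: subtransactions provisionally committed at this participant
    abort_list:  aborted subtransactions, as sent by the top-level coordinator
    parent:      hypothetical lookup table, child -> parent
    Returns (vote, prepared, aborted).
    """
    def aborted_ancestor(trans):
        node = parent.get(trans)
        while node is not None:
            if node in abort_list:
                return True
            node = parent.get(node)
        return False

    if not provisional:
        return "No", [], []   # no provisionally committed descendant here
    prepared = [t for t in provisional if not aborted_ancestor(t)]
    aborted = [t for t in provisional if aborted_ancestor(t)]
    return "Yes", prepared, aborted


parent = {"T1": "T", "T2": "T", "T12": "T1", "T21": "T2"}
print(can_commit_flat(["T12", "T21"], {"T11", "T2"}, parent))
# ('Yes', ['T12'], ['T21'])
```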
19Serial equivalence on all servers
- Objective
- Serial equivalence on all involved servers
- If transaction T is before transaction U in their
conflicting access to objects at one of the
servers, then they must be in that order at all of
the servers whose objects are accessed in a
conflicting manner by both T and U
- Approach
- Each server applies concurrency control to its own
objects
- All servers coordinate to reach the
objective
20Lock
- Each participant locks objects locally
- Strict two-phase locking scheme
- Atomic commit protocol
- A server cannot release any locks until it knows
that the transaction has been committed or
aborted at all servers
- Distributed deadlock
- Locking enforces serial equivalence, but servers set locks
independently, so cyclic waits across servers can occur
21Timestamp ordering concurrency control
- Globally unique transaction timestamp
- Issued to the client by the first coordinator
accessed by a transaction
- The transaction timestamp is passed to the
coordinator at each server
- Each server accesses shared objects according to
the timestamp
- Resolution of a conflict
- Abort the transaction at all servers
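One common way to make timestamps globally unique, assumed here because the slides do not specify one, is to pair a local counter with the identifier of the issuing server. The class `Coordinator` and its `new_timestamp` method are hypothetical names for this sketch.

```python
import itertools


class Coordinator:
    """Issues timestamps as (local counter, server id) pairs (an assumption)."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._counter = itertools.count(1)   # stands in for a local clock

    def new_timestamp(self):
        # Tuples compare element by element, so the server id breaks ties
        # between coordinators whose local counters happen to be equal.
        return (next(self._counter), self.server_id)


x, y = Coordinator("X"), Coordinator("Y")
t, u = x.new_timestamp(), y.new_timestamp()
print(t, u, t < u)   # (1, 'X') (1, 'Y') True
```

Every server that receives these timestamps orders T before U in the same way, which is what serial equivalence across servers requires.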
22Optimistic concurrency control
- The validation
- Takes place during the first phase of the two-phase
commit protocol
- Commitment deadlock
T               | U
Read(A) at X    | Read(B) at Y
Write(A)        | Write(B)
Read(B) at Y    | Read(A) at X
Write(B)        | Write(A)
23Optimistic concurrency control
- Parallel validation
- Suitable for distributed transactions
- Write-write conflicts must be checked as well as
write-read conflicts for backward validation
- Possibly different validation orders on different
servers
- Measure 1: a global validation check after each individual
server has validated that its own order is serializable
- Measure 2: each server validates according to a
globally unique transaction number of each
transaction
25Distributed deadlocks
- Distributed deadlocks
- A cycle in the global wait-for graph
- An example
- Simple resolution
- A centralized deadlock detector
- Collect the latest copy of each server's local
wait-for graph
- Construct the global wait-for graph
- Find cycles in the global wait-for graph
- Drawbacks
- Poor availability, lack of fault tolerance, poor
scalability
- Cost of collecting information is high
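The detector's three steps (collect, construct, find cycles) can be sketched with a depth-first search. The graph representation (a dictionary from each waiting transaction to the set of transactions it waits for) and the function names are assumptions for the example. The local graphs correspond to the U, V, W interleaving used in these slides.

```python
def build_global_graph(local_graphs):
    """Union the local wait-for graphs (dict: waiter -> set of holders)."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def find_cycle(graph):
    """Return one cycle as a list of transactions, or None if there is none."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:   # reached a transaction already on the current path
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Local graphs: W waits for U at X, U waits for V at Y, V waits for W at Z.
at_x, at_y, at_z = {"W": {"U"}}, {"U": {"V"}}, {"V": {"W"}}
print(find_cycle(build_global_graph([at_x, at_y, at_z])))  # ['U', 'V', 'W', 'U']
```

No single server sees a cycle in its own graph; the cycle appears only after the union.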
26Phantom deadlocks
- Phantom deadlocks
- A deadlock that is detected but is not really a
deadlock
- May occur when some deadlocked transactions abort
or release locks
- An example
- At server Y, U requests a lock held by V
- At server X, U releases the lock that T is waiting for
- At the global deadlock detector, if the message from server
Y arrives earlier than the message from server X,
a phantom deadlock is detected
27Edge chasing
- Idea
- Detect deadlock in a distributed manner
- Each server involved in the deadlock forwards
its partial knowledge of wait-for edges, in messages
called probes, to other servers to construct the
wait-for graph
- Question
- When to send a probe?
28Edge chasing
- Initiation
- When a server finds that a transaction T starts
waiting for another transaction U, where U is
waiting to access an object at another server, it
initiates detection by sending a probe containing
the edge <T → U> to the server of the object at
which transaction U is blocked
29Edge chasing
- Detection
- Receive probes
- Detect whether deadlock has occurred
- Merge the local wait-for knowledge with that of
the probes and look for a cycle
- Decide whether to forward the probes
- If a new transaction V is found to be waiting for
another object elsewhere, the probe is forwarded
- Resolution
- When a cycle is detected, a transaction in the
cycle is aborted
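The initiation, detection and forwarding steps can be simulated in one process. This is a simplified sketch, not a distributed implementation: the function `chase`, the per-server edge tables and the `blocked_at` lookup (which in practice would come from each transaction's coordinator) are assumptions for the example, and each transaction is assumed to wait for at most one other.

```python
def chase(probe, local_edges_by_server, blocked_at):
    """Forward a probe until it closes a cycle or reaches a transaction
    that is not waiting.

    probe: list of transactions, e.g. ["W", "U"] for the edge <W → U>
    local_edges_by_server: server -> {waiter: holder}, known only at that server
    blocked_at: transaction -> server at which it is blocked (hypothetical lookup)
    Returns (cycle or None, number of probe messages sent).
    """
    messages = 0
    while True:
        last = probe[-1]
        server = blocked_at.get(last)
        if server is None:
            return None, messages          # last transaction is not waiting
        messages += 1                      # probe is sent to that server
        holder = local_edges_by_server[server][last]
        if holder in probe:                # merged knowledge contains a cycle
            return probe[probe.index(holder):] + [holder], messages
        probe = probe + [holder]           # extend the probe and forward it


edges = {"X": {"W": "U"}, "Y": {"U": "V"}, "Z": {"V": "W"}}
blocked = {"W": "X", "U": "Y", "V": "Z"}
# Server X sees W start waiting for U, and U is blocked at Y: initiate <W → U>.
print(chase(["W", "U"], edges, blocked))   # (['W', 'U', 'V', 'W'], 2)
```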
30Transaction priorities
- The problem of edge-chasing algorithm
- Concurrent initiation may cause more than one
transaction to be aborted
- Example
- The same cycle is detected at two servers
- Approach
- All transactions are totally ordered
- Probe initiation, probe forwarding and transaction
abort are conducted according to the order
31So, each transaction is given a priority
- Abort
- When a deadlock is detected, the transaction with
the lowest priority is selected to abort
- Initiation
- Detection is initiated only when a higher-priority
transaction starts to wait for a lower-priority
one
- Forward
- Probes are forwarded downhill, from transactions with
higher priorities to transactions with lower
priorities
- Example
- Set the priorities T > U > V > W
- Detection is initiated only when T begins to
wait for U
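The three priority rules can be written as small predicates. The numeric values in `PRIORITY` are an assumption chosen only to encode the order T > U > V > W from the example, and the function names are hypothetical.

```python
# Larger number means higher priority; the values are illustrative only.
PRIORITY = {"T": 4, "U": 3, "V": 2, "W": 1}


def should_initiate(waiter, holder, priority=PRIORITY):
    """Initiate detection only when a higher-priority transaction
    starts waiting for a lower-priority one."""
    return priority[waiter] > priority[holder]


def should_forward(holder, next_holder, priority=PRIORITY):
    """Forward a probe only downhill, from higher to lower priority."""
    return priority[holder] > priority[next_holder]


def choose_victim(cycle, priority=PRIORITY):
    """Abort the lowest-priority transaction in the detected cycle."""
    return min(cycle, key=lambda trans: priority[trans])


print(should_initiate("T", "U"))             # True
print(should_initiate("W", "T"))             # False
print(choose_victim(["T", "U", "W", "V"]))   # W
```

Because every server applies the same order, two servers that detect the same cycle pick the same victim.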
32A pitfall of the transaction priority scheme
- The pitfall
- Because detection is initiated according to
priority, some deadlocks will not be detected
- Example
- Resolution
- Probe queue
- Each coordinator saves copies of all the probes
received on behalf of each transaction in a probe
queue
- Forward the probe queue
- When a transaction starts waiting for an object,
it forwards the probes in its queue to the server
of the object, which propagates the probes on
downhill routes
33A pitfall of the transaction priority scheme
continued
- Example
- Priorities U > V > W
- Deadlock will be detected when W begins to wait for
U
35What is transaction recovery?
- Durability and failure atomicity
- Recovery
- Restoring the server with the latest committed
versions of its objects from permanent storage
- The task of the recovery manager
- Save objects in permanent storage (in a recovery
file) for committed transactions
- Restore the server's objects after a crash
- Reorganize the recovery file to improve the
performance of recovery
- Reclaim storage space (in the recovery file)
36Important components of a recovery file
- Intentions list
- Keep track of the objects accessed by
transactions
- An intentions list per active transaction
- Contains a list of the references and the values
of all the objects that are altered by the
transaction
- When the transaction is committed
- Replace the committed version of each object by
the tentative version
- When the transaction is aborted
- Delete the tentative versions
- Entries in recovery file
- Object values
- Transaction status: committed, aborted, prepared
- Intentions list
37Logging
- Recovery file
- A log containing the history of all the
transactions performed by a server
- Recovery manager is called when
- Prepare to commit
- Append all the objects in its intentions list to
the recovery file, followed by the current status
of that transaction (prepared) together with its
intentions list
- Commit or abort
- Append the corresponding status of the transaction to
the recovery file
- Example
- Each transaction status entry contains a pointer
to the previous transaction status entry
38Recovery of objects
- When a server is replaced after a crash
- Set default initial values for all objects, then
hand over to the recovery manager
- Recovery manager's task
- Include all the effects of all the committed
transactions performed in the correct order and
none of the effects of incomplete or aborted
transactions
- Two approaches
- Find the most recent checkpoint, and then replay all
committed transactions after the checkpoint with
the help of intentions lists and the committed
values of objects
- Read the recovery file backwards until all
objects have been restored to their most recent
committed values
39Recovery of objects - example
- If the server fails at the point reached at P7
- Restore by the second approach
- P7 is ignored
- P4 is committed, so find P3
- Restore A and B by the intention list of P3
- Restore C by P0
- Reorganize the recovery file
- Add an aborted transaction status to the recovery
file for transaction U
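The backward scan in this example can be sketched over an in-memory log. The positions P0 to P7 and the roles of T and U follow the example. The object values, the dictionary layout of each entry and the function `recover` are assumptions made for the illustration.

```python
# Hypothetical log contents: the positions match the example, the values are invented.
LOG = {
    "P0": {"type": "checkpoint", "values": {"A": 100, "B": 200, "C": 300}},
    "P1": {"type": "object", "value": 80},
    "P2": {"type": "object", "value": 220},
    "P3": {"type": "status", "trans": "T", "status": "prepared",
           "intentions": {"A": "P1", "B": "P2"}, "prev": "P0"},
    "P4": {"type": "status", "trans": "T", "status": "committed", "prev": "P3"},
    "P5": {"type": "object", "value": 278},
    "P6": {"type": "object", "value": 242},
    "P7": {"type": "status", "trans": "U", "status": "prepared",
           "intentions": {"C": "P5", "B": "P6"}, "prev": "P4"},
}


def recover(log, last_status, initial):
    """Follow the status-entry pointers backwards, restoring each object
    from the most recent committed transaction that altered it."""
    restored, committed = {}, set()
    pos = last_status
    while log[pos]["type"] == "status":
        entry = log[pos]
        if entry["status"] == "committed":
            committed.add(entry["trans"])
        elif entry["status"] == "prepared" and entry["trans"] in committed:
            for obj, value_pos in entry["intentions"].items():
                restored.setdefault(obj, log[value_pos]["value"])
        pos = entry["prev"]
    for obj, value in log[initial]["values"].items():
        restored.setdefault(obj, value)   # objects no committed transaction touched
    return restored


print(recover(LOG, "P7", "P0"))   # {'A': 80, 'B': 220, 'C': 300}
```

U's prepared entry at P7 is skipped because no committed entry for U was seen, so its tentative values for C and B never reach the restored state.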
40Reorganize the recovery file
- Checkpoint
- Checkpointing
- The process of writing the current committed
values of a server's objects to a new recovery
file, together with transaction status entries
and intentions lists of transactions that have
not yet been fully resolved
- Checkpoint
- The information stored by the checkpointing
process
- The purpose of making checkpoints
- Reduce the number of transactions to be dealt
with during recovery and reclaim file space
- When to make a checkpoint
- Immediately after recovery, or from time to time
- Recovery from the checkpoint
- Discard the old recovery file
41Shadow versions
- Map and Version store
- The map locates versions of the objects in a file
called a version store
- To restore objects, locate the objects in the
version store by the map
- When a transaction is prepared to commit
- Updated objects are appended to the version store
- Shadow versions: these new, as yet tentative,
versions
- When a transaction commits
- A new map is made by copying the old map and
entering the positions of the shadow versions
42Shadow versions continued
- Example
- Shadow version vs. logging
- Faster recovery
- The positions of the current committed objects
are recorded in the map
- Slower normal activity
- The switch from the old map to the new map must be
performed in a single atomic step, which requires
an additional stable storage write
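The map switch can be sketched with ordinary files. This is an illustration under assumptions, not the scheme of any particular system: the map is stored as JSON, each version is one JSON line in the version store, and the atomic step is a file rename via `os.replace`, which is atomic on POSIX file systems. The function names are hypothetical.

```python
import json
import os
import tempfile


def commit_shadow(map_path, store_path, updates):
    """Append shadow versions, then switch maps in one atomic step."""
    with open(map_path) as f:
        new_map = json.load(f)                # copy of the old map
    with open(store_path, "ab") as store:
        for name, value in updates.items():
            store.seek(0, os.SEEK_END)
            new_map[name] = store.tell()      # position of the shadow version
            store.write((json.dumps(value) + "\n").encode())
        store.flush()
        os.fsync(store.fileno())
    tmp_path = map_path + ".new"
    with open(tmp_path, "w") as f:            # the additional stable storage write
        json.dump(new_map, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, map_path)            # old map -> new map in one step
    return new_map


def read_object(map_path, store_path, name):
    """Locate an object's committed version through the map."""
    with open(map_path) as f:
        position = json.load(f)[name]
    with open(store_path, "rb") as store:
        store.seek(position)
        return json.loads(store.readline())


with tempfile.TemporaryDirectory() as d:
    map_path, store_path = os.path.join(d, "map.json"), os.path.join(d, "store")
    with open(map_path, "w") as f:
        json.dump({}, f)
    open(store_path, "wb").close()
    first_map = commit_shadow(map_path, store_path, {"A": 80})
    second_map = commit_shadow(map_path, store_path, {"A": 60, "B": 5})
    latest_a = read_object(map_path, store_path, "A")
```

A crash before the rename leaves the old map in place, so recovery simply reads the map and never has to scan a log.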
44Summary
- Flat and nested distributed transaction
- Two-phase commit protocol
- May take an unbounded amount of time to complete but
is guaranteed to complete eventually
- Concurrency control
- Locking
- Timestamp ordering
- Optimistic concurrency control
45Summary continued
- Distributed deadlock
- Edge-chasing algorithm
- Recovery
- Logging
- Shadow version
46Distributed transactions
47Nested banking transaction
48A distributed banking transaction
49Operations for two-phase commit protocol
- canCommit?(trans) → Yes / No
- Call from coordinator to participant to ask whether it can
commit a transaction. Participant replies with its vote.
- doCommit(trans)
- Call from coordinator to participant to tell participant to
commit its part of a transaction.
- doAbort(trans)
- Call from coordinator to participant to tell participant to
abort its part of a transaction.
- haveCommitted(trans, participant)
- Call from participant to coordinator to confirm that it has
committed the transaction.
- getDecision(trans) → Yes / No
- Call from participant to coordinator to ask for the
decision on a transaction after it has voted Yes
but has still had no reply after some delay. Used
to recover from server crash or delayed messages.
50Operations for two-phase commit protocol
Phase 1 (voting phase)
1. The coordinator sends a canCommit? request to each of the
participants in the transaction.
2. When a participant receives a canCommit? request it
replies with its vote (Yes or No) to the
coordinator. Before voting Yes, it prepares to
commit by saving objects in permanent storage. If
the vote is No the participant aborts immediately.
51Operations for two-phase commit protocol
Phase 2 (completion according to outcome of vote)
3. The coordinator collects the votes (including its own).
(a) If there are no failures and all the votes are Yes, the
coordinator decides to commit the transaction and
sends a doCommit request to each of the participants.
(b) Otherwise the coordinator decides to abort the
transaction and sends doAbort requests to all
participants that voted Yes.
4. Participants that voted Yes are waiting
for a doCommit or doAbort request from the
coordinator. When a participant receives one of
these messages it acts accordingly and, in the
case of commit, makes a haveCommitted call as
confirmation to the coordinator.
52Communication in two-phase commit protocol
Step | Process     | Status                                 | Message sent
1    | Coordinator | prepared to commit (waiting for votes) | canCommit?
2    | Participant | prepared to commit (uncertain)         | Yes
3    | Coordinator | committed                              | doCommit
4    | Participant | committed                              | haveCommitted
     | Coordinator | done                                   |
53Transaction T decides whether to commit
- T (top-level transaction)
- T1: provisional commit (at X)
- T11: abort (at M)
- T12: provisional commit (at N)
- T2: aborted (at Y)
- T21: provisional commit (at N)
- T22: provisional commit (at P)
54Transaction T decides whether to commit
Coordinator of transaction | Child transactions | Participant         | Provisional commit list | Abort list
T                          | T1, T2             | yes                 | T1, T12                 | T11, T2
T1                         | T11, T12           | yes                 | T1, T12                 | T11
T2                         | T21, T22           | no (aborted)        |                         | T2
T11                        |                    | no (aborted)        |                         | T11
T12, T21                   |                    | T12 but not T21     | T21, T12                |
T22                        |                    | no (parent aborted) | T22                     |
55canCommit? for hierarchic two-phase commit
protocol
canCommit?(trans, subTrans) → Yes / No
Call from a coordinator to the coordinator of a child
subtransaction to ask whether it can commit the
subtransaction subTrans. The first argument, trans,
is the transaction identifier of the top-level
transaction. Participant replies with its vote
Yes / No.
56canCommit? for flat two-phase commit protocol
canCommit?(trans, abortList) → Yes / No
Call from coordinator to participant to ask whether it
can commit a transaction. Participant replies
with its vote Yes / No. The abortList is used by
the coordinator of the participants to filter out
aborted subtransactions when multiple participants
share the same coordinator.
57Example of a distributed deadlock
T                                  | U
Write(A) at X: locks A             |
                                   | Write(B) at Y: locks B
Read(B) at Y: waits for U          |
                                   | Read(A) at X: waits for T
58Interleavings of transactions U, V and W
U                            | V                            | W
d.deposit(10): lock D        |                              |
                             | b.deposit(10): lock B at Y   |
a.deposit(20): lock A at X   |                              |
                             |                              | c.deposit(30): lock C at Z
b.withdraw(30): wait at Y    |                              |
                             | c.withdraw(20): wait at Z    |
                             |                              | a.withdraw(20): wait at X
59Distributed deadlock
[Figure: (a) objects A, B, C and D at servers X, Y and Z, with "held by" and "waits for" edges for transactions U, V and W; (b) the resulting wait-for cycle among U, V and W]
60Local and global wait-for graphs
[Figure: local wait-for graphs at servers X and Y, sent to the global deadlock detector]
Suppose U releases an object at X and requests the
one held by V at Y. Suppose the detector receives Y's
graph before X's. The cycle T → U → V → T would be detected.
61Edge chasing algorithm - example
[Figure: transactions U, V and W, with the probes <U → V> and <U → V → W>]
62Two probes initiated
[Figure: wait-for graphs among T, U, V and W, with priorities T > U > V > W]
(a) initial situation
(b) detection initiated at object requested by T
(c) detection initiated at object requested by W
63Probes travel downhill
64Types of entry in a recovery file
Type of entry      | Description of contents of entry
Object             | A value of an object.
Transaction status | Transaction identifier, transaction status (prepared, committed, aborted) and other status values used for the two-phase commit protocol.
Intentions list    | Transaction identifier and a sequence of intentions, each of which consists of <identifier of object>, <position in recovery file of value of object>.
65Log for banking service
66Shadow versions
67Validation of transactions
68[Figure: transactions U, V and W, with the probes <U → V>, <V → W>, <U → V → W> and <V → W → U>]
Under the priority rule, the probe <W → U> will not be sent, so the deadlock
will not be detected