Distributed Transaction Management - PowerPoint PPT Presentation

1
Distributed Transaction Management
  • Jyrki Nummenmaa
  • jyrki.nummenmaa_at_cs.uta.fi

2
Motivation
  • We will pick up some motivating examples from the
    world of electronic commerce.
  • As we will see, electronic commerce is an
    application area, where transactionality is
    needed, and the application programmers in charge
    of this need good knowledge on distributed
    transactions.
  • The following slides will discuss those
    examples and some of their implications.

3
Electronic commerce - business-to-customer services
  • Searching for product information
  • Ordering products
  • Paying for goods and services
  • Providing online customer service
  • Delivering services
  • Various other business-to-business services
    exist, but these are enough for our motivational
    purposes...

4
Internet Commerce
  • A person, running a web browser on a desktop
    computer, electronically purchases a set of goods
    or services from several vendors at different web
    sites.
  • This person wants either the complete set of
    purchases to go through, or none of them.

5
Internet Commerce Example
[Figure: exhibition hall and rental companies' web sites]

6
Technical Problems with Internet Commerce
  • Security
  • Failure
  • Multiple sites
  • Protocol problems
  • Server product limitations
  • Response time
  • Heterogeneous systems

7
Failures: single computer
  • Hardware failure
  • Software crash
  • User switched off the PC
  • Active attack

8
Failure: Additional Problems for Multiple Sites
  • Network failure
  • Or is it just congestion?
  • Or has the remote computer crashed?
  • Or is it just running slowly?
  • Message loss?
  • Denial-of-service attack?
  • Typically, these failures are partial.

9
Distributed Transaction
  • A set of participating processes with local
    sub-transactions, distributed to a set of sites,
    perform a set of actions.
  • Server Autonomy - any server can unilaterally
    decide to abort the transaction.
  • All or none of the updates or related operations
    should be performed.

10
Subtle Difference: Transaction
  • Traditional data processing (database)
    transaction
  • set of read and update operations collectively
    transform the database from one consistent state
    to another.
  • Electronic Commerce transaction
  • set of (any) operations collectively provide the
    user with his/her required package

11
Distributed business object transaction example
  • Arriving at a football stadium by car, the
    customer uses a mobile terminal to buy the ticket
    and get a parking place.
  • Business objects to:
  • Charge the money from a bank account
  • Give access to parking
  • Entrance to stadium (writing tickets for
    collection at a collection point or just giving a
    digital reservation document).

12
Distributed business object transaction example
(contd)
(Arriving at a football stadium)
  • Why is transactionality needed?
  • All-or-nothing situation? Maybe...
  • Compensational transactions are difficult - e.g.
    once access is given to car park, that is
    difficult to roll back.

13
Transaction properties - Atomicity
  • Atomicity
  • Ensures that if several different operations
    occur within a single transaction, it can never
    be the case that some operations complete if
    others cannot complete.
  • Classic example: transferring money from one bank
    account to another consists of a withdraw and a
    deposit operation; we want both or neither.

14
Transaction properties - Isolation
  • Isolation
  • Ensures that concurrently-executing transactions
    do not interfere with each other, in the sense
    that each transaction sees a consistent state of
    the data (often a database).
  • Prevents the use of dirty data.
  • Classic example (based on the previous one): the
    overall sum of money on the two bank accounts
    should not be computed while the txn is running.

15
Transaction properties - Durability
  • Durability
  • Ensures that unless an update transaction is
    rolled back, its changes will affect the
    state of the data as seen by subsequently
    executing transactions.
  • Data is recorded persistently.

16
Typical system architecture
  • Front-tier clients
  • e.g. web browsers.
  • Back-tier servers
  • such as database systems, message queue managers,
    device drivers, ...
  • Middle-tier business objects
  • each typically serving one client using (and
    locking) a number of shared resources from a
    number of back-tier servers.

17
Traditional distributed DBMS system architecture
  • Computers are hard-wired to each other.
  • In practice a synchronous system, where a message
    timeout means that a computer has crashed.
  • A transparent centralised database management
    system, which the user can see as a single
    database.
  • An application program can use the database as a
    single database, thus benefitting from
    transparency.

18
Main transactional services
  • Distributed locking is needed, if replicated data
    is needed for exclusive (write) access.
  • Distributed commit is needed to decide the fate
    of the transaction in a controlled manner.
  • Barrier synchronisation can be used to guarantee
    a consistent view of the world.

19
Implementing transactional services
  • As we noticed, a traditional distributed database
    system gives a transparent view to the system. It
    also takes care of concurrency.
  • In a modern distributed system, the application
    programmer needs to implement a large part of
    transactional services.
  • These services are complicated, and their
    implementation is far from being easy.

20
Transaction Model
  • - We will quite often write txn instead of
    transaction.

21
Txn model - sites
  • We assume that there is a set of sites S1,...,Sn.
  • All of these sites have a resource manager
    controlling the usage of the local resources.
  • We may know all of these sites before the txn
    starts (like a site for each bookstore
    sub-branch) or we may not (like when
    previously unknown sites from the Internet
    join in).

22
Txn model - subtxns
  • The txn needs to access resources on some of
    these sites (without loss of generality, all of
    them).
  • For this, there is a local transaction on each
    site (transaction Ti on site Si).
  • The local transaction executes the operations
    required on the local site.
  • To use the local resources, the local transaction
    Ti on site Si talks with the local resource
    manager Mi.

23
Distributed Transactions
  • In a distributed transaction there is a set of
    subtransactions T1,...,Tk, which are executed on
    sites S1,...,Sk.
  • Each subtransaction manages local resources. The
    particular problems of managing distributed
    transactions vs. centralised (local) transactions
    come from two sources
  • Data may be replicated to several sites. Lock
    management of the replicated data is a particular
    problem.
  • Regardless of whether the data is replicated or
    not, there is a need to control the fate of the
    distributed transaction using a distributed
    commit protocol.

24
Failure model - sites
  • Sites may fail by crashing, that is, they fail
    completely.
  • Sometimes it is assumed that crashed sites may
    recover. In this case usually the resource
    managers and the participants have recorded their
    actions in persistent memory.
  • Sometimes it is assumed that the crashed sites do
    not recover.
  • Usual assumption: if a site functions, it
    functions correctly (instead of e.g. sending
    erroneous messages).

25
Failure model - messages
  • Messages may be delayed.
  • Message transfer delays are unpredictable
    (asynchronous message-passing)
  • Messages are transferred eventually.
  • Messages between sites are not spontaneously
    generated.
  • Messages do not change in transmission.

26
Failure model - messages
  • All messages arriving at a site Si from a site Sj
    are processed in the order they were sent.
  • It may be that the network is partitioned, that
    is, some sites can not exchange messages. This
    may continue for an unpredictable time.
  • This assumption is by default avoided, since
    it is a really hard one.
  • We will state it explicitly if we want it to
    hold.
  • However, in the real world this happens.

27
Asynchronous communication
  • In a synchronous system, we assume that the
    relative speeds of processes and communication
    delays are bounded.
  • In an asynchronous system we do not make such an
    assumption. This means that not receiving an
    expected message does not mean a failure.
  • Generally, we assume here that we are dealing
    with an asynchronous system.

28
Failure detection
  • Failure is hard to detect.
  • Typically, failure is assumed, if an expected
    message does not arrive within the usual time
    period.
  • Timeouts are used.
  • Delay may be caused by network congestion.
  • Or is the remote computer running slowly?
  • Mobile hosts make failure detection even harder,
    because it is normal behaviour for them to stay
    disconnected for unpredictable periods.

29
Distributed Locking

30
Mutual Exclusion (Locking)
  • The problem of managing access to a single,
    indivisible resource (e.g. a data item) that can
    only support one user (or transaction, or
    process, or thread, or whatever) at a time.

31
Desired properties for solutions
  • Safety: Mutual exclusion is never violated. (Only
    one transaction gets the lock.)
  • This property can not be compromised.
  • Liveness: Each request will be granted
    (eventually).
  • This property should not be compromised.
  • Fairness: Access to the resource should happen in
    the order of requests.
  • This property is to be discussed later.

32
Coordinator-based solutions
  • There is a coordinator to control access.
  • Coordinator is a process on one of the sites. (It
    is none of the transactions.)
  • Let A be a data item.
  • When a transaction needs access, that transaction
    sends a request to the coordinator. The request is
    X(A): exclusively lock A.
  • The coordinator queues requests.

33
Coordinator-based solutions
  • When the resource is available, the coordinator
    sends a grant message to the transaction T first
    in the queue. We write G(X(A)): grant X(A).
  • When T sees the grant message, it may use the
    resource.
  • When T does not need the resource anymore, it
    sends a release message to the coordinator. We
    write R(A): release A.
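
The request, grant and release steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one data item A, no failures, and direct method calls standing in for network messages. The class name LockCoordinator and the `sent` log are hypothetical and not part of the slides.

```python
from collections import deque


class LockCoordinator:
    """Grants exclusive access to a single data item A, one transaction at a time."""

    def __init__(self):
        self.holder = None    # transaction currently holding the lock, if any
        self.queue = deque()  # pending X(A) requests in arrival order
        self.sent = []        # log of grant messages (hypothetical, for illustration)

    def request(self, txn):
        """Handle an X(A) message from txn: queue it and grant if A is free."""
        self.queue.append(txn)
        self._grant_next()

    def release(self, txn):
        """Handle an R(A) message from txn: free A and grant to the next in line."""
        if txn != self.holder:
            raise ValueError(f"{txn} does not hold the lock")
        self.holder = None
        self._grant_next()

    def _grant_next(self):
        # Send G(X(A)) to the first queued transaction when the resource is available.
        if self.holder is None and self.queue:
            self.holder = self.queue.popleft()
            self.sent.append(("G(X(A))", self.holder))


c = LockCoordinator()
c.request("T1")   # T1 gets the grant immediately
c.request("T2")   # T2 waits in the queue
c.request("T3")   # T3 waits behind T2
c.release("T1")   # the grant passes to T2
```

Because requests are served in the order they arrive at the coordinator, the queue is what gives this scheme its form of fairness.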

34
An example
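[Figure: message sequence diagram between three transactions T and the coordinator C, showing the coordinator's lock request list over time. Each transaction sends X(A), receives G(X(A)) in turn, and then sends R(A).]
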
35
Coordinator-based solutions / properties
  • These coordinator-based solutions obviously have
    the safety and the liveness properties, if the
    coordinator is correctly implemented.
  • We can argue that they are also fair, since
    requests are queued. The ordering (and fairness)
    only takes place at the coordinator's site
    (request arrival, not request departure). More on
    that later.
  • Since lock management is centralised, different
    lock types need no special attention.

36
Coordinator-based solutions / weaknesses
  • The system does not tolerate a crashing
    coordinator.
  • The coordinator may become a bottleneck for
    performance.
  • Suppose data is replicated, there is a local
    copy, and the coordinator is not on the local
    site. Then we always need to communicate over the
    network, which reduces the benefits of having a
    local copy.

37
Primary copy for replicated data
  • If data is not replicated, then to use a data
    item, you must contact the site containing the
    item.
  • If the resource manager at that site acts as the
    coordinator giving locks for its items,
    communication is simple.
  • If the data is replicated, then we can have a
    primary copy, which is accessed for locking.
    The resource manager at the site of the primary
    copy is the coordinator.

38
Token-based algorithms for resource management
  • In the token-based algorithms, there is a token
    to represent the permission.
  • Whoever has the token, has the permission, and
    can pass it on.
  • These algorithms are more suitable for sharing a
    resource like a printer, a car park gate, etc.
    than for a huge database. Let's see why.

39
Perpetuum mobile
  • The token travels around (say, a ring).
  • When a process receives the token, it may use the
    resource, if it so wishes.
  • Then the process passes the token on.

[Figure: the token travelling around a ring of processes]

40
Token-asking algorithms
  • The token does not travel around if it is not
    needed.
  • When a process needs the token, it asks for it.
  • Requests are queued.

41
Analysis of token-based algorithms
  • Safety: ok.
  • Liveness: ok.
  • Fairness: in a way ok.
  • Drawbacks:
  • Token-based algorithms are vulnerable to
    single-site failures
  • Token management may be complicated and/or
    consume lots of resources, if there are lots of
    resources to be managed.

42
Voting-based algorithms
  • We assume here that we know a set of resource
    managers (say, M1,...,Mn), which hold a replicated
    data item.
  • When transaction T needs access to the shared
    resource, it will send a message to M1,...,Mn
    asking for the permission.
  • Each of M1,...,Mn will answer G(X(A)), meaning a
    Yes vote, or N(X(A)), meaning a No vote.
  • T waits until the replies are in.
  • If there are enough Yes votes, T will get the
    lock.

43
A voting example
[Figure: voting example with transactions T1, T2 and resource managers M1, M2, exchanging X(A), G(X(A)) and N(X(A)) messages]

44
How many votes do you need?
  • Suppose we have n resource managers, and we want
    k Yes votes for an exclusive lock (write-lock)
    and m Yes votes for a shared lock (read-lock).
  • To avoid two simultaneous exclusive locks, we
    must have k > n/2.
  • To avoid simultaneously having an exclusive and a
    shared lock, we must have k + m > n.
  • If read-operations dominate, then we may choose
    m = 1 and k = n.
  • Notice that we may choose to consult more
    resource managers than the above minimum number.
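
The two conditions on the vote counts can be checked mechanically. The following is a small sketch; the function name quorum_sizes_are_safe is hypothetical, and n, k and m have the meaning given on this slide.

```python
def quorum_sizes_are_safe(n, k, m):
    """Return True if k (exclusive) and m (shared) votes out of n preserve mutual exclusion."""
    # Two exclusive quorums must overlap in at least one resource manager: k > n/2.
    two_writers_impossible = 2 * k > n
    # An exclusive and a shared quorum must overlap as well: k + m > n.
    reader_and_writer_impossible = k + m > n
    return two_writers_impossible and reader_and_writer_impossible


read_dominated = quorum_sizes_are_safe(n=3, k=3, m=1)   # m = 1, k = n
majority_both = quorum_sizes_are_safe(n=5, k=3, m=3)    # simple majorities
split_writers = quorum_sizes_are_safe(n=4, k=2, m=3)    # two disjoint write quorums fit
stale_reader = quorum_sizes_are_safe(n=5, k=3, m=2)     # a read quorum can miss the writer
```

Only the first two configurations satisfy both inequalities; the last two each violate one of them.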

45
Which resource managers to consult?
  • In principle, it could be enough to ask only a
    subset (like a majority) of processes for a
    permission.
  • This subset could be statically defined, given
    a data item.
  • However, as it might be advantageous to contact
    near-by resource managers, the set may well
    depend on who is asking.

46
Example
  • Suppose we operate an airline with offices (and
    resource managers) in Tampere, Santiago de Chile
    and London.
  • It seems reasonable to replicate timetables and
    use m = 1, k = n, since that information does not
    change that often.
  • For ticket booking, primary copy may be more
    appropriate. By statistical analysis we may get
    to know, where people (geographically) book which
    flights, to choose the placement of each primary
    copy.

47
Who needs to give permission?
  • If we need a permission from all resource
    managers, then we do not tolerate site failures
    (all the downsides of having a coordinator plus
    all the extra effort of contacting all the
    resource managers).
  • Generally, a majority (of all resource managers)
    is enough.
  • There are also ways other than simple majority or
    unanimous vote, but one has to be careful to
    preserve the mutual exclusion.

48
A problematic voting
[Figure: problematic voting with transactions T1, T2 and resource managers M1, M2, exchanging X(A), G(X(A)), N(X(A)) and R(A) messages. Now what?]

49
Analysis for voting
  • Safety
  • Apparently ok, if the numbers are chosen
    appropriately.
  • Liveness
  • So far there is nothing to stop the situation on
    the previous slide repeating over and over.
  • Liveness is not guaranteed unless we make some
    improvements.
  • Fairness
  • Nothing appears to guarantee fairness at this
    point.
  • -> Further improvements are necessary.

50
How to re-start after not getting a lock?
  • Apparently, something needs to be done to avoid
    repeating the situation where no-one gets the
    lock.
  • If we re-start requesting locks, we can tell
    younger transactions to wait longer before
    re-starting.
  • However, new transactions may always step in to
    stop the oldest transaction from getting the lock
    -> this is not the solution.

51
Queueing the requests?
  • Instead of just answering the lock requests, the
    resource managers can also maintain a lock
    request list.
  • Put the oldest transaction T first in the list
    and answer no-one Yes before T has either got and
    released the lock or canceled the lock request.
  • Now, eventually T should get the lock and we are
    able to get liveness (and some sort of fairness
    as well, although maybe not exactly what we want).

52
Using timestamps: basic idea
  • Give each transaction a timestamp.
  • Execute the transaction's reads and writes.
  • If there is a conflict (impossible event compared
    to serial execution based on timestamps), roll
    back the younger transaction, which is then free
    to restart.

53
Using timestamps: examples
  • T1 starts
  • T2 starts
  • T2 writes X
  • T1 is to read X: conflict, as T2 should not yet
    have written this value!
  • Roll back T2, if it still exists. Otherwise roll
    back T1.
  • Multiversioning solves this.
  • T1 starts
  • T2 starts
  • T2 reads X
  • T1 is to write X: conflict, as T2 should have
    read this new value!
  • Roll back T2, if it still exists. Otherwise roll
    back T1.
  • Multiversioning does not solve this!
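
The two conflicts above can be detected by remembering, for each data item, the largest timestamp that has read it and the largest that has written it. The sketch below assumes this common bookkeeping scheme, which is not necessarily the one the slides have in mind. The names Item, read_ts and write_ts are hypothetical, and the rollback itself is left out.

```python
class Item:
    """A data item with the timestamps needed to detect out-of-order accesses."""

    def __init__(self):
        self.read_ts = 0    # largest timestamp of any txn that has read the item
        self.write_ts = 0   # largest timestamp of any txn that has written the item


def read_conflicts(item, ts):
    """A read by the txn with timestamp ts conflicts if a younger txn already wrote."""
    if ts < item.write_ts:
        return True
    item.read_ts = max(item.read_ts, ts)
    return False


def write_conflicts(item, ts):
    """A write by the txn with timestamp ts conflicts if a younger txn already read or wrote."""
    if ts < item.read_ts or ts < item.write_ts:
        return True
    item.write_ts = ts
    return False


# First example: T1 (timestamp 1) and T2 (timestamp 2) start, T2 writes X, T1 reads X.
x = Item()
first = (write_conflicts(x, 2), read_conflicts(x, 1))

# Second example: T2 reads X, then T1 is to write X.
y = Item()
second = (read_conflicts(y, 2), write_conflicts(y, 1))
```

In both runs the first access succeeds and the second one, by the older transaction T1, is flagged as a conflict.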

54
Distributed timestamps?
  • Can be used similarly to centralised timestamps,
    with the exception that we must be able to order
    timestamps globally.
  • Old trick: clock time + site id. If local clock
    times are equal, use the site id to solve ties.
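
A minimal sketch of this trick, assuming integer clock readings and integer site ids (both made up for the example): represent each timestamp as a pair and let tuple comparison do the tie-breaking.

```python
def make_timestamp(clock_time, site_id):
    """Pair the local clock reading with the site id; tuples compare element by element."""
    return (clock_time, site_id)


ts_a = make_timestamp(1005, 2)
ts_b = make_timestamp(1005, 1)   # same clock reading as ts_a, smaller site id
ts_c = make_timestamp(1004, 3)

ordered = sorted([ts_a, ts_b, ts_c])
```

Since site ids are unique, no two timestamps compare equal, so the order is total.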

55
Ordering things
  • Fairness in both the coordinator-based and
    voting-based protocol as well as timestamping
    seems to depend on ordering the transactions by
    their age.
  • However, we would need synchronised clocks to do
    this. Perfect synchronisation of clocks is not
    possible. Good synchronisation can sometimes be
    assumed.
  • Next time we will study logical ordering of events
    and possibly deadlock management.

56
Physical clock synchronisation
57
Coordinated universal time
  • Atomic clocks based on atomic oscillations are
    the most accurate physical clocks.
  • So-called Coordinated Universal Time based on
    atomic time is signaled from radio stations and
    satellites.
  • You can buy a receiver (maybe not more than 100,
    I had a look at the web) and get accuracy in the
    order of 0.1-10 milliseconds.

58
Reasons for and problems in clock synchronisation
  • Different clocks work at different speeds.
    Therefore, they need to be synchronised at times
    (continuously).
  • Message delay can not be known, but must be
    approximated -> perfect synchronisation can not
    be achieved.
  • Clock skew: difference in simultaneous readings.
  • Clock drift: divergence of clocks because of
    different clock speeds.

59
External and Internal Synchronisation
  • External synchronisation of clock C is
    synchronisation with some external source E. If
    |C - E| < d, then C is accurate (with respect to E)
    within the bound d.
  • Internal synchronisation is synchronisation of
    clocks C and C' between themselves. If |C - C'| < d,
    then C and C' agree within the bound d. C and C'
    may drift from an external source, but not from
    each other.

60
Cristian's synchronisation method
  • A clock at site S is synchronised with a clock at
    site S' by sending a request MR to S' and
    receiving a time message MT from S' containing
    time t.
  • Round-trip time tR is the time between sending MR
    and receiving MT. This is a small time and can be
    measured fairly accurately.
  • A simple estimate: S will set its clock to
    t + tR/2.

61
Accuracy of Cristian's synchronisation
  • Assume min is the shortest time for a message to
    travel between S and S' (this must be approximated).
  • When MT arrives at S, the clock of S' will read
    in the range [t + min, t + tR - min]. This range has
    width tR - 2 min.
  • We set the clock of S to t + tR/2.
  • -> Accuracy is plus/minus (tR/2 - min).
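
As a worked instance of the estimate and its accuracy bound, here is a short sketch. The function name and the sample numbers (server time 1000, round trip 20, minimum delay 4, all in the same unit) are made up for illustration.

```python
def cristian_estimate(t, round_trip, min_delay):
    """Return (new clock value, accuracy bound) using Cristian's method.

    t          -- time reported in the server's reply MT
    round_trip -- measured time tR between sending MR and receiving MT
    min_delay  -- assumed shortest one-way message delay
    """
    estimate = t + round_trip / 2
    accuracy = round_trip / 2 - min_delay
    return estimate, accuracy


estimate, accuracy = cristian_estimate(t=1000.0, round_trip=20.0, min_delay=4.0)
lower, upper = estimate - accuracy, estimate + accuracy
```

With these numbers the server clock lies somewhere in [1004, 1016] when the reply arrives, a range of width tR - 2 min = 12, and the estimate 1010 sits in its middle.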

62
Problems and improvements
  • Problem: A single source for time.
  • Improvement: Poll several servers and e.g. use
    the fastest reply.
  • Problem: Faulty time servers.
  • Improvement: Poll several servers and use
    statistics.

63
Further improvements
  • Berkeley time protocol: internal synchronisation
    with a server polling a number of slaves,
    using an average of the estimates and sending the
    necessary correction to the slaves.
  • The Network Time Protocol: a hierarchy of
    servers. Top level: UTC; the second level
    synchronises with the top level, and so on. More
    details at http://www.ntp.org.
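
The averaging step of the Berkeley approach can be sketched as follows. This is a simplification that assumes the server already holds a clock estimate for every site. It ignores round-trip compensation and the handling of faulty clocks, and the function name and sample values are hypothetical.

```python
def berkeley_corrections(clock_estimates):
    """Map each site to the correction that moves its clock to the average."""
    average = sum(clock_estimates.values()) / len(clock_estimates)
    return {site: average - value for site, value in clock_estimates.items()}


corrections = berkeley_corrections(
    {"server": 100.0, "a": 104.0, "b": 96.0, "c": 108.0}
)
```

Sending a correction rather than an absolute time means the delay of the correction message itself does not distort the result.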

64
Applications of clocks
  • Clocks are needed in timestamp concurrency
    control to generate the timestamps!
  • If we are satisfied with clock accuracy (and
    accept the clock skew) then we can use the
    physical clock time stamps.
  • If not, then logical ordering of events needs to
    be used.

65
Logical clocks
66
Logical order
  • Using physical clocks to order events is
    problematic, because we can not completely
    synchronise the clocks.
  • An alternative solution: use a logical
    (causality) order.

67
What input to use to compute a logical order?
  • If e1 happens before e2 on site S, then we write
    e1 <S e2.
  • If e1 is the sending of message m on some site
    and e2 is the receiving of message m on some
    site, then we write e1 <m e2.

68
The happens-before relation
  • The happens-before relation is denoted by <H.
  • If e1 <S e2, then e1 <H e2.
  • If e1 <m e2, then e1 <H e2.
  • If e1 <H e2 and e2 <H e3, then e1 <H e3.
  • If the happens-before relation does not order two
    events, we call them concurrent.

69
Happens-before example
  • e1 <S1 e2
  • e2 <S1 e3
  • e3 <S1 e4
  • e5 <S2 e6
  • e6 <S2 e7
  • e7 <S2 e8
  • e1 <m1 e5
  • e3 <m2 e8
  • e7 <m3 e4
  • Plus the transitive closure

70
The happens-before graph
  • The vertices of the happens-before graph are the
    events in the system.
  • The edges are obtained as follows: if e1 <S e2
    or e1 <m e2, then there is an edge in the
    happens-before graph from e1 to e2.
  • The closure of the happens-before graph
    represents the happens-before relation.
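
To make the closure concrete, the sketch below builds the graph from the nine relations listed in the happens-before example and computes its transitive closure with a naive fixed-point loop. The function names are hypothetical, and the loop is chosen for readability rather than efficiency.

```python
def transitive_closure(edges):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


edges = {
    ("e1", "e2"), ("e2", "e3"), ("e3", "e4"),   # order on site S1
    ("e5", "e6"), ("e6", "e7"), ("e7", "e8"),   # order on site S2
    ("e1", "e5"), ("e3", "e8"), ("e7", "e4"),   # messages m1, m2, m3
}
hb = transitive_closure(edges)


def concurrent(a, b):
    """Two events are concurrent if the relation orders them in neither direction."""
    return (a, b) not in hb and (b, a) not in hb
```

For instance, e1 precedes e8 through the chain e1, e5, e6, e7, e8, while e2 and e5 are concurrent because no path connects them in either direction.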

71
Happens-before graph example
[Figure: happens-before graph over the events e1, ..., e8]
The transitive closure represents full information on the logical order.

72
Lamport timestamps
  • Initially, assign 0 to myTS.
  • If event e is the receipt of a message m,
    then: assign max(m.TS, myTS) to myTS. Add 1 to
    myTS. Assign myTS to e.TS.
  • If event e is the sending of a message m,
    then: add 1 to myTS. Assign myTS to both e.TS
    and m.TS.
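
The rules above translate almost line by line into code. The following is a minimal sketch for two processes, where the class name LamportClock and the variable names are hypothetical and messages are modelled simply by passing the timestamp along.

```python
class LamportClock:
    """Keeps the local counter myTS and applies the send and receive rules."""

    def __init__(self):
        self.my_ts = 0   # initially, assign 0 to myTS

    def send(self):
        """Sending event: add 1 to myTS and stamp both the event and the message."""
        self.my_ts += 1
        return self.my_ts

    def receive(self, msg_ts):
        """Receipt event: take the maximum of m.TS and myTS, then add 1."""
        self.my_ts = max(msg_ts, self.my_ts) + 1
        return self.my_ts


p, q = LamportClock(), LamportClock()
m1 = p.send()        # P sends m1
r1 = q.receive(m1)   # Q receives m1
m2 = q.send()        # Q sends m2
m3 = p.send()        # P sends m3 before m2 arrives
r2 = p.receive(m2)   # P receives m2
```

Each receipt gets a timestamp larger than that of the matching send, which is exactly what the happens-before order requires.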

73
Find the logical order of events.
[Figure: four processes T exchanging messages m1, ..., m9]

74
Use Lamport timestamps
[Figure: the same four processes T and messages m1, ..., m9, with the events labelled by Lamport timestamps from 1 to 14]

75
Lamport timestamps - properties
  • Lamport timestamps guarantee that if e <H e', then
    e.TS < e'.TS. This follows from the definition
    of the happens-before relation by observing the path
    of events from e to e'.
  • Lamport timestamps do not guarantee that if e.TS
    < e'.TS, then e <H e' (why?).
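
One way to see why the converse fails is a small counterexample, sketched here under the assumption of two sites P and Q (hypothetical names) that never exchange a message with each other.

```python
def lamport_send(my_ts):
    """Apply the sending rule: add 1 to the local timestamp."""
    return my_ts + 1


# Site P performs one send event e; site Q independently performs two send events,
# the second of which is e'. No message travels between P and Q.
e_ts = lamport_send(0)                      # e.TS on P
e_prime_ts = lamport_send(lamport_send(0))  # e'.TS on Q

timestamps_ordered = e_ts < e_prime_ts
```

Here e.TS is smaller than e'.TS, yet there is no path of local steps and messages from e to e', so the two events are concurrent.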