1
Distributed Concurrency Control
2
Motivation
  • World-wide telephone system
  • World-wide computer network
  • World-wide database system
  • Collaborative projects: the project has a
    database composed of the smaller local databases
    of each researcher
  • A travel company organizing vacations: it
    consults local subcontractors (local companies),
    which list prices and quality ratings for
    hotels, restaurants, and fares
  • A library service: people looking for articles
    query two or more libraries

3
Types of distributed systems
  • Homogeneous federation: servers participating in
    the federation are logically part of a single
    system; they all run the same suite of protocols,
    and they may even be under the control of a
    master site
  • A homogeneous federation is characterized by
    distribution transparency
  • Heterogeneous federation: servers participating
    in the federation are autonomous and
    heterogeneous; they may run different protocols,
    and there is no master site

4
Types of transactions and schedules
  • Local transactions
  • Global transactions

5
Concurrency Control in Homogeneous Federations
6
Preliminaries
  • Let the federation consist of n sites, and let
    T = {T1, ..., Tm} be a set of global transactions
  • Let s1, ..., sn be local schedules
  • Let D = D1 ∪ ... ∪ Dn, where Di is the local
    database at site i
  • We assume no replication (each replica is treated
    as a separate data item)
  • A global schedule for T and s1, ..., sn is a
    schedule s for T such that its local projection
    equals the local schedule at each site, i.e.
    πi(s) = si for all i, 1 ≤ i ≤ n

7
Preliminaries
  • πi(s) denotes the projection of the schedule s
    onto site i
  • We call the projection of a transaction T onto
    site i a subtransaction of T (Ti); it comprises
    all steps of T at site i
  • Global transactions formally have to have Commit
    operations at all sites at which they are active
  • Conflict serializability: a global (local)
    schedule s is globally (locally) conflict
    serializable if there exists a serial schedule
    over the global (local) (sub-)transactions that
    is conflict equivalent to s

8
Example 1
  • Consider a federation of two sites, where D1 =
    {x} and D2 = {y}. Then s1 = r1(x) w2(x) and
    s2 = w1(y) r2(y) are local schedules, and
  • s = r1(x) w1(y) w2(x) c1 r2(y) c2
  • is a global schedule
  • π1(s) = s1 and π2(s) = s2 (see the projection
    sketch below)
  • Another form of the schedule:
  • server 1: r1(x) w2(x)
  • server 2: w1(y) r2(y)
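
A minimal sketch of the projection (the data layout and
names are illustrative, not from the slides): keep the
operations on items stored at site i, plus the commits
of transactions that were active there.

    # Sketch: project a global schedule onto each site.
    SITES = {1: {"x"}, 2: {"y"}}          # D1 = {x}, D2 = {y}
    s = [("r", 1, "x"), ("w", 1, "y"), ("w", 2, "x"),
         ("c", 1, None), ("r", 2, "y"), ("c", 2, None)]

    def project(schedule, site):
        items = SITES[site]
        active = {t for a, t, x in schedule if x in items}
        return [(a, t, x) for a, t, x in schedule
                if x in items or (a == "c" and t in active)]

    print(project(s, 1))   # pi_1(s): r1(x) w2(x) c1 c2
    print(project(s, 2))   # pi_2(s): w1(y) r2(y) c1 c2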

9
Example 2
  • Consider a federation of two sites, where D1 =
    {x} and D2 = {y}. Assume the following schedule:
  • server 1: r1(x) w2(x)
  • server 2: r2(y) w1(y)
  • The schedule is not conflict serializable, since
    the conflict serialization graph has a cycle (a
    sketch of the check follows)
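
A small sketch of the check (illustrative encoding):
build an edge Ti → Tj whenever an operation of Ti
precedes a conflicting operation of Tj (same item,
different transactions, at least one write), then test
the graph for a cycle.

    # Operations of Example 2; cross-item pairs never conflict.
    ops = [("r", 1, "x"), ("w", 2, "x"),     # server 1
           ("r", 2, "y"), ("w", 1, "y")]     # server 2

    edges = set()
    for i, (a1, t1, x1) in enumerate(ops):
        for a2, t2, x2 in ops[i + 1:]:
            if x1 == x2 and t1 != t2 and "w" in (a1, a2):
                edges.add((t1, t2))          # t1's op comes first

    def has_cycle(edges):
        def reach(u, v, seen=()):
            return any(w == v or (w not in seen and
                                  reach(w, v, seen + (u,)))
                       for (s, w) in edges if s == u)
        return any(reach(v, u) for (u, v) in edges)

    print(sorted(edges))     # [(1, 2), (2, 1)]: T1 -> T2 -> T1
    print(has_cycle(edges))  # True: not conflict serializable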

10
Global conflict serializability
  • Let s be a global schedule with local schedules
    s1, s2, ..., sn involving a set T of transactions
    such that each si, 1 ≤ i ≤ n, is conflict
    serializable. Then the following holds:
  • s is globally conflict serializable iff there
    exists a total order < on T that is consistent
    with the local serialization orders of the
    transactions (proof)

11
Concurrency Control Algorithms
  • Distributed 2PL locking algorithms
  • Distributed T/O algorithms
  • Distributed optimistic algorithms

12
Distributed 2PL locking algorithms
  • The main problem: how to determine that a
    transaction has reached its lock point?
  • Primary site 2PL: lock management is done
    exclusively at a distinguished site, the primary
    site
  • Distributed 2PL: when a server wants to start the
    unlocking phase for a transaction, it
    communicates with all other servers regarding the
    lock point of that transaction
  • Strong 2PL: all locks acquired on behalf of a
    transaction are held until the transaction wants
    to commit (2PC)

13
Distributed T/O algorithms
  • Assume that each local site (scheduler) executes
    its private T/O protocol for synchronizing
    accesses in its portion of the database
  • server 1: r1(x) w2(x)
  • server 2: r2(y) w1(y)
  • If timestamps were assigned as in the centralized
    case, each of the two servers would assign the
    value 1 to the first transaction that it sees
    locally (T1 on server 1 and T2 on server 2),
    which would lead to a globally incorrect result

14
Distributed T/O algorithms
  • We have to find a way to assign globally unique
    timestamps to transactions at all sites
  • Centralized approach: a particular server is
    responsible for generating and distributing
    timestamps
  • Distributed approach: each server generates a
    unique local timestamp using a clock or counter
    and pairs it with its site number (see the
    sketch below)
  • server 1: r1(x) w2(x)
  • server 2: r2(y) w1(y)
  • TS(T1) = (1, 1)
  • TS(T2) = (1, 2)
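
A minimal sketch of the distributed approach (class and
method names are assumptions): each site pairs its local
counter with its site number; tuples compare
lexicographically, so the unique site number breaks ties.

    # Sketch: globally unique timestamps (local counter, site id).
    class Site:
        def __init__(self, site_id):
            self.site_id, self.counter = site_id, 0

        def new_timestamp(self):
            self.counter += 1
            return (self.counter, self.site_id)

    s1, s2 = Site(1), Site(2)
    ts1 = s1.new_timestamp()   # TS(T1) = (1, 1)
    ts2 = s2.new_timestamp()   # TS(T2) = (1, 2)
    print(ts1 < ts2)           # True: T1 is globally older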

15
Distributed T/O algorithms
  • Lamport clock used to solve more general
    problem of fixing the notion of logical time in
    an asynchronous network
  • Sites communicate through messages
  • Logical time is a pair (c, i), where c is
    nonnegative integer and i is a transaction number
  • The clock variable gets increased by 1 at every
    transaction operation the logical time of the
    operation is defined as the value of the clock
    immediately after the operation
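
A sketch of a Lamport clock (interface names assumed):
the clock ticks on every local operation, and on message
receipt it jumps past the timestamp carried by the
message, so causally related events stay ordered.

    class LamportClock:
        def __init__(self, site_id):
            self.c, self.site_id = 0, site_id

        def tick(self):                     # any local operation
            self.c += 1
            return (self.c, self.site_id)   # logical time (c, i)

        def send(self):                     # message carries its time
            return self.tick()

        def receive(self, msg_time):        # jump past the sender
            self.c = max(self.c, msg_time[0]) + 1
            return (self.c, self.site_id)

    a, b = LamportClock(1), LamportClock(2)
    t_send = a.send()            # (1, 1)
    t_recv = b.receive(t_send)   # (2, 2), ordered after the send
    print(t_send < t_recv)       # True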

16
Distributed optimistic algorithms
  • Under the optimistic approach, every transaction
    is processed in three phases (read, validation,
    write)
  • Problem: how to ensure that validation comes to
    the same result at every site where a global
    transaction has been active
  • Not implemented in practice

17
Distributed Deadlock Detection
  • Problem: global deadlock, which cannot be
    detected by local means only (each server keeps a
    WFG locally)

[Figure: local waits-for graphs at Sites 1, 2, and 3;
wait-for-lock edges inside each site and
wait-for-message edges between sites join T1, T2, and
T3 into a global cycle that no single site can see]
18
Distributed Deadlock Detection
  • Centralized detection: a centralized monitor
    collects local WFGs
  • performance
  • false deadlocks
  • Timeout approach
  • Distributed approaches:
  • Edge chasing
  • Path pushing

19
Distributed Deadlock Detection
  • Edge chasing: each transaction that becomes
    blocked in a wait relationship sends its
    identifier in a special message, called a probe,
    to the blocking transaction. If a transaction
    receives a probe, it forwards it to all
    transactions by which it is itself blocked. If
    the probe comes back to the transaction by which
    it was initiated, this transaction knows that it
    is participating in a cycle and hence is part of
    a deadlock (see the sketch below)
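
A toy sketch of edge chasing, assuming the waits-for
edges are already known (in a real system the probes
travel between sites as messages):

    # Probes carry the initiator's id along waits-for edges;
    # a probe returning to its initiator means a deadlock.
    waits_for = {1: [2], 2: [3], 3: [1]}   # T1 -> T2 -> T3 -> T1

    def probe(initiator, current, visited=frozenset()):
        for blocker in waits_for.get(current, []):
            if blocker == initiator:
                return True                 # probe came back
            if blocker not in visited:
                if probe(initiator, blocker, visited | {current}):
                    return True
        return False

    print(probe(1, 1))   # True: T1 participates in a deadlock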

20
Distributed Deadlock Detection
  • Path pushing: entire paths are circulated between
    transactions instead of single transaction
    identifiers.
  • The basic algorithm is as follows:
  • Each server that has a wait-for path from
    transaction Ti to transaction Tj, such that Ti
    has an incoming waits-for message edge and Tj has
    an outgoing waits-for message edge, sends that
    path to the server along the outgoing edge,
    provided the identifier of Ti is smaller than
    that of Tj
  • Upon receiving a path, the server concatenates it
    with the local paths that already exist, and
    forwards the result along its outgoing edges
    again. If there exists a cycle among n servers,
    at least one of them will detect that cycle in at
    most n such rounds (a rough sketch follows)
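
A rough sketch of path pushing (purely illustrative
encoding; message passing elided): paths are pushed
along outgoing edges when the first id is smaller than
the last, concatenated with local paths on receipt, and
a path returning to its first transaction is a cycle.

    site_of = {1: 1, 2: 2, 3: 3}   # Ti's outgoing edge is at site i
    known = {1: [[1, 2]], 2: [[2, 3]], 3: [[3, 1]]}  # local paths

    def push_round(known):
        inbox = {s: [] for s in known}
        for site, paths in known.items():
            for p in paths:
                if p[0] < p[-1]:            # push rule: id(Ti) < id(Tj)
                    inbox[site_of[p[-1]]].append(p)
        for site, received in inbox.items():
            for p in received:
                for q in list(known[site]):
                    if p[-1] == q[0]:       # concatenate with local path
                        combined = p + q[1:]
                        if combined[0] == combined[-1]:
                            return combined # cycle: global deadlock
                        known[site].append(combined)
        return None

    for _ in range(len(known)):             # detected in <= n rounds
        cycle = push_round(known)
        if cycle:
            print("deadlock among", cycle)  # deadlock among [1, 2, 3, 1]
            break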

21
Distributed Deadlock Detection
  • Consider the deadlock example:

[Figure: path pushing across Sites 1, 2, and 3; Site 1
holds the local path T1 → T2 and Site 2 the path
T2 → T3; once the pushed paths reach Site 3, which
knows T3 → T1 locally, Site 3 detects the global
deadlock]
22
Concurrency Control in Heterogeneous Federations
23
Preliminaries
  • A heterogeneous distributed database system
    (HDDBS) integrates pre-existing external data
    sources (EDSs) to support global applications
    accessing more than one external data source
  • HDDBS vs. LDBS
  • Local autonomy and heterogeneity of local data
    sources:
  • Design autonomy
  • Communication autonomy
  • Execution autonomy
  • Local autonomy reflects the fact that local data
    sources were designed and implemented
    independently and were totally unaware of the
    integration process

24
Preliminaries
  • Design autonomy: refers to the capability of a
    database system to choose its own data model and
    implementation procedures
  • Communication autonomy: refers to the capability
    of a database system to decide what other systems
    it will communicate with and what information it
    will exchange with them
  • Execution autonomy: refers to the capability of a
    database system to decide how and when to execute
    requests received from other systems

25
Difficulties
  • Actions of a transaction may be executed in
    different EDSs, one of which may use locks to
    guarantee serializability, while another one may
    use timestamps
  • Guaranteeing the properties of transactions may
    restrict local autonomy; e.g., to guarantee
    atomicity, the participating EDSs must execute
    some type of commit protocol
  • EDSs may not provide the necessary functionality
    to implement the required global coordination
    protocols. With respect to the commit protocol,
    it is necessary for an EDS to become prepared,
    guaranteeing that the local actions of a
    transaction can be completed. Existing EDSs may
    not allow a transaction to enter this state

26
HDDBS model
[Figure: HDDBS model; global transactions are submitted
to the Global Transaction Manager (GTM), which forwards
subtransactions to the Local Transaction Managers
(LTMs) of the external data sources EDS1 and EDS2;
local transactions access each LTM directly]
27
Basic notation
  • An HDDBS consists of a set D of external data
    sources and a set of transactions T
  • D = {D1, D2, ..., Dn}, where Di is the i-th
    external data source
  • The complete set of transactions is
    T ∪ T1 ∪ T2 ∪ ... ∪ Tn
  • T: a set of global transactions
  • Ti: a set of local transactions that access Di
    only

28
Example
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d, e}, D = {a, b, c, d, e}
  • Local transactions:
  • T1 = r(a) w(b)   T2 = w(d) r(e)
  • Global transactions:
  • T3 = w(a) r(d)   T4 = w(b) r(c) w(e)
  • Local schedules:
  • s1 = r1(a) w3(a) c3 w1(b) c1 w4(b) c4
  • s2 = r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4

29
Global schedule
  • Let the heterogeneous federation consist of n
    sites, let T1, ..., Tn be the sets of local
    transactions at sites 1, ..., n, and let T be a
    set of global transactions. Finally, let s1, s2,
    ..., sn be the local schedules.
  • A (heterogeneous) global schedule (for s1, ...,
    sn) is a schedule s for T ∪ T1 ∪ ... ∪ Tn such
    that its local projection equals the local
    schedule at each site, i.e. πi(s) = si for all i,
    1 ≤ i ≤ n

30
Correctness of schedules
  • Given a federation of two servers:
  • D1 = {a}, D2 = {b, c}
  • Given two global transactions T1 and T2 and a
    local transaction T3:
  • T1 = r(a) w(b)   T2 = w(a) r(c)   T3 = r(b) w(c)
  • Assume the following local schedules:
  • server 1: r1(a) w2(a)
  • server 2: r3(b) w1(b) r2(c) w3(c)
  • Transactions T1 and T2 are executed strictly
    serially at both sites; nevertheless, the global
    schedule is not globally serializable, because T1
    and T2 are in an indirect conflict (via the local
    transaction T3) in the schedule of server 2

31
Global serializability
  • In a heterogeneous federation the GTM has no
    direct control over local schedules; the best it
    can do is to control the serialization order of
    global transactions by carefully controlling the
    order in which operations are sent to local
    systems for execution and in which these get
    acknowledged.
  • Indirect conflict: Ti and Tk are in an indirect
    conflict in si if there exists a sequence T1,
    ..., Tr of transactions in si such that Ti is in
    a direct conflict with T1 in si, Tj is in a
    direct conflict with Tj+1 in si for 1 ≤ j ≤ r-1,
    and Tr is in a direct conflict with Tk in si
  • Conflict equivalence: two schedules contain the
    same operations and the same direct and indirect
    conflicts

32
Global serializability
  • Global Conflict Serialization Graph
  • Let s be a global schedule for the local
    schedules s1, s2, ..., sn, and let G(si) denote
    the conflict serialization graph of si,
    1 ≤ i ≤ n, derived from direct and indirect
    conflicts. The global conflict serialization
    graph of s is defined as the union of all G(si),
    1 ≤ i ≤ n, i.e.
    G(s) = G(s1) ∪ G(s2) ∪ ... ∪ G(sn)
  • Global serializability theorem:
  • Let the local schedules s1, s2, ..., sn be given,
    where each G(si), 1 ≤ i ≤ n, is acyclic. Let s be
    a global schedule for the si, 1 ≤ i ≤ n. The
    global schedule s is globally conflict
    serializable iff G(s) is acyclic

33
Global serializability - problems
  • To ensure global serializability, the
    serialization order of global transactions must
    be the same in all sites where they execute
  • Serialization orders of local schedules must be
    validated by the HDDBS
  • These orders are neither reported by EDSs, nor
    can they be determined by controlling the
    submission of the global subtransactions or by
    observing their execution order

34
Example
  • Globally non-serializable schedule:
  • s1 = w1(a) r2(a)               (T1 → T2)
  • s2 = w2(c) r3(c) w3(b) r1(b)   (T2 → T3 → T1)
  • Globally serializable schedule:
  • s1 = w1(a) r2(a)               (T1 → T2)
  • s2 = w2(c) r1(b)
  • Globally non-serializable schedule:
  • s1 = w1(a) r2(a)               (T1 → T2)
  • s2 = w3(b) r1(b) w2(c) r3(c)   (T2 → T3 → T1)

35
Quasi serializability
  • Rejects global serializability as the
    correctness criterion
  • The basic idea: we assume that no value
    dependencies exist among EDSs, so indirect
    conflicts can be ignored
  • In order to preserve global database consistency,
    only global transactions need to be executed in a
    serializable way, with proper consideration of
    the effects of local transactions

36
Quasi serializability
  • Quasi-serial schedule:
  • A set of local schedules s1, ..., sn is quasi
    serial if each si is conflict serializable and
    there exists a total order < on the set T of
    global transactions such that Ti < Tj for Ti, Tj
    ∈ T, i ≠ j, implies that in each local schedule
    si, 1 ≤ i ≤ n, the subtransaction of Ti occurs
    completely before the subtransaction of Tj
  • Quasi serializability:
  • A set of local schedules s1, ..., sn is quasi
    serializable if there exists a set s1', ...,
    sn' of quasi-serial local schedules such that si
    is conflict equivalent to si' for 1 ≤ i ≤ n.

37
Example (1)
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d, e}
  • Given two global transactions T1 and T2 and two
    local transactions T3 and T4:
  • T1 = w(a) r(d)   T2 = r(b) r(c) w(e)
  • T3 = r(a) w(b)   T4 = w(d) r(e)
  • Assume the following local schedules:
  • s1 = w1(a) r3(a) w3(b) r2(b)
  • s2 = r2(c) w4(d) r1(d) w2(e) r4(e)

38
Example (2)
  • The set {s1, s2} is quasi serializable, since it
    is conflict equivalent to the quasi-serial set
    {s1, s2'}, where
  • s2' = w4(d) r1(d) r2(c) w2(e) r4(e)
  • The global schedule
  • s = w1(a) r3(a) r2(c) w4(d) r1(d) c1 w3(b) c3
    r2(b) w2(e) c2 r4(e) c4
  • is quasi serializable; however, s is not globally
    serializable
  • Since the quasi-serialization order is always
    compatible with the orderings of subtransactions
    in the various local schedules, quasi
    serializability is relatively easy to achieve for
    a GTM

39
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • The GTM assumes that local schedules are conflict
    serializable
  • There are various scenarios for guaranteeing
    global serializability
  • Rigorousness: local schedulers produce
    conflict-serializable rigorous schedules. A
    schedule is rigorous if it satisfies the
    following condition:
  • whenever oi(x) <s oj(x), i ≠ j, for oi, oj in
    conflict, then ai <s oj(x) or ci <s oj(x)
  • Schedules in RG avoid any type of rw, wr, or ww
    conflict between uncommitted transactions

40
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d}
  • Given two global transactions T1 and T2 and two
    local transactions T3 and T4:
  • T1 = w(a) w(d)   T2 = w(c) w(b)
  • T3 = r(a) r(b)   T4 = r(c) r(d)
  • Assume the following local schedules:
  • s1 = w1(a) c1 r3(a) r3(b) c3 w2(b) c2
  • s2 = w2(c) c2 r4(c) r4(d) c4 w1(d) c1
  • Both schedules are rigorous, but they yield
    different serialization orders (T1 → T3 → T2 in
    s1, T2 → T4 → T1 in s2)

41
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • Commit-deferred transactions: a global
    transaction T is commit-deferred if its commit
    operation is sent by the GTM to the local sites
    only after the local executions of all data
    operations from T have been acknowledged at all
    sites (see the sketch below)
  • Theorem: if si ∈ RG, 1 ≤ i ≤ n, and all global
    transactions are commit-deferred, then s is
    globally serializable
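
A minimal sketch of the commit-deferred rule (the
interfaces are invented for illustration): the GTM
broadcasts Commit only after every data operation of T
has been acknowledged at every site.

    def run_commit_deferred(sites, operations):
        """sites: site id -> stub; operations: (site, op) list."""
        acks = [sites[site].execute(op) for site, op in operations]
        if all(acks):                           # all data ops acknowledged
            for site in {s for s, _ in operations}:
                sites[site].commit()            # only now send Commit

    class StubSite:
        def execute(self, op):
            print("executed", op); return True  # acknowledge
        def commit(self):
            print("commit")

    run_commit_deferred({1: StubSite(), 2: StubSite()},
                        [(1, "w1(a)"), (2, "w1(d)")])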

42
Possible solutions
  • Bottom-up approach: observing the execution of
    global transactions at each EDS.
  • Idea: the execution order of global transactions
    is determined by their serialization orders at
    each EDS
  • Problem: how to determine the serialization order
    of global transactions
  • Top-down approach: controlling the submission and
    execution order of global transactions
  • Idea: the GTM determines a global serialization
    order for global transactions before submitting
    them to the EDSs. It is the EDSs' responsibility
    to enforce the order at local sites
  • Problem: how the order is enforced at local sites

43
Ticket-Based Method
  • How can the GTM obtain information about the
    relative order of subtransactions of global
    transactions at each EDS?
  • How can the GTM guarantee that the
    subtransactions of each global transaction have
    the same relative order in all participating
    EDSs?
  • Idea: to force local direct conflicts between
    global transactions, or to convert indirect
    conflicts (not observable by the GTM) into direct
    (observable) conflicts

44
Ticket-Based Method
  • Ticket: a ticket is a logical timestamp whose
    value is stored as a special data item in each
    EDS
  • Each subtransaction is required to issue the
    Take_A_Ticket operation:
  • r(ticket) w(ticket+1)   (a critical section; see
    the sketch below)
  • Only subtransactions of global transactions have
    to take tickets
  • Theorem: if global transaction T1 takes its
    ticket before global transaction T2 at a server,
    then T1 will be serialized before T2 by that
    server
  • In other words, the tickets obtained by
    subtransactions determine their relative
    serialization order
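
A sketch of Take_A_Ticket (names illustrative): the
subtransaction reads the server's ticket item and
writes it back incremented, which forces every pair of
global subtransactions at that server into a direct,
GTM-observable conflict.

    class EDS:
        def __init__(self):
            self.ticket = 0             # special data item per EDS

        def take_a_ticket(self):
            t = self.ticket             # r(ticket)   } critical
            self.ticket = t + 1         # w(ticket+1) } section
            return t                    # fixes serialization order

    eds = EDS()
    print(eds.take_a_ticket())   # 0: this subtransaction first
    print(eds.take_a_ticket())   # 1: ... and this one after it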

45
Example (1)
  • Given a federation of two servers:
  • D1 = {a}, D2 = {b, c}
  • Given two global transactions T1 and T2 and a
    local transaction T3:
  • T1 = r(a) w(b)   T2 = w(a) r(c)
  • T3 = r(b) w(c)
  • Assume the following local schedules:
  • s1 = r1(a) c1 w2(a) c2   (T1 → T2)
  • s2 = r3(b) w1(b) c1 r2(c) c2 w3(c) c3
  • The schedule is not globally serializable:
    T2 → T3 → T1 in s2

46
Example (2)
  • Using tickets, the local schedules look as
    follows (Ii denotes the ticket item at server i):
  • s1 = r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1)
    w2(a) c2
  • s2 = r3(b) r1(I2) w1(I2+1) w1(b) c1 r2(I2)
    w2(I2+1) r2(c) c2 w3(c) c3
  • The indirect conflict between global transactions
    in the schedule s2 has been turned into an
    explicit one; the schedule s2 is not conflict
    serializable (cycle T1 → T2 → T3 → T1)

47
Example (3)
  • Consider another set of schedules:
  • s1 = r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1)
    w2(a) c2
  • s2 = r3(b) r2(I2) w2(I2+1) r1(I2) w1(I2+1) w1(b)
    c1 r2(c) c2 w3(c) c3
  • Now both schedules are conflict serializable; the
    tickets obtained by the transactions determine
    their serialization order

48
Optimistic ticket method
  • Optimistic ticket method (OTM): the GTM must
    ensure that the subtransactions have the same
    relative serialization order in their
    corresponding EDSs
  • The idea is to allow the subtransactions to
    proceed, but to commit them only if their ticket
    values have the same relative order in all
    participating EDSs
  • Requirement: EDSs must support a visible
    prepare_to_commit state for all subtransactions
  • A prepare_to_commit state is visible if the
    application program can decide whether the
    transaction should commit or abort

49
Optimistic ticket method
  • A global transaction T proceeds as follows:
  • The GTM sets a timeout for T
  • It submits all subtransactions of T to their
    corresponding EDSs
  • If they enter their prepare_to_commit state, they
    wait for the GTM to validate T
  • Commit or abort is broadcast
  • The GTM validates T using a ticket graph; the
    graph is tested for cycles involving T (a sketch
    follows)
  • Problems with OTM:
  • Global aborts caused by ticket operations
  • The probability of global deadlocks increases
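
A sketch of the validation step (a simplification of
OTM): the GTM merges the ticket orders reported from
all servers into one graph and commits T only if T lies
on no cycle.

    # Ticket orders per EDS (edge Ti -> Tj: Ti ticketed first).
    ticket_orders = {"EDS1": [("T1", "T2")],
                     "EDS2": [("T2", "T1")]}   # opposite at EDS2

    edges = {e for order in ticket_orders.values() for e in order}

    def on_cycle(t, edges):
        def reach(u, v, seen=()):
            return any(w == v or (w not in seen and
                                  reach(w, v, seen + (u,)))
                       for (s, w) in edges if s == u)
        return reach(t, t)

    print(on_cycle("T1", edges))   # True: validation fails, abort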

50
Cache Coherence and Concurrency Control for
Data-Sharing Systems
51
Architectures for Parallel Distributed Database
Systems
  • Three main architectures:
  • Shared memory systems
  • Shared disk systems
  • Shared nothing systems
  • Shared memory system: multiple CPUs are attached
    to an interconnection network and can access a
    common region of main memory
  • Shared disk system: each CPU has a private memory
    and direct access to all disks through an
    interconnection network
  • Shared nothing system: each CPU has local memory
    and disk space, but no two CPUs can access the
    same storage area; all communication is through a
    network connection

52
Shared memory system
[Figure: shared memory architecture; processors (P)
access a global shared memory and the disks (D) through
an interconnection network]
53
Shared disk system
[Figure: shared disk architecture; each processor (P)
has a private memory (M) and reaches all disks (D)
through the interconnection network]
54
Shared nothing system
[Figure: shared nothing architecture; each processor
(P) has its own memory (M) and disks (D) and
communicates with the others only through the
interconnection network]
55
Characteristics of the architectures
  • Shared memory:
  • is closer to a conventional machine; many
    commercial DBMSs have been ported to this
    platform
  • communication overhead is low
  • memory contention becomes a bottleneck as the
    number of CPUs increases
  • Shared disk: similar characteristics
  • Interference problem: as more CPUs are added,
    existing CPUs are slowed down because of the
    increased contention for memory access and
    network bandwidth
  • Even a 1% slowdown per additional CPU means that
    a system with 1,000 CPUs is only 4% as effective
    as a single-CPU system
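  • Back-of-the-envelope check of that figure (under
    the 1% interference assumption): each of the
    1,000 CPUs runs at 0.99^999 ≈ 4.4 × 10^-5 of its
    stand-alone speed, so aggregate throughput is
    about 1000 × 0.99^999 ≈ 0.044, i.e. roughly 4%
    of one unimpeded CPU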

56
Shared nothing
  • It provides almost linear speed-up, in that the
    time taken for operations decreases in proportion
    to the increase in the number of CPUs and disks
  • It provides almost linear scale-up, in that
    performance is sustained if the number of CPUs
    and disks is increased in proportion to the
    amount of data
  • Powerful parallel database systems can be built
    by taking advantage of the rapidly improving
    performance of single CPUs

57
Shared nothing
[Figure: two plots for shared nothing; SPEED-UP shows
transactions/second vs. # of CPUs, and SCALE-UP with
DB SIZE shows transactions/second vs. # of CPUs and
database size, both nearly linear]
58
Concurrency and cache coherency problem
  • Data pages can be dynamically replicated in more
    than one server cache to exploit access locality
  • Synchronization of reads and writes requires some
    form of distributed lock management, and either
    invalidation of stale copies of data items or
    propagation of updated data items must be
    communicated among the servers
  • Basic assumption for data-sharing systems: each
    individual transaction is executed solely on one
    server (i.e., a transaction does not migrate
    among servers during its execution)

59
Callback Locking
  • We assume that both concurrency control and cache
    coherency control are page oriented
  • Each server has a global lock manager and a local
    lock manager
  • Data items are assigned to global lock managers
    in a static manner (e.g. via hashing), so each
    global lock manager is responsible for a fixed
    subset of the data items; we say that the global
    lock manager has the global lock authority for a
    data item
  • The global lock manager knows, for a data item,
    at each point in time whether the item is locked
    or not

60
Callback Locking - concurrency control
  • When a transaction requests a lock or wants to
    release a lock, it first addresses its local lock
    manager, which can then contact the global lock
    manager
  • The simplest way is to forward all lock and
    unlock requests to the global lock manager that
    has the global lock authority for the given data
    item
  • If a local lock manager is authorized to manage
    read locks (or write locks) locally, then it can
    save message exchanges with the global lock
    manager

61
Callback Locking - concurrency control
  • Local read authority enables a local lock manager
    to grant local read locks for a data item
  • Local write authority enables a local lock
    manager to grant local read/write locks for a
    data item
  • A write authority has to be returned to the
    corresponding global lock manager if another
    server wants to access the data item
  • A read authority can be held by several servers
    simultaneously and has to be returned to the
    corresponding global lock manager if another
    server wants to access the data item to perform a
    write access (see the sketch below)
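
A condensed sketch of these authority rules (class
names invented; real callback locking exchanges
messages between lock managers): reads are granted
locally under any authority, writes only under write
authority, and the global manager calls back
conflicting authorities before granting new ones.

    class GlobalLockManager:
        def __init__(self):
            self.readers, self.writer = set(), None

        def request(self, server, mode):
            if mode == "w":
                for s in self.readers - {server}:
                    s.authority = None       # Callback(x) revokes
                if self.writer and self.writer is not server:
                    self.writer.authority = None
                self.readers, self.writer = set(), server
            else:
                if self.writer and self.writer is not server:
                    self.writer.authority = "r"   # downgrade writer
                    self.readers.add(self.writer)
                    self.writer = None
                self.readers.add(server)
            server.authority = mode

    class Server:
        def __init__(self, glm):
            self.glm, self.authority = glm, None

        def lock(self, mode):
            if self.authority == "w" or (mode == "r"
                                         and self.authority):
                return                   # granted locally, no messages
            self.glm.request(self, mode) # otherwise ask the home node

    glm = GlobalLockManager()
    a, b = Server(glm), Server(glm)
    a.lock("r"); b.lock("r")   # both hold read authority
    b.lock("w")                # callback revokes a's authority first
    print(a.authority, b.authority)   # None w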

62
Callback Locking - concurrency control
  • The cache coherency protocol needs to ensure
    that:
  • multiple caches can hold up-to-date versions of a
    page simultaneously as long as the page is only
    read, and
  • once a page has been modified in one of the
    caches, this cache is the only one that is
    allowed to hold a copy of the page
  • A callback message revokes the local lock
    authority

63
Callback Locking
[Figure: callback locking message flow; Servers A and B
each send Rlock(x) to Home(x) and receive read lock
authority for x, after which r1(x), r2(x), c1, r3(x),
and c3 run locally; Server C then issues w4(x)]
64
Callback Locking
[Figure: continuation; for w4(x), Server C sends
Wlock(x) to Home(x), which sends Callback(x) to Servers
A and B; after both reply OK (and c2 completes),
Home(x) grants write lock authority for x to Server C]