Transcript and Presenter's Notes

Title: Distributed Concurrency Control


1
Distributed Concurrency Control
2
Motivation
  • World-wide telephone system
  • World-wide computer network
  • World-wide database system
  • Collaborative projects – the project has a
    database composed of the smaller local databases
    of each researcher
  • A travel company organizing vacations – it
    consults local subcontractors (local companies),
    which list prices and quality ratings for hotels,
    restaurants, and fares
  • A library service – people looking for articles
    query two or more libraries

3
Types of distributed systems
  • Homogeneous federation – servers participating in
    the federation are logically part of a single
    system; they all run the same suite of protocols,
    and they may even be under the control of a
    master site
  • A homogeneous federation is characterized by
    distribution transparency
  • Heterogeneous federation – servers participating
    in the federation are autonomous and
    heterogeneous; they may run different protocols,
    and there is no master site

4
Types of transactions and schedules
  • Local transactions – access data of a single
    site only
  • Global transactions – access data of more than
    one site

5
Concurrency Control in Homogeneous Federations
6
Preliminaries
  • Let the federation consist of n sites, and let
    T = {T1, ..., Tm} be a set of global transactions
  • Let s1, ..., sn be local schedules
  • Let D = ∪i Di, where Di is the local database at
    site i
  • We assume no replication (each replica is treated
    as a separate data item)
  • A global schedule for T and s1, ..., sn is a
    schedule s for T such that its local projection
    equals the local schedule at each site, i.e.
    πi(s) = si for all i, 1 ≤ i ≤ n

7
Preliminaries
  • πi(s) denotes the projection of the schedule s
    onto site i
  • We call the projection of a transaction T onto
    site i a subtransaction of T (Ti); it comprises
    all steps of T at site i
  • Global transactions formally have to have Commit
    operations at all sites at which they are active
  • Conflict serializability – a global (local)
    schedule s is globally (locally) conflict
    serializable if there exists a serial schedule
    over the global (local) (sub-)transactions that
    is conflict equivalent to s

8
Example 1
  • Consider a federation of two sites, where D1 =
    {x} and D2 = {y}. Then s1 = r1(x) w2(x) and s2 =
    w1(y) r2(y) are local schedules, and
  • s = r1(x) w1(y) w2(x) c1 r2(y) c2
  • is a global schedule
  • π1(s) = s1 and π2(s) = s2
  • Another form of the same schedule, one row per
    server (time flows left to right):
  • server 1: r1(x)        w2(x)
  • server 2:        w1(y)        r2(y)

9
Example 2
  • Consider a federation of two sites, where D1 =
    {x} and D2 = {y}. Assume the following schedule:
  • server 1: r1(x)        w2(x)
  • server 2:        r2(y)        w1(y)
  • The schedule is not conflict serializable: server
    1 orders T1 before T2 (r1(x) before w2(x)), while
    server 2 orders T2 before T1, so the conflict
    serialization graph has a cycle

10
Global conflict serializability
  • Let s be a global schedule with local schedules
    s1, s2, ..., sn involving a set T of transactions
    such that each si, 1 ≤ i ≤ n, is conflict
    serializable. Then the following holds:
  • s is globally conflict serializable iff there
    exists a total order < on T that is consistent
    with the local serialization orders of the
    transactions (proof); a sketch of this test
    follows below

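The theorem suggests a direct test that a GTM-side checker could run: merge all local serialization orders into one precedence relation over the global transactions and look for a cycle. A minimal sketch in Python (the function name and the input encoding are illustrative assumptions, not from the slides):

    # Each local serialization order is a list of transaction ids,
    # e.g. [1, 2] means T1 is serialized before T2 at that site.
    def globally_serializable(local_orders):
        # Collect every precedence pair implied by some local order.
        edges = set()
        for order in local_orders:
            for i in range(len(order) - 1):
                for j in range(i + 1, len(order)):
                    edges.add((order[i], order[j]))
        # A total order consistent with all local orders exists iff
        # the combined precedence relation contains no cycle.
        succ = {}
        for u, v in edges:
            succ.setdefault(u, set()).add(v)
        visiting, done = set(), set()
        def has_cycle(t):
            visiting.add(t)
            for v in succ.get(t, ()):
                if v in visiting or (v not in done and has_cycle(v)):
                    return True
            visiting.discard(t)
            done.add(t)
            return False
        nodes = set(succ) | {v for vs in succ.values() for v in vs}
        return not any(has_cycle(t) for t in nodes if t not in done)

    # Example 2 above: site 1 serializes T1 before T2, site 2 the reverse.
    print(globally_serializable([[1, 2], [2, 1]]))   # False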
11
Concurrency Control Algorithms
  • Distributed 2PL locking algorithms
  • Distributed T/O algorithms
  • Distributed optimistic algorithms

12
Distributed 2PL locking algorithms
  • The main problem: how to determine that a
    transaction has reached its lock point?
  • Primary site 2PL – lock management is done
    exclusively at a distinguished site, the primary
    site (see the sketch below)
  • Distributed 2PL – when a server wants to start
    the unlocking phase for a transaction, it
    communicates with all other servers regarding the
    lock point of that transaction
  • Strong 2PL – all locks acquired on behalf of a
    transaction are held until the transaction wants
    to commit (2PC)

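A minimal sketch of the primary-site variant, where one distinguished site serializes all lock traffic and enforces the two-phase rule; the class and method names, and the simple conflict handling (a conflicting request just fails instead of queueing), are illustrative assumptions:

    # Primary site 2PL: every lock/unlock request from every server is
    # sent to one distinguished site, which runs an ordinary 2PL manager.
    class PrimarySiteLockManager:
        def __init__(self):
            self.locks = {}          # item -> (mode, set of holder txn ids)
            self.shrinking = set()   # transactions past their lock point

        def lock(self, txn, item, mode):      # mode is 'r' or 'w'
            if txn in self.shrinking:
                raise RuntimeError("2PL violated: locking after unlocking")
            held = self.locks.get(item)
            if held is None:
                self.locks[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if mode == 'r' and held_mode == 'r':
                holders.add(txn)              # shared read lock
                return True
            if holders == {txn}:              # upgrade by the same txn
                self.locks[item] = ('w', holders)
                return True
            return False                      # conflict: caller must wait

        def unlock(self, txn, item):
            self.shrinking.add(txn)           # the shrinking phase begins
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]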
13
Distributed T/O algorithms
  • Assume that each local site (scheduler) executes
    its private T/O protocol for synchronizing
    accesses in its portion of the database
  • server 1: r1(x)        w2(x)
  • server 2:        r2(y)        w1(y)
  • If timestamps were assigned as in the centralized
    case, each of the two servers would assign the
    value 1 to the first transaction that it sees
    locally (T1 on server 1 and T2 on server 2),
    which would lead to a globally incorrect result

14
Distributed T/O algorithms
  • We have to find a way to assign globally unique
    timestamps to transactions at all sites
  • Centralized approach – a particular server is
    responsible for generating and distributing
    timestamps
  • Distributed approach – each server generates a
    unique local timestamp using a clock or counter
    and appends its site identifier, as sketched
    below
  • server 1: r1(x)        w2(x)
  • server 2:        r2(y)        w1(y)
  • TS(T1) = (1, 1)
  • TS(T2) = (1, 2)

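A sketch of the distributed approach: each server pairs its local counter with its site identifier, and pairs are compared lexicographically, which makes all timestamps globally unique and totally ordered (Python tuples compare this way out of the box; the class name is an illustrative assumption):

    class TimestampGenerator:
        """Per-site generator of globally unique transaction timestamps."""
        def __init__(self, site_id):
            self.site_id = site_id
            self.counter = 0

        def next_timestamp(self):
            self.counter += 1
            # (local counter, site id): compared lexicographically,
            # the site id breaks ties between concurrent transactions
            return (self.counter, self.site_id)

    gen1, gen2 = TimestampGenerator(1), TimestampGenerator(2)
    ts1 = gen1.next_timestamp()   # (1, 1) - TS(T1) on the slide
    ts2 = gen2.next_timestamp()   # (1, 2) - TS(T2) on the slide
    assert ts1 < ts2              # T1 is globally "older" than T2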
15
Distributed T/O algorithms
  • Lamport clock – used to solve the more general
    problem of fixing a notion of logical time in an
    asynchronous network
  • Sites communicate through messages
  • Logical time is a pair (c, i), where c is a
    nonnegative integer and i is a site number
  • The clock variable is increased by 1 at every
    transaction operation; the logical time of the
    operation is defined as the value of the clock
    immediately after the operation (see the sketch
    below)

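A sketch of the Lamport clock rules; the receive rule (advance to the maximum of the local and the received clock value, plus one) is part of the standard construction and is assumed here, since the slide only states the local increment rule:

    class LamportClock:
        def __init__(self, site_id):
            self.c = 0                 # nonnegative integer clock value
            self.site_id = site_id

        def tick(self):
            """Local operation: advance the clock and return the
            logical time (c, i) of the operation."""
            self.c += 1
            return (self.c, self.site_id)

        def send(self):
            """Timestamp an outgoing message with the sender's time."""
            return self.tick()

        def receive(self, sender_time):
            """Advance past the sender's clock, so that every message
            is received at a later logical time than it was sent."""
            self.c = max(self.c, sender_time[0]) + 1
            return (self.c, self.site_id)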
16
Distributed optimistic algorithms
  • Under the optimistic approach, every transaction
    is processed in three phases (read, validation,
    write)
  • Problem: how to ensure that validation comes to
    the same result at every site where a global
    transaction has been active
  • Not implemented in practice

17
Distributed Deadlock Detection
  • Problem: global deadlock, which cannot be
    detected by local means only (each server keeps a
    WFG locally)

[Figure: local wait-for graphs at Site 1, Site 2, and
Site 3 over transactions T1, T2, T3. Local "wait for
lock" edges and cross-site "wait for message" edges
together form a global cycle that no single server
can detect from its local WFG alone.]
18
Distributed Deadlock Detection
  • Centralized detection – a centralized monitor
    collects the local WFGs; drawbacks:
  • performance
  • false deadlocks
  • Timeout approach
  • Distributed approaches:
  • Edge chasing
  • Path pushing

19
Distributed Deadlock Detection
  • Edge chasing – each transaction that becomes
    blocked in a wait relationship sends its
    identifier in a special message, called a probe,
    to the blocking transaction. If a transaction
    receives a probe, it forwards it to all
    transactions by which it is itself blocked. If
    the probe comes back to the transaction that
    initiated it, this transaction knows that it is
    participating in a cycle and hence is part of a
    deadlock (see the sketch below)

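A sketch of probe propagation over a wait-for relation; in a real system the probes travel as messages between sites, while here the whole relation is given as one dictionary for brevity:

    # blocked_by[t] is the set of transactions that t currently waits for.
    def probe_detects_deadlock(initiator, blocked_by):
        probes = [initiator]            # transactions the probe has reached
        forwarded = set()
        while probes:
            t = probes.pop()
            for blocker in blocked_by.get(t, ()):
                if blocker == initiator:
                    return True         # probe came back: part of a cycle
                if blocker not in forwarded:
                    forwarded.add(blocker)
                    probes.append(blocker)
        return False

    # Wait-for relation of the figure: T1 -> T2 -> T3 -> T1.
    print(probe_detects_deadlock(1, {1: {2}, 2: {3}, 3: {1}}))   # True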
20
Distributed Deadlock Detection
  • Path pushing – entire paths are circulated
    between transactions instead of single
    transaction identifiers
  • The basic algorithm is as follows (a sketch
    follows below):
  • Each server that has a wait-for path from
    transaction Ti to transaction Tj, such that Ti
    has an incoming waits-for message edge and Tj has
    an outgoing waits-for message edge, sends that
    path to the server along the outgoing edge,
    provided the identifier of Ti is smaller than
    that of Tj
  • Upon receiving a path, the server concatenates it
    with the local paths that already exist, and
    forwards the result along its outgoing edges
    again. If there exists a cycle among n servers,
    at least one of them will detect that cycle in at
    most n such rounds

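A sketch of the rounds of path pushing; for brevity all known paths live in one set rather than being distributed over servers, and the identifier-based pruning rule is omitted (an illustrative simplification):

    def push_paths_until_cycle(paths):
        """paths: initial local wait-for paths, each a list of txn ids.
        Repeatedly concatenate matching paths, as the servers would when
        forwarding them, until a cycle closes or nothing new appears."""
        known = {tuple(p) for p in paths}
        while True:
            new = set()
            for p in known:
                for q in known:
                    if p != q and p[-1] == q[0]:
                        joined = p + q[1:]          # concatenate the paths
                        if joined[0] == joined[-1]:
                            return list(joined)     # deadlock cycle found
                        if joined not in known:
                            new.add(joined)
            if not new:
                return None                         # no deadlock
            known |= new

    # Paths of the example on the next slide: T1->T2 (site 1),
    # T2->T3 (site 2), T3->T1 (site 3) close the cycle in two rounds.
    print(push_paths_until_cycle([[1, 2], [2, 3], [3, 1]]))
    # prints a deadlock cycle, e.g. [1, 2, 3, 1]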
21
Distributed Deadlock Detection
  • Consider the deadlock example again

[Figure: path pushing across Site 1, Site 2, and Site
3. Site 1 pushes its local path T1 → T2; Site 2
extends it with T2 → T3 and pushes T1 → T2 → T3 on;
Site 3 knows that T3 → T1 locally, receives the path,
and detects the global deadlock.]
22
Concurrency Control in Heterogeneous Federations
23
Preliminaries
  • A heterogeneous distributed database system
    (HDDBS) integrates pre-existing external data
    sources (EDSs) to support global applications
    that access more than one external data source
  • HDDBS vs. LDBS
  • Local autonomy and heterogeneity of local data
    sources:
  • Design autonomy
  • Communication autonomy
  • Execution autonomy
  • Local autonomy reflects the fact that local data
    sources were designed and implemented
    independently and were totally unaware of the
    integration process

24
Preliminaries
  • Design autonomy – refers to the capability of a
    database system to choose its own data model and
    implementation procedures
  • Communication autonomy – refers to the capability
    of a database system to decide what other systems
    it will communicate with and what information it
    will exchange with them
  • Execution autonomy – refers to the capability of
    a database system to decide how and when to
    execute requests received from other systems

25
Difficulties
  • Actions of a transaction may be executed in
    different EDSs, one of which may use locks to
    guarantee serializability, while another may use
    timestamps
  • Guaranteeing the properties of transactions may
    restrict local autonomy, e.g., to guarantee
    atomicity, the participating EDSs must execute
    some type of commit protocol
  • EDSs may not provide the necessary functionality
    to implement the required global coordination
    protocols. Regarding the commit protocol, it is
    necessary for an EDS to become prepared,
    guaranteeing that the local actions of a
    transaction can be completed. Existing EDSs may
    not allow a transaction to enter this state

26
HDDBS model
[Figure: HDDBS architecture. Global transactions are
submitted to the Global Transaction Manager (GTM),
which forwards their subtransactions to the Local
Transaction Managers (LTMs) of the external data
sources EDS1 and EDS2; each LTM also receives purely
local transactions.]
27
Basic notation
  • An HDDBS consists of a set D of external data
    sources and a set of transactions T
  • D = {D1, D2, ..., Dn}, where Di is the i-th
    external data source
  • The set of all transactions is T ∪ T1 ∪ T2 ∪ ...
    ∪ Tn, where
  • T is a set of global transactions
  • Ti is a set of local transactions that access Di
    only

28
Example
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d, e}, D = {a, b, c, d, e}
  • Local transactions:
  • T1 = r(a) w(b), T2 = w(d) r(e)
  • Global transactions:
  • T3 = w(a) r(d), T4 = w(b) r(c) w(e)
  • Local schedules:
  • s1 = r1(a) w3(a) c3 w1(b) c1 w4(b) c4
  • s2 = r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4

29
Global schedule
  • Let the heterogeneous federation consist of n
    sites, let T1, ..., Tn be the sets of local
    transactions at sites 1, ..., n, let T be a set
    of global transactions, and finally let s1, s2,
    ..., sn be the local schedules
  • A (heterogeneous) global schedule (for s1, ...,
    sn) is a schedule s for T ∪ T1 ∪ ... ∪ Tn such
    that its local projection equals the local
    schedule at each site, i.e. πi(s) = si for all i,
    1 ≤ i ≤ n

30
Correctness of schedules
  • Given a federation of two servers:
  • D1 = {a}, D2 = {b, c}
  • Given two global transactions T1 and T2 and a
    local transaction T3:
  • T1 = r(a) w(b), T2 = w(a) r(c), T3 = r(b) w(c)
  • Assume the following local schedules:
  • server 1: r1(a)        w2(a)
  • server 2:        r3(b) w1(b) r2(c) w3(c)
  • Transactions T1 and T2 are executed strictly
    serially at both sites, yet the global schedule
    is not globally serializable: at server 2 the
    local transaction T3 creates an indirect conflict
    between T1 and T2

31
Global serializability
  • In a heterogeneous federation the GTM has no
    direct control over local schedules; the best it
    can do is control the serialization order of
    global transactions by carefully controlling the
    order in which operations are sent to local
    systems for execution and in which these get
    acknowledged
  • Indirect conflict – Ti and Tk are in indirect
    conflict in si if there exists a sequence T1,
    ..., Tr of transactions in si such that Ti is in
    a direct conflict with T1 in si, Tj is in a
    direct conflict with Tj+1 in si for 1 ≤ j ≤ r−1,
    and Tr is in a direct conflict with Tk in si
  • Conflict equivalence – two schedules contain the
    same operations and the same direct and indirect
    conflicts

32
Global serializability
  • Global Conflict Serialization Graph
  • Let s be a global schedule for the local
    schedules s1, s2, ..., sn, and let G(si) denote
    the conflict serialization graph of si,
    1 ≤ i ≤ n, derived from direct and indirect
    conflicts. The global conflict serialization
    graph of s is defined as the union of all G(si),
    1 ≤ i ≤ n, i.e. G(s) = G(s1) ∪ ... ∪ G(sn)
  • Global serializability theorem
  • Let the local schedules s1, s2, ..., sn be given,
    where each G(si), 1 ≤ i ≤ n, is acyclic. Let s be
    a global schedule for the si, 1 ≤ i ≤ n. The
    global schedule s is globally conflict
    serializable iff G(s) is acyclic (a sketch of
    deriving these edges follows below)

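A sketch of deriving the edges of G(si) for the global transactions, including indirect conflicts: collect the direct conflicts of a local schedule, close them transitively, and keep the pairs between global transactions (the function name and the schedule encoding are illustrative assumptions):

    def serialization_edges(schedule, global_txns):
        """schedule: operations in execution order as (txn, op, item)
        triples, op being 'r' or 'w'. Returns the G(si) edges between
        global transactions, from direct and indirect conflicts."""
        direct = set()
        for k, (ti, oi, x) in enumerate(schedule):
            for tj, oj, y in schedule[k + 1:]:
                if ti != tj and x == y and 'w' in (oi, oj):
                    direct.add((ti, tj))
        closure = set(direct)       # the transitive closure turns chains
        changed = True              # of direct conflicts into indirect ones
        while changed:
            changed = False
            for a, b in list(closure):
                for c, d in list(closure):
                    if b == c and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return {(a, b) for a, b in closure
                if a in global_txns and b in global_txns}

    # s2 of slide 30: r3(b) w1(b) r2(c) w3(c); T1, T2 global, T3 local.
    s2 = [(3, 'r', 'b'), (1, 'w', 'b'), (2, 'r', 'c'), (3, 'w', 'c')]
    print(serialization_edges(s2, {1, 2}))   # {(2, 1)}: T2 before T1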
33
Global serializability - problems
  • To ensure global serializability, the
    serialization order of global transactions must
    be the same at all sites at which they execute
  • Serialization orders of local schedules must be
    validated by the HDDBS
  • These orders are neither reported by EDSs, nor
    can they be determined by controlling the
    submission of the global subtransactions or by
    observing their execution order

34
Example
  • Globally non-serializable schedule:
  • s1 = w1(a) r2(a): T1 → T2
  • s2 = w2(c) r3(c) w3(b) r1(b): T2 → T3 → T1
  • Globally serializable schedule:
  • s1 = w1(a) r2(a): T1 → T2
  • s2 = w2(c) r1(b): no conflict at site 2
  • Globally non-serializable schedule:
  • s1 = w1(a) r2(a): T1 → T2
  • s2 = w3(b) r1(b) w2(c) r3(c): T2 → T3 → T1

35
Quasi serializability
  • Rejects global serializability as the correctness
    criterion
  • The basic idea: we assume that no value
    dependencies exist among EDSs, so indirect
    conflicts can be ignored
  • In order to preserve global database consistency,
    only global transactions need to be executed in a
    serializable way, with proper consideration of
    the effects of local transactions

36
Quasi serializability
  • Quasi-serial schedule
  • A set of local schedules s1, ..., sn is quasi
    serial if each si is conflict serializable and
    there exists a total order < on the set T of
    global transactions such that Ti < Tj for Ti, Tj
    ∈ T, i ≠ j, implies that in each local schedule
    si, 1 ≤ i ≤ n, the Ti subtransaction occurs
    completely before the Tj subtransaction
  • Quasi serializability
  • A set of local schedules s1, ..., sn is quasi
    serializable if there exists a set s1', ..., sn'
    of quasi serial local schedules such that si is
    conflict equivalent to si' for 1 ≤ i ≤ n

37
Example (1)
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d, e}
  • Given two global transactions T1 and T2 and two
    local transactions T3 and T4:
  • T1 = w(a) r(d), T2 = r(b) r(c) w(e)
  • T3 = r(a) w(b), T4 = w(d) r(e)
  • Assume the following local schedules:
  • s1 = w1(a) r3(a) w3(b) r2(b)
  • s2 = r2(c) w4(d) r1(d) w2(e) r4(e)

38
Example (2)
  • The set {s1, s2} is quasi serializable, since it
    is conflict equivalent to the quasi serial set
    {s1, s2'}, where
  • s2' = w4(d) r1(d) r2(c) w2(e) r4(e)
  • The global schedule
  • s = w1(a) r3(a) r2(c) w4(d) r1(d) c1 w3(b) c3
    r2(b) w2(e) c2 r4(e) c4
  • is quasi serializable; however, s is not globally
    serializable
  • Since the quasi-serialization order is always
    compatible with the orderings of subtransactions
    in the various local schedules, quasi
    serializability is relatively easy to achieve for
    a GTM

39
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • The GTM assumes that local schedules are conflict
    serializable
  • There are various scenarios for guaranteeing
    global serializability
  • Rigorousness – local schedulers produce
    conflict-serializable rigorous schedules. A
    schedule is rigorous if it satisfies the
    following condition:
  • if oi(x) <s oj(x), i ≠ j, and oi, oj are in
    conflict, then ai <s oj(x) or ci <s oj(x)
  • Schedules in RG avoid any type of rw, wr, or ww
    conflict between uncommitted transactions

40
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • Given a federation of two servers:
  • D1 = {a, b}, D2 = {c, d}
  • Given two global transactions T1 and T2 and two
    local transactions T3 and T4:
  • T1 = w(a) w(d), T2 = w(c) w(b)
  • T3 = r(a) r(b), T4 = r(c) r(d)
  • Assume the following local schedules:
  • s1 = w1(a) c1 r3(a) r3(b) c3 w2(b) c2
  • s2 = w2(c) c2 r4(c) r4(d) c4 w1(d) c1
  • Both schedules are rigorous, but they yield
    different serialization orders: T1 → T3 → T2 in
    s1 and T2 → T4 → T1 in s2

41
Achieving Global Serializability through Local
Guarantees - Rigorousness
  • Commit-deferred transactions – a global
    transaction T is commit-deferred if its Commit
    operation is sent by the GTM to the local sites
    only after the local executions of all data
    operations of T have been acknowledged at all
    sites (see the sketch below)
  • Theorem: if si ∈ RG, 1 ≤ i ≤ n, and all global
    transactions are commit-deferred, then s is
    globally serializable

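A sketch of the commit-deferral rule at the GTM; the interface of the EDS stubs and the acknowledgement bookkeeping are illustrative assumptions:

    class CommitDeferringGTM:
        def __init__(self, sites):
            self.sites = sites     # site id -> EDS stub with execute/commit
            self.pending = {}      # txn -> set of unacknowledged (site, op)

        def submit(self, txn, site, op):
            self.pending.setdefault(txn, set()).add((site, op))
            self.sites[site].execute(txn, op)    # asynchronous in reality

        def acknowledge(self, txn, site, op):
            self.pending[txn].discard((site, op))

        def try_commit(self, txn, active_sites):
            # Commit-deferred: send Commit only after ALL data operations
            # of txn have been acknowledged at all sites.
            if self.pending.get(txn):
                return False       # still waiting for acknowledgements
            for site in active_sites:
                self.sites[site].commit(txn)
            return True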
42
Possible solutions
  • Bottom-up approach – observing the execution of
    global transactions at each EDS
  • Idea: the execution order of global transactions
    is determined by their serialization orders at
    each EDS
  • Problem: how to determine the serialization order
    of global transactions
  • Top-down approach – controlling the submission
    and execution order of global transactions
  • Idea: the GTM determines a global serialization
    order for global transactions before submitting
    them to the EDSs. It is the EDSs' responsibility
    to enforce that order at the local sites
  • Problem: how the order is enforced at local sites

43
Ticket-Based Method
  • How can the GTM obtain information about the
    relative order of subtransactions of global
    transactions at each EDS?
  • How can the GTM guarantee that subtransactions of
    each global transaction have the same relative
    order in all participating EDSs?
  • Idea: to force local direct conflicts between
    global transactions, or to convert indirect
    conflicts (not observable by the GTM) into direct
    (observable) conflicts

44
Ticket-Based Method
  • Ticket – a logical timestamp whose value is
    stored as a special data item in each EDS
  • Each subtransaction is required to issue the
    Take_A_Ticket operation:
  • r(ticket) w(ticket+1) (executed as a critical
    section; a sketch follows below)
  • Only subtransactions of global transactions have
    to take tickets
  • Theorem: if global transaction T1 takes its
    ticket before global transaction T2 at a server,
    then T1 will be serialized before T2 by that
    server
  • In other words, the tickets obtained by
    subtransactions determine their relative
    serialization order

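A sketch of Take_A_Ticket at one EDS; a mutex stands in for the critical section that the EDS's own concurrency control would provide around r(ticket) w(ticket+1):

    import threading

    class TicketServer:
        def __init__(self):
            self.ticket = 0                 # the special ticket data item
            self._cs = threading.Lock()

        def take_a_ticket(self):
            with self._cs:                  # critical section
                value = self.ticket         # r(ticket)
                self.ticket = value + 1     # w(ticket+1)
            return value                    # the subtransaction's position

    server = TicketServer()
    t1 = server.take_a_ticket()             # T1 takes its ticket first ...
    t2 = server.take_a_ticket()
    assert t1 < t2                          # ... so T1 serializes before T2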
45
Example (1)
  • Given a federation of two servers:
  • D1 = {a}, D2 = {b, c}
  • Given two global transactions T1 and T2 and a
    local transaction T3:
  • T1 = r(a) w(b), T2 = w(a) r(c)
  • T3 = r(b) w(c)
  • Assume the following local schedules:
  • s1 = r1(a) c1 w2(a) c2: T1 → T2
  • s2 = r3(b) w1(b) c1 r2(c) c2 w3(c) c3
  • the schedule is not globally serializable: s2
    yields T2 → T3 → T1

46
Example (2)
  • Using tickets (Ii is the ticket at server i), the
    local schedules look as follows:
  • s1 = r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1)
    w2(a) c2
  • s2 = r3(b) r1(I2) w1(I2+1) w1(b) c1 r2(I2)
    w2(I2+1) r2(c) c2 w3(c) c3
  • The indirect conflict between the global
    transactions in schedule s2 has been turned into
    an explicit one; the schedule s2 is no longer
    conflict serializable

[Serialization graph of s2: the cycle
T3 → T1 → T2 → T3]
47
Example (3)
  • Consider another set of schedules:
  • s1 = r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1)
    w2(a) c2
  • s2 = r3(b) r2(I2) w2(I2+1) r1(I2) w1(I2+1) w1(b)
    c1 r2(c) c2 w3(c) c3
  • Now both schedules are conflict serializable; the
    tickets obtained by the transactions determine
    their serialization order

48
Optimistic ticket method
  • Optimistic ticket method (OTM) – the GTM must
    ensure that the subtransactions have the same
    relative serialization order in their
    corresponding EDSs
  • The idea is to allow the subtransactions to
    proceed but to commit them only if their ticket
    values have the same relative order in all
    participating EDSs
  • Requirement: EDSs must support a visible
    prepare-to-commit state for all subtransactions
  • A prepare-to-commit state is visible if the
    application program can decide whether the
    transaction should commit or abort

49
Optimistic ticket method
  • A global transaction T proceeds as follows:
  • the GTM sets a timeout for T
  • it submits all subtransactions of T to their
    corresponding EDSs
  • if they enter their prepare-to-commit state, they
    wait for the GTM to validate T
  • commit or abort is broadcast
  • The GTM validates T using the ticket graph: the
    graph is tested for cycles involving T (see the
    sketch below)
  • Problems with OTM:
  • global aborts caused by ticket operations
  • the probability of global deadlocks increases

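A sketch of the validation step: build the ticket graph from the ticket values observed per EDS and test it for a cycle through the transaction being validated (the input encoding is an illustrative assumption):

    def validate(txn, tickets):
        """tickets: site id -> {txn id: ticket value taken there}.
        Commit txn only if the ticket graph has no cycle through txn."""
        edges = set()
        for per_site in tickets.values():
            for t1, v1 in per_site.items():
                for t2, v2 in per_site.items():
                    if v1 < v2:
                        edges.add((t1, t2))   # t1 ticketed before t2
        succ = {}
        for u, v in edges:
            succ.setdefault(u, set()).add(v)
        stack, seen = [txn], set()
        while stack:                          # DFS from txn
            for v in succ.get(stack.pop(), ()):
                if v == txn:
                    return False              # cycle through txn: abort
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return True                           # safe to commit txn

    # Example (3) above: T1 before T2 at server 1, T2 before T1 at
    # server 2 - validation fails.
    print(validate(1, {1: {1: 0, 2: 1}, 2: {2: 0, 1: 1}}))   # False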
50
Cache Coherence and Concurrency Control for
Data-Sharing Systems
51
Architectures for Parallel Distributed Database
Systems
  • Three main architectures:
  • Shared memory systems
  • Shared disk systems
  • Shared nothing systems
  • Shared memory system – multiple CPUs are attached
    to an interconnection network and can access a
    common region of main memory
  • Shared disk system – each CPU has a private
    memory and direct access to all disks through an
    interconnection network
  • Shared nothing system – each CPU has local memory
    and disk space, but no two CPUs can access the
    same storage area; all communication is through a
    network connection

52
Shared memory system
[Figure: shared memory system – several processors
(P) attached to an interconnection network share a
global main memory and the disks (D).]
53
Shared disk system
[Figure: shared disk system – each processor (P) has
a private memory (M); all processors reach all disks
(D) through the interconnection network.]
54
Shared nothing system
[Figure: shared nothing system – each processor (P)
has its own memory (M) and disks (D); processors
communicate only through the interconnection
network.]
55
Characteristic of architectures
  • Shared memory:
  • it is closest to a conventional machine, and many
    commercial DBMSs have been ported to this
    platform
  • communication overhead is low
  • memory contention becomes a bottleneck as the
    number of CPUs increases
  • Shared disk – similar characteristics
  • Interference problem – as more CPUs are added,
    existing CPUs are slowed down because of the
    increased contention for memory access and
    network bandwidth
  • A system with 1,000 CPUs may be only 4% as
    effective as a single-CPU system

56
Shared nothing
  • It provides almost linear speed-up, in that the
    time taken for operations decreases in proportion
    to the increase in the number of CPUs and disks
  • It provides almost linear scale-up, in that
    performance is sustained if the number of CPUs
    and disks is increased in proportion to the
    amount of data
  • Powerful parallel database systems can be built
    by taking advantage of the rapidly improving
    performance of single CPUs

57
Shared nothing
[Figure: two plots. Speed-up – transactions/second
grows with the # of CPUs; scale-up with DB size –
transactions/second is sustained as the # of CPUs
and the DB size grow together.]
58
Concurrency and cache coherency problem
  • Data pages can be dynamically replicated in more
    than one server cache to exploit access locality
  • Synchronization of reads and writes requires some
    form of distributed lock management; invalidation
    of stale copies of data items, or propagation of
    updated data items, must be communicated among
    the servers
  • Basic assumption for data-sharing systems: each
    individual transaction is executed solely on one
    server (i.e., a transaction does not migrate
    among servers during its execution)

59
Callback Locking
  • We assume that both concurrency control and cache
    coherency control are page oriented
  • Each server has a global lock manager and a local
    lock manager
  • Data items are assigned to global lock managers
    in a static manner (e.g., via hashing), so each
    global lock manager is responsible for a fixed
    subset of the data items; we say that this global
    lock manager has the global lock authority for a
    data item
  • The global lock manager knows, for each of its
    data items and at each point in time, whether the
    item is locked or not

60
Callback Locking - concurrency control
  • When a transaction requests a lock or wants to
    release a lock, it first addresses its local lock
    manager, which can then contact the global lock
    manager
  • The simplest way is to forward all lock and
    unlock requests to the global lock manager that
    has the global lock authority for the given data
    item
  • If a local lock manager is authorized to manage
    read locks (or write locks) locally, then it can
    save message exchanges with the global lock
    manager

61
Callback Locking - concurrency control
  • Local read authority enables a local lock manager
    to grant local read locks for a data item
  • Local write authority enables a local lock
    manager to grant local read/write locks for a
    data item
  • A write authority has to be returned to the
    corresponding global lock manager if another
    server wants to access the data item
  • A read authority can be held by several servers
    simultaneously; it has to be returned to the
    corresponding global lock manager if another
    server wants to perform a write access on the
    data item

62
Callback Locking - concurrency control
  • The cache coherency protocol needs to ensure
    that:
  • multiple caches can hold up-to-date versions of a
    page simultaneously as long as the page is only
    read, and
  • once a page has been modified in one of the
    caches, this cache is the only one allowed to
    hold a copy of the page
  • A Callback message revokes the local lock
    authority (a sketch follows below)

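A sketch of the home site's side of callback locking for one data item x, covering the read-authority case shown in the figures on the next two slides; message passing is reduced to method calls, an illustrative simplification:

    class HomeSite:
        """Global lock manager ('home') of one data item x."""
        def __init__(self):
            self.readers = set()        # servers holding read authority
            self.writer = None          # server holding write authority

        def request_read_authority(self, server):
            if self.writer not in (None, server):
                self._callback(self.writer)   # revoke the write authority
                self.writer = None
            self.readers.add(server)
            return "Rlock authority(x)"

        def request_write_authority(self, server):
            # Before granting the write authority, call back every other
            # cached read authority; their copies of x become invalid.
            for other in self.readers - {server}:
                self._callback(other)
            self.readers = {server}
            self.writer = server
            return "Wlock authority(x)"

        def _callback(self, server):
            # In reality: send Callback(x) and wait for the OK answer;
            # the server may first let its reading transactions commit.
            pass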
63
Callback Locking
[Figure: message sequence; Server A is Home(x).
Transaction T1 at Server B issues r1(x): B sends
Rlock(x) to Home(x) and receives the Rlock authority
for x. Transaction T2 at Server C issues r2(x), and C
likewise obtains the Rlock authority. B then executes
c1 r3(x) c3 locally under its read authority, while
w4(x) is issued at Server C.]
64
Callback Locking
[Figure: continuation. w4(x) at Server C triggers a
Wlock(x) request to Home(x). Home(x) sends
Callback(x) to the servers holding the read
authority; one answers OK immediately, the other only
after its reading transaction commits (c2). Home(x)
then grants the Wlock authority for x to Server C.]