Synchronization in Distributed Systems

Transcript and Presenter's Notes

1
Synchronization in Distributed Systems
  • EECS 750
  • Spring 1999
  • Course Notes Set 3
  • Chapter 3
  • Distributed Operating Systems
  • Andrew Tanenbaum

2
Synchronization in Distributed Systems
  • A computation must be composed of components
    separated logically, if not physically, or it
    cannot be considered to be distributed
  • Just as this implies that communication is a
    necessary component, so too is some form of
    synchronization
  • Distributed components of the computation
    cooperate and exchange information
  • This implies implicit and explicit constraints on
    how the components execute relative to one
    another
  • Such constraints are ensured and enforced by
    various forms of synchronization

3
Synchronization
  • Interacting components whose execution is
    constrained are obviously communicating
  • Communication support is necessary but not
    sufficient for distributed systems
  • Some forms of communication are synchronous
    implying both properties
  • As with communication
  • Different situations require different semantics
  • Weakest adequate semantics are usually the best
    choice
  • As with communication
  • Synchronization in distributed systems is like
    that in uni-processor systems, only more so

4
Synchronization
  • Uni-processor systems present all of the basic
    synchronization scenarios and problems
  • Critical Sections
  • Mutual Exclusion
  • Counting Semaphores - resource allocation
  • Atomic Transactions
  • BUT a uni-processor is pseudo-parallel
  • Canonical problems are often simplified or are
    special cases of the general problem
  • Multi-processors, multi-computers, and networks
    of workstations all have different implications

5
Synchronization
  • Implicit assumptions change in moving from one
    architecture to another
  • Uni-processor → Multi-processor
  • True parallelism changes the probability of
    various scenarios by removing pseudo-parallel
    constraints
  • Changes the methods by which critical sections
    must be or are best protected
  • Still preserves the most basic assumption: atomic
    operations on shared memory
  • Multiple caches and NUMA hierarchy can make this
    complicated

6
Synchronization
  • Single box → Multiple distributed boxes
  • Violates the assumption that all components can
    have atomic access to shared memory
  • Requires new methods of supporting
    synchronization
  • All synchronization methods ultimately decide
  • Which set of computation operations must be
    controlled and which need not
  • In what order to execute computation operations
  • Sets of events that can be done at the same time
    or that can be done in any order are concurrent
  • Sets of events that must be done one at a time
    are sequential

7
Synchronization
  • Many synchronization methods in distributed
    systems thus depend on
  • How the system can tell the time at which events
    occur
  • How the system can tell the order in which events
    occur
  • Under the principle of weakening semantics for
    better performance, there are many forms of
    event ordering
  • We will consider
  • Mutual Exclusion
  • Election
  • Atomic Transactions
  • Deadlock

8
Clock Synchronization
  • Coordination in DS often requires a knowledge of
    when things happen which implies a clock of some
    kind
  • We will see that not all situations require
    clocks with semantics of the same strength
  • Distributed systems are often more complicated
    than non-distributed equivalents because they
    require distributed rather than centralized
    algorithms
  • The properties of distributed algorithms, as
    always, determine the set of system services
    required
  • Distribution is often more complex and difficult
    than one first expects
  • Centralized architectures are not necessarily bad

9
Clock Synchronization Distributed Algorithm
Properties
  • Distributed algorithms have properties with
    important implications
  • Information scattered among many components
  • Computation components (processes) make decisions
    based on locally available information
  • Single points of failure should be avoided
  • No common clock or other precise global time
    source
  • Yet, some form of distributed sense of time is
    required
  • How precise depends on what has to be
    synchronized
  • Coarse grain is easy
  • Most DS require fine enough grain to be hard

10
Clock Synchronization Distributed Algorithm
Properties
  • First three properties argue against
    centralization for resource allocation and other
    types of management
  • Limits scalability
  • Single point of failure
  • Requires a new approach to algorithm design
  • Fourth property points out that time is different
    in centralized and distributed systems
  • Temporal values have meaning in a DS only within
    a given granularity determined by clock
    synchronization
  • Unattended clocks can drift by hours or days
  • ITTC uses GPS and Network Time Protocol (NTP) to
    synchronize within fractions of a second

11
Clock Synchronization Distributed Algorithm
Properties
  • Consider the problem of a distributed file system
    and compilation environment with files and
    compilers on multiple distributed machines
  • Make tracks relations among source and output
    files to determine what needs to be recompiled at
    any given time
  • Make uses the creation time stamp of the files to
    determine if a source file is younger than an
    output file
  • Does not depend on the validity of the times of
    each file
  • Does depend on the times imposing a correct
    order on the set of creation events for each file
  • Single incorrect clock in uni-processor works
    just fine
  • Multiple clocks must be synchronized closely
    enough

12
Logical Clocks
  • No computer has an absolute clock and no computer
    keeps absolute time
  • Computers keep logical time for a number of
    reasons and in a number of different senses
  • Logical time is a representation of absolute time
    in the computer subject to a number of different
    constraints
  • How does a computer obtain a sense of time?
  • Often a periodic interrupt updating software
    clock
  • Imposes constraints on temporal resolution and
    overhead
  • Raising interrupt frequency raises the resolution
    but also the overhead

13
Logical Clocks
  • Where does the periodic interrupt come from?
  • Timer hardware with an oscillating crystal
  • Interrupt programmed for every N crystal
    oscillations
  • Crystals differ from one another quite a bit
  • Clock drift is the difference in rate between two
    clocks
  • Clock drift over time results in a difference in
    value between clocks called skew
  • Lamport observed
  • Clock synchronization within reasonable limits is
    possible
  • Useable synchronization need not be absolute

14
Logical Clocks
  • Degree of synchronization required depends on the
    time scale of the operations being synchronized
    and the semantics of the synchronization
  • Lamport based his approach on several
    observations
  • Components which do not interact place no
    constraint on the synchronization of their clocks
  • Interacting components often care only about the
    order in which events occur, not their times
  • Even when a global time is required, it can often
    be a logical time differing arbitrarily from
    real-time
  • When real-time does matter, the system must be
    designed to tolerate the real clock
    synchronization tolerance

15
Logical Clocks
  • Algorithms which depend on temporal ordering but
    which do not depend on absolute time use logical
    clocks
  • Absolute time is given by physical clocks
  • Lamport's algorithm synchronizes logical clocks

16
Lamport's Logical Clock Synchronization
  • Lamport's approach to logical clocks is used in
    many situations in distributed systems where
    ordering is important but global time is not
    required
  • Example of weakening semantics to simplify and/or
    increase efficiency
  • Begin with an important relation: happens-before
  • A happens-before B (A → B) when all the
    processes involved in a distributed decision
    agree that event A occurred first, and that then
    B occurred
  • Note that this does not mean that A actually
    happened before B according to a hypothetical
    absolute global clock

17
Lamport's Logical Clock Synchronization
  • A system can know the happens-before applies
    when
  • 1) Events A and B are observed by the same
    process, or by different processes with the same
    global clock, and A happens before B, then A → B
  • 2) Event A denotes sending a message, and event
    B denotes receiving the same message, then A → B
    since a message cannot be received
    before it is sent
  • 3) Happens-before is transitive, so A → B and
    B → C implies A → C
  • If two events X and Y do not interact through
    messages then they are concurrent, since neither
    X → Y nor Y → X can be determined,
    nor does it matter

18
Lamport's Logical Clock Synchronization
  • We have, thus, distinguished between concurrent
    events whose global ordering can be ignored, and
    events to which a global logical time must be
    assigned
  • This global logical time is denoted as the
    logical clock value of an event
  • Consider the previous two situations
  • Events on the same system
  • send and receive events on different systems

19
Lamport's Logical Clock Synchronization
  • On the same system, if A → B then C(A) < C(B)
    trivially since the two events on the same system
    can easily use the same clock
  • Note that the temporal granularity of the system
    clock being used must be sufficient to
    distinguish A and B
  • Otherwise C(A) = C(B) even though A → B
  • When the events occur on different systems, we
    must assign C(A) and C(B) in such a way that the
    necessary relation C(A) < C(B) holds without ever
    decreasing a time value
  • Thus logical clock values of an event may be
    changed but always by moving them forward
  • Logical clocks in a distributed system always run
    as fast or faster than the physical clocks with
    which they interact

20
Lamport's Logical Clock Synchronization
  • Consider how logical times are assigned in a
    specific scenario
  • Figure 3-2 page 123 of Tanenbaum
  • In Part A of the figure the three machines have
    clocks running at different speeds, and the event
    times are not consistent with the happens-before
    relation
  • Note that message C arrives at local time 56 even
    though it was sent at local time 60
  • This is a contradiction because the logical
    receive time is earlier than the logical send time
  • Clearly, a message must be received after it is
    sent, so the receiver corrects this by setting its
    clock to 61 (one more than the send time)

21
Lamport's Logical Clock Synchronization
  • Adjusting the receiving clock to 61 or greater
    ensures that happens-before applies and events
    can be assigned a rational logical order
  • Figure 3-2(b) shows this adjustment to the clock
    at the receiver of C
  • Every message transfer takes at least 1 time tick
  • Any clock, logical or physical, has finite
    resolution
  • Two events occurring close enough together happen
    at the same time
  • All clock values are thus limited to creating
    partial rather than total orders on a set
  • Some distributed algorithms require a total order

22
Lamport's Logical Clock Synchronization
  • Additional refinement: Tie Breaker
  • If a total order is required and C(A) = C(B) for
    two events A and B
  • then we use some unique property of the processes
    associated with the events to choose the winner
  • Process ID (PID) is often used for this purpose
  • Establishes a total order on a set of events
  • Recall that ties can happen only between events
    happening on the same system since we already
    asserted that every message transfer takes at
    least one tick of the logical clock

23
Lamport's Logical Clock Synchronization
  • Following these rules means that the logical
    clock at each node in a distributed system is now
    sufficient to reason about synchronization
    problems
  • Logical clock provides a way for each system to
    decide about the order in which events occur from
    each system's point of view
  • Consider the connection to in-order message
    delivery in ensuring logically consistent
    decision making among distributed components of a
    computation
  • HOWEVER the logical clock values at each
    distributed component may have little or no
    relation to real time or to each other

24
Physical Clocks
  • All clocks are logical clocks in the sense that
    each
  • Has finite resolution
  • Approximates real time
  • Two important questions must always be considered
    for a particular system
  • How do we synchronize the computer's logical
    clock with real time?
  • How do we synchronize computer clocks with one
    another?
  • Computers and distributed software running on
    them may have several clocks, but one is the
    local notion of real time

25
Physical Clocks Real Time
  • Sun Time
  • Humans, including astronomers, want to have time
    keeping stay synchronized with the sun
  • Harder than it seems
  • Consider transition to Gregorian calendar
  • Significantly shorter year was decreed to adjust
    drift in previous scheme
  • Is the year 2000 a leap year? (Hint: 4, -100, +400)
  • Atomic Time
  • 50 Cesium 133 clocks around the world
  • Average number of ticks since 1/1/58

26
Physical Clocks Real Time
  • Atomic time is the official universal time
  • Requires leap seconds every few years to stay
    synchronized with the earth's rotation
  • Astronomers care because it makes a difference
    where they point their instruments
  • They work at a much finer time scale than you
    might think
  • So do computers and distributed computations
  • GPS (Global Positioning System) satellites now
    make this easy and cheap to get
  • You can also call NIST on the telephone in Ft.
    Collins

27
Physical Clocks Clock Synchronization
  • Consider the difference between accuracy and
    synchronization of two clocks
  • Their accuracy is how closely they agree with
    real time
  • Their synchronization is how closely they agree
    with each other
  • Synchronization of clocks in a network supporting
    distributed decision making is often more
    important than their accuracy
  • Synchronization of clocks affects how easily
    distributed components can decide on an ordering
    of events
  • Synchronization of clocks within a few
    milliseconds of each other is desirable, but
    seconds or minutes of drift from real time could
    be OK

28
Physical Clocks Clock Synchronization
  • Degree of agreement among interacting machines is
    thus a crucial factor
  • Consider make horror scenarios to see this point
  • Same principle applies to banks
  • Network performance measurement experiments often
    depend on time stamps taken on different
    machines
  • Experiments are often restructured to minimize or
    avoid this
  • Physical clocks run at different speeds
  • Manufacturers specify maximum drift rate (rho, ρ)
  • Manufacturers lie (sorry - provide factually
    unreliable information in a completely sincere
    manner)

29
Physical Clocks Clock Synchronization
  • Maximum resolution desired for global time
    keeping determines the maximum difference d
    which can be tolerated between synchronized
    clocks
  • The time keeping of a clock, its tick rate dC/dt,
    should satisfy 1 - ρ ≤ dC/dt ≤ 1 + ρ
  • The worst possible divergence between two clocks
    that resynchronized Δt ago is thus 2ρΔt
  • So the maximum time Δt between clock
    synchronization operations that can ensure d is
    Δt ≤ d / (2ρ)
30
Physical Clocks Clock Synchronization
  • Cristian's Algorithm
  • Periodically poll the machine with access to the
    reference time source
  • Estimate round-trip delay with a time stamp
  • Estimate interrupt processing time
  • figure 3-6, page 129 Tanenbaum
  • Take a series of measurements to estimate the
    time it takes for a timestamp to make it from the
    reference machine to the synchronization target
  • This allows the synchronization to converge
    within d with a certain degree of confidence
  • Probabilistic algorithm and guarantee

31
Physical Clocks Clock Synchronization
  • Wide availability of hardware and software to
    keep clocks synchronized within a few
    milliseconds across the Internet is a recent
    development
  • Network Time Protocol (NTP) discussed in papers
    by David Mills
  • GPS receiver in the local network synchronizes
    other machines
  • What if all machines have GPS receivers?
  • Increasing deployment of distributed system
    algorithms depending on synchronized clocks
  • Supply and demand constantly in flux

32
Physical Clocks At-Most-Once Semantics
  • Traditional approach
  • Each message has unique message ID
  • Server maintains list of IDs
  • Can lose message numbers on server crash
  • How long does server keep IDs?
  • With globally synchronized clocks
  • Sender assigns a timestamp to message
  • Server keeps most recent timestamp for each
    connection
  • Reject any message with a lower timestamp (it is
    a duplicate)
  • Removing old timestamps
  • G = CurrentTime - MaxLifeTime - MaxClockSkew
  • Timestamps older than G are removed
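
A small sketch of this timestamp-based duplicate filter (constants
and names are illustrative):

    # Sketch of timestamp-based duplicate rejection (illustrative constants).
    MAX_LIFETIME = 60.0      # seconds a message may survive in the network
    MAX_CLOCK_SKEW = 0.5     # worst-case disagreement between synchronized clocks

    last_timestamp = {}      # most recent timestamp accepted per connection

    def accept(conn_id, msg_timestamp, now):
        G = now - MAX_LIFETIME - MAX_CLOCK_SKEW
        if msg_timestamp < G:
            return False     # too old to be live: reject
        if msg_timestamp <= last_timestamp.get(conn_id, G):
            return False     # not newer than the last one seen: duplicate
        last_timestamp[conn_id] = msg_timestamp
        return True          # entries older than G can be purged lazily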

33
Physical Clocks At-Most-Once Semantics
  • After a server crash
  • CurrentTime is recomputed
  • using global synchronization of time
  • All messages older than G are rejected
  • All messages before crash are rejected as
    duplicate
  • some new messages may be wrongfully rejected
  • but at-most-once semantics is guaranteed

34
Physical Clocks Cache Coherence
  • File caching in a distributed file system
  • Many readers, single writer
  • Writer must ask readers to invalidate their
    copies
  • TS on the readers' copies helps by making copies
    expire
  • Readers lease their copies of a file block
  • Constrains the period during which a
    non-responding reader may delay a potential
    writer
  • Does "NFS server not responding" sound familiar?
  • Note tradeoff of overhead and latency
  • Lower lease time increases message load and
    decreases delay of ignoring a non-responding
    reader

35
Mutual Exclusion
  • Distributed components still need to coordinate
    their actions, including but not limited to
    access to shared data
  • Mutual exclusion to some limited set of
    operations and data is thus required
  • Consider several approaches and compare and
    contrast their advantages and disadvantages
  • Centralized Algorithm
  • The single central process is essentially a
    monitor
  • Central server becomes a semaphore server
  • Three messages per use: request, grant, release
  • Centralized performance constraint and point of
    failure

36
Mutual Exclusion Distributed Algorithm Factors
  • Functional Requirements
  • 1) Freedom from deadlock
  • 2) Freedom from starvation
  • 3) Fairness
  • 4) Fault tolerance
  • Performance Evaluation
  • Number of messages
  • Latency
  • Semaphore system Throughput
  • Synchronization is always overhead and must be
    accounted for as a cost

37
Mutual Exclusion Distributed Algorithm Factors
  • Performance should be evaluated under a variety
    of loads
  • Cover a reasonable range of operating conditions
  • We care about several types of performance
  • Best case
  • Worst case
  • Average case
  • Different aspects of performance are important
    for different reasons and in different contexts

38
Mutual Exclusion Lamport's Algorithm
  • Every site keeps a request queue sorted by
    logical time stamp
  • Uses Lamport's logical clocks to impose a total
    global order on events associated with
    synchronization
  • Algorithm assumes ordered message delivery
    between every pair of communicating sites
  • Messages sent from site Si to site Sj in a
    particular order arrive at Sj in the same order
  • Note Since messages arriving at a given site
    come from many sources the delivery order of all
    messages can easily differ from site to site

39
Lamport's Algorithm Request Resource r
  • Thus, each site has a request queue containing
    resource use requests and replies
  • Note that the requests and replies for any given
    pair of sites must be in the same order in queues
    at both sites
  • Because of message order delivery assumption

40
Lamport's Algorithm Entering CS for Resource r
  • Site Si enters the CS protecting the resource
    when (L1) it has received a message with a larger
    timestamp than its own request from every other
    site, and (L2) its own request is at the head of
    its local request_queue
  • This ensures that no message from any site with a
    smaller timestamp could ever arrive
  • This ensures that no other site will enter the CS
  • Recall that requests to all potential users of
    the resource and replies from them go into
    request queues of all processes including the
    sender of the message

41
Lamport's Algorithm Releasing the CS
  • The site holding the resource releases it by
    removing its request from its own queue and
    sending a RELEASE message to all other sites,
    which then remove that request from their queues
  • Note that the request for resource r had to be at
    the head of the request_queue at the site holding
    the resource or it would never have entered the
    CS
  • Note that the request may or may not have been at
    the head of the request_queue at the receiving
    site
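
A compressed sketch of one site's message handlers under these rules
(the send function, timestamps, and delivery guarantees are assumed
to be supplied by the system; names are illustrative, not the full
algorithm):

    # Sketch of Lamport's mutual exclusion at one site (illustrative).
    import heapq

    class LamportMutexSite:
        def __init__(self, site_id, peers, send):
            self.id = site_id
            self.peers = peers           # all other sites
            self.send = send             # send(dest, msg), supplied by the system
            self.queue = []              # requests ordered by (timestamp, site)
            self.latest = {p: 0 for p in peers}   # newest timestamp seen per peer

        def request(self, ts):
            heapq.heappush(self.queue, (ts, self.id))
            for p in self.peers:
                self.send(p, ("REQUEST", ts, self.id))

        def on_request(self, ts, sender, my_ts):
            heapq.heappush(self.queue, (ts, sender))
            self.send(sender, ("REPLY", my_ts, self.id))

        def on_reply(self, ts, sender):
            self.latest[sender] = max(self.latest[sender], ts)

        def may_enter(self, my_request_ts):
            # L1: every peer has sent something timestamped later than our
            #     request, so no earlier request can still be in transit.
            # L2: our own request is at the head of the local queue.
            peers_ok = all(t > my_request_ts for t in self.latest.values())
            head_ok = bool(self.queue) and self.queue[0] == (my_request_ts, self.id)
            return peers_ok and head_ok

        def release(self, ts):
            heapq.heappop(self.queue)    # our request was at the head
            for p in self.peers:
                self.send(p, ("RELEASE", ts, self.id))
            # On RELEASE, receivers remove our request from their queues.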

42
Lamport ME Example
(Timeline figure: Pi requests the CS with timestamp i5 and Pj with
timestamp j10. Each request is queued at both sites, so the queues
grow from queue(i5) and queue(j10) to queue(j10, i5), and both sites
exchange reply(12) messages. Pi's request has the smaller timestamp,
so Pi enters the critical section first; when it finishes it sends
release(i5), the queues drop back to queue(j10), and Pj then enters
the critical section.)
43
Lamport's Algorithm Correctness
  • We show that Lamport's algorithm ensures mutual
    exclusion through a proof by contradiction
  • Assume two sites Si and Sj are executing in
    the critical section concurrently
  • For this to happen, L1 and L2 must hold at both
    sites concurrently, which implies that at some
    time t both sites Si and Sj had their
    own requests at the top of their respective
    request_queues
  • Without loss of generality (WLOG) assume that
    Si's request has the smaller timestamp
  • Due to L1 and the FIFO property of communication
    it is clear that at time t Sj must have had the
    request from Si in its request_queue

44
Lamport's Algorithm Correctness
  • This implies that at site Sj the local request
    is at the head of the local request_queue
    even though the request from Si had a lower
    timestamp
  • This is a contradiction
  • Lamport's algorithm thus ensures mutual
    exclusion since assuming otherwise produces a
    contradiction
  • Key idea is that L1 ensures that Sj must place
    the request from Si ahead of its own because it
    definitely arrived and has a lower logical
    timestamp

45
Lamport's Algorithm Comments
  • Performance 3(N-1) messages per CS invocation
    since each requires (N-1) REQUEST, REPLY, and
    RELEASE messages
  • Observation: Some REPLY messages are not required
  • If Sj sends a request to Si and then receives a
    REQUEST from Si with a timestamp smaller than its
    own REQUEST
  • Sj need not send a reply to Si because Si
    already has enough information to make a decision
  • This reduces the messages to between 2(N-1) and
    3(N-1)
  • As a distributed algorithm there is no single
    point of failure but there is increased overhead

46
Ricart and Agrawala
  • Refine Lamport's mutual exclusion by merging the
    REPLY and RELEASE messages
  • Assumption: total ordering of all events in the
    system, implying the use of Lamport's logical
    clocks with tie breaking
  • Request CS (P) operation
  • 1) The site requesting the CS creates a
    timestamped REQUEST message and sends it to all
    processes using the CS including itself
  • Messages are assumed to be reliably delivered in
    order
  • Group communication support can play an obvious
    role

47
Ricart and Agrawala Receive a CS Request
  • If the receiver is not currently in the CS and
    does not have pending request for it in its
    request_queue
  • Send REPLY
  • If the receiver is already in the CS
  • Queue the request, sending no reply
  • If the receiver desires the CS but has not
    entered
  • Compare the TS of its request to that just
    received
  • REPLY if received is newer
  • Queue the request if pending request is newer

48
Ricart and Agrawala
  • Enter a CS
  • A process enters the CS when it receives a REPLY
    from every member of the group that can use the
    CS
  • Leave a CS
  • When the process leaves the CS it sends a REPLY
    to the senders of all pending messages on its
    queue
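
A sketch of the request/reply handling at one process, following the
rules above (send and the group membership are assumed to be
provided; names are illustrative):

    # Sketch of Ricart-Agrawala handling at one process (illustrative).
    class RicartAgrawala:
        def __init__(self, pid, group, send):
            self.pid = pid
            self.group = set(group)       # every process that may use the CS
            self.send = send              # send(dest, msg), supplied by the system
            self.state = "RELEASED"       # RELEASED, WANTED, or HELD
            self.my_ts = None             # timestamp of our pending request
            self.deferred = []            # requests answered only on exit
            self.replies = set()

        def on_request(self, ts, sender):
            mine_first = (self.state == "HELD" or
                          (self.state == "WANTED" and
                           (self.my_ts, self.pid) < (ts, sender)))
            if mine_first:
                self.deferred.append(sender)          # queue it, reply later
            else:
                self.send(sender, ("REPLY", self.pid))

        def on_reply(self, sender):
            self.replies.add(sender)
            if self.state == "WANTED" and self.replies == self.group - {self.pid}:
                self.state = "HELD"                   # all replies in: enter CS

        def exit_cs(self):
            self.state = "RELEASED"
            for p in self.deferred:                   # combined REPLY/RELEASE
                self.send(p, ("REPLY", self.pid))
            self.deferred = []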

49
Ricart and Agrawala Example 1
(Timeline figure: sites I and K both request the CS, I with timestamp
8 and K with timestamp 12. J replies OK to both requests. K also
replies OK to I because I's request has the smaller timestamp, so I
enters the CS first; when I exits it sends OK to K, and K then
enters.)
50
Ricart and Agrawala Example 2
(Timeline figure: I, J, and K request the CS with timestamps 7, 8,
and 9. I has the smallest timestamp, so it queues the other two
requests, first q(j8) and then q(j8, k9), while J queues q(k9). J and
K reply OK to I, which enters the CS first. On exit I sends OK to J
and K; J, now holding replies from both, enters next, and after J
exits and replies, K enters last.)
51
Ricart and Agrawala Proof by Contradiction
  • Assume sites Si and Sj are executing in the CS
    concurrently
  • Assume that Si's request has the smaller timestamp
  • Site Si clearly received the request from Sj
    after making its own
  • Otherwise Si's request would carry the larger
    timestamp
  • However, Sj can be executing concurrently with
    Si only if Si returns a REPLY message in response
    to the request from Sj before Si exits its CS
  • This is impossible because Sj's request has the
    larger timestamp, so Si defers its reply until it
    leaves the CS
  • The assumption leads to a contradiction and thus
    the R-A algorithm ensures mutual exclusion
  • Performance 2(N-1) messages, (N-1) REQUEST and
    (N-1) REPLY

52
Ricart and Agrawala Observations
  • The algorithm works because the global logical
    clock ensures a global total ordering on events
  • This ensures, in turn, that the decision about
    who enters the CS is unambiguous
  • Single point of failure is now N points of
    failure
  • A crashed group member cannot be distinguished
    from a busy CS
  • Distributed and optimized version is N times
    more vulnerable than the centralized version!
  • Explicit message denying entry helps reliability
    and converts this into busy wait

53
Ricart and Agrawala Observations
  • Either group communication support is used, or
    each user of the CS must keep track of all other
    potential users correctly
  • Powerful motivation for standard group
    communication primitives
  • Argument against a centralized server said that a
    single process involved in each CS decision was
    bad
  • Now we have N processes involved in each decision
  • Improvement: get a majority - Maekawa's
    algorithm
  • Bottom line: a distributed algorithm is possible
  • Shows theoretical and practical challenges of
    designing distributed algorithms that are useful

54
Token Passing Mutex
  • General structure
  • One token per CS → token denotes permission to
    enter
  • Only the process with the token is allowed in the CS
  • Token passed from process to process → logical
    ring
  • Mutex
  • Pass token to process (i + 1) mod N
  • Received token gives permission to enter CS
  • Hold token while in CS
  • Must pass token after exiting CS
  • Fairness ensured: each process waits at most N-1
    entries to get the CS
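
A minimal sketch of the token-ring scheme at process i (message
transport is assumed; names are illustrative):

    # Sketch of token-ring mutual exclusion at process i of N (illustrative).
    class TokenRingSite:
        def __init__(self, i, n, send):
            self.i = i
            self.n = n
            self.send = send            # send(dest, msg), supplied by the system
            self.has_token = (i == 0)   # exactly one process starts with the token
            self.wants_cs = False

        def on_token(self):
            self.has_token = True
            if self.wants_cs:
                self.enter_cs()         # hold the token for the whole CS
                self.wants_cs = False
            self.pass_token()           # must pass it on after exiting

        def pass_token(self):
            self.has_token = False
            self.send((self.i + 1) % self.n, "TOKEN")   # logical ring order

        def enter_cs(self):
            pass                        # critical-section work goes here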

55
Token Passing Mutex
  • Correctness is obvious
  • No starvation since passing is in strict order
  • Difficulties with token passing mutex
  • Idle case of no process entering CS pays overhead
    of constantly passing the token
  • Lost tokens: diagnosis and creating a new token
  • Duplicate tokens: ensure generation of only one
    token
  • Crashes: require a receipt to detect dead
    destinations
  • Receipts double the message overhead
  • Design challenge: holding time for an unneeded token
  • Too short → high overhead, too long → high CS
    latency

56
Mutex Comparison
  • Centralized
  • Simplest and most efficient
  • Centralized coordinator crashes create the need
    to detect crash and choose a new coordinator
  • M/use: 3; Entry Latency: 2
  • Distributed
  • 3(N-1) messages per CS use (Lamport)
  • 2(N-1) messages per CS use (Ricart and Agrawala)
  • If any process crashes with a non-empty queue,
    the algorithm won't work
  • M/use: 2(N-1); Entry Latency: 2(N-1)

57
Mutex Comparison
  • Token Ring
  • Ensures fairness
  • Overhead is subtle → no longer linked to CS use
  • M/use: 1 to ∞; Entry Latency: 0 to N-1
  • This algorithm pays overhead when idle
  • Need methods for re-generating a lost token
  • Design Principle: building fault handling into
    algorithms for distributed systems is hard
  • Crash recovery is subtle and introduces overhead
    in normal operation
  • Performance metrics: M/use and Entry Latency

58
Election Algorithms
  • Centralized approaches often necessary
  • Best choice in mutex, for example
  • Need method of electing a new coordinator when it
    fails
  • General assumptions
  • Give processes unique system/global numbers (e.g.
    PID)
  • Elect process using a total ordering on the set
  • All processes know process number of members
  • All processes agree on new coordinator
  • They do not know whether it is up or down → the
    election algorithm is responsible for determining
    this
  • Design challenge: network delay vs. a crashed peer

59
Bully Algorithm
  • Suppose the coordinator doesn't respond to a
    request from P1
  • P1 holds an election by sending an election
    message to all processes with higher numbers
  • If P1 receives no responses, P1 is the new
    coordinator
  • If any higher numbered process responds, P1 ends
    its election
  • When a process receives an election request
  • It replies to the sender, telling it that it has
    lost the election
  • It then holds an election of its own
  • Eventually all but highest surviving process give
    up
  • Process recovering from a crash takes over if
    highest
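
A sketch of the bully election at one process, with timeouts and
transport left as placeholders (names are illustrative):

    # Sketch of the bully election at one process (illustrative; transport,
    # timeouts, and crash detection are assumed to be provided).
    class BullyProcess:
        def __init__(self, pid, all_pids, send, wait_for_ok):
            self.pid = pid
            self.all_pids = all_pids
            self.send = send                  # send(dest, msg)
            self.wait_for_ok = wait_for_ok    # True if any OK arrives in time
            self.coordinator = max(all_pids)

        def start_election(self):
            higher = [p for p in self.all_pids if p > self.pid]
            for p in higher:
                self.send(p, ("ELECTION", self.pid))
            if not higher or not self.wait_for_ok():
                self.become_coordinator()     # nobody bigger answered: we win

        def on_election(self, sender):
            self.send(sender, ("OK", self.pid))   # sender loses
            self.start_election()                 # hold our own election

        def on_coordinator(self, sender):
            self.coordinator = sender             # accept the announced winner

        def become_coordinator(self):
            self.coordinator = self.pid
            for p in self.all_pids:
                if p != self.pid:
                    self.send(p, ("COORDINATOR", self.pid))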

60
Bully Algorithm
  • Example: Processes 0-7; 4 detects that 7 has
    crashed
  • 4 holds election and loses
  • 5 holds election and loses
  • 6 holds election and wins
  • Message overhead variable
  • Who starts an election matters
  • Solid lines say "Am I leader?"
  • Dotted lines say "you lose"
  • Hollow lines say "I won"
  • 6 becomes the coordinator
  • When 7 recovers it is a bully and sends "I won"
    to all

61
Ring Algorithm
  • Processes have a total order known by all
  • Each process knows its successor ? forming a ring
  • Ring is ordered mod N
  • So the successor of Pi is P(i+1) mod N
  • No token involved
  • Any process Pi noticing that the coordinator is
    not responding
  • Sends an election message to its successor
    P(i+1) mod N
  • If the successor is down, send to the next
    member → detected by a timeout
  • Receiving process adds its number to the message
    and passes it along

62
Ring Algorithm
  • When election message gets back to election
    initiator
  • Change message to coordinator
  • Circulate to all members
  • Coordinator is highest process in the total order
  • All processes know the order and thus all will
    agree no matter how the election started
  • Strength
  • Only one coordinator chosen
  • Weakness
  • Scalability: latency increases with N because
    the algorithm is sequential
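
A sketch of the ring election at one process (forwarding to the next
live successor is assumed to be handled by send_to_successor; names
are illustrative):

    # Sketch of the ring election at one process (illustrative).
    class RingProcess:
        def __init__(self, pid, send_to_successor):
            self.pid = pid
            self.send_to_successor = send_to_successor   # skips dead members
            self.coordinator = None

        def start_election(self):
            self.send_to_successor(("ELECTION", [self.pid]))

        def on_election(self, members):
            if self.pid in members:
                # The message has circled back to an initiator: highest wins.
                winner = max(members)
                self.send_to_successor(("COORDINATOR", winner, [self.pid]))
            else:
                self.send_to_successor(("ELECTION", members + [self.pid]))

        def on_coordinator(self, winner, seen):
            self.coordinator = winner
            if self.pid not in seen:                      # circulate once
                self.send_to_successor(("COORDINATOR", winner, seen + [self.pid]))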

63
Ring Algorithm
  • What if more than one process detects a crashed
    coordinator?
  • More than one election will be produced → a
    message storm
  • All messages will contain the same information:
    member process numbers and the order of members
  • Same coordinator is chosen (highest number)
  • Refinement might include filtering duplicate
    messages
  • Some duplicates will happen
  • Consider two elections chasing each other
  • Eliminate the one initiated by the lower numbered
    process
  • It is duplicated until it reaches the source of
    the higher one

64
Atomic Transactions
  • All synchronization methods so far have been low
    level
  • Essentially equivalent to semaphores
  • Good for building more powerful higher level
    tools
  • Assume stable storage
  • Contents survive all non-physical disasters
  • Specifically used by system to store data across
    crashes
  • Transaction
  • Performs a single logical function
  • All-or-none computation: either all operations
    are executed or none
  • Must do so in the face of system failures →
    stable storage

65
Atomic Transactions
  • Transaction Model
  • Start transaction
  • Series of read and write operations
  • Either a commit or abort operation
  • Commit: all transaction operations executed
    successfully
  • Abort: no transaction operations are allowed to
    hold
  • Roll back: restore the system to the original
    state before the transaction started
  • Transaction is in limbo before a commit
  • Has neither occurred nor not occurred
  • Depends on who is asking

66
Transactions Properties ACID
  • Atomic
  • Actions occur indivisibly: completely or not at
    all
  • Appear to happen instantly, from the POV of any
    interacting process because they are all blocked
  • No intermediate states are visible
  • Consistent
  • System invariants hold, but are specific to
    application
  • Conservation of money semantics in banking
    applications
  • Inside transaction this is violated, but from
    outside, the transaction is indivisible and
    invariants are, well, invariant

67
Transactions Properties ACID
  • Isolated
  • Concurrent transactions do not interfere with
    each other
  • Serializable: results from every set of
    transactions look as if they were done in some
    sequential transaction execution
  • Transaction system must ensure that only legal or
    semantically consistent interleavings of
    transaction components occur
  • Durable
  • Once a transaction commits, results are permanent
  • Relevant to ask: permanent with respect to what?
  • Generally data structures or stable storage
    contents

68
Transaction Primitives
  • Begin-transaction
  • End-transaction
  • Abort-transaction
  • Returns to state before the begin-transaction
  • Often referred to as roll-back
  • Commit-transaction
  • Changes made in transaction become visible to the
    outside world
  • Transaction operations
  • Read (receive)
  • Write (send)

69
Transaction Example
  • Suppose we have three transactions T1, T2, and T3
  • Two data elements, A and B
  • Scheduled by a round-robin scheduler; artificial
    but instructive for this example
  • One operation per time slice
  • Consider what interleavings of component
    operations are consistent with a serial execution
    order of transaction set
  • Obvious choice is to not interleave components of
    different transactions → constrains concurrency

70
Transaction Example
  • T1 → T2 → T3
  • But T1 reads A after T3 writes
  • This implies that T3 → T1, creating a
    contradiction
  • Atomicity is violated
  • Abort T1

71
Transaction Example
  • T2 → T3 → T1
  • T2 writes A after T3's write
  • Requiring T3 → T2
  • Abort T2
  • Note since we interleaved operations all members
    of the set must be ready to commit before any can
    commit

72
Transaction Example
  • T3 → T1 → T2
  • This works because each reaches the commit stage
    without encountering a contradiction

(Schedule table: transactions T1, T2, and T3 with timestamps 21, 22,
and 20, showing their interleaved read/write operations on A and B
across events 1-7; the detailed table is omitted here.)
73
Nested Transactions
  • Transaction divided into sub-transactions
  • Structured as a hierarchy
  • Internal nodes are masters for their children
  • Advantages
  • Better performance: aborted sub-transactions do
    not abort masters
  • Increased concurrency: only need to lock
    sub-transactions

74
Nested Transactions
  • Suppose a parent transaction starts several child
    transactions
  • One or more child transactions commit
  • Only after committing are the child's results
    visible to the parent
  • Atomicity is preserved at child level
  • But the results are horrible so the parent aborts
  • But child already committed
  • Parent abort must roll back all child
    transactions
  • Even if they have committed
  • Commit of subordinate transactions thus not
    final, and thus not real with respect to the
    containing system

75
Implementing Transactions
  • Conceptually, a transaction is given a private
    workspace
  • Containing all resources it is allowed to access
  • Before commit, all operations are done in the
    private workspace
  • Commit: changes in the private workspace are
    reflected into the actual workspace (file system,
    etc.)
  • If the shadowed workspaces of more than one
    transaction intersect → contain common member
    data items
  • And one of them has a write operation on a common
    member
  • Then there is a conflict
  • And one of the transactions must be aborted

76
Implementing Transactions
  • First level optimization: copy on write
  • Private workspace points to the common workspace
  • Copy items into the private space only when
    written
  • Virtual memory systems do this when processes
    fork
  • Copied items are shadowed
  • Commit copies shadowed items into global
    workspace
  • Second level optimization: shadow blocks
  • Make units of shadowing as small as possible
  • Disk blocks within a file that are written
    instead of the whole file
  • Specific variables or groups of variables in a
    data space

77
Implementing Transactions
  • Private workspaces are a form of caching
  • Design issues
  • Size of shadowed objects
  • Probability of an intersection of private
    workspaces
  • Constraint on concurrency of transactions
  • Overhead of managing information and detecting
    intersections
  • Analogy to data cache line size and snooping
    cache consistency problems

78
Implementing Transactions Writeahead Log
  • Global copies are changed in the course of a
    transaction
  • Log of changes maintained in stable storage
  • Log entries consist of write operation records
  • Transaction name
  • Data item name
  • Old value
  • New value
  • Save log entry before performing write operations
  • Transaction Ti is represented by a series of
    write operation records terminated by a commit or
    abort record

79
Implementing Transactions Writeahead Log
  • Transaction log consists of
  • <Ti, start>
  • a series of write records (Ti, x, old value, new
    value)
  • <Ti, commit> or <Ti, abort>
  • Recovery procedures
  • undo(Ti) restores all values written by Ti to
    their old values
  • redo(Ti) sets all values written by Ti to the new
    values
  • If Ti aborts
  • Execute undo(Ti)
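
A toy sketch of the undo/redo procedures over such a log (a real log
lives in stable storage, not in a Python list; names are
illustrative):

    # Toy sketch of a write-ahead log with undo/redo (illustrative).
    log = []        # entries: ("write", T, x, old_value, new_value), plus
                    # ("start", T), ("commit", T), ("abort", T) markers
    data = {}       # the global workspace

    def do_write(T, x, new_value):
        old_value = data.get(x)
        log.append(("write", T, x, old_value, new_value))   # log first
        data[x] = new_value                                  # then write

    def undo(T):
        # Walk backwards, restoring the old values written by T.
        for entry in reversed(log):
            if entry[0] == "write" and entry[1] == T:
                _, _, x, old_value, _ = entry
                data[x] = old_value

    def redo(T):
        # Walk forwards, re-applying the new values written by T.
        for entry in log:
            if entry[0] == "write" and entry[1] == T:
                _, _, x, _, new_value = entry
                data[x] = new_value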

80
Implementing Transactions Writeahead Log
  • If there is a system failure the system can use
    redo(Ti) to make sure all updates are in place
  • Compare writeahead log values to actual value
  • Also use the log to proceed with the transaction
  • If an abort is necessary, use undo(Ti)
  • Note that the commit operation must be done
    atomically
  • Difficult when different machines and processes
    are involved
  • Multiple logs are still a problem to consider

81
Implementing Transactions Two-Phase Commit
  • The commit to the transaction must be atomic
  • Specific roles permit this
  • Figure 3-20, page 153 Tanenbaum
  • Coordinator is selected (transaction initiator)
  • Phase 1
  • Coordinator writes prepare in log
  • Sends prepare message to all processes involved
    in the commit (subordinates)
  • Subordinates write ready (or abort) into log
  • Subordinates reply to coordinator
  • Coordinator collects replies from all
    subordinates

82
Implementing Transactions Two-phase Commit
  • If any subordinate aborts or does not respond →
    abort
  • If all respond, commit message will make
    transaction results permanent in all subordinates
  • Stable storage is the key to the very end
  • Crashes can be handled by tracing the log to
    recover
  • Phase 2
  • Coordinator logs commit and sends commit message
  • Subordinates write commit into their log
  • Subordinates execute the commit
  • Subordinates send finished message to coordinator
  • System can remove all transaction log entries, if
    desired
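
A sketch of the coordinator side of the two-phase commit flow just
described (send, collect_replies, and the log list are placeholders
for real messaging and stable storage):

    # Sketch of the coordinator side of two-phase commit (illustrative).
    def two_phase_commit(subordinates, log, send, collect_replies):
        # Phase 1: ask every subordinate to prepare.
        log.append("prepare")
        for s in subordinates:
            send(s, "PREPARE")                  # each one logs ready or abort
        replies = collect_replies(subordinates)    # silent members time out

        # Phase 2: commit only if every subordinate answered "ready".
        if all(r == "ready" for r in replies):
            log.append("commit")
            for s in subordinates:
                send(s, "COMMIT")               # subordinates log, then commit
            return "committed"                  # finished messages may follow
        log.append("abort")
        for s in subordinates:
            send(s, "ABORT")
        return "aborted"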

83
Concurrency Control
  • Transactions need to run simultaneously
  • All modern database systems need to serve
    concurrent users, especially in a parallelized
    distributed system
  • Transactions can conflict
  • One may write to items others want to read or
    write
  • Most transactions do not conflict
  • Maximizing performance requires us to constrain
    only conflicting transactions
  • Concurrency control methods
  • Locking
  • Optimistic concurrency control
  • Timestamps

84
Locking
  • Locks
  • Semaphore of sorts creating mutual exclusion
    regions within the total data of a DB
  • Simplistic scheme is too restrictive
  • Distinguish read and write locks
  • Many readers, single writer: the canonical problem
  • Read locks
  • Allow N read locks on a resource
  • Write locks
  • No other lock is permitted

85
Locking
  • Locking granularity
  • File level is too coarse
  • Finer granularity → less concurrency constraint
  • Finer granularity → greater overhead managing
    locks and increased probability of deadlock
  • Two-Phase locking
  • Fine-grained locking can lead to inconsistency
    and deadlock
  • Dividing lock requests into two phases helps
    simplify
  • If a transaction avoids updating until all locks
    are acquired, this simplifies failure handling:
    release all locks and try again

86
Locking
  • Growing phase
  • Transaction obtains locks, may not release any
  • Shrinking phase
  • Once a lock is released, no locks can be obtained
    for rest of the transaction
  • Disadvantage of two-phase locking
  • Concurrency is reduced
  • Resource ordering (prevention) or detection and
    resolution are necessary to handle deadlocks
  • Strict TPL releases no locks until abort/commit
  • Increases the concurrency constraint but avoids
    cascading aborts

87
Two-Phase Locking
  • Scenario 1
  • Also safe from deadlock
  • P1 P2
  • lock R1 lock R1
  • ... lock R2
  • lock R2 ...
  • ... unlock R2
  • unlock R2 unlock R1
  • unlock R1

88
Two-Phase Locking
  • Scenario 2
  • Susceptible to deadlock
  • P1 P2
  • lock R1 lock R2
  • ... lock R1
  • lock R2 ...
  • ... unlock R1
  • unlock R1 unlock R2
  • unlock R2

89
Optimistic Concurrency Control
  • Based on the observation that transactions rarely
    conflict
  • Expected value argument
  • Cumulative overhead of avoiding conflicts is more
    expensive than detecting and resolving conflicts
  • Let a transaction make all changes
  • Without checking for conflicts
  • Deadlock free
  • At commit time
  • Check for conflicts with files that have changed
    since the transaction began
  • If found → abort all but one conflicting
    transaction and redo

90
Optimistic Concurrency Control
  • Optimistic: changes made to a private workspace
  • Distributed transactions need some form of global
    clock
  • Basis for comparing time for file changes
  • The make example is the canonical problem here
  • Parallelism is maximized
  • No waiting on locks
  • Inefficient when an abort is needed
  • Not a good strategy in systems with many
    potential conflicts → bets on conflict
    probability
  • ↑ Load → ↑ Conflicts → ↑ Failures → ↑ Load
  • Positive feedback scenario

91
Timestamp Ordering
  • Each transaction Ti assigned a unique timestamp
    TS(Ti)
  • If Ti enters system before Tj,
  • TS(Ti) < TS(Tj)
  • Imposes a total ordering on transactions
  • Each data item, Q, gets two timestamps
  • W-timestamp(Q): largest write timestamp
  • R-timestamp(Q): largest read timestamp
  • General concept
  • Process transactions in a serial order
  • Can use the same file, but must do it in order
  • Therefore atomicity is preserved

92
Timestamp Ordering
  • For a read
  • if (TS(Ti) < W-timestamp(Q))
  • reject the read
  • roll back and re-start Ti
  • else /* TS(Ti) ≥ W-timestamp(Q) */
  • execute the read
  • R-timestamp(Q) = max(R-timestamp(Q), TS(Ti))
  • Timestamp ordering is deadlock-free
  • Total ordering of file accesses → no cycles can
    result
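
A sketch of the read rule above in Python, together with the matching
write rule; the write rule is the standard counterpart in basic
timestamp ordering and is an assumption here, not taken from the
slide:

    # Sketch of basic timestamp ordering checks (illustrative).
    class Rollback(Exception):
        """Abort the transaction and restart it with a new timestamp."""

    class Item:
        def __init__(self, value=None):
            self.value = value
            self.r_ts = 0   # largest timestamp of an accepted read
            self.w_ts = 0   # largest timestamp of an accepted write

    def read(Q, ts):
        if ts < Q.w_ts:
            raise Rollback()        # a later transaction already wrote Q
        Q.r_ts = max(Q.r_ts, ts)
        return Q.value

    def write(Q, ts, value):
        # Counterpart rule (assumed, not on the slide): reject the write if
        # a later transaction has already read or written Q.
        if ts < Q.r_ts or ts < Q.w_ts:
            raise Rollback()
        Q.w_ts = ts
        Q.value = value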

93
Timestamp Ordering Example
  • Three transactions T1, T2, and T3
  • two data elements, A and B
  • scheduled in a round-robin scheduler
  • one operation per time slice
  • use read and write timestamps

94
Timestamp Ordering Example
  • Three transactions T1, T2, and T3

95
Deadlocks
  • Definition: Each process in a set is waiting for
    a resource to be released by another process in
    the set
  • The set is some subset of all processes
  • Deadlock only involves the processes in the set
  • Remember the necessary conditions for DL
  • Remember that methods for handling DL are based
    on preventing or detecting and fixing one or more
    necessary conditions

96
Deadlocks Necessary Conditions
  • Mutual exclusion
  • Process has exclusive use of resource allocated
    to it
  • Hold and Wait
  • Process can hold one resource while waiting for
    another
  • No Preemption
  • Resources are released only by explicit action by
    controlling process
  • Requests cannot be withdrawn (i.e. request
    results in eventual allocation or deadlock)
  • Circular Wait
  • Every process in the DL set is waiting for
    another process in the set, forming a cycle in
    the SR graph

97
Deadlock Handling Strategies
  • No strategy
  • Prevention
  • Make it structurally impossible to have a
    deadlock
  • Avoidance
  • Allocate resources so deadlock can't occur
  • Detection
  • Let deadlock occur, detect it, recover from it

98
No Strategy The Ostrich Algorithm
  • Assumes deadlock rarely occurs
  • Becomes more probable with more processes
  • Catastrophic consequences when it does occur
  • May need to re-boot all or some machines in
    system
  • Fairly common and works well when
  • DL is rare and
  • Other sources of instability are more common
  • How many reboots of Windows or MacOS are prompted
    by DL?

99
Deadlock Prevention
  • Ordered resource allocation is the most common
    example
  • Consider the link with two-phase locking's grow
    and shrink phases
  • Works but requires global view of all resources
  • A total order on resources must exist for the
    system
  • Process code must allocate resources in order
  • Under-utilizes resources when the periods of use
    of a resource conflict with the total resource
    order
  • Consider processes Pi and Pk using resources R1
    and R2
  • Pi uses R1 90% of its execution time and R2 10%
  • Pk uses R2 90% of its execution time and R1 10%
  • One holds one resource far too long

100
Deadlock Avoidance
  • General method: Refuse allocations that may lead
    to deadlock
  • Method for keeping track of states
  • Need to know resources required by a process
  • Banker's algorithm
  • Must know the maximum allocation Pi may request
  • Keep track of resources available
  • For each request, make sure maximum need will not
    exceed total available
  • Under utilizes resources
  • Never used
  • Advance knowledge not available and CPU-intensive

101
Deadlock Detection and Resolution
  • Attractive for two main reasons
  • Prevention and avoidance are hard, have
    significant overhead, and require information
    difficult or impossible to obtain
  • Deadlock is comparatively rare in most systems so
    a form of the argument for optimistic concurrency
    control applies: detect and fix comparatively
    rare situations
  • Availability of transactions helps
  • DL resolution requires us to kill some
    participant(s)
  • Transactions are designed to be rolled back and
    restarted

102
Centralized Deadlock Detection
  • General method: Construct a resource graph and
    analyze it
  • Analyze through resource reductions
  • If cycle exists after analysis, deadlock has
    occurred
  • Processes in cycle are deadlocked
  • Local graphs on each machine
  • Pi requests R1
  • R1's machine places the request in its local graph
  • If cycle exists in local graph, perform
    reductions to detect deadlock
  • Need to calculate union of all local graphs
  • Deadlock cycle may transcend machine boundaries

103
Graph Reduction
  • Cycles don't always mean deadlock!

(Figure: two example resource graphs over processes P1-P3; after
reduction one still contains a cycle, indicating deadlock, while the
other's cycle reduces away, indicating no deadlock.)
104
Waits-For Graphs (WFGs)
  • Based on Resource Allocation Graph (SR)
  • An edge from Pi to Pj
  • means Pi is waiting for Pj to release a resource
  • Replaces two edges in SR graph
  • Pi → R
  • R → Pj
  • Deadlocked when a cycle is found
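
A sketch of deadlock detection by cycle search on a waits-for graph
(the graph representation is illustrative):

    # Sketch of deadlock detection on a waits-for graph (illustrative).
    # wfg maps each process to the set of processes it is waiting for.
    def has_cycle(wfg):
        WHITE, GREY, BLACK = 0, 1, 2
        color = {p: WHITE for p in wfg}

        def visit(p):
            color[p] = GREY
            for q in wfg.get(p, ()):
                if color.get(q, WHITE) == GREY:       # back edge: cycle found
                    return True
                if color.get(q, WHITE) == WHITE and visit(q):
                    return True
            color[p] = BLACK
            return False

        return any(color[p] == WHITE and visit(p) for p in wfg)

    # Example: has_cycle({"P1": {"P2"}, "P2": {"P1"}}) is True (deadlock),
    # while has_cycle({"P1": {"P2"}, "P2": set()}) is False.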

105
Centralized Deadlock Detection
  • All hosts communicate resource state to
    coordinator
  • Construct global resource graph on coordinator
  • Coordinator must be reliable and fast
  • When to construct the graph is an important
    choice
  • Report every