Synchronization

Slides: 72

Provided by: steve1816

Transcript and Presenter's Notes

Title: Synchronization

1
Synchronization

Chapter 5

2
Synchronization

Synchronization in distributed systems is harder
than in centralized systems because of the need
for distributed algorithms.
Distributed algorithms have the following
properties
No machine has complete information about the
system state.
Machines make decisions based only on local
information.
Failure of one machine does not ruin the
algorithm.
There is no implicit assumption that a global
clock exists.
Clocks are needed to synchronize in a distributed
system.

3
Clock Synchronization

Time is unambiguous in centralized systems.
System clock keeps time, all entities use this
for time.
In distributed systems each node has its own system
clock.
Crystal-based clocks run at slightly different
rates. This difference is called clock
skew.
Problem: an event that occurred after another may
be assigned an earlier time.

4
Physical Clocks: A Primer

Accurate clocks are atomic oscillators
Most clocks are less accurate (e.g., mechanical
watches)
Computers use crystal-based clocks
Results in clock drift
How do you tell time?
Use astronomical metrics (solar day)
Coordinated Universal Time (UTC): international
standard based on atomic time; the successor to
Greenwich Mean Time
Add leap seconds to be consistent with
astronomical time
UTC broadcast on radio (satellite and earth)
Receivers accurate to 0.1 to 10 ms
The goal is to synchronize machines with a master
(UTC receiver machine) or with one another.

5
Physical Clocks

Computation of the mean solar day (transit of the
sun noon)

6
Physical Clocks

TAI (Temps Atomique International) seconds are of
constant length, unlike solar seconds. Leap
seconds are introduced when necessary to keep in
phase with the sun.

7
Clock Synchronization Algorithms

The relation between clock time and UTC when
clocks tick at different rates.

8
Clock Synchronization

Each clock has a maximum drift rate r
$1 - r < dC/dt < 1 + r$
Two clocks may drift apart by $2r\,\Delta t$ in time $\Delta t$
To limit drift to d, resynchronize every d/2r
seconds ($2r\,\Delta t < d \Rightarrow \Delta t < d/2r$)
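
The bound above can be checked with a few lines of arithmetic. This is only a sketch: the function name `resync_interval` and the sample numbers are illustrative assumptions, not part of the slides.

```python
def resync_interval(max_skew: float, drift_rate: float) -> float:
    """Longest gap between resynchronizations that keeps two clocks,
    each drifting at most `drift_rate`, within `max_skew` of each other."""
    if max_skew <= 0 or drift_rate <= 0:
        raise ValueError("max_skew and drift_rate must be positive")
    # Two clocks can move apart at up to 2 * drift_rate per second.
    return max_skew / (2 * drift_rate)


if __name__ == "__main__":
    # Hypothetical numbers: drift rate 1e-5 (a bit under a second per day),
    # tolerated skew of 1 ms.
    print(resync_interval(max_skew=1e-3, drift_rate=1e-5), "seconds")
```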

9
Cristian's Algorithm

Synchronize machines to a time server that has a
UTC receiver.
Machine P requests time from server every d/2r
seconds
Receives time t ($C_{UTC}$) from server, P sets clock
to $t + t_{reply}$, where $t_{reply}$ is the time to send
the reply to P
Use $(t_{req} + t_{reply})/2$ as an estimate of $t_{reply}$
Improve accuracy by making a series of
measurements
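
A minimal sketch of the client side, assuming the server is reachable through some callable `ask_server` (a hypothetical stand-in for the network request). Keeping the sample with the shortest round trip is one common way to use "a series of measurements"; the slides do not say which combination rule is intended.

```python
import time


def estimate_offset(ask_server, local_clock=time.monotonic, rounds=5):
    """Estimate (server time - local time) in the style of Cristian's algorithm.

    ask_server:  hypothetical callable returning the server's current time.
    local_clock: callable returning the local time.
    """
    best_rtt, best_offset = None, None
    for _ in range(rounds):
        sent = local_clock()
        server_time = ask_server()
        received = local_clock()
        rtt = received - sent                      # t_req + t_reply
        # Assume the reply took half of the round trip.
        offset = (server_time + rtt / 2) - received
        if best_rtt is None or rtt < best_rtt:     # shortest round trip wins
            best_rtt, best_offset = rtt, offset
    return best_offset
```

The caller would add the returned offset to its local clock reading to approximate server time.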

10
Cristian's Algorithm

Getting the current time from a time server.

11
Berkeley Algorithm

Used in systems without UTC receiver
Keep clocks synchronized with one another
One computer is master, others are slaves
Master periodically polls slaves for their times
Average times and return differences to slaves
Communication delays compensated as in Cristian's
algorithm
Failure of master → election of a new master
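
The averaging step can be sketched as follows. The function name and the sample clock values (in minutes) are assumptions for illustration; delay compensation and fault handling are left out.

```python
def berkeley_adjustments(master_time, slave_times):
    """Return the correction each machine should apply, master first.

    All clocks are averaged and every machine is told the difference
    between the average and its own reading.
    """
    clocks = [master_time] + list(slave_times)
    average = sum(clocks) / len(clocks)
    return [average - clock for clock in clocks]


if __name__ == "__main__":
    # Hypothetical readings in minutes since midnight: 3:00, 2:50, 3:25.
    print(berkeley_adjustments(180, [170, 205]))
```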

12
The Berkeley Algorithm

The time daemon asks all the other machines for
their clock values
The machines answer
The time daemon tells everyone how to adjust
their clock

13
Distributed Approaches

Both approaches studied thus far are centralized.
Decentralized algorithms use resynchronization
intervals
Broadcast time at the start of the interval
Collect all other broadcasts that arrive in a
period S
Use average value of all reported times
Can throw away a few highest and lowest values
Approaches in use today
rdate: synchronizes a machine with a specified
machine
Network Time Protocol (NTP): uses advanced clock
synchronization to achieve accuracy in 1-50 ms

14
Logical Clocks

For many problems, only internal consistency of
clocks matters.
Absolute (real) time is less important
Use logical clocks
Key idea
Clock synchronization need not be absolute.
If two machines do not interact, no need to
synchronize them.
More importantly, processes need to agree on the
order in which events occur rather than the time
at which they occurred.

15
Event Ordering

Problem: define a total ordering of all events
that occur in a system.
Events in a single processor machine are totally
ordered.
In a distributed system
No global clock, local clocks may be
unsynchronized.
Cannot order events on different machines using
local times.
Key idea (Lamport)
Processes exchange messages
Message must be sent before received
Send/receive used to order events (and
synchronize clocks).

16
Happens-Before Relation

The expression A → B is read "A happens before
B".
If A and B are events in the same process and A
executed before B, then A → B
If A represents sending of a message and B is the
receipt of this message, then A → B
Relation is transitive
A → B and B → C ⇒ A → C
Relation is undefined across processes that do
not exchange messages
Partial ordering on events

17
Event Ordering Using HB

Goal: define the notion of time of an event such
that
If A → B then C(A) < C(B)
If A and B are concurrent, then C(A) <, =, or >
C(B)
Solution
Each processor maintains a logical clock LCi
Whenever an event occurs locally at i, LCi =
LCi + 1
When i sends message to j, piggyback LCi
When j receives message from i
If LCj < LCi then LCj = LCi + 1 else do nothing
This algorithm meets the above goals
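
The rules above can be sketched as a small class. Note one assumption: the receive rule here uses the common formulation max(local, received) + 1, which also counts the receipt itself as an event, so the receive timestamp is always strictly larger than the send timestamp. The class and method names are illustrative.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is piggybacked on the message.
        return self.local_event()

    def receive(self, message_time):
        # Jump ahead of the sender if needed, then count the receipt.
        self.time = max(self.time, message_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    p.local_event()
    stamp = p.send()
    print("sent at", stamp, "received at", q.receive(stamp))
```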

18
Lamport Timestamps

Three processes, each with its own clock. The
clocks run at different rates.
Lamport's algorithm corrects the clocks.

19
Example Totally-Ordered Multicasting

Updating a replicated database without
totally-ordered logical clocks can leave it in
an inconsistent state.

20
Causality

Lamport's logical clocks
If A → B then C(A) < C(B)
Reverse is not true!!
Nothing can be said about events by comparing
time-stamps!
If C(A) < C(B), then ??
Need to maintain causality
Causal delivery: if send(m) → send(n) then
deliver(m) → deliver(n)
Capture causal relationships between groups of
processes
Need a time-stamping mechanism such that
If T(A) < T(B) then A should have causally
preceded B

21
Vector Clocks

Causality can be captured by means of vector
timestamps.
Each process i maintains a vector Vi
Vi[i] = number of events that have occurred at i
Vi[j] = number of events i knows have occurred at
process j
Update vector clocks as follows
Local event: increment Vi[i]
Send a message: piggyback entire vector V
Receipt of a message: Vj[k] = max(Vj[k], Vi[k])
Receiver is told about how many events the sender
knows occurred at another process k
Also Vj[i] = Vj[i] + 1
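
A minimal sketch of vector timestamps and the causality test they enable. One assumption to flag: on receipt this sketch increments the receiver's own entry (the usual textbook variant, treating the receipt as an event at the receiver), which differs from the last bullet above. Names such as `VectorClock` and `happened_before` are illustrative.

```python
class VectorClock:
    def __init__(self, n, pid):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        self.v[self.pid] += 1

    def send(self):
        self.local_event()
        return list(self.v)            # piggyback a copy of the whole vector

    def receive(self, other):
        # Component-wise maximum, then count the receipt as a local event.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.pid] += 1


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Unlike Lamport timestamps, comparing two vectors tells whether one event causally preceded the other or whether the two are concurrent.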

22
Global State

The global state of a distributed system consists
of
Local state of each process
Messages sent but not received (state of the
queues)
Many applications need to know the state of the
system
Failure recovery, distributed deadlock detection
Problem: how can you figure out the state of a
distributed system?
Each process is independent
No global clock or synchronization
A distributed snapshot reflects a consistent
global state.

23
Global State

A consistent cut: every receipt corresponds to a
send event
An inconsistent cut: sender cannot be identified

24
Distributed Snapshot Algorithm

Assume each process communicates with another
process using unidirectional point-to-point
channels (e.g., TCP connections)
Any process can initiate the algorithm
Checkpoint local state
Send marker on every outgoing channel
On receiving a marker
If it is the first marker, checkpoint local state,
send a marker on every outgoing channel, and start
saving messages arriving on all other incoming channels
On a subsequent marker on a channel, stop saving state
for that channel

25
Distributed Snapshot

A process finishes when
It receives a marker on each incoming channel and
processes them all
State: local state plus state of all channels
Send state to initiator
Any process can initiate snapshot
Multiple snapshots may be in progress
Each is separate, and each is distinguished by
tagging the marker with the initiator ID (and
sequence number)
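
The marker rules can be sketched with an in-memory simulation. Everything here is an illustrative assumption: processes hold an integer balance, channels are FIFO queues, messages are transfer amounts, and only a single snapshot is supported (no initiator ID tagging).

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.inbox = {}             # sender name -> FIFO channel
        self.peers = {}             # receiver name -> Process
        self.recorded_state = None  # local checkpoint
        self.channel_state = {}     # sender name -> messages caught in flight
        self.recording = set()      # incoming channels still being recorded

    def connect(self, other):
        """Create a unidirectional channel self -> other."""
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, to, amount):
        self.balance -= amount
        self.peers[to].inbox[self.name].append(amount)

    def start_snapshot(self):
        self._checkpoint()

    def _checkpoint(self):
        self.recorded_state = self.balance
        self.recording = set(self.inbox)
        self.channel_state = {sender: [] for sender in self.inbox}
        for peer in self.peers.values():        # marker on every outgoing channel
            peer.inbox[self.name].append(MARKER)

    def deliver(self, sender):
        """Process the next message on the channel from `sender`."""
        msg = self.inbox[sender].popleft()
        if msg == MARKER:
            if self.recorded_state is None:     # first marker seen
                self._checkpoint()
            self.recording.discard(sender)      # this channel is now closed
        else:
            self.balance += msg
            if sender in self.recording:        # message was in flight
                self.channel_state[sender].append(msg)

    def done(self):
        return self.recorded_state is not None and not self.recording
```

In a run where a transfer is in flight when the snapshot starts, the recorded local states plus the recorded channel contents should add up to the total that was in the system.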

26
Global State (Snapshot Algorithm)

Organization of a process and channels for a
distributed snapshot

27
Global State (Snapshot Algorithm)

Process Q receives a marker for the first time
and records its local state
Q records all incoming messages
Q receives a marker for its incoming channel and
finishes recording the state of the incoming
channel

28
Termination Detection

Detecting the end of a distributed computation
Notation: let sender be predecessor, receiver be
successor
Two types of markers: Done and Continue
After finishing its part of the snapshot, process
Q sends a Done or a Continue to its predecessor
Send a Done only when
All of Q's successors send a Done
Q has not received any message since it
check-pointed its local state and received a
marker on all incoming channels
Else send a Continue
Computation has terminated if the initiator
receives Done messages from everyone

29
Election Algorithms

Many distributed algorithms need one process to
act as coordinator
Doesn't matter which process does the job, just
need to pick one
Election algorithms: technique to pick a unique
coordinator (aka leader election)
Examples: take over the role of a failed process,
pick a master in Berkeley clock synchronization
algorithm
Types of election algorithms: Bully and Ring
algorithms

30
Bully Algorithm

Each process has a unique numerical ID
Processes know the IDs and addresses of every other
process
Communication is assumed reliable
Key idea: select process with highest ID
Process initiates election if it just recovered
from failure or if coordinator failed
Three message types: Election, OK, I won
Several processes can initiate an election
simultaneously
Need consistent result
$O(n^2)$ messages required with n processes

31
Bully Algorithm Details

Any process P can initiate an election
P sends Election messages to all processes with
higher IDs and awaits OK messages
If no OK messages, P becomes coordinator and
sends I won messages to all processes with lower
IDs
If it receives an OK, it drops out and waits for
an I won
If a process receives an Election msg, it returns
an OK and starts an election
If a process receives an I won, it treats sender
as coordinator
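
The message flow above can be simulated in a few lines. This is a sketch under simplifying assumptions: elections run one after another instead of concurrently, crashed processes simply never answer, and the function name and arguments are illustrative.

```python
def bully_election(all_ids, alive, initiator):
    """Return (coordinator, message_count) for one bully election."""
    messages = 0
    to_run = [initiator]       # processes that still have to hold an election
    started = set()
    coordinator = None
    while to_run:
        p = to_run.pop(0)
        if p in started:
            continue
        started.add(p)
        higher = [q for q in all_ids if q > p]
        messages += len(higher)                   # Election messages
        responders = [q for q in higher if q in alive]
        messages += len(responders)               # OK replies
        if responders:
            to_run.extend(responders)             # each responder starts an election
        else:
            coordinator = p                       # nobody higher answered
            messages += sum(1 for q in all_ids if q < p)   # I won messages
    return coordinator, messages


if __name__ == "__main__":
    # Eight processes, the highest one (7) has crashed, process 4 notices.
    print(bully_election(range(8), alive={0, 1, 2, 3, 4, 5, 6}, initiator=4))
```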

32
The Bully Algorithm

The bully election algorithm
Process 4 holds an election
Process 5 and 6 respond, telling 4 to stop
Now 5 and 6 each hold an election

33
Bully Algorithm

Process 6 tells 5 to stop
Process 6 wins and tells everyone

34
Ring-based Election

Processes have unique IDs and are arranged in a
logical ring
Each process knows its neighbors
Select process with highest ID
Begin election if just recovered or coordinator
has failed
Send Election to closest downstream node that is
alive
Sequentially poll each successor until a live
node is found
Each process tags its ID on the message
Initiator picks node with highest ID and sends a
coordinator message
Multiple elections can be in progress
Wastes network bandwidth but does no harm
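
A sketch of one pass of the Election message around the ring, assuming the ring is given as a list in downstream order and that dead nodes are simply skipped. The function name is illustrative.

```python
def ring_election(ring, alive, initiator):
    """Return (coordinator, ids_collected) for one ring election.

    ring:  process IDs in ring order (downstream direction).
    alive: set of IDs that are currently up; the initiator must be alive.
    """
    n = len(ring)
    collected = [initiator]                 # each process tags its ID on the message
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:                # skip dead successors
            collected.append(ring[i])
        i = (i + 1) % n
    # Back at the initiator: pick the highest ID and announce it.
    return max(collected), collected


if __name__ == "__main__":
    print(ring_election([2, 5, 7, 1, 4], alive={2, 5, 1, 4}, initiator=5))
```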

35
A Ring Algorithm

Election algorithm using a ring.

36
Comparison

Assume n processes and one election in progress
Bully algorithm
Worst case: initiator is node with lowest ID
Triggers n-2 elections at higher ranked nodes:
$O(n^2)$ msgs
Best case: immediate election, n-2 messages
Ring
2(n-1) messages always

37
Distributed Synchronization

Distributed system with multiple processes may
need to share data or access shared data
structures
Use critical sections with mutual exclusion
Single process with multiple threads
Semaphores, locks, monitors
How do you do this for multiple processes in a
distributed system?
Processes may be running on different machines
Solution: lock mechanism for a distributed
environment
Can be centralized or distributed

38
Centralized Mutual Exclusion

Assume processes are numbered
One process is elected coordinator (highest ID
process)
Every process needs to check with coordinator
before entering the critical section
To obtain exclusive access: send request, await
reply
To release: send release message
Coordinator
Receive request: if available and queue empty,
send grant; if not, queue request
Receive release: remove next request from queue
and send grant
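
The coordinator's two handlers can be sketched as below. Message passing is replaced by direct method calls, and the class and method names are assumptions made for illustration.

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None       # process currently in the critical section
        self.queue = deque()     # waiting requests, in arrival order

    def request(self, pid):
        """Return True if the grant is sent now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)   # no reply: the requester stays blocked
        return False

    def release(self, pid):
        """Handle a release; return the process granted next, or None."""
        if pid != self.holder:
            raise ValueError("release from a process that does not hold the lock")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

Because waiting requests are kept in a FIFO queue, grants are handed out in arrival order.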

39
Mutual Exclusion: A Centralized Algorithm

Process 1 asks the coordinator for permission to
enter a critical region. Permission is granted
Process 2 then asks permission to enter the same
critical region. The coordinator does not reply.
When process 1 exits the critical region, it
tells the coordinator, which then replies to 2

40
Properties

Simulates centralized lock using blocking calls
Fair: requests are granted the lock in the order
they were received
Simple: three messages per use of a critical
section (request, grant, release)
Shortcomings
Single point of failure
How do you detect a dead coordinator?
A process cannot distinguish between a lock in
use and a dead coordinator
No response from coordinator in either case
Performance bottleneck in large distributed
systems

41
Distributed Algorithm

Ricart and Agrawala: needs 2(n-1) messages
Based on event ordering and time stamps
Process k enters critical section as follows
Generate new time stamp TSk = TSk + 1
Send request(k, TSk) to all other n-1 processes
Wait until reply(j) received from all other
processes
Enter critical section
Upon receiving a request message, process j
Sends reply if no contention
If already in critical section, does not reply,
queue request
If wants to enter, compare TSj with TSk and send
reply if TSk < TSj, else queue
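
The reply decision made by process j can be sketched as follows. Two assumptions are added here: ties between equal timestamps are broken by process ID (by comparing `(timestamp, pid)` tuples), and message delivery is replaced by direct method calls. All names are illustrative.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.state = RELEASED
        self.request = None      # our pending (timestamp, pid)
        self.deferred = []       # requesters waiting for our reply

    def want(self):
        """Start a request; the returned pair is sent to all other processes."""
        self.clock += 1
        self.state = WANTED
        self.request = (self.clock, self.pid)
        return self.request

    def enter(self):
        """Call once replies from all other processes have arrived."""
        self.state = HELD

    def on_request(self, request):
        """Return True to reply now, False if the request is queued."""
        ts, sender = request
        self.clock = max(self.clock, ts)
        in_cs = self.state == HELD
        we_are_first = self.state == WANTED and self.request < request
        if in_cs or we_are_first:
            self.deferred.append(sender)
            return False
        return True

    def leave(self):
        """Exit the critical section; return the processes to reply to."""
        self.state = RELEASED
        pending, self.deferred = self.deferred, []
        return pending
```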

42
A Distributed Algorithm

Two processes want to enter the same critical
region at the same moment.
Process 0 has the lowest timestamp, so it wins.
When process 0 is done, it sends an OK also, so 2
can now enter the critical region.

43
Properties

Fully decentralized
N points of failure!
All processes are involved in all decisions
Any overloaded process can become a bottleneck
A Token Ring Algorithm
Use a token to arbitrate access to critical
section
Must wait for token before entering CS
Pass the token to neighbor once done or if not
interested
Detecting token loss is non-trivial
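
A sketch of one trip of the token around a logical ring of n processes numbered 0 to n-1. The function name and the assumption that every process is alive (so no token-loss handling) are illustrative simplifications.

```python
def circulate_token(n, wants, start=0):
    """Pass the token once around the ring; return the order of CS entries.

    wants: set of process numbers that want to enter the critical section.
    """
    order = []
    holder = start
    for _ in range(n):
        if holder in wants:
            order.append(holder)      # enter, then leave, the critical section
        holder = (holder + 1) % n     # pass the token to the neighbor
    return order


if __name__ == "__main__":
    print(circulate_token(5, wants={1, 3}, start=2))
```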

44
A Token Ring Algorithm

An unordered group of processes on a network.
A logical ring constructed in software.

45
Comparison

A comparison of three mutual exclusion algorithms.

46
Transactions

Transactions provide higher level mechanism for
atomicity of processing in distributed systems
Have their origins in databases
Banking example: three accounts A = 100, B = 200,
C = 300
Client 1: transfer 4 from A to B
Client 2: transfer 3 from C to B
Result can be inconsistent unless certain
properties are imposed on the accesses

47
ACID Properties

Atomic: all or nothing (indivisible)
Consistent: transaction takes system from one
consistent state to another (certain invariants
hold)
Isolated: intermediate effects are not visible to
other transactions (serializable)
Durable: changes are permanent once transaction
completes (commits)

48
The Transaction Model

Updating a master tape is fault tolerant.

49
The Transaction Model

Examples of primitives for transactions.

50
The Transaction Model

Transaction to reserve three flights commits
(White Plains → New York → Nairobi → Malindi)
Transaction aborts when third flight is
unavailable

51
Classification of Transactions.

A flat transaction is a series of operations that
satisfy the ACID properties.
It does not allow partial results to be committed
or aborted.
Example flight reservation, Web link update.
A nested transaction is constructed from a number
of subtransactions.
A distributed transaction is logically a flat,
indivisible transaction that operates on
distributed data.

52
Distributed Transactions

A nested transaction (transaction is decomposed
into subtransactions)
A distributed transaction (subtransaction on
different data)

53
Implementation of transactions

Two methods can be used to implement
transactions
Private workspace: until the transaction either
commits or aborts, all of its reads and writes go
to the private workspace.
Write-ahead log: use a log to record the change.
Only after the log has been written successfully
is the change made to the file.
Private workspace
Each transaction gets copies of all files, objects
It can optimize for reads by not making copies
It can optimize for writes by copying only what
is required (An appended block and a copy of
modified block are created. These new blocks are
called shadow blocks.)
Commit requires making local workspace global

54
Private Workspace

The file index and disk blocks for a three-block
file
The situation after a transaction has modified
block 0 and appended block 3
After committing

55
Implementation Write-ahead Logs

In-place updates: transaction makes changes
directly to all files/objects and keeps these
changes in a log.
Write-ahead log: prior to making change,
transaction writes to log on stable storage
Transaction ID, block number, original value, new
value
Force logs on commit
If abort, read log records and undo changes
(rollback)
Log can be used to rerun transaction after
failure
Both workspaces and logs work for distributed
transactions
Commit needs to be atomic: will return to this
issue in Ch. 7
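
The log-then-update order and the undo on abort can be sketched as follows. This is an in-memory illustration only: a Python list stands in for the log on stable storage, dictionary keys stand in for block numbers, and the class name is an assumption.

```python
class LoggedStore:
    def __init__(self, data):
        self.data = dict(data)
        self.log = []        # records: (transaction ID, key, original value, new value)

    def write(self, txid, key, new_value):
        # Write the log record first, then change the data in place.
        self.log.append((txid, key, self.data.get(key), new_value))
        self.data[key] = new_value

    def abort(self, txid):
        # Roll back: undo this transaction's changes, newest first.
        for tid, key, original, _new in reversed(self.log):
            if tid == txid:
                self.data[key] = original
        self.log = [record for record in self.log if record[0] != txid]


if __name__ == "__main__":
    store = LoggedStore({"x": 0, "y": 0})
    store.write(1, "x", store.data["x"] + 1)
    store.write(1, "y", store.data["y"] + 2)
    store.write(1, "x", store.data["y"] ** 2)
    print(store.log)
    store.abort(1)
    print(store.data)
```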

56
Write-ahead Log

a) A transaction
b)-d) The log before each statement is executed

57
Concurrency Control

Goal: allow several transactions to be executing
simultaneously such that
Collection of manipulated data items is left in a
consistent state
Achieve consistency by ensuring data items are
accessed in a specific order
Final result should be same as if each
transaction ran sequentially

58
Concurrency Control

Concurrency control can be implemented in a layered
fashion
Bottom layer - A data manager performs the actual
read and write operations on data.
Middle layer - A scheduler carries the main
responsibility for properly controlling
concurrency. Scheduling can be based on the use
of locks or timestamps.
Highest layer - The transaction manager is
responsible for guaranteeing atomicity of
transactions.

59
Concurrency Control

General organization of managers for handling
transactions.

60
Concurrency Control

General organization of managers for handling
distributed transactions.

61
Serializability

Key idea: properly schedule conflicting
operations
Conflict is possible if at least one operation is
a write
Read-write conflict
Write-write conflict

a)-c) Three transactions T1, T2, and T3
d) Possible schedules (Schedule 2 is legal
because it results in a valid x value.)

62
Serializability

Two approaches are used in concurrency control
Pessimistic approaches: operations are
synchronized before they are carried out.
Optimistic approaches: operations are carried out
and synchronization takes place at the end of
the transaction. At the conflict point, one or more
transactions are aborted.

63
Two-phase Locking (2PL)

Widely used concurrency control technique
Scheduler acquires all necessary locks in growing
phase, releases locks in shrinking phase
Check if operation on data item x conflicts with
existing locks
If so, delay transaction. If not, grant a lock on
x
Never release a lock until data manager finishes
operation on x
Once a lock is released, no further locks can be
granted.
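
The growing/shrinking rule can be sketched with a shared lock table. Assumptions for brevity: all locks are exclusive, a conflicting request returns False so that the caller can delay and retry, and the names `Transaction` and `lock_table` are illustrative.

```python
class Transaction:
    def __init__(self, txid, lock_table):
        self.txid = txid
        self.lock_table = lock_table   # shared dict: data item -> owning txid
        self.held = set()
        self.shrinking = False         # becomes True at the first release

    def acquire(self, item):
        """Return True if the lock is granted, False if the caller must wait."""
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        owner = self.lock_table.get(item)
        if owner is not None and owner != self.txid:
            return False               # conflicts with an existing lock
        self.lock_table[item] = self.txid
        self.held.add(item)
        return True

    def release(self, item):
        if item not in self.held:
            raise KeyError(item)
        self.shrinking = True          # growing phase is over
        self.held.remove(item)
        del self.lock_table[item]
```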

64
Two-Phase Locking

Two-phase locking.

65
Two-phase Locking (2PL)

In strict two-phase locking, the shrinking phase
does not take place until the transaction has
finished running.
Advantages
A transaction always reads a value written by a
committed transaction.
All lock acquisitions and releases can be handled
by the system without the transaction being aware
of them.
Problem: deadlock possible
Example: acquiring two locks in different order

66
Two-Phase Locking

Strict two-phase locking.

67
Two-phase Locking (2PL)

In centralized 2PL, a single site is responsible
for granting and releasing locks.
In primary 2PL, each data item is assigned a
primary copy. The lock manager on that copy's
machine is responsible for granting and releasing
locks.
In distributed 2PL, the schedulers on each
machine not only take care that locks are granted
and released, but also that the operation is
forwarded to the (local) data manager.

68
Timestamp-based Concurrency Control

Each transaction Ti is given timestamp ts(Ti)
If Ti wants to do an operation that conflicts
with Tj
Abort Ti if ts(Ti) < ts(Tj)
When a transaction aborts, it must restart with a
new (larger) time stamp
Two values for each data item x
max-rts(x): max time stamp of a transaction that
read x
max-wts(x): max time stamp of a transaction that
wrote x

69
Reads and Writes using Timestamps

Read_i(x)
If ts(Ti) < max-wts(x) then Abort Ti
Else
Perform Ri(x)
max-rts(x) = max(max-rts(x), ts(Ti))
Write_i(x)
If ts(Ti) < max-rts(x) or ts(Ti) < max-wts(x) then
Abort Ti
Else
Perform Wi(x)
max-wts(x) = ts(Ti)
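
The two rules translate almost directly into code. The sketch below assumes timestamps are positive integers and signals an abort with an exception; the names `Item`, `Abort`, `read`, and `write` are illustrative.

```python
class Abort(Exception):
    """Raised when the transaction must abort and restart with a new timestamp."""


class Item:
    def __init__(self, value):
        self.value = value
        self.max_rts = 0     # largest timestamp of a transaction that read the item
        self.max_wts = 0     # largest timestamp of a transaction that wrote the item


def read(item, ts):
    if ts < item.max_wts:                 # a younger transaction already wrote it
        raise Abort(f"read by ts={ts} arrives too late")
    item.max_rts = max(item.max_rts, ts)
    return item.value


def write(item, ts, value):
    if ts < item.max_rts or ts < item.max_wts:
        raise Abort(f"write by ts={ts} arrives too late")
    item.value = value
    item.max_wts = ts
```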

70
Pessimistic Timestamp Ordering

Concurrency control using timestamps.

71
Optimistic Concurrency Control

Transaction does what it wants and validates
changes prior to commit
Check if files/objects have been changed by
committed transactions since they were opened
Insight: conflicts are rare, so works well most
of the time
Works well with private workspaces
Advantage
Deadlock free
Maximum parallelism
Disadvantage
Rerun transaction if it aborts
Probability of conflict rises substantially at
high loads
Not used widely
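
Validation at commit time can be sketched with per-item version numbers. This is one possible scheme under stated assumptions: the store keeps a version counter per item, each transaction buffers its writes in a private workspace, and validation and write-back are not interleaved with other commits. All names are illustrative.

```python
class Store:
    def __init__(self, data):
        self.items = {key: (value, 0) for key, value in data.items()}  # (value, version)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.seen = {}        # key -> version observed at first read
        self.workspace = {}   # private workspace holding uncommitted writes

    def read(self, key):
        if key in self.workspace:
            return self.workspace[key]
        value, version = self.store.items[key]
        self.seen.setdefault(key, version)
        return value

    def write(self, key, value):
        self.workspace[key] = value

    def commit(self):
        """Validate, then publish. Return False if the transaction must be rerun."""
        for key, version in self.seen.items():
            if self.store.items[key][1] != version:
                return False               # changed by a committed transaction
        for key, value in self.workspace.items():
            old_version = self.store.items.get(key, (None, -1))[1]
            self.store.items[key] = (value, old_version + 1)
        return True
```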