Synchronization

About This Presentation

Title:

Synchronization

Description:

When each machine has its own clock, an event that occurred after another event ... Relative Clock Synchronization ... Handles problems of clock drift in a DS ... – PowerPoint PPT presentation

Number of Views:101

Avg rating:3.0/5.0

Slides: 65

Provided by: baobao3

Category:

more less

Transcript and Presenter's Notes

Title: Synchronization

1
Chapter 7

Synchronization

2
Topics

Physical clock synchronization
Logical clock synchronization
Causality relation
Lamports logical clock
Vector logical clock
Multicast
ISIS vector clock
Snapshot

3
New Issues in DS

Global time
Event order
e1 at 100pm on machine m1, e2 at 101pm at
machine m2. Which event happens earlier?
Global state
snapshot
Mutual exclusion Synchronization

4
Time and Clock

Two roles of time
- Defines temporal order among events
- Duration (measured by timer)
UTC (Universal Coordinated Time) is based
on Cesium-133 atom oscillation located at
over 200 labs in the world
With satellites, 0.5ms accuracy is possible.
(100 MIPS ? 50,000 instructions in 0.5ms).

5
Clock Skew

Skew Clock reading (from single clock) is
location-dependent, e.g., distance from satellite
or clock source on a circuit board
Drift Multiple clocks.
t the real time
Cp(t) the reading of a clock p at time t
(Cp(t) t for ideal clock)
dCp(t) /dt ticking rate (dCp(t) /dt 1 for
ideal clock)

6
Consequence An Example

When each machine has its own clock, an event
that occurred after another event may
nevertheless be assigned an earlier time. But it
is a different story in DS.

Time?
7
Physical Clock Synchronization

Cristians Algorithm
The Berkeley Algorithm
Network Time Protocol (NTP)
OSF DCE

8
Cristians Algorithm Architecture

WWV-node receiving UTC-signals, serving as the
central UTC-time server (CUTCS) for the DS
WWV is a short wave radio station in Colorado.
Periodically every node in the DS sends a time
request to the central UTC time server CUTCS.
The CUTCS responds with its current time tUTC

9
Adjust Time

When the client gets the reply, it simply set its
clock to tUTC
Time may run backward
Introduce the change gradually
Consider the time for message propagation.

10
Adjust Time (continued)

Estimate Tp (propagation delay) from
T1 T0 2 x Tp I. where I
processing time.
Current time t (servers time in message) Tp.
Assumption?

11
The Berkeley Algorithm

In Cristians algorithm the central time server
was passive
Now its active, i.e. it periodically polls all
other nodes to hand out their current local times
ci(t).
Based on the answers it calculates a mean and
tells all other machines to advance or slow down
their clocks accordingly.

12
Relative Clock Synchronization

Time server periodically sends its time to
clients and asks for theirs.
Clients respond with how far ahead or behind the
server they are
Time server uses the estimated local times for
building the arithmetic mean
Deviations from this arithmetic mean are sent to
nodes enabling them to slow down respectively to
speed up.

13
The Berkeley Algorithm

The time daemon asks all the other machines for
their clock values
The machines answer
The time daemon tells everyone how to adjust
their clock

14
Summary

Cristians method and the Berkeley algorithm are
intended for intranets
Both may be improved with fault tolerance methods
Instead of one UTC server in Cristians algorithm
use n time servers and always take the first
answer from whatever time serve
Instead of taking the arithmetic mean from all
clients in the Berkeley algorithm take the fault
tolerance mean, i.e. skip deviations with a
certain threshold

15
Network Time Protocol (NTP)

Goal
absolute (UTC)-time service in large nets (e.g.
Internet)
high availability (via fault tolerance)
protection against fabrication (via
authentication)
Architecture
time-servers build up a hierarchical
synchronization subnet
all primary servers have an UTC-receiver
secondary servers are synchronized by their
parent primary server
all other stations are leaves on level 3 being
synchronized by level 2 time servers
accuracy of clocks decreases with increasing
level number
the net is able to reconfigure

16
NTP
Reliability from redundant paths, scalability,
authenticated time sources
17
Synchronization of Servers (NTP)

Synchronization subnet can reconfigure if
failures occur,e.g.
Primary having lost its UTC source can become a
secondary
Secondary having lost its primary can use another
one
Modes of synchronization
Multicast mode (for quick LANs, low accuracy)
A server within a LAN periodically multicasts
time to other leaves in the LAN which set their
clocks assuming some delay
Procedure-call mode (Cristians algorithm with
medium accuracy)
A server responds to requests with its actual
timestamp
Symmetric mode (high accuracy)
Pairs of servers exchange message containing times

18
OSF DCE

Time is an interval t-e, te.

Two intervals overlap ? cannot say which time is
earlier (In case of overlap, Unix make should
recompile).
19
Logical Clock Synchronization

A powerful building block in DS
Duplicate detection
Cache consistency
Leases
Commitment

20
Leslie Lamport
Time, Clocks, and the Ordering of Events in a
Distributed System
Best known for his work on 1)Temporal
logic 2)LaTeX
Microsoft
21
The Paper

Handles problems of clock drift in a DS
Identifies main function of computer clocks, i.e.
ordering of events
Indicates which conditions clocks must satisfy to
fulfill their role
Introduces logical clocks
Benefits of logical clocks?
Needed for determining causality

22
Logical Time

In many cases its sufficient just to order the
related relevant events, i.e. we want to be able
to position these events relatively, but not
absolutely.
Interesting Relative position of an event on the
time axis ? no need for any scaling on this time
axis

23
Ordering Events

Event ordering linked with concept of causality
Saying that event a happened before event b is
same as saying that event a could have affected
the outcome of event b
If events a and b happen on processes that do not
exchange any data, their exact ordering is not
important.

24
Causality Relationship

Event changes state of process.
State remains same till next event occurs.

25
Formal Definition

a? b defined by
If a and b are in the same process, and a occurs
earlier than b, then a?b.
If a is a sending event and b is receiving event
of same message, then a? b.
If a?b and b?c, then a?c. Transitive
If a? b, then a causally precedes (or happened
before) b a and b are causally related
a and b are concurrent if neither a?b nor b?a.

26
Message-Related Events

Sending event
Receiving event
Message arrival (at kernel) and delivery
(to user process) Kernel can control timing
of delivery after arrival.

27
Example
Time
e11?e12?e21?e22?e32 , furthermore
e31?e32, whereas e31 is neither related (has
happened before) to e11, nor to e12, nor to e21,
nor to e22. e31 is concurrent to e11, e12, e21,
and e22.
28
Lamport Clock

Suppose E events, each e in E gets a Lamport
time stamp L(e), as follows
1. e is a pure local event or a sending-event
if e has no local predecessor, then L(e) 1,
otherwise there is a local predecessor e, thus
L(e) L(e) 1
2. e is a receiving event, with a corresponding
sending-event s if e has no local predecessor,
then L(e) L(s) 1, otherwise there is a local
predecessor e, thus L(e) maxL(s),L(e) 1

29
Example
Note Each local counter is incremented with each
local event. In a communication we adjust the
involved counters of the two communicating nodes
to be consistent with the happened-before-relation
. Remark Same mechanism can be used to adjust
clocks on different nodes. The Lamport time is
consistent with the happened-before-relation,
i.e. if x? y, then L( x)ltL( y), but not vice
versa.
30
Adjusting Clocks
Without adjusting local clocks
With adjusting local clocks
31
Limitation on Lamport Clocks
From Lamport time values you cannot conclude
whether two events are in the happen-before
relationship,e.g. e11 and e32.
32
Total Ordering of Events

Lamport-time only gives us a partial-ordering of
distributed events.
To implement the total ordering
Each processor is assigned by a unique id
(integer)
Given two events e1 and e2, e1 is ordered before
e2 if L(e1) lt L(e2) or L(e1)L(e2) and Id(e1) lt
Id(e2)

33
Holding Back Deliveries

Delay the delivery of messages that arrived too
soon
Useful when delivering messages from kernel to
processes
Hold back the delivery of M to process P until
there is a guarantee that no message M with
L(M) lt L(M) will arrive at P in the future.

34
Implementation

Assumption messages from a particular source
arrive in the FIFO order
Each site maintains a set of message queues, one
for each other site
When a message arrives, placed in the
corresponding queue
When all queues are non-empty, compare the
timestamps of the messages at the heads of the
queues, and deliver the messages with the oldest
timestamp.

35
Limitation

All message queues need be non-empty.
Normally not true.
Require multicast to solve the problem.
With Lamport clock, L(a)ltL(b) does not mean a?b.
Unnecessarily delay some messages.
Vector Clock.

36
Event Counter
Event counter at Pi Initialized at 0
and incremented for each event
37
Vector Time

Assumption n tasks (processes) Pi in DS
Each Pi has its own local clock being a
n-dimensional vector (initially zeroed)
Vi(a) is timestamp of event a at process Pi
Vii number of local events at Pi
Vij is Pi best guess of how many events have
been on Pj

38
Rules

There is a DS with n distributed processes.
n-dimensional vector Vi is vector-time of process
Pi if it is built according to the following
rules
(1) Initially, Vi (0, , 0) for all 1ltiltn
(2) For a local event on process Pi Vii
(3) Pi includes the value t Vi in every msg m
(4) When Pj receives a message m with timestamp
t, it sets Vjk maxtk, Vj k, for 1ltkltn
and k ! j
Communication cost?
Little overhead compared to Lamport clock

39
Example
M1
M2
M3
Time
40
Notation

We define global V(e) Vi(e) if event e happens
in Pi
We write V(a) ? V(b) if
V(a)k ? V(b)k for all k.
Here V(a)k denotes the kth component of V(a).
We write V(a) lt V(b) if
V(a)k ? V(b)k for all k, and
V(a)j lt V(b)j for at least one j

41
Vector Time Characteristics

The following inter relationships between
causality or the happened-before relation and
vector-time hold
A.) e?e iff V(e) lt V(e)
B.) ee iff V(e) V(e)
The vector-time is the best known estimation for
global sequencing that is based only on local
information.

42
Proof

a? b iff V(a) lt V(b)
Proof
For A fixed b, a ? b iff a is in shaded area iff
each component of V(a) ? corresponding component
of V(b).

43
Multicasting

A message is sent to all the members of a group
Sending video stream to a set of customers
Implementing a chat program
Sending updates to a group of replica managers

44
IPv4 Multicast Addresses

Class D (starts with bit sequence1110)
224.0.0.1 to 239.255.255.255 (about 228?268
million)
224.0.0.1 is for all systems on this subnet
224.2.0.0 224.2.127.253 are for multimedia
conference calls

45
Causal Ordering of Messages

Suppose m1 and m2 are two messages being
received at the same node i. A set of messages is
causally ordered if for all pairs ltm1, m2gt the
following holds
send(m1) ? send(m2) ? receive(m1)?receive(m2)

46
Causality Violation

Suppose M1s sending event happened before M3s
sending event.
Causality violation occurs if M1 is delivered
after M3
(In particular, non-FIFO delivery is causality
violation).

Delay the delivery of M3 to P2 until M1 arrives.
ISIS system using multicast

47
Formal Description of ISIS Clock ICi

Pi initializes its clock ICi 0,,0.
For each msg sending event by Pi
ICii
Pi attaches ICi to message it sends.
Upon receiving msg M from Pj with M.ts, Pi checks
if
1) M.tsj ICij 1 (M is next msg expected
from Pj)
2) ICik ? M.tsk for all other k (all msgs
from Pk that sender Pj has received have been
received by Pi)
If both are satisfied, Pi delivers M after
ICij
Otherwise, Pi puts M in hold-back Q until they
are satisfied.

48
Example
P1
P2
P3
Migrate foo to P2
Where is foo?
100
001
000
101
M1
M2.ts1 gt IC311 Put M2 in Hold-back Q
201
M2
foo is at P2
M1.ts1 IC311 IC3 ? 101 deliver M1 IC3 ?
201 deliver M2
Time

Note jth component of M.ts is
sequence number of latest msg sent
by Pj that is known to sender of M

49
Safety

Show that msgs are delivered in timestamp order.
Suppose not
Let m (m) be event of sending message M (M)
Assume Pi delivered msg M (from Pk) before M
(from Pj), even though
M.ts ( ICj(m)) lt M.ts (ICk(m))
.(A) (1)
(a) Just before Pi delivered M
ICij1 ICj(m)j hence ICij
lt ICj(m)j (2)
(b) Delivery of M would have resulted in
ICij ICk(m)j
at time of delivery
(a) and (b) contradict (A) since (b) took place
before (a), hence ICij ? ICij

50
Liveness

Show the system starvation-free no message will
wait forever in the hold-back Q
Assume Q is the hold-back queue in Pi and is
non-empty. Let M be a msg in Q which is not
preceded by any other msg in Q. Suppose M was
sent by Pj.

51
Proof

Assume ICik lt M.tsk for some k (!j), i.e.,
condition (2) is violated want to derive
contradiction from this.
Let M be latest msg from Pk that Pj delivered
prior to sending M so that M.ts lt M.ts and
M.tsk M.tsk.
If Pi hasnt delivered copy of msg M from Pk ,
then M with M.ts lt M.ts is in holdback Q of Pi,
contradicting assumption that M is not causally
preceded by any other msg in holdback Q of Pi.
So Pi must have delivered copy of msg M from
Pk.Thus ICik ? M.tsk M.tsk,
contradicting ICik lt M.tsk
Must give up assumption that Pi cannot deliver M.

52
Proof Illustration
53
Global state of a DS

Consists of
Local state of each node (task, process)
Messages in transit
Why interested in a global state?
Suppose local computation has stopped on each
node and there are no pending messages, then
1. Distributed application has terminated
successfully? or
2. Deadlock?
Problem lack of global time
Consequences?
How take a snapshot

54
Snapshots (taken at 200pm by local clocks)
A
B
B
B
A
A
100
0
0
100
159pm
200pm
100 In channel
0
100
100
100
201
200pm
sum 100
sum 0
sum 200
(a)
(b)
(c)
Snapshots taken at
55
Census Taking in Ancient Kingdom

Want to take census counting all people, some of
whom may be traveling on highways.

56
Census Taking Algorithm

Close all gates into/out of each village
(process) and count people (record process state)
in village
Open each outgoing gate and send official with a
red cap (special marker message).
Open each incoming gate and count all travelers
(record channel state messages sent but not
received yet) who arrive ahead of official (with
a red cap).
Tally the counts from all villages.

57
Chandy/Lamport Snapshot Algorithm

All processes are initially white Messages sent
by white(red) processes are also white (red)
MSend Marker sending rule for process P
Suspend all other activities until done
Record Ps state
Turn red
Send one marker over each output channel of P.
MReceive Marker receiving rule for P
On receiving marker over channel C,
if P is white Record state of channel C as
empty
Invoke MSend
else record the state of C as sequence of white
messages received since P turned red.
Stop when marker is received on each incoming
channel
MSend and MReceive are atomic.

58
Assumptions

No process failures, no message loss
Point to point message delivery is ordered
How to guarantee it? ISIS clock
Network is strongly connected.
Why?

59
Snapshot (1)
A
B
B
A
msgs arriving before maker constitute channel
state
100
100 in channel
0
0
0
marker
100 in channel
sum 100
sum 100
(a)
(b)
OK
OK
Need not use time.
60
Snapshots (2)
B
A
B
A
100
100
0
marker
marker
100
100 in channel
marker
100
sum 200
sum 100
(c)
(d)
Cannot happen
Will be like this
61
Cuts

Cut C divides all events to PC (those which
happened in the past relative to C) and FC
(future events).
Cut C is consistent if there is no message whose
sending event is in FC and whose receiving event
is in PC.

62
Progress shown by cuts
Time
p4
p1
p2
p3
P
Q
q1
q2
q3
1
2
3
4
5
7
8
There are 54 20 possible cuts.
63
Example
Time
p3
p4
p1
p2
P
q2
q3
q4
q5
Q
M
q1
R
r1
r4
r2
r3
inconsistent cut
consistent cut
State recorded by SNAPSHOT ?consistent cut
64
Checkpoint

Cut C is consistent ? C doesnt contradict
sequence of events experienced by any site ? can
assume it did exist at the same time
Can use snapshot as checkpoint, from which
activity in distributed system can be resumed
after crash

Write a Comment

User Comments (0)