CS556: Distributed Systems presentation

1
CS-556 Distributed Systems
Synchronization (I)

Manolis Marazakis
maraz_at_csd.uoc.gr

2
The issue of Time in distributed systems

A quantity that we often have to measure accurately
It is necessary to synchronize a node's clock with an authoritative external source of time
E.g., timestamps for electronic transactions, at both merchants' and banks' computers
Auditing
An important theoretical construct in understanding how distributed executions unfold
Algorithms for several problems depend upon clock synchronization
Timestamp-based serialization of transactions for consistent updates of distributed data
Kerberos authentication protocol
Elimination of duplicate updates

3
Clock Synchronization

When each machine has its own clock, an event
that occurred after another event may
nevertheless be assigned an earlier time.

4
Fundamental limits
The notion of physical time is problematic in distributed systems: there are limits on our ability to timestamp events at different nodes accurately enough to know the order in which any pair of events occurred, or whether they occurred simultaneously.
5
History of Process $p_i$

$e \rightarrow_i e'$: total ordering of events at process $p_i$
Assuming that the process executes on a single processor
$history(p_i) = h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$: the series of events that take place within $p_i$
$H_i(t)$: hardware clock value (driven by an oscillator)
$C_i(t)$: software clock value (generated by the OS)
$C_i(t) = a H_i(t) + b$
E.g., the number of nanoseconds elapsed at time $t$ since a reference time
Clock resolution: the period between updates of $C_i(t)$
This sets a limit on determining the order of events

6
Clock skew and drift

Skew: instantaneous difference between the readings of two clocks
Drift: different rates of counting time
Physical variations of the underlying oscillators
Variance with temperature
Even extremely small differences accumulate over a large number of oscillations, leading to an observable difference in the counters
Drift rate: difference in reading between a clock and a nominal perfect clock, per unit of time measured by the reference clock
$10^{-6}$ seconds/sec for quartz crystals
$10^{-7}$ to $10^{-8}$ seconds/sec for high-precision quartz crystals
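
To make these drift rates concrete, the sketch below computes how much error accumulates over time and how often two clocks would have to be resynchronized to stay within a given skew. The function names are illustrative, and the factor of 2 assumes the worst case of two clocks drifting in opposite directions; neither comes from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def accumulated_drift(drift_rate: float, elapsed_seconds: float) -> float:
    """Worst-case error (seconds) of one clock against a perfect clock."""
    return drift_rate * elapsed_seconds


def resync_interval(drift_rate: float, max_skew: float) -> float:
    """Longest time (seconds) between resynchronizations that keeps two
    clocks within max_skew, assuming they drift in opposite directions."""
    return max_skew / (2 * drift_rate)


# An ordinary quartz clock (1e-6 s/s) observed over one day.
print(accumulated_drift(1e-6, SECONDS_PER_DAY))
# Resynchronization interval that keeps two such clocks within 1 ms.
print(resync_interval(1e-6, 1e-3))
```

Under these assumptions an ordinary quartz clock gains or loses roughly 86 milliseconds per day, and two such clocks need resynchronizing about every 500 seconds to stay within 1 millisecond of each other.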

7
UTC: Coordinated Universal Time

Atomic oscillators
Drift rate: $10^{-13}$ seconds/second
International Atomic Time (since 1967)
1 standard second = 9,192,631,770 periods of transition for Cs-133
Astronomical Time: years, seconds, ...
UTC: 1 leap second is occasionally inserted, or more rarely deleted, to keep in step with Astronomical Time
Time signals are broadcast from land-based radio stations (WWV) and satellites (GPS)
Accuracy: 0.1-10 milliseconds (land-based), 1 microsecond (GPS)

8
Synchronization of physical clocks

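$D$: synchronization bound
$S$: source of UTC time; the conditions hold for all $t$ in an interval $I$ of real time
External synchronization
$|S(t) - C_i(t)| < D$
Clocks are accurate within the bound $D$
Internal synchronization
$|C_i(t) - C_j(t)| < D$
Clocks agree within the bound $D$
External synchronization with bound $D$ implies internal synchronization with bound $2D$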
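
Why external synchronization implies internal synchronization can be seen from the triangle inequality. The short derivation below assumes that both clocks $C_i$ and $C_j$ are externally synchronized to the same source $S$ within the bound $D$; the resulting internal bound is looser, namely $2D$.

```latex
\begin{aligned}
|C_i(t) - C_j(t)| &= |(C_i(t) - S(t)) + (S(t) - C_j(t))| \\
                  &\le |S(t) - C_i(t)| + |S(t) - C_j(t)| \\
                  &< D + D = 2D
\end{aligned}
```

The converse does not hold: a set of clocks can agree with each other closely while all drifting together away from UTC.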

9
Correctness of clocks

Hardware correctness
$(1 - \rho)(t' - t) \le H(t') - H(t) \le (1 + \rho)(t' - t)$, where $\rho$ is the bound on the drift rate and $t' > t$
There can be no jumps in the value of H/W clocks
Monotonicity
$t' > t \Rightarrow C(t') > C(t)$
A clock only ever advances
Even if a clock is running fast, we only need to change the rate at which updates are made to the time given to apps
This can be achieved in software: $C_i(t) = a H_i(t) + b$
Hybrid
Monotonicity, plus a drift rate bounded between synchronization points (where the clock value can jump ahead)
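
A minimal sketch of how a fast software clock can be corrected without ever moving backwards: rather than setting the clock back, the scale factor a in C(t) = a H(t) + b is lowered for a while so that the error is absorbed gradually. The class name `SlewedClock` and its methods are hypothetical, and the sketch assumes the hardware counter itself ticks at roughly the nominal rate.

```python
class SlewedClock:
    """Software clock C = a * H + b over a hardware counter H (illustrative)."""

    def __init__(self) -> None:
        self.a = 1.0
        self.b = 0.0

    def read(self, h: float) -> float:
        """Software time for hardware counter value h."""
        return self.a * h + self.b

    def slew(self, h_now: float, error: float, window: float) -> None:
        """Absorb `error` seconds (positive means the clock is ahead) over
        the next `window` hardware seconds without any jump at h_now."""
        if window <= 0 or error >= window:
            raise ValueError("window too short: the clock would stop or run backwards")
        current = self.read(h_now)
        self.a = 1.0 - error / window       # run slightly slow (or fast)
        self.b = current - self.a * h_now   # keeps read(h_now) unchanged

    def end_slew(self, h_now: float) -> None:
        """Return to the nominal rate, again without a jump."""
        current = self.read(h_now)
        self.a = 1.0
        self.b = current - h_now


clock = SlewedClock()
clock.slew(h_now=100.0, error=2.0, window=10.0)  # clock is 2 s ahead
print(clock.read(100.0), clock.read(105.0), clock.read(110.0))
clock.end_slew(110.0)
```

During the window the clock keeps advancing, only more slowly, so applications never observe time going backwards.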

10
Synchronous systems

P1 sends its local clock value $t$ to P2
P2 can set its clock value to $t + T_{transmit}$
$T_{transmit}$ can be variable or unknown
Resource competition between processes
Network congestion
$u = max - min$: uncertainty in $T_{transmit}$
A skew of up to $u$ is obtained if P2 sets its clock to $t + min$ or $t + max$
If P2 sets its clock value to $t + (max + min)/2$, then skew $\le u/2$
Optimal bound for $N$ processes: $u(1 - 1/N)$

In asynchronous systems, $T_{transmit} = min + x$, where $x \ge 0$. Only the distribution of $x$ may be measurable, for a given installation.
11
Clock Synchronization Algorithms

The relation between clock time and UTC when
clocks tick at different rates.

12
Time servers: Cristian's algorithm
Time server with a receiver of UTC signals
$T_{round}$: total round-trip time
$t$: time value in message $m_t$
Estimate: $t + T_{round}/2$
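
A minimal sketch of the client side of Cristian's algorithm. The function `cristian_estimate` and its parameters are hypothetical: `request_time` stands for whatever call fetches the server's time value, and `local_clock` for the client's own clock, so no real network API is implied.

```python
import time
from typing import Callable, Tuple


def cristian_estimate(
    request_time: Callable[[], float],
    local_clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, float]:
    """Return (estimated server time now, worst-case error T_round / 2)."""
    sent_at = local_clock()
    t = request_time()              # server's time value carried in the reply
    received_at = local_clock()
    t_round = received_at - sent_at
    # Assume the reply spent about half of the round trip in transit.
    return t + t_round / 2, t_round / 2


# Usage with stand-ins: a server that reports 50.0 and a round trip of 0.4 s.
ticks = iter([10.0, 10.4])
estimate, error_bound = cristian_estimate(lambda: 50.0, lambda: next(ticks))
print(estimate, error_bound)
```

The error bound of half the round-trip time is pessimistic; it tightens if a minimum transmit time is known, and shorter round trips give better estimates.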
13
Cristian's Algorithm

Getting the current time from a time server.

14
Limitations of Cristian's algorithm

Variability in the estimate of $T_{round}$
Can be reduced by repeated requests to S, taking the minimum value of $T_{round}$
Single point of failure
Use a group of synchronized time servers
Multicast the request and use only the first reply obtained
Faulty clocks
$f$ faulty clocks, $N$ servers
$N > 3f$ is required for the correct clocks to achieve agreement
Malicious interference
Protection by authentication techniques

15
The Berkeley algorithm (I)

Gusella & Zatti (1989)
Coordinator (master) periodically polls slaves
Estimates each slave's local clock (based on RTT)
Averages the values obtained (including its own clock value)
Ignores any occasional readings with RTT higher than a maximum
Slaves are notified of the adjustment required
This amount can be positive or negative
Sending the updated current time would introduce further uncertainty, due to message transmit delay
Elimination of faulty clocks
Averaging over clocks that do not differ from one another by more than a specified amount
Election of a new master, in case of failure
No guarantee for the election to complete in bounded time
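
One round of the master's computation could be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the function name, the dictionary layout, and in particular the fault filter (keep clocks within `max_deviation` of the median) are simplifications chosen for brevity.

```python
from statistics import mean, median_low
from typing import Dict, Tuple


def berkeley_adjustments(
    master_clock: float,
    slave_readings: Dict[str, Tuple[float, float]],
    max_rtt: float,
    max_deviation: float,
) -> Dict[str, float]:
    """Return the adjustment (positive or negative) for each participant.

    slave_readings maps a slave name to (reported clock value, measured RTT).
    """
    estimates = {"master": master_clock}
    for name, (value, rtt) in slave_readings.items():
        if rtt > max_rtt:
            continue  # ignore readings whose round trip took too long
        estimates[name] = value + rtt / 2  # allow for the reply's transit time

    # Simplified fault filter (an assumption): average only the clocks that
    # lie within max_deviation of the median estimate.
    mid = median_low(estimates.values())
    trusted = [v for v in estimates.values() if abs(v - mid) <= max_deviation]
    target = mean(trusted)

    # Each machine is sent the amount to adjust by, not the new time itself.
    return {name: target - value for name, value in estimates.items()}


# Clock values in minutes: master at 3:00, slaves at 2:50 and 3:25.
print(berkeley_adjustments(180.0, {"A": (170.0, 0.0), "B": (205.0, 0.0)}, 1.0, 60.0))
```

Sending a relative adjustment rather than an absolute time means the transmit delay of the notification does not add further error.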

16
The Berkeley Algorithm (II)

The time daemon asks all the other machines for
their clock values
The machines answer
The time daemon tells everyone how to adjust
their clock

17
Averaging algorithms

Divide time into fixed-length re-synchronization intervals $[T_0 + iR, T_0 + (i+1)R]$
At the beginning of an interval, each machine broadcasts the current time according to its clock, and starts a local timer to collect all incoming broadcasts during a time interval $S$
When the broadcasts have been received, a new time value is computed
Average
Average after discarding the $m$ lowest and the $m$ highest values
Tolerates up to $m$ faulty machines
May also correct each value based on an estimate of propagation time from the source machine
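
The second averaging variant, which discards the m lowest and m highest values before averaging, can be sketched as below. The function name `trimmed_average` is illustrative, and the correction for propagation time is left out.

```python
from typing import Sequence


def trimmed_average(values: Sequence[float], m: int) -> float:
    """Average after discarding the m lowest and the m highest values."""
    if m < 0 or len(values) <= 2 * m:
        raise ValueError("need more than 2*m values")
    kept = sorted(values)[m:len(values) - m]
    return sum(kept) / len(kept)


# Five broadcast clock values, one of them from a faulty machine.
readings = [10.0, 10.2, 9.9, 50.0, 10.1]
print(trimmed_average(readings, m=1))
```

With m = 1 the outlier 50.0 is dropped together with the lowest reading, so a single faulty machine cannot pull the new time value arbitrarily far.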

18
NTP: An Internet-scale time protocol

Statistical filtering of timing data
Discrimination based on the quality of data from different servers
Re-configurable inter-server connections
Logical hierarchy
Scalable for both clients and servers
Clients can re-synchronize frequently to offset drift
Authentication of trusted servers
And also validation of return addresses

Synchronization accuracy: tens of milliseconds over Internet paths, 1 millisecond on LANs
19
NTP Synchronization Subnets
Primary servers occupy stratum 1
A high stratum number means the server is more liable to be less accurate
Node-to-root RTT serves as a quality criterion

3 modes of synchronization
Multicast: acceptable for a high-speed LAN
Procedure-call: similar to Cristian's algorithm
Symmetric: between a pair of servers
All modes rely on UDP messages.

20
Message pairs between NTP peers (I)

Each message contains the local times when the previous message was sent and received, and the local time when the current message was sent.
There can be a non-negligible delay between the arrival of one message and the dispatch of the next.
Messages may be lost.

Offset $o_i$: estimate of the actual offset between two clocks, as computed from a pair of messages
Delay $d_i$: total transmission time for the message pair
21
Message pairs between NTP peers (II)
$T_{i-2} = T_{i-3} + t + o$, where $o$ is the true offset and $t$, $t'$ are the transmission times of the two messages
$T_i = T_{i-1} + t' - o$
$d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$
$o = o_i + (t' - t)/2$
$o_i = (T_{i-2} - T_{i-3} - T_i + T_{i-1}) / 2$
$o_i - d_i/2 \le o \le o_i + d_i/2$
Delay $d_i$ is a measure of the accuracy of the estimate of offset
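
The offset and delay formulas can be checked with a small worked example. The function name and argument names below are illustrative; the four arguments are the timestamps $T_{i-3}$, $T_{i-2}$, $T_{i-1}$, $T_i$ of one message pair, and the sample values assume a true offset of 5 with transmission times of 0.3 and 0.1.

```python
from typing import Tuple


def ntp_offset_delay(
    t_i3: float, t_i2: float, t_i1: float, t_i: float
) -> Tuple[float, float]:
    """Return (o_i, d_i) for one message pair.

    t_i3: first message sent (sender's clock)
    t_i2: first message received (receiver's clock)
    t_i1: reply sent (receiver's clock)
    t_i:  reply received (sender's clock)
    """
    offset = (t_i2 - t_i3 - t_i + t_i1) / 2
    delay = (t_i2 - t_i3) + (t_i - t_i1)
    return offset, delay


true_offset, t, t_prime = 5.0, 0.3, 0.1
t_i3 = 100.0
t_i2 = t_i3 + t + true_offset      # T_{i-2} = T_{i-3} + t + o
t_i1 = 106.0
t_i = t_i1 + t_prime - true_offset  # T_i = T_{i-1} + t' - o

o_i, d_i = ntp_offset_delay(t_i3, t_i2, t_i1, t_i)
print(o_i, d_i)
print(o_i - d_i / 2 <= true_offset <= o_i + d_i / 2)
```

The estimate differs from the true offset by (t' - t)/2, which is why asymmetric paths hurt accuracy and why the delay bounds the error.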
22
NTP data filtering & peer selection

Retain the 8 most recent $\langle o_i, d_i \rangle$ pairs
Compute a filter dispersion metric
Higher values mean less reliable data
The estimate of offset with minimum delay is chosen
Examine values from several peers
Look for relatively unreliable values
May switch the peer used primarily for synchronization
Peers with low stratum are favored
They are closer to primary time sources
Also favored are peers with the lowest synchronization dispersion
Sum of filter dispersions between the peer and the root of the synchronization subnet
May modify the local clock update frequency with respect to the observed drift rate
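
The first filtering step, keeping the 8 most recent pairs and choosing the offset measured with the smallest delay, could be sketched as follows. The class `PeerFilter` is hypothetical and deliberately omits the dispersion metric and peer selection, whose exact definitions are not given in the slides.

```python
from collections import deque
from typing import Deque, Tuple


class PeerFilter:
    """Keeps the most recent (offset, delay) pairs for one peer."""

    def __init__(self, size: int = 8) -> None:
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=size)

    def add(self, offset: float, delay: float) -> None:
        self.samples.append((offset, delay))  # oldest pair drops out when full

    def best_offset(self) -> float:
        """Offset from the retained pair with the minimum delay."""
        if not self.samples:
            raise ValueError("no samples yet")
        return min(self.samples, key=lambda pair: pair[1])[0]


f = PeerFilter()
f.add(5.10, 0.40)
f.add(5.02, 0.05)
f.add(5.30, 0.90)
print(f.best_offset())
```

Choosing the minimum-delay sample follows from the bound on the true offset: the smaller the delay, the tighter the interval around the estimate.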

23
Lamport's notion of logical time

For many purposes, it is sufficient that all
machines agree on the same time
Emphasis on internal consistency
If two processes do not interact, lack of
synchronization will not be observable
and thus will not cause problems
Ordering of events is needed to avoid ambiguities

24
Lamport Timestamps

3 processes, each with its own clock. The clocks
run at different rates.
Lamport's algorithm corrects the clocks.

25
Space-Time diagram representation of a
distributed computation
26
The happened-before relation

We cannot synchronize clocks perfectly across a distributed system
So we cannot use physical time to find out event order
Lamport, 1978: the happened-before partial order
(Potential) causal ordering
$e \rightarrow_i e'$ for some process $P_i$ $\Rightarrow$ $e \rightarrow e'$
$send(m) \rightarrow receive(m)$, for any message $m$
$e \rightarrow e'$ and $e' \rightarrow e''$ $\Rightarrow$ $e \rightarrow e''$
Concurrent events: a // b
They occur at different processes, with no chain of messages intervening between them

27
Totally-Ordered Multicasting

Updating a replicated database leaving it in an
inconsistent state.

Solution via multicast
Each message is multicast, with timestamp = current (logical) time
The recipient ACKs each message (via multicast)
Each process puts received messages in its local queue, sorted according to the timestamp
A process only delivers a message when it is at the head of the queue and it has been ACKed by all processes
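
The delivery rule above can be sketched with an in-memory simulation. Everything here is an illustrative assumption: the class `Replica`, the explicit hand-delivery of messages and ACKs in place of a network, and the use of the sender's ID to break timestamp ties. The sketch also presumes reliable FIFO channels, which the rule depends on.

```python
import heapq
from typing import Dict, List, Set, Tuple

Message = Tuple[int, int, str]   # (timestamp, sender id, payload)
MessageId = Tuple[int, int]      # (timestamp, sender id)


class Replica:
    def __init__(self, pid: int, n: int) -> None:
        self.pid, self.n = pid, n
        self.clock = 0
        self.queue: List[Message] = []             # heap ordered by timestamp
        self.acks: Dict[MessageId, Set[int]] = {}
        self.delivered: List[str] = []

    def multicast(self, payload: str) -> Message:
        self.clock += 1
        return (self.clock, self.pid, payload)

    def on_message(self, msg: Message) -> MessageId:
        ts, sender, _ = msg
        self.clock = max(self.clock, ts) + 1
        heapq.heappush(self.queue, msg)
        self._try_deliver()
        return (ts, sender)                        # to be ACKed via multicast

    def on_ack(self, msg_id: MessageId, from_pid: int) -> None:
        self.acks.setdefault(msg_id, set()).add(from_pid)
        self._try_deliver()

    def _try_deliver(self) -> None:
        # Deliver only from the head, and only once every process has ACKed.
        while self.queue:
            ts, sender, payload = self.queue[0]
            if len(self.acks.get((ts, sender), ())) < self.n:
                break
            heapq.heappop(self.queue)
            self.delivered.append(payload)


r0, r1 = Replica(0, 2), Replica(1, 2)
m0 = r0.multicast("deposit")
m1 = r1.multicast("interest")

# The two updates arrive in opposite orders at the two replicas.
pending_acks = []
for replica, arrivals in ((r0, [m0, m1]), (r1, [m1, m0])):
    for msg in arrivals:
        pending_acks.append((replica.pid, replica.on_message(msg)))
for from_pid, msg_id in pending_acks:
    for replica in (r0, r1):
        replica.on_ack(msg_id, from_pid)

print(r0.delivered, r1.delivered)
```

Although the updates arrive in different orders, both replicas deliver them in the same timestamp order, which is what keeps the replicated database consistent.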

28
Lamport's Logical Clocks (I)

Per-process monotonically increasing counters
$L_i := L_i + 1$, before each event is recorded at $P_i$
Clock value, $t$, is piggybacked with messages
Upon receiving $\langle m, t \rangle$, $P_j$ updates its clock
$L_j := \max(L_j, t)$, then $L_j := L_j + 1$
Total order by taking into account process ID
$(T_i, i) < (T_j, j)$ iff $T_i < T_j$ or ($T_i = T_j$ and $i < j$)
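
A minimal sketch of these rules. The class name `LamportClock` and its methods are illustrative; the example run uses three processes, with p1 sending a message to p2 and p2 then sending one to p3.

```python
from typing import Tuple


class LamportClock:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.time = 0

    def tick(self) -> int:
        """Local event or send: increment before timestamping."""
        self.time += 1
        return self.time

    def receive(self, t: int) -> int:
        """Receive a message carrying timestamp t."""
        self.time = max(self.time, t)
        return self.tick()

    def stamp(self) -> Tuple[int, int]:
        """(time, process id) pair; tuple comparison gives the total order."""
        return (self.time, self.pid)


p1, p2, p3 = LamportClock(1), LamportClock(2), LamportClock(3)
a = p1.tick()
b = p1.tick()        # send m1, carrying b
c = p2.receive(b)    # receive m1
d = p2.tick()        # send m2, carrying d
e = p3.tick()        # independent local event
f = p3.receive(d)    # receive m2
print(a, b, c, d, e, f)
```

Python compares tuples lexicographically, so `(T_i, i) < (T_j, j)` evaluates exactly as the rule above states. Note that e receives a smaller timestamp than b even though the two events are concurrent: timestamp order does not imply happened-before.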

29
Lamport's Logical Clocks (II)
[Figure: space-time diagram against physical time for processes p1, p2, p3, with events a to f and messages m1, m2]
L(b) > L(e), but b // e
30
FIFO delivery and causal delivery
31
Hidden channels
The $\rightarrow$ relation captures the flow of data intervening between events. Data can flow in ways other than message passing!
a: pipe rupture, detected by sensor 1
b: pressure drop, detected by sensor 2
The pipe acts as a communication channel
Controller (P3) increases heat (to increase pressure), then receives notification of the rupture.
32
Vector Clocks

Mattern, 1989; Fidge, 1991
Clock: a vector of $N$ numbers (one per process)
$V_i[i] := V_i[i] + 1$, before $P_i$ timestamps an event
The clock vector is piggybacked with messages
When $P_i$ receives $\langle m, t \rangle$:
$V_i[j] := \max(t[j], V_i[j])$, for $j = 1, \ldots, N$
$V_i[j]$, $j \ne i$: number of events that have occurred at $P_j$ and have a (potential) effect on $P_i$
$V_i[i]$: number of events that $P_i$ has timestamped

$e \rightarrow e' \Leftrightarrow V(e) < V(e')$
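
A minimal sketch of vector clocks and the comparison they allow. The class and function names are illustrative, processes are indexed from 0 rather than 1, and V(e) < V(e') is taken to mean that every component is less than or equal and the vectors differ.

```python
from typing import List


class VectorClock:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def tick(self) -> List[int]:
        """Increment own entry before timestamping an event."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, t: List[int]) -> List[int]:
        """Merge the piggybacked vector t, then timestamp the receive event."""
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, t)]
        return self.tick()


def happened_before(u: List[int], v: List[int]) -> bool:
    """True if u < v: every component <= and the vectors differ."""
    return u != v and all(x <= y for x, y in zip(u, v))


def concurrent(u: List[int], v: List[int]) -> bool:
    return u != v and not happened_before(u, v) and not happened_before(v, u)


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.tick()
b = p1.tick()        # send m1, carrying b
c = p2.receive(b)
d = p2.tick()        # send m2, carrying d
e = p3.tick()        # independent local event
f = p3.receive(d)
print(b, e, f)
print(concurrent(b, e), happened_before(b, f))
```

Unlike Lamport timestamps, comparing the vectors of b and e shows directly that neither event happened before the other.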
