Title: CS6223: Distributed Systems
1CS6223 Distributed Systems
- Distributed Time and Clock Synchronization
2Why Timestamps in Systems?
- Make precise performance measurements
- Guarantee that data is up to date (recency)
- Temporal ordering of events produced by concurrent processes
- Synchronization between senders and receivers of messages
- Coordination of joint activities
- Serialization of concurrent accesses to shared objects
3Physical time
- Solar time
- 1 sec = 1 day / 86400
- Problem: days are of different lengths (due to tidal friction, etc.)
- Mean solar second: averaged over many days
- International Atomic Time (TAI)
- 1 sec = time for a Cesium-133 atom to make 9,192,631,770 state transitions
- TAI time is simply the number of Cesium-133 transitions since midnight on Jan 1, 1958
- Accuracy: better than 1 second in six million years
- Problem: atomic clocks do not keep in step with solar time
4Coordinated Universal Time (UTC)
- Based on the atomic time (TAI)
- A leap second is occasionally inserted or deleted
to keep in step with solar time
5Computer Clocks
- CMOS clock circuit driven by a quartz oscillator
- Battery backup to continue measuring time when power is off
- The circuit has a counter and a register. The counter decrements by 1 for each oscillation; an interrupt is generated when it reaches 0, and the number in the register is loaded into the counter. Then it repeats.
- OS catches interrupt signals to maintain a computer clock
- e.g., 60 or 100 interrupts per second
- Programmable Interrupt Controller (PIC)
- Interrupt service procedure increments a counter by 1 for each interrupt
[Figure: CPU, counter, and register]
6Clock drift and clock skew
- Clock drift
- Clocks tick at different rates
- Ordinary quartz clocks drift by 1 sec in 11-12 days ($10^{-6}$ secs/sec)
- High-precision quartz clocks have a drift rate of $10^{-7}$ or $10^{-8}$ secs/sec
- Creates an ever-widening gap in perceived time
- Clock skew (offset)
- Difference between two clocks at one point in time
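As a quick check of the figure quoted for ordinary quartz clocks, a drift rate of $10^{-6}$ seconds per second accumulates one second of error after roughly the stated 11-12 days:

```latex
\[
\frac{1\ \text{s}}{10^{-6}\ \text{s/s}} = 10^{6}\ \text{s}
= \frac{10^{6}}{86400}\ \text{days} \approx 11.6\ \text{days}
\]
```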
7Perfect clock
8Drift with a slow computer clock
9Drift with a fast computer clock
10Dealing with drift
- Do not set a clock backward
- The illusion of time moving backwards can confuse message ordering and software development environments
- Go for gradual clock correction
- If fast: make the clock run slower until it synchronizes
- If slow: make the clock run faster until it synchronizes
11Linear compensating function
- OS can do this: change the rate at which it requests interrupts
- e.g., if the system generates an interrupt every 17 ms but the clock is too slow, generate an interrupt every 15 ms instead
- Adjustment changes the slope of system time: linear compensating function
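A minimal sketch of a linear compensating function, assuming the measured skew is spread evenly over a fixed correction window. The name `compensated_time` and the parameters `skew` and `window` are illustrative and do not come from the slides.

```python
def compensated_time(hw_elapsed, skew, window):
    """Return system time elapsed after `hw_elapsed` seconds of hardware time.

    skew   -- seconds the clock is behind (positive) or ahead (negative)
    window -- hardware seconds over which the correction is spread (assumed)
    """
    if window <= 0 or skew <= -window:
        raise ValueError("correction would stop or reverse the clock")
    rate = (window + skew) / window  # slope of system time during correction
    if hw_elapsed <= window:
        return hw_elapsed * rate
    # After the window the skew is fully absorbed; run at the normal rate.
    return (window + skew) + (hw_elapsed - window)


# A clock 2 s behind, corrected over 10 s, runs at slope 1.2 and never jumps.
print(compensated_time(5.0, 2.0, 10.0))   # halfway through the correction
print(compensated_time(10.0, 2.0, 10.0))  # correction complete
```

Because the slope stays positive, system time keeps increasing even while a fast clock is being slowed down.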
12Resynchronization
- After the synchronization period is reached
- Resynchronize periodically
- Successive application of a second linear compensating function can bring the clock closer to the true slope
- Keep track of adjustments and apply continuously
- UNIX adjtime system call
- int adjtime(const struct timeval *delta, struct timeval *olddelta)
- Adjusts the system's notion of the current time, advancing or retarding it, by the amount of time specified in the struct timeval pointed to by delta
13Getting UTC
- Attach a GPS receiver to each computer
- Within 1 ms of UTC
- Attach a WWV (http://tf.nist.gov) radio receiver
- Obtain time broadcasts from Boulder or DC
- Within 3 ms of UTC (depending on distance)
- Attach a GOES receiver (Geostationary Operational Environmental Satellites, http://www.goes.noaa.gov/)
- Within 0.1 ms of UTC
- Not a practical solution for every machine
- Cost, size, convenience, environment
14Getting UTC
- Synchronize from another machine
- One with a more accurate clock
- Machine that provides time information
- Time server
15Synchronizing Clocks by using RPC
- Simplest synchronization technique
- Make an RPC to obtain time
- Set time
- Does not account for network or processing latency
16Cristian's algorithm
- Compensate for network delays (assuming symmetric)
- Client sends a request at $T_0$
- Server replies with the current clock value $T_{server}$
- Client receives response at $T_1$
- Client sets its clock to $T_{server} + \frac{T_1 - T_0}{2}$
17Cristian's algorithm example
- Send request at 5:08:15.100 ($T_0$)
- Receive response at 5:08:15.900 ($T_1$)
- Response contains 5:09:25.300 ($T_{server}$)
- Round-trip time is $T_1 - T_0$
- 5:08:15.900 - 5:08:15.100 = 800 ms
- Best guess: timestamp was generated 400 ms ago
- Set time to $T_{server}$ + round-trip-time/2
- 5:09:25.300 + 400 ms = 5:09:25.700
- Accuracy: ± round-trip-time/2
18Cristian's algorithm error bound
- $T_{min}$: minimum message travel time
- Accuracy of result: $\pm\left(\frac{T_1 - T_0}{2} - T_{min}\right)$
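A minimal sketch of Cristian's algorithm and its error bound, assuming symmetric network delays and reading the example timestamps as times of day (hours, minutes, seconds). The function `cristian` and the helper `ms_of_day` are hypothetical names introduced for illustration; the bound used is half the round-trip time minus the minimum message travel time.

```python
def cristian(t0, t1, t_server, t_min=0.0):
    """Return (new_clock_value, error_bound), all in the same time unit."""
    round_trip = t1 - t0
    new_time = t_server + round_trip / 2   # assume symmetric delays
    error_bound = round_trip / 2 - t_min   # +/- accuracy of new_time
    return new_time, error_bound


def ms_of_day(h, m, s, ms):
    """Hypothetical helper: wall-clock time as milliseconds since midnight."""
    return ((h * 60 + m) * 60 + s) * 1000 + ms


new_time, bound = cristian(
    t0=ms_of_day(5, 8, 15, 100),
    t1=ms_of_day(5, 8, 15, 900),
    t_server=ms_of_day(5, 9, 25, 300),
)
# Milliseconds past 5:09:25 for the new clock value, and the bound in ms.
print(new_time - ms_of_day(5, 9, 25, 0), bound)
```

Passing a known `t_min` tightens the bound, since the reply cannot have taken less than that time to arrive.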
19Problems with Cristian's algorithm
- Server might fail
- Subject to malicious interference
20Berkeley Algorithm
- Gusella & Zatti, 1989
- Aim: keep the clocks of a group of machines as close together as possible
- Assumes no machine has an accurate time source (i.e., no differentiation of client and server)
- Obtains average from participating computers
- Synchronizes all clocks to the average
21Berkeley Algorithm
- One machine is elected (or designated) as the master; the others are slaves
- Master polls all slaves periodically, asking for their time
- Cristian's algorithm can be used to obtain more accurate clock values from other machines by accounting for network latency
- When results are in, compute the average
- Including the master's time
- Send each slave the offset by which its clock needs to be adjusted
- Sending an offset avoids the network-delay problems of sending a timestamp
22Berkeley Algorithm
- Algorithm has provisions for ignoring readings from clocks whose skew is too great
- Compute a fault-tolerant average
- Any slave can take over as master if the master fails
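A minimal sketch of the master's averaging step in the Berkeley algorithm. The slides do not say how "skew too great" is judged, so this sketch assumes a reading is ignored when it differs from the master's clock by more than `max_skew`. The function name, parameter names, and sample clock values are illustrative.

```python
def berkeley_offsets(master_time, slave_times, max_skew):
    """Return the adjustment each machine should apply to its own clock.

    slave_times -- dict of slave name -> reported clock value
    max_skew    -- readings further than this from the master are ignored
                   when averaging (an assumed fault-tolerance rule)
    """
    readings = {"master": master_time, **slave_times}
    trusted = [t for t in readings.values() if abs(t - master_time) <= max_skew]
    average = sum(trusted) / len(trusted)
    # Every machine, including an outlier, is told how far it is from the average.
    return {name: average - t for name, t in readings.items()}


# Clock values in minutes since midnight: master 3:00, slaves 2:50, 3:25, 9:00.
offsets = berkeley_offsets(180, {"s1": 170, "s2": 205, "s3": 540}, max_skew=60)
print(offsets)
```

The master sends each machine its offset rather than the averaged time, so the delay in delivering the correction does not add further error.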
23Berkeley Algorithm example
26Network Time Protocol (NTP)
- NTP is the most commonly used Internet time protocol (RFC 1305, http://tf.nist.gov/service/its.htm).
- Computers often include NTP software in the OS. The client software periodically gets updates from one or more servers (and averages them).
- Time servers listen for NTP requests on port 123 and reply with a UDP/IP data packet in NTP format, which carries a 64-bit timestamp in UTC seconds since Jan 1, 1900 with a resolution of about 233 picoseconds ($2^{-32}$ s).
- Many NTP clients for PCs get time from only a single server (no averaging). Such a client is called SNTP (Simple Network Time Protocol, RFC 2030), a simple version of NTP.
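A minimal sketch of decoding the 64-bit NTP timestamp, which holds whole seconds since Jan 1, 1900 in the upper 32 bits and a binary fraction of a second in the lower 32 bits. The function name `ntp_to_unix` is illustrative; the constant is the number of seconds between the 1900 and 1970 epochs.

```python
NTP_TO_UNIX_SECONDS = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(timestamp64):
    """Convert a 64-bit NTP timestamp (as an int) to Unix time in seconds."""
    seconds = timestamp64 >> 32            # upper 32 bits: whole seconds since 1900
    fraction = timestamp64 & 0xFFFFFFFF    # lower 32 bits: units of 2**-32 s
    return seconds - NTP_TO_UNIX_SECONDS + fraction / 2**32


# Half a second past the Unix epoch, expressed as an NTP timestamp.
sample = (NTP_TO_UNIX_SECONDS << 32) | (1 << 31)
print(ntp_to_unix(sample))
print(2**-32)  # resolution of the fractional field, in seconds
```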
27NTP synchronization subnet
- 1st stratum: machines connected directly to an accurate time source
- 2nd stratum: machines synchronized from 1st stratum machines
28NTP goals
- Enable clients across the Internet to be accurately synchronized to UTC despite message delays
- Use statistical techniques to filter data and gauge quality of results
- Provide reliable service
- Survive lengthy losses of connectivity
- Redundant paths
- Redundant servers
- Enable clients to synchronize frequently
- Offset the effects of clock drift
- Provide protection against interference
- Authenticate source of data
29NTP Synchronization Modes
- Multicast (for quick LANs, low accuracy)
- Server sends its actual time to its leaves in the LAN
- Remote Procedure Call (medium accuracy)
- Server responds to requests with its actual timestamp
- Like Cristian's algorithm
- Symmetric mode (high accuracy)
- Used to synchronize between the time servers
- All messages delivered unreliably with UDP
30Symmetric mode
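- The delay between the arrival of a request (at server B) and the dispatch of the reply is NOT negligible
- A sends the request at $T_{i-3}$; B receives it at $T_{i-2}$ and replies at $T_{i-1}$; A receives the reply at $T_i$
- Delay = total transmission time of the two messages
- $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$
- Offset of clock A relative to clock B (the correction to apply to A)
- $o_i = \frac{(T_{i-2} - T_{i-3}) + (T_{i-1} - T_i)}{2}$
- Set clock A: $T_i + o_i$
- Accuracy bound: $d_i / 2$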
31Symmetric mode (another expression)
- Delay = total transmission time of the two messages
- $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$
- Clock A should set its time to
- $T_{i-1} + d_i/2$, which is the same as $T_i + o_i$
32Symmetric NTP example
- Offset $o_i = ((800 - 1100) + (850 - 1200))/2 = -325$
- Set clock A to $T_i + o_i = 1200 + (-325) = 875$
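The example can be checked with a short sketch. It assumes the four timestamps are, in order, A's send time (1100), B's receive time (800), B's reply time (850), and A's receive time (1200). The function and argument names are illustrative.

```python
def ntp_offset_and_delay(t_i3, t_i2, t_i1, t_i):
    """t_i3: A sends, t_i2: B receives, t_i1: B replies, t_i: A receives."""
    delay = (t_i - t_i3) - (t_i1 - t_i2)
    offset = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2
    return offset, delay


offset, delay = ntp_offset_and_delay(1100, 800, 850, 1200)
new_clock_a = 1200 + offset
# The alternative expression T_{i-1} + d_i/2 gives the same setting.
assert new_clock_a == 850 + delay / 2
print(offset, delay, new_clock_a)
```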
33Improving accuracy
- Data filtering from a single source
- Retain several of the most recent pairs $\langle o_i, d_i \rangle$
- Filter dispersion: use the offset $o_j$ corresponding to the smallest delay $d_j$
- Peer selection: synchronize with lower-stratum servers
- Lower stratum numbers, lower synchronization dispersion
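A minimal sketch of the single-source filtering step: keep a window of recent (offset, delay) pairs and trust the offset measured with the smallest delay. The default window size of 8 and the sample values are assumptions made for illustration.

```python
def filtered_offset(samples, keep=8):
    """samples: list of (offset, delay) pairs, oldest first."""
    recent = samples[-keep:]                 # retain only the most recent pairs
    offset, delay = min(recent, key=lambda pair: pair[1])
    return offset, delay / 2                 # offset estimate and its accuracy bound


samples = [(-310, 90), (-325, 50), (-340, 120), (-300, 200)]
print(filtered_offset(samples))
```

The smallest delay is preferred because the accuracy bound of an offset is half of its measured delay.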
34Logical Clocks
35Motivation of logical clocks
- Physical clocks cannot be synchronized perfectly in distributed systems (Lamport, 1978)
- Main function of computer clocks: ordering events
- If two processes don't interact, there is no need to sync their clocks
- This observation leads to causality
36Causality
- Order events with the happened-before ($\rightarrow$) relation
- $a \rightarrow b$
- a could have affected the outcome of b
- a, b take place on different processes that don't exchange data
- Their relative ordering does not matter
- They are concurrent: $a \parallel b$
37Formal definition of happened-before
- If a and b take place in the same process
- and a comes before b, then $a \rightarrow b$
- If a and b take place in different processes
- and a is a send and b is the corresponding receive, then $a \rightarrow b$
- Transitive: if $a \rightarrow b$ and $b \rightarrow c$, then $a \rightarrow c$
- Partial ordering: unordered events are concurrent
38Logical clocks
- A logical clock is a monotonically increasing software counter. It need not relate to a physical clock.
- Corrections to a clock must be made by adding, not subtracting
- Assign a time value to each event
- If $a \rightarrow b$ then $clock(a) < clock(b)$
39Event counting example
- Three processes P0, P1, P2; events a, b, c, ...
- Local event counter in each process
- Processes occasionally communicate with each other, which is where inconsistency occurs
- Bad ordering: $e \rightarrow h$, $f \rightarrow k$
40Lamport's algorithm, 1978
- Each process $P_i$ has a logical clock $L_i$ which is used to apply logical timestamps to events
- $L_i$ is initialized to 0
- Update $L_i$
- LC1: $L_i$ is incremented by 1 before each event at process $P_i$
- LC2: when process $P_i$ sends message m, it piggybacks $t = L_i$ on m
- LC3: when $P_j$ receives (m, t) it sets $L_j := \max(L_j, t)$, and then applies LC1 to increment $L_j$ for event receive(m)
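Rules LC1 to LC3 can be sketched as a small class. The class and method names are illustrative, and the sketch assumes the send itself counts as an event, so LC1 is applied before the timestamp is piggybacked.

```python
class LamportClock:
    def __init__(self):
        self.time = 0                      # L_i is initialized to 0

    def tick(self):
        """LC1: increment before each event; returns the event's timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """LC2: sending is an event; the returned value is piggybacked on m."""
        return self.tick()

    def receive(self, t):
        """LC3: take the max with the piggybacked t, then count the receive."""
        self.time = max(self.time, t)
        return self.tick()


p, q = LamportClock(), LamportClock()
p.tick()                 # local event at p
t = p.send()             # p sends m carrying t
stamp = q.receive(t)     # q's receive is ordered after the send
print(t, stamp)
```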
41Problem: Identical timestamps
- Concurrent events (e.g., a, g) may have the same timestamp
42Unique timestamps (total ordering)
- Append the process ID (or system ID) to the clock value after the decimal point
- e.g., if P1, P2 both have $L_1 = L_2 = 40$, make $L_1 = 40.1$, $L_2 = 40.2$
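A minimal sketch of the same tie-breaking idea, using a (clock, process ID) pair instead of a decimal suffix. This is a swapped-in representation, not the slide's encoding: as numbers, 40.1 and 40.10 would be equal, whereas pairs stay distinct for any process ID. The names and sample events are illustrative.

```python
def total_order_key(clock, pid):
    """Compare by Lamport clock first, then break ties by process ID."""
    return (clock, pid)


events = [("x", 40, 2), ("y", 41, 1), ("z", 40, 1)]  # (name, clock, pid)
ordered = sorted(events, key=lambda e: total_order_key(e[1], e[2]))
print([name for name, _, _ in ordered])
```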
43Problem: Detecting causal relations
- If $a \rightarrow b$, then $L(a) < L(b)$; however,
- if $L(a) < L(b)$, we cannot conclude that $a \rightarrow b$
- This limits the usefulness of Lamport timestamps in distributed systems
- Solution: use a vector clock
- Example: $L(g) < L(c)$, but $g \parallel c$
44A Vector of Timestamps
- Suppose there is a group of people and each one needs to keep track of events that happened to the other people.
- Requirement: given two events, you can tell whether they are sequential or concurrent.
- Solution: keep a vector of timestamps, one for each member.
- [Figure: example vector timestamps, e.g., (3,0,0)]
45Vector clocks
- Vector clock $V_i$ at process $P_i$ is an array of N integers
- Initialization: for $1 \le i \le N$ and $1 \le k \le N$, $V_i[k] := 0$
- Update $V_i$
- VC1: before $P_i$ timestamps an event it sets $V_i[i] := V_i[i] + 1$
- VC2: $P_i$ piggybacks $t = V_i$ on every message it sends out
- VC3: when $P_j$ receives (m, t), for $1 \le k \le N$ it sets $V_j[k] := \max(V_j[k], t[k])$, then applies VC1 to increment $V_j[j]$ for event receive(m, t)
- Note: $V_i[j]$ is a timestamp indicating that $P_i$ knows of all events that happened in $P_j$ up to this time
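Rules VC1 to VC3 can be sketched as follows. The class and method names are illustrative, and process indices run from 0 to N-1 here rather than 1 to N.

```python
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                   # V_i[k] := 0 for all k

    def tick(self):
        """VC1: increment own component before timestamping an event."""
        self.v[self.pid] += 1
        return list(self.v)                # copy, so issued timestamps never change

    def send(self):
        """VC2: the returned vector is piggybacked on the outgoing message."""
        return self.tick()

    def receive(self, t):
        """VC3: component-wise max with t, then VC1 for the receive event."""
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, t)]
        return self.tick()


p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
t = p0.send()
print(t, p1.receive(t))
```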
46Vector timestamps example
53Comparing vector timestamps
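- Define
- $V = V'$ iff $V[i] = V'[i]$ for $i = 1, \ldots, N$
- $V \le V'$ iff $V[i] \le V'[i]$ for $i = 1, \ldots, N$
- $V < V'$ iff $V \le V'$ and $V \ne V'$
- $V(e)$: timestamp of an event e
- For any two events $e$ and $e'$,
- $e \rightarrow e'$ iff $V(e) < V(e')$, $e \ne e'$
- $e \parallel e'$ iff neither $V(e) \le V(e')$ nor $V(e') \le V(e)$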
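The comparison rules translate directly into code. This sketch assumes vector timestamps are equal-length lists of integers; the function names are illustrative.

```python
def leq(v, w):
    """V <= W iff every component of V is <= the matching component of W."""
    return all(a <= b for a, b in zip(v, w))


def happened_before(v, w):
    """V < W iff V <= W and V != W."""
    return leq(v, w) and v != w


def concurrent(v, w):
    """Neither V <= W nor W <= V."""
    return not leq(v, w) and not leq(w, v)


print(happened_before([1, 0, 0], [1, 1, 0]))
print(concurrent([2, 0, 0], [1, 1, 0]))
```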
54Vector timestamps example
55Summary on vector timestamps
- No need to synchronize physical clocks
- Able to order causal events
- Able to identify concurrent events (but cannot
order them)
56An Application of Vector Timestamps: causally-ordered multicast
- Multicast: a sender sends a message to a group of receivers. Every message in the system must be received by all group members.
- Causally ordered multicast: if $m_1 \rightarrow m_2$, $m_1$ must be received before $m_2$ by all receivers.
57Causally-Ordered Multicast
- Each group member keeps a timestamp vector of n components (n group members), all initialized to 0.
- When $P_i$ multicasts a message, it increments the i-th component of its time vector $V_i$ and attaches $V_i$ to the msg.
- When $P_j$ (with $V_j$) receives msg(m, $V_i$) from $P_i$, if ($V_j[k] \ge V_i[k]$ for all k, $k \ne i$), then
- $V_j[i] := V_i[i]$; $V_j[j] := V_j[j] + 1$
- deliver msg m
- otherwise, delay the delivery of m until the condition is met.
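A minimal sketch of the delivery rule, read as: deliver a message from $P_i$ once the receiver's vector is at least the attached vector in every component other than i. It assumes messages from the same sender arrive in the order sent and that a member does not receive its own multicasts. The `Member` class, its hold-back queue `pending`, and the message names are illustrative.

```python
class Member:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n
        self.delivered = []
        self.pending = []                  # hold-back queue of delayed messages

    def multicast(self, msg):
        self.v[self.pid] += 1              # increment own component, attach V_i
        return (self.pid, list(self.v), msg)

    def _deliverable(self, sender, vec):
        # V_j[k] >= V_i[k] for all k != i
        return all(self.v[k] >= vec[k] for k in range(len(vec)) if k != sender)

    def receive(self, packet):
        self.pending.append(packet)
        progress = True
        while progress:                    # a delivery may unblock delayed messages
            progress = False
            for held in list(self.pending):
                sender, vec, msg = held
                if self._deliverable(sender, vec):
                    self.pending.remove(held)
                    self.v[sender] = vec[sender]
                    self.v[self.pid] += 1
                    self.delivered.append(msg)
                    progress = True


p0, p1, p2 = (Member(i, 3) for i in range(3))
m1 = p0.multicast("m1")
p1.receive(m1)
m2 = p1.multicast("m2")                    # m1 -> m2
p2.receive(m2)                             # arrives early, so it is delayed
p2.receive(m1)                             # delivers m1, then the held m2
print(p2.delivered, p2.v)
```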
58Causal-Order Preserved
- If $m_1 \rightarrow m_2$, $m_1$ is received by all recipients before $m_2$.
- If $m_1 \parallel m_2$, $m_1$ and $m_2$ can be received in arbitrary order by recipients.
- Total ordering: for the case of $m_1 \parallel m_2$, $m_1$ and $m_2$ must be received in the same order by all recipients (i.e., either all $m_1$ before $m_2$, or all $m_2$ before $m_1$).
[Figure: timeline of vector timestamps at the three group members, each starting from (0,0,0)]