Title: Lecture 12 Synchronization
2. Summary so far
- A distributed system is
  - a collection of independent computers that appears to its users as a single coherent system
- Components need to
  - Communicate
  - Cooperate => support needed
    - Naming: enables some resource sharing
    - Synchronization
3. Synchronization to support coordination
- Examples
  - Distributed make
  - Printer sharing
  - Monitoring of a real-world system
  - Agreement on message ordering
- Why is this more complex than in a single-box system?
  - No global views, multiple clocks, failures
4. Roadmap
- Physical clocks
  - Provide actual / real time
- Logical clocks
  - Where only ordering of events matters
- Leader election
  - How do I choose a coordinator?
5. Physical clocks (I)
- Problem: How to achieve agreement on time in a distributed system?
- A possible solution: use Coordinated Universal Time (UTC)
  - Based on the number of transitions per second of the cesium-133 atom (pretty accurate).
  - At present, the real time is taken as the average of some 50 cesium clocks around the world.
  - A leap second is introduced from time to time to compensate for days getting longer.
- UTC is broadcast through short-wave radio and satellite.
  - Accuracy: ±1 ms (but ±10 ms once weather conditions are taken into account)
6. Physical clocks: underlying model
- Problem: Suppose we have a distributed system with a UTC receiver somewhere in it
  - => we still have to distribute its time to each machine.
- Each machine has a timer
  - Timer causes an interrupt $H$ times a second
  - Interrupt handler adds 1 to a software clock
  - Software clock keeps track of the number of ticks since an agreed-upon time in the past.
- Notation
  - Value of clock on machine $p$ at time $t$ is $C_p(t)$
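
The timer model above can be sketched in a few lines. This is only an illustration: the class name `SoftwareClock` and its methods are hypothetical, and a real kernel would drive the counter from a hardware interrupt rather than a loop.

```python
class SoftwareClock:
    """Software clock driven by a timer that fires H times per second."""

    def __init__(self, ticks_per_second):
        self.h = ticks_per_second  # H: interrupts per second
        self.ticks = 0             # ticks since the agreed-upon start time

    def on_timer_interrupt(self):
        # The interrupt handler adds 1 to the software clock.
        self.ticks += 1

    def read(self):
        # Clock value in seconds since the agreed-upon start time.
        return self.ticks / self.h


clock = SoftwareClock(ticks_per_second=100)
for _ in range(250):               # simulate 250 timer interrupts
    clock.on_timer_interrupt()
print(clock.read())                # 250 ticks at H = 100 is 2.5 seconds
```

If the timer fires slightly more or less often than H times per real second, the value returned by `read()` moves away from real time.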
7. Physical clocks: main problem, clock drift
- Ideally: $C_p(t) = t$ and $dC_p(t)/dt = 1$
- Clock value ($C$) guaranteed to satisfy
  - $1 - \rho \le dC/dt \le 1 + \rho$
  - $\rho$: maximum drift rate
- Goal: Never let two clocks in any system differ by more than $x$ time units
  - => synchronize at least every $x/(2\rho)$ seconds.
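
A small worked example of the bound above, as a sketch. The function names are hypothetical and the numbers (a maximum drift rate of 10^-5 and a tolerated skew of 1 ms) are illustrative assumptions, not values from the lecture. Two clocks drifting in opposite directions move apart at a rate of at most twice the maximum drift rate, which is where the factor 2 comes from.

```python
def resync_interval(max_skew, max_drift_rate):
    """Longest time between synchronizations that keeps two clocks within max_skew."""
    return max_skew / (2 * max_drift_rate)


def worst_case_skew(elapsed, max_drift_rate):
    """Skew after `elapsed` seconds if one clock runs fast and the other slow."""
    fast = (1 + max_drift_rate) * elapsed
    slow = (1 - max_drift_rate) * elapsed
    return fast - slow


rho = 1e-5   # assumed maximum drift rate
x = 1e-3     # assumed tolerated difference: 1 ms

interval = resync_interval(x, rho)
print(interval)                        # about 50 seconds
print(worst_case_skew(interval, rho))  # about 0.001, i.e. exactly the budget x
```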
8. Building a complete system
- Option I: Every machine asks a time server for the accurate time at least once every $x/(2\rho)$ seconds (Network Time Protocol).
  - Okay, but you need an accurate measure of the round-trip delay, including interrupt handling and processing of incoming messages.
- Option II: Let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.
  - Note: you don't even need to propagate UTC time.
- Fundamental: You'll have to take into account that setting the time back is never allowed => smooth adjustments.
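
The three ideas on this slide can be sketched as follows. All function names and numbers are hypothetical. The first function assumes the network delay is the same in both directions (the simplification behind what is usually called Cristian's algorithm); the second follows the averaging idea usually known as the Berkeley algorithm; the third shows one possible way to spread a correction over several ticks so that the clock never jumps backwards.

```python
def estimate_server_time(t_sent, t_received, server_time, server_processing=0.0):
    """Option I: add the estimated one-way delay to the time the server reported.

    t_sent and t_received are read from the local clock; the network delay is
    assumed to be symmetric.
    """
    round_trip = t_received - t_sent
    one_way = (round_trip - server_processing) / 2
    return server_time + one_way


def averaging_adjustments(clock_readings):
    """Option II: average all readings and return each machine's relative adjustment."""
    average = sum(clock_readings.values()) / len(clock_readings)
    return {name: average - value for name, value in clock_readings.items()}


def slewed_tick(adjustment_remaining, nominal_tick, max_slew_fraction=0.1):
    """Smooth adjustment: stretch or shrink one tick by a bounded amount.

    Returns (length of this tick, adjustment still to apply). Because the
    correction is bounded by a fraction of the tick, time always moves forward.
    """
    limit = nominal_tick * max_slew_fraction
    correction = max(-limit, min(limit, adjustment_remaining))
    return nominal_tick + correction, adjustment_remaining - correction


# Option I: 20 ms round trip, 4 ms spent in the server, server reports 50.000 s.
print(estimate_server_time(10.000, 10.020, 50.000, server_processing=0.004))

# Option II: readings in minutes past the hour; the average is 185.
print(averaging_adjustments({"server": 180.0, "a": 205.0, "b": 170.0}))

# A clock that is 5 ms ahead slows down instead of being set back.
print(slewed_tick(-0.005, nominal_tick=0.010))
```

In the last call each 10 ms tick advances the clock by only 9 ms until the 5 ms surplus has been absorbed.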
9. Real world: Network Time Protocol (NTP)
- Stratum 1 NTP servers receive time from external (stratum 0) sources (cesium clocks, GPS, radio broadcasts)
- Stratum N+1 servers synchronize with stratum N servers and between themselves
- Self-configuring network
- User machine contacts local NTP server
- Survey (N. Minar, 1999)
  - > 175K NTP servers
  - 90% of the NTP servers have < 100 ms offset from their synchronization peer
  - 99% are synchronized within 1 s
10. Uses of (synchronized) physical clocks
- NTP
- Using physical clocks to implement at-most-once semantics
- Global Positioning Systems
12. Efficiently providing at-most-once message delivery
- Issues
  - 1. How long to maintain transaction data?
  - 2. How to deal with server failures? (Minimize state that is persistently stored)
- Solution
  - Client
    - Sends transaction id and physical timestamp
    - Client (or network) may resend messages
  - Server discards messages with duplicate id.
    - Maintains $G = T_{\text{current}} - \text{MaxLifeTime} - \text{MaxClockSkew}$
    - Discards messages with timestamps older than $G$
    - Ignores (or delays) messages that arrive in the future
13. Efficiently providing at-most-once message delivery
- Issue 1: How long to maintain transaction data?
- Issue 2: What to persistently store across server failures?
- Solution
  - Client sends transaction id and physical timestamp
  - Server discards messages with duplicate id.
    - Keeps $G = T_{\text{current}} - \text{MaxLifeTime} - \text{MaxClockSkew}$
    - Discards messages with timestamps older than $G$
    - Ignores (or delays) messages that arrive in the future
- Crash management
  - Current time (CT) is written to disk every $\Delta T$
  - $G_{\text{failure}}$ is recomputed after a crash from the saved CT
  - After recovery, discard messages with timestamps older than $G_{\text{failure}} + \Delta T$
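
A minimal sketch of the server side of this scheme. It makes several assumptions: the class and method names are hypothetical, the "disk" is just an attribute that survives the simulated crash, and $G_{\text{failure}}$ is taken to be the saved CT itself, which is one reading of the slide. Times are plain numbers in seconds.

```python
class AtMostOnceServer:
    def __init__(self, max_lifetime, max_clock_skew, checkpoint_interval):
        self.max_lifetime = max_lifetime
        self.max_clock_skew = max_clock_skew
        self.checkpoint_interval = checkpoint_interval  # Delta T
        self.seen = {}                     # transaction id -> timestamp (volatile)
        self.saved_ct = None               # stands in for the CT written to disk
        self.post_crash_cutoff = float("-inf")

    def checkpoint(self, now):
        """Called every Delta T: persist the current time."""
        self.saved_ct = now

    def recover(self):
        """Simulate crash and reboot: the id table is lost, the cutoff is rebuilt."""
        self.seen = {}
        g_failure = self.saved_ct          # assumption: G_failure is the saved CT
        self.post_crash_cutoff = g_failure + self.checkpoint_interval

    def handle(self, tx_id, timestamp, now):
        g = max(now - self.max_lifetime - self.max_clock_skew, self.post_crash_cutoff)
        # Transaction data older than G no longer needs to be kept.
        self.seen = {i: ts for i, ts in self.seen.items() if ts >= g}
        if timestamp < g:
            return "discard: too old"
        if timestamp > now:
            return "ignore: from the future"
        if tx_id in self.seen:
            return "discard: duplicate"
        self.seen[tx_id] = timestamp
        return "accept"


server = AtMostOnceServer(max_lifetime=10, max_clock_skew=2, checkpoint_interval=5)
server.checkpoint(now=100)
print(server.handle("a", 101, now=102))   # accept
print(server.handle("a", 101, now=103))   # discard: duplicate
server.recover()                          # crash some time before t = 105
print(server.handle("a", 101, now=106))   # discard: too old
print(server.handle("d", 106, now=106))   # accept
```

After the crash the table of seen ids is gone, yet the resent message "a" is still rejected, because everything that could have been accepted before the crash carries a timestamp below the saved CT plus $\Delta T$. The price is that some fresh messages may be rejected as well.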
15. GPS: Global Positioning Systems (1)
- Basic idea: Estimate signal propagation time between satellite and receiver to estimate distance
- Principle [figure]
- Problem: Even assuming that the clocks of the satellites are accurate and synchronized
  - The receiver's clock is definitely out of sync with the satellites
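16. GPS: Global Positioning Systems (2)
- $x_r, y_r, z_r$ are the unknown coordinates of the receiver.
- $T_i$ is the timestamp on a message from satellite $i$
- $\Delta_i = (T_{\text{now}} - T_i)$ is the measured delay of the message sent by satellite $i$.
- Distance to satellite $i$ can be estimated in two ways
  - Propagation time: $d_i = c \times \Delta_i$
  - Real distance: $d_i = \sqrt{(x_i - x_r)^2 + (y_i - y_r)^2 + (z_i - z_r)^2}$, where $(x_i, y_i, z_i)$ is the known position of satellite $i$
- 3 satellites => 3 equations in 3 unknowns
- So far I assumed the receiver clock is synchronized.
  - What if it needs to be adjusted?
  - $\Delta_i = (T_{\text{now}} - T_i) + \Delta_r$, where $\Delta_r$ is the unknown deviation of the receiver clock
  - Collect one more measurement from one more satellite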
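
With four satellites the four unknowns (three receiver coordinates and the receiver clock deviation) can be found numerically. The sketch below uses Newton's method, which is one common way to solve such a system and is not taken from the lecture. The satellite positions, the receiver position and the clock deviation are made-up test values in kilometres and seconds, and every function name is hypothetical.

```python
import math

C = 299_792.458  # speed of light in km/s


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def locate(sats, delays, iterations=20):
    """Return (x, y, z, clock deviation in s) from 4 satellites and measured delays."""
    est = [0.0, 0.0, 0.0, 0.0]  # x, y, z in km, and b = C * clock deviation in km
    for _ in range(iterations):
        residuals, jacobian = [], []
        for sat, delay in zip(sats, delays):
            dist = distance(est[:3], sat)
            # Equation i: real distance + C * deviation = C * measured delay
            residuals.append(dist + est[3] - C * delay)
            jacobian.append([(est[k] - sat[k]) / dist for k in range(3)] + [1.0])
        step = solve_linear(jacobian, [-r for r in residuals])
        est = [e + s for e, s in zip(est, step)]
        if max(abs(s) for s in step) < 1e-9:
            break
    return est[0], est[1], est[2], est[3] / C


# Synthetic test data: generate the delays a receiver at true_pos would measure.
sats = [(0.0, 0.0, 26570.0), (20000.0, 0.0, 17490.0),
        (-10000.0, 17320.0, 17490.0), (-10000.0, -17320.0, 17490.0)]
true_pos, true_deviation = (1000.0, 2000.0, 6000.0), 1e-4
delays = [distance(true_pos, s) / C + true_deviation for s in sats]

x, y, z, deviation = locate(sats, delays)
print(x, y, z, deviation)
```

The clock deviation is carried internally as a distance (multiplied by the speed of light) so that all four unknowns have comparable magnitudes. The starting guess is the centre of the Earth, which is an assumption that works for this geometry but is not guaranteed to converge for every satellite configuration.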
17. Summary so far
- Synchronization solutions
  - Physical time synchronization
    - Often costly, imperfect
    - But with real applications