Title: Time and Global States
1Time and Global States
- ECEN5053 Software Engineering of Distributed
Systems
- University of Colorado, Boulder
2Topics
- Clock synchronization
- Logical clocks
- Global State
3How processes can synchronize
- Multiple processes must be able to cooperate in
granting each other temporary exclusive access to
a resource
- Also, multiple processes may need to agree on the
ordering of events, such as whether message m1
from process P was sent before or after message
m2 from process Q.
4Centralized system
- Time is unambiguous
- If a process wants to know the time, it makes a
system call and finds out
- If process A asks for the time and gets it and
then process B asks for the time and gets it, the
time that B was told will be later than the time
that A was told.
- Simple, no?
5Physical Clocks
- Physical computer clocks are not clocks; they are
timers
- Quartz crystal that oscillates at a well-defined
frequency that depends on physical properties
- Two registers: a counter and a holding register
- Each oscillation decrements the counter by one
- When the counter reaches zero, it generates an interrupt
and the counter is reloaded from the holding
register
- Each interrupt is called a clock tick
- Interrupt service procedure adds 1 to time stored
in memory so the software clock is kept up to date
6The one and the many
- What if the clock is off by a little?
- All processes on single machine use the same
clock so they will still be internally consistent
- What matters is relative time
- Impossible to guarantee that crystals in
different computers run at exactly the same
frequency
- Gradually software clocks get out of synch; the
difference is called clock skew
- A program that expects time to be independent of
the machine on which it is run ... fails
7Hey buddy, can you spare me a second?
- To provide UTC (Coordinated Universal Time) to
those who need precise time, NIST operates a
shortwave radio station, WWV, from Fort Collins, CO
- WWV broadcasts a short pulse at the start of each
second
- There are stations in other countries plus
satellites
- Using either shortwave or satellite services
requires an accurate knowledge of the relative
position of the sender and receiver. Why?
8To WWV or not to WWV
- If one computer has a WWV receiver, the goal is
keeping all the others synchronized to it.
- If no machines have WWV receivers, each machine
keeps track of its own time
- Goal: keep all machines together as well as
possible
- There are many algorithms
9Underlying model for synchronization models
- Each machine has a timer that interrupts H times
a second
- Interrupt handler adds 1 to a software clock that
keeps track of the number of ticks since some
agreed-upon time in the past
- Call the value of the clock C
- Notationally, when UTC time is t, the value of
the clock on machine p is Cp(t)
- In a perfect world, Cp(t) = t for all p and all t
10Back to reality
- Theoretically, a timer with H = 60 should generate
216,000 ticks per hour
- Relative error is about 10^-5, meaning a
particular machine gets a value in the range
215,998 to 216,002
- There is a constant called the maximum drift rate;
a timer is working within its specification if it
runs at the perfect rate +/- the maximum drift rate
- If two clocks are drifting in opposite
directions, then at a time delta-t after they were
synchronized
- they may be as much as 2 * (max drift rate) * delta-t apart
- To differ by no more than delta, clocks must be
resynchronized at least every delta / (2 * max drift rate)
seconds
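The drift arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names and the example numbers (drift rate 10^-5, allowed skew 1 ms) are chosen for illustration only.

```python
def max_skew(max_drift_rate: float, elapsed_seconds: float) -> float:
    """Worst-case gap between two clocks drifting in opposite directions."""
    return 2 * max_drift_rate * elapsed_seconds


def resync_interval(allowed_skew: float, max_drift_rate: float) -> float:
    """Longest time (seconds) between resynchronizations that keeps two
    clocks within allowed_skew seconds of each other."""
    if max_drift_rate <= 0:
        raise ValueError("max_drift_rate must be positive")
    return allowed_skew / (2 * max_drift_rate)


# Illustrative numbers: drift rate 1e-5, clocks must agree to within 1 ms.
interval = resync_interval(0.001, 1e-5)   # roughly 50 seconds
```

With these numbers the clocks would need to be resynchronized about every 50 seconds.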
11Cristian's algorithm
- Well suited to one machine with a WWV receiver
and a goal to have all other machines stay
synchronized with it.
- Call the one with the WWV receiver the time
server
- Periodically, each machine sends a message to the
time server asking for the current time
- The time server responds with its current time,
C_UTC, as fast as it can
- 1st approximation: requester sets its clock to
C_UTC
- What's wrong with that?
12Big Trouble
- Major problem
- Time really should never run backward. Why?
- If the sender's clock was fast, C_UTC will be smaller
than the sender's current value of C
- Change must be introduced gradually
- If timer generates 100 interrupts/second, each
interrupt adds 10 ms to the time
- To slow down, ISR adds only 9 ms until correct
- To speed up, add 11 ms at each interrupt
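The gradual correction can be sketched as follows. This is an illustration under stated assumptions, not an actual interrupt service routine: `slew_clock` and its parameters are hypothetical names, and times are integer milliseconds.

```python
def slew_clock(clock_ms: int, offset_ms: int, ticks: int,
               tick_ms: int = 10, slew_ms: int = 1) -> tuple[int, int]:
    """Advance a software clock for `ticks` interrupts while slewing.

    offset_ms is (local clock - reference time): positive means the local
    clock is fast and must be slowed, negative means it must be sped up.
    Returns (new clock value, remaining offset).
    """
    for _ in range(ticks):
        if offset_ms > 0:                    # fast: add 9 ms instead of 10
            step = min(slew_ms, offset_ms)
            clock_ms += tick_ms - step
            offset_ms -= step
        elif offset_ms < 0:                  # slow: add 11 ms instead of 10
            step = min(slew_ms, -offset_ms)
            clock_ms += tick_ms + step
            offset_ms += step
        else:                                # in sync: normal 10 ms tick
            clock_ms += tick_ms
    return clock_ms, offset_ms
```

Because every tick still adds a positive amount, the clock value never decreases, which is the point of introducing the change gradually.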
13Little Trouble
- Minor problem
- Takes a nonzero amount of time for the time
server's reply to get back to the sender
- Delay may be large and vary with network load
- Cristian attempts to measure it: record send time T0
and receive time T1, subtract, divide by 2, and add
this to the received C_UTC
- Better: if the time server's interrupt handling and
incoming message processing time, I, is known, use
(T1 - T0 - I)/2
- To improve accuracy, measure several and average
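A minimal sketch of the estimate described above. The function names are hypothetical; T0 and T1 are assumed to be read from the requester's own clock, and I defaults to 0 when the server's handling time is unknown.

```python
def cristian_estimate(t0: float, t1: float, c_utc: float,
                      handling_time: float = 0.0) -> float:
    """Estimate the current time at the moment the reply arrives.

    t0, t1: send and receive times on the requester's clock.
    c_utc: time reported by the time server.
    handling_time: server-side interrupt/processing time I, if known.
    """
    one_way_delay = (t1 - t0 - handling_time) / 2
    if one_way_delay < 0:
        raise ValueError("handling_time exceeds the measured round trip")
    return c_utc + one_way_delay


def average_delay(samples: list[tuple[float, float]],
                  handling_time: float = 0.0) -> float:
    """Average the one-way delay over several (t0, t1) measurements."""
    delays = [(t1 - t0 - handling_time) / 2 for t0, t1 in samples]
    return sum(delays) / len(delays)
```

For example, a 20 ms round trip with 4 ms of server handling time gives an 8 ms one-way estimate, which is added to C_UTC.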
14If no WWV Receiver
- Berkeley UNIX algorithm
- The time server (actually time daemon) is active,
not passive
- It polls every machine and asks what time it is
- Based on answers, it computes an average time and
tells all machines to adjust their clocks to the
new time
- The time daemon's time is set manually by the
operator periodically
- Centralized algorithm though the time daemon does
not have a WWV receiver
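The averaging step can be sketched like this. It is an illustration only: the `"daemon"` key and the idea of returning per-machine adjustments (rather than the new absolute time) are assumptions made for the example, and times are in minutes.

```python
def berkeley_adjustments(daemon_time: float,
                         reported: dict[str, float]) -> dict[str, float]:
    """Return how much each clock (including the daemon's) should change.

    reported maps a machine name to the time it answered when polled.
    A negative adjustment means the clock is ahead and should be slowed
    gradually rather than set backward.
    """
    clocks = {"daemon": daemon_time, **reported}
    average = sum(clocks.values()) / len(clocks)
    return {name: average - value for name, value in clocks.items()}


# Daemon reads 3:00 (180 min); two machines answer 3:25 and 2:50.
adjust = berkeley_adjustments(180, {"A": 205, "B": 170})
```

Here the average is 3:05, so the daemon moves forward 5 minutes, A slows down by 20 minutes, and B moves forward 15 minutes.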
15Decentralized synchronization
- Cristian and Berkeley UNIX are centralized
algorithms with the usual downside. What?
- There are several decentralized algorithms, for
example
- Divide time into fixed length resynchronization
intervals
- At the beginning of each interval, every machine
broadcasts its current time
- Each starts a local timer to collect all
broadcasts arriving during a certain interval
- Algorithm to compute a new time based on some/all
of the collected broadcasts
16Internet Synchronization
- New hardware and software technology in the past
few years make it possible to keep millions of
clocks synchronized to within a few ms of UTC
- New algorithms using these synchronized clocks
are beginning to appear
- Synchronized clocks can be used
- to achieve cache consistency
- to use time-out tickets in distributed system
authentication
- to handle commitment in atomic transactions
17Logical Clocks
- See also notes from 3 weeks ago
- For many purposes, it is sufficient that machines
agree on the same time even if it is not the
right time
- Internal consistency of the clocks matters
- Clock synchronization is possible but does not
have to be absolute
- If 2 processes do not interact, their clocks need
not be synchronized; the lack of synch would not
be seen
- What is important is that all processes agree on
the order in which events occur
18Lamport timestamps
- a happens-before b means that all processes agree
that first event a occurs, then afterward, event
b occurs
- We write a happens-before b as a --> b
- If a occurs before b in the same process, we say
a --> b is true
- If the event a sends a message and event b
receives that message in another process, a --> b
is also true because a message cannot be
received until after it is sent.
- happens-before is transitive: if a --> b and
b --> c, then a --> c
19Ya caint say
- If x and y happen in different processes that do
not exchange messages, then
- we cannot say x --> y
- we cannot say y --> x
- nothing can be said about when the events
happened or which event happened first
- we call these events concurrent
20Invent time
- Need a way of measuring time so that for every
event a we can assign a time C(a) on which all
processes agree.
- Such that, if a --> b, then C(a) < C(b)
- If a and b are two events in the same process and
a happens before b, then C(a) < C(b)
- If a is the sending of a msg by one process and b
is the receiving of that msg by another, then
C(a) and C(b) must be assigned so that everyone
agrees on the values of C(a) and C(b) with C(a) <
C(b)
- Corrections to C can only be made by addition,
never subtraction, so that the clock time always
goes forward
21If msg leaves at time N, it arrives at >= N+1
- Each message carries the time according to its
sender's clock
- When it arrives, if the receiver's clock shows a
value prior to the time the message was sent, the
receiver fast forwards its clock to be 1 more
than the sending time
- Between every two events the clock must tick at
least once
- If a process sends or receives 2 messages in
quick succession, it must advance its clock by
(at least) 1 tick in between
- No 2 events ever occur at exactly the same time
(ties can be broken by attaching the process number
to the timestamp)
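The clock rules above fit in a small class. This is a minimal sketch; the class and method names are chosen for illustration.

```python
class LamportClock:
    """Logical clock: only ever moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: the clock advances by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Receiving is an event; jump past the sender's time if behind."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
for _ in range(5):
    p.tick()              # p has had five local events
stamp = p.send()          # message leaves p at time 6
arrival = q.receive(stamp)  # q was at 0, so it jumps to 7
```

A receiver whose clock is already ahead of the message simply ticks once, so corrections are always additions.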
22Totally-ordered Multicast
- Consider a bank with replicated data in San
Francisco and New York City.
- Customer in SF wants to add $100 to an account
containing $1000
- Meanwhile, a bank employee in NY initiates an
update by which the customer's account will be
increased with 1% interest.
- Due to communication delays, the instructions
could arrive at the replicated sites in different
orders with differing final answers
- Should have been performed at both sites in same
order
23Using Lamport timestamps to get totally ordered
multicast
- Consider a group of processes multicasting messages
to each other
- Each message is timestamped with the current
(logical) time of its sender
- Conceptually, if multicast, the msg is also sent
to its sender
- We assume msgs from the same sender are received
in the order they were sent and that no messages
were lost
24totally ordered multicast (cont.)
- When a process receives a message, it goes into a
local queue ordered according to its timestamp
- The receiver multicasts an acknowledgement
- Using Lamport's algorithm for adjusting local
clocks, the timestamp of the received msg is
lower than the timestamp of the acknowledgement
- All processes will eventually have the same copy
of the local queue because each msg is multicast,
plus acks
- We assumed msgs are delivered in the order sent
by sender
25 totally ordered multicast (cont. more)
- Each process inserts a received msg in its local
queue according to the timestamp in that msg.
- Lamport's clocks ensure no two messages have the
same timestamp
- Also, the timestamps reflect a consistent global
ordering of events
- A process delivers a queued msg to the
application it is running when that message is at
the head of the queue and has been acknowledged
by each other process
- The msg is then removed from the queue and its
associated acks are removed.
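The queue-and-acknowledge scheme can be simulated in one Python process. This is a sketch under stated assumptions: reliable FIFO channels are modeled as deques, timestamps are (clock, process id) pairs to break ties, and a message is delivered once every process, the sender included, has acknowledged it. All names are illustrative.

```python
import heapq
import random
from collections import defaultdict, deque


class Process:
    def __init__(self, pid, n, network):
        self.pid, self.n, self.network = pid, n, network
        self.clock = 0
        self.queue = []                 # heap of (timestamp, payload)
        self.acks = defaultdict(set)    # timestamp -> pids that acked it
        self.delivered = []             # payloads handed to the application

    def _multicast(self, kind, msg_ts, payload=None):
        self.clock += 1                 # sending is an event
        for dst in range(self.n):       # the sender also sends to itself
            self.network[(self.pid, dst)].append(
                (kind, msg_ts, payload, self.clock))

    def post(self, payload):
        # A message is identified by the (clock, pid) pair it is sent with.
        self._multicast("MSG", (self.clock + 1, self.pid), payload)

    def receive(self, src, kind, msg_ts, payload, sent_at):
        self.clock = max(self.clock, sent_at) + 1   # Lamport adjustment
        if kind == "MSG":
            heapq.heappush(self.queue, (msg_ts, payload))
            self._multicast("ACK", msg_ts)
        else:
            self.acks[msg_ts].add(src)
        # Deliver while the head of the queue is acknowledged by everyone.
        while self.queue and len(self.acks[self.queue[0][0]]) == self.n:
            self.delivered.append(heapq.heappop(self.queue)[1])


def run(n, posts, seed):
    """Post all messages, then deliver network traffic in a random order
    that still respects FIFO order on each (sender, receiver) channel."""
    rng = random.Random(seed)
    network = {(s, d): deque() for s in range(n) for d in range(n)}
    procs = [Process(p, n, network) for p in range(n)]
    for pid, payload in posts:
        procs[pid].post(payload)
    while True:
        busy = [ch for ch, q in network.items() if q]
        if not busy:
            break
        src, dst = rng.choice(busy)
        procs[dst].receive(src, *network[(src, dst)].popleft())
    return [p.delivered for p in procs]


logs = run(3, [(0, "deposit $100"), (1, "add 1% interest")], seed=1)
```

Whatever interleaving the random seed produces, every replica should end up applying the two updates in the same order.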
26Vector Timestamps
- With Lamport timestamps, nothing can be said
about the relationship between a and b simply by
comparing their timestamps C(a) and C(b).
- Just because C(a) < C(b) doesn't mean a happened
before b (remember concurrent events)
- Consider network news where processes post
articles and react to posted articles
- Postings are multicast to all members
- Want reactions delivered after associated postings
27Will totally-ordered multicasting work?
- That scheme does not mean that if msg B is
delivered after msg A, B is a reaction to msg A.
They may be completely independent.
- What's missing?
- If causal relationships are maintained within a
group of processes, then receipt of a reaction to
an article should always follow the receipt of
the article.
- If two items are independent, their order of
delivery should not matter at all
28Vector Timestamps capture causality
- VT(a) < VT(b) means event a causally precedes
event b.
- Let each process Pi maintain vector Vi such that
- Vi[i] is the number of events that have occurred
so far at Pi
- If Vi[j] = k then Pi knows that k events have
occurred at Pj
- We increment Vi[i] at the occurrence of each new
event that happens at process Pi
- Piggyback vectors with msgs that are sent. When
Pi sends msg m, it sends its current vector along
as a timestamp vt.
29
- Receiver thus knows the number of events that
have occurred at Pi
- Receiver is also told how many events at other
processes have taken place before Pi sent message
m.
- Timestamp vt of m tells the receiver how many
events in other processes have preceded m and on
which m may causally depend
- When Pj receives m, it adjusts its own vector by
setting each entry Vj[k] to max{Vj[k], vt[k]}
- The vector now reflects the number of msgs that Pj
must receive to have at least seen the same msgs
that preceded the sending of m.
- Vj[i] is incremented by 1 representing the event
of receiving msg m as the next message from Pi
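A small sketch of vector timestamps and the comparison they allow. One detail is an assumption: on receipt, this sketch takes the entry-wise maximum and then counts the receive as an event of the receiving process (incrementing the receiver's own entry), which is the formulation most often seen in vector-clock descriptions and differs in detail from the last bullet above. Names are illustrative.

```python
class VectorClock:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def event(self) -> list[int]:
        """Local event at this process: increment our own entry."""
        self.v[self.pid] += 1
        return list(self.v)

    def send(self) -> list[int]:
        """Sending is an event; the returned copy is the timestamp vt."""
        return self.event()

    def receive(self, vt: list[int]) -> list[int]:
        """Merge entry-wise with max, then count the receive event."""
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, vt)]
        return self.event()


def happened_before(a: list[int], b: list[int]) -> bool:
    """VT(a) < VT(b): every entry <= and the vectors differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: list[int], b: list[int]) -> bool:
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p0.send()        # article posted by P0
b = p1.receive(a)    # P1 receives the article
c = p2.event()       # unrelated event at P2
```

Here `a` causally precedes `b`, while `c` is concurrent with both, which is exactly what plain Lamport timestamps cannot tell apart.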
30When are messages delivered?
- Vector timestamps are used to deliver msgs only when
no causality constraints are violated.
- When process Pi posts an article, it multicasts
that article as a msg a with timestamp vt(a) set
equal to Vi.
- When another process Pj receives a, it will have
adjusted its own vector such that Vj[i] >
vt(a)[i]
- Now suppose Pj posts a reaction by multicasting
msg r with timestamp vt(r) equal to Vj. Then
vt(r)[i] > vt(a)[i].
- Both msg a and msg r will arrive at Pk in some
order
31
- When receiving r, Pk inspects timestamp vt(r) and
decides to postpone delivery until all msgs that
causally precede r have been received as well.
- In particular, r is delivered only if the
following conditions are met
- 1. vt(r)[j] = Vk[j] + 1
- 2. vt(r)[i] <= Vk[i] for all i not equal to j
- Condition 1 says r is the next msg Pk was expecting
from Pj
- Condition 2 says Pk has seen every msg that Pj had
seen when it sent r. In particular, Pk has already
seen message a.
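The two delivery conditions translate directly into a check plus a hold-back loop. This is a sketch under an assumption: vector entries here count messages multicast by each process, so on delivery Pk simply sets its entry for the sender to the message's value. Function names and the tuple layout are illustrative.

```python
def can_deliver(vt_r: list[int], sender: int, v_k: list[int]) -> bool:
    """Condition 1: r is the next msg expected from the sender.
    Condition 2: Pk has seen everything the sender had seen."""
    if vt_r[sender] != v_k[sender] + 1:
        return False
    return all(vt_r[i] <= v_k[i] for i in range(len(v_k)) if i != sender)


def deliver_in_causal_order(arrivals, n):
    """arrivals: (sender, timestamp, payload) tuples in arrival order.
    Messages that cannot be delivered yet are held back and retried."""
    v_k = [0] * n
    pending = list(arrivals)
    delivered = []
    progress = True
    while progress:
        progress = False
        for msg in list(pending):
            sender, vt, payload = msg
            if can_deliver(vt, sender, v_k):
                v_k[sender] = vt[sender]
                delivered.append(payload)
                pending.remove(msg)
                progress = True
    return delivered


# The reaction (from P1) arrives at Pk before the article (from P0).
order = deliver_in_causal_order(
    [(1, [1, 1, 0], "reaction"), (0, [1, 0, 0], "article")], n=3)
```

The reaction is held back until the article has been delivered, so `order` lists the article first.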
32Controversy
- There has been some debate about
- whether support for totally-ordered and
causally-ordered multicasting should be provided
as part of the message-communication layer or
- whether applications should handle ordering
- Comm layer doesn't know what a message contains,
only potential causality
- 2 msgs from same sender will always be marked as
causally related even if they are not
- Application developer may not want to think about
it
33Global State
34Global state of a distributed system
- Local state of each process
- The messages that are currently in transit (sent
but not received)
35Who cares, globally speaking?
- When it is known that local computations have
stopped and that there are no more messages in
transit, the system has obviously entered a state
in which no more progress can be made.
- deadlocked?
- correctly terminated?
36How to record the global state
- Distributed snapshot
- reflects a state in which the distributed system
might have been
- reflects a consistent global state
- If we have recorded that process P has received a
msg from another process Q, then we should also
have recorded that process Q had actually sent
the msg
- The reverse condition (Q has sent a msg that P
has not yet received) is allowed.
37Cut!
- A cut represents the last event that has been
recorded for each of several processes.
- In a consistent cut, all recorded msg receipts
have a corresponding recorded send event
- An inconsistent cut would have a receipt of a msg
but no corresponding send event
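The consistency rule for a cut is easy to state as a check. In this sketch the representation is an assumption made for illustration: events at each process are numbered 1, 2, 3, ..., a cut maps each process to the number of its last recorded event, and each message is a (sender, send event, receiver, receive event) tuple.

```python
def is_consistent_cut(cut: dict[str, int],
                      messages: list[tuple[str, int, str, int]]) -> bool:
    """A cut is consistent if every recorded receipt has a recorded send."""
    for sender, send_idx, receiver, recv_idx in messages:
        receipt_recorded = recv_idx <= cut[receiver]
        send_recorded = send_idx <= cut[sender]
        if receipt_recorded and not send_recorded:
            return False
    return True


# Q sends a msg as its 2nd event; P receives it as its 3rd event.
msgs = [("Q", 2, "P", 3)]
ok = is_consistent_cut({"P": 3, "Q": 2}, msgs)          # send and receipt recorded
in_transit = is_consistent_cut({"P": 2, "Q": 2}, msgs)  # sent, not yet received
bad = is_consistent_cut({"P": 3, "Q": 1}, msgs)         # receipt without send
```

The first two cuts are consistent (the second records the message as in transit); the third is not.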
38The algorithm (Chandy-Lamport)
- Assume the distributed system can be represented
as a collection of processes connected to each
other through uni-directional point-to-point
communication channels.
- Any process P may initiate the algorithm.
- P records its own local state
- It sends a marker along each of its outgoing
channels, indicating that the receiver should
participate in recording the global state
- ...
39Chandy-Lamport algorithm, cont
- When process Q receives the marker through an
incoming channel C, its action depends on whether
or not it has already saved its local state
- If it has not
- it first records its local state and also sends a
marker along its own outgoing channels
- If it has
- the marker on channel C is an indicator that Q
should record the state of the channel, namely,
the sequence of messages received by Q since the
last time it recorded its own local state and
before it received the marker.
40Chandy-Lamport algorithm, cont
- A process has finished its part of the algorithm
when it has received a marker along each of its
incoming channels and processed each one.
- Its recorded local state, as well as the state it
recorded for each incoming channel, can be
collected and sent to the process that initiated
the snapshot
- The initiator can subsequently analyze the
current state
- Meanwhile, the distributed system as a whole can
continue to run normally
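The marker rules can be exercised with a small single-process simulation. This is a sketch under stated assumptions: three processes pass integer "transfers" to each other over FIFO channels modeled as deques, the local state is just a balance, and P initiates one snapshot while traffic continues. All names are illustrative.

```python
import random
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, name, balance, peers, channels):
        self.name, self.balance, self.peers = name, balance, peers
        self.channels = channels      # (src, dst) -> deque, FIFO per direction
        self.recorded_state = None    # local state saved for the snapshot
        self.channel_state = {}       # incoming peer -> msgs recorded in transit
        self.recording = set()        # incoming channels still awaiting a marker

    def send(self, dst, msg):
        self.channels[(self.name, dst)].append(msg)

    def record_state(self):
        self.recorded_state = self.balance
        self.channel_state = {p: [] for p in self.peers}
        self.recording = set(self.peers)
        for p in self.peers:          # marker on every outgoing channel
            self.send(p, MARKER)

    def receive(self, src, msg):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record_state()   # first marker: save state, send markers
            self.recording.discard(src)   # channel src -> self is now recorded
        else:
            self.balance += msg       # ordinary application message
            if src in self.recording:
                self.channel_state[src].append(msg)

    def finished(self):
        return self.recorded_state is not None and not self.recording


def deliver_one(nodes, channels, rng):
    busy = [ch for ch, q in channels.items() if q]
    if busy:
        src, dst = rng.choice(busy)
        nodes[dst].receive(src, channels[(src, dst)].popleft())
    return bool(busy)


def simulate(seed, names=("P", "Q", "R"), steps=200):
    rng = random.Random(seed)
    channels = {(s, d): deque() for s in names for d in names if s != d}
    nodes = {n: Node(n, 100, [p for p in names if p != n], channels)
             for n in names}
    for step in range(steps):
        if step == 20:
            nodes["P"].record_state()         # P initiates the snapshot
        sender = nodes[rng.choice(names)]     # the application keeps running
        amount = rng.randint(1, 10)
        if sender.balance >= amount:
            sender.balance -= amount
            sender.send(rng.choice(sender.peers), amount)
        deliver_one(nodes, channels, rng)
    while deliver_one(nodes, channels, rng):  # drain remaining traffic
        pass
    return nodes


def snapshot_total(nodes):
    """Recorded local states plus every message recorded as in transit."""
    return sum(n.recorded_state + sum(map(sum, n.channel_state.values()))
               for n in nodes.values())
```

Since transfers only move money around, a consistent snapshot should always account for the full 300 units, split between recorded balances and recorded in-transit messages.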
41Photo album
- Because any process can initiate the algorithm,
the construction of several snapshots may be in
progress at the same time
- A marker is tagged with the identifier and
possibly also a version number of the process
that initiated the snapshot
- Only after a process has received that marker
through each of its incoming channels, can it
finish its part in the construction of the
marker's associated snapshot
42Application of a snapshot
43Termination Detection
- If a process Q receives the marker requesting a
snapshot for the first time,
- it considers the process that sent that marker as
its predecessor
- When Q completes its part of the snapshot, it
sends its predecessor a DONE msg.
- By recursion, when the initiator of the
distributed snapshot has received a DONE msg from
all of its successors, it knows the snapshot has
been completely taken
44What if msgs still in transit?
- A snapshot may show a global state in which msgs
are still in transit
- Suppose a process records that it had received msgs
along one of its incoming channels
- between the point where it had recorded its local
state
- and the point where it received the marker
through that channel
- Cannot conclude the distributed computation is
completed
- Termination requires a snapshot in which all
channels are empty
45Modified algorithm
- When a process Q finishes its part of a snapshot,
it either returns DONE or CONTINUE to its
predecessor
- A DONE msg is returned only when
- All of Q's successors have returned a DONE msg
- Q has not received any msg between the point it
recorded its own local state and the point it had
received the marker along each of its incoming
channels
- In all other cases, Q sends a CONTINUE msg to its
predecessor
46Modified algorithm, continued
- The original initiator of the snapshot will
either receive at least one CONTINUE or only DONE
msgs from its successors
- When only DONE messages are received, it is known
that no regular msgs are in transit
- Conclusion? The computation has terminated.
- If a CONTINUE appears, the initiator P starts another
snapshot and continues to do so until only DONE
msgs are returned.
- (There are lots of other algorithms, too.)
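The DONE/CONTINUE rule and the initiator's retry loop can be sketched as two small functions. Everything here is illustrative: `take_snapshot` is a hypothetical callable standing in for one full snapshot round that returns the replies the initiator received, and `max_rounds` is a safety limit added for the example.

```python
def reply_to_predecessor(successor_replies: list[str],
                         msgs_recorded_on_channels: int) -> str:
    """What Q reports once its part of the snapshot is complete."""
    all_done = all(reply == "DONE" for reply in successor_replies)
    if all_done and msgs_recorded_on_channels == 0:
        return "DONE"
    return "CONTINUE"


def detect_termination(take_snapshot, max_rounds: int = 100) -> int:
    """Repeat snapshots until only DONE replies come back.
    Returns the number of snapshot rounds that were needed."""
    for round_no in range(1, max_rounds + 1):
        replies = take_snapshot()
        if all(reply == "DONE" for reply in replies):
            return round_no
    raise RuntimeError("no terminated state observed within max_rounds")


# Canned replies standing in for three successive snapshot rounds.
rounds = iter([["DONE", "CONTINUE"], ["CONTINUE"], ["DONE", "DONE"]])
needed = detect_termination(lambda: next(rounds))
```

A process with no successors (an empty reply list) and empty recorded channels reports DONE, which is what lets the recursion bottom out.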