Title: Logical Time and Global States
1Logical Time and Global States
- Logical Clocks and Logical Time
- Lamport's Logical Clock and Vector Clock
- Global States and Termination
- Chandy and Lamport's Snapshot Algorithm
2Logical Time and Logical Clock
- Logical time vs. absolute time (from UTC)
- Obtaining the absolute time requires frequent clock synchronization. Frequent clock synchronization incurs heavy overhead, i.e., a short synchronization period => many synchronization messages
- For many systems, it is sufficient that a group of related machines in the system agree on the same logical time, e.g., for event ordering
- In many applications (e.g., non-real-time ones), it is not essential that the logical time agrees with the absolute time as announced by the broadcast stations
- Also, if two processes are unrelated, the times at which their events occur are unrelated
- Event ordering
- If two events occurred at the same process pi, they occurred in the order in which pi observes them, i.e., e1 then e2
- If a message is sent between two processes, the event of sending the message occurs before the receipt of the message, i.e., send before receive
- Partial ordering => NOT total ordering; also called causal ordering
3Logical Time and Logical Clock
- Lamport causal ordering: x -P-> y means x happened before y in process P
- Happened-before (HB): -> (precedence relationship)
- HB1: If for process P, x -P-> y, then x -> y
- HB2: For the same message m, send(m) -> receive(m)
- HB3: If x, y, and z are events with x -> y and y -> z, then x -> z (transitivity)
- E.g., a -> c and b -> d
- How about the event order between a and e? Nothing can be said
- They can be done in any order or even in parallel: a || e (i.e., partial order)
[Figure: space-time diagram of processes p1, p2, p3 with events a, b, c, d, e, f and messages m1, m2]
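Rules HB1 to HB3 can be checked mechanically. The following is a minimal Python sketch, not part of the original slides. It assumes a layout that the text does not spell out: p1 executes a then b, p2 executes c then d, p3 executes e then f, m1 is sent at b and received at c, and m2 is sent at d and received at f.

```python
from itertools import product

# Assumed layout (the arrows of the figure are not recoverable from the text):
# p1: a, b   p2: c, d   p3: e, f   m1: b -> c   m2: d -> f
process_events = {"p1": ["a", "b"], "p2": ["c", "d"], "p3": ["e", "f"]}
messages = [("b", "c"), ("d", "f")]  # (send event, receive event)


def happened_before(process_events, messages):
    """Return the set of pairs (x, y) such that x -> y."""
    hb = set()
    # HB1: consecutive events of the same process are ordered.
    for events in process_events.values():
        for i in range(len(events) - 1):
            hb.add((events[i], events[i + 1]))
    # HB2: send(m) -> receive(m).
    hb.update(messages)
    # HB3: transitive closure, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(hb), repeat=2):
            if y == y2 and (x, z) not in hb:
                hb.add((x, z))
                changed = True
    return hb


hb = happened_before(process_events, messages)


def concurrent(x, y):
    """True when neither x -> y nor y -> x holds."""
    return (x, y) not in hb and (y, x) not in hb
```

Under the assumed layout, `("a", "c") in hb` and `("b", "d") in hb` hold, while `concurrent("a", "e")` is true: nothing can be said about the order of a and e.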
4Logical Time and Logical Clock
- Logical clocks are used to capture the happened-before ordering
- If e is defined to have happened before e', i.e., e -> e', then L(e) < L(e')
- Here L(e) is the logical clock value at the time event e occurs
- A logical clock need not bear any particular relationship to a physical clock, but it needs to be monotonic (i.e., increasing)
- Each process/computing unit just keeps its own (local) clock Cp to timestamp events
- The timestamp is monotonic and the initial value may be zero (or any number/integer)
- What is the maximum bound of a logical clock integer?
- Does it need to be reset after reaching the maximum value?
- Notation: use Lp(a) and Lp(b) to timestamp events a and b that happen in process p, and L(b) for event b at whatever process it occurred
- If a happens before b in the same process, Lp(a) < Lp(b)
- If a and b represent the sending and receiving of a message, respectively, L(a) < L(b)
- For all distinct events a and b, L(a) != L(b)
5Logical Time and Logical Clock
- Logical clock update and transmission rules
- Logical clock rule 1
- Lp is incremented before each event is issued at p: Lp := Lp + 1
- Logical clock rule 2
- When p sends a message m, it piggybacks on m the value t = Lp
- On receiving (m, t), a process q computes Lq := max(Lq, t) and then applies rule 1 before timestamping the event receive(m)
- Why increment the value by 1 instead of a larger number or even a negative number?
- All clocks run at the same rate; each tick increases the value by one
- Could they run at different rates? One clock advances by 1 before each event while others advance by 10 before each event
- Could they run at a variable rate? Sometimes faster and sometimes slower
- A process consists of a sequence of events => a sequence of states
- After finishing an event, the process enters a state
- Each process state is associated with a logical clock value (a timestamp)
- e -> e' => L(e) < L(e'), but the converse is not always true: L(e) < L(e') => ???
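The two clock rules can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation; the class name `LamportClock` and its method names are hypothetical, and the receive step applies the max and then the rule 1 increment.

```python
class LamportClock:
    """Logical clock of a single process (illustrative sketch)."""

    def __init__(self, start=0):
        self.time = start  # the initial value may be zero or any integer

    def tick(self):
        # Rule 1: increment before each event is issued.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event: apply rule 1, then piggyback t = Lp.
        return self.tick()

    def receive(self, t):
        # Rule 2: Lq := max(Lq, t), then rule 1 for the receive event.
        self.time = max(self.time, t)
        return self.tick()


p, q, r = LamportClock(), LamportClock(), LamportClock()
a = p.tick()       # local event at p
t = p.send()       # p sends m, carrying timestamp t
c = q.receive(t)   # q receives (m, t)
e = r.tick()       # unrelated local event at r
```

Here `a == 1`, `t == 2`, `c == 3` and `e == 1`. The event at r has a smaller timestamp than the receive at q although the two are unrelated, which illustrates why L(e) < L(e') does not allow us to conclude e -> e'.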
6Lamport's Logical Clocks
Note: there is no increment after receiving the messages. Is it a problem? What are the rates of the logical clocks?
(Figure from Tanenbaum)
- Three processes, each with its own local clock
- Note that the logical clocks run at different rates. Between two events, the clock must tick at least once
7Lamport's Logical Clocks
- Lamport's algorithm corrects the clocks
- Lamport's algorithm can only achieve partial ordering
- Non-related events are unordered. What are non-related events?
- What are related events?
(Figure from Tanenbaum)
8Lamport's Logical Clocks
(Figure from Tanenbaum)
The positioning of Lamport's logical clocks in a distributed system (middleware)
9Total Order and Logical Clock
- Partial order to total order (changing a set of partial orders to a total order)
- Total order: e1 -> e2 -> ... -> en (all pairs of events are ordered)
- Assign a unique timestamp to each process (following Lamport's algorithm in timestamp assignment, e -> e' => TS(e) < TS(e'))
- For any two events (even unrelated ones), you can determine their order based on the timestamps assigned
- Why? E.g., comparing their timestamps for data synchronization
- Totally ordered logical clocks (how? By adding the process id)
- For pairs of distinct events, we take the process id into account when setting the timestamps
- If a is an event occurring at pa with local timestamp Ta, and b is an event occurring at pb with local timestamp Tb
- Define the global logical timestamps for those events as (Ta, pa) and (Tb, pb), respectively
- (Ta, pa) < (Tb, pb) if and only if either Ta < Tb, or Ta = Tb and pa < pb
- Note that this method just serializes the ordering of a set of events. Event a may not really execute before event b in real time
- Giving each process a unique timestamp (total order)
- TS(Pa) > TS(Pb) or TS(Pa) < TS(Pb), but NOT TS(Pa) = TS(Pb)
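The tie-breaking rule above is exactly a lexicographic comparison of (timestamp, process id) pairs. A small Python sketch, with made-up event names, timestamps and process ids, shows this using ordinary tuple comparison:

```python
# (Lamport timestamp, process id) pairs; all values here are hypothetical.
events = {"a": (5, 2), "b": (5, 1), "c": (3, 3)}

# Python compares tuples lexicographically, which matches the rule
# (Ta, pa) < (Tb, pb) iff Ta < Tb, or Ta == Tb and pa < pb.
total_order = sorted(events, key=lambda name: events[name])


def ordered_before(x, y):
    """True if event x precedes event y in the total order."""
    return events[x] < events[y]
```

`total_order` is `["c", "b", "a"]`: c has the smallest timestamp, and the tie between a and b is broken by the smaller process id. The result is only a serialization; it says nothing about which event really executed first in real time.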
10Example Totally-Ordered Multicasting
- Multicast: a message is sent to multiple receivers
- Totally-ordered multicast: all multicast messages are delivered in the same order to each receiver
- For example, to improve query performance, a bank may place copies of an account database in two different cities, say A and B
- A customer in B wants to add $100 to his account, which currently contains $1,000 (update 1)
- At about the same time, a bank employee in A initiates an update by which the customer's account is increased by 1% interest (update 2)
- Both updates should be carried out at both copies (in A and B) of the database (no locking or synchronization)
- If update 2 is performed before update 1 in A, the A database records $1,110
- If update 1 is performed before update 2 in B, the B database records $1,111
- An inconsistency occurs if the two updates are not performed in the same order at the two sites
- Solution: use Lamport's algorithm to assign logical times and implement totally-ordered multicast (for update messages), so that the update operations are performed in the same order at each copy. How?
11Example Totally-Ordered Multicasting
- Updating a replicated database and leaving it in
an inconsistent state.
How can we ensure that the two updates are performed in the same order at each site?
12Example Totally-Ordered Multicasting
- Each update generates two updates to update the two copies of the record
- We have four combinations of the execution orders at the two sites
- Execution order I
  - City A
    - Update 2 => 1010
    - Update 1 => 1110
  - City B
    - Update 1 => 1100
    - Update 2 => 1111
- Execution order II
  - City A
    - Update 1 => 1100
    - Update 2 => 1111
  - City B
    - Update 2 => 1010
    - Update 1 => 1110
- Execution order III
  - City A
    - Update 2 => 1010
    - Update 1 => 1110
  - City B
    - Update 2 => 1010
    - Update 1 => 1110
- Execution order IV
  - City A
    - Update 1 => 1100
    - Update 2 => 1111
  - City B
    - Update 1 => 1100
    - Update 2 => 1111
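The four combinations can be reproduced with a few lines of Python. This is only an arithmetic check of the example: the function names are hypothetical and the 1% interest is computed in integer arithmetic.

```python
from itertools import permutations, product


def update1(balance):
    """Update 1: deposit 100."""
    return balance + 100


def update2(balance):
    """Update 2: add 1% interest (integer arithmetic)."""
    return balance * 101 // 100


def apply_in_order(balance, updates):
    for update in updates:
        balance = update(balance)
    return balance


# The two possible local orders: (update1, update2) and (update2, update1).
orders = list(permutations([update1, update2]))

# One entry per combination: (balance at A, balance at B, consistent?).
results = []
for order_a, order_b in product(orders, repeat=2):
    a = apply_in_order(1000, order_a)
    b = apply_in_order(1000, order_b)
    results.append((a, b, a == b))
```

The two copies agree (1111/1111 or 1110/1110) only when both sites apply the updates in the same order; the mixed orders leave one copy at 1110 and the other at 1111.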
13Totally-Ordered Multicasting Using Logical Time
- For a group of processes multicasting messages to each other, we assume:
- Each message is timestamped with the current logical time of its sender
- The sender is also a receiver of its own message
- Messages from the same sender are received in the order they were sent, and no messages are lost
- When a process receives a message, it is put into a local queue, ordered according to its timestamp
- The receiver multicasts an acknowledgement to the other processes. The timestamp of the acknowledgement is assigned according to Lamport's algorithm and is larger than the timestamp of the original message
14Totally-Ordered Multicasting Using Logical Time
- A process can deliver a queued message to the application it is running only when the message is at the head of the queue and has been acknowledged by every other process
- Thus, all the processes will eventually have the same copy of the local queue, ordered by Lamport timestamps
- Lamport's algorithm (with process ids as tie-breakers) ensures that no two messages have the same timestamp, and the timestamps reflect a consistent global order of the events, e -> e' => TS(e) < TS(e')
- Therefore, all messages are delivered in the same order everywhere. That is, we have established totally-ordered multicasting
- Problems: the delay in applying updates and the higher communication overhead
- How to solve the problem of loss of messages?
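The queue-and-acknowledgement scheme can be simulated in Python. This is a simplified sketch, not the slides' own code: the names `Process` and `run` are hypothetical, process 0 and 1 stand for the two sites, every process also acknowledges to itself (so the head of the queue needs n acknowledgements), and channels are modelled as reliable FIFO queues.

```python
import heapq
from collections import deque


class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0
        self.queue = []      # heap of (timestamp, sender, payload)
        self.acks = {}       # (timestamp, sender) -> pids that acknowledged
        self.delivered = []  # payloads handed to the application, in order

    def multicast(self, payload):
        self.clock += 1
        return ("msg", self.clock, self.pid, payload)

    def on_receive(self, packet):
        """Handle one packet; return an ack to multicast, or None."""
        kind, ts, sender, body = packet
        self.clock = max(self.clock, ts) + 1
        reply = None
        if kind == "msg":
            heapq.heappush(self.queue, (ts, sender, body))
            self.clock += 1  # the ack gets a larger timestamp
            reply = ("ack", self.clock, self.pid, (ts, sender))
        else:  # an ack; body identifies the acknowledged message
            self.acks.setdefault(body, set()).add(sender)
        self._deliver()
        return reply

    def _deliver(self):
        # Deliver while the head of the queue is acknowledged by everyone.
        while self.queue:
            ts, sender, payload = self.queue[0]
            if len(self.acks.get((ts, sender), ())) < self.n:
                break
            heapq.heappop(self.queue)
            self.delivered.append(payload)


def run(n, sends, channel_order):
    """Simulate n processes; channel_order fixes how channels are polled."""
    procs = [Process(i, n) for i in range(n)]
    chans = {(s, d): deque() for s in range(n) for d in range(n)}

    def broadcast(src, packet):
        for dst in range(n):
            chans[(src, dst)].append(packet)

    for pid, payload in sends:
        broadcast(pid, procs[pid].multicast(payload))
    while any(chans.values()):
        for src, dst in channel_order:
            if chans[(src, dst)]:
                reply = procs[dst].on_receive(chans[(src, dst)].popleft())
                if reply:
                    broadcast(dst, reply)
    return [p.delivered for p in procs]


sends = [(0, "update 2"), (1, "update 1")]
keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
forward = run(2, sends, keys)
backward = run(2, sends, list(reversed(keys)))
```

Both polling orders yield the same delivery sequence at both processes. The two messages carry equal clock values here, so the order is decided by the process id used as a tie-breaker.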
15Totally-Ordered Multicasting Using Logical Time
- Site B
  - Receive Update 1
  - Generate M1 and send it to site A and itself
  - Receive Ack 1 from A
  - Receive M2 containing Update 2
  - Generate Ack 2 and send it to site A
  - Compare the timestamp of M2 with that of Update 1
  - Process Update 1
- Site A
  - Receive Update 2
  - Receive M1 containing Update 1
  - Generate Ack 1 and send it to site B
  - Generate M2 and send it to site B and itself
Local queue: Update 1, Ack 1,A, Update 2
16Vector Clock
- Shortcoming of Lamport's algorithm: from L(e) < L(e') we cannot conclude e -> e'
- Using a unique (total order) timestamp from Lamport's algorithm cannot solve this problem. Why?
- In the previous example, we serialize a set of events, and the sequence order may not be the same as their execution order in absolute time
- Causality can be captured by vector timestamps (clocks)
- If L(e) < L(e') then e -> e'
- How to achieve this?
- What are the differences in implications between
- (1) If L(e) < L(e'), then e -> e'
- (2) If e -> e', then L(e) < L(e')
- With (1), by checking the timestamps, the system can determine the event order. Note that in some cases the events are unordered
- (2) assigns timestamps to the events based on their order
- A vector clock for a system with N processes is an array of N integers for each process
- Each process Pi keeps its own vector clock Vi to timestamp its local events
- Processes piggyback vector timestamps on the messages they send
17Vector Clock
- Rules
- VC1: Initially, Vi[j] = 0 for i, j = 1, 2, ..., N
- VC2: Just before pi timestamps an event, it sets Vi[i] := Vi[i] + 1
- VC3: pi includes the value t = Vi in every message it sends
- VC4: When pi receives a timestamp t in a message, it sets Vi[j] := max(Vi[j], t[j]) for j = 1, 2, ..., N (merge operation)
- Two properties
- For a vector clock Vi, Vi[i] is the number of events that pi has timestamped (why?)
- Vi[j] (j != i) is the number of events that have occurred at pj that pi has recorded (why?)
- The timestamp vt of a message m tells the receiver how many events in other processes have preceded m and on which m may causally depend
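Rules VC1 to VC4 translate almost directly into code. The Python sketch below is illustrative only: the class name `VectorClock` is hypothetical, indices are 0-based, and the demo assumes the layout p1: a, b; p2: c, d; p3: e, f with m1 sent at b and received at c, and m2 sent at d and received at f, which the text does not state explicitly.

```python
class VectorClock:
    """Vector clock of process i in a system of n processes (sketch)."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n  # VC1: all entries start at zero

    def tick(self):
        self.v[self.i] += 1  # VC2: increment own entry before an event
        return tuple(self.v)

    def send(self):
        # VC3: the timestamp of the send event is piggybacked on the message.
        return self.tick()

    def receive(self, t):
        # VC4: component-wise maximum (merge), then VC2 for the receive event.
        self.v = [max(x, y) for x, y in zip(self.v, t)]
        return self.tick()


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.tick()
b = p1.send()       # m1 leaves p1
c = p2.receive(b)   # m1 arrives at p2
d = p2.send()       # m2 leaves p2
e = p3.tick()
f = p3.receive(d)   # m2 arrives at p3
```

With these assumptions the timestamps are a = (1, 0, 0), b = (2, 0, 0), c = (2, 1, 0), d = (2, 2, 0), e = (0, 0, 1) and f = (2, 2, 2). The entry `f[0] == 2` records that two events of p1 causally precede f.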
18Vector Clock
- V = V' iff V[j] = V'[j] for j = 1, 2, ..., N
- V <= V' iff V[j] <= V'[j] for j = 1, 2, ..., N
- V < V' iff V <= V' and V != V'
- e -> e' => V(e) < V(e')
- V(e) < V(e') => e -> e'
- Compare events a with f
- Compare events c with e
- What are the costs and benefits compared with Lamport's logical clock?
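The comparison rules can be written as small Python helpers. The function names are hypothetical, and the sample timestamps are an assumption: they are the values obtained when p1 executes a, b, p2 executes c, d, p3 executes e, f, m1 goes from b to c and m2 goes from d to f.

```python
def leq(v, w):
    """V <= V' iff every component of V is <= the matching one of V'."""
    return all(x <= y for x, y in zip(v, w))


def lt(v, w):
    """V < V' iff V <= V' and V != V'."""
    return leq(v, w) and tuple(v) != tuple(w)


def concurrent(v, w):
    """Neither timestamp is <= the other: the events are unordered."""
    return not leq(v, w) and not leq(w, v)


# Assumed vector timestamps for the events of the running example.
V = {
    "a": (1, 0, 0),
    "c": (2, 1, 0),
    "e": (0, 0, 1),
    "f": (2, 2, 2),
}
```

`lt(V["a"], V["f"])` is true, so a -> f. For c and e neither `leq` direction holds, so `concurrent(V["c"], V["e"])` is true; a Lamport clock alone could not reveal this. The cost is that every message carries N integers instead of one.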
19Global State
- Event: S => (e) => S'. An event changes the state of a process
- To get the states of a process, record and timestamp the state of the process after each event has been served
- How to get the state of a distributed system (distributed processes)?
- Collect the states of all the processes in the system. How?
- Not so simple due to communication delays and changing process states
- What are the purposes of getting the global state of a distributed system?
- Examples: detecting the termination of a distributed computation, garbage collection, verification of program correctness, deadlock detection, etc.
- Garbage collection: no process (and no message in transit) refers to the object, which may then be collected as garbage
- Deadlock: two or more processes are blocked/waiting; they are blocking each other
- Termination: when to terminate a process? When it is inactive and will not become active again. It may be waiting for a message
20Detecting Global Properties
21Global State
- Global state of a distributed system: the local state of each process, plus the messages that are currently in transit (the state is distributed)
- How to obtain the global state of a distributed system?
- The collection takes time and cannot be done instantaneously. Why?
- A distributed snapshot reflects a (consistent global) state in which the distributed system might have been (at a particular point in time?)
- What is a snapshot? (A state at a particular point in time)
- But the snapshots are distributed
- Cut: a graphical representation of a global state, as shown in the next slide
- What is the implication of a cut? A distributed snapshot
- Inconsistent state: incorrect state
- E.g., a snapshot contains the receipt of a message but not the sending of that message
- What is the implication of an inconsistent state? An incorrect state may generate incorrect results
22Global State
- Problems of obtaining a global state
- Lack of a global clock (what is the state to be collected from each site to form the global state/global snapshot?) If you have a global clock,
- Transmission delay vs. always-changing statuses of processes
- We consider two types of events
- Internal events of a process, i.e., logic operations and computations
- Communication events: sending and receiving of messages
- history(pi) = hi = <e1,i, e2,i, ...>
- A prefix of the history: hi,k = <e1,i, e2,i, ..., ek,i>
- sk,i is the state of process pi immediately before the kth event occurs, reached from the initial state: s0,i -> ... -> sk,i
- s0,i is the initial state of pi. After the occurrence of each event, a new state is created for a process: ek,i => sk,i
- Global history H = h1 U h2 U ... U hn, the union of all process histories
23Global State and Consistent Cut
- A consistent cut
- An inconsistent cut
24Global State and Consistent Cut
- How to collect the states/distributed snapshot from distributed processes? Follow a cut
- A cut of the system's execution is a subset of its global history H that is a union of prefixes of process histories: C = h1,c1 U h2,c2 U ... U hn,cn
- The state si in the global state S corresponding to the cut C is that of pi immediately after the last event processed by pi in the cut, ei,ci
- The set of events {ei,ci : i = 1, 2, ..., N} is called the frontier of the cut C
- A cut C is consistent if, for each event it contains, it also contains all the events that happened-before that event: for all events e in C, f -> e => f in C
- A consistent global state is one that corresponds to a consistent cut: S0 -> S1 -> S2 -> ... Each transition represents an event occurring in one of the processes in the system
- A run is a total ordering of all the events in a global history that is consistent with each local history's ordering
- A linearization (consistent run) is an ordering of the events in a global history that is consistent with the happened-before relation -> on H
- S' is reachable from S if there is a linearization that passes through S and then S' (from state S to S'. What is the implication if S' is not reachable from S?)
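The consistency condition for a cut can be checked with a short Python function. This is a sketch under stated assumptions: a cut is represented by the number of events of each process that it contains (the prefix lengths c1, ..., cn), events are numbered from 1 within each process, and the two messages in the example are made up. Because a cut is a union of prefixes, it is enough to check that no message is received inside the cut but sent outside it.

```python
def is_consistent(cut, messages):
    """Check whether a cut is consistent.

    cut:      dict mapping process name -> number of its events in the cut
    messages: list of ((sender, k), (receiver, j)) pairs, meaning the k-th
              event of sender is the send and the j-th event of receiver
              is the matching receive (events are numbered from 1)
    """
    for (sender, k), (receiver, j) in messages:
        received_in_cut = j <= cut[receiver]
        sent_in_cut = k <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False  # the cut contains an effect without its cause
    return True


# Hypothetical execution: m1 is sent by p1's 1st event and received by
# p2's 1st event; m2 is sent by p2's 2nd event and received by p1's 3rd.
messages = [(("p1", 1), ("p2", 1)), (("p2", 2), ("p1", 3))]

inconsistent_cut = {"p1": 0, "p2": 1}  # receive of m1 without its send
consistent_cut = {"p1": 2, "p2": 2}    # m2 is sent but still in transit
```

`is_consistent(inconsistent_cut, messages)` is false, while `is_consistent(consistent_cut, messages)` is true: a message that has been sent but not yet received is allowed and simply belongs to the channel state.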
25Global State and Consistent Cut
L1: e1,0 e2,0 e1,1 e1,2 e1,3 e2,1 e2,2
L2: e1,0 e1,1 e2,0 e2,1 e1,2 e2,2 e1,3
[Figure: linearizations L1 and L2]
S' is reachable from S if there is a linearization that passes through S and then S'
26Global State Predicates
- A global state predicate is a function that maps from the set of global states of processes in the system to {True, False}
- Stability: once the system enters a state in which the predicate is True, it remains True in all future states reachable from that state
- E.g., deadlock and garbage collection
- Safety: suppose there is an undesirable property α (e.g., being deadlocked) that is a predicate of the system's global state. Let S0 be the original state of the system. Safety with respect to α is the assertion that α evaluates to False for all states S reachable from S0
- Liveness with respect to β is the property that, for any linearization L starting in the state S0, β evaluates to True for some state SL reachable from S0. β may be a desirable and reachable property
27Chandy and Lamport's Snapshot Algorithm
- Goal: to record a set of process and channel states for a set of processes pi such that, even though the combination of recorded states may never have occurred at the same time, the recorded global state is consistent
- What is a channel state? The messages that are in transmission on that channel, recorded alongside the process states
- Assumptions
- Communication among the processes is reliable and messages are delivered in order
- No process or communication failures
- Channels are unidirectional and provide FIFO transmission
- Any process may initiate a global snapshot at any time
- The graph of processes and channels is strongly connected
- The processes may continue execution while the snapshot takes place
- Incoming channels for process pi are those channels over which other processes send messages to pi
- Outgoing channels for process pi are those channels over which pi sends messages to other processes
- Each process records its state and also, for each incoming channel, a set of messages sent to it
28Chandy and Lamport's Snapshot Algorithm
- Any initiating process, say P, may start by recording its own local state; it then sends a marker along each of its outgoing channels
- When a process Q receives a marker from an incoming channel C:
- If Q has not saved its local state, it first saves the state, then sends a marker along each of its own outgoing channels
- If Q has recorded the local state, it records the state of channel C: the sequence of messages that have been received by Q over C since Q last recorded its local state and before it received the marker
- When a process has received a marker along each of its incoming channels, and processed each one, its recorded local state and the state of each channel are collected and sent to process P
- Because any process can initiate the algorithm, several snapshots can be constructed at the same time. To distinguish the different snapshot constructions, a marker can be tagged with the identifier (and even a version number) of the process that initiates the snapshot
29Chandy and Lamport's Snapshot Algorithm
Marker receiving rule for process pi
  On pi's receipt of a marker message over channel c:
    if (pi has not yet recorded its state) it
      records its process state now;
      records the state of c as the empty set;
      turns on recording of messages arriving over other incoming channels;
    else
      pi records the state of c as the set of messages it has received over c since it saved its state.
    end if
Marker sending rule for process pi
  After pi has recorded its state, for each outgoing channel c:
    pi sends one marker message over c (before it sends any other message over c).
30Global State
- Organization of a process and channels for a
distributed snapshot
31Global State
- Process Q receives a marker for the first time and records its local state
- Q records all incoming messages
- Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
32Chandy and Lamport's Snapshot Algorithm
- The biggest problem in getting a distributed snapshot is how to collect the states of other processes and the messages in transmission
- No message may be recorded as received if its sending is not included in the distributed snapshot
- The marker is like a cut on the state of a process
- All other processes follow the same cut sequence (cut after the first cut) to collect their states
- The marker sending rule obligates processes to send a marker after they have recorded their state
- The marker receiving rule obligates a process that has not recorded its state to do so
- If a process that has already saved its state receives a marker, it records the state of the channel as the set of messages it received on it since it saved its state
- Why are reliable communication and in-order transmission important?
33Two processes and their initial states
P2 has already received an order for five
widgets, which it will shortly dispatch to P1
34The execution of the processes
35Chandy and Lamport's Snapshot Algorithm
- Process P1 records its state in the global state S0, when P1's state is <1000, 0>
- Following the marker sending rule, P1 emits a marker message over its outgoing channel c2 before it sends the next application-level message (order 10, 100) over channel c2. Global state S1
- Before P2 receives the marker, it emits an application message (five widgets) over c1 in response to P1's previous order. Global state S2
- Process P1 receives P2's message (five widgets), and P2 receives the marker. Following the marker receiving rule, P2 records its state as <50, 1995> and that of the channel c2 as the empty sequence. Following the marker sending rule, it sends a marker message over c1
- When P1 receives P2's marker message, it records the state of channel c1 as the single message (five widgets) that it received after it first recorded its state. Global state S3
- The final recorded state: P1 <1000, 0>; P2 <50, 1995>; c1 <(five widgets)>; c2 <>
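The marker rules and the trace above can be replayed in a small Python simulation. This is a sketch under assumptions, not the algorithm's canonical code: the class and method names are hypothetical, application state changes are set directly by the driver, and P2's initial state <50, 2000> is assumed (the slides only give <50, 1995> after the five widgets are dispatched).

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, state, incoming, outgoing):
        self.name, self.state = name, state
        self.incoming, self.outgoing = incoming, outgoing  # channel names
        self.recorded_state = None
        self.channel_state = {}  # incoming channel -> recorded messages
        self.recording = set()   # incoming channels still being recorded

    def record(self, net):
        """Record the local state, then apply the marker sending rule."""
        self.recorded_state = self.state
        self.recording = set(self.incoming)
        self.channel_state = {c: [] for c in self.incoming}
        for c in self.outgoing:
            net[c].append(MARKER)

    def send(self, net, channel, msg):
        net[channel].append(msg)

    def receive(self, net, channel):
        """Take the next message off a channel (marker receiving rule)."""
        msg = net[channel].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record(net)
            # The state of this channel is now complete (empty if the
            # marker was the first thing seen after recording).
            self.recording.discard(channel)
        elif channel in self.recording:
            self.channel_state[channel].append(msg)
        return msg


# c1 carries messages from P2 to P1, c2 from P1 to P2 (FIFO queues).
net = {"c1": deque(), "c2": deque()}
p1 = Process("P1", (1000, 0), incoming=["c1"], outgoing=["c2"])
p2 = Process("P2", (50, 2000), incoming=["c2"], outgoing=["c1"])  # assumed

p1.record(net)                       # S0: P1 records <1000, 0>, marker on c2
p1.send(net, "c2", "order 10, 100")  # S1: application message after marker
p2.state = (50, 1995)                # P2 dispatches five widgets
p2.send(net, "c1", "five widgets")   # S2
p1.receive(net, "c1")                # P1 receives the widgets
p2.receive(net, "c2")                # P2 receives the marker and records
p1.receive(net, "c1")                # S3: P1 receives P2's marker
```

After the run, `p1.recorded_state` is `(1000, 0)`, `p2.recorded_state` is `(50, 1995)`, `p1.channel_state` is `{"c1": ["five widgets"]}` and `p2.channel_state` is `{"c2": []}`, matching the final recorded state listed above. The order message sent after the marker is still in c2 and is not part of the snapshot.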
36References
- Dollimore ch 11.5
- Tanenbaum ch 5.2 to 5.3