Title: Event Ordering
1 - Event Ordering and Global Predicate Detection
2Model
- Message passing
- Asynchronous System
- No upper bound on message delivery time
- No bound on relative process speeds
- No centralized clock
3Problems
- How can we order events?
- How can we compute predicates over the global
state?
4Client-Server
- Processes exchange messages using RPC
The server computes the response (possibly asking
other servers) and returns it to the client
5Deadlock
6Goal
- Design a protocol by which a processor can
determine if a set of processors is deadlocked
7Any ideas?
- Compute the wait-for graph (WFG)!
- arrow from pi to pj if pj has received a request from pi but has not responded yet
- To detect deadlock, use p0 to compute the WFG of p1, p2, p3
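Once p0 holds the wait-for graph, checking it for deadlock is a small graph computation. The following is a minimal sketch, assuming the graph is given as a dictionary from each process to the set of processes it waits for, and assuming a process stays blocked until every process it waits for has replied. The function name `deadlocked` and the data layout are illustrative, not part of the slides.

```python
from typing import Dict, Set


def deadlocked(wfg: Dict[str, Set[str]]) -> Set[str]:
    """Return the processes that can never make progress.

    wfg[p] is the set of processes p is waiting for (an arrow p -> q).
    A process that waits for nobody can run and eventually reply, so it is
    removed together with the arrows pointing to it; whatever is left waits,
    directly or indirectly, on a cycle.
    """
    graph = {p: set(qs) for p, qs in wfg.items()}
    for qs in wfg.values():
        for q in qs:
            graph.setdefault(q, set())  # processes that only appear as targets

    changed = True
    while changed:
        changed = False
        for p in list(graph):
            if not graph[p]:  # p waits for nobody
                del graph[p]
                for qs in graph.values():
                    qs.discard(p)
                changed = True
    return set(graph)


if __name__ == "__main__":
    print(deadlocked({"p1": {"p2"}, "p2": {"p3"}, "p3": {"p1"}}))
```

The hard part, as the following slides show, is not this check but collecting a wait-for graph that reflects a consistent view of the system.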
8The protocol
- p0 sends a message to p1, p2, p3
- On receipt of p0's message, process pi replies with its state and wait-for info
9An execution
10Houston, we have a problem...
- Asynchronous system
- How can p0 synchronize the process of collecting
the necessary data?
11What do we use time for?
- Synchronize actions
- Order events
- Can we order events in a Distributed System?
12But first...
- Definition The local history $h_p$ is the sequence of events executed by a processor p
- $h_p^k$: prefix that contains the first k events
- $h_p^0$: initial, empty sequence
- Definition The history H is the set $h_1 \cup h_2 \cup \dots \cup h_n$
- Definition $e_p^i$ is the i-th event of processor p. It can be
- a local event
- a send event
- a receive event
13Ordering events
- Events in a local history are totally ordered
- For every message m, receive(m) is after send(m)
14Happened before
- A binary relation defined over events
(Lamport 1978)
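For reference, the relation can be written out as follows. This is the standard formulation following Lamport (1978), stated in the event notation of the earlier definitions; it is the smallest relation satisfying the three rules.

```latex
\begin{align*}
&1.\ \text{if } e_p^k, e_p^l \in h_p \text{ and } k < l, \text{ then } e_p^k \rightarrow e_p^l \\
&2.\ \text{if } e = \mathit{send}(m) \text{ and } e' = \mathit{receive}(m), \text{ then } e \rightarrow e' \\
&3.\ \text{if } e \rightarrow e' \text{ and } e' \rightarrow e'', \text{ then } e \rightarrow e''
\end{align*}
```

Two events that are not related by $\rightarrow$ in either direction are concurrent.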
15Space-time diagrams
- Given H and $\rightarrow$ we can construct a partially ordered set $(H, \rightarrow)$: some events cannot be ordered
16What p0 sees
- A cut is uniquely identified by giving $(c_1, \dots, c_n)$, the number of events each process contributes to it
There is a 1-1 correspondence between cuts and configurations (also called global states)
17Runs and consistent runs
Definition A run is a total ordering of the events in H that is consistent with the local histories of the processors. For example, $h_1 h_2 \dots h_n$ (the local histories one after another) is a run.
Definition A run is consistent if the total order imposed in the run is an extension of the partial order induced by $\rightarrow$.
Note A single distributed computation may correspond to several consistent runs.
18Consistent Cuts and Consistent Global States
Definition A cut C is consistent if, for all events e and e', $(e \in C) \wedge (e' \rightarrow e) \Rightarrow e' \in C$
Observation A consistent cut defines a unique consistent global state
19What p0 sees
Is this cut consistent?
NO!
20Our task
- Develop a protocol by which a processor can build a consistent cut
- Informally, we want to be able to take a snapshot of the computation
- We will record
- processor states
- channel states
21Our approach
- Develop a simple synchronous protocol
- Refine protocol as we relax assumptions
22Snapshot I
- 0 processor p0 selects tss
- 1 p0 sends "take a snapshot at tss" to all processes
- 2 when the clock of pi reads tss, then pi
- records its local state si
- sends an empty message along its outgoing channels
- starts recording messages received on each of its incoming channels
- stops recording a channel when it receives the first message with timestamp greater than or equal to tss
- Assumptions
- FIFO channels
- synchronous system
- processors timestamp each m with T(send(m))
23Correctness
Theorem The protocol produces a consistent cut
Proof
Need to prove: $e_j \in C \wedge e_i \rightarrow e_j \Rightarrow e_i \in C$
< 0 and 1 >
< 5 and 3 >
< Definition >
< Assumption >
< Property of real time >
< Definition >
< 2 and 4 >
< Assumption >
24Logical Clocks
- A clock that satisfies the Clock Condition (if $e \rightarrow e'$, then the clock value of e is smaller than the clock value of e') is called a logical clock
- Real-time clocks are logical clocks
- Can we implement the Clock Condition in some
other way?
25Lamport Clocks
- Each process maintains a local variable LC
- $LC(e_i) \equiv$ value of LC for event $e_i$
26Increment Rules
Timestamp each message m with timestamp TS(m) = LC(send(m))
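A minimal sketch of a Lamport clock follows. It assumes the usual increment rules: LC grows by one on a local or send event, and on a receive it is first advanced to the maximum of its current value and TS(m), then incremented. The class and method names are illustrative.

```python
class LamportClock:
    """Logical clock of a single process."""

    def __init__(self) -> None:
        self.lc = 0

    def local_event(self) -> int:
        # Local event: LC := LC + 1
        self.lc += 1
        return self.lc

    def send(self) -> int:
        # Send event: LC := LC + 1; the result is the timestamp TS(m)
        self.lc += 1
        return self.lc

    def receive(self, ts: int) -> int:
        # Receive event: LC := max(LC, TS(m)) + 1
        self.lc = max(self.lc, ts) + 1
        return self.lc


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    p.local_event()
    ts = p.send()
    print(ts, q.receive(ts))
```

With these rules, receive(m) always gets a larger clock value than send(m), which is what the Clock Condition requires across processes.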
27Space-Time Diagrams and Logical Clocks
[Figure: space-time diagram with each event labeled by its Lamport clock value]
28Houston, we still have a problem
- How do we choose a logical tss so that the message from p0 reaches every other process before tss?
- when LC = t do S
- doesn't make sense for Lamport clocks
- they are not dense
- they would not execute the first instruction of S at t anyway
29No tss?
- Send "take checkpoint at W"
- where we assume that W is a value that cannot be
reached by applying the update rules of logical
clocks
30Clocks are not dense?
- To execute "when LC = t do S"
- whenever pi is about to execute event e
- if (e is a local event or a send event) and (LC = t - 2) then
- execute e
- execute first event of S
- if (e = receive(m)) and ($TS(m) \geq t$) and (LC < t - 1) then
- put m back on the channel
- re-enable e
- set LC to t - 1
- execute first event of S
31Snapshot II
- 0 processor p0 selects W
- 1 p0 sends "take a snapshot at W" to all processes and sets its logical clock to W
- 2 when the clock of pi reads W, then pi
- records its local state si
- sends an empty message along its outgoing channels
- starts recording messages received on each of its incoming channels
- stops recording a channel when it receives the first message with timestamp greater than or equal to W
32Hallo-ho? Houston?
- The assumption about W requires bounding relative process speeds and message delays
- What about asynchrony?
33Here Mission Control we hear you
34The Idea
- Use empty message as the announcement for taking
a snapshot!
35Snapshot III
- 0 processor p0 sends itself "take a snapshot"
- 1 when pi receives "take a snapshot" for the first time, from pj
- records its local state si
- sends "take a snapshot" along its outgoing channels
- sets channel from pj to empty
- starts recording messages received over each of its other incoming channels
- 2 when pi receives "take a snapshot" beyond the first time, from pk
- stops recording channel from pk
- 3 when pi has received "take a snapshot" on all channels, it sends the collected state to p0 and stops.
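The marker-based protocol can be exercised in a small single-threaded simulation. The sketch below is illustrative only: it assumes FIFO channels modeled as queues, a fully connected set of processes, and an application in which messages are money transfers, so that the recorded snapshot can be checked by adding up balances. The names `Process`, `run_snapshot` and `MARKER` are hypothetical.

```python
from collections import deque

MARKER = "take a snapshot"


class Process:
    def __init__(self, pid, peers, balance):
        self.pid, self.peers, self.balance = pid, peers, balance
        self.recorded_state = None  # local state s_i, once recorded
        self.channel_log = {}       # sender -> messages recorded on that channel
        self.recording = set()      # incoming channels still being recorded

    def receive(self, sender, msg, send):
        if msg == MARKER:
            if self.recorded_state is None:
                # Step 1: first marker. Record state, forward the marker,
                # treat the sender's channel as empty, record all the others.
                self.recorded_state = self.balance
                self.channel_log = {q: [] for q in self.peers}
                self.recording = set(self.peers) - {sender}
                for q in self.peers:
                    send(self.pid, q, MARKER)
            else:
                # Step 2: later marker. Stop recording that channel.
                self.recording.discard(sender)
        else:
            # Application message: a transfer of `msg` units.
            self.balance += msg
            if sender in self.recording:
                self.channel_log[sender].append(msg)

    def finished(self):
        # Step 3: a marker has arrived on every incoming channel.
        return self.recorded_state is not None and not self.recording


def run_snapshot(procs, channels, initiator):
    """Deliver queued messages in FIFO order per channel until all are empty."""
    def send(src, dst, msg):
        channels[(src, dst)].append(msg)

    procs[initiator].receive(initiator, MARKER, send)  # step 0
    while any(channels.values()):
        for (src, dst), queue in channels.items():
            if queue:
                procs[dst].receive(src, queue.popleft(), send)
    return procs


if __name__ == "__main__":
    procs = {i: Process(i, [j for j in range(3) if j != i], 100) for i in range(3)}
    channels = {(i, j): deque() for i in range(3) for j in range(3) if i != j}
    procs[1].balance -= 10
    channels[(1, 2)].append(10)  # a transfer already in flight
    run_snapshot(procs, channels, 0)
    print({i: (p.recorded_state, p.channel_log) for i, p in procs.items()})
```

In this toy setting, the recorded local states plus the recorded channel contents add up to the total amount of money in the system, which is one way to see that the cut is consistent.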
36Properties of Snapshots
- The global state Ss saved by the snapshot protocol is a consistent global state
- But did it ever occur during the computation?
- a distributed computation provides only a partial order of events
- many total orders (runs) are compatible with that partial order
- all we know is that Ss could have occurred
- We are evaluating predicates on states that may have never occurred!
37An Execution and its Lattice
[Figure: a two-process execution and its lattice of global states, S00 through S42]
38Reachability
- We say that Skl is reachable from Sij if there is a path from Sij to Skl in the lattice.
[Figure: lattice of global states, from S00 to S65]
39So, why do we care about Ss again?
- Deadlock is a stable property
- once Deadlock holds, it holds forever
- If Si is the initial state and Sf the termination state for the snapshot
- then, for a run R, Ss is reachable from Si and Sf is reachable from Ss
- Deadlock in Ss implies Deadlock in Sf
- No Deadlock in Ss implies no Deadlock in Si
41Same problem, a different approach
- Monitor process does not query explicitly
- It just passively collects information
- and uses it to build an observation.
- (reactive architectures, Harel and Pnueli 1985)
Definition An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events.
42Building Observations
43Building runs
- Messages must be delivered to p0 in FIFO order
- Is FIFO delivery sufficient to build consistent runs?
- NO. We need a stronger delivery rule
44Causal Delivery
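- Messages are delivered in an order that respects happened-before: $send_i(m) \rightarrow send_j(m') \Rightarrow deliver_k(m) \rightarrow deliver_k(m')$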
45Implementing Causal Delivery in Synchronous Systems
- If there is an upper bound d on message delivery time
- Notification message for event e carries TS(e)
DR1 At time t, p0 delivers all received
messages with timestamps up to t-d in
increasing timestamp order.
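Rule DR1 amounts to holding each notification for d time units and then releasing held notifications in timestamp order. Below is a minimal sketch, assuming timestamps and the current time are plain numbers read from synchronized real-time clocks; the class name `SyncMonitor` and its methods are illustrative.

```python
import heapq


class SyncMonitor:
    """Monitor p0 applying DR1 with a known delay bound d."""

    def __init__(self, d):
        self.d = d
        self.pending = []  # min-heap of (timestamp, arrival number, message)
        self.arrivals = 0

    def receive(self, ts, msg):
        # Buffer the notification; the arrival number breaks timestamp ties.
        heapq.heappush(self.pending, (ts, self.arrivals, msg))
        self.arrivals += 1

    def deliver(self, now):
        # Deliver everything with timestamp <= now - d, smallest first.
        out = []
        while self.pending and self.pending[0][0] <= now - self.d:
            _, _, msg = heapq.heappop(self.pending)
            out.append(msg)
        return out


if __name__ == "__main__":
    m = SyncMonitor(d=5)
    m.receive(3, "b")
    m.receive(1, "a")
    print(m.deliver(8))
```

Any message with a timestamp of at most t - d must already have arrived by time t, so nothing delivered later can carry a smaller timestamp.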
46Implementing Causal Delivery with Lamport Clocks
DR1.1 Deliver all received messages in
increasing (logical clock) timestamp order.
47(Il)logical clocks?
- If no bound on message delay
- and no real time clock to measure it
- no way to decide when to deliver a message?
48Gap-Detection
- Given two events e and e' and their clock values LC(e) and LC(e'), where LC(e) < LC(e')
- determine whether some other event e'' exists such that LC(e) < LC(e'') < LC(e')
49Stability
- Definition A message m received by p is stable at p if p will never receive a future message m' such that TS(m') < TS(m)
DR2 Deliver all received stable messages in
increasing (logical clock) timestamp order.
50Implementing Stability
- Real-time Clocks
- wait for d time units
- Lamport Clocks
- wait on every channel for message m with TS(m) > LC(e)
- Our approach
- design better clocks!
51Clocks and Strong Clocks
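- RC and LC implement the Clock Condition: $e \rightarrow e' \Rightarrow LC(e) < LC(e')$ (and likewise for RC)
We want new clocks TC that implement the following Strong Clock Condition: $e \rightarrow e' \equiv TC(e) < TC(e')$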
52Causal Histories
53How to build q(e)
- Each process pi
- Initializes a local variable q to the empty set
- If e is an internal or a send event, then $q(e) = \{e\} \cup q$
- If e is a receive event for message m, then $q(e) = \{e\} \cup q \cup q(send(m))$
54Pruning Causal Histories
- Prune segments of history that are known to all processes (Peterson, Buchholz, Schlichting)
- Use a more clever way to encode q(e)
55Vector Clocks
- Projection qi(e) of q(e) for process pi is a prefix of pi's local history and can be encoded using a single number $k_i$, its length
- q(e) can be encoded using $k_1, k_2, \dots, k_n$
56Update Rules
Timestamp each message m with timestamp TS(m) = VC(send(m))
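A minimal sketch of a vector clock follows. It assumes the usual update rules: a local or send event increments the process's own component, and a receive first takes the component-wise maximum with TS(m) and then increments the own component. Processes are indexed from 0 here, and the class name is illustrative.

```python
class VectorClock:
    """Vector clock of process i in a system of n processes (0-based index)."""

    def __init__(self, n: int, i: int) -> None:
        self.i = i
        self.vc = [0] * n

    def local_event(self) -> tuple:
        # Internal event: VC[i] := VC[i] + 1
        self.vc[self.i] += 1
        return tuple(self.vc)

    def send(self) -> tuple:
        # Send event: same increment; the result is the timestamp TS(m)
        return self.local_event()

    def receive(self, ts) -> tuple:
        # Receive event: VC := max(VC, TS(m)) component-wise, then VC[i] += 1
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.vc[self.i] += 1
        return tuple(self.vc)


if __name__ == "__main__":
    p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
    ts = p0.send()
    print(ts, p1.receive(ts))
```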
57Example
[Figure: space-time diagram of three processes with each event labeled by its vector clock, e.g. (1,0,0), (2,1,0), (3,1,2)]
58Operational Interpretation
$VC(e_i)[i]$ = number of events pi has executed up to and including ei
$VC(e_i)[j]$ = number of events of pj that causally precede event ei of pi
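59Properties of Vector Clocks: Event Ordering
- Definition Given two vectors V and V', the relation less than is defined as $V < V' \equiv (V \neq V') \wedge (\forall k : 1 \leq k \leq n : V[k] \leq V'[k])$
Property 1 (Strong Clock Condition) $e \rightarrow e' \equiv VC(e) < VC(e')$
Property 2 (Simple Strong Clock Condition) Given event $e_i$ of process $p_i$ and $e_j$ of process $p_j$, where $i \neq j$: $e_i \rightarrow e_j \equiv VC(e_i)[i] \leq VC(e_j)[i]$
Property 3 (Concurrent) Given event $e_i$ of process $p_i$ and $e_j$ of process $p_j$, where $i \neq j$: $e_i \parallel e_j \equiv (VC(e_i)[i] > VC(e_j)[i]) \wedge (VC(e_j)[j] > VC(e_i)[j])$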
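These ordering tests translate directly into comparisons on tuples. The sketch below assumes the standard definitions: V is less than V' when V differs from V' and no component of V exceeds the matching component of V'; two distinct events are concurrent when neither vector is less than the other; and for events of different processes it suffices to compare component i alone. Function names and the 0-based indexing are illustrative.

```python
def vc_less(v, w) -> bool:
    """V < W: V != W and every component of V is <= the same component of W."""
    return tuple(v) != tuple(w) and all(a <= b for a, b in zip(v, w))


def concurrent(v, w) -> bool:
    """Timestamps of two distinct events, neither of which precedes the other."""
    return not vc_less(v, w) and not vc_less(w, v)


def precedes_simple(v_i, i, v_j) -> bool:
    """Single-component test for e_i -> e_j.

    v_i is the timestamp of an event of process i (0-based) and v_j the
    timestamp of an event of a different process j.
    """
    return v_i[i] <= v_j[i]


if __name__ == "__main__":
    print(vc_less((2, 1, 0), (3, 1, 2)), concurrent((1, 0, 0), (0, 1, 0)))
```

The single-component test is what makes vector clocks cheap to use in practice: ordering two events of different processes needs one integer comparison rather than a scan of both vectors.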
60Properties of Vector Clocks: Consistency
Vector clocks can be used to check if a set of n events constitutes the frontier of a consistent cut
- Definition Two events are pairwise inconsistent if they can't be on the frontier of the same consistent cut.
Property 4 (Pairwise Inconsistent) Events $e_i$ of process $p_i$ and $e_j$ of process $p_j$, where $i \neq j$, are pairwise inconsistent if and only if $(VC(e_i)[i] < VC(e_j)[i]) \vee (VC(e_j)[j] < VC(e_i)[j])$
Property 5 (Consistent Cut) A cut defined by $(c_1, \dots, c_n)$ is consistent if and only if $\forall i, j : 1 \leq i \leq n, 1 \leq j \leq n : VC(e_i^{c_i})[i] \geq VC(e_j^{c_j})[i]$
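The consistent-cut test can be sketched as a check over the frontier of the cut. The sketch assumes the usual form of the condition: no frontier event may know about more events of process i than the cut itself includes from process i. The frontier is given as one vector timestamp per process (0-based), with the all-zero vector standing for a process that contributes no events; the function name is illustrative.

```python
def is_consistent_cut(frontier) -> bool:
    """frontier[i] is the vector clock of the last event of process i in the cut.

    The cut is consistent when, for every pair i, j, process i's own
    component in its frontier event is at least what process j's frontier
    event has seen of process i.
    """
    n = len(frontier)
    return all(
        frontier[i][i] >= frontier[j][i]
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    print(is_consistent_cut([(1, 0, 0), (0, 1, 0), (1, 0, 1)]))
    print(is_consistent_cut([(1, 0, 0), (0, 1, 0), (2, 1, 3)]))
```

In the second call the third process's frontier event has seen two events of the first process while the cut includes only one of them, so the cut contains an effect without its cause.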
61Properties of Vector Clocks: Gap Detection
Property 6 (Weak Gap Detection) Given event $e_i$ of process $p_i$ and $e_j$ of process $p_j$, if $VC(e_i)[k] < VC(e_j)[k]$ for some $k \neq j$, then there exists an event $e_k$ such that $\neg(e_k \rightarrow e_i) \wedge (e_k \rightarrow e_j)$
62Strong Gap Detection
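- Recall WGP (Weak Gap Detection)
- Given event $e_i$ of process $p_i$ and $e_j$ of process $p_j$, if $VC(e_i)[k] < VC(e_j)[k]$ for some $k \neq j$, then there exists an event $e_k$ such that $\neg(e_k \rightarrow e_i) \wedge (e_k \rightarrow e_j)$
If $i = k$, then we have strong gap detection: if $VC(e_i)[i] < VC(e_j)[i]$ then there exists $e_i'$ such that $(e_i \rightarrow e_i') \wedge (e_i' \rightarrow e_j)$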
63VCs for Causal Delivery
- Each process increments the local component of its VC only for events that are notified to the monitor
- Each notification message m is timestamped with VC(event notified through m)
- Monitor keeps all notification messages in a set M
64Stability
- Suppose monitor p0 has received mj from pj.
- When is it safe for p0 to deliver mj?
- There is no earlier message in M
- There is no earlier message from pj
- There is no earlier message from pk, where $k \neq j$
65Checking for messages from pk
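- Let $m_k'$ be the last message that p0 delivered from pk
- By Strong Gap Detection, an undelivered message $m_k''$ from pk with $send_k(m_k'') \rightarrow send_j(m_j)$ exists only if $TS(m_k')[k] < TS(m_j)[k]$
- Hence deliver mj as soon as $\forall k \neq j : TS(m_k')[k] \geq TS(m_j)[k]$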
66The Protocol
- p0 maintains array $D[1..n]$ of counters
- $D[i] = TS(m_i)[i]$
- where mi is the last message delivered from pi
- DR3 Deliver m from pj as soon as both of the following conditions are satisfied
- $D[j] = TS(m)[j] - 1$
- $D[k] \geq TS(m)[k]$, for all $k \neq j$
[Figure: example execution with notification messages labeled by the vector timestamps (1,0), (1,1) and (2,1)]
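Rule DR3 can be sketched as a monitor that buffers notifications and releases them once they become deliverable. The sketch assumes the two conditions are that m is the next message expected from its sender pj (D[j] = TS(m)[j] - 1) and that every message from other processes that causally precedes m has already been delivered (D[k] >= TS(m)[k] for all k other than j). Processes are indexed from 0, and the class name `CausalMonitor` is illustrative.

```python
class CausalMonitor:
    """Monitor p0 delivering notification messages according to DR3."""

    def __init__(self, n: int) -> None:
        self.D = [0] * n     # D[i]: own component of the last message delivered from i
        self.M = []          # received but not yet delivered: (sender, ts, payload)
        self.delivered = []  # payloads in delivery order

    def _deliverable(self, j, ts) -> bool:
        next_from_sender = ts[j] == self.D[j] + 1
        no_gap_elsewhere = all(
            self.D[k] >= ts[k] for k in range(len(ts)) if k != j
        )
        return next_from_sender and no_gap_elsewhere

    def receive(self, j, ts, payload) -> None:
        self.M.append((j, ts, payload))
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for item in list(self.M):
                sender, stamp, data = item
                if self._deliverable(sender, stamp):
                    self.M.remove(item)
                    self.D[sender] = stamp[sender]
                    self.delivered.append(data)
                    progress = True


if __name__ == "__main__":
    mon = CausalMonitor(2)
    mon.receive(1, (1, 1), "b")  # held: depends on an undelivered message of process 0
    mon.receive(0, (1, 0), "a")
    print(mon.delivered)
```

No waiting on timeouts or on traffic from every channel is needed: the timestamp of m alone tells the monitor which earlier messages are still missing.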
67More on GPD
- What if we want to detect non-stable predicates?
- Say we want to evaluate F(Sss)
- by the time the predicate is evaluated, the value of F(Sss) may have changed
- the global state Sss that we use may not have even occurred!
68Example
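[Figure: two-process execution; one process assigns x = 3, x = 4, x = 5, the other assigns the values 2, 4 and 6 to y]
Detect whether the following predicates hold: x = y and x = y - 2
Assume that initially y = 10 and x = 0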
69The Lattice
- If Sss = S31 or Sss = S41, x = y - 2 is detected, but it may never have occurred
- We know that x = y has occurred, but it may not be detected if tested before S32 or after S54
- Not enough to look at one state: look at observations instead!
70Possibly and Definitely
- Possibly(F) There exists a consistent observation O of the computation such that F holds in a global state of O
- Definitely(F) For every consistent observation O of the computation, there exists a global state of O in which F holds
71Sequences of Non-Stable Predicates
- Simple predicates (even if not stable) cannot capture dynamic properties of distributed systems
- every acquisition of the lock must be preceded by a release
- variable x should be set to 0 only after variable y has been negative at least once
- message m can be discarded once it becomes stable at all destinations
Definition Observation O satisfies
iff there exists
such that
and
72Computing Possibly and Definitely
- Scan lattice level after level
- To compute Possibly(F)
- If F holds in one global state, then Possibly(F)
- To compute Definitely(F)
- At each level, keep the states reachable from S00 without passing through a state in which F holds
- If no such state, announce Definitely(F)
[Figure: lattice of global states, annotated with Possibly(x = y - 2) and Definitely(x = y)]
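The level-by-level scan can be sketched as follows. This is a minimal sketch under stated assumptions: a global state is a tuple of per-process event counts, `final` gives the total number of events of each process, `consistent` decides whether a tuple is a consistent cut (for instance via vector clocks), and `phi` is the predicate F. All names are illustrative, and the x and y values in the usage lines are hypothetical, with no messages between the two processes.

```python
def successors(cut, final):
    """Global states one level down: advance exactly one process by one event."""
    for i in range(len(cut)):
        if cut[i] < final[i]:
            yield cut[:i] + (cut[i] + 1,) + cut[i + 1:]


def possibly(phi, consistent, final) -> bool:
    """True if phi holds in at least one consistent global state."""
    level = {tuple(0 for _ in final)}
    while level:
        if any(phi(c) for c in level):
            return True
        level = {s for c in level for s in successors(c, final) if consistent(s)}
    return False


def definitely(phi, consistent, final) -> bool:
    """True if every path through the lattice meets a state where phi holds."""
    initial = tuple(0 for _ in final)
    # States reachable from the initial state without passing through phi.
    current = set() if phi(initial) else {initial}
    while current:
        if tuple(final) in current:
            return False  # some run reaches the end while avoiding phi
        current = {
            s
            for c in current
            for s in successors(c, final)
            if consistent(s) and not phi(s)
        }
    return True


if __name__ == "__main__":
    xs, ys = [0, 3, 4, 5], [10, 2, 4, 6]  # hypothetical values after k events
    x_equals_y = lambda c: xs[c[0]] == ys[c[1]]
    print(possibly(x_equals_y, lambda c: True, (3, 3)))
    print(definitely(x_equals_y, lambda c: True, (3, 3)))
```

Possibly stops at the first state that satisfies F, while Definitely has to keep track of every way of avoiding F, which is why it may need to scan the whole lattice.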
73Building the lattice
- p0 collects local states from each process
- For each pi, p0 keeps a queue Qi of pi's local states in FIFO order
How can we build level i+1 of the lattice, given level i?
74Smin and Smax
75Computing Smin and Smax
76Building the levels
- To build level l
- wait until states at end of each Qi have VC such that
- To build level l+1
- For each state on level l, build the global states obtained by advancing one process by one local state
- Then, using vector clocks, check whether these global states are consistent
77Multiple Monitors
- Create a group of monitor processes
- increased performance
- increased reliability
- Notify through a causal multicast to the group
- Each replica will construct a (possibly different) observation
- if the property is stable, once one monitor detects it, eventually all monitors do
- otherwise either use Possibly and Definitely
- or use causal atomic multicast
- What about failures?