Consistent Cuts and Un-coordinated Check-pointing


Slides: 34

Provided by: ping50

Learn more at: https://www.cs.cornell.edu

1
Consistent Cuts and Un-coordinated
Check-pointing
2
Cuts
[Figure: space-time diagram of the example computation, events e0 to e13]

Subset C of events in computation
some definitions require at least one event from each process
For each process P, events in C that executed on P form an initial prefix of all events that executed on P
Cut: {e0,e1,e2,e4,e7}. Not a cut: {e0,e2,e4,e7}
Frontier of cut: subset of cut containing last events on each process
for our example, {e2,e4,e7}
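
The prefix condition and the frontier can be checked mechanically. The sketch below is only an illustration: the assignment of events to processes p, q, r in PROCESS_EVENTS is a hypothetical layout chosen to agree with the examples on this slide, not one read off the figure.

```python
# Hypothetical layout (an assumption): events of each process in execution order.
PROCESS_EVENTS = {
    "p": ["e0", "e1", "e2", "e3"],
    "q": ["e4", "e5", "e6"],
    "r": ["e7", "e8", "e9", "e10"],
}


def is_cut(events):
    """True if, on every process, the chosen events form an initial prefix."""
    chosen = set(events)
    known = {e for seq in PROCESS_EVENTS.values() for e in seq}
    if not chosen <= known:
        return False
    for seq in PROCESS_EVENTS.values():
        count = sum(1 for e in seq if e in chosen)
        # The chosen events of this process must be exactly its first `count` events.
        if set(seq[:count]) != chosen & set(seq):
            return False
    return True


def frontier(cut):
    """Last event of the cut on each process that contributes at least one event."""
    chosen = set(cut)
    result = set()
    for seq in PROCESS_EVENTS.values():
        in_cut = [e for e in seq if e in chosen]
        if in_cut:
            result.add(in_cut[-1])
    return result
```

Under this assumed layout, {e0,e2,e4,e7} fails the check because e1 precedes e2 on the same process but is missing.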

3
Equivalent definition of cut
[Figure: space-time diagram of the example computation, events e0 to e13]

Subset C of events in computation
If e ∈ C, and e' → e, and e' and e executed on the same process, then e' ∈ C.
What happens if we remove the condition that e' and e were executed on the same process?

4
Consistent cut
[Figure: space-time diagram of the example computation, events e0 to e13]

Subset C of events in computation
If e ∈ C, and e' → e, then e' ∈ C
Consistent cut: {e0, e1, e2, e4, e5, e7}
note: e5 → e2, but cut is still consistent by our definition
Inconsistent cut: {e0,e1,e2,e4,e7}
Not a cut: {e0,e2,e4,e7}
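
A set is closed under happened-before exactly when it is closed under immediate predecessors (the previous event on the same process, and the send of any message received). The following sketch uses that observation. The process layout is a hypothetical assumption; the only message modelled is e5 → e2, which the slide mentions.

```python
# Hypothetical layout (an assumption): events of each process in execution order.
PROCESS_EVENTS = {
    "p": ["e0", "e1", "e2", "e3"],
    "q": ["e4", "e5", "e6"],
    "r": ["e7", "e8", "e9", "e10"],
}
# (send event, receive event); e5 -> e2 is the relation noted on the slide.
MESSAGES = [("e5", "e2")]


def immediate_predecessors(event):
    """Previous event on the same process, plus the send if `event` is a receive."""
    preds = []
    for seq in PROCESS_EVENTS.values():
        if event in seq and seq.index(event) > 0:
            preds.append(seq[seq.index(event) - 1])
    preds.extend(send for send, recv in MESSAGES if recv == event)
    return preds


def is_consistent_cut(events):
    """True if every immediate predecessor of a chosen event is also chosen."""
    chosen = set(events)
    return all(p in chosen for e in chosen for p in immediate_predecessors(e))
```

With these assumptions the three example sets on the slide come out as stated: the first is consistent, the second misses the send e5, and the third misses e1.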

5
Properties of consistent cuts(0)
[Figure: the example computation, with the two events used in the proof marked]

If cut is inconsistent, there must be a message such that receiving event is in C but sending event is not.
Proof: there must be an e and e' such that e' → e, e is in C but e' is not in C. Consider the chain e' → e0 → e1 → ... → e. There must be events ei → ej in this chain such that events e', e0, ..., ei are not in C, but ej is in C. Clearly, ei and ej must be executed by different processes (otherwise the prefix condition of a cut would put ei in C). Therefore, ei is a send and ej is the matching receive.

6
Properties of consistent cuts(I)
[Figure: space-time diagram of the example computation]

Let eP be a computational event on the frontier of a consistent cut C. If eP → eQ, then eQ cannot be in C.
Proof: Consider the causal chain eP → e1 → ... → eQ. Event e1 must execute on process P because eP is a computational event. If eP is on the frontier, e1 is not in C. By definition of consistent cut, eQ cannot be in the consistent cut.

7
Properties (II)
[Figure: space-time diagram of the example computation]

Let F = {e0, e1, ...} be a set of computational events, one from each process. F is the frontier of a consistent cut iff the events in F are concurrent.
Proof: from Property (I) and Property (0).
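
Property (II) reduces the frontier question to a pairwise concurrency test, which can be sketched as a reachability search over the happened-before relation. The event layout and the single message e5 → e2 below are illustrative assumptions, not the figure's actual contents.

```python
from itertools import combinations

# Hypothetical layout (an assumption): events of each process in execution order.
PROCESS_EVENTS = {
    "p": ["e0", "e1", "e2", "e3"],
    "q": ["e4", "e5", "e6"],
    "r": ["e7", "e8", "e9", "e10"],
}
MESSAGES = [("e5", "e2")]  # (send event, receive event)


def successors(event):
    """Events that directly follow `event`: message receive and next local event."""
    nexts = [recv for send, recv in MESSAGES if send == event]
    for seq in PROCESS_EVENTS.values():
        if event in seq and seq.index(event) + 1 < len(seq):
            nexts.append(seq[seq.index(event) + 1])
    return nexts


def happened_before(a, b):
    """True if there is a causal chain from a to b (depth-first search)."""
    stack, seen = [a], {a}
    while stack:
        for n in successors(stack.pop()):
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return False


def are_concurrent(events):
    """True if no event in the set happened before another one."""
    return all(
        not happened_before(a, b) and not happened_before(b, a)
        for a, b in combinations(events, 2)
    )
```

In this assumed layout, {e1, e4, e7} is concurrent and is the frontier of the consistent cut {e0, e1, e4, e7}, while {e2, e4, e7} is not concurrent because e4 → e5 → e2.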

8
Properties of consistent cuts (III): Lattice of consistent cuts
[Figure: the example computation with two consistent cuts, C1 and C2, marked]
9
Un-coordinated check-pointing

Each process saves its local state at start, and
then whenever it wants.
Events: compute, send, receive, take check-point
Recovery line: frontier of any consistent cut whose events are all check-points
Is there an optimum recovery line? How do we find it?

10
Check-point Dependency Graph
[Figure: example with processes p, q, r]

Nodes
One for each local check-point
One for current state of each surviving process
Edges: one for each message (e,e') from some P to Q
Source is node for last check-point on P that happened before e
Destination is node n on Q for first check-point/current state such that e' happened before n
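
The node and edge rules above can be sketched as a single pass over per-process event logs. The log format, the message ids m1 to m3 and the example run are hypothetical; the sketch also assumes every log starts with a check-point, as the slides require, and drops a message whose receiver failed before taking another check-point.

```python
def build_graph(logs, survivors):
    """Build the check-point dependency graph.

    logs: process -> ordered events, each ("ckpt",), ("send", m) or ("recv", m).
    survivors: processes that get an extra node for their current state.
    Returns (node_count per process, set of ((P, i), (Q, j)) edges).
    """
    node_count = {}
    send_source = {}  # message id -> node of last check-point before the send
    recv_dest = {}    # message id -> first node after the receive
    for proc, events in logs.items():
        ckpts = 0
        for ev in events:
            if ev[0] == "ckpt":
                ckpts += 1
            elif ev[0] == "send":
                send_source[ev[1]] = (proc, ckpts - 1)
            elif ev[0] == "recv":
                recv_dest[ev[1]] = (proc, ckpts)
        node_count[proc] = ckpts + (1 if proc in survivors else 0)
    edges = set()
    for msg, src in send_source.items():
        dst = recv_dest.get(msg)
        # Skip receives that no surviving node covers (receiver failed first).
        if dst is not None and dst[1] < node_count[dst[0]]:
            edges.add((src, dst))
    return node_count, edges


# Hypothetical run: q has failed, p and r survive.
LOGS = {
    "p": [("ckpt",), ("ckpt",), ("ckpt",), ("send", "m2"), ("recv", "m1")],
    "q": [("ckpt",), ("recv", "m3"), ("ckpt",), ("ckpt",), ("send", "m1")],
    "r": [("ckpt",), ("send", "m3"), ("ckpt",), ("recv", "m2")],
}
NODE_COUNT, EDGES = build_graph(LOGS, survivors={"p", "r"})
```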

11
Properties of check-point dependency graph
[Figure: check-point dependency graph for processes p, q, r]

Node c2 is reachable from node c1 in graph iff
check-point corresponding to c1 happens before
check-point corresponding to c2.

12
Finding optimum recovery line
[Figure: dependency graph for processes p, q, r with successive recovery lines RL0 to RL3]

RL0: last nodes on each process
While (there exist u,v in RLi such that v is reachable from u)
    RLi+1 = RLi - {v} + {node before v in same process as v}
Final RL when loop terminates is optimum recovery line
See later to make this into an algorithm.
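
A direct, unoptimized rendering of this loop is sketched below. The graph (NODE_COUNT, EDGES) is a hypothetical example; nodes are (process, index) pairs, and reachability follows both message edges and the implicit edge from each node to the next node on the same process.

```python
# Hypothetical graph (an assumption): p has nodes 0..3, q and r have nodes 0..2.
NODE_COUNT = {"p": 4, "q": 3, "r": 3}
EDGES = {(("q", 2), ("p", 3)), (("p", 2), ("r", 2)), (("r", 0), ("q", 1))}


def reachable(src, dst):
    """True if dst can be reached from src via process order and message edges."""
    stack, seen = [src], {src}
    while stack:
        proc, idx = stack.pop()
        succs = [d for s, d in EDGES if s == (proc, idx)]
        if idx + 1 < NODE_COUNT[proc]:
            succs.append((proc, idx + 1))
        for n in succs:
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return False


def recovery_line():
    """Start from the last node of each process and roll back until concurrent."""
    line = {proc: count - 1 for proc, count in NODE_COUNT.items()}  # RL0
    while True:
        bad = next(
            (pv for pu in line for pv in line
             if pu != pv and reachable((pu, line[pu]), (pv, line[pv]))),
            None,
        )
        if bad is None:
            return line
        # Initial check-points have no incoming message edges, so the
        # index never drops below 0 for a well-formed graph.
        line[bad] -= 1
```

On this example the loop rolls p back once (q's last check-point reaches p's current state) and then r once, ending at p2, q2, r1.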

13
Correctness
[Figure: check-point dependency graph for processes p, q, r]

Algorithm obviously computes a set of concurrent
check-points, one from each process.
From Property (II), it follows that these
check-points are frontier of a consistent cut.

14
Optimality
[Figure: check-point dependency graph for processes p, q, r]

Suppose O is a better recovery line.
O cannot be RL0; otherwise, our algorithm succeeds. So RL0 is better than O.
Consider the iteration when RLi is better than O but RLi+1 is not. There exist u,v in RLi such that v is reachable from u and RLi+1 is obtained from RLi by dropping v and taking the check-point prior to v. Therefore, v must be in O. Let x in O be the check-point on the same process as u. We see that x → u → v, which contradicts Property (II).

15
Finding recovery line efficiently
[Figure: check-point dependency graph for processes p, q, r]

Node colors
    Yellow: on current recovery line
    Red: beyond current recovery line
    Green: behind current recovery line
Bad edge
    Source is red/yellow
    Destination is yellow/green
Algorithm: propagate redness forward from destinations of bad edges

16
Algorithm

Mark all nodes green
For each node l that is last node of process
    Mark node yellow
    Add each edge (l,d) to worklist
While worklist is nonempty do
    Get edge (s,d) from worklist
    If color(d) is red, continue
    L = node to left of d
    Mark L yellow; add all bad edges (L,d') to worklist
    R = first red node to right of d
    For each node t in interval [d,R)
        Mark t red
        Add all bad edges of form (t,d') to worklist
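
One possible reading of this worklist algorithm is sketched below. The graph is a hypothetical example, and the interpretation that "bad edges" of a newly colored node means all of its outgoing edges (already-red destinations are skipped when popped) is an assumption.

```python
from collections import deque

# Hypothetical graph (an assumption): nodes are (process, index) pairs.
NODE_COUNT = {"p": 4, "q": 3, "r": 3}
EDGES = {(("q", 2), ("p", 3)), (("p", 2), ("r", 2)), (("r", 0), ("q", 1))}


def recovery_line_worklist(node_count, edges):
    """Color nodes green/yellow/red and return the yellow node of each process."""
    out_edges = {}
    for src, dst in edges:
        out_edges.setdefault(src, []).append(dst)
    color = {(p, i): "green" for p, n in node_count.items() for i in range(n)}
    worklist = deque()
    for p, n in node_count.items():
        last = (p, n - 1)
        color[last] = "yellow"
        worklist.extend((last, d) for d in out_edges.get(last, []))
    while worklist:
        _, (proc, idx) = worklist.popleft()
        if color[(proc, idx)] == "red":
            continue
        if idx == 0:
            raise ValueError("initial check-point must not receive a message edge")
        left = (proc, idx - 1)
        color[left] = "yellow"  # new recovery-line node on this process
        worklist.extend((left, d) for d in out_edges.get(left, []))
        t = idx
        # Mark the interval [d, R) red, where R is the first red node to the right.
        while t < node_count[proc] and color[(proc, t)] != "red":
            color[(proc, t)] = "red"
            worklist.extend(((proc, t), d) for d in out_edges.get((proc, t), []))
            t += 1
    return {p: i for (p, i), c in color.items() if c == "yellow"}
```

On the example graph this ends with yellow nodes p2, q2 and r1, the same line that repeated rollback from the last node of each process produces.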

17
Remarks

Complexity of algorithm: O(E+V)
Each node is touched at most 3 times: to mark it green, yellow, red
Each edge is examined at most twice
    Once when its source goes green → yellow
    Once when its source goes yellow → red
Another approach: use rollback dependency graph (see Alvisi et al)

18
Practical details

Each process numbers its checkpoints starting at
0.
When a message is sent from S to R, number of
last check-point is piggybacked on message.
Receiver of message saves message + piggyback in log.
When checkpoint is taken, message log is also
saved on disk.
In-flight messages can be recovered from this log
after recovery line has been established.
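
The bookkeeping described above might look like the following minimal sketch. The Process class, its method names and the in-memory saved dictionary standing in for disk are all hypothetical.

```python
class Process:
    """Toy process that numbers check-points and logs received messages."""

    def __init__(self, name):
        self.name = name
        self.checkpoint_no = -1
        self.message_log = []  # (sender, sender's check-point number, payload)
        self.saved = {}        # check-point number -> message log copy ("disk")
        self.take_checkpoint() # check-point 0 is taken at start

    def take_checkpoint(self):
        self.checkpoint_no += 1
        # The message log is saved together with the check-point.
        self.saved[self.checkpoint_no] = list(self.message_log)

    def send(self, receiver, payload):
        # Piggyback the number of the sender's last check-point on the message.
        receiver.receive(self.name, self.checkpoint_no, payload)

    def receive(self, sender, sender_ckpt, payload):
        self.message_log.append((sender, sender_ckpt, payload))
```

The piggybacked numbers identify the source node of each dependency-graph edge, and the logged payloads are what would be replayed for in-flight messages once a recovery line is chosen.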

19
Garbage collection of saved states

Garbage collection of old states is key problem.
One solution: run the recovery line algorithm once in a while even if there is no failure, and GC all states behind the recovery line.
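
As a small sketch of that garbage-collection step: given the saved states of each process and a recovery line, keep only the states on or after the line. The function name and the dictionary layout are assumptions made for illustration.

```python
def collect_garbage(saved_states, recovery_line):
    """Drop every saved state strictly behind the recovery line.

    saved_states: process -> {check-point number: state}
    recovery_line: process -> check-point number on the recovery line
    """
    return {
        proc: {n: s for n, s in states.items() if n >= recovery_line[proc]}
        for proc, states in saved_states.items()
    }
```

States behind the line can never be needed again, because any later recovery line is at or beyond the one just computed.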

20
Application-level Check-pointing
21
Recall

We have seen system-level check-pointing.
Trouble with system-level check-pointing:
    lot of data saved at each check-point
        PC, registers, stack, heap, some O/S state, network state, ...
        thin pipe to disk problem
    lack of portability
        processor/OS state is very implementation-specific
        cannot restart check-point on different platform
        cannot restart check-point on different number of processors
One alternative: application-level check-pointing

22
Application-level check-pointing

Key idea: permit user to specify
    what variables should be saved at a check-point
    program point where check-point should be taken
Example: protein-folding
    save only positions and velocities of bases
    check-point at end of time-step
Advantages
    less data saved
        only live data needs to be saved
        check-point at program points where live data is small and no in-flight messages
    data can be saved in implementation-independent manner
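
A toy version of the time-step example is sketched below: only the step counter, positions and velocities are written, at the end of each step, in a portable text format. The function run, the crash_after parameter used to simulate a failure, and the trivial "physics" are all hypothetical.

```python
import json
import os
import tempfile

DT = 0.5  # time-step size (arbitrary for this sketch)


def run(steps, path, crash_after=None):
    """Advance a toy simulation, check-pointing live data after every step."""
    if os.path.exists(path):
        with open(path) as f:  # restart: reload the saved variables
            state = json.load(f)
    else:
        state = {"step": 0, "pos": [0.0, 1.0], "vel": [1.0, -1.0]}
    while state["step"] < steps:
        state["pos"] = [x + DT * v for x, v in zip(state["pos"], state["vel"])]
        state["step"] += 1
        # Check-point at end of time-step; write to a temp file and rename
        # so a crash during the write cannot corrupt the previous check-point.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
        if crash_after is not None and state["step"] == crash_after:
            raise RuntimeError("simulated failure")
    return state
```

Because the saved file is plain JSON, the same check-point could in principle be reloaded on a different platform, which is the portability advantage listed above.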

23
Warning

This is more complex than it appears!
We must restore
    PC: need to save where check-point was taken
    registers
    stack
        In general, many active procedure invocations when check-point is taken.
        How do we restore stack so procedure returns etc. happen correctly?
    Heap: restored heap data will be in different locations than at check-point

24
Right intuition

In application-level check-pointing, we must use
the saved variables to recompute the system state
we would have saved in system-level
check-pointing, modulo relocation of heap
variables.
Recovery script
    code that is executed to accomplish this
    distinct from user code, but obviously derived from it
    however, needs to be woven into user code to simplify problems such as register restoration

25
Example: DOME (Beguelin et al., CMU)

Distributed Object Migration Environment (DOME)
C++ library of data parallel objects
automatically distributed over networks of heterogeneous work-stations
Application-level check-pointing and restart supported
User-level
Pre-processor based

26
Simple case

Most computation occurs in a loop in main
Solution
put one check-point at bottom of loop
live variables at bottom of loop are globals
write script to save and restore globals
weave script into main

27
Dome example

int main(int argc, char *argv[])
{
    dome_init(argc, argv);
    // statements are introduced for failure recovery
    // prefix d on variable type says "save me at checkpoint"
    dScalar<int> integer_variable;
    dScalar<float> float_variable;
    dVector<int> int_vector;
    if (!is_dome_restarting())
        execute_user_initialization_code();
    while (!loop_done()) {
        // loop_done uses only saved variables
        do_computation();
        dome_check_point();
    }
}

28
Analysis

Let us understand how this code restores processor state
    PC: we drop into loop after restoring globals
    registers: by making recovery script part of main, we ensure that register contents at top of loop are same for normal execution and for restart
    stack: we re-execute main, so frame is restored
    heap: restored from saved check-point but may be relocated
Think: this works even if we restart on a different machine!

29
Remarks

Loop body is allowed to make function calls
real restriction is that there is one check-point
and it must be in main
Command-line parameter is used to determine
whether execution is normal or restart
User must write some code to restore variables
from check-point
perhaps library code can help

30
More complex example

f()
{
    dScalar<int> i;
    do_f_stuff;
    g(i);
    next_statement;
}

g(dScalar<int> I)
{
    do_g_stuff_1;
    dome_checkpoint();
    do_g_stuff_2;
}

31
General scenario

Check-point could happen deep inside a bunch of
procedure calls.
On restart, we need to restore stack so procedure
returns etc. can happen normally.
Solution: save information about which procedure invocations are live at check-point
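
The idea of recording live invocations can be sketched without DOME. In the hypothetical Python below, call_stack plays the role of a push/pop record of live call sites, checkpoint saves it with the live variable, and a restart re-enters the same call sites while skipping work done before the check-point. All names are illustrative and none of this is DOME's API.

```python
call_stack = []       # labels of call sites that are live right now
trace = []            # records which pieces of work actually ran
checkpoint_data = {}  # what the last check-point saved


def checkpoint(variables):
    """Save the live call sites together with the live variables."""
    checkpoint_data["calls"] = list(call_stack)
    checkpoint_data["vars"] = dict(variables)


def g(i, pending):
    if pending is None:  # normal execution
        trace.append("g_stuff_1")
        checkpoint({"i": i})
    # A restart resumes here, right after the check-point.
    trace.append("g_stuff_2")


def f(pending=None):
    """pending is None for a normal run, else the saved list of live call sites."""
    if pending is None:
        i = 1
        trace.append("f_stuff")
        next_call = "g1"
    else:
        i = checkpoint_data["vars"]["i"]  # restore the saved variable
        next_call = pending.pop(0)        # call site that was live at check-point
    if next_call == "g1":
        call_stack.append("g1")
        g(i, pending)
        call_stack.pop()
    trace.append("next_statement")
```

A normal run is f(); a restart is f(list(checkpoint_data["calls"])), which rebuilds the f and g frames so that the return from g lands on next_statement as usual.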

32
Example with Dome constructs

f()
{
    dScalar<int> i;
    if (is_dome_restarting()) {
        next_call = dome_get_next_call();
        ..
    }
    do_f_stuff;
    dome_push(g1);
g1:
    g(i);
    dome_pop();
    next_statement;
}

g(dScalar<int> I)
{
    if (is_dome_restarting())
        goto restart_done;
    do_g_stuff_1;
    dome_checkpoint();
restart_done:
    do_g_stuff_2;
}

33
Challenge

Do this for MPI code.
Can compiler determine
where to check-point?
what data to check-point?
Need not save all data live at check-point
if some variables can be easily recomputed from
saved data and program constants, we can
re-compute those values in the recovery script.
we can modify program to make this easier.
Measure of success: beat hand-written recovery code