Title: Snapshots
1 Snapshots
- Goal: capture the current configuration of a distributed system
- Motivation
- analyze the properties of the computation
- detect stable properties, e.g. termination, deadlock, loss of tokens, non-reachability of objects (garbage collection)
- failure recovery
- debugging
- Problems
- WHAT is a snapshot?
- HOW to construct it?
2 Preliminaries
- Local snapshot
- snapshot state (local state, communication history)
- preshot and postshot events
- sent_{p,q}, recv_{p,q}
- Global snapshot
- collection of local snapshots
- Feasible (Global) snapshot
- for all neighbours p, q: recv_{p,q} is a subset of sent_{p,q}
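The feasibility condition can be checked mechanically once the local snapshots are collected. The following is a minimal sketch, assuming each local snapshot records, per channel, the set of message identifiers sent or received before the snapshot; the function name `is_feasible`, the dictionary layout, and the message names are illustrative assumptions, not part of the slides.

```python
from typing import Dict, Set, Tuple

Channel = Tuple[str, str]  # (sender p, receiver q); hypothetical representation


def is_feasible(sent: Dict[Channel, Set[str]], recv: Dict[Channel, Set[str]]) -> bool:
    """Return True if recv[p, q] is a subset of sent[p, q] on every channel."""
    for channel, received in recv.items():
        if not received <= sent.get(channel, set()):
            return False
    return True


# Hypothetical two-process example: p sent m1 and m2 to q before its local snapshot.
sent = {("p", "q"): {"m1", "m2"}, ("q", "p"): set()}
recv_ok = {("p", "q"): {"m1"}, ("q", "p"): set()}          # m2 is still in transit
recv_bad = {("p", "q"): {"m1", "m3"}, ("q", "p"): set()}   # m3 received but not recorded as sent

feasible_ok = is_feasible(sent, recv_ok)
feasible_bad = is_feasible(sent, recv_bad)

# In a feasible snapshot, the difference is the recorded content of the channel.
in_transit = sent[("p", "q")] - recv_ok[("p", "q")]
```

In a feasible snapshot the difference between the two sets on a channel can be read as the messages in transit at the time of the snapshot.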
3 Preliminaries II
- Consistent cut L
- if an event is in L, then all events that causally precede it are also in L
- Meaningful snapshot (in computation C)
- a snapshot that captures a configuration that occurs in some execution of the computation C
- Lemma: Let S be a snapshot and let L be the cut implied by S. The following three statements are equivalent:
- S is feasible
- L is a consistent cut
- S is meaningful
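To make the consistent-cut definition concrete, here is a small sketch that tests whether a set of events is closed under causal precedence. It assumes the computation is given as a map from each event to its immediate causal predecessors (the previous event on the same process and, for a receive, the matching send); the event names and the function `is_consistent_cut` are hypothetical.

```python
from typing import Dict, List, Set


def is_consistent_cut(cut: Set[str], direct_pred: Dict[str, List[str]]) -> bool:
    """Return True if the cut contains the immediate predecessors of all its events.

    Closure under immediate predecessors implies closure under all causal
    predecessors, because causal precedence is their transitive closure.
    """
    return all(pred in cut for event in cut for pred in direct_pred.get(event, []))


# Hypothetical computation: p executes a1 then a2 (a2 sends m);
# q executes b1 then b2 (b2 receives m).
direct_pred = {
    "a1": [],
    "a2": ["a1"],
    "b1": [],
    "b2": ["b1", "a2"],
}

consistent = is_consistent_cut({"a1", "a2", "b1"}, direct_pred)    # m is in transit
inconsistent = is_consistent_cut({"a1", "b1", "b2"}, direct_pred)  # receive without its send
```

The second cut contains the receive of m but not its send, which matches the lemma: the corresponding snapshot would record m as received but not as sent, so it is not feasible.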
4 Snapshot Algorithms
- It is sufficient to ensure that
- a local snapshot is taken in each process
- no postshot message is received in a preshot event
- Chandy-Lamport algorithm
- assumes FIFO channels
- Lai-Yang algorithm
- does not assume FIFO channels, but piggybacking must be possible
- Taylor: if neither FIFO channels nor piggybacking is available, every snapshot algorithm must be inhibitory (it must temporarily delay events of the basic computation).
5 Chandy-Lamport Algorithm
Idea: flood the network; take the local snapshot when the first take_snapshot message is received.
Spontaneous wake-up:
    take local snapshot
    send(take_snapshot) to all neighbours
Upon receiving a take_snapshot message:
    if no snapshot has been taken yet then
        take local snapshot
        send(take_snapshot) to all neighbours except the one from which the message arrived
Complexity (overhead): at most 2E take_snapshot messages, where E is the number of channels.
Correctness: follows from FIFO, as no postshot message can overtake the take_snapshot message.
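The flooding of take_snapshot messages can be simulated in a few lines. This is a sketch under simplifying assumptions: a single initiator, only control messages (no basic messages and no recording of channel contents), and one FIFO queue per directed channel. The function `flood_snapshot` and the ring topology are illustrative, not taken from the slides.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


def flood_snapshot(neighbours: Dict[str, List[str]], initiator: str) -> Tuple[Set[str], int]:
    """Simulate the flood; return (nodes that took a snapshot, messages sent)."""
    # One FIFO queue per directed channel (sender, receiver).
    channels: Dict[Tuple[str, str], Deque[str]] = {
        (p, q): deque() for p in neighbours for q in neighbours[p]
    }
    taken: Set[str] = set()
    messages_sent = 0

    def take_and_forward(node: str, skip: Optional[str] = None) -> None:
        nonlocal messages_sent
        taken.add(node)  # stands in for recording the local state
        for q in neighbours[node]:
            if q != skip:
                channels[(node, q)].append("take_snapshot")
                messages_sent += 1

    take_and_forward(initiator)  # spontaneous wake-up

    # Keep delivering until every channel is empty.
    progress = True
    while progress:
        progress = False
        for (p, q), queue in channels.items():
            if queue:
                queue.popleft()
                progress = True
                if q not in taken:
                    take_and_forward(q, skip=p)
    return taken, messages_sent


# Hypothetical ring of four nodes, so E = 4 undirected channels.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
snapshotted, message_count = flood_snapshot(ring, "a")
```

Every node in a connected network ends up in `snapshotted`, and `message_count` stays within the 2E bound because each directed channel carries at most one take_snapshot message.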
6 Lai-Yang Algorithm
- Idea: each node p maintains a variable taken_p recording whether it has taken its snapshot. This flag is appended to each message sent.
- A node takes its snapshot either spontaneously or when it receives the first message with taken = true. (The snapshot is taken before processing the contents of that message.)
- Correctness
- no postshot message is processed before a local snapshot is taken
- no assurance that each node will take a local snapshot
- initial flooding of control messages may be necessary
- Complexity: if no initial flooding is needed, then just one piggybacked bit on each message; otherwise the cost of the initial flooding
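The piggybacking rule can be illustrated with a toy node class. This is a sketch only: the class `LaiYangNode`, its counter-valued local state, and the message tuple layout are assumptions made for the example, and channel-state recording is left out.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class LaiYangNode:
    state: int = 0                   # toy local state: number of processed messages
    taken: bool = False              # the flag piggybacked on every message
    snapshot: Optional[int] = None   # recorded local state, if any

    def take_snapshot(self) -> None:
        if not self.taken:
            self.taken = True
            self.snapshot = self.state

    def send(self, payload: Any) -> Tuple[bool, Any]:
        # Piggyback the current flag on the basic message.
        return (self.taken, payload)

    def receive(self, message: Tuple[bool, Any]) -> None:
        sender_taken, _payload = message
        if sender_taken:
            self.take_snapshot()     # snapshot first, then process the postshot message
        self.state += 1              # toy processing of the payload


p, q, r = LaiYangNode(), LaiYangNode(), LaiYangNode()
early = p.send("preshot")    # sent before p's snapshot, so the flag is False
p.take_snapshot()            # spontaneous snapshot
late = p.send("postshot")    # the flag is now True
q.receive(late)              # overtakes the earlier message (no FIFO assumed)
q.receive(early)
```

Here q records its state before processing the postshot message even though that message overtook a preshot one, while r, which receives nothing, never takes a snapshot; this is the case where an initial flood of control messages is needed.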
7 Using Snapshots: Deadlock Detection
- Model of the basic algorithm
- when a process p becomes blocked, it sends requests to the nodes in Req_p
- the predicate Free_p defines the subsets of Req_p from which grant messages are sufficient to free p
- a node collects requests in a set Pend_p; it can grant a request (send back a grant message) only if it is active
- request and grant numbers are attached to the messages, so that a node can discard obsolete grant messages
- Deadlock
- a node p is alive if there is a reachable configuration in which p is active
- p is deadlocked if it is not alive
- a configuration is deadlocked if it contains a deadlocked process
8 Deadlock Detection
- Requirements
- non-interference: do not influence the basic algorithm
- liveness: a deadlock will be detected if present
- safety: the algorithm detects a deadlock only if there indeed is one
- usually we also want to report which nodes are deadlocked
9 Global Marking Algorithm
- Overall structure
- repeatedly take a snapshot and check it for deadlock
- Checking idea
- simulate the propagation of grants from the snapshot on, assuming every active process grants all requests it has received
- Algorithm
- At an active node
- send an alive message to all nodes that sent requests to me
- At a blocked node p
- receive alive messages; if the predicate Free_p is satisfied, become active
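The marking step can be sketched as a fixpoint computation over a collected snapshot. The sketch assumes a centralized check, that every request in the snapshot has reached its target, and that each predicate is monotone (more grants never hurt). The names `find_deadlocked`, `all_of`, `any_of` and the five-node example are hypothetical.

```python
from typing import Callable, Dict, Set

Predicate = Callable[[Set[str]], bool]


def all_of(needed: Set[str]) -> Predicate:
    """Free predicate that requires grants from every node in `needed`."""
    return lambda granted: needed <= granted


def any_of(needed: Set[str]) -> Predicate:
    """Free predicate that requires a grant from at least one node in `needed`."""
    return lambda granted: bool(needed & granted)


def find_deadlocked(
    active: Set[str],
    requests: Dict[str, Set[str]],
    free: Dict[str, Predicate],
) -> Set[str]:
    """Return the blocked nodes that never become active in the simulation.

    active:   nodes that are active in the snapshot
    requests: requests[p] is the set of nodes that blocked node p waits for
    free:     free[p] is the predicate deciding whether p is freed
    """
    alive = set(active)
    changed = True
    while changed:
        changed = False
        for p, req in requests.items():
            if p in alive:
                continue
            # Every alive node that p waits for sends p an alive message.
            if free[p](req & alive):
                alive.add(p)
                changed = True
    return set(requests) - alive


# Hypothetical snapshot: a is active; b, c, d, e are blocked.
requests = {"b": {"a"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b", "c"}}
free = {
    "b": all_of({"a"}),
    "c": all_of({"b", "d"}),
    "d": any_of({"c"}),
    "e": any_of({"b", "c"}),
}
deadlocked = find_deadlocked({"a"}, requests, free)
```

In this example b and e become active during the simulation, while c and d wait for each other and remain blocked, so they are the nodes reported as deadlocked.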