CS 141a Distributed Computation Laboratory: Detection Algorithms


Provided by: mcha96


1
Detecting Properties of Distributed Systems

Central ideas
Understanding global snapshots
Developing detection algorithms for different
problems starting from global snapshots.
Understanding logical clocks

2
REVIEW

Time lines
True snapshots taken at a time t.
Problem with absence of global clock.
Distributed snapshots.

3
Process Time Lines
[Figure: time lines for processes P, Q, and R along a time axis, showing state changes on each process time line and messages between time lines.]
4
Time Lines and True Snapshots
All processes take their local snapshots at
exactly the same time.
[Figure: process time lines with every local snapshot taken at the same time t.]
5
Time Lines and Distributed Snapshots
Processes take local snapshots at different times
that satisfy the criterion.
[Figure: process time lines divided into events BEFORE the snapshot and events AFTER the snapshot.]
6
KEY Property of Distributed Snapshots

All edges between BEFORE snapshot events and
AFTER snapshot events are directed from the BEFORE
snapshot event to the AFTER snapshot event.
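
The key property can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding that is not in the slides: events on each process are numbered from 0, a cut maps each process to its number of BEFORE events, and each message is a tuple (sender, send index, receiver, receive index).

```python
from typing import Dict, List, Tuple

# (sender, send_index, receiver, receive_index): hypothetical encoding.
Message = Tuple[str, int, str, int]


def is_consistent_cut(cut: Dict[str, int], messages: List[Message]) -> bool:
    """Return True if no message goes from an AFTER event to a BEFORE event.

    cut[p] is the number of events of process p that are BEFORE the
    snapshot, so event i of p is BEFORE exactly when i < cut[p].
    """
    for sender, s_idx, receiver, r_idx in messages:
        send_is_after = s_idx >= cut[sender]
        receive_is_before = r_idx < cut[receiver]
        if send_is_after and receive_is_before:
            return False  # an edge is directed from AFTER to BEFORE
    return True


# P's event 0 sends to Q's event 1; Q's event 2 sends to R's event 0.
messages = [("P", 0, "Q", 1), ("Q", 2, "R", 0)]
good_cut = {"P": 1, "Q": 2, "R": 0}
bad_cut = {"P": 0, "Q": 2, "R": 0}  # Q's receive is BEFORE, P's send is AFTER
```

Here `good_cut` satisfies the property, while `bad_cut` places a receive BEFORE the snapshot whose send is AFTER it.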

7
Time Lines and Distributed Snapshots
Directions of edges crossing the boundary are from
BEFORE events to AFTER events.
[Figure: process lines and message lines, with a boundary separating events BEFORE the snapshot from events AFTER the snapshot.]
8
Process States at Distributed Snapshots
[Figure: the BEFORE-AFTER boundary crossing the time lines of P, Q, and R. The snapshot state of each process is its state at the point where the boundary crosses its time line.]
9
Channel States at Distributed Snapshots
The state of the channel from P to Q is given by
the sequence of message lines from P to Q that
cross the BEFORE-AFTER boundary, that is, the
messages sent BEFORE the snapshot and received
AFTER it.
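
As an illustration, the channel state can be computed from a cut. This is a sketch under an assumed encoding (hypothetical, not from the slides): each message is (sender, send index, receiver, receive index), with a receive index of None while the message is still undelivered, and cut[p] is the number of BEFORE events on process p.

```python
from typing import Dict, List, Optional, Tuple

# (sender, send_index, receiver, receive_index or None): hypothetical encoding.
Message = Tuple[str, int, str, Optional[int]]


def channel_state(cut: Dict[str, int], messages: List[Message],
                  src: str, dst: str) -> List[Message]:
    """Messages on channel src -> dst that cross the BEFORE-AFTER boundary."""
    crossing = []
    for message in messages:
        sender, s_idx, receiver, r_idx = message
        if (sender, receiver) != (src, dst):
            continue
        sent_before = s_idx < cut[src]
        received_after = r_idx is None or r_idx >= cut[dst]
        if sent_before and received_after:
            crossing.append(message)
    # Report the messages in the order in which they were sent.
    return sorted(crossing, key=lambda m: m[1])


messages = [("P", 0, "Q", 0), ("P", 1, "Q", 3), ("P", 2, "Q", None)]
cut = {"P": 2, "Q": 1}
crossing = channel_state(cut, messages, "P", "Q")
```

With this cut, only the second message is in the channel: the first was received BEFORE the snapshot and the third was sent AFTER it.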
10
Start and Finish of Distributed Snapshots
[Figure: a time axis marking the start of the snapshot, the interval during which the snapshot is in progress, and the finish.]
11
Key Theorem

1. The distributed snapshot state is reachable from
the snapshot initiating state, and
2. the distributed snapshot state can reach the
snapshot terminating state.

12
Proof of Key Theorem
[Figure: depiction of a computation along a time axis, with BEFORE snapshot events in red and AFTER snapshot events in black.]
13
Proof of Key Theorem
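Given a computation, flipping the order of a
BEFORE event that immediately follows an AFTER
event gives us a new computation with the same
initial and final states.
[Figure: an adjacent pair consisting of an AFTER snapshot event followed by a BEFORE snapshot event, and the same pair after the flip.]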
14
Proof of Key Theorem

Cases: the AFTER event is a
message receive,
message send, or
autonomous (local) process event.

Cases: the BEFORE event is a
message receive,
message send, or
autonomous (local) process event.

15
Proof of Key Theorem

To make the flip we need only prohibit:
1. the BEFORE event receiving a message sent by the
AFTER event, and
2. the BEFORE event appearing later than the AFTER
event on the same process.

16
Proof of Key Theorem
Carry out flips of pairs of adjacent events where
a BEFORE event follows an AFTER event.
17
Proof of Key Theorem
Repeat such flips until all BEFORE events occur
before all AFTER events.
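
The repeated flipping can be sketched as a simple loop. This is an illustration only, assuming a hypothetical representation of a computation as a list of (event name, tag) pairs; it also assumes the tagging already satisfies the two prohibitions, which is what makes every flip legal.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (name, "BEFORE" or "AFTER"): hypothetical encoding


def flip_until_separated(computation: List[Event]) -> Tuple[List[Event], int]:
    """Swap adjacent (AFTER, BEFORE) pairs until none remain.

    Returns the reordered computation and the number of flips performed.
    """
    events = list(computation)
    flips = 0
    changed = True
    while changed:
        changed = False
        for i in range(len(events) - 1):
            if events[i][1] == "AFTER" and events[i + 1][1] == "BEFORE":
                events[i], events[i + 1] = events[i + 1], events[i]
                flips += 1
                changed = True
    return events, flips


computation = [("a", "BEFORE"), ("b", "AFTER"), ("c", "BEFORE"),
               ("d", "AFTER"), ("e", "BEFORE")]
reordered, flips = flip_until_separated(computation)
```

Each flip only exchanges one AFTER event with the BEFORE event right behind it, so the relative order of the BEFORE events among themselves, and of the AFTER events among themselves, is unchanged.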
18
Proof of Key Theorem
[Figure: the reordered computation, marking the point where the snapshot algorithm starts, the snapshot state, and the point where the snapshot algorithm ends.]
19
Proof of Key Theorem
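[Figure: the same computation, showing that the snapshot state is reachable from the state where the snapshot algorithm starts, and that the state where the snapshot algorithm ends is reachable from the snapshot state.]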
20
Deriving Snapshot Algorithms

We need only prohibit:
1. a BEFORE event receiving a message sent by an
AFTER event, and
2. a BEFORE event appearing later than an AFTER
event on the same process.

21
Snapshot Algorithm 1

Initiator takes its local snapshot and sends one
signal message on each of its outgoing channels.
When a process receives a signal for the first
time, it takes its local snapshot, sends a signal
on each of its outgoing channels, and records the
state of the channel on which it received its
first signal as being empty.
When a process receives a signal after it has
recorded its state, the process records the state
of the channel on which the signal was received
as the sequence of messages the process received
after it recorded its state and before it
received the signal.
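
The rules above can be exercised in a small simulation. This is a minimal sketch, not the course's implementation: the `Network` and `Process` classes, the integer "balance" used as process state, and the explicit `deliver` calls that stand in for a scheduler are all assumptions made for illustration. Channels are modeled as FIFO queues.

```python
from collections import deque

SIGNAL = "SIGNAL"  # the special snapshot message


class Process:
    def __init__(self, state):
        self.state = state          # application state (an integer here)
        self.recorded_state = None  # local snapshot, once taken
        self.recording = {}         # incoming source -> messages seen so far
        self.channel_state = {}     # incoming source -> recorded channel state


class Network:
    def __init__(self, initial):
        self.procs = {name: Process(s) for name, s in initial.items()}
        self.chan = {(a, b): deque() for a in initial for b in initial if a != b}

    def send(self, src, dst, amount):
        self.procs[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def _take_snapshot(self, name):
        proc = self.procs[name]
        proc.recorded_state = proc.state
        for (a, b), queue in self.chan.items():
            if b == name:
                proc.recording[a] = []   # start recording incoming channel
            if a == name:
                queue.append(SIGNAL)     # one signal per outgoing channel

    def start_snapshot(self, initiator):
        self._take_snapshot(initiator)

    def deliver(self, src, dst):
        message = self.chan[(src, dst)].popleft()
        proc = self.procs[dst]
        if message == SIGNAL:
            if proc.recorded_state is None:
                self._take_snapshot(dst)  # first signal: snapshot now
            # Channel state: messages received after recording and before
            # this signal (empty for the channel of the first signal).
            proc.channel_state[src] = proc.recording.pop(src)
        else:
            proc.state += message
            if src in proc.recording:
                proc.recording[src].append(message)


net = Network({"P": 100, "Q": 50})
net.send("P", "Q", 10)
net.send("Q", "P", 5)
net.start_snapshot("P")
net.deliver("Q", "P")  # P receives 5 after recording its state
net.deliver("P", "Q")  # Q receives 10 before its first signal
net.deliver("P", "Q")  # Q receives its first signal
net.deliver("Q", "P")  # P receives Q's signal
recorded_total = sum(p.recorded_state for p in net.procs.values()) + sum(
    sum(msgs) for p in net.procs.values() for msgs in p.channel_state.values())
```

In this run the recorded process states plus the recorded channel states add up to the initial total of 150, as expected of a state the computation could have passed through.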

22
Proof of Correctness

The algorithm ensures that the following never
happens:
1. a BEFORE event receiving a message sent by an
AFTER event, and
2. a BEFORE event appearing later than an AFTER
event on the same process.

23
Snapshot Algorithm 2

Logical clocks.
Each process ticks its local (integer) clock
forward by at least one with each event.
When a process sends a message, it timestamps the
message with its local clock at the time that it
sent the message.
When a receiver gets a message with timestamp T
the receiver ensures that its local clock upon
receiving the message is greater than T.
All processes take their local snapshots at the
same logical time.
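
The clock rules can be sketched as follows. This is an illustrative sketch; the class and function names are hypothetical, and treating an event as BEFORE when its timestamp is at most the chosen snapshot time is an assumed convention.

```python
class LogicalClock:
    """Integer logical clock following the three rules above."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Every event moves the local clock forward by at least one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # A send is an event; the message carries the resulting timestamp.
        return self.tick()

    def receive(self, message_timestamp: int) -> int:
        # The receive event gets a timestamp greater than the message's.
        self.time = max(self.time, message_timestamp) + 1
        return self.time


def is_before(timestamp: int, snapshot_time: int) -> bool:
    """Assumed convention: an event is BEFORE iff it is no later than T."""
    return timestamp <= snapshot_time


p, q = LogicalClock(), LogicalClock()
p.tick()                          # local event on P, clock becomes 1
sent_at = p.send()                # P sends with timestamp 2
received_at = q.receive(sent_at)  # Q's clock jumps to 3
```

Because a receive always has a larger timestamp than the matching send, no choice of snapshot time can make the receive a BEFORE event while the send is an AFTER event.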

24
Proof of Correctness

The algorithm ensures that the following never
happens:
1. a BEFORE event receiving a message sent by an
AFTER event, and
2. a BEFORE event appearing later than an AFTER
event on the same process.

25
Termination Detection

A distributed system is represented by a directed
graph where vertices represent processes and
directed edges represent directed channels.
The graph is fixed, so processes and channels are
not created or destroyed.
A process is either idle or active.
An idle process remains idle until the process
receives a message from any of its incoming
channels.
An active process can send messages at any time.
An active process can become idle at any time.

26
Termination Detection

Detect when all processes are idle and all
channels are empty.
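
As a global predicate, termination might be written as below. This is only a sketch of the condition itself, assuming a hypothetical omniscient view (a dictionary of process activity flags and a dictionary of channel queues); no single process in the distributed system can evaluate it directly, which is why a detection algorithm is needed.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def has_terminated(active: Dict[str, bool],
                   channels: Dict[Tuple[str, str], Deque[str]]) -> bool:
    """True when every process is idle and every channel is empty."""
    all_idle = not any(active.values())
    all_empty = all(len(queue) == 0 for queue in channels.values())
    return all_idle and all_empty


active = {"P": False, "Q": False}
channels = {("P", "Q"): deque(["m1"]), ("Q", "P"): deque()}
with_message_in_flight = has_terminated(active, channels)

channels[("P", "Q")].clear()  # assume the message was consumed and Q is idle again
with_empty_channels = has_terminated(active, channels)
```

All processes being idle is not enough: a message still in a channel can make its receiver active again.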

27
Termination Detection Algorithm 1
Given a distributed system represented by a
directed graph.
[Figure: a directed graph whose vertices are processes and whose edges are directed channels.]
28
Termination Detection Algorithm 1
The detector is a process that is part of the
operating system.
[Figure: the same graph of processes and directed channels, with a detector added and a channel from each client process to the detector.]
29
Termination Detection Algorithm 1

When an active process becomes idle it sends a
message to the detector (along the channel to the
detector). The message contains the number of
messages received on each incoming channel of the
process and the number of messages sent on each
outgoing channel of the process.
The detector has two local variables for each
channel c in the underlying distributed system:
numberSent[c] and numberReceived[c]. These are
the numbers of messages sent and received on
channel c, as reported in the messages that the
detector last received about channel c from
client processes.

30
Termination Detection Algorithm 1

When the detector receives a message from a
client process the detector updates its values of
numberSent and numberReceived to the values
in the message. For example, if the message from
the client reports 20 messages received on an
incoming channel c and 30 messages sent on an
outgoing channel d, then the detector sets its
local variables numberSent[d] to 30 and
numberReceived[c] to 20.

31
Termination Detection Algorithm 1

After updating its local variables the detector
checks whether, for all channels c,
numberSent[c] = numberReceived[c], and if so, the
detector claims termination.
Initial values of the local variables in the
detector are set to ensure that this termination
condition is not satisfied initially. For
example, set numberSent[c] to 2 and
numberReceived[c] to -1.
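
The detector's bookkeeping can be sketched as follows. This is a minimal sketch, assuming channels are identified by hypothetical (source, destination) pairs and that each client report arrives as two dictionaries; the class and method names are illustrative.

```python
from typing import Dict, Iterable, Tuple

Channel = Tuple[str, str]  # (source, destination): hypothetical encoding


class Detector:
    def __init__(self, channels: Iterable[Channel]) -> None:
        channels = list(channels)
        # Initial values chosen so the condition cannot hold initially.
        self.number_sent: Dict[Channel, int] = {c: 2 for c in channels}
        self.number_received: Dict[Channel, int] = {c: -1 for c in channels}

    def on_report(self, sent: Dict[Channel, int],
                  received: Dict[Channel, int]) -> bool:
        """Apply one client report; return True if termination is claimed."""
        self.number_sent.update(sent)          # counts for outgoing channels
        self.number_received.update(received)  # counts for incoming channels
        return all(self.number_sent[c] == self.number_received[c]
                   for c in self.number_sent)


detector = Detector([("P", "Q"), ("Q", "P")])
# P sent one message to Q, received nothing, and then became idle.
after_p = detector.on_report(sent={("P", "Q"): 1}, received={("Q", "P"): 0})
# Q received that message, sent nothing, and then became idle.
after_q = detector.on_report(sent={("Q", "P"): 0}, received={("P", "Q"): 1})
```

After P's report the detector still holds the initial values for Q's side of each channel, so it makes no claim; after Q's report every channel's counts agree and it claims termination.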

32
Termination Detection Algorithm 1

Note that the values of the local variables
numberSent[c] and numberReceived[c] are NOT, in
general, the number of messages sent on channel c
and the number of messages received on channel c
at this instant.
The messages from client processes to the
detector can take arbitrary (finite) time in
flight. So, by the time the message from the
client reaches the detector the message may be
old and not representative of the current
situation in the client.

Is this algorithm correct?

33
Proof Obligation

We must prove that the following never happens:
1. a BEFORE event receiving a message sent by an
AFTER event, and
2. a BEFORE event appearing later than an AFTER
event on the same process.
In this termination detection algorithm, what are
the points in the process time line at which each
process snapshots itself?

34
Proof Obligation
In this termination detection algorithm, what are
the points in the process time line at which each
process snapshots itself?
The detector claimed termination based on certain
values of numberSent[c] and numberReceived[c] for
all c. Let's say these values are NS[c] and
NR[c]. The detector set numberSent[c] to NS[c]
when it received a message from a client;
likewise it set numberReceived[c] to NR[c] when
it received a message from a client. The snapshot
point for a process is the point at which it sent
the message containing the values NS[c] or NR[c].
35
Proof
Proof by contradiction. Consider the first process
p to become active after its snapshot.
[Figure: process p time line, idle at its snapshot point and active at a later point.]
36
Proof
An idle process becomes active only when it
receives a message.
[Figure: process p time line, idle at its snapshot point and becoming active on receiving a message.]
37
Proof

The message was sent by a process either
1. after it took its snapshot, or
2. before it took its snapshot.
Let us consider each case in turn.
We will show that both cases are impossible.

38
Proof

CASE 1
Suppose the message was sent by a process q after
q took its snapshot.
When q took its snapshot q was idle (from the
algorithm).
When q sent the message, q was active (because
only active processes send messages).
So q became active after its snapshot and before
sending the message.
But p is the first process to become active after
its snapshot.
So q became active after p received the message.
So q sent the message after p received the
message. This is impossible.

39
Proof

CASE 2
Suppose a process q sent the message before q
took its snapshot.
The number of messages sent on the channel from q
to p in the message that q sends to the detector
includes this message that q sent to p.
The number of messages received on the channel
from q to p in the message that p sends to the
detector does not include this message that q
sent to p, because p received it after p took its
snapshot.
So for this channel c, NS[c] is not equal to
NR[c].
So, the detector could not have claimed
termination.

40
Proposed Variant of Detection Algorithm

Instead of keeping track of numbers of messages
sent and received on each channel separately,
keep track of ONLY the total number of messages
sent and received by each process on all channels.

41
The Variant

Previous Algorithm: When an active process
becomes idle it sends a message to the detector
(along the channel to the detector). The message
contains the number of messages received on each
incoming channel of the process and the number of
messages sent on each outgoing channel.
Variant: The message contains the TOTAL number of
messages received by the process on all its
incoming channels, and the total number of
messages sent on all its outgoing channels.

42
The Variant

Previous Algorithm: The detector has two local
variables for each channel c in the underlying
distributed system: numberSent[c] and
numberReceived[c]. These are the numbers of
messages sent and received on channel c, as
reported in the messages that the detector last
received about channel c from client processes.
Variant: The detector has two local variables for
each process p: totalNumberSent[p] and
totalNumberReceived[p].

43
Proposed Variant

The detector updates its (possibly out-of-date)
copy of totalNumberSent[p] and
totalNumberReceived[p] when it receives a message
from process p, and claims termination if
(sum over all p of totalNumberSent[p]) =
(sum over all p of totalNumberReceived[p]).
Is this algorithm correct?

44
Counterexample

The algorithm is incorrect. Here's a
counterexample with 3 processes P, Q, R with
channels between them. Initially all processes
are active and all channels are empty.
P sends a message to Q and becomes idle. When it
becomes idle it sends a message to the detector
with totalNumberSent[P] = 1 and
totalNumberReceived[P] = 0.
Q sends a message to P and becomes idle. When it
becomes idle it sends a message to the detector
with totalNumberSent[Q] = 1 and
totalNumberReceived[Q] = 0.

45
Counterexample (continued)

P gets the message from Q and becomes active. It
sends a message to R and P remains active.
Q gets the message from P and becomes active. It
sends a message to R and Q remains active.
R receives the messages from P and Q, and then R
becomes idle. When R becomes idle it sends a
message to the detector with
totalNumberSent[R] = 0 and
totalNumberReceived[R] = 2.
The detector now sees a total of 1 + 1 + 0 = 2
messages sent and 0 + 0 + 2 = 2 messages
received, so it claims termination though P and Q
are still active!
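
The counterexample can be replayed against both detectors. This is a sketch under stated assumptions: channels exist in both directions between every pair of processes, the variant's per-process variables start at 2 and -1 by analogy with Algorithm 1, and the function names and report encodings are hypothetical.

```python
from typing import Dict, List, Tuple

Channel = Tuple[str, str]  # (source, destination)


def variant_claims(reports: List[Tuple[str, int, int]],
                   processes: List[str]) -> bool:
    """Variant: one (process, total sent, total received) report at a time."""
    total_sent = {p: 2 for p in processes}       # assumed initial values
    total_received = {p: -1 for p in processes}
    claimed = False
    for process, sent, received in reports:
        total_sent[process] = sent
        total_received[process] = received
        if sum(total_sent.values()) == sum(total_received.values()):
            claimed = True
    return claimed


def per_channel_claims(
        reports: List[Tuple[Dict[Channel, int], Dict[Channel, int]]],
        channels: List[Channel]) -> bool:
    """Algorithm 1: per-channel counts, as (sent, received) dictionaries."""
    number_sent = {c: 2 for c in channels}
    number_received = {c: -1 for c in channels}
    claimed = False
    for sent, received in reports:
        number_sent.update(sent)
        number_received.update(received)
        if all(number_sent[c] == number_received[c] for c in channels):
            claimed = True
    return claimed


processes = ["P", "Q", "R"]
channels = [(a, b) for a in processes for b in processes if a != b]

# Reports in the order the detector receives them: P, then Q, then R.
variant_result = variant_claims([("P", 1, 0), ("Q", 1, 0), ("R", 0, 2)],
                                processes)
per_channel_result = per_channel_claims([
    ({("P", "Q"): 1, ("P", "R"): 0}, {("Q", "P"): 0, ("R", "P"): 0}),
    ({("Q", "P"): 1, ("Q", "R"): 0}, {("P", "Q"): 0, ("R", "Q"): 0}),
    ({("R", "P"): 0, ("R", "Q"): 0}, {("P", "R"): 1, ("Q", "R"): 1}),
], channels)
```

On this trace the variant claims termination, while the per-channel detector does not: for example, it records 1 message sent on the channel from P to Q but 0 received, a mismatch that the totals hide.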