CS 141a Distributed Computation Laboratory: Detection Algorithms

Slides: 46
Provided by: mcha96

1
Detecting Properties of Distributed Systems
  • Central ideas
  • Understanding global snapshots
  • Developing detection algorithms for different
    problems starting from global snapshots.
  • Understanding logical clocks

2
REVIEW
  • Time lines
  • True snapshots taken at a time t.
  • Problem with absence of global clock.
  • Distributed snapshots.

3
Process Time Lines
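[Figure: time lines for processes P, Q, and R. State changes are marked along each process time line, messages are drawn as edges between time lines, and time runs along the axis.]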
4
Time Lines and True Snapshots
All processes take their local snapshots at
exactly the same time t.
[Figure: process time lines with every local snapshot taken at time t.]
5
Time Lines and Distributed Snapshots
Processes take local snapshots at different times
that satisfy the criterion.
[Figure: process time lines divided into events BEFORE the snapshot and events AFTER the snapshot.]
6
KEY Property of Distributed Snapshots
  • All edges between BEFORE snapshot events and
    AFTER snapshot events are directed from the
    BEFORE snapshot event to the AFTER snapshot event.
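
The property can be checked mechanically. The following is a minimal sketch, not part of the original slides: it assumes each message is encoded as a hypothetical (send_event, receive_event) pair and that `before` is the set of events labelled BEFORE. It checks only message edges and assumes that the BEFORE events of each process already form a prefix of that process's time line.

```python
def is_consistent_cut(before, messages):
    """Return True if no message is sent AFTER the snapshot but received BEFORE it.

    before   -- set of event ids labelled BEFORE the snapshot (assumed encoding)
    messages -- iterable of (send_event, receive_event) pairs (assumed encoding)
    """
    for send, recv in messages:
        # An edge from an AFTER event into a BEFORE event violates the property.
        if send not in before and recv in before:
            return False
    return True


# Events are named "<process><index>": p1 sends to q2, and q1 sends to p2.
messages = [("p1", "q2"), ("q1", "p2")]
good_cut = {"p1", "q1"}  # both sends are BEFORE, both receives are AFTER
bad_cut = {"q1", "q2"}   # q2 (BEFORE) receives what p1 (AFTER) sent
```

With these hypothetical events, `good_cut` satisfies the property and `bad_cut` does not.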

7
Time Lines and Distributed Snapshots
Directions of edges crossing the boundary are from
BEFORE events to AFTER events.
[Figure: process lines and message lines, with a boundary separating events BEFORE the snapshot from events AFTER the snapshot.]
8
Process States at Distributed Snapshots
The state of each process (P, Q, and R) in the snapshot is its
state at the point where the BEFORE-AFTER boundary crosses its
time line.
[Figure: time lines for P, Q, and R with the snapshot point marked on each, separating events BEFORE the snapshot from events AFTER the snapshot.]
9
Channel States at Distributed Snapshots
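The state of the channel from P to Q is given by the sequence
of message lines from P to Q that cross the BEFORE-AFTER
boundary.
[Figure: message lines crossing the boundary between events BEFORE the snapshot and events AFTER the snapshot.]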
10
Start and Finish of Distributed Snapshots
[Figure: a time axis showing the start of the snapshot, the interval while the snapshot is in progress, and the finish.]
11
Key Theorem
  • 1. The distributed snapshot state is reachable from
    the snapshot-initiating state, and
  • 2. the distributed snapshot state can reach the
    snapshot-terminating state.

12
Proof of Key Theorem
[Figure: depiction of a computation along a time axis, with BEFORE snapshot events in red and AFTER snapshot events in black.]
13
Proof of Key Theorem
Given a computation, flipping the order of a
BEFORE event that immediately follows an AFTER event
gives us a new computation with the same initial and
final states.
[Figure: an adjacent AFTER snapshot event and BEFORE snapshot event exchanging places (flip).]
14
Proof of Key Theorem
  • Cases: the AFTER event is a message receive, a
    message send, or an autonomous (local) process
    event.
  • Cases: the BEFORE event is a message receive, a
    message send, or an autonomous (local) process
    event.

15
Proof of Key Theorem
  • To make the flip we need only prohibit
  • the BEFORE event receiving a message sent by the
    AFTER event
  • The BEFORE event appearing later than the AFTER
    event on the same process.

16
Proof of Key Theorem
Carry out flips of pairs of adjacent events where
a BEFORE event follows an AFTER event.
17
Proof of Key Theorem
Repeat such flips until all BEFORE events occur
before all AFTER events.
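
The repeated flipping can be sketched in a few lines. This is an illustration under stated assumptions rather than the slides' own code: a computation is modelled as a list of hypothetical (event_name, label) pairs, and the labelling is assumed to satisfy the snapshot criterion so that every flip is legal.

```python
def flip_to_snapshot_order(computation):
    """Flip adjacent (AFTER, BEFORE) pairs until every BEFORE event precedes every AFTER event.

    computation -- list of (event_name, label), label is "BEFORE" or "AFTER".
    Returns the reordered list and the number of flips performed.
    """
    events = list(computation)
    flips = 0
    changed = True
    while changed:
        changed = False
        for i in range(len(events) - 1):
            # A BEFORE event directly following an AFTER event: swap the pair.
            if events[i][1] == "AFTER" and events[i + 1][1] == "BEFORE":
                events[i], events[i + 1] = events[i + 1], events[i]
                flips += 1
                changed = True
    return events, flips


computation = [("a", "BEFORE"), ("b", "AFTER"), ("c", "BEFORE"),
               ("d", "AFTER"), ("e", "BEFORE")]
reordered, flips = flip_to_snapshot_order(computation)
```

Each flip only exchanges neighbours, so the relative order of the BEFORE events among themselves, and of the AFTER events among themselves, is preserved.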
18
Proof of Key Theorem
[Figure: a computation showing where the snapshot algorithm starts, the snapshot state, and where the snapshot algorithm ends.]
19
Proof of Key Theorem
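[Figure: the snapshot state is reachable from the state in which the snapshot algorithm starts, and the state in which the snapshot algorithm ends is reachable from the snapshot state.]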
20
Deriving Snapshot Algorithms
  • We need only prohibit
  • A BEFORE event receiving a message sent by an
    AFTER event
  • A BEFORE event appearing later than an AFTER
    event on the same process.

21
Snapshot Algorithm 1
  • Initiator takes its local snapshot and sends one
    signal message on each of its outgoing channels.
  • When a process receives a signal for the first
    time, it takes its local snapshot, sends a signal
    on each of its outgoing channels, and records the
    state of the channel on which it received its
    first signal as being empty.
  • When a process receives a signal after it has
    recorded its state, the process records the state
    of the channel on which the signal was received
    as the sequence of messages the process received
    after it recorded its state and before it
    received the signal.
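
The three rules above can be exercised in a small simulation. This is a minimal sketch, not the course's implementation: the `Network` and `Process` classes, the integer "balance" state, and the `transfer` helper are all hypothetical, and channels are assumed to be FIFO (an assumption the slides do not state explicitly).

```python
from collections import deque

SIGNAL = "SIGNAL"  # the signal message from the slides


class Process:
    def __init__(self, state):
        self.state = state          # application state (an integer balance here)
        self.recorded_state = None  # local snapshot, once taken
        self.recording = {}         # incoming channel -> messages seen since snapshot
        self.channel_state = {}     # incoming channel -> recorded channel state


class Network:
    def __init__(self, states, edges):
        self.procs = {name: Process(s) for name, s in states.items()}
        self.channels = {edge: deque() for edge in edges}  # (src, dst) -> FIFO queue

    def transfer(self, src, dst, amount):
        # Hypothetical application event: move `amount` from src to dst.
        self.procs[src].state -= amount
        self.channels[(src, dst)].append(amount)

    def take_snapshot(self, name):
        proc = self.procs[name]
        proc.recorded_state = proc.state
        for (src, dst), queue in self.channels.items():
            if dst == name:
                proc.recording[(src, dst)] = []  # start recording incoming channel
            if src == name:
                queue.append(SIGNAL)             # one signal per outgoing channel

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if msg == SIGNAL:
            if proc.recorded_state is None:
                self.take_snapshot(dst)          # first signal: snapshot now
            # Channel state = messages received after the snapshot, before the signal.
            proc.channel_state[(src, dst)] = proc.recording.pop((src, dst))
        else:
            proc.state += msg
            if (src, dst) in proc.recording:
                proc.recording[(src, dst)].append(msg)


net = Network({"P": 10, "Q": 20}, [("P", "Q"), ("Q", "P")])
net.transfer("P", "Q", 3)
net.transfer("Q", "P", 5)
net.take_snapshot("P")   # P is the initiator
net.deliver("Q", "P")    # P receives 5 after its snapshot
net.deliver("P", "Q")    # Q receives 3 before its snapshot
net.deliver("P", "Q")    # Q receives its first signal
net.deliver("Q", "P")    # P receives Q's signal
```

In this run the recorded process states plus the recorded channel states add up to the initial total of 30, even though no single instant of the computation was captured.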

22
Proof of Correctness
  • The algorithm ensures that the following never
    happens
  • A BEFORE event receiving a message sent by an
    AFTER event
  • A BEFORE event appearing later than an AFTER
    event on the same process.

23
Snapshot Algorithm 2
  • Logical clocks.
  • Each process ticks its local (integer) clock
    forward by at least one with each event.
  • When a process sends a message, it timestamps the
    message with its local clock at the time that it
    sent the message.
  • When a receiver gets a message with timestamp T
    the receiver ensures that its local clock upon
    receiving the message is greater than T.
  • All processes take their local snapshots at the
    same logical time.
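
The clock rules can be sketched as follows. This is an illustrative sketch, assuming one common way to satisfy the rules (tick by exactly one, and jump past the received timestamp); the class name `LogicalClock`, the `label` helper, and the convention that an event stamped at or below the snapshot time counts as BEFORE are assumptions, not taken from the slides.

```python
class LogicalClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock by at least one (exactly one here).
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is the message timestamp.
        return self.tick()

    def receive(self, timestamp):
        # Ensure the clock after the receive is greater than the timestamp.
        self.time = max(self.time, timestamp) + 1
        return self.time


def label(event_time, snapshot_time):
    """Classify an event relative to the agreed logical snapshot time (assumed convention)."""
    return "BEFORE" if event_time <= snapshot_time else "AFTER"


p, q = LogicalClock(), LogicalClock()
p.tick()
p.tick()
stamp = p.send()                 # p's third event
received_at = q.receive(stamp)   # q's clock jumps past the timestamp
```

Because a receive is always stamped later than the matching send, a message sent AFTER logical time S can never be received BEFORE S.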

24
Proof of Correctness
  • The algorithm ensures that the following never
    happens
  • A BEFORE event receiving a message sent by an
    AFTER event
  • A BEFORE event appearing later than an AFTER
    event on the same process.

25
Termination Detection
  • A distributed system is represented by a directed
    graph where vertices represent processes and
    directed edges represent directed channels.
  • The graph is fixed, so processes and channels are
    not created or destroyed.
  • A process is either idle or active.
  • An idle process remains idle until the process
    receives a message from any of its incoming
    channels.
  • An active process can send messages at any time.
  • An active process can become idle at any time.

26
Termination Detection
  • Detect when all processes are idle and all
    channels are empty.

27
Termination Detection Algorithm 1
Given a distributed system represented by a directed
graph.
[Figure: a directed graph whose vertices are processes and whose edges are directed channels.]
28
Termination Detection Algorithm 1
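The detector is a process that is part of the
operating system. There are channels from the
client processes to the detector.
[Figure: the graph of processes and directed channels, with an added channel from each client process to the detector.]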
29
Termination Detection Algorithm 1
  • When an active process becomes idle it sends a
    message to the detector (along the channel to the
    detector). The message contains the number of
    messages received on each of the process's
    incoming channels and the number of messages sent
    on each of its outgoing channels.
  • The detector has two local variables for each
    channel c in the underlying distributed system:
    numberSent[c] and numberReceived[c] are the values
    of the number of messages sent and received on
    channel c in the messages that the detector last
    received about channel c from client processes.

30
Termination Detection Algorithm 1
  • When the detector receives a message from a
    client process the detector updates its values of
    numberSent and numberReceived to the values
    in the message. For example, if the message from
    the client contains 20 messages received on an
    incoming channel c and 30 messages sent on an
    outgoing channel d, then the detector sets its
    local variables numberSent[d] to 30 and
    numberReceived[c] to 20.

31
Termination Detection Algorithm 1
  • After updating its local variables the detector
    checks whether numberSent[c] = numberReceived[c]
    for all channels c, and if so, the detector
    claims termination.
  • Initial conditions of local variables in the
    detector are set to ensure that this termination
    condition is not satisfied initially. For
    example, set numberSent[c] to 2 and
    numberReceived[c] to -1.
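
The detector's bookkeeping can be sketched as below. This is a minimal sketch under stated assumptions: the `Detector` class and its `report` method are hypothetical names, channels are identified by (sender, receiver) pairs, and the initial values 2 and -1 follow the example given on the slide.

```python
class Detector:
    def __init__(self, channels):
        # Initial values differ, so termination cannot be claimed at the start.
        self.number_sent = {c: 2 for c in channels}
        self.number_received = {c: -1 for c in channels}

    def report(self, sent, received):
        """Handle the message a client sends when it becomes idle.

        sent     -- {outgoing channel: messages sent on it}
        received -- {incoming channel: messages received on it}
        Returns True if the detector claims termination.
        """
        self.number_sent.update(sent)
        self.number_received.update(received)
        return all(self.number_sent[c] == self.number_received[c]
                   for c in self.number_sent)


pq, qp = ("P", "Q"), ("Q", "P")
detector = Detector([pq, qp])
# P sends one message to Q and becomes idle.
first = detector.report(sent={pq: 1}, received={qp: 0})
# Q receives that message, sends nothing, and becomes idle.
second = detector.report(sent={qp: 0}, received={pq: 1})
```

In this two-process run the first report does not trigger a claim because nothing is yet known about what Q sent; the second report makes the counts agree on both channels.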

32
Termination Detection Algorithm
  • Note that the values of the local variables
    numberSent[c] and numberReceived[c] are NOT, in
    general, the number of messages sent on channel c
    and the number of messages received on channel c
    at this instant.
  • The messages from client processes to the
    detector can take arbitrary (finite) time in
    flight. So, by the time the message from the
    client reaches the detector the message may be
    old and not representative of the current
    situation in the client.
  • Is this algorithm correct?

33
Proof Obligation
  • We must prove that the following never happens
  • A BEFORE event receiving a message sent by an
    AFTER event
  • A BEFORE event appearing later than an AFTER
    event on the same process.
  • In this termination detection algorithm, what are
    the points in the process time line at which each
    process snapshots itself?

34
Proof Obligation
In this termination detection algorithm, what are
the points in the process time line at which each
process snapshots itself?
The detector claimed termination based on certain
values of numberSent[c] and numberReceived[c] for
all c. Let's say these values are NS[c] and
NR[c]. The detector set numberSent[c] to NS[c]
when it received a message from a client;
likewise it set numberReceived[c] to NR[c] when
it received a message from a client. The snapshot
point for a process is the point at which it sent
the message containing the values NS[c] or NR[c].
35
Proof
Proof by contradiction. Consider the first process p
to become active after its snapshot.
[Figure: the time line of process p, showing its snapshot point, where p is idle, and a later point where p is active.]
36
Proof
An idle process becomes active only when it
receives a message.
[Figure: process p is idle at its snapshot point and becomes active at the point where a message arrives.]
37
Proof
  • The message was sent by a process either
  • After it took its snapshot or
  • Before it took its snapshot.
  • Let us consider each case in turn.
  • We will show that both cases are impossible.

38
Proof
  • CASE 1
  • Suppose the message was sent by a process q after
    q took its snapshot.
  • When q took its snapshot q was idle (from the
    algorithm).
  • When q sent the message, q was active (because
    only active processes send messages).
  • So q became active after its snapshot and before
    sending the message.
  • But p is the first process to become active after
    its snapshot.
  • So q became active after p received the message.
  • So q sent the message after p received the
    message. This is impossible.

39
Proof
  • CASE 2: Suppose a process q sent the message
    before q took its snapshot.
  • The number of messages sent on the channel from q
    to p in the message that q sends to the detector
    includes this message that q sent to p.
  • The number of messages received on the channel
    from q to p in the message that p sends to the
    detector does not include this message that q
    sent to p.
  • So for this channel c, NS[c] is not equal to
    NR[c].
  • So, the detector could not have claimed
    termination.

40
Proposed Variant of Detection Algorithm
  • Instead of keeping track of numbers of messages
    sent and received on each channel separately,
    keep track of ONLY the total number of messages
    sent and received by each process on all channels.

41
The Variant
  • Previous Algorithm: When an active process
    becomes idle it sends a message to the detector
    (along the channel to the detector). The message
    contains the number of messages received on each
    of the process's incoming channels and the number
    of messages sent on each of its outgoing channels.
  • Variant: The message contains the TOTAL number of
    messages received by the process on all its
    incoming channels, and the total number of
    messages sent on all its outgoing channels.

42
The Variant
  • Previous Algorithm: The detector has two local
    variables for each channel c in the underlying
    distributed system: numberSent[c] and
    numberReceived[c] are the values of the number of
    messages sent and received on channel c in the
    messages that the detector last received about
    channel c from client processes.
  • Variant: The detector has two local variables,
    totalNumberSent[p] and totalNumberReceived[p],
    for each process p.

43
Proposed Variant
  • The detector updates its (possibly out-of-date)
    copy of totalNumberSent[p] and
    totalNumberReceived[p] when it receives a message
    from process p, and claims termination if the
    sum over all p of totalNumberSent[p] equals the
    sum over all p of totalNumberReceived[p].
  • Is this algorithm correct?

44
Counterexample
  • The algorithm is incorrect. Here's a
    counterexample with 3 processes P, Q, R with
    channels between them. Initially all processes
    are active and all channels are empty.
  • P sends a message to Q and becomes idle. When it
    becomes idle it sends a message to the detector
    with totalNumberSent[P] = 1 and
    totalNumberReceived[P] = 0.
  • Q sends a message to P and becomes idle. When it
    becomes idle it sends a message to the detector
    with totalNumberSent[Q] = 1 and
    totalNumberReceived[Q] = 0.

45
Counterexample (continued)
  • P gets the message from Q and becomes active. It
    sends a message to R and P remains active.
  • Q gets the message from P and becomes active. It
    sends a message to R and Q remains active.
  • R receives the messages from P and Q, and then R
    becomes idle. When R becomes idle it sends a
    message to the detector with totalNumberSent[R] =
    0 and totalNumberReceived[R] = 2.
  • The detector claims termination though P and Q
    are still active!
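
The counterexample can be replayed against a sketch of the variant detector. The `TotalsDetector` class is hypothetical, and because the slides do not give initial conditions for the variant, this sketch assumes the detector makes no claim until every process has reported at least once.

```python
class TotalsDetector:
    def __init__(self, processes):
        # None marks "no report yet" (an assumed initial condition).
        self.total_sent = {p: None for p in processes}
        self.total_received = {p: None for p in processes}

    def report(self, process, sent, received):
        """Record a process's totals; return True if termination is claimed."""
        self.total_sent[process] = sent
        self.total_received[process] = received
        if None in self.total_sent.values():
            return False
        return sum(self.total_sent.values()) == sum(self.total_received.values())


detector = TotalsDetector(["P", "Q", "R"])
claims = [
    detector.report("P", sent=1, received=0),  # P sent to Q, then became idle
    detector.report("Q", sent=1, received=0),  # Q sent to P, then became idle
    # P and Q wake up, each sends to R, and both stay active (no new reports).
    detector.report("R", sent=0, received=2),  # R received both, then became idle
]
still_active = {"P", "Q"}
```

The last report makes both sums equal to 2, so the detector claims termination while P and Q are still active: the totals hide which channel each message travelled on, and the stale reports from P and Q are never corrected.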