CSCI 633: Advanced Operating Systems II Dept. of Computer Science CSU San Marcos - PowerPoint PPT Presentation

Slides: 55
Provided by: drgoldengr

1
CSCI 633: Advanced Operating Systems II
Dept. of Computer Science, CSU San Marcos
  • Fall 2003
  • Kayhan Erciyes

2
Theory
3
Model
  • Distributed system model
  • n processes connected by a communications network
  • imperfectly synchronized clocks, thus no exact
    notion of what time it is at other processes
  • reliable message delivery
  • may or may not bound message delivery time
  • depending on what problem we're examining
  • depending on how practical we want the solution to be
  • no shared memory

4
Fundamental Limitations
  • Distributed systems are assumed to have no global
    clock
  • Distributed systems have no shared memory
  • These limitations force us to be creative in
    solving the following fundamental problems (among
    others)
  • Ordering events
  • Capturing a global state
  • Ensuring mutual exclusion

5
Time in Distributed Systems
  • Three notions of time
  • Time seen by an external observer: a global clock
    of perfect accuracy
  • Time seen on the clocks of individual hosts: each
    has its own clock, and clocks may drift out of
    sync
  • Logical time: event a occurs before event b, and
    this is detectable because information about a
    may have reached b
  • b is causally dependent on a

6
External Time
  • The gold standard against which many protocols
    are defined
  • Not implementable, because signals travel at
    finite speed
  • GPS helps, but there is still latency
  • and GPS only works outdoors

7
Internal Clocks
  • Most workstations have reasonable clocks
  • But clocks drift apart and software
    resynchronization is inaccurate
  • Workstation clocks are appropriate only for
    coarse-grained notion of time
  • Unpredictable speeds are a feature of all computing
    systems
  • Thus we can't reliably predict how long events will
    take
  • Example: how long does a message transmission take?
  • Example: how long does it take to compute n!?
  • Scheduling issues, daemon processes, other
    applications
  • Artificially choose bounds, which trade off
    performance for safety
  • "If I don't receive a response in 15 seconds, the
    server is down"

8
Logical Time
  • Abstraction
  • Doesn't provide a clock in the sense of wall-clock
    time
  • Focus is on defining the "happens before"
    relationship between events
  • Could what's happening in this event have been
    influenced by what happened in that event?
  • Events are:
  • message sends
  • message receptions
  • local events

9
Logical Time: The Picture
[Figure: space-time diagram of events a, b, c, d, e on processes p0, p1, p2, p3. The pairs a,b and a,c are concurrent; c happens after b; e happens after a, b, c, d.]
10
Happened-Before Relation
  • The happened-before relation ( → ) captures the
    causal relationships between events
  • a → b if a and b are events in the same process
    and a occurred before b
  • a → b if a is the send event of a message m and b
    is the corresponding receive event
  • → is a transitive relation
  • Event a potentially causally affects event b if
    a → b
  • Event a is concurrent with event b if neither
    a → b nor b → a

11
Three Important Schemes
  • Lamport's logical clocks
  • clock is a single integer
  • Vector clocks
  • clock is a vector of integers with length n
  • n = number of processes
  • Matrix clocks
  • clock is an n x n matrix of integers
  • n = number of processes
  • Useful for ordered multicast (covered later)
  • Optimizations for the latter two schemes to
    reduce overheads are important!

12
Lamport's Clocks
  • Lamport's clocks capture the partial ordering
    defined by the → relation
  • Lamport's logical clocks
  • Each process i maintains a counter Ci whose
    current value is attached to each message sent
  • Internal events in a process cause the counter Ci
    to be incremented
  • When a message m sent by process j is received by
    process i, Ci is set to max(m.Cj, Ci) + 1

13
Lamport's Clock
  • Happened-before relation
  • a → b: event a occurred before event b, for events
    in the same process.
  • a → b: a is the event of sending a message m
    in a process and b is the event of receipt of
    the same message m by another process.
  • If a → b and b → c, then a → c; → is transitive.
  • Causally ordered events
  • a → b: event a causally affects event b
  • Concurrent events
  • Exactly one holds: a → b, or b → a, or a || b
    (a and b are concurrent).

14
Space-time Diagram
[Figure: space-time diagram with events e11 to e14 on process P1 and e21 to e24 on process P2; axes labeled Space and Time.]
15
Logical Clocks
  • Conditions satisfied
  • Ci is the clock in process Pi.
  • If a → b in process Pi, then Ci(a) < Ci(b).
  • If a is the sending of message m in Pi and b is the
    receipt of m in Pj, then Ci(a) < Cj(b).
  • Implementation rules
  • R1: Ci := Ci + d (d > 0). The clock is updated
    between two successive events.
  • R2: Cj := max(Cj, tm + d) (d > 0), when Pj
    receives a message m with timestamp tm (tm is
    assigned by Pi, the sender: tm = Ci(a), a being
    the event of sending message m).
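
The two implementation rules can be sketched in a few lines of Python. This is a minimal illustration, assuming d = 1 and assuming that a receive first ticks the local clock (R1) before applying R2; the class and method names are hypothetical and not part of the slides.

```python
class LamportClock:
    """Scalar logical clock following rules R1 and R2 (d defaults to 1)."""

    def __init__(self, d=1):
        self.d = d
        self.time = 0

    def tick(self):
        # R1: advance the clock between two successive events.
        self.time += self.d
        return self.time

    def send(self):
        # Sending is an event: tick, then use the new value as timestamp tm.
        return self.tick()

    def receive(self, tm):
        # R1 followed by R2: the receive event ends up later than both the
        # previous local event and the send event of the message.
        self.time = max(self.time + self.d, tm + self.d)
        return self.time


p1, p2 = LamportClock(), LamportClock()
p1.tick()                  # local event at P1, clock becomes 1
tm = p1.send()             # send event at P1, timestamp 2
p2.tick()                  # local event at P2, clock becomes 1
t_recv = p2.receive(tm)    # receive at P2: max(1 + 1, 2 + 1) = 3
```

With d = 1 the receive step equals max(m.Cj, Ci) + 1, the form used on the earlier Lamport slide.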

16
Space-time Diagram
[Figure: space-time diagram with events e11 to e17 on P1 and e21 to e25 on P2, each annotated with its Lamport timestamp (values between 1 and 7); axes labeled Space and Time.]
17
Limitation of Lamport's Clock
[Figure: space-time diagram with events e11 to e14 on P1, e21 to e24 on P2, and e31 to e33 on P3; axes labeled Space and Time.]
18
Vector Clocks
  • Keep track of transitive dependencies among
    processes for recovery purposes.
  • Ci[1..n] is the vector clock at process Pi, whose
    entries are the assumed (best-guess) clock
    values of the different processes.
  • Ci[j] (j ≠ i) is Pi's best guess of Pj's
    clock.
  • Vector clock rules
  • Ci[i] := Ci[i] + d (d > 0) for successive
    events in Pi
  • For all k, Cj[k] := max(Cj[k], tm[k]) when a
    message M with timestamp tm is received by Pj
    from Pi.
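
A small Python sketch of the two vector clock rules follows. It assumes d = 1, zero-based process indices, and that a receive counts as an event of its own (so the receiver ticks its own entry before merging); the class name and methods are illustrative only.

```python
class VectorClock:
    """Vector clock for process index i among n processes."""

    def __init__(self, n, i, d=1):
        self.i = i
        self.d = d
        self.c = [0] * n

    def tick(self):
        # Rule 1: advance the process's own entry for each new event.
        self.c[self.i] += self.d
        return list(self.c)

    def send(self):
        # The timestamp tm attached to a message is the clock after the tick.
        return self.tick()

    def receive(self, tm):
        # The receive is an event (rule 1), then rule 2 takes the
        # component-wise maximum with the message timestamp.
        self.tick()
        self.c = [max(own, msg) for own, msg in zip(self.c, tm)]
        return list(self.c)


p1 = VectorClock(3, 0)
p2 = VectorClock(3, 1)
p1.tick()                   # [1, 0, 0]
tm = p1.send()              # [2, 0, 0]
p2.tick()                   # [0, 1, 0]
stamp = p2.receive(tm)      # own entry ticks to 2, then merge: [2, 2, 0]
```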
19
Vector Clock
[Figure: space-time diagram of P1 (events e11, e12, e14), P2 (e21 to e24), and P3 (e31, e32) with vector timestamps. P1: (1,0,0), (2,0,0), (3,4,1). P2: (0,1,0), (2,2,0), (2,3,1), (2,4,1). P3: (0,0,1), (0,0,2).]
20
Causal Ordering of Messages
[Figure: space-time diagram of processes P1, P2, and P3 showing Send(M1) and Send(M2), with two points labeled (1) and (2).]
21
Message Ordering
  • Not really concerned with maintaining clocks.
  • Order the messages sent and received among all
    processes in a distributed system.
  • If Send(M1) → Send(M2), M1 should be received ahead
    of M2 by all processes.
  • This is not guaranteed by the communication
    network, since M1 may be from P1 to P2 and M2 may
    be from P3 to P4.
  • Message ordering
  • Deliver a message only if the preceding one has
    already been delivered.
  • Otherwise, buffer it.

22
BSS Algorithm
  • BSS: Birman-Schiper-Stephenson protocol
  • Broadcast based: a message sent is received by
    all other processes.
  • Deliver a message to a process only if the
    message immediately preceding it has been
    delivered to the process.
  • Otherwise, buffer the message.
  • Accomplished by using a vector accompanying the
    message.

23
BSS Algorithm ...
  • 1. Process Pi increments its vector time VTpi[i],
    timestamps the message m with it, and broadcasts m.
  • 2. Pj ≠ Pi receives m. m is delivered when:
  • a. VTpj[i] = VTm[i] - 1
  • b. VTpj[k] ≥ VTm[k] for all k in
    {1, 2, ..., n} - {i}, where n is the total number
    of processes. Delayed messages are queued in
    sorted order.
  • c. Concurrent messages are ordered by
    time of receipt.
  • 3. When m is delivered at Pj, VTpj is updated
    according to Rule 2 of vector clocks.
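
The delivery test in step 2 can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: it assumes zero-based process indices, reads condition (b) as "greater than or equal", and uses a plain list as the buffer instead of a sorted queue. `bss_can_deliver` and `BssReceiver` are hypothetical names.

```python
def bss_can_deliver(vt_p, vt_m, sender):
    """Delivery test at a receiver with vector time vt_p for a message
    stamped vt_m that was broadcast by process index `sender`."""
    # (a) m is the next message expected from the sender.
    if vt_p[sender] != vt_m[sender] - 1:
        return False
    # (b) the receiver has seen everything the sender had seen from others.
    return all(vt_p[k] >= vt_m[k] for k in range(len(vt_p)) if k != sender)


class BssReceiver:
    def __init__(self, n):
        self.vt = [0] * n
        self.buffer = []       # delayed (sender, vt_m, payload) triples
        self.delivered = []

    def receive(self, sender, vt_m, payload):
        self.buffer.append((sender, vt_m, payload))
        progress = True
        while progress:        # delivering one message may unblock others
            progress = False
            for item in list(self.buffer):
                s, vt, data = item
                if bss_can_deliver(self.vt, vt, s):
                    self.buffer.remove(item)
                    self.delivered.append(data)
                    # Rule 2 of vector clocks: component-wise maximum.
                    self.vt = [max(a, b) for a, b in zip(self.vt, vt)]
                    progress = True


p3 = BssReceiver(3)
# The second process's broadcast m2 causally follows m1 but arrives first.
p3.receive(1, [1, 1, 0], "m2")
delivered_after_first = list(p3.delivered)   # m2 is buffered
p3.receive(0, [1, 0, 0], "m1")               # m1 is delivered, then m2
```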

24
BSS Algorithm
[Figure: BSS example on processes P1, P2, and P3; the message labels were not recoverable.]
25
SES Algorithm
  • SES: Schiper-Eggli-Sandoz algorithm. No need for
    broadcast messages.
  • Each process maintains a vector V_P of size N - 1,
    where N is the number of processes in the system.
  • V_P is a vector of tuples (P, t): P is the
    destination process ID and t is a vector timestamp.
  • Tm: logical time of sending message m
  • Tpi: present logical time at Pi

26
SES Algorithm
  • Sending a message
  • Send message M, timestamped tm, along with V_P1,
    to P2.
  • Insert (P2, tm) into V_P1, overwriting the previous
    value of (P2, t), if any.
  • Any future message carrying (P2, tm) in V_P1
    cannot be delivered to P2 until tm < tP2.
  • Delivering a message
  • If V_M (in the message) does not contain any pair
    (P2, t), the message can be delivered.
  • Otherwise ((P2, t) exists): if t > Tp2, buffer the
    message (don't deliver it);
  • else deliver it.

27
SES Algorithm
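[Figure: SES example with processes P1, P2, P3 and messages M1 and M2; one message is buffered and later delivered from the buffer. Vector timestamps shown: (0,0,1), (0,1,1), (0,2,1), (0,2,2), (1,0,1), (2,2,1).]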
28
SES Algorithm ...
  • On delivering the message
  • Merge V_M (in message) with V_P2 as follows.
  • If (P,t) is not there in V_P2, merge.
  • If (P,t) is present in V_P2, t is updated with
    max(t in V_M, t in V_P2).
  • Message cannot be delivered until t in V_M is
    greater than t in V_P2
  • Update site P2's local logical clock.
  • Check buffered messages after local, logical
    clock update.

29
SES Example
  • What does the condition t > Tp2 imply?
  • t is the message's vector timestamp.
  • t > Tp2 ⇒ for all j, t[j] > Tp2[j]
  • This implies that some events occurred in other
    processes without P2's knowledge, so P2 decides to
    buffer the message.
  • When t < Tp2, the message is delivered and Tp2 is
    updated with the help of V_P2 (after the merge
    operation).

30
Global State
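[Figure: sites A and B connected by channels C1 and C2, shown in three successive global states.]
  • Global State 1: A = 500, B = 200; C1 empty, C2 empty
  • Global State 2: A = 450, B = 200; C1 carries Tx 50, C2 empty
  • Global State 3: A = 450, B = 250; C1 empty, C2 empty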
31
Recording Global State
  • send(Mij): message M sent from Si to Sj
  • rec(Mij): message M received by Sj from Si
  • time(x): time of event x
  • LSi: local state at Si
  • send(Mij) is in LSi iff (if and only if)
    time(send(Mij)) < time(LSi)
  • rec(Mij) is in LSj iff time(rec(Mij)) < time(LSj)
  • transit(LSi, LSj): set of messages sent/recorded
    at LSi and NOT received/recorded at LSj

32
Recording Global State
  • inconsistent(LSi, LSj): set of messages NOT
    sent/recorded at LSi but received/recorded at LSj
  • Global state: GS = {LS1, LS2, ..., LSn}
  • Consistent global state: GS = {LS1, ..., LSn} AND
    for all i, j in 1..n, inconsistent(LSi, LSj) is empty.
  • Transitless global state: GS = {LS1, ..., LSn} AND
    for all i, j in 1..n, transit(LSi, LSj) is empty.
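
The transit and inconsistent sets translate directly into set differences. The sketch below is only an illustration under assumed data structures: each local state is a hypothetical dict holding a site ID, the set of (destination, message) pairs recorded as sent, and the set of (source, message) pairs recorded as received.

```python
def transit(ls_i, ls_j):
    """Messages recorded as sent in LSi but not recorded as received in LSj."""
    sent_to_j = {m for (dst, m) in ls_i["sent"] if dst == ls_j["site"]}
    recv_from_i = {m for (src, m) in ls_j["received"] if src == ls_i["site"]}
    return sent_to_j - recv_from_i


def inconsistent(ls_i, ls_j):
    """Messages recorded as received in LSj but not recorded as sent in LSi."""
    sent_to_j = {m for (dst, m) in ls_i["sent"] if dst == ls_j["site"]}
    recv_from_i = {m for (src, m) in ls_j["received"] if src == ls_i["site"]}
    return recv_from_i - sent_to_j


def classify(global_state):
    """Return (consistent, transitless) for a list of local states."""
    pairs = [(a, b) for a in global_state for b in global_state if a is not b]
    consistent = all(not inconsistent(a, b) for a, b in pairs)
    transitless = all(not transit(a, b) for a, b in pairs)
    return consistent, transitless


# Hypothetical recording: M1 was sent before LS1 but not yet received at LS2;
# M2 was received before LS2 although its send is not in LS1.
ls1 = {"site": 1, "sent": {(2, "M1")}, "received": set()}
ls2 = {"site": 2, "sent": set(), "received": {(1, "M2")}}
result = classify([ls1, ls2])
```

A global state for which `classify` returns `(True, True)` is strongly consistent in the sense used later in the slides.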

33
Recording Global State ..
[Figure: sites S1 and S2 with recorded local states LS1 and LS2 and messages M1 and M2. M1 is in transit; M2 is inconsistent.]
34
Recording Global State...
  • Strongly consistent global state: consistent and
    transitless, i.e., all send events and the
    corresponding receive events are recorded in all LSi.

[Figure: local states LS11, LS12 at the first site; LS21, LS22, LS23 at the second; LS31, LS32, LS33 at the third.]
35
Chandy-Lamport Algorithm
  • Distributed algorithm to capture a consistent
    global state. Communication channels are assumed
    to be FIFO.
  • Uses a marker to initiate the algorithm. The marker
    is a sort of dummy message, with no effect on the
    functions of the processes.
  • Sending a marker by P
  • P records its state. For each outgoing channel C,
    P sends a marker on C with the state info.
  • Receiving a marker by Q along channel C
  • If Q has NOT recorded its state: record the
    state of C as an empty sequence and SEND markers
    (using the rule above).
  • Else (Q has recorded its state before): record the
    state of C as the sequence of messages received
    along C after Q's state was recorded and before Q
    received the marker.
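
The marker rules can be exercised on a two-process bank transfer. The following Python sketch is a simplified, single-threaded simulation under stated assumptions: channels are FIFO deques, the application state is a single number, and every name (`Process`, `connect`, `MARKER`) is hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state           # application state, here a balance
        self.recorded_state = None   # local snapshot, once taken
        self.channel_state = {}      # incoming channel -> recorded messages
        self.recording = set()       # incoming channels still being recorded
        self.inbox = {}              # sender name -> FIFO channel
        self.outbox = {}             # receiver name -> FIFO channel

    def record_and_send_markers(self):
        # Marker-sending rule: record own state, then send a marker on
        # every outgoing channel.
        self.recorded_state = self.state
        self.recording = set(self.inbox)
        for sender in self.inbox:
            self.channel_state[sender] = []
        for channel in self.outbox.values():
            channel.append(MARKER)

    def deliver(self, sender):
        msg = self.inbox[sender].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker: record state; this channel stays empty.
                self.record_and_send_markers()
            # The marker closes the recording of this channel.
            self.recording.discard(sender)
        else:
            if sender in self.recording:
                # Received after recording and before the marker.
                self.channel_state[sender].append(msg)
            self.state += msg


def connect(p, q):
    channel = deque()
    p.outbox[q.name] = channel
    q.inbox[p.name] = channel


a, b = Process("A", 500), Process("B", 200)
connect(a, b)
connect(b, a)

a.state -= 50
a.outbox["B"].append(50)      # a transfer of 50 is in transit on A -> B
b.record_and_send_markers()   # B initiates the snapshot
b.deliver("A")                # the 50 arrives after B recorded its state
a.deliver("B")                # A sees its first marker and records 450
b.deliver("A")                # A's marker closes channel A -> B at B
```

The recorded snapshot is A = 450, B = 200, with the transfer of 50 captured as the state of channel A -> B, so the recorded total is still 700.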

36
Chandy-Lamport Algorithm
  • Initiation of the marker can be done by any process,
    with its own unique marker <process id, sequence
    number>.
  • Several processes can initiate state recording by
    sending markers. Concurrent sending of markers is
    allowed.
  • One possible way to collect the global state: all
    processes send the recorded state information to
    the initiator of the marker. The initiator process
    can assemble the global state.

[Figure: sites Si, Sj, and Sc with two labels Seq.]
37
Chandy-Lamport Algorithm ...
  • Example

[Figure: processes Pi, Pj, and Pk; markers are sent (twice) and channel states are recorded (three times). Channel state example: M1 sent to Px at t1, M2 sent to Py at t2, ...]
38
Cuts
  • Cuts: graphical representation of a global state.
  • Cut C = {c1, c2, ..., cn}; ci is the cut event at Si.
  • Consistent cut: every message received by a site Si
    before its cut event was sent before the cut event
    at the sender.
  • One can prove: a cut is a consistent cut iff no
    two cut events are causally related, i.e.,
    ¬(ci → cj) and ¬(cj → ci).

[Figure: cut events c1, c2, c3, c4 on the timelines of sites S1, S2, S3, S4.]
39
Time of a Cut
  • C = {c1, c2, ..., cn}, where ci has vector timestamp
    VTci. Vector time of the cut: VTc = sup(VTc1, VTc2,
    ..., VTcn).
  • sup is the component-wise maximum, i.e., VTc[i] =
    max(VTc1[i], VTc2[i], ..., VTcn[i]).
  • Now, a cut is consistent iff VTc = (VTc1[1],
    VTc2[2], ..., VTcn[n]).
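
The vector-time test for a consistent cut is short enough to write out. A minimal Python sketch, assuming zero-based site indices and that `cut_stamps[i]` holds the vector timestamp of the cut event at site i (both function names are illustrative):

```python
def cut_time(cut_stamps):
    """Component-wise maximum (sup) of the cut events' vector timestamps."""
    return [max(column) for column in zip(*cut_stamps)]


def is_consistent_cut(cut_stamps):
    """True iff the sup equals the vector of each site's own entry."""
    own_entries = [cut_stamps[i][i] for i in range(len(cut_stamps))]
    return cut_time(cut_stamps) == own_entries


# The second site has seen one event of the first, and the cut at the first
# site lies after that event: consistent.
good_cut = [[2, 0, 0], [1, 3, 0], [0, 0, 1]]
# The second site has seen two events of the first, but the cut at the first
# site lies after only one: a receive is inside the cut, its send is not.
bad_cut = [[1, 0, 0], [2, 3, 0], [0, 0, 1]]

good = is_consistent_cut(good_cut)
bad = is_consistent_cut(bad_cut)
```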

40
Termination Detection
  • Termination: completion of the sequence of steps of
    an algorithm (e.g., leader election, deadlock
    detection, deadlock resolution).
  • Use a controlling agent or a monitor process.
  • Initially, all processes are idle. The weight of
    the controlling agent is 1 (0 for the others).
  • Start of computation: a message from the controller
    to a process. The weight is split in half (0.5 each).
  • Repeat this: any time a process sends a
    computation message to another process, split the
    weight between the two processes (e.g., 0.25
    each at the next split).
  • End of computation: the process sends its weight to
    the controller, which adds it to its own weight.
  • Rule: the sum of all weights W is always 1.
  • Termination: when the weight of the controller
    becomes 1 again.

41
Huang's Algorithm
  • B(DW): computation message; DW is the weight.
  • C(DW): control (end-of-computation) message.
  • Rule 1: Before sending B, compute W1 and W2 such
    that W1 + W2 = W, the weight of the process. Send
    B(W2) to Pi and set W := W1.
  • Rule 2: On receiving B(DW): W := W + DW, and the
    process becomes active.
  • Rule 3: On going from active to idle: send C(DW)
    with DW = W, then set W := 0.
  • Rule 4: On receiving C(DW) at the controlling agent:
    W := W + DW. If W = 1, the computation has terminated.
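
The four rules can be traced with exact fractions. The sketch below is a simplified illustration: messages are modeled as direct method calls rather than network sends, weights are always split in half, and the class names `Node` and `Controller` are hypothetical. `fractions.Fraction` is used so that the invariant "weights sum to 1" holds exactly, which floating point would not guarantee after many splits.

```python
from fractions import Fraction


class Node:
    def __init__(self):
        self.weight = Fraction(0)
        self.active = False

    def send_computation(self, target):
        # Rule 1: split W into W1 + W2, keep W1, send W2 inside B(W2).
        dw = self.weight / 2
        self.weight -= dw
        target.receive_computation(dw)

    def receive_computation(self, dw):
        # Rule 2: add the received weight and become active.
        self.weight += dw
        self.active = True

    def become_idle(self, controller):
        # Rule 3: return the whole weight to the controller in C(DW).
        dw, self.weight = self.weight, Fraction(0)
        self.active = False
        controller.receive_control(dw)


class Controller(Node):
    def __init__(self):
        super().__init__()
        self.weight = Fraction(1)

    def receive_control(self, dw):
        # Rule 4: accumulate returned weight.
        self.weight += dw

    def terminated(self):
        return self.weight == 1


c, p, q = Controller(), Node(), Node()
c.send_computation(p)      # c = 1/2, p = 1/2
p.send_computation(q)      # p = 1/4, q = 1/4
total_during = c.weight + p.weight + q.weight
p.become_idle(c)           # c = 3/4
done_early = c.terminated()
q.become_idle(c)           # c = 1
done = c.terminated()
```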

42
Lamport's Clocks: Limitations
  • Clearly, if a → b, then C(a) < C(b)
  • What's known if C(a) < C(b)?
  • Only that b didn't happen before a
  • Can concurrent events be identified? Answer: NO
  • What's the case if C(a) > C(b)?
  • Correctly identifying all causal relationships
    will require more horsepower, but Lamport's
    clocks are useful when you need to know about
    relationships but can live with false
    causality reporting
  • One example: some mutual exclusion algorithms
43
Lamport's Clocks: Total Order
  • → defines a partial order rather than a total
    order
  • How to impose a total order?
  • Use process IDs to break ties
  • Important: the total order is artificial; it has
    nothing to do with the clock on the wall!
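
Breaking ties with process IDs amounts to sorting on the pair (clock value, process ID). A small Python illustration with made-up events follows; the event names, clock values, and IDs are hypothetical.

```python
# Each event carries its Lamport clock value and the ID of its process.
events = [
    {"name": "a", "clock": 2, "pid": 1},
    {"name": "b", "clock": 1, "pid": 2},
    {"name": "c", "clock": 2, "pid": 0},
    {"name": "d", "clock": 1, "pid": 0},
]

# Order by clock value first; equal clock values are ordered by process ID.
ordered = sorted(events, key=lambda e: (e["clock"], e["pid"]))
names = [e["name"] for e in ordered]
```

Events b and d share clock value 1 and are ordered only by process ID, which says nothing about which one happened first in real time.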

44
Vector Time
  • Vector time correctly captures the causality
    relationship between distributed events (captures
    the transitive closure of the → relation)
  • Each process i maintains a vector of state
    numbers TSi
  • TSi[i] contains the current state number for
    process i; TSi[n], n ≠ i, contains the most
    recent state number of process n upon which i
    currently depends

45
Vector Time Updates
  • TSi is updated as follows
  • TSi[i] is incremented between successive events
    at i
  • When i sends a message m, TSi is attached to the
    message
  • When i receives a message m, for all k,
    TSi[k] := max(TSi[k], m.TS[k])

46
Vector Time Comparison
  • Vector timestamps are compared in the following
    manner
  • TSi < TSj iff ∀k, TSi[k] ≤ TSj[k] and TSi ≠
    TSj.
  • An event e causally depends on an event e' iff
    e'.TS < e.TS
  • Two events e and e' are concurrent if neither
    e'.TS < e.TS nor e.TS < e'.TS
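
The comparison rule can be written as a short predicate. A minimal Python sketch, with hypothetical function names and timestamps given as equal-length lists:

```python
def vt_less(ts_a, ts_b):
    """ts_a < ts_b: every entry is <= and the two vectors differ."""
    return all(x <= y for x, y in zip(ts_a, ts_b)) and ts_a != ts_b


def relation(ts_a, ts_b):
    """Classify two events by their vector timestamps."""
    if vt_less(ts_a, ts_b):
        return "before"        # the second event causally depends on the first
    if vt_less(ts_b, ts_a):
        return "after"
    if ts_a == ts_b:
        return "equal"
    return "concurrent"


r1 = relation([1, 0, 0, 0], [1, 2, 0, 0])
r2 = relation([1, 0, 0, 0], [0, 1, 0, 0])
r3 = relation([1, 4, 2, 1], [0, 1, 2, 0])
```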

47
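Vector Time: The Picture
[Figure: space-time diagram of processes p0, p1, p2, p3 annotated with vector timestamps. Every process starts at (0,0,0,0); timestamps shown include (1,0,0,0), (0,1,0,0), (0,1,1,0), (0,1,2,0), (1,2,0,0), (1,3,2,0), (1,4,2,0), and (1,4,2,1).]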
48
More Efficient Vector Time Protocols
  • The bad news
  • Bernadette Charron-Bost's theorem: in general,
    timestamps need n entries to capture causality
    among n processes
  • The good news
  • Never send the receiver's value
  • Plausible Clocks
  • Singhal/Kshemkalyani VT (static)
  • Richard VT (dynamic)

49
Singhal/Kshemkalyani
  • Transmit only portions of vector timestamps that
    have changed
  • Don't have to send the element for the receiver
  • Instead of a vector attachment for messages, now
    use a set data structure
  • {(PID, clock), ..., (PID, clock)}
  • Why?
  • Larger memory footprint is traded for reduced
    message size

50
Singhal, Cont.
  • Data structures
  • TS: vector timestamp, O(n)
  • LS: last sent, O(n)
  • LU: last updated, O(n)
  • LR: last received, O(n^2)
  • All except the last are vectors (LR is a vector
    of vectors)
  • Need LR to be able to reconstitute complete
    vector timestamps for message reception events

51
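Vector Time: The Picture
[Figure: space-time diagram of processes p0, p1, p2, p3 with full vector timestamps attached to every message. p0 advances through (1,0,0,0), (2,0,0,0), (3,0,0,0) and sends each timestamp to p1, which advances through (0,1,0,0), (1,2,0,0), (2,3,0,0), (3,4,0,0). p2 reaches (0,1,1,0) and (0,1,2,0); p3 reaches (0,1,2,1).]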
52
Optimized Vector Time: The Picture
[Figure: the same execution on p0, p1, p2, p3, but messages carry only the changed entries: (p0,1), (p0,2), (p0,3) from p0; (p1,1) from p1; and (p1,1), (p2,2) from p2. The local timestamps are unchanged: p1 ends at (3,4,0,0), p2 at (0,1,2,0), p3 at (0,1,2,1).]
53
Singhal Rules
  • For process i
  • When updating any element at index k of TS:
  • LU[k] := TS[i]
  • When sending a message to j:
  • Attach all (pk, TS[k]) such that LU[k] > LS[j]
  • LS[j] := TS[i]
  • When receiving a message m from j:
  • Update TS as usual
  • For all x, LR[j][x] := max(LR[j][x], m.TS[x])
  • If logging, can then attach LR[j] to the message to
    represent the complete vector timestamp
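
The send and receive rules can be sketched as below. This is an illustrative Python sketch under several assumptions that go beyond the slide: LS is indexed by destination, the LR structure used for logging is omitted, channels are FIFO (which the technique relies on), processes are numbered from 0, and the class name `SKClock` is hypothetical.

```python
class SKClock:
    """Vector clock that attaches only the entries changed since the last
    message to the same destination."""

    def __init__(self, n, i):
        self.i = i
        self.ts = [0] * n    # TS: the vector timestamp
        self.ls = [0] * n    # LS[j]: value of TS[i] at the last send to j
        self.lu = [0] * n    # LU[k]: value of TS[i] when TS[k] last changed

    def tick(self):
        self.ts[self.i] += 1
        self.lu[self.i] = self.ts[self.i]

    def send(self, j):
        self.tick()
        # Attach every entry updated since the last message sent to j.
        entries = {k: self.ts[k] for k in range(len(self.ts))
                   if self.lu[k] > self.ls[j]}
        self.ls[j] = self.ts[self.i]
        return entries

    def receive(self, entries):
        self.tick()
        for k, value in entries.items():
            if value > self.ts[k]:
                self.ts[k] = value
                self.lu[k] = self.ts[self.i]


p0, p1, p2, p3 = (SKClock(4, i) for i in range(4))

m_p1_p2 = p1.send(2)                          # only p1's own entry
p2.receive(m_p1_p2)
sent_by_p0 = [p0.send(1) for _ in range(3)]   # one changed entry each time
for entries in sent_by_p0:
    p1.receive(entries)
m_p2_p3 = p2.send(3)                          # p1's and p2's entries
p3.receive(m_p2_p3)
```

In this run every message carries one or two (PID, clock) pairs instead of a four-entry vector, while the local timestamps match what full vector clocks would produce.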

54
Singhal Simple Analysis
  • Good if communication is highly localized
  • N = number of processes, b = number of bits in one
    vector clock element
  • Beneficial only if the average number of entries
    sent per message is less than
    (cost of sending all entries of the vector clock) /
    (cost of sending one (PID, clock) pair)
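
One way to make this condition concrete is shown below. It rests on assumptions not stated on the slide: a PID is encoded in log2 N bits, so one (PID, clock) pair costs log2 N + b bits, the full vector costs N·b bits, and n denotes the average number of entries attached to a message.

```latex
n \,(\log_2 N + b) < N\, b
\quad\Longleftrightarrow\quad
n < \frac{N\, b}{\log_2 N + b}
```

For example, with N = 64 processes and b = 32 bits, the bound is 64 · 32 / (6 + 32) ≈ 53.9, so under these assumptions the optimization pays off whenever fewer than 54 entries change between messages to the same destination.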