Title: Global States in a Distributed System
1Global States in a Distributed System
- By John Kor and Yvonne Cheng
2Initial Problem Example
- Garbage Collector
- Frees up memory which is no longer in use
- Checks if a reference to memory still exists
- What about in a distributed system?
3Initial Problem Example (contd)
- A distributed system consists of multiple processes
- Each process is located on a different computer
- No sharing of processor or memory
4Initial Problem Example (contd)
- Each process can only determine its own state
- Problem: How do we determine when to garbage collect in a distributed system?
- How do we check whether a reference to memory still exists?
5System Model
- A distributed system consists of multiple processes
- Each process is located on a different computer
- Each process consists of events
- An event is either sending a message, receiving a message, or changing the value of some variable
- Each process has a communication channel in and out
6Our Garbage Collection Problem
- In order to test whether a certain property of our system is true, we cannot just look at each process individually
- A snapshot of the entire system must be taken to test whether a certain property of the system is true
- This snapshot is called a Global State
7Definition
- The global state of a distributed system is the set of local states of the individual processes involved in the system, plus the states of the communication channels.
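As a minimal sketch of this definition, a global state can be modelled as one mapping from processes to local states and one mapping from channels to the messages in transit. The names `GlobalState`, `local_states`, and `channel_states`, and the sample values, are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Assumption for this sketch: a channel is identified by (sender, receiver).
Channel = Tuple[str, str]

@dataclass
class GlobalState:
    # Local state of every process, keyed by process name.
    local_states: Dict[str, Any] = field(default_factory=dict)
    # Messages in transit on every channel, oldest first.
    channel_states: Dict[Channel, List[Any]] = field(default_factory=dict)

    def in_transit(self) -> int:
        """Total number of messages currently sitting in channels."""
        return sum(len(msgs) for msgs in self.channel_states.values())

# Hypothetical two-process system with one message in flight from p to q.
gs = GlobalState(
    local_states={"p": "Sp2", "q": "Sq1"},
    channel_states={("p", "q"): ["m3"], ("q", "p"): []},
)
```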
8Determinism
- Deterministic Computation
- At any point in computation there is at most one event that can happen next.
- Non-Deterministic Computation
- At any point in computation there can be more than one event that can happen next.
9Deterministic Computation
10Non-Deterministic Computation
11Determinism
- Deterministic computation
- A local event would reveal everything about the global state!
- The process will know the states of the other processes
- Non-Deterministic computation
- Because of branching, a local event cannot reveal what the next step will be
12Simple Algorithm
- Create a new process that collects the states of every other process
- Every process saves its state at an arbitrary time and sends it to this new process
13Advantages
- Very simple
- Easy to implement
14Problems?
- Based on the assumption that all processes work on a synchronized global clock
- Wrong assumption!
15Problems (contd)
[Figure: processes p and q, message m]
16Problems (contd)
[Figure: processes p and q, message m]
17Problems (contd)
[Figure: processes p and q, message m]
18Problems (contd)
[Figure: processes p and q, message m shown twice]
19Another view
[Figure: processes p and q, message m]
20Another view
- Process p has no record of sending m
- Process q HAS a record of receiving m
- Problem?
- The global state does not show p sending m, so there is confusion as to where m came from
- Breaks the Consistency concept
21Consistency
- A global state is consistent if it could have been observed by an external observer
- If e → e' (e happened before e'), then any state that includes e' must also include e
- For a successful global state, all recorded states must be consistent with one another
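The consistency rule can be sketched as a check that no message is recorded as received without also being recorded as sent. The representation below is an assumption for illustration: a state is described by how many events of each process it includes (`cut`), and `Message` is a hypothetical record type.

```python
from typing import Dict, List, NamedTuple

class Message(NamedTuple):
    sender: str
    send_event: int   # 1-based index of the send in the sender's history
    receiver: str
    recv_event: int   # 1-based index of the receive in the receiver's history

def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] is the number of events of process p included in the state.

    The state is consistent when every receive inside the cut has its
    matching send inside the cut as well.
    """
    for m in messages:
        received_in_cut = m.recv_event <= cut[m.receiver]
        sent_in_cut = m.send_event <= cut[m.sender]
        if received_in_cut and not sent_in_cut:
            return False
    return True

# Hypothetical history: p sends m as its 2nd event, q receives it as its 1st.
msgs = [Message("p", 2, "q", 1)]
orphan_receive = is_consistent({"p": 1, "q": 1}, msgs)   # receive without send
message_in_flight = is_consistent({"p": 2, "q": 0}, msgs)  # send without receive
```

A send without its receive is allowed (the message is simply in the channel), while a receive without its send is not.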
22Solution
- Need to develop an asynchronous algorithm
- Cannot depend on a clock
- Must ensure consistency in all global states
23Assumptions
- Distributed system: finite set of processes and channels, described by a graph
- Processes
- Set of states, initial state, set of events
- Channels
- FIFO, error-free, infinite buffers, arbitrary but finite delay
24PART 2
25Idea of a global state recording algorithm
- Each process records its own state
- The two processes incident on a channel cooperate in recording that channel's state
26Challenge
- No global clock
- Need a meaningful result
- Superimposed on underlying computation
27Meaningful: The notion of Consistency
- A global state is meaningful if it could have been observed by an external observer
- All feasible states are consistent
28An Example
[Figure: timelines of p (states Sp0, Sp1, Sp2, Sp3) and q (states Sq0, Sq1, Sq2, Sq3) exchanging messages m1, m2, and m3]
29A Consistent State?
[Figure: the timeline from slide 28 with the state (Sp1, Sq1) marked]
30Yes
[Figure: the timeline from slide 28 with the state (Sp1, Sq1) marked]
31A Consistent State?
[Figure: the timeline from slide 28 with the state (Sp2, Sq3) and message m3 marked]
32Yes
[Figure: the timeline from slide 28 with the state (Sp2, Sq3) and message m3 marked]
33An inconsistent State
[Figure: the timeline from slide 28 with the state (Sp1, Sq3) marked]
34Conducting algorithm Using An Example
- Processes p and q
- Channels c and c'
- Token t
[Figure: p and q joined by channels c and c']
35An Example
[Figure: p and q joined by channels c and c', with token t]
36An Example
- q, c, and c' record their states
[Figure: p and q joined by channels c and c', with token t]
37An Example
- The composite global state!
[Figure: p and q joined by channels c and c'; token t appears twice]
38An Example
- n = number of messages sent along c before p's state is recorded
- n' = number of messages sent along c before c's state is recorded
[Figure: p and q joined by channels c and c']
39An Example
- Reason for the inconsistency: n < n'
[Figure: p's state is recorded with n = 0; c's state is recorded with n' = 1, after the token t has been sent along c]
40Similar scenario
- c is recorded when the token is at process p.
- p sends the token through channel c, and then the states of c', p, and q are recorded.
- The recorded global state shows no tokens in the system.
- The reason for the inconsistency: n > n'
41Conclusion from the example
- A consistent global state requires
- n = n'
42Similar Conclusion
- m = number of messages received along c before q's state is recorded
- m' = number of messages received along c before c's state is recorded
- To be consistent: m = m'
43Some other equations
n' ≥ m'
- m' = number of messages received along c before c's state is recorded
- n' = number of messages sent along c before c's state is recorded
- m = number of messages received along c before q's state is recorded
- n = number of messages sent along c before p's state is recorded
- n = n'
- m = m'
n ≥ m
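The relations between the four counts can be checked mechanically. This is a small sketch with a hypothetical helper `relations_hold`; here n and n' count messages sent along c before the sender's state and the channel's state are recorded, and m and m' count messages received along c before the receiver's state and the channel's state are recorded.

```python
def relations_hold(n: int, n_prime: int, m: int, m_prime: int) -> bool:
    """True when the recorded counts satisfy n = n', m = m' and n' >= m'."""
    return n == n_prime and m == m_prime and n_prime >= m_prime

# Token example: p is recorded before sending the token (n = 0), but the
# channel is recorded after the send (n' = 1), so the token shows up twice.
two_tokens = relations_hold(n=0, n_prime=1, m=0, m_prime=0)

# Sender and channel both recorded after the send, before the receive:
# the token is recorded once, inside the channel.
token_in_channel = relations_hold(n=1, n_prime=1, m=0, m_prime=0)
```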
44Other Fact
- The state of channel c that is recorded must be the sequence of messages sent along the channel before the sender's state is recorded, excluding the sequence of messages received along the channel before the receiver's state is recorded.
- Two cases
- n' = m': c is empty
- n' > m': c must be the (m'+1)st, ..., n'th messages sent by p along c
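The two cases reduce to a single slice of the sender's message sequence. The function below is a sketch under that reading; `recorded_channel_state`, `n_prime`, and `m_prime` are illustrative names, where n' and m' are the numbers of messages sent and received along c before the relevant states are recorded.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def recorded_channel_state(sent: Sequence[T], n_prime: int, m_prime: int) -> List[T]:
    """Return the (m'+1)st through n'th messages sent along the channel.

    `sent` lists every message the sender put on the channel, oldest first.
    When n' == m' the slice is empty, which is the first case.
    """
    if not 0 <= m_prime <= n_prime <= len(sent):
        raise ValueError("need 0 <= m' <= n' <= len(sent)")
    return list(sent[m_prime:n_prime])

in_flight = recorded_channel_state(["m1", "m2", "m3"], n_prime=3, m_prime=2)
nothing = recorded_channel_state(["m1", "m2", "m3"], n_prime=2, m_prime=2)
```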
45Put All Together: A brief sketch of the algorithm
- p sends a marker message along all its outgoing channels after it records its state and before it sends any other messages.
- On receipt of a marker message from channel c
- if p has not recorded its state
- record the state
- state(c) = EMPTY
- else
- state(c) = messages received on c since p recorded its state, excluding the marker
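The marker rules can be sketched as a small single-threaded simulation. Everything here is an assumption made for illustration: the `Process` class, the `connect` and `deliver` helpers, and the toy computation in which a process's state is simply the last message it received. It is not the presenters' implementation.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional

MARKER = object()  # sentinel standing in for the marker message

class Process:
    def __init__(self, name: str, state: Any):
        self.name = name
        self.state = state
        self.out: Dict[str, Deque[Any]] = {}   # outgoing FIFO channels by receiver
        self.incoming: List[str] = []          # senders with a channel into us
        self.has_recorded = False
        self.recorded_state: Optional[Any] = None
        self.channel_state: Dict[str, List[Any]] = {}  # recorded incoming channels
        self.recording: Dict[str, bool] = {}           # channel still being logged?

    def record(self) -> None:
        """Record the local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.state
        self.has_recorded = True
        for src in self.incoming:
            self.channel_state[src] = []
            self.recording[src] = True
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, src: str, msg: Any) -> None:
        if msg is MARKER:
            if not self.has_recorded:
                self.record()            # channel from src is recorded as empty
            self.recording[src] = False  # recorded state of this channel is final
        else:
            if self.has_recorded and self.recording[src]:
                self.channel_state[src].append(msg)
            self.state = msg             # toy computation step

def connect(a: Process, b: Process) -> None:
    """Create a FIFO channel from a to b."""
    a.out[b.name] = deque()
    b.incoming.append(a.name)

def deliver(src: Process, dst: Process) -> None:
    """Deliver the oldest message on the channel from src to dst."""
    dst.receive(src.name, src.out[dst.name].popleft())

p, q = Process("p", "Sp0"), Process("q", "Sq0")
connect(p, q)
connect(q, p)
p.out["q"].append("m1")  # p sends m1 before any recording starts
q.record()               # q initiates: records Sq0, sends a marker to p
deliver(p, q)            # m1 arrives after q recorded: logged as channel state
deliver(q, p)            # marker reaches p: p records Sp0, channel q->p is empty
deliver(p, q)            # p's marker reaches q: channel p->q is closed
```

In this run the recorded global state is p at Sp0, q at Sq0, the channel from p to q holding m1, and the channel from q to p empty.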
46Chandy and Lamport Algorithm
- Features
- Does not promise to give us exactly the state the system is in
- But gives us a consistent state!
47Algorithm in Action
[Figure: timelines of p (states Sp0, Sp1, Sp2, Sp3) and q (states Sq0, Sq1, Sq2, Sq3) exchanging messages m1, m2, and m3]
48Algorithm in Action
q records its state as Sq1 and sends a marker to p
[Figure: the timeline from slide 47]
49Algorithm in Action
p records its state as Sp2 and the channel state as empty
[Figure: the timeline from slide 47]
50Algorithm in Action
q records the channel state as m3
[Figure: the timeline from slide 47]
51Algorithm in Action
Recorded global state: ((Sp2, Sq1), (∅, m3))
[Figure: the timeline from slide 47]
52Algorithm in Action
Recorded global state: ((Sp2, Sq1), (∅, m3))
The computation may never have passed through the recorded state!
[Figure: the timeline from slide 47]
53What have we recorded
- The recorded state is consistent, but it need not be a state the computation actually passed through!
54Properties of the recorded global state
- Si = global state when the algorithm starts
- Sj = global state when the algorithm finishes
- S = state recorded by the algorithm
- Then
- S is reachable from Si
- Sj is reachable from S
55S Is reachable from Si
[Figure: states Si and Sj]
56Sj Is reachable from S
[Figure: states Si and Sj]
57Still what good is it?
- Stable Properties
- A property Y is called a stable property iff, for all states S' reachable from S, Y(S) → Y(S')
58Detection of Stable Properties
- outcome := false
- while (outcome = false)
- determine global state S
- outcome := Y(S)
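The detection loop can be sketched as a function that keeps taking snapshots until the property holds. The names `detect_stable_property`, `take_snapshot`, and `holds` are illustrative, and the `max_rounds` bound is an assumption added so the sketch always terminates; the loop on the slide is unbounded.

```python
from typing import Callable, TypeVar

S = TypeVar("S")

def detect_stable_property(
    take_snapshot: Callable[[], S],
    holds: Callable[[S], bool],
    max_rounds: int = 1000,
) -> int:
    """Take snapshots until the property holds; return how many were taken."""
    for round_no in range(1, max_rounds + 1):
        if holds(take_snapshot()):
            return round_no
    raise TimeoutError("property not detected within max_rounds snapshots")

# Hypothetical snapshots of a computation that eventually terminates.
snapshots = iter([{"active": 2}, {"active": 1}, {"active": 0}])
rounds = detect_stable_property(
    take_snapshot=lambda: next(snapshots),
    holds=lambda s: s["active"] == 0,
)
```

Because the property is stable, once it holds in a recorded state it also holds in every later state of the computation, so a positive answer never goes stale.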
59Checkpoint
- S serves as a checkpoint
- On a failure, restart the computation from S
- Problem!
- Not able to restore to Sj
[Figure: states Si, S, and Sj]
60Solution: Publishing
- A broadcast medium
- A central recorder process records all the messages received by each process
- Processes record their states at their own time and send them to the recorder
61Determining Global State
- Recorder can construct global state from
- Checkpointed States of all processes
- Plus
- Messages recorded since last checkpoint
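The reconstruction step can be sketched as replaying the logged messages on top of the last checkpoint. This sketch assumes each process is deterministic given its received messages; the function `rebuild_state` and the transition function `apply` are hypothetical names, not part of the slides.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Msg = TypeVar("Msg")

def rebuild_state(
    checkpoint: State,
    logged: Iterable[Msg],
    apply: Callable[[State, Msg], State],
) -> State:
    """Replay the messages logged since the checkpoint, oldest first."""
    return reduce(apply, logged, checkpoint)

# Hypothetical process whose state is a running total of received numbers.
current = rebuild_state(checkpoint=10, logged=[1, 2, 3], apply=lambda s, m: s + m)
```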
62Problems
- Publishing keeps track of all messages received by each process
- Expensive!
- Solution
- The recorder takes a checkpoint of process p at time t
- It then deletes all messages received by p before t
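The pruning step can be sketched as a filter over the recorder's log. The `LoggedMessage` record and the `prune_log` function are assumptions for illustration, as is the use of a numeric timestamp for the receive time.

```python
from typing import List, NamedTuple

class LoggedMessage(NamedTuple):
    receiver: str
    received_at: float  # time at which the receiver got the message
    payload: str

def prune_log(
    log: List[LoggedMessage], process: str, checkpoint_time: float
) -> List[LoggedMessage]:
    """Drop messages that `process` received before its checkpoint time."""
    return [
        m for m in log
        if not (m.receiver == process and m.received_at < checkpoint_time)
    ]

log = [
    LoggedMessage("p", 1.0, "a"),
    LoggedMessage("q", 2.0, "b"),
    LoggedMessage("p", 5.0, "c"),
]
pruned = prune_log(log, process="p", checkpoint_time=3.0)
```

Messages received by other processes are kept, since their checkpoints may be older.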
63Comparison
64Conclusion
- Global state detection is difficult in distributed systems
- The snapshot algorithm may not give an actual state but is very helpful in detecting stable properties
- Publishing gives an asynchronous way of determining global states but does not scale