Title: CS 582 CMPE 481 Distributed Systems
1CS 582 / CMPE 481 Distributed Systems
2Class Overview
- Synchronization in Distributed Systems
- Physical and Logical Time
- Global State
- Distributed Synchronization
- Election Algorithm
3Global State
- current state of a distributed computation
- meaningful global state
- from local states recorded at different local times
- distributed snapshot
- It consists of all local states and messages in transit
- A distributed snapshot should reflect a consistent state
4Global State (cont.)
- A distributed system is defined as a collection $P$ of $N$ processes $p_i$, $i = 1, 2, \ldots, N$
- history($p_i$)
- $h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$
- finite prefix of process history
- $h_i^k = \langle e_i^0, e_i^1, \ldots, e_i^k \rangle$
- state of $p_i$ just before the $k$th event occurs
- $s_i^k$
- $s_i^0$: initial state
- Global History of $P$
- $H = h_1 \cup h_2 \cup \cdots \cup h_N$
- Global State
- $S = (s_1, s_2, \ldots, s_N)$
5Global State (cont.)
- Meaningful global state
- process states that could have occurred at the same time
- corresponds to initial prefixes of the individual process histories
- A cut of the system's execution is a subset of its global history
- union of prefixes of process histories
- $C = h_1^{c_1} \cup h_2^{c_2} \cup \cdots \cup h_N^{c_N}$
- $s_i \in S$ corresponding to the cut $C$
- state of $p_i$ immediately after the last event processed by $p_i$ in the cut
- frontier of the cut: the set of these last events $e_i^{c_i}$, one per process
6Global State (cont.)
7Global State (cont.)
- Consistent Cut C
- $C$ is consistent if, for each event it contains, it also contains the events that happened before that event.
- for all events $e \in C$: $f \rightarrow e \Rightarrow f \in C$
- Consistent global state
- state that corresponds to a consistent cut
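The consistency condition can be sketched in Python. This is a minimal illustration, assuming a cut is given as the number of events taken from each process and that the only cross-process happened-before edges come from messages. The names `is_consistent_cut`, `cut` and `messages`, and the example execution, are hypothetical and not from the slides.

```python
from typing import Dict, List, Tuple

# An event is (process name, 1-based position in that process's history).
Event = Tuple[str, int]


def is_consistent_cut(cut: Dict[str, int],
                      messages: List[Tuple[Event, Event]]) -> bool:
    """cut[p] is the length of the prefix of p's history included in the cut.

    A cut is a union of prefixes, so local ordering is respected automatically.
    Only message edges (send -> receive) can break consistency.
    """
    def in_cut(event: Event) -> bool:
        proc, index = event
        return index <= cut.get(proc, 0)

    # Every receive inside the cut must have its matching send inside the cut.
    return all(in_cut(send) for send, recv in messages if in_cut(recv))


# Hypothetical execution: p1's 1st event sends a message received as p2's
# 2nd event; p2's 3rd event sends a message received as p1's 3rd event.
messages = [(("p1", 1), ("p2", 2)), (("p2", 3), ("p1", 3))]

print(is_consistent_cut({"p1": 2, "p2": 2}, messages))  # True
print(is_consistent_cut({"p1": 0, "p2": 2}, messages))  # False: receive without send
```

Checking only direct message edges is enough here because closure under local order and message edges implies closure under their transitive combination.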
8Global State (cont.)
- Execution of distributed systems
- series of transitions between global states of the system
- $S_0 \rightarrow S_1 \rightarrow S_2 \rightarrow \cdots$
- in each transition one event occurs at some single process
- Run
- total ordering of all events in a global history, consistent with each local history ordering
- Consistent Run or Linearization
- ordering of the events in a global history, consistent with the happened-before relationship on $H$
- Reachable States
- $S'$ is reachable from $S$ if there is a linearization that passes through $S$ and then $S'$.
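The difference between a run and a linearization can be sketched as two small checks. This is an illustrative sketch, assuming events are plain strings, each local history is a list, and messages are (send event, receive event) pairs. The function names and the example are hypothetical.

```python
def is_run(order, histories):
    """True if `order` contains every event once and respects each local history."""
    all_events = [e for history in histories.values() for e in history]
    if sorted(order) != sorted(all_events):
        return False
    position = {event: i for i, event in enumerate(order)}
    return all(position[a] < position[b]
               for history in histories.values()
               for a, b in zip(history, history[1:]))


def is_linearization(order, histories, messages):
    """A run that also places every send before its matching receive."""
    if not is_run(order, histories):
        return False
    position = {event: i for i, event in enumerate(order)}
    return all(position[send] < position[recv] for send, recv in messages)


# Hypothetical execution: p1 performs a then b, p2 performs c then d,
# and b is the send of a message whose receive is c.
histories = {"p1": ["a", "b"], "p2": ["c", "d"]}
messages = [("b", "c")]

print(is_linearization(["a", "b", "c", "d"], histories, messages))  # True
print(is_run(["c", "a", "b", "d"], histories))                      # True
print(is_linearization(["c", "a", "b", "d"], histories, messages))  # False
```

The second ordering is a valid run but not a linearization, because it places the receive `c` before the send `b`.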
9Distributed Snapshot Algorithm
- Chandy and Lamport, 1985
- records a set of process and channel states for a set of processes $p_i$ such that the recorded global state is consistent
- state recorded locally at $p_i$
- assumptions
- reliable communications (exactly once semantics)
- processes and channels do not fail
- unidirectional channels with FIFO delivery
- path between any two processes (strongly connected process-channel graph)
- any process may initiate the snapshot algorithm at any time
- processes may continue execution, and send or receive normal messages, while the snapshot algorithm executes
10Distributed Snapshot Algorithm (cont)
- Marker receiving rule for process $p_i$
- On $p_i$'s receipt of a marker message over channel c
- if ($p_i$ has not yet recorded its state) it
- records its process state now
- records the state of c as the empty set
- turns on recording of messages arriving over other incoming channels
- else
- $p_i$ records the state of c as the set of messages it has received over c since it saved its state.
- end if
- Marker sending rule for process $p_i$
- After $p_i$ has recorded its state, for each outgoing channel c
- $p_i$ sends one marker message over c (before it sends any other message over c).
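The two marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions: processes hold integer balances, application messages transfer amounts, and every ordered pair of processes has a FIFO channel. The class name `Snapshot`, its methods and the example values are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Snapshot:
    """Toy network of processes holding integer balances (hypothetical example)."""

    def __init__(self, balances):
        self.balance = dict(balances)
        names = list(balances)
        # One unidirectional FIFO channel per ordered pair (src, dst).
        self.channels = {(s, d): deque() for s in names for d in names if s != d}
        self.recorded = {}        # process -> recorded process state
        self.channel_state = {}   # channel -> messages recorded as in transit
        self.recording = set()    # incoming channels currently being recorded

    def send(self, src, dst, amount):
        self.balance[src] -= amount
        self.channels[(src, dst)].append(amount)

    def record_state(self, name):
        """Record local state, start recording incoming channels, send markers."""
        self.recorded[name] = self.balance[name]
        for chan in self.channels:
            if chan[1] == name:
                self.channel_state[chan] = []
                self.recording.add(chan)
        for chan, queue in self.channels.items():
            if chan[0] == name:
                queue.append(MARKER)  # marker sending rule

    def deliver(self, chan):
        msg = self.channels[chan].popleft()
        dst = chan[1]
        if msg == MARKER:
            if dst not in self.recorded:
                self.record_state(dst)
            # First marker: chan is recorded as empty.
            # Later marker: the messages recorded so far are chan's final state.
            self.recording.discard(chan)
        else:
            self.balance[dst] += msg
            if chan in self.recording:
                self.channel_state[chan].append(msg)

    def run(self):
        """Deliver messages round-robin until every channel is empty."""
        while any(self.channels.values()):
            for chan in self.channels:
                if self.channels[chan]:
                    self.deliver(chan)

    def snapshot_total(self):
        in_transit = sum(sum(msgs) for msgs in self.channel_state.values())
        return sum(self.recorded.values()) + in_transit


net = Snapshot({"p1": 100, "p2": 50, "p3": 25})
net.send("p1", "p2", 10)
net.record_state("p1")   # p1 initiates the snapshot
net.send("p2", "p1", 20)
net.send("p3", "p1", 5)
net.run()
print(net.recorded, net.channel_state, net.snapshot_total())
```

Because the recorded global state is consistent, the recorded balances plus the recorded in-transit amounts add up to the 175 units initially in the system, even though the local states were recorded at different times.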
11Distributed Synchronization
- In a distributed system
- resources are shared by multiple processes, whose activities need to be synchronized
- mutual exclusion is often required to prevent interference and ensure consistency
- Distributed mutual exclusion
- ME1 (safety)
- at most one process may execute in the critical section (CS) at a time
- ME2 (liveness)
- a process requesting entry to the CS is eventually granted it
- ME3 (ordering)
- entry to the CS should be granted in happened-before order
- approaches
- centralized
- decentralized
12Centralized Solution
- A server process coordinates mutual exclusion
- Algorithm
- Clients
- before entering the CS,
- a process sends a request message to the server and waits for a reply from it
- when leaving the CS,
- a process sends a release message to the server
- Server
- on receipt of request
- if no process is in the CS and the queue is empty, send a reply message; otherwise, queue the request
- on receipt of release
- remove the next request from the queue (if any) and send a reply
- A single point of failure
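The server side of this protocol can be sketched as a small state machine. This is an illustrative sketch, assuming message transport is handled elsewhere. The class `LockServer` and its method names are hypothetical, and each handler returns the client that should be sent a reply (or `None`).

```python
from collections import deque


class LockServer:
    """Coordinator for the centralized mutual exclusion scheme (sketch)."""

    def __init__(self):
        self.holder = None     # client currently in the CS, if any
        self.queue = deque()   # pending requests in arrival order

    def on_request(self, client):
        """Return the client to reply to now, or None if the request is queued."""
        if self.holder is None and not self.queue:
            self.holder = client
            return client
        self.queue.append(client)
        return None

    def on_release(self, client):
        """Return the next client to reply to, or None if nobody is waiting."""
        if client != self.holder:
            raise ValueError("release from a client that does not hold the CS")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


server = LockServer()
print(server.on_request("a"))   # a   (granted immediately)
print(server.on_request("b"))   # None (queued)
print(server.on_request("c"))   # None (queued)
print(server.on_release("a"))   # b
print(server.on_release("b"))   # c
print(server.on_release("c"))   # None
```

Entry costs a request and a reply, and exit costs one release message, but all of this state lives in one process, which is why the server is a single point of failure.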
13Ring Based Distributed Algorithm
- Algorithm
- processes form a ring and a token message is circulated around it
- possession of the token implies the right to enter the CS
- after leaving the CS, pass the token to the neighbour
- Analysis
- obtaining the token takes between 1 and (N - 1) messages
- the token is not necessarily obtained in happened-before order
- if one process fails, the ring needs reconfiguration
- a process wrongly assumed to have failed may inject an old token
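A short simulation shows both analysis points: the number of token hops needed and the fact that entry order follows ring position rather than request order. This is a sketch, assuming processes are numbered 0 to N - 1 around the ring, the token moves to the next higher number, and each CS visit is instantaneous. The function `circulate` and its parameters are hypothetical.

```python
def circulate(n, start, wanting):
    """Pass the token around a ring of n processes, beginning at `start`.

    `wanting` lists the processes that want the CS, in request order.
    Returns (process, token hops so far) pairs in the order of CS entry.
    """
    pending = set(wanting)
    if not pending <= set(range(n)):
        raise ValueError("unknown process in `wanting`")
    entries = []
    holder, hops = start, 0
    while pending:
        if holder in pending:
            entries.append((holder, hops))  # the token holder enters the CS
            pending.discard(holder)
        holder = (holder + 1) % n           # pass the token to the neighbour
        hops += 1
    return entries


# Process 4 asked before process 1, but the token reaches process 1 first.
print(circulate(5, start=0, wanting=[4, 1]))  # [(1, 1), (4, 4)]
```

With N = 5 and the token at process 0, the nearest requester waits 1 message and the farthest waits N - 1 = 4 messages.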
14Distributed Algorithm
- Ricart and Agrawala, 1981
- based on distributed agreement using event ordering and timestamps
- Assumptions
- processes $p_1, \ldots, p_n$ know one another's address
- all messages sent are eventually delivered
- each process $p_i$ keeps a logical clock conforming to LC1 and LC2
- a token is used to represent the state of a process
- RELEASED
- WANTED
- HELD
15Distributed Algorithm (cont)
On initialization
    state := RELEASED
To enter the section
    state := WANTED
    Multicast request to all processes        (request processing deferred here)
    T := request's timestamp
    Wait until (number of replies received = (N - 1))
    state := HELD
On receipt of a request <T_i, p_i> at p_j (i ≠ j)
    if (state = HELD or (state = WANTED and (T_j, p_j) < (T_i, p_i)))
    then
        queue request from p_i without replying
    else
        reply immediately to p_i
    end if
To exit the critical section
    state := RELEASED
    reply to any queued requests
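The pseudocode above can be turned into a single-threaded simulation. This is a sketch under stated assumptions: messages are delivered in FIFO order from one shared queue, at least two processes exist, and Lamport clocks supply the timestamps. The classes `Network` and `Process` and all method names are hypothetical.

```python
from collections import deque

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"


class Process:
    def __init__(self, pid, network):
        self.pid, self.network = pid, network
        self.state = RELEASED
        self.clock = 0           # Lamport logical clock
        self.request_ts = None   # T: timestamp of this process's own request
        self.replies = 0
        self.deferred = []       # requesters queued without a reply

    def request_cs(self):
        self.state = WANTED
        self.clock += 1
        self.request_ts = self.clock
        self.replies = 0
        for other in self.network.peers(self.pid):
            self.network.send(("REQUEST", self.request_ts, self.pid), other)

    def on_message(self, msg):
        kind, ts, sender = msg
        self.clock = max(self.clock, ts) + 1
        if kind == "REQUEST":
            mine_first = (self.state == WANTED
                          and (self.request_ts, self.pid) < (ts, sender))
            if self.state == HELD or mine_first:
                self.deferred.append(sender)   # queue without replying
            else:
                self.network.send(("REPLY", self.clock, self.pid), sender)
        else:  # REPLY
            self.replies += 1
            if self.state == WANTED and self.replies == self.network.size - 1:
                self.state = HELD
                self.network.cs_order.append(self.pid)

    def release_cs(self):
        self.state = RELEASED
        for pid in self.deferred:
            self.network.send(("REPLY", self.clock, self.pid), pid)
        self.deferred = []


class Network:
    def __init__(self, n):
        self.size = n
        self.queue = deque()
        self.cs_order = []       # order in which processes entered the CS
        self.message_count = 0
        self.procs = {pid: Process(pid, self) for pid in range(1, n + 1)}

    def peers(self, pid):
        return [p for p in self.procs if p != pid]

    def send(self, msg, dest):
        self.message_count += 1
        self.queue.append((dest, msg))

    def run(self):
        while self.queue:
            dest, msg = self.queue.popleft()
            self.procs[dest].on_message(msg)


net = Network(3)
net.procs[1].request_cs()    # p1 and p2 request concurrently with equal timestamps
net.procs[2].request_cs()
net.run()                    # p1 wins the tie on process id and enters the CS
net.procs[1].release_cs()
net.run()                    # p2 now receives the deferred reply
net.procs[2].release_cs()
print(net.cs_order, net.message_count)   # [1, 2] 8
```

The two entries cost 8 messages in total, which is 2(N - 1) = 4 messages per entry for N = 3.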
16Distributed Algorithm (cont)
- Analysis
- 2(N - 1) messages are required to access the CS: (N - 1) requests and (N - 1) replies
- expensive, and the failure of any process becomes a bottleneck, since a requester waits for a reply from every other process
- extra overhead: even if the requesting process was the last one to hold the CS, it still goes through the whole procedure above
17Elections
- Purpose
- to choose a process from a group
- select a new master in the Berkeley clock synchronization algorithm
- select a new member generating a token in ring-based distributed synchronization
- Algorithms
- ring-based
- bully
18Ring-based Election
- Chang and Roberts, 1979
- goal: to elect a single process, i.e. the coordinator
- process with the largest identifier
- Algorithm
- initially, every process is marked as a non-participant
- any process begins an election by marking itself as a participant and sending an election message carrying its own id to its neighbor
- when an election message is received, check whether the receiver is a participant and compare ids
- if not a participant
- arrived id is higher: mark itself as a participant and pass the message on
- own id is higher: substitute its own id, mark itself as a participant and pass the message on
- receiver is already a participant: do not forward the message if the arrived id is smaller
- the election is done when the id in the election message is the same as the receiver's own id
- the receiver marks itself as a non-participant and sends an elected message with its id
- a process that receives an elected message marks itself as a non-participant and forwards the message
- Analysis
- (3N - 1) messages in the worst case and 2N in the best case
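The message counts in the analysis can be reproduced with a short simulation. This is a sketch assuming a single initiator, so the rule for discarding messages at an existing participant (which only matters for concurrent elections) is not modelled. The function `ring_election` and the example ring are hypothetical.

```python
def ring_election(ids, starter):
    """Simulate one ring election.

    ids: unique identifiers listed in ring order (messages go to the next index).
    starter: index of the process that begins the election.
    Returns (elected identifier, total number of messages sent).
    """
    n = len(ids)
    participant = [False] * n
    coordinator = [None] * n
    messages = 0

    # The initiator marks itself as a participant and sends its own id.
    participant[starter] = True
    kind, value, pos = "election", ids[starter], starter
    while True:
        pos = (pos + 1) % n      # the message travels to the neighbour
        messages += 1
        me = ids[pos]
        if kind == "election":
            if value > me:
                participant[pos] = True    # forward the arrived id unchanged
            elif value < me:
                participant[pos] = True    # substitute own id and forward
                value = me
            else:
                # Own id came back around the ring: this process is elected.
                participant[pos] = False
                coordinator[pos] = me
                kind = "elected"
        else:
            if value == me:
                break                      # elected message is back at the winner
            participant[pos] = False
            coordinator[pos] = value
    return value, messages


ring = [5, 1, 3]
print(ring_election(ring, starter=0))  # (5, 6): best case, 2N messages
print(ring_election(ring, starter=1))  # (5, 8): worst case, 3N - 1 messages
```

The best case arises when the initiator already has the largest identifier. The worst case arises when the largest identifier sits just behind the initiator, so the election message needs N - 1 hops to reach it.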
19A ring-based election in progress
[Figure: ring of processes exchanging election messages; identifiers appearing in the figure: 3, 17, 4, 24, 9, 1, 15, 28]
Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened.
20Bully Algorithm Garcia-Molina 1982
- Assumptions
- each process has a unique id
- processes know the id and address of every other process
- communication is assumed reliable, but processes can fail during an election
- an election begins when a process detects that the coordinator has failed
- Algorithm
- to begin an election, a process sends an election message to all processes with higher ids and awaits an answer message
- if no answer message arrives, the process becomes coordinator and sends a coordinator message to the processes with lower ids
- if the process receives an answer message, it waits for a coordinator message
- if a process receives an election message, it returns an answer and starts an election (unless it has already started one)
- if a process receives a coordinator message, it treats the sender as coordinator
- if the failed process with the highest id is restarted, it overrides the current coordinator
- Analysis
- (N - 2) messages in the best case and $O(N^2)$ messages in the worst case
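The cascade of elections can be sketched as follows. This is a simplified sketch, assuming the set of live processes does not change during the election and that a missing answer is detected reliably by timeout. The function `bully_election` and its parameters are hypothetical.

```python
def bully_election(ids, alive, starter):
    """Simulate the bully algorithm with a fixed set of live processes.

    ids: all process identifiers; alive: identifiers of processes that are up;
    starter: the live process that detected the coordinator's failure.
    Returns (new coordinator, processes in the order they started an election).
    """
    if starter not in alive:
        raise ValueError("the starter must be a live process")
    pending = [starter]   # processes that still have to run their election
    started = []
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:
            continue      # a process starts at most one election
        started.append(p)
        # Election messages go to all higher ids; only live ones answer.
        responders = [q for q in ids if q > p and q in alive]
        if responders:
            pending.extend(responders)   # each responder starts its own election
        else:
            coordinator = p              # no answer: p announces itself
    return coordinator, started


# Four processes; p4 (the old coordinator) and p3 have failed, p1 detects it.
print(bully_election([1, 2, 3, 4], alive={1, 2}, starter=1))  # (2, [1, 2])
```

Only the live process with the highest id receives no answer, so it is the one that ends up sending the coordinator message.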
21The bully algorithm example
The election of coordinator p2, after the
failure of p4 and then p3