Title: Chapter 18: Distributed Process Management
1 Chapter 18: Distributed Process Management
- CS 472 Operating Systems
- Indiana University Purdue University Fort Wayne
2 Distributed Process Management
- Note: Chapter 18 is an online chapter and does not appear in the textbook
- Available under Online Chapters at WilliamStallings.com/OS/OSe6.html
- Be aware that the URL is case sensitive
- A .pdf of this chapter should also be available on the class web site under Resources
3 Distributed Process Management
- This chapter concerns some issues in developing a distributed OS:
- Process migration
- Global state of a distributed system
- Distributed mutual exclusion
4 Process migration
- A sufficient amount of the state of a process must be transferred from one computer to another for the process to execute on the target machine
- Goals:
- Load sharing
- Efficient interaction with other processes and data
- Access to special resources
- Survival
5 Process migration goals
- Load sharing
- Move processes from heavily loaded to lightly loaded systems
- The OS typically initiates migration for load sharing
- Communications performance
- Processes that interact intensively can be moved to the same node to reduce communications cost
- It may be better to move a process to where the data reside when the data set is large
- The process itself typically initiates this migration
6 Process migration goals
- Utilizing special capabilities
- A process can migrate to take advantage of unique hardware or software capabilities
- Availability (survival)
- A long-running process may need to move because the machine it is running on will be down
7 To migrate process P from A to B . . .
- Destroy the process on A and create it on B
- Move at least the PCB
- Update any links between P and other processes and data . . .
- Including any outstanding messages and signals, open files, etc. (a sketch of these steps follows below)
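These steps can be made concrete in a short sketch. This is a minimal illustration, not the textbook's mechanism: the node operations (freeze, create_process, redirect_links, destroy) and the ProcessImage fields are hypothetical names for the state the slide says must move.

```python
# Hypothetical sketch of migrating process P from node A to node B.
from dataclasses import dataclass, field

@dataclass
class ProcessImage:
    pid: int
    pcb: dict                        # registers, scheduling info, etc.
    open_files: list = field(default_factory=list)
    pending_signals: list = field(default_factory=list)
    pending_messages: list = field(default_factory=list)

def migrate(p: ProcessImage, node_a, node_b):
    node_a.freeze(p.pid)                  # stop P on A
    node_b.create_process(p)              # recreate P on B from the moved state
    node_a.redirect_links(p.pid, node_b)  # update messages, signals, open files
    node_a.destroy(p.pid)                 # remove all trace of P on A
```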
9 Migration of process P from A to B
- The entire address space can be moved, or pieces transferred on demand
- Transfer strategies (assuming paged virtual memory):
- Eager (all)
- Precopy
- Eager (dirty)
- Copy-on-reference
- Flushing
10 Transfer strategies
- Eager (all): Transfer the entire address space
- No trace of the process is left behind
- If the address space is large and the process does not need most of it, this approach may be unnecessarily expensive
- Precopy: The process continues to execute on the source node while the address space is copied
- Pages modified on the source during the precopy operation have to be copied a second time
- Reduces the time that a process is frozen and cannot execute during migration
11 Transfer strategies
- Eager (dirty): Transfer only modified pages in main memory
- Any additional blocks of the virtual address space are transferred on demand from disk
- The source machine is involved throughout the life of the process
- Copy-on-reference: Transfer pages only when referenced
- Has the lowest initial cost of process migration
- Flushing: Pages are cleared from main memory by flushing dirty pages to disk
- Relieves the source of holding any pages of the migrated process in main memory
- Needed pages are subsequently loaded from disk (a cost sketch follows below)
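As a rough illustration of how the strategies differ, the sketch below estimates the pages that must move while the process is frozen. The function and its parameters are invented for illustration; real costs also depend on demand-paging traffic after the migration.

```python
# Rough cost sketch (invented for illustration): pages that must move
# while the process is frozen, for an address space of `total` resident
# pages of which `dirty` have been modified.
def pages_moved_while_frozen(strategy: str, total: int, dirty: int) -> int:
    if strategy == "eager_all":          # entire address space up front
        return total
    if strategy == "precopy":            # copied while still running; only
        return dirty                     # pages re-dirtied during the copy
                                         # move during the freeze
    if strategy == "eager_dirty":        # only modified pages move now;
        return dirty                     # the rest come on demand from disk
    if strategy == "copy_on_reference":  # nothing up front; pages follow use
        return 0
    if strategy == "flushing":           # dirty pages flushed to disk first;
        return 0                         # B later faults everything in
    raise ValueError(f"unknown strategy: {strategy}")

# e.g. a 10,000-page process with 400 dirty pages:
#   eager_all -> 10000, eager_dirty -> 400, copy_on_reference -> 0
```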
12 Initiation of migration can be made . . .
- by a load balancing process
- by the process itself
- for communications performance
- for survival
- to access special resources
- by the foreign system (eviction)
13 Global state of a distributed system
- A difficult concept to understand
- The global state of a distributed system needs to be known for mutual exclusion, avoiding deadlock, etc.
- The operating system cannot know the current state of all processes in the distributed system
14 Global state of a distributed system
- A node can only know the current state of all local processes and earlier states of remote processes
- States of remote processes are known only through messages
- Even the exact times of remote states cannot be known
- It is impossible to synchronize the clocks of nodes accurately enough to be of use
15 Example
- A bank is distributed over two branches
- To close a checking account at the bank, the account balance (the global state of the account) needs to be known
- Deposits may not have cleared
- Fund transfers may be pending
- Checks may not have been cashed
- Ask all correspondents to state pending activity
- Close the account when all reply
- The situation is analogous to determining the global state of a distributed system
16 Example
- At 3 PM the account balance is to be determined
- Messages are exchanged for the needed information
- A snapshot is established for each branch as of 3 PM
17 Example
- Suppose that at the time of balance determination, a fund transfer message is in transit from branch A to branch B
- The result is a false balance determination ($0): the amount in transit is counted at neither branch
18 Example
- To correct the balance, all messages in transit at the time of observation must be examined
- The total consists of the balances at both branches plus the amounts in the messages in transit
19 Example
- Suppose the clocks at the two branches are not perfectly synchronized
- The transfer amount leaves branch A at 3:01 (by A's clock)
- The amount arrives at branch B at 2:59 (by B's clock)
- At 3:00 the amount is counted twice ($200): A has not yet deducted it, while B has already received it
20 Terminology
- Channel
- Exists between two processes if they exchange messages
- State
- The sequence of messages that have been sent and received along channels incident with the process
- Snapshot of a process
- The current local state of the process . . .
- together with the state as defined above
- Global state
- The combined snapshots of all processes
21 Problem
- Process P gathers snapshots from the other processes and determines a global state
- Process Q does the same
- The two global states as determined by P and Q may be different
- Solution: settle for consistent global states
- Global states are consistent if . . .
- for each message received, the snapshot of the sender indicates that the message was sent (this condition is sketched as a check below)
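The consistency condition can be phrased as a simple check over snapshots. This is an illustrative sketch, assuming each snapshot records the sets of messages sent and received; the data layout is invented for the example.

```python
# Minimal sketch (illustrative, not from the textbook): check that a set
# of snapshots forms a consistent global state.
def is_consistent(snapshots: dict) -> bool:
    """snapshots maps process id -> {"sent": set, "received": set};
    each message is a tuple (sender, receiver, payload)."""
    for pid, snap in snapshots.items():
        for msg in snap["received"]:
            sender = msg[0]
            # every received message must appear as sent in the sender's
            # snapshot; otherwise the global state is inconsistent
            if msg not in snapshots[sender]["sent"]:
                return False
    return True
```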
22 Inconsistent global state [figure]
23 Consistent global state [figure]
24 Distributed snapshot algorithm
- Assumes that all messages are delivered in the order sent and that no messages are lost (e.g., TCP)
- A special control message called a marker is used
- Any process may initiate the algorithm by
- recording its state
- sending out the marker on all outgoing channels before any other messages are sent
26 Distributed snapshot algorithm
- Let P be any participating process
- Upon first receipt of the marker (say, from process Q), process P does the following:
- P records its local state S_P
- P records the state of the incoming channel from Q to P as empty
- P propagates the marker to all its neighbors along all outgoing channels
- These three steps must be performed atomically, without any other messages being sent or received
27 Distributed snapshot algorithm
- Later, when P receives a marker on another incoming channel (say, from process R) . . .
- P records the state of the channel from R to P as the sequence of messages P has received from R from the time P recorded its local state S_P to the time it received the marker from R
- The algorithm terminates at process P once the marker has been received along every incoming channel (the rules are sketched in code below)
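The marker rules on slides 24, 26, and 27 amount to the Chandy-Lamport snapshot algorithm. Below is a sketch of one process's marker handling; the transport (send) and the local-state capture (state_fn) are assumed, and error handling is omitted.

```python
# Sketch of the marker rules described above (Chandy-Lamport snapshot).
class SnapshotProcess:
    def __init__(self, pid, incoming, outgoing, send, state_fn):
        self.pid = pid
        self.incoming = set(incoming)   # processes with channels into us
        self.outgoing = list(outgoing)  # processes we have channels to
        self.send = send                # send(dest, msg) -- assumed transport
        self.state_fn = state_fn        # assumed: captures local state
        self.local_state = None
        self.channel_state = {}         # src -> messages recorded in transit
        self.recording = set()
        self.done = False

    def initiate(self):                 # any process may start the snapshot
        self._record_and_propagate()

    def on_message(self, src, msg):
        if msg == "MARKER":
            if self.local_state is None:      # first marker received:
                self._record_and_propagate()  # channel from src stays empty
            self.recording.discard(src)       # channel state is now final
            self.done = not self.recording    # markers seen on every channel
        elif self.local_state is not None and src in self.recording:
            self.channel_state[src].append(msg)   # message was in transit

    def _record_and_propagate(self):
        self.local_state = self.state_fn()        # record local state S_P
        self.channel_state = {s: [] for s in self.incoming}
        self.recording = set(self.incoming)
        for dest in self.outgoing:    # marker goes out before anything else
            self.send(dest, "MARKER")
```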
28 Distributed snapshot algorithm
- Once the algorithm has terminated at all processes, the consistent global state can be assembled at any node
- Any node wanting a consistent global state asks every other node to send it the state data recorded at that node
29 Distributed snapshot algorithm
- The algorithm succeeds even if several nodes independently decide to initiate it
- The algorithm is not affected by any other distributed algorithm the processes are executing
- The algorithm terminates in a finite amount of time
- The algorithm can be used to adapt any centralized algorithm to a distributed environment
30 Distributed mutual exclusion
- Recall that shared memory and semaphores cannot be used to enforce mutual exclusion across the nodes of a distributed system
- Instead, any mechanism must depend on the exchange of messages
- Algorithms for mutual exclusion may be
- Centralized
- Distributed
31 Centralized mutual exclusion algorithm
- The algorithm is straightforward
- One node is designated as the control node
- This node controls access to all shared objects
- Only the control node makes resource-allocation decisions
- Uses Request, Permission, and Release messages
- The control node may be a bottleneck
- Failure of the control node causes a breakdown of mutual exclusion (a coordinator sketch follows below)
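A minimal sketch of the control node's logic, using the Request/Permission/Release messages named above; the transport function send is assumed, and a real system would add failure handling.

```python
# Illustrative sketch of the control node for centralized mutual exclusion.
from collections import deque

class ControlNode:
    def __init__(self, send):
        self.send = send            # send(node, msg) -- assumed transport
        self.holder = None          # node currently holding the resource
        self.waiting = deque()      # FIFO queue of pending requesters

    def on_request(self, node):
        if self.holder is None:
            self.holder = node
            self.send(node, "Permission")
        else:
            self.waiting.append(node)   # node must wait its turn

    def on_release(self, node):
        assert node == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            self.send(self.holder, "Permission")
```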
32 Distributed mutual exclusion algorithm
- Each node has only a partial picture of the total system and must make decisions based on this information
- All nodes bear equal responsibility for the final decision
- Failure of a node, in general, does not result in a total system collapse
- There is no common clock and no way to adequately synchronize clocks
33 Distributed mutual exclusion
- A distributed algorithm does require a time ordering of events
- For this purpose, an event is the sending of a message
- Did event E1 on node S1 occur before event E2 on node S2?
- Communication delays must be overcome
- The answer need not be correct, but all nodes must reach the same conclusion
34 Lamport's timestamping algorithm
- Gives a consistent time-ordering of events in a distributed system
- Each node i has a local counter C_i
- When node i sends a message, it first increments C_i by 1
- Messages from node i all have the form (m, T_i, i), where
- m is the actual message (like Request or Release)
- i is the node number
- T_i is a copy of C_i (the node's timestamp) at the time the message was created
35 Lamport's timestamping algorithm
- When node j receives a message from node i, it updates its local counter: C_j := 1 + max(C_j, T_i)
- (m, T_i, i) precedes (m', T_j, j) . . .
- if T_i < T_j
- or if T_i = T_j and i < j
- For this to work, each message must be sent to all other nodes (a code sketch follows below)
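A compact sketch of the timestamping rules above; the class and function names are illustrative. Python tuple comparison conveniently encodes the (timestamp, node id) ordering.

```python
# Sketch of a Lamport logical clock. Each node keeps a counter;
# timestamps (T, node_id) give a total order of messages.
class LamportClock:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.counter = 0

    def stamp_send(self, m):
        self.counter += 1                    # increment C_i before sending
        return (m, self.counter, self.node_id)

    def on_receive(self, msg):
        m, t_i, i = msg
        self.counter = 1 + max(self.counter, t_i)   # C_j := 1 + max(C_j, T_i)

def precedes(msg_a, msg_b):
    # (m, T_i, i) precedes (m', T_j, j) if T_i < T_j,
    # or if T_i == T_j and i < j (node id breaks ties)
    _, t_a, a = msg_a
    _, t_b, b = msg_b
    return (t_a, a) < (t_b, b)
```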
36 Example
- (a,1,1) < (x,3,2) < (b,5,1) < (j,5,3)
37 Example [figure]
38 Distributed mutual exclusion using a distributed queue
- The queue is just an array with one entry for each node
- Requests for resources are granted FIFO, based on timestamped Request messages
- All nodes maintain a copy of the queue
- Each node keeps the most recent message from each of the other nodes in the queue
39 Distributed mutual exclusion using a distributed queue
- All nodes agree on an order for the messages within the queue if no messages are in transit
- The in-transit problem is overcome by the distributed queue algorithm (First Version)
- Summary on the next slide
- 3(N-1) messages are involved per request
- Version Two is more efficient: 2(N-1) messages
40 Summary of distributed queue algorithm
- A timestamped resource Request message is sent to all other nodes
- A copy of the Request message is also saved in the queue of the requesting node
- If it has not itself made a request, each node receiving a Request sends a Reply message back to the sender
- This assures that no earlier Request message is in transit when the requester makes its decision
- A process may access the requested resource when its Request is the earliest message in its queue
- After acting on a resource request, a node sends a Release message to all other nodes and puts a copy in its own queue (the algorithm is sketched below)
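The summary above can be sketched as follows. This is an illustrative first-version implementation, assuming a reliable, order-preserving send(dest, msg) transport; queue entries are (type, timestamp, node) tuples.

```python
# Illustrative sketch of the distributed queue algorithm summarized above.
class QueueNode:
    def __init__(self, node_id, all_ids, send):
        self.id = node_id
        self.others = [j for j in all_ids if j != node_id]
        self.send = send                      # send(dest, msg) -- assumed
        self.clock = 0
        # queue[j] holds the most recent timestamped message from node j
        self.queue = {j: ("Release", 0, j) for j in all_ids}

    def _broadcast(self, msg):
        for j in self.others:
            self.send(j, msg)

    def request_resource(self):
        self.clock += 1
        msg = ("Request", self.clock, self.id)
        self.queue[self.id] = msg             # keep a copy of our own Request
        self._broadcast(msg)

    def on_message(self, msg):
        kind, t, j = msg
        self.clock = 1 + max(self.clock, t)
        self.queue[j] = msg
        # per the slide: reply only if we have no Request of our own pending
        if kind == "Request" and self.queue[self.id][0] != "Request":
            self.clock += 1
            self.send(j, ("Reply", self.clock, self.id))

    def may_enter(self):
        # enter when our Request is the earliest message in the queue
        mine = self.queue[self.id]
        if mine[0] != "Request":
            return False
        return all((m[1], m[2]) > (mine[1], mine[2])
                   for j, m in self.queue.items() if j != self.id)

    def release_resource(self):
        self.clock += 1
        msg = ("Release", self.clock, self.id)
        self.queue[self.id] = msg
        self._broadcast(msg)
```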
41 [Figure: node 2 wants to enter a critical section. Four panels show the queue (one slot per node, 1-4) at each node as node 2's Request propagates, the other nodes Reply, and a Release follows. Legend: Q = reQuest, P = rePly, L = reLease]
- What if some node made an earlier request that is still in transit?
42 Token-passing algorithm for distributed mutual exclusion
- Two arrays are used
- Token array
- Passed from node to node
- The kth position contains the timestamp of node k the last time the token visited that node
- Request array
- Maintained by each node
- The jth position contains the timestamp of the last Request message received from node j
43 Token-passing algorithm
- Send a Request to all other nodes
- Wait for the token
- Release the resource by sending the token to some node requesting the resource
- Choose the first requesting node k whose Request message has a timestamp greater than its timestamp in the token
- That is, request[k] > token[k] (a sketch follows below)
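A sketch of the hand-off rule, with invented names: request and token are the two arrays from slide 42, and the scan starts just past the current holder so the choice rotates around the nodes rather than being FIFO.

```python
# Illustrative sketch of choosing the next token holder.
def next_token_holder(my_id, n, request, token):
    """request[j]: timestamp of the last Request seen from node j;
    token[k]: timestamp of node k's last turn with the token.
    Scan the nodes after my_id in circular order and pick the first
    whose request is newer than its last visit recorded in the token."""
    for offset in range(1, n + 1):
        k = (my_id + offset) % n
        if request[k] > token[k]:
            return k          # node k has an unserviced request
    return None               # no one is waiting; keep the token

# A node releasing the resource updates token[my_id] to its current
# timestamp, then sends the token to next_token_holder(...), if any.
```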
44 [Figure: node 2 wants to enter a critical section while node 3 holds the token. Four panels show node 2's Request (Q) reaching each node's request array; the token, whose entries record the time (T) of its last visit to each node, is then passed to node 2. Legend: Q = reQuest, T = time of last visit]
45 Token-passing algorithm
- See the full algorithm in Figure 18.11
- N messages are needed per resource request
- The choice of the next requesting node is not FIFO
- However, there is no starvation
46 Distributed deadlock in resource allocation
- Distributed deadlock prevention
- Circular wait can be denied by defining a linear ordering of resource types
- The hold-and-wait condition can be denied by requiring that a process request all of its required resources at one time
- The process is blocked until all requests can be granted simultaneously
- Resource requirements need to be known in advance (a lock-ordering sketch follows below)
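The circular-wait rule can be illustrated in miniature with ordinary locks on one machine; the resource names and ordering below are invented, and a distributed version would order global resource identifiers the same way.

```python
# Minimal sketch: denying circular wait by acquiring resources in a
# fixed global order of resource ids.
import threading

locks = {rid: threading.Lock() for rid in ["disk", "printer", "tape"]}
ORDER = {"disk": 0, "printer": 1, "tape": 2}   # assumed linear ordering

def acquire_in_order(needed):
    """Acquire every lock in `needed`, always in ascending global order,
    so no two processes can end up waiting on each other in a cycle."""
    for rid in sorted(needed, key=ORDER.get):
        locks[rid].acquire()

def release_all(needed):
    for rid in needed:
        locks[rid].release()
```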
47 Distributed deadlock
- Distributed deadlock avoidance is impractical
- Every node must keep track of the global state of the system
- The process of checking for a safe global state must be done under mutual exclusion
- Otherwise two nodes, each considering a different request, could erroneously honor both requests when only one is safe
- Checking for safe states involves considerable processing overhead for a distributed system with a large number of processes and resources
48 Distributed deadlock
- Distributed deadlock detection
- Each site only knows about its own resources
- Deadlock may involve distributed resources
- Three possible techniques:
- Centralized control
- Hierarchical control
- Distributed control
49 Distributed deadlock
- Distributed deadlock detection
- Centralized control
- One site is responsible for deadlock detection
- Simple, but subject to failure of the central node
- Hierarchical control
- Sites are organized as a tree
- Each node collects information from its children
- Deadlocks are detected at the common ancestor of the sites involved
- Distributed control
- All processes cooperate in the deadlock detection function
- This may involve considerable overhead (a cycle-check sketch follows below)
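Whatever the control structure, detection itself reduces to finding a cycle in a wait-for graph. A small sketch, with an invented graph encoding:

```python
# Illustrative sketch: deadlock detection as cycle-finding in a wait-for
# graph, where an edge p -> q means process p waits for a resource held by q.
def has_deadlock(wait_for: dict) -> bool:
    """wait_for maps each process to the processes it is waiting on."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def visit(p):
        color[p] = GRAY
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:    # back edge: cycle -> deadlock
                return True
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(wait_for))

# e.g. has_deadlock({1: [2], 2: [3], 3: [1]}) -> True (mutual waiting cycle)
```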
50 Deadlock in message communication
- Mutual waiting
- Deadlock can exist due to mutual waiting among a group of processes: each process is waiting for a message from another process and there are no messages in transit
- [Figure] Example: P1 is waiting for a message from either P2 or P5
51 Deadlock in message communication
- Unavailability of message buffers
- A well-known problem in packet-switching data networks
- Store-and-forward deadlock
- Example of direct store-and-forward deadlock: the buffer space at A is filled with packets destined for B, and the reverse is true at B
52 Deadlock in message communication
- Unavailability of message buffers
- Indirect store-and-forward deadlock: for each node, the queue to the adjacent node in one direction is full with packets destined for the next node beyond