Title: Outline
1Outline
- Distributed scheduling
- Motivations
- Design issues
- Distributed scheduling algorithms
2Motivations
- In a locally distributed system, there is a good
possibility that several computers are heavily
loaded while others are idle or lightly loaded
- If we can move jobs around (in other words,
distribute the load more evenly), the overall
performance of the system can be maximized
3Motivations cont.
4Motivations cont.
5Distributed Scheduling
- A distributed scheduler is a resource management
component of a distributed operating system that
focuses on judiciously and transparently
redistributing the load of the system among the
computers to maximize the overall performance
6Issues in Load Distribution
- Load estimation
- Queue lengths
- CPU utilization
- Load distributing algorithms
- Static
- Dynamic
- Adaptive
7Issues in Load Distribution cont.
- Load balancing vs. load sharing
- Load sharing tries to reduce the likelihood of an
unshared state (where one computer is idle while
at the same time others are overloaded) by
transferring tasks
- Load balancing algorithms attempt to equalize
loads at all computers
8Issues in Load Distribution cont.
- Preemptive vs. non-preemptive transfers
- Preemptive task transfers involve the transfer of
tasks that are partially executed
- This transfer is in general expensive as it needs
to transfer the entire task state consisting of a
virtual memory image, a process control block,
unread I/O buffers and messages, file pointers,
timers that have been set, and so on
- Non-preemptive transfers involve the transfer of
tasks that have not started yet
- Environment transfer
9Components of a Load Distributing Algorithm
- Four components
- Transfer policy
- Determines when a node needs to send tasks to
other nodes or can receive tasks from other nodes
- Selection policy
- Determines which task(s) to transfer
- Location policy
- Finds suitable nodes for load sharing
10Components of a Load Distributing Algorithm
cont.
- Four components continued
- Information policy
- Demand-driven
- Periodic
- State-change driven
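A minimal sketch of how a transfer policy and a selection policy might look, assuming a queue-length load estimate with two thresholds. The names `Role`, `transfer_policy`, `selection_policy`, and the threshold values are illustrative assumptions, not part of the slides.

```python
from enum import Enum


class Role(Enum):
    SENDER = "sender"
    RECEIVER = "receiver"
    OK = "ok"


def transfer_policy(queue_length: int, low: int = 1, high: int = 3) -> Role:
    """Classify a node by comparing its CPU queue length with two thresholds.

    The thresholds `low` and `high` are assumed values for illustration.
    """
    if queue_length > high:
        return Role.SENDER    # overloaded: should try to send a task away
    if queue_length < low:
        return Role.RECEIVER  # underloaded: can accept a task
    return Role.OK


def selection_policy(tasks):
    """Pick the most recently queued task that has not started yet.

    Choosing an unstarted task keeps the transfer non-preemptive.
    Each task is a dict with a boolean "started" field (an assumed layout).
    """
    waiting = [t for t in tasks if not t["started"]]
    return waiting[-1] if waiting else None
```

With these defaults a node holding five queued tasks is classified as a sender, and an idle node as a receiver.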
11Stability
- The queuing-theoretic perspective
- The CPU queues grow without bound if arrival rate
is greater than the rate at which the system can
perform work
- A load distributing algorithm is effective under
a given set of conditions if it improves the
performance relative to that of a system not
using load distribution
- Algorithmic stability
- An algorithm is unstable if it can perform
fruitless actions indefinitely with finite
probability
- Processor thrashing
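The queuing-theoretic condition can be written as a utilization bound. The symbols are assumptions for illustration: N identical nodes, each able to complete \mu tasks per unit time, and an aggregate task arrival rate \lambda.

```latex
\rho = \frac{\lambda}{N\mu},
\qquad
\text{the CPU queues remain bounded only if } \rho < 1 .
```

Polling and task transfers also consume CPU time, so under this simplified model the usable capacity is somewhat below N\mu.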
12Sender-Initiated Algorithms
- In sender-initiated algorithms, an overloaded
node initiates the load distribution
- Transfer policy
- Selection policy
- Location policy
- Random
- Threshold
- Shortest
- Information policy
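The Threshold and Shortest location policies can be sketched as follows, assuming the usual textbook description: a sender polls a few randomly chosen nodes and only transfers to a node whose queue would not exceed the threshold after the transfer. The function names, the `queue_lengths` dictionary, and the default values are hypothetical; real polling would be done with messages rather than a shared table.

```python
import random


def threshold_location(sender, queue_lengths, threshold=2, poll_limit=3, rng=random):
    """Poll up to poll_limit random nodes and return the first acceptable one.

    A node is acceptable if taking one more task keeps its queue length
    at or below the threshold. Returns None if no polled node qualifies.
    """
    others = [n for n in queue_lengths if n != sender]
    for node in rng.sample(others, min(poll_limit, len(others))):
        if queue_lengths[node] + 1 <= threshold:
            return node
    return None


def shortest_location(sender, queue_lengths, threshold=2, poll_limit=3, rng=random):
    """Poll up to poll_limit random nodes and pick the shortest queue among them.

    The chosen node is still rejected if the transfer would push it
    above the threshold.
    """
    others = [n for n in queue_lengths if n != sender]
    polled = rng.sample(others, min(poll_limit, len(others)))
    if not polled:
        return None
    best = min(polled, key=lambda n: queue_lengths[n])
    return best if queue_lengths[best] + 1 <= threshold else None
```

The Random policy needs no polling at all: it simply picks any other node, which is cheap but can forward a task to a node that is itself overloaded.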
13Sender-Initiated Algorithms cont.
14Sender-Initiated Algorithms cont.
- Performance analysis
- Instability at high system loads
- When system loads are high, sender-initiated
algorithms can make the system unstable
- At high system loads, no node is likely to be
lightly loaded and the probability that a sender
will find a receiver is very low
- However, the polling activity increases as the
rate at which work arrives increases
- Performance at low system loads
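A rough way to see why polling becomes fruitless at high load: if a fraction p of the nodes are receivers and each poll picks a node independently, the chance that at least one of k polls succeeds is 1 - (1 - p)^k. The independence assumption and the function name `p_find_receiver` are simplifications introduced here for illustration.

```python
def p_find_receiver(receiver_fraction: float, poll_limit: int) -> float:
    """Probability that at least one of poll_limit independent polls hits a receiver."""
    return 1.0 - (1.0 - receiver_fraction) ** poll_limit
```

Under this model, with half the nodes lightly loaded three polls succeed with probability 0.875, while with only 2% lightly loaded the same three polls succeed with probability of about 0.059, so most of the polling work is wasted.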
15Receiver-Initiated Algorithms
- In receiver-initiated algorithms, an underloaded
node initiates the load distribution
- Transfer policy
- Selection policy
- Location policy
- Information policy
16Receiver-Initiated Algorithms cont.
17Receiver-Initiated Algorithms cont.
- Performance analysis
- At high system loads, the probability of finding
a sender is high and thus a receiver can find a
sender in a few polls in general
- At low system loads, there are few senders but
more receiver-initiated polls; these polls do not
cause system instability as spare CPU cycles are
available
- A drawback
- Most transfers will be preemptive and thus
expensive
18Empirical Comparison of Sender-Initiated and
Receiver-Initiated Algorithms
19Symmetrically Initiated Algorithms
- In symmetrically initiated algorithms, both
senders and receivers search for receivers and
senders respectively for task transfers
- The above average algorithm
- Transfer policy
- Location policy
- Sender-initiated component
- Receiver-initiated component
- Selection policy
- Information policy
20Symmetrically Initiated Algorithms cont.
- Sender-initiated component
- A sender broadcasts a TooHigh message, sets a
TooHigh timeout alarm, and listens for an Accept
message
- A receiver that receives a TooHigh message
cancels its TooLow timeout, sends an Accept
message to the sender, and increases its load
value
- On receiving an Accept message, if the site is
still a sender, it chooses the best task to
transfer and transfers it
- If no Accept has been received before the
timeout, the sender broadcasts a ChangeAverage
message to increase the average load estimates at
the other nodes
21Symmetrically Initiated Algorithms cont.
- Receiver-initiated component
- A receiver broadcasts a TooLow message, sets a
TooLow timeout alarm, and starts listening for a
TooHigh message
- If a TooHigh message is received, it cancels its
TooLow timeout, sends an Accept message to the
sender, and increases its load value
- If no TooHigh message is received before the
timeout, the receiver broadcasts a ChangeAverage
message to decrease the average load estimates at
the other nodes
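The message handling of the above average algorithm can be sketched as a small set of event handlers. This is a simplified illustration, not the full protocol: broadcasts and timers are left to the caller, and the class name `Node`, the `margin` around the average, and the step of 1 used for ChangeAverage are all assumptions.

```python
class Node:
    def __init__(self, name, load, average, margin=1):
        self.name = name
        self.load = load          # current load estimate of this node
        self.average = average    # this node's estimate of the system average
        self.margin = margin      # assumed acceptable range around the average
        self.too_low_pending = False

    def is_sender(self):
        return self.load > self.average + self.margin

    def is_receiver(self):
        return self.load < self.average - self.margin

    def on_too_high(self, sender):
        """Receiver side: answer a TooHigh broadcast with an Accept message."""
        if not self.is_receiver():
            return None
        self.too_low_pending = False  # cancel the TooLow timeout
        self.load += 1                # account for the task about to arrive
        return ("Accept", self.name, sender)

    def on_accept(self, receiver):
        """Sender side: transfer a task only if this node is still a sender."""
        if not self.is_sender():
            return None
        self.load -= 1
        return ("Transfer", self.name, receiver)

    def on_timeout(self):
        """No partner found in time: ask other nodes to adjust their average."""
        if self.is_sender():
            return ("ChangeAverage", +1)
        if self.is_receiver():
            return ("ChangeAverage", -1)
        return None

    def on_change_average(self, delta):
        self.average += delta
```

Adjusting the average after a timeout is what lets the acceptable range follow the actual system load, so that nodes stop searching for partners that do not exist.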
22Symmetrically Initiated Algorithms cont.
- Performance analysis
- Instability at high system loads
- Due to the sender-initiated components
23Comparison
24Adaptive Algorithms
- A stable symmetrically initiated algorithm
- Each node keeps a senders list, a receivers
list, and an OK list
- Nodes in the system are classified as
Sender/overloaded, Receiver/underloaded, or OK
using the information gathered through polling
25A Stable Symmetrically Initiated Algorithm cont.
- Sender-initiated component
- The sender polls the node at the head of the
receivers list
- The polled node moves the sender to the head of
its senders list and sends a message indicating
whether it is a receiver, sender, or OK node
- The sender updates the polled node's entry in its
lists based on the reply
- If the polled node is a receiver, it transfers a
task
- The polling process stops if its receivers list
becomes empty, or the number of polls reaches a
PollLimit
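The sender-initiated component above can be sketched with three double-ended queues. This is an illustrative sketch under stated assumptions: `poll` is a hypothetical callback standing in for the poll message and its reply, and the function only decides where a task should go rather than performing the transfer.

```python
from collections import deque


def sender_poll(receivers, senders, ok, poll, poll_limit=3):
    """Poll nodes from the head of the receivers list.

    poll(node) returns "receiver", "sender", or "ok".
    Returns the node a task should be transferred to, or None if the
    receivers list runs empty or poll_limit polls have been made.
    """
    polls = 0
    while receivers and polls < poll_limit:
        node = receivers[0]
        polls += 1
        status = poll(node)
        if status == "receiver":
            return node
        # The polled node is not a receiver: reclassify it.
        receivers.popleft()
        (senders if status == "sender" else ok).appendleft(node)
    return None
```

Because every failed poll removes a node from the receivers list, a heavily loaded system soon leaves senders with empty receivers lists, and sender-initiated polling stops by itself.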
26A Stable Symmetrically Initiated Algorithm cont.
- Receiver-initiated component
- The nodes are polled in the following order
- Head to tail of its senders list
- Tail to head in the OK list
- Tail to head in the receivers list
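The polling order for the receiver-initiated component can be expressed in a few lines. The function name `receiver_poll_order` is hypothetical, and the lists are assumed to be ordered head first.

```python
def receiver_poll_order(senders, ok, receivers):
    """Order in which a receiver polls: senders head to tail, then the
    OK list tail to head, then the receivers list tail to head."""
    return list(senders) + list(reversed(ok)) + list(reversed(receivers))
```

The head of the senders list holds the most recently confirmed senders, while the tails of the OK and receivers lists hold the oldest entries, which are the ones most likely to have changed state since they were recorded.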
27A Stable Sender-Initiated Algorithm
- This algorithm uses the sender-initiated
component of the stable symmetrically initiated
algorithm
- Each node is augmented by an array called the
statevector
- The statevector records which list (senders,
receivers, or OK) the node is on at each of the
other nodes in the system
- It is updated based on the information exchanged
during the polling stage
- The receiver-initiated component is replaced by
the following protocol
- When a node becomes a receiver, it informs all
the nodes that are misinformed
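A minimal sketch of the statevector bookkeeping, assuming every node starts out listed as a receiver at all of its peers. The class and method names are hypothetical, and the notification itself is reduced to returning the list of peers that would have to be informed.

```python
class Node:
    def __init__(self, name, peers):
        self.name = name
        # statevector[p] is the list this node believes it is on at peer p.
        # Starting everyone as "receiver" is an assumption of this sketch.
        self.statevector = {p: "receiver" for p in peers}

    def record_poll(self, peer, reported_status):
        """Update the statevector after exchanging status with a peer."""
        self.statevector[peer] = reported_status

    def misinformed_peers(self):
        """Peers that do not currently list this node as a receiver."""
        return [p for p, s in self.statevector.items() if s != "receiver"]

    def become_receiver(self):
        """Return the peers to notify, and mark them as informed."""
        notify = self.misinformed_peers()
        for p in notify:
            self.statevector[p] = "receiver"
        return notify
```

Only the misinformed peers receive a message, so a node that is already listed as a receiver everywhere sends nothing when it becomes a receiver again.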
28Comparison
29Performance Under Heterogeneous Workloads
30Selecting a Suitable Load Sharing Algorithm
- The best algorithm depends on the system under
consideration
- For example, if the system never attains high
loads, sender-initiated algorithms will give
improved performance
- Stable scheduling algorithms should be used for
systems that can reach high loads
- For systems with heterogeneous workloads,
adaptive stable algorithms are preferable
31Other Requirements of Load Distributing
- Scalability
- The algorithm should work well in large
distributed systems
- Location transparency
- Determinism
- Preemption
- Heterogeneity
32Case Studies
- The V-System
- The Sprite system
- Condor system
- The Stealth distributed scheduler