Title: Distributed Dynamic Load Balancing
1. Distributed Dynamic Load Balancing
- A Survey of Algorithms and Details of an Implementation
- by Julie Thorpe
2. Outline
- Introduction
- Motivation of Load Balancing
- What is Distributed Dynamic Load Balancing?
- Survey of Algorithms
- Implementation Details
- Demonstration of Implementation
- Concluding Remarks
3. Introduction
- Load balancing is often performed in distributed operating systems.
- The aim is to increase the collective speed of the processes to be run through cooperation of the hosts in the system.
- Distributed computing is similar to parallel computing in that the idea is to distribute work across hosts in order to optimize for speed.
4. Introduction (2)
- Load balancing is achieved by data and/or processes migrating from one host to another over the network.
- The problem of migrating processes to achieve this balance breaks down into two problems:
  - load balancing
  - process migration
- This presentation deals with the load balancing problem.
5. Motivation of Load Balancing
- Optimizing the speed of processes to be executed.
- Speed optimization is needed for time-sensitive applications that can be slow due to intensive processing.
- E.g. video servers that serve hundreds or thousands of simultaneous requests for three-dimensional real-time video.
- Each video stream can involve encoding and decoding at sustained data transfer rates of many megabytes per second [FOS95].
6. What is Distributed Dynamic Load Balancing?
- There are a variety of classifications of load balancing algorithms; they can be classified by a category and a migration strategy.
- Category
  - Distributed
    - Each host in the system makes its own local load balancing decisions.
  - Centralized
    - A central scheduler maintains the status information of all hosts on the system and uses this information to make load balancing decisions.
7. What is Distributed Dynamic Load Balancing? (2)
- Migration Strategy
  - Dynamic
    - Processes can be migrated at any point in their lifetimes.
    - Has been shown to be much more effective (35-50% lower mean delay) than the static load balancing method [HBD97].
  - Static
    - Process migration may only occur before the process enters the CPU and allocates memory for the first time.
    - This method avoids the high migration cost of transferring the data structures allocated to the process.
8. Survey of Algorithms
- A six-phase model is used to describe the specifics of an algorithm.
- These phases can be different for each algorithm.
- Different algorithm categories:
  - Push algorithms
  - Steal (pull) algorithms
  - Cluster-aware algorithms
9. Considerations that Define the Specifics of the Algorithm
- Six-phase model used to describe them [WLR]
  - Phase 1: Load Measurement
  - Phase 2: Profitability Determination
  - Phase 3: Load Transfer Calculation
  - Phase 4: Process Selection
  - Phase 5: Process Migration
  - Phase 6: Granularity Adjustment
- Load balancing consists of phases 1-4, and sometimes 6.
10. Load Balancing Algorithms
- Different algorithm categories:
  - Push algorithms
  - Steal algorithms
  - Cluster-aware algorithms
- Variations on each are detailed in the paper.
11. Push Algorithms
- Sender-initiated transfers of processes from overloaded hosts to underloaded hosts.
- The destination host is determined by the type of the algorithm: either load-based or random.
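A minimal sketch of the push decision described above, in Python. The names OVERLOAD_THRESHOLD, load_of, and migrate are hypothetical placeholders for illustration only; they are not taken from the surveyed algorithms or from the implementation described later.

    import random

    OVERLOAD_THRESHOLD = 4   # hypothetical ready-queue length above which this host pushes

    def choose_destination(hosts, load_of, load_based=True):
        # Load-based push targets the least-loaded host; random push picks any host.
        return min(hosts, key=load_of) if load_based else random.choice(hosts)

    def push_step(ready_queue, hosts, load_of, migrate):
        # One sender-initiated step on a (possibly) overloaded host.
        if len(ready_queue) > OVERLOAD_THRESHOLD and hosts:
            destination = choose_destination(hosts, load_of)
            migrate(ready_queue.pop(), destination)   # hand one process to the migration mechanism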
12. Push Algorithms (2)
- Benefit
  - Minimize processor idle time, since processes are pushed before underloaded processors become idle.
- Drawbacks
  - Produce high amounts of communication overhead.
  - Can be unstable due to bandwidth problems, exceeded communication buffers, and constant process migration attempts.
13. Steal Algorithms
- Hosts request processes from other hosts on the distributed system once their load goes below a predetermined threshold.
- Can be either load-based or random.
14. Steal Algorithms (2)
- Benefits
  - Remain stable under heavy loads: if the load is heavy on all hosts, no steal requests are made.
  - Minimize communication overhead.
- Drawback
  - The stealing host may be underused, or even idle, during the time it takes for the stolen process to arrive.
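As a contrast with the push sketch above, here is a minimal receiver-initiated step, again in Python with hypothetical names (STEAL_THRESHOLD, load_of, send_steal_request). It illustrates the stability property: a busy host never generates a request.

    import random

    STEAL_THRESHOLD = 1   # hypothetical: steal only when the ready queue drops below this length

    def steal_step(ready_queue, hosts, load_of, send_steal_request, load_based=False):
        # Only an underloaded host generates traffic; when every host is busy,
        # no queue falls below the threshold and no steal requests are sent.
        if len(ready_queue) >= STEAL_THRESHOLD:
            return
        victim = max(hosts, key=load_of) if load_based else random.choice(hosts)
        send_steal_request(victim)   # the stolen process arrives later, hence possible idle time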
15. Cluster-Aware Algorithms
- LANs, or clusters, are connected to other clusters or the Internet via a WAN.
- Cluster-aware algorithms were designed to minimize wide-area communication by taking advantage of the clustered structure of most wide-area distributed systems.
16. Cluster-Aware Algorithms (2)
- Problems with the other two algorithm categories in a WAN environment
  - Push algorithms
    - The excessive network traffic can cause or increase network congestion.
  - Steal algorithms
    - The underloaded host sits idle for an entire RTT while waiting for a response to its steal request.
17. Implementation Details
- Algorithm Chosen
- Constraints and related modifications
- Implementation decisions (based on the six-phase model)
- Implementation overview
- Functionality provided
- Specifications for processes
- Program Structure
- Example walkthrough
18. Algorithm Chosen
- A cluster-aware version of Random Stealing, called CRS.
- When a host falls below a given threshold (normally when it is idle), it will first attempt a wide-area steal request to a random remote host.
- To avoid the node staying idle during the RTT wait for the result, it sets a flag and performs additional steal requests within its own cluster until a new process arrives in its ready queue.
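A control-flow sketch of this behaviour in Python. The callable names (send_wide_area_steal, send_local_steal, wide_area_reply_pending) are assumptions introduced here for illustration; they stand in for whatever mechanisms deliver the requests and track the outstanding wide-area reply.

    import random

    def crs_idle_step(ready_queue, local_hosts, remote_hosts,
                      send_wide_area_steal, send_local_steal, wide_area_reply_pending):
        # Cluster-aware Random Stealing as a control-flow sketch.  On becoming idle,
        # the host issues one asynchronous wide-area steal request, then keeps making
        # local (intra-cluster) steal attempts until work arrives or the wide-area
        # reply comes back, so the long wide-area RTT is hidden by local stealing.
        if ready_queue:
            return                                        # not idle, nothing to do
        send_wide_area_steal(random.choice(remote_hosts)) # asynchronous; sets the pending flag
        while not ready_queue and wide_area_reply_pending():
            send_local_steal(random.choice(local_hosts))  # synchronous local steal attempt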
19. Reasons for Choosing CRS
- It reduces wide-area communication.
- It is successful in hiding long wide-area RTTs due to local stealing.
- It has no parameters that need tuning.
- It is a completely distributed dynamic load balancing algorithm.
- CRS and its single-cluster variant were both found to work well [NKB01].
20. Constraints and Necessary Modifications
- The implementation is an application program, and thus cannot access the internal data structures of the operating system.
- As a result, it cannot directly access the ready queue or each process's information.
- Some additional communication and data structures are used to obtain ready queue and process information.
21. Constraints and Necessary Modifications (2)
- The available development and testing environment was the LAN of computers located in the CS Building.
- Therefore, the single-cluster version of CRS was implemented, i.e. the Random Stealing (RS) algorithm on which CRS is based.
- This implementation acts as a building block toward a CRS implementation, with an intermediate step of simulating a WAN by adding an artificial delay.
22. Implementation Decisions
- We implemented the first four phases of the six-phase model.
- The load-balancer provides an interface for the fifth phase, process migration, such that its mechanisms are transparent.
- The sixth phase is a feature that may be added in the future.
23. Implementation Decisions (2)
- Phase 1
  - The load descriptor is the ready queue length: an easy and effective measurement.
- Phase 2
  - Profitability determination is done using the method of Harchol-Balter and Downey, based on the observation that older processes are more likely to live long enough to justify their migration cost.
  - This method uses the migration cost to determine a minimum migration age at which it becomes beneficial to migrate the process.
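A sketch of how such a profitability test could look in Python. The proportionality between migration cost and minimum migration age (the cost_factor of 1.0) is an assumption made here for illustration, not the exact constant used by Harchol-Balter and Downey or by this implementation.

    def minimum_migration_age(migration_cost, cost_factor=1.0):
        # Hypothetical form of the criterion: a process becomes a migration
        # candidate only once its age is at least cost_factor times the estimated
        # cost of moving it; under a heavy-tailed lifetime distribution an old
        # process is expected to live long enough to repay that cost.
        return cost_factor * migration_cost

    def is_profitable_to_migrate(process_age, migration_cost):
        return process_age >= minimum_migration_age(migration_cost)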
24. Implementation Decisions (3)
- Phase 3
  - Load transfer calculation: if a steal request is received and there is a process to give, migrate one process to the requesting host.
  - Transferring only one process avoids the requesting host becoming overloaded in case local processes have started there since the time of the steal request.
- Phase 4
  - Process selection: performed by the host that receives the steal request. It selects the process with the largest positive difference between its age and its minimum migration age.
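A sketch of Phases 3 and 4 on the host that receives a steal request, reusing the hypothetical minimum_migration_age helper from the previous sketch; age_of and migration_cost_of are assumed accessors over the locally registered processes.

    def answer_steal_request(local_processes, age_of, migration_cost_of):
        # Phase 3: give away at most one process per steal request.
        # Phase 4: among processes past their minimum migration age, pick the one
        # whose age exceeds that threshold by the largest margin; return None to
        # refuse the request when no process is old enough.
        best, best_margin = None, 0.0
        for p in local_processes:
            margin = age_of(p) - minimum_migration_age(migration_cost_of(p))
            if margin > best_margin:
                best, best_margin = p, margin
        return best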
25. Implementation Overview
- The program will be referred to as the load-balancer.
- The load-balancer runs as a heavyweight process on each host in the distributed system.
- The load-balancer provides an interface and a set of specifications for the processes that wish to be load balanced.
26. Implementation Overview (2)
- Each load-balancer heavyweight process contains three lightweight processes (threads).
- The main thread runs the Process Communication Server (PCS).
- The PCS communicates with the local processes and with remote PCSs on other hosts.
- The load-balancer's lightweight processes are the Load Balancing Client, the Load Balancing Server, and the RTT Updater.
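A sketch of this thread layout using Python's threading module. The *_loop bodies are stubs standing in for the real components; only the structure (which component runs on which thread) follows the description above.

    import threading, time

    def load_balancing_client_loop():
        while True: time.sleep(1)      # stub: sends steal requests when underloaded

    def load_balancing_server_loop():
        while True: time.sleep(1)      # stub: answers steal requests from other hosts

    def rtt_updater_loop():
        while True: time.sleep(1)      # stub: keeps round-trip-time estimates fresh

    def process_communication_server_loop():
        while True: time.sleep(1)      # stub: talks to local processes and remote PCSs

    def run_load_balancer():
        # One heavyweight load-balancer process per host: the main thread runs the
        # PCS, and three lightweight processes (threads) run the remaining components.
        for loop in (load_balancing_client_loop,
                     load_balancing_server_loop,
                     rtt_updater_loop):
            threading.Thread(target=loop, daemon=True).start()
        process_communication_server_loop()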
27. Program Structure
- There are three conceptual components to the load-balancer:
  - The Process Communication Server
  - The Load Balancing Server
  - The Load Balancing Client
- The Load Balancing Client and Server take care of communication between load-balancers on different hosts.
28. The Process Communication Server (PCS)
- The local load-balancer's PCS performs all local process communication.
- It registers, updates, and transfers migrating processes.
- The communication for migrating processes is performed by both the source and destination PCSs.
29. Steal Request Communication
- The load-balancers on each host communicate via steal requests and migrating processes.
- The source host's Load Balancing Client sends steal requests when its load is below the threshold.
- The destination host's Load Balancing Server receives and processes steal requests.
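A sketch of the steal-request exchange from the client side, in Python. The JSON-over-TCP message format and the port number are illustrative assumptions, not the wire format of the actual implementation.

    import json, socket

    LB_SERVER_PORT = 7007   # hypothetical port on which each Load Balancing Server listens

    def send_steal_request(victim_host):
        # Load Balancing Client side: ask the victim's Load Balancing Server for one
        # process.  The server applies Phases 3-4 and replies with either a process
        # descriptor or a refusal; None is returned when nothing was old enough to give.
        with socket.create_connection((victim_host, LB_SERVER_PORT), timeout=5) as s:
            s.sendall(json.dumps({"type": "STEAL_REQUEST"}).encode() + b"\n")
            reply = json.loads(s.makefile().readline())
        return reply.get("process")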
30. Example
31. Demo
32. Concluding Remarks
- The topic of load balancing reinforces many concepts learned in CS courses, and ties them together in an interesting real-life application.
- Load balancing is a vast topic with many branches, some extremely complex, and others relatively simple.
33. The End