Title: Task Assignment with Unknown Duration
1. Task Assignment with Unknown Duration
- ICS 248 Queuing Theory
- Project Presentation
- Keita Fujii (kfujii_at_uci.edu)
- Paper published in Journal of the ACM, Vol. 49, No. 2, March 2002, pp. 260-288
2. What this paper is about
- This paper proposes TAGS (Task Assignment based on Guessing Size), a new task assignment policy for distributed server systems
- TAGS outperforms other existing task assignment policies under heavy-tailed workloads
3. Distributed Server System
- A distributed server system is a computational system that distributes incoming jobs across several machines
- Each incoming job is dispatched to one of the machines for processing
- The Task Assignment Policy determines which host a job is assigned to
- The performance of a distributed server system depends on its Task Assignment Policy

[Figure: incoming jobs arrive at a dispatcher, which assigns each job to one of several hosts]
4. Assumptions
- This paper assumes that:
- All machines are identical
- No cost (time) is required for dispatching jobs to hosts
- Jobs are NOT preemptible
- Jobs can be aborted, but must then be restarted from the beginning
- Job sizes are not known in advance
5. Why not preemptive?
- Jobs submitted to a supercomputer tend not to be preemptible because:
- They require a lot of memory, so swapping jobs out is very expensive
- Many operating systems do not support preemption across several processors
6. Existing Task Assignment Policies
- Random
- Chooses a host randomly
- Round-Robin
- Chooses hosts in a cyclical fashion
- Performs almost as well as Random
- Shortest-Queue
- Chooses the host with the fewest jobs in its queue
- Least-Work-Remaining
- Chooses the host with the least remaining work (the sum of the remaining job sizes)
- Optimal for an exponential job size distribution
- These policies do not perform well when the job size distribution is more variable than exponential
- → The distribution of actual job sizes is not exponential, but heavy-tailed
7. Implementation of Least-Work-Remaining
- Do we need to know job sizes beforehand in order to implement the Least-Work-Remaining policy? → NO
- The Least-Work-Remaining policy is equivalent to the Central-Queue policy: jobs wait in a single central queue, and whenever a host becomes idle it takes the next job from that queue

[Figure: a single central queue feeding Host A, Host B, and Host C]
8. Heavy-tailed Distribution
- A heavy-tailed distribution is one for which P{X > x} ~ x^(-α), where 0 < α < 2
- Characteristics of heavy-tailed distributions
- Infinite variance (and if α ≤ 1, infinite mean)
- A tiny fraction (< 1%) of the jobs comprise over 50% of the total load
- Examples of heavy-tailed distributions
- UNIX process CPU requirements (1 < α < 1.25)
- Sizes of files transferred through HTTP (1.1 < α < 1.3) and FTP (0.9 < α < 1.1)
- Pittsburgh Supercomputing Center workloads for distributed servers
- Heavy-tailed distributions are common
- α tends to be close to 1
9. Bounded Pareto Distribution
- The Pareto distribution is heavy-tailed
- However, in practice there is some upper bound on the maximum size of a job → Bounded Pareto distribution B(k, p, α)
- k: the shortest possible job (0 < k < 1500)
- k is adjusted whenever α changes so that E[X] = 3000
- p: the largest possible job (10^10)
- α: the exponent of the power law
- The smaller α is, the bigger E[X^2]
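The claims on the last two slides can be checked numerically from the closed-form moments of B(k, p, α). A sketch under the slides' parameters (α = 1.1, p = 10^10, k solved so that E[X] = 3000; the helper names are mine, not the paper's) computes the fraction of total load carried by the largest 1% of jobs:

```python
def bp_norm(k, p, a):
    # normalizing constant of the Bounded Pareto density a*k^a*x^(-a-1)/(1-(k/p)^a)
    return a * k**a / (1.0 - (k / p)**a)

def bp_partial_mean(x_lo, x_hi, k, p, a):
    # integral of x * f(x) over [x_lo, x_hi], valid for a != 1
    return bp_norm(k, p, a) * (x_hi**(1 - a) - x_lo**(1 - a)) / (1 - a)

def bp_mean(k, p, a):
    return bp_partial_mean(k, p, k, p, a)

def bp_quantile(q, k, p, a):
    # invert the CDF F(x) = (1 - (k/x)^a) / (1 - (k/p)^a)
    return k * (1.0 - q * (1.0 - (k / p)**a))**(-1.0 / a)

a, p = 1.1, 1e10
# bisection for k so that E[X] = 3000 (k lies in (0, 1500) per the slides)
lo, hi = 1e-6, 1500.0
for _ in range(200):
    k = (lo + hi) / 2
    if bp_mean(k, p, a) < 3000.0:
        lo = k
    else:
        hi = k

x99 = bp_quantile(0.99, k, p, a)   # size of the 99th-percentile job
top_share = bp_partial_mean(x99, p, k, p, a) / bp_mean(k, p, a)
print(round(k), round(top_share, 2))
```

With these parameters the largest 1% of jobs carry more than half the total load, which is exactly the heavy-tailed property slide 8 describes.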
10. TAGS (Task Assignment based on Guessing Size)
- TAGS is designed to perform better than existing policies when the distribution of job sizes is heavy-tailed
- Algorithm
- Suppose there are h hosts, each with its own queue
- The ith host has a cutoff s_i, where s_1 < s_2 < ... < s_h
- All incoming jobs are dispatched to Host 1
- If a job cannot complete within s_1, it is killed and forwarded to Host 2
- In general, if a job cannot complete within s_i, it is killed and forwarded to Host i+1

[Figure: incoming jobs all enter Host 1; a job exceeding a host's cutoff is killed and forwarded to the next host]
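The kill-and-forward rule can be sketched directly (a minimal illustration with hypothetical sizes and cutoffs; `cutoffs[i]` plays the role of s_(i+1), with the last cutoff large enough that the final host finishes everything). Because each restart begins from scratch, a job of size x that ends at Host i costs s_1 + ... + s_(i-1) + x in total work:

```python
def tags_dispatch(job_size, cutoffs):
    """Return (final_host, total_work) for one job under TAGS.
    The job runs on each host in turn until it either finishes or
    hits that host's cutoff, in which case it is killed and
    restarted from the beginning on the next host."""
    total_work = 0.0
    for host, s in enumerate(cutoffs, start=1):
        if job_size <= s:
            total_work += job_size      # job completes on this host
            return host, total_work
        total_work += s                 # ran up to the cutoff, then killed
    raise ValueError("job exceeds the largest cutoff s_h")

# example: two hosts with s_1 = 10 (s_2 effectively unbounded)
print(tags_dispatch(4.0, [10.0, float("inf")]))   # (1, 4.0)
print(tags_dispatch(25.0, [10.0, float("inf")]))  # (2, 35.0): 10 wasted + 25
```

The wasted-work term is the "penalty of killing big jobs" that slide 16 returns to.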
11. Performance of TAGS (number of hosts = 2, system load = 0.5)
- TAGS outperforms the Random and Least-Work-Remaining policies
- Metrics
- Mean waiting time
- Mean slowdown (waiting time / job size)
12. Why is TAGS better than the others?
- There are two reasons why TAGS works better than other task assignment policies
- Variance reduction
- TAGS reduces the variance of the job sizes that share the same queue
- Load unbalancing
- TAGS keeps Host 2 busier than Host 1
13. Variance Reduction
- Variance reduction
- Reduces the variance of the job sizes that share the same queue
- Why does variance reduction improve performance?
- Because it reduces the chance of a short job getting stuck behind a long job in the same queue
- Theoretical argument: all metrics (W, S, Q) depend on E[X^2]
- E[W] = λ E[X^2] / (2(1 - ρ)) (Pollaczek-Khinchin formula)
- E[S] = E[W] · E[1/X]
- E[Q] = λ E[W] (Little's formula)
- W: waiting time in queue, S: slowdown, Q: queue length, λ: job arrival rate, ρ: system load
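A tiny numeric instance of these formulas (all numbers hypothetical) makes the E[X^2] dependence concrete: at fixed λ and ρ, halving E[X^2] halves W, and therefore S and Q as well.

```python
def mg1_metrics(lam, ex2, rho, e_inv_x):
    """Mean waiting time (Pollaczek-Khinchin), mean slowdown, and mean
    queue length (Little's formula) for an M/G/1 queue."""
    w = lam * ex2 / (2.0 * (1.0 - rho))  # Pollaczek-Khinchin
    s = w * e_inv_x                      # E[S] = E[W] * E[1/X]
    q = lam * w                          # Little's formula
    return w, s, q

w1, s1, q1 = mg1_metrics(lam=0.001, ex2=2.0e6, rho=0.5, e_inv_x=0.01)
w2, s2, q2 = mg1_metrics(lam=0.001, ex2=1.0e6, rho=0.5, e_inv_x=0.01)
print(w1, w2)  # halving E[X^2] halves W (and hence S and Q)
```

This is why a policy that shrinks the E[X^2] seen by each queue improves every metric at once.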
14. Variance Reduction (continued)
- The Random, Round-Robin, and Least-Work-Remaining policies do NOT reduce the variance of job sizes
- TAGS reduces the variance of job sizes
- The sizes of the jobs assigned to Host i are between s_(i-1) and s_i
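This size-banding effect can be quantified with the Bounded Pareto moments (illustrative values: k = 332, p = 10^10, α = 1.1, and a hypothetical cutoff s_1 = 10^5): the second moment of the sizes completing at Host 1 is orders of magnitude smaller than that of the unsplit job stream.

```python
def bp_partial_ex2(x_lo, x_hi, k, p, a):
    # integral of x^2 * f(x) over [x_lo, x_hi] for B(k, p, a), valid for a != 2
    c = a * k**a / (1.0 - (k / p)**a)
    return c * (x_hi**(2 - a) - x_lo**(2 - a)) / (2 - a)

def bp_prob(x_lo, x_hi, k, p, a):
    # probability that a B(k, p, a) job size falls in [x_lo, x_hi]
    c = k**a / (1.0 - (k / p)**a)
    return c * (x_lo**(-a) - x_hi**(-a))

k, p, a, s1 = 332.0, 1e10, 1.1, 1e5
ex2_all   = bp_partial_ex2(k, p, k, p, a)                        # whole stream
ex2_host1 = bp_partial_ex2(k, s1, k, p, a) / bp_prob(k, s1, k, p, a)
ratio = ex2_host1 / ex2_all
print(ratio)  # far below 1: Host 1's queue sees a much smaller E[X^2]
```

By the Pollaczek-Khinchin formula on the previous slide, this directly translates into a much smaller waiting time for the short jobs.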
15. Load Unbalancing
- Load unbalancing works better for heavy-tailed job size distributions
- Reminder: in a heavy-tailed distribution, a tiny fraction (< 1%) of the jobs comprise over 50% of the total load
- Idea: put the heavy jobs on Host 2, so that Host 1 is always available for the small jobs
- Note: if the job size distribution becomes less heavy-tailed, Host 1 becomes more loaded than Host 2, because more jobs need to be forwarded to Host 2, but each of those jobs is first processed up to the cutoff and terminated by Host 1

[Figure: workload split between Host 1 and Host 2]
16. Other characteristics of TAGS
- As system load increases, the performance improvement of TAGS decreases to the same level as the other policies
- Because the penalty of killing big jobs becomes significant

[Figure: performance of TAGS at system load 0.7 vs. system load 0.5]
17. Variants of TAGS
- In TAGS, you can choose a different set of s_i in order to satisfy a different requirement
- TAGS-opt-waitingtime
- Minimizes mean waiting time
- TAGS-opt-slowdown
- Minimizes mean slowdown
- Slowdown = waiting time / job size
- TAGS-opt-fairness
- Optimizes fairness
- Fairness: all jobs experience the same expected slowdown
18. Server expansion
- TAGS also outperforms other policies in terms of server expansion
- Server expansion: how many additional hosts must be added to the existing server to bring mean slowdown down to a certain level

[Figure: server expansion comparison; initial number of hosts = 2, initial workload = 0.7]
19. Theoretical analysis of TAGS
Properties of the Bounded Pareto distribution B(k, p, α), with density
f(x) = α k^α x^(-α-1) / (1 - (k/p)^α) for k ≤ x ≤ p:

E[X^j] = (α k^α / (1 - (k/p)^α)) · (p^(j-α) - k^(j-α)) / (j - α),  if α ≠ j
E[X^j] = (α k^α / (1 - (k/p)^α)) · ln(p/k),                        if α = j (the slides treat j = 1 and j = 2)
20. Per-host fractions and moments
Let s_0 = k and s_h = p, and let F(x) = (1 - (k/x)^α) / (1 - (k/p)^α) be the CDF of B(k, p, α).

p_i, the fraction of jobs whose final destination is Host i:
p_i = F(s_i) - F(s_(i-1))

p_i^visit, the fraction of jobs which ever visit Host i:
p_i^visit = 1 - F(s_(i-1))

E[X_i^j], the jth moment of the size of jobs whose final destination is Host i, can be computed as:
E[X_i^j] = (1 / p_i) · (α k^α / (1 - (k/p)^α)) · (s_i^(j-α) - s_(i-1)^(j-α)) / (j - α),  if α ≠ j
E[X_i^j] = (1 / p_i) · (α k^α / (1 - (k/p)^α)) · ln(s_i / s_(i-1)),                      if α = j
21. Per-host service time, arrival rate, and load
Then E[X_i^visit], the mean time that a job visiting Host i spends at Host i, is

E[X_i^visit] = (p_i / p_i^visit) · E[X_i] + ((p_i^visit - p_i) / p_i^visit) · s_i

because a fraction p_i / p_i^visit of the visiting jobs spend E[X_i] and complete at Host i, while the remaining fraction (p_i^visit - p_i) / p_i^visit spend s_i at Host i before being killed and sent to Host i+1.

λ_i, the arrival rate at Host i, and ρ_i, the load of Host i, can also be computed:

λ_i = λ · p_i^visit        (λ is the outside arrival rate)
ρ_i = λ_i · E[X_i^visit]
22. Computing the metrics
Then you can use the Pollaczek-Khinchin formula and Little's formula to calculate the mean waiting time and mean slowdown.

Now you can put all of these formulas into Mathematica to compute the s_i that minimize mean slowdown, mean waiting time, or unfairness.
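The same computation can be sketched without Mathematica. The version below is a simplified, illustrative pipeline, not the paper's exact optimization: it assumes 2 hosts, α = 1.1, p = 10^10, k = 332 (which gives E[X] close to 3000), an outside arrival rate λ = ρ·h/E[X] for system load ρ = 0.5, and it approximates the mean waiting time as the Pollaczek-Khinchin waiting time at each host weighted by the fraction of jobs visiting that host (the paper's full analysis also tracks the excess time of killed jobs). It scans a hypothetical grid of cutoffs s_1 for the minimizer.

```python
def bp_c(k, p, a):
    # normalizing constant of the B(k, p, a) density
    return a * k**a / (1.0 - (k / p)**a)

def bp_moment(x_lo, x_hi, j, k, p, a):
    # integral of x^j * f(x) over [x_lo, x_hi], valid while a != j
    return bp_c(k, p, a) * (x_hi**(j - a) - x_lo**(j - a)) / (j - a)

def bp_tail(x, k, p, a):
    # P{X > x} under B(k, p, a)
    return ((k / x)**a - (k / p)**a) / (1.0 - (k / p)**a)

def tags_mean_wait(s1, k, p, a, lam):
    """Approximate mean waiting time of 2-host TAGS: per-host
    Pollaczek-Khinchin waiting times, weighted by visit fractions."""
    cuts = [k, s1, p]
    mean_wait = 0.0
    for i in (1, 2):
        tail_lo = bp_tail(cuts[i - 1], k, p, a)   # p_i^visit (equals 1 for i=1)
        tail_hi = bp_tail(cuts[i], k, p, a)
        p_i = tail_lo - tail_hi                   # final destination is Host i
        ex1 = bp_moment(cuts[i - 1], cuts[i], 1, k, p, a) / p_i
        ex2 = bp_moment(cuts[i - 1], cuts[i], 2, k, p, a) / p_i
        kill = (tail_lo - p_i) / tail_lo          # fraction killed at Host i
        ev1 = (1 - kill) * ex1 + kill * cuts[i]       # E[X_i^visit]
        ev2 = (1 - kill) * ex2 + kill * cuts[i] ** 2  # E[(X_i^visit)^2]
        lam_i = lam * tail_lo
        rho_i = lam_i * ev1
        if rho_i >= 1.0:
            return float("inf")                   # Host i would be overloaded
        mean_wait += tail_lo * lam_i * ev2 / (2.0 * (1.0 - rho_i))
    return mean_wait

k, p, a = 332.0, 1e10, 1.1          # k gives E[X] close to 3000
lam = 0.5 * 2 / 3000.0              # outside arrival rate for system load 0.5
candidates = [10.0 ** (e / 4.0) for e in range(12, 36)]   # s1 from 1e3 upward
best = min((tags_mean_wait(s1, k, p, a, lam), s1) for s1 in candidates)
print("best s1 = %.3g, approx mean wait = %.3g" % (best[1], best[0]))
```

Swapping the objective for mean slowdown or unfairness only changes the expression accumulated inside the loop; the grid search (or a proper numerical minimizer) stays the same.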