Title: Quantifying the Properties of SRPT Scheduling
1 Quantifying the Properties of SRPT Scheduling
- Mingwei Gong and Carey Williamson
- Department of Computer Science
- University of Calgary
2 Outline
- Introduction
- Background
- Web Server Scheduling Policies
- Related Work
- Research Methodology
- Simulation Results
- Defining/Refining Unfairness
- Quantifying Unfairness
- Summary, Conclusions, and Future Work
3 Introduction
- Web: large-scale, client-server system
- WWW: World Wide Wait!
- User-perceived Web response time is composed of several components:
  - Transmission delay, propagation delay in network
  - Queueing delays at busy routers
  - Delays caused by TCP protocol effects (e.g., handshaking, slow start, packet loss, retransmissions)
  - Queueing delays at the Web server itself, which may be servicing 100s or 1000s of concurrent requests
- Our focus in this work: Web request scheduling
4 Example Scheduling Policies
- FCFS: First Come First Serve
  - typical policy for single shared resource (unfair)
  - e.g., drive-thru restaurant; Sens playoff tickets
- PS: Processor Sharing
  - time-sharing a resource amongst M jobs
  - each job gets 1/M of the resources (equal, fair)
  - e.g., CPU; VM; multi-tasking; Apache Web server
- SRPT: Shortest Remaining Processing Time
  - pre-emptive version of Shortest Job First (SJF)
  - give resources to the job that will complete quickest
  - e.g., ??? (express lanes in grocery store) (almost)
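The three policies can be compared on a toy workload. The following is a minimal sketch, not the simulator used in this work: it assumes a single server of rate 1 (one unit of work per unit of time), and the function name `simulate` and the two-job workload are illustrative only.

```python
def simulate(jobs, policy):
    """Return completion times for jobs [(arrival, size), ...] sorted by arrival.

    Single server of rate 1.0 (an assumption of this sketch).
    """
    n = len(jobs)
    remaining, done = {}, [None] * n      # job index -> work left
    t, i = 0.0, 0
    while i < n or remaining:
        if not remaining:
            t = max(t, jobs[i][0])        # idle server: jump to next arrival
        while i < n and jobs[i][0] <= t:  # admit every job that has arrived
            remaining[i] = float(jobs[i][1])
            i += 1
        if policy == "FCFS":              # earliest arrival gets everything
            shares = {min(remaining): 1.0}
        elif policy == "SRPT":            # least remaining work gets everything
            shares = {min(remaining, key=remaining.get): 1.0}
        else:                             # "PS": each of M jobs gets 1/M
            shares = {k: 1.0 / len(remaining) for k in remaining}
        dt = min(remaining[k] / s for k, s in shares.items())
        t_next = t + dt
        if i < n and jobs[i][0] - t <= dt:  # an arrival happens first
            dt, t_next = jobs[i][0] - t, jobs[i][0]
        for k, s in shares.items():
            remaining[k] -= s * dt
        t = t_next
        for k in [k for k, r in remaining.items() if r <= 1e-9 * jobs[k][1]]:
            done[k] = t
            del remaining[k]
    return done


jobs = [(0.0, 3.0), (1.0, 1.0)]           # (arrival time, size)
for policy in ("FCFS", "PS", "SRPT"):
    done = simulate(jobs, policy)
    mean_response = sum(d - a for d, (a, _) in zip(done, jobs)) / len(jobs)
    print(policy, done, mean_response)
```

On this workload the large job finishes at time 4 under both PS and SRPT, while the small job finishes at time 4 under FCFS, 3 under PS, and 2 under SRPT, so SRPT gives the lowest mean response time (2.5 versus 3.0).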
5 Related Work
- Theoretical work
  - SRPT is provably optimal in terms of mean response time and mean slowdown (classical results)
- Practical work
  - CMU prototype implementation in Apache Web server. The results are consistent with theoretical work.
- Concern: unfairness problem (starvation)
  - large jobs may be penalized (but not always true!)
6 Related Work (Cont'd)
- Harchol-Balter et al. show theoretical results:
  - For the largest jobs, the slowdown asymptotically converges to the same value for any preemptive work-conserving scheduling policy (i.e., for these jobs, SRPT, or even LRPT, is no worse than PS)
  - For sufficiently large jobs, the slowdown under SRPT is only marginally worse than under PS, by at most a factor of 1 + ε, for small ε > 0.
M. Harchol-Balter, K. Sigman, and A. Wierman, "Asymptotic Convergence of Scheduling Policies w.r.t. Slowdown", Proceedings of IFIP Performance 2002, Rome, Italy, September 2002.
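For reference, the common limit can be written out explicitly. This is a sketch under M/G/1 assumptions with load ρ < 1, where E[S(x)] denotes the expected slowdown of a job of size x; this notation is introduced here for illustration and does not appear on the slides, and the limit holds under the conditions stated in the cited paper.

```latex
E[S(x)]^{\mathrm{PS}} = \frac{1}{1-\rho} \quad \text{for all } x,
\qquad
\lim_{x \to \infty} E[S(x)]^{P} = \frac{1}{1-\rho}
\quad \text{for any preemptive work-conserving policy } P .
```

Under these assumptions, loads of 0.50, 0.80, and 0.95 would give limiting slowdowns of 2, 5, and 20 respectively.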
7 Related Work (Cont'd)
- Wierman and Harchol-Balter 2003
[Figure: classification of scheduling policies (SJF, LAS, FSP, PS, FCFS, PLCFS, LRPT, SRPT) into three classes: Always Fair, Sometimes Unfair, Always Unfair]
A. Wierman and M. Harchol-Balter, "Classifying Scheduling Policies w.r.t. Unfairness in an M/GI/1" (Best Paper), Proceedings of ACM SIGMETRICS, San Diego, CA, June 2003.
8 A Pictorial View
[Figure: slowdown (y-axis) versus job size (x-axis) for PS and SRPT]
9 Research Questions
- Do these properties hold in practice for empirical Web server workloads? (e.g., general arrival processes, service time distributions)
- What does "sufficiently large" mean?
- Is the crossover effect observable?
  - If so, for what range of job sizes?
  - Does it depend on the arrival process and the service time distribution? If so, how?
- Is PS (the "gold standard") really fair?
  - Can we do better? If so, how?
10 Overview of Research Methodology
- Trace-driven simulation of simple Web server
  - Empirical Web server workload trace (1M requests from WorldCup98) for main experiments
  - Synthetic Web server workloads for the sensitivity study experiments
- Probe-based sampling methodology
  - Estimate job response time distributions for different job sizes, load levels, and scheduling policies
- Graphical comparisons of results
- Statistical tests of results (t-test, F-test)
11 Simulation Assumptions
- User requests are for static Web content
- Server knows response size in advance
- Network bandwidth is the bottleneck
- All clients are in the same LAN environment
  - Ignores variations in network bandwidth and propagation delay
- Fluid flow approximation: service time is proportional to response size
  - Ignores packetization issues
  - Ignores TCP protocol effects
  - Ignores network effects
  - (These are consistent with SRPT literature)
12 Performance Metrics
- Number of jobs in the system
- Number of bytes in the system
- Normalized slowdown
  - The slowdown of a job is its observed response time divided by the ideal response time if it were the only job in the system
  - Ranges between 1 and ∞
  - Lower is better
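The slowdown definition translates directly into a small helper. This is an illustrative sketch: the function name `slowdown` and the `rate` parameter (server speed in units of work per unit of time) are assumptions, not names from the study.

```python
def slowdown(arrival, completion, size, rate=1.0):
    """Observed response time divided by the time the job would need alone."""
    if size <= 0 or rate <= 0:
        raise ValueError("size and rate must be positive")
    response_time = completion - arrival
    ideal_time = size / rate          # service time with the server to itself
    return response_time / ideal_time


# A size-3 job that arrives at time 0 and completes at time 4.
print(slowdown(arrival=0.0, completion=4.0, size=3.0))
# A size-1 job served without any waiting has slowdown 1.0, the minimum.
print(slowdown(arrival=1.0, completion=2.0, size=1.0))
```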
13 Empirical Web Server Workload
1998 WorldCup: Internet Traffic Archive, http://ita.ee.lbl.gov/

Item                              Value
Trace Duration                    861 sec
Total Requests                    1,000,000
Unique Documents                  5,549
Total Transferred Bytes           3.3 GB
Smallest Transfer Size (bytes)    4
Largest Transfer Size (bytes)     2,891,887
Median Transfer Size (bytes)      889
Mean Transfer Size (bytes)        3,498
Standard Deviation (bytes)        18,815
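A few derived quantities follow from the table by simple arithmetic. The sketch below uses only the values listed above; the variable names are illustrative, and reading "GB" as 2^30 bytes is an assumption.

```python
total_requests = 1_000_000
duration_s = 861
mean_bytes = 3_498
median_bytes = 889
std_bytes = 18_815

request_rate = total_requests / duration_s   # mean arrival rate, requests/sec
total_bytes = total_requests * mean_bytes    # total volume implied by the mean
cov = std_bytes / mean_bytes                 # coefficient of variation

print(f"mean request rate: {request_rate:.0f} requests/sec")
print(f"implied volume: {total_bytes / 2**30:.2f} GiB")
print(f"coefficient of variation: {cov:.2f}")
print(f"mean / median: {mean_bytes / median_bytes:.2f}")
```

The implied volume of about 3.26 GiB agrees with the reported 3.3 GB. A coefficient of variation above 5 and a mean roughly four times the median indicate a highly variable, right-skewed transfer size distribution.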
14 Preliminaries: An Example
TIMESTAMP SIZE
0.000000 3038
0.000315 949
0.001048 2240
0.004766 2051
0.005642 366
0.005872 201
0.006380 298
0.006742 1272
0.007271 597
0.008008 283
[Figures: jobs in system and bytes in system over time for this example]
15 Observations
- The byte backlog is the same for each scheduling policy
- The busy periods are the same for each policy
- The distribution of the number of jobs in the system is different
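These observations can be checked on the ten-request example. The sketch below is illustrative only: the server rate of 1,000,000 bytes/sec is an assumed value (the slides do not state one), and `state_at` is a hypothetical helper that reports the system state just before any arrival at the query time.

```python
TRACE = [(0.000000, 3038), (0.000315, 949), (0.001048, 2240),
         (0.004766, 2051), (0.005642, 366), (0.005872, 201),
         (0.006380, 298), (0.006742, 1272), (0.007271, 597),
         (0.008008, 283)]                      # (timestamp in sec, size in bytes)


def state_at(jobs, policy, rate, t_query):
    """Return (jobs in system, bytes in system) at time t_query."""
    remaining = {}                             # job index -> bytes left
    t, i = 0.0, 0
    while t < t_query:
        while i < len(jobs) and jobs[i][0] <= t:   # admit arrived jobs
            remaining[i] = float(jobs[i][1])
            i += 1
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        if not remaining:                      # idle: skip ahead
            t = min(next_arrival, t_query)
            continue
        if policy == "SRPT":                   # all bandwidth to shortest remainder
            shares = {min(remaining, key=remaining.get): rate}
        else:                                  # "PS": equal split
            shares = {k: rate / len(remaining) for k in remaining}
        t_finish = t + min(remaining[k] / s for k, s in shares.items())
        t_next = min(t_finish, next_arrival, t_query)
        for k, s in shares.items():
            remaining[k] -= s * (t_next - t)
        remaining = {k: r for k, r in remaining.items() if r > 1e-6}
        t = t_next
    return len(remaining), sum(remaining.values())


RATE = 1_000_000.0                             # assumed bytes/sec
for t_query in (0.002, 0.005, 0.009):
    for policy in ("PS", "SRPT"):
        n_jobs, n_bytes = state_at(TRACE, policy, RATE, t_query)
        print(f"t={t_query:.3f} {policy:4s} jobs={n_jobs} bytes={n_bytes:.1f}")
```

At t = 0.005 sec, for example, both policies hold the same 3,278 bytes of backlog, but PS has three jobs in the system while SRPT has two. Both policies are work-conserving, so the backlog matches; SRPT clears small jobs first, so its job count is lower.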
16 General Observations (Empirical Trace)
[Figure: marginal distribution of the number of jobs in the system for PS and SRPT, at loads of 50%, 80%, and 95%]
- Differences between PS and SRPT are more pronounced at higher loads
17 Objectives (Restated)
- Compare PS policy with SRPT policy
- Confirm theoretical results in previous work (Harchol-Balter et al.)
  - For the largest jobs
  - For sufficiently large jobs
- Quantify unfairness properties
18 Probe-Based Sampling Algorithm
- The algorithm is based on the PASTA (Poisson Arrivals See Time Averages) principle.
[Figure: a probe job is inserted and its slowdown is recorded (1 sample); repeat N times]
19 Probe-Based Sampling Algorithm
- For scheduling policy S (PS, SRPT, FCFS, LRPT, ...) do
  - For load level U (0.50, 0.80, 0.95) do
    - For probe job size J (1B, 1KB, 10KB, 1MB, ...) do
      - For trial I (1, 2, 3, ..., N) do
        - Insert probe job at randomly chosen point
        - Simulate Web server scheduling policy
        - Compute and record slowdown value observed
      - end of I
      - Plot marginal distribution of slowdown results
    - end of J
  - end of U
- end of S
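The loop above can be sketched in runnable form. This is a simplified illustration under stated assumptions, not the simulator used in this work: the workload is a hypothetical synthetic stand-in (Poisson arrivals, exponential sizes) rather than the WorldCup98 trace, only PS and SRPT are modelled, and the load level is imposed by scaling the server rate so that utilization over the trace equals U. The names `simulate`, `probe_slowdowns`, and `make_trace` are invented for this example.

```python
import random


def simulate(jobs, policy, rate):
    """Completion times for jobs [(arrival, size), ...] sorted by arrival."""
    n = len(jobs)
    remaining, done = {}, [None] * n
    t, i = 0.0, 0
    while i < n or remaining:
        if not remaining:
            t = max(t, jobs[i][0])            # idle server: jump to next arrival
        while i < n and jobs[i][0] <= t:      # admit everything that has arrived
            remaining[i] = float(jobs[i][1])
            i += 1
        if policy == "SRPT":                  # whole rate to shortest remainder
            shares = {min(remaining, key=remaining.get): rate}
        else:                                 # "PS": equal split
            shares = {k: rate / len(remaining) for k in remaining}
        dt = min(remaining[k] / s for k, s in shares.items())
        t_next = t + dt
        if i < n and jobs[i][0] - t <= dt:    # an arrival comes first
            dt, t_next = jobs[i][0] - t, jobs[i][0]
        for k, s in shares.items():
            remaining[k] -= s * dt
        t = t_next
        for k in [k for k, r in remaining.items() if r <= 1e-9 * jobs[k][1]]:
            done[k] = t
            del remaining[k]
    return done


def probe_slowdowns(trace, policy, load, probe_size, trials, rng):
    """Insert one probe per trial at a random time; return its slowdowns."""
    duration = trace[-1][0]
    rate = sum(size for _, size in trace) / (load * duration)  # assumed scaling
    samples = []
    for _ in range(trials):
        probe = (rng.uniform(0.0, duration), probe_size)
        jobs = sorted(trace + [probe])
        done = simulate(jobs, policy, rate)
        response = done[jobs.index(probe)] - probe[0]
        samples.append(response / (probe_size / rate))
    return samples


def make_trace(n, rng):
    """Hypothetical stand-in workload: Poisson arrivals, exponential sizes."""
    t, trace = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0)
        trace.append((t, rng.expovariate(1.0 / 3000.0)))
    return trace


rng = random.Random(1)
trace = make_trace(200, rng)
for policy in ("PS", "SRPT"):
    for load in (0.50, 0.80, 0.95):
        s = probe_slowdowns(trace, policy, load, 1000.0, 20, rng)
        print(policy, load, "mean probe slowdown:", sum(s) / len(s))
```

Each call returns N slowdown samples for one (policy, load, probe size) combination; plotting their empirical distribution corresponds to the "plot marginal distribution" step.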
20 Example Results for 3 KB Probe Job
21 Example Results for 100 KB Probe Job
22 Example Results for 10 MB Probe Job
23 Statistical Summary of Results
24 Two Aspects of Unfairness
- Endogenous unfairness (SRPT)
  - Caused by an intrinsic property of a job, such as its size. This aspect of unfairness is invariant.
- Exogenous unfairness (PS)
  - Caused by external conditions, such as the number of other jobs in the system, their sizes, and their arrival times.
- Analogy: showing up at a restaurant without a reservation, wanting a table for k people
25 Observations for PS
- Exogenous unfairness dominant
26 Observations for SRPT
- Endogenous unfairness dominant
27 Asymptotic Convergence?
- Yes!
28 Illustrating the Crossover Effect (load = 95%)
[Figure panels: 3M, 3.5M, 4M]
29 Crossover Effect?
- Yes!
30 Summary and Conclusions
- Trace-driven simulation of Web server scheduling strategies, using a probe-based sampling methodology (probe jobs) to estimate response time (slowdown) distributions
- Confirms asymptotic convergence of the slowdown metric for the largest jobs
- Confirms the existence of the crossover effect for some job sizes under SRPT
- Provides new insights into SRPT and PS
  - Two types of unfairness: endogenous vs. exogenous
  - PS is not really a gold standard for fairness!
31 Ongoing Work
- Synthetic Web workloads
  - Sensitivity to arrival process (self-similar traffic)
  - Sensitivity to heavy-tailed job size distributions
- Evaluate novel scheduling policies that may improve upon PS (e.g., FSP, k-SRPT, ...)
32 Sensitivity to Arrival Process
- A bursty arrival process (e.g., self-similar traffic, with Hurst parameter H > 0.5) makes things worse for both PS and SRPT policies
- A bursty arrival process has greater impact on the performance of PS than on SRPT
- PS exhibits higher exogenous unfairness than SRPT for all Hurst parameters and system loads tested
33 Sensitivity to Job Size Distribution
- SRPT loves heavy-tailed distributions: the heavier the tail, the better!
- For all Pareto parameter values and all system loads considered, SRPT provides better performance than PS with respect to mean slowdown and standard deviation of slowdown
- At high system load (U = 0.95), SRPT has more pronounced endogenous unfairness than PS
34 Thank You! Questions?
For more information:
M. Gong and C. Williamson, "Quantifying the Properties of SRPT Scheduling", to appear, Proceedings of IEEE MASCOTS, Orlando, FL, October 2003.
Email: {gongm,carey}@cpsc.ucalgary.ca