Title: Workload Modeling and its Effect on Performance Evaluation
1. Workload Modeling and its Effect on Performance Evaluation
- Dror Feitelson
- Hebrew University
2. Performance Evaluation
- In system design
- Selection of algorithms
- Setting parameter values
- In procurement decisions
- Value for money
- Meet usage goals
- For capacity planning
3. The Good Old Days
- The skies were blue
- The simulation results were conclusive
- Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
4.
- But in their papers, their scheme was better than ours!
5.
- How could they be so wrong?
6. Performance evaluation depends on
- The system's design
- (What we teach in algorithms and data structures)
- Its implementation
- (What we teach in programming courses)
- The workload to which it is subjected
- The metric used in the evaluation
- Interactions between these factors
8. Outline for Today
- Three examples of how workloads affect performance evaluation
- Workload modeling
- Getting data
- Fitting, correlations, stationarity
- Heavy tails, self similarity
- Research agenda
- In the context of parallel job scheduling
9. Example 1: Gang Scheduling and Job Size Distribution
10. Gang What?!?
- Time slicing parallel jobs with coordinated context switching
- The Ousterhout matrix
Ousterhout, ICDCS 1982
11. Gang What?!?
- Time slicing parallel jobs with coordinated context switching
- The Ousterhout matrix
- Optimization: alternative scheduling
Ousterhout, ICDCS 1982
12. Packing Jobs
- Use a buddy system for allocating processors (sketched below)
Feitelson & Rudolph, Computer 1990
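A minimal sketch of how such a buddy allocator works, assuming a power-of-two machine size; the names and structure are illustrative, not taken from the paper. Processors are handed out in power-of-two blocks at aligned offsets, which is exactly why allocations fall into predefined groups.

```python
class BuddyAllocator:
    def __init__(self, num_procs):
        # num_procs is assumed to be a power of two
        self.levels = num_procs.bit_length() - 1
        # free[k] lists base offsets of free blocks of size 2**k
        self.free = {k: [] for k in range(self.levels + 1)}
        self.free[self.levels].append(0)

    def alloc(self, requested):
        size = 1
        while size < requested:      # round up to a power of two:
            size *= 2                # this is the internal fragmentation
        k = size.bit_length() - 1
        j = k
        while j <= self.levels and not self.free[j]:
            j += 1                   # smallest free block that fits
        if j > self.levels:
            return None              # no contiguous block available
        base = self.free[j].pop()
        while j > k:                 # split down, freeing the upper buddy
            j -= 1
            self.free[j].append(base + 2 ** j)
        return base, size

    def release(self, base, size):
        k = size.bit_length() - 1
        while k < self.levels:       # coalesce with free buddies
            buddy = base ^ (1 << k)
            if buddy not in self.free[k]:
                break
            self.free[k].remove(buddy)
            base = min(base, buddy)
            k += 1
        self.free[k].append(base)

b = BuddyAllocator(16)
print(b.alloc(5))   # -> (0, 8): a 5-processor job gets an aligned 8-block
```

Because every block of a given size starts at an aligned offset, two jobs of the same size class always occupy matching processor groups, which is what improves the odds of alternative scheduling.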
17. The Question
- The buddy system leads to internal fragmentation
- But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups
- Which effect dominates the other?
18. The Answer (part 1)
Feitelson & Rudolph, JPDC 1996
22. The Answer (part 2)
- Many small jobs
- Many sequential jobs
- Many power-of-two jobs
- Practically no jobs use the full machine
- Conclusion: the buddy system should work well
23. Verification
Feitelson, JSSPP 1996
24. Example 2: Parallel Job Scheduling and Job Scaling
25. Variable Partitioning
- Each job gets a dedicated partition for the duration of its execution
- Resembles 2D bin packing
- Packing large jobs first should lead to better performance
- But what about the correlation of size and runtime?
26. Scaling Models
- Constant work
- Parallelism for speedup: Amdahl's Law
- Large first → SJF
- Constant time
- Size and runtime are uncorrelated
- Memory bound
- Large first → LJF
- Full-size jobs lead to blockout
Worley, SIAM JSSC 1990
27. Scan Algorithm
- Keep jobs in separate queues according to size (sizes are powers of 2)
- Serve the queues round robin, scheduling all jobs from each queue (they pack perfectly; sketched below)
- Assuming the constant work model, large jobs only block the machine for a short time
- But the memory bound model would lead to excessive queueing of small jobs
Krueger et al., IEEE TPDS 1994
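A minimal sketch of the queue-per-size packing idea, under the stated assumption of power-of-two job sizes on a power-of-two machine; the function and variable names are illustrative, not from the paper.

```python
from collections import deque

def scan_batches(jobs, machine_size):
    """jobs: iterable of (size, runtime) tuples with power-of-two sizes.
    Groups jobs into batches, one size class at a time; jobs of equal
    power-of-two size fill the machine with no fragmentation among them."""
    queues = {}
    for job in jobs:
        queues.setdefault(job[0], deque()).append(job)
    batches = []
    for size in sorted(queues):           # scan the size classes in turn
        q = queues[size]
        per_batch = machine_size // size  # jobs of this class per batch
        while q:
            batches.append([q.popleft()
                            for _ in range(min(per_batch, len(q)))])
    return batches

# Example: on an 8-node machine, the two 4-node jobs share one batch
print(scan_batches([(1, 10), (2, 8), (4, 3), (4, 7), (8, 2)], 8))
```

The sketch makes the slide's tension concrete: every full-machine batch of large jobs blocks all the small-job queues until it completes, which is harmless under constant work but painful under the memory bound model.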
28. The Data
Data: SDSC Paragon, 1995/6
34. Conclusion
- Parallelism is used for better results, not for faster results
- The constant work model is unrealistic
- The memory bound model is reasonable
- The Scan algorithm will probably not perform well in practice
35. Example 3: Backfilling and User Runtime Estimation
36. Backfilling
- Variable partitioning can suffer from external fragmentation
- Backfilling optimization: move jobs forward to fill holes in the schedule
- Requires knowledge of expected job runtimes
37. Variants
- EASY backfilling
- Make reservation for first queued job
- Conservative backfilling
- Make reservation for all queued jobs
38. User Runtime Estimates
- Lower estimates improve the chance of backfilling and better response time
- Estimates that are too low run the risk of having the job killed
- So estimates should be accurate, right?
39. They Aren't
Mualem & Feitelson, IEEE TPDS 2001
40. Surprising Consequences
- Inaccurate estimates actually lead to improved performance
- Performance evaluation results may depend on the accuracy of runtime estimates
- Example: EASY vs. conservative
- Using different workloads
- And different metrics
41. EASY vs. Conservative
Using the CTC SP2 workload
42. EASY vs. Conservative
Using the Jann workload model
43. EASY vs. Conservative
Using the Feitelson workload model
44. Conflicting Results Explained
- Jann uses accurate runtime estimates
- This leads to a tighter schedule
- EASY is not affected too much
- Conservative manages less backfilling of long jobs, because it respects more reservations
45. Conservative is bad for the long jobs, but good for the short ones that are respected (plots: Conservative vs. EASY)
46. Conflicting Results Explained
- Response time is sensitive to long jobs, which favor EASY
- Slowdown is sensitive to short jobs, which favor conservative
- None of this happens at CTC, because estimates are so loose that backfilling can occur even under conservative
47. Verification
Run the CTC workload with accurate estimates
48. But What About My Model?
- It simply does not have such small long jobs
49. Workload Data Sources
50. No Data
- Innovative, unprecedented systems
- Wireless
- Hand-held
- Use an educated guess
- Self similarity
- Heavy tails
- Zipf distribution
51. Serendipitous Data
- Data may be collected for various reasons
- Accounting logs
- Audit logs
- Debugging logs
- Just-so logs
- Can lead to wealth of information
52. NASA Ames iPSC/860 log
- 42050 jobs from Oct-Dec 1993

user      job      nodes  runtime  date      time
user4     cmd8        32       70  11/10/93  10:13:17
user4     cmd8        32       70  11/10/93  10:19:30
user42    nqs450      32     3300  11/10/93  10:22:07
user41    cmd342       4       54  11/10/93  10:22:37
sysadmin  pwd          1        6  11/10/93  10:22:42
user4     cmd8        32       60  11/10/93  10:25:42
sysadmin  pwd          1        3  11/10/93  10:30:43
user41    cmd342       4      126  11/10/93  10:31:32

Feitelson & Nitzberg, JSSPP 1995
53. Distribution of Job Sizes
55. Distribution of Resource Use
57. Degree of Multiprogramming
58. System Utilization
59. Job Arrivals
60. Arriving Job Sizes
61. Distribution of Interarrival Times
62. Distribution of Runtimes
63. User Activity
64. Repeated Execution
65. Application Moldability
66. Distribution of Run Lengths
67. Predictability in Repeated Runs
68. Recurring Findings
- Many small and serial jobs
- Many power-of-two jobs
- Weak correlation of job size and duration
- Job runtimes are bounded but have CV > 1
- Inaccurate user runtime estimates
- Non-stationary arrivals (daily/weekly cycle)
- Power-law user activity, run lengths
69. Instrumentation
- Passive: snoop without interfering
- Active: modify the system
- Collecting the data interferes with system behavior
- Saving or downloading the data causes additional interference
- Partial solution: model the interference
70. Data Sanitation
- Strange things happen
- Leaving them in is safe and faithful to the real data
- But it risks situations in which a non-representative situation dominates the evaluation results
71. Arrivals to SDSC SP2
72. Arrivals to LANL CM-5
73. Arrivals to CTC SP2
74. Arrivals to SDSC Paragon
What are they doing at 3:30 AM?
75. 3:30 AM
- Nearly every day, a set of 16 jobs is run by the same user
- Most probably the same set, as they typically have a similar pattern of runtimes
- Most probably these are administrative jobs that are executed automatically
76. Arrivals to CTC SP2
77. Arrivals to SDSC SP2
78. Arrivals to LANL CM-5
79. Arrivals to SDSC Paragon
80. Are These Outliers?
- These large activity outbreaks are easily distinguished from normal activity
- They last for several days to a few weeks
- They appear at intervals of several months to more than a year
- They are each caused by a single user!
- Therefore they are easy to remove
82. Two Aspects
- In workload modeling, should you include this in the model?
- In a general model, probably not
- Conduct separate evaluations for special conditions (e.g. a DOS attack)
- In evaluations using raw workload data, there is a danger of bias due to unknown special circumstances
83. Automation
- The idea:
- Cluster daily data based on various workload attributes
- Remove days that appear alone in a cluster
- Repeat
- The problem:
- Strange behavior often spans multiple days
Cirne & Berman, Wkshp. Workload Charact. 2001
84. Workload Modeling
85. Statistical Modeling
- Identify the attributes of the workload
- Create an empirical distribution of each attribute
- Fit the empirical distribution to create a model
- A synthetic workload is then created by sampling from the model distributions (see the sketch below)
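A minimal sketch of this sampling step, assuming each attribute was fitted independently; the distribution families and parameters below are placeholders, not fits to any particular log.

```python
import numpy as np

def synthetic_workload(n, rng=None):
    """Sample n jobs: (arrival time, size in nodes, runtime)."""
    rng = rng or np.random.default_rng(0)
    interarrival = rng.exponential(scale=600.0, size=n)    # seconds
    sizes = 2 ** rng.integers(0, 8, size=n)                # 1..128 nodes
    runtimes = rng.lognormal(mean=6.0, sigma=2.0, size=n)  # seconds
    arrivals = np.cumsum(interarrival)
    return list(zip(arrivals, sizes, runtimes))

jobs = synthetic_workload(10_000)
```

Note that sampling each attribute independently discards both the size-runtime correlation and the daily arrival cycle; the deck returns to both problems below.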
86. Fitting by Moments
- Calculate model parameters to fit the moments of the empirical data
- Problem: does not fit the shape of the distribution
87. Jann et al., JSSPP 1997
88. Fitting by Moments
- Calculate model parameters to fit the moments of the empirical data (a moment-matching sketch follows)
- Problem: does not fit the shape of the distribution
- Problem: very sensitive to extreme data values
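As a concrete instance, the method of moments for a gamma model picks the shape and scale that reproduce the sample mean and variance; the runtimes below are invented, and the second call shows how a single extreme value drags both parameters.

```python
import numpy as np

def fit_gamma_moments(samples):
    """Method of moments for a gamma model:
    mean = shape * scale, variance = shape * scale**2."""
    m = np.mean(samples)
    v = np.var(samples)
    return m * m / v, v / m   # (shape, scale)

base = [60.0, 120.0, 300.0, 600.0, 900.0]          # runtimes in seconds
print(fit_gamma_moments(base))
print(fit_gamma_moments(base + [360_000.0]))       # one 100-hour job added
```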
89. Effect of Extreme Runtime Values
Downey & Feitelson, PER 1999
90. Alternative: Fit to Shape
- Maximum likelihood: which distribution parameters were most likely to lead to the given observations (sketched below)
- Needs an initial guess of the functional form
- Phase-type distributions
- Construct the desired shape
- Goodness of fit
- Kolmogorov-Smirnov: difference in CDFs
- Anderson-Darling: added emphasis on the tail
- May need to sample the observations
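A sketch of shape fitting with SciPy, under the assumption that a lognormal form is a reasonable initial guess; the data here are synthetic stand-ins for real runtimes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runtimes = stats.lognorm.rvs(s=2.0, scale=400.0, size=50_000,
                             random_state=rng)   # stand-in for a real log

# Maximum-likelihood fit of the guessed functional form
shape, loc, scale = stats.lognorm.fit(runtimes, floc=0)

# Kolmogorov-Smirnov: max difference between empirical and model CDFs.
# With tens of thousands of observations K-S rejects almost any model,
# hence the slide's advice to test on a sample of the observations.
sample = rng.choice(runtimes, size=500, replace=False)
D, p = stats.kstest(sample, 'lognorm', args=(shape, loc, scale))
print(f"sigma={shape:.2f} scale={scale:.0f} KS D={D:.3f} p={p:.2f}")
```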
91. Correlations
- Correlation can be measured by the correlation coefficient
- It can be modeled by a joint distribution function
- Both may not be very useful
93. Correlation Coefficient
Gives low results for the correlation of runtime and size in parallel systems
94. Distributions
A restricted version of a joint distribution
95. Modeling Correlation
- Divide the range of one attribute into sub-ranges
- Create a separate model of the other attribute for each sub-range
- Models can be independent, or a model parameter can depend on the sub-range (sketched below)
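A minimal sketch of the sub-range approach; the three size ranges and the per-range lognormal parameters are invented for illustration, with the location parameter growing across sub-ranges to build in a weak size-runtime correlation of the kind the logs show.

```python
import numpy as np

rng = np.random.default_rng(0)

# (low, high, mu, sigma): runtime model for each job-size sub-range
SUBRANGE_MODELS = [
    (1,   4,   5.0, 1.8),
    (5,   32,  5.8, 1.8),
    (33,  512, 6.4, 1.8),
]

def sample_runtime(size):
    """Sample a runtime from the model of the sub-range containing size."""
    for low, high, mu, sigma in SUBRANGE_MODELS:
        if low <= size <= high:
            return rng.lognormal(mu, sigma)
    raise ValueError(f"size {size} outside modeled range")
```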
96. Stationarity
- Problem of the daily/weekly activity cycle
- Not important if the unit of activity is very small (a network packet)
- Very meaningful if the unit of work is long (a parallel job)
97. How to Modify the Load
- Multiply interarrivals or runtimes by a factor
- Changes the effective length of the day
- Multiply the machine size by a factor
- Modifies packing properties
- Add users
98. Stationarity
- Problem of the daily/weekly activity cycle
- Not important if the unit of activity is very small (a network packet)
- Very meaningful if the unit of work is long (a parallel job)
- Problem of a new/old system
- Immature workload
- Leftover workload
99. Heavy Tails
100. Tail Types
- When a distribution has mean m, what is the distribution of samples that are larger than x?
- Light: expected to be smaller than x + m
- Memoryless: expected to be exactly x + m
- Heavy: expected to be larger than x + m
101. Formal Definition
- Tail decays according to a power law
- Test: the log-log complementary distribution
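In standard notation, with a the tail index, the power-law decay of the tail is

```latex
\Pr[X > x] \sim x^{-a}, \qquad 0 < a \le 2
```

so on a log-log complementary distribution (LLCD) plot, log Pr[X > x] against log x approaches a straight line with slope -a.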
102. Consequences
- Large deviations from the mean are realistic
- Mass disparity
- A small fraction of the samples is responsible for a large part of the total mass
- Most samples together account for a negligible part of the mass
Crovella, JSSPP 2001
103. Unix File Sizes Survey, 1993
104. Unix File Sizes LLCD
105. Consequences
- Large deviations from the mean are realistic
- Mass disparity
- A small fraction of the samples is responsible for a large part of the total mass
- Most samples together account for a negligible part of the mass
- Infinite moments
- For a ≤ 1 the mean is undefined
- For a ≤ 2 the variance is undefined
Crovella, JSSPP 2001
106. Pareto Distribution
- With parameter a, the density is proportional to x^-(a+1)
- For a ≤ 1 the expectation is then infinite
- i.e. the running sample mean grows with the number of samples
107. Pareto Samples
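A quick simulation in the spirit of these sample plots (the tail index a = 0.9 is an illustrative choice): with a ≤ 1 the running mean never converges, it keeps climbing as rare huge values arrive.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.9                                      # tail index <= 1: infinite mean
samples = rng.pareto(a, size=100_000) + 1.0  # Pareto shifted to minimum 1
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (100, 1_000, 10_000, 100_000):
    print(f"mean of first {n:>6} samples: {running_mean[n - 1]:.1f}")
```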
110. Effect of Samples from the Tail
- In simulation:
- A single sample may dominate the results
- Example: response times of processes
- In analysis:
- Average long-term behavior may never happen in practice
111. Real Life
- Data samples are necessarily bounded
- The question is how to generalize to the model distribution:
- Arbitrary truncation
- Lognormal or phase-type distributions
- Something in between
112. Solution 1: Truncation
- Postulate an upper bound on the distribution
- Question: where to put the upper bound
- Probably OK for qualitative analysis
- May be problematic for quantitative simulations
113. Solution 2: Model the Sample
- Approximate the empirical distribution using a mixture of exponentials (e.g. phase-type distributions)
- In particular, exponential decay beyond the highest sample
- In some cases, a lognormal distribution provides a good fit
- Good for mathematical analysis
114. Solution 3: Dynamic
- Place an upper bound on the distribution
- The location of the bound depends on the total number of samples required
- Example
- Note: the bound does not change during the simulation
115. Self Similarity
116. The Phenomenon
- The whole has the same structure as certain parts
- Example: fractals
118. The Phenomenon
- The whole has the same structure as certain parts
- Example: fractals
- In workloads: burstiness at many different time scales
- Note: relates to a time series
119. Job Arrivals to SDSC Paragon
120. Process Arrivals to SDSC Paragon
121. Long-Range Correlation
- A burst of activity implies that values in the time series are correlated
- A burst covering a large time frame implies correlation over a long range
- This is contrary to assumptions about the independence of samples
122. Aggregation
- Replace each subsequence of m consecutive values by their mean
- If self-similar, the new series will have statistical properties similar to the original (i.e. bursty)
- If independent, it will tend to average out
123. Poisson Arrivals
124. Tests
- Essentially based on the burstiness-retaining nature of aggregation
- Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
125. R/s Metric
126. Tests
- Essentially based on the burstiness-retaining nature of aggregation
- Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
- Variance-time metric: the variance of an aggregated time series as a function of the aggregation level (sketched below)
127. Variance-Time Metric
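A sketch of the variance-time computation in plain NumPy (illustrative, not the deck's own code): aggregate at level m and track the variance. For independent data the variance decays like 1/m, i.e. slope -1 on a log-log plot; self-similar data decays more slowly.

```python
import numpy as np

def variance_time(series, levels):
    """Variance of the m-aggregated series for each aggregation level m."""
    series = np.asarray(series)
    points = []
    for m in levels:
        n = series.size // m
        agg = series[:n * m].reshape(n, m).mean(axis=1)
        points.append((m, agg.var()))
    return points

rng = np.random.default_rng(0)
iid = rng.exponential(size=2**16)          # independent baseline
for m, v in variance_time(iid, [1, 4, 16, 64, 256]):
    print(m, round(v, 4))                  # shrinks roughly like 1/m
```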
128. Modeling Self Similarity
- Generate the workload by an on-off process
- During the on period, generate work at a steady pace
- During the off period, do nothing
- On and off period lengths are heavy tailed
- Multiplex many such sources
- Leads to long-range correlation (sketched below)
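A minimal sketch of the on-off construction under these assumptions: heavy-tailed period lengths drawn from a Pareto distribution, one unit of work per tick while on, and many sources summed; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def onoff_source(ticks, a=1.4):
    """One source alternating heavy-tailed on and off periods."""
    out = np.zeros(ticks)
    t, on = 0, bool(rng.integers(2))
    while t < ticks:
        length = int(rng.pareto(a)) + 1   # heavy-tailed period length
        if on:
            out[t:t + length] = 1.0       # steady work while on
        t += length
        on = not on
    return out

# Multiplexing many sources yields a long-range correlated aggregate load
load = sum(onoff_source(10_000) for _ in range(50))
```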
129. Research Areas
130. Effect of Users
- The workload is generated by users
- Human users do not behave like a random sampling process
- Feedback based on system performance
- Repetitive working patterns
131. Feedback
- The user population is finite
- Users back off when performance is inadequate
- Negative feedback
- Better system stability
- Need to explicitly model this behavior
132. Locality of Sampling
- Users display different levels of activity at different times
- At any given time, only a small subset of users is active
133. Active Users
134. Locality of Sampling
- Users display different levels of activity at different times
- At any given time, only a small subset of users is active
- These users repeatedly do the same thing
- The workload observed by the system is not a random sample from the long-term distribution
135. SDSC Paragon Data
137. Growing Variability
138. SDSC Paragon Data
140. Locality of Sampling
- The questions:
- How does this affect the results of performance evaluation?
- Can this be exploited by the system, e.g. by a scheduler?
141. Hierarchical Workload Models
- Model of the user population
- Modify load by adding/deleting users
- Model of a single user's activity
- Built-in self similarity using heavy-tailed on/off times
- Model of application behavior and internal structure
- Capture interaction with system attributes
142. A Small Problem
- We don't have data for these models
- Especially for user behavior such as feedback
- Need interaction with cognitive scientists
- And for the distribution of application types and their parameters
- Need detailed instrumentation
144.
- We like to think that we design systems based on solid foundations
145.
- But beware: the foundations might be unfounded assumptions!
146. Computer Systems are Complex
- We should have more "science" in computer science:
- Collect data rather than make assumptions
- Run experiments under different conditions
- Make measurements and observations
- Make predictions and verify them
- Share data and programs to promote good practices and ensure comparability
148. Advice from the Experts
- "Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house." -- Henri Poincaré
- "Everything should be made as simple as possible, but not simpler." -- Albert Einstein
149. Acknowledgements
- Students: Ahuva Mualem, David Talby, Uri Lublin
- Larry Rudolph / MIT
- Data in the Parallel Workloads Archive:
- Joefon Jann / IBM
- Allen Downey / Wellesley
- CTC SP2 log / Steven Hotovy
- SDSC Paragon log / Reagan Moore
- SDSC SP2 log / Victor Hazelwood
- LANL CM-5 log / Curt Canada
- NASA iPSC/860 log / Bill Nitzberg