Title: Workload Modeling and its Effect on Performance Evaluation
1. Workload Modeling and its Effect on Performance Evaluation
- Dror Feitelson
- Hebrew University
2. Performance Evaluation
- In system design
- Selection of algorithms
- Setting parameter values
- In procurement decisions
- Value for money
- Meet usage goals
- For capacity planning
3. The Good Old Days
- The skies were blue
- The simulation results were conclusive
- Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
4.
- But in their papers, their scheme was better than ours!
5.
- How could they be so wrong?
6. Performance evaluation depends on
- The system's design
- (What we teach in algorithms and data structures)
- Its implementation
- (What we teach in programming courses)
- The workload to which it is subjected
- The metric used in the evaluation
- Interactions between these factors
8. Outline for Today
- Three examples of how workloads affect performance evaluation
- Workload modeling
- Getting data
- Fitting, correlations, stationarity
- Heavy tails, self similarity
- Research agenda
- In the context of parallel job scheduling
9. Example 1: Gang Scheduling and Job Size Distribution
10. Gang What?!?
- Time slicing parallel jobs with coordinated context switching
- The Ousterhout matrix
Ousterhout, ICDCS 1982
11. Gang What?!?
- Time slicing parallel jobs with coordinated context switching
- The Ousterhout matrix
- Optimization: alternative scheduling
Ousterhout, ICDCS 1982
12. Packing Jobs
- Use a buddy system for allocating processors (sketched below)
Feitelson & Rudolph, Computer 1990
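A minimal sketch of how such a buddy allocator works, assuming a power-of-two machine size; the names and structure are illustrative, not taken from the paper. Processors are handed out in power-of-two blocks at aligned offsets, which is exactly why allocations fall into predefined groups.

```python
class BuddyAllocator:
    def __init__(self, num_procs):
        # num_procs is assumed to be a power of two
        self.levels = num_procs.bit_length() - 1
        # free[k] lists base offsets of free blocks of size 2**k
        self.free = {k: [] for k in range(self.levels + 1)}
        self.free[self.levels].append(0)

    def alloc(self, requested):
        size = 1
        while size < requested:      # round up to a power of two:
            size *= 2                # this is the internal fragmentation
        k = size.bit_length() - 1
        j = k
        while j <= self.levels and not self.free[j]:
            j += 1                   # smallest free block that fits
        if j > self.levels:
            return None              # no contiguous block available
        base = self.free[j].pop()
        while j > k:                 # split down, freeing the upper buddy
            j -= 1
            self.free[j].append(base + 2 ** j)
        return base, size

    def release(self, base, size):
        k = size.bit_length() - 1
        while k < self.levels:       # coalesce with free buddies
            buddy = base ^ (1 << k)
            if buddy not in self.free[k]:
                break
            self.free[k].remove(buddy)
            base = min(base, buddy)
            k += 1
        self.free[k].append(base)

b = BuddyAllocator(16)
print(b.alloc(5))   # -> (0, 8): a 5-processor job gets an aligned 8-block
```

Because every block of a given size starts at an aligned offset, two jobs of the same size class always occupy matching processor groups, which is what improves the odds of alternative scheduling.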
17. The Question
- The buddy system leads to internal fragmentation
- But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups
- Which effect dominates the other?
18. The Answer (part 1)
Feitelson & Rudolph, JPDC 1996
22. The Answer (part 2)
- Many small jobs
- Many sequential jobs
- Many power-of-two jobs
- Practically no jobs use the full machine
- Conclusion: the buddy system should work well
23. Verification
Feitelson, JSSPP 1996
24. Example 2: Parallel Job Scheduling and Job Scaling
25. Variable Partitioning
- Each job gets a dedicated partition for the duration of its execution
- Resembles 2D bin packing
- Packing large jobs first should lead to better performance
- But what about the correlation of size and runtime?
26. Scaling Models
- Constant work
- Parallelism for speedup: Amdahl's Law
- Large first → SJF
- Constant time
- Size and runtime are uncorrelated
- Memory bound
- Large first → LJF
- Full-size jobs lead to blockout
Worley, SIAM JSSC 1990
27. Scan Algorithm
- Keep jobs in separate queues according to size (sizes are powers of 2)
- Serve the queues round robin, scheduling all jobs from each queue (they pack perfectly; sketched below)
- Assuming the constant work model, large jobs only block the machine for a short time
- But the memory bound model would lead to excessive queueing of small jobs
Krueger et al., IEEE TPDS 1994
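A minimal sketch of the queue-per-size packing idea, under the stated assumption of power-of-two job sizes on a power-of-two machine; the function and variable names are illustrative, not from the paper.

```python
from collections import deque

def scan_batches(jobs, machine_size):
    """jobs: iterable of (size, runtime) tuples with power-of-two sizes.
    Groups jobs into batches, one size class at a time; jobs of equal
    power-of-two size fill the machine with no fragmentation among them."""
    queues = {}
    for job in jobs:
        queues.setdefault(job[0], deque()).append(job)
    batches = []
    for size in sorted(queues):           # scan the size classes in turn
        q = queues[size]
        per_batch = machine_size // size  # jobs of this class per batch
        while q:
            batches.append([q.popleft()
                            for _ in range(min(per_batch, len(q)))])
    return batches

# Example: on an 8-node machine, the two 4-node jobs share one batch
print(scan_batches([(1, 10), (2, 8), (4, 3), (4, 7), (8, 2)], 8))
```

The sketch makes the slide's tension concrete: every full-machine batch of large jobs blocks all the small-job queues until it completes, which is harmless under constant work but painful under the memory bound model.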
28. The Data
Data: SDSC Paragon, 1995/6
34. Conclusion
- Parallelism is used for better results, not for faster results
- The constant work model is unrealistic
- The memory bound model is reasonable
- The Scan algorithm will probably not perform well in practice
35. Example 3: Backfilling and User Runtime Estimation
36. Backfilling
- Variable partitioning can suffer from external fragmentation
- Backfilling optimization: move jobs forward to fill holes in the schedule
- Requires knowledge of expected job runtimes
37. Variants
- EASY backfilling
- Make reservation for first queued job
- Conservative backfilling
- Make reservation for all queued jobs
38. User Runtime Estimates
- Lower estimates improve the chance of backfilling and better response time
- Estimates that are too low run the risk of having the job killed
- So estimates should be accurate, right?
39. They Aren't
Mualem & Feitelson, IEEE TPDS 2001
40. Surprising Consequences
- Inaccurate estimates actually lead to improved performance
- Performance evaluation results may depend on the accuracy of runtime estimates
- Example: EASY vs. conservative
- Using different workloads
- And different metrics
41. EASY vs. Conservative
Using the CTC SP2 workload
42. EASY vs. Conservative
Using the Jann workload model
43. EASY vs. Conservative
Using the Feitelson workload model
44. Conflicting Results Explained
- Jann uses accurate runtime estimates
- This leads to a tighter schedule
- EASY is not affected too much
- Conservative manages less backfilling of long jobs, because it respects more reservations
45. Conservative is bad for the long jobs, but good for the short ones that are respected (plots: Conservative vs. EASY)
46. Conflicting Results Explained
- Response time is sensitive to long jobs, which favor EASY
- Slowdown is sensitive to short jobs, which favor conservative
- None of this happens at CTC, because estimates are so loose that backfilling can occur even under conservative
47. Verification
Run the CTC workload with accurate estimates
48. But What About My Model?
- It simply does not have such small long jobs
49. Workload Data Sources
50. No Data
- Innovative, unprecedented systems
- Wireless
- Hand-held
- Use an educated guess
- Self similarity
- Heavy tails
- Zipf distribution
51. Serendipitous Data
- Data may be collected for various reasons
- Accounting logs
- Audit logs
- Debugging logs
- Just-so logs
- Can lead to wealth of information
52. NASA Ames iPSC/860 log
- 42050 jobs from Oct-Dec 1993

user      job      nodes  runtime  date      time
user4     cmd8        32       70  11/10/93  10:13:17
user4     cmd8        32       70  11/10/93  10:19:30
user42    nqs450      32     3300  11/10/93  10:22:07
user41    cmd342       4       54  11/10/93  10:22:37
sysadmin  pwd          1        6  11/10/93  10:22:42
user4     cmd8        32       60  11/10/93  10:25:42
sysadmin  pwd          1        3  11/10/93  10:30:43
user41    cmd342       4      126  11/10/93  10:31:32

Feitelson & Nitzberg, JSSPP 1995
53. Distribution of Job Sizes
55. Distribution of Resource Use
57. Degree of Multiprogramming
58. System Utilization
59. Job Arrivals
60. Arriving Job Sizes
61. Distribution of Interarrival Times
62. Distribution of Runtimes
63. User Activity
64. Repeated Execution
65. Application Moldability
66. Distribution of Run Lengths
67. Predictability in Repeated Runs
68. Recurring Findings
- Many small and serial jobs
- Many power-of-two jobs
- Weak correlation of job size and duration
- Job runtimes are bounded but have CV > 1
- Inaccurate user runtime estimates
- Non-stationary arrivals (daily/weekly cycle)
- Power-law user activity, run lengths
69. Instrumentation
- Passive: snoop without interfering
- Active: modify the system
- Collecting the data interferes with system behavior
- Saving or downloading the data causes additional interference
- Partial solution: model the interference
70. Data Sanitation
- Strange things happen
- Leaving them in is safe and faithful to the real data
- But it risks situations in which a non-representative situation dominates the evaluation results
71. Arrivals to SDSC SP2
72. Arrivals to LANL CM-5
73. Arrivals to CTC SP2
74. Arrivals to SDSC Paragon
What are they doing at 3:30 AM?
75. 3:30 AM
- Nearly every day, a set of 16 jobs is run by the same user
- Most probably the same set, as they typically have a similar pattern of runtimes
- Most probably these are administrative jobs that are executed automatically
76. Arrivals to CTC SP2
77. Arrivals to SDSC SP2
78. Arrivals to LANL CM-5
79. Arrivals to SDSC Paragon
80. Are These Outliers?
- These large activity outbreaks are easily distinguished from normal activity
- They last for several days to a few weeks
- They appear at intervals of several months to more than a year
- They are each caused by a single user!
- Therefore they are easy to remove
82. Two Aspects
- In workload modeling, should you include this in the model?
- In a general model, probably not
- Conduct separate evaluations for special conditions (e.g. a DOS attack)
- In evaluations using raw workload data, there is a danger of bias due to unknown special circumstances
83. Automation
- The idea:
- Cluster daily data based on various workload attributes
- Remove days that appear alone in a cluster
- Repeat
- The problem:
- Strange behavior often spans multiple days
Cirne & Berman, Wkshp. Workload Charact. 2001
84. Workload Modeling
85. Statistical Modeling
- Identify the attributes of the workload
- Create an empirical distribution of each attribute
- Fit the empirical distribution to create a model
- A synthetic workload is then created by sampling from the model distributions (see the sketch below)
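A minimal sketch of this sampling step, assuming each attribute was fitted independently; the distribution families and parameters below are placeholders, not fits to any particular log.

```python
import numpy as np

def synthetic_workload(n, rng=None):
    """Sample n jobs: (arrival time, size in nodes, runtime)."""
    rng = rng or np.random.default_rng(0)
    interarrival = rng.exponential(scale=600.0, size=n)    # seconds
    sizes = 2 ** rng.integers(0, 8, size=n)                # 1..128 nodes
    runtimes = rng.lognormal(mean=6.0, sigma=2.0, size=n)  # seconds
    arrivals = np.cumsum(interarrival)
    return list(zip(arrivals, sizes, runtimes))

jobs = synthetic_workload(10_000)
```

Note that sampling each attribute independently discards both the size-runtime correlation and the daily arrival cycle; the deck returns to both problems below.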
86. Fitting by Moments
- Calculate model parameters to fit the moments of the empirical data
- Problem: does not fit the shape of the distribution
87. Jann et al., JSSPP 1997
88. Fitting by Moments
- Calculate model parameters to fit the moments of the empirical data (a moment-matching sketch follows)
- Problem: does not fit the shape of the distribution
- Problem: very sensitive to extreme data values
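As a concrete instance, the method of moments for a gamma model picks the shape and scale that reproduce the sample mean and variance; the runtimes below are invented, and the second call shows how a single extreme value drags both parameters.

```python
import numpy as np

def fit_gamma_moments(samples):
    """Method of moments for a gamma model:
    mean = shape * scale, variance = shape * scale**2."""
    m = np.mean(samples)
    v = np.var(samples)
    return m * m / v, v / m   # (shape, scale)

base = [60.0, 120.0, 300.0, 600.0, 900.0]          # runtimes in seconds
print(fit_gamma_moments(base))
print(fit_gamma_moments(base + [360_000.0]))       # one 100-hour job added
```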
89. Effect of Extreme Runtime Values
Downey & Feitelson, PER 1999
90. Alternative: Fit to Shape
- Maximum likelihood: which distribution parameters were most likely to lead to the given observations (sketched below)
- Needs an initial guess of the functional form
- Phase-type distributions
- Construct the desired shape
- Goodness of fit
- Kolmogorov-Smirnov: difference in CDFs
- Anderson-Darling: added emphasis on the tail
- May need to sample the observations
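A sketch of shape fitting with SciPy, under the assumption that a lognormal form is a reasonable initial guess; the data here are synthetic stand-ins for real runtimes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runtimes = stats.lognorm.rvs(s=2.0, scale=400.0, size=50_000,
                             random_state=rng)   # stand-in for a real log

# Maximum-likelihood fit of the guessed functional form
shape, loc, scale = stats.lognorm.fit(runtimes, floc=0)

# Kolmogorov-Smirnov: max difference between empirical and model CDFs.
# With tens of thousands of observations K-S rejects almost any model,
# hence the slide's advice to test on a sample of the observations.
sample = rng.choice(runtimes, size=500, replace=False)
D, p = stats.kstest(sample, 'lognorm', args=(shape, loc, scale))
print(f"sigma={shape:.2f} scale={scale:.0f} KS D={D:.3f} p={p:.2f}")
```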
91. Correlations
- Correlation can be measured by the correlation coefficient
- It can be modeled by a joint distribution function
- Both may not be very useful
93. Correlation Coefficient
Gives low results for the correlation of runtime and size in parallel systems
94. Distributions
A restricted version of a joint distribution
95. Modeling Correlation
- Divide the range of one attribute into sub-ranges
- Create a separate model of the other attribute for each sub-range
- Models can be independent, or a model parameter can depend on the sub-range (sketched below)
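A minimal sketch of the sub-range approach; the three size ranges and the per-range lognormal parameters are invented for illustration, with the location parameter growing across sub-ranges to build in a weak size-runtime correlation of the kind the logs show.

```python
import numpy as np

rng = np.random.default_rng(0)

# (low, high, mu, sigma): runtime model for each job-size sub-range
SUBRANGE_MODELS = [
    (1,   4,   5.0, 1.8),
    (5,   32,  5.8, 1.8),
    (33,  512, 6.4, 1.8),
]

def sample_runtime(size):
    """Sample a runtime from the model of the sub-range containing size."""
    for low, high, mu, sigma in SUBRANGE_MODELS:
        if low <= size <= high:
            return rng.lognormal(mu, sigma)
    raise ValueError(f"size {size} outside modeled range")
```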
96. Stationarity
- Problem of the daily/weekly activity cycle
- Not important if the unit of activity is very small (a network packet)
- Very meaningful if the unit of work is long (a parallel job)
97. How to Modify the Load
- Multiply interarrivals or runtimes by a factor
- Changes the effective length of the day
- Multiply the machine size by a factor
- Modifies packing properties
- Add users
98. Stationarity
- Problem of the daily/weekly activity cycle
- Not important if the unit of activity is very small (a network packet)
- Very meaningful if the unit of work is long (a parallel job)
- Problem of a new/old system
- Immature workload
- Leftover workload
99. Heavy Tails
100. Tail Types
- When a distribution has mean m, what is the distribution of samples that are larger than x?
- Light: expected to be smaller than x + m
- Memoryless: expected to be exactly x + m
- Heavy: expected to be larger than x + m
101. Formal Definition
- Tail decays according to a power law
- Test: the log-log complementary distribution
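In standard notation, with a the tail index, the power-law decay of the tail is

```latex
\Pr[X > x] \sim x^{-a}, \qquad 0 < a \le 2
```

so on a log-log complementary distribution (LLCD) plot, log Pr[X > x] against log x approaches a straight line with slope -a.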
102. Consequences
- Large deviations from the mean are realistic
- Mass disparity
- A small fraction of the samples is responsible for a large part of the total mass
- Most samples together account for a negligible part of the mass
Crovella, JSSPP 2001
103. Unix File Sizes Survey, 1993
104. Unix File Sizes LLCD
105. Consequences
- Large deviations from the mean are realistic
- Mass disparity
- A small fraction of the samples is responsible for a large part of the total mass
- Most samples together account for a negligible part of the mass
- Infinite moments
- For a ≤ 1 the mean is undefined
- For a ≤ 2 the variance is undefined
Crovella, JSSPP 2001
106. Pareto Distribution
- With parameter a, the density is proportional to x^-(a+1)
- For a ≤ 1 the expectation is then infinite
- i.e. the running sample mean grows with the number of samples
107. Pareto Samples
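A quick simulation in the spirit of these sample plots (the tail index a = 0.9 is an illustrative choice): with a ≤ 1 the running mean never converges, it keeps climbing as rare huge values arrive.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.9                                      # tail index <= 1: infinite mean
samples = rng.pareto(a, size=100_000) + 1.0  # Pareto shifted to minimum 1
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (100, 1_000, 10_000, 100_000):
    print(f"mean of first {n:>6} samples: {running_mean[n - 1]:.1f}")
```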
110. Effect of Samples from the Tail
- In simulation:
- A single sample may dominate the results
- Example: response times of processes
- In analysis:
- Average long-term behavior may never happen in practice
111. Real Life
- Data samples are necessarily bounded
- The question is how to generalize to the model distribution:
- Arbitrary truncation
- Lognormal or phase-type distributions
- Something in between
112. Solution 1: Truncation
- Postulate an upper bound on the distribution
- Question: where to put the upper bound
- Probably OK for qualitative analysis
- May be problematic for quantitative simulations
113. Solution 2: Model the Sample
- Approximate the empirical distribution using a mixture of exponentials (e.g. phase-type distributions)
- In particular, exponential decay beyond the highest sample
- In some cases, a lognormal distribution provides a good fit
- Good for mathematical analysis
114. Solution 3: Dynamic
- Place an upper bound on the distribution
- The location of the bound depends on the total number of samples required
- Example
- Note: the bound does not change during the simulation
115. Self Similarity
116. The Phenomenon
- The whole has the same structure as certain parts
- Example: fractals
118. The Phenomenon
- The whole has the same structure as certain parts
- Example: fractals
- In workloads: burstiness at many different time scales
- Note: relates to a time series
119. Job Arrivals to SDSC Paragon
120. Process Arrivals to SDSC Paragon
121. Long-Range Correlation
- A burst of activity implies that values in the time series are correlated
- A burst covering a large time frame implies correlation over a long range
- This is contrary to assumptions about the independence of samples
122. Aggregation
- Replace each subsequence of m consecutive values by their mean
- If self-similar, the new series will have statistical properties similar to the original (i.e. bursty)
- If independent, it will tend to average out
123. Poisson Arrivals
124. Tests
- Essentially based on the burstiness-retaining nature of aggregation
- Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
125. R/s Metric
126. Tests
- Essentially based on the burstiness-retaining nature of aggregation
- Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
- Variance-time metric: the variance of an aggregated time series as a function of the aggregation level (sketched below)
127. Variance-Time Metric
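A sketch of the variance-time computation in plain NumPy (illustrative, not the deck's own code): aggregate at level m and track the variance. For independent data the variance decays like 1/m, i.e. slope -1 on a log-log plot; self-similar data decays more slowly.

```python
import numpy as np

def variance_time(series, levels):
    """Variance of the m-aggregated series for each aggregation level m."""
    series = np.asarray(series)
    points = []
    for m in levels:
        n = series.size // m
        agg = series[:n * m].reshape(n, m).mean(axis=1)
        points.append((m, agg.var()))
    return points

rng = np.random.default_rng(0)
iid = rng.exponential(size=2**16)          # independent baseline
for m, v in variance_time(iid, [1, 4, 16, 64, 256]):
    print(m, round(v, 4))                  # shrinks roughly like 1/m
```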
128. Modeling Self Similarity
- Generate the workload by an on-off process
- During the on period, generate work at a steady pace
- During the off period, do nothing
- On and off period lengths are heavy tailed
- Multiplex many such sources
- Leads to long-range correlation (sketched below)
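A minimal sketch of the on-off construction under these assumptions: heavy-tailed period lengths drawn from a Pareto distribution, one unit of work per tick while on, and many sources summed; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def onoff_source(ticks, a=1.4):
    """One source alternating heavy-tailed on and off periods."""
    out = np.zeros(ticks)
    t, on = 0, bool(rng.integers(2))
    while t < ticks:
        length = int(rng.pareto(a)) + 1   # heavy-tailed period length
        if on:
            out[t:t + length] = 1.0       # steady work while on
        t += length
        on = not on
    return out

# Multiplexing many sources yields a long-range correlated aggregate load
load = sum(onoff_source(10_000) for _ in range(50))
```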
129. Research Areas
130. Effect of Users
- The workload is generated by users
- Human users do not behave like a random sampling process
- Feedback based on system performance
- Repetitive working patterns
131. Feedback
- The user population is finite
- Users back off when performance is inadequate
- Negative feedback
- Better system stability
- Need to explicitly model this behavior
132. Locality of Sampling
- Users display different levels of activity at different times
- At any given time, only a small subset of users is active
133. Active Users
134. Locality of Sampling
- Users display different levels of activity at different times
- At any given time, only a small subset of users is active
- These users repeatedly do the same thing
- The workload observed by the system is not a random sample from the long-term distribution
135. SDSC Paragon Data
137. Growing Variability
138. SDSC Paragon Data
140. Locality of Sampling
- The questions:
- How does this affect the results of performance evaluation?
- Can this be exploited by the system, e.g. by a scheduler?
141. Hierarchical Workload Models
- Model of the user population
- Modify load by adding/deleting users
- Model of a single user's activity
- Built-in self similarity using heavy-tailed on/off times
- Model of application behavior and internal structure
- Capture interaction with system attributes
142. A Small Problem
- We don't have data for these models
- Especially for user behavior such as feedback
- Need interaction with cognitive scientists
- And for the distribution of application types and their parameters
- Need detailed instrumentation
144.
- We like to think that we design systems based on solid foundations
145.
- But beware: the foundations might be unfounded assumptions!
146. Computer Systems are Complex
- We should have more "science" in computer science:
- Collect data rather than make assumptions
- Run experiments under different conditions
- Make measurements and observations
- Make predictions and verify them
- Share data and programs to promote good practices and ensure comparability
148. Advice from the Experts
- "Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house." -- Henri Poincaré
- "Everything should be made as simple as possible, but not simpler." -- Albert Einstein
149. Acknowledgements
- Students: Ahuva Mualem, David Talby, Uri Lublin
- Larry Rudolph / MIT
- Data in the Parallel Workloads Archive:
- Joefon Jann / IBM
- Allen Downey / Wellesley
- CTC SP2 log / Steven Hotovy
- SDSC Paragon log / Reagan Moore
- SDSC SP2 log / Victor Hazelwood
- LANL CM-5 log / Curt Canada
- NASA iPSC/860 log / Bill Nitzberg