Title: Designing Parallel Operating Systems using Modern Interconnects
Pitfalls in Parallel Job Scheduling Evaluation
Eitan Frachtenberg and Dror Feitelson
Computer and Computational Sciences Division, Los Alamos National Laboratory
2. Scope
- Numerous methodological issues arise in the evaluation of parallel job schedulers:
  - Experiment theory and design
  - Workloads and applications
  - Implementation issues and assumptions
  - Metrics and statistics
- The paper covers 32 recurring pitfalls, organized into topics and sorted by severity
- This talk describes a real case study and the heroic attempts to avoid most such pitfalls
  - As well as the less-heroic oversight of several others
3. Evaluation Paths
- Theoretical analysis (queuing theory)
  - Reproducible, rigorous, and resource-friendly
  - Hard for time slicing, due to unknown parameters, application structure, and feedbacks
- Simulation
  - Relatively simple and flexible
  - Many assumptions, not all known or reported; hard to reproduce; rarely factors in application characteristics
- Experiments with real sites and workloads
  - Most representative (at least locally)
  - Largely impractical and irreproducible
- Emulation
4. Emulation Environment
- Experimental platform: three clusters with a high-end network
- Software: several job scheduling algorithms implemented on top of STORM
  - Batch / space sharing, with optional EASY backfilling
  - Gang Scheduling, Implicit Coscheduling (SB), Flexible Coscheduling
- Results described in JSSPP'03 and TPDS'05
5. Step One: Choosing a Workload
- Static vs. dynamic
- Size of workload
- How many different workloads are needed?
- Use trace data? (see the trace-reading sketch below)
  - Different sites have different workload characteristics
  - Inconvenient sizes may require imprecise scaling
  - Polluted data, flurries
- Use model-generated data?
  - Several models exist, with different strengths
  - By trying to capture everything, a model may capture nothing
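If trace data is used, a job record only needs a handful of fields. Below is a minimal sketch of reading a trace in the Standard Workload Format (SWF) used by the Parallel Workloads Archive; the Job record is our own illustrative simplification, not a complete SWF parser.

```python
# Minimal SWF trace reader (sketch). SWF data lines hold 18
# whitespace-separated fields; only a few are used here:
# field 1 = job number, 2 = submit time, 4 = run time,
# 5 = number of allocated processors.
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    submit: float    # seconds since trace start
    runtime: float   # seconds
    procs: int

def load_swf(path):
    jobs = []
    with open(path) as f:
        for line in f:
            if line.startswith(';') or not line.strip():
                continue  # skip SWF header comments and blank lines
            fields = line.split()
            jobs.append(Job(int(fields[0]), float(fields[1]),
                            float(fields[3]), int(fields[4])))
    return jobs
```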
6. Static Workloads
- We start with a synthetic application and static workloads
  - Simple enough to model, debug, and calibrate
- Bulk-synchronous application
  - Can control granularity, variability, and communication pattern (see the sketch below)
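As a rough illustration, a bulk-synchronous synthetic application can be as small as the sketch below. It uses mpi4py; the granularity and variability knobs are illustrative names, not those of the actual testbed application.

```python
# Sketch of a bulk-synchronous synthetic application (assumes mpi4py).
# Each iteration "computes" for about `granularity` seconds, with a
# controlled amount of variability, then synchronizes globally.
import random
import time
from mpi4py import MPI

def bulk_synchronous(iterations, granularity, variability=0.0):
    comm = MPI.COMM_WORLD
    for _ in range(iterations):
        # Compute phase: busy-wait for the chosen grain size.
        work = granularity * (1.0 + random.uniform(-variability, variability))
        deadline = time.monotonic() + work
        while time.monotonic() < deadline:
            pass
        # Communication phase: a barrier stands in for the
        # application's communication pattern.
        comm.Barrier()

if __name__ == "__main__":
    bulk_synchronous(iterations=100, granularity=0.01, variability=0.5)
```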
7. Synthetic Scenarios
[Figure: the four static job-mix scenarios: balanced, complementing, imbalanced, and mixed]
8. Example: Turnaround Time
9. Dynamic Workloads
- We chose Lublin's model [JPDC'03]
- 1,000 jobs per workload
- Multiplying run times AND arrival times by a constant to shrink the experiment's duration (to 2-4 hours)
  - Shrinking too much is problematic (system constants)
- Multiplying arrival times by a range of factors to modify the load (see the scaling sketch below)
  - Unrepresentative, since it deviates from the real correlations with run times and job sizes
  - A better solution is to use different workloads
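Both scaling operations are one-liners over a job list; this sketch reuses the illustrative Job records from the trace-reading sketch above.

```python
def shrink(jobs, factor):
    # Scale run times AND arrival times by the same constant:
    # the offered load is preserved while the experiment shortens.
    return [Job(j.job_id, j.submit * factor, j.runtime * factor, j.procs)
            for j in jobs]

def change_load(jobs, factor):
    # Scale only the arrival times to raise or lower the offered load.
    # The pitfall: this breaks the natural correlations between
    # inter-arrival times, run times, and job sizes.
    return [Job(j.job_id, j.submit * factor, j.runtime, j.procs)
            for j in jobs]
```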
10. Step Two: Choosing Applications
- Synthetic applications are easy to control, but:
  - Some characteristics are ignored (e.g., I/O, memory)
  - Others may not be representative, in particular communication, which is the salient feature of parallel applications
    - Granularity, pattern, network performance
  - If not sure, conduct a sensitivity analysis (see the sketch below)
  - Applications might be assumed malleable, moldable, or to have linear speedup; many MPI applications are none of these
- Real applications have no hidden assumptions
  - But may also have limited generality
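A sensitivity analysis here just means rerunning the evaluation while sweeping one application characteristic at a time. A minimal sketch, assuming a hypothetical run_workload driver that runs the scheduler under test and returns a mean response time:

```python
# One-factor-at-a-time sensitivity analysis (sketch).
# `run_workload` is a hypothetical driver, not part of the testbed.
def sensitivity(run_workload, granularities):
    baseline = run_workload(granularity=granularities[0])
    for g in granularities[1:]:
        result = run_workload(granularity=g)
        change = (result - baseline) / baseline
        print(f"granularity={g}: mean response time {result:.1f}s "
              f"({change:+.1%} vs. baseline)")
```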
11. Example: Sensitivity Analysis
12. Application Choices
- Synthetic applications in the first set
  - Allow control over more parameters
  - Allow testing unrealistic but interesting conditions (e.g., a high multiprogramming level)
- LANL applications in the second set (Sweep3D, Sage)
  - Real memory and communication use (MPL = 2)
  - Important applications for LANL's evaluations
    - But probably only for LANL
- Run-time estimates: the f-model on batch, MPL on the others
13. Step Three: Choosing Parameters
- What are reasonable input parameters to use in the evaluation? (see the sweep sketch below)
  - Maximum multiprogramming level (MPL)
  - Timeslice quantum
  - Input load
  - Backfilling method and its effect on multiprogramming
  - Run-time estimate factor (not tested)
  - Algorithm constants, tuning, etc.
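When in doubt, sweep the whole grid rather than fix a single arbitrary value. A minimal sketch, again assuming the hypothetical run_workload driver; the parameter values are illustrative:

```python
import itertools

def parameter_sweep(run_workload):
    # `run_workload` is a hypothetical driver that runs the scheduler
    # under test and returns a mean response time in seconds.
    mpls = [1, 2, 4, 8]             # maximum multiprogramming level
    timeslices = [0.1, 1.0, 10.0]   # timeslice quantum (seconds)
    loads = [0.5, 0.7, 0.9]         # offered load (fraction of capacity)
    for mpl, ts, load in itertools.product(mpls, timeslices, loads):
        result = run_workload(mpl=mpl, timeslice=ts, load=load)
        print(f"MPL={mpl} timeslice={ts}s load={load}: "
              f"mean response time {result:.1f}s")
```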
14. Example 1: MPL
- Verified with different offered loads
15. Example 2: Timeslice
- Dividing into quantiles allows analysis of the effect on different job types
16. Considerations for Parameters
- Realistic MPLs
- Scaling traces to different machine sizes
- Scaling the offered load
- Artificial user estimates and multiprogramming estimates
17. Step Four: Choosing Metrics
- Not all metrics are easily comparable
  - Absolute times, slowdown with time slicing, etc.
- Metrics may need to be limited to a relevant context
- Use multiple metrics to understand the characteristics
- Measuring utilization for an open model is a pitfall
  - It is a direct measure of the offered load up to saturation
  - The same goes for throughput and makespan
- Better metrics: slowdown, response time, wait time (see the sketch below)
- Beware of using the mean with asymmetric distributions
- Beware of inferring scalability from O(1) nodes
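For reference, the bounded slowdown used in the following examples is conventionally defined as in the sketch below; the threshold tau (10 seconds is a common choice) keeps very short jobs from dominating the metric.

```python
def bounded_slowdown(wait, runtime, tau=10.0):
    # Response time over run time, with the run time clamped from
    # below by `tau` so near-zero-length jobs do not explode the
    # metric; the result is also clamped to be at least 1.
    return max(1.0, (wait + runtime) / max(runtime, tau))
```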
18. Example: Bounded Slowdown
19. Example (continued)
20. Response Time
21. Bounded Slowdown
22. Step Five: Measurement
- Never measure saturated workloads
  - When the arrival rate is higher than the service rate, queues grow to infinity and all metrics become meaningless
  - But finding the saturation point can be tricky
- Discard warm-up and cool-down results (see the sketch below)
- May need to measure subgroups separately (long/short, day/night, weekday/weekend, ...)
- Measurements should still include enough data points for statistical meaning; workload length matters especially here
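A common way to discard warm-up and cool-down effects is simply to drop a fraction of jobs at each end of the run; the fractions below are illustrative, not the values used in this study.

```python
def steady_state(jobs_by_completion, warmup=0.1, cooldown=0.1):
    # Keep only the middle of the run: the first jobs see an empty
    # system and the last ones a draining system, so their metrics
    # do not reflect steady-state behavior.
    n = len(jobs_by_completion)
    return jobs_by_completion[int(n * warmup): n - int(n * cooldown)]
```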
23. Example: Saturation Point
24. Example: Shortest Jobs CDF
25. Example: Longest Jobs CDF
26. Conclusion
- Parallel job scheduling evaluation is complex
  - But we can avoid past mistakes
- The paper can be used as a checklist when designing and executing evaluations
- Additional information in the paper:
  - Pitfalls, examples, and scenarios
  - Suggestions on how to avoid the pitfalls
  - Open research questions (for the next JSSPP?)
  - Many references to positive examples
- Be cognizant when choosing your compromises
27. References
- Workload archive
  - http://www.cs.huji.ac.il/~feit/workload
  - Contains several workload traces and models
- Dror's publication page
  - http://www.cs.huji.ac.il/~feit/pub.html
- Eitan's publication page
  - http://www.cs.huji.ac.il/~etcs/pubs
- Email: eitanf@lanl.gov