Title: Tools for Engineering Analysis of High Performance Parallel Programs
1Tools for Engineering Analysis of High
Performance Parallel Programs
- David Culler,
- Frederick Wong, Alan Mainwaring
- Computer Science Division
- U.C.Berkeley
- http://www.cs.berkeley.edu/culler/talks
2Traditional Parallel Programming Tools
- Focus on showing what the program did and when it did it
- microscopic analysis of deterministic events
- oriented towards initial development of small programs on small data sets and small machines
- Instrumentation
- traces, counters, profiles
- Visualization
- Examples
- AIMS, PTOOLS, PPP
- Pablo, Paradyn, ... => Delphi
- ACTS TAU - tuning and analysis utilities
3Example: Pablo
4Beyond Zeroth-order Analysis
- Basic level: get to a system design that is reasonable and behaves properly under ideal conditions
- Subject the system to various stresses to understand its operating regime and gain deeper insight into its dynamic behavior
- Combine empirical data with analytical models
- Iterate
- from "What?" to "What if?"
[Figure: max displacement vs. wind speed]
5Approach: Framework for Parameterized Sensitivity
Analysis
- framework performs analysis over numerous runs
- statistical filtering
- vary parameter of interest
- provides means of combining data to isolate effects of interest
- => ROBUSTNESS
Problem Data Set Generator
Well-developed Parallel Program
Instrumentation Tools
Study Parameter
Machine Characterizers
- Procs
- Comm. perf.
- Cache
- Scheduling
- ...
visualization, modeling
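The loop such a framework implies (vary one study parameter, repeat each point, filter the samples) can be sketched as below. This is a minimal illustration only, not the authors' tool: `sweep`, `run_once`, and `make_fake_run` are hypothetical names, the timing function is synthetic, and the median is an assumed choice of statistical filter.

```python
import statistics
from typing import Callable, Dict, Iterable, List


def sweep(run_once: Callable[[int], float],
          values: Iterable[int],
          repeats: int = 5) -> Dict[int, float]:
    """Run `run_once` `repeats` times per parameter value; keep the median.

    The median is an assumed filter: it discards occasional outliers
    (for example a run disturbed by a daemon) without a hand-tuned threshold.
    """
    results: Dict[int, float] = {}
    for v in values:
        samples: List[float] = [run_once(v) for _ in range(repeats)]
        results[v] = statistics.median(samples)
    return results


def make_fake_run() -> Callable[[int], float]:
    """Hypothetical stand-in for launching the real program on p processors."""
    calls = {"n": 0}

    def fake_run(p: int) -> float:
        calls["n"] += 1
        base = 100.0 / p + 0.5 * p                     # work shrinks, overhead grows
        spike = 50.0 if calls["n"] % 5 == 0 else 0.0   # every 5th run is perturbed
        return base + spike

    return fake_run


if __name__ == "__main__":
    for p, t in sweep(make_fake_run(), [1, 2, 4, 8]).items():
        print(f"P={p:2d}  median time={t:.2f}")
```

In a real study `run_once` would launch the instrumented program with the chosen parameter value (processor count, cache size, scheduling policy) and return a measured quantity.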
6Simplest Example: Performance( P )
- NPB2.2 on NOW and Origin 2000 (250)
7Where Time is Spent ( P )
- Reveal basic Processor and network loading (vs P)
- Basis for model derivation - comm(P)
8Where Time is Spent ( P ) - cont
- Reveal basic Processor and network loading (vs P)
9Communication Volume ( P )
10Communication Structure ( P )
11Understanding Efficiency ( P, M )
- Want to understand both what load the program is placing on the system
- and how well the system is handling that load
- => characterize the capability of the system via simple benchmarks (rather than advertised peaks)
- => combine with measured load for predictive model, compare
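One way to combine benchmarked capability with measured load is a simple cost model. The sketch below assumes a linear model (a fixed cost per message plus bytes divided by sustained bandwidth); the slides do not state which model was used, and `MachineProfile`, the function names, and every number here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MachineProfile:
    """Capabilities measured with simple benchmarks (hypothetical values)."""
    per_msg_cost_s: float      # measured time per message, in seconds
    bandwidth_bytes_s: float   # measured sustained bandwidth, in bytes/second


def predicted_comm_time(m: MachineProfile, n_msgs: int, n_bytes: int) -> float:
    """Assumed linear model: fixed cost per message plus transfer time."""
    return n_msgs * m.per_msg_cost_s + n_bytes / m.bandwidth_bytes_s


def comm_efficiency(predicted_s: float, measured_s: float) -> float:
    """Fraction of the measured communication time the model accounts for."""
    return predicted_s / measured_s


if __name__ == "__main__":
    machine = MachineProfile(per_msg_cost_s=20e-6, bandwidth_bytes_s=40e6)
    pred = predicted_comm_time(machine, n_msgs=10_000, n_bytes=80_000_000)
    print(f"predicted {pred:.2f} s, efficiency {comm_efficiency(pred, 4.4):.2f}")
```

An efficiency well below 1 under this model would suggest the system is handling the load worse than its benchmarked capability predicts, which is the comparison the slide calls for.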
12Communication Efficiency
13Tools => Improvements in Run Time
- Efficiency analysis (vs parameters) gives insight into where to improve the system or the program
- use traditional profiling to see where in the program the bad stuff happens
- or go back and tune the system to do better
14Cache Behavior (P, )
- Combining trace generation with simulation provides new structural insight
- Here: clear knees in program working set (); these shift with machine size (P)
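Trace-driven simulation of this kind can be sketched as follows. The example assumes a fully associative LRU cache and a synthetic cyclic trace; neither is taken from the slides, and a real study would feed in address traces from the instrumented program. `lru_miss_rate` and `cyclic_trace` are hypothetical names.

```python
from collections import OrderedDict
from typing import Iterable, List


def lru_miss_rate(trace: Iterable[int], capacity_lines: int) -> float:
    """Miss rate of a fully associative LRU cache holding `capacity_lines`."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    total = 0
    for line in trace:
        total += 1
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / total if total else 0.0


def cyclic_trace(working_set: int, passes: int) -> List[int]:
    """Hypothetical trace: sweep the same `working_set` lines repeatedly."""
    return [i for _ in range(passes) for i in range(working_set)]


if __name__ == "__main__":
    trace = cyclic_trace(working_set=64, passes=10)
    for size in (16, 32, 64, 128):
        print(f"cache={size:4d} lines  miss rate={lru_miss_rate(trace, size):.2f}")
```

Sweeping the cache size over one trace is what exposes a knee: the miss rate drops sharply once the cache holds the whole working set. Repeating the sweep for traces taken at different P would show whether the knee moves with machine size.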
15Cache Behavior (P, )
- Clear knees in program working set () not
affected by P
16Sensitivity to Multiprogramming
- Parallel machines are increasingly general purpose
- multiprogramming, at least interrupts and daemons
- Many ideal programs are very sensitive to perturbations
- Msg Passing is loosely coupled, but the implementation may not be!
17Tools => Improvements in Run Time
- MPI implementation spin-waits on send until the network is available (or the queue is not full), or on recv-complete
- Should use two-phase spin-block
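The control flow of two-phase spin-block waiting can be sketched as below. This illustrates the general technique only and is not code from any MPI implementation: `wait_two_phase`, the 1 ms spin budget, and the use of `threading.Event` as a stand-in for "message arrived" are all assumptions.

```python
import threading
import time
from typing import Optional


def wait_two_phase(done: threading.Event,
                   spin_budget_s: float = 0.001,
                   block_timeout_s: Optional[float] = None) -> str:
    """Wait for `done`; return which phase observed it.

    Phase 1 polls for at most `spin_budget_s`, which is cheap when the event
    is about to fire. Phase 2 blocks, releasing the processor to other work.
    """
    deadline = time.perf_counter() + spin_budget_s
    while time.perf_counter() < deadline:
        if done.is_set():
            return "spin"
    if done.wait(block_timeout_s):
        return "block"
    return "timeout"


if __name__ == "__main__":
    ready = threading.Event()
    # Hypothetical "message arrival" 50 ms from now, far beyond the spin budget.
    threading.Timer(0.05, ready.set).start()
    print(wait_two_phase(ready))
```

The spin budget bounds how long a waiting process can hold the processor, so a co-scheduled job or daemon costs at most one budget of wasted spinning instead of a full time slice.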
18Sensitivity to Seemingly Unrelated Activity
- The mechanism for doing parameter studies is naturally extended to get statistically valid data through multiple samples at each point
- tend to get crisp, fast results in the wee hours
- Extend study outside the app
- Example: two programs on big Origin
- run                    alone        together on 64 P
- 8 processor IS run     4.71 sec     6.18 sec
- 36 processor SP run    26.36 sec    65.28 sec
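The size of the interference is easier to see as a ratio. The short calculation below uses the four run times listed above and assumes that the second figure on each row is the "together" time in seconds; `slowdown` is a hypothetical helper name.

```python
# Run times in seconds, copied from the slide: (alone, together on 64 P).
RUNS = {
    "8 processor IS": (4.71, 6.18),
    "36 processor SP": (26.36, 65.28),
}


def slowdown(alone_s: float, together_s: float) -> float:
    """Ratio of the co-scheduled run time to the standalone run time."""
    return together_s / alone_s


if __name__ == "__main__":
    for name, (alone, together) in RUNS.items():
        print(f"{name}: {slowdown(alone, together):.2f}x")
```

Under that reading, IS slows by roughly a factor of 1.3 and SP by roughly a factor of 2.5 when the two programs share the machine.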
19Repeatability
- The variance for the repeated runs is a key result for production codes
- the real world is not ideal
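Reporting that variance takes only a few lines. The sketch below computes the mean, sample standard deviation, and coefficient of variation for a set of repeated run times; `repeatability` is a hypothetical name and the sample times are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def repeatability(times_s: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(times_s)
    stdev = statistics.stdev(times_s)   # requires at least two samples
    return mean, stdev, stdev / mean


if __name__ == "__main__":
    # Hypothetical run times in seconds for one configuration.
    mean, stdev, cv = repeatability([10.0, 12.0, 14.0])
    print(f"mean={mean:.2f} s  stdev={stdev:.2f} s  cv={cv:.1%}")
```

The coefficient of variation is unitless, so it allows repeatability to be compared across runs of very different lengths.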
20Plans
- Integrate our instrumentation and analysis tools with ACTS TAU
- port to UCB Millennium environment
- experiment with ASCI platforms
- Refine and complete the automated sensitivity analysis framework
- Backend performance data storage
- Pablo SPPF?
- Next Year
- integrate performance model development, prediction