Title: The Politics and Economics of Parallel Computing Performance
1. The Politics and Economics of Parallel Computing Performance
- Allan Snavely
- UCSD Computer Science Dept.
- SDSC
2. Computnik
- Not many of us (not even me) are old enough to remember Sputnik
- But recently U.S. technology received a similar shock
3. Japanese Earth Simulator
- The world's most powerful computer
4. Top500.org
- HIGHLIGHTS FROM THE TOP 10
- The Earth Simulator, built by NEC, remains the unchallenged #1, at > 30 TFlop/s
- The cost is conservatively $500M
5.
- ASCI Q at Los Alamos is #2 at 13.88 TFlop/s.
- The third system ever to exceed the 10 TFlop/s mark is Virginia Tech's X, measured at 10.28 TFlop/s. This cluster is built with the Apple G5 as building blocks and is often referred to as the 'SuperMac'.
- The fourth system is also a cluster. The Tungsten cluster at NCSA is a Dell PowerEdge-based system using a Myrinet interconnect. It just missed the 10 TFlop/s mark with a measured 9.82 TFlop/s.
6. More Top500
- The list of clusters in the TOP10 continues with the upgraded Itanium2-based Hewlett-Packard system, located at DOE's Pacific Northwest National Laboratory, which uses a Quadrics interconnect.
- #6 is the first system in the TOP500 based on AMD's Opteron chip. It was installed by Linux Networx at the Los Alamos National Laboratory and also uses a Myrinet interconnect.
- With the exception of the leading Earth Simulator, all other TOP10 systems are installed in the U.S.
- The performance of the #10 system jumped to 6.6 TFlop/s.
7. The fine print
- But how is performance measured?
- Linpack is very compute intensive and not very memory or communications intensive, and it scales perfectly!
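To make the "compute intensive" point concrete, here is a minimal sketch that compares the floating-point work of a dense solve with the size of the matrix it touches. It assumes the conventional Linpack operation count of 2/3 n^3 + 2 n^2 and 8-byte words; the function names are illustrative only, and flops per byte of matrix storage is just a crude proxy for compute intensity.

```python
def linpack_flops(n: int) -> float:
    """Operation count conventionally credited to solving a dense n x n system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def matrix_bytes(n: int, word_bytes: int = 8) -> int:
    """Storage for the n x n matrix, assuming 8-byte (double precision) words."""
    return n * n * word_bytes


def flops_per_byte(n: int) -> float:
    """Crude compute-intensity proxy: total flops divided by matrix storage."""
    return linpack_flops(n) / matrix_bytes(n)


if __name__ == "__main__":
    # Work grows as n^3 while data grows as n^2, so the ratio grows with n.
    for n in (1_000, 10_000, 100_000):
        print(n, round(flops_per_byte(n), 1))
```

Because the ratio grows roughly linearly with n, a large enough problem keeps the floating-point units busy while placing relatively little demand on memory or the interconnect.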
8. Axiom: You get what you ask for (or what you measure for)
- Measures of goodness
- Macho image
- Big gas tank
- Cargo space
- Drive it offroad
- Arnold drives one
- Measures of goodness
- Trendy Euro image
- Fuel efficiency
- Parking space
- Drive it on narrow streets
- Herr Schroeder drives one
9. HPC Users Forum and metrics
- From the beginning we dealt with
- Political issues
- You get what you ask for (Top500 Macho Flops)
- Policy makers need a number (Macho Flops)
- You measure what makes you look good (Macho Flops)
- Technical issues
- Recent reports (HECRTF, SCALES) echo our earlier consensus that time-to-solution (TTS) is the HPC metric
- But TTS is complicated and problem dependent (and policy makers need a number)
- Is it even technically feasible to encompass TTS in one or a few low-level metrics?
10. A science of performance
- A model is a calculable explanation of why a (program, application, input) tuple performs as it does
- Should yield a prediction (quantifiable objective)
- Accurate predictions of observable performance points give you some confidence in methods (for example, to allay fears of perturbation via intrusion)
- Performance models embody understanding of the factors that affect performance
- Inform the tuning process (of application and machine)
- Guide applications to the best machine
- Enable applications-driven architecture design
- Extrapolate to the performance of future systems
PMaC
11. Goals for performance modeling tools and methods
- Performance should map back to a small set of orthogonal benchmarks
- Generation of performance models should be automated, or at least as regular and systemized as possible
- Performance models must be time-tractable
- Error is acceptable if it is bounded and allows meeting these objectives
- Taking these principles to extremes would allow dynamic, automatic performance improvement via adaptation (this is open research)
12. A useful framework
- Machine Profiles: characterizations of the rates at which a machine can (or is projected to) carry out fundamental operations, abstract from the particular application
- Application Signature: detailed summaries of the fundamental operations to be carried out by the application, independent of any particular machine
- Combine Machine Profile and Application Signature using
- Convolution Methods: algebraic mappings of the Application Signatures onto the Machine Profiles to arrive at a performance prediction
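The framework can be illustrated with a minimal sketch, assuming the simplest possible convolution: a machine profile is a table of rates per operation type, an application signature is a table of operation counts, and the prediction is the sum of count divided by rate. The operation names and numbers below are hypothetical, and this is not the PMaC tools' actual interface.

```python
from typing import Dict

# Hypothetical representations (assumptions, not the PMaC data formats):
MachineProfile = Dict[str, float]        # operation type -> rate (operations per second)
ApplicationSignature = Dict[str, float]  # operation type -> number of operations


def convolve(signature: ApplicationSignature, profile: MachineProfile) -> float:
    """Map a signature onto a profile: predicted seconds = sum(count / rate)."""
    total_seconds = 0.0
    for op, count in signature.items():
        if op not in profile:
            raise KeyError(f"machine profile has no rate for operation {op!r}")
        total_seconds += count / profile[op]
    return total_seconds


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    machine_a = {"flop": 1e9, "mem_load": 5e8}
    app_1 = {"flop": 2e9, "mem_load": 1e9}
    print(convolve(app_1, machine_a))
```

Keeping the signature independent of the machine means the same signature can be convolved against many profiles, which is what allows one application to be compared across candidate machines.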
13. PMaC HPC Benchmark Suite
- The goal is to develop means to infer execution time of full applications at scale from low-level metrics taken on (smaller) prototype systems
- To do this in a systematic, even automated way
- To be able to compare apples and oranges
- To enable wide workload characterizations
- To keep number of metrics compact
- Add metrics only to increase resolution
- Go to web page www.sdsc.edu/PMaC
14. Machine Profiles: Single Processor Component (MAPS)
- Machine Profiles useful for
- revealing underlying capability of the machine
- comparing machines
- Machine Profiles produced by
- MAPS (Memory Access Pattern Signature) along with
the rest of the PMaC HPC Benchmark Suite is
available at www.sdsc.edu/PMaC
15. Convolutions put the two together: modeling deep memory hierarchies
MetaSim trace collected on PETSc Matrix-Vector code, 4 CPUs, with user-supplied memory parameters for PSC's TCSini
- Single-processor or per-processor performance
- Machine profile for processor (Machine A)
- Application Signature for application (App. 1)
- The relative per-processor performance of App. 1 on Machine A is represented as the MetaSim Number
16. MetaSim CPU events convolver: pick simple models to apply to each basic block
Output 5 different convolutions:
- Meta1 = Mem. time
- Meta2 = Mem. time + FP time
- Meta3 = MAX(Mem. time, FP time)
- Meta4 = 0.5 Mem. time + 0.5 FP time
- Meta5 = 0.9 Mem. time + 0.1 FP time
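As a minimal sketch of the five convolutions, the code below reads them as: memory time only, memory plus FP time, the maximum of the two, an equal 0.5/0.5 weighting, and a 0.9/0.1 weighting. It assumes the per-block memory and floating-point times are already known and that block totals are simply summed; the function names and sample numbers are hypothetical, not MetaSim's interface.

```python
from typing import Dict, Iterable, Tuple


def block_convolutions(mem_time: float, fp_time: float) -> Dict[str, float]:
    """Apply the five simple models to one basic block's memory and FP times."""
    return {
        "Meta1": mem_time,                        # memory time only
        "Meta2": mem_time + fp_time,              # no overlap of memory and FP work
        "Meta3": max(mem_time, fp_time),          # perfect overlap
        "Meta4": 0.5 * mem_time + 0.5 * fp_time,  # equal weighting
        "Meta5": 0.9 * mem_time + 0.1 * fp_time,  # memory-dominated weighting
    }


def predict_cpu_time(blocks: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Sum each model's per-block estimate over all (mem_time, fp_time) blocks."""
    totals = {f"Meta{i}": 0.0 for i in range(1, 6)}
    for mem_time, fp_time in blocks:
        for name, value in block_convolutions(mem_time, fp_time).items():
            totals[name] += value
    return totals


if __name__ == "__main__":
    # Made-up per-block times in seconds, purely for illustration.
    hypothetical_blocks = [(2.0, 1.0), (0.5, 3.0)]
    for name, seconds in predict_cpu_time(hypothetical_blocks).items():
        print(name, round(seconds, 3))
```

Producing all five estimates side by side makes it possible to see which overlap assumption best matches measured run time for a given application.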
17. Dimemas communications events convolver: simple communication models applied to each communication event
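The slide does not say which communication model is applied. As one illustration, the sketch below assumes a linear model in which each message costs a fixed latency plus its size divided by bandwidth. The function names and parameters are hypothetical and do not represent the Dimemas interface.

```python
from typing import Iterable


def message_time(num_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Assumed linear model: time = latency + size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + num_bytes / bandwidth_bytes_per_s


def total_comm_time(message_sizes: Iterable[float], latency_s: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Apply the model to every communication event and sum the results."""
    return sum(message_time(n, latency_s, bandwidth_bytes_per_s) for n in message_sizes)


if __name__ == "__main__":
    # Made-up trace of message sizes (bytes) and made-up network parameters.
    sizes = [8, 1_024, 1_000_000]
    print(total_comm_time(sizes, latency_s=1e-5, bandwidth_bytes_per_s=1e8))
```

Under this assumed model, small messages are dominated by the latency term and large messages by the bandwidth term, so a trace's mix of message sizes determines which network parameter matters more.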
18. POP results graphically
- Seconds per simulation day
19. Quality of model predictions for POP
20. Explaining Relative Performance of POP
21. POP Performance Sensitivity
[Figure axes: 1/Execution Time, Latency Performance (normalized), BW Performance (normalized)]
22. Practical uses
- DoD HPCMO procurement cycle
- Identify strategic applications
- Identify candidate machines
- Run PMaC HPC Benchmark Probes on (prototypes of) machines
- Use tools to model applications on exemplary inputs
- Generate performance expectations
- Input to solver that factors in performance, cost, architectural diversity, whim of program director?
- DARPA HPCS program
- Help vendors evaluate performance impacts of proposed architectural features
23. Acknowledgments
- This work was sponsored in part by the Department of Energy Office of Science through SciDAC award "High-End Computer System Performance: Science and Engineering". This work was sponsored in part by the Department of Defense High Performance Computing Modernization Program office through award "HPC Applications Benchmarking". This research was sponsored in part by DARPA through award "HEC Metrics". This research was supported in part by NSF cooperative agreement ACI-9619020 through computing resources provided by the National Partnership for Advanced Computational Infrastructure at the San Diego Supercomputer Center. Computer time was provided by the Pittsburgh Supercomputer Center, the Texas Advanced Computing Center, Oak Ridge National Laboratory, and ERDC. We would like to thank Francesc Escale of CEPBA for all his help with Dimemas, and Pat Worley for all his help with POP.