Title: Benchmarking Working Group Session Agenda
1. Benchmarking Working Group Session Agenda
1:00-1:15   David Koester    What Makes HPC Applications Challenging?
1:15-1:30   Piotr Luszczek   HPCchallenge Challenges
1:30-1:45   Fred Tracy       Algorithm Comparisons of Application Benchmarks
1:45-2:00   Henry Newman     I/O Challenges
2:00-2:15   Phil Colella     The Seven Dwarfs
2:15-2:30   Glenn Luecke     Run-Time Error Detection Benchmark
2:30-3:00   Break
3:00-3:15   Bill Mann        SSCA 1 Draft Specification
3:15-3:30   Theresa Meuse    SSCA 6 Draft Specification
3:30-??     Discussions      User Needs
                             HPCS Vendor Needs for the MS4 Review
                             HPCS Vendor Needs for the MS5 Review
                             HPCS Productivity Team Working Groups
2. What Makes HPC Applications Challenging?
- David Koester, Ph.D.
- HPCS Productivity Team Meeting, 11-13 January 2005, Marina del Rey, CA
3. Outline
- HPCS Benchmark Spectrum
- What Makes HPC Applications Challenging?
- Memory access patterns/locality
- Processor characteristics
- Concurrency
- I/O characteristics
- What new challenges will arise from Petascale/s applications?
- Bottleneckology
- Amdahl's Law
- Example Random Stride Memory Access
- Summary
4. HPCS Benchmark Spectrum
5. HPCS Benchmark Spectrum
- What Makes HPC Applications Challenging?
- Full applications may be challenging due to:
  - Killer Kernels
  - Global data layouts
  - Input/Output
- Killer Kernels are challenging because of many things that link directly to architecture
- Identify bottlenecks by mapping applications to architectures
6. What Makes HPC Applications Challenging?
- Memory access patterns/locality
  - Spatial and temporal
  - Indirect addressing
  - Data dependencies
- Processor characteristics
  - Processor throughput (instructions per cycle)
  - Low arithmetic density
  - Floating point versus integer
  - Special features
    - GF(2) math
    - Popcount
    - Integer division
- Concurrency
  - Ubiquitous for Petascale/s
  - Load balance
- I/O characteristics
  - Bandwidth
  - Latency
  - File access patterns
[Diagram: the challenges above mapped to Killer Kernels, Global Data Layouts, and Input/Output]
7. Cray Parallel Performance Killer Kernels

Kernel                               Performance Characteristic
RandomAccess                         High demand on remote memory; no locality
3D FFT                               Non-unit strides; high bandwidth demand
Sparse matrix-vector multiply        Irregular, unpredictable locality
Adaptive mesh refinement             Dynamic data distribution; dynamic parallelism
Multi-frontal method                 Multiple levels of parallelism
Sparse incomplete factorization      Amdahl's Law bottlenecks
Preconditioned domain decomposition  Frequent large messages
Triangular solver                    Frequent small messages; poor ratio of computation to communication
Branch-and-bound algorithm           Frequent broadcast synchronization
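The RandomAccess row is the archetype of "no locality." A minimal GUPS-style update loop sketches why caches cannot help (a sketch only, not the official HPCchallenge code; the `random_access_updates` helper, table size, and update count are illustrative):

```python
# GUPS-style (RandomAccess) sketch: XOR updates at pseudo-random
# table locations, so there is no spatial or temporal locality
# for caches or prefetchers to exploit.
import random

def random_access_updates(table_bits=16, n_updates=1 << 18, seed=1):
    """Apply XOR updates at random indices of a 2**table_bits table."""
    size = 1 << table_bits
    table = list(range(size))      # T[i] = i initially
    rng = random.Random(seed)
    for _ in range(n_updates):
        val = rng.getrandbits(64)
        idx = val & (size - 1)     # random index into the table
        table[idx] ^= val          # the "update": T[idx] ^= val
    return table
```

On real hardware this loop is bound by memory latency rather than arithmetic, which is exactly the point of the table above.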
8. Killer Kernels: Phil Colella's Seven Dwarfs
9. Mission Partner Applications: Memory Access Patterns/Locality
[Figure: HPCS challenge points vs. HPCchallenge benchmarks]
- How do mission partner applications relate to the HPCS spatial/temporal view of memory?
  - Kernels?
  - Full applications?
10. Processor Characteristics: Special Features
- Comparison of similar-speed MIPS processors with and without:
  - GF(2) math
  - Popcount
- Similar or better performance reported using Alpha processors (Jack Collins, NCIFCRF)
- Codes:
  - Cray-supplied library
  - The Portable Cray Bioinformatics Library by ARSC
- References:
  - http://www.cray.com/downloads/biolib.pdf
  - http://cbl.sourceforge.net/
Algorithmic speedup of 120x
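As a sketch of what these two special features compute, here are pure-software stand-ins (illustrative only; the `popcount` and `gf2_mul` helpers below are not the Cray biolib API, and hardware support for these operations is what produced the reported speedup):

```python
# Software stand-ins for the two "special features" above:
# popcount (count of set bits) and GF(2) carryless multiplication.

def popcount(x: int) -> int:
    """Number of 1 bits in x (in hardware: a single POPCNT instruction)."""
    return bin(x).count("1")

def gf2_mul(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) multiplication: XOR replaces addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add (XOR) the shifted partial product
        a <<= 1
        b >>= 1
    return result

# Example: popcount(0b1011) == 3, and gf2_mul(0b11, 0b11) == 0b101,
# i.e. (x+1)*(x+1) = x^2 + 1 over GF(2) because the cross terms cancel.
```

Bit-level kernels like sequence matching call these operations in inner loops, which is why a hardware instruction for each can translate into large end-to-end speedups.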
11. Concurrency
Insert Cluttered VAMPIR Plot here
12. I/O: Relative Data Latency
Note: 11 orders of magnitude relative differences!
Henry Newman (Instrumental)
13. I/O: Relative Data Bandwidth per CPU
Note: 5 orders of magnitude relative differences!
Henry Newman (Instrumental)
14. Strawman HPCS I/O Goals/Challenges
- 1 trillion files in a single file system
- 32K file creates per second
- 10K metadata operations per second
  - Needed for checkpoint/restart files
- Streaming I/O at 30 GB/sec full duplex
  - Needed for data capture
- Support for 30K nodes
- Future file systems need low-latency communication

An envelope on HPCS Mission Partner requirements
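A back-of-envelope check puts the first two goals in perspective (illustrative arithmetic only; the constants are taken from the list above):

```python
# How long would it take to create 1 trillion files at the
# strawman target rate of 32K file creates per second?

FILES = 10**12          # 1 trillion files in a single file system
CREATE_RATE = 32_000    # file creates per second (strawman goal)

seconds = FILES / CREATE_RATE      # 31,250,000 s
days = seconds / 86_400            # ~362 days of continuous creates
```

Roughly a year of non-stop creates to populate the namespace, which underlines that these goals are an envelope on requirements, not a single workload.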
15. HPCS Benchmark Spectrum: Future and Emerging Applications
- Identifying HPCS Mission Partner efforts
  - 10-20K processor, 10-100 Teraflop/s scale applications
  - 20-120K processor, 100-300 Teraflop/s scale applications
  - Petascale/s applications
  - Applications beyond Petascale/s
- LACSI Workshop: The Path to Extreme Supercomputing, 12 October 2004
  - http://www.zettaflops.org
- What new challenges will arise from Petascale/s applications?
16. Outline
- HPCS Benchmark Spectrum
- What Makes HPC Applications Challenging?
- Memory access patterns/locality
- Processor characteristics
- Parallelism
- I/O characteristics
- What new challenges will arise from Petascale/s applications?
- Bottleneckology
- Amdahl's Law
- Example Random Stride Memory Access
- Summary
17. Bottleneckology
- Where is performance lost when an application is run on an architecture?
- When does it make sense to invest in architecture to improve application performance?
- System analysis driven by an extended Amdahl's Law
  - Amdahl's Law is not just about parallel and sequential parts of applications!
- References:
  - Jack Worlton, "Project Bottleneck: A Proposed Toolkit for Evaluating Newly-Announced High Performance Computers", Worlton and Associates, Los Alamos, NM, Technical Report No. 13, January 1988
  - Montek Singh, Lecture Notes: Computer Architecture and Implementation, COMP 206, Dept. of Computer Science, Univ. of North Carolina at Chapel Hill, Aug 30, 2004, www.cs.unc.edu/montek/teaching/fall-04/lectures/lecture-2.ppt
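The extended form alluded to above treats an application as a mix of regimes, each running at its own rate, and takes the weighted harmonic mean. A minimal sketch (the `effective_rate` helper and the example numbers are illustrative, not from the cited references):

```python
# Extended Amdahl's Law applied to rates: if fraction f_i of the
# work runs at rate r_i (e.g. MB/s), the effective rate is the
# weighted harmonic mean  R = 1 / sum(f_i / r_i).
# This generalizes beyond "parallel vs. sequential" to any mix
# of fast and slow regimes (memory, I/O, network, ...).

def effective_rate(fractions, rates):
    """Weighted harmonic mean of per-regime rates."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    return 1.0 / sum(f / r for f, r in zip(fractions, rates))

# Illustrative: 90% of work at 10000 MB/s, 10% at 100 MB/s.
bw = effective_rate([0.9, 0.1], [10000.0, 100.0])
# ~917 MB/s: the slow 10% dominates the overall rate.
```

This is the "bottleneckology" message in one line: a small slow fraction, not the fast majority, sets the effective rate.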
18. Lecture Notes: Computer Architecture and Implementation (5)
Montek Singh (UNC)
19. Lecture Notes: Computer Architecture and Implementation (6)
Montek Singh (UNC)
20. Lecture Notes: Computer Architecture and Implementation (7)
Also works for Rate (Bandwidth)!
Montek Singh (UNC)
21. Lecture Notes: Computer Architecture and Implementation (8)
Montek Singh (UNC)
22. Bottleneck Example (1)
- Combine stride-1 and random-stride memory access
  - 25% random-stride access
  - 33% random-stride access
- Memory bandwidth performance is dominated by the random-stride memory access

SDSC MAPS on an IBM SP-3
23. Bottleneck Example (2)
- Combine stride-1 and random-stride memory access
  - 25% random-stride access
  - 33% random-stride access
- Memory bandwidth performance is dominated by the random-stride memory access

SDSC MAPS on a COMPAQ AlphaServer
7000 / (7 × 0.25 + 0.75) = 2800 MB/s (Amdahl's Law)
24. Bottleneck Example (2)
- Combine stride-1 and random-stride memory access
  - 25% random-stride access
  - 33% random-stride access
- Memory bandwidth performance is dominated by the random-stride memory access
- Some HPCS Mission Partner applications have:
  - Extensive random-stride memory access
  - Some random-stride memory access
- However, even a small amount of random memory access can cause significant bottlenecks!

SDSC MAPS on a COMPAQ AlphaServer
7000 / (7 × 0.25 + 0.75) = 2800 MB/s (Amdahl's Law)
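The slide's estimate can be reproduced directly. Note the 7x random-stride slowdown is an assumption inferred from the arithmetic shown, not a measured constant:

```python
# Reproducing the slide's Amdahl's-Law estimate: stride-1 bandwidth
# of 7000 MB/s, random-stride accesses assumed ~7x slower, and 25%
# of accesses random. Relative time per unit of work is
# 7 * 0.25 + 0.75 = 2.5, so effective bandwidth = 7000 / 2.5.

STRIDE1_BW = 7000.0   # MB/s, stride-1 (from the MAPS measurement)
SLOWDOWN = 7.0        # random stride ~7x slower (assumed from the slide)
F_RANDOM = 0.25       # fraction of accesses with random stride

effective_bw = STRIDE1_BW / (SLOWDOWN * F_RANDOM + (1.0 - F_RANDOM))
# effective_bw == 2800.0 MB/s: 25% random access cuts bandwidth by 60%.
```

Varying `F_RANDOM` shows the summary's point: even a small random fraction drags the effective bandwidth far below the stride-1 peak.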
25. Outline
- HPCS Benchmark Spectrum
- What Makes HPC Applications Challenging?
- Memory access patterns/locality
- Processor characteristics
- Parallelism
- I/O characteristics
- What new challenges will arise from Petascale/s applications?
- Bottleneckology
- Amdahl's Law
- Example Random Stride Memory Access
- Summary
26. Summary (1): What Makes Applications Challenging
- Memory access patterns/locality
  - Spatial and temporal
  - Indirect addressing
  - Data dependencies
- Processor characteristics
  - Processor throughput (instructions per cycle)
  - Low arithmetic density
  - Floating point versus integer
  - Special features
    - GF(2) math
    - Popcount
    - Integer division
- Parallelism
  - Ubiquitous for Petascale/s
  - Load balance
- I/O characteristics
  - Bandwidth
  - Latency
  - File access patterns
- Expand this list as required
- Work toward consensus with:
  - HPCS Mission Partners
  - HPCS Vendors
- Understand bottlenecks:
  - Characterize applications
  - Characterize architectures
27. HPCS Benchmark Spectrum
- What Makes HPC Applications Challenging?
- Full applications may be challenging due to:
  - Killer Kernels
  - Global data layouts
  - Input/Output
- Killer Kernels are challenging because of many things that link directly to architecture
- Identify bottlenecks by mapping applications to architectures

Impress upon the HPCS community the need to identify what makes the application challenging when using an existing Mission Partner application for a systems analysis in the MS4 review.