1
Benchmarking Working Group: Session Agenda
1:00-1:15  David Koester    What Makes HPC Applications Challenging?
1:15-1:30  Piotr Luszczek   HPCchallenge Challenges
1:30-1:45  Fred Tracy       Algorithm Comparisons of Application Benchmarks
1:45-2:00  Henry Newman     I/O Challenges
2:00-2:15  Phil Colella     The Seven Dwarfs
2:15-2:30  Glenn Luecke     Run-Time Error Detection Benchmark
2:30-3:00  Break
3:00-3:15  Bill Mann        SSCA 1 Draft Specification
3:15-3:30  Theresa Meuse    SSCA 6 Draft Specification
3:30-??    Discussions      User Needs
HPCS Vendor Needs for the MS4 Review
HPCS Vendor Needs for the MS5 Review
HPCS Productivity Team Working Groups
2
What Makes HPC Applications Challenging?
  • David Koester, Ph.D.
  • 11-13 January 2005, HPCS Productivity Team
    Meeting, Marina Del Rey, CA

3
Outline
  • HPCS Benchmark Spectrum
  • What Makes HPC Applications Challenging?
  • Memory access patterns/locality
  • Processor characteristics
  • Concurrency
  • I/O characteristics
  • What new challenges will arise from Petascale/s
    applications?
  • Bottleneckology
  • Amdahl's Law
  • Example: Random Stride Memory Access
  • Summary

4
HPCS Benchmark Spectrum
5
HPCS Benchmark Spectrum
  • What Makes HPC Applications Challenging?
  • Full applications may be challenging due to
  • Killer Kernels
  • Global data layouts
  • Input/Output
  • Killer Kernels are challenging because of many
    things that link directly to architecture
  • Identify bottlenecks by mapping applications to
    architectures

6
What Makes HPC Applications Challenging?
  • Memory access patterns/locality
  • Spatial and Temporal
  • Indirect addressing
  • Data dependencies
  • Processor characteristics
  • Processor throughput (Instructions per cycle)
  • Low arithmetic density
  • Floating point versus integer
  • Special features
  • GF(2) math
  • Popcount
  • Integer division
  • Concurrency
  • Ubiquitous for Petascale/s
  • Load balance
  • I/O characteristics
  • Bandwidth
  • Latency
  • File access patterns
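The "special features" listed above can be emulated in software when hardware support is absent. Below is a minimal Python sketch, purely for illustration; the GF(2^8) reduction polynomial used is the AES polynomial, chosen as a familiar example rather than taken from the slides:

```python
# Software emulation of two "special feature" operations that some
# processors (e.g. Cray systems with popcount hardware) accelerate.

def popcount(x: int) -> int:
    """Count set bits (population count); a single instruction on some CPUs."""
    return bin(x).count("1")

def gf2_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Carry-less multiply in GF(2^8), reduced by an irreducible polynomial.
    0x11B is the AES polynomial x^8 + x^4 + x^3 + x + 1, used here only
    as an example."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly          # reduce modulo the polynomial
    return result

print(popcount(0b1011))      # 3
print(gf2_mul(0x53, 0xCA))   # 1 (0x53 and 0xCA are inverses in the AES field)
```

Bit-serial loops like these are exactly why hardware popcount and GF(2) support matter for the codes discussed on slide 10.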

(Diagram labels: Killer Kernels, Global Data Layouts, Input/Output)
7
Cray Parallel Performance: Killer Kernels

Kernel                              | Performance Characteristic
RandomAccess                        | High demand on remote memory; no locality
3D FFT                              | Non-unit strides; high bandwidth demand
Sparse matrix-vector multiply       | Irregular, unpredictable locality
Adaptive mesh refinement            | Dynamic data distribution; dynamic parallelism
Multi-frontal method                | Multiple levels of parallelism
Sparse incomplete factorization     | Amdahl's Law bottlenecks
Preconditioned domain decomposition | Frequent large messages
Triangular solver                   | Frequent small messages; poor computation-to-communication ratio
Branch-and-bound algorithm          | Frequent broadcast synchronization
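The RandomAccess row can be made concrete with a small sketch. This is not the official HPCchallenge RandomAccess code; it is a hypothetical GUPS-style loop in Python (table size, update count, and function name are invented) showing the locality-free update pattern the kernel stresses:

```python
import random

def random_access_updates(table_bits: int = 16,
                          n_updates: int = 100_000,
                          seed: int = 1):
    """GUPS-style kernel sketch: XOR pseudo-random values into
    pseudo-randomly chosen table entries. Every update touches an
    unpredictable location, defeating caches and prefetchers."""
    size = 1 << table_bits
    table = list(range(size))            # table[i] initialised to i
    rng = random.Random(seed)
    for _ in range(n_updates):
        r = rng.getrandbits(64)          # pseudo-random stream
        table[r & (size - 1)] ^= r       # scattered read-modify-write
    return table
```

On a distributed machine each update may land on a remote node's memory, which is why the table lists "high demand on remote memory" as the defining characteristic.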
8
Killer Kernels: Phil Colella, The Seven Dwarfs
9
Mission Partner Applications
Memory Access Patterns/Locality
HPCS Challenge Points HPCchallenge Benchmarks
  • How do mission partner applications relate to
    HPCS spatial/temporal view of memory?
  • Kernels?
  • Full applications?

10
Processor Characteristics: Special Features
  • Comparison of similar-speed MIPS processors with
    and without:
  • GF(2) math
  • Popcount
  • Similar or better performance reported using
    Alpha processors (Jack Collins (NCIFCRF))
  • Codes
  • Cray-supplied library
  • The Portable Cray Bioinformatics Library by ARSC
  • References
  • http://www.cray.com/downloads/biolib.pdf
  • http://cbl.sourceforge.net/

Algorithmic speedup of 120x
11
Concurrency
Insert Cluttered VAMPIR Plot here
12
I/O Relative Data Latency
Note: 11 orders of magnitude relative differences!
Henry Newman (Instrumental)
13
I/O Relative Data Bandwidth per CPU
Note: 5 orders of magnitude relative differences!
Henry Newman (Instrumental)
14
Strawman HPCS I/O Goals/Challenges
  • 1 Trillion files in a single file system
  • 32K file creates per second
  • 10K metadata operations per second
  • Needed for Checkpoint/Restart files
  • Streaming I/O at 30 GB/sec full duplex
  • Needed for data capture
  • Support for 30K nodes
  • Future file systems need low-latency communication

An envelope on HPCS Mission Partner requirements
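The metadata-rate goals above can be probed, at toy scale, with a local micro-benchmark. The sketch below is purely illustrative (the function name and parameters are invented); a single-node rate says nothing about a 30K-node parallel file system, it only shows the quantity behind the 32K-creates-per-second goal:

```python
import os
import tempfile
import time

def file_creates_per_second(n_files: int = 1000) -> float:
    """Measure file-create throughput on the local file system.
    Each open+close is one create, i.e. one metadata operation."""
    with tempfile.TemporaryDirectory() as d:
        start = time.perf_counter()
        for i in range(n_files):
            open(os.path.join(d, f"f{i}"), "w").close()
        elapsed = time.perf_counter() - start
    return n_files / elapsed

print(f"{file_creates_per_second(500):.0f} creates/sec")
```

At HPCS scale the same measurement would have to be driven concurrently from tens of thousands of nodes, which is where the metadata server becomes the bottleneck.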
15
HPCS Benchmark Spectrum Future and Emerging
Applications
  • Identifying HPCS Mission Partner efforts
  • 10-20K-processor, 10-100 Teraflop/s-scale
    applications
  • 20-120K-processor, 100-300 Teraflop/s-scale
    applications
  • Petascale/s applications
  • Applications beyond Petascale/s
  • LACSI Workshop: The Path to Extreme
    Supercomputing
  • 12 October 2004
  • http://www.zettaflops.org
  • What new challenges will arise from Petascale/s
    applications?

16
Outline
  • HPCS Benchmark Spectrum
  • What Makes HPC Applications Challenging?
  • Memory access patterns/locality
  • Processor characteristics
  • Parallelism
  • I/O characteristics
  • What new challenges will arise from Petascale/s
    applications?
  • Bottleneckology
  • Amdahl's Law
  • Example: Random Stride Memory Access
  • Summary

17
Bottleneckology
  • Bottleneckology
  • Where is performance lost when an application is
    run on an architecture?
  • When does it make sense to invest in architecture
    to improve application performance?
  • System analysis driven by an extended Amdahl's
    Law
  • Amdahl's Law is not just about parallel and
    sequential parts of applications!
  • References
  • Jack Worlton, "Project Bottleneck: A Proposed
    Toolkit for Evaluating Newly-Announced High
    Performance Computers", Worlton and Associates,
    Los Alamos, NM, Technical Report No. 13, January
    1988
  • Montek Singh, Lecture Notes: Computer
    Architecture and Implementation, COMP 206, Dept.
    of Computer Science, Univ. of North Carolina at
    Chapel Hill, Aug 30, 2004,
    www.cs.unc.edu/montek/teaching/fall-04/lectures/lecture-2.ppt
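In the extended form used for bottleneck analysis, Amdahl's Law says that if a fraction f_i of the work proceeds at rate r_i, the overall rate is the weighted harmonic mean 1 / sum(f_i / r_i); the familiar serial/parallel speedup is the two-term special case. A small sketch (the helper name is invented):

```python
def overall_rate(fractions, rates):
    """Extended Amdahl's Law: weighted harmonic mean of component rates.
    fractions[i] of the work runs at relative rate rates[i]."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    return 1.0 / sum(f / r for f, r in zip(fractions, rates))

# Classic two-term case: 10% serial work caps speedup near 10x
# no matter how many processors the parallel part uses.
speedup = overall_rate([0.10, 0.90], [1.0, 1000])
print(round(speedup, 2))  # 9.91
```

The same formula applies to any resource, not just processors: substitute memory or I/O bandwidths for the rates and the slowest-weighted component dominates, which is the point of "bottleneckology".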

18
Lecture Notes: Computer Architecture and
Implementation (5)
Montek Singh (UNC)
19
Lecture Notes: Computer Architecture and
Implementation (6)
Montek Singh (UNC)
20
Lecture Notes: Computer Architecture and
Implementation (7)
Also works for Rate Bandwidth!
Montek Singh (UNC)
21
Lecture Notes: Computer Architecture and
Implementation (8)
Montek Singh (UNC)
22
Bottleneck Example (1)
  • Combine stride-1 and random stride memory access
  • 25% random stride access
  • 33% random stride access
  • Memory bandwidth performance is dominated by the
    random stride memory access

SDSC MAPS on an IBM SP-3
23
Bottleneck Example (2)
  • Combine stride-1 and random stride memory access
  • 25% random stride access
  • 33% random stride access
  • Memory bandwidth performance is dominated by the
    random stride memory access

SDSC MAPS on a COMPAQ Alphaserver
7000 / (7 × 0.25 + 0.75) = 2800 MB/s
Amdahl's Law
24
Bottleneck Example (2)
  • Combine stride-1 and random stride memory access
  • 25% random stride access
  • 33% random stride access
  • Memory bandwidth performance is dominated by the
    random stride memory access
  • Some HPCS Mission Partner applications
  • Extensive random stride memory access
  • Some random stride memory access
  • However, even a small amount of random memory
    access can cause significant bottlenecks!

SDSC MAPS on a COMPAQ Alphaserver
7000 / (7 × 0.25 + 0.75) = 2800 MB/s
Amdahl's Law
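The arithmetic on this slide is the rate-based Amdahl's Law applied to memory bandwidth: with 7000 MB/s stride-1 bandwidth, random-stride accesses roughly 7x slower, and a 25% random fraction, the effective bandwidth is 7000 / (7 × 0.25 + 0.75). As a sketch (the helper name is invented):

```python
def effective_bandwidth(peak_mb_s: float,
                        random_fraction: float,
                        slowdown: float) -> float:
    """Amdahl-style weighted harmonic mean of two memory-access regimes:
    a fraction of accesses runs `slowdown` times slower than stride-1."""
    return peak_mb_s / (slowdown * random_fraction + (1 - random_fraction))

print(effective_bandwidth(7000, 0.25, 7))  # 2800.0
```

Even a small random fraction hurts: at only 10% random access the same formula gives 7000 / 1.6, already a 2.3x loss, which is the slide's point that "even a small amount of random memory access can cause significant bottlenecks".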
25
Outline
  • HPCS Benchmark Spectrum
  • What Makes HPC Applications Challenging?
  • Memory access patterns/locality
  • Processor characteristics
  • Parallelism
  • I/O characteristics
  • What new challenges will arise from Petascale/s
    applications?
  • Bottleneckology
  • Amdahl's Law
  • Example: Random Stride Memory Access
  • Summary

26
Summary (1)
What Makes Applications Challenging!
  • Memory access patterns/locality
  • Spatial and Temporal
  • Indirect addressing
  • Data dependencies
  • Processor characteristics
  • Processor throughput (Instructions per cycle)
  • Low arithmetic density
  • Floating point versus integer
  • Special features
  • GF(2) math
  • Popcount
  • Integer division
  • Parallelism
  • Ubiquitous for Petascale/s
  • Load balance
  • I/O characteristics
  • Bandwidth
  • Latency
  • File access patterns
  • Expand this list as required
  • Work toward consensus with
  • HPCS Mission Partners
  • HPCS Vendors
  • Understand Bottlenecks
  • Characterize applications
  • Characterize architectures

27
HPCS Benchmark Spectrum
  • What Makes HPC Applications Challenging?
  • Full applications may be challenging due to
  • Killer Kernels
  • Global data layouts
  • Input/Output
  • Killer Kernels are challenging because of many
    things that link directly to architecture
  • Identify bottlenecks by mapping applications to
    architectures

Impress upon the HPCS community the need to
identify what makes an application challenging
when using an existing Mission Partner application
for systems analysis in the MS4 review