Supercomputer Benchmarking - PowerPoint PPT Presentation


Provided by: IBMU597



1
Supercomputer Benchmarking

By John Dorfner, Wesley Jones, and Eric Ng

(Images: Cray-1, CDC 1604, Origin 2000, RS/6000 SP)
2
Overview

Definition of Benchmark
Introduction to Benchmark Suites
SPEChpc96 Suite
Livermore Loops
The Linpack Benchmark
The Top 8 Supercomputers
HPC Challenge Benchmark
Cray-1A vs. IBM Cluster 1600
Inside the IBM Cluster 1600
Conclusion

3

Benchmark Definition

A measurement or standard that serves as a point
of reference by which process performance is
measured. Benchmarking is a structured approach
for identifying the best practices from industry
and government, and comparing and adapting them
to the organization's operations. Such an
approach is aimed at identifying more efficient
and effective processes for achieving intended
results, and suggesting ambitious goals for
program output, product/service quality, and
process improvement.
www.ichnet.org

4
Supercomputer Benchmarking

The number and type of benchmark suites used to
study supercomputer performance has varied widely
over the years. In early studies, an ad hoc
collection of programs was typically used to
measure the performance of a given system
relative to a known performance benchmark.
Eventually, this practice evolved into groups of
programs explicitly designed as supercomputer
benchmark suites. The most widely used benchmarks
for measuring performance on supercomputing
clusters are the SPEChpc96 suite, the Livermore
Loops, and, for scientific machines, the Linpack
Kernels.
Some general examples of individual computer
benchmarks:
Dhrystone - synthetic integer benchmark, widely used on UNIX systems
Whetstone - floating-point benchmark for minicomputers
I/O benchmarks
MIPS
Synthetic benchmarks
Kernel benchmarks
SPECint / SPECfp
Summarizing

5
SPEChpc96 Suite

In 1995, the Standard Performance Evaluation
Corp. (SPEC) announced the release of SPEChpc96,
the first standard benchmark suite specifically
designed for measuring high-performance
computing. SPEChpc96 was developed by SPEC's
High Performance Group (HPG), which includes
several leading high-performance computer
vendors, systems integrators, and major
universities and research institutes.
SPEChpc96 allows users and vendors of high-end
computers to make objective performance
comparisons across different hardware platforms.
Specific scientific and industrial applications
are represented within the SPEChpc96 benchmark
suite.
The first two SPEChpc96 benchmarks are:
SPECseis96, a seismic processing application
SPECchem96, a computational chemistry
application
Since SPECseis96 and SPECchem96 can be run in
both serial and parallel modes, the SPEChpc96
suite can be used for general performance
comparisons over a broad range of
high-performance computing systems. This list
includes multiprocessor systems, workstation
clusters, distributed memory parallel systems,
and traditional vector and vector parallel
supercomputers.

6
SPEChpc96 Suite Metrics

The SPECseis96 and SPECchem96 suites each
generate four metrics. Each metric corresponds to
a different problem size (small, medium, large,
and extra-large) and is used to characterize the
scalability of the application as well as of the
entire system.
The SPEChpc96 metrics are as follows:
SPECseis96_SM
SPECseis96_MD
SPECseis96_LG
SPECseis96_XL
SPECchem96_SM
SPECchem96_MD
SPECchem96_LG
SPECchem96_XL
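The metrics are unitless. They are derived as
follows:
metric = (86400 seconds) / (elapsed time of benchmark in seconds)
Because a day has 86400 seconds, the metric is
the number of times the benchmark could run in
one day, so higher values are better.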
Since these benchmarks are both compute-intensive
and data-intensive, the metrics reflect the
performance of the entire system, including the
processors, memory access, I/O bandwidth, and
interconnect topology. For example,
SPECseis96_XL requires processing 100 GB of data.
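The arithmetic behind the SPEChpc96 metric is small enough to sketch directly. The helper name spec_hpc96_metric is hypothetical and is not part of SPEC's tools; it simply applies the formula from this slide to an elapsed time in seconds.

```python
SECONDS_PER_DAY = 86400  # numerator of the SPEChpc96 metric


def spec_hpc96_metric(elapsed_seconds: float) -> float:
    """Return the unitless SPEChpc96-style metric (runs per day).

    Hypothetical helper for illustration: it only evaluates
    metric = 86400 / elapsed time in seconds.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


if __name__ == "__main__":
    # A run that takes one hour could be repeated 24 times in a day.
    print(spec_hpc96_metric(3600.0))  # 24.0
    # Halving the elapsed time doubles the metric.
    print(spec_hpc96_metric(1800.0))  # 48.0
```

Because the elapsed time sits in the denominator, the metric grows as the system gets faster, which is why larger values indicate better performance.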

7
Livermore Loops

Livermore Loops is a set of kernels consisting of
loops from real Fortran programs.
Introduced in 1970, this supercomputer benchmark
initially consisted of 14 kernels of numerically
intensive applications written in Fortran. The
number of kernels was increased to 24 in the
1980s. Performance is measured in millions of
floating-point operations per second (MFLOPS).
The program also checks the results for
computational accuracy. A main aim of the
Livermore design was to avoid single-number
performance comparisons. Each of the 24 kernels
can be executed at three different DO-loop spans
to produce short, medium, and long vector
performance measurements. In this mode, if
overall averages are quoted, the geometric mean
may be interpreted as a characteristic rate of
computation for the suite. However, it is more
realistic to retain the full range of statistics:
the geometric, harmonic, and arithmetic means,
the minimum, and the maximum.
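The summary statistics named above are easy to compute from a list of per-kernel rates. This is an illustrative sketch: summarize_rates is a hypothetical helper, and the input rates are made-up numbers, not Livermore results. For positive rates the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so quoting all three shows how strongly a few very fast or very slow kernels skew the picture. The harmonic mean is also the overall rate one would observe if every kernel performed the same number of operations.

```python
import statistics


def summarize_rates(mflops):
    """Summarize per-kernel MFLOPS rates with the statistics the
    Livermore report lists: max, means, median and min."""
    rates = list(mflops)
    if not rates or min(rates) <= 0:
        raise ValueError("rates must be a non-empty list of positive numbers")
    return {
        "maximum": max(rates),
        "arithmetic": statistics.fmean(rates),
        "geometric": statistics.geometric_mean(rates),
        "median": statistics.median(rates),
        "harmonic": statistics.harmonic_mean(rates),
        "minimum": min(rates),
    }


if __name__ == "__main__":
    # Hypothetical per-kernel rates, chosen only to show the spread.
    summary = summarize_rates([10.0, 100.0, 1000.0])
    for name, value in summary.items():
        print(f"{name:>10}: {value:10.4f} MFLOPS")
```

The same ordering (harmonic below geometric below arithmetic) is visible in any real summary report, since it holds for every set of positive rates.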

8
Livermore Loops Kernels

Kernel 1: an excerpt from a hydrodynamic code.
Kernel 2: an excerpt from an Incomplete Cholesky-Conjugate Gradient code.
Kernel 3: the standard Inner Product function of linear algebra.
Kernel 4: an excerpt from a Banded Linear Equations routine.
Kernel 5: an excerpt from a Tridiagonal Elimination routine.
Kernel 6: an example of a general linear recurrence equation.
Kernel 7: an Equation of State fragment.
Kernel 8: an excerpt of an Alternating Direction, Implicit Integration code.
Kernel 9: an Integrate Predictor code.
Kernel 10: a Difference Predictor code.
Kernel 11: a First Sum.
Kernel 12: a First Difference.
Kernel 13: an excerpt from a 2-D Particle-in-Cell code.
Kernel 14: an excerpt of a 1-D Particle-in-Cell code.
Kernel 15: a sample of how casually FORTRAN can be written.
Kernel 16: a search loop from a Monte Carlo code.
Kernel 17: an example of an implicit conditional computation.
Kernel 18: an excerpt from a 2-D Explicit Hydrodynamic code.
Kernel 19: a general Linear Recurrence Equation.
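To make a few of these kernels concrete, here are Python transcriptions of Kernels 1, 3, and 12, each returning the number of floating-point operations it performs so that an MFLOPS rate can be computed. This is a sketch, assuming the loop bodies of the widely circulated C translation of the Livermore kernels; the function names, the loop span, and the constants are hypothetical. An interpreted language mostly measures interpreter overhead rather than floating-point hardware, so the printed rate is not comparable with the Fortran results.

```python
import time


def kernel1_hydro(x, y, z, q, r, t):
    """Kernel 1 (hydro fragment): x[k] = q + y[k]*(r*z[k+10] + t*z[k+11]).
    Five floating-point operations per iteration; z needs len(x) + 11 items."""
    n = len(x)
    for k in range(n):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    return 5 * n


def kernel3_inner_product(x, z):
    """Kernel 3 (inner product): q = sum of z[k]*x[k]. Two flops per iteration."""
    q = 0.0
    for k in range(len(x)):
        q += z[k] * x[k]
    return q, 2 * len(x)


def kernel12_first_difference(x, y):
    """Kernel 12 (first difference): x[k] = y[k+1] - y[k]. One flop per iteration."""
    n = len(x)
    for k in range(n):
        x[k] = y[k + 1] - y[k]
    return n


def mflops(flop_count, seconds):
    """Rate in millions of floating-point operations per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / (seconds * 1e6)


if __name__ == "__main__":
    n = 1000  # arbitrary loop span chosen for this sketch
    x = [0.0] * n
    y = [0.5] * n
    z = [0.25] * (n + 11)  # kernel 1 reads up to z[n + 10]
    start = time.perf_counter()
    flops = kernel1_hydro(x, y, z, q=0.5, r=0.2, t=0.1)
    elapsed = time.perf_counter() - start
    print(f"kernel 1: {mflops(flops, elapsed):.2f} MFLOPS (pure Python)")
```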

9
Livermore Loops Kernel Output

THE LIVERMORE FORTRAN KERNELS SUMMARY
Computer: CRAY-YMP C90 (240 MHz)
System: UNICOS 7.C, loaded
Compiler: CFT77 5.0.1.17
Date: 92.02.18
Testor: Charles Grassl, CRI
MFLOPS RANGE REPORT ALL RANGE STATISTICS
Mean DO Span = 167
Code Samples = 72
Maximum Rate = 826.0859 Mega-Flops/Sec.
Average Rate = 190.5636 Mega-Flops/Sec.
GEOMETRIC MEAN = 86.2649 Mega-Flops/Sec.
Median Q2 = 83.5138 Mega-Flops/Sec.
Harmonic Mean = 40.7302 Mega-Flops/Sec.
Minimum Rate = 6.7925 Mega-Flops/Sec.
Mean Precision = 11.07 Decimal Digits

10
The Linpack Benchmark

The Linpack Benchmark measures a computer's
floating-point rate of execution, in Mflop/s, by
running a mathematics application that solves a
dense system of linear equations. Over the
years, the characteristics of the benchmark have
changed. Today, in fact, there are three
benchmarks included in the Linpack Benchmark
report.
The Linpack Benchmark grew out of the Linpack
software project. It was originally intended to
give end-users an indication of how long it
would take to solve certain matrix problems.
The three benchmarks in the Linpack Benchmark
report are:
Linpack Fortran n = 100 benchmark
Linpack n = 1000 benchmark
Linpack's Highly Parallel Computing benchmark
Mflop/s (millions of floating-point operations
per second) is an execution rate that counts
64-bit floating-point additions and
multiplications. Gflop/s means billions of
floating-point operations per second, and Tflop/s
means trillions of floating-point operations per
second.
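A toy version of what the benchmark does can be written directly: solve a dense random system by Gaussian elimination with partial pivoting, time the solve, and divide the conventional Linpack operation count, 2n^3/3 + 2n^2, by the elapsed time. This is a pure-Python sketch, not the Linpack or HPL code; lu_solve and linpack_flops are hypothetical names, and the resulting rate will be far below what compiled code achieves on the same machine.

```python
import random
import time


def lu_solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Works on copies, so a and b are left unchanged."""
    n = len(a)
    m = [row[:] for row in a]
    x = b[:]  # holds the transformed right-hand side, then the solution
    for k in range(n):
        # Partial pivoting: swap the row with the largest |m[i][k]| into row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k + 1, n):
                m[i][j] -= f * m[k][j]
            x[i] -= f * x[k]
    # Back substitution on the upper-triangular system.
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def linpack_flops(n):
    """Operation count conventionally credited for an n x n solve."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


if __name__ == "__main__":
    n = 100  # same order as the Fortran n = 100 case
    rng = random.Random(0)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - start
    error = max(abs(xi - 1.0) for xi in x)
    print(f"n = {n}, max error = {error:.2e}")
    print(f"{linpack_flops(n) / (elapsed * 1e6):.1f} Mflop/s (pure Python)")
```

Checking the error against a known solution mirrors the benchmark's practice of verifying the answer, not just timing the run.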

11
Linpack Performance Example

Measured Gflop/s: Peak rate of execution in
billions of floating-point operations per second.
Size of Problem: The matrix size at which the
measured performance was observed.
Size of ½ Perf: The size of problem needed to
achieve ½ the measured peak performance.
Theoretical Peak Gflop/s: The theoretical peak
performance for the computer.
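Given a sweep of measured rates over problem sizes, the quantities in this key can be read off mechanically. The sketch below uses hypothetical (size, Gflop/s) pairs and takes the "Size of ½ Perf" as the smallest measured size that reaches half the best rate, so its precision is limited by how finely the sweep samples problem sizes; rmax_nmax_nhalf is an illustrative name.

```python
def rmax_nmax_nhalf(sweep):
    """Return (Rmax, Nmax, N1/2) from (problem size, Gflop/s) measurements.

    Rmax is the best measured rate, Nmax the size at which it occurred,
    and N1/2 the smallest measured size reaching at least Rmax / 2."""
    if not sweep:
        raise ValueError("need at least one measurement")
    nmax, rmax = max(sweep, key=lambda pair: pair[1])
    nhalf = min(n for n, rate in sweep if rate >= rmax / 2)
    return rmax, nmax, nhalf


if __name__ == "__main__":
    # Hypothetical sweep of problem sizes and measured rates.
    sweep = [(1000, 2.1), (2000, 4.0), (4000, 6.5), (8000, 8.2), (16000, 9.0)]
    rmax, nmax, nhalf = rmax_nmax_nhalf(sweep)
    print(f"Rmax = {rmax} Gflop/s at N = {nmax}; N1/2 = {nhalf}")
```

A small N1/2 means the machine reaches a good fraction of its best rate on modest problems, which matters to users whose problems are smaller than Nmax.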

12
The Top 8 Supercomputers
13
The Top 8 Supercomputers
Table Key

Rank: Position within the TOP500 ranking
Manufacturer: Manufacturer or vendor
Computer: Model type indicated by manufacturer or vendor
Installation Site: Customer
Location: Location and country
Year: Year of installation/last major update
Installation Area: Field of application
Processors: Number of processors
Rmax: Maximum LINPACK performance achieved
Rpeak: Theoretical peak performance
Nmax: Problem size for achieving Rmax
N1/2: Problem size for achieving half of Rmax
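The gap between Rpeak and Rmax is often summarized as an efficiency ratio. Theoretical peak is usually computed as processors × clock rate × floating-point operations per cycle; the sketch below assumes that formula, and the machine parameters and the Rmax value in it are hypothetical, not taken from the TOP500 table.

```python
def rpeak_gflops(processors, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s, assuming every processor completes its
    maximum number of floating-point operations on every clock cycle."""
    return processors * clock_ghz * flops_per_cycle


def linpack_efficiency(rmax_gflops, rpeak):
    """Fraction of the theoretical peak achieved on the Linpack run."""
    if rpeak <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax_gflops / rpeak


if __name__ == "__main__":
    # Hypothetical machine: 1024 processors at 1.5 GHz, 4 flops per cycle.
    rpeak = rpeak_gflops(1024, 1.5, 4)
    rmax = 4300.0  # hypothetical measured Linpack Rmax in Gflop/s
    print(f"Rpeak = {rpeak:.1f} Gflop/s, efficiency = {linpack_efficiency(rmax, rpeak):.1%}")
```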

14
HPC Challenge Benchmark

A group of 20 top researchers, working under the
direction of the High Productivity Computing
Systems (HPCS) program of the Defense Advanced
Research Projects Agency (DARPA), has initiated a
program to redefine the benchmarks used to
measure high-performance systems. It is designed
to broaden benchmarking beyond the Linpack
measure of raw floating-point operations per
second (flops). The group has set a target date
of 2006 to release a new benchmark.
The HPC Challenge benchmark consists of five
hardware performance metrics:
HPL - the Linpack TPP benchmark, which measures
the floating-point rate of execution for solving
a linear system of equations
STREAM - a simple synthetic benchmark program
that measures sustainable memory bandwidth (in
GB/s) and the corresponding computation rate for
simple vector kernels
RandomAccess - measures the rate of integer
random updates of memory
PTRANS (parallel matrix transpose) - exercises
communication in which pairs of processors
exchange data simultaneously; it is a useful test
of the total communication capacity of the network
b_eff (effective bandwidth benchmark) - a set of
tests that measure the latency and bandwidth of a
number of simultaneous communication patterns
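Two of these kernels are simple enough to sketch: the STREAM Triad loop (a[i] = b[i] + scalar * c[i]) and a RandomAccess-style loop that XORs pseudo-random 64-bit values into pseudo-random table slots. This is an illustrative Python sketch, not the HPCC code: the shift-register step is modeled on the form used in the HPCC RandomAccess source (shift left, XOR with 0x7 when the top bit falls off), but the seeding is simplified, and all function names are hypothetical. Because XOR is its own inverse, replaying the same update stream restores the table, which gives a cheap correctness check.

```python
import time

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic
POLY = 0x7  # feedback constant XORed in when the top bit shifts out


def stream_triad(a, b, c, scalar):
    """STREAM-style Triad: a[i] = b[i] + scalar * c[i].
    Returns the bytes STREAM counts for 8-byte doubles (two reads, one write)."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]
    return 3 * 8 * len(a)


def next_random(ran):
    """Advance a 64-bit shift-register sequence by one step (ran < 2**64)."""
    top_bit_set = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if top_bit_set else ran


def random_access(table, updates, seed):
    """XOR pseudo-random values into pseudo-random slots of table.
    len(table) must be a power of two so the low bits select a slot."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    ran = seed
    for _ in range(updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran
    return updates


if __name__ == "__main__":
    n = 100_000
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    start = time.perf_counter()
    moved = stream_triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    print(f"triad: {moved / elapsed / 1e9:.3f} GB/s (pure Python)")

    table = list(range(1 << 16))
    updates = 4 * len(table)  # arbitrary update count for this sketch
    start = time.perf_counter()
    random_access(table, updates, seed=12345)
    elapsed = time.perf_counter() - start
    print(f"RandomAccess: {updates / elapsed / 1e9:.6f} GUP/s (pure Python)")
```

In Python both loops are dominated by interpreter overhead, so the printed numbers illustrate the units (GB/s and giga-updates per second) rather than the memory system.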

15
Cray-1A (1978)
vs.
IBM Cluster 1600 (2002)
16
Inside the IBM Cluster 1600
The diagram above shows a schematic view of the
two-cluster configuration.
The diagram above shows the configuration of a
single cluster.
17
Conclusion

Benchmarking refers to a measurement standard
that serves as a point of reference by which
process performance is measured.
Three of the more popular suites for
benchmarking supercomputers are the SPEChpc96
suite, the Livermore Loops, and, for scientific
machines, the Linpack Kernels.
On important HPC features, the performance of
today's supercomputers differs vastly from that
of the supercomputers of the past.
As the High Performance Computing industry
grows, the benchmarks used on supercomputers
must also grow in order to provide a yardstick
by which these systems can be measured.

18
For more information