Benchmarks for Parallel Systems - PowerPoint PPT Presentation

Provided by: Sathi64

1
Benchmarks for Parallel Systems

Sources/Credits
Performance of Various Computers Using Standard Linear Equations Software, Jack Dongarra, University of Tennessee, Knoxville TN, 37996, Computer Science Technical Report Number CS-89-85, April 8, 2004, http://www.netlib.org/benchmark/performance.ps
FAQ: http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html
Courtesy: Jack Dongarra (Top500), http://www.top500.org
The LINPACK Benchmark: Past, Present, and Future, Jack Dongarra, Piotr Luszczek, and Antoine Petitet
NAS Parallel Benchmarks, http://www.nas.nasa.gov/Software/NPB/

2
LINPACK (Dongarra 1979)

Dense system of linear equations
Initially appeared as part of the users' guide for the LINPACK package
LINPACK: 1979
Three benchmarks: N=100, N=1000, and the Highly Parallel Computing benchmark

3
LINPACK benchmark

Implemented on top of BLAS 1
2 main operations: DGEFA (Gaussian elimination, O(n^3)) and DGESL (solves Ax = b, O(n^2))
Major operation (97%): DAXPY, y = y + a*x
Called n^3/3 + n^2 times. Hence 2n^3/3 + 2n^2 flops (approx.)
64-bit floating point arithmetic
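
The DAXPY update and the nominal operation count can be illustrated with a short Python sketch. The function names daxpy and linpack_flops are hypothetical and chosen only for this example; the real benchmark calls the Fortran BLAS routine.

```python
def daxpy(a, x, y):
    """Return y + a*x elementwise, the update performed by BLAS level-1 DAXPY."""
    return [yi + a * xi for xi, yi in zip(x, y)]


def linpack_flops(n):
    """Nominal flop count 2n^3/3 + 2n^2 for solving an n x n dense system."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2
```

For example, daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) gives [12.0, 24.0, 36.0], and linpack_flops(100) is roughly 6.87e5 operations.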

4
LINPACK

N=100: 100x100 system of equations. No change in code; only compiler optimizations are allowed. User is asked to supply a timing routine called SECOND
N=1000: 1000x1000 system. User can implement any code, provided it gives the required accuracy. Known as Towards Peak Performance (TPP). Driver program always uses 2n^3/3 + 2n^2 as the operation count
Highly Parallel Computing benchmark: any software, matrix size can be chosen. Used in Top500
Based on 64-bit floating point arithmetic

5
LINPACK

100x100: inner loop optimization
1000x1000: three-loop/whole program optimization
Scalable parallel program: largest problem that can fit in memory
Template of Linpack code:
- Generate
- Solve
- Check
- Time
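
The Generate / Solve / Check / Time template can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the LINPACK driver: the function names (generate, solve, check, run) are hypothetical, matrix entries are assumed uniform in [-0.5, 0.5], the right-hand side is chosen so that the exact solution is all ones, and the check uses a normalized residual with max norms.

```python
import random
import time


def generate(n, seed=0):
    """Random dense matrix; b = A * ones, so the exact solution is all ones."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]
    return a, b


def solve(a, b):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs survive for check()
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))  # pivot row
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):  # row update: an AXPY on row i
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def check(a, b, x):
    """Normalized residual ||Ax - b|| / (n * ||A|| * ||x|| * eps)."""
    n = len(b)
    eps = 2.0 ** -52  # 64-bit floating point machine epsilon
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(abs(v) for row in a for v in row)
    norm_x = max(abs(v) for v in x)
    return resid / (n * norm_a * norm_x * eps)


def run(n=100):
    """Return (normalized residual, Mflop/s, solution) for one timed solve."""
    a, b = generate(n)
    t0 = time.perf_counter()
    x = solve(a, b)
    elapsed = max(time.perf_counter() - t0, 1e-12)
    flops = 2.0 * n**3 / 3.0 + 2.0 * n**2  # nominal count, whatever the code did
    return check(a, b, x), flops / elapsed / 1e6, x
```

Only the solve step is timed, and the rate is always computed from the nominal count 2n^3/3 + 2n^2, which mirrors how the driver reports performance regardless of the algorithm actually used.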

6
HPL (Implementation of HPLinpack Benchmark)
7
HPL Algorithm

2-D block-cyclic data distribution
Right-looking LU
Panel factorization: various options
- Crout, left- or right-looking recursive variants based on matrix multiply
- number of sub-panels
- recursive stopping criteria
- pivot search and broadcast by binary-exchange
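
The 2-D block-cyclic distribution maps each matrix entry to a process in a P x Q grid. A minimal sketch of that mapping follows; the names owner and local_index, and the parameters nb (block size), p and q (grid dimensions), are chosen for illustration and are not HPL identifiers.

```python
def owner(i, j, nb, p, q):
    """Grid coordinates (row, col) of the process owning global entry (i, j)."""
    return (i // nb) % p, (j // nb) % q


def local_index(g, nb, nprocs):
    """Local index, on the owning process, of global index g along one dimension."""
    return (g // (nb * nprocs)) * nb + g % nb
```

With nb = 2 on a 2 x 3 grid, entry (2, 5) lies in block (1, 2) and so belongs to process (1, 2), while entry (4, 6) wraps around to process (0, 0). Cycling blocks over the grid keeps the shrinking trailing matrix spread across all processes as the factorization proceeds.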

8
HPL algorithm

Panel broadcast
-
Update of trailing matrix
- look-ahead pipeline
Validity check
- scaled residual should be O(1)
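
One form of the scaled residual, as documented for recent HPL releases, is shown below. The symbol r is introduced here for illustration, and \varepsilon denotes the 64-bit machine precision; older HPL versions report several related residuals instead.

```latex
r = \frac{\lVert A x - b \rVert_\infty}
         {\varepsilon \,\bigl( \lVert A \rVert_\infty \, \lVert x \rVert_\infty + \lVert b \rVert_\infty \bigr)\, n}
```

A run is considered valid when r stays below a small fixed threshold, that is, when r is O(1) independent of the problem size n.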

9
Top500 (www.top500.org)

Top500: published since 1993
Twice a year: June and November
Top500 gives Nmax, Rmax, N1/2, Rpeak

10
TOP500 list: data shown

Manufacturer: Manufacturer or vendor
Computer: Type indicated by manufacturer or vendor
Installation Site: Customer
Location: Location and country
Year: Year of installation/last major update
Installation Type: Academic, Research, Industry, Vendor, Classified, Government
Installation Area: e.g. Research Energy / Industry Finance
Processors: Number of processors
Rmax: Maximal LINPACK performance achieved
Rpeak: Theoretical peak performance
Nmax: Problem size for achieving Rmax
N1/2: Problem size for achieving half of Rmax
Nworld: Position within the TOP500 ranking

12
India and Top 500
Rank | Site | Country | System | Vendor | Processors | Rmax (GFlops) | Rpeak (GFlops)
111 | Geoscience (B) | India | BladeCenter HS20 Cluster, Xeon EM64T 3.4 GHz, Gig-Ethernet | IBM | 1024 | 3755 | 6963.2
204 | Semiconductor Company (L) | India | eServer, Opteron 2.6 GHz, GigEthernet | IBM | 1024 | 2791 | 5324.8
231 | Semiconductor Company (K) | India | xSeries x336 Cluster, Xeon EM64T 3.6 GHz, Gig-Ethernet | IBM | 730 | 2676.88 | 5256
293 | Institute of Genomics and Integrative Biology | India | Cluster Platform 3000 DL140G2, Xeon 3.6 GHz, Infiniband | Hewlett-Packard | 576 | 2156 | 4147.2
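
The Rpeak figures in this table can be reproduced from the processor count and clock rate, and the ratio Rmax/Rpeak gives the LINPACK efficiency. The sketch below assumes 2 floating point operations per cycle per processor, an assumption that happens to match every Rpeak value listed; the names ROWS, rpeak_gflops and efficiency are hypothetical.

```python
# (rank, processors, clock in GHz, Rmax, Rpeak) copied from the table; rates in GFlops
ROWS = [
    (111, 1024, 3.4, 3755.0, 6963.2),
    (204, 1024, 2.6, 2791.0, 5324.8),
    (231, 730, 3.6, 2676.88, 5256.0),
    (293, 576, 3.6, 2156.0, 4147.2),
]

FLOPS_PER_CYCLE = 2  # assumption, not stated in the table


def rpeak_gflops(processors, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak: processors x clock rate x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak achieved on LINPACK."""
    return rmax / rpeak
```

For the rank 111 system, 1024 x 3.4 x 2 gives 6963.2 GFlops, and the efficiency 3755 / 6963.2 is about 54%. All four Gigabit Ethernet and Infiniband clusters listed achieve close to half of their theoretical peak.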
15
NAS Parallel Benchmarks - NPB

Also for evaluation of supercomputers
A set of 8 programs from CFD
5 kernels, 3 pseudo applications
NPB 1: original benchmarks
NPB 2: NAS's MPI implementation. NPB 2.4 Class D has more work and more I/O
NPB 3: based on OpenMP, HPF, Java
GridNPB3: for computational grids
NPB 3 multi-zone: for hybrid parallelism

16
NPB 1.0 (March 1994)

Defines class A and class B versions
Paper-and-pencil algorithmic specifications
Generic benchmarks, as compared to MPI-based LinPack
General rules for implementations: Fortran 90 or C, 64-bit arithmetic, etc.
Sample implementations provided

17
Kernel Benchmarks

EP: embarrassingly parallel
MG: multigrid. Regular communication
CG: conjugate gradient. Irregular long-distance communication
FT: a 3-D PDE solved using FFT. Rigorous test of long-distance communication
IS: large integer sort
Detailed rules regarding:
- brief statement of the problem
- algorithm to be used
- validation of results
- where to insert timing calls
- method for generating random numbers
- submission of results
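
To make the EP kernel and the prescribed random number method concrete, here is a simplified Python sketch. It assumes the linear congruential generator commonly cited from the NPB specification (multiplier 5^13, modulus 2^46, seed 271828183); treat those constants, and the names lcg, skip_ahead and ep, as assumptions of this example rather than a conforming implementation.

```python
import math

A = 5 ** 13        # multiplier (assumed from the NPB specification)
M = 2 ** 46        # modulus
SEED = 271828183   # default seed (assumed)


def lcg(seed=SEED):
    """Yield uniform deviates in (0, 1) from x_{k+1} = A * x_k mod M."""
    x = seed
    while True:
        x = (A * x) % M
        yield x / M


def skip_ahead(seed, k):
    """Seed that starts the sequence k steps later, so chunks need no communication."""
    return (pow(A, k, M) * seed) % M


def ep(n_pairs, seed=SEED):
    """Turn uniform pairs into Gaussian pairs and tally them by square annulus."""
    rng = lcg(seed)
    counts = [0] * 10
    sx = sy = 0.0
    for _ in range(n_pairs):
        x = 2.0 * next(rng) - 1.0
        y = 2.0 * next(rng) - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:  # accept only points inside the unit circle
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = x * f, y * f
            ring = int(max(abs(gx), abs(gy)))
            if ring < len(counts):
                counts[ring] += 1
            sx += gx
            sy += gy
    return counts, sx, sy
```

Because skip_ahead lets each process jump directly to its own part of the sequence, ep(10000) and ep(10000, skip_ahead(SEED, 20000)) together produce the same tallies as ep(20000). The only communication needed is the final sum of the counts, which is what makes the kernel embarrassingly parallel.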

18
Pseudo applications / Synthetic CFDs

Benchmark 1: perform a few iterations of the approximate factorization algorithm (BT, block tridiagonal)
Benchmark 2: perform a few iterations of the diagonal form of the approximate factorization algorithm (SP, scalar pentadiagonal)
Benchmark 3: perform a few iterations of SSOR (LU)

19
Class A and Class B
[Table with columns: Class A, Sample Code, Class B]
20
NPB 2.0 (1995)

MPI and Fortran 77 implementations
2 parallel kernels (MG, FT) and 3 simulated applications (LU, SP, BT)
Class C: bigger size
Benchmark rules: 0%, 5%, >5% change in source code

21
NPB 2.2 (1996), 2.4 (2002), 2.4 I/O (Jan 2003)

EP and IS added
FT rewritten
NPB 2.4: class D, and rationale for class D sizes
2.4 I/O: a new benchmark problem based on BT (BTIO) to test output capabilities
An MPI implementation of the same (MPI-IO), with different options, e.g. with or without collective buffering

22
Thank You !