Optimizing Matrix Multiply

1
The U.S. DOE Advanced CompuTational Software (ACTS) Collection
Tony Drummond, Lawrence Berkeley National Laboratory, LADrummond@lbl.gov
2
OUTLINE
  • Motivation
  • Introduction to the DOE ACTS Collection
  • Interfaces to the ACTS Collection
  • Software Sustainability Requirements
  • References

3
Where are the applications?
Development of High-End Computer Simulations
  • Accelerator Science
  • Astrophysics
  • Biology
  • Chemistry
  • Earth Sciences
  • Materials Science
  • Nanoscience
  • Plasma Science
Commonalities
  • Major advancements in science
  • Increasing demands for computational power
  • Rely on available computational systems, languages, and software tools

4
Software Development and Evolution
min(time_to_first_solution) (prototype)
Outlive Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinary
Sustained Performance
  • Increasingly complex algorithms
  • Increasingly diverse architectures
  • Increasingly demanding applications

5
OUTLINE
  • Motivation
  • Introduction to the DOE ACTS Collection
  • Interfaces to the ACTS Collection
  • Software Sustainability Requirements
  • References

6
THE U.S. DOE ACTS COLLECTION
Goal: The Advanced CompuTational Software Collection (ACTS) makes reliable and efficient software tools more widely used and more effective in solving the nation's engineering and scientific problems.
  • References
  • L. A. Drummond, O. Marques, "An Overview of the Advanced CompuTational Software (ACTS) Collection," ACM Transactions on Mathematical Software, Vol. 31, pp. 282-301, 2005.
  • http://acts.nersc.gov

7
The Advanced CompuTational Software Collection (ACTS)
Components
  • Solid Base: non-commercial and open-source tools developed at DOE laboratories and universities.
  • Independent Tool Evaluations and Consultation: provided through acts-support@nersc.gov.
  • High-Level User Support: problem identification, tool and interface selection, specific tuning parameter configurations, installation, documentation, etc.
  • Training and Dissemination: workshops, lectures, and active conference participation (acts.nersc.gov).
  • Collaborations: with HPC centers, computational science research centers (national and international), and software and computer vendors.

8
(No Transcript)
9
Software Sustainability
(Diagram of an application's layers: Control; I/O; Application Data Layout; Algorithmic Implementations; Tuned and Machine-Dependent Modules.)
10
Software Sustainability
(Diagram labels: User's Application Code (Main Control); Compilers, Expert Drivers, Support; Available Libraries and Packages alongside Algorithmic Implementations, I/O, Application Data Layout, and Tuned and Machine-Dependent Modules.)
11
Critical Path for HPC Software Stack
  • Simulation codes
  • Data Analysis codes

General Purpose Libraries
  • Algorithms
  • Data Structures
  • Code Optimization
  • Programming Languages
  • O/S - Compilers

Hardware - Middleware - Firmware
12
Critical Path for HPC Software Stack
General Purpose Libraries
Hardware - Middleware - Firmware
13
ACTS Numerical Tools Functionality
14
ACTS Numerical Tools Functionality
15
Structure of PETSc
16
Hypre Conceptual Interfaces
17
Hypre Conceptual Interfaces to Solvers
List of Solvers and Preconditioners per
Conceptual Interface
18
ACTS Numerical Tools Functionality
19
ACTS Numerical Tools Functionality
20
ACTS Numerical Tools Functionality
21
ACTS Numerical Tools Functionality
22
ACTS Numerical Tools Functionality
23
TAO - Interface with PETSc
24
OPT Interfaces
  • Four major classes of problems are available:
  • NLF0(ndim, fcn, init_fcn, constraint): basic nonlinear function, no derivative information available
  • NLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first-derivative information available
  • FDNLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first-derivative information approximated by finite differences
  • NLF2(ndim, fcn, init_fcn, constraint): nonlinear function, first- and second-derivative information available
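The practical difference between NLF1 (first derivatives supplied) and FDNLF1 (first derivatives approximated) can be seen in a small Python sketch. It does not use OPT++: the Rosenbrock test function and the helper `fd_gradient` are illustrative choices of ours, and the forward-difference scheme with step `h` is an assumption.

```python
import numpy as np

def rosenbrock(x):
    """Test objective: f(x) = (1 - x0)^2 + 100 (x1 - x0^2)^2."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x):
    """Analytic gradient: the NLF1-style case, derivatives provided by the user."""
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: the FDNLF1-style case (hypothetical helper).
    Costs one extra evaluation of f per dimension and has O(h) truncation error."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - fx) / h
    return g

x0 = np.array([-1.2, 1.0])
print(rosenbrock_grad(x0))          # exact gradient
print(fd_gradient(rosenbrock, x0))  # approximation, close to the exact values
```

The approximated gradient trades accuracy and extra function evaluations for not having to code derivatives by hand, which is the situation FDNLF1 targets.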

25
ACTS Numerical Tools Functionality
26
ACTS Numerical Tools Functionality
27
ACTS Tools Functionality
28
ACTS Tools Functionality
29
OUTLINE
  • Motivation
  • Introduction to the DOE ACTS Collection
  • Interfaces to the ACTS Collection
  • Software Sustainability Requirements
  • References

30
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )
Interface levels: Language Calls; Command Lines; Problem Domain
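With a row-major ordering, BLACS_GRIDINIT places processes on the NPROW × NPCOL grid row by row, so the coordinates BLACS_GRIDINFO returns follow directly from the process number. A minimal Python sketch of that mapping, assuming processes numbered 0 to NPROW*NPCOL-1; `grid_coords` is a hypothetical helper, not part of the BLACS.

```python
def grid_coords(rank, nprow, npcol, order="row"):
    """Return (myrow, mycol) for a process rank on an nprow x npcol grid.

    'row' puts consecutive ranks along a grid row (row-major),
    'col' puts them down a grid column (column-major).
    """
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    if order == "row":
        return divmod(rank, npcol)
    if order == "col":
        mycol, myrow = divmod(rank, nprow)
        return myrow, mycol
    raise ValueError("order must be 'row' or 'col'")

# A 2 x 3 grid in row-major order: ranks 0-2 form grid row 0, ranks 3-5 grid row 1.
for rank in range(6):
    print(rank, grid_coords(rank, 2, 3))
```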
31
Tool-to-Tool Interoperability
(Diagram: one-sided interoperability among Tools A through F.)
32
High-Level User Interfaces to the ACTS Collection
(Diagram: a user issues high-level requests such as Ax = b or View_field(T1) that reach the ACTS tools PAWS, OPT, CUMULVS, TAU, Globus, Chombo, AZTEC, Hypre, Global Arrays, PETSc, ScaLAPACK, PVODE, SuperLU, TAO, and Overture.)
33
PyACTS
Vicente Galiano, Miguel Hernandez University
Tony Drummond, Lawrence Berkeley National Laboratory
Violeta Migallón and José Penadés, University of Alicante
Goal: Provide a didactic tool for the ACTS Collection and a Python-based interface to it.
  • References
  • L. A. Drummond, V. Galiano, O. Marques, V. Migallón, J. Penadés, "PyACTS: A High-Level Framework for Fast Development of High Performance Applications," Lecture Notes in Computer Science, Vol. 4395, pp. 417-425, 2007.

34
PyACTS
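(Diagram of the PyACTS layers: in the Python world, PyACTS and its modules PyScaLAPACK and PySuperLU sit on the PyACTS wrappers (ScaLAPACK wrappers, SuperLU wrappers), alongside PyMPI, NumPy, and other Python packages; the wrappers call the ScaLAPACK and SuperLU libraries.)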
35
PyACTS Basic Services
  • Basic Services: creation and modification of data objects and parallel environment specifications (matrices, data layouts, contexts).
  • I/O Services: parallel read/write; ASCII and NetCDF are currently supported.
  • Verification and Validation: predicates and parameter type checking.
  • Data Conversion: interoperable objects between libraries.

36
PyACTS Motivation
PyClimate (J. Saenz et al., University of the Basque Country)
  • Supports common tasks in the analysis of climate variability data:
  • Simple I/O operations
  • Operations with COARDS-compliant NetCDF files
  • Empirical Orthogonal Function (EOF) analysis
  • Canonical Correlation Analysis (CCA)
  • Singular Value Decomposition (SVD) analysis of coupled datasets
  • Some linear digital filters
  • Kernel-based probability density function estimation
  • Access to the DCDFLIB.C library from Python
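EOF analysis, one of the PyClimate tasks listed above, is commonly computed from a singular value decomposition of the anomaly data matrix, which is why a fast (and parallel) SVD matters for it. A minimal serial NumPy sketch, assuming a time × space data matrix and synthetic data; `eof_analysis` is our own illustrative function, not PyClimate's code.

```python
import numpy as np

def eof_analysis(data, n_modes):
    """EOFs of a (n_times, n_points) data matrix via SVD.

    Returns (eofs, pcs, variance_fraction): EOF spatial patterns as rows,
    principal-component time series as columns, and the fraction of the
    total variance explained by each mode.
    """
    anomalies = data - data.mean(axis=0)       # remove the time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]                        # spatial patterns
    pcs = u[:, :n_modes] * s[:n_modes]         # time coefficients
    variance_fraction = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, variance_fraction

# Synthetic field: one oscillating spatial pattern plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)[:, None]
pattern = np.sin(np.linspace(0, np.pi, 50))[None, :]
data = np.sin(t) * pattern + 0.05 * rng.standard_normal((200, 50))
eofs, pcs, frac = eof_analysis(data, n_modes=2)
print(frac)  # the first mode should explain almost all of the variance
```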

37
PyACTS Performance in PyClimate EOF calculations
Empirical Orthogonal Function (Day calc)
38
PyScaLAPACK pvgesvd Performance
39
PyACTS Performance
    >>> from PyACTS import *
    >>> import PyACTS
    >>> import PyACTS.PyPBLAS as PyPBLAS
    >>> import time
    >>> n = 500
    >>> ACTS_lib = 1  # ScaLAPACK library
    >>> PyACTS.gridinit()  # grid initialization
    >>> alpha = Scal2PyACTS(2, ACTS_lib)  # convert scalar to PyACTS scalar
    >>> beta = Scal2PyACTS(3, ACTS_lib)
    >>> a = Rand2PyACTS(n, n, ACTS_lib)  # generate a random PyACTS array
    >>> b = Rand2PyACTS(n, n, ACTS_lib)
    >>> c = Rand2PyACTS(n, n, ACTS_lib)
    >>> c = PyPBLAS.pvgemm(alpha, a, b, beta, c)  # call level 3 PBLAS routine
    >>> PyACTS.gridexit()
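The level 3 PBLAS call above performs the GEMM update C ← αAB + βC on distributed matrices. A serial NumPy sketch of the same arithmetic with the slide's sizes and scalars (n = 500, α = 2, β = 3); it does not reproduce PyACTS's distributed arrays, and `gemm` is a hypothetical helper name.

```python
import time
import numpy as np

def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c, the BLAS/PBLAS GEMM update without transposes."""
    return alpha * (a @ b) + beta * c

n = 500
rng = np.random.default_rng(42)
a = rng.random((n, n))
b = rng.random((n, n))
c = rng.random((n, n))

start = time.perf_counter()
result = gemm(2.0, a, b, 3.0, c)
elapsed = time.perf_counter() - start

# The product costs about 2 n^3 flops; scaling and adding C adds only O(n^2).
print(f"{2 * n**3 / elapsed / 1e9:.2f} GFLOP/s (approximate)")
```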
40
OUTLINE
  • Motivation
  • Introduction to the DOE ACTS Collection
  • Interfaces to the ACTS Collection
  • Software Sustainability Requirements
  • References

41
Problem Statement: Software Sustainability
  • THE GOOD
  • Many successful HPC stories have led to major advances in science and engineering
  • We have successfully run and scaled applications on 100,000 processors
  • THE BAD
  • Portability across platforms is still an outstanding issue:
  • Readiness
  • Performance
  • Robustness and Correctness
  • THE UGLY
  • The multi-core and many-core era is knocking at the HPC door

44
Software Quality Assurance
  • Robustness
  • Scalability
  • Extensibility
  • Interoperability
  • User Friendliness
  • Documentation
  • Periodic tests and evaluations (test engines and dependency graphs)

45
ScaLAPACK's Software Structure
46
BLAS: Basic Linear Algebra Subprograms
BLAS Levels
  • Level 1 BLAS: vector-vector operations
  • Level 2 BLAS: matrix-vector operations
  • Level 3 BLAS: matrix-matrix operations
Design Considerations
  • Portability
  • Performance: developing blocked algorithms is important for performance!
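Why blocking pays off: Level 3 BLAS performs O(n³) flops on O(n²) data, so each element brought into fast memory can be reused many times, whereas Level 1 and Level 2 operations perform roughly one flop per data item loaded. A minimal Python sketch of a blocked (tiled) matrix multiply; the tile size `bs` is an illustrative parameter, and in a real library each tile product would be a tuned, cache-sized kernel rather than a NumPy call.

```python
import numpy as np

def blocked_matmul(a, b, bs=64):
    """C = A @ B computed tile by tile.

    Each (i, j) tile of C accumulates products of a row of A-tiles with a
    column of B-tiles; in compiled code the three tiles would be sized to fit
    in cache so each loaded element is reused about bs times.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must agree")
    c = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # Slices clip at the matrix edge, so ragged last tiles are handled.
                c[i:i + bs, j:j + bs] += a[i:i + bs, p:p + bs] @ b[p:p + bs, j:j + bs]
    return c

rng = np.random.default_rng(1)
a = rng.random((200, 150))
b = rng.random((150, 130))
print(np.allclose(blocked_matmul(a, b, bs=64), a @ b))  # True
```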

47
ScaLAPACK Data Layouts
  • 1D block and cyclic column distributions
  • 1D block-cyclic column and 2D block-cyclic distributions
  • 2D block-cyclic distribution, used in ScaLAPACK for dense matrices
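In the 2D block-cyclic layout, the matrix is cut into MB × NB blocks that are dealt out round-robin over a P × Q process grid in both dimensions. A minimal Python sketch of the index arithmetic, assuming zero-based indices and that the first block lives on process (0, 0); ScaLAPACK's tool routines INDXG2P and INDXG2L perform this mapping in Fortran with one-based indices and an explicit source-process argument. The helper names here are our own.

```python
def global_to_local(i, nb, nprocs):
    """Map zero-based global index i (one dimension) to (owner, local index)
    for a block-cyclic distribution with block size nb over nprocs processes,
    assuming the first block is on process 0."""
    block = i // nb
    owner = block % nprocs
    local = (block // nprocs) * nb + i % nb
    return owner, local

def owner_2d(i, j, mb, nb, nprow, npcol):
    """Process coordinates and local indices of entry (i, j) of a 2D block-cyclic matrix."""
    prow, li = global_to_local(i, mb, nprow)
    pcol, lj = global_to_local(j, nb, npcol)
    return (prow, pcol), (li, lj)

# 9 x 9 matrix, 2 x 2 blocks, 2 x 3 process grid: print the owner "pq" of each entry.
for i in range(9):
    owners = []
    for j in range(9):
        (p, q), _ = owner_2d(i, j, 2, 2, 2, 3)
        owners.append(f"{p}{q}")
    print(" ".join(owners))
```

Dealing blocks cyclically keeps every process busy as a factorization sweeps across the matrix, while the block size keeps each local piece large enough for Level 3 BLAS.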

48
Astrophysics Applications
Cosmic Microwave Background Analysis, BOOMERanG
collaboration, MADCAP code (Apr. 27, 2000).
  • The statistics of the tiny variations in the CMB (the faint echo of the Big Bang) allow the fundamental parameters of cosmology to be determined to the percent level or better.
  • MADCAP (Microwave Anisotropy Dataset Computational Analysis Package)
  • Makes maps from observations of the CMB and then calculates their angular power spectra (see http://crd.lbl.gov/borrill).
  • Calculations are dominated by the solution of linear systems of the form $M = A^{-1}B$ for dense $n \times n$ matrices $A$ and $B$, scaling as $O(n^3)$ in flops. MADCAP uses ScaLAPACK for these calculations.
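Computing M = A⁻¹B does not require forming A⁻¹ explicitly: factoring A once (LU with partial pivoting) and solving AM = B for all columns of B is typically cheaper and more accurate, and it is the kind of computation ScaLAPACK's PDGESV performs in parallel. A serial NumPy sketch with random test matrices (our assumption); `numpy.linalg.solve` calls LAPACK's gesv routine, the serial counterpart of PDGESV.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
a = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal((n, n))

# Preferred: LU-factor A and solve A M = B (LAPACK gesv underneath).
m_solve = np.linalg.solve(a, b)

# Textbook formula, for comparison only: forms the inverse explicitly.
m_inv = np.linalg.inv(a) @ b

residual = np.linalg.norm(a @ m_solve - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.1e}")
print(np.allclose(m_solve, m_inv))
```

Both routes cost O(n³) flops, but the factor-and-solve route does fewer of them and avoids the extra rounding error of the explicit inverse.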

49
PETSc
(Image provided by the PETSc Development Team, ANL)
50
Basic Conjugate Gradient Algorithm
Scalars ?, ?, y ?
Vectors: x, r, p (search direction), and q
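A minimal unpreconditioned conjugate gradient sketch in Python/NumPy, using the slide's vector names (x, r, p as the search direction, q = Ap). The scalar names alpha, beta, and rho follow the common textbook presentation and are our assumption; the matrix A is assumed symmetric positive definite.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b by unpreconditioned CG; A must be symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                  # residual
    p = r.copy()                   # search direction
    rho = r @ r                    # squared residual norm
    if max_iter is None:
        max_iter = 10 * n          # generous cap; exact arithmetic needs at most n steps
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rho) <= tol * b_norm:
            break
        q = A @ p
        alpha = rho / (p @ q)      # step length along p
        x += alpha * p
        r -= alpha * q
        rho_new = r @ r
        beta = rho_new / rho       # keeps the next direction A-conjugate to earlier ones
        p = r + beta * p
        rho = rho_new
    return x

rng = np.random.default_rng(3)
m = rng.standard_normal((50, 50))
A = m.T @ m + 50 * np.eye(50)      # symmetric positive definite test matrix
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))
```

Each iteration needs one matrix-vector product (q = Ap) and a few vector operations, which is why CG suits large sparse systems such as those PETSc targets.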
51
Preconditioning Matrices
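With $A = D - E - F$, where $D$ is the diagonal of $A$ and $-E$ and $-F$ are its strictly lower and strictly upper triangular parts:
  • Gauss-Seidel: $M = D - E$ (uses the lower triangular part of A)
  • Jacobi: $M = D$ (uses the diagonal of A)
  • SOR: $M = \frac{1}{\omega}(D - \omega E)$ (uses the lower triangular part of A)
  • SSOR: $M = \frac{1}{\omega(2-\omega)}(D - \omega E)D^{-1}(D - \omega F)$ (uses the whole matrix A)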
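These preconditioners can be built directly from the splitting A = D − E − F (D the diagonal, −E the strictly lower and −F the strictly upper triangular part of A). A NumPy sketch that forms each M explicitly for a small symmetric test matrix and reports the spectral radius of I − M⁻¹A; real solvers never form M⁻¹ but apply it through triangular solves, and ω = 1.5 is an arbitrary illustrative choice.

```python
import numpy as np

def splitting(A):
    """Return D, E, F with A = D - E - F."""
    D = np.diag(np.diag(A))
    E = -np.tril(A, k=-1)
    F = -np.triu(A, k=1)
    return D, E, F

def preconditioners(A, omega=1.5):
    """Explicit preconditioning matrices M for the classical splittings."""
    D, E, F = splitting(A)
    return {
        "Jacobi": D,
        "Gauss-Seidel": D - E,
        "SOR": (D - omega * E) / omega,
        "SSOR": (D - omega * E) @ np.linalg.inv(D) @ (D - omega * F) / (omega * (2 - omega)),
    }

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
for name, M in preconditioners(A).items():
    # Spectral radius of I - M^{-1} A; below 1 means the stationary iteration converges.
    radius = np.max(np.abs(np.linalg.eigvals(np.eye(3) - np.linalg.solve(M, A))))
    print(f"{name:12s} {radius:.3f}")
```

Note that for symmetric A the SSOR matrix is symmetric, which is what makes it usable as a preconditioner for conjugate gradient, unlike Gauss-Seidel or SOR.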
52
PETSc Matrix Distribution
  • proc 1: M = 8, N = 8, m = 3, n = k1, rstart = 0, rend = 3
  • proc 2: M = 8, N = 8, m = 3, n = k2, rstart = 3, rend = 6
  • proc 3: M = 8, N = 8, m = 2, n = k3, rstart = 6, rend = 8
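Each process owns a contiguous block of rows [rstart, rend), with rend one past its last local row, so the ranges follow from a running sum of the local row counts m. A Python sketch, assuming the remainder rows go to the lowest-ranked processes, which reproduces the 3/3/2 split for M = 8 rows over three processes; the helper names are hypothetical, and in PETSc itself a process queries its range with MatGetOwnershipRange.

```python
def local_sizes(global_rows, nprocs):
    """Split global_rows as evenly as possible; lower ranks receive the remainder."""
    base, extra = divmod(global_rows, nprocs)
    return [base + (1 if rank < extra else 0) for rank in range(nprocs)]

def ownership_ranges(sizes):
    """Return [(rstart, rend), ...] with rend exclusive, from per-process row counts."""
    ranges, start = [], 0
    for m in sizes:
        ranges.append((start, start + m))
        start += m
    return ranges

sizes = local_sizes(8, 3)
print(sizes)                    # [3, 3, 2]
print(ownership_ranges(sizes))  # [(0, 3), (3, 6), (6, 8)]
```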
53
Software Dependency Graph
  • Software Dependency Tree
    ScaLAPACK depends on PBLAS and LAPACK
    LAPACK depends on BLAS
    PBLAS depends on BLACS and MPI
  • Computational Platform Dependency
    ScaLAPACK compiles = compiler-list, options = compile-options
  • Software Testing (Python-based scripts)
    ScaLAPACK tests = dir-list
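A dependency tree like this one fixes the order in which components must be built and tested, and tells a testing engine what to retest after a change. A Python sketch using the standard library's graphlib (Python 3.9+); the table transcribes the dependency tree above, BLAS, BLACS, and MPI are treated as leaves because no dependencies are listed for them, and `dependents_of` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Library -> libraries it depends on, transcribed from the dependency tree.
dependencies = {
    "ScaLAPACK": {"PBLAS", "LAPACK"},
    "LAPACK": {"BLAS"},
    "PBLAS": {"BLACS", "MPI"},
}

# static_order() yields every dependency before the libraries that need it.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)

def dependents_of(changed, deps):
    """All libraries that directly or indirectly depend on `changed` and so need retesting."""
    affected, frontier = set(), {changed}
    while frontier:
        nxt = {lib for lib, reqs in deps.items() if reqs & frontier} - affected
        affected |= nxt
        frontier = nxt
    return affected

print(sorted(dependents_of("BLAS", dependencies)))  # ['LAPACK', 'ScaLAPACK']
```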
54
Software Sustainability
(Flowchart: automatic software testing engines and user-reported problems feed a check for errors/problems; if there are none, the process ends; if there are, the problem is fixed, reported, and documented.)
55
Software Sustainability
Performance and Scalability
  • Profiling and Tracing Tools: TAU
  • Auto-Tuning (OSKI, ATLAS-like)
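ATLAS-like auto-tuning replaces hand-picked parameters with an empirical search: time a kernel for each candidate parameter on the target machine and keep the fastest. A minimal Python sketch that searches over tile sizes for a blocked matrix multiply; the candidate list, matrix size, and repeat count are arbitrary assumptions, and Python timings mostly reflect interpreter and NumPy call overheads rather than cache behavior, so this only illustrates the search loop.

```python
import timeit
import numpy as np

def blocked_matmul(a, b, bs):
    """Tile-by-tile C = A @ B; bs is the tunable tile size."""
    n, k = a.shape
    m = b.shape[1]
    c = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                c[i:i + bs, j:j + bs] += a[i:i + bs, p:p + bs] @ b[p:p + bs, j:j + bs]
    return c

def autotune(n=256, candidates=(16, 32, 64, 128, 256), repeats=3):
    """Return (best tile size, {tile size: best time in seconds})."""
    rng = np.random.default_rng(0)
    a, b = rng.random((n, n)), rng.random((n, n))
    timings = {}
    for bs in candidates:
        # Keep the minimum of several runs to reduce timing noise.
        timings[bs] = min(timeit.repeat(lambda: blocked_matmul(a, b, bs), number=1, repeat=repeats))
    best = min(timings, key=timings.get)
    return best, timings

best, timings = autotune()
print("selected tile size:", best)
```

The winner depends on the machine it runs on, which is the point of tuning empirically at install time rather than fixing the parameter in the source.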

56
Software Sustainability Requirement
57
ACTS Software Sustainability Center
Sustainable Software Support
58
Open Challenges - Multi-core
  • Improve interactions among tools, compilers, and hardware
  • Software Distribution and Installation
  • Automatic Tuning and Profiling (TAU, IPM, etc)
  • Automatic Code Generators (ATLAS-like)
  • Debugging tools
  • Tools and Language Interoperability

59
References
  • ACTS Information Center: http://acts.nersc.gov
  • Two upcoming journal issues dedicated to ACTS: IJHPCA and ACM TOMS
  • Ninth ACTS Collection Workshop, August 19-22, 2008