Title: Profiling S3D on Cray XT3 using TAU
1 Profiling S3D on Cray XT3 using TAU
- Sameer Shende
- tau-team_at_cs.uoregon.edu
2 Acknowledgements
- Alan Morris UO
- Kevin Huck UO
- Allen D. Malony UO
- Kenneth Roche ORNL
- Bronis R. de Supinski LLNL
3 TAU Parallel Performance System
- http://www.cs.uoregon.edu/research/tau/
- Multi-level performance instrumentation
- Multi-language automatic source instrumentation
- Flexible and configurable performance measurement
- Widely-ported parallel performance profiling system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid
4 TAU Performance System Architecture
[Figure: architecture diagram; surviving label: event selection]
5 TAU Performance System Architecture
6 Program Database Toolkit (PDT)
- Application / Library
- C / C++ parser
- Fortran parser (F77/90/95)
- IL (one per parser)
- C / C++ IL analyzer
- Fortran IL analyzer
- Program Database Files
- DUCTAPE
- PDBhtml: program documentation
- SILOON: application component glue
- CHASM: C++ / F90/95 interoperability
- TAU_instr: automatic source instrumentation
7 PAPI
- Performance Application Programming Interface
- The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.
- Parallel Tools Consortium project
- Developed by University of Tennessee, Knoxville
- http://icl.cs.utk.edu/papi/
8 S3D - Building with TAU
- Change name of compiler in build/make.XT3
  - ftn => tau_f90.sh
  - cc => tau_cc.sh
- Set compile time environment variables
  - setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    - Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  - setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    - Selective instrumentation file eliminates instrumentation in lightweight routines
    - Pre-process Fortran source code using cpp before compiling
- Set runtime environment variables for instrumentation control and event PAPI counter selection in job submission script
  - export TAU_THROTTLE=1
  - export COUNTER1=GET_TIME_OF_DAY
  - export COUNTER2=PAPI_FP_INS
  - export COUNTER3=PAPI_L1_DCM
  - export COUNTER4=PAPI_RES_STL
  - export COUNTER5=PAPI_L2_DCM
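The runtime settings on this slide follow a regular pattern, so the `export` lines for a job script can be generated from an ordered list of metrics. A minimal sketch, assuming a hypothetical helper `counter_exports` (it is not part of TAU; only the variable names `TAU_THROTTLE` and `COUNTER<n>` and the metric names come from the slide):

```python
def counter_exports(metrics, throttle=True):
    """Return bash `export` lines for the COUNTER<n> variables.

    `metrics` is an ordered list of metric names (for example PAPI
    preset events). Hypothetical helper, not a TAU API.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    if len(set(metrics)) != len(metrics):
        raise ValueError("duplicate metric names")

    lines = []
    if throttle:
        # Setting shown on the slide for instrumentation control.
        lines.append("export TAU_THROTTLE=1")
    # Number the counters from 1 in the order given.
    for index, name in enumerate(metrics, start=1):
        lines.append(f"export COUNTER{index}={name}")
    return lines


if __name__ == "__main__":
    s3d_metrics = [
        "GET_TIME_OF_DAY",
        "PAPI_FP_INS",
        "PAPI_L1_DCM",
        "PAPI_RES_STL",
        "PAPI_L2_DCM",
    ]
    print("\n".join(counter_exports(s3d_metrics)))
```

Generating the lines keeps the counter numbering consistent when a metric is added or removed from the list.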
9 Selective Instrumentation in TAU
cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
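A selective instrumentation file of this shape can also be produced by a small script, for instance when the exclude list comes from an earlier profiling run. The sketch below assumes a hypothetical function `render_select_file`; the section keywords are those shown on the slide, and the `loops routine="#"` directive uses `#` as the wildcard that matches every routine name.

```python
def render_select_file(excluded_routines, instrument_loops=True):
    """Build the text of a TAU selective instrumentation file.

    `excluded_routines` is a list of routine names to leave
    uninstrumented. Hypothetical helper, not shipped with TAU.
    """
    lines = ["BEGIN_EXCLUDE_LIST"]
    lines.extend(excluded_routines)
    lines.append("END_EXCLUDE_LIST")
    if instrument_loops:
        # Request loop-level instrumentation in all routines.
        lines.append("BEGIN_INSTRUMENT_SECTION")
        lines.append('loops routine="#"')
        lines.append("END_INSTRUMENT_SECTION")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A few of the lightweight routines named on the slide.
    print(render_select_file(["MCADIF", "MCEDIF", "MCACON", "CKYTCP"]), end="")
```

The returned text can be written to `select.tau`, the file name passed to `-optTauSelectFile` in the build settings.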
10 TAU's ParaProf Profile Browser - Manager
Derived Metrics: Flops = PAPI_FP_INS / wallclock time
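The derived metric on this slide is a simple ratio of two measured metrics. A minimal sketch of the arithmetic, assuming the wallclock metric is reported in microseconds (an assumption about the time unit, stated here rather than taken from the slide) and using hypothetical function names. Note that PAPI_FP_INS counts floating-point instructions, so the result is an instruction rate that the slide labels Flops.

```python
def flops(fp_instructions, wallclock_usec):
    """Floating-point instructions per second.

    `fp_instructions` is a PAPI_FP_INS count; `wallclock_usec` is the
    wallclock time in microseconds (assumed unit).
    """
    if wallclock_usec <= 0:
        raise ValueError("wallclock time must be positive")
    # instructions per microsecond, scaled to instructions per second
    return fp_instructions / wallclock_usec * 1e6


def mflops(fp_instructions, wallclock_usec):
    """Same rate expressed in millions of instructions per second."""
    return flops(fp_instructions, wallclock_usec) / 1e6


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(mflops(fp_instructions=3_500_000, wallclock_usec=2_000_000))
```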
11 Main Window - 8 cpus (MPI Ranks 0-7)
Some routines execute on different sets of processors
12 Mean Profile Over 8 cpus -- Exclusive Time
13 Mean Percentage -- Exclusive Time
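The mean and percentage views can be understood as two small reductions over per-rank data. The sketch below is an illustration, not ParaProf's implementation: it assumes one dictionary of exclusive times per rank, treats a routine that does not run on a rank as contributing zero, and uses hypothetical routine names.

```python
def mean_exclusive(profiles):
    """Mean exclusive time per routine over all ranks.

    `profiles` holds one dict per rank mapping routine name to
    exclusive time. Routines absent on a rank count as zero
    (an assumption of this sketch).
    """
    if not profiles:
        raise ValueError("need at least one rank profile")
    routines = set()
    for profile in profiles:
        routines.update(profile)
    ranks = len(profiles)
    return {
        name: sum(profile.get(name, 0.0) for profile in profiles) / ranks
        for name in routines
    }


def percent_of_total(mean_times):
    """Express each routine's mean exclusive time as a percentage."""
    total = sum(mean_times.values())
    if total <= 0:
        raise ValueError("total exclusive time must be positive")
    return {name: 100.0 * t / total for name, t in mean_times.items()}


if __name__ == "__main__":
    # Two ranks with hypothetical routines; ROUTINE_B runs on rank 0 only.
    per_rank = [
        {"ROUTINE_A": 6.0, "ROUTINE_B": 2.0},
        {"ROUTINE_A": 2.0},
    ]
    print(percent_of_total(mean_exclusive(per_rank)))
```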
14 Loop Level Profile With PAPI Counter Data
15 ParaProf's Source Browser
16 Exclusive MFLOPS
17 FP Instructions per L1 Data Cache Miss (rank 0)
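The ratio in this view divides one counter by another for each routine; low values point at routines that do little floating-point work per L1 data cache miss. A minimal sketch, assuming per-routine counter values are already available as pairs and using a hypothetical function name and made-up numbers:

```python
def fp_per_l1_miss(counters):
    """Rank routines by FP instructions per L1 data cache miss.

    `counters` maps routine name to (PAPI_FP_INS, PAPI_L1_DCM).
    Returns (routine, ratio) pairs, lowest ratio first. A routine
    with zero misses gets an infinite ratio and sorts last.
    """
    ratios = []
    for routine, (fp_ins, l1_dcm) in counters.items():
        ratio = float("inf") if l1_dcm == 0 else fp_ins / l1_dcm
        ratios.append((routine, ratio))
    return sorted(ratios, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical routines and counts, for illustration only.
    sample = {
        "ROUTINE_A": (100, 10),
        "ROUTINE_B": (50, 25),
        "ROUTINE_C": (5, 0),
    }
    for routine, ratio in fp_per_l1_miss(sample):
        print(routine, ratio)
```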
18 Level 1 Data Cache Misses
19 Callpath Profiles
20 Callpath Profiles: Flops, Resource Stalls
21 Callpath Thread Relations Window
parent
routine
children
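The parent, routine, and children relations shown in this window can be derived from callpath event names. The sketch below assumes callpath names written as routine names joined by `=>` (the form TAU uses for callpath events) and a hypothetical function `callpath_relations`; the routine names in the usage example are invented.

```python
from collections import defaultdict


def callpath_relations(callpaths):
    """Derive parent and child sets from callpath names.

    Each entry in `callpaths` is a string such as "A => B => C".
    Returns two dicts: routine -> set of parents, and
    routine -> set of children.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for path in callpaths:
        frames = [frame.strip() for frame in path.split("=>")]
        # Each adjacent pair is one caller/callee edge.
        for caller, callee in zip(frames, frames[1:]):
            children[caller].add(callee)
            parents[callee].add(caller)
    return dict(parents), dict(children)


if __name__ == "__main__":
    paths = [
        "MAIN => SOLVE => RHS",
        "MAIN => WRITE_OUTPUT",
    ]
    parents, children = callpath_relations(paths)
    print(sorted(children["MAIN"]))
    print(sorted(parents["RHS"]))
```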
22 Flat Profile
23 TAU's ParaProf Profile Browser - Manager
Different sections of code within the same routine execute on odd and even processors!
24 3D Window: Rank, Routine, Time, Instructions
25 3D Window: Variations in FP/L1 DCM ratios
26 Getting Access to TAU on Jaguar
- set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
- Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  - Makefile.tau-mpi-pdt-pgi (flat profile)
  - Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  - Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
- Binaries of S3D can be found in
  - sameer/scratch/S3D-BINARIES
    - withtau
      - papi, multiplecounters, mpi, pdt, pgi options
    - without_tau