Title: PAPI Update
1 PAPI Update
Shirley Browne, Cricket Deane, George Ho, Philip Mucci
browne@cs.utk.edu, gho@cs.utk.edu, mucci@cs.utk.edu
University of Tennessee Computer Science Department
Ptools Annual Meeting 1999
2 Review: Why PAPI?
- Hardware counters exist on every major processor
today. They give performance tool developers a basis
for tool development, and they give application
developers valuable information about sections of
their code that can be improved.
- However, only a few APIs allow access to these
counters, and most of them are poorly documented,
unstable, or unavailable.
- Also, performance metrics may have different
definitions (e.g., graduated vs. speculative counts).
3 PAPI Project Goals
- To provide a lightweight, portable, and
straightforward API to access hardware counters on
major HPC platforms
- To provide a common subset of these performance
metrics on all platforms
4 PAPI Project Goals (cont.)
- To provide application developers with the
information they may need to tune their codes on
different platforms
- To encourage vendors to standardize the interface to
and semantics of the hardware counters
5 PAPI Project Goals (cont.)
- To make it easy to write tools for:
  - Performance analysis
  - Performance modeling
  - Feedback-directed compilation
6 Current Status
- API spec: http://icl.cs.utk.edu/projects/papi/
- R10K, Pentium Pro, and Pentium II substrates nearly
complete
- Null substrate written to help test and debug on
any platform
7 Current Status (cont.)
- Library calls working:
  - PAPI_add_event()   PAPI_read()
  - PAPI_reset()       PAPI_write()
  - PAPI_set_opt()     PAPI_get_opt()
  - PAPI_start()       PAPI_accum()
  - PAPI_stop()
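8 Example program
    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <memory.h>
    #include "papiStdEventDefs.h"
    #include "papi.h"
    #include "papi_internal.h"

    int main(void)
    {
      int r, i;
      double a, b, c;
      unsigned long long ct[3];   /* one slot per event added below */
      int EventSet = PAPI_NULL;
      PAPI_option_t options;

      /* Arguments are reproduced as they appear on the slide; the
         draft API may expect pointers (&EventSet, &options) here. */
      r = PAPI_add_event(EventSet, PAPI_FP_INS);    /* floating-point instructions */
      r = PAPI_add_event(EventSet, PAPI_TOT_INS);   /* total instructions */
      r = PAPI_add_event(EventSet, PAPI_TOT_CYC);   /* total cycles */
      options.domain.eventset = 1;
      options.domain.domain = PAPI_DOM_DEFAULT;
      r = PAPI_set_opt(PAPI_SET_DOMAIN, options);
      r = PAPI_reset(EventSet);
      r = PAPI_start(EventSet);
      a = 0.5; b = 6.2;
      for (i = 0; i < 50000000; i++)   /* measured loop: one multiply per iteration */
        c = a * b;
      r = PAPI_stop(EventSet, ct);     /* stop counting and read the three counts into ct */
      return 0;
    }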
9 Sample session: Linux
    Script started on Wed Apr 14 19:07:40 1999
    gho@redwood/papi/src> uname -a
    Linux redwood.cs.utk.edu 2.0.36 #22 Sun Feb 21 16:57:12 EST 1999 i686 unknown
    gho@redwood/papi/src> make clean
    rm -rf papi.o linux-pentium.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core
    gho@redwood/papi/src> make first
    gcc -g -DDEBUG -Wall -c first.c -o first.o
    gcc -g -DDEBUG -Wall -c papi.c -o papi.o
    gcc -g -DDEBUG -Wall -c linux-pentium.c -o linux-pentium.o
    ar ruv libpapi.a papi.o linux-pentium.o
    a - papi.o
    a - linux-pentium.o
    gcc -g first.o -o first libpapi.a
    gho@redwood/papi/src> first
    DEBUG CPU number 1 at 200 MHZ found
    DEBUG Empty slot for EventSetInfo at 2
    DEBUG PAPI_reset returns 0
    DEBUG PAPI_start returns 0
    DEBUG PAPI_stop values[0] 50000338
    DEBUG PAPI_stop values[1] 350002016
    DEBUG PAPI_stop values[2] 298700134
10 Sample session (cont.): IRIX
    gho@redwood/papi/src> rsh picasso
    gho@picasso> uname -a
    IRIX64 picasso 6.5 05190004 IP28
    gho@picasso> cd papi/src/
    gho@picasso/papi/src> make clean
    rm -rf papi.o irix-mips.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core
    gho@picasso/papi/src> make first
    cc -g -DDEBUG -fullwarn -O0 -c first.c
    cc -g -DDEBUG -fullwarn -O0 -c papi.c
    cc -g -DDEBUG -fullwarn -O0 -c irix-mips.c
    ar ruv libpapi.a papi.o irix-mips.o
    a - papi.o
    a - irix-mips.o
    ar: Warning: creating libpapi.a
    cc -g -O0 first.o -o first libpapi.a
    gho@picasso/papi/src> first
    DEBUG CPU number 1 at 195 MHZ found
    DEBUG PAPI_stop values[0] 49892512
    DEBUG PAPI_stop values[1] 650005635
    DEBUG PAPI_stop values[2] 451049048
    gho@picasso/papi/src> exit
    logout
    gho@redwood/papi/src> exit
    exit
    Script done on Wed Apr 14 19:09:34 1999
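The raw counts from a run like this are usually turned into derived metrics. The following is a minimal sketch of that arithmetic, not part of PAPI. It assumes the three values printed by PAPI_stop follow the order in which the events were added (PAPI_FP_INS, PAPI_TOT_INS, PAPI_TOT_CYC), that the clock rate is the one reported in the DEBUG line, and that one floating-point instruction counts as one floating-point operation. The names derived_metrics, derive_metrics, and print_metrics are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Metrics derived from three raw counts and a clock rate. */
typedef struct {
    double ipc;      /* total instructions per cycle */
    double seconds;  /* counted cycles divided by the clock rate */
    double mflops;   /* millions of FP instructions per second
                        (assumes 1 FP instruction = 1 FLOP) */
} derived_metrics;

/* Convert raw counter values into derived metrics. */
derived_metrics derive_metrics(unsigned long long fp_ins,
                               unsigned long long tot_ins,
                               unsigned long long tot_cyc,
                               double clock_mhz)
{
    derived_metrics m;

    assert(tot_cyc > 0 && clock_mhz > 0.0);
    m.ipc = (double)tot_ins / (double)tot_cyc;
    m.seconds = (double)tot_cyc / (clock_mhz * 1e6);
    m.mflops = (double)fp_ins / m.seconds / 1e6;
    return m;
}

/* Print one line of derived metrics for a labeled run. */
void print_metrics(const char *label, derived_metrics m)
{
    printf("%s: IPC %.3f, %.3f s, %.1f MFLOPS\n",
           label, m.ipc, m.seconds, m.mflops);
}
```

With the Linux values (50000338, 350002016, 298700134 at 200 MHz), derive_metrics gives roughly 1.17 instructions per cycle over about 1.49 seconds of counted cycles. Because the counts cover only the counted domain, the time is counted-cycle time, not wall-clock time.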
11 Specifics
- Overflow
- Multiplexing
- Implementation details
12 Overflow
- If requested, PAPI can notify the user when a
hardware counter exceeds a certain threshold, even
when the kernel or hardware cannot.
- How? A high-resolution interval timer with a
default setting of 1 ms. On each tick, PAPI checks
for overflow and calls the user's handler when necessary.
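The timer-driven check described above can be sketched as follows. This is an illustration under stated assumptions, not PAPI's implementation: the slide does not say which timer or signal is used, so ITIMER_PROF and SIGPROF are assumptions, and every name here (overflow_state, overflow_init, overflow_check, start_tick_timer, record_overflow) is hypothetical. The counter value is passed in by the caller, since reading a real hardware counter is platform specific.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

typedef void (*overflow_handler)(unsigned long long value);

typedef struct {
    unsigned long long threshold;    /* notify each time the count grows by this much */
    unsigned long long next_trigger; /* absolute count at which to notify next */
    overflow_handler handler;        /* user callback */
} overflow_state;

/* Example user handler: records how often and at what value it fired. */
int overflow_calls = 0;
unsigned long long last_overflow_value = 0;

void record_overflow(unsigned long long value)
{
    overflow_calls++;
    last_overflow_value = value;
}

void overflow_init(overflow_state *s, unsigned long long threshold,
                   overflow_handler handler)
{
    assert(s != NULL && threshold > 0 && handler != NULL);
    s->threshold = threshold;
    s->next_trigger = threshold;
    s->handler = handler;
}

/* Called once per timer tick with the current counter value.
   Returns 1 if the user handler was called, 0 otherwise. */
int overflow_check(overflow_state *s, unsigned long long current)
{
    if (current < s->next_trigger)
        return 0;
    s->handler(current);
    /* Move the trigger past the current value so that a single
       tick produces at most one notification. */
    while (s->next_trigger <= current)
        s->next_trigger += s->threshold;
    return 1;
}

/* Arm a periodic timer that delivers SIGPROF to on_tick every
   interval_us microseconds of process CPU time (1000 = 1 ms).
   Returns 0 on success, -1 on failure. */
int start_tick_timer(void (*on_tick)(int), long interval_us)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;
    tv.it_interval.tv_sec = interval_us / 1000000;
    tv.it_interval.tv_usec = interval_us % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}
```

In a real tool the on_tick function would read the hardware counter and pass the value to overflow_check. Because the user handler then runs in signal context, it should restrict itself to async-signal-safe operations. The detection granularity is one tick, so the handler sees the count at the tick after the threshold was crossed, not the exact crossing point.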
13 Multiplexing
- If requested, PAPI can multiplex the hardware
counters, even when the kernel cannot.
- How? A high-resolution interval timer with a
default setting of 1 ms. User programmable.
- Accurate? As accurate as can be: only the active
events are multiplexed. Best in the user domain.
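One common way to turn time-sliced counts into whole-run estimates is to scale each event's count by the fraction of ticks during which it actually held a counter. The slide does not say how PAPI scales its counts, so the sketch below is an assumption that illustrates the general idea with round-robin scheduling. All names (mpx_event, mpx_owner, mpx_record, mpx_estimate) are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned long long raw_count;    /* events counted while this event held the counter */
    unsigned long long active_ticks; /* timer ticks during which it held the counter */
} mpx_event;

/* Round-robin schedule: index of the event that owns the counter
   during the given tick. */
int mpx_owner(unsigned long long tick, int num_events)
{
    assert(num_events > 0);
    return (int)(tick % (unsigned long long)num_events);
}

/* Credit the counts observed during one tick to the owning event. */
void mpx_record(mpx_event *events, int num_events,
                unsigned long long tick, unsigned long long delta)
{
    mpx_event *e = &events[mpx_owner(tick, num_events)];

    e->raw_count += delta;
    e->active_ticks += 1;
}

/* Estimate the count over the whole run by scaling the raw count
   by total_ticks / active_ticks. */
double mpx_estimate(const mpx_event *e, unsigned long long total_ticks)
{
    if (e->active_ticks == 0)
        return 0.0;
    return (double)e->raw_count * (double)total_ticks
           / (double)e->active_ticks;
}
```

The estimate is only as good as the assumption that the event rate during an event's active ticks is representative of the whole run, which is why short runs or bursty events give less reliable results.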
14 PAPI: A first application
- Curtis Janssen's vperf, a graphical (Qt) performance
visualizer.
- Based on bprof. Gives line-by-line profiling.
- All vperf needs is a hash table mapping text addresses
to the number of interrupts at that address. More
interrupts mean more time or events.
- Stay tuned.
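The hash table mentioned above, from text address to interrupt count, might look like the following minimal sketch. It is not vperf's or bprof's actual data structure. The names (prof_table, prof_record, prof_hits, prof_free), the bucket count, and the hash function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PROF_BUCKETS 1024   /* power of two, so the hash can use a mask */

typedef struct prof_entry {
    uintptr_t address;          /* text (code) address where the interrupt landed */
    unsigned long hits;         /* number of interrupts seen at that address */
    struct prof_entry *next;    /* chain for addresses sharing a bucket */
} prof_entry;

typedef struct {
    prof_entry *buckets[PROF_BUCKETS];
} prof_table;

/* Drop the low bits (instructions are usually aligned), then mask. */
static size_t prof_hash(uintptr_t address)
{
    return (size_t)((address >> 2) & (PROF_BUCKETS - 1));
}

/* Record one interrupt at the given address.
   Returns 0 on success, -1 if memory could not be allocated. */
int prof_record(prof_table *t, uintptr_t address)
{
    size_t b;
    prof_entry *e;

    assert(t != NULL);
    b = prof_hash(address);
    for (e = t->buckets[b]; e != NULL; e = e->next) {
        if (e->address == address) {
            e->hits++;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->address = address;
    e->hits = 1;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Number of interrupts recorded at the address (0 if none). */
unsigned long prof_hits(const prof_table *t, uintptr_t address)
{
    const prof_entry *e;

    assert(t != NULL);
    for (e = t->buckets[prof_hash(address)]; e != NULL; e = e->next)
        if (e->address == address)
            return e->hits;
    return 0;
}

/* Release every entry; the table itself belongs to the caller. */
void prof_free(prof_table *t)
{
    size_t b;

    assert(t != NULL);
    for (b = 0; b < PROF_BUCKETS; b++) {
        while (t->buckets[b] != NULL) {
            prof_entry *e = t->buckets[b];
            t->buckets[b] = e->next;
            free(e);
        }
    }
}
```

The table must start zeroed, for example by allocating it with calloc. A profiler that records hits directly from a signal handler would preallocate its entries instead, since malloc is not async-signal-safe.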
15 Next steps
- Substrates (IBM, Linux/EV6, Ultra)
- Overflow (95% complete)
- Multiple nested event sets. (Ans. 2 new substrate
functions)
- Threading issues: safety, portability, accuracy.
(Ans. OpenMP thread library calls and a portable
spin-lock)
16 Related Work
- Rabbit - Don Heller
http://www.scl.ameslab.gov/Projects/Rabbit/
- Perf - Erik Hendriks
ftp://www.beowulf.org/www.beowulf.org/software/perf.html
ported to Linux 2.1.x and 2.2.x by Curtis Janssen
http://aros.ca.sandia.gov/cljanss/perf/
- PCL - Performance Counter Library
- More at http://icl.cs.utk.edu/projects/papi/reference.html
17 More Information
- The draft API is available at
http://icl.cs.utk.edu/projects/papi/
- To join the project's email reflector, send a
message to majordomo@ptools.org with the message
subscribe ptools-PAPI
18 A Parallel Tools Consortium Sponsored Project
http://www.ptools.org/
Work partially funded by the DoD High Performance
Computing Modernization Program, CEWES and ARL Major
Shared Resource Centers, through Programming
Environment and Training (PET).
Views, opinions, and/or findings contained in this
report are those of the author(s) and should not be
construed as an official Department of Defense
position, policy, or decision unless so designated by
other official documentation.