PAPI and Dynaprof - PowerPoint PPT Presentation

About This Presentation
Title:

PAPI and Dynaprof


Slides: 56
Provided by: csU6
Learn more at: https://icl.utk.edu

Transcript and Presenter's Notes

Title: PAPI and Dynaprof


1
PAPI and Dynaprof
  • Application Signatures and Performance Analysis
    of Scientific Applications
  • Philip J. Mucci
  • Innovative Computing Laboratory, UTK
  • Performance Evaluation Research Center, LBL
  • mucci_at_cs.utk.edu
  • http://icl.cs.utk.edu/mucci/dynaprof/snapshots/sc2002.ppt

2
Goals
  • Understanding the behavior of the application
  • Identification of bottlenecks.
  • Usage of the hardware resources.
  • Effects of that usage on performance.
  • Using Dynaprof to achieve that goal
  • Command line usage
  • 3 Dynaprof probes
  • Wallclock Time
  • Hardware performance counters
  • Resource usage traces

3
Motivation
  • Optimize the application's performance.
  • Evaluate the algorithm's efficiency.
  • Generate an application signature.
  • A collection of data that represent the major
    terms in the performance model.
  • Develop a performance model.

4
Overview of Hardware Counters
  • Data is NOT PORTABLE, but PAPI is...
  • Small number of registers dedicated for
    performance monitoring functions.
  • AMD Athlon, 4 counters
  • Pentium III and earlier, 2 counters
  • Pentium IV, 18 counters
  • IA64, 4 counters
  • Alpha 21x64, 2 counters
  • Power 3, 8 counters
  • Power 4, 8 counters to a group
  • UltraSparc II, 2 counters
  • MIPS R14K, 2 counters

5
Applications used in this Tutorial
  • Serial
  • FSPX: A binary alloy solidification benchmark.
  • SWIM: The SPEC shallow water benchmark.
  • Parallel (MPI)
  • Ex19 from the PETSc distribution.
  • Solves a nonlinear driven cavity problem with
    multigrid, using a 2D velocity-vorticity
    formulation.

6
FSPX Execution Environment
  • Intel PIII, 1.2 GHz
  • FP Results/Clock: 1 (1.2 Gflips)
  • 4 SP/clk with SSE, 2 DP/clk with SSE2
  • Caches: 16K/16K, 256K
  • g77 version 2.96
  • -g -O -malign-double -mpentiumpro -funroll-loops
    -fexpensive-optimizations
  • Execution time
  • > /bin/time fspx
  • 115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w
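The percentage field of the csh/tcsh `time` output is CPU time (user + system) divided by elapsed wall time. A minimal C sketch of that arithmetic, reading the FSPX elapsed field as 1:58.17, i.e. 118.17 s (the function name `cpu_percent` is hypothetical, not part of any tool):

```c
#include <assert.h>

/* Percentage of elapsed wall time the process spent on the CPU:
 * 100 * (user + system) / elapsed, as in the shell's `time` report. */
double cpu_percent(double user_s, double sys_s, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return 100.0 * (user_s + sys_s) / elapsed_s;
}
```

For FSPX, `cpu_percent(115.370, 0.030, 118.17)` gives about 97.66, agreeing to within a tenth of a percent with the 97.6% shown: the run is almost entirely CPU-bound.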

7
swim Execution Environment
  • IBM Nighthawk, 16-way Power 3, 375MHz
  • FP Results/Clock: 4 (1.5 Gflips)
  • Caches: 32K/64K, 8MB
  • MPI over TCP/IP via switch
  • xlc 5.0.2.1 built with -g -O3 -qstrict
    -qarch=pwr3 -qtune=pwr3
  • Execution time
  • > /bin/time poe swim -procs 2
  • 0.4u 0.0s 0:15 3% 2173933k 0+0io 1pf+0w
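The peak rates in parentheses follow from multiplying floating-point results per clock by the clock rate: 4 × 375 MHz = 1.5 × 10⁹ results/s on the Power 3, and 1 × 1.2 GHz = 1.2 × 10⁹ on the Pentium III. A small C sketch of that calculation (the function name is hypothetical):

```c
#include <assert.h>

/* Theoretical peak floating-point rate in Gflip/s:
 * (FP results per clock) * (clock rate in MHz) / 1000. */
double peak_gflips(double results_per_clock, double clock_mhz)
{
    assert(results_per_clock > 0.0 && clock_mhz > 0.0);
    return results_per_clock * clock_mhz / 1000.0;
}
```

`peak_gflips(4.0, 375.0)` returns 1.5 and `peak_gflips(1.0, 1200.0)` returns 1.2. Measured rates from the PAPI probe can be compared against these ceilings.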

8
ex19 Execution Environment
  • IBM Nighthawk, 16-way Power 3, 375MHz
  • FP Results/Clock: 4 (1.5 Gflips)
  • Caches: 32K/64K, 8MB
  • xlc 5.0.2.1 built with -g
  • Execution time
  • > /bin/time poe ex19 -procs 2 -da_grid_x 56
    -da_grid_y 56
  • 0.520u 0.200s 0:44.18 1.6% 2973580k 0+0io 0pf+0w

9
Gprof
  • Samples the program counter at each timer
    interrupt and builds a histogram of time vs.
    text address.
  • Recompile with the -pg option.
  • A gprof profile is useful for a high-level overview.
  • Does it tell us why?
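A simplified C model of what a flat profile is built from: each timer interrupt records a program counter, and each sample is attributed to the function whose text range contains it. The function table, addresses and sampling interval are hypothetical, and real gprof also uses call counts from compiler-inserted code. The result says where time went, not why, which is exactly the limitation the slide raises.

```c
#include <assert.h>
#include <stddef.h>

/* One function's text range [start, end) (hypothetical addresses). */
struct func_range {
    const char *name;
    unsigned long start;
    unsigned long end;
    unsigned long samples;   /* timer samples that landed in this range */
};

/* Attribute each sampled program counter to the function containing it.
 * Returns the number of samples that fell outside every known range. */
size_t attribute_samples(struct func_range *funcs, size_t nfuncs,
                         const unsigned long *pcs, size_t npcs)
{
    size_t unknown = 0;
    for (size_t i = 0; i < npcs; i++) {
        size_t f;
        for (f = 0; f < nfuncs; f++) {
            if (pcs[i] >= funcs[f].start && pcs[i] < funcs[f].end) {
                funcs[f].samples++;
                break;
            }
        }
        if (f == nfuncs)
            unknown++;
    }
    return unknown;
}

/* Estimated self time: number of samples times the sampling interval. */
double self_seconds(const struct func_range *f, double interval_s)
{
    assert(interval_s > 0.0);
    return (double)f->samples * interval_s;
}
```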

10
Gprof Profile of FSPX
11
FSPX Top 4 Functions
  • The top 4 functions make up 50% of execution time
  • In module update.F
  • flux
  • proflux
  • pde
  • In module phase.F
  • phase
  • Use the list command to explore modules and
    functions

12
Gprof Profile of SWIM
13
Gprof Profile of ex19
14
Dynaprof Environment Variables
  • LD_LIBRARY_PATH: Colon-separated list of directories
    to search for shared libraries. It must cover the
    directories containing:
  • DynInst library
  • PAPI library
  • Any dependencies of the above (libperfctr.so,
    libcpc.so)
  • DYNINSTAPI_RT_LIB: Full pathname of the DynInst
    runtime library.
  • No settings necessary for AIX/DPCL port

15
Running Dynaprof
  • Usage
  • dynaprof -d serial_application
  • -d enables debugging output
  • Specifying an application automatically loads it
    into the tool immediately after initialization.

16
Command Line Interface
  • Uses GNU Readline library for input
  • Full featured Command Line Editing
  • File and command completion <Tab>
  • History <Up>/<Down>
  • Settings, macros and aliases in ~/.inputrc
  • Allows Emacs or VI style bindings
  • set editing-mode emacs
  • set editing-mode vi
  • See the man page, Texinfo file or home page.

17
Load command
  • Starts the application and stops it at the first
    instruction.
  • Usage
  • load <application> args
  • > dynaprof
  • (dynaprof) load tests/fpsx

18
Poeload command
  • For use with MPI applications on AIX and DPCL.
  • DPCL < 3.2.5 requires the full path
  • Usage
  • poeload <application> args
  • (dynaprof) poeload tests/swim -procs 2

19
Mpiload command
  • For use with MPI applications.
  • Stops the application after it calls PMPI_Init().
  • Mostly useful for script-driven execution of MPI
    jobs
  • Usage
  • mpiload <application> args
  • (dynaprof) mpiload tests/mpicount

20
Attach command
  • Attaches to a running application (or poe
    process) and stops it.
  • Usage
  • attach <application> <pid>
  • (dynaprof) ^Z
  • > tests/fspx &
  • [2] 17500
  • > fg
  • (dynaprof) attach tests/fspx 17500

21
Poeattach Command
  • For use with MPI applications on AIX and DPCL.
  • DPCL < 3.2.5 requires the full path
  • Usage
  • poeattach <application> <pid_of_poe>
  • (dynaprof) ^Z
  • > poe ex19 -da_grid_x 56 -da_grid_y 56 -procs 2 &
  • [2] 17500
  • > fg
  • (dynaprof) poeattach ex19 17500

22
List command
  • list
  • List all modules in process
  • list <pattern>
  • List all matching modules
  • list <module>
  • List all functions in module
  • list <module> <pattern>
  • List all matching functions in module
  • list <module> <function>
  • List instrumentable points in function

23
Exploring FSPX
  • G77's Fortran Runtime support
  • Code compiled with g77 without -g
  • ends up in the DEFAULT_MODULE
  • Application Code
  • Shared libraries

24
Exploring FSPX 2
  • G77's Fortran Runtime support
  • Code compiled with g77 without -g
  • ends up in the DEFAULT_MODULE

25
Exploring FSPX 3
Function Calls
26
Use command
  • Loads a probe shared library into the address space
  • (dynaprof) use <probe> args
  • Use by itself displays the current probe.
  • To change options, respecify the probe.
  • 4 probes in this release
  • Wallclock: Real-time clock
  • PAPI: Hardware metrics
  • Perfometer: Real-time visualization of streaming hardware metrics

27
Instr command
  • instr
  • list all instrumented functions
  • instr module <pattern> arg
  • Instrument all functions in modules matching
    pattern
  • instr function <module> <pattern> arg
  • Instrument all functions matching pattern in
    module

28
Threads and Dynaprof Probes
  • For threaded code, use the same probe!
  • Dynaprof detects threads and loads a special
    version of the probe library.
  • Each probe specifies what to do when a new thread
    is discovered.
  • Each thread gets the same instrumentation.

29
Probe Warning
  • Instrumentation is not free.
  • Consider granularity of region being measured.
  • Overhead for PAPI 2.3 is O(100) cycles.
  • Between 500 and 2000 cycles for a 2 counter read.
  • Overhead for Wallclock is O(100) cycles.
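To make the granularity warning concrete: the relative slowdown is roughly the probe cost per call divided by the region's own cost per call. A small C sketch of that arithmetic; treating the slide's 2000-cycle figure for a 2-counter read as the whole per-call probe cost is an assumption for illustration, and the function names are hypothetical.

```c
#include <assert.h>

/* Relative slowdown from instrumentation: cycles the probe adds to each
 * call divided by the uninstrumented cycles of one call. */
double relative_overhead(double probe_cycles_per_call, double region_cycles_per_call)
{
    assert(region_cycles_per_call > 0.0);
    return probe_cycles_per_call / region_cycles_per_call;
}

/* Smallest region (in cycles) whose relative overhead stays at or below
 * max_fraction for the given per-call probe cost. */
double min_region_cycles(double probe_cycles_per_call, double max_fraction)
{
    assert(max_fraction > 0.0);
    return probe_cycles_per_call / max_fraction;
}
```

With these numbers, a 1000-cycle function behind a 2000-cycle probe has a relative overhead of 2.0 (it runs about three times as long), while keeping overhead under 1% needs regions of roughly 200,000 cycles or more.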

30
Wallclock Probe
  • High resolution, low latency timer
  • Usage
  • use wallclockprobe
  • Reports time in microseconds (1.0×10⁻⁶ s).
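For comparison, a sketch of taking microsecond wall-clock readings around a region in C, assuming a POSIX system with `clock_gettime(CLOCK_MONOTONIC)`. This is not the wallclock probe's own implementation, which may use a platform-specific timer; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict C modes */
#include <assert.h>
#include <time.h>

/* Current monotonic time in microseconds, or -1 if the clock is unavailable. */
long long now_usec(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Elapsed microseconds between two readings; a monotonic clock never runs backwards. */
long long elapsed_usec(long long start, long long end)
{
    assert(start >= 0 && end >= start);
    return end - start;
}
```

Typical use is `t0 = now_usec();` before the region and `elapsed_usec(t0, now_usec())` after it.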

31
PAPI Probe
  • Count PAPI Presets or Native Events
  • Usage
  • use papiprobe event,event,...
  • Default argument is PAPI_FP_INS, or
    PAPI_TOT_INS if the architecture doesn't support
    PAPI_FP_INS.
  • Available events can be listed with
  • papi_avail -a
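The probe argument is a comma-separated list of event names. A sketch of splitting such a list and applying the default described on this slide; the parser and its limits (16 names of up to 63 characters) are hypothetical and not taken from Dynaprof's source, and whether PAPI_FP_INS is supported is passed in as a flag rather than queried from PAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 16
#define MAX_NAME   64

/* Split "PAPI_TOT_CYC,PAPI_TOT_INS" into separate names. An empty or NULL
 * list falls back to PAPI_FP_INS, or PAPI_TOT_INS if FP_INS is unsupported.
 * Returns the number of names stored, or -1 if the list or a name is too long. */
int parse_event_list(const char *arg, int fp_ins_supported,
                     char names[MAX_EVENTS][MAX_NAME])
{
    assert(names != NULL);
    if (arg == NULL || *arg == '\0') {
        strcpy(names[0], fp_ins_supported ? "PAPI_FP_INS" : "PAPI_TOT_INS");
        return 1;
    }
    int n = 0;
    const char *p = arg;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len > 0) {                      /* skip empty entries such as "a,,b" */
            if (n == MAX_EVENTS || len >= MAX_NAME)
                return -1;
            memcpy(names[n], p, len);
            names[n][len] = '\0';
            n++;
        }
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return n;
}
```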

32
PAPI Probe and Multiplexing
  • Requesting more events than there are physical
    counters automatically enables multiplexing.
  • A minimum runtime of instrumented regions must be
    observed, so that every virtual counter gets a
    chance to run at least once.
  • run-time_min = num_events × 0.01 s
  • Automatic warning functionality is being rolled
    into PAPI.
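Reading the rule of thumb on this slide as run-time_min = num_events × 0.01 s (one 10 ms slice per multiplexed event), a C sketch of the check; the slice constant and function names are assumptions for illustration, not PAPI settings.

```c
#include <assert.h>

/* Time slice per multiplexed event assumed by the rule of thumb (10 ms). */
#define MPX_SLICE_S 0.01

/* Minimum runtime of an instrumented region so that each of num_events
 * virtual counters is scheduled on the hardware at least once. */
double min_region_runtime_s(int num_events)
{
    assert(num_events > 0);
    return num_events * MPX_SLICE_S;
}

/* 1 if a region that ran for runtime_s seconds satisfies the rule, else 0. */
int region_is_long_enough(double runtime_s, int num_events)
{
    return runtime_s >= min_region_runtime_s(num_events);
}
```

With 4 events, regions shorter than about 0.04 s risk reporting counts for events that never got a turn on the hardware.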

33
PAPI Native Events
  • Look in the PAPI distribution
  • See the README file for your architecture in the
    src directory
  • See the example program tests/native.c in the
    src/tests directory

34
Power 3 Events
35
Power 3 Events 2
36
Power 4 Events
37
Pentium III Events
38
Intel Pentium IV Events
(Arguments to perfex -e from PerfCtr distribution)
39
Sun UltraSparc II Events
40
Sun UltraSparc III Events
41
MIPS R12K Events
42
Alpha/DADD 21264 Events
43
Perfometer Probe
  • Sends a stream of performance data every N
    seconds to the Perfometer GUI.
  • Functions can be colored at instrumentation time.
  • Default color is white, 0xFFFFFF
  • Usage
  • use perfometerprobe 0xRRGGBB
  • instr <args> <0xRRGGBB>
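A minimal C sketch of decoding a 0xRRGGBB argument into red, green and blue components. How the probe itself parses colors is not shown on the slides, so the struct and function names are hypothetical; `strtoul` with base 16 accepts the optional `0x` prefix.

```c
#include <assert.h>
#include <stdlib.h>

struct rgb { unsigned char r, g, b; };

/* Parse "0xRRGGBB" (e.g. the default white, 0xFFFFFF).
 * Returns 0 on success, -1 if the string is not a 24-bit hex value. */
int parse_rgb(const char *s, struct rgb *out)
{
    char *end;
    assert(s != NULL && out != NULL);
    unsigned long v = strtoul(s, &end, 16);
    if (end == s || *end != '\0' || v > 0xFFFFFFUL)
        return -1;
    out->r = (unsigned char)((v >> 16) & 0xFF);
    out->g = (unsigned char)((v >> 8) & 0xFF);
    out->b = (unsigned char)(v & 0xFF);
    return 0;
}
```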

44
Perfometer Probe 2
  • Perfometer GUI is NOT launched automatically.
  • showrgb in X11 lists colors and names.
  • Run the Java GUI
  • java -jar Perfometer.jar
  • Connect to the specified hostname and port.

45
Instrumenting SWIM with perfometerprobe
46
Instrumenting FSPX for Instructions Per Cycle
47
Instrumenting SWIM for Instructions Per Cycle
48
Reporting Probe Data
  • The wallclock and PAPI probes produce very
    similar data.
  • Both use a parsing script written in Perl.
  • wallclockrpt ltfilegt
  • papiproberpt ltfilegt
  • Produce 3 profiles
  • Inclusive: T_function = T_self + T_children
  • Exclusive: T_function = T_self
  • 1-Level Call Tree: T_child = inclusive T_function
    of each child
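A sketch of how the inclusive and exclusive figures relate, over a hypothetical in-memory call tree. The report scripts themselves are Perl and parse the probe output files; this C struct is only an illustration of the two formulas.

```c
#include <assert.h>
#include <stddef.h>

/* A call-tree node: time spent in the function body itself (T_self)
 * plus the nodes it called (hypothetical layout). */
struct call_node {
    const char *name;
    double self;
    const struct call_node **children;
    size_t nchildren;
};

/* Exclusive time: T_function = T_self. */
double exclusive_time(const struct call_node *n)
{
    assert(n != NULL);
    return n->self;
}

/* Inclusive time: T_function = T_self + T_children, where T_children
 * sums the inclusive time of every child. */
double inclusive_time(const struct call_node *n)
{
    assert(n != NULL);
    double t = n->self;
    for (size_t i = 0; i < n->nchildren; i++)
        t += inclusive_time(n->children[i]);
    return t;
}
```

A function that spends 0.5 s in its own body and calls children with inclusive times of 2.0 s and 1.0 s has an exclusive time of 0.5 s and an inclusive time of 3.5 s.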

49
Fspx Cycles Instrs.
50
fspx IPC
  • proflux 0.61
  • phase 0.63
  • flux 0.49
  • pde 0.46
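The IPC values here are instructions completed divided by cycles, taken from two counters (for example PAPI_TOT_INS and PAPI_TOT_CYC) read over the same instrumented function. A minimal C sketch; the function name and the counts in the example are hypothetical.

```c
#include <assert.h>

/* Instructions per cycle from two counter values read over the same region. */
double ipc(long long instructions, long long cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}
```

Hypothetical counts of 610 instructions over 1000 cycles give `ipc(610, 1000)` ≈ 0.61, the value reported for proflux: on average it completes fewer than one instruction per cycle.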
51
Swim Cycles Instrs.
52
Swim IPC
  • calc2 0.59
  • calc1 0.53
  • calc3 0.46
53
Perfometer Screenshot
54
Dynaprof 0.8 SC Release
  • Binary distributions for 4 platforms on the
    website
  • AIX 3.x / DPCL 3.2.5 on Power 3
  • Linux / DynInst 3.0 on Pentium III and earlier
  • Solaris 2.8 / DynInst 3.0 on UltraSparc II/III
  • IRIX / DynInst 3.0 on MIPS R10/12/14k
  • Power 4 and Pentium 4 are coming...
  • Xdynaprof Java/Swing GUI included
  • perfometerprobe and GUI included
  • Updated documentation

55
References
  • The Dynaprof Homepage
  • http://www.cs.utk.edu/mucci/dynaprof
  • The PAPI Homepage
  • http://icl.cs.utk.edu/projects/papi
  • The DynInst Homepage
  • http://www.dyninst.org
  • The DPCL Homepage
  • http://oss.software.ibm.com/developerworks/opensource/dpcl
  • The Vprof Homepage
  • http://aros.ca.sandia.gov/cljanss/perf/vprof
  • The GNU Readline Homepage
  • http://cnswww.cns.cwru.edu/chet/readline/rltop.html