Title: Itanium 2 Profiling Tools: Performance monitoring events Pfmon (Open Source) Intel Vtune Analyzer
1Itanium 2 Profiling Tools Performance
monitoring events Pfmon (Open Source) Intel
Vtune Analyzer
- Arthur Raefsky
- raefsky_at_sgi.com
2Agenda
- Performance Monitoring Events
- Based on a talk by David Levinthal of Intel, presented at Fall IDF.
- Pfmon/Profile.pl
- VTune
3Overview
- Profiling Tools
- The Intel Vtune Performance Analyzer
- Collects, analyzes, and displays performance data
for Windows and Linux systems
- Applications (both single- and multi-threaded)
- System-wide Profile
- No special build required
- Very low overhead
- Linux remote sampling
4Overview
- Profiling Tools
- Pfmon Performance Analyzer
- Collects, analyzes, and displays performance data
for Linux systems
- Applications (both user and kernel level)
- System-wide Profile
- No special build required
- Very low overhead
- Pfmon will be shipped with all SNIA systems
5Optimization Guide
6Itanium 2 Architecture
7Performance Monitoring Events
8Performance Monitoring Events
9Performance Monitoring Events
10Performance Monitoring Events
11Performance Monitoring Events
12Performance Monitoring Events
13Performance Monitoring Events
14Performance Monitoring Events
15Performance Monitoring Events
16Performance Monitoring Events
17Performance Monitoring Events
18Performance Monitoring Events
19Performance Monitoring Events
pfmon --smpl-outfile=sample.out \
      --smpl-entries=100000 \
      -u --short-smpl-periods=9958 \
      --smpl-output-format=detailed-itanium2 \
      --events=DATA_EAR_CACHE_LAT8 ./bar
Sample entry in the file sample.out:
entry 1 PID:9133 CPU:1 STAMP:0xb6fd88b06dbf IIP:0x40000000000035c0
  PMD OVFL: 4
  PMD2 : 0x200000000042c448
  PMD3 : 0x0000000000004009, valid Y, latency 9, overflow N
  PMD17: 0x4000000000003608, valid Y, bundle 0, address 0x4000000000003600
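Each sample from this kind of run carries a latency and an address, so one quick way to see where slow loads cluster is to tally those two fields. The Python sketch below does that with regular expressions. It assumes only that each entry contains a `latency N` field followed by an `address 0x...` field, as in the sample entry; the real layout of sample.out may differ between pfmon versions, and the function name `summarize_samples` is made up for illustration.

```python
import re
from collections import defaultdict

# Field patterns taken from the sample entry text. The exact layout of
# sample.out is an assumption and may differ between pfmon versions.
LATENCY_RE = re.compile(r"latency\s+(\d+)")
ADDRESS_RE = re.compile(r"address\s+(0x[0-9a-fA-F]+)")


def summarize_samples(text):
    """Map each reported address to the list of latencies seen with it.

    Assumes every entry carries exactly one 'latency' field followed by
    one 'address' field, as in the sample entry.
    """
    latencies = [int(v) for v in LATENCY_RE.findall(text)]
    addresses = ADDRESS_RE.findall(text)
    by_address = defaultdict(list)
    for addr, lat in zip(addresses, latencies):
        by_address[addr].append(lat)
    return dict(by_address)


# The two relevant lines of the sample entry, with field separators omitted.
sample = """
PMD3 0x0000000000004009, valid Y, latency 9, overflow N
PMD17 0x4000000000003608, valid Y, bundle 0, address 0x4000000000003600
"""

if __name__ == "__main__":
    for addr, lats in summarize_samples(sample).items():
        print(addr, "samples:", len(lats), "max latency:", max(lats))
```

Sorting the resulting addresses by sample count or by maximum latency gives a first list of places to inspect in the disassembly.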
20Performance Monitoring Events
21Performance Monitoring Events
22Performance Monitoring Events
23Pfmon / Profile .pl
- Profile.pl
- Written by Ray Bryant
- Profile.pl is a Perl script that provides a
simple way to do procedure-level profiling of an
unmodified binary on an SDV or SN2 system.
- The simplest way to use these scripts is as
follows:
  profile.pl -c0-3 -x6 test_program
- In this case, it is assumed that test_program
uses 4 processes. The 4 processes will be bound to
processors 0-3 (via dplace) and the program will
be profiled under control of pfmon. The profile
event will be CPU_CYCLES and the PMU will be set
up to generate approximately 1,000 interrupts per
second.
- The profile.pl script will create a map file
(using makemap.pl) for test_program and put it
into test_program.map.
- The profile samples themselves will go into
sample.out. The analyzed profile will
go into profile.out.
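To make the "approximately 1,000 interrupts per second" figure concrete: with CPU_CYCLES as the sampling event, the number of events between interrupts is roughly the clock rate divided by the target rate. The Python sketch below shows that arithmetic only. It is not taken from profile.pl, the function name `sampling_period` is hypothetical, and the 1 GHz clock is an assumed value.

```python
def sampling_period(clock_hz, samples_per_second=1000):
    """Cycles between PMU overflow interrupts for a target sample rate.

    Assumes the sampling event is CPU_CYCLES, which counts once per
    processor clock cycle.
    """
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    return clock_hz // samples_per_second


if __name__ == "__main__":
    # Hypothetical 1 GHz processor; the clock rate is an assumption.
    print(sampling_period(1_000_000_000))
```

A shorter period gives more samples per second at the cost of more interrupt overhead.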
24Pfmon / Profile .pl example
25Pfmon / Profile .pl example
26Pfmon / Profile .pl profile.out
27Pfmon / Profile .pl profile.out
Understanding _shell_207_par_loop5:
  _functionName_Line_par_loopXX
Find function shell, then go to line 207.
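The naming pattern above can be decoded mechanically when a profile contains many such entries. The Python sketch below splits a name of the form `_functionName_Line_par_loopXX` into its parts. The helper name `decode_parallel_loop_name` is hypothetical, and the pattern covers only the `_par_loop` form shown on this slide.

```python
import re

# Pattern for names of the form _functionName_Line_par_loopXX.
# Only the "_par_loop" form shown on the slide is handled.
NAME_RE = re.compile(
    r"^_(?P<function>.+)_(?P<line>\d+)_par_loop(?P<index>\d+)$"
)


def decode_parallel_loop_name(symbol):
    """Return (function, line, loop_index), or None if the name does not match."""
    match = NAME_RE.match(symbol)
    if match is None:
        return None
    return (
        match.group("function"),
        int(match.group("line")),
        int(match.group("index")),
    )


if __name__ == "__main__":
    print(decode_parallel_loop_name("_shell_207_par_loop5"))
```

For `_shell_207_par_loop5` this yields the function `shell`, source line 207, and loop index 5.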
28Pfmon / Profile .pl (OpenMP)
On a SNIA system, if test_program is an OpenMP
program, then you need to specify the "-x6"
option as well (to get dplace to ignore the two
shepherd processes that the OpenMP library
creates):
  profile.pl -c0-3 -x6 test_program
Program arguments can be supplied as follows:
  profile.pl -c0-3 -x6 test_program arg1 arg2 arg3 etc
To make input or output redirection apply to
test_program only, you need to put quotes around
the program name and its redirections as follows:
  profile.pl -c0-3 -x6 "test_program <input >output"
Otherwise the redirection applies to profile.pl
instead, which is probably not what you wanted.
29Pfmon / Profile .pl (MPI)
To use MPI with profile.pl:
  mpirun -np 4 /usr/bin/profile.pl -c0-3 -s1 ./blast_waves < input
30Pfmon Profile.out (kernel)
31Pfmon Profile.out (User)
32Pfmon / Profile .pl (MPI)
To use MPI with profile.pl:
  mpirun -np 4 /usr/bin/profile.pl -K -c0-3 -s1 ./blast_waves < input
-K: keep the separate per-CPU sample files
around and produce a separate profile report for
each CPU.
33Pfmon / Profile.pl (List of commands)
34Pfmon / Profile .pl example 1
Start the application:
  mpirun -np 4 ./blastwave < input
Run top to get the PIDs of the processes you
want to profile. When the application has reached
the point where you want to start profiling, issue
the command profile.pl with these options:
  -T secs     Run a timed profile experiment for the
              given number of seconds.
  -P name     Program name.
  -c0-3
  -L pidlist  pidlist is a comma-separated list of
              PIDs (containing no blanks). This list
              will be passed to the profile analyzer
              and will restrict profiling to these
              process IDs.
35Pfmon / Profile .pl example 1
profile.pl -c0-3 -P blast_waves -T 120 -L 1527,1528,1529,1530
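Since the PID list must be comma separated with no blanks, it can be convenient to assemble the command from a list of PIDs rather than typing it. The Python sketch below builds the argument list for the invocation shown on this slide. It uses only the options that appear there (-c, -P, -T, -L), and the function name `build_profile_command` is hypothetical.

```python
def build_profile_command(cpus, program, seconds, pids):
    """Build a profile.pl argument list for a timed, PID-restricted profile.

    Only options that appear on the slide are used: -c, -P, -T and -L.
    """
    # -L expects a comma-separated list with no blanks.
    pidlist = ",".join(str(pid) for pid in pids)
    return [
        "profile.pl",
        f"-c{cpus}",
        "-P", program,
        "-T", str(seconds),
        "-L", pidlist,
    ]


if __name__ == "__main__":
    cmd = build_profile_command("0-3", "blast_waves", 120,
                                [1527, 1528, 1529, 1530])
    print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run` on a system where profile.pl is installed.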
36Pfmon / Profile .pl example 2
ecc -o barkern -O3 -ftz ./barkern44.c
ecc -o bar -O3 -ftz -mP3OPT_ecg_mm_fp_ld_latency=16 ./barkern44.c
37Pfmon / Profile .pl example 2
Barkern
   6122528008  BACK_END_BUBBLE_ALL
   6080404237  BE_EXE_BUBBLE_ALL
       642645  BE_FLUSH_BUBBLE_ALL
     41149543  BE_L1D_FPU_BUBBLE_ALL
  12383933977  CPU_CYCLES
  BE_EXE_BUBBLE_ALL / CPU_CYCLES = .49
Bar
   2654275323  BACK_END_BUBBLE_ALL
   2567404906  BE_EXE_BUBBLE_ALL
       615358  BE_FLUSH_BUBBLE_ALL
     85854187  BE_L1D_FPU_BUBBLE_ALL
   9035670097  CPU_CYCLES
  BE_EXE_BUBBLE_ALL / CPU_CYCLES = .28
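The two ratios on this slide can be reproduced directly from the raw counts. The Python sketch below computes the fraction of cycles lost to execution-stage bubbles for each build, along with the ratio of total cycles between the two builds. The dictionary layout and the function name `exe_bubble_fraction` are illustrative choices, not part of pfmon.

```python
# Counter values for the two builds, as listed on the slide.
counts = {
    "barkern": {"BE_EXE_BUBBLE_ALL": 6080404237, "CPU_CYCLES": 12383933977},
    "bar": {"BE_EXE_BUBBLE_ALL": 2567404906, "CPU_CYCLES": 9035670097},
}


def exe_bubble_fraction(counters):
    """Fraction of all cycles attributed to BE_EXE_BUBBLE_ALL."""
    return counters["BE_EXE_BUBBLE_ALL"] / counters["CPU_CYCLES"]


def cycle_ratio(before, after):
    """Total cycles of the first build divided by those of the second."""
    return before["CPU_CYCLES"] / after["CPU_CYCLES"]


if __name__ == "__main__":
    for name, counters in counts.items():
        print(f"{name}: {exe_bubble_fraction(counters):.2f}")
    print(f"cycle ratio: {cycle_ratio(counts['barkern'], counts['bar']):.2f}")
```

The fractions come out to .49 and .28, matching the slide, and the build with the longer assumed floating-point load latency needs about 1.37 times fewer cycles.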
38Pfmon / Profile .pl example 2
39Pfmon / Profile .pl example 2
// Block 9 lentry lexit ltail collapsed pipelined Pred 9 8 Succ 9 10 -S
// Freq 1.2e05, Prob 0.99
.b3_9: // emit lab 1
  .mfi
    (p16) ldfd   f32=[r15],8        //0115 1207
    (p17) fma.d  f41=f48,f37,f42    //8118 1215
          nop.i  0
  .mfi
    (p16) ldfd   f36=[r14],8        //0115 1208
    (p17) fma.d  f45=f33,f51,f46    //8119 1217
          nop.i  0
40Pfmon / Profile .pl example 2
  .mfi
          nop.m  0
    (p16) fma.d  f34=f32,f36,f35    //6115 1209
          nop.i  0
  .mfi
          nop.m  0
    (p17) fma.d  f38=f48,f52,f39    //14130 1230
          nop.i  0
41Pfmon / Profile .pl example 2
42Pfmon / Profile .pl example 2
ecc -o barkern -O3 -ftz ./barkern44.c
ecc -o bar -O3 -ftz -mP3OPT_ecg_mm_fp_ld_latency=16 ./barkern44.c
43Guideview
To get the profiling statistics for OpenMP, use
the following compiler options:
  -O3 -openmp -openmp_profile
This will cause the linker to use
libguide_stats.a instead of libguide.a. For example:
  efc -O3 -openmp -openmp_profile -o swim swim.f
To get the profiling data you simply run the
program. For example:
  export OMP_NUM_THREADS=8
  ./swim < swim.in
Once the program has finished, a file named
swim.gvs will be produced.
44Guideview
Without Java, the functionality of Guideview is
severely limited, but text output is still
available and is useful. The graphical portions
of Guideview require Java. Java 1.1.6-8 and
Java 1.2.2 versions are supported. Later versions
seem to work also. To invoke guideview:
  guideview -jpath=/root/java/j2sdk1.4.1/bin/java -mhz=998 ./swim.gvs
NOTE: a beta version of guideview is now
available for Vtune.
45Guideview Main panel
46Guideview Region View
47Guideview Thread View
48lipfpm and histx
lipfpm does not work on statically linked
applications. The correct invocation for MPI is:
  mpirun -np N lipfpm <lipfpm args including "-f"> a.out <a.out args>
histx does not work on statically linked
executables. The correct invocation for MPI is:
  mpirun -np N histx <histx args including "-f"> a.out <a.out args>
When using dplace on OpenMP codes, the correct
invocation is:
  dplace <dplace args including "-x13"> histx <histx args, "-f" not required> \
    a.out <a.out args>
49Vtune Main Panel
50Vtune Pick View As Table
51Vtune Edit Menu, Pick Filter
52Vtune To Drill down, Click on process ID
53Vtune