Dynaprof Evaluation Report

1
Dynaprof Evaluation Report

Adam Leko,
Hans Sherburne
UPC Group
HCS Research Laboratory
University of Florida

Color encoding key: Blue = information, Red = negative note, Green = positive note
2
Basic Information

Name: Dynaprof
Developer: Philip Mucci (UTK)
Current versions:
Dynaprof CVS as of 2/21/2005
DynInst API v4.1.1 (dependency)
PAPI v3.0.7 (dependency)
Website: http://www.cs.utk.edu/mucci/dynaprof/
Contact:
Philip Mucci (mucci@cs.utk.edu)

3
Dynaprof Overview
DynaProf 0.9, Philip J. Mucci, mucci@cs.utk.edu, 2000-2003
Provided courtesy of UTK's Innovative Computing Laboratory.
See http://icl.cs.utk.edu for more information.
This is Open Source Software!
(dynaprof)

Merges existing tools
PAPI
DynInst API
Command-line tool
Dynamically instruments programs at runtime
Requires no recompilation!
Insert probes at runtime
Metrics available
Wall clock time
Any PAPI metrics
Can be extended
Only simple GUI available (see right)
Just wrapper around command-line version
Currently pretty broken

4
Instrumentation Overview

Instrumentation very easy
Especially for sequential/threaded applications
Compile application regularly (-g eases naming later)
gcc -O3 -g -o camel camel.c
Dynaprof commands
Load the exe
load camel
Specify which probe you wish to use
use papiprobe args
List available functions
list camel.c
Instrument command
All functions in a file: instr module camel.c
A single function: instr function camel.c main
Run command
continue
pauses execution (currently does not work)
Instrumentation output is written to an additional file (its location is printed at runtime)
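
The command sequence on this slide can be collected into a command file. The sketch below only builds the text of such a file from the commands shown above. The helper name `build_dynaprof_script`, the metric `PAPI_FP_INS` passed as the probe argument, and the use of `continue` as the final command (it is the only command the slide lists under "Run command") are assumptions for illustration, not verified Dynaprof behavior.

```python
def build_dynaprof_script(exe, module, probe, probe_args="", functions=None):
    """Return the Dynaprof command lines for one instrumented run.

    Hypothetical helper: it only formats the commands listed on the slide
    (load, use, instr module/function, continue) and does not run Dynaprof.
    """
    lines = [f"load {exe}", f"use {probe} {probe_args}".strip()]
    if functions:
        # Instrument selected functions only
        lines += [f"instr function {module} {fn}" for fn in functions]
    else:
        # Instrument every function in the module
        lines.append(f"instr module {module}")
    lines.append("continue")  # assumption: taken from the slide's "Run command" entry
    return lines


if __name__ == "__main__":
    script = build_dynaprof_script(
        "camel", "camel.c", "papiprobe", "PAPI_FP_INS", functions=["main"]
    )
    print("\n".join(script))
```

Keeping the commands in a generated file makes it cheap to change which functions are instrumented between runs, since no recompilation is involved.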

5
Instrumentation Overview (2)

No special commands needed for
sequential applications
pthread applications
MPI not supported directly through command line
Wrapper scripts available for MPICH and LAM
Dynaprof must be run in batch mode
A file containing all instrumentation commands
Halts the app before MPI_Init() is called
However, this does not work with the current version of MPICH
Dynaprof hits an assertion failure and stops working
Can only use MPI programs with 1 process
UPC?
Tried
GCC-UPC
BUPC (smp pthreads)
Both produced no output or crashed Dynaprof

6
Instrumentation Overhead

Could only instrument one-process MPI code
MPI run wrapper script broken
No PPerf apps! (all require more than 1 process)
Camel overhead very high
Only instrumented main
LU overhead really low?
Possible causes of overhead
Frequent subroutine calls from main
Use of tsc.h processor counters for timers confuses Dynaprof
Expect overhead similar to Paradyn
5-10% for most applications with a reasonable number of instrumentation points
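
Overhead figures like these are usually expressed as the relative increase in run time of the instrumented program over the uninstrumented one. A minimal sketch of that calculation follows; the timings in the usage example are made-up values, not measurements from this evaluation.

```python
def overhead_percent(baseline_s, instrumented_s):
    """Relative slowdown of the instrumented run, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (instrumented_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Hypothetical timings: 100 s uninstrumented, 107.5 s instrumented
    print(f"{overhead_percent(100.0, 107.5):.1f}% overhead")
```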

7
Dynaprof Probe Information

Probes perform all data collection and analysis
Provide code to insert into a function when it is instrumented
Probes can be called at 4 different points
Function entry point
Function exit point
Function call point
Function return point
Each probe is encapsulated in a shared library
Allows relatively easy creation of new probes
Available probes
Wallclock probe (records wall clock time)
PAPI wallclock probe (same as wallclock, but uses high-resolution timers)
PAPI probe (records any PAPI metric, such as FLOPs)
Specify PAPI metrics as args in the "use papiprobe args" command
Existing probes provide profile-style data only
Although there is no reason that a trace could not also be collected
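
The idea of a probe that is invoked at function entry and exit can be illustrated with a small Python analogy. This is not how Dynaprof works internally (it patches a running binary through DynInst and loads probes from shared libraries); the class `WallclockProbe` and the decorator `instrument` are hypothetical names used only to show entry/exit hooks collecting profile-style data.

```python
import functools
import time
from collections import defaultdict


class WallclockProbe:
    """Collects call counts and inclusive wall clock time per function."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total = defaultdict(float)
        self._starts = []  # stack of entry timestamps for active calls

    def on_entry(self, name):
        self.calls[name] += 1
        self._starts.append(time.perf_counter())

    def on_exit(self, name):
        self.total[name] += time.perf_counter() - self._starts.pop()


def instrument(probe):
    """Wrap a function so the probe runs at its entry and exit points."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            probe.on_entry(fn.__name__)
            try:
                return fn(*args, **kwargs)
            finally:
                probe.on_exit(fn.__name__)
        return wrapper
    return decorate


probe = WallclockProbe()


@instrument(probe)
def inner():
    return sum(range(1000))


@instrument(probe)
def outer():
    return inner() + inner()


if __name__ == "__main__":
    outer()
    for name in probe.calls:
        print(name, probe.calls[name], f"{probe.total[name]:.6f}s")
```

Because the probe only accumulates counters, it produces a profile. Appending a timestamped record in `on_entry` and `on_exit` instead would turn the same hooks into a trace.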

8
Probe Output

After running, an ASCII file containing raw data is created
At runtime, a message like "output will be in /home/leko/" is printed, indicating where the file will be written
Three programs are provided to analyze the raw data
wallclockrpt for the wall clock probe
papiclockrpt for the PAPI wall clock probe
papiproberpt for the PAPI probe
Summary statistics are provided
Exclusive profile (metric collected excluding children)
Inclusive profile (metric collected including children)
1-call level deep profile (see which functions an instrumented function called)
Output from the rpt programs is simple ASCII (sample on next page)
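
The difference between the exclusive and inclusive profiles can be shown with a small worked example. The sketch below assumes a hypothetical list of enter/exit events with timestamps; it is not the format of Dynaprof's raw data file, and the function name `profile` is made up for illustration.

```python
from collections import defaultdict


def profile(events):
    """Compute inclusive and exclusive time per function.

    events: list of (kind, name, timestamp) with kind "enter" or "exit",
    properly nested (a hypothetical format, not Dynaprof's raw output).
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    stack = []  # entries are [name, start_time, time_spent_in_children]
    for kind, name, t in events:
        if kind == "enter":
            stack.append([name, t, 0.0])
            continue
        top_name, start, child_time = stack.pop()
        if top_name != name:
            raise ValueError(f"unbalanced events: exit {name} inside {top_name}")
        elapsed = t - start
        inclusive[name] += elapsed                # includes children
        exclusive[name] += elapsed - child_time   # excludes children
        if stack:
            stack[-1][2] += elapsed               # charge elapsed time to the parent
    return dict(inclusive), dict(exclusive)


events = [
    ("enter", "main", 0), ("enter", "f", 2), ("exit", "f", 5),
    ("enter", "g", 6), ("exit", "g", 7), ("exit", "main", 10),
]

if __name__ == "__main__":
    inc, exc = profile(events)
    print("inclusive:", inc)
    print("exclusive:", exc)
```

Here main runs for 10 time units in total, 4 of which are spent in f and g, so its inclusive time is 10 and its exclusive time is 6.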

9
Sample Probe Report (lu.W.1)
leko@eta-1 dynaprof$ wallclockrpt lu-1.wallclock.16143

Exclusive Profile.

Name             Percent      Total        Calls
-------------    -------      -----        -------
TOTAL            100          1.436e+11    1
unknown          100          1.436e+11    1
main             3.837e-06    5511         1

Inclusive Profile.

Name             Percent      Total        SubCalls
-------------    -------      -----        -------
TOTAL            100          1.436e+11    0
main             100          1.436e+11    5

1-Level Inclusive Call Tree.

Parent/-Child    Percent      Total        Calls
-------------    -------      -----        --------
TOTAL            100          1.436e+11    1
main             100          1.436e+11    1
- f_setarg.0     1.414e-05    2.03e+04     1
- f_setsig.1     1.324e-05    1.902e+04    1
- f_init.2       2.569e-05    3.691e+04    1
- atexit.3       7.042e-06    1.012e+04    1
- MAIN__.4       0            0            1

Note: only main was instrumented in this profiled run
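
As a reading aid, the Percent column in the sample report is each row's Total divided by the TOTAL row, times 100. The sketch below recomputes it from the values printed in the report. The computed values agree with the reported ones to about three significant digits, since the printed TOTAL is itself rounded. The helper name `percent_of_total` is hypothetical.

```python
TOTAL = 1.436e11  # TOTAL row of the sample report

# (Total, reported Percent) for the children of main in the 1-level call tree
children = {
    "f_setarg.0": (2.03e04, 1.414e-05),
    "f_setsig.1": (1.902e04, 1.324e-05),
    "f_init.2": (3.691e04, 2.569e-05),
    "atexit.3": (1.012e04, 7.042e-06),
}


def percent_of_total(value, total=TOTAL):
    """Share of the total metric attributed to one row, in percent."""
    return value / total * 100.0


if __name__ == "__main__":
    for name, (value, reported) in children.items():
        print(f"{name:12s} computed {percent_of_total(value):.3e}  reported {reported:.3e}")
```
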
10
Bottleneck Identification Test Suite

Testing metric: what did the output of the probe tell us?
CAMEL: FAILED
Instrumenting main caused too much application perturbation
NAS LU (W workload): TOSS-UP
Given enough time, any bottleneck could be identified
Even cache miss problems, thanks to PAPI!
But how much time would it take to identify bottlenecks?
Communication problems difficult/impossible to pinpoint
No tracing
No communication visualization
PPerfMark tests: NOT TESTED
Could not evaluate PPerfMark suite (running MPI commands is broken)
However, the same comments as for LU would probably apply to all
In general,
Heavily reliant on the user's proficiency with pinpointing problems
Incremental approach
Instrument, re-run, instrument w/PAPI, re-run
Process can be tedious
But ease of instrumentation does help with this

11
Dynaprof General Comments

Good points
Free
Source code available, relatively organized
Good reference on how to use PAPI and the DynInst API
Very easy to use
Relatively easy to extend
Developer very responsive to questions
Not-so-good points
High instrumentation overhead in a few cases
Simple to understand, but not much functionality available
Only profiling data with current probes
Not really being updated much any more
Changing program arguments requires reloading and reinstrumenting the executable
Dynaprof illustrates that a tool doesn't have to be ultra-complicated to be useful
KISS!

12
Adding UPC/SHMEM Support

UPC support
Would need to do a ton of work
Best bet
Provide a UPC probe
Instrument known UPC runtime functions
GASNet functions for Berkeley UPC
Etc.
Need one probe per UPC runtime/compiler environment
SHMEM support
No extra work necessary!
Handles instrumenting libraries like any other code

However, a few potential problems
Reliance on DynInst
Hard to port
Hard to compile!
Reliance on PAPI
Can add own probes which do not use PAPI though
Best way to use Dynaprof
Steal ideas on how to make a tool extensible
Probes as shared libraries: nice idea!
Steal code on how to use DynInst and PAPI

13
Evaluation (1)

Available metrics: 1/5
Can use PAPI to get lots of data
Limited in what you can collect in a single run; only
Two PAPI metrics, or
Wall clock time
Cost: 5/5
Free
Documentation quality: 4/5
Minimal documentation, but covers the basics pretty well
Extensibility: 3.5/5
Open source
Can add new functionality by writing new probes
Must write new code to extend (not much existing functionality)
Filtering and aggregation: 2/5
Most program data is filtered out for you
Direct result of the profile nature of current probes
Many times too much information is lost
Filtering and aggregation behavior is fixed in the source code of the probes
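
The limit of two PAPI metrics per run, noted under available metrics, means that collecting a longer list of metrics takes several runs. A minimal sketch of splitting a metric list into runs follows. The function name `plan_runs` is hypothetical, the metric names are standard PAPI presets chosen only as an example, and whether a given pair can actually be counted together depends on the hardware counters available.

```python
def plan_runs(metrics, per_run=2):
    """Split a list of metric names into groups of at most per_run."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    return [metrics[i:i + per_run] for i in range(0, len(metrics), per_run)]


wanted = ["PAPI_TOT_CYC", "PAPI_TOT_INS", "PAPI_FP_INS", "PAPI_L1_DCM", "PAPI_L2_DCM"]

if __name__ == "__main__":
    for number, group in enumerate(plan_runs(wanted), start=1):
        print(f"run {number}: {', '.join(group)}")
```

Five metrics therefore need three runs, which is part of why the incremental instrument-and-re-run approach can become tedious.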

14
Evaluation (2)

Hardware support: 3/5
64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
Most everything supported: Linux, AIX, IRIX, HP-UX
Reliance on PAPI and DynInst could hinder porting
No Cray support
Heterogeneity support: 0/5 (not supported)
Installation: 3/5
Dynaprof easy to compile, but
PAPI and DynInst a nightmare to install
Also had to hack up some source code a bit to work with newer versions of gcc and javac (JDK1.5)
Interoperability: 0.5/5
No export interoperability with other tools
There is a half-done TAU probe
Not sure if it works
Or how useful it is!
Learning curve: 4/5
Very easy to use
Anyone used to prof/gprof will feel right at home

15
Evaluation (3)

Manual overhead: 3/5
Can automatically instrument all functions, a handful of functions, and all function calls within a given function
Very easy to choose which functions you want instrumented
Can script behavior of dynaprof executable
Reinstrumenting requires no recompilation
Measurement accuracy: 5/5
For LU, tracing overhead almost negligible using PAPI probes
Tracing overhead small as long as the number of instrumented functions is kept reasonable
Program's correctness of execution not affected
Dynamic instrumentation does not get in the compiler's way for optimizations
Multiple executions: 0/5
Not supported
Multiple analyses/views: 1/5
One way of recording data, one way of presenting it
Probes could theoretically present things differently, but none currently do

16
Evaluation (4)

Performance bottleneck identification: 1/5
No automatic detection
Usefulness of the tool is directly related to the cleverness of the user
Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
Profiling/tracing support: 2/5
Only supports profiling
Could feasibly add tracing if you wanted to code it
Response time: 3/5
No data at all until after the run has completed and the tracefile has been opened
Generating reports from raw data is instantaneous, though
Software support: 4.5/5
Can link against (and instrument!!) any existing library
Supports MPI (although broken) and shared-memory threaded programs
Source code correlation: 2/5
Data reported to the user at the function name level
Searching: 0/5 (not supported)