Title: Dynaprof Evaluation Report
1. Dynaprof Evaluation Report
- Adam Leko
- Hans Sherburne
- UPC Group
- HCS Research Laboratory
- University of Florida
Color encoding key: blue = information, red = negative note, green = positive note
2. Basic Information
- Name: Dynaprof
- Developer: Philip Mucci (UTK)
- Current versions
  - Dynaprof CVS as of 2/21/2005
  - DynInst API v4.1.1 (dependency)
  - PAPI v3.0.7 (dependency)
- Website: http://www.cs.utk.edu/mucci/dynaprof/
- Contact
  - Philip Mucci (mucci_at_cs.utk.edu)
3. Dynaprof Overview
DynaProf 0.9
Philip J. Mucci, mucci_at_cs.utk.edu, 2000-2003
Provided courtesy of UTK's Innovative Computing Laboratory.
See http://icl.cs.utk.edu for more information.
This is Open Source Software!
(dynaprof)
- Merges existing tools
  - PAPI
  - DynInst API
- Command-line tool
- Dynamically instruments programs at runtime
  - Requires no recompilation!
  - Inserts probes at runtime
- Metrics available
  - Wall clock time
  - Any PAPI metrics
  - Can be extended
- Only a simple GUI is available (pictured on the original slide)
  - Just a wrapper around the command-line version
  - Currently pretty broken
4. Instrumentation Overview
- Instrumentation very easy
  - Especially for sequential/threaded applications
- Compile application regularly (-g eases naming later)
  - gcc -O3 -g -o camel camel.c
- Dynaprof commands
  - Load the executable
    - load camel
  - Specify which probe you wish to use
    - use papiprobe args
  - List available functions
    - list camel.c
  - Instrument command
    - All functions in a file: instr module camel.c
    - A single function: instr function camel.c main
  - Run command
    - continue
    - Pausing execution currently does not work
- Instrumentation output is produced in an additional file (its location is shown at runtime)
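The command sequence above lends itself to scripting. The following is a minimal sketch, not part of Dynaprof, that assembles those commands into the text of a command file. The helper name `build_command_script` and its parameters are hypothetical; the command strings are copied from this slide and have not been checked against the Dynaprof documentation, and the syntax of the probe arguments is left to the caller.

```python
def build_command_script(executable, probe, targets, probe_args=None):
    """Return the text of a command file using the commands listed on the slide.

    targets is a list of (module, function) pairs; a function of None
    means "instrument every function in that module".
    """
    lines = [f"load {executable}"]

    # "use <probe> <args>": the argument syntax is not specified on the slide,
    # so any arguments are passed through untouched.
    use_line = f"use {probe}"
    if probe_args:
        use_line += f" {probe_args}"
    lines.append(use_line)

    for module, function in targets:
        if function is None:
            lines.append(f"instr module {module}")
        else:
            lines.append(f"instr function {module} {function}")

    # Run command, as named on the slide.
    lines.append("continue")
    return "\n".join(lines) + "\n"


# Same example as the slide: instrument only main() in camel.c.
script = build_command_script("camel", "papiprobe", [("camel.c", "main")])
print(script)
```

Keeping the commands in a generated file makes it easy to re-run the same instrumentation after changing only the list of targets.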
5. Instrumentation Overview (2)
- No special commands needed for
  - Sequential applications
  - pthread applications
- MPI not supported directly through command line
  - Wrapper scripts available for MPICH and LAM
  - Dynaprof must be run in batch mode
    - A file containing all instrumentation commands
    - Halts the app before MPI_Init() is called
  - However, not working with current version of MPICH
    - Get assertion failure and stops working
    - Can only use MPI programs with 1 process
- UPC?
  - Tried
    - GCC-UPC
    - BUPC (smp pthreads)
  - Both produced no output or crashed Dynaprof
6. Instrumentation Overhead
- Could only instrument one-process MPI code
  - MPI run wrapper script broken
  - No PPerf apps! (all require more than 1 process)
- Camel overhead very high
  - Only instrumented main
- LU overhead really low?
- Possible causes of overhead
  - Frequent subroutine calls from main
  - Use of tsc.h processor counters for timers confuses Dynaprof
- Expect overhead similar to Paradyn
  - 5-10% for most applications with a reasonable number of instrumentation points
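The slides do not state how overhead was computed. A common definition, assumed here, is the relative increase in run time of the instrumented program over the uninstrumented one. The function name and the timings below are hypothetical and are not measurements from this evaluation.

```python
def overhead_percent(baseline_seconds, instrumented_seconds):
    """Relative slowdown of the instrumented run, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return 100.0 * (instrumented_seconds - baseline_seconds) / baseline_seconds


# Hypothetical timings: 20.0 s uninstrumented, 21.5 s instrumented.
print(f"{overhead_percent(20.0, 21.5):.1f}% overhead")
```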
7. Dynaprof Probe Information
- Probes perform all data collection and analysis
  - Provide code to insert into a function when instrumented
- Probes can be called at 4 different points
  - Function entry point
  - Function exit point
  - Function call point
  - Function return point
- Each probe is encapsulated in a shared library
  - Allows relatively easy creation of new probes
- Available probes
  - Wallclock probe (records wall clock time)
  - PAPI wallclock probe (same as wallclock, uses high-resolution timers)
  - PAPI probe (records any PAPI metric, such as FLOPs)
    - Specify PAPI metrics as args in the use papiprobe args command
- Existing probes provide profile-style data only
  - Although there is no reason that a trace could not also be collected
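To illustrate what "profile-style" means for a probe, here is a rough Python analogue of a wall clock probe with entry and exit hooks. It is only a sketch of the idea: real Dynaprof probes are shared libraries inserted through DynInst, and the class, method, and function names here are hypothetical. Only two of the four call points (entry and exit) are modeled, and a fake clock is used so the example is deterministic.

```python
import time
from collections import defaultdict


class WallclockProfileProbe:
    """Profile-style probe: keeps only per-function totals and call counts."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)  # inclusive time per function
        self.calls = defaultdict(int)     # number of completed calls
        self._stack = []                  # (name, start time) of active calls

    def on_entry(self, name):
        self._stack.append((name, self.clock()))

    def on_exit(self, name):
        started_name, start = self._stack.pop()
        assert started_name == name, "unbalanced entry/exit"
        # A tracing probe would append a timestamped event here instead
        # of folding the interval into a running total.
        self.totals[name] += self.clock() - start
        self.calls[name] += 1


def instrument(probe, func):
    """Wrap func so the probe fires at its entry and exit points."""
    def wrapper(*args, **kwargs):
        probe.on_entry(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            probe.on_exit(func.__name__)
    return wrapper


# Fake clock that advances by one tick per reading.
ticks = iter(range(100))
probe = WallclockProfileProbe(clock=lambda: next(ticks))


def work():
    return 42


work = instrument(probe, work)
work()
work()
print(dict(probe.calls), dict(probe.totals))
```

Because only totals and counts are stored, the order and timing of individual calls is lost, which is the trade-off between profiling and tracing noted above.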
8. Probe Output
- After running, an ASCII file containing raw data is created
  - At runtime, a message like "output will be in /home/leko/" is printed, indicating where the file will be
- Three programs are provided which analyze the raw data
  - wallclockrpt for wall clock probe
  - papiclockrpt for PAPI wall clock probe
  - papiproberpt for PAPI probe
- Summary statistics are provided
  - Exclusive profile (metric collected excluding children)
  - Inclusive profile (metric collected including children)
  - 1-call level deep profile (see which functions an instrumented function called)
- Output from rpt programs is simple ASCII (sample on the next slide)
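The relationship between the exclusive and inclusive profiles can be sketched with simple arithmetic: a function's exclusive value is its inclusive value minus the inclusive values of its direct children. This is an illustration only, not how the rpt programs are implemented. It assumes every function is called from a single parent, and the function names and numbers are hypothetical.

```python
def exclusive_from_inclusive(inclusive, children):
    """Compute exclusive values from inclusive ones.

    inclusive: {function name: metric including children}
    children:  {function name: list of directly called functions}
    """
    exclusive = {}
    for name, total in inclusive.items():
        child_total = sum(inclusive[c] for c in children.get(name, []))
        exclusive[name] = total - child_total
    return exclusive


# Hypothetical call tree: main -> solve -> kernel, main -> io.
inclusive = {"main": 10.0, "solve": 7.0, "kernel": 5.0, "io": 2.0}
children = {"main": ["solve", "io"], "solve": ["kernel"]}

for name, value in exclusive_from_inclusive(inclusive, children).items():
    print(f"{name:8s} inclusive={inclusive[name]:5.1f} exclusive={value:5.1f}")
```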
9. Sample Probe Report (lu.W.1)
leko_at_eta-1 dynaprof wallclockrpt lu-1.wallclock.16143

Exclusive Profile.
Name             Percent      Total       Calls
-------------    -------      -----       -------
TOTAL            100          1.436e11    1
unknown          100          1.436e11    1
main             3.837e-06    5511        1

Inclusive Profile.
Name             Percent      Total       SubCalls
-------------    -------      -----       -------
TOTAL            100          1.436e11    0
main             100          1.436e11    5

1-Level Inclusive Call Tree.
Parent/-Child    Percent      Total       Calls
-------------    -------      -----       --------
TOTAL            100          1.436e11    1
main             100          1.436e11    1
  - f_setarg.0   1.414e-05    2.03e04     1
  - f_setsig.1   1.324e-05    1.902e04    1
  - f_init.2     2.569e-05    3.691e04    1
  - atexit.3     7.042e-06    1.012e04    1
  - MAIN__.4     0            0           1

Note: only main was instrumented in this profiled run.
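As a sanity check on how to read the report, the Percent column appears to be each row's Total divided by the TOTAL row, times 100. The sketch below recomputes it for the child rows of the call tree using the values printed above. The interpretation is an assumption, and small differences from the printed percentages are expected because the report shows rounded totals.

```python
def percent_of_total(value, total):
    """Share of the overall total, in percent."""
    return 100.0 * value / total


total = 1.436e11  # TOTAL row of the sample report

# Total column of the child rows in the 1-level call tree.
child_totals = {
    "f_setarg.0": 2.03e04,
    "f_setsig.1": 1.902e04,
    "f_init.2": 3.691e04,
    "atexit.3": 1.012e04,
}

for name, value in child_totals.items():
    print(f"{name:12s} {percent_of_total(value, total):.3e}")
```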
10. Bottleneck Identification Test Suite
- Testing metric: what did the output of the probe tell us?
- CAMEL: FAILED
  - Instrumenting main caused too much application perturbation
- NAS LU (W workload): TOSS-UP
  - Given enough time, any bottleneck could be identified
    - Even cache miss problems, thanks to PAPI!
  - But how much time to identify bottlenecks?
  - Communication problems difficult/impossible to pinpoint
    - No tracing
    - No communication visualization
- PPerfMark tests: NOT TESTED
  - Could not evaluate PPerfMark suite (running MPI commands broken)
  - However, same comments for LU would probably apply to all
- In general,
  - Heavily reliant on user's proficiency with pinpointing problems
  - Incremental approach
    - Instrument, re-run, instrument w/PAPI, re-run
    - Process can be tedious
    - But the ease of instrumentation makes this less painful
11. Dynaprof General Comments
- Good points
  - Free
  - Source code available, relatively organized
  - Good reference on how to use PAPI and DynInst API
  - Very easy to use
  - Relatively easy to extend
  - Developer very responsive to questions
- Not-so-good points
  - High instrumentation overhead in a few cases
  - Simple to understand, but not much available functionality
    - Only profiling data with current probes
  - Not really being updated much any more
  - Changing program arguments requires reloading and reinstrumenting the executable
- Dynaprof illustrates that a tool doesn't have to be ultra-complicated to be useful
  - KISS!
12. Adding UPC/SHMEM Support
- UPC support
  - Would need to do a ton of work
  - Best bet
    - Provide a UPC probe
    - Instrument known UPC runtime functions
      - GASNet functions for Berkeley UPC
      - Etc.
    - Need one probe per UPC runtime/compiler environment
- SHMEM support
  - No extra work necessary!
  - Handles instrumenting libraries like any other code
- However, a few potential problems
  - Reliance on DynInst
    - Hard to port
    - Hard to compile!
  - Reliance on PAPI
    - Can add own probes which do not use PAPI though
- Best way to use Dynaprof
  - Steal ideas on how to make tool extensible
    - Probes as shared libraries: nice idea!
  - Steal code on how to use DynInst and PAPI
13. Evaluation (1)
- Available metrics: 1/5
  - Can use PAPI to get lots of data
  - Limited in what you can collect in a single run, only
    - Two PAPI metrics or
    - Wall clock time
- Cost: 5/5
  - Free
- Documentation quality: 4/5
  - Minimal documentation, but covers the basics pretty well
- Extensibility: 3.5/5
  - Open source
  - Can add new functionality by writing new probes
  - Must write new code to extend (not much existing functionality)
- Filtering and aggregation: 2/5
  - Most program data is filtered out for you
    - Direct result of profile nature of current probes
    - Many times too much information is lost
  - Filtering and aggregation behavior fixed in source code of probes
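The limit of two PAPI metrics per run, noted under "Available metrics", means that collecting a longer list of counters requires several runs. A minimal sketch of planning those runs follows. The helper `plan_runs` is hypothetical, and the event names are standard PAPI preset names chosen for illustration; whether each one is available depends on the platform.

```python
def plan_runs(metrics, per_run=2):
    """Split a list of metrics into groups of at most per_run, one group per run."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    return [metrics[i:i + per_run] for i in range(0, len(metrics), per_run)]


# Illustrative list of PAPI preset events to collect.
wanted = ["PAPI_FP_OPS", "PAPI_L1_DCM", "PAPI_L2_TCM", "PAPI_TOT_CYC", "PAPI_TOT_INS"]

for run_number, group in enumerate(plan_runs(wanted), start=1):
    print(f"run {run_number}: {', '.join(group)}")
```

Five metrics therefore need three separate runs, which is part of why the incremental instrument-and-re-run workflow can become tedious.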
14. Evaluation (2)
- Hardware support: 3/5
  - 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
  - Most everything supported: Linux, AIX, IRIX, HP-UX
  - Reliance on PAPI and DynInst could hinder porting
  - No Cray support
- Heterogeneity support: 0/5 (not supported)
- Installation: 3/5
  - Dynaprof easy to compile, but
  - PAPI and DynInst a nightmare to install
  - Also had to hack up some source code a bit to work with newer versions of gcc and javac (JDK 1.5)
- Interoperability: 0.5/5
  - No export interoperability with other tools
  - There is a half-done TAU probe
    - Not sure if it works
    - Or how useful it is!
- Learning curve: 4/5
  - Very easy to use
  - Anyone used to prof/gprof will feel right at home
15. Evaluation (3)
- Manual overhead: 3/5
  - Can automatically instrument all functions, a handful of functions, and all function calls within a given function
  - Very easy to choose which functions you want instrumented
  - Can script behavior of dynaprof executable
  - Reinstrumenting requires no recompilation
- Measurement accuracy: 5/5
  - For LU, instrumentation overhead almost negligible using PAPI probes
  - Overhead small as long as number of instrumented functions kept reasonable
  - Program's correctness of execution not affected
  - Dynamic instrumentation does not get in the compiler's way for optimizations
- Multiple executions: 0/5
  - Not supported
- Multiple analyses and views: 1/5
  - One way of recording data, one way of presenting it
  - Probes could theoretically present things differently, but none currently do
16. Evaluation (4)
- Performance bottleneck identification: 1/5
  - No automatic detection
  - Usefulness of tool directly related to cleverness of user
  - Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
- Profiling/tracing support: 2/5
  - Only supports profiling
  - Could feasibly add tracing if you wanted to code it
- Response time: 3/5
  - No data at all until after run has completed and tracefile has been opened
  - Generating reports from raw data instantaneous though
- Software support: 4.5/5
  - Can link against (and instrument!!) any existing library
  - Supports MPI (although broken) and shared-memory threaded programs
- Source code correlation: 2/5
  - Data reported to user at the function name level
- Searching: 0/5 (not supported)
17. Evaluation (5)
- System stability: 3/5
  - Command-line interface relatively stable
    - pause while running broken in command-line
  - GUI severely broken
- Technical support: 4/5
  - Responses from contact within 24 hours
  - Philip Mucci very helpful, knowledgeable