Dynaprof Evaluation Report - PowerPoint PPT Presentation


1
Dynaprof Evaluation Report
  • Adam Leko,
  • Hans Sherburne
  • UPC Group
  • HCS Research Laboratory
  • University of Florida

Color encoding key: Blue = information, Red = negative note, Green = positive note
2
Basic Information
  • Name: Dynaprof
  • Developer: Philip Mucci (UTK)
  • Current versions:
    • Dynaprof CVS as of 2/21/2005
    • DynInst API v4.1.1 (dependency)
    • PAPI v3.0.7 (dependency)
  • Website: http://www.cs.utk.edu/mucci/dynaprof/
  • Contact:
    • Philip Mucci (mucci_at_cs.utk.edu)

3
Dynaprof Overview
DynaProf 0.9 Philip J. Mucci, mucci_at_cs.utk.edu, 2000-2003
Provided courtesy of UTK's Innovative Computing Laboratory.
See http://icl.cs.utk.edu for more information.
This is Open Source Software!
(dynaprof)
  • Merges existing tools:
    • PAPI
    • DynInst API
  • Command-line tool
  • Dynamically instruments programs at runtime
    • Requires no recompilation!
    • Inserts probes at runtime
  • Metrics available:
    • Wall clock time
    • Any PAPI metric
    • Can be extended
  • Only a simple GUI is available (see right)
    • Just a wrapper around the command-line version
    • Currently pretty broken

4
Instrumentation Overview
  • Instrumentation is very easy
    • Especially for sequential/threaded applications
  • Compile the application normally (compiling with -g makes modules and functions easier to name later)
    • gcc -O3 -g -o camel camel.c
  • Dynaprof commands:
    • Load the executable
      • load camel
    • Specify which probe you wish to use
      • use papiprobe args
    • List available functions
      • list camel.c
    • Instrument command
      • All functions in a file: instr module camel.c
      • A single function: instr function camel.c main
    • Run command
      • continue
    • Pausing execution while running (currently does not work)
  • Instrumentation output is produced in an additional file (its location is shown at runtime)

5
Instrumentation Overview (2)
  • No special commands needed for:
    • Sequential applications
    • pthread applications
  • MPI not supported directly through the command line
    • Wrapper scripts available for MPICH and LAM
    • Dynaprof must be run in batch mode
      • A file containing all instrumentation commands
    • Halts the application before MPI_Init() is called
    • However, not working with the current version of MPICH
      • Get an assertion failure, and Dynaprof stops working
      • Can only use MPI programs with 1 process
  • UPC?
    • Tried:
      • GCC-UPC
      • BUPC (SMP pthreads)
    • Both produced no output or crashed Dynaprof

6
Instrumentation Overhead
  • Could only instrument one-process MPI code
    • MPI run wrapper script broken
    • No PPerf apps! (all require >1 process)
  • Camel overhead very high
    • Only instrumented main
  • LU overhead really low?
  • Possible causes of overhead:
    • Frequent subroutine calls from main
    • Use of tsc.h processor counters for timers, which may confuse Dynaprof
  • Expect overhead similar to Paradyn
    • 5-10% for most applications with a reasonable number of instrumentation points

7
Dynaprof Probe Information
  • Probes perform all data collection and analysis
    • Provide code to insert into a function when it is instrumented
    • Probes can be called at 4 different points:
      • Function entry point
      • Function exit point
      • Function call point
      • Function return point
  • Each probe is encapsulated in a shared library
    • Allows relatively easy creation of new probes
  • Available probes:
    • Wallclock probe (records wall clock time)
    • PAPI wallclock probe (same as wallclock, but uses high-resolution timers)
    • PAPI probe (records any PAPI metric, such as FLOPs)
      • Specify PAPI metrics as args in the "use papiprobe args" command
  • Existing probes provide profile-style data only
    • Although there is no reason a trace could not also be collected

8
Probe Output
  • After running, an ASCII file containing raw data is created
    • At runtime, a message like "output will be in /home/leko/" is printed, indicating where the file will be
  • Three programs are provided which analyze the raw data:
    • wallclockrpt for the wallclock probe
    • papiclockrpt for the PAPI wallclock probe
    • papiproberpt for the PAPI probe
  • Summary statistics are provided:
    • Exclusive profile (metric collected excluding children)
    • Inclusive profile (metric collected including children)
    • 1-level-deep call profile (see which functions an instrumented function called)
  • Output from the rpt programs is simple ASCII (sample on next page)

9
Sample Probe Report (lu.W.1)
leko_at_eta-1 dynaprof wallclockrpt lu-1.wallclock.16143

Exclusive Profile.

Name             Percent      Total       Calls
-------------    -------      -----       -----
TOTAL            100          1.436e11    1
unknown          100          1.436e11    1
main             3.837e-06    5511        1

Inclusive Profile.

Name             Percent      Total       SubCalls
-------------    -------      -----       --------
TOTAL            100          1.436e11    0
main             100          1.436e11    5

1-Level Inclusive Call Tree.

Parent/-Child    Percent      Total       Calls
-------------    -------      -----       -----
TOTAL            100          1.436e11    1
main             100          1.436e11    1
 - f_setarg.0    1.414e-05    2.03e04     1
 - f_setsig.1    1.324e-05    1.902e04    1
 - f_init.2      2.569e-05    3.691e04    1
 - atexit.3      7.042e-06    1.012e04    1
 - MAIN__.4      0            0           1

Note: only main was instrumented in this profiled run.
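The Percent column in this report appears to be each row's Total divided by the Total of the TOTAL row, times 100. This relationship is inferred from the printed numbers rather than from Dynaprof documentation. Small last-digit differences are expected because TOTAL is shown rounded. For the f_setarg.0 row:

```latex
\[
  \mathrm{Percent}_f = 100 \times \frac{\mathrm{Total}_f}{\mathrm{Total}_{\mathrm{TOTAL}}}
\]
\[
  \mathrm{Percent}_{\texttt{f\_setarg.0}} \approx 100 \times \frac{2.03 \times 10^{4}}{1.436 \times 10^{11}} \approx 1.414 \times 10^{-5}
\]
```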
10
Bottleneck Identification Test Suite
  • Testing metric: what did the output of the probe tell us?
  • CAMEL: FAILED
    • Instrumenting main caused too much application perturbation
  • NAS LU (W workload): TOSS-UP
    • Given enough time, any bottleneck could be identified
      • Even cache miss problems, thanks to PAPI!
      • But how much time does it take to identify bottlenecks?
    • Communication problems difficult/impossible to pinpoint
      • No tracing
      • No communication visualization
  • PPerfMark tests: NOT TESTED
    • Could not evaluate the PPerfMark suite (running MPI commands broken)
    • However, the same comments for LU would probably apply to all
  • In general:
    • Heavily reliant on the user's proficiency with pinpointing problems
    • Incremental approach
      • Instrument, re-run, instrument w/PAPI, re-run
      • Process can be tedious
      • But ease of instrumentation does ease this

11
Dynaprof General Comments
  • Good points
    • Free
    • Source code available and relatively organized
    • Good reference on how to use PAPI and the DynInst API
    • Very easy to use
    • Relatively easy to extend
    • Developer very responsive to questions
  • Not-so-good points
    • High instrumentation overhead in a few cases
    • Simple to understand, but not much functionality available
    • Only profiling data with current probes
    • Not really being updated much anymore
    • Changing program arguments requires reloading and reinstrumenting the executable
  • Dynaprof illustrates that a tool doesn't have to be ultra-complicated to be useful
    • KISS!

12
Adding UPC/SHMEM Support
  • UPC support
    • Would need to do a ton of work
    • Best bet:
      • Provide a UPC probe
      • Instrument known UPC runtime functions
        • GASNet functions for Berkeley UPC
        • Etc.
      • Need one probe per UPC runtime/compiler environment
  • SHMEM support
    • No extra work necessary!
    • Handles instrumenting libraries like any other code
  • However, a few potential problems:
    • Reliance on DynInst
      • Hard to port
      • Hard to compile!
    • Reliance on PAPI
      • Can add own probes which do not use PAPI, though
  • Best way to use Dynaprof:
    • Steal ideas on how to make a tool extensible
      • Probes as shared libraries are a nice idea!
    • Steal code on how to use DynInst and PAPI

13
Evaluation (1)
  • Available metrics: 1/5
    • Can use PAPI to get lots of data
    • Limited in what you can collect in a single run: only two PAPI metrics or wall clock time
  • Cost: 5/5
    • Free
  • Documentation quality: 4/5
    • Minimal documentation, but covers the basics pretty well
  • Extensibility: 3.5/5
    • Open source
    • Can add new functionality by writing new probes
    • Must write new code to extend (not much existing functionality)
  • Filtering and aggregation: 2/5
    • Most program data is filtered out for you
      • Direct result of the profile nature of the current probes
      • Many times too much information is lost
    • Filtering and aggregation behavior fixed in the source code of the probes

14
Evaluation (2)
  • Hardware support: 3/5
    • 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
    • Most everything supported: Linux, AIX, IRIX, HP-UX
    • Reliance on PAPI and DynInst could hinder porting
    • No Cray support
  • Heterogeneity support: 0/5 (not supported)
  • Installation: 3/5
    • Dynaprof easy to compile, but
    • PAPI and DynInst a nightmare to install
    • Also had to hack up some source code a bit to work with newer versions of gcc and javac (JDK 1.5)
  • Interoperability: 0.5/5
    • No export interoperability with other tools
    • There is a half-done TAU probe
      • Not sure if it works
      • Or how useful it is!
  • Learning curve: 4/5
    • Very easy to use
    • Anyone used to prof/gprof will feel right at home

15
Evaluation (3)
  • Manual overhead: 3/5
    • Can automatically instrument all functions, a handful of functions, or all function calls within a given function
    • Very easy to choose which functions you want instrumented
    • Can script the behavior of the dynaprof executable
    • Reinstrumenting requires no recompilation
  • Measurement accuracy: 5/5
    • For LU, instrumentation overhead almost negligible using PAPI probes
    • Instrumentation overhead small as long as the number of instrumented functions is kept reasonable
    • Program's correctness of execution not affected
    • Dynamic instrumentation does not get in the compiler's way for optimizations
  • Multiple executions: 0/5
    • Not supported
  • Multiple analyses and views: 1/5
    • One way of recording data, one way of presenting it
    • Probes could theoretically present things differently, but none currently do

16
Evaluation (4)
  • Performance bottleneck identification: 1/5
    • No automatic detection
    • Usefulness of the tool directly related to the cleverness of the user
    • Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
  • Profiling/tracing support: 2/5
    • Only supports profiling
    • Could feasibly add tracing if you wanted to write the code
  • Response time: 3/5
    • No data at all until after the run has completed and the raw data file has been opened
    • Generating reports from raw data is instantaneous, though
  • Software support: 4.5/5
    • Can link against (and instrument!) any existing library
    • Supports MPI (although broken) and shared-memory threaded programs
  • Source code correlation: 2/5
    • Data reported to the user at the function-name level
  • Searching: 0/5 (not supported)

17
Evaluation (5)
  • System stability: 3/5
    • Command-line interface relatively stable
      • Pausing while running is broken in the command-line interface
    • GUI severely broken
  • Technical support: 4/5
    • Responses from contact within 24 hours
    • Philip Mucci very helpful and knowledgeable