A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001)

1
A Dynamic Tracing Mechanism For Performance
Analysis of OpenMP Applications - Caubet,
Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001)
  • Presented by Anita Nagarajan

2
Introduction
  • OpenMP
  • Standard for shared memory parallel programming
  • Set of directives and library routines for
    Fortran and C/C++
  • Performance Tools
  • Need: Analyse parallel behaviour. Determine
    causes for OpenMP application performance
    problems.
  • Properties: Minimize intrusion cost, maximize
    performance data captured

3
Introduction (Contd.)
  • Dynamic Instrumentation
  • Instrument application while it is executing,
    recompilation not required.
  • Dynamic Probe Class Library (DPCL)
  • Developed at IBM, built on top of the Dyninst
    API.
  • Using DPCL, a performance tool attaches to the
    application, inserts code patches into the
    binary, and starts or continues its execution
  • Program instrumentation can be done at function
    entry points, exit points and call sites.
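
The idea of probes at function entry and exit points can be illustrated with a small Python sketch. This is only an analogy: DPCL patches the running binary, whereas the decorator below wraps a Python function. The names `probe`, `TRACE` and `parallel_region` are hypothetical and are not part of DPCL or OMPtrace.

```python
import time
from functools import wraps

TRACE = []  # hypothetical in-memory trace: (timestamp, event, function name)

def probe(func):
    """Wrap func so that entry and exit events are appended to TRACE."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter(), "entry", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # The exit probe fires even if the function raises.
            TRACE.append((time.perf_counter(), "exit", func.__name__))
    return wrapper

@probe
def parallel_region(n):
    """Stand-in for a routine the compiler might generate for an OpenMP region."""
    return sum(i * i for i in range(n))

result = parallel_region(1000)
events = [(event, name) for _, event, name in TRACE]
```

Each call leaves one entry record and one exit record, and the difference between the two timestamps gives the time spent inside the instrumented function.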

4
DPCL
  • DPCL consists of
  • Client library
  • Runtime library
  • Daemon
  • Super-daemon

5
OMPtrace
  • Built on top of DPCL
  • IBM compiler translates OpenMP directives into
    function calls.

6
Translation of OpenMP Directives
7
OMPtrace
8
OMPtrace (Contd.)
9
OMPtrace (Contd.)
  • Hardware Counters
  • OMPtrace can access hardware counters, and
    provide statistics of the hardware events.
  • E.g. L1/L2 hits, L1/L2 misses, number of
    instructions
  • Paraver
  • Computes Derived Metrics from hardware events.
  • E.g. L1 misses per second
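
One plausible reading of a derived metric such as L1 misses per second is the change in a cumulative counter divided by the elapsed time between two samples. The sketch below assumes that reading; the function name `rate_per_second` and the sample values are hypothetical and do not come from Paraver or from the slides.

```python
def rate_per_second(count_start, count_end, t_start, t_end):
    """Events per second between two counter samples (times in seconds)."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (count_end - count_start) / elapsed

# Hypothetical samples: (time in seconds, cumulative L1 miss count)
samples = [(0.0, 0), (0.5, 1_000_000), (1.0, 3_000_000)]

# One rate per pair of consecutive samples.
rates = [
    rate_per_second(c0, c1, t0, t1)
    for (t0, c0), (t1, c1) in zip(samples, samples[1:])
]
```

With these made-up samples the miss rate doubles in the second interval, which is the kind of change a per-second metric makes visible and a raw cumulative count hides.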

10
Case Study: Sweep3D
  • Multidimensional wavefront algorithm for
    discrete ordinates deterministic particle
    transport simulation.

11
Sweep3D (Contd.)
  • diag - original version of Sweep3D
  • mkj - do idiag and do jkm loops replaced by a
    triple nested loop (do m, do k, do j)
  • ccrit - based on mkj, outer loop parallelized,
    synchronization implemented using the CRITICAL
    directive.
  • cpipe - based on mkj, outer loop parallelized,
    synchronization implemented using shared arrays
    and busy waiting.
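
The synchronization style described for cpipe, shared arrays plus busy waiting, can be sketched as follows. This is a generic pipeline illustration using Python threads, not the Sweep3D code and not OpenMP; the names `done`, `worker`, `NUM_THREADS` and `NUM_BLOCKS` are hypothetical.

```python
import threading
import time

NUM_THREADS = 3   # hypothetical number of worker threads
NUM_BLOCKS = 4    # hypothetical number of blocks each thread processes

# Shared array: done[t] is the number of blocks thread t has finished.
done = [0] * NUM_THREADS
order = []                     # completion order, recorded for inspection
order_lock = threading.Lock()  # protects the log only, not the pipeline

def worker(t):
    for b in range(NUM_BLOCKS):
        if t > 0:
            # Busy wait until the previous thread has finished block b.
            while done[t - 1] <= b:
                time.sleep(0)  # yield so the spinning thread does not starve others
        with order_lock:
            order.append((t, b))
        done[t] = b + 1        # publish progress to the next thread

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Thread t never starts block b before thread t-1 has finished it, yet different threads can work on different blocks at the same time. A critical section, by contrast, lets only one thread at a time execute the protected code.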

12
Results from Experiments
version      1      2      3      4      5      6     12
Ccrit    28.26  24.41  26.84  26.47  29.28  30.34  30.43
Cpipe    25.63  18.45  13.01  12.53  10.06   7.67   7.76
Diag     17.28  13.09  11.40   9.64   8.50   7.78   6.55
Elapsed time in seconds for the different OpenMP
versions
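
Speedup and parallel efficiency can be derived from these elapsed times. The sketch below assumes that the column labels are thread counts, which the slide does not state explicitly; the helper names `speedup` and `efficiency` are hypothetical.

```python
# Elapsed times in seconds, copied from the table above.
columns = [1, 2, 3, 4, 5, 6, 12]  # assumed to be thread counts
times = {
    "ccrit": [28.26, 24.41, 26.84, 26.47, 29.28, 30.34, 30.43],
    "cpipe": [25.63, 18.45, 13.01, 12.53, 10.06, 7.67, 7.76],
    "diag":  [17.28, 13.09, 11.40, 9.64, 8.50, 7.78, 6.55],
}

def speedup(series):
    """Speedup of each run relative to the first column of the same version."""
    return [series[0] / t for t in series]

def efficiency(series, threads):
    """Speedup divided by the assumed thread count."""
    return [s / n for s, n in zip(speedup(series), threads)]

speedups = {name: speedup(series) for name, series in times.items()}
efficiencies = {name: efficiency(series, columns) for name, series in times.items()}

for name in times:
    print(name, [round(s, 2) for s in speedups[name]])
```

In the last column ccrit is slower than in the first (30.43 s against 28.26 s), so its speedup falls below 1, while cpipe and diag both improve as the column label grows.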
13
Analysis of Results using Paraver
  • Ccrit
  • Not scalable
  • Overhead of mutex lock and unlock, contention

Legend: Red - trying to obtain lock; Blue - using
lock; Green - release lock; Light Blue - execution
outside critical section
14
  • Cpipe
  • Better performance than ccrit.
  • Poor locality because the m loop has an
    iteration count of 6.

15
  • Diag
  • Limited scalability due to high number of L2
    misses

Legend: Blue - large values; Green - low values
16
Optimization
  • kjmi
  • Interchange loops
  • Good performance, better scalability

version      1      2      3      4      5      6     12
kjmi     14.86  10.01   7.35   5.82   4.89   3.62   2.88
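
The effect of the optimization can be quantified by comparing the kjmi times with the diag times from slide 12, column by column. This is a small arithmetic sketch; the variable names are hypothetical, and the column labels are again assumed to be thread counts.

```python
# Elapsed times in seconds from the slides; the column labels are assumed
# to be thread counts.
columns = [1, 2, 3, 4, 5, 6, 12]
diag = [17.28, 13.09, 11.40, 9.64, 8.50, 7.78, 6.55]
kjmi = [14.86, 10.01, 7.35, 5.82, 4.89, 3.62, 2.88]

# How many times faster kjmi is than diag in each column.
gain_over_diag = [d / k for d, k in zip(diag, kjmi)]

# Self-relative speedup of kjmi between the first and last columns.
kjmi_speedup = kjmi[0] / kjmi[-1]
```

The gain over diag grows from about 1.16 in the first column to about 2.27 in the last, and kjmi itself runs about 5.16 times faster in the last column than in the first.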
17
Conclusions
  • OMPtrace and Paraver form a useful tool for
    performance analysis and optimization of OpenMP
    applications.