PPT – A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001) PowerPoint presentation

Slides: 18

Provided by: anag156

Learn more at: https://arcb.csc.ncsu.edu

Transcript and Presenter's Notes

Title: A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001)

1
A Dynamic Tracing Mechanism For Performance
Analysis of OpenMP Applications - Caubet,
Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001)

- Presented by Anita Nagarajan

2
Introduction

OpenMP
Standard for shared memory parallel programming.
Set of directives and library routines for
Fortran and C/C++.
Performance Tools
Need: analyse parallel behaviour and determine
the causes of OpenMP application performance
problems.
Properties: minimize intrusion cost, maximize
the performance data captured.

3
Introduction (Contd.)

Dynamic Instrumentation
Instrument the application while it is executing;
recompilation is not required.
Dynamic Probe Class Library (DPCL)
Developed at IBM, built on top of the Dyninst
API.
Using DPCL, a performance tool attaches to the
application, inserts code patches into the
binary, and starts or continues its execution.
Program instrumentation can be done at function
entry points, exit points and call sites.
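
The entry/exit probe idea can be illustrated with a small Python analogy. This is not DPCL or Dyninst, which patch the machine code of a running binary; it is a minimal sketch in which `attach_probes`, `trace` and `app` are hypothetical names. A function that is already loaded is rebound at run time so that an event is recorded at its entry and at its exit, without editing or recompiling its source.

```python
import time
import types

trace = []  # hypothetical in-memory event buffer: (event, function name, timestamp)

def attach_probes(target, func_name):
    """Rebind target.func_name so that entry and exit events are recorded."""
    original = getattr(target, func_name)

    def probed(*args, **kwargs):
        trace.append(("entry", func_name, time.perf_counter()))
        try:
            return original(*args, **kwargs)
        finally:
            # The exit probe fires even if the original function raises.
            trace.append(("exit", func_name, time.perf_counter()))

    setattr(target, func_name, probed)
    return original  # returned so the probe can be removed again later

def work(n):
    return sum(i * i for i in range(n))

# Stand-in for the running application; work() itself is left unchanged.
app = types.SimpleNamespace(work=work)

attach_probes(app, "work")
result = app.work(1000)
```

After the call, `trace` holds one entry event and one exit event for `work`, and the difference of their timestamps is the time spent inside the function.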

4
DPCL

DPCL consists of
Client library
Runtime library
Daemon
Super-daemon

5
OMPtrace

Built on top of DPCL
The IBM compiler translates OpenMP directives into
function calls.
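
What translating a directive into function calls means can be sketched as follows. The names `parallel_region_body` and `rt_parallel_do` are hypothetical (the transcript does not give the names of the IBM runtime entry points), and Python threads stand in for OpenMP threads. The point is only that once the body of a parallel loop has been outlined into a function and handed to a runtime call, there are function entry and exit points that a tool can instrument.

```python
import threading

def parallel_region_body(tid, nthreads, data, partial):
    """Outlined loop body: thread tid sums its own contiguous slice of data."""
    chunk = (len(data) + nthreads - 1) // nthreads
    lo, hi = tid * chunk, min((tid + 1) * chunk, len(data))
    partial[tid] = sum(data[lo:hi])

def rt_parallel_do(body, nthreads, *args):
    """Hypothetical runtime call: run body once per thread, then wait for all."""
    threads = [threading.Thread(target=body, args=(tid, nthreads) + args)
               for tid in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of the barrier at the end of the region

data = list(range(100))
partial = [0] * 4
rt_parallel_do(parallel_region_body, 4, data, partial)
total = sum(partial)
```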

6
Translation of OpenMP Directives

7
OMPtrace

8
OMPtrace (Contd.)

9
OMPtrace (Contd.)

Hardware Counters
OMPtrace can access hardware counters and
provide statistics on hardware events,
e.g. L1/L2 hits, L1/L2 misses, number of
instructions.
Paraver
Computes derived metrics from hardware events,
e.g. L1 misses per second.
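
A derived metric of this kind is simple arithmetic on raw counter samples. The sketch below is not Paraver's implementation; the function names and the sample values are made up for illustration. It shows a rate (events per second between two samples) and a ratio (misses over all accesses).

```python
def rate_per_second(count_start, count_end, t_start, t_end):
    """Events per second between two samples of a hardware counter."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("samples must be ordered in time")
    return (count_end - count_start) / elapsed

def miss_ratio(misses, hits):
    """Fraction of accesses that missed; 0.0 when there were no accesses."""
    total = hits + misses
    return misses / total if total else 0.0

# Illustrative values only: 50,000 L1 misses observed over half a second.
l1_misses_per_s = rate_per_second(1_000, 51_000, 0.0, 0.5)
l1_miss_ratio = miss_ratio(misses=50_000, hits=950_000)
```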

10
Case Study: Sweep3D

Multidimensional wavefront algorithm for
discrete ordinates deterministic particle
transport simulation.

11
Sweep3D (Contd.)

diag - original version of Sweep3D.
mkj - the do idiag and do jkm loops are replaced by a
triple nested loop (do m, do k, do j).
ccrit - based on mkj, outer loop parallelized,
synchronization implemented using the CRITICAL
directive.
cpipe - based on mkj, outer loop parallelized,
synchronization implemented using shared arrays
and busy waiting.
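
The cpipe style of synchronization can be sketched as below. This is not the Sweep3D source: Python threads stand in for OpenMP threads, the per-block work is omitted, and `done`, `order`, `NTHREADS` and `NBLOCKS` are hypothetical names. Each thread publishes its progress in a shared array and the downstream thread spins on that entry, so a thread only ever waits for its upstream neighbour rather than for a lock that every thread contends for.

```python
import threading
import time

NTHREADS, NBLOCKS = 3, 4
done = [0] * NTHREADS          # shared array: done[t] = blocks finished by thread t
order = []                     # (thread, block) pairs in completion order
order_lock = threading.Lock()  # protects the trace list only, not the pipeline

def worker(t):
    for k in range(NBLOCKS):
        if t > 0:
            # Busy wait until the upstream thread has finished block k.
            while done[t - 1] <= k:
                time.sleep(0)  # yield so the other threads can make progress
        # ... the computation for block k would go here ...
        with order_lock:
            order.append((t, k))
        done[t] = k + 1        # publish progress; only thread t writes done[t]

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NTHREADS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

In `order`, every pair `(t, k)` appears after `(t - 1, k)`, which is the wavefront dependency; different threads can still be working on different blocks at the same time.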

12
Results from Experiments
version      1      2      3      4      5      6     12
ccrit    28.26  24.41  26.84  26.47  29.28  30.34  30.43
cpipe    25.63  18.45  13.01  12.53  10.06   7.67   7.76
diag     17.28  13.09  11.40   9.64   8.50   7.78   6.55

Elapsed time in seconds for the different OpenMP
versions
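
One way to read these numbers is to divide each version's first-column time by its other times. The sketch below does that with the values from the table; it assumes the columns are runs with increasing parallelism, which the transcript does not state, and `speedups` and `best` are names introduced here for illustration.

```python
# Elapsed times in seconds, copied from the table (columns 1, 2, 3, 4, 5, 6, 12).
elapsed = {
    "ccrit": [28.26, 24.41, 26.84, 26.47, 29.28, 30.34, 30.43],
    "cpipe": [25.63, 18.45, 13.01, 12.53, 10.06, 7.67, 7.76],
    "diag":  [17.28, 13.09, 11.40, 9.64, 8.50, 7.78, 6.55],
}

def speedups(times):
    """Speedup of every column relative to the first column."""
    base = times[0]
    return [base / t for t in times]

# Best speedup each version reaches in any column, rounded to two decimals.
best = {name: round(max(speedups(times)), 2) for name, times in elapsed.items()}
```

Under that assumption, ccrit never improves on its first column by more than about 1.16x and ends up slower than it started, while cpipe reaches roughly 3.3x and diag roughly 2.6x.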

13
Analysis of Results using Paraver