Tools for Engineering Analysis of High Performance Parallel Programs

About This Presentation

Slides: 21

Provided by: DavidE2

Learn more at: https://people.eecs.berkeley.edu

Transcript and Presenter's Notes

1
Tools for Engineering Analysis of High Performance Parallel Programs

David Culler, Frederick Wong, Alan Mainwaring
Computer Science Division
U.C. Berkeley
http://www.cs.berkeley.edu/culler/talks

2
Traditional Parallel Programming Tools

Focus on showing what the program did and when it did it
Microscopic analysis of deterministic events
Oriented towards initial development of small programs on small data sets and small machines
Instrumentation: traces, counters, profiles
Visualization
Examples:
AIMS, PTOOLS, PPP
Pablo, Paradyn ... => Delphi
ACTS TAU - Tuning and Analysis Utilities

3
Example: Pablo
4
Beyond Zeroth-order Analysis

Basic level is to get to a system design that is reasonable and behaves properly under ideal conditions
Subject the system to various stresses to understand its operating regime and gain deeper insight into its dynamic behavior
Combine empirical data with analytical models
Iterate
From "What?" to "What if?"

[Figure: max displacement vs. wind speed]
5
Approach: Framework for Parameterized Sensitivity Analysis

Framework performs analysis over numerous runs
statistical filtering
vary parameter of interest
Provides means of combining data to isolate effects of interest
=> ROBUSTNESS
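
The bullets above can be sketched as a small sweep harness. This is only an illustrative sketch, not the framework from the talk: `sweep`, `fake_run`, the made-up cost formula, and the choice of the median as the statistical filter are all assumptions introduced here.

```python
import itertools
import statistics

def sweep(run_once, values, repeats=5):
    """Run `run_once(value)` `repeats` times for each parameter value.

    The median acts as a simple statistical filter: a minority of
    perturbed runs does not move it.
    """
    results = {}
    for value in values:
        samples = [run_once(value) for _ in range(repeats)]
        results[value] = {
            "median": statistics.median(samples),
            "min": min(samples),
            "samples": samples,
        }
    return results

# Hypothetical stand-in for launching the instrumented program on `procs`
# processors. Every third call is artificially slowed to mimic interference.
_calls = itertools.count(1)

def fake_run(procs):
    base = 100.0 / procs + 0.5          # made-up run time, illustration only
    perturbed = next(_calls) % 3 == 0
    return base + (5.0 if perturbed else 0.0)

table = sweep(fake_run, values=[1, 2, 4, 8], repeats=7)
for procs, row in table.items():
    print(procs, row["median"], row["min"])
```

Swapping `fake_run` for a function that actually launches the program and returns its measured run time would turn the same loop into a study over Procs, cache size, or any other parameter.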

[Diagram labels: Problem Data Set Generator; Well-developed Parallel Program; Instrumentation Tools; Machine Characterizers; Study Parameter (Procs, Comm. perf., Cache, Scheduling, ...); visualization, modeling]
6
Simplest Example: Performance(P)

NPB2.2 on NOW and Origin 2000 (250)

7
Where Time is Spent ( P )

Reveal basic Processor and network loading (vs P)
Basis for model derivation - comm(P)

8
Where Time is Spent ( P ) - cont

Reveal basic Processor and network loading (vs P)

9
Communication Volume ( P )
10
Communication Structure ( P )
11
Understanding Efficiency ( P, M )

Want to understand both what load the program is placing on the system and how well the system is handling that load
=> characterize the capability of the system via simple benchmarks (rather than advertised peaks)
=> combine with measured load for predictive model, compare
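
One way to picture "combine with measured load for predictive model, compare" is a linear cost model. The sketch below is an assumption for illustration, not the model used in the talk; the class names, field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NetworkCapability:
    """System capability as measured by simple benchmarks (hypothetical fields)."""
    per_message_cost_s: float        # measured cost per message, seconds
    bandwidth_bytes_per_s: float     # measured sustained bandwidth

@dataclass
class MeasuredLoad:
    """Load the program placed on the system, from instrumentation."""
    messages: int
    bytes_sent: int
    observed_comm_time_s: float

def predicted_comm_time(cap, load):
    # Assumed linear model: fixed cost per message plus a per-byte transfer cost.
    return (load.messages * cap.per_message_cost_s
            + load.bytes_sent / cap.bandwidth_bytes_per_s)

def comm_efficiency(cap, load):
    # Ratio of predicted to observed time; 1.0 means the system handled
    # the load as well as the benchmarks say it can.
    return predicted_comm_time(cap, load) / load.observed_comm_time_s

# Made-up numbers, for illustration only.
cap = NetworkCapability(per_message_cost_s=20e-6, bandwidth_bytes_per_s=40e6)
load = MeasuredLoad(messages=10_000, bytes_sent=80_000_000,
                    observed_comm_time_s=4.4)
print(round(predicted_comm_time(cap, load), 3), round(comm_efficiency(cap, load), 3))
```

An efficiency well below 1.0 would point at either contention the benchmarks did not capture or a communication pattern the model is too simple for.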

12
Communication Efficiency
13
Tools => Improvements in Run Time

Efficiency analysis (vs parameters) gives insight into where to improve the system or the program
Use traditional profiling to see where in the program the bad stuff happens
Or go back and tune the system to do better

14
Cache Behavior (P, cache size)

Combining trace generation with simulation provides new structural insight
Here: clear knees in program working set; these shift with machine size (P)
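
A minimal sketch of feeding a trace into a cache simulator and sweeping the cache size to expose a working-set knee. The fully associative LRU cache and the synthetic trace are assumptions chosen for brevity; they are not the simulator or the traces from the talk.

```python
from collections import OrderedDict

def miss_rate(trace, cache_lines):
    """Miss fraction of a fully associative LRU cache with `cache_lines` lines."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(trace)

# Synthetic trace: 20 sweeps over a working set of 64 cache lines.
trace = [i % 64 for i in range(64 * 20)]

curve = {size: miss_rate(trace, size) for size in (16, 32, 64, 128)}
for size, rate in curve.items():
    print(size, rate)
```

With this trace the miss rate stays at 1.0 until the cache holds all 64 lines and then drops to the cold-miss floor, which is the kind of knee the slide refers to. Generating one trace per value of P and repeating the sweep would show whether the knee moves.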

15
Cache Behavior (P, cache size)

Clear knees in program working set, not affected by P

16
Sensitivity to Multiprogramming

Parallel machines are increasingly general purpose
multiprogramming, at least interrupts and daemons
Many ideal programs are very sensitive to perturbations
Message passing is loosely coupled, but the implementation may not be!

17
Tools => Improvements in Run Time

MPI implementation spin-waits on send until the network is available (or queue not full), or on recv-complete
Should use two-phase spin-block
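
The two-phase spin-block policy can be sketched with a thread event: poll for a short bounded time, then block. This illustrates the waiting policy only and is not code from any MPI implementation; `two_phase_wait` and the `spin_seconds` default are hypothetical.

```python
import threading
import time

def two_phase_wait(event, spin_seconds=0.001):
    """Wait for `event` using two phases.

    Phase 1 polls for at most `spin_seconds`, which is cheap when the event
    arrives quickly. Phase 2 blocks, releasing the processor to other work.
    Returns "spin" or "block" to show which phase observed the event.
    """
    deadline = time.perf_counter() + spin_seconds
    while True:
        if event.is_set():
            return "spin"
        if time.perf_counter() >= deadline:
            break
    event.wait()
    return "block"

# Event already set: caught in the spin phase.
ready = threading.Event()
ready.set()
fast = two_phase_wait(ready)

# Event set much later than the spin budget: the waiter ends up blocking.
late = threading.Event()
threading.Timer(0.2, late.set).start()
slow = two_phase_wait(late, spin_seconds=0.001)

print(fast, slow)
```

The spin budget is the tuning knob: too short and fast messages pay the cost of a block and wakeup, too long and a descheduled peer leaves the waiter burning its processor.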

18
Sensitivity to Seemingly Unrelated Activity

The mechanism for doing parameter studies is naturally extended to get statistically valid data through multiple samples at each point
tend to get crisp, fast results in the wee hours
Extend study outside the app
Example: two programs on big Origin

                       alone        together on 64 P
8-processor IS run     4.71 sec     6.18 sec
36-processor SP run    26.36 sec    65.28 sec
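
The slowdown implied by these figures is a one-line calculation. The sketch assumes both columns are run times in seconds; the dictionary keys are labels chosen here.

```python
# (alone, together) run times in seconds, taken from the slide.
runs = {
    "IS (8 processors)": (4.71, 6.18),
    "SP (36 processors)": (26.36, 65.28),
}

slowdown = {name: together / alone for name, (alone, together) in runs.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

IS slows down by roughly a factor of 1.3 and SP by roughly 2.5, even though the two programs together occupy only 44 of the 64 processors.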

19
Repeatability