Tools for Engineering Analysis of High Performance Parallel Programs

About This Presentation
Description:

Instrumentation: traces, counters, profiles. Visualization. Examples: AIMS, PTOOLS, PPP ... Integrate our instrumentation and analysis tools with ACTS TAU ...

Slides: 21
Provided by: DavidE2

Transcript and Presenter's Notes

1
Tools for Engineering Analysis of High
Performance Parallel Programs
  • David Culler,
  • Frederick Wong, Alan Mainwaring
  • Computer Science Division
  • U.C. Berkeley
  • http://www.cs.berkeley.edu/culler/talks

2
Traditional Parallel Programming Tools
  • Focus on showing what program did and when it
    did it
  • microscopic analysis of deterministic events
  • oriented towards initial development of small
    programs on small data sets and small machines
  • Instrumentation
  • traces, counters, profiles
  • Visualization
  • Examples
  • AIMS, PTOOLS, PPP
  • Pablo, Paradyn, ... => Delphi
  • ACTS TAU (Tuning and Analysis Utilities)

3
Example: Pablo
4
Beyond Zeroth-order Analysis
  • Basic level: get to a system design that is
    reasonable and behaves properly under ideal
    conditions
  • Subject the system to various stresses to
    understand its operating regime and gain deeper
    insight into its dynamic behavior
  • Combine empirical data with analytical models
  • Iterate
  • from "What?" to "What if?"

[Figure: max displacement vs. wind speed]
5
Approach: Framework for Parameterized Sensitivity
Analysis
  • framework performs analysis over numerous runs
  • statistical filtering
  • vary parameter of interest
  • provides means of combining data to isolate
    effects of interest
  • => ROBUSTNESS

[Diagram: framework components. Problem Data Set Generator; Well-developed Parallel Program; Instrumentation Tools; Study Parameter; Machine Characterizers; listed items: Procs, Comm. perf., Cache, Scheduling, ...; visualization, modeling]
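
The loop the framework automates (vary one parameter, repeat each point several times, filter statistically) can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: `run_once` is a hypothetical stand-in for launching an instrumented run, its cost formula and noise are made up, and the median/MAD filter is just one plausible choice of "statistical filtering".

```python
import random
import statistics

def run_once(procs, rng):
    """Hypothetical stand-in for one instrumented run; returns seconds."""
    base = 100.0 / procs + 0.5 * procs ** 0.5    # made-up compute + comm cost
    noise = abs(rng.gauss(0.0, 0.02 * base))     # small run-to-run jitter
    spike = base if rng.random() < 0.1 else 0.0  # occasional large perturbation
    return base + noise + spike

def filtered_median(samples, k=3.0):
    """Drop samples farther than k median-absolute-deviations from the median.

    Returns (median of the kept samples, number of samples dropped).
    """
    med = statistics.median(samples)
    mad = statistics.median(abs(s - med) for s in samples) or 1e-12
    kept = [s for s in samples if abs(s - med) <= k * mad]
    return statistics.median(kept), len(samples) - len(kept)

def sweep(values, repeats=15, seed=0):
    """Vary the study parameter and summarize repeated runs at each point."""
    rng = random.Random(seed)
    results = {}
    for p in values:
        samples = [run_once(p, rng) for _ in range(repeats)]
        results[p] = filtered_median(samples)
    return results

if __name__ == "__main__":
    for p, (t, dropped) in sweep([1, 2, 4, 8, 16]).items():
        print(f"P={p:2d}  time={t:7.2f}s  outliers dropped={dropped}")
```

Swapping `run_once` for a real launcher and the processor count for another study parameter (cache size, communication performance, scheduling) keeps the same structure.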
6
Simplest Example: Performance(P)
  • NPB2.2 on NOW and Origin 2000 (250)

7
Where Time is Spent ( P )
  • Reveal basic Processor and network loading (vs P)
  • Basis for model derivation - comm(P)

8
Where Time is Spent ( P ) - cont
  • Reveal basic Processor and network loading (vs P)

9
Communication Volume ( P )
10
Communication Structure ( P )
11
Understanding Efficiency ( P, M )
  • Want to understand both what load the program is
    placing on the system
  • and how well the system is handling that load
  • => characterize the capability of the system via
    simple benchmarks (rather than advertised peaks)
  • => combine with measured load for predictive
    model, compare
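
One way to read "combine with measured load for predictive model, compare" is sketched below. The linear latency-plus-bandwidth cost model is an assumption made here for illustration (the slides do not state the model form), and all names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MachineCapability:
    """Values obtained from simple microbenchmarks, not advertised peaks."""
    latency_s: float       # per-message cost in seconds
    bandwidth_Bps: float   # sustained bytes per second

@dataclass
class MeasuredLoad:
    """Communication load the program places on the system."""
    messages: int
    bytes_sent: int

def predicted_comm_time(cap, load):
    # Assumed model: each message pays a fixed latency, plus transfer time.
    return load.messages * cap.latency_s + load.bytes_sent / cap.bandwidth_Bps

def comm_efficiency(cap, load, measured_time_s):
    # Ratio of model-predicted time to observed time; 1.0 means the system
    # handles the load as well as the microbenchmarks say it can.
    return predicted_comm_time(cap, load) / measured_time_s

if __name__ == "__main__":
    cap = MachineCapability(latency_s=50e-6, bandwidth_Bps=30e6)  # illustrative
    load = MeasuredLoad(messages=10_000, bytes_sent=60_000_000)   # illustrative
    print(predicted_comm_time(cap, load))
    print(comm_efficiency(cap, load, measured_time_s=5.0))
```

An efficiency well below 1.0 suggests the system is not delivering its benchmarked capability under this program's load, which is the cue to look deeper.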

12
Communication Efficiency
13
Tools => Improvements in Run Time
  • Efficiency analysis (vs parameters) gives insight
    into where to improve the system or the program
  • use traditional profiling to see where in the
    program the bad stuff happens
  • or go back and tune the system to do better

14
Cache Behavior (P, )
  • Combining trace generation with simulation
    provides new structural insight
  • Here: clear knees in program working set
    (); these shift with machine size (P)
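
To illustrate how feeding a trace through a cache simulator exposes working-set knees, here is a minimal sketch. It assumes a fully associative LRU cache and a synthetic trace that sweeps cyclically over a fixed working set; both are simplifications chosen for this example, not the simulator or traces used in the talk.

```python
from collections import OrderedDict

def lru_miss_rate(trace, capacity_lines):
    """Miss rate of a fully associative LRU cache over a trace of line ids."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)

def synthetic_trace(working_set_lines, sweeps):
    """Hypothetical trace: repeated cyclic sweeps over one working set."""
    return [i for _ in range(sweeps) for i in range(working_set_lines)]

def miss_curve(trace, sizes):
    """Miss rate at each simulated cache size."""
    return {size: lru_miss_rate(trace, size) for size in sizes}

if __name__ == "__main__":
    trace = synthetic_trace(working_set_lines=64, sweeps=10)
    for size, rate in miss_curve(trace, [16, 32, 64, 128]).items():
        print(f"cache={size:4d} lines  miss rate={rate:.2f}")
```

With this synthetic trace the miss rate stays at 1.0 until the cache holds the whole 64-line working set, then drops to the cold-miss rate: that drop is the knee. If the per-processor working set shrinks as P grows, the knee moves to smaller cache sizes.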

15
Cache Behavior (P, )
  • Clear knees in program working set () not
    affected by P

16
Sensitivity to Multiprogramming
  • Parallel machines are increasingly general
    purpose
  • multiprogramming, at least interrupts and daemons
  • Many ideal programs very sensitive to
    perturbations
  • Msg Passing is loosely coupled, but
    implementation may not be!

17
Tools => Improvements in Run Time
  • MPI implementation spin-waits on send till
    network available (or queue not full) or on
    recv-complete
  • Should use two-phase spin-block
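
The two-phase spin-block idea can be sketched in a few lines. This is a Python threading illustration of the waiting policy only, not MPI code; the function name and the default spin budget are hypothetical, and a real implementation would tune the spin phase (a common heuristic ties it to the cost of a context switch).

```python
import threading
import time

def two_phase_wait(event, spin_seconds=0.001):
    """Spin briefly polling the event, then fall back to a blocking wait.

    Returns "spin" if the event was seen during the spin phase, else "block".
    """
    deadline = time.perf_counter() + spin_seconds
    while time.perf_counter() < deadline:   # phase 1: spin for a bounded time
        if event.is_set():
            return "spin"
    event.wait()                            # phase 2: block and yield the CPU
    return "block"

if __name__ == "__main__":
    ready = threading.Event()
    threading.Timer(0.05, ready.set).start()  # completion arrives after 50 ms
    print(two_phase_wait(ready))
```

When the partner responds quickly the waiter pays only the short spin; when the processor is shared with other work, blocking frees it instead of burning a whole time slice spinning.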

18
Sensitivity to Seemingly Unrelated Activity
  • The mechanism for doing parameter studies is
    naturally extended to get statistically valid
    data through multiple samples at each point
  • tend to get crisp, fast results in the wee hours
  • Extend study outside the app
  • Example: two programs on big Origin
                              alone        together on 64 P
      8-processor IS run      4.71 sec     6.18 sec
      36-processor SP run     26.36 sec    65.28 sec
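
Reading the two timings on each row as "alone" and "together" (an interpretation of the flattened slide layout), the slowdown each program suffers from sharing the machine is a simple ratio:

```python
# Run times in seconds, as read from the slide: (alone, together).
runs = {
    "8-processor IS": (4.71, 6.18),
    "36-processor SP": (26.36, 65.28),
}

def slowdown(alone_s, together_s):
    """Factor by which the run time grows when the machine is shared."""
    return together_s / alone_s

if __name__ == "__main__":
    for name, (alone, together) in runs.items():
        print(f"{name}: {slowdown(alone, together):.2f}x slower when co-scheduled")
```

Under that reading IS slows by about 1.31x and SP by about 2.48x, even though the two jobs together request only 44 of the 64 processors.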

19
Repeatability
  • The variance for the repeated runs is a key
    result for production codes - the real world is
    not ideal
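
A minimal sketch of reporting repeatability alongside the mean, assuming the repeated run times are already collected in a list; the sample values in the usage example are made up.

```python
import statistics

def repeatability(samples):
    """Summarize repeated run times: spread matters as much as the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)   # sample standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,             # coefficient of variation
        "min": min(samples),
        "max": max(samples),
    }

if __name__ == "__main__":
    times = [26.4, 27.1, 26.3, 41.9, 26.5]   # hypothetical repeated runs
    print(repeatability(times))
```

A large coefficient of variation or a wide min-to-max gap flags a code whose production behavior will differ from its best-case timing.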

20
Plans
  • Integrate our instrumentation and analysis tools
    with ACTS TAU
  • port to UCB Millennium environment
  • experiment with ASCI platforms
  • Refine and complete the automated sensitivity
    analysis framework
  • Backend performance data storage
  • Pablo SPPF?
  • Next Year
  • integrate performance model development,
    prediction