Title: Summary Presentation (3/24/2005)
1. Summary Presentation (3/24/2005)
- UPC Group
- HCS Research Laboratory
- University of Florida
2. Performance Analysis Strategies
3. Methods
- Three general performance analysis approaches
- Performance modeling
- Mostly predictive methods
- Useful to examine in order to extract important performance factors
- Could also be used in conjunction with experimental performance measurement
- Experimental performance measurement
- Strategy used by most modern PATs
- Uses actual event measurement to perform the analysis
- Simulation
- We will probably not use this approach
- See Combined Report - Section 4 for details
4. Performance Modeling
5. Purpose and Method
- Purpose
- Review of existing performance models and their applicability to our project
- Method
- Perform literature search on performance modeling techniques
- Categorize the techniques
- Give weight to methods that are easily adapted to UPC/SHMEM
- Also more heavily consider methods that give accurate performance predictions or that help the user choose between different design strategies
- See Combined Report - Section 5 for details
6. Performance Modeling Overview
- Why do performance modeling? Several reasons
- Grid systems need a way to estimate how long a program will take (billing/scheduling issues)
- Could be used in conjunction with optimization methods to suggest improvements to the user
- Can also guide the user on what kind of benefit to expect from optimizing aspects of the code
- Figure out how far code is from optimal performance
- Indirectly detect problems: if a section of code is not performing as predicted, it probably has cache locality problems, etc.
- Challenge
- Many models already exist, with varying degrees of accuracy and speed
- Choose the best model to fit into the UPC/SHMEM PAT
- Existing performance models fall into several categories
- Formal models (process algebras, Petri nets)
- General models that provide mental pictures of hardware/performance
- Predictive models that try to estimate timing information
7. Formal Performance Models
- Least useful for our purposes
- Formal methods are strongly rooted in math
- Can make strong statements and guarantees
- However, difficult to adapt and automate for new programs
- Examples include
- Petri nets (specialized graphs that represent processes and systems)
- Process algebras (formal algebras for specifying how parallel processes interact)
- Queuing theory (strongly rooted in math)
- PAMELA (C-style language to model concurrency and time-related operations)
- For our purposes, formal models are too abstract to be directly useful
8. General Performance Models
- Provide the user with a mental picture
- Rules of thumb for the cost of operations
- Guide strategies used while creating programs
- Usually analytical in nature
- Examples include
- PRAM (classical model, unit-cost operations)
- BSP (breaks execution into communication and computation phases)
- LogP (analytical model of network operations)
- Many more (see report for details)
- For our purposes, general models can be useful
- Created to be easily understood by programmers
- But may need lots of adaptation (and model fitting) to be directly useful (see the example cost formulas below)
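To give a flavor of the rule-of-thumb costs these models provide, here are the standard BSP superstep cost and a LogP estimate for a small point-to-point message; the parameter values (w, h, g, l, L, o) would have to be fitted to a given UPC/SHMEM platform before they are useful for prediction.

```latex
% BSP: one superstep with local work w, an h-relation of communication,
% per-word gap g, and barrier synchronization cost l
\[ T_{\mathrm{superstep}} = w + g\,h + l \]

% LogP: one small message between two processors, with network latency L
% and per-message send/receive overhead o
\[ T_{\mathrm{msg}} = o_{\mathrm{send}} + L + o_{\mathrm{recv}} \approx L + 2o \]
```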
9. Predictive Performance Models
- Models that specifically predict performance of parallel codes
- Similar to general models, except meant to be used with existing systems
- Usually a combination of mathematical models/equations and very simple simulation
- Examples include
- Lost cycles (samples program state to see if useful work is being done)
- Task graphs (algorithm structure represented with graphs)
- Vienna Fortran Compilation System (uses an analytical model to parallelize code by examining the cost of operations)
- PACE (geared toward grid applications)
- Convolution (Snavely's method uses a combination of existing tools to predict performance based on memory traces and network traces)
- Many more (see report, Section 5 for details)
- Lost cycles very promising (see the sampling sketch below)
- Provides a very easy way to quantify performance scalability
- Needs extension for greater correlation with source code
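To make the lost-cycles idea concrete, here is a minimal, hypothetical sketch of state sampling: the program records its current state in a global variable, and a periodic SIGPROF handler tallies samples per state, so the share of time spent doing useful work versus waiting falls out directly. The state categories and sampling interval are illustrative and are not taken from the original lost-cycles implementation.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

/* Program states sampled by the profiler (illustrative categories). */
enum state { COMPUTING, COMMUNICATING, IDLE, NUM_STATES };

static volatile sig_atomic_t current_state = COMPUTING;
static volatile long samples[NUM_STATES];

/* SIGPROF handler: attribute this sample to whatever the program is doing. */
static void on_sample(int sig)
{
    (void)sig;
    samples[current_state]++;
}

/* Sample every 10 ms of consumed CPU time. */
static void start_sampling(void)
{
    struct itimerval it = { {0, 10000}, {0, 10000} };
    signal(SIGPROF, on_sample);
    setitimer(ITIMER_PROF, &it, NULL);
}

int main(void)
{
    start_sampling();

    /* Toy workload: alternate between "computation" and "communication". */
    for (int iter = 0; iter < 100; iter++) {
        current_state = COMPUTING;
        for (volatile long i = 0; i < 5000000; i++) ;   /* pretend compute  */
        current_state = COMMUNICATING;
        for (volatile long i = 0; i < 1000000; i++) ;   /* pretend transfer */
    }

    long total = samples[COMPUTING] + samples[COMMUNICATING] + samples[IDLE];
    printf("computing %ld, communicating %ld, idle %ld (of %ld samples)\n",
           samples[COMPUTING], samples[COMMUNICATING], samples[IDLE], total);
    return 0;
}
```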
10. Experimental Performance Measurement
11. Overview
- Instrumentation: insertion of instrumentation code (in general)
- Measurement: the actual measuring stage
- Analysis: filtering, aggregation, and analysis of the gathered data
- Presentation: display of the analyzed data to the user; the only phase that deals directly with the user
- Optimization: the process of resolving bottlenecks
12. Profiling/Tracing Methods
13. Purpose and Method
- Purpose
- Review of existing profiling and tracing methods (instrumentation stage) used in experimental performance measurement
- Evaluate the various methods and their applicability to our PAT
- Method
- Literature search on profiling and tracing (including some review of existing tools)
- Categorize the methods
- Evaluate the applicability of each method toward the design of the UPC/SHMEM PAT
- Quick overview of methods and recommendations included here
- See Combined Report - Section 6.1 for complete description and recommendations
14. Summary (1)
- Overhead
- Manual: amount of work needed from the user
- Performance: overhead added to the program by the tool
- Profiling / Tracing
- Profiling: collection of statistical event data; generally refers to filtering and aggregating a subset of event data after the program terminates
- Tracing: used to record as many events as possible in logical order (generally with timestamps); can be used to reconstruct accurate program behavior, but requires a large amount of storage
- Two ways to lower tracing cost: (1) a compact trace file format, (2) a smart tracing system that turns tracing on and off
- Manual vs. automatic: whether the user or the tool is responsible for instrumenting the original code; categorizing which events are better suited to which method is desirable
15. Summary (2)
- Number of passes: the number of times a program needs to be executed to get performance data. One pass is desirable for long-running programs, but multi-pass can provide more accurate data (e.g., first pass profiling, later passes tracing, using the profiling data to turn tracing on and off; see the sketch below). A hybrid method is available but might not be as accurate as multi-pass
- Levels: need at least source and binary level to be useful (some events are more suited to the source level, others to the binary level)
- Source level: manual, pre-compiler, instrumentation language
- System level: library or compiler
- Operating system level
- Binary level: static or dynamic
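A minimal, hypothetical sketch of the multi-pass idea: the profiling pass writes the most time-consuming functions to a file, and the tracing pass consults that list so trace records are emitted only for those hot functions. The file name, format, and function names here are invented for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hot-function list produced by the first (profiling) pass;
 * hypothetical format: one function name per line. */
#define MAX_HOT 64
static char hot[MAX_HOT][64];
static int  num_hot;

static void load_hot_list(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return;                      /* no list: emit no trace records */
    while (num_hot < MAX_HOT && fscanf(f, "%63s", hot[num_hot]) == 1)
        num_hot++;
    fclose(f);
}

static int is_hot(const char *name)
{
    for (int i = 0; i < num_hot; i++)
        if (strcmp(hot[i], name) == 0) return 1;
    return 0;
}

/* Called by instrumented code on function entry/exit; only hot
 * functions generate trace records in the second pass. */
static void trace_event(const char *func, const char *what)
{
    if (is_hot(func))
        printf("%ld %s %s\n", (long)clock(), func, what);
}

int main(void)
{
    load_hot_list("hot_functions.txt");
    trace_event("solver_step", "enter");  /* hypothetical instrumented call */
    trace_event("solver_step", "exit");
    return 0;
}
```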
16. Performance Factors
17. Purpose and Method
- Purpose
- Provide a formal definition of the term performance factor
- Present motivation for calculating performance factors
- Discuss what constitutes a good performance factor
- Introduce a three-step approach to determine if a factor is good
- Method
- Review and provide a concise summary of the literature in the area of performance factors for parallel systems
- See Combined Report - Section 6.2 for more details
18. Features of Good Performance Factors
- Characteristics of a good performance factor
- Reliability
- Repeatability
- Ease of Measurement
- Consistency
- Testing
- On each platform, determine ease of measurement
- Determine repeatability
- Determine reliability and consistency by one of the following
- Modify the factor using real hardware
- Find justification in the literature
- Derive the information from performance models
19. Analysis Strategies
20. Purpose and Method
- Purpose
- Review of existing analysis and bottleneck detection methods
- Method
- Literature search on existing analysis strategies
- Categorize the strategies
- Examine methods that are applied before, during, or after execution
- Weight post-mortem and runtime analysis most heavily (most useful for a PAT)
- Evaluate the applicability of each method toward the design of the UPC/SHMEM PAT
- See Analysis Strategies report for details
21. Analysis Methods
- Performance analysis methods
- The "why" of performance tools
- Make sense of data collected from tracing or profiling
- Classically performed after trace collection, before visualization (see right)
- But some strategies choose to do it at other times and in different ways
- Bottleneck detection
- Another form of analysis!
- Bottleneck detection methods are also shown in this report
- Optimizations also closely related, but discussed in the combined report
- Combined Report - Section 6.5
22. When/How to Perform Analysis
- Can do at different times
- Post-mortem: after the program runs
- Usually performed in conjunction with tracing
- During runtime: must be quick, but can guide data collection
- Beforehand: work on abstract syntax trees from parsing source code
- But hard to know what will happen at runtime!
- Only one existing strategy fits in this category
- Also manual vs. automatic
- Manual: rely on the user to perform actions
- e.g., manual post-mortem analysis: look at visualizations and manually determine bottlenecks
- The user is clever, but this analysis technique is hard to scale
- Semi-automatic: perform some work to make the user's job easier
- e.g., filtering, aggregation, pattern matching (see the aggregation sketch below)
- Most techniques try to strike a balance
- Too automated: can miss problems (the computer is dumb)
- Too manual: high overhead for the user
- Can also be used to guide data collection at runtime
- Automatic: no existing systems are really fully automatic
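As a concrete (and purely illustrative) example of semi-automatic filtering and aggregation, the sketch below folds raw trace records into per-function totals and flags any function that accounts for more than a threshold share of total time. The record layout and the 30% threshold are assumptions, not taken from any specific tool.

```c
#include <stdio.h>
#include <string.h>

/* A raw trace record: which function, and how long one call took (seconds).
 * The layout is hypothetical; real trace formats carry far more detail. */
struct record { const char *func; double seconds; };

#define MAX_FUNCS 32

int main(void)
{
    /* Toy trace, as if read back from a trace file. */
    struct record trace[] = {
        {"compute", 0.80}, {"barrier", 0.30}, {"compute", 0.85},
        {"put",     0.05}, {"barrier", 0.95}, {"put",     0.04},
    };
    int n = sizeof trace / sizeof trace[0];

    const char *name[MAX_FUNCS];
    double      total[MAX_FUNCS] = {0};
    int         nfuncs = 0;
    double      grand = 0.0;

    /* Aggregation: fold per-call records into per-function totals. */
    for (int i = 0; i < n; i++) {
        int j;
        for (j = 0; j < nfuncs; j++)
            if (strcmp(name[j], trace[i].func) == 0) break;
        if (j == nfuncs) name[nfuncs++] = trace[i].func;
        total[j] += trace[i].seconds;
        grand    += trace[i].seconds;
    }

    /* Filtering: flag functions above an (assumed) 30% share of total time. */
    for (int j = 0; j < nfuncs; j++) {
        double share = total[j] / grand;
        printf("%-8s %6.2f s (%4.1f%%)%s\n", name[j], total[j], 100.0 * share,
               share > 0.30 ? "  <-- possible bottleneck" : "");
    }
    return 0;
}
```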
23. Post-mortem
- Manual techniques
- Types
- Let the user figure it out based on visualizations
- Data can be very overwhelming!
- Simulation based on data collected at runtime
- Traditional analysis techniques (Amdahl's law, isoefficiency)
- De facto standard for most existing tools
- Tools: Jumpshot, Paraver, VampirTrace, mpiP, SvPablo
- Semi-automated techniques
- Let the machine do the hard work
- Types
- Critical path analysis, phase analysis (IPS-2)
- Sensitivity analysis (S-Check)
- Automatic event classification (machine learning)
- Record overheads and predict the effect of removing them (Scal-tool, SCALEA)
- Knowledge based (Poirot, KAPPA-PI, FINESSE, KOJAK/EXPERT)
- Knowledge representation techniques (ASL, EDL, EARL)
24. On-line
- Manual techniques
- Make the user perform analysis during execution
- Not a good idea!
- Too many things going on
- Semi-automated techniques
- Try to reduce the overhead of full tracing
- Look at a few metrics at a time
- Most use dynamic instrumentation
- Types
- Paradyn-like approach (see the search-loop sketch below)
- Start with hypotheses
- Use refinements based on data collected at runtime
- Paradyn, Peridot (not implemented?), OPAL (incremental approach)
- Lost cycles (sample program state at runtime)
- Trace file clustering
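The sketch below is a heavily simplified, hypothetical rendering of a Paradyn-style search loop: start from a coarse hypothesis ("too much synchronization time"), test it against measured data, and refine to narrower scopes only when the hypothesis holds. The metric source here is a stub with made-up numbers; a real on-line tool would collect these values via dynamic instrumentation.

```c
#include <stdio.h>
#include <string.h>

/* Stub metric source: fraction of run time spent in synchronization,
 * program-wide or per function. A real tool would obtain these numbers
 * through dynamic instrumentation at runtime. */
static double sync_fraction(const char *scope)
{
    if (strcmp(scope, "program") == 0)       return 0.42;  /* made-up data */
    if (strcmp(scope, "barrier_phase") == 0) return 0.35;
    if (strcmp(scope, "exchange") == 0)      return 0.05;
    return 0.01;
}

int main(void)
{
    const double threshold = 0.20;   /* assumed "this is a problem" cutoff */

    /* Step 1: coarse hypothesis over the whole program. */
    if (sync_fraction("program") <= threshold) {
        printf("hypothesis rejected: synchronization is not a bottleneck\n");
        return 0;
    }
    printf("hypothesis holds program-wide; refining to functions...\n");

    /* Step 2: refine to candidate functions (list is illustrative). */
    const char *candidates[] = { "barrier_phase", "exchange", "io_phase" };
    for (int i = 0; i < 3; i++) {
        double f = sync_fraction(candidates[i]);
        if (f > threshold)
            printf("  %s: %.0f%% of time in synchronization <-- refine here\n",
                   candidates[i], 100.0 * f);
    }
    return 0;
}
```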
25. Pre-execution
- Manual techniques
- Simulation modeling (FASE approach at UF, etc.)
- Can be powerful, but
- Computationally expensive to do accurately
- High user overhead in creating models
- Semi-automated techniques
- Hard to analyze a program automatically!
- One existing system: PPA
- Parallel program analyzer
- Works on the source code's abstract syntax tree
- Requires compiler/parsing support
- Vaporware?
26. Presentation Methodology
27. Purpose and Method
- Purpose
- Discuss visualization concepts
- Present general approaches for performance visualization
- Summarize a formal user interface evaluation technique
- Discuss the integration of user feedback into a graphical interface
- Method
- Review and provide a concise summary of the literature in the area of visualization for parallel performance data
- See Presentation Methodology report for details
28. Summary of Visualizations
- Animation - advantages: adds another dimension to visualizations; disadvantages: CPU intensive; include in the PAT: yes; used for: various
- Program graphs (N-ary tree) - advantages: built-in zooming, integration of high- and low-level data; disadvantages: difficult to see inter-process data; include in the PAT: maybe; used for: comprehensive program visualization
- Gantt charts (time histogram / timeline) - advantages: ubiquitous, intuitive; disadvantages: not as applicable to shared memory as to message passing; include in the PAT: yes; used for: communication graphs
- Data access displays (2D array) - advantages: provide detailed information regarding the dynamics of shared data; disadvantages: narrow focus, and users may not be familiar with this type of visualization; include in the PAT: maybe; used for: data structure visualization
- Kiviat diagrams - advantages: provide an easy way to represent statistical data; disadvantages: can be difficult to understand; include in the PAT: maybe; used for: various statistical data (processor utilization, cache miss rates, etc.)
- Event graph displays (timeline) - advantages: can be used to display multiple data types (event-based); disadvantages: mostly provide only high-level information; include in the PAT: maybe; used for: inter-process dependency
29. Evaluation of User Interfaces
- General guidelines
- Visualization should guide, not rationalize
- Scalability is crucial
- Color should inform, not entertain
- Visualization should be interactive
- Visualizations should provide meaningful labels
- The default visualization should provide useful information
- Avoid showing too much detail
- Visualization controls should be simple
- GOMS
- Goals, Operators, Methods, and Selection rules
- Formal user interface evaluation technique
- A way to characterize a set of design decisions from the point of view of the user
- A description of what the user must learn may be the basis for reference documentation
- The knowledge is described in a form that can actually be executed (there have been several fairly successful attempts to implement GOMS analysis in software, e.g., GLEAN)
- There are various incarnations of GOMS with different assumptions, useful for more specific analyses (KLM, CMN-GOMS, NGOMSL, CPM-GOMS, etc.; a worked KLM estimate is sketched below)
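To illustrate how a GOMS/KLM analysis yields a number, below is a hypothetical keystroke-level estimate for one PAT interaction (opening a trace file via a "File > Open" menu), using the commonly cited KLM operator times (mental preparation M ≈ 1.35 s, pointing P ≈ 1.1 s, keystroke/button press K ≈ 0.2 s). The task decomposition itself is invented for illustration.

```latex
% Hypothetical task: open a trace file from the File menu
% Operator sequence: M P K (File) + P K (Open...) + M P K (select file) + P K (OK)
\[
T \approx 2M + 4P + 4K = 2(1.35) + 4(1.1) + 4(0.2) = 2.7 + 4.4 + 0.8 \approx 7.9\ \mathrm{s}
\]
```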
30. Conclusion
- Plan for development
- Develop a preliminary interface that provides the functionality required by the user while conforming to the visualization guidelines presented previously
- After the preliminary design is complete, elicit user feedback
- During periods where user contact is unavailable, we may be able to use GOMS analysis or another formal interface evaluation technique
31. Usability
32. Purpose and Method
- Purpose
- Provide a discussion on the factors influencing the usability of performance tools
- Outline how to incorporate user-centered design into the PAT
- Discuss common problems seen in performance tools
- Present solutions to these problems
- Method
- Review and provide a concise summary of the literature in the area of usability for parallel performance tools
- See Combined Report - Section 6.4.1 for complete description and reasons behind inclusion of various criteria
33. Usability Factors
- Ease-of-learning
- Discussion
- Important for attracting new users
- A tool's interface shapes the user's understanding of its functionality
- Inconsistency leads to confusion
- Example: providing defaults for some objects but not all
- Conclusions
- We should strive for an internally and externally consistent tool
- Stick to established conventions
- Provide as uniform an interface as possible
- Target as many platforms as possible so the user can amortize the time invested over many uses
- Ease-of-use
- Discussion
- Amount of effort required to accomplish work with the tool
- Conclusions
- Don't force the user to memorize information about the interface; use menus, mnemonics, and other mechanisms
- Provide a simple interface
- Make all user-required actions concrete and logical
- Usefulness
34. User-Centered Design
- General principles
- Usability will be achieved only if the software design process is user-driven
- Understand the target users
- Usability should be the driving factor in tool design
- Four-step model to incorporate user feedback (chronological)
- Ensure initial functionality is based on user needs
- Solicit input directly from the users
- MPI users
- UPC/SHMEM users
- Meta-user
- We can't just go by what we think is useful
- Analyze how users identify and correct performance problems
- UPC/SHMEM users primarily
- Gain a better idea of how the tool will actually be used on real programs
- Information from users is then presented to the meta-user for critique/feedback
- Develop incrementally
- Organize the interface so that the most useful features are the best supported
- User evaluation of preliminary/prototype designs
- Maintain a strong relationship with the users to whom we have access
35. UPC/SHMEM Language Analysis
36. Purpose and Method
- Purpose
- Determine performance factors purely from the language's perspective
- Correlate performance factors to individual UPC/SHMEM constructs
- Method
- Come up with a complete and minimal factor list
- Analyze the UPC and SHMEM (Quadrics and SGI) specifications
- Analyze the various implementations
- Berkeley and Michigan UPC: translated file system code
- HP UPC: pending until the NDA process is completed
- GPSHMEM: based on system code
- See Language Analysis report for complete details
37. Tool Evaluation Strategy
38. Purpose and Method
- Purpose
- Provide the basis for evaluation of existing tools
- Method
- Literature search on existing evaluation methods
- Categorize, add, and filter applicable criteria
- Evaluate the importance of these criteria
- Summary table of the final 23 criteria
- See Combined Report - Section 9 for complete description and reasons behind inclusion of various criteria
39. Evaluation Criteria (Feature (section) / Description / Information to gather / Categories / Importance rating)
- Available metrics (9.2.1.3) - description: kind of metrics/events the tool can track (e.g., function, hardware, synchronization); info to gather: metrics it can provide (function, hardware, ...); categories: Productivity; importance: Critical
- Cost (9.1.1) - description: physical cost of obtaining software, licenses, etc.; info to gather: how much; categories: Miscellaneous; importance: Average
- Documentation quality (9.3.2) - description: helpfulness of the documentation in terms of understanding the tool design and its usage (usage more important); info to gather: clear document? helpful document?; categories: Miscellaneous; importance: Minor
- Extendibility (9.3.1) - description: ease of (1) adding new metrics and (2) extending to new languages, particularly UPC/SHMEM; info to gather: estimate of how easy it is to extend to UPC/SHMEM, how easy it is to add new metrics; categories: Miscellaneous; importance: Critical
- Filtering and aggregation (9.2.3.1) - description: filtering is the elimination of noise data, aggregation is the combining of data into a single meaningful event; info to gather: does it provide filtering? aggregation? to what degree?; categories: Productivity, Scalability; importance: Critical
- Hardware support (9.1.4) - description: hardware supported by the tool; info to gather: which platforms?; categories: Usability, Portability; importance: Critical
- Heterogeneity support (9.1.5) - description: ability to run the tool in a system where nodes have different HW/SW configurations; info to gather: supports running in a heterogeneous environment?; categories: Miscellaneous; importance: Minor
40. Evaluation Criteria (continued)
- Installation (9.1.2) - description: ease of installing the tool; info to gather: how to get the software, how hard it is to install, components needed, estimated number of hours needed for installation; categories: Usability; importance: Minor
- Interoperability (9.2.2.2) - description: ease of viewing the tool's results with other tools, using other tools in conjunction with this tool, etc.; info to gather: list of other tools that can be used with this one; categories: Portability; importance: Average
- Learning curve (9.1.6) - description: learning time required to use the tool; info to gather: estimated learning time for a basic set of features and for the complete set; categories: Usability, Productivity; importance: Critical
- Manual overhead (9.2.1.1) - description: amount of work needed by the user to instrument their program; info to gather: method for manual instrumentation (source code, instrumentation language, etc.), automatic instrumentation support; categories: Usability, Productivity; importance: Average
- Measurement accuracy (9.2.2.1) - description: accuracy level of the measurement; info to gather: evaluation of the measuring method; categories: Productivity, Portability; importance: Critical
- Multiple analyses (9.2.3.2) - description: amount of post-measurement analysis the tool provides; generally good to have different analyses for the same set of data; info to gather: provides multiple analyses? useful analyses?; categories: Usability; importance: Average
- Multiple executions (9.3.5) - description: tool support for executing multiple programs at once; info to gather: supports multiple executions?; categories: Productivity; importance: Minor to Average
- Multiple views (9.2.4.1) - description: tool's ability to provide different views/presentations of the same set of data; info to gather: provides multiple views? intuitive views?; categories: Usability, Productivity; importance: Critical
41. Evaluation Criteria (continued)
- Performance bottleneck identification (9.2.5.1) - description: tool's ability to identify performance bottlenecks and to help resolve them; info to gather: supports automatic bottleneck identification? how?; categories: Productivity; importance: Minor to Average
- Profiling/tracing support (9.2.1.2) - description: profiling/tracing method the tool utilizes; info to gather: profiling? tracing? trace format, trace strategy, mechanism for turning tracing on and off; categories: Productivity, Portability, Scalability; importance: Critical
- Response time (9.2.6) - description: amount of time before any useful information is fed back to the user after program execution; info to gather: how long it takes to get back useful information; categories: Productivity; importance: Average
- Searching (9.3.6) - description: tool support for searching for a particular event or set of events; info to gather: supports data searching?; categories: Productivity; importance: Minor
- Software support (9.1.3) - description: software supported by the tool; info to gather: libraries it supports, languages it supports; categories: Usability, Productivity; importance: Critical
- Source code correlation (9.2.4.2) - description: tool's ability to correlate event data back to the source code; info to gather: able to correlate performance data to source code?; categories: Usability, Productivity; importance: Critical
- System stability (9.3.3) - description: stability of the tool; info to gather: crash rate; categories: Usability, Productivity; importance: Average
- Technical support (9.3.4) - description: responsiveness of the tool developers; info to gather: time to get a response from the developers, quality/usefulness of system messages; categories: Usability; importance: Minor to Average
42. Tool Evaluations
43. Purpose and Method
- Purpose
- Evaluation of existing tools
- Method
- Pick a set of modern performance tools to evaluate
- Try to pick the most popular tools
- Also pick tools that are innovative in some form
- For each tool, evaluate and score using the standard set of criteria
- Also
- Evaluate against a set of programs with known bottlenecks to test how well each tool helps improve performance
- Attempt to find out which metrics are recorded by a tool and why
- Tools: TAU, PAPI, Paradyn, MPE/Jumpshot-4, mpiP, Vampir/VampirTrace (now Intel cluster tools), Dynaprof, KOJAK, SvPablo (in progress), MPICL/ParaGraph (in progress)
- See Tool Evaluation presentations for complete evaluation of each tool
44. Instrumentation Methods
- Instrumentation methodology
- Most tools use the MPI profiling interface
- Reduces instrumentation overhead for the user and the tool developer (see the wrapper sketch below)
- We are exploring ways to create and use something similar for UPC and SHMEM
- A few tools use dynamic, binary instrumentation
- Paradyn and Dynaprof are examples
- Makes things very easy for the user, but very complicated for the tool developer
- Tools that rely entirely on manual instrumentation can be very frustrating to use!
- We should avoid this by using existing instrumentation libraries and code from other projects
- Instrumentation overhead
- Most tools achieved less than 20% overhead for the default set of instrumentation
- Seems to be a likely target we should aim for in our tool
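For reference, the MPI profiling interface lets a tool interpose on MPI calls and forward them to the PMPI_ entry points; the minimal sketch below times MPI_Barrier this way and prints a per-process summary at MPI_Finalize. The wrappers are compiled into a library linked ahead of the MPI library, so no application source changes are needed. A UPC/SHMEM analogue would require similar alternate entry points (or link-time wrapping), which is what we are investigating.

```c
#include <mpi.h>
#include <stdio.h>

/* Per-process totals gathered by the interposed wrappers. */
static double barrier_time  = 0.0;
static long   barrier_calls = 0;

/* Tool-provided MPI_Barrier: measure, then forward to the real
 * implementation through the PMPI profiling entry point. */
int MPI_Barrier(MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Barrier(comm);
    barrier_time  += MPI_Wtime() - t0;
    barrier_calls += 1;
    return rc;
}

/* Interposed MPI_Finalize: report before shutting MPI down. */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: %ld MPI_Barrier calls, %.6f s total\n",
           rank, barrier_calls, barrier_time);
    return PMPI_Finalize();
}
```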
45. Visualizations
- Many tools provide one way of looking at things
- Do one thing, but do it well
- Can cause problems if performance is hindered by something not being shown
- Gantt-chart/timeline visualizations most prevalent
- Especially in MPI-specific tools
- Tools that allow multiple ways of looking at things can ease analysis
- However, too many methods can become confusing
- Best to use a few visualizations that display different information
- In general, creating good visualizations is not trivial
- Some visualizations that look neat aren't necessarily useful
- We should try to export to known formats (Vampir, etc.) to leverage existing tools and code
46. Bottleneck Detection
- To test, used PPerfMark
- Extension of the GrindStone benchmark suite for MPI applications
- Contains short (<100 lines of C code) applications with obvious bottlenecks (a toy example in that spirit appears below)
- Most tools rely on the user to pick out bottlenecks from visualizations
- This affects scalability of the tool as system size increases
- Notable exceptions: Paradyn, KOJAK
- In general, most tools fared well
- The system-time benchmark was the hardest to pick out
- Tools that lack source code correlation also make it hard to track down where a bottleneck occurs
- The best strategy seems to be a combination of trace visualization and automatic analysis
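For flavor, here is a hypothetical mini-benchmark in the spirit of the PPerfMark/GrindStone programs (not taken from the suite): every rank runs the same loop, but rank 0 carries an artificial extra load, so the other ranks pile up waiting in MPI_Barrier, an imbalance that any timeline view or automatic analysis should expose.

```c
#include <mpi.h>
#include <stdio.h>

/* Deliberate load imbalance: rank 0 does extra "work" each iteration,
 * so the other ranks accumulate wait time at the barrier. */
static double busy_work(long n)
{
    volatile double x = 0.0;
    for (long i = 0; i < n; i++) x += (double)i * 1e-9;
    return x;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double wait = 0.0;
    for (int iter = 0; iter < 20; iter++) {
        busy_work(rank == 0 ? 20000000L : 2000000L);   /* the imbalance */
        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);
        wait += MPI_Wtime() - t0;
    }

    printf("rank %d waited %.3f s at barriers\n", rank, wait);
    MPI_Finalize();
    return 0;
}
```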
47. Conclusions and Status (1)
- Completed tasks
- Programming practices
- Mod 2^n inverse, convolution, CAMEL cipher, concurrent wave equation, depth-first search
- Literature searches/preliminary research
- Experimental performance measurement techniques
- Language analysis for UPC (spec, Berkeley, Michigan) and SHMEM (spec, GPSHMEM, Quadrics SHMEM, SGI SHMEM)
- Optimizations
- Performance analysis strategies
- Performance factors
- Presentation methodologies
- Performance modeling and prediction
- Creation of tool evaluation strategy
- Tool evaluations
- Paradyn, TAU, PAPI/Perfometer, MPE/Jumpshot, Dimemas/Paraver/MPITrace, mpiP, Intel cluster tools, Dynaprof, KOJAK
48. Conclusions and Status (2)
- Tasks currently in progress
- Finish tool evaluations
- SvPablo and MPICL/ParaGraph
- Finish up language analysis
- Waiting on NDAs for HP UPC
- Also on access to a Cray machine
- Write tool evaluation and language analysis report
- Creation of high-level PAT design documents (starting week of 3/28/2005)
- Creating a requirements list
- Generating a specification for each requirement
- Creating a design plan based on the specifications and requirements
- For more information, see PAT Design Plan on project website