Performance Analysis and Optimization through Run-time Simulation and Statistics


1
Performance Analysis and Optimization through
Run-time Simulation and Statistics
  • Philip J. Mucci
  • University of Tennessee
  • mucci@cs.utk.edu
  • http://www.cs.utk.edu/mucci

2
Motivation
  • Tuning real DOD and DOE applications!
  • Performance on most codes is low.
  • Poor overall efficiency due to poor single node
    performance.
  • Codes show good scalability only because of the
    above, combined with fast interconnects.
  • The tuning expertise is not there among application
    developers, nor should it need to be.

3
Description
  • Use data available at run-time to improve
    compilation and optimization technology.
  • Empirically determine how well the code maps to
    the underlying architecture.
  • Bottlenecks can be identified and possibly
    corrected by an explicit set of rules and
    transformations.

4
Information not being used
  • Hardware statistics gathered through simulation
    or monitoring can identify the problem.
  • Cache and branching behavior
  • Cycle/Load/Store/FLOP counts
  • Bottleneck determination
  • Reference pattern
  • Dynamic memory placement

5
Problem Areas
  • Efficient use of the memory hierarchy
  • Register re-use
  • Aliasing (see the sketch after this list)
  • Inlining
  • Demotion
  • Algorithms (iterative vs. direct)
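
The aliasing item above is the kind of problem a compiler cannot fix
without help. A minimal sketch (my example, not the slides'; C99's
restrict qualifier is used for illustration): if the compiler must
assume out may overlap a, it has to reload a's elements around every
store instead of keeping them in registers.

    /* Hypothetical illustration of the aliasing bullet: without
     * "restrict", the compiler must assume out and a may overlap,
     * so it cannot keep a[i] values in registers across the stores.
     * C99's restrict qualifier promises no overlap. */
    void smooth(double *restrict out, const double *restrict a, int n)
    {
        for (int i = 1; i < n - 1; i++)
            /* with restrict, a[i-1], a[i], a[i+1] can be carried in
             * registers between iterations instead of being reloaded */
            out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
    }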

6
Solutions
  • Understanding (tutorials, reference material)
  • Tools
  • Preprocessors
  • Compilers
  • Manpower

7
Increasing Cache Performance
  • How do we make better use of the memory hierarchy?
  • For computer scientists, it's not that hard. We
    need the right tools.
  • How much can we automate?
  • Through available tools and source analysis we
    can usually narrow the problem down to the
    offending function.

8
Cache Simulation
  • Instrumentation of routines
  • Run of the executable
  • Analysis and correlation with source code!
  • Old idea, new implementation.
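
In miniature, the instrument-run-analyze cycle might look like the
following hedged sketch. This is not the actual tool: sim_ref,
LINE_SIZE, NUM_LINES, and the direct-mapped model are all assumptions.
The instrumenter inserts a call before each load and store, and the
simulator keeps statistics that can later be correlated with file and
line.

    /* A minimal sketch of the idea: a direct-mapped cache model
     * driven by hooks that a source-level instrumenter would insert
     * before each memory reference.  All names are hypothetical. */
    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 32u               /* bytes per cache line */
    #define NUM_LINES 1024u             /* direct-mapped lines  */

    static uintptr_t tags[NUM_LINES];   /* resident line address per set */
    static unsigned long refs, misses;

    /* Called on every load/store; file/line permit correlating
     * misses back to the source code. */
    static void sim_ref(const void *addr, const char *file, int line)
    {
        uintptr_t lineaddr = (uintptr_t)addr / LINE_SIZE;
        unsigned set = (unsigned)(lineaddr % NUM_LINES);

        refs++;
        if (tags[set] != lineaddr) {    /* miss: line not resident */
            misses++;
            tags[set] = lineaddr;
            (void)file; (void)line;     /* a real tool would log these */
        }
    }

    int main(void)
    {
        static double x[4096], y[4096];

        /* What the instrumenter might emit for: y[i] = x[i] + 1.0; */
        for (int i = 0; i < 4096; i++) {
            sim_ref(&x[i], __FILE__, __LINE__);   /* load x[i]  */
            sim_ref(&y[i], __FILE__, __LINE__);   /* store y[i] */
            y[i] = x[i] + 1.0;
        }
        printf("refs=%lu misses=%lu (%.2f%% miss ratio)\n",
               refs, misses, 100.0 * misses / refs);
        return 0;
    }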

9
Cache Simulation
  • Hardware independence
  • Information on:
  • Locality
  • Placement
  • Reference pattern and Reuse
  • Line usage

10
Locality
  • Spatial and Temporal
  • misses/memory reference
  • misses/re-use
  • Conflict vs. Capacity

11
Placement
  • Padding can be very important.
  • Not always possible to determine during the
    static-analysis phase.
  • The reference pattern can affect the padding
    needed.
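
A classic illustration of why padding matters (my example, not the
slides'): two arrays that are each an exact multiple of a
direct-mapped cache's size map x[i] and y[i] to the same set, so they
evict each other on every iteration; a small pad between them removes
the conflict. CACHE_BYTES and the pad of 16 doubles are illustrative
values.

    /* Hypothetical padding example.  Without "pad", x[i] and y[i]
     * fall in the same direct-mapped set and evict each other on
     * every loop iteration; the pad shifts y's mapping.  The struct
     * fixes the arrays' relative layout in memory. */
    #include <stddef.h>

    #define CACHE_BYTES (32 * 1024)           /* illustrative cache size */
    #define N (CACHE_BYTES / sizeof(double))

    static struct {
        double x[N];
        double pad[16];    /* inter-array padding */
        double y[N];
    } a;

    double dot(void)
    {
        double s = 0.0;
        for (size_t i = 0; i < N; i++)
            s += a.x[i] * a.y[i];
        return s;
    }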

12
Reference Pattern
  • Again, not always possible to do during static
    analysis.
  • Even harder to analyze when dealing with
    pseudo-optimized code.
  • Examples: stencils, sparse solvers, etc.

13
Reuse
  • Blocking is critical to applications where there
    is re-use.
  • We need to identify re-use potential, to spot the
    areas on which blocking and register allocation
    should be focused.
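
A standard blocking example (mine, not the slides'): tiling a matrix
multiply so a block of b stays resident in cache and a[i][k] is
carried in a register while they are hot. N and the tile edge B are
illustrative values; B would be tuned to the target cache.

    /* Hypothetical blocked (tiled) matrix multiply: the jj/kk tiles
     * keep a B x B block of b resident in cache, and r carries
     * a[i][k] in a register across the inner loop.  Assumes c is
     * zero-initialized by the caller and N is a multiple of B. */
    #define N 512
    #define B 32    /* tile edge; tuned to the cache in practice */

    void matmul_blocked(double c[N][N],
                        const double a[N][N], const double b[N][N])
    {
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                for (int i = 0; i < N; i++)
                    for (int k = kk; k < kk + B; k++) {
                        double r = a[i][k];        /* register re-use */
                        for (int j = jj; j < jj + B; j++)
                            c[i][j] += r * b[k][j];
                    }
    }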

14
Source Code Mapping
  • Most cache tools are hard to use and hard to
    relate back to the source code.
  • This tool simulates the cache(s) on each memory
    reference and thus can easily correlate the data
    with the source.
  • Instrumentation is at the source level, not in the
    object code.

15
Statistics
  • Global, per file, per statement, per reference
  • References, misses, cold misses, re-used
    references
  • Conflict/Re-use matrix
  • M(A,B) = x means that some element of A ejected
    some element of B from the cache x times, counted
    only when that element of A has been in the cache
    before.
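
A hedged sketch of how such a matrix might be maintained. The array
ids, the on_eviction hook, and the seen-before flag are all
assumptions about the implementation, which the slides do not show.

    /* Hypothetical conflict/re-use matrix update.  Each cache line
     * is attributed to the source array it holds (an integer id);
     * per the definition above, an ejection of B by A is counted
     * only when the incoming element of A was in the cache before. */
    #define NUM_ARRAYS 16
    static unsigned long M[NUM_ARRAYS][NUM_ARRAYS];

    static void on_eviction(int incoming_array, int evicted_array,
                            int incoming_seen_before)
    {
        if (incoming_seen_before)
            M[incoming_array][evicted_array]++;
    }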

16
Development status
  • GUI for selective instrumentation
  • Real parsers (F90, C, C++)
  • Better report generation

17
Implementation
  • Simulator written in C
  • Instrumentation in Perl
  • GUI in Java
  • Report generator in Perl

18
Relevance
  • Why shouldn't this technology be part of a
    feedback loop?
  • Compile with instrumentation
  • Run
  • Recompile with information from the run
  • Watch out for input-sensitivity issues.

19
Integration
  • Identifying and correcting poor cache behavior
    can be made explicit and part of a compiler.
    (Ideally a source-to-source transformer or
    preprocessor)
  • Simulator can stand alone for detailed analysis
    and optimization by CS folks.
  • Our knowledge and expertise made available
    through the tools.

20
Hardware Counters
  • Virtually every processor available has hardware
    counters
  • The interfaces and documentation are poor or
    non-existent.
  • Hardware differs greatly, as do the counters'
    semantics.
  • Useful for measurement, analysis, optimization,
    modeling and benchmarking.

21
Performance Data Standard
  • Standardize an API to obtain hardware performance
    counters
  • Standardize the definitions of what those
    counters mean
  • API is lightweight and portable

22
Performance Data Standard
  • Target platforms:
  • R10K, R12K
  • P2SC, Power PC 604e, Power 3
  • Sun Ultra 2/3
  • Intel PII, Katmai, Merced
  • Alpha 21164, 21264

23
Performance Data Standard
  • Motivation:
  • Portable performance tools
  • Optimization through feedback
  • Developers wanting simple and accurate timing and
    statistics
  • Modeling, evaluation

24
Performance Data Standard
  • A small number of useful measurement points:
  • Timing: cycles, microseconds
  • I/D cache misses, invalidations
  • Branch mispredictions
  • Load, store, FLOP, and instruction counts
  • I/D TLB misses

25
Performance Data Standard API
  • Efficient counter multiplexing
  • Thread safety
  • Functions for:
  • start, stop, reset, get, accumulate, query,
    control
  • Use the best available vendor-supported interface
    or API
  • Possible pairing with DAIS, Dyninst for naming
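
This standardization effort is what later became PAPI. As a hedged
sketch of the start/stop/read style listed above, here is a fragment
written against the high-level calls of the early PAPI releases;
these names postdate the talk and are not fixed by the slides.

    /* Counting cycles and L1 data-cache misses with the high-level
     * interface of early PAPI releases (the eventual product of this
     * API effort); the slides themselves predate the specification. */
    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int events[2] = { PAPI_TOT_CYC, PAPI_L1_DCM };
        long long counts[2];

        if (PAPI_start_counters(events, 2) != PAPI_OK)
            return 1;

        /* ... region of code being measured ... */

        if (PAPI_stop_counters(counts, 2) != PAPI_OK)  /* stop and read */
            return 1;

        printf("cycles %lld, L1 D-cache misses %lld\n",
               counts[0], counts[1]);
        return 0;
    }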

26
Development status
  • Research on the various machines' available
    hardware and interfaces
  • Compilation of findings, web page, and mailing
    list
  • API specification to appear in mid-August for
    discussion
  • Vendors are lurking
  • http://www.cs.utk.edu/mucci/pdsa

27
Deliverables
  • API for O2K, T3E, SP
  • Portable prof implementation

28
People
  • Shirley Browne (UT)
  • Jeff Brown (LANL)
  • Jeff Durachta (IBM, LANL)
  • Christopher Kerr (IBM, LANL)
  • George Ho (UT)
  • Kevin London (UT)
  • Philip Mucci (UT, Sandia)

29
Rice/UTK Collaboration
  (Diagram: DOD and DOE applications drive the collaboration; Rice
  contributes optimization technology, UT low-level support and tools.)
30
Deliverables
  • F90 and C preprocessor with feedback
  • Cache tool
  • Analysis and optimization of poorly performing
    codes
  • Performance API