Performance Visualizations using XML Representations

About This Presentation

Slides: 34

Provided by: yij5

Learn more at: http://www.cs.toronto.edu

Transcript and Presenter's Notes

1
Performance Visualizations using XML
Representations

Presented by Kristof Beyls, Yijun Yu, Erik H. D'Hollander

2
Overview

Background: program optimization research
XML representations
Visualizations
Conclusion

3
Program optimization research

What slows down a program execution?
Need to pinpoint the performance bottlenecks (by analyzing the program).
How to improve the performance?
By program transformations, based on the pinpointed bottlenecks.
How to transform the program?
Compiler. Advantage: automatic optimization. Disadvantage: sometimes hard to understand what the program does.
Programmer. Advantage: has a good understanding of program functionality. Disadvantage: requires human effort.
How to present performance bottlenecks best?
How to construct a research infrastructure that supports all the above in a common framework? (-> XML)

4
Two main performance factors

Parallelism: performing computation in parallel reduces execution time
Data locality: fetching data from fast CPU caches reduces execution time

6
Why XML representations?
Tools:
yaxx: YACC extension to XML
oc: Omega calculator
isv: iteration space visualizer
cv: cache (trace) visualizer
distv: (cache reuse) distance visualizer

Extensible and versatile
Standard and Interoperable
Language Independent

XML namespace (tool)    Representing
1. ast (yaxx)           abstract syntax tree
2. par (oc)             identified parallel or sequential loops
3. trace (isv, cv)      execution trace of memory instructions
4. hotspot (isv, cv)    performance bottleneck locations
5. isdg (isv)           iteration space dependence graph
6. rdv (distv)          a reuse distance vector

7
1. AST (Abstract Syntax Tree) (ast)

XML is a good representation for an AST because of its
hierarchical nature.
The ast namespace captures the syntactic information of
a program.
We can construct the AST from source code through
YAXX and regenerate source code through XSLT.

<ast:DO_Loop>
  <var name="I"/>
  <lb><const value="1"/></lb>
  <ub><const value="10"/></ub>
  <st><const value="1"/></st>
  <body></body>
</ast:DO_Loop>

DO I=1,10,1
ENDDO
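
The round trip from the ast representation back to source text can be sketched as follows. The slides do this with XSLT; this sketch swaps in Python's standard xml.etree.ElementTree instead, handles only an empty loop body, and uses a made-up namespace URI (AST_NS), since the slides show only the ast prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the slides only show the "ast" prefix.
AST_NS = "http://example.org/ast"

SAMPLE = f"""
<ast:DO_Loop xmlns:ast="{AST_NS}">
  <var name="I"/>
  <lb><const value="1"/></lb>
  <ub><const value="10"/></ub>
  <st><const value="1"/></st>
  <body></body>
</ast:DO_Loop>
"""


def const_of(loop, tag):
    """Return the constant stored as <tag><const value="..."/></tag>."""
    return loop.find(f"{tag}/const").get("value")


def unparse_do_loop(loop):
    """Turn one ast:DO_Loop element back into Fortran text (empty body only)."""
    if loop.tag != f"{{{AST_NS}}}DO_Loop":
        raise ValueError(f"unexpected element {loop.tag}")
    var = loop.find("var").get("name")
    lb, ub, st = (const_of(loop, t) for t in ("lb", "ub", "st"))
    return f"DO {var}={lb},{ub},{st}\nENDDO"


fortran = unparse_do_loop(ET.fromstring(SAMPLE.strip()))
print(fortran)
```

The unqualified child elements (var, lb, ub, st) are looked up without a namespace because the sample declares only the ast prefix.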
9
2. Parallel loops (par)

Identified parallel loops are annotated with a
<par:true/> element in the par namespace.
<ast:DO_Loop>
  <par:true/>
</ast:DO_Loop>
In this way, semantic and syntactic information are
kept in orthogonal namespaces. Syntax-based tools
(e.g. the unparser) can still ignore the annotation, or translate
it into directive comments, e.g. Fortran CDOALL.
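
A minimal sketch of how a syntax-based tool might treat the par annotation, assuming the element layout shown on this slide. Both namespace URIs and the helper names (is_parallel, strip_par, directive_for) are hypothetical; the slides do not show how the unparser is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the slides only show the prefixes.
AST_NS = "http://example.org/ast"
PAR_NS = "http://example.org/par"

SAMPLE = f"""
<ast:DO_Loop xmlns:ast="{AST_NS}" xmlns:par="{PAR_NS}">
  <par:true/>
  <var name="I"/>
</ast:DO_Loop>
"""


def is_parallel(loop):
    """A loop counts as parallel when it carries a direct par:true child."""
    return loop.find(f"{{{PAR_NS}}}true") is not None


def directive_for(loop):
    """Directive comment to emit before the loop, or None for a sequential loop."""
    return "CDOALL" if is_parallel(loop) else None


def strip_par(elem):
    """Syntax-only view: drop every element that lives in the par namespace."""
    for child in list(elem):
        if child.tag.startswith(f"{{{PAR_NS}}}"):
            elem.remove(child)
        else:
            strip_par(child)
    return elem


loop = ET.fromstring(SAMPLE.strip())
print(directive_for(loop))
```

Because the annotation sits in its own namespace, strip_par can remove it without knowing anything about the ast elements around it.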

10
XFPT: an extended optimizing compiler
12
3. Traces (trace)

A trace records a sequence of memory address
accesses:
<trace:seq>
  <access addr="0x00ffe8" bytes="8"/>
  <access addr="0x00fff0" bytes="16"/>
</trace:seq>
The trace alone can be used to identify run-time data
dependences and, through a cache simulator, cache misses.
By associating an address with the array reference
number or loop iteration index in the program's
AST, the trace can also be used for advanced loop
dependence analysis and cache reuse distance
analysis.
<trace:seq>
  <access addr="0x00ffe8" bytes="8" hotspot:id="1"> <!-- The 1st reference -->
    <do_loop hotspot:id="1" vector="1 2"/> <!-- The 1st DO loop: (I,J)=(1,2) -->
    <array hotspot:id="1" vector="1"/> <!-- Reference to array element X(1) -->
  </access>
</trace:seq>
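
As an illustration of feeding such a trace to a cache simulator, here is a minimal sketch of a fully associative LRU cache. The namespace URI, the cache size (num_lines), the line size, and the third access in the sample are assumptions added for the example; the simulator used with the actual tools is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

TRACE_NS = "http://example.org/trace"  # hypothetical namespace URI

# The first two accesses come from the slide; the third is added for illustration.
SAMPLE = f"""
<trace:seq xmlns:trace="{TRACE_NS}">
  <access addr="0x00ffe8" bytes="8"/>
  <access addr="0x00fff0" bytes="16"/>
  <access addr="0x00ffe8" bytes="8"/>
</trace:seq>
"""


def read_trace(xml_text):
    """Return the trace as a list of (address, size in bytes) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [(int(a.get("addr"), 16), int(a.get("bytes"))) for a in root.iter("access")]


def count_misses(accesses, num_lines=2, line_size=32):
    """Count accesses that touch at least one line absent from an LRU cache."""
    cache = OrderedDict()  # line number -> True, least recently used first
    misses = 0
    for addr, nbytes in accesses:
        first = addr // line_size
        last = (addr + nbytes - 1) // line_size
        missed = False
        for line in range(first, last + 1):
            if line in cache:
                cache.move_to_end(line)        # refresh LRU position
            else:
                missed = True
                cache[line] = True
                if len(cache) > num_lines:
                    cache.popitem(last=False)  # evict least recently used line
        if missed:
            misses += 1
    return misses


accesses = read_trace(SAMPLE)
print(count_misses(accesses))
```

With 32-byte lines all three accesses fall into one line, so only the first one misses; shrinking line_size to 8 spreads them over three lines and every access misses in a two-line cache.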

13
4. Hotspots (hotspot)

Hot spots are the identified bottlenecks of the
program.
Two types are used:
Bottleneck loops tell which loops are
performance bottlenecks
Bottleneck references tell which references are
performance bottlenecks
<hotspot:list>
  <do_loop id="1">
    <index vector="I J"/>
    <start lineno="3" colno="1"/>
    <end lineno="7" colno="12"/>
  </do_loop>
  <array id="2" name="X">
    <dim><lb>1</lb><ub>10</ub></dim>
  </array>
  <reference id="1" type="R">
    <start lineno="5" colno="9"/>
    <end lineno="5" colno="14"/>
  </reference>
</hotspot:list>

DIM T(3), X(10)
REAL S, X
DO I = 1, 10
DO J = 1, 10
S = S + X(I)*J
ENDDO
ENDDO

16
Performance Visualizations

XML plays an important role in gluing the
visualizers to an optimizing compiler
Loop dependence visualization
Reuse distance visualization
Cache behavior visualization

17
Visualization 1: ISDG (iteration space dependence
graph)

An iteration is an instance of the loop body
statements. An iteration space is the set of
integer vector values of the DO loop index
variables for the traversed iterations.
A loop-carried dependence is a dependence caused by
two references R1 and R2 that access the same
memory address, while
At least one of R1, R2 is a write
R1 belongs to loop iteration (i1, j1) and R2
belongs to loop iteration (i2, j2) ≠ (i1, j1)
An ISDG is a graph with nodes representing the
iteration space and edges representing loop-carried
dependences.

DO i=1,5
DO j=1,5
A(i,j) = A(i,j-1)
ENDDO
ENDDO

[Figure: ISDG of the loop nest above, with axes i and j each running from 1 to 5]
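
The edges of such a graph can be enumerated by replaying the loop nest, as in the sketch below. It assumes the loop body is A(i,j) = A(i,j-1) (the operators are lost in the transcript, so the sign of the offset is a guess) and tracks only write-then-read dependences; isdg_edges is a hypothetical helper, not part of ISV.

```python
from itertools import product


def isdg_edges(n=5):
    """Loop-carried flow dependences of
         DO i=1,n / DO j=1,n / A(i,j) = A(i,j-1)
    found by replaying the iterations in execution order."""
    last_writer = {}  # array element -> iteration that last wrote it
    edges = []
    # product() yields (i, j) in lexicographic order, which is the execution order.
    for i, j in product(range(1, n + 1), repeat=2):
        src = last_writer.get((i, j - 1))      # the read of A(i,j-1)
        if src is not None and src != (i, j):  # written by a different iteration
            edges.append((src, (i, j)))
        last_writer[(i, j)] = (i, j)           # the write of A(i,j)
    return edges


edges = isdg_edges()
distances = {(dst[0] - src[0], dst[1] - src[1]) for src, dst in edges}
print(len(edges), distances)
```

Under that assumption every edge has distance vector (0, 1): iterations depend on each other only along j, so the rows of the iteration space (different values of i) are independent.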
18
The WTCM CFD application

WTCM has a Computational Fluid Dynamics simulator
which involves solving partial differential
equations (PDE) through a Gauss-Seidel solver

[Figure labels: 3D geometry, 1D time, temperature]
19
The visualized dependences
20
The loop transformation
A 3-D unimodular transformation was found after
visualizing the 4D loop nest, which has 177 array
references at run time in each iteration. Here
we use a regular shape. The transformation makes
it possible to speed up the program by around N^2/6
times, where N is the diameter of the geometry.
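
The transformation matrix itself is not given in the transcript. As a toy illustration of what "unimodular" and "legal" mean, the sketch below checks a 2-D skewing matrix against two dependence distance vectors; the matrices, the distance vectors, and the helper names are all made up for the example. A transformation T is unimodular when it is an integer matrix with determinant +1 or -1, and it preserves the dependences when every transformed distance vector stays lexicographically positive.

```python
def det2(t):
    """Determinant of a 2x2 integer matrix given as a pair of rows."""
    (a, b), (c, d) = t
    return a * d - b * c


def transform(t, d):
    """Matrix-vector product T * d."""
    return tuple(sum(row[k] * d[k] for k in range(len(d))) for row in t)


def lex_positive(v):
    """True when the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal_unimodular(t, distances):
    """Unimodular (|det| = 1) and dependence-preserving for the given distances."""
    return abs(det2(t)) == 1 and all(lex_positive(transform(t, d)) for d in distances)


skew = ((1, 1), (0, 1))     # hypothetical wavefront skew: i' = i + j, j' = j
deps = [(1, 0), (0, 1)]     # hypothetical dependence distance vectors
new_deps = [transform(skew, d) for d in deps]
print(is_legal_unimodular(skew, deps), new_deps)
```

After the skew every distance vector has a positive first component, so all dependences are carried by the outer loop and the inner loop can run in parallel. Plain loop interchange, ((0, 1), (1, 0)), would be rejected for a distance vector such as (1, -1).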
21
Visualization 2: Reuse distances

Reuse distance is the amount of data accessed
before a memory address is reused.
reuse distance > cache size => cache miss
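
A minimal sketch of how reuse distances can be computed from an address trace with an LRU stack. It assumes one common convention: the distance is the number of distinct other addresses touched between a use and the next use of the same address. Under that convention an access hits in a fully associative LRU cache of C entries exactly when its distance is smaller than C. The function names and the sample trace are illustrative only.

```python
def reuse_distances(trace):
    """Reuse distance of every access; None marks the first use of an address."""
    stack = []   # LRU stack, most recently used address last
    result = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            result.append(len(stack) - pos - 1)  # distinct addresses used since
            stack.pop(pos)
        else:
            result.append(None)                  # never seen before
        stack.append(addr)
    return result


def lru_misses(trace, capacity):
    """Misses of a fully associative LRU cache holding `capacity` addresses."""
    return sum(1 for d in reuse_distances(trace) if d is None or d >= capacity)


trace = list("abcabb")
print(reuse_distances(trace), lru_misses(trace, 2), lru_misses(trace, 3))
```

The list-based stack costs linear time per access, which is fine for a sketch; long traces would need a tree-based structure.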

23
Execution time reduction on an Itanium processor
(Spec2000 programs).
24
Visualization 3: Cache miss traces
(Tomcatv/Spec95)
White: hit
Blue: compulsory
Green: capacity
Red: conflict
56.7
25
4.2 Visualizing hotspots of conflict cache misses
X(I,J+1) and X(I,J) conflict if X has
dimension (512,512). The conflict is resolved by changing
the dimension to (524,524), a technique also known as
array padding.
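
Why the padding helps can be seen from the cache set each element maps to. The sketch below assumes 8-byte array elements, Fortran column-major layout, and a 16 KB, 4-way set-associative cache with 32-byte lines; these parameters and the helper names are assumptions for illustration, not values taken from the slides. Mapping to the same set is only a precondition for a conflict miss, since the set must also be oversubscribed by other lines.

```python
def element_address(i, j, leading_dim, elem_bytes=8, base=0):
    """Byte address of X(i,j) in a column-major array with 1-based indices."""
    return base + ((j - 1) * leading_dim + (i - 1)) * elem_bytes


def cache_set(addr, cache_bytes=16 * 1024, line_bytes=32, ways=4):
    """Index of the cache set that the line containing addr maps to."""
    num_sets = cache_bytes // (line_bytes * ways)
    return (addr // line_bytes) % num_sets


def same_set(i, j, leading_dim):
    """Do X(i,j) and X(i,j+1) compete for the same cache set?"""
    a = element_address(i, j, leading_dim)
    b = element_address(i, j + 1, leading_dim)
    return cache_set(a) == cache_set(b)


print(same_set(1, 1, 512), same_set(1, 1, 524))
```

With a leading dimension of 512 two neighbouring columns are 4096 bytes apart, exactly the span covered by one pass over all sets in this assumed cache, so they always collide. With 524 the distance becomes 4192 bytes and the two elements land three sets apart.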
26
4.2 Cache miss trace after array padding: most
spatial locality is exploited, conflict misses
resolved
On an Intel 550 MHz Pentium III (single CPU), the
speedup measured with VTune is > 50
17.2
28
Conclusion

An existing optimizing compiler FPT was extended
with an extensible XML interface.
The performance factors, in particular loop
parallelism and data locality, were exported from
FPT.
These factors were visualized through:
Loop dependence visualizer: ISV
Execution trace visualizer: CacheVis
Reuse distance visualizer: ReuseVis
The programmer can use the visualized feedback to
improve the performance.

29
The End.

Any questions?

30
Program semantics (Software) vs. Architecture
capabilities (Hardware)
Research area: Parallel computing
Program: Parallelism at task, loop and instruction levels through data dependence analysis
Architecture: Multi-processors (MIMD), pipeline (SIMD), multi-threads, network of workstations (NOW, Grid computing)
Research area: Memory hierarchy
Program: Temporal and spatial data locality, data layout, stack reuse distances
Architecture: Cache at level 1, 2, 3, TLB, set associativity, data replacement policy
Visualize them!
31
2. Major Performance factors

Parallelism
Loop dependences
Loop-level parallelism
Instruction-level parallelism
Partition load balance
Data locality
Temporal locality
Spatial locality
CCC (Compulsory, Capacity, Conflict) cache misses
Reuse distances

32
3.6 Cache parameters

To tune for different architectural cache
configurations, we represent the cache
parameters (cache size, cache line size and set
associativity) in an XML configuration file.
For example, a 2-level cache is specified as
follows:
<cache:hierarchy>
  <parameters level="1">
    <size>1024</size>
    <line>32</line>
    <associativity>32</associativity>
  </parameters>
  <parameters level="2">
    <size>65536</size>
    <line>32</line>
    <associativity>1</associativity>
  </parameters>
</cache:hierarchy>
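
A minimal sketch of reading such a configuration and deriving the number of sets per level. It assumes sizes are given in bytes and uses a made-up namespace URI; load_levels is a hypothetical helper, not part of the tools described here.

```python
import xml.etree.ElementTree as ET

CACHE_NS = "http://example.org/cache"  # hypothetical namespace URI

CONFIG = f"""
<cache:hierarchy xmlns:cache="{CACHE_NS}">
  <parameters level="1">
    <size>1024</size>
    <line>32</line>
    <associativity>32</associativity>
  </parameters>
  <parameters level="2">
    <size>65536</size>
    <line>32</line>
    <associativity>1</associativity>
  </parameters>
</cache:hierarchy>
"""


def load_levels(xml_text):
    """Map cache level -> parameters, adding the derived number of sets."""
    root = ET.fromstring(xml_text.strip())
    levels = {}
    for params in root.iter("parameters"):
        size = int(params.findtext("size"))            # assumed to be bytes
        line = int(params.findtext("line"))            # assumed to be bytes
        assoc = int(params.findtext("associativity"))  # lines per set
        levels[int(params.get("level"))] = {
            "size": size,
            "line": line,
            "associativity": assoc,
            "sets": size // (line * assoc),
        }
    return levels


levels = load_levels(CONFIG)
print(levels[1]["sets"], levels[2]["sets"])
```

Read this way, level 1 has a single set of 32 lines (fully associative), and level 2 has 2048 sets of one line each (direct mapped).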

33
4.2 Visualizing data locality histogram
distributed over reuse distances
