Title: The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm
- S. Boeriu (1) and J.C. Bruch, Jr. (2)
- (1) Center for Computational Science and Engineering
- (2) Department of Mechanical and Environmental Engineering and Department of Mathematics
- University of California, Santa Barbara
- http://www.engineering.ucsb.edu/hpscicom
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant 0086262. This research was conducted using the resources of the San Diego Supercomputer Center (http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html).
Outline of Presentation
- Introduction (physical problem)
- Problem formulation
- Fixed domain formulation
- Numerical algorithm
- Test case
- Performance tools and considerations
  a. VAMPIR
  b. PARAVER
- Diagnostic example
- Conclusions
Physical problem
A gap of width 2a and length L is to be etched in a flat plate. The remainder of the plate is covered with a protective (photoresist) layer. Since L >> 2a is assumed, the problem can be treated as two-dimensional.
Figure 1. Physical problem.
Simplifying assumptions
- There is no convection in the etching medium.
- The etching process is isotropic.
- The thickness of the photoresist layer is infinitely small.
- Only one component of the etching liquid determines the process.
Problem formulation
Mathematical model: the etching fluid $\Omega(t)$ is bounded by the outer boundary $\Gamma_1$, the photoresist layer $\Gamma_2(t)$, and the moving boundary $S(t)$. $D \setminus \Omega(t)$ denotes the solid part.
Figure 2. Side view of the physical problem, showing the mathematical problem setup.
Fixed domain formulation
Figure 3. Fixed domain mathematical formulation.
Numerical algorithm
[Equations of the basic numerical algorithm not preserved in this transcript.]
Numerical algorithm (cont.)
[Equations not preserved; they apply in the rectangular region of the plate's cross section.]
Test case
- Maxrow1 (number of rows in the top region) = 280
- Maxcol1 (number of columns in the top region) = 321
- Maxrow2 (number of rows in the bottom region) = 80
- Maxcol2 (number of columns in the bottom region) = 161
- Maxtime (number of time steps) = 5
- Δt (size of time steps) = 1
- θ (successive over-relaxation factor) = 1.935
- B (non-dimensional number) = 10.0
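The equations of the numerical algorithm are not preserved in this transcript, so the following is only a generic illustration of what a successive over-relaxation factor such as θ = 1.935 does. It runs a Gauss-Seidel sweep of the 5-point Laplace stencil in which each update is over-relaxed by θ, and stops when the largest change between sweeps (the quantity the etching code accumulates in err) falls below a tolerance. The grid size, boundary values, tolerance and function names are hypothetical. The fixed-domain, moving-boundary aspects of the etching problem are not modeled.

```c
#include <stddef.h>

#define NR 11   /* hypothetical grid rows, boundary rows included */
#define NC 11   /* hypothetical grid columns, boundary columns included */

/* One SOR sweep over the interior of u for the 5-point Laplace stencil.
   Returns the largest absolute change (the role err plays in the etching code). */
double sor_sweep(double u[NR][NC], double theta)
{
    double err = 0.0;
    for (int i = 1; i < NR - 1; i++) {
        for (int j = 1; j < NC - 1; j++) {
            /* Gauss-Seidel value: neighbors already updated in this sweep are used */
            double gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
            double unew = u[i][j] + theta * (gs - u[i][j]);   /* over-relaxed update */
            double change = unew > u[i][j] ? unew - u[i][j] : u[i][j] - unew;
            if (change > err)
                err = change;
            u[i][j] = unew;
        }
    }
    return err;
}

/* Demo problem with boundary values u = j/(NC-1); its exact discrete solution
   is linear in j. Sweeps until the largest change is below 1e-12 and returns
   the sweep count (-1 if not converged), writing the centre value to *centre. */
int sor_demo(double theta, double *centre)
{
    double u[NR][NC] = {{0.0}};
    for (int i = 0; i < NR; i++)
        for (int j = 0; j < NC; j++)
            if (i == 0 || i == NR - 1 || j == 0 || j == NC - 1)
                u[i][j] = (double)j / (NC - 1);

    for (int k = 1; k <= 100000; k++) {
        if (sor_sweep(u, theta) < 1e-12) {
            if (centre != NULL)
                *centre = u[NR / 2][NC / 2];
            return k;
        }
    }
    return -1;
}
```

For example, `sor_demo(1.935, &c)` returns a positive sweep count and leaves `c` close to 0.5, the exact discrete value at the grid centre. Re-running with other θ values is a quick way to see how strongly the sweep count depends on the relaxation factor for a given grid.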
Domain decomposition
Figure 5. Domain decomposition of the mathematical problem into sixteen subregions, showing the flow of computations.
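Load balancing information for the test case

Processors                    2      4      8     16     32     64
Bottom processors             1      1      1      2      4      8
Bottom points/processor   12880  12880  12880   6440   3220   1610
Top processors                1      3      7     14     28     56
Top points/processor      89880  30174  12840   6420   3210   1605
Points difference         77000  17294     40     20     10      5

Points are grid points assigned to one processor of each region; the last row is the absolute difference between the top and bottom counts. The bottom region has 80 × 161 = 12880 points, consistent with 89880 - 12880 = 77000.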
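The load balancing table can be reproduced from the grid sizes of the test case. The sketch below rests on two assumptions inferred from the numbers (the slides do not state them): each region is split into contiguous blocks of ceil(rows/processors) rows, and one eighth of the processors, but at least one, work on the bottom region. "Points" is then the number of grid points on the busiest processor of each region. The function names are illustrative.

```c
#include <stdio.h>

/* Grid sizes from the test case. */
#define TOP_ROWS 280
#define TOP_COLS 321
#define BOT_ROWS 80
#define BOT_COLS 161

/* Largest point count any processor receives when `rows` rows of `cols`
   points are dealt out in contiguous blocks to `procs` processors. */
long max_points(int rows, int cols, int procs)
{
    int rows_per_proc = (rows + procs - 1) / procs;   /* ceiling division */
    return (long)rows_per_proc * cols;
}

/* Assumed split rule that matches the table: procs/8 processors (at least 1)
   go to the bottom region, the rest to the top region. */
int bottom_procs(int procs)
{
    int b = procs / 8;
    return b > 0 ? b : 1;
}

/* Absolute difference between the busiest top and bottom processors. */
long diff_points(int procs)
{
    int bot = bottom_procs(procs);
    long top_pts = max_points(TOP_ROWS, TOP_COLS, procs - bot);
    long bot_pts = max_points(BOT_ROWS, BOT_COLS, bot);
    return top_pts > bot_pts ? top_pts - bot_pts : bot_pts - top_pts;
}

/* Prints one line of the table per processor count. */
void print_table(void)
{
    static const int P[] = {2, 4, 8, 16, 32, 64};
    for (int k = 0; k < 6; k++) {
        int bot = bottom_procs(P[k]);
        printf("%3d  bottom %2d x %6ld  top %2d x %6ld  diff %6ld\n",
               P[k], bot, max_points(BOT_ROWS, BOT_COLS, bot),
               P[k] - bot, max_points(TOP_ROWS, TOP_COLS, P[k] - bot),
               diff_points(P[k]));
    }
}
```

Under these assumptions the top-region counts and the differences match the table. For example, with three top processors the busiest one holds ceil(280/3) = 94 rows, i.e. 94 × 321 = 30174 points.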
Figure 4. Ideal versus obtained speedup.
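The load balancing table also bounds the speedup one can expect. As a simplifying assumption (not a model used in the slides), ignore communication and suppose the time per sweep is proportional to the number of points on the busiest processor. The speedup on P processors is then at most the total number of points N divided by the larger of the top and bottom per-processor counts:

```latex
S_{\max}(P) = \frac{N}{\max\{n_{\mathrm{top}}(P),\ n_{\mathrm{bot}}(P)\}},
\qquad N = 280 \cdot 321 + 80 \cdot 161 = 89880 + 12880 = 102760

S_{\max}(2) = \tfrac{102760}{89880} \approx 1.14, \quad
S_{\max}(4) = \tfrac{102760}{30174} \approx 3.41, \quad
S_{\max}(8) = \tfrac{102760}{12880} \approx 7.98

S_{\max}(16) = \tfrac{102760}{6440} \approx 15.96, \quad
S_{\max}(32) = \tfrac{102760}{3220} \approx 31.91, \quad
S_{\max}(64) = \tfrac{102760}{1610} \approx 63.83
```

With two or four processors the top region dominates, so the decomposition alone keeps the speedup well below ideal. From eight processors on the bound is close to P, so any remaining gap between ideal and obtained speedup would come from other costs such as communication.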
Figure 6. Moving boundaries at various times.
Performance tools and considerations
The parallel program is monitored while it is executed. Monitoring produces performance data that is interpreted to reveal areas of poor performance. The program is then altered, and the process is repeated until an acceptable level of performance is reached.
VAMPIR 2.0 (Visualization and Analysis of MPI Resources)
- VAMPIR 2.0 is a post-mortem trace visualization tool from Pallas GmbH (http://www.pallas.com).
- It uses the MPI profiling interface and permits analysis of the message events in which data is transmitted between processors during execution of a parallel program. It has a convenient user interface and excellent zooming and filtering. Global displays show all selected processes.
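VAMPIR works post mortem: during the run each process only records time-stamped events, and all analysis happens afterwards on the trace. The sketch below illustrates that second phase with a made-up record layout (process, state, entry and exit time). It is not VAMPIR's trace format. It sums, for each process, the time spent computing versus inside MPI calls, which is the kind of per-process profile VAMPIR's Activity Chart presents.

```c
#include <stddef.h>

/* Hypothetical states a process can be in. */
enum state { ST_COMPUTE = 0, ST_MPI = 1 };
#define NSTATES 2

/* Hypothetical trace record: process `proc` was in state `st`
   from t_enter to t_exit (seconds). */
struct record {
    int proc;
    enum state st;
    double t_enter, t_exit;
};

/* Post-mortem pass: accumulate time per (process, state) into profile.
   profile must have nprocs rows and be zeroed by the caller. */
void build_profile(const struct record *trace, size_t n, int nprocs,
                   double profile[][NSTATES])
{
    for (size_t k = 0; k < n; k++) {
        const struct record *r = &trace[k];
        if (r->proc < 0 || r->proc >= nprocs)
            continue;   /* skip records for unknown processes */
        profile[r->proc][r->st] += r->t_exit - r->t_enter;
    }
}

/* Fraction of a process's traced time spent inside MPI. */
double mpi_fraction(const double row[NSTATES])
{
    double total = row[ST_COMPUTE] + row[ST_MPI];
    return total > 0.0 ? row[ST_MPI] / total : 0.0;
}
```

A process whose MPI fraction is much higher than its neighbors' is usually waiting for them, which is how load imbalance shows up in such a profile.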
- Global Timeline: detailed application execution over a time axis
- Activity Chart: per-process profiling information
- Summary Chart: aggregated profiling information
- Communication Statistics: message statistics for each process pair
- Global Communication Statistics: collective operation statistics
- I/O Statistics: MPI I/O operation statistics
- Calling Tree: global dynamic calling tree
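The Communication Statistics display summarizes message events per process pair. A minimal sketch of that aggregation, using a hypothetical send-event record (source rank, destination rank, byte count) and an assumed upper bound of eight processes, builds message-count and byte matrices whose row sums give the traffic each process sends.

```c
#include <stddef.h>

#define MAXP 8   /* hypothetical upper bound on the number of processes */

/* Hypothetical message event recovered from a trace: one point-to-point send. */
struct msg_event {
    int src, dst;
    long bytes;
};

/* count[s][d]: messages sent from s to d; bytes[s][d]: bytes sent from s to d. */
struct comm_stats {
    long count[MAXP][MAXP];
    long bytes[MAXP][MAXP];
};

/* Fills st from n events, ignoring events with out-of-range ranks. */
void comm_stats_build(const struct msg_event *ev, size_t n, struct comm_stats *st)
{
    for (int s = 0; s < MAXP; s++)
        for (int d = 0; d < MAXP; d++)
            st->count[s][d] = st->bytes[s][d] = 0;

    for (size_t k = 0; k < n; k++) {
        int s = ev[k].src, d = ev[k].dst;
        if (s < 0 || s >= MAXP || d < 0 || d >= MAXP)
            continue;
        st->count[s][d] += 1;
        st->bytes[s][d] += ev[k].bytes;
    }
}

/* Total bytes sent by process s (a row sum of the byte matrix). */
long bytes_sent(const struct comm_stats *st, int s)
{
    long sum = 0;
    for (int d = 0; d < MAXP; d++)
        sum += st->bytes[s][d];
    return sum;
}
```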
PARAVER (Parallel Program Visualization and Analysis Tool)
- PARAVER is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI (graphical user interface).
- PARAVER was developed to respond to the basic need for a qualitative perception of application behavior through visual inspection, and then to be able to focus on a detailed quantitative analysis of the problems.
- Developed by the European Center for Parallelism of Barcelona (CEPBA), Universitat Politecnica de Catalunya (http://www.cepba.upc.es/).
- Paraver is designed to visualize and analyze:
  - communication and load balance
  - programs combining OpenMP and MPI
  - hardware performance and counters
- Usage:
  - compile programs with special libraries
  - run the programs to produce trace files
  - view and analyze the traces
- Paraver is designed to help with program understanding and optimization.
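One common way to put a number on load balance, the kind of imbalance a Paraver view of per-process computation time makes visible, is the ratio of the average to the maximum busy time. A value of 1.0 means perfectly balanced, and 1/P means one process does all the work. The metric and function name are an illustration chosen here, not a Paraver function.

```c
#include <stddef.h>

/* Load-balance efficiency = mean(busy) / max(busy) over n processes,
   where busy[k] is the useful (non-communication) time of process k.
   Returns 1.0 when n == 0 or all times are zero. */
double load_balance(const double *busy, size_t n)
{
    double sum = 0.0, max = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += busy[k];
        if (busy[k] > max)
            max = busy[k];
    }
    if (n == 0 || max <= 0.0)
        return 1.0;
    return (sum / (double)n) / max;
}
```

Using grid points as a stand-in for busy time, the two-processor run of the test case (12880 and 89880 points) gives about 0.57, which shows how unbalanced that configuration is.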
Inefficient programming example
- Load imbalance (inefficient memory use)
- Cache misses and page faults
- Stride minimization (efficient memory use)
Load imbalance - VAMPIR
Figure 7. Load imbalance in the plate region.
Load imbalance - PARAVER
Figure 8. Load imbalance in the plate region.
The memory hierarchy
Array allocation
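Fortran stores arrays such as u1new(i,j) in column-major order: elements with consecutive first index i are adjacent in memory, while consecutive j are a whole column apart. The sketch below spells out that address arithmetic in C, which is itself row-major, so the mapping is written by hand. The 1-based indices follow Fortran; the function names and the example leading dimension of 280 rows are assumptions for illustration, since the actual array declarations are not shown.

```c
#include <stddef.h>

/* Offset (in elements) of a(i,j) for a Fortran-style array declared
   a(nrows, ncols) with 1-based indices: the first index varies fastest. */
size_t colmajor_offset(size_t i, size_t j, size_t nrows)
{
    return (i - 1) + (j - 1) * nrows;
}

/* Distance in elements to the next element when the inner loop runs over i. */
size_t stride_inner_i(size_t nrows)
{
    return colmajor_offset(2, 1, nrows) - colmajor_offset(1, 1, nrows);
}

/* Distance in elements to the next element when the inner loop runs over j. */
size_t stride_inner_j(size_t nrows)
{
    return colmajor_offset(1, 2, nrows) - colmajor_offset(1, 1, nrows);
}
```

Assuming 280 rows of 8-byte reals, stepping in j jumps 280 × 8 = 2240 bytes, far more than a cache line, so a loop whose inner index is j touches a new cache line on almost every access.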
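Example of coding
c-----------------------------------------------------------------
c     non-optimized: the inner loop runs over j, so consecutive
c     iterations touch elements a whole column apart in memory
c     calculate error in the top region and update u1old
      do 370 i = iamnumrows1 + 1, lastrow
         do 380 j = 2, maxcol1
            if (abs(u1new(i,j) - u1old(i,j)) .gt. err) then
               err = abs(u1new(i,j) - u1old(i,j))
            endif
            u1old(i,j) = u1new(i,j)
  380    continue
  370 continue
c-----------------------------------------------------------------

c-----------------------------------------------------------------
c     optimized: the loops are interchanged so the inner loop runs
c     over i, the first (contiguous) index of the column-major arrays
      do 380 j = 2, maxcol1
         do 370 i = iamnumrows1 + 1, lastrow
            if (abs(u1new(i,j) - u1old(i,j)) .gt. err) then
               err = abs(u1new(i,j) - u1old(i,j))
            endif
            u1old(i,j) = u1new(i,j)
  370    continue
  380 continue
c-----------------------------------------------------------------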
Figure 9. A piece of the etching code (non-optimized version first, optimized version second).
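The change in Figure 9 is a loop interchange: both versions compute the same err and copy the same elements, and only the memory order of the accesses differs. The sketch below reproduces the two loop nests in C on a hand-built column-major array, so that the equivalence can be checked. The IDX macro makes the first index contiguous, as in Fortran. The names, the 0-based bounds (columns from 1, rows row0..row1, mirroring j = 2..maxcol1 and i = iamnumrows1+1..lastrow) and the test data are hypothetical. Timing both functions on a large array is left as an experiment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Column-major element (i, j), 0-based, of an array with nr rows. */
#define IDX(i, j, nr) ((size_t)(i) + (size_t)(j) * (size_t)(nr))

static double absdiff(double a, double b) { return a > b ? a - b : b - a; }

/* Non-optimized order: inner loop over j, stride nr between accesses. */
double update_j_inner(const double *unew, double *uold, int nr, int nc, int row0, int row1)
{
    double err = 0.0;
    for (int i = row0; i <= row1; i++)
        for (int j = 1; j < nc; j++) {
            double d = absdiff(unew[IDX(i, j, nr)], uold[IDX(i, j, nr)]);
            if (d > err) err = d;
            uold[IDX(i, j, nr)] = unew[IDX(i, j, nr)];
        }
    return err;
}

/* Optimized order (loops interchanged): inner loop over i, stride 1. */
double update_i_inner(const double *unew, double *uold, int nr, int nc, int row0, int row1)
{
    double err = 0.0;
    for (int j = 1; j < nc; j++)
        for (int i = row0; i <= row1; i++) {
            double d = absdiff(unew[IDX(i, j, nr)], uold[IDX(i, j, nr)]);
            if (d > err) err = d;
            uold[IDX(i, j, nr)] = unew[IDX(i, j, nr)];
        }
    return err;
}

/* Runs both versions on separate copies of the same data and returns 1 if
   err and the updated arrays agree exactly, 0 otherwise (or on allocation failure). */
int interchange_agrees(int nr, int nc)
{
    size_t n = (size_t)nr * (size_t)nc;
    double *unew = malloc(n * sizeof *unew);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    int ok = unew != NULL && a != NULL && b != NULL;
    if (ok) {
        for (size_t k = 0; k < n; k++) {
            unew[k] = (double)((k * 7) % 13);
            a[k] = b[k] = (double)((k * 5) % 11);
        }
        double e1 = update_j_inner(unew, a, nr, nc, 1, nr - 1);
        double e2 = update_i_inner(unew, b, nr, nc, 1, nr - 1);
        ok = (e1 == e2);
        for (size_t k = 0; k < n && ok; k++)
            ok = (a[k] == b[k]);
    }
    free(unew);
    free(a);
    free(b);
    return ok;
}
```

Because a maximum does not depend on the order in which values are visited, the interchange is safe here; a loop that accumulated a floating-point sum in a different order would not necessarily give bit-identical results.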
Load balance - VAMPIR
Figure 10. Approximate load balance.
Load balance - PARAVER
Figure 11. Approximate load balance.
Conclusions
- A significant factor affecting the performance of a parallel application is the balance between communication and workload. The challenge of the message-passing model lies in reducing message traffic over the interconnection network. To fully understand the performance behavior of such applications, analysis and visualization tools are needed. Two such tools, VAMPIR and PARAVER, were used to analyze the performance of the etching application. It was seen that optimization of the parallel code can be carried out as an iterative process that uses these tools to investigate performance issues.
Web Sites
- Project site
  - http://www.engineering.ucsb.edu/hpscicom
- San Diego Supercomputer Center
  - http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html
- VAMPIR
  - http://www.pallas.com
- PARAVER
  - http://www.cepba.upc.es/