Title: Application Performance Profiling and Prediction in Grid Environment
1. Application Performance Profiling and Prediction in Grid Environment
- Presented by Marlon Bright
- 1 August 2008
- Advisor: Masoud Sadjadi, Ph.D.
- REU, Florida International University
2. Outline
- Grid Enablement of Weather Research and Forecasting Code (WRF)
- Profiling and Prediction Tools
- Research Goals
- Project Timeline
- Project Status
- Challenges Overcome
- Remaining Work
3. Motivation: Weather Research and Forecasting Code (WRF)
- Goal: Improved Weather Prediction
- Accurate and Timely Results
- Precise Location Information
- WRF Status
- Over 160,000 lines (mostly FORTRAN and C)
- Single Machine/Cluster compatible
- Single Domain
- Fine Resolution -> High Resource Requirements
- How to Overcome this?
- Through Grid Enablement
- Expected Benefits to WRF
- More available resources, different domains
- Faster results
- Improved Accuracy
4. System Overview
- Web-Based Portal
- Grid Middleware (Plumbing)
- Job-Flow Management
- Meta-Scheduling
- Performance Prediction
- Profiling and Benchmarking
- Development Tools and Environments
- Transparent Grid Enablement (TGE)
- TRAP: Static and Dynamic adaptation of programs
- TRAP/BPEL, TRAP/J, TRAP.NET, etc.
- GRID superscalar: Programming Paradigm for parallelizing a sequential application dynamically in a Computational Grid
5. Performance Prediction
- IMPORTANT part of Meta-Scheduling
- Allows for:
- Optimal usage of grid resources through smarter meta-scheduling
- Many users overestimate job requirements
- Reduced idle time for compute resources
- Could save costs and energy
- Optimal resource selection for most expedient job return time
- Tools: Amon/Aprof and Paraver/Dimemas
6. Research Goals
- Extend Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
- Compare/contrast Aprof predictions to Dimemas predictions in terms of accuracy and prediction computation time.
- Analyze if/how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling.
7. Timeline
- End of June
- Get MPItrace linking properly with the WRF version compiled on GCB, then Mind. COMPLETE
- a) Install Amon and Aprof on MareNostrum and ensure proper functioning. AMON COMPLETE; APROF FINAL STAGES
- b) Run Amon benchmarks on MareNostrum. COMPLETE
- Early/Mid July
- Use and analyze Aprof predictions within MareNostrum (and possibly between MareNostrum, GCB, and Mind). MN COMPLETE
- Use generated MPI/OpenMP tracefiles (Paraver/Dimemas) to predict within (and possibly between) Mind, GCB, and MareNostrum. IN PROGRESS
- Late July/Early August
- Experiment with how well Amon and Aprof relate to/could possibly be combined with Dimemas. IN PROGRESS
- Compose paper presenting significant findings. IN PROGRESS
- Analyze how findings relate to the bigger picture. Make optimizations on grid enablement of WRF.
8. The Tools: Amon/Aprof and Dimemas/Paraver
9. Amon / Aprof
- Amon: a monitoring program that runs on each compute node, recording new processes
- Aprof: a regression analysis program running on the head node; receives input from Amon to make execution time predictions (within a cluster and between clusters)
10. Amon / Aprof: Monitoring and Prediction
11. Amon / Aprof: Approach to Modeling Resource Usage
- Diagram: WRF resource usage characteristics (network latency, CPU speed, hard disk I/O, network bandwidth, number of nodes, FSB bandwidth, RAM size, L2 cache) feed the application resource usage model (a sketch of the model form follows).
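- A minimal sketch of the model form these inputs suggest (the exact regressors Aprof fits are an assumption here, inferred from the inverse-CPU and inverse-clock fields shown on slide 16, "Extreme Makeover"):

    elapsed_time ~ b0 + b1*(1/num_cpus) + b2*(1/cpu_MHz) + b3*(1/(num_cpus*cpu_MHz)) + ...

  where the b terms are the regression parameters Aprof estimates from the Amon measurements.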
12. Previous Findings for Amon / Aprof
- Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes)
- Experiments were run to predict for different numbers of nodes and CPU loads (i.e. 2, 3, ..., 14, 15 nodes and 20, 30, ..., 90, 100 percent)
- Aprof predictions were within 10% error versus actual recorded runtimes within Mind and GCB and between Mind and GCB
- Conclusion: the first-step assumption was valid -> move to extending the research to a higher number of nodes.
13. How'd they do that?
- Developed a benchmarking script that edits and submits a job file to the MareNostrum (MN) scheduler (a sketch follows this list)
- Runs for each number of nodes (8, 16, 32, 64, 96, 128)
- Runs for each CPU percentage (100, 75, 50, 25)
- Records execution time, average CPU utilization, participating nodes, etc.
- Job file
- Requests desired number of nodes from MN
- Starts Amon on each returned node to monitor and return processes
- Starts cpulimit on each returned node, limiting the effective power given to the WRF process
- Executes WRF as a parallel job across the returned nodes
- Developed modification script
- Combines Amon output into one file
- Filters processes down to solely WRF processes
- Edits processes into an Aprof-friendly format
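- A minimal sketch of what such a benchmarking loop could look like (the template name, placeholder fields, and submission command below are assumptions, not the actual MareNostrum tooling):

    #!/bin/bash
    # Hypothetical benchmarking loop: for every node count and CPU cap,
    # fill in a job template and hand it to the scheduler.
    SUBMIT_CMD="mnsubmit"                    # assumption: substitute the real MN submit command
    for nodes in 8 16 32 64 96 128; do
        for cpu in 100 75 50 25; do
            job="wrf_${nodes}n_${cpu}pct.job"
            # @NODES@ / @CPU@ are hypothetical placeholders in the job template
            sed -e "s/@NODES@/${nodes}/" -e "s/@CPU@/${cpu}/" wrf.job.template > "$job"
            $SUBMIT_CMD "$job"
        done
    done

  Inside the (assumed) template, the job would start Amon and cpulimit on each allocated node before launching WRF, as described above.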
14. How'd they do that? (cont'd)
- Start Aprof, loading the input file as data
- Executed Aprof query automation script (a sketch follows this list)
- Starts a telnet session querying Aprof for benchmarked scenarios
- Compares predicted values to actual values returned in the run
- Outputs a text file and a graphing plot file of comparison statistics
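- A hedged sketch of such query automation; the host, port, and query text below are placeholders, since Aprof's actual query syntax is not reproduced in these slides:

    #!/bin/bash
    # Hypothetical query loop: send one query per benchmarked node count to a
    # running Aprof instance over telnet and collect the replies.
    HOST=localhost          # assumption: Aprof listening on the head node
    PORT=9000               # assumption: Aprof's query port
    out=aprof_predictions.txt
    : > "$out"
    for nodes in 8 16 32 64 96 128; do
        # the query string is an illustrative placeholder, not Aprof's real syntax
        { echo "wrf.exe cpus ${nodes}"; sleep 2; } | telnet "$HOST" "$PORT" >> "$out"
    done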
15. Experimental Process
16. Extreme Makeover
- Amon record as reported on MareNostrum:
    --- (464) ---
    name wrf.exe
    cpus 4
    cpu MHz 1/0.000 MHz
    cache size 1/0 KB
    elapsed time 957952 msec
    utime 956370 msec 957810 msec
    stime 570 msec 860 msec
    intr 18783
    ctxt switch 58290
    fork 95
    storage R 0 blocks 0 blocks
    storage W 0 blocks
    network Rx 19547308 bytes
    network Tx 1434925 bytes
- A record in the Aprof-friendly format (after the modification script):
    --- (464) ---
    name wrf.exe
    inv cpu 1/16
    inv clock 1/574
    cache size 1/1024 KB
    elapsed time 1990992 msec
    inv clockcpu 1/(36763)
- Why? The version of Linux on MN does not report some characteristics (e.g. cache size), and, from its initial design, Amon reports in a different format than Aprof reads.
17. Aprof Prediction
    name           wrf.exe
    elapsed time   5.783787e+06

    explanatory value   parameter       std. dev
    -----------------   -------------   -------------
    1.000000e+00        5.783787e+06    1.982074e+05

                        predicted value   residue rms     std. dev
                        ---------------   -------------   -------------
    elapsed time        5.783787e+06      4.246451e+06    1.982074e+05
18. Query Automation Script Output
    adj. cpu speed, processors, actual, predicted, rms, std. dev, % difference
    3591.363, 1, 5222, 5924.82, 1592.459, 415.3491, 13.4588280352
    3591.363, 2, 2881, 3246.283, 1592.459, 181.5382, 12.6790350573
    3591.363, 3, 2281, 2353.438, 1592.459, 105.334, 3.17571240684
    3591.363, 4, 1860, 1907.015, 1592.459, 69.19778, 2.52768817204
    3591.363, 5, 1681, 1639.161, 1592.459, 49.83672, 2.48893515764
    3591.363, 6, 1440, 1460.592, 1592.459, 39.5442, 1.43
    3591.363, 7, 1380, 1333.043, 1592.459, 34.76459, 3.40268115942
    3591.363, 8, 1200, 1237.381, 1592.459, 33.27651, 3.11508333333
    3591.363, 9, 1200, 1162.977, 1592.459, 33.56231, 3.08525
    3591.363, 10, 1080, 1103.454, 1592.459, 34.68943, 2.17166666667
    3591.363, 11, 1200, 1054.753, 1592.459, 36.15324, 12.1039166667
    3591.363, 12, 1080, 1014.169, 1592.459, 37.70271, 6.09546296296
    3591.363, 13, 1200, 979.8292, 1592.459, 39.22018, 18.3475666667
    3591.363, 14, 1021, 950.3947, 1592.459, 40.65455, 6.91530852106
    3591.363, 15, 1020, 924.8848, 1592.459, 41.9872, 9.32501960784
19. Paraver / Dimemas
- Dimemas: a simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform
- Paraver: a tool that allows for performance visualization and analysis of trace files generated from actual executions and by Dimemas
- Tracefiles are generated by MPItrace, which is linked into the execution code
20. Dimemas Simulation Process Overview
- Link MPItrace into the application source code; it dynamically generates tracefiles for each node the application is running on
- Identify computation iterations in Paraver; compose a smaller trace file by selecting a few iterations, preserving communications and eliminating initialization phases
- Convert the new tracefile to Dimemas format (.trf) using the CEPBA-provided prv2trf tool
- Load the tracefile into the Dimemas simulator, configure the target machine, and with this information generate the Dimemas configuration file
- Call the simulator with or without the option of generating a Paraver (.prv) tracefile for viewing (a command-line sketch of these steps follows)
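- The command-line side of these steps might look roughly like the sketch below; the file names are placeholders and the Dimemas invocation should be checked against the CEPBA/BSC documentation:

    #!/bin/bash
    # Hypothetical pipeline: convert the cut Paraver trace and run the simulator.
    # cut.prv is assumed to be the reduced trace exported from Paraver.
    prv2trf cut.prv cut.trf                          # CEPBA tool named on this slide
    # wrf_sim.cfg is assumed to reference cut.trf and describe the target machine
    # (relative CPU speed, latency, bandwidth, number of buses, ...).
    Dimemas wrf_sim.cfg > dimemas_prediction.txt     # exact flags omitted; see the Dimemas manual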
21. Paraver/Dimemas: DiP Environment
22. How'd they do that?
- Generated Paraver tracefiles, Dimemas tracefiles, and simulation configuration files for each number of nodes
- Developed Dimemas simulation script, simulation_automater.sh (a sketch follows this list)
- Selects the configuration file for the desired number of nodes
- Edits the configuration file for the desired CPU percentage
- Records execution time, average CPU utilization
- Finalizing development of prediction validation script, which will:
- Compare Dimemas predicted values to actual run values
- Output a text file and a graphing plot file of comparison statistics
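- A minimal sketch of what simulation_automater.sh could look like; the configuration placeholder and the simulator invocation are assumptions:

    #!/bin/bash
    # Hypothetical simulation_automater.sh: pick the configuration for the
    # requested node count, scale the CPU speed for the requested CPU
    # percentage, run the simulation, and record the wall time.
    nodes=$1                                   # e.g. 64
    cpu=$2                                     # e.g. 75
    cfg="wrf_${nodes}n.cfg"                    # assumed per-node-count config file
    tmp="wrf_${nodes}n_${cpu}pct.cfg"
    ratio=$(awk "BEGIN { printf \"%.2f\", ${cpu} / 100 }")
    # @CPU_RATIO@ is a hypothetical placeholder for the relative CPU speed field
    sed "s/@CPU_RATIO@/${ratio}/" "$cfg" > "$tmp"
    start=$(date +%s)
    Dimemas "$tmp" > "sim_${nodes}n_${cpu}pct.out"   # invocation hedged as above
    end=$(date +%s)
    echo "${nodes}, ${cpu}, $((end - start))" >> simulation_times.csv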
23. Dimemas Prediction
- Execution time: 36.354146
- Speedup: 5.34
- CPU Time: 194.066431

    Id.   Computation time   %       Communication
    1     31.224017          91.21   3.008552
    2     20.089440          78.20   5.599083
    3     19.305673          76.84   5.818317
    4     28.672368          83.27   5.762332
    5     29.058603          85.36   4.982049
    6     19.488003          77.63   5.614155
    7     18.727851          78.57   5.108366
    8     27.500476          84.29   5.123971

    Id.   Mess. sent     Bytes sent     Immediate recv   Waiting recv   Bytes recv     Coll. op.      Block time   Comm. time   Wait link time   Wait buses time   I/O time
    1     7.577000e+03   1.583659e+08   3.539000e+03     4.080000e+03   1.671666e+08   1.475000e+03   0.247092     0.383663     0.319859         0.000000          0.000000
    2     8.948000e+03   2.200029e+08   8.797000e+03     1.440000e+02   2.186629e+08   1.475000e+03   3.710867     0.383663     0.098868         0.000000          0.000000
    3     8.948000e+03   2.176712e+08   6.904000e+03     2.037000e+03   2.163992e+08   1.475000e+03   3.453668     0.383663     0.243052         0.000000          0.000000
24. Project Status
25. Amon / Aprof
- Software installed and tailored to MareNostrum
- Proficient in executing software
- Amon benchmarking completed
- Aprof query automation complete and results generated
- Lessons learned on extending Amon/Aprof to a different architecture
26. Dimemas / Paraver
- Proficient in executing software
- Paraver and Dimemas tracefiles generated for each number of nodes (8, 16, 32, 64, 96, 128)
- Benchmarking script complete
- Simulations generated
- Comparison script being finalized
27. Quick Comparison
- Amon / Aprof
- Pros
- Simpler to deploy in comparison
- Scalability of the model is promising with first results
- Feasible solution for performance prediction purposes
- Cons
- Requires more base executions for accurate performance in comparison
- Dimemas / Paraver
- Pros
- More features; could be more useful to an experienced user (i.e. adjustment of system characteristics)
- Visualization and analysis of execution for analysis purposes
- Graphical User Interface
- Cons
- Requires special compilation of applications
- Requires a non-trivial-to-install kernel patch
- Large tracefiles (sometimes gigabytes)
28. Aprof Results: 100% CPU Utilization
29. Aprof Results: 100% CPU Utilization
30. Significant Challenges Overcome
- Amon
- Adjustment of source code for proper functioning on MareNostrum (MN)
- Development of benchmarking script to conform to the system architecture of MareNostrum (i.e. going through its scheduler, one process per node, etc.)
- Proper functioning of cpulimit for accurate CPU percentage
- Job termination by the MN scheduler due to execution surpassing the wall clock limit
31. Significant Challenges Overcome (cont'd)
- Aprof
- Adjustment of source code for less complex, more consistent data input
- Development of prediction and comparison scripts for MareNostrum
- Dimemas/Paraver
- MPItrace properly linked in with WRF on GCB and Mind
- Generation of trace and configuration files
- WRF
- Version 2.2 installed and compiled on Mind
32. Challenges Remaining
- Lengthy Amon benchmarking runs due to time jobs spend in the queue
- Complexities in preparing Dimemas tracefiles for simulation purposes
- Extracting accurate predictions from Dimemas: trace files are reduced in order to speed up the prediction process; therefore, predicted times must be multiplied by a determined factor (see the example below)
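- For example (hypothetical numbers): if the cut trace keeps 3 of 30 computation iterations, the simulated time for the traced portion would be scaled by roughly 10, with any excluded initialization cost accounted for separately.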
33. Remaining Work
- Next Week
- Finalize scripting of Dimemas prediction simulations for the same scenarios as those of Amon and Aprof
- Fall 2008
- Experiment with how well Amon and Aprof relate to/could possibly be combined with Dimemas
- Decide if and how to compare results from MareNostrum, GCB, and Mind (i.e. the same version of WRF would have to be running in all three locations)
- Compose paper presenting significant results and submit it to a conference
- Future Work
- Work with the meta-scheduling team on implementation of the tools
34. References
- S. Masoud Sadjadi, Liana Fong, Rosa M. Badia, Javier Figueroa, Javier Delgado, Xabriel J. Collazo-Mojica, Khalid Saleem, Raju Rangaswami, Shu Shimizu, Hector A. Duran Limon, Pat Welsh, Sandeep Pattnaik, Anthony Praino, David Villegas, Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye, Juan Carlos Martinez, Ivan Rodero, Shuyi Chen, Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh Willoughby, Michael McFail, Christine Lisetti, and Malek Adjouadi. Transparent grid enablement of weather research and forecasting. In Proceedings of the Mardi Gras Conference 2008 - Workshop on Grid-Enabling Applications, Baton Rouge, Louisiana, USA, January 2008. http://www.cs.fiu.edu/sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt
- S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. A modeling approach for estimating execution time of long-running scientific applications. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS-2008), the Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008. http://www.cs.fiu.edu/sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt
- Performance/Profiling. Presented by Javier Figueroa in the Special Topics in Grid Enablement of Scientific Applications class, 13 May 2008.
35. Acknowledgements
- REU
- Partnerships for International Research and Education (PIRE)
- The Barcelona Supercomputing Center (BSC)
- Masoud Sadjadi, Ph.D. - FIU
- Rosa Badia, Ph.D. - BSC
- Javier Delgado - FIU
- Javier Figueroa - Univ. of Miami