1
Application Performance Profiling and Prediction
in Grid Environment
  • Presented by Marlon Bright
  • 14 July 2008
  • Advisor: Masoud Sadjadi, Ph.D.
  • REU, Florida International University

2
Outline
  • Grid Enablement of Weather Research and
    Forecasting Code (WRF)
  • Profiling and Prediction Tools
  • Research Goals
  • Project Timeline
  • Current Progress
  • Challenges
  • Remaining Work

3
Motivation: Weather Research and Forecasting
Code (WRF)
  • Goal: Improved Weather Prediction
  • Accurate and Timely Results
  • Precise Location Information
  • WRF Status
  • Over 160,000 lines (mostly FORTRAN and C)
  • Single Machine/Cluster compatible
  • Single Domain
  • Finer Resolution -> Greater Resource Requirements
  • How to Overcome this?
  • Through Grid Enablement
  • Expected Benefits to WRF
  • More available resources; different domains
  • Faster results
  • Improved Accuracy

4
System Overview
  • Web-Based Portal
  • Grid Middleware (Plumbing)
  • Job-Flow Management
  • Meta-Scheduling
  • Performance Prediction
  • Profiling and Benchmarking
  • Development Tools and Environments
  • Transparent Grid Enablement (TGE)
  • TRAP: static and dynamic adaptation of programs
  • TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  • GRID superscalar: programming paradigm for
    parallelizing a sequential application
    dynamically in a Computational Grid

5
Performance Prediction
  • IMPORTANT part of Meta-Scheduling
  • Allows for
  • Optimal usage of grid resources through smarter
    meta-scheduling
  • Many users overestimate job requirements
  • Reduced idle time for compute resources
  • Could save costs and energy
  • Optimal resource selection for most expedient job
    return time

6
The Tools: Amon / Aprof, Dimemas / Paraver
7
Amon / Aprof
  • Amon: a monitoring program that runs on each
    compute node, recording new processes
  • Aprof: a regression-analysis program running on
    the head node; it receives input from Amon to
    make execution-time predictions (within a cluster
    and between clusters)
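A minimal Python sketch (not the actual Amon/Aprof code) of the idea behind this pairing: runtimes and resource parameters collected on the nodes are fit with least squares, and the fitted model is then queried for a target configuration. The feature layout, training values, and the predict() helper are illustrative assumptions.

    import numpy as np

    # Hypothetical measurements a monitor like Amon might collect: one row per
    # WRF run, with assumed features [1, 1/clock_MHz, 1/num_nodes].
    features = np.array([
        [1.0, 1 / 2297.7, 1 / 4],
        [1.0, 1 / 2297.7, 1 / 8],
        [1.0, 1 / 3591.4, 1 / 8],
        [1.0, 1 / 3591.4, 1 / 16],
    ])
    elapsed_ms = np.array([2.1e6, 1.2e6, 0.9e6, 0.55e6])  # made-up runtimes

    # Least-squares fit, analogous in spirit to Aprof's regression step.
    coeffs, *_ = np.linalg.lstsq(features, elapsed_ms, rcond=None)

    def predict(clock_mhz, num_nodes):
        """Predicted elapsed time (ms) for a target machine configuration."""
        return float(np.array([1.0, 1 / clock_mhz, 1 / num_nodes]) @ coeffs)

    print(predict(3591.4, 32))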

8
Amon / Aprof: Monitoring and Prediction
9
Amon / Aprof Approach to Modeling Resource Usage
[Diagram: WRF resource factors (Network Latency, CPU Speed, Hard Disk I/O, Network Bandwidth, Number of Nodes, FSB Bandwidth, RAM Size, L2 Cache) feed into an Application Resource Usage Model]
10
Sample Amon Output: Process
  • --- (464) ---
  • name: wrf.exe
  • cpus: 8
  • inv clock: 1/2297.700 MHz
  • inv cache size: 1/1024 KB
  • elapsed time: 1234232 msec
  • utime: 1233890 msec / 1236360 msec
  • stime: 560 msec / 1420 msec
  • intr: 44959
  • ctxt switch: 84394
  • fork: 89
  • storage R: 0 blocks / 0 blocks
  • storage W: 0 blocks
  • network Rx: 4188840 bytes
  • network Tx: 2106854 bytes
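A small Python sketch of turning one such record into numeric features for the regression step; the "key: value" layout follows the sample above, but the exact Amon output format is an assumption.

    import re

    # One process record in the (assumed) 'key: value' layout shown above.
    sample = """\
    name: wrf.exe
    cpus: 8
    inv clock: 1/2297.700 MHz
    elapsed time: 1234232 msec
    network Rx: 4188840 bytes
    """

    def parse_record(text):
        """Turn one 'key: value' Amon-style record into a dict."""
        record = {}
        for line in text.splitlines():
            if not line.strip():
                continue
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            inv = re.match(r"1/([\d.]+)", value)   # e.g. '1/2297.700 MHz'
            num = re.match(r"([\d.]+)", value)     # e.g. '1234232 msec'
            if inv:
                record[key] = 1.0 / float(inv.group(1))
            elif num:
                record[key] = float(num.group(1))
            else:
                record[key] = value                # e.g. 'wrf.exe'
        return record

    print(parse_record(sample))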

11
Sample Aprof Output
  • name: wrf_arw_DM.exe
  • elapsed time: 5.783787e+06

  • explanatory value / parameter / std. dev:
    1.000000e+00   5.783787e+06   1.982074e+05

  • predicted value / residue / rms / std. dev:
    elapsed time   5.783787e+06   4.246451e+06   1.982074e+05


12
Sample Query Automation Script Output
  adj. cpu speed  processors  actual  predicted  rms       std. dev  actual difference (%)
  3591.363        1           5222    5924.82    1592.459  415.3491  13.4588280352
  3591.363        2           2881    3246.283   1592.459  181.5382  12.6790350573
  3591.363        3           2281    2353.438   1592.459  105.334   3.17571240684
  3591.363        4           1860    1907.015   1592.459  69.19778  2.52768817204
  3591.363        5           1681    1639.161   1592.459  49.83672  2.48893515764
  3591.363        6           1440    1460.592   1592.459  39.5442   1.43
  3591.363        7           1380    1333.043   1592.459  34.76459  3.40268115942
  3591.363        8           1200    1237.381   1592.459  33.27651  3.11508333333
  3591.363        9           1200    1162.977   1592.459  33.56231  3.08525
  3591.363        10          1080    1103.454   1592.459  34.68943  2.17166666667
  3591.363        11          1200    1054.753   1592.459  36.15324  12.1039166667
  3591.363        12          1080    1014.169   1592.459  37.70271  6.09546296296
  3591.363        13          1200    979.8292   1592.459  39.22018  18.3475666667
  3591.363        14          1021    950.3947   1592.459  40.65455  6.91530852106
  3591.363        15          1020    924.8848   1592.459  41.9872   9.32501960784
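For reference, the last column is the relative error between the predicted and measured runtimes; a one-line Python check against the first row of the table above:

    # Percent difference for the 1-processor row of the table above.
    actual, predicted = 5222.0, 5924.82
    pct_difference = abs(predicted - actual) / actual * 100
    print(round(pct_difference, 4))  # 13.4588, matching the table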

13
Previous Findings for Amon / Aprof
  • Experiments were performed on two clusters at
    FIU: Mind (16 nodes) and GCB (8 nodes)
  • Experiments were run to predict for different
    numbers of nodes and CPU loads (i.e. 2, 3, ..., 14,
    15 nodes and 20, 30, ..., 90, 100% CPU load)
  • Aprof predictions were within 10% error versus
    actual recorded runtimes, both within Mind and
    GCB and between Mind and GCB
  • Conclusion: the first-step assumption was valid ->
    move to extending the research to a higher number
    of nodes.

14
Paraver / Dimemas
  • Dimemas: a simulation tool for the parametric
    analysis of the behavior of message-passing
    applications on a configurable parallel platform
  • Paraver: a tool for performance visualization and
    analysis of tracefiles generated from actual
    executions and by Dimemas
  • Tracefiles are generated by MPItrace, which is
    linked into the application's execution code

15
Dimemas Simulation Process Overview
  1. Link MPItrace into the application source code;
    it dynamically generates a tracefile (.mpit) for
    each node the application runs on
  2. Use the CEPBA tool mpi2prv to convert the .mpit
    files into one .prv file
  3. Load the file into Paraver using the XML
    filtering file (provided by CEPBA) to reduce the
    tracefile by eliminating perturbed regions (i.e.
    much of the initialization)
  4. Open the tracefile in Paraver using the
    useful_duration configuration file and adjust
    scales to fit events
  5. Identify computation iterations; compose a
    smaller tracefile by selecting a few iterations,
    preserving communications and eliminating
    initialization phases

16
Paraver tracefile with iterations selected, cut,
and ready for Dimemas conversion.
17
Simulation Process (cont'd)
  • Convert the new tracefile to Dimemas format
    (.trf) using the CEPBA-provided prv2trf tool
  • Load the tracefile into the Dimemas simulator,
    configure the target machine, and from this
    information generate a Dimemas configuration file
  • Call the simulator with or without the option of
    generating a Paraver (.prv) tracefile for viewing
  • Great news: you only have to go through this
    process once, provided it is done for the maximum
    number of nodes you will simulate. Once the
    configuration file is generated, different node
    counts can be simulated by altering the file (see
    the sketch below).
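A rough Python sketch of that node-count sweep. The configuration field name, file names, and the Dimemas command line are all assumptions here; they need to be replaced with the actual syntax from the CEPBA/BSC Dimemas documentation.

    import re
    import subprocess

    BASE_CONFIG = "marenostrum.cfg"   # assumed name of the generated config file
    # Assumed field holding the simulated node count; check the real config syntax.
    NODE_FIELD = re.compile(r"number of nodes:\s*\d+")

    with open(BASE_CONFIG) as f:
        base = f.read()

    for nodes in (8, 16, 32, 64, 96, 128):
        cfg = f"marenostrum_{nodes}.cfg"
        with open(cfg, "w") as f:
            f.write(NODE_FIELD.sub(f"number of nodes: {nodes}", base))
        # Placeholder invocation; substitute the real simulator command and flags.
        subprocess.run(["Dimemas", cfg], check=True)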

18
Dimemas Simulator Results
19
Goals
  1. Extend the Amon/Aprof research to a larger
    number of nodes, a different architecture, and a
    different version of WRF (Version 2.2.1).
  2. Compare/contrast Aprof predictions to Dimemas
    predictions in terms of accuracy and prediction
    computation time.
  3. Analyze if/how Amon/Aprof could be used in
    conjunction with Dimemas/Paraver for optimized
    application performance prediction and,
    ultimately, meta-scheduling

20
Timeline
  • End of June
  • Get MPItrace linking properly with the WRF
    version compiled on GCB, then Mind - COMPLETE
  • a) Install Amon and Aprof on MareNostrum and
    ensure proper functioning - AMON COMPLETE, APROF
    IN FINAL STAGES
  • b) Run Amon benchmarks on MareNostrum - COMPLETE
  • Early/Mid July
  • Use and analyze Aprof predictions within
    MareNostrum (and possibly between MareNostrum,
    GCB, and Mind) - IN PROGRESS
  • Use generated MPI/OpenMP tracefiles
    (Paraver/Dimemas) to predict within (and possibly
    between) Mind, GCB, and MareNostrum - IN PROGRESS
  • Late July/Early August
  • Experiment with how well Amon and Aprof relate
    to/could possibly be combined with Dimemas
  • Analyze how the findings relate to the bigger
    picture; make optimizations to the grid
    enablement of WRF.
  • Compose paper presenting significant findings.

21
Current Progress
22
General
  • Completed reading of related works papers
  • Well advanced in Linux studies
  • Established effective collaboration/working
    relationship with developers of Dimemas and
    Paraver

23
Amon
  • Installed on MareNostrum
  • Adjusted source code to properly read node
    information from MareNostrum (will document this
    on the Wiki so it can be considered when
    configuring on new architectures)

24
Amon (cont'd)
  • Automated benchmarking shell script developed
  • Starts Amon on each compute node returned by the
    system scheduler
  • Executes WRF with one process per node for
  • Node counts of 8, 16, 32, 64, 96, and 128
  • CPU percentage (%) loads of 25, 50, 75, 100
    (enforced via the CPULimit program)
  • Writes results (to be used as Aprof input) to an
    organized results directory of /<cpu load
    percentage>/<number of nodes>/<timestamp of run>/
    <amon output by node> (structure sketched below)
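The real script is a shell script; the Python sketch below only illustrates its structure (the loop over node counts and CPU loads and the results-directory layout). The Amon/CPULimit/WRF launch steps are placeholders, not the actual invocations.

    import os
    import time

    NODES = [8, 16, 32, 64, 96, 128]
    CPU_LOADS = [25, 50, 75, 100]   # percent, enforced via the CPULimit program
    RESULTS_ROOT = "results"        # assumed root of the results tree

    for load in CPU_LOADS:
        for nodes in NODES:
            stamp = time.strftime("%Y%m%d-%H%M%S")
            outdir = os.path.join(RESULTS_ROOT, str(load), str(nodes), stamp)
            os.makedirs(outdir, exist_ok=True)

            # Placeholders for the real steps done by the shell script:
            # 1. start Amon on every node returned by the scheduler,
            # 2. launch WRF with one process per node under the CPU limit,
            # 3. copy each node's Amon output into outdir, e.g.
            #    results/<cpu load>/<number of nodes>/<timestamp>/amon.<node>.out
            print("benchmark case prepared:", outdir)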

25
Aprof
  • Installed on MareNostrum
  • Adjusted source code to change the way Aprof
    reads in information
  • Before: input files had to specify the number of
    bytes in each process listing in the process
    header (this was complicated and error-prone;
    Aprof was inconsistent in loading MareNostrum
    data)
  • Now: input files simply separate process entries
    with one or more blank lines (see the sketch
    below)
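A minimal sketch of reading that blank-line-separated format (the input file name is hypothetical):

    import re

    # Read an Aprof input file in which process entries are separated by one
    # or more blank lines (file name assumed for illustration).
    with open("aprof_input.txt") as f:
        text = f.read()

    # Split on runs of blank lines; each chunk is one process entry.
    entries = [chunk for chunk in re.split(r"\n\s*\n", text) if chunk.strip()]
    print(f"{len(entries)} process entries loaded")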

26
Aprof (cont'd)
  • Script developed that combines the Amon output
    from all nodes and edits it into the read-in
    format Aprof requires (sketched below)
  • Aprof query automation script adjusted/developed
    for MareNostrum
  • Queries Aprof for prediction information for
    different cases (number of nodes, CPU percentage
    loads)
  • Compares predicted values to the actual values
    returned by the run
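A small sketch of the combining step, assuming one Amon output file per node under the results directory described earlier (paths and file names are illustrative):

    import glob

    # Concatenate per-node Amon outputs into one Aprof input file, separating
    # process entries with a blank line as Aprof now expects.
    node_files = sorted(glob.glob("results/100/32/*/amon.*.out"))

    with open("aprof_input.txt", "w") as out:
        for path in node_files:
            with open(path) as f:
                out.write(f.read().rstrip("\n"))
            out.write("\n\n")   # blank line between entries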

27
Dimemas / Paraver
  • Paraver tracefile successfully generated and
    visualized with GUI on MareNostrum
  • Dimemas tracefile successfully generated from
    Paraver on MareNostrum
  • Configuration file for MareNostrum developed
  • Prediction simulations will begin shortly

28
Significant Challenges Overcome
  • Amon
  • Adjustment of source code for proper functioning
    on MareNostrum
  • Development of a benchmarking script conforming
    to the system architecture of MareNostrum (i.e.
    going through its scheduler, one process per
    node, etc.)
  • Aprof
  • Adjustment of source code for less complex, more
    consistent data input
  • Development of prediction and comparison scripts
    for MareNostrum

29
Significant Challenges Overcome (cont'd)
  • Dimemas/Paraver
  • MPItrace properly linked in with WRF on GCB and
    Mind
  • Paraver and Dimemas tracefiles successfully
    generated, and the configuration file set up for
    MareNostrum
  • WRF
  • Version 2.2 installed and compiled on Mind

30
Remaining Work
  • Scripting Dimemas prediction simulations for the
    same scenarios as those of Amon and Aprof
  • Finalizing the Aprof prediction/comparison script
    so that Aprof's performance on the new
    architecture of MareNostrum can be analyzed
  • Deciding if and how to compare results from
    MareNostrum, GCB, and Mind (i.e. the same version
    of WRF would have to be running in all three
    locations)
  • Experimenting with how well Amon and Aprof relate
    to / could possibly be combined with Dimemas

31
References
  • S. Masoud Sadjadi, Liana Fong, Rosa M. Badia,
    Javier Figueroa, Javier Delgado, Xabriel J.
    Collazo-Mojica, Khalid Saleem, Raju Rangaswami,
    Shu Shimizu, Hector A. Duran Limon, Pat Welsh,
    Sandeep Pattnaik, Anthony Praino, David Villegas,
    Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye,
    Juan Carlos Martinez, Ivan Rodero, Shuyi Chen,
    Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh
    Willoughby, Michael McFail, Christine Lisetti,
    and Malek Adjouadi. Transparent grid enablement
    of weather research and forecasting. In
    Proceedings of the Mardi Gras Conference 2008 -
    Workshop on Grid-Enabling Applications, Baton
    Rouge, Louisiana, USA, January 2008.
  • http://www.cs.fiu.edu/sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt
  • S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa,
    Raju Rangaswami, Javier Delgado, Hector Duran,
    and Xabriel Collazo. A modeling approach for
    estimating execution time of long-running
    scientific applications. In Proceedings of the
    22nd IEEE International Parallel and Distributed
    Processing Symposium (IPDPS-2008), the Fifth
    High-Performance Grid Computing Workshop
    (HPGC-2008), Miami, Florida, April 2008.
  • http://www.cs.fiu.edu/sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt
  • Performance/Profiling. Presented by Javier
    Figueroa in Special Topics in Grid Enablement of
    Scientific Applications Class. 13 May 2008

32
Acknowledgements
  • REU
  • PIRE
  • BSC
  • Masoud Sadjadi, Ph.D. - FIU
  • Rosa Badia, Ph.D. - BSC
  • Javier Delgado - FIU
  • Javier Figueroa - UM