Application Performance Profiling and Prediction in Grid Environment

1
Application Performance Profiling and Prediction
in Grid Environment
  • Presented by Marlon Bright
  • 1 August 2008
  • Advisor: Masoud Sadjadi, Ph.D.
  • REU, Florida International University

2
Outline
  • Grid Enablement of Weather Research and
    Forecasting Code (WRF)
  • Profiling and Prediction Tools
  • Research Goals
  • Project Timeline
  • Project Status
  • Challenges Overcome
  • Remaining Work

3
Motivation: Weather Research and Forecasting
Code (WRF)
  • Goal: Improved Weather Prediction
  • Accurate and Timely Results
  • Precise Location Information
  • WRF Status
  • Over 160,000 lines (mostly FORTRAN and C)
  • Single Machine/Cluster compatible
  • Single Domain
  • Fine Resolution -> Greater Resource Requirements
  • How to Overcome this?
  • Through Grid Enablement
  • Expected Benefits to WRF
  • More available resources; different domains
  • Faster results
  • Improved Accuracy

4
System Overview
  • Web-Based Portal
  • Grid Middleware (Plumbing)
  • Job-Flow Management
  • Meta-Scheduling
  • Performance Prediction
  • Profiling and Benchmarking
  • Development Tools and Environments
  • Transparent Grid Enablement (TGE)
  • TRAP: Static and dynamic adaptation of programs
  • TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  • GRID superscalar: programming paradigm for
    parallelizing a sequential application
    dynamically in a Computational Grid

5
Performance Prediction
  • An IMPORTANT part of Meta-Scheduling
  • Allows for:
  • Optimal usage of grid resources through smarter
    meta-scheduling
  • Many users overestimate job requirements
  • Reduced idle time for compute resources
  • Could save costs and energy
  • Optimal resource selection for most expedient job
    return time
  • Tools: Amon/Aprof and Dimemas/Paraver

6
Research Goals
  • Extend Amon/Aprof research to a larger number of
    nodes, a different architecture, and a different
    version of WRF (Version 2.2.1).
  • Compare/contrast Aprof predictions to Dimemas
    predictions in terms of accuracy and prediction
    computation time.
  • Analyze if/how Amon/Aprof could be used in
    conjunction with Dimemas/Paraver for optimized
    application performance prediction and,
    ultimately, meta-scheduling

7
Timeline
  • End of June
  • Get MPItrace linking properly with the WRF version
    compiled on GCB, then Mind (COMPLETE)
  • a) Install Amon and Aprof on MareNostrum and
    ensure proper functioning (AMON COMPLETE; APROF IN
    FINAL STAGES)
  • b) Run Amon benchmarks on MareNostrum (COMPLETE)
  • Early/Mid July
  • Use and analyze Aprof predictions within
    MareNostrum (and possibly between MareNostrum,
    GCB, and Mind) (MN COMPLETE)
  • Use generated MPI/OpenMP tracefiles
    (Paraver/Dimemas) to predict within (and possibly
    between) Mind, GCB, and MareNostrum (IN PROGRESS)
  • Late July/Early August
  • Experiment with how well Amon and Aprof relate
    to/could possibly be combined with Dimemas (IN
    PROGRESS)
  • Compose a paper presenting significant findings (IN
    PROGRESS)
  • Analyze how the findings relate to the bigger picture;
    make optimizations on the grid enablement of WRF

8
The Tools: Amon / Aprof and Dimemas / Paraver
9
Amon / Aprof
  • Amon: a monitoring program that runs on each
    compute node, recording new processes
  • Aprof: a regression-analysis program running on
    the head node; it receives input from Amon to make
    execution-time predictions (within a cluster and
    between clusters)

10
Amon / Aprof Monitoring and Prediction
11
Amon / Aprof Approach to Modeling Resource Usage
[Diagram: WRF's measured behavior feeds an application resource usage model]
  • Model inputs: CPU speed, number of nodes, network
    latency, network bandwidth, hard disk I/O, FSB
    bandwidth, RAM size, L2 cache
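
Aprof's exact regression is not shown on these slides, but the inverse terms on slide 16 (inv cpu, inv clock, inv clock*cpu) suggest a linear least-squares fit over inverse-transformed resource variables. A minimal Python sketch, assuming a reduced single-variable model and reusing the actual runtimes from slide 18 as sample data:

    import numpy as np

    # Sample benchmark rows: (processors, elapsed seconds); the elapsed
    # values echo the "actual" column on slide 18.
    runs = [(1, 5222.0), (2, 2881.0), (3, 2281.0), (4, 1860.0),
            (6, 1440.0), (8, 1200.0)]

    # Model: elapsed ~ b0 + b1/p, the simplest inverse-term fit. Aprof's
    # full model also includes clock-speed and cache terms (see slide 16).
    X = np.array([[1.0, 1.0 / p] for p, _ in runs])
    y = np.array([t for _, t in runs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(p):
        # Predicted elapsed time (seconds) on p processors.
        return coef[0] + coef[1] / p

    print(predict(12))  # extrapolate to an unmeasured processor count

The std.dev column in Aprof's output (slide 17) presumably reports the uncertainty of such fitted parameters.
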
12
Previous Findings for Amon / Aprof
  • Experiments were performed on two clusters at
    FIU: Mind (16 nodes) and GCB (8 nodes)
  • Experiments were run to predict for different
    numbers of nodes and CPU loads (i.e., 2, 3, ..., 14, 15
    nodes and 20, 30, ..., 90, 100 percent)
  • Aprof predictions were within 10% error versus
    actual recorded runtimes, both within Mind and GCB
    and between Mind and GCB
  • Conclusion: the first-step assumption was valid ->
    move on to extending the research to a higher
    number of nodes.

13
How'd they do that?
  • Developed a benchmarking script that edits and
    submits a job file to the MareNostrum (MN) scheduler
    (see the sketch after this list)
  • Runs for each number of nodes (8, 16, 32, 64, 96,
    128)
  • Runs for each CPU percentage (100%, 75%, 50%, 25%)
  • Records execution time, average CPU utilization,
    participating nodes, etc.
  • Job file
  • Requests the desired number of nodes from MN
  • Starts Amon on each returned node to monitor and
    report processes
  • Starts cpulimit on each returned node, limiting
    the effective CPU power given to the WRF process
  • Executes WRF as a parallel job across the returned
    nodes
  • Developed a modification script that
  • Combines Amon output into one file
  • Filters processes down to solely WRF processes
  • Edits processes into an Aprof-friendly format
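
A minimal Python sketch of such a benchmarking driver, looping over the node counts and CPU percentages above; the job-file template, its placeholder names, and the mnsubmit command are assumptions for illustration:

    import itertools, pathlib, subprocess

    NODE_COUNTS = [8, 16, 32, 64, 96, 128]
    CPU_PERCENTS = [100, 75, 50, 25]

    # Hypothetical template with {nodes} and {cpu_percent} placeholders;
    # the real script edited an existing MN job file in place.
    template = pathlib.Path("wrf_job.template").read_text()

    for nodes, cpu in itertools.product(NODE_COUNTS, CPU_PERCENTS):
        job_path = pathlib.Path(f"wrf_{nodes}n_{cpu}pct.job")
        job_path.write_text(template.format(nodes=nodes, cpu_percent=cpu))
        # Submit to the MN scheduler; the submit command is an assumption.
        subprocess.run(["mnsubmit", str(job_path)], check=True)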

14
How'd they do that? (cont'd)
  • Started Aprof, loading the input file as data
  • Executed the Aprof query automation script
    (sketched below), which
  • Starts a telnet session querying Aprof for the
    benchmarked scenarios
  • Compares predicted values to the actual values
    returned in the runs
  • Outputs a text file and a plot file of the
    comparison statistics
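
Since Aprof is queried over telnet, a plain TCP socket is enough for a sketch. The host, port, and query syntax below are assumptions; the percent-difference arithmetic matches the script output on slide 18:

    import socket

    def query_aprof(host, port, query):
        # Send one query line to a running Aprof and return its reply.
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall((query + "\n").encode())
            return s.recv(65536).decode()

    # Hypothetical query; the real syntax comes from Aprof's interface.
    print(query_aprof("localhost", 8000, "predict wrf.exe cpus=8"))

    # The comparison statistic used on slide 18, here for the p=8 row:
    actual, predicted = 1200.0, 1237.381
    pct_diff = abs(actual - predicted) / actual * 100  # -> 3.115...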

15
Experimental Process
16
Extreme? Makeover
  • Amon output (process 464):

    --- (464) ---
    name: wrf.exe
    cpus: 4
    cpu MHz: 1/0.000 MHz
    cache size: 1/0 KB
    elapsed time: 957952 msec
    utime: 956370 msec  957810 msec
    stime: 570 msec  860 msec
    intr: 18783
    ctxt switch: 58290
    fork: 95
    storage R: 0 blocks  0 blocks
    storage W: 0 blocks
    network Rx: 19547308 bytes
    network Tx: 1434925 bytes

  • Aprof input (process 464):

    --- (464) ---
    name: wrf.exe
    inv cpu: 1/16
    inv clock: 1/574
    cache size: 1/1024 KB
    elapsed time: 1990992 msec
    inv clock*cpu: 1/(36763)

  • Why? The version of Linux on MN does not report
    some characteristics (e.g., cache size), and from
    its initial design, Amon reports in a different
    format than Aprof reads. (A sketch of this
    conversion follows.)
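
A sketch of the kind of rewriting the modification script performs on each record, assuming hypothetical raw field names; only the rendered text above is visible on the slide, so the exact transformation is partly guessed:

    # Records parsed from the combined Amon output file (placeholder).
    all_records = []

    def amon_to_aprof(rec):
        # Convert one parsed Amon record (a dict of raw fields) into the
        # inverse-term form Aprof reads; field names are assumptions.
        cpus = int(rec["cpus"])
        mhz = float(rec["cpu_mhz"])
        return {
            "name": rec["name"],
            "inv cpu": f"1/{cpus}",
            "inv clock": f"1/{mhz:.0f}",
            "cache size": f"1/{rec['cache_kb']} KB",
            "elapsed time": f"{rec['elapsed_msec']} msec",
            "inv clock*cpu": f"1/({cpus * mhz:.0f})",
        }

    # Filtering down to solely WRF processes, as slide 13 describes:
    wrf_records = [amon_to_aprof(r) for r in all_records
                   if r["name"] == "wrf.exe"]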

17
Aprof Prediction
  • name: wrf.exe
  • elapsed time: 5.783787e+06

    explanatory value    parameter       std.dev
    -----------------    -------------   -------------
    1.000000e+00         5.783787e+06    1.982074e+05

                     predicted value   residue rms     std.dev
                     ---------------   -------------   -------------
    elapsed time     5.783787e+06      4.246451e+06    1.982074e+05


18
Query Automation Script Output
  • adj. cpu speed, processors, actual, predicted,
    rms, std. dev, % difference
  • 3591.363, 1, 5222, 5924.82, 1592.459, 415.3491,
    13.4588280352
  • 3591.363, 2, 2881, 3246.283, 1592.459, 181.5382,
    12.6790350573
  • 3591.363, 3, 2281, 2353.438, 1592.459, 105.334,
    3.17571240684
  • 3591.363, 4, 1860, 1907.015, 1592.459, 69.19778,
    2.52768817204
  • 3591.363, 5, 1681, 1639.161, 1592.459, 49.83672,
    2.48893515764
  • 3591.363, 6, 1440, 1460.592, 1592.459, 39.5442,
    1.43
  • 3591.363, 7, 1380, 1333.043, 1592.459, 34.76459,
    3.40268115942
  • 3591.363, 8, 1200, 1237.381, 1592.459, 33.27651,
    3.11508333333
  • 3591.363, 9, 1200, 1162.977, 1592.459, 33.56231,
    3.08525
  • 3591.363, 10, 1080, 1103.454, 1592.459, 34.68943,
    2.17166666667
  • 3591.363, 11, 1200, 1054.753, 1592.459, 36.15324,
    12.1039166667
  • 3591.363, 12, 1080, 1014.169, 1592.459, 37.70271,
    6.09546296296
  • 3591.363, 13, 1200, 979.8292, 1592.459, 39.22018,
    18.3475666667
  • 3591.363, 14, 1021, 950.3947, 1592.459, 40.65455,
    6.91530852106
  • 3591.363, 15, 1020, 924.8848, 1592.459, 41.9872,
    9.32501960784

19
Paraver / Dimemas
  • Dimemas: a simulation tool for the parametric
    analysis of the behavior of message-passing
    applications on a configurable parallel platform
  • Paraver: a tool for performance visualization and
    analysis of tracefiles generated from actual
    executions and by Dimemas
  • Tracefiles are generated by MPItrace, which is
    linked into the executed code

20
Dimemas Simulation Process Overview
  • Link MPItrace into the application source code; it
    dynamically generates tracefiles for each node the
    application runs on
  • Identify computation iterations in Paraver;
    compose a smaller tracefile by selecting a few
    iterations, preserving communications and
    eliminating initialization phases
  • Convert the new tracefile to Dimemas format
    (.trf) using the CEPBA-provided prv2trf tool
  • Load the tracefile into the Dimemas simulator,
    configure the target machine, and with this
    information generate a Dimemas configuration file
  • Call the simulator, with or without the option of
    generating a Paraver (.prv) tracefile for viewing
    (a sketch of the conversion and simulation steps
    follows)
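
A small driver for the conversion and simulation steps; prv2trf is invoked here with an input and an output tracefile, and the Dimemas command line is an assumption for illustration:

    import subprocess

    cut_trace = "wrf_cut.prv"           # reduced trace exported from Paraver
    dim_trace = "wrf_cut.trf"           # Dimemas-format trace
    machine_cfg = "target_machine.cfg"  # Dimemas target-machine configuration

    # Convert the cut Paraver trace to Dimemas format.
    subprocess.run(["prv2trf", cut_trace, dim_trace], check=True)

    # Run the simulation (command-line form assumed).
    subprocess.run(["Dimemas", machine_cfg], check=True)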

21
Paraver/Dimemas DiP Environment
22
How'd they do that?
  • Generated Paraver tracefiles, Dimemas tracefiles,
    and simulation configuration files for each
    number of nodes
  • Developed a Dimemas simulation script,
    simulation_automater.sh (sketched below), that
  • Selects the configuration file for the desired
    number of nodes
  • Edits the configuration file for the desired CPU
    percentage
  • Records execution time and average CPU utilization
  • Finalizing development of a prediction validation
    script, which will
  • Compare Dimemas predicted values to actual run
    values
  • Output a text file and a plot file of the
    comparison statistics
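
simulation_automater.sh itself is a shell script; below is a Python sketch of the same loop. The name of the CPU-speed field inside the Dimemas configuration file and the Dimemas invocation are assumptions:

    import re, subprocess, time

    NODE_COUNTS = [8, 16, 32, 64, 96, 128]
    CPU_PERCENTS = [100, 75, 50, 25]

    for nodes in NODE_COUNTS:
        base = open(f"wrf_{nodes}n.cfg").read()  # config per node count
        for cpu in CPU_PERCENTS:
            # Scale a hypothetical relative-speed field to the target load.
            text = re.sub(r"relative speed:\s*\S+",
                          f"relative speed: {cpu / 100:.2f}", base)
            cfg = f"wrf_{nodes}n_{cpu}pct.cfg"
            open(cfg, "w").write(text)
            start = time.time()
            subprocess.run(["Dimemas", cfg], check=True)
            print(nodes, cpu, round(time.time() - start, 2))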

23
Dimemas Prediction
  • Execution time: 36.354146
  • Speedup: 5.34
  • CPU Time: 194.066431

    Id.   Computation time   (%)     Communication
    1     31.224017          91.21   3.008552
    2     20.089440          78.20   5.599083
    3     19.305673          76.84   5.818317
    4     28.672368          83.27   5.762332
    5     29.058603          85.36   4.982049
    6     19.488003          77.63   5.614155
    7     18.727851          78.57   5.108366
    8     27.500476          84.29   5.123971

    Id.   Mess. sent     Bytes sent     Immediate recv   Waiting recv   Bytes recv     Coll. op.      Block time   Comm. time   Wait link time   Wait buses time   I/O time
    1     7.577000e+03   1.583659e+08   3.539000e+03     4.080000e+03   1.671666e+08   1.475000e+03   0.247092     0.383663     0.319859         0.000000          0.000000
    2     8.948000e+03   2.200029e+08   8.797000e+03     1.440000e+02   2.186629e+08   1.475000e+03   3.710867     0.383663     0.098868         0.000000          0.000000
    3     8.948000e+03   2.176712e+08   6.904000e+03     2.037000e+03   2.163992e+08   1.475000e+03   3.453668     0.383663     0.243052         0.000000          0.000000

24
Project Status
25
Amon / Aprof
  • Software installed and tailored to MareNostrum
  • Proficient in executing software
  • Amon benchmarking completed
  • Aprof query automation complete and results
    generated
  • Lessons learned on extending Amon/Aprof to a
    different architecture

26
Dimemas / Paraver
  • Proficient in executing software
  • Paraver and Dimemas tracefiles generated for each
    number of nodes (8, 16, 32, 64, 96, 128)
  • Benchmarking script complete
  • Simulations generated
  • Comparison script being finalized

27
Quick Comparison
  • Amon / Aprof
  • Pros
  • Simpler to deploy in comparison
  • Scalability of the model is promising with first
    results
  • Feasible solution for performance prediction
    purposes
  • Cons
  • Requires more base executions for accurate
    performance in comparison
  • Dimemas / Paraver
  • Pros
  • More features; could be more useful to an experienced
    user (i.e., adjustment of system characteristics)
  • Visualization and analysis of executions for
    analysis purposes
  • Graphical user interface
  • Cons
  • Requires special compilation of applications
  • Requires a non-trivial-to-install kernel patch
  • Large tracefiles (sometimes gigabytes)

28
Aprof Results: 100% CPU Utilization
29
Aprof Results: 100% CPU Utilization
30
Significant Challenges Overcome
  • Amon
  • Adjustment of source code for proper functioning
    on MareNostrum (MN)
  • Development of the benchmarking script to conform to
    the system architecture of MareNostrum (i.e., going
    through its scheduler, one process per node,
    etc.)
  • Proper functioning of cpulimit for accurate CPU
    percentages
  • Job termination by the MN scheduler due to execution
    surpassing the wall clock limit

31
Significant Challenges Overcome (cont'd)
  • Aprof
  • Adjustment of source code for less complex, more
    consistent data input
  • Development of prediction and comparison scripts
    for MareNostrum
  • Dimemas/Paraver
  • MPItrace properly linked in with WRF on GCB and
    Mind
  • Generation of trace and configuration files
  • WRF
  • Version 2.2 installed and compiled on Mind

32
Challenges Remaining
  • Lengthy Amon benchmarking runs due to the time
    jobs spend in the queue
  • Complexities in preparing Dimemas tracefiles for
    simulation purposes
  • Extracting accurate predictions from Dimemas:
    trace files are reduced in order to speed up the
    prediction process; therefore, predicted times
    must be multiplied by a determined factor (see the
    worked example after this list)
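
A worked example of that scaling with assumed numbers, since the slides do not state the actual reduction factor:

    # Suppose the cut trace keeps 5 of 50 computation iterations and the
    # initialization phase (removed from the trace) took 12 s in real runs.
    kept, total, init_time = 5, 50, 12.0

    simulated = 36.354146   # Dimemas "Execution time" from slide 23
    factor = total / kept   # 10x, for the 45 iterations cut out
    estimate = simulated * factor + init_time
    print(estimate)         # ~375.5 s under these assumptions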

33
Remaining Work
  • Next Week
  • Finalizing scripting of Dimemas prediction
    simulations for the same scenarios as those of
    Amon and Aprof
  • Fall 2008
  • Experiment with how well Amon and Aprof relate
    to/could possibly be combined with Dimemas
  • Decide if and how to compare results from
    MareNostrum, GCB, and Mind (i.e. the same
    versions of WRF would have to be running in all
    three locations)
  • Compose a paper presenting significant results and
    submit it to a conference.
  • Future Work
  • Work with metascheduling team on implementation
    of tools.

34
References
  • S. Masoud Sadjadi, Liana Fong, Rosa M. Badia,
    Javier Figueroa, Javier Delgado, Xabriel J.
    Collazo-Mojica, Khalid Saleem, Raju Rangaswami,
    Shu Shimizu, Hector A. Duran Limon, Pat Welsh,
    Sandeep Pattnaik, Anthony Praino, David Villegas,
    Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye,
    Juan Carlos Martinez, Ivan Rodero, Shuyi Chen,
    Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh
    Willoughby, Michael McFail, Christine Lisetti,
    and Malek Adjouadi. Transparent grid enablement
    of weather research and forecasting. In
    Proceedings of the Mardi Gras Conference 2008 -
    Workshop on Grid-Enabling Applications, Baton
    Rouge, Louisiana, USA, January 2008.
  • http://www.cs.fiu.edu/sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt
  • S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa,
    Raju Rangaswami, Javier Delgado, Hector Duran,
    and Xabriel Collazo. A modeling approach for
    estimating execution time of long-running
    scientific applications. In Proceedings of the
    22nd IEEE International Parallel and Distributed
    Processing Symposium (IPDPS-2008), the Fifth
    High-Performance Grid Computing Workshop
    (HPGC-2008), Miami, Florida, April 2008.
  • http://www.cs.fiu.edu/sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt
  • Performance/Profiling. Presented by Javier
    Figueroa in the Special Topics in Grid Enablement
    of Scientific Applications class, 13 May 2008.

35
Acknowledgements
  • REU
  • Partnerships for International Research and
    Education (PIRE)
  • The Barcelona SuperComputing Center (BSC)
  • Masoud Sadjadi, Ph.D. - FIU
  • Rosa Badia, Ph.D. - BSC
  • Javier Delgado - FIU
  • Javier Figueroa - Univ. of Miami