IPDPS 2005, slide 1 - PowerPoint PPT Presentation

Provided by: camp203
Learn more at: https://www2.cs.uh.edu
Transcript and Presenter's Notes

1
Automatic Construction and Evaluation of
Performance Skeletons (Predicting Performance
in an Unpredictable World)
Sukhdeep Sodhi, Microsoft
Jaspal Subhlok, University of Houston
IPDPS 2005
2
What is a Performance Skeleton anyway?
A short-running program that mimics the execution
behavior of a given application.
GOAL: the execution time of a performance skeleton is a fixed
fraction of the application execution time, say
1:1000, then..
Sounds vaguely interesting, but: Who cares? How to do it? Is it even
possible to build one?
3
Who Cares? Anyone who needs a performance
estimate when it cannot be modeled well
  • Applications distributed on networks: resource
    selection, mapping, adapting

[Figure: an application and a network. Which nodes offer the best performance?]
  • Performance testing of a future architecture
    under simulation: large applications cannot be
    tested, as simulation is 1000X slower

4
Mapping Distributed Applications on Networks:
State of the Art
Mapping for Best Performance
  • Measure and model network and application
    characteristics (NWS is popular)
  • Find best match of nodes for execution
  • But the approach has significant limitations:
  • Knowing network status is not the same as knowing
    how an application will perform
  • Frequent measurements are expensive; less
    frequent measurements mean stale data

5
Mapping Distributed Applications on Networks:
Our Approach
[Figure: an application made up of components (Model, Data, Pre, Sim 1, Sim 2, Stream, Vis) to be mapped onto a network]
Predict performance and select nodes by actual
execution of performance skeletons on groups of
nodes
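The selection step on this slide can be sketched as follows. This is only an illustration: `select_nodes`, `fake_run_skeleton`, the node names, and the timings are all hypothetical, and a real launcher would actually start the skeleton on each candidate group and time it.

```python
from typing import Callable, Iterable, Sequence, Tuple


def select_nodes(
    candidate_groups: Iterable[Sequence[str]],
    run_skeleton: Callable[[Sequence[str]], float],
) -> Tuple[Sequence[str], float]:
    """Run the skeleton on every candidate group and return the fastest one."""
    best_group, best_time = None, float("inf")
    for group in candidate_groups:
        elapsed = run_skeleton(group)  # seconds the skeleton took on this group
        if elapsed < best_time:
            best_group, best_time = group, elapsed
    if best_group is None:
        raise ValueError("no candidate groups given")
    return best_group, best_time


# Stand-in for a real launcher: made-up skeleton run times per node group.
FAKE_TIMES = {
    ("n0", "n1", "n2", "n3"): 12.4,
    ("n0", "n1", "n4", "n5"): 9.8,
    ("n2", "n3", "n4", "n5"): 15.1,
}


def fake_run_skeleton(group: Sequence[str]) -> float:
    return FAKE_TIMES[tuple(group)]


best, seconds = select_nodes(FAKE_TIMES.keys(), fake_run_skeleton)
```

Because the skeleton is short, trying several groups this way costs far less than trying the application itself on each group.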
6
How to Construct a Performance Skeleton ?
  • Central challenge in this research
  • Common sense dictates that an application and its
    skeleton must be similar in
  • Computation behavior
  • Communication behavior
  • Memory behavior
  • I/O Behavior
  • All execution behavior is to be captured in a
    short program

[Figure: application → skeleton. How?]
7
How to Construct a Performance Skeleton ?
[Figure: construction pipeline] Run application → Record Execution Trace → Compress execution trace into Execution Signature → Construct Performance Skeleton
  • Execution trace is a record of all system
    activity during execution, such as memory
    accesses, communication messages and CPU
    events.
  • Execution signature is a compressed,
    summarized record of execution.
  • Performance skeleton is a program based on the
    execution signature.
8
Limitations of Work Presented Today
  • Only model the coarse application
    computation and communication patterns to build
    the performance skeleton
  • Ignore memory and I/O behavior
  • Ignore specific instructions; only consider
    whether the CPU is computing, communicating, or idle
  • Somewhat intrusive: the application must be linked
    with a profiling library
  • Limited to MPI programs
  • But these are not limitations of the approach.
  • Most are being addressed in the project.

9
Constructing a Performance Skeleton
[Figure: construction pipeline] Run application → Record Execution Trace → Compress execution trace into Execution Signature → Construct Performance Skeleton program from execution signature
10
Recording Execution Trace
  • Link MPI application with PMPI based profiling
    library
  • no source code modification / analysis required
  • Execute on a dedicated testbed
  • Records all MPI function calls
  • Call name, start time, stop time, parameters
  • Timing done to microsecond granularity
  • CPU busy time between consecutive MPI calls
  • Result is a (long) execution sequence of
    computation and communication events and their
    durations/parameters
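The last bullet can be illustrated with a small sketch that turns recorded MPI calls into an alternating sequence of computation and communication events. The record layout (`MpiCall`, `Event`) and the function name are assumptions made for this example, not the format of the actual profiling library. The sketch also assumes that all time between consecutive MPI calls is CPU busy time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MpiCall:
    name: str       # e.g. "MPI_Send"
    start_us: int   # start time in microseconds
    stop_us: int    # stop time in microseconds
    params: Tuple   # whatever the profiling layer recorded


@dataclass
class Event:
    kind: str       # "compute" or an MPI call name
    duration_us: int
    params: Tuple = ()


def to_event_sequence(calls: List[MpiCall], run_start_us: int, run_end_us: int) -> List[Event]:
    """Interleave MPI calls with the compute gaps that separate them."""
    events: List[Event] = []
    cursor = run_start_us
    for call in sorted(calls, key=lambda c: c.start_us):
        gap = call.start_us - cursor
        if gap > 0:  # time outside MPI is treated as computation (assumption)
            events.append(Event("compute", gap))
        events.append(Event(call.name, call.stop_us - call.start_us, call.params))
        cursor = call.stop_us
    if run_end_us > cursor:
        events.append(Event("compute", run_end_us - cursor))
    return events


calls = [
    MpiCall("MPI_Send", 100_000, 100_250, (800, 1)),
    MpiCall("MPI_Recv", 200_250, 200_900, (800, 1)),
]
events = to_event_sequence(calls, run_start_us=0, run_end_us=250_900)
summary = [(e.kind, e.duration_us) for e in events]
```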

11
Constructing a Simple Performance Skeleton
[Figure: construction pipeline] Run application → Record Execution Trace → Compress execution trace into Execution Signature → Construct Performance Skeleton program from execution signature
12
Compress Execution Trace → Execution Signature
  • Application execution typically follows cyclic
    patterns
  • Goal: Form loop structure by identifying
    repeating execution behavior.
  • Step 1: Execution trace to symbol strings
  • Identify similar (may not be identical)
    execution events
  • Each event in such a cluster of similar events is
    replaced by a representative and assigned a
    symbol
  • Execution trace is replaced by a symbol string
  • [symbol string: one symbol per event in the trace]
  • Where, say, one symbol stands for "compute for 100 ms" and
    another for "MPI call to send 800 bytes to a neighbor node"
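Step 1 might look roughly like the sketch below. It uses a simple greedy clustering in which the first event seen in a cluster serves as its representative. The slides do not specify the clustering rule, so this rule, the `symbolize` name, the `(kind, size)` event layout, and the letters used as symbols are all assumptions.

```python
from typing import List, Tuple

# (kind, size): duration in ms for "compute", message bytes for MPI calls
Event = Tuple[str, float]


def symbolize(events: List[Event], similarity: float = 0.1):
    """Greedy clustering: an event joins the first cluster whose representative
    has the same kind and a size within `similarity` (relative) of its own."""
    representatives: List[Event] = []
    symbols: List[str] = []
    for kind, size in events:
        for idx, (rep_kind, rep_size) in enumerate(representatives):
            if kind == rep_kind and abs(size - rep_size) <= similarity * rep_size:
                break
        else:
            # No similar cluster yet: this event becomes a new representative.
            representatives.append((kind, size))
            idx = len(representatives) - 1
        symbols.append(chr(ord("a") + idx))  # one character per cluster
    return "".join(symbols), representatives


trace = [
    ("compute", 100.0), ("MPI_Send", 800),
    ("compute", 103.0), ("MPI_Send", 800),
    ("compute", 98.0), ("MPI_Send", 800),
    ("MPI_Allreduce", 8),
]
loose, loose_reps = symbolize(trace, similarity=0.1)
strict, strict_reps = symbolize(trace, similarity=0.01)
```

With the looser threshold the three compute events collapse into one symbol and the cyclic pattern becomes visible; with the stricter threshold they stay distinct and the repetition is hidden.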

13
Compress Execution Trace → Execution Signature
  • Step 2: Compress string by identifying cycles
  • Build loop structure recursively from symbol
    strings
  • e.g. [a symbol string with repeated runs of symbols]
  • is replaced by
  • [a shorter string in which repeated groups carry iteration counts such as 3 and 2]
  • Similar to the longest substring matching problem
  • Typical execution signature is multiple orders of
    magnitude smaller than the trace
  • Step 3: Adaptively increase degree of compression
    (by managing a similarity parameter) until
    signature is compact enough
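One way to picture Step 2 is a greedy fold of adjacent repeats, applied recursively to each loop body. This is a simple stand-in heuristic for illustration, not the algorithm used by the authors. The names `compress`, `render`, and `expand` are hypothetical, and Step 3 (adjusting the similarity parameter) is not covered.

```python
def compress(s: str):
    """Fold adjacent repeats into (body, count) pairs, recursing into bodies."""
    out = []
    i, n = 0, len(s)
    while i < n:
        best_p, best_reps = 0, 1
        # Try every period length; keep the one covering the most symbols.
        for p in range(1, (n - i) // 2 + 1):
            unit = s[i:i + p]
            reps = 1
            while s[i + reps * p:i + (reps + 1) * p] == unit:
                reps += 1
            if reps >= 2 and reps * p > best_reps * best_p:
                best_p, best_reps = p, reps
        if best_p:
            out.append((compress(s[i:i + best_p]), best_reps))
            i += best_p * best_reps
        else:
            out.append(s[i])
            i += 1
    return out


def render(sig) -> str:
    """Print loops as (body)count, e.g. (ab)3."""
    return "".join(
        f"({render(item[0])}){item[1]}" if isinstance(item, tuple) else item
        for item in sig
    )


def expand(sig) -> str:
    """Inverse of compress, used to check that nothing was lost."""
    return "".join(
        expand(item[0]) * item[1] if isinstance(item, tuple) else item
        for item in sig
    )


symbols = "abababcabababcdd"
signature = compress(symbols)
text = render(signature)
```

Here `text` is `((ab)3c)2(d)2`: an inner loop nested inside an outer loop, followed by a short trailing loop.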

14
Constructing a Simple Performance Skeleton
[Figure: construction pipeline] Run application → Record Execution Trace → Compress execution trace into Execution Signature → Construct Performance Skeleton program from execution signature
15
Generate Performance Skeleton Program
  • Goal: Execution time of performance skeleton is
    1/K of application execution time (K given by user)
  • Reduce iterations of each loop in application
    signature by a factor of K
  • Heuristically process remaining iterations and
    events outside loops
  • Replace symbols by C language statements
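The loop-scaling step can be sketched as below, for a flat signature of top-level loops only. The representation and the `scale_signature` name are assumptions. The slides leave the handling of remaining iterations to unspecified heuristics, so this sketch only separates them out.

```python
from typing import List, Tuple, Union

Loop = Tuple[str, int]    # (loop body as a symbol string, iteration count)
Item = Union[str, Loop]   # a bare symbol outside any loop, or a loop


def scale_signature(signature: List[Item], k: int):
    """Divide each loop's iteration count by k.

    Returns (skeleton, leftover): `skeleton` holds the loops with their
    reduced counts; `leftover` holds remainder iterations and events outside
    loops, which still need separate heuristic treatment.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    skeleton: List[Item] = []
    leftover: List[Item] = []
    for item in signature:
        if isinstance(item, tuple):
            body, count = item
            kept, rest = divmod(count, k)
            if kept:
                skeleton.append((body, kept))
            if rest:
                leftover.append((body, rest))
        else:
            leftover.append(item)
    return skeleton, leftover


signature = ["i", ("ab", 1000), ("cd", 250), ("e", 3), "f"]
skeleton, leftover = scale_signature(signature, k=100)
```

In this example the 1000-iteration loop shrinks to 10 iterations and the 250-iteration loop to 2, while 50 iterations of the second loop, the 3-iteration loop, and the two standalone events are left over.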

16
Experimental Validation
  • Skeletons constructed for Class B NAS MPI
    benchmarks. Executed on 4 cluster nodes in the
    following sharing scenarios:
  • Dedicated nodes (defines the reference execution time
    ratio between skeleton and application)
  • Competing processes on one node / all nodes
  • Competing traffic on one link / all links
  • Competition as above on one node and one link
  • Skeleton execution time used to predict
    application execution time in different scenarios
  • Setup: Intel Xeon dual-CPU 1.7 GHz nodes running
    Linux 2.4.7. Gigabit crossbar switch. Simple
    CPU-intensive competing processes. iproute used to
    simulate link sharing
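The prediction scheme described above amounts to simple arithmetic, sketched here with invented numbers. The function names are hypothetical, and the relative-error definition is an assumption, since the slides do not state how error is computed.

```python
def reference_ratio(app_dedicated_s: float, skeleton_dedicated_s: float) -> float:
    """Application-to-skeleton time ratio measured on dedicated nodes."""
    return app_dedicated_s / skeleton_dedicated_s


def predict_app_time(skeleton_shared_s: float, ratio: float) -> float:
    """Scale the skeleton's time under sharing by the dedicated-node ratio."""
    return skeleton_shared_s * ratio


def prediction_error_pct(predicted_s: float, measured_s: float) -> float:
    """Relative error of the prediction, in percent (assumed definition)."""
    return abs(predicted_s - measured_s) / measured_s * 100.0


# Illustrative numbers only, not measurements from the talk.
ratio = reference_ratio(app_dedicated_s=300.0, skeleton_dedicated_s=10.0)
predicted = predict_app_time(skeleton_shared_s=14.0, ratio=ratio)
error = prediction_error_pct(predicted, measured_s=400.0)
```

With these made-up values the ratio is 30, the skeleton's 14 s run under sharing predicts 420 s for the application, and a measured 400 s would give a 5% error.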

17
Prediction Accuracy of Skeletons (average across
all sharing scenarios)
Average prediction error is 6%, max 18%: acceptable.
Longer skeletons are better, but even 0.5-second
skeletons are meaningful (the tool issues a warning
if the requested skeleton size is too small).
18
Prediction for Different Sharing Scenarios
(10-second skeletons)
  • Error is higher with network contention
  • communication is harder to scale down and
    affects synchronization more directly

19
Comparison with Simple Prediction Methods
  • Average Prediction: the average slowdown of the entire
    benchmark is used to predict execution time for
    each program.
  • Class S Prediction: Class S
    benchmark (1 sec) programs are used as skeletons for
    Class B (30-900 s) benchmarks.
Even the smallest skeletons are far superior!
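For comparison, the "Average Prediction" baseline can be read as the sketch below. The slides do not say how the average slowdown is computed, so taking the mean of per-program slowdowns is an assumption. The function name, program names, and timings are invented for illustration.

```python
from typing import Dict


def average_slowdown_prediction(
    dedicated_s: Dict[str, float], shared_s: Dict[str, float]
) -> Dict[str, float]:
    """Predict every program's shared-run time from one suite-wide slowdown."""
    slowdowns = [shared_s[name] / dedicated_s[name] for name in dedicated_s]
    average = sum(slowdowns) / len(slowdowns)  # mean of per-program slowdowns
    return {name: dedicated_s[name] * average for name in dedicated_s}


# Invented timings (seconds) for three hypothetical programs.
dedicated = {"P1": 100.0, "P2": 50.0, "P3": 20.0}
shared = {"P1": 150.0, "P2": 100.0, "P3": 50.0}
baseline = average_slowdown_prediction(dedicated, shared)
```

The baseline applies the same factor (2.0 here) to every program, so it is off for P1 and P3, whose actual slowdowns are 1.5 and 2.5. A per-application skeleton does not rely on a single shared factor.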
20
Conclusions
  • Promising approach to performance estimation for
  • Unpredictable environments (grids)
  • Non-existent architectures (under simulation)
  • It is work in progress; a lot more remains, such
    as
  • accurately reproducing memory behavior (some
    results in LCR 2004 workshop)
  • integration of memory and communication/computation
    behavior
  • validation on larger grid environments
  • accurate reproduction of CPU behavior (such as
    instruction types etc.)
  • skeletons that scale to different numbers of nodes

21
FOR MORE INFORMATION: www.cs.uh.edu/jaspal,
jaspal_at_uh.edu
Thanks to NSF and DOE!
End of Talk! Or is It? Questions?
22
Discovered Communication Structure of NAS
Benchmarks
[Figure: communication graphs among nodes 0-3 for the BT, CG, IS, LU, MG, SP, and EP benchmarks]
23
CPU Behavior of NAS Benchmarks