Title: Performance Control in RealityGrid
1. Performance Control in RealityGrid
Ken Mayes, Mikel Luján, Graham Riley, Rupert Ford, Len Freeman, John Gurd
1st RealityGrid Workshop, June 2003
2. Overview
- Preamble
  - The case for Performance Control
- Context
  - Malleable, component-based Grid applications
- RealityGrid Application
  - The LB3D Use Case
- RealityGrid Performance Control System
  - Design and status report
- Generalisation . . .
3. Performance Analysis and Distributed Computing . . .
- We want specific computational performance and resource levels
- We want these in an environment that is
  - Distributed (physically)
  - Heterogeneous (inherently)
  - Dynamic (by necessity)
- We want them with minimum effort
4. . . . Implies Performance and Resource Control
- There is no viable alternative to automating the process of achieving them:
  - monitoring and analysis (sensing)
  - change when unsatisfactory (actuation)
- This is a classical control-system scenario, and we need to use established control-system techniques to deal with it
5. Control Systems Overview
- Control systems involve appropriate change of actuated variables, informed by (timely) feedback of monitoring information
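In performance-control terms, such a loop might look like the following minimal Python sketch (all names here are illustrative, not part of the RealityGrid system): a sensor reads the achieved timestep rate, the deviation from a target is computed, and an actuator is invoked when the deviation is unacceptable.

    # Minimal closed-loop sketch (illustrative only; not RealityGrid code).
    # sense() returns the measured timestep rate; actuate(error) applies a
    # corrective change, e.g. requesting more or fewer PEs.
    def control_loop(sense, actuate, target_rate, steps, tolerance=0.1):
        for _ in range(steps):
            measured = sense()                  # feedback (monitoring)
            error = target_rate - measured      # deviation from the set point
            if abs(error) > tolerance * target_rate:
                actuate(error)                  # change actuated variables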
6. Summary
- Traditional performance control amounts to open-loop control, mediated by human performance engineers
- Distributed environments introduce enough extra complexity that closed-loop control becomes essential
- Successful closed-loop control demands accurate and rapid feedback data, thus affecting achievable control limits
7. Context
- Grid applications will be distributed and, in some sense, component-based.
- To deliver the power grid model, adaptation is key!
- Elevate users above Grid resource and performance details.
- Our work is considering adaptation and its impact on performance
  - Adaptation in component deployment:
    - Initial model configuration and deployment on resources.
    - Flexibility in deployment of compositions of coupled models.
  - Adaptation via malleable components:
    - At run time, re-allocation of resources in response to changes.
8. RealityGrid - LB3D Use Case
[Diagram: tracking a parameter search. Two LB3D instances, running parameter sets Params 1 and Params 2, both reach simulation time T=100 with different output rates and resolutions (Output rate 1 / Resolution 1, Output rate 2 / Resolution 2). The user changes these dynamically, and the display keeps the simulation times equal.]
9. LB3D Use Case
[Diagram: tracking a parameter study. Two LB3D instances (Params 1, Params 2) produce output at rates and resolutions 1 and 2; here the user's display rates are kept equal.]
10. Malleable LB3D - mechanisms
- LB3D will respond to requests to change resources
  - Use more (or fewer) PEs (and memory) on the current system
  - Move from one system to another, via machine-independent (parallel) checkpoint/restart
- LB3D will output science data (for remote visualisation) at higher or lower rates
- LB3D will (one day) respond to requests to continue running at higher (or lower) lattice resolution
- Each of the above affects performance (e.g. the timesteps-per-second rate)
- Each has an associated cost . . . (a possible interface is sketched below)
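A hypothetical interface capturing these actuators might look as follows; the method names are assumptions for illustration, not the real LB3D/CPS API.

    # Hypothetical malleable-component interface (illustrative names only).
    class MalleableComponent:
        def resize(self, n_pes):
            """Continue on more (or fewer) PEs on the current system."""
        def migrate(self, target_machine):
            """Checkpoint machine-independently, restart elsewhere."""
        def set_output_rate(self, steps_per_output):
            """Raise or lower the science-data output rate."""
        def set_resolution(self, lattice_dims):
            """(One day) continue at a higher/lower lattice resolution."""
        def cost_of(self, action):
            """Estimated cost of applying one of the actuators above."""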
11. Use Case - continued
- User might be tracking many parameter-set developments (one per LB3D instance)
- Some will be uninteresting (for a while)
  - Lower output rate / resolution / terminate
- Some will become interesting
  - Increase output rate / resolution
- One aim: re-distribute resources across all LB3D instances to maintain the highest possible timestep rate
12. A General Grid Application
[Diagram: a Generate Data stage feeding Components 1, 2 and 3, all deployed on computational Grid resources.]
Applications and components exhibit phased behaviour.
13. Life is Hierarchical
- Can we use hierarchy to divide and conquer complex system problems?
- Introduce Performance Steerers at component- and application-levels . . .
14. Performance Steerers - Design
[Diagram: performance steerers cover both initial deployment and run-time adaptation, sitting between the component framework and the computational Grid resources.]
15. Full System
[Diagram: the full system comprises an external resource scheduler, an application performance repository, a component performance repository, a loader, and a resource usage monitor.]
16. Performance Prediction
- Role of APS (Application Performance Steerer; see the sketch below)
  - To distribute available resources to the components in such a way that the predicted performance of the components gives a minimum predicted execution time.
- Role of CPS (Component Performance Steerer)
  - To utilise allocated resources and component performance effectors (actuators) so that the predicted execution time of the steered component is minimised.
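As an illustration of the APS role, here is a sketch under the assumption that components run concurrently, so the predicted application time is the slowest component's predicted time; predict() stands in for the performance-prediction service.

    from itertools import product

    # Illustrative APS allocation (not the project's algorithm): try each
    # way of giving PEs to components and keep the allocation whose
    # slowest predicted component time is smallest.  predict(c, n_pes) is
    # assumed to return a predicted execution time from the repository.
    def allocate(components, pe_choices, total_pes, predict):
        best, best_time = None, float("inf")
        for alloc in product(pe_choices, repeat=len(components)):
            if sum(alloc) > total_pes:
                continue  # infeasible: exceeds available resources
            t = max(predict(c, n) for c, n in zip(components, alloc))
            if t < best_time:
                best, best_time = alloc, t
        return None if best is None else dict(zip(components, best))

Exhaustive search like this only pays for a handful of components; anything larger would need a heuristic.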
17. Life is Repetitive
- Many programs iterate more-or-less the same thing over and over again
- We can take advantage of this
  - e.g. for load balance in weather prediction
  - and, possibly, for performance control
18. Application Progress Points
- Assume that the execution of the application proceeds through phases; phase boundaries are marked by Progress Points.
- NB: decisions about performance can be taken, and actions applied, at the progress points (see the loop sketch below)
- These must be safe points
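In code, a progress point is just a call planted at a phase boundary of the component's main loop, e.g. (names hypothetical):

    # Sketch of a component's timestep loop instrumented with progress
    # points.  Each call is a safe point at which the CPS may checkpoint,
    # migrate, or re-configure the component before the next phase.
    def run(component, n_timesteps, progress_point):
        for tstep in range(n_timesteps):
            component.compute_timestep()    # one phase of work
            progress_point(tstep)           # safe point: control decisions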
19. Component Progress Points
[Diagram: a timeline of a component's execution; component progress points are handled by the CPS, and application progress points by the APS.]
- Information about progress points will be contained in some repository.
20. Implementation
- APS as daemon, CPS as library
[Diagram: the component is controlled through procedure calls across its component interface to the CPS; the CPS's comms interface talks to the APS's comms interface over sockets (RPC). A sketch of such an exchange follows.]
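The wire protocol between CPS and APS is not given on the slide; one plausible shape for the per-progress-point exchange (the message format and field names are assumptions) is:

    import json, socket

    # Illustrative CPS-side call at a progress point: report the last
    # timestep's timing to the APS daemon over a socket and receive a
    # directive back (e.g. CONTINUE, CHECKPOINT, MIGRATE).
    def report_and_ask(aps_host, aps_port, component_id, tstep, tstep_time):
        with socket.create_connection((aps_host, aps_port)) as conn:
            msg = {"component": component_id, "tstep": tstep,
                   "tstep_time": tstep_time}
            conn.sendall((json.dumps(msg) + "\n").encode())
            reply = conn.makefile().readline()
        return json.loads(reply)["directive"]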
21. Implementation
[Diagram: deployment across three machines. The APS daemon on Machine 1 talks over sockets to Component Loaders on Machines 2 and 3; on each of these, the loader starts LB3D (linked with the CPS) via MPIRUN and shares a shmem area with it. Supporting Grid services: DUROC, RSL, GLOBUS, GRIDFTP.]
22. Start-up Process
- GlobusRun, with an RSL script, starts the component loaders (one per machine in the Grid) plus the APS daemon.
- Loaders connect to each other.
- LB3D is started by a loader (via MPIRUN) and calls the CPS (a library) at start-up.
- The CPS connects to the APS.
- LB3D calls the CPS at each subsequent progress point, and the CPS communicates with the APS.
- Continue until LB3D has completed (e.g. the required number of timesteps), as in the loader sketch below.
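The loader's part of this sequence, reduced to a sketch (the command line and names are assumptions; the loaders themselves are launched by GlobusRun from the RSL script):

    import subprocess

    # Minimal per-machine loader sketch: start the component under MPI
    # and wait for it to finish.  The component's CPS library connects to
    # the APS itself at start-up; loader-to-loader connections not shown.
    def loader_main(component_argv, n_pes=4):
        proc = subprocess.Popen(["mpirun", "-np", str(n_pes)] + component_argv)
        return proc.wait()   # run until LB3D completes its timesteps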
23. Status
- We have a prototype implementation (basic mechanisms)
- First experiment (decision rule sketched below)
  - Every N tsteps, move LB3D between machines 2 and 3, as determined by the APS.
  - At the "tstep mod N" progress point in LB3D, the APS tells the CPS (which tells the component) to checkpoint; the CPS writes certain status information to the shmem area, and then LB3D (and the CPS) dies.
  - The loader on the machine it ran on communicates with the loader on the machine it is to run on: the restart file is GridFTPed across, along with restart info (e.g. the tstep, into the shmem area of the new loader).
  - A new LB3D is started and the CPS manages the restart.
  - Continue until no more tsteps.
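The APS-side decision rule of this experiment reduces to something like the following sketch (machine names hypothetical):

    # Every N timesteps, direct the component to checkpoint and restart
    # on the other of two machines; otherwise let it continue.
    def directive_for(tstep, N, current, machines=("machine2", "machine3")):
        if tstep > 0 and tstep % N == 0:
            target = machines[1] if current == machines[0] else machines[0]
            return {"directive": "MIGRATE", "target": target}
        return {"directive": "CONTINUE"}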
24. Status
- Preliminary performance results (cronus is an SGI Origin 3400; centaur3 is a Sun TCF)
- np 4, data 64x64x64; checkpoint file (XDR) size 32.8 MB
  - average resident tstep time for cronus: 3.384 s
  - average migration tstep time to cronus: 43.718 s
  - average resident tstep time for centaur3: 6.675 s
  - average migration tstep time to centaur3: 55.280 s
- np 4, data 8x8x8; checkpoint file (XDR) size 64 KB
  - average resident tstep time for cronus: 0.017 s
  - average migration tstep time to cronus: 0.504 s
  - average resident tstep time for centaur3: 0.061 s
  - average migration tstep time to centaur3: 3.038 s
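Reading the 64x64x64 figures: a migration to cronus costs roughly 43.718 - 3.384 ≈ 40.3 s more than a resident timestep, while running on cronus rather than centaur3 saves about 6.675 - 3.384 ≈ 3.3 s per timestep, so such a move pays for itself after roughly a dozen resident timesteps.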
25. Status
- We have a prototype implementation
- Second experiment
  - Three platforms (2 SGI, 1 Solaris; Linux coming soon)
  - Move LB3D at random between machines, as determined by the APS.
  - Has exposed GridFTP problems, which have been worked around.
  - Have timings for 2 machines, but not yet for 3.
- Now looking at the ICENI framework.
26. Status
- Developing an implementation of the performance repository
  - Berkeley Database (linked as a library)
- Survey of prediction and machine learning algorithms (one candidate sketched below)
  - runtime vs. off-line analysis
  - accuracy of predictions
  - amount of history data required
- Learning control theory, and understanding how to apply it to Performance Control.
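One simple candidate from that survey space (an assumption for illustration, not the project's chosen algorithm) is an exponentially weighted moving average over the timestep times held in the repository; the weight alpha trades responsiveness against how much history effectively matters.

    # EWMA predictor over recorded (non-empty) timestep-time history;
    # higher alpha weights recent history more heavily.
    def ewma_predict(history, alpha=0.3):
        estimate = history[0]
        for t in history[1:]:
            estimate = alpha * t + (1 - alpha) * estimate
        return estimate

    # e.g. ewma_predict([3.40, 3.35, 3.50, 3.38]) ~ predicted next tstep time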
27. Generalisation
- LB3D as a component is an easy case
  - how does the above scheme generalise?
- Are components necessary?
- Is hierarchy necessary?
  - component < application < scheduler
- What different kinds of component and/or application might there be?
  - redefinition of the process
- a topic for my forthcoming leave of absence
28. Summary
- Aim
  - Develop an architecture that enables us to investigate different mechanisms for Performance Control of malleable, component-based applications on the Grid
- Main characteristic
  - dynamic adaptation
- Design/implementation tensions
  - general- vs. specific-purpose
  - APS <-> CPS communication
  - APS <-> CPS ratio
  - performance prediction algorithms: accuracy vs. execution time
  - APS/CPS overhead vs. application execution time
- Work in development (first year in one sentence)
  - A Grid Framework for Malleable Component-based Application Migration
29. Related Projects at Manchester
- Met Office FLUME - design of next-generation Unified Model software
  - Coupled models
- Tyndall Centre SoftIAM - climate impact, integrated assessment modelling
  - Coupling climate and economic models
- Computational Markets
  - 1 RA and 1 PhD - positions being filled at present
  - UK e-Science funded (Imperial College led)
- For more information see
  - http://www.cs.man.ac.uk/cnc
  - http://www.realitygrid.org