RealityGrid Performance Control System

1
RealityGrid Performance Control System
  • Ken Mayes, Graham Riley, Rupert Ford, Mikel Lujan,
    John Gurd
  • APART meeting, SC 2002
  • November 2002

2
Overview
  • Set the context
  • RealityGrid and a Use Case
  • Design of our Performance Control System
  • Brief look at prototype implementation
  • Role of performance analysis
  • Related projects at Manchester

3
Context
  • Grid applications will be distributed and, in
    some sense, component-based.
  • To deliver the power grid model, adaption is key!
  • Elevate users above Grid resource and performance
    details.
  • Our work considers adaption and its impact on
    performance:
      • adaption in the deployment of coupled models,
        that is, flexibility in deploying compositions
        of coupled models;
      • adaption due to malleable components.
  • We present an example of the latter from the
    RealityGrid project.

4
RealityGrid - Aims
  • A UK e-Science testbed project.
  • Predict realistic behaviour of matter
  • Large scale simulation, computational steering
    and high performance visualisation.
  • Techniques: Lattice Boltzmann (LB), Molecular
    Dynamics (MD), Monte Carlo (MC).
  • Discovery of new materials through the
    integration of prediction and experiments (LUSI
    facility).

5
Application Codes
  • LB3D (LB): Lattice Boltzmann simulation of oil,
    water and surfactant (detergent) in porous media.
      • Plus a reduced version, without surfactants.
  • LAMMPS (MD): atomistic/molecular simulation.
  • Oxford MC code TINKER

6
Academic Partners
  • Queen Mary, University of London
  • Imperial College
  • University of Manchester
  • University of Edinburgh
  • University of Oxford
  • University of Loughborough

7
Industrial Partners
  • Schlumberger
  • Edward Jenner Institute for Vaccine Research
  • Silicon Graphics Inc
  • Computation for Science Consortium
  • Advanced Visual Systems
  • Fujitsu

8
RealityGrid Use Case
Tracking a parameter search
[Figure: two LB3D instances, one for Params 1
(output rate 1, resolution 1) and one for Params 2
(output rate 2, resolution 2), both shown at T100;
the user changes these dynamically so that the
displayed simulation time is equal.]

9
RealityGrid Use Case
Tracking a parameter study
[Figure: two LB3D instances, one for Params 1
(output rate 1, resolution 1) and one for Params 2
(output rate 2, resolution 2); the user sees equal
display rates.]

10
Malleable LB3D - mechanisms
  • LB3D will respond to requests to change resources:
      • use more (or fewer) PEs (and memory) on the
        current system;
      • move from one system to another, via
        machine-independent (parallel) checkpoint/restart.
  • LB3D will output science data (for remote
    visualisation) at higher or lower rates.
  • LB3D will (one day) respond to requests to
    continue running at higher (or lower) lattice
    resolution.
  • Each of the above affects performance (e.g. the
    timesteps-per-second rate).
  • Each has an associated cost.

11
Use Case - detail
  • A user might be tracking many parameter-set
    developments (one per LB3D instance).
  • Some will be uninteresting (for a while):
      • lower the output rate or resolution, or
        terminate.
  • Some will become interesting:
      • increase the output rate or resolution.
  • One aim: redistribute resources amongst all LB3D
    instances to maintain the highest possible
    timestep rate.
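The redistribution aim can be illustrated with a greedy marginal-gain allocation: every instance keeps one processing element (PE), and each further PE goes to the instance whose interest-weighted timestep rate would rise most. This is a minimal sketch, not the RealityGrid algorithm; the Amdahl's-law performance model, the `Instance` fields, the interest weights and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One LB3D instance tracking one parameter set (hypothetical model)."""

    name: str
    secs_per_step_1pe: float  # assumed measured time per timestep on one PE
    serial_fraction: float    # assumed Amdahl serial fraction
    interest: float = 1.0     # user-assigned weight; low = uninteresting for now

    def rate(self, pes: int) -> float:
        """Predicted timesteps per second on `pes` PEs, via Amdahl's law."""
        if pes <= 0:
            return 0.0
        parallel = (1.0 - self.serial_fraction) / pes
        return 1.0 / (self.secs_per_step_1pe * (self.serial_fraction + parallel))


def redistribute(instances: list[Instance], total_pes: int) -> dict[str, int]:
    """Share `total_pes` PEs so the interest-weighted sum of rates is maximised.

    Every instance keeps one PE; each remaining PE goes to the instance whose
    weighted rate would rise most. Amdahl rates are concave in the PE count,
    so this greedy marginal-gain choice maximises the weighted total.
    """
    if total_pes < len(instances):
        raise ValueError("need at least one PE per instance")
    alloc = {inst.name: 1 for inst in instances}
    for _ in range(total_pes - len(instances)):
        best = max(
            instances,
            key=lambda i: i.interest * (i.rate(alloc[i.name] + 1) - i.rate(alloc[i.name])),
        )
        alloc[best.name] += 1
    return alloc
```

For example, with two otherwise identical runs where `params2` has interest 0.2, `redistribute(..., 16)` gives 15 of the 16 PEs to `params1`. A low-interest run still keeps one PE in this sketch; lowering its output rate or terminating it, as above, would free that PE as well.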

12
A General Grid Application
[Figure: Generate Data; Component 1, Component 2,
Component 3; Computational Grid Resources]
Applications and components exhibit phased
behaviour.

13
Performance Steerers - Design
[Figure: Initial deployment; Run-time adaption;
Component Framework; Computational Grid Resources]

14
Full System
[Figure: the full system, with External Resource
Scheduler, Application Performance Repository,
Component Performance Repository, Loader and
Resource Usage Monitor]

15
Performance Prediction
  • Role of the APS (Application Performance Steerer):
      • to distribute the available resources amongst
        components such that their predicted
        performance gives the minimum predicted
        execution time.
  • Role of the CPS (Component Performance Steerer):
      • to utilise resources and component performance
        effectors (actuators) so that the predicted
        execution time of the steered component is
        minimised.
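The CPS role can be sketched as a choice, at a decision point, between effector settings, weighing each setting's predicted timestep rate against the one-off cost of applying it. The `Option` type, the cost figures and the function names below are hypothetical; the sketch only illustrates the trade-off, not the prototype's decision logic.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A hypothetical effector setting available to the CPS."""

    label: str
    steps_per_sec: float  # predicted timestep rate once the setting is applied
    switch_cost: float    # predicted one-off cost in seconds (e.g. checkpoint/restart)


def predicted_time(option: Option, remaining_steps: int) -> float:
    """Predicted time to finish: switching cost plus time for the remaining steps."""
    return option.switch_cost + remaining_steps / option.steps_per_sec


def choose(options: list[Option], remaining_steps: int) -> Option:
    """Pick the setting that minimises the steered component's predicted time."""
    return min(options, key=lambda opt: predicted_time(opt, remaining_steps))
```

With a current rate of 10 steps/s and a migration that reaches 25 steps/s after a 120 s checkpoint/restart, the break-even point is 2000 remaining steps (120 + N/25 = N/10), so moving only pays off for longer runs.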

16
Two Types of Predictor
  • History and heuristics:
      • NewSystemApplicationParameters =
        PREDICTOR(HistoryPerfData)
  • Parameters plus models:
      • PredictedPerfData =
        PREDICTOR(SystemApplicationParameters)
  • Therefore we need a repository of information
    about the components and the application.
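The two predictor signatures above can be made concrete in a few lines. In this sketch a plain dictionary stands in for the performance repository, and both the heuristic (pick the parameters with the best mean observed time) and the linear work model are assumptions chosen for illustration, not the models used in RealityGrid.

```python
from statistics import mean


def history_predictor(history: dict[tuple[int, int], list[float]]) -> tuple[int, int]:
    """History/heuristic predictor:
    NewSystemApplicationParameters = PREDICTOR(HistoryPerfData).

    `history` maps hypothetical (pes, output_interval) parameters to measured
    seconds per timestep; the heuristic returns the parameters with the
    lowest mean observed time.
    """
    return min(history, key=lambda params: mean(history[params]))


def model_predictor(lattice_sites: int, pes: int,
                    secs_per_site: float = 1e-7, overhead: float = 0.01) -> float:
    """Model-based predictor:
    PredictedPerfData = PREDICTOR(SystemApplicationParameters).

    Assumed model: work per timestep is proportional to the number of lattice
    sites, divided evenly over `pes`, plus a fixed per-step overhead (seconds).
    """
    return lattice_sites * secs_per_site / pes + overhead
```

Either predictor is only as good as the repository behind it, which is why both rely on stored performance data or calibrated model constants.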

17
Application Progress Points
  • Assume that the execution of the application
    proceeds through phases. Phase boundaries are
    marked by progress points.
  • NB. Decisions about performance can be taken,
    and actions carried out, at the progress points.
      • These must therefore be safe points.
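The progress-point idea can be sketched as a time-stepping loop that consults a steering callback only at phase boundaries, so any requested action is applied where the component's state is consistent. `run_component`, `Action` and the callback are hypothetical names; a real component such as LB3D would call into the CPS at this point and write a real checkpoint.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    CONTINUE = "continue"
    CHECKPOINT_AND_STOP = "checkpoint_and_stop"


def run_component(total_steps: int, progress_every: int,
                  steer: Callable[[int], Action]) -> tuple[int, float]:
    """Toy time-stepping component with progress points.

    `steer(step)` is consulted only at progress points (every
    `progress_every` steps), so any action it requests is applied at a
    safe point and never mid-step. Returns (last completed step, state).
    """
    state = 0.0
    for step in range(1, total_steps + 1):
        state += 1.0  # stand-in for one real timestep's work
        if step % progress_every == 0:  # phase boundary = progress point
            if steer(step) is Action.CHECKPOINT_AND_STOP:
                # a real component would write its checkpoint here
                return step, state
    return total_steps, state
```

Keeping the decision outside the inner loop means the steerer never observes a half-updated lattice.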

18
Component Progress Points
[Figure: over Time, the APS acts at application
progress points and the CPS acts at component
progress points within the Component]
  • Information about progress points will be
    contained in some repository.

19
Implementation
APS as daemon, CPS as library
[Figure: the APS and the CPS each have a comms
interface and exchange CPS progress over sockets
(RPC); the CPS controls the component through a
component interface using procedure calls]

20
Implementation
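[Figure: Machine 1 runs the APS, connected by socket
to the CPS; Machine 2 runs LB3D with the CPS,
launched via MPIRUN by a Component Loader that
shares a shmem area with it; Machine 3 runs a second
Component Loader with its own shmem area; the two
loaders are connected by socket; underlying
services: DUROC, RSL, Globus, GridFTP]
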
21
Start-up Process
  • GlobusRun, with an RSL script, starts the
    component loaders (one per machine in the Grid)
    plus the APS daemon.
  • The loaders connect to each other.
  • LB3D is started by a loader (via MPIRUN) and calls
    the CPS (a library) at start-up.
  • The CPS connects to the APS.
  • LB3D calls the CPS at each subsequent progress
    point, and the CPS communicates with the APS.
  • This continues until LB3D has completed (e.g. the
    required number of timesteps is done).
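The exchange between the CPS library and the APS daemon can be sketched with plain TCP sockets on one machine. This is a minimal sketch under stated assumptions: the newline-delimited JSON messages, the `step`/`action` fields and all function and class names are hypothetical, standing in for whatever RPC protocol the prototype actually used; GlobusRun, RSL, DUROC and the component loaders are not modelled.

```python
import json
import socket
import threading


def serve_one_cps(server: socket.socket, decide) -> None:
    """Toy APS daemon: accept one CPS connection and answer each progress point."""
    with server:
        conn, _ = server.accept()
    reader, writer = conn.makefile("r"), conn.makefile("w")
    for line in reader:  # one JSON message per line, until the CPS disconnects
        step = json.loads(line)["step"]
        writer.write(json.dumps({"action": decide(step)}) + "\n")
        writer.flush()
    reader.close()
    writer.close()
    conn.close()


def start_aps(decide):
    """Start the toy APS on an ephemeral localhost port; return (address, thread)."""
    server = socket.create_server(("127.0.0.1", 0))
    address = server.getsockname()
    thread = threading.Thread(target=serve_one_cps, args=(server, decide), daemon=True)
    thread.start()
    return address, thread


class CPSClient:
    """Toy stand-in for the CPS library linked into the component."""

    def __init__(self, address):
        self._sock = socket.create_connection(address)  # CPS connects to APS
        self._reader = self._sock.makefile("r")
        self._writer = self._sock.makefile("w")

    def progress(self, step: int) -> str:
        """Report a progress point to the APS and return its decision."""
        self._writer.write(json.dumps({"step": step}) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())["action"]

    def close(self) -> None:
        self._writer.close()
        self._reader.close()
        self._sock.close()
```

A component would create one `CPSClient` at start-up and call `progress(step)` at each progress point; the APS's reply (here a bare string such as `"continue"`) tells it what to do next.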

22
Example
  • Every N tsteps, LB3D is moved between machines 2
    and 3, as determined by the APS.
  • At the progress point in LB3D where
    tstep mod N = 0, the APS tells the CPS, which
    tells the component, to checkpoint; the CPS
    writes certain status information to the shmem
    area, and then LB3D (and the CPS) terminates.
  • The loader on the machine LB3D ran on
    communicates with the loader on the machine it is
    to run on. The restart file is transferred with
    GridFTP, and the restart information (e.g. the
    tstep) is passed to the shmem area of the new
    loader.
  • A new LB3D is started and the CPS manages the
    restart.
  • This continues until no tsteps remain.
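The sequence of moves in this example can be planned ahead, which makes the role of the tstep in the restart information explicit. This is a sketch under stated assumptions: the run starts on machine 2, and the machine labels, the `RestartInfo` fields and the restart-file naming are hypothetical; the GridFTP transfer and the shmem writes are not modelled.

```python
from dataclasses import dataclass


@dataclass
class RestartInfo:
    """Status passed to the new loader's shmem area (hypothetical fields)."""

    tstep: int
    restart_file: str


def migration_schedule(total_steps: int, n: int,
                       machines: tuple[str, str] = ("machine2", "machine3")):
    """Plan the run when the APS moves LB3D every `n` timesteps.

    Returns (segments, restarts): each segment is (start_tstep, end_tstep,
    machine), and each RestartInfo records the tstep handed to the loader on
    the next machine. Assumes the run starts on machines[0].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    segments, restarts = [], []
    start, current = 0, 0
    while start < total_steps:
        end = min(start + n, total_steps)
        segments.append((start, end, machines[current]))
        if end < total_steps:  # progress point with tstep % n == 0: checkpoint and move
            restarts.append(RestartInfo(tstep=end, restart_file=f"lb3d_t{end}.chk"))
        start = end
        current = 1 - current  # alternate between the two machines
    return segments, restarts
```

For instance, `migration_schedule(250, 100)` yields three segments (machine 2, machine 3, machine 2) with restarts at tsteps 100 and 200.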

23
Role of Performance Analysis
  • Description of phased behaviour
      • Progress points and APART regions etc.
      • At component and application level
      • Information kept in repositories
  • Capturing the performance/resource use data
      • Populating the component and application
        repositories
  • Performance prediction/derivation
      • History-based / model-based, through analysis
        of contents of repositories
      • On-line and off-line

24
Role of Performance Analysis
  • Note that traditional APART performance analysis
    is part of the development process, NOT the
    production process.
  • Early Grid systems will probably distinguish
    clearly between development and production
    phases!
      • At least, service providers will.

25
Related Projects at Manchester
  • APART 2, joint EC-NSF working group funding.
  • Met Office FLUME, design of next generation
    software
      • Coupled models
  • Tyndall Centre Climate Impact, Integrated
    assessment modelling
      • Coupling climate and economic models
  • DCD-ICE proposal (expected soon)
      • Coupled modelling in aircraft wing ice domain
  • Computational Markets
      • UK e-Science funded (Imperial College led)