Grid infrastructure analysis with a simple flow model



1
  • Grid infrastructure analysis with a simple flow
    model
  • Andrey Demichev, Alexander Kryukov, Lev
    Shamardin, Grigory Shpiz
  • Scobeltsyn Institute of Nuclear Physics, Moscow
    State University

2
Why a grid simulator?
  • A simulator allows easy changes to a grid
    structure and behavior.
  • The grid behavior under stress conditions
  • Site failures
  • Job execution failures
  • Unexpected rise in the job load
  • Bottleneck analysis
  • System structure optimization

3
Different approaches to job flow simulation
  • Individual jobs tracking
  • Monte-Carlo simulation of job submission. The
    model simulates the stages of a job's life,
    from submission to completion or failure.
  • Easier to implement.
  • Examples: simgrid, gridsim, beosim

4
Different approaches to job flow simulation
  • Statistical models of job flows
  • Simulation of job flows (i.e. jobs/second). The
    model system consists of boxes, each of which
    takes a number of job flows as input and
    produces a number of job flows as output.
  • The output of such a model is exactly the
    numbers we are interested in.
  • Example: optorsim

5
Goals
  • Create a simple, realistic model of the grid
  • The model should be capable of answering the
    questions:
  • Will the grid handle the required constant
    average job load?
  • Can it be reorganized to handle the load?

6
(No Transcript)
7
Simple flow-based model
  • Simulation of an LCG-like grid
  • Four general node types
  • User Interface (UI), the source of the jobs in
    the system.
  • Resource Broker (RB), accepts the jobs, queries
    the information system, and dispatches the jobs
    to the CEs.
  • Computing Elements (CE), where the jobs are
    executed.
  • BDII nodes, which are the information system.

8
User Interface
  • UIs may be connected to a number of RBs
  • Each UI generates a constant flow of job
    requests toward a connected RB.

[Diagram: UI and RB nodes]

9
Resource Broker
  • RBs are connected to BDIIs and CEs, and have
    connected UIs. An RB is characterized by:
  • maximum input flow of job requests
  • number of information system lookups per job
    and the maximum flow of such lookups
  • maximum job flow to the CEs
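
The three RB limits can be sketched as a small data structure. This is only an illustration: the class, field names and the numbers in the usage note are hypothetical and do not come from the presentation, and the sketch assumes every accepted job is dispatched exactly once, so the flow to the CEs equals the input flow.

```python
from dataclasses import dataclass


@dataclass
class ResourceBroker:
    # All field names are illustrative assumptions, not taken from the slides.
    max_input_flow: float    # job requests per second the RB can accept
    lookups_per_job: float   # information system lookups issued per job
    max_lookup_flow: float   # lookups per second the RB can issue
    max_output_flow: float   # jobs per second the RB can dispatch to CEs

    def exceeded_limits(self, input_flow: float) -> list:
        """Return the names of the limits exceeded by the given input flow."""
        exceeded = []
        if input_flow > self.max_input_flow:
            exceeded.append("input")
        if input_flow * self.lookups_per_job > self.max_lookup_flow:
            exceeded.append("lookup")
        # Assumption: each accepted job is dispatched once to a CE.
        if input_flow > self.max_output_flow:
            exceeded.append("output")
        return exceeded
```

For example, with made-up limits `ResourceBroker(2.0, 3.0, 4.5, 1.8)`, an input flow of 1.0 jobs/second stays within every limit, while 2.5 jobs/second exceeds all three.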

10
Informational System (BDII)
  • Maximum flow of requests it can handle

[Diagram: UI, RB and BDII nodes]

11
Computing Elements
  • The maximum flow of jobs it can process
  • All jobs are assumed to be equal
  • We are not interested in the exact location of
    the failing CE when the grid is overloaded,
    therefore we can combine all grid CEs into one
    virtual CE with the effective capacity.
  • We could actually do the same for the UIs
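
A minimal sketch of the virtual CE idea follows. It assumes that the capacity of the virtual CE is simply the sum of the individual CE capacities; the slides do not spell out how the combined capacity is computed, and the function names are hypothetical.

```python
def virtual_ce_capacity(ce_capacities):
    """Combine per-CE capacities (jobs/second) into one virtual CE.

    Assumption: capacities add up, because all jobs are equal and we do not
    care which particular CE would fail under overload.
    """
    return sum(ce_capacities)


def ce_overloaded(total_job_flow, ce_capacities):
    """True if the total job flow exceeds what the virtual CE can process."""
    return total_job_flow > virtual_ce_capacity(ce_capacities)
```

The same aggregation could be applied to the UIs by summing their generated job flows per RB.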

12
Simple flow-based model
[Diagram: UI, RB, BDII nodes and a Virtual CE]

13
Flows
  • Think of plumbing.
  • The UIs generate the flow of incoming jobs to the
    RBs.
  • The RB generates a flow of requests to the
    BDII and the CE
  • The flow of requests to the CE is checked
    against the maximum

14
Overflows treatment
  • All overflows are monitored but not truncated.
  • If an overflow happens, we are interested not in
    the exact value of the overflow, but in the fact
    that it occurred.
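
The "monitored but not truncated" rule can be illustrated with a small sketch. Everything here is an assumption made for the example: the function and argument names are hypothetical, each UI is connected to a single RB, each RB uses a single BDII, and all CEs are merged into one node called "CE".

```python
from collections import defaultdict


def find_overflows(ui_flow, ui_to_rb, rb_to_bdii, lookups_per_job, limits):
    """Propagate untruncated flows and return the set of overloaded nodes.

    ui_flow:    {ui: jobs per second generated by that UI}
    ui_to_rb:   {ui: RB receiving that UI's jobs}
    rb_to_bdii: {rb: BDII queried by that RB}
    limits:     {node: maximum flow it can handle}; "CE" is the virtual CE
    """
    flow = defaultdict(float)
    for ui, rate in ui_flow.items():
        rb = ui_to_rb[ui]
        flow[rb] += rate                                # jobs into the RB
        flow[rb_to_bdii[rb]] += rate * lookups_per_job  # lookups to the BDII
        flow["CE"] += rate                              # jobs on to the CE
    # Flows are never clipped to a limit; we only record where one is exceeded.
    return {node for node, value in flow.items() if value > limits[node]}
```

Because nothing is clipped, an overloaded RB still passes its full input flow downstream, so the result reports every node that would be overloaded rather than only the first bottleneck.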

15
Automatic structure generation
  • Information published in the GOC database
  • No direct access to the GOCDB, so the data is
    pulled from the SAM web services
  • Information published in the services
    configuration files
  • No direct way to determine which BDII is used
    by a particular RB, but gsiftp access to the RB
    filesystem allows reading and parsing the RB
    config

16
Automatic structure generation: UI
  • No information about UIs is published. We have to
    guess and/or estimate.
  • Each site is assumed to be running a UI with some
    default parameters. This UI is connected to the
    site RBs, or to the country RBs, or to the region
    RBs or to the default RB.

17
Automatic structure generation: RB
  • RB parameters are based on the measurements by
    the CMS collaboration ("Update on gLite WMS
    tests" by Andrea Sciabà).
  • All RBs are assumed to be able to submit jobs to
    all CEs.

18
Automatic structure generation: RB
  • The RB is using the BDII specified in its
    configuration if this data is available
  • Site BDII is used if the information is
    unavailable.
  • One of the BDIIs in the same Country is used if
    there is no site BDII
  • One of the BDIIs in the same Region is used if
    there is no BDII in the country
  • Top-level default BDII is used if there are no
    BDIIs in the Region.
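
The fallback chain above can be sketched as follows. The record layout (dictionaries with "site", "country" and "region" keys) and the function name are assumptions made for illustration, and "one of the BDIIs" is interpreted here as the first match in the list.

```python
def choose_bdii(rb, bdiis, default_bdii):
    """Pick the BDII for an RB using the fallback order from the slide.

    rb:    dict with keys "configured_bdii" (name or None), "site",
           "country", "region"
    bdiis: list of dicts with keys "name", "site", "country", "region"
    """
    # 1. Use the BDII from the RB configuration when it is known.
    if rb.get("configured_bdii"):
        return rb["configured_bdii"]
    # 2-4. Otherwise widen the search: same site, then country, then region.
    for scope in ("site", "country", "region"):
        for bdii in bdiis:
            if bdii[scope] == rb[scope]:
                return bdii["name"]
    # 5. Fall back to the top-level default BDII.
    return default_bdii
```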

19
Automatic structure generation
  • For the BDII performance we use the results from
    the talk "LCG/gLite BDII performance
    measurements".
  • The CE performance is scaled according to the
    number of the CPUs on each CE.
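
As a rough sketch of this scaling, assume the capacity of a CE is proportional to its CPU count with a single per-CPU job rate shared by all CEs. The function name, the parameter name and the proportionality itself are assumptions; the slides only say that performance is scaled by the number of CPUs.

```python
def ce_capacities(cpus_per_ce, jobs_per_cpu_second):
    """Estimate each CE's maximum job flow from its CPU count.

    cpus_per_ce:         {ce_name: number of CPUs}
    jobs_per_cpu_second: assumed job throughput of one CPU (jobs/second)
    """
    return {ce: cpus * jobs_per_cpu_second for ce, cpus in cpus_per_ce.items()}
```

Summing the returned values would give the capacity of a single virtual CE under the same assumption.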

20
Example: Russian part of LCG
[Diagram: UI, RB and BDII nodes]
21
Conclusion
  • A simple flow-based model describing the job load
    distribution in the grid
  • The structure of the modeled grid is
    automatically updated to match the real grid
    structure
  • Parameters of nodes are based on the measured
    values

22
Conclusion
  • Any node connections or parameters may be
    overridden, which allows experimenting with the
    grid
  • Numbers for the current LCG are quite optimistic
  • RBs are capable of generating a job flow that
    uses all available resources on the CEs, but
  • A clever connection between RBs and UIs is
    required, i.e. if we do not want to overflow the
    RB, the UI should become a registered service.

23
Future plans
  • Distinguish different kinds of jobs.
  • A large number of short jobs puts a higher
    load on the grid than a smaller number of long
    jobs.
  • Accommodate the delays in the information system
  • The CE availability information seen by the RB
    lags behind reality, causing job submission
    failures and resubmissions, and hence
    additional background load on the RB

24
Acknowledgements
  • The research was partially supported by
  • INTAS-CERN Grant 2005-7509
  • RFBR Grant 06-07-89199