DataGrid WP4: Fabric Management

1
DataGrid WP4: Fabric Management
  • German Cancio
  • German.Cancio_at_cern.ch
  • 30/7/2001
  • CERN-Russia JWG on LHC Computing

2
Outline
  • Background information
  • Functionality and Architecture
  • M9 prototype deliverables

3
Background information
  • WP4's objective: deliver the necessary tools to
    manage a computing fabric providing Grid services
    on clusters scaling up to thousands of nodes.
  • Main scope:
    • User job management (Grid and local)
    • Fabric (system administration) management
  • Official participants: CERN (leading partner),
    INFN, NIKHEF, University of Heidelberg, ZIB
    (Berlin) and University of Edinburgh/PPARC

4
Functionality
  • Provision for running Grid jobs
    • Authorization according to local policies
    • Mapping Grid credentials to local ones
    • Publication of fabric resources and job
      information
  • Provision for running local jobs
    • Sharing of resources according to local policies
  • Enterprise system administration, scalable to O(10K) nodes
    • Automated installation and maintenance of nodes
    • Resource management (batch, interactive)
    • Monitoring of events and performance
    • Fault tolerance and recovery actions
    • Fabric configuration management
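The Grid-job bullets above (authorization according to local policy, mapping a Grid credential to a local account) can be illustrated with a small sketch. It assumes the Globus grid-mapfile convention, in which each line holds a quoted certificate subject (DN) followed by a local account name; the slides do not say which mechanism WP4 uses, and the DNs, account names and the `banned` policy set are hypothetical.

```python
import shlex

# Hypothetical grid-mapfile contents (Globus style): quoted DN, then local account(s).
GRID_MAPFILE = '''
# comment lines and blank lines are ignored
"/O=Grid/O=CERN/CN=Alice Example" atlas001
"/O=Grid/O=INFN/CN=Bob Example" cms002,cms003
'''

def parse_gridmap(text):
    """Return a dict mapping Grid DN -> local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # keeps the quoted DN (with spaces) as one field
        if len(fields) < 2:
            continue  # malformed entry: skip it
        # several accounts may be listed comma-separated; this sketch takes the first
        mapping[fields[0]] = fields[1].split(",")[0]
    return mapping

def map_credential(dn, mapping, banned=frozenset()):
    """Apply a (hypothetical) local ban list, then map the Grid identity to a local account."""
    if dn in banned:
        raise PermissionError(f"{dn} is banned by local policy")
    try:
        return mapping[dn]
    except KeyError:
        raise PermissionError(f"no local mapping for {dn}") from None

if __name__ == "__main__":
    m = parse_gridmap(GRID_MAPFILE)
    print(map_credential("/O=Grid/O=CERN/CN=Alice Example", m))
```

Keeping authorization (the ban check) separate from mapping mirrors the slide's split between the two bullets; a real site would load both from local policy files rather than constants.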

5
(No Transcript)
6
[Diagram: WP4 tasks. Labels: Grid; Fabric services; Fabric; Resource Management; Configuration Management; Node Installation Management; Monitoring and Fault Tolerance; Fabric Storage Management (SE, WP5)]
7
Architecture (1/3)
  • Subsystems in WP4
  • 1 - (Grid) job submission, control and
    management subsystems
    • Grid interface subsystem
      • Receives job submission/control requests from the
        Grid and provides mechanisms for policy-based
        authentication and authorization (Globus/GRAM layer).
      • Publishes static and dynamic resource
        information to the Grid (Globus/GRIS layer).
    • Resource Management subsystem
      • Manages execution, workload distribution and
        resource sharing of user jobs on the fabric's
        batch and interactive services.
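Resource sharing according to local policy is often done with fair-share scheduling. The slides do not specify the algorithm, so the following is only a sketch under that assumption: each group is promised a fraction of the fabric, and each free slot goes to the group whose consumed usage is lowest relative to its share. Group names, shares and the unit job cost are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    share: float                               # fraction promised by local policy (must be > 0)
    used: float = 0.0                          # usage charged so far (arbitrary units)
    queue: list = field(default_factory=list)  # pending job identifiers

def pick_next(groups):
    """Return the group with queued work whose usage/share ratio is lowest (first wins ties)."""
    candidates = [g for g in groups if g.queue]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.used / g.share)

def dispatch(groups, slots, cost=1.0):
    """Fill free batch slots one job at a time, charging each started job to its group."""
    started = []
    for _ in range(slots):
        g = pick_next(groups)
        if g is None:
            break  # nothing left to run
        job = g.queue.pop(0)
        g.used += cost
        started.append((g.name, job))
    return started
```

With shares of 0.75 and 0.25 and four free slots, this hands out slots in a 3:1 ratio (atlas, cms, atlas, atlas), which is the behaviour "sharing according to local policies" asks for.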

8
Architecture (2/3)
  • 2 - Subsystems for fabric management
    • Configuration management
      • Stores fabric configuration information (node
        configuration, farm profiles, fabric policies).
    • Node installation and management
      • Installs, configures and updates system
        components and applications on all fabric nodes
        (e.g. farm CPU nodes, file servers).
    • Monitoring and fault tolerance
      • Job-, node- and service-based monitoring and alarm
        generation. Fault tolerance components will
        correlate monitoring with configuration
        information and initiate automated recovery
        actions if appropriate, both at the individual
        node level and at fabric scale.
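The idea of correlating monitoring with configuration, and acting either per node or fabric-wide, can be sketched as follows. Everything here is an assumption for illustration: the configuration keys (`max_load`, `services`), the node names, and the rule that escalates to a single fabric alarm when more than half the nodes fail at once (on the theory that a shared cause is more likely than many independent faults).

```python
# Hypothetical per-node expectations, as configuration management might supply them.
CONFIG = {
    "lxb001": {"max_load": 4.0, "services": ["sshd", "batch"]},
    "lxb002": {"max_load": 8.0, "services": ["sshd", "rfiod"]},
}

def evaluate(node, sample, config):
    """Compare one monitoring sample with the node's configured expectations."""
    expected = config[node]
    actions = []
    if sample["load"] > expected["max_load"]:
        actions.append(("alarm", f"{node}: load {sample['load']} > {expected['max_load']}"))
    for svc in expected["services"]:
        if svc not in sample["running"]:
            # a configured service is not running: propose a node-level recovery action
            actions.append(("restart", f"{node}:{svc}"))
    return actions

def correlate(samples, config, max_failing_fraction=0.5):
    """Turn samples into actions, escalating to one fabric alarm if too many nodes fail."""
    per_node = {n: evaluate(n, s, config) for n, s in samples.items()}
    failing = [n for n, acts in per_node.items() if any(a == "restart" for a, _ in acts)]
    if len(failing) > max_failing_fraction * len(samples):
        # many simultaneous failures: do not restart blindly, alert the fabric level instead
        return [("fabric_alarm", f"{len(failing)}/{len(samples)} nodes have failed services")]
    return [act for acts in per_node.values() for act in acts]
```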

9
Architecture (3/3)
  • All the subsystems are glued together by a common
    fabric administration scripting layer, which provides:
    • Automation of complex and interdependent fabric-wide
      management operations
    • APIs to the subsystems
    • Base libraries for common operations
  • Scripts are written by experienced fabric
    administrators.
  • Scripts are executed either manually (e.g. using
    web forms) or automatically (periodic actions,
    recovery procedures).
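As an example of a fabric-wide operation that spans several subsystems, here is a sketch of a rolling software update written against subsystem APIs. The `drain`/`enable`/`update` calls are hypothetical stand-ins, not WP4 interfaces, and the in-memory fakes exist only so the script runs; a real drain would also wait for running jobs to finish.

```python
class FakeResourceManager:
    """Stand-in for a hypothetical resource-management API."""
    def __init__(self, nodes=()):
        self.enabled = set(nodes)
    def drain(self, node):
        self.enabled.discard(node)   # stop scheduling new jobs on the node
    def enable(self, node):
        self.enabled.add(node)       # put the node back into production

class FakeInstaller:
    """Stand-in for a hypothetical node-installation API; listed nodes fail to update."""
    def __init__(self, broken=()):
        self.broken = set(broken)
    def update(self, node):
        return node not in self.broken

def rolling_update(nodes, rm, installer, batch_size=2, log=print):
    """Update nodes in batches so most of the farm stays in production; return failures."""
    failed = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for n in batch:
            rm.drain(n)
        for n in batch:
            if installer.update(n):
                rm.enable(n)
            else:
                failed.append(n)  # left drained for an administrator to inspect
                log(f"update failed on {n}; node left drained")
    return failed
```

The same script could be started from a web form or from a periodic job, matching the two execution modes listed on the slide.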

10
[Diagram. Labels: Grid User; Grid Info Services (WP3); Replica Mgr; Farm A (LSF); Farm B (PBS); Mass storage, Disk pools]
11
Month 9 deliverables (1/2)
  • Main WP4 deliverables in Release 1 (end of
    September)
  • 1) Interim Installation System (IIS) for the
    installation and maintenance of the M9 testbeds
    • Linux clusters are the prime target, but the
      system is implemented in a portable way.
    • Components:
      • Initial installation tool using system image
        cloning
      • LCFG (Edinburgh University) for software updates
        and maintenance
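After the initial image clone, maintenance amounts to bringing each node back in line with what its configuration says it should run. The sketch below shows only that comparison, under the simplifying assumption that a node's desired state is a package-to-version map; it is not LCFG's actual mechanism, and the package names and versions are hypothetical.

```python
def plan_updates(desired, installed):
    """Return (install, change, remove) package lists bringing a node to its desired state.

    desired, installed: dicts mapping package name -> version string.
    'change' covers both upgrades and downgrades.
    """
    install = sorted(p for p in desired if p not in installed)
    change = sorted(p for p in desired if p in installed and installed[p] != desired[p])
    remove = sorted(p for p in installed if p not in desired)
    return install, change, remove
```

Because the plan is computed from declared state rather than from a list of past commands, running it twice on an up-to-date node yields three empty lists.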

12
Month 9 deliverables (2/2)
  • 2) First configuration management prototype
    • Low-level language and API for fabric components
      to reliably retrieve their configuration
      information.
    • Design:
      • Node profiles stored in XML format
      • Central HTTP(S) servers host the configurations
      • Node access API for tree-like navigation
    • Used by the PM9 Interim Installation System (IIS)
      to get all configuration information.
    • Configuration information is entered per node.
    • A high-level definition language allowing
      inheritance of configuration trees (for whole
      services, clusters) is to be added in later releases.
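A node access API for tree-like navigation over an XML profile might look like the sketch below. The element names, the path syntax (`/system/network/hostname`) and the `NodeProfile` class are assumptions for illustration, not the WP4 schema or API; `fetch_profile` shows where the HTTP(S) retrieval would plug in, with the server URL left to the caller.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical node profile; element names are illustrative only.
PROFILE = """<profile node="lxb001">
  <system>
    <network>
      <hostname>lxb001.cern.ch</hostname>
    </network>
    <packages>
      <package name="openssh" version="1.1"/>
    </packages>
  </system>
</profile>"""

def fetch_profile(url):
    """Download a profile from a configuration server (URL supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")

class NodeProfile:
    """Minimal tree-navigation wrapper around an XML node profile."""
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def _node(self, path):
        p = path.strip("/")
        elem = self.root if not p else self.root.find("./" + p)
        if elem is None:
            raise KeyError(path)
        return elem

    def get(self, path):
        """Return the text value at a slash-separated path."""
        return (self._node(path).text or "").strip()

    def children(self, path):
        """List the child element names below a path ('/' is the profile root)."""
        return [c.tag for c in self._node(path)]
```

For example, `NodeProfile(PROFILE).children("/system")` lists `network` and `packages`, and `get("/system/network/hostname")` returns the host name.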

13
Contacts and more information
  • DataGRID WP4
    • http://cern.ch/hep-proj-grid-fabric
    • Documentation and meeting minutes
    • Members
    • Mailing list archives
  • Overall project architecture group: DataGRID ATF
    • http://cern.ch/grid-atf