Title: EU DataGrid WP4


1
EU DataGrid WP4
  • Large-Scale Cluster Computing Workshop
  • FNAL, May 24 2001
  • Olof Bärring, CERN

2
Outline
  • Background
  • Architecture
  • Short term prototypes (September 2001)
  • GRID issues
  • Conclusions

3
Background
  • 3-year EU-funded project led by Fabrizio
    Gagliardi, CERN
  • Started 1/1/2001
  • 6 principal contractors: CERN, CNRS, ESA, INFN,
    FOM, PPARC
  • 15 assistant contractors

4
Workpackages
  • WP1 Workload Management
  • WP2 Grid Data Management
  • WP3 Grid Monitoring Services
  • WP4 Fabric management
  • WP5 Mass Storage Management
  • WP6 Integration Testbed: Production Quality
    International Infrastructure
  • WP7 Network Services
  • WP8 High-Energy Physics Applications
  • WP9 Earth Observation Science Applications
  • WP10 Biology Science Applications
  • WP11 Information Dissemination and Exploitation
  • WP12 Project Management

5
WP4 Fabric Management
  • To deliver a computing fabric comprised of all
    the necessary tools to manage a center providing
    grid services on clusters of thousands of nodes.

6
WP4 Fabric Management
  • 14 FTEs (6 funded by the EU) for 3 years, split
    over 6 partners: CERN, FOM/NIKHEF, ZIB,
    Heidelberg Univ., PPARC, INFN
  • The work is divided into 6 subtasks:
  • Configuration management
  • Automatic software installation and maintenance
  • Monitoring
  • Fault tolerance
  • Resource management
  • Gridification

7
Dependencies
  • [Diagram labels: GRID; Fabric]

8
Configuration management
  • HLD: High Level Description
  • LLD: Low Level Description
  • MLD: Machine Level Description
  • [Diagram labels: GUI; CLI; CDB; HLD; LLD;
    Manipulations (read/write); Compilation (one-way);
    Fetching only; Client machine; Cached LLD;
    Translation; MLD]

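The flow named on this slide (one-way compilation of the HLD into the LLD, clients that only fetch and cache their LLD, then translation into the MLD) can be sketched in Python. This is only an illustration under stated assumptions: the slides do not define the HLD, LLD or MLD formats, so the template layout, the `compile_hld` function, the `Client` class, the host names and the key=value output are all hypothetical.

```python
# Hypothetical high-level description: templates with inheritance plus a
# node-to-template mapping. The real HLD format is not given in the slides.
HLD = {
    "templates": {
        "base": {"os": "linux", "ntp_server": "ntp.example.org"},
        "batch_worker": {"inherits": "base", "queue": "A"},
    },
    "nodes": {"node01": "batch_worker", "node02": "base"},
}


def compile_hld(hld):
    """One-way compilation: expand templates into a flat per-node LLD."""
    def expand(name):
        template = dict(hld["templates"][name])
        parent = template.pop("inherits", None)
        merged = expand(parent) if parent else {}
        merged.update(template)  # child values override inherited ones
        return merged

    return {node: expand(tpl) for node, tpl in hld["nodes"].items()}


class Client:
    """Client machine: fetches its LLD (read-only) and keeps a cached copy."""

    def __init__(self, name, lld_store):
        self.name = name
        self.lld_store = lld_store
        self.cached_lld = None

    def fetch(self):
        # Copy so that the client never writes back to the central store.
        self.cached_lld = dict(self.lld_store[self.name])
        return self.cached_lld

    def translate(self):
        """Translate the cached LLD into a machine-level form (key=value lines)."""
        lld = self.cached_lld if self.cached_lld is not None else self.fetch()
        return "\n".join(f"{key}={value}" for key, value in sorted(lld.items()))


lld = compile_hld(HLD)
client = Client("node01", lld)
mld = client.translate()
```

Keeping the compilation one-way means the client side never needs write access to the high-level data; it works from its cached copy only.
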
9
Installation management
  • SRS: Software Repository
  • NMS: Node Management
  • BSS: Bootstrap Service
  • [Diagram labels: Software Maintainers;
    Configuration Management; Resource Management;
    Local Node; BSS; Fault Tolerance; NMS; Monitoring]

10
Scheduling of Actions
  • Node autonomy approach (chaotic)
  • High level configuration change propagated to all
    affected nodes
  • Monitoring senses a change of configuration
  • Fault tolerance fires an actuator to bring the
    node to its configured state (could be
    re-install)
  • What happens to running jobs?
  • Who tells scheduler that node is in maintenance?
  • How are dependent actions handled (e.g. server
    intervention)?
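A minimal sketch of the node autonomy loop described on this slide, assuming that node state can be modelled as a dictionary. The `Node` class and the `sense_drift`, `actuate` and `reconcile` functions are hypothetical names introduced for illustration, and the package values are placeholders.

```python
class Node:
    def __init__(self, name, configured, actual):
        self.name = name
        self.configured = dict(configured)  # desired state from configuration management
        self.actual = dict(actual)          # state currently observed on the node


def sense_drift(node):
    """Monitoring: return the configured values that the node does not match."""
    return {k: v for k, v in node.configured.items() if node.actual.get(k) != v}


def actuate(node, drift):
    """Fault tolerance: bring the node back to its configured state."""
    node.actual.update(drift)
    return sorted(drift)


def reconcile(node):
    """One autonomous pass: sense a change, then fire the actuator if needed."""
    drift = sense_drift(node)
    return actuate(node, drift) if drift else []


node = Node(
    "node01",
    configured={"glibc": "new", "kernel": "current"},
    actual={"glibc": "old", "kernel": "current"},
)
first_pass = reconcile(node)
second_pass = reconcile(node)
```

Nothing in this loop consults the scheduler or checks for running jobs, which is exactly the gap the questions on this slide point at.
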

11
Scheduling of Actions
  • Decompose complex actions into simple atomic
    actions that can be serialized centrally
  • Each configuration change would generate a simple
    action on the affected nodes
  • Scripts to bundle the actions together and
    execute them in a sensible order
  • Use APIs to the different sub-components
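One way to picture this approach is a central queue of simple per-node actions. The sketch below is an assumption-laden illustration, not the WP4 design: `Action`, `actions_for_change` and `CentralSerializer` are hypothetical names, and applying an action is reduced to setting a key in a dictionary.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A simple atomic action: set one configuration key on one node."""
    node: str
    key: str
    value: str


def actions_for_change(key, value, affected_nodes):
    """One configuration change yields one simple action per affected node."""
    return [Action(node, key, value) for node in affected_nodes]


class CentralSerializer:
    """Executes queued actions one at a time, in submission order."""

    def __init__(self):
        self.queue = deque()
        self.log = []

    def submit(self, actions):
        self.queue.extend(actions)

    def run(self, state):
        while self.queue:
            action = self.queue.popleft()
            state.setdefault(action.node, {})[action.key] = action.value
            self.log.append((action.node, action.key))
        return state


serializer = CentralSerializer()
serializer.submit(actions_for_change("glibc", "new", ["n1", "n2"]))
state = serializer.run({})
```

Because every action passes through one queue, the order of execution is decided in a single place, in contrast to the node autonomy approach.
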

12
Change glibc on service A
  1. Get list of nodes L belonging to service A
  2. For all nodes (L1 ... Ln)
       • Disable Li in scheduler queue A
  3. Wait for completion of 2
  4. For all nodes (L1 ... Ln)
       • Submit admin job to node Li
  5. Wait for completion of 4
  6. For all nodes (L1 ... Ln)
       • Re-enable node Li in scheduler queue A
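The procedure on this slide can be written as a short orchestration script. The sketch below assumes a scheduler API with the operations the slide mentions; `FakeScheduler` and all of its methods are hypothetical stand-ins, not the interface of any real batch system, and the two wait calls only record an event where a real implementation would block.

```python
class FakeScheduler:
    """Stand-in for a batch scheduler API (hypothetical interface)."""

    def __init__(self, queues):
        self.queues = queues  # queue name -> list of node names
        self.enabled = {n for nodes in queues.values() for n in nodes}
        self.events = []

    def nodes_in(self, queue):
        return list(self.queues[queue])

    def disable(self, node, queue):
        self.enabled.discard(node)
        self.events.append(("disable", node))

    def wait_disabled(self, nodes):
        # A real implementation would block here until every node is disabled.
        self.events.append(("disabled", tuple(nodes)))

    def submit_admin_job(self, node, job):
        self.events.append(("admin", node, job))

    def wait_admin_jobs(self, nodes):
        # A real implementation would block until all admin jobs have finished.
        self.events.append(("admin_done", tuple(nodes)))

    def enable(self, node, queue):
        self.enabled.add(node)
        self.events.append(("enable", node))


def change_on_service(scheduler, queue, job):
    nodes = scheduler.nodes_in(queue)      # get the nodes belonging to the service
    for node in nodes:                     # disable each node in the scheduler queue
        scheduler.disable(node, queue)
    scheduler.wait_disabled(nodes)         # wait until all nodes are disabled
    for node in nodes:                     # submit the admin job to each node
        scheduler.submit_admin_job(node, job)
    scheduler.wait_admin_jobs(nodes)       # wait until all admin jobs are done
    for node in nodes:                     # re-enable each node in the queue
        scheduler.enable(node, queue)
    return nodes


scheduler = FakeScheduler({"A": ["n1", "n2"]})
updated = change_on_service(scheduler, "A", "upgrade-glibc")
```

The two wait calls act as barriers, so no node receives the admin job before every node has been taken out of the queue, and no node is re-enabled before every admin job has finished.
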

13
For September 2001
  • First prototype of the configuration management
    system
  • Low level (node) query interface
  • Caching
  • Interim installation system
  • LCFG for upgrades and maintenance
  • SystemImager for initial system install and VACM
    console control for system preparation

14
GRID issues
  • Gridification: protect the fabric against GRID
    jobs
  • Local farms will still be used by local users
  • Firewalls (channeling of job I/O, interactive
    jobs, MPI over WAN, ...)
  • Local authorization of grid users
  • Job information

15
Conclusions
  • DataGrid WP4 is not so much about the G-word. It
    is really about automating cluster management
  • In the process of defining the global
    architecture. How do we best put the bits and
    pieces together?
  • Ambitious delivery plans already for September