THE INFN GRID PROJECT
1
THE INFN GRID PROJECT
  • Scope: Study and develop a general INFN computing
    infrastructure, based on GRID technologies, to be
    validated (as a first use case) by implementing
    distributed Regional Center prototypes for the LHC
    experiments ATLAS, CMS and ALICE and, later on, also
    for other INFN experiments (Virgo, Gran Sasso, ...)
  • Project status
  • Outline of the proposal submitted to INFN management
    on 13-1-2000
  • 3-year duration
  • Next meeting with INFN management on the 18th of
    February
  • Feedback documents from the LHC experiments by the
    end of February (sites, FTEs, ...)
  • Final proposal to INFN by the end of March

2
INFN Grid Related Projects
  • Globus tests
  • Condor on WAN as general purpose computing
    resource
  • GRID working group to analyze viable and useful
    solutions (LHC computing, Virgo)
  • Global architecture that allows strategies for the
    discovery, allocation, reservation and management of
    resource collections
  • MONARC project related activities

3
Evaluation of the Globus Toolkit
  • 5-site testbed (Bologna, CNAF, LNL, Padova, Roma1)
  • Use case: CMS HLT studies
  • MC production → complete HLT chain
  • Services to test/implement
  • Resource management
  • fork() → interface to different local resource
    managers (Condor, LSF)
  • Resources chosen by hand → Smart Broker implementing
    a global resource manager
  • Data mover (Gass, Gsiftp)
  • to stage executables and input files
  • to retrieve output files
  • Bookkeeping (is this worth a general tool?)
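The slide contrasts resources "chosen by hand" with a Smart Broker acting as a global resource manager. A minimal sketch of the ranking idea, where the scoring formula, site names and numbers are all illustrative assumptions, not the project's actual broker:

```python
# Minimal sketch of the "Smart Broker" idea: instead of picking a
# resource by hand, rank candidate sites and pick the best one.
# The ranking formula and all figures are made-up assumptions.

def rank(site):
    """Higher score = better candidate; penalize long queues."""
    return site["free_cpus"] - 2 * site["queued_jobs"]

def choose_site(sites):
    """Return the best-ranked site, as a global resource manager would."""
    return max(sites, key=rank)

sites = [
    {"name": "CNAF",   "free_cpus": 40, "queued_jobs": 25},
    {"name": "LNL",    "free_cpus": 12, "queued_jobs": 0},
    {"name": "Padova", "free_cpus": 30, "queued_jobs": 4},
]
print(choose_site(sites)["name"])  # → Padova (best score under this formula)
```

A real broker would of course fold in data location and network state as well; later slides list exactly those requirements.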

4
Use Case CMS HLT studies
5
Status
  • Globus installed on 5 Linux PCs at 3 sites
  • Globus Security Infrastructure
  • works!
  • MDS
  • Initial problems accessing data (long response times
    and timeouts)
  • GRAM, GASS, Gloperf
  • Work in progress
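The MDS problems reported here (long response times and timeouts) are typically worked around with a retry wrapper on the client side. A generic sketch, where the query function is hypothetical, not an actual MDS API:

```python
# Generic retry-with-timeout wrapper around a slow directory query,
# as one might use against an information service that times out.
# `flaky` stands in for a real query and is purely illustrative.
import time

def query_with_retry(query, retries=3, delay=0.0):
    """Call query(); on TimeoutError, retry up to `retries` times."""
    for attempt in range(retries):
        try:
            return query()
        except TimeoutError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:          # fail the first two attempts
        raise TimeoutError
    return "directory data"

result = query_with_retry(flaky)
print(result)  # → directory data (succeeds on the third attempt)
```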

6
Condor on WAN Objectives
  • Large INFN project of the Computing Commission
    involving 20 sites
  • INFN collaboration with the Condor Team (UWISC)
  • First goal: Condor tuning on the WAN
  • verify Condor reliability and robustness in a Wide
    Area Network environment
  • verify suitability for INFN computing needs
  • Network I/O impact and measurements

7
  • Second goal: the network as a Condor resource
  • Dynamic checkpointing and checkpoint domain
    configuration
  • Pool partitioned into checkpoint domains (a
    dedicated ckpt server for each domain)
  • Definition of a checkpoint domain according to:
  • presence of a sufficiently large CPU capacity
  • presence of a set of machines with efficient
    network connectivity
  • Sub-pools

8
Checkpointing: next steps
  • Distributed dynamic checkpointing
  • Pool machines select the best checkpoint server
    (from a network point of view)
  • Association between execution machine and checkpoint
    server decided dynamically
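The selection step described above, choosing the "best" checkpoint server from a network point of view, can be sketched as picking the server with the lowest measured round-trip time. Hostnames and RTT figures below are made-up; a real pool would measure them:

```python
# Sketch of distributed dynamic checkpointing: each execution machine
# picks the checkpoint server that looks closest on the network.
# Server names and RTT values are illustrative assumptions.

def best_ckpt_server(rtt_ms):
    """Return the server with the lowest measured round-trip time."""
    return min(rtt_ms, key=rtt_ms.get)

rtt_ms = {
    "ckpt.bo.infn.it": 4.1,
    "ckpt.pd.infn.it": 1.7,   # closest from this machine's viewpoint
    "ckpt.rm.infn.it": 9.3,
}
print(best_ckpt_server(rtt_ms))  # → ckpt.pd.infn.it
```

Re-measuring periodically makes the execution-machine/checkpoint-server association dynamic, as the slide proposes.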

9
Implementation
  • Characteristics of the INFN Condor pool
  • Single pool
  • To optimize CPU usage of all INFN hosts
  • Sub-pools
  • To define policies/priorities on resource usage
  • Checkpoint domains
  • To guarantee the performance and the efficiency
    of the system
  • To reduce network traffic for checkpointing
    activity
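The single-pool / sub-pool / checkpoint-domain structure maps onto per-machine configuration. A hedged fragment using HTCondor-style configuration knobs; the knob names follow current HTCondor documentation and may differ from the Condor release of the time, and the hostname and policy expression are invented for illustration:

```text
# Checkpoint domain: point this machine at its domain's dedicated
# checkpoint server (hostname is an illustrative assumption).
USE_CKPT_SERVER  = True
CKPT_SERVER_HOST = ckpt.pd.infn.it

# Sub-pool policy sketch: accept outside jobs only off working hours,
# always accept jobs from the local accounting group.
START = (ClockHour < 8 || ClockHour > 19) || (AcctGroup == "infn-pd")
```

A different START expression per site is what lets each sub-pool define its own priorities and policies while remaining part of the single pool.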

10
[Map: INFN Condor pool on the WAN, showing checkpoint domains over
the GARR-B topology (155 Mbps ATM-based network; access points
(PoP) and main transport nodes; 155 Mbps EsNet link to the USA;
Central Manager marked on the map). Sites shown, with per-site
machine counts: TRENTO, UDINE, MILANO, TORINO, PAVIA, PADOVA, LNL,
TRIESTE, FERRARA, GENOVA, PARMA, CNAF, BOLOGNA, PISA, FIRENZE,
S.Piero, PERUGIA, LNGS, ROMA, LAQUILA, ROMA2, LNF, SASSARI,
NAPOLI, BARI, LECCE, SALERNO, CAGLIARI, COSENZA, PALERMO, CATANIA,
LNS.]
180 machines → 500-1000 machines; 6 ckpt servers → 25 ckpt servers
11
Management
  • Central management (condor-admin@infn.it)
  • Local management (condor@infn.it)
  • Steering committee
  • Software maintenance contract with the Condor support
    team of the University of Wisconsin-Madison

12
Central management
  • The Admin Group has to provide
  • configuration, tuning and overall maintenance of
    the INFN Condor WAN pool
  • management tools
  • activity reports
  • Condor resource usage statistics (CPU, network,
    ckpt server)
  • the choice of which Condor release to install
  • a help desk for users and local administrators
  • the interface to Condor support in Madison

13
Local management
  • Local management has to provide
  • release installation, in collaboration with the
    central management
  • local Condor usage policies (e.g. sub-pools)

14
Steering Committee
  • The Steering Committee has to
  • assess the status of the Condor system and suggest
    when to upgrade the software
  • interact with the Condor Team and suggest possible
    modifications of the system
  • define the general policy of the Condor pool
  • organize meetings for Condor administrators and
    users

15
INFN-GRID project requirements
  • Networked workload management
  • Optimal co-allocation of data, CPU and network for a
    specific grid/network-aware job
  • distributed scheduling (data and/or code migration)
  • unscheduled/scheduled job submission
  • Management of heterogeneous computing systems
  • Uniform interface to the various local resource
    managers and schedulers
  • Priorities and policies on resource (CPU, data,
    network) usage
  • bookkeeping and web user interface
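The co-allocation requirement above can be illustrated with a toy cost model: prefer a site that already holds the input data, otherwise charge a transfer cost for moving it over the WAN. All names, numbers and the cost formula are assumptions for illustration:

```python
# Toy sketch of co-allocating data, CPU and network for a job:
# rank sites by (data transfer cost + expected queue wait).
# Site data, bandwidths and the cost model are illustrative.

def cost(site, data_gb, replica_sites):
    """Rough cost in seconds: WAN transfer (if data is remote) + queue wait."""
    if site["name"] in replica_sites:
        transfer = 0.0
    else:
        transfer = data_gb * 8 / site["wan_gbps"]   # GB -> Gb over link
    wait = site["queued_jobs"] / site["cpus"]
    return transfer + wait

def schedule(sites, data_gb, replica_sites):
    return min(sites, key=lambda s: cost(s, data_gb, replica_sites))

sites = [
    {"name": "CNAF", "cpus": 100, "queued_jobs": 400, "wan_gbps": 0.155},
    {"name": "LNL",  "cpus": 20,  "queued_jobs": 10,  "wan_gbps": 0.155},
]
# A 50 GB input replicated only at CNAF: moving the job beats moving the data.
print(schedule(sites, 50, {"CNAF"})["name"])  # → CNAF
```

The same comparison run with a tiny input would favor the idle site instead, which is the "data and/or code migration" trade-off the slide names.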

16
Project req. (cont.)
  • Networked data management
  • Universal name space: transparent, location-
    independent
  • Data replication and caching
  • Data mover (scheduled/interactive, at object/file/DB
    granularity)
  • Loose synchronization between replicas
  • Application metadata, interfaced with a DBMS (e.g.
    Objectivity)
  • Network services definition for a given application
  • End-system network protocol tuning
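A location-independent name space with replication amounts to a catalog from logical file names to physical replicas, resolved to a nearby copy. A minimal sketch; the catalog contents, logical names and URLs are invented for illustration:

```python
# Sketch of a universal, location-independent name space:
# a logical file name (LFN) maps to several physical replicas,
# and resolution prefers a replica at the requesting site.
# All catalog entries below are illustrative assumptions.

catalog = {
    "lfn:cms/hlt/events-001": [
        "gsiftp://cnaf.infn.it/data/events-001",
        "gsiftp://lnl.infn.it/data/events-001",
    ],
}

def resolve(lfn, site):
    """Prefer a replica hosted at `site`; fall back to the first replica."""
    replicas = catalog[lfn]
    local = [r for r in replicas if site in r]
    return local[0] if local else replicas[0]

print(resolve("lfn:cms/hlt/events-001", "lnl.infn.it"))
# → gsiftp://lnl.infn.it/data/events-001
```

Keeping replicas only loosely synchronized, as the slide requires, is what makes this caching scheme cheap over a WAN.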

17
Project req. (cont.)
  • Application monitoring/management
  • Performance: instrumented systems with timing
    information and analysis tools
  • Run-time analysis of collected application events
  • Bottleneck analysis
  • Dynamic monitoring of GRID resources to optimize
    resource allocation
  • Failure management
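The failure-management requirement is commonly met with heartbeat monitoring: declare a resource failed after several consecutive missed heartbeats and trigger recovery. A minimal sketch; the threshold and hostname are assumptions:

```python
# Heartbeat-based failure detection sketch: a resource is declared
# failed after MISSED_LIMIT consecutive missed heartbeats.
# The limit and the hostname are illustrative assumptions.
MISSED_LIMIT = 3

def update(state, host, heartbeat_ok):
    """Record one heartbeat result; return True when `host` is declared failed."""
    missed = 0 if heartbeat_ok else state.get(host, 0) + 1
    state[host] = missed
    return missed >= MISSED_LIMIT

state = {}
for ok in (False, False, False):          # three missed heartbeats in a row
    failed = update(state, "wn01.infn.it", ok)
print(failed)  # → True: trigger alarm / automatic recovery
```

A single successful heartbeat resets the counter, so transient WAN glitches do not immediately mark a site as failed.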

18
Project req. (cont.)
  • Computing fabric and general utilities for a
    globally managed Grid
  • Configuration management of computing facilities
  • Automatic software installation and maintenance
  • System, service and network monitoring; global alarm
    notification; automatic recovery from failures
  • Resource use accounting
  • Security of GRID resources and infrastructure usage
  • Information service
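The resource-use accounting item reduces to aggregating per-job usage records, for example CPU hours per site. A small sketch with invented records:

```python
# Resource-use accounting sketch: aggregate CPU hours per site from
# per-job usage records. The records below are illustrative.
from collections import defaultdict

records = [
    {"site": "CNAF", "user": "alice", "cpu_hours": 12.5},
    {"site": "LNL",  "user": "bob",   "cpu_hours": 3.0},
    {"site": "CNAF", "user": "bob",   "cpu_hours": 7.5},
]

def per_site(records):
    """Total CPU hours consumed at each site."""
    totals = defaultdict(float)
    for r in records:
        totals[r["site"]] += r["cpu_hours"]
    return dict(totals)

print(per_site(records))  # → {'CNAF': 20.0, 'LNL': 3.0}
```

The same aggregation keyed on `user` would give the per-user view needed for fair-share policies.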

19
Grid Tools