Grid Monitoring For ZEUS - PowerPoint PPT Presentation

Title: Grid Monitoring For ZEUS


1
Grid Monitoring For ZEUS
Alexander Chernyack
DESY Hamburg 14-Sep-07 Summer students seminar
2
Outline
  • Introduction to Grid
  • The Enabling Grids for E-sciencE
  • ZEUS on the GRID
  • Service Availability Monitoring (SAM)
  • SAM for ZEUS
  • Summary

3
What is Grid?
  • "A computational grid is a hardware and software
    infrastructure that provides dependable,
    consistent, pervasive, and inexpensive access to
    high-end computational capabilities."
      • Carl Kesselman and Ian Foster (1998)
  • A Grid is a system that
      • coordinates computing resources that are not
        subject to centralized control
      • uses standard, open, general-purpose protocols
        and interfaces
      • provides basic computational services to the end
        user

4
The Enabling Grids for E-sciencE
  • 36,000 CPUs
  • 5 PB of disk (5 million gigabytes)
  • 30,000 concurrent jobs
  • About 20 projects
      • ZEUS, ALICE, ATLAS
      • GATE, GPS@, CDSS
      • Mammogrid, etc.

The project brings together scientists and engineers
from more than 240 institutions in 45 countries.
5
ZEUS on the Grid
6
ZEUS on the Grid
  • Monte Carlo Production
      • MOZART - detector simulation
      • CZAR, ZGANA, TLT-ZGANA - trigger simulation
      • ZEPHYR - reconstruction program

7
Statistics of completed jobs
Some inefficiencies are caused by problems at grid
sites.
Solution: active monitoring of grid services
8
Grid components
  • UI: User Interface
      • user entry point to the Grid
  • RB: Resource Broker
  • CE: Computing Element
      • batch farm server
  • WN: Worker Node
      • runs the user program
  • SE: Storage Element
  • RLS: Replica Location Service
      • Replica Metadata Catalog, Local Replica Catalog,
        Replica Manager
  • IS: Information Service

[Diagram labels: "CE", "Minimal site configuration", "Core at DESY" with RLS, RB and IS, and three "N x UI" boxes]
9
Service Availability Monitoring (SAM)
  • SAM aims to provide a site-independent,
    centralized and uniform monitoring tool for all
    grid services
  • Runs different sensors on sites (e.g. SE, CE,
    test job)
  • Publishes results to a DB and a web page
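
The idea on this slide can be sketched as a loop that runs each sensor against each site and collects the results for publication. This is only a minimal illustration, not the SAM implementation: the site names, the sensor functions and the `results` list (standing in for the DB and web page) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    site: str
    sensor: str
    ok: bool


def run_sensors(sites: List[str],
                sensors: Dict[str, Callable[[str], bool]]) -> List[Result]:
    """Run every sensor on every site and collect one Result per pair."""
    collected = []
    for site in sites:
        for name, check in sensors.items():
            try:
                ok = bool(check(site))
            except Exception:
                # A sensor that crashes is recorded as a failed test.
                ok = False
            collected.append(Result(site, name, ok))
    return collected


# Hypothetical sensors: real ones would contact the SE, CE, etc.
def fake_se_check(site: str) -> bool:
    return site != "site-b.example.org"


def fake_ce_check(site: str) -> bool:
    return True


results = run_sensors(
    ["site-a.example.org", "site-b.example.org"],
    {"SE": fake_se_check, "CE": fake_ce_check},
)

# "Publishing" here is just printing one line per (site, sensor) pair.
for r in results:
    print(f"{r.site:20} {r.sensor:3} {'ok' if r.ok else 'error'}")
```

Treating a crashing sensor as a failure keeps one broken test from stopping the whole run, which matters when the same loop covers many sites.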

10
SAM for ZEUS
  • I have implemented SAM for the ZEUS experiment
    for the first time
  • We can now run standard SAM tests (sensors)
    and have developed ZEUS-specific tests
  • These tests have been registered in a centrally
    maintained DB (at CERN)
  • They are executed regularly on all Grid sites
    supporting the ZEUS VO
  • The results can be accessed via the web

11
ZEUS Tests
  • Aim: test all aspects of grid services required
    by MC jobs
  • Assign an error status to unsuitable sites

What we test
  • Storage Element (SE)
  • Computing Element (CE)
  • Worker Nodes (WN)
  • File Catalog
  • Access to frequently used files
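
One possible way to turn the individual test results into a site status is to mark a site as unsuitable as soon as any required test fails. The sketch below assumes this all-or-nothing rule; the test names and the example results are hypothetical, and the actual rule used for ZEUS is not stated in the slides.

```python
# Hypothetical names for the five areas listed above.
REQUIRED_TESTS = ["SE", "CE", "WN", "file-catalog", "zeus-files"]


def site_status(test_results):
    """Return ('ok', []) if every required test passed,
    otherwise ('error', [names of failed tests]).
    A test with no reported result counts as failed."""
    failed = [t for t in REQUIRED_TESTS if not test_results.get(t, False)]
    if failed:
        return "error", failed
    return "ok", []


status, failed = site_status({
    "SE": True,
    "CE": True,
    "WN": True,
    "file-catalog": False,
    "zeus-files": True,
})
print(status, failed)
```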

12
ZEUS Tests details
  • Storage Element (SE)
      • registering new files, transfer between sites,
        replication and removal
  • Computing Element (CE)
      • job submission and scheduling
  • Worker Nodes (WN)
      • local disk access, proper environment
  • File Catalog
      • accessing files via logical name
  • Access to frequently used files
      • geometry and calibration constants (GAFs)
      • scripts, configuration files
      • executables

13
Test CE-sft-zeusmc-files
  • Check availability of ZEUS files needed for
    running MC jobs.
  • Copy these files to WN using ZEUS toolkit.
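
The two steps of this test, checking availability and copying to the worker node, might look roughly like the sketch below. It is not the actual test: `shutil.copy` stands in for the ZEUS toolkit copy command, temporary directories stand in for grid storage and the WN disk, and the file names are invented.

```python
import shutil
import tempfile
from pathlib import Path


def check_and_copy(required, source_dir, work_dir):
    """Copy each required file into work_dir; return the names that are missing."""
    missing = []
    for name in required:
        src = Path(source_dir) / name
        if src.is_file():
            # Stand-in for the ZEUS toolkit copy (an assumption for this sketch).
            shutil.copy(src, Path(work_dir) / name)
        else:
            missing.append(name)
    return missing


# Demo: only one of the two hypothetical files exists in "storage".
with tempfile.TemporaryDirectory() as storage, tempfile.TemporaryDirectory() as wn:
    (Path(storage) / "geometry.gaf").write_text("dummy")
    missing = check_and_copy(["geometry.gaf", "steering.cards"], storage, wn)
    copied = sorted(p.name for p in Path(wn).iterdir())

print("missing:", missing)
print("copied:", copied)
```

A test of this kind would report an error for the site whenever the list of missing files is not empty.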

14
Summary
  • Done
      • SAM has been deployed.
      • A new sensor has been written to test
        ZEUS-specific functionality needed for MC
        production.
  • Next steps
      • Finish a simple MC test job with real physics.
      • Automatically exclude faulty sites.
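
Automatic exclusion is listed as future work, so the following is only one conceivable approach: drop a site from the submission list once its most recent test runs have all failed. The threshold `max_failures`, the site names and the test history are assumptions for illustration.

```python
def usable_sites(history, max_failures=2):
    """Keep sites unless their last `max_failures` test runs all failed.

    `history` maps a site name to a list of booleans, oldest run first.
    """
    keep = []
    for site, runs in history.items():
        recent = runs[-max_failures:]
        all_failed = len(recent) == max_failures and not any(recent)
        if not all_failed:
            keep.append(site)
    return keep


history = {
    "site-a.example.org": [True, True, True],
    "site-b.example.org": [True, False, False],   # two failures in a row
    "site-c.example.org": [False, True, False],   # single recent failure
}
sites = usable_sites(history)
print(sites)
```

Requiring several consecutive failures avoids excluding a site because of a single transient error, at the cost of reacting more slowly to real outages.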
