LHCb Distributed Computing and the Grid, V. Vagnoni (INFN Bologna)

1
LHCb Distributed Computing and the Grid
V. Vagnoni (INFN Bologna)
  • D. Galli, U. Marconi, V. Vagnoni INFN Bologna
  • N. Brook Bristol
  • K. Harrison Cambridge
  • E. Van Herwijnen, J. Closier, P. Mato CERN
  • A. Khan Edinburgh
  • A. Tsaregorodtsev Marseille
  • H. Bulten, S. Klous Nikhef
  • F. Harris, I. McArthur, A. Soroko Oxford
  • G. N. Patrick, G. Kuznetsov RAL

2
Overview of presentation
  • Current organisation of LHCb distributed
    computing
  • The Bologna Beowulf cluster and its performance in a distributed environment
  • Current use of Globus and EDG middleware
  • Planning for data challenge and the use of Grid
  • Current LHCb Grid/applications R&D
  • Conclusions

3
History of distributed MC production
  • The distributed system has been running for 3 years and has processed many millions of events for LHCb design studies.
  • Main production sites:
  • CERN, Bologna, Liverpool, Lyon, NIKHEF, RAL
  • Globus already used for job submission to RAL and Lyon
  • System interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK Opening.
  • For the 2002 Data Challenges, adding new institutes:
  • Bristol, Cambridge, Oxford, ScotGrid
  • In 2003, add:
  • Barcelona, Moscow, Germany, Switzerland, Poland.

4
LOGICAL FLOW (diagram)
Submit jobs remotely via Web → Execute on farm → Data quality check → Update bookkeeping database → Transfer data to mass store → Analysis
(A sketch of this flow as a farm-side script follows below.)
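A minimal sketch of how one pass through this flow might look as a wrapper script on a farm node. Every helper command below (run_mc_job, check_quality, update_bookkeeping, send_to_mass_store) is a hypothetical placeholder, not an actual LHCb tool; only the order of the stages is taken from the diagram.

    #!/bin/sh
    # Illustrative wrapper for one MC production job, following the logical
    # flow above.  All helper commands are hypothetical placeholders.
    set -e
    JOBID=$1                        # job identifier received from the Web submission

    run_mc_job         "$JOBID"     # execute simulation/reconstruction on the farm
    check_quality      "$JOBID"     # data quality check on the produced output
    update_bookkeeping "$JOBID"     # record job and output files in the bookkeeping DB
    send_to_mass_store "$JOBID"     # ship the data to the mass store (see the bbftp slide)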
5
Monitoring and Control of MC jobs
  • LHCb has adopted PVSS II as prototype control and
    monitoring system for MC production.
  • PVSS is a commercial SCADA (Supervisory Control
    And Data Acquisition) product developed by ETM.
  • Adopted as Control framework for LHC Joint
    Controls Project (JCOP).
  • Available for Linux and Windows platforms.

6
(No Transcript)
7
Example of LHCb computing facility: the Bologna Beowulf cluster
  • Set up at INFN-CNAF
  • 100 CPUs hosted in Dual Processor machines
    (ranging from 866 MHz to 1.2 GHz PIII), 512 MB
    RAM
  • 2 Network Attached Storage systems
  • 1 TB in RAID5, with 14 IDE disks + hot spare
  • 1 TB in RAID5, with 7 SCSI disks + hot spare
  • Linux disk-less processing nodes with OS
    centralized on a file server (root file-system
    mounted over NFS)
  • Usage of private network IP addresses and
    Ethernet VLAN
  • High level of network isolation
  • Access to external services (afs, mccontrol, bookkeeping db, java servlets of various kinds, ...) provided by means of a NAT mechanism on a gateway (GW) node (see the sketch below)
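A minimal sketch of what the NAT and diskless-root pieces of such a setup might look like, assuming Linux iptables on the gateway node, the private farm VLAN on eth1 and the public interface on eth0. Interface names, addresses and paths are illustrative assumptions, not details of the actual Bologna configuration.

    # On the GW node: enable forwarding and masquerade the private farm VLAN
    # (assumed 192.168.1.0/24 on eth1) out through the public interface eth0.
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE

    # On the file server: /etc/exports entry sharing the diskless nodes' root
    # filesystem over NFS (path is hypothetical).
    #   /export/nodes/root  192.168.1.0/255.255.255.0(ro,no_root_squash)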

8
Farm Configuration
9
(Photo: rack of 1U dual-processor motherboards, Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor)
10
Farm performance
  • Farm capable of simulating and reconstructing about (700 LHCb-events/day per CPU) × (100 CPUs) ≈ 70000 LHCb-events/day
  • Data transfer over the WAN to the CASTOR tape library at CERN realised by using bbftp (see the example below)
  • very good throughput (up to 70 Mbit/s over the currently available 100 Mbit/s)
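A hedged sketch of what one such bbftp transfer to CASTOR could look like. The user name, host name and file paths are illustrative assumptions; -u selects the remote user, -p the number of parallel TCP streams and -e the transfer command.

    # Ship one production output file to the CASTOR mass store at CERN
    # (host, user and paths are hypothetical).
    bbftp -u lhcbprod -p 5 \
          -e "put job1600061.dst /castor/cern.ch/lhcb/prod/job1600061.dst" \
          castor.cern.ch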

11
Current use of Grid middleware in the development system
  • Authentication
  • grid-proxy-init
  • Job submission to DataGrid
  • dg-job-submit
  • Monitoring and control
  • dg-job-status
  • dg-job-cancel
  • dg-job-get-output
  • Data publication and replication
  • globus-url-copy, GDMP (a typical session with these commands is sketched below)
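Put together, a typical session with the commands listed above might look like the following. The JDL file name and output directory are taken from the submission example on the next slide; the <jobId> placeholder stands for the identifier returned by dg-job-submit.

    grid-proxy-init                                       # authentication: create a Grid proxy
    dg-job-submit bbincl1600061.jdl -o /home/evh/logsub/  # submit the JDL-described job to DataGrid
    dg-job-status <jobId>                                 # monitoring
    dg-job-cancel <jobId>                                 # control: cancel the job if necessary
    dg-job-get-output <jobId>                             # retrieve the output sandbox when done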

12
Example 1: Job Submission
  • dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
  • bbincl1600061.jdl:
  • Executable = "script_prod"
  • Arguments = "1600061,v235r4dst,v233r2"
  • StdOutput = "file1600061.output"
  • StdError = "file1600061.err"
  • InputSandbox = {"/home/evhtbed/scripts/x509up_u149", "/home/evhtbed/sicb/mcsend", "/home/evhtbed/sicb/fsize", "/home/evhtbed/sicb/cdispose.class", "/home/evhtbed/v235r4dst.tar.gz", "/home/evhtbed/sicb/sicb/bbincl1600061.sh", "/home/evhtbed/script_prod", "/home/evhtbed/sicb/sicb1600061.dat", "/home/evhtbed/sicb/sicb1600062.dat", "/home/evhtbed/sicb/sicb1600063.dat", "/home/evhtbed/v233r2.tar.gz"}
  • OutputSandbox = {"job1600061.txt", "D1600063", "file1600061.output", "file1600061.err", "job1600062.txt", "job1600063.txt"}

13
Example 2: Data Publishing and Replication
(Diagram: on the CERN testbed, a job running on a Compute Element writes its data to local disk and copies it to a Storage Element backed by the MSS with globus-url-copy; the file is then registered (register-local-file) and published to the Replica Catalogue hosted at NIKHEF, Amsterdam; a job elsewhere on the Grid ("rest-of-Grid") fetches the data onto its own Storage Element via replica-get. A sketch of the copy step follows below.)
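A hedged sketch of the copy step from the diagram. The local path, Storage Element host and target path are illustrative assumptions; the subsequent register/publish and replica-get steps are only named here as on the slide, since the exact GDMP command syntax is not shown.

    # Copy a locally produced file from the Compute Element's disk to a
    # Storage Element (host and paths are hypothetical).
    globus-url-copy file:///flatfiles/data/job1600061.dst \
                    gsiftp://se01.cern.ch/storage/lhcb/job1600061.dst
    # The file would then be registered locally and published to the Replica
    # Catalogue via GDMP, and retrieved elsewhere with a replica-get operation.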
14
LHCb Data Challenge 1 (July-September 2002)
  • Physics Data Challenge (PDC) for detector, physics and trigger evaluations
  • based on the existing MC production system, with a small amount of Grid technology to start with
  • Generate 3×10^7 events (signal + specific background + generic b and c + minimum bias)
  • Computing Data Challenge (CDC) for checking the developing software
  • will make more extensive use of Grid middleware
  • Components will be incorporated into the PDC once proven in the CDC

15
GANGA: Gaudi ANd Grid Alliance
Joint Atlas (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint Atlas/LHCb research posts at Cambridge and Oxford
  • Application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.
  • A GUI-based application that should help throughout the complete job lifetime:
  • - job preparation and configuration
  • - resource booking
  • - job submission
  • - job monitoring and control

(Diagram: the GANGA GUI sits on top of collective and resource Grid services; it configures the GAUDI program through JobOptions and algorithms and collects histograms, monitoring information and results.)
16
Required functionality
  • Before Gaudi/Athena program starts
  • Security (obtaining certificates and credentials)
  • Job configuration (algorithm configuration, input
    data selection, ...)
  • Resource booking and policy checking (CPU,
    storage, network)
  • Installation of required software components
  • Job preparation and submission
  • While Gaudi/Athena program is running
  • Job monitoring (generic and specific)
  • Job control (suspend, abort, ...)
  • After program has finished
  • Data management (registration)

17
Conclusions
  • LHCb already has distributed MC production using
    GRID facilities for job submission
  • We are embarking on large scale data challenges
    commencing July 2002, and we are developing our
    analysis model
  • Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)
  • R&D projects are in place
  • for interfacing users (production and analysis) and the Gaudi/Athena software framework to Grid services
  • for putting the production system into an integrated Grid environment with monitoring and control
  • All work is being conducted in close collaboration with the EDG and LCG projects
  • Ongoing evaluations of EDG middleware with physics jobs
  • Participate in LCG working groups, e.g. the report on Common Use Cases for a HEP Common Application Layer: http://cern.ch/fca/HEPCAL.doc