Title: LHCb Distributed Computing and the Grid, V. Vagnoni (INFN Bologna)
1. LHCb Distributed Computing and the Grid
V. Vagnoni (INFN Bologna)
- D. Galli, U. Marconi, V. Vagnoni INFN Bologna
- N. Brook Bristol
- K. Harrison Cambridge
- E. Van Herwijnen, J. Closier, P. Mato CERN
- A. Khan Edinburgh
- A. Tsaregorodtsev Marseille
- H. Bulten, S. Klous Nikhef
- F. Harris, I. McArthur, A. Soroko Oxford
- G. N. Patrick, G. Kuznetsov RAL
2. Overview of presentation
- Current organisation of LHCb distributed computing
- The Bologna Beowulf cluster and its performance in a distributed environment
- Current use of Globus and EDG middleware
- Planning for the data challenge and the use of the Grid
- Current LHCb Grid/applications R&D
- Conclusions
3. History of distributed MC production
- The distributed system has been running for 3 years and has processed many millions of events for the LHCb design.
- Main production sites: CERN, Bologna, Liverpool, Lyon, NIKHEF, RAL
- Globus already used for job submission to RAL and Lyon
- System interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK opening
- For the 2002 Data Challenges, adding new institutes: Bristol, Cambridge, Oxford, ScotGrid
- In 2003, add: Barcelona, Moscow, Germany, Switzerland, Poland
4. Logical flow
(Workflow diagram) Submit jobs remotely via Web, execute on farm, data quality check, update bookkeeping database, transfer data to mass store, analysis.
5. Monitoring and Control of MC jobs
- LHCb has adopted PVSS II as a prototype control and monitoring system for MC production.
- PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
- Adopted as the control framework for the LHC Joint Controls Project (JCOP).
- Available for Linux and Windows platforms.
7. Example of an LHCb computing facility: the Bologna Beowulf cluster
- Set up at INFN-CNAF
- 100 CPUs hosted in dual-processor machines (ranging from 866 MHz to 1.2 GHz PIII), 512 MB RAM
- 2 Network Attached Storage systems:
  - 1 TB in RAID5, with 14 IDE disks + hot spare
  - 1 TB in RAID5, with 7 SCSI disks + hot spare
- Diskless Linux processing nodes with the OS centralised on a file server (root file system mounted over NFS)
- Use of private network IP addresses and an Ethernet VLAN
  - High level of network isolation
- Access to external services (AFS, mccontrol, bookkeeping DB, Java servlets of various kinds, ...) provided by means of a NAT mechanism on a gateway (GW) node, as sketched below
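As an illustration only, such a NAT gateway could be configured with standard Linux iptables masquerading; the interface name and private subnet below are hypothetical, not the actual farm settings:

  # enable IP forwarding on the gateway node
  echo 1 > /proc/sys/net/ipv4/ip_forward
  # masquerade traffic from the private farm VLAN behind the gateway's public interface
  iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE

With rules of this kind, the diskless nodes keep private addresses while still reaching the external services listed above.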
8. Farm Configuration
9. (Photographs of the farm hardware: Fast Ethernet switch, rack of 1U dual-processor motherboards, 1 TB NAS, Ethernet-controlled power distributor.)
10. Farm performance
- The farm is capable of simulating and reconstructing about (700 LHCb events/day per CPU) × (100 CPUs) ≈ 70,000 LHCb events/day
- Data transfer over the WAN to the CASTOR tape library at CERN is realised using bbftp
  - Very good throughput (up to 70 Mbit/s of the currently available 100 Mbit/s); see the example below
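As an illustration, a transfer of an output file to CASTOR with bbftp might look like the sketch below; the user name, target host and CASTOR path are hypothetical, and the exact options depend on the local bbftp installation:

  # put one DST file into CASTOR using several parallel streams
  bbftp -u lhcbprod -p 5 \
        -e "put file1600061.dst /castor/cern.ch/lhcb/mc/file1600061.dst" \
        bbftp-server.cern.ch

bbftp opens several parallel TCP streams, which helps sustain high throughput on the WAN link to CERN.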
11. Current Use of Grid Middleware in the development system
- Authentication
- grid-proxy-init
- Job submission to DataGrid
- dg-job-submit
- Monitoring and control
- dg-job-status
- dg-job-cancel
- dg-job-get-output
- Data publication and replication
- globus-url-copy, GDMP
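Taken together, a typical interactive session with these tools looks like the minimal sketch below (the JDL file name matches the example on the next slide; the job identifier returned by the broker is shown as a placeholder):

  # obtain Grid credentials
  grid-proxy-init
  # submit the job described by the JDL file to the EDG resource broker
  dg-job-submit bbincl1600061.jdl -o logsub/
  # follow the job, and cancel it if necessary
  dg-job-status <job-id>
  dg-job-cancel <job-id>
  # once finished, retrieve the output sandbox
  dg-job-get-output <job-id>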
12. Example 1: Job submission
- dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
- bbincl1600061.jdl:

  Executable    = "script_prod";
  Arguments     = "1600061,v235r4dst,v233r2";
  StdOutput     = "file1600061.output";
  StdError      = "file1600061.err";
  InputSandbox  = {"/home/evhtbed/scripts/x509up_u149", "/home/evhtbed/sicb/mcsend",
                   "/home/evhtbed/sicb/fsize", "/home/evhtbed/sicb/cdispose.class",
                   "/home/evhtbed/v235r4dst.tar.gz", "/home/evhtbed/sicb/sicb/bbincl1600061.sh",
                   "/home/evhtbed/script_prod", "/home/evhtbed/sicb/sicb1600061.dat",
                   "/home/evhtbed/sicb/sicb1600062.dat", "/home/evhtbed/sicb/sicb1600063.dat",
                   "/home/evhtbed/v233r2.tar.gz"};
  OutputSandbox = {"job1600061.txt", "D1600063", "file1600061.output", "file1600061.err",
                   "job1600062.txt", "job1600063.txt"};
13. Example 2: Data Publishing and Replication
(Diagram) On the CERN testbed, a job running on a Compute Element writes its data to local disk and copies it to a Storage Element (backed by an MSS) with globus-url-copy; the file is then registered (register-local-file) and published to the Replica Catalogue hosted at NIKHEF, Amsterdam. A job elsewhere on the Grid can then fetch the data onto its own Storage Element with replica-get.
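A minimal sketch of the copy step from the worker node to a Storage Element is given below; the local path, Storage Element host and target directory are hypothetical, and the subsequent GDMP registration and publication steps are indicated only by the generic names used in the diagram:

  # copy the job output from local disk to a Storage Element over GridFTP
  globus-url-copy file:///data/job1600061/file1600061.dst \
      gsiftp://se01.cern.ch/storage/lhcb/file1600061.dst
  # the file is then made known to the Grid via the GDMP
  # register-local-file and publish steps, after which remote
  # sites can obtain it with replica-get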
14. LHCb Data Challenge 1 (July-September 2002)
- Physics Data Challenge (PDC) for detector, physics and trigger evaluations
  - Based on the existing MC production system, with a small amount of Grid technology to start with
  - Generate 3×10⁷ events (signal + specific background + generic b and c + minimum bias)
- Computing Data Challenge (CDC) for checking and developing software
  - Will make more extensive use of Grid middleware
  - Components will be incorporated into the PDC once proven in the CDC
15. GANGA: Gaudi ANd Grid Alliance
Joint Atlas (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint Atlas/LHCb research posts at Cambridge and Oxford
- An application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.
- A GUI-based application that should help throughout the complete job lifetime:
  - job preparation and configuration
  - resource booking
  - job submission
  - job monitoring and control
(Architecture diagram: the GANGA GUI connects the GAUDI program, with its JobOptions and Algorithms, to collective and resource Grid services, and returns histograms, monitoring information and results to the user.)
16. Required functionality
- Before the Gaudi/Athena program starts:
  - Security (obtaining certificates and credentials)
  - Job configuration (algorithm configuration, input data selection, ...)
  - Resource booking and policy checking (CPU, storage, network)
  - Installation of required software components
  - Job preparation and submission
- While the Gaudi/Athena program is running:
  - Job monitoring (generic and specific)
  - Job control (suspend, abort, ...)
- After the program has finished:
  - Data management (registration)
17. Conclusions
- LHCb already has distributed MC production using Grid facilities for job submission
- We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model
- Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)
- R&D projects are in place:
  - for interfacing users (production and analysis) and the Gaudi/Athena software framework to Grid services
  - for putting the production system into an integrated Grid environment with monitoring and control
- All work is being conducted in close participation with the EDG and LCG projects
  - Ongoing evaluations of EDG middleware with physics jobs
  - Participation in LCG working groups, e.g. the report on common use cases for a HEP common application layer, http://cern.ch/fca/HEPCAL.doc