Title: Enabling Grid Computing for HEP
1 Enabling Grid Computing for HEP
- Babar Team at the University of Manchester
- Resources: www.hep.man.ac.uk/u/jamwer
2 Human resource strategy
Jobs with 5 events instead of millions.
3 Resources Strategy
4 Grid Test Bed
6 Software: 850 packages. Tau datasets range
between 60 files (1 GB) and 150 files (1 GB). Total:
4,000 GB in 10,000 files.
7 Analysis Submission to Grid (Prototype)
- Single command: ./easygrid dataset_name
- Performs handler management and submission
- Software based on a state machine
- Verify that skimData is available
- If not available, run BbkDatasetTCL to generate skimData. Each file will be a job.
- Verify whether there are handlers pending
- If not, generate the script (gera.c) with edg-job-submit and ClassAds, and execute the script. Nest for submission policy and optimisation.
- If yes, verify job status. When all the jobs have ended, recover the results into the user folder.
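The workflow on this slide can be read as a small state machine. The following is a minimal Python sketch of that control flow, not the actual easygrid code: the state names, the `Context` fields, and the `step` and `run` functions are all hypothetical, and the places where the real tool would call BbkDatasetTCL or edg-job-submit are replaced by placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    CHECK_SKIMDATA = auto()
    GENERATE_SKIMDATA = auto()
    CHECK_HANDLERS = auto()
    SUBMIT = auto()
    CHECK_STATUS = auto()
    RECOVER = auto()
    WAIT = auto()  # terminal: jobs still pending, run the command again later
    DONE = auto()  # terminal: results recovered


@dataclass
class Context:
    files: list = field(default_factory=list)      # one job per file
    skimdata_ready: bool = False
    handlers: list = field(default_factory=list)   # pending job handles
    statuses: dict = field(default_factory=dict)   # handle -> status string


def step(state, ctx):
    """Return the next state; all grid calls are placeholders (assumption)."""
    if state is State.CHECK_SKIMDATA:
        return State.CHECK_HANDLERS if ctx.skimdata_ready else State.GENERATE_SKIMDATA
    if state is State.GENERATE_SKIMDATA:
        ctx.skimdata_ready = True  # placeholder for running BbkDatasetTCL
        return State.CHECK_HANDLERS
    if state is State.CHECK_HANDLERS:
        return State.CHECK_STATUS if ctx.handlers else State.SUBMIT
    if state is State.SUBMIT:
        # Placeholder for script generation and edg-job-submit.
        ctx.handlers = [f"handle-{i}" for i, _ in enumerate(ctx.files)]
        ctx.statuses = {h: "Scheduled" for h in ctx.handlers}
        return State.WAIT
    if state is State.CHECK_STATUS:
        all_done = all(ctx.statuses.get(h) == "Done" for h in ctx.handlers)
        return State.RECOVER if all_done else State.WAIT
    if state is State.RECOVER:
        return State.DONE  # placeholder for copying results to the user folder
    return state


def run(ctx):
    """Run one invocation of the command and return the states visited."""
    state = State.CHECK_SKIMDATA
    trace = [state]
    while state not in (State.WAIT, State.DONE):
        state = step(state, ctx)
        trace.append(state)
    return trace
```

Each call to `run` corresponds to one invocation of the command: the first call submits and stops at `WAIT`, later calls check status, and the final call recovers the results once every job reports `Done`.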
8 Generation and submission
- jamwer@bfb babar ./easygrid SP-1005-Tau11-R14
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy ......................................................... Done
- Creating proxy .................................................... Done
- Searching pre selected skimdata.
- Searching previous handlers.
- Handlers not found. Submiting to GRID . Wait end of process...
9 Job Status
- jamwer@bfb babar ./easygrid SP-1005-Tau11-R14
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy ............................ Done
- Creating proxy ............................... Done
- Searching pre selected skimdata.
- Searching previous handlers. Checking if jobs finished.
- Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
- Current Status: Scheduled
- https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
- Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
- Current Status: Scheduled
- https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
- 4 jobs did not finished ! Try again later.
10 Job Status and recovery
- jamwer@bfb babar ./easygrid SP-1005-Tau11-R14
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy .......................................... Done
- Creating proxy ........................................................... Done
- Searching pre selected skimdata. Searching previous handlers.
- Checking if jobs finished.
- Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
- Current Status: Done
- Exit code: 0
- Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
- Current Status: Done
- Exit code: 0
- 0 jobs did not finished ! Try again later.
- All jobs done. Recovering results in your folder.
- Results in the following folders
- /home/jamwer/grid_sub/babar/jamwer_foRHhWyeDBnbqA9JkDADLg
- /home/jamwer/grid_sub/babar/jamwer_8DdK3xruxtevNpei3zZbaA
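In the output on this slide, each results folder name appears to be the user name, an underscore, and the final path segment of the job handle. The Python sketch below reproduces that mapping as inferred from the listing only. The function name `results_folder` and its parameters are hypothetical, and it assumes the handle is a URL of the form `https://host:9000/id`.

```python
import posixpath
from urllib.parse import urlparse


def results_folder(handle, user, base_dir):
    """Build the results path for one job handle (naming inferred from the slide)."""
    # The job identifier is taken to be the path part of the handle URL.
    job_id = urlparse(handle).path.lstrip("/")
    return posixpath.join(base_dir, f"{user}_{job_id}")


print(results_folder(
    "https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA",
    "jamwer",
    "/home/jamwer/grid_sub/babar",
))
```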
11 Monte Carlo Submission to Grid (Prototype)
- Single command: ./mcgrid JobName num_copies
- Performs handler management and submission.
- Software based on a state machine
- Verify whether there are handlers pending
- If not, generate the script (geramc.c) with edg-job-submit and ClassAds for each copy, and execute the script. Nest for submission policy and optimisation.
- If yes, verify job status. When all the jobs have ended, recover the results into the user folder.
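The per-copy script generation described on this slide might look roughly like the Python sketch below, which builds one job description and one `edg-job-submit` command line per copy. This is only an illustration, not the contents of geramc.c. The wrapper script name `run_mc.sh`, the file naming scheme, and the function names are hypothetical, and the attribute names are the common JDL ones, assumed here rather than taken from the slides.

```python
def make_jdl(job_name, copy_index, script="run_mc.sh"):
    """Return JDL text for one MC copy (script name and file names are assumptions)."""
    out = f"{job_name}_{copy_index}.out"
    err = f"{job_name}_{copy_index}.err"
    lines = [
        f'Executable = "{script}";',
        f'Arguments = "{job_name} {copy_index}";',
        f'StdOutput = "{out}";',
        f'StdError = "{err}";',
        f'InputSandbox = {{"{script}"}};',
        f'OutputSandbox = {{"{out}", "{err}"}};',
    ]
    return "\n".join(lines) + "\n"


def submission_plan(job_name, num_copies):
    """List (jdl_file, jdl_text, command) for each copy, without running anything."""
    plan = []
    for i in range(1, num_copies + 1):
        jdl_file = f"{job_name}_{i}.jdl"
        command = ["edg-job-submit", jdl_file]
        plan.append((jdl_file, make_jdl(job_name, i), command))
    return plan


for jdl_file, jdl_text, command in submission_plan("MCteste", 3):
    print(" ".join(command))
```

Keeping the plan as data, rather than executing the commands directly, leaves room for a submission policy to reorder or throttle the jobs before they are sent.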
12 MC Submission
- jamwer@bfb mcgrid1 ./mcgrid MCteste 3
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy ................................. Done
- Creating proxy ....................................................... Done
- Searching previous handlers. Handlers not found.
- Submiting to GRID . Wait end of process...
13 Job Status
- jamwer@bfb mcgrid1 ./mcgrid MCteste 3
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy ........................................ Done
- Creating proxy ....................................... Done
- Searching previous handlers. Checking if jobs finished.
- Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
- Current Status: Scheduled
- https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw still pendent.
- Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
- Current Status: Ready
- https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg still pendent.
- Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA
- Current Status: Ready
- https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA still pendent.
- 3 jobs did not finished ! Try again later.
14 Job status and recovery
- jamwer@bfb mcgrid1 ./mcgrid MCteste 3
- Invalid configuration filename /opt/edg/etc/vomses
- Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
- Enter GRID pass phrase for this identity:
- Creating temporary proxy .................................................. Done
- Creating proxy .................................................... Done
- Searching previous handlers. Checking if jobs finished.
- Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
- Current Status: Done
- Exit code: 0
- Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
- Current Status: Done
- Exit code: 0
- 0 jobs did not finished ! Try again later.
- All jobs done. Recovering results in your folder.
- Results in the following folders
- /home/jamwer/grid_sub/mcgrid1/jamwer_9WzceoIMEQoTK24a-UvOmw
- /home/jamwer/grid_sub/mcgrid1/jamwer_c4iCB8vioozaGteI9hybIg
- /home/jamwer/grid_sub/mcgrid1/jamwer_L5BD1OE--eckTm5RXkp2nA
15 Testing Submission Script
- Load range: worker load x files
- 16 x 60 files = 960 jobs pending
- 16 x 150 files = 2400 jobs pending
- Test with submission script: "sslv3 alert handshake failure". "Please wait job enter the Done status." This never happens!
- The Resource Broker is not reliable or robust. It sometimes fails 3 days a week or takes hours to submit/dispatch to a CE (empty!).
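The load range quoted on this slide is a simple product of the worker load and the number of files in a dataset, with one job per file. A short Python check of that arithmetic follows; the function name `pending_jobs` is hypothetical.

```python
def pending_jobs(worker_load, files_per_dataset):
    """Number of pending jobs when every file becomes one job per unit of load."""
    return worker_load * files_per_dataset


# The two ends of the load range given on the slide.
for files in (60, 150):
    print(f"16 x {files} files = {pending_jobs(16, files)} jobs")
```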
16 Pending Infrastructure => Course of action
- Babar software know-how is not available at Manchester => Web Page Network skills.
- Quality Assurance => We are OK! from benchmark (E x P)
- A real application to perform the complete cycle, acquire know-how, and provide a grid proof-of-concept is missing => Partnership with physicists
- CERN does NOT recognise the Babar community => Let's reduce their priority!
- RB at Manchester => 60 MB binaries and policies freedom.
- SE/RC at Manchester => policies and submission jobs freedom.
- Mass storage (10 TB) for Babar purposes => CAP!
- UI in the AFS => wide access to Manchester farms.
- Apprenticeship at RAL and later at SLAC production and experiment => improve where others fail
- Configuration for optimal job performance/submission at Tier 2 (1 CE x 50 WN? Performance of dCache with Babar software? Why 10 TB if Liverpool bought 80 TB? Electricity bill?) => analyse procedures to improve QoS and better site configuration
- Update (software and data) and operational policies => operational standards to achieve high QoS
17 Aimed Hardware Architecture
(Redundant RB with alternate access)
18 Aimed Software Architecture
19 Production Job Submission Package
- Operational policies/integration with RB (application level).
- Recovery of aborted status.
- Resources optimisation.
- Integration with RC (application level) for replicas policies development.
- Interactive data visualisation (Useful?)
- Integration with GridSite (Data visualisation, analysis, performance monitor, and submission)
- Professional version.
20 Integrate LCG2 and Job Submission with
Babar/CM2 at the University of Manchester for Tau
physics modelling, analysis and MC generation.
Summary
- We aim soon to be:
- The largest site in the UK.
- A leader in grid computing and HEP.
21 Conclusion
- Babar CM2 is running at Manchester!
- The LCG2 Grid is running with a real-world experiment!
- The Babar submission prototype to the Grid is running!
- LCG is not LHC software only! It is Babar's.
- We are doing today what will take you years to achieve. Let's work together!