Title: A CMS computing project
1. A CMS computing project: BOSS (Batch Object Submission System)
- Zhang YongJun
- (Imperial College London)
- Background: the Grid and the LHC
- The CMS computing project BOSS
2. LHC (Large Hadron Collider)
- The LHC is a particle accelerator at CERN, near Geneva on the border between Switzerland and France. It is scheduled to start operation in 2007.
- The LHC will collide protons at a collision energy of 14 TeV and will also collide heavy ions such as lead (Pb).
3. Detector and trigger
- 75 million electronics channels from the various subdetectors.
- The data from the detector are electrical signals.
- By applying calibrations, physical quantities (momentum, energy) can be derived from the strength of the electrical signals.
- The trigger system selects interesting events.
- The reconstruction procedure builds physics objects with their properties from raw events.
- Data analysis applies a set of cuts to select a specific set of events corresponding to a specific physics channel (see the sketch below).
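As an illustration of applying a set of cuts, here is a minimal Python sketch; the event fields and thresholds are invented for the example and this is not CMS analysis code:

    # Minimal sketch: select events passing a simple set of cuts.
    # The event fields and thresholds below are illustrative only.
    events = [
        {"pt": 32.5, "eta": 1.1, "n_muons": 2},
        {"pt": 8.0,  "eta": 2.9, "n_muons": 0},
        {"pt": 51.2, "eta": 0.3, "n_muons": 2},
    ]

    def passes_cuts(event):
        """Cuts corresponding to a hypothetical physics channel."""
        return event["pt"] > 20.0 and abs(event["eta"]) < 2.5 and event["n_muons"] >= 2

    selected = [e for e in events if passes_cuts(e)]
    print(f"{len(selected)} of {len(events)} events pass the selection")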
4. Software
- Simulation is essential for the detector/software design as well as for data analysis.
- Fast simulation is fast compared to full simulation, but it depends on parameters extracted from the full simulation.
[Diagram: simulation and data chains feeding data analysis (ROOT): generator, full/fast simulation, LHC, detector, digitization, trigger, reconstruction]
5. LHC computing model
- 225 MB/s for CMS from online to offline. A lot of data will arrive, and it is beyond the ability of any single site to process it all, so a tiered data distribution structure is proposed: CERN is Tier 0, and every country has one Tier 1 and several Tier 2 sites.
- Tier 1 sites reconstruct events and host data; Tier 2 sites run physicists' analysis jobs.
- This tier structure is built upon Grid software.
6. Computing before the Grid
[Diagram: one user with a different account at each site, e.g. yjzhang at CERN, yzhang at Imperial College, and yet another account at RAL]
- An account is needed at every site a job is submitted to.
- To submit jobs to a newly joined site, a new account has to be created.
- Although these sites take part in the same project, such as CMS, it is difficult to share CPU and data.
7. Computing on the Grid
[Diagram: one certificate used to submit jobs to CERN, Imperial College and RAL]
- Instead of per-site accounts, the user holds a certificate to submit jobs.
- The sites that accept this certificate form a Virtual Organization (VO). All sites that have joined the CMS experiment can join the CMS VO.
- The certificate is issued by a Certificate Authority and is based on the RSA algorithm.
- On top of a VO, more services can be added to help users submit jobs, for example scheduling and monitoring.
Example certificate (truncated):

    Bag Attributes
        friendlyName: yongjun zhang's eScience ID
        localKeyID: 65 AB 3E 55 38 77 49 B3 3A 93 26 B5 08 68 D1 8C A9 CD 6A D8
    subject=/C=UK/O=eScience/OU=Imperial/L=Physics/CN=yongjun zhang
    issuer=/C=UK/O=eScience/OU=Authority/CN=CA/emailAddress=ca-operator@grid-support.ac.uk
    -----BEGIN CERTIFICATE-----
    MIIFbzCCBFegAwIBAgICFHowDQYJKoZIhvcNAQEFBQAwcDELMAkGA1UEBhMCVUsx
    ETAPBgNVBAoTCGVTY2llbmNlMRIwEAYDVQQLEwlBdXRob3JpdHkxCzAJBgNVBAMT
    ...
    -----END CERTIFICATE-----
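For illustration, a short Python sketch that reads the subject and issuer from such a PEM certificate; it assumes the third-party cryptography package and a hypothetical file name usercert.pem:

    # Minimal sketch (not a Grid tool): inspect a user certificate like the one above.
    from cryptography import x509

    with open("usercert.pem", "rb") as f:          # file name is an assumption
        cert = x509.load_pem_x509_certificate(f.read())

    print("subject:", cert.subject.rfc4514_string())
    print("issuer: ", cert.issuer.rfc4514_string())
    print("expires:", cert.not_valid_after)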
8. Workflow management on the Grid
[Diagram: a Resource Broker (CMS) dispatching certificate-authenticated jobs to CERN, Imperial College and RAL, all in the CMS VO]
- To make job submission even easier, a job submission service, the Resource Broker (RB), can be set up on top of the VO. The user delegates to the RB, which submits the job to a non-busy site.
- To accept jobs submitted from anywhere in the VO, a dedicated cluster can be set up as a Computing Element (CE).
- Similarly, many VO-based services, such as monitoring and logging, have been developed.
9. Workflow management on the Grid
[Diagram: as on the previous slide, plus a catalogue database mapping Logical File Names (LFN) to Physical File Names (PFN)]
- On the Grid, the user specifies a file by its Logical File Name (LFN). A Grid service looks up a catalogue database to find all the corresponding Physical File Names (PFN) and selects one of them to do the real work. A UUID sits between the LFN and the PFNs and links the two (see the sketch below).
- A dedicated site can be set up as a Storage Element (SE) to host a large amount of data, for example gfe02.hep.ph.ic.ac.uk, which uses the dCache tool.
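A minimal Python sketch of the LFN to UUID to PFN lookup; the catalogue here is just an in-memory dictionary with invented entries, not the real Grid catalogue service:

    # Minimal sketch of LFN -> UUID -> PFN resolution (illustrative data only).
    import random

    lfn_to_uuid = {
        "lfn:/cms/example/events.root": "01D4DF4E-A4EB-4047-A94A-1A550265872F",
    }
    uuid_to_pfns = {
        "01D4DF4E-A4EB-4047-A94A-1A550265872F": [
            "dcap://gfe02.hep.ph.ic.ac.uk:22128/pnfs/.../events.root",
            "gsiftp://se.example.org/cms/.../events.root",
        ],
    }

    def resolve(lfn):
        """Look up the UUID for an LFN and pick one of its physical replicas."""
        uuid = lfn_to_uuid[lfn]
        return random.choice(uuid_to_pfns[uuid])

    print(resolve("lfn:/cms/example/events.root"))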
10. BOSS - Batch Object Submission System
[Diagram: CRAB on top of BOSS, with BOSS logging and monitoring attached, and the scheduler/Grid underneath]
- BOSS is part of the CMS workload management system.
- BOSS provides logging, bookkeeping and monitoring.
- BOSS sits between the user (CRAB) and the scheduler/Grid.
- BOSS is a generic submission tool and will provide Python/C APIs to be used by CRAB; CRAB plus BOSS then form the complete submission tool.
11. Sample task specification

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <task>
      <iterator name="ITR" start="0" end="100" step="1">
        <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
          <program exec="test.pl"
                   args="ITR"
                   stderr="err_ITR"
                   program_type="test"
                   stdin="in"
                   stdout="out_ITR"
                   infiles="Examples/test.pl,Examples/in"
                   outfiles="out_ITR,err_ITR"
                   outtopdir="" />
        </chain>
      </iterator>
    </task>

- Example of a task containing 100 chains, each consisting of one program.
- Program-specific monitoring is activated; results are returned via a MySQL connection.
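A minimal Python sketch (not BOSS itself) showing how such a task description could be expanded: it parses the XML with the standard library and substitutes the iterator name into each program's attributes:

    # Minimal sketch: expand the iterator of a task XML like the one above.
    import xml.etree.ElementTree as ET

    tree = ET.parse("task.xml")                  # file name as used on the data-flow slides
    for it in tree.getroot().findall("iterator"):
        name = it.get("name")
        start, end, step = (int(it.get(k)) for k in ("start", "end", "step"))
        for value in range(start, end, step):
            for chain in it.findall("chain"):
                for program in chain.findall("program"):
                    # substitute the iterator placeholder in every attribute
                    attrs = {k: v.replace(name, str(value)) for k, v in program.attrib.items()}
                    print(attrs["exec"], attrs["args"], attrs["stdout"])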
12. BOSS components overview
[Diagram: user CLI, administrator CLI, Python interface and user GUI (plus a possible pro-active UI) on top of the BOSS kernel (APIs, kernel objects such as BossTask, database, scheduler); the kernel talks to the Grid or local scheduler, to BOSS logging and monitoring, and to BOSS on the WN (jobExecutor, tar ball with configuration file and executables)]
- BOSS has two parts: (1) BOSS on the UI and (2) BOSS on the WN.
- BOSS on the UI has two further sub-layers: (a) the user interface and (b) the BOSS kernel.
- The BOSS kernel further includes the APIs (BossUserSession, BossAdministratorSession) and the kernel objects (BossConfiguration, BossTask, BossDataBase and BossScheduler); see the sketch after this list.
- BOSS on the WN has the level structure Task, Chain, Program, userExecutable.
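The API/kernel layering above can be pictured with a few illustrative Python stubs; these are not the real BOSS classes or method signatures, just a sketch of a user-facing session object delegating to kernel objects:

    # Illustrative stubs only: a user-facing API layer delegating to kernel objects.
    class BossDatabase:
        def insert_task(self, xml_file):
            print(f"bookkeeping: registered {xml_file}")
            return 1                              # pretend task id

    class BossScheduler:
        def submit(self, task_id):
            print(f"scheduler: submitted task {task_id}")

    class BossTask:
        def __init__(self, db, scheduler):
            self.db, self.scheduler = db, scheduler
        def declare_and_submit(self, xml_file):
            task_id = self.db.insert_task(xml_file)
            self.scheduler.submit(task_id)
            return task_id

    class BossUserSession:                        # API layer used by CRAB or the CLI
        def __init__(self):
            self.task = BossTask(BossDatabase(), BossScheduler())
        def submit(self, xml_file):
            return self.task.declare_and_submit(xml_file)

    print(BossUserSession().submit("task.xml"))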
13. BOSS internal data flow
[Diagram: the user/CRAB and the administrator supply task.xml and schema.xml through the user and administrator APIs; a Job.tar (job.xml, monitoring, ORCA, input, ...) is handed to the scheduler via JDL; on the WN the wrapper (Shreek) writes a journal file and sends monitoring data back to the BOSS logging database (tables such as JOB_ID / START_TIME / STATUS and JOB_ID / TYPE / INPUT, e.g. rows "1 ORCA FILE1", "2 ORCA FILE2")]
14. BOSS internal workflow
15. BOSS WN and UI reorganization proposal
[Diagram: job.tar carries the job components; the JobExecuter reads the file of job configuration and drives core services (blackboard, JobMonitor, programChaining, monitor interface) plus plug-ins (e.g. a pro-active plug-in/service)]
- Everything variable goes into the configuration file, which keeps the remaining components simple; no recompilation is needed when new components are added.
- The configuration file is created during the job preparation stage and owns all the information needed.
- The JobExecuter only has to interpret the configuration file (see the sketch after this list).
- Core services can talk to each other, so they may depend on each other.
- A plug-in only talks to services, which keeps it independent enough to be a plug-in.
- The tar ball job.tar is created during the job preparation stage, synchronized with the creation of the configuration file. Any service or plug-in referenced by the configuration files (logically, or even physically, there may be more than one configuration file) should be added to the tar ball as well.
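A minimal Python sketch of the idea that the JobExecuter only interprets a configuration file and loads whatever components it names; the JSON config format, the module names and the common Component interface are assumptions, not part of the proposal:

    # Minimal sketch: a config-driven executer; adding a component means editing the
    # configuration, not recompiling the executer.
    import importlib
    import json

    class Blackboard(dict):
        """Shared state that core services and plug-ins read and write."""

    with open("job_config.json") as f:   # hypothetical config built at job preparation time
        config = json.load(f)            # e.g. {"services": ["services.monitor"], "plugins": [...]}

    blackboard = Blackboard()
    components = []
    for kind in ("services", "plugins"):
        for module_name in config.get(kind, []):
            module = importlib.import_module(module_name)    # assumed to ship inside job.tar
            components.append(module.Component(blackboard))  # assumed common constructor

    for component in components:
        component.run()                                      # assumed common entry point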
16. Structure of levels 2, 3 and final
[Diagram: level 1, level 2, level 3, level final]
- The chaining configuration file owns all the information needed to chain programs together; this leaves the programChaining program clean and stable.
- The chaining configuration file is created during the chain preparation stage (a step of the job preparation stage).
- programChaining interprets the chaining configuration file and executes its commands (see the sketch after this list).
- The job configuration file, the chaining configuration file and the program configuration file have a similar (or the same) structure and functionality. They can even share the same physical file, but logically they should remain distinct to keep things flexible.
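A minimal Python sketch of linear chaining driven by a chaining configuration; the configuration format and program names here are invented for illustration:

    # Minimal sketch: run programs in order as described by a chaining configuration.
    import subprocess

    chain_config = [
        {"exec": "./pre.sh",  "stdout": "pre.out",  "stderr": "pre.err"},
        {"exec": "./test.pl", "stdout": "out_0",    "stderr": "err_0"},
        {"exec": "./post.sh", "stdout": "post.out", "stderr": "post.err"},
    ]

    for step in chain_config:
        with open(step["stdout"], "w") as out, open(step["stderr"], "w") as err:
            result = subprocess.run([step["exec"]], stdout=out, stderr=err)
        if result.returncode != 0:   # simple linear chaining: stop at the first failure
            break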
17. BOSS status and plans
- New functionality has been implemented or is being written:
  - Tasks, jobs and executables.
  - XML task description.
  - C and Python APIs.
  - Basic executable chaining; currently only the default chainer with linear chaining.
  - Separate logging and monitoring DBs.
  - DBs implemented in either MySQL or SQLite (more to come).
  - Optional RT monitoring with multiple implementations, currently only MonALISA and direct MySQL connections (to be deprecated).
- To be done in the near future:
  - Allow chainer plug-ins.
  - Implement more RT monitoring solutions, e.g. R-GMA.
  - Look at writing the wrapper in a scripting language, e.g. Perl/Python.
  - Optimize the architecture and separate data from functionality.
18. Grid organizations
- Globus Toolkit services: resource management (Grid Resource Allocation Management Protocol, GRAM); information services (Monitoring and Discovery Service, MDS); security services (Grid Security Infrastructure, GSI); data movement and management (Global Access to Secondary Storage, GASS, and GridFTP).
- LHC experiments: ALICE, ATLAS, CMS, LHCb. Common projects: PI, POOL/CondDB, SEAL, ROOT, Simulation, SPI, 3D (GDA).
- Goals: to build a consistent, robust and secure Grid network that will attract additional computing resources; to continuously improve and maintain the middleware in order to deliver a reliable service to users; and to attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
- There are many national-scale Grid collaborations. For example, GridPP is a UK national collaboration funded by the UK government through PPARC as part of its e-Science Programme; it collaborates with CERN and EGEE.
19. Backup slides
20. BOSS key components
[Diagram: the same data flow as slide 13 (task.xml, schema.xml, user/administrator APIs, Job.tar, scheduler/JDL, wrapper/Shreek, journal file, monitoring, BOSS logging), with the kernel objects BossTask, BossScheduler and BossDB shown at the centre]
21. BOSS level structure on the WN
[Diagram: levels 0 through final on the WN: the JobExecuter (wrapper) with Blackboard, JobMonitor and a possible pro-active interface; JobChaining; programExecutor1/programExecutor2 with pre-filter, runtime-filter and post-filter; and the user executable at the final level]
- At least level 0, level 1 and level final have to be there.
- Level 2 and level 3 can be omitted; this is easily achieved by rewriting the configuration file.
- A new level can easily be inserted between level 1 and level final by rewriting the configuration file.
- Every level may or may not have its own configuration file.
- The JobExecutor controls all processes on the worker node.
- A pro-active process is not planned for the first release.
- JobChaining: simple linear program execution in the first release, allowing the possibility of plug-ins (e.g. Shreek) in the future.
- Simple monitoring via output-stream filters is planned for the first release; more extensive options will be available later (see the sketch after this list).
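A minimal Python sketch of monitoring via an output-stream filter; the executable name and the line format being matched are invented for illustration:

    # Minimal sketch: a runtime filter scans the user executable's stdout line by line
    # and records values of interest in a journal file while the job runs.
    import re
    import subprocess

    pattern = re.compile(r"EVENT\s+(\d+)\s+PROCESSED")   # hypothetical output format

    proc = subprocess.Popen(["./userExecutable"], stdout=subprocess.PIPE, text=True)
    with open("journal.txt", "w") as journal:
        for line in proc.stdout:                 # runtime filter: runs while the job runs
            match = pattern.search(line)
            if match:
                journal.write(f"last_event {match.group(1)}\n")
                journal.flush()                  # keep the journal usable for live monitoring
    proc.wait()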
22. BOSS history
- People: W. Bacchi, G. Codispoti, C. Grandi (INFN Bologna); D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang (Imperial College London).
- Old BOSS (Italian group: Claudio, 2001-): logging, bookkeeping, scheduler, monitoring.
- GROSS (Imperial group: Hugh, Stuart, Dave, Barry, Yong, 2003-2005): CMS-specific functionality, groups of jobs.
- Bologna-Imperial joint meeting (Stuart, Dave, Barry, Yong, Claudio and all of the Bologna group), 17/12/2004, Bologna.
- Joint meeting (Stuart, Dave, Yong, Henry, Claudio), 02-03/02/2005, Imperial: new BOSS with the task/job/program structure.
- CMS WM workshop, 14-15/07/2005, Padova: adopted the XML structure, defined the framework and priorities.
- BOSS group meeting, 12-14/10/2005, Bologna.
23. Schema configuration file proposal

    <TABLE NAME="TASK">
      <ELEMENT NAME="TASK_ID"      TYPE="INTEGER PRIMARY KEY" DAUGHTER="CHAIN" />
      <ELEMENT NAME="ITERATORS"    TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="TASK_INFILES" TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="DECL_USER"    TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="DECL_PATH"    TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="DECL_TIME"    TYPE="INTEGER NOT NULL DEFAULT 0" />
    </TABLE>

    <TABLE NAME="CHAIN">
      <ELEMENT NAME="CHAIN_ID"        TYPE="INTEGER PRIMARY KEY" DAUGHTER="PROGRAM" MOTHER="TASK" />
      <ELEMENT NAME="TASK_ID"         TYPE="INTEGER NOT NULL DEFAULT 0" TAG4DB="MOTHER_ID" />
      <ELEMENT NAME="SCHEDULER"       TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="RTUPDATER"       TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="SCHED_ID"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="CHAIN_CLAD_FILE" TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="LOG_FILE"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="SUB_USER"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="SUB_PATH"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="SUB_TIME"        TYPE="INTEGER NOT NULL DEFAULT 0" />
    </TABLE>

    <TABLE NAME="PROGRAM">
      <ELEMENT NAME="PROGRAM_ID"    TYPE="INTEGER PRIMARY KEY" MOTHER="CHAIN" />
      <ELEMENT NAME="CHAIN_ID"      TYPE="INTEGER NOT NULL DEFAULT 0" TAG4DB="MOTHER_ID" />
      <ELEMENT NAME="TYPE"          TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="EXEC"          TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="ARGS"          TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="STDIN"         TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="STDOUT"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="STDERR"        TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="PROGRAM_TIMES" TYPE="TEXT NOT NULL DEFAULT ''" />
      <ELEMENT NAME="INFILES"       TYPE="TEXT NOT NULL DEFAULT ''" TAG4SCHED="IN_FILES" />
      <ELEMENT NAME="OUTFILES"      TYPE="TEXT NOT NULL DEFAULT ''" TAG4SCHED="OUT_FILES" />
      <ELEMENT NAME="OUTTOPDIR"     TYPE="TEXT NOT NULL DEFAULT ''" />
    </TABLE>

    <TABLE NAME="PROGRAMTYPE">
      <ELEMENT NAME="NAME"           TYPE="CHAR(30) NOT NULL PRIMARY KEY" TAG4DB="UPDATE_KEY" TAG4SCHED="META_DATA" />
      <ELEMENT NAME="PROGRAM_SCHEMA" TYPE="TEXT NOT NULL DEFAULT ''" TAG4DB="INSERT_FILE_CONTENT,CREATE_TABLE_CONTENT" TAG4SCHED="PROGRAMTYPE_CONTENT" />
      <ELEMENT NAME="COMMENT"        TYPE="VARCHAR(100) NOT NULL DEFAULT ''" TAG4SCHED="META_DATA" />
      <ELEMENT NAME="PRE_BIN"        TYPE="TEXT NOT NULL DEFAULT ''" TAG4DB="INSERT_FILE_CONTENT" TAG4SCHED="PROGRAMTYPE_CONTENT" />
      <ELEMENT NAME="RUN_BIN"        TYPE="TEXT NOT NULL DEFAULT ''" TAG4DB="INSERT_FILE_CONTENT" TAG4SCHED="PROGRAMTYPE_CONTENT" />
      <ELEMENT NAME="POST_BIN"       TYPE="TEXT NOT NULL DEFAULT ''" TAG4DB="INSERT_FILE_CONTENT" TAG4SCHED="PROGRAMTYPE_CONTENT" />
    </TABLE>
24. Dataset and PhEDEx
How to understand PhEDEx?
Example dataset jm_Hit245_2_g133/jm03b_qcd_120_170:

    01D4DF4E-A4EB-4047-A94A-1A550265872F.zip
    866822E9-244B-4C1D-BF1D-080E71D343F0.zip
    021C736B-A2A4-43E2-9F25-829F9E7E8F35.zip
    8B3ED1AD-14AB-4696-BADD-71119EA7652A.zip
    ... 135 files in total, 200 GB

- Manual dataset transfer:
  - find out where to copy the dataset from
  - copy the files one by one
  - publish the files into the catalogue one by one
  - write private scripts to do the transfer
- Using PhEDEx:
  - PhEDEx has a collection of scripts or script templates
  - PhEDEx provides a framework (a set of agents) to support the scripts
  - PhEDEx has a central database (TMDB) to coordinate every step in the transfer process
  - PhEDEx has a website to monitor transfer status and handle dataset requests
  - ...
File catalogue example:

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <POOLFILECATALOG>
      <File ID="01D4DF4E-A4EB-4047-A94A-1A550265872F">
        <physical>
          <pfn filetype="" name="dcap://gfe02.hep.ph.ic.ac.uk:22128/pnfs/hep.ph.ic.ac.uk/data/cms/phedex/jm03b_qcd_120_170/Hit/01D4DF4E-A4EB-4047-A94A-1A550265872F.zip"/>
        </physical>
        <logical>
          <lfn name="ZippedEVD.121000153.121000154.jm_Hit245_2_g133.jm03b_qcd_120_170.zip"/>
        </logical>
        <metadata att_name="dataset" att_value="jm03b_qcd_120_170"/>
        <metadata att_name="jobid" att_value="1126203628"/>
        <metadata att_name="owner" att_value="jm_Hit245_2_g133"/>
      </File>
      <File ID="866822E9-244B-4C1D-BF1D-080E71D343F0">
      </File>
    </POOLFILECATALOG>
25. Developer's point of view of PhEDEx
26. User's point of view of PhEDEx
[Diagram: a node (IC) with its configuration runs agents such as FileDownloadDestination, FileDownload, FileDownloadVerify, FileDownloadPublish, FileDownloadDelete, FilePFNExport, PFNLookup, NodeRouter and FileRouter; the agents are coordinated through the central TMDB and a WWW interface, and other nodes (e.g. RAL) attach in the same way]
- The user needs to write glue scripts, which are driven by the agents (see the sketch below).
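A purely illustrative glue-script sketch in Python: the calling convention (source PFN and destination as command-line arguments) and the use of globus-url-copy as the copy tool are assumptions, not the real PhEDEx contract:

    # Hypothetical glue script: the agent is assumed to pass a source PFN and a destination.
    import subprocess
    import sys

    source_pfn, destination = sys.argv[1], sys.argv[2]

    # delegate the actual transfer to a site-specific copy tool (tool choice is an assumption)
    result = subprocess.run(["globus-url-copy", source_pfn, destination])
    sys.exit(result.returncode)      # the agent decides success or failure from the exit code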
27. Event data model
- Data are defined by an event data model (EDM).