Title: US-CMS Core Application Software Status Report
1. US-CMS Core Application Software Status Report
- Ian Fisk
- SCOP Review
- October 26, 2001
2. Outline
- Quick Introduction to the Core Application Software Project (CAS)
- Scope of the project
- Division of labor
- Status
- Plans
- Progress
- Problems
- News From Management
- Milestones and schedule
- Summary
3. Introduction to CAS
- CAS is the US-CMS Core Application Software Project. We are involved in 4 main areas:
- WBS 2.1 CMS Software Architecture
- Core Framework Development
- Sub-System Architecture Development
- CAFÉ (CMS Architecture Forum and Evaluation)
- WBS 2.2 IGUANA
- Graphical User Interfaces
- Visualization
- Data Browsing, plotting, fitting
- WBS 2.3 Distributed Data Management and Processing
- Evaluation, Testing, and Integration of Grid Tools
- Distributed Process and Database Management
- System Services and Load Balancing
- Development of Production Tools
- Development of Distributed Production and Analysis Prototypes
- System Simulation and System Scalability Development
- WBS 2.4 Support
4. Inside International CMS
- CPT is a combination of Computing, Physics, and Trigger/DAQ. Computing has been divided into 7 sub-projects, with 5 cross-project groups to handle interactions between projects.
- CCS (Core Computing Software) sub-projects:
  1. Computing Centres
  2. General CMS Computing Services
  3. Architecture, Frameworks / Toolkits
  4. Software Users and Developers Environment
  5. Software Process and Quality
  6. Production Processing Data Management
  7. Grid Systems
- CAS work falls within CCS.
- PRS (Physics Reconstruction and Selection) groups:
  9. Tracker / b-tau
  10. E-gamma / ECAL
  11. Jets, Etmiss / HCAL
  12. Muons
- TriDAS (Online Software):
  7. Online Filter Software Framework
  8. Online Farms
- Cross-project groups:
- RPROM (Reconstruction Project Management)
- SPROM (Simulation Project Management)
- CPROM (Calibration Project Management), to be created
- CAFE (CMS Architectural Forum and Evaluation)
- GPI (Group for Process Improvement), recently created
5. Introduction to CAS
- CAS currently employs 8 on-project developers at 5 institutions (FTE fractions by WBS area):
- Michael Case, UC Davis: 2.1 (75%), 2.4 (25%)
- Greg Graham, Fermilab: 2.3 (75%), 2.4 (25%)
- Iosif Legrand, Caltech (CERN): 2.3 (75%), 2.4 (25%)
- Vladimir Litvin, Caltech: 2.1 (50%), 2.3 (25%), 2.4 (25%)
- Ianna Osborne, Northeastern (CERN): 2.2 (75%), 2.4 (25%)
- Natalia Ratnikova, Fermilab: 2.4 (50%)
- Lassi Tuura, Northeastern (CERN): 2.1 (75%), 2.4 (25%)
- Hans Wenzel, Fermilab: 2.1 (25%), 2.4 (25%)
- Tony Wildish, Princeton (CERN): 2.3 (50%), 2.4 (50%)
- FTE totals by WBS: 2.1: 2.25; 2.2: 0.75; 2.3: 2.25; 2.4: 2.75; total: 8.0
6. Progress and Plans
- In the past we have focused heavily on technical progress during these reviews.
- In the next several years CMS has several high-level milestones related to the software project:
- DAQ TDR 2002
- Computing TDR 2003
- Physics TDR 2004
- 20% Data Challenge in early 2004
- Today we will try to talk about the pieces needed to complete the upcoming milestones, and the technical progress being made toward completing those pieces.
- We break the discussion into 3 pieces:
- Software for Reconstruction and Simulation
- Software for Analysis
- Software for Distributed Computing
7. Simulation and Reconstruction
- CMS software has a data store, a central framework, a number of components, and a variety of support packages, including visualization tools.
8. The Database
- Final choice currently scheduled for the end of 2002.
- Considerable effort required to make a reasonable choice.
- The long-term commercial viability of Objectivity is far from assured.
- ORACLE 9i is being investigated by a CMS CERN fellow. Preliminary indications are that some of the object-handling aspects have not been completely productized yet.
- 50 areas of concern have been submitted to IT to determine if there are any show-stoppers.
- Root-IO is being examined.
- Workshop on October 10-11, with presentations by CAS engineers Tony Wildish and Hans Wenzel.
- This is a key area of concern to CMS.
- Root IO Workshop conclusions and presentations: http://cmsdoc.cern.ch/cms/software/presentations/rootws01/Conclusions.html
9. CMS Software Framework
- This year CMS reorganized the central framework into the COBRA project.
- Includes the elements of CARF and Utilities.
- More efficient reuse of code.
- SCRAM-managed project; Ianna Osborne is responsible for modularization.
- CMS software packages (ORCA, OSCAR, IGUANA) have been modified to use the new framework.
- Over the next half year, CAS engineers Tony Wildish and Greg Graham will investigate how to separate the production and file-handling aspects of COBRA from the event and reconstruction aspects.
10. CMS Reconstruction and Simulation Packages
- CMSIM (GEANT3 Fortran-based simulation): the workhorse since the first CMS Fortran code. Still supported, but expected to be replaced by OSCAR. Serves as a simple test application for Grid developers.
- OSCAR (GEANT4-based simulation): needed for the Physics TDR. Fully functional software expected by the end of 2001, physics validation through next year, production software expected by the end of 2002.
- FAMOS (fast simulation): needed to complete the Physics TDR and the 20% Data Challenge. Currently in the proof-of-concept phase; short of people.
- ORCA (reconstruction): the CMS reconstruction code, well advanced. Needed for the DAQ TDR, Physics TDR, Computing TDR, and Data Challenges.
- More information in David Stickland's second talk.
11. Technical Progress
- Last year OSCAR could not do full detector simulation; this year several hundred events can be simulated.
- Big infusion of people: a new coordinator, new developers, and a new librarian, Hans Wenzel, who has handled release and configuration.
- Fast development requires frequent releases.
- Need to progress rapidly to being able to reliably simulate thousands of events for physics validation.
- ORCA continues to develop and improve.
- 68 developers supporting 190k lines of code.
- Most reconstructed objects are stored persistently with the release of ORCA 5, including tracks with a reasonable subset of functionality.
- CAS engineer Vladimir Litvin has been working on the framework used in the calorimetry reconstruction. The main developer left CMS and the code had been unsupported since.
- First phase expected before the end of October.
12. Support Packages
- Visualization was listed as a support package, a useful debugging tool. It is part of the IGUANA package and will be covered under analysis.
- CAS engineer Michael Case has been participating in the development of the Detector Description Database (DDD): a consolidated store of detector information used by CMS software clients, needed for consistent geometry input to CMS software packages.
- Lots of other supporting packages: SCRAM for configuration, DAR for distribution, OVAL for validation.
- Functional prototype, Nov. 2001:
- Basic geometry and materials
- Basic core functionality
- XML schema
- Fully functional prototype, April 2002:
- All CMS required solids
- All CMS positioning parameters
- Numbering scheme
- Prototype of XML DDD editor
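The idea of a consolidated XML detector description can be sketched in a few lines. The fragment below is purely illustrative: the element names, attributes, and units are invented for this example and do not reflect the actual DDD schema, which was still being defined at the time.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the spirit of a detector-description XML:
# materials, solids, and logical parts that tie them together.
DDD_XML = """
<DDDefinition>
  <MaterialSection>
    <Material name="Silicon" density="2.33"/>
  </MaterialSection>
  <SolidSection>
    <Box name="TrackerBox" dx="10.0" dy="10.0" dz="30.0"/>
  </SolidSection>
  <LogicalPartSection>
    <LogicalPart name="Tracker" solid="TrackerBox" material="Silicon"/>
  </LogicalPartSection>
</DDDefinition>
"""

def load_logical_parts(xml_text):
    """Resolve each LogicalPart to its solid dimensions and material density,
    so every software client reads one consistent geometry source."""
    root = ET.fromstring(xml_text)
    solids = {s.get("name"): s for s in root.find("SolidSection")}
    materials = {m.get("name"): m for m in root.find("MaterialSection")}
    parts = {}
    for part in root.find("LogicalPartSection"):
        solid = solids[part.get("solid")]
        material = materials[part.get("material")]
        parts[part.get("name")] = {
            "solid": solid.get("name"),
            "dims": tuple(float(solid.get(k)) for k in ("dx", "dy", "dz")),
            "density": float(material.get("density")),
        }
    return parts

parts = load_logical_parts(DDD_XML)
print(parts["Tracker"])
```

The point of the indirection (solids and materials referenced by name rather than inlined) is that a simulation client and a reconstruction client resolving the same names are guaranteed the same geometry.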
13. CMS Architecture: CAFE
- CAFE, a cross-project task force, was set up at the end of 2000 to evaluate, document, and provide feedback for improvements.
- Documentation and evaluation of the existing architecture, design, use-cases, scenarios, and requirements.
- Four task forces:
- Online: helped to define the simulation program.
- Framework: gave recommendations for the organization of sub-projects and the reuse of core software.
- Analysis: working on a CMS analysis requirements document to be fed to IGUANA.
- Distributed Computing: working on CMS requirements for the grid projects, as well as CMS Distributed Production and Distributed Analysis requirements.
- A CAS engineer is working on the top-level description document for the CMS central software framework.
- CAFE hasn't succeeded in providing the evaluations or the documentation that was expected. CMS is still trying to determine how to revitalize this project.
14. Analysis
- In order to complete the high-level milestones, a lot of analyses must be performed.
- 2002 DAQ TDR:
- Physics analysis for high-level trigger studies
- 2003 Computing TDR:
- Physics analysis techniques, both local and distributed
- Verification of on-line trigger routines, reconstruction code, required networking, etc.
- 2004 Physics TDR:
- Prototypical analysis for everything
- 2004 20% Data Challenge:
- A 20% test of the entire system, starting from the raw detector readout, through triggering, reconstruction, and analysis
15. Analysis and Data Accessibility
- CMS has increased data accessibility through the use of the database, which allows more transparent access to many levels of the data.
- Loop over the data summary quickly.
- Access lower levels of data for more detailed analysis.
- Possible to visualize even the raw data for a small set of selected events.
- Unfortunately, very few people have been able to take advantage of the improved accessibility.
- Accessing the database directly through the framework is possible, but until recently there hasn't been a summary format which maintains the connections; ntuples, ROOT files, etc. break the connections.
- Without a workable summary format, one must store the summary another way or loop over higher levels of data for each analysis step.
- This has been painful due to deficiencies in the staging system at CERN.
- This led almost all analysis groups to write an alternative summary format:
- Muon writes ROOT files
- Jet/Met writes ntuples
16. Summary Format
- The current front runner for summary formats is the use of tags.
- Tags can store small amounts of data, like ntuples or ROOT files.
- Tags can be looped over quickly for analysis jobs.
- Tags can be stored so that connections are maintained to more detailed levels of the database.
- Need physics groups to help define the AOD (Analysis Object Data).
- Good example code exists for creating generic tags.
- The tools used to analyze the summary, whether tags or some other format, are still under discussion.
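A minimal sketch of the tag idea, with invented field names: each tag is a small per-event record that can be scanned quickly, and each carries a key back into the detailed store so that selected events can be dereferenced in full. This is illustrative only; the real tags live in the object database, not in Python dictionaries.

```python
from dataclasses import dataclass

@dataclass
class EventTag:
    event_id: int      # key back into the detailed event store
    et_miss: float     # small, ntuple-like summary quantities
    n_jets: int

# Stand-in for the full database of reconstructed events.
DETAILED_STORE = {
    1: {"jets": [55.0, 40.0], "tracks": 212},
    2: {"jets": [20.0], "tracks": 180},
    3: {"jets": [90.0, 35.0, 22.0], "tracks": 340},
}

tags = [
    EventTag(1, et_miss=42.0, n_jets=2),
    EventTag(2, et_miss=8.5, n_jets=1),
    EventTag(3, et_miss=61.0, n_jets=3),
]

# Fast loop over the summary; only interesting events are dereferenced.
selected = [t for t in tags if t.et_miss > 30.0 and t.n_jets >= 2]
full_events = {t.event_id: DETAILED_STORE[t.event_id] for t in selected}
print(sorted(full_events))
```

This is exactly what ntuples and ROOT files cannot do here: they carry the summary quantities but drop the `event_id` link back into the database, so re-reading the detailed data requires a separate bookkeeping step.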
17. Analysis Tools
- No unique choice satisfies both users and developers.
- Users tend to use any tool with the required functionality.
- Developers worry about quality, integration, and support issues.
- The idea is to take advantage of existing tools as much as possible, but to combine their functionality.
- Create a uniform architecture and interfaces for a multitude of tools: interoperable components/plug-ins.
- Allows custom functionality to be achieved without the manpower required for creating completely custom tools.
- Workshop at the next CPT week meeting:
- Explain current ideas and plans to the physics groups
- Demonstrations of generic analysis packages
- Reactions of physicists and developers
- Demonstrations of preliminary integrations of analysis modules and CMS software
- Get input from the PRS groups about desired and required functionality
18. New Analysis Architecture
- The new analysis architecture is being written by a CAS engineer.
- Relies on a very small, very flexible kernel.
- Uniform architecture and interfaces for a multitude of tools.
- Interoperable components/plug-ins.
- Consistent with HEP trends (e.g. HEPVis 2001).
- Consistent with Lizard (C++), JAS (Java), HippoDraw (C++/Java).
- Close links to CMS data without strong coupling of software.
- The first implementation of the new architecture will be released in October 2001.
19. IGUANA
- CAS engineers Ianna Osborne and Lassi Tuura have led the development of IGUANA (Interactive Graphics for User ANAlysis).
- Main IGUANA focus: interactive detector and event visualisation.
- High-performance 2D/3D graphics
- Graphical user interfaces
- Data browsers
- Integration of other tools and components
- The goal is to provide a common look and feel for the CMS interactive graphical applications.
- Interactive analysis is not considered a primary goal; it is assumed that this functionality will be provided by other tools (JAS, HippoDraw, Lizard, ROOT, or OpenScientist).
20. ORCA Visualisation with IGUANA
21. ORCA Visualisation
- Based on the generic IGUANA toolkit, with CMS-specific extensions for:
- Detector geometry:
- Geant3 detector geometry
- Reconstruction geometry for the Tracker
- Event data:
- Muon: DT, CSC, and RPC sim hits; DT and CSC track segments; CSC rec hits; reconstructed and simulated tracks
- Tracker/b-tau: simulated and reconstructed tracks, measurements with directions, sim hits
- ECAL/E-gamma: simulated and reconstructed hits
- HCAL/Jet-MEt: digits, jets
22. Sim Hits and Sim Tracks
(Figure: Z-slice view of simulated hits and tracks.)
setenv OO_FD_BOOT cmsuf01/cms/reconstruction/user/jet0501/jet0501.boot
InputCollections /System/jetDigis_1033_CERN/eg_ele_pt1050_1033/eg_ele_pt1050_1033
23. OSCAR (GEANT4) Visualisation Using IGUANA
- IGUANA Viewer displaying an OpenInventor scene
- Control of an arbitrary GEANT4 tree
- Correlated picking
24. OSCAR Visualisation: Next Steps
- Integration of the detector overlap tool (Martin Liendl).
- Extending the scope of the configuration wizard. Example extension (a trivial wizard): queried from the plug-in database, located on request, and bound to the IGUANA G4 Run Manager.
25. Software for Distributed Computing
- CMS Distributed Computing System
- Prototypes of several of the dedicated facilities exist:
- Part-time prototype Tier0 facility at CERN
- Full-time prototypical Tier1 facility at Fermilab
- Several full-time prototype Tier2 facilities in the US and Italy
- More shared production facilities, many of which will eventually be entries in the chart.
26. Production Software
- The first production was performed at CERN (by David Stickland and Tony Wildish).
- The production needed to complete TDRs and physics studies rapidly overwhelms the capabilities of CERN.
- Need to take advantage of computing resources, both dedicated and shared, at remote facilities.
- How do you arrange for a lot of people to rapidly become production managers? Clone and distribute David and Tony?
- How do you maintain the consistency of jobs run all over the world?
- How do you transfer the results to central facilities for analysis?
- A few files at a few centers is easy, but the complexity grows very rapidly.
- Need easy-to-use common production tools which can consistently specify and execute production jobs, flexible and site-independent enough to be used everywhere.
- Need applications to transfer and manage data from remote sites.
27. IMPALA Production Tools
- IMPALA: Intelligent Monte Carlo Production and Analysis Local Administrator.
- IMPALA was created in response to the CMS need for reliable production tools.
- The production scripts were initially developed by Hans Wenzel, then ported to CERN and made site-independent by Greg Graham.
- Now used by almost all CMS production centers; has resulted in smoother and more reproducible production.
- At the time of the last review, only CERN and Fermilab had successfully run all production steps. Now several regional centers have succeeded.
- IMPALA is implemented as bash scripts. This allows for good functionality, but it is hitting the limits of complexity.
- As more functionality and site independence are desired, a more flexible implementation is needed.
- The next set of job specification tools, called MC_runjob, a joint project between D0 and CMS, is implemented in Python.
28. MC_runjob Implementation
- Currently in its initial release.
- Allows specification and chaining of executables.
- Allows production jobs to be templated, reducing the manual configuration required of production managers.
- Has a GUI.
- The first big test is a 12-million-event sample for calibration, which will be run from generation through simulation, reconstruction, and analysis in a single job.
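The templating-and-chaining idea can be sketched in a few lines of Python. This is not the MC_runjob interface; the step names (`generate`, `simulate`, `reconstruct`) and parameters are invented for illustration. The point is that a production manager edits one configuration dictionary instead of many hand-written scripts.

```python
from string import Template

# One shared job configuration drives every step of the chain.
JOB_CONFIG = {"dataset": "eg_ele_pt1050", "nevents": "1000", "seed": "12345"}

# Templated steps: generation -> simulation -> reconstruction.
CHAIN = [
    Template("generate --dataset $dataset --nevents $nevents --seed $seed"),
    Template("simulate --dataset $dataset --nevents $nevents"),
    Template("reconstruct --dataset $dataset"),
]

def render_chain(config):
    """Expand every step template with the shared job configuration."""
    return [step.substitute(config) for step in CHAIN]

for cmd in render_chain(JOB_CONFIG):
    print(cmd)
```

Because every command is rendered from the same `JOB_CONFIG`, the seed and dataset name cannot drift between steps, which is the kind of consistency guarantee that hand-edited scripts struggle to provide.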
29. GDMP
- The Grid Data Mirroring Package, developed by CMS, PPDG, and EDG, is an example of CMS successfully interacting with the Grid projects.
- Tools are needed to transfer and manage results run at remote centers: easy to handle manually with a few centers, impossible with lots of data at many centers.
- GDMP is based on Globus middleware and a flexible architecture.
- The Globus Replica Catalogue was recently implemented to handle file-format-independent replication; formerly GDMP could only manage Objectivity data files.
- Successfully used to replicate about 1 TB of CMS data during tests.
- A GDMP heartbeat monitor has been proposed by CAS engineer Greg Graham: it automatically verifies that the clients and servers are working and notifies users when problems occur.
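The heart of any such heartbeat monitor is a staleness check, sketched below under invented names (the server list and timeout are hypothetical, and the proposed GDMP monitor's actual design may differ): record when each server last responded, and flag any that have gone quiet for longer than a threshold.

```python
def stale_servers(last_seen, now, timeout=300.0):
    """Return servers whose last heartbeat is older than `timeout` seconds.

    last_seen: mapping of server name -> timestamp of last response.
    """
    return sorted(name for name, t in last_seen.items() if now - t > timeout)

# Example: three hypothetical GDMP servers, one of which has gone quiet.
now = 10_000.0
last_seen = {
    "cern-gdmp": now - 30.0,      # responded 30 s ago
    "fnal-gdmp": now - 900.0,     # quiet for 15 minutes -> flag it
    "caltech-gdmp": now - 120.0,  # responded 2 min ago
}
print(stale_servers(last_seen, now))
```

A notification step (mail to the production manager, say) would then be driven by the returned list, so users learn about a dead transfer endpoint before a replication silently stalls.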
30. Production Issues
- Almost immediately, needed pieces begin to appear:
- Information about the parameters used to run the jobs isn't stored in a convenient way, making it difficult for end users to determine exactly how jobs were produced.
- Job and request tracking is done almost entirely manually: web pages are updated by hand and stored in several places.
- It is difficult to determine whether production farms are running efficiently, and to diagnose and solve problems.
- Solutions proposed:
- The BOSS system, developed by the EU DataGrid and CMS, helps with job specification and tracking through the use of a simple database. US-CMS members are working to include the IMPALA-specified parameters in the database.
- CAS engineer Iosif Legrand is working on monitoring tools for clusters; a variety of technologies are being investigated. In the short term this helps the efficiency of the production system; in the long term, monitoring serves information to advanced Grid services.
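A simple database goes a long way here. The sketch below (a toy schema, not the actual BOSS table layout) shows how job parameters and status in one table turn "how was this sample produced?" and "which jobs failed?" into queries instead of hand-maintained web pages.

```python
import sqlite3

# Toy BOSS-style bookkeeping table; column names are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    job_id INTEGER PRIMARY KEY,
    dataset TEXT, site TEXT, seed INTEGER, status TEXT)""")

jobs = [
    (1, "eg_ele_pt1050", "FNAL",    101, "done"),
    (2, "eg_ele_pt1050", "CERN",    102, "running"),
    (3, "jet_pt50",      "Caltech", 103, "failed"),
]
db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", jobs)

# End users recover exactly how a sample was produced...
params = db.execute(
    "SELECT site, seed FROM jobs WHERE dataset = 'eg_ele_pt1050'").fetchall()

# ...and production managers spot failures without hand-updated pages.
failed = db.execute(
    "SELECT job_id FROM jobs WHERE status = 'failed'").fetchall()
print(params, failed)
```

Feeding the IMPALA-specified parameters into such a table is what closes the provenance gap described above.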
31. Agent-Based Distributed System
- Based on JINI.
- Includes Station Servers (static) that host dynamic services.
- Servers are interconnected dynamically to form a fabric in which mobile agents can travel with a payload of physics analysis tasks.
- The prototype is highly flexible and robust against network outages.
- Amenable to deployment on leading-edge and future portable devices (WAP, iAppliances, etc.): the ultimate system for the travelling physicist!
- A design document was submitted as part of the review material (I. Legrand).
- Studies using the MONARC Simulator (built on the SONN study): task allocation to sites, replica and workflow management.
32. System Scalability
- CMS has an aggressive ramp-up of computing complexity to reach a full-sized system.
- Target is to reach 50% of full complexity by 2004: a T0/T1 with approximately 600 CPU boxes. (CPU boxes are an inadequate measure of full complexity.)
- Double each year: 200 boxes by the end of 2001, 400 boxes by the end of 2002.
- Along with the effort to make use of distributed computing resources, considerable effort is needed to use large amounts of local resources when there are central services.
- CAS engineer Tony Wildish has been instrumental in achieving the complexity milestones.
33. Production Now
- While CMS production has not gone as quickly as we would have liked, the use of distributed facilities has been successful, with 6 fully operational centers and several more expected.
- Unfortunately production still has many manual components.
- Production tool improvements will help some of this.
- Manpower-intensive: requires a production manager at all participating sites.
- The next step is to begin investigating distributed production systems.
- Even for the 20% data challenge, tools to automate the use of remote facilities for production and reconstruction are needed.
- CMS manpower plans for the Tier2 centers do not allow for full-time production people at Tier2 facilities. In the long term we need to reduce the manpower needed for production and, eventually, reconstruction.
34. Distributed Production Systems
- To automate even basic, predictable, schedulable production we need tools:
- Authentication modules
- Distributed schedulers
- Automated tools for data replication
- Job tracking tools
- System monitoring
- Production configuration tools
- To get even a little more advanced:
- Resource discovery tools
- Resource brokers
- Load balancing
- Clearly, substantial support is needed from the Grid projects.
35. Distributed Production Prototypes
- PPDG developed the MOP system:
- Relies on GDMP for file replication
- Globus GRAM for authentication
- Condor-G and local queuing systems for job scheduling
- IMPALA for job specification
- Currently the system is deployed at FNAL, UCSD, Caltech, and U. Wisconsin.
- Allows cmsim jobs to be submitted from a central location, run at remote locations, and have their results returned.
- More complicated ORCA production testing is expected soon.
36. Prototypes and Plans
- The EU DataGrid-developed TestBed1 will run before the end of the year; they hope to achieve functionality similar to MOP.
- Switching from predictable distributed production prototypes to chaotic distributed analysis prototypes is a significant step up in complexity:
- Addition of analysis users
- More complex authentication
- Additional security to protect against the careless and the malicious
- Resource discovery necessary for both data and computing resources
- Load balancing more complicated
- Time estimation tools required
- Good interactions with the Grid projects are necessary.
- CMS needs to clearly define requirements and expectations. The process has started, but clearly a lot of work is needed.
- The new CCS Level 2 task for Grid should define the CMS requirements and evaluate the prototypes.
37. Schedule and Milestones
- The CMS software schedule is tight.
- CMS Architecture Development, WBS 2.1:
- WBS 2.1.2.1 Detector Description Database
- Has a release of a functional prototype in Nov (WBS 2.1.2.1.6).
- More critical is that the fully functional software be released in April (WBS 2.1.2.1.8) for integration with the CMS software packages that will use it.
- Development can progress without it, but it is important to have a common set of information for all software, and it would be good to validate OSCAR with something close to a final system.
- Important to get users' reactions to the tools (Michael Case is scheduled to work with the endcap muon group in California assessing techniques and the user interface).
- It has been difficult keeping the remote engineers working as efficiently as the people resident at CERN. It requires considerable effort on both sides to make it work. The most successful examples are at Fermilab, where there is a large team to work with.
38. Schedule
- WBS 2.1.2.2 OSCAR development is progressing much better than last year.
- In order to perform physics validation of GEANT4, fully functional code must be delivered soon.
- US-CMS has 0.5 FTE working in this area.
- WBS 2.1.2.3 Sub-System Architecture Development:
- This was a new effort this year. The first release was expected at the end of September.
- It has slipped, but the next major production is not scheduled until Jan. 2002.
- It will be critical if the initial modifications are not completed in time to validate.
- WBS 2.1.2.4 Analysis Sub-Architecture:
- On schedule for a release at the end of the month.
- WBS 2.1.2.5 Production Architecture is a new task for the coming year.
- Primarily a US responsibility, with 1 FTE of development effort identified across 2 people.
39. Schedule
- WBS 2.2 IGUANA:
- Several of IGUANA's milestones slipped because the funds for an additional developer, expected at the beginning of the year, were only made available recently. A search is in progress to fill the position.
- Data browsers for CMS software were pushed into the early part of next year.
- Analysis in general has not gone as smoothly as everyone would like.
- WBS 2.3 Distributed Data Management and Processing:
- WBS 2.3.2 Computing Complexity Progression:
- Has involved a tremendous amount of effort to stay on schedule.
- It does not appear to be getting easier, and there is still a long way to go.
- May require that CMS rethink elements of the local computing model to make it simpler and more scalable.
40. Schedule
- WBS 2.3.6 Distributed Production Tools:
- Work has generally gone very well.
- A few milestones were delayed, but production in spring 2002 should run a lot faster and smoother than the Fall 2000 production, which took almost a year to complete.
- The tools are reaching a level of maturity such that we will be able to use a lot of Greg's time in the next year for other development tasks.
- WBS 2.3.6.4 System Monitoring Tools:
- An effort recently started this summer by Iosif Legrand, in cooperation with a CERN CCS developer and two Pakistani students.
- A first release may be available in time for the Spring 2002 production run.
- WBS 2.3.4, 2.3.7, and 2.3.8 (Distributed Data Management, Distributed Production Prototyping, and Distributed Analysis Prototyping) require good interactions with the grid projects:
- So far there have been some interesting prototypes.
- More formal interactions are needed, with better evaluations and requirements from CMS.
41. Milestones: 2.1 Architecture (WBS item; milestone; planned date; actual or revised date)
- 2.1.2.4.1 Use Case Analysis for New Analysis Architecture: planned Feb 9, 01; actual Feb 9, 01
- 2.1.2.1.2 Tools for Conversion of XML to GEANT3: planned March 1, 01; actual March 15, 01
- 2.1.3.1.5 First Release of CAFE Documentation Tools: planned March 1, 01; actual March 1, 01
- 2.1.3.2.1.4 Top Level CARF Description Document: planned May 1, 01; revised Dec 1, 01
- 2.1.2.1.4 Assessment of XML Technology: planned July 1, 01; actual August 1, 01
- 2.1.2.3.1.5 Release of redesign document: planned July 3, 01; actual July 7, 01
- 2.1.2.3.1.7 Release of code for use in production: planned Sept 27, 01; actual Oct 15, 01
- 2.1.2.4.3 Analysis Architecture kernel defined: planned Oct 31, 01
- 2.1.2.1.5 Release of DDD Prototype: planned Nov 15, 01
- 2.1.2.3.1.9 Release of Calo Code Phase 1-4: planned Dec 19, 01
42. Milestones: 2.2 IGUANA (WBS item; milestone; planned date)
- 2.2.1.4.3 Review of baseline GUI technologies: planned Oct 31, 01
- 2.2.2.4.4 Review of baseline graphics technologies: planned Oct 31, 01
- Several IGUANA milestones from Oct were delayed 6 months due to lack of manpower.
43. Milestones: 2.3 DDMP (WBS item; milestone; planned date; actual or revised date)
- 2.3.4.2.4 File Format Replication in GDMP: planned Feb 12, 01; actual Aug 01, 01
- 2.3.7.2 Release of Distributed Production Prototype Design Document: planned Feb 13, 01; actual May 25, 01
- 2.3.4.5.4 Implementation of Security Protocol in Objectivity: planned Mar 01, 01; actual Apr 04, 01
- 2.3.6.3.6 Port of FNAL Scripts to CERN: planned Mar 3, 01; actual Mar 01, 01
- 2.3.6.3.9 Release of Site Independent Scripts: planned May 2, 01; actual May 2, 01
- 2.3.7.4 Test of Distributed Production System between FNAL and U. Wisc: planned July 4, 01; actual Oct 2, 01
- 2.3.7.6 Test of Distributed Production Tier2: planned Aug 16, 01; actual Sep 28, 01
- 2.3.6.3.11 Tools for Job Specification: planned Aug 6, 01; actual Oct 15, 01
- 2.3.6.3.15 Tools for Job Specification in BOSS: planned Sep 25, 01; revised Nov 1, 01
- 2.3.7.8 Evaluation Document for MOP: planned Oct 12, 01; revised Dec 1, 01
- 2.3.6.3.17 Compilation of User Reaction to spec: planned Nov. 28, 01
44. Conclusions
- Lots of progress in a variety of areas, with good contributions from the CAS engineers.
- The CMS software schedule is tight, with high-level milestones approaching quickly.
- Lots of work left to do:
- Choice of a viable database solution
- A validated GEANT4 simulator
- The very aggressive CMS complexity progression
- Production tool improvements: the production scheduled for Feb. 2002, which needs to be completed by summer, is as large as the sample which recently took a year.
- Analysis tools, and reaping the benefits of the improved data accessibility.