Title: L.A.T. Bauerdick, Project Manager
1. Joint DOE/NSF Status Meeting of U.S. LHC Software and Computing
National Science Foundation, Arlington, VA, July 8, 2003
US CMS Software and Computing: Overview and Project Status
- L.A.T. Bauerdick, Project Manager
- Agenda
- Overview and Project Status -- L.A.T. Bauerdick
- Preparations for CMS Milestones -- Ian Fisk
- Discussions
2. Plan for this Talk
- Responses to previous recommendations
- Grids and Facilities
- Software
- Management
- Status FY03
- Issues for the next 12 months
- Production Quality Grid
- Middleware and Grid Services Architecture
- Environment for Distributed Analysis
- Grid Services Infrastructure
- Ian Fisk's talk: Status of and preparation for the Data Challenge 2004 and the Physics TDR
3. Recommendations: Grids
- "The committee encourages both experiments to continue efforts to get the prototype Tier 2 centers into operation and to establish MOUs with the iVDGL Tier 2 centers concerning various production deliverables."
- three sites are fully operational and active participants in CMS production: Caltech, UCSD, U. Florida
- have finalized and signed the iVDGL MOU!
- production deliverables are delineated in the MOU
- "However, the prototype Tier 2 centers are also excellent sites for testing prototype software developments and deployments. They should be allowed to operate for part of the time in research mode, which is consistent with their charter. This research will facilitate opportunistic use of non-ATLAS/CMS owned resources in the future."
- working with the Tier-2 sites on many aspects of Grids
- e.g. dCache tests for pile-up serving, networking studies, replica management, etc.
- defining a joint US LHC / Grid-projects "Grid3" effort with all T2 sites participating
- it also has a specific CS-demo component related to GriPhyN
4. Recommendations: Grids (cont'd)
- "There are good efforts underway to pursue grid monitoring. The committee encourages continued efforts to develop integrated monitoring capabilities that provide end-to-end information, and cooperation with the LCG to develop common monitoring capabilities."
- MonALISA monitoring provides such an end-to-end system for the US CMS Grids (see the sketch below)
- MonALISA is a model for an agent-based Grid services architecture
- it is becoming part of the VDT and thus will be available to any VDT-based Grid environment (LCG and EDG are VDT-based)
- Monitoring in LCG is an ongoing activity
- definition of requirements through the GOC effort
- have seen the initial GOC document and provided feedback; a new document is coming out today
- interest from INFN to provide tools
- the new LCG GOC plan is to have a federated approach, with GOCs in Europe (RAL), the US and Asia
- work on instrumentation of the CMS application has not yet started, but is part of the work plan
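To make the end-to-end idea concrete, here is a minimal, hypothetical sketch of site agents publishing a few metrics to a central collector, so that one view spans farms, storage and wide-area transfers. It does not use the MonALISA or VDT APIs; the endpoint URL, metric names and values are invented for illustration.

```python
# Hypothetical end-to-end monitoring sketch: lightweight agents at each site
# publish a few metrics to a central collector. Not the MonALISA API; the
# endpoint and metric names are invented.
import json, time, urllib.request

COLLECTOR_URL = "http://monitor.example.org/report"   # placeholder endpoint

def collect_site_metrics(site: str) -> dict:
    """Gather a few illustrative end-to-end metrics for one site."""
    return {
        "site": site,
        "timestamp": time.time(),
        "cpu_load": 0.72,          # fraction of farm CPUs busy (dummy value)
        "jobs_queued": 140,        # batch queue depth (dummy value)
        "wan_out_mbps": 310.0,     # outbound transfer rate (dummy value)
    }

def publish(metrics: dict) -> None:
    """Send one metrics record to the central collector as JSON."""
    data = json.dumps(metrics).encode()
    req = urllib.request.Request(COLLECTOR_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)             # fire-and-forget report

if __name__ == "__main__":
    for site in ("FNAL-Tier1", "Caltech-Tier2", "UCSD-Tier2", "UFlorida-Tier2"):
        publish(collect_site_metrics(site))
```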
5. Recommendations: Grids & Facilities
- "US CMS should ensure that FNAL has a networking plan that meets CMS needs as a function of time from DC04 to turn-on (in the presence of other FNAL demands)."
- Offsite data transfer requirements have consistently outpaced the available bandwidth
- the upgrade by ESnet to OC-12 (12/02) is already becoming saturated
- FNAL is planning to obtain an optical network connection to StarLight in Chicago, the premier optical network switching center on the North American continent, which enables network research and holds promise
- for handling peak production loads at times when production demand exceeds what ESnet can supply
- for acting as a backup if the link to ESnet is unavailable
- Potential on a single fiber pair (see the capacity estimate below)
- Wavelength Division Multiplexing (WDM) for multiple independent data links
- capable of supporting 66 independent 40 Gb/s links if fully configured
- initial configuration is 4 independent 1 Gbps links across 2 wavelengths
- allows bandwidth to be configured to provide a mix of immediate service upgrades as well as validation of non-traditional network architectures
- immediate benefit to production bulk data transfers; a test bed for high-performance network investigations and for scalability into the era of LHC operations
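As a rough cross-check of the numbers quoted above, the sketch below computes the aggregate capacities involved. It assumes the per-wavelength rate is 40 Gb/s (reading the slide's "40 GB/s" as gigabits per second) and compares the initial 4 x 1 Gbps configuration against the saturated OC-12 ESnet link (~622 Mb/s).

```python
# Back-of-envelope capacity figures for the FNAL-StarLight dark fiber,
# using the numbers quoted on the slide (40 Gb/s per wavelength is an
# assumption about the intended units).

GBIT = 1e9  # bits per second in one Gb/s

# Fully configured DWDM system on a single fiber pair
channels_full = 66
rate_full = 40 * GBIT              # assumed per-channel line rate
total_full = channels_full * rate_full

# Initial configuration: 4 independent 1 Gbps links over 2 wavelengths
total_initial = 4 * 1 * GBIT

# For comparison: the saturated ESnet OC-12 uplink (~622 Mb/s)
oc12 = 622e6

print(f"Full DWDM build-out  : {total_full / 1e12:.2f} Tb/s aggregate")
print(f"Initial configuration: {total_initial / GBIT:.0f} Gb/s aggregate")
print(f"Initial vs. OC-12    : {total_initial / oc12:.1f}x the current ESnet link")
```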
6. Current T1 Off-Site Connectivity
- All FNAL off-site traffic is carried by the ESnet link
- The ESnet Chicago PoP has a 1 Gb/s StarLight link
- Peering with CERN, SURFnet and CA*net there
- Also peering with Abilene there (for now)
- ESnet peers with other networks at other places
7. T1 Off-Site Connectivity with the StarLight Link
- Dark fiber as an alternate path to StarLight-connected networks
- Also an alternate path back into ESnet
8. Network Integration Issues
- End-to-End Performance: Network Performance and Prediction
- US CMS actively pursues the integration of network stack implementations that support ultra-scale networking: rapid data transactions, data-intensive tasks
- maintain statistical multiplexing and end-to-end flow control
- maintain functional compatibility with the Reno TCP implementation
- The FAST project (Caltech) has shown dramatic improvements over the Reno stack by moving from loss-based congestion control to a delay-based control mechanism
- with standard segment size and fewer streams; sender-side-only modifications
- Fermilab/US CMS is a FAST partner
- as a well-supported user, having the FAST stack installed on facility R&D data servers (first results look very promising)
- aiming at installations/evaluations for integration with the production environments at Fermilab, CERN and the US Tier-2 sites
- working in collaboration with the FAST project team at Caltech and the Fermilab Computing Division
9. Real-World Networking Improvements
- Measured TCP throughput between the Fermilab Tier-1 and CERN (see the throughput estimate below)
- requirements until '05: 30 MBytes/sec average, 100 MBytes/sec peak
[Plots: TCP throughput in Mbit/s (0.1 to 622) as a function of TCP window size and number of streams (1 to 100), measured with the standard stack and with the FAST stack]
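The window-size dependence seen in these measurements follows from the bandwidth-delay product: a single TCP stream cannot exceed window / RTT. The sketch below works this out for an assumed ~120 ms Fermilab-CERN round-trip time; the window sizes are illustrative and not taken from the plots.

```python
# Rough single-stream TCP throughput bound from the bandwidth-delay product:
# throughput <= window_size / round_trip_time.

rtt = 0.120  # seconds, assumed Chicago-Geneva round-trip time

for window_mb in (0.064, 1.0, 8.0, 16.0):   # TCP window in MBytes (illustrative)
    throughput_mbytes = window_mb / rtt     # MBytes/s upper bound
    throughput_mbits = throughput_mbytes * 8
    print(f"window {window_mb:6.3f} MB -> <= {throughput_mbytes:6.1f} MB/s "
          f"({throughput_mbits:7.1f} Mbit/s)")

# To sustain the quoted requirement of 30 MB/s average (100 MB/s peak) on a
# ~120 ms path, a single stream needs a window of roughly 3.6 MB (12 MB peak),
# which is why large windows, many parallel streams, or an alternative stack
# such as FAST are needed in practice.
```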
10. Recommendations: Grids & Facilities
- "US CMS should formulate a clear strategy for dealing with homogeneity vs. interoperability (LCG-x / VDT / DPE) and argue for it within International CMS and the LCG process. We further recommend a preference for interoperability to facilitate opportunistic use of external resources."
- argued for interoperability and (some) heterogeneity within CMS and the LCG
- seem to have found understanding, specifically with the LCG/EDG delays
- developed and deployed DPE as the basis for the Pre-Challenge Production in the US
- a CMS application platform based on VDT and EDG components
- added functionality in terms of Grid services, like VO management, storage resource management, data movement
- advertised in the GDB the idea of a federated, interoperable LHC Grid
- issues like Grid operations, different implementations of middleware services (RLS, MDS), and organizational and political realities like EGEE in Europe
- have seen both resistance and support for these ideas; an ongoing process
- devised and agreed on the next joint US LHC / US Grid projects step: Grid3
- conceived the long-term goal of the Open Science Grid
11. Recommendations: Grids & Facilities
- "US CMS should work with US ATLAS, DOE and NSF to develop a plan for the long-term support of grid packages."
- the Virtual Data Toolkit (VDT) is now the agreed standard for US LHC, LCG and EDG
- so we are establishing the boundary conditions and the players
- many open issues
- but also many developments, e.g. EGEE working with the VDT principals, etc.
- what can we expect from NMI et al.?
12. Recommendations: Grids & Facilities
- "We recommend that the issue of authentication in heterogeneous environments (including Kerberos credentials) for all of CMS International should be further studied."
- have started a project to address these issues: the VOX project, with US CMS, Fermilab, BNL, iVDGL (Indiana, Fermilab)
- US CMS registration services, and local authorization/authentication
- addresses the labs' Kerberos-based security vs. Grid (PKI) security
- many fundamental issues related to security on the Grid -- see also the Fermilab Cyber Security report, an endorsement for Fermilab to tackle this!
- immediate issues like KCA/PKI are addressed through the VOX project
- other issues with Grid authorization will come up very soon
- work with Global Grid Forum activities
- the next issues are privilege management (attributes like privilege, right, clearance, quota), policy-based resource access, and secure credential repositories
- these need to be addressed to even be able to formulate Service Level Agreements!
- see also the TRUST ITR proposal of Foster et al.
13. VOX Architecture
- Many of the components exist, some are being developed
- allows T1 security requirements and the Grid Security Infrastructure to co-exist
- provides registration services for the Virtual Organization (a minimal mapping sketch follows below)
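As an illustration of what such registration plus local authorization has to provide, here is a minimal, hypothetical sketch mapping Grid (PKI) certificate subjects registered with the VO, and lab Kerberos principals, onto local accounts. The names, DNs and data structures are invented; this is not the actual VOX schema or interface.

```python
# Hypothetical mapping a VO registration / local-authorization service might
# maintain. All identifiers below are invented for the sketch.
from typing import Optional

vo_registry = {
    # certificate DN                                   : (VO, local account)
    "/DC=org/DC=doegrids/OU=People/CN=Jane Physicist": ("uscms", "uscms001"),
    "/DC=org/DC=doegrids/OU=People/CN=John Operator":  ("uscms", "uscms002"),
}

kerberos_principals = {
    "jane@FNAL.GOV": "uscms001",   # same person, lab Kerberos credential
}

def authorize(credential: str) -> Optional[str]:
    """Return the local account for a Grid DN or Kerberos principal, or None."""
    if credential in vo_registry:
        vo, account = vo_registry[credential]
        return account if vo == "uscms" else None
    return kerberos_principals.get(credential)

print(authorize("/DC=org/DC=doegrids/OU=People/CN=Jane Physicist"))  # uscms001
print(authorize("jane@FNAL.GOV"))                                    # uscms001
print(authorize("/CN=Unknown User"))                                 # None
```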
14. Recommendations: Software
- "The committee concurs with the US CMS assessment that further reduction of manpower would be detrimental to the ability of the team to complete its mission. US CMS management should not allow the current manpower allocation to erode further and it needs to push for the needed ramp-up in the coming years."
- with the start of the NSF Research Program in FY03 and the funding guidance we have received for FY04, it will be feasible to stabilize and consolidate the CAS effort
- CERN is looking at the experiments' software manpower (including physics)
- a review in September will establish an update on FTE requirements with LCG-AA
- it is also looking at physics software
- we expect that the previously planned (1 FTE) ramp-up for CAS in FY04 is required
- other areas are starting in FY04
- Grid services support, physics support
- this relies on the availability of NSF Research Program funds!
15. Recommendations: Software
- "US CMS should prioritize grid projects and reassign people as needed in order to prevent schedule slip, but should avoid diverting personnel from other high-priority projects to grid projects."
- all Grid-related efforts are now in the User Facilities sub-project
- some re-assignment of deliverables/scope from CAS to UF has occurred, freeing corresponding CAS manpower (1 FTE)
- however, there is an open position at Fermilab for CAS, and we have so far been unsuccessful in filling this slot from existing Computing Division manpower
- "The CAS team should provide vigorous and loud input to the prioritization of the LCG projects in order to ensure completion of those most critical to CMS milestones."
- our CAS engineers have strong institutional and work-team loyalties to CMS
- they are either directly involved in the LCG project (e.g. Tuura, Zhen)
- or work closely with the LCG (Tanenbaum, Wildish, etc.)
- this also occurs reasonably efficiently through the oversight committees: SC2, POB, GDB
16. Recommendations: Software
- "CMS should promote the tightest possible coupling between key CMS software efforts, with their specific needs, and off-project (quasi-external) efforts that will benefit CMS (e.g., GAE R&D)."
- we are actively addressing this
- the main off-project R&D effort on distributed analysis and grid services architecture: small ITRs (Caltech, MIT, Princeton) and the Caltech GAE project
- becoming successful in convincing the LCG and CCS to actively pursue an architectural effort for distributed analysis
- the LCG Grid Applications Group effort on analysis use cases, including the Grid (HEPCAL II)
- ARDA: Architectural Roadmap towards Distributed Analysis (RTAG 11)
- a very useful PPDG CS11 activity on Interactive Analysis
- the June Caltech workshop discussing a Grid services architecture for distributed analysis
- Clarens is being tracked in CMS, with presentations in CMS and CPT weeks
- Clarens will be part of the DC04 analysis challenge
17. Recommendations: Software
- "While the search for NSF funds through the ITR program appears necessary, US CMS CAS management should be wary of the complex relationships and expansive requirements that come into play in such an environment."
- Yes! we are w(e)ary...
- Nevertheless, we have submitted two ITR proposals: DAWN and GECSR
- we also realize the great opportunities the ITR program offers
- we need to have CS directly involved in the LHC computing efforts
- we will need to implement much of what is described in DAWN
- distributed workspaces are at the core of how distributed resources get to the users
- this is new and exciting, and the next step for the Grids
- middleware (OGSI) made available to individuals and communities through dynamic workspaces
- through the ITR program the NSF should foster a scientific approach to the problem!!
- otherwise, we will need to implement much of that by straight programming
18. Recommendations: Management
- "It would be useful if the US CMS Software and Computing Project Management Plan could be updated to reflect its new management within the US CMS Research Program."
- there is the project management plan for S&C, which was written and approved before the start of the Research Program
- this plan is going to be adapted; the planned time scale is to have an update in draft form at the next Program Management Group meeting, August 8, 2003
- "We consider it important to continue to allow the flexibility to shift funds among different components in the coming era of funding as a research project."
- we agree, and this is being done through the Research Program Manager, Dan Green
19. Recommendations: Management
- "The committee recommends that US CMS closely monitor plans for providing support of externally provided software, and especially grid middleware."
- yes, see above!
20. Project Organization
- In April, Ian Fisk joined Fermilab as an Associate Scientist
- and as Level-2 manager for User Facilities
- we are in the process of defining the services that UF delivers
- project deliverables are being formulated in terms of services
- this allows us to map out Computing, Data, Support, Grid services, etc.
- a program of work is being performed in order to implement and run a service
- the program of work is related to a service, with sub-projects to implement/integrate/deploy etc. as needed, and operations teams to run the services
- tracked and managed through the resource-loaded WBS and schedule
- service coordinators are responsible for these main deliverables
- we are also defining the roles of the Tier-1 Facility Manager and the User Services Coordinator
- Robert Clare of UC Riverside has agreed to become the CAS L2 manager
21. FY03 Funding
- Very substantial cut in FY03 DOE funding guidance w.r.t. previous plans
- O'Fallon letter of March 2002, subsequent DOE guidance -> Bare Bones scope
- US CMS S&C was able to maintain the Bare Bones scope of $4000k
- DOE shortfalls mitigated by the start of the NSF Research Program funding
- Research Program Management has allocated the funding accordingly
- DOE total RP funding is $3315k
- S&C was allocated $3115k
- NSF total RP funding is $2500k ($1000k from a two-year grant + $1500k new)
- S&C was allocated the $750k out of the two-year grant
- total S&C funding in FY03 is $3865k (Bare Bones profile: $4005k)
- however, due to the drastic reduction in the FY03 DOE RP guidance (March '02)
- S&C could not yet start all the foreseen NSF RP activities (leadership profile)
- For FY04 we need DOE to ramp to the required level so that we can realize the potential of the US LHC program!
22. FY03 Funding Status
- Budget allocation for FY03
23. FY03 Funding Status
- Effort as reported and ACWP as invoiced
24. Plans for FY04
- NSF Leadership profile!
- Physics support and Grid services support start
- Tier-2 pilot implementation
- CAS gets an additional FTE
- Plans for US-CERN Edge Computing Systems
- Tier-1 stays at 13 FTE
- rather small T1 upgrades
[Chart: Bare-Bones scope BCWS, FY02-FY05]
25. FY04: US LHC Edge Computing Systems
- Following discussions around the C-RRB meeting, US CMS is scoping out a project with the LCG and CCS
- look at a flexible way of moving streams of raw data to the Tier-1 centers
- with some "intelligence" in doing some selection, and thus producing pre-selected data streams,
- to be able to optimize access to them later
- (and eventually enable re-processing, or even processing of dedicated lower-priority triggers)
- while at the same time ensuring a consistent and complete second set of raw data
- provide the US Edge Computing Systems needed at CERN for flexible streaming of data to the US
- part of these facilities would be located at the CERN Tier-0
- the main function is to enable LHC computing in the US and to facilitate streaming and distribution of data to Tier-1 centers off the CERN site
- the associated equipment would also be available to the LCG to help with the planned tests of the computing model
- the exact scope of this project is being discussed in US CMS and with the LCG facilities group
- it is going to be looked at by CMS within the next couple of months
- it will eventually increase the CMS physics reach,
- help to better understand the issues of such a distributed data model,
- and provide inputs to the "economics model"; PRS participation in the project would probably be appropriate
- to be presented to US CMS, then to the CMS SB for approval; expected equipment costs are about $500k
26. FY03 Accomplishments
- Prototyped Tier-1 and Tier-2 centers and deployed a Grid system
- Participated in a world-wide 20 TB data production for HLT studies
- US CMS delivered key components: IMPALA, DAR
- Made large data samples (Objectivity and nTuples) available to the physics community
- -> successful submission of the CMS DAQ TDR
- Worked with the Grid projects and VDT to harden middleware products
- Integrated the VDT middleware into the CMS production system
- Deployed the Integration Grid Testbed and used it for real productions
- Decoupled the CMS framework from Objectivity
- allows writing data persistently as ROOT/IO files (see the sketch below)
- Released a fully functional Detector Description Database
- Released the Software Quality and Assessment Plan
- Well underway getting ready for DC04 -> Ian Fisk's talk
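To illustrate what writing data persistently as ROOT/IO files means in practice, here is a minimal PyROOT sketch that writes a small TTree to disk. The real CMS framework is C++ (COBRA); the branch names and values here are invented, and a local ROOT installation with Python bindings is assumed.

```python
# Minimal illustration of ROOT/IO persistency, the format the CMS framework
# can now write after being decoupled from Objectivity. Invented toy content.
from array import array
import ROOT  # assumes a ROOT installation with Python bindings

f = ROOT.TFile("events.root", "RECREATE")   # ROOT/IO file on disk
tree = ROOT.TTree("Events", "toy event summary")

run = array("i", [0])
energy = array("d", [0.0])
tree.Branch("run", run, "run/I")
tree.Branch("energy", energy, "energy/D")

for i in range(1000):                        # fill a few toy "events"
    run[0] = 1
    energy[0] = 50.0 + 0.01 * i
    tree.Fill()

tree.Write()
f.Close()                                    # readable by any ROOT-based tool
```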
27. Shifting the Focus to Distributed Analysis
- Going forward to analysis means a significant paradigm shift
- from well-defined production jobs to interactive user analysis
- from DAGs of processes to sessions and stateful environments
- from producing sets of files to accessing massive amounts of data
- from files to data sets and collections of objects
- from using essentially raw data to complex layers of event representation
- from assignments from the RefDB to Grid-wide queries
- from user registration to enabling sharing and building communities
- Are the (Grid) technologies ready for this?
- there will be a tight interplay between prototyping the analysis services and developing the lower-level services and interfaces
- how can we approach a roadmap towards an architecture?
- What are going to be the new paradigms exposed to the user?
- user analysis sessions transparently extended to a distributed system
- but this requires a more prescriptive and declarative approach to analysis
- a set of services for collaborative work
- new paradigms beyond analysis
28. (No transcript: figure-only slide)
29. LHC Multi-Tier Structured Computing Resources
Peta Scales!!
30. Building a Production-Quality Grid
- We need to make the Grid work, so that large resources become available to the experiments
31. Getting CMS DC04 Underway
- Pre-Challenge Production to provide 50 million events for DC04
- Generation has already started
- Generator-level pre-selection is in place to enrich simulated background samples
- Goal is 50 million useful events, simulated and reconstructed
- to fit the scale of DC04
- also fits the scale of the first round of Physics TDR work
- Simulation will start in July
- Assignments are going out now
- all US sites are already certified
- G4 and CMSIM, with some samples simulated with both
- expect the G4 simulation to go through a few versions during production
- mix and choice of G4/CMSIM to be determined
- Data production rate will be about 1 TB/day (see the estimate below)
- Simulated data are kept at the Tier-1 centers
- Reconstructed data are sent to CERN for DC04 proper in spring 2004
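A back-of-envelope consistency check of these production numbers, using an assumed average event size (not a figure from the slide):

```python
# Rough check of the pre-challenge production scale: 50 million events at
# ~1 TB/day, assuming an average of ~1 MB per simulated+reconstructed event
# (an assumption made for this sketch only).

n_events = 50_000_000
event_size_mb = 1.0                       # assumed average size per event
daily_volume_tb = 1.0                     # quoted production rate

total_volume_tb = n_events * event_size_mb / 1e6
days_needed = total_volume_tb / daily_volume_tb

print(f"Total sample size : ~{total_volume_tb:.0f} TB (at {event_size_mb} MB/event)")
print(f"Production length : ~{days_needed:.0f} days at {daily_volume_tb} TB/day")
# Tens of TB and weeks of continuous running -- consistent with simulation
# starting in July to feed DC04 proper in spring 2004.
```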
32. CMS Computing TDR Schedule
33. Plan for CMS DC04
34. Preparation of the LHC Grid in the US: Grid3
- Prepare for providing the Grid services for DC04 and beyond: the Grid3 project
- contributions from US LHC and the Trillium projects (PPDG, iVDGL, GriPhyN)
- integrate the existing US CMS and US ATLAS testbeds, including the existing iVDGL infrastructure
- deploy a set of emerging Grid services built upon VDT, EDG components (provided by LCG) and some specific U.S. services, as required
- e.g. monitoring, VO management, etc.
- demonstrate functionalities and capabilities; prove that the U.S. Grid infrastructure is ready for real-world LHC-scale applications
- specific, well-defined performance metrics: robust data movement, job execution
- Not least:
- a demonstration Grid, showcasing NSF and DOE infrastructure achievements
- it provides a focal point for the participation of others
- e.g. Iowa, Vanderbilt (BTeV), etc.
35. Grid3 is Trillium and US LHC
- A multi-Virtual-Organization persistent Grid for application demonstrators that hums along (not expected to purr yet)
- Well aligned with the deployment and use of LCG-1 to provide US peer services in fall 2003
- Well aligned with the preparations for the US ATLAS and US CMS Data Challenges
- Demonstrators are production, data management or analysis applications needed for the data challenges
- Application demonstrators run in a production environment to show the capabilities of the grid and allow testing of the envelope
- Clear performance goals geared at DC04 requirements; metrics will be defined and tracked
- Computer Science application demonstrators should help determine the benefits from, and readiness of, core technologies
- Grid3 will be LCG-1 compliant wherever possible
- This could mean, for example, using the same VDT release as LCG-1, but in practice it will probably mean service-compatible versions
36. LCG Middleware Layers
- LCG Prototype LCG-1: Architecture of the Middleware Layers
Middleware!!
37. Middleware Continues to Be a Focus
- LCG is struggling with middleware components
- commend the VDT project
38. Building the LHC Grid
- Working with the Grid middleware providers, we have found ways to make a CMS Grid work
- this way, large computing resources become available
- if the Grid software and Grid management can be good enough
- Grid software still has a long way to go
- and it is only the basic layer
- Much of what we have is a good prototype
- Need to address how to approach the next level of functionality
- specifically Globus Toolkit 3 and the Open Grid Services Infrastructure (OGSI)
- the US and CERN/Europe will need to find a way to address the maintenance issue
- Basic Grid functionality now works
- Working with the LCG to implement and/or integrate other required features
- We see a path towards getting the Grid middleware for basic CMS production and data management in place
39. HEP-Specific Grid Layers, End-to-End Services
- HEP Grid Architecture (H. Newman)
- Layers above the Collective Layer (a minimal interface sketch follows below)
- Physicists' Application Codes
- Reconstruction, Calibration, Analysis
- Experiments' Software Framework Layer
- modular and Grid-aware architecture, able to interact effectively with the lower layers (above)
- Grid Applications Layer (parameters and algorithms that govern system operations)
- policy and priority metrics
- workflow evaluation metrics
- task-site coupling proximity metrics
- Global End-to-End System Services Layer
- workflow monitoring and evaluation mechanisms
- error recovery and long-term redirection mechanisms
- system self-monitoring, steering, evaluation and optimization mechanisms
- monitoring and tracking of component performance
- Already investigating a set of prototypical services and architectures (I. Foster et al.)
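Purely as an illustration of the layering described above, the sketch below renders the experiment framework, Grid applications layer and end-to-end system services as Python interfaces. All class and method names are invented; they are not CMS or Grid software APIs.

```python
# Invented rendering of the layering on this slide: the experiment framework
# sits on a grid-applications layer, which relies on end-to-end services.
from abc import ABC, abstractmethod

class EndToEndServices(ABC):
    """Global end-to-end system services layer."""
    @abstractmethod
    def monitor_workflow(self, workflow_id: str) -> dict: ...
    @abstractmethod
    def recover(self, workflow_id: str) -> None: ...

class GridApplicationsLayer(ABC):
    """Policies and metrics that govern system operations."""
    def __init__(self, services: EndToEndServices):
        self.services = services
    @abstractmethod
    def rank_sites(self, task: dict) -> list:   # task-site coupling metrics
        ...

class ExperimentFramework:
    """Experiment software framework, Grid-aware via the layer below it."""
    def __init__(self, grid_layer: GridApplicationsLayer):
        self.grid = grid_layer
    def submit_analysis(self, task: dict) -> str:
        best_site = self.grid.rank_sites(task)[0]   # delegate placement decision
        return f"submitted {task['name']} to {best_site}"
```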
40. Grid Services Architecture
- We have seen Grid services technologies, e.g. OGSI -- how about architectures?
41. Distributed Analysis
- It is unclear in the LHC community how we should approach this new focus
- the Distributed Analysis effort is not yet projectized in the US CMS WBS
- need to understand what should be in CMS, in LCG AA, in R&D projects
- perception of (too many) independent (duplicating) efforts (?)
- What can we test/use in DC04?
- some prototypes can be tested soon and for DC04
- what assumptions do they make about the underlying Grid?
- about physicists' work patterns?
- how are their architectures similar/different?
- are there similarities that can sensibly be abstracted into common layers?
- or is it premature for that?
- diversity is probably good at this time!
- LCG RTAG on "An Architectural Roadmap towards Distributed Analysis"
- review existing efforts, confront them with the HEPCAL use cases, consider interfaces between Grid, LCG and application services,
- to develop a roadmap specifying wherever possible the architecture, the components and potential sources of deliverables to guide the medium-term (2-year) work of the LCG and the DA planning in the experiments.
42. LHC Architecture for Distributed Analysis
43. DAWN: Scientists within Dynamic Workspaces!
- How will communities of scientists work locally using the Grid?
- Infrastructure for sharing, consistency of physics and calibration data, software
Communities!!
44. Dynamic Workspaces: DAWN
- This is about communities of scientists doing research in a global setting
45. Dynamic Workspaces: DAWN
- The DAWN proposal focuses on Dynamic Workspaces within the Peta-scale Grid!
46. The Vision of Dynamic Workspaces
- Science has become a vastly more complex human endeavor. Scientific collaborations are becoming not only larger, but also more distributed and more diverse. The scientific community has responded to the challenge by creating global collaborations, petascale data infrastructures and international computing grids. This proposal is about taking the next step: to research, prototype and deploy the user-level tools that will enable far-flung scientific collaborators to work together as collocated peers. We call this new class of scientific tool a dynamic workspace, and it will fundamentally change the way we will do science in the future.
- Dynamic workspaces are environments for scientific reasoning and discovery. These environments are based on advanced grid middleware but extend beyond grids. Their design and development will require the creation of a multidisciplinary team that combines the skills of computer scientists, technologists and those of domain experts. We are focusing on the needs of the particle physics community, specifically those groups working on the LHC. Dynamic Workspaces are managed collections of objects and tools hosted on a grid-based distributed computing and collaboration infrastructure. Workspaces extend the current capabilities of the grid by enabling distributed teams to work together on complex problems which require grid resources for analysis. Dynamic workspaces expand the capabilities of existing scientific collaborations by creating the ability to construct and share the scientific context for discovery.
47. CS Research and Work Areas
- Workspaces are about building the capability to involve a community in the process of doing science. Developing a community-oriented approach to science requires progress in three key areas of Computer Science research:
- Knowledge Management
- the techniques and tools for collecting and managing the context in which the mechanical aspects of the work are done. The resulting methods and systems will enable the workspace not only to hold the scientific results, but also to explain and archive the reasons for progress.
- Resource Management
- workspaces will sit at the top of resource pyramids. Each workspace will have access to many types of resources, from access to data to access to supercomputers. A coherent set of policies and mechanisms will be developed to enable the most effective use of the variety of resources available.
- Interaction Management
- the many objects, people and resources in a workspace need to be managed in a way that key capabilities are available when and where users need them. The task of coupling the objects in a workspace and facilitating their use by people is the goal of interaction management.
48. DAWN Model for the CS-Applications Collaboration
49. DAWN Project Structure: ITR and US LHC
- CS Applications Areas and LHC Systems Integration
50. Grid Services Infrastructure
- Grid Layer: Abstraction of Facilities, Rich with Services!
Services!!
51. Steps towards a Grid Services Infrastructure
- Initial testbeds in US ATLAS and US CMS; consolidation of middleware into the VDT
- VDT agreed as the basis of the emerging LCG service and the basis of the EDG 2.0 distribution
- Build a functional Grid between ATLAS and CMS in the US: Grid3
- based on VDT, with a set of common services: VO management, information services, monitoring, operations, etc.
- demonstrate this infrastructure using well-defined metrics for LHC applications
- November: CMS demonstration of reliable massive production (job throughput), robust data movement (TB/day), consistent data management (files, sites)
- at the scale of the 5% data challenge DC04, planned for Feb. 2004
- Get the LHC Grid stakeholders together in the US and form the Open Science Consortium
- LHC labs, Grid PIs, TeraGrid, networking
- develop a plan for implementing and deploying the Open Science Grid, peering with EGEE in Europe and with Asia to provide the LHC infrastructure
52. Proposed OSG Goals and Scope
- We have started to develop a plan
- and a proposal to the DOE and NSF, over the next few months
- and to forge an organization and collaboration
- building upon the previously proposed Open Science Consortium
- to build an Open Science Grid in the US on a Peta-Scale
- for the LHC and other science communities
- Goals and Scope
- Develop and deploy services and capabilities for a Grid infrastructure that would make LHC computing resources, and possibly other computing resources for HEP and other sciences (Run 2 etc.), available to the LHC science community,
- as a functional, managed, supported and persistent US national resource.
- Provide a persistent 24x7 Grid that peers and interoperates with, interfaces to, and integrates with other national and international Grid infrastructures
- in particular EGEE in Europe (which will provide much of the LHC Grid resources in Europe to the LCG)
- This would change how we do business in US LHC, and maybe at Fermilab
53. A Project to Build the Open Science Grid
- Scope out the services and interface layers between applications and facilities
- the LHC has already identified funding for the fabric and its operation
- Work packages to acquire and/or develop enabling technologies as needed
- goal: enable "persistent organizations" like the national labs to provide those infrastructures to the application communities (CMS, ATLAS, etc.)
- develop the "enabling technologies" and systems concepts that allow the fabric providers to function in a Grid environment, and the applications and users to seamlessly use it for their science
- develop well-defined interfaces and a services architecture
- issues like distributed databases, object collections, global queries
- work on the technologies enabling end-to-end managed, resilient and fault-tolerant systems: networks, site facilities, cost estimates
- devise strategies for resource use, and dependable "service contracts"
- Put up the initial operations infrastructure
54. Initial Roadmap to the Open Science Grid
- PMG members and ASCB members have seen a first draft of the Open Science Grid document
- Briefing of the Computing Division; strong endorsement from the CD head
- Discussion with ATLAS and general agreement
- Discussions with the Grid PIs on OSG and Grid3 (iVDGL and PPDG steering)
- started formulation of the Grid3 plan, and a task force to define the Grid3 work plan
- initial discussions with DOE/NSF
- starting to develop a technical document
- Workshop at Caltech in June 2003 and start of ARDA
- starting the Grid services architecture for Grid Enabled Analysis
- starting to define examples for service architectures and interfaces
- Roadmap towards an Architecture for all four LHC experiments in October
- Planning for the initial Open Science Consortium meeting in July
55. Summary: US CMS Grid Activity Areas
- Peta-Scales: Building Production-Quality Grids
- US CMS pre-challenge production, LCG-1, Grid3
- Middleware: Drafting the Grid Services Architecture
- VDT and EGEE, DPE and LCG-1, ARDA and GGF
- Communities: Dynamic Workspaces and Collaboratories
- DAWN and GECSR NSF ITRs
- Services: Building the Grid Services Infrastructure, providing the persistent services and the framework for running the infrastructure
- Open Science Grid and Open Science Consortium, the labs and DOE
- Adapting the US project to provide the Grid Services Infrastructure
56. Conclusions on US CMS S&C
- The US CMS S&C Project is delivering a working Grid environment, with strong participation of Fermilab and U.S. universities
- There is still a lot of R&D and most of the engineering to do
- with a strong operations component to support physics users
- US CMS has deployed an initial Grid system that is delivering to CMS physicists, and shows that the US Tier-1/Tier-2 User Facility system can indeed work to deliver effort and resources to US CMS!
- With the funding advised by the funding agencies and project oversight, we will have the manpower and equipment at the lab and universities to participate strongly in the CMS data challenges,
- bringing the opportunity for U.S. leadership in the emerging LHC physics program
- The next steps are crucial for achieving an LHC computing environment that truly reaches out into the US
- a global production of 50M events in preparation for the next Data Challenge, run in Grid mode in the US
- Grid3 and the integration with the European Grid efforts and the LCG
- performing the DC, streaming data at the 5% level, throughout the integrated LHC Grid
57. CMS Timelines
[Timeline chart with tracks for CMS/CCS, CMS/PRS, LCG and POOL:
- 2003: New persistency validated; General release; Physics model drafted; OSCAR validated; LCG-1 middleware and centers ramping up; DC04 to test LCG-1; start of the Physics TDR; Computing model drafted
- 2004: LCG-3 final prototype; Physics TDR work; DC05 to test LCG-3; Computing TDR
- 2005: Complete; LCG TDR; Computing MOUs; Physics TDR
- 2006: Purchasing; DC06 readiness check]
58. CCS Level 2 Milestones
[Chart of CCS Level 2 milestones, highlighting DC04 and LCG-1]
59. CMS Milestones v33 (June 2002)