Title: USCMS
1 US-CMS User Facilities
Vivian O'Dell, FNAL
Software and Computing PMG, May 11, 2001
2 Production Status in US
- US-CMS is responsible for reconstruction of 2M JPG PRS group events.
- Weekly coordination meetings between T1 (FNAL) and T2 (Caltech/UCSD).
- Many problems
  - At FNAL we have had many hardware problems
  - The T2 sites have personnel issues
- At FNAL we have daily production meetings
  - Keeps the team focused
- Good collaboration between CMS/ISD/OSS/ODS
3 Tier 1 Production Status
4 CMS Cluster
- Servers: GALLO, WONDER, VELVEETA, CMSUN1
- Workers: popcrn01 - popcrn40
5 Production Status in US
- Fermilab is hosting the JPG User Federation
  - Being used by people at FNAL/Wisconsin for HLT studies
  - Keep documentation up to date with federation contents
  - Required an AMS-Enstore interface for automatic migration of files
  - See http://computing.fnal.gov/cms/Monitor/cms_production.html
- Production scripts written by FNAL are being used by all of CMS (including CERN!)
  - These scripts will allow us to build a distributed system (a minimal sketch follows below).
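The production scripts themselves are not reproduced here. The following is a minimal, hypothetical sketch of how a wrapper might split a production request across the T1 and T2 sites; the site list, host names, dataset name, and the cms_prod_run job script are illustrative assumptions, not the actual FNAL/CMS scripts.

```python
# Hypothetical distributed-production wrapper (Python).
# Sites, hosts, the dataset name, and the remote "cms_prod_run" script are placeholders.

import subprocess

SITES = {                     # assumed submission front-ends per site
    "FNAL":    "popcrn01",
    "Caltech": "t2-caltech",
    "UCSD":    "t2-ucsd",
}

def split_events(total, sites):
    """Divide a production request evenly across the participating sites."""
    share, rest = divmod(total, len(sites))
    return {s: share + (1 if i < rest else 0) for i, s in enumerate(sites)}

def submit(host, dataset, nevents, first_run):
    """Submit one production job via ssh to a site's front-end (assumed interface)."""
    cmd = ["ssh", host, "cms_prod_run",
           "--dataset", dataset,
           "--nevents", str(nevents),
           "--first-run", str(first_run)]
    print("submitting:", " ".join(cmd))
    return subprocess.call(cmd)

if __name__ == "__main__":
    plan = split_events(2_000_000, list(SITES))   # e.g. the 2M JPG events
    run = 1
    for site, nev in plan.items():
        submit(SITES[site], "jets_fall01", nev, run)
        run += nev
```

The point of the sketch is only the split-and-submit pattern that makes the production distributable; bookkeeping, validation, and transfer into the user federation are outside its scope.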
6 FNAL/UF in International CMS
- It is essential we have good integration in international CMS
- Natalia Ratnikova (UF code librarian) is also spending part of her time as a CAS engineer working on SCRAM, the CMS code distribution tool
- Greg Graham/Hans Wenzel spend part of their time on UF issues
- Pal Hidas (PPD guest scientist working on HLT studies for JPG) is sitting with us
- Mixing roles according to expertise really helps people stay integrated with international CMS
7 Hardware Spending Proposal
- We have put together a hardware spending proposal for T1 consistent with user and R&D needs
- This was presented to the ASCB in April and the general principles were agreed to
- Some concern was raised that FNAL buys expensive stuff
- We have a meeting of CMS computing people this afternoon to hash out the details and start writing requisitions
8 Summary of Current UF Tasks
- Digitization of JPG fall production with Tier 2 sites
- New MC (spring) production with Tier 2 sites
- Hosting JPG User Federation at FNAL
  - For fall production, this implies 4 TB storage
  - Implies 1 TB on disk, 3 TB in tape storage
- Hosting MPG User Federation at FNAL?
  - For fall production, this implies 4 TB storage
  - Implies 1 TB on disk, 3 TB in tape storage
- Also hosting User Federation from spring production, AOD or even NTUPLE for users (a rough storage tally follows below)
- Objectivity testing / R&D in data hosting
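A quick back-of-the-envelope tally of the storage implied by the two candidate federations, using only the per-federation numbers quoted on this slide; the spring-production AOD/NTUPLE hosting is left out because no size is quoted for it.

```python
# Storage tally for the federations FNAL would host (numbers from the slide).
federations = {
    "JPG fall production": {"disk_tb": 1, "tape_tb": 3},
    "MPG fall production": {"disk_tb": 1, "tape_tb": 3},   # only if FNAL also hosts MPG
}
disk = sum(f["disk_tb"] for f in federations.values())
tape = sum(f["tape_tb"] for f in federations.values())
print(f"disk: {disk} TB, tape: {tape} TB, total: {disk + tape} TB")
# spring-production AOD/NTUPLE hosting would add to this; size not yet quoted
```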
9 Hardware Equipment Needs
- Efficient use of CPU at Tier 2 sites so we don't need additional CPU for production
- Fast, efficient, transparent storage for hosting the user federation
  - Mixture of disk/tape
- R&D for efficient RAID/disk/OBJY matching (a simple throughput-test sketch follows below)
  - This will also serve as input to RC simulation
- Build R&D systems for analysis clusters
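As an illustration of the kind of measurement this RAID/disk R&D involves (the 70 MB/sec sequential figure quoted later for the Winchester RAID is this sort of number), here is a minimal, hypothetical sequential-throughput test. The mount point and file size are assumptions, and raw streaming rates are only a first cut; they do not capture Objectivity/AMS access patterns.

```python
# Minimal sequential write/read throughput check for a candidate RAID volume.
# PATH and SIZE_MB are placeholders; adjust for the system under test.

import os
import time

PATH = "/mnt/raid_under_test/throughput.tmp"   # assumed mount point
SIZE_MB = 1024                                  # 1 GB test file
CHUNK = b"\0" * (1 << 20)                       # write/read in 1 MB chunks

def timed(label, func):
    t0 = time.time()
    func()
    dt = time.time() - t0
    print(f"{label}: {SIZE_MB / dt:.1f} MB/s")

def write_file():
    with open(PATH, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())        # force data to disk before stopping the clock

def read_file():
    with open(PATH, "rb") as f:
        while f.read(1 << 20):
            pass

timed("sequential write", write_file)
timed("sequential read ", read_file)   # may be served from page cache after a fresh write
os.remove(PATH)
```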
10 Funding Request for 2001-2003
(This is the total with FNAL overhead added)
11 Funding Request for 2001
12 Funding Reality for 2001
13 Proposed Spending FY01
- 1.6.1 Test Systems: $48,500
  - Sun Netra-T1: $3,500.00 (?)
  - 1 TB RAID (ZYZZX): $45,000.00
14 Proposed Spending FY01
- 1.6.2.2.1 Prototype User Analysis System (R&D): $53,000
  - Dell 6400 server: $16,000.00
  - PowerVault storage array: $7,000.00
  - Eight worker nodes: $30,000.00
15 Proposed Spending FY01
- 1.7.2 Disk Support for User Federation: $45,000
  - 1 TB ZYZZX: $45,000.00
- 1.6.2.2.3 Storage R&D Systems: $188,000
  - 2 Dell 6400 servers: $32,000.00
  - 1 TB RAIDZONE: $25,000.00
  - 1 TB 3WARE: $13,000.00
  - 2 TB WINCHESTER: $65,000.00
  - Farm test system: $53,000.00
    - Dell 6400 server: $16,000.00
    - PowerVault storage array: $7,000.00
    - Eight worker nodes: $30,000.00
16 Proposed Spending FY01
- Networking: $52,000
  - WS-C6509-2500AC Catalyst 6509 chassis w/ 2500W AC power: $9,300.00
  - WS-CAC-2500W/2 Catalyst 6000 second 2500W AC PS: $2,600.00
  - WS-X6K-S2-PFC2 Catalyst 6500 Supervisor Engine-2, 2 GE, plus PFC-2: $16,000.00
  - WS-X6516-GBIC Catalyst 6500 16-port GigE module, fabric-enabled: $16,500.00
  - WS-X6548-RJ-21 Catalyst 6500 48-port 10/100 RJ-21, fabric-enabled: $13,300.00
  - TOTAL will be $52,000.00 (if we trade in a couple of our old switches; the line items are tallied below)
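The line items above sum to more than the quoted total; a quick tally makes the implied trade-in credit explicit (the credit is inferred from the difference, not stated on the slide).

```python
# Tally of the Catalyst line items; the trade-in credit needed to reach
# the quoted $52,000 total is inferred, not given.
items = {
    "WS-C6509-2500AC chassis":        9_300,
    "WS-CAC-2500W/2 second PS":       2_600,
    "WS-X6K-S2-PFC2 supervisor":     16_000,
    "WS-X6516-GBIC 16-port GigE":    16_500,
    "WS-X6548-RJ-21 48-port 10/100": 13_300,
}
subtotal = sum(items.values())       # $57,700
quoted_total = 52_000
print(f"line items:              ${subtotal:,}")
print(f"implied trade-in credit: ${subtotal - quoted_total:,}")   # about $5,700
```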
17 Funding Proposal for 2001
- Some costs may be overestimated, but we may also need to augment our farm CPU
18 Selection of 2nd T2 Prototype
- Method
  - The Level 1 S&C PM sends a letter to the US-CMS collaboration requesting that institutes interested in becoming a Tier 2 prototype site send a letter of interest to the Level 2 US-CMS User Facility Manager.
    - Lothar did this March 28, 2001
    - (Matthias did this May 25, 2000)
  - The letter of interest should address the selection criteria (see below)
  - The User Facility Manager will review all of the proposals, request more information if necessary, and write a report detailing the results and recommending a selection.
    - I did this April 22nd
  - This report then goes to the ASCB and the L1 project manager for US-CMS Software and Computing.
    - Sent draft to Lothar and Irwin on April 23
  - The L1 Project Manager, in consideration of the UF report and with input from the ASCB, then makes the final decision on the prototype center location.
19 Tier 2 Selection
- Selection criteria (as detailed in the letter from Lothar to US-CMS)
  - Activity in GRID research
  - Amount of personnel and degree of involvement in R&D projects (GriPhyN, PPDG, Globus, testbeds)
  - Existing resources that can be leveraged, i.e. local and wide area network connectivity, availability of support staff, and existing hardware such as mass storage systems, processors, disk storage
20 Tier 2 Selection
- Results and responses
  - 6 universities responded in May 2000
  - 2 of the original 6 responded in April 2001
  - I considered all 6 and used updates where I had them. I also requested information from a 7th (MIT) and clarifications from most of the original 6
  - I then rated the responses according to the selection criteria:
    - University of Florida
    - Iowa State/University of Iowa
    - University of Maryland
    - Massachusetts Institute of Technology
    - University of Minnesota
    - Northeastern University
    - University of Wisconsin
  - Recommended University of Florida for the next pT2 center
21 Tier 2 Hardware Status
- From Caltech: 20 dual 800 MHz PIIIs with 0.5 GB RAM; dual 1 GHz CPU DS with 2 GB RAM; 2 x 0.5 TB fast (Winchester) RAID with 70 MB/sec sequential disk access
- Total cost $140k at each site
- Installed CMS software, in process of commissioning; tested ooHits
- UCSD has a similar system in a similar status
- Plans to buy another 20 duals this year at each site, $65k each (?)
- This makes a total spending of $410k (2 x $140k already spent plus 2 x $65k planned) out of projected costs of $430k
22 Florida Cluster
- A CMS computing cluster is being assembled at the University of Florida by Paul Avery's group.
- 72 computational nodes
  - Dual 1 GHz PIII
  - Based on Intel STL2 motherboards
  - Built-in Ethernet, SCSI, and VGA
  - ServerWorks ServerSet III LE chipset
  - 512 MB PC133 SDRAM
  - 76 GB IBM IDE disks
  - Purchased from Ace Computers of Chicago (http://www.acecomputers.com)
  - Per-system cost of $1,950
- Sun dual Fibre Channel RAID array, 660 GB (raw)
  - Connected to Sun data server
  - Not yet delivered; performance numbers to follow
23 CMS Milestones (as pertinent to UF)
- CMS is starting from a working OO system and slowly building in complexity. Our goals for the next few years are to fully OO the software chain and implement grid tools.
- 6/01/2000: Roughly 1M events produced for Higher Level Trigger studies (0.1% MDC, though not full complexity). Full OO reconstruction with pileup.
- 12/01/2000: 5M events produced for HLT studies, representing 0.5% MDC. CERN, INFN, FNAL, Caltech and Moscow are participating in deploying grid tools to share simulated data.
- 12/01/2000: Trigger TDR.
- 01/30/2001: Create AOD at FNAL for the JPG group. For this we have to run over 2M final jets events. Support re-reconstruction. Needs 5-6k SI95s for this.
24 CMS Milestones (as pertinent to UF)
- 06/01/2001: 1% data challenge. This means 10M events, or >2M generated in the US. 2M at FNAL, 2M at T2? Continue with grid implementation. Non-network-based data movement?
- 12/01/2001: 1-2% data challenge. Fully support prototype Tier 2s. Implement any reconfiguration or redesign from the previous data challenge.
- 11/01/2002: DAQ TDR
- 12/01/2003: Software and Computing TDR
- 03/01/2003: Tier 1 Regional Center 5% prototype.
- 06/01/2003: 5% data challenge. Goals not yet defined.
25 CMS Milestones (as pertinent to UF)
- Longer term milestones
  - 10/01/2003: Begin Tier 1 implementation phase (FY2004)
  - 12/01/2004: Physics TDR
  - 12/01/2004: 20% data challenge
  - 04/01/2006: Pilot LHC run (2-4 weeks at 10^32)
  - 08/08/2006: LHC running (10^33)
26 Near Term Plans
- Define hardware strategy for FY2001. DONE
  - We have the project plan as the basis.
  - Refining using input from:
    - US-CMS physics coordinator and groups
    - US-CMS users
    - ASCB
    - CMS SCB
- Production organization. DONE
  - Scripts developed at Fermilab have been adopted as the official CMS distributed production scripts
  - Working with T2 on production and prototype grid systems
- Strategy for dealing with strong authentication
  - In progress
- Organize another CMS software tutorial in early summer, coinciding with Kerberizing CMS machines
27 Summary
- Lots of progress in UF
  - US-CMS computing beginning to coalesce
  - Good interaction with international CMS
- Production status
  - US-CMS is taking a lead role here
  - Would like stronger connections with physics groups
- By next PMG
  - Hardware requisitions out and equipment coming in
  - R&D projects underway
  - Continue documentation and collaborative efforts
  - Production at 1st T2 site ramped up
  - Florida prototype ready to begin production
- Critical items
  - Fix milestones to new schedule and update WBS
  - Signed SoWs for T2 sites