Title: Run IIb DAQ / Online status
Introduction
- In order to meet the DAQ and Online computing requirements for Run IIb, we plan:
  - Level 3 farm node increase
    - Brown, Univ. of Washington, Fermilab
  - Host system replacements / upgrades
    - Hardware: Fermilab; software: various
  - Control system node upgrade
    - Fermilab
- The requirements, plans, status, and future activities will be discussed
Level 3 (1.3.1)
Level 3 farm nodes
- Need greater Level 3 processing capability for higher luminosities
  - For example, 1 kHz @ 500 ms-GHz requires 500 GHz of CPU

  Dual nodes   GHz   Plan
  48           1.0   to be removed
  34           1.6   existing
  32           2.0   existing
  96           2.2   to be added

  332 GHz-equivalent CPUs now; 659 GHz-equivalent CPUs for the start of Run IIb
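The capacity figures follow from simple arithmetic over this table (dual-CPU nodes, so each node counts twice its per-CPU clock rating); a minimal sketch of the bookkeeping:

```python
# Dual-CPU nodes: (count, GHz per CPU, status)
farm = [
    (48, 1.0, "to be removed"),
    (34, 1.6, "existing"),
    (32, 2.0, "existing"),
    (96, 2.2, "to be added"),
]

ghz_now = sum(n * 2 * ghz for n, ghz, status in farm if status != "to be added")
ghz_iib = sum(n * 2 * ghz for n, ghz, status in farm if status != "to be removed")
print("now:     %.1f GHz-equivalent" % ghz_now)   # 332.8, quoted as 332
print("Run IIb: %.1f GHz-equivalent" % ghz_iib)   # 659.2, quoted as 659

# Required capacity: rate (kHz) x per-event cost (ms-GHz); kHz x ms = 1
rate_khz, cost_ms_ghz = 1.0, 500.0
print("required: %.0f GHz" % (rate_khz * cost_ms_ghz))  # 500 GHz
```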
Level 3 farm nodes, contd.
- Plan: a single purchase, summer 2005, of $210K of nodes (unburdened FY02 dollars)
  - 3 racks of 32 nodes (96 nodes) plus infrastructure
- Strategy
  - This is an off-the-shelf purchase, but a major one
  - Similar to Computing Division farms purchases
  - Used a Run IIa purchase, a 32-node addition, to refine the procedure
- History
  - Req preparation begun 1/04/04
  - Req submitted 1/29/04
  - PO created 3/23/04 ($51.5K)
  - Prototype system delivery 4/21/04
  - Full order delivery 6/21/04
  - Operational in Level 3 on 6/23/04
  - A 5-month process!
- Thanks to the Computing Division for help!
Level 3 farm nodes, contd.
- Other preparations
  - Will replace 3 racks / 48 nodes of older processors with 3 racks / 96 nodes
  - Existing electrical circuits and cooling are sufficient for the new racks
  - Will need 48 additional network ports on the Level 3 and Online switches
- Impact
  - Installation somewhat disruptive, as 3 racks (48 nodes) of older nodes will be removed to make room for the new ones
  - The remaining 66 nodes stay operational during installation
- Schedule
  - Plan for arrival of nodes at the start of the 2005 shutdown
  - Start purchase process 3/05
  - Continued replacement with upgraded nodes will be necessary over the duration of Run IIb (operating funds)
Host systems (1.3.2)
Host systems
- Need
  - Replace the 3-node Alpha cluster, which provides these functions:
    - Event data logger, buffer disk, transfer to FCC
    - Oracle database
    - NFS file server
    - User database
- Plan
  - Replace with Linux servers
  - Install a number (4) of clusters which supply "services"
  - Shared Fibre Channel (FC) storage and failover software provide flexibility and high availability
  - $247K for processor and storage upgrades (unburdened FY02 dollars)
DØ Online Linux clusters
[Diagram: clients connect through a network switch to the cluster nodes, which attach via dual Fibre Channel switches to a SAN containing legacy RAID and JBOD arrays alongside new RAID and JBOD arrays.]
Cluster Configuration
- Cluster
  - Service
    - Name
    - Domain
    - Check interval
    - Script
  - Device
    - Device special file
    - Mount point
    - File system
    - Mount options
  - Power controller
    - IP address
  - NFS export
    - Export directory
  - NFS clients
    - Client names / addresses
    - Export options
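As an illustration only (this is not the Red Hat Cluster Suite file format), one service entry in this hierarchy can be sketched as a nested structure; every name and value below is hypothetical:

```python
# Hypothetical sketch of one cluster service entry, mirroring the
# hierarchy above; all names and values are illustrative, not D0's.
service = {
    "name": "nfs_home",                  # service name
    "domain": "online_failover_domain",  # failover domain
    "check_interval": 30,                # seconds between health checks
    "script": "/etc/init.d/nfs",         # start/stop/status script
    "device": {
        "special_file": "/dev/vg_online/lv_home",  # LVM logical volume
        "mount_point": "/export/home",
        "file_system": "ext3",
        "mount_options": "rw,sync",
    },
    "power_controller": {"ip_address": "192.168.1.50"},
    "nfs_export": {"directory": "/export/home"},
    "nfs_clients": {
        "names": ["d0ol01.fnal.gov", "d0ol02.fnal.gov"],
        "options": "rw,no_root_squash",
    },
}
```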
Cluster Services
- Details of the configuration of cluster services
- Using the experience of Run IIa in how things actually work!
Host systems, contd.
- System tests
  - Performed tests of Fibre Channel, network, and storage rates
  - Network capable of wire rate (1 Gb/s)
  - Storage rates (target is 25 MB/s for the event path):

                 Write (MB/s)   Read (MB/s)
  Local disk
    JBOD              18             53
    SW RAID 0         88             70
  FC disk
    JBOD              52             41
    HW RAID 0         94             30
    HW RAID 1
    HW RAID 5         31             34
    SW RAID 0         75             83
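For context, a minimal sketch of the kind of sequential-rate measurement behind such numbers (path and sizes are hypothetical; a real test uses a file much larger than RAM so the page cache does not inflate the read figure):

```python
import os
import time

PATH = "/mnt/fc_test/ratefile"  # hypothetical mount point under test
SIZE_MB = 2048                  # should be much larger than RAM
CHUNK = 1024 * 1024             # 1 MB per I/O call

def write_rate():
    buf = b"\0" * CHUNK
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o644)
    t0 = time.time()
    for _ in range(SIZE_MB):
        os.write(fd, buf)
    os.fsync(fd)                # count the time to flush to disk
    os.close(fd)
    return SIZE_MB / (time.time() - t0)

def read_rate():
    fd = os.open(PATH, os.O_RDONLY)
    t0 = time.time()
    while os.read(fd, CHUNK):
        pass
    os.close(fd)
    return SIZE_MB / (time.time() - t0)

print("write: %.1f MB/s" % write_rate())
print("read:  %.1f MB/s" % read_rate())
```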
Host systems, contd.
- System tests, contd.
  - Checked relative performance of dual- vs quad-processor systems
  - Conclude that dual-processor nodes, at 20% of the cost, are sufficient for all but possibly the highest-I/O DAQ data-logging nodes
- Potential issues/concerns
  - The Linux 2.4 kernel has problems with multiple high-rate buffered I/O streams; much better in the 2.6 kernel, and alleviated somewhat with use of direct I/O (see the sketch after this list)
    - Expect to see 2.6 next spring/summer in Fermi Linux
    - The design avoids this situation
  - Fibre Channel redundant paths are somewhat complicated
    - Expect to use a manual solution, but it is solvable (at a cost) with commercial Secure Path software
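A minimal sketch of direct I/O on Linux, the technique mentioned above for sidestepping the page cache (the path is hypothetical, and O_DIRECT alignment rules vary by kernel and filesystem):

```python
import mmap
import os

PATH = "/mnt/fc_test/direct_file"  # hypothetical test file
BLOCK = 4096                       # assumed alignment / block size

# O_DIRECT transfers bypass the kernel page cache, which is what
# eases the 2.4-kernel trouble with many buffered I/O streams.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)

# O_DIRECT requires aligned buffers; an anonymous mmap is page-aligned.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"x" * BLOCK)

os.write(fd, buf)  # one aligned block written straight to disk
os.close(fd)
```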
Host systems, contd.
- Cluster implementation
  - Red Hat Cluster Suite
    - Available open source, distributed in Fermi Linux
    - But also a supported (paid) Red Hat Application Suite product
  - No kernel modifications required
  - Can use non-homogeneous distributions
  - Can be made to work with non-homogeneous hardware
  - Use LVM as the virtual storage layer
- Cluster tests
  - Storage device access
  - NFS failover (sketch below)
    - File reads/writes transparently complete when the active node is turned off and the service is transitioned to the backup node
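A sketch of the client side of such a failover test: keep appending to a file on the NFS mount and watch for the stall while the service moves between nodes (mount point and thresholds are hypothetical):

```python
import time

PATH = "/nfs/online/failover_probe.log"  # hypothetical NFS-mounted path

# Append a timestamped line each second; during failover the write
# blocks, then completes transparently once the backup node takes over.
with open(PATH, "a") as f:
    while True:
        t0 = time.time()
        f.write("alive %.3f\n" % t0)
        f.flush()
        stall = time.time() - t0
        if stall > 2.0:
            print("write stalled %.1f s (failover?)" % stall)
        time.sleep(1.0)
```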
Host systems, contd.
- Status
  - A 2-node cluster has been created
    - Single-path FC SAN
    - Service failover demonstrated
  - 6 new servers delivered 6/21/04
  - Will construct 4 clusters during the summer/fall 2004 shutdown
- Schedule
  - Fall '04: attempt to move everything!
    - DAQ, Oracle, NFS, etc.
    - Need involvement of software system experts
    - Dual-path SAN still a challenge
    - DAB2 rack space juggling a challenge
    - Disruptive (possibly a day or two)! Essential functions will have to be relocated and debugged
  - Summer '05: enhance with the best processors
Control system (1.3.3)
Control System
- Need
  - The current control system processors (100 of them):
    - are becoming obsolete and are not maintainable
      - Lost 2 nodes, repaired 5 during Run IIa
    - are limiting functionality in some areas
      - Tracker readout crates are CPU limited
- Plan
  - Upgrade 1/3 of the control system processors
    - either with the latest generation of processors (PowerPC), which run the current software (VxWorks), or by transitioning to a different architecture (e.g. Intel) with a new OS (e.g. Linux)
  - Inclination is to simply purchase an appropriate number of the current processor family and minimize software changes
- Strategy
  - $140K to upgrade processors (unburdened FY02 dollars)
  - Scheme for replacement on the next slide
Control System Processors
- Control and monitoring: 18 processors ((11) 16MB PowerPC, (6) 64MB PowerPC, (1) 128MB PowerPC). Replace; use old processors for HV or spares; need 12 additional for CAL.
- High voltage: 30 processors ((30) 16MB PowerPC). Retain, with new and spare needs met from other replacements.
- Muon: 40 processors ((23) 4MB 68K, (16) 128MB PowerPC). OK as is; the 16 in readout crates are recent replacements.
- Tracker readout: 26 processors ((10) 32MB PowerPC, (11) 64MB PowerPC, (5) 128MB PowerPC). Replace; use old processors for HV or spares.
- Test stands: 13 processors (mixed low end). Use available.
Control System, contd.
- Impact
  - Potential short disruptions in control system functions as processors are replaced
- Schedule
  - Recently purchased the latest PowerPC processor for testing
    - Testing EPICS and D0 controls software (see the sketch after this list)
  - Follow evolutionary developments of the OS (VxWorks) and the control system framework (EPICS)
  - Purchases in advance of summer '05, then incremental installation of nodes
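As an illustration of this kind of check, a sketch using the pyepics Channel Access client (named here as an assumption, not the tool actually used; the process variable names are hypothetical, not actual D0 ones):

```python
from epics import caget, caput  # pyepics Channel Access client

# Read a monitoring PV from the crate under test; name is hypothetical.
temperature = caget("D0TEST:CRATE01:TEMP")
print("crate temperature:", temperature)

# Verify a setpoint/readback round trip through the new processor.
caput("D0TEST:CRATE01:HV_SETPOINT", 1500.0)
readback = caget("D0TEST:CRATE01:HV_READBACK")
print("HV readback:", readback)
```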
Conclusion
- Three activities
  - Level 3
  - Host systems
  - Control system
- Level 3 is an addition of nodes
- Host system changes are the most revolutionary
  - Attempting to perform the upgrade this summer/fall
  - Improvements in functionality
- Control system is a replacement of nodes
  - With evolutionary progress of the VxWorks and EPICS software
  - Expect a nearly seamless transition, ready for Run IIb