1
Run IIb DAQ / Online status
  • Stu Fuess
  • Fermilab

2
Introduction
  • In order to meet the DAQ and Online computing
    requirements for Run IIb we plan:
    • Level 3 farm node increase
      • Brown, Univ. of Washington, Fermilab
    • Host system replacements / upgrades
      • Hardware: Fermilab; Software: various
    • Control system node upgrade
      • Fermilab
  • The requirements, plans, status, and future
    activities will be discussed

3
Level 3 (1.3.1)
4
Level 3 farm nodes
  • Need greater L3 processing capabilities for
    higher luminosities

  Dual nodes   GHz   Plan
      48       1.0   to be removed
      34       1.6   existing
      32       2.0   existing
      96       2.2   to be added

332 GHz-equivalent CPUs now; 659 GHz-equivalent CPUs
for the start of Run IIb
For example, 1 kHz at 500 ms-GHz per event requires 500 GHz
of CPUs
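A quick back-of-envelope check of these numbers (a minimal Python sketch; the only assumption beyond the table above is that every node is a dual-CPU box):

    # Each entry is (number of dual-CPU nodes, per-CPU clock in GHz).
    existing = [(48, 1.0), (34, 1.6), (32, 2.0)]
    removed  = [(48, 1.0)]
    added    = [(96, 2.2)]

    def ghz(nodes):
        # 2 CPUs per node
        return sum(n * 2 * g for n, g in nodes)

    now    = ghz(existing)                    # ~332 GHz-equivalent
    run2b  = now - ghz(removed) + ghz(added)  # ~659 GHz-equivalent
    # 1 kHz at 500 ms-GHz per event: 1000 events/s * 0.5 GHz*s/event
    needed = 1000 * 0.5                       # 500 GHz
    print(now, run2b, needed)                 # 332.8 659.2 500.0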
5
Level 3 farm nodes, contd.
  • Plan: single purchase, summer 2005, of $210K of nodes
    • 3 racks of 32 = 96 nodes plus infrastructure
  • Strategy
    • This is an off-the-shelf purchase, but a major one
    • Similar to Computing Division farms purchases
    • Used a Run IIa purchase (a 32-node addition) to refine the procedure
  • History of the Run IIa purchase
    • Req preparation begun 1/04/04
    • Req submitted 1/29/04
    • PO created 3/23/04 ($51.5K)
    • Prototype system delivered 4/21/04
    • Full order delivered 6/21/04
    • Operational in Level 3 on 6/23/04

A 5-month process!
Thanks to the Computing Division for help!
(Costs in unburdened FY02 dollars)
6
Level 3 farm nodes, contd.
  • Other preparations
    • Will replace 3 racks / 48 nodes of older processors with 3 racks / 96 nodes
    • Existing electrical circuits and cooling are sufficient for the new racks
    • Will need an additional 48 network ports on the Level 3 and Online switches
  • Impact
    • Installation somewhat disruptive, as 3 racks (48 nodes) of older processors must be removed to make room for the new ones
    • The remaining 66 nodes stay operational during installation
  • Schedule
    • Plan for arrival of nodes at the start of the 2005 shutdown
    • Start the purchase process 3/05
    • Continued replacement with upgraded nodes will be necessary over the duration of Run IIb (operating funds)

7
Host systems (1.3.2)
8
Host systems
  • Need
    • Replace the 3-node Alpha cluster, which provides the functions:
      • Event data logger, buffer disk, transfer to FCC
      • Oracle database
      • NFS file server
      • User database
  • Plan
    • Replace with Linux servers
    • Install a number (4) of clusters which supply "services"
    • Shared Fibre Channel (FC) storage and failover software provide flexibility and high availability
    • $247K for processor and storage upgrades

(Costs in unburdened FY02 dollars)
9
DØ Online Linux clusters
[Diagram: Clients connect through the Network Switch to the cluster servers; dual Fibre Channel Switches attach the servers to a SAN containing the Legacy RAID Array, Legacy JBOD Array, and new RAID and JBOD Arrays]
10
Cluster Configuration
  • Cluster
    • Service
      • Name
      • Domain
      • Check interval
      • Script
    • Member
      • Name
      • Device
        • Device special file
        • Mount point
        • File system
        • Mount options
    • Power Controller
      • IP address
  • NFS Export
    • Export directory
    • NFS Clients
      • Client names / addresses
    • Export options
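As an illustration of how these fields fit together, here is the same hierarchy expressed as a Python structure. This is NOT the real Cluster Suite configuration syntax, and every name, path, and address below is a made-up placeholder:

    cluster_config = {
        "cluster": {
            "service": {
                "name": "nfs-data",           # hypothetical service name
                "domain": "online-failover",  # failover domain
                "check_interval": 10,         # seconds between health checks
                "script": "/etc/init.d/nfs",  # start/stop/status script
            },
            "member": {
                "name": "node-a",
                "device": {
                    "special_file": "/dev/vg_online/lv_data",  # LVM volume
                    "mount_point": "/export/data",
                    "file_system": "ext3",
                    "mount_options": "rw,noatime",
                },
            },
            "power_controller": {"ip_address": "192.168.1.50"},
        },
        "nfs_export": {
            "export_directory": "/export/data",
            "clients": ["d0ol-client1", "d0ol-client2"],  # names / addresses
            "export_options": "rw,sync",
        },
    }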

11
Cluster Services
Details of the configuration of cluster services,
using Run IIa experience of how things actually work!
12
Host systems, contd.
  • System tests
    • Performed tests of Fibre Channel, network, and storage rates
    • Network is capable of wire rate (1 Gb/s)
    • Storage rates are summarized below

                   Write (MB/s)   Read (MB/s)
    Local disk
      JBOD              18             53
      SW RAID 0         88             70
    FC disk
      JBOD              52             41
      HW RAID 0         94             30
      HW RAID 1          -              -
      HW RAID 5         31             34
      SW RAID 0         75             83
Target is 25 MB/s for event path
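For reference, a minimal sketch of the kind of sequential-write test behind numbers like those in the table; the file path and sizes are placeholders, not the actual Run IIb test procedure:

    import os, time

    PATH  = "/scratch/rate_test.dat"  # placeholder: file on the array under test
    BLOCK = 1 << 20                   # write in 1 MiB chunks
    TOTAL = 1 << 30                   # 1 GiB total

    buf = b"\0" * BLOCK
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    t0 = time.time()
    for _ in range(TOTAL // BLOCK):
        os.write(fd, buf)
    os.fsync(fd)                      # include the time to flush to disk
    os.close(fd)

    mb_per_s = (TOTAL / (1 << 20)) / (time.time() - t0)
    print("sequential write: %.1f MB/s" % mb_per_s)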
13
Host systems, contd.
  • System tests, contd.
    • Checked the relative performance of dual- vs quad-processor systems
    • Conclude that dual-processor nodes, at 20% of the cost, are sufficient for all but possibly the highest-I/O DAQ data-logging nodes
  • Potential issues/concerns
    • The Linux 2.4 kernel has problems with multiple high-rate buffered I/O streams; this is much better in the 2.6 kernel, and is alleviated somewhat by use of direct I/O (see the sketch below)
    • Expect to see 2.6 next spring/summer in Fermi Linux
    • The design avoids this situation
    • Fibre Channel redundant paths are somewhat complicated
    • Expect to use a manual solution, but it is solvable (at a cost) with commercial Secure Path software
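Direct I/O sidesteps the page cache where the 2.4 kernel's buffered-stream problem lives. A minimal, Linux-only sketch of a direct write from Python (the path is a placeholder; O_DIRECT requires block-aligned buffers and transfer sizes, hence the anonymous mmap):

    import mmap, os

    BLOCK = 1 << 20              # 1 MiB; a multiple of the device block size

    # An anonymous mmap is page-aligned, satisfying O_DIRECT's
    # alignment requirement for the user buffer.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(b"\xab" * BLOCK)

    # O_DIRECT tells the kernel to bypass the page cache entirely.
    fd = os.open("/scratch/direct_test.dat",
                 os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    os.write(fd, buf)            # mmap objects support the buffer protocol
    os.close(fd)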

14
Host systems, contd.
  • Cluster implementation
    • Red Hat Cluster Suite
      • Available open source, distributed in Fermi Linux
      • But also a supported (paid) Red Hat Application Suite product
    • No kernel modifications required
    • Can use non-homogeneous distributions
    • Can be made to work with non-homogeneous hardware
    • Use LVM as the virtual storage layer
  • Cluster tests
    • Storage device access
    • NFS failover: file reads/writes transparently complete when the active node is turned off and the service transitions to the backup node (see the check-script sketch below)
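Failover of this kind is triggered by the per-service health check (the Check interval and Script fields on the Cluster Configuration slide). A hypothetical sketch of such a check; the mount point is a placeholder, and real Cluster Suite service scripts also handle start/stop actions:

    #!/usr/bin/env python
    # Hypothetical health check: touch and re-read a heartbeat file on
    # the service's storage; a nonzero exit tells the cluster manager
    # to fail the service over to the backup node.
    import os, sys, time

    MOUNT = "/export/data"   # placeholder mount point for the service

    try:
        beat = os.path.join(MOUNT, ".heartbeat")
        with open(beat, "w") as f:
            f.write(str(time.time()))
        with open(beat) as f:
            f.read()
    except OSError:
        sys.exit(1)          # check failed -> trigger failover
    sys.exit(0)              # healthy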

15
Host systems, contd.
  • Status
    • A 2-node cluster has been created
      • Single-path FC SAN
      • Service failover demonstrated
    • 6 new servers delivered 6/21/04
    • Will construct 4 clusters during the summer/fall 2004 shutdown
  • Schedule
    • Fall '04: attempt to move everything!
      • DAQ, Oracle, NFS, etc.
      • Need involvement of software system experts
      • Dual-path SAN still a challenge
      • DAB2 rack space juggling a challenge
      • Disruptive (possibly a day or two)! Essential functions will have to be relocated and debugged
    • Summer '05: enhance with the best available processors

16
Control system (1.3.3)
17
Control System
  • Need
    • The current control system processors (100 of them):
      • are becoming obsolete and are not maintainable (lost 2 nodes, repaired 5 during Run IIa)
      • are limiting functionality in some areas (tracker readout crates are CPU limited)
  • Plan
    • Upgrade 1/3 of the control system processors, either with the latest generation of processors (PowerPC) running the current software (VxWorks), or by transitioning to a different architecture (e.g. Intel) with a new OS (e.g. Linux)
    • The inclination is to simply purchase an appropriate number of the current processor family and minimize software changes
  • Strategy
    • $140K to upgrade processors
    • Scheme for replacement on the next slide

(Costs in unburdened FY02 dollars)
18
Control System Processors
By detector subsystem: number of processors, processor types, and replacement plan.
  • Control and Monitoring: 18 ((11) 16MB PowerPC, (6) 64MB PowerPC, (1) 128MB PowerPC). Replace; use old processors for HV or spares; need 12 additional for CAL.
  • High Voltage: 30 ((30) 16MB PowerPC). Retain, with new and spare needs met from other replacements.
  • Muon: 40 ((23) 4MB 68K, (16) 128MB PowerPC). OK as is; the 16 in readout crates are recent replacements.
  • Tracker readout: 26 ((10) 32MB PowerPC, (11) 64MB PowerPC, (5) 128MB PowerPC). Replace; use old processors for HV or spares.
  • Test stands: 13 (mixed low-end). Use available processors.
19
Control System, contd.
  • Impact
    • Potential short disruptions in control system functions as processors are replaced
  • Schedule
    • Recently purchased the latest PowerPC processor for testing
      • Testing EPICS and D0 controls software
    • Follow evolutionary developments of the OS (VxWorks) and the Control System Framework (EPICS)
    • Purchases in advance of Summer '05, then incremental installation of nodes

20
Conclusion
  • Three activities
    • Level 3
    • Host systems
    • Control system
  • Level 3 is an addition of nodes
  • Host system changes are the most revolutionary
    • Attempting to perform the upgrade this summer/fall
    • Improvements in functionality
  • Control system is a replacement of nodes
    • With the evolutionary progress of the VxWorks and EPICS software, expect a nearly seamless transition, ready for Run IIb