Title: ALICE DAQ Comprehensive Review 5
1 ALICE DAQ Comprehensive Review 5
- P. VANDE VYVRE, CERN/PH, for the ALICE DAQ project
- Birmingham, Budapest, CERN, Istanbul, Split, Zagreb
- CERN, 7/8 March 2005
- Istanbul University: ALICE membership being discussed now with Funding Agency
2 Acronyms (1)
- AliROOT: ALICE sw framework based on ROOT
- AFFAIR: A Fine Fabric and Applications Information Recorder. Performance monitoring sw
- ADC: ALICE Data Challenge. ALICE DAQ/HLT/MSS/Offline integrated test
- BW: Bandwidth
- CASTOR: CERN Advanced STORage Manager. CERN developed MSS
- CTP: Central Trigger Processor. System managing TRG L0, L1, L2
- DAQ: Data Acquisition System
- DAS: Direct Attached Storage. Storage accessible from one computer
- DATE: Data Acquisition and Test Environment. ALICE DAQ sw framework
- DDL: Detector Data Link. ALICE optical link
- DDL DIU: DDL Destination Interface Unit. Optical link receiving side (DAQ side)
- DDL SIU: DDL Source Interface Unit. Optical link sender side (detector side)
- EBDS: Event Building and Distribution System. Event building and load balancing system
- EDM: Event Destination Manager. Sw allocating the GDC for event-building
- EOR: End Of Run. Phase of the DAQ control system
- GDC: Global Data Collector. CPU performing event-building
- HLT: High Level Trigger. ALICE software trigger, Level 3
- HW: Hardware
3 Acronyms (2)
- I/O bus: Input/Output bus. Computer bus used for input/output
- L0, L1, L2: Trigger levels 0, 1, 2. Fast TRG based on partial data (hw)
- LDC: Local Data Concentrator. CPU performing DDL readout and sub-event building
- LTC: Local Trigger Crate. Local trigger system: interface to central TRG and TTC, stand-alone TRG system
- LTU: Local Trigger Unit. Board interfacing the central TRG to the LTC
- MSS: Mass Storage System. Data management software
- NAS: Network Attached Storage. Storage accessible from a network through a server
- NIC: Network Interface Card. Computer interface to the network
- NTW: Network
- OO: Object-Oriented. Software paradigm (C++, Java)
- PCI: Open standard of PC I/O bus
- ROOT: OO software framework for I/O and visualization
- RORC: Read-Out Receiver Card. Mother-board of the DDL DIU
- SAN: Storage Area Network. Network dedicated to serverless storage
- SMI: State Manager Interface. Run control based on distributed state machines
- SOR: Start Of Run. Phase of the DAQ control system
- SW: Software
- TRG: Trigger
- TTC: Trigger, Timing and Control. Optical broadcast system used by the TRG
4 ALICE DAQ
- Data transfer: DDL and D-RORC
- DATE V5 and DAQ fabric
- Integration with detectors
- Data Challenges
- Installation
- Commissioning
5 DAQ architecture
[Figure: DAQ architecture block diagram. Trigger labels: CTP (Rare/All), LTU, TTC, signals L0, L1a, L2 and BUSY. Detector and HLT labels: FERO, FEP, HLT Farm, DDL, H-RORC. Readout labels: 262 DDLs, 123 DDLs, 329 D-RORC, 175 detector LDC; 10 DDLs, 10 D-RORC, 10 HLT LDC. Event building and storage labels: Load Bal., Event Building Network, EDM, 50 GDC, 25 TDS, 5 DSS, Storage Network. Data units along the chain: event fragment, sub-event, event, file.]
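The EDM in the architecture above is described as the software allocating the GDC for event building. The following is a minimal sketch of one way such an allocation could work, not the DATE implementation: it assumes that every LDC holds the same list of available GDCs and derives the destination from the event ID alone, so that all sub-events of one event converge on the same GDC. The function name `choose_gdc` and the host names are hypothetical.

```python
def choose_gdc(event_id, available_gdcs):
    """Pick the destination GDC for one event.

    Assumption: all LDCs apply this same rule to the same event ID and the
    same GDC list, so they all pick the same GDC without talking to each other.
    """
    if not available_gdcs:
        raise RuntimeError("no GDC available for event building")
    return available_gdcs[event_id % len(available_gdcs)]


# Hypothetical GDC names, for illustration only.
gdcs = ["gdc01", "gdc02", "gdc03"]
destinations = [choose_gdc(event_id, gdcs) for event_id in range(6)]
print(destinations)

# If one GDC is withdrawn, an updated list is all the LDCs need.
gdcs_after = [g for g in gdcs if g != "gdc02"]
print(choose_gdc(4, gdcs_after))
```

A modulo rule spreads events evenly only when all GDCs are equally fast; a real load balancer would presumably also take GDC occupancy into account.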
6 DDL Radiation Tolerance Test
- The SIU card works in a radiation environment
- Total ionizing dose: 16 Gy / 10 years
- Neutron fluence: 3.9 x 10^11 n/cm^2 / 10 years
- Charged hadron fluence: 8.9 x 10^11 n/cm^2 / 10 years
- Irradiation
- Cyclotron of TSI, Uppsala, Sweden: protons, 50, 150, 180 MeV
- Cyclotron of ATOMKI, Debrecen, Hungary: neutrons, 1 to 15 MeV
- All components are radiation tolerant except the FPGA. Focus on FPGA configuration loss
- FPGAs under test (standalone and complete DDL board)
- ALTERA APEX-E (EP20KE...): SRAM
- XILINX Virtex II: SRAM
- Actel ProASIC: FLASH
- Tests: register and RAM tests
- MEMORY test: read and compare
- REGISTER test: long chain of shift registers
7 Test Setup - 1
[Figure: test setup 1. Labels: DDL card with Conf. PROM, ALTERA FPGA, S/P and OT; optical cable; RORC; PC. Tests: MEMORY TEST, REGISTER TEST, transfer over DDL.]
Test Setup - 2
[Figure: test setup 2. Labels: FPGA test board (XILINX or ACTEL development board) with FPGA; adapter card; RS232; parallel port; PC. Test: REGISTER TEST.]
8 Test Firmware and Software
- Tests
- MEMORY test: FPGA internal memory cells filled with bit pattern (2048 x 16 bit)
- REGISTER test: long chain of shift registers (128 x 16 bit, 128 x 8 bit)
- SW read and compare: the difference between the expected bit pattern and the read-out bit pattern indicates a memory cell error or a logic cell error (configuration loss)
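The read-and-compare step of the MEMORY test can be illustrated with a short sketch. It is not the test software used in the irradiation campaign; it only assumes that the expected and read-out patterns are available as lists of 16-bit words, and it uses XOR to locate the flipped bits. The alternating 0xAAAA/0x5555 pattern and the injected errors are made up for the example.

```python
def compare_patterns(expected, readout, width=16):
    """Return a list of (word_index, xor_mask) for every word that differs."""
    if len(expected) != len(readout):
        raise ValueError("expected and read-out patterns differ in length")
    word_mask = (1 << width) - 1
    diffs = []
    for index, (exp, got) in enumerate(zip(expected, readout)):
        xor = (exp ^ got) & word_mask  # set bits mark the flipped cells
        if xor:
            diffs.append((index, xor))
    return diffs


def count_flipped_bits(diffs):
    """Total number of bit positions that differ."""
    return sum(bin(xor).count("1") for _, xor in diffs)


# Illustrative pattern: 2048 words of 16 bits, alternating 0xAAAA / 0x5555.
expected = [0xAAAA if i % 2 == 0 else 0x5555 for i in range(2048)]
readout = list(expected)
readout[7] ^= 0x0004     # simulate one upset bit
readout[100] ^= 0x8001   # simulate two upset bits in one word

diffs = compare_patterns(expected, readout)
print(diffs, count_flipped_bits(diffs))
```

How a given difference is attributed to a memory cell error or to a configuration loss is not specified on the slide, so the sketch stops at reporting the differences.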
9 RadTol Project Results
10 Rad-Tol DDL SIU design
- Self-healing card
- FPGA (ALTERA or XILINX) can suffer configuration loss
- Automated configuration error detection and recovery
- Both ALTERA and XILINX FPGAs support this with special functions
- On-board rad. tol. (e.g. Flash based) supervisory circuit controls the mechanism
- Radiation tolerant card
- All components are radiation tolerant including FPGA
- ACTEL ProASIC adopted as baseline
11 DDL SIU Design (ACTEL)
[Figure: block diagram of the SIU card. Labels: Xtal, two PLLs, ACTEL ProASIC, SERDES, optical transceiver, power, JTAG; data path (2 x 16 bits) and control, data path (serial); clocks TXCLK, RXCLK, TXCLK/2, RXCLK/2.]
- Hardware
- Schematic ready for future ProASIC3
- 2 prototype boards: no design error so far
- Firmware
- All modules ported from present firmware
- Timing critical modules reengineered
- Complete firmware simulated
- Simple DDL transactions already tested
12 DDL Software
- All functions accessible as interactive commands or API
- Script-based interpreter for sequence of operations
- Sending command to the FEE
- Reading FEE status: printing the status, comparing the status, polling the status
- Downloading data into the FEE from a file
- Reading data from the FEE: writing data into a file, comparing data with data in a file
- Part of start-of-run sequence
- TPC configuration < 3.0 s: all pedestals of all Altro, 2 MB/RCU, several thousands configuration files
[Figure: FERO connected through the DDL to the D-RORC in the LDC. Labels on the link: control, configuration, data.]
Example script:
define pedestal_addr 0x1FFF
define enable_pedestal 0x2C
reset SIU
write_command enable_pedestal
write_block pedestal_addr pedestal.hex x
read_and_check_block pedestal_addr pedestal.hex x
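To show how a script-based interpreter for such sequences could be organised, here is a minimal sketch that only parses the pedestal script shown on the slide. It is not the DDL software: the function `parse_script` is hypothetical, the semantics of `define` (symbol substitution in later lines) is an assumption, and no hardware access is attempted.

```python
def parse_script(text):
    """Turn script text into a list of (command, arguments) tuples.

    Assumption: 'define NAME VALUE' binds a symbol that later lines may use
    in place of a numeric argument; every other line is one operation.
    """
    symbols = {}
    operations = []
    for raw_line in text.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "define":
            if len(tokens) != 3:
                raise ValueError(f"malformed define: {raw_line!r}")
            symbols[tokens[1]] = int(tokens[2], 0)  # base 0 accepts 0x... literals
            continue
        args = [symbols.get(tok, tok) for tok in tokens[1:]]
        operations.append((tokens[0], args))
    return operations


SCRIPT = """
define pedestal_addr 0x1FFF
define enable_pedestal 0x2C
reset SIU
write_command enable_pedestal
write_block pedestal_addr pedestal.hex x
read_and_check_block pedestal_addr pedestal.hex x
"""

operations = parse_script(SCRIPT)
for command, args in operations:
    print(command, args)
```

Separating parsing from execution keeps the same script usable both interactively and as part of a start-of-run sequence, with the execution back end left open here.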
13 DATE V5
- Run Control: compatible with latest version of the Experiment Control System (see ECS talk)
- Use of database (MySQL) for configuration
- HLT decisions distribution
- New multi-streams data recorder
- Use of database (MySQL) for Info-logging system
- DATE V5 ready. Test during Data Challenge (Mar 05)
- Event building tested successfully with Infiniband
- Code management system: CVS
- Release packaging and distribution: Red Hat RPM
14 Use of DBMS for DATE Configuration
- Database content
- DATE Roles: actors of DATE system (LDCs, GDCs)
- Trigger: trigger masks
- Detectors: front-end equipment of LDCs
- Event building control: event building rules
- Banks: memory banks to operate DATE
- Database implementation
15 Distribution of HLT decisions inside DAQ
[Figure: DAQ architecture diagram annotated with the patterns attached to each event. Blocks: CTP, LTU, TTC, FERO, FEP, HLT Farm, LDC, Event Building Network, EDM, GDC, DSS, Storage Network. Annotations: L2 trigger pattern, original LDC pattern, refined LDC pattern, HLT output pattern.]
16 HLT decision handling in detector LDC
[Figure: data flow in a detector LDC. Labels: detector, event fragments, DDL SIU, DDL DIU, D-RORC, readout, DATE banks, raw data, HLT, HLT decisions, decision agent, recorder, NIC, selected sub-events, Event Building Network.]
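The role of the decision agent can be sketched as follows. This is an illustration under stated assumptions, not the DATE code: an LDC pattern is modelled as a bitmask with one bit per LDC, and the HLT is assumed to be able only to remove LDCs from the original pattern, never to add any. The function names are hypothetical.

```python
def refine_ldc_pattern(original_pattern, hlt_pattern):
    """Combine the original LDC pattern with the pattern requested by the HLT.

    Assumption: the HLT can drop LDCs from an event but cannot add LDCs that
    were not read out, hence the bitwise AND.
    """
    return original_pattern & hlt_pattern


def ldc_selected(ldc_index, pattern):
    """True if the LDC with this index must forward its sub-event."""
    return bool((pattern >> ldc_index) & 1)


original = 0b1111   # four LDCs took part in the event
hlt = 0b0101        # the HLT keeps only LDC 0 and LDC 2
refined = refine_ldc_pattern(original, hlt)
selected = [i for i in range(4) if ldc_selected(i, refined)]
print(bin(refined), selected)
```

In this model a fully rejected event is simply a refined pattern of zero, in which case no LDC forwards anything to the event building network.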
17 Event building and data recording in GDCs
- Event builder: in: sub-events; out: I/O vector (set of pointer/size pairs)
- DATE recorder: RFIO format done; ROOT data format in progress; parallel streams; CASTOR file system; interfaced to the GRID
[Figure: data flow in a GDC. Labels: Event Building Network, sub-events (raw data, HLT payload, HLT decisions), HLT decisions, NIC, DATE data banks, event builder, ROOT recorder, complete accepted events, GRID catalog, Storage Network.]
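The idea of an I/O vector, a set of pointer/size pairs describing where the pieces of an event sit in memory, can be sketched in a few lines. This is an illustration only, not the DATE event builder: the data bank is modelled as a `bytearray`, pointers as offsets into it, and the ordering by LDC number is an assumption.

```python
import io


def build_io_vector(subevents):
    """Build the I/O vector for one event.

    subevents maps an LDC number to the (offset, size) of its sub-event in the
    data bank. Ordering the pairs by LDC number is an assumption of this sketch.
    """
    return [subevents[ldc] for ldc in sorted(subevents)]


def record_event(bank, io_vector, stream):
    """Write the event piece by piece; slicing a memoryview copies no data."""
    view = memoryview(bank)
    written = 0
    for offset, size in io_vector:
        written += stream.write(view[offset:offset + size])
    return written


# Sub-events arrive in arbitrary order and land wherever the bank has room.
bank = bytearray(b"BBBB--AAA-CC")
subevents = {2: (0, 4), 1: (6, 3), 3: (10, 2)}

io_vector = build_io_vector(subevents)
out = io.BytesIO()
written = record_event(bank, io_vector, out)
print(io_vector, out.getvalue(), written)
```

The point of the design is that the event builder never assembles a contiguous event in memory: the recorder walks the vector and writes each piece directly from where it was received.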
18 Event building network
- ALICE baseline for event-building: Gigabit Ethernet, TCP/IP protocol
- Hw independent, de-facto network standard: IP
- Tests done at HP HPC NT (High Performance Computing New Technologies)
- Test of another event building network
- InfiniBand (4x)
- IPoIB (IP over IB) stack of Voltaire Inc.
- System running smoothly
- Not a single line of code modified
- Changing only /etc/hosts (alternate routing of packets)
- Performance with 14 LDC and 13 GDC: 2 GB/s
- ALICE DAQ ready for future networks
[Figure: network protocol stacks. Labels: TCP, iSCSI, iSER, TOE, Ethernet NIC, RNIC, InfiniBand HBA.]
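As a rough check of what the 2 GB/s figure means per machine, the aggregate can be divided by the number of nodes on each side. The sketch assumes decimal units (1 GB = 1000 MB) and evenly distributed traffic, neither of which is stated on the slide; the Gigabit Ethernet figure is the raw line rate of 1 Gbit/s.

```python
def per_node_mb_s(aggregate_gb_s, nodes):
    """Average bandwidth per node in MB/s, assuming evenly spread traffic."""
    return aggregate_gb_s * 1000.0 / nodes


ldc_rate = per_node_mb_s(2.0, 14)   # average output of one LDC
gdc_rate = per_node_mb_s(2.0, 13)   # average input of one GDC

# Raw line rate of one Gigabit Ethernet port: 1000 Mbit/s = 125 MB/s.
GIGABIT_ETHERNET_MB_S = 1000 / 8

print(f"per LDC: {ldc_rate:.1f} MB/s, per GDC: {gdc_rate:.1f} MB/s")
print("above one Gigabit Ethernet port:", gdc_rate > GIGABIT_ETHERNET_MB_S)
```

Under these assumptions each node moved more data than a single Gigabit Ethernet port can carry, which is consistent with the test having been run over InfiniBand.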
19 Infologger using DBMS
[Figure: infologger data flow into the Message Database. Labels: runControl, EDM, AFFAIR; LDC with DDL SIU, DDL DIU, D-RORC, rcServer, readout, DATE banks, decision agent, raw data, recorder, NIC; Event Building Network; GDC with NIC, rcServer, DATE banks, event builder, ROOT recorder.]
- Performance: 12000 - 20000 msg/s
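A database-backed message log of this kind can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The table layout, column names and sample messages below are hypothetical and are not the DATE infologger schema; the sketch only shows messages being inserted in batches and then queried by severity.

```python
import sqlite3
import time


def open_log_db():
    """Create an in-memory message database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        "timestamp REAL, severity TEXT, host TEXT, facility TEXT, text TEXT)"
    )
    return db


def log_batch(db, messages):
    """Insert several messages in one transaction to limit per-message cost."""
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", messages)


db = open_log_db()
now = time.time()
log_batch(db, [
    (now, "INFO", "ldc01", "readout", "run started"),
    (now, "ERROR", "gdc02", "eventBuilder", "timeout waiting for sub-event"),
])

errors = db.execute(
    "SELECT host, text FROM messages WHERE severity = 'ERROR'"
).fetchall()
print(errors)
```

Batching inserts into one transaction is a common way to reach message rates of the order quoted on the slide, although the actual mechanism used by the infologger is not described there.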
20 Data quality monitoring: MOOD
- MOOD: Monitoring Of Online Data
- DATE and ROOT environments
- MOOD framework
- Interfaces to detector code
- Applications
- Raw data integrity
- Detector performance
21 Monitoring software: AFFAIR
- Monitoring of system parameters: CPU usage, memory usage, etc.
- Monitoring of DATE system: individual bandwidth in/out, event number, etc.
22 AFFAIR V2
- New tool to install and configure AFFAIR
- Used daily in the DAQ reference system
- All performance plots of this talk are produced
with AFFAIR
23 Transient Data Storage: Storage Arrays
- Fast evolution from 2002 to 2005
- Prices dropped dramatically by using COTS disks
- Hard disk reliability not yet adequate
- dotHILL SANnet II 200 FC: 12 fiber channel disk slots, 1 GB cache, 1 x 2 Gbit fiber host channel
- Infortrend IFT-6330: 12 IDE drive slots, 128 MB cache, 2 x 2 Gbit fiber host channels
- Infortrend EonStor A16F-G1A2: 16 SATA drive slots, 1 GB cache, 2 x 2 Gbit fiber host channels
24 Storage Arrays Performance
- Aggregate throughput measured for sets of 5 disks configured as RAID 5
[Plots: two panels labelled "2 GB, write". Series: dothill 1, dothill 2; IFT 1, IFT 2.]
25 DAQ Reference System: sw data generator
[Figure: reference system with software data generators. Labels: DDG, DDG sw, HLT, Rare/All, LDC, Load Bal., Event Building Network, EDM, GDC, DSS, TDS, Storage, Storage Network.]
26 DAQ Reference System: hw data generator
[Figure: reference system with hardware data generators. Labels: L0, L1a, L2, BUSY, LTU, TTC, DDG, DDG sw, HLT, Rare/All, LDC, Load Bal., Event Building Network, EDM, GDC, DSS, TDS, Storage, Storage Network.]
27 DAQ/HLT setup for TPC test beam
[Figure: test beam setup. Labels: Detector LDC; Si Telescope, TOF; VME processor, CAEN VME boards; Fast Ethernet, 10 MB/s; GDC; CASTOR 1.5 TB; 3 x 250 GB disk.]
28 Combined ITS Test Beam DAQ setup
- Integration with TRG, ECS!
- NIM-based logic
- LTU decision
- Master-Slave logic
- TTC-based distribution
- Event ID from TTC
- ECS
- DATE V5
- Event-building event ID
[Figure: three parallel readout chains. Labels per chain: Trigger Logic, LTU, TTC vi, TTC ex, Detector Readout, DDL SIU, DDL, DDL DIU, RORC, LDC (PC/Linux). Common labels: Event Building Network, GDC, Mass Storage System, Computing Center.]
29 DAQ/Detector integration (Feb 04)
30 DAQ/Detector integration (Mar 05)
31 Detector readout time
- Muon TRK (F. Lefevre - S. Rousseau): 400 µs (measurement of test beam Aug. 04 scaled to 5% occupancy in tracking chambers)
- SPD (A. Kluge): 260 µs (between L2A and end of transfer to DAQ)
- SDD (D. Nouais) (ITS combined test beam Oct. 04)
- Single-buffer: dead-time 2 ms, rate 312 events/s
- Multi-buffer: dead-time 0.5-1.6 ms, rate 520 events/s
- TPC (L. Musa)
- 925 Hz for 16 kB: limited by ALTRO to RCU, will improve with sparse readout
- 320 Hz for 300 kB (central event): limited by DDL bw
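The readout times above translate into rough upper bounds on the event rate, and the TPC rates into data throughput. The sketch below is plain arithmetic under simple assumptions: a detector that is busy for the whole readout time cannot accept more than one event per readout time, and 1 MB is taken as 1000 kB. The slide does not say which readout unit the 16 kB and 300 kB event sizes refer to, so the throughput figures apply to that unspecified unit.

```python
def max_rate_hz(readout_time_us):
    """Upper bound on the event rate if each event blocks readout for this long."""
    return 1e6 / readout_time_us


def throughput_mb_s(rate_hz, event_size_kb):
    """Data throughput for a given event rate and size (1 MB = 1000 kB)."""
    return rate_hz * event_size_kb / 1000.0


muon_trk_bound = max_rate_hz(400)    # Muon TRK, 400 us
spd_bound = max_rate_hz(260)         # SPD, 260 us
sdd_single_bound = max_rate_hz(2000) # SDD single-buffer, 2 ms dead-time

tpc_small = throughput_mb_s(925, 16)     # 925 Hz for 16 kB
tpc_central = throughput_mb_s(320, 300)  # 320 Hz for 300 kB

print(muon_trk_bound, round(spd_bound), sdd_single_bound)
print(tpc_small, tpc_central)
```

The measured SDD single-buffer rate of 312 events/s lies below the 500 Hz bound obtained from the 2 ms dead-time alone, so other contributions must limit the rate as well; the slide does not identify them.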
32 Data Challenge VI
- System performances
- Event building bandwidth: 1.5 GB/s with ALICE traffic
- Storage bandwidth (50% increase compared to last ADC)
- Tape: 450 MB/s sustained over a week
- Disk: 700 MB/s peak needed
- System setup
- New 10 Gb Eth router unstable. Firmware upgrade by company failed.
- System reduced to a switch of the previous generation. Limited number of ports, limited bandwidth
- Limited number of machines (56 nodes, 15 LDC x 41 GDC). Only flat traffic test so far
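To give a sense of scale for the tape goal, a sustained rate can be converted into the data volume written in one week. The sketch assumes decimal units (1 TB = 10^6 MB) and an uninterrupted seven-day period; the function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def weekly_volume_tb(rate_mb_s):
    """Data volume in TB written in one week at a constant rate (decimal units)."""
    return rate_mb_s * SECONDS_PER_WEEK / 1e6


tape_volume = weekly_volume_tb(450)   # tape goal: 450 MB/s sustained over a week
print(f"{tape_volume:.2f} TB per week")
```

At 450 MB/s this amounts to roughly 272 TB in a week, which indicates the order of magnitude of storage the data challenge needs.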
33 Data Challenge: Event building bandwidth
[Plot: event building bandwidth in MBytes/s.]
- Discrepancy ALICE traffic vs equal traffic solved
- But too low due to lack of CPU
34 Event building performance
35 Data Challenge: Mass Storage bandwidth
[Plot: mass storage bandwidth in MBytes/s. Annotations: "Delayed to 2005: CASTOR not ready"; "Slightly less than the goal".]
36 16-17 Feb 2005: very promising start
[Plot: bandwidth to disk, before start of migration.]
37 15-22 Feb 2005
[Plot: global bandwidth with migration active, compared to the goal. Large fluctuations between GDCs.]
38 01-08 Mar 2005: RFIO production period
[Plot: bandwidth compared to the goal.]
39 Current status of Data Challenge
- Hardware setup
- New 10 Gigabit Ethernet router not usable
- Stable network with 1 single N7 box, but reduced performance and simplified architecture (no 10 Gbit Eth router)
- CASTOR
- 9 months delay: ADC VI restarted Jan 05 with the new version
- Lots of problems have been identified and fixed in the new CASTOR version
- Major involvement of the CASTOR team. But online debugging during close-to-production period. Is the process under control?
- Future schedule: only 18 months to get CASTOR right!
- 2005: 450 MB/s; 2006: 750 MB/s
- From Sep 2006 onwards whole ALICE DAQ team busy with detector integration
- ADC PoW
- DAQ only: event building test OK
- DAQ + RFIO + CASTOR: test OK
- Scalability of DATE V5.5: to be done (HLT decisions handling, DB-based applications configuration, infologger)
- DAQ + ROOT + CASTOR: to be done
- We need
- Stable reliable hw setup dedicated 100% to ADC
- Adequate storage resources to reach milestones: 450 MB/s with ROOT
40 DAQ Installation
- All equipment placed in racks
- Cables and racks in DCDB
- Installation planning established
41 DAQ Services Installation
- Revised DAQ planning
- DAQ at Point 2 not used before end of the year? Complete installation of all services before start of DAQ hw installation
- Jan-May 05: all services
- May-Jun: DAQ installation
- Jul-Aug: DAQ commissioning
- Sep: DAQ ready for TPC test in SXL
42 DAQ Commissioning
- Commissioning of hardware and software in DAQ lab
- Reference system with a rack from the experimental area
- Combined tests TRG/DAQ/HLT/DCS/ECS before installation
- Detector integration in the institutes and test beams
- Hardware: DDL and RORC
- Software: DATE, MOOD
- DAQ for Detector Test and Commissioning in Nov 05 for TPC