Title: E.Fullana, J. Torres, J. Castelo
1 ROD General Requirements and Present Hardware
Solution for the ATLAS Tile Calorimeter
- E.Fullana, J. Torres, J. Castelo
- Speaker Jose Castelo
- IFIC - Universidad de Valencia
- Tile Calorimeter ROD
- Colmar, 11 Sept. 2002
2 Introduction: Basic functionalities
- DATA PROCESSING: Raw data gathering from the FE at the L1A event rate (100 kHz) and delivery to the ROBs, with intermediate processing and formatting.
- TRIGGER: TTC signals will be present (latency 2 µs after L1A) at each module, providing ROD L1ID, ROD BCID and Ttype (trigger type).
- ERROR DETECTION: Synchronism trigger tasks. The ROD must check that the BCID and L1ID numbers received from the CTP match the ones received from the FEB. If a mismatch is detected, an error flag must be reported.
- DATA LINKS: Input data must be received through optical input receivers (RX). Processed event data must be sent to the ROBs through the standard ATLAS readout links, in the standard DAQ-1 data format, at the L1A event rate (100 kHz).
- BUSY GENERATION: Provide a ROD busy signal in order to stop L1A generation. This signal will be an OR function of the ROD crate modules, managed by the ROD TTC crate as an interface with the CTP.
- LOCAL MONITORING: VME access to the data during a run without introducing dead time or additional latency in the main DAQ data flow. Each ROD motherboard is a VME slave commanded by the ROD Controller (VME SBC).
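The error-detection and busy-generation requirements above can be illustrated with a minimal Python sketch. It is not the ROD firmware: the `TriggerIds` type, the flag names and the example identifier values are hypothetical and only show the logic (compare the TTC-side and FEB-side identifiers, OR the module busy lines).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerIds:
    """Identifiers attached to one event (field names are illustrative)."""
    l1id: int
    bcid: int


def sync_error_flags(ttc: TriggerIds, feb: TriggerIds) -> dict:
    """Compare the identifiers received via TTC with those sent by the FEB."""
    return {
        "l1id_mismatch": ttc.l1id != feb.l1id,
        "bcid_mismatch": ttc.bcid != feb.bcid,
    }


def crate_busy(module_busy: list) -> bool:
    """Crate BUSY is the logical OR of the BUSY lines of all ROD modules."""
    return any(module_busy)


# Hypothetical example values, chosen only for illustration.
ttc = TriggerIds(l1id=1024, bcid=77)
good = sync_error_flags(ttc, TriggerIds(l1id=1024, bcid=77))
bad = sync_error_flags(ttc, TriggerIds(l1id=1024, bcid=78))
one_module_busy = crate_busy([False] * 7 + [True])
no_module_busy = crate_busy([False] * 8)
```

In this sketch a single busy module is enough to raise the crate-level busy, which matches the OR behaviour described in the BUSY GENERATION bullet.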
3 ATLAS TDAQ Read-Out System
- LVL1: decision taken with calorimeter data (coarse granularity) and muon trigger chamber data. Buffering is done in the detector electronics.
- LVL2: uses Region of Interest data (up to 4% of the whole event) with full granularity and combines information from all detectors. Buffering is implemented in the ROBs.
- EF: refines the selection and can perform event reconstruction at full granularity using the latest alignment and calibration data. Buffering in EB
EF
RODs
4 TileCal ROD Dataflow and TTC partitions
- 9856 calorimeter channels (two fibers per drawer: 19712 channels with redundant information)
5 ROD Crate
- Crate Hardware
  - ROD Controller
  - TTC interface (TBM): Trigger and Busy Module
  - FE, Calibration Cards for data injection, calibration, etc.
  - M RODs per crate in N crates: motherboards with Processing Units, and transition modules for the output optical links. Current design: M = 8 and N = 4, for a total of 32 ROD modules for the acquisition of about 10,000 calorimeter cells.
- Interface with Environment
  - ATLAS DAQ and TileCal Run Control: Online Software
  - CTP: reception of TTC information and management of a per-crate ROD BUSY.
  - DATAFLOW: processes FEB events, detects synchronization errors, and sends data to the ROBs for the RoI-based LVL2 decision.
Crate OR BUSY to CTP
TTC signals from CTP
Online Software
Detector Data
Event Data
6 TTC scheme
- Number of TTC partitions: 4
- Organized around φ [0, 2π]: EB(η<0), CB(η<0), CB(η>0), EB(η>0)
- This distribution will allow us to work with only the central barrels if there are not enough RODs in the early runs.
- The scheme shown is for 4 partitions in two crates with 2 TBMs (in practice this is only possible with 4 crates because of the TBM architecture design).
7 Tiles ROD Racks and interconnections between crates
- USA15 room (radiation free)
- 2 racks (52U)
- Standard 9U ATLAS crates: 9U (slot) with transition module (probably 160 mm). External size for CERN crates: 10U
- Cooling: air/water (not decided; it depends on the amount of power dissipated by the G-LINK receivers)
- Readout fibers
  - Input fibers (front, in the MB)
  - Output fibers (rear, in the TM)
8 Using the new LArg ROD Motherboard for Tiles
9 Using the new LArg motherboard: hardware performance
- LArg needs more processing power per link: 128 channels/link (LArg) versus 45 (CB) and 32 (EB) channels/link (TileCal), so only 2 Processing Units and 2 Output Controllers, plus SDRAM data storage, are enough for the Tiles dataflow needs. Some estimations:
- Input bandwidth. The maximum input BW of each link for a TileCal physics event is 467.2 Mbit/s, so 4 links (4 drawers) need 1.825 Gbit/s. The input bandwidth of the Processing Unit is 2.5 Gbit/s (64 bits @ 40 MHz) => one PU has enough input BW for 4 links.
- The processing unit. We need to process 154 channels (four drawers) in two TMS320C6414 @ 600 MHz DSPs (4800 MIPS each). This DSP has the same core as the DSP we have tested so far, the TMS320C6202 @ 250 MHz (2000 MIPS), with some improvements in the number of registers and an enhanced DMA unit. Our current lab routines can process 45 channels in around 5.5 µs (assembler) or 15.5 µs (C code). Potentially, we could process 154 channels with the new PU (two TMS320C6414 @ 600 MHz, 9600 MIPS in total) in 3.92 µs (assembler) or around 11 µs (C). Our limit is 10 µs at the 100 kHz LVL1 rate, so if we believe in improvements in the C compiler from Texas Instruments, we could probably program the final system in more maintainable C code with only 2 Processing Unit mezzanines installed on the motherboard.
- Output bandwidth. The typical BW for 154 channels (four drawers) is 656 Mbit/s, so an Output Controller FPGA with 1.28 Gbit/s (32 bits @ 40 MHz) has enough BW for the output of each Processing Unit (154 channels each).
- Transition Module: 2 mezzanine links are enough for this configuration.
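The estimations on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above. It assumes that processing time scales linearly with the number of channels and inversely with the MIPS rating (this is what reproduces the quoted 3.92), and it reads the 154 channels per PU as two CB drawers plus two EB drawers, which is an interpretation rather than something the slide states. All variable and function names are illustrative.

```python
# Figures quoted on the slide.
LINK_BW_MBIT = 467.2          # max input bandwidth per link, Mbit/s
LINKS_PER_PU = 4
PU_INPUT_MBIT = 64 * 40       # 64 bits @ 40 MHz -> 2560 Mbit/s
OC_OUTPUT_MBIT = 32 * 40      # 32 bits @ 40 MHz -> 1280 Mbit/s
TYPICAL_OUTPUT_MBIT = 656

# Interpretation: 2 CB drawers (45 ch) + 2 EB drawers (32 ch) per PU.
channels_per_pu = 2 * 45 + 2 * 32

# 4 links need 1868.8 Mbit/s; dividing by 1024 gives the quoted 1.825 Gbit/s.
input_needed_mbit = LINK_BW_MBIT * LINKS_PER_PU
input_ok = input_needed_mbit < PU_INPUT_MBIT
output_ok = TYPICAL_OUTPUT_MBIT < OC_OUTPUT_MBIT


def scaled_time_us(t_ref_us, n_ref=45, n_new=154, mips_ref=2000, mips_new=9600):
    """Scale a measured time linearly in channels, inversely in MIPS (assumption)."""
    return t_ref_us * (n_new / n_ref) * (mips_ref / mips_new)


asm_us = scaled_time_us(5.5)     # from the 5.5 us assembler measurement
c_us = scaled_time_us(15.5)      # from the 15.5 us C measurement
budget_us = 1e6 / 100e3          # one event every 10 us at 100 kHz

# Cross-check: 8 RODs/crate x 4 crates x 2 PUs x 154 channels.
total_channels = 8 * 4 * 2 * channels_per_pu

print(f"assembler: {asm_us:.2f} us, C: {c_us:.2f} us, budget: {budget_us:.0f} us")
```

Under these assumptions the assembler estimate fits within the 10 µs budget while the C estimate slightly exceeds it, and the total channel count agrees with the 9856 channels quoted for the detector.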
10 Compatibility issues with LArg motherboard
- From the Tiles FEB electronics: we need a simple VHDL change in the FEB FPGAs (G-link HDMP1032 pin ESMPXENB = 0).
- From the new LArg motherboard design we need:
  - To connect the CAV line of the HDMP1024 to the Staging FPGA in order to receive control words (not only data words). In the figure this line is highlighted in red.
  - To connect the FLAGSEL line of the HDMP1024 to the Staging FPGA. LArg uses the FLAG bit to mark the even and odd 16-bit fragments, whereas TileCal uses CAV (control bit) to mark the start of transmission and counts the even and odd 16-bit fragments from there. Tiles uses the FLAG bit to mark the global CRC word, so Tiles needs FLAGSEL set high and LArg needs it set low; this pin must therefore be connected to the Staging FPGA to maintain compatibility. In the figure this line is highlighted in red.
  - The connections of FDIS, ACTIVE, LOOPEN and STAT1 seem to be correct, since they are standard. The configuration is Simplex Method III.
  - To replace the LArg 80 MHz clock with the Tiles 40 MHz clock (pinout-compatible oscillators needed) to get the right reference clock for the G-LINK receivers (Tiles uses the HDMP1032 @ 40 MHz in the interface links).
11 ROD SOFTWARE: OPTIMAL FILTERING ALGORITHM (I)
- A multisample method first developed for liquid ionization calorimeters (see W.E. Cleland and E.G. Stern, Nucl. Instr. and Meth. A 338 (1994) 467).
- Allows the reconstruction of energy and time information.
- Additionally, it minimizes the noise coming from thermal sources (electronics) and also from minimum bias events (see plots).
Noise reduction of the optimal filtering algorithm (OF) versus other algorithms (FF). Testbeam data 2001
12 ROD SOFTWARE: OPTIMAL FILTERING ALGORITHM (II)
Samples coming from the front-end electronics: S1, S2, S3, ..., Sn
First set of weights: a1, a2, a3, ..., an => Energy information E
Second set of weights: b1, b2, b3, ..., bn => Time information
- Weights are obtained from the noise autocorrelation matrix and the ideal PMT pulse shape g(t).
- The mathematical procedure uses Lagrange multipliers in order to reduce the electronic noise factor (see E. Fullana's talk @ Feb TileCal analysis meeting)
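To make the two sets of weights concrete, here is a minimal pure-Python sketch. It does not reproduce the weights used for TileCal: it assumes white noise (identity autocorrelation matrix) in place of the measured matrix, pedestal-subtracted samples, and a pulse linearized as S_i = E g_i - E τ g'_i. Under those assumptions the Lagrange-multiplier solution has a closed form with the constraints Σ a_i g_i = 1, Σ a_i g'_i = 0, Σ b_i g_i = 0 and Σ b_i g'_i = -1. The pulse-shape values below are invented for illustration.

```python
def of_weights(g, dg):
    """Weights minimizing the noise variance for white noise (assumption),
    subject to the amplitude and time constraints given in the lead-in."""
    gg = sum(x * x for x in g)
    gd = sum(x * y for x, y in zip(g, dg))
    dd = sum(y * y for y in dg)
    det = gg * dd - gd * gd
    a = [(dd * x - gd * y) / det for x, y in zip(g, dg)]
    b = [(gd * x - gg * y) / det for x, y in zip(g, dg)]
    return a, b


def reconstruct(samples, a, b):
    """Energy from the first weight set, time from the second (E*tau / E)."""
    energy = sum(w * s for w, s in zip(a, samples))
    energy_times_tau = sum(w * s for w, s in zip(b, samples))
    return energy, energy_times_tau / energy


# Hypothetical 7-sample pulse shape and its derivative (illustrative values).
g = [0.0, 0.2, 0.7, 1.0, 0.7, 0.3, 0.1]
dg = [0.1, 0.4, 0.5, 0.0, -0.4, -0.3, -0.1]
a, b = of_weights(g, dg)

# Build samples from the linearized model with known E and tau, then recover them.
true_e, true_tau = 100.0, 0.5
samples = [true_e * (x - true_tau * y) for x, y in zip(g, dg)]
energy, tau = reconstruct(samples, a, b)
```

With a real noise autocorrelation matrix R, the same constraints apply but the weights involve the inverse of R instead of these simple sums.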
13 DSP Software Implementation: DSP Core Architecture
- Based on Texas Instruments DSP C6202
- Harvard architecture: program and data memory can be accessed at the same time.
- FCLK 250 MHz. Cycle time 4 ns. 2000 MIPS
- Data/Program memory: 1 Mbit (128 kbyte) / 2 Mbit (64k x 32 bits)
- DMA channels: 4
- EMIF, HPI: 32 bits
- McBSP: 3
- Timers: 2 (32 bit)
- VCORE 1.8 V / VI/O 3.3 V
- 3-phase pipeline
- 8 independent ALUs. Load-store architecture with 32 32-bit general-purpose registers (two banks of 16). All instructions conditional
14 DSP Software Implementation: OF Algorithm
- Optimal filtering is used to obtain energy, tau and χ².
- The implementation considers 7 samples of 10 bits.
- Current studies show that the resolution would not be improved by using different weights for each cell => the same calibration constants table is used for all channels (this could be changed).
- The calculations are 32-bit integer, except for multiplications (16 bits) due to the DSP architecture, always trying to get the maximum resolution out of the integer ALU operations.
- C and assembler code were developed to compare the two input languages for coding the algorithm.
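The integer arithmetic described above (16-bit multiplication operands, 32-bit accumulation, 7 samples of 10 bits) can be sketched as follows. This is not the DSP code: the Q-format shift, the weight values and the sample values are hypothetical, and Python integers are used with explicit range checks to mimic the register widths.

```python
SHIFT = 14  # hypothetical fixed-point scale: weights stored as w * 2**14


def quantize(weights, shift=SHIFT):
    """Convert float weights to signed 16-bit integers (multiplier operand size)."""
    q = [int(round(w * (1 << shift))) for w in weights]
    if not all(-32768 <= w <= 32767 for w in q):
        raise ValueError("weight does not fit in 16 bits")
    return q


def energy_fixed(samples, qweights, shift=SHIFT):
    """Accumulate 10-bit sample x 16-bit weight products in a 32-bit range."""
    acc = 0
    for s, w in zip(samples, qweights):
        if not 0 <= s < 1024:
            raise ValueError("sample is not a 10-bit value")
        acc += s * w
        if not -(1 << 31) <= acc < (1 << 31):
            raise OverflowError("accumulator exceeds 32 bits")
    return acc >> shift  # scale back to ADC counts (rounds toward -infinity)


# Hypothetical weights (sum to zero, so a constant pedestal cancels) and samples.
weights = [-0.375, -0.25, 0.25, 0.75, 0.25, -0.25, -0.375]
samples = [50, 52, 300, 600, 320, 60, 50]

qweights = quantize(weights)
e_fixed = energy_fixed(samples, qweights)
e_float = sum(w * s for w, s in zip(weights, samples))
```

Seven products of a 10-bit sample and a 16-bit weight stay below 2^29 in magnitude, so the 32-bit accumulator cannot overflow in this configuration; the only loss with respect to the floating-point result is the final truncation.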
15 C vs. Assembler
Energy/Tau/χ² algorithm for 45 channels and 7 samples of 10 bits. Comparison of compilation characteristics for the best speed performance profiling option
16 Current developments: Staging FPGA (I)
- Online dataflow
  - Routes the incoming data from the different FEB inputs and sends them to the connector of the PU concerned, depending on whether staging is enabled or not. This feature makes it possible to use only two processing units instead of four, routing the data to the desired PU (the staging is configured through the VMEbus).
  - G-link chip configuration, which will be performed by the staging chips.
  - It reads the temperature of the G-link and transmits it to the VME chip (generating an IRQ), because these chips usually have high power consumption.
  - It transmits the G-link errors (parity and ready) to the PUs and to the VME.
- During the tests it will
  - Read the G-link data and transfer them to the VME.
  - Transfer data from the VME to the PU (similar function to the 'data distributor block' in the demonstrator board).
  - Transfer some pattern data to the PU at high rate.
- The code must be written in VHDL for maintainability reasons.
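The routing role of the Staging FPGA can be pictured with a small Python model. It is only a behavioural sketch, not VHDL and not the actual design: it assumes 8 input links per motherboard (consistent with 4 links per PU and 2 PUs in staging mode), and the link and PU numbering is hypothetical.

```python
N_LINKS = 8  # assumption: 8 FEB input links per motherboard


def route(link, staging):
    """Return the index of the PU that receives data from a given input link.

    staging=True  -> 4 links per PU, only PUs 0 and 1 are used
    staging=False -> 2 links per PU, PUs 0 to 3 are used
    """
    if not 0 <= link < N_LINKS:
        raise ValueError("unknown input link")
    links_per_pu = 4 if staging else 2
    return link // links_per_pu


staged = [route(link, staging=True) for link in range(N_LINKS)]
full = [route(link, staging=False) for link in range(N_LINKS)]
```

In this model the staging flag plays the role of the configuration written through the VMEbus: the same eight inputs are either spread over four PUs or concentrated on two.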
17 Current developments: Staging FPGA (II)
18 Summary
- Work with the ROD demonstrator board was very useful and instructive.
- Waiting for the first prototypes of the new motherboard and studying LArg and TileCal compatibility for a common ROD board for the ATLAS calorimeters.
- Programming a versatile Staging FPGA design compliant with TileCal and LArg.
- Optimal Filtering studies and analysis were successful for 1998 testbeam data. Currently this work is focused on 2001 and 2002 TileCal testbeam data. Better results are expected with the final FEB electronics mounted in these test beams (real electronic noise).
- The DSP implementation of Optimal Filtering runs in under 10 µs (100 kHz ATLAS LVL1 rate) in assembler, but not in C, using the 250 MHz C6202 DSP. We expect to be below this threshold with the new-generation PU with the 600 MHz C6414.
- DAQ (online software) integration of the demonstrator board.