Title: Use of Programmable Logic in a Pipelined Trigger for ATLAS
1. Use of Programmable Logic in a Pipelined Trigger for ATLAS
Samuel Silverstein, Stockholm University
For the ATLAS Level-1 Calorimeter Trigger Collaboration
- Programmable Logic
- The ATLAS Jet/Energy Sum Processor
- Architecture and functionality
- Design Features
- Demonstrator Programs
- Outlook
2. Programmable Logic
- Advantages of programmable logic
- Fast prototyping cycle (hours vs. weeks)
- Safety Factor
- Errors discovered later can be easily fixed
- Functionality can be altered if needs change
- 100% device yield (devices factory tested)
- Design Multiplexing
- Same hardware can perform different functions
- e.g. separate configurations for data taking, diagnostics, etc.
3. Programmable Logic
- ASIC or FPGA?
- ASICs cost less per unit
- But NRE costs make FPGAs attractive in small quantities
- ASICs have higher performance
- But newer FPGAs are capable of 40, 80, or 160 MHz system speeds, which is good enough for LHC applications
- Using programmable logic for LHC projects
- ASICs can sometimes be replaced by FPGAs, or
- Systems can be designed for COTS components...
4. Organization of the ATLAS Level-1 Trigger
[Block diagram: analog tower sums from the calorimeters feed the calorimeter preprocessor (0.1 x 0.1 towers, 8 bits; 0.2 x 0.2 jet elements, 9 bits), which feeds the e/tau and jet/sum-ET trigger processors; the muon detectors feed the muon trigger. Feature positions and trigger data go to the Central Trigger Processor and the Region of Interest Builder (LVL1/LVL2). The Level-1 Accept and trigger/timing control signals go to the front-end electronics, and RoI data goes to the Level-2 trigger.]
5. The Jet/Energy Sum Processor
- Functions
- Calculate total and missing ET sums
- Identify jet clusters
- Local 0.4 x 0.4 maximum (sliding by 0.2 x 0.2)
- 8 energy thresholds with selectable window sizes around the local maximum: 0.4 x 0.4, 0.6 x 0.6, or 0.8 x 0.8
- Report jet cluster positions to Level-2 (as RoIs)
- Read out time-slice data to the DAQ system
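The total and missing-ET sums amount to accumulating per-element transverse energies and their x/y projections. A minimal Python sketch (the real implementation is pipelined fixed-point logic; the function and variable names here are illustrative):

```python
import math

def energy_sums(elements):
    """elements: iterable of (et, ex, ey) tuples, one per jet element.

    Returns (total ET, missing ET). The hardware uses fixed-point
    pipeline adders; floats are used here only for clarity.
    """
    sum_et = sum(et for et, _, _ in elements)
    ex = sum(ex for _, ex, _ in elements)
    ey = sum(ey for _, _, ey in elements)
    return sum_et, math.hypot(ex, ey)

# Example: two back-to-back elements cancel in missing ET
print(energy_sums([(10, 10, 0), (10, -10, 0)]))  # (20, 0.0)
```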
6. The Jet/Energy Sum Processor
- Developed jointly by two institutions
- University of Mainz
- K. Jakobs, G. Quast, U. Schäfer, J. Thomas
- Primary responsibilities: serial links, energy summation
- Stockholm University
- C. Bohm, M. Engström, S. Hellman, S. Silverstein
- Primary responsibilities: jet algorithm, readout
7. The Jet/Energy Processor
[Diagram: the eta-phi space is covered by 0.2 x 0.2 jet elements over -3.2 < eta < 3.2, plus the FCAL, with separate EM and hadronic sums, in a phi-quadrant architecture; one of 4 (or possibly 2) Jet/Energy crates is shown.]
Key:
- 2 x 8 core region processed by a JEM (Jet/Energy Module)
- Environmental data from the neighboring quadrant, via duplicated cable links
- Environmental data from neighboring JEMs, via backplane links
8. The Jet/Energy Module (JEM)
(conceptual drawing)
- LVDS receivers: 400 MB/s LVDS input data
- 80 MHz data sharing on a point-to-point backplane
- Jet cluster multiplicities to the Jet Merger Module
- ΣET, ΣEx, and ΣEy to the Sum Merger Module
- RoI data to Level-2
- Slice data to DAQ
9. The Energy Summation Algorithm
[Block diagram, one jet element: the 9-bit EM energy (E_EM) and 9-bit hadronic energy (E_Had) are summed to 10 bits and passed through LUTs producing ET, Ex, and Ey; thresholds are applied, and the result is multiplexed at 80 MHz in 5-bit slices to the jet algorithm (via fanout) and to the summation logic. Baseline device: Xilinx Virtex.]
10. The Energy Summation Algorithm
- Implemented on Xilinx Virtex (baseline)
- Large amounts of logic and I/O resources
- Both distributed and block memory available
- DLLs for internal and external clock synchronization
- Supports high system speeds (>> 80 MHz)
- Relatively inexpensive
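One common way to realize the Ex/Ey lookup tables is to precompute ET·cos(φ) and ET·sin(φ) for each element's azimuth, so a 10-bit energy sum indexes straight into block memory instead of driving a multiplier. The cos/sin weighting and the helper below are assumptions for illustration, not taken from the design:

```python
import math

def make_projection_luts(phi, bits=10):
    """Precompute Ex/Ey lookup tables for a jet element at azimuth phi.

    Each possible 10-bit energy sum indexes directly into the tables,
    replacing a runtime multiply -- the kind of job the FPGA's block
    memory can serve. The cos/sin weighting is an assumed convention.
    """
    n = 1 << bits
    ex_lut = [round(e * math.cos(phi)) for e in range(n)]
    ey_lut = [round(e * math.sin(phi)) for e in range(n)]
    return ex_lut, ey_lut

ex_lut, ey_lut = make_projection_luts(0.0)
print(ex_lut[100], ey_lut[100])  # 100 0  (phi = 0: all energy along x)
```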
11. The Jet Algorithm
- Decluster/RoI identification
- Identify the 0.4 x 0.4 cluster (step size 0.2) that is a local maximum
- Jet identification
- Apply thresholds to a 0.4, 0.6, or 0.8 region around that local maximum
- 8 jet thresholds available, each with an independent choice of energy and cluster size: 0.4 x 0.4, 0.6 x 0.6, or 0.8 x 0.8
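A software sketch of the declustering and thresholding steps, assuming a simple greater-than comparison against the eight neighboring windows (the real logic defines tie-breaking and window positioning more carefully; all names here are illustrative):

```python
def is_local_max(grid, i, j):
    """Is the 2x2 (0.4 x 0.4) window at (i, j) a local maximum?

    grid: 2D list of jet-element ET values; windows slide by one
    element (0.2). Tie handling is simplified relative to the real
    declustering logic.
    """
    def window(i, j):
        return sum(grid[i + di][j + dj] for di in (0, 1) for dj in (0, 1))
    here = window(i, j)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) == (0, 0):
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni <= len(grid) - 2 and 0 <= nj <= len(grid[0]) - 2:
                if window(ni, nj) > here:
                    return False
    return True

def passes_threshold(grid, i, j, size, threshold):
    """Apply one jet threshold to a size x size element window
    (2 -> 0.4, 3 -> 0.6, 4 -> 0.8) around the local maximum."""
    off = (size - 2) // 2  # rough centring; the real windowing is defined per size
    e = sum(grid[i - off + di][j - off + dj]
            for di in range(size) for dj in range(size))
    return e >= threshold
```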
12. The Jet Algorithm (one JEM)
[Block diagram: each JEM receives 55 jet elements (10 bits each), forms 40 0.4 x 0.4 sums, and compares neighbors to select local maxima; thresholds are then applied to 16 0.4 x 0.4, 27 0.6 x 0.6, and 16 0.8 x 0.8 windows. Baseline device: Altera 10K250.]
13. The Jet Algorithm Implementation
- Implementation features
- Data received and processed at 80 MHz
- 550 bits received on 275 I/O pins (of 470)
- A single FPGA handles an entire JEM (64 devices in the system)
- Decreased latency
- Addition steps take 12.5 ns, not 25 ns
- No demultiplexing step necessary
- 5-bit serial calculations use less routing and logic
- Duplicate and unnecessary calculations eliminated
14. 5-bit Serial Algorithms
Example: a 10-bit adder implemented with 5-bit logic.
[Diagram: a 5-bit adder is clocked at 80 MHz; a MUX driven by a 40 MHz phase select feeds it the low 5-bit halves of operands A and B on the first tick and the high halves on the second, with the carry held in a D flip-flop between ticks; the 5-bit sum and overflow stream out in two slices.]
Latency is 12.5 ns; routing and logic resources are nearly halved.
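The two-tick operation can be imitated in software: add the low 5-bit halves on the first 80 MHz tick, hold the carry in a flip-flop, then add the high halves plus the carry on the second tick. A Python sketch (function name illustrative):

```python
def serial_add10(a, b):
    """Add two 10-bit values with one 5-bit adder used twice.

    Tick 1 (80 MHz): low 5-bit halves; the carry is saved, playing
    the role of the D flip-flop. Tick 2: high halves plus carry.
    Returns (10-bit sum, overflow bit).
    """
    mask = 0x1F  # 5 bits
    s_lo = (a & mask) + (b & mask)
    carry = s_lo >> 5                              # held between ticks
    s_hi = ((a >> 5) & mask) + ((b >> 5) & mask) + carry
    return (s_lo & mask) | ((s_hi & mask) << 5), s_hi >> 5

print(serial_add10(300, 500))   # (800, 0)
print(serial_add10(1023, 1))    # (0, 1): wraps with overflow set
```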
15. The Jet Algorithm Implementation
- Implemented on Altera FLEX 10K250 (baseline)
- Large amounts of logic and I/O resources
- Supports high system speeds (> 100 MHz)
- Routing architecture well suited to a large design with high fanout and a high clock rate
- Plans to evaluate other devices
- Xilinx Virtex
- Altera Apex
16. The Jet/Energy Processor
- Backplane
- High-density backplane based on COTS parts
- 52-row 2 mm HM connectors, as used in CompactPCI
- IEEE 1101.10 crate and front-panel hardware
- All high-speed connections are point-to-point
- Nearest-neighbor sharing of data between JEMs
- Jet and ET results to merger modules
- LVCMOS signal levels used
- Possible to eliminate many driver/receiver chips in the design
17. The Jet/Energy Processor
- Taking advantage of FPGA reprogrammability
- Different data-taking configurations
- High/low luminosity
- High/low background noise
- etc.
- Special diagnostic configurations
- Fast, complete testing of all data paths
- Can be loaded and run during beam fills, etc.
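As an illustration of what such a diagnostic configuration could do, a walking-ones pattern exercises every line of a data path individually; the helper names below are hypothetical and only sketch the idea in software:

```python
def walking_ones(width):
    """Test pattern with exactly one bit set per word, covering
    every line of a width-bit bus in turn."""
    return [1 << i for i in range(width)]

def check_path(tx_words, rx_words):
    """Compare transmitted and received words; a nonzero XOR marks
    the bit positions that were stuck or shorted on that word."""
    errors = [t ^ r for t, r in zip(tx_words, rx_words)]
    return all(e == 0 for e in errors), errors

ok, _ = check_path(walking_ones(10), walking_ones(10))
print(ok)  # True
```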
18. Demonstrator Programs
- JEM Module -2 Technology Demonstrator
- 80 MHz Data Transmission
- Between FPGAs on the same PCB
- Across point-to-point backplane links
- With and without intermediate clocked buffers
- High-speed serial link tests
- Mainz Link Test Board
- 2 Virtex FPGAs and 8 LVDS receivers
- Twisted-pair cable and differential PCB tracks
19. Module -2 Technology Demonstrator
[Annotated board drawing: VME interface, Glink receiver, Xilinx 4013XL (ET), ALVC buffer, Glink transmitter, 2 mm HM connector, Altera 10K50V (jet).]
20. Module -2 Technology Demonstrator
- Results
- Error-free 80 MHz CMOS data transmission, with and without intermediate clocked ALVC drivers
- Upper limits on bit error rates: 10^-14
- Timing margins for 80 MHz data (12.5 ns clock)
- With intermediate ALVC drivers: 9.6 ns
- Direct transmission between FPGAs: 7.3 ns
- 5-bit serial algorithms implemented and tested
- Early experience with Glink and LVDS links
- Superseded by Mainz Link Test Board results
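The quoted limit is the kind of bound an error-free run implies: with zero errors observed in N transmitted bits, the 95%-confidence upper limit on the bit error rate is roughly 3/N (the "rule of three"). The run length used below is an illustrative assumption, not a figure from the tests:

```python
import math

def ber_upper_limit(bits_sent, cl=0.95):
    """Upper limit on the bit error rate from an error-free run.

    For zero observed errors, solving (1 - p)**N = 1 - CL gives
    p < -ln(1 - CL) / N, roughly 3/N at 95% CL ("rule of three").
    """
    return -math.log(1.0 - cl) / bits_sent

# An error-free run of 3e14 bits would bound the BER below ~1e-14:
print(ber_upper_limit(3e14) < 1e-14)  # True
```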
21. Mainz Link Test Board
22. Mainz Link Test Board
- Results
- LVDS transmission problems identified and solved
- Active pre-compensation circuit added to the LVDS data source to extend cable range
- Stable transmission of 4 channels over 15 m of Category 5 cable, plus 55 cm of differential tracks on PCBs
- Investigating COTS compensation solutions for the final system
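In discrete-time terms, cable pre-compensation of this kind resembles a simple pre-emphasis filter that boosts signal transitions to offset high-frequency cable loss; the sketch below is illustrative only (the tap weight k is an assumption, not a value from the board):

```python
def pre_emphasis(samples, k=0.5):
    """2-tap pre-emphasis: boost transitions, leave steady levels alone.

    y[n] = x[n] + k * (x[n] - x[n-1]); the coefficient k stands in
    for the board's analog compensation network.
    """
    out, prev = [], 0.0
    for x in samples:
        out.append(x + k * (x - prev))
        prev = x
    return out

# Edges are overshot, flat runs pass through unchanged:
print(pre_emphasis([0, 1, 1, 0]))  # [0.0, 1.5, 1.0, -0.5]
```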
23. Outlook
- Current demonstrator program nearly finished
- A few more tests of the Mainz Link Test Board
- Finish writing up the results of both programs
- Plans underway for full prototype systems
- System -1 to be built by summer 2000
- System 0 by the end of 2001
- Possible additional technology demonstrators
- e.g. Altera Apex, etc.
24. Conclusions
- A high-performance pipelined calorimeter trigger processor has been designed, based on programmable logic and COTS components.
- Initial tests indicate that the design principles are sound; additional demonstrators and prototypes are planned.
- It is still possible to take advantage of future advances in FPGA technology for the final system.