Title: Real-Time Embedded System Support for the BTeV Level 1 Muon Trigger
1 Real-Time Embedded System Support for the BTeV Level 1 Muon Trigger
- Michael J. Haney m-haney_at_uiuc.edu
- RTES Collaboration
- Nuclear Science Symposium, Portland, OR, October 19-25, 2003
- Poster N36-65
2 Real-Time Embedded Systems Collaboration
- Funded by the National Science Foundation
- Information Technology Research Program
- Consisting of
- University of Illinois
- Center for Reliable and High-Performance Computing (CRHC) of the Coordinated Science Laboratory (CSL)
- Design and Validation of Reliable Networked Systems Research Group
- D. Beauregard, R. Iyer, Z. Kalbarczyk, Q. Liu, L. Wang
- High Energy Physics Group of the Department of Physics
- M. Haney, M. Selen
- University of Pittsburgh
- Fault Tolerant Real-Time Systems (FORTS) Group in the Department of Computer Science
- D. Mosse, O. Shigiltchoff
- Link-to-Learn educational program
- College in High School educational program
3 RTES Collaboration (2)
- Syracuse University
- Department of Electrical Engineering and Computer Science
- R. Chopade, L. Hovey, M. Jung, D. Messie, J. Oh
- High Energy Physics Group of the Department of Physics
- S. Stone
- Vanderbilt University
- ISIS (Institute for Software Integrated Systems)
- T. Bapty, S. Neema, S. Nordstrom, S. Shetty, S. Vashishtha, D. Yao
- BTeV Group, part of the High Energy Physics Group in the Department of Physics and Astronomy
- P. Sheldon, E. Vaandering
- Fermi National Accelerator Laboratory
- BTeV Collaboration of the Particle Physics Division
- J. Butler, E. Gottschalk, J. Kowalkowski
- Computing Division
- J. Kowalkowski, M. Votava
- Fermilab Education Office
- J. Appel
4 RTES Mission
- (from the NSF proposal)
- to develop methodologies and tools for designing and implementing very large-scale real-time embedded computer systems that
- achieve ultra-high computational performance through use of parallel hardware architectures
- achieve and maintain functional integrity via distributed, hierarchical monitoring and control
- are required to be highly available
- are dynamically reconfigurable, maintainable, and evolvable
5 BTeV L1 Muon Trigger Architecture (1 highway)
[Block diagram. Labels: Muon Detector; Muon Front End; Muon Trigger; L1 Trigger Switch; DSP Farm (three shown, each with a DSP); Global Level 1. Annotations: 1 highway, 1 Gb/s; 7 other highways; raw muon data to DAQ from each Muon PreProcessor; processed data and results to DAQ.]
6 BTeV L1 Muon Trigger Architecture
[Block diagram. Labels: MUON DETECTOR; FRONT END SIGNALS; PreProc, SWITCH, and FARM (repeated for each of N highways); GLOBAL LEVEL 1; TRIGGER LEVEL 2; PIXEL TRIGGER. Annotations: 250 nodes; 48 nodes (total).]
7 BTeV L1 Pixel Trigger
- It is no coincidence that the L1 Muon and Pixel Triggers are similar (see also N29-7, N36-61)
- Shared architecture: design reuse, substantial cost saving
8 The Problem
- Large system
- 100s (1000s) of FPGAs in the muon (pixel) trigger
- 250 (2500) DSPs
- 2500 processor Linux farm
- countless fiber cables, Cat-5 cables, backplanes
- Something will fail!
- How do we get the best/most physics?
- Detection, mitigation, graceful degradation
- RTES!
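A rough estimate shows why "something will fail" is close to a certainty at this scale. The sketch below assumes that components fail independently and with the same small probability during a run. The function name and the per-node probability of 0.001 are illustrative assumptions, not BTeV figures; only the count of 2500 farm processors comes from the slide.

```python
def prob_any_failure(n_components: int, p_fail: float) -> float:
    """Probability that at least one of n independent components fails."""
    if n_components < 0:
        raise ValueError("n_components must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    # P(no failure) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_fail) ** n_components


if __name__ == "__main__":
    # 2500 is the Linux farm size from the slide; 0.001 is an assumed,
    # purely illustrative per-node failure probability for one run.
    print(f"{prob_any_failure(2500, 0.001):.3f}")
```

Under these assumptions the chance of at least one failure in the farm alone is about 0.92, which is the motivation for building detection and mitigation into the system instead of treating failures as exceptional.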
9 RTES view of the BTeV Trigger
[Hierarchy diagram. Labels: Slow/Run Control; Global MTSM; Global PTSM; Global GL1SM; Global L2/3SM; DAQ Switch; L2/3 (three shown); GL1 Manager; GL1; Concentrator (801); Concentrator (41); Regional MTSM (two shown); Farmlet Manager; DSP (four shown); FPGA Manager; FPGA (several shown); Buffer Manager; L1 Buf (three shown); FIFO; Level 1 Switch; Muon Detector. Annotation: 60 (or 600) farmlets.]
- GL1SM: Global L1 Trigger Supervisor/Monitor
- MTSM: Muon Trigger Supervisor/Monitor
- PTSM: Pixel Trigger Supervisor/Monitor
- L2/3SM: Level 2/Level 3 Trigger Supervisor/Monitor
10 RTES Solution
- Model Integrated Computing
- Graphical representation of complex system, with modeling (simulation) resources
- ARMORs
- To protect Linux processes
- And subordinate DSP processes
- VLAs
- To monitor/mitigate at every level
- DSP, Supervisory Linux, Linux trigger farm, etc.
11 Adaptive Reconfigurable and Mobile Objects for Reliability
- On Linux/Windows machines
- Especially the xxSM supervisors and the L2/L3 farm
- Execution ARMOR: oversees the application process (e.g. the various Trigger Supervisor/Monitors)
- Daemons: detect ARMOR crash and hang failures
- Heartbeat ARMOR: detects and recovers FTM failures
- Fault Tolerant Manager (FTM): highest-ranking manager in the system
- ARMOR processes: provide a hierarchy of error detection and recovery. ARMORs are protected through checkpointing and internal self-checking.
[Diagram labels: ExecARMOR; AppProcess; Daemon; network.]
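The crash and hang detection described on this slide can be illustrated with a heartbeat timeout. The following is a minimal sketch under stated assumptions, not the ARMOR implementation or its API: the class `HeartbeatMonitor`, its methods, the `recover` helper, and the 2-second timeout are all hypothetical names and values chosen for illustration. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat of each watched process (hypothetical helper)."""
    timeout_s: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, name: str, now: float) -> None:
        """Record that process `name` reported in at time `now` (seconds)."""
        self.last_seen[name] = now

    def overdue(self, now: float) -> List[str]:
        """Return, sorted by name, the processes silent for longer than the timeout."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


def recover(monitor: HeartbeatMonitor, now: float) -> List[str]:
    """Stand-in for a restart: mark each overdue process as freshly started."""
    restarted = monitor.overdue(now)
    for name in restarted:
        monitor.beat(name, now)
    return restarted


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=2.0)  # assumed timeout
    monitor.beat("MTSM", now=0.0)
    monitor.beat("PTSM", now=0.0)
    monitor.beat("MTSM", now=1.5)              # PTSM stays silent
    print(recover(monitor, now=3.0))
```

A crashed process and a hung process look the same to this monitor, since both simply stop sending heartbeats. A real recovery step would restart the process, possibly from a checkpoint, where this sketch only resets the timestamp.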
12 Very Lightweight Agents
- Minimal footprint
- Platform independence
- Employed everywhere in the system!
- Monitoring hardware and software
- Handles communications control with higher-level entities
[Diagram: two node stacks. Level 1 Farm Nodes (DSPs): Hardware; OS Kernel (DSP BIOS); Physics Application; VLA; Network API; linked to L1 Manager Nodes (Linux). Level 2/3 Farm Nodes (Linux): Hardware; OS Kernel (Linux); Physics Application; VLA; Network API; linked to L2/L3 Manager Nodes (Linux).]
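The role of a VLA, monitoring locally and communicating with a higher-level entity, can be sketched as a small object that checks readings against limits and reports only violations upward. This is an illustration under assumptions, not the RTES VLA code: the class `LightweightAgent`, the quantity names, the limits, and the `send` callback standing in for the network API are all hypothetical. Python is used here for readability only; an agent on a DSP would be written for that platform.

```python
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]
Report = Tuple[str, str, float]  # (node name, quantity, observed value)


class LightweightAgent:
    """Hypothetical monitoring agent: checks readings, reports violations upward."""

    def __init__(self, node: str, limits: Dict[str, float],
                 send: Callable[[Report], None]) -> None:
        self.node = node
        self.limits = limits  # assumed upper limit per monitored quantity
        self.send = send      # stand-in for the network API to the manager node

    def check(self, reading: Reading) -> int:
        """Report each monitored quantity above its limit; return how many were sent."""
        sent = 0
        for quantity, limit in self.limits.items():
            value = reading.get(quantity)
            if value is not None and value > limit:
                self.send((self.node, quantity, value))
                sent += 1
        return sent


if __name__ == "__main__":
    inbox: List[Report] = []  # plays the part of the manager node
    agent = LightweightAgent(
        "dsp-07",
        {"temperature_c": 70.0, "queue_depth": 100.0},  # illustrative limits
        inbox.append,
    )
    agent.check({"temperature_c": 65.0, "queue_depth": 250.0})
    print(inbox)
```

Reporting only when a limit is exceeded keeps the agent's footprint and network traffic small, which is in the spirit of the "minimal footprint" bullet above; what a real VLA reports, and how often, is not specified in the slides.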