A High-End Reconfigurable Computation Platform for Particle Physics Experiments


1
A High-End Reconfigurable Computation Platform
for Particle Physics Experiments
  • Lic. Thesis Presentation at ICT/ECS, KTH
  • Under the collaboration between KTH & JLU
  • by Ming Liu

Supervisors: Prof. Axel Jantsch (KTH), Dr. Zhonghai Lu (KTH),
Prof. Wolfgang Kuehn (JLU, Germany)
2
Contributions
  • The thesis is mainly based on the following
    contributions
  • Ming Liu, Johannes Lang, Shuo Yang, Tiago Perez,
    Wolfgang Kuehn, Hao Xu, Dapeng Jin, Qiang Wang,
    Lu Li, Zhenan Liu, Zhonghai Lu, and Axel Jantsch,
    ATCA-based Computation Platform for Data
    Acquisition and Triggering in Particle Physics
    Experiments, In Proc. of the International
    Conference on Field Programmable Logic and
    Applications 2008 (FPL08), Sep. 2008. (System
    architecture)
  • Ming Liu, Wolfgang Kuehn, Zhonghai Lu and Axel
    Jantsch, System-on-an-FPGA Design for Real-time
    Particle Track Recognition and Reconstruction in
    Physics Experiments, In Proc. of the 11th
    EUROMICRO Conference on Digital System Design
    (DSD08), Sep. 2008. (Algorithm implementation
    and evaluation)
  • Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel
    Jantsch, Shuo Yang, Tiago Perez and Zhenan Liu,
    Hardware/Software Co-design of a General-Purpose
    Computation Platform in Particle Physics, In
    Proc. of the 2007 IEEE International Conference
    on Field Programmable Technology (ICFPT07), Dec.
    2007. (HW/SW co-design)

3
Overview
  • Background in Physics Experiments
  • Computation Platform for DAQ and Triggering
    • Network architecture
    • Compute Node (CN) architecture
  • HW/SW Co-design of the System-on-an-FPGA
    • Partitioning strategy
    • HW design
    • SW design
  • Algorithm Implementation and Evaluation
  • Conclusion and Future Work

4
Background
  • Nuclear & particle physics: a branch of physics
    that studies the constituents and interactions of
    atomic nuclei and particles.
  • Some elementary particles do not occur under
    normal circumstances in nature.
  • Many can be created and detected during energetic
    collisions of others.
  • Beam on target, or beam on beam. Produced
    particles are studied with huge/complex detector
    systems.
  • Examples
    • HADES & PANDA @ GSI, Germany
    • ATLAS, CMS, LHCb, ALICE at the LHC @ CERN,
      Switzerland/France
    • BES III @ IHEP, China
    • WASA @ FZ-Juelich, Germany

5
Detector Systems
  • HADES
  • RICH (Ring Imaging CHerenkov)
  • MDC (Mini Drift Chamber)
  • TOF (Time-Of-Flight)
  • TOFino (small TOF)
  • Shower (Electromagnetic Shower)
  • RPC (Resistive Plate Chamber, to be added to
    replace TOFino)

(Detector photos: HADES, BES III, WASA, PANDA)
6
HADES Detector System
7
Challenge & Motivation
  • Challenge: high reaction rates and high data
    rates (PANDA: 10-20 MHz, data rate up to
    200 GB/s!)
  • It is not possible to store all the data, due to
    storage capacity limitations.
  • Only a small fraction (e.g. 1/10^6) is of
    interest for extensive offline analysis. The
    background can be discarded on the fly.
  • Pattern recognition algorithms are used to
    identify interesting events.
  • Motivation: a reconfigurable and scalable
    computation platform for high-data-rate
    processing.

8
Data Flow
  • Pattern recognition algorithms
  • Data correlation
  • Greatly reduced data rate for storage

9
Related Work
  • Previously commercial bus systems, such as
    VMEbus, FASTbus, CAMAC, etc., were used for DAQ
    and triggering.
  • Time-multiplexing of the system bus degrades the
    data exchange efficiency and cannot meet
    high-performance requirements.
  • Existing reconfigurable computers sound
    appealing, but are not suitable for physics
    experiment applications:
  • Some are augmented computer clusters with FPGAs
    attached to the system bus as accelerators.
    (Bandwidth bottleneck between the microprocessor
    and the accelerator)
  • Some are standalone boards. (Not straightforward
    to scale the system to a large size, due to the
    lack of efficient inter-board connectivity)
  • Flexible and massive communication channels are
    required to interface with detectors and the PC
    farm.
  • An all-board-switched or tree-like topology may
    result in communication penalties between
    algorithm steps. (P2P direct links are
    preferred.)

10
Overview
  • Background in Physics Experiments
  • Computation Platform for DAQ and Triggering
    • Network architecture
    • Compute Node (CN) architecture
  • HW/SW Co-design of the System-on-an-FPGA
    • Partitioning strategy
    • HW design
    • SW design
  • Algorithm Implementation and Evaluation
  • Conclusion and Future Work

11
DAQ and Trigger Systems
  • Detectors detect particles and generate signals
  • Signals digitized by ADCs
  • Data buffered in concentrators/buffers
  • Pattern recognition algorithms extract features
    from events.
  • Interesting events stored in the mass storage.
    Background discarded on the fly.

12
Network Topology
  • Compute Nodes (CN) interconnected for
    parallel/pipelined processing
  • Hierarchical network topology
  • External channels
    • Optical links
    • Gigabit Ethernet
  • Internal interconnections
    • On-board IO connections
    • Inter-board backplane
    • Inter-chassis switching

13
ATCA Backplane
  • Advanced Telecommunications Computing
    Architecture (ATCA)
  • Full-mesh direct Point-to-Point (P2P) backplane
  • High flexibility to correlate results from
    different algorithms
  • High performance compared to shared buses

14
Compute Node
  • Prototype board with 5 Xilinx Virtex-4 FX60 FPGAs
  • 4 FPGAs as algo. processors
  • 1 FPGA as a switch
  • 2GB DDR2 per FPGA, IPMC, Flash, CPLD...
  • Full-mesh on-board communication via GPIOs &
    RocketIOs
  • RocketIO-based backplane channels
  • External channels: optical links & Gigabit
    Ethernet

15
Compute Node PCB
  • 14-layer PCB design
  • Standard ATCA (8U) size of 280 x 322 mm

16
Performance Summary
  • 1 ATCA chassis = 14 CNs (see the scaling note
    after this list)
  • 1890 Gbps on-board connections
  • 1456 Gbps inter-board backplane connections
  • 728 Gbps full-duplex optical bandwidth
  • 70 Gbps Ethernet
  • 140 GB DDR2 SDRAM
  • All computing resources of 70 Virtex-4 FX60 FPGAs
    (140 PowerPC 405 microprocessors + programmable
    resources)
  • Power consumption evaluation: max. 170 W/CN (each
    ATCA slot allows 200 W)
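
As a consistency note, derived only from the figures
above: the chassis totals scale linearly from the 14
CNs per shelf, i.e. 14 x 5 FPGAs = 70 Virtex-4 FX60
devices, 70 x 2 GB = 140 GB of DDR2, 70 x 2 embedded
PowerPC 405 cores = 140 processors, and a worst-case
power of 14 x 170 W = 2380 W per shelf.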

17
Overview
  • Background in Physics Experiments
  • Computation Platform for DAQ and Triggering
    • Network architecture
    • Compute Node (CN) architecture
  • HW/SW Co-design of the System-on-an-FPGA
    • Partitioning strategy
    • HW design
    • SW design
  • Algorithm Implementation and Evaluation
  • Conclusion and Future Work

18
Partitioning Strategy
  • Multiple tasks during experiment operations (data
    processing, control tasks, ...)
  • Partitioned between FPGA HW fabric & embedded
    PowerPC CPUs

19
Partitioning Strategy
  • Concrete strategy:
  • All pattern recognition algorithms customized in
    the FPGA fabric, as HW parallel/pipelined
    processing modules
  • Slow control tasks (e.g. monitoring the system
    status, modifying experimental parameters, ...)
    implemented in SW (applications + OS)
  • Soft TCP/IP stack in Linux OS

20
HW Design
  • Old bus-based arch. (PLB & OPB)
  • CPU & fast peripherals on PLB
  • Slow peripherals on OPB
  • Customized processing modules (e.g. TPU) on PLB
  • Improved MPMC/LocalLink-based arch.
  • Multi-Port Memory Controller (8 ports)
  • Direct access to the memory from the device
  • Customized processing unit interfaced to MPMC
    directly

21
SW Design
  • Open-source embedded Linux on the embedded
    PowerPC CPUs
  • Device drivers
  • Standard devices (Ethernet, RS232, Flash memory,
    etc.)
  • Customized modules
  • Applications for slow controls (a minimal sketch
    follows this list)
  • High level scripts
  • C/C++ programs
  • Apache webserver
  • Java programs on the VM
  • Software cost: low budget!
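
For illustration only, a minimal C sketch of what a
slow-control application on the embedded Linux could
look like: it maps a memory-mapped status register of a
customized processing module and prints it. The base
address, the register offset and the raw /dev/mem
access are hypothetical placeholders; a real setup
would go through the project's own device drivers.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define TPU_BASE   0x80000000UL  /* hypothetical PLB address of the module */
#define STATUS_REG 0x0           /* hypothetical status register offset    */

int main(void)
{
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* map one page of the module's register space read-only */
    volatile uint32_t *regs = mmap(NULL, getpagesize(), PROT_READ,
                                   MAP_SHARED, fd, (off_t)TPU_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    printf("module status: 0x%08x\n", (unsigned)regs[STATUS_REG / 4]);

    munmap((void *)regs, getpagesize());
    close(fd);
    return 0;
}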

22
Remote Reconfigurability
  • Remote reconfigurability (HW & SW) is desired due
    to spatial constraints in experiments.
  • Both OS and FPGA bitstream are stored in the NOR
    flash memories.
  • With the support of the MTD driver, the bitstream
    and the OS kernel can be overwritten/upgraded from
    within Linux (see the sketch after this list).
  • Reboot the system and the updated design takes
    effect.
  • A backup mechanism guarantees that the system
    stays alive.
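
A minimal sketch of such an in-system upgrade, assuming
a hypothetical MTD partition /dev/mtd1 that holds the
image to be replaced (in practice the flashcp tool from
mtd-utils, or the project's own scripts, performs the
same erase-then-write sequence):

#include <fcntl.h>
#include <mtd/mtd-user.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* /dev/mtd1 and the image path are placeholders, not the CN layout */
    int mtd = open("/dev/mtd1", O_RDWR);
    int img = open("/path/to/new_image.bin", O_RDONLY);
    if (mtd < 0 || img < 0) { perror("open"); return 1; }

    mtd_info_t info;                      /* partition size and erase block */
    if (ioctl(mtd, MEMGETINFO, &info) < 0) { perror("MEMGETINFO"); return 1; }

    /* NOR flash must be erased block by block before it can be rewritten */
    erase_info_t e = { .length = info.erasesize };
    for (e.start = 0; e.start < info.size; e.start += info.erasesize)
        if (ioctl(mtd, MEMERASE, &e) < 0) { perror("MEMERASE"); return 1; }

    /* copy the new bitstream or kernel image into the erased partition */
    char buf[4096];
    ssize_t n;
    while ((n = read(img, buf, sizeof buf)) > 0)
        if (write(mtd, buf, n) != n) { perror("write"); return 1; }

    close(img);
    close(mtd);
    return 0;   /* reboot afterwards so the new image takes effect */
}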

23
Overview
  • Background in Physics Experiments
  • Computation Platform for DAQ and Triggering
    • Network architecture
    • Compute Node (CN) architecture
  • HW/SW Co-design of the System-on-an-FPGA
    • Partitioning strategy
    • HW design
    • SW design
  • Algorithm Implementation and Evaluation
  • Conclusion and Future Work

24
Pattern Recognition in HADES
  • New DAQ & Trigger system for the HADES upgrade
    (10 GB/s)
  • Pattern recognition algorithms
  • Cherenkov ring recognition (RICH)
  • MDC particle track reconstruction (MDCs)
  • Time-Of-Flight processing (TOF & RPC)
  • Electromagnetic shower recognition (Shower)
  • Partitioned and distributed on FPGA nodes
  • Algorithms correlated by hierarchical connections

25
Pattern Recognition in HADES
  • Pattern recognition
  • Correlation
  • Event building & storage

26
Particle Track Reconstruction
  • Particle tracks bent in the magnetic field
    between the coils
  • Approximately straight lines before & after the
    coils
  • Inner and outer track segments point to the RICH
    and TOF detectors respectively, helping them to
    find patterns (correlation)
  • Similar principle for inner and outer segments;
    only the inner part is discussed.
  • The particle track reconstruction algorithm for
    HADES was previously implemented in SW, due to
    its complexity.
  • Now implemented and investigated in HW as a case
    study

27
Basic Principle
  • Wires fired by flying particles
  • Project fired wires to a plane
  • Recognize the overlap area and reconstruct tracks
    from the target (a sketch of this
    projection-and-peak-finding flow follows this
    list)
  • 6 sectors
  • 2110 wires per sector (inner)
  • 6 orientations
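
For illustration, a minimal C sketch of the
projection-and-peak-finding idea in software terms. The
LUT contents, the number of projection-plane bins and
the peak threshold are placeholder assumptions, not
values from the thesis; the real TPU performs these
steps as HW sub-modules (wire FIFO, projection LUT
fetch, accumulate unit, peak finder).

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NWIRES 2110   /* inner wires per sector (from the slides) */
#define NBINS  4096   /* projection-plane bins: assumed value     */
#define MAXHIT 64     /* max LUT entries per wire: assumed value  */

/* proj_lut[w] lists the projection-plane bins covered by wire w; in the
 * real design this table lives in external memory and is fetched by the
 * bus-master sub-module. */
static uint16_t proj_lut[NWIRES][MAXHIT];
static uint8_t  proj_len[NWIRES];

/* Accumulate the fired wires on the projection plane, then report bins
 * whose count exceeds a threshold as track candidates. */
static void find_tracks(const uint16_t *fired, int nfired, int threshold)
{
    static uint8_t accu[NBINS];
    memset(accu, 0, sizeof accu);

    for (int i = 0; i < nfired; ++i) {          /* accumulate unit */
        uint16_t w = fired[i];
        for (int k = 0; k < proj_len[w]; ++k)
            accu[proj_lut[w][k]]++;
    }
    for (int b = 0; b < NBINS; ++b)             /* peak finder */
        if (accu[b] >= threshold)
            printf("track candidate at bin %d (count %d)\n", b, accu[b]);
}

int main(void)
{
    /* toy example: three fired wires whose projections overlap in bin 7 */
    for (int w = 0; w < 3; ++w) { proj_lut[w][0] = 7; proj_len[w] = 1; }
    uint16_t fired[] = { 0, 1, 2 };
    find_tracks(fired, 3, 3);
    return 0;
}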

28
Basic Principle
29
Hardware Implementation
  • PLB interface (Slave)
  • MPMC interface (Master)
  • Algorithm processor: Tracking Processing Unit
    (TPU)

30
Modular Design
  • TPU for track reconstruction computation
  • Input: fired wire numbers
  • Output: positions of track candidates on the
    projection plane
  • Sub-modules:
    • Wire No. write FIFO
    • Proj. LUT & Addr. LUT
    • Bus master
    • Accumulate unit
    • Peak finder

31
Implementation Results
  • Resource utilization on the Virtex-4 FX60:
    acceptable!
  • Timing limit: 125 MHz without optimization
    effort
  • Clock frequency fixed at 100 MHz, to match the
    PLB speed

32
Performance Evaluation
  • Experimental setup:
    • MPMC-based structure used for measurements
    • Different measurement points at different wire
      multiplicities (10, 30, 50, 200, 400 fired
      wires out of 2110)
    • Projection LUT: 5.7 Kbits per wire on average
      (1.5 MB / 2110 wires)
    • A C program running on a 2.4 GHz Xeon computer
      as the reference
  • Results:
    • Speedups of 10.8 to 24.3 times per module have
      been observed compared to the software
      solution.
    • Considering the FPGA resource consumption,
      multiple TPU modules may be integrated in the
      system for parallel processing and higher
      performance.

33
Performance Analysis
  • Non-TPU factors introduce overhead and restrict
    the performance:
  • Complex DDR2 addressing mechanism (large latency)
  • Data transfer burst mode of only 8 beats (clock
    cycles wasted)
  • MPMC arbitrating the memory access among multiple
    ports (clock cycles wasted)
  • The TPU module is powerful, but the memory
    accesses (LUT fetches) are slow.
  • Solution: add SRAM memory to enhance the memory
    bandwidth and reduce the access latency
  • Speedups of 20 to 50 times per module compared to
    software are expected

34
Overview
  • Background in Physics Experiments
  • Computation Platform for DAQ and Triggering
    • Network architecture
    • Compute Node (CN) architecture
  • HW/SW Co-design of the System-on-an-FPGA
    • Partitioning strategy
    • HW design
    • SW design
  • Algorithm Implementation and Evaluation
  • Conclusion and Future Work

35
Conclusion
  • An FPGA- and ATCA-based computation platform is
    being constructed for the DAQ and trigger system
    in modern nuclear and particle physics
    experiments.
  • The platform features high performance,
    scalability, reconfigurability, and the potential
    to be used for different application projects
    (physics & non-physics).
  • A co-design approach is proposed to develop
    applications on the platform.
  • HW: system design & customized processing modules
  • SW: Linux OS, device drivers & application
    programs
  • A case study, the particle track reconstruction
    algorithm, has been implemented and evaluated on
    the system. Speedup of one order of magnitude per
    module has been observed when compared to the
    software solution (multiple modules can be
    integrated for parallel processing).

36
Future Work
  • The network communication will be investigated
    with multiple CN PCBs.
  • All pattern recognition algorithms are to be
    implemented and parallelized.
  • Study of more efficient memory addressing
    mechanisms for multiple modules
  • More advanced features, e.g. dynamic partial
    reconfiguration for adaptive computing

37
Thank you!
38
Measurements
  • Computing throughput study of the event selector
    core
  • Study of PLB data transfer performance
  • DMA transfers
  • 148.1 MB/s at a 25% interesting-event rate,
    97.3 MB/s at 100%

39
Measurements
  • Optical link communication test with the
    front-end Trigger and Readout Board (TRBv2)
  • 2 Gbps with 8B/10B encoding
  • 150-hour test with no bit error (see the bound
    below)
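
As a rough bound (assuming the 2 Gbps figure is the raw
line rate): 150 hours at 2 Gbps corresponds to about
2 x 10^9 bit/s x 540,000 s, i.e. roughly 1.1 x 10^15
transferred bits, so the error-free run implies a bit
error rate below about 10^-15.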

40
Measurements
  • P2P Ethernet performance measurements
  • Benchmark Netperf
  • Features enabled to improve performance: S/G DMA,
    checksum offloading, jumbo frames of 8982 bytes,
    interrupt coalescing
  • Bottleneck: the 300 MHz PowerPC CPU for protocol
    stack processing