Electromagnetic FDTD simulation on FPGAs

1
Electromagnetic FDTD simulation on FPGAs
  • Yury Markovskiy
  • Semester Project Report (cs294-3)

2
Review
  • EM simulations using FDTD
  • Simple algorithm requiring large computing power
  • Operation
  • Update fields in order
  • E-field updates
  • H-field updates
  • Notice similarities in structure
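
The transcript does not preserve the slide's update equations, so the following is only a minimal sketch of the idea, assuming a 1D grid in normalized units with a Courant number of 0.5 (the project itself targets a 3D grid). The names `step` and `run` are hypothetical. It shows the two alternating updates and why their structure is similar: each is a multiply-add over the difference of two neighboring values of the other field.

```python
# Illustrative 1D FDTD (Yee) update in normalized units.
# Assumption: this is NOT the 3D datapath from the slides, only the same pattern.

def step(ez, hy, courant=0.5):
    """Advance both fields by one time step, in place."""
    n = len(ez)
    # E-field update: each cell uses the difference of two neighboring H values.
    # ez[0] is never updated and stays fixed (a simple reflecting boundary).
    for i in range(1, n):
        ez[i] += courant * (hy[i] - hy[i - 1])
    # H-field update: same multiply-add structure with the roles of E and H swapped.
    # hy[n - 1] is never updated and stays fixed.
    for i in range(n - 1):
        hy[i] += courant * (ez[i + 1] - ez[i])


def run(n_cells=50, n_steps=20):
    """Start from a single-cell pulse in the middle of the grid and time-step it."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    ez[n_cells // 2] = 1.0
    for _ in range(n_steps):
        step(ez, hy)
    return ez, hy
```

Because both loops have the same shape, one floating-point datapath design can plausibly serve both updates, which is the similarity the slide points out.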

3
Single Datapath Unit
  • Using single-precision (32-bit) floating point (FP)
  • FP cores from eda.org
  • Area: 6756 LUTs, 8 block multipliers
  • Frequency: 9.3 MHz
  • Pipelined/retimed (12 cycles): 70 MHz
  • Synplicity

4
Complete Datapath
  • Approximate area: 21,000 LUTs
  • Required memory bandwidth
  • -- 10.25 words input + 3 words output per cycle
  • -- Assuming operation at 100 MHz
  • 13.25 words/cycle × 4 bytes/word × 100 MHz
  • = 4.9 GB/s
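
The bandwidth arithmetic above can be checked with a few lines of Python. The function name `datapath_bandwidth` is hypothetical. Note that the product is 5.3 × 10^9 bytes/s, so the 4.9 GB/s figure appears to count a gigabyte as 2^30 bytes; that reading is an assumption, not something the slide states.

```python
GIB = 2 ** 30  # assumption: the slide's "GB" here means 2**30 bytes


def datapath_bandwidth(words_in, words_out, bytes_per_word, freq_hz):
    """Bytes per second moved by one datapath that consumes and produces
    the given number of words every clock cycle."""
    return (words_in + words_out) * bytes_per_word * freq_hz


if __name__ == "__main__":
    bw = datapath_bandwidth(10.25, 3, 4, 100e6)
    print(f"{bw:.3e} bytes/s")
    print(f"{bw / GIB:.2f} GB/s (binary gigabytes)")
```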

5
Resource Allocation
  • FPGA (XC2VP100): 100K LUTs
  • One datapath: 21K LUTs, 4.9 GB/s at 100 MHz
  • SDRAM bank: 256 bits per 100 MHz cycle = 32 bytes × 100 MHz = 3.2 GB/s
  • Total off-chip memory bandwidth: 12.8 GB/s

6
Resource Allocation
  • Implementation Choices
  • Core Replication
  • Up to 4 datapaths → 20 GB/s
  • Overclocking
  • FP units on FPGAs operate at > 150 MHz (FPGA04, Keith Underwood)
  • Each computational pipeline can handle > 7.5 GB/s
  • Our solution
  • 2 overclocked datapaths
  • Required frequency for the total 12.8 GB/s: 120 MHz
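
The 120 MHz figure can be reproduced as a rough check, assuming four SDRAM banks (the bank count comes from the conclusion slide) and the 13.25 words per cycle per datapath quoted earlier. The function name `required_frequency_hz` is hypothetical, and decimal gigabytes are used here because that is what makes 32 bytes × 100 MHz come out as 3.2 GB/s.

```python
def required_frequency_hz(total_bw, n_datapaths,
                          words_per_cycle=13.25, bytes_per_word=4):
    """Clock frequency at which n_datapaths together consume total_bw bytes/s."""
    return total_bw / (n_datapaths * words_per_cycle * bytes_per_word)


BANK_BW = 32 * 100e6    # 256 bits = 32 bytes per 100 MHz cycle
TOTAL_BW = 4 * BANK_BW  # assumption: four banks, giving 12.8e9 bytes/s

if __name__ == "__main__":
    f = required_frequency_hz(TOTAL_BW, n_datapaths=2)
    print(f"{f / 1e6:.1f} MHz")
```

With two datapaths the result lands just above 120 MHz, consistent with the slide's rounded number.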

7
Architecture (single datapath)

8
Architecture (single datapath)
  • Feed-forward computational path
  • Amenable to deep pipelining
  • Address generator
  • Acts as a controller
  • Complicated by the grid cells on the boundary of the simulated space
  • Will be further complicated when PML (perfectly matched layer) is added
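
The slides do not describe the address generator's design, so the following is only a software sketch of why boundary cells complicate it. It assumes a row-major memory layout and a boundary one cell thick; the name `address_generator` and both assumptions are hypothetical.

```python
def address_generator(nx, ny, nz):
    """Yield (linear_address, on_boundary) for every cell of an nx*ny*nz grid.

    Assumptions: row-major layout with z varying fastest, and a boundary
    that is exactly one cell thick on each face of the simulated space.
    """
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                address = (x * ny + y) * nz + z
                on_boundary = (
                    x in (0, nx - 1)
                    or y in (0, ny - 1)
                    or z in (0, nz - 1)
                )
                yield address, on_boundary
```

Interior cells can be streamed through the feed-forward pipeline unchanged, while every cell flagged `on_boundary` needs different neighbor handling, which is the extra control logic the slide refers to.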

9
Architecture (single datapath)
  • DATAPATH: 26K LUTs, 1K registers
  • FPADDs: 3, FPMULTs: 6, FPSUBs: 9 (18 FP ops in total)
  • CONTROL: 8K LUTs, 7K registers

10
Performance Summary
  • Results for Virtex 1000 (bg560-5)
  • Unoptimized design
  • FP units from eda.org are difficult to manually
    pipeline
  • Behavioral (not structural) description

11
Status and Future Work
  • Design except for write-back logic
  • Completely tested and verified
  • Control and the core run on the same clock
  • Future
  • Full regression test against C simulation
  • Optimize control to run at 150 MHz
  • Manually pipeline the core to run at the maximum possible frequency
  • Need > 120 MHz to saturate memory bandwidth
  • Multi-FPGA implementation with data exchange

12
What's possible?
  • On a single FPGA, with the available memory bandwidth
  • 0.96 timesteps/second on a 500^3 grid
  • 10,000-step simulation → 2.8 hours
  • 20 hours on a workstation
  • Realistically, 3 datapaths can fit on-chip
  • At 120 MHz → 6.4 GB/s × 3 = 19.2 GB/s
  • Reducing the simulation time to 1.9 hours
  • Compress off-chip memory traffic
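
The run-time estimates can be reproduced from the quoted throughput. This sketch assumes the design is memory-bound, so that throughput scales linearly with the bandwidth consumed (12.8 GB/s for the baseline, 19.2 GB/s for three datapaths); the function name `simulation_hours` is hypothetical.

```python
def simulation_hours(n_steps, steps_per_second):
    """Wall-clock hours needed to run n_steps at the given throughput."""
    return n_steps / steps_per_second / 3600.0


BASE_RATE = 0.96                     # timesteps/second, from the slide
SCALED_RATE = BASE_RATE * 19.2 / 12.8  # assumption: linear scaling with bandwidth

if __name__ == "__main__":
    print(f"baseline:    {simulation_hours(10_000, BASE_RATE):.2f} h")
    print(f"3 datapaths: {simulation_hours(10_000, SCALED_RATE):.2f} h")
```

The baseline comes out slightly under 2.9 hours, which the slide quotes as 2.8, and the three-datapath case matches the quoted 1.9 hours.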

13
Compression
  • By design for stability
  • Shortest simulated wavelength λ > 10 Δx
  • To compress,
  • Throw away high-frequency information
  • e.g. LPF (low-pass filter) → sub-sample
  • Investigated this approach
  • Special handling for boundary cases
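
The slides say only that a low-pass-filter-then-sub-sample approach was investigated, so the following is a sketch under assumptions rather than the scheme actually used: a 3-tap smoothing filter, sub-sampling by 2, and linear interpolation on read-back. All function names are hypothetical. The boundary samples are copied unfiltered, which is one simple form of the special handling mentioned above.

```python
import math


def low_pass(samples):
    """3-tap [1/4, 1/2, 1/4] smoothing filter; the two edge samples are copied."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = 0.25 * samples[i - 1] + 0.5 * samples[i] + 0.25 * samples[i + 1]
    return out


def compress(samples, factor=2):
    """Low-pass filter, then keep every factor-th sample."""
    return low_pass(samples)[::factor]


def decompress(compressed, factor=2):
    """Rebuild the dropped samples by linear interpolation."""
    out = []
    for a, b in zip(compressed, compressed[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(compressed[-1])
    return out


def max_error_for_wavelength(cells_per_wavelength, n=41):
    """Worst-case round-trip error for a unit-amplitude sine along one axis."""
    field = [math.sin(2 * math.pi * i / cells_per_wavelength) for i in range(n)]
    restored = decompress(compress(field))
    return max(abs(a - b) for a, b in zip(field, restored))
```

For example, `max_error_for_wavelength(20)` measures the loss for a wave sampled at 20 cells per wavelength; since the grid already oversamples the shortest wavelength by at least a factor of 10, halving the stored data costs only a small fraction of the amplitude in this sketch.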

[Figure: plots with axes labeled Space (z-dim), Amplitude, and Spatial Frequency]

14
Conclusion
  • Evaluated the first FDTD implementation
  • No PML (perfectly matched layer)
  • Programming issues have been ignored
  • Results are promising
  • A modern FPGA has enough compute resources to
    saturate 4 DDR2 SDRAM banks
  • Compression will make it possible to push the performance limits