Title: NASA
Automation Techniques for Fast Implementation of
High Performance DSP Algorithms in FPGAs
NASA 2005 Military and Aerospace Programmable
Logic Devices (MAPLD) International Conference
John Porcello, L-3 Communications, Inc.
Cleared by DOD/OFOISR for Public
Release under 05-S-2094 on 24 August 2005
- Outline
- Background
- Automation Techniques
- DSP Algorithm Design
- HDL Coding and Synthesis
- Timing and Placement
- Hardware-In-The-Loop (HITL) Test and Verification
- Case Study - Direct Digital Synthesizer (DDS)
using Xilinx Virtex-4 XtremeDSP
- Summary
- Background
- Field Programmable Gate Arrays (FPGAs) are
the leading implementation path for
Reprogrammable, High Performance Digital Signal
Processing (DSP) Applications. The performance
advantage of FPGAs over Programmable DSPs is a
driving factor for implementing DSP designs in an
FPGA.
- Using VHDL and Verilog Hardware Description
Languages (HDL) is often a lengthy development
path to implement a DSP design in an FPGA.
- FPGA development tools are using HDL and non-HDL
DSP Intellectual Property (IP) to reduce the
design and implementation time. This concept and
approach is successful at reducing the design and
implementation cycle and increasing productivity
in many applications.
- However, High Performance DSP implementations
using dedicated HDL still provide the greatest
flexibility for implementing High Performance DSP
Algorithms. WHY?
- Three (3) Reasons to use a dedicated HDL
Implementation Path for a High Performance DSP
Application
- 1) Control: Available IP can't achieve required
performance and functionality.
- 2) Complexity: Increasing DSP Algorithm
Complexity requires unique tailoring for the
application.
- 3) Components: FPGA architectures are increasing
the number of dedicated components other than
FPGA fabric (embedded multipliers, hard
microprocessors, dedicated transceivers,
application specific devices, etc.). Low level
control is required to get the most from these
components in a high performance design.
- Major Advantages (+) and Disadvantages (-) using the HDL
Implementation Path for High Performance DSP
Applications
- Low Level Control and flexibility to achieve
required or specific performance (+)
- Design, development and integration of various IP
cores (+)
- Source level control of DSP design (+)
- Considerable design and implementation path
relative to non-HDL implementation path (-)
- Extensive Debug, Test and Verification Path (-)
- Can we reduce or eliminate any of these
disadvantages to improve productivity?
- YES
- The Objectives of Automation Techniques
- Identify and apply methods useful for faster
implementation of High Performance DSP Designs.
- Reduce Design and Implementation Time
- Perform Error Checking
- Develop greater insight into successful high
performance DSP Implementations by automating
techniques
- Specific focus areas to achieve objectives
- DSP Algorithm Design
- HDL Coding and Synthesis
- Timing and Placement
- Hardware-In-The-Loop (HITL) Test and Verification
- If one of these processes cannot meet required
performance, it is often necessary to back up and
apply techniques to collect data to study the
problem.
- Automation Techniques
- Not a new concept. There is no
single direct formula for applying them.
Automation Techniques are a function of the DSP
design and FPGA implementation processes, and
are a means to improve and
refine these processes. A look at the overall
flow, from design through to implementation, is required.
Automation Techniques are then developed to
improve processes. Consider the following
processes and goals (Process: Goal)
- DSP Algorithm Design: Produce a DSP Algorithm
structured for an FPGA (function).
- HDL Coding and Synthesis: Synthesizable DSP
functions and performance (implementation).
- Timing and Placement: DSP timing and
interface performance (speed).
- H/W-In-The-Loop (HITL) Test and Verification: DSP
numerical and interface performance (accuracy, speed).
- Automation Techniques can be applied to improve
these processes.
- Considerations for developing Automation
Techniques
- 1) Technical: Automation Technique(s) are often
required to go beyond the basics and increase
technical capabilities.
- A substantial amount of data will be generated,
tested or analyzed to quantify performance. This
includes the DSP design (truth vectors) and FPGA
testing (DUT).
- Develop greater insight into DSP Design and FPGA
Implementation.
- Solve a specific problem where current processes
are not effective.
- Improve DSP Design and FPGA Implementation
processes in terms of efficiency and productivity.
- 2) Cost: Development of Automation Techniques
easily provides a cost benefit for processing
large amounts of data. Other techniques may
require substantial Non-Recurring Engineering
(NRE) to design, develop and implement. In these
cases, Automation Techniques must provide
substantial benefit to justify the NRE.
Substantial effort to develop Automation
Techniques for High Performance DSP Algorithms
can often be justified when there is significant
near-term benefit (current project) or long-term
benefit (marketing new DSP algorithms with
increased functionality and/or improved
performance).
- DSP Algorithm Design
- The DSP Algorithm has the
greatest impact on the implementation and
performance.
- Best practice matches the DSP Algorithm to the
FPGA Architecture. Knowledge of the target hardware
architecture is important to reduce a DSP
Algorithm to equivalent high performance
functions within an FPGA.
- The class of DSP Algorithm is significant (wide
variation)
- Filter, FFT, Multiply and Accumulate (MAC),
Up/Down Converters
- Carrier Recovery, Timing and Synchronization
- Direct Digital Synthesizers (DDS), Waveform
Generators
- Systolic Arrays, Matrix Methods, Statistical DSP
- Beam Forming, Image Processing
- Wideband, High Speed Spectral Processing
- Full parallel (unrolled, unfolded)
implementations of iterative DSP Algorithms yield
a significant increase in performance at the
expense of FPGA resources.
- DSP Algorithm Design - Systolic Array Design
using the Xilinx Virtex-4 XtremeDSP Tile
- Systolic Arrays are small, interconnected arrays
of DSP Processing Elements (PEs). They are very useful for
many high performance DSP applications such as
Digital Filters and Matrix Processing. Systolic
arrays are typically full parallel structures
processing one data sample per clock. Used in
many VLSI designs, they can be 1-Dimensional or
Multidimensional.
- Systolic arrays can be mapped from DSP equations
consisting of iterative algorithms that can be
unrolled (Filters, FFTs, etc.). Latency is
higher since data flows through each element.
However, structures of this type may be
implemented using FPGA fabric and/or dedicated
FPGA components over high speed interconnects.
[Figure: 1D Systolic Array. Labels: Input, Output, Processing Element (PE).]
- DSP Algorithm Design - Systolic Array Design
using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
- FPGA Embedded Component: Xilinx Virtex-4 XtremeDSP Tile
- Consists of two (2) DSP48 slices
- Dedicated, pipelined MULT, Add/Subtract, ACC, MACC,
Shift, Divide, Square Root, etc.
- High speed, dedicated interconnects
between DSP48 slices and to other XtremeDSP tiles
- Dynamically configurable functions (via OPMODE)
- Highest performance achieved without FPGA fabric
[Figure: Processing Element (PE) within a 1D Systolic Array. Labels: Input, Output, Processing Element (PE).]
Ref. Xilinx XtremeDSP Design Considerations User
Guide, Courtesy of Xilinx, Inc.
- DSP Algorithm Design - Systolic Array Design
using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
- 1 Dimensional Systolic Array
- FIR filter with constant coefficients: relatively easy
to manage design and implementation.
[Figure: 1D Systolic Array FIR Filter. Labels: Input, Output, Processing Element (PE). Routing over dedicated, high speed interconnect.]
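The 1D systolic FIR structure can be illustrated with a small cycle-by-cycle software model. This is a minimal sketch and not the author's implementation: it assumes each PE holds two data registers and one partial-sum register, a common textbook arrangement, and the function names `systolic_fir` and `direct_fir` are hypothetical. It shows the trade described above: one output per clock, at a latency that grows with the number of PEs.

```python
def systolic_fir(samples, coeffs):
    """Cycle-accurate model of a 1D systolic FIR (assumed PE structure).

    Each PE k has two data registers (d1, d2) and one partial-sum
    register (acc). Data moves two registers per PE, sums move one,
    so every PE only talks to its neighbor.
    """
    n_pe = len(coeffs)
    d1 = [0] * n_pe
    d2 = [0] * n_pe
    acc = [0] * n_pe
    out = []
    for x in samples:
        # Compute all next-state values from the current registers,
        # then update together, as a clock edge would.
        new_d1, new_d2, new_acc = d1[:], d2[:], acc[:]
        for k, h in enumerate(coeffs):
            d_in = x if k == 0 else d2[k - 1]
            s_in = 0 if k == 0 else acc[k - 1]
            new_d1[k] = d_in
            new_d2[k] = d1[k]
            new_acc[k] = s_in + h * d1[k]
        d1, d2, acc = new_d1, new_d2, new_acc
        out.append(acc[-1])
    return out


def direct_fir(samples, coeffs):
    """Reference FIR: y[n] = sum over k of h[k] * x[n - k]."""
    return [
        sum(h * samples[n - k] for k, h in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]
```

In this model the systolic output equals the reference output delayed by one clock per PE, which is the latency cost noted for systolic arrays.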
- DSP Algorithm Design - Systolic Array Design
using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
- 2 Dimensional Systolic Array: Increasing
capabilities in DSP applications at the expense
of increasing algorithm complexity.
[Figure: 2D Systolic Array N-Point FFT. Labels: Input, Output, Reduce to Even and Odd PEs. Routing over FPGA fabric.]
- Apply DSP Algorithm Automation Techniques to
manage complex DSP design, debugging, test and
validation.
- DSP Algorithm Design Automation Techniques
- DSP Design Validation, Quantifying Required
Algorithm Performance and Limitations: Automating
tools and simulations to perform extensive
end-to-end test, data reduction and analysis, and
algorithm validation. Automated techniques are
useful in DSP designs where algorithm confidence
level over a broad performance range requires
a substantial baseline of test data. Techniques may
utilize scripts or custom programs (MATLAB,
C/C++, etc.) to verify algorithm numerical
accuracy or maximum error, using simulated or
actual test data. Methods used to validate a DSP
algorithm are very important.
- Testing and Debugging DSP Modular Functions:
Automating generation of truth data or vectors
for test and analysis of synthesizable DSP
functional building blocks.
- Algorithm Strength Reduction: Testing and
evaluating alternate, equivalent DSP Algorithms
and mathematically equivalent functions
(symmetry, periodicity, transform reduction,
etc.) that will have higher
performance and/or consume fewer FPGA resources.
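As an illustration of automated truth-vector generation and maximum-error checking, the sketch below quantizes a floating-point reference function to fixed point and reports the worst-case error. It is only an example under stated assumptions: the helper names (`to_fixed`, `to_hex`, `make_truth_vectors`, `max_abs_error`) are hypothetical, and round-to-nearest signed fixed point is assumed rather than taken from the presentation.

```python
import math


def to_fixed(value, frac_bits):
    """Round a real value to signed fixed point with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def to_hex(value, width_bits):
    """Two's-complement hex string, e.g. for a testbench vector file."""
    digits = (width_bits + 3) // 4
    return format(value & ((1 << width_bits) - 1), "0{}X".format(digits))


def make_truth_vectors(func, inputs, frac_bits):
    """Pair each input with the quantized reference output of func."""
    return [(x, to_fixed(func(x), frac_bits)) for x in inputs]


def max_abs_error(func, vectors, frac_bits):
    """Largest difference between the reference and the quantized value."""
    scale = 1 << frac_bits
    return max(abs(func(x) - y / scale) for x, y in vectors)


if __name__ == "__main__":
    phases = [2 * math.pi * i / 64 for i in range(64)]
    vectors = make_truth_vectors(math.sin, phases, frac_bits=15)
    print("max error:", max_abs_error(math.sin, vectors, 15))
    print("first vectors:", [to_hex(y, 16) for _, y in vectors[:4]])
```

The same vectors can be replayed against the HDL simulation or the device under test, so one script serves both algorithm validation and module-level debugging.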
- HDL Coding and Synthesis
- HDL Coding style directly impacts FPGA
Implementation.
- Good Coding techniques use HDL Coding Styles that
support Scalable and Modular DSP designs (use of
generics, VHDL generate, etc.). It is important to
tailor HDL coding to get the most from the Synthesis Tool.
- Full Parallel implementations often require
dividing up the DSP processing into small
operations that can be performed during very
short clock periods. This amounts to isolating
functions or breaking up processing over several
clock cycles at increased latency (and additional
FPGA resources) to maintain throughput.
- Map as much DSP processing as possible onto the
high-speed interconnects for dedicated DSP components,
such as the XtremeDSP tile.
- HDL Coding and Synthesis Automation Techniques
- Autocoding Functions: Autocoding routines can be
used to automatically implement (or change) HDL
code
- Custom DSP Functions that must be divided up
across several clock cycles to operate at maximum
speed
- Clocking Techniques, Positive and Negative Edge
HDL implementations
- Built-In-Test (BIT): Vector Generators / Vector
Receivers support debug, test and verification
up to the system level. Place multiple BIT blocks
at full throughput. Useful for debugging,
analysis and insight into successful High
Performance DSP Designs. Can be combined with
HITL testing for performance verification.
- HDL Converters: Convert code (interpret code)
from another language to Synthesizable HDL.
Effective converter tools may be implemented for
porting algorithms to FPGA platforms.
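An autocoding routine can be as simple as a script that emits parameterized HDL text. The sketch below is a hypothetical example, not the autocoding function from the presentation: `gen_delay_vhdl` and the entity naming scheme are assumptions, and the emitted VHDL (a plain register pipeline with selectable clock edge) has not been run through a synthesis tool here.

```python
def gen_delay_vhdl(width, stages, edge="rising"):
    """Return VHDL source for a `stages`-deep register pipeline.

    width  : data width in bits
    stages : number of register stages (pipeline depth)
    edge   : "rising" or "falling" clock edge
    """
    if width < 1 or stages < 1:
        raise ValueError("width and stages must be >= 1")
    if edge not in ("rising", "falling"):
        raise ValueError("edge must be 'rising' or 'falling'")
    name = f"delay_w{width}_s{stages}"  # hypothetical naming scheme
    vec = f"std_logic_vector({width - 1} downto 0)"
    last = stages - 1
    return f"""library ieee;
use ieee.std_logic_1164.all;

entity {name} is
  port (
    clk  : in  std_logic;
    din  : in  {vec};
    dout : out {vec}
  );
end entity {name};

architecture rtl of {name} is
  type stage_array is array (0 to {last}) of {vec};
  signal stages : stage_array := (others => (others => '0'));
begin
  process (clk)
  begin
    if {edge}_edge(clk) then
      stages(0) <= din;
      for i in 1 to {last} loop
        stages(i) <= stages(i - 1);
      end loop;
    end if;
  end process;
  dout <= stages({last});
end architecture rtl;
"""


if __name__ == "__main__":
    print(gen_delay_vhdl(width=16, stages=3, edge="falling"))
```

Generating several variants from one routine (different depths, widths or clock edges) is also what makes the batch profiling runs on later slides practical.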
- HDL Coding and Synthesis Automation Techniques
- Synthesis Profiling: Batch processing multiple
Synthesis runs to obtain insight into the
synthesis of a design
- Establish desired variations in an HDL design for
analysis. Generate multiple versions or
incrementally modify HDL parameters in the design
via C/C++, script or equivalent code.
- Batch process the Synthesis Tool with synthesis
constraints and obtain the synthesis report. For batch
processing via script or command line, refer to the
synthesis tool manual, such as the Xilinx
Synthesis Technology (XST) User Guide for an XST
design flow.
- Extract desired performance parameters from the
Synthesis Report via C/C++, script or equivalent
code.
- HDL Coding and Synthesis Automation Techniques
- Synthesis Profiling (continued)
- Repeat the process until sufficient information from
multiple synthesis runs is collected.
- Analyze the results of the multiple Synthesis
runs. Profile the performance impact of
parameters on the synthesis of the design.
- Useful for profiling the effect of DSP Design and HDL
coding parameters on Synthesis, performing design
tradeoffs, best-match analysis between DSP design
and FPGA Implementation, and obtaining insight
into successful High Performance DSP Designs.
- Combine with Timing and Placement Profiling for
analyzing the entire FPGA implementation flow.
FPGA Implementation Tools are usually well suited
for command line processing of the entire
implementation flow (example: Xilinx XFLOW).
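The "extract and analyze" steps of Synthesis Profiling might look like the sketch below. It is an illustration only: the two report line shapes matched by the regular expressions are assumptions and must be adapted to the actual synthesis report, the function names are hypothetical, and running the synthesis tool itself is left out.

```python
import re

# Assumed report line shapes; adjust these patterns to the real tool output.
FMAX_RE = re.compile(r"Maximum Frequency:\s*([0-9.]+)\s*MHz")
SLICE_RE = re.compile(r"Number of Slices:\s*(\d+)\s+out of\s+(\d+)")


def parse_report(text):
    """Pull clock rate and slice usage out of one report (None if absent)."""
    fmax = FMAX_RE.search(text)
    slices = SLICE_RE.search(text)
    return {
        "fmax_mhz": float(fmax.group(1)) if fmax else None,
        "slices": int(slices.group(1)) if slices else None,
    }


def profile(reports):
    """reports maps a design parameter value to that run's report text.

    Returns (parameter, parsed metrics) rows sorted by parameter value.
    """
    return [(param, parse_report(text)) for param, text in sorted(reports.items())]


def best_by_fmax(rows):
    """Parameter value of the run with the highest reported clock rate."""
    usable = [row for row in rows if row[1]["fmax_mhz"] is not None]
    if not usable:
        return None
    return max(usable, key=lambda row: row[1]["fmax_mhz"])[0]
```

Here the dictionary key would be the HDL parameter that was varied between runs (for example pipeline depth), so the resulting rows show directly how that parameter trades resources against speed.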
- Timing and Placement
- Timing and Placement constraints direct the FPGA
implementation tools and control the maximum
speed and placement of the design. These
constraints will directly impact many important
performance criteria such as design margin, DSP
throughput, pin placement, and data I/O.
- Effective methods exist, such as the use of
Relationally Placed Macros (RPMs), to create
instances of specific DSP functions and direct
their placement within the FPGA.
- Timing Analysis reveals details of the speed of a
given implementation and design margin against
performance requirements. The Timing Analysis
must be carefully interpreted to draw conclusions
and identify where recoding and/or a change to
synthesis, timing and placement constraints is
necessary.
- Timing Analysis also reveals which functions
within the DSP algorithm limit performance and may
not be achievable given fixed resources (FPGA
type) and performance requirements. This
indicates that a fundamental change in the DSP
function or HDL coding is required.
- Timing and Placement (cont.)
- High Performance DSP designs often require
considerable attention to the data I/O for signal
processing, in addition to the internal
functionality of the algorithm.
- Successful High Performance DSP designs carefully
match DSP functionality to high speed I/O lines.
Interfacing the FPGA to other high performance
components has to remain a consideration through
design and implementation.
- Timing and Placement will take a substantial
amount of time for large DSP implementations.
Most tools are capable of running at the command
line, which supports batch processing.
- Many timing and placement constraints are
available for FPGA implementation. Careful
interpretation and selection of timing and
placement constraints is required.
- Timing and Placement Automation Techniques
- Timing and Placement Profiling: Same as Synthesis
Profiling with a few additional notes
- Establish desired variations to constraints
and/or pin placement of the design. Profile
timing and placement constraints against a single
synthesized design. Profiling a single set of
constraints against multiple designs amounts to
processing the entire flow for different designs.
- Batch process the Translation, Mapping and Place and
Route Tools with timing and placement constraints
and obtain performance parameters. Use C/C++,
script or equivalent code to extract desired
performance parameters from these reports.
- Repeat the process until sufficient information from
multiple runs is collected.
- Timing and Placement Automation Techniques
- Timing and Placement Profiling (continued)
- Analyze and interpret the results of multiple
runs. Timing reports are available after the MAP
and PAR processes.
- Profile timing and placement constraints only
when multiple runs will provide insight into
performance, such as when combined with
synthesis profiling over the entire
implementation flow.
- Using timing analysis tools is a better approach
than timing and placement profiling for debugging
a single implementation that does not meet timing.
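Establishing variations to a timing constraint can also be scripted. The following sketch generates one clock-period constraint per target clock rate and picks the tightest target that met timing. It assumes UCF-style `TNM_NET` / `TIMESPEC ... PERIOD` text; the net name, group name and function names are hypothetical, and the pass/fail results are expected to come from the timing reports of each batch run.

```python
def period_constraint(net, period_ns):
    """UCF-style clock period constraint text for one clock net (assumed syntax)."""
    group = f"{net}_grp"  # hypothetical timing group name
    return (
        f'NET "{net}" TNM_NET = "{group}";\n'
        f'TIMESPEC "TS_{net}" = PERIOD "{group}" {period_ns:.3f} ns HIGH 50%;\n'
    )


def constraint_sweep(net, targets_mhz):
    """Map each target clock rate (MHz) to its constraint text."""
    return {f: period_constraint(net, 1000.0 / f) for f in targets_mhz}


def tightest_passing(results):
    """results maps target MHz -> True if that run met timing.

    Returns the highest target that met timing, or None if none did.
    """
    passing = [f for f, met in results.items() if met]
    return max(passing) if passing else None


if __name__ == "__main__":
    for mhz, text in constraint_sweep("clk", [200, 250, 300]).items():
        print(f"# target {mhz} MHz")
        print(text)
```

Each generated constraint set would be run against the same synthesized design, in line with the note above about profiling constraints against a single design.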
- Hardware-In-The-Loop (HITL) Test and Verification
- HITL: Direct Input and/or Output through one or
more interfaces with the FPGA
- Analog-To-Digital Converter (ADC)
- Digital-To-Analog Converter (DAC)
- On Chip Debugger (Dedicated IP cores for data
capture and transfer via JTAG, local bus, I/O
pins)
- Logic Analyzer interface to pins
- HITL is a real-time test configuration. HITL
provides a significant advantage in terms of
incremental design, test and verification
- Real-Time Divide-and-Conquer Debugging and Test
of modules and subsystems
- Inject and/or transmit real-time signals
(interface testing)
- Event and anomaly capture
- Practical Performance Benchmarking, HITL used as
a True Measure-Of-Performance
- Hardware-In-The-Loop (HITL) Test and Verification
Automation Techniques
- HITL Automation Techniques are used to automate
generating, collecting and processing large
amounts of test data. This supports design
validation, on-chip debugging, test and
verification
- Test Equipment: Utilize COTS and/or custom
automation software to control test instruments
and inject input or store/analyze output.
Supports interface and end-to-end testing
- HITL Data Reduction and Analysis: Collection and
batch processing of large amounts of HITL data
- HITL Generated Performance Curves: Useful for
quantifying actual performance data (Threshold
Sensitivity, Frequency Stability, Error, etc.) and
comparing to theoretical for design insight
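One form of HITL data reduction is turning a block of captured samples into a single figure of merit. The sketch below estimates the worst spur relative to the carrier (dBc) from a captured record using a direct DFT. It is an illustrative example, not the analysis used in the presentation: it assumes coherent sampling (an integer number of carrier cycles in the record, so no window is applied), and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum via a direct DFT (fine for short records)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def worst_spur_dbc(samples):
    """Return (carrier bin, worst spur level in dBc), ignoring the DC bin."""
    spec = power_spectrum(samples)
    carrier = max(range(1, len(spec)), key=lambda k: spec[k])
    spur = max(p for k, p in enumerate(spec) if k not in (0, carrier))
    spur = max(spur, 1e-300)  # guard against log10(0) on an ideal record
    return carrier, 10.0 * math.log10(spur / spec[carrier])
```

Running this over many captures (one per tuning frequency, for example) gives the kind of measured performance curve that can be compared against the theoretical value.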
- Case Study - Direct Digital Synthesizer (DDS)
using Xilinx Virtex-4 XtremeDSP
- Objective: High Speed, High Resolution Multimode
DDS for
- Communications, Radar, Navigation, Tracking
- SIGINT, ELINT
- High Speed Spectral Processing
- Software Defined Radio (SDR)
- EW, ECM, Self-Protection Jamming
- Performance (Algorithm and FPGA Only, Pre-DAC)
- Frequency Resolution < 1 Hz
- Frequency Tuning Speed < 1 µs
- Spurious < -100 dBc
- Harmonics < -100 dBc
- Maximum Clock Speed > 200 MHz
- Case Study - DDS using Xilinx Virtex-4 XtremeDSP
[Figure: DDS Block Diagram. Blocks: AM Mod, PM Mod, FM Mod, Phase Per CLK, Phase Accumulate, DDS Transform. Outputs: (I) Inphase, (Q) Quadrature.]
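The phase-accumulate path in the block diagram can be modeled in a few lines. This is a generic textbook sketch, not the case-study design: the "DDS Transform" is not specified in the slides, so a plain phase-truncated sine/cosine computation stands in for it (which by itself would not reach the -100 dBc spur target), and the accumulator width, table width and function names are all assumptions.

```python
import math


def frequency_resolution(f_clk, acc_bits):
    """Smallest frequency step of a phase accumulator: f_clk / 2**acc_bits."""
    return f_clk / (1 << acc_bits)


def tuning_word(f_out, f_clk, acc_bits):
    """Phase increment per clock ("Phase Per CLK") for a desired output."""
    return round(f_out * (1 << acc_bits) / f_clk)


def dds(tuning, acc_bits, table_bits, n_samples, amp_bits=15):
    """Generate (I, Q) integer samples from a wrapping phase accumulator.

    The top table_bits of the phase select the angle (phase truncation);
    cosine gives the in-phase output and sine the quadrature output.
    """
    mask = (1 << acc_bits) - 1
    scale = (1 << amp_bits) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        index = phase >> (acc_bits - table_bits)
        angle = 2 * math.pi * index / (1 << table_bits)
        out.append((round(scale * math.cos(angle)), round(scale * math.sin(angle))))
        phase = (phase + tuning) & mask  # accumulator wraps modulo 2**acc_bits
    return out
```

With an assumed 32-bit accumulator at 200 MHz, `frequency_resolution(200e6, 32)` is about 0.047 Hz, which is consistent with the sub-1 Hz resolution objective. Because the output frequency changes as soon as a new tuning word reaches the accumulator, fast tuning follows naturally from this structure.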
- Case Study - DDS using Xilinx Virtex-4 XtremeDSP
[Figure: DDS Block Diagram, annotated with the notes below.]
- Automation Technique: DSP Algorithm Verification
and Analysis Tools
- Automation Technique: HDL Coding and Synthesis.
One-time handcrafting was required to meet
performance. Now that a solution is verified, a
scalable Autocoding function will be developed to
implement this solution in the next High
Performance DSP design.
- Note: Although Timing and Placement was important
and required adjustment, no Automation Techniques
were necessary to meet Performance Requirements.
- Case Study - DDS using Xilinx Virtex-4 XtremeDSP
[Figure: DDS Output Power Spectrum]
- Automation Technique: HITL Debugging, Testing,
Performance Analysis and Verification
- Case Study - DDS using Xilinx Virtex-4 XtremeDSP
- Automation Technique: DSP Analysis and HITL
Performance Analysis provide insight into this
design. This class of DDS is capable of faster
frequency tuning speed, higher frequency
resolution, and clock speed greater than 300 MHz
using Register Balancing and Double Data Rate
(DDR) techniques.
[Figure: DDS Spectrogram]
- Summary
- Automation Techniques can be applied to improve
DSP design and FPGA implementation processes.
Automation Techniques are a means to shorten
development time, improve efficiency and manage
substantial design, debugging, test and
verification efforts. There is no direct formula
for applying them. Examine the DSP design
techniques, FPGA implementation flow and tools
used for a project. Do not blindly apply
automation techniques. Look for processes where a
benefit can be realized by applying Automation
Techniques. Refer to the Summary of Automation
Techniques matrix, or create new techniques to
meet requirements.
- Objectives of Automation Techniques
- Reduce Design and Implementation Time
- Perform Error Checking
- Develop greater insight into successful high
performance DSP Implementations by automating
techniques
- Summary (cont.)
- Specific focus areas to achieve objectives
- DSP Algorithm Design
- HDL Coding and Synthesis
- Timing and Placement
- Hardware-In-The-Loop (HITL) Test and Verification
- Automation Techniques may be required to go
beyond basic DSP design and FPGA implementation.
Development of Automation Techniques easily
provides a cost benefit for processing large
amounts of data. Other Automation Techniques may
require substantial NRE. For these cases,
techniques must provide substantial benefit to
the design and implementation process.
- Summary (cont.)
- Designing effective Automation Techniques for
High Performance DSP Implementations requires
understanding of DSP Design and FPGA
Implementation Tools.
- Automation Techniques can be used to profile
Synthesis, Timing and Placement of FPGA
Implementations. Careful interpretation of this
data is required.
- Automation Techniques can be used for High
Performance DSP Designs that require substantial
amounts of data, test, analysis and verification.
- Summary (cont.)
- Summary of Automation Techniques