Title: DSP for FPGA
1DSP for FPGA
- SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications
- Miodrag Bolic
2Objectives
- Comparison between PDSP and FPGA
- Virtex II Pro
- Altera Stratix FPGA
- Stratix DSP Block and its configuration
- Altera design flow
3What Is an FPGA?
- Field Programmable Gate Array
- Device that Has a Regular Architecture (Set of Blocks) that Can Be Programmed for Various Functions
- Glue Logic
- Customizable Hardware Solution
- Configurable Processors
4Why Use FPGAs in DSP Applications?
- 10x More DSP Throughput Than DSP Processors
- Parallel vs. Serial Architecture
- Cost-Effective for Multi-Channel Applications
- Flexible Hardware Implementation
- Single-Chip Solution
- System (Hardware/Software) Integration Benefits
[Figure: FPGA containing Software and Embedded Processor]
5DSP Processors vs. FPGAs
High Level of Parallel Processing in FPGA
- Can implement hundreds of MAC functions in an FPGA
- Parallel implementation allows for faster throughput
- 200 Tap FIR Filter would need 1 clock cycle per sample
High Speed DSP Processor
- 1-8 Multipliers
- Needs looping for more than 8 multiplications
- Needs multiple clock cycles because of serial computation
- 200 Tap FIR Filter would need 25 clock cycles per sample with an 8 MAC unit processor
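The two cycle counts quoted for the 200 tap FIR filter follow from dividing the number of taps by the number of MAC units available per clock cycle. A minimal sketch of that arithmetic is given here; the function name `cycles_per_sample` is hypothetical, and the model assumes every MAC unit completes one tap per cycle with no loop or memory overhead.

```python
import math

def cycles_per_sample(taps, mac_units):
    """Clock cycles needed to compute one FIR output sample.

    Assumes each MAC unit handles one tap per cycle, so the taps are
    processed in ceil(taps / mac_units) passes.
    """
    if taps <= 0 or mac_units <= 0:
        raise ValueError("taps and mac_units must be positive")
    return math.ceil(taps / mac_units)

# DSP processor with 8 MAC units: the 200 taps are looped over serially.
dsp_cycles = cycles_per_sample(taps=200, mac_units=8)

# FPGA with one MAC per tap: all 200 products are formed in parallel.
fpga_cycles = cycles_per_sample(taps=200, mac_units=200)

print(dsp_cycles, fpga_cycles)  # 25 1
```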
6Extending Range of Altera Reconfigurable DSP
Solutions
[Figure: Performance (MMACs/sec), with axis marks at 100 and 600, for Embedded Processors, Embedded Processors + Hardware Acceleration, and Complete Hardware Implementation; one region is marked "New!"]
7Comparison of DSP Devices
Programmable DSP Processors
- Benefits: Easy to Use; Programmed Via C-Code or Assembly; Fast Development Time
- Weaknesses: Fixed Architecture; Inefficient for Highly Recursive Algorithms Unless Hardware Accelerated; Potential Bus Bottlenecks; Other Devices (FPGAs) Often Used on Board for Other Functions
Reconfigurable DSP
- Benefits: Easy to Use; Programmed via C-Code, Assembly, or HDL; Efficient for Recursive Algorithms Using DSP IP Cores; Higher Levels of Integration
- Weaknesses: Longer Development Time (But Getting Shorter!)
9Stratix EP1S10 2
12TriMatrix Memory 1
M512 Blocks (512 bits per block + parity)
- Small FIFOs
- Shift Register
- Rake Receiver Correlator
- FIR Filter Delay Line
M4K Blocks (4 Kbits per block + parity)
- Header / Cell Storage
- Channelized Functions
- ATM cell/packet processing
- Nios Program Memory
- Look-Up Schemes
- Packet & Cell Buffering
- Cache
M-RAM (512 Kbits per block + parity)
- Packet / Data Storage
- Nios Program Memory
- System Cache
- Video Frame Buffers
- Echo Canceller Data Storage
More Bits For Larger Memory Buffering
More Data Ports for Greater Memory Bandwidth
Dedicated External Memory Interface
13Memory Bandwidth Summary: Stratix Device Family 1
Device Total RAM Bits M-RAM Blocks M4K Blocks M512 Blocks Maximum Bandwidth (Mbps)
EP1S10 920,448 1 60 94 1,245,024
EP1S20 1,669,248 2 82 194 2,096,928
EP1S25 1,944,576 2 138 224 2,894,400
EP1S30 3,317,184 4 171 295 3,750,192
EP1S40 3,423,744 4 183 384 4,384,800
EP1S60 5,215,104 6 292 574 6,762,528
EP1S80 7,427,520 9 364 767 8,784,720
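The Total RAM Bits column can be cross-checked against the block counts in the same row. The sketch below assumes each block size includes its parity bits (576, 4,608 and 589,824 bits for M512, M4K and M-RAM). Those three constants are an assumption based on the "bits per block" plus parity figures on the TriMatrix slide; they are not stated in the table itself.

```python
# Block sizes in bits, parity included (assumed: 512+64, 4096+512, 524288+65536).
M512_BITS = 576
M4K_BITS = 4_608
MRAM_BITS = 589_824

# (device, listed total RAM bits, M-RAM blocks, M4K blocks, M512 blocks)
DEVICES = [
    ("EP1S10",   920_448, 1,  60,  94),
    ("EP1S20", 1_669_248, 2,  82, 194),
    ("EP1S25", 1_944_576, 2, 138, 224),
    ("EP1S30", 3_317_184, 4, 171, 295),
    ("EP1S40", 3_423_744, 4, 183, 384),
    ("EP1S60", 5_215_104, 6, 292, 574),
    ("EP1S80", 7_427_520, 9, 364, 767),
]

def total_ram_bits(mram, m4k, m512):
    """Sum the capacity of all three block types for one device."""
    return mram * MRAM_BITS + m4k * M4K_BITS + m512 * M512_BITS

for name, listed, mram, m4k, m512 in DEVICES:
    computed = total_ram_bits(mram, m4k, m512)
    print(f"{name}: listed={listed} computed={computed} match={computed == listed}")
```

Under that assumption every row of the table reproduces its listed total.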
14Logic Element (LE) 2
[Figure: LE functional diagram. Inputs: LUT Chain Input, Register Chain Input, Register Control Signals, addnsub, cin, data1, data2, data3, data4. Blocks: 4-Input LUT, Sync Load & Clear Logic. Outputs: Row, Column & DirectLink Routing, Local Routing, Register Feedback, Register Chain Output, LUT Chain Output.]
- Note
- Functional Diagram Only. Please See Datasheet for More Details.
- addnsub & data1 connected via XOR logic
15Dynamic Arithmetic Mode
[Figure: LE in dynamic arithmetic mode. Inputs: Register Chain Input, Register Control Signals, LAB Carry-In, Carry-In0, Carry-In1, addnsub, data1, data2, data3. Blocks: Carry-In Logic, Sum Calculator, Carry Calculator, Sync Load & Clear Logic, Carry-Out Logic. Outputs: Row, Column & DirectLink Routing, Local Routing, Register Chain Output, Carry-Out0, Carry-Out1.]
Note: Functional Diagram Only. Please See Datasheet for More Details.
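The paired Carry-In0/Carry-In1 and Carry-Out0/Carry-Out1 signals, together with the LAB Carry-In, suggest a carry-select arrangement. Each LE would compute its sum and carry for both possible incoming carries, and the LAB carry-in would pick the valid result. The following is a behavioral sketch of that idea only, not the actual LE circuit. The function names `full_add`, `le_bit` and `add_chain` are hypothetical.

```python
def full_add(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def le_bit(a, b, carry_in0, carry_in1, lab_carry_in):
    """One bit of a carry-select chain.

    carry_in0 belongs to the chain that assumes LAB carry-in = 0,
    carry_in1 to the chain that assumes LAB carry-in = 1.
    Returns (selected sum, carry_out0, carry_out1).
    """
    sum0, carry_out0 = full_add(a, b, carry_in0)
    sum1, carry_out1 = full_add(a, b, carry_in1)
    selected = sum1 if lab_carry_in else sum0
    return selected, carry_out0, carry_out1

def add_chain(x, y, width, lab_carry_in=0):
    """Add two unsigned `width`-bit values with a chain of le_bit stages."""
    c0, c1 = 0, 1  # the two speculative chains start from carry 0 and carry 1
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, c0, c1 = le_bit(a, b, c0, c1, lab_carry_in)
        result |= s << i
    carry_out = c1 if lab_carry_in else c0
    return result, carry_out

print(add_chain(0b1011, 0b0110, width=4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```

Because both chains are evaluated in parallel, the late-arriving LAB carry-in only has to drive a select, not ripple through every bit.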
16Logic Array Blocks (LAB) 2
- 10 LEs
- Local Interconnect
- LAB-Wide Control Signals
[Figure: LAB structure. The Local Interconnect is fed by 30 LAB Input Lines + 10 LE Feedback Lines and drives 4 inputs to each of the 10 LEs; Control Signals are shared across the LAB.]
17Avalon Switch Fabric Contents
- Avalon Switch Fabric provides the following to peripherals it connects
- Data-Path Multiplexing
- Address Decoding
- Wait-State Generation
- Dynamic Bus Sizing
- Interrupt-Priority Assignment
- Latent Transfer Capabilities
- Streaming Read and Write Capabilities
- Avalon Switch Fabric tailors transactions to the characteristics of the attached peripherals
18SOPC Design Example
CPU 32 Bit
Inst Master
Data Master
Avalon Switch Fabric
Allows Masters and Slaves to communicate without knowledge of each other's interface details
UART
Instruction Memory 32-bit Data path
Avalon Tri-State Bridge
VGA Controller
Data Memory 32-bit Data path
External FLASH 1 MB 16-bit Datapath
External SRAM 256 KB 32-bit Datapath
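In this example the 32-bit CPU masters share the fabric with a 16-bit External FLASH, which is the width mismatch that dynamic bus sizing is meant to hide. A minimal sketch of the idea, assuming a 32-bit read is served by two consecutive 16-bit slave reads with the lower address supplying the low halfword. The function name and the `flash` contents are hypothetical.

```python
def read32_from_16bit_slave(mem16, byte_addr):
    """Assemble one 32-bit word from two consecutive 16-bit slave reads.

    mem16 is a list of 16-bit words; byte_addr must be 4-byte aligned.
    Little-endian halfword order is an assumption of this sketch.
    """
    if byte_addr % 4:
        raise ValueError("address must be 32-bit aligned")
    idx = byte_addr // 2                 # index of the low halfword
    low = mem16[idx] & 0xFFFF            # first 16-bit transfer
    high = mem16[idx + 1] & 0xFFFF       # second 16-bit transfer
    return (high << 16) | low

flash = [0x5678, 0x1234, 0xBEEF, 0xDEAD]   # made-up contents
word0 = read32_from_16bit_slave(flash, 0)
word1 = read32_from_16bit_slave(flash, 4)
print(hex(word0), hex(word1))  # 0x12345678 0xdeadbeef
```

The master sees a single 32-bit transfer; the extra slave access shows up only as added latency.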
19Data Path Multiplexing & Slave Arbitration
- 1 - Data-Path Multiplexing (MUX)
- 2 - Slave Arbitration (Arbiter)
- 3 - Address Decoding
[Figure: Avalon Switch Fabric connecting the masters to UART, Instruction Memory (32-bit data path), Avalon Tri-State Bridge, VGA Controller, Data Memory (32-bit data path), External FLASH (1 MB, 16-bit data path), and External SRAM (256 KB, 32-bit data path)]
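Address decoding can be pictured as a lookup from a master address to one slave and an offset inside it. The sketch below uses the FLASH and SRAM sizes from the example. The base addresses and the names `ADDRESS_MAP` and `decode` are hypothetical and chosen only for illustration.

```python
# (slave name, base address, size in bytes).
# Sizes come from the example; the base addresses are made up.
ADDRESS_MAP = [
    ("ext_flash", 0x0000_0000, 1 * 1024 * 1024),  # 1 MB
    ("ext_sram",  0x0010_0000, 256 * 1024),       # 256 KB
]

def decode(addr):
    """Return (slave name, offset within slave), or None if unmapped."""
    for name, base, size in ADDRESS_MAP:
        if base <= addr < base + size:
            return name, addr - base
    return None

print(decode(0x0000_0010))  # ('ext_flash', 16)
print(decode(0x0010_0004))  # ('ext_sram', 4)
print(decode(0x0014_0000))  # None
```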
21DSP Blocks
- Eight 9 x 9-bit multipliers
- Four 18 x 18-bit multipliers
- One 36 x 36-bit multiplier
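The four-to-one ratio between the 18 x 18 and 36 x 36 configurations matches the usual way of building a wide product from four half-width partial products. The sketch below shows only that arithmetic for unsigned operands and makes no claim about the internal wiring of the DSP block. The names `mul18` and `mul36` are hypothetical.

```python
MASK18 = (1 << 18) - 1
MASK36 = (1 << 36) - 1

def mul18(a, b):
    """Stand-in for one 18 x 18 unsigned multiplier (36-bit product)."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b

def mul36(a, b):
    """36 x 36 unsigned multiply built from four 18 x 18 partial products."""
    assert 0 <= a <= MASK36 and 0 <= b <= MASK36
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return ((mul18(a_hi, b_hi) << 36)
            + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
            + mul18(a_lo, b_lo))

a, b = 0x9_8765_4321, 0x1_2345_6789
print(mul36(a, b) == a * b)  # True
```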
22DSP Blocks (cont.)
- The DSP block consists of
- A multiplier block
- An adder/subtractor/accumulator block
- A summation block
- An output interface
- Output registers
- Routing and control signals
23Stratix DSP Blocks
- High Performance Dedicated Multiplier Circuitry
- 18x18 Functions at 280 MHz
- Variable Operand Widths with Full Precision Outputs
- 9x9 (8 Max.)
- 18x18 (4 Max.)
- 36x36 (1 Max.)
- Add, Accumulate, or Subtract
- Signed & Unsigned Operations
- Dynamically Change between Add & Subtract
- Supports DSP Requirements Including Complex Numbers
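A complex multiplication shows how these features combine: it needs four real multiplies, one subtraction for the real part, and one addition for the imaginary part. The following sketch uses plain integers and a hypothetical function name. It illustrates the arithmetic and does not model the block's registers or bit widths.

```python
def complex_mul(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using four real multiplies."""
    p_rr = ar * br
    p_ii = ai * bi
    p_ri = ar * bi
    p_ir = ai * br
    real = p_rr - p_ii   # adder stage in subtract mode
    imag = p_ri + p_ir   # adder stage in add mode
    return real, imag

result = complex_mul(3, 4, 1, -2)
print(result)  # (11, -2), i.e. (3 + 4j) * (1 - 2j) = 11 - 2j
```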
24DSP Block for 18 x 18-bit Mode
25Shift Register Chain
26Adder/Output Block
27Time-Domain Multiplexed FIR Filters
28Operation of TDM Filter
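The transcript keeps only the titles of these slides, so the following is a sketch under an assumption: that time-domain multiplexing here means sharing a small number of multipliers among all taps by running them for several fast-clock cycles per input sample. The function name `tdm_fir` and its parameters are hypothetical.

```python
def tdm_fir(samples, coeffs, multipliers=1):
    """FIR model that reuses `multipliers` MAC units over several
    fast-clock cycles per input sample.

    Returns (outputs, cycles_per_sample).
    """
    taps = len(coeffs)
    cycles = -(-taps // multipliers)      # ceiling division
    delay = [0] * taps                    # delay line, newest sample first
    outputs = []
    for x in samples:
        delay = [x] + delay[:-1]          # shift in the new sample
        acc = 0
        for cycle in range(cycles):       # one pass per fast-clock cycle
            start = cycle * multipliers
            for k in range(start, min(start + multipliers, taps)):
                acc += coeffs[k] * delay[k]   # work done by one multiplier
        outputs.append(acc)
    return outputs, cycles

out, cycles = tdm_fir([1, 0, 0, 0, 0], [1, 2, 3, 4], multipliers=2)
print(out, cycles)  # [1, 2, 3, 4, 0] 2
```

Feeding an impulse returns the coefficients, the same result a fully parallel filter would give; only the number of cycles per sample changes with the multiplier count.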
30Resource Savings with DSP Blocks
- DSP Block
- Reduces LE Usage
- Reduces Routing Congestion
- Reduces Power
- Maintains Performance
90% of your problems are hidden under the surface!
[Figure: DSP block with 18-bit inputs, multipliers (X), 36-bit products, and a 38-bit output]
SAVES 652 ROUTING NETS!
31Design Flow
32 Design Flow Overview
- Create Design in Simulink Using Altera Libraries
- Simulate in Simulink
- Add SignalCompiler to Model
- Create HDL Code & Generate Testbench
- Perform RTL Simulation
- Synthesize HDL Code & Place & Route
- Program Device
- SignalTap II Logic Analyzer
33Step 1- Create Design in Simulink Using Altera
Libraries
- Drag & Drop Library Blocks into Simulink Design & Parameterize Each Block
34Parameterization of IP Megacores
35Step 2 - Simulate in Simulink
36Step 3 - Add Signal Compiler to Model to
Generate HDL code
- APEX20K/E/C
- APEX II
- Stratix & Stratix GX
- Cyclone & ACEX 1K
- Mercury
- FLEX 10K & FLEX 6000
- DSP Boards
- Leonardo Spectrum
- Synplify
- Quartus II
Speed vs. Area
Testbench Generation
Message Window
37Step 4 - Create HDL Code & Generate Testbench
AltrFir32.mdl
Enable "Generate Stimuli for VHDL Testbench" Button
AltrFir32.vhd
38HDL Code Generation
39DSP Builder Report File
- Lists All Converted Blocks
- Port Widths
- Sampling Frequencies
- Warnings & Messages
40Step 5 - Perform RTL Simulation (ModelSim)
- Set working directory (File > Change Directory)
- Run TCL file (Tools > Execute Macro)
41 Perform Verification
ModelSim vs Simulink
42Step 6 - Synthesize HDL & Place & Route
- Leonardo Spectrum
- Synplify
- Quartus II
Synthesis
Quartus II Fitter
43Step 7 - Program Device
Download Design to DSP Development Kits
44Stratix DSP Development Board
Nios Expansion Prototype Connector
MAX 7000 Device
Prototyping Area
D/A Converters
Mictor-Type Connectors for HP Logic Analyzers
A/D Converters
Analog SMA Connectors
40-Pin Connectors for Analog Devices
Texas Instruments Connectors on Underside of Board
45Stratix DSP Board Key Features
- Stratix EP1S25F780C5 Device (Starter Version)
- Stratix EP1S80B956C7 Device (Professional Version)
- Analog I/O
- Two 12-bit, 125 MHz A/D Converters
- Two 14-bit, 165 MHz D/A Converters
- Digital I/O
- Two 40-pin Connectors for Analog Devices A/D Converter Evaluation Boards
- Connector for TI TMS320 Cross-Platform Daughter Card
- 3.3V Expansion/Prototype Headers
- RS-232 Serial Port
- Memory
- 2 Mbytes of 7.5-ns Synchronous SRAM
- 32 Mbytes of FLASH
46Step 8 - SignalTap II Logic Analyzer
- Embedded Logic Analyzer
- Downloads into Device with Design
- Captures State of Internal Nodes
- Uses JTAG for Communication
47SignalTap II Logic Analyzer
Analysis of Imported Data
Imported Data
Imported Plot