CprE / ComS 583 Reconfigurable Computing - PowerPoint PPT Presentation

About This Presentation
Title:

CprE / ComS 583 Reconfigurable Computing

Description:

CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 Midterm Review – PowerPoint PPT presentation


Transcript and Presenter's Notes



1
CprE / ComS 583 Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #15: Midterm Review
2
Project Proposals
  • Group 1: FPGA Implementation of Frequency-Domain
    Audio Effects Processor
  • Five-band equalizer
  • Frequency shifter

3
Project Proposals (cont.)
  • Group 2: Transparent FPGA-Based Network Analyzer
  • Layer I pass-through
  • Layer II passive analyzer

4
Project Proposals (cont.)
  • Group 3: FPGA-Based Library Design for Linear
    Algebra Applications
  • Floating-point sparse matrix-vector
    multiplication
  • Floating-point banded matrix-vector
    multiplication
  • Floating-point lower-upper matrix decomposition

5
Project Proposals (cont.)
  • Group 4: An Improved Approach of Configuration
    Compression for FPGA-Based Embedded Systems
  • Improved compression algorithms
  • LUT-reordering techniques

6
Project Proposals (cont.)
  • Other Projects
  • Group 5: FPGA Ternary Data Conversion
  • Group 6: Analysis of Sobel Edge Detection
    Implementations
  • Group 7: Design and Analysis of Artificial
    Neural Networks on FPGAs
  • Reminders
  • 11/16: Project Updates (10 minutes)
  • 12/5-12/7: Final Presentations (25 minutes)
  • 12/15: Final Reports

7
Midterm Review
  • Using the Silicon


[Figure: ways to use the silicon. Labels: arrays of PEs, MMX, SSE, FFT, AES, MPP, More Cache, CISC, Reconfigurable Fabric. Categories: Superscalar, Vector, Reconfigurable Processor]
8
Computational Density (Qualitative)
[Figure: Actel ProASIC vs. Intel Pentium 4]
  • FPGAs can complete more work per unit time than a
    processor or DSP
  • Less instruction overhead
  • More active computation onto the same silicon
    area (allows for more parallelism)
  • Can control operations at the bit level (as
    opposed to word level)

9
Coupling in a Reconfigurable System
  • Many places to put reconfigurable computing
    components
  • Most implementations involve multiple discrete
    devices
  • How should these devices be connected together?

10
Generic FPGA Architecture
  • FPGA: Field-Programmable Gate Array
  • Input/Output Buffers (IOBs)
  • Configurable Logic Blocks (CLBs)
  • Programmable interconnect mesh

Island-style FPGA architecture
11
FPGA Technology
  • Various FPGA programming technologies (Anti-fuse,
    (E)EPROM, Flash, SRAM)
  • SRAM most popular

12
LUTs and Digital Logic
  • k inputs → 2^k possible input values
  • k-LUT corresponds to 2^k x 1 bit memory
  • Truth table is stored
  • 2^(2^k) possible functions, O(2^(2^k) / k!) unique

[Figure: example function F of inputs A0, A1, A2]
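
The LUT-as-memory view above can be sketched in a few lines of Python. This is only an illustrative model, not taken from the lecture: the helper names `make_lut`, `lut_read`, and `num_functions` are hypothetical, and the choice of the first input as the most significant address bit is an assumption.

```python
from itertools import product

def make_lut(func, k):
    """Store the truth table of a k-input Boolean function as a 2**k x 1 bit memory."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form the address (first input is the MSB, by assumption)."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | (bit & 1)
    return table[addr]

def num_functions(k):
    """Number of distinct truth tables a k-LUT can hold: 2**(2**k)."""
    return 2 ** (2 ** k)

# Example: a 3-LUT programmed as a majority function.
majority = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
```

Reprogramming the LUT means nothing more than storing a different list; the read path stays the same.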
13
Architectural Issues [AhmRos04A]
  • What values of N, I, and K minimize the following
    parameters?
  • Area
  • Delay
  • Area-delay product
  • Assumptions
  • All routing wires length 4
  • Fully populated IMUX
  • Wiring is half pass transistor, half tri-state

14
FPGA Arithmetic
  • Traditional microprocessors, DSPs, etc. don't use
    LUTs
  • Instead use a w-bit Arithmetic and Logic Unit
    (ALU)
  • Carry connections are hard-wired
  • No switches, no stubs, short wires

[Figure: one arithmetic bit built from two 3-LUTs with inputs A, B, Cin and outputs Sum and Cout. Configuration (1): AND2, OR2, XOR2. Configuration (2): ADD, SUB, CMP]
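
As a rough illustration of the two-LUT adder bit shown on this slide, the sketch below models each bit position as one 3-LUT for Sum and one 3-LUT for Cout, chained into a ripple-carry adder. The names `lut3`, `SUM_LUT`, `CARRY_LUT`, and `ripple_add` are hypothetical, and the address ordering (A as the MSB) is an assumption.

```python
def lut3(table, a, b, c):
    """Read an 8-entry LUT; address bits are (a, b, c) with a as the MSB."""
    return table[(a << 2) | (b << 1) | c]

# Truth tables for a full adder: one 3-LUT for the sum, one for the carry-out.
SUM_LUT = [a ^ b ^ c for a in (0, 1) for b in (0, 1) for c in (0, 1)]
CARRY_LUT = [(a & b) | (a & c) | (b & c)
             for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def ripple_add(x, y, width, cin=0):
    """Add two width-bit numbers using two 3-LUT lookups per bit position."""
    carry = cin
    total = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        total |= lut3(SUM_LUT, a, b, carry) << i
        carry = lut3(CARRY_LUT, a, b, carry)  # feeds the next bit's Cin
    return total, carry
```

In this model the carry passes through a general-purpose LUT at every bit, which is the path that hard-wired carry logic is meant to shorten.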
15
FPGA Arithmetic (cont.)
  • Hard-wired carry logic support

[Figures: carry logic in the Altera FLEX 8000 and the Xilinx XCV4000]
16
Arithmetic (cont.)
  • Carry save multiplication

[Figure: carry-save multiplier array with inputs X0-X3 and Y0-Y3; product bits Z0, Z1, Z2 shown]
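
A minimal behavioral sketch of carry-save multiplication follows. It assumes the usual scheme in which each partial product is folded into a (sum, carry) pair with a 3:2 compressor and a single carry-propagate addition happens only at the end; the function names are hypothetical and the slide's exact array layout is not reproduced.

```python
def carry_save_add(a, b, c):
    """Bitwise 3:2 compressor: returns (s, carry) such that a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def carry_save_multiply(x, y, width=4):
    """Multiply two unsigned width-bit numbers without propagating carries per row."""
    s, c = 0, 0
    for i in range(width):
        partial = (x << i) if (y >> i) & 1 else 0
        s, c = carry_save_add(s, c, partial)  # no carry ripple inside the row
    return s + c  # single carry-propagate addition at the end
```

The point of the structure is that each row's delay is one full-adder delay regardless of operand width; only the final addition pays for carry propagation.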
17
LUT-Based Constant Multipliers
10101011 x NNNNNNNN
        AAAAAAAAAAAA    (N x 1011, LSN)
  + BBBBBBBBBBBB        (N x 1010, MSN)
  = SSSSSSSSSSSSSSSS    Product

[Figure: N0-N7 address two banks of twelve 4-LUTs producing A0-A11 and B4-B15, which combine into S0-S15]
  • Constants can be changed in the LUTs to program
    new multipliers
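
The table-lookup idea can be sketched as follows. This is an assumption-laden model rather than the slide's exact circuit: it splits the variable operand N into two nibbles so that each lookup has four address bits, and it uses one 16-entry table of 12-bit words where the hardware would use twelve 4-LUTs, each holding one bit column of the table. `K`, `TABLE`, and `const_multiply` are hypothetical names.

```python
K = 0b10101011  # the constant from the slide (171 decimal)

# 16-entry table: K times every possible 4-bit value. Each entry fits in 12 bits.
TABLE = [K * nibble for nibble in range(16)]

def const_multiply(n):
    """Multiply an 8-bit value n by K using two table lookups and one addition."""
    low = TABLE[n & 0xF]          # 12-bit partial product, bits 0-11
    high = TABLE[(n >> 4) & 0xF]  # 12-bit partial product, weighted by 16
    return low + (high << 4)      # 16-bit product
```

Changing the multiplier to a different constant only requires rebuilding `TABLE`, which mirrors rewriting the LUT contents.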

18
Capacity Trends
Xilinx Device Complexity
[Figure: device capacity plotted against year; axis ticks 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006]
  • Virtex-5: 550 MHz, 24M gates
  • Virtex-4: 500 MHz, 16M gates
  • Virtex-II Pro: 450 MHz, 8M gates
  • Virtex-II: 450 MHz, 8M gates
  • Spartan-3: 326 MHz, 5M gates
  • Virtex-E: 240 MHz, 4M gates
  • Virtex: 200 MHz, 1M gates
  • XC4000: 100 MHz, 250K gates
  • Spartan-II: 200 MHz, 200K gates
  • Spartan: 80 MHz, 40K gates
  • XC5200: 50 MHz, 23K gates
  • XC3000: 85 MHz, 7.5K gates
  • XC2000: 50 MHz, 1K gates
19
Splash 1 Architecture
[Figure: Splash 1 board. VME bus and VSB bus interfaces, FIFO IN, Control, and FIFO OUT connect to an array of FPGAs F0-F31, each paired with a memory M0-M31]
20
FPGA-based Router
  • FPX module contains two FPGAs
  • NID: network interface device
  • Performs data queuing
  • RAD: reprogrammable application device
  • Specialized control sequences

21
Mesh Topology
  • Chips are connected in a nearest-neighbor pattern
  • Simplicity is key
  • Linear array is essentially a 1-dimensional mesh

22
Other Topologies
  • Crossbar topology
  • Devices A-D are routing only
  • Gives predictable performance
  • Potential waste of resources for near-neighbor
    connections

23
Logic Emulation
  • Emulation takes a sizable amount of resources
  • Compilation time can be large due to FPGA compiles

24
Systolic Architectures
  • Goal: general methodology for mapping
    computations into hardware (spatial computing)
    structures
  • Composition
  • Simple compute cells (e.g. add, sub, max, min)
  • Regular interconnect pattern
  • Pipelined communication between cells
  • I/O at boundaries

[Figure: example array of simple compute cells, including a min cell]
25
Finite Impulse Response
  • Sequential
  • Memory bandwidth per output: 2k+1
  • O(k) cycles per output
  • O(1) hardware
  • Systolic
  • Memory bandwidth per output: 2
  • O(1) cycles per output
  • O(k) hardware

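[Figure: systolic FIR array. Input xi feeds four multiply cells with weights w1-w4; adders accumulate the products into output yi]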
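
The sequential versus systolic trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions: the systolic version uses the transposed FIR form, in which the input sample is broadcast to all cells and partial sums move between neighbors, which may differ from the exact array drawn on the slide. Both function names are hypothetical.

```python
def fir_sequential(x, w):
    """One multiply-accumulate unit: O(k) steps per output, O(1) hardware."""
    out = []
    for i in range(len(x)):
        acc = 0
        for j in range(len(w)):  # k steps for every output sample
            if i - j >= 0:
                acc += w[j] * x[i - j]
        out.append(acc)
    return out

def fir_systolic(x, w):
    """k cells, one weight each: one sample in and one result out per step."""
    k = len(w)
    regs = [0] * k  # partial-sum register held in each cell
    out = []
    for sample in x:
        # All cells update in parallel from the previous step's registers.
        new = [0] * k
        for j in range(k):
            upstream = regs[j + 1] if j + 1 < k else 0
            new[j] = w[j] * sample + upstream
        regs = new
        out.append(regs[0])
    return out
```

Each step of `fir_systolic` reads one input and writes one output, which corresponds to the bandwidth of 2 per output listed above, while the `k` registers and multipliers account for the O(k) hardware.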
26
Matrix-Vector Product
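[Figure: linear systolic array for a matrix-vector product; matrix elements enter skewed in time]
t = 4: a41 a32 a23 a14
t = 3: a31 a22 a13
t = 2: a21 a12
t = 1: a11
Vector: x1 x2 x3 x4 ... xn
Outputs: y1 at t = n, y2 at t = n+1, y3 at t = n+2, y4 at t = n+3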
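
The skewed schedule in the figure can be checked with a short simulation. The sketch assumes that cell j holds x_j, that element a_ij reaches cell j at time t = i + j - 1, and that partial sums move one cell to the right per step; these are assumptions read off the figure's timing labels. `systolic_matvec` is a hypothetical name and indices are 0-based in the code.

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array computing y = A x, one time step per loop pass."""
    m, n = len(A), len(x)
    pipe = [0] * n  # partial sum leaving each cell at the end of the previous step
    y = []
    for t in range(1, m + n):  # time steps t = 1 .. m + n - 1
        new = [0] * n
        for j in range(n):
            i = t - 1 - j  # row whose element reaches cell j at time t
            incoming = pipe[j - 1] if j > 0 else 0
            a = A[i][j] if 0 <= i < m else 0
            new[j] = incoming + a * x[j]
        pipe = new
        if t >= n:  # first finished result leaves the last cell at t = n
            y.append(pipe[n - 1])
    return y
```

With this schedule the first result appears at t = n and one more follows on every later step, matching the output timing shown for y1 through y4.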
27
Circuit Netlist and Mapping
28
Placing and Routing
FPGA
Programmable Connections
29
Next Steps
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY implied IS
        PORT ( A, B : IN  STD_LOGIC;
               AeqB : OUT STD_LOGIC );
    END implied;

    ARCHITECTURE Behavior OF implied IS
    BEGIN
        PROCESS ( A, B )
        BEGIN
            IF A = B THEN
                AeqB <= '1';
            END IF;
        END PROCESS;
    END Behavior;
  • VHDL / VHDL for Synthesis
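
The VHDL process on this slide assigns AeqB only when A equals B and has no ELSE branch, so under VHDL semantics the signal keeps its previous value in every other case, which is why synthesis infers memory for it. The Python model below is only a behavioral analogy; the class name `ImpliedLatch` and the use of `None` for an unassigned signal are assumptions made for illustration.

```python
class ImpliedLatch:
    """Behavioral model of a process that assigns its output on only one branch."""

    def __init__(self):
        self.aeqb = None  # not yet assigned (stands in for VHDL 'U')

    def update(self, a, b):
        """Re-evaluate the process after a change on A or B."""
        if a == b:
            self.aeqb = 1
        # No else branch: the previous value is retained (implied memory).
        return self.aeqb
```

Once A and B have matched a single time, the modeled output stays at 1 even after they differ again, which is rarely what a comparator is meant to do.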

30
HW/SW Co-Design
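[Figure: HW/SW co-simulation setup. ARMulator (ARM core simulator, with ARM core, cache, ARM local memory, and AMBA AHB master and slave interfaces) connects through the ARMulator API and two sockets (a communication buffer socket handler on SOCKET 1 and a memory access socket handler on SOCKET 2) to Modelsim (HDL simulator, via the Modelsim FLI), which hosts the ASIC / FPGA design with its communication buffer, AHB master and slave interfaces, and shared memory]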
31
Multi-Context FPGAs
32
Function Unit Architectures
  • RaPiD: Reconfigurable Pipelined Datapath
  • Linear array of function units
  • Function type determined by application
  • Function units are connected together as needed
    using segmented buses
  • Data enters the pipeline via input streams and
    exits via output streams

33
High-Level Compilation
[Figure: high-level compilation flow]
  • C Program → SUIF frontend → HW / SW Partitioner
    (with directives and automation, and C libraries
    on various targets)
  • C to RTL VHDL/Verilog → VHDL to FPGA Synthesis →
    Binaries for FPGAs (Xilinx)
  • C to RTL VHDL/Verilog → VHDL to ASIC Synthesis →
    Chip layouts (0.18u TSMC)
  • SUIF to GCC → GCC compiler for embedded →
    Object code for Embedded (SA)
34
Other Topics?
  • Second course survey next week
  • Provide general feedback, suggest additional
    topics

35
Midterm Exam
  • Three questions
  • Review
  • Analysis
  • Extension
  • Any paper mentioned in class is fair game
  • Due in 48 hours (10/12, 2:00pm)
  • No class on Thursday!
  • Some restrictions
  • Work alone
  • Can ask if something is unclear ("what does this
    mean?" questions, not "how do I do this?"
    questions)
  • No late submissions: strict WebCT deadline