Title: Reconfigurable Computing
1Reconfigurable Computing
- Dominique LAVENIER
- IRISA / CNRS
- Rennes
- lavenier_at_irisa.fr
2Reconfigurable Computing Idea (1)
- microprocessor: programmable, slow
- ASIC: not programmable, fast
- FPGA
program
architecture
3Reconfigurable Computing Idea (2)
$Y(i) = \sum_k X(i-k)\, W(k)$
Sequence of pre-defined instructions
Assembly of boolean functions
memory
memory
Von Neumann model
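Reading the expression on this slide as a convolution, Y(i) = sum over k of X(i-k) * W(k) (an assumption, since the operators did not survive in the transcript), the Von Neumann side of the comparison can be sketched in Python as a sequence of multiply-accumulate instructions issued one after another. The name `convolve_sequential` and the operation counter are illustrative and do not come from the talk.

```python
def convolve_sequential(x, w):
    """Compute Y(i) = sum_k X(i-k) * W(k), one instruction at a time.

    Returns the output list and the number of multiply-accumulate
    steps executed, to make the sequential cost visible.
    """
    y = []
    mac_count = 0
    for i in range(len(x)):
        acc = 0
        for k in range(len(w)):
            if i - k >= 0:  # samples before X(0) are taken as zero
                acc += x[i - k] * w[k]
                mac_count += 1
        y.append(acc)
    return y, mac_count


y, macs = convolve_sequential([1, 2, 3, 4], [1, 1])
print(y, macs)
```

In the reconfigurable view, the same sum would instead be laid out as an assembly of multipliers and adders that operate concurrently, so the cost is paid in area rather than in instruction count.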
4Talk overview
- FPGA Technology
- Reconfigurable Architectures
- Reconfigurable Processor Arrays
- Perspectives
5FPGA in short
- FPGA: Field Programmable Gate Array
- Introduced by Xilinx in 1985
- Implements a few million logic gates
- Market (chart: vertical axis in millions of dollars, from 500 to 2500)
6FPGA Market Share - Q1 1997
7FPGA Structure
I/O
Logic block
Switching box
Routing network
8CLB (configurable logic block)
REG
RAM
Look-up table
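The look-up table is what makes a logic block configurable: its contents are a truth table, so loading different bits yields a different boolean function. A minimal Python sketch of that idea follows. The helper names `make_lut` and `lut_eval` and the 4-input default are assumptions chosen for illustration, not details taken from the slide.

```python
def make_lut(func, n_inputs=4):
    """Build the configuration bits of a LUT: one output bit per input combination."""
    table = []
    for addr in range(2 ** n_inputs):
        # Bit b of the address is the value of input b.
        bits = [(addr >> b) & 1 for b in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored truth table."""
    addr = sum(bit << b for b, bit in enumerate(inputs))
    return table[addr]


# Two different "configurations" of the same structure.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
maj3 = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), n_inputs=3)

print(len(xor4), lut_eval(xor4, 1, 0, 1, 1), lut_eval(maj3, 1, 1, 0))
```

Evaluation is a single memory read whatever the function is, which is why any boolean function of the inputs costs the same in such a block.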
10Conventional FPGA Tile
11XC4K Interconnect Details
12 Traditional Design Flow
Input: RTL (VHDL, EDIF)
- Tech. Indep. Optimization
- LUT Mapping
- Placement
- Routing
- Bitstream Generation
Output: Config. Data
Duration: a few minutes to a few hours
13FPGA Component Use
- FPGA components are used for
- ASIC substitution
- Rapid prototyping
- VHDL simulation
- Reconfigurable Computing
- . . .
14Reconfigurable Architectures
- Functional Unit
- Co-processor
- Accelerator
- System
15Reconfigurable Functional Unit
- FPGA integrated into the datapath
- Idea
- tailor the operations/instructions to the application
Level of Reconfigurability: Instructions
16Spyder Project
- C. Iseli (Swiss Federal Institute of Technology, Lausanne)
(figure: RFU1, RFU2, RFU3 and two register blocks)
17Why does it not work?
- RFUs are slow
- between 5 and 10 times slower than standard functional units
- No programming tools
- the synthesis of specific operators must be automatic
18Reconfigurable Co-Processor
- Close connection to the CPU
- Integrated on the same die
- Not (yet?) available
Level of Reconfigurability: Functions
19ArMen
(figure: four P-M pairs)
20Reconfigurable Accelerator
ALU
- Communicate through I/O bus
- External board
- Matrix of FPGA components with external RAM
- Commercial boards available
MEM
21PAM boards
- PAM (Programmable Active Memory)
- J. Vuillemin, P. Bertin, D. Roncin (DEC PRL)
- Perle-0 (87), Perle-1 (91), Pamette (95),
Host computer
FPGA
memory
23Reconfigurable System
- System on Chip
- 1 reconfigurable zone connected to several components
- available soon: Virtex PowerPC (Xilinx/IBM)
24Architectures - Applications
- Functional Unit
- Co-processor
- Accelerator
- System
- ? ? ?
- Intensive computation
- cryptography, image processing,
- DNA sequencing,
- Embedded systems
- third-generation mobile phones, ...
25Reconfigurable Processor Arrays
- Principle
- parallelize intensive computation on an array of hundreds (or thousands) of tailored processors
- Performance comes from
- the parallelization
- the customization
26Parallelization
initial code
...
for ( )
for ( )
for ( )
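To make the mapping from a loop nest to a processor array concrete, here is a small Python simulation. It assumes, purely for illustration, that the loop nest is the convolution Y(i) = sum over k of X(i-k) * W(k); the inner loop over k is unrolled in space, with one processing element per weight. The function name `convolve_on_array` is hypothetical.

```python
def convolve_on_array(x, w):
    """Simulate a linear array with one processing element per weight W(k)."""
    taps = len(w)
    regs = [0] * taps  # after the shift in step i, regs[k] holds X(i-k)
    y = []
    for sample in x:  # one loop iteration = one array step
        regs = [sample] + regs[:-1]  # samples shift by one element
        products = [regs[k] * w[k] for k in range(taps)]  # all elements work in the same step
        y.append(sum(products))  # partial results are added together
    return y


x, w = [1, 2, 3, 4], [1, 2, 1]
# Direct evaluation of the double loop, used only as a cross-check.
reference = [sum(x[i - k] * w[k] for k in range(len(w)) if i - k >= 0)
             for i in range(len(x))]
result = convolve_on_array(x, w)
print(result == reference)
```

In this sketch the array needs one step per input sample, whereas the sequential loop nest needs one step per (sample, weight) pair. That is the gain from parallelization, before any customization of the operators.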
27Customization
- data-path width
- dedicated operator
- parallelism
A
C
B
D
28Design of Reconfigurable Processor Arrays
- fast design time thanks to
- regular structure
- specify one processor, then replicate
- local interconnection
- optimize place-and-route step
29Reconfigurable Processor Arrays: Applications
- Image processing
- Signal processing
- Bio-computing
- Cryptography
- Text processing
- ...
Today: mostly integer applications
30Performance examples
- DNA search
- PeRLe-1 board (16 Xilinx 3090 - 1991)
- speed-up 50
- K-means clustering
- Wildforce board (4 Xilinx 4036 - 1997)
- speed-up 100
- PPI algorithm
- Spyder board (1 Xilinx V800 - 2000)
- speed-up 200
Reference for the speed-ups: host of the same technology
31Limitations
host
- host-board data bandwidth
- bottleneck
- programming tools
- automatic parallelization
- partitioning
- hardware generation
- portability !
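The bandwidth bottleneck can be illustrated with a back-of-the-envelope model. The sketch below assumes that data transfer and computation do not overlap. The function name and all numbers are hypothetical and are not measurements from the talk.

```python
def effective_speedup(host_time_s, compute_speedup, bytes_moved, bandwidth_bytes_per_s):
    """Overall speed-up once host-board transfers are counted (no overlap assumed)."""
    transfer_time = bytes_moved / bandwidth_bytes_per_s
    board_time = host_time_s / compute_speedup
    return host_time_s / (board_time + transfer_time)


# Hypothetical case: 100 s on the host, a board 100 times faster,
# 400 MB exchanged over a 100 MB/s I/O bus.
overall = effective_speedup(100.0, 100.0, 400e6, 100e6)
print(overall)
```

With these assumed figures the board computes in 1 s but spends 4 s moving data, so the overall gain drops from 100 to 20.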
32Perspectives
- Technology
- Applications
- Architecture
33Exponential Growth in Density
Figure: density versus year, 1994 to 2006. Axes: LUT logic cells (1000 to 1 000 000) and logic gates (12 K to 12 M).
34Technology
- 1998: Xilinx Virtex XCV300 (0.3M gates)
- 2000: Xilinx Virtex XCV3200 (2M gates)
- 2002: Xilinx Virtex II (10M gates)
- 2005: 30-50M gates, about 400 Nios
- Altera APEX20K1500 (2.4M gates): 30 x 32-bit Nios processors (80K gates each)
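The processor counts on this slide follow from dividing the device capacity by the size of one Nios core. A short Python check is given below. It assumes 80K gates per processor, as stated on the slide, and ignores routing and memory overhead. The helper name `nios_capacity` is illustrative.

```python
NIOS_GATES = 80_000  # gates per 32-bit Nios processor, as stated on the slide


def nios_capacity(device_gates):
    """Number of whole Nios-sized processors that fit in a device of the given gate count."""
    return device_gates // NIOS_GATES


print(nios_capacity(2_400_000))                              # APEX20K1500
print(nios_capacity(30_000_000), nios_capacity(50_000_000))  # 30-50M gate range
```

The first figure matches the 30 processors quoted for the APEX20K1500, and the 400 Nios figure falls inside the range obtained for 30-50M gates.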
35Applications
- until now
- performance has been demonstrated on integer applications with a high degree of parallelism
- from now on
- it becomes reasonable to investigate the implementation of floating point applications
36Floating-point operators
- Estimation based on current research at IRISA
- Component: Xilinx XCV1000 (1 Mgates)
- Pipelined operators
                    Single precision    Double precision
adder area                 3                   5
multiplier area            5                  20
frequency               50 MHz             100 MHz
37Floating point performance
- 1998: 5 FPA at 25 MHz
- 2000: 25 FPA at 50 MHz
- 2002: 125 FPA at 100 MHz
- 2005: 500 FPA at 200 MHz
Vertical axis: Giga Flops (0.1, 1, 10, 100)
FPA: double precision floating-point adder
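The Giga Flops scale of this chart can be recovered from the adder counts and clock frequencies. The Python sketch below assumes each pipelined FPA completes one operation per cycle. Pairing the four figures with the years 1998, 2000, 2002 and 2005 in increasing order is also an assumption. The names `projections` and `peak_gflops` are illustrative.

```python
# (year, number of FPA units, clock frequency in MHz), as listed on the slide
projections = [(1998, 5, 25), (2000, 25, 50), (2002, 125, 100), (2005, 500, 200)]


def peak_gflops(units, freq_mhz):
    """Peak rate, assuming each pipelined adder completes one operation per cycle."""
    return units * freq_mhz * 1e6 / 1e9


peaks = {year: peak_gflops(units, mhz) for year, units, mhz in projections}
for year, gflops in peaks.items():
    print(year, gflops)
```

The results (roughly 0.1, 1, 10 and 100 Giga Flops) line up with the four tick marks of the vertical axis.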
38Architecture
- Today: accelerator board
- restricted bandwidth
- parallelism on 1D array
39Architecture
Fast dual-port memory
40Architecture
An alternative way of using the one-billion-transistor processors of the next decade
41Conclusion
- The technology is available for reconfigurable computing
- 30-50 M gates in 2005
- Application domains are increasing
- floating point
- No programming tools
- model ?
- portability ?