Title: A Self-Reconfigurable Gate Array Architecture
1A Self-Reconfigurable Gate Array Architecture
Dept. of EE-Systems, University of Southern California
2Outline
- Introduction to Self-Reconfiguration
- SRGA Architecture
- Overview
- Context switch
- Memory Access
- Routing using Self-Reconfiguration
- Implementation Results
- Conclusion
4Motivation
- Use of reconfigurable devices (mainly FPGAs) for general purpose computation
- Embedded applications are a major area
- Key advantage: reconfigurability
- Adapt configured logic to suit application requirements
- Reconfigurable devices are not reconfigurable enough
- Reconfiguration time is high
- Mapping time is high
- Thus runtime reconfiguration takes a long time
- Self-Reconfiguration can solve the above problems
- What is Self-Reconfiguration?
5Configuration Context
- Configuration context: a set of configuration bits that completely configures a device
6Conventional FPGA
- Configuration context: a set of configuration bits that completely configures a device
7Self-Reconfigurable Device: Basic Features
- Configuration memory to store two or more contexts
- Configurable logic should be able to
- switch contexts,
- access configuration memory
[Figure labels: Context 1, Context 2, Configurable logic]
8Self-Reconfiguration Example
- KMP pattern matching algorithm
- Construct FSM (Finite State Machine) based on pattern
- Use FSM to efficiently search text
[Figure label: 0100 0011 1010]
- Configuration bits are generated at runtime
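The FSM construction named on this slide can be sketched in software. The following is a minimal Python sketch, assuming a binary alphabet; the names `build_kmp_fsm` and `search` are illustrative and do not come from the talk. On a self-reconfigurable device the transition table would presumably be emitted as configuration bits at runtime rather than as a Python list.

```python
def build_kmp_fsm(pattern, alphabet="01"):
    """Return a transition table: fsm[state][symbol] -> next state."""
    assert pattern, "pattern must be non-empty"
    m = len(pattern)
    fsm = [{c: 0 for c in alphabet} for _ in range(m + 1)]
    fsm[0][pattern[0]] = 1
    fallback = 0  # state to resume from after a mismatch
    for state in range(1, m + 1):
        for c in alphabet:
            fsm[state][c] = fsm[fallback][c]        # mismatch: act like fallback
        if state < m:
            fsm[state][pattern[state]] = state + 1  # match: advance
            fallback = fsm[fallback][pattern[state]]
    return fsm


def search(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    fsm = build_kmp_fsm(pattern)
    state, hits = 0, []
    for i, c in enumerate(text):
        state = fsm[state][c]
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits
```

Each text symbol costs exactly one table lookup, which is the property that makes the FSM attractive to map onto logic.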
9Advantages of Self-Reconfiguration
- Reconfiguration time is reduced
- Mapping time is reduced
- Thus runtime reconfiguration is fast
- ns-us range
- Self-reconfiguration used to perform only specific, relatively simple mapping
- Distinction between mapping logic and mapped logic
- Unlike self-modifying code
- Precise description of self-reconfiguration presented in
"Efficient Metacomputation using Self-Reconfiguration", FPGA 99
10Uses of Self-Reconfiguration
- Speedup of 4-17 over software for pattern matching
- Speedup of more than 20 over software for Genetic Programming
- Significant area and time improvements for regular expression matching
- Examples: constant coefficient multiplier, pattern matching, graph algorithms
"String Matching on Multicontext FPGAs using Self-Reconfiguration", FPGA 99
"Genetic Programming using Self-Reconfigurable FPGAs", FPL 99
"Fast Regular Expression Matching using FPGAs", FCCM 01
11Constant Coefficient Multiplier
- One of the multiplier operands is constant (changes infrequently)
- Constant operand can be embedded in multiplier logic
- Reduces multiplier area and latency
- Additional latency to reconfigure multiplier when constant operand changes
- Time to reconfigure multiplier on existing FPGAs is in the ms-s range
- Time to reconfigure multiplier on a self-reconfigurable device will be in the ns-us range
- E.g., twiddle factors in FFT could be constant operands
- Other constant operand arithmetic operators
- E.g., CORDIC, addition
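To illustrate why embedding the constant shrinks the multiplier, here is a minimal Python sketch of one common decomposition, shift-and-add, in which only the 1 bits of the constant contribute an adder. The talk does not say which multiplier structure is used, so this is an assumption, and the function names are hypothetical.

```python
def shift_add_terms(constant):
    """Bit positions of the 1s in a non-negative constant."""
    assert constant >= 0
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def make_constant_multiplier(constant):
    """Build a multiplier specialised to one constant operand."""
    shifts = shift_add_terms(constant)  # "reconfiguration": done once per constant

    def multiply(x):
        # One shifted copy of x per 1 bit in the constant, then summed.
        return sum(x << s for s in shifts)

    return multiply
```

For example, `make_constant_multiplier(10)` needs only two shifted terms (positions 1 and 3), whereas a general multiplier must provide for every bit position. Changing the constant corresponds to rebuilding `shifts`, which is the step whose latency the slide compares across devices.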
12Self-Reconfigurable Device: Required Features
- Configured logic should be able to perform
- Fast, random access of configuration memory
- Fast context switch
[Figure label: 0100 0011 1010]
13Self-Reconfigurable Device: Required Features
[Figure label: Unsuitable architecture]
14Existing Reconfigurable Devices
- Commercial FPGAs
- Configuration memory organized as serial shift registers
- Single cycle context switch possible
- Hundreds of cycles for memory access
- Berkeley HSRA
- Few large random access memory blocks
- Single cycle memory access
- Hundreds of cycles for context switch
- No existing device offers single cycle context
switching and memory access
16Self-Reconfigurable Gate Array (SRGA) Architecture: Overview
- Rectangular array of logic blocks
- Local nearest neighbor connections
- Interconnect switches arranged as row and column
trees
- Mesh of trees network
- logic blocks and memory blocks at leaves
- switches at other nodes
17Self-Reconfigurable Gate Array (SRGA) Architecture: Overview
- Key feature: mesh of trees interconnect with logic blocks and memory blocks at leaves and identical switches at other nodes
18SRGA Architecture: Memory Blocks
- SRGA composed of identical logic blocks, memory blocks and interconnect switches
- Memory blocks store configuration bits for the logic blocks and switches
- Memory block operations
- Read out all bits of a configuration context in a single clock cycle
- Read a bit from a given address in the first half of a clock cycle
- Write a bit to any address in the second half of a clock cycle
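The three memory block operations can be modelled behaviourally as below. This is a sketch under stated assumptions, not the actual design: the 8 contexts of 80 bits are taken from the implementation slide, while the flat addressing scheme (`context * bits_per_context + offset`) and all names are hypothetical.

```python
class MemoryBlock:
    """Behavioural model of one memory block (timing is not modelled)."""

    def __init__(self, contexts=8, bits_per_context=80):
        self.contexts = contexts
        self.bits_per_context = bits_per_context
        self.bits = [0] * (contexts * bits_per_context)

    def read_context(self, ctx):
        """Read out every bit of one configuration context at once."""
        start = ctx * self.bits_per_context
        return self.bits[start:start + self.bits_per_context]

    def read_bit(self, addr):
        return self.bits[addr]

    def write_bit(self, addr, value):
        self.bits[addr] = value & 1

    def clock_cycle(self, read_addr, write_addr, write_value):
        """One cycle: read in the first half, write in the second half."""
        out = self.read_bit(read_addr)            # first half
        self.write_bit(write_addr, write_value)   # second half
        return out
```

Because the read completes before the write, a single cycle can read a bit and overwrite the same address, which `clock_cycle` reflects by returning the old value.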
20Context Switch
- For efficient context switching, the memory block must be close to the logic block and switches for which it stores configuration bits
- High speed
- Low area
- Memory block close to logic block
- Ownership relation
- PE owns switch preceding it in in-order tree traversal
- PE owns two switches, in general
[Figure labels: N PEs, N-1 switches]
21Context Switch
- Memory block close to switches owned by PE
- Thus context switch possible in one clock cycle
[Figure labels: PE (logic block, memory block); PE and owned switches; N PEs; N-1 switches]
22Context Switch
- First half of the clock cycle
- Configuration bits are read out and applied to
logic blocks and switches
- Second half of clock cycle
- Flip-flop content for old context is saved (requires memory write)
- Flip-flop content for new context is restored
- Address of new context is stored
23Context Switch
- All of the above is done in less than 10 ns
- Efficient context switching due to mesh of trees
interconnect with logic blocks and memory blocks
at leaves and switches at the other nodes
24Context Switch: Additional Features
- Each PE can be separately enabled or disabled from switching context
- Done by setting or resetting the CSMR (Context Switch Mask Register) bit for the PE
- No other multicontext device seems to have this
single clock cycle partial context switch
ability
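The partial context switch can be sketched as a small behavioural model. This is only an illustration: the slides do not state the polarity of the CSMR bit, so the sketch assumes 1 means the PE takes part in the switch, and the `PE` class and `context_switch` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    csmr: int = 1        # assumption: 1 = PE takes part in context switches
    context: int = 0     # address of the active context
    flip_flop: int = 0   # current flip-flop content
    saved_ff: dict = field(default_factory=dict)  # per-context saved content


def context_switch(pes, new_context):
    """Switch every enabled PE to new_context; masked PEs are left alone."""
    for pe in pes:
        if not pe.csmr:
            continue
        pe.saved_ff[pe.context] = pe.flip_flop          # save old flip-flop content
        pe.flip_flop = pe.saved_ff.get(new_context, 0)  # restore content for new context
        pe.context = new_context                        # store address of new context
```

In hardware all PEs would switch in parallel within one clock cycle; the loop here is only a sequential stand-in for that behaviour.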
26Random Memory Access
- A memory access operation transfers data between rows of PEs
- Up to N bits (N PEs per row) can be transferred in a single operation, each PE contributing 1 bit
- Data source
- Row of logic blocks
- Row of memory blocks
- Data destination
- Row of logic blocks
- Row of memory blocks
- Memory accesses can also be performed along columns
- All memory access operations complete in one clock cycle
27Data Addressing
- Address register (AR) selects one bit in each memory block
- Source register (SR) and destination register (DR) select source row and destination row respectively
- Source row
- Logic blocks: SR
- Memory blocks: SR and AR
- Destination row
- Logic blocks: DR
- Memory blocks: DR and AR
- Above registers can be accessed by configured logic
28Data Transfer
- First half of clock cycle
- Data read from logic blocks or memory blocks
- Second half of clock cycle
- Data transferred over column trees
- Data written to logic blocks or memory blocks
- Data can be written to multiple rows
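The two-phase row transfer can be sketched as follows. This is a behavioural model under assumptions: the array is a plain dictionary, each logic block is reduced to one bit, the same address is used for source and destination memory blocks, and the names `make_array` and `memory_access` are hypothetical.

```python
def make_array(n, mem_bits=8):
    """n x n array: one logic bit and one small memory block per PE."""
    return {
        "logic": [[0] * n for _ in range(n)],
        "memory": [[[0] * mem_bits for _ in range(n)] for _ in range(n)],
    }


def memory_access(array, sr, dr_rows, src_kind, dst_kind, ar=0):
    """Move one bit per column from source row sr to every row in dr_rows."""
    # First half of the cycle: read one bit from each PE in the source row.
    if src_kind == "logic":
        data = list(array["logic"][sr])
    else:
        data = [block[ar] for block in array["memory"][sr]]
    # Second half: each bit travels along its own column and is written.
    for dr in dr_rows:
        for col, bit in enumerate(data):
            if dst_kind == "logic":
                array["logic"][dr][col] = bit
            else:
                array["memory"][dr][col][ar] = bit
    return data
```

Passing several rows in `dr_rows` models the slide's point that data can be written to multiple rows in the same operation.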
29Random Memory Access
- Efficient memory access due to
- Mesh arrangement of memory blocks
- Use of column (row) trees for data transfer
- Mesh of trees interconnect with logic blocks and
memory blocks at leaves and identical switches at
other nodes
30Random Memory Access: Additional Features
- Mask register
- Any subset of N bits can be selected for transfer
[Figure: registers SRR, DRR and RMR with example bit patterns 10100000 and 00011001]
31No Chip Damage
[Figure labels: Switch implementation; Switch connections]
- Unidirectional wires
- Connections made through multiplexers
- Each wire always driven by a single output
- Bidirectional wires not directly controlled by configured logic
- So the chip won't burn no matter how it is configured
33Self-Reconfiguration Primitives
- Self-Reconfiguration Primitive: a logic module that uses self-reconfiguration to perform a basic operation
- Examples: constant coefficient multiplier, FSM construction, connecting logic blocks
- Use of a self-reconfiguration primitive
- User logic writes parameters to specific location
- It switches context to system context
- Self-Reconfiguration primitive reads parameters
- It generates required configuration bits
- It writes bits to appropriate memory locations
- It restores previous context
- User logic modified as required
- Library of such Self-Reconfigurable primitives
would ease user logic design
34Routing using Self-Reconfiguration
- Problem: connect the output of a logic block to the input of another logic block in the same row, using only row tree wires
- Example
[Figure labels: N logic blocks, N-1 switches, Source, Destination]
- Generate bits to configure switches
- Write bits to memory blocks
35Row Routing Algorithm
- Tree to be configured
[Figure: row tree with N-1 switches labeled 3, 1, 5, 0, 2, 4, 6; Source and Destination logic blocks marked]
- Algorithm
- Traverse up from source creating up connections
- Traverse up from destination creating down connections
- Stop when paths meet
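The three steps of the row routing algorithm can be sketched in software. This minimal Python sketch assumes a complete binary row tree over N logic blocks (N a power of two) and numbers the switches in in-order, which matches the slide's labels if they are read level by level (3 at the root, then 1 and 5, then 0, 2, 4, 6). The setting names `"up"`, `"down"` and `"turn"` are hypothetical stand-ins for the real configuration bits, which the talk does not spell out.

```python
def route_row(n, src, dst):
    """Return {switch_label: (direction, side)} linking leaf src to leaf dst.

    Leaves (logic blocks) are numbered 0..n-1 from left to right.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    assert 0 <= src < n and 0 <= dst < n and src != dst
    height = n.bit_length() - 1  # number of switch levels

    def label(node):
        # Convert a heap index (root = 1) into an in-order switch number.
        depth = node.bit_length() - 1
        pos = node - (1 << depth)
        return (2 * pos + 1) * (1 << (height - 1 - depth)) - 1

    def side(node):
        return "left" if node % 2 == 0 else "right"

    config = {}
    s, d = n + src, n + dst          # heap indices of the two leaves
    while s // 2 != d // 2:          # climb both paths one level at a time
        config[label(s // 2)] = ("up", side(s))
        config[label(d // 2)] = ("down", side(d))
        s, d = s // 2, d // 2
    # The paths meet: this switch turns the signal from one child to the other.
    config[label(s // 2)] = ("turn", side(s) + "->" + side(d))
    return config
```

For N = 8, routing from logic block 1 to logic block 6 touches switches 0 and 1 on the way up, switch 3 at the meeting point, and switches 5 and 6 on the way down. The loop runs at most log2(N) times, and no queue or visited set is needed.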
36Row Routing Algorithm
- Configuration bits computed in one clock cycle
- Clock period increases logarithmically with row size
- Simple routing computation using configurable logic is even more efficient than routing on a microprocessor
37Row Routing Algorithm
- Each logic module is mapped to 4 logic blocks
- The whole tree is mapped to (N-1)x4 logic blocks
- Each node is in the same column that stores configuration bits of the switch it represents
- The computed bits can be written to memory in 3 clock cycles
- Conservatively assuming a 100 ns clock cycle, row routing can be done in 400 ns (1 cycle to compute the bits plus 3 cycles to write them)
- Orders of magnitude faster than commercial FPGAs
38Extensions to Row Routing
- Column routing can be performed in a similar manner
- Routing between any two logic blocks can be performed
- Column route from (xs, ys) to (xs, yd)
- Row route from (xs, yd) to (xd, yd)
- Smarter routing
- Read configuration memory to check for existing connections
- Invoke multiple row/column routing operations to route around existing connections
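The column-then-row decomposition can be written down directly. This is a small sketch with hypothetical names; it only splits the route into segments and does not model switch configuration or the check for existing connections.

```python
def two_step_route(source, destination):
    """Split a route into a column segment followed by a row segment."""
    (xs, ys), (xd, yd) = source, destination
    segments = []
    if ys != yd:
        # Column route: stay in column xs, move from row ys to row yd.
        segments.append(("column", (xs, ys), (xs, yd)))
    if xs != xd:
        # Row route: stay in row yd, move from column xs to column xd.
        segments.append(("row", (xs, yd), (xd, yd)))
    return segments
```

Each segment would then be handed to a row or column routing primitive, so a route between arbitrary logic blocks costs at most two such operations.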
39Routing using Self-Reconfiguration: Summary
- Simple routing operations can be performed in a few clock cycles
- Other approaches would require 100-1000 clock cycles
- No data structures required
- Required logic occupies little area
- Less than 4 rows of logic blocks
- More powerful reconfiguration primitives can be
constructed using simpler ones
41Implementation
- Designed the SRGA architecture
- 8 contexts, 80 bits per context, 16-bit LUTs, 32x32 array
- Described design at the gate level
- About 50,000 lines of Verilog code
- Used a 0.18um 6 layer process
- Standard cell library for logic block and switch
- Synthesized using the Synopsys Design Compiler
- Full custom design for the memory block
- Layout using Virtuoso and physical verification
using DRACULA
42Timing Estimates
- Timing estimates: 0.25 um process, fanout considered, wire delay not considered, inefficient memory block design
- Context switch: 9.02 ns
- Memory access: 8.92 ns
- Min. clock period: 10.04 ns
- The implementation validates that context switch and memory access can be performed in a single clock cycle
Operation performed    First half (ns)    Second half (ns)    Total time (ns)
Context switch         4.76               4.26                9.02
Memory read            5.09               3.83                8.92
Memory write           5.78               3.15                8.93
Memory to memory       5.09               3.15                8.24
Min. clock cycle       5.78               4.26                10.04
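The minimum clock cycle in the timing figures appears to be the slowest first half plus the slowest second half across all operations. The short Python check below reproduces the totals from the per-half figures on this slide; the variable names are illustrative.

```python
# (first half, second half) in ns, as listed on the slide
timings = {
    "context switch": (4.76, 4.26),
    "memory read": (5.09, 3.83),
    "memory write": (5.78, 3.15),
    "memory to memory": (5.09, 3.15),
}

# Total time per operation is the sum of its two halves.
totals = {op: round(first + second, 2) for op, (first, second) in timings.items()}

# Every operation shares one clock, so each half must fit the slowest case.
slowest_first = max(first for first, _ in timings.values())
slowest_second = max(second for _, second in timings.values())
min_clock = round(slowest_first + slowest_second, 2)
```

The slowest first half comes from the memory write and the slowest second half from the context switch, which is why the minimum clock cycle exceeds the total time of any single operation.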
43Area Estimates
- Memory blocks occupy 90% of area
- Reason: non-standard memory block, constructed using
- Latch with tristate output
- Two input AND gate
- With custom design, 20 transistors can be reduced to 5-7
- Memory size can thus be significantly reduced
Component       Area (um²)
Switch          311
Logic block     7741
Memory block    81797
PE              90881
2x2 array       363018
4x4 array       1480095
8x8 array       5925859
Component       Transistors
Latch           20
Gate            4
Memory cell     24
44Memory Block Problems
- Memory block operations
- read bit
- write bit
- read context (80 bits)
- Non-standard memory cell, constructed using
- Latch with tristate output
- Two input AND gate
- Area per bit 8179 λ²
- Memory blocks occupy 90% of area
45Memory Block Improvements
- Full custom design
- Optimized memory cell using TSMC 6T SRAM cell (481 λ²)
- Optimized decoder area using
- predecoding
- dynamic logic for the column decoders
- Optimized output logic using switch based design to eliminate output multiplexer
46Memory Block Improvements
- Area of improved memory block is 6341 um²
- Area per bit is 1223 λ² (down from 8179 λ²)
- Memory blocks (8 contexts) occupy 50% of total area
- Design flow
- Layout using Virtuoso
- LVS verification with Verilog description using DRACULA
- Simulation using Hspice and Spectre
- Memory block latency 1.4 ns
- Demonstrated feasibility of SRGA architecture
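A quick consistency check relates the memory block areas to the per-bit figures. The sketch below makes several assumptions not stated in the slides: λ is taken as half the drawn feature size (a common scalable design rule convention), each block holds 8 x 80 = 640 bits, the original block (81797 um²) is in the 0.25 um process, and the improved block's area of 6341 is read as um² in the 0.18 um process. The function name is hypothetical.

```python
def area_per_bit_lambda2(block_area_um2, feature_um, contexts=8, bits_per_context=80):
    """Memory block area per stored bit, expressed in units of lambda squared."""
    lam = feature_um / 2              # assumption: lambda = half the feature size
    bits = contexts * bits_per_context
    return block_area_um2 / bits / (lam * lam)


original = area_per_bit_lambda2(81797, 0.25)  # standard-cell memory block
improved = area_per_bit_lambda2(6341, 0.18)   # full custom memory block
```

Under these assumptions both results land within one λ² of the per-bit figures quoted on the slides (8179 and 1223), which supports reading the garbled units as um² for the block and λ² per bit.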
47Memory Block Layout
49General Advantages of the SRGA
- SRGA has important advantages over other devices even when not used for self-reconfiguration
- Unified data/configuration memory
- Single cycle random memory access
- Thus no need for separate data memory
- Amount of data/configuration can be changed at runtime and is not fixed at fabrication time
- Mesh of trees architecture
- Efficient layout by cleanly using higher metal layers
- Very important as interconnect occupies a large part of FPGA area
- Sound theoretical basis
50Conclusion
- Self-Reconfiguration is an important and useful concept
- Speeds up important, real world applications
- No existing device suitable for self-reconfiguration
- No existing device can perform both context switch and memory access in a single clock cycle
- SRGA performs Self-Reconfiguration efficiently
- Single cycle context switch and memory access
- Logic blocks and memory blocks at leaves and switches at other nodes
51Conclusion
- Simple routing can be performed using Self-Reconfiguration in a few clock cycles
- Demonstrates simplicity and efficiency of Self-Reconfiguration primitives
- Implementation validates architecture claims
- Context switch and memory access in about 10 ns
- Memory area is a problem, but can be handled using custom design of memory cell
- SRGA provides important benefits which are useful in general as well