1
A Self-Reconfigurable Gate Array Architecture
  • Reetinder Sidhu

Dept. of EE-Systems, University of Southern California
2
Outline
  • Introduction to Self-Reconfiguration
  • SRGA Architecture
  • Overview
  • Context switch
  • Memory Access
  • Routing using Self-Reconfiguration
  • Implementation Results
  • Conclusion

4
Motivation
  • Use of reconfigurable devices (mainly FPGAs) for
    general purpose computation
  • Embedded applications are a major area
  • Key advantage: reconfigurability
  • Adapt configured logic to suit application
    requirements
  • Reconfigurable devices are not reconfigurable
    enough
  • Reconfiguration time is high
  • Mapping time is high
  • Thus runtime reconfiguration takes a long time
  • Self-Reconfiguration can solve above problems
  • What is Self-Reconfiguration?

5
Configuration Context
  • Configuration context: a set of configuration
    bits that completely configures a device

6
Conventional FPGA
  • Configuration context: a set of configuration
    bits that completely configures a device

7
Self-Reconfigurable Device: Basic Features
  • Configuration memory to store two or more
    contexts
  • Configurable logic should be able to
  • switch contexts,
  • access configuration memory

[Figure: configurable logic with configuration memory holding Context 1 and Context 2]
8
Self-Reconfiguration Example
  • KMP pattern matching algorithm
  • Construct FSM (Finite State Machine) based on
    pattern
  • Use FSM to efficiently search text

[Figure: bit string 0100 0011 1010]
  • Configuration bits are generated at runtime
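
The FSM construction step can be illustrated in software. The following is a minimal Python sketch of the textbook KMP automaton, not the hardware mapping used in the talk. The names `build_kmp_fsm` and `search` are hypothetical, and a binary alphabet is assumed.

```python
def build_kmp_fsm(pattern, alphabet="01"):
    """Return a table fsm[state][symbol] -> next state.

    State k means "the last k symbols read match the first k pattern symbols".
    The pattern must be non-empty and use only symbols from `alphabet`.
    """
    m = len(pattern)
    fsm = [{c: 0 for c in alphabet} for _ in range(m + 1)]
    fsm[0][pattern[0]] = 1
    fallback = 0  # state reached after reading pattern[1:state]
    for state in range(1, m + 1):
        for c in alphabet:
            fsm[state][c] = fsm[fallback][c]        # mismatch: reuse fallback row
        if state < m:
            fsm[state][pattern[state]] = state + 1  # match: advance
            fallback = fsm[fallback][pattern[state]]
    return fsm


def search(text, pattern, alphabet="01"):
    """Return the start index of every (possibly overlapping) match."""
    fsm = build_kmp_fsm(pattern, alphabet)
    state, hits = 0, []
    for i, c in enumerate(text):
        state = fsm[state][c]          # one table lookup per text symbol
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits
```

Here the table is rebuilt whenever the pattern changes. On a self-reconfigurable device the equivalent step would presumably generate configuration bits instead of table entries.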

9
Advantages of Self-Reconfiguration
  • Reconfiguration time is reduced
  • Mapping time is reduced
  • Thus runtime reconfiguration is fast
  • ns-us range
  • Self-reconfiguration used to perform only
    specific, relatively simple mapping
  • Distinction between mapping logic and mapped
    logic
  • Unlike self-modifying code
  • Precise description of self-reconfiguration
    presented

Reference: "Efficient Metacomputation using Self-Reconfiguration", FPGA 99
10
Uses of Self-Reconfiguration
  • Speedup of 4-17 over software for pattern
    matching
  • Speedup of more than 20 over software for Genetic
    Programming
  • Significant area and time improvements for
    regular expression matching
  • Examples: constant coefficient multiplier,
    pattern matching, graph algorithms

References:
"String Matching on Multicontext FPGAs using Self-Reconfiguration", FPGA 99
"Genetic Programming using Self-Reconfigurable FPGAs", FPL 99
"Fast Regular Expression Matching using FPGAs", FCCM 01
11
Constant Coefficient Multiplier
  • One of the multiplier operands is constant
    (changes infrequently)
  • Constant operand can be embedded in multiplier
    logic
  • Reduces multiplier area and latency
  • Additional latency to reconfigure multiplier when
    constant operand changes
  • Time to reconfigure multiplier on existing FPGAs
    is in the ms-s range
  • Time to reconfigure multiplier on a
    self-reconfigurable device will be in the ns-us
    range
  • E.g., twiddle factors in an FFT could be constant
    operands
  • Other constant operand arithmetic operators
  • E.g., CORDIC, addition
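
To see why embedding the constant shrinks the multiplier, consider a generic shift-and-add decomposition. The slides do not describe the multiplier's internal structure, so this Python sketch is only an illustration. The function names are hypothetical.

```python
def shift_terms(constant):
    """Shift amounts for each set bit of a non-negative constant.

    This plays the role of "reconfiguring" the multiplier: it is recomputed
    only when the constant operand changes.
    """
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x, shifts):
    """Multiply x using only the shifts and adds selected by the constant."""
    return sum(x << s for s in shifts)


shifts_for_10 = shift_terms(10)                      # 10 = 0b1010 -> shifts 1 and 3
product = multiply_by_constant(7, shifts_for_10)     # 7*2 + 7*8
```

Only the terms for the set bits of the constant remain, which is the source of the area and latency savings. The cost is that the term list must be regenerated whenever the constant changes.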

12
Self-Reconfigurable Device: Required Features
  • Configured logic should be able to perform
  • Fast, random access of configuration memory
  • Fast context switch

[Figure: bit string 0100 0011 1010]
13
Self-Reconfigurable Device: Required Features
[Figure: unsuitable architecture]
14
Existing Reconfigurable Devices
  • Commercial FPGAs
  • Configuration memory organized as serial shift
    registers
  • Single cycle context switch possible
  • Hundreds of cycles for memory access
  • Berkeley HSRA
  • Few large random access memory blocks
  • Single cycle memory access
  • Hundreds of cycles for context switch
  • No existing device offers single cycle context
    switching and memory access

16
Self-Reconfigurable Gate Array (SRGA) Architecture: Overview
  • Rectangular array of logic blocks
  • Memory Blocks
  • Local nearest neighbor connections
  • Interconnect switches arranged as row and column
    trees
  • Mesh of trees network
  • logic blocks and memory blocks at leaves
  • switches at other nodes
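
The size of this interconnect follows directly from the tree structure. The Python sketch below counts components for an n x n array. It assumes binary row and column trees, which is consistent with the "N-1 switches, N PEs" labels on later slides. The function name is hypothetical.

```python
def mesh_of_trees_counts(n):
    """Component counts for an n x n mesh of trees (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    leaves = n * n              # one logic block + memory block per leaf
    switches_per_tree = n - 1   # internal nodes of a binary tree with n leaves
    trees = 2 * n               # n row trees + n column trees
    return {
        "leaves": leaves,
        "trees": trees,
        "switches": trees * switches_per_tree,
    }


counts_32 = mesh_of_trees_counts(32)   # the array size used in the implementation
```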

17
Self-Reconfigurable Gate Array (SRGA) Architecture: Overview
  • Key feature: mesh of trees interconnect with
    logic blocks and memory blocks at leaves and
    identical switches at other nodes

18
SRGA Architecture: Memory Blocks
  • SRGA composed of identical logic blocks, memory
    blocks and interconnect switches
  • Memory blocks store configuration bits for the
    logic blocks and switches
  • Memory block operations
  • Read out all bits of a configuration context in a
    single clock cycle
  • Read a bit from given address in the first half
    of a clock cycle
  • Write a bit to any address in the second half of
    a clock cycle

20
Context Switch
  • For efficient context switching, the memory block
    must be close to the logic block and switches for
    which it stores configuration bits
  • High speed
  • Low area

  • Memory block close to logic block
  • Ownership relation
  • PE owns switch preceding it in in-order tree
    traversal
  • PE owns two switches, in general

[Figure: row tree with N PEs at the leaves and N-1 switches at the other nodes]
21
Context Switch
  • Memory block close to switches owned by PE
  • Thus context switch possible in one clock cycle
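
The ownership relation can be made concrete with a small traversal. This Python sketch assumes a balanced binary row tree with switches numbered in in-order. The function names are hypothetical. In one tree every PE except the first owns one switch, so the "two switches, in general" presumably counts one from the row tree and one from the column tree.

```python
def in_order(lo, hi):
    """In-order node sequence of a balanced binary tree over PEs lo..hi-1."""
    if hi - lo == 1:
        return [("PE", lo)]
    mid = (lo + hi) // 2
    # The switch joining the two halves gets in-order number mid - 1.
    return in_order(lo, mid) + [("S", mid - 1)] + in_order(mid, hi)


def switch_owners(n):
    """Map each switch number to the PE that follows it in in-order."""
    seq = in_order(0, n)
    owners = {}
    for prev, node in zip(seq, seq[1:]):
        if prev[0] == "S" and node[0] == "PE":
            owners[prev[1]] = node[1]
    return owners
```

Because leaves and switches alternate in the traversal, each switch sits next to exactly one owning PE. That adjacency is what lets the configuration bits be stored close to the switch they configure.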

[Figure: row tree of N PEs (logic block, memory block) and N-1 switches, showing each PE with its owned switches]
22
Context Switch
  • First half of the clock cycle
  • Configuration bits are read out and applied to
    logic blocks and switches
  • Second half of clock cycle
  • Flip-flop content for old context is saved
    (requires memory write)
  • Flip-flop content for new context is restored
  • Address of new context is stored

23
Context Switch
  • All of above is done in less than 10 ns
  • Efficient context switching due to mesh of trees
    interconnect with logic blocks and memory blocks
    at leaves and switches at the other nodes

24
Context Switch: Additional Features
  • Each PE can be separately enabled or disabled
    from switching context
  • Done by setting or resetting the CSMR (Context
    Switch Mask Register) bit for the PE
  • No other multicontext device seems to have this
    single clock cycle partial context switch
    ability
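
A partial context switch can be modeled as a per-PE masked update. This Python sketch is a behavioral illustration only. It assumes that a set CSMR bit enables switching, since the slides do not state the polarity. The function name is hypothetical.

```python
def context_switch(active, new_context, csmr):
    """Return the per-PE active context after a masked switch.

    active[i] is the context PE i currently runs; csmr[i] is its mask bit.
    Assumption: a set bit enables the switch for that PE.
    """
    return [new_context if enabled else old
            for old, enabled in zip(active, csmr)]


# Four PEs running context 0; only PEs 0 and 2 move to context 1.
after = context_switch([0, 0, 0, 0], 1, [1, 0, 1, 0])
```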

26
Random Memory Access
  • A memory access operation transfers data between
    rows of PEs
  • Up to N bits (N PEs per row) can be transferred in
    a single operation, each PE contributing 1 bit
  • Data source
  • Row of logic blocks
  • Row of memory blocks
  • Data destination
  • Row of logic blocks
  • Row of memory blocks
  • Memory accesses can also be performed along
    columns
  • All memory access operations complete in one
    clock cycle

27
Data Addressing
  • Address register selects one bit in each memory
    block
  • Source register and destination register select
    source row and destination row respectively
  • Source row
  • Logic blocks: SR
  • Memory blocks: SR and AR
  • Destination row
  • Logic blocks: DR
  • Memory blocks: DR and AR
  • Above registers can be accessed by configured
    logic

28
Data Transfer
  • First half of clock cycle
  • Data read from logic blocks or memory blocks
  • Second half of clock cycle
  • Data transferred over column trees
  • Data written to logic blocks or memory blocks
  • Data can be written to multiple rows
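
A memory-to-memory transfer between rows can be sketched as follows. This Python model is a behavioral illustration under stated assumptions: the memory is a nested list `memory[row][column]` of per-block bit lists, and one shared address selects the bit in both source and destination blocks. The function and parameter names are hypothetical.

```python
def transfer_rows(memory, src_row, dst_rows, address):
    """Copy one bit per PE from src_row to every row in dst_rows."""
    # First half of the cycle: read bit `address` from each source block.
    bits = [block[address] for block in memory[src_row]]
    # Second half: send each bit along its column and write it into
    # the same column of every destination row.
    for r in dst_rows:
        for c, bit in enumerate(bits):
            memory[r][c][address] = bit
    return bits


# 3 rows x 4 columns of 2-bit memory blocks, all zero initially.
memory = [[[0, 0] for _ in range(4)] for _ in range(3)]
for c, bit in enumerate([1, 0, 1, 1]):
    memory[0][c][1] = bit
moved = transfer_rows(memory, src_row=0, dst_rows=[1, 2], address=1)
```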

29
Random Memory Access
  • Efficient memory access due to
  • Mesh arrangement of memory blocks
  • Use of column (row) trees for data transfer
  • Mesh of trees interconnect with logic blocks and
    memory blocks at leaves and identical switches at
    other nodes

30
Random Memory Access: Additional Features
  • Mask register
  • Any subset of N bits can be selected for transfer

[Figure: SRR, DRR and RMR registers with example bit columns 1 0 1 0 0 0 0 0 and 0 0 0 1 1 0 0 1]
31
Switch Implementation: No Chip Damage
[Figure: switch connections]
  • Unidirectional wires
  • Connections made through multiplexers
  • Each wire always driven by a single output
  • Bidirectional wires not directly controlled by
    configured logic
  • So the chip won't burn no matter how it is configured

33
Self-Reconfiguration Primitives
  • Self-Reconfiguration Primitive: a logic module
    that uses self-reconfiguration to perform a basic
    operation
  • Examples: constant coefficient multiplier, FSM
    construction, connecting logic blocks
  • Use of a self-reconfiguration primitive
  • User logic writes parameters to specific location
  • It switches context to system context
  • Self-Reconfiguration primitive reads parameters
  • It generates required configuration bits
  • It writes bits to appropriate memory locations
  • It restores previous context
  • User logic modified as required
  • Library of such Self-Reconfigurable primitives
    would ease user logic design
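
The sequence of steps for using a primitive can be written out as a toy software model. This Python sketch only mirrors the order of operations listed on the slide. The `Device` class, the context names "user" and "system", and the `generate_bits` callback are all hypothetical.

```python
class Device:
    """Toy model: configuration memory as a dict plus an active-context id."""

    def __init__(self):
        self.config_memory = {}
        self.active_context = "user"


def run_primitive(device, params, generate_bits):
    """Follow the slide's steps for one primitive invocation."""
    device.config_memory["params"] = params      # user logic writes parameters
    previous = device.active_context
    device.active_context = "system"             # switch to system context
    read = device.config_memory["params"]        # primitive reads parameters
    new_bits = generate_bits(read)               # generate configuration bits
    device.config_memory.update(new_bits)        # write bits to memory
    device.active_context = previous             # restore previous context


device = Device()
# Hypothetical primitive: store the 3 low bits of the parameter as config bits.
run_primitive(device, 5, lambda p: {("ctx1", i): (p >> i) & 1 for i in range(3)})
```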

34
Routing using Self-Reconfiguration
  • Problem: connect the output of a logic block to the input
    of another logic block in the same row, using
    only row tree wires
  • Example

[Figure: row tree with N logic blocks and N-1 switches, with source and destination marked]
  • Generate bits to configure switches
  • Write bits to memory blocks

35
Row Routing Algorithm
  • Tree to be configured
  • Algorithm
  • Traverse up from source creating up connections
  • Traverse up from destination creating down
    connections
  • Stop when paths meet
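
The algorithm above can be sketched in software. This Python version assumes a balanced binary row tree whose switches are numbered in in-order (so for 8 leaves the root is switch 3). The labels "up", "down" and "turn" are hypothetical stand-ins for the actual switch configuration bits, which the slides do not specify.

```python
def ancestor(leaf, level):
    """In-order number of the switch `level` steps above the leaves."""
    block = 1 << (level + 1)          # leaves covered by a switch at this level
    return (leaf // block) * block + (1 << level) - 1


def route_row(n, src, dst):
    """Return {switch: setting} connecting leaf src to leaf dst in one row."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    if src == dst or not (0 <= src < n and 0 <= dst < n):
        raise ValueError("src and dst must be distinct leaves in 0..n-1")
    config = {}
    level = 0
    while True:
        up, down = ancestor(src, level), ancestor(dst, level)
        if up == down:                # paths meet: connect the two sides here
            config[up] = "turn"
            return config
        config[up] = "up"             # source side: pass signal toward the root
        config[down] = "down"         # destination side: pass signal toward the leaf
        level += 1


route = route_row(8, 1, 6)
```

The loop runs at most log2(n) times. In hardware all levels can be evaluated combinationally, which fits the claim on the next slide that the bits are computed in one clock cycle with a period that grows logarithmically in the row size.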

[Figure: row tree whose N-1 switches are labeled 3; 1, 5; 0, 2, 4, 6 from the root down, with source and destination leaves marked]
36
Row Routing Algorithm
  • Configuration bits computed in one clock cycle
  • Clock period increases logarithmically with row
    size
  • Simple routing computation using configurable
    logic even more efficient than routing on a
    microprocessor

37
Row Routing Algorithm
  • Each logic module is mapped to 4 logic blocks
  • The whole tree is mapped to (N-1)x4 logic blocks
  • Each node is in the same column that stores
    configuration bits of the switch it represents
  • The computed bits can be written to memory in 3
    clock cycles
  • Conservatively assuming a 100ns clock cycle, row
    routing can be done in 400 ns
  • Orders of magnitude faster than commercial FPGAs
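
The 400 ns figure is consistent with one cycle to compute the bits plus three cycles to write them. That split is an inference from the preceding slide, not something the slides state outright. A short Python check, with hypothetical names:

```python
def row_routing_cost(n, clock_ns=100, compute_cycles=1, write_cycles=3,
                     blocks_per_module=4):
    """Area and time of the row routing primitive for a row of n logic blocks."""
    cycles = compute_cycles + write_cycles
    return {
        "logic_blocks": (n - 1) * blocks_per_module,  # one module per switch
        "cycles": cycles,
        "time_ns": cycles * clock_ns,
    }


cost_32 = row_routing_cost(32)
```

For a 32-wide row this gives 124 logic blocks, which is fewer than the 128 blocks in four rows.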

38
Extensions to Row Routing
  • Column routing can be performed in a similar
    manner
  • Routing between any two logic blocks can be
    performed
  • Column route from (xs, ys) to (xs, yd)
  • Row route from (xs, yd) to (xd, yd)
  • Smarter routing
  • Read configuration memory to check for existing
    connections
  • Invoke multiple row/column routing operations to
    route around existing connections
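
The two-step scheme can be sketched as a decomposition into segments. This Python sketch only splits the route. Each segment would then be handled by a column or row routing operation. The function name and tuple layout are hypothetical.

```python
def two_step_route(src, dst):
    """Split a route into a column segment followed by a row segment.

    Points are (x, y) pairs; returns a list of (kind, start, end) segments.
    """
    (xs, ys), (xd, yd) = src, dst
    segments = []
    if ys != yd:
        segments.append(("column", (xs, ys), (xs, yd)))  # stay in column xs
    if xs != xd:
        segments.append(("row", (xs, yd), (xd, yd)))     # stay in row yd
    return segments


segments = two_step_route((1, 2), (5, 7))
```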

39
Routing using Self-Reconfiguration: Summary
  • Simple routing operations can be performed in a
    few clock cycles
  • Other approaches would require 100-1000 clock
    cycles
  • No data structures required
  • Required logic occupies little area
  • Less than 4 rows of logic blocks
  • More powerful reconfiguration primitives can be
    constructed using simpler ones

41
Implementation
  • Designed the SRGA architecture
  • 8 contexts, 80 bits per context, 16-bit LUTs,
    32x32 array
  • Described design at the gate level
  • About 50,000 lines of Verilog code
  • Used a 0.18um 6 layer process
  • Standard cell library for logic block and switch
  • Synthesized using the Synopsys Design Compiler
  • Full custom design for the memory block
  • Layout using Virtuoso and physical verification
    using DRACULA

42
Timing Estimates
  • Timing estimates: 0.25 um process, fanout
    considered, wire delay not considered,
    inefficient memory block design
  • Context switch: 9.02 ns
  • Memory access: 8.92 ns
  • Min. clock period: 10.04 ns
  • The implementation validates that context switch
    and memory access can be performed in a single
    clock cycle

Operation performed    First half (ns)    Second half (ns)    Total time (ns)
Context switch         4.76               4.26                9.02
Memory read            5.09               3.83                8.92
Memory write           5.78               3.15                8.93
Memory to memory       5.09               3.15                8.24
Min. clock cycle       5.78               4.26                10.04
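
The minimum clock period on this slide equals the slowest first-half time plus the slowest second-half time. The Python check below reproduces the slide's totals from its per-half figures. Reading the minimum cycle as a sum of per-half maxima is an interpretation of the numbers rather than a statement from the slides.

```python
first_half = {"context switch": 4.76, "memory read": 5.09,
              "memory write": 5.78, "memory to memory": 5.09}
second_half = {"context switch": 4.26, "memory read": 3.83,
               "memory write": 3.15, "memory to memory": 3.15}


def totals(first, second):
    """Total time per operation: first-half plus second-half delay."""
    return {op: round(first[op] + second[op], 2) for op in first}


def min_clock_period(first, second):
    """Each half of the cycle must fit the slowest operation in that half."""
    return round(max(first.values()) + max(second.values()), 2)


operation_totals = totals(first_half, second_half)
period = min_clock_period(first_half, second_half)
```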
43
Area Estimates
  • Memory blocks occupy 90% of area
  • Reason: non-standard memory block, constructed
    using
  • Latch with tristate output
  • Two input AND gate
  • With custom design, 20 transistors can be reduced
    to 5-7
  • Memory size can thus be significantly reduced

Component       Area (um²)
Switch          311
Logic block     7741
Memory block    81797
PE              90881
2x2 array       363018
4x4 array       1480095
8x8 array       5925859

Component       Transistors
Latch           20
Gate            4
Memory cell     24
44
Memory Block Problems
  • Memory block operations
  • read bit
  • write bit
  • read context (80 bits)
  • Non-standard memory cell, constructed using
  • Latch with tristate output
  • Two input AND gate
  • Area per bit: 8179 λ²
  • Memory blocks occupy 90% of area

45
Memory Block Improvements
  • Full custom Design
  • Optimized memory cell using TSMC 6T SRAM cell
    (481 λ²)
  • Optimized decoder area using
  • predecoding
  • dynamic logic for the column decoders
  • Optimized output logic using switch based design
    to eliminate output multiplexer

46
Memory Block Improvements
  • Area of improved memory block is 6341 um²
  • Area per bit is 1223 λ² (down from 8179 λ²)
  • Memory blocks (8 contexts) occupy 50% of total
    area
  • Design flow
  • Layout using Virtuoso
  • LVS verification with verilog description using
    DRACULA
  • Simulation using Hspice and Spectre
  • Memory block latency 1.4 ns
  • Demonstrated feasibility of SRGA architecture
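
The per-bit areas can be cross-checked against the block areas. This Python sketch rests on several assumptions that the slides do not state: λ is half the process feature size, a memory block holds 8 contexts x 80 bits, block areas are in um², the original block uses the 0.25 um figures and the improved block the 0.18 um process. The function name is hypothetical.

```python
def area_per_bit_lambda2(block_area_um2, feature_um, contexts=8,
                         bits_per_context=80):
    """Memory block area per stored bit, in units of lambda squared."""
    lam = feature_um / 2                 # assumption: lambda = feature size / 2
    bits = contexts * bits_per_context   # 640 bits per memory block
    return block_area_um2 / bits / lam ** 2


original = area_per_bit_lambda2(81797, 0.25)   # standard-cell memory block
improved = area_per_bit_lambda2(6341, 0.18)    # full custom memory block
```

Under these assumptions both results land within one λ² of the per-bit figures quoted on the slides, roughly a 6.7-fold reduction.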

47
Memory Block Layout
49
General Advantages of the SRGA
  • SRGA has important advantages over other devices
    even when not used for self-reconfiguration
  • Unified data/configuration memory
  • Single cycle random memory access
  • Thus no need for separate data memory
  • Amount of data/configuration can be changed at
    runtime and is not fixed at fabrication time
  • Mesh of trees architecture
  • Efficient layout by cleanly using higher metal
    layers
  • Very important as interconnect occupies a large
    part of FPGA area
  • Sound theoretical basis

50
Conclusion
  • Self-Reconfiguration is an important and useful
    concept
  • Speeds up important, real world applications
  • No existing device suitable for
    self-reconfiguration
  • No existing device can context switch and memory
    access in a single clock cycle
  • SRGA performs Self-Reconfiguration efficiently
  • Single cycle context switch and memory access
  • Logic blocks and memory blocks at leaves and
    switches at other nodes

51
Conclusion
  • Simple routing can be performed using
    Self-Reconfiguration in a few clock cycles
  • Demonstrates simplicity and efficiency of
    Self-Reconfiguration primitives
  • Implementation validates architecture claims
  • Context switch and memory access in about 10 ns
  • Memory area is a problem, but can be handled
    using custom design of memory cell
  • SRGA provides important benefits which are useful
    in general as well