Title: Reconfigurable Computing - FPGA structures
1 Reconfigurable Computing - FPGA structures
- John Morris
- Chung-Ang University
- The University of Auckland
Iolanthe at 13 knots on Cockburn Sound, Western
Australia
2 FPGA Architectures
- Programmable logic takes many forms
- Originally, devices contained tens of gates and flip-flops
- These early devices were generally called PALs (Programmable Array Logic)
- A typical structure is shown in the figure
- With 10-20 inputs and outputs and about 20 flip-flops, they could
  - implement small state machines and
  - replace large amounts of discrete glue logic
[Figure: inputs (approx. 20) feed a programmable AND-OR array, whose terms drive a column of flip-flops (FF) producing the outputs (approx. 20)]
3 Programmable Logic
- Memory should also be included in the class of programmable logic!
- It finds application in LUTs, state machines, ...
- From early UV EPROMs with kbytes, we now have many styles of memory which retain values when power is removed, and capacities in Mbytes
Memory is an important consideration when designing reconfigurable systems. FPGA technology does not provide large amounts of memory and this can be a constraint, especially if you are trying to produce a compact, single-chip solution to your problem!
4 Modern Programmable Logic
- As technology has evolved, so have programmable devices
- Today's FPGAs contain
  - Millions of gates
  - Memory
  - Support for several I/O protocols - TTL, LVDS, GTL, ...
  - Arithmetic units - adders, multipliers
  - Processor cores
5 FPGA Architecture
- The core architecture of most modern FPGAs consists of
  - Logic blocks
  - Interconnection resources
  - I/O blocks
6 Typical FPGA Architecture
- Logic blocks embedded in a sea of connection resources
- Figure legend: CLB = logic block, IOB = I/O buffer, PSM = programmable switch matrix
This particular arrangement is similar to that in Xilinx 4000 (and onwards) chips. Devices from other manufacturers are similar in overall structure.
7 Logic Blocks
- Combination of
  - And-or array or Look-Up-Table (LUT)
  - Flip-flops
  - Multiplexors
- General aim
  - Arbitrary boolean function of several variables
  - Storage
- Designers try to estimate what combination of resources will produce the most efficient application-to-circuit mappings
- Xilinx 4000 (and on) CLB
  - 3 LUT blocks
  - 2 Flip-Flops (Asynch Reset)
  - Multiplexors
  - Clock / Reset Lines
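To make the "arbitrary boolean function of several variables" idea concrete, here is a minimal behavioural sketch of a look-up table. It assumes a 4-input LUT; the entity name lut4 and the INIT generic are hypothetical names chosen for this illustration, not the primitives of any particular vendor. The inputs are simply used as an address into 16 stored bits, so changing the stored bits changes the function.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical 4-input LUT: INIT stands in for the configuration bits
-- that would be loaded into a real device.
ENTITY lut4 IS
  GENERIC ( INIT : std_logic_vector(15 DOWNTO 0) := x"8000" );  -- x"8000" = 4-input AND
  PORT ( addr : IN  std_logic_vector(3 DOWNTO 0);
         f    : OUT std_logic );
END lut4;

ARCHITECTURE behaviour OF lut4 IS
BEGIN
  -- The inputs form an address; the stored bit at that address is the output
  f <= INIT(to_integer(unsigned(addr)));
END behaviour;
```

With the default INIT only bit 15 is set, so f is '1' only when all four inputs are '1'. Setting INIT to x"FFFE" would give a 4-input OR instead.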
8 Adders
- Adders appear in most designs
  - Arithmetic adders (including subtracters)
  - Other arithmetic operators, eg multipliers, dividers
  - Counters (including program counters in processors)
  - Incrementors, decrementors, etc
- They also often appear on the critical path
  - Adder performance can be crucial for system performance
- Because of their importance, researchers are still searching for better ways to add!
- Adder structures proposed already
  - Ripple carry
  - Carry select
  - Carry skip
  - Carry look-ahead
  - Manchester
  - and several dozen more variants
9 Ripple Carry Adder
- The simplest and most well known adder
- How long does it take an n-bit adder to produce a result?
  - n x propagation delay( FA: (a or b) to carry )
- We can do better than this, using one of many known better structures, but
- What are the advantages of a ripple carry adder?
  - Small
  - Regular
  - Fits easily into a 2-D layout!
Very important in packing circuitry into the fixed 2-D layout of an FPGA!
10 Ripple Carry Adders
- Ripple carry adder performance is limited by propagation of carries
- Connections within a logic block are fast!
- Connections between logic blocks are slower
11 Fast Carry Logic
- Critical delay
  - Transmission of carry out from one logic block to the next
- Solution (most modern FPGAs)
  - Fast carry logic
  - Special paths between logic blocks used specifically for carry out
  - Very fast ripple carry adders!
- More sophisticated adders?
  - Carry select
    - Uses ripple carry blocks, so can use fast carry logic
    - Should be faster for wide datapaths?
  - Carry lookahead
    - Uses large amounts of logic and multiple logic blocks
    - Hard to make it faster for small adders!
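12 Carry Select Adder
[Figure: 8-bit carry select adder built from three n-bit ripple carry adders. The low-order block adds a0-3, b0-3 and cin, giving sum0-3 and cout3. Two high-order blocks both add a4-7 and b4-7: one with carry in 0, giving sum^0 4-7 and cout^0 7, and one with carry in 1, giving sum^1 4-7 and cout^1 7. Multiplexors (inputs 0 and 1) produce sum4-7 and the carry.]
- Standard n-bit ripple carry adders, n = any suitable value
- Here we build an 8-bit adder from 4-bit blocks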
13 Carry Select Adder
[Figure: the 8-bit carry select adder built from 4-bit ripple carry blocks, repeated]
- The two high-order blocks do not wait for the carry out of the low-order block: one assumes it will be 0, the other assumes 1
14 Carry Select Adder
- After 4 tpd we will have
  - sum0-3 (final sum bits)
  - cout3 (from low order block)
  - sum^0 4-7
  - cout^0 7 (from block assuming 0 cin)
  - sum^1 4-7
  - cout^1 7 (from block assuming 1 cin)
[Figure: the 8-bit carry select adder built from 4-bit ripple carry blocks, repeated]
15 Carry Select Adder
[Figure: the 8-bit carry select adder built from 4-bit ripple carry blocks, repeated]
- cout3 selects the correct sum4-7 and carry out
- All 8 bits + carry are available after 4 tpd(FA) + tpd(multiplexor)
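The timing figure for the 8-bit example can be set beside the ripple carry delay from the earlier slide. The general expression for block size k is only a sketch: it assumes n is a multiple of k, all blocks are the same size, and each block's multiplexor is controlled by the already-selected carry of the block before it. The symbols t_FA and t_mux are shorthand introduced here for tpd(FA) and tpd(multiplexor).

```latex
\begin{align*}
t_{\text{ripple}}(n)   &= n\,t_{FA} \\
t_{\text{select}}(n,k) &= k\,t_{FA} + \left(\frac{n}{k} - 1\right) t_{mux} \\
t_{\text{ripple}}(8)   &= 8\,t_{FA}, \qquad
t_{\text{select}}(8,4)  = 4\,t_{FA} + t_{mux}
\end{align*}
```

Under these assumptions the carry select adder wins whenever one multiplexor delay is shorter than the four full adder delays it replaces. On an FPGA with fast carry logic, t_FA can be very small, so the advantage is not guaranteed.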
16 Carry Select Adder
- This scheme can be generalized to any number of bits
- Select a suitable block size (eg 4, 8)
- Replicate all blocks except the first
  - One with cin = 0
  - One with cin = 1
- Use final cout from preceding block to select correct set of outputs for current block
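The generalized scheme above could be sketched in VHDL as follows. This is an illustration under stated assumptions, not a tuned implementation: the entity name csel_adder and the generics n and k are hypothetical, n is assumed to be a multiple of k, and each block is written with the numeric_std "+" operator for brevity rather than as an explicit ripple carry adder built from full adders.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY csel_adder IS
  GENERIC ( n : POSITIVE := 16;   -- total width; assumed to be a multiple of k
            k : POSITIVE := 4 );  -- block size
  PORT ( c_in  : IN  std_logic;
         a, b  : IN  std_logic_vector(n-1 DOWNTO 0);
         sum   : OUT std_logic_vector(n-1 DOWNTO 0);
         c_out : OUT std_logic );
END csel_adder;

ARCHITECTURE csel_structure OF csel_adder IS
  CONSTANT n_blocks : POSITIVE := n / k;
  -- c(j) is the carry into block j; c(n_blocks) is the final carry out
  SIGNAL c     : std_logic_vector(n_blocks DOWNTO 0);
  SIGNAL cin_u : unsigned(0 DOWNTO 0);
  SIGNAL s_lo  : unsigned(k DOWNTO 0);   -- first block: k sum bits + carry
BEGIN
  c(0)     <= c_in;
  cin_u(0) <= c_in;
  c_out    <= c(n_blocks);

  -- First block is not replicated: it uses the real carry in
  s_lo <= resize(unsigned(a(k-1 DOWNTO 0)), k+1)
        + resize(unsigned(b(k-1 DOWNTO 0)), k+1) + cin_u;
  sum(k-1 DOWNTO 0) <= std_logic_vector(s_lo(k-1 DOWNTO 0));
  c(1) <= s_lo(k);

  -- Remaining blocks are computed twice, once for each possible carry in
  G_blk : FOR j IN 1 TO n_blocks-1 GENERATE
    SIGNAL x, y, s0, s1 : unsigned(k DOWNTO 0);
  BEGIN
    x  <= resize(unsigned(a((j+1)*k-1 DOWNTO j*k)), k+1);
    y  <= resize(unsigned(b((j+1)*k-1 DOWNTO j*k)), k+1);
    s0 <= x + y;        -- assumes carry in = 0
    s1 <= x + y + 1;    -- assumes carry in = 1
    -- Carry out of the preceding block selects the correct set of outputs
    sum((j+1)*k-1 DOWNTO j*k) <= std_logic_vector(s1(k-1 DOWNTO 0)) WHEN c(j) = '1'
                            ELSE std_logic_vector(s0(k-1 DOWNTO 0));
    c(j+1) <= s1(k) WHEN c(j) = '1' ELSE s0(k);
  END GENERATE;
END csel_structure;
```

A synthesis tool is free to restructure the "+" operators, so whether the result keeps the carry select shape in hardware depends on the tool and its settings.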
17 Fast Adders
- Many other fast adder schemes have been proposed, eg
  - Carry-skip
  - Manchester
  - Carry-save
  - Carry look-ahead
- If implementing an adder (eg in programmable logic), do a little research first!
18 Fast Adders
- Challenge: What style of adder is fastest / most compact for any FPGA technology?
- Answer is not simple
  - For small adders (n < ?), fast carry logic will certainly make a simple ripple carry adder fastest
    - It will also use the minimum resources, but will need to be laid out as a column or row
  - For larger adders (? < n < ?), carry select styles are likely to be best
    - They use ripple carry blocks efficiently
  - For very large adders (n > ?), a carry look-ahead adder may be faster?
    - But it will use considerably more resources!
19 Exploiting a manufacturer's fast carry logic
- To use the Altera fast carry logic, write your adder like this
    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;
    LIBRARY lpm;
    USE lpm.lpm_components.ALL;

    ENTITY adder IS
      PORT ( c_in  : IN  STD_LOGIC;
             a, b  : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
             sum   : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);
             c_out : OUT STD_LOGIC );
    END adder;

    ARCHITECTURE lpm_structure OF adder IS
    BEGIN
      -- Instantiate the vendor's parameterised adder/subtractor
      instance : lpm_add_sub
        GENERIC MAP ( LPM_WIDTH => 16 )
        PORT MAP ( cin => c_in, dataa => a, datab => b,
                   result => sum, cout => c_out );
    END lpm_structure;
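A vendor-neutral alternative is to describe the adder behaviourally and leave the mapping to the synthesis tool. The sketch below uses the "+" operator from ieee.numeric_std; the entity name adder_infer is hypothetical. Synthesis tools commonly map an adder written this way onto the device's dedicated carry chain, but that is an assumption to confirm in the tool's report for a given device.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY adder_infer IS
  GENERIC ( n : POSITIVE := 16 );
  PORT ( c_in  : IN  std_logic;
         a, b  : IN  std_logic_vector(n-1 DOWNTO 0);
         sum   : OUT std_logic_vector(n-1 DOWNTO 0);
         c_out : OUT std_logic );
END adder_infer;

ARCHITECTURE behaviour OF adder_infer IS
  SIGNAL cin_u : unsigned(0 DOWNTO 0);
  SIGNAL total : unsigned(n DOWNTO 0);   -- one extra bit holds the carry out
BEGIN
  cin_u(0) <= c_in;
  -- Extend both operands by one bit so the carry out is not lost
  total <= resize(unsigned(a), n+1) + resize(unsigned(b), n+1) + cin_u;
  sum   <= std_logic_vector(total(n-1 DOWNTO 0));
  c_out <= total(n);
END behaviour;
```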
20 What about that carry in?
- In an ALU, we usually need to do more than just add!
- Subtractions are common also
- Observe
  - c = a - b
- is equivalent to
  - c = a + (-b)
- So we can use an adder for subtractions if we can negate the 2nd operand
- Negation in 2's complement arithmetic?
21 Adder / Subtractor
- Negation in 2's complement arithmetic?
- Rule
  - Complement each bit
  - Add 1
- eg
    Step         Binary   Decimal
    Start        0001       1
    Complement   1110
    Add 1        1111      -1

    Start        0110       6
    Complement   1001
    Add 1        1010      -6
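22 Adder / Subtractor
- Using an adder
  - Complement each bit using an inverter
  - Use the carry in to add 1!
[Figure: a multiplexor (inputs 0 and 1), controlled by add/subtract, passes either b or its complement to the full adder (FA) alongside a. The add/subtract signal also drives cin. Outputs are c and carry.]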
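The adder/subtractor described above might be written as the following sketch. The entity name addsub and the control port name sub are hypothetical; the adder itself is written with the numeric_std "+" operator as a stand-in for the chain of full adders in the figure.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY addsub IS
  GENERIC ( n : POSITIVE := 16 );
  PORT ( sub   : IN  std_logic;   -- '0' = add, '1' = subtract
         a, b  : IN  std_logic_vector(n-1 DOWNTO 0);
         c     : OUT std_logic_vector(n-1 DOWNTO 0);
         carry : OUT std_logic );
END addsub;

ARCHITECTURE behaviour OF addsub IS
  SIGNAL b_sel : std_logic_vector(n-1 DOWNTO 0);
  SIGNAL cin_u : unsigned(0 DOWNTO 0);
  SIGNAL total : unsigned(n DOWNTO 0);
BEGIN
  -- Multiplexor: pass b unchanged to add, or its complement to subtract
  b_sel <= NOT b WHEN sub = '1' ELSE b;
  -- The same control signal drives the carry in, supplying the "add 1"
  cin_u(0) <= sub;
  total <= resize(unsigned(a), n+1) + resize(unsigned(b_sel), n+1) + cin_u;
  c     <= std_logic_vector(total(n-1 DOWNTO 0));
  carry <= total(n);
END behaviour;
```

For example, with n = 4, a = 0110 (6), b = 0001 (1) and sub = '1', the adder computes 0110 + 1110 + 1, so c is 0101 (5).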
23 Example - Generate
    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY adder IS
      GENERIC ( n : INTEGER := 16 );   -- n must be at least 2
      PORT ( c_in  : IN  std_ulogic;
             a, b  : IN  std_ulogic_vector(n-1 DOWNTO 0);
             sum   : OUT std_ulogic_vector(n-1 DOWNTO 0);
             c_out : OUT std_ulogic );
    END adder;

    ARCHITECTURE rc_structure OF adder IS
      SIGNAL c : std_ulogic_vector(1 TO n-1);   -- carries between stages
      COMPONENT fulladd
        PORT ( c_in, x, y : IN  std_ulogic;
               s, c_out   : OUT std_ulogic );
      END COMPONENT;
    BEGIN
      -- First stage takes the external carry in
      FA_0 : fulladd PORT MAP ( c_in => c_in, x => a(0), y => b(0),
                                s => sum(0), c_out => c(1) );
      -- Middle stages are replicated by the generate statement
      G_1 : FOR i IN 1 TO n-2 GENERATE
        FA_i : fulladd PORT MAP ( c(i), a(i), b(i), sum(i), c(i+1) );
      END GENERATE;
      -- Last stage produces the external carry out
      FA_n : fulladd PORT MAP ( c(n-1), a(n-1), b(n-1), sum(n-1), c_out );
    END rc_structure;
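The generate example instantiates a fulladd component without showing it. A minimal sketch of a matching entity, using the same port names as the component declaration, could look like this; the architecture name dataflow is an arbitrary choice.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY fulladd IS
  PORT ( c_in, x, y : IN  std_ulogic;
         s, c_out   : OUT std_ulogic );
END fulladd;

ARCHITECTURE dataflow OF fulladd IS
BEGIN
  -- Sum is the parity of the three inputs
  s     <= x XOR y XOR c_in;
  -- Carry is generated by x AND y, or propagated when exactly one of x, y is set
  c_out <= (x AND y) OR (c_in AND (x XOR y));
END dataflow;
```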
24 IEEE 1164 standard logic package
- Bus pull-up and pull-down resistors can be inserted
- Initialise a bus signal to 'H' or 'L'
- '0' or '1' from any driver will override the weak 'H' or 'L'
    SIGNAL not_ready : std_logic := 'H';
[Figure: the /ready line pulled up to VDD through a 10k resistor]
    IF seek_finished = '1' THEN
      not_ready <= '0';
    END IF;
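The override behaviour can be checked in simulation with a small self-checking sketch. Note that it models the pull-up differently from the slide: instead of an initial value, the resistor is a separate concurrent driver that always supplies 'H', and the device driver releases the line with 'Z' when idle. The entity name pullup_demo and the 10 ns delays are arbitrary choices for this illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY pullup_demo IS
END pullup_demo;

ARCHITECTURE sim OF pullup_demo IS
  SIGNAL not_ready     : std_logic;
  SIGNAL seek_finished : std_logic := '0';
BEGIN
  -- Pull-up resistor modelled as a driver that always supplies a weak 'H'
  not_ready <= 'H';

  -- Device driver: pulls the line low when the seek completes, else releases it
  not_ready <= '0' WHEN seek_finished = '1' ELSE 'Z';

  check : PROCESS
  BEGIN
    WAIT FOR 10 ns;
    -- Only the pull-up is active, so the line resolves to weak high
    ASSERT not_ready = 'H' REPORT "expected weak high" SEVERITY error;
    ASSERT to_X01(not_ready) = '1' REPORT "weak high should read as 1" SEVERITY error;
    seek_finished <= '1';
    WAIT FOR 10 ns;
    -- A strong '0' from the device overrides the weak 'H'
    ASSERT not_ready = '0' REPORT "strong 0 should override H" SEVERITY error;
    WAIT;
  END PROCESS;
END sim;
```

One practical consequence: a comparison such as not_ready = '1' is false while the line sits at 'H', so logic that reads a pulled-up signal usually converts it first, for example with to_X01.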