Title: Lesson 1 (Part 2)
1Lesson 1 (Part 2)
2Two competing implementation approaches
FPGA: Field Programmable Gate Array
- bought off the shelf and reconfigured by designers themselves
- no physical layout design; design ends with a bitstream used to configure a device
ASIC: Application Specific Integrated Circuit
- designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
- designed all the way from behavioral description to physical layout
3Which Way to Go?
ASICs
- High performance
- Low power
- Low cost in high volumes
FPGAs
- Off-the-shelf
- Low development cost
- Short time to market
- Reconfigurability
4Major FPGA Vendors
- SRAM-based FPGAs
- Xilinx, Inc.
- Altera Corp.
- Atmel
- Lattice Semiconductor
- Flash and antifuse FPGAs
- Actel Corp.
- QuickLogic Corp.
5Other FPGA Advantages
- The manufacturing cycle for an ASIC is very costly, lengthy, and engages lots of manpower
- Mistakes not detected at design time have a large impact on development time and cost
- FPGAs are perfect for rapid prototyping of digital circuits
- Easy upgrades, as in the case of software
- Unique applications
- reconfigurable computing
6- We analyze here the basic structures of FPGAs, known as fabrics.
- There are different ways to build an FPGA.
- The two major styles of FPGAs are SRAM-based and antifuse-based FPGAs.
- The features of I/O pins are fairly similar among these two types of FPGAs.
7Characteristics of FPGA programming Technologies
81-bit Static RAM
9Elements of an FPGA fabric
- Logic Element (LE) or CLB
- Interconnect.
- I/O pins.
[Figure: I/O blocks (IOB) and an array of logic elements (LE) joined by interconnect]
10Terminology
- Configuration: bits that determine logic function and interconnect.
- CLB: often expanded as combinational logic block, but the correct meaning is Configurable Logic Block; equivalent to a logic element (LE).
- LUT: lookup table, an SRAM used to hold a truth table.
- I/O block (IOB): I/O pin plus associated logic and electronics.
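The terminology above says that configuration bits determine the logic function and that a LUT is an SRAM holding a truth table. A minimal Python sketch of that idea follows. The helper name `make_lut` and the bit ordering (first input is the most significant address bit) are illustrative assumptions, not a vendor convention.

```python
def make_lut(config_bits):
    """Return a function that looks its output up in the configuration bits.

    config_bits[i] is the output for the input combination whose binary
    encoding is i (assumed ordering: first input = most significant bit).
    """
    n_inputs = (len(config_bits) - 1).bit_length()
    if len(config_bits) != 1 << n_inputs:
        raise ValueError("number of configuration bits must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)   # inputs form the SRAM address
        return config_bits[address]

    return lut


# The same 4-input "hardware" becomes a different gate with different bits.
and4 = make_lut([0] * 15 + [1])                                # 1 only for 1111
xor4 = make_lut([bin(i).count("1") & 1 for i in range(16)])    # odd parity
```

Only the 16 stored bits differ between `and4` and `xor4`, which is the sense in which configuration, not wiring, selects the function.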
11Fine-,Medium-,and Coarse-grained Architectures
- It is common to categorize FPGAs by the size and complexity of their internal logic elements.
- In a fine-grained architecture, each logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as
- a primitive logic gate (AND, OR, NAND, etc.),
- a storage element (DFF, D-latch, etc.)
- Today fine-grained architectures are being replaced by medium- and coarse-grained ones, where each logic block contains a relatively large amount of logic.
12- As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.
- Today's FPGAs are devices that have
- Embedded RAMs
- Embedded multipliers, adders, and MACs (multiply-accumulators)
- Embedded hard processor cores
- Embedded clock trees and clock managers
13MUX- versus LUT-based Logic Blocks
- There are two fundamental incarnations of the programmable logic blocks used to form medium-grained architectures: MUX-based and LUT-based.
14MUX-based
MUX-based architectures have an advantage when it comes to implementing control logic along the lines of if ... else.
QuickLogic supports MUX-based architectures (www.quicklogic.com)
15LUT-based
LUT-based architectures are the leaders in anything to do with arithmetic processing.
16LUT versus distributed RAM versus SR
- The fact that the core of a LUT in an SRAM-based device comprises a number of RAM cells offers a number of interesting possibilities
- Configuration as lookup table
- Configuration as small RAM block
- This is referred to as distributed RAM because (a) the LUTs are strewn (distributed) across the surface of the chip, and (b) this differentiates it from larger chunks of block RAM.
- Each LUT may be considered to be multifaceted.
17A multifaceted LUT.
18CLBs (Xilinx) and LABs (Altera)
- The core building block in a modern FPGA from Xilinx is called a logic cell (LC).
- An LC comprises
- a 4-input LUT, which can also act as a 16 x 1 RAM or a 16-bit shift register,
- a multiplexer,
- and a register.
- The equivalent core building block in an FPGA from Altera is called a logic element (LE).
19A simplified view of a Xilinx LC
20Altera's Logic Element
Each Logic Element (LE) contains the following:
- A 16-bit SRAM lookup table (LUT); this can implement an arbitrary 4-input logic function (as a truth table).
- Circuitry that forms a fast carry chain and a fast cascade chain (see later).
- A D-register that can be bypassed.
- Various preset/reset logic for the register.
21The next step up the hierarchy is what Xilinx
calls a slice
22Moving one more level up the hierarchy, we come
to what Xilinx calls a configurable logic block
(CLB) and what Altera refers to as a logic array
block (LAB).
A CLB containing four slices (the number of slices depends on the FPGA family).
23Programmable wiring
- Organized into channels.
- Many wires per channel.
- Connections between wires made at programmable interconnection points.
- Must choose
- Channels from source to destination.
- Wires within the channels.
24Programmable interconnection point
Logic elements must be interconnected to implement complex machines. An SRAM-based FPGA uses SRAM to hold the information used to program the interconnect.
When the transistor's gate is high, the transistor conducts and connects the two wires.
[Figure: An interconnection point controlled by an SRAM cell (D, Q) and a MOS transistor]
25Programmable wiring paths
26Choosing a path
27Routing problems
- Global routing
- Which combination of channels?
- Local routing
- Which wire in each channel?
- Routing metrics
- Net length.
- Delay.
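The routing questions above (which channels, which wires, and how long the resulting net is) can be illustrated with a toy path search. The sketch below is a plain breadth-first search on a grid, in the spirit of classic maze routing. It is not what any vendor router does, and the grid model, the function name `route`, and the `blocked` set are all assumptions made for illustration.

```python
from collections import deque


def route(grid_w, grid_h, source, dest, blocked=frozenset()):
    """Breadth-first search for a shortest path between two grid points.

    Points are (x, y) tuples. `blocked` holds points whose wiring is
    already in use. Returns the list of points on the path, or None.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        point = queue.popleft()
        if point == dest:
            path = []
            while point is not None:       # walk back to the source
                path.append(point)
                point = parent[point]
            return path[::-1]
        x, y = point
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < grid_w and 0 <= ny < grid_h
                    and nxt not in blocked and nxt not in parent):
                parent[nxt] = point
                queue.append(nxt)
    return None


# Two occupied points force a detour around the direct route.
path = route(4, 3, source=(0, 0), dest=(3, 0), blocked={(1, 0), (1, 1)})
net_length = len(path) - 1   # number of segments, a simple net-length metric
```

In this toy model the unobstructed net would use 3 segments; the detour raises it to 7, which shows why net length and delay are the metrics a router tries to minimize.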
28I/O
- Fundamental selection: input, output, three-state?
- Additional features
- Register.
- Voltage levels.
- Slew rate.
29Configuration
- Must set control bits for
- LE.
- Interconnect.
- I/O blocks.
- Usually configured off-line.
- Separate burn-in step (antifuse).
- At power-up (SRAM).
30Configuration vs. programming
- FPGA configuration
- Bits stay at the device they program.
- A configuration bit controls a switch or a logic
bit.
- CPU programming
- Instructions are fetched from a memory.
- Instructions select complex operations.
[Figure: a CPU fetching the instruction add r1, r2 from memory into its instruction register (IR)]
31Xilinx
- Primary products: FPGAs and the associated CAD software
- Main headquarters in San Jose, CA
- Fabless semiconductor and software company
- UMC (Taiwan): Xilinx acquired an equity stake in UMC in 1996
- Seiko Epson (Japan)
- TSMC (Taiwan)
ISE Alliance and Foundation Series Design Software
32Xilinx FPGA Families
- Old families
- XC3000, XC4000, XC5200
- Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
- High-performance families
- Virtex (0.22µm)
- Virtex-E, Virtex-EM (0.18µm)
- Virtex-II, Virtex-II PRO (0.13µm)
- Low Cost Family
- Spartan/XL derived from XC4000
- Spartan-II derived from Virtex
- Spartan-IIE derived from Virtex-E
- Spartan-3
33(Adapted from EE449, George Mason University)
34Basic Spartan-II FPGA Block Diagram
35CLB Structure
- Each slice has 2 LUT-FF pairs with associated carry logic
- Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
36CLB Slice Structure
- Each slice contains two sets of the following
- Four-input LUT
- Any 4-input logic function,
- or 16-bit x 1 sync RAM
- or 16-bit shift register
- Carry Control
- Fast arithmetic logic
- Multiplier logic
- Multiplexer logic
- Storage element
- Latch or flip-flop
- Set and reset
- True or inverted inputs
- Sync. or async. control
37Distributed RAM
- CLB LUT configurable as Distributed RAM
- A LUT equals 16x1 RAM
- Implements Single and Dual-Ports
- Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/Asynchronous read
- Accompanying flip-flops used for synchronous read
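The distributed RAM behavior listed above (16 x 1 per LUT, synchronous write, asynchronous read, cascading for larger memories) can be sketched as a small behavioral model. This is a Python illustration only, not vendor code. The class and method names are hypothetical, and the 32 x 1 cascade simply uses the fifth address bit to pick one of two LUTs.

```python
class DistributedRam16x1:
    """Behavioral model of one LUT used as a 16 x 1 RAM."""

    def __init__(self):
        self.cells = [0] * 16              # the LUT's 16 storage cells

    def read(self, address):
        """Asynchronous read: the output follows the address immediately."""
        return self.cells[address & 0xF]

    def clock(self, address, data_in, write_enable):
        """Synchronous write: store data_in on a clock edge if enabled."""
        if write_enable:
            self.cells[address & 0xF] = data_in & 1


class DistributedRam32x1:
    """Two cascaded LUTs; address bit 4 selects which LUT is used."""

    def __init__(self):
        self.banks = [DistributedRam16x1(), DistributedRam16x1()]

    def read(self, address):
        return self.banks[(address >> 4) & 1].read(address)

    def clock(self, address, data_in, write_enable):
        self.banks[(address >> 4) & 1].clock(address, data_in, write_enable)
```

A synchronous read would add a register on the output of `read`, which is the role of the accompanying flip-flops mentioned above.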
38Shift Register
- Each LUT can be configured as shift register
- Serial in, serial out
- Dynamically addressable delay up to 16 cycles
- For programmable pipeline
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
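A LUT used as a shift register with a dynamically addressable delay can be modeled in a few lines. The sketch below is a behavioral Python illustration. The class name and the tap convention (tap 0 means a delay of one cycle, tap 15 a delay of 16 cycles) are assumptions chosen to match the "up to 16 cycles" figure above.

```python
class ShiftRegisterLut:
    """Behavioral model of a LUT configured as a 16-bit shift register."""

    def __init__(self):
        self.bits = [0] * 16   # bits[0] is the most recently shifted-in bit

    def clock(self, serial_in):
        """Shift every stored bit one position and take in a new bit."""
        self.bits = [serial_in & 1] + self.bits[:-1]

    def read(self, tap):
        """Return the bit that was shifted in (tap + 1) clock cycles ago."""
        return self.bits[tap & 0xF]


srl = ShiftRegisterLut()
for bit in (1, 0, 0, 0, 0):    # a single 1 followed by four 0s
    srl.clock(bit)
delayed = srl.read(4)          # the bit shifted in 5 cycles ago
```

Changing `tap` at run time changes the pipeline depth without reconfiguring the device, which is what makes the delay "dynamically addressable".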
39Shift Register
- Register-rich FPGA
- Allows for addition of pipeline stages to increase throughput
- Data paths must be balanced to keep desired functionality
40Carry Control Logic
[Figure: a slice with two look-up tables (inputs G4 to G1 and F4 to F1), each followed by carry control logic and a flip-flop (D, Q, CK, EC, R); slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN, and outputs X, XB, Y, YB]
41Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of sum and carry signals
- Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from LSB to MSB]
42Accessing Carry Logic
- All major synthesis tools can infer carry logic for arithmetic functions
- Addition (SUM <= A + B)
- Subtraction (DIFF <= A - B)
- Comparators (if A < B then)
- Counters (count <= count + 1)
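To see what the inferred carry logic computes, the sketch below adds two numbers one bit at a time, passing a carry from LSB to MSB. It is a Python behavioral model loosely patterned on a LUT-plus-carry-multiplexer structure. The exact split between LUT and dedicated carry logic is an assumption for illustration, and `ripple_carry_add` is a hypothetical helper name.

```python
def ripple_carry_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers bit by bit, as a carry chain does.

    Returns (sum, carry_out), with sum truncated to `width` bits.
    """
    carry = carry_in & 1
    total = 0
    for i in range(width):                      # LSB first, toward the MSB
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        propagate = a_bit ^ b_bit               # assumed to come from the LUT
        sum_bit = propagate ^ carry             # XOR on the carry path
        carry = carry if propagate else a_bit   # carry multiplexer
        total |= sum_bit << i
    return total, carry


def ripple_carry_sub(a, b, width=8):
    """Subtract using the same chain: a - b = a + (not b) + 1."""
    mask = (1 << width) - 1
    return ripple_carry_add(a, ~b & mask, width, carry_in=1)
```

Counters and comparators reuse the same chain: a counter adds a constant 1, and a comparison can be read off the carry out of a subtraction.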
43Block RAM
- Most efficient memory implementation
- Dedicated blocks of memory
- Ideal for most memory requirements
- 4 to 14 memory blocks
- 4096 bits per block
- Use multiple blocks for larger memories
- Builds both single and true dual-port RAMs
44Spartan-II Block RAM Amounts
45Block RAM Port Aspect Ratios
1k x 4
2k x 2
4k x 1
512 x 8
256 x 16
46Basic I/O Block Structure
[Figure: an IOB with three flip-flops (D, Q, EC, SR), one each on the three-state control, output, and input paths; they share the clock and set/reset, and each has its own FF enable; the input path offers both a direct input and a registered input]
47IOB Functionality
- IOB provides interface between the package pins and CLBs
- Each IOB can work as uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered
- advised for high-performance I/O
- Inputs can be delayed
48Routing Resources
49Spartan-II FPGA Family Members
51Virtex-II 1.5V Architecture
[Figure: Virtex-II floorplan showing I/O blocks, configurable logic blocks, block RAMs, and 18 x 18 multipliers]
52Virtex-II 1.5V
Device    CLB Array  Slices  Maximum I/O  Block RAM (18 kb)  Multiplier Blocks  Distributed RAM bits
XC2V40    8x8        256     88           4                  4                  8,192
XC2V80    16x8       512     120          8                  8                  16,384
XC2V250   24x16      1,536   200          24                 24                 49,152
XC2V500   32x24      3,072   264          32                 32                 98,304
XC2V1000  40x32      5,120   432          40                 40                 163,840
XC2V1500  48x40      7,680   528          48                 48                 245,760
XC2V2000  56x48      10,752  624          56                 56                 344,064
XC2V3000  64x56      14,336  720          96                 96                 458,752
XC2V4000  80x72      23,040  912          120                120                737,280
XC2V6000  96x88      33,792  1,104        144                144                1,081,344
XC2V8000  112x104    46,592  1,108        168                168                1,490,944
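The Slices and Distributed RAM columns in the table follow from the CLB array size. The Python sketch below reproduces them, assuming four slices per CLB (as in the CLB description earlier), two LUTs per slice, and 16 bits per 4-input LUT. The constant and function names are hypothetical, and the two-LUT figure is carried over from the slice description as an assumption that happens to match the table.

```python
SLICES_PER_CLB = 4     # assumption: four slices per CLB, family dependent
LUTS_PER_SLICE = 2     # assumption: two LUT-FF pairs per slice
BITS_PER_LUT = 16      # a 4-input LUT stores 2**4 = 16 bits


def fabric_capacity(clb_rows, clb_cols):
    """Return (slices, distributed_ram_bits) for a CLB array."""
    slices = clb_rows * clb_cols * SLICES_PER_CLB
    distributed_ram_bits = slices * LUTS_PER_SLICE * BITS_PER_LUT
    return slices, distributed_ram_bits


smallest = fabric_capacity(8, 8)        # XC2V40 row of the table
largest = fabric_capacity(112, 104)     # XC2V8000 row of the table
```

For the 8x8 array this gives 256 slices and 8,192 bits, matching the first row of the table.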
53Virtex-II Block SelectRAM
- Virtex-II BRAM is 18 kbits
- Additional parity bits available in selected configurations
Width  Depth   Address  Data    Parity
1      16,384  [13:0]   [0]     N/A
2      8,192   [12:0]   [1:0]   N/A
4      4,096   [11:0]   [3:0]   N/A
9      2,048   [10:0]   [7:0]   [0]
18     1,024   [9:0]    [15:0]  [1:0]
36     512     [8:0]    [31:0]  [3:0]
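The rows of the table above are related by simple arithmetic: each block holds 16 kbit of data, the 9-, 18-, and 36-bit widths add one parity bit per 8 data bits, and the address width is the base-2 logarithm of the depth. The Python sketch below recomputes the rows under those assumptions; `bram_config` is a hypothetical helper name.

```python
DATA_BITS_PER_BLOCK = 16 * 1024    # 16 kbit of data; parity brings it to 18 kbit


def bram_config(port_width):
    """Return (depth, address_bits, data_bits, parity_bits) for a port width."""
    parity_bits = port_width // 9            # nonzero only for widths 9, 18, 36
    data_bits = port_width - parity_bits
    depth = DATA_BITS_PER_BLOCK // data_bits
    address_bits = depth.bit_length() - 1    # log2 of a power-of-two depth
    return depth, address_bits, data_bits, parity_bits


configs = {width: bram_config(width) for width in (1, 2, 4, 9, 18, 36)}
```

For a width of 9 this yields a depth of 2,048, an 11-bit address ([10:0]), 8 data bits, and 1 parity bit, and 2,048 x 9 bits accounts for the full 18 kbit.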
54FPGA Nomenclature
55Design Methods and Tools
56Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.
VHDL description (Your Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input  : in  std_logic_vector(31 downto 0);
    data_output : out std_logic_vector(31 downto 0);
    out_full    : in  std_logic;
    key_input   : in  std_logic_vector(31 downto 0);
    key_read    : out std_logic
  );
end RC5_core;  -- the original slide closed with "end AES_core", which does not match the entity name
Functional simulation
Post-synthesis simulation
Synthesis
57Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
58Design Process control from Active-HDL
59Simulation Tools
62Synthesis Tools
63Logic Synthesis
VHDL description
Circuit netlist
architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;
  signal B1 : STD_LOGIC;
  signal Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
begin
  -- optional inversion of the inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;
  -- the four candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;
  -- 4-to-1 multiplexer selected by L1 and L0
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
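Before synthesis it can help to have a reference for what the MLU dataflow description above should compute. The Python sketch below is a bit-level model written from that description. It is an illustration under the assumption that all signals are single bits with values 0 or 1, and `mlu` is a hypothetical function name.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level reference model of the MLU dataflow architecture."""
    a1 = a ^ neg_a                     # A when NEG_A = '0' else not A
    b1 = b ^ neg_b                     # B when NEG_B = '0' else not B
    operations = {
        (0, 0): a1 & b1,               # MUX_0
        (0, 1): a1 | b1,               # MUX_1
        (1, 0): a1 ^ b1,               # MUX_2
        (1, 1): 1 ^ (a1 ^ b1),         # MUX_3 (xnor)
    }
    y1 = operations[(l1, l0)]
    return y1 ^ neg_y                  # Y1 when NEG_Y = '0' else not Y1


# Inverting the output of the AND operation yields NAND.
nand_of_ones = mlu(1, 1, neg_a=0, neg_b=0, neg_y=1, l1=0, l0=0)
```

Such a model can serve as the expected-value source when checking functional simulation results.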
64Features of synthesis tools
- Interpret RTL code
- Produce synthesized circuit netlist in a standard EDIF format
- Give preliminary performance estimates
- Some can display circuit schematics corresponding to EDIF netlist
65Implementation
- After synthesis the entire implementation process
is performed by FPGA vendor tools
67Translation
[Figure: Synthesis produces the circuit netlist in Electronic Design Interchange Format (EDIF) and timing constraints in a Native Constraint File (NCF); the Constraint Editor produces a User Constraint File (UCF); Translation combines EDIF, NCF, and UCF into a Native Generic Database file (NGD)]
68Sample UCF File
#
# Constraints generated by Synplify Pro 7.3.3, Build 039R
#
# Period Constraints
# Begin clock constraints
# End clock constraints
# Output Constraints
# Input Constraints
# Location Constraints
# End of generated constraints
NET "clock" LOC = "P88";
NET "control(0)" LOC = "P50";
NET "control(1)" LOC = "P48";
NET "control(2)" LOC = "P42";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
NET "segments(1)" LOC = "P39";
NET "segments(2)" LOC = "P62";
NET "segments(3)" LOC = "P60";
69Pin Assignment
FPGA
70Parallel Port Interface
71Constraints Editor
72Circuit netlist
73Mapping
[Figure: netlist logic mapped onto look-up tables LUT1 to LUT5 and flip-flops FF1 and FF2]
74Placing
[Figure: mapped logic placed into CLB slices of the FPGA]
75Routing
[Figure: placed logic joined through programmable connections in the FPGA]
76Static Timing Analyzer
- Performs static analysis of the circuit performance
- Reports critical paths with all sources of delays
- Determines maximum clock frequency
77Static Timing Analysis
- Critical Path: The Longest Path From Outputs of Registers to Inputs of Registers
78Static Timing Analysis
- Min. Clock Period = Length of The Critical Path
- Max. Clock Frequency = 1 / Min. Clock Period
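The two relations above can be tried out on a small example. The Python sketch below finds the longest register-to-register delay in a directed acyclic graph and converts it to a maximum clock frequency. The graph, node names, and delays are made up for illustration, and the model folds all delay onto the edges, ignoring details such as setup time and clock skew that a real timing analyzer reports.

```python
from functools import lru_cache


def critical_path_ns(edges, sources, sinks):
    """Longest delay in ns from any source to any sink of a DAG.

    `edges` maps a node to a list of (next_node, delay_ns) pairs.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        best = 0.0 if node in sinks else float("-inf")
        for nxt, delay in edges.get(node, ()):
            best = max(best, delay + longest_from(nxt))
        return best

    return max(longest_from(source) for source in sources)


# Hypothetical circuit: two register outputs, two LUTs, one register input.
edges = {
    "ff1.Q": [("lut1", 0.5)],
    "ff2.Q": [("lut1", 0.7), ("lut2", 0.6)],
    "lut1": [("lut2", 1.2)],
    "lut2": [("ff3.D", 0.9)],
}
min_period_ns = critical_path_ns(edges, sources={"ff1.Q", "ff2.Q"}, sinks={"ff3.D"})
max_frequency_mhz = 1000.0 / min_period_ns   # 1 / period, with ns converted to MHz
```

Here the critical path runs ff2.Q, lut1, lut2, ff3.D, so the minimum period is about 2.8 ns and the maximum clock frequency is roughly 357 MHz.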
79Configuration
- Once a design is implemented, you must create a file that the FPGA can understand
- This file is called a bitstream: a BIT file (.bit extension)
- The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
80Resources Required Reading
Spartan FPGA devices
- Xilinx Spartan-II 2.5V FPGA Family
- Complete Data Sheet
- Module 1: Introduction and Ordering Information
- Module 2: Functional Description
- http://direct.xilinx.com/bvdocs/publications/ds001.pdf
81Resources Required Reading
FPGA Tools
Integrated Interfaces: Active-HDL with Synplify
http://www.aldec.com/Previews/active_synplify.htm
Integrated Synthesis and Implementation
http://www.aldec.com/Previews/synthesis_implementation.htm