Lesson 1 (Part 2) - PowerPoint PPT Presentation

Provided by: luc107

1
Lesson 1 (Part 2)
  • FPGA Architectures

2
Two competing implementation approaches
FPGA: Field Programmable Gate Array
  • bought off the shelf and reconfigured by
    designers themselves
  • no physical layout design; design ends with
    a bitstream used to configure a device
ASIC: Application Specific Integrated Circuit
  • designs must be sent for expensive and
    time-consuming fabrication in a semiconductor foundry
  • designed all the way from behavioral description
    to physical layout

3
Which Way to Go?
ASICs
  • High performance
  • Low power
  • Low cost in high volumes
FPGAs
  • Off-the-shelf
  • Low development cost
  • Short time to market
  • Reconfigurability
4
Major FPGA Vendors
  • SRAM-based FPGAs
      • Xilinx, Inc.
      • Altera Corp.
      • Atmel
      • Lattice Semiconductor
  • Flash and antifuse FPGAs
      • Actel Corp.
      • QuickLogic Corp.

5
Other FPGA Advantages
  • Manufacturing cycle for ASIC is very costly,
    lengthy and engages lots of manpower
  • Mistakes not detected at design time have large
    impact on development time and cost
  • FPGAs are perfect for rapid prototyping of
    digital circuits
  • Easy upgrades, as with software
  • Unique applications
      • reconfigurable computing

6
  • We analyze here the basic structures of FPGAs,
    known as fabrics.
  • There are different ways to build an FPGA.
  • The two major styles of FPGAs are SRAM-based and
    antifuse-based FPGAs.
  • The features of I/O pins are fairly similar
    between these two types of FPGAs.

7
Characteristics of FPGA programming Technologies
8
1-bit Static RAM
9
Elements of an FPGA fabric
  • Logic Element (LE) or CLB
  • Interconnect.
  • I/O pins.



[Figure: FPGA fabric with I/O blocks (IOB), an array of logic elements (LE), and interconnect]
10
Terminology
  • Configuration: bits that determine logic function
    and interconnect.
  • CLB: combinational logic block = logic element
    (LE); the correct expansion is Configurable Logic Block.
  • LUT: lookup table = SRAM used for truth table.
  • I/O block (IOB): I/O pin plus associated logic and
    electronics.
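
The LUT entry can be made concrete with a small behavioral model. The sketch below is Python rather than HDL and assumes a 4-input LUT whose 16 configuration bits are addressed by the input value; `make_lut` and `lut_eval` are illustrative names, not part of any vendor tool.

```python
from itertools import product

def make_lut(func, n_inputs=4):
    """Return the configuration bits (truth table) for an n-input function."""
    # Entry i holds the output for the input pattern whose binary value is i.
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(config, *inputs):
    """Look up the output: the inputs form the address into the SRAM cells."""
    addr = 0
    for bit in inputs:  # the first input is the most significant address bit
        addr = (addr << 1) | bit
    return config[addr]

# Configure one LUT as a 4-input AND and another as a 4-input XOR (parity).
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Changing the function only changes the 16 stored bits; the lookup hardware modeled by `lut_eval` stays the same, which is the point of a configurable logic block.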

11
Fine-, Medium-, and Coarse-grained Architectures
  • It's common to categorize FPGAs by the size and
    complexity of their internal logic elements.
  • In a fine-grained architecture, each logic block
    can be used to implement only a very simple
    function. For example, it might be possible to
    configure the block to act as any 3-input
    function, such as
      • a primitive logic gate (AND, OR, NAND, etc.),
      • a storage element (DFF, D-latch, etc.)
  • Today fine-grained architectures are being
    replaced by medium- and coarse-grained ones, where
    each logic block contains a relatively large amount
    of logic.

12
  • As the granularity of the blocks increases to
    medium-grained and higher, the number of
    connections into the blocks decreases relative to
    the amount of functionality they can support.
  • Today's FPGAs are devices that have
      • Embedded RAMs
      • Embedded multipliers, adders, and MACs
        (multiply-accumulators)
      • Embedded hard processor cores
      • Embedded clock trees and clock managers

13
MUX- versus LUT-based Logic Blocks
  • There are two fundamental incarnations of the
    programmable logic blocks used to form the
    medium-grained architectures, referred to as
    MUX-based and LUT-based.

14
MUX-based
MUX-based architectures have an advantage when it
comes to implementing control logic along the
lines of if ... else.
QuickLogic supports MUX-based architectures
(www.quicklogic.com)
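
The link between multiplexers and if ... else logic can be sketched in a few lines of Python. This is a generic illustration, not the actual QuickLogic cell: a 2:1 multiplexer is an if ... else on its select input, and tying its data inputs to constants or to other signals yields the basic gates.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: output d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def and_gate(a, b):
    # if a then b else 0
    return mux2(a, 0, b)

def or_gate(a, b):
    # if a then 1 else b
    return mux2(a, b, 1)

def not_gate(a):
    # if a then 0 else 1
    return mux2(a, 1, 0)
```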
15
LUT-based
LUT-based architectures are the leaders in anything
to do with arithmetic processing.
16
LUT versus distributed RAM versus shift register (SR)
  • The fact that the core of a LUT in an SRAM-based
    device comprises a number of RAM cells offers a
    number of interesting possibilities
      • Configuration as lookup table
      • Configuration as small RAM block
  • The RAM configuration is referred to as distributed
    RAM because (a) the LUTs are strewn (distributed)
    across the surface of the chip, and (b) this
    differentiates it from larger chunks of block RAM.
  • Each LUT may be considered to be multifaceted.

17
A multifaceted LUT.
18
CLBs (Xilinx) and LABs (Altera)
  • The core building block in a modern FPGA from
    Xilinx is called a logic cell (LC).
  • An LC comprises
      • a 4-input LUT, which can also act as a 16 x 1 RAM
        or a 16-bit shift register,
      • a multiplexer,
      • and a register.
  • The equivalent core building block in an FPGA
    from Altera is called a logic element (LE).

19
A simplified view of a Xilinx LC
20
Altera's Logic Element
Each Logic Element (LE) contains the following:
  • A 16-bit SRAM lookup table (LUT), which can
    implement an arbitrary 4-input logic function
    (as a truth table).
  • Circuitry that forms a fast carry chain and a fast
    cascade chain (see later).
  • A D-register that can be bypassed.
  • Various preset/reset logic for the register.
21
The next step up the hierarchy is what Xilinx
calls a slice
22
Moving one more level up the hierarchy, we come
to what Xilinx calls a configurable logic block
(CLB) and what Altera refers to as a logic array
block (LAB).
A CLB containing four slices (the number of
slices depends on the FPGA family).
23
Programmable wiring
  • Organized into channels.
  • Many wires per channel.
  • Connections between wires made at programmable
    interconnection points.
  • Must choose
      • Channels from source to destination.
      • Wires within the channels.

24
Programmable interconnection point
Logic elements must be interconnected to
implement complex machines. An SRAM-based FPGA
uses SRAM to hold the information used to
program the interconnect.
When the transistor's gate is high, the
transistor conducts and connects the two wires.
[Figure: An interconnection point controlled by an SRAM
cell (D/Q storage cell driving a MOS transistor)]
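
As a rough illustration of how configuration bits turn a set of wires into a connected route, the Python sketch below treats every interconnection point as a switch between two wires that conducts only when its SRAM bit is 1. The wire names and the `connected` helper are hypothetical; electrical effects such as delay are ignored.

```python
from collections import deque

def connected(pips, config, src, dst):
    """Return True if src reaches dst through switches whose SRAM bit is 1.

    pips   -- list of (wire_a, wire_b) pairs, one per interconnection point
    config -- list of 0/1 configuration bits, one per entry in pips
    """
    adjacency = {}
    for (a, b), bit in zip(pips, config):
        if bit:  # transistor gate high: the two wires are joined
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:  # breadth-first search over the joined wires
        wire = queue.popleft()
        if wire == dst:
            return True
        for nxt in adjacency.get(wire, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical example: three wires and two interconnection points.
pips = [("w0", "w1"), ("w1", "w2")]
```

With both bits set, `w0` reaches `w2`; clearing either bit breaks the route.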
25
Programmable wiring paths
26
Choosing a path
[Figure: a routing path between two LEs]
27
Routing problems
  • Global routing
      • Which combination of channels?
  • Local routing
      • Which wire in each channel?
  • Routing metrics
      • Net length.
      • Delay.

28
I/O
  • Fundamental selection: input, output,
    three-state?
  • Additional features
      • Register.
      • Voltage levels.
      • Slew rate.

29
Configuration
  • Must set control bits for
      • LE.
      • Interconnect.
      • I/O blocks.
  • Usually configured off-line.
      • Separate burn-in step (antifuse).
      • At power-up (SRAM).

30
Configuration vs. programming
  • FPGA configuration
      • Bits stay at the device they program.
      • A configuration bit controls a switch or a logic
        bit.
  • CPU programming
      • Instructions are fetched from a memory.
      • Instructions select complex operations.

[Figure: a CPU fetching the instruction "add r1, r2" from memory into its instruction register (IR)]
31
Xilinx
  • Primary products: FPGAs and the associated CAD
    software
  • Main headquarters in San Jose, CA
  • Fabless semiconductor and software company
      • UMC (Taiwan); Xilinx acquired an equity stake in
        UMC in 1996
      • Seiko Epson (Japan)
      • TSMC (Taiwan)

ISE Alliance and Foundation Series Design
Software
32
Xilinx FPGA Families
  • Old families
      • XC3000, XC4000, XC5200
      • Old 0.5µm, 0.35µm and 0.25µm technology. Not
        recommended for modern designs.
  • High-performance families
      • Virtex (0.22µm)
      • Virtex-E, Virtex-EM (0.18µm)
      • Virtex-II, Virtex-II PRO (0.13µm)
  • Low-cost family
      • Spartan/XL: derived from XC4000
      • Spartan-II: derived from Virtex
      • Spartan-IIE: derived from Virtex-E
      • Spartan-3

33
(Adapted from EE449, George Mason University)
34
Basic Spartan-II FPGA Block Diagram
35
CLB Structure
  • Each slice has 2 LUT-FF pairs with associated
    carry logic
  • Two 3-state buffers (BUFT) associated with each
    CLB, accessible by all CLB outputs

36
CLB Slice Structure
  • Each slice contains two sets of the following
      • Four-input LUT
          • Any 4-input logic function,
          • or 16-bit x 1 sync RAM
          • or 16-bit shift register
      • Carry and control
          • Fast arithmetic logic
          • Multiplier logic
          • Multiplexer logic
      • Storage element
          • Latch or flip-flop
          • Set and reset
          • True or inverted inputs
          • Sync. or async. control

37
Distributed RAM
  • CLB LUT configurable as distributed RAM
      • A LUT equals 16x1 RAM
      • Implements single and dual ports
      • Cascade LUTs to increase RAM size
  • Synchronous write
  • Synchronous/asynchronous read
      • Accompanying flip-flops used for synchronous read
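
A behavioral sketch of these points, written in Python under simplifying assumptions (single port, one data bit, no timing): a LUT acting as a 16 x 1 RAM with synchronous write and asynchronous read, and two such LUTs cascaded into a 32 x 1 RAM. The class names are illustrative only.

```python
class Lut16x1Ram:
    """Behavioral model of one LUT used as a 16 x 1 RAM."""

    def __init__(self):
        self.cells = [0] * 16

    def clock(self, addr, data_in, write_enable):
        """Rising clock edge: the write happens only here (synchronous write)."""
        if write_enable:
            self.cells[addr & 0xF] = data_in & 1

    def read(self, addr):
        """Asynchronous read: the output follows the address immediately."""
        return self.cells[addr & 0xF]


class Ram32x1:
    """Two cascaded LUT RAMs; address bit 4 selects which LUT is used."""

    def __init__(self):
        self.banks = [Lut16x1Ram(), Lut16x1Ram()]

    def clock(self, addr, data_in, write_enable):
        self.banks[(addr >> 4) & 1].clock(addr, data_in, write_enable)

    def read(self, addr):
        return self.banks[(addr >> 4) & 1].read(addr)
```

A synchronous read would be modeled by capturing the value returned by `read` in a separate register on the clock edge, which is the role of the accompanying flip-flop.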

38
Shift Register
  • Each LUT can be configured as a shift register
      • Serial in, serial out
  • Dynamically addressable delay up to 16 cycles
      • For programmable pipeline
  • Cascade for greater cycle delays
  • Use CLB flip-flops to add depth
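
The "dynamically addressable delay" can be illustrated with a short Python model. It assumes the address selects which of the 16 stored bits drives the output, so address n gives a delay of n + 1 clock cycles; `Srl16` is a hypothetical class name, not a library primitive.

```python
class Srl16:
    """Behavioral model of a LUT configured as a 16-bit shift register."""

    def __init__(self):
        self.bits = [0] * 16  # bits[0] is the most recently shifted-in bit

    def clock(self, data_in):
        """Rising clock edge: shift every bit one position, insert data_in."""
        self.bits = [data_in & 1] + self.bits[:-1]

    def out(self, addr):
        """Tap selected by addr: the input delayed by addr + 1 clock cycles."""
        return self.bits[addr & 0xF]
```

Address 15 gives the maximum delay of 16 cycles for one LUT; longer delays need the output of one model fed into the next, mirroring the cascade mentioned above.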

39
Shift Register
  • Register-rich FPGA
  • Allows for addition of pipeline stages to
    increase throughput
  • Data paths must be balanced to keep desired
    functionality

40
Carry and Control Logic
[Figure: a slice with two look-up tables (inputs G4-G1 and F4-F1), each followed by carry and control logic and a D flip-flop (D, Q, CK, EC, R, S); slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, and YB]
41
Fast Carry Logic
  • Each CLB contains separate logic and routing for
    the fast generation of sum and carry signals
  • Increases efficiency and performance of adders,
    subtractors, accumulators, comparators, and
    counters
  • Carry logic is independent of normal logic and
    routing resources

[Figure: carry logic routing from LSB to MSB]
42
Accessing Carry Logic
  • All major synthesis tools can infer carry logic
    for arithmetic functions
      • Addition (SUM <= A + B)
      • Subtraction (DIFF <= A - B)
      • Comparators (if A < B then)
      • Counters (count <= count + 1)
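
To show what such inferred carry logic computes, here is a simplified bit-level model in Python. It assumes the common arrangement in which the LUT produces a propagate signal, a carry multiplexer forwards or regenerates the carry, and a dedicated XOR forms the sum bit; it is a sketch of the idea, not a description of any particular device.

```python
def add_with_carry_chain(a, b, width=8, carry_in=0):
    """Add two unsigned integers bit by bit, LSB first.

    Per bit: the LUT computes the propagate signal p = a_i xor b_i;
    the carry multiplexer passes the incoming carry when p is 1, else a_i;
    a dedicated XOR forms the sum bit.
    Returns (sum modulo 2**width, carry_out).
    """
    carry, total = carry_in, 0
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        p = a_i ^ b_i                   # LUT output
        total |= (p ^ carry) << i       # sum bit
        carry = carry if p else a_i     # carry multiplexer
    return total, carry
```

Only the carry signal travels from one bit position to the next, which is why a dedicated carry route is faster than general-purpose interconnect.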

43
Block RAM
  • Most efficient memory implementation
      • Dedicated blocks of memory
  • Ideal for most memory requirements
      • 4 to 14 memory blocks
      • 4096 bits per block
  • Use multiple blocks for larger memories
  • Builds both single and true dual-port RAMs

44
Spartan-II Block RAM Amounts
45
Block RAM Port Aspect Ratios
1k x 4
2k x 2
4k x 1
512 x 8
256 x 16
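
The aspect ratios listed here all describe the same 4096-bit block: depth times width is constant. A short Python check of that arithmetic follows (the function name is illustrative); it also derives the number of address bits each ratio needs.

```python
BLOCK_BITS = 4096  # bits per Spartan-II block RAM, as stated earlier in the deck

def aspect_ratios(widths=(1, 2, 4, 8, 16)):
    """Return (depth, width, address_bits) for each port width."""
    ratios = []
    for width in widths:
        depth = BLOCK_BITS // width
        address_bits = depth.bit_length() - 1  # log2 of a power of two
        ratios.append((depth, width, address_bits))
    return ratios
```

For example, the 512 x 8 configuration uses 9 address bits, while 4k x 1 uses 12.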
46
Basic I/O Block Structure
[Figure: IOB with three flip-flops (each with D, Q, EC, and SR pins) on the three-state control path, the output path, and the input path; the input path offers both a direct and a registered input; clock, set/reset, and FF enable signals are shown]
47
IOB Functionality
  • IOB provides interface between the package pins
    and CLBs
  • Each IOB can work as uni- or bi-directional I/O
  • Outputs can be forced into High Impedance
  • Inputs and outputs can be registered
      • advised for high-performance I/O
  • Inputs can be delayed

48
Routing Resources
49
Spartan-II FPGA Family Members
51
Virtex-II 1.5V Architecture
[Figure: block RAMs, 18 x 18 multipliers, configurable logic blocks, and I/O blocks]
52
Virtex-II 1.5V
Device     CLB Array   Slices   Max I/O   Block RAMs (18 kbit)   Multiplier Blocks   Distributed RAM (bits)
XC2V40     8x8         256      88        4                      4                   8,192
XC2V80     16x8        512      120       8                      8                   16,384
XC2V250    24x16       1,536    200       24                     24                  49,152
XC2V500    32x24       3,072    264       32                     32                  98,304
XC2V1000   40x32       5,120    432       40                     40                  163,840
XC2V1500   48x40       7,680    528       48                     48                  245,760
XC2V2000   56x48       10,752   624       56                     56                  344,064
XC2V3000   64x56       14,336   720       96                     96                  458,752
XC2V4000   80x72       23,040   912       120                    120                 737,280
XC2V6000   96x88       33,792   1,104     144                    144                 1,081,344
XC2V8000   112x104     46,592   1,108     168                    168                 1,490,944
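
The columns of this table are related by simple arithmetic, which the Python sketch below makes explicit. It assumes four slices per CLB, two LUTs per slice, and 16 bits per LUT; those constants are inferred from the figures in the table rather than quoted from a data sheet.

```python
SLICES_PER_CLB = 4   # assumption, consistent with the table rows
LUTS_PER_SLICE = 2   # assumption: two LUT-FF pairs per slice
BITS_PER_LUT = 16    # a 4-input LUT holds 16 configuration bits

def derive(rows, cols):
    """Return (slices, distributed_ram_bits) for a rows x cols CLB array."""
    slices = rows * cols * SLICES_PER_CLB
    dist_ram_bits = slices * LUTS_PER_SLICE * BITS_PER_LUT
    return slices, dist_ram_bits
```

For the XC2V1000, `derive(40, 32)` gives 5,120 slices and 163,840 distributed RAM bits, matching its row.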
53
Virtex-II Block SelectRAM
  • Virtex-II BRAM is 18 kbits
  • Additional parity bits available in selected
    configurations

Width  Depth   Address  Data    Parity
1      16,384  <13:0>   <0>     N/A
2      8,192   <12:0>   <1:0>   N/A
4      4,096   <11:0>   <3:0>   N/A
9      2,048   <10:0>   <7:0>   <0>
18     1,024   <9:0>    <15:0>  <1:0>
36     512     <8:0>    <31:0>  <3:0>
54
FPGA Nomenclature
55
Design Methods and Tools
56
Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up
encryption with an RC5-like cipher with a fixed key
set on an 8031 microcontroller. Unlike in
Experiment 5, this time your unit has to be
able to perform the encryption algorithm by
itself, executing 32 rounds.
VHDL description (Your Source Files)
    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
        port(
            clock, reset, encr_decr : in  std_logic;
            data_input  : in  std_logic_vector(31 downto 0);
            data_output : out std_logic_vector(31 downto 0);
            out_full    : in  std_logic;
            key_input   : in  std_logic_vector(31 downto 0);
            key_read    : out std_logic
        );
    end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation
57
Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
58
Design Process control from Active-HDL
59
Simulation Tools
  • Many others

62
Synthesis Tools
  • and others

63
Logic Synthesis
VHDL description
Circuit netlist
    architecture MLU_DATAFLOW of MLU is
        signal A1 : STD_LOGIC;
        signal B1 : STD_LOGIC;
        signal Y1 : STD_LOGIC;
        signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
        -- Optional inversion of the inputs and of the output
        A1 <= A when (NEG_A = '0') else not A;
        B1 <= B when (NEG_B = '0') else not B;
        Y  <= Y1 when (NEG_Y = '0') else not Y1;

        -- The four candidate logic operations
        MUX_0 <= A1 and B1;
        MUX_1 <= A1 or B1;
        MUX_2 <= A1 xor B1;
        MUX_3 <= A1 xnor B1;

        -- L1 and L0 select which operation drives Y1
        with (L1 & L0) select
            Y1 <= MUX_0 when "00",
                  MUX_1 when "01",
                  MUX_2 when "10",
                  MUX_3 when others;
    end MLU_DATAFLOW;
64
Features of synthesis tools
  • Interpret RTL code
  • Produce synthesized circuit netlist in a standard
    EDIF format
  • Give preliminary performance estimates
  • Some can display circuit schematics corresponding
    to EDIF netlist

65
Implementation
  • After synthesis the entire implementation process
    is performed by FPGA vendor tools

67
Translation
[Figure: Translation takes the EDIF (Electronic Design Interchange Format) circuit netlist and the NCF (Native Constraint File) produced by synthesis, plus the UCF (User Constraint File) with timing constraints from the Constraint Editor, and produces the NGD (Native Generic Database) file]
68
Sample UCF File
# Constraints generated by Synplify Pro 7.3.3, Build 039R
# Period Constraints
# Begin clock constraints
# End clock constraints
# Output Constraints
# Input Constraints
# Location Constraints
# End of generated constraints
NET "clock" LOC = "P88";
NET "control(0)" LOC = "P50";
NET "control(1)" LOC = "P48";
NET "control(2)" LOC = "P42";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
NET "segments(1)" LOC = "P39";
NET "segments(2)" LOC = "P62";
NET "segments(3)" LOC = "P60";
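
Location constraints like these are easy to sanity-check before implementation. The Python sketch below assumes the usual `NET "name" LOC = "pin";` form and `#` comments, ignores every other kind of constraint, and flags a pin assigned to two nets; `parse_loc_constraints` is a hypothetical helper, not part of the vendor tools.

```python
import re

# Matches only the simple form  NET "name" LOC = "pin";  (an assumption;
# other UCF constraint kinds are skipped by this sketch).
LOC_RE = re.compile(r'^\s*NET\s+"([^"]+)"\s+LOC\s*=\s*"([^"]+)"\s*;')

def parse_loc_constraints(text):
    """Return {net_name: pin} for every LOC line; raise on a reused pin."""
    pins = {}
    used = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = LOC_RE.match(line)
        if not match:
            continue
        net, pin = match.groups()
        if pin in used:
            raise ValueError(f"pin {pin} assigned to both {used[pin]} and {net}")
        used[pin] = net
        pins[net] = pin
    return pins

sample = '''
# Location Constraints
NET "clock" LOC = "P88";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
'''
```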

69
Pin Assignment
FPGA
70
Parallel Port Interface
71
Constraints Editor
72
Circuit netlist
73
Mapping
[Figure: netlist elements LUT1-LUT5, FF1, and FF2 grouped during mapping]
74
Placing
FPGA
CLB SLICES
75
Routing
FPGA
Programmable Connections
76
Static Timing Analyzer
  • Performs static analysis of the circuit
    performance
  • Reports critical paths with all sources of delays
  • Determines maximum clock frequency

77
Static Timing Analysis
  • Critical path: the longest path from outputs of
    registers to inputs of registers

78
Static Timing Analysis
  • Min. clock period = length of the critical path
  • Max. clock frequency = 1 / min. clock period
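
A worked example of these two relations, as a Python sketch. The circuit and its delays are made up for illustration: each node contributes a delay in nanoseconds, the critical path is the longest chain from a register output to a register input, and the maximum clock frequency is the reciprocal of that length.

```python
from functools import lru_cache

def min_clock_period(delays_ns, successors, start_nodes):
    """Length of the longest path (ns) starting from any of start_nodes.

    delays_ns  -- dict: node -> delay contributed by that node, in ns
    successors -- dict: node -> list of nodes it drives (must be acyclic)
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(n) for n in successors.get(node, ())),
                   default=0.0)
        return delays_ns[node] + tail

    return max(longest_from(n) for n in start_nodes)

# Hypothetical delays: clock-to-output, two LUT levels, a long net, setup time.
delays_ns = {"ff_out": 1.0, "lut_a": 0.5, "lut_b": 0.5,
             "net": 1.5, "ff_setup": 0.5}
successors = {"ff_out": ["lut_a", "net"], "lut_a": ["lut_b"],
              "lut_b": ["ff_setup"], "net": ["ff_setup"]}

period_ns = min_clock_period(delays_ns, successors, ["ff_out"])
f_max_mhz = 1000.0 / period_ns  # a period in ns gives a frequency in MHz
```

Here the path through the long net (1.0 + 1.5 + 0.5 ns) is longer than the path through the two LUTs (1.0 + 0.5 + 0.5 + 0.5 ns), so it sets the minimum clock period.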

79
Configuration
  • Once a design is implemented, you must create a
    file that the FPGA can understand
  • This file is called a bitstream: a BIT file
    (.bit extension)
  • The BIT file can be downloaded directly to the
    FPGA, or can be converted into a PROM file which
    stores the programming information

80
Resources: Required Reading
Spartan FPGA devices
  • Xilinx Spartan-II 2.5V FPGA Family: Complete Data Sheet
      • Module 1: Introduction and Ordering Information
      • Module 2: Functional Description
  • http://direct.xilinx.com/bvdocs/publications/ds001.pdf

81
Resources: Required Reading
FPGA Tools
  • Integrated Interfaces: Active-HDL with Synplify
    http://www.aldec.com/Previews/active_synplify.htm
  • Integrated Synthesis and Implementation
    http://www.aldec.com/Previews/synthesis_implementation.htm