Introduction to FPGA Technology, Devices and Tools

Provided by: pcho3
1
Introduction to FPGA Technology, Devices and Tools
2
FPGA Devices & Technology
3
World of Integrated Circuits
  • Full-Custom ASICs
  • Semi-Custom ASICs
  • User Programmable
      • PLD
      • FPGA
4
FPGA: Field Programmable Gate Array
  • Small development overhead
  • No NRE (non-recurring engineering) costs
  • Quick time to market
  • No minimum order quantity
  • Reprogrammable
ASIC: Application Specific Integrated Circuit
  • Designed all the way from behavioral description
    to physical layout
  • Designs must be sent for expensive and
    time-consuming fabrication in a semiconductor
    foundry

5
How can we make programmable logic?
  • One-time programmable
      • Fuses (destroy internal links with current)
      • Anti-fuses (grow internal links)
      • PROM
  • Reprogrammable
      • EPROM
      • EEPROM
      • Flash
      • SRAM (volatile)

6
What is an FPGA?
Configurable Logic Blocks
I/O Blocks
Block RAMs
7
Which Way to Go?
ASICs
  • High performance
  • Low power
  • Low cost in high volumes
FPGAs
  • Off-the-shelf
  • Low development cost
  • Short time to market
  • Reconfigurability
8
Other FPGA Advantages
  • Manufacturing cycle for ASIC is very costly,
    lengthy and engages lots of manpower
  • Mistakes not detected at design time have large
    impact on development time and cost
  • FPGAs are perfect for rapid prototyping of
    digital circuits
  • Easy upgrades, as with software
  • Unique applications
      • Reconfigurable computing

9
Major FPGA Vendors
  • SRAM-based FPGAs
      • Xilinx, Inc.
      • Altera Corp.
      • Atmel
      • Lattice Semiconductor
  • Flash and antifuse FPGAs
      • Actel Corp.
      • QuickLogic Corp.

Share over 60% of the market
10
  • XILINX

11
Xilinx
  • Primary products: FPGAs and the associated CAD
    software (ISE Alliance and Foundation Series
    design software)
  • Main headquarters in San Jose, CA
  • Fabless semiconductor and software company;
    foundry partners:
      • UMC (Taiwan): Xilinx acquired an equity stake
        in UMC in 1996
      • Seiko Epson (Japan)
      • TSMC (Taiwan)

12
Xilinx FPGA Families
  • Old families
      • XC3000, XC4000, XC5200
      • Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not
        recommended for modern designs.
  • High-performance families
      • Virtex (0.22 µm)
      • Virtex-E, Virtex-EM (0.18 µm)
      • Virtex-II, Virtex-II Pro (0.13 µm)
  • Low-cost family
      • Spartan/XL, derived from XC4000
      • Spartan-II, derived from Virtex
      • Spartan-IIE, derived from Virtex-E
      • Spartan-3

13
Basic Spartan-II FPGA Block Diagram
14
CLB Structure
  • Each slice has 2 LUT-FF pairs with associated
    carry logic
  • Two 3-state buffers (BUFT) associated with each
    CLB, accessible by all CLB outputs

15
CLB Slice Structure
  • Each slice contains two sets of the following:
      • Four-input LUT
          • Any 4-input logic function,
          • or 16-bit x 1 sync RAM,
          • or 16-bit shift register
      • Carry and control
          • Fast arithmetic logic
          • Multiplier logic
          • Multiplexer logic
      • Storage element
          • Latch or flip-flop
          • Set and reset
          • True or inverted inputs
          • Sync. or async. control

16
LUT (Look-Up Table) Functionality
  • Look-Up tables are primary elements for logic
    implementation
  • Each LUT can implement any function of 4 inputs
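
To make the look-up idea concrete, a 4-input LUT can be modeled as a 16-bit truth table indexed by its inputs. The following is only an illustrative sketch: the entity name lut4_model, the generic INIT and the port names are assumptions made for this example and are not vendor primitives.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral model of a 4-input LUT (hypothetical entity, not a vendor primitive).
-- INIT holds the truth table: bit n is the output for input value n.
entity lut4_model is
  generic (
    INIT : std_logic_vector(15 downto 0) := x"8000"  -- 4-input AND: only bit 15 is '1'
  );
  port (
    i : in  std_logic_vector(3 downto 0);
    o : out std_logic
  );
end entity lut4_model;

architecture behavioral of lut4_model is
begin
  -- The four inputs act as an address into the 16-entry table.
  o <= INIT(to_integer(unsigned(i)));
end architecture behavioral;
```

Changing only the INIT value changes the function; for example, x"6996" would give the XOR of the four inputs, with no change to the structure.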

17
5-Input Functions implemented using two LUTs
  • One CLB Slice can implement any function of 5
    inputs
  • Logic function is partitioned between two LUTs
  • F5 multiplexer selects LUT
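
The partitioning described above can be sketched as follows: the 32-entry truth table of a 5-input function is split into two 16-entry halves, one per LUT, and the fifth input drives a 2:1 multiplexer that plays the role of the F5 multiplexer. The entity name func5_model and the generic INIT are hypothetical, chosen only for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a 5-input function built from two 4-input LUTs
-- and a 2:1 multiplexer standing in for the F5 multiplexer.
entity func5_model is
  generic (
    -- Truth table of the 5-input function: bit n is the output for input value n.
    INIT : std_logic_vector(31 downto 0) := x"80000000"  -- 5-input AND
  );
  port (
    i : in  std_logic_vector(4 downto 0);
    o : out std_logic
  );
end entity func5_model;

architecture structural_sketch of func5_model is
  -- Each half of the table fits in one 4-input LUT.
  constant LUT_LOW  : std_logic_vector(15 downto 0) := INIT(15 downto 0);   -- used when i(4) = '0'
  constant LUT_HIGH : std_logic_vector(15 downto 0) := INIT(31 downto 16);  -- used when i(4) = '1'
  signal low_out, high_out : std_logic;
begin
  -- Both LUTs see the same four low-order inputs.
  low_out  <= LUT_LOW(to_integer(unsigned(i(3 downto 0))));
  high_out <= LUT_HIGH(to_integer(unsigned(i(3 downto 0))));

  -- The fifth input selects which LUT output reaches the slice output.
  o <= high_out when i(4) = '1' else low_out;
end architecture structural_sketch;
```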

18
5-Input Functions implemented using two LUTs
19
Dedicated Expansion Multiplexers
  • MUXF5 combines 2 LUTs to create
      • Any 5-input function (LUT5)
      • Or selected functions up to 9 inputs
      • Or 4x1 multiplexer
  • MUXF6 combines 2 slices to form
      • Any 6-input function (LUT6)
      • Or selected functions up to 19 inputs
      • Or 8x1 multiplexer
  • Dedicated muxes are faster and more space
    efficient

20
Distributed RAM
  • CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements single- and dual-port RAM
  • Cascade LUTs to increase RAM size
  • Synchronous write
  • Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read
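
A 16x1 memory with synchronous write and asynchronous read is commonly described in VHDL as shown below. This is a sketch under assumptions: the entity and port names are made up for the example, and whether a given synthesis tool maps the description onto a LUT configured as distributed RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 16 x 1 RAM with synchronous write and asynchronous read
-- (hypothetical entity name and ports).
entity dist_ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);
    din  : in  std_logic;
    dout : out std_logic
  );
end entity dist_ram16x1;

architecture rtl of dist_ram16x1 is
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Synchronous write: the addressed bit is updated on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address combinationally.
  dout <= mem(to_integer(unsigned(addr)));
end architecture rtl;
```

Registering dout inside the clocked process instead would give the synchronous-read variant, which uses the accompanying flip-flop.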

21
Shift Register
  • Each LUT can be configured as shift register
  • Serial in, serial out
  • Dynamically addressable delay up to 16 cycles
  • For programmable pipeline
  • Cascade for greater cycle delays
  • Use CLB flip-flops to add depth
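
A dynamically addressable 16-stage shift register can be written as below. The sketch deliberately has no reset, since a reset on every stage typically prevents a tool from using a single LUT; the names srl16_sketch and tap are assumptions for illustration, and the actual mapping onto a LUT is tool-dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Serial-in, serial-out shift register with a selectable tap
-- (hypothetical entity name and ports).
entity srl16_sketch is
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;
    din  : in  std_logic;
    tap  : in  std_logic_vector(3 downto 0);  -- delay is tap + 1 enabled clock cycles
    dout : out std_logic
  );
end entity srl16_sketch;

architecture rtl of srl16_sketch is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Shift by one position on every enabled rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(14 downto 0) & din;
      end if;
    end if;
  end process;

  -- The tap address picks which stage drives the output (1 to 16 cycles of delay).
  dout <= sr(to_integer(unsigned(tap)));
end architecture rtl;
```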

22
Shift Register
  • Register-rich FPGA
  • Allows for addition of pipeline stages to
    increase throughput
  • Data paths must be balanced to keep desired
    functionality
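
As an illustration of path balancing, the sketch below pipelines y = a * b + c into two stages. Because the product is registered before the addition, c has to be delayed by one register as well; otherwise the adder would combine operands from different clock cycles. The entity name, port names and widths are assumptions chosen for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Two-stage pipelined multiply-add (hypothetical example entity).
entity mac_pipelined is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(7 downto 0);
    c    : in  unsigned(15 downto 0);
    y    : out unsigned(16 downto 0)
  );
end entity mac_pipelined;

architecture rtl of mac_pipelined is
  signal prod_r : unsigned(15 downto 0) := (others => '0');
  signal c_r    : unsigned(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: register the product, and delay c by the same one cycle
      -- so that both adder operands belong to the same input sample.
      prod_r <= a * b;
      c_r    <= c;
      -- Stage 2: add the two balanced operands (17 bits hold the full result).
      y <= resize(prod_r, 17) + resize(c_r, 17);
    end if;
  end process;
end architecture rtl;
```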

23
Carry & Control Logic
[Figure: slice diagram. Two look-up tables (inputs G4-G1 and F4-F1) each feed carry and control logic and a storage element (D, Q, CK, EC, R). Slice-level signals: CIN, COUT, CLK, CE, SR, BY, F5IN; outputs X, XB, Y, YB.]
24
Fast Carry Logic
  • Each CLB contains separate logic and routing for
    the fast generation of sum and carry signals
  • Increases efficiency and performance of adders,
    subtractors, accumulators, comparators, and
    counters
  • Carry logic is independent of normal logic and
    routing resources

[Figure: carry logic routing between LSB and MSB]
25
Accessing Carry Logic
  • All major synthesis tools can infer carry logic
    for arithmetic functions
      • Addition (SUM <= A + B)
      • Subtraction (DIFF <= A - B)
      • Comparators (if A < B then)
      • Counters (count <= count + 1)
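
The four arithmetic patterns listed above can be written in plain RTL as in the sketch below, leaving it to the synthesis tool to decide whether to use the dedicated carry chain. The entity and port names are assumptions, and the example uses ieee.numeric_std rather than the std_logic_unsigned package that appears elsewhere in these slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Arithmetic operators from which carry logic is typically inferred
-- (hypothetical example entity).
entity carry_examples is
  port (
    clk, reset : in  std_logic;
    a, b       : in  unsigned(7 downto 0);
    sum        : out unsigned(7 downto 0);
    diff       : out unsigned(7 downto 0);
    a_lt_b     : out std_logic;
    count      : out unsigned(7 downto 0)
  );
end entity carry_examples;

architecture rtl of carry_examples is
  signal count_r : unsigned(7 downto 0) := (others => '0');
begin
  sum    <= a + b;                       -- addition (wraps modulo 256)
  diff   <= a - b;                       -- subtraction (wraps modulo 256)
  a_lt_b <= '1' when a < b else '0';     -- comparator

  -- Counter with synchronous reset.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count_r <= (others => '0');
      else
        count_r <= count_r + 1;
      end if;
    end if;
  end process;

  count <= count_r;
end architecture rtl;
```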

26
Block RAM
  • Most efficient memory implementation
  • Dedicated blocks of memory
  • Ideal for most memory requirements
  • 4 to 14 memory blocks
  • 4096 bits per block
  • Use multiple blocks for larger memories
  • Builds both single and true dual-port RAMs

27
Dual Port Block RAM
28
Dual-Port Bus Flexibility
RAMB4_S4_S16
  • Port A: 4-bit width, 1K depth
    (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0],
    DOA[3:0])
  • Port B: 16-bit width, 256 depth
    (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0],
    DOB[15:0])
  • Each port can be configured with a different data
    bus width
  • Provides easy data width conversion without any
    additional logic
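
As a sketch of such a width conversion, the wrapper below writes 4-bit values through port A of a RAMB4_S4_S16 and reads 16-bit words through port B. It assumes the Xilinx unisim library is available and uses the port names shown on this slide; the wrapper entity nibble_to_word and its port names are hypothetical. How the four nibbles are ordered inside each 16-bit word is defined by the device and should be checked in the vendor documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;                -- assumption: Xilinx simulation/synthesis library
use unisim.vcomponents.all;

-- Hypothetical wrapper: 4-bit writes on port A, 16-bit reads on port B.
entity nibble_to_word is
  port (
    wr_clk  : in  std_logic;
    wr_en   : in  std_logic;
    wr_addr : in  std_logic_vector(9 downto 0);   -- 1K x 4-bit view
    wr_data : in  std_logic_vector(3 downto 0);
    rd_clk  : in  std_logic;
    rd_addr : in  std_logic_vector(7 downto 0);   -- 256 x 16-bit view
    rd_data : out std_logic_vector(15 downto 0)
  );
end entity nibble_to_word;

architecture structural of nibble_to_word is
  constant ZERO16 : std_logic_vector(15 downto 0) := (others => '0');
begin
  ram : RAMB4_S4_S16
    port map (
      -- Port A: write-only, 4 bits wide
      CLKA  => wr_clk,
      ENA   => '1',
      RSTA  => '0',
      WEA   => wr_en,
      ADDRA => wr_addr,
      DIA   => wr_data,
      DOA   => open,
      -- Port B: read-only, 16 bits wide
      CLKB  => rd_clk,
      ENB   => '1',
      RSTB  => '0',
      WEB   => '0',
      ADDRB => rd_addr,
      DIB   => ZERO16,
      DOB   => rd_data
    );
end architecture structural;
```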

29
Two Independent Single-Port RAMs
RAMB4_S1_S1
  • Port A: 2K depth, 1-bit width, address VCC,
    ADDR[10:0]
  • Port B: 2K depth, 1-bit width, address GND,
    ADDR[10:0]
  • To access the lower RAM
      • Tie the MSB address bit to logic low
  • To access the upper RAM
      • Tie the MSB address bit to logic high
  • Added advantage of true dual-port
      • No wasted RAM bits
  • Can split a dual-port 4K RAM into two single-port
    2K RAMs
      • Simultaneous independent access to each RAM

30
I/O Banking
31
Basic I/O Block Structure
[Figure: I/O block with three flip-flops (D, Q, EC, SR), one each for the three-state control, the output path and the input path (direct or registered input), with Clock, Set/Reset and FF Enable inputs.]
32
IOB Functionality
  • IOB provides interface between the package pins
    and CLBs
  • Each IOB can work as uni- or bi-directional I/O
  • Outputs can be forced into High Impedance
  • Inputs and outputs can be registered
      • Advised for high-performance I/O
  • Inputs can be delayed
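
A bidirectional pin with registered input, output and three-state control might be described as in the sketch below. The entity and signal names are assumptions; whether the three registers are actually placed inside the IOB rather than in CLBs depends on the tools and their options.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Bidirectional pad with registered output, output enable and input
-- (hypothetical example entity).
entity bidir_pad is
  port (
    clk      : in    std_logic;
    out_en   : in    std_logic;   -- '1' drives the pad, '0' releases it
    data_out : in    std_logic;
    data_in  : out   std_logic;
    pad      : inout std_logic
  );
end entity bidir_pad;

architecture rtl of bidir_pad is
  signal out_r, oe_r, in_r : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      out_r <= data_out;  -- registered output
      oe_r  <= out_en;    -- registered three-state control
      in_r  <= pad;       -- registered input
    end if;
  end process;

  -- Force the output into high impedance when it is not enabled.
  pad     <= out_r when oe_r = '1' else 'Z';
  data_in <= in_r;
end architecture rtl;
```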

33
Routing Resources
34
Clock Distribution
35
FPGA Nomenclature
36
  • ALTERA

37
Device Families & Tools
38
Logic Element FLEX10K
39
Logic Array Block FLEX10K
40
FLEX10K Architecture
41
Stratix Architecture
42
Stratix Device Family

Feature | EP1S10 | EP1S20 | EP1S25 | EP1S30 | EP1S40 | EP1S60 | EP1S80 | EP1S120
Logic Elements (LEs) | 10,570 | 18,460 | 25,660 | 32,470 | 41,250 | 57,120 | 79,040 | 114,140
M512 RAM Blocks (512 bits + parity) | 94 | 194 | 224 | 295 | 384 | 574 | 767 | 1,118
M4K RAM Blocks (4 Kbits + parity) | 60 | 82 | 138 | 171 | 183 | 292 | 364 | 520
M-RAM Blocks (512 Kbits + parity) | 1 | 2 | 2 | 4 | 4 | 6 | 9 | 12
Total RAM Bits | 920,448 | 1,669,248 | 1,944,576 | 3,317,184 | 3,423,744 | 5,215,104 | 7,427,520 | 10,118,016
DSP Blocks | 6 | 10 | 10 | 12 | 14 | 18 | 22 | 28
Embedded Multipliers | 48 | 80 | 80 | 96 | 112 | 144 | 176 | 224
PLLs | 6 | 6 | 6 | 10 | 12 | 12 | 12 | 12
Maximum User I/O Pins | 426 | 586 | 706 | 726 | 822 | 1,022 | 1,238 | 1,314
Engineering Sample Availability | Now | Use Production | Use Production | N/A | Now | N/A | Now | 2003
Production Device Availability | March 2003 | Now | Now | Now | March 2003 | April 2003 | January 2003 | 2003
43
FPGA Technology Roadmap
Year | 1995 | 1996 | 1997 | 2000 | 2003 | 2004 ?
Technology | 0.6 µm | 0.35 µm | 0.25 µm | 0.18 µm | 0.13 µm | 0.07 µm
Gate count | 25K | 100K | 250K | 1M | 100K LC, 8 Mb RAM, 400 18x18 multipliers |
Transistor count | 3.5M | 12M | 23M | 75M | 430M | 1B
Note: Xilinx Virtex-II Pro XC2VP100 (9/16/2003)
44
  • Advanced architecture of
    modern FPGAs

45
More guts
  • Additional components
      • RAM blocks
      • Dedicated multipliers
      • Tri-state buffers
      • Transceivers
      • Processor cores
      • DSP blocks

46
Dedicated Arithmetic Blocks
QuickLogic
Altera
Xilinx
47
Processor Cores
48
PowerPC on Virtex-II Pro
  • Embedded 300 MHz Harvard Architecture Core
  • Low Power Consumption: 0.9 mW/MHz
  • Five-Stage Data Path Pipeline
  • Hardware Multiply/Divide Unit
  • Thirty-Two 32-bit General Purpose Registers
  • 16 KB Two-Way Set-Associative Instruction Cache
  • 16 KB Two-Way Set-Associative Data Cache
  • Memory Management Unit (MMU)
      • 64-entry unified Translation Look-aside Buffers
        (TLB)
      • Variable page sizes (1 KB to 16 MB)
  • Dedicated On-Chip Memory (OCM) Interface
  • Supports IBM CoreConnect Bus Architecture
  • Debug and Trace Support
  • Timer Facilities

49
ARM in Excalibur
  • Industry-standard ARM922T 32-bit RISC processor
    core operating up to 200MHz
  • ARMv4T instruction set with Thumb extensions
  • Memory management unit (MMU) included for
    real-time operating systems (RTOS) support
  • Harvard cache architecture with 64-way set
    associative separate 8-Kbyte instruction and
    8-Kbyte data caches
  • Embedded programmable on-chip peripherals
      • ETM9 embedded trace module to assist software
        debugging
      • Flexible interrupt controller
      • Universal asynchronous receiver/transmitter
        (UART)
      • General-purpose timer
      • Watchdog timer

50
FPGA Tools
51
Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up
encryption with an RC5-like cipher with a fixed key
set on an 8031 microcontroller. Unlike in
experiment 5, this time your unit has to be able to
perform the encryption algorithm by itself,
executing 32 rounds.
VHDL description (Your Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input              : in  std_logic_vector(31 downto 0);
    data_output             : out std_logic_vector(31 downto 0);
    out_full                : in  std_logic;
    key_input               : in  std_logic_vector(31 downto 0);
    key_read                : out std_logic
  );
end RC5_core;  -- the closing name must match the entity name
Functional simulation
Synthesis
Post-synthesis simulation
52
Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
53
Active-HDL
54
Simulation Tools
Synthesis Tools
55
Logic Synthesis
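VHDL description
architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;
  signal B1 : STD_LOGIC;
  signal Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
  -- Selector built from L1 and L0; a named signal keeps the
  -- selected assignment legal in tools that reject (L1 & L0) directly.
  signal L : STD_LOGIC_VECTOR(1 downto 0);
begin
  -- Optional inversion of the inputs and the output
  A1 <= A  when (NEG_A = '0') else not A;
  B1 <= B  when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- The four candidate logic operations
  MUX_0 <= A1 and  B1;
  MUX_1 <= A1 or   B1;
  MUX_2 <= A1 xor  B1;
  MUX_3 <= A1 xnor B1;

  -- 4-to-1 multiplexer selecting the operation
  L <= L1 & L0;
  with L select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
Circuit netlist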
56
Features of synthesis tools
  • Interpret RTL code
  • Produce synthesized circuit netlist in a standard
    EDIF format
  • Give preliminary performance estimates
  • Some can display circuit schematics corresponding
    to EDIF netlist

57
Implementation
  • After synthesis the entire implementation process
    is performed by FPGA vendor tools
      • Xilinx ISE Foundation 6.2i
      • Altera Quartus II 4.0
      • 3rd party tools for Alliance version

58
Circuit Compilation
1. Technology Mapping
2. Placement
   Assign a logical LUT to a physical location.
3. Routing
   Select wire segments and switches for
   interconnection.
59
Routing Example
FPGA
Programmable Connections
60
Static Timing Analyzer
  • Performs static analysis of the circuit
    performance
  • Reports critical paths with all sources of delays
  • Determines maximum clock frequency

61
Static Timing Analysis
  • Critical path: the longest path from outputs of
    registers to inputs of registers
  • Min. clock period = length of the critical path
  • Max. clock frequency = 1 / min. clock period
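
The relation between the critical path and the maximum clock frequency can be written out with a worked number. The 8 ns critical-path delay below is a made-up value used only to show the arithmetic; real figures come from the timing analyzer report.

```latex
% T_min: minimum clock period, t_crit: delay of the critical path
T_{\min} = t_{\mathrm{crit}}, \qquad f_{\max} = \frac{1}{T_{\min}}

% Hypothetical example: t_crit = 8 ns
f_{\max} = \frac{1}{8\ \mathrm{ns}} = \frac{1}{8 \times 10^{-9}\ \mathrm{s}} = 125\ \mathrm{MHz}
```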

62
Configuration
  • Once a design is implemented, you must create a
    file that the FPGA can understand
  • This file is called a bitstream: a BIT file
    (.bit extension)
  • The BIT file can be downloaded directly to the
    FPGA, or can be converted into a PROM file which
    stores the programming information