Field Programmable Gate Arrays - PowerPoint PPT Presentation

Title: Field Programmable Gate Arrays

1
Field Programmable Gate Arrays

MAS863
How To Make (almost) Anything
Andrew "bunnie" Huang, E. Rehmi Post

2
Agenda

Lecture
Motivation and Application
Theory and Architecture
System Integration
Design Demo
How to use the tools and features
In-class Project
VGA display of moving ball

3
Introduction

Field Programmable Gate Array
'Field' as in field operations -- programmable in the field, as opposed to in the factory
Gate array
array of logic gates and storage elements
When and why would you use such a device?

4
Motivation

Computational Scenarios

[Figure: design space of computing platforms. Axis labels: response time (latency), complexity, cost, cost / volume. Regions: simple gates, PIC, embedded processors, PCs and workstations, FPGAs, ASIC (full-custom IC), and an "oops region". Additional label: Raw Speed and Interrupt Latency.]

5
Motivation

FPGAs span the middle ground
Fast design cycles
IP cores
reconfigurability
late binding decisions--hardware is no longer
cast in concrete
High Performance
excellent in latency-limited situations (network routers, real-time systems, timing generators), i.e. situations that need fine time resolution together with a good degree of complexity
Can be cost effective
Especially in low-volume scenarios vs. ASIC

6
Applications

Fast-turn, low volume ASIC (uninteresting)
Reconfigurable Hardware Processors
One-connector I/O solutions
Rapid prototyping

7
Applications RHP

Direct implementation of algorithms in hardware
circumvents instruction fetch, decode, issue
overhead
unrestricted parallelism
disadvantage: little hardware abstraction, difficult to use
RISC framework with reconfigurable instruction
set
user-defined instructions depending on process
context
prevents the MMX disease
easier to use, more hardware abstraction, but
lower performance

8
Applications RHP

Optimal ISA
compiler analyzes code and chooses an ISA optimal
for the problem, and bundles the hardware
description for the ISA with the code object
Configurable memory management and caching
useful for implementing special OS features
VM paging schemes directly in hardware
Ultimate RHP: one processor, any ISA
In the future - possibly adaptive processors
which automatically optimize their architecture
per application

9
Applications Direct Hardware

Ideal for implementing simple, repetitive
operations (overhead operations)
time synchronization on Novell networks
CAM lookup tables for IP routing and neural nets
encryption/decryption
FEA (finite element analysis)
Relaxation networks
database searching
higher performance with special architectures (embedded RAM)
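
A rough software sketch of the CAM lookup mentioned above, assuming a ternary CAM used for longest-prefix IP routing. The names (TernaryCam, add_route, lookup) and the route entries are hypothetical and chosen only for illustration. In hardware every entry is compared in the same clock cycle; the loop below only imitates that.

```python
import ipaddress


class TernaryCam:
    """Software model of a ternary CAM: each entry is (value, mask, result)."""

    def __init__(self):
        self.entries = []

    def add_route(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        self.entries.append((int(net.network_address), int(net.netmask), next_hop))
        # Keep longer prefixes first so the first hit is the longest match,
        # mimicking a priority encoder on the match lines.
        self.entries.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)

    def lookup(self, addr):
        key = int(ipaddress.ip_address(addr))
        # A hardware CAM compares all entries at once; this loop is sequential.
        for value, mask, next_hop in self.entries:
            if key & mask == value:
                return next_hop
        return None


# Hypothetical routing table for illustration.
cam = TernaryCam()
cam.add_route("10.0.0.0/8", "port0")
cam.add_route("10.1.0.0/16", "port1")
cam.add_route("0.0.0.0/0", "default")
```

The sort stands in for entry priority; a real design would fix priority by the physical order of the CAM rows.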

10
Applications I/O solutions

One-connector I/O solutions
use a single connector with any protocol desired
e.g. a DB-25 which can do SCSI, IEEE 1284 parallel, or serial
ideal for space-limited applications
Object oriented hardware
devise a system such that a device plugged into
the I/O port uploads the hardware configuration
necessary to implement the communications
protocol
protocol upgrades are a cinch
limited by electrical signalling compatibility
issues
drawback - can be confusing to users, potentially
damaging to hardware

11
Applications EA

Evolutionary algorithms
some research done on FPGAs already
tone recognition application
possibly requires intimate knowledge of FPGA
hardware
vendor licensing issues
EA apps do not map well into current FPGA
architectures
however, with the right FPGA, EA could yield very interesting results

12
Applications Rapid Prototyping

FPGAs are a handy thing to have on the lab bench
simple digital circuits no longer require wiring
or parts ordering
modification and duplication of existing designs
is relatively straightforward
with the right design tools, hardware design
re-use is an additional benefit

13
General Architecture
[Figure: blocks labeled "compute", "remember", and "connect", together with "CONFIGURE".]
Terminology: Granularity, Configuration, and Routing

14
Architecture Varieties

Primary classifications for FPGAs
configuration method
granularity
routing architecture
Other practical considerations
density
speed
cost
design tools
vertical migration

15
Architecture Varieties

EPAC
Electrically Programmable Analog Circuit
Contains programmable gain amplifiers,
comparators, multiplexers, DACs, track-and-hold,
filtering components
Made by iMP

16
Architecture Configuration

Configuration method
In-circuit programmable methods
SRAM based (Xilinx 2K/3K/4K, Altera 8K/10K,
Lucent Orca)
volatile, but fast configuration times
must reprogram on every power-up
some architectures offer partial reconfiguration
(Atmel)
most expensive in terms of area and timing costs
standard CMOS process
EEPROM based (Altera 7K/9K)
nonvolatile, slow configuration; sometimes requires extra voltages for programming and erasing
special silicon processing required

17
Architecture Configuration

Configuration method (contd)
Pre-assembly programmable methods
Antifuse based (Actel, QuickLogic FPGAs)
nonvolatile, very fast links
permanent configuration (OTP)
smallest link size (lower cost)
special silicon processing technology required
(E)EPROM based (Altera 5K, 7K, Xilinx 7200, 7300)
nonvolatile, moderate performance
reprogrammable after special erase cycle
medium-sized link
special silicon processing technology required

18
Architecture Granularity

Granularity
Defined as ratio of logic per cell versus routing
Very fine-grained architectures
Partial set of n-input boolean functions per cell
Roughly 6:1 ratio of logic inputs to registers per cell
Atmel, Actel
Fine-grained architectures
Full set of n-input boolean functions per cell
Sometimes multiple n-input boolean functions per
cell
Roughly 8:1 ratio of logic inputs to registers per cell
Well-suited for state machines, simple
arithmetic, pipelined applications
Xilinx 3K/4K, Altera 8K/10K

19
Architecture Granularity

Granularity (contd)
Coarse-grained architectures
PLD-style product term arrays
Roughly 32:1 ratio of logic inputs to registers per cell
Well-suited for address decoding, complicated
arithmetic operations, datapath operators,
complex state machines
Poorly suited for pipelined applications and
simple operations
Altera 5K/7K, Xilinx 7K
Dual-grained or hierarchical architectures
Combines coarse and fine-grained features
Often exhibit separate local and global routing
resources
Lucent Orca, Altera 9K

20
Architecture Routing

Routing method
Fine-grained
Short hops (1 to 8 logic cells spanned per track)
Path-dependent timing
Exhibits high density
Flexible switch matrices
Fewer logic placement constraints
Coarse-grained
Tracks span entire chip
Fixed timing regardless of logic placement
Lower density
Logic placement constrained by routing
availability

21
Architecture Routing

Routing method (contd)
Hierarchical routing
Local, fine-grained routing between cells
Global, coarse-grained routing between groups of cells
Usually path-dependent timing
Best of both worlds, but can be difficult to utilize efficiently
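
A toy delay model may help contrast the three routing styles described above. Every number and name here (HOP_DELAY_NS, SPAN_CELLS, LONGLINE_DELAY_NS, local_radius) is an assumption made up for illustration, not a figure for any real device.

```python
import math

HOP_DELAY_NS = 0.8       # assumed delay of one switch matrix plus one short track
SPAN_CELLS = 1           # assumed cells spanned per short track (the slides say 1 to 8)
LONGLINE_DELAY_NS = 4.0  # assumed fixed delay of a chip-spanning track


def manhattan(src, dst):
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def fine_grained_delay(src, dst):
    """Path-dependent: delay grows with the number of short hops."""
    hops = math.ceil(manhattan(src, dst) / SPAN_CELLS)
    return hops * HOP_DELAY_NS


def coarse_grained_delay(src, dst):
    """Fixed timing: a chip-spanning track costs the same for any placement."""
    return LONGLINE_DELAY_NS


def hierarchical_delay(src, dst, local_radius=3):
    """Short hops for near neighbours, a global track for everything else."""
    if manhattan(src, dst) <= local_radius:
        return fine_grained_delay(src, dst)
    return coarse_grained_delay(src, dst)
```

Under these assumed numbers, neighbouring cells see well under the long-line delay, while distant cells are capped at it, which is the "best of both worlds" trade the slide refers to.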

22
Architecture Other Practical

Density, speed and vertical migration
Altera FLEX 8K is targeted at density-driven apps
Altera MAX 7K is targeted at performance-driven apps
Xilinx 4K series targets both density and performance, with good vertical migration from 3K gates to 250K gates (the Altera 10K is the Xilinx 4K competitor)
Xilinx 6200 series targets reconfigurable
hardware applications

23
Architecture Design Tools

Design tools - the other half of the equation
FPGA is useless without good design tools
Design tools slowly progressing to acceptable
levels
Entry methods include HDL, schematic
Compilers are improving! Xilinx's most recent compiler can place and route reasonably tough designs in about fifteen minutes; very tough designs will finish in a half hour or not at all.
Xilinx Foundation Series / M1 technology
Altera MAX+PLUS II

24
Architecture Cost

Cost formulas for FPGAs are complex
OTP FPGAs tend to be cheaper
Established lines are cheaper than new lines
Cost increases exponentially with performance and
density
Some lines are targeted at cost-sensitive applications (Altera 7K)
Not all speed grade-density combos available from
manufacturers

25
Detailed Architecture Xilinx 4000E

Fine-grained logic, SRAM based, with fine-grained
routing
Array of CLBs embedded in single length / double
length / quad length / longline routing resources
PSM
CLB: Configurable Logic Block
Two 4-input LUTs (LookUp Tables) and one 3-input
LUT
Two SR D-type flip flops
Bypass paths and carry/cascade logic
PSM: Programmable Switch Matrix
10 interconnect points per matrix
Each interconnect contains six pass transistors
for full connectivity between four directions
Located at intersections of single and double
length lines
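
To make the lookup-table idea concrete, here is a minimal Python model of the CLB logic described above: two 4-input LUTs feeding one 3-input LUT. The labels F, G, and H, the helper lut_from_function, and the parity example are choices made for this sketch; flip flops, bypass paths, and carry logic are left out.

```python
class Lut:
    """An n-input lookup table: the truth table itself is the configuration."""

    def __init__(self, n_inputs, init):
        self.n = n_inputs
        self.init = init  # bit i holds the output for input pattern i

    def __call__(self, *bits):
        assert len(bits) == self.n
        index = 0
        for position, bit in enumerate(bits):
            index |= (bit & 1) << position
        return (self.init >> index) & 1


def lut_from_function(n_inputs, fn):
    """Build the configuration word by evaluating fn on every input pattern."""
    init = 0
    for index in range(1 << n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        if fn(*bits):
            init |= 1 << index
    return Lut(n_inputs, init)


# Two 4-input LUTs (F and G) feed a 3-input LUT (H), which gives a
# 9-input parity function from one CLB-like block.
F = lut_from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
G = lut_from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
H = lut_from_function(3, lambda f, g, x: f ^ g ^ x)


def parity9(bits):
    return H(F(*bits[0:4]), G(*bits[4:8]), bits[8])
```

Any 4-input boolean function fits in the same 16-bit word; only the contents of init change, which is what loading a bitstream amounts to for this part of the device.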

26
Detailed Architecture Xilinx 4000E
27
Detailed Architecture Xilinx 4000E
28
Detailed Architecture Xilinx 4000E
29
Detailed Architecture Xilinx 4000E
30
Detailed Architecture Xilinx 4000E
31
Detailed Architecture Xilinx 4000E

Configuration
total (device) reconfiguration (no partial
reconfig)
several configuration modes available
parallel and serial modes
master and slave modes
daisy chain ability
device bitstreams between 50 Kbits and 400 Kbits
config rate around 10 Mbit/sec
fastest full reconfiguration takes a few tens of milliseconds
typical reconfiguration takes a couple of seconds
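
The "few tens of milliseconds" figure follows from the bitstream sizes and configuration rate quoted above. A quick check, assuming 1 Kbit = 1000 bits and no protocol overhead (both assumptions of this sketch):

```python
def config_time_ms(bitstream_bits, rate_bits_per_s):
    """Time to shift a whole bitstream into the device, in milliseconds."""
    return bitstream_bits / rate_bits_per_s * 1000.0


RATE = 10_000_000  # about 10 Mbit/sec, as quoted above

smallest = config_time_ms(50_000, RATE)   # smallest device bitstream
largest = config_time_ms(400_000, RATE)   # largest device bitstream
print(f"{smallest:.0f} ms to {largest:.0f} ms")
```

This gives roughly 5 ms for the smallest device and 40 ms for the largest. The much longer typical figure presumably reflects slower configuration sources, though the slides do not say so.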

32
Detailed Architecture Xilinx 4000E

Other features
distributed RAM
CLB LUTs can function as a 32x1, 16x1, or 16x2
RAM
synchronous RAM options available
internal tri-state buffers
global routing resources
JTAG boundary scan
configuration readback
programmable slew rate and logic levels in IOBs
common per-package pinout for all devices
allows for easy vertical migration
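
A small model of how the two 4-input LUTs in a CLB could act as the 16x1, 16x2, and 32x1 RAM listed above. The class names are hypothetical, and using the fifth address bit to choose between the two LUTs is an assumption of this sketch rather than something the slides state.

```python
class LutRam16x1:
    """A 4-input LUT whose 16 configuration bits are writable at run time."""

    def __init__(self):
        self.cells = [0] * 16

    def write(self, addr, bit):
        self.cells[addr & 0xF] = bit & 1

    def read(self, addr):
        return self.cells[addr & 0xF]


class ClbRam:
    """Two 16x1 LUT RAMs combined as either a 16x2 or a 32x1 memory."""

    def __init__(self):
        self.f = LutRam16x1()
        self.g = LutRam16x1()

    # 16x2: both LUTs share the address, each stores one data bit.
    def write16x2(self, addr, value):
        self.f.write(addr, value & 1)
        self.g.write(addr, (value >> 1) & 1)

    def read16x2(self, addr):
        return self.f.read(addr) | (self.g.read(addr) << 1)

    # 32x1: the fifth address bit selects which LUT holds the word.
    def _select(self, addr):
        return self.g if addr & 0x10 else self.f

    def write32x1(self, addr, bit):
        self._select(addr).write(addr, bit)

    def read32x1(self, addr):
        return self._select(addr).read(addr)
```

The same 32 storage bits serve both organizations; only the address and data wiring differs.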

33
System Integration

FPGAs offer flexible I/O solutions
laying out a board around an FPGA is very nice
the newest FPGAs, especially Virtex, have multi-standard I/O support
Requires a source of configuration data
Host computer, parallel or serial
Serial ROM
fewest wires -- CCLK, DIN, INIT, PROG, sometimes DOUT
FLASH ROM controlled by dedicated config
circuitry
Combination of both

34
Serial Programming
Slave and Master modes
35
Programming From a ROM
36
That's Nice. How Do I Use It?!

Present basic design flow
Work through a demo implementation

37
Design Tools Process
Libraries
IP Cores
Design description (HDL, schematic)
Technology mapping
Place
Route
Errors
Timing Analysis
Bitstream
FPGA
38
Design Tools Design Entry

HDL
Verilog, VHDL or proprietary language (AHDL, etc.)
Verilog is like C with multithreading and strict typing
VHDL stands for VHSIC HDL; intended for detailed simulations; commissioned by the military; very complex
Ideal for large designs because of well-defined scoping and instantiation rules; top-down design
Also ideal for state machines, decoders/encoders, and odd or awkward busses
Hardware mapping is difficult
very easy to make inefficient designs; subtle semantic choices can lead to drastic performance variations
hard to specify hardware-specific features such
as carry chains
hard to specify placement and routing info

39
Design Tools Design Entry

Schematic entry
more intuitive, easier to observe design flow
helpful when trying to optimize designs for speed
or area
difficult when implementing large amounts of
miscellaneous logic (state machines)
hierarchical schematic tools help make large designs more manageable
global changes difficult (hard to change global
mistakes)
hardware mapping is much easier
schematic primitives for special hardware
features
schematic attributes for routing info
WYSIWYG design entry

40
Design Tools Hardware Mapping

Many options for HDL to hardware mapping
vendor-specific options
third party tools
EDIF is the most common intermediate language
When using HDLs, good hardware mapping tools are critical for performance and device utilization
Deep understanding of HDL is also useful
Schematic hardware mapping is much easier - very
close to WYSIWYG editing
Hardware mappers often perform aggressive logic
optimizations - watch your assumptions! (hazards)
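
One way to see the hazard warning above: a term that is logically redundant can still be the one that suppresses a glitch, and an optimizer is free to remove it. The sketch below uses the textbook consensus-term example with a crude one-step inverter delay. It is a gate-level illustration with hypothetical function names, not a model of any particular mapper or FPGA.

```python
from itertools import product


def with_consensus(a, b, c):
    # The b & c term is redundant in the truth table but bridges the a transition.
    return (a & b) | ((1 - a) & c) | (b & c)


def optimized(a, b, c):
    # The minimal form a logic optimizer would typically produce.
    return (a & b) | ((1 - a) & c)


# Both forms have the same truth table.
same_truth_table = all(
    with_consensus(a, b, c) == optimized(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)


def glitch_trace(use_consensus):
    """Hold b = c = 1 while a falls; the inverter output lags a by one step."""
    b = c = 1
    a_wave = [1, 1, 0, 0]
    not_a = 1 - a_wave[0]
    trace = []
    for a in a_wave:
        out = (a & b) | (not_a & c)
        if use_consensus:
            out |= b & c
        trace.append(out)
        not_a = 1 - a  # the inverter catches up after this step
    return trace
```

With the assumed delay, the optimized form dips to 0 for one step while the redundant form stays at 1, even though the two are logically equal.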

41
Design Tools Place and Route

Place and route tools are always vendor-specific
Much progress remains in place and route tools
typical place-and-route times for a reasonably complex design are around 30 minutes to an hour
device utilization and performance still well
below that of hand-placed and routed designs
Many vendors offer hand-placement or tweaking
tools for speed and area critical applications
Partial compilation of macros in the works

42
Design Tools Timing Analysis

Especially important for path-dependent delay devices
Designs often iterate at this point - critical
path is extracted and optimized
Timing analysis tools also have a ways to go
difficult to analyze designs with multiple clock
domains
impossible to analyze designs with combinational loops
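
Critical-path extraction can be sketched as a longest-path search over the netlist graph, which also shows why combinational loops defeat the analysis: there is no valid ordering of the nodes. The function name, node names, and delay values below are hypothetical, and real tools also account for routing delay, setup times, and clock domains.

```python
from graphlib import CycleError, TopologicalSorter


def critical_path(delays, fanin):
    """delays: node -> delay in ns; fanin: node -> list of driving nodes.

    Returns (total delay, list of nodes on the slowest path).
    Raises CycleError if the netlist contains a combinational loop.
    """
    order = list(TopologicalSorter(fanin).static_order())
    arrival, via = {}, {}
    for node in order:
        drivers = fanin.get(node, [])
        # The latest-arriving driver determines this node's arrival time.
        worst = max(drivers, key=lambda d: arrival[d], default=None)
        arrival[node] = delays[node] + (arrival[worst] if worst is not None else 0.0)
        via[node] = worst
    end = max(arrival, key=arrival.get)
    path = [end]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return arrival[end], path[::-1]


# Hypothetical register-to-register path through two LUTs and a carry stage.
delays = {"ff_q": 1.0, "lut_a": 2.5, "lut_b": 3.0, "carry": 1.5, "ff_d": 0.5}
fanin = {
    "lut_a": ["ff_q"],
    "lut_b": ["ff_q"],
    "carry": ["lut_a", "lut_b"],
    "ff_d": ["carry"],
}
total_ns, path = critical_path(delays, fanin)
```

The path returned is what a designer would then try to shorten before running the flow again.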

43
Design Tools Bitstream management

Bitstreams can be merged
daisy-chained devices

44
Configuration

Applies to ISP devices only
Many options
serial ROM
master mode with standard ROM
slave of intelligent host or another FPGA in a
daisy chain
Serial ROM
very popular in ASIC-style applications
low pin and parts count, but sometimes slower

45
Configuration

Master mode with parallel ROM device
FPGA drives a ROM's address bits and reads data from the ROM
expensive in terms of pins, but pins can be
reused in some designs
sometimes faster than serial methods
Slave modes
intelligent host configures FPGA
host can be a PC or another FPGA in master mode
most flexible method
many FPGA architectures allow daisy chaining

46
Other Considerations

Design for the future
vertical migration
largest FPGA for your budget
Be wary of logic interface levels and new low
voltage devices
Pin-locking
some FPGA architectures perform very poorly under
pin-locking (Altera 8K in particular)
all architectures experience some performance
loss under pin-locking
I/O count
many designs are I/O limited, not logic limited
System performance, not just logic performance
includes I/O times, routing times, and clock skew
compare against the FF toggle rates often quoted by vendors
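
A back-of-the-envelope comparison shows why a quoted flip-flop toggle rate says little about system speed. All delay values below are invented for illustration, and the function name is hypothetical.

```python
def max_clock_mhz(clk_to_q_ns, logic_ns, routing_ns, setup_ns, skew_ns):
    """Highest clock rate for one register-to-register path, in MHz."""
    period_ns = clk_to_q_ns + logic_ns + routing_ns + setup_ns + skew_ns
    return 1000.0 / period_ns


# A flip-flop feeding itself with no logic or routing: the "toggle rate".
toggle_rate = max_clock_mhz(2.0, 0.0, 0.0, 2.0, 0.0)

# The same flip-flops with assumed logic, routing, and skew between them.
system_rate = max_clock_mhz(2.0, 6.0, 8.0, 2.0, 2.0)
```

With these assumed numbers the toggle rate is 250 MHz while the full path runs at 50 MHz, and off-chip I/O times would lower that further.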