Title: Programmable Logic Technologies
1. ECE U322 Digital Logic Design
Dec. 5, 2005
- Lecture 34
- Mapping Designs onto FPGAs
- FPGA Paradigms
- Homework 8 Due, Practice Final exam
2. Xilinx Architecture Summary
- CLBs sprinkled across chip in a regular array
- CLBs can implement any function of 4 input variables
- IOBs for interfacing to outside world
- Lots of different kinds of interconnect for efficient design
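The statement that a CLB can implement any function of 4 input variables follows from its lookup tables (LUTs): a 4-input LUT is a 16-entry truth table. A minimal software sketch of that idea follows. It is a model for illustration, not vendor code, and the names `make_lut4` and `lut4_eval` are hypothetical.

```python
def make_lut4(func):
    """Build a 16-bit LUT configuration word from a Boolean function of 4 inputs."""
    init = 0
    for addr in range(16):
        # Unpack the table address into the four input bits (a is the LSB).
        a, b, c, d = [(addr >> i) & 1 for i in range(4)]
        if func(a, b, c, d):
            init |= 1 << addr
    return init


def lut4_eval(init, a, b, c, d):
    """Return the stored output bit selected by the four inputs."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> addr) & 1


# The same "hardware" implements XOR or AND; only the 16 stored bits differ.
xor_init = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)   # 0x6996
and_init = make_lut4(lambda a, b, c, d: a & b & c & d)   # 0x8000
```

Since the configuration word has 16 bits, there are 2^16 = 65,536 possible functions, which covers every Boolean function of 4 inputs.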
3. Xilinx Design Process
- Step 1: Design
- Two design entry methods: HDL (Verilog or VHDL) or schematic drawings
- Step 2: Synthesize to create Netlist
- Translates V, VHD, SCH files into an industry standard format EDIF file
- Step 3: Implement design (netlist)
- Translate, Map, Place & Route
- Step 4: Configure FPGA
- Download BIT file into FPGA
[Figure: design flow. HDL code or Schematic -> Synthesize -> Netlist -> Implement -> BIT File]
4. Design a 101 string recognizer
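The slide names the design problem but the extracted text does not include the state diagram. A minimal behavioral sketch of one possible recognizer follows. Two choices are assumptions: the output is Mealy-style (asserted in the same cycle as the final 1), and overlapping matches are allowed, so 10101 contains two matches. The state names are hypothetical.

```python
# Next-state table: state -> (next state on input 0, next state on input 1)
NEXT = {
    "IDLE":  ("IDLE", "GOT1"),    # nothing useful seen yet
    "GOT1":  ("GOT10", "GOT1"),   # last bit was 1
    "GOT10": ("IDLE", "GOT1"),    # last two bits were 1, 0
}


def recognize_101(bits):
    """Return one output bit per input bit; 1 marks the end of a '101'."""
    state = "IDLE"
    out = []
    for b in bits:
        # Mealy output: a 1 arriving after "10" completes the pattern.
        out.append(1 if (state == "GOT10" and b == 1) else 0)
        state = NEXT[state][b]
    return out


result = recognize_101([1, 0, 1, 0, 1, 1, 0, 1])
# result == [0, 0, 1, 0, 1, 0, 0, 1]
```

With three states, two flip-flops hold the state. Each next-state bit and the output depend on at most three signals (two state bits and the input), so each fits in a single 4-input LUT.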
5. Implement on an FPGA
6. Special Features
- Clock management
- PLL, DLL
- Eliminate clock skew between external clock input and on-chip clock
- Low-skew global clock distribution network
- Support for various interface standards
- High-speed serial I/Os
- Embedded processor cores
- DSP blocks
7. Configuration Storage Elements
- Static Random Access Memory (SRAM)
- each switch is a pass transistor controlled by the state of an SRAM bit
- FPGA needs to be configured at power-on
- Flash Erasable Programmable ROM (Flash)
- each switch is a floating-gate transistor that can be turned off by injecting charge onto its floating gate
- FPGA itself holds the program
- reprogrammable, even in-circuit
- Fusible Links (Antifuse)
- forms a low resistance path when electrically programmed
- one-time programmable in special programming machine
- radiation tolerant
8. Configurable Logic Block (CLB)
- CLB groups 4 logic slices
- Fast connection to neighbors
- Connections for carry logic and shift register mode
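The dedicated carry connections exist so that arithmetic does not have to route its carries through general interconnect. A simplified model of one adder bit per LUT is sketched below. It assumes the usual structure in which the LUT computes the propagate signal, a dedicated XOR forms the sum, and a multiplexer passes or regenerates the carry. The function name is hypothetical and timing is not modeled.

```python
def add_with_carry_chain(a, b, width=8):
    """Add two unsigned numbers bit by bit along a modeled carry chain."""
    carry = 0
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi                  # propagate signal, computed in the LUT
        s = p ^ carry                # sum bit from the dedicated XOR
        carry = carry if p else ai   # carry mux: pass carry-in or regenerate
        total |= s << i
    return total, carry              # (sum modulo 2**width, carry out)


low, carry_out = add_with_carry_chain(200, 100)   # 300 = 256 + 44
# low == 44, carry_out == 1
```

When the propagate signal is 0 both operand bits are equal, so either one gives the correct carry out; when it is 1 the incoming carry is passed along unchanged.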
9. Example: Xilinx Virtex-II Pro
10. Virtex-II Pro Floorplan
- 1 to 4 PowerPCs
- 4 to 16 multi-gigabit transceivers
- 12 to 216 multipliers
- 3,000 to 50,000 logic cells
- 200k to 4M bits RAM
- 204 to 852 I/Os
- Up to 16 serial transceivers
- 622 Mbps to 3.125 Gbps
[Figure: Virtex-II Pro floorplan, with PowerPCs and logic cells labeled]
11. Logic Slice Architecture
- Two 4-input LUTs, each of which can also be used as
- 16-bit synchronous RAM
- 16-bit shift register
- Two flip-flops/latches
- Carry logic for arithmetic circuits (e.g. adder)
- Fast width expansion logic
- Implement logic functions with more than 4 inputs
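Width expansion can be illustrated with a 5-input function built from two 4-input LUTs and a 2:1 multiplexer: one LUT holds the function for e = 0, the other for e = 1, and e selects between them (Shannon expansion). The sketch below is a software model only; the helper names are hypothetical.

```python
def lut_eval(init, addr):
    """Read one bit from a 16-bit LUT configuration word."""
    return (init >> addr) & 1


def make_lut5_from_two_lut4(func5):
    """Split a 5-input function into two 4-input LUT words (e = 0 and e = 1)."""
    lo = hi = 0
    for addr in range(16):
        a, b, c, d = [(addr >> i) & 1 for i in range(4)]
        lo |= (func5(a, b, c, d, 0) & 1) << addr
        hi |= (func5(a, b, c, d, 1) & 1) << addr
    return lo, hi


def eval5(lo, hi, a, b, c, d, e):
    """Evaluate the 5-input function: both LUTs see a..d, the mux selects on e."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return lut_eval(hi, addr) if e else lut_eval(lo, addr)


def majority5(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


lo, hi = make_lut5_from_two_lut4(majority5)
```

The same step can be repeated: two 5-input functions and another multiplexer give a 6-input function, at the cost of doubling the LUT count each time.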
13. Xilinx Embedded Multipliers
14. Altera Embedded DSP Blocks
- Two DSP Block columns per device
- Number varies by height of column
- Can implement
- Eight 9x9 multipliers
- Four 18x18 multipliers
- One 36x36 multiplier
- Contains adder/subtractor/accumulator
- Registered inputs can become shift register
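The multiplier options in the list above are consistent with the usual way a wide multiplier is composed from narrower ones: a 36x36 product is the sum of four shifted 18x18 partial products. The sketch below shows this for unsigned operands only, which is a simplifying assumption; it does not describe how the DSP block handles signed values internally.

```python
MASK18 = (1 << 18) - 1


def mul36_from_18x18(x, y):
    """Unsigned 36x36 multiply built from four 18x18 partial products."""
    x_lo, x_hi = x & MASK18, x >> 18
    y_lo, y_hi = y & MASK18, y >> 18
    p_ll = x_lo * y_lo   # each product below fits one 18x18 multiplier
    p_lh = x_lo * y_hi
    p_hl = x_hi * y_lo
    p_hh = x_hi * y_hi
    # Shift each partial product to its weight and add.
    return p_ll + ((p_lh + p_hl) << 18) + (p_hh << 36)


x = (1 << 36) - 1
y = 0x9ABCDEF01
product = mul36_from_18x18(x, y)
```

The additions are the reason the block also contains adder circuitry: the four multipliers alone do not produce the final product.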
15. Altera Embedded DSP Block
16. Xilinx Rocket I/O
[Figure: Rocket I/O link. Labels: Virtex-II Pro on each side, 32b @ 78 MHz on each side, 3.125 Gb/s per pair]
- Virtex-4: 11.1 Gbps
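The 32-bit, 78 MHz and 3.125 Gb/s figures are related by simple arithmetic under two assumptions that are not stated on the slide: the link uses 8b/10b line coding (10 line bits per 8 payload bits), and "78 MHz" is the rounded value of 78.125 MHz. A short check, with a hypothetical function name:

```python
def serial_line_rate_bps(data_width_bits, parallel_clock_hz,
                         payload_bits=8, line_bits=10):
    """Line rate needed to carry a parallel data stream over a coded serial link."""
    payload_rate = data_width_bits * parallel_clock_hz   # bits/s of user data
    return payload_rate * line_bits / payload_bits       # add coding overhead


rate = serial_line_rate_bps(32, 78.125e6)
# 32 bits x 78.125 MHz = 2.5 Gb/s of payload, 3.125 Gb/s on the wire
```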
17. FPGA Vendors & Device Families
- Xilinx
- Virtex-II/Virtex-4: feature-packed high-performance SRAM-based FPGA
- Spartan 3: low-cost feature reduced version
- CoolRunner CPLDs
- Altera
- Stratix/Stratix-II
- High-performance SRAM-based FPGAs
- Cyclone/Cyclone-II
- Low-cost feature reduced version for cost-critical applications
- MAX3000/7000 CPLDs
- MAX-II Flash-based FPGA
- Actel
- Anti-fuse based FPGAs
- Radiation tolerant
- Flash-based FPGAs
- Lattice
- Flash-based FPGAs
- CPLDs (EEPROM)
- QuickLogic
- ViaLink-based FPGAs
18. State of the Art in FPGAs
- 90 nm process on 300 mm wafers
- Lower cost per function (LUT + register)
- Smaller and faster transistors: higher speed
- System speed up to 500 MHz
- Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
- Integrated transceivers running at 10 Gigabits/sec
- More Logic and Better Features
- >100,000 LUTs and flip-flops
- >200 embedded RAMs, and same number of 18 x 18 multipliers
- 1156 pins (balls) with >800 GP I/O
- 50 I/O standards, incl. LVDS with internal termination
- 16 low-skew global clock lines
- Multiple clock management circuits
- On-chip microprocessor(s) and multi-Gbps transceivers
19. Latest Devices: Capacity & Features
- Xilinx Virtex-4
- 90nm process
- Up to 960 I/Os
- >200,000 logic cells
- Up to 552 18kb block RAMs (10Mb RAM)
- 192 DSP slices (18x18 multiplier-accumulator)
- 20 digital clock managers (DCM)
- 24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
- Up to four PowerPC 405 cores
- Altera Stratix-II
- 90nm process
- Up to 1170 I/Os
- 179000 logic elements
- 9.6Mb embedded RAM
- 96 DSP blocks (384 18x18 multipliers, at four per block)
- 12 PLLs
- Serial I/O up to 1Gb/s
- No hard processor cores
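The Virtex-4 block RAM total quoted above (552 blocks of 18 kb, about 10 Mb) can be checked with a few lines of arithmetic. The sketch assumes 1 kb = 1024 bits; the variable names are only for illustration.

```python
BLOCKS = 552                 # block RAM count from the slide
BITS_PER_BLOCK = 18 * 1024   # 18 kb per block, assuming 1 kb = 1024 bits

total_bits = BLOCKS * BITS_PER_BLOCK
total_mbit_decimal = total_bits / 1e6      # about 10.17 (1 Mb = 10**6 bits)
total_mbit_binary = total_bits / 2**20     # about 9.70 (1 Mb = 2**20 bits)
```

Either convention rounds to the "10Mb" figure on the slide.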
20. Recap: Course Overview
21. Design of Integrated Systems
[Figure: Design and Verification]
22. Design Automation
- Design Automation is one of the most advanced areas in practical EE/CS
- many problems require sophisticated mathematical modeling
- many algorithms are computationally hard and require advanced and fine-tuned heuristics to work on realistic problem sizes
- boundary conditions need to be well declared and synchronized between different tools (patchwork to cover all holes)
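Placement is one example of a computationally hard design automation problem that is attacked with heuristics. The toy sketch below is a greedy pairwise-swap improver that reduces total Manhattan wire length for two-pin nets. It only illustrates the idea of a heuristic; it is not the algorithm of any particular tool, and all names are hypothetical.

```python
from itertools import combinations


def wirelength(placement, nets):
    """Total Manhattan distance over two-pin nets."""
    total = 0
    for u, v in nets:
        (x1, y1), (x2, y2) = placement[u], placement[v]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total


def improve_by_swaps(placement, nets):
    """Swap pairs of blocks for as long as some swap shortens the wiring."""
    placement = dict(placement)
    best = wirelength(placement, nets)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(placement), 2):
            placement[u], placement[v] = placement[v], placement[u]
            cost = wirelength(placement, nets)
            if cost < best:
                best, improved = cost, True
            else:
                # Undo a swap that did not help.
                placement[u], placement[v] = placement[v], placement[u]
    return placement, best


# Four blocks on a 2x2 grid, connected in a ring A-B-C-D-A.
nets = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
start = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
final, cost = improve_by_swaps(start, nets)   # wire length drops from 6 to 4
```

A greedy improver like this can stop in a local minimum on larger problems, which is one reason practical tools need the more advanced, fine-tuned heuristics the slide mentions.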
23. Design Challenges
- Systems are becoming huge, design schedules are getting tighter
- > 1 million to 1 billion gates becoming common for ASICs
- > 0.4 million lines of C code to describe system behavior
- > 5 million lines of RTL code
- Design teams are getting very large for big projects
- several hundred people
- differences in skills
- concurrent work on multiple levels
- management of design complexity and communication very difficult
- Design tools are becoming more complex but still inadequate
- typical designer has to run 50 tools on each component
- tools have lots of bugs, interfaces do not line up, etc.