ECE 636 Reconfigurable Computing Lecture 25 Course Wrapup

Provided by: RussTe7

1
ECE 636 Reconfigurable Computing
Lecture 25: Course Wrap-up
2
What is Reconfigurable Computing?
  • Computation using hardware that can adapt at the
    logic level to solve specific problems
  • Why is this interesting?
  • Some applications are poorly suited to
    microprocessors.
  • VLSI explosion provides increasing resources.
  • Hardware/Software
  • Relatively new research area.

3
Design abstractions
4
Processor + FPGA
Three possibilities
1. FPGA serves as coprocessor for data-intensive
applications (possible project).
[Figure labels: daughtercard, Proc, FPGA, chip, backplane bus (e.g. PCI)]
2. FPGA serves as embedded computer for low-latency
transfer.
[Figure labels: FPGA, chip, Proc]
3. Reconfigurable functional unit
5
Xilinx XC4000 Cell
  • Two 4-input look-up tables
  • One 3-input look-up table
  • Two D flip-flops
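
As a rough illustration of how look-up tables like these compose, the sketch below models each table as a truth table indexed by its input bits. The wiring shown (the two 4-input tables, called F and G here, feeding the 3-input table H together with one extra input) and every name in the code are assumptions made for illustration; the flip-flops and the rest of the cell's multiplexing are left out.

```python
def make_lut(func, num_inputs):
    """Precompute a truth table (list of 0/1 values) for a Boolean function."""
    table = []
    for index in range(2 ** num_inputs):
        bits = [(index >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, bits):
    """Return the table entry selected by a tuple of input bits (bit 0 first)."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return table[index]


# Hypothetical configuration: F = AND, G = XOR, H = majority vote.
F = make_lut(lambda a, b, c, d: a & b & c & d, 4)
G = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
H = make_lut(lambda f, g, h1: (f & g) | (f & h1) | (g & h1), 3)


def cell_output(f_in, g_in, h1):
    """Combinational output of the assumed F/G -> H arrangement."""
    f_out = lut_read(F, f_in)
    g_out = lut_read(G, g_in)
    return lut_read(H, (f_out, g_out, h1))
```

Reprogramming the cell amounts to replacing the three tables; the lookup logic itself never changes.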

6
Xilinx XC4000 Routing
7
Actel Programmable Gate Arrays
  • Rows of programmable logic building blocks +
    rows of interconnect
  • Anti-fuse technology: program once
  • Use anti-fuses to build up long wiring runs
    from short segments
  • 8-input, single-output combinational logic
    blocks
  • FFs constructed from discrete cross-coupled
    gates
[Figure: logic modules and wiring tracks, with "I/O Buffers, Programming and Test Logic" labeled on each of four sides]
8
Altera Max 7000 Macrocell
9
Example DPGA Prototype
10
FPGA vs. DPGA Compare
11
Min-cut bisecting partitioning
[Figure: nodes A, B, C, D divided between partition 1 and partition 2]
12
Hill Climbing Algorithms
  • To avoid getting trapped in local minima,
    consider a hill-climbing approach
  • Need to accept worse solutions (make bad
    moves) to reach the global minimum.
  • Acceptance is probabilistic. Only accept
    cost-increasing moves some of the time.
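
One common way to make acceptance probabilistic is the Metropolis rule used in simulated annealing, sketched below. The slide does not name a specific rule, so the exponential acceptance probability, the cooling schedule, and all function and parameter names here are assumptions for illustration.

```python
import math
import random


def accept_move(delta_cost, temperature, rng):
    """Accept improving moves always; accept worsening moves with
    probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(cost, neighbor, start, rng, temperature=10.0, cooling=0.95, steps=2000):
    """Search by random moves, remembering the best solution seen."""
    current = best = start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if accept_move(cost(candidate) - cost(current), temperature, rng):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temperature *= cooling  # accept fewer bad moves as the search cools
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    bumpy = lambda x: (x - 7) ** 2 + 10 * math.sin(3 * x)  # many local minima
    step = lambda x, r: x + r.uniform(-1.0, 1.0)
    print(anneal(bumpy, step, start=-5.0, rng=rng))
```

At high temperature most cost-increasing moves are accepted, which lets the search leave a local minimum; as the temperature falls the search behaves more and more like plain greedy descent.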

[Figure: cost plotted over the solution space]
13
Routing Tradeoffs
  • Bias router to find first, best route.
  • Vary number of node expansions using
    $pcost_i = (1 - a) \times pcost_{i-1} + ncost_i + a \times dist_i$
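
To see how a single weight can change the number of node expansions, the sketch below runs a best-first search on an open grid. The priority used here, (1 - a) x (accumulated path cost) + a x (Manhattan distance to the target), is a simplified assumption and may not match the slide's exact expression; the grid, unit step cost, and function name are also hypothetical.

```python
import heapq


def directed_search(width, height, source, target, a):
    """Best-first search on an open grid.

    Returns (path_cost, expansions). a = 0 ignores distance to the target;
    larger a biases the search toward the target.
    """
    def dist(node):
        return abs(node[0] - target[0]) + abs(node[1] - target[1])

    best = {source: 0}
    heap = [(a * dist(source), 0, source)]
    expansions = 0
    while heap:
        _, cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper path to this node was found
        expansions += 1
        if node == target:
            return cost, expansions
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
                new_cost = cost + 1  # unit cost per grid step
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    priority = (1 - a) * new_cost + a * dist(nxt)
                    heapq.heappush(heap, (priority, new_cost, nxt))
    return None, expansions


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.9):
        print(a, directed_search(10, 10, (0, 0), (9, 9), a))
```

On an uncongested grid a larger weight reaches the target after far fewer expansions; with congestion costs added, a heavily biased search can return a worse route, which is the tradeoff the slide title refers to.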

14
Architectural Limitation
  • Routing architecture necessitates domain
    selection.
  • Bigger effect for multi-fanout nets

15
Two-dimensional Layout
  • Control network supports distributed signals.
  • Data routed as four-bit values.

16
RaPiD Datapath
  • Segmented linear architecture
  • All RAMs and ALUs are pipelined
  • Bus connectors also contain registers

17
Basic Functional Unit
  • Two inputs from adjacent blocks.
  • Local memory for instructions, data.

18
CHESS Interconnect
  • More like an FPGA
  • Takes advantage of near-neighbor connectivity

19
FPICs
  • High internal connectivity
  • Not always cost effective

20
Hierarchical Crossbar
  • Full connectivity occurs at top level
  • Routing between FPGAs requires determining level
    at which source and destination share an
    ancestor.
  • Simplifies routing
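
The level lookup can be sketched as follows, assuming FPGAs are numbered left to right as leaves of a tree in which every crossbar has the same fanout. The numbering scheme, the uniform fanout, and the function name are assumptions for illustration.

```python
def shared_level(src, dst, fanout):
    """Return the lowest crossbar level at which two FPGAs share an ancestor.

    FPGAs are leaves numbered from 0. Level 0 means source and destination
    are the same FPGA; level 1 is the crossbar directly above the leaves.
    """
    level = 0
    while src != dst:
        src //= fanout  # move each endpoint up to its parent crossbar
        dst //= fanout
        level += 1
    return level


if __name__ == "__main__":
    print(shared_level(0, 3, fanout=4))
    print(shared_level(0, 4, fanout=4))
```

Because the route is fully determined once this level is known (up from the source to that crossbar, then down to the destination), no search is needed.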

21
Linear Array
  • Current hardware
  • Programs implemented as systolic array
  • Input key
  • Search each RAM bank for sequence

22
Emulation Software Steps
  • Netlist Translation
  • Technology Mapping
  • Partitioner: divide netlist into fixed-sized chunks
  • Global Placer: locate an FPGA for a chunk
  • Global Router: make connections between devices
  • FPGA-specific place and route (Xilinx PR)
  • FPGA bitstreams
  • Many of these are dependent on device
    interconnect topology
23
Simulation Acceleration
  • FPGA system takes the place of one portion of
    simulated design
  • Inputs transported to FPGA system.
  • Outputs returned from FPGA system.

24
Network Routing
  • FPGAs popular in network hardware
  • New protocols implemented directly in silicon
  • Easy to upgrade in the field
  • Washington University Gigabit Switch (WUGS)
  • Switch provides up to 160 Gbps of bandwidth.

25
Pyramid Operations
  • Gaussian Pyramid
  • Down sample image to compress image size for
    communication.
  • Average over a set of points to create new point
  • Laplacian Pyramid
  • Determine error found from Gaussian Pyramid
  • Expand contracted picture and compare with
    original
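
A minimal sketch of one pyramid level follows. It assumes a plain 2x2 box average for the reduction and pixel replication for the expansion; practical Gaussian pyramids usually apply a weighted kernel instead, so treat the filter choice and all function names as illustrative assumptions.

```python
def reduce_level(image):
    """Downsample by averaging each 2x2 block.

    image is a list of rows with even width and height.
    """
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def expand_level(image):
    """Upsample by replicating each pixel into a 2x2 block."""
    expanded = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        expanded.append(wide)
        expanded.append(list(wide))
    return expanded


def laplacian_level(original):
    """Return (reduced image, error between original and expanded reduction)."""
    reduced = reduce_level(original)
    predicted = expand_level(reduced)
    error = [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, predicted)
    ]
    return reduced, error


if __name__ == "__main__":
    image = [[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 6, 6], [4, 4, 8, 8]]
    print(laplacian_level(image))
```

Adding the error image back to the expanded reduction recovers the original exactly, so only the small reduced image and the error need to be communicated.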

26
Gaussian Pyramid Implementation
  • Systolic array in which each device performs a
    separate function.
  • Limited by clock rate of slowest device.

27
Proposed Data Acquisition System
[Block diagram; labeled components:]
  • FPGA1 Stratix EP1S40 (Data Processing)
  • FPGA2 Stratix EP1S40 (Storage Control)
  • Data processing memory: SRAM 1 x 512K x 36 and
    SRAM 3 x 512K x 36
  • Radar unit: analog H channel and V channel, each
    with an AD6645 (105 MSPS); radar positioner data
    channel with an AD974 (200 KSPS); radar control
    interface
  • Hard disk interface: ATA66 IDE channels 0 and 1,
    each with a 3.3 to 5 V buffer
  • Gigabit Ethernet interface: Gigabit Ethernet core,
    64K x 16 dual-port RAM, Gigabit Ethernet PHY, RJ45
  • 10/100 Mbps Ethernet interface: Ethernet PHY, RJ45
  • Atmel AT91RM9200 microcontroller (ARM RISC core,
    209 MHz, 32 bit)
  • Memories: software flash 1 x 4M x 16 (configuration
    memory), boot flash 2 x 1M x 16 (program memory),
    SDRAM 2 x 8M x 16 (data memory)
  • Other blocks: MAX 7000A PLD, 3.3 V buffer, Ethernet
    controller, USB interface, USB block, JTAG port,
    RS232 driver, serial port
  • Bus-width labels: 14, 16, 30, 32, 36, 62, 64
28
Detailed View of Dharma
29
Chimaera Architecture
  • Live copy of register file values feed into array
  • Each row of array may compute from registers or
    intermediates
  • Tag on array to indicate RFUOP

30
Chimaera Architecture
  • Array can operate on values as soon as placed in
    register file.
  • Logic is combinational
  • When RFUOP matches
  • Stall until result ready
  • Drive result from matching row
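
The tag match can be pictured with the toy model below. It covers only the lookup of a row by its RFUOP tag and the combinational evaluation from register file values; stalling and timing are not modeled, and the class, method, and register names are hypothetical.

```python
class ReconfigurableArray:
    """Toy model: each configured row holds an RFUOP tag and a function."""

    def __init__(self):
        self.rows = []  # entries are (tag, function, source register names)

    def configure_row(self, tag, func, sources):
        """Load one row with the operation it implements and its inputs."""
        self.rows.append((tag, func, sources))

    def execute(self, rfuop, register_file):
        """Drive the result from the row whose tag matches the RFUOP."""
        for tag, func, sources in self.rows:
            if tag == rfuop:
                operands = [register_file[name] for name in sources]
                return func(*operands)
        raise LookupError("no row configured for RFUOP %r" % (rfuop,))


if __name__ == "__main__":
    array = ReconfigurableArray()
    array.configure_row(3, lambda a, b: (a + b) << 1, ("r1", "r2"))
    print(array.execute(3, {"r1": 2, "r2": 5}))
```

In this model a missing tag raises an error; a real system would more plausibly load the needed configuration first, which the sketch does not attempt.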

31
Chimaera Results
  • Three SPEC92 benchmarks
  • Compress: 1.11 speedup
  • Eqntott: 1.8
  • Life: 2.06
  • Small arrays with limited state
  • Small speedup
  • Perhaps focus on global router rather than local
    optimization.

32
Garp
  • Integrate as coprocessor
  • Similar bandwidth to processor as functional unit
  • Own access to memory
  • Support multi-cycle operation
  • Allow state
  • Cycle counter to track operation
  • Configuration cache, path to memory

33
Garp Array
  • Row-oriented logic
  • Dedicated path for processor/memory
  • Processor does not have to be involved in
    array-memory path

34
System Model: Adaptive Viterbi Decoder
35
Compression Techniques
  • Effectively we can consider an FPGA device as a
    collection of cells, each with (x, y) location.
  • Instead of using a serial bit stream, could
    consider loading data cell-by-cell like a
    standard memory.
  • Specify location of cell through use of two
    registers.

36
Hardware Support for Runlength
  • Initially latch in base
  • Down counter indicates number of strides to take.
  • Offset used to augment initial base
  • Fairly simple to implement.
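
The base, down counter, and offset described above can be sketched as a small address generator. The sketch assumes the base address itself is the first location written and that one value is repeated across the run; the function names and the flat memory model are hypothetical.

```python
def runlength_addresses(base, offset, count):
    """Yield the base address, then `count` further addresses `offset` apart."""
    address = base  # the base is latched first
    yield address
    while count > 0:  # down counter: number of strides still to take
        address += offset  # the offset augments the running address
        count -= 1
        yield address


def load_run(memory, base, offset, count, value):
    """Write one value to every address in the run."""
    for address in runlength_addresses(base, offset, count):
        memory[address] = value


if __name__ == "__main__":
    cells = [0] * 8
    load_run(cells, base=1, offset=2, count=3, value=9)
    print(cells)
```

A run is described by three small numbers rather than a list of addresses, which is where the compression comes from.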

37
Determining Communication Level
[Figure: communication levels. Application level: Send, Receive, Wait. I/O driver level: register reads/writes, interrupt service. I/O bus level: bus transactions, interrupts. Application hardware (custom).]
  • Easier to program at application level
    (send, receive, wait), but timing and resources
    are difficult to predict
  • More difficult to specify at low level and
    difficult to extract from the program, but timing
    and resources are easier to predict

38
Interface Models
  • Synchronization through a FIFO
  • FIFO can be implemented either in hardware or in
    software
  • Effectively reconfigure hardware (FPGA) to
    allocate buffer space as needed
  • Interrupts used for software version of FIFO
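
A software view of FIFO synchronization might look like the sketch below. It assumes a bounded queue in which a full FIFO makes the sender wait and an empty one makes the receiver wait, with a resize call standing in for reallocating buffer space; the class and method names are hypothetical, and interrupts are not modeled.

```python
from collections import deque


class Fifo:
    """Bounded FIFO used to synchronize a sender and a receiver."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def send(self, item):
        """Enqueue an item; return False if the FIFO is full and the sender must wait."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def receive(self):
        """Dequeue the oldest item, or return None if the receiver must wait."""
        if not self.items:
            return None
        return self.items.popleft()

    def resize(self, capacity):
        """Change the buffer allocation; refuse to shrink below current occupancy."""
        if capacity < len(self.items):
            return False
        self.capacity = capacity
        return True


if __name__ == "__main__":
    fifo = Fifo(capacity=2)
    print(fifo.send("d1"), fifo.send("d2"), fifo.send("d3"))
    fifo.resize(3)
    print(fifo.send("d3"), fifo.receive())
```

Growing the capacity when the sender starts to block is the software analogue of reconfiguring the FPGA to allocate more buffer space.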

[Figure: FPGA with a Control/Data FIFO; nodes labeled p1, p2, p3, r2, r3, d1, d2, d3]
39
Summary
  • Reconfigurable computing relies heavily on new
    VLSI technology
  • Device architectures maturing
  • Application development progressing at rapid pace
  • Integration of hardware and software a difficult
    challenge
  • Active area of research at UMass.