ECE 697F Reconfigurable Computing Lecture 18 Dynamic Reconfiguration II - PowerPoint PPT Presentation


Slides: 29

Provided by: RussTe7


Transcript and Presenter's Notes

1
ECE 697F Reconfigurable Computing
Lecture 18: Dynamic Reconfiguration II
2
Overview

How an FPGA is configured is an important issue.
Many techniques/possibilities.
Existing approaches have serious limitations.
Need to compress bit streams to allow for rapid
reconfiguration.
First approach: wild card -> write multiple
configurable cells at the same time.
Second approach: run-length encoding -> attempt
to take advantage of regularity in the configuration
stream.

3
Compression Techniques

Effectively we can consider an FPGA device as a
collection of cells, each with (x, y) location.
Instead of using a serial bit stream, could
consider loading data cell-by-cell like a
standard memory.
Specify the location of a cell through two
registers (row and column).

4
Wild Card Registers

In many FPGA devices created recently it is
possible to write multiple cells at the same
time.
X in wild card position indicates multiple
writes to a row/col
Key is to minimize total writes to allow for data
compression.
What does this look like?
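
As a rough illustration of the wild card idea, the sketch below treats every 1 bit in a wild card mask as an X in the corresponding address bit and lists the rows (or columns) that a single write would reach. The function name, the 4-bit width, and the mask convention are assumptions made for this example, not a particular device's register format.

```python
def wildcard_targets(address: int, wildcard: int, nbits: int) -> list:
    """Return every address matched when the 1 bits of `wildcard` act as X."""
    free_bits = [b for b in range(nbits) if (wildcard >> b) & 1]
    # Clear the wild card positions so they can be filled with every combination.
    base = address & ~wildcard & ((1 << nbits) - 1)
    targets = []
    for combo in range(1 << len(free_bits)):
        target = base
        for i, bit in enumerate(free_bits):
            if (combo >> i) & 1:
                target |= 1 << bit
        targets.append(target)
    return sorted(targets)


# Address 01XX: one write reaches rows 4, 5, 6 and 7.
rows = wildcard_targets(0b0100, 0b0011, nbits=4)
print(rows)
```

With all three bits of a 3-bit address marked X, the same call returns eight addresses, which is the "all eight covered with one write" situation.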

5
Dealing with Don't Touches
All eight can be covered with just one write

Some locations shouldn't be overwritten at all
due to previous constraints.
Others can be written with one value and then
overwritten with another.

6
Parallels to Logic Minimization

Zeros used to represent Don't Touch cases.
Every time I solve for a specific configuration
more Don't Touches will be created.
Ordering of writes will have an impact on the
number of Don't Cares under consideration.
Maximizing Don't Cares is a positive step.

7
Modeling the Problem as Logic Optimization

Order configuration groups by frequency of
occurrence.
Rather than a truth table, represent grid locations
as cubes, both for the on-set and the don't care set.
Logic optimization packages like Espresso
effective in minimizing cubes
Note cubes that cover most locations

0001 1        00-- 1
0010 1   ->   -000 1
0011 1
0000 -
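
To make the cube notation concrete, here is a small sketch that checks whether a cube such as 00-- may be used as a single write: every address it touches must be in the on-set or the don't care set. The helper names are made up for this example, and the address lists are taken from the rows shown above.

```python
def cube_covers(cube: str, address: str) -> bool:
    """A cube covers an address if every position is '-' or matches the bit."""
    return len(cube) == len(address) and all(
        c in ("-", a) for c, a in zip(cube, address)
    )


def cube_is_legal(cube: str, on_set: list, dont_care: list, nbits: int) -> bool:
    """True if the cube touches only on-set or don't care addresses."""
    allowed = set(on_set) | set(dont_care)
    all_addresses = (format(i, f"0{nbits}b") for i in range(1 << nbits))
    return all(a in allowed for a in all_addresses if cube_covers(cube, a))


on_set = ["0001", "0010", "0011"]
dont_care = ["0000"]

print(cube_is_legal("00--", on_set, dont_care, 4))  # covers 0000..0011 only
print(cube_is_legal("0---", on_set, dont_care, 4))  # also touches 0100..0111
```

The first cube replaces three separate writes with one, which is why a larger don't care set helps.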

8
Wild Card Minimization Sequence

Read in file and group all addresses with same
value.
Sort groups in decreasing order of number of
addresses to be written
Pick first group and label addresses as on-set
including unoccupied locations
Run logic minimization (Espresso) on group.
Pick Espresso cube that covers most unoccupied
addresses
Reinsert other addresses back in queue
Iterate
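
The sequence above can be sketched roughly as follows. This is a simplified stand-in, not the flow from the lecture: it enumerates all cubes by brute force instead of calling Espresso, it re-selects the largest pending group on every pass, and it picks the cube that covers the most addresses of that group. Unoccupied locations are assumed to be free to overwrite, and all names are hypothetical.

```python
from itertools import product


def covered(cube: str) -> frozenset:
    """Addresses (bit strings) written by a cube such as '00--'."""
    choices = [("0", "1") if c == "-" else (c,) for c in cube]
    return frozenset("".join(bits) for bits in product(*choices))


def minimize_writes(config: dict, nbits: int) -> list:
    """Return (cube, value) wild card writes that realize `config`."""
    all_cubes = ["".join(p) for p in product("01-", repeat=nbits)]
    pending = dict(config)  # addresses still waiting for their final value
    done = set()            # finalized addresses: Don't Touch from now on
    writes = []
    while pending:
        # Group the pending addresses by value and take the largest group.
        groups = {}
        for address, value in pending.items():
            groups.setdefault(value, set()).add(address)
        value, on_set = max(groups.items(), key=lambda kv: len(kv[1]))
        # Legal cubes avoid every Don't Touch; keep the one covering most of the group.
        legal = [c for c in all_cubes
                 if covered(c) & on_set and not covered(c) & done]
        best = max(legal, key=lambda c: len(covered(c) & on_set))
        writes.append((best, value))
        for address in covered(best) & on_set:
            del pending[address]
            done.add(address)
    return writes


def replay(writes: list) -> dict:
    """Apply the writes in order and return the resulting memory contents."""
    memory = {}
    for cube, value in writes:
        for address in covered(cube):
            memory[address] = value
    return memory


config = {"0001": "A", "0010": "A", "0011": "A", "0100": "B"}
writes = minimize_writes(config, 4)
print(writes)
```

Replaying the writes in order reproduces every configured value, because a later write is never allowed to touch an address that has already been finalized.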

9
Minimization Ordering

Clearly selecting configuration 6 first is
advantageous
3 writes needed to cover this configuration.
Some values never benefit from Don't Cares due to
presence of Don't Touches.
3 cycles needed for each write. Might be possible
to show additional optimization?

10
Wild Card Minimization Results

Sizable reduction in overall writes needed: only
17% of original needed.
Implications for power?
If only a portion of the design changes, the
benefit is not clear.

11
Alternate Approach

Rather than using wild card register, instead
consider encoding information.
Compress information at the source (e.g. local
workstation), decompress using hardware embedded
in the device.
Takes advantage of sequences of information that
are regular.
Independent of decode hardware inside the FPGA
(e.g. row, column, wild card registers).

12
Run-length Compression

Send a sequence of values as a collection
of three pieces of data.
Example: 100, 103, 106, 109, 112
Base = 100, offset = 3, length = 4 (four strides after the base)
Primarily useful for addresses but may be
applicable to data as well.
Constrained by data sizes for the three values.
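
A minimal sketch of the encoding side, assuming that length counts the strides taken after the base (as in the 100, 103, ... example) and that the length field has some maximum value. The function name and the `max_length` default are assumptions for illustration.

```python
def encode_runs(values: list, max_length: int = 255) -> list:
    """Greedily split `values` into (base, offset, length) triples.

    `length` is the number of strides after the base, so a triple
    describes length + 1 values.
    """
    triples = []
    i = 0
    while i < len(values):
        base = values[i]
        if i + 1 == len(values):
            triples.append((base, 0, 0))  # lone trailing value
            break
        offset = values[i + 1] - base
        length = 1
        # Extend the run while the stride stays constant and the field does not overflow.
        while (i + length + 1 < len(values)
               and length < max_length
               and values[i + length + 1] - values[i + length] == offset):
            length += 1
        triples.append((base, offset, length))
        i += length + 1
    return triples


print(encode_runs([100, 103, 106, 109, 112]))
```

The `max_length` limit reflects the constraint that each of the three values has to fit in a fixed-size field.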

13
Lempel-Ziv Compression

Run-length encoding deals with repetition of a
single value
Lempel-Ziv deals with repetition of a number of
values up to some window size.
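Example: window CBADAFAL (read with the most
recent symbol on the left, so the pointer counts back from C)
If the next three pieces are A, B, M
we can use a codeword: pointer = 3,
length = 2,
last
symbol = M
Also, with window CBADAFAL and next 13 pieces of data
BCBCBCBCBCBCD:
pointer = 2, length = 12 (last symbol D)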
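
The two codewords above can be reproduced with a small longest-match search. This sketch assumes the window is stored in arrival order (so the slide's CBADAFAL becomes "LAFADABC"), that the pointer counts back from the end of the window, and that a match may run past the window into symbols it has just produced. The function name is hypothetical and `upcoming` must be non-empty.

```python
def lz_codeword(window: str, upcoming: str) -> tuple:
    """Return (pointer, length, last_symbol) for the longest match.

    One symbol of `upcoming` is always kept back to be sent literally
    as the last symbol.
    """
    best_pointer, best_length = 0, 0
    max_length = len(upcoming) - 1
    for pointer in range(1, len(window) + 1):
        length = 0
        while length < max_length:
            src = len(window) - pointer + length
            # Past the window end, the copy reads symbols it has already matched.
            symbol = window[src] if src < len(window) else upcoming[src - len(window)]
            if symbol != upcoming[length]:
                break
            length += 1
        if length > best_length:
            best_pointer, best_length = pointer, length
    return best_pointer, best_length, upcoming[best_length]


print(lz_codeword("LAFADABC", "ABM"))
print(lz_codeword("LAFADABC", "BCBCBCBCBCBCD"))
```

The second call shows why a length of 12 is possible with a pointer of only 2: the copy overlaps its own output.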

14
Ordering of Values

Clearly some sequences are better aligned than
others
ABCDABCBAC
This reordering can be made adaptive based on
tuning parameters derived from a number of
similar files
e.g. many repeating sequences of a certain
type should be examined (abcabc)
May vary from file to file

15
Hardware Support for Run-length

Initially latch in base
Down counter indicates number of strides to take.
Offset used to augment initial base
Fairly simple to implement.
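
A behavioral sketch of the decode side, written as a software model rather than a hardware description. It assumes the down counter is loaded with the number of strides and that one address is produced per step; the names are made up for this example.

```python
def runlength_decode(base: int, offset: int, length: int):
    """Yield the addresses described by one (base, offset, length) triple."""
    address = base    # initially latch in the base
    counter = length  # down counter: strides still to take
    yield address
    while counter > 0:
        address += offset  # offset augments the current address
        counter -= 1
        yield address


print(list(runlength_decode(100, 3, 4)))
```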

16
Hardware for Lempel-Ziv

Down counter stores stride length
Register window holds repeated data
Pointer extracts appropriate data value.
Last symbol included when count is 0
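
The same kind of software model can be written for the Lempel-Ziv decoder. It assumes an 8-entry register window held in arrival order, a pointer that counts back from the newest entry, and a down counter loaded with the match length. These conventions and all names are assumptions for illustration.

```python
def lz_decode(window: str, pointer: int, length: int, last_symbol: str,
              window_size: int = 8) -> tuple:
    """Return (emitted symbols, updated window) for one codeword."""
    buf = list(window)
    out = []
    counter = length  # down counter: symbols still to copy
    while counter > 0:
        symbol = buf[-pointer]       # pointer extracts a value from the window
        out.append(symbol)
        buf.append(symbol)           # shift the new symbol into the window
        buf = buf[-window_size:]
        counter -= 1
    out.append(last_symbol)          # count is 0: emit the last symbol
    buf.append(last_symbol)
    buf = buf[-window_size:]
    return "".join(out), "".join(buf)


print(lz_decode("LAFADABC", 3, 2, "M"))
print(lz_decode("LAFADABC", 2, 12, "D"))
```

Because every copied symbol is shifted back into the window, a short pointer can regenerate a long repeating pattern.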

17
Results

Compression ratio measures amount of compression
from original data stream
Adaptive reorder is most effective
Last column is wild card approach, compares
favorably with other approaches

18
The Transmogrifier-3 (2000-2003)

Four Xilinx Virtex 2000Es
38K LUTs each, total of 150K LEs
8 Mbytes RAM (2M per chip)
Video-in/video-out interface on board
FPGAs interconnected in fixed wiring pattern

19
The Transmogrifier-3
20
Ray Tracing System Diagram

FPGA 0 handles ray-object intersections
FPGA 1 provides control logic for hierarchy
traversal

21
Ray-Triangle Intersection Unit

Input
List of triangles
Set of rays

Output
Nearest intersected triangle for each ray

22
Intersection Datapath

Input
A single triangle definition
A single ray definition
Output
Flag indicating intersection
Distance to intersection
Location of intersection within plane of triangle

23
Intersection Datapath

Implements a barycentric [1] ray-triangle
intersection algorithm
Fully exploits all algorithmic parallelism
Pipeline latencies
7 cycles to determine if an intersection occurs
Additional 31 cycles to determine intersection
point
Hardware: 28,000 LUTs, 11,000 flip-flops @ 50 MHz

[1] T. Möller and B. Trumbore, "Fast, Minimum
Storage Ray-Triangle Intersection," Journal
of Graphics Tools, A. K. Peters, 1997, pp. 21-28.
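
For reference, a plain software sketch of the Möller-Trumbore test from [1], returning the three outputs listed for the datapath: a hit flag, the distance t, and the barycentric location (u, v) within the triangle. It says nothing about how the hardware pipeline is organized. The epsilon value and the rejection of hits behind the ray origin (t < 0) are choices made for this example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Return (hit, t, u, v); t, u, v are None when there is no hit."""
    miss = (False, None, None, None)
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return miss
    inv_det = 1.0 / det
    tvec = sub(orig, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return miss
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return miss
    t = dot(edge2, qvec) * inv_det
    if t < 0.0:                   # intersection lies behind the ray origin
        return miss
    return (True, t, u, v)


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
```

The hit/miss decision is available before t is computed, which is consistent with an intersection flag being ready earlier than the intersection point.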
24
Hierarchy Tree Traversal

Traversal Algorithm
Traverse root node to find potentially visible
children
Add to a depth sorted list
Traverse breadth first
Add potentially visible children to list
When a leaf node is found test all objects within
If intersection found algorithm is complete
Otherwise continue traversal of potentially
visible list
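
The traversal steps can be sketched with a priority queue standing in for the depth sorted list. To keep the example self-contained, each node carries a precomputed `entry` distance for the current ray (None if the ray misses its bounding volume) and each leaf object carries a precomputed hit distance. Both fields are hypothetical placeholders for the bounding test and the ray-triangle test.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    entry: Optional[float]                        # distance to bounding volume, None = missed
    children: list = field(default_factory=list)  # inner node: child Nodes
    objects: list = field(default_factory=list)   # leaf: (name, hit distance or None)


def trace(root: Node):
    """Return (distance, name) of the first intersection found, or None."""
    if root.entry is None:
        return None
    visible = [(root.entry, 0, root)]  # depth sorted potentially visible list
    ticket = 1                         # tie-breaker so Nodes are never compared
    while visible:
        _, _, node = heapq.heappop(visible)
        if node.children:
            for child in node.children:
                if child.entry is not None:  # potentially visible child
                    heapq.heappush(visible, (child.entry, ticket, child))
                    ticket += 1
        else:
            hits = [(d, name) for name, d in node.objects if d is not None]
            if hits:
                return min(hits)  # intersection found: traversal is complete
    return None


near = Node(entry=1.0, objects=[("t1", None), ("t2", 2.5)])
far = Node(entry=3.0, objects=[("t3", 3.5)])
print(trace(Node(entry=0.5, children=[far, near])))
```

Nodes are visited nearest first, so the far leaf is never tested once the near leaf reports a hit.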

25
Hierarchy Tree Traversal

Three different units
Bounding node sorter
Depth sorts intersected child nodes
List handler
Stores the potentially visible list
Controller state machine
Interfaces between the user, the ray-triangle
intersection unit, the bounding node sorter, and
the list handler

26
Benchmark Scenes
Textured Sphere
Low polygon count: 2048
Good locality

Landscape A
High polygon count: 51200
Good locality

27
Benchmark Results

Performs well with high poly count scenes
Low poly count scenes yield lower returns because of
the fixed overhead of traversing the hierarchy

28
Summary

A number of configuration modes exist for
transferring data from source to FPGA
Compression schemes like wild carding were evaluated
-> take advantage of 2D regularity to reduce data
transfer times.
Data compression in the form of data word
encoding also effective.
Configuration cloning avoids off-chip I/O by
replicating configuration information already on
device.
Applications such as image processing benefit
from dynamic reconfiguration