Title: ECE 697F Reconfigurable Computing Lecture 18 Dynamic Reconfiguration II
Overview
- How an FPGA is configured is an important issue; there are many techniques and possibilities.
- Existing approaches have serious limitations. Bit streams need to be compressed to allow for rapid reconfiguration.
- First approach: wild cards -> write multiple configurable cells at the same time.
- Second approach: run-length encoding -> attempt to take advantage of regularity in the configuration stream.
Compression Techniques
- Effectively, we can consider an FPGA device as a collection of cells, each with an (x, y) location.
- Instead of using a serial bit stream, we could consider loading data cell by cell, like a standard memory.
- Specify the location of a cell through the use of two registers (row and column).
Wild Card Registers
- In many recently created FPGA devices, it is possible to write multiple cells at the same time.
- An X in a wild card position indicates multiple writes to a row/column.
- The key is to minimize total writes to allow for data compression.
- What does this look like?
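To make the wild card idea concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which the row and column wild card registers are bit strings with `X` marking a don't-care bit; `matching` and `wildcard_write` are illustrative names, not a real device interface. One call models one write cycle that lands on every matching cell.

```python
from itertools import product


def matching(pattern):
    """Addresses selected by a wild card pattern such as '1X0' ('X' = either bit)."""
    choices = [("0", "1") if bit == "X" else (bit,) for bit in pattern]
    return sorted(int("".join(bits), 2) for bits in product(*choices))


def wildcard_write(config, row_pattern, col_pattern, value):
    """Model one write cycle: every cell whose row and column both match gets value."""
    cells = [(r, c) for r in matching(row_pattern) for c in matching(col_pattern)]
    for cell in cells:
        config[cell] = value
    return cells


config = {}
written = wildcard_write(config, "X0", "1X", value=5)
print(written)  # [(0, 2), (0, 3), (2, 2), (2, 3)]
```

Four cells are configured in a single write here; the compression problem is choosing patterns so that every needed cell is covered in as few writes as possible.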
Dealing with Don't Touches
Figure: all eight can be covered with just one write.
- Some locations shouldn't be overwritten at all due to previous constraints.
- Others can be written with one value and then overwritten with another.
Parallels to Logic Minimization
- Zeros are used to represent Don't Touch cases.
- Every time a specific configuration value is solved for, more Don't Touches are created.
- The ordering of writes has an impact on the number of Don't Cares under consideration.
- Maximizing Don't Cares is a positive step.
Modeling the Problem as Logic Optimization
- Order configuration groups by frequency of occurrence.
- Rather than a truth table, represent grid locations as cubes, for both the on-set and the don't-care set.
- Logic optimization packages like Espresso are effective in minimizing cubes.
- Note the cubes that cover the most locations.
Example (address, value; "-" marks a don't care):
- 0001 1
- 0010 1
- 0011 1
- 0000 -
- Minimized cube: 00-- 1 (other cubes shown on the slide: ----?, -000 1)
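The cube view of this example can be checked mechanically. The sketch below is a hypothetical helper (not Espresso): it treats every address that is in neither the on-set nor the don't-care set as a Don't Touch location, and asks whether one cube could be issued as a single wild card write.

```python
def covers(cube, address):
    """True if cube (e.g. '00--') matches address (e.g. '0010'); '-' matches either bit."""
    return all(c == "-" or c == a for c, a in zip(cube, address))


def is_legal_write(cube, on_set, dont_care, n_bits):
    """A cube is a legal single write only if every address it touches is in the
    on-set or the don't-care set; any other address is treated as Don't Touch."""
    addresses = [format(i, f"0{n_bits}b") for i in range(2 ** n_bits)]
    return all(a in on_set or a in dont_care for a in addresses if covers(cube, a))


on_set = {"0001", "0010", "0011"}
dont_care = {"0000"}
print(is_legal_write("00--", on_set, dont_care, 4))  # True: one write covers all three
print(is_legal_write("-000", on_set, dont_care, 4))  # False: it would also hit 1000
```

The don't care at 0000 is what lets the three on-set addresses merge into the single cube 00--.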
Wild Card Minimization Sequence
- Read in the file and group all addresses with the same value.
- Sort groups in decreasing order of the number of addresses to be written.
- Pick the first group and label its addresses as the on-set, including unoccupied locations.
- Run logic minimization (Espresso) on the group.
- Pick the Espresso cube that covers the most unoccupied addresses.
- Reinsert the other addresses back into the queue.
- Iterate.
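The loop above can be sketched in Python under a few stated assumptions. Espresso is swapped out for an exhaustive search over every possible cube (only feasible for tiny address widths), and each round keeps the legal cube that covers the most addresses of the current group, a simplification of the slide's selection rule. Addresses already written for good are Don't Touches; unoccupied addresses and addresses still waiting in the queue are Don't Cares, since later writes will overwrite them. All names are hypothetical.

```python
from itertools import product


def expand(cube):
    """All integer addresses matched by a cube such as '00--' ('-' = either bit)."""
    choices = [("0", "1") if c == "-" else (c,) for c in cube]
    return {int("".join(bits), 2) for bits in product(*choices)}


def wildcard_writes(target, n_bits):
    """Greedy wild card write sequence for target (address -> value).

    Exhaustive cube search stands in for Espresso; only practical for small n_bits.
    """
    all_cubes = ["".join(c) for c in product("01-", repeat=n_bits)]
    groups = {}
    for addr, value in target.items():          # group addresses by value
        groups.setdefault(value, set()).add(addr)
    queue = list(groups.items())
    final = set()                               # written for good: Don't Touch
    writes = []
    while queue:
        queue.sort(key=lambda g: -len(g[1]))    # largest group first (stable sort)
        value, on_set = queue.pop(0)
        # Legal cubes avoid Don't Touch cells; unoccupied cells and cells still
        # waiting in the queue (they will be overwritten later) act as Don't Cares.
        legal = (c for c in all_cubes if not (expand(c) & final))
        best = max(legal, key=lambda c: len(expand(c) & on_set))
        covered = expand(best) & on_set
        writes.append((best, value))
        final |= covered
        remaining = on_set - covered            # reinsert uncovered addresses
        if remaining:
            queue.append((value, remaining))
    return writes


target = {0b0001: 1, 0b0010: 1, 0b0011: 1,
          0b0100: 2, 0b0101: 2, 0b0110: 2, 0b0111: 2,
          0b1000: 3}
print(wildcard_writes(target, 4))  # [('01--', 2), ('00--', 1), ('1000', 3)]
```

In this toy configuration, three wild card writes replace eight single-cell writes.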
Minimization Ordering
- Clearly, selecting configuration 6 first is advantageous.
- 3 writes are needed to cover this configuration.
- Some values never benefit from Don't Cares due to the presence of Don't Touches.
- 3 cycles are needed for each write. Is additional optimization possible?
Wild Card Minimization Results
- Sizable reduction in the overall number of writes: only 17% of the original writes are needed.
- Implications for power?
- If only a portion of the design changes, the benefit is not clear.
Alternate Approach
- Rather than using wild card registers, consider encoding the information instead.
- Compress information at the source (e.g., a local workstation) and decompress it using hardware embedded in the device.
- Takes advantage of sequences of information that are regular.
- Independent of the decode hardware inside the FPGA (e.g., row, column, and wild card registers).
Run-length Compression
- Send a sequence of values as a collection of three pieces of data.
- Example: 100, 103, 106, 109, 112
- Encoded as base 100, offset 3, length 4 (the number of strides after the base).
- Primarily useful for addresses but may be applicable to data as well.
- Constrained by the data sizes of the three values.
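A minimal encoder sketch for the base/offset/length format, assuming that `length` counts the strides after the base (which matches the 100 ... 112 example) and that a hypothetical 8-bit length field caps each run. The greedy left-to-right split is simple but not guaranteed to produce the fewest triples.

```python
def run_length_encode(values, max_length=255):
    """Encode a sequence as (base, offset, length) triples.

    length counts the strides taken after the base, so 100, 103, ..., 112
    becomes (100, 3, 4). max_length models a hypothetical 8-bit length field.
    """
    triples = []
    i = 0
    while i < len(values):
        base = values[i]
        if i + 1 == len(values):                 # lone trailing value
            triples.append((base, 0, 0))
            break
        offset = values[i + 1] - base
        length = 1
        # Extend the run while the next step keeps the same offset.
        while (length < max_length and i + length + 1 < len(values)
               and values[i + length + 1] - values[i + length] == offset):
            length += 1
        triples.append((base, offset, length))
        i += length + 1
    return triples


print(run_length_encode([100, 103, 106, 109, 112]))  # [(100, 3, 4)]
```

The size of each field bounds what a single triple can express, which is the constraint noted on the slide.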
Lempel-Ziv Compression
- Run-length encoding deals with repetition of a single value.
- Lempel-Ziv deals with repetition of a number of values, up to some window size.
- Example window: CBADAFAL
- If the next three pieces are A, B, M, we can use a codeword: pointer 3, length 2, last symbol M.
- Also, with window CBADAFAL, the next 13 pieces of data are BCBCBCBCBCBCD.
- Codeword: pointer 2, length 12 (last symbol D).
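Both examples become consistent if the window is read with the most recent symbol on the left and the pointer counts back from that symbol (1 = most recent); that reading is an assumption. Under it, the scheme is LZ77-style, and a greedy encoder that tries every pointer in the window and keeps the longest match reproduces both codewords. The function name and the window size of 8 are hypothetical.

```python
def lz_encode(history, data, window=8):
    """Greedy LZ77-style encoder producing (pointer, length, last_symbol) codewords.

    history and data are in time order (oldest first). pointer is the distance
    back from the most recent symbol (1 = most recent) and may not exceed the
    window; a copy may overlap the symbols it is producing.
    """
    history = list(history)
    codes, i = [], 0
    while i < len(data):
        recent = history[-window:]
        n = len(recent)
        combined = recent + list(data[i:])
        best_ptr, best_len = 0, 0              # pointer 0 / length 0 = literal only
        for ptr in range(1, n + 1):
            length = 0
            # Leave at least one symbol to send as last_symbol.
            while (i + length < len(data) - 1
                   and combined[n + length - ptr] == data[i + length]):
                length += 1
            if length > best_len:              # ties keep the smallest pointer
                best_ptr, best_len = ptr, length
        codes.append((best_ptr, best_len, data[i + best_len]))
        history.extend(data[i:i + best_len + 1])
        i += best_len + 1
    return codes


# The slide lists the window most recent symbol first, so reverse it.
window_contents = "CBADAFAL"[::-1]
print(lz_encode(window_contents, "ABM"))            # [(3, 2, 'M')]
print(lz_encode(window_contents, "BCBCBCBCBCBCD"))  # [(2, 12, 'D')]
```

The second codeword shows why overlapping copies matter: a pointer of 2 with length 12 keeps re-reading symbols it has just produced.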
Ordering of Values
- Clearly, some sequences are better aligned than others, e.g., ABCDABCBAC.
- This reordering can be made adaptive, based on tuning parameters derived from a number of similar files.
- E.g., many repeating sequences of a certain type (such as abcabc) should be examined.
- May vary from file to file.
Hardware Support for Run-length
- Initially, latch in the base.
- A down counter indicates the number of strides to take.
- The offset is used to augment the initial base.
- Fairly simple to implement.
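The datapath described above can be modeled cycle by cycle with a Python generator. This is a behavioral sketch, not RTL: `width` is a hypothetical register size, and the mask imitates the wraparound of a fixed-width adder.

```python
def runlength_addresses(base, offset, length, width=16):
    """Cycle-by-cycle sketch of the run-length decoder datapath.

    A base register latched once, a down counter loaded with the stride count,
    and an adder that adds the offset each cycle.
    """
    mask = (1 << width) - 1
    base_reg = base & mask          # latch in the base
    counter = length                # down counter: strides remaining
    yield base_reg                  # first cycle outputs the base itself
    while counter > 0:
        base_reg = (base_reg + offset) & mask   # offset augments the base
        counter -= 1
        yield base_reg


print(list(runlength_addresses(100, 3, 4)))  # [100, 103, 106, 109, 112]
```

One register, one counter, and one adder are enough, which is why the slide calls this fairly simple to implement.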
Hardware for Lempel-Ziv
- A down counter stores the length (the number of symbols to copy).
- A register window holds repeated data.
- The pointer extracts the appropriate data value.
- The last symbol is included when the count reaches 0.
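A behavioral sketch of that decoder, using assumed conventions: the window is given oldest first, and the pointer counts back from the newest symbol, which is the reading under which the slide's examples work. A `deque` with `maxlen` stands in for the shift-register window. Because each copied symbol is shifted in before the next read, a pointer smaller than the length naturally produces repeated patterns such as BCBC...

```python
from collections import deque


def lz_decode(window_contents, codes, window=8):
    """Sketch of the Lempel-Ziv decoder: register window, pointer select, down counter.

    window_contents is in time order (oldest first); each code is
    (pointer, length, last_symbol) with pointer counted back from the newest symbol.
    """
    regs = deque(window_contents, maxlen=window)   # shift-register window
    out = []
    for pointer, length, last in codes:
        count = length                  # down counter loaded with the copy length
        while count > 0:
            symbol = regs[-pointer]     # pointer selects one register in the window
            regs.append(symbol)         # new symbol shifts in; oldest falls out
            out.append(symbol)
            count -= 1
        regs.append(last)               # count is 0: emit the explicit last symbol
        out.append(last)
    return "".join(out)


history = "CBADAFAL"[::-1]   # slide lists the newest symbol first
print(lz_decode(history, [(3, 2, "M")]))    # ABM
print(lz_decode(history, [(2, 12, "D")]))   # BCBCBCBCBCBCD
```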
Results
- Compression ratio measures the amount of compression relative to the original data stream.
- Adaptive reordering is the most effective.
- The last column is the wild card approach, which compares favorably with the other approaches.
The Transmogrifier-3 (2000-2003)
- Four Xilinx Virtex 2000E FPGAs
- 38K LUTs each, total of 150K LEs
- 8 Mbytes RAM (2 Mbytes per chip)
- Video-in/video-out interface on board
- FPGAs interconnected in a fixed wiring pattern
The Transmogrifier-3
Ray Tracing System Diagram
- FPGA 0 handles ray-object intersections
- FPGA 1 provides control logic for hierarchy traversal
Ray-Triangle Intersection Unit
- Input
- List of triangles
- Set of rays
- Output
- Nearest intersected triangle for each ray
Intersection Datapath
- Input
- A single triangle definition
- A single ray definition
- Output
- Flag indicating intersection
- Distance to intersection
- Location of intersection within plane of triangle
Intersection Datapath
- Implements a barycentric [1] ray-triangle intersection algorithm
- Fully exploits all algorithmic parallelism
- Pipeline latencies:
- 7 cycles to determine whether an intersection occurs
- An additional 31 cycles to determine the intersection point
- Hardware: 28,000 LUTs and 11,000 flip-flops at 50 MHz
[1] T. Möller and B. Trumbore, "Fast, Minimum Storage Ray-Triangle Intersection," Journal of Graphics Tools, A. K. Peters, 1997, pp. 21-28.
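For reference, a plain software version of the ray-triangle test from [1] (the Möller-Trumbore algorithm) is sketched below. It is not the TM-3 pipeline, but it returns the outputs listed for the datapath: an intersection flag, the distance `t` along the ray, and barycentric coordinates `(u, v)` locating the hit within the triangle's plane. Vectors are plain tuples to keep it self-contained.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray-triangle test.

    Returns (hit, t, u, v): t is the distance along the ray (in units of
    direction), and (u, v) are barycentric coordinates of the hit point.
    """
    miss = (False, None, None, None)
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to the triangle's plane
        return miss
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return miss
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return miss
    t = dot(e2, q) * inv_det
    if t <= eps:                       # intersection behind the ray origin
        return miss
    return (True, t, u, v)


tri = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(intersect((0.25, 0.25, -1.0), (0, 0, 1), *tri))  # (True, 1.0, 0.25, 0.25)
```

The early-exit tests on `u` and `v` correspond to the short "does an intersection occur" path; computing `t` and the full hit data is the longer tail, which mirrors the split between the two pipeline latencies on the slide.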
Hierarchy Tree Traversal
- Traversal algorithm:
- Traverse the root node to find potentially visible children.
- Add them to a depth-sorted list.
- Traverse breadth first.
- Add potentially visible children to the list.
- When a leaf node is found, test all objects within it.
- If an intersection is found, the algorithm is complete.
- Otherwise, continue traversal of the potentially visible list.
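The traversal can be sketched with a priority queue acting as the depth-sorted potentially visible list. To keep it short, each hypothetical `Node` carries a precomputed `entry_t` (where the ray enters its bounding volume, or `None` if missed) and, for leaves, the `(object, t)` pairs the ray hits; a real design would compute both with ray-box and ray-triangle tests. As on the slide, the search stops at the first leaf with a hit. With overlapping bounding volumes, a closer hit could still sit in a node that remains on the list, so a strict nearest-hit guarantee would need an extra check.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    entry_t: Optional[float]        # where the ray enters this bounding volume (None = missed)
    children: List["Node"] = field(default_factory=list)
    hits: List[Tuple[str, float]] = field(default_factory=list)  # leaf only: (object, t)


def traverse(root):
    """Expand the nearest potentially visible node first; stop at the first leaf hit."""
    visible = []        # potentially visible list, ordered by entry distance
    order = 0           # tie-breaker so heapq never compares Node objects

    def push(node):
        nonlocal order
        if node.entry_t is not None:
            heapq.heappush(visible, (node.entry_t, order, node))
            order += 1

    push(root)
    while visible:
        _, _, node = heapq.heappop(visible)
        if node.children:                       # interior node: queue visible children
            for child in node.children:
                push(child)
        elif node.hits:                         # leaf: test all objects within
            return min(node.hits, key=lambda h: h[1])
    return None


scene = Node("root", 0.0, children=[
    Node("A", 5.0, hits=[("tri7", 6.0)]),
    Node("B", 2.0, children=[Node("B1", 3.0), Node("B2", None)]),
])
print(traverse(scene))  # ('tri7', 6.0)
```

Here node B is expanded first because it is nearer, its leaf B1 contains no hit, and the search then falls back to A, which mirrors the "continue traversal of the potentially visible list" step.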
Hierarchy Tree Traversal
- Three different units
- Bounding node sorter
- Depth sorts intersected child nodes
- List handler
- Stores the potentially visible list
- Controller state machine
- Interfaces between the user, the ray-triangle intersection unit, the bounding node sorter, and the list handler
Benchmark Scenes
- Textured Sphere
- Low polygon count: 2048
- Good locality
- Landscape A
- High polygon count: 51,200
- Good locality
Benchmark Results
- Performs well with high polygon count scenes.
- In low polygon count scenes, the fixed overhead of traversing the hierarchy yields lower returns.
Summary
- A number of configuration modes exist for transferring data from the source to the FPGA.
- Compression schemes such as wild carding -> take advantage of 2D regularity to reduce data transfer times.
- Data compression in the form of data word encoding is also effective.
- Configuration cloning avoids off-chip I/O by replicating configuration information already on the device.
- Applications such as image processing benefit from dynamic reconfiguration.