ECE 697F Reconfigurable Computing Lecture 18 Dynamic Reconfiguration II

Provided by: RussTe7

1
ECE 697F Reconfigurable Computing
Lecture 18: Dynamic Reconfiguration II
2
Overview
  • How an FPGA is configured is an important issue.
    Many techniques/possibilities.
  • Existing approaches have serious limitations.
    Need to compress bit streams to allow for rapid
    reconfiguration.
  • First approach: wild card -> write multiple
    configurable cells at the same time.
  • Second approach: run-length encoding -> attempt
    to take advantage of regularity in configuration
    stream.

3
Compression Techniques
  • Effectively we can consider an FPGA device as a
    collection of cells, each with an (x, y) location.
  • Instead of using a serial bit stream, could
    consider loading data cell-by-cell like a
    standard memory.
  • Specify location of cell through use of two
    registers.

4
Wild Card Registers
  • In many recent FPGA devices it is possible to
    write multiple cells at the same time.
  • An X in a wild card position indicates multiple
    writes to a row/col.
  • Key is to minimize total writes to allow for data
    compression.
  • What does this look like?
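
As a rough illustration of the idea above, the sketch below expands wild card row and column patterns into the set of cells that a single write would update. The function names, the string encoding of addresses, and the use of 'X' as the wild card symbol are assumptions made for this example, not the register format of any particular device.

```python
from itertools import product

def expand_wildcard(pattern):
    """Return every concrete address matched by a pattern of '0', '1', and 'X'."""
    # Each 'X' position may be either bit; fixed positions keep their value.
    choices = [("0", "1") if ch == "X" else (ch,) for ch in pattern]
    return ["".join(bits) for bits in product(*choices)]

def wildcard_write(cells, row_pattern, col_pattern, value):
    """Write `value` to every (row, col) cell selected by the two patterns."""
    for row in expand_wildcard(row_pattern):
        for col in expand_wildcard(col_pattern):
            cells[(row, col)] = value
    return cells

# Hypothetical example: one write command updates four cells
# (rows 00 and 01, columns 10 and 11).
cells = wildcard_write({}, "0X", "1X", 0xA5)
```

Each additional X doubles the number of cells reached by one write, which is where the compression comes from.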

5
Dealing with Don't Touches
All eight can be covered with just one write
  • Some locations shouldn't be overwritten at all
    due to previous constraints.
  • Others can be written with one value and then
    overwritten with another.

6
Parallels to Logic Minimization
  • Zeros are used to represent Don't Touch cases.
  • Every time a specific configuration is solved,
    more Don't Touches are created.
  • Ordering of writes has an impact on the
    number of Don't Cares under consideration.
  • Maximizing Don't Cares is a positive step.

7
Modeling the Problem as Logic Optimization
  • Order configuration groups by frequency of
    occurrence.
  • Rather than a truth table, represent grid locations
    as cubes, both for the on-set and the don't care set.
  • Logic optimization packages like Espresso are
    effective in minimizing cubes.
  • Note the cubes that cover the most locations.
  • Example: input cubes (left), minimized cubes (right)
      0001 1
      0010 1   ->   00-- 1
      0011 1        -000 1
      0000 -

8
Wild Card Minimization Sequence
  • Read in file and group all addresses with same
    value.
  • Sort groups in decreasing order of number of
    addresses to be written
  • Pick first group and label addresses as on-set
    including unoccupied locations
  • Run logic minimization (Espresso) on group.
  • Pick Espresso cube that covers most unoccupied
    addresses
  • Reinsert other addresses back in queue
  • Iterate
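
The per-group step of the sequence above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses brute-force enumeration of cubes in place of Espresso, and it greedily picks the cube that covers the most remaining on-set addresses, which is a simpler heuristic than the one on the slide. The function names and the string encoding of addresses are hypothetical.

```python
from itertools import product

def cube_members(cube):
    """Expand a cube such as '00--' into the set of addresses it covers."""
    choices = [("0", "1") if ch == "-" else (ch,) for ch in cube]
    return {"".join(bits) for bits in product(*choices)}

def cover_group(on_set, dont_care, width):
    """Greedily choose cubes that cover on_set and touch only on_set or dont_care."""
    allowed = set(on_set) | set(dont_care)
    # Enumerate every cube over `width` bits (3**width of them). This is a
    # stand-in for a real minimizer and only practical for small widths.
    candidates = []
    for symbols in product("01-", repeat=width):
        cube = "".join(symbols)
        members = cube_members(cube)
        if members <= allowed:  # never overwrite a Don't Touch location
            candidates.append((cube, members))
    remaining = set(on_set)
    writes = []
    while remaining:
        # Pick the cube that covers the most still-uncovered on-set addresses.
        cube, members = max(candidates, key=lambda c: len(c[1] & remaining))
        writes.append(cube)
        remaining -= members
    return writes

# Three addresses share a value and address 0000 is unoccupied (Don't Care).
writes = cover_group(on_set={"0001", "0010", "0011"}, dont_care={"0000"}, width=4)
```

With the unoccupied address available as a Don't Care, the three writes collapse into the single wild card write `00--`. A full implementation would repeat this for each group in sorted order and turn written locations into Don't Touches for later groups.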

9
Minimization Ordering
  • Clearly, selecting configuration 6 first is
    advantageous.
  • 3 writes are needed to cover this configuration.
  • Some values never benefit from Don't Cares due to
    the presence of Don't Touches.
  • 3 cycles are needed for each write. Might it be
    possible to show additional optimization?

10
Wild Card Minimization Results
  • Sizable reduction in overall writes: only
    17% of the original writes are needed.
  • Implications for power?
  • If only a portion of the design changes, the
    benefit is not clear.

11
Alternate Approach
  • Rather than using wild card register, instead
    consider encoding information.
  • Compress information at the source (e.g. local
    workstation), decompress using hardware embedded
    in the device.
  • Takes advantage of sequences of information that
    are regular.
  • Independent of decode hardware inside the FPGA
    (e.g. row, column, wild card registers).

12
Run-length Compression
  • Send a sequence of values as a collection
    of three pieces of data
  • Example: 100, 103, 106, 109, 112
  • Base 100, offset 3, length 4
  • Primarily useful for addresses but may be
    applicable to data as well.
  • Constrained by data sizes for the three values.
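
A possible encoder for the source side is sketched below. It assumes that `length` counts the strides taken after the base, so the slide's example of five values maps to length 4. The function name is hypothetical, and the sketch ignores the field-width limits mentioned above.

```python
def run_length_encode(values):
    """Encode a list of integers as (base, offset, length) triples.

    `length` counts the strides taken after the base, so one triple
    describes length + 1 values.
    """
    triples = []
    i = 0
    while i < len(values):
        base = values[i]
        if i + 1 == len(values):
            # A lone trailing value: no strides to take.
            triples.append((base, 0, 0))
            break
        offset = values[i + 1] - base
        length = 1
        # Extend the run while consecutive differences equal the offset.
        while (i + length + 1 < len(values)
               and values[i + length + 1] - values[i + length] == offset):
            length += 1
        triples.append((base, offset, length))
        i += length + 1
    return triples

encoded = run_length_encode([100, 103, 106, 109, 112])
```

Here the five addresses are sent as the single triple (100, 3, 4). A real encoder would also split runs whose length or offset does not fit the available field widths.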

13
Lempel-Ziv Compression
  • Run-length encoding deals with repetition of a
    single value
  • Lempel-Ziv deals with repetition of a number of
    values up to some window size.
  • Example: CBADAFAL
  • If the next three pieces are A, B, M, we can use a
    codeword: pointer 3, length 2, last symbol M
  • Also for CBADAFAL, if the next 13 pieces of data are
    BCBCBCBCBCBCD: pointer 2, length 12
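
To make the codeword format concrete, here is a minimal greedy LZ77-style encoder. It is a sketch, not the scheme evaluated in the lecture: the function name, the default window size, and the convention that `pointer` counts symbols back from the current position (with pointer 0 meaning no match) are all assumptions. The example input assumes the window ends in "BC" immediately before the 13 symbols, so that a pointer of 2 is well defined.

```python
def lz_encode(data, window_size=8):
    """Greedy Lempel-Ziv (LZ77-style) encoder.

    Emits (pointer, length, last_symbol) codewords. `pointer` is the distance
    back from the current position; pointer 0 with length 0 means no match.
    """
    codewords = []
    pos = 0
    while pos < len(data):
        best_ptr, best_len = 0, 0
        # Leave one symbol unmatched so every codeword carries a last symbol.
        max_len = len(data) - pos - 1
        for ptr in range(1, min(window_size, pos) + 1):
            length = 0
            # The match may run past `pos`; that overlap encodes repetition.
            while length < max_len and data[pos - ptr + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_ptr, best_len = ptr, length
        codewords.append((best_ptr, best_len, data[pos + best_len]))
        pos += best_len + 1
    return codewords

# Assumed input: "BC" already seen, followed by the 13 symbols BCBCBCBCBCBCD.
codes = lz_encode("BC" + "BCBCBCBCBCBCD")
```

The last codeword is (2, 12, "D"): twelve symbols are reproduced by repeatedly copying from two positions back, even though only two symbols were in the window when the copy started.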

14
Ordering of Values
  • Clearly some sequences are better aligned than
    others
  • ABCDABCBAC
  • This reordering can be made adaptive based on
    tuning parameters derived from a number of
    similar files
  • e.g. many repeating sequences of a certain
    type should be examined (abcabc)
  • May vary from file to file

15
Hardware Support for Run-length
  • Initially latch in base
  • Down counter indicates number of strides to take.
  • Offset used to augment initial base
  • Fairly simple to implement.
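
The decode steps above can be modeled in software as shown below. This is a behavioral sketch only, assuming one output value per step and that the down counter is loaded with the number of strides; the function and variable names are hypothetical.

```python
def run_length_decode(base, offset, length):
    """Model the decoder: latch the base, then step while the down counter is nonzero."""
    address = base        # base register is latched first
    counter = length      # down counter holds the number of strides left
    out = [address]
    while counter > 0:
        address += offset  # adder applies the offset to the current address
        counter -= 1
        out.append(address)
    return out

decoded = run_length_decode(100, 3, 4)
```

The model needs only a register, an adder, and a counter, which is consistent with the slide's remark that the hardware is fairly simple.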

16
Hardware for Lempel-Ziv
  • Down counter stores stride length
  • Register window holds repeated data
  • Pointer extracts appropriate data value.
  • Last symbol included when count is 0
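
A behavioral model of this decoder might look like the sketch below. It assumes codewords of the form (pointer, length, last symbol), with the pointer counting back from the newest symbol in the window; the function name and window size are hypothetical.

```python
def lz_decode(codewords, window_size=8):
    """Model the decoder: a register window, a pointer, and a down counter."""
    output = []
    for pointer, length, last_symbol in codewords:
        counter = length                      # down counter loaded with the copy length
        while counter > 0:
            window = output[-window_size:]    # register window of recent symbols
            output.append(window[-pointer])   # pointer selects one symbol in the window
            counter -= 1
        output.append(last_symbol)            # emitted once the counter reaches 0
    return "".join(output)

# Two literal symbols, then a copy of length 12 from two positions back.
text = lz_decode([(0, 0, "B"), (0, 0, "C"), (2, 12, "D")])
```

Because the window is refreshed after every copied symbol, a copy longer than the pointer distance simply keeps repeating the same short pattern.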

17
Results
  • Compression ratio measures amount of compression
    from original data stream
  • Adaptive reorder is most effective
  • Last column is wild card approach, compares
    favorably with other approaches

18
The Transmogrifier-3 (2000-2003)
  • Four Xilinx Virtex 2000Es
  • 38K LUTs each, total of 150K LEs
  • 8 Mbytes RAM (2M per chip)
  • Video-in/video-out interface on board
  • FPGAs interconnected in fixed wiring pattern

19
The Transmogrifier-3
20
Ray Tracing System Diagram
  • FPGA 0 handles ray-object intersections
  • FPGA 1 provides control logic for hierarchy
    traversal

21
Ray-Triangle Intersection Unit
  • Input
      • List of triangles
      • Set of rays
  • Output
      • Nearest intersected triangle for each ray

22
Intersection Datapath
  • Input
      • A single triangle definition
      • A single ray definition
  • Output
      • Flag indicating intersection
      • Distance to intersection
      • Location of intersection within plane of triangle

23
Intersection Datapath
  • Implements a barycentric [1] ray-triangle
    intersection algorithm
  • Fully exploits all algorithmic parallelism
  • Pipeline latencies
      • 7 cycles to determine if an intersection occurs
      • Additional 31 cycles to determine intersection
        point
  • Hardware: 28,000 LUTs, 11,000 flip-flops @ 50 MHz

[1] T. Möller and B. Trumbore, "Fast, Minimum
Storage Ray-Triangle Intersection," The Journal
of Graphics Tools, A. K. Peters, 1997, pp. 21-28.
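
For reference, a software version of the barycentric test from [1] is sketched below. It is a plain sequential rendering of the algorithm, not the pipelined datapath described on the slide; the function names, the tuple representation of vectors, and the `eps` tolerance are assumptions, and the check that rejects hits behind the ray origin is an addition for this example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

MISS = (False, 0.0, 0.0, 0.0)

def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return (hit, t, u, v): hit flag, distance, and barycentric coordinates."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:              # ray is parallel to the triangle plane
        return MISS
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return MISS
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return MISS
    t = dot(edge2, qvec) * inv_det
    if t < 0.0:                     # triangle lies behind the ray origin
        return MISS
    return (True, t, u, v)

# A ray fired straight down at a unit right triangle in the z = 0 plane.
result = intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                   (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The three outputs correspond to the datapath outputs listed earlier: the flag, the distance `t`, and the location `(u, v)` within the plane of the triangle.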
24
Hierarchy Tree Traversal
  • Traversal algorithm
      • Traverse root node to find potentially visible
        children
      • Add to a depth-sorted list
      • Traverse breadth first
      • Add potentially visible children to list
      • When a leaf node is found, test all objects within
      • If an intersection is found, the algorithm is complete
      • Otherwise continue traversal of the potentially
        visible list
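
The traversal steps above can be sketched in software with a priority queue standing in for the depth-sorted list. This is an illustrative model under stated assumptions: the `Node` class is hypothetical, and the precomputed `entry` and object distances replace the real bounding-volume and ray-triangle tests.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional, Tuple

@dataclass
class Node:
    name: str
    entry: Optional[float]  # distance where the ray enters the bound, None if missed
    children: List["Node"] = field(default_factory=list)
    objects: List[Tuple[str, Optional[float]]] = field(default_factory=list)

def traverse(root):
    """Return (object_name, distance) for the first hit found, or None."""
    if root.entry is None:
        return None
    tie = count()  # tie-breaker so heapq never has to compare Node objects
    visible = [(root.entry, next(tie), root)]  # the potentially visible list
    while visible:
        _, _, node = heapq.heappop(visible)    # nearest node first
        if not node.children:                  # leaf: test all objects within
            hits = [(d, name) for name, d in node.objects if d is not None]
            if hits:
                d, name = min(hits)
                return name, d
            continue
        for child in node.children:
            if child.entry is not None:        # keep only potentially visible children
                heapq.heappush(visible, (child.entry, next(tie), child))
    return None

# Hypothetical scene: the nearest leaf contains no hit, the next one does.
far = Node("far", 5.0, objects=[("tri_far", 6.0)])
near = Node("near", 1.0, objects=[("tri_miss", None)])
mid = Node("mid", 2.0, objects=[("tri_a", 2.5), ("tri_b", 3.0)])
hidden = Node("hidden", None, objects=[("tri_hidden", 0.1)])
root = Node("root", 0.0, children=[far, near, mid, hidden])
hit = traverse(root)
```

Visiting nodes in depth order is what lets the traversal stop at the first leaf that reports a hit, without testing the objects in the more distant leaf.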

25
Hierarchy Tree Traversal
  • Three different units
      • Bounding node sorter
          • Depth sorts intersected child nodes
      • List handler
          • Stores the potentially visible list
      • Controller state machine
          • Interfaces between the user, the ray-triangle
            intersection unit, the bounding node sorter, and
            the list handler

26
Benchmark Scenes
  • Textured Sphere
      • Low polygon count: 2048
      • Good locality
  • Landscape A
      • High polygon count: 51200
      • Good locality

27
Benchmark Results
  • Performs well with high poly count scenes
  • For low poly count scenes, the fixed overhead of
    traversing the hierarchy yields lower returns

28
Summary
  • A number of configuration modes exist for
    transferring data from source to FPGA
  • Compression schemes like wild carding take
    advantage of 2D regularity to reduce data
    transfer times.
  • Data compression in the form of data word
    encoding is also effective.
  • Configuration cloning avoids off-chip I/O by
    replicating configuration information already on
    device.
  • Applications such as image processing benefit
    from dynamic reconfiguration