VIPERS II: A Soft-core Vector Processor with Single-copy Data Scratchpad Memory

Transcript and Presenter's Notes

1
VIPERS II: A Soft-core Vector Processor with
Single-copy Data Scratchpad Memory
  • Christopher Han-Yu Chou
  • Supervisor: Dr. Guy Lemieux

2
Outline
  • Motivation
  • New Pipeline Structure
  • VIPERS II Architecture
  • Results
  • Conclusion

3
Motivation
  • VIPERS soft vector processor provides scalable
    performance for data-parallel applications on
    FPGAs
  • Original VIPERS has a few shortcomings
  • High latency for copying data from memory to
    register file
  • Duplicate copies of data in precious on-chip
    memory
  • Scalar core is not pipelined and has no debug core

4
Duplicate Copies of Data
  • VIPERS uses dual read-port vector register file
  • 2 identical copies of the register file
  • Plus an original copy of data in on-chip memory
  • These data duplicates are wasteful
  • Limited on-chip memory capacity
  • Today's FPGAs offer fast on-chip memories. Why
    not access the memory directly?

5
Contribution
  • Use address registers and scratchpad memory to
    replace vector register file
  • Eliminate slow load/store operations
  • More efficient on-chip memory usage
  • Auto-increment/decrement and circular buffer
    features
  • Reduce need for loop unrolling
  • Lower loop overhead

6
Outline
  • Motivation
  • New Pipeline Structure
  • VIPERS II Architecture
  • Results
  • Conclusion

7
New Pipeline Structure
  • Classic 5-stage pipeline
  • Swap the execution stage with the memory access
    stage

8
Implementation
  • The data register file is replaced by address
    registers and a scratchpad memory.
  • Eliminates load/store when data set fits in
    scratchpad memory.
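
The idea above can be sketched as a toy Python model: a vector instruction names scratchpad addresses (held in address registers) instead of data registers, reads its operands in the memory stage, computes in the execute stage, and writes the result back to the scratchpad. The function name `vadd`, the flat list used as the scratchpad, and the element-granular addresses are assumptions made for illustration, not the VIPERS II instruction set or hardware.

```python
def vadd(scratchpad, dst, src_a, src_b, vl):
    """Add vl elements starting at src_a and src_b; write the sums at dst.

    Hypothetical model: dst, src_a and src_b stand in for the contents of
    vector address registers.
    """
    # Memory stage (now before execute): read operands from the scratchpad.
    a = scratchpad[src_a:src_a + vl]
    b = scratchpad[src_b:src_b + vl]
    # Execute stage: element-wise addition.
    result = [x + y for x, y in zip(a, b)]
    # Write-back: results go straight to the scratchpad, with no separate
    # load or store instruction in this model.
    scratchpad[dst:dst + vl] = result


# Two 4-element operands and a 4-element result region in one scratchpad.
scratchpad = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
vadd(scratchpad, dst=8, src_a=0, src_b=4, vl=4)
```

In this sketch the data exists in exactly one place, the scratchpad, which is the single-copy property the slides emphasize.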

9
VIPERS II ISA
10
Outline
  • Motivation
  • New Pipeline Structure
  • VIPERS II Architecture
  • Results
  • Conclusion

11
VIPERS II Architecture
12
Architectural Changes
  • Vector address registers
  • Vector scratchpad memory
  • Data alignment crossbar network (DACN)
  • Fracturable ALUs

13
Vector Address Registers
  • Features auto post-increment, pre-decrement, and
    circular buffer modes
  • Reduce loop overheads
  • Require fewer address registers than data
    registers to implement an application
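
The three addressing modes can be illustrated with a small Python model. The class name, its fields, and the choice to express the circular region as a base and a length are assumptions for the sake of the sketch; the slides do not specify how the hardware encodes them.

```python
class AddressRegister:
    """Toy model of a vector address register (all names hypothetical)."""

    def __init__(self, addr, step=1, base=None, length=None):
        self.addr = addr      # current scratchpad address
        self.step = step      # amount added or subtracted per access
        self.base = base      # start of the circular region (None = no wrap)
        self.length = length  # size of the circular region

    def _wrap(self, addr):
        # Circular buffer mode: keep the address in [base, base + length).
        if self.base is None:
            return addr
        return self.base + (addr - self.base) % self.length

    def post_increment(self):
        # Use the current address, then advance it.
        used = self.addr
        self.addr = self._wrap(self.addr + self.step)
        return used

    def pre_decrement(self):
        # Step back first, then use the new address.
        self.addr = self._wrap(self.addr - self.step)
        return self.addr


# One register walks a 12-element circular region in steps of 4.
reg = AddressRegister(addr=0, step=4, base=0, length=12)
visited = [reg.post_increment() for _ in range(5)]
```

Because the register updates itself as a side effect of each access, a loop body in this model needs no separate address arithmetic, which is the loop-overhead reduction the slide refers to.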

14
Vector Address Register
15
Vector Scratchpad Memory
  • Reduced load/store latencies with simpler memory
    interface
  • Operate at 2X clock

16
Vector Scratchpad Memory
  • Efficient data storage
  • Flexible data set size restriction
  • e.g. Median filter benchmark with byte-size data

17
Data Alignment Crossbar Network
  • With vector lanes coupled directly to memory,
    input vectors must be aligned
  • For misaligned operands, vector move instruction
    (vmov) is used to move data into alignment
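
A minimal sketch of the alignment requirement, assuming that an element at scratchpad address `a` is wired to lane `a % NUM_LANES` and that two operands are aligned when their start addresses map to the same lane. The lane count, the helper names, and the Python signature of `vmov` are illustrative assumptions.

```python
NUM_LANES = 4  # assumed lane count for this sketch


def lane_of(addr):
    # Assumption: the scratchpad is striped across lanes by address.
    return addr % NUM_LANES


def is_aligned(addr_a, addr_b):
    # Operands line up when their first elements sit in the same lane.
    return lane_of(addr_a) == lane_of(addr_b)


def vmov(scratchpad, dst, src, vl):
    # Copy vl elements from src to dst. In hardware the crossbar would
    # route each element from its source lane to its destination lane.
    scratchpad[dst:dst + vl] = scratchpad[src:src + vl]


scratchpad = list(range(100, 116))  # 16 elements of sample data
a_addr, b_addr = 0, 5               # b starts in lane 1, a in lane 0
aligned_before = is_aligned(a_addr, b_addr)

# Move b to address 8, which maps to lane 0 like a.
vmov(scratchpad, dst=8, src=b_addr, vl=4)
b_addr = 8
aligned_after = is_aligned(a_addr, b_addr)
```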

18
Example
19
Data Alignment Crossbar Network
  • Implemented with a multistage switching network,
    trading performance for area

20
Fracturable ALUs
  • Data elements are stored in their natural length
  • Fracturable ALUs are used to execute on operands
    with varying widths

21
Fracturable ALUs
22
Fracturable ALUs
  • Increased processing power
  • A 4-lane VIPERS II operating on byte-size data is
    equivalent to having 16 lanes
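
The effect of fracturing can be sketched in Python by breaking the carry chain of a wide adder at element boundaries. The 32-bit lane width is inferred from the 4-lane-to-16-lane figure above; the function name and the wrap-around (modular) behaviour on overflow are assumptions for illustration.

```python
def fractured_add(a, b, width=32, elem_bits=8):
    """Add two packed words, cutting the carry chain at element boundaries."""
    mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, width, elem_bits):
        # Add one pair of sub-elements in isolation.
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        # Drop the carry out so it cannot spill into the next element.
        result |= (lane_sum & mask) << shift
    return result


# Four bytes packed in one 32-bit word: 0x01, 0xFF, 0x7F, 0x10.
packed = fractured_add(0x01FF7F10, 0x01010101)

# Elements processed per cycle for a 4-lane machine with byte-size data.
elements_per_cycle = 4 * (32 // 8)
```

With `elem_bits=32` the same function behaves as an ordinary 32-bit adder, which is the sense in which one ALU serves operands of varying widths.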

23
Outline
  • Motivation
  • New Pipeline Structure
  • VIPERS II Architecture
  • Results
  • Conclusion

24
Resource Usage
25
Simulated Performance
26
Hardware Performance
27
Future Work
  • Increase operating frequency
  • Implement strided and indexed moves
  • Implement DACN with Omega network
  • Alternative implementation of address register

28
Related Works
  • VESPA (Rose, CASES'08) and VIPERS (Lemieux,
    FPGA'08) are two previous soft-core vector
    processors
  • VIPERS II uses vector scratchpad memory instead
    of register file
  • IBM's Cell processor (Pham, ISSCC'05) features
    SRAM scratchpad memory populated by DMA
  • VIPERS II does not require load/store operations
  • Register pointer architecture (Dally, DATE'07)
    reduces need for loop unrolling by dynamically
    changing the register pointer
  • VIPERS II is the first vector processor to
    utilize this technique

29
Conclusion
  • VIPERS II architecture provides many advantages
  • Improve performance by eliminating slow
    load/store operations
  • Achieve unrolled performance without unrolling
  • Efficient usage of on-chip memory
  • Increased processing power when executing smaller
    operands

30
Thank you
31
Vector Scratchpad Memory
  • e.g. Largest median filter that can be realized
    given a 64kb memory budget

32
Implementation
33
Strided/Indexed Access
  • Strided/indexed loads are replaced by
    strided/indexed move operations.
  • Similar to vmov, the strided move (vmovs) simply
    moves scattered elements to contiguous locations
    in the memory.
  • e.g. vmovs vA1, vA0, vstride0
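
A strided move can be modelled in Python as a gather into a contiguous region. The argument order (destination first) mirrors the `vmovs vA1, vA0, vstride0` example, but the Python signature, the explicit vector length, and the flat-list scratchpad are assumptions for illustration.

```python
def vmovs(scratchpad, dst, src, stride, vl):
    """Gather vl elements spaced `stride` apart, starting at src,
    into contiguous locations starting at dst."""
    # Read all source elements first so overlapping regions stay consistent.
    gathered = [scratchpad[src + i * stride] for i in range(vl)]
    scratchpad[dst:dst + vl] = gathered


# 16 data elements followed by a 4-element destination region.
scratchpad = list(range(16)) + [0, 0, 0, 0]
vmovs(scratchpad, dst=16, src=1, stride=4, vl=4)
```

After the move, later vector instructions can treat the gathered elements as an ordinary contiguous operand.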

34
Permutation Requirement