Learn more at: https://www.isi.edu
Title: Softcore Vector Processor


1
Softcore Vector Processor
  • Team ASP
  • Brandon Harris
  • Arpith Jacob

2
Outline
  • Motivation
    • Smith-Waterman
  • Solution
  • System Architecture
    • Overview
    • Functional Unit
    • Instruction Controller
    • Processing Element
    • Memory Controller
  • ISA
  • Results
  • Future Research

3
Motivation
  • Smith-Waterman sequence alignment

14
Motivation
  • Smith-Waterman sequence alignment
  • Similar Problems
    • HMMer, BLAST, RNA Secondary Structure Prediction
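
The diagrams on the Smith-Waterman slides did not survive extraction. As a reminder of the computation being accelerated, here is a minimal Python sketch of the standard Smith-Waterman scoring recurrence with a linear gap penalty. The function name and the scoring values (match=2, mismatch=-1, gap=-1) are illustrative assumptions and do not come from the slides.

```python
def smith_waterman_score(query, database, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Scoring parameters are assumed defaults, chosen only for illustration.
    """
    rows, cols = len(query), len(database)
    # H[i][j] is the best score of a local alignment ending at
    # query[i - 1] and database[j - 1]; row 0 and column 0 stay at 0.
    H = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            sub = match if query[i - 1] == database[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in the database sequence
                H[i][j - 1] + gap,      # gap in the query sequence
            )
            best = max(best, H[i][j])
    # One cell update per (i, j) pair.
    return best, rows * cols


score, cells = smith_waterman_score("ACGT", "ACGT")
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. Each inner-loop iteration is one cell update, the unit counted by the MCUPS figures reported later.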

15
Our Solution
  • Softcore Vector Processor
    • Massively Parallel
    • Software programmable
    • Configurable Instantiation
  • Why Softcore?
    • Optimize for specific applications
    • Adapt to changes in algorithms
    • FPGA technology improves with time

16
Architectural Overview
  • Streaming Architecture
  • Memory Mapped FIFOs
  • Read Once Data
  • Write Once Data
  • Provides communication between components

[Figure: block diagram of the data path between Software, DMA, and the SVP Functional Unit]
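
To make the read-once and write-once streaming idea concrete, here is a small Python sketch in which two queues stand in for the memory-mapped FIFOs. The names `run_stream`, `stream_in`, and `stream_out` are hypothetical, and the model ignores DMA and timing entirely.

```python
from collections import deque


def run_stream(inputs, functional_unit):
    """Push every input word through a functional unit via two FIFOs."""
    stream_in = deque(inputs)  # filled once by the software side
    stream_out = deque()       # filled once by the functional unit
    while stream_in:
        word = stream_in.popleft()  # read once: the word is consumed
        stream_out.append(functional_unit(word))  # write once
    return list(stream_out)


doubled = run_stream([1, 2, 3], lambda w: w * 2)
```

Because every word is consumed exactly once and produced exactly once, the components only need to agree on the order of the stream, not on shared addresses.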
18
Functional Unit
[Figure: block diagram of the functional unit: three Processing Elements, each paired with a Reg File, plus the Memory Controller, Shared Local Memory, Stream In, and Stream Out]
19
Instruction Controller
  • SIMD Instruction Broadcast

[Figure: the instruction "addi" with operands R5, R1, and 10 is broadcast to three Processing Elements. Their register files hold R0 = 0 and R1 = 1, 0, and 2 respectively; the animation shows the values 10, 11, and 12.]
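
The broadcast on this slide can be sketched in a few lines of Python. The operand order (destination, source, immediate) is an assumption based on the `addi r3, PE_MEM_REG, 0` line in the sample code, and `broadcast_addi` is a hypothetical helper, not part of the SVP toolchain.

```python
def broadcast_addi(pes, rd, rs, imm):
    """Apply the same addi to every PE, each using its own register file."""
    for regs in pes:
        regs[rd] = regs[rs] + imm


# Register contents taken from the slide: R0 = 0 everywhere, R1 = 1, 0, 2.
pes = [{"R0": 0, "R1": 1}, {"R0": 0, "R1": 0}, {"R0": 0, "R1": 2}]
broadcast_addi(pes, "R5", "R1", 10)
r5_values = [regs["R5"] for regs in pes]
```

One instruction is issued, but each Processing Element produces a different result because it reads its own R1.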
20
Instruction Controller
  • SIMD Instruction Broadcast

[Figure: a "Ld" instruction with operands R2, 0, and R3 is broadcast to three Processing Elements; each register file holds its own copy of ptr1 in R3.]
21
Instruction Controller
  • SIMD Instruction Broadcast
  • Instruction Register Broadcast
  • 40 Register Savings

[Figure: an "Ldir" instruction with operands R2, R0, and IR3 is broadcast to three Processing Elements; a fourth register file is shown alongside ptr1.]
22
Instruction Controller
  • SIMD Instruction Broadcast
  • Instruction Register Broadcast
  • 40 Register Savings

[Figure: next animation step: "Ld" with operands R2, R0, and ptr1 reaches the three Processing Elements.]
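
The difference between the two load forms can be sketched as follows. This assumes that `Ld` takes its base pointer from a register inside each PE, while `Ldir` takes it from a register held once in the instruction controller and broadcasts it with the instruction. The address calculation (base plus a per-PE offset register) and all function names are assumptions made for illustration.

```python
def ld(pes, memory, rd, offset_reg, base_reg):
    """Every PE reads the base pointer from its own register file."""
    for regs in pes:
        regs[rd] = memory[regs[base_reg] + regs[offset_reg]]


def ldir(pes, ic_regs, memory, rd, offset_reg, base_ir):
    """The base pointer is read once from the instruction controller."""
    base = ic_regs[base_ir]
    for regs in pes:
        regs[rd] = memory[base + regs[offset_reg]]


memory = list(range(100, 110))
PTR1 = 4

# Ld: each of the three PEs spends a register (R3) on the same pointer.
pes_ld = [{"R0": 0, "R3": PTR1} for _ in range(3)]
ld(pes_ld, memory, "R2", "R0", "R3")

# Ldir: the pointer lives only in the controller register IR3.
pes_ldir = [{"R0": 0} for _ in range(3)]
ldir(pes_ldir, {"IR3": PTR1}, memory, "R2", "R0", "IR3")

loaded_ld = [regs["R2"] for regs in pes_ld]
loaded_ldir = [regs["R2"] for regs in pes_ldir]
```

Both forms load the same value, but the second leaves R3 free in every Processing Element.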
23
Processing Element
[Figure: Processing Element datapath for the instruction "bmseti R17 EQ 16": Ra and Rb addresses and the immediate feed two Register Files (left and right Ra/Rb data), followed by Data Select, a Pipeline Register, Compare, a second Pipeline Register, and the Write Enables (left, right, and memory) leading to the Memory Controller.]
24
Functional Unit
[Figure: block diagram of the functional unit: three Processing Elements, each paired with a Reg File, plus the Memory Controller, Shared Local Memory, Stream In, and Stream Out]
26
Memory Controller
[Figure: the IC and PEs 0-3 access three dual-ported block RAMs through the Memory Controller]
Single Cycle Read
27
Memory Controller
[Figure: the IC and PEs 0-3 access three dual-ported block RAMs through the Memory Controller]
Multiple Cycle Write
28
Instruction Set Architecture
  • Custom ISA
  • Two Sets of Instruction Types
    • Instruction Controller
    • Processing Element
  • Optimized for target applications
    • Max, Min, Loop
  • Expandable
    • Core vs. Application Specific

29
Sample Code
_query_loop
    subir r8, r3, ir10
    nop
    nop
    max r4, r4, r8
    add r3, r19, PE_ZERO_REG
    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
    nop
    nop
    ldir PE_MEM_REG, PE_ZERO_REG(ir15)
    nop
    nop
    nop
    nop
    addi r3, PE_MEM_REG, 0
    bmend

_query_loop
    icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
    subir r8, r3, ir10
    add r3, r19, PE_ZERO_REG
    ldir PE_MEM_REG, PE_ZERO_REG(ir15)
    max r4, r4, r8
    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi ir7, ir7, 1
    icaddi ir9, ir9, 1
    addi r3, PE_MEM_REG, 0
    bmend
    ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icloop ir4, ir5, _query_loop
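
The `bmseti ... bmend` pair in the listings appears to bracket instructions that should take effect only in selected Processing Elements. The Python sketch below models one plausible reading: `bmseti` compares a per-PE register with a constant and enables register writes only where the comparison holds, and `bmend` re-enables writes everywhere. These semantics, the dictionary layout, and the helper names are assumptions; the slides do not define the instructions.

```python
def bmseti_eq(pes, reg, value):
    """Assumed meaning: allow writes only in PEs where reg == value."""
    for pe in pes:
        pe["mask"] = (pe[reg] == value)


def bmend(pes):
    """Assumed meaning: end the masked region, allow writes in every PE."""
    for pe in pes:
        pe["mask"] = True


def addi(pes, rd, rs, imm):
    """Broadcast addi that respects the per-PE write mask."""
    for pe in pes:
        if pe["mask"]:
            pe[rd] = pe[rs] + imm


NUM_PES = 4
# "id" stands in for PE_ID_REG and "mem" for PE_MEM_REG.
pes = [{"id": i, "r3": 0, "mem": 7, "mask": True} for i in range(NUM_PES)]

bmseti_eq(pes, "id", NUM_PES - 1)  # bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
addi(pes, "r3", "mem", 0)          # addi r3, PE_MEM_REG, 0
bmend(pes)                         # bmend

r3_values = [pe["r3"] for pe in pes]
```

Under this reading, only the last Processing Element copies the loaded value into r3, while the others keep their previous contents.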
30
Results
  • VHDL Implementation
    • Simulated
    • Synthesized
  • Smith-Waterman
    • 16 PE version tested
    • Millions of Cell Updates Per Second (MCUPS)

31
Smith-Waterman Speedup
System  Freq     MCUPS  Speedup
P4      1.8 GHz  15     1
SVP16   150 MHz  52     3.47
SVP32   150 MHz  103    6.87
SVP64   125 MHz  167    11.13
SVP128  120 MHz  302    20.13
SVP128  150 MHz  378    25.20
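
The Speedup column is each system's MCUPS divided by the P4 baseline of 15 MCUPS. The short Python check below reproduces it from the table; the variable names are illustrative.

```python
BASELINE_MCUPS = 15  # P4 at 1.8 GHz, from the table

results = [
    ("SVP16 @ 150 MHz", 52),
    ("SVP32 @ 150 MHz", 103),
    ("SVP64 @ 125 MHz", 167),
    ("SVP128 @ 120 MHz", 302),
    ("SVP128 @ 150 MHz", 378),
]

# Speedup relative to the software baseline, rounded as in the table.
speedups = {name: round(mcups / BASELINE_MCUPS, 2) for name, mcups in results}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```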
32
Comparative Performance
System        Freq     PEs/Chip  MCUPS/PE  Chips  MCUPS/Chip  Cost ($1000)  MCUPS/$1000
SVP128        150 MHz  128       2.95      1      378         5             75
SVP128        120 MHz  128       2.36      1      302         5             60
SVP64         125 MHz  64        2.61      1      167         5             33
SVP32         150 MHz  32        3.22      1      103         5             20
Kestrel       20 MHz   64        0.78      8      50          25            16
GeneMatcher2  192 MHz  192       5.21      16     1000        69            14
Fuzion 150    200 MHz  1536      1.63      1      2500        ?             ?
Reference 1 Estimated
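
The MCUPS/PE column is the MCUPS/Chip figure divided by PEs/Chip. The Python check below recomputes it for every row of the table. It deliberately leaves the cost columns alone, since the slides do not say how the cost figures were derived.

```python
# (system, PEs per chip, MCUPS per chip, MCUPS/PE as printed in the table)
rows = [
    ("SVP128 @ 150 MHz", 128, 378, 2.95),
    ("SVP128 @ 120 MHz", 128, 302, 2.36),
    ("SVP64", 64, 167, 2.61),
    ("SVP32", 32, 103, 3.22),
    ("Kestrel", 64, 50, 0.78),
    ("GeneMatcher2", 192, 1000, 5.21),
    ("Fuzion 150", 1536, 2500, 1.63),
]

# Recompute per-PE throughput and compare with the printed value.
recomputed = {name: round(mcups / pes, 2) for name, pes, mcups, _ in rows}
mismatches = [name for name, _, _, printed in rows if recomputed[name] != printed]
```

Normalizing per PE separates how much work each element does from how many elements fit on a chip.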
33
Performance
  • Hardware
    • Xilinx Virtex-4 LX200

PEs  Freq (MHz)  Area  BRAM
16   150         13    22
32   150         22    38
64   125         41    70
128  120         80    134
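
One pattern in the table is easy to check: the BRAM count exceeds the PE count by the same amount in every configuration. The Python snippet below computes that difference. Reading it as one block RAM per PE plus a fixed shared overhead is an interpretation of the numbers, not something the slides state.

```python
# PEs -> BRAM count, copied from the table
bram_by_pes = {16: 22, 32: 38, 64: 70, 128: 134}

# Block RAMs left over after subtracting one per PE.
overhead = {pes: bram - pes for pes, bram in bram_by_pes.items()}
constant_overhead = set(overhead.values())
```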
34
Future Work
  • Software Development
    • How can HMMer and other systolic algorithms be implemented?
  • ISA Expansion
    • What additional instructions are needed?
    • What instructions can be added to optimize?
  • Hardware Development
    • How can we optimize the hardware to make it faster and smaller?
    • What hardware can we add to enhance performance?
    • How can we take advantage of advances in FPGAs, such as DSP48s?

35
Acknowledgments
  • Special Thanks
    • Young Cho
    • Roger Chamberlain
    • Jeremy Buhler
    • Joseph Lancaster
  • References
    • Di Blas et al., The Kestrel Parallel Processor, IEEE Transactions on Parallel and Distributed Systems, January 2005
    • A. Jacob et al., Whole Genome Comparison Using Commodity Workstations, Technical Report, 2003

36
Questions?
  • Team ASP
  • Brandon Harris
  • Arpith Jacob