Softcore Vector Processor

Description:

Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob – PowerPoint PPT presentation

Slides: 37

Provided by: isi110

Learn more at: https://www.isi.edu

1
Softcore Vector Processor

Team ASP
Brandon Harris
Arpith Jacob

2
Outline

Motivation
Smith-Waterman
Solution
System Architecture
Overview
Functional Unit
Instruction Controller
Processing Element
Memory Controller
ISA
Results
Future Research

3
Motivation

Smith-Waterman sequence alignment

14
Motivation

Smith-Waterman sequence alignment

Similar Problems
HMMer, BLAST, RNA Secondary Structure Prediction
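For reference, the cell update at the heart of Smith-Waterman can be sketched as follows. This is a minimal sketch that assumes a linear gap penalty and simple match/mismatch scores; the function name and the scoring parameters are illustrative and are not taken from the slides.

```python
def smith_waterman_score(query, database, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (linear gap penalty, illustrative scores)."""
    rows, cols = len(query) + 1, len(database) + 1
    # H[i][j] is the best score of a local alignment ending at query[i-1], database[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == database[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align the two residues
                H[i - 1][j] + gap,      # gap in the database sequence
                H[i][j - 1] + gap,      # gap in the query sequence
            )
            best = max(best, H[i][j])
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal can be updated independently. That property is what makes the problem a natural fit for an array of processing elements.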

15
Our Solution

Softcore Vector Processor
Massively Parallel
Software programmable
Configurable Instantiation
Why Softcore?
Optimize for specific applications
Adapt to changes in algorithms
FPGA technology improves with time

16
Architectural Overview

Streaming Architecture
Memory Mapped FIFOs
Read Once Data
Write Once Data
Provides communication between components

[Figure: block diagram with blocks labeled Software, DMA, and SVP Functional Unit]
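The read-once / write-once streaming model can be illustrated with a small software analogy. This is only a sketch: the `Fifo` class and the `functional_unit` function are hypothetical stand-ins, not the SVP hardware interface, and the "add one" computation is a placeholder.

```python
from collections import deque


class Fifo:
    """Hypothetical stand-in for a memory-mapped FIFO."""

    def __init__(self):
        self._items = deque()

    def write(self, value):
        self._items.append(value)

    def read(self):
        # A value leaves the FIFO when it is read, so each item is consumed exactly once.
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


def functional_unit(stream_in, stream_out):
    """Placeholder computation: consume every input once, emit one output per input."""
    while len(stream_in):
        stream_out.write(stream_in.read() + 1)


stream_in, stream_out = Fifo(), Fifo()
for value in [1, 2, 3]:  # stands in for software filling the input stream
    stream_in.write(value)
functional_unit(stream_in, stream_out)
results = [stream_out.read() for _ in range(len(stream_out))]
```

Because data is never re-read from a stream, the components only need to agree on the order of items, not on addresses.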
18
Functional Unit
[Figure: functional unit block diagram showing three Processing Elements, each with a Reg File, plus the Memory Controller, Shared Local Memory, Stream In, and Stream Out]
19
Instruction Controller

SIMD Instruction Broadcast

[Figure: the instruction addi R5, R1, 10 is broadcast to three Processing Elements. Their R1 values are 1, 0, and 2; the resulting values shown are 11, 10, and 12.]
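The broadcast on this slide can be mimicked in a few lines. This is a sketch that assumes the operand order is destination, source, immediate (which matches the values shown on the slide); the dictionary-based register files and the function name are illustrative only.

```python
def broadcast_addi(register_files, dest, src, immediate):
    """Apply `addi dest, src, immediate` to every PE's private register file."""
    for regs in register_files:
        regs[dest] = regs[src] + immediate


# Three PEs with R1 = 1, 0, 2, as in the slide's example.
pes = [{"R0": 0, "R1": 1}, {"R0": 0, "R1": 0}, {"R0": 0, "R1": 2}]
broadcast_addi(pes, dest="R5", src="R1", immediate=10)
r5_values = [regs["R5"] for regs in pes]
```

Every PE executes the same instruction, but each one reads and writes its own registers, so `r5_values` differs per PE.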
20
Instruction Controller

SIMD Instruction Broadcast

[Figure: a Ld instruction with operands R2, 0, and R3 is broadcast to three Processing Elements; each register file holds ptr1 in R3.]
21
Instruction Controller

SIMD Instruction Broadcast
Instruction Register Broadcast
40 Register Savings

[Figure: labels Ldir, R2, IR3, R0, and Ld; three Processing Elements with registers R0-R5, a fourth register file, and ptr1]
22
Instruction Controller

SIMD Instruction Broadcast
Instruction Register Broadcast
40 Register Savings

[Figure: a Ld with operands R2, R0, and ptr1 reaching three Processing Elements with registers R0-R5; a fourth register file; ptr1 shown once]
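One reading of the Instruction Register Broadcast slides is that a shared value such as ptr1 lives in an instruction-controller register (IR3) and is filled into the load before it is broadcast, so the PEs do not each spend a register on it. The sketch below works under that assumption; the function signature, the list-based memory, and the addressing rule (controller base plus per-PE offset) are hypothetical.

```python
def ldir(pe_register_files, shared_memory, ic_registers, dest, offset_reg, ir):
    """Sketch of `ldir dest, offset_reg(ir)`: the address base comes from the controller."""
    base = ic_registers[ir]  # read once, in the instruction controller
    for regs in pe_register_files:  # broadcast as a plain load with the base filled in
        regs[dest] = shared_memory[base + regs[offset_reg]]


shared_memory = [7, 8, 9, 10]
ic_registers = {"IR3": 1}  # plays the role of ptr1
pes = [{"R0": 0}, {"R0": 0}, {"R0": 0}]
ldir(pes, shared_memory, ic_registers, dest="R2", offset_reg="R0", ir="IR3")
loaded = [regs["R2"] for regs in pes]
```

Compared with the plain Ld on the earlier slide, no PE register is needed to hold the pointer.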
23
Processing Element
[Figure: Processing Element datapath for the example bmseti R17 EQ 16. Labeled parts: Ra and Rb address inputs and an Immediate input, two Register Files (left and right) producing Ra and Rb data, Data Select, Compare, two Pipeline Registers, and Write Enables (left, right, and memory) leading to the Memory Controller.]
24
Functional Unit
[Figure: functional unit block diagram showing three Processing Elements, each with a Reg File, plus the Memory Controller, Shared Local Memory, Stream In, and Stream Out]
26
Memory Controller
[Figure: IC and PE 0-3, the Memory Controller, and three dual-ported block RAMs]
Single Cycle Read
27
Memory Controller
[Figure: IC and PE 0-3, the Memory Controller, and three dual-ported block RAMs]
Multiple Cycle Write
28
Instruction Set Architecture

Custom ISA
Two Sets of Instruction Types
Instruction Controller
Processing Element
Optimized for target applications
Max, Min, Loop
Expandable
Core vs. Application Specific

29
Sample Code

_query_loop
subir r8, r3, ir10
nop
nop
max r4, r4, r8
add r3, r19, PE_ZERO_REG
bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
nop
nop
ldir PE_MEM_REG, PE_ZERO_REG(ir15)
nop
nop
nop
nop
addi r3, PE_MEM_REG, 0
bmend

_query_loop
icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
subir r8, r3, ir10
add r3, r19, PE_ZERO_REG
ldir PE_MEM_REG, PE_ZERO_REG(ir15)
max r4, r4, r8
bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
icaddi ir7, ir7, 1
icaddi ir9, ir9, 1
addi r3, PE_MEM_REG, 0
bmend
ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
icloop ir4, ir5, _query_loop
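The bmseti ... bmend pair in the listing appears to bracket instructions that should take effect only in some PEs. The slides do not define the semantics, so the sketch below is an assumption: the enclosed instructions are applied only in PEs where the comparison holds. The helper names are hypothetical and only the EQ comparison is modelled.

```python
import operator

COMPARES = {"EQ": operator.eq}  # only the comparison used in the sample is modelled


def masked_block(pes, reg, cmp_name, immediate, body):
    """Run `body` only on PEs whose `reg` satisfies the comparison (assumed bmseti/bmend meaning)."""
    compare = COMPARES[cmp_name]
    for regs in pes:
        if compare(regs[reg], immediate):
            body(regs)


def copy_mem_to_r3(regs):
    regs["r3"] = regs["PE_MEM_REG"] + 0  # mirrors `addi r3, PE_MEM_REG, 0`


PE_NUM_ELEMENTS = 4
pes = [{"PE_ID_REG": i, "PE_MEM_REG": 99, "r3": i} for i in range(PE_NUM_ELEMENTS)]
masked_block(pes, "PE_ID_REG", "EQ", PE_NUM_ELEMENTS - 1, copy_mem_to_r3)
r3_values = [regs["r3"] for regs in pes]
```

Under this reading, only the last PE (ID equal to PE_NUM_ELEMENTS - 1) picks up the freshly loaded value, while the others keep what they had.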
30
Results

VHDL Implementation
Simulated
Synthesized
Smith-Waterman
16 PE version tested
Millions of Cell Updates Per Second (MCUPS)
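As a reminder of how the metric is computed, here is a minimal sketch that assumes the usual definition: one cell update per pair of query and database residues. The function name and the example numbers are illustrative, not measurements from the slides.

```python
def mcups(query_length, database_length, seconds):
    """Millions of dynamic-programming cell updates per second."""
    cells = query_length * database_length
    return cells / seconds / 1e6


# Illustrative only: a 1,000-residue query against 1,000,000 residues in 10 seconds.
example = mcups(1000, 1_000_000, 10.0)
```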

31
Smith-Waterman Speedup
System   Freq      MCUPS   Speedup
P4       1.8 GHz      15      1
SVP16    150 MHz      52      3.47
SVP32    150 MHz     103      6.87
SVP64    125 MHz     167     11.13
SVP128   120 MHz     302     20.13
SVP128   150 MHz     378     25.20
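The Speedup column is each system's MCUPS divided by the P4 baseline of 15 MCUPS. A quick check using only the numbers in the table (the dictionary keys are just labels):

```python
BASELINE_MCUPS = 15  # P4 1.8 GHz row

svp_mcups = {
    "SVP16 @ 150 MHz": 52,
    "SVP32 @ 150 MHz": 103,
    "SVP64 @ 125 MHz": 167,
    "SVP128 @ 120 MHz": 302,
    "SVP128 @ 150 MHz": 378,
}

# Speedup relative to the baseline, rounded to two decimals as in the table.
speedups = {name: round(value / BASELINE_MCUPS, 2) for name, value in svp_mcups.items()}
```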
32
Comparative Performance
System         Freq      PEs/Chip   MCUPS/PE   Chips   MCUPS/Chip   Cost(1000)   MCUPS/1000
SVP128         150 MHz   128        2.95       1       378          5            75
SVP128         120 MHz   128        2.36       1       302          5            60
SVP64          125 MHz   64         2.61       1       167          5            33
SVP32          150 MHz   32         3.22       1       103          5            20
Kestrel        20 MHz    64         0.78       8       50           25           16
GeneMatcher2   192 MHz   192        5.21       16      1000         69           14
Fuzion 150     200 MHz   1536       1.63       1       2500         ?            ?
Reference 1 Estimated
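The MCUPS/PE column follows from dividing MCUPS/Chip by PEs/Chip. The sketch below recomputes it from the table's own figures; the dictionary layout is just for illustration.

```python
# (PEs per chip, MCUPS per chip) taken from the table rows
systems = {
    "SVP128 @ 150 MHz": (128, 378),
    "SVP128 @ 120 MHz": (128, 302),
    "SVP64": (64, 167),
    "SVP32": (32, 103),
    "Kestrel": (64, 50),
    "GeneMatcher2": (192, 1000),
    "Fuzion 150": (1536, 2500),
}

# Per-PE throughput, rounded to two decimals as in the table.
mcups_per_pe = {
    name: round(chip_mcups / pes, 2) for name, (pes, chip_mcups) in systems.items()
}
```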
33
Performance

Hardware
Xilinx Virtex-4 LX200

PEs   Freq (MHz)   Area   BRAM
16    150          13     22
32    150          22     38
64    125          41     70
128   120          80     134
34
Future Work

Software Development
How can HMMer and other systolic algorithms be implemented?
ISA Expansion
What additional instructions are needed?
What instructions can be added to optimize?
Hardware Development
How can we optimize the hardware to make it faster and smaller?
What hardware can we add to enhance performance?
How can we take advantage of advances in FPGAs, such as DSP48s?

35
Acknowledgments

Special Thanks
Young Cho
Roger Chamberlain
Jeremy Buhler
Joseph Lancaster
References
Di Blas et al., "The Kestrel Parallel Processor," IEEE Transactions on Parallel and Distributed Systems, January 2005
A. Jacob et al., "Whole Genome Comparison Using Commodity Workstations," Technical Report, 2003
