Title: Softcore Vector Processor
1Softcore Vector Processor
- Team ASP
- Brandon Harris
- Arpith Jacob
2Outline
- Motivation
- Smith-Waterman
- Solution
- System Architecture
- Overview
- Functional Unit
- Instruction Controller
- Processing Element
- Memory Controller
- ISA
- Results
- Future Research
3Motivation
- Smith-Waterman sequence alignment
14Motivation
- Smith-Waterman sequence alignment
- Similar Problems
- HMMer, BLAST, RNA Secondary Structure Prediction
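For readers unfamiliar with the workload, the following is a minimal sketch of Smith-Waterman local alignment scoring. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions chosen for illustration; the slides do not state which scoring scheme was used. The function also counts cell updates, the unit behind the MCUPS figures reported later.

```python
def smith_waterman_score(query, db, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Scoring parameters are illustrative assumptions, not values from the slides.
    """
    prev = [0] * (len(db) + 1)  # previous row of the score matrix
    best = 0
    cells = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, d in enumerate(db, start=1):
            diag = prev[j - 1] + (match if q == d else mismatch)
            up = prev[j] + gap        # gap in the database sequence
            left = curr[j - 1] + gap  # gap in the query sequence
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
            cells += 1
        prev = curr
    return best, cells


print(smith_waterman_score("ACGT", "TACGTT"))  # (8, 24)
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be updated in parallel. That dependency pattern is what makes the algorithm a natural fit for an array of processing elements.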
15Our Solution
- Softcore Vector Processor
- Massively Parallel
- Software programmable
- Configurable Instantiation
- Why Softcore?
- Optimize for specific applications
- Adapt to changes in algorithms
- FPGA technology improves with time
16Architectural Overview
- Streaming Architecture
- Memory Mapped FIFOs
- Read Once Data
- Write Once Data
- Provides communication between components
[Figure: block diagram with Software, DMA, and SVP Functional Unit blocks]
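The read-once / write-once streaming idea can be sketched in software as a pair of bounded FIFOs around a processing stage. This is only a behavioural model under assumed names (`StreamFifo`, `run_pipeline`, and the `unit` callback are hypothetical); it says nothing about how the actual hardware FIFOs or DMA engine are implemented.

```python
from collections import deque


class StreamFifo:
    """Hypothetical bounded FIFO: every word is written once and read once."""

    def __init__(self, depth):
        self.depth = depth
        self._words = deque()

    def write(self, word):
        if len(self._words) >= self.depth:
            return False  # full: the producer has to retry later
        self._words.append(word)
        return True

    def read(self):
        # None signals "empty"; a read consumes the word.
        return self._words.popleft() if self._words else None


def run_pipeline(words, depth=4, unit=lambda w: w + 1):
    """Software -> input FIFO -> functional unit -> output FIFO -> software."""
    fifo_in, fifo_out = StreamFifo(depth), StreamFifo(depth)
    pending = deque(words)
    results = []
    while len(results) < len(words):
        if pending and fifo_in.write(pending[0]):
            pending.popleft()
        word = fifo_in.read()
        if word is not None:
            fifo_out.write(unit(word))  # stand-in for the functional unit
        out = fifo_out.read()
        if out is not None:
            results.append(out)
    return results


print(run_pipeline([1, 2, 3]))  # [2, 3, 4]
```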
18Functional Unit
[Figure: Functional Unit block diagram with three Processing Elements, each with its own Reg File, a Memory Controller, Shared Local Memory, and Stream In / Stream Out]
19Instruction Controller
- SIMD Instruction Broadcast
[Figure: the instruction addi R5, R1, 10 is broadcast to three Processing Elements. R0 is 0 in each; R1 holds 1, 0, and 2, and the resulting R5 values are 11, 10, and 12.]
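The broadcast in this slide can be mimicked with a small simulation: one `addi` instruction is delivered to every processing element, and each applies it to its own register file. The class and function names are hypothetical, and the operand order (destination, source, immediate) is assumed from the `addi` lines in the sample code.

```python
class ProcessingElement:
    """Hypothetical PE model: a small register file plus one ALU operation."""

    def __init__(self, r1):
        self.regs = [0] * 6  # R0..R5, as labelled in the figure
        self.regs[1] = r1

    def execute(self, op, rd, rs, imm):
        if op != "addi":
            raise ValueError(f"unsupported op: {op}")
        self.regs[rd] = self.regs[rs] + imm


def broadcast(pes, instruction):
    """Deliver the same instruction to every PE and return the destination values."""
    for pe in pes:
        pe.execute(*instruction)
    rd = instruction[1]
    return [pe.regs[rd] for pe in pes]


pes = [ProcessingElement(r1) for r1 in (1, 0, 2)]
r5_values = broadcast(pes, ("addi", 5, 1, 10))
print(r5_values)  # [11, 10, 12]
```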
20Instruction Controller
- SIMD Instruction Broadcast
[Figure: the instruction Ld R2, 0(R3) is broadcast to three Processing Elements; each PE holds its own copy of ptr1 in R3.]
21Instruction Controller
- SIMD Instruction Broadcast
- Instruction Register Broadcast
- 40 Register Savings
[Figure: the Ld is replaced by an Ldir with operands R2, R0, and IR3, broadcast to three Processing Elements; ptr1 is shown alongside a fourth register file.]
22Instruction Controller
- SIMD Instruction Broadcast
- Instruction Register Broadcast
- 40 Register Savings
[Figure: the Ld with operands R2, R0, and ptr1 as received by three Processing Elements; ptr1 appears once, alongside a fourth register file.]
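A rough way to see the register saving: with a plain `ld`, every PE keeps its own copy of the pointer, whereas with `ldir` the pointer lives once in the instruction controller and is broadcast with the instruction. The sketch below assumes the address is the sum of a PE register and the controller register, as the sample code's `ldir PE_MEM_REG, PE_ZERO_REG(ir15)` suggests; the function names and the memory model are hypothetical.

```python
def ld(pe_regs, memory, rd, offset, rb):
    """ld rd, offset(rb): each PE takes the address from its own register rb."""
    for regs in pe_regs:
        regs[rd] = memory[regs[rb] + offset]


def ldir(pe_regs, ic_regs, memory, rd, rb, ir):
    """ldir rd, rb(ir): the controller broadcasts ic_regs[ir] with the instruction."""
    base = ic_regs[ir]
    for regs in pe_regs:
        regs[rd] = memory[regs[rb] + base]


memory = list(range(100, 116))  # assumed shared local memory contents
PTR1 = 7                        # assumed pointer value

# Plain ld: every PE spends R3 on a copy of ptr1.
with_ld = [[0, 0, 0, PTR1, 0, 0] for _ in range(3)]
ld(with_ld, memory, rd=2, offset=0, rb=3)

# ldir: ptr1 is stored once in the controller (IR3); the PEs' R3 stays free.
with_ldir = [[0] * 6 for _ in range(3)]
ic_regs = [0, 0, 0, PTR1]
ldir(with_ldir, ic_regs, memory, rd=2, rb=0, ir=3)

print([regs[2] for regs in with_ld])    # [107, 107, 107]
print([regs[2] for regs in with_ldir])  # [107, 107, 107]
```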
23Processing Element
[Figure: Processing Element datapath executing bmseti R17 EQ 16. Blocks shown: Ra/Rb address and immediate inputs, two Register Files (left and right) with Ra/Rb data outputs, Data Select, Pipeline Register, Compare, a second Pipeline Register, and Write Enables (Wr Enable Left, Wr En Right, Mem Wr Enable) with Data going to the Memory Controller.]
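One plausible reading of `bmseti` ... `bmend`, based on the figure labels and the sample code, is predicated execution: the compare result gates each PE's write enables until `bmend` restores them. The sketch below models only that reading. The class name, the choice of register 17 as the PE ID register, and the restriction to `EQ` are assumptions.

```python
class MaskedPE:
    """Hypothetical PE with a single write-enable mask bit."""

    ID_REG = 17  # assumed: R17 holds the PE's index, as in "bmseti R17 EQ 16"

    def __init__(self, pe_id, num_regs=32):
        self.regs = [0] * num_regs
        self.regs[self.ID_REG] = pe_id
        self.write_enable = True

    def bmseti(self, ra, cond, imm):
        if cond != "EQ":
            raise ValueError("only EQ appears in the slides")
        self.write_enable = self.regs[ra] == imm

    def bmend(self):
        self.write_enable = True

    def addi(self, rd, rs, imm):
        result = self.regs[rs] + imm  # computed in every PE
        if self.write_enable:         # but committed only where the mask is set
            self.regs[rd] = result


def run_masked_addi(num_pes, target_id):
    """Broadcast bmseti / addi / bmend; only the PE with target_id updates r3."""
    pes = [MaskedPE(i) for i in range(num_pes)]
    for pe in pes:
        pe.bmseti(MaskedPE.ID_REG, "EQ", target_id)
        pe.addi(3, 0, 5)
        pe.bmend()
    return [pe.regs[3] for pe in pes]


print(run_masked_addi(4, 3))  # [0, 0, 0, 5]
```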
24Functional Unit
[Figure: Functional Unit block diagram with three Processing Elements, each with its own Reg File, a Memory Controller, Shared Local Memory, and Stream In / Stream Out]
26Memory Controller
[Figure: IC and PE 0-3 connected to the Memory Controller, which connects to three Dual-Ported Block RAMs]
- Single Cycle Read
27Memory Controller
[Figure: IC and PE 0-3 connected to the Memory Controller, which connects to three Dual-Ported Block RAMs]
- Multiple Cycle Write
28Instruction Set Architecture
- Custom ISA
- Two Sets of Instruction Types
- Instruction Controller
- Processing Element
- Optimized for target applications
- Max, Min, Loop
- Expandable
- Core vs. Application Specific
29Sample Code
- _query_loop
- subir r8, r3, ir10
- nop
- nop
- max r4, r4, r8
- add r3, r19, PE_ZERO_REG
- bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
- icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
- nop
- nop
- ldir PE_MEM_REG, PE_ZERO_REG(ir15)
- nop
- nop
- nop
- nop
- addi r3, PE_MEM_REG, 0
- bmend
- _query_loop
- icaddi ir15, ir8, PE_NUM_ELEMENTS - 1
- subir r8, r3, ir10
- add r3, r19, PE_ZERO_REG
- ldir PE_MEM_REG, PE_ZERO_REG(ir15)
- max r4, r4, r8
- bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
- icaddi ir7, ir7, 1
- icaddi ir9, ir9, 1
- addi r3, PE_MEM_REG, 0
- bmend
- ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
- icloop ir4, ir5, _query_loop
30Results
- VHDL Implementation
- Simulated
- Synthesized
- Smith-Waterman
- 16 PE version tested
- Millions of Cell Updates Per Second (MCUPS)
31Smith-Waterman Speedup
System   Freq      MCUPS   Speedup
P4       1.8 GHz   15      1
SVP16    150 MHz   52      3.47
SVP32    150 MHz   103     6.87
SVP64    125 MHz   167     11.13
SVP128   120 MHz   302     20.13
SVP128   150 MHz   378     25.20
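The speedup column is each system's MCUPS divided by the P4 baseline of 15 MCUPS. The short sketch below reproduces the column from the table values; the `mcups` helper shows the usual definition of the metric (matrix cells updated per second, in millions), and its example inputs are made up for illustration.

```python
BASELINE_MCUPS = 15  # P4 1.8 GHz row of the table

MEASURED_MCUPS = {
    "SVP16 @ 150 MHz": 52,
    "SVP32 @ 150 MHz": 103,
    "SVP64 @ 125 MHz": 167,
    "SVP128 @ 120 MHz": 302,
    "SVP128 @ 150 MHz": 378,
}


def mcups(query_len, db_len, seconds):
    """Millions of cell updates per second for one query/database comparison."""
    return query_len * db_len / seconds / 1e6


def speedup(system_mcups, baseline=BASELINE_MCUPS):
    """Speedup relative to the baseline, rounded as in the table."""
    return round(system_mcups / baseline, 2)


for name, value in MEASURED_MCUPS.items():
    print(f"{name}: {speedup(value)}x")
```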
32Comparative Performance
System         Freq      PEs/Chip   MCUPS/PE   Chips   MCUPS/Chip   Cost ($1000)   MCUPS/$1000
SVP128         150 MHz   128        2.95       1       378          5              75
SVP128         120 MHz   128        2.36       1       302          5              60
SVP64          125 MHz   64         2.61       1       167          5              33
SVP32          150 MHz   32         3.22       1       103          5              20
Kestrel        20 MHz    64         0.78       8       50           25             16
GeneMatcher2   192 MHz   192        5.21       16      1000         69             14
Fuzion 150     200 MHz   1536       1.63       1       2500         ?              ?
Reference 1 Estimated
33Performance
- Hardware
- Xilinx Virtex-4 LX200
PEs   Freq (MHz)   Area   BRAM
16    150          13     22
32    150          22     38
64    125          41     70
128   120          80     134
34Future Work
- Software Development
- How can HMMer and other systolic algorithms be implemented?
- ISA Expansion
- What additional instructions are needed?
- What instructions can be added to optimize?
- Hardware Development
- How can we optimize the hardware to make it faster and smaller?
- What hardware can we add to enhance performance?
- How can we take advantage of advances in FPGAs, such as DSP48s?
35Acknowledgments
- Special Thanks
- Young Cho
- Roger Chamberlain
- Jeremy Buhler
- Joseph Lancaster
- References
- Di Blas et al., The Kestrel Parallel Processor, IEEE Transactions on Parallel and Distributed Systems, January 2005
- A. Jacob et al., Whole Genome Comparison Using Commodity Workstations, Technical Report, 2003
36Questions?
- Team ASP
- Brandon Harris
- Arpith Jacob