Title: Soft Multiprocessor Systems for Network Applications
1. Soft Multiprocessor Systems for Network Applications
- Yujia Jin, Will Plishker, Kaushik Ravindran, Nadathur Satish, Kurt Keutzer (University of California, Berkeley)
FPGA 2005 Architectures and Circuits
2. Networking Trends
- High performance requirements
  - Compute at line rate
- Programmability requirements
  - Support multiple services:
    - Packet forwarding
    - Encryption
    - Packet reordering
    - Multicast
    - Address translation
    - Quality of service
3. What is the ideal platform?
[Figure: candidate platforms for network applications, with one region labeled "Our focus"]
4. FPGAs: LUTs or Processors?
- Conventional approach: HDL to LUTs
  - Create hardware models in HDL
  - Map the application onto LUTs
- Alternative approach: program a multiprocessor in C
  - Create a network of soft processors on the FPGA
  - Execute a C program on the multiprocessor system
5. Soft Processors: Building Blocks for FPGA Designs
- Disadvantages
  - Performance loss due to software implementation
  - Less efficient use of FPGA area
- Advantages
  - Software implementation is quick and easy
  - Opens FPGAs to the larger world of software designers
- Driving questions
  - How much performance do soft multiprocessors lose?
  - Is the loss in performance worth the gains in design time?
6. Case Study: IPv4 Packet Forwarding
- 2-port router (2 Gbps)
- Xilinx Virtex-II Pro FPGA (2VP30)
- IP lookup: longest prefix match (trie lookup algorithm)
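The slides say the IP lookup uses a trie-based longest prefix match, but they do not say which trie variant. The sketch below is for illustration only. It is a minimal unibit binary trie, and the class and method names (`BinaryTrie`, `insert`, `lookup`) are hypothetical. Multibit and path-compressed tries are common refinements that reduce the number of memory accesses per lookup.

```python
import ipaddress
from typing import Optional


class TrieNode:
    """One node per prefix bit; next_hop is set where a stored prefix ends."""

    def __init__(self) -> None:
        self.children: list[Optional["TrieNode"]] = [None, None]
        self.next_hop: Optional[str] = None


class BinaryTrie:
    """Hypothetical unibit binary trie for IPv4 longest prefix match."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk the first prefixlen bits, most significant bit first.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # set only if a 0.0.0.0/0 default route exists
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break  # no longer prefix can match
            if node.next_hop is not None:
                best = node.next_hop  # longest match seen so far
        return best


trie = BinaryTrie()
trie.insert("10.0.0.0/8", "port0")
trie.insert("10.1.0.0/16", "port1")
print(trie.lookup("10.1.2.3"))     # port1
print(trie.lookup("10.2.0.1"))     # port0
print(trie.lookup("192.168.0.1"))  # None
```

Each lookup touches at most 32 nodes, one per address bit. Per-lookup memory traffic of this kind is one reason the lookup stage tends to dominate header-processing time.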
7. Multiprocessor for Header Processing
[Figure: header-processing multiprocessor built from MicroBlaze soft processors, FIFO queues, and an on-chip peripheral bus]
- IPv4 data-plane header processing: 1.80 Gbps throughput with 64-byte packets
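The arithmetic below restates the 2 Gbps line rate and the 1.80 Gbps result per packet for 64-byte packets. It counts raw packet bytes only and ignores Ethernet preamble and inter-frame gap. The slides do not say how framing overhead was treated, so this is an assumption. `packets_per_second` is a hypothetical helper.

```python
def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Packet rate for a bit rate, assuming no framing overhead is counted."""
    return throughput_bps / (packet_bytes * 8)


line_rate = packets_per_second(2e9, 64)     # 2-port router, 2 Gbps aggregate
measured = packets_per_second(1.80e9, 64)   # reported header-processing throughput

print(f"{line_rate / 1e6:.3f} Mpps at line rate")          # 3.906 Mpps at line rate
print(f"{1e9 / line_rate:.0f} ns per packet at line rate")  # 256 ns per packet at line rate
print(f"{measured / 1e6:.3f} Mpps measured")               # 3.516 Mpps measured
print(f"{1.80 / 2.0:.0%} of line rate")                    # 90% of line rate
```

Under this assumption, the multiprocessor as a whole must retire one minimum-size packet roughly every 256 ns to sustain line rate.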
8. Design Characteristics
9. Conclusions
- Soft multiprocessors are advantageous for easy application deployment
  - Quick way to get a working implementation
  - Some algorithms are much easier to implement in software
  - Critical components can be migrated to hardware
- Software allows high extensibility
  - Extension to NAT took only 75 more hours
- Modern FPGAs have sufficient capacity to build interesting multiprocessors
  - 30-40 processors, on-chip memory, diverse interconnection schemes, peripheral support
10. Looking Ahead: The Mapping Problem
Mapping flow:
- Application description
- High-level optimizations
- Task graph (platform specific)
- Profile
- Architecture configuration
- HW/SW partitioning, task allocation, data layout, communication assignment
- Compilation / synthesis
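The task-allocation step of this flow can be illustrated with a deliberately simple heuristic. This sketch is not the authors' method. It applies longest-processing-time-first (LPT) greedy scheduling, which assigns each task to the currently least-loaded processor. The task names and profiled cycle counts are hypothetical. The sketch also ignores task-graph precedence and communication cost, both of which a real mapping must model.

```python
import heapq


def lpt_allocate(task_cycles: dict[str, int], num_procs: int) -> list[list[str]]:
    """Greedy LPT allocation of independent tasks to processors.

    Simplifying assumptions: no precedence edges, no communication cost.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    assignment: list[list[str]] = [[] for _ in range(num_procs)]
    # Place the most expensive tasks first.
    for task, cycles in sorted(task_cycles.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)  # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(heap, (load + cycles, p))
    return assignment


# Hypothetical profiled cycle counts per task.
tasks = {"lookup": 900, "classify": 500, "checksum": 300, "ttl": 100}
print(lpt_allocate(tasks, 2))  # [['lookup'], ['classify', 'checksum', 'ttl']]
```

In this example both processors end up with 900 cycles of work. Precedence edges and FIFO traffic between processors would change that picture, which is why the flow treats data layout and communication assignment as separate steps.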
11. Design Space for Packet Streaming
[Figure: array architecture of MicroBlaze (µBlaze) soft processors between a packet source and sink; one axis is the number of pipeline stages, the other the number of parallel processors in each stage]
Packet streaming pattern:
- A steady flow of packets is supplied to the forwarding unit
- Packets are independent of each other
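Because packets are independent, the array architecture lends itself to a simple analytic model. Replicating a stage across p processors multiplies that stage's rate by p, and the pipeline as a whole runs at the rate of its slowest stage. The sketch below encodes that model. The stage names and per-processor rates (in thousands of packets per second) are hypothetical, and the model ignores FIFO, bus, and dispatch overheads.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    rate_kpps: float  # packets/s (x1000) one processor sustains on this stage
    processors: int   # parallel processors replicated in this stage


def array_throughput(stages: list[Stage]) -> tuple[float, str]:
    """Return (pipeline throughput in kpps, name of the bottleneck stage).

    Independent packets let a stage with p processors run p times faster;
    the pipeline can go no faster than its slowest stage.
    """
    per_stage = [(s.rate_kpps * s.processors, s.name) for s in stages]
    return min(per_stage)


config = [
    Stage("verify_ttl", 2000, 2),  # hypothetical numbers, not from the slides
    Stage("lookup", 800, 4),
    Stage("rewrite", 3000, 1),
]
print(array_throughput(config))  # (3000, 'rewrite')
```

Here, adding a second processor to the `rewrite` stage would shift the bottleneck to `lookup` at 3200 kpps. Moving along the two axes of the design space amounts to this kind of trade-off.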
12. ILP Formulation for Design Space Exploration
- Treat the problem as a flow problem
- Model a processor as a node with a flow rate t_i
- Maximize overall throughput
  - Calculate the throughput of each stage
  - Set the overall throughput to the minimum of all stage throughputs
  - Add constraints to reflect board and architecture limitations
$$
\begin{aligned}
\text{maximize}\quad & \text{Overall\_throughput} \\
\text{subject to}\quad & \text{Throughput}_i = t_i \cdot \text{number\_of\_processor}_i, && i = 1, \dots, n \\
& \text{Overall\_throughput} \le \text{Throughput}_i, && i = 1, \dots, n \\
& \text{board and architecture constraints}
\end{aligned}
$$
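On small instances, this formulation can be checked without an ILP solver. The sketch below replaces the ILP with exhaustive enumeration. It models the board constraint as an assumed cap on the total number of processors (`budget`) and requires at least one processor per stage. It then returns the allocation that maximizes the bottleneck throughput min_i t_i * n_i. The rates are hypothetical, in thousands of packets per second per processor, and `best_allocation` is a hypothetical name.

```python
from itertools import product


def best_allocation(rates: list[float], budget: int) -> tuple[float, tuple[int, ...]]:
    """Maximize min_i rates[i] * n[i] subject to n[i] >= 1 and sum(n) <= budget.

    Brute-force stand-in for an ILP solver: examines budget ** len(rates)
    candidates, so it is only practical for a few stages. Returns (0.0, ())
    if the budget cannot give every stage one processor.
    """
    best: tuple[float, tuple[int, ...]] = (0.0, ())
    for n in product(range(1, budget + 1), repeat=len(rates)):
        if sum(n) > budget:
            continue  # violates the assumed total-processor constraint
        overall = min(t * k for t, k in zip(rates, n))  # bottleneck stage
        if overall > best[0]:
            best = (overall, n)
    return best


print(best_allocation([2000, 800, 3000], 8))  # (3200, (2, 4, 2))
```

Enumeration makes the objective and constraints easy to inspect. The candidate count grows exponentially with the number of stages, however, and board limits such as memory, bus ports, or FIFO links go beyond a single processor count. Those are the cases where an ILP solver becomes the practical choice.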