Title: SCORE
1 SCORE
Stream Computations Organized for Reconfigurable Execution
Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek
University of California, Berkeley, BRASS group
André DeHon
California Institute of Technology, Dept. of Computer Science
http://brass.cs.berkeley.edu/SCORE/
2 Goal: Software Survival
- Software for microprocessors survives on new devices
- Binary compatibility
- Automatic improvement
- Software for reconfigurable devices does not
- Substantial effort to port/redeploy
3 Outline
- Problem: Software Survival
- A New Compute Model
- SCORE Components
- Preliminary Results
- Future Work
4 Why Can't Reconfig. Software Survive?
- Resource constraints/sizes are exposed
- to programmer
- in low-level representation (netlist)
- Design revolves around device size
- Algorithmic structure
- Exploited parallelism
5 The SCORE Approach
- A compute model with unbounded resources
- Efficient hardware virtualization
- Demand paging
6 Page-Compatible Devices
- Family of devices with
- Common page definition
- Varying number of pages
- Binary Compatibility
- Automatic Performance Improvement
7 Virtualizing a Netlist (is bad)
- Netlist is sensitive to timing
- Disallow asynchronous features (e.g. busses)
- Synchronous
- WASMII [Ling and Amano, FCCM '93]
- Page I/O via registers
- Execute each cycle of every page
- Huge reconfiguration overhead!
8 Previous Attempts at Virtualization
- Multi-context
- DPGA [DeHon, FPGA '94]
- TM-FPGA [Xilinx, FCCM '97]
- Configuration Cache
- Striped
- PipeRench [CMU, FPGA '98]
- Pipelined reconfiguration
- Restricted to feed-forward pipelines
9 Streams
- Goal
- Less frequent reconfiguration
- Batch process block of inputs
- Amortize reconfiguration cost over large data set
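The amortization argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, the 5000-cycle reconfiguration cost is borrowed from the functional simulation slide, and one cycle of compute per input is an assumption, not a figure from the slides.

```python
def reconfig_overhead_fraction(reconfig_cycles, batch_size, cycles_per_input=1):
    """Fraction of total cycles spent reconfiguring when a page is loaded
    once and then processes `batch_size` inputs before being swapped out.
    `cycles_per_input` is an assumed per-input compute cost."""
    compute_cycles = batch_size * cycles_per_input
    return reconfig_cycles / (reconfig_cycles + compute_cycles)


# Larger batches spread the same reconfiguration cost over more inputs.
for batch in (1, 100, 10_000, 1_000_000):
    fraction = reconfig_overhead_fraction(5000, batch)
    print(f"batch={batch:>9}  overhead={fraction:.4f}")
```

With a batch of one input (reconfiguring for every input, as in cycle-by-cycle netlist virtualization) nearly all time goes to reconfiguration; with large batches the overhead becomes a small fraction of the run.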
10 Stream Implementation
- Only one endpoint (page) loaded
- Stream memory buffer
- Desire distributed, on-chip memory
- Both endpoints (pages) loaded
- Stream wire
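The two stream implementations can be sketched in software to show that they produce the same result. This is a minimal model, not the SCORE runtime: the `producer` and `consumer` operators are hypothetical, and a Python `deque` stands in for the stream memory buffer.

```python
from collections import deque


def producer(x):
    """Hypothetical upstream operator: scale each input."""
    return 2 * x


def consumer(y):
    """Hypothetical downstream operator: add an offset."""
    return y + 1


def run_spatial(inputs):
    """Both endpoints loaded: each token passes straight through the 'wire'."""
    return [consumer(producer(x)) for x in inputs]


def run_time_multiplexed(inputs):
    """One endpoint loaded at a time: tokens wait in a memory buffer."""
    buffer = deque()
    for x in inputs:                 # phase 1: only the producer page is resident
        buffer.append(producer(x))
    outputs = []
    while buffer:                    # phase 2: producer swapped out, consumer swapped in
        outputs.append(consumer(buffer.popleft()))
    return outputs


data = list(range(5))
print(run_spatial(data))
print(run_time_multiplexed(data))
```

Because the buffer preserves token order, the consumer sees the same sequence in both cases; only the timing differs.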
11 Execution Example: Spatial
12 Execution Example: Time-Multiplexed
13 SCORE Components
14 SCORE Compute Model
- Computation graph of compute nodes
- Concretely: compute pages
- Abstractly: operators with local state (FSM)
- Communication: streaming data flow
- Storage
- Streams
- Memory segments, accessed through streams
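A toy rendering of this compute model: operators with local state, connected by FIFO streams, fired whenever input tokens are available. All class and function names here (`Stream`, `Scale`, `Accumulator`, `run`) are hypothetical illustrations, not part of any SCORE API, and unbounded Python deques stand in for streams.

```python
from collections import deque


class Stream(deque):
    """Unbounded FIFO of tokens connecting two operators."""


class Scale:
    """Stateless operator: multiplies each token by a constant."""
    def __init__(self, src, dst, k):
        self.src, self.dst, self.k = src, dst, k

    def can_fire(self):
        return len(self.src) > 0

    def fire(self):
        self.dst.append(self.k * self.src.popleft())


class Accumulator:
    """Operator with local state: emits the running sum of its input."""
    def __init__(self, src, dst):
        self.src, self.dst, self.total = src, dst, 0

    def can_fire(self):
        return len(self.src) > 0

    def fire(self):
        self.total += self.src.popleft()
        self.dst.append(self.total)


def run(operators):
    """Fire any ready operator until no operator can fire."""
    progress = True
    while progress:
        progress = False
        for op in operators:
            if op.can_fire():
                op.fire()
                progress = True


a, b, c = Stream(), Stream(), Stream()
graph = [Scale(a, b, 3), Accumulator(b, c)]
a.extend([1, 2, 3])
run(graph)
print(list(c))  # [3, 9, 18]
```

The result does not depend on the order in which ready operators are fired, which is the property that lets a scheduler run the same graph spatially or time-multiplexed.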
15 SCORE Hardware Model
- Paged FPGA
- Compute Page (CP)
- Fixed-size slice of RC hardware
- Fixed number of I/O ports
- Distributed, on-chip memory
- Configurable Memory Block (CMB)
- Stream access
- High-level interconnect
- Microprocessor
- Run-time support and user code
16 SCORE Run-Time Support
- Mechanics of run-time reconfiguration
- Page swap: context save/load
- Reconfigure interconnect
- Page Scheduling
- Which page to run where, when
- Static vs. dynamic
17 Functional Simulation
- FPGA based on HSRA [Berkeley, FPGA '99]
- CP: 512 4-LUTs
- CMB: 2 Mbit DRAM
- Area for CP-CMB pair
- Page reconfiguration: 5000 cycles (from CMB)
- Synchronous operation (same clock speed as processor)
- x86 microprocessor
- Page Scheduler task
- Swap on timer interrupt (every 250,000 cycles)
- Fully dynamic scheduling
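The timer-driven swap can be sketched as a simple time-slice loop. The slides do not specify the scheduling policy, so the round-robin policy below is a stand-in, and the function names are hypothetical. The two constants come from this slide; the overhead estimate further assumes one reconfiguration per timeslice, with all pages loading in parallel from their CMBs.

```python
TIMESLICE_CYCLES = 250_000   # from the slide: swap on timer interrupt
RECONFIG_CYCLES = 5_000      # from the slide: page reconfiguration from CMB


def round_robin_schedule(virtual_pages, physical_pages, timeslices):
    """Yield the list of virtual pages resident in each timeslice.
    Round-robin is an assumed policy, used here only for illustration."""
    n = len(virtual_pages)
    start = 0
    for _ in range(timeslices):
        count = min(physical_pages, n)
        yield [virtual_pages[(start + i) % n] for i in range(count)]
        start = (start + count) % n


def overhead_fraction():
    """Share of each timeslice spent reconfiguring, assuming one
    parallel reconfiguration per timeslice."""
    return RECONFIG_CYCLES / TIMESLICE_CYCLES


pages = ["A", "B", "C", "D", "E"]
for t, resident in enumerate(round_robin_schedule(pages, 2, 4)):
    print(t, resident)
print(f"reconfiguration overhead per timeslice: {overhead_fraction():.1%}")
```

Under these assumptions reconfiguration costs about 2% of each timeslice. A device with more physical pages keeps more of the graph resident per slice, which is where the automatic performance scaling would come from.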
18 Applications
- Multimedia processing applications
- Hand-partitioned into 512-LUT pages
- Good applications
- Primarily feed-forward (feedback loops fit in HW)
- Bad applications
- Large, tight feedback loops (e.g. ADPCM)
19 Application: JPEG Encode
20 Scaling Results: JPEG Encode
[Figure: total time (makespan, in millions of cycles) vs. number of physical compute pages]
21 Summary
- SCORE enables software survival on reconfigurable systems
- Binary compatibility
- Automatic performance scaling
- Virtual Hardware
- Requirements
- Graph-based compute model
- Paged FPGA hardware
- Run-time support for RTR/Scheduling
22 Future Work
- Compilation/CAD
- Partitioning FSM operators into pages
- Study architectural parameters
- Page size
- CMB size
- Tolerable reconfiguration time
- Scheduling
- Static scheduling
23 More Info on the Web
- SCORE project
- http://brass.cs.berkeley.edu/SCORE/
- Tutorial
- http://brass.cs.berkeley.edu/documents/score_tutorial.html