Title: SPREE Tutorial
1 SPREE Tutorial
- Peter Yiannacouras
- April 13, 2006
2 Processors on FPGAs
- You all used FPGAs (ECE241)
- Adders
- 7-segment decoders
- Etc.
- We are putting whole microprocessors on them
- We call these soft processors
3 Hard Versus Soft Processors
- Soft Processor
- Written in HDL (e.g. Verilog)
- Programmed onto chip
- Hard Processors
- Made of transistors
- Costs millions to make
- Faster, smaller, less power
4 Processors and FPGA Systems
- FPGAs are a common platform for digital systems
[Figure: an FPGA system containing a UART, Memory Interface, Soft Processor, Custom Logic, and Ethernet]
- The soft processor performs coordination and even computation
- Better processors => less hardware to design
5 Our Research Problem
- Soft processors have worse
- Area
- Speed
- Power
- But are
- Flexible
- Use this flexibility to counteract the disadvantages. How?
- Customize the processor's architecture (e.g. Intel vs AMD, or Motorola 68360 vs 68010). How?
6 Research Goals
- Understand tradeoffs in soft processors
- E.g. a hardware multiplier is big but can perform multiplies fast
- Customize it to the application
- E.g. bubble sort doesn't use multiplies, therefore remove the hardware multiplier and save on area
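The bubble sort example above can be sketched as a check over a benchmark's instruction mix: if no multiply instruction appears, the hardware multiplier is a candidate for removal. This is a minimal sketch, assuming the benchmark is available as a list of MIPS mnemonics. The function name `needs_hardware_multiplier` is hypothetical and not part of SPREE.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of SPREE): returns true if any instruction
// in the program is one of the classic MIPS multiply instructions.
bool needs_hardware_multiplier(const std::vector<std::string>& mnemonics) {
    static const std::set<std::string> multiply_ops = {"mult", "multu"};
    for (const std::string& op : mnemonics) {
        if (multiply_ops.count(op) > 0) {
            return true;  // at least one multiply: keep the multiplier
        }
    }
    return false;  // no multiplies: the multiplier could be removed
}
```

A program made only of loads, compares, branches, and stores would return false here, which is the case where removing the multiplier saves area.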
7 SPREE System (Soft Processor Rapid Exploration Environment)
- Input: Processor description
- Verify ISA against datapath
- Output: Synthesizable Verilog
8 Input: Instruction Set Architecture (ISA) Description
- Graph of Generic Operations (GENOPs)
- Edges indicate flow of data
[Figure: SPREE GENOP graph for MIPS ADD (add rd, rs, rt), with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE]
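One way to picture such an ISA description is as a small directed graph. The sketch below is illustrative only: `GenopGraph` and `make_mips_add` are hypothetical names, not SPREE's real classes, and the edge list is an assumed reading of the figure (FETCH feeds both RFREADs, both feed ADD, and ADD feeds RFWRITE).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for an ISA description; SPREE's real classes may differ.
struct GenopGraph {
    std::vector<std::string> nodes;          // one GENOP name per node
    std::vector<std::pair<int, int>> edges;  // (producer index, consumer index)

    int add_node(const std::string& genop) {
        nodes.push_back(genop);
        return static_cast<int>(nodes.size()) - 1;
    }
    void add_edge(int from, int to) { edges.emplace_back(from, to); }
};

// Builds a graph for MIPS "add rd, rs, rt" using the GENOP names on the slide.
// The edges are an assumption about how the figure connects them.
GenopGraph make_mips_add() {
    GenopGraph g;
    int fetch = g.add_node("FETCH");
    int read_rs = g.add_node("RFREAD");
    int read_rt = g.add_node("RFREAD");
    int add = g.add_node("ADD");
    int write_rd = g.add_node("RFWRITE");
    g.add_edge(fetch, read_rs);   // instruction word selects rs
    g.add_edge(fetch, read_rt);   // instruction word selects rt
    g.add_edge(read_rs, add);     // first operand
    g.add_edge(read_rt, add);     // second operand
    g.add_edge(add, write_rd);    // sum is written back to rd
    return g;
}
```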
9 Input: Datapath Description
- Interconnection of hand-coded components
- Allows efficient synthesis
- Described using C++
[Figure: SPREE builds the datapath from components in the SPREE Component Library: Ifetch, Reg File, Mul, ALU, Shifter, Data Mem, Write Back]
10 Component Selection
- Select by name
- Names looked up in library
- Stored in cpugen/rtl_lib
RTLComponent *ifetch = new RTLComponent("ifetch");
RTLComponent *reg_file = new RTLComponent("reg_file");
11 Datapath Wiring Example
[Figure: Regfile component with ports dst, a_reg, a_data, b_reg, b_data, writedata]
proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
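To show what the two wiring calls record, here is a self-contained mock. The `RTLComponent`, `Connection`, and `Processor` types below are simplified stand-ins written for this illustration; they only mimic the call shapes shown on the slides and are not SPREE's actual classes, which also look components up in the library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock component: only stores its name (the real one is looked up in cpugen/rtl_lib).
struct RTLComponent {
    std::string name;
    explicit RTLComponent(const std::string& n) : name(n) {}
};

// One wire from a source port to a destination port.
struct Connection {
    std::string src, src_port, dst, dst_port;
};

// Mock processor: only records the requested connections.
struct Processor {
    std::vector<Connection> connections;
    void addConnection(RTLComponent* src, const std::string& src_port,
                       RTLComponent* dst, const std::string& dst_port) {
        connections.push_back({src->name, src_port, dst->name, dst_port});
    }
};

// Repeats the wiring from the slide against the mock types.
Processor build_example() {
    RTLComponent ifetch("ifetch");
    RTLComponent reg_file("reg_file");
    Processor proc;
    proc.addConnection(&ifetch, "rs", &reg_file, "a_reg");
    proc.addConnection(&ifetch, "rt", &reg_file, "b_reg");
    return proc;
}
```

After `build_example()`, the mock holds two connections: the `rs` field of the fetched instruction drives the register file's `a_reg` port, and `rt` drives `b_reg`.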
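12 SPREE System Backend (Soft Processor Rapid Exploration Environment)
[Figure: backend flow. The Processor Description feeds the SPREE generator (spegen), which emits Verilog. The Verilog goes to the Quartus II CAD Software (specadflow), which reports 1. Area, 2. Clock Frequency, 3. Power. The Verilog and the Benchmarks go to the Modelsim Verilog Simulator (spebenchmark), which reports 4. Cycle Count. The Benchmarks also run on the Mint MIPS Simulator (simulator/run), and the two traces are compared.]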
13 Walking through an Example (see README.txt)
- Choose a pre-built processor
- cpugen/src/arch lists all the processors
- Let's choose pipe3_serialshift
- 3-stage pipeline with serial shifter
14 Using SPREE on a Processor
- Generate, benchmark, synthesize
spegen pipe3_serialshift -> Generates Verilog
spebenchmark pipe3_serialshift -> Runs benchmarks
specadflow pipe3_serialshift -> Synthesizes processor
specompare pipe3_serialshift -> Displays results
15 spegen: Generating Processors
- Input: Processor description
- Syntax: spegen <processor name>
- Output
- A folder named after the processor
- Hand-coded Verilog modules
- system.v: generated hookup and control
- OUT.cpugen: stages per instruction, hazard window/branch penalty
- test_bench.v: test bench for Modelsim simulation
16 Benchmarking
- Run programs on the processor
- Measure time taken till completion
- Verify functionality
- Can do this without knowing anything about the benchmarks themselves
17 spebenchmark: Benchmarking
- Input: Processor implementation
- Syntax: spebenchmark <processor>
- Output (ideally)
- Cycle counts of all benchmarks
- Traces: /tmp/modelsim_trace.txt
Benchmarking pipe3_serialshift
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
18 Benchmarking under the hood
[Figure: benchmarking flow. C source benchmarks go through the Compiler (gcc - MIPS) to produce a Binary Executable. spebenchmark runs it on the Modelsim Verilog Simulator, producing the Cycle Count and the traces /tmp/modelsim_trace.txt and /tmp/modelsim_store_trace.txt. The Mint MIPS Simulator (simulator/run) produces a reference trace under applications/<benchmark name>/mint. The two traces are compared.]
19 specompiler - Setup compiler
- Choose the path to your compiler (prebuilt)
- Default: /jayar/b/b0/yiannac/spe/compiler
- GCC 3.3.3, software division
- Another: /jayar/b/b0/yiannac/spe/compiler-softmul
- GCC 3.3.3, software division and software multiplication
- specompiler will
- Compile all benchmarks (and store binaries)
- Simulate all benchmarks (and store traces)
specompiler /jayar/b/b0/yiannac/spe/compiler-softmul
After this point, you can just run spebenchmark
20 spebenchmark - failure
- Shows discrepancy between MINT and Modelsim
Benchmarking pipe3_serialshift
Simulating bubble_sort ...
Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 IR=24090001 05 00000000
Mint: PC=040000b8 IR=8c47004c 07 00000064
- Clues to where the error occurred: the last two fields on each line are the destination register and the value being written
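The comparison behind this report can be sketched as a walk over two traces that stops at the first entry where they differ. This is a minimal sketch, not spebenchmark's actual implementation. It assumes each trace entry holds the four fields visible in the sample output (PC, IR, destination register, value) and that all four are compared. `TraceEntry` and `first_discrepancy` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One register-file write, using the fields visible in the sample output.
struct TraceEntry {
    std::uint32_t pc;
    std::uint32_t ir;
    unsigned dest_reg;
    std::uint32_t value;
};

bool same_entry(const TraceEntry& a, const TraceEntry& b) {
    return a.pc == b.pc && a.ir == b.ir &&
           a.dest_reg == b.dest_reg && a.value == b.value;
}

// Returns the index of the first mismatching entry, or -1 if the traces match.
// If one trace is a strict prefix of the other, the mismatch is reported at
// the length of the shorter trace.
long first_discrepancy(const std::vector<TraceEntry>& modelsim,
                       const std::vector<TraceEntry>& mint) {
    std::size_t common =
        modelsim.size() < mint.size() ? modelsim.size() : mint.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (!same_entry(modelsim[i], mint[i])) {
            return static_cast<long>(i);
        }
    }
    if (modelsim.size() != mint.size()) {
        return static_cast<long>(common);
    }
    return -1;
}
```

The index returned points at the write to inspect first, which is where the destination register and written value give the clues mentioned above.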
21 spebenchmark - waveforms
- Can see any signal within the processor
sim_gui bubble_sort pipe3_serialshift
22 Modelsim
- LEARN IT!!!
- Quartus Simulator is vastly inferior, and even unusable for our purposes
23 The Testbench (test_bench.v)
- What is it?
- The stimulus and monitor for your circuit
- SPREE generates it automatically
- And hence it works right away
- Hand-coding your own processor means
- You have to interface with the test bench yourself
- Once you have the testbench you can use spebenchmark
24 Manual Interfacing with the Testbench
- Need only 6 wires
- To track writes to register file and data mem
- Register file: regfile_we, regfile_dst, regfile_data
- Data memory: datamem_we, datamem_addr, datamem_data
[Figure: test_bench.v connected to your soft processor through these six wires]
25 SPREE System Backend (Soft Processor Rapid Exploration Environment)
[Figure: repeat of the backend flow diagram from slide 12]
26 specadflow: Synthesis
- Input: Processor implementation
- Syntax: specadflow <processor name>
- Performs a seed sweep
- Average several runs since results are noisy
- Run several instances of quartus
- Across several machines in parallel
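Averaging over a seed sweep can be sketched as follows. This assumes a plain arithmetic mean over the per-seed results, which the slides do not confirm. `SynthResult` and `average_results` are hypothetical names and not specadflow's real code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Result of one synthesis run (one seed); field names are illustrative.
struct SynthResult {
    double area;                 // LEs or ALUTs
    double clock_mhz;            // clock frequency in MHz
    double energy_nj_per_cycle;  // estimated energy per cycle in nJ
};

// Arithmetic mean of each metric across all runs of a seed sweep.
// Returns all zeros for an empty sweep.
SynthResult average_results(const std::vector<SynthResult>& runs) {
    SynthResult mean = {0.0, 0.0, 0.0};
    if (runs.empty()) {
        return mean;
    }
    for (const SynthResult& r : runs) {
        mean.area += r.area;
        mean.clock_mhz += r.clock_mhz;
        mean.energy_nj_per_cycle += r.energy_nj_per_cycle;
    }
    const double n = static_cast<double>(runs.size());
    mean.area /= n;
    mean.clock_mhz /= n;
    mean.energy_nj_per_cycle /= n;
    return mean;
}
```

Because each seed gives a slightly different placement and routing, comparing two processors on single runs can be misleading, and the averaged figures are the ones to compare.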
27 specadflow: Output
- Output
- Synthesis results (hidden)
- Summary output
Started Tue 6:27PM, Waiting for processes
10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081 75.7812 0.9982 2 ...
Waiting on eda writer
- Summary fields, in order
- Area (LEs or ALUTs)
- Clock Frequency (MHz)
- Estimated Energy/cycle dissipated (nJ/cycle)
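The synthesis metrics become more useful when combined with a benchmark's cycle count: wall-clock time is cycles divided by clock frequency, and total energy is cycles times energy per cycle. This is a small sketch of that arithmetic. The function names are hypothetical, and pairing a cycle count with a frequency is only valid when both come from the same processor.

```cpp
#include <cassert>
#include <cmath>

// Execution time in microseconds: cycles / (MHz) gives microseconds directly,
// since 1 MHz is one cycle per microsecond.
double execution_time_us(unsigned long cycles, double clock_mhz) {
    return static_cast<double>(cycles) / clock_mhz;
}

// Total energy in nanojoules for a run of the given length.
double total_energy_nj(unsigned long cycles, double energy_nj_per_cycle) {
    return static_cast<double>(cycles) * energy_nj_per_cycle;
}
```

For example, `execution_time_us(2994, 75.7812)` takes the bubble_sort cycle count and the clock frequency from the sample outputs, assuming for illustration that both describe the same processor.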
28 Any Questions?
- For technical support, ask me
29 EXTRAS
30 Setup/Install
- Copy and unpack the SPREE tarball
- /jayar/b/b0/yiannac/spree.tar.gz
- Build all the SPREE software
- Follow instructions in INSTALL.txt
- If there are any errors, email me
cd spree
make
31 SPREE Directory Structure
spree
- applications: benchmarks (C source)
- cpugen: the cpu generator, processor descriptions
- modelsim: Verilog simulator
- quartus: synthesis
- simulator: MIPS simulator
- compiler: binutils, gcc, newlib
32 Setup cluster
- Choose the cluster you're using
- aenao: high performance, limited access
- eecg: any eecg-connected machine
- Edit quartus/machines.txt
- Put a list of 11 or so good eecg machines
specluster eecg
OR
specluster aenao