Title: Optimizing the mMIPS
1Optimizing the mMIPS
2The mMIPS
- Pipelined core
- Hazard detection
- No forwarding
- mMIPS instruction set
- 31 instructions available in hardware (add, bnez, mul, ...)
- Other instructions supported via the C compiler (div, sra, ...)
3Outline
- LCC compiler for the mMIPS
- Using memories in the mMIPS
- The mMIPS vs Hennessy and Patterson
4Toolflow
[Figure: toolflow with a test path and an implementation path]
- sw: Application (C source), compiled with the LCC C Compiler in both paths
- hw: mMIPS (C sources that use SystemC libraries)
- Test: Borland C Compiler
- Implementation: Synopsys SystemC compiler, Synopsys FPGA Compiler II, Xilinx ISE
5LCC compiler: it's a C compiler
- Consider the following code fragment (accepted by C++ and C99, but not by LCC, which implements ANSI C)
    for (int i = 0; i < 3; i++)
      a[i] = ...;
- It should be
    int i;
    for (i = 0; i < 3; i++)
      a[i] = ...;
6How does a compiler work?
lcc prog.c -o mips_rom.bin
7Adding special functions
- Examples
- Division, multiply, swap, clip, ...
- Constraints
- At most 2 input operands and 1 output operand
- Manifest loop bounds
- Clock frequency
- Chip area
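A clip operation is one example that fits the operand constraint of two inputs and one output. The sketch below is plain C describing the behaviour such a special function could have. The name `clip`, the argument order and the exact semantics (saturating a value to the range [0, limit]) are assumptions made for illustration; they are not taken from the mMIPS sources.

```c
/* Hypothetical reference behaviour for a "clip" special function.
 * Two input operands (value, limit) and one output operand (the
 * return value), matching the operand constraint for special functions. */
int clip(int value, int limit)
{
    if (value < 0)
        return 0;        /* saturate at the lower bound */
    if (value > limit)
        return limit;    /* saturate at the upper bound */
    return value;        /* already inside [0, limit] */
}
```

A function of this shape has no loops and a fixed, small amount of work, which is what makes it a candidate for a single instruction; whether it pays off still depends on the clock frequency and chip area constraints.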
8Securing our skies
- Measure height each second
- The airplane may never be below 1000ft for more than 1 second
- If needed, take appropriate action
10missile.c
#define TRUE 1
#define FALSE 0
int launch(int height1, int height2)
{
  int l;
  if (height1 < 1000 && height2 < 1000)
    l = TRUE;
  else
    l = FALSE;
  return l;
}
void main(void)
{
  int height1, height2;
11Assembler
int launch(int height1, int height2)
{
  int l;
  if (height1 < 1000 && height2 < 1000)
    l = TRUE;
  else
    l = FALSE;
  return l;
}

80: addiu sp,sp,-8
84: li t8,1000
88: slt s8,a0,t8
8c: beqz s8,0xac
90: nop
94: slt s8,a1,t8
98: beqz s8,0xac
9c: nop
a0: li t8,1
a4: b 0xb0
a8: sw t8,4(sp)
ac: sw zero,4(sp)
b0: lw v0,4(sp)
b4: jr ra
b8: addiu sp,sp,8

lcc missile.c -o missile
disas missile
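The requirement that the airplane is never below 1000ft for more than one second amounts to checking each pair of consecutive one-second samples with launch(). The sketch below shows this on a host compiler. The helper `first_violation` and the idea of keeping the samples in an array are assumptions for illustration; the slides only show launch() itself.

```c
#define TRUE 1
#define FALSE 0

/* launch() as in missile.c: TRUE when two consecutive one-second
 * height samples are both below 1000 ft. */
int launch(int height1, int height2)
{
    return (height1 < 1000 && height2 < 1000) ? TRUE : FALSE;
}

/* Hypothetical helper: scan n one-second samples and return the index
 * of the first sample at which launch() fires, or -1 if it never does. */
int first_violation(const int *heights, int n)
{
    int i;   /* declared up front, as an ANSI C compiler such as LCC requires */
    for (i = 1; i < n; i++) {
        if (launch(heights[i - 1], heights[i]))
            return i;
    }
    return -1;
}
```

A single sample below 1000ft does not trigger the action; only two in a row do, which corresponds to the two conditional branches in the disassembly.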
12Adding a special function to the mMIPS (overview)
- New mMIPS instruction launch
- Select an opcode and function code
- opcode = 0, function code = 0x10 (not yet used)
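To make the choice of opcode and function code concrete, the sketch below packs an instruction word in C. It assumes the mMIPS uses the standard MIPS R-type field layout (opcode, rs, rt, rd, shamt, funct); the helper names `encode_rtype` and `encode_launch` are hypothetical and do not come from the mMIPS sources.

```c
/* Pack the fields of an R-type instruction into a 32-bit word.
 * Assumed layout (standard MIPS R-type):
 *   opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0] */
unsigned long encode_rtype(unsigned opcode, unsigned rs, unsigned rt,
                           unsigned rd, unsigned shamt, unsigned funct)
{
    return ((unsigned long)(opcode & 0x3F) << 26) |
           ((unsigned long)(rs     & 0x1F) << 21) |
           ((unsigned long)(rt     & 0x1F) << 16) |
           ((unsigned long)(rd     & 0x1F) << 11) |
           ((unsigned long)(shamt  & 0x1F) << 6)  |
            (unsigned long)(funct  & 0x3F);
}

/* Hypothetical encoding of "launch rd, rs, rt" using opcode 0 and
 * function code 0x10, the values chosen in the overview. */
unsigned long encode_launch(unsigned rd, unsigned rs, unsigned rt)
{
    return encode_rtype(0, rs, rt, rd, 0, 0x10);
}
```

With opcode 0 the instruction shares the R-type decode path, so only the function code distinguishes it from the existing ALU operations.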
13Adding a special function to the mMIPS (hardware)
[Figure: hardware changes in the aluctrl and alu modules]
14Converting a C program to the LCC data representation
int main(void) {
  int a = 3;
  if (a == 3)
    return 1;
  return 0;
}
15What does a rule look like?
A rule for adding two unsigned integers (4 bytes):
reg: ADDU4(reg,reg)  "\taddu $%c,$%0,$%1\n"  1
- %0: the first source operand register
- %1: the second source operand register
- %c: the destination register
16Converting the LCC data-structure to assembler
.set reorder
.globl main
.text
.text
.align 2
.ent main
main:
.frame $sp,8,$31
addu $sp,$sp,-8
la $24,3
sw $24,-4+8($sp)
lw $24,-4+8($sp)
la $15,3
bne $24,$15,L.2
la $2,1
b L.1
L.2:
move $2,$0
L.1:
addu $sp,$sp,8
j $31
.end main
17Adding a special function to the mMIPS (software)
- Launch function must be detected by LCC
- Use special pattern to indicate use of launch function
- Example: ((a) - ((b) + (int) 0x12344321))
- The following 4 constructs map to custom operations in LCC
  - ((a) - ((b) + (int) 0x12344321))
  - ((a) + ((b) + (int) 0x12344321))
  - ((a) - ((b) - (int) 0x12344321))
  - ((a) + ((b) - (int) 0x12344321))
- More operations (possibly with more operands) can be added. Look at the website for more information.
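The pattern is ordinary C arithmetic, so it helps to see what a compiler that does not recognise it would compute. The sketch below wraps the first construct in a macro and evaluates it as plain C. The names `CUSTOM_MARKER` and `launch_plain_c` are hypothetical, and the conclusion that such code only yields the launch decision when built with the modified LCC is an inference from the slides, not a documented guarantee.

```c
/* Marker constant used by the special pattern. */
#define CUSTOM_MARKER ((int) 0x12344321)

/* First of the four constructs: (a) - ((b) + marker). */
#define launch(h1, h2) ((h1) - ((h2) + CUSTOM_MARKER))

/* Hypothetical helper: the value an unmodified C compiler produces,
 * which is plain arithmetic and not the result of a custom instruction. */
int launch_plain_c(int h1, int h2)
{
    return launch(h1, h2);
}
```

Because macros are expanded before compilation, the compiler sees exactly the expression from the list above, so hiding the constant behind a macro keeps the source readable without changing the pattern.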
18Custom operation in C and assembler
#define TRUE 1
#define FALSE 0
#define launch(h1, h2) ((h1) - ((h2) + (int) 0x12344321))
void main(void)
{
  int height1, height2;
  int l;
  while (TRUE)
  {
    l = launch(height1, height2);
  }
}

80: addiu sp,sp,-16
84: sw s5,0(sp)
88: sw s6,4(sp)
8c: b 0x98
90: sw s7,8(sp)
94: tgeu s7,s6,0x2a0
98: b 0x94
9c: nop
a0: lw s5,0(sp)
a4: lw s6,4(sp)
a8: lw s7,8(sp)
ac: jr ra
b0: addiu sp,sp,16

lcc missile.c -o missile
disas missile
19Comparison
original
80: addiu sp,sp,-8
84: li t8,1000
88: slt s8,a0,t8
8c: beqz s8,0xac
90: nop
94: slt s8,a1,t8
98: beqz s8,0xac
9c: nop
a0: li t8,1
a4: b 0xb0
a8: sw t8,4(sp)
ac: sw zero,4(sp)
b0: lw v0,4(sp)
b4: jr ra
b8: addiu sp,sp,8

added custom instruction
94: tgeu s7,s6,0x2a0

Reduction of 14 instructions per execution!
20Outline
- LCC compiler for the mMIPS
- Using memories in the mMIPS
- The mMIPS vs Hennessy and Patterson
21The mMIPS memory layout
22Taking the memory from LCC to the mMIPS
mips_rom.bin
mips_ram.bin
23Using the data memory in your program
/* The program copies the data from str1 to str2. Note that
   at most 512 characters are copied. */
char *str1 = (char *)0x0;    // Memory address 0 in ram
char *str2 = (char *)0x200;  // Memory address 0x200 in ram
void main(void)
{
  int i;
  for (i = 0; str1[i] != '\0' && i < 0x200; i++)
  {
    str2[i] = str1[i];
  }
  str2[i] = '\0';
}
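The copy loop can be tried on a host machine by passing ordinary buffers instead of the fixed RAM addresses 0x0 and 0x200, which are only meaningful on the mMIPS. The sketch below is such a host-side variant; the function name `copy_bounded` and its parameters are assumptions for illustration. It tests the bound before reading the source so that no byte beyond the limit is touched.

```c
/* Copy src to dst, stopping at the terminating '\0' or after max
 * characters, whichever comes first. dst is always terminated, so it
 * needs room for max + 1 bytes. Returns the number of characters copied. */
int copy_bounded(char *dst, const char *src, int max)
{
    int i;
    for (i = 0; i < max && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return i;
}
```

On the mMIPS the same loop would be called with the two RAM pointers and a bound of 0x200, matching the 512-character limit in the comment of the program.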
24Outline
- LCC compiler for the mMIPS
- Using memories in the mMIPS
- The mMIPS vs Hennessy and Patterson
25Registerfile and write-back hazards
[Figure: registerfile timing (Input, Write, Output) for HP and for the mMIPS]
- HP: data written to the registerfile is available on its output in the current cycle
- mMIPS: data written to the registerfile is available on its output in the next cycle
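The difference between the two registerfiles can be sketched as a small behavioural model in C. Everything here is an assumption for illustration: the type `regfile_t`, the `write_through` flag and the single write port are not taken from the mMIPS SystemC sources, and the hardwired zero register is not modelled.

```c
#define NUM_REGS 32

/* Hypothetical register file model with one write port. */
typedef struct {
    int regs[NUM_REGS];
    int pending;         /* 1 if a write was issued in the current cycle */
    int pending_index;
    int pending_value;
    int write_through;   /* 1: HP-style, 0: mMIPS-style */
} regfile_t;

void rf_init(regfile_t *rf, int write_through)
{
    int i;
    for (i = 0; i < NUM_REGS; i++)
        rf->regs[i] = 0;
    rf->pending = 0;
    rf->pending_index = 0;
    rf->pending_value = 0;
    rf->write_through = write_through;
}

/* Issue a write in the current cycle. */
void rf_write(regfile_t *rf, int index, int value)
{
    rf->pending = 1;
    rf->pending_index = index;
    rf->pending_value = value;
}

/* Read in the current cycle. Only the HP-style file sees a value
 * that is being written in this same cycle. */
int rf_read(const regfile_t *rf, int index)
{
    if (rf->write_through && rf->pending && rf->pending_index == index)
        return rf->pending_value;
    return rf->regs[index];
}

/* Advance to the next cycle: the pending write becomes visible. */
void rf_tick(regfile_t *rf)
{
    if (rf->pending) {
        rf->regs[rf->pending_index] = rf->pending_value;
        rf->pending = 0;
    }
}
```

In this model an instruction that reads a register in the same cycle in which it is written back gets the old value on the mMIPS-style file, which is the situation the hazard detection has to cover.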
26Branch hazards
- Hennessy and Patterson
  - Branch detection in the decoding phase (after registerfile)
  - Two cycles needed to determine branch taken (IF and ID)
  - The first instruction after the branch is the branch delay slot, filled by the assembler.
- mMIPS
  - Branch detection in the execution phase (using the alu)
  - Three cycles needed to determine branch taken (IF, ID, EX)
  - Two branch delay slots (one used by the assembler; the second delay slot is filled with a NOP by the hazard detection unit).
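The cost of the extra delay slot can be estimated with a simple count. The sketch below assumes every branch has the same number of delay slots and the same number of slots filled with useful work; the function name `branch_nop_cycles` and this uniform model are simplifications for illustration, not measurements of the mMIPS.

```c
/* Cycles spent on NOPs in branch delay slots.
 *   branches     : number of executed branches
 *   delay_slots  : 1 for the Hennessy and Patterson pipeline, 2 for the mMIPS
 *   useful_fills : delay slots per branch that hold a useful instruction */
int branch_nop_cycles(int branches, int delay_slots, int useful_fills)
{
    return branches * (delay_slots - useful_fills);
}
```

Under this model, 10 branches whose first slot is always filled usefully cost 0 NOP cycles on the Hennessy and Patterson pipeline and 10 on the mMIPS, since the second slot always holds a NOP.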
29Assignment
30Assignment
- Optimize the run-time of an image processing algorithm running on the mMIPS.
- Allowed
  - Add special instructions to the mMIPS
  - Change the design of the mMIPS (e.g. forwarding).
- Not allowed
  - Modifications of the image processing algorithm that are not needed to use special instructions (e.g. replacing a multiply with shifts).
31Testing and implementing the design
- Test for functional correctness
  - Run the original mMIPS with the algorithm to produce a reference output.
  - Compare the results of your mMIPS to the reference output.
- Implement your design on the FPGA
  - You must complete the flow till the FPGA. The maximum clock frequency at which your mMIPS can be synthesized is part of the performance.
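Comparing against the reference output comes down to a byte-for-byte check of two buffers. The sketch below is one way to do it on a host machine; the function name `first_mismatch` and the assumption that both outputs are already loaded into memory as byte arrays of equal length are illustrative and not part of the provided tool flow.

```c
/* Return the index of the first byte that differs between the
 * reference output and the output of the modified design, or -1
 * if the two buffers are identical over the given length. */
long first_mismatch(const unsigned char *reference,
                    const unsigned char *result, long length)
{
    long i;
    for (i = 0; i < length; i++) {
        if (reference[i] != result[i])
            return i;
    }
    return -1;
}
```

Reporting the first differing position, rather than only pass or fail, makes it easier to relate a wrong pixel back to the instruction that produced it.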
32ImageProcessing.zip
- Download the file ImageProcessing.zip at http://www.es.ele.tue.nl/education/Computation/ogo12/
- Content
- ImageProcessing/algorithm
- ImageProcessing/bendime
- ImageProcessing/bennoc
- ImageProcessing/cocentric
- ImageProcessing/lcc
- ImageProcessing/mips
- ImageProcessing/SystemC2.0.1borland
- bennoc_setup.csh
33Support and Information
- Dominic Gawlowski - FPGA
- Valentin Gheorghita - LCC
- Sander Stuijk - SystemC
- Each Tuesday and Friday between 14.00 and 16.00h.
- See also http://www.es.ele.tue.nl/education/Computation/ogo12/ for information, tips, etc.