Title: Adding custom instructions to the SimpleScalar/GCC architecture
1. Adding custom instructions to the SimpleScalar/GCC architecture
2. Agenda
- Motivation
- GCC overall architecture
- SimpleScalar architecture
- Adding a custom instruction
- Conclusion
3. Motivation
- What regular ISA instructions can be combined?
- Which regular ISA instructions are to be combined into a CFU instruction?
- Retarget the compiler to produce optimised code with CFU instructions
- Simulate the extended processor with CFU instructions
4. GNU Compiler Collection
- Many front-ends
  - C
  - Fortran
  - C++/Java/Ada
- Backend targeted at many processors
  - x86, Alpha, Sparc
  - ARC, ARM, MIPS . . .
5. GCC Compiler Flow
RTL?
Combine small RISC-ISA-like patterns into bigger CISC-ISA-like patterns
Are we interested in everything?
6. GCC Low Level Optimisation
- Uses Lisp-like RTL (Register Transfer Language) as the IR
- Tip: use the -da compiler option to get the IR output
- Example:
    (insn 48 47 50 (set (reg/v:SI 36)
            (mult:SI (reg:SI 42)
                (reg:SI 41))) 41 {mulsi3} (nil)
        (nil))
    (call_insn 94 93 97 (parallel [
                (set (reg:SI 0 r0)
                    (call (mem:SI (symbol_ref:SI ("printf")) 0)
                        (const_int 0 [0x0])))
                (clobber (reg:SI 14 lr))
            ] ) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 1 r1))
            (expr_list (use (reg:SI 0 r0))
7. GCC - Target Machine Description
- The .md machine description file uses a similar language
    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
            (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                     (match_operand:SI 1 "s_register_operand" "%?r,0")))]
      ""
      "mul%?\\t%0, %2, %1"
      [(set_attr "type" "mult")])
8. GCC Combine Phase
- Combines some standard IR patterns into a single user-defined IR pattern
- User-defined IR patterns are defined in the target.md file
- Operand constraints should be satisfied
- Example: MAC (Multiply-Accumulate)
  - Merge mulsi3 and addsi3 -> mulsi3addsi
9. GCC Combine Phase
Let us assume that the following patterns are defined in the machine description:
- addsi3 -> matches C = A + B (all 32-bit regs)
- mulsi3 -> matches C = A * B (all 32-bit regs)
- mulsi3addsi -> matches D = A * B + C (all 32-bit regs)
- mulsi4addsi -> matches E = A * B + C * D (all 32-bit regs)
10. GCC Combine Phase
Assume this DDG (data dependence graph) sub-graph
11. GCC Combine Phase
Try 55,45
No matching pattern
Try 55,47
No matching pattern
Try 55,53
We have a match
12. GCC Combine Phase
Try 55,52
No matching pattern
Try 55,50
No matching pattern
Try 55,45
No matching pattern
Try 55,47
No matching pattern
Try 55,52,50
No matching pattern
Try 55,52,45
No matching pattern
Try 55,52,47
No matching pattern
Try 55,50,45
No matching pattern
Try 55,50,47
No matching pattern
Try 55,47,45
No matching pattern
Cannot try to combine more than 3 instructions!
Hence, stop!
13. GCC Combine Phase Summary
- Can combine up to 3 instructions together
- Can recursively combine more instructions
- Deletes the smaller instructions once they are combined
- Always works on one function at a time
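The combine behaviour summarised above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how GCC's combiner is implemented: instructions are (dest, expression) pairs, every register is assumed to be assigned once, temporaries are assumed dead after the block, and the helper names (reg, add, mul, shape, combine) are invented for illustration. Only the four pattern names come from the slides.

```python
from itertools import combinations

COMMUTATIVE = {"add", "mul"}

def reg(name):
    return ("reg", name)

def add(a, b):
    return ("add", a, b)

def mul(a, b):
    return ("mul", a, b)

def shape(expr):
    """Operator shape of an expression, ignoring which registers it names."""
    if expr[0] == "reg":
        return "r"
    kids = [shape(e) for e in expr[1:]]
    if expr[0] in COMMUTATIVE:
        # Canonical order: nested operations first, plain registers last.
        kids.sort(key=lambda k: k == "r")
    return (expr[0], *kids)

MUL = ("mul", "r", "r")
# Toy stand-in for the patterns assumed to exist in the machine description.
PATTERNS = {
    ("add", "r", "r"): "addsi3",
    MUL: "mulsi3",
    ("add", MUL, "r"): "mulsi3addsi",
    ("add", MUL, MUL): "mulsi4addsi",
}

def uses(expr):
    """List every register read by an expression (with repeats)."""
    if expr[0] == "reg":
        return [expr[1]]
    return [r for e in expr[1:] for r in uses(e)]

def substitute(expr, name, replacement):
    """Replace each read of register `name` with `replacement`."""
    if expr[0] == "reg":
        return replacement if expr[1] == name else expr
    return (expr[0], *(substitute(e, name, replacement) for e in expr[1:]))

def use_count(insns, name):
    return sum(uses(expr).count(name) for _, expr in insns)

def try_combine(insns, ci, max_group):
    """Try to fold up to max_group - 1 feeder instructions into insns[ci]."""
    dest, expr = insns[ci]
    feeders = [i for i in range(ci)
               if insns[i][0] in uses(expr)
               and use_count(insns, insns[i][0]) == 1]
    for k in range(1, max_group):
        for group in combinations(feeders, k):
            merged = expr
            for i in group:
                merged = substitute(merged, insns[i][0], insns[i][1])
            if shape(merged) in PATTERNS:
                kept = [ins for j, ins in enumerate(insns) if j not in group]
                kept[ci - len(group)] = (dest, merged)  # feeders precede ci
                return kept
    return None

def combine(insns, max_group=3):
    """Repeat combination attempts until no group of <= max_group matches."""
    insns = list(insns)
    ci = 0
    while ci < len(insns):
        result = try_combine(insns, ci, max_group)
        if result is None:
            ci += 1
        else:
            insns = result
            ci = 0  # restart: the merged instruction may combine further
    return insns
```

In this model, a block holding t1 = A * B, t2 = C * D and E = t1 + t2 is first merged into the mulsi3addsi shape and then, on the next attempt, into the mulsi4addsi shape, which mirrors the "recursively combine" and "delete once combined" points.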
14. Retargeting GCC for CFU
- Build a better combiner phase
  - Write a new combiner with a better pattern merger which works on inputs from RTL
  - Replace the existing combiner with this combiner
- New patterns for the CFU instruction in the target.md file
- Changes in GAS (included in the binutils package) to generate the instruction word
15. SimpleScalar is
- Instruction set simulator
- Profiles programs
- Simulates micro-architectural features
- Different levels of simulation speed vs. accuracy trade-off
- Written in C
- Easily retargetable
16. SimpleScalar CFU issues
- More arguments than used by RISC instructions
- Out-of-order execution needs to take care of the increase in dependencies
- New instructions in decode tree
  - Easy to add new instructions to the decode tree (machine.def)
17. Let us add a new instruction
- Achieve the operation E = A * B + C * D using one instruction
- 4 input operands and 1 output operand
- Extension to ARM ISA
- Provide
  - Compiler
  - Assembler
  - Simulator
18. Pattern for the instruction
- gcc/config/arm/arm.md
    (define_insn "mulsi4addsi"
      [(set (match_operand:SI 0 "s_register_operand" "=r")
            (plus:SI
              (mult:SI (match_operand:SI 2 "s_register_operand" "r")
                       (match_operand:SI 1 "s_register_operand" "r"))
              (mult:SI (match_operand:SI 4 "s_register_operand" "r")
                       (match_operand:SI 3 "s_register_operand" "r"))))]
      ""
      "ml2a%?\\t%0, %2, %1, %4, %3"
      [(set_attr "type" "mult")])
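For checking a test program against the simulator, a small reference model of what the pattern computes can help. The sketch below assumes the usual 32-bit wraparound of an SImode result. The function names are illustrative (ml2a simply borrows the mnemonic from the pattern), and nothing here is taken from the actual simulator source.

```python
MASK32 = 0xFFFFFFFF

def ml2a(a, b, c, d):
    """Return (a*b + c*d) truncated to 32 bits, as an SImode result would be."""
    return (a * b + c * d) & MASK32

def mul_then_add(a, b, c, d):
    """Same value computed the unfused way: two multiplies and one add."""
    t1 = (a * b) & MASK32
    t2 = (c * d) & MASK32
    return (t1 + t2) & MASK32
```

Because truncation to 32 bits commutes with addition and multiplication, the fused and unfused versions should agree for all inputs, which is the property a test program can exercise.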
19. SimpleScalar changes
- Instruction decode tree
  - Chain of decoders, each looking at a set of bits
- target-arm/arm.def
  - New chain of decoder macros for the CFU class of instructions
  - Increase the number of input dependencies in all the instruction macros from 5 to 6 (predication in ARM)
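The idea of a decode tree as a chain of decoders, each looking at a set of bits, can be sketched as follows. This is a hedged illustration only: the bit positions, opcode values and table layout are invented for the example and are not the real ARM encoding or the contents of target-arm/arm.def.

```python
def field(shift, mask):
    """Build a decoder that extracts one bit field from an instruction word."""
    return lambda word: (word >> shift) & mask

# A node is either an instruction name (a leaf) or a pair
# (field extractor, {field value: next node}).
# Hypothetical second-level decoder: selects among multiply-class instructions.
MUL_LINK = (field(4, 0xF), {0x9: "MUL", 0xB: "ML2A"})

# Hypothetical root decoder: looks at bits 27..24 first.
ROOT = (field(24, 0xF), {0x0: MUL_LINK, 0xA: "B"})

def decode(word, node=ROOT):
    """Walk the chain of decoders until a leaf or an unknown value is reached."""
    while not isinstance(node, str):
        extract, table = node
        node = table.get(extract(word))
        if node is None:
            return "UNDEFINED"
    return node
```

In this model, adding a new instruction class means adding one more table entry (here the made-up ML2A entry in MUL_LINK) or one more linked decoder, which is why extending the tree is described as easy.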
20. SimpleScalar changes
- sim-outorder.c
  - Increase the number of input dependencies to be monitored in the reservation unit
  - Both macros and code have to be changed
- Other files need to be changed for the same purpose
- Compile a test program and verify!
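Why the number of monitored input dependencies matters can be shown with a minimal model of a reservation-unit entry. This is a sketch, not code from sim-outorder.c: the class, its methods and the register names are hypothetical, and the limit of 6 simply follows the figure given on the slides.

```python
MAX_IDEPS = 6  # assumption: the per-instruction input limit after the change

class ReservationEntry:
    """One waiting instruction and the input registers it still needs."""

    def __init__(self, name, inputs, ready_regs):
        if len(inputs) > MAX_IDEPS:
            raise ValueError(f"{name}: too many input dependencies")
        self.name = name
        # Only inputs whose values are not yet available must be monitored.
        self.waiting = {r for r in inputs if r not in ready_regs}

    def wakeup(self, register):
        """Called when a producer writes back `register`."""
        self.waiting.discard(register)

    def ready(self):
        """The instruction may issue once no input is outstanding."""
        return not self.waiting
```

An instruction with four data inputs plus the condition flags has to wait on all of them, so an entry that tracks too few slots would either reject the instruction or issue it too early.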
21. Summary
- Identify the ways to add new instructions to SimpleScalar and GCC
- Determine the capabilities of the current combiner in GCC
- Demonstrate the addition of a new custom instruction
- Understand GCC to some extent!