Title: ILP: Software Approaches
1 ILP Software Approaches
- Based on the slides by Vincent H. Berk
2 HW Support for More ILP
- Avoid branch prediction by turning branches into conditionally executed instructions
  - if (x) then A = B op C else NOP
  - If false, then neither store the result nor cause an exception
  - Expanded ISAs of Alpha, MIPS, PowerPC, and SPARC have conditional move; PA-RISC can annul any following instruction
  - IA-64: 64 1-bit condition fields are selected, so any instruction can be conditionally executed
- Drawbacks to conditional instructions
  - Still takes a clock cycle even if annulled
  - Stall if the condition is evaluated late
  - Complex conditions reduce effectiveness; the condition becomes known late in the pipeline
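The "if (x) then A = B op C else NOP" idea can be imitated in portable C by if-conversion: compute the operation unconditionally and keep or discard its result with a mask. This is a minimal sketch, assuming op is unsigned division (so that the exception case matters); the function names are hypothetical. Plain C has no annulled instructions, so unlike hardware predication the divide always executes, and a harmless divisor (1) is substituted when the predicate is false so that it cannot trap.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form: if (d != 0) q = n / d; else NOP. */
uint32_t div_with_branch(uint32_t q, uint32_t n, uint32_t d) {
    if (d != 0)
        q = n / d;
    return q;
}

/* If-converted form with predicate p = (d != 0).
   The divide always executes (like an annulled instruction, it still costs
   time); when p is false the divisor is replaced by 1 so it cannot trap,
   and the mask keeps the old value of q. */
uint32_t div_if_converted(uint32_t q, uint32_t n, uint32_t d) {
    uint32_t p = (d != 0);                 /* predicate: 1 or 0 */
    uint32_t safe_d = d | (p ^ 1u);        /* d when p == 1, else 1 */
    uint32_t result = n / safe_d;          /* executed unconditionally */
    uint32_t keep = 0u - p;                /* all ones when p == 1, else 0 */
    return (result & keep) | (q & ~keep);  /* "write" q only when p == 1 */
}

/* Quick equivalence check between the two forms. */
void check_if_conversion(uint32_t q, uint32_t n, uint32_t d) {
    assert(div_with_branch(q, n, d) == div_if_converted(q, n, d));
}
```

The extra work on the false path mirrors the "still takes a clock even if annulled" drawback; whether a compiler keeps this branch-free shape is up to the compiler.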
3 Software Pipelining
- Observation: if iterations from loops are independent, then we can get more ILP by taking instructions from different iterations
- Software pipelining reorganizes loops so that each iteration is made from instructions chosen from different iterations of the original loop
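A minimal C sketch of the idea, assuming the same kind of loop as in this deck's example (x[i] = x[i] + s) but walking upward through the array; the function names are hypothetical and the pipelined version assumes n >= 2. Each kernel trip stores one element, adds the next, and loads the one after that, so it is built from pieces of three different original iterations, with a prologue and epilogue to fill and drain the pipeline.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: each iteration loads, adds and stores one element. */
void add_scalar_plain(double *x, size_t n, double s) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Software-pipelined version of the same loop (assumes n >= 2).
   Each kernel trip stores element i, adds element i + 1 and loads
   element i + 2, so one trip mixes three original iterations. */
void add_scalar_swp(double *x, size_t n, double s) {
    assert(n >= 2);
    /* Prologue: start the first two iterations. */
    double loaded = x[0];            /* load x[0]     */
    double summed = loaded + s;      /* add  x[0] + s */
    loaded = x[1];                   /* load x[1]     */
    /* Kernel: steady state, runs n - 2 times. */
    size_t i = 0;
    for (; i + 2 < n; i++) {
        x[i] = summed;               /* store iteration i     */
        summed = loaded + s;         /* add   iteration i + 1 */
        loaded = x[i + 2];           /* load  iteration i + 2 */
    }
    /* Epilogue: finish the last two iterations (here i == n - 2). */
    x[i] = summed;
    x[i + 1] = loaded + s;
}
```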
4 SW Pipelining Example
Before: unrolled 3 times
  1 LD F0, 0(R1)
  2 ADDD F4, F0, F2
  3 SD 0(R1), F4
  4 LD F6, -8(R1)
  5 ADDD F8, F6, F2
  6 SD -8(R1), F8
  7 LD F10, -16(R1)
  8 ADDD F12, F10, F2
  9 SD -16(R1), F12
  10 SUBI R1, R1, #24
  11 BNEZ R1, LOOP
After: software pipelined
  LD F0, 0(R1) ; prologue
  ADDD F4, F0, F2
  LD F0, -8(R1)
  1 SD 0(R1), F4 ; stores M[i]
  2 ADDD F4, F0, F2 ; adds to M[i-1]
  3 LD F0, -16(R1) ; loads M[i-2]
  4 SUBI R1, R1, #8
  5 BNEZ R1, LOOP
  SD 0(R1), F4 ; epilogue
  ADDD F4, F0, F2
  SD -8(R1), F4
Figure: kernel pipeline diagram (SD, ADDD, and LD each go through IF ID EX Mem WB); SD reads F4, ADDD reads F0 and writes F4, LD writes F0
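For readers who want to map the "unrolled 3 times" column onto registers, here is a C rendering of it. It is a sketch under stated assumptions: R1 is modelled as &x[i - 1] so that 0(R1), -8(R1), and -16(R1) become x[i - 1], x[i - 2], x[i - 3]; F2 holds the scalar; the function name is hypothetical. Inside one trip each element's LD, ADDD, SD form a dependent chain, whereas the pipelined kernel puts the SD, ADDD, and LD of three different elements in the same trip.

```c
#include <assert.h>
#include <stddef.h>

/* "Before" column in C: R1 walks downward over the array of doubles.
   R1 corresponds to &x[i - 1]; f2 holds the scalar kept in F2. */
void add_scalar_unrolled3(double *x, size_t n, double f2) {
    assert(n % 3 == 0);                   /* the slide's loop has no remainder code */
    for (size_t i = n; i != 0; i -= 3) {  /* SUBI R1, R1, #24 ; BNEZ R1, LOOP */
        double f0 = x[i - 1];             /* 1 LD   F0, 0(R1)     */
        double f4 = f0 + f2;              /* 2 ADDD F4, F0, F2    */
        x[i - 1] = f4;                    /* 3 SD   0(R1), F4     */
        double f6 = x[i - 2];             /* 4 LD   F6, -8(R1)    */
        double f8 = f6 + f2;              /* 5 ADDD F8, F6, F2    */
        x[i - 2] = f8;                    /* 6 SD   -8(R1), F8    */
        double f10 = x[i - 3];            /* 7 LD   F10, -16(R1)  */
        double f12 = f10 + f2;            /* 8 ADDD F12, F10, F2  */
        x[i - 3] = f12;                   /* 9 SD   -16(R1), F12  */
    }
}
```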
5 SW Pipelining Example
- Symbolic loop unrolling
- Smaller code space
- Overhead paid only once, vs. once per trip of the unrolled loop in loop unrolling
  - Example: 100 iterations = 25 loops with 4 unrolled iterations each, so loop unrolling pays the overhead 25 times
Figure: number of overlapped operations over time for (a) software pipelining and (b) loop unrolling
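The "100 iterations = 25 loops" arithmetic can be checked with a small C sketch of loop unrolling by 4, assuming the same x[i] += s loop. The trip counter is hypothetical instrumentation, used here as a proxy for how many times the overlap has to ramp up and drain again; a cleanup loop handles counts that are not a multiple of 4. A software-pipelined loop would instead pay its prologue and epilogue once.

```c
#include <assert.h>
#include <stddef.h>

/* Loop unrolled by 4, plus a cleanup loop for n % 4 leftover iterations.
   Returns the number of loop trips, i.e. how many times the per-trip
   start-up and wind-down overhead is paid. */
size_t add_scalar_unrolled4(double *x, size_t n, double s) {
    assert(x != NULL || n == 0);
    size_t trips = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* 4 original iterations per trip */
        x[i]     += s;
        x[i + 1] += s;
        x[i + 2] += s;
        x[i + 3] += s;
        trips++;
    }
    for (; i < n; i++) {           /* cleanup: at most 3 trips */
        x[i] += s;
        trips++;
    }
    return trips;
}
```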
6 Trace Scheduling
- Focus on the critical path (trace selection)
  - The compiler has to decide what the critical path (the trace) is
  - The most likely basic blocks are put in the trace
  - Loops are unrolled in the trace
- Now speed it up (trace compaction)
  - Focus on limiting instruction count
  - Branches are seen as jumps into or out of the trace
- Problem
  - Significant overhead for parts that are not in the trace
  - Unclear if it is feasible in practice
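Trace selection can be sketched as a greedy walk over a profiled control-flow graph. The Cfg type, its limit of two successors per block, and the edge counts in the usage check are assumptions made for illustration; real trace schedulers use more elaborate heuristics. The sketch covers only the "most likely basic blocks are put in the trace" step, not trace compaction.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical profiled CFG: succ[b][k] is the k-th successor of block b
   (-1 = none) and count[b][k] is how often that edge was taken. */
typedef struct {
    int n;                        /* number of basic blocks */
    int succ[MAX_BLOCKS][2];
    int count[MAX_BLOCKS][2];
} Cfg;

/* Greedy trace selection: starting at `entry`, follow the most frequently
   taken successor until a block has no successor or is already in the
   trace (e.g. a loop back edge). Writes the trace and returns its length. */
int select_trace(const Cfg *g, int entry, int trace[MAX_BLOCKS]) {
    bool in_trace[MAX_BLOCKS] = {false};
    int len = 0;
    int b = entry;
    while (b >= 0) {
        assert(b < g->n && g->n <= MAX_BLOCKS);
        if (in_trace[b])
            break;                /* do not revisit a block */
        in_trace[b] = true;
        trace[len++] = b;
        int next = -1;
        int best = -1;
        for (int k = 0; k < 2; k++) {
            if (g->succ[b][k] >= 0 && g->count[b][k] > best) {
                best = g->count[b][k];
                next = g->succ[b][k];
            }
        }
        b = next;
    }
    return len;
}
```

For an if-then-else diamond where block 1 is taken 90% of the time, the selected trace is 0, 1, 3; block 2 stays off the trace and is where the compensation overhead would go.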
7 Superblocks
- Similar to trace scheduling, but:
  - Single entrance, multiple exits
- Tail duplication
  - Handles cases that exited the superblock
- Residual loop handling
  - Could itself be a superblock
- Problem
  - Code size
  - Worth the hassle?
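Tail duplication can be pictured at source level. In this sketch (hypothetical functions; the profile that makes x > 0 the likely path is an assumption), the join block D is reached from both B and C, so a trace A-B-D would have a side entrance. Copying D onto the C path gives the superblock A-B-D a single entrance; the price is the duplicated code, which is the code-size problem listed above. A compiler does this on basic blocks, not C statements, so the C only mirrors the control-flow shape.

```c
#include <assert.h>

/* Before: the join block D is reached from the likely block B and from
   the unlikely block C, so a trace A-B-D has a side entrance at D. */
int before_tail_dup(int x) {
    int r;
    if (x > 0)          /* A: assume the profile says x > 0 is likely */
        r = x + 1;      /* B (on the trace) */
    else
        r = x - 1;      /* C (off the trace) */
    return r * 2 + 1;   /* D: join point */
}

/* After tail duplication: C gets its own copy of D, so A-B-D is a
   superblock with a single entrance (A) and an exit toward C. */
int after_tail_dup(int x) {
    if (x > 0) {
        int r = x + 1;      /* B */
        return r * 2 + 1;   /* D, superblock copy */
    }
    int r = x - 1;          /* C */
    return r * 2 + 1;       /* D', duplicated tail */
}

/* The transformation must not change results. */
void check_tail_dup(int x) {
    assert(before_tail_dup(x) == after_tail_dup(x));
}
```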
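9 Conditional instructions
- An instruction that is executed depending on one of its arguments
- Branch version:
  BNEZ R1, L
  ADDU R2, R3, R0
  L:
- vs. conditional move:
  CMOVZ R2, R3, R1
- Both set R2 = R3 only when R1 == 0 (R0 is always zero, so the ADDU is a register move)
- The instruction is always executed, but its result is not always written
- Should only be used for very small sequences; otherwise use a normal branch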
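The equivalence of the two sequences can be checked with a small C model. The function names are hypothetical, and registers are modelled as plain uint32_t values; the CMOVZ semantics assumed here are "move R3 into R2 only if R1 == 0". Whether a C compiler turns the ternary into an actual conditional-move instruction depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* BNEZ R1, L ; ADDU R2, R3, R0 ; L:
   The branch skips the move when R1 != 0; R0 is hard-wired to zero,
   so ADDU R2, R3, R0 simply copies R3 into R2. */
uint32_t move_with_branch(uint32_t r1, uint32_t r2, uint32_t r3) {
    if (r1 != 0)
        return r2;       /* BNEZ taken: R2 unchanged */
    return r3 + 0;       /* ADDU R2, R3, R0 */
}

/* CMOVZ R2, R3, R1: move R3 into R2 only if R1 == 0.
   Modelled as a select: the instruction always "executes", but the new
   value is written only when the condition holds. */
uint32_t move_with_cmovz(uint32_t r1, uint32_t r2, uint32_t r3) {
    return (r1 == 0) ? r3 : r2;
}

/* Both sequences must leave the same value in R2. */
void check_cmovz(uint32_t r1, uint32_t r2, uint32_t r3) {
    assert(move_with_branch(r1, r2, r3) == move_with_cmovz(r1, r2, r3));
}
```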
10 Speculation
- The compiler moves instructions before a branch if:
  - Data flow is not affected (optionally with the use of renaming)
  - Exception behavior is preserved
  - Load/store address conflicts are avoided (no renaming for memory locations)
- Preserving exception behavior
  - Mechanism to indicate that an instruction is speculative
  - Poison bit: raise the exception when the value is used
  - Using conditional instructions
- Requires in-order instruction commit
  - Register renaming
  - Writeback at commit
  - Forwarding
  - Raise exceptions at commit
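The poison-bit mechanism can be modelled in a few lines of C. Everything here (the Reg struct, NULL standing in for a faulting address, the function names) is a hypothetical model, not a real ISA interface: a speculative load marks its destination instead of trapping, speculative arithmetic propagates the mark, and the exception surfaces only if a non-speculative instruction actually uses the value. If the branch goes the other way and the value is never used, no exception is raised at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register model with a poison bit. */
typedef struct {
    long value;
    bool poison;   /* set instead of raising an exception during speculation */
} Reg;

/* Speculative load: an invalid address (modelled as NULL) does not fault;
   the destination register is poisoned instead. */
Reg load_speculative(const long *addr) {
    Reg r = {0, false};
    if (addr == NULL)
        r.poison = true;
    else
        r.value = *addr;
    return r;
}

/* Speculative arithmetic propagates the poison bit to its result. */
Reg add_imm(Reg a, long imm) {
    Reg r = {a.value + imm, a.poison};
    return r;
}

/* Non-speculative use: this is where a deferred exception is raised.
   Returns false to signal "exception raised". */
bool use_value(Reg r, long *out) {
    assert(out != NULL);
    if (r.poison)
        return false;
    *out = r.value;
    return true;
}
```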
11 Speculation
- if (A == 0) A = B; else A = A + 4;
- Original code:
  LD R1, 0(R3) ; load A
  BNEZ R1, L1 ; test A
  LD R1, 0(R2) ; then
  J L2 ; skip else
  L1: DADDI R1, R1, #4 ; else
  L2: SD R1, 0(R3) ; store A
- With the load of B moved above the branch:
  LD R1, 0(R3) ; load A
  LD R14, 0(R2) ; load B (speculative)
  BEQZ R1, L3 ; branch if A == 0
  DADDI R14, R1, #4 ; else
  L3: SD R14, 0(R3) ; store A
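A C rendering of both sequences makes it easy to check that hoisting the load of B does not change the result. In this sketch, R3 and R2 are modelled as pointers to A and B, and the function names are hypothetical. The hoisted load is only safe in C because b_addr is always valid here; if B's address could be invalid on the A != 0 path, the hoisted load would need the exception-preserving support described on the previous slide.

```c
#include <assert.h>
#include <stddef.h>

/* Original: if (A == 0) A = B; else A = A + 4;
   a_addr plays the role of R3 (&A), b_addr the role of R2 (&B). */
void update_branchy(long *a_addr, const long *b_addr) {
    long r1 = *a_addr;          /* LD    R1, 0(R3)     load A  */
    if (r1 == 0)                /* BNEZ  R1, L1        test A  */
        r1 = *b_addr;           /* LD    R1, 0(R2)     then    */
    else
        r1 = r1 + 4;            /* L1: DADDI R1, R1, #4   else */
    *a_addr = r1;               /* L2: SD R1, 0(R3)    store A */
}

/* Speculative version: the load of B is hoisted above the branch into a
   second register (R14), so it runs on both paths. */
void update_speculative(long *a_addr, const long *b_addr) {
    /* The hoisted load needs a valid address for B even when A != 0. */
    assert(b_addr != NULL);
    long r1 = *a_addr;          /* LD    R1, 0(R3)     load A               */
    long r14 = *b_addr;         /* LD    R14, 0(R2)    load B (speculative) */
    if (r1 != 0)                /* BEQZ  R1, L3        branch if A == 0     */
        r14 = r1 + 4;           /* DADDI R14, R1, #4   else                 */
    *a_addr = r14;              /* L3: SD R14, 0(R3)   store A              */
}
```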