Title: Feedback directed optimization in Compaq
1. Feedback directed optimization in Compaq's compilation tools for Alpha
- Robert Cohn (Robert.Cohn_at_compaq.com)
- P. Geoffrey Lowney (Geoff.Lowney_at_compaq.com)
- Compaq Computer Corporation
2. Feedback directed optimization
- Compilers
- profiles used to determine frequently executed paths
- optimization makes common paths fast
- other paths might be slower
3. Feedback directed optimization
- Mature and powerful classical optimizer
- leverage existing optimizations
- Feedback directed optimizations
- Augment cost model with profile information
- Simple feedback directed restructuring
- enables classical optimizations
- FDO: 1% of compiler
4. Feedback directed optimization in the tool chain
- Compiler
- FrontEnd: Source -> IL
- Optimizer: IL -> IL
- CodeGen: IL -> obj
- Linker: obj -> bin
- Bin Opt: bin -> bin
[Figure: FDO phases placed along the tool chain, labeled: inliner, tracer, switch, commando, loe, switch, real flow, register allocation, scheduling, layout, alignment]
5. Profile information
- Basic block counts
- pixie instrumentation
- DCPI statistical sampling
- Call edge counts computed from basic block counts
- Flow edge counts estimated from basic block counts
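The step from basic block counts to call edge counts can be sketched as below. This is a minimal illustration, not the tool chain's actual code: it assumes every call site sits in exactly one basic block, so the call runs once each time that block runs. The `call_site` record and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical record for one direct call site. */
struct call_site {
    int caller_block;   /* index of the basic block containing the call */
    int callee;         /* index of the called procedure */
};

/* Assumption: a call executes once per execution of its block,
 * so the call edge count equals the block count. */
unsigned long call_edge_count(const unsigned long *block_count,
                              const struct call_site *cs)
{
    return block_count[cs->caller_block];
}

/* Accumulate call edge counts into per-procedure entry counts.
 * entry_count must be zero-initialized by the caller. */
void entry_counts(const unsigned long *block_count,
                  const struct call_site *sites, size_t n_sites,
                  unsigned long *entry_count)
{
    for (size_t i = 0; i < n_sites; i++)
        entry_count[sites[i].callee] +=
            call_edge_count(block_count, &sites[i]);
}
```

Flow edge counts are harder: a block count alone does not say how executions split between two successors, which is why the slide calls them estimated rather than computed.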
6. Procedure inliner
- Static heuristics estimate benefit of inlining a call site
- code size, register pressure, constant arguments, number of static callers
- Frequency of execution
- lower or raise desirability
- number of dynamic callers
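One way to picture "lower or raise desirability" is a static score that the profile then adjusts. The sketch below is purely illustrative: the `site_info` fields, the weights, and the 1% hotness threshold are all invented for the example and are not taken from the Compaq inliner. Register pressure and the number of dynamic callers are left out for brevity.

```c
/* Hypothetical per-call-site facts; fields and weights are illustrative. */
struct site_info {
    int callee_size;      /* estimated callee size, in instructions */
    int constant_args;    /* number of constant arguments at this site */
    int static_callers;   /* number of static call sites of the callee */
    unsigned long count;  /* profiled execution count of this call site */
    unsigned long total;  /* total profiled call count in the program */
};

/* Static estimate of the benefit of inlining one call site. */
int static_score(const struct site_info *s)
{
    int score = 100 - s->callee_size;   /* small callees cost less to copy */
    score += 10 * s->constant_args;     /* constants enable folding */
    if (s->static_callers == 1)
        score += 50;                    /* sole caller: no code growth */
    return score;
}

/* Profile-adjusted score: execution frequency lowers or raises it. */
int profile_score(const struct site_info *s)
{
    int score = static_score(s);
    if (s->total == 0)
        return score;                   /* no profile: static estimate only */
    if (s->count == 0)
        return score - 100;             /* never executed: lower */
    if (s->count * 100 >= s->total)
        return score + 50;              /* assumed hot threshold (1%): raise */
    return score;
}
```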
7. Tracer
- Transforms complicated control flow to superblocks
- single entrance, multiple exit code sequence
- Benefit from larger superblocks
- bigger scheduling unit
- isolation of infrequently executed paths
8. Tracer superblock formation
- Pick trace N1, ..., Nn
- Change trace to superblock
- visit nodes N2, ..., Nn
- if > 1 predecessor
- copy node and outgoing edges
- redirect incoming trace edge to copy
[Figure: control flow graph with nodes A, B, C, D, E]
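The superblock formation steps can be sketched on a tiny control flow graph. This is a minimal illustration under stated assumptions, not the tracer's implementation: the `cfg` representation (at most two successors per node, fixed capacity) and the names `pred_count` and `form_superblock` are hypothetical.

```c
#define MAX_NODES 32
#define MAX_SUCCS 2

/* Hypothetical CFG: each node has up to two successors, -1 if unused. */
struct cfg {
    int n;                           /* number of nodes in use */
    int succ[MAX_NODES][MAX_SUCCS];
};

/* Count the edges that enter `node`. */
int pred_count(const struct cfg *g, int node)
{
    int c = 0;
    for (int i = 0; i < g->n; i++)
        for (int k = 0; k < MAX_SUCCS; k++)
            if (g->succ[i][k] == node)
                c++;
    return c;
}

/* Turn trace[0..len-1] into a superblock by tail duplication.
 * The trace is updated in place to name the copies.
 * Returns 0 on success, -1 if the graph is out of capacity. */
int form_superblock(struct cfg *g, int *trace, int len)
{
    for (int i = 1; i < len; i++) {          /* visit N2, ..., Nn */
        int prev = trace[i - 1];
        int node = trace[i];
        if (pred_count(g, node) <= 1)
            continue;                        /* no side entrance */
        if (g->n == MAX_NODES)
            return -1;
        int copy = g->n++;
        for (int k = 0; k < MAX_SUCCS; k++)  /* copy node and outgoing edges */
            g->succ[copy][k] = g->succ[node][k];
        for (int k = 0; k < MAX_SUCCS; k++)  /* redirect incoming trace edge */
            if (g->succ[prev][k] == node)
                g->succ[prev][k] = copy;
        trace[i] = copy;
    }
    return 0;
}
```

For example, with edges A->B, A->C, B->C, C->D, C->E, D->E and trace A, C, E, the sketch creates one copy of C and one of E. Afterwards the trace has a single entrance at A, and the originals stay reachable from B and D.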
[Figure sequence, slides 9-13: the same steps applied to the graph one build at a time; node copies C1 and E1 appear alongside A, B, C, D, E]
14. Tracer loop peeling
- Pull 1 or 2 iterations out of loop
- Implemented as superblock formation

Original loop:

    do {
        p = p->n;
    } while (p != a);
    return p;

With one iteration peeled:

        p = p->n;
        if (p == a) goto L1;
        do {
            p = p->n;
        } while (p != a);
    L1: return p;
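A compilable version of the peeling example, as a sketch. The `struct node` layout and the function names `find_orig` and `find_peeled` are assumptions added for illustration. Both functions assume that `a` is reachable by following `n` links from `p`.

```c
/* Hypothetical list node; only the link field matters here. */
struct node {
    struct node *n;
};

/* Original loop: advance at least once, stop when a is reached. */
struct node *find_orig(struct node *p, const struct node *a)
{
    do {
        p = p->n;
    } while (p != a);
    return p;
}

/* Same loop with the first iteration peeled. If the profile says the
 * loop usually exits after one step, the common path is now straight-line
 * code and the remaining loop is entered only in the infrequent case. */
struct node *find_peeled(struct node *p, const struct node *a)
{
    p = p->n;               /* peeled first iteration */
    if (p == a)
        goto L1;
    do {
        p = p->n;
    } while (p != a);
L1:
    return p;
}
```

The two functions return the same node for every input that satisfies the reachability assumption.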
[Figure sequence, slides 15-22: the superblock formation steps from slide 8 applied to a loop; the graph shows nodes A, B, C with labels 0 and 1 that change between builds]
23. Commando loop optimization
- Restructure loop
- frequent paths are in inner loop
- infrequent paths moved to outer loop
- Create opportunities for classical opt.
- loop invariant removal
- register allocation
- Generalization of superblock loop optimization
24. Commando loop optimization
- Make two loop bottoms
- Redirect infrequent back edges to one and the rest to the other
- Add loop preheader
- Infrequent loop bottom targets preheader
- Frequent loop bottom targets loop top
[Figure: loop with nodes Q, R, T, U and labels 0, 42, 40, 4]
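A source-level picture of the restructuring, as a sketch. The real transformation works on the control flow graph; the functions `sum_scaled` and `sum_commando` and the `scale` example are hypothetical. The infrequent path (a negative element) changes `*scale`, so the load of `*scale` is not invariant in the original loop. After restructuring, the infrequent path leaves the inner loop and re-enters through the preheader, so the load can be hoisted there.

```c
/* Original: one loop, frequent and infrequent paths share the back edge. */
long sum_scaled(const int *v, int n, int *scale)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {              /* infrequent path (assumed by profile) */
            *scale += 1;
            continue;
        }
        sum += (long)v[i] * *scale;  /* reloads *scale every iteration */
    }
    return sum;
}

/* Restructured: frequent path in the inner loop, infrequent path in the
 * outer loop. The outer loop top acts as the preheader. */
long sum_commando(const int *v, int n, int *scale)
{
    long sum = 0;
    int i = 0;
    while (i < n) {
        int s = *scale;              /* preheader: invariant in inner loop */
        while (i < n && v[i] >= 0) { /* inner loop: frequent back edge */
            sum += (long)v[i] * s;
            i++;
        }
        if (i < n) {                 /* infrequent bottom: back to preheader */
            *scale += 1;
            i++;
        }
    }
    return sum;
}
```

Both versions compute the same sum and leave `*scale` with the same value. The second one only makes the invariance visible to classical loop invariant removal and register allocation.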
[Figure sequence, slides 25-30: the same steps applied to the loop one build at a time; the graph gains nodes H, C and P, labels 4 and 82, and an "Inner loop" annotation]
31. Code layout
- Place code to improve
- instruction cache utilization
- memory working set
- instruction prefetch
- Pettis and Hansen
- Basic block chaining
- Routine ordering
- Routine splitting
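Basic block chaining can be sketched as a greedy pass over flow edges, heaviest first, in the spirit of Pettis and Hansen's bottom-up positioning. This is a simplified illustration and not the layout code of the Compaq tools: the `edge` record, the `next`/`prev` arrays, and the function name `chain_blocks` are assumptions.

```c
#include <stdlib.h>

/* Hypothetical flow edge with its profiled count. */
struct edge {
    int src, dst;
    unsigned long count;
};

/* qsort comparator: descending by count. */
static int by_count_desc(const void *a, const void *b)
{
    const struct edge *x = a, *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* Build chains of blocks to be placed consecutively.
 * next[b] is the block laid out right after b (-1 at a chain tail);
 * prev[b] is the block right before b (-1 at a chain head).
 * The edges array is sorted in place. */
void chain_blocks(struct edge *edges, size_t n_edges, int n_blocks,
                  int *next, int *prev)
{
    for (int b = 0; b < n_blocks; b++)
        next[b] = prev[b] = -1;          /* every block starts as a chain */

    qsort(edges, n_edges, sizeof *edges, by_count_desc);

    for (size_t i = 0; i < n_edges; i++) {
        int s = edges[i].src, d = edges[i].dst;
        /* Merge only if s is a chain tail and d is a chain head. */
        if (s == d || next[s] != -1 || prev[d] != -1)
            continue;
        /* Skip if d heads the chain that s ends: linking would make a cycle. */
        int head = s;
        while (prev[head] != -1)
            head = prev[head];
        if (head == d)
            continue;
        next[s] = d;                     /* d becomes the fall-through of s */
        prev[d] = s;
    }
}
```

The effect is that the most frequent successor of a block tends to become its fall-through, which helps instruction cache utilization and prefetch. Ordering the resulting chains, routine ordering, and routine splitting are separate steps not shown here.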
32. Switch statement optimization
- C switch statement
- test for most frequent case first

Original:

    switch (a) {
    case 1: return 3;
    case 2: return 4;
    case 4: return 5;
    }

Optimized, when case 4 is the most frequent:

    if (a == 4)
        return 5;
    else
        switch (a) {
        case 1: return 3;
        case 2: return 4;
        }
33. Evaluation
- DS20
- 500 MHz 21264
- SPECint95
- train on the train workload
- time the ref workload
- Aggressive optimization for baseline
- Median of 9 runs
34. Speedup by optimization
35. Speedup for inlining
36. Code layout
37. Tracer
38. Commando
39. Loop unroller
40. Switch optimization
41. Code growth by optimization
42. Summary and conclusions
- FDO is effective: 17% speedup
- Complement to a strong classical optimizer
- augment cost model of static optimization
- simple restructuring transformations
- Inlining is most important
- Reduces code size
43. Acknowledgements
- Gene Albert
- Michael Adler, David Blickstein, Peter Craig,
Caroline Davidson, Neil Faiman, Kent Glossop,
David Goodwin, Rich Grove, Lucy Hamnett, Steve
Hobbs, Bob Nix, Bill Noyce, and John Pieper