Feedback directed optimization in Compaq

1
Feedback directed optimization in Compaq's
compilation tools for Alpha
  • Robert Cohn (Robert.Cohn_at_compaq.com)
  • P. Geoffrey Lowney (Geoff.Lowney_at_compaq.com)
  • Compaq Computer Corporation

2
Feedback directed optimization
  • Compilers
      • profiles used to determine frequently executed
        paths
      • optimization makes common paths fast
      • other paths might be slower

3
Feedback directed optimization
  • Mature and powerful classical optimizer
      • leverage existing optimizations
  • Feedback directed optimizations
      • Augment cost model with profile information
      • Simple feedback directed restructuring
          • enables classical optimizations
  • FDO 1% of compiler

4
Feedback directed optimization in the tool chain
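[Figure: the tool chain. Compiler = FrontEnd (Source → IL), Optimizer (IL → IL), CodeGen (IL → obj); then Linker (obj → bin) and Bin Opt (bin → bin). Feedback directed phases labeled along the chain, in extraction order: inliner, tracer, switch, commando, loe, switch, real flow, register allocation, scheduling, layout, alignment.]
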
5
Profile information
  • Basic block counts
      • pixie instrumentation
      • DCPI statistical sampling
  • Call edge counts computed from basic block counts
  • Flow edge counts estimated from basic block counts
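The slide says flow edge counts are estimated from basic block counts but not how. A minimal sketch, assuming the usual flow-conservation approach: every block except the entry receives its count on incoming edges, every block except the exit sends its count on outgoing edges, and whenever one such sum has a single unknown edge, that edge is solved. The function name `estimate_edge_counts` and the edge-list input (which cannot represent parallel edges) are illustrative assumptions. Call edge counts need no inference: a call edge executes exactly as often as the block containing the call site.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def estimate_edge_counts(
    block_counts: Dict[str, int],
    edges: List[Edge],
    entry: str,
    exit_block: str,
) -> Dict[Edge, Optional[int]]:
    """Infer CFG edge counts from basic block counts by flow conservation.

    Hypothetical helper: edges that no conservation equation can isolate
    are left as None (a real tool would need a further heuristic for them).
    """
    counts: Dict[Edge, Optional[int]] = {e: None for e in edges}
    out_edges = {b: [e for e in edges if e[0] == b] for b in block_counts}
    in_edges = {b: [e for e in edges if e[1] == b] for b in block_counts}

    changed = True
    while changed:
        changed = False
        for block, n in block_counts.items():
            groups = []
            if block != exit_block:
                groups.append(out_edges[block])  # outgoing edges sum to n
            if block != entry:
                groups.append(in_edges[block])   # incoming edges sum to n
            for group in groups:
                unknown = [e for e in group if counts[e] is None]
                if len(unknown) == 1:
                    known = sum(counts[e] or 0 for e in group if e != unknown[0])
                    counts[unknown[0]] = n - known
                    changed = True
    return counts


# Diamond: A branches to B or C, and both rejoin at D.
blocks = {"A": 10, "B": 7, "C": 3, "D": 10}
cfg_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(estimate_edge_counts(blocks, cfg_edges, entry="A", exit_block="D"))
# {('A', 'B'): 7, ('A', 'C'): 3, ('B', 'D'): 7, ('C', 'D'): 3}
```

Here A's two outgoing edges cannot be solved from A alone, but B and C each have a single incoming edge, which pins both down.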

6
Procedure inliner
  • Static heuristics estimate the benefit of inlining
    a call site
      • code size, register pressure, constant arguments,
        number of static callers
  • Frequency of execution
      • can lower or raise desirability
      • number of dynamic callers
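A sketch of how a profile count might be folded into an otherwise static inlining score. The `CallSite` fields, the weights, and the `hot_threshold` cutoff are all hypothetical, chosen only to illustrate frequency lowering or raising desirability; the slides do not give Compaq's formula, and register pressure is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    callee_size: int     # instructions in the callee body
    constant_args: int   # arguments known to be constants at this site
    static_callers: int  # call sites in the program that name the callee
    count: int           # profiled executions of this call site


def inline_score(site: CallSite, hot_threshold: int = 1000) -> float:
    """Hypothetical benefit estimate: static heuristics, then a profile adjustment."""
    score = 10.0 * site.constant_args - 0.5 * site.callee_size
    if site.static_callers == 1:
        score += 50.0   # the out-of-line copy can likely be deleted
    # Profile feedback lowers or raises desirability (weights are made up).
    if site.count == 0:
        score -= 100.0  # never executed in the training run
    elif site.count >= hot_threshold:
        score += 100.0  # hot call site
    return score


def should_inline(site: CallSite, hot_threshold: int = 1000) -> bool:
    return inline_score(site, hot_threshold) > 0.0


hot = CallSite(callee_size=40, constant_args=1, static_callers=3, count=5000)
cold = CallSite(callee_size=40, constant_args=1, static_callers=3, count=0)
print(should_inline(hot), should_inline(cold))  # True False
```

The two call sites are statically identical; only the profile count separates them, which is the point of augmenting the static cost model.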

7
Tracer
  • Transforms complicated control flow into
    superblocks
      • single-entry, multiple-exit code sequences
  • Benefit from larger superblocks
      • bigger scheduling unit
      • isolation of infrequently executed paths

8
Tracer superblock formation
  • Pick trace N1, ..., Nn
  • Change trace to superblock
      • visit nodes N2, ..., Nn
      • if node has > 1 predecessor
          • copy node and outgoing edges
          • redirect incoming trace edge to copy

[Figure, slides 8 to 13: a trace through blocks A, B, C, D and E. Successive builds apply the steps above, creating the duplicate blocks C1 and E1.]

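The superblock formation steps above amount to tail duplication. A minimal sketch, assuming the CFG is a dict from block name to successor list and that copies are named by appending a number (as with C1 and E1 on the slides); `form_superblock` and `fresh_name` are illustrative names, not Compaq's code, and trace selection from profile counts is not shown.

```python
from typing import Dict, List

Cfg = Dict[str, List[str]]


def fresh_name(base: str, cfg: Cfg) -> str:
    """Return base + the smallest positive integer not already a block name."""
    k = 1
    while f"{base}{k}" in cfg:
        k += 1
    return f"{base}{k}"


def form_superblock(cfg: Cfg, trace: List[str]) -> List[str]:
    """Tail-duplicate `trace` (modifying cfg in place) so control enters it only at its head.

    Returns the blocks of the resulting superblock, in order.
    """
    def pred_count(node: str) -> int:
        return sum(succs.count(node) for succs in cfg.values())

    superblock = list(trace)
    for i in range(1, len(superblock)):         # visit N2, ..., Nn
        node = superblock[i]
        if pred_count(node) > 1:                 # node has a side entrance
            copy = fresh_name(node, cfg)
            cfg[copy] = list(cfg[node])          # copy node and its outgoing edges
            prev = superblock[i - 1]
            cfg[prev] = [copy if s == node else s for s in cfg[prev]]  # redirect trace edge
            superblock[i] = copy
    return superblock


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(form_superblock(cfg, ["A", "B", "D", "E"]))  # ['A', 'B', 'D1', 'E1']
```

Only D has a side entrance here (from C), yet E is copied too: once D is duplicated, E gains the copy D1 as a second predecessor, so the rest of the trace must be duplicated as well.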
14
Tracer loop peeling
  • Pull 1 or 2 iterations out of loop
  • Implemented as superblock formation

Original loop:
    do p = p->n; while (p != a);
    return p;
After peeling one iteration:
    p = p->n;
    if (p == a) goto L1;
    do p = p->n; while (p != a);
    L1: return p;
15
Tracer loop peeling
  • Pick trace N1, ..., Nn
  • Change trace to superblock
      • visit nodes N2, ..., Nn
      • if node has > 1 predecessor
          • copy node and outgoing edges
          • redirect incoming trace edge to copy

[Figure, slides 15 to 22: blocks A, B and C with edges annotated by profile counts (1 or 0). Successive builds apply the steps above to peel an iteration, updating the edge counts as copies are made, with some becoming 0.]

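Peeling falls out of superblock formation when the trace runs from the block before the loop into the loop body: the duplicated blocks form the peeled iteration, and the copied back edge lands on the original loop. A self-contained sketch, assuming a dict-of-successors CFG and a hypothetical "1" suffix for copies; `peel_iteration` is an illustrative name.

```python
from typing import Dict, List

Cfg = Dict[str, List[str]]


def peel_iteration(cfg: Cfg, preheader: str, loop_path: List[str]) -> List[str]:
    """Peel one loop iteration by superblock formation (modifies cfg in place).

    The trace is `preheader` followed by `loop_path` (header to latch along
    the hot path). Each trace block with more than one predecessor is copied
    with its outgoing edges, and the incoming trace edge is redirected to
    the copy. Returns the blocks of the peeled iteration.
    """
    def pred_count(node: str) -> int:
        return sum(succs.count(node) for succs in cfg.values())

    prev, peeled = preheader, []
    for node in loop_path:
        if pred_count(node) > 1:
            copy = node + "1"                  # hypothetical naming; assumes the name is free
            cfg[copy] = list(cfg[node])        # copy node and its outgoing edges
            cfg[prev] = [copy if s == node else s for s in cfg[prev]]  # redirect trace edge
            node = copy
        peeled.append(node)
        prev = node
    return peeled


# Single-block loop B (like the p = p->n loop): A enters B, B loops to itself or exits to C.
cfg = {"A": ["B"], "B": ["B", "C"], "C": []}
print(peel_iteration(cfg, "A", ["B"]), cfg)
# ['B1'] {'A': ['B1'], 'B': ['B', 'C'], 'C': [], 'B1': ['B', 'C']}
```

The loop header always has at least two predecessors (the entry and the back edge), so it is always copied; B1 plays the role of the peeled `p = p->n; if (p == a) goto L1;`, exiting to C or falling into the remaining loop at B. Peeling a second iteration would repeat the step from B1.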
23
Commando loop optimization
  • Restructure loop
      • frequent paths are in inner loop
      • infrequent paths moved to outer loop
  • Create opportunities for classical optimization
      • loop invariant removal
      • register allocation
  • Generalization of superblock loop optimization

24
Commando loop optimization
  • Make two loop bottoms
  • Redirect infrequent back edges to one and the
    rest to the other
  • Add loop preheader
  • Infrequent loop bottom targets preheader
  • Frequent loop bottom targets loop top

[Figure, slides 24 to 30: a loop over blocks Q, R, T and U annotated with profile counts 0, 42, 40 and 4. Successive builds apply the steps above, adding blocks H and C, then a preheader P (annotated 4); the final build annotates Q with 82 and marks the frequent cycle as the inner loop.]

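A sketch of the commando restructuring on a dict-of-successors CFG, assuming the loop top and profiled back edge counts are already known and that a fixed count threshold separates frequent from infrequent back edges (the slides do not say how that split is made). The default block names `P`, `BF` and `BI` for the preheader and the two loop bottoms are hypothetical.

```python
from typing import Dict, List

Cfg = Dict[str, List[str]]


def commando(cfg: Cfg, top: str, back_edge_counts: Dict[str, int], threshold: int,
             preheader: str = "P", freq_bottom: str = "BF", infreq_bottom: str = "BI") -> None:
    """Split a loop so frequent back edges form an inner loop (modifies cfg in place).

    back_edge_counts maps each latch (a block with an edge to `top`) to the
    profiled count of that back edge.
    """
    def redirect(src: str, old: str, new: str) -> None:
        cfg[src] = [new if s == old else s for s in cfg[src]]

    # Loop entries: predecessors of the top that are not latches.
    entries = [b for b, succs in cfg.items() if top in succs and b not in back_edge_counts]

    # Add a loop preheader; entries now reach the top through it.
    cfg[preheader] = [top]
    for b in entries:
        redirect(b, top, preheader)

    # Two loop bottoms: the frequent one targets the top, the infrequent one the preheader.
    cfg[freq_bottom] = [top]
    cfg[infreq_bottom] = [preheader]
    for latch, n in back_edge_counts.items():
        redirect(latch, top, freq_bottom if n >= threshold else infreq_bottom)


# Q -> R -> Q is the hot cycle (count 40); Q -> T -> Q is rare (count 2); T may also exit to X.
cfg = {"E": ["Q"], "Q": ["R", "T"], "R": ["Q"], "T": ["Q", "X"], "X": []}
commando(cfg, top="Q", back_edge_counts={"R": 40, "T": 2}, threshold=10)
print(cfg)
# {'E': ['P'], 'Q': ['R', 'T'], 'R': ['BF'], 'T': ['BI', 'X'], 'X': [], 'P': ['Q'], 'BF': ['Q'], 'BI': ['P']}
```

The inner loop is now Q → R → BF → Q, while the rare path Q → T → BI → P → Q goes around the outer loop through the preheader. Code that is invariant in the inner loop but not in the rare path can then be hoisted into P, which runs only once per outer iteration.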
31
Code layout
  • Place code to improve
      • instruction cache utilization
      • memory working set
      • instruction prefetch
  • Pettis and Hansen
      • Basic block chaining
      • Routine ordering
      • Routine splitting
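A simplified sketch of Pettis and Hansen style basic block chaining: visit CFG edges from heaviest to lightest and join two chains when the edge runs from the tail of one chain to the head of another, so hot paths become fall-through code. Ordering of the chains themselves, routine ordering and routine splitting are not shown; the function name and the edge-weight dict input are assumptions.

```python
from typing import Dict, List, Tuple


def chain_blocks(blocks: List[str], edge_weights: Dict[Tuple[str, str], int]) -> List[List[str]]:
    """Greedy basic block chaining; returns chains of blocks to lay out contiguously."""
    chain_of: Dict[str, List[str]] = {b: [b] for b in blocks}  # block -> its chain
    # Heaviest edges first; ties keep dictionary order because sorted() is stable.
    for (src, dst), _w in sorted(edge_weights.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)                  # dst now falls through from src
            for blk in b:
                chain_of[blk] = a
    chains: List[List[str]] = []
    for blk in blocks:                   # report each chain once, in order of first appearance
        c = chain_of[blk]
        if all(c is not seen for seen in chains):
            chains.append(c)
    return chains


weights = {("A", "B"): 10, ("A", "C"): 90, ("C", "D"): 90, ("B", "D"): 10}
print(chain_blocks(["A", "B", "C", "D"], weights))  # [['A', 'C', 'D'], ['B']]
```

The hot path A → C → D becomes straight-line code and the cold block B is placed after it, which is what improves instruction cache use and prefetch.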

32
Switch statement optimization
  • C switch statement
      • test for most frequent case first

Original:
    switch (a) { case 1: return 3; case 2: return 4; case 4: return 5; }
With the most frequent case (a == 4) tested first:
    if (a == 4) return 5;
    else switch (a) { case 1: return 3; case 2: return 4; }
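One way to express this transformation, assuming per-case execution counts are available from the profile: pick the hottest case, test it with a single compare, and leave the remaining cases to the original switch. The dict lookup stands in for the jump table, `specialize_switch` is a hypothetical name, and the counts in the example are made up.

```python
from typing import Callable, Dict, Optional, Tuple


def specialize_switch(cases: Dict[int, int],
                      counts: Dict[int, int]) -> Tuple[int, Callable[[int], Optional[int]]]:
    """Return the hottest case value and a dispatcher that tests it first."""
    hot = max(cases, key=lambda c: counts.get(c, 0))   # most frequently executed case
    hot_result = cases[hot]
    rest = {c: r for c, r in cases.items() if c != hot}

    def dispatch(a: int) -> Optional[int]:
        if a == hot:           # one compare-and-branch for the common case
            return hot_result
        return rest.get(a)     # remaining cases: stands in for the jump table
    return hot, dispatch


# Cases from the slide: 1 -> 3, 2 -> 4, 4 -> 5; made-up counts make case 4 the hottest.
hot, f = specialize_switch({1: 3, 2: 4, 4: 5}, counts={1: 12, 2: 30, 4: 958})
print(hot, f(4), f(1), f(2), f(7))  # 4 5 3 4 None
```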
33
Evaluation
  • AlphaServer DS20
      • 500 MHz Alpha 21264
  • SPECint95
      • train on train workload
      • time on ref workload
  • Baseline compiled with aggressive optimization
  • Median of 9 runs

34
Speedup by optimization
35
Speedup for inlining
36
Code layout
37
Tracer
38
Commando
39
Loop unroller
40
Switch optimization
41
Code growth by optimization
42
Summary and conclusions
  • FDO is effective: 17% speedup
  • Complement to a strong classical optimizer
      • augment cost model of static optimization
      • simple restructuring transformations
  • Inlining is most important
  • Reduces code size

43
Acknowledgements
  • Gene Albert
  • Michael Adler, David Blickstein, Peter Craig,
    Caroline Davidson, Neil Faiman, Kent Glossop,
    David Goodwin, Rich Grove, Lucy Hamnett, Steve
    Hobbs, Bob Nix, Bill Noyce, and John Pieper