Title: Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)
1 Overview of the Pro64 Code Generator (slides by Gao, Dehnert and Amaral)
2 Outline
- The code generator flow diagram
- Hyperblock formation and predication (HBF)
- Predicate Query System (PQS)
- Loop preparation (CGPREP) and software pipelining
- Global and local instruction scheduling (IGLS)
- Global and local register allocation (GRA, LRA)
- WHIRL/CGIR and TARG-INFO
3 Flowchart of Code Generator
[Flow diagram; box labels:]
- WHIRL
- WHIRL-to-TOP Lowering
- CGIR: Quad Op List
- Control Flow Opt I, EBO
- Hyperblock Formation, Critical-Path Reduction
- Process Inner Loops: unrolling, EBO; loop prep, software pipelining
- IGLS pre-pass; GRA, LRA, EBO; IGLS post-pass; Control Flow Opt
- Control Flow Opt II, EBO
- Code Emission
Legend: EBO = extended basic block optimization (peephole, etc.); PQS = Predicate Query System
4 Hyperblock Formation and Predicated Execution
- Hyperblock: a single-entry, multiple-exit control-flow region (loop body, hammock region, etc.)
- Hyperblock formation algorithm
  - Based on Scott Mahlke's method [Mahlke96]
  - But capable of conditional tail duplication, guided by heuristics, to limit side effects (such as excessive code duplication)
5 Hyperblock Formation Algorithm
- Hammock regions
- Innermost loops
- General regions (sequence based)
- Paths sorted by priorities
- Inclusion of a path is guided by its impact on resources, scheduling height, and priority level
- Internal branches are removed via predication
- Predicate reuse
- Side exits
[Diagram: Region Identification -> Block Selection -> Tail Duplication -> If Conversion]
Objective: Keep the scheduling height close to that of the highest-priority path.
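The path-inclusion criteria on this slide (priority, resources, scheduling height) can be illustrated with a small greedy filter. This is only a sketch under stated assumptions: the `Path` fields, the `max_height_ratio` and `op_budget` thresholds, and the example numbers are hypothetical and are not taken from Pro64.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    priority: float  # assumed to come from profile data or a static estimate
    height: int      # assumed estimate of the path's scheduling height (cycles)
    ops: int         # operation count, used here as a stand-in for resource use


def select_paths(paths, max_height_ratio=1.2, op_budget=32):
    """Visit paths by descending priority; keep a path only if it fits the
    resource budget and does not exceed the height limit derived from the
    highest-priority path."""
    ordered = sorted(paths, key=lambda p: p.priority, reverse=True)
    if not ordered:
        return []
    main = ordered[0]
    selected = [main]
    used_ops = main.ops
    height_limit = main.height * max_height_ratio
    for p in ordered[1:]:
        if used_ops + p.ops > op_budget:
            continue  # would oversubscribe the (hypothetical) resource budget
        if p.height > height_limit:
            continue  # would stretch the hyperblock's schedule too far
        selected.append(p)
        used_ops += p.ops
    return selected


# Hypothetical paths through an 8-block CFG.
paths = [
    Path("1-2-5-8", priority=0.70, height=10, ops=12),
    Path("1-4-6-8", priority=0.25, height=11, ops=10),
    Path("1-4-7-8", priority=0.05, height=20, ops=14),
]
chosen = [p.name for p in select_paths(paths)]
```

With these numbers the low-priority, tall path is left out of the hyperblock, which keeps the scheduling height near that of the main path.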
6 Features of the Pro64 Hyperblock Formation Algorithm
- Form "good" rather than "maximal" hyperblocks
- Conditional code duplication
- Reduce unnecessary duplication
- Seamless integration of HBF with global scheduling: an integrated part of IGLS
- Avoid unnecessary reverse if-conversion
7 Hyperblock Formation - An Example
(a) Source:
    aa = a[i];
    bb = b[i];
    switch (aa) {
    case 1:
      if (aa < tabsiz) aa = tab[aa];
    case 2:
      if (bb < tabsiz) bb = tab[bb];
    default:
      ans = aa + bb;
    }
(b) CFG
(c) Hyperblock formation with aggressive tail duplication
[Figure: CFG over basic blocks 1, 2, 4, 5, 6, 7, 8, and the resulting hyperblocks H1 and H2]
8 Hyperblock Formation - An Example (Cont'd)
[Figure: (a) CFG; (b) Hyperblock formation with aggressive tail duplication; (c) Pro64 hyperblock formation. Basic blocks 1, 2, 4, 5, 6, 7, 8 grouped into hyperblocks H1 and H2.]
9 Predicate Query System (PQS)
- Purpose: gather information and provide interfaces allowing other phases to make queries regarding the relationships among predicate values
- PQS functions (examples)
  - BOOL PQSCG_is_disjoint (PQS_TN tn1, PQS_TN tn2)
  - BOOL PQSCG_is_subset (PQS_TN_SET tns1, PQS_TN_SET tns2)
- Efficiency: O(log n), where n is the number of ancestor temporaries (TNs).
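To make the disjoint and subset queries concrete, here is a minimal sketch of a predicate-relation model. It assumes each predicate is one side of a compare executed under a guarding predicate, and it answers queries conservatively by walking ancestor chains. The `Pred` class and the Python function names are hypothetical; this is not the Pro64 PQS implementation or its data structure.

```python
class Pred:
    """A predicate defined by one side of a compare under a guarding predicate."""

    def __init__(self, name, parent=None, cmp_id=None, side=None):
        self.name = name
        self.parent = parent  # guarding predicate; None for the always-true root
        self.cmp_id = cmp_id  # id of the compare that defined this predicate
        self.side = side      # True or False outcome of that compare


def ancestors(p):
    """Yield p and every predicate that must be true whenever p is true."""
    while p is not None:
        yield p
        p = p.parent


def is_subset(p, q):
    """True if p being true provably implies q is true (q guards p)."""
    return any(a is q for a in ancestors(p))


def is_disjoint(p, q):
    """True if p and q provably never hold together: an ancestor of p and an
    ancestor of q are opposite sides of the same compare."""
    sides = {a.cmp_id: a.side for a in ancestors(p) if a.cmp_id is not None}
    return any(
        b.cmp_id in sides and sides[b.cmp_id] != b.side
        for b in ancestors(q)
        if b.cmp_id is not None
    )


# p1/p2 split the root; p3/p4 split p1.
p0 = Pred("p0")
p1 = Pred("p1", p0, cmp_id=1, side=True)
p2 = Pred("p2", p0, cmp_id=1, side=False)
p3 = Pred("p3", p1, cmp_id=2, side=True)
p4 = Pred("p4", p1, cmp_id=2, side=False)
```

A `False` answer here means "not proven", not "definitely overlapping"; a client such as a scheduler would treat it as the safe default.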
10 Loop Preparation and Optimization for Software Pipelining
- Loop canonicalization for SWP
- Read/Write removal (register aware)
- Loop unrolling (resource aware)
- Recurrence removal
- Prefetch (several different types)
- Forced if-conversion
11 Pro64 Software Pipelining Method Overview
- Applied only to SWP-amenable loops
- Extensive loop preparation and optimization before application [DehnertTowle93]
- Use lifetime-sensitive SWP algorithm [Huff93]
- Register allocation after scheduling, based on Cydra 5 [RLTS92, DeTo93]
- Handle both while and do loops
- Smooth switching to normal scheduling if not successful
12 Pro64 Lifetime-Sensitive Modulo Scheduling for Software Pipelining
- Features
  - Try to place an op ASAP or ALAP to minimize register pressure
  - Slack scheduling
  - Limited backtracking
  - Operation-driven scheduling framework
[Flow diagram:]
- Compute Estart/Lstart for all unplaced ops
- Choose a good op to place into the current partial schedule within its Estart/Lstart range
- Succeed? If no, eject conflicting ops and continue
- Done? If yes, register allocate
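The Estart/Lstart window and the modulo reservation table can be shown with a deliberately simplified scheduler. This sketch is not the lifetime-sensitive, slack-based algorithm named on the slide: it has no backtracking or op ejection (it just retries with a larger initiation interval, II), it always places an op as early as possible, and it assumes one class of identical issue slots. The function name, the edge format, and the example loop are all hypothetical.

```python
def modulo_schedule(ops, edges, units, max_ii=64):
    """ops: op names in an order that respects same-iteration dependences.
    edges: (src, dst, latency, distance) tuples; distance counts iterations.
    units: identical issue slots per cycle (a single resource class).
    Returns (ii, {op: cycle}) for the first II at which every op is placed."""
    res_mii = -(-len(ops) // units)  # ceil(len(ops) / units)
    for ii in range(max(1, res_mii), max_ii + 1):
        # A self-recurrence needs latency <= ii * distance.
        if any(s == d and lat > ii * dist for s, d, lat, dist in edges):
            continue
        time = {}
        mrt = [0] * ii  # modulo reservation table: ops issued per slot mod II
        placed_all = True
        for op in ops:
            estart, lstart = 0, None
            for src, dst, lat, dist in edges:
                if dst == op and src in time:  # already-placed predecessor
                    estart = max(estart, time[src] + lat - ii * dist)
                if src == op and dst in time:  # already-placed successor
                    bound = time[dst] + ii * dist - lat
                    lstart = bound if lstart is None else min(lstart, bound)
            last = estart + ii - 1  # later cycles repeat the same slots mod II
            if lstart is not None:
                last = min(last, lstart)
            slot = next(
                (t for t in range(estart, last + 1) if mrt[t % ii] < units),
                None,
            )
            if slot is None:
                placed_all = False  # simplification: no ejection, try II + 1
                break
            time[op] = slot
            mrt[slot % ii] += 1
        if placed_all:
            return ii, time
    raise ValueError("no schedule found up to max_ii")


# Hypothetical loop body with a recurrence add -> mul carried one iteration.
ops = ["ld", "mul", "add", "st"]
edges = [
    ("ld", "mul", 2, 0),
    ("mul", "add", 3, 0),
    ("add", "st", 1, 0),
    ("add", "mul", 1, 1),
]
ii, time = modulo_schedule(ops, edges, units=2)
```

In this example the resource bound alone would allow II = 2, but the mul/add recurrence (total latency 4 over one iteration) forces II = 4. A scheduler with backtracking would eject conflicting ops instead of restarting at a larger II.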
13 Integrated Global Local Scheduling (IGLS) Method
- The basic IGLS framework integrates global code motion (GCM) with local scheduling [MantripragadaJainDehnert98]
- IGLS extended to hyperblock scheduling
- Performs profitable code motion between hyperblock regions and normal regions
14 IGLS Phase Flow Diagram
[Flow diagram; box labels:]
- Hyperblock Scheduling (HBS)
- Block Priority Selection
- Motion Selection
- Target Selection
- Global Code Motion (GCM)
- Local Code Scheduling (LCS)
15 Advantages of the Extended IGLS Method - The Example Revisited
- Advantages
  - No rigid boundaries between hyperblocks and non-hyperblocks
  - GCM moves code into and out of a hyperblock according to profitability
[Figure: (a) Pro64 hyperblock; (b) Aggressive duplication. Basic blocks 1, 2, 4, 5, 6, 7, 8 grouped into hyperblocks H1 and H2.]
16 Software Pipelining vs. Normal Scheduling
[Flow diagram:]
- A SWP-amenable loop candidate?
  - Yes: inner loop processing, software pipelining
    - Success: Code Emission
    - Failure/not profitable: fall back to the normal path
  - No (normal path): IGLS -> GRA/LRA -> IGLS -> Code Emission
17 WHIRL
- Abstract syntax tree based
- Base representation is simple and efficient
- Used through several phases with lowering
- Designed for multiple target architectures
- Use symbol table and maps
18 Code Generation Intermediate Representation (CGIR)
- Conventional and simple
- Load/store architecture
- Predication
- Flags on ops (copy ops, integer add, load, etc.)
- Flags on operands (TNs)
- Structured as basic blocks
19 Global and Local Register Allocation (GRA/LRA)
- LRA-RQ provides an estimate of local register requirements
- Allocates global variables using a priority-based register allocator [ChowHennessy90, Chow83, Briggs92]
- Incorporates IA-64 specific extensions, e.g. register stack usage
[Flow diagram: From prepass IGLS -> GRA (LRA Register Request, LRA-RQ; Priority-Based Register Allocation with IA-64 Extensions) -> LRA -> To postpass IGLS]
20 Pro64 Priority-Based Register Allocator
- Create_LRANGE (live range set)
- Create_Live_BB_Sets (for each live range, find the blocks in which the live range is live)
- Create_Interference_Graph (backward walk-through to find live ranges that are live simultaneously)
- Simplify (form a stack of LRs, which will be colored from top to bottom)
- Choose_Register or GRA_Note_Spill
- Spill (spill and optimize spill-code placement)
[Diagram phases: GRA-Create, GRA-Color, GRA-Spill]
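The Simplify and Choose_Register/GRA_Note_Spill steps can be sketched as a stack-based graph-coloring pass. This is a generic illustration under stated assumptions (a symmetric interference map, a numeric priority per live range, `k` interchangeable registers); the function name and the example graph are hypothetical and the code is not Pro64's GRA.

```python
def color_live_ranges(interference, priority, k):
    """interference: {lr: set of interfering lrs} (must be symmetric).
    priority: {lr: number}; higher-priority live ranges are colored earlier.
    k: number of available registers.
    Returns ({lr: register index}, [live ranges noted for spilling])."""
    graph = {n: set(adj) for n, adj in interference.items()}
    stack = []
    # Simplify: repeatedly remove a node and push it on the stack. Prefer
    # nodes with degree < k (always colorable later); otherwise push the
    # lowest-priority node optimistically.
    while graph:
        easy = [n for n in graph if len(graph[n]) < k]
        pool = easy if easy else list(graph)
        node = min(pool, key=lambda n: (priority[n], n))
        stack.append(node)
        for m in graph.pop(node):
            graph[m].discard(node)
    # Color from the top of the stack down: choose a register or note a spill.
    assignment, spilled = {}, []
    while stack:
        node = stack.pop()
        taken = {assignment[m] for m in interference[node] if m in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Hypothetical live ranges: a, b, c interfere pairwise; d interferes with c.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
priority = {"a": 3, "b": 2, "c": 5, "d": 1}
assignment, spilled = color_live_ranges(interference, priority, k=2)
```

With two registers the triangle a-b-c cannot be fully colored, so the lowest-priority member of the triangle is the one noted for spilling; a real allocator would then insert spill code and optimize its placement.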
21 Local Register Allocation (LRA)
- Assign_Registers using reverse linear scan with priority assignment
- Reordering: depth-first ordering on the DDG
[Flow diagram: Assign_Registers either succeeds or fails; on failure, Fix_LRA applies instruction reordering the first time, and spilling (spill global, spill local) otherwise.]
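A reverse scan over a basic block can be sketched as follows: walking from the last op to the first, a value gets a register at its last use and releases it at its definition. The sketch assumes each name is defined at most once in the block and omits the priority assignment mentioned on the slide; the function name and op format are hypothetical. Returning `None` stands in for the "failed" edge that would lead to Fix_LRA.

```python
def reverse_scan(ops, live_out, num_regs):
    """ops: list of (dest, [sources]) in program order; dest may be None.
    live_out: names still needed after the block.
    Returns {name: register index}, or None if registers run out."""
    free = list(range(num_regs))
    active = {}  # name -> register, for values live at the current point
    result = {}

    def assign(name):
        if name in active:
            return True
        if not free:
            return False
        active[name] = result[name] = free.pop(0)
        return True

    for name in sorted(live_out):
        if not assign(name):
            return None
    for dest, sources in reversed(ops):
        if dest is not None:
            # A definition that is never used still needs a target register.
            if dest not in active and not assign(dest):
                return None
            # Walking backward, the definition is where the live range ends.
            free.insert(0, active.pop(dest))
        for src in sources:
            # First sighting while walking backward is the value's last use.
            if not assign(src):
                return None
    return result


# Hypothetical block: t1 = ld a; t2 = ld b; t3 = t1 + t2; st t3
block = [
    ("t1", ["a"]),
    ("t2", ["b"]),
    ("t3", ["t1", "t2"]),
    (None, ["t3"]),
]
regs = reverse_scan(block, live_out=set(), num_regs=2)
too_few = reverse_scan(block, live_out=set(), num_regs=1)
```

Because the destination's register is released before the sources are assigned, a result may share a register with one of its own operands, which is safe when operands are read before the result is written.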
22 From WHIRL to CGIR: An Example
(a) Source:
    int *a;
    int i;
    int aa;
    aa = a[i];
(b) WHIRL:
    ST aa
      LD
        +
          a
          *
            CVTL32
              i
            4
(c) CGIR:
    T1 = sp + &a
    T2 = ld T1
    T3 = sp + &i
    T4 = ld T3
    T5 = sxt T4
    T6 = T5 << 2
    T7 = T6
    T8 = T2 + T7
    T9 = ld T8
    T10 = sp + &aa
    st T10 T9
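The tree-to-ops lowering in this example can be sketched as a small recursive walk that emits one three-address op per tree node. The tuple-based tree encoding, the node kinds, and the textual op format are assumptions made for illustration; they are not the actual WHIRL or CGIR data structures, and the emitted sequence is similar in spirit to panel (c) rather than identical to it.

```python
import itertools


def lower(tree):
    """Lower a small expression tree into a list of three-address op strings."""
    ops = []
    counter = itertools.count(1)

    def emit(rhs):
        tn = f"T{next(counter)}"  # fresh temporary name
        ops.append(f"{tn} = {rhs}")
        return tn

    def walk(node):
        kind = node[0]
        if kind == "VAR":  # load a stack-resident variable
            addr = emit(f"sp + &{node[1]}")
            return emit(f"ld {addr}")
        if kind == "CONST":
            return str(node[1])
        if kind == "CVTL32":  # sign-extend a 32-bit value
            return emit(f"sxt {walk(node[1])}")
        if kind == "MUL":
            left, right = walk(node[1]), walk(node[2])
            if right.isdigit():
                value = int(right)
                if value > 0 and (value & (value - 1)) == 0:
                    # Multiply by a power of two becomes a left shift.
                    return emit(f"{left} << {value.bit_length() - 1}")
            return emit(f"{left} * {right}")
        if kind == "ADD":
            left, right = walk(node[1]), walk(node[2])
            return emit(f"{left} + {right}")
        if kind == "LD":
            return emit(f"ld {walk(node[1])}")
        if kind == "ST":  # store to a stack-resident variable
            value = walk(node[2])
            addr = emit(f"sp + &{node[1]}")
            ops.append(f"st {addr} {value}")
            return None
        raise ValueError(f"unknown node kind: {kind}")

    walk(tree)
    return ops


# aa = a[i], with a as a pointer to 4-byte ints
tree = ("ST", "aa",
        ("LD",
         ("ADD",
          ("VAR", "a"),
          ("MUL", ("CVTL32", ("VAR", "i")), ("CONST", 4)))))
cgir = lower(tree)
```

Each `VAR` node expands to an address computation plus a load, which is why a one-line source statement turns into roughly ten ops before any optimization.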
23 From WHIRL to CGIR (Cont'd)
- Information passed
- alias information
- loop information
- symbol table and maps
24 The Target Information Table (TARG_INFO)
- Objective
  - Parameterized description of a target machine and system architecture
  - Separates architecture details from the compiler's algorithms
  - Minimizes compiler changes when targeting a new architecture
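The idea of separating architecture details from the compiler's algorithms can be sketched with a table-driven latency query. The two target tables, their field names, and the latency values below are invented for illustration; they do not reflect the real TARG_INFO format or any real machine.

```python
# Hypothetical per-target descriptions: opcode -> properties.
TARGET_A = {
    "ld": {"latency": 2, "unit": "M"},
    "add": {"latency": 1, "unit": "I"},
    "mul": {"latency": 4, "unit": "F"},
}
TARGET_B = {
    "ld": {"latency": 3, "unit": "M"},
    "add": {"latency": 1, "unit": "I"},
    "mul": {"latency": 2, "unit": "I"},
}


def critical_path(ops, deps, target):
    """ops: {name: opcode}, listed so producers come before consumers.
    deps: (producer, consumer) pairs.
    Returns the longest latency-weighted dependence path, in cycles.
    Only the 'latency' field of the target table is consulted here."""
    finish = {}
    for name, opcode in ops.items():
        start = max((finish[p] for p, c in deps if c == name), default=0)
        finish[name] = start + target[opcode]["latency"]
    return max(finish.values(), default=0)


# t3 = ld(...) * ld(...); t4 = t3 + ...
ops = {"t1": "ld", "t2": "ld", "t3": "mul", "t4": "add"}
deps = [("t1", "t3"), ("t2", "t3"), ("t3", "t4")]
length_a = critical_path(ops, deps, TARGET_A)
length_b = critical_path(ops, deps, TARGET_B)
```

The algorithm is unchanged between the two calls; only the table differs, which is the property the slide attributes to a parameterized target description.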
25 The Target Information Table (TARG_INFO) (Cont'd)
- Based on an extension of Cydra tables, with major improvements
- Architectures already targeted:
  - Whole MIPS family
  - IA-64
  - IA-32
  - SGI graphics processors (earlier version)