Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)

Description: 10/10/09. PACT2000 Tutorial: Open64.

Slides: 26

Provided by: ggao


1
Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)
TOPIC N
2
Outline

The code generator flow diagram
Hyperblock formation and predication (HBF)
Predicate Query System (PQS)
Loop preparation (CGPREP) and software pipelining
Global and local instruction scheduling (IGLS)
Global and local register allocation (GRA, LRA)
WHIRL/CGIR and TARG-INFO

3
Flowchart of Code Generator
[Flowchart; boxes in flow order]
WHIRL
WHIRL-to-TOP Lowering
CGIR: Quad Op List
Control Flow Opt I, EBO
Hyperblock Formation, Critical-Path Reduction
Process Inner Loops: unrolling, EBO; loop prep, software pipelining
IGLS pre-pass; GRA, LRA, EBO; IGLS post-pass; Control Flow Opt
Control Flow Opt II, EBO
Code Emission
Legend: EBO = Extended basic block optimization (peephole, etc.); PQS = Predicate Query System
4
Hyperblock Formation and Predicated Execution

Hyperblock: a single-entry, multiple-exit control-flow region (loop body, hammock region, etc.)
Hyperblock formation algorithm
Based on Scott Mahlke's method [Mahlke96]
But capable of performing conditional tail duplication, based on heuristics, to eliminate side effects (such as code duplication)

5
Hyperblock Formation Algorithm

Hammock regions
Innermost loops
General regions (sequence based)
Paths sorted by priorities
Inclusion of a path is guided by its impact on resources, scheduling height, and priority level
Internal branches are removed via predication
Predicate reuse
Side exits

Phases: Region Identification, Block Selection, Tail Duplication, If Conversion
Objective: keep the scheduling height close to that of the highest-priority path.
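The block selection idea above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Pro64 implementation: the `Path` record, the `select_paths` function, and the `height_slack` threshold are all hypothetical, and the resource-impact criterion from the slide is left out.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    priority: float  # assumed metric, e.g. estimated execution frequency
    height: int      # assumed metric: scheduling height in cycles

def select_paths(paths, height_slack=1.25):
    """Greedily include paths in priority order, rejecting any path whose
    scheduling height exceeds height_slack times the height of the
    highest-priority path (hypothetical heuristic)."""
    if not paths:
        return []
    ordered = sorted(paths, key=lambda p: p.priority, reverse=True)
    limit = ordered[0].height * height_slack
    selected = [ordered[0]]
    for p in ordered[1:]:
        if p.height <= limit:
            selected.append(p)
    return selected

# Three made-up paths: C is rare and tall, so it is left out.
paths = [Path("A", 0.6, 8), Path("B", 0.3, 9), Path("C", 0.1, 20)]
chosen = [p.name for p in select_paths(paths)]
```

With these made-up numbers the height limit is 10 cycles, so paths A and B are kept and C would become a side exit rather than part of the hyperblock.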
6
Features of the Pro64 Hyperblock Formation Algorithm

Form good vs. maximal hyperblocks
Conditional code duplication
Reduce unnecessary duplication
Seamless integration of HBF with global scheduling - an integrated part of IGLS
Avoid unnecessary reverse if-conversion

7
Hyperblock Formation - An Example
(a) Source:
    aa = a[i];
    bb = b[i];
    switch (aa) {
    case 1:
        if (aa < tabsiz) aa = tab[aa];
    case 2:
        if (bb < tabsiz) bb = tab[bb];
    default:
        ans = aa + bb;
    }
(b) CFG [figure: basic blocks 1, 2, 4, 5, 6, 7, 8]
(c) Hyperblock formation with aggressive tail duplication [figure: hyperblocks H1 and H2]
8
Hyperblock Formation - An Example (Cont'd)
(a) CFG [figure: basic blocks 1, 2, 4, 5, 6, 7, 8]
(b) Hyperblock formation with aggressive tail duplication [figure: hyperblocks H1 and H2]
(c) Pro64 hyperblock formation [figure: hyperblocks H1 and H2]
9
Predicate Query System (PQS)

Purpose: gather information and provide interfaces allowing other phases to make queries regarding the relationships among predicate values
PQS functions (examples)
BOOL PQSCG_is_disjoint (PQS_TN tn1, PQS_TN tn2)
BOOL PQSCG_is_subset (PQS_TN_SET tns1, PQS_TN_SET tns2)
Efficiency: O(log n), where n is the number of ancestor temporaries (TNs).
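To make the two kinds of query concrete, here is a small Python sketch of disjointness and subset tests over a tree of predicates. Everything in it is an assumption for illustration: the `Pred` class, the `cmp_id`/`sense` fields, and the functions `is_disjoint` and `is_subset` are hypothetical stand-ins, not the PQSCG interface. It assumes each predicate is defined exactly once by a compare executed under a parent (qualifying) predicate, it handles single predicates rather than TN sets, and it walks the ancestor chain linearly rather than achieving the O(log n) bound quoted above.

```python
class Pred:
    def __init__(self, name, parent=None, cmp_id=None, sense=None):
        self.name = name
        self.parent = parent  # qualifying predicate of the defining compare
        self.cmp_id = cmp_id  # which compare defined this predicate
        self.sense = sense    # True for the "true" output, False for the "false" one

def chain(p):
    """Yield p and then each of its ancestors up to the root."""
    while p is not None:
        yield p
        p = p.parent

def is_disjoint(a, b):
    """a and b can never both be true if their ancestor chains contain
    opposite outputs of the same compare."""
    senses = {q.cmp_id: q.sense for q in chain(a) if q.cmp_id is not None}
    return any(q.cmp_id in senses and senses[q.cmp_id] != q.sense
               for q in chain(b))

def is_subset(a, b):
    """a implies b if b is a itself or one of a's ancestors."""
    return any(q is b for q in chain(a))

p0 = Pred("p0")                                   # always-true root
p1 = Pred("p1", parent=p0, cmp_id=1, sense=True)
p2 = Pred("p2", parent=p0, cmp_id=1, sense=False)
p3 = Pred("p3", parent=p1, cmp_id=2, sense=True)
p4 = Pred("p4", parent=p1, cmp_id=2, sense=False)
```

Here `p3` and `p2` are disjoint because `p3` is nested under `p1`, the complement of `p2`, while `p3` is a subset of `p1` but not the other way around.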

10
Loop Preparation and Optimization for Software Pipelining

Loop canonicalization for SWP
Read/Write removal (register aware)
Loop unrolling (resource aware)
Recurrence removal
Prefetch (several different types)
Forced if-conversion

11
Pro64 Software Pipelining Method Overview

Only apply to SWP-amenable loops
Extensive loop preparation and optimization before application [DehnertTowle93]
Use lifetime-sensitive SWP algorithm [Huff93]
Register allocation after scheduling based on Cydra 5 [RLTS92, DeTo93]
Handle both while and do loops
Smooth switching to normal scheduling if not successful.

12
Pro64 Lifetime-Sensitive Modulo Scheduling for Software Pipelining

Features
Try to place an op ASAP or ALAP to minimize register pressure
Slack scheduling
Limited backtracking
Operation-driven scheduling framework

[Flowchart]
Compute Estart/Lstart for all unplaced ops
Choose a good op to place into the current partial schedule within its Estart/Lstart range
Succeed? no: eject conflicting ops; yes: continue
When done: register allocate
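The Estart/Lstart computation in the first box can be illustrated with a short Python sketch. It is deliberately simplified and is not the Pro64 scheduler: it handles only intra-iteration dependences, so the initiation interval, loop-carried edges, resource conflicts and ejection are all omitted. The function name `estart_lstart`, the example ops, and the latencies are made up for illustration.

```python
def estart_lstart(ops, edges, length):
    """ops: op names in topological order.
    edges: {(producer, consumer): latency in cycles}.
    length: latest cycle at which any op may start.
    Returns the earliest and latest start cycle of each op."""
    estart = {op: 0 for op in ops}
    for v in ops:                       # predecessors are final before v
        for (u, w), lat in edges.items():
            if w == v:
                estart[v] = max(estart[v], estart[u] + lat)
    lstart = {op: length for op in ops}
    for u in reversed(ops):             # successors are final before u
        for (a, v), lat in edges.items():
            if a == u:
                lstart[u] = min(lstart[u], lstart[v] - lat)
    return estart, lstart

ops = ["ld", "inc", "mul", "add", "st"]
edges = {("ld", "mul"): 2, ("mul", "add"): 3, ("ld", "add"): 2, ("add", "st"): 1}
estart, lstart = estart_lstart(ops, edges, length=6)
slack = {op: lstart[op] - estart[op] for op in ops}
```

In this example the chain ld, mul, add, st is critical (slack 0), while the independent op `inc` has slack 6. A slack scheduler would typically place low-slack ops first, since they have the least freedom.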
13
Integrated Global Local Scheduling (IGLS) Method

The basic IGLS framework integrates global code motion (GCM) with local scheduling [MantripragadaJainDehnert98]
IGLS extended to hyperblock scheduling
Performs profitable code motion between hyperblock regions and normal regions

14
IGLS Phase Flow Diagram
Hyperblock Scheduling (HBS)
Block Priority Selection
Motion Selection
Target Selection
Global Code Motion (GCM)
Local Code Scheduling (LCS)
15
Advantages of the Extended IGLS Method - The Example Revisited

Advantages
No rigid boundaries between hyperblocks and non-hyperblocks
GCM moves code into and out of a hyperblock according to profitability

[Figure: (a) Pro64 hyperblock, (b) Aggressive duplication; hyperblocks H1 and H2]
16
Software Pipelining vs. Normal Scheduling
[Flowchart]
A SWP-amenable loop candidate?
No: IGLS
Yes: inner loop processing, software pipelining
Software pipelining outcomes: Success, or Failure/not profitable (fall back to IGLS)
GRA/LRA
IGLS
Code Emission
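The fallback decision in this flowchart can be written down as a few lines of Python. This is only a sketch of the control flow: `schedule_loop`, the `swp_amenable` key, and the two callbacks are hypothetical names, and the later GRA/LRA, post-pass IGLS and code emission steps are not modeled.

```python
def schedule_loop(loop, try_swp, normal_schedule):
    """Attempt software pipelining on amenable loops; fall back to
    normal (IGLS-style) scheduling if the loop is not a candidate or
    if pipelining fails or is not profitable (try_swp returns None)."""
    if loop.get("swp_amenable"):
        result = try_swp(loop)
        if result is not None:
            return ("swp", result)
    return ("igls", normal_schedule(loop))

# Stand-in callbacks for illustration only.
ok = schedule_loop({"swp_amenable": True}, lambda l: "kernel", lambda l: "list")
failed = schedule_loop({"swp_amenable": True}, lambda l: None, lambda l: "list")
skipped = schedule_loop({"swp_amenable": False}, lambda l: "kernel", lambda l: "list")
```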
17
WHIRL

Abstract syntax tree based
Base representation is simple and efficient
Used through several phases with lowering
Designed for multiple target architectures
Use symbol table and maps

18
Code Generation Intermediate Representation (CGIR)

Conventional and simple
Load/store architecture
Predication
Flags on ops (copy ops, integer add, load, etc.)
Flags on operands (TNs)
Structured as basic blocks

19
Global and Local Register Allocation (GRA/LRA)

LRA-RQ provides an estimate of local register requirements
Allocates global variables using a priority-based register allocator [ChowHennessy90, Chow83, Briggs92]
Incorporates IA-64 specific extensions, e.g. register stack usage

[Flowchart]
From prepass IGLS
GRA: LRA Register Request (LRA-RQ); Priority Based Register Allocation with IA-64 Extensions
LRA
To postpass IGLS
20
Pro64 Priority-Based Register Allocator

Create_LRANGE (live range set)
Create_Live_BB_Sets (for each live range, find out blocks in which the live range is live)
Create_Interference_Graph (backward walk-through to find out live ranges live simultaneously)
Simplify (form a stack composed of LRs which will be colored from top to bottom)
Choose_Register or GRA_Note_Spill
Spill (spill and optimize spill-code placement)

Phases: GRA-Create, GRA-Color, GRA-Spill
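The simplify and choose-register steps can be illustrated with a generic graph-coloring sketch in Python. It is a textbook-style approximation, not the Pro64 allocator: the function `color`, the priority values, and the integer register numbering are assumptions, and spill-code placement is reduced to recording which live ranges could not be colored.

```python
def color(graph, priority, k):
    """graph: {live_range: set of interfering live ranges}.
    priority: {live_range: number}, higher means more important.
    k: number of available registers.
    Returns (assignment, spilled)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Simplify: prefer live ranges with fewer than k neighbors left;
        # among candidates push the lowest priority first so that it is
        # colored last.
        easy = [n for n in work if len(work[n]) < k]
        n = min(easy or work, key=lambda x: priority[x])
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    assignment, spilled = {}, []
    while stack:
        n = stack.pop()  # color from the top of the stack down
        used = {assignment[m] for m in graph[n] if m in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[n] = free[0]   # choose a register
        else:
            spilled.append(n)         # note a spill
    return assignment, spilled

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
priority = {"a": 4, "b": 3, "c": 2, "d": 1}
assignment, spilled = color(graph, priority, k=2)
```

With two registers the triangle a, b, c cannot be fully colored, so the lowest-priority member of the triangle, `c`, is the one noted for spilling, while `d` shares a register with `b`.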
21
Local Register Allocation (LRA)