Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral) - PowerPoint PPT Presentation

About This Presentation
Title:

Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)

Description:

WHIRL/CGIR and TARG-INFO. 10/10/09. PACT2000 Tutorial: Open64. 3. Flowchart ... WHIRL. Abstract syntax tree based. Base representation is simple and efficient ... – PowerPoint PPT presentation

Provided by: ggao

Transcript and Presenter's Notes

1
Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)
TOPIC N
2
Outline
  • The code generator flow diagram
  • Hyperblock formation and predication (HBF)
  • Predicate Query System (PQS)
  • Loop preparation (CGPREP) and software pipelining
  • Global and local instruction scheduling (IGLS)
  • Global and local register allocation (GRA, LRA)
  • WHIRL/CGIR and TARG-INFO

3
Flowchart of Code Generator
[Flowchart; boxes as labeled on the slide]
  • WHIRL
  • WHIRL-to-TOP Lowering
  • CGIR: Quad Op List
  • Control Flow Opt I; EBO
  • Hyperblock Formation; Critical-Path Reduction
  • Process Inner Loops: unrolling, EBO; loop prep, software pipelining
  • IGLS: pre-pass; GRA, LRA, EBO; IGLS: post-pass; Control Flow Opt
  • Control Flow Opt II; EBO
  • Code Emission
Legend: EBO = extended basic block optimization (peephole, etc.); PQS = Predicate Query System
4
Hyperblock Formation and Predicated Execution
  • Hyperblock: a single-entry, multiple-exit
    control-flow region
  • Examples: loop body, hammock region, etc.
  • Hyperblock formation algorithm
  • Based on Scott Mahlke's method [Mahlke96]
  • But capable of performing conditional tail
    duplication, based on heuristics, to eliminate
    side effects (such as code duplication)

5
Hyperblock Formation Algorithm
  • Hammock regions
  • Innermost loops
  • General regions (sequence based)
  • Paths sorted by priorities
  • Inclusion of a path is guided by its impact on
    resources, scheduling height, and priority level
  • Internal branches are removed via predication
  • Predicate reuse
  • Side exits

Phases: Region Identification -> Block Selection -> Tail Duplication -> If Conversion
Objective: Keep the scheduling height close to
that of the highest priority path.
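
The selection criteria above (priority order, scheduling height, resource impact) can be illustrated with a minimal sketch. This is not the Pro64 heuristic: the `Path` record, the `slack` factor, and the op-count resource budget are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    blocks: tuple      # basic blocks along the path
    priority: float    # hypothetical: profile-weighted frequency
    height: int        # hypothetical: dependence height in cycles


def select_blocks(paths, block_ops, issue_width, slack=1.25):
    """Pick blocks for one hyperblock (illustrative heuristic only).

    Paths are visited in decreasing priority. A path is included only if
    its height stays within `slack` times the height of the best path and
    its new blocks still fit an assumed op budget.
    """
    ordered = sorted(paths, key=lambda p: p.priority, reverse=True)
    best = ordered[0]
    limit = best.height * slack          # allowed scheduling height
    budget = issue_width * limit         # assumed resource budget, in ops
    selected = set(best.blocks)
    used = sum(block_ops[b] for b in selected)
    for path in ordered[1:]:
        if path.height > limit:
            continue                     # would stretch the schedule
        new_blocks = set(path.blocks) - selected
        cost = sum(block_ops[b] for b in new_blocks)
        if used + cost > budget:
            continue                     # would oversubscribe resources
        selected |= new_blocks
        used += cost
    return selected


paths = [
    Path(("A", "B", "D"), priority=0.70, height=8),
    Path(("A", "C", "D"), priority=0.25, height=9),
    Path(("A", "E", "D"), priority=0.05, height=20),
]
block_ops = {"A": 4, "B": 6, "C": 5, "D": 3, "E": 30}
chosen = select_blocks(paths, block_ops, issue_width=2)
```

In this made-up example the rarely taken path through block E is left out because its height is far above that of the highest priority path; it would become a side exit rather than part of the hyperblock.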
6
Features of the Pro64 Hyperblock Formation
Algorithm
  • Form good vs. maximal hyperblocks
  • Conditional code duplication
  • Reduce unnecessary duplication
  • Seamless integration of HBF with global
    scheduling - an integrated part of IGLS
  • Avoid unnecessary reverse if-conversion

7
Hyperblock Formation - An Example
(a) Source:
    aa = a[i];
    bb = b[i];
    switch (aa) {
    case 1:
      if (aa < tabsiz) aa = tab[aa];
    case 2:
      if (bb < tabsiz) bb = tab[bb];
    default:
      ans = aa + bb;
    }
[Figure: (b) CFG with blocks 1, 2, 4, 5, 6, 7 and 8; (c) hyperblock formation with aggressive tail duplication, giving hyperblocks H1 and H2]
8
Hyperblock Formation - An Example (Cont'd)
[Figure: (a) CFG; (b) hyperblock formation with aggressive tail duplication, with hyperblocks H1 and H2; (c) Pro64 hyperblock formation, with hyperblocks H1 and H2]
9
Predicate Query System (PQS)
  • Purpose: gather information and provide
    interfaces that let other phases query the
    relationships among predicate values
  • PQS functions (examples)
  • BOOL PQSCG_is_disjoint (PQS_TN tn1, PQS_TN tn2)
  • BOOL PQSCG_is_subset (PQS_TN_SET tns1, PQS_TN_SET tns2)
  • Efficiency: O(log n), where n is the number of
    ancestor temporaries (TNs).
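
As a rough illustration of what such queries answer, the sketch below models predicates as a tree: each compare, guarded by a qualifying predicate, produces a complementary true/false pair. The `Pred` class, `compare`, and the two query functions are hypothetical names, not the PQS interface, and the walk is linear in the ancestor depth rather than O(log n).

```python
import itertools

_cmp_ids = itertools.count()


class Pred:
    """A predicate TN produced by a compare under a qualifying predicate."""

    def __init__(self, name, parent=None, cmp_id=None, side=None):
        self.name, self.parent = name, parent
        self.cmp_id, self.side = cmp_id, side


def compare(qual, true_name, false_name):
    """Model `cmp pT, pF = ... (qual)`: a complementary predicate pair."""
    cid = next(_cmp_ids)
    return (Pred(true_name, qual, cid, True),
            Pred(false_name, qual, cid, False))


def _chain(p):
    """Yield p and all of its ancestor predicates."""
    while p is not None:
        yield p
        p = p.parent


def is_disjoint(a, b):
    """True if a and b sit on opposite sides of some common compare."""
    sides = {p.cmp_id: p.side for p in _chain(a) if p.cmp_id is not None}
    return any(p.cmp_id in sides and sides[p.cmp_id] != p.side
               for p in _chain(b) if p.cmp_id is not None)


def is_subset(a, b):
    """True if a being set implies b is set (b is a or an ancestor of a)."""
    return any(p is b for p in _chain(a))


p0 = Pred("p0")                      # always-true root predicate
p1, p2 = compare(p0, "p1", "p2")
p3, p4 = compare(p1, "p3", "p4")     # nested under p1
```

A scheduler could use a disjointness answer to let two ops guarded by `p3` and `p2` share a resource in the same cycle, since both can never execute together.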

10
Loop Preparation and Optimization for Software
Pipelining
  • Loop canonicalization for SWP
  • Read/Write removal (register aware)
  • Loop unrolling (resource aware)
  • Recurrence removal
  • Prefetch (several different types)
  • Forced if-conversion

11
Pro64 Software Pipelining Method Overview
  • Applies only to SWP-amenable loops
  • Extensive loop preparation and optimization
    before application [DehnertTowle93]
  • Uses a lifetime-sensitive SWP algorithm [Huff93]
  • Register allocation after scheduling, based on
    Cydra 5 [RLTS92, DeTo93]
  • Handles both while and do loops
  • Smooth switching to normal scheduling if not
    successful.

12
Pro64 Lifetime-Sensitive Modulo Scheduling for
Software Pipelining
  • Features
  • Try to place an op ASAP or ALAP to minimize
    register pressure
  • Slack scheduling
  • Limited backtracking
  • Operation-driven scheduling framework

Scheduling loop (flowchart):
  • Compute Estart/Lstart for all unplaced ops
  • Choose a good op to place into the current
    partial schedule within its Estart/Lstart range
  • If placement succeeds: repeat until done, then register allocate
  • If placement fails: eject conflicting ops and repeat
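
The place-or-eject loop can be sketched as a small iterative modulo scheduler. This is a simplified illustration under stated assumptions, not the Pro64 or Huff algorithm: ops are always tried as early as possible (no ASAP/ALAP choice), self-recurrences are not checked, the caller is assumed to pass an initiation interval `ii` at or above the minimum, and all names are hypothetical.

```python
def modulo_schedule(ops, edges, units, ii, budget=200):
    """Place ops into a modulo schedule with ejection (illustrative).

    ops:   {op: resource}, listed in priority order
    edges: (src, dst, latency, distance) dependences
    units: {resource: number of units}
    Returns {op: cycle}, or None when the placement budget runs out.
    """
    placed = {}
    worklist = list(ops)

    def row_users(res, cycle):
        # Ops already holding this resource in the same modulo row.
        return [o for o, c in placed.items()
                if ops[o] == res and c % ii == cycle % ii]

    while worklist:
        if budget == 0:
            return None                  # give up; use normal scheduling
        budget -= 1
        op = worklist.pop(0)
        res = ops[op]
        # Estart/Lstart relative to ops in the current partial schedule.
        estart = max([placed[s] + lat - ii * d
                      for s, t, lat, d in edges
                      if t == op and s in placed], default=0)
        estart = max(estart, 0)
        lstart = min([placed[t] - lat + ii * d
                      for s, t, lat, d in edges
                      if s == op and t in placed],
                     default=estart + ii - 1)
        last = min(lstart, estart + ii - 1)
        slot = next((c for c in range(estart, last + 1)
                     if len(row_users(res, c)) < units[res]), None)
        if slot is None:
            slot = estart                # force placement at Estart
            users = row_users(res, slot)
            if len(users) >= units[res]:
                victim = users[0]        # eject a resource conflict
                del placed[victim]
                worklist.append(victim)
        placed[op] = slot
        # Eject placed successors whose dependence is now violated.
        for s, t, lat, d in edges:
            if (s == op and t != op and t in placed
                    and placed[t] < slot + lat - ii * d):
                del placed[t]
                worklist.append(t)
    return placed


ops = {"ld": "mem", "mul": "alu", "add": "alu", "st": "mem"}
edges = [("ld", "mul", 2, 0), ("mul", "add", 3, 0), ("add", "st", 1, 0)]
units = {"mem": 1, "alu": 1}
schedule = modulo_schedule(ops, edges, units, ii=2)
```

With two memory ops and one memory unit, an interval of 1 can never work, so the same call with `ii=1` exhausts its budget and returns `None`, which corresponds to falling back to normal scheduling.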
13
Integrated Global Local Scheduling (IGLS) Method
  • The basic IGLS framework integrates global code
    motion (GCM) with local scheduling
    [MantripragadaJainDehnert98]
  • IGLS extended to hyperblock scheduling
  • Performs profitable code motion between
    hyperblock regions and normal regions

14
IGLS Phase Flow Diagram
  • Hyperblock Scheduling (HBS)
  • Global Code Motion (GCM): Block Priority Selection, Motion Selection, Target Selection
  • Local Code Scheduling (LCS)
15
Advantages of the Extended IGLS Method - The Example Revisited
  • Advantages
  • No rigid boundaries between hyperblocks and
    non-hyperblocks
  • GCM moves code into and out of a hyperblock
    according to profitability

[Figure: (a) Pro64 hyperblock; (b) aggressive duplication; each panel shows hyperblocks H1 and H2]
16
Software Pipelining vs. Normal Scheduling
  • Is the loop a SWP-amenable candidate?
  • No: IGLS -> GRA/LRA -> IGLS -> Code Emission
  • Yes: inner loop processing (software pipelining)
  • Success: Code Emission
  • Failure/not profitable: IGLS -> GRA/LRA -> IGLS -> Code Emission
17
WHIRL
  • Abstract syntax tree based
  • Base representation is simple and efficient
  • Used through several phases with lowering
  • Designed for multiple target architectures
  • Use symbol table and maps

18
Code Generation Intermediate Representation (CGIR)
  • Conventional and simple
  • Load/store architecture
  • Predication
  • Flags on ops (copy ops, integer add, load, etc.)
  • Flags on operands (TNs)
  • Structured as basic blocks

19
Global and Local Register Allocation (GRA/LRA)
  • LRA-RQ provides an estimate of local register
    requirements
  • Allocates global variables using a priority-based
    register allocator [ChowHennessy90, Chow83,
    Briggs92]
  • Incorporates IA-64 specific extensions, e.g.
    register stack usage

Flow: From prepass IGLS -> GRA (LRA Register Request, LRA-RQ; Priority Based Register Allocation with IA-64 Extensions) -> LRA -> To postpass IGLS
20
Pro64 Priority-Based Register Allocator
  • Create_LRANGE (live range set)
  • Create_Live_BB_Sets (for each live range, find
    out blocks in which the live range is live)
  • Create_Interference_Graph (backward walk-through
    to find out live ranges live simultaneously)
  • Simplify (form a stack composed of LRs which will
    be colored from top to bottom)
  • Choose_Register or GRA_Note_Spill
  • Spill (Spill and optimize spill-code placement)

Phases: GRA-Create -> GRA-Color -> GRA-Spill
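
The steps listed above can be sketched in a few lines. This is a simplified illustration, not the Pro64 allocator: interference is approximated at block granularity (two live ranges interfere if they share a live block), priorities are supplied by the caller, and the function names are hypothetical.

```python
def build_interference(live_bbs):
    """live_bbs: {live_range: set of blocks where it is live}."""
    graph = {lr: set() for lr in live_bbs}
    names = list(live_bbs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if live_bbs[a] & live_bbs[b]:   # live in a common block
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(live_bbs, priority, num_regs):
    """Color live ranges in priority order; note spills on failure."""
    graph = build_interference(live_bbs)
    # "Simplify": build a stack with the highest priority LR on top.
    stack = sorted(graph, key=lambda lr: priority[lr])
    assignment, spilled = {}, []
    while stack:
        lr = stack.pop()
        taken = {assignment[n] for n in graph[lr] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if free:
            assignment[lr] = free[0]        # choose a register
        else:
            spilled.append(lr)              # note a spill
    return assignment, spilled


live_bbs = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {2}}
priority = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
assignment, spilled = color(live_bbs, priority, num_regs=2)
```

Here the lowest priority live range `d` interferes with both `a` and `b` and is the one left without a register.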
21
Local Register Allocation (LRA)
  • Assign_Registers: reverse linear scan with
    priority assignment
  • Reordering: depth-first ordering on the DDG

Flow: Assign_Registers -> succeed: done; failed: Fix_LRA
Fix_LRA: first time, instruction reordering; otherwise spill (spill global, spill local)
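
A reverse scan over one basic block can be sketched as follows. This is a minimal illustration, not the Pro64 `Assign_Registers`: it has no priority assignment, assumes each TN is defined once and never used by its own defining op, and uses hypothetical names throughout. Returning `None` stands in for the failure path that leads to Fix_LRA.

```python
def reverse_scan(block, num_regs, live_out=()):
    """Assign registers by walking a basic block backwards.

    block: list of (dst, srcs) ops in program order.
    Returns {tn: register}, or None if registers run out.
    """
    free = list(range(num_regs))
    live = {}          # TNs currently live -> register
    result = {}

    def alloc(tn):
        if tn in live:
            return True
        if not free:
            return False
        live[tn] = result[tn] = free.pop(0)
        return True

    for tn in live_out:
        if not alloc(tn):
            return None
    for dst, srcs in reversed(block):
        if dst is not None:
            if not alloc(dst):           # dead result still needs a register
                return None
            free.append(live.pop(dst))   # live range starts at its def
        for src in srcs:
            if not alloc(src):           # last use seen first in reverse
                return None
    return result


block = [
    ("t1", []),
    ("t2", []),
    ("t3", ["t1", "t2"]),
    ("t4", ["t3", "t1"]),
]
regs = reverse_scan(block, num_regs=2, live_out=["t4"])
```

Freeing the destination register before allocating the sources lets a result reuse the register of an operand that dies at the same op, which is why two registers suffice here.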
22
From WHIRL to CGIR: An Example
(a) Source:
    int *a; int i; int aa;
    aa = a[i];
(b) WHIRL:
    ST aa
      LD
        +
          a
          *
            CVTL32
              i
            4
(c) CGIR:
  • T1 = sp + &a
  • T2 = ld T1
  • T3 = sp + &i
  • T4 = ld T3
  • T5 = sxt T4
  • T6 = T5 << 2
  • T7 = T6
  • T8 = T2 + T7
  • T9 = ld T8
  • T10 = sp + &aa
  • st T10 T9
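
The tree-to-op-list step can be illustrated with a toy lowering pass. This sketch is an assumption-laden illustration, not the Pro64 lowering: the tuple encoding of tree nodes and the textual op format are made up, and unlike the slide it keeps the multiply by 4 instead of turning it into a shift.

```python
from itertools import count


def lower(tree):
    """Lower a small expression tree to a flat list of TN operations."""
    ops, ids = [], count(1)

    def new_tn():
        return f"T{next(ids)}"

    def emit(text):
        tn = new_tn()
        ops.append(f"{tn} = {text}")
        return tn

    def expr(node):
        kind = node[0]
        if kind == "const":
            return str(node[1])
        if kind == "var":                 # load a stack-resident variable
            return emit(f"ld {emit(f'sp + &{node[1]}')}")
        if kind == "cvtl32":              # sign-extend a 32-bit value
            return emit(f"sxt {expr(node[1])}")
        if kind == "ld":                  # indirect load
            return emit(f"ld {expr(node[1])}")
        if kind in ("+", "*"):
            left, right = expr(node[1]), expr(node[2])
            return emit(f"{left} {kind} {right}")
        raise ValueError(f"unknown node kind: {kind}")

    kind, name, value = tree              # only ("st", name, value) handled
    val = expr(value)
    addr = emit(f"sp + &{name}")
    ops.append(f"st {addr} {val}")
    return ops


# aa = a[i], with a as a pointer and i as a 32-bit index
tree = ("st", "aa",
        ("ld", ("+", ("var", "a"),
                ("*", ("cvtl32", ("var", "i")), ("const", 4)))))
cgir = lower(tree)
```

Each tree node becomes one op writing a fresh TN, so the nested tree flattens into a linear list in evaluation order.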
23
From WHIRL to CGIR (Cont'd)
  • Information passed
  • alias information
  • loop information
  • symbol table and maps

24
The Target Information Table (TARG_INFO)
  • Objective
  • Parameterized description of a target machine and
    system architecture
  • Separates architecture details from the
    compiler's algorithms
  • Minimizes compiler changes when targeting a new
    architecture
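
The idea of a parameterized target description can be sketched as a table that target-independent code queries. The record layout, the two toy targets, and every number below are invented for illustration; none of it reflects the actual TARG_INFO format or real IA-64 or MIPS parameters.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInfo:
    latency: int       # result latency in cycles
    unit: str          # functional unit class the op occupies


@dataclass(frozen=True)
class Target:
    name: str
    units: dict        # unit class -> number of units
    ops: dict          # opcode -> OpInfo


def resource_bound(target, opcodes):
    """Lower bound on cycles for a block, from unit counts alone."""
    demand = Counter(target.ops[op].unit for op in opcodes)
    return max(math.ceil(n / target.units[u]) for u, n in demand.items())


# Two hypothetical machines sharing one opcode table.
OPS = {"load": OpInfo(2, "mem"), "add": OpInfo(1, "alu"),
       "mul": OpInfo(3, "alu")}
toy_wide = Target("toy_wide", {"mem": 2, "alu": 2}, OPS)
toy_narrow = Target("toy_narrow", {"mem": 1, "alu": 1}, OPS)

block = ["load", "load", "add", "add", "add", "mul"]
wide_cycles = resource_bound(toy_wide, block)
narrow_cycles = resource_bound(toy_narrow, block)
```

The `resource_bound` function never mentions a specific machine, so retargeting in this sketch means supplying a new table rather than changing the algorithm.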

25
The Target Information Table (TARG_INFO) (Cont'd)
  • Based on an extension of Cydra tables, with major
    improvements
  • Architectures already targeted
  • Whole MIPS family
  • IA-64
  • IA-32
  • SGI graphics processors (earlier version)