EECS 583 Lecture 6 Hyperblocks, Control CPR - PowerPoint PPT Presentation


Provided by: scottm3


Transcript and Presenter's Notes

Title: EECS 583 Lecture 6 Hyperblocks, Control CPR

1
EECS 583 Lecture 6: Hyperblocks, Control CPR

University of Michigan
January 27, 2003

2
Homeworks

HW 1 due today at 11:59pm
No 4th testcase, I didn't get around to it
scp tar file to lloth.eecs.umich.edu
Please don't email it to me
user: eecs583
password is the same
Create tar file, uniquename.tgz
Put in /y/eecs583/hw1
scp mahlke.tgz eecs583@lloth.eecs.umich.edu:/y/eecs583/hw1/.
HW 2 is available. Due in 2 wks

3
Class Problem from Last Time
if (a > 0) {
    r = t + s
    if (b > 0 || c > 0)
        u = v + 1
    else if (d > 0)
        x = y + 1
    else
        z = z + 1
}

Draw the CFG
Compute CD
If-convert the code

4
Region Formation If-conversion

Control flow representation
branches
predicated operations
If-conversion is not an all-or-nothing deal
Often bad to apply in blanket mode
Selectively apply
Regions
Extend a superblock to contain if-converted code
Convert off-trace transitions to on-trace
A hyperblock is born
A superblock is a special-case HB where all guarding predicates are True

[Figure: control flow graph over BB1-BB6 with profile weights on the edges; BB4 and BB6 each appear twice]
5
When to Apply If-conversion

Positives
Remove branch
No disruption to sequential fetch
No prediction or mispredict
No use of branch resource
Increase potential for operation overlap
Enable more aggressive compiler xforms
Software pipelining
Height reduction
Negatives
Max or Sum function applied when blocks are overlapped
Resource usage
Dependence height
Hazard presence
Executing useless operations

[Figure: control flow graph over BB1-BB6 with profile weights on the edges]
6
Negative 1: Resource Usage
Resource usage is additive for all BBs that are if-converted
Case 1: Each BB requires 3 resources. Assume processor has 2 resources
No IC: 1x3 + .6x3 + .4x3 + 1x3 = 9; 9 / 2 = 4.5 -> 5 cycles
IC: 1 x (3 + 3 + 3 + 3) = 12; 12 / 2 = 6 cycles
Case 2: Each BB requires 3 resources. Assume processor has 6 resources
No IC: 1x3 + .6x3 + .4x3 + 1x3 = 9; 9 / 6 = 1.5 -> 2 cycles
IC: 1 x (3 + 3 + 3 + 3) = 12; 12 / 6 = 2 cycles
[Figure: left, CFG with BB1 (weight 100) branching to BB2 (60) and BB3 (40) and rejoining at BB4 (100); right, the if-converted block: BB1, BB2 if p1, BB3 if p2, BB4]
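The two cases above can be checked with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope estimate, not a scheduler: the function names and the (weight, resources) tuple layout are made up for illustration, and the weights 100/60/40/100 are read off the figure.

```python
import math

# Each block is (profile_weight, resources_needed). Weights follow the
# slide's figure: BB1 = 100, BB2 = 60, BB3 = 40, BB4 = 100.
BLOCKS = [(100, 3), (60, 3), (40, 3), (100, 3)]
ENTRY_WEIGHT = 100


def cycles_without_if_conversion(blocks, entry_weight, num_resources):
    """Frequency-weighted resource demand, divided by resources, rounded up."""
    expected = sum(weight * need for weight, need in blocks) / entry_weight
    return math.ceil(expected / num_resources)


def cycles_with_if_conversion(blocks, num_resources):
    """After if-conversion every block is fetched, so the demands simply add."""
    total = sum(need for _weight, need in blocks)
    return math.ceil(total / num_resources)


for resources in (2, 6):
    print(resources,
          cycles_without_if_conversion(BLOCKS, ENTRY_WEIGHT, resources),
          cycles_with_if_conversion(BLOCKS, resources))
```

With 2 resources the if-converted code costs an extra cycle; with 6 resources the extra operations fit in otherwise idle slots and the cost is the same.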
7
Negative 2: Dependence Height
Dependence height is the max over all BBs that are if-converted (dep height = schedule length with infinite resources)
Case 1: height(bb1) = 1, height(bb2) = 3, height(bb3) = 9, height(bb4) = 2
No IC: 1x1 + .6x3 + .4x9 + 1x2 = 8.4
IC: 1x1 + 1xMAX(3,9) + 1x2 = 12
Case 2: height(bb1) = 1, height(bb2) = 3, height(bb3) = 3, height(bb4) = 2
No IC: 1x1 + .6x3 + .4x3 + 1x2 = 6
IC: 1x1 + 1xMAX(3,3) + 1x2 = 6
[Figure: left, CFG with BB1 (weight 100) branching to BB2 (60) and BB3 (40) and rejoining at BB4 (100); right, the if-converted block: BB1, BB2 if p1, BB3 if p2, BB4]
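A small sketch of the same height estimate in Python, using the heights stated for each case. The function names are hypothetical; the model is just the slide's: a frequency-weighted sum without if-conversion, and head + MAX(two sides) + tail with it.

```python
def height_without_if_conversion(blocks, entry_weight):
    """Average dependence height: each block's height weighted by its frequency."""
    return sum(weight * height for weight, height in blocks) / entry_weight


def height_with_if_conversion(head, then_side, else_side, tail):
    """Both sides are always scheduled, so the taller side sets the height."""
    return head + max(then_side, else_side) + tail


# (profile_weight, dependence_height) for BB1..BB4
case1 = [(100, 1), (60, 3), (40, 9), (100, 2)]
case2 = [(100, 1), (60, 3), (40, 3), (100, 2)]

print(height_without_if_conversion(case1, 100), height_with_if_conversion(1, 3, 9, 2))
print(height_without_if_conversion(case2, 100), height_with_if_conversion(1, 3, 3, 2))
```

With the Case 1 heights the if-converted height is 1 + MAX(3,9) + 2 = 12 against an average of 8.4, because the rarely executed tall block BB3 is now always on the critical path. With matched heights (Case 2) both come out at 6.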
8
Negative 3: Hazard Presence
Hazard: an operation that forces the compiler to be conservative, so limited reordering or optimization, e.g., subroutine call, pointer store, ...
Case 1: Hazard in BB3
No IC: SB out of BB1, 2, 4; operations in BB4 are free to overlap with those in BB1 and BB2
IC: operations in BB4 cannot overlap with those in BB1 (BB2 ok)
[Figure: left, CFG with BB1 (weight 100) branching to BB2 (60) and BB3 (40) and rejoining at BB4 (100); right, the if-converted block: BB1, BB2 if p1, BB3 if p2, BB4]
9
When To If-convert

Resources
Small resource usage ideal for less important
paths
Dependence height
Matched heights are ideal
Close to same heights is ok
Remember: everything is relative for resources and dependence height!
Hazards
Avoid hazards unless on most important path
Estimate of benefit
Branches/Mispredicts removed
Fudge factor

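[Figure: left, CFG with BB1 (weight 100) branching to BB2 (60) and BB3 (40) and rejoining at BB4 (100); right, the if-converted block: BB1, BB2 if p1, BB3 if p2, BB4]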
10
The Hyperblock

Hyperblock - Collection of basic blocks in which
control flow may only enter at the first BB. All
internal control flow is eliminated via
if-conversion
Likely control flow paths
Acyclic (outer backedge ok)
Multiple intersecting traces with no side
entrances
Side exits still exist
Hyperblock formation
1. Block selection
2. Tail duplication
3. If-conversion

[Figure: control flow graph over BB1-BB6 with profile weights on the edges]
11
Block Selection

Block selection
Select subset of BBs for inclusion in HB
Difficult problem
Weighted cost/benefit function
Height overhead
Resource overhead
Hazard overhead
Branch elimination benefit
Weighted by frequency

[Figure: control flow graph over BB1-BB6 with profile weights on the edges]
12
Block Selection

Create a trace -> main path
Use a heuristic function to select other blocks that are compatible with the main path
Consider each BB by itself for simplicity
Compute priority for other BBs
Normalize against main path
BSVi = K x (weight_bbi / size_bbi) x (size_main_path / weight_main_path) x bb_chari
weight = execution frequency
size = number of operations
bb_char = characteristic value of each BB
Max value = 1; hazardous instructions reduce this to 0.5, 0.25, ...
K = constant to represent processor issue rate
Include BB when BSVi > Threshold
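The BSV heuristic is easy to try out in code. The sketch below is an illustration only: the function names and the dictionary layout are assumptions, and the sample numbers are the ones used in the Step 1 example slide (main path weight 80 and size 18, K = 4, threshold 2.0).

```python
def block_selection_value(weight_bb, size_bb, weight_main, size_main, k, bb_char=1.0):
    """BSV = K x (weight/size of the block) x (size/weight of the main path) x bb_char."""
    return k * (weight_bb / size_bb) * (size_main / weight_main) * bb_char


def select_blocks(candidates, weight_main, size_main, k, threshold):
    """Return the names of candidate blocks whose BSV exceeds the threshold.

    candidates maps a block name to (weight, size, bb_char).
    """
    chosen = []
    for name, (weight, size, bb_char) in candidates.items():
        bsv = block_selection_value(weight, size, weight_main, size_main, k, bb_char)
        if bsv > threshold:
            chosen.append(name)
    return chosen


# Off-path blocks from the example: BB3 (weight 20, 2 ops), BB5 (weight 10, 5 ops),
# neither containing hazards (bb_char = 1.0).
candidates = {"BB3": (20, 2, 1.0), "BB5": (10, 5, 1.0)}
print(select_blocks(candidates, weight_main=80, size_main=18, k=4, threshold=2.0))
```

BB3 scores about 9 and is selected; BB5 scores about 1.8 and is left out. Lowering bb_char for a block with a hazard scales its BSV down by the same factor.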

13
Example - Step 1 - Block Selection
main path = 1,2,4,6
num_ops = 5 + 8 + 3 + 2 = 18
weight = 80
Calculate the BSVs for BB3, BB5 assuming no hazards, K = 4
BSV3 = 4 x (20 / 2) x (18 / 80) = 9
BSV5 = 4 x (10 / 5) x (18 / 80) = 1.8
If Threshold = 2.0, select BB3 along with main path
[Figure: CFG with block sizes in operations: BB1 - 5, BB2 - 8, BB3 - 2, BB4 - 3, BB5 - 5, BB6 - 2; edges annotated with profile weights]
14
Example - Step 2 - Tail Duplication
Tail duplication same as with Superblock formation
[Figure: the CFG before and after tail duplication, with profile weights on the edges; BB6 is duplicated]
15
Example - Step 3 If-conversion
If-convert intra-HB branches only!!
[Figure: the region before and after if-conversion. In the hyperblock, BB1 sets p1,p2 with a CMPP, followed by BB2 if p1, BB3 if p2, and BB4]
16
Hyperblock Performance Evaluation (1)

O: BB code
IP: Structural if-conversion
All innermost loops, acyclic SEME regions
PP: Selective if-conversion

17
Class Problem
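Form the HB for this subgraph. Assume K = 4, BSV Threshold = 2
[Figure: CFG for the class problem. Block sizes in operations: BB1 - 3, BB2 - 8, BB3 - 2, BB4 - 2, BB5 - 3, BB6 - 2, BB7 - 1, BB8 - 2, BB9 - 1. Profile weights as extracted, top to bottom: 100; 20, 80; 80, 20; 45, 55; 10, 35, 55; 35, 10]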
18
Block Selection Try 2

Problems with BSV formula
Ignore dependence height
Blocks considered independently (control flow ignored)
Enumerate all paths of execution through region of interest
Consider a path execution from entry to some exit
Give priority to path as a whole
Path priority
dep_ratioi = 1.0 - (dep_heighti / max dep_height)
op_ratioi = 1.0 - (num_opsi / max num_ops)
priorityi = (probabilityi x hazardi) x (dep_ratioi + op_ratioi + K)
Hazard multiplier was 0.25 for paths containing subroutine call or unresolvable memory store
K = base contribution for a path (0.1 used)
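A sketch of the path priority function in Python. The operators did not survive in the slide text, so this assumes the ratios are 1.0 minus a fraction and that the final factor is the sum dep_ratio + op_ratio + K. The function name and the sample path numbers are made up for illustration.

```python
def path_priority(probability, hazard, dep_height, num_ops,
                  max_dep_height, max_num_ops, k=0.1):
    """Priority of one path, relative to the tallest and largest path in the region.

    hazard is the multiplier from the slide: 1.0 for a hazard-free path,
    0.25 for a path with a subroutine call or unresolvable store.
    """
    dep_ratio = 1.0 - dep_height / max_dep_height   # shorter paths score higher
    op_ratio = 1.0 - num_ops / max_num_ops          # smaller paths score higher
    return probability * hazard * (dep_ratio + op_ratio + k)


# Hypothetical region: the tallest path has height 10, the largest has 20 ops.
hot_small = path_priority(0.7, 1.0, dep_height=4, num_ops=8,
                          max_dep_height=10, max_num_ops=20)
hot_hazard = path_priority(0.7, 0.25, dep_height=4, num_ops=8,
                           max_dep_height=10, max_num_ops=20)
worst_case = path_priority(0.5, 1.0, dep_height=10, num_ops=20,
                           max_dep_height=10, max_num_ops=20)
print(hot_small, hot_hazard, worst_case)
```

Under this reading, K keeps the priority of the tallest, largest path from collapsing to zero: that path still gets probability x hazard x K.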

19
Block Selection Try 2 (continued)

Path selection
Rank paths from highest to lowest priority
Include paths until either
Estimated available resources full
Priority drops too low
Exclude any paths with excessive resource util or
dep height
Use union of selected paths to form Hyperblock
Causes some lower priority paths to be included
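The selection loop can be sketched as a greedy pass over the ranked paths. This is a simplified illustration, not the actual compiler heuristic: the resource estimate is assumed to be the total operation count of the selected blocks, the names select_paths, block_ops, resource_budget and min_priority are hypothetical, and the separate exclusion of paths with excessive resource use or dependence height is left out.

```python
def select_paths(paths, block_ops, resource_budget, min_priority):
    """Greedily pick paths by priority; return (path names, union of their blocks).

    paths: list of dicts with keys "name", "blocks", "priority".
    block_ops: number of operations in each block.
    """
    selected_blocks = set()
    selected_paths = []
    for path in sorted(paths, key=lambda p: p["priority"], reverse=True):
        if path["priority"] < min_priority:
            break  # priority dropped too low
        candidate = selected_blocks | set(path["blocks"])
        if sum(block_ops[b] for b in candidate) > resource_budget:
            break  # estimated resources are full
        selected_blocks = candidate
        selected_paths.append(path["name"])
    return selected_paths, selected_blocks


block_ops = {"A": 2, "B": 3, "C": 4, "D": 1}
paths = [
    {"name": "P2", "blocks": ["A", "C", "D"], "priority": 0.3},
    {"name": "P1", "blocks": ["A", "B", "D"], "priority": 0.9},
]
print(select_paths(paths, block_ops, resource_budget=8, min_priority=0.2))
print(select_paths(paths, block_ops, resource_budget=10, min_priority=0.2))
```

Because the hyperblock is the union of the selected blocks, any other path that runs only through those blocks is included too, whatever its own priority was.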

20
Block Selection - Try 2 - Example
Enumerate all paths, rank by priority
1. A-B-D-E-F-H-N
2. A-B-D-E-F-H-K-N
3. A-B-D-E-G-J-M-N
4. A-B-D-E-G-J-L-M-N
5. A-B-D-E-G-I-M-N
6. A-B-D-E-G-J-L-N
7. A-B-D
8. A-C-D-E-F-H-N
9. A-C-D-E-F-H-K-N
10. A-C-D-E-G-J-M-N
11. A-C-D-E-G-J-L-M-N
12. A-C-D-E-G-I-M-N
13. A-C-D-E-G-J-L-N
14. A-C-D
15. A-B-D-E-F-G-I-M-N
16. A-B-D-E-F-G-J-M-N
17. A-B-D-E-F-G-J-L-M-N
18. A-B-D-E-F-G-J-L-N
19. A-B-C-E-F-G-I-M-N
20. A-B-C-E-F-G-J-M-N
21. A-B-C-E-F-G-J-L-M-N
22. A-B-C-E-F-G-J-L-N
21
Block Selection Try 2 Example continued
22
Hyperblock Performance Using Paths
[Figure: performance charts for 4-issue and 8-issue processors]
23
Control CPR: A Branch Height Reduction Optimization for EPIC Architectures, PLDI 1999

Mike Schlansker
Scott Mahlke
Hewlett-Packard Laboratories
Richard Johnson
Transmeta Corporation

24
Introduction and Problem Statement

Dependences limit performance
Data
Control
Long dependence chains
Sequential code
Problem worse for next generation processors
High degree hardware parallelism
Low degree of program parallelism
Resources idle most of the time
Height reduction optimizations
Traditional compilers focus on reducing operation count
Future compilers need to focus on increasing program parallelism

25
Height Reduction Optimization

Goals
Break dependences
Reduce latency of edges
Reorganize computation
Common approach
Tradeoff redundant work for reduced height
Inverse of CSE
Data height reduction
Use of the associative property
Induction variable back substitution
Control height reduction
Control dependences
Reduce height through branch network
Focus of our work
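As a small illustration of data height reduction through the associative property, the sketch below builds the same sum two ways and compares dependence heights. The tuple-based expression trees and all function names are made up for this example. Note that this particular rewrite keeps the operation count unchanged; other height reductions trade redundant work for height.

```python
def serial_sum(operands):
    """Left-deep chain: ((a + b) + c) + d ... every add waits on the previous one."""
    expr = operands[0]
    for x in operands[1:]:
        expr = ("+", expr, x)
    return expr


def balanced_sum(operands):
    """Reassociated tree: (a + b) + (c + d) ... independent adds can overlap."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balanced_sum(operands[:mid]), balanced_sum(operands[mid:]))


def height(expr):
    """Length of the longest dependence chain, counting one level per add."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(height(expr[1]), height(expr[2]))


def evaluate(expr):
    if not isinstance(expr, tuple):
        return expr
    return evaluate(expr[1]) + evaluate(expr[2])


values = [1, 2, 3, 4, 5, 6, 7, 8]
chain, tree = serial_sum(values), balanced_sum(values)
print(height(chain), height(tree), evaluate(chain) == evaluate(tree))
```

For eight integer operands the chain has height 7 and the tree has height 3, while both use seven adds and compute the same value. For floating point, reassociation can change rounding, so a compiler needs permission to do this.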

26
Our Approach to Control Height Reduction

Goals
Reduce dependence height through a network of
branches
Reduce number of executed branches
Applicable to a large fraction of the program
Fit into our existing compiler infrastructure
Difficulty
Reducing height while not increasing operation count
Irredundant Consecutive Branch Method (ICBM)
Use branch profile information
Optimize the likely, important control flow paths
Possibly penalize less important paths

27
Definitions

Superblock
single-entry linear sequence of operations
containing 1 or more branches
Our basic compilation unit
Non-speculative operations
Exit branch
branch to allow early transfer out of the
superblock
compare condition (ai < bi)
On-trace
preferred execution path (E4)
identified by profiling
Off-trace
non-preferred paths (E1, E2, E3)
taking an exit branch

28
ICBM for a Simple RISC Processor - Step 1
Input superblock
Insert bypass branch
29
ICBM for a Simple RISC Processor - Step 2
Superblock with bypass branch
Move code down through bypass branch
30
ICBM for a Simple RISC Processor - Step 3
Code after downward motion
Simplify resultant code
31
ICBM for a Simple RISC Processor - Step 4
Code after simplification
Sequential boolean expression
Height reduced expression
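The effect of the four steps can be imitated in a toy Python model. This is a loose sketch of the idea, not the ICBM algorithm itself: the superblock is reduced to a list of exit conditions, the bypass condition is assumed to be the OR of all exit conditions computed as a balanced tree, and the function names are hypothetical.

```python
from itertools import product


def sequential_exits(conds):
    """Original superblock: test each exit branch in order.

    Returns the index of the first taken exit, or None if control stays on-trace.
    """
    for i, taken in enumerate(conds):
        if taken:
            return i
    return None


def or_tree(conds):
    """Height-reduced OR: pairwise reduction, about log2(n) levels instead of n - 1."""
    level = list(conds)
    if not level:
        return False
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            nxt.append(pair[0] or pair[-1])
        level = nxt
    return level[0]


def icbm_exits(conds):
    """Transformed code: one bypass branch guards the on-trace path."""
    if not or_tree(conds):
        return None                  # on-trace: only the bypass branch executed
    return sequential_exits(conds)   # off-trace: the original branches sort it out


# The two versions agree on every combination of four exit conditions.
print(all(sequential_exits(c) == icbm_exits(c)
          for c in product([False, True], repeat=4)))
```

On the likely on-trace path the transformed version executes a single branch whose condition has logarithmic height, while the less important off-trace paths pay for the extra branch.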
32
Is the ICBM Transformation Always Correct?