EECS 583 Lecture 18 Group 1 Advanced control flow analysis and optimization presentation

1
EECS 583 Lecture 18, Group 1: Advanced control
flow analysis and optimization

University of Michigan
March 18, 2002

2
Projects

Heuristic hyperblock formation
Aaron Erlandson, David Ray
When to if-convert, when not to
Correlated branch analysis and optimization
Nael Botros, Jay Luangsuwimol, Nuwee Wiwatwattana
Use code duplication to eliminate or make
branches more predictable
Compiler switch spacewalking
Ibrahim Bashir, Peter Schwartz
What switch settings should be used to maximize
application performance on a particular
architecture

3
EECS 583 Control Flow Group: Heuristic Hyperblock
Formation

University of Michigan
March 18, 2002

4
Hyperblock Review

Basic Blocks Have Problems
Too small to achieve sufficient ILP
Control Flow is too complex for many compiler
transforms.
Not all basic blocks are equal
Want
Intermediate sized regions with simple control
flow
Bigger basic blocks would be ideal !!
Separate important code from less important
Optimize frequently executed code at the expense
of the rest
Traces
Traces play a key role in forming larger regions
Can make Superblocks

5
Hyperblock Review (2)

[Figure: control-flow graphs over basic blocks BB1 through BB6, annotated with execution weights; the layout is not recoverable from the transcript.]

6
Hyperblock Review (3)

Superblocks are not perfect either
Compiler complexity
Code growth
Here comes the HYPERBLOCK!
Extend superblock to contain if-converted code
We did this in Homework 1
About three steps to formation
1. Block selection
2. Tail duplication
3. If-conversion

7
Hyperblock Formation
8
Hyperblocks: A Silver Bullet?

Sounds like a good idea, right?
Increase potential for operation overlap
Remove branches
More aggressive compiler transforms
However, you cannot just if-convert everything.

9
Pitfalls in Hyperblock Formation

So, what can go wrong if we choose the wrong BBs
for inclusion in a hyperblock?
Hazards
Function calls, unresolved memory stores, etc.
Forces compiler to choose a conservative
optimization strategy.
Limits instruction reordering
Thus, including a BB with a hazard can lead to
poorly optimized, poorly performing code

10
HB Pitfalls (Execution Frequency)

The execution frequency of combined BBs can
impact performance. For example:
The Good
BB1: small block, infrequently executed
BB2: large block, frequently executed
BB1 + BB2: large, frequently executed hyperblock
where a majority of ops are executed
The Bad
BB1: small block, frequently executed
BB2: large block, infrequently executed
BB1 + BB2: large, frequently executed hyperblock
with a preponderance of useless code
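
The two cases above can be made concrete with a little arithmetic. This is a minimal sketch with hypothetical op counts and execution probabilities (none of these numbers come from the slides); it computes the fraction of fetched ops that do useful work once both blocks are if-converted into one hyperblock.

```python
def useful_fraction(blocks):
    """blocks: list of (num_ops, exec_probability) for the if-converted BBs.

    Every op in the hyperblock is fetched on each execution, but only ops
    whose predicate is true do useful work.
    """
    fetched = sum(ops for ops, _ in blocks)
    useful = sum(ops * prob for ops, prob in blocks)
    return useful / fetched


# Hypothetical numbers: (ops, probability that the block's predicate is true).
good = [(4, 0.1), (40, 0.9)]   # small/infrequent + large/frequent
bad = [(4, 0.9), (40, 0.1)]    # small/frequent + large/infrequent

print(f"good: {useful_fraction(good):.2f}")
print(f"bad:  {useful_fraction(bad):.2f}")
```

In the "bad" mix most issue slots are spent on nullified ops, which is the "useless code" the slide refers to.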

11
HB Pitfalls (Resource Utilization)

Resource Utilization
Required resources are additive across all BBs.
Combination of resource intensive BBs results in
resource conflicts at schedule time
Could result in poorer performance than original
code.
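
A rough way to see the additive effect is the resource-imposed lower bound on schedule length, ceil(ops / issue width). The sketch below assumes a single resource class (issue slots) and hypothetical op counts; real machines have several resource classes, so treat it as an illustration only.

```python
import math


def resource_bound(num_ops, issue_width):
    """Lower bound on schedule length (cycles) imposed by issue slots alone."""
    return math.ceil(num_ops / issue_width)


issue_width = 4               # hypothetical machine
ops_then, ops_else = 24, 20   # hypothetical op counts of two resource-heavy BBs

# With branches, only one side executes, so the worse side sets the bound.
separate = max(resource_bound(ops_then, issue_width),
               resource_bound(ops_else, issue_width))
# After if-conversion, both sides compete for the same issue slots.
merged = resource_bound(ops_then + ops_else, issue_width)

print(separate, merged)
```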

12
HB Pitfalls (Dependence Height)

Recall
Dependence height = schedule length with infinite
resources.
A hyperblock must execute all the instructions of
its constituent paths.
Dependence height of a HB is the maximum of the
dependence heights of all its paths
Thus, a small if block combined with a large else
block can result in a significant slowdown since
the large part is always executed.
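
As a sketch of the definition above, dependence height can be computed as the longest chain in a dependence DAG. The code assumes unit latency for every op, and the two example paths are hypothetical.

```python
from functools import lru_cache


def dependence_height(deps):
    """Longest dependence chain, counted in ops (unit latency assumed).

    deps maps each op name to the list of ops it depends on.
    """
    @lru_cache(maxsize=None)
    def height(op):
        return 1 + max((height(p) for p in deps[op]), default=0)

    return max((height(op) for op in deps), default=0)


# Hypothetical paths: a short 'if' side and a long, serial 'else' side.
if_path = {"a": [], "b": ["a"]}
else_path = {"p": [], "q": ["p"], "r": ["q"], "s": ["r"], "t": ["s"], "u": ["t"]}

# The hyperblock's height is the maximum over its constituent paths.
hb_height = max(dependence_height(if_path), dependence_height(else_path))
print(dependence_height(if_path), dependence_height(else_path), hb_height)
```

Executions that would have taken the short side alone now pay for the long side's height.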

13
HB Pitfalls (When are HBs formed?)

HBs can be formed in two places
At the beginning of the backend, before
optimization
At the end of the backend, before/during
scheduling
Before optimization
Less control flow to hinder aggressive compiler
optimization
Compiler optimizations can have an unexpected
impact on hyperblock formation decisions.
Before/During Scheduling
Scheduler has more intimate knowledge of target
processor
Can make resource-based formation decisions
Control flow can hinder aggressive compiler
optimization prior to scheduling

14
Early HB Formation (1)

Subsequent optimizations can turn bad formation
decisions into good ones

Two seemingly incompatible paths
Both blocks scheduled in 3 instructions after
renaming optimization

15
Early HB Formation (2)

Unpredictable Resource Interference
Can't assume that resource usage is evenly
distributed through block
Could contain sections of parallel and sequential
code
Parallel -> High resource usage
Sequential -> Low resource usage
This partition of usage could mask the real
resource usage of a block
Two blocks may have seemingly compatible resource
utilizations, but if their parallel sections are
overlapped, resource interference and poorer
performance can result.
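
The masking effect can be sketched with per-cycle usage profiles. This is a simplified model under stated assumptions: each block is a list of ops issued per cycle, both profiles have the same length, and they are overlaid cycle by cycle. The profiles and issue width are hypothetical.

```python
def average_usage(profile):
    """Mean ops per cycle over the whole block."""
    return sum(profile) / len(profile)


def peak_demand(profile_a, profile_b):
    """Highest combined ops per cycle when the profiles are overlaid."""
    return max(x + y for x, y in zip(profile_a, profile_b))


issue_width = 4
a = [4, 4, 1, 1, 1, 1]   # parallel section first, then sequential code
b = [4, 4, 1, 1, 1, 1]   # parallel section in the same cycles as a
c = [0, 0, 3, 3, 3, 3]   # parallel section where a is sequential

for name, other in (("a+b", b), ("a+c", c)):
    avg = average_usage(a) + average_usage(other)
    peak = peak_demand(a, other)
    verdict = "fits" if peak <= issue_width else "interferes"
    print(name, "avg", avg, "peak", peak, verdict)
```

Both pairings have the same average demand, yet only the second fits the machine in every cycle.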

16
Early HB Formation (3)

Resource Interference

Seemingly compatible paths
Poorer performance due to resource interference

17
Traditional Predicated Execution

There was no decision
Apply if-conversion to entire innermost loops
Enable modulo scheduling
If-conversion is all or nothing
Numerical Applications
Need more flexible strategy

18
Hyperblock Decision factors

In light of potential drawbacks
Must create hyperblocks selectively
How to choose what to if-convert?
Factors
Execution frequency
Size
Resource usage
Dependence height
Number of instructions
Instruction Characteristics
Hazards
Factors alone do not force a decision
Priority-based

19
Less-obvious factor

Consider this code snippet
Dependence height of fall-through path: 6 cycles
Register renaming reduces height (7 -> 8 and 8 -> 10)
Good heuristic should anticipate other
optimizations and when they occur.

20
Block Selection Method 1 (Example)

Create a trace, the main path
Use a heuristic function to give priority to the
main path
Compute priority for other BBs
Normalize against main path.
One such heuristic
bb_char: characteristic value of each BB
Typically 1. Hazardous instructions affect this.
K: constant to represent processor issue rate
Heuristic also used to influence node-splitting
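
The formula itself was on the slide image and is not in this transcript. The sketch below assumes one plausible form consistent with the description above: a block's weight-to-size ratio, normalized against the main path and scaled by bb_char and K. The function name, field names, numbers, and the selection threshold are all hypothetical, and the paper's exact expression may differ.

```python
def block_selection_value(weight, size, main_weight, main_size,
                          bb_char=1.0, k=1.0):
    """Priority of a BB relative to the main path (assumed form, see above)."""
    return k * (weight / size) / (main_weight / main_size) * bb_char


main_weight, main_size = 100.0, 20   # hypothetical main-path weight and op count
candidates = {
    "hot_small": dict(weight=80.0, size=4),
    "cold_large": dict(weight=5.0, size=30),
    "hot_hazard": dict(weight=80.0, size=4, bb_char=0.2),  # e.g. contains a call
}
threshold = 1.0                      # hypothetical cutoff

values = {
    name: block_selection_value(main_weight=main_weight,
                                main_size=main_size, **bb)
    for name, bb in candidates.items()
}
for name, bsv in values.items():
    print(name, round(bsv, 3), "include" if bsv >= threshold else "exclude")
```

Under these assumptions a frequently executed, small block is included, while a cold block or one with a hazard falls below the cutoff.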

21
Method 2

Enumerate all paths of execution
Give priority to path as a whole
Hazard multiplier in this paper was 0.25
Applied to paths containing a subroutine call or an
unresolvable memory store
K: base contribution for a path
K = 0.1 for this paper

22
Method 2 (continued)

Block Selection Algorithm
Resource constraints
Dependence height vs. Highest Priority Path
Path priority must be within some fraction of
last path priority
Uses simplified model of resources
Issue Width x Dependence Height
Other padding multipliers
Use union of selected paths to form Hyperblock
Causes some lower priority paths to be included
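
The selection loop above might look roughly like the following. This is a sketch under several assumptions: the priority expression is a guess that rewards frequent, short, small paths (only the 0.25 hazard multiplier and K = 0.1 come from the slides), ops shared between paths are counted once per path, and `height_slack` and `min_ratio` are hypothetical tuning knobs.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    exec_ratio: float   # fraction of executions that take this path
    dep_height: int
    num_ops: int
    hazard: bool = False


def priority(p, max_height, max_ops, k=0.1, hazard_mult=0.25):
    """Assumed form: frequent, short, small paths score highest."""
    h = hazard_mult if p.hazard else 1.0
    return p.exec_ratio * h * (2 + k - p.dep_height / max_height
                               - p.num_ops / max_ops)


def select_paths(paths, issue_width, height_slack=1.5, min_ratio=0.1):
    max_h = max(p.dep_height for p in paths)
    max_o = max(p.num_ops for p in paths)
    ranked = sorted(paths, key=lambda p: priority(p, max_h, max_o),
                    reverse=True)
    best = ranked[0]
    selected, used_ops = [best], best.num_ops
    last_prio = priority(best, max_h, max_o)
    # Simplified resource model from the slide: issue width x dependence height.
    budget = issue_width * best.dep_height
    for p in ranked[1:]:
        prio = priority(p, max_h, max_o)
        if p.dep_height > height_slack * best.dep_height:
            continue            # too tall compared with the best path
        if prio < min_ratio * last_prio:
            break               # priority dropped off too sharply
        if used_ops + p.num_ops > budget:
            continue            # would exceed the resource estimate
        selected.append(p)
        used_ops += p.num_ops
        last_prio = prio
    return [p.name for p in selected]


paths = [
    Path("main", 0.70, 5, 10),
    Path("alt", 0.25, 6, 8),
    Path("call", 0.04, 5, 9, hazard=True),
    Path("long", 0.01, 20, 40),
]
print(select_paths(paths, issue_width=4))
```

The hyperblock would then be formed from the union of the blocks on the selected paths.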

23
Method 3

Take into account other optimizations, resource
interference, and partial paths.
Account for other optimizations
Create hyperblocks right before scheduling
Adds predicate complexity to scheduler
What's the happy medium?

24
Method 3 (cont'd)

Examine Hyperblocks twice
First stage
Aggressive
Form big blocks
Liberal with resources
Second stage: partial reverse if-conversion

25
Method 3 (cont'd)

Partial Reverse if-conversion
Modifies hyperblock paths to enhance schedule
Inserts branch to removed code
Analysis
In-depth process
Predicate Flow Graph (PFG)
Determine savings with 2 HBs vs. 1 HB
Determine penalty from branch
Heuristic-guided
Predicate define schedule
Which reverse if-conversions provide benefit?

26
Hyperblock Performance Evaluation (1)

So, what do hyperblocks buy us?
For simple hyperblock formation, benchmark
results depend on the issue rate of the processor
(2, 4, or 8) and on whether all execution paths are
included (IP) or only selected paths are
included.
Performance gains are more common if the issue rate
is greater than 2.

Effective Compiler Support for Predicated
Execution Using the Hyperblock (1992), S.
Mahlke, et al
27
Hyperblock Performance Evaluation (2)

Using Partial Reverse If-Conversion, significant
speedups were observed over standard superblock
and hyperblock code
Note that in many cases, hyperblock performance
losses became gains

A Framework for Balancing Control Flow and
Predication, D. August, et al.
28
Hyperblock Code Growth

Static code size growth of simple HB formation
and partial RIC formation versus superblocks
varies with benchmarks
HBs reduce code size through tail-duplication
elimination, but if-conversion introduces code
growth

A Framework for Balancing Control Flow and
Predication, D. August, et al.
29
Conclusions

Hyperblocks have been shown to improve
performance when applied judiciously
There are several, often conflicting, constraints
on hyperblock formation algorithms
Optimal hyperblock formation is a non-trivial
problem
Several formation heuristics have been
implemented with varying degrees of success.
How can we improve upon them?

30
Acknowledgements

Method 1
Effective Compiler Support for Predicated
Execution Using the Hyperblock
Scott Mahlke, David Lin, William Chen, Richard
Hank, Roger Bringmann
Method 2
Exploiting Instruction Level Parallelism in the
Presence of Conditional Branches
Scott Mahlke
Method 3
A Framework for Balancing Control Flow and
Predication
David August, Wen-mei Hwu, Scott Mahlke

31
Optimized Selection of Compiler Switches

Ibrahim Bashir
Group 1 Control Flow Analysis and Optimization
March 18, 2002

32
Overview of switch selection

Compilers have many different optimizations
available
If-conversion
Common subexpression elimination
Instruction scheduling
Loop unrolling
Switches are used to turn ON/OFF each of these
optimizations
How do we decide what combination of switches to
select?

33
Issues involved

As the number of switches increases, the number
of possible combinations increases exponentially
10 switches: 2^10 = 1024 combinations
20 switches, over a million combinations
Limited resources/time
Not feasible to try all combinations
There are different types of switches
ON/OFF (binary)
LOW/MEDIUM/HIGH
Range of values (0-100)
Switches are not necessarily independent of each
other
There can be positive or negative interactions

34
Why this is useful

Currently, user has to rely on trial-and-error
Not the most efficient technique
Saves development time
Portable to other compilers

35
Automating the process

Paper 1: Automatic Recommendation of Compiler
Options, by Elana Granston and Anne Holler
Describes a program called Dr. Options that
recommends compiler switches
Dr. Options can recommend optimization options
for the program as a whole or for each individual
module
Allows optimization of hot modules, thus saving
time

36
How it works

Uses 3 sources of information
1. User-supplied information
Application type
Optimization goals
Constraints
2. Compiler-supplied information
Function and module sizes
Characteristics of loops
Data access patterns
3. Profile information
Determine relative importance of each function

37
Performance/Usefulness

Recommendations are substantially better than
what an unassisted user would select
Provides explanations of why it's recommending
particular options
Leads to a better understanding of the
characteristics of the application
Good consultation tool for performance analysts

38
Feedback-Directed Selection

Paper 2: Feedback-Directed Selection and
Characterization of Compiler Options, by Kingsum
Chow and Youfeng Wu
Uses fractional factorial design (FFD) of
experiments and feedback from run-time
performance to optimize selection of switches
Identifies interaction (constructive or
destructive) between different switches
Each experiment is a refinement of earlier ones
Finds the switches with the best chance of being
part of the optimal selection

39
Ideas used

For a particular application, certain switches
might have little or no effect on performance and
interact with only a few others
Testing all permutations would be a waste of
resources
FFD of experiments works as follows
Each compiler switch is a factor
Each subset of factors is an interaction between
switches
Purpose of an experiment is to determine effects
and interactions of factors

40
Ideas applied

Systematically select a subset of experimental
runs
Aliasing
Ambiguity exists between the effects of switches
Performance of a particular combination can be
the result of certain switches being ON or
certain switches being OFF
Start with a small number of runs and collect
performance results
Do further experiments using the most effective
factors
More experiments can be done to resolve aliasing

41
Pros/Cons of FFD

Advantages
Number of combinations to consider doesn't
increase exponentially
Significantly fewer runs can lead to a
near-optimal solution
Identifies interactions between different
optimizations
Can be used to determine interesting switches
quickly
Effects of these switches can be further explored
using artificial intelligence techniques (genetic
algorithms)
Disadvantages
Needs an initial set of switches to work with
Requires user input/thinking
Currently only works with ON/OFF type switches

42
Evolutionary Algorithms for Reinforcement Learning

Written by David Moriarty, Alan Schultz, and John
Grefenstette
Published in 1999
Good survey of research in evolutionary
algorithms
Response to other surveys of reinforcement
learning that did not include EA

43
Evolutionary Algorithms

Form of randomized search
Useful for very large search space
Search is biased toward better solutions
Based on Darwin's Theory of Evolution

44
Darwin's Theory of Evolution

More fit individuals are more likely to survive
and reproduce
Offspring inherit traits of parents along with
random mutations
Weaker individuals weeded out through natural
selection
Population tends to get stronger over time

45
Basic Algorithm

Initialize population
Evaluate individuals
While termination condition not met
  Select individuals
  Reproduce offspring
  Evaluate individuals
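
The loop above can be sketched as a generic skeleton whose steps are supplied as callbacks. Everything in the demo is hypothetical: individuals are bit vectors of compiler switches, the fitness function is a toy stand-in for compiling and timing a benchmark, and reproduction uses mutation only to keep the example short.

```python
import random


def evolve(init_population, fitness, select, reproduce, generations):
    """Skeleton of the basic algorithm; all steps are caller-supplied."""
    population = init_population()
    scored = [(fitness(ind), ind) for ind in population]
    for _ in range(generations):      # termination: preset generation count
        parents = select(scored)
        population = reproduce(parents, len(scored))
        scored = [(fitness(ind), ind) for ind in population]
    return max(scored, key=lambda s: s[0])


rng = random.Random(0)
N_SWITCHES = 12


def init_population():
    return [tuple(rng.randint(0, 1) for _ in range(N_SWITCHES))
            for _ in range(20)]


def fitness(ind):
    # Toy stand-in for "compile with these switches and time the benchmark".
    return sum(ind)


def select(scored):
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    return [ind for _, ind in ranked[:5]]   # keep the 5 strongest


def reproduce(parents, size):
    children = list(parents)                # parents survive unchanged
    while len(children) < size:
        child = list(rng.choice(parents))
        child[rng.randrange(N_SWITCHES)] ^= 1   # flip one switch
        children.append(tuple(child))
    return children


best_score, best = evolve(init_population, fitness, select, reproduce,
                          generations=30)
print(best_score, best)
```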

46
Initialize Population

Usually done randomly
Might help to start with individuals you already
know are good
Variety is good

47
Evaluate Individuals

Each individual is tested with a fitness function
Fitness function estimates how good individual
will perform in real environment
Better fitness function can lead to stronger
individuals, but can take more time

48
Termination Condition

Not always easy to define
Can be domain dependent
Some possibilities
Performance criterion is met
Preset number of generations have passed
Ran out of time

49
Select Individuals

Based on evaluation from fitness function
Sometimes just pick n strongest individuals
Can also assign probability of survival based on
evaluation, then pick randomly
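
Both selection styles mentioned above can be sketched in a few lines. The fitness values are hypothetical; the probabilistic version assumes non-negative fitness with at least one positive value, and it samples with replacement.

```python
import random


def top_n(scored, n):
    """Deterministic: keep the n individuals with the highest fitness."""
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    return [ind for _, ind in ranked[:n]]


def roulette(scored, n, rng=random):
    """Probabilistic: survival chance proportional to fitness."""
    individuals = [ind for _, ind in scored]
    weights = [fit for fit, _ in scored]
    return rng.choices(individuals, weights=weights, k=n)


scored = [(9.0, "A"), (5.0, "B"), (1.0, "C"), (0.0, "D")]
print(top_n(scored, 2))
print(roulette(scored, 2, random.Random(1)))
```

Probabilistic selection lets weaker individuals survive occasionally, which helps preserve variety in the population.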

50
Reproduce Offspring

Crossover
Randomly select crossover point
Offspring inherits genetic code before crossover
point from one parent and after from the other
Mutation
Make small random changes to offspring
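
A minimal sketch of both operators, assuming each individual is a list of 0/1 switch settings. The per-bit mutation rate is a hypothetical parameter.

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """One-point crossover: head from parent_a, tail from parent_b."""
    point = rng.randrange(1, len(parent_a))   # cut strictly inside the genome
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


rng = random.Random(42)
mom = [1, 1, 1, 1, 1, 1]
dad = [0, 0, 0, 0, 0, 0]
child = crossover(mom, dad, rng)
print(child, mutate(child, rate=0.1, rng=rng))
```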

51
Possible Problems

No guarantee of optimal solution
Only searches part of search space
Might get stuck at local maximum
Can be slow
Might need lots of generations
Fitness function usually takes time

52
Possible Solutions

Immigration
Gender
Multiple chromosomes
Dominant/recessive genes

53
Distributed Representation

Break down the problem into subtasks
Each co-evolving population addresses different
subtask
Combine individuals from different populations to
form collaborations
Individuals from stronger collaborations get
higher evaluations

54
Switch Optimization Ideas

Identify important switches
Identify independent switches for distributed
representation
Start with crude (but fast) fitness function,
keep refining it in later generations
