Title: EECS 583 Lecture 18 Group 1 Advanced control flow analysis and optimization
EECS 583 Lecture 18: Group 1, Advanced Control Flow Analysis and Optimization
- University of Michigan
- March 18, 2002
Projects
- Heuristic hyperblock formation
- Aaron Erlandson, David Ray
- When to if-convert, when not to
- Correlated branch analysis and optimization
- Nael Botros, Jay Luangsuwimol, Nuwee Wiwatwattana
- Use code duplication to eliminate branches or make them more predictable
- Compiler switch spacewalking
- Ibrahim Bashir, Peter Schwartz
- What switch settings should be used to maximize application performance on a particular architecture?
EECS 583 Control Flow Group: Heuristic Hyperblock Formation
- University of Michigan
- March 18, 2002
Hyperblock Review
- Basic blocks have problems
- Too small to achieve sufficient ILP
- Control flow is too complex for many compiler transforms
- Not all basic blocks are equal
- Want
- Intermediate-sized regions with simple control flow
- Bigger basic blocks would be ideal!
- Separate important code from less important code
- Optimize frequently executed code at the expense of the rest
- Traces
- Traces play a key role in forming larger regions
- Can be used to form superblocks
Hyperblock Review (2)
[Figure: example control flow graph (BB1 to BB6) annotated with profile weights, shown before and after superblock formation]
Hyperblock Review (3)
- Superblocks are not perfect either
- Compiler complexity
- Code growth
- Here comes the HYPERBLOCK!
- Extend superblock to contain if-converted code
- We did this in Homework 1
- About three steps to formation
- 1. Block selection
- 2. Tail duplication
- 3. If-conversion
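The last step, if-conversion, can be illustrated with a minimal sketch on a hypothetical toy IR (the op tuples, the `p_T`/`p_F` predicate names, and the helper functions are assumptions for illustration, not the compiler used in the course). It checks that the predicated, branch-free version of an if-then-else computes the same result as the original branchy code.

```python
"""If-conversion of a simple if-then-else diamond on a toy IR.

Hypothetical representation (not from the lecture): an op is
(dest, fn, srcs); an if-converted op adds a guard predicate in front.
"""
from typing import Callable, Dict, List, Tuple

Op = Tuple[str, Callable[..., int], Tuple[str, ...]]
PredOp = Tuple[str, str, Callable[..., int], Tuple[str, ...]]


def run_branchy(cond: str, then_ops: List[Op], else_ops: List[Op],
                env: Dict[str, int]) -> Dict[str, int]:
    """Original control flow: exactly one side of the branch executes."""
    env = dict(env)
    for dest, fn, srcs in (then_ops if env[cond] else else_ops):
        env[dest] = fn(*(env[s] for s in srcs))
    return env


def if_convert(then_ops: List[Op], else_ops: List[Op]) -> List[PredOp]:
    """Remove the branch: guard then-ops with p_T and else-ops with p_F."""
    return [("p_T",) + op for op in then_ops] + [("p_F",) + op for op in else_ops]


def run_predicated(cond: str, pred_ops: List[PredOp],
                   env: Dict[str, int]) -> Dict[str, int]:
    """Straight-line code: every op issues, but only ops whose guard is true commit."""
    env = dict(env)
    preds = {"p_T": bool(env[cond]), "p_F": not env[cond]}  # the predicate define
    for guard, dest, fn, srcs in pred_ops:
        if preds[guard]:
            env[dest] = fn(*(env[s] for s in srcs))
    return env


then_ops: List[Op] = [("x", lambda a: a + 1, ("a",)), ("y", lambda x: x * 2, ("x",))]
else_ops: List[Op] = [("x", lambda a: a - 1, ("a",))]
hb = if_convert(then_ops, else_ops)
for c in (0, 1):
    env = {"c": c, "a": 10, "x": 0, "y": 0}
    assert run_branchy("c", then_ops, else_ops, env) == run_predicated("c", hb, env)
```

Note that the predicated version always issues all three ops, even though only one side's results commit; that cost is what the pitfalls below are about.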
Hyperblock Formation
Hyperblocks: A Silver Bullet?
- Sounds like a good idea, right?
- Increase potential for operation overlap
- Remove branches
- More aggressive compiler transforms
- However, you cannot just if-convert everything.
Pitfalls in Hyperblock Formation
- So, what can go wrong if we choose the wrong BBs for inclusion in a hyperblock?
- Hazards
- Function calls, unresolved memory stores, etc.
- Force the compiler to choose a conservative optimization strategy
- Limit instruction reordering
- Thus, including a BB with a hazard can lead to poorly optimized, poorly performing code
HB Pitfalls (Execution Frequency)
- The execution frequency of combined BBs can impact performance. For example:
- The Good
- BB1: small block, infrequently executed
- BB2: large block, frequently executed
- BB1 + BB2: large, frequently executed hyperblock where a majority of ops are executed
- The Bad
- BB1: small block, frequently executed
- BB2: large block, infrequently executed
- BB1 + BB2: large, frequently executed hyperblock with a preponderance of useless code
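The good/bad contrast can be made concrete with a small calculation. The sketch assumes BB1 and BB2 are the two mutually exclusive sides of one branch, so every hyperblock execution issues the ops of both blocks but only the taken block's ops are useful; the frequencies (10/90) and sizes (2 and 10 ops) are hypothetical.

```python
"""Fraction of issued ops that do useful work when two mutually exclusive
blocks (the two sides of one branch) are if-converted into one hyperblock.

Assumption for illustration: every hyperblock execution issues the ops of
both blocks, but only the ops of the block that would have executed are useful.
"""
from typing import List, Tuple


def useful_fraction(blocks: List[Tuple[float, int]]) -> float:
    """blocks: (execution_frequency, op_count) for each mutually exclusive block."""
    total_freq = sum(f for f, _ in blocks)
    total_ops = sum(n for _, n in blocks)
    useful = sum(f * n for f, n in blocks)   # ops the original code would execute
    issued = total_freq * total_ops          # ops the hyperblock issues
    return useful / issued


good = useful_fraction([(10, 2), (90, 10)])  # BB1 small/rare, BB2 large/frequent
bad = useful_fraction([(90, 2), (10, 10)])   # BB1 small/frequent, BB2 large/rare
print(f"good: {good:.2f}, bad: {bad:.2f}")   # good: 0.77, bad: 0.23
```

Both hyperblocks issue the same 12 ops per execution; only the frequency distribution decides whether most of them are wasted.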
HB Pitfalls (Resource Utilization)
- Resource utilization
- Required resources are additive across all BBs
- Combining resource-intensive BBs results in resource conflicts at schedule time
- Could result in poorer performance than the original code
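A simple resource-bound estimate shows how additive resource demands can hurt. This is a minimal sketch under an assumed model (a block needs at least ceil(ops / issue width) cycles, and merged blocks add their op counts); the 4-wide machine and 6-op blocks are hypothetical.

```python
"""Resource-bound estimate for combining blocks into one hyperblock.

Simplified model (an assumption, not the lecture's): a block needs at least
ceil(ops / issue_width) cycles, and combined blocks add up their op counts.
"""
import math


def resource_bound(ops: int, issue_width: int) -> int:
    """Minimum cycles needed to issue `ops` operations on an `issue_width`-wide machine."""
    return math.ceil(ops / issue_width)


WIDTH = 4
bb1_ops, bb2_ops = 6, 6                      # two resource-intensive blocks
separate = [resource_bound(n, WIDTH) for n in (bb1_ops, bb2_ops)]
combined = resource_bound(bb1_ops + bb2_ops, WIDTH)
print(separate, combined)                    # [2, 2] 3
```

Whichever path is taken, it now costs at least 3 cycles instead of 2, so the merged code can be slower than the original.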
HB Pitfalls (Dependence Height)
- Recall
- Dependence height: schedule length with infinite resources
- A hyperblock must execute all the instructions of its constituent paths
- The dependence height of a HB is the max of the dependence heights of all its paths
- Thus, a small if block combined with a large else block can result in a significant slowdown, since the large part is always executed
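A worked example of the dependence-height argument, assuming infinite resources and ignoring branch costs (mispredictions are exactly what if-conversion would otherwise save, so this is a one-sided view). The heights and the 90% taken probability are hypothetical.

```python
"""Expected cycles before and after if-converting an if/else pair,
using dependence height alone (infinite resources, as on the slide).

The block heights and taken probability below are hypothetical.
"""


def branchy_cycles(p_if: float, h_if: int, h_else: int) -> float:
    """With a branch, each execution pays only the height of the path it takes."""
    return p_if * h_if + (1 - p_if) * h_else


def hyperblock_cycles(h_if: int, h_else: int) -> int:
    """A hyperblock executes both paths, so its height is the max over its paths."""
    return max(h_if, h_else)


p_if, h_if, h_else = 0.9, 2, 10   # small, frequent if-block; large, rare else-block
print(branchy_cycles(p_if, h_if, h_else), hyperblock_cycles(h_if, h_else))
```

The branchy version averages about 2.8 cycles, while the hyperblock pays 10 on every execution.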
HB Pitfalls (When are HBs formed?)
- HBs can be formed in two places
- At the beginning of the backend, before optimization
- At the end of the backend, before/during scheduling
- Before optimization
- Less control flow to hinder aggressive compiler optimization
- Compiler optimizations can have an unexpected impact on hyperblock formation decisions
- Before/during scheduling
- Scheduler has more intimate knowledge of the target processor
- Can make resource-based formation decisions
- Control flow can hinder aggressive compiler optimization prior to scheduling
Early HB Formation (1)
- Subsequent optimizations can turn bad formation decisions into good ones
- Two seemingly incompatible paths
- Both blocks are scheduled in 3 instructions after the renaming optimization
Early HB Formation (2)
- Unpredictable resource interference
- Can't assume that resource usage is evenly distributed through a block
- Could contain sections of parallel and sequential code
- Parallel -> high resource usage
- Sequential -> low resource usage
- This partition of usage could mask the real resource usage of a block
- Two blocks may have seemingly compatible resource utilizations, but if their parallel sections are overlapped, resource interference and poorer performance can result
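The masking effect can be shown with per-cycle resource profiles. In this sketch (hypothetical numbers, 4-wide machine) both blocks average 2 issue slots per cycle, so their averages look compatible, yet whether they actually fit depends on where their parallel sections fall.

```python
"""Why average resource usage can hide interference (hypothetical numbers).

Each list gives the number of issue slots a block uses in each cycle of its
schedule; overlapping two blocks cycle by cycle adds their usage.
"""
from typing import List

WIDTH = 4


def overlap(a: List[int], b: List[int]) -> List[int]:
    """Per-cycle slot demand when two equally long schedules are overlapped."""
    return [x + y for x, y in zip(a, b)]


def fits(profile: List[int], width: int = WIDTH) -> bool:
    """True if no cycle demands more slots than the machine provides."""
    return all(slots <= width for slots in profile)


parallel_first = [3, 3, 1, 1]   # parallel section, then sequential section
parallel_last = [1, 1, 3, 3]    # sequential section, then parallel section
# Both blocks average 2 slots/cycle, so 2 + 2 <= 4 looks compatible...
print(fits(overlap(parallel_first, parallel_last)))    # True: peaks interleave
print(fits(overlap(parallel_first, parallel_first)))   # False: peaks collide (6 > 4)
```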
Early HB Formation (3)
- Seemingly compatible paths
- Poorer performance due to resource interference
Traditional Predicated Execution
- There was no decision
- Apply if-conversion to entire innermost loops
- Enable modulo scheduling
- If-conversion is all or nothing
- Numerical applications
- Need a more flexible strategy
Hyperblock Decision Factors
- In light of potential drawbacks
- Must create hyperblocks selectively
- How to choose what to if-convert?
- Factors
- Execution frequency
- Size
- Resource usage
- Dependence height
- Number of instructions
- Instruction Characteristics
- Hazards
- Factors alone do not force a decision
- Priority-based
Less-obvious factor
- Consider this code snippet
- Dependence height of the fall-through path: 6 cycles
- Register renaming reduces height (7 -> 8 and 8 -> 10)
- A good heuristic should anticipate other optimizations and when they occur
Block Selection Method 1 (Example)
- Create a trace, the main path
- Use a heuristic function to give priority to the main path
- Compute priority for other BBs
- Normalize against the main path
- One such heuristic
- bb_char: characteristic value of each BB
- Typically 1; hazardous instructions affect this
- K: constant representing the processor issue rate
- Heuristic also used to influence node splitting
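The heuristic's formula did not survive extraction. The sketch below assumes one plausible form built only from the ingredients the slide lists (a block's frequency and size normalized against the main path, scaled by K and bb_char); treat the exact expression, the block data, and the 0.25 hazard value on BB4 as assumptions rather than the paper's definition.

```python
"""Hypothetical block-selection priority in the spirit of Method 1.

ASSUMED form (the slide's formula was lost):
    priority_i = (freq_i / freq_main) * (size_main / size_i) * K * bb_char_i
Frequent, small blocks score high; hazards lower bb_char below 1.
"""
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    freq: float            # profiled execution count
    size: int              # number of operations
    bb_char: float = 1.0   # characteristic value; hazards (e.g. calls) lower it


def priority(bb: Block, main_freq: float, main_size: int, k: float) -> float:
    """Priority of a block, normalized against the main path (assumed formula)."""
    return (bb.freq / main_freq) * (main_size / bb.size) * k * bb.bb_char


K = 4  # hypothetical processor issue rate
main_freq, main_size = 100.0, 20
for bb in [Block("BB3", 80, 5), Block("BB4", 80, 5, bb_char=0.25), Block("BB5", 5, 40)]:
    print(bb.name, round(priority(bb, main_freq, main_size, K), 3))
```

BB3 and BB4 differ only in bb_char, so the hazard alone pushes BB4 down the ranking; the large, rarely executed BB5 scores lowest.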
Method 2
- Enumerate all paths of execution
- Give priority to each path as a whole
- Hazard multiplier in this paper was 0.25
- Applied to paths containing a subroutine call or an unresolvable memory store
- K: base contribution for a path
- K = 0.1 for this paper
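A sketch of path priorities that uses the two constants the slide does give (hazard multiplier 0.25, K = 0.1). How they combine with probability, dependence height, and op count is an assumption made for illustration, not quoted from the paper; the path data are hypothetical.

```python
"""Hypothetical path priority in the spirit of Method 2.

From the slide: hazard multiplier 0.25 for paths with a subroutine call or an
unresolvable memory store, and K = 0.1 as a base contribution per path.
ASSUMED combination (not quoted from the paper):
    priority = hazard * prob * ((1 - dep_ratio * op_ratio) + K)
where the ratios are relative to the largest path in the region.
"""
from dataclasses import dataclass
from typing import Dict, List

HAZARD = 0.25
K = 0.1


@dataclass
class Path:
    name: str
    prob: float          # fraction of region executions that follow this path
    dep_height: int      # dependence height of the path
    num_ops: int         # operations on the path
    hazardous: bool = False


def path_priorities(paths: List[Path]) -> Dict[str, float]:
    max_h = max(p.dep_height for p in paths)
    max_ops = max(p.num_ops for p in paths)
    result = {}
    for p in paths:
        ratio = (p.dep_height / max_h) * (p.num_ops / max_ops)  # 1.0 for the biggest path
        hazard = HAZARD if p.hazardous else 1.0
        result[p.name] = hazard * p.prob * ((1.0 - ratio) + K)  # K keeps it above zero
    return result


paths = [Path("A", 0.6, 4, 8), Path("B", 0.3, 8, 16), Path("C", 0.1, 5, 10, hazardous=True)]
print(path_priorities(paths))
```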
Method 2 (continued)
- Block selection algorithm
- Resource constraints
- Dependence height vs. highest-priority path
- Path priority must be within some fraction of the last path's priority
- Uses a simplified model of resources
- Issue width × dependence height
- Other padding multipliers
- Use the union of selected paths to form the hyperblock
- Causes some lower-priority paths to be included
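The selection rules above can be sketched as a greedy loop over paths in priority order. The padding multipliers, the priority fraction, and the example paths are assumed values for illustration; only the overall shape (height check, priority cutoff, issue width × dependence height budget, union of blocks) comes from the slide.

```python
"""Greedy path selection in the spirit of Method 2 (hypothetical parameters)."""
from typing import Dict, List, Set, Tuple

ISSUE_WIDTH = 4
HEIGHT_PAD = 1.5          # assumed: allow paths up to 1.5x the best path's height
RESOURCE_PAD = 1.0        # assumed padding on the resource budget
PRIORITY_FRACTION = 0.1   # assumed: stop below 10% of the last selected priority

Path = Tuple[float, int, List[str]]   # (priority, dependence height, blocks on path)


def select_blocks(paths: List[Path], block_ops: Dict[str, int]) -> Set[str]:
    """Return the set of blocks (union of selected paths) forming the hyperblock."""
    ordered = sorted(paths, key=lambda p: p[0], reverse=True)
    best_prio, best_height, best_blocks = ordered[0]
    budget = ISSUE_WIDTH * best_height * RESOURCE_PAD   # issue width x dependence height
    selected = set(best_blocks)
    last_prio = best_prio
    for prio, height, blocks in ordered[1:]:
        if prio < PRIORITY_FRACTION * last_prio:
            break                      # every remaining path has even lower priority
        if height > HEIGHT_PAD * best_height:
            continue                   # would lengthen the hyperblock too much
        candidate = selected | set(blocks)
        if sum(block_ops[b] for b in candidate) > budget:
            continue                   # exceeds the simplified resource budget
        selected, last_prio = candidate, prio
    return selected


block_ops = {"BB1": 4, "BB2": 6, "BB3": 3, "BB4": 4, "BB5": 20, "BB6": 2}
paths: List[Path] = [
    (0.50, 6, ["BB1", "BB2", "BB4", "BB6"]),
    (0.30, 7, ["BB1", "BB3", "BB4", "BB6"]),
    (0.15, 12, ["BB1", "BB2", "BB5", "BB6"]),
    (0.01, 5, ["BB1", "BB3", "BB5", "BB6"]),
]
print(sorted(select_blocks(paths, block_ops)))   # ['BB1', 'BB2', 'BB3', 'BB4', 'BB6']
```

The third path is rejected for its height and the fourth falls below the priority cutoff, so BB5 stays out of the hyperblock.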
Method 3
- Take into account other optimizations, resource interference, and partial paths
- Account for other optimizations
- Create hyperblocks right before scheduling
- Adds predicate complexity to the scheduler
- What's the happy medium?
Method 3 (cont'd)
- Examine hyperblocks twice
- First stage
- Aggressive
- Form big blocks
- Liberal with resources
- Second stage: partial reverse if-conversion
Method 3 (cont'd)
- Partial reverse if-conversion
- Modifies hyperblock paths to enhance the schedule
- Inserts a branch to the removed code
- Analysis
- In-depth process
- Predicate Flow Graph (PFG)
- Determine savings with 2 HBs vs. 1 HB
- Determine penalty from the branch
- Heuristic-guided
- Predicate define schedule
- Which reverse if-conversions provide benefit?
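The "savings vs. branch penalty" question can be framed as a simple expected-cycles comparison. This cost model and all its numbers are assumptions for illustration only; the actual framework uses a much more detailed PFG-based, schedule-driven analysis.

```python
"""Hypothetical cost model for one partial reverse if-conversion decision.

Assumptions (illustrative only): the single hyperblock always takes hb_cycles;
after splitting, the kept part takes main_cycles every time, and with
probability p_moved control branches to the removed code, paying its
moved_cycles plus a branch penalty.
"""


def reverse_if_conversion_gain(hb_cycles: float, main_cycles: float,
                               moved_cycles: float, p_moved: float,
                               branch_penalty: float) -> float:
    """Expected cycles saved per execution; positive means splitting helps."""
    split_cycles = main_cycles + p_moved * (moved_cycles + branch_penalty)
    return hb_cycles - split_cycles


rare = reverse_if_conversion_gain(12, 8, 6, 0.05, 5)      # rarely needed code
frequent = reverse_if_conversion_gain(12, 8, 6, 0.6, 5)   # frequently needed code
print(f"rare: {rare:+.2f}, frequent: {frequent:+.2f}")    # rare: +3.45, frequent: -2.60
```

Under this model, rarely executed code that stretches the hyperblock is worth branching out, while frequently executed code should stay predicated.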
Hyperblock Performance Evaluation (1)
- So, what do hyperblocks buy us?
- For simple hyperblock formation, benchmark results depend on the issue rate of the processor (2, 4, or 8) and on whether all execution paths are included (IP) or only selected paths are included
- Performance gains are more common if the issue rate is greater than 2
Effective Compiler Support for Predicated Execution Using the Hyperblock (1992), S. Mahlke, et al.
Hyperblock Performance Evaluation (2)
- Using partial reverse if-conversion, significant speedups were observed over standard superblock and hyperblock code
- Note that in many cases, hyperblock performance losses became gains
A Framework for Balancing Control Flow and Predication, D. August, et al.
Hyperblock Code Growth
- Static code size growth of simple HB formation and partial reverse if-conversion (RIC) formation versus superblocks varies with benchmarks
- HBs reduce code size by eliminating tail duplication, but if-conversion introduces code growth
A Framework for Balancing Control Flow and Predication, D. August, et al.
Conclusions
- Hyperblocks have been shown to improve performance when applied judiciously
- There are several, often conflicting, constraints on hyperblock formation algorithms
- Optimal hyperblock formation is a non-trivial problem
- Several formation heuristics have been implemented with varying degrees of success
- How can we improve upon them?
Acknowledgements
- Method 1
- Effective Compiler Support for Predicated Execution Using the Hyperblock
- Scott Mahlke, David Lin, William Chen, Richard Hank, Roger Bringmann
- Method 2
- Exploiting Instruction Level Parallelism in the Presence of Conditional Branches
- Scott Mahlke
- Method 3
- A Framework for Balancing Control Flow and Predication
- David August, Wen-mei Hwu, Scott Mahlke
Optimized Selection of Compiler Switches
- Ibrahim Bashir
- Group 1 Control Flow Analysis and Optimization
- March 18, 2002
Overview of switch selection
- Compilers have many different optimizations available
- If-conversion
- Common subexpression elimination
- Instruction scheduling
- Loop unrolling
- Switches are used to turn each of these optimizations ON/OFF
- How do we decide what combination of switches to select?
Issues involved
- As the number of switches increases, the number of possible combinations increases exponentially
- 10 switches: 2^10 = 1024 combinations
- 20 switches: over a million combinations
- Limited resources/time
- Not feasible to try all combinations
- There are different types of switches
- ON/OFF (binary)
- LOW/MEDIUM/HIGH
- Range of values (0-100)
- Switches are not necessarily independent of each other
- There can be positive or negative interactions
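The size of the search space is the product of the number of settings of each switch, which is why mixing switch types makes exhaustive search even less feasible. The mixed switch set at the end is hypothetical.

```python
"""Size of the switch search space: the product of each switch's setting count.

The mixed switch set at the end is hypothetical.
"""
import math

print(2 ** 10)   # 10 ON/OFF switches: 1024 combinations
print(2 ** 20)   # 20 ON/OFF switches: 1048576 combinations

# Hypothetical mix: 8 ON/OFF switches, 3 LOW/MEDIUM/HIGH switches, 1 switch ranging over 0-100
settings_per_switch = [2] * 8 + [3] * 3 + [101]
print(math.prod(settings_per_switch))  # 256 * 27 * 101 = 698112
```

`math.prod` requires Python 3.8 or later.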
Why this is useful
- Currently, the user has to rely on trial and error
- Not the most efficient technique
- Saves development time
- Portable to other compilers
Automating the process
- Paper 1: Automatic Recommendation of Compiler Options by Elana Granston and Anne Holler
- Describes a program called Dr. Options that recommends compiler switches
- Dr. Options can recommend optimization options for the program as a whole or for each individual module
- Allows optimization of hot modules, thus saving time
How it works
- Uses 3 sources of information
- 1. User-supplied information
- Application type
- Optimization goals
- Constraints
- 2. Compiler-supplied information
- Function and module sizes
- Characteristics of loops
- Data access patterns
- 3. Profile information
- Determine relative importance of each function
Performance/Usefulness
- Recommendations are substantially better than what an unassisted user would select
- Provides explanations of why it is recommending particular options
- Leads to a better understanding of the characteristics of the application
- Good consultation tool for performance analysts
Feedback-Directed Selection
- Paper 2: Feedback-Directed Selection and Characterization of Compiler Options by Kingsum Chow and Youfeng Wu
- Uses fractional factorial design (FFD) of experiments and feedback from run-time performance to optimize the selection of switches
- Identifies interactions (constructive or destructive) between different switches
- Each experiment is a refinement of earlier ones
- Finds the switches with the best chance of being part of the optimal selection
Ideas used
- For a particular application, certain switches might have little or no effect on performance and interact with only a few others
- Testing all combinations would be a waste of resources
- FFD of experiments works as follows
- Each compiler switch is a factor
- Each subset of factors is an interaction between switches
- The purpose of an experiment is to determine the effects and interactions of factors
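A minimal example of a fractional factorial design: a textbook 2^(4-1) half fraction for four ON/OFF switches. The switch names are hypothetical, and this is a generic FFD construction, not necessarily the specific design used in the paper.

```python
"""A 2^(4-1) fractional factorial design for four ON/OFF switches.

Switch names are hypothetical. Levels are coded -1 (OFF) / +1 (ON). A full
factorial needs 2^4 = 16 runs; this half fraction uses 8 by setting the fourth
switch from the product of the other three (generator D = ABC).
"""
from itertools import product
from typing import Dict, List

SWITCHES = ["if_convert", "cse", "schedule", "unroll"]  # hypothetical switch names


def half_fraction() -> List[Dict[str, int]]:
    runs = []
    for a, b, c in product((-1, 1), repeat=3):   # full factorial on the first three
        d = a * b * c                             # generator: D = ABC
        runs.append(dict(zip(SWITCHES, (a, b, c, d))))
    return runs


for run in half_fraction():
    print({name: "ON" if level > 0 else "OFF" for name, level in run.items()})
```

Every switch is ON in exactly half of the runs, so each main effect can still be estimated from only 8 compilations.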
Ideas applied
- Systematically select a subset of experimental runs
- Aliasing
- Ambiguity exists between the effects of switches
- Performance of a particular combination can be the result of certain switches being ON or certain switches being OFF
- Start with a small number of runs and collect performance results
- Do further experiments using the most effective factors
- More experiments can be done to resolve aliasing
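Aliasing can be demonstrated by estimating main effects from the same 8-run half fraction. The response function below is a hypothetical stand-in for measured performance in which the last switch has no real effect; because the design ties that switch to the three-way interaction of the others, the interaction shows up as its "main effect".

```python
"""Estimating main effects from the 8-run 2^(4-1) design, and what aliasing means.

Hypothetical response: performance improves with switch B ("cse") and with a
three-way interaction of A, B, C, while switch D ("unroll") has no effect.
Because the design sets D = A*B*C, the D column and the ABC interaction column
are identical, so the ABC effect shows up as a "main effect" of D.
"""
from itertools import product
from statistics import mean
from typing import Dict, List

SWITCHES = ["if_convert", "cse", "schedule", "unroll"]  # hypothetical names: A, B, C, D


def half_fraction() -> List[Dict[str, int]]:
    """8 runs coded -1/+1 with generator D = ABC."""
    return [dict(zip(SWITCHES, (a, b, c, a * b * c)))
            for a, b, c in product((-1, 1), repeat=3)]


def measure(run: Dict[str, int]) -> float:
    """Stand-in for timing the compiled program (hypothetical performance score)."""
    a, b, c = run["if_convert"], run["cse"], run["schedule"]
    return 10 + 2 * b + 3 * (a * b * c)   # note: "unroll" is never used


def main_effects(runs: List[Dict[str, int]], scores: List[float]) -> Dict[str, float]:
    """Effect = mean score with the switch ON minus mean score with it OFF."""
    effects = {}
    for s in SWITCHES:
        on = mean(y for r, y in zip(runs, scores) if r[s] > 0)
        off = mean(y for r, y in zip(runs, scores) if r[s] < 0)
        effects[s] = on - off
    return effects


runs = half_fraction()
scores = [measure(r) for r in runs]
print(main_effects(runs, scores))
```

Here "unroll" appears to have an effect of 6 only because it is aliased with the ABC interaction; follow-up runs that break the D = ABC link would resolve the ambiguity.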
Pros/Cons of FFD
- Advantages
- Number of combinations to consider doesn't increase exponentially
- Significantly fewer runs can lead to a near-optimal solution
- Identifies interactions between different optimizations
- Can be used to determine interesting switches quickly
- Effects of these switches can be further explored using artificial intelligence techniques (genetic algorithms)
- Disadvantages
- Needs an initial set of switches to work with
- Requires user input/thinking
- Currently only works with ON/OFF type switches
Evolutionary Algorithms for Reinforcement Learning
- Written by David Moriarty, Alan Schultz, and John Grefenstette
- Published in 1999
- Good survey of research in evolutionary algorithms
- Response to other surveys of reinforcement learning that did not include EAs
Evolutionary Algorithms
- Form of randomized search
- Useful for very large search spaces
- Search is biased toward better solutions
- Based on Darwin's theory of evolution
Darwin's Theory of Evolution
- More fit individuals are more likely to survive and reproduce
- Offspring inherit traits of parents along with random mutations
- Weaker individuals are weeded out through natural selection
- Population tends to get stronger over time
Basic Algorithm
- Initialize population
- Evaluate individuals
- While termination condition not met
- Select individuals
- Reproduce offspring
- Evaluate individuals
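The loop above can be sketched end to end for ON/OFF switch settings encoded as bit strings. The fitness function is a hypothetical stand-in for "compile with these switches and time the program", and the population size, mutation rate, and truncation selection are assumed choices, not taken from the survey.

```python
"""A minimal evolutionary algorithm following the basic loop, applied to
ON/OFF compiler-switch settings encoded as bit strings.

The fitness function is a hypothetical stand-in for "compile with these
switches and time the program"; a real one would be far more expensive.
"""
import random
from typing import List

N_SWITCHES = 12
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical best setting


def fitness(ind: List[int]) -> int:
    """Toy fitness: how many switches match TARGET (unknown to the search itself)."""
    return sum(1 for g, t in zip(ind, TARGET) if g == t)


def evolve(pop_size: int = 20, max_generations: int = 100, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Initialize population: random bit strings
    pop = [[rng.randint(0, 1) for _ in range(N_SWITCHES)] for _ in range(pop_size)]
    for _ in range(max_generations):               # termination: generation budget
        pop.sort(key=fitness, reverse=True)        # evaluate individuals
        if fitness(pop[0]) == N_SWITCHES:          # termination: performance criterion met
            break
        parents = pop[: pop_size // 2]             # select: keep the stronger half
        children = []
        while len(children) < pop_size - len(parents):   # reproduce offspring
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, N_SWITCHES)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.3:                 # occasional single-bit mutation
                child[rng.randrange(N_SWITCHES)] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)                   # evaluate the final population


best = evolve()
print(best, fitness(best))
```

Because the selected parents survive into the next generation, the best fitness found never decreases from one generation to the next.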
Initialize Population
- Usually done randomly
- Might help to start with individuals you already know are good
- Variety is good
Evaluate Individuals
- Each individual is tested with a fitness function
- The fitness function estimates how well an individual will perform in the real environment
- A better fitness function can lead to stronger individuals, but can take more time
Termination Condition
- Not always easy to define
- Can be domain dependent
- Some possibilities
- Performance criterion is met
- Preset number of generations have passed
- Ran out of time
Select Individuals
- Based on the evaluation from the fitness function
- Sometimes just pick the n strongest individuals
- Can also assign a probability of survival based on the evaluation, then pick randomly
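Both selection schemes fit in a few lines. This sketch assumes higher fitness is better and, for the probabilistic scheme (usually called roulette-wheel or fitness-proportional selection), non-negative fitness values; the four-individual population is hypothetical.

```python
"""Two selection schemes, as a sketch over parallel lists of individuals and fitness values.

Both assume higher fitness is better; roulette-wheel selection additionally
assumes non-negative fitness values.
"""
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def truncation_select(pop: Sequence[T], fits: Sequence[float], n: int) -> List[T]:
    """Deterministically keep the n strongest individuals."""
    ranked = sorted(zip(pop, fits), key=lambda pf: pf[1], reverse=True)
    return [ind for ind, _ in ranked[:n]]


def roulette_select(pop: Sequence[T], fits: Sequence[float], n: int,
                    rng: random.Random) -> List[T]:
    """Pick n individuals at random, each with probability proportional to its fitness."""
    return rng.choices(pop, weights=fits, k=n)


pop = ["A", "B", "C", "D"]
fits = [8.0, 4.0, 2.0, 1.0]
print(truncation_select(pop, fits, 2))                    # ['A', 'B']
print(roulette_select(pop, fits, 5, random.Random(42)))   # random; weak individuals can still be picked
```

Truncation converges quickly but discards variety; fitness-proportional selection keeps some weaker individuals around, which can help avoid premature convergence.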
Reproduce Offspring
- Crossover
- Randomly select a crossover point
- Offspring inherit genetic code before the crossover point from one parent and after it from the other
- Mutation
- Make small random changes to offspring
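The two reproduction operators, sketched for bit-string genomes. One-point crossover and per-bit flip mutation are common textbook choices assumed here; the slide does not fix a particular mutation scheme.

```python
"""One-point crossover and bit-flip mutation on bit-string genomes (a sketch)."""
import random
from typing import List, Tuple


def crossover(p1: List[int], p2: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Pick a random crossover point; each child takes one parent's genes
    before the point and the other parent's genes after it."""
    cut = rng.randrange(1, len(p1))          # never 0 or len, so both parents contribute
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(genome: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with a small probability `rate`."""
    return [g ^ 1 if rng.random() < rate else g for g in genome]


rng = random.Random(7)
a, b = [1] * 8, [0] * 8
c1, c2 = crossover(a, b, rng)
print(c1, c2)                  # c1 is a run of 1s followed by 0s; c2 is its complement
print(mutate(c1, 0.1, rng))
```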
Possible Problems
- No guarantee of an optimal solution
- Only searches part of the search space
- Might get stuck at a local maximum
- Can be slow
- Might need lots of generations
- Fitness function usually takes time
Possible Solutions
- Immigration
- Gender
- Multiple chromosomes
- Dominant/recessive genes
Distributed Representation
- Break down the problem into subtasks
- Each co-evolving population addresses a different subtask
- Combine individuals from different populations to form collaborations
- Individuals from stronger collaborations get higher evaluations
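A minimal cooperative co-evolution sketch of this idea: the 12 switches are split (hypothetically) into two independent groups of 6, each evolved by its own population, and each half-solution is scored by the fitness of its collaboration with the other population's current representative. The toy fitness, population sizes, and mutation-only reproduction are assumptions for illustration.

```python
"""Cooperative co-evolution sketch: two populations, each evolving half of the
switch settings (a hypothetical split into independent groups).

An individual is evaluated by combining it with the current best individual
of the other population; its score is the fitness of that collaboration.
"""
import random
from typing import List

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical best setting
HALF = len(TARGET) // 2                          # assumed split: two groups of 6


def fitness(full: List[int]) -> int:
    """Toy stand-in for timing the program compiled with these switches."""
    return sum(1 for g, t in zip(full, TARGET) if g == t)


def coevolve(generations: int = 30, size: int = 10, seed: int = 1) -> List[int]:
    rng = random.Random(seed)
    # pops[0] evolves switches 0-5, pops[1] evolves switches 6-11
    pops = [[[rng.randint(0, 1) for _ in range(HALF)] for _ in range(size)] for _ in range(2)]
    reps = [pops[0][0], pops[1][0]]              # representatives used in collaborations
    for _ in range(generations):
        for k in (0, 1):
            partner = reps[1 - k]

            def collaboration_score(ind: List[int]) -> int:
                # Pair this half-solution with the other population's representative
                full = ind + partner if k == 0 else partner + ind
                return fitness(full)

            pops[k].sort(key=collaboration_score, reverse=True)
            reps[k] = pops[k][0]                 # best collaborator becomes the representative
            parents = pops[k][: size // 2]
            # Offspring: copies of random parents with each bit flipped with probability 0.2
            children = [[g ^ int(rng.random() < 0.2) for g in rng.choice(parents)]
                        for _ in range(size - len(parents))]
            pops[k] = parents + children
    return reps[0] + reps[1]


best = coevolve()
print(best, fitness(best))
```

Each population only searches a 2^6 space instead of the full 2^12 one, which is the payoff of identifying independent switches.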
Switch Optimization Ideas
- Identify important switches
- Identify independent switches for distributed representation
- Start with a crude (but fast) fitness function, and keep refining it in later generations