EECS 583 Lecture 18, Group 1: Advanced Control Flow Analysis and Optimization

1
EECS 583 Lecture 18, Group 1: Advanced Control
Flow Analysis and Optimization
  • University of Michigan
  • March 18, 2002

2
Projects
  • Heuristic hyperblock formation (Aaron Erlandson, David Ray)
    • When to if-convert, and when not to
  • Correlated branch analysis and optimization (Nael Botros,
    Jay Luangsuwimol, Nuwee Wiwatwattana)
    • Use code duplication to eliminate branches or make
      them more predictable
  • Compiler switch spacewalking (Ibrahim Bashir, Peter Schwartz)
    • What switch settings should be used to maximize
      application performance on a particular
      architecture?

3
EECS 583 Control Flow Group: Heuristic Hyperblock
Formation
  • University of Michigan
  • March 18, 2002

4
Hyperblock Review
  • Basic blocks have problems
  • Too small to achieve sufficient ILP
  • Control Flow is too complex for many compiler
    transforms.
  • Not all basic blocks are equal
  • Want
  • Intermediate sized regions with simple control
    flow
  • Bigger basic blocks would be ideal!
  • Separate important code from less important
  • Optimize frequently executed code at the expense
    of the rest
  • Traces
  • Traces play a key role in forming larger regions
  • Can make Superblocks

5
Hyperblock Review (2)
[Figure: example weighted control flow graph (BB1-BB6) with edge
execution frequencies, illustrating trace selection and tail
duplication to form a superblock]
6
Hyperblock Review (3)
  • Superblocks are not perfect either
  • Compiler complexity
  • Code growth
  • Here comes the HYPERBLOCK!
  • Extend superblock to contain if-converted code
  • We did this in Homework 1
  • Three steps to formation
  • 1. Block selection
  • 2. Tail duplication
  • 3. If-conversion
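The three steps above end with if-conversion, which the slides assume familiarity with. A minimal Python model (ours, not from the slides) of what if-conversion does: both arms execute under predicates and a select merges the results, removing the branch.

```python
# Illustrative sketch: if-conversion replaces a branch with predicated
# operations. Both paths execute; a predicate selects the result.

def branchy(a, b, p):
    # Original control flow: only one side executes, guarded by a branch.
    if p:
        x = a + 1
    else:
        x = b * 2
    return x

def if_converted(a, b, p):
    # After if-conversion: no branch. Both operations execute, each
    # "guarded" by a predicate; the final select models a predicated move.
    t_true = a + 1                  # executes under predicate p
    t_false = b * 2                 # executes under predicate !p
    x = t_true if p else t_false    # hardware select of the live result
    return x

assert branchy(3, 4, True) == if_converted(3, 4, True)
assert branchy(3, 4, False) == if_converted(3, 4, False)
```

Both functions compute the same values; the if-converted form simply trades a branch for extra executed operations, which is exactly the cost/benefit the following slides analyze.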

7
Hyperblock Formation
8
Hyperblocks: A Silver Bullet?
  • Sounds like a good idea, right?
  • Increase potential for operation overlap
  • Remove branches
  • More aggressive compiler transforms
  • However, you cannot just if-convert everything.

9
Pitfalls in Hyperblock Formation
  • So, what can go wrong if we choose the wrong BBs
    for inclusion in a hyperblock?
  • Hazards
  • Function calls, unresolved memory stores, etc.
  • Forces compiler to choose a conservative
    optimization strategy.
  • Limits instruction reordering
  • Thus, including a BB with a hazard can lead to
    poorly optimized, poorly performing code

10
HB Pitfalls (Execution Frequency)
  • The execution frequency of combined BBs can
    impact performance. For example
  • The Good
    • BB1: small block, infrequently executed
    • BB2: large block, frequently executed
    • BB1 + BB2: large, frequently executed hyperblock
      where a majority of ops are executed
  • The Bad
    • BB1: small block, frequently executed
    • BB2: large block, infrequently executed
    • BB1 + BB2: large, frequently executed hyperblock
      with a preponderance of useless code
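The Good/Bad contrast above can be made concrete with a small calculation. The numbers here are hypothetical (not from the slides): they compute what fraction of the ops fetched per hyperblock execution are actually useful.

```python
# Hypothetical numbers showing why execution frequency matters when
# combining blocks: both blocks' ops are fetched every time the
# hyperblock runs, but a block's ops are "useful" only on executions
# where its path would actually have been taken.

def useful_fraction(ops1, freq1, ops2, freq2):
    total_fetched = ops1 + ops2
    expected_useful = freq1 * ops1 + freq2 * ops2
    return expected_useful / total_fetched

# "The Good": small infrequent block + large frequent block
good = useful_fraction(ops1=5, freq1=0.1, ops2=50, freq2=0.9)
# "The Bad": small frequent block + large infrequent block
bad = useful_fraction(ops1=5, freq1=0.9, ops2=50, freq2=0.1)
print(round(good, 2), round(bad, 2))
```

In the good case most fetched ops are useful; in the bad case the large, rarely-taken block dominates the fetch bandwidth with useless code.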

11
HB Pitfalls (Resource Utilization)
  • Resource Utilization
  • Required resources are additive across all BBs.
  • Combination of resource intensive BBs results in
    resource conflicts at schedule time
  • Could result in poorer performance than original
    code.

12
HB Pitfalls (Dependence Height)
  • Recall
  • Dependence height = schedule length with
    infinite resources
  • A hyperblock must execute all the instructions of
    its constituent paths.
  • Dependence height of a HB is the Max of the
    dependence height of all its paths
  • Thus, a small if block combined with a large else
    block can result in a significant slowdown since
    the large part is always executed.
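A quick numeric illustration of that slowdown (numbers are ours, not the slides'): with branches, each execution pays only its own path's dependence height; as one hyperblock, every execution pays the maximum.

```python
# Hypothetical if/else paths, heights in cycles with infinite resources.
if_height, else_height = 2, 10
p_if = 0.9                      # the short path is the common case

# With branches: average cost weights each path by its frequency
avg_branchy = p_if * if_height + (1 - p_if) * else_height

# As one hyperblock: dependence height is the max over all paths,
# paid on every execution
hb_height = max(if_height, else_height)

print(avg_branchy, hb_height)
```

Even ignoring branch-misprediction costs on the branchy side, the hyperblock here is several times slower, which is why a small hot "if" should not be merged with a large cold "else".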

13
HB Pitfalls (When are HBs formed?)
  • HBs can be formed in two places
  • At the beginning of the backend, before
    optimization
  • At the end of the backend, before/during
    scheduling
  • Before optimization
  • Less control flow to hinder aggressive compiler
    optimization
  • Compiler optimizations can have an unexpected
    impact on hyperblock formation decisions
  • Before/During Scheduling
  • Scheduler has more intimate knowledge of target
    processor
  • Can make resource-based formation decisions
  • Control flow can hinder aggressive compiler
    optimization prior to scheduling

14
Early HB Formation (1)
  • Subsequent optimizations can turn bad formation
    decisions into good ones
  • Two seemingly incompatible paths
  • Both blocks scheduled in 3 instructions after
    renaming optimization

15
Early HB Formation (2)
  • Unpredictable Resource Interference
  • Can't assume that resource usage is evenly
    distributed through a block
  • Could contain sections of parallel and sequential
    code
  • Parallel → high resource usage
  • Sequential → low resource usage
  • This partition of usage could mask the real
    resource usage of a block
  • Two blocks may have seemingly compatible resource
    utilizations, but if their parallel sections are
    overlapped, resource interference and poorer
    performance can result.

16
Early HB Formation (3)
  • Resource Interference
  • Seemingly compatible paths
  • Poorer performance due to resource interference

17
Traditional Predicated Execution
  • There was no decision
  • Apply if-conversion to entire innermost loops
  • Enable modulo scheduling
  • If-conversion is all or nothing
  • Numerical Applications
  • Need more flexible strategy

18
Hyperblock Decision factors
  • In light of potential drawbacks
  • Must create hyperblocks selectively
  • How to choose what to if-convert?
  • Factors
  • Execution frequency
  • Size
  • Resource usage
  • Dependence height
  • Number of instructions
  • Instruction Characteristics
  • Hazards
  • Factors alone do not force a decision
  • Priority-based

19
Less-obvious factor
  • Consider this code snippet
  • Dependence height of fall-through path: 6 cycles
  • Register renaming reduces height (7 → 8 and
    8 → 10)
  • Good heuristic should anticipate other
    optimizations and when they occur.

20
Block Selection Method 1 (Example)
  • Create a trace, the main path
  • Use a heuristic function to give priority to the
    main path
  • Compute priority for other BBs
  • Normalize against main path.
  • One such heuristic
    • bb_char: characteristic value of each BB
      (typically 1; hazardous instructions lower it)
    • K: constant representing the processor issue rate
  • Heuristic also used to influence node-splitting
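The slide names the ingredients (main-path normalization, bb_char, K) but not the full formula, which is in Mahlke et al. (1992). A hedged sketch of the general shape, where the way the terms are combined is our illustrative assumption, not the paper's definition:

```python
# A Method-1-style block priority sketch. bb_char is the per-block
# characteristic value (typically 1, lowered by hazardous instructions);
# the paper also folds in an issue-rate constant K, omitted here.
# The exact combination below is an assumption for illustration only.

def block_priority(weight_bb, weight_main, size_bb, size_main, bb_char=1.0):
    freq_ratio = weight_bb / weight_main        # hot relative to main path?
    size_ratio = min(1.0, size_main / size_bb)  # penalize oversized blocks
    return freq_ratio * size_ratio * bb_char

# The main path normalizes to 1.0; a hot side block containing a hazard
# (bb_char = 0.5) scores lower despite decent frequency.
main = block_priority(100, 100, 10, 10)
hazardous = block_priority(80, 100, 6, 10, bb_char=0.5)
print(main, hazardous)
```

Blocks whose priority falls below some threshold relative to the main path would be excluded from the hyperblock.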

21
Method 2
  • Enumerate all paths of execution
  • Give priority to path as a whole
  • Hazard multiplier in this paper was 0.25
    • Applied to paths containing a subroutine call or
      unresolvable memory store
  • K: base contribution for a path
    • K = 0.1 in this paper

22
Method 2 (continued)
  • Block Selection Algorithm
  • Resource constraints
  • Dependence height vs. Highest Priority Path
  • Path priority must be within some fraction of
    last path priority
  • Uses simplified model of resources
  • Issue Width x Dependence Height
  • Other padding multipliers
  • Use union of selected paths to form Hyperblock
  • Causes some lower priority paths to be included
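The selection procedure above can be sketched end to end. The 0.25 hazard multiplier and K = 0.1 come from the slide; the exact scoring shape and the 0.5 cutoff fraction are our assumptions for illustration.

```python
# Method-2-style path selection sketch: score whole paths, keep those
# within a fraction of the best, form the hyperblock from their union.

def path_priority(exec_ratio, has_hazard, K=0.1):
    # Hazard multiplier (0.25) applies to paths with a subroutine call
    # or unresolvable memory store; K is the base contribution.
    mult = 0.25 if has_hazard else 1.0
    return (exec_ratio + K) * mult

def select_paths(paths, fraction=0.5):
    scored = sorted(((path_priority(r, h), bbs) for bbs, r, h in paths),
                    key=lambda t: t[0], reverse=True)
    best = scored[0][0]
    picked = [bbs for pr, bbs in scored if pr >= fraction * best]
    # Hyperblock = union of blocks on all selected paths, so lower
    # priority paths sharing blocks get pulled in too.
    return sorted(set().union(*picked))

paths = [
    ({"BB1", "BB2", "BB4"}, 0.7, False),
    ({"BB1", "BB3", "BB4"}, 0.3, False),
    ({"BB1", "BB5"},        0.6, True),   # hazardous path
]
print(select_paths(paths))
```

Here the hazardous path is heavily discounted and excluded, while both hazard-free paths survive the cutoff and their blocks are merged.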

23
Method 3
  • Take into account other optimizations, resource
    interference, and partial paths.
  • Account for other optimizations
  • Create hyperblocks right before scheduling
  • Adds predicate complexity to scheduler
  • What's the happy medium?

24
Method 3 (cont'd)
  • Examine Hyperblocks twice
  • First stage
  • Aggressive
  • Form big blocks
  • Liberal with resources
  • Second stage Partial Reverse if-conversion

25
Method 3 (cont'd)
  • Partial Reverse if-conversion
  • Modifies hyperblock paths to enhance schedule
  • Inserts branch to removed code
  • Analysis
  • In-depth process
  • Predicate Flow Graph (PFG)
  • Determine savings with 2 HBs vs. 1 HB
  • Determine penalty from branch
  • Heuristic-guided
  • Predicate define schedule
  • Which reverse if-conversions provide benefit?

26
Hyperblock Performance Evaluation (1)
  • So, what do hyperblocks buy us?
  • For simple hyperblock formation, benchmark
    results depend on the issue rate of the processor
    (2, 4, or 8) and whether all execution paths are
    included (IP) or only selected paths are
    included
  • Performance gains are more common if the issue
    rate is > 2

Effective Compiler Support for Predicated
Execution Using the Hyperblock (1992), S.
Mahlke, et al
27
Hyperblock Performance Evaluation (2)
  • Using Partial Reverse If-Conversion, significant
    speedups were observed over standard superblock
    and hyperblock code
  • Note that in many cases, hyperblock performance
    losses became gains

A Framework for Balancing Control Flow and
Predication, D. August et al.
28
Hyperblock Code Growth
  • Static code size growth of simple HB formation
    and partial RIC formation versus superblocks
    varies with benchmarks
  • HBs reduce code size through tail-duplication
    elimination, but if-conversion introduces code
    growth

A Framework for Balancing Control Flow and
Predication, D. August et al.
29
Conclusions
  • Hyperblocks have been shown to improve
    performance when applied judiciously
  • There are several, often conflicting, constraints
    on hyperblock formation algorithms
  • Optimal hyperblock formation is a non-trivial
    problem
  • Several formation heuristics have been
    implemented with varying degrees of success.
  • How can we improve upon them?

30
Acknowledgements
  • Method 1
  • Effective Compiler Support for Predicated
    Execution Using the Hyperblock
  • Scott Mahlke, David Lin, William Chen, Richard
    Hank, Roger Bringmann
  • Method 2
  • Exploiting Instruction Level Parallelism in the
    Presence of Conditional Branches
  • Scott Mahlke
  • Method 3
  • A Framework for Balancing Control Flow and
    Predication
  • David August, Wen-mei Hwu, Scott Mahlke

31
Optimized Selection of Compiler Switches
  • Ibrahim Bashir
  • Group 1 Control Flow Analysis and Optimization
  • March 18, 2002

32
Overview of switch selection
  • Compilers have many different optimizations
    available
  • If-conversion
  • Common subexpression elimination
  • Instruction scheduling
  • Loop unrolling
  • Switches are used to turn ON/OFF each of these
    optimizations
  • How do we decide what combination of switches to
    select?

33
Issues involved
  • As the number of switches increases, the number
    of possible combinations increases exponentially
  • 10 switches → 2^10 = 1024 combinations
  • 20 switches, over a million combinations
  • Limited resources/time
  • Not feasible to try all combinations
  • There are different types of switches
  • ON/OFF (binary)
  • LOW/MEDIUM/HIGH
  • Range of values (0-100)
  • Switches are not necessarily independent of each
    other
  • There can be positive or negative interactions
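The exponential growth in the first bullet is easy to quantify. A quick back-of-the-envelope (the one-second-per-run figure is our assumption):

```python
# With n independent ON/OFF switches there are 2**n combinations,
# so exhaustive search stops being feasible very quickly.

def combinations(n_switches):
    return 2 ** n_switches

print(combinations(10))            # 1024
print(combinations(20))            # 1048576 -- over a million

# Even at one compile+run per second, 20 switches costs ~12 days:
seconds = combinations(20)
print(round(seconds / 86400, 1))   # 12.1 days
```

And this counts only binary switches; multi-valued and ranged switches make the space larger still, which motivates the sampling approaches below.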

34
Why this is useful
  • Currently, user has to rely on trial-and-error
  • Not the most efficient technique
  • Saves development time
  • Portable to other compilers

35
Automating the process
  • Paper 1: Automatic Recommendation of Compiler
    Options, by Elana Granston and Anne Holler
  • Describes a program called Dr. Options that
    recommends compiler switches
  • Dr. Options can recommend optimization options
    for the program as a whole or for each individual
    module
  • Allows optimization of hot modules, thus saving
    time

36
How it works
  • Uses 3 sources of information
  • 1. User-supplied information
  • Application type
  • Optimization goals
  • Constraints
  • 2. Compiler-supplied information
  • Function and module sizes
  • Characteristics of loops
  • Data access patterns
  • 3. Profile information
  • Determine relative importance of each function

37
Performance/Usefulness
  • Recommendations are substantially better than
    what an unassisted user would select
  • Provides explanations of why it's recommending
    particular options
  • Leads to a better understanding of the
    characteristics of the application
  • Good consultation tool for performance analysts

38
Feedback-Directed Selection
  • Paper 2: Feedback-Directed Selection and
    Characterization of Compiler Options, by Kingsum
    Chow and Youfeng Wu
  • Uses fractional factorial design (FFD) of
    experiments and feedback from run-time
    performance to optimize selection of switches
  • Identifies interaction (constructive or
    destructive) between different switches
  • Each experiment is a refinement of earlier ones
  • Finds the switches with the best chance of being
    part of the optimal selection

39
Ideas used
  • For a particular application, certain switches
    might have little or no effect on performance and
    interact with only a few others
  • Testing all permutations would be a waste of
    resources
  • FFD of experiments works as follows
  • Each compiler switch is a factor
  • Each subset of factors is an interaction between
    switches
  • Purpose of an experiment is to determine effects
    and interactions of factors

40
Ideas applied
  • Systematically select a subset of experimental
    runs
  • Aliasing
  • Ambiguity exists between the effects of switches
  • Performance of a particular combination can be
    the result of certain switches being ON or
    certain switches being OFF
  • Start with a small number of runs and collect
    performance results
  • Do further experiments using the most effective
    factors
  • More experiments can be done to resolve aliasing
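A minimal fractional-factorial sketch of the idea (our construction, not the paper's): a 2^(3-1) half fraction for three ON/OFF switches A, B, C using the generator C = A*B, so 4 runs instead of 8. Each main effect estimate is then aliased with a two-switch interaction, which is exactly the ambiguity the slide describes.

```python
import itertools

def half_fraction():
    # Full factorial over A and B at levels -1/+1 (OFF/ON);
    # C is determined by the generator C = A*B.
    return [(a, b, a * b) for a, b in itertools.product((-1, 1), repeat=2)]

def main_effects(design, results):
    # Effect of factor f = mean(result at +1) - mean(result at -1)
    n, k = len(design), len(design[0])
    return [sum(design[r][f] * results[r] for r in range(n)) / (n / 2)
            for f in range(k)]

design = half_fraction()
# Hypothetical runtimes (lower is better): switch A helps a lot,
# B a little, C not at all.
results = [10 - 3 * a - 1 * b for (a, b, c) in design]
effects = main_effects(design, results)
print([round(e, 1) for e in effects])
```

The estimates recover A's large effect, B's small one, and C's absence from only 4 runs; further (fold-over) runs would be needed to separate each main effect from its aliased interaction.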

41
Pros/Cons of FFD
  • Advantages
  • Number of combinations to consider doesn't
    increase exponentially
  • Significantly fewer runs can lead to a
    near-optimal solution
  • Identifies interactions between different
    optimizations
  • Can be used to determine interesting switches
    quickly
  • Effects of these switches can be further explored
    using artificial intelligence techniques (genetic
    algorithms)
  • Disadvantages
  • Needs an initial set of switches to work with
  • Requires user input/thinking
  • Currently only works with ON/OFF type switches

42
Evolutionary Algorithms for Reinforcement Learning
  • Written by David Moriarty, Alan Schultz, and John
    Grefenstette
  • Published in 1999
  • Good survey of research in evolutionary
    algorithms
  • Response to other surveys of reinforcement
    learning that did not include EA

43
Evolutionary Algorithms
  • Form of randomized search
  • Useful for very large search space
  • Search is biased toward better solutions
  • Based on Darwin's Theory of Evolution

44
Darwin's Theory of Evolution
  • More fit individuals are more likely to survive
    and reproduce
  • Offspring inherit traits of parents along with
    random mutations
  • Weaker individuals weeded out through natural
    selection
  • Population tends to get stronger over time

45
Basic Algorithm
  • Initialize population
  • Evaluate individuals
  • While termination condition not met
  • Select individuals
  • Reproduce offspring
  • Evaluate individuals
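The loop above, sketched as a tiny genetic algorithm over ON/OFF switch settings, tying back to the switch-selection problem. The fitness function is a stand-in (a real one would compile and time the application, as the FFD paper does); all constants here are illustrative.

```python
import random

N_SWITCHES, POP, GENS = 8, 20, 30
random.seed(0)   # deterministic for reproducibility

def fitness(ind):
    # Stand-in: pretend even-indexed switches help and odd ones hurt.
    return sum(b if i % 2 == 0 else -b for i, b in enumerate(ind))

def crossover(p1, p2):
    cut = random.randrange(1, N_SWITCHES)   # random crossover point
    return p1[:cut] + p2[cut:]              # head of one, tail of the other

def mutate(ind, rate=0.05):
    # Small random changes: flip each switch with low probability.
    return [1 - b if random.random() < rate else b for b in ind]

# Initialize population randomly
pop = [[random.randint(0, 1) for _ in range(N_SWITCHES)]
       for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                # keep the fittest half
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))
```

The population is biased toward better switch settings each generation; the later slides on problems (local maxima, slow fitness evaluation) and solutions apply directly to this loop.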

46
Initialize Population
  • Usually done randomly
  • Might help to start with individuals you already
    know are good
  • Variety is good

47
Evaluate Individuals
  • Each individual is tested with a fitness function
  • The fitness function estimates how well an
    individual will perform in the real environment
  • Better fitness function can lead to stronger
    individuals, but can take more time

48
Termination Condition
  • Not always easy to define
  • Can be domain dependent
  • Some possibilities
  • Performance criterion is met
  • A preset number of generations has passed
  • Ran out of time

49
Select Individuals
  • Based on evaluation from fitness function
  • Sometimes just pick n strongest individuals
  • Can also assign probability of survival based on
    evaluation, then pick randomly

50
Reproduce Offspring
  • Crossover
  • Randomly select crossover point
  • Offspring inherits genetic code before crossover
    point from one parent and after from the other
  • Mutation
  • Make small random changes to offspring

51
Possible Problems
  • No guarantee of optimal solution
  • Only searches part of search space
  • Might get stuck at local maximum
  • Can be slow
  • Might need lots of generations
  • Fitness function usually takes time

52
Possible Solutions
  • Immigration
  • Gender
  • Multiple chromosomes
  • Dominant/recessive genes

53
Distributed Representation
  • Break down the problem into subtasks
  • Each co-evolving population addresses different
    subtask
  • Combine individuals from different populations to
    form collaborations
  • Individuals from stronger collaborations get
    higher evaluations

54
Switch Optimization Ideas
  • Identify important switches
  • Identify independent switches for distributed
    representation
  • Start with crude (but fast) fitness function,
    keep refining it in later generations