1
Compiler Optimization Research in Embedded Systems
  • Santosh Pande
  • CERCS
  • College of Computing
  • Georgia Institute of Technology
  • E-mail: santosh@cc.gatech.edu
  • http://www.cc.gatech.edu/santosh

2
Embedded and Mobile Systems: Why Compiler
Optimizations?
  • Motivation: compilers can perform intricate analysis
    of program properties and fine-tune the requirements
    of programs when resources are limited or when
    they are shared over the network
  • Performance measures to be optimized: code size,
    speed, predictability, power consumption,
    mobility, etc.
  • Requirements arise from limited local memories,
    non-orthogonal instruction sets of embedded
    processors, limited communication bandwidth,
    real-time requirements, the need for mobility, etc.
  • Optimizations are essential under these
    constraints, unlike in traditional systems, where
    they may be optional.

3
Overview of Research Projects
  • Optimizing Java Mobile Programs Over Fixed and
    Wireless Networks
  • Tamper-resistant Program Partitioning for Mobile
    Codes on Smart Cards for Security and Efficiency
    (Infineon)
  • Optimizing Energy Consumption in High Performance
    Superscalars through Compiler Control (DARPA)
  • Efficient Code Generation and Memory
    Optimizations for High Performance Digital Signal
    Processors (DSPs) (NSF)
  • Dynamic optimizations using a combination of
    static and dynamic analyses
  • Mixed code generation for ARM/Thumb (NSF)

4
Compiler Research on DSPs
  • DSPs: the largest class of embedded processors in use
  • Key processor architecture features:
  • Instruction widths are small (16 bit) to promote code
    density; heavy use of address registers (Our work:
    Simultaneous optimization of layouts and
    program order, PLDI 99)
  • MAC (multiply-accumulate) instructions dominate!
    (Our work: Program restructuring for maximizing
    MACs, IEEE Trans. on CAD, 2001)
  • On-chip memory is precious (Resource-constrained
    loop fusion, IEEE Trans. on Computers, Oct 2002)
  • X-Y memory (Placement of values and parallel
    load/stores, PACT 2002)

5
A Framework for Parallelizing Load/Stores on
Embedded Processors
  • Xiaotong Zhuang
  • Santosh Pande
  • College of Computing, Georgia Tech

6
Background and Motivation
  • The speed gap between memory and CPU remains
  • Multi-bank memory architectures: Motorola DSP56000
    series, NEC 77016, SONY pDSP, Analog Devices
    ADSP-210x, Starcore SC140 processor core
  • Parallel instructions allow parallel access to
    memory banks: PLDXY r1, @a, r2, @b loads @a→r1
    and @b→r2 at the same time. Restrictions: not all
    registers are available, PSTST absent
  • Objectives:
  • Generate as many parallel Load/Store instructions
    (such as PLDXY) as possible through compiler
    optimizations.
  • Controlled growth of the code and data segments

7
Basic concepts (1)
  • Post-pass approach, assuming a good register
    allocator has been used: a Briggs-style
    allocator with an Appel/George coalescing phase
  • Value separation: independent values to be
    allocated in memory without copying
  • Alias analysis
  • Memory access instruction disambiguation
  • 95% of aliases can be uniquely determined in our
    benchmark programs
  • Memory access instructions
  • ST addr, r is the definition of a memory address
  • LD addr, r is the use of a memory address
  • Dependencies:
  • Address conflicts, register conflicts
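
The two kinds of dependencies can be sketched as simple predicates. This is a simplified illustration, assuming that aliases have already been resolved to symbolic address names, that LD writes its register and ST reads it, and that only memory instructions are considered. The MemInstr class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemInstr:
    op: str    # "LD" (use of a memory address) or "ST" (definition of one)
    addr: str  # symbolic address, assumed already disambiguated
    reg: str   # register written by LD or read by ST


def address_conflict(a: MemInstr, b: MemInstr) -> bool:
    """Same address and at least one store: the order must be preserved."""
    return a.addr == b.addr and "ST" in (a.op, b.op)


def register_conflict(a: MemInstr, b: MemInstr) -> bool:
    """Same register and at least one load, since a load overwrites it."""
    return a.reg == b.reg and "LD" in (a.op, b.op)


if __name__ == "__main__":
    st = MemInstr("ST", "x", "r1")
    ld = MemInstr("LD", "x", "r2")
    print(address_conflict(st, ld), register_conflict(st, ld))
```

Two loads of the same address, or two stores reading the same register, do not conflict under this model.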

8
Basic concepts (2)
  • Building Webs
  • Webs: maximal unions of du-chains. All variable
    defs/uses on a web MUST be allocated to the same
    memory location
  • A variable that appears in separate webs can be
    placed in different memory locations
  • Achieve value separation
  • Motion range determination
  • Defined as the interval between program points within
    which a Load/Store can be legally moved, constrained by
    dependencies
  • Load/Store instructions with overlapping range
    MAY be merged
  • Register/address dependencies resolved by
    predicating motion
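
Web construction can be sketched with a union-find structure: du-chains of one variable that share a definition or use end up in the same web. The input format (a mapping from each definition point to the set of uses it reaches) and the name build_webs are assumptions for this example, not the slides' data structures.

```python
from typing import Dict, List, Set


def build_webs(du_chains: Dict[str, Set[str]]) -> List[List[str]]:
    """Group the def/use points of one variable into maximal webs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for definition, uses in du_chains.items():
        find(definition)  # register definitions that have no uses
        for use in uses:
            union(definition, use)

    webs: Dict[str, Set[str]] = {}
    for point in list(parent):
        webs.setdefault(find(point), set()).add(point)
    return sorted(sorted(web) for web in webs.values())


if __name__ == "__main__":
    chains = {"d1": {"u1", "u2"}, "d2": {"u2"}, "d3": {"u3"}}
    print(build_webs(chains))
```

In the sample, d1 and d2 both reach u2 and therefore share a web, while d3 and u3 form a separate web that could be given a different memory location.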

9
Movable boundary problem
  • The motion boundary of one Load/Store instruction
    is also a Load/Store instruction
  • Assuming a fixed boundary will cause incorrect
    merges
  • Predicate the motion of one load/store on another
    and solve for a safe solution

10
Motion schedule graph
  • Pseudo fixed boundary
  • For a Store: move as early as possible, assuming
    other instructions are fixed
  • For a Load: move as late as possible, assuming other
    instructions are fixed
  • Motion Schedule Graph
  • Nodes represent individual Load/Store
    instructions
  • An oval shows a value to which load/stores belong
  • Edges link nodes that have overlapping motion
    ranges (with respect to pseudo fixed boundaries)
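
The edge rule can be sketched as an interval-overlap test. The sketch assumes each node's motion range has already been computed against pseudo fixed boundaries and is given as an inclusive pair of program points; the node names and the function build_msg are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def build_msg(ranges: Dict[str, Tuple[int, int]]) -> Set[Tuple[str, str]]:
    """Return an edge for every pair of nodes whose motion ranges overlap."""
    edges: Set[Tuple[str, str]] = set()
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(sorted(ranges.items()), 2):
        # Two closed intervals overlap iff each starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    sample = {"ld_a": (1, 4), "ld_b": (3, 6), "st_c": (7, 9)}
    print(build_msg(sample))
```

An edge only marks a candidate merge. Whether it can be used still depends on the bank assignment and on the other edges chosen.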

11
Graph solving
  • The whole problem is provably NP-complete
  • Two separate problems: Bank Assignment and Edge
    Picking
  • For predetermined bank assignments, the Edge
    Picking problem can be optimally solved in
    polynomial time
  • Heuristic algorithms
  • Brute-force search takes O(V^3 · 2^n) time.
    Feasible for small programs
  • Simulated annealing (SA) can approach the optimal
    solution but greatly increases compilation time
  • Use a heuristic to solve bank assignment, then get
    the optimal solution for Edge Picking
  • A greedy heuristic chooses the bank assignment
12
Post-pass phases

13
Cross-BB merge (instruction duplication)
  • Move to a predecessor/successor basic block (BB)
    to create new opportunities
  • To guarantee profitability:
  • Move to where the reference is live
  • Move ST along the extended basic block (EBB)
  • Move LD along the reverse EBB
  • Make sure the instruction can be combined if pushed
    to at least one of the live predecessors/successors

14
Variable duplication

15
Compilation time

16
Runtime performance

17
Code size comparison

18
Major contributions
  • Make the model easy to solve
  • Identify the movable boundary problem, which
    impedes the problem modeling and simplification
  • Propose the Motion Schedule Graph (MSG) and two
    approaches to solve it heuristically
  • Merge with instruction duplication and variable
    duplication
  • Other improvements, such as local conflict
    elimination through rematerialization, and some
    global optimization issues
  • An iterative approach that systematically grows
    the code segment and then the data segment
    minimally.

19
Conclusion
  • A framework to analyze and merge LD/STs.
  • Our heuristic approach comes close to exhaustive
    search with less compilation time.
  • Variable and instruction replication enlarge the
    range of motion of the instructions, so the
    generated code quality is superior to that of the
    exhaustive methods previously proposed
    (Corinna Lee et al., ASPLOS 1998)