Title: Retargetable Code Optimisation by Integer Linear Programming
1 Retargetable Code Optimisation by Integer Linear Programming
Daniel Kästner, kaestner_at_cs.uni-sb.de
Saarland University, Saarbrücken, Germany
2 Overview
- Motivation
- The Phase-Coupling Problem
- Reasons for the Postpass Orientation
- The Design of PROPAN
- Overview
- The Machine Description Language TDL
- Integer Linear Programming (ILP) Models for Phase-Coupled Code Generation
- ILP Modelling Styles and Hardware Design
- Experimental Results
3 Code Generation for Embedded Processors
4 Code Generation
- Instruction scheduling: list scheduling, trace scheduling [Fisher, 81], region scheduling [Gupta, Soffa, 90], percolation scheduling [Nicolau, 85].
- Register allocation: graph coloring [Chaitin et al., 81], [Chow, Hennessy, 90], [Briggs, 92]; probabilistic register allocation [Fisher, Proebsting, 92].
- Interdependence of code generation phases → suboptimal combination of suboptimal results → inefficient code.
5 Phase-Coupled Code Generation
- Heuristic methods: MARIL [Bradlee, 91], mutation scheduling [Nicolau, Novack, 94].
- Exact search-based approaches: AVIV (code selection and instruction scheduling) [Hanono, Devadas, 98], ICG (code selection and register allocation) [Bashford, Leupers, 99].
6 Postpass Optimisations
- Use of retargetable compilers in industry is still rare.
- One reason: costs induced by changing the compiler infrastructure.
- The postpass approach: [Figure: Assembly Programs → Postpass Optimiser → Improved Assembly Program]
- Related work: vpo [Davidson, Benitez, 94].
- PROPAN: phase-coupling of instruction scheduling, register assignment, and resource allocation.
7 The PROPAN System
8 TDL: Target Description Language
- Large number of existing hardware description languages: VHDL, Verilog, MIMOLA [Nowak, 87], nML [Fauth, Van Praet, Freericks, 95], SALTO [Bodin, Chamski, Rohou, Seznec, 97], ISDL [Hadjiyiannis, 98], EXPRESSION [Halambi, Grun, Ganesh, Khare, Dutt, Nicolau, 99], CSDL [Davidson, Ramsey, 98].
- Requirements of PROPAN:
  - Generating a parser for the specified assembly language.
  - Easy extendability: different views on the instruction set → wide range of target architectures and target applications.
  - Specification of irregular hardware properties in a way that supports
    - generic incorporation into ILP-based optimisations
    - generic program analyses
9 TDL Descriptions
- Resource section: declaration of the relevant hardware resources with their properties.
- Instruction set section: definition of the instruction set in the form of an attribute grammar.
- Constraint section: logical constraints that have to be respected to preserve correctness during code transformations → support for architectural irregularities.
- Assembly section: syntactic details of the assembly language.
10 Styles of ILP Models
- Time-indexed formulations: decision variables are based on the points of time the modelled events are assigned to.
- Order-indexed formulations: decision variables reflect the ordering of the modelled events.
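As a rough illustration of the two styles, the sketch below enumerates the index sets of the binary decision variables for a toy instance. The operation names, the horizon `T`, and the exact shape of the variables are assumptions made for illustration; they are not taken from PROPAN or from the formulations discussed on the following slides.

```python
from itertools import permutations, product

ops = ["load", "mul", "add", "store"]   # hypothetical operations
T = 6                                   # assumed upper bound on control steps

# Time-indexed style: one binary variable per (operation, control step);
# x[j, t] = 1 would mean "operation j starts in step t".
time_indexed = list(product(ops, range(1, T + 1)))

# Order-indexed style: one binary variable per ordered pair of distinct
# operations; y[i, j] = 1 would mean "i is ordered before j".
order_indexed = list(permutations(ops, 2))

print(len(time_indexed))   # 24
print(len(order_indexed))  # 12
```

In this toy setting the number of time-indexed variables grows with the horizon `T`, while the number of order-indexed variables depends only on the number of operations.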
11 Code Generation by Integer Linear Programming
- Well-structured ILP formulations:
  - SILP (Scheduling and Allocating with Integer Linear Programming) [Zhang, 96]: order-based.
  - OASIC (Optimal Architectural Synthesis with Interface Constraints) [Gebotys, Elmasry, 93]: time-based.
12 The SILP Formulation (1)
- Order-based formulation.
- The main decision variables describe the flow of the hardware resources through the operations of the program.
- The resource flow graph
13 The SILP Formulation (2)
Data Dependences
Flow Modelling
- Number of constraints: O(n^2)
- Number of variables: O(n^2)
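To make the order-based idea concrete, the following sketch derives start times from a fixed resource flow. It is a simplified illustration, not the SILP formulation itself: it assumes that a flow edge (i, j) means "the resource instance used by i is passed on to j", that an operation occupies its resource for its whole latency, and that all names and latencies are hypothetical. Under these assumptions the earliest start times are longest paths over the union of dependence and flow edges.

```python
from collections import defaultdict

def earliest_starts(latency, dep_edges, flow_edges):
    """Earliest start per operation over dependence and resource-flow edges."""
    succs = defaultdict(list)
    indeg = {op: 0 for op in latency}
    for i, j in list(dep_edges) + list(flow_edges):
        succs[i].append(j)
        indeg[j] += 1
    start = {op: 0 for op in latency}
    ready = [op for op in latency if indeg[op] == 0]
    done = 0
    while ready:                       # topological traversal
        i = ready.pop()
        done += 1
        for j in succs[i]:
            # j may start only after i has finished (data or resource reasons)
            start[j] = max(start[j], start[i] + latency[i])
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    if done != len(latency):
        raise ValueError("dependence and flow edges form a cycle")
    return start

latency = {"a": 1, "b": 2, "c": 1, "d": 1}        # hypothetical operations
dep_edges = [("a", "c"), ("b", "d")]              # data dependences
flow_edges = [("a", "b"), ("c", "d")]             # a,b share one unit; c,d another
start = earliest_starts(latency, dep_edges, flow_edges)
makespan = max(start[op] + latency[op] for op in latency)
print(start, makespan)
```

An ILP solver working on an order-based model would additionally choose the flow edges; here they are given, so only the timing part is shown.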
14 The OASIC Formulation (1)
- Time-based formulation.
- The main decision variables describe the assignment of an operation's starting time to a control step and a functional unit type.
- Main decision variables: $x_{jn}^{k} \in \{0,1\}$, where $x_{jn}^{k} = 1$ means that the execution of operation j is started in control step n on an instance of functional unit type k.
15 The OASIC Formulation (2)
- Precedence Constraints
- Assignment Constraints
- Resource Constraints
- Number of constraints: O(n^3)
- Number of variables: O(n^2)
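The three constraint classes can be sketched on a toy time-indexed model. This is a simplified illustration under stated assumptions, not the OASIC formulation: each operation has exactly one unit type (so the variables are written `x[j, n]`), units are assumed to be pipelined (a unit is only blocked in the step where an operation starts), and the operations, latencies, and horizon `T` are hypothetical. Instead of calling an ILP solver, the sketch enumerates all 0/1 assignments with one start per operation.

```python
from itertools import product

ops = {"m1": "mul", "m2": "mul", "a1": "alu"}   # operation -> unit type
latency = {"m1": 2, "m2": 2, "a1": 1}
deps = [("m1", "a1"), ("m2", "a1")]             # a1 consumes both products
units = {"mul": 1, "alu": 1}                    # instances per unit type
T = 6                                           # control steps 1..T

def feasible(x):
    # Assignment: every operation starts in exactly one control step.
    if any(sum(x[j, n] for n in range(1, T + 1)) != 1 for j in ops):
        return False
    start = {j: sum(n * x[j, n] for n in range(1, T + 1)) for j in ops}
    # Precedence: a consumer starts only after its producer has finished.
    if any(start[j] < start[i] + latency[i] for i, j in deps):
        return False
    # Resources: per step, no more starts of a type than instances exist.
    for n in range(1, T + 1):
        for k, count in units.items():
            if sum(x[j, n] for j in ops if ops[j] == k) > count:
                return False
    return True

def solve():
    best = None
    for starts in product(range(1, T + 1), repeat=len(ops)):
        x = {(j, n): int(n == s)
             for j, s in zip(ops, starts) for n in range(1, T + 1)}
        if feasible(x):
            finish = max(s + latency[j] - 1 for j, s in zip(ops, starts))
            if best is None or finish < best[0]:
                best = (finish, dict(zip(ops, starts)))
    return best

print(solve())
```

With a single multiplier the two multiplications must start in different steps, so the addition cannot start before step 4 in this toy instance.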
16 The Analog Devices ADSP-2106x SHARC
- Restricted Parallelism between ALU and
multiplier.
R1 = R1 * R4, R2 = R8 + R12
17 Incorporating Architectural Irregularities
- Assumption: all specified resource types can work in parallel.
- Restrictions of instruction-level parallelism and of resource usage can be specified in the constraint section of the TDL specification with the help of logical formulae.
- The logical constraints are transformed into integer linear constraints → architectural irregularities are fully incorporated in the generated integer linear programs.
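One standard way to turn a logical implication over 0/1 variables into an integer linear constraint is sketched below; whether PROPAN uses exactly this transformation is not stated here, so it should be read as an assumption. The implication (a and b) -> c becomes a + b - c <= 1, and the loop checks the equivalence on the full truth table.

```python
from itertools import product

def implication_holds(a, b, c):
    """Logical form: (a and b) -> c."""
    return (not (a and b)) or bool(c)

def linear_holds(a, b, c):
    """Integer linear form over 0/1 variables: a + b - c <= 1."""
    return a + b - c <= 1

# Both forms accept and reject exactly the same 0/1 assignments.
for a, b, c in product((0, 1), repeat=3):
    assert implication_holds(a, b, c) == linear_holds(a, b, c)
```

A conjunction on the right-hand side, (a and b) -> (c1 and c2), can be handled by emitting one such inequality per conjunct.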
18 Example
- Constraint for the restricted parallelism of ALU and multiplier in the Analog Devices SHARC:
    (op1 in MulOps & op2 in AluOps)
    & (op1 && op2) -> ((op1.src1 in groupA)
                       & (op1.src2 in groupB)
                       & (op2.src1 in groupC)
                       & (op2.src2 in groupD))
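Read as an implication, the constraint above says: if a multiplier operation and an ALU operation are issued in parallel, their source registers must come from four specific register groups. The sketch below checks this for a pair of operations. The contents of the groups (R0-R3, R4-R7, R8-R11, R12-R15), the operation kinds, and the `Op` type are assumptions chosen for illustration, not taken from the TDL description.

```python
from dataclasses import dataclass

MUL_OPS = {"mul"}                              # hypothetical operation classes
ALU_OPS = {"add", "sub"}
GROUP_A = {"R0", "R1", "R2", "R3"}             # assumed register groups
GROUP_B = {"R4", "R5", "R6", "R7"}
GROUP_C = {"R8", "R9", "R10", "R11"}
GROUP_D = {"R12", "R13", "R14", "R15"}

@dataclass(frozen=True)
class Op:
    kind: str
    dst: str
    src1: str
    src2: str

def constraint_satisfied(op1, op2, parallel):
    """Evaluate premise -> conclusion for one (op1, op2) pair."""
    premise = op1.kind in MUL_OPS and op2.kind in ALU_OPS and parallel
    conclusion = (op1.src1 in GROUP_A and op1.src2 in GROUP_B
                  and op2.src1 in GROUP_C and op2.src2 in GROUP_D)
    return (not premise) or conclusion

mul = Op("mul", "R1", "R1", "R4")
add = Op("add", "R2", "R8", "R12")
bad_mul = Op("mul", "R1", "R5", "R4")          # src1 outside group A

print(constraint_satisfied(mul, add, parallel=True))       # True
print(constraint_satisfied(bad_mul, add, parallel=True))   # False
print(constraint_satisfied(bad_mul, add, parallel=False))  # True
```

The last call shows that the constraint only restricts parallel issue; the same two operations remain legal when scheduled in different instructions.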
19 Superblock-Based Code Optimisation
- The formulations presented so far work for straight-line code.
- Superblock concept: each superblock can comprise several basic blocks and be extended across loop boundaries.
- Reason: integration of instruction scheduling, register assignment and resource allocation.
20 ILP-based Approximations
- Integer linear programming is NP-complete → computing provably optimal solutions may take a long time → ILP-based approximations.
- Basic idea:
  - Iteratively solve partial relaxations of the original problem (some integer variables may take non-integral values).
  - In each iteration, fix some variables with an integer value to their current value.
- → Computation time can be significantly reduced, yet high-quality solutions are obtained.
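The basic idea can be sketched with a generic relax-and-fix loop on a small 0/1 knapsack problem. This is one common variant chosen for illustration and is not claimed to be the approximation scheme used in PROPAN: in each iteration one block of variables is kept binary (enumerated exhaustively), earlier blocks stay fixed, and all later variables are relaxed to the range [0, 1], where the knapsack relaxation can be solved greedily. The items, capacity, and block size are hypothetical.

```python
from itertools import product

def relaxed_value(items, capacity):
    """Optimal value of the LP relaxation (0 <= x <= 1), via ratio greedy."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(1.0, capacity / weight)
        total += take * value
        capacity -= take * weight
        if capacity <= 0:
            break
    return total

def relax_and_fix(items, capacity, block_size=2):
    """items: list of (value, weight) with weight > 0. Returns a 0/1 list."""
    fixed = []
    for lo in range(0, len(items), block_size):
        block = items[lo:lo + block_size]      # binary in this iteration
        rest = items[lo + block_size:]         # relaxed in this iteration
        used = sum(w for (v, w), x in zip(items, fixed) if x)
        best = None
        for choice in product((0, 1), repeat=len(block)):
            weight = used + sum(w for (v, w), x in zip(block, choice) if x)
            if weight > capacity:
                continue
            score = (sum(v for (v, w), x in zip(block, choice) if x)
                     + relaxed_value(rest, capacity - weight))
            if best is None or score > best[0]:
                best = (score, choice)
        fixed.extend(best[1])                  # fix the block's integer values
    return fixed

items = [(60, 10), (100, 20), (120, 30), (40, 15)]   # (value, weight)
solution = relax_and_fix(items, capacity=50)
value = sum(v for (v, w), x in zip(items, solution) if x)
print(solution, value)
```

On this instance the heuristic reaches a value of 200, while the exact optimum is 220 (second and third item), which illustrates the trade-off: each iteration solves a much smaller problem, but the result is not guaranteed to be optimal.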
21 Modelled Processors (TDL)
- Analog Devices ADSP-2106x SHARC.
- Philips TriMedia TM1000.
- Infineon TriCore: use of PROPAN in a framework for calculating worst-case execution time guarantees for real-time systems.
- Infineon C16x: use of PROPAN as a starting point for hardware-sensitive code optimisations; part of a commercial postpass optimiser.
- Under construction: TI C6x, Intel Pentium.
22 Experimental Results (SHARC)
23 Experimental Results (SHARC)
24 The Philips TriMedia TM1000
- Multimedia processor with VLIW architecture.
- 128 homogeneous general-purpose registers.
- Problem of issue slot assignment:
  - At most 5 operations can be started simultaneously in each instruction word.
  - The assignment of operations to issue slots is restricted.
- Synchronisation of operations with respect to the write-back bus:
  - At most 5 operations may write their result simultaneously on the bus (explicit synchronisation required due to different operation execution times).
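The issue slot problem can be illustrated with a small feasibility check: given the set of slots each operation is allowed to use, decide whether all operations of one instruction word can be placed in distinct slots. The backtracking search below is only a sketch; the operation names and the per-operation slot sets are hypothetical and do not reproduce the actual TM1000 restrictions.

```python
def assign_slots(allowed, num_slots=5):
    """Return {operation: slot} with distinct slots, or None if impossible."""
    ops = sorted(allowed, key=lambda op: len(allowed[op]))  # most constrained first
    assignment = {}

    def place(idx):
        if idx == len(ops):
            return True
        op = ops[idx]
        for slot in sorted(allowed[op]):
            if 1 <= slot <= num_slots and slot not in assignment.values():
                assignment[op] = slot
                if place(idx + 1):
                    return True
                del assignment[op]             # undo and try the next slot
        return False

    return dict(assignment) if place(0) else None

# Hypothetical restrictions: which issue slots each operation may occupy.
allowed = {
    "ld1": {4, 5},
    "ld2": {4, 5},
    "mul": {2, 3},
    "add": {1, 2, 3, 4, 5},
    "shift": {1, 2},
}
print(assign_slots(allowed))

# Three operations competing for two slots cannot share one instruction word.
print(assign_slots({"ld1": {4, 5}, "ld2": {4, 5}, "ld3": {4, 5}}))  # None
```

In an ILP model the same restriction would appear as constraints over binary operation-to-slot variables rather than as an explicit search.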
25 Experimental Results (TM1000)
26 Experimental Results (TM1000)
27 ILP Models and Hardware Architectures
- Order-indexed formulations (SILP):
  - Efficient modelling of irregular architectures where resource competition is high.
  - Efficient incorporation of the register assignment problem.
- For architectures with a large number of alternative resource types available for the execution of each operation, the increasing number of alternative resource flows leads to a decrease in solution efficiency in the order-based formulation → time-indexed formulation (OASIC) better suited.
28 Conclusion
- PROPAN: retargetable framework for the generation of phase-coupled postpass optimisers, especially for irregular architectures.
- Optimisers are based on integer linear programming and are generated from the TDL specification of the target machine.
- Application of ILP-based optimisations to two contemporary processors with considerably different hardware characteristics.
- With ILP-based approximations, computation time can be significantly reduced while still obtaining a very high solution quality.