Title: Processor Modelling and Retargetable Compilation
1Processor Modelling and Retargetable Compilation
2Outline
- Introduction to Retargetable Compilers
- Processor Modelling (nML)
- CHESS
- Intermediate representation
- Code Selection process
- Compilation flow
3Evolution of Compilers
- 70s: Microprogramming, CISC
  - Code generation (dynamic programming)
  - Code selection (LR parsing)
  - Code selection (combiner)
- 80s: High-level synthesis (ASIC); RISC, VLIW, superscalar
  - Register allocation (coloring)
  - Code selection (tree automata)
  - Scheduling (trace)
  - Scheduling (S/W pipelining)
- 90s: Embedded processors
  - Models for retargetability
  - Phase coupling
  - Register allocation (heterogeneous registers)
4Why Retargetable compilation?
- DSP-oriented applications are increasing
- Embedded processors
- Architecture of embedded processors
  - subject to regular changes
- Program developers
  - need compiler support for varying target processor architectures
- Conventional compilers
  - back end has to be rewritten
  - tedious
5Conventional Compilers
[Figure: conventional compiler flow]
Application in HLL -> Front End (syntactic and semantic checks) -> IR (refinement, processor-independent optimizations) -> Code Generator (code selection, register allocation, scheduling) -> Machine code
- Knowledge of the target architecture is built into the Code Generator
6Retargetable Compiler
[Figure: retargetable compiler flow]
Processor specification -> Front end -> IR
Application in HLL -> Front end -> IR
IR(s) (refinement, processor-independent optimizations) -> Code Generator (code selection, register allocation, scheduling) -> Machine code
- Knowledge of the target architecture is specified explicitly
7Retargetability
- Types
  - depend on the amount of work needed to accommodate a new target processor
- Developer retargetable
  - rewriting the back end
- User retargetable
  - writing compiler-specific processor models
  - uses the aid of compiler-compilers
  - Eg GCC, LCC
- Automatically retargetable
  - independent processor specification
  - specification at the level of the programmer's manual
  - Eg CHESS, MIMOLA
8The CHESS Environment
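[Figure: the CHESS environment]
- Inputs
  - C (nDL): primitive datatypes and operations, application algorithms
  - nML processor description: instruction set, high-level structure
- Application side: C (nDL) -> Front end -> High-level optimization -> CDFG
- Processor side: nML -> Front end -> ISG
- CDFG, ISG and LIB -> Code selection -> Register allocation -> Scheduling -> Machine code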
9Processor Modelling
- Basic features of the processor
  - registers
  - data path (connectivity)
  - instruction set (execution behaviour)
- For good-quality code, all architectural peculiarities must be modelled
  - heterogeneous register structure
  - addressing modes
- Specified at a high level of abstraction
  - language and associated grammar
- Processor modelling languages
  - nML, ISPS, ISDL, LISA
10nML
- Specifies the syntax and semantics of the instruction set
- 2 main parts
  - Declarations (structural skeleton)
    - H/W entities of the target processor (storage)
  - Grammar
    - instruction set
    - execution behaviour
    - topology (datapath connectivity)
- State machine
  - state: the values in storage
  - transition function: instruction execution
11nML (Declarations)
- Define the structural skeleton by defining the connection points
- All storage elements are declared globally
- 2 types of storage elements
  - Static storage
  - Transitory storage
12Static Storage
- Defn: elements that store a value for one or more machine cycles, until explicitly overwritten
- Components
  - Memories
  - Controllable registers
- Capacity of storage is also specified
- Eg
  - Memory
    - mem DM[1024]<num>
  - Registers
    - reg AX<num>       // ALU reg
    - reg Ia[2]<addr>   // address reg
13Transitory Storage
- Defn: elements that pass a value on with a certain delay, specified in machine cycles
- Components
  - Buses
  - Nets
  - Pipeline registers
- Capacity is one, and the value can be read once
- Eg
  - trn A<num>        // ALU input
  - trn XD<num>       // data bus
  - trn T<num> d 1    // delay of 1 machine cycle
- Memory and register ports are also specified as transitories, to identify H/W conflicts
  - Eg reg AX<num> read(AX_rA, AX_rB) write(AX_w)
14Other declarations
- Record-type storage
  - Eg accumulator of a fixed-point DSP
  - Record data type: class acc { public: num w0; num w1; }
  - Storage element: reg MR<acc> MR0 MR1
- Functional units can also be modelled
  - fu alu
  - fu mult
- Hardwired constants
  - cst C_3<fact> xxx
  - cst one_8<num> 00000001
15nML (Grammar)
- Instruction set and behaviour description
- The instruction set is analysed and its structure is captured in production rules (grammar)
- The topology (connectivity) of the datapath is captured by grammar attributes
- Production rules
  - OR-rules
    - list all alternatives for an instruction part
    - mutually exclusive
    - opn jvp_core (arith_ls_instr | control_inst | direct_mv)
  - AND-rules
    - composition of instruction parts
    - orthogonal
    - opn arith_ls_instr (ar : arith_instr, ls : indirect_mv)
16- Each possible derivation from these rules represents a legal instruction
- The structure (hierarchy) in the instruction set is captured by the production rules
[Figure: derivation tree of the instruction set. Root Jvp_core is an OR_rule with alternatives Arith_ls_instr, Control_instr and Direct_move; Arith_ls_instr is an AND_rule composed of Arith_instr and Indirect_mov; further nodes shown are Dir_store, Reg_mov, Dir_ld, ...]
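The statement that each derivation is a legal instruction can be made concrete with a small sketch. The rule set below is a hypothetical toy: the names `jvp_core`, `arith_ls_instr`, `arith_instr` and `indirect_mv` come from the slides, while the alternatives `add`, `sub`, `ld`, `st` and the treatment of `control_inst` and `direct_mv` as terminals are assumptions made only for illustration.

```python
from itertools import product

# Hypothetical toy grammar (assumption, not the real processor description).
# An OR-rule lists mutually exclusive alternatives;
# an AND-rule composes orthogonal instruction parts.
RULES = {
    "jvp_core":       ("or",  ["arith_ls_instr", "control_inst", "direct_mv"]),
    "arith_ls_instr": ("and", ["arith_instr", "indirect_mv"]),
    "arith_instr":    ("or",  ["add", "sub"]),
    "indirect_mv":    ("or",  ["ld", "st"]),
}

def derivations(symbol):
    """Return every derivation of `symbol` as a tuple of terminal parts."""
    if symbol not in RULES:              # terminal instruction part
        return [(symbol,)]
    kind, parts = RULES[symbol]
    if kind == "or":                     # union of the alternatives
        return [d for p in parts for d in derivations(p)]
    # AND-rule: Cartesian product of the derivations of its parts
    combos = product(*(derivations(p) for p in parts))
    return [sum(c, ()) for c in combos]

legal = derivations("jvp_core")
for instr in legal:
    print(" ".join(instr))
```

Under these assumptions the AND-rule contributes 2 x 2 combinations and the two remaining OR alternatives one each, so six instructions are printed.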
17Grammar Attributes
- OR-rules just pass attributes
- AND-rules define 4 types of attributes
  - Action attribute
    - specifies what is executed by the instruction / instruction part
    - each AND-rule has one action attribute
  - Syntax attribute
    - specifies the assembler syntax (mnemonic)
    - each AND-rule may have multiple syntax attributes
  - Image attribute
    - defines the binary encoding of the instruction / instruction part
  - Value and mode attributes
    - specify how a storage element is addressed
18 Eg: instruction part performing an immediate shift
  opn shift_instr (al : alu_left_op, factor : c_3)
    action
      A = al.value              // read the operand
      C = pass(A, AS) @alu      // pass it thru ALU
      AR = C << factor @sh      // perform shift
    image 11 al.image factor
Operation types that can be executed on the ALU are best specified using switch statements
  opn alu_op (op : alu)
    action
      switch (op)
        case add: C = add(A, B, AS) @alu
        case sub: C = sub(A, B, AS) @alu
        . .
    image 0 op
19Control Instructions
- Modelled using switch statements
- Action attributes contain primitive operation types that model the controller
  opn cond_jump (t : c_10, c : cond)
    action
      switch (c)
        case EQ:
          tC = eq(AS)
          jump(tC, t)
        case GT:
          tC = ge(AS)
          jump(tC, t)
        . .
    image 100xxx c t
21Instruction Set Graph (ISG)
- Intermediate processor model
- Directed bipartite graph
  - $G_{ISG} = \langle V_{ISG}, E_{ISG} \rangle$
- Vertices: $V_{ISG} = V_S \cup V_I$
  - $V_S$: storage elements
  - $V_I$: operation types
- Edges: $E_{ISG} \subseteq (V_S \times V_I) \cup (V_I \times V_S)$
  - connectivity
  - data flow
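A minimal Python sketch of this bipartite structure, assuming a hypothetical `ISG` class (the class, its methods and the node names used in the example are illustrative only and are not part of CHESS):

```python
from dataclasses import dataclass, field

@dataclass
class ISG:
    """Hypothetical container for an instruction set graph."""
    storages: set = field(default_factory=set)     # V_S
    operations: set = field(default_factory=set)   # V_I
    edges: set = field(default_factory=set)        # subset of (V_S x V_I) U (V_I x V_S)

    def add_operation(self, op, inputs, output):
        """Add operation type `op` reading `inputs` and writing `output`."""
        self.operations.add(op)
        for s in inputs:
            self.storages.add(s)
            self.edges.add((s, op))        # storage -> operation
        self.storages.add(output)
        self.edges.add((op, output))       # operation -> storage

    def is_bipartite(self):
        """Check that V_S and V_I are disjoint and every edge joins the two sets."""
        return self.storages.isdisjoint(self.operations) and all(
            (a in self.storages and b in self.operations)
            or (a in self.operations and b in self.storages)
            for a, b in self.edges
        )

isg = ISG()
isg.add_operation("copy_A", ["AX_r"], "A")
isg.add_operation("copy_B", ["AR_r"], "B")
isg.add_operation("and", ["A", "B"], "C")
print(isg.is_bipartite())
```

Storing the edges as ordered pairs keeps the data-flow direction, which the later definitions of direct and allocated paths rely on.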
22Partial ISG of a processor
[Figure: partial ISG of a processor]
- Static storage: AX(num), AR(num), MR0(num), MR1(num)
- Transitory storage: AX_r(num), AR_r(num), A(num), B(num), C, AS_w
- Operation types, with their instruction encodings
  - read_reg (00xxx0xx, 01100xxx): register to read port AX_r or AR_r
  - copy (00xxx0xx, 01100xxx): AX_r to A
  - copy (00xxx1xx): AR_r to B
  - and (00010xxx): A, B to C
23ISG (contd)
- ISG operation types
  - Defn: a primitive processor activity with a fixed number of ordered input arguments and an output
  - each argument is connected by one edge to one storage element
  - implementations of the primitive operation types are defined in a header file
- Enabling conditions
  - each instruction of the processor executes many operation types
  - one operation type can be enabled by many instructions
  - Defn: the set of all instructions that enable an operation type i
    - enabling(i)
24Conflicts
- Encoding conflicts
- H/W or resource conflicts
- Encoding conflicts
  - Defn: for a subset of ISG operation types $V_{Io} \subseteq V_I$
    - $enabling(V_{Io}) = \bigcap_{i \in V_{Io}} enabling(i)$
  - $V_{Io}$ has an encoding conflict if $enabling(V_{Io}) = \emptyset$
  - Used when packing 2 operation types into one instruction
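The encoding-conflict test is a set intersection over enabling sets. A minimal sketch, assuming a hypothetical table `ENABLING` whose operation and instruction names are invented for illustration:

```python
from functools import reduce

# Hypothetical enabling sets: operation type -> instructions that enable it.
ENABLING = {
    "add":   {"i1", "i2"},
    "shift": {"i2", "i3"},
    "jump":  {"i4"},
}

def enabling(ops):
    """Instructions that enable every operation type in the non-empty set `ops`."""
    return reduce(set.intersection, (ENABLING[o] for o in ops))

def has_encoding_conflict(ops):
    """True when no single instruction enables all operation types in `ops`."""
    return not enabling(ops)

print(enabling({"add", "shift"}))
print(has_encoding_conflict({"add", "jump"}))
```

Here `add` and `shift` share instruction `i2` and can be packed together, while `add` and `jump` have no common enabling instruction.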
25- Resource (H/W) conflicts
  - several operation types contend for the same resource
  - $input(i, n): V_I \times \mathbb{N} \to V_S$
  - $output(i, n): V_I \times \mathbb{N} \to V_S$
  - read and write ports are transitories
  - a H/W conflict is modelled as an access conflict on transitories
- To check H/W conflicts
  - $resources(i)$: set of all transitories that operation $i$ accesses
  - for $V_{Io} \subseteq V_I$: if for all $i_i, i_j \in V_{Io}$ with $i_i \neq i_j$
  - $resources(i_i) \cap resources(i_j) = \emptyset$
  - then $V_{Io}$ is free of H/W conflicts
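The pairwise-disjointness condition can be sketched as follows. The table `RESOURCES` and its operation names are hypothetical; only the transitory names XD, A, B and C appear in the slides.

```python
from itertools import combinations

# Hypothetical resource sets: operation type -> transitories it accesses.
RESOURCES = {
    "load":  {"XD"},
    "store": {"XD"},
    "add":   {"A", "B", "C"},
}

def free_of_hw_conflicts(ops):
    """True if no two distinct operation types in `ops` access a common transitory."""
    return all(
        RESOURCES[a].isdisjoint(RESOURCES[b])
        for a, b in combinations(ops, 2)
    )

print(free_of_hw_conflicts({"load", "add"}))
print(free_of_hw_conflicts({"load", "store"}))
```

In this toy table `load` and `store` both need the data bus XD, so they cannot be packed into the same instruction.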
26nML to ISG front end
- nML is parsed into a parse tree
- The parse tree is passed through 3 passes
  - pass 1: finding the instruction word length and locating the position of each image attribute in the instruction
  - pass 2: finding enabling conditions and all specified instructions
  - pass 3: finding the exact enabling conditions, using the set of instructions found in pass 2
27CDFG (IR for application)
- Similar to ISG
- Directed bipartite graph
  - $G_{CDFG} = \langle V_{CDFG}, E_{CDFG} \rangle$
- Vertices: $V_{CDFG} = V_O \cup V_V$
  - $V_O$: operations in the application
  - $V_V$: values that operations produce/consume
- Edges: $E_{CDFG} \subseteq (V_O \times V_V) \cup (V_V \times V_O)$
  - represent data flow from operations through values
- Control flow is modelled by imposing a hierarchy of macronodes on the CDFG operations
  - macronodes have a type
    - basic block, if-stat, for-stat, do-stat
28Eg of CDFG
[Figure: example CDFG]
- Data flow of (a(bc))-((bc)d), with values a, b, c, d, t1, t2, t3, t4 and x
- Control flow of a do-while loop, with macronodes root, Block (init), Do-stat, Block (loop-init), If-stat, Block (then), Block (else) and Block (loop-end)
29- CDFG operation types
  - operation types used in the application
  - could be hierarchical
  - different from the ISG operation types
- All operation types, of both applications and processors, are declared in a header file
- The operation types are linked by a library (LIB), which defines the operation hierarchy
30Operation-Type hierarchy (LIB)
- The LIB contains 3 parts
  - processor-independent part
    - defines operation properties
    - eg commutative, inline function, primitive
  - processor-dependent part from the header file
  - processor-dependent part from the nML description
- Basic idea
  - operation types in LIB are organised in a hierarchical way that represents the different ways in which a CDFG operation can be mapped to an ISG operation
31Operation type Hierarchy (eg.)
[Figure: operation-type hierarchy. Library types func_opn, comm_opn, sub and add; refined types sub_XY, sub_YX, add_XY and add_YX; ISG storage X, Y and C]
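A minimal sketch of how such a hierarchy yields the candidate mappings of a CDFG operation. The parent/child links in `CHILDREN` are an assumption about the figure (only the type names come from the slide), and `leaf_subtypes` is a hypothetical helper.

```python
# Hypothetical hierarchy (assumed links): parent type -> child types.
CHILDREN = {
    "func_opn": ["sub", "comm_opn"],
    "comm_opn": ["add"],
    "add":      ["add_XY", "add_YX"],
    "sub":      ["sub_XY", "sub_YX"],
}

def leaf_subtypes(op_type):
    """All leaf types below `op_type`: the candidate ISG operation types."""
    kids = CHILDREN.get(op_type, [])
    if not kids:                      # a leaf maps to itself
        return [op_type]
    return [leaf for k in kids for leaf in leaf_subtypes(k)]

print(leaf_subtypes("add"))
print(leaf_subtypes("func_opn"))
```

Each leaf corresponds to one way of binding the operands to storage, so an `add` in the application has two candidate refinements in this toy hierarchy.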
33Code Generation Process
- Mapping of $G_{CDFG} = \langle V_O, V_V \rangle$ onto $G_{ISG} = \langle V_I, V_S \rangle$
  - $V_V$ onto $V_S$
  - $V_O$ onto $V_I$
- Assumptions
  - basic block by basic block
  - transitories have zero delay
  - so each operation type executes in 1 cycle
- Phases
  - Code selection phase
    - Refinement
    - Bundling
    - Covering
  - Register allocation
  - Scheduling
34Code Selection (Refinement)
- Replacing a CDFG operation by one of its children
  - $refinement(o) = r$
- Valid refinement
  - A CDFG operation $r \in V_{OR}$ is a valid refinement of a CDFG operation $o \in V_O$ with $type(o) \in L$, iff
    - $type(r) = i$, where $i$ is a subtype of $type(o)$
    - $datatype(input(o,n)) = datatype(input(i,n))$
    - $datatype(output(o,n)) = datatype(output(i,n))$
- Valid mapping
  - $mapping(o) = i$, with $o \in V_O$ and $i \in V_I$
  - $i = type(r) = type(refinement(o))$
35Refinement (contd)
- Binding data dependency
- 2 types
  - Direct data dependency
  - Allocated data dependency
- Direct data dependency
  - a data dependency between 2 refined CDFG operations r1, r2 is direct if it is implemented as a valid direct path in the ISG
  - direct path: a path in the ISG between 2 operations that does not involve any storage other than transitories
  - direct(ri, rj) = true/false
- Allocated data dependency
  - a data dependency between 2 refined CDFG operations r1, r2 is allocated if it is implemented as a path in the ISG that has one or more static storage elements
36Bundling
- Idea: find conflict-free CDFG operations that can be executed in the same cycle
- Defn: a bundle is a set of CDFG operations that can be refined to form a refined bundle
- Refined bundle
  - set of all refined operations $r_1, r_2 \in V_{OR}$ that are coupled
  - coupled(r1, r2) = true/false
  - true if
    - r1 = r2, or
    - direct(r1, r2), or
    - coupled(r1, r3) and coupled(r3, r2)
- Defined for a given refinement function
- Each bundle can be associated with a set of refinement functions
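Because coupled is reflexive and transitive, refined bundles can be computed as connected components over the direct relation. The sketch below uses a small union-find and assumes direct is treated as symmetric; the function name `refined_bundles` and the operation names are hypothetical.

```python
def refined_bundles(ops, direct_pairs):
    """Group refined operations into bundles, the classes of the coupled relation.

    Assumption: direct(r1, r2) is treated as symmetric.
    """
    parent = {o: o for o in ops}

    def find(o):
        while parent[o] != o:
            parent[o] = parent[parent[o]]   # path halving
            o = parent[o]
        return o

    for a, b in direct_pairs:               # direct(a, b) couples a and b
        parent[find(a)] = find(b)

    bundles = {}
    for o in ops:
        bundles.setdefault(find(o), set()).add(o)
    return sorted(bundles.values(), key=lambda b: sorted(b))

ops = ["r1", "r2", "r3", "r4"]
direct_pairs = [("r1", "r2"), ("r2", "r3")]
print(refined_bundles(ops, direct_pairs))
```

Here r1 and r3 end up in the same bundle through r2, which is the transitive case of the definition, while r4 forms a bundle of its own.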
37Properties/constraints on bundles
- Same-cycle theorem
  - 2 refined operations that have a direct data dependency belong to the same bundle; if they have an allocated data dependency, they cannot be in the same bundle
- Operations in a bundle should not have encoding or resource conflicts
- Bundles need to be convex
  - a bundle is convex if no operation path between 2 of its operations contains an operation external to the bundle
[Figure: operations <<1, x and >>2. Not a convex bundle]
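A minimal sketch of the convexity test on an acyclic operation graph: a bundle is convex when no external operation is both a descendant and an ancestor of bundle operations. The graph below, with operations named `shl1`, `mul` and `shr2`, is a hypothetical reading of the figure, and both function names are illustrative.

```python
def reachable(graph, starts):
    """Nodes reachable from `starts` by following at least one edge."""
    seen, stack = set(), list(starts)
    while stack:
        n = stack.pop()
        for m in graph.get(n, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def is_convex(succ, bundle):
    """True if no path between two bundle operations leaves the bundle."""
    pred = {}
    for n, outs in succ.items():          # build the reversed graph
        for m in outs:
            pred.setdefault(m, []).append(n)
    below = reachable(succ, bundle)       # descendants of bundle operations
    above = reachable(pred, bundle)       # ancestors of bundle operations
    return not ((below & above) - set(bundle))

# Hypothetical chain: shl1 -> mul -> shr2
succ = {"shl1": ["mul"], "mul": ["shr2"]}
print(is_convex(succ, {"shl1", "shr2"}))
print(is_convex(succ, {"shl1", "mul", "shr2"}))
```

Bundling only the two shifts leaves `mul` on the path between them, so that bundle is rejected, whereas taking all three operations is convex.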
38- A refined bundle that satisfies these properties is called a valid refined bundle
- A bundle is valid if its operations can be refined to form a valid refined bundle
- In effect, each valid bundle corresponds to an instruction / instruction part
- Eg of valid bundle
[Figure: operations <<1, x and >>2 forming a valid bundle]
39Refinement and Bundling in a nutshell
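[Figure: refinement and bundling in a nutshell. The CDFG operations sub, add and << are related by "type" to the library types (Func_oprn, Comm_opn, sub, add, <<), by "refinement" to the refined types (Sub_AB, Sub_BA, Add_AB, <<_C), and by "mapping" to the ISG (storage A, B, C, AR_w)]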
40Code Selection (covering)
- The previous stages give all possibilities of valid bundling
  - each operation may be covered by one or more bundles
  - and each bundle covers one or more operations
- Minimum graph cover
  - given a collection of bundles B that induce patterns in the CDFG, the problem is to search for a minimum number of patterns that cover the whole $G_{CDFG}$
  - cost function: number of bundles
- Solution: branch-and-bound algorithm
41Branch and Bound (Basic strategy)
- Find essential bundles
  - a bundle is essential if some operation oi is covered by only that bundle
  - add these to the cover C
- For the rest, build a search tree
  - each node is a partial cover of the CDFG
  - branching at each node models the selection of bundles
  - a depth traversal gives a cover C
[Figure: example CDFG with operation x and bundles B1, B2, B3, B4, B5]
42The search tree
[Figure: search tree. From start, the tree branches over the bundles covering O2 (B4{o1, o2}, B2{o2}), then O3 (B5{o1, o3}, B3{o3}, B4{o3}), then O1 (B1{o1}); leaves are annotated with cost 2 or cost 3]
Covers B5, B4, B1, B3, B4
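The basic strategy can be sketched as a small branch-and-bound search that minimises the number of bundles. This is an illustrative sketch, not the CHESS implementation: the function `min_cover` and the bundle table are hypothetical, and the bound simply abandons any partial cover that is already as large as the best complete cover found.

```python
def min_cover(operations, bundles):
    """Search for a cover of `operations` using the fewest bundles.

    `bundles` maps a bundle name to the set of operations it covers.
    Returns None if some operation is covered by no bundle.
    """
    ops = set(operations)

    # Essential bundles: the only bundle covering some operation.
    cover = set()
    for o in ops:
        candidates = [b for b, s in bundles.items() if o in s]
        if len(candidates) == 1:
            cover.add(candidates[0])

    best = None

    def search(chosen, uncovered):
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return                           # bound: cannot improve on best
        if not uncovered:
            best = set(chosen)               # complete, strictly better cover
            return
        o = min(uncovered)                   # branch on one uncovered operation
        for b, s in sorted(bundles.items()):
            if o in s and b not in chosen:
                search(chosen | {b}, uncovered - s)

    covered = set().union(*(bundles[b] for b in cover))
    search(cover, ops - covered)
    return best

# Hypothetical bundles over three operations
bundles = {
    "B1": {"o1"}, "B2": {"o2"}, "B3": {"o3"},
    "B4": {"o1", "o2"}, "B5": {"o1", "o3"},
}
result = min_cover(["o1", "o2", "o3"], bundles)
print(sorted(result))
```

With this table no bundle is essential, and the search finds a cover of two bundles after first reaching a cover of three, which is where the bound starts pruning.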
43Issues in covering
- Overlapping bundles
- operation duplication
- Order of choosing the operations oi
- size of tree can be reduced
- eg Increasing order of BOI
- Pruning and branching heuristics
44Register Allocation and Scheduling
- Register allocation
  - binds the values to registers / memory
  - modelled as a data routing problem (ISG)
  - makes sure the capacity of storage is not exceeded
    - spilling values to memory
    - fixing the execution order between bundles
- Scheduling (compaction phase)
  - operations are bound to time
  - operations are packed into instructions
  - operations in the same bundle execute in the same instruction
  - different bundles may be scheduled in parallel in the same instruction
45Other issues related to Code generation
- Code generation beyond basic blocks
  - bundling of operations beyond basic blocks
  - scheduling done globally
  - operations could still be moved across blocks
  - loop unfolding or S/W pipelining
- Phase coupling
  - delayed binding
  - common operands
  - coupling by cost functions
    - the cost of each bundle is different
    - $cost(C) = \sum_i cost(B_i)$
    - scheduling in parallel is emphasised
46Compilation flow using CHESS environment
[Figure: compilation flow using the CHESS environment]
- Processor modelling
  - inputs: Processor.h (to NOODLE), Proc.nml (to ANIMAL)
  - results: Processor.cdfg, Processor.lib, Processor.isg
- Application program
  - Processor.h, program.c -> NOODLE -> Program.cdfg.cdfg
  - COSEL -> Program_bndl.lib, Program_bndl.cdfg, Program_bndl.isg
  - AMNESIA -> Program_dr.cdfg
  - MIST -> Program_sch.cdfg
  - STATIC -> Program.micro