Processor Modelling and Retargetable Compilation

Provided by: pcverh

Transcript and Presenter's Notes

1
Processor Modelling and Retargetable Compilation
2
Outline

Introduction to Retargetable Compilers
Processor Modelling (nML)
CHESS
Intermediate representation
Code Selection process
Compilation flow

3
Evolution of Compilers
Micro programming
1970s
CISC
Code generation (dynamic programming); code selection (LR parsing); code selection (combiner)
1980s
High Level Synthesis (ASIC)
Register allocation (coloring); code selection (tree automata); scheduling (trace); scheduling (s/w pipelining)
RISC, VLIW, Superscalar
1990s
Models for retargetability; phase coupling; register allocation (heterogeneous registers)
Embedded processors
4
Why Retargetable compilation?

DSP-oriented applications are increasing
Embedded processors
Architecture of embedded processors
Subject to regular change
Program developers
Need compiler support for a varying target processor architecture
Conventional compilers
Back end has to be rewritten for each new target
Tedious

5
Conventional Compilers
Application In HLL
Syntactic Semantic checks
Front End
Refinement, proc Independent optimizations
IR
Code Selection, Register allocation, scheduling
Code Generator
Machine code
Knowledge of the target architecture is built into the code generator
6
Retargetable Compiler
Processor Specification
Appln in HLL
Front end
Front end
IR (s)
Refinement, proc Independent optimizations
Code Selection, Register allocation, scheduling
Code Generator
M/c code
Knowledge of the target architecture is specified explicitly
7
Retargetability

Types
Classified by the amount of work needed to accommodate a new target processor
Developer Retargetable
Rewriting the back end
User Retargetable
Writing compiler-specific processor models
Uses the aid of compiler-compilers
E.g. GCC, LCC
Automatically Retargetable
Independent processor specification
Specification at the level of a programmer's manual
E.g. CHESS, MIMOLA

8
The CHESS Environment
C (nDL): primitive datatypes and operations, application algorithms
nML processor description: instruction set, high-level structure
Front end
Front end
High level optimization
CDFG
ISG
Code selection
LIB
Register allocation
Scheduling
Machine code
9
Processor Modelling

Basic features of the proc
registers
data path (connectivity)
Instruction Set (execution behaviour)
For good quality code, all architectural peculiarities must be modelled
Heterogeneous register structure
Addressing modes
Specified at a high level of abstraction
Language and associated grammar
Processor modelling languages
nML, ISPS, ISDL, LISA

10
nML

Specifies the syntax and semantics of the instruction set
2 main parts
Declarations (structural skeleton)
H/W entities of the target processor (storage)
Grammar
Instruction set
Execution behaviour
Topology (datapath connectivity)
State machine view
State: the values in storage
Transition function: instruction execution

11
nML (Declarations)

Defines the structural skeleton by defining the connection points
All storage elements declared globally
2 types of storage elements
Static Storage
Transitory Storage

12
Static Storage

Defn: elements that store a value for one or more m/c cycles, until explicitly overwritten
Components
Memories
Controllable registers
Capacity of storage is also specified
E.g.
Memory
mem DM[1024]<num>
Registers
reg AX<num>    // ALU reg
reg Ia2<addr>  // address reg

13
Transitory Storage

Defn: elements that pass a value on with a certain delay, specified in m/c cycles
Components
Buses
Nets
Pipeline registers
Capacity is one, and the value can be read only once
E.g.
trn A<num>      // ALU input
trn XD<num>     // data bus
trn T<num> d 1  // delay of 1 m/c cycle
Memory and register ports are also specified as transitories, to identify h/w conflicts
E.g. reg AX<num> read (AW_rA, AX_rB) write (AX_w)

14
Other declarations

Record type storage
E.g. accumulator of a fixed-point DSP
Functional units can also be modelled
fu alu
fu mult
Hardwired constants
cst C_3<fact> xxx
cst one_8<num> 00000001

Record data type: class acc public num w0 num w1
Storage element: reg MR<acc> MR0 MR1
15
nML (Grammar)

Instruction set and behaviour description
The instruction set is analysed and its structure captured in production rules (grammar)
The topology (connectivity) of the datapath is captured by grammar attributes
Production rules
OR-rules
List all alternatives for an instruction part
Alternatives are mutually exclusive
opn jvp_core (arith_ls_instr | control_inst | direct_mv)
AND-rules
Compose instruction parts
Parts are orthogonal
opn arith_ls_instr (ar: arith_instr, ls: indirect_mv)

Each possible derivation from these rules represents a legal instruction
The structure (hierarchy) of the instruction set is captured by the production rules
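
The derivation idea above can be sketched in Python. This is a minimal illustration and not part of any nML tool: the rule names jvp_core, arith_ls_instr, arith_instr and indirect_mv echo the slide, while the leaf names (add, sub, load, store) and the treatment of control_inst and direct_mv as terminals are assumptions made only for the example.

```python
from itertools import product

# Hypothetical toy grammar. OR-rules list mutually exclusive alternatives,
# AND-rules list orthogonal parts that are composed together.
OR_RULES = {
    "jvp_core": ["arith_ls_instr", "control_inst", "direct_mv"],
    "arith_instr": ["add", "sub"],        # assumed leaves
    "indirect_mv": ["load", "store"],     # assumed leaves
}
AND_RULES = {
    "arith_ls_instr": ["arith_instr", "indirect_mv"],
}


def derivations(symbol):
    """Return every tuple of terminals derivable from `symbol`."""
    if symbol in OR_RULES:
        # OR-rule: the derivations of the alternatives are simply collected.
        result = []
        for alternative in OR_RULES[symbol]:
            result.extend(derivations(alternative))
        return result
    if symbol in AND_RULES:
        # AND-rule: every combination of the parts is a derivation.
        parts = [derivations(part) for part in AND_RULES[symbol]]
        return [sum(combo, ()) for combo in product(*parts)]
    return [(symbol,)]  # terminal symbol


legal_instructions = derivations("jvp_core")
```

Under these assumptions `legal_instructions` holds six entries: the four arithmetic/move combinations from the AND-rule plus the two single-part alternatives of the OR-rule.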

[Figure: instruction-set hierarchy. Jvp_core is an OR_rule over Direct_move, Control_instr and Arith_ls_instr; Arith_ls_instr is an AND_rule composing Arith_instr and Indirect_mov; Dir_store, Reg_mov and Dir_ld appear as further alternatives.]
17
Grammar Attributes

OR-rules just pass attributes on
AND-rules define 4 types of attribute
Action attribute
Specifies what is executed by the instruction or instruction part
Each AND-rule has one action attribute
Syntax attribute
Specifies the assembler syntax (mnemonic)
Each AND-rule may have multiple syntax attributes
Image attribute
Defines the binary encoding of the instruction or instruction part
Value and mode attributes
Specify how a storage element is addressed

18
E.g. instruction part performing an immediate shift
opn shift_instr (al: alu_left_op, factor: c_3)
  action
    A = al.value             // read the operand
    C = pass(A, AS) @alu     // pass it thru ALU
    AR = C << factor @sh     // perform shift
  image 11 al.image factor

Operation types which can be executed on the ALU are best specified using switch statements
opn alu_op (op: alu)
  action
    switch (op)
      case add: C = add(A, B, AS) @alu
      case sub: C = sub(A, B, AS) @alu
      . .
  image 0 op
19
Control Instructions

Modelled using switch statements
Action attributes contain primitive operation
types that model controller
opn cond_jump (t: c_10, c: cond)
  action
    switch (c)
      case EQ:
        tC = eq(AS)
        jump(tC, t)
      case GT:
        tC = ge(AS)
        jump(tC, t)
      .
      .
  image 100xxx c t

21
Instruction Set Graph (ISG)

Intermediate processor model
Directed bipartite graph
G_ISG = <V_ISG, E_ISG>
Vertices: V_ISG = V_S ∪ V_I
V_S: storage elements
V_I: operation types
Edges: E_ISG ⊆ (V_S × V_I) ∪ (V_I × V_S)
Represent connectivity and data flow
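
As a rough illustration of the definition above, the ISG can be held as a bipartite graph in Python. The class and method names here (ISG, add_edge, inputs, outputs) are hypothetical and are not CHESS data structures; the tiny example graph uses only the and operation type with transitories A, B and C.

```python
from dataclasses import dataclass, field


@dataclass
class ISG:
    """Directed bipartite graph over storage elements (V_S) and operation types (V_I)."""
    storages: set = field(default_factory=set)   # V_S
    op_types: set = field(default_factory=set)   # V_I
    edges: set = field(default_factory=set)      # E_ISG

    def add_edge(self, src, dst):
        # Only V_S -> V_I and V_I -> V_S edges are allowed in a bipartite ISG.
        s_to_i = src in self.storages and dst in self.op_types
        i_to_s = src in self.op_types and dst in self.storages
        if not (s_to_i or i_to_s):
            raise ValueError(f"illegal ISG edge {src} -> {dst}")
        self.edges.add((src, dst))

    def inputs(self, op):
        """Storage elements read by operation type `op`."""
        return sorted(s for s, d in self.edges if d == op)

    def outputs(self, op):
        """Storage elements written by operation type `op`."""
        return sorted(d for s, d in self.edges if s == op)


isg = ISG(storages={"A", "B", "C"}, op_types={"and"})
for src, dst in [("A", "and"), ("B", "and"), ("and", "C")]:
    isg.add_edge(src, dst)
```

Rejecting storage-to-storage and operation-to-operation edges in `add_edge` is one simple way to keep the graph bipartite by construction.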

22
Partial ISG of a processor
[Figure: partial ISG. Static storage elements AX, AR, MR0 and MR1 (type num) are read by read_reg operation types (encodings 00xxx0xx, 01100xxx) onto the transitories AX_r and AR_r; copy operation types (00xxx0xx, 01100xxx and 00xxx1xx) move AX_r to A and AR_r to B; the operation type and (00010xxx) reads A and B and produces C; AS_w also appears. Legend: static storage, transitory, operation type.]
23
ISG (contd)

ISG operation types
Defn: a primitive processor activity with a fixed number of ordered input arguments and outputs
Each argument is connected by one edge to one storage element
Implementations of primitive operation types are defined in a header file
Enabling conditions
Each instruction of the processor executes many operation types
One operation type is enabled by many instructions
Defn: the set of all instructions enabling an operation type i is enabling(i)
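
A small Python sketch of enabling(i), assuming the instruction set is available as a table from instruction to the operation types it executes. The instruction names (i1 to i3), the operation-type names and the helper `enabling_sets` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical instruction table: instruction -> operation types it executes.
INSTRUCTIONS = {
    "i1": {"read_AX", "add"},
    "i2": {"read_AX", "sub"},
    "i3": {"read_AR", "add"},
}


def enabling_sets(instructions):
    """Invert the table: operation type -> set of instructions that enable it."""
    enabling = defaultdict(set)
    for instr, op_types in instructions.items():
        for op in op_types:
            enabling[op].add(instr)
    return dict(enabling)


enabling = enabling_sets(INSTRUCTIONS)
```

Here `enabling["add"]` contains i1 and i3, which reflects the slide's point that one operation type is typically enabled by many instructions.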

24
Conflicts

Encoding conflicts
H/W or resource conflicts
Encoding conflicts
Defn: for a subset of ISG operation types V_Io ⊆ V_I,
enabling(V_Io) = ∩ enabling(i) over all i ∈ V_Io
V_Io has an encoding conflict if enabling(V_Io) = ∅
Checked when packing 2 operation types into one instruction

Resource (H/W) conflicts
Several operation types contend for the same resource
input(i, n): V_I × N -> V_S
output(i, n): V_I × N -> V_S
Read and write ports are transitories
H/W conflict is modelled as an access conflict on transitories
To check for H/W conflicts
resources(i): set of all transitories that operation type i accesses
For V_Io ⊆ V_I: if for all i_j, i_k ∈ V_Io with i_j ≠ i_k, resources(i_j) ∩ resources(i_k) = ∅,
then V_Io is free of H/W conflicts
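
Both checks reduce to simple set operations. The sketch below assumes the enabling and resources sets are already known; all operation-type, instruction and transitory names in the two tables are made up for the example.

```python
from itertools import combinations

# Hypothetical data: enabling(i) and resources(i) for three operation types.
ENABLING = {
    "add": {"i1", "i2"},
    "mul": {"i2", "i3"},
    "shift": {"i4"},
}
RESOURCES = {
    "add": {"A", "B", "C"},
    "mul": {"X", "Y", "P"},
    "shift": {"C", "AR_w"},
}


def enabling_of_set(op_types):
    """enabling(V_Io): instructions that enable every operation type in the set."""
    sets = [ENABLING[op] for op in op_types]
    return set.intersection(*sets) if sets else set()


def has_encoding_conflict(op_types):
    """True if no single instruction enables all the given operation types."""
    return not enabling_of_set(op_types)


def free_of_hw_conflicts(op_types):
    """True if no two distinct operation types access a common transitory."""
    return all(
        RESOURCES[a].isdisjoint(RESOURCES[b])
        for a, b in combinations(op_types, 2)
    )
```

With this data, add and mul can share an instruction (i2 enables both and their transitories are disjoint), while add and shift fail both checks: no instruction enables both, and both access C.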

26
nML to ISG front end

nML is parsed into a parse tree
The parse tree goes through 3 passes
Pass 1: find the instruction word length and locate the position of each image attribute in the instruction
Pass 2: find enabling conditions and all specified instructions
Pass 3: find the exact enabling conditions using the set of instructions found in pass 2

27
CDFG (IR for application)

Similar to ISG
Directed bipartite graph
G_CDFG = <V_CDFG, E_CDFG>
Vertices: V_CDFG = V_O ∪ V_V
V_O: operations in the application
V_V: values that operations produce/consume
Edges: E_CDFG ⊆ (V_O × V_V) ∪ (V_V × V_O)
Represent data flow from operations through values
Control flow is modelled by imposing a hierarchy of macronodes on CDFG operations
Macronodes have a type
basic block, if-stat, for-stat, do-stat

28
Eg of CDFG
[Figure: example CDFG. Macronode hierarchy: root, Block (init), Do-stat, Block (loop-init), If-stat, Block (then), Block (else), Block (loop-end). Values shown: a, b, c, d, t1, t2, t3, t4. Captions: data flow of (a(bc))-((bc)d); control flow of a do-while loop.]
29

CDFG Operation Types
Operation types used in the application
Can be hierarchical
Different from the ISG operation types
All operation types, for both applications and processors, are declared in a header file
The operation types are linked by a library (LIB) which defines the operation hierarchy

30
Operation-Type hierarchy (LIB)

The LIB contains 3 parts
Processor-independent part
Defines operation properties
E.g. commutative, inline function, primitive
Processor-dependent part from the header file
Processor-dependent part from the nML description
Basic idea
Operation types in the LIB are organised hierarchically, representing the different ways in which a CDFG operation can be mapped to an ISG operation

31
Operation type Hierarchy (eg.)
[Figure: operation-type hierarchy in the LIB. Nodes: func_opn, comm_opn, add, sub, add_XY, add_YX, sub_XY, sub_YX, linked to the ISG storage elements X, Y and C.]
33
Code Generation Process

Mapping of G_CDFG = <V_O, V_V> onto G_ISG = <V_I, V_S>
V_V onto V_S
V_O onto V_I
Assumptions
Basic block by basic block
Transitories have zero delay
So each operation type executes in 1 cycle
Phases
Code selection phase
Refinement
Bundling
Covering
Register allocation
Scheduling

34
Code Selection (Refinement)

Replacing a CDFG operation by one of its children in the operation-type hierarchy
refinement(o) = r
Valid refinement
A CDFG operation r ∈ V_OR is a valid refinement of a CDFG operation o ∈ V_O with type(o) ∈ L iff
type(r) = i, where i is a subtype of type(o)
datatype(input(o, n)) = datatype(input(i, n))
datatype(output(o, n)) = datatype(output(i, n))
Valid mapping
mapping(o) = i, with o ∈ V_O and i ∈ V_I
i = type(r) = type(refinement(o))
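
The validity test can be sketched as a subtype check plus a datatype comparison. The hierarchy below borrows node names from the LIB example (func_opn, comm_opn, add, sub, add_XY, ...), but its exact shape, the num datatypes and all function names are assumptions made for illustration. The subtype check is taken to be reflexive.

```python
# Hypothetical operation-type hierarchy, stored as child -> parent.
PARENT = {
    "comm_opn": "func_opn",
    "add": "comm_opn",
    "sub": "func_opn",
    "add_XY": "add",
    "add_YX": "add",
    "sub_XY": "sub",
    "sub_YX": "sub",
}

# Hypothetical ISG operation types: (input datatypes, output datatypes).
ISG_SIGNATURE = {
    "add_XY": (("num", "num"), ("num",)),
    "add_YX": (("num", "num"), ("num",)),
    "sub_XY": (("num", "num"), ("num",)),
    "sub_YX": (("num", "num"), ("num",)),
}


def is_subtype(op_type, ancestor):
    """True if `op_type` equals `ancestor` or lies below it in the hierarchy."""
    while op_type is not None:
        if op_type == ancestor:
            return True
        op_type = PARENT.get(op_type)
    return False


def valid_refinement(op_type, in_types, out_types, isg_type):
    """Check the subtype condition and the input/output datatype conditions."""
    if isg_type not in ISG_SIGNATURE:
        return False
    signature = (tuple(in_types), tuple(out_types))
    return is_subtype(isg_type, op_type) and ISG_SIGNATURE[isg_type] == signature


def refinements(op_type, in_types, out_types):
    """All ISG operation types that are valid refinements of the CDFG operation."""
    return sorted(
        i for i in ISG_SIGNATURE
        if valid_refinement(op_type, in_types, out_types, i)
    )
```

For a CDFG add on two num inputs this yields add_XY and add_YX, the two candidate mappings that later phases choose between.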

35
Refinement (contd)

Binding data dependencies
2 types
Direct data dependency
Allocated data dependency
Direct data dependency
A data dependency between 2 refined CDFG operations r1, r2 is direct if it is implemented as a valid direct path in the ISG
Direct path: a path in the ISG between 2 operations that does not involve any storage other than transitories
direct(r_i, r_j) = true/false
Allocated data dependency
A data dependency between 2 refined CDFG operations r1, r2 is allocated if it is implemented as a path in the ISG that passes through one or more static storage elements

36
Bundling

Idea: find conflict-free CDFG operations that can be executed in the same cycle
Defn: a bundle is a set of CDFG operations that can be refined to form a refined bundle
Refined bundle
Set of all refined operations r1, r2 ∈ V_OR that are coupled
coupled(r1, r2) = true/false
true if
r1 = r2, or
direct(r1, r2), or
coupled(r1, r3) and coupled(r3, r2)
Defined for a given refinement function
Each bundle can be associated with a set of refinement functions
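
Because coupled is reflexive and transitive, refined bundles are the groups obtained by closing the direct relation. A minimal union-find sketch follows; it assumes direct is treated as symmetric, and the operation names r1 to r5 and the function `refined_bundles` are hypothetical.

```python
def refined_bundles(operations, direct_pairs):
    """Group refined operations into bundles: the closure of direct()."""
    parent = {op: op for op in operations}

    def find(op):
        # Follow parent links to the group representative, compressing the path.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    for a, b in direct_pairs:
        # direct(a, b) couples the two operations, so merge their groups.
        parent[find(a)] = find(b)

    groups = {}
    for op in operations:
        groups.setdefault(find(op), set()).add(op)
    return sorted(sorted(group) for group in groups.values())


bundles = refined_bundles(
    ["r1", "r2", "r3", "r4", "r5"],
    [("r1", "r2"), ("r2", "r3")],
)
```

In this example r1 and r3 land in the same bundle through r2, even though no direct dependency links them, which is the third case of the coupled definition.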

37
Properties/constraints on bundles

Same cycle theorem
2 refined operations that have a direct data dependency belong to the same bundle; if they have an allocated data dependency, they cannot be in the same bundle
Operations in a bundle should not have encoding or resource conflicts
Bundles need to be convex
A bundle is convex if no operation path between 2 of its operations contains an operation external to the bundle

[Figure: operations <<1, x and >>2; the grouping shown is not a convex bundle.]
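
Convexity can be tested with plain reachability on the data-flow graph. The sketch below assumes an acyclic graph given as a successor table; the three-operation chain is loosely modelled on the figure, but its edges and the names shl1, mul and shr2 are assumptions.

```python
def reachable(graph, start):
    """All operations reachable from `start` by following successor edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def is_convex(graph, bundle):
    """True if no path between two bundle operations leaves the bundle."""
    bundle = set(bundle)
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    # Everything that can be reached from some operation inside the bundle.
    from_bundle = set().union(*(reachable(graph, b) for b in bundle))
    for external in nodes - bundle:
        # An external operation fed by the bundle that also feeds the bundle
        # lies on a path between two bundle operations.
        if external in from_bundle and reachable(graph, external) & bundle:
            return False
    return True


# Assumed data flow: shl1 -> mul -> shr2
GRAPH = {"shl1": ["mul"], "mul": ["shr2"], "shr2": []}
```

Grouping shl1 with shr2 while leaving mul outside is rejected, because the only path between them passes through mul; grouping adjacent operations is accepted.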
38

A refined bundle which satisfies those properties is called a valid refined bundle
A bundle is valid if its operations can be refined to form a valid refined bundle
In effect, each valid bundle corresponds to an instruction or instruction part
[Figure: example of a valid bundle over the operations <<1, x and >>2.]
39
Refinement and Bundling in a nutshell
[Figure: library operation types (Func_oprn, Comm_opn, add, sub, <<) are linked by the type and refinement relations to Add_AB, Sub_AB, Sub_BA and <<_C, which are mapped onto the ISG storage elements A, B, C and AR_w.]
40
Code Selection (covering)

The previous stages give all possibilities of valid bundling
Each operation may be covered by one or more bundles, and each bundle covers one or more operations
Minimum graph cover
Given a collection of bundles B that induce patterns in the CDFG, the problem is to search for a minimum number of patterns that cover the whole G_CDFG
Cost function: number of bundles
Solution: branch and bound algorithm

41
Branch and Bound (Basic strategy)

Find essential bundles
A bundle is essential if some operation o_i is covered by that bundle only
Add these to the cover C
For the rest, build a search tree
Each node is a partial cover of the CDFG
Branching at each node models the selection of bundles
Depth-first traversal gives a cover C

[Figure: CDFG operations covered by the candidate bundles B1 to B5.]
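
The strategy can be sketched as a small branch-and-bound search for a minimum cover. This is an illustrative sketch, not the COSEL implementation: the bundle contents below are assumed, the cost is simply the number of bundles, and branching on the operation with the fewest candidate bundles is one possible ordering heuristic.

```python
def min_cover(operations, bundles):
    """Return a minimum-size list of bundle names covering all operations."""
    operations = set(operations)

    # Step 1: essential bundles, i.e. the only bundle covering some operation.
    cover = []
    for op in sorted(operations):
        candidates = [name for name, ops in bundles.items() if op in ops]
        if not candidates:
            raise ValueError(f"operation {op} is not covered by any bundle")
        if len(candidates) == 1 and candidates[0] not in cover:
            cover.append(candidates[0])
    covered = set().union(*(bundles[name] for name in cover))

    best = None

    def search(chosen, covered_ops):
        nonlocal best
        # Bound: a partial cover that is already as large as the best cover
        # found so far cannot lead to an improvement.
        if best is not None and len(chosen) >= len(best):
            return
        remaining = operations - covered_ops
        if not remaining:
            best = list(chosen)
            return
        # Branch on the uncovered operation with the fewest candidate bundles.
        op = min(
            sorted(remaining),
            key=lambda o: sum(o in ops for ops in bundles.values()),
        )
        for name in sorted(bundles):
            if op in bundles[name] and name not in chosen:
                search(chosen + [name], covered_ops | bundles[name])

    search(cover, covered)
    return best


# Assumed bundle contents for illustration.
BUNDLES = {
    "B1": {"o1"},
    "B2": {"o2"},
    "B3": {"o3"},
    "B4": {"o1", "o2"},
    "B5": {"o1", "o3"},
}
cover = min_cover(["o1", "o2", "o3"], BUNDLES)
```

With these bundles there is no essential bundle, the first complete cover found has cost 3, and the bound then prunes every branch that cannot beat the cost-2 cover found next.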
42
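The search tree
[Figure: search tree from start, branching on O2, then O3, then O1. Node labels: B4 {o1, o2}, B2 {o2}, B5 {o1, o3}, B3 {o3}, B4 {o3}, B1 {o1}. Leaf costs shown: 2 for a cover completed with B5, 3 for covers completed with B1. Covers listed: B5, B4, B1, B3, B4.]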
43
Issues in covering

Overlapping bundles
Operation duplication
Order of choosing the operations o_i
Size of the tree can be reduced
E.g. increasing order of |B_oi|
Pruning and branching heuristics

44
Register Allocation and Scheduling

Register allocation
Binds values to registers/memory
Modelled as a data routing problem (on the ISG)
Makes sure the capacity of storage is not exceeded
Spilling values to memory
Fixing the execution order between bundles
Scheduling (compaction phase)
Operations are bound to time
Operations are packed into instructions
Operations in the same bundle execute in the same instruction
Different bundles may be scheduled in parallel in the same instruction
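
The slides do not say which scheduling algorithm is used, so the following is only a greedy list-scheduling sketch of the compaction idea: whole bundles are packed into the earliest instruction in which their predecessors have already executed and no conflict arises. The bundle names, the dependency table and the conflict pairs are all hypothetical.

```python
def schedule(bundles, deps, conflicts):
    """Greedily pack bundles into instructions (one list per cycle).

    bundles:   bundle names in priority order
    deps:      bundle -> set of bundles that must execute in an earlier cycle
    conflicts: set of frozenset pairs that cannot share an instruction
    """
    cycle_of = {}
    instructions = []
    pending = list(bundles)
    while pending:
        cycle = len(instructions)
        packed = []
        for bundle in pending:
            # Ready only if every predecessor was placed in an earlier cycle.
            ready = all(
                pred in cycle_of and cycle_of[pred] < cycle
                for pred in deps.get(bundle, ())
            )
            clash = any(frozenset((bundle, other)) in conflicts for other in packed)
            if ready and not clash:
                packed.append(bundle)
        if not packed:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for bundle in packed:
            cycle_of[bundle] = cycle
        pending = [b for b in pending if b not in cycle_of]
        instructions.append(packed)
    return instructions


instructions = schedule(
    ["b1", "b2", "b3", "b4"],
    deps={"b3": {"b1"}},
    conflicts={frozenset(("b1", "b2"))},
)
```

Here b1 and b4 share the first instruction, while b2 (which conflicts with b1) and b3 (which depends on b1) are deferred to the second.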

45
Other issues related to Code generation

Code generation beyond basic blocks
Bundling of operations beyond basic blocks
Scheduling done globally
Operations could still be moved across blocks
Loop unfolding or S/W pipelining
Phase coupling
Delayed binding
Common operands
Coupling by cost functions
Cost for each bundle is different
cost(C) = Σ cost(B_i)
Scheduling in parallel is emphasised

46
Compilation flow using CHESS environment
[Figure: compilation flow.
Processor modelling: Processor.h and Proc.nml are processed by NOODLE and ANIMAL into Processor.cdfg, Processor.lib and Processor.isg.
Application program: Processor.h and program.c -> NOODLE -> Program.cdfg -> COSEL -> Program_bndl.lib, Program_bndl.cdfg, Program_bndl.isg -> AMNESIA -> Program_dr.cdfg -> MIST -> Program_sch.cdfg -> STATIC -> Program.micro]
