Code Selection for Media Processors with SIMD Instructions

1
Code Selection for Media Processors with SIMD Instructions
  • Rainer Leupers
  • DATE'00

2
Code Selection
  • Source → (front end) → Intermediate Representation
    (IR) → (back end) → Assembly
  • The back end consists of
    • code selection
    • optimization, etc.
  • The primary goal of traditional code selectors is
    to find the minimum-cost machine code that is
    equivalent to a given IR
    • cost: latency, code size, etc.
  • The best-known code selector is twig, by Aho et al.
    (TOPLAS'89)
  • This paper extends twig to handle SIMD
    instructions
  • Here, a SIMD instruction manipulates data stored in
    subregisters instead of full registers

3
Code Generation Using Tree Matching and Dynamic Programming (Twig)
  • Alfred V. Aho et al.
  • TOPLAS'89

4
Code Generation by Tree Matching: An Example
  • IR of a[i] = b
    • output of the front end
    • input of the code generator
  • Machine code of the IR
    • output of the code generator
(Figure: the IR tree and its machine code, with tree nodes and
instructions numbered (1) to (4))
5
Code Generation by Tree Matching: An Example
  • Reduction rules

6
Code Generation by Tree Matching: An Example
i = 0, c = a
i = 0, j = SP
7
Code Generation by Tree Matching: An Example
  • What if several matches exist?
    • Ties are broken by cost
  • How can the global cost be taken into account at
    local decision steps?
    • By dynamic programming!

8
Code Generation by Tree Matching: An Example
9
Code Generation by Tree Matching: An Example
  • The minimum-cost cover of the IR of a[i] = b!

10
Dynamic Programming for Minimum-Cost Covering
  • Informally, the key condition for a problem to have
    an optimal dynamic programming algorithm is that
    the problem exhibits optimal substructure: an
    optimal solution is built from optimal solutions
    to its subproblems
  • Optimal code generation for expression trees has
    this property
    • Partition the problem of generating optimal code
      for an expression E into subproblems of
      generating optimal code for its subexpressions
    • and then combine the subsolutions (with some
      extra manipulation)
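The bottom-up dynamic programming described above can be sketched as a small labeler. This is a simplified, BURS-style sketch, not twig itself (twig matches patterns with a tree automaton); the grammar, costs, and rule names are hypothetical, and patterns are limited to one operator plus chain rules. For every node it keeps the cheapest derivation per nonterminal:

```python
from dataclasses import dataclass, field

# Hypothetical toy tree grammar: (lhs, operator, child nonterminals, cost, name).
# operator=None marks a chain rule "lhs <- child nonterminal".
RULES = [
    ("reg",  "CONST", [],               1, "LDI"),    # load an immediate
    ("reg",  "REG",   [],               0, "none"),   # value already in a register
    ("addr", "ADD",   ["CONST", "reg"], 0, "disp"),   # displacement addressing mode
    ("addr", None,    ["reg"],          0, "indir"),  # register-indirect addressing
    ("reg",  None,    ["addr"],         1, "LEA"),    # materialize an address
    ("reg",  "ADD",   ["reg", "reg"],   1, "ADD"),
    ("reg",  "LOAD",  ["addr"],         1, "LD"),
    ("stmt", "STORE", ["addr", "reg"],  1, "ST"),
]


@dataclass
class Node:
    op: str
    kids: list = field(default_factory=list)
    best: dict = field(default_factory=dict)  # nonterminal -> (cost, rule name)

    def offer(self, nt, cost, name):
        """Record (cost, name) for nonterminal nt if it beats the current best."""
        if nt not in self.best or cost < self.best[nt][0]:
            self.best[nt] = (cost, name)
            return True
        return False


def label(n: Node) -> None:
    """Bottom-up DP: minimum cost of deriving node n from each nonterminal."""
    for k in n.kids:
        label(k)
    if not n.kids:
        n.best[n.op] = (0, None)  # a leaf matches its own terminal symbol, e.g. CONST
    for lhs, op, kid_nts, cost, name in RULES:
        if op == n.op and len(kid_nts) == len(n.kids) and all(
            nt in k.best for nt, k in zip(kid_nts, n.kids)
        ):
            total = cost + sum(k.best[nt][0] for nt, k in zip(kid_nts, n.kids))
            n.offer(lhs, total, name)
    changed = True  # apply chain rules until no cost improves
    while changed:
        changed = False
        for lhs, op, kid_nts, cost, name in RULES:
            if op is None and kid_nts[0] in n.best:
                changed |= n.offer(lhs, cost + n.best[kid_nts[0]][0], name)


def leaf(op):
    return Node(op)


# a[i] = b, assuming i and b live in stack slots (CONST + REG sp) and a is a
# constant base address: STORE(ADD(CONST, LOAD(ADD(CONST, REG))), LOAD(ADD(CONST, REG)))
tree = Node("STORE", [
    Node("ADD", [leaf("CONST"), Node("LOAD", [Node("ADD", [leaf("CONST"), leaf("REG")])])]),
    Node("LOAD", [Node("ADD", [leaf("CONST"), leaf("REG")])]),
])
label(tree)
print(tree.best["stmt"])  # (3, 'ST'): load i, load b, store through a(R)
```

Keeping one entry per (node, nonterminal) rather than one per node is what lets the parent pick, for example, the cheaper `addr` derivation of an address instead of first materializing it in a register.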

11
When Common Subexpressions Exist
  • Example: b = a + 1; c = a + 1
  • The problem of optimal code generation becomes
    intractable (NP-complete for expression DAGs)
  • Twig's tree-based algorithm no longer guarantees an
    optimal solution

14
SIMD Instructions
  • Media processors
    • TI C62xx, Philips TriMedia, Intel MMX
  • Typically, the word length of media processors is
    32 bits, but
    • audio data are 16 bits and video data 8 bits
      → waste of resources!
  • So, they provide special instructions, called SIMD
    instructions, that
    • virtually split each full register into multiple
      subregisters and
    • perform identical computations on the
      subregisters in parallel
  • But existing code selection techniques are not
    applicable to them
    • Current practice: ad hoc techniques such as
      compiler intrinsics and hand-optimized libraries
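The idea of virtually splitting a 32-bit register into two 16-bit subregisters can be emulated with plain integer arithmetic. A minimal Python sketch; the helper names `pack2`, `unpack2`, and `add2` are hypothetical and not taken from any of the processors above:

```python
MASK16 = 0xFFFF


def pack2(hi: int, lo: int) -> int:
    """Place two 16-bit values into the upper and lower halves of a 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2(word: int) -> tuple:
    """Split a 32-bit word into its (upper, lower) 16-bit subregisters."""
    return (word >> 16) & MASK16, word & MASK16


def add2(x: int, y: int) -> int:
    """ADD2-style addition: two independent 16-bit adds, each wrapping mod 2**16.
    The carry out of the lower half is discarded instead of rippling upward."""
    lo = (x + y) & MASK16
    hi = ((x >> 16) + (y >> 16)) & MASK16
    return (hi << 16) | lo


x, y = pack2(1, 0xFFFF), pack2(2, 1)
print(unpack2(add2(x, y)))              # (3, 0): the halves stay independent
print(unpack2((x + y) & 0xFFFFFFFF))    # (4, 0): a plain 32-bit add leaks the carry
```

The second line shows why an ordinary 32-bit add cannot stand in for two 16-bit adds: the carry from the low half corrupts the high half.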

15
Why can't traditional code selectors be used for SIMD?
  • Recall Aho's algorithm
    • it processes one data flow tree (DFT) after
      another
    • but selecting SIMD instructions requires
      covering multiple DFTs simultaneously!
  • Then, why not select SIMD instructions after the
    (traditional) code selection step?
    • The problem is register allocation: if multiple
      values share a single 32-bit register, their
      live ranges may interfere!
    • So, such a post-pass would have to be much more
      conservative (cf. Gebotys, DAC'00)

16
An Example of Parallelization
  • SIMD instructions
    • ADD2: two 16-bit add operations
    • LOAD2: two 16-bit load operations
    • STORE2: two 16-bit store operations
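A parallelization of this kind can be simulated on a byte array, using the statement pair a1 = b1 + c1, a2 = b2 + c2 as an assumed example: four scalar loads, two adds, and two stores collapse into two LOAD2s, one ADD2, and one STORE2. The instruction semantics below (little-endian, adjacent 16-bit operands) and the memory layout are assumptions for illustration:

```python
import struct


def load2(mem: bytearray, addr: int) -> int:
    """Hypothetical LOAD2: one 32-bit access fetching two adjacent 16-bit values.
    Little-endian layout assumed, so the value at addr lands in the low half."""
    return struct.unpack_from("<I", mem, addr)[0]


def store2(mem: bytearray, addr: int, word: int) -> None:
    """Hypothetical STORE2: write both 16-bit halves with one 32-bit access."""
    struct.pack_into("<I", mem, addr, word)


def add2(x: int, y: int) -> int:
    """Hypothetical ADD2: two independent 16-bit additions."""
    lo = (x + y) & 0xFFFF
    hi = ((x >> 16) + (y >> 16)) & 0xFFFF
    return (hi << 16) | lo


# Assumed layout (byte offsets): b1=0, b2=2, c1=4, c2=6, a1=8, a2=10.
mem = bytearray(12)
struct.pack_into("<4H", mem, 0, 10, 20, 1, 2)   # b1, b2, c1, c2

# SIMD version: 2 x LOAD2, 1 x ADD2, 1 x STORE2 instead of 8 scalar operations.
store2(mem, 8, add2(load2(mem, 0), load2(mem, 4)))
print(struct.unpack_from("<2H", mem, 8))  # (11, 22): a1 = b1 + c1, a2 = b2 + c2
```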

17
A Difficulty in Exploiting SIMD Instructions
  • Parallel loads/stores
    • How can we check that the memory addresses are
      contiguous?
    • In programming-language terms, this requires
      must-alias analysis
  • This paper uses traditional data-flow analysis
    • For non-toy programs, stronger analysis
      techniques are needed (e.g., abstract
      interpretation, constraint-based analysis)
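Once data-flow analysis has reduced each memory address to a known base plus a constant offset, the contiguity test itself is simple. A hedged sketch; the `Addr` representation, the 2-byte element width, and the 4-byte alignment requirement are assumptions, not details from the paper:

```python
from typing import NamedTuple, Optional


class Addr(NamedTuple):
    base: Optional[str]  # symbolic base (array or frame slot); None if unknown
    offset: int          # constant byte offset from the base


def can_pair(lo: Addr, hi: Addr, width: int = 2, align: int = 4) -> bool:
    """True if 16-bit accesses at lo and hi can form one LOAD2/STORE2.

    Needs a must-alias style fact: both addresses share the same known base,
    hi directly follows lo, and lo is word-aligned (the base is assumed aligned).
    """
    if lo.base is None or lo.base != hi.base:
        return False  # the addresses cannot be proven to be related
    return hi.offset - lo.offset == width and lo.offset % align == 0


print(can_pair(Addr("x", 0), Addr("x", 2)))    # True
print(can_pair(Addr("x", 2), Addr("x", 4)))    # False: not word-aligned
print(can_pair(Addr("x", 0), Addr("y", 2)))    # False: different bases
print(can_pair(Addr(None, 0), Addr(None, 2)))  # False: unknown base
```

When the analysis cannot determine the base, the check conservatively refuses to pair, which is why stronger analyses pay off on real programs.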

18
Overview of the Code Generator
  • 1. Identify contiguous load/store operations
  • 2. Add pseudo reduction rules to the original
    DFTs
  • 3. Find alternative covers for the modified DFTs
    with Aho's dynamic programming algorithm
    (covering phase)
    - the result may contain invalid covers
      (unavoidable when Aho's algorithm is applied
      directly)
  • 4. Select a valid cover that maximizes the use
    of SIMD instructions (selection phase)
    - transform into an ILP formulation
    - the validity constraints can be easily
      expressed in the ILP formulation

19
Covering and Selection Phases
  • Recall Aho's algorithm
    • When finding a cover
      • by dynamic programming
      • in a bottom-up manner
      • only one optimal solution is stored for each
        node (per nonterminal)
      • ties are broken arbitrarily
    • The selection phase is trivial
      • the uniquely stored cover directly corresponds
        to machine instructions
  • For the problem with SIMD instructions, why not
    find a globally optimal solution in the covering
    phase?
    • Enforcing the validity condition is difficult in
      the covering phase if Aho's dynamic programming
      covering algorithm is to be used
  • So, the covering phase finds alternative
    solutions, among which a globally optimal
    solution exists, and then
  • the selection phase picks the optimal solution by
    enforcing the validity condition

20
Covering Phase: An Example
  • a1 = b1 + c1
  • a2 = b2 + c2
  • b1 = mem1 (16 bits)
  • b2 = mem2 (16 bits)
  • c1 = ..., c2 = ...

(Figure: DFTs for a1 and a2, with fetch nodes for b1, b2, c1, c2
and addresses mem1, mem2, plus an added load node; the nodes are
annotated with the alternative nonterminals reg, reg_lo, and reg_hi.)
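The covering step in this example can be sketched by giving the SIMD halves their own nonterminals, so that scalar and SIMD alternatives survive side by side at each node. This sketch records every applicable rule rather than costs; the rule names (`LOAD2.lo`, `ADD2.hi`, and so on) and the tuple-based tree encoding are hypothetical:

```python
# Hypothetical rules: (lhs nonterminal, operator, child nonterminals, rule name).
RULES = [
    ("reg",    "fetch", ["addr"],             "LOAD"),
    ("reg_lo", "fetch", ["addr"],             "LOAD2.lo"),  # pseudo rule: low half of a LOAD2
    ("reg_hi", "fetch", ["addr"],             "LOAD2.hi"),  # pseudo rule: high half of a LOAD2
    ("reg",    "+",     ["reg", "reg"],       "ADD"),
    ("reg_lo", "+",     ["reg_lo", "reg_lo"], "ADD2.lo"),
    ("reg_hi", "+",     ["reg_hi", "reg_hi"], "ADD2.hi"),
]


def cover(node):
    """Return {nonterminal: [rule names]} for a node given as (op, children).

    Every applicable rule is kept, so the scalar (reg) and SIMD (reg_lo,
    reg_hi) alternatives all remain available to the selection phase.
    """
    op, kids = node
    if op == "addr":
        return {"addr": ["leaf"]}
    kid_alts = [cover(k) for k in kids]
    alts = {}
    for lhs, rule_op, kid_nts, name in RULES:
        if rule_op == op and len(kid_nts) == len(kids) and all(
            nt in ka for nt, ka in zip(kid_nts, kid_alts)
        ):
            alts.setdefault(lhs, []).append(name)
    return alts


mem = ("addr", [])
a1 = ("+", [("fetch", [mem]), ("fetch", [mem])])  # a1 = b1 + c1
print(cover(a1))
# {'reg': ['ADD'], 'reg_lo': ['ADD2.lo'], 'reg_hi': ['ADD2.hi']}
```

Nothing here prevents a1 from being covered by `ADD2.lo` while a2 uses the scalar `ADD`; ruling out such invalid combinations is left to the selection phase.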

21
Selection Phase
  • What is a necessary and sufficient condition for
    validity?
    • Each node is covered by exactly one rule
    • Types of parent and children must be consistent
    • Consistency of writes/reads to/from 16-bit CSEs
    • Conditions for two nodes to be packed into a
      SIMD pair
    • Yet another, artificial constraint to avoid
      scheduling deadlock
      • cyclic dependence caused by a pair of a true
        dependence and a false (output) dependence
  • Subject to these constraints, select a cover that
    maximizes the use of SIMD instructions
  • These can be expressed as an ILP formulation

22
Constraint 1
  • Each node is covered by exactly one rule

  • Binary decision variable per node and rule: is
    the node covered by r_j or not?
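Written out, constraint 1 might look as follows, assuming binary variables x_{i,j} (1 if node n_i is covered by rule r_j, else 0) and R(n_i), the set of rules matched at n_i during covering; this notation is assumed, not taken from the slides:

```latex
\sum_{r_j \in R(n_i)} x_{i,j} = 1
\quad \text{for every DFT node } n_i,
\qquad x_{i,j} \in \{0, 1\}
```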
23
Constraint 2
  • Types of parent and children must be consistent
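One plausible encoding, under assumed notation: x_{i,j} = 1 if node n_i is covered by rule r_j, R(n) is the set of rules matched at node n, t_{j,k} is the nonterminal rule r_j expects at its k-th child, and c_k(i) is the index of that child node (chain rules ignored for simplicity). Choosing r_j at n_i then forces the child to be covered by some rule that produces t_{j,k}:

```latex
x_{i,j} \;\le \sum_{\substack{r_l \in R(n_{c_k(i)}) \\ \mathrm{lhs}(r_l) = t_{j,k}}} x_{c_k(i),\,l}
\quad \text{for every node } n_i,\ \text{rule } r_j \in R(n_i),\ \text{child position } k
```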

24
Constraint 3
  • Consistency of writes/reads to/from 16-bit CSEs

25
Constraint 4
  • Conditions for two nodes to be packed into a SIMD
    pair

26
Constraint 5
  • Yet another, artificial constraint to avoid
    scheduling deadlock
    • cyclic dependence caused by a pair of a true
      dependence and a false (output) dependence
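One way to see the deadlock is to merge every SIMD pair into a single node of the dependence graph and look for a cycle. The sketch below is a direct graph check rather than the paper's ILP encoding; the operation names x1, x2, y1, y2 and their dependence edges are made up for illustration:

```python
def has_cycle(graph: dict) -> bool:
    """Depth-first search for a cycle; graph maps each node to its successors."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    state = {v: "new" for v in nodes}  # new -> active (on DFS stack) -> done

    def visit(u) -> bool:
        state[u] = "active"
        for v in graph.get(u, ()):
            if state[v] == "active" or (state[v] == "new" and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(state[v] == "new" and visit(v) for v in nodes)


def packing_is_schedulable(deps: dict, pairs: list) -> bool:
    """Merge each (u, v) pair into one SIMD node and check for a cycle.

    deps maps an operation to the operations that must execute after it
    (true and output dependences alike). A direct dependence inside a pair
    becomes a self-loop and is also reported as a cycle.
    """
    group = {}
    for u, v in pairs:
        group[u] = group[v] = f"{u}|{v}"
    merged = {}
    for src, dsts in deps.items():
        for dst in dsts:
            merged.setdefault(group.get(src, src), set()).add(group.get(dst, dst))
    return not has_cycle(merged)


deps = {"x1": {"y1"},   # true dependence: y1 reads the value x1 writes
        "y2": {"x2"}}   # output dependence: x2 overwrites what y2 wrote
print(packing_is_schedulable(deps, [("x1", "x2")]))                # True
print(packing_is_schedulable(deps, [("x1", "x2"), ("y1", "y2")]))  # False: cycle
```

Each pair is harmless on its own; only packing both creates the cycle, which is why this condition has to be checked over the whole selection rather than pair by pair.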

27
Optimization Goal
  • Subject to these constraints, select a cover that
    maximizes the use of SIMD instructions
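For a toy instance, the 0-1 selection problem can be solved by exhaustive enumeration, which makes the objective concrete. This sketch stands in for an ILP solver (real instances need one, since enumeration is exponential); it encodes only "one rule per node" and a simplified pairing condition, and the node names, rules, and `PAIRABLE` set are hypothetical:

```python
from itertools import permutations, product

# Hypothetical candidate rules per DFT node, as produced by a covering phase.
CANDIDATES = {
    "n1": ["ADD", "ADD2.lo", "ADD2.hi"],
    "n2": ["ADD", "ADD2.lo", "ADD2.hi"],
    "n3": ["SUB"],  # no SIMD alternative for this node
}
# (low, high) node pairs assumed to have passed the contiguity/dependence checks.
PAIRABLE = {("n1", "n2"), ("n2", "n1")}


def is_simd(rule: str) -> bool:
    return rule.endswith((".lo", ".hi"))


def valid(assign: dict) -> bool:
    """Every low half must be matched with a distinct, pairable high half."""
    lo = [n for n, r in assign.items() if r.endswith(".lo")]
    hi = [n for n, r in assign.items() if r.endswith(".hi")]
    if len(lo) != len(hi):
        return False
    return any(all(p in PAIRABLE for p in zip(lo, perm)) for perm in permutations(hi))


def select():
    """Enumerate one rule per node, keep valid covers, maximize SIMD-covered nodes."""
    names = list(CANDIDATES)
    best, best_score = None, -1
    for choice in product(*(CANDIDATES[n] for n in names)):
        assign = dict(zip(names, choice))
        score = sum(is_simd(r) for r in choice)
        if valid(assign) and score > best_score:
            best, best_score = assign, score
    return best, best_score


print(select())  # ({'n1': 'ADD2.lo', 'n2': 'ADD2.hi', 'n3': 'SUB'}, 2)
```

Choosing exactly one rule per node from `CANDIDATES` plays the role of constraint 1; the type, CSE, and deadlock constraints are omitted here for brevity.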

28
Experimental Results