Modèles et outils mathématiques pour la compilation (Mathematical Models and Tools for Compilation)

1
Lattice-Based Memory Allocation
Alain Darte, Compsys Project (Compilation and
Embedded Systems), CNRS, LIP, ENS-Lyon, France.
Joint work with Rob Schreiber (HP Labs) and
Gilles Villard (CNRS, LIP).
References: CASES'03, IEEE Transactions on
Computers (to appear).
WOG'04, April 25th, 2004. Recent Trends in
Compiler Construction. Sven Verdoolaege's PhD
Defense.
2
Outline
  • Introduction
  • The initial context: PICO, an HP Labs software tool
    for compiling high-level programs (e.g., C code)
    into NPAs (Non-Programmable Accelerators). How to
    store intermediate results?
  • Mathematical tools for high-level program
    transformations.
  • An example of communicating pipelined loops.
  • Lattice-based memory allocation.
  • Examples of previous work limitations.
  • Main results and open questions.

3
PiCo (Program In Chip Out)
HP Labs: automatic generation of non-programmable
accelerators (NPAs).

Similar tools: MMAlpha (Inria), Atomium (IMEC),
Compaan (Leiden). Other possible inputs:
recurrence equations, Matlab, Kahn processes.
4
High-Level Program Optimizations
  • Program analysis: dependence analysis, lifetime
    analysis, footprint analysis, array expansion,
    array renaming, etc.
  • Code and loop transformations: tiling,
    scheduling, nested loop transformations, modulo
    scheduling, etc.
  • → Well-established mathematical tools and
    theory: graph algorithms, polyhedral
    manipulations, Hermite/Smith forms, integer
    linear programming, Ehrhart polynomials, etc.
  • BUT
  • Memory optimizations:
  • optimization of local memory (intra-loop buffers),
  • optimization of inter-loop buffers for
    communicating NPAs.
  • → No suitable mathematical tools so far.

5
Example: DCT-like code.
First NPA:
  do br = 0, 63
    do bc = 0, 63
      do r = 0, 7
        A(br, bc, r, )
      enddo
    enddo
  enddo
pipelined with the second NPA:
  do br = 0, 63
    do bc = 0, 63
      do c = 0, 7
        A(br, bc, , c)
      enddo
    enddo
  enddo
Memory for A:
  • How to schedule the computations?
  • How to allocate elements of A in local memory so
    as to reduce its size?
  • a) Full array: 256K elements. b) Optimized size:
    112 elements (< 2 blocks).

A(br, bc, r, c) is mapped to (r mod 4, 16(br+bc) +
2r + c mod 28).
6
Outline
  • Introduction.
  • Lattice-based memory allocation
  • Definition of modular allocations.
  • Conflicting indices and critical lattices.
  • Examples of limitations of previous work.
  • Main results and open questions.

7
Memory Reduction Problem for Arrays
Given a scheduled program (i.e., operations are
not reordered), or several communicating
programs, find the minimal memory size to store
intermediate values and an adequate memory
mapping.
  • Lifetime analysis:
  • Schedule of computations → lifetime for each
    value (similar to dependence analysis, exact or
    over-approximated).
  • Memory reuse:
  • Values simultaneously live should not share the
    same location (constraints similar to register
    allocation).
  • Restrict to simple addressing functions (for
    code generation):
  • canonical linearization, linear mapping in
    multi-dimensional arrays,
  • wrapping with modulo
    operations (reuse).
  • → All are special cases of modular memory
    allocations.

8
Modular Mappings
  • Generalization of (rotating) registers to higher
    dimensions.
  • The value indexed by i is written at the
    multi-dimensional position Mi mod b, where b is a
    positive integral vector and M an integral matrix.
  • Ex: i = (i1, i2) is stored at address (2i1 + i2 mod 3,
    i1 + i2 mod 6) → b = (3, 6), size 18.

Given a schedule and a lifetime analysis, find a
valid allocation (M, b) such that the product of
the components of b (memory size) is minimized.
  • Generalizes all previous approaches:
  • De Greef, Catthoor, De Man (1996-1997):
    linearizations + 1 modulo.
  • Lefebvre, Feautrier (1996-1997): successive
    modulos.
  • Wilde, Rajopadhye (1996), Quilleré, Rajopadhye
    (2000): projections.
  • Strout, Carter, Ferrante, Simon (ASPLOS'98): only
    1 modulo.
  • Thies, Vivien, Sheldon, Amarasinghe (PLDI'01):
    same.
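
A minimal sketch of what a modular mapping computes: the function below evaluates Mi mod b component-wise and the memory size as the product of the moduli. It assumes the slide's example reads (2·i1 + i2 mod 3, i1 + i2 mod 6), i.e., that the plus signs were lost in extraction; the names `modular_address` and `memory_size` are illustrative, not taken from the talk.

```python
from math import prod

def modular_address(M, b, i):
    """Return the multi-dimensional address M*i mod b (component-wise)."""
    return tuple(sum(m * x for m, x in zip(row, i)) % bk
                 for row, bk in zip(M, b))

def memory_size(b):
    """Memory size of a modular mapping: product of the moduli."""
    return prod(b)

# Assumed reading of the slide's example: (2*i1 + i2 mod 3, i1 + i2 mod 6).
M = [[2, 1],
     [1, 1]]
b = (3, 6)

print(modular_address(M, b, (4, 5)))  # (1, 3)
print(memory_size(b))                 # 18
```

Python's `%` always returns a non-negative result for a positive modulus, so negative indices wrap as expected.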

9
Our Main Contributions
Thies et al., PLDI'01: There is a need for a
technique able to consider more general storage
mappings and that would allow variations in the
number of array dimensions, while still capturing
the directional and modular reuse of the
occupancy vector.
  • We identify the fundamental object to work with:
  • The set S of all differences of conflicting
    indices.
  • We show the link with critical lattices:
  • Finding the best allocation Mi mod b among ALL
    possible modular allocations amounts to finding the
    critical integer lattice for the set S.
  • We give guaranteed heuristics to approximate the
    optimal:
  • → It explains previous work.
  • → It gives new (and better) solutions.
  • → It shows the link with theoretical work on
    successive minima, basis reduction, Minkowski's
    theorems, etc.

10
Outline
  • Introduction.
  • Lattice-based memory allocation.
  • Examples of previous work limitations:
  • they rely on particular linearizations,
  • or may wrap along the wrong axis.
  • Main results and open questions.

11
De Greef, Catthoor, and De Man
  • Were the first to identify the need for memory
    reduction techniques for embedded multimedia
    applications. → Patent (1996) for intra- and
    inter-array memory reuse.
  • Inter-array reuse:
  • Geometrical heuristics for packing different
    arrays in a given memory buffer. → Will not be
    discussed here.
  • Intra-array memory reuse:
  • Consider each original d-dimensional array and
    its 2^d d! canonical linearizations. (Example in 2D:
    for an NxM array, look at 8 linearizations: Mi+j,
    Mi-j, -Mi+j, -Mi-j, i+Nj, i-Nj, -i+Nj, -i-Nj.)
  • Compute the maximal address difference D between
    two simultaneously live values.
  • Select the linearization with the smallest distance D
    and wrap the array modulo (D+1).

12
De Greef, Catthoor, De Man: Example 1
Original code:
  do i = 1, N
    do j = 1, N
      a(i,j) = ...
      b(i,j) = a(i-1,j)
    enddo
  enddo
With the linearization Ni+j wrapped modulo (N+1):
  do i = 1, N
    do j = 1, N
      a(Ni+j mod (N+1)) = ...
      b(i,j) = a(Ni+j+1 mod (N+1))
    enddo
  enddo
Equivalently, since N = -1 modulo (N+1):
  do i = 1, N
    do j = 1, N
      a(-i+j mod (N+1)) = ...
      b(i,j) = a(-i+j+1 mod (N+1))
    enddo
  enddo
Column-major order (Fortran-like): i+Nj,
maximal distance N(N-1)+1. Row-major order
(C-like): Ni+j, maximal distance N. → Best
canonical linearization: Ni+j mod (N+1).
13
De Greef, Catthoor, De Man: Example 2
How could we have missed this?
Original code:
  do i = 1, N
    do j = 1, N
      a(i,j) = ...
      b(i,j) = a(i-1,j)
    enddo
  enddo
Skewed schedule:
  do t = 2, 2N   /* t = i+j */
    do j = max(1,t-N), min(N,t-1)
      a(t-j,j) = ...
      b(i,j) = a(t-j-1,j)
    enddo
  enddo
Skewed schedule with a one-dimensional array:
  do t = 2, 2N   /* t = i+j */
    do j = max(1,t-N), min(N,t-1)
      a(t-j) = ...
      b(i,j) = a(t-j-1)
    enddo
  enddo
Any canonical linearization leads to a distance
Θ(N^2)! But the allocation i mod N, or even i, is
just fine!
14
Lefebvre and Feautrier
  • Developed in the context of parallelizing
    compilers:
  • a) Eliminate spurious memory dependences thanks
    to single-assignment form. b) Wrap memory back
    when possible.
  • Inter-array reuse:
  • Coloring heuristics on array names (as for
    register allocation).
  • Intra-array memory reuse:
  • Idea 1: forget about original arrays, focus on
    original loop indices.
  • Idea 2: wrap successively in each dimension with
    modulos.
  • → From a computational point of view, use classical
    techniques based on (rational) linear programming.

15
Lefebvre, Feautrier: Example 1 revisited
Original code:
  do i = 1, N
    do j = 1, N
      a(i,j) = ...
      b(i,j) = a(i-1,j)
    enddo
  enddo
After memory reduction:
  do i = 1, N
    do j = 1, N
      a(i mod 2, j) = ...
      b(i,j) = a(i-1 mod 2, j)
    enddo
  enddo
Along i, maximal distance 1 → i mod 2. Along j
(for a fixed i), maximal distance N-1 → j mod
N, i.e., j. → Selected allocation: (i mod 2, j),
with memory size 2N (note: N+1 in the previous
solution).
16
Lefebvre, Feautrier: Example 2 revisited
Original code:
  do i = 1, N
    do j = 1, N
      a(i,j) = ...
      b(i,j) = a(i-1,j)
    enddo
  enddo
Skewed schedule after memory reduction:
  do t = 2, 2N   /* t = i+j */
    do j = max(1,t-N), min(N,t-1)
      a(t-j) = ...
      b(i,j) = a(t-j-1)
    enddo
  enddo
Along i, maximal distance N-1 → i mod N, i.e.,
i. Along j (for a fixed i), maximal distance 0
→ no extra dimension. → Selected allocation: i mod
N, i.e., i. (Note: order N^2 in the previous
solution.)
17
Lefebvre, Feautrier: Example 3
  do i = 1, N
    do j = 1, N
      a(i,j) = ...
    enddo
  enddo
pipelined 1 clock cycle later with
  do i = 1, N
    do j = 1, N
      b(i,j) = a(i,j) ...
    enddo
  enddo
  • Along i, maximal distance 1 → i mod 2.
  • Along j (for a fixed i), maximal distance 1 → j
    mod 2.
  • Selected allocation: (i mod 2, j mod 2), of size
    4. OK.

18
Lefebvre, Feautrier: Example 3 (variant)
  do t = 2, 2N   /* t = i+j */
    do j = max(1,t-N), min(N,t-1)
      a(t-j,j) = ...
    enddo
  enddo
pipelined 1 clock cycle later with
  do t = 2, 2N   /* t = i+j */
    do j = max(1,t-N), min(N,t-1)
      b(t-j,j) = a(t-j,j) ...
    enddo
  enddo
  • Along i, maximal distance N-1 → i mod N.
  • Along j (for a fixed i), maximal distance 0 → j mod 1.
  • Corresponding memory size: N (versus 4 for the
    unskewed loops).
  • Same if starting with j. FAIL!
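
The successive-modulo mechanism used in the four examples above can be sketched as follows. The lifetime models are assumptions made for illustration: in the single-loop examples, a(i,j) is written in iteration (i,j) and last read in iteration (i+1,j); in the pipelined examples, the value written at step k is read at step k+1. Function names are hypothetical. Dimension k is wrapped modulo 1 plus the largest |d_k| among conflicting differences whose earlier components are all zero.

```python
from itertools import combinations

def differences(life):
    """Index differences of values whose (closed) lifetimes overlap."""
    ds = set()
    for (p, (bp, dp)), (q, (bq, dq)) in combinations(life.items(), 2):
        if bp <= dq and bq <= dp:
            d = tuple(x - y for x, y in zip(p, q))
            ds.update({d, tuple(-x for x in d)})
    return ds

def successive_modulos(ds, n):
    """Modulus k = 1 + max |d_k| over differences with d_1 = ... = d_{k-1} = 0."""
    return tuple(
        1 + max((abs(d[k]) for d in ds if not any(d[:k])), default=0)
        for k in range(n))

def same_loop(order):
    """a(i,j) is written in iteration (i,j), last read in iteration (i+1,j)."""
    pos = {p: k for k, p in enumerate(order)}
    return {(i, j): (pos[(i, j)], pos.get((i + 1, j), pos[(i, j)]))
            for (i, j) in order}

def pipelined(order):
    """The value written at step k is read one step later by the second loop."""
    return {p: (k, k + 1) for k, p in enumerate(order)}

N = 6
row_major = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]
skewed = [(t - j, j) for t in range(2, 2 * N + 1)
          for j in range(max(1, t - N), min(N, t - 1) + 1)]

ex1 = successive_modulos(differences(same_loop(row_major)), 2)
ex2 = successive_modulos(differences(same_loop(skewed)), 2)
ex3 = successive_modulos(differences(pipelined(row_major)), 2)
ex3v = successive_modulos(differences(pipelined(skewed)), 2)
print(ex1, ex2, ex3, ex3v)
```

With N = 6 this reproduces the moduli quoted on the slides: (2, N), (N, 1), (2, 2), and (N, 1) for the skewed pipelined variant, where wrapping along i first wastes memory.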

19
Outline
  • Introduction.
  • Lattice-based memory allocation.
  • Examples of previous work limitations.
  • Main results and open questions:
  • There is no way to explain all details quickly, even to
    experts in lattice theory and reduction theory...
  • See the CASES'03 proceedings, the research report
    (http://perso.ens-lyon.fr/alain.darte), or the IEEE
    TC journal version (to appear).
  • But I can try to:
  • Explain the basic concepts of critical lattices and
    modular allocations.
  • Illustrate different mechanisms.
  • State results.

20
There was a Need for a Framework for Memory
Reduction Based on Modular Allocations
  • Lower bounds
  • Given a lifetime analysis, can we give a lower
    bound for the best achievable memory size? What
    is the best modular memory allocation?
  • Upper bounds
  • Can we find mechanisms leading to allocations
    whose corresponding memory size is not
    arbitrarily bad compared to the lower bound
    (guaranteed heuristics)?
  • Robustness
  • We need a framework that can possibly capture
    parameters, that does not depend on the basis in
    which the problem is described, etc. →
    Geometrical model.
  • Computability
  • We need to make sure the mechanisms are
    constructive and lead to heuristics (or
    algorithms) that can be implemented.

21
Set of Conflicting Index Differences
  • Index description:
  • Choose an index description for values that are
    going to share a given array (the allocation will
    be linear with respect to these indices).
    Typically, loop indices, array indices, etc.
  • Set of conflicting index differences:
  • Build the set CS of pairs of conflicting (i.e.,
    simultaneously live) indices (i,j), and the set
    DS of differences (i-j).
  • We want: $\forall (i,j) \in CS,\ i \neq j \Rightarrow Mi \bmod b \neq Mj \bmod b$,
    or equivalently
  • $\forall d \in DS,\ d \neq 0 \Rightarrow Md \bmod b \neq 0$,
    or equivalently

$Md \bmod b = 0,\ d \in DS \ \Rightarrow\ d = 0$
22
Admissible and Critical Lattices
  • The kernel of (M,b):
  • The set $\Lambda = \{ i \mid Mi \bmod b = 0 \}$ is a
    full-dimensional lattice.
  • (M,b) is valid iff $\Lambda \cap DS \subseteq \{0\}$, i.e., $\Lambda$ is an
    admissible lattice for DS.
  • Conversely:
  • If A is a basis for $\Lambda$, an admissible integral
    lattice for DS, compute the Smith form $A = Q_1 S Q_2$
    with $Q_1$ and $Q_2$ unimodular, $S = \mathrm{diag}(b)$.
  • The mapping (M,b), where M is the inverse of $Q_1$,
    has kernel $\Lambda$, thus is a valid allocation with
    memory size $\det(S) = \det(\Lambda)$.
  • → The modular allocation with smallest memory
    size corresponds to a critical integer lattice
    for DS, i.e., an admissible integer lattice for
    DS with smallest determinant.
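
As a worked instance of this construction, take the lattice basis (4,3), (8,0) quoted in the toy example of this talk, written as the columns of A. The unimodular matrices below are one possible choice, computed by hand for illustration; other choices give equivalent mappings.

```latex
A = \begin{pmatrix} 4 & 8 \\ 3 & 0 \end{pmatrix}
  = \underbrace{\begin{pmatrix} 4 & -1 \\ 3 & -1 \end{pmatrix}}_{Q_1}
    \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 24 \end{pmatrix}}_{S}
    \underbrace{\begin{pmatrix} 1 & 8 \\ 0 & 1 \end{pmatrix}}_{Q_2},
\qquad
M = Q_1^{-1} = \begin{pmatrix} 1 & -1 \\ 3 & -4 \end{pmatrix},
\qquad
b = (1, 24).
```

The first component, (i - j) mod 1, is always 0, so the mapping reduces to the one-dimensional allocation 3i - 4j mod 24, of size det(S) = 24.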

23
Modular Mappings: Toy Example
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
24
Modular Mappings: Toy Example
Bounding box: (i mod 9, j mod 6) → Size 54
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
25
Modular Mappings: Toy Example
Successive modulos: (i mod 9, j mod 5) → Size
45
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
26
Modular Mappings: Toy Example
Skewed bounding box: (i-j mod 8, j mod 6) → Size
48
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
27
Modular Mappings: Toy Example
Skewed successive modulos: (i-j mod 8, j mod 4)
→ Size 32
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
28
Modular Mappings: Toy Example
Better allocation: (i-j mod 7, j mod 4) → Size
28
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
29
Modular Mappings: Toy Example
Critical lattice basis: (4,3), (8,0) → Best
allocation: (3i-4j mod 24).
Corners: (-1,5), (1,-5), (8,1), (-8,-1)
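
The validity of each allocation in the toy example can be checked by brute force. The sketch below assumes that the set of conflicting differences consists of all integer points of the parallelogram with the four listed corners; the two inequalities describing it were derived from those corners for this illustration, and the helper names are hypothetical. A mapping (M, b) is accepted when no nonzero point d of the set satisfies Md mod b = 0.

```python
from math import prod

def integer_points():
    """Integer points of K = conv{(-1,5), (1,-5), (8,1), (-8,-1)}.

    Edge inequalities derived from the corners: |4i + 9j| <= 41, |6i - 7j| <= 41.
    """
    return [(i, j) for i in range(-8, 9) for j in range(-5, 6)
            if abs(4 * i + 9 * j) <= 41 and abs(6 * i - 7 * j) <= 41]

def is_valid(M, b, points):
    """True iff no nonzero difference d in points has M d mod b = 0."""
    for d in points:
        if d != (0, 0) and all(
                sum(m * x for m, x in zip(row, d)) % bk == 0
                for row, bk in zip(M, b)):
            return False
    return True

K = integer_points()
allocations = {
    "bounding box":              ([[1, 0], [0, 1]], (9, 6)),
    "successive modulos":        ([[1, 0], [0, 1]], (9, 5)),
    "skewed bounding box":       ([[1, -1], [0, 1]], (8, 6)),
    "skewed successive modulos": ([[1, -1], [0, 1]], (8, 4)),
    "better allocation":         ([[1, -1], [0, 1]], (7, 4)),
    "critical lattice":          ([[3, -4]], (24,)),
}
for name, (M, b) in allocations.items():
    print(f"{name}: size {prod(b)}, valid = {is_valid(M, b, K)}")
```

Shrinking the last modulus to 23 breaks validity in this model, because the corner (1,-5) then maps to 0.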
30
Results for 0-Symmetric Convex Bodies
  • We work with a 0-symmetric polytope K such that
    $DS \subseteq K$. (Actually, we assume that the vector
    spaces generated by the points in K and by the
    integer points in K are equal → K is
    full-dimensional.)
  • Lower bound in terms of volume: $\mathrm{Vol}(K)/2^n$.
  • Optimal solution found by optimized enumeration
    + ILP.
  • Heuristics exist with memory size $\le c_n \mathrm{Vol}(K)$,
    where $c_n$ depends on the dimension n only →
    guaranteed heuristics.
  • One heuristic uses exactly the Lefebvre-Feautrier
    mechanism, but in a well-chosen basis. It is always
    equivalent (i.e., with the same memory size) to a
    particular linearization (1D mapping).
  • Another heuristic (Rogers' principle) works even
    for arbitrary sets, but the equivalent linearization
    is not clear.
  • In practice: follow the schedule, when possible...

Reference: Gruber and Lekkerkerker, Geometry of
Numbers.
31
Remarks on critical lattices
  • a) Hard to find the critical lattice, starting
    from 3D, even for simple bodies. b) Critical
    integer lattice $\approx$ critical lattice for large
    bodies. → Hard to find the optimal, heuristics
    needed.
  • Lower bound in terms of volume: $\Delta(K) \ge \mathrm{Vol}(K)/2^n$.
  • If $S - S \subseteq K$, then all elements in S are mapped to
    different locations → $\Delta(K) \ge \mathrm{Card}(S)$.
  • Minkowski's first theorem: if $\Lambda$ is a lattice and
    K is 0-symmetric with $\mathrm{Vol}(K) \ge 2^n \det(\Lambda)$, then K
    contains a nonzero lattice point of $\Lambda$.
  • Gauge function: $F(x) = \inf\{\lambda > 0 \mid x \in \lambda K\}$ is a
    distance function.
  • Successive minima: $\lambda_i(K) = \inf\{\lambda > 0 \mid \dim(\mathrm{Vect}(\lambda K \cap \mathbb{Z}^n)) \ge i\}$.
  • Minkowski's second theorem:

$\frac{2^n}{n!}\det(\Lambda) \le \lambda_1(K) \cdots \lambda_n(K)\,\mathrm{Vol}(K) \le 2^n \det(\Lambda)$
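
For the two-dimensional toy polytope with corners (-1,5), (1,-5), (8,1), (-8,-1), the volume lower bound can be evaluated directly. This is a small sketch; `polygon_area` is an illustrative helper that applies the shoelace formula to the corners listed in boundary order.

```python
from fractions import Fraction
from math import ceil

def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in boundary order."""
    s = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x0 * y1 - x1 * y0
    return Fraction(abs(s), 2)

corners = [(-1, 5), (8, 1), (1, -5), (-8, -1)]   # boundary order
n = 2
vol = polygon_area(corners)
bound = vol / 2 ** n
print(vol, bound, ceil(bound))   # 82 41/2 21
```

Any admissible integer lattice for this polytope therefore has determinant at least 21, which is consistent with the size 24 reported for the critical lattice in the toy example.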
32
Looking for the optimal solution
  • Generate all possible lattices of a given
    determinant.
  • Avoid duplicates: each lattice is uniquely
    determined by its Hermite form (triangular
    matrix).
  • (Remark: it is not clear we could do the same for
    non-equivalent mappings without reasoning with
    the corresponding lattices.)
  • Check that the lattice is admissible for K,
    either by ILP, or by enumeration if the integer
    points in K can be enumerated.
  • For the DCT example:
  • in 4D, optimal 112: there are 86,416,644
    lattices to check; it takes roughly 2 days!
  • rewritten in 3D, optimal 112: there are 941,901
    lattices to check; it takes roughly 30 minutes.
  • → Feasible only for small sets K and small
    dimensions.
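
The exhaustive search described above can be sketched as follows, under stated assumptions: lattices are represented by lower-triangular Hermite forms whose rows generate the lattice, admissibility is checked by enumerating integer points, and the body is the two-dimensional toy polytope with corners (-1,5), (1,-5), (8,1), (-8,-1) rather than the DCT example. All names are illustrative, and this is not the authors' implementation.

```python
from itertools import product

def diagonals(n, det):
    """All ways to write det as an ordered product of n positive integers."""
    if n == 1:
        yield (det,)
        return
    for h in range(1, det + 1):
        if det % h == 0:
            for rest in diagonals(n - 1, det // h):
                yield (h,) + rest

def hermite_forms(n, det):
    """Lower-triangular Hermite forms with 0 <= H[i][j] < H[j][j] for i > j."""
    slots = [(i, j) for i in range(n) for j in range(i)]
    for diag in diagonals(n, det):
        for vals in product(*(range(diag[j]) for _, j in slots)):
            H = [[diag[i] if i == j else 0 for j in range(n)]
                 for i in range(n)]
            for (i, j), v in zip(slots, vals):
                H[i][j] = v
            yield H

def in_lattice(H, v):
    """Is v an integer combination of the rows of lower-triangular H?"""
    v = list(v)
    for k in reversed(range(len(H))):
        q, r = divmod(v[k], H[k][k])
        if r:
            return False
        v = [x - q * h for x, h in zip(v, H[k])]
    return True

def smallest_admissible(points, n, max_det):
    """First (det, H) such that no nonzero point of `points` lies in the lattice."""
    for det in range(1, max_det + 1):
        for H in hermite_forms(n, det):
            if not any(in_lattice(H, p) for p in points if any(p)):
                return det, H
    return None

# Toy polytope; inequalities derived from its corners for this illustration.
K = [(i, j) for i in range(-8, 9) for j in range(-5, 6)
     if abs(4 * i + 9 * j) <= 41 and abs(6 * i - 7 * j) <= 41]

det, H = smallest_admissible(K, 2, 30)
print(det, H)
```

The number of Hermite forms grows quickly with the determinant and the dimension, which illustrates why the slide reports millions of lattices for the DCT example. The lattice with basis (4,3), (8,0) corresponds to the Hermite form [[8, 0], [4, 3]] in this representation.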

33
Rogers heuristic adapted
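  • Choose n positive integers $\rho_i$ such that $\rho_i$ is a
    multiple of $\rho_{i+1}$ and $\dim(L_i) \le i-1$, where
    $L_i = \mathrm{Vect}(K/\rho_i \cap \mathbb{Z}^n)$.
  • Choose a basis $(a_1, \dots, a_n)$ of $\mathbb{Z}^n$ such that
    $L_i \subseteq \mathrm{Vect}(a_1, \dots, a_{i-1})$.
  • Define $\Lambda$, the lattice generated by the vectors
    $\rho_i a_i$.
  • → $\det(\Lambda) \le n!\,\mathrm{Vol}(K)$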

34
Heuristic based on K (i.e., lattice)
  • Choose n linearly independent integer vectors
    $(a_1, \dots, a_n)$.
  • Compute $F_i(a_i) = \inf\{F(y) \mid y \in a_i + \mathrm{Vect}(a_1, \dots, a_{i-1})\}$.
  • Choose n integers $\rho_i$ such that $\rho_i F_i(a_i) > 1$.
  • Define $\Lambda$, the lattice generated by the vectors
    $\rho_i a_i$.
  • → $\det(\Lambda) \le (n!)^2\,\mathrm{Vol}(K)$ if $F_i(a_i) \le 1$ for all i

35
Heuristic based on K* (i.e., mapping)
  • K*: dual (or polar reciprocal) of K,
    $K^* = \{ y \mid y \cdot x \le 1 \text{ for all } x \in K \}$.
  • $K^{**} = K$, $F^*$ related to F, Vol(K*) related to
    Vol(K), successive minima related, etc.
  • Choose n linearly independent integer vectors
    $(c_1, \dots, c_n)$.
  • Compute $F_i^*(c_i) = \sup\{ c_i \cdot x \mid x \in K,\ c_1 \cdot x = \dots = c_{i-1} \cdot x = 0 \}$.
  • Choose n integers $\rho_i$ such that $\rho_i > F_i^*(c_i)$.
  • Define the mapping (M,b) with the $c_i$ as rows of M
    and $b = \rho$.
  • → $\det(\Lambda) \le (n!)^2\,\mathrm{Vol}(K)$ if $F_i^*(c_i) \ge 1$ for all
    i
  • → Dual of the previous heuristic. Exactly
    Lefebvre-Feautrier in a well-chosen basis.
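
The mapping-based heuristic can be illustrated on the two-dimensional toy polytope with corners (-1,5), (1,-5), (8,1), (-8,-1). The sketch below is an integer variant, which is an assumption of this illustration: instead of the supremum over K, it takes the maximum over the integer points of K, and uses the modulus 1 + that maximum. The inequalities describing K were derived from the corners, and the function names are hypothetical.

```python
def dot(c, d):
    return sum(x * y for x, y in zip(c, d))

def integer_points():
    """Integer points of the toy polytope (inequalities derived from its corners)."""
    return [(i, j) for i in range(-8, 9) for j in range(-5, 6)
            if abs(4 * i + 9 * j) <= 41 and abs(6 * i - 7 * j) <= 41]

def successive_modulos_in_basis(C, points):
    """Rows c_1..c_n of C give the mapping; modulus k is
    1 + max |c_k . d| over points d with c_1 . d = ... = c_{k-1} . d = 0."""
    b = []
    for k, ck in enumerate(C):
        vals = [abs(dot(ck, d)) for d in points
                if all(dot(c, d) == 0 for c in C[:k])]
        b.append(1 + max(vals))
    return tuple(b)

K = integer_points()
canonical = successive_modulos_in_basis([(1, 0), (0, 1)], K)
skewed = successive_modulos_in_basis([(1, -1), (0, 1)], K)
print(canonical, skewed)   # (9, 5) (8, 4)
```

These are the "successive modulos" (size 45) and "skewed successive modulos" (size 32) allocations of the toy example: the same mechanism gives a smaller memory when applied in a better-chosen basis.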

36
Important practical factors
  • The set DS can be skewed for 3 reasons:
  • Skewed iteration sub-domain with respect to the full
    domain.
  • Skewed schedule with respect to the iteration domain.
  • Skewed access function when reasoning with array
    indices.
  • → In practice, following the schedule, if it
    is expressed as a basis, is not too bad.
  • → But ad-hoc counter-examples can be built, and the
    schedule basis may be hidden in a linearized
    schedule.

37
Open or On-Going Questions
  • How much do we lose if we restrict to 1D
    mappings?
  • How much do we lose, when restricting to modular
    mappings, compared to MAXLIVE?
  • Mixing both Lefebvre-Feautrier (successive
    modulos) and Quilleré-Rajopadhye (choice of
    basis) is often OK (i.e., follow the schedule and
    wrap). Can we quickly identify when?
  • How costly and how good are the heuristics in
    practice?
  • How to handle more general cases (union of
    polyhedra for conflicting differences, multiple
    arrays, etc.)?
  • Can this be used as a basis for solving the
    general problem (i.e., find the schedule with
    minimal memory requirements)?
  • Fully implemented in Cl@K; parameters still in
    progress.