Loop Transformations

Provided by: barbara179
1
Loop Transformations
  • Vectorization
  • Statement Reordering
  • Loop distribution

2
Strategies for Optimization
  • Loop transformations can be used systematically
    in a variety of different ways to achieve a
    desired program form.
  • This particular strategy was developed to
    generate code for vector supercomputers
  • It can also be used for vector co-processors and,
    with few modifications, for SIMD co-processors

3
The xvector option
  • Transforms loops that use an intrinsic function into a
    single call to a faster vectorized equivalent, e.g.
      for (i = 0; i < N; i++)
        x[i] = exp(y[i]);
  • Syntax: -xvector=[yes|no]
  • These vector functions are part of libmvec
    (see man libmvec)
  • It also has to be added on the link line

4
Vectorization
  • Targets: vector supercomputers, vector co-processors and,
    with few modifications, SIMD co-processors
  • Goals
      • generate array syntax
      • convert Fortran 77 to Fortran 90
      • create vector code for vector supercomputers or
        vector co-processors
  • Approach
      • convert as many statements in program loops as
        possible into vector form

5
Vector Code
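  • A(2:101) = B(1:100) + A(1:100)
  • The right-hand side is evaluated in full, then the
    left-hand side is assigned
  • The above is not the same as
      DO I = 2, 101
        A(I) = B(I-1) + A(I-1)
      END DO
  • True dependence! Iteration I reads A(I-1), which
    iteration I-1 has just written.
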
6
Vector Code
  • A(2:101) = B(1:100) + A(1:100)
  • is the same as
      DO I = 2, 101
        TEMP(I) = B(I-1) + A(I-1)
      END DO
      DO I = 2, 101
        A(I) = TEMP(I)
      END DO
  • Neither loop carries a dependence.
  • The first loop must finish before the second one starts!

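
The difference between the sequential loop and the vector statement can be checked with a small sketch. It assumes the stripped operator is `+`; the function names and the test data are made up for illustration, and Python list slices stand in for Fortran 90 array sections.

```python
def sequential_loop(a, b):
    """DO I = 2, 101: A(I) = B(I-1) + A(I-1); each iteration sees earlier writes."""
    a = a[:]  # work on a copy
    for i in range(2, 102):
        a[i] = b[i - 1] + a[i - 1]
    return a


def vector_statement(a, b):
    """A(2:101) = B(1:100) + A(1:100); the whole right-hand side is evaluated first."""
    a = a[:]
    temp = [b[i - 1] + a[i - 1] for i in range(2, 102)]  # first loop: fill TEMP
    a[2:102] = temp                                      # second loop: store
    return a


# Index 0 is unused so that the indices match the Fortran code.
a0 = [0] * 102
b0 = [1] * 102
seq = sequential_loop(a0, b0)
vec = vector_statement(a0, b0)
print(seq[101], vec[101])  # prints: 100 1
```

The loop accumulates along the recurrence, while the vector form only ever reads the old values of A, so the two results differ.
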
7
Vectorization
  • Transformation generates a vector statement to
    replace a statement within a loop nest.
  • It is applied to individual statements.
  • But we must first generate a series of loop nests
    in which the loop body consists of a single
    statement.
  • Vectorization is legal if it preserves all
    dependences of the original code.

8
Example
      DO I = 1, N
        A(I) = B(I) + C(I)
        D(I) = A(I-1) * D(I)
      END DO
  • is equivalent to
      DO I = 1, N
        A(I) = B(I) + C(I)
      END DO
      DO I = 1, N
        D(I) = A(I-1) * D(I)
      END DO
  • The first loop becomes A(1:N) = B(1:N) + C(1:N)
  • The second loop becomes D(1:N) = A(0:N-1) * D(1:N)

9
Strategy
  • First, loop distribution is used to replace a
    loop nest by multiple loops with a single
    statement each
  • and then vector code generation is used to
    vectorize the statement in each resulting loop.
  • Works so long as there is no dependence cycle
    involving these statements.
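
As a rough check of this strategy, the sketch below runs the two-statement loop from the preceding example and its distributed, vectorized form on the same data. The operators `+` and `*`, the function names and the sample values are assumptions; list index 0 stands for A(0).

```python
def original(b, c, d, a_zero, n):
    """One loop with both statements in its body."""
    a = [a_zero] + [0] * n
    d = d[:]
    for i in range(1, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * d[i]
    return a, d


def distributed_and_vectorized(b, c, d, a_zero, n):
    """Two vector statements, one per distributed loop."""
    a = [a_zero] + [0] * n
    d = d[:]
    a[1:n + 1] = [b[i] + c[i] for i in range(1, n + 1)]      # A(1:N) = B(1:N) + C(1:N)
    d[1:n + 1] = [a[i - 1] * d[i] for i in range(1, n + 1)]  # D(1:N) = A(0:N-1) * D(1:N)
    return a, d


b = [0, 1, 2, 3, 4, 5]
c = [0, 10, 20, 30, 40, 50]
d = [0, 2, 2, 2, 2, 2]
same = original(b, c, d, 7, 5) == distributed_and_vectorized(b, c, d, 7, 5)
```

Here `same` is true because the only dependence runs forward, from the statement defining A to the statement using it.
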

10
In General
  • If there is a dependence with source in a
    statement that comes textually after the sink
    statement, we re-order the statements in the loop
    first.
  • The vector statement defining new values must be
    executed before the vector statement using them.

11
Statement Reordering Example
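      DO I = 1, N
        A(I) = B(I) + C(I)
        D(I) = A(I+1) * D(I)
      END DO
  • There is an antidependence from the second statement to
    the first: iteration I reads A(I+1), which iteration I+1
    overwrites.
  • The loop is equivalent to
      DO I = 1, N
        D(I) = A(I+1) * D(I)
        A(I) = B(I) + C(I)
      END DO
  • and, after loop distribution, to
      DO I = 1, N
        D(I) = A(I+1) * D(I)
      END DO
      DO I = 1, N
        A(I) = B(I) + C(I)
      END DO
  • The first loop becomes D(1:N) = A(2:N+1) * D(1:N)
  • The second loop becomes A(1:N) = B(1:N) + C(1:N)
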
12
Statement Reordering
  • The statement reordering transformation has many
    uses
  • Synopsis: exchange the position of two adjacent
    statements in the body of a loop.
  • Legality test: valid iff there is no
    loop-independent dependence between the pair of
    statements.

13
Statement Reordering
  • Notation: S bef S' means that S appears in the source
    text before S'. S δ_k S' is a dependence carried at loop
    level k; k = ∞ marks a loop-independent dependence.
  • Exchanges the text position of two adjacent
    statements in a loop.
  • Given loop L containing adjacent statements S,
    S', with S bef S', where S δ_∞ S' does not hold:
    creates loop L' with S and S' swapped.
  • Suppose S1 δ S2 for statements S1 and S2. Then
    there are iteration vectors i1 and i2 such that
    S1(i1) δ_k S2(i2), and the former defines a
    value used by the latter.
  • If k = ∞, swapping would change the meaning of the
    loop. So this is what we must test for.
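
The legality test can be illustrated by running a loop body in both statement orders. This is only a sketch: the statements, the helper `run` and the data are hypothetical, chosen so that one pair is linked by a loop-carried antidependence only and the other by a loop-independent true dependence.

```python
def run(body, env, n):
    """Execute the statements of `body` in order for I = 1..n on a copy of env."""
    env = {name: values[:] for name, values in env.items()}
    for i in range(1, n + 1):
        for statement in body:
            statement(env, i)
    return env


def s1(env, i):              # A(I) = B(I) + C(I)
    env["A"][i] = env["B"][i] + env["C"][i]


def s2_carried(env, i):      # D(I) = A(I+1) * D(I): loop-carried antidependence only
    env["D"][i] = env["A"][i + 1] * env["D"][i]


def s2_independent(env, i):  # D(I) = A(I) * D(I): loop-independent true dependence
    env["D"][i] = env["A"][i] * env["D"][i]


data = {"A": [0, 1, 2, 3, 4, 5], "B": [0, 1, 1, 1, 1, 0],
        "C": [0, 5, 5, 5, 5, 0], "D": [0, 2, 2, 2, 2, 0]}
n = 4
swap_ok = run([s1, s2_carried], data, n) == run([s2_carried, s1], data, n)
swap_bad = run([s1, s2_independent], data, n) == run([s2_independent, s1], data, n)
```

`swap_ok` is true and `swap_bad` is false: only the loop-independent dependence is sensitive to the textual order inside one iteration.
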

14
Loop Distribution
  • Used to isolate some statements in a loop nest
    from other statements.
  • Simple form of loop distribution attempts to
    transform a loop nest with multiple assignment
    statements in loop body to a series of loop
    nests, each containing a single assignment
    statement.
  • It is not necessary that the statements are all
    enclosed in exactly the same loop nest.

15
Loop Distribution Example
      DO K = 1, M
        A(K) = 0.
        DO J = 1, N
          A(K) = A(K) + B(J,K)
        END DO
      END DO
  • becomes
      DO K = 1, M
        A(K) = 0.
      END DO
      DO K = 1, M
        DO J = 1, N
          A(K) = A(K) + B(J,K)
        END DO
      END DO

16
Validity of Loop Distribution
  • Let S, S' be statements in a loop L.
  • A dependence S δ S' is backward iff S' bef S.
  • We call all other dependences forward.
  • A backward dependence is loop-carried.
    Distribution of L would destroy it.
  • Therefore loop distribution may be applied to a
    loop L iff all of its dependences are forward.

17
Illegal Loop Distribution Example
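      DO I = 1, N
        S1: A(I) = B(I) + C(I-1)
        S2: C(I) = B(I)
      END DO
  • is not equivalent to
      DO I = 1, N
        S1: A(I) = B(I) + C(I-1)
      END DO
      DO I = 1, N
        S2: C(I) = B(I)
      END DO
  • S2 δ S1 is a backward dependence: S1 reads C(I-1), which
    S2 wrote in the previous iteration.
  • Of course, if we had applied statement reordering
    first, the distribution would have been legal.
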
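
A quick experiment makes the point concrete. The sketch below assumes the stripped operator in S1 is `+`; the function names and the data are made up, and index 0 of each list stands for element 0 of the Fortran array.

```python
def original(b, c, n):
    a, c = [0] * (n + 1), c[:]
    for i in range(1, n + 1):
        a[i] = b[i] + c[i - 1]  # S1
        c[i] = b[i]             # S2
    return a, c


def distributed_s1_first(b, c, n):
    """Naive distribution: destroys the backward dependence S2 -> S1."""
    a, c = [0] * (n + 1), c[:]
    for i in range(1, n + 1):
        a[i] = b[i] + c[i - 1]  # S1 now reads only the old values of C
    for i in range(1, n + 1):
        c[i] = b[i]             # S2
    return a, c


def distributed_s2_first(b, c, n):
    """Statement reordering first, then distribution."""
    a, c = [0] * (n + 1), c[:]
    for i in range(1, n + 1):
        c[i] = b[i]             # S2
    for i in range(1, n + 1):
        a[i] = b[i] + c[i - 1]  # S1
    return a, c


b, c = [0, 1, 2, 3], [9, 9, 9, 9]
reference = original(b, c, 3)
```

With this data `distributed_s1_first` yields a different A than the original loop, while `distributed_s2_first` reproduces it.
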
18
Loop Distribution
  • Let L be a loop and L' be the code obtained by a
    legal distribution of loop L.
  • Then L' preserves the type of all dependences in
    L. But it does not preserve the level.
  • If S δ_∞ S' in L, then S δ_∞ S' in L'.
  • If S δ_c S in L, then S δ_c S in L'.
  • If S δ_c S' in L, where S != S', then S δ_∞ S'
    in L'.

19
Loop Distribution
  • A loop L can be transformed into an equivalent
    loop L' with no backward dependences by a sequence of
    valid statement reordering transformations iff its
    dependence graph
      • is acyclic, or
      • contains only single-statement cycles.
  • If vectorization is the objective, it can
    follow immediately.

20
Limits of Loop Distribution
  • If there is a non-trivial cycle in the dependence
    graph, e.g. S δ S' and S' δ S, then any legal statement
    reordering merely exchanges which of the dependences
    is backward.
  • This does not eliminate backward edges, so
    loop distribution cannot be applied to the
    resulting loop.

21
Loop Distribution
  • Assume loop L has an acyclic dependence graph.
  • 1. Number the nodes of the graph S1, .., Sn such that if
    Sj δ Sk, then j < k.
    Then if S bef S1, S δ S1 does not hold.
  • 2. So S1 can be moved to the first position in
    the loop by legal statement reorderings.
  • 3. Now assume the first k statements in the loop are
    S1, .., Sk. Let S be the statement immediately before
    S(k+1). Since S δ S(k+1) cannot hold, S(k+1) can be
    reordered to follow Sk.
  • The resulting loop has only forward edges.
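
The numbering in step 1 is a topological sort of the dependence graph. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later); the helper name `forward_order` and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter


def forward_order(statements, deps):
    """Order statements so that every dependence (source, sink) points forward.

    Raises graphlib.CycleError if the dependence graph is not acyclic.
    """
    sorter = TopologicalSorter()
    for statement in statements:
        sorter.add(statement)
    for source, sink in deps:
        sorter.add(sink, source)  # the source must come before the sink
    return list(sorter.static_order())


# Hypothetical loop body written in the order S1, S2, S3 with these dependences.
deps = [("S3", "S1"), ("S1", "S2"), ("S3", "S2")]
order = forward_order(["S1", "S2", "S3"], deps)
print(order)  # prints: ['S3', 'S1', 'S2']
```

A compiler would reach the same order through a sequence of adjacent swaps, as in steps 2 and 3; the sort only shows which final order those swaps aim for.
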

22
Loop Fission
  • Name sometimes given to more general form of loop
    distribution
  • Some loops are only partially distributive.
  • The general transformation separates distributive
    statements from non-distributive code.
  • The former can then be vectorized.

23
Loop Fission Example
      DO I = 1, N
        DO J = 1, M
          S1: A(I,J) = ..
          S2: A(I,J+1) = A(I,J)
        END DO
        S3: B(I) = X * Y(I)
        S4: D(I) = B(I) + 1
      END DO
  • Dependences: S1 δt S2 and S2 δo S1 (a cycle), S3 δt S4

24
Loop Fission Example
  • Not vectorizable:
      DO I = 1, N
        DO J = 1, M
          S1: A(I,J) = ..
          S2: A(I,J+1) = A(I,J)
        END DO
      END DO
  • Vectorizable:
      DO I = 1, N
        S3: B(I) = X * Y(I)
      END DO
      DO I = 1, N
        S4: D(I) = B(I) + 1
      END DO

25
Loop Fission Example
      DO I = 1, N
        DO J = 1, M
          S1: A(I,J) = ..
          S2: A(I,J+1) = A(I,J)
        END DO
      END DO
      S3: B(1:N) = X * Y(1:N)
      S4: D(1:N) = B(1:N) + 1
  • S1 and S2 remain in a dependence cycle (δt, δo)

26
Vector Code Generation
  • Replaces a single-statement loop that is not a
    recurrence by its vector (array) form
      DO I = L, U
        A(I) = B(I) + C(I+1)
      END DO
  • becomes
      A(L:U) = B(L:U) + C(L+1:U+1)

27
General Vector Code Generation
  • 1. Find cycles in dependence graph D.
  • 2. Create the acyclic condensation AC(D), a new
    graph in which each cycle is represented by a single
    node. Order its nodes.
  • 3. Perform a sequence of statement reordering
    transformations so that all statements in a cycle
    are adjacent, and the order of statements in
    different cycles corresponds to their order in
    AC(D).
  • 4. Apply loop distribution to this loop.

28
Example
      DO I = L, U
        S1: A(I) = B(I) + C(I+1)
        S2: B(I+1) = A(I-1) * D(I)
        S3: E(I) = E(I-1) + 1
        S4: C(I+2) = E(I) + F(I)
      END DO
  • S1 and S2 are in a dependence cycle
  • S3 has a single-statement cycle
  • S3 δ S4 and S4 δ S1
  • Dependence graph (all true dependences, δt): S1 → S2,
    S2 → S1, S3 → S3, S3 → S4, S4 → S1

29
Example
  • The acyclic condensation is S3 → S4 → {S1, S2}
  • Reordering and distributing the loop in this order gives
      DO I = L, U
        S3: E(I) = E(I-1) + 1
      END DO
      DO I = L, U
        S4: C(I+2) = E(I) + F(I)
      END DO
      DO I = L, U
        S1: A(I) = B(I) + C(I+1)
        S2: B(I+1) = A(I-1) * D(I)
      END DO

30
Example
  • Only the second loop is vectorizable
      DO I = L, U
        S3: E(I) = E(I-1) + 1
      END DO
      S4: C(L+2:U+2) = E(L:U) + F(L:U)
      DO I = L, U
        S1: A(I) = B(I) + C(I+1)
        S2: B(I+1) = A(I-1) * D(I)
      END DO
  • The first loop is a recurrence; the last loop contains a
    dependence cycle.

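
The general procedure (find cycles, build the acyclic condensation, order it) can be sketched for this example with Tarjan's strongly-connected-components algorithm. The choice of Tarjan's algorithm and all names below are assumptions made for illustration; the edge list is the dependence graph of the example loop.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; components come out in reverse topological order."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    index, low, stack, on_stack, result = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(sorted(component))

    for n in nodes:
        if n not in index:
            visit(n)
    return result


nodes = ["S1", "S2", "S3", "S4"]
edges = [("S1", "S2"), ("S2", "S1"), ("S3", "S3"), ("S3", "S4"), ("S4", "S1")]
# Reversing Tarjan's output gives a topological order of the condensation.
loop_order = strongly_connected_components(nodes, edges)[::-1]
# A component is vectorizable if it is a single statement without a self-cycle.
vectorizable = [c for c in loop_order if len(c) == 1 and (c[0], c[0]) not in edges]
print(loop_order)    # prints: [['S3'], ['S4'], ['S1', 'S2']]
print(vectorizable)  # prints: [['S4']]
```

Each component becomes one distributed loop, emitted in `loop_order`; only S4 ends up as a vector statement.
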
31
Vector Code Algorithm
  • This does not suffice to deal well with many loop
    nests encountered in practice.
  • Fortunately, we can still improve upon this
    result in general.
  • We
      • can sometimes further transform a loop nest so
        that the vector code transformation is applicable
      • can exploit opportunities for vectorization of
        some levels of a loop nest
      • may also apply additional dependence-breaking
        transformations

32
Scalar Expansion
      DO K = 1, N
        DO I = 1, N
          T = T + A(K)
  • becomes
      DO K = 1, N
        DO I = 1, N
          T(I) = T(I) + A(K)
  • This transformation creates a copy of a scalar
    variable for each iteration of the loop nest in
    which it occurs by replacing the variable with an
    appropriately dimensioned array.
  • This transformation may eliminate dependences
    while preserving the semantics of the code.
  • It is particularly useful in vectorizing and
    shared memory parallelization.

33
Scalar Expansion
  • Replaces a scalar variable by an array of the
    same size as a loop's iteration space.
  • Input: perfect nest L of n loops with scalar A on the
    left-hand side of an assignment in L.
  • A is not a formal parameter, induction variable,
    single-statement reduction or element of a common
    block.
  • The kth loop in L has lower bound L_k and upper
    bound U_k.
  • Output: modified loop where all references to A
    are replaced by a reference to an n-dimensional
    array A'(L_1:U_1, .., L_n:U_n), followed by an assignment
    to A immediately after the loop nest:
      A = A'(U_1, U_2, .., U_n)

34
Scalar Expansion
  • If the first occurrence of A in L is a use, we
    modify the above as follows:
      • The lower bound in the last dimension of the
        expanded array A' is (L_n - 1) instead.
      • An assignment to A' is inserted immediately
        before the loop: A'(L_1, .., L_n - 1) = A.
      • All uses of A before the first definition in L
        are replaced by A'(I_1, I_2, .., I_n - 1). All other
        occurrences are replaced as previously.
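
For a single loop in which the scalar is defined before it is used, the transformation might look like the sketch below. The loop body, the names `t_expanded`, `with_scalar` and `with_expanded_scalar`, and the data are hypothetical; the sketch also assumes the loop runs at least once.

```python
def with_scalar(a, b):
    """Every iteration overwrites the same scalar t."""
    n = len(a)
    c = [0] * n
    t = 0
    for i in range(n):
        t = a[i] + b[i]
        c[i] = t * t
    return c, t


def with_expanded_scalar(a, b):
    """t is replaced by one array element per iteration; both statements vectorize."""
    n = len(a)
    t_expanded = [a[i] + b[i] for i in range(n)]              # T'(1:N) = A(1:N) + B(1:N)
    c = [t_expanded[i] * t_expanded[i] for i in range(n)]     # C(1:N) = T'(1:N) * T'(1:N)
    t = t_expanded[n - 1]                                     # T = T'(N), restores the final value
    return c, t


a, b = [1, 2, 3], [4, 5, 6]
same = with_scalar(a, b) == with_expanded_scalar(a, b)
```

The final copy back into `t` matters only if the scalar is read after the loop.
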

35
Idiom Recognition
  • Maybe we can handle this loop after all:
      DO I = L, U
        E(I) = E(I-1) + 1
      END DO
  • Recognize frequent (small) computations that can
    be tuned to the target architecture, and perform
    specific optimizations.
  • Examples
      • dot product
      • max and min values of array
      • maxloc and minloc of array
      • scalar product
      • matrix multiplication
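
One way a recognized idiom might be replaced is shown below for the recurrence above, assuming it reads E(I) = E(I-1) + 1. The closed form used here is an illustration only, not necessarily what any particular compiler emits; the function names and data are made up.

```python
def recurrence(e, lo, hi):
    """DO I = lo, hi: E(I) = E(I-1) + 1, executed sequentially."""
    e = e[:]
    for i in range(lo, hi + 1):
        e[i] = e[i - 1] + 1
    return e


def closed_form(e, lo, hi):
    """Same result without a loop-carried dependence: E(I) = E(lo-1) + (I - lo + 1)."""
    e = e[:]
    base = e[lo - 1]
    e[lo:hi + 1] = [base + (i - lo + 1) for i in range(lo, hi + 1)]
    return e


e0 = [10, 0, 0, 0, 0]
print(recurrence(e0, 1, 4))   # prints: [10, 11, 12, 13, 14]
print(closed_form(e0, 1, 4))  # prints: [10, 11, 12, 13, 14]
```
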

36
Loop Interchange
  • Powerful transformation to exchange order of two
    loops
  • Reorders execution of statement instances, so
    only legal if this does not change semantics of
    program
  • Has many uses, e.g.
      • Move vectorizable loops to the innermost position of
        the loop nest
      • Move parallelizable loops to the outermost position
        to increase granularity and decrease
        synchronization

37
Loop Interchange
  • Rearrange execution order of statement instances
    in loop
  • Goals: increase use of cache, move dependence
    cycles
  • Legality: valid if data dependences are preserved
  • Input: a (perfectly) nested loop L of depth n > 1
    and an interchange level c < n
  • Output: loop nest L' that is obtained from L by
    exchanging the order of the DO loop headers for
    L_c and L_(c+1)
  • Generalizations: to imperfectly nested loops, and
    to pairs of non-adjacent loops

38
Loop Interchange Example
      DO J = 1, M
        DO I = 1, N
          A(I,J+1) = A(I+1,J) + B
        END DO
      END DO
  • Can the loops be interchanged, so that DO I = 1, N
    encloses DO J = 1, M?
  • In the original loop, A(2,2) is defined in iteration
    (1,2) and used in iteration (2,1)

39
Loop Interchange Example
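  • Loop interchange is not legal for this loop
      DO J = 1, M
        DO I = 1, N
          A(I,J+1) = A(I+1,J) + B
        END DO
      END DO
  • After interchange (DO I = 1, N enclosing DO J = 1, M),
    A(2,2) is defined in iteration (2,1) and used in
    iteration (1,2)
  • The direction vector would become (>, <): the use of
    A(2,2) would run before its definition (δa instead of δt)
    and read the wrong value
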
40
Loop Interchange
  • The direction vector records the loop levels
    associated with each dependence.
  • After interchanging a pair of adjacent loops, the
    new direction vector is formed by exchanging the
    corresponding entries in the direction vector.
  • If loop interchange swaps the order of the references
    involved in a dependence, then it changes the
    semantics of the program.
  • A dependence with direction vector (<, >) would
    become (>, <) if the loops are swapped. This
    reverses the access order!

41
Loop Interchange
  • So we must test whether there are direction
    vectors of the form (<, >) involving the loop
    levels that we want to exchange.
  • If not we can apply the transformation.
  • If there are such direction vectors, the
    transformation is illegal and cannot be applied.
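
The test can be sketched as a small function over direction vectors, followed by a run of the earlier interchange example in both loop orders. The sketch assumes each vector entry is exactly one of "<", "=" or ">", that the stripped operator in the example is `+`, and that B is a scalar; all names are hypothetical.

```python
def interchange_is_legal(direction_vectors, c):
    """c is the 0-based position of the outer loop of the adjacent pair to swap."""
    for vector in direction_vectors:
        swapped = list(vector)
        swapped[c], swapped[c + 1] = swapped[c + 1], swapped[c]
        for entry in swapped:
            if entry == "<":
                break             # leading non-"=" entry is "<": dependence preserved
            if entry == ">":
                return False      # source would now execute after its sink
    return True


def run(j_outermost, n, m, b=1):
    """A(I,J+1) = A(I+1,J) + B with either J or I as the outer loop."""
    a = [[0] * (m + 2) for _ in range(n + 2)]   # 1-based, index 0 unused
    if j_outermost:
        iterations = [(i, j) for j in range(1, m + 1) for i in range(1, n + 1)]
    else:
        iterations = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]
    for i, j in iterations:
        a[i][j + 1] = a[i + 1][j] + b
    return a


legal = interchange_is_legal([("<", ">")], 0)
original, interchanged = run(True, 2, 2), run(False, 2, 2)
print(legal, original[1][3], interchanged[1][3])  # prints: False 2 1
```

A(1,3) is computed from A(2,2); after the interchange it reads A(2,2) before that element has been defined, which is why the two runs disagree.
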

42
Legal Loop Interchange
  • The transformation exchanging loops c and c+1 does
    not modify any loop dependences at levels other
    than c and c+1.
  • If dir(i,i') is (=^(c-1), <, =, *, .., *), then in the
    modified loop this becomes (=^c, <, *, .., *). If
    dir(i,i') is (=^(c-1), <, <, *, .., *), it remains
    unchanged. Here =^(c-1) stands for c-1 leading "="
    entries and * for any direction.
  • Thus if S(i) δ_k S'(i') for any 1 <= k < c or
    c+1 < k <= n, or if k = ∞, then S(i) δ_k
    S'(i') in the modified loop.
  • If S(i) δ_(c+1) S'(i'), then in the new loop S(j)
    δ_c S'(j').
  • If a loop at some level p is not involved in any
    loop-carried dependences, then it may be moved
    inward to any other level.

43
Example
  • Left version:
      DO J = 1, 100
        DO I = 1, 100
          DO K = 1, 100
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
          END DO
        END DO
      END DO
  • Right version:
      DO K = 1, 100
        DO I = 1, 100
          DO J = 1, 100
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
          END DO
        END DO
      END DO
  • The version on the left has a true dependence in the
    innermost loop.
  • In the version on the right, loop level 1 has been
    interchanged with loop level 3. This is legal
    because the level 1 loop has no dependences.

44
Example
  • Here, there is a true dependence (δt) at the innermost
    level
      DO I = 1, L
        DO J = 1, M
          C(I,J) = 0.0
          DO K = 1, N
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
          END DO
        END DO
      END DO
  • There are many ways to vectorize matrix
    multiplication

45
Example
  • Apply loop distribution and then loop interchange
    to move the dependence to the outermost level
      DO I = 1, L
        DO J = 1, M
          C(I,J) = 0.0
        END DO
      END DO
      DO K = 1, N
        DO I = 1, L
          DO J = 1, M
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
          END DO
        END DO
      END DO

46
Example
      C(1:L,1:M) = 0.0
      DO K = 1, N
        C(1:L,1:M) = C(1:L,1:M) + A(1:L,K) * B(K,1:M)
      END DO
  • We now have vector code, although this may not be
    the most suitable form for many machines
  • It may be more appropriate to vectorize in one
    dimension only and to apply strip mining to it

47
Example
      DO J = 1, M
        C(1:L,J) = 0.0
        DO K = 1, N
          C(1:L,J) = C(1:L,J) + A(1:L,K) * B(K,J)
        END DO
      END DO
  • is vectorized in a single dimension

48
Example
      DO J' = 1, M, 32
        DO J = J', J'+31
          C(1:L,J) = 0.0
          DO K = 1, N
            C(1:L,J) = C(1:L,J) + A(1:L,K) * B(K,J)
          END DO
        END DO
      END DO
  • breaks up work into vectors of length 32
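
The strip bounds can be generated as in the sketch below. Unlike the loop above, it clamps the last strip with `min`, so the trip count does not have to be a multiple of the strip width; the helper name `strips` and its `width` parameter are hypothetical.

```python
def strips(lower, upper, width=32):
    """Yield the inclusive (first, last) bounds of each strip of lower..upper."""
    for first in range(lower, upper + 1, width):
        yield first, min(first + width - 1, upper)


bounds = list(strips(1, 70))
print(bounds)  # prints: [(1, 32), (33, 64), (65, 70)]

# Every index of the original loop is covered exactly once.
covered = [j for first, last in bounds for j in range(first, last + 1)]
```

Each strip then corresponds to one vector operation of at most `width` elements.
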

49
Example
  • Here, the problem is that the updates use a
    temporary scalar, T. We cannot apply scalar
    forward substitution.
      DO I = 1, L
        DO J = 1, M
          T = 0.0
          DO K = 1, N
            T = T + A(I,K) * B(K,J)
          END DO
          C(I,J) = T
        END DO
      END DO

50
Scalar Expansion
  • Scalar expansion generates an array that replaces
    the scalar T
      DO I = 1, L
        DO J = 1, M
          T(I,J) = 0.0
          DO K = 1, N
            T(I,J) = T(I,J) + A(I,K) * B(K,J)
          END DO
          C(I,J) = T(I,J)
        END DO
      END DO
  • Unfortunately, this is very wasteful of memory!

51
Strip Mining
      DO I = 1, L
        DO J' = 1, M, 32
          DO J = J', J'+31
            T(I,J) = 0.0
            DO K = 1, N
              T(I,J) = T(I,J) + A(I,K) * B(K,J)
            END DO
            C(I,J) = T(I,J)
          END DO
        END DO
      END DO
  • Strip mining creates vectors with length equal to
    that of the vector registers (here, we use 32)

52
Software Pipelining
  • Recall: this overlaps execution of multiple
    iterations of a loop nest to exploit multiple
    hardware resources concurrently
  • Requires data dependence testing to determine
    legality
  • Some transformations may improve results
      • Loop unrolling may increase workload in an
        iteration to enable a more efficient schedule
      • For large loops, loop distribution may reduce the
        amount of data used in an iteration to avoid
        register spills
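
Loop unrolling itself is easy to sketch. The example below unrolls a hypothetical scaled-sum loop by a factor of 4 and adds a remainder loop for trip counts that are not a multiple of 4; it only illustrates the source-level shape, not the scheduling a software-pipelining compiler would then perform.

```python
def scaled_sum(x, factor):
    total = 0
    for value in x:
        total += factor * value
    return total


def scaled_sum_unrolled(x, factor):
    """Same computation with four iterations' worth of work per loop body."""
    total = 0
    n = len(x)
    main = n - n % 4                      # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        total += (factor * x[i] + factor * x[i + 1]
                  + factor * x[i + 2] + factor * x[i + 3])
    for i in range(main, n):              # remainder iterations
        total += factor * x[i]
    return total


values = list(range(10))
print(scaled_sum(values, 3), scaled_sum_unrolled(values, 3))  # prints: 135 135
```
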