Array Dependence Analysis with the Chains of Recurrences Framework for Loop Optimization

1
Array Dependence Analysis with the Chains of
Recurrences Framework for Loop Optimization
  • Robert van Engelen
  • Florida State University

Also thanks to J. Birch, Y. Shou, and K. Gallivan
2
Outline
  • Motivation
  • Restructuring compilers
  • Chains of recurrences algebra and associated
    algorithms for the GCC and Polaris compilers
  • Nonlinear array dependence testing for loop
    restructuring and vectorization
  • Experimental results
  • Conclusions

3
Motivation
  • Intel CTO: the increased power requirements of
    newer chips will lead to CPUs that are hotter
    than the surface of the sun by 2010
  • Enter multi-core CPUs
  • Increase the overall system speed by adding CPU
    cores
  • Speed up multi-threaded applications
  • Can effectively lower the power consumption
  • Enter (more?) multi-media extensions
  • Vector-like instruction sets MMX, SSE, AltiVec
  • Speed up multi-media codes, such as JPEG, MPEG

4
Code Optimization by Hand or Automatic?
  • Rewriting applications by hand to exploit
    parallelism is doable, if
  • Tasks can be identified that run independently,
    such as a Web browser's rendering and
    communications tasks
  • Coarse-grain parallelism: tasks must have
    sufficient work
  • Rewriting applications by hand to exploit lots of
    fine-grain parallelism is not doable
  • Thousands of read-after-write (RAW),
    write-after-read (WAR), and write-after-write
    (WAW) data dependences must be analyzed

5
Restructuring Compilers
  • A restructuring compiler typically applies
    source-code transformations automatically to meet
    various performance enhancement criteria
  • Exploit parallelism in loops by reordering the
    loop structure to run loop iterations in parallel
  • Find small loops to replace with vector
    instructions
  • Optimize data locality by reordering code to
    change memory access order and cache
  • All code changes are safe as long as RAW, WAR,
    and WAW data dependences are preserved!

6
Example: Loop Fission
    S1 DO I = 1, 10
    S2   DO J = 1, 10
    S3     A(I,J) = B(I,J) + C(I,J)
    S4     D(I,J) = A(I,J-1) * 2.0
    S5   ENDDO
    S6 ENDDO
  • Loop fission splits a single loop into multiple
    loops
  • Allows vectorization and parallelization of the
    new loops when original loop was sequential
  • Loop fission must preserve all dependence
    relations of the original loop

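Original loop: S3 δ(=,<) S4

After loop fission:
    S1 DO I = 1, 10
    S2   DO J = 1, 10
    S3     A(I,J) = B(I,J) + C(I,J)
    Sx   ENDDO
    Sy   DO J = 1, 10
    S4     D(I,J) = A(I,J-1) * 2.0
    S5   ENDDO
    S6 ENDDO
S3 δ(=,<) S4

After vectorizing the inner loops and parallelizing the outer loop:
    S1 PARALLEL DO I = 1, 10
    S3   A(I,1:10) = B(I,1:10) + C(I,1:10)
    S4   D(I,1:10) = A(I,0:9) * 2.0
    S6 ENDDO
S3 δ(=,<) S4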
7
Loop Fission Algorithm
    S1 DO I = 1, 10
    S2   A(I) = A(I) + B(I-1)
    S3   B(I) = C(I-1)*X + Z
    S4   C(I) = 1/B(I)
    S5   D(I) = sqrt(C(I))
    S6 ENDDO
  • Compute the acyclic condensation of the
    dependence graph to find a legal order of the
    loops

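Dependences: S3 δ(<) S2, S4 δ(<) S3, S3 δ(=) S4, S4 δ(=) S5

After loop fission:
    S1 DO I = 1, 10
    S3   B(I) = C(I-1)*X + Z
    S4   C(I) = 1/B(I)
    Sx ENDDO
    S2 A(1:10) = A(1:10) + B(0:9)
    S5 D(1:10) = sqrt(C(1:10))

[Figure: dependence graph over S2, S3, S4, S5 with edges labeled 0 or 1, and its acyclic condensation with nodes {S3, S4}, S2, and S5]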
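
A minimal Python sketch of the step above, assuming the four dependences on this slide (S3 to S2, S4 to S3, S3 to S4, S4 to S5) as graph edges. The function names are illustrative and not from the presentation. Tarjan's algorithm finds the strongly connected components; listing them in topological order gives one legal order for the fissioned loops.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; components come out in reverse topological order."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    return sccs


def fission_order(nodes, edges):
    """Acyclic condensation in topological order: sources of dependences first."""
    return list(reversed(strongly_connected_components(nodes, edges)))


nodes = ["S2", "S3", "S4", "S5"]
edges = [("S3", "S2"), ("S4", "S3"), ("S3", "S4"), ("S4", "S5")]
order = fission_order(nodes, edges)
print(order)
```

The cycle between S3 and S4 stays together in one sequential loop, while S2 and S5 end up in their own loops after it; their relative order is free because no dependence connects them.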
8
Example: Loop Interchange
    S1 DO I = 1, N
    S2   DO J = 1, M
    S3     A(I,J) = A(I,J-1) + B(I,J)
    S4   ENDDO
    S5 ENDDO
  • Changes the loop nesting order
  • Allows vectorization of an outer loop and more
    effective parallelization of an inner loop
  • Can be used to improve spatial locality
  • Loop interchange must preserve all dependence
    relations of the original loop

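Original loop nest: S3 δ(=,<) S3

After loop interchange:
    S2 DO J = 1, M
    S1   DO I = 1, N
    S3     A(I,J) = A(I,J-1) + B(I,J)
    S4   ENDDO
    S5 ENDDO
S3 δ(<,=) S3

After vectorizing the inner loop:
    S2 DO J = 1, M
    S3   A(1:N,J) = A(1:N,J-1) + B(1:N,J)
    S5 ENDDO
S3 δ(<,=) S3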
9
Loop Interchange Algorithm
    S1 DO I = 1, N
    S2   DO J = 1, M
    S3     DO K = 1, L
    S4       A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5     ENDDO
    S6   ENDDO
    S7 ENDDO
  • Compute the direction matrix and find which
    columns (and therefore which loops) can be
    permuted without violating dependence relations
    in the original loop nest

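Dependences: S4 δ(<,<,=) S4 and S4 δ(<,=,>) S4

Direction matrix (columns I, J, K):
    <  <  =
    <  =  >

[Figure: one invalid and one valid column permutation of the direction matrix. A permutation is invalid when the leftmost non-"=" entry of some row becomes ">".]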
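
The legality check can be sketched in a few lines of Python. This is an illustration under the assumption that the direction matrix of this slide's loop nest has rows (<,<,=) and (<,=,>) for loops I, J, K; the helper name `is_legal` is hypothetical.

```python
from itertools import permutations


def is_legal(direction_matrix, perm):
    """A loop permutation is legal if, in every row, the first
    non-'=' direction after permuting the columns is '<'."""
    for row in direction_matrix:
        for col in perm:
            if row[col] == "<":
                break               # dependence carried forward: fine
            if row[col] == ">":
                return False        # dependence would be reversed
    return True


loops = ("I", "J", "K")
matrix = [("<", "<", "="), ("<", "=", ">")]
legal = [p for p in permutations(range(3)) if is_legal(matrix, p)]
for p in legal:
    print([loops[c] for c in p])
```

Under this assumption, any order that moves K ahead of I is rejected because the second dependence would then lead with ">".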
10
Complications
  • Loop restructuring is complicated by
  • The presence of several induction variables
  • Nonlinear and symbolic array index expressions
  • The use of pointer arithmetic instead of arrays
    in C
  • Non-unit loop strides and unstructured loops
  • Control flow
  • Need loop normalization and preprocessing
  • Apply induction variable substitution
  • Convert pointer dereferences to array accesses
  • Normalize the loop iteration space

11
Induction Variable Substitution
Example loop:
    I = 0
    J = 1
    while (I < N)
      I = I + 1
      A[J]
      J = J + 2
      K = 2*I
      A[K]
    endwhile

After IV substitution (IVS) (note the affine indexes):
    for i = 0 to N-1
      S1: A[2*i+1]
      S2: A[2*i+2]
    endfor

After parallelization:
    forall (i = 0, N-1)
      A[2*i+1]
      A[2*i+2]
    endforall

Dependence test: GCD test to solve the dependence equation 2*i_d - 2*i_u = -1. Since 2 does not divide 1, there is no data dependence.

[Figure: array A with alternating written (W) and read (R) elements, accessed as A[2*i+1] and A[2*i+2]]
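
The GCD test on this slide can be illustrated with a short Python sketch. The function name `gcd_test` and the brute-force cross-check are assumptions for illustration; the test itself is the standard one: a linear equation with integer coefficients has an integer solution only if the GCD of the coefficients divides the constant.

```python
from math import gcd


def gcd_test(coeffs, constant):
    """Return True if c1*x1 + ... + ck*xk = constant MAY have an
    integer solution (dependence possible), False if it cannot."""
    g = 0
    for c in coeffs:
        g = gcd(g, abs(c))
    if g == 0:
        return constant == 0
    return constant % g == 0


# Dependence equation from the slide: 2*i_d - 2*i_u = -1
may_depend = gcd_test([2, -2], -1)

# Brute-force cross-check over a small iteration space
N = 50
found = any(2 * i_d + 2 == 2 * i_u + 1
            for i_d in range(N) for i_u in range(N))
print(may_depend, found)
```

The test ignores loop bounds, so a True result only means a dependence cannot be ruled out this way.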
12
IV Recognition on SSA Forms
[Cytron91, Wolfe92]
    I1 = 3
    M1 = 0
    do
      I2 = φ(I1, I3)
      J1 = φ(?, J3)
      K1 = φ(?, K2)
      L1 = φ(?, L2)
      M2 = φ(M1, M3)
      J2 = 3
      I3 = I2 + 1
      L2 = M2 + 1
      M3 = L2 + 2
      J3 = I3 + J2
      K2 = 2*J3
    while (...)

[Figure: spanning tree]

    I2(i) = 3+i
    J1(i) = 7+i
    L2(i) = 1+3i
    K1(i) = 14+2i
    M2(i) = 3i
13
Symbolic Differencing
[Haghighat95]
Use abstract interpretation to evaluate loop
iterations and construct a symbolic difference
table of the IV values
    do
      x = x + z
      y = z + 1
      z = y + 1
    while (...)

Iteration   x          Δx     Δ²x   y      Δy   z      Δz
1           x+z                     z+1         z
2           x+2z+2     z+2          z+3    2    z+2    2
3           x+3z+6     z+4    2     z+5    2    z+4    2

    x(i) = x0 + z0*i + (i^2 - i)
    y(i) = z0 + 2i + 1
    z(i) = z0 + 2i
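
As a numeric illustration of the difference table, the Python sketch below runs the loop for concrete start values, takes successive differences, and compares the result with the closed forms for x and z. It assumes the loop body is x = x + z, y = z + 1, z = y + 1 and that x(i) and z(i) denote the values after i iterations; the function names are illustrative.

```python
def run_loop(x0, z0, iterations):
    """Record x and z before the loop and after each iteration."""
    x, z = x0, z0
    xs, zs = [x], [z]
    for _ in range(iterations):
        x = x + z
        y = z + 1
        z = y + 1
        xs.append(x)
        zs.append(z)
    return xs, zs


def differences(seq):
    """First forward differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


x0, z0 = 5, 3
xs, zs = run_loop(x0, z0, 4)
second_diff_x = differences(differences(xs))
closed_x = [x0 + z0 * i + (i * i - i) for i in range(5)]
closed_z = [z0 + 2 * i for i in range(5)]
print(xs, second_diff_x)
```

A constant second difference of 2 is what identifies x as a quadratic induction variable.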
14
Pointer-to-Array Conversion
[vanEngelen01, Franke01]

Lsp_az speech codec segment from ETSI with pointer updates:
    f += 2;
    lsp += 2;
    for (i = 2; i <= 5; i++) {
      *f = f[-2];
      for (j = 1; j < i; j++, f--)
        *f += f[-2] - 2*(*lsp)*f[-1];
      *f -= 2*(*lsp);
      f += i;
      lsp += 2;
    }

Lsp_az speech codec segment after pointer-to-array conversion. Note that all array index expressions are affine:
    for (i = 0; i <= 3; i++) {
      f[i+2] = f[i];
      for (j = 0; j <= i; j++)
        f[i-j+2] += f[i-j] - 2*lsp[2*i+2]*f[i-j+1];
      f[1] -= 2*lsp[2*i+2];
    }
15
Control-Flow Issues
  • Conditional array accesses and conditionally
    updated induction variables present problems

    for (...)
      if (...)
        A[I]
      else
        A[J]
Assume RAW and WAR dependences

    do
      K = 3
      K = K + J
      if (...)
        J = K
      else
        J = J + 3
      A[J]
    while (J < N)
Extensive analysis reveals that J = J+3

    DO I = 1, 10
      IF (...) THEN
        J = J + 2
      ELSE
        J = I
      ENDIF
      A(J)
    ENDDO
Problem: J has no single recurrence form
16
Chains of Recurrences for Compiler Optimization
  • Chains of recurrence forms and algebra can be
    used to
  • Detect (non)linear coupled IVs
  • Analyze pointer arithmetic
  • Effectively handle control flow
  • Implement array dependence testing

17
Chains of Recurrences
  • A chain of recurrences (CR) represents a
    polynomial or exponential function or mix
    evaluated over a unit-distance grid [Zima92]
  • Basic form: {init, ⊙, stride}, where ⊙ is + or *

Iteration   {init, ⊙, stride}                    f(i) = 2i+1 = {1, +, 2}   f(i) = 2^i = {1, *, 2}
i = 0       init                                  1                         1
i = 1       init ⊙ stride                         3                         2
i = 2       init ⊙ stride ⊙ stride                5                         4
i = 3       init ⊙ stride ⊙ stride ⊙ stride       7                         8
18
Chains of Recurrences: General Formulation
  • The key idea is to represent a non-constant CR
    stride in CR form itself, thereby forming a chain
    of recurrences
  • Example: f(i) = i^2 = {0, +, s(i-1)} = {0, +, {1, +, 2}}
    = {0, +, 1, +, 2}, where s(i) = {1, +, 2}

Iteration   {init, ⊙, s(i-1)}                 s(i) = {1, +, 2}   f(i) = {0, +, s(i-1)}
i = 0       init                               1                  0
i = 1       init ⊙ s(0)                        3                  1
i = 2       init ⊙ s(0) ⊙ s(1)                 5                  4
i = 3       init ⊙ s(0) ⊙ s(1) ⊙ s(2)          7                  9
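
A small Python sketch of how a CR is evaluated over the grid i = 0, 1, 2, .... The representation (a coefficient list plus an operator list) and the name `cr_values` are assumptions made for illustration, not the data structures of the GCC or Polaris implementations.

```python
def cr_values(coeffs, ops, n):
    """First n values of the CR {c0, op1, c1, op2, ..., opk, ck}.

    coeffs = [c0, ..., ck]; ops = ['+' or '*', ...] with len(ops) == k.
    Each step updates level j from level j+1, lowest level first, so
    every update uses the not-yet-updated value of the next level.
    """
    state = list(coeffs)
    out = []
    for _ in range(n):
        out.append(state[0])
        for j, op in enumerate(ops):
            if op == "+":
                state[j] = state[j] + state[j + 1]
            else:
                state[j] = state[j] * state[j + 1]
    return out


linear = cr_values([1, 2], ["+"], 4)            # {1, +, 2}    = 2i+1
exponential = cr_values([1, 2], ["*"], 4)       # {1, *, 2}    = 2^i
square = cr_values([0, 1, 2], ["+", "+"], 4)    # {0, +, 1, +, 2} = i^2
print(linear, exponential, square)
```

Each new value costs one addition or multiplication per chain level, which is what makes CR forms attractive for evaluating functions on grids.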
19
CRs for Expediting Function Evaluations on Grids
  • Suppose f(i) = a + b*i + c*i^2 = {a, +, b+c, +, 2c}
  • We have two IVs x and y:
    f(i) = x = {x0, +, y} with x0 = a
    s(i) = y = {y0, +, 2c} with y0 = b+c
  • Implement loop to update x and y for efficient
    evaluation of f(i) over a unit-distance grid
    i = 0, ..., n

    x = a
    y = b+c
    for i = 0 to n
      f[i] = x
      x = x+y
      y = y+2c
    endfor
20
Multi-Dimensional Example
  • Let f(i,j) = i^2 + i*j + 1
  • Create IV k for f(i,j) in j-loop:
    f(i,j) = k_j = {p_i, +, r_i}_j with p_i = i^2 + 1 and r_i = i
  • Create IVs for p_i and r_i in i-loop:
    p_i = {p0, +, q_i}_i with p0 = 1
    q_i = {q0, +, 2}_i with q0 = 1
    r_i = {r0, +, 1}_i with r0 = 0
  • Implement k, p, q, and r in i-j-loop nest

    p = 1
    q = 1
    r = 0
    for i = 0 to n
      k = p
      for j = 0 to m
        f[i,j] = k
        k = k+r
      endfor
      p = p+q
      q = q+2
      r = r+1
    endfor
21
CR Construction with the CR Algebra
  • To construct the CR form of a symbolic function
    f(i)
  • Replace i with CR {0, +, 1}
  • Apply CR algebra rewrite rules (selected rules
    shown)
  • Example: f(i) = c(i+a) = c({0, +, 1} + a) = c{a, +, 1}
    = {ca, +, c}

    {x, +, y} + c          ⇒  {x+c, +, y}
    c{x, +, y}             ⇒  {cx, +, cy}
    {x, +, y} + {u, +, v}  ⇒  {x+u, +, y+v}
    {x, +, y} * {u, +, v}  ⇒  {xu, +, y{u, +, v} + v{x, +, y} + yv}
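
The rewrite rules for additive CRs can be sketched in Python on tuples of numeric coefficients, where (c0, c1, ..., ck) stands for {c0, +, c1, +, ..., +, ck}. This is a simplified illustration with hypothetical function names; it covers only numeric, purely additive CRs, not the symbolic or multiplicative forms handled by the full algebra.

```python
def cr_add(p, q):
    """{x, +, y} + {u, +, v} => {x+u, +, y+v}; a constant is a length-1 CR."""
    n = max(len(p), len(q))
    p = p + (0,) * (n - len(p))
    q = q + (0,) * (n - len(q))
    return tuple(a + b for a, b in zip(p, q))


def cr_scale(c, p):
    """c{x, +, y} => {cx, +, cy}."""
    return tuple(c * a for a in p)


def cr_mul(p, q):
    """{x, +, y} * {u, +, v} => {xu, +, y{u, +, v} + v{x, +, y} + yv},
    applied recursively so that y and v may themselves be CRs."""
    if len(p) == 1:
        return cr_scale(p[0], q)
    if len(q) == 1:
        return cr_scale(q[0], p)
    f1, g1 = p[1:], q[1:]               # stride CRs of p and q
    stride = cr_add(cr_add(cr_mul(p, g1), cr_mul(q, f1)), cr_mul(f1, g1))
    return (p[0] * q[0],) + stride


i = (0, 1)                                       # the CR of the loop counter
example = cr_scale(5, cr_add(i, (3,)))           # c(i+a) with c=5, a=3
square = cr_mul(i, i)                            # i^2
quadratic = cr_add(cr_add((1,), cr_scale(2, i)), cr_scale(3, square))
print(example, square, quadratic)
```

With a = 1, b = 2, c = 3 the last line yields the coefficients a, b+c, 2c, matching the form {a, +, b+c, +, 2c} of a quadratic.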
22
Loop Analysis with CR Forms
[vanEngelen01]
  • The basic idea
  • Scan the loop to detect IV updates
  • Construct the CR form for each IV using the CR
    algebra

    do
      J = J+I
      I = I+3
      P = 2*P
    while (...)

    J = {J0, +, I}  ⇒  J = {J0, +, I0, +, 3}
    I = {I0, +, 3}
    P = {P0, *, 2}
23
Algorithm 1: Find Recurrences
  • Input: Loop L with live variable information
    Output: Set S of recurrence relations of IVs
  • Start with set S = {⟨v, v⟩ | v is live at loop
    header}
  • Search L from bottom to top: for each assignment
    v = x of expression x to scalar variable v, update
    tuples ⟨u, y⟩ in S by replacing v in y with x

Loop L          Step   Changes to S
                       S  = {⟨H, H⟩, ⟨I, I⟩, ⟨J, J⟩, ⟨K, K⟩}
do
  M = 2         5      S5 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+2⟩, ⟨K, K+2*I⟩}
  L = J-H       4      S4 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+M⟩, ⟨K, K+M*I⟩}
  J = L+M       3      S3 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, L+M⟩, ⟨K, K+M*I⟩}
  K = K+M*I     2      S2 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K+M*I⟩}
  I = I+1       1      S1 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K⟩}
while (...)
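
A minimal Python sketch of Algorithm 1, assuming the loop body is given as (variable, expression) string pairs in program order and that textual substitution with parentheses is an adequate stand-in for a real expression tree. The name `find_recurrences` is illustrative.

```python
import re


def find_recurrences(loop_body, live_vars):
    """Walk the loop body bottom-up; for each assignment v = x,
    replace v by (x) in the right-hand side of every tuple in S."""
    S = {v: v for v in live_vars}
    for target, expr in reversed(loop_body):
        pattern = re.compile(r"\b" + re.escape(target) + r"\b")
        S = {u: pattern.sub(lambda m: "(" + expr + ")", y)
             for u, y in S.items()}
    return S


loop = [("M", "2"), ("L", "J-H"), ("J", "L+M"), ("K", "K+M*I"), ("I", "I+1")]
S = find_recurrences(loop, ["H", "I", "J", "K"])
for v, x in S.items():
    print(v, "<-", x)

# Evaluate each recurrence once for sample header values
env = {"H": 5, "I": 7, "J": 11, "K": 13}
after_one_iteration = {v: eval(x, {}, env) for v, x in S.items()}
```

Because the body is scanned bottom-up, the I inside the update of K refers to the value of I at the loop header, as in the table above.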
24
Algorithm 2: Compute CR Forms
  • Input: Set S with recurrence relations
    Output: CR forms for IVs in S
  • For each relation ⟨v, x⟩ in S do
    if x is of the form v then v = v0 (v is loop invariant)
    if x is of the form v + y then v = {v0, +, y}
    if x is of the form v * y then v = {v0, *, y}
    if x does not contain v then v = {v0, #, x} (v is wrap
    around)
  • Simplify the CR forms with the CR algebra rewrite
    rules

Recurrence relation in S   CR form              Simplified CR form
⟨H, H⟩                     H = H0               H = H0
⟨I, I+1⟩                   I = {I0, +, 1}       I = {I0, +, 1}
⟨J, J-H+2⟩                 J = {J0, +, 2-H}     J = {J0, +, 2-H0}
⟨K, K+2*I⟩                 K = {K0, +, 2*I}     K = {K0, +, 2*I0, +, 2}
25
Algorithm 3: Solve
  • Input: CR forms for IVs
    Output: Closed-form solutions for IVs (when possible)
  • For each CR form of v apply the CR inverse
    algebra, assuming loop is normalized for i = 0,
    ..., n
  • Certain exotic mixed non-polynomial and
    non-exponential CR forms may not have closed forms

Loop L:
    do
      M = 2
      L = J-H
      J = L+M
      K = K+M*I
      I = I+1
    while (...)

Simplified CR form          Closed form
J = {J0, +, 2-H0}           J(i) = J0 + (2-H0)*i
K = {K0, +, 2*I0, +, 2}     K(i) = K0 + i^2 + (2*I0-1)*i
I = {I0, +, 1}              I(i) = I0 + i
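
For purely additive CRs with numeric coefficients, a closed form follows from the identity {c0, +, c1, +, ..., +, ck}(i) = Σ c_j * C(i, j), where C(i, j) is a binomial coefficient. The Python sketch below uses that identity and checks it against a direct simulation of the loop on this slide; it is an illustration only and not the CR inverse algebra of the presentation, which also handles symbolic and multiplicative forms.

```python
from math import comb


def closed_form(coeffs):
    """Closed form of the additive CR {c0, +, c1, +, ..., +, ck}."""
    return lambda i: sum(c * comb(i, j) for j, c in enumerate(coeffs))


def simulate(H, I, J, K, iterations):
    """Values of (I, J, K) at the loop header for each iteration."""
    history = [(I, J, K)]
    for _ in range(iterations):
        M = 2
        L = J - H
        J = L + M
        K = K + M * I
        I = I + 1
        history.append((I, J, K))
    return history


H0, I0, J0, K0 = 5, 4, 11, 7
I_cf = closed_form([I0, 1])             # I = {I0, +, 1}
J_cf = closed_form([J0, 2 - H0])        # J = {J0, +, 2-H0}
K_cf = closed_form([K0, 2 * I0, 2])     # K = {K0, +, 2*I0, +, 2}
matches = all((I_cf(i), J_cf(i), K_cf(i)) == row
              for i, row in enumerate(simulate(H0, I0, J0, K0, 6)))
print(matches, K_cf(3))
```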
26
Example 1
Loop L          Step   S = {⟨x, x⟩, ⟨z, z⟩}
x = 2
z = 0
do
  A(x) = A(z)
  x = x+z       3      S3 = {⟨x, x+z⟩, ⟨z, z+2⟩}
  y = z+1       2      S2 = {⟨x, x⟩, ⟨z, z+2⟩}
  z = y+1       1      S1 = {⟨x, x⟩, ⟨z, y+1⟩}
while (z < N)

CR form               Closed form
x = {x0, +, z}        x(i) = x0 + z0*i + i^2 - i
z = {z0, +, 2}        z(i) = z0 + 2i

    do i = 0, 2N-2
      A(i*i-i+2) = A(2*i)
    end do
27
Example 2
TRFD code segment from Perfect Benchmark with IV updates:
    DO I = 1, M
      DO J = 1, I
        ij = ij+1
        ijkl = ijkl+I-J+1
        DO K = I+1, M
          DO L = 1, K
            ijkl = ijkl+1
            xijkl[ijkl] = xkl[L]
          ENDDO
        ENDDO
        ijkl = ijkl+ij+left
      ENDDO
    ENDDO

TRFD after aggressive induction variable substitution (IVS):
    DO I = 0, M-1
      DO J = 0, I
        DO K = 0, M-I-2
          DO L = 0, I+K+1
            tmp = ijkl+L+I*(K+(M+M*M+2*left+6)/4)+J*(left+(M+M*M)/2)
                  +((I*I*M*M)+2*(K*K+3*K+I*I*(left+1))+M*I*I)/4+2
            xijkl[tmp] = xkl[L+1]
          ENDDO
        ENDDO
      ENDDO
    ENDDO
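
The substituted index can be cross-checked numerically. The Python sketch below replays the original TRFD updates and the normalized loop nest and compares the (index, L) pairs they produce. It assumes that ij starts at 0, that the index expression reads as written in the code below (operators restored from the flattened slide text), and that exact rational arithmetic is acceptable for the divisions; the function names are illustrative.

```python
from fractions import Fraction


def original_accesses(M, left, ijkl=0):
    """(index into xijkl, index into xkl) pairs of the original loop nest."""
    ij, out = 0, []
    for I in range(1, M + 1):
        for J in range(1, I + 1):
            ij += 1
            ijkl += I - J + 1
            for K in range(I + 1, M + 1):
                for L in range(1, K + 1):
                    ijkl += 1
                    out.append((ijkl, L))
            ijkl += ij + left
    return out


def substituted_accesses(M, left, ijkl=0):
    """Same pairs from the normalized nest with the closed-form index."""
    out = []
    for I in range(0, M):
        for J in range(0, I + 1):
            for K in range(0, M - I - 1):
                for L in range(0, I + K + 2):
                    tmp = (ijkl + L
                           + I * (K + Fraction(M + M * M + 2 * left + 6, 4))
                           + J * (left + Fraction(M + M * M, 2))
                           + Fraction(I * I * M * M
                                      + 2 * (K * K + 3 * K + I * I * (left + 1))
                                      + M * I * I, 4)
                           + 2)
                    out.append((tmp, L + 1))
    return out


same = all(original_accesses(M, left, start) == substituted_accesses(M, left, start)
           for M in range(1, 7) for left in range(0, 4) for start in (0, 10))
print(same)
```

After substitution the index depends only on the loop counters and loop-invariant values, so the four loops no longer carry a scalar recurrence on ijkl.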
28
Example 3 (SSA)
Source code:
    a = 1
    while (a < 10)
      x = a+2
      a = a+1

SSA form:
    a0 = 1
    if (a0 >= 10) goto L2
    L1: a1 = φ(a0, a2)
        x0 = a1 + 2
        a2 = a1 + 1
        if (a2 < 10) goto L1
    L2:

    a1 = {1, +, 1}

GCC 4.x uses our approach applied to SSA form.
Note: GCC developers refer to CRs as scalar evolutions
29
Example 4 (SSA)
Source code:
    x = 0
    i = 1
    while (i < 10)
      x = x+i
      i = i+1

SSA form:
    x0 = 0
    i0 = 1
    if (i0 >= 10) goto L2
    L1: x1 = φ(x0, x2)
        i1 = φ(i0, i2)
        x2 = x1 + i1
        i2 = i1 + 1
        if (i2 < 10) goto L1
    L2:

    i1 = {1, +, 1}
    x1 = {0, +, i1} = {0, +, 1, +, 1}
30
Example 5 (SSA)
Source code:
    j = 0
    i = 1
    while (i < 10)
      if (p)
        j = j+2
      else
        j = j+3
      i = i+1

SSA form:
    j0 = 0
    i0 = 1
    if (i0 >= 10) goto L2
    L1: i1 = φ(i0, i2)
        j1 = φ(j0, j4)
        if (!p) goto L3
        j2 = j1 + 2
        goto L4
    L3: j3 = j1 + 3
    L4: j4 = φ(j2, j3)
        i2 = i1 + 1
        if (i2 < 10) goto L1
    L2:

    {0, +, 2} <= j1 <= {0, +, 3}
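
The bounding CRs for a conditionally updated IV can be checked by brute force. The Python sketch below, with illustrative function names, enumerates every outcome sequence of the condition p over nine iterations and verifies that the value of j at the loop header always lies between {0, +, 2} and {0, +, 3}, that is, between 2i and 3i in iteration i.

```python
from itertools import product


def header_values(decisions):
    """Value of j at the loop header in each iteration (j1 in SSA form)."""
    j, values = 0, []
    for p in decisions:
        values.append(j)
        j = j + 2 if p else j + 3
    return values


def within_cr_bounds(values):
    """Check {0, +, 2} <= j1 <= {0, +, 3} at every iteration i."""
    return all(2 * i <= v <= 3 * i for i, v in enumerate(values))


all_ok = all(within_cr_bounds(header_values(d))
             for d in product([True, False], repeat=9))
print(all_ok)
```

Both bounds are attained: always taking the first branch follows the lower CR, always taking the second follows the upper one.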
31
Recognizing Mixed Functional Forms and Reductions
Loop L:
    I = 1
    do
      F = F*I
      I = I+1
    while (...)
Simplified CR form: F = {F0, *, 1, +, 1}, I = {1, +, 1}
Factorial: F = F0 * i!

Loop L:
    I = 0
    S = 0
    do
      S = S+A[I]
      I = I+2
    while (...)
Simplified CR form: S = {0, +, A[{0, +, 2}]}, I = {0, +, 2}
Reduction: S = Σ A[2i]
32
Pointer Access Descriptions of Pointer and Array
References
  • A pointer access description (PAD) [vanEngelen01]
    is a CR form of a pointer or array reference in a
    loop nest
  • PADs are computed with the CR-based IV algorithms

    short a[...], *p;
    int i;
    p = a;
    for (i = 0; ...; i++)
      ...

Loop code       PAD                  Sequence
a[i]            {a, +, 1}            a[0], a[1], a[2], a[3]
a[2*i+1]        {a+1, +, 2}          a[1], a[3], a[5], a[7]
a[(i*i-i)/2]    {a, +, 0, +, 1}      a[0], a[0], a[1], a[3]
a[1<<i]         {a+1, +, 1, *, 2}    a[1], a[2], a[4], a[8]
*p++            {a, +, 1}            a[0], a[1], a[2], a[3]
p += i          {a, +, 0, +, 1}      a[0], a[0], a[1], a[3]
33
CR-Enhanced Array Dependence Testing
  • Basic idea construct dependence equations in CR
    form for both pointer and array accesses
  • Determine the solution intervals by computing the
    value ranges of the equations in CR form
  • If the solution space is empty, there is no
    dependence

34
Example
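    float a[...], *p, *q;
    p = a;
    q = a + 2*n;
    for (i = 0; i < n; i++) {
      t = *p;
      S: *p++ = *q;
      *q-- = t;
    }

    p = {a, +, 1}
    q = {a+2n, +, -1}

Dependence equation:
    {a, +, 1}_id = {a+2n, +, -1}_iu
Constraints:
    0 <= id <= n-1
    0 <= iu <= n-1

Rewrite dependence equation:
    {a, +, 1}_id = {a+2n, +, -1}_iu
    ⇒ {a, +, 1}_id - {a+2n, +, -1}_iu = 0
    ⇒ {{-2n, +, 1}_iu, +, 1}_id = 0

Compute solution interval:
    Low{{-2n, +, 1}_iu, +, 1}_id = Low{-2n, +, 1}_iu = -2n
    Up{{-2n, +, 1}_iu, +, 1}_id = Up({-2n, +, 1}_iu + n-1)
                                = -2n + 2n - 2 = -2

No dependence: the solution interval [-2n, -2] does not contain 0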
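
The interval computation for this example can be replayed in Python. The sketch assumes the two access functions are a + id for p and a + 2n - iu for q, with both indices ranging over 0..n-1; the function names are illustrative. The brute-force search only serves as a cross-check for small n.

```python
def solution_interval(n):
    """Range of the difference (a + i_d) - (a + 2n - i_u) = -2n + i_u + i_d
    over 0 <= i_d, i_u <= n-1. The expression is increasing in both
    indices, so the extremes sit at the corners of the iteration space."""
    low = -2 * n + 0 + 0
    up = -2 * n + (n - 1) + (n - 1)
    return low, up


def may_depend(n):
    """A dependence is possible only if 0 lies in the solution interval."""
    low, up = solution_interval(n)
    return low <= 0 <= up


def depends_bruteforce(n):
    return any(i_d == 2 * n - i_u for i_d in range(n) for i_u in range(n))


results = [(n, solution_interval(n), may_depend(n), depends_bruteforce(n))
           for n in range(1, 8)]
print(results[-1])
```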
35
Determining the Value Range of a CR Form
  • Suppose x(i) = {x0, +, s(i-1)} for i = 0, ..., n
  • If s(i-1) > 0 then x(i) is monotonically
    increasing
  • If s(i-1) < 0 then x(i) is monotonically
    decreasing
  • If a function is monotonic on its domain, then it
    is trivial to find its exact value range
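
A minimal Python sketch of the monotonicity argument for additive CRs with numeric coefficients. It uses a sufficient condition only: if every coefficient after the first is non-negative (or non-positive), the stride never changes sign, so the extremes are at i = 0 and i = n. The name `cr_range` and the choice to return None when monotonicity is not established are assumptions of this sketch.

```python
from math import comb


def cr_range(coeffs, n):
    """Exact value range of {c0, +, c1, +, ..., +, ck} over i = 0..n when
    monotonicity can be established from coefficient signs, else None."""
    def value(i):
        return sum(c * comb(i, j) for j, c in enumerate(coeffs))

    stride = coeffs[1:]
    if all(c >= 0 for c in stride):
        return value(0), value(n)       # monotonically increasing
    if all(c <= 0 for c in stride):
        return value(n), value(0)       # monotonically decreasing
    return None                         # sign of the stride is unknown


increasing = cr_range((0, 1, 2), 4)     # i^2 on 0..4
decreasing = cr_range((10, -3), 5)      # 10 - 3i on 0..5
unknown = cr_range((0, -1, 2), 3)       # values 0, -1, 0, 3: not monotonic
print(increasing, decreasing, unknown)
```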

36
Example Nonlinear and Symbolic Dependence Testing
    float a[...], *p, *q;
    p = q = a;
    for (i = 0; i < n; i++) {
      for (j = 0; j <= i; j++)
        *q += *++p;
      q++;
    }

    p = {{a+1, +, 1, +, 1}_i, +, 1}_j = a[(i^2+i)/2 + j + 1]
    q = {a, +, 1}_i = a[i]
CR dep. test disproves flow dependence (<, <)

    DO I = 1, M+1
      S1: A[I*N+10] = ...
      S2: ... = A[2*I+K]
      K = 2*K+N
    ENDDO

    S1: A[{N+10, +, N}_i]
    S2: A[{K0+2, +, K0+N+2, *, 2}_i]
CR range test disproves dependence when KN > 10
and K > 2
37
Results
  • Implemented a CR-enhanced trapezoidal Banerjee
    test
  • Relatively simple test
  • Enhanced with support for nonlinear forms
  • Enhanced with support for conditional flow
  • Construct dependence equations in CR form
  • Implementation based on the Polaris compiler
  • Pros: can compare to powerful dependence tests
    such as Omega and Range test
  • Cons: Fortran only

38
Additional Independences Filtered over Omega Test
LAPACK
Perf. Benchmark
39
Additional Independences Filtered over Range Test
40
Additional Independences Filtered over Omega+Range
41
Percentage of Conditional IVs w/o Closed Forms in
LAPACK
42
Timing Comparison Perf Bench.
43
Timing Comparison LAPACK
44
Conclusions
  • A CR-based compiler framework has advantages
  • Applicable to CFG, AST, and SSA forms
  • Handles conditional flow
  • Handles nonlinear and symbolic induction variable
    expressions
  • Allows array and pointer-based dependence testing
    to be applied directly to the CR forms without
    induction variable substitution
  • Future work
  • Improve GCC implementation
  • Enhance other dependence tests with CR forms

45
Further Reading
  • Robert van Engelen, Johnnie Birch, Yixin Shou,
    Burt Walsh, and Kyle Gallivan, "A Unified
    Framework for Nonlinear Dependence Testing and
    Symbolic Analysis," in the proceedings of the ACM
    International Conference on Supercomputing (ICS),
    2004, pages 106-115.
  • Robert van Engelen, Johnnie Birch, and Kyle
    Gallivan, "Array Dependence Testing with the
    Chains of Recurrences Algebra," in the
    proceedings of the IEEE International Workshop on
    Innovative Architectures for Future Generation
    High-Performance Processors and Systems (IWIA),
    January 2004, pages 70-81.
  • Robert van Engelen and Kyle Gallivan, "An
    Efficient Algorithm for Pointer-to-Array Access
    Conversion for Compiling and Optimizing DSP
    Applications," in the proceedings of the 2001
    International Workshop on Innovative
    Architectures for Future Generation
    High-Performance Processors and Systems (IWIA),
    January 2001, pages 80-89.
  • Robert van Engelen, "Efficient Symbolic Analysis
    for Optimizing Compilers," in the proceedings of the
    International Conference on Compiler
    Construction, ETAPS 2001, LNCS 2027, pages
    118-132.

46
The End