Title: Reducing number of operations: The joy of algebraic transformations
1. Reducing number of operations: The joy of algebraic transformations
- CS498DHP Program Optimization
2. Number of operations and execution time
- Fewer operations do not necessarily mean
shorter execution times.
- Because of scheduling in a parallel environment.
- Because of locality.
- Because of communication in a parallel program.
- Nevertheless, although it has to be applied
carefully, reducing the number of operations is
one of the important optimizations.
- In this presentation, we discuss transformations
that reduce the number of operations or reduce the
length of the schedule in an idealized parallel
environment where communication costs are zero.
3. Scheduling
- Consider the expression tree
- It can be shortened by applying
- Associativity and commutativity:
a + h + b(c + g + def), or
- Associativity, commutativity and distributivity:
a + h + bc + bg + bdef.
- The last expression is the shortest of the
three. This means that with enough resources it
is the fastest, although it has the most
operations.
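
To make the trade-off concrete, the following Python sketch counts operations and computes the tree height for the two rewritten forms, assuming they read a + h + b(c + g + def) and a + h + bc + bg + bdef. The tuple encoding of trees and the helper names `count_ops` and `height` are illustrative assumptions, not part of the slides.

```python
# Expression trees as nested tuples (op, left, right); leaves are variable names.

def count_ops(tree):
    """Number of binary operations in the tree."""
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + count_ops(left) + count_ops(right)

def height(tree):
    """Longest chain of dependent operations, i.e. the parallel time
    when there are enough resources and communication is free."""
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

# a + h + b*(c + g + d*e*f)
d_e_f = ("*", ("*", "d", "e"), "f")
factored = ("+", ("+", "a", "h"),
                 ("*", "b", ("+", ("+", "c", "g"), d_e_f)))

# a + h + b*c + b*g + b*d*e*f, with b*d*e*f grouped as (b*d)*(e*f)
b_d_e_f = ("*", ("*", "b", "d"), ("*", "e", "f"))
distributed = ("+", ("+", ("+", "a", "h"), ("*", "b", "c")),
                    ("+", ("*", "b", "g"), b_d_e_f))

print(count_ops(factored), height(factored))        # 7 operations, height 5
print(count_ops(distributed), height(distributed))  # 9 operations, height 4
```

Under these assumptions the distributed form performs two more operations but finishes one step earlier.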
4. Locality
- Consider
    do i=1,n
      c(i) = a(i)+b(i)+a(i)/b(i)
    end do
    ...
    do i=1,n
      x(i) = (a(i)+b(i))*t(i)+a(i)/b(i)
    end do
- and the transformed sequence
    do i=1,n
      d(i) = a(i)/b(i)
      c(i) = a(i)+b(i)+d(i)
    end do
    ...
    do i=1,n
      x(i) = (a(i)+b(i))*t(i)+d(i)
    end do
- The second sequence executes fewer
operations, but, if n is large enough, it also
incurs more cache misses. (We assume that t
is computed between the two loops so that they
cannot be fused.)
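
As a rough illustration, the two variants can be written in Python as below. The operators `+` and `*` are an assumption, since the transcript lost them, and the function names and division counters are hypothetical. The sketch only shows the operation count and the extra array `d`; it does not measure cache misses.

```python
def run_original(a, b, t):
    """Recompute a[i]/b[i] in both loops: 2n divisions, no extra array."""
    n = len(a)
    divisions = 0
    c = [0.0] * n
    x = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i] + a[i] / b[i]
        divisions += 1
    # (t is assumed to be computed here, which prevents fusing the loops)
    for i in range(n):
        x[i] = (a[i] + b[i]) * t[i] + a[i] / b[i]
        divisions += 1
    return c, x, divisions

def run_transformed(a, b, t):
    """Save a[i]/b[i] in d: n divisions, but d is streamed through both loops."""
    n = len(a)
    divisions = 0
    c = [0.0] * n
    x = [0.0] * n
    d = [0.0] * n  # extra array of n elements competing for cache space
    for i in range(n):
        d[i] = a[i] / b[i]
        divisions += 1
        c[i] = a[i] + b[i] + d[i]
    for i in range(n):
        x[i] = (a[i] + b[i]) * t[i] + d[i]
    return c, x, divisions

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 8.0]
t = [1.0, 0.5, 2.0]
print(run_original(a, b, t)[2], run_transformed(a, b, t)[2])
```

Both functions return the same `c` and `x`; only the number of divisions and the memory traffic differ.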
5. Communication in parallel programs
- Consider
    cobegin
      do i=1,n
        a(i) = ..
      end do
      send a(1:n)
    //
      receive a(1:n)
    coend
- and the alternative, in which both tasks compute a
    cobegin
      do i=1,n
        a(i) = ..
      end do
    //
      do i=1,n
        a(i) = ..
      end do
    coend
- The second sequence executes more operations,
but it would execute faster if the send
operation is expensive.
6. Approaches to reducing cost of computation
- Eliminate (syntactically) redundant computations.
- Apply algebraic transformations to reduce the
number of operations.
- Decompose sequential computations for parallel
execution.
- Apply algebraic transformations to reduce the
height of expression trees and thus reduce
execution time in a parallel environment.
7. Elimination of redundant computations
- Many of the transformations were discussed in the
context of compiler transformations.
- Common subexpression elimination
- Loop invariant removal
- Elimination of redundant counters
- Loop unrolling (not discussed, but should have
been). It eliminates bookkeeping operations.
8.
- However, compilers will not eliminate all
redundant computations. Here is an example where
user intervention is needed.
- The following sequence
    do i=1,n
      s = a(i)+s
    end do
    ...
    do i=1,n-1
      t = a(i)+t
    end do
    ... t ...
9.
- may be replaced (assuming s and t start from
the same value) by
    do i=1,n-1
      t = a(i)+t
    end do
    s = t+a(n)
    ...
    ... t ...
- This transformation is not usually done by
compilers.
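
The same idea in a minimal Python sketch, assuming both accumulators start from the same initial value `init` (the function names and the addition counters are hypothetical and serve only to make the saving visible):

```python
def sums_separate(a, init=0):
    """Two independent loops: n + (n - 1) additions."""
    n = len(a)
    additions = 0
    s = init
    for i in range(n):
        s = a[i] + s
        additions += 1
    t = init
    for i in range(n - 1):
        t = a[i] + t
        additions += 1
    return s, t, additions

def sums_shared(a, init=0):
    """Reuse the partial sum t to obtain s: (n - 1) + 1 additions."""
    n = len(a)
    additions = 0
    t = init
    for i in range(n - 1):
        t = a[i] + t
        additions += 1
    s = t + a[n - 1]
    additions += 1
    return s, t, additions

data = [3, 1, 4, 1, 5]
print(sums_separate(data))  # (14, 9, 9)
print(sums_shared(data))    # (14, 9, 5)
```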
10.
- Another example, from C, is the loop
    for (i = 0; i < n; i++) {
      for (j = 0; j < n; j++) {
        a[i][j] = 0;
      }
    }
- which, if a is n × n, can be transformed into
the loop below that has fewer bookkeeping
operations.
    b = &a[0][0];
    for (i = 0; i < n*n; i++) {
      *b = 0; b++;
    }
11. Applying algebraic transformations to reduce the
number of operations
- For example, the expression a(bc) + (ba)d + ae
can be transformed into (ab)(c + d) + ae by
distributivity and then by associativity and
distributivity into a(b(c + d) + e).
- Notice that associativity has to be applied with
care. For example, suppose we are operating on
floating point values and that x is very much
larger than y and z = -x. Then (y + x) + z may
give 0 as a result, while y + (x + z) gives y as
an answer.
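
The floating point caveat is easy to reproduce. The values below are an illustrative choice (IEEE double precision, as used by Python floats), not taken from the slides:

```python
x = 1.0e20
y = 1.0
z = -x

left = (y + x) + z    # y is lost when added to the much larger x
right = y + (x + z)   # x + z is exactly 0.0, so y survives

print(left)   # 0.0
print(right)  # 1.0
```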
12.
- The application of algebraic rules can be very
sophisticated. Consider the computation of x^n. A
naïve implementation would require n-1
multiplications.
- However, if we represent n in binary as
n = b0 + 2(b1 + 2(b2 + ...)) and notice that
x^n = x^b0 (x^(b1 + 2(b2 + ...)))², the number of
multiplications can be reduced to O(log n).
13.
    function power(x,n)   (assume n > 0)
      if n == 1 then return x
      if n % 2 == 1 then return x*power(x,n-1)
      else x = power(x,n/2); return x*x
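
A runnable Python version of the same recursion is sketched below. The `counter` argument is an addition for illustration only; it records how many multiplications are performed.

```python
def power(x, n, counter):
    """Return x**n for an integer n > 0.
    counter is a one-element list that accumulates the multiplication count."""
    if n == 1:
        return x
    if n % 2 == 1:
        counter[0] += 1
        return x * power(x, n - 1, counter)
    half = power(x, n // 2, counter)
    counter[0] += 1
    return half * half

count = [0]
result = power(3, 13, count)
print(result, count[0])  # 1594323 computed with 5 multiplications instead of 12
```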
14. Horner's rule
- A polynomial
- A(x) = a0 + a1x + a2x² + a3x³ + ...
- may be written as
- A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).
- As a result, a polynomial of degree n may be
evaluated at a point x', that is A(x') computed,
in Θ(n) time using Horner's rule. It uses
repeated multiplications and additions, rather
than the naive method of raising x to powers,
multiplying by the coefficient, and accumulating.
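
A minimal Python sketch of Horner's rule, assuming the coefficients are passed lowest degree first as `[a0, a1, ..., an]` (the function name and argument order are illustrative choices):

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n with one multiplication
    and one addition per coefficient."""
    result = 0
    for a in reversed(coeffs):  # start from the innermost parenthesis
        result = result * x + a
    return result

print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*2**2 = 17
```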
15. Conventional matrix multiplication
- Asymptotic complexity: 2n³ operations
- Each recursion step (blocked version): 8
multiplications, 4 additions
16. Strassen's Algorithm
- Asymptotic complexity: O(n^(log2 7)) ≈ O(n^2.8)
operations
- Each recursion step: 7 multiplications, 18
additions/subtractions
- Asymptotic complexity is the solution of
T(n) = 7T(n/2) + 18(n/2)²
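
One recursion step can be sketched on a 2 × 2 matrix of scalars as follows. The seven products are the commonly cited Strassen formulas; the slide's own figure is not in the transcript, so the exact form shown there is an assumption. In the full algorithm each entry would be an (n/2) × (n/2) block.

```python
def strassen_step(A, B):
    """Multiply two 2 x 2 matrices using 7 multiplications
    and 18 additions/subtractions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # 7 products (10 additions/subtractions to form the operands)
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # 8 more additions/subtractions to assemble the result
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return [[c11, c12], [c21, c22]]

print(strassen_step([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```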
17. Winograd
- Asymptotic complexity: O(n^2.8..) operations
- Each recursion step: 7 multiplications, 15
additions/subtractions
18. Parallel matrix multiplication
- Parallel matrix multiplication can be
accomplished without redundant operations.
- First observe that the time to compute a sum of n
elements, given enough resources, is ⌈log2 n⌉
addition steps.
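
The logarithmic bound comes from adding the elements pairwise, level by level. The Python sketch below simulates this sequentially and counts the levels; the function name `tree_sum` is hypothetical, and a non-empty input is assumed.

```python
def tree_sum(values):
    """Sum a non-empty sequence by pairwise combination.
    Returns (total, steps), where steps is the number of parallel
    addition steps needed with enough resources."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        # All additions on one level are independent of each other.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element is carried to the next level
        level = nxt
        steps += 1
    return level[0], steps

print(tree_sum(range(1, 9)))      # (36, 3)
print(tree_sum([1, 2, 3, 4, 5]))  # (15, 3)
```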
19. Time
20. Time
21.
- With sufficient replication and computational
resources, matrix multiplication can take just one
multiplication step and ⌈log2 n⌉ addition steps.
22. Copying can also be done in logarithmic steps
23. Parallelism and redundancy
- Algebra rules can be applied to reduce tree
height.
- In some cases, the height of the tree is reduced
at the expense of an increase in the number of
operations.
29. Parallel Prefix
38. Redundancy in parallel sorting. Sorting networks.
39. Comparator (2-sorter)
- Inputs: x, y
- Outputs: min(x, y), max(x, y)
40. Comparison Network
- n / 2 comparisons per stage
41. Sorting Networks
- Figure: a sorting network takes a sequence of 0/1
inputs and produces the sorted sequence as outputs.
42. Insertion Sort Network
- depth 2n - 3
43.
Network                       Comparator stages   Comparators
Odd-even transposition sort   O(n)                O(n²)
Bubblesort                    O(n)                O(n²)
Bitonic sort                  O(log(n)²)          O(n log(n)²)
Odd-even mergesort            O(log(n)²)          O(n log(n)²)
Shellsort                     O(log(n)²)          O(n log(n)²)
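
As an illustration of the first row, the Python sketch below builds an odd-even transposition network as a list of comparator stages and applies it to an input. The representation (a stage is a list of wire pairs) and the function names are assumptions made for this example.

```python
from itertools import product

def odd_even_transposition_network(n):
    """Return n stages; each stage is a list of comparators (i, i + 1).
    Stages alternate between pairs starting at wire 0 and at wire 1."""
    stages = []
    for s in range(n):
        start = s % 2
        stages.append([(i, i + 1) for i in range(start, n - 1, 2)])
    return stages

def apply_network(stages, values):
    """Run the values through the network and return the outputs."""
    v = list(values)
    for stage in stages:
        # Comparators in one stage touch disjoint wires, so they could
        # all operate at the same time.
        for i, j in stage:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

n = 6
net = odd_even_transposition_network(n)
print(len(net))                                    # 6 stages
print(sum(len(stage) for stage in net))            # 15 comparators
print(apply_network(net, [5, 2, 4, 6, 1, 3]))      # [1, 2, 3, 4, 5, 6]

# Zero-one principle: a comparison network that sorts every 0/1 input
# sorts every input, so checking all 2**n bit patterns is sufficient.
ok = all(apply_network(net, bits) == sorted(bits)
         for bits in product([0, 1], repeat=n))
print(ok)
```

For n wires this construction uses n stages and n(n-1)/2 comparators, consistent with the O(n) and O(n²) entries in the table.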