1
Generating Truly Optimal Code Using a Metaprogramming Library

Don Clugston
First D Programming Conference, 24 August 2007

2
String mixins in D: undercooked, but very tasty

    char[] greet(char[] greeting) {
        return `writefln("` ~ greeting ~ `, world!");`;
    }
    void main() {
        mixin( greet("Hello") );
    }

Compiles to:

    void main() { writefln("Hello, world!"); }

Vindicates built-in string operations.
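
For readers using a current compiler, the same idea can be sketched in present-day D. This is an adaptation, not the code from the slide: `string` and `writeln` replace the 2007-era `char[]` and `writefln`, and the `static assert` is added only to show that the string really is built at compile time.

```d
import std.stdio : writeln;

// An ordinary function. Because it only manipulates strings, the compiler
// can also run it at compile time (CTFE).
string greet(string greeting)
{
    return `writeln("` ~ greeting ~ `, world!");`;
}

void main()
{
    // The generated source text is known at compile time...
    static assert(greet("Hello") == `writeln("Hello, world!");`);
    // ...and is compiled in place as a normal statement.
    mixin(greet("Hello"));
}
```
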
3
The Challenge

Fortran BLAS (a standard set of highly optimised routines). The crucial functions are coded in asm.
    y += a * x
But BLAS is limited: nothing for simple things, eg
    x = y - z
    a = r*0.3 + g*0.5 + b*0.2

    void DAXPY(double[] y, double[] x, double a) {
        for (int i = 0; i < y.length; ++i)
            y[i] += x[i] * a;
    }
4
Operator overloading

Gives ideal syntax, always works.
Can't operate on built-in types.
Inefficient because:
    Creates unnecessary temporaries.
    Multiple loops, eg a = b + c + d becomes the code below.
Somehow, we need to get the expression inside the for loop!

    double[] temp1 = new double[b.length], temp2 = new double[b.length];
    for (int i = 0; i < b.length; ++i)
        temp1[i] = b[i] + c[i];
    for (int i = 0; i < temp1.length; ++i)
        temp2[i] = temp1[i] + d[i];
    a = temp2;
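
The cost described above can be reproduced with a minimal sketch in present-day D. The `Vec` wrapper is a hypothetical type invented for this illustration; each overloaded operator allocates its own result array and runs its own loop, so a chained expression pays for one temporary per operator.

```d
// Hypothetical vector wrapper, used only to illustrate the problem.
struct Vec
{
    double[] data;

    // Every binary operator allocates a fresh array and loops once.
    Vec opBinary(string op)(Vec rhs)
    {
        auto result = new double[data.length];
        foreach (i; 0 .. data.length)
            result[i] = mixin("data[i] " ~ op ~ " rhs.data[i]");
        return Vec(result);
    }
}

void main()
{
    auto b = Vec([1.0, 2.0]);
    auto c = Vec([3.0, 4.0]);
    auto d = Vec([5.0, 6.0]);
    Vec a = b + c + d;   // two allocations and two passes over the data
    assert(a.data == [9.0, 12.0]);
}
```
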
5
The Wizard Solution: Expression Templates (eg, Blitz++)

Overloaded operators don't do the calculation; instead, they record the operation as a proxy type, creating a syntax tree.
Example: (a+b)/(c-d)
Need a good optimiser.
Works in D as well as C++. BUT: we are fighting the compiler!

    DVExpr<DVBinExprOp<DVExpr<DVBinExprOp<DVeciterT, DVeciterT, DApAdd>>,
        DVExpr<DVBinExprOp<DVeciterT, DVeciterT, DApSubtract>>,
        DApDivide>>
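
A rough sketch of the expression-template idea in present-day D follows. The names `Leaf`, `BinExpr` and `assign` are invented for this illustration and are not taken from Blitz++ or from the talk. The operators only build a nested type describing the expression; the arithmetic happens in the single loop inside `assign`.

```d
// Leaf node: wraps an array without copying it.
struct Leaf
{
    double[] data;
    double opIndex(size_t i) { return data[i]; }

    auto opBinary(string op, R)(R rhs)
    {
        return BinExpr!(op, Leaf, R)(this, rhs);
    }
}

// Interior node: the operation is recorded in the type. Nothing is
// computed until an element is requested through opIndex.
struct BinExpr(string op, L, R)
{
    L lhs;
    R rhs;
    double opIndex(size_t i) { return mixin("lhs[i] " ~ op ~ " rhs[i]"); }

    auto opBinary(string op2, R2)(R2 rhs2)
    {
        return BinExpr!(op2, typeof(this), R2)(this, rhs2);
    }
}

// One loop evaluates the whole tree, with no temporary arrays.
void assign(E)(double[] dest, E expr)
{
    foreach (i; 0 .. dest.length)
        dest[i] = expr[i];
}

void main()
{
    auto a = Leaf([1.0, 2.0]);
    auto b = Leaf([3.0, 6.0]);
    auto c = Leaf([5.0, 8.0]);
    auto d = Leaf([3.0, 4.0]);
    auto result = new double[2];
    assign(result, (a + b) / (c - d));   // (1+3)/(5-3) and (2+6)/(8-4)
    assert(result == [2.0, 2.0]);
}
```

Even in this small form the type of `(a + b) / (c - d)` is a three-level nested template, which is the readability cost the slide points at.
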
6
Representing the Syntax Tree in D

In D, any expression can be represented in a single template.
Represent types and values in a tuple. Represent the expression in a char[]. 'A'..'Z' correspond to T[0]..T[25].

    void vectorOperation(char[] expression, T...)(T values)

eg

    vectorOperation!("A+=(B*C)/(A+D)")(x, y, z, u, v);

Note that A appears twice in the expression (operator overloading can't represent that).
7
Finding the vectors in a tuple

It's a vector if you can index it.
Imperfection: can't index a tuple in CTFE.
Workaround: create an array of results.
Usage:
    if ( isVector!(Tuple)[i] )

    template isVector(T...) {
        static if (T.length == 0)
            const bool[] isVector = [];
        else static if ( is( typeof(T[0].init[0]) ) )   // indexable, so a vector
            const bool[] isVector = [true] ~ isVector!(T[1..$]);
        else
            const bool[] isVector = [false] ~ isVector!(T[1..$]);
    }
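
In present-day D the same test can be written a little more compactly. The version below is a sketch rather than the code from the talk: it uses `enum` instead of `const` and folds the two branches into one, and the `static assert` shows the resulting array for a sample tuple.

```d
// One bool per tuple element: true where a value of that type can be indexed.
template isVector(T...)
{
    static if (T.length == 0)
        enum bool[] isVector = [];
    else
        enum bool[] isVector =
            [is(typeof(T[0].init[0]))] ~ isVector!(T[1 .. $]);
}

// double[] and float[] can be indexed; a plain double cannot.
static assert(isVector!(double[], double, float[]) == [true, false, true]);

void main() {}
```

Because the result is an ordinary array, a compile-time function can index it with a loop variable, which is what the workaround on the slide is for.
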
8
Metaprogramming For Muggles

    char[] muggle(char[] expr, Values...)() {
        char[] code = "for (int i=0; i<values[0].length; ++i) {";
        foreach (c; expr) {
            if (c >= 'A' && c <= 'Z') {   // A-Z become tuple members.
                code ~= "values[" ~ itoa(c-'A') ~ "]";
                // add [i] if it was a vector
                if (isVector!(Values)[c-'A']) code ~= "[i]";
            } else code ~= c;             // Everything else is unchanged
        }
        return code ~ "; }";
    }
    template VEC(char[] expr) {
        void VEC(Values...)(Values values) {
            mixin( muggle!(expr, Values)() );
        }
    }

USAGE

    double[] firstvec, secondvec, thirdvec;
    VEC!("A+=B*(C+A*D)")(firstvec, secondvec, thirdvec, 25.7);
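
The following is a runnable adaptation of the "muggle" generator for present-day D, offered as a sketch under a few assumptions: `std.conv.to` stands in for the slide's `itoa`, `VEC` is written as a single function template, the generated loop uses `foreach`, and the expression and sample data in `main` are made up for the demonstration.

```d
import std.conv : to;

// One bool per tuple element: true where the type can be indexed.
template isVector(T...)
{
    static if (T.length == 0)
        enum bool[] isVector = [];
    else
        enum bool[] isVector =
            [is(typeof(T[0].init[0]))] ~ isVector!(T[1 .. $]);
}

// Runs at compile time and returns the source text of one fused loop.
string muggle(string expr, Values...)()
{
    string code = "foreach (i; 0 .. values[0].length) { ";
    foreach (c; expr)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // A..Z become tuple members.
            code ~= "values[" ~ to!string(c - 'A') ~ "]";
            // Vectors are indexed; scalars are used as they are.
            if (isVector!(Values)[c - 'A'])
                code ~= "[i]";
        }
        else
            code ~= c;   // everything else is copied unchanged
    }
    return code ~ "; }";
}

void VEC(string expr, Values...)(Values values)
{
    mixin(muggle!(expr, Values)());
}

void main()
{
    double[] first = [1.0, 2.0];
    double[] second = [2.0, 3.0];
    double[] third = [4.0, 5.0];
    VEC!("A+=B*(C+A*D)")(first, second, third, 0.5);
    // first[0] = 1 + 2*(4 + 1*0.5) = 10;  first[1] = 2 + 3*(5 + 2*0.5) = 20
    assert(first == [10.0, 20.0]);
}
```

The arrays are passed as slices, so the generated loop writes directly into the caller's data and no temporary array is created.
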

9
Trivial enhancements

Ensure all vectors are the same length.
Assert no aliasing (vectors don't overlap).
Equalize with hand-coded asm BLAS routines.

    foreach (int i, bool b; isVector!(Values)[1..$])
        if (b) code ~= "assert(values[" ~ itoa(i+1)
                     ~ "].length == values[0].length);";   // slice starts at 1

    static if ( expr == "A+=B*C" && is( Values[0] == double[] )
                && is( Values[1] == double[] ) && is( Values[2] == double ) )
        return DAXPY(values[0].length, values[0].ptr, values[1].ptr, values[2]);
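
The length check can be generated as text in the same way as the loop itself. The sketch below uses present-day D; `lengthChecks` and `check` are hypothetical names, and the array of flags is written out by hand where the library would use the result of `isVector`.

```d
import std.conv : to;

// Emits one assertion for every vector operand after the first.
string lengthChecks(bool[] isVec)
{
    string code;
    foreach (i, b; isVec)
        if (i > 0 && b)
            code ~= "assert(values[" ~ to!string(i)
                  ~ "].length == values[0].length);\n";
    return code;
}

static assert(lengthChecks([true, true, false]) ==
    "assert(values[1].length == values[0].length);\n");

// Hypothetical caller: operands 0 and 1 are vectors, operand 2 is a scalar.
void check(Values...)(Values values)
{
    mixin(lengthChecks([true, true, false]));
}

void main()
{
    check([1.0, 2.0], [3.0, 4.0], 2.5);   // lengths match, so this passes
}
```
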
10
Asm code via perturbation

It's hard to determine the optimal asm for an algorithm; it is much easier to modify existing code.
Begin with Agner Fog's optimal asm code for DAXPY. Use the same loop design and register allocation strategy.
Ignore difficult cases: fall back to D code.

11
X87 (stack-based)

Convert the infix expression into postfix. Split += into + and =.
Swap operands to avoid FMUL latency.
    A += B - C*D  →  A = (A+B) - (C*D)  →  C D * A B + - A =
Avoid gaps in the instruction set.
Eg, fewer instructions for 80-bit reals, so load them first whenever possible.
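
The infix-to-postfix step can be sketched with the standard shunting-yard method, written here in present-day D so that it also runs at compile time. This is a generic textbook conversion, not the library's own code: it assumes single-letter operands and the four binary operators, and it does not attempt the operand swapping or the splitting of compound assignments described above.

```d
// Precedence of the binary operators handled by this sketch.
int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

string toPostfix(string infix)
{
    string output;
    char[] stack;   // pending operators and '('

    foreach (c; infix)
    {
        if (c >= 'A' && c <= 'Z')
            output ~= c;                      // operands go straight out
        else if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            while (stack[$ - 1] != '(')
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack = stack[0 .. $ - 1];        // discard the '('
        }
        else if (c == '+' || c == '-' || c == '*' || c == '/')
        {
            // Left-associative: pop operators of equal or higher precedence.
            while (stack.length > 0 && stack[$ - 1] != '('
                   && prec(stack[$ - 1]) >= prec(c))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
    }
    while (stack.length > 0)
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

static assert(toPostfix("(A+B)-(C*D)") == "AB+CD*-");
static assert(toPostfix("B*(A+D)+C") == "BAD+*C+");

void main() {}
```
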

12
X87 code generation

Directly convert postfix to inline asm.

    VEC!("C+=B*(A+D)")(2213.3, vec1, floatvec, vec2);
    // Postfix: B A D + * C + C =
    L1: fld double ptr [EAX + 8*ESI];        // B
        fld double ptr [EAX + 8*ESI];        // A
        fadd double ptr [EDX + 8*ESI];       // D +
        fmulp ST(1), ST;                     // *
        fadd float ptr [ECX + 4*ESI];        // C +
        fxch ST(1), ST;
        fstp float ptr [ECX + 4*ESI - 4];    // C =
    L2: inc ESI;
        jnz L1;
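
A deliberately naive sketch of the postfix-to-asm step, in present-day D, is shown below. `toX87` is a hypothetical helper, and `[A]`, `[B]` and so on are placeholders for the real address operands. It emits one instruction per token and makes no attempt at the operand fusing, scheduling or register allocation that the listing above relies on.

```d
// Hypothetical emitter: one x87 instruction per postfix token.
string toX87(string postfix)
{
    string code;
    foreach (c; postfix)
    {
        switch (c)
        {
            case '+': code ~= "faddp ST(1), ST;\n"; break;
            case '-': code ~= "fsubp ST(1), ST;\n"; break;
            case '*': code ~= "fmulp ST(1), ST;\n"; break;
            case '/': code ~= "fdivp ST(1), ST;\n"; break;
            default:
                // Operand: push it on the FPU stack. "[X]" is a placeholder
                // for the operand's real address expression.
                code ~= "fld double ptr [" ~ c ~ "];\n";
                break;
        }
    }
    return code;
}

static assert(toX87("AB+") ==
    "fld double ptr [A];\nfld double ptr [B];\nfaddp ST(1), ST;\n");

void main() {}
```
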
13
SSE/SSE2 (register-based)

Can't do mixed-precision operations.
Unroll loop by 2 or 4, to take advantage of SIMD.
Instruction scheduling is less critical, but
register allocation is more complicated than for
x87.

14
GPGPU

Use the GPU in modern video cards to perform
massively parallel calculations.
Uses OpenGL or DirectX calls, instead of inline
asm.
Full of hacks (pretend your data is a texture!)
but a rational API should emerge soon.
This should NOT be built into a compiler!

15
Adding a front end

Operator overloading
    Same limitations as before
Mixins, eg mixin(blade("firstvec += secondvec * 2.38"));
    clumsy syntax, BUT
    Can detect aliases
    Allows better error messages
    Can unroll small loops inline
    Closer to proposed macro syntax

16
Front end using mixins

1. Lex: "first += second * 2.38" → "A+=B*C".
2. Determine types, resolve aliases, convert constants to literals.
3. Determine precedence and associativity.
4. Perform constant folding.
We can do most of this using mixins.
Compiler help is most required for 4.
__traits could help.
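
The lexing step can be sketched as a compile-time function in present-day D. The `Lexed` struct and the `lex` function are hypothetical names for this illustration, and the tokenising rule is an assumption: any run of letters, digits, underscores and dots is one symbol, and each distinct symbol gets the next letter.

```d
// Result of the hypothetical lexing pass.
struct Lexed
{
    string expr;        // eg "A+=B*C"
    string[] symbols;   // symbols[0] is A, symbols[1] is B, ...
}

bool isSymbolChar(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_' || c == '.';
}

Lexed lex(string source)
{
    Lexed result;
    size_t i = 0;
    while (i < source.length)
    {
        immutable c = source[i];
        if (c == ' ')
        {
            ++i;                       // whitespace is dropped
        }
        else if (isSymbolChar(c))
        {
            immutable start = i;
            while (i < source.length && isSymbolChar(source[i]))
                ++i;
            string sym = source[start .. i];
            // Reuse the letter if this symbol has been seen before.
            size_t idx = 0;
            while (idx < result.symbols.length && result.symbols[idx] != sym)
                ++idx;
            if (idx == result.symbols.length)
                result.symbols ~= sym;
            result.expr ~= cast(char)('A' + idx);
        }
        else
        {
            result.expr ~= c;          // operators are copied unchanged
            ++i;
        }
    }
    return result;
}

static assert(lex("first += second * 2.38").expr == "A+=B*C");
static assert(lex("first += second * 2.38").symbols == ["first", "second", "2.38"]);
static assert(lex("a = a + b").expr == "A=A+B");

void main() {}
```
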

17
Determining types

    char[] getSymbolTable(char[][] symbols) {
        char[] result = "[";
        for (int i = 0; i < symbols.length; ++i) {
            if (i > 0) result ~= ",";
            result ~= "[typeof(" ~ symbols[i] ~ ").stringof, "
                    ~ symbols[i] ~ ".stringof]";
        }
        result ~= "]";
        return result;
    }

When mixed in, this creates an array of [2] string literals.
[0] is the type, [1] is the value.
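
A present-day D sketch of the symbol-table trick follows. It mirrors the function above with `string` in place of `char[]`; the variables `first` and `second` are made up for the demonstration.

```d
// Builds the source text of an array of [type, value] string pairs.
string getSymbolTable(string[] symbols)
{
    string result = "[";
    foreach (i, sym; symbols)
    {
        if (i > 0)
            result ~= ", ";
        result ~= "[typeof(" ~ sym ~ ").stringof, " ~ sym ~ ".stringof]";
    }
    return result ~ "]";
}

void main()
{
    double[] first = [1.0];
    float second = 2.0f;

    // Mixed in at the point of use, so the names resolve in this scope.
    enum table = mixin(getSymbolTable(["first", "second"]));
    static assert(table[0] == ["double[]", "first"]);
    static assert(table[1] == ["float", "second"]);
}
```

The generator itself never sees the variables. It only produces text, and the types are resolved where that text is mixed in.
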

18
Determining precedence

    class AST(char[] expr) {
        alias expr text;
        AST!("(" ~ text ~ "+" ~ T.text ~ ")") opAdd(T)(T x) { return null; }
        AST!("(" ~ text ~ "*" ~ T.text ~ ")") opMul(T)(T x) { return null; }
        AST!( text ~ "[" ~ T.text ~ "]" ) opIndex(T)(T x) { return null; }
    }
    char[] getPrecedence(char[] expr) {
        char[] code = "typeof(";
        for (int i = 0; i < expr.length; ++i) {
            if (expr[i] >= 'A' && expr[i] <= 'Z')
                code ~= "(cast(AST!(\"" ~ expr[i] ~ "\"))(null))";
            else code ~= expr[i];
        }
        return code ~ ").text";
    }

    mixin(getPrecedence("A[B*C+D]"))  →  "A[((B*C)+D)]"
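
The same trick can be sketched in present-day D, where a single `opBinary` template replaces `opAdd` and `opMul`. This is an adaptation, not the code from the slide. The compiler's own parser decides precedence and associativity while it type-checks the expression inside `typeof`; nothing is ever executed, so the null references are harmless.

```d
// Each AST type carries the fully parenthesised text of its expression.
class AST(string expr)
{
    enum text = expr;

    // The operator is never run: only its return type matters.
    AST!("(" ~ text ~ op ~ T.text ~ ")") opBinary(string op, T)(T x)
    {
        return null;
    }

    AST!(text ~ "[" ~ T.text ~ "]") opIndex(T)(T x)
    {
        return null;
    }
}

// Replaces each letter with a typed null, wrapped in typeof(...).text.
string getPrecedence(string expr)
{
    string code = "typeof(";
    foreach (c; expr)
    {
        if (c >= 'A' && c <= 'Z')
            code ~= "(cast(AST!(\"" ~ c ~ "\"))(null))";
        else
            code ~= c;
    }
    return code ~ ").text";
}

static assert(mixin(getPrecedence("A+B*C")) == "(A+(B*C))");
static assert(mixin(getPrecedence("A[B*C+D]")) == "A[((B*C)+D)]");

void main() {}
```
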
19
Conclusion

Implementation and syntactic issues remain:
    Syntax for runtime and compile-time reflection.
    Macros, and an extended __traits syntax, should help.
    How to clean up mixin(), yet retain its power?
Yet perfectly optimal code is already possible.
Libraries can perform optimisations that previously required a compiler back-end.
