Generating Truly Optimal Code Using a Metaprogramming Library

1
Generating Truly Optimal Code Using a Metaprogramming Library
  • Don Clugston
  • First D Programming Conference, 24 August 2007

2
String mixins in D: undercooked, but very tasty
char[] greet(char[] greeting)
{
    return `writefln("` ~ greeting ~ `, world!");`;
}

void main()
{
    mixin( greet("Hello") );
}
  • Compiles to:

void main()
{
    writefln("Hello, world!");
}
  • Vindicates built-in string operations
3
The Challenge
  • Fortran BLAS (a standard set of highly optimised
    routines). The crucial functions are coded in
    asm.
  • y += a * x
  • But BLAS is limited: it has nothing for simple things such as
  • x = y - z
  • a = r*0.3 + g*0.5 + b*0.2

void DAXPY(double[] y, double[] x, double a)
{
    for (int i = 0; i < y.length; ++i)
        y[i] += x[i] * a;
}
4
Operator overloading
  • Gives ideal syntax, always works
  • Can't operate on built-in types
  • Inefficient because:
    • Creates unnecessary temporaries.
    • Multiple loops, eg a = b + c + d
  • Somehow, we need to get the expression inside the
    for loop!

double[] temp1 = new double[b.length], temp2 = new double[b.length];
for (int i = 0; i < b.length; ++i)
    temp1[i] = b[i] + c[i];
for (int i = 0; i < temp1.length; ++i)
    temp2[i] = temp1[i] + d[i];
a = temp2;
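
For contrast, the goal stated on this slide (one loop, no temporaries) might look like the following sketch in current D. It assumes the expression is a plain sum of three equal-length vectors; the function name `fusedAdd` is hypothetical and not part of the talk.

```d
// Hypothetical helper: evaluates a[i] = b[i] + c[i] + d[i] in a single
// pass, allocating only the result and no intermediate arrays.
double[] fusedAdd(const double[] b, const double[] c, const double[] d)
{
    assert(b.length == c.length && c.length == d.length);
    auto a = new double[b.length];
    foreach (i; 0 .. b.length)
        a[i] = b[i] + c[i] + d[i];
    return a;
}
```

Hand-writing such a loop for every expression is exactly what the rest of the talk tries to automate.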
5
The Wizard Solution: Expression Templates (eg, Blitz++)
  • Overloaded operators don't do the calculation;
    instead, they record the operation as a proxy
    type, creating a syntax tree.
  • Example: (a+b)/(c-d)
  • Need a good optimiser.
  • Works in D as well as C++. BUT we are fighting
    the compiler!

DVExpr<DVBinExprOp<
    DVExpr<DVBinExprOp<DVec::iterT, DVec::iterT, DApAdd>>,
    DVExpr<DVBinExprOp<DVec::iterT, DVec::iterT, DApSubtract>>,
    DApDivide>>
6
Representing the Syntax Tree in D
  • In D, any expression can be represented in a
    single template.
  • Represent types and values in a tuple. Represent
    the expression in a char[]. A..Z correspond to
    T[0]..T[25].
  • eg
  • Note that A appears twice in the expression
    (operator overloading can't represent that).

void vectorOperation(char[] expression, T...)(T values)
vectorOperation!("A+=(B*C)/(A+D)")(x, y, z, u, v);
7
Finding the vectors in a tuple
  • It's a vector if you can index it.
  • Imperfection: can't index a tuple in CTFE.
  • Workaround: create an array of results.
  • Usage:
  • if ( isVector!(Tuple)[i] )

template isVector(T...)
{
    static if (T.length == 0)
        const bool[] isVector = [];
    else static if ( is( typeof(T[0][0]) ) )
        const bool[] isVector = [true] ~ isVector!(T[1..$]);
    else
        const bool[] isVector = [false] ~ isVector!(T[1..$]);
}
8
Metaprogramming For Muggles
char[] muggle(char[] expr, Values...)()
{
    char[] code = "for (int i=0; i<values[0].length; ++i) {";
    foreach (c; expr) {
        if (c >= 'A' && c <= 'Z') {
            // A-Z become tuple members.
            code ~= "values[" ~ itoa(c-'A') ~ "]";
            // add [i] if it was a vector
            if (isVector!(Values)[c-'A']) code ~= "[i]";
        } else code ~= c; // Everything else is unchanged
    }
    return code ~ "; }";
}

template VEC(char[] expr)
{
    void VEC(Values...)(Values values)
    {
        mixin( muggle!(expr, Values) );
    }
}
  • USAGE
  • double[] firstvec, secondvec, thirdvec;
  • VEC!("A+=B*(C+A*D)")(firstvec, secondvec,
    thirdvec, 25.7);
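
The same idea can be tried with a current D compiler. The sketch below is a port under stated assumptions, not the talk's code: the names `isIndexable`, `vectorFlags`, `genLoop` and `vec` are hypothetical, `std.conv.to` stands in for the slide's `itoa`, and the operators in the usage string are chosen for illustration.

```d
import std.conv : to;

// A type counts as a vector if a value of it can be indexed.
enum isIndexable(T) = is(typeof(T.init[0]));

// One flag per tuple member, collected into an array so that the
// code generator can index it with an ordinary integer.
bool[] vectorFlags(Values...)()
{
    bool[] flags;
    foreach (T; Values)          // unrolled over the type tuple
        flags ~= isIndexable!T;
    return flags;
}

// Builds the loop source as a string. A..Z map to values[0]..values[25].
string genLoop(string expr, Values...)()
{
    auto flags = vectorFlags!Values();
    string code = "foreach (i; 0 .. values[0].length) { ";
    foreach (c; expr)
    {
        if (c >= 'A' && c <= 'Z')
        {
            immutable idx = c - 'A';
            code ~= "values[" ~ to!string(idx) ~ "]";
            if (flags[idx]) code ~= "[i]";   // vectors get indexed
        }
        else
            code ~= c;                       // operators pass through
    }
    return code ~ "; }";
}

// Mixes the generated loop into the function body at compile time.
void vec(string expr, Values...)(Values values)
{
    mixin(genLoop!(expr, Values)());
}
```

A call such as `vec!("A+=B*C")(a, b, 2.0)` with two `double[]` arguments would expand to a single loop over `values[0][i] += values[1][i] * values[2]`. Like the slide's version, this sketch assumes the first argument is a vector and does no length checking.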

9
Trivial enhancements
  • Ensure all vectors are the same length.
  • Assert no aliasing (vectors don't overlap).
  • Match the speed of hand-coded asm BLAS routines.

foreach (int i, bool b; isVector!(Values)[1..$])
    // i indexes the slice, so the tuple member is i + 1
    if (b) code ~= "assert(values[" ~ itoa(i + 1)
                 ~ "].length == values[0].length);";

static if ( expr == "A+=B*C" && is( Values[0] == double[] )
         && is( Values[1] == double[] ) && is( Values[2] == double ) )
    return DAXPY(values[0].length, values[0].ptr,
                 values[1].ptr, values[2]);
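
The aliasing check listed above is not shown on the slide. One way it could be written in current D is sketched here; the helper name `overlaps` is hypothetical, and the check treats any shared memory between two slices as aliasing.

```d
// Hypothetical helper: true if the two slices share at least one element.
bool overlaps(const double[] a, const double[] b)
{
    return a.length != 0 && b.length != 0
        && a.ptr < b.ptr + b.length
        && b.ptr < a.ptr + a.length;
}
```

A generated loop could then start with `assert(!overlaps(values[0], values[1]));` for each pair of vector arguments.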
10
Asm code via perturbation
  • It's hard to determine the optimal asm for an
    algorithm; it's much easier to modify existing code.
  • Begin with Agner Fog's optimal asm code for
    DAXPY. Use the same loop design and register
    allocation strategy.
  • Ignore difficult cases: fall back to D code.

11
X87 (stack-based)
  • Convert the infix expression into postfix. Split
    += into + and =.
  • Swap operands to avoid FMUL latency.
  • A += B - C*D → A = (A+B) - (C*D)
  • → C D * A B + - A =
  • Avoid gaps in the instruction set.
  • Eg, fewer instructions for 80-bit reals, so load
    them first whenever possible.
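
The slides do not say how the infix-to-postfix conversion is implemented. A minimal sketch in current D, assuming the standard shunting-yard algorithm, single-letter operands and no whitespace, could look like this; the names `prec` and `toPostfix` are hypothetical. It does not do the operand swapping or the splitting of += described above.

```d
// Operator precedence; 0 means "not an operator".
int prec(char op)
{
    switch (op)
    {
        case '=':      return 1;
        case '+', '-': return 2;
        case '*', '/': return 3;
        default:       return 0;
    }
}

// Shunting-yard conversion for operands A..Z, binary + - * / =,
// and parentheses.
string toPostfix(string infix)
{
    string output;
    char[] stack;
    foreach (c; infix)
    {
        if (c >= 'A' && c <= 'Z')
            output ~= c;
        else if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            while (stack.length && stack[$ - 1] != '(')
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            assert(stack.length, "unbalanced parentheses");
            stack = stack[0 .. $ - 1];       // discard the '('
        }
        else
        {
            assert(prec(c) > 0, "unknown operator");
            // Pop operators that bind tighter; '=' is right-associative,
            // the others are left-associative. '(' has precedence 0 and
            // therefore is never popped here.
            while (stack.length
                   && (prec(stack[$ - 1]) > prec(c)
                       || (prec(stack[$ - 1]) == prec(c) && c != '=')))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
    }
    while (stack.length)
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

// The function is also usable at compile time.
static assert(toPostfix("B*(A+D)+C") == "BAD+*C+");
```

Because the function runs under CTFE, its result can feed a string mixin that emits one instruction per postfix token.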

12
X87 code generation
  • Directly convert postfix to inline asm.

VEC!("C+=B*(A+D)")( 2213.3, vec1, floatvec, vec2);
// Postfix: B A D + * C + C =
L1:
    fld double ptr [EAX + 8*ESI];      // B
    fld double ptr [EAX + 8*ESI];      // A
    fadd double ptr [EDX + 8*ESI];     // D +
    fmulp ST(1), ST;                   // *
    fadd float ptr [ECX + 4*ESI];      // C +
    fxch ST(1), ST;
    fstp float ptr [ECX + 4*ESI - 4];  // C =
L2:
    inc ESI;
    jnz L1;
13
SSE/SSE2 (register-based)
  • Can't do mixed-precision operations.
  • Unroll loop by 2 or 4, to take advantage of SIMD.
  • Instruction scheduling is less critical, but
    register allocation is more complicated than for
    x87.

14
GPGPU
  • Use the GPU in modern video cards to perform
    massively parallel calculations.
  • Uses OpenGL or DirectX calls, instead of inline
    asm.
  • Full of hacks (pretend your data is a texture!)
    but a rational API should emerge soon.
  • This should NOT be built into a compiler!

15
Adding a front end
  • Operator overloading
    • Same limitations as before
  • Mixins, eg mixin(blade("firstvec += secondvec * 2.38"))
    • Clumsy syntax, BUT:
    • Can detect aliases
    • Allows better error messages
    • Can unroll small loops inline
    • Closer to proposed macro syntax

16
Front end using mixins
  1. Lex: first += second * 2.38 → A+=B*C.
  2. Determine types, resolve aliases, convert
     constants to literals.
  3. Determine precedence and associativity.
  4. Perform constant folding.
  • We can do most of this using mixins.
  • Compiler help is most required for step 4.
  • __traits could help.
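
The lexing step is not shown in the slides. A minimal sketch in current D is given below, assuming that identifiers and numeric literals are runs of letters, digits, `_` and `.`, and that repeated symbols reuse the same letter; the names `Lexed`, `isSymbolChar` and `lex` are hypothetical.

```d
struct Lexed
{
    string expr;       // the expression with symbols replaced by A, B, C, ...
    string[] symbols;  // symbols[0] is A, symbols[1] is B, and so on
}

// Characters that may appear in an identifier or numeric literal.
bool isSymbolChar(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_' || c == '.';
}

Lexed lex(string source)
{
    Lexed result;
    size_t i = 0;
    while (i < source.length)
    {
        immutable c = source[i];
        if (c == ' ') { ++i; continue; }                       // skip spaces
        if (!isSymbolChar(c)) { result.expr ~= c; ++i; continue; } // operator

        // Consume a whole identifier or numeric literal.
        immutable start = i;
        while (i < source.length && isSymbolChar(source[i])) ++i;
        string sym = source[start .. i];

        // Reuse the letter if this symbol has been seen before.
        size_t idx = 0;
        while (idx < result.symbols.length && result.symbols[idx] != sym)
            ++idx;
        if (idx == result.symbols.length)
            result.symbols ~= sym;
        assert(idx < 26, "at most 26 distinct symbols");
        result.expr ~= cast(char)('A' + idx);
    }
    return result;
}
```

For the input `first += second * 2.38` this yields the expression `A+=B*C` and the symbol list `first`, `second`, `2.38`, which is the form the later steps work on.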

17
Determining types
char[] getSymbolTable(char[][] symbols)
{
    char[] result = "[";
    for (int i = 0; i < symbols.length; ++i) {
        if (i > 0) result ~= ",";
        result ~= "[typeof(" ~ symbols[i] ~ ").stringof, "
                ~ symbols[i] ~ ".stringof]";
    }
    result ~= "]";
    return result;
}
  • When mixed in, this creates an array of
    two-element arrays of string literals.
  • [0] is the type, [1] is the value

18
Determining precedence
class AST(char[] expr)
{
    alias expr text;
    AST!("(" ~ text ~ "+" ~ T.text ~ ")") opAdd(T)(T x) { return null; }
    AST!("(" ~ text ~ "*" ~ T.text ~ ")") opMul(T)(T x) { return null; }
    AST!( text ~ "(" ~ T.text ~ ")" ) opIndex(T)(T x) { return null; }
}

char[] getPrecedence(char[] expr)
{
    char[] code = "typeof(";
    for (int i = 0; i < expr.length; ++i) {
        if (expr[i] >= 'A' && expr[i] <= 'Z')
            code ~= "(cast(AST!(\"" ~ expr[i] ~ "\"))(null))";
        else
            code ~= expr[i];
    }
    return code ~ ").text";
}

mixin(getPrecedence("A+=B*C+D")) → "A+=((B*C)+D)"
19
Conclusion
  • Implementation and syntactic issues remain
  • Syntax for runtime and compile-time reflection
  • Macros, and an extended __traits syntax should
    help.
  • How to clean up mixin(), yet retain its power?
  • Yet perfectly optimal code is already possible.
    Libraries can perform optimisations that previously
    required a compiler back-end.