... NSF, Argonne National Lab, a gift from Intel, National Energy Research ... Idea: Replace Sparsity's explicit (BLAS-1-like) register block multiplication...
Reason: Low Flop to memory ratio: 2. Sparse matrix operation ... Makes sense when X does not fit in cache. Rectangular matrices, in particular. Source vector x ...
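A minimal sketch of a sparse matrix-vector multiply, assuming the ratio of 2 above means one multiply and one add per stored matrix entry loaded from memory. The compressed sparse row (CSR) layout and the names `csr_spmv`, `row_ptr`, `col_idx`, and `vals` are illustrative assumptions, not taken from the slides.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A stored in CSR form; also count flops."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    flops = 0
    for i in range(n_rows):
        acc = 0.0
        # Each stored nonzero is read once and used for one multiply and one add.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
            flops += 2
        y[i] = acc
    return y, flops


# Hypothetical 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y, flops = csr_spmv(row_ptr, col_idx, vals, x)
print(y, flops, flops / len(vals))
```

Every matrix value is touched exactly once, so there is no reuse of matrix data to amortize the memory traffic; only the source vector x can be reused from cache.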
Dense matrix kernels similar Sparse kernels ... major disadvantage to the dense kernels ... Dense case supports arbitrary register blocks (no added zeros) ...
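The remark that the dense case needs no added zeros suggests that, in the sparse case, register blocks may have to be padded with explicit zeros. A rough sketch of that cost, assuming aligned r-by-c blocks; the function name `fill_ratio` and the example coordinates are hypothetical and not from the slides.

```python
def fill_ratio(coords, r, c):
    """Stored entries (including padding zeros) divided by true nonzeros,
    when the matrix is tiled into aligned r-by-c register blocks."""
    # A block is stored in full if it contains at least one nonzero.
    blocks = {(i // r, j // c) for i, j in coords}
    stored = len(blocks) * r * c
    return stored / len(coords)


# Hypothetical nonzero pattern of a 3x3 matrix, as (row, column) pairs.
coords = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]

ratio_1x1 = fill_ratio(coords, 1, 1)  # no blocking, no padding
ratio_2x2 = fill_ratio(coords, 2, 2)  # 4 blocks of 4 entries for 5 nonzeros
print(ratio_1x1, ratio_2x2)
```

A fill ratio of 1.0 means no zeros were added; larger values mean extra stored zeros and extra flops that a blocked kernel has to win back through better register reuse.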
[Figure: processors P0, P1, P2, P3] 9/30/09. CS267, Yelick. ... Optimization Opportunities. Send only necessary parts of x ...
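One way to read "send only necessary parts of x" is that each processor inspects the column indices of its own rows and requests only the entries of x it does not already hold. The sketch below assumes a block-row distribution with x partitioned the same way; the CSR layout, the function name `needed_remote_x`, and the two-processor example are assumptions made for illustration.

```python
def needed_remote_x(row_ptr, col_idx, my_rows, my_x):
    """Return the sorted indices of x that this processor must receive.

    my_rows: range of matrix rows owned locally
    my_x:    range of x entries owned locally
    """
    needed = set()
    for i in my_rows:
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j not in my_x:  # entry of x lives on another processor
                needed.add(j)
    return sorted(needed)


# Hypothetical 4x4 matrix in CSR form:
# row 0: columns 0, 1   row 1: column 1
# row 2: columns 0, 2, 3   row 3: column 3
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 1, 1, 0, 2, 3, 3]

p0_needs = needed_remote_x(row_ptr, col_idx, range(0, 2), range(0, 2))
p1_needs = needed_remote_x(row_ptr, col_idx, range(2, 4), range(2, 4))
print(p0_needs, p1_needs)
```

In this example the first processor needs nothing from its neighbor and the second needs a single entry, compared with broadcasting all of x to every processor.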
dominant programming language used in engineering applications ... To overrule: IMPLICIT NONE. To add an additional implicit variable type: IMPLICIT COMPLEX C,X ...
Dimension ranges from a few hundred to over a million. NNZ/row ranges from 1 to a few hundred ... Look over only a certain range of problem dimensions and NNZ/row ...