Title: Linear Algebra on GPUs
1Linear Algebra on GPUs
- Jens Krüger Technische Universität München
2Linear algebra?
- Why are we interested in Linear Algebra?
- It is THE machinery to solve PDEs
- PDEs are at the core of many graphics
applications - Physics based simulation, Animation, Mesh
fairing
3LA on GPUs?
- and why put LA on GPU?
- A perfect couple
- GPUs are fast stream processors,
- and many LA operations are streamable
- which goes hand in hand
- The solution is already on the GPU and ready for
display
4Getting started
Computer graphics applications
GPU as workhorse for numerical computations
Programmable GPUs
5Getting started
Computer graphics applications
GPU as workhorse for numerical computations
Programmable GPUs
6Internal affairs
-
-
- Per-pixel vs. per-vertex operations
- 6 gigapixels/second vs. 0.7 gigavertices/second
- Efficient texture memory cache
- Texture read-write access
- 2D Textures are even better
- 2D RGBA textures really rock
Vector representation
7Representation (cont.)
- Dense Matrix representation
- Treat a dense matrix as a set of column vectors
- Again, store these vectors as 2D textures
8Representation (cont.)
- Banded Sparse Matrix representation
- Treat a banded matrix as a set of diagonal
vectors - Combine opposing vectors to save space
Matrix
N
9Operations 1
- Vector-Vector Operations
- Reduced to 2D texture operations
- Coded in pixel shaders
- Example Vector1 Vector2 ? Vector3
Render Texture
Static quad
Vertex Shader
Pixel Shader
10Operations 2 (reduce)
- Reduce operation for scalar products
Reduce m x n region in fragment shader
11The single float on GPUs
- Some operations generate single float values e.g.
reduce - Read-back to main-mem is slow
- ? Keep single floats on the GPU as 1x1 textures
...
12Operations (cont.)
- Matrix-Vector Operations
- Split it up into Vector Vector operations
13Operations
- In depth example Vector / Banded-Matrix
Multiplication
A
x
b
14Example (cont.)
- Vector / Banded-Matrix Multiplication
A
b
A
x
b
15Example (cont.)
- Compute the result in 2 Passes
A
Pass 2
Pass 1
b
x
b
16Building a Framework
- Presented so far
- Representations on the GPU for
- Single float values
- Vectors
- Matrices
- Dense
- Banded
- Random sparse (see SIGGRAPH 03)
- Operations on these representations
- Add, multiply, reduce,
- Upload, download, clear, clone,
17Framework Classes (UML)
18Framework Example CG
- Encapsulate into Classes for more complex
algorithms - Example use Conjugate Gradient Method, complete
source
void clCGSolversolveInit() Matrix-gtmatrixVect
orOp(CL_SUB,X,B,R) // R Ax-b R-gtmultiply(-1)
// R -R R-gtclone(P) // P R
R-gtreduceAdd(R, Rho) // rho
sum(RR) void clCGSolversolveIteration()
Matrix-gtmatrixVectorOp(CL_NULL,P,NULL,Q) // Q
Ap P-gtreduceAdd(Q,Temp) // temp
sum(PQ) Rho-gtdiv(Temp,Alpha) // alpha
rho/temp X-gtaddVector(P,X,1,Alpha) // X X
alphaP R-gtsubtractVector(Q,R,1,Alpha) // R R
- alphaQ R-gtreduceAdd(R,NewRho) // newrho
sum(RR) NewRho-gtdivZ(Rho,Beta) // beta
newrho/rho R-gtaddVector(P,P,1,Beta) // P
RbetaP clFloat temp tempNewRho NewRhoRho
Rhotemp // swap rho and newrho
pointers void clCGSolversolve(int maxI)
solveInit() for (int i 0ilt maxIi)
solveIteration() int clCGSolversolve(float
rhoTresh, int maxI) solveInit()
Rho-gtclone(NewRho) for (int i 0ilt maxI
NewRho.getData() gt rhoTreshi)
solveIteration() return i
19Example 12D Waves (explicit)
- Finite difference discretization
- You could write a custom shader for this filter
- Think about this as a matrix-vector product
202D Waves (cont.)
- One Time Matrix Initialization
for (isYiltsXsYi) datai 1
// setup diagonal-sY matrix-gtgetRow(sX(sY-1))
-gtsetData(data) for (i0iltsXsYi) datai
(isX) ? 1 0 // setup diagonal-1 matrix-gt
getRow(sXsY-1)-gtsetData(data) for
(i0iltsXsYi) datai -4
// setup diagonal matrix-gtgetRow(sXsY)-gtsetData(d
ata) for (i0iltsXsYi) datai ((i1)sX)
? 1 0 // setup diagonal1 matrix-gtgetRow(sXs
Y1)-gtsetData(data) for (i0iltsX(sY-1)i)
datai 1 // setup
diagonalsY matrix-gtgetRow(sX(sY1))-gtsetData(dat
a)
Per Frame Iteration
clMatrix-gtmatrixVectorOp(CL_NULL,clLast,NULL,clCur
rent) // curr matrixlast clLast-gtcopyVector(c
lCurrent) // save for
next iteration clCurrent-gtunpack(cluCurrent)
// unpack for
rendering renderHF(cluCurrent-gtm_pVectorTexture)
// render as heightfield
21Example 22D Waves (implicit)
- Key Idea
- Use different time discretization (e.g. Crank
Nicholson) - Results in system of linear equations
- Iterative solution using CG
222D Waves (cont.)
- One Time Matrix Initialization
for (isYiltsXsYi) datai -alpha
// setup diagonal-sY matrix-gtgetRow(sX(sY-1
))-gtsetData(data) for (i0iltsXsYi) datai
(isX) ? - alpha 0 // setup
diagonal-1 matrix-gtgetRow(sXsY-1)-gtsetData(data)
for (i0iltsXsYi) datai 4alpha1
// setup diagonal matrix-gtgetRow(sXsY)-gtse
tData(data) for (i0iltsXsYi) datai
((i1)sX) ? -alpha0 // setup
diagonal1 matrix-gtgetRow(sXsY1)-gtsetData(data)
for (i0iltsX(sY-1)i) datai -alpha
// setup diagonalsY matrix-gtgetRow(sX(sY
1))-gtsetData(data)
Per Frame Iteration
cluRHS-gtcomputeRHS(cluLast, cluCurrent) //
generate c(i,j,t) clRHS-gtrepack(cluRHS)
// encode into RGBA iSteps
pCGSolver-gtsolve(iMaxSteps) // solve using
CG cluLast-gtcopyVector(cluCurrent) // save
for next iteration clNext-gtunpack(cluCurrent)
// unpack for rendering renderHF(cluCur
rent-gtm_pVectorTexture) // render as heightfield
23Demos
24The End
For more infos, browse to http//wwwcg.in.tum.de/
GPU http//www.gpgpu.org