Linear Algebra on GPUs

Provided by: steve1631

Transcript and Presenter's Notes

Title: Linear Algebra on GPUs

1
Linear Algebra on GPUs

Jens Krüger Technische Universität München

2
Linear algebra?

Why are we interested in linear algebra?
It is THE machinery to solve PDEs
PDEs are at the core of many graphics applications
Physics-based simulation, animation, mesh fairing

3
LA on GPUs?

... and why put LA on the GPU?
A perfect couple
GPUs are fast stream processors, and many LA operations are streamable, so the two go hand in hand
The solution is already on the GPU and ready for display

4
Getting started
Computer graphics applications
GPU as workhorse for numerical computations

Programmable GPUs
6
Internal affairs

Per-pixel vs. per-vertex operations
6 gigapixels/second vs. 0.7 gigavertices/second
Efficient texture memory cache
Texture read-write access

2D Textures are even better

2D RGBA textures really rock

Textures: the best we can do

Vector representation
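
To make the "2D RGBA textures" idea concrete, here is a minimal CPU-side sketch of how a 1D vector could be laid out in a square RGBA texture, four elements per texel. The names `Texel`, `VectorTexture`, `packVector` and `fetch` are hypothetical, and the row-major, four-consecutive-elements-per-texel layout is an assumption; the slides do not state the exact layout the framework uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One RGBA texel holds four consecutive vector elements (assumed layout).
struct Texel { float rgba[4]; };

// Hypothetical CPU-side stand-in for a square 2D RGBA texture.
struct VectorTexture {
    std::size_t side;           // texture is side x side texels
    std::size_t length;         // number of valid vector elements
    std::vector<Texel> texels;  // row-major texel storage, zero padded
};

// Pack a 1D vector into the smallest square texture that can hold it.
VectorTexture packVector(const std::vector<float>& v) {
    const std::size_t texelCount = (v.size() + 3) / 4;  // round up
    std::size_t side = 1;
    while (side * side < texelCount) ++side;

    VectorTexture t;
    t.side = side;
    t.length = v.size();
    t.texels.assign(side * side, Texel{{0.0f, 0.0f, 0.0f, 0.0f}});
    for (std::size_t i = 0; i < v.size(); ++i) {
        t.texels[i / 4].rgba[i % 4] = v[i];  // element i -> texel i/4, channel i%4
    }
    return t;
}

// Read element i back; texel (x, y) = ((i/4) % side, (i/4) / side).
float fetch(const VectorTexture& t, std::size_t i) {
    assert(i < t.length);
    const std::size_t texel = i / 4;
    const std::size_t x = texel % t.side;
    const std::size_t y = texel / t.side;
    return t.texels[y * t.side + x].rgba[i % 4];
}
```

A vector of 10 elements, for example, needs 3 texels and therefore lands in a 2 x 2 texture with the unused channels zero padded.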
7
Representation (cont.)

Dense Matrix representation
Treat a dense matrix as a set of column vectors
Again, store these vectors as 2D textures

8
Representation (cont.)

Banded Sparse Matrix representation
Treat a banded matrix as a set of diagonal vectors
Combine opposing vectors to save space

[Figure: matrix of dimension N]
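
The diagonal-vector representation can be sketched on the CPU as follows. This is an illustration under assumptions, not the framework's code: `BandedMatrix` and `multiply` are hypothetical names, each diagonal is stored at full length n, and the space-saving trick of combining opposing diagonals is left out for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical banded matrix: one full-length vector per stored diagonal.
// For offset d, entry i of the vector holds A(i, i + d).
struct BandedMatrix {
    std::size_t n;
    std::map<int, std::vector<float>> diagonals;
};

// b = A * x, accumulated one diagonal at a time. Each diagonal contributes
// an element-wise product of the diagonal with a shifted copy of x.
std::vector<float> multiply(const BandedMatrix& A, const std::vector<float>& x) {
    assert(x.size() == A.n);
    std::vector<float> b(A.n, 0.0f);
    for (const auto& entry : A.diagonals) {
        const int offset = entry.first;
        const std::vector<float>& diag = entry.second;
        assert(diag.size() == A.n);
        for (std::size_t i = 0; i < A.n; ++i) {
            const std::ptrdiff_t j = static_cast<std::ptrdiff_t>(i) + offset;
            if (j >= 0 && j < static_cast<std::ptrdiff_t>(A.n)) {
                b[i] += diag[i] * x[static_cast<std::size_t>(j)];
            }
        }
    }
    return b;
}
```

Storing one vector per diagonal means the cost scales with the number of non-zero diagonals rather than with n squared, which is the point of the banded representation.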
9
Operations 1

Vector-Vector Operations
Reduced to 2D texture operations
Coded in pixel shaders
Example: Vector1 + Vector2 → Vector3

[Figure: static quad → vertex shader → pixel shader → render texture]
10
Operations 2 (reduce)

Reduce operation for scalar products

Reduce m x n region in fragment shader
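
A CPU sketch of how a multi-pass reduce can compute a scalar product: multiply element-wise, then repeatedly shrink the texture until a single value remains. The 2 x 2 region per pass and the power-of-two texture size are assumptions made for simplicity (the slide speaks of a general m x n region), and `reducePass` and `dot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One reduce pass: every output texel sums a 2x2 block of the input.
// 'in' is a row-major side x side array; the result is (side/2) x (side/2).
std::vector<float> reducePass(const std::vector<float>& in, std::size_t side) {
    assert(side % 2 == 0 && in.size() == side * side);
    const std::size_t half = side / 2;
    std::vector<float> out(half * half);
    for (std::size_t y = 0; y < half; ++y) {
        for (std::size_t x = 0; x < half; ++x) {
            const std::size_t top = (2 * y) * side + 2 * x;
            const std::size_t bottom = top + side;
            out[y * half + x] = in[top] + in[top + 1] + in[bottom] + in[bottom + 1];
        }
    }
    return out;
}

// Scalar product of two vectors stored as side x side arrays.
// side must be a power of two so that the passes end at 1x1.
float dot(const std::vector<float>& a, const std::vector<float>& b, std::size_t side) {
    assert(side > 0 && (side & (side - 1)) == 0);
    assert(a.size() == side * side && b.size() == side * side);
    std::vector<float> cur(side * side);
    for (std::size_t i = 0; i < cur.size(); ++i) cur[i] = a[i] * b[i];
    while (side > 1) {
        cur = reducePass(cur, side);
        side /= 2;
    }
    return cur[0];  // the remaining 1x1 "texture"
}
```

With this scheme a side x side texture needs log2(side) passes, and the final result naturally ends up as a 1 x 1 array.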
11
The single float on GPUs

Some operations generate single float values, e.g. reduce
Read-back to main memory is slow
→ Keep single floats on the GPU as 1x1 textures

...
12
Operations (cont.)

Matrix-Vector Operations
Split it up into vector-vector operations
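
For a dense matrix stored as column vectors, the split can be sketched like this: A*x is the sum of the columns, each scaled by the matching element of x, so the whole product becomes a sequence of scaled vector additions. The names `Vec`, `addScaled` and `multiply` are hypothetical, and this runs on plain CPU arrays rather than textures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// One vector-vector operation: y = y + s * x.
void addScaled(Vec& y, const Vec& x, float s) {
    assert(y.size() == x.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += s * x[i];
}

// Dense matrix given as its column vectors: A * x = sum_j x[j] * column_j.
Vec multiply(const std::vector<Vec>& columns, const Vec& x) {
    assert(!columns.empty() && columns.size() == x.size());
    Vec y(columns[0].size(), 0.0f);
    for (std::size_t j = 0; j < columns.size(); ++j) {
        addScaled(y, columns[j], x[j]);
    }
    return y;
}
```

For the 2 x 2 matrix with rows (1, 2) and (3, 4), the columns are (1, 3) and (2, 4); multiplying by x = (1, 1) adds the two columns.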

13
Operations

In-depth example: Vector / Banded-Matrix Multiplication

[Figure: A x = b]

14
Example (cont.)

Vector / Banded-Matrix Multiplication

[Figure: A x = b]

15
Example (cont.)

Compute the result in 2 passes

[Figure: A x = b, computed in Pass 1 and Pass 2]
16
Building a Framework

Presented so far
Representations on the GPU for
Single float values
Vectors
Matrices
Dense
Banded
Random sparse (see SIGGRAPH 03)
Operations on these representations
Add, multiply, reduce, ...
Upload, download, clear, clone, ...

17
Framework Classes (UML)
18
Framework Example: CG

Encapsulate into classes for more complex algorithms
Example use: Conjugate Gradient method, complete source

void clCGSolver::solveInit() {
    Matrix->matrixVectorOp(CL_SUB, X, B, R);      // R = A*x - b
    R->multiply(-1);                              // R = -R
    R->clone(P);                                  // P = R
    R->reduceAdd(R, Rho);                         // rho = sum(R*R)
}

void clCGSolver::solveIteration() {
    Matrix->matrixVectorOp(CL_NULL, P, NULL, Q);  // Q = A*p
    P->reduceAdd(Q, Temp);                        // temp = sum(P*Q)
    Rho->div(Temp, Alpha);                        // alpha = rho/temp
    X->addVector(P, X, 1, Alpha);                 // X = X + alpha*P
    R->subtractVector(Q, R, 1, Alpha);            // R = R - alpha*Q
    R->reduceAdd(R, NewRho);                      // newrho = sum(R*R)
    NewRho->divZ(Rho, Beta);                      // beta = newrho/rho
    R->addVector(P, P, 1, Beta);                  // P = R + beta*P
    clFloat *temp = NewRho; NewRho = Rho; Rho = temp;  // swap rho and newrho pointers
}

void clCGSolver::solve(int maxI) {
    solveInit();
    for (int i = 0; i < maxI; i++) solveIteration();
}

int clCGSolver::solve(float rhoTresh, int maxI) {
    solveInit();
    Rho->clone(NewRho);
    int i;  // declared outside the loop so it can be returned
    for (i = 0; i < maxI && NewRho->getData() > rhoTresh; i++) solveIteration();
    return i;
}
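
For readers who want to run the algorithm without the GPU framework, here is a CPU sketch that follows the same steps as the solveInit / solveIteration code on this slide. It is an illustration under assumptions: `Vec`, `MatVec`, `dot` and `conjugateGradient` are hypothetical names, the matrix is passed as a callback, and double precision is used instead of GPU floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using MatVec = std::function<Vec(const Vec&)>;  // computes A * v

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b for a symmetric positive definite A.
// x holds the initial guess on entry and the solution on exit.
// Returns the number of iterations performed.
int conjugateGradient(const MatVec& A, const Vec& b, Vec& x,
                      double rhoThresh, int maxI) {
    assert(b.size() == x.size());
    const Vec Ax = A(x);
    Vec r(b.size());
    for (std::size_t k = 0; k < r.size(); ++k) r[k] = b[k] - Ax[k];  // R = b - A*x
    Vec p = r;                                                       // P = R
    double rho = dot(r, r);                                          // rho = sum(R*R)

    int i = 0;
    for (; i < maxI && rho > rhoThresh; ++i) {
        const Vec q = A(p);                      // Q = A*p
        const double alpha = rho / dot(p, q);    // alpha = rho / sum(P*Q)
        for (std::size_t k = 0; k < x.size(); ++k) {
            x[k] += alpha * p[k];                // X = X + alpha*P
            r[k] -= alpha * q[k];                // R = R - alpha*Q
        }
        const double newRho = dot(r, r);         // newrho = sum(R*R)
        const double beta = newRho / rho;        // beta = newrho / rho
        for (std::size_t k = 0; k < p.size(); ++k) {
            p[k] = r[k] + beta * p[k];           // P = R + beta*P
        }
        rho = newRho;
    }
    return i;
}
```

Passing the matrix as a callback keeps the solver independent of the storage format, in the same spirit as the framework hiding dense, banded and sparse matrices behind one matrixVectorOp call.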
19
Example 1: 2D Waves (explicit)

Finite difference discretization
You could write a custom shader for this filter
Think about this as a matrix-vector product
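
One way to read the filter as a matrix-vector product, assuming it is the standard five-point Laplacian stencil on an sX × sY grid (the slide does not spell the stencil out, so this is an assumption):

```latex
(A u)_{i,j} = u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4\,u_{i,j}
```

With the grid flattened row by row, $k = j \cdot sX + i$, the four neighbors of entry $k$ sit at $k \pm 1$ and $k \pm sX$. The matrix $A$ therefore has only five non-zero diagonals, which is exactly the case the banded representation is built for.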

20
2D Waves (cont.)

One Time Matrix Initialization

for (i = sY; i < sX*sY; i++) data[i] = 1;                    // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = (i % sX) ? 1 : 0;      // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = -4;                    // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = ((i+1) % sX) ? 1 : 0;  // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i = 0; i < sX*(sY-1); i++) data[i] = 1;                 // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);
Per Frame Iteration
clMatrix->matrixVectorOp(CL_NULL, clLast, NULL, clCurrent);  // curr = matrix*last
clLast->copyVector(clCurrent);                               // save for next iteration
clCurrent->unpack(cluCurrent);                               // unpack for rendering
renderHF(cluCurrent->m_pVectorTexture);                      // render as heightfield
21
Example 2: 2D Waves (implicit)

Key Idea
Use a different time discretization (e.g. Crank-Nicolson)
Results in system of linear equations
Iterative solution using CG
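
As a rough illustration of the kind of system that results, assume the implicit step can be written as $(I - \alpha L)\,u^{t+1} = c^{t}$, where $L$ is the five-point Laplacian, $\alpha > 0$ collects the time step, grid spacing and wave speed, and $c^{t}$ is a right-hand side built from known time steps. This form and the symbols are assumptions for the sketch; the slide only names the discretization. Each row of the system then reads:

```latex
(1 + 4\alpha)\,u^{t+1}_{i,j}
  - \alpha \left( u^{t+1}_{i-1,j} + u^{t+1}_{i+1,j} + u^{t+1}_{i,j-1} + u^{t+1}_{i,j+1} \right)
  = c^{t}_{i,j}
```

The matrix is symmetric and strictly diagonally dominant with a positive diagonal, hence symmetric positive definite, which is the property the conjugate gradient method requires.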

22
2D Waves (cont.)

One Time Matrix Initialization

for (i = sY; i < sX*sY; i++) data[i] = -alpha;                    // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = (i % sX) ? -alpha : 0;      // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = 4*alpha+1;                  // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i = 0; i < sX*sY; i++) data[i] = ((i+1) % sX) ? -alpha : 0;  // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i = 0; i < sX*(sY-1); i++) data[i] = -alpha;                 // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);
Per Frame Iteration
cluRHS->computeRHS(cluLast, cluCurrent);   // generate c(i,j,t)
clRHS->repack(cluRHS);                     // encode into RGBA
iSteps = pCGSolver->solve(iMaxSteps);      // solve using CG
cluLast->copyVector(cluCurrent);           // save for next iteration
clNext->unpack(cluCurrent);                // unpack for rendering
renderHF(cluCurrent->m_pVectorTexture);    // render as heightfield
23
Demos
24
The End