A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

1
A Multigrid Solver for Boundary Value Problems
Using Programmable Graphics Hardware
Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys
University of Virginia
Graphics Hardware 2003
July 26-27, San Diego, CA
2
General-Purpose GPU Programming
  • Why do we port algorithms to the GPU?
  • How much faster can we expect it to be, really?
  • What is the challenge in porting?

3
Case Study
  • Problem: Implement a Boundary Value Problem (BVP)
    solver using the GPU
  • Could benefit an entire class of scientific and
    engineering applications, e.g.
  • Heat transfer
  • Fluid flow

4
Related Work
  • Krüger and Westermann: Linear Algebra Operators
    for GPU Implementation of Numerical Algorithms
  • Bolz et al.: Sparse Matrix Solvers on the GPU:
    Conjugate Gradients and Multigrid
  • Very similar to our system
  • Developed concurrently
  • Complementary approach

5
Driving problem: Fluid mechanics sim
  • Problem domain is a warped disc

[Figure label: regular grid]
6
BVPs: Background
  • Boundary value problems are sometimes governed by
    PDEs of the form
  • $L\phi = f$
  • $L$ is some operator
  • $\phi$ is the unknown field over the problem domain
  • $f$ is a forcing function (source term)
  • Given $L$ and $f$, solve for $\phi$.

7
BVPs: Example
  • Heat Transfer
  • Find a steady-state temperature distribution $T$ in
    a solid of thermal conductivity $k$ with thermal
    source $S$
  • This requires solving a Poisson equation of the
    form
  • $k \nabla^2 T = -S$
  • This is a BVP where $L$ is the Laplacian operator
    $\nabla^2$
  • All our applications require a Poisson solver.
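
For illustration, one standard way to discretize this equation is the second-order five-point stencil on a uniform 2D grid. The slides do not state which stencil is used, so this is an assumption; the spacing $h$ and the grid samples $T_{i,j}$, $S_{i,j}$ are names introduced here.

```latex
k \, \frac{T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4\,T_{i,j}}{h^2} = -S_{i,j}
\quad\Longrightarrow\quad
T_{i,j} = \frac{1}{4}\left( T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} + \frac{h^2}{k}\, S_{i,j} \right)
```

Each unknown then depends only on its four neighbours, which is what makes a per-cell kernel a natural fit.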

8
BVPs: Solving
  • Most such problems cannot be solved analytically
  • Instead, discretize onto a grid to form a set of
    linear equations, then solve
  • Direct elimination
  • Gauss-Seidel iteration
  • Conjugate-gradient
  • Strongly implicit procedures
  • Multigrid method
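
As a small illustration of "discretize, then solve", the sketch below applies Gauss-Seidel iteration to a 1D version of the heat equation $k T'' = -S$ with $T = 0$ at both ends. The 1D setting, the function name, and the parameters are assumptions made for brevity; they are not taken from the slides.

```python
def gauss_seidel_poisson_1d(source, k, h, sweeps):
    """Approximate k*T'' = -S on a uniform 1D grid with T = 0 at both ends."""
    n = len(source)
    T = [0.0] * n  # initial guess; T[0] and T[-1] stay fixed at 0
    for _ in range(sweeps):
        for i in range(1, n - 1):
            # In-place update: T[i-1] already holds this sweep's new value.
            T[i] = 0.5 * (T[i - 1] + T[i + 1] + h * h * source[i] / k)
    return T


n = 11                      # grid points on [0, 1]
h = 1.0 / (n - 1)
T = gauss_seidel_poisson_1d([1.0] * n, k=1.0, h=h, sweeps=500)
print(T[n // 2])            # close to the exact midpoint value 0.125
```

For a constant source the exact solution is $T(x) = x(1-x)/2$, so the midpoint value gives a quick sanity check. The number of sweeps needed grows quickly with grid size, which is the motivation for multigrid.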

9
Multigrid method
  • Iteratively corrects an approximation to the
    solution
  • Operates at multiple grid resolutions
  • Low-resolution grids are used to correct
    higher-resolution grids recursively
  • Very fast, especially for large grids: O(n)

10
Multigrid method
  • Use coarser grid levels to recursively correct an
    approximation to the solution
  • Algorithm
  • smooth
  • residual
  • restrict
  • recurse
  • interpolate

residual $= L\phi_i - f$
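
The five steps can be sketched as a recursive V-cycle. This is a minimal CPU sketch for the 1D problem $\phi'' = f$ with fixed boundary values, not the GPU implementation described in the talk; Gauss-Seidel smoothing, full-weighting restriction, and linear interpolation are assumed choices, and all function names are hypothetical.

```python
def smooth(phi, f, h, sweeps=2):
    """Gauss-Seidel sweeps for (phi[i-1] - 2 phi[i] + phi[i+1]) / h^2 = f[i]."""
    for _ in range(sweeps):
        for i in range(1, len(phi) - 1):
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1] - h * h * f[i])


def residual(phi, f, h):
    """Residual L*phi - f at interior points (zero on the boundary)."""
    r = [0.0] * len(phi)
    for i in range(1, len(phi) - 1):
        r[i] = (phi[i - 1] - 2.0 * phi[i] + phi[i + 1]) / (h * h) - f[i]
    return r


def restrict(fine):
    """Full-weighting restriction from 2m+1 points to m+1 points."""
    nc = (len(fine) - 1) // 2 + 1
    coarse = [0.0] * nc
    for j in range(1, nc - 1):
        coarse[j] = 0.25 * fine[2 * j - 1] + 0.5 * fine[2 * j] + 0.25 * fine[2 * j + 1]
    return coarse


def interpolate(coarse):
    """Linear interpolation from m+1 points to 2m+1 points."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for j in range(len(coarse) - 1):
        fine[2 * j] = coarse[j]
        fine[2 * j + 1] = 0.5 * (coarse[j] + coarse[j + 1])
    fine[-1] = coarse[-1]
    return fine


def v_cycle(phi, f, h):
    if len(phi) == 3:
        smooth(phi, f, h, sweeps=1)          # one unknown: a single sweep is exact
        return phi
    smooth(phi, f, h)                        # smooth
    r = restrict(residual(phi, f, h))        # residual, restrict
    d = v_cycle([0.0] * len(r), r, 2.0 * h)  # recurse: solve L d = r on the coarse grid
    correction = interpolate(d)              # interpolate
    for i in range(1, len(phi) - 1):
        phi[i] -= correction[i]              # L(phi - d) = L phi - r = f
    smooth(phi, f, h)                        # post-smooth
    return phi


n = 65
h = 1.0 / (n - 1)
phi = [0.0] * n
f = [-1.0] * n                               # phi'' = -1, exact solution x(1 - x)/2
for _ in range(15):
    v_cycle(phi, f, h)
print(phi[n // 2])                           # approaches 0.125
```

Because the residual is defined as $L\phi_i - f$, the coarse-grid solution is subtracted from the fine-grid approximation rather than added.
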
11
Implementation
  • For each step of the algorithm
  • Bind as texture maps the buffers that contain the
    necessary data
  • Set the target buffer for rendering
  • Activate a fragment program that performs the
    necessary kernel computation
  • Render a grid-sized quad with multitexturing
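
The pass structure above can be mimicked on the CPU to make the data flow concrete. The sketch below is only an analogy: `render_pass` and `jacobi_kernel` are hypothetical names, no graphics API is involved, and the kernel is a plain Jacobi average for the Laplace equation rather than any of the talk's fragment programs.

```python
def render_pass(sources, target, fragment_program):
    """Emulate one pass: run the kernel once per cell of the target buffer."""
    height, width = len(target), len(target[0])
    for y in range(height):              # stands in for rendering a grid-sized quad
        for x in range(width):
            target[y][x] = fragment_program(sources, x, y)


def jacobi_kernel(sources, x, y):
    """Hypothetical smoothing kernel: average of the four neighbours."""
    phi = sources["solution"]            # the 'bound' source buffer
    h, w = len(phi), len(phi[0])
    if x in (0, w - 1) or y in (0, h - 1):
        return phi[y][x]                 # boundary values are held fixed
    return 0.25 * (phi[y][x - 1] + phi[y][x + 1] + phi[y - 1][x] + phi[y + 1][x])


n = 5
front = [[0.0] * n for _ in range(n)]
front[0] = [1.0] * n                     # top edge held at 1, other edges at 0
back = [row[:] for row in front]
for _ in range(200):
    render_pass({"solution": front}, back, jacobi_kernel)
    front, back = back, front            # swap source and target for the next pass
print(front[2][2])                       # centre value approaches 0.25
```

Each pass reads only from the source buffer and writes only to the target buffer, so the two are swapped between passes.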

[Figure: source buffer textures, fragment program, render target buffer]
12
Optimizing the Solver
  • Detect steady-state natively on GPU
  • Minimize shader length
  • Special-case whenever possible
  • Avoid context-switching

13
Optimizing the Solver: Steady-state
  • How to detect convergence?
  • L1 norm: average error
  • L2 norm: RMS error (common in visual sim)
  • L∞ norm: max error (common in sci/eng apps)
  • Can use occlusion query!
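
The three norms, and one plausible reading of the occlusion-query idea, can be sketched on the CPU as follows. The assumption here is that a pass discards every cell whose error is within tolerance, so the query's count of surviving cells is zero exactly when the L∞ norm is within tolerance. The function names are hypothetical and the slides do not spell out the mechanism.

```python
import math


def error_norms(errors):
    """Return (L1, L2, Linf) as average, RMS, and maximum absolute error."""
    n = len(errors)
    l1 = sum(abs(e) for e in errors) / n
    l2 = math.sqrt(sum(e * e for e in errors) / n)
    linf = max(abs(e) for e in errors)
    return l1, l2, linf


def cells_above_tolerance(errors, tolerance):
    """CPU stand-in for an occlusion query: count cells that would survive."""
    return sum(1 for e in errors if abs(e) > tolerance)


errors = [0.0, 3.0, -4.0, 0.0]
l1, l2, linf = error_norms(errors)
converged = cells_above_tolerance(errors, tolerance=5.0) == 0
print(l1, l2, linf, converged)   # 1.75 2.5 4.0 True
```

Only a single count has to be read back, rather than the whole error buffer.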

[Plot: seconds to steady state vs. grid size]
14
Optimizing the Solver: Shader length
  • Minimize number of registers used
  • Vectorize as much as possible
  • Use the rasterizer to perform computations of
    linearly-varying values
  • Pre-compute invariants on CPU

shader        original fp   fastpath fp   fastpath vp
smooth        79-6-1        20-4-1        12-2
residual      45-7-0        16-4-0        11-1
restrict      66-6-1        21-3-0        11-1
interpolate   93-6-1        25-3-0        13-2
15
Optimizing the Solver: Special-case
  • Fast-path vs. slow-path
  • write several variants of each fragment program
    to handle boundary cases
  • eliminates conditionals in the fragment program
  • equivalent to avoiding CPU inner-loop branching
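
The CPU analogy in the last bullet can be made concrete. In this sketch, one version tests for the boundary inside the inner loop, while the other restricts its loop bounds to the interior so that no test is needed. Both function names are hypothetical, and the boundary handling (values simply held fixed) is a simplification of whatever the slow-path fragment programs actually do.

```python
def smooth_branching(phi):
    """Single kernel with a per-cell boundary test in the inner loop."""
    h, w = len(phi), len(phi[0])
    out = [row[:] for row in phi]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1 and 0 < y < h - 1:   # conditional evaluated for every cell
                out[y][x] = 0.25 * (phi[y][x - 1] + phi[y][x + 1]
                                    + phi[y - 1][x] + phi[y + 1][x])
    return out


def smooth_fast_path(phi):
    """Interior-only kernel: the loop bounds exclude the boundary, so no test is needed."""
    h, w = len(phi), len(phi[0])
    out = [row[:] for row in phi]                 # boundary cells are carried over unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = 0.25 * (phi[y][x - 1] + phi[y][x + 1]
                                + phi[y - 1][x] + phi[y + 1][x])
    return out


grid = [[float((x * 7 + y * 3) % 5) for x in range(6)] for y in range(6)]
same = smooth_branching(grid) == smooth_fast_path(grid)
print(same)
```

On the GPU the same split would correspond to drawing the interior with the fast-path program and the edges with boundary-aware variants.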

fast path, no boundaries
slow path with boundaries
16
Optimizing the Solver: Special-case

[Plot: seconds per V-cycle vs. grid size]
17
Optimizing the Solver: Context-switching
  • Find the best packing of data from multiple grid
    levels into the pbuffer surfaces

20
Optimizing the Solver: Context-switching
  • Remove context switching
  • Can introduce operations with undefined results:
    reading/writing the same surface
  • Why do we need to do this?
  • Can we get away with it?
  • What about superbuffers?

21
Data Layout
  • Performance

[Plot: seconds to steady state vs. grid size]
22
Data Layout
  • Possible additional vectorization
  • Compute 4 values at a time
  • Requires source, residual, solution values to be
    in different buffers
  • Complicates boundary calculations
  • Adds setup and teardown overhead

Stacked domain
23
Results: CPU vs. GPU
  • Performance

[Plot: seconds to steady state vs. grid size]
24
Conclusions
  • What we need going forward
  • Superbuffers
  • or universal support for multiple-surface
    pbuffers
  • or cheap context switching
  • Developer tools
  • Debugging tools
  • Documentation
  • Global accumulator
  • Ever increasing amounts of precision, memory
  • Textures bigger than 2048 on a side

25
Acknowledgements
  • Hardware
  • David Kirk
  • Matt Papakipos
  • Driver Support
  • Nick Triantos
  • Pat Brown
  • Stephen Ehmann
  • Fragment Programming
  • James Percy
  • Matt Pharr
  • General-purpose GPU
  • Mark Harris
  • Aaron Lefohn
  • Ian Buck
  • Funding
  • NSF Award 0092793