Title: GPU Accelerated Image Aligned Splatting
1 GPU Accelerated Image Aligned Splatting
- Neophytos Neophytou, Klaus Mueller
- Center for Visual Computing, Stony Brook University (SUNY)
2 Motivation
- Until recently
  - Gaming industry drove development of 3D graphics boards.
  - Graphics architecture didn't really address scientific visualization needs.
  - Much work was needed to circumvent architectural limitations.
- Now
  - Games are still the driving force, but far more sophisticated
  - Programmable GPUs can be used for many things by the visualization community
3 Motivation
- Direct Volume Rendering on GPU
  - Using previous-generation NVIDIA FX and ATI equivalents
  - 3D texture-based volume rendering
  - Raycasting
  - Extensive use of fragment shaders for per-pixel programming
4 Motivation
- Why not Splatting?
  - Scatter vs. gather
  - Vertex processor vs. fragment processor
  - Need visibility sorting
- Pre-shaded splatting
- X-ray splatting
- Image Aligned Splatting → challenging
  - Requires floating-point (FP) blending, auxiliary buffers, early culling
  - Now available on NVIDIA 6800 and equivalent ATI
5 Previous Work
- Splatting [Westover 90]
  - Volume representation → overlapping basis functions
  - Projection of pre-calculated footprints for each point
  - Splat everything on the image plane and composite front-to-back
  - Main advantage: implicit space leaping
  - Initial problems → some color bleeding and sparkling
  - Solution: sheet-buffered splatting
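The footprint projection described above can be sketched as follows. This is a minimal illustration, not the original implementation: the Gaussian kernel, the table size, and the names `make_footprint` and `splat` are assumptions made for the example, and the accumulation is purely additive (no sheet buffers or compositing yet).

```python
import math

def make_footprint(radius, sigma):
    """Pre-compute a (2*radius+1)^2 Gaussian footprint table that sums to 1."""
    size = 2 * radius + 1
    table = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in table)
    return [[w / total for w in row] for row in table]

def splat(image, footprint, cx, cy, value):
    """Scatter one point: add its weighted footprint around pixel (cx, cy)."""
    radius = len(footprint) // 2
    height, width = len(image), len(image[0])
    for fy, row in enumerate(footprint):
        for fx, w in enumerate(row):
            x, y = cx + fx - radius, cy + fy - radius
            if 0 <= x < width and 0 <= y < height:  # clip to the image
                image[y][x] += value * w

image = [[0.0] * 16 for _ in range(16)]
footprint = make_footprint(radius=2, sigma=1.0)  # hypothetical kernel size
splat(image, footprint, 8, 8, 1.0)
splat(image, footprint, 9, 8, 0.5)
total_energy = sum(sum(row) for row in image)
```

Only pixels under a footprint are touched, which is the implicit space leaping noted above: empty regions of the volume generate no work at all.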
6 Previous Work
- Westover's compositing → axis-aligned sheet buffers → causes popping in animated viewing
- Solution → image-aligned sheet buffers [Mueller 98]
  - Slice, accumulate, and composite along the viewing direction
  - Use pre-computed kernel sections instead of the whole footprint
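One way to picture the pre-computed kernel sections is as the integral of the kernel profile over each slab it overlaps. The sketch below assumes a Gaussian profile along the viewing direction and unit-thickness slabs; `gaussian_cdf`, `kernel_sections`, and all numeric values are hypothetical names and parameters chosen for illustration.

```python
import math

def gaussian_cdf(t, sigma):
    """Cumulative distribution of a zero-mean Gaussian with std. dev. sigma."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

def kernel_sections(z_center, sigma, radius, slab_thickness):
    """Map slab index -> integral of the 1D kernel profile inside that slab."""
    z_min, z_max = z_center - radius, z_center + radius
    first = math.floor(z_min / slab_thickness)
    last = math.floor(z_max / slab_thickness)
    sections = {}
    for k in range(first, last + 1):
        lo = max(k * slab_thickness, z_min)
        hi = min((k + 1) * slab_thickness, z_max)
        if hi > lo:  # skip slabs the kernel only touches at a boundary
            sections[k] = (gaussian_cdf(hi - z_center, sigma)
                           - gaussian_cdf(lo - z_center, sigma))
    return sections

# A kernel of radius 2 centered at depth 2.3 overlaps slabs 0 through 4.
sections = kernel_sections(z_center=2.3, sigma=1.0, radius=2.0, slab_thickness=1.0)
```

Because the kernel extent is wider than one slab, every point contributes to several sheet buffers, which is the source of the extra rasterization work in the image-aligned approach.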
7 Previous Work
- Post-classified rendering [Mueller 99]
  - Sheet buffers accumulate opacities
  - Classification and shading per pixel
  - Gradient calculation per pixel at sheet buffers
[Figure: pre-shaded vs. post-shaded rendering]
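A minimal sketch of per-pixel post-classification, assuming central differences across three consecutive sheet buffers for the gradient. The transfer function and the diffuse-only shading model are hypothetical stand-ins, not the ones used in the cited work.

```python
import math

def gradient(prev_sheet, cur_sheet, next_sheet, x, y):
    """Central-difference gradient at interior pixel (x, y) of the current sheet."""
    gx = (cur_sheet[y][x + 1] - cur_sheet[y][x - 1]) / 2.0
    gy = (cur_sheet[y + 1][x] - cur_sheet[y - 1][x]) / 2.0
    gz = (next_sheet[y][x] - prev_sheet[y][x]) / 2.0
    return (gx, gy, gz)

def classify(density):
    """Hypothetical transfer function: grey color, opacity ramps from 0.2 to 0.8."""
    opacity = min(max((density - 0.2) / 0.6, 0.0), 1.0)
    return (density, density, density), opacity

def shade(color, grad, light_dir):
    """Diffuse shading with the normalized gradient used as the normal."""
    mag = math.sqrt(sum(g * g for g in grad))
    if mag == 0.0:
        return color  # homogeneous region: no usable normal, leave unshaded
    diffuse = abs(sum((g / mag) * l for g, l in zip(grad, light_dir)))
    return tuple(c * diffuse for c in color)

# Three identical 4x4 sheets holding a density ramp along x.
sheet = [[0.25 * x for x in range(4)] for _ in range(4)]
g = gradient(sheet, sheet, sheet, 1, 1)
color, opacity = classify(sheet[1][1])
lit = shade(color, g, (1.0, 0.0, 0.0))
```

Classifying after the densities have been accumulated is what distinguishes this from pre-shaded splatting, where each point is colored before it is blurred by its footprint.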
8 Previous Work
- Other optimizations (for software-based systems)
  - Fast footprint rasterization [Huang 00]
  - Post-convolved rendering [Neophytou 03]
  - Hierarchical splatting [Laur 91]
  - 3D adjacency structures [Orchard 01]
  - Optimal sampling grids [Theussl 01, Neophytou 02]
- Anti-aliasing issues addressed by
  - [Swan 97], [Mueller 98], [Zwicker 01, 02]
9 Previous Work
- GPU Accelerated X-Ray Splatting
  - Efficient point-convolution for X-ray [Xue 04]
  - High throughput with previous-generation hardware
  - Similar to our throughput, 2 generations earlier. Why?
    - Image-aligned splatting renders points at least 4x
    - Post-shading incurs additional costs/overhead
    - Speedup gained from Moore's law is consumed by the cost of producing high-quality images
- GPU Accelerated EWA Splatting [Chen 04]
  - High speedups
  - Retained-mode splatting
  - Axis-aligned buffer approach
    - May produce some popping
    - Cannot do post-shading
10 Image Aligned Splatting
- (1) Sort front-to-back
- (2) Create density slices
- (3) Apply transfer function and shade each slice
- (4) Composite front-to-back
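The four steps can be sketched for a single pixel as follows. This is an illustration under simplifying assumptions: each sample lands in exactly one slice, the transfer function is a hypothetical clamp, shading is omitted, and the function names are invented for the example.

```python
def bucket_by_slice(points, slab_thickness, n_slices):
    """Step 1: toss (depth, density) samples into front-to-back slice buckets."""
    buckets = [[] for _ in range(n_slices)]
    for depth, density in points:
        k = int(depth // slab_thickness)
        if 0 <= k < n_slices:
            buckets[k].append(density)
    return buckets

def classify(density):
    """Step 3 (hypothetical transfer function): grey = opacity = clamped density."""
    d = min(max(density, 0.0), 1.0)
    return d, d

def composite_front_to_back(slices):
    """Step 4: accumulate (color, opacity) pairs front-to-back."""
    color, alpha = 0.0, 0.0
    for c, a in slices:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
    return color, alpha

# Unsorted samples that project onto one pixel.
points = [(2.5, 0.25), (0.2, 0.5), (0.7, 0.0), (2.1, 0.25)]
buckets = bucket_by_slice(points, slab_thickness=1.0, n_slices=3)
density_slices = [sum(b, 0.0) for b in buckets]   # step 2: one density per slice
shaded = [classify(d) for d in density_slices]    # step 3
pixel_color, pixel_alpha = composite_front_to_back(shaded)
```

Bucketing replaces a full sort: only the slice order matters, because everything inside one slice is summed before it is classified.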
14 Challenge 1: Increased Vertex Traffic
- Splatting → need a textured quad for each point
  - 4x vertices
- Image-aligned slicing approach
  - Multiple slices per point → 4x points
- Obvious solutions
  - Use point sprites
  - Use vertex arrays
  - Improvement not significant (5%)
- Image-aligned splatting is mainly rasterization-bound.
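As a rough worked example of the vertex counts involved, assuming four slices per point and a hypothetical point count (the figure of 100,000 points is illustrative, not a measurement):

```python
def vertices_per_frame(n_points, slices_per_point, vertices_per_primitive):
    """Vertices submitted per frame when every slice of every point is drawn."""
    return n_points * slices_per_point * vertices_per_primitive

n_points = 100_000  # hypothetical number of non-empty points
as_quads = vertices_per_frame(n_points, slices_per_point=4, vertices_per_primitive=4)
as_sprites = vertices_per_frame(n_points, slices_per_point=4, vertices_per_primitive=1)
```

Point sprites cut the vertex count by a factor of four in this estimate, but the number of rasterized fragments is unchanged, which is consistent with the small improvement reported above.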
15 Challenge 2: Excessive Overdraw
- Overdraw: one point is rasterized into multiple buffers.
- Initial approach
  - Splat gradients (Nx, Ny, Nz, D)
- Alternative approach
  - Splat different density buffers using all RGBA channels
  - Rasterize each point only once
  - Compute the gradient on-the-fly
  - Use a 2D texture for (x, y) and modulate with a 1D kernel along z
- The alternative is 3 times faster.
16 Challenge 2: Excessive Overdraw
- Use color masks to cycle through channels/buffers
- Arrange z-coefficients in the 4 color components
- Assuming an ordering of R,G,B,A,R,G,B,A,R,G,B,A
  - Possible orderings for (i, i+1, i+2, i+3) will be RGBA, GBAR, BARG, ARGB
- Fragment program for each of these cases
  - Computes gradient/classification/shading per pixel.
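The channel cycling can be sketched as modular arithmetic on the slice index. The helper names below are hypothetical, and the sketch only models the bookkeeping; the boolean mask is laid out in (R, G, B, A) order, the same order a color-mask call takes its arguments.

```python
CHANNELS = "RGBA"

def channel_order(i):
    """Channels that hold slices i, i+1, i+2, i+3, in that order."""
    return "".join(CHANNELS[(i + k) % 4] for k in range(4))

def pack_z_coefficients(i, weights):
    """Place the z-weights for slices i..i+3 into the (R, G, B, A) components."""
    rgba = [0.0] * 4
    for k, w in enumerate(weights):
        rgba[(i + k) % 4] = w
    return tuple(rgba)

def write_mask(i):
    """Boolean (R, G, B, A) mask that enables only the channel holding slice i."""
    return tuple(c == CHANNELS[i % 4] for c in CHANNELS)

orders = [channel_order(i) for i in range(4)]
packed = pack_z_coefficients(1, [0.1, 0.2, 0.3, 0.4])
mask = write_mask(6)
```

With this layout a point is rasterized once: its 2D footprint texture is multiplied by the packed coefficient vector, and a single blend updates all four slices it overlaps.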
20 Challenge 3: Shading Empty Regions
- Empty-space skipping: implicit for splatting.
- But what about slice-based splatting on the GPU?
  - All of the empty image area is useless, but it will be processed with expensive shading/compositing anyway
- Utilize the early-z culling GPU optimization to disable processing of empty space
- NOTE: All temp buffers share the SAME depth buffer. Use aux buffers as multiple surfaces of the same p-buffer
21 Challenge 3: Shading Empty Regions
- Early z-culling with GL_DEPTH_RANGE_TEST
  - Eliminates fragments whose depth is outside the allowed range
- We use the depth buffer to tag newly splatted pixels
[Figure: frame buffer and depth buffer]
Different shades of grey represent different slices. We set DEPTH_RANGE to allow only the current slice.
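A CPU-side simulation of the tagging idea, for illustration only. It assumes the tag written to the depth buffer is `slice_index / n_slices` with slice indices starting at 1, so that a depth of 0 means "never splatted"; that encoding and the function names are hypothetical, and no GPU calls are involved.

```python
def tag_splatted_pixels(depth_buffer, pixels, slice_index, n_slices):
    """Splatting pass: mark every covered pixel with the current slice's tag."""
    tag = slice_index / n_slices
    for x, y in pixels:
        depth_buffer[y][x] = tag

def surviving_fragments(depth_buffer, slice_index, n_slices):
    """Shading pass: keep only pixels whose depth lies in the current slice's range."""
    tag = slice_index / n_slices
    half = 0.5 / n_slices
    return [(x, y)
            for y, row in enumerate(depth_buffer)
            for x, d in enumerate(row)
            if tag - half <= d <= tag + half]

n_slices = 4
depth = [[0.0] * 4 for _ in range(4)]

tag_splatted_pixels(depth, [(1, 1), (2, 1)], 1, n_slices)
shaded_1 = surviving_fragments(depth, 1, n_slices)

tag_splatted_pixels(depth, [(2, 1), (3, 3)], 2, n_slices)
shaded_2 = surviving_fragments(depth, 2, n_slices)
```

In this toy case each shading pass touches 2 of 16 pixels. Because the tag changes with every slice, the depth buffer never has to be cleared between slices.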
23 Challenge 4: Shading of Opaque Regions
- Early splat elimination: applied by culling splats that project to opaque regions of the image
[Figure labels: useless processing vs. actual slice contribution]
- Utilize the early-z culling GPU optimization to disable processing of opaque image regions
24 Challenge 4: Shading of Opaque Regions
- Early z-culling with the normal depth test
  - Eliminates fragments with 0 depth (use of a hierarchical z-buffer also eliminates whole quads if small enough)
- We set depth = 0 for all opaque fragments in the compositing program. Yes, we do have to read the image buffer as a texture.
[Figure: frame buffer and depth buffer]
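The effect of tagging opaque pixels can be simulated on the CPU as below. This is a sketch under assumptions: the opacity threshold of 0.95 is hypothetical, the buffers are flat lists, and the `depth[i] == 0.0` check stands in for the hardware depth test rejecting the fragment.

```python
def composite_slice(alpha, depth, slice_alpha, threshold=0.95):
    """Composite one slice; return how many fragments were actually processed."""
    processed = 0
    for i in range(len(alpha)):
        if depth[i] == 0.0:
            continue  # pixel already opaque: fragment is culled
        alpha[i] += (1.0 - alpha[i]) * slice_alpha[i]
        processed += 1
        if alpha[i] >= threshold:
            depth[i] = 0.0  # tag as opaque for all later passes
    return processed

alpha = [0.0, 0.0]   # accumulated opacity of two pixels
depth = [1.0, 1.0]   # non-zero depth: pixel is still accepting fragments
work = [composite_slice(alpha, depth, [0.8, 0.1]) for _ in range(3)]
```

The first pixel saturates after two slices and is skipped on the third, while the second pixel keeps accumulating; the same tag would also reject later splats that project onto the first pixel.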
25 Overall Inefficiencies and Improvements
- Bucket tossing → done on CPU, up to 0.1 sec
- Overdraw from slicing points to image-aligned buffers
  - Treat RGBA channels as buffers → 300% faster
- Processing of empty regions in shading/compositing
  - Early-z culling for empty-space skipping → 30% gains
- Splatting and processing of opaque regions
  - Early-z culling for early splat elimination → 200% gains
26 Overall Inefficiencies and Improvements
- Early-z extension is temperamental on NVIDIA boards
  - Avoid clearing the z-buffer
  - Avoid changing the z-test direction (GL_GREATER ↔ GL_LESS)
  - Cannot write to the depth component in a fragment program
  - BUT using GL_DEPTH_RANGE_TEST seems to allow this!
27 Results
- Volume size 128x128x128, image size 400x400
  - Semi-transparent FPS: 7.2
  - Iso-surface FPS: 7.6
28 Results
- Volume size 128x128x128, image size 400x400
  - Semi-transparent FPS: 4.9
  - Head, BCC FPS: 3.1
- Volume size 256x256x128, image size 400x400
  - Iso-surface FPS: 5.1
  - BCC FPS: 5.3
- Volume size 128x128x128, image size 400x400
  - FPS: 9
29 Conclusions
- We have provided a GPU-accelerated implementation of the Image Aligned Splatting technique
- Addressed main inefficiencies
  - Vertex traffic
  - Excessive overdraw
  - Empty-space leaping
  - Early splat elimination
- High quality at acceptable frame rates
- Not faster than 3D slicing-based approaches
30 Current Work: Splatting Irregular Volumes on GPU
- Main primitive generalized to ellipsoidal kernels
  - Ellipsoid expressed as a rotated/scaled sphere
- Slice ellipsoids with the same sheet-buffered approach
- Use texture mapping to rasterize the kernel
- Use similar fragment programs for shading
31 Thanks to
- NSF CAREER grant
- DOE
- More to come on this project at http://fytos.net/splatting
35 Splatting Irregular Volumes on GPU
- Splatting algorithm inefficiencies are the same as with regular splatting
- Cannot use the RGBA-as-separate-slices approach
  - In irregular splatting we need to keep a separate weight buffer during the splatting phase and then use it for normalization.
- We use the depth buffer and early-z culling
  - All temp buffers are surfaces of the same pBuffer, in order to share the same depth buffer.
  - Empty-space skipping and early splat elimination
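The role of the weight buffer can be sketched for one pixel as follows. The function names and sample values are hypothetical; the point is only that the accumulated value must be divided by the accumulated weight, which requires storing both side by side.

```python
def accumulate(samples):
    """Splatting phase: sum weighted values and the weights themselves."""
    value_sum, weight_sum = 0.0, 0.0
    for value, weight in samples:
        value_sum += value * weight
        weight_sum += weight
    return value_sum, weight_sum

def normalize(value_sum, weight_sum, eps=1e-6):
    """Normalization phase: divide by the total weight where any was deposited."""
    return value_sum / weight_sum if weight_sum > eps else 0.0

# Two irregularly placed splats overlapping one pixel: (value, kernel weight).
value_sum, weight_sum = accumulate([(2.0, 0.5), (4.0, 0.25)])
density = normalize(value_sum, weight_sum)
```

On a regular grid the kernel weights sum to a roughly constant value everywhere, so this division can be skipped; on an irregular grid they do not, and the extra weight storage per slice takes up channels that the regular case uses for additional slices.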
36 Splatting Irregular Volumes on GPU
- Volume data is first organized into a flat cubic structure
  - Provides a list of intersecting splats for every cell.
  - Cells are accessed front to back in a pre-defined order, the same way as rendering a fixed octree.
  - Render the associated vertex array of each cell
- At the beginning of the frame, all splat slicing parameters are computed. The result is:
  - Initial slicing polygon, and advancing vectors
- When a splat is processed, all vertices are transformed according to the current step and advancing vectors.
- The polygon is then texture-mapped to the right kernel.
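One plausible reading of the vertex transformation is a linear advance of each polygon vertex along its own vector. The sketch below assumes exactly that (`vertex = initial + step * advance`); the formula, the function name, and the sample coordinates are assumptions for illustration, since the slides do not spell out the transformation.

```python
def slice_polygon(initial_polygon, advancing_vectors, step):
    """Move every vertex of the initial slicing polygon `step` slices forward."""
    return [
        tuple(p + step * a for p, a in zip(vertex, advance))
        for vertex, advance in zip(initial_polygon, advancing_vectors)
    ]

# Hypothetical quad for one ellipsoidal splat, with one advancing vector per vertex.
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
advance = [(-0.25, -0.25, 0.5), (0.25, -0.25, 0.5),
           (0.25, 0.25, 0.5), (-0.25, 0.25, 0.5)]
polygon_at_step_2 = slice_polygon(initial, advance, step=2)
```

Under this reading the per-frame setup is done once per splat, and each slice only costs one multiply-add per vertex component, which suits a vertex program.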
37 Splatting Irregular Volumes on GPU
- Problem: several splats will appear in different cells.
  - How do you keep track of the ones that are already being sliced?
  - CPU would just hold a status array
  - GPU is notorious for not having global variables
- Solution: make the current cell a uniform variable; every vertex then knows whether it is its turn to draw by comparing against its pre-computed starting-cell variable.
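The starting-cell comparison can be simulated on the CPU as follows. The exact test is not given in the slides, so this sketch assumes a splat is drawn only when the current cell equals the first cell, in traversal order, that lists it; the function names and the toy cell lists are hypothetical.

```python
def build_start_cells(cell_order, cell_splats):
    """Pre-compute, for each splat, the first cell in traversal order that lists it."""
    start = {}
    for cell in cell_order:
        for splat_id in cell_splats[cell]:
            start.setdefault(splat_id, cell)
    return start

def render(cell_order, cell_splats, start):
    """Traverse cells front to back; a splat draws only in its starting cell."""
    drawn = []
    for current_cell in cell_order:          # plays the role of the uniform variable
        for splat_id in cell_splats[current_cell]:
            if start[splat_id] == current_cell:   # per-vertex comparison
                drawn.append((current_cell, splat_id))
    return drawn

cell_order = [0, 1, 2]
cell_splats = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "a"]}
start = build_start_cells(cell_order, cell_splats)
drawn = render(cell_order, cell_splats, start)
```

Each splat is drawn exactly once even though it is listed in two cells, and no mutable per-splat status has to be kept on the GPU.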