Title: Ray Tracing and Photon Mapping on GPUs
1Ray Tracing and Photon Mapping on GPUs
- Tim Purcell, Stanford / NVIDIA
2Small Sampling of GI on GPUs
- Much more detail in the included papers
- Lots of other global illumination on GPUs in the literature
- The Ray Engine (Carr et al. 2002)
- GPU Algorithms for Radiosity and Subsurface Scattering (Carr et al. 2003)
- Radiosity on Graphics Hardware (Coombe et al. 2004)
- Lots and lots of shadow papers
3Radiosity
Radiosity on Graphics Hardware (Coombe et al. 2004)
4Subsurface Scattering
GPU Algorithms for Radiosity and Subsurface Scattering (Carr et al. 2003)
5Ray Tracing
6Ray Tracing
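[Figure: ray tracing diagram showing a camera, a point light, an occluder, and diffuse and specular materials, connected by rays labeled P (primary), S (shadow), R (reflection), and T (transmission).]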
7Implementation Options
- GPU as a ray-triangle intersection engine (Carr et al. 2002)
- Rays and geometry streamed to GPU
- Intersection calculation results read back
- Acceleration structure traversal done on host CPU
- GPU as a ray tracing engine (Purcell et al. 2002)
- Scene geometry and acceleration structure stored on GPU
- GPU performs ray generation, acceleration structure traversal, intersection, and shading
- Host provides camera info
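To make the per-ray work in the intersection-engine option concrete, here is a minimal Python sketch of a ray-triangle test using the Möller-Trumbore formulation. The slides do not say which intersection formulation either system used, so the choice of algorithm and every name here (`intersect_triangle`, `eps`, the vector helpers) are assumptions for illustration only. On a GPU the same arithmetic would run once per fragment rather than once per call.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Return (t, u, v) for a hit in front of the ray origin, else None.

    t is the distance along the ray in units of |direction|;
    (u, v) are barycentric coordinates relative to v1 and v2.
    """
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return (t, u, v) if t > eps else None

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
miss = intersect_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
```

In the first option on this slide, only a test like this would run on the GPU while the host walks the acceleration structure; in the second, it would be one kernel among several.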
8Streaming Ray Tracer
- Generate Eye Rays (input: Camera)
- Traverse Acceleration Structure (input: Grid)
- Intersect Triangles (input: Triangles)
- Shade Hits and Generate Shading Rays (input: Materials)
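The traversal stage walks a ray through a uniform grid one voxel at a time. One common way to do that is a 3D DDA in the style of Amanatides and Woo; the Python sketch below assumes that approach, and it is not taken from the slides. The function name `traverse_grid` and its parameters are hypothetical, and the ray origin is assumed to lie inside the grid.

```python
import math

def traverse_grid(origin, direction, grid_min, cell_size, dims):
    """Return the voxels a ray visits, in order, until it leaves the grid."""
    if not any(direction):
        raise ValueError("direction must be non-zero")
    # Voxel containing the ray origin.
    cell = [int(math.floor((origin[a] - grid_min[a]) / cell_size))
            for a in range(3)]
    step = [0, 0, 0]
    t_next = [math.inf] * 3    # ray parameter at the next voxel boundary, per axis
    t_delta = [math.inf] * 3   # ray parameter needed to cross one voxel, per axis
    for a in range(3):
        d = direction[a]
        if d > 0:
            step[a] = 1
            t_next[a] = (grid_min[a] + (cell[a] + 1) * cell_size - origin[a]) / d
            t_delta[a] = cell_size / d
        elif d < 0:
            step[a] = -1
            t_next[a] = (grid_min[a] + cell[a] * cell_size - origin[a]) / d
            t_delta[a] = -cell_size / d
    visited = []
    while all(0 <= cell[a] < dims[a] for a in range(3)):
        visited.append(tuple(cell))
        a = t_next.index(min(t_next))   # axis whose boundary is crossed first
        cell[a] += step[a]
        t_next[a] += t_delta[a]
    return visited

straight = traverse_grid((0.5, 0.5, 0.5), (1, 0, 0), (0, 0, 0), 1.0, (4, 4, 4))
diagonal = traverse_grid((0.5, 0.25, 0.5), (1, 1, 0), (0, 0, 0), 1.0, (2, 2, 1))
```

In a streaming version, each loop iteration would correspond to one rendering pass over all active rays, with the per-ray state (`cell`, `t_next`) kept in textures between passes.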
9Techniques Used
- Data structure navigation
- Texture memory stores data structures
- Dependent texture fetches walk through data
- Flow control
- Kernel binding based on occlusion query results
- Efficient selective execution of kernels using early-z occlusion culling
- Difficulty in flow control disappearing with newest graphics cards
- PS 3.0
10Texture Memory Organization
- Uniform Grid: 3D luminance texture, one entry per voxel (vox0 ... voxM); sample entries 0, 3, 11, 38, 564
- Triangle List: 1D luminance texture, grouped per voxel (vox0, vox2, ...); sample entries 0, 3, 1, 3, 7, 21, 216
- Triangles: 3x 1D RGB textures (v0, v1, v2), each storing one xyz vertex per triangle (tri0 ... triN)
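The layout above amounts to a chain of dependent lookups: the grid entry gives a position in the triangle list, the triangle list gives triangle numbers, and each triangle number indexes the three vertex textures. The Python sketch below mimics that chain with plain lists. The sample data, the `END` sentinel that closes each voxel's run, and the name `triangles_in_voxel` are all assumptions for illustration; the slide does not show how the lists are terminated.

```python
END = -1  # hypothetical end-of-list marker; not shown on the slide

# Uniform grid, flattened: one entry per voxel, an offset into tri_list.
grid = [0, 3, 5]

# Triangle list: runs of triangle numbers, one run per voxel, each closed by END.
tri_list = [0, 2, END, 1, END, END]

# Triangle vertices: three parallel arrays indexed by triangle number.
v0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
v1 = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
v2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 2.0, 0.0)]

def triangles_in_voxel(voxel):
    """Follow grid -> triangle list -> vertex arrays for one voxel."""
    out = []
    i = grid[voxel]                 # fetch 1: grid texture
    while tri_list[i] != END:       # fetch 2: address comes from fetch 1
        t = tri_list[i]
        out.append((t, v0[t], v1[t], v2[t]))  # fetch 3: address comes from fetch 2
        i += 1
    return out

ids_vox0 = [t for t, _, _, _ in triangles_in_voxel(0)]
ids_vox1 = [t for t, _, _, _ in triangles_in_voxel(1)]
ids_vox2 = [t for t, _, _, _ in triangles_in_voxel(2)]
```

Each indexing step here stands in for a dependent texture fetch, where the result of one fetch is used as the address of the next.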
11Efficient Selective Execution
- Rendering giant screen-filling quad not ideal
- Not all pixels need to process every rendering pass
- Proposed low-overhead early fragment kill
- Computation mask
- Controllable early-Z occlusion culling
- Trade computation for bandwidth
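A small Python model can show why a computation mask helps when rays need different numbers of passes. It counts kernel invocations with and without a per-pixel mask. The function `run_passes` and the workload numbers are hypothetical; the model ignores the cost of maintaining the mask itself.

```python
def run_passes(steps_needed, use_mask):
    """Count kernel invocations until every pixel has finished.

    steps_needed[i] is how many passes pixel i actually needs.
    Without a mask, every pass touches every pixel (a full-screen quad).
    With a mask, finished pixels are killed before the kernel runs.
    """
    remaining = list(steps_needed)
    invocations = 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            active = r > 0
            if use_mask and not active:
                continue            # early kill: no kernel work for this pixel
            invocations += 1        # kernel runs (wasted if the pixel is done)
            if active:
                remaining[i] = r - 1
    return invocations

work = [1, 4, 2, 1]
unmasked = run_passes(work, use_mask=False)   # 4 passes x 4 pixels
masked = run_passes(work, use_mask=True)      # only the useful invocations
```

In this toy workload the mask halves the number of kernel invocations; the real saving depends on how unevenly the work is spread across pixels.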
12Original System Implementation
- ATI Radeon 9700 Pro (R300)
- ATI Fragment Program
13Cornell Box Ray Traced Shadows
Rendered using a Radeon 9700 Pro
14Teapotahedron
Rendered using a Radeon 9700 Pro
15Quake 3 Ray Traced Shadows
Rendered using a Radeon 9700 Pro
16Quake 3 Ray Traced Shadows
Rendered using a Radeon 9700 Pro
17Performance Results
- Radeon 9700 Pro
- 100M ray-triangle intersections/s
- 300K to 4.0M rays/s
- Between 3 and 12 fps @ 256x256 pixels
- CPU implementation
- 20M intersections/s on an 800 MHz P3 (Wald et al. 2001)
- 800K to 7.1M rays/s on a 2.5 GHz P4 (Wald et al. 2003)
- With simple shading: 1.8M to 2.3M rays/s
18Photon Mapping
19Photon Mapping Algorithm Review
- Photon tracing
- Emission, scattering, storing into k-d tree
- Similar to ray tracing
- Rendering
- Ray tracing for direct illumination
- Photon map visualization
- Indirect bounce
20Computational Challenge for GPUs 1
- Constructing an irregular or sparse data structure
21Computational Challenge for GPUs 2
- Adaptive nearest neighbor search
- Noise vs. blur
22Computational Challenge for GPUs 2
- Adaptive nearest neighbor search
- Noise vs. blur
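The noise versus blur tradeoff comes from how many photons go into each estimate. In the standard photon-map density estimate, the power of the k nearest photons is divided by the area of the disc that contains them, so a small k gives a small, noisy sample and a large k averages over a wider, blurrier area. The Python sketch below is a simplified 2D version under stated assumptions: a flat surface, no BRDF term, and a hypothetical function name `density_estimate`.

```python
import math

def density_estimate(photons, query, k):
    """Flux per unit area from the k photons nearest to query.

    photons is a list of ((x, y), power) pairs. Assumes k >= 1 and that
    the k-th nearest photon is not exactly at the query point.
    """
    by_distance = sorted(
        (math.hypot(px - query[0], py - query[1]), power)
        for (px, py), power in photons
    )
    nearest = by_distance[:k]
    radius = nearest[-1][0]                  # distance to the k-th photon
    total_power = sum(power for _, power in nearest)
    return total_power / (math.pi * radius * radius)

photons = [((1.0, 0.0), 1.0), ((0.0, 2.0), 1.0),
           ((-3.0, 0.0), 1.0), ((0.0, -4.0), 1.0)]
tight = density_estimate(photons, (0.0, 0.0), k=2)   # disc of radius 2
wide = density_estimate(photons, (0.0, 0.0), k=4)    # disc of radius 4
```

Because the radius adapts to the local photon density, the search has to find the nearest photons first, which is the part that is awkward on a GPU.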
23Scatter on the GPU
- Sort photons into grid cells
- Grid cell is sort key
- Two solutions
- Simulate scatter with fragment programs
- Bitonic merge sort followed by binary search
- Multiple rendering passes
- Vertex program with stencil buffer
- Fixed number of photons per grid cell
- Single rendering pass
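The first solution can be sketched on the CPU. Bitonic merge sort suits fragment programs because every pass is a fixed pattern of compare-and-select operations in which each output element only reads (gathers) from two inputs. The Python sketch below sorts (grid cell, photon id) pairs that way, then uses a binary search to find where a cell's photons start. The names `bitonic_sort` and `first_photon_in_cell` are hypothetical, and the input length is assumed to be a power of two.

```python
from bisect import bisect_left

def bitonic_sort(items):
    """Sort items with a bitonic network; return (sorted list, pass count).

    Each pass reads the previous buffer and writes a new one, the way a
    rendering pass reads one texture and writes another.
    """
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(items)
    passes = 0
    k = 2
    while k <= n:                       # size of the sequences being merged
        j = k // 2
        while j >= 1:                   # compare distance within this merge
            out = list(data)
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0
                lo = min(data[i], data[partner])
                hi = max(data[i], data[partner])
                keep_low = (i < partner) == ascending
                out[i] = lo if keep_low else hi
            data = out
            passes += 1
            j //= 2
        k *= 2
    return data, passes

def first_photon_in_cell(sorted_cells, cell):
    """Index of the first photon whose key is cell, or None if the cell is empty."""
    i = bisect_left(sorted_cells, cell)
    if i < len(sorted_cells) and sorted_cells[i] == cell:
        return i
    return None

# (grid cell, photon id) pairs; the grid cell is the sort key.
photons = [(3, 0), (1, 1), (2, 2), (1, 3), (0, 4), (3, 5), (2, 6), (0, 7)]
ordered, passes = bitonic_sort(photons)
cells = [cell for cell, _ in ordered]
start_of_cell_2 = first_photon_in_cell(cells, 2)
```

For n elements the network needs log2(n) * (log2(n) + 1) / 2 passes, which is consistent with the "multiple rendering passes" noted for this solution.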
24Adaptive Nearest Neighbor Search
- Iterative algorithm
- Accept or reject photons in cell visit order
- No priority queue!
- kNN-grid
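One way to read the bullets above is the following Python sketch. Photons are examined in the order their grid cells are visited; the first k photons inside the maximum radius are accepted, the radius then shrinks to the farthest accepted photon, and later photons are accepted only if they fall inside that radius. Nothing is ever evicted, so no priority queue is needed. This is an interpretation, not the published algorithm: the exact acceptance rules, the function name `knn_grid`, and the cell identifiers are assumptions.

```python
import math

def knn_grid(query, photons_by_cell, cell_order, k, max_radius):
    """Accept or reject photons in cell visit order; return (accepted, radius)."""
    accepted = []
    radius = max_radius
    for cell in cell_order:
        for photon in photons_by_cell.get(cell, []):
            d = math.hypot(photon[0] - query[0], photon[1] - query[1])
            if d > radius:
                continue                    # reject: outside the current radius
            accepted.append(photon)         # accept: never removed later
            if len(accepted) == k:
                # Enough photons: clamp the radius to the farthest one so far.
                radius = max(
                    math.hypot(p[0] - query[0], p[1] - query[1])
                    for p in accepted
                )
    return accepted, radius

photons_by_cell = {
    "A": [(1.0, 0.0), (3.0, 0.0)],      # visited first
    "B": [(-2.0, 0.0), (-4.0, 0.0)],    # visited second
}
accepted, radius = knn_grid((0.0, 0.0), photons_by_cell, ["A", "B"],
                            k=2, max_radius=10.0)
```

In this example three photons are accepted with a radius of 3, although the two true nearest neighbors lie within distance 2. A scheme of this kind gives up exactness in exchange for avoiding a data-dependent queue.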
25Original System Implementation
- NVIDIA GeForce FX 5900 Ultra (NV35)
- Cg compiler 1.1
- Compute Lighting: Trace Photons, Build Photon Map
- Render Image: Ray Trace Scene, Compute Radiance Estimate
26Glass Ball Bitonic Sort
18s @ 512x384, 5K photons
27Glass Ball Stencil Routing
11s @ 512x384, 5K photons
28Ring Bitonic Sort
9s @ 512x384, 16K photons
29Ring Stencil Routing
8s @ 512x384, 16K photons
30Cornell Box Bitonic Sort
64s @ 512x512, 65K photons
31Cornell Box Stencil Routing
47s @ 512x512, 65K photons
32Cornell Box Increased Search Radius
33Summary
- GPU can perform global illumination calculations
- Lots of options for splitting computation between CPU and GPU
- Global illumination calculations require many techniques useful to GPGPU computations
- Data structure navigation
- Sort, search
- Data dependent looping and branching