KD-Tree Acceleration Structures for a GPU Raytracer

Slides: 28
Provided by: timf84

1
KD-Tree Acceleration Structures for a GPU Raytracer
  • Tim Foley, Jeremy Sugerman
  • Stanford University

2
Motivation
  • Accelerated raytracing
  • On commodity HW
  • Production rendering
  • Real-time applications?
  • Performance trend
  • 9800 XT: 170M ray-triangle intersects/s
  • X800 XT PE: 350M ray-triangle intersects/s

3
GPU Raytracing
  • Promising early results
  • Simple scenes
  • Uniform grid
  • Problems with complex scenes
  • Hierarchical accelerator (kd-tree)
  • Improve scalability

4
Outline
  • Background
  • GPU Raytracing
  • KD-Tree Algorithm
  • KD-Restart, KD-Backtrack
  • Results
  • Future Work

5
Background
  • RayEngine [Carr et al. 2002]
  • Parallel ray-triangle intersection
  • Host controls culling
  • [Purcell et al. 2002]
  • Entire raytracing pipeline
  • Many rays required for efficiency
  • Uniform Grid

6
Why not KD-Tree?
  • Uniform grid acceleration structure
  • Regular structure → efficient traversal
  • Regular structure → poor partitioning
  • KD-Trees
  • Adapt to scene complexity
  • Compact storage, efficient traversal
  • Best for CPU raytracing [Havran 2000]

7
KD-Tree
[Figure: kd-tree diagram with a ray; surviving labels are tmin, tmax, X, Y, Z, A, B, C, D]
8
KD-Tree Traversal
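
The figure for this slide did not survive, so here is a minimal sketch of the conventional stack-based traversal that the later slides modify. It is an illustration under assumptions, not the authors' code: the `Node` layout, the `hit_leaf` callback, the 2D demo tree, and the node names are all hypothetical and are not claimed to match the slide's figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    axis: int = -1                   # -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None    # child below the split plane
    right: Optional["Node"] = None   # child above the split plane


def traverse_stack(root, origin, direction, tmin, tmax, hit_leaf):
    """Return (leaf name, t) for the first hit along the ray, or None."""
    stack = []                       # deferred (far child, tmin, tmax) entries
    node = root
    while True:
        while node.axis >= 0:        # descend to a leaf
            a, d = node.axis, direction[node.axis]
            if d == 0.0:
                # Ray parallel to the plane: it stays on the origin's side.
                node = node.left if origin[a] < node.split else node.right
                continue
            t_split = (node.split - origin[a]) / d
            first, second = (node.left, node.right) if d > 0 else (node.right, node.left)
            if t_split >= tmax:      # segment ends before the plane
                node = first
            elif t_split <= tmin:    # segment starts past the plane
                node = second
            else:                    # segment spans the plane: push the far child
                stack.append((second, t_split, tmax))
                node, tmax = first, t_split
        t = hit_leaf(node, tmin, tmax)
        if t is not None and tmin <= t <= tmax:
            return node.name, t
        if not stack:
            return None
        node, tmin, tmax = stack.pop()


# Hypothetical 2D scene on [0,4] x [0,4] with four leaves.
tree = Node("root", 0, 2.0,
            Node("lower", 1, 2.0, Node("A"), Node("B")),
            Node("upper", 1, 1.0, Node("C"), Node("D")))

visited = []
# list.append returns None, so this callback records the leaf and reports a miss.
result = traverse_stack(tree, (0.0, 0.5), (1.0, 1.0), 0.0, 3.5,
                        lambda leaf, t0, t1: visited.append(leaf.name))
```

The `stack.append` call is the per-ray push that the next slide identifies as the obstacle in a fragment program.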
9
Per-Fragment Stacks
  • Parallel (per-ray) push
  • No indexed write in fragment program
  • Per-ray stack storage
  • [Ernst et al. 2004]
  • Emulate push with extra passes
  • Impractical, slow

10
Our Contribution
  • Stackless kd-tree traversal algorithms
  • KD-Restart
  • KD-Backtrack

11
Observation
Current leaf's tmax is the next leaf's tmin

12
KD-Restart
  • Standard traversal
  • Omit stack operations
  • Proceed to 1st leaf
  • If no intersection
  • Advance (tmin,tmax)
  • Restart from root
  • Proceed to next leaf
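The steps on this slide can be sketched as follows. This is a minimal illustration under assumptions rather than the authors' Brook kernels: the `Node` layout, the `hit_leaf` callback, and the 2D demo tree are hypothetical. The only state carried between leaves is the `(tmin, tmax)` interval, so no stack is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    axis: int = -1                   # -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None    # child below the split plane
    right: Optional["Node"] = None   # child above the split plane


def kd_restart(root, origin, direction, scene_tmin, scene_tmax, hit_leaf):
    """Stackless traversal: after each leaf, advance tmin and restart at the root."""
    tmin = scene_tmin
    while True:
        node, tmax = root, scene_tmax            # restart from the root
        while node.axis >= 0:                    # standard descent, no pushes
            a, d = node.axis, direction[node.axis]
            if d == 0.0:
                node = node.left if origin[a] < node.split else node.right
                continue
            t_split = (node.split - origin[a]) / d
            first, second = (node.left, node.right) if d > 0 else (node.right, node.left)
            if t_split >= tmax:
                node = first
            elif t_split <= tmin:                # "<=" so a restart at a plane moves on
                node = second
            else:
                node, tmax = first, t_split      # far child is found again after restart
        t = hit_leaf(node, tmin, tmax)
        if t is not None and tmin <= t <= tmax:
            return node.name, t
        if tmax >= scene_tmax:                   # ray has left the scene
            return None
        tmin = tmax                              # advance: old leaf's tmax is the new tmin


# Hypothetical 2D scene on [0,4] x [0,4] with four leaves.
tree = Node("root", 0, 2.0,
            Node("lower", 1, 2.0, Node("A"), Node("B")),
            Node("upper", 1, 1.0, Node("C"), Node("D")))

visited = []
# list.append returns None, so this callback records the leaf and reports a miss.
result = kd_restart(tree, (0.0, 0.5), (1.0, 1.0), 0.0, 3.5,
                    lambda leaf, t0, t1: visited.append(leaf.name))
```

In this sketch the comparison against `tmin` is inclusive so that a restart beginning exactly on a split plane descends into the far child instead of revisiting the leaf it just left.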

13
KD-Restart
  • Restart traversal after each leaf
  • m leaves
  • Average depth d
  • Cost: O(md)
  • Balanced tree of n nodes
  • Upper bound: O(n log(n))
  • Standard algorithm: O(n)
  • Expected: O(log(n))
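One way to read the bounds on this slide, assuming a perfectly balanced binary kd-tree in which every internal node has two children (the slide itself only says "balanced"):

```latex
\begin{align*}
\text{cost}_{\text{restart}} &\approx m \cdot d
  && \text{($m$ leaves visited, each reached by a descent of depth $d$)} \\
m &\le \frac{n+1}{2}, \qquad d \le \log_2(n+1)
  && \text{(leaf count and depth of a balanced tree with $n$ nodes)} \\
\text{cost}_{\text{restart}} &\le \frac{n+1}{2}\,\log_2(n+1) = O(n \log n)
\end{align*}
```

The expected O(log(n)) figure is consistent with this if a typical ray visits only a small, roughly constant number of leaves, so that the cost is dominated by a few descents of depth d. That interpretation is an assumption here, not something the slide states.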

14
Observation
Ancestor of A is parent of Z
15
KD-Backtrack
  • If no intersection
  • Advance (tmin, tmax)
  • Start backtracking
  • If node intersects (tmin, tmax)
  • Resume traversal
  • Proceed to next leaf

16
KD-Backtrack
  • Backtrack after leaf
  • Revisits previous nodes
  • At most twice from left, right
  • Within constant factor of standard traversal
  • Upper bound: O(n)
  • Expected: O(log(n))
  • Requires additional storage
  • Parent pointers
  • Bounding boxes for internal nodes
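The backtracking idea and its extra storage can be sketched as follows. This is a hedged illustration, not the authors' implementation: the `Node` fields (`parent`, `lo`, `hi`), the `box_exit` helper, the `hit_leaf` callback, and the 2D demo tree are hypothetical. After a leaf misses, the ray walks up parent pointers until it reaches an ancestor whose bounding box it is still inside beyond the new `tmin`, then resumes the normal descent from there.

```python
from dataclasses import dataclass
from math import inf
from typing import Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    axis: int = -1                       # -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None      # extra storage: parent pointer
    lo: Tuple[float, ...] = ()           # extra storage: box of an internal node
    hi: Tuple[float, ...] = ()


def inner(name, axis, split, left, right, lo, hi):
    node = Node(name, axis, split, left, right, None, lo, hi)
    left.parent = right.parent = node
    return node


def box_exit(node, origin, direction):
    """Ray parameter at which the ray leaves the node's bounding box."""
    t = inf
    for a, d in enumerate(direction):
        if d > 0:
            t = min(t, (node.hi[a] - origin[a]) / d)
        elif d < 0:
            t = min(t, (node.lo[a] - origin[a]) / d)
    return t


def kd_backtrack(root, origin, direction, scene_tmin, scene_tmax, hit_leaf):
    node, tmin, tmax = root, scene_tmin, scene_tmax
    while True:
        while node.axis >= 0:            # descend exactly as in a restart pass
            a, d = node.axis, direction[node.axis]
            if d == 0.0:
                node = node.left if origin[a] < node.split else node.right
                continue
            t_split = (node.split - origin[a]) / d
            first, second = (node.left, node.right) if d > 0 else (node.right, node.left)
            if t_split >= tmax:
                node = first
            elif t_split <= tmin:
                node = second
            else:
                node, tmax = first, t_split
        t = hit_leaf(node, tmin, tmax)
        if t is not None and tmin <= t <= tmax:
            return node.name, t
        if tmax >= scene_tmax:
            return None
        tmin = tmax                      # advance the interval
        node = node.parent               # start backtracking
        while node is not None:
            t_exit = min(box_exit(node, origin, direction), scene_tmax)
            if t_exit > tmin:            # ray still inside this ancestor's box
                tmax = t_exit
                break                    # resume traversal from this ancestor
            node = node.parent
        if node is None:
            return None


# Hypothetical 2D scene on [0,4] x [0,4] with four leaves.
tree = inner("root", 0, 2.0,
             inner("lower", 1, 2.0, Node("A"), Node("B"), (0.0, 0.0), (2.0, 4.0)),
             inner("upper", 1, 1.0, Node("C"), Node("D"), (2.0, 0.0), (4.0, 4.0)),
             (0.0, 0.0), (4.0, 4.0))

visited = []
# list.append returns None, so this callback records the leaf and reports a miss.
result = kd_backtrack(tree, (0.0, 0.5), (1.0, 1.0), 0.0, 3.5,
                      lambda leaf, t0, t1: visited.append(leaf.name))
```

Clipping `tmax` to the ancestor's box exit matters in this sketch: without it, a leaf reached after backtracking could report an interval that extends past its own subtree and end the traversal early.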

17
Implementation
  • Built GPU raytracer in Brook [Buck et al.]
  • 4 intersection schemes
  • Brute Force
  • Uniform Grid
  • KD-Restart
  • KD-Backtrack

18
Scenes
Stanford Bunny: 69,451 triangles
Cornell Box: 32 triangles
BART Robots: 71,708 triangles
BART Kitchen: 110,561 triangles
19
Results
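[Chart: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes; the only surviving data label is 12.9]
Relative speedup over brute-force intersection.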
20
Results
             Ideal     Restart   Backtrack
Traverse     10.86M    21.80M    10.86M
Backtrack    0         0         7.78M
Intersect    5.91M     5.91M     5.91M
Rays in each state throughout traversal.
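
The table's figures can be turned into overhead ratios with a little arithmetic. The sketch below only re-uses the numbers above; treating traverse plus backtrack counts as the "non-intersection work" of a scheme is an assumption made for illustration, and `overhead` is a hypothetical helper name.

```python
# Counts per state, in millions, copied from the table above.
counts = {
    "ideal":     {"traverse": 10.86, "backtrack": 0.0,  "intersect": 5.91},
    "restart":   {"traverse": 21.80, "backtrack": 0.0,  "intersect": 5.91},
    "backtrack": {"traverse": 10.86, "backtrack": 7.78, "intersect": 5.91},
}


def overhead(scheme):
    """Traverse + backtrack work of a scheme relative to ideal traversal."""
    c = counts[scheme]
    return (c["traverse"] + c["backtrack"]) / counts["ideal"]["traverse"]


restart_ratio = round(overhead("restart"), 2)
backtrack_ratio = round(overhead("backtrack"), 2)
print(restart_ratio, backtrack_ratio)
```

Under that reading, KD-Restart does roughly twice the ideal traversal work on this data and KD-Backtrack roughly 1.7 times, while the intersection counts are identical across all three columns.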
21
Discussion
  • Absolute performance
  • Trails best CPU implementations by 5-6x
  • Sources of inefficiency
  • Load balancing
  • Data reuse

22
Load Balancing
  • Subset of rays intersecting, traversing
  • Occlusion queries to select kernel
  • Early-Z to cull inactive rays
  • Approximately 5x overhead
  • Query, kernel switch overhead
  • Worse with fewer rays
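The multipass scheduling described on this slide can be modelled on the CPU as a toy simulation. Everything here is an assumption made for illustration: the state names, the `run_multipass` and `step` helpers, and in particular the policy of running whichever kernel has the most waiting rays, which the slide does not specify. Counting the states stands in for the occlusion queries, and skipping rays in other states stands in for early-Z culling.

```python
from collections import Counter


def run_multipass(states, step):
    """Run one kernel per pass until every ray is done; return (kernel, rays) per pass."""
    passes = []
    while any(s != "done" for s in states):
        counts = Counter(states)                 # stands in for occlusion queries
        # Hypothetical policy: run the kernel with the most waiting rays.
        kernel = max(("traverse", "intersect"), key=lambda k: counts[k])
        for i, s in enumerate(states):
            if s == kernel:                      # stands in for early-Z culling
                states[i] = step(kernel, i)
        passes.append((kernel, counts[kernel]))
    return passes


# Toy workload: each ray needs a fixed number of traversal steps, then one intersection.
depth = [2, 1, 3]


def step(kernel, i):
    if kernel == "intersect":
        return "done"
    depth[i] -= 1
    return "intersect" if depth[i] == 0 else "traverse"


passes = run_multipass(["traverse"] * 3, step)
```

In this toy run the number of rays doing useful work shrinks from pass to pass while each pass still pays a fixed query and kernel-switch cost, which is one way to picture why the overhead is worse with fewer rays.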

23
Data Reuse
  • Every kernel
  • Loads ray origin/direction
  • Load/Store traversal state
  • Consumes streaming bandwidth
  • We are bandwidth-limited
  • CPU implementation stores these in registers

24
Branching
  • Merge multiple passes into larger kernel
  • Fragment branches for load balancing
  • Avoid load/store of reused data
  • Current branching has high overhead
  • Shifts efficiency burden to HW

25
Conclusion
  • Stackless Traversal
  • Allows efficient GPU kd-tree
  • Scales to larger, more complex scenes
  • Future Work
  • Changes in HW
  • Alternative acceleration structures
  • Out-of-core scenes
  • Dynamic scenes

26
Acknowledgements
  • Tim Purcell (NVIDIA)
  • Streaming raytracer
  • Mark Segal (ATI)
  • Demo machine
  • NVIDIA, ATI: HW
  • DARPA, Rambus: funding

27
Questions