Title: Ray-Tracing, Rendering Faster
A Little Morning Math
- Let's render a single second of animation
- 1280 x 1024 rays for the image
- 4 x 4 super-sampling (anti-aliasing) per pixel
- 8 time domain samples for motion blur
- 16 samples for depth of field
- 3 ray bounces before termination
- 16 light rays for the 3 area light sources
- 16 rays for glossy reflections
- 30 frames per second
A Whole Lot of Rays
- 1280 x 1024 x 4 x 4 x 8 x 16 x 3 x 16 x 3 x 16 x 30 = 185,542,587,187,200 rays
Oh, did I mention our scene contains 1 million triangles?
Q: How long do you really want to wait?
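The multiplication can be checked with a minimal C++ sketch. The function name is hypothetical, and every factor is treated as multiplicative, as in the slide's formula. 64-bit integers are needed because the result is far beyond 32 bits. The product works out to 185,542,587,187,200.

```cpp
#include <cstdint>

// Multiplies out the per-second ray budget listed on the slide.
std::uint64_t raysPerSecondOfAnimation() {
    std::uint64_t rays = 1280ull * 1024;  // image resolution
    rays *= 4 * 4;                         // 4x4 super-sampling per pixel
    rays *= 8;                             // time samples for motion blur
    rays *= 16;                            // depth-of-field samples
    rays *= 3;                             // ray bounces before termination
    rays *= 16 * 3;                        // 16 light rays x 3 area lights
    rays *= 16;                            // glossy reflection rays
    rays *= 30;                            // frames per second
    return rays;
}
```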
I Feel The Need For Speed
- Three Major Strategies
- Buy a faster computer (or maybe 1,000 faster computers hooked together)
- Just make it go faster!
- Use better data structures
- Write faster code
- Avoid divides, sqrts, and memory allocation
- Fire fewer rays (i.e., just be smarter)
- Better sampling strategies
- Cache intermediate results
Solution: Use ALL strategies; truly, we need all the help we can get!!!
Just Making It Go Faster
- Visibility queries make up >60% of the ray tracing process
- Shading takes up the rest
- Closed environment: rays = shading samples
- Options
- Bounding Volume Hierarchies
- Regular Grids
- Octrees
- BSP trees
Some of these are good only for some scenes; adaptive structures are generally better for all-purpose use.
Adaptivity Is Critical for Some Scenes
- Regular grids are fine (if not preferred) if the complexity is evenly distributed in the scene
- Uneven complexity leads to the "teapot in a stadium" problem
- Axis Aligned BSP trees are simple and fast
- Useful for collision detection too!
Review: Rays and BSP Trees
- Review of our ray def.
- Review of BSP trees
- Each internal node needs
- One child pointer/offset
- splitting plane location
- flag which axis x,y,z
- Flag leaf or non-leaf
- Each leaf node needs
- One pointer to list of objects (triangles)
- Number of triangles
- Flag leaf or non-leaf
[Figure: BSP tree with left and right children; ray R(t) = O + tD]
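The node fields listed above can be packed into 8 bytes, the size recommended under BSP Tree Performance. The minimal sketch below is modeled loosely on the PBRT book's kd-tree node. The type and method names are hypothetical. It assumes the first child is stored right after its parent in a flat array, so only the second child's index needs to be stored.

```cpp
#include <cstdint>

// Hypothetical 8-byte BSP node.
// Low 2 bits of 'bits': 0, 1, 2 = split axis x, y, z (interior); 3 = leaf flag.
// Upper 30 bits: index of the second child (interior) or triangle count (leaf).
struct CompactNode {
    union {
        float split;                  // interior: split-plane location
        std::uint32_t firstTriangle;  // leaf: offset into a shared triangle-index list
    };
    std::uint32_t bits;

    void initInterior(int axis, float s, std::uint32_t secondChild) {
        split = s;
        bits = (secondChild << 2) | static_cast<std::uint32_t>(axis);
    }
    void initLeaf(std::uint32_t first, std::uint32_t count) {
        firstTriangle = first;
        bits = (count << 2) | 3u;
    }
    bool isLeaf() const { return (bits & 3u) == 3u; }
    int axis() const { return static_cast<int>(bits & 3u); }
    std::uint32_t aboveChild() const { return bits >> 2; }
    std::uint32_t triangleCount() const { return bits >> 2; }
};

static_assert(sizeof(CompactNode) == 8, "node should pack into 8 bytes");
```

The leaf/non-leaf flag and the axis share the same two bits, which is why one 32-bit word is enough for the flags and the count or child index.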
A Classic Blunder
- When a ray hits an object, the hit point must be inside the current BSP cell
- Or else you might get the wrong answer
- This is also true for octrees, regular grids, etc.
[Figure: split plane s between the left and right cells, ray parameters t0 and t1, objects A and B]
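The blunder can be seen in a toy sketch with hypothetical names. Each object is reduced to the ray parameter t at which the ray hits it. Cells are visited front to back, and the search stops at the first cell that reports a hit, as BSP traversal does.

```cpp
#include <limits>
#include <vector>

// A cell covers the ray interval [tEnter, tExit] and lists the hit
// distances of the objects stored in it (an object may be in several cells).
struct Cell { float tEnter, tExit; std::vector<float> hits; };

// Visits cells in order and returns the first accepted hit.
// With checkInsideCell == false this reproduces the classic blunder.
float firstHit(const std::vector<Cell>& cells, bool checkInsideCell) {
    const float none = std::numeric_limits<float>::infinity();
    for (const Cell& c : cells) {
        float best = none;
        for (float t : c.hits)
            if (t < best && (!checkInsideCell || (t >= c.tEnter && t <= c.tExit)))
                best = t;
        if (best != none) return best;  // stop early: later cells are farther away
    }
    return none;
}
```

Consider an object that straddles the split plane and is hit at t = 8, and a second object in the right cell that is hit at t = 6. The naive version accepts t = 8 while still in the left cell [0, 5]. The checked version rejects that hit, moves on, and finds t = 6.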
BSP Algorithm
- Step 1: return false if the ray misses the axis-aligned bounding box (AABB) of the BSP
- See Real-Time Rendering (Moeller, Haines)
- A = where the ray enters the AABB
- A = O if the ray starts inside the AABB
- B = where the ray exits the AABB
- maxT = the ray parameter t at B
- Call traverse(pRoot, A, B, maxT)
[Figure: the segment from A to B lies entirely left of split plane s, entirely right of it, or crosses it]
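Step 1 can be implemented with the standard slab test, sketched below under the assumption that arrays of three floats are acceptable types. The function name is hypothetical. It returns the ray parameters of A (clamped to 0, so A = O when the ray starts inside) and B. In keeping with "avoid divides", there is one reciprocal per axis. A real tracer would precompute 1/D once per ray.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

// Slab test for the ray O + tD against the box [lo, hi].
// Returns false on a miss; otherwise tEnter (point A) and tExit (point B, maxT).
bool rayAABB(const float O[3], const float D[3], const float lo[3], const float hi[3],
             float& tEnter, float& tExit) {
    tEnter = 0.0f;                                   // A = O if the ray starts inside
    tExit = std::numeric_limits<float>::infinity();
    for (int i = 0; i < 3; ++i) {
        float inv = 1.0f / D[i];                     // IEEE gives +/-inf when D[i] == 0
        float t0 = (lo[i] - O[i]) * inv;
        float t1 = (hi[i] - O[i]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tEnter = std::max(tEnter, t0);               // NaN (0 * inf) leaves the bound unchanged
        tExit = std::min(tExit, t1);
        if (tEnter > tExit) return false;            // slabs do not overlap: miss
    }
    return true;
}
```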
traverse(pNode, vec3 A, vec3 B, float maxT)
{
    // If pNode is a leaf: intersect its triangles, accepting only hits between A and B
    int plane = pNode->getSplitPlane();
    float s = pNode->getSplit();
    if( A[plane] < s ) {
        if( B[plane] < s ) // left child only
            return traverse( pNode->leftChild(), A, B, maxT );
        // ray crosses the plane
        float midT = (s - O[plane]) / D[plane];
        vec3 midB = O + D*midT;
        return traverse( pNode->leftChild(), A, midB, midT )
            || traverse( pNode->rightChild(), midB, B, maxT );
    } else {
        if( B[plane] > s ) // right child only
            return traverse( pNode->rightChild(), A, B, maxT );
        // ray crosses the plane
        float midT = (s - O[plane]) / D[plane];
        vec3 midB = O + D*midT;
        return traverse( pNode->rightChild(), A, midB, midT )
            || traverse( pNode->leftChild(), midB, B, maxT );
    }
}
[Figure: when the segment A-B crosses split plane s, it is cut at midB into A-midB and midB-B for the left and right children]
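Here is a runnable sketch of the same traversal. Spheres stand in for triangles, and all type and function names are hypothetical. The sketch tracks the ray parameters tA and tB instead of the points A and B. This is equivalent because A = O + tA·D, and it makes the "hit must be inside the cell" check a simple interval test. The `||` short-circuits: a hit in the near child is always closer than anything in the far child.

```cpp
#include <cmath>
#include <memory>
#include <vector>

struct Ray { float o[3], d[3]; };                   // R(t) = O + tD
struct Sphere { float c[3], r; };

struct Node {
    int axis = -1;                                  // 0/1/2 = x/y/z split; -1 = leaf
    float split = 0.0f;                             // split-plane location s
    std::unique_ptr<Node> left, right;              // left: below s, right: above s
    std::vector<int> objects;                       // leaf only: indices into the sphere list
};

// Smallest root of |O + tD - c|^2 = r^2 that lies in [tMin, tMax].
bool hitSphere(const Sphere& s, const Ray& ray, float tMin, float tMax, float& tHit) {
    float a = 0, b = 0, c = -s.r * s.r;
    for (int i = 0; i < 3; ++i) {
        float oc = ray.o[i] - s.c[i];
        a += ray.d[i] * ray.d[i];
        b += oc * ray.d[i];
        c += oc * oc;
    }
    float disc = b * b - a * c;
    if (disc < 0) return false;
    float sq = std::sqrt(disc);
    float t0 = (-b - sq) / a, t1 = (-b + sq) / a;
    if (t0 >= tMin && t0 <= tMax) { tHit = t0; return true; }
    if (t1 >= tMin && t1 <= tMax) { tHit = t1; return true; }
    return false;
}

// Front-to-back traversal of the segment [tA, tB]; leaves accept only hits inside it.
bool traverse(const Node* n, const std::vector<Sphere>& objs, const Ray& ray,
              float tA, float tB, float& tHit) {
    if (n->axis < 0) {
        bool found = false;
        for (int i : n->objects) {
            float t;
            if (hitSphere(objs[i], ray, tA, tB, t)) { tB = t; tHit = t; found = true; }
        }
        return found;
    }
    const int p = n->axis;
    const float s = n->split;
    const float a = ray.o[p] + tA * ray.d[p];      // A[plane]
    const float b = ray.o[p] + tB * ray.d[p];      // B[plane]
    if (ray.d[p] == 0.0f)                          // parallel to the plane: one side only
        return traverse(a < s ? n->left.get() : n->right.get(), objs, ray, tA, tB, tHit);
    const float tMid = (s - ray.o[p]) / ray.d[p];  // where the ray crosses the plane
    if (a < s) {
        if (b < s) return traverse(n->left.get(), objs, ray, tA, tB, tHit);
        return traverse(n->left.get(), objs, ray, tA, tMid, tHit)
            || traverse(n->right.get(), objs, ray, tMid, tB, tHit);
    }
    if (b > s) return traverse(n->right.get(), objs, ray, tA, tB, tHit);
    return traverse(n->right.get(), objs, ray, tA, tMid, tHit)
        || traverse(n->left.get(), objs, ray, tMid, tB, tHit);
}
```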
How Do We Build an Optimal BSP Tree?
- Do we want it balanced?
- Maybe, maybe not.
- Optimize the tree based on the most probable rays.
- It's NP-hard
- So let's go greedy!
Developing a Cost Model
- The chance of a random ray hitting a bounding box (or any convex object) is proportional to its surface area
- BSP node hit expectance: $P(AB) \propto \mathrm{SurfaceArea}(AB)$
- For a random ray, a BSP AABB with twice the surface area is twice as likely to get hit
Chance of Hitting Children
- $P_a = \mathrm{Area}(A) / \mathrm{Area}(AB) - P_{ab}$ (the ray visits only A)
- $P_b = \mathrm{Area}(B) / \mathrm{Area}(AB) - P_{ab}$ (the ray visits only B)
Chance of a Ray Passing A→B or B→A
- Relative chance of the split plane being hit
- $P_{ab} = \mathrm{SurfaceArea}(C) / \mathrm{SurfaceArea}(AB)$
[Figure: node AB divided by split plane C into children A and B]
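A Simple Example
- Suppose we are given a unit cube
- $P_{ab} = \frac{2 \cdot 1 \cdot 1}{2(1 \cdot 1 + 1 \cdot 1 + 1 \cdot 1)} = \frac{1}{3}$
- About a 33% chance that a ray crosses split plane C (the split rectangle counts both of its sides)
[Figure: unit cube with edge length 1, split by plane C into A and B]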
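The probabilities from the last three slides can be computed in a short sketch. The box layout (min corner then max corner) and the function names are assumptions. The flat "box" C has zero thickness along the split axis, so its surface area automatically counts both sides of the split rectangle.

```cpp
#include <array>

using Box = std::array<float, 6>;  // hypothetical layout: xmin, ymin, zmin, xmax, ymax, zmax

float surfaceArea(const Box& b) {
    float dx = b[3] - b[0], dy = b[4] - b[1], dz = b[5] - b[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

struct SplitProbs { float pa, pb, pab; };   // just A, just B, both A and B

// Splits box AB at position s along 'axis' (0, 1, 2 = x, y, z).
SplitProbs splitProbabilities(const Box& ab, int axis, float s) {
    Box a = ab, b = ab, c = ab;
    a[axis + 3] = s;                        // A: below the plane
    b[axis] = s;                            // B: above the plane
    c[axis] = c[axis + 3] = s;              // C: the split rectangle itself
    float total = surfaceArea(ab);
    float pab = surfaceArea(c) / total;
    return {surfaceArea(a) / total - pab, surfaceArea(b) / total - pab, pab};
}
```

For the unit cube split in the middle, all three probabilities come out to 1/3. Moving the split changes Pa and Pb but not Pab.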
A More Idyllic Cost Model
- $\mathrm{Cost}(node_{AB}) = P_a \, \mathrm{Cost}(node_A) + P_b \, \mathrm{Cost}(node_B) + P_{ab} \, (\mathrm{Cost}(node_A) + \mathrm{Cost}(node_B))$
- The three terms are the cost of doing just A, just B, and both A and B
- Recursive, so it can't be evaluated efficiently
A (Very Simple) Local Cost Model
- $\mathrm{Cost}(node_{AB}, s) = P_a (T_a + T_{ab}) T_p + P_b (T_b + T_{ab}) T_p + P_{ab} (T_a + T_b + T_{ab}) T_p$
- The three terms are the cost of doing just A, just B, and both A and B
- $T_a$ = number of objects only in A, $T_b$ = number of objects only in B, $T_{ab}$ = number of objects in both A and B, $T_p$ = time to intersect one object
Building the Tree
- The cost function is only interesting at the beginning and end of objects
- This reduces the number of cost function evaluations
- For each dimension X, Y, Z
- Compute each object's min and max
- Place the min/max locations into a sorted list
- Compute the cost for each min/max location
- Choose the split (over X, Y, Z) with the lowest cost
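The local cost model and the per-axis candidate search can be combined in a sketch. The names are hypothetical. It assumes every object has nonzero extent along the axis. Instead of an incremental sweep, it counts Ta and Tb by binary search in the sorted min and max lists. That is simpler to read but does a little more work than a true sweep.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Extent { float lo, hi; };            // an object's [min, max] along the split axis
struct SplitResult { float pos, cost; };

// Surface area of a box with edge lengths a, b, c.
static float area(float a, float b, float c) { return 2.0f * (a * b + b * c + c * a); }

// Tp * (Pa (Ta + Tab) + Pb (Tb + Tab) + Pab (Ta + Tb + Tab))
float localCost(float pa, float pb, float pab, int ta, int tb, int tab, float tp) {
    return tp * (pa * (ta + tab) + pb * (tb + tab) + pab * (ta + tb + tab));
}

// Node spans [lo, hi] on this axis and e1 x e2 on the other two axes.
SplitResult bestSplitOnAxis(const std::vector<Extent>& objs, float lo, float hi,
                            float e1, float e2, float tp) {
    std::vector<float> mins, maxs;
    for (const Extent& o : objs) { mins.push_back(o.lo); maxs.push_back(o.hi); }
    std::sort(mins.begin(), mins.end());
    std::sort(maxs.begin(), maxs.end());
    std::vector<float> candidates(mins);    // only object begin/end points matter
    candidates.insert(candidates.end(), maxs.begin(), maxs.end());

    const float total = area(hi - lo, e1, e2);
    const float pab = 2.0f * e1 * e2 / total;        // two-sided split rectangle C
    const int n = static_cast<int>(objs.size());
    SplitResult best{lo, std::numeric_limits<float>::infinity()};
    for (float s : candidates) {
        if (s <= lo || s >= hi) continue;             // plane must cut the node
        int ta = static_cast<int>(std::upper_bound(maxs.begin(), maxs.end(), s) - maxs.begin());
        int tb = static_cast<int>(mins.end() - std::lower_bound(mins.begin(), mins.end(), s));
        int tab = n - ta - tb;                        // objects straddling s
        float pa = area(s - lo, e1, e2) / total - pab;
        float pb = area(hi - s, e1, e2) / total - pab;
        float cost = localCost(pa, pb, pab, ta, tb, tab, tp);
        if (cost < best.cost) best = {s, cost};
    }
    return best;
}
```

Running this for X, Y and Z and keeping the cheapest result gives the greedy choice for one node. Consider a 4 x 1 x 1 node holding objects at [0, 1], [0.5, 1.5] and [3, 4] on X. The cheapest plane is s = 1.5, with cost 14/9 when Tp = 1.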
BSP Tree Performance
- BSP traversal is FAST!
- In most cases it requires 2 if statements
- Bandwidth (not compute) limited
- The cost is in fetching nodes from memory and NOT in calculations
- Reduce BSP node size to 8 bytes
- See PBRT Book
- Authors have reported > 1 million rays per second for scenes with millions of triangles
What About Dynamic Scenes?
- Trade-off between how much time is spent rendering, the cost of building the tree, and the speed-up gained
- Still a very open area of research
References
- Read the PBRT Book
- V. Havran, T. Kopal, J. Bittner, and J. Zara, "Fast Robust BSP Tree Traversal Algorithm for Ray Tracing", Journal of Graphics Tools, Vol. 2, No. 4, pp. 15-23, Dec. 1998
- Ingo Wald, Thesis (RTRT)
- "BSP Plane Cost Function Revisited", by Eric Haines
- http://www.acm.org/tog/resources/RTNews/html/rtnv17n1.html#art8
How Do I Ray-Trace Triangles?
- Tomas Moeller: http://www.cs.lth.se/home/Tomas_Akenine_Moller/raytri/
- Many resources (for ray/object intersection) are linked from http://www.realtimerendering.com/isect
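Barycentric coordinates: $\alpha = A_0/A$, $\beta = A_1/A$, $\gamma = A_2/A$, with $\alpha + \beta + \gamma = 1$ and $0 \le \alpha \le 1$, $0 \le \beta \le 1$, $0 \le \gamma \le 1$
- $N = \alpha N_0 + \beta N_1 + \gamma N_2$
- $C = \alpha C_0 + \beta C_1 + \gamma C_2$
- $P = \alpha P_0 + \beta P_1 + \gamma P_2$, ...
[Figure: a ray hits a triangle with vertices P0, P1, P2 (normals N0, N1, N2; colors C0, C1, C2, ...); the hit point splits the triangle of area A into sub-triangles of areas A0, A1, A2]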
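One widely used test from the work linked above is the Moeller-Trumbore ray/triangle algorithm. It yields the barycentric weights as a by-product, so the interpolation formulas can be applied directly. Below is a sketch with hypothetical helper names and a small epsilon chosen for illustration.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moeller-Trumbore test for the ray O + tD. On a hit it returns t and the
// barycentric weights (alpha, beta, gamma) of P0, P1, P2.
bool rayTriangle(Vec3 O, Vec3 D, Vec3 P0, Vec3 P1, Vec3 P2,
                 float& t, float& alpha, float& beta, float& gamma) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(P1, P0), e2 = sub(P2, P0);
    Vec3 p = cross(D, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;      // ray parallel to the triangle plane
    float inv = 1.0f / det;
    Vec3 s = sub(O, P0);
    beta = dot(s, p) * inv;                      // weight of P1
    if (beta < 0.0f || beta > 1.0f) return false;
    Vec3 q = cross(s, e1);
    gamma = dot(D, q) * inv;                     // weight of P2
    if (gamma < 0.0f || beta + gamma > 1.0f) return false;
    alpha = 1.0f - beta - gamma;                 // weight of P0
    t = dot(e2, q) * inv;
    return t > eps;                              // hit must lie in front of the origin
}

// Interpolates any per-vertex attribute (normal, color, position, ...).
Vec3 interpolate(Vec3 a0, Vec3 a1, Vec3 a2, float alpha, float beta, float gamma) {
    return {alpha * a0.x + beta * a1.x + gamma * a2.x,
            alpha * a0.y + beta * a1.y + gamma * a2.y,
            alpha * a0.z + beta * a1.z + gamma * a2.z};
}
```

An interpolated normal is generally no longer unit length, so it should be renormalized before shading.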
Normal Shading Problem
- Help, I'm getting bad lighting with CSG???
Answer: The surface normal N is pointing the wrong way.
Solution: Check the normal and flip it for shading purposes: if $R \cdot N > 0$, then $N = -N$.
[Figure: ray R hitting a surface with normal N and flipped normal -N]
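The flip rule is a one-liner. In the sketch below, the name `faceForward` is hypothetical. GLSL's built-in `faceforward` performs essentially the same flip on the GPU.

```cpp
struct Vec3 { float x, y, z; };

// If R . N > 0 the ray arrives from the back side of the surface,
// so shade with -N instead of N.
Vec3 faceForward(Vec3 N, Vec3 R) {
    float d = R.x * N.x + R.y * N.y + R.z * N.z;
    return d > 0.0f ? Vec3{-N.x, -N.y, -N.z} : N;
}
```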
A Little Morning Math (Revisited)
- We had 185,542,587,187,200 rays
- With fast BSP traversal we could do 1 million ray queries (with simple shading) per second on a single CPU
- How many minutes do we need?
- Let's render a single second of animation
- 1280 x 1024 rays for the image
- 4 x 4 super-sampling (anti-aliasing) per pixel
- 8 time domain samples for motion blur
- 16 samples for depth of field
- 3 ray bounces before termination
- 16 light rays for the 3 area light sources
- 16 rays for glossy reflections
- 30 frames per second
A Whole Long Time
- (185,542,587,187,200 / 1,000,000) / 60 ≈ 3,092,376 minutes (almost 6 years)
Q: How long do you really want to wait?
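The estimate is a single division. The hypothetical helper below also models the "1,000 faster computers" strategy through a `cpus` parameter. It assumes perfect parallel scaling, which real render farms do not quite reach.

```cpp
#include <cstdint>

// Wall-clock minutes to trace 'rays' rays at 'raysPerSecond' per CPU on 'cpus' CPUs,
// assuming (optimistically) perfect parallel scaling.
double minutesToRender(std::uint64_t rays, double raysPerSecond, int cpus = 1) {
    return static_cast<double>(rays) / (raysPerSecond * cpus) / 60.0;
}
```

Here are two example calls:
- `minutesToRender(185542587187200ull, 1.0e6)` gives about 3.09 million minutes.
- With `cpus = 1000`, it gives about 3,092 minutes, a bit over two days for one second of animation.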
I Feel The Need For Speed (Revisited)
- Three Major Strategies
- Buy a faster computer (or maybe 1,000 faster computers hooked together)
- Just make it go faster!
- Use better data structures
- Write faster code
- Avoid divides, sqrts, and frequent memory allocation
- Fire fewer rays (i.e., just be smarter)
- Better sampling strategies
- Cache intermediate results
Solution: Use ALL strategies; truly, we need all the help we can get!!!
A Public Service Message on Anti-Aliasing, Frequency Analysis, and Sampling
- Anti-aliasing, frequency analysis, and sampling are NOT just about picture quality
- Ray tracing: reduce the number of rays needed
- Scanline/GPU rendering: reduce shading, memory, and bandwidth costs
The message: Better Picture for Less Work = More Speed
A Motivating Example
[Figure: super-sampling (16 rays) vs. a single sample (1 ray)]
Q: Can we get the same result?
Ans: Yes. In many cases we can!
- Sources of Aliasing
- Temporal
- Geometric
- Surface Shading
- Handle the shader aliasing problem directly during shader evaluation
- Texture Filtering
- Bilinear Filtering
- Mip-mapping
- Anisotropic Filtering
- Procedurally
- Use screen space derivatives
[Figure: examples of geometric aliasing and shader aliasing]
Texture Aliasing
- Image mapped onto polygon
- Occurs when screen resolution differs from texture resolution
- Magnification aliasing
- Screen resolution finer than texture resolution
- Multiple pixels per texel
- Minification aliasing
- Screen resolution coarser than texture resolution
- Multiple texels per pixel
Nearest Neighbor and Bilinear Filtering
- Nearest Neighbor
- Box Filter
- Bilinear Filtering
- Triangle Filter
[Figure: bilinear filtering]
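Here are both lookups for a single-channel texture, as a minimal sketch. The `Texture` type, the clamp-to-edge addressing, and the texel-center convention (centers at +0.5) are assumptions. Nearest neighbor returns the one texel containing the sample point, which is a box filter. Bilinear filtering weights the four surrounding texel centers by distance, which is a tent (triangle) filter.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical single-channel texture, row-major, clamp-to-edge addressing.
struct Texture {
    int w, h;
    std::vector<float> texels;
    float at(int x, int y) const {
        x = std::max(0, std::min(x, w - 1));
        y = std::max(0, std::min(y, h - 1));
        return texels[y * w + x];
    }
};

// Nearest neighbor (box filter): the texel containing (u, v) in [0,1]^2.
float sampleNearest(const Texture& t, float u, float v) {
    return t.at(static_cast<int>(std::floor(u * t.w)), static_cast<int>(std::floor(v * t.h)));
}

// Bilinear (triangle filter): blend the 4 texel centers around (u, v).
float sampleBilinear(const Texture& t, float u, float v) {
    float x = u * t.w - 0.5f, y = v * t.h - 0.5f;   // texel centers sit at +0.5
    int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;
    float top = (1 - fx) * t.at(x0, y0) + fx * t.at(x0 + 1, y0);
    float bottom = (1 - fx) * t.at(x0, y0 + 1) + fx * t.at(x0 + 1, y0 + 1);
    return (1 - fy) * top + fy * bottom;
}
```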
Mip-maps and Trilinear Filtering
- Lance Williams, 1983
- Create a resolution pyramid of textures
- Repeatedly subsample the texture at half resolution
- Until a single pixel remains
- Needs extra storage space
- Accessing
- Use the texture resolution closest to the screen resolution
- Needs screen space derivatives
- Or interpolate between the two closest resolutions
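The pyramid and the level choice can be sketched for a square, power-of-two, single-channel image. The `Image` type and function names are hypothetical. Each level averages 2x2 blocks of the level above. The level index is log2 of the pixel footprint in texels, taken from screen-space derivatives.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct Image { int size; std::vector<float> px; };   // size x size, row-major

// Repeatedly average 2x2 blocks (box filter) down to a single pixel.
std::vector<Image> buildMipChain(const Image& base) {
    std::vector<Image> chain{base};
    while (chain.back().size > 1) {
        const Image& src = chain.back();
        const int n = src.size / 2, w = src.size;
        Image dst{n, std::vector<float>(n * n)};
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                dst.px[y * n + x] = 0.25f * (src.px[(2 * y) * w + 2 * x] + src.px[(2 * y) * w + 2 * x + 1] +
                                             src.px[(2 * y + 1) * w + 2 * x] + src.px[(2 * y + 1) * w + 2 * x + 1]);
        chain.push_back(std::move(dst));          // 'src' is not used after this point
    }
    return chain;
}

// Fractional mip level from screen-space derivatives (in texels per pixel).
float mipLevel(float dudx, float dvdx, float dudy, float dvdy, int levels) {
    float rho = std::max(std::sqrt(dudx * dudx + dvdx * dvdx), std::sqrt(dudy * dudy + dvdy * dvdy));
    float lambda = std::log2(std::max(rho, 1.0f));  // magnification: stay at level 0
    return std::min(lambda, static_cast<float>(levels - 1));
}
```

Picking the nearest integer level matches "use the texture resolution closest to the screen resolution". Trilinear filtering instead does a bilinear lookup in levels floor(λ) and floor(λ) + 1 and blends them by the fractional part of λ.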
Summed Area Tables
- Frank Crow, 1984
- Replaces texture map with summed-area texture map
- $S(x,y)$ = sum of texels $T(i,j)$ with $i \le x$, $j \le y$
- Needs double the range (e.g., 16 bits)
- Creation
- Incremental sweep using previous computations
- $S(x,y) = T(x,y) + S(x-1,y) + S(x,y-1) - S(x-1,y-1)$
- Accessing
- $\Sigma T(x_1,x_2,y_1,y_2) = S(x_2,y_2) - S(x_1,y_2) - S(x_2,y_1) + S(x_1,y_1)$
- $\mathrm{Ave} = \Sigma T(x_1,x_2,y_1,y_2) / ((x_2 - x_1)(y_2 - y_1))$
[Figure: table entries at (x, y), (x-1, y-1), (x1, y1), and (x2, y2)]
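The creation and access formulas translate almost directly into code. In the sketch below, the class name is hypothetical. It adds a zero row and column so that S(-1, y) and S(x, -1) need no special cases. It stores sums as `double`, a wider type than the texels, in the spirit of "needs double the range". The box average covers x1 < x ≤ x2 and y1 < y ≤ y2 with four lookups, regardless of box size.

```cpp
#include <vector>

struct SummedAreaTable {
    int w, h;
    std::vector<double> S;   // (w + 1) x (h + 1), first row/column are zero

    SummedAreaTable(const std::vector<float>& T, int w_, int h_)
        : w(w_), h(h_), S((w_ + 1) * (h_ + 1), 0.0) {
        // Incremental sweep: S(x,y) = T(x,y) + S(x-1,y) + S(x,y-1) - S(x-1,y-1)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                at(x, y) = T[y * w + x] + at(x - 1, y) + at(x, y - 1) - at(x - 1, y - 1);
    }
    double& at(int x, int y) { return S[(y + 1) * (w + 1) + (x + 1)]; }
    double at(int x, int y) const { return S[(y + 1) * (w + 1) + (x + 1)]; }

    // Average of T over x1 < x <= x2, y1 < y <= y2.
    double average(int x1, int y1, int x2, int y2) const {
        double sum = at(x2, y2) - at(x1, y2) - at(x2, y1) + at(x1, y1);
        return sum / ((x2 - x1) * (y2 - y1));
    }
};
```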
Anisotropic Filtering
- Def. Anisotropic
- Something that depends on the view direction
- Texel samples are not pulled from a symmetric shape
- Achieves better quality than trilinear
- Current hardware takes 16 taps
- Can be (and should be) combined with mip-maps for best results
[Figure: pixel footprint in the screen image and its corresponding shape in the texture map]
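To see how the taps and mip-maps fit together, here is a sketch of the footprint approximation described in the OpenGL `EXT_texture_filter_anisotropic` extension. The names and the handling of a zero-width footprint are assumptions. Several probes are taken along the long axis of the footprint, and the mip level is chosen from the long axis divided by the number of probes. This keeps the lookup sharp instead of blurring to match the long axis.

```cpp
#include <algorithm>
#include <cmath>

struct AnisoSample { int taps; float lod; };

// Derivatives are in texels per pixel; maxAniso is the hardware tap limit (e.g. 16).
AnisoSample anisotropic(float dudx, float dvdx, float dudy, float dvdy, float maxAniso) {
    float px = std::sqrt(dudx * dudx + dvdx * dvdx);     // footprint along screen x
    float py = std::sqrt(dudy * dudy + dvdy * dvdy);     // footprint along screen y
    float pmax = std::max(px, py), pmin = std::min(px, py);
    float ratio = pmin > 0.0f ? std::ceil(pmax / pmin) : maxAniso;
    float n = std::min(ratio, maxAniso);                 // number of probes (taps)
    return {static_cast<int>(n), std::log2(pmax / n)};   // mip level used by each probe
}
```

Take a footprint 8 texels long and 1 texel wide. With a 16-tap limit this yields 8 probes at mip level 0. Plain trilinear filtering would use log2(8) = level 3 and blur the texture along both axes.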
Analytic Anti-Aliasing
- Ed Catmull, 1978
- Eliminates edge aliases
- Clip polygon to pixel boundary
- Sort fragments by depth
- Clip fragments against each other
- Scale color by visible area
- Sum scaled colors
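The first and last steps, clipping to the pixel and weighting by visible area, can be sketched as follows. The names are hypothetical. The sketch clips a polygon to the unit pixel with four Sutherland-Hodgman passes and measures the result with the shoelace formula. The depth sort and the fragment-against-fragment clipping of the full method are not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { float x, y; };

// One Sutherland-Hodgman pass: keep the part of 'poly' where keep(p) >= 0 (keep is linear).
template <class F>
std::vector<P2> clipHalfPlane(const std::vector<P2>& poly, F keep) {
    std::vector<P2> out;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        P2 a = poly[i], b = poly[(i + 1) % poly.size()];
        float da = keep(a), db = keep(b);
        if (da >= 0) out.push_back(a);
        if ((da >= 0) != (db >= 0)) {                 // edge crosses the boundary
            float t = da / (da - db);
            out.push_back({a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)});
        }
    }
    return out;
}

// Fraction of the pixel [0,1] x [0,1] covered by the polygon.
float pixelCoverage(std::vector<P2> poly) {
    poly = clipHalfPlane(poly, [](P2 p) { return p.x; });
    poly = clipHalfPlane(poly, [](P2 p) { return 1.0f - p.x; });
    poly = clipHalfPlane(poly, [](P2 p) { return p.y; });
    poly = clipHalfPlane(poly, [](P2 p) { return 1.0f - p.y; });
    float twiceArea = 0.0f;                            // shoelace formula
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const P2& a = poly[i];
        const P2& b = poly[(i + 1) % poly.size()];
        twiceArea += a.x * b.y - b.x * a.y;
    }
    return std::fabs(twiceArea) * 0.5f;
}
```

The returned fraction is the "visible area" factor that scales the fragment's color before the colors are summed.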
The A-Buffer
- Loren Carpenter, 1984
- Subdivides pixel into 4x4 bitmasks
- Clipping: logical operations on bitmasks
- Bitmasks used as index to lookup table
- Used in the REYES architecture, Pixar's RenderMan architecture
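With a 4x4 subpixel grid, each fragment's coverage fits in a 16-bit mask. Clipping a farther fragment against nearer ones is then a single AND-NOT. In the sketch below, the names and the single color channel are assumptions. Fragments are assumed already sorted front to back. Coverage is counted with `std::bitset::count` in place of the lookup table the slide mentions.

```cpp
#include <bitset>
#include <cstdint>

using Mask = std::uint16_t;   // one bit per subpixel of a 4x4 grid

// The part of 'fragment' not already covered by nearer fragments.
Mask visiblePart(Mask fragment, Mask nearer) { return static_cast<Mask>(fragment & ~nearer); }

// Fraction of the 16 subpixels that are set.
float coverage(Mask m) { return std::bitset<16>(m).count() / 16.0f; }

// Front-to-back resolve: each fragment contributes color * visible coverage.
float resolve(const Mask* masks, const float* colors, int n) {
    Mask covered = 0;
    float result = 0.0f;
    for (int i = 0; i < n; ++i) {
        Mask vis = visiblePart(masks[i], covered);
        result += colors[i] * coverage(vis);
        covered |= vis;
    }
    return result;
}
```

For example, take a front fragment covering the top half (0x00FF) with intensity 1.0 over a full-coverage background of 0.5. The result is 1.0 * 0.5 + 0.5 * 0.5 = 0.75.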
Multi-Sampling (OpenGL/D3D)
- Idea
- Let shaders handle shading aliases
- Bilinear, Trilinear, Mip-map, Aniso
- Use super-sampling to take care of geometric aliases
- Increase z-buffer size and color-buffer size
- Use same shading sample across sub-pixels
[Figure: sub-pixels in the framebuffer each store a z-sample but share one shading sample; the final pixel is down-filtered after all triangles have been rendered]
The Accumulation Buffer
- Increases OpenGL's resolution
- Render the scene 16 times
- Shear projection matrices
- Samples in different locations in the pixel
- Average result
- Jittered, but same jitter sampling pattern in each pixel
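Choosing the 16 per-pass offsets is sketched below. The names, the 4x4 stratification, and the fixed seed are assumptions. Each pass shifts the whole image by one sub-pixel offset, so every pixel sees the same jitter pattern. The offset is expressed in NDC, where one pixel spans 2/width units.

```cpp
#include <random>
#include <vector>

struct NdcShift { float dx, dy; };

// One jittered sample per cell of a 4x4 grid, converted to an NDC-space shift.
// A fixed seed keeps the pattern identical from frame to frame.
std::vector<NdcShift> accumulationShifts(int width, int height, unsigned seed = 1u) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    std::vector<NdcShift> shifts;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            float sx = (i + jitter(rng)) / 4.0f - 0.5f;   // sub-pixel offset, about [-0.5, 0.5)
            float sy = (j + jitter(rng)) / 4.0f - 0.5f;
            shifts.push_back({2.0f * sx / width, 2.0f * sy / height});
        }
    return shifts;
}
```

Each pass premultiplies the projection matrix by a translation of (dx, dy, 0). Because clip-space x gains dx·w, this is equivalent to shearing the view frustum. In legacy OpenGL, each pass can be added with `glAccum(GL_ACCUM, 1.0f / 16)` and the average written back with `glAccum(GL_RETURN, 1.0f)`.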