1
Cg and Hardware Accelerated Shading
  • Cem Cebenoyan

2
Overview
  • Cg Overview
  • Where we are in hardware today
  • Physical Simulation on GPU
  • GeForce FX / Cg Demos
  • Advanced hair and skin rendering in Dawn
  • Adaptive subdivision surfaces and ambient
    occlusion shading in Ogre
  • Procedural shading in Time Machine
  • Depth of field and post-processing effects in
    Toys
  • OIT

3
What is Cg?
  • A high level language for controlling parts of
    the graphics pipeline of modern GPUs
  • Today, this includes the vertex transformation
    and fragment processing units of the pipeline
  • Very C-like
  • Only simpler
  • Native support for vectors, matrices,
    dot-products, reflection vectors, etc.
  • Similar in scope to RenderMan
  • But notably different, to handle the way hardware
    accelerators work (a minimal example follows)
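For flavor, a minimal Cg vertex program might look like this (an
illustrative sketch, not from the slides; struct and parameter names
are made up):

    // A minimal Cg vertex program: transform position and compute
    // per-vertex diffuse. Struct/parameter names are illustrative;
    // lightDir is assumed normalized.
    struct appdata {
        float4 position : POSITION;
        float3 normal   : NORMAL;
    };
    struct vout {
        float4 hpos  : POSITION;
        float4 color : COLOR0;
    };

    vout main(appdata IN,
              uniform float4x4 modelViewProj,
              uniform float3   lightDir)
    {
        vout OUT;
        OUT.hpos = mul(modelViewProj, IN.position);  // native matrix math
        float d  = max(dot(normalize(IN.normal), lightDir), 0); // native dot
        OUT.color = float4(d.xxx, 1);
        return OUT;
    }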

4
Cg Pipeline Overview
Graphics program written in Cg ("C for Graphics")
  -> compiled and optimized
  -> low-level graphics assembly code
5
Graphics Data Flow
Application -> Vertex Program -> Fragment Program -> Framebuffer
(the vertex and fragment stages each run a Cg program)

Example fragment code:

    // Diffuse lighting
    float d = dot(normalize(frag.N), normalize(frag.L));
    if (d < 0) d = 0;
    c = d * f4tex2D(t, frag.uv);   // diffuse
6
Graphics Hardware Today
  • Fully programmable vertex processing
  • Full IEEE 32-bit floating point processing
  • Native support for mul, dp3, dp4, rsq, pow, sin,
    cos...
  • Full support for branching, looping, subroutines
  • Fully programmable pixel processing
  • IEEE 32-bit, 16-bit (s10e5) math supported
  • Same native math ops as vertex, plus texture
    fetch, and derivative instructions
  • No branching, but a >1000 instruction limit
  • Floating point textures / frame buffers
  • No blending / filtering yet
  • 500 MHz core clock

7
Physical Simulation
  • Simple cellular-automata-like simulations are
    possible on NV20-class hardware (e.g. Game of
    Life, Greg James's water simulation, Mark
    Harris's CML work)
  • Use textures to represent physical quantities
    (e.g. displacement, velocity, force) on a regular
    grid
  • Multiple texture lookups allow access to
    neighbouring values
  • Pixel shader calculates new values, renders
    results back to texture
  • Each rendering pass draws a single quad,
    calculating next time step in simulation

8
Physical Simulation
  • Problem: 8-bit precision on NV20 is not enough;
    it causes drifting and stability problems
  • Float precision on NV30 allows GPU physics to
    match CPU accuracy
  • New fragment programming model (longer programs,
    flexible dependent texture reads) allows much
    more interesting simulations

9
Example Cloth Simulation Shader
  • Uses Verlet integration (see Jakobsen, GDC 2001)
  • Avoids storing explicit velocity
  • newx = x + (x - oldx)*damping + a*dt*dt
  • Not always accurate, but stable!
  • Store current and previous position of each
    particle in 2 RGB float textures
  • Fragment program calculates new position, writes
    result to float buffer
  • Copy float buffer back to texture for next
    iteration (could use render-to-texture instead)
  • Swap current and previous textures
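A sketch of the host-side loop this implies (C/OpenGL; assumes a float
pbuffer is the current render target and positions live in two
GL_TEXTURE_RECTANGLE_NV float textures; helper names are illustrative,
not the demo's actual API):

    int curr = 0, prev = 1;
    for (int i = 0; i < num_iterations; i++) {
        bind_cloth_fragment_program();  /* the Cg programs on slides 12-14 */
        draw_fullscreen_quad();         /* one quad = one simulation step  */
        /* copy the float buffer back to the "current position" texture */
        glBindTexture(GL_TEXTURE_RECTANGLE_NV, pos_tex[curr]);
        glCopyTexSubImage2D(GL_TEXTURE_RECTANGLE_NV, 0, 0, 0, 0, 0, W, H);
        /* swap current and previous textures */
        int tmp = curr; curr = prev; prev = tmp;
    }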

10
Cloth Shader Demo
11
Cloth Simulation Shader
  • 2 passes
  • 1. Perform integration
  • 2. Apply constraints
  • Floor constraint
  • Sphere constraint
  • Distance constraints between particles
  • Read back float frame buffer using glReadPixels
  • Draw particles and constraints

12
Cloth Simulation Cg Code (1st pass)
void Integrate(inout float3 x, float3 oldx, float3 a,
               float timestep2, float damping)
{
    x = x + damping*(x - oldx) + a*timestep2;
}

myFragout main(v2fconnector In,
               uniform texobjRECT x_tex,
               uniform texobjRECT ox_tex,
               uniform float timestep,
               uniform float damping,
               uniform float3 gravity)
{
    myFragout Out;
    float2 s = In.TEX0.xy;
    // get current and previous position
    float3 x    = f3texRECT(x_tex, s);
    float3 oldx = f3texRECT(ox_tex, s);
    // move the particle
    Integrate(x, oldx, gravity, timestep*timestep, damping);
    Out.COL.xyz = x;
    return Out;
}
13
Cloth Simulation Cg Code (2nd pass)
// constrain particle to be a fixed distance from another particle
void DistanceConstraint(float3 x, inout float3 newx, float3 x2,
                        float restlength, float stiffness)
{
    float3 delta = x2 - x;
    float deltalength = length(delta);
    float diff = (deltalength - restlength) / deltalength;
    newx = newx + delta*stiffness*diff;
}

// constrain particle to be outside sphere
void SphereConstraint(inout float3 x, float3 center, float r)
{
    float3 delta = x - center;
    float dist = length(delta);
    if (dist < r) x = center + delta*(r / dist);
}

// constrain particle to be above floor
void FloorConstraint(inout float3 x, float level)
{
    if (x.y < level) x.y = level;
}
14
Cloth Simulation Cg Code (cont.)
myFragout main(v2fconnector In,
               uniform texobjRECT x_tex,
               uniform texobjRECT ox_tex,
               uniform float dist,
               uniform float stiffness)
{
    myFragout Out;
    float2 s = In.TEX0.xy;
    // get current position
    float3 x = f3texRECT(x_tex, s);

    // satisfy constraints
    FloorConstraint(x, 0.0f);
    SphereConstraint(x, float3(0.0, 2.0, 0.0), 1.0f);

    // get positions of neighbouring particles
    float3 x1 = f3texRECT(x_tex, s + float2( 1.0,  0.0));
    float3 x2 = f3texRECT(x_tex, s + float2(-1.0,  0.0));
    float3 x3 = f3texRECT(x_tex, s + float2( 0.0,  1.0));
    float3 x4 = f3texRECT(x_tex, s + float2( 0.0, -1.0));

    // apply distance constraints
    // (skip neighbours outside the cloth grid, bounded at 0 and 31)
    float3 newx = x;
    if (s.x < 31) DistanceConstraint(x, newx, x1, dist, stiffness);
    if (s.x > 0)  DistanceConstraint(x, newx, x2, dist, stiffness);
    if (s.y < 31) DistanceConstraint(x, newx, x3, dist, stiffness);
    if (s.y > 0)  DistanceConstraint(x, newx, x4, dist, stiffness);
    Out.COL.xyz = newx;
    return Out;
}
15
Physical Simulation Future Work
  • Limitation: only one destination buffer, so we
    can only modify the position of one particle at a
    time
  • Could use pack instructions to store 2 vec4h (8
    half floats) in 128 bit float buffer
  • Could also use additional textures to encode
    particle masses, stiffness, constraints between
    arbitrary particles (rigid bodies)
  • float buffer to vertex array extension offers
    possibility of directly interpreting results as
    geometry without any CPU intervention!
  • Collision detection with meshes is hard

16
Demos Introduction
  • Developed 4 demos for the launch of GeForce FX
  • Dawn
  • Toys
  • Time Machine
  • Ogre(Spellcraft Studio)

17
Characters Look Better With Hair
18
Rendering Hair
  • Two options
  • 1) Volumetric (texture)
  • 2) Geometric (lines)
  • We have used volumetric approximations (shells
    and fins) in the past (e.g. Wolfman demo)
  • Doesn't work well for long hair
  • We considered using textured ribbons (popular in
    Japanese video games). Alpha sorting is a pain.
  • Performance of GeForce FX finally lets us render
    hair as geometry

19
Rendering Hair as Lines
  • Each hair strand is rendered as a line strip
    (2-20 vertices, depending on curvature)
  • Problem: lines are a minimum of 1 pixel thick,
    regardless of distance from camera
  • Not possible to change line width per vertex
  • Can use camera-facing triangle strips, but these
    require twice the number of vertices, and have
    aliasing problems

20
Anti-Aliasing
  • Two methods of anti-aliasing lines in OpenGL
  • GL_LINE_SMOOTH
  • High quality, but requires blending, sorting
    geometry
  • GL_MULTISAMPLE
  • Usually lower quality, but order independent
  • We used multisample anti-aliasing with
    alpha-to-coverage mode (GL setup sketched below)
  • By fading alpha to zero at the ends of hairs,
    coverage and apparent thickness decrease
  • SAMPLE_ALPHA_TO_COVERAGE_ARB is part of the
    ARB_multisample extension
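The corresponding GL state is small (C/OpenGL sketch; assumes the
context was created with a multisample pixel format):

    /* Enable multisampling with alpha-to-coverage (ARB_multisample). */
    glEnable(GL_MULTISAMPLE_ARB);
    glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE_ARB);
    /* ... draw hair line strips with alpha fading to 0 at the tips ... */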

21
Hair Without Antialiasing
22
Hair With Multisample Antialiasing
23
Hair Shading
  • Hair is lit with a simple anisotropic shader
    (the Heidrich-Seidel model; sketched after this
    list)
  • Low specular exponent, dim highlight looks best
  • Black hair: no shadows!
  • Self-shadowing hair is hard
  • Deep shadow maps
  • Opacity shadow maps
  • Top of head is painted black to avoid skin
    showing through
  • We also had a very short hair style, which helps

24
Hair Styling is Important
25
Hair Styling
  • Difficult to position 50,000 individual curves by
    hand
  • Typical solution is to define a small number of
    control hairs, which are then interpolated across
    the surface to produce render hairs
  • We developed a custom tool for hair styling
  • Commercial hair applications have poor styling
    tools and are not designed for real time output

26
Hair Styling
  • Scalp is defined as a polygon mesh
  • Hairs are represented as cubic Bezier curves
  • Control hairs are defined at each vertex
  • Render hairs are interpolated across triangles
    using barycentric coordinates (see the sketch
    after this list)
  • Number of generated hairs is based on triangle
    area to maintain constant density
  • Can add noise to interpolated hairs to add
    variation
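A hedged sketch of the interpolation step, in Cg-style syntax (the
styling tool ran on the CPU; names are illustrative):

    // One render-hair control point from the three control hairs at
    // the triangle's corners: h0/h1/h2 are the corresponding Bezier
    // control points, b the barycentric weights (b.x + b.y + b.z = 1).
    float3 InterpolateHairPoint(float3 b, float3 h0, float3 h1, float3 h2)
    {
        return b.x*h0 + b.y*h1 + b.z*h2;   // plus optional per-hair noise
    }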

27
Hair Styling Tool
  • Provides a simple UI for styling hair
  • Combing tools
  • Lengthen / shorten
  • Straighten / mess up
  • Uses a simple physics simulation based on Verlet
    integration (Jakobsen, GDC 2001)
  • Physics is run on control hairs only
  • Collision detection done with ellipsoids

28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
Dawn Demo
  • Show demo

32
(No Transcript)
33
The Ogre Demo
  • A real-time preview of Spellcraft Studio's
    in-production short movie "Yeah!"
  • Created in 3DStudio MAX
  • Used Character Studio for animation, plus Stitch
    plug-in for cloth simulation
  • Original movie was rendered in Brazil with global
    illumination
  • Available at www.yeahthemovie.de
  • Our aim was to recreate the original as closely
    as possible, in real-time

34
What are Subdivision Surfaces?
  • A curved surface defined as the limit of repeated
    subdivision steps on a polygonal model
  • Subdivision rules create new vertices, edges,
    faces based on neighboring features
  • We used the Catmull-Clark subdivision scheme (as
    used by Pixar)
  • MAX, Maya, Softimage, Lightwave all support forms
    of subdivision surfaces

35
Realtime Adaptive Tessellation
  • Brute force subdivision is expensive
  • Generates lots of polygons where they aren't
    needed
  • Number of polygons increases exponentially with
    each subdivision
  • Adaptive tessellation subdivides patches based on
    a screen-space patch-size test (sketched below)
  • Guaranteed crack-free
  • Generates normals and tangents on the fly
  • Culls off-screen and back-facing patches
  • CPU-based (uses SSE where possible)
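A hedged C sketch of the per-patch test (the demo's code is not
public, so types, helpers, and the threshold are assumptions):

    typedef struct Patch  Patch;    /* assumed types */
    typedef struct Camera Camera;
    int   patch_offscreen(const Patch *p, const Camera *c);
    int   patch_backfacing(const Patch *p, const Camera *c);
    float projected_edge_length_px(const Patch *p, const Camera *c);
    #define MAX_EDGE_PIXELS 8.0f    /* assumed threshold */

    int ShouldSubdivide(const Patch *p, const Camera *cam)
    {
        if (patch_offscreen(p, cam) || patch_backfacing(p, cam))
            return 0;                    /* culled: stop subdividing */
        /* longest control-mesh edge, projected to pixels */
        float px = projected_edge_length_px(p, cam);
        return px > MAX_EDGE_PIXELS;     /* subdivide big patches only */
    }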

36
Control Mesh vs. Subdivided Mesh
4,000 control-mesh faces vs. 17,000 subdivided triangles
37
Control Mesh Detail
38
Subdivided Mesh Detail
39
Why Use Subdivision Surfaces?
  • Content
  • Characters were modeled with subdivision in mind
    (using 3DSMax MeshSmooth/NURMS modifier)
  • Scalability
  • Wanted the demo to be scalable to lower-end
    hardware
  • Infinite detail
  • Can zoom in forever without seeing hard edges
  • Animation compression
  • Just store low-res control mesh for each frame
  • May be accelerated on future GPUs

40
Disadvantages of Realtime Subdivision
  • CPU intensive
  • But we might as well use the CPU for something!
  • View dependent
  • Requires re-tessellation for shadow map passes
  • Mesh topology changes from frame to frame
  • Makes motion blur difficult

41
Ambient Occlusion Shading
  • Helps simulate the global illumination look of
    the original movie
  • Self occlusion is the degree to which an object
    shadows itself
  • How much of the sky can I see from this point?
  • Simulates a large spherical light surrounding the
    scene
  • Popular in production rendering: Pearl Harbor
    (ILM), Stuart Little 2 (Sony)

42
Occlusion
(Diagram: hemisphere of ray directions around the surface normal N)
43
How To Calculate Occlusion
  • Shoot rays from surface in random directions over
    the hemisphere (centered around the normal)
  • The percentage of rays that hit something is the
    occlusion amount
  • Can also keep track of the average of un-occluded
    directions: the "bent normal"
  • Some RenderMan-compliant renderers (e.g. Entropy)
    have a built-in occlusion() function that will do
    this
  • We can't trace rays using graphics hardware (yet)
  • So we pre-calculate it! (a baking sketch follows
    this list)
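A hedged C sketch of the baking computation (the tool ran offline on a
CPU ray tracer; all helper names here are illustrative, not the demo's
API):

    typedef struct { float x, y, z; } float3;
    /* assumed helpers: */
    float3 random_dir_on_hemisphere(float3 n); /* uniform dir, dot > 0 */
    int    trace_ray(float3 origin, float3 dir); /* 1 if the mesh is hit */
    float3 add3(float3 a, float3 b);
    float3 normalize3(float3 v);

    float BakeOcclusion(float3 p, float3 n, int num_rays,
                        float3 *bent_normal)
    {
        int hits = 0;
        float3 avg = {0, 0, 0};
        for (int i = 0; i < num_rays; i++) {      /* demo used 128 rays */
            float3 dir = random_dir_on_hemisphere(n); /* around normal */
            if (trace_ray(p, dir))
                hits++;                           /* occluded direction */
            else
                avg = add3(avg, dir);             /* open direction */
        }
        *bent_normal = normalize3(avg);           /* average open dir */
        return (float)hits / num_rays;            /* occlusion in [0,1] */
    }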

44
Occlusion Baking Tool
  • Uses ray-tracing engine to calculate occlusion
    values for each vertex in control mesh
  • We used 128 rays / vertex
  • Stored as floating point scalar for each vertex
    and each frame of the animation
  • Calculation took around 5 hours for 1000 frames
  • Subdivision code interpolates occlusion values
    using cubic interpolation
  • Used as ambient term in shader

45
(No Transcript)
46
(No Transcript)
47
Ogre Demo
  • Show demo

48
Procedural Shading in Time Machine
  • Goals for the Time Machine demo
  • Overview of effects
  • Metallic Paint
  • Wood
  • Chrome
  • Techniques used
  • Faux-BRDF reflection
  • Reveal and dXdT maps
  • Normal and DuDv scaling
  • Dynamic Bump mapping
  • Performance Issues
  • Summary

49
Why do Time Machine?
  • GPUs are much more programmable
  • Thanks to generalized dependent texturing, more
    active textures (16 on GeForce FX) and (for our
    purposes) unlimited blend operations,
    high-quality animation is possible per-pixel
  • GeForce FX has >2x the performance of GeForce4 Ti
  • Executing lots of per-pixel operations isn't just
    possible; it can be done in real time.
  • Previous per-pixel animation was limited
  • Animated textures
  • PDE / CA effects (see Mark Harris's talk at GDC)
  • Goal: full-scene per-pixel animation

50
Why do Time Machine? (continued)
  • Neglected pick-up trucks demonstrate a wide
    variety of surface effects, with intricate
    transitions and boundaries
  • Paint oxidizing, bleaching and rusting
  • Vinyl cracking
  • Wood splintering and fading
  • And more

Not possible with just per-vertex animation!
51
Time Machine Effects Paint
  • Paint textures
  • Paint Color
  • Rust LUT
  • Shadow map
  • Spotlight mask
  • Light Rust Color
  • Deep Rust Color
  • Ambient Light
  • Bubble Height
  • Reveal Time
  • New Environment
  • Old Environment
  • (artist created)

Effects: oxidation, specular color shift, rusting, bubbling

60 pixel shader instructions, 11 textures
52
Effects (contd) Wood, Chrome, Glass
Chrome welts and corrodes: 31 instructions, 6 textures
Wood fades and cracks: 23 instructions, 8 textures
Headlights fog: 24 instructions, 4 textures
53
Procedural or Not?
  • Procedural shading normally replaces textures
    with functions of several variables.
  • Time Machine uses textures liberally.
  • The only parameter to our shaders is time.
  • However, turning everything into math is
    expensive
  • Time Machine's solution
  • Give the artist direct control (textures) over the
    final image; use functions to control transitions

54
Techniques Faux-BRDF Reflection
  • Many automotive paints exhibit a color-shift as a
    function of the light and viewer directions.
  • This effect has been approximated with analytic
    BRDFs (Lafortune's cosine lobes)
  • And measured by Cornell University's graphics lab
  • BRDF factorization (McCool, Rusinkiewicz) is one
    method to use this data on graphics hardware
    method to use this data on graphics hardware
  • Efficient representation with multiple 2D
    textures
  • Closely approximates the original BRDFs
  • But not necessarily the most efficient method for
    automotive paint, and not artist-controllable.
  • Reflection intensity is uninteresting (largely
    Blinn)
  • Rotated/projected axes hard to visualize

55
Techniques Faux-BRDF Reflection 2
  • Our solution: project BRDF values onto a single
    2D texture, and factor out the intensity
  • Compute intensity in real-time, using (N.H)^s
  • Texture varies slowly, so it can be low-res
    (64x64).
  • Anti-aliasing texture fixes "laser" noise at
    grazing angles
  • For automotive paints, N.L and N.H work well for
    axes.
  • Not physically accurate, but fast and
    high-quality.
  • Easy for artists to tweak. (Cg sketch below.)
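A hedged Cg sketch of the lookup described above (texture contents and
names are assumptions; the anti-aliasing texture is omitted):

    // Faux-BRDF paint shading: a 64x64 color-shift texture indexed by
    // (N.L, N.H), with intensity computed analytically as (N.H)^s.
    float3 PaintColor(float3 N, float3 L, float3 H,
                      texobj2D brdfTex, float s)
    {
        float NdotL = dot(N, L);
        float NdotH = dot(N, H);
        float3 shift = f3tex2D(brdfTex, float2(NdotL, NdotH)); // color shift
        float  intensity = pow(max(NdotH, 0), s);              // (N.H)^s
        return shift * intensity;
    }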

Mystique lacquer
Dupont Cayman lacquer
56
Techniques Reveal and dXdT maps
  • Artists do not want to paint hundreds of frames
    of animation for a surface transition (e.g.,
    paint -> rust)
  • Ultimately, the effect is just a conditional
  • if (time > n) color = rust; else color = paint;
  • Or an interpolation between a start and end point
  • paint = interpolate(paint, bleach, s*(time - n));
  • So all intermediate values can be generated
    (sketched below).
  • For continuous effects, use dXdT (velocity) maps
  • Can be stored in alpha in a DXT5 texture.
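A hedged Cg sketch of a reveal-map transition (the rate parameter and
texture layout are assumptions; names are illustrative):

    // Per-texel transition driven by a reveal map: revealTex stores
    // the start time n for each texel; time is the shader's only
    // global parameter.
    float3 Transition(float2 uv, float time, float rate,
                      texobj2D revealTex, float3 paint, float3 rust)
    {
        float n = f1tex2D(revealTex, uv);       // per-texel start time
        float s = saturate((time - n) * rate);  // 0 before n, ramps to 1
        return lerp(paint, rust, s);            // all intermediates free
    }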

57
Performance Concerns
  • Executing large shaders is expensive.
  • First rule of optimization: keep inner loops
    tight
  • Shaders are the inner loop, run >1M times per
    frame.
  • But graphics cards have many parallel units
  • Vertex, fragment, and texture units
  • Modern GPUs do a great job of hiding texture
    latency
  • Bandwidth is unimportant in long shaders
  • Time Machine runs at virtually the same framerate
    on a 500/500 GeForce FX as it does on a 500/400 or
    500/550
  • So not using textures is wasting performance!

58
Performance Concerns
  • What makes a good texture?
  • Saves math operations
  • 8 (RGBA) or 16 (HILO) bit precision sufficient
  • Depends on a limited number of variables
  • Textures we used
  • Interpolating between light and dark rust layers
  • Required computing the difference between the
    light and dark layers' reveal maps, and expanding
    to 0..1.
  • The function depended on the current time and the
    reveal time.
  • Used to blend two texture maps

59
Performance Concerns
  • Textures Used, continued
  • Surround Maps
  • Recomputing the normal requires knowing the
    heights of 4 texels: (s-1,t), (s+1,t), (s,t+1) and
    (s,t-1)
  • Each height is only 1 8-bit component
  • Instead of 4 dependent fetches, we can pack all
    into 1:
  • S(s,t) = ( H(s-1,t), H(s+1,t), H(s,t-1), H(s,t+1) )
  • Saved 4 math ops and 3 texture fetches, plus the
    shuffle logic (sketched below)
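A hedged Cg sketch of the single-fetch normal reconstruction
(bumpScale and all names are assumptions):

    // Rebuild a bump normal from one surround-map fetch. S packs the
    // four neighbour heights H(s-1,t), H(s+1,t), H(s,t-1), H(s,t+1)
    // into one RGBA texel.
    float3 NormalFromSurroundMap(texobj2D surroundTex, float2 uv,
                                 float bumpScale)
    {
        float4 h = f4tex2D(surroundTex, uv);  // one fetch instead of four
        float dx = h.y - h.x;                 // H(s+1,t) - H(s-1,t)
        float dy = h.w - h.z;                 // H(s,t+1) - H(s,t-1)
        return normalize(float3(-dx * bumpScale, -dy * bumpScale, 1));
    }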

60
Time Machine demo
  • Show demo

61
Toys Demo - Simple Depth of Field
  • Render scene to color and depth textures
  • Generate mipmaps for color texture
  • Render full screen quad with simpledof shader
  • depth = tex(depthtex, texcoord)
  • coc (circle of confusion) = abs(depth*scale +
    bias)
  • color = txd(colortex, texcoord, (coc,0), (0,coc))
  • Scale and bias are derived from the camera:
  • scale = (aperture * focaldistance * planeinfocus
    * (zfar - znear)) / ((planeinfocus -
    focaldistance) * znear * zfar)
  • bias = (aperture * focaldistance * (znear -
    planeinfocus)) / ((planeinfocus - focaldistance)
    * znear)
    (a fragment-program sketch follows this list)
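Putting those steps together, a hedged Cg sketch of the full-screen
fragment program (connector and texture names are assumptions; the
derivative overload of the texture fetch plays the role of txd):

    float4 main(float2 uv : TEXCOORD0,
                uniform texobj2D colortex,   // mipmapped scene color
                uniform texobj2D depthtex,   // scene depth
                uniform float scale,
                uniform float bias) : COLOR
    {
        float depth = f1tex2D(depthtex, uv);
        float coc   = abs(depth * scale + bias); // circle of confusion
        // wide "derivatives" push the lookup to a blurrier mip level
        return f4tex2D(colortex, uv, float2(coc, 0), float2(0, coc));
    }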

62
Artifacts Bilinear Interpolation/Magnification
  • Bilinear artifacts in extreme back- and
    near-ground
  • Solution: multiple jittered samples
  • Even without jittering, a 4 or 5 sample rotated
    grid pattern brings smaller artifacts under
    control
  • Larger artifacts need jittered samples, and more
    of them
  • Then it's just a tradeoff between noise from the
    jittering and bilinear interpolation artifacts
  • (and of course the quality/performance tradeoff
    with number of samples)

63
Noise vs. Interpolation Artifacts
With Noise
Without Noise
64
Artifacts Depth Discontinuities
  • Near-ground (blurry) pixels don't properly blend
    out over top of mid-ground (sharp) pixels
  • Easy solution: cheat!
  • Either don't let objects get too far in front of
    the plane in focus, or blur everything a little
    more when they do; soft edges help hide this
    fairly well.

65
Depth Discontinuities
66
Fun With Color Matrices
  • Since we're already rendering to a full-screen
    texture, it's easy to muck with the final image.
  • Operations are just rotations / scales in RGB
    space
  • Color (hue) shift
  • Saturation
  • Brightness
  • Contrast
  • These are all matrices, so compose them together,
    and apply them as 3 dot products in the shader
    (sketched below)
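A hedged Cg-style sketch: building one such matrix (saturation, using
standard luminance weights, which are an assumption rather than the
demo's exact constants) and applying the composed result per pixel:

    // Saturation matrix: s = 0 gives grayscale, s = 1 is identity.
    float3x3 SaturationMatrix(float s)
    {
        const float3 lum = float3(0.3086, 0.6094, 0.0820);
        return float3x3(lerp(lum, float3(1, 0, 0), s),
                        lerp(lum, float3(0, 1, 0), s),
                        lerp(lum, float3(0, 0, 1), s));
    }

    // Hue-shift/brightness/contrast matrices compose the same way;
    // multiply them once on the CPU, then per pixel it is just:
    float3 ApplyColorMatrix(float3x3 colorMat, float3 c)
    {
        return mul(colorMat, c);   // 3 dot products in the shader
    }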

67
Original Image
68
Color-shifted Image
69
Black and White Image
70
Toys Demo
  • Show demo

71
Order Independent Transparency
  • Why is correct transparency hard?
  • Depth peeling
  • Two depth buffers
  • Enter the shadow map
  • Precision/invariance issues
  • Depth replace texture shader
  • Blending the layers
  • Other applications

72
Can't just glEnable(GL_BLEND)
Good transparency (with OIT) vs. bad transparency (without OIT)
73
Why is correct transparency hard?
  • Most hardware does object-order rendering
  • Correct transparency requires sorted traversal
  • Have to render polygons in sorted order
  • Not very convenient
  • Polygons can't intersect
  • Lots of extra application work
  • Especially difficult for dynamic scene databases

74
Depth Peeling
  • The algorithm uses an implicit sort to extract
    multiple depth layers
  • The first render pass finds the front-most
    fragment color/depth
  • Each successive render pass finds (extracts) the
    fragment color/depth for the next-nearest
    fragment on a per-pixel basis
  • Use dual depth buffers to compare the previous
    nearest fragment with the current one
  • Second depth buffer used for comparison (read
    only) from texture; more on this later

75
Layer 0
Layer 1
Layer 2
Layer 3
76
Cross-section view of depth peeling
(Figure: three frames showing Layer 0, Layer 1, and Layer 2, each
plotted over 0 <= depth <= 1)
Depth peeling strips away depth layers with each
successive pass. The frames above show the
frontmost (leftmost) surfaces as bold black
lines, hidden surfaces as thin black lines, and
peeled away surfaces as light grey lines.
77
Dual Depth Buffer Pseudo-code
for (i = 0; i < num_passes; i++)
{
    clear color buffer

    // depth unit 0 (read-only test against the previous layer):
    if (i == 0) disable depth test
    else        enable depth test
    bind depth buffer (i % 2)
    disable depth writes      /* read-only depth test */
    set depth func to GREATER

    // depth unit 1 (writable, finds the nearest remaining fragment):
    bind depth buffer ((i + 1) % 2)
    clear depth buffer
    enable depth writes
    enable depth test
    set depth func to LESS

    render scene
    save color buffer RGBA as layer i
}

78
Implementation
  • There is no dual depth buffer extension to
    OpenGL, so what can we do?
  • Just need one depth test with a writable depth
    buffer; the other can be read-only
  • Shadow mapping is a read-only depth test! (setup
    sketched after this list)
  • Depth test can have an arbitrary camera location
  • Other interesting uses: clip volumes
  • Fast copies make this proposition reasonable
  • Copies will be unnecessary in the future
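A hedged C/OpenGL sketch of the read-only "second depth test" via
shadow mapping (ARB_shadow-style; the demo actually used NV-specific
texture shaders, so treat this as an approximation):

    glBindTexture(GL_TEXTURE_2D, prev_depth_tex); /* previous layer's depths */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE_ARB,
                    GL_COMPARE_R_TO_TEXTURE_ARB);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC_ARB,
                    GL_GREATER);  /* needs EXT_shadow_funcs; plain
                                     ARB_shadow only allows LEQUAL/GEQUAL */
    /* the normal, writable depth buffer keeps LESS to find the nearest
       of the fragments that survive the GREATER test */
    glDepthFunc(GL_LESS);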

79
Precision / Invariance issues
  • Using shadow mapping hardware introduces
    precision and invariance issues
  • Depth rasterization usually just needs to match
    the output depth buffer's precision, and requires
    no perspective correction
  • Texture hardware requires perspective correction
    and projection at high precision
  • Making things match would be difficult without
    the DEPTH_REPLACE texture shader
  • Computes with texture hardware at texture
    precision
  • Solves invariance problems at some extra expense
  • Will be cheaper in the future

80
(No Transcript)
81
Compositing
  • Each time we peel, we capture the RGBA; then, as
    a final step, we blend all the layers together
    from back to front (sketched below)
  • Opaque fragments completely overwrite previous
    transparent ones
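A hedged C/OpenGL sketch of the final compositing pass (layer_tex[]
holding each peeled layer's RGBA is an assumed name):

    /* standard back-to-front "over" blending */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    for (int i = num_layers - 1; i >= 0; i--) {
        glBindTexture(GL_TEXTURE_2D, layer_tex[i]); /* RGBA of layer i */
        draw_fullscreen_quad();  /* opaque texels (alpha = 1) completely
                                    overwrite the layers behind them */
    }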

82
Conclusions
  • Results are nice!
  • Get correct transparency without invasive changes
    to internal data structures
  • Can be bolted on to existing CAD/CAM apps
  • Requires n scene traversals for n correctly
    sorted depths
  • n = 4 is often quite satisfactory (see previous
    slide)
  • Shadow maps are for more than shadows!

83
Questions?
  • cem@nvidia.com
  • http://developer.nvidia.com
  • http://developer.nvidia.com/cg/
  • http://www.cgshaders.org/