Advanced D3D10 Rendering - PowerPoint PPT Presentation

About This Presentation
Title: Advanced D3D10 Rendering


Slides: 28
Provided by: emilpe3

Transcript and Presenter's Notes



1
Advanced D3D10 Rendering
  • Emil Persson
  • May 24, 2007

2
Overview
  • Introduction to D3D10
  • Rendering techniques in D3D10
  • Optimizations

3
Introduction
  • Best D3D revision yet!
  • Clean and powerful API
  • Lots of new features
  • SM 4.0
  • New geometry shader
  • Stream Out
  • Texture arrays
  • Render to volume texture
  • MSAA individual sample access
  • Constant buffers
  • Sampler state decoupled from texture unit
  • Dual-source blending
  • Etc.

4
Clean API
  • Vista only
  • Everything is mandatory (almost)
  • No legacy hardware support
  • Clean starting point for future evolution of the
    API
  • Limited market short-term
  • Some old features deprecated
  • Fixed function
  • Assembly shaders
  • Alpha test
  • Triangle fans
  • Point sprites
  • Clip planes

5
Dealing with deprecated features
  • Fixed function
  • Write a few über-shaders
  • Assembly shaders
  • Convert to HLSL
  • Alpha test
  • Use discard or clip() in pixel shader
  • Use alpha-to-coverage
  • Triangle fans
  • Seldom used anyway, usually just for a quad
  • Convert to triangle list or strip
  • Point sprites
  • Expand point to 2 triangles in GS
  • Clip planes
  • Use clip distance and/or cull distance
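Of these, the triangle-fan conversion is the most mechanical. The sketch below is a CPU-side C++ model, assuming 32-bit indices (the function name `fanToList` is hypothetical): fan triangle i becomes (v0, vi+1, vi+2), which keeps the fan's winding so culling results do not change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a triangle-fan index list into a triangle-list index list.
// Fan triangle i uses (fan[0], fan[i + 1], fan[i + 2]), which keeps the
// original winding order, so backface culling behaves as it did for the fan.
std::vector<uint32_t> fanToList(const std::vector<uint32_t>& fan)
{
    std::vector<uint32_t> list;
    if (fan.size() < 3)
        return list;  // fewer than 3 indices: no triangles
    const std::size_t triangleCount = fan.size() - 2;
    list.reserve(triangleCount * 3);
    for (std::size_t i = 0; i < triangleCount; ++i)
    {
        list.push_back(fan[0]);      // shared fan center
        list.push_back(fan[i + 1]);
        list.push_back(fan[i + 2]);
    }
    assert(list.size() == triangleCount * 3);
    return list;
}
```

For the typical quad case, the fan {0, 1, 2, 3} becomes the list {0, 1, 2, 0, 2, 3}.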

6
SM 4.0
  • Geometry shader
  • Processes a full primitive (point, line,
    triangle)
  • Has access to adjacency information (optional)
  • Useful for silhouette detection, shadow volume
    extrusion etc.
  • May output multiple primitives
  • Output limit is 1024 floats (max vertex count x floats per vertex)
  • May output nothing (to kill primitive)
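As an illustration of the adjacency input, here is a minimal C++ model of the silhouette test a shadow-volume GS might perform. It assumes the D3D10 triangle-with-adjacency layout (vertices 0, 2, 4 form the triangle; 1, 3, 5 are the opposite vertices of the neighbours), a point light, and a winding where cross(b - a, c - a) points outward; all function names are hypothetical. An edge counts as a silhouette when the triangle faces the light and its neighbour across that edge does not.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if triangle (a, b, c) faces a point light at 'light'.
// Assumes cross(b - a, c - a) is the outward-facing normal.
static bool facesLight(Vec3 a, Vec3 b, Vec3 c, Vec3 light)
{
    return dot(cross(sub(b, a), sub(c, a)), sub(light, a)) > 0.0f;
}

// v uses the triangle-with-adjacency layout: v[0], v[2], v[4] form the
// triangle; v[1], v[3], v[5] are the far vertices of the neighbours across
// edges (0,2), (2,4) and (4,0). Returns one silhouette flag per edge.
std::array<bool, 3> silhouetteEdges(const std::array<Vec3, 6>& v, Vec3 light)
{
    std::array<bool, 3> result{false, false, false};
    if (!facesLight(v[0], v[2], v[4], light))
        return result;  // only light-facing triangles emit silhouette edges
    for (int e = 0; e < 3; ++e)
    {
        const Vec3& a   = v[2 * e];
        const Vec3& adj = v[2 * e + 1];
        const Vec3& b   = v[(2 * e + 2) % 6];
        // Neighbour triangle (a, adj, b) shares edge a-b with consistent winding.
        result[e] = !facesLight(a, adj, b, light);
    }
    return result;
}
```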

7
SM 4.0
  • Unlimited instruction count
  • Very long shaders may have lower throughput
    though
  • Integer and bitwise instructions
  • Indexable temporaries
  • Allows for local arrays
  • May be used to emulate a stack
  • Useful system generated values
  • SV_VertexID
  • SV_PrimitiveID
  • SV_InstanceID
  • SV_Position (Like VPOS, but now .zw are defined
    too)
  • SV_IsFrontFace (Like VFACE)
  • SV_RenderTargetArrayIndex
  • SV_ViewportArrayIndex
  • SV_ClipDistance
  • SV_CullDistance
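HLSL has no recursion, so indexable temporaries are what make tree traversal practical in a shader. The C++ sketch below mimics that style: a fixed-size local array acts as the stack and nothing is heap-allocated. The node layout and the 16-entry bound are assumptions for illustration; a real shader has to pick a bound that covers the deepest tree it will see.

```cpp
#include <cassert>

// Hypothetical node layout for a tree stored in a flat buffer;
// a child index of -1 means "no child".
struct Node { int left, right; float value; };

// Depth-first sum over a tree without recursion, using a fixed-size local
// array as the stack, the way an SM 4.0 shader can use an indexable temp.
float sumTree(const Node* nodes, int root)
{
    const int kMaxStack = 16;  // assumed bound on pending nodes
    int stack[kMaxStack];
    int top = 0;
    float sum = 0.0f;
    if (root >= 0)
        stack[top++] = root;
    while (top > 0)
    {
        const Node& n = nodes[stack[--top]];  // pop
        sum += n.value;
        if (n.left >= 0)  { assert(top < kMaxStack); stack[top++] = n.left; }
        if (n.right >= 0) { assert(top < kMaxStack); stack[top++] = n.right; }
    }
    return sum;
}
```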

8
SM 4.0
  • Integer bitwise instructions
  • Signed and unsigned
  • No idiv though, just udiv
  • Same registers as floats
  • Can alias without conversion with asint(),
    asuint(), asfloat() etc.
  • Integer texture sample values
  • Syntax: Texture2D<uint4> myTex;
  • Access to individual samples in MSAA surface
  • Allows for custom AA resolve
  • Syntax: Texture2DMS<float4, 4> myTex;
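The aliasing functions reinterpret bits rather than convert values. On the CPU the same semantics can be modelled with `std::memcpy`; this is a sketch of what `asuint()`/`asfloat()` do, not the HLSL intrinsics themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU-side model of HLSL asuint(): reuse the 32 bits of a float as a uint,
// with no numeric conversion.
uint32_t asuint(float f)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float expected");
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// CPU-side model of HLSL asfloat(): the inverse reinterpretation.
float asfloat(uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Note the contrast with a value conversion: converting 1.0f to an integer gives 1, while `asuint(1.0f)` gives the IEEE-754 bit pattern 0x3F800000.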

9
Pixel center
  • Half pixel offset is gone!
  • Affects SV_Position as well
  • Now matches OpenGL
  [Figure: pixel center placement in DX10 vs. DX9]

10
Pixel center
  • Pixels and texels align
  • TexCoord = SV_Position.xy / float2(width, height)
  [Figure: texel centers aligned with screen-space pixel centers]
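The difference can be written out as arithmetic. This C++ sketch assumes D3D10 reports pixel (px, py) at SV_Position (px + 0.5, py + 0.5), while D3D9 reports VPOS at the integer coordinates (px, py); the function names are hypothetical.

```cpp
#include <cassert>

struct Float2 { float x, y; };

// D3D10: SV_Position.xy is already at the pixel center, so dividing by the
// render-target size lands exactly on the matching texel center.
Float2 texCoordD3D10(Float2 svPosition, float width, float height)
{
    assert(width > 0.0f && height > 0.0f);
    return {svPosition.x / width, svPosition.y / height};
}

// D3D9: pixel centers sit on integer coordinates, so the half-pixel
// offset has to be added by hand to hit texel centers.
Float2 texCoordD3D9(Float2 vpos, float width, float height)
{
    assert(width > 0.0f && height > 0.0f);
    return {(vpos.x + 0.5f) / width, (vpos.y + 0.5f) / height};
}
```

For the top-left pixel of a 4x4 target, both paths give (0.125, 0.125), but only the D3D9 path needs the explicit offset.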

11
The small batch problem
  • D3D10 designed to minimize batch overhead
  • Pulls work from draw time to creation time
  • Validation
  • Shader input/output configuration
  • Immutable State Objects
  • Input layout
  • Rasterizer state
  • Sampler state
  • Depth stencil state
  • Blend state
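Since state objects are immutable and validated when they are created, an engine would typically create each distinct one at load time and only bind it at draw time. The C++ sketch below shows that pattern with a hypothetical, heavily simplified blend description; none of these types are the real D3D10 API (there, creation goes through `ID3D10Device::CreateBlendState` and binding through `OMSetBlendState`).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical, simplified blend description (the real D3D10_BLEND_DESC
// has more fields). Illustrative only, not the D3D10 API.
struct BlendDesc
{
    bool blendEnable;
    int srcBlend, destBlend;
    bool operator<(const BlendDesc& o) const
    {
        return std::tie(blendEnable, srcBlend, destBlend) <
               std::tie(o.blendEnable, o.srcBlend, o.destBlend);
    }
};

// An immutable state object: validated once at creation, never modified.
struct BlendState
{
    const BlendDesc desc;
    explicit BlendState(const BlendDesc& d) : desc(d)
    {
        assert(d.srcBlend >= 0 && d.destBlend >= 0);  // creation-time validation
    }
};

// Creates each distinct state once and hands out the same object afterwards,
// so draw time only has to bind a pointer.
class BlendStateCache
{
public:
    std::shared_ptr<const BlendState> get(const BlendDesc& d)
    {
        auto it = cache_.find(d);
        if (it != cache_.end())
            return it->second;
        std::shared_ptr<const BlendState> state = std::make_shared<BlendState>(d);
        cache_.emplace(d, state);
        ++created_;
        return state;
    }
    int createdCount() const { return created_; }

private:
    std::map<BlendDesc, std::shared_ptr<const BlendState>> cache_;
    int created_ = 0;
};
```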

12
The small batch problem
  • D3D10 also provides tools to reduce draw calls
  • Improved instancing interface
  • Geometry shader
  • More shader resources
  • Constant indexing in PS
  • Render target arrays
  • Texture arrays

13
Rendering techniques in D3D10
14
Global Illumination
15
Global Illumination
  • Probes on a volume grid across the scene
  • Each probe captures light environment into a tiny
    cubemap
  • Probes are converted to Spherical Harmonics
    coefficients
  • Indirect lighting is computed using interpolated
    SH coefficients
  • Do the same in probe passes to get multiple light
    bounces
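The projection and interpolation steps can be sketched numerically. This C++ model keeps only the first two SH bands (4 coefficients) for a single color channel and, as a simplifying assumption, projects just six samples along the cube-face axes with equal weights instead of weighting every cubemap texel by its solid angle. The names are hypothetical, and sign conventions for the band-1 basis vary between references.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// First two SH bands (4 coefficients) for one color channel.
using SH4 = std::array<float, 4>;

// Real SH basis, bands 0 and 1, for a unit direction (x, y, z).
SH4 shBasis(float x, float y, float z)
{
    return {0.282095f, 0.488603f * y, 0.488603f * z, 0.488603f * x};
}

// Project radiance seen along +X, -X, +Y, -Y, +Z, -Z into SH. Each sample
// gets an equal 4*pi/6 share of the sphere (a crude stand-in for proper
// per-texel solid-angle weights of a tiny cubemap).
SH4 projectSixDirections(const std::array<float, 6>& radiance)
{
    const float dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                              {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    const float weight = 4.0f * 3.14159265f / 6.0f;
    SH4 c{0, 0, 0, 0};
    for (int i = 0; i < 6; ++i)
    {
        SH4 y = shBasis(dirs[i][0], dirs[i][1], dirs[i][2]);
        for (int k = 0; k < 4; ++k)
            c[k] += radiance[i] * y[k] * weight;
    }
    return c;
}

// Reconstruct radiance in direction (x, y, z) from the coefficients.
float evaluate(const SH4& c, float x, float y, float z)
{
    SH4 b = shBasis(x, y, z);
    return c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
}

// Linear blend of two neighbouring probes; trilinear interpolation on the
// probe grid applies this along x, y and z in turn.
SH4 lerpSH(const SH4& a, const SH4& b, float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    SH4 r;
    for (int k = 0; k < 4; ++k)
        r[k] = a[k] + (b[k] - a[k]) * t;
    return r;
}
```

Because projection and evaluation are both linear, interpolating the coefficients gives the same result as interpolating the reconstructed lighting, which is what makes blending between probes cheap.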

16
Global Illumination
  • Awful lot of work
  • Each probe is 6 slices. We need loads of probes.
  • Sample scene has over 300 probes
  • Solution
  • Use geometry shader to reduce work
  • Distribute work across multiple frames
  • Sample updates 40 cubes per frame
  • Scatter updates to hide artifacts
  • Skip over empty space probes
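One way to implement this scheduling (not necessarily what the sample does) is to build a scattered update order once, leaving out empty-space probes, and then consume a fixed budget of it per frame. In this C++ sketch the scatter comes from a stride that is coprime with the probe count; the stride value and all names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Build the update order once: only non-empty probes, visited in a
// scattered order so neighbouring probes are not all refreshed together.
std::vector<int> buildUpdateOrder(const std::vector<bool>& isEmpty, int stride)
{
    const int n = static_cast<int>(isEmpty.size());
    assert(n > 0 && std::gcd(stride, n) == 1);  // coprime stride visits every index once
    std::vector<int> order;
    for (int k = 0; k < n; ++k)
    {
        const int probe = static_cast<int>((static_cast<long long>(k) * stride) % n);
        if (!isEmpty[probe])
            order.push_back(probe);
    }
    return order;
}

// Probes to refresh in a given frame: the next 'perFrame' entries of the
// order, wrapping around so every probe is revisited periodically.
std::vector<int> probesForFrame(const std::vector<int>& order, int frame, int perFrame)
{
    std::vector<int> result;
    if (order.empty())
        return result;
    const std::size_t n = order.size();
    const std::size_t start = (static_cast<std::size_t>(frame) * perFrame) % n;
    for (int k = 0; k < perFrame && k < static_cast<int>(n); ++k)
        result.push_back(order[(start + k) % n]);
    return result;
}
```

With 300 probes and a budget of 40 per frame, every non-empty probe is refreshed within 8 frames.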

17
Global Illumination
  • The Geometry Shader advantage
  • 40 cubes x 6 faces x n draw calls = pain
  • DX9-style rendering is unrealistic even for simple scenes
  • Update multiple slices per pass with GS
  • GS output limit is 1024 floats
  • Keep number of interpolators down to maximize
    primitive count
  • Managed to update 5 probes (30 slices) per pass
  • 8 passes is more manageable than 240 ...
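The pass count and the GS budget follow from simple arithmetic. The C++ below checks the slide's numbers at compile time. It assumes the 1024-float limit applies to the declared max vertex count times the floats per output vertex, and that each input triangle is emitted once per slice; the names are hypothetical.

```cpp
#include <cassert>

// GS output budget: max vertex count x floats per output vertex <= 1024.
constexpr int kGsOutputLimit = 1024;

// Largest vertex count a GS can declare for a given output vertex size.
int maxGsVertices(int floatsPerVertex)
{
    assert(floatsPerVertex > 0);
    return kGsOutputLimit / floatsPerVertex;
}

// Largest output vertex size (in floats) that still allows one input
// triangle (3 vertices) to be emitted to 'slices' slices per invocation.
constexpr int maxFloatsPerVertex(int slices)
{
    return kGsOutputLimit / (slices * 3);
}

// Passes needed to update 'totalSlices' cube faces at 'slicesPerPass'
// faces per pass, rounded up.
constexpr int passCount(int totalSlices, int slicesPerPass)
{
    return (totalSlices + slicesPerPass - 1) / slicesPerPass;
}

// Slide figures: 40 cubes x 6 faces, 5 probes (30 faces) per pass.
static_assert(passCount(40 * 6, 5 * 6) == 8, "8 passes instead of 240");
// 30 slices x 3 vertices = 90 vertices, leaving at most 11 floats per vertex,
// e.g. SV_Position (4) + SV_RenderTargetArrayIndex (1) + 6 interpolator floats.
static_assert(maxFloatsPerVertex(30) == 11, "1024 / 90 rounds down to 11");
```

This is why keeping interpolators down matters: every extra float per vertex eats directly into the number of slices one pass can update.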

18
Post tone-mapping resolve
  • D3D10 allows for custom AA resolves
  • Can drastically improve HDR AA quality
  • Standard resolve occurs before tone-mapping
  • Ideally resolve should be done after tone-mapping
  [Figure: standard resolve vs. custom resolve]

19
Post-tonemapping resolve
    Texture2DMS<float4, SAMPLES> tHDR;
    float4 main(float4 pos : SV_Position) : SV_Target
    {
        int3 coord;
        coord.xy = (int2) pos.xy;
        coord.z = 0;
        // Tone-map individual samples and sum them up
        float4 sum = 0;
        [unroll]
        for (int i = 0; i < SAMPLES; i++)
        {
            float4 c = tHDR.Load(coord, i);
            sum.rgb += 1.0 - exp2(-exposure * c.rgb);
        }
        // Average
        sum *= (1.0 / SAMPLES);
        return sum;
    }
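To see why resolving after tone-mapping matters, consider a single-channel edge pixel where half the samples see a very bright surface. This C++ model (hypothetical names, same 1 - 2^(-exposure * c) operator as the shader on this slide) shows that the standard resolve saturates to almost white, while the custom resolve gives the expected 50% coverage.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Tone-mapping operator from the resolve shader: 1 - 2^(-exposure * c).
float toneMap(float hdr, float exposure)
{
    return 1.0f - std::exp2(-exposure * hdr);
}

// Standard resolve: average the HDR samples, then tone-map the average.
float resolveThenToneMap(const std::vector<float>& samples, float exposure)
{
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += s;
    return toneMap(sum / samples.size(), exposure);
}

// Custom resolve: tone-map every sample first, then average.
float toneMapThenResolve(const std::vector<float>& samples, float exposure)
{
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += toneMap(s, exposure);
    return sum / samples.size();
}
```

With samples {0, 100} and exposure 1, the standard order returns a value indistinguishable from 1.0, so the edge loses its antialiasing, while the custom order returns 0.5.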

20
Optimizations
21
Geometry shader
  • GS optimizations
  • Input/output is usually the bottleneck
  • Reduce outputs with frustum and/or backface
    culling
  • Keep input small by packing data
  • A TexCoord could be packed as 2x16 bits in a uint
  • Or stored in a spare component and read with, for instance, asuint(normal.w)
  • Merge to full float4 vectors
  • Don't do 2x float2
  • Keep output small
  • Could be faster to trade for some work in PS
  • Pass just position; don't interpolate both lightVec and viewVec
  • Or even back-project SV_Position.xyz to world space in PS
  • Small output means more work fits within the 1024-float limit
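A sketch of the 2x16-bit TexCoord packing in C++, assuming texcoords in [0, 1] (wrapping coordinates outside that range would need a different encoding or more bits). The matching HLSL unpack would use the same shifts, masks and scale; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack two texture coordinates in [0, 1] into one 32-bit uint as 16-bit
// unsigned-normalized values (u in the low half, v in the high half).
uint32_t packTexCoord(float u, float v)
{
    auto toUnorm16 = [](float x) {
        x = std::min(std::max(x, 0.0f), 1.0f);  // clamp like a UNORM conversion
        return static_cast<uint32_t>(std::lround(x * 65535.0f));
    };
    return toUnorm16(u) | (toUnorm16(v) << 16);
}

// Matching unpack: mask/shift each half and rescale to [0, 1].
void unpackTexCoord(uint32_t packed, float& u, float& v)
{
    u = static_cast<float>(packed & 0xFFFFu) / 65535.0f;
    v = static_cast<float>(packed >> 16) / 65535.0f;
}
```

The round trip loses at most half a 16-bit step, which is well below a texel for textures up to 32K wide.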

22
GS frustum and backface culling
    // Transform to clip space
    float4 pos[3];
    pos[0] = mul(mvp, In[0].pos);
    pos[1] = mul(mvp, In[1].pos);
    pos[2] = mul(mvp, In[2].pos);
    // Use frustum culling to improve performance
    float4 t0 = saturate(pos[0].xyxy * float4(-1, -1, 1, 1) - pos[0].w);
    float4 t1 = saturate(pos[1].xyxy * float4(-1, -1, 1, 1) - pos[1].w);
    float4 t2 = saturate(pos[2].xyxy * float4(-1, -1, 1, 1) - pos[2].w);
    float4 t = t0 * t1 * t2;
    [branch]
    if (!any(t))
    {
        // Use backface culling to improve performance
        float2 d0 = pos[1].xy * pos[0].w - pos[0].xy * pos[1].w;
        float2 d1 = pos[2].xy * pos[0].w - pos[0].xy * pos[2].w;
        // ...
    }
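The culling math can be checked on the CPU. This C++ model mirrors the frustum test on this slide and then, as an assumption since the slide's code stops after d0 and d1, finishes the backface test with the z component of cross(d0, d1). Which sign means back-facing depends on the front-face convention in use, and the sign analysis assumes w > 0 for all three vertices. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct Float4 { float x, y, z, w; };

static float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// Mirrors saturate(pos.xyxy * float4(-1, -1, 1, 1) - pos.w): a component is
// > 0 when the vertex lies beyond the left, bottom, right or top plane.
static Float4 outsideFlags(const Float4& p)
{
    return {saturate(-p.x - p.w), saturate(-p.y - p.w),
            saturate(p.x - p.w), saturate(p.y - p.w)};
}

// The triangle can be culled when all three vertices are outside the same
// plane, i.e. when t = t0 * t1 * t2 has any non-zero component.
bool frustumCulled(const Float4& p0, const Float4& p1, const Float4& p2)
{
    Float4 t0 = outsideFlags(p0), t1 = outsideFlags(p1), t2 = outsideFlags(p2);
    Float4 t = {t0.x * t1.x * t2.x, t0.y * t1.y * t2.y,
                t0.z * t1.z * t2.z, t0.w * t1.w * t2.w};
    return t.x != 0.0f || t.y != 0.0f || t.z != 0.0f || t.w != 0.0f;  // any(t)
}

// z of cross(d0, d1) for the w-scaled edges from the slide. With all w > 0
// its sign matches the winding of the projected triangle.
float windingSign(const Float4& p0, const Float4& p1, const Float4& p2)
{
    const float d0x = p1.x * p0.w - p0.x * p1.w, d0y = p1.y * p0.w - p0.y * p1.w;
    const float d1x = p2.x * p0.w - p0.x * p2.w, d1y = p2.y * p0.w - p0.y * p2.w;
    return d0x * d1y - d0y * d1x;
}
```

Scaling the edges by w instead of dividing avoids three divisions in the GS while keeping the sign of the cross product intact for vertices in front of the camera.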

23
Miscellaneous optimizations
  • Pre-baked constant buffers
  • Don't update per-material constants DX9 style; build them into a constant buffer up front
  • The PS doesn't need to return float4 anymore
  • Use float3 if you only care about RGB
  • May reduce instruction count
  • Use GS to reduce draw calls
  • Single pass render-to-cubemap
  • Update multiple render targets per pass

24
The new shader compiler
  • SM4 shader compiler preserves semantics better
  • This means more responsibility for you guys
  • Be careful about your assumptions
  • Periodically check the resulting assembly
  • D3D10DisassembleShader()
  • Use GPUShaderAnalyzer for performance critical
    shaders

25
The new shader compiler
Example
  • HLSL code
    float4 main(float4 t : TEXCOORD0) : SV_Target
    {
        if (t.x > t.y)
            return t.xyzw;
        else
            return t.wzyx;
    }
DX9 assembly
    add r0.x, -v0.x, v0.y
    cmp oC0, r0.x, v0.wzyx, v0
DX10 assembly
    lt r0.x, v0.y, v0.x
    if_nz r0.x      // <--- Did you really want a branch here?
      mov o0.xyzw, v0.xyzw
      ret
    else
      mov o0.xyzw, v0.wzyx
      ret
    endif
26
The new shader compiler
  • Use [branch], [flatten], [unroll] and [loop] to control the output code
  • This is not for everyone
  • Poor use could reduce performance
  • Make sure you know what you're doing
  • Only use if you're familiar with assembly code
  • Verify that you get the code you expect
  • Always benchmark both options

New DX10 assembly (using [flatten])
    lt r0.x, v0.y, v0.x
    movc o0.xyzw, r0.xxxx, v0.xyzw, v0.wzyx
    ret
27
Questions? emil.persson_at_amd.com