Practical Guide to Dot3 and Vertex Shaders in DirectX8

1
Practical Guide to Dot3 and Vertex Shaders in DirectX8
  • By Leigh Davies
  • Silicon Dreams Studio Ltd.
  • Banbury, Oxon.
  • Email: Daviesl_at_SDreams.co.uk

2
Who are we / what have we done?
  • Silicon Dreams Studio was formed 7 years ago as
    part of US Gold; it is now part of the Kaboom
    group of companies headed by Geoff Brown.
  • Recent products:
    • UEFA Champions League 2000-2001.
    • UEFA Dream Soccer.
    • Lego Island 2.
    • Dogs of War.

3
Introduction
  • This talk describes how we designed our current
    3D engine to use both Dot3 lighting and vertex
    shaders, along with some of the practical lessons
    we learned.
  • It covers:
    • How we set up the DirectX transform pipeline.
    • How we implemented skinned models.
    • Vertex shader samples.
    • The integration of Dot3 lighting into the engine.
    • Performance considerations.
    • How we dynamically scale the 3D engine.
    • Demo.

4
Building Blocks
  • The initial design of the engine was started 14
    months ago, during the early stages of the
    DirectX8 beta program.
  • Several new features of DirectX8 made us decide
    to redesign our 3D engine from the ground up:
    • The introduction of vertex shaders.
    • The speed of software emulation of the vertex
      shader pipeline.
    • The introduction of multiple streams.
    • The D3DX effects framework.
    • Simplified caps checking and setup.

5
3D Engine Requirements
  • The 3D engine had to support the following:
    • Make the best use of available hardware:
      • Hardware vertex processing of vertex shaders.
      • Fall back to mixed-mode vertex processing.
      • Easy implementation of software fallback code.
    • Dynamic updating of some models.
    • Dynamic scaling of render complexity.

6
Setting up the pipeline
  • The DirectX 8 transformation pipeline is set up
    by calling IDirect3DDevice8::SetVertexShader with
    either:
    1) Legacy D3DFVF_ flags, or
    2) A handle returned from CreateVertexShader.
  • Advantages of CreateVertexShader:
    • Access to the new feature set.
    • Flexibility.
    • Allows the use of multiple streams.
    • Dynamic models.
    • Content scaling.
    • Dynamic creation of vertex formats.
    • Fixed-function pipeline still accelerated on
      GeForce and Radeon.

7
Setting up the pipeline cont.
  • Disadvantages of CreateVertexShader:
    • Need to manage vertex shader resources:
      • Creation of shaders.
      • Validation of shaders.
      • Reference counting of shaders.
    • Increased flexibility leads to:
      • More complex model loading.
      • More complex state management.
  • Final decision:
    • We used CreateVertexShader throughout the
      3D engine, as its increased flexibility and the
      scalability of the new feature set outweighed the
      increased complexity of its implementation.
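The following is a minimal C++ sketch of the resource bookkeeping listed above: a cache that creates each vertex shader once and reference counts it. The `ShaderCache` class, its string keys and the injected create/destroy callbacks are hypothetical. In a DirectX8 engine, the callbacks would wrap `CreateVertexShader` and `DeleteVertexShader`, and the create callback is a natural place to validate the shader first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of vertex shader handles keyed by a description of the
// declaration + shader function. Each shader is created once and destroyed
// when its last user releases it.
class ShaderCache {
public:
    using Handle = std::uint32_t;
    using CreateFn = std::function<Handle(const std::string&)>;
    using DestroyFn = std::function<void(Handle)>;

    ShaderCache(CreateFn create, DestroyFn destroy)
        : create_(std::move(create)), destroy_(std::move(destroy)) {}

    // Returns the handle for `key`, creating the shader on first use.
    Handle acquire(const std::string& key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            it = entries_.emplace(key, Entry{create_(key), 0}).first;
        }
        ++it->second.refs;
        return it->second.handle;
    }

    // Drops one reference; destroys the shader when the count reaches zero.
    void release(const std::string& key) {
        auto it = entries_.find(key);
        assert(it != entries_.end() && "release without matching acquire");
        if (--it->second.refs == 0) {
            destroy_(it->second.handle);
            entries_.erase(it);
        }
    }

    std::size_t liveShaders() const { return entries_.size(); }

private:
    struct Entry { Handle handle; int refs; };
    CreateFn create_;
    DestroyFn destroy_;
    std::map<std::string, Entry> entries_;
};
```

Keying on the declaration means two models with the same vertex layout and shader share one handle. This is the kind of saving that offsets the extra state management.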

8
Adding Skinning To The Engine
  • Traditional skinning on the CPU:
    • Vertices are stored relative to the bone that
      influences them. If a vertex is influenced by
      more than one bone, it is stored once per bone
      along with its weighting.
    • The animation hierarchy describes the orientation
      of the bones in world space.
    • The CPU then transforms the points by the
      animation hierarchy.
    • The CPU then combines the multiple weighted
      vertices back into a single vertex in world space.
    • The model, now in world space, is submitted to
      DirectX as standard vertices in a dynamic vertex
      buffer.
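Here is a minimal C++ sketch of that traditional CPU path. It assumes each bone-local copy of a vertex carries its bone index and weight. It also assumes bone world matrices are 4×3 affine matrices used with row vectors, as Direct3D does. The type and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 4x3 affine matrix (rotation in rows 0-2, translation in row 3),
// applied to row vectors as in Direct3D: p' = p * M.
struct Mat43 { float m[4][3]; };

Vec3 transformPoint(const Vec3& p, const Mat43& M) {
    return { p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
             p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
             p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2] };
}

// One copy of a vertex, stored in the space of a single bone.
struct BoneLocalVertex {
    std::size_t bone;  // index into the animation hierarchy
    float weight;      // this bone's influence
    Vec3 position;     // position relative to that bone
};

// Transform every bone-local copy into world space with its bone's world
// matrix, then sum the weighted results into one world-space vertex.
Vec3 skinOnCpu(const std::vector<BoneLocalVertex>& copies,
               const std::vector<Mat43>& boneWorld) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (const BoneLocalVertex& c : copies) {
        assert(c.bone < boneWorld.size());
        Vec3 w = transformPoint(c.position, boneWorld[c.bone]);
        out.x += c.weight * w.x;
        out.y += c.weight * w.y;
        out.z += c.weight * w.z;
    }
    return out;
}
```

Every copy of a vertex costs a full matrix transform on the CPU. Storing one vertex per bone also duplicates data, which is what the DirectX8 approach below avoids.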

9
Skinning with DirectX8
  • The traditional system is hard to accelerate, as
    each vertex in the model's mesh is made up of
    several vertices, each stored in its own
    coordinate space.
  • For DirectX, all the vertices need to be in a
    single coordinate system:
    • The model is exported in a default pose.
    • The animation matrix hierarchy is exported for
      this pose.

10
Animating the default pose
  • We still need an animation stack that describes
    how the bones are orientated in world space, so
    that objects can easily interact with the model.
    To do this we:
    • Take the animation matrix hierarchy for the
      default pose (sometimes called the bind pose)
      and create its inverse.
    • Create the animation stack as normal.
    • Pre-multiply the animation stack by the inverse
      of the bind pose:
      $\text{D3DWorldMatrix} = (\text{inverse bind pose for bone}) \times (\text{animated bone matrix})$
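The bind-pose correction can be sketched in C++ as follows. The sketch assumes rigid bones (rotation plus translation, no scale). Under that assumption, the inverse is the transposed rotation combined with a rotated, negated translation; scaled bones would need a general 4×4 inverse. Matrices are row-major and applied to row vectors, so the product order matches the formula above. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Row-major 4x4 matrix applied to row vectors (p' = p * M), as in Direct3D.
struct Mat44 { float m[4][4]; };

Mat44 multiply(const Mat44& a, const Mat44& b) {
    Mat44 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Inverse of a rigid transform: transpose the 3x3 rotation, then the new
// translation is -t * R^T.
Mat44 rigidInverse(const Mat44& a) {
    Mat44 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = a.m[j][i];
    for (int j = 0; j < 3; ++j) {
        float t = 0.0f;
        for (int k = 0; k < 3; ++k) t -= a.m[3][k] * r.m[k][j];
        r.m[3][j] = t;
    }
    r.m[3][3] = 1.0f;
    return r;
}

// Matrix uploaded for one bone: undo the bind pose, then apply the animation.
Mat44 skinningMatrix(const Mat44& bindPose, const Mat44& animated) {
    return multiply(rigidInverse(bindPose), animated);
}
```

As a sanity check, passing the bind pose as the animated pose yields the identity. An unanimated model therefore renders exactly as exported.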

11
Vertex Shader Setup
  • Typical stream declaration for a vertex that's
    influenced by multiple matrices:
        DWORD decl[] = {
            D3DVSD_STREAM(0),
            D3DVSD_REG(D3DVSDE_POSITION,     D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_BLENDWEIGHT,  D3DVSDT_FLOAT4),
            D3DVSD_REG(D3DVSDE_BLENDINDICES, D3DVSDT_UBYTE4),
            D3DVSD_REG(D3DVSDE_NORMAL,       D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_TEXCOORD0,    D3DVSDT_FLOAT2),
            D3DVSD_END()
        };
  • Fixed-function requirements:
    • Elements must come in a fixed order.
    • Elements must be of a fixed type.

12
Comparison
  • Advantages of the DX approach:
    • Saves data space: no duplicate vertices.
    • Fits into the DX8 fixed-function pipeline.
    • Easier to modify the model at a vertex level.
  • Disadvantages:
    • The number of weights per vertex is a
      DrawPrimitive-level setting: if one vertex in a
      DrawPrimitive call needs 4 weights, all vertices
      in that call are also transformed 4 times.
    • Very easy to waste large amounts of CPU/GPU
      cycles due to bad artwork.
    • Give the artist some kind of feedback; Max does
      strange things with weights at times.
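One way to give artists feedback is to count, per draw call, how many weights each vertex really needs. The same pass can count how many transforms are wasted because the whole call runs at the maximum. Below is a hypothetical C++ helper for such a tool. The 0.01 threshold for a "significant" weight is an assumption to tune per project.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Up to four blend weights per vertex, matching a FLOAT4 BLENDWEIGHT element.
using Weights = std::array<float, 4>;

// Weights that actually matter for one vertex. Tiny leftover weights from the
// modelling package still cost a full matrix transform each.
int significantWeights(const Weights& w, float threshold = 0.01f) {
    int n = 0;
    for (float x : w) if (x > threshold) ++n;
    return std::max(n, 1);  // every vertex needs at least one matrix
}

struct WeightReport {
    int weightsPerVertex;          // what the whole draw call must use
    std::size_t wastedTransforms;  // transforms spent on vertices needing fewer
};

// The weight count is a per-DrawPrimitive setting, so the call runs at the
// largest count any vertex needs; everything below that maximum is wasted.
WeightReport analyseDrawCall(const std::vector<Weights>& vertices) {
    assert(!vertices.empty());
    std::vector<int> needed;
    needed.reserve(vertices.size());
    for (const Weights& w : vertices) needed.push_back(significantWeights(w));
    int maxNeeded = *std::max_element(needed.begin(), needed.end());
    std::size_t wasted = 0;
    for (int n : needed) wasted += static_cast<std::size_t>(maxNeeded - n);
    return {maxNeeded, wasted};
}
```

Reporting `wastedTransforms` per mesh gives the artist a concrete number to reduce. For example, a single stray 4-weight vertex makes every other vertex in its call pay for four transforms.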

13
Skinning pipeline
  • There are 2 ways to control the transformation
    pipeline:
    • The fixed-function pipeline, controlled by:
      • D3DRS_VERTEXBLEND
      • D3DRS_INDEXEDVERTEXBLENDENABLE
    • A custom vertex shader:
      • You will need different shaders for 1, 2, 3
        and 4 weight transformation calls.
      • You are responsible for the lighting
        calculations.
      • Use NVLink or create your own.

14
Vertex Shader Code
  • Vertex being transformed by 2 weights:
        vs.1.1                  // needed for the a0 address (index) register
        mul r0, v2, c94.x       // scale indices by 4 (or 4*255 for D3DCOLOR)
        mov a0.x, r0.x          // move first index into the address register
        m4x4 r1, v0, c[a0.x]    // multiply by matrix 1
        mul r2, r1, v1.x        // multiply result by weight 1
        mov a0.x, r0.y          // get next index
        m4x4 r1, v0, c[a0.x]    // multiply by matrix 2
        mad oPos, r1, v1.y, r2  // multiply by weight 2 and add to previous result
  • Some video cards don't support UBYTE4; instead you
    store the indices as a D3DCOLOR, but these arrive
    in the 0 to 1 range and require scaling by 255.
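To check the shader logic on the CPU, here is a hypothetical C++ reference that mirrors it instruction by instruction. In it, `m4x4` is four `dp4`s against consecutive constant registers, and the address register is taken as the floor of the scaled index. How hardware converts the value written to `a0.x` is an assumption here, worth confirming against the vs.1.1 documentation. Under flooring, a D3DCOLOR index scaled by 4×255 can land fractionally below the intended integer, so a small bias may be needed.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Float4 = std::array<float, 4>;

float dp4(const Float4& a, const Float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// "m4x4 dst, src, c[a]": one dp4 against each of four consecutive constant
// registers, so each matrix occupies 4 registers (stored transposed).
Float4 m4x4(const Float4& v, const Float4* c, int a) {
    return { dp4(v, c[a]), dp4(v, c[a + 1]), dp4(v, c[a + 2]), dp4(v, c[a + 3]) };
}

// CPU reference for the 2-weight shader. v0 = position (w = 1),
// v1 = blend weights, v2 = blend indices, indexScale plays the role of c94.x.
Float4 blendTwoBones(const Float4& v0, const Float4& v1, const Float4& v2,
                     const Float4* constants, float indexScale) {
    // mul r0, v2, c94.x ; mov a0.x, r0.x  (floored in this sketch)
    int a0 = static_cast<int>(std::floor(v2[0] * indexScale));
    Float4 r1 = m4x4(v0, constants, a0);                           // m4x4 r1, v0, c[a0.x]
    Float4 r2{};
    for (int i = 0; i < 4; ++i) r2[i] = r1[i] * v1[0];             // mul r2, r1, v1.x
    a0 = static_cast<int>(std::floor(v2[1] * indexScale));         // mov a0.x, r0.y
    r1 = m4x4(v0, constants, a0);                                  // m4x4 r1, v0, c[a0.x]
    Float4 out{};
    for (int i = 0; i < 4; ++i) out[i] = r1[i] * v1[1] + r2[i];    // mad oPos, r1, v1.y, r2
    return out;
}
```

Because `m4x4` dots the input with each register, a row-vector matrix must be uploaded transposed. Its columns go into consecutive constants.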

15
How many Bones at a time?
  • The vertex shader has 96 constant registers.
  • You need:
    • Approx. 10 for lighting / materials / maths
      constants.
    • Either:
      • 4 per world/camera/perspective matrix for
        position.
      • 3 per world matrix for normal.
      • You can have 12 bones: $(4 \times 12) + (3 \times 12) = 84$.
    • Or:
      • 3 per world matrix for normal and position.
      • 4 for the shared camera/perspective matrix for
        position.
      • You can have 26 bones: $(26 \times 3) + 4 = 82$.
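The register budget is simple integer arithmetic. This C++ sketch computes the maximum bone count for each layout, given how many registers are reserved for other constants. The function names are illustrative.

```cpp
#include <cassert>

constexpr int kConstantRegisters = 96;  // vs.1.1 constant registers

// Layout A: per bone, a 4-register world*camera*projection matrix for the
// position plus a 3-register world matrix for the normal.
constexpr int maxBonesLayoutA(int reserved) {
    return (kConstantRegisters - reserved) / (4 + 3);
}

// Layout B: per bone, one 3-register world matrix shared by position and
// normal, plus a single shared 4-register camera*projection matrix.
constexpr int maxBonesLayoutB(int reserved) {
    return (kConstantRegisters - reserved - 4) / 3;
}
```

With exactly 10 reserved registers, the first layout gives 12 bones and the second 27. The 26 quoted above leaves a couple of registers spare, which suits an approximate reservation.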

16
How long does it take?
  • This will vary with the hardware, but as a rough
    guide on current hardware:
    • A 200 MHz GPU does 1 instruction per cycle.
    • For a skinned 2-weight model:
      • Transform position to world (×2).
      • Transform normal to world (×2).
      • Transform to clip (projection) space.
      • Lighting.
      • Copy texture coordinates.
      • Total instruction count = 37.
    • $200{,}000{,}000 / 37 \approx 5.4$ million vertices
      per second.
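The estimate above is a single division. It rests on the slide's assumption of one instruction per clock with nothing else limiting throughput. As a C++ sketch:

```cpp
#include <cassert>

// Rough vertex throughput, assuming one vertex shader instruction per GPU
// clock and no other bottleneck (bandwidth, fill rate, CPU).
constexpr double verticesPerSecond(double clockHz, int instructionsPerVertex) {
    return clockHz / instructionsPerVertex;
}
```

The same function reproduces the Dot3 figures quoted later in the talk. At 200 MHz, 64 instructions give about 3.1 million and 102 instructions about 1.96 million vertices per second.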

17
Can you outperform Direct3D?
  • As a test we created a vertex shader using the
    following vertex type:
        DWORD decl[] = {
            D3DVSD_STREAM(0),
            D3DVSD_REG(D3DVSDE_POSITION,  D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_NORMAL,    D3DVSDT_FLOAT3),
            D3DVSD_STREAM(1),
            D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
            D3DVSD_END()
        };
  • We then:
    • Transformed the data into world space on the CPU.
    • Locked the dynamic buffer used by stream 0 with
      D3DLOCK_DISCARD.
    • Uploaded the data.
    • Drew the skinned model.

18
Performance Cont.
  • Results on non-T&L hardware:
    • We could outperform an emulated vertex shader on
      a PII 400 by up to 100%, with only slight
      optimization.
    • DirectX outperformed us by 10% on a PIII 500 and
      above, even with a hand-coded SIMD pipeline.
    • Both the Intel and AMD processor-specific
      graphics pipelines have been very well optimized.
  • Results on DirectX 7 T&L hardware:
    • We could outperform an emulated vertex shader on
      a PII 400 by up to 250%.
    • We could outperform an emulated vertex shader on
      a PIII 500 by approximately 100%. This advantage
      decreased as CPU clock speed increased.
    • There is still room for writing your own code.

19
Yet More Triangles...
  • During the last 4 years we've seen the following
    as hardware has improved:
    • Polygon counts in our typical models have risen
      from 300 to over 3,000.
    • The polygon count of a typical scene has risen
      from 1,000 to 15,000.
    • Improvements in texture detail.
    • More complex animations.
    • Increased screen resolutions.
  • Problems:
    • Diminishing visual returns on added polygons.
    • Models still look flat.
    • Approaching the limits of the PC memory
      architecture.

20
Adding Bump Mapping
  • Solution: add per-pixel lighting.
  • 3 common bump mapping methods:
    • Emboss: all hardware.
    • EMBM: Matrox.
    • Dot3:
      • Permedia 3
      • GeForce family
      • Radeon
      • Kyro
      • and many upcoming cards.

21
Dot 3 Lighting
  • Advantages:
    • Dot3 lighting gives an accurate lighting model,
      and is performed at a per-pixel level.
    • Good support on all new hardware.
    • Scales well with pixel shaders.
    • Geometry independent.
  • Disadvantages:
    • It can be an expensive operation, and not all
      per-pixel calculations take the same time;
      making full use of mip-mapping can have a
      significant effect.

22
Implementing Dot3
  • Dot3 is performed at a per-pixel level between:
    • A normal stored in the RGB components of a
      texture.
    • A light vector stored either in TFACTOR or as
      part of the vertex, such as the DIFFUSE component.
  • The texture normal can be in 1 of 2 coordinate
    systems:
    • Object space: unsuitable for animating models.
    • Texture space.
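Both the normal and the light vector are stored biased into the 0..1 colour range. To show why, here is a C++ model of the arithmetic behind the D3DTOP_DOTPRODUCT3 texture operation: each channel is re-centred on 0.5, the products are summed, and the sum is scaled by 4 and saturated. Treat it as an illustration, not a bit-exact model of any particular card.

```cpp
#include <algorithm>
#include <cassert>

struct Rgb { float r, g, b; };  // colour channels in the 0..1 range

// Re-centre each channel on 0.5 (0..1 -> -0.5..0.5), take the dot product,
// scale by 4 so unit vectors give a 0..1 result, and saturate.
float dot3(const Rgb& normalTexel, const Rgb& lightColour) {
    float d = (normalTexel.r - 0.5f) * (lightColour.r - 0.5f) +
              (normalTexel.g - 0.5f) * (lightColour.g - 0.5f) +
              (normalTexel.b - 0.5f) * (lightColour.b - 0.5f);
    return std::min(std::max(4.0f * d, 0.0f), 1.0f);
}
```

A flat texel lit straight down the normal gives full brightness. A light lying in the surface plane gives zero, and a light from behind is clamped to zero rather than going negative.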

23
Texture Space
  • The normals stored in the texture represent the
    variation of the textured surface from the
    underlying geometry.
  • The Y axis is parallel to the surface normal.
  • The X and Z axes are perpendicular to the Y axis.
  • A flat surface would be represented by a texture
    of uniform green (127,255,127), assuming that the
    Y axis has been encoded in the green channel.

24
Texture Space Cont.
  • If the polygon represented part of a curved
    surface, it would have a graded texture, from
    green (0,255,127) to yellow (255,255,127).

25
Lighting the surface
  • In order to light this texture we need to:
    • Rotate the light relative to the polygon surface.
    • Store the result as an input to the pixel
      pipeline.
  • To do this we:
    • Create a 3×3 matrix at each vertex that describes
      the rotation from world space to texture space.
    • Store the resulting light vector either as the
      diffuse component of the vertex or, if you are
      using a pixel shader to do the Dot3 calculation,
      as a second set of texture coordinates.
    • Both these values are stored on a per-vertex
      basis and are interpolated across the polygon
      face.
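The per-vertex work can be sketched in C++. The 3×3 world-to-texture matrix is just the tangent, normal and binormal used as rows (normal as Y, per the texture-space convention above), so rotating the light is three dot products. Packing the result into the diffuse colour is the (v + 1) × 0.5 bias. The sketch assumes unit-length basis vectors and a normalised light direction; the names are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Per-vertex texture-space basis: tangent -> X, normal -> Y, binormal -> Z.
struct TangentFrame { Vec3 tangent, normal, binormal; };

// Rotate a world-space light direction into texture space: a 3x3 matrix
// whose rows are the basis vectors, applied as three dot products.
Vec3 toTextureSpace(const TangentFrame& f, const Vec3& lightDir) {
    return { dot(f.tangent, lightDir), dot(f.normal, lightDir), dot(f.binormal, lightDir) };
}

// Bias each component from -1..1 into 0..1 so it can be stored as a colour.
Vec3 packForDiffuse(const Vec3& v) {
    return { (v.x + 1.0f) * 0.5f, (v.y + 1.0f) * 0.5f, (v.z + 1.0f) * 0.5f };
}
```

For a polygon facing the light head-on, the packed result is (0.5, 1, 0.5). That is the same "uniform green" that encodes a flat normal, so the Dot3 stage returns full brightness.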

26
Generating Texture Space
  • For each D3D vertex in your model, make a list of
    which triangles reference it.
  • For each triangle, use the x, y, z positions and
    the s, t texture coordinates of the bump map to
    create one plane equation per position component:
    • $A_x x + B_x s + C_x t + D_x = 0$
    • $A_y y + B_y s + C_y t + D_y = 0$
    • $A_z z + B_z s + C_z t + D_z = 0$
  • Solve for ds/dx, ds/dy and ds/dz; this is
    referred to as the tangent vector.

27
Normal and binormal
  • Repeat with dt/dx, dt/dy and dt/dz; this is
    referred to as the binormal.
  • Average all the tangents and binormals that are
    shared by the unique D3D vertex.
  • The normal is the cross product of the tangent
    and the binormal.
  • Compare it with the lighting normal.
  • If the texture normal and the lighting normal
    point in opposite directions, the texture has
    been applied backwards and you will need to
    negate the generated normal.
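Here is a C++ sketch of the per-triangle step, assuming positions and s,t coordinates are given per corner. For each position component, it fits the plane through the three (component, s, t) points with a cross product. Solving that plane for the component gives x = -(B s + C t + D)/A, so the tangent component is -B/A and the binormal component is -C/A. Accumulating and averaging over the triangles that share a vertex is left out for brevity, and the names are illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct TexVertex { Vec3 pos; float s, t; };

// Tangent and binormal of one triangle via one plane equation per component.
void triangleTangentBinormal(const TexVertex v[3], Vec3& tangent, Vec3& binormal) {
    const float p[3][3] = {
        { v[0].pos.x, v[0].pos.y, v[0].pos.z },
        { v[1].pos.x, v[1].pos.y, v[1].pos.z },
        { v[2].pos.x, v[2].pos.y, v[2].pos.z } };
    float tc[3] = {0, 0, 0}, bc[3] = {0, 0, 0};
    for (int k = 0; k < 3; ++k) {  // k = 0, 1, 2 for x, y, z
        Vec3 e1{ p[1][k] - p[0][k], v[1].s - v[0].s, v[1].t - v[0].t };
        Vec3 e2{ p[2][k] - p[0][k], v[2].s - v[0].s, v[2].t - v[0].t };
        Vec3 plane = cross(e1, e2);  // (A, B, C) of A*k + B*s + C*t + D = 0
        // A depends only on s and t, so A == 0 means a degenerate mapping.
        if (std::fabs(plane.x) > 1e-8f) {
            tc[k] = -plane.y / plane.x;  // change of this component along s
            bc[k] = -plane.z / plane.x;  // change of this component along t
        }
    }
    tangent = { tc[0], tc[1], tc[2] };
    binormal = { bc[0], bc[1], bc[2] };
}

// Texture normal = tangent x binormal, negated if it disagrees with the
// lighting normal (the texture was applied backwards).
Vec3 textureNormal(const Vec3& tangent, const Vec3& binormal, const Vec3& lightingNormal) {
    Vec3 n = cross(tangent, binormal);
    if (dot(n, lightingNormal) < 0.0f) n = { -n.x, -n.y, -n.z };
    return n;
}
```

For a triangle whose s and t run along world X and Y, this yields tangent (1,0,0), binormal (0,1,0) and normal (0,0,1), or (0,0,-1) if the lighting normal faces the other way.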

28
Bump map generation
  • Start with a grey-scale height map:
    • White is high.
    • Black is low.
  • Convert it to a normal map. For each pixel:
    • Convert to a vector (u, height, v).
    • Create a triangle with the adjacent pixels.
    • Calculate the normal of the triangle.
    • Convert back to RGB.
  • Problems the artists had:
    • Adding fine detail.
    • Scaling textures.
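The conversion can be sketched in C++ as below. It assumes heights in the 0..1 range stored row by row, clamps at the texture edges, and uses a hypothetical `heightScale` parameter for bump strength. With Y in the green channel, a flat height map comes out as the uniform (127,255,127) green described earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Rgb8 { std::uint8_t r, g, b; };

// Hypothetical converter from a grey-scale height map (0 = black/low,
// 1 = white/high) to a texture-space normal map.
std::vector<Rgb8> heightToNormalMap(const std::vector<float>& height,
                                    int width, int rows, float heightScale) {
    assert(static_cast<int>(height.size()) == width * rows);
    // Height lookup, clamped at the right and bottom edges.
    auto h = [&](int u, int v) {
        u = std::min(u, width - 1);
        v = std::min(v, rows - 1);
        return height[v * width + u] * heightScale;
    };
    // Map a component from -1..1 to 0..255.
    auto encode = [](float c) {
        return static_cast<std::uint8_t>((c * 0.5f + 0.5f) * 255.0f);
    };
    std::vector<Rgb8> out(height.size());
    for (int v = 0; v < rows; ++v) {
        for (int u = 0; u < width; ++u) {
            // Triangle from (u, h, v) to its right and lower neighbours:
            // edges (1, dhu, 0) and (0, dhv, 1).
            float dhu = h(u + 1, v) - h(u, v);
            float dhv = h(u, v + 1) - h(u, v);
            // cross((0, dhv, 1), (1, dhu, 0)) = (-dhu, 1, -dhv), pointing up (+Y).
            float nx = -dhu, ny = 1.0f, nz = -dhv;
            float len = std::sqrt(nx * nx + ny * ny + nz * nz);
            // Y (the surface-normal axis) goes in the green channel.
            out[v * width + u] = { encode(nx / len), encode(ny / len), encode(nz / len) };
        }
    }
    return out;
}
```

Raising `heightScale` steepens the slopes and exaggerates the bumps. This is one simple handle for the fine-detail and texture-scaling problems the artists ran into.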

29
Things to look out for
  • Lighting artifacts:
    • Texture seams.
    • Stretched textures.
  • Art guidelines:
    • Reduce texture seams; try to skin with a single
      texture.
    • Hide the seams on the underside of geometry, or
      behind additional geometry.

30
Adding to a vertex Shader
  • For a skinned, animated model:
    • The axes of the tangent space matrices must be
      rotated in the same way the normal vector would
      be.
  • If you are using a vertex shader:
    • The matrix can be stored as part of your input
      stream.
    • The light vector rotation can be done on the GPU.
    • The vertex shader quickly grows in size.
  • Vertex throughput:
    • 2 weights per vertex = 64 instructions:
      $200{,}000{,}000 / 64 \approx 3.12$ million verts per second.
    • 4 weights per vertex = 102 instructions:
      $200{,}000{,}000 / 102 \approx 1.96$ million verts per second.

31
Sample Code
  • Rotate the tangent, normal and binormal:
        mul r0, v2, c94.w          // scale indices by 4 (or 4*255 for D3DCOLOR)
        mov a0.x, r0.x             // move index into the address register
        m3x3 r1, v5, c[a0.x+48]    // multiply tangent vector by world matrix
        mul r9.xyz, r1, v1.x       // multiply by weight 1
        m3x3 r1, v6, c[a0.x+48]    // multiply normal vector by world matrix
        mul r10.xyz, r1, v1.x      // multiply by weight 1
        m3x3 r1, v7, c[a0.x+48]    // multiply binormal vector by world matrix
        mul r11.xyz, r1, v1.x      // multiply by weight 1
        ...
  • Use this matrix to rotate the light:
        dp3 r8.x, r9, c88          // rotate light vector by tangent
        dp3 r8.y, r10, c88         // rotate light vector by normal
        dp3 r8.z, r11, c88         // rotate light vector by binormal
        add r8, r8, c94.yyyy       // add 1.0f: scale into range 0-2
        mul oD0, r8, c94.zzzz      // multiply by 0.5f: scale into range 0-1

32
Optimizations
  • There are several ways to optimise skinned
    models:
    • Use a vertex shader that rotates the minimum
      number of weights needed for the set of vertices
      submitted to DrawPrimitive; if there is a large
      variation across the vertices, split the model
      into smaller DrawPrimitive calls.
    • Rotate just the tangent and the binormal/normal
      vectors in the vertex shader, and generate the
      third using a cross product inside the shader.
    • Scale the complexity of the vertex shader based
      on distance from the viewer:
      • Modify lighting techniques (Dot3 specular /
        Dot3 / Gouraud).
      • Reduce vertex shader complexity (number of
        weights).
      • Reduce the number of lights affecting a point.
      • Disable per-vertex special effects, such as
        morphing.
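The cross-product optimisation relies on the basis being consistent. If normal = tangent × binormal for an orthonormal frame, then binormal = normal × tangent, so only two vectors need skinning. Below is a C++ sketch. The `handedness` flag for mirrored mappings is a hypothetical addition. For averaged, slightly non-orthogonal frames, the rebuilt vector is only an approximation of the stored one.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Rebuild the binormal from the skinned normal and tangent instead of
// skinning it. `handedness` (+1 or -1) is a hypothetical per-vertex flag
// for mirrored texture mappings.
Vec3 rebuildBinormal(const Vec3& normal, const Vec3& tangent, float handedness) {
    Vec3 b = cross(normal, tangent);
    return { b.x * handedness, b.y * handedness, b.z * handedness };
}
```

In a vs.1.1 shader, the same cross product can be written as a `mul`/`mad` pair with swizzles. Using the register names from the sample code, normal (r10) × tangent (r9) would be `mul r11, r10.yzxw, r9.zxyw` followed by `mad r11, -r9.yzxw, r10.zxyw, r11`. That pairing is an untested sketch here.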

33
Using streams to scale content
  • We only want one set of model data.
  • We don't want to waste bandwidth.
  • Solution:
    • Use multiple streams, and change shaders.
        DWORD decl[] = {
            D3DVSD_STREAM(0),                                  // 52 bytes
            D3DVSD_REG(D3DVSDE_POSITION,     D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_BLENDWEIGHT,  D3DVSDT_FLOAT4),
            D3DVSD_REG(D3DVSDE_BLENDINDICES, D3DVSDT_UBYTE4),
            D3DVSD_REG(D3DVSDE_NORMAL,       D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_TEXCOORD0,    D3DVSDT_FLOAT2),
            D3DVSD_STREAM(1),                                  // 24 bytes
            D3DVSD_REG(D3DVSDE_TANGENT,      D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_BINORMAL,     D3DVSDT_FLOAT3),
            D3DVSD_END()
        };
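Plain C++ structs mirroring the two streams make the byte counts easy to check at compile time. The struct names are hypothetical. The layout assumes the usual 4-byte alignment of `float`, so no padding is inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stream 0: everything the basic skinned shader reads.
struct SkinnedVertex {
    float position[3];           // D3DVSDT_FLOAT3
    float blendWeight[4];        // D3DVSDT_FLOAT4
    std::uint8_t blendIndex[4];  // D3DVSDT_UBYTE4
    float normal[3];             // D3DVSDT_FLOAT3
    float texCoord0[2];          // D3DVSDT_FLOAT2
};

// Stream 1: extra data only the Dot3 shader reads.
struct TangentStreamVertex {
    float tangent[3];            // D3DVSDT_FLOAT3
    float binormal[3];           // D3DVSDT_FLOAT3
};

static_assert(sizeof(SkinnedVertex) == 52, "stream 0 stride should be 52 bytes");
static_assert(sizeof(TangentStreamVertex) == 24, "stream 1 stride should be 24 bytes");

// Bytes fetched per vertex, depending on whether stream 1 is bound.
constexpr std::size_t bytesPerVertex(bool dot3Enabled) {
    return sizeof(SkinnedVertex) + (dot3Enabled ? sizeof(TangentStreamVertex) : 0);
}
```

Keeping the tangent data in its own stream means a distant or non-Dot3 model simply leaves stream 1 unbound. It then fetches 52 bytes per vertex instead of 76.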

34
Bandwidth
  • A 2x AGP bus has a data rate of 512 MB a second:
    • Dot3 vertex: $512\,\text{MB} / 76\,\text{bytes} \approx 6.5$ million vertices a
      second.
    • Normal vertex: $512\,\text{MB} / 52\,\text{bytes} \approx 9.5$ million vertices a
      second.
  • A 4x AGP bus has a data rate of 1 GB a second:
    • Dot3 vertex: $1024\,\text{MB} / 76\,\text{bytes} \approx 13$ million vertices a
      second.
    • Normal vertex: $1024\,\text{MB} / 52\,\text{bytes} \approx 19$ million vertices a
      second.
  • Bandwidth is reduced by 30% between the two vertex
    types.
  • Bandwidth is also used by:
    • Render state changes.
    • Managed textures.
    • Changing vertex shaders.
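The bandwidth figures divide the bus rate by the vertex stride. A C++ sketch follows. Done exactly, 512 MB/s ÷ 76 bytes is about 6.7 million vertices per second if MB means 10^6 bytes, and about 7.1 million if it means 2^20 bytes. The figures above therefore read as conservative round numbers. Either way they are upper bounds, because the other traffic listed above shares the bus.

```cpp
#include <cassert>

// Upper bound on vertices per second if the bus carried nothing but vertex
// data at its nominal rate.
constexpr double maxVerticesPerSecond(double bytesPerSecond, int strideBytes) {
    return bytesPerSecond / strideBytes;
}

// Fraction of per-vertex traffic saved by using the smaller stride.
constexpr double bandwidthSaving(int fullStride, int reducedStride) {
    return 1.0 - static_cast<double>(reducedStride) / fullStride;
}
```

Dropping from 76 to 52 bytes saves about 31.6% of the per-vertex traffic, which matches the roughly 30% quoted above.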

35
References
  • IHV developer websites:
    • NVIDIA
    • ATI
    • Matrox
  • Previous conference talks.
  • DirectX SDK.
  • DirectX header files.

36
Questions...
  • ?
  • Leigh Davies
  • Daviesl_at_SDream.co.uk