Emerging Technologies for Games Optimisation 2 - PowerPoint PPT Presentation
Provided by: lauren80
1
Emerging Technologies for Games Optimisation 2
  • CO3303
  • Week 6

2
Today's Lecture
  • Memory Optimisation
  • Cache Recap
  • Cache Optimisation
  • Shader Optimisation

3
Memory Optimisation
  • Last week we looked primarily at optimising for
    speed
  • We often need to minimise the memory used by an
    application too
  • These objectives frequently conflict, e.g.
  • Using a look-up table of pre-calculated values
    speeds up a calculation, but uses more memory
  • Compressing data minimises memory, but the program
    must decompress it, slowing it down
  • Memory optimisations are usually algorithmic

4
Memory Optimisations
  • Compress data
  • Standard algorithms: RLE, LZW, DEFLATE (zip) etc.
  • Store a minimum of data
  • Don't store data that can be calculated from
    other data
  • E.g. store the X and Y axes of a matrix and
    calculate the Z axis with a cross product (as in targeting)
  • Avoid sparse data structures (arrays/lists etc.
    with many empty slots)
  • Perhaps use RLE-style compression
  • Store data on hard drive/DVD rather than in memory
  • Implement streaming: the ability to read data from
    external storage whilst running other processes
  • Note that hard drives etc. also have a cache
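A minimal run-length encoding (RLE) sketch, assuming a simple format of (count, value) byte pairs with runs capped at 255. The names rleEncode and rleDecode are hypothetical; for LZW/DEFLATE-style compression an existing library would normally be used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode bytes as (count, value) pairs. Runs are capped at 255 so the
// count fits in one byte. Works best on data with long runs of equal
// values, e.g. a sparse array that is mostly zeros.
std::vector<std::uint8_t> rleEncode(const std::vector<std::uint8_t>& in)
{
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size())
    {
        std::uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255)
            ++run;
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Decode (count, value) pairs back into the original bytes.
std::vector<std::uint8_t> rleDecode(const std::vector<std::uint8_t>& in)
{
    assert(in.size() % 2 == 0); // input must be whole (count, value) pairs
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    return out;
}
```

On a mostly empty array (1000 zero bytes with a single non-zero value in the middle) this produces 10 bytes. On data with few repeats it can double the size, which is why RLE suits sparse data specifically.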

5
Cache Recap
  • A cache keeps a copy of data that would otherwise
    be expensive to fetch/calculate, so it can be accessed quickly
  • A memory cache is a small local memory store with
    very fast access
  • Duplicates data held in the main memory, but with
    much faster read/write speeds
  • Anywhere 2-10 times faster
  • There may be several caches for a CPU
  • A small, fast cache (L1), a larger, slower one
    (L2), etc.
  • The GPU also has memory caches
  • Vertex cache (fast access to vertex data)
  • Texture cache (fast access to texel data)

6
Cache Use
  • When data is read
  • If it is in the cache, it is fetched quickly from
    there: a cache hit
  • Otherwise it is fetched from slower memory/cache:
    a cache miss
  • In either case, any data read is placed in the cache
  • The entire row (cache line) containing the data is stored
  • Typically a power of 2 in size, e.g. a 64 or 128 byte block
  • This means subsequent reads of the same or
    nearby data will be cache hits
  • Rewards access to closely clustered data
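To make the cache line idea concrete, this sketch counts how many distinct cache lines a sequence of reads touches. It assumes a 64-byte line (128 bytes would behave the same way) and ignores eviction, so it models only which lines get loaded, not real timings; linesTouched is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed cache line size; 64 bytes is common on current x86 CPUs.
constexpr std::size_t kLineSize = 64;

// Count the distinct cache lines touched when reading `count` floats
// starting at address `base`, stepping `stride` floats each time.
// Only models which lines are loaded; eviction is ignored.
std::size_t linesTouched(std::uintptr_t base, std::size_t count, std::size_t stride)
{
    std::set<std::uintptr_t> lines;
    for (std::size_t i = 0; i < count; ++i)
    {
        std::uintptr_t address = base + i * stride * sizeof(float);
        lines.insert(address / kLineSize); // index of the line holding this address
    }
    return lines.size();
}
```

Reading 16 consecutive floats (64 bytes) from an aligned address touches one line (one miss, then hits), whereas reading 16 floats spaced 16 apart touches 16 lines, potentially 16 misses.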

7
Efficient Cache Use
  • To use cache efficiently
  • Try to read data near data we recently accessed
  • Accessing data sequentially is ideal
  • Random access to data is cache-inefficient
  • Particularly random access over a large area,
    which makes the cache redundant: a major efficiency loss
  • Arrays and vectors can be very cache friendly
  • If we mainly access them sequentially
  • Linked lists can be problematic
  • If the nodes become scattered around memory
  • N.B. Writing to the cache follows a similar
    scheme
  • Although the actual writes to main memory can be
    delayed
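A sketch of sequential versus strided access over the same data, assuming a grid stored row by row in a single std::vector. Both functions compute the same sum; only the access order differs. The function names are hypothetical, and the actual speed gap depends on the grid size and the hardware, so it should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A width x height grid stored row by row (row-major) in one contiguous
// vector, so element (x, y) lives at index y * width + x.

// Cache friendly: the inner loop walks along a row, so consecutive reads
// are adjacent in memory.
long long sumRowMajor(const std::vector<int>& grid, std::size_t width, std::size_t height)
{
    long long total = 0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            total += grid[y * width + x];
    return total;
}

// Same result, but the inner loop jumps `width` elements each step, so on
// a large grid almost every read lands on a different cache line.
long long sumColumnMajor(const std::vector<int>& grid, std::size_t width, std::size_t height)
{
    long long total = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            total += grid[y * width + x];
    return total;
}
```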

8
GPU Caches
  • We have seen the vertex cache on modern GPUs
  • We should attempt to order the triangles in our
    geometry to revisit recently used vertices
  • There is also a texture cache
  • Recently used blocks of textures can be accessed
    more quickly
  • This suggests we should try to render geometry
    that uses the same textures together
  • Avoid using large textures on small polygons
  • Neighbouring pixels will sample widely spaced texels
    in the source texture: effectively random access
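One way to group geometry by texture is to sort the draw list before rendering. This is a sketch using a hypothetical DrawItem record with an integer texture ID; std::stable_sort keeps objects that share a texture in their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one object to draw; a real engine would hold
// mesh, shader and texture handles rather than plain ints and strings.
struct DrawItem
{
    int textureId;    // which texture the object uses
    std::string name; // for debugging only
};

// Order draws so objects sharing a texture are rendered consecutively,
// giving the texture cache a better chance of reusing recently fetched
// texels (and reducing texture state changes).
void sortByTexture(std::vector<DrawItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b)
                     { return a.textureId < b.textureId; });
}
```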

9
Cache Optimisation
  • Cache performance affects all applications
  • Failure to consider the cache friendliness (data
    locality) of your app can lead to very poor performance
  • Without anything obviously being wrong
  • Note that cache issues often stop promising
    optimisations from performing effectively
  • Particularly look-up tables of pre-calculated
    values
  • Random access into a large table may be slower
    than actually performing the calculation
  • We shouldn't cache-optimise everything, but
    should be aware of the issues
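The look-up table trade-off can be sketched as a sine table. SinTable is a hypothetical class; with 1024 entries it costs 4 KB and, with nearest-entry lookup, is off by at most about 0.003. A table this small is likely to stay in cache; the warning above applies to large tables accessed randomly, where calling std::sin directly may well be faster, so both should be profiled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kTwoPi = 6.283185307179586;

// Look-up table of pre-calculated sine values over [0, 2*pi):
// trades memory (one float per entry) for speed.
class SinTable
{
public:
    explicit SinTable(std::size_t size) : table_(size)
    {
        for (std::size_t i = 0; i < size; ++i)
            table_[i] = static_cast<float>(std::sin(kTwoPi * i / size));
    }

    // Nearest-entry lookup; angle is assumed to be in [0, 2*pi).
    float operator()(double angle) const
    {
        std::size_t index = static_cast<std::size_t>(angle / kTwoPi * table_.size() + 0.5);
        return table_[index % table_.size()]; // rounding up to 2*pi wraps to entry 0
    }

    std::size_t bytes() const { return table_.size() * sizeof(float); }

private:
    std::vector<float> table_;
};
```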

10
Shader Optimisations
  • Shaders almost always need to be optimised
  • Very often the bottleneck in current games
  • Particularly the pixel shader for more elaborate
    effects
  • The HLSL shader compiler is good and catches many
    optimisations
  • But we really need to squeeze out every drop of
    speed if we want to match the competition
  • Shader optimisation is tricky
  • Methods are not widely documented (trade secrets)
  • Nvidia / ATI websites and tools are the best
    source of ideas

11
Basic Shader Optimisations
  • Ensure optimisation is enabled on the shader
    compiler (it is by default)
  • Do some performance analysis
  • Put a timer on screen
  • Use time per frame, not FPS: FPS is non-linear
    and can be misleading
  • Use a tool such as NVShaderPerf
  • Expect your pixel shader to need optimisation if
    you use fancy materials / lighting
  • Or the vertex shader if you perform complex
    vertex blending / deformation etc
  • But more usually the pixel shader
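Why FPS can mislead: frame rate is the reciprocal of frame time, so the same drop in FPS can represent very different amounts of extra work. A small sketch with hypothetical helper names:

```cpp
#include <cassert>

// Convert frames per second to milliseconds per frame.
double frameTimeMs(double fps) { return 1000.0 / fps; }

// How many extra milliseconds per frame a change in FPS really represents.
double addedMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

Going from 1000 to 500 FPS adds 1 ms per frame, while going from 60 to 30 FPS adds about 16.7 ms, yet both are "halving the frame rate". Frame times also add up directly when comparing the cost of individual effects, which FPS values do not.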

12
Shader Optimisations
  • Use the optimisations from last week, especially
  • Take constant calculations out of loops
  • In fact, avoiding loops is usually better
  • Especially those with a variable count (e.g. 1 to
    n)
  • Early return from functions
  • Break calculations into small steps
  • Use simpler instructions
  • E.g. preventing the diffuse value becoming negative
  • float DiffuseLevel = max(0.0f, dot(N, L));
  • The function saturate is faster (it clamps to the
    0 to 1 range)
  • float DiffuseLevel = saturate(dot(N, L));
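A CPU-side emulation of the two HLSL forms, showing they give the same lighting result. It assumes unit-length N and L, so dot(N, L) lies in [-1, 1] and the extra upper clamp in saturate never changes the answer. The float3 struct and helpers are stand-ins written for this sketch, not HLSL itself; the speed benefit of saturate comes from GPU hardware (where it is typically a free modifier on the preceding instruction), which this code cannot demonstrate.

```cpp
#include <algorithm>
#include <cassert>

struct float3 { float x, y, z; }; // minimal stand-in for the HLSL type

float dot(const float3& a, const float3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// CPU equivalents of the two HLSL expressions.
float clampWithMax(float v) { return std::max(0.0f, v); }             // max(0.0f, v)
float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); } // clamp to 0..1
```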

13
Shader Optimisations
  • Calculate constant values on the CPU
  • Move code out of shaders
  • E.g. the wiggling texture as used last year
  • float wiggle; // Set wiggle from main CPU code
  • output.UV *= cos(wiggle); // Wiggle texture UVs
  • The term cos(wiggle) is constant across the whole
    draw call
  • Instead, calculate cos in the main code and pass
    it in
  • float cosWiggle; // cos calculated in CPU code
  • output.UV *= cosWiggle;
  • The CPU only executes the cos once; a pixel shader
    may calculate it millions of times (once for
    each pixel)
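A CPU emulation of the wiggle example, comparing cos(wiggle) evaluated once per element with a value computed once and passed in. It assumes the UVs are scaled by cos(wiggle) (the operator is not fully legible in the slide transcript), and the UV struct and function names are hypothetical. Both versions produce identical UVs; the second makes one cos call instead of one per element.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct UV { float u, v; };

// "Shader" version: cos(wiggle) is re-evaluated for every element,
// even though it has the same value each time.
void wiggleInShader(std::vector<UV>& uvs, float wiggle)
{
    for (UV& uv : uvs)
    {
        float c = std::cos(wiggle);
        uv.u *= c;
        uv.v *= c;
    }
}

// "CPU" version: cos is evaluated once by the caller and passed in.
void wiggleWithConstant(std::vector<UV>& uvs, float cosWiggle)
{
    for (UV& uv : uvs)
    {
        uv.u *= cosWiggle;
        uv.v *= cosWiggle;
    }
}
```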

14
Shader Optimisations
  • Values output from a vertex shader are linearly
    interpolated for the pixel shader
  • E.g. a pixel halfway between two vertices will
    get values (world position, UVs, normal etc.)
    exactly halfway between the vertex values
  • OK for positions / scalars, but vectors suffer
    similar problems to rotational lerp: interpolated
    unit vectors become shorter than unit length
  • Need to normalise vectors in the pixel shader
    (nlerp)
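A small numeric illustration of why interpolated vectors need renormalising, assuming plain linear interpolation between two vertices (Vec3 and the helpers are written for this sketch). Halfway between two perpendicular unit vectors, the lerped vector has length about 0.707 rather than 1; nlerp restores unit length while keeping the direction.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// What happens to a vertex output: plain linear interpolation.
Vec3 lerp(const Vec3& a, const Vec3& b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// What the pixel shader should do for directions: lerp, then renormalise.
Vec3 nlerp(const Vec3& a, const Vec3& b, float t)
{
    Vec3 v = lerp(a, b, t);
    float len = length(v);
    return { v.x / len, v.y / len, v.z / len };
}
```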

15
Shader Optimisations
  • We can use this knowledge for optimisation
  • Interpolate values from vertex shader
  • Remove code from pixel shader
  • float3 LightVector = LightPos - i.WorldPosition;
  • Add a parameter from vertex to pixel shader
  • float3 LightVector : TEXCOORD2;
  • Move code to vertex shader
  • o.LightVector = LightPos - i.WorldPosition;
  • Normalise in the pixel shader if using vectors
  • Will probably need to change setup code too
  • Powerful method to remove excessive calculation
  • But problems with large rotational changes (again)
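A sketch of why moving the light vector calculation into the vertex shader works, assuming plain linear interpolation along one edge between two vertices p0 and p1 (the types and function names are hypothetical). Because LightPos - WorldPosition is a linear function of position, interpolating it between vertices gives the same vector as computing it from the interpolated position; only its length changes, which is why the pixel shader still normalises. Normalising in the vertex shader instead would not give this exact result.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

Vec3 lerp(const Vec3& a, const Vec3& b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

Vec3 normalise(const Vec3& v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Original: the pixel shader computes the light vector from the
// interpolated world position.
Vec3 lightVectorPerPixel(const Vec3& lightPos, const Vec3& p0, const Vec3& p1, float t)
{
    return normalise(sub(lightPos, lerp(p0, p1, t)));
}

// Optimised: LightPos - WorldPosition is computed per vertex, the
// unnormalised result is interpolated, and only normalise runs per pixel.
Vec3 lightVectorPerVertex(const Vec3& lightPos, const Vec3& p0, const Vec3& p1, float t)
{
    return normalise(lerp(sub(lightPos, p0), sub(lightPos, p1), t));
}
```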

16
Shader Optimisations
  • Use textures to store calculations
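  • Convert
  • spec = saturate(dot(N,L)) * Ks *
    pow((dot(N,L) > 0) ? saturate(dot(N,H)) : 0, n);
  • Into
  • spec = tex2D(SpecSampler, float2(dot(N,L), dot(N,H)));
  • And load a texture (bound to SpecSampler, a hypothetical
    sampler name) that encodes the original calculation
    for fixed Ks and n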
  • Tricky to prepare
  • Very powerful, especially for complex shaders
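A CPU sketch of the lookup-texture idea, with a std::vector standing in for the texture and nearest-texel sampling standing in for tex2D with point filtering. It assumes Ks and n are fixed when the table is built and that both dot products are clamped to 0..1 (as clamp addressing would do); SpecTable and specDirect are hypothetical names. The table resolution trades memory for accuracy, and high exponents n need more resolution (or filtering) along the N.H axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// The original per-pixel specular calculation.
float specDirect(float nDotL, float nDotH, float ks, float n)
{
    float base = (nDotL > 0.0f) ? saturate(nDotH) : 0.0f;
    return saturate(nDotL) * ks * std::pow(base, n);
}

// size x size table standing in for the lookup texture (size >= 2);
// u = N.L across, v = N.H down.
class SpecTable
{
public:
    SpecTable(std::size_t size, float ks, float n) : size_(size), texels_(size * size)
    {
        for (std::size_t y = 0; y < size; ++y)
            for (std::size_t x = 0; x < size; ++x)
                texels_[y * size + x] =
                    specDirect(float(x) / (size - 1), float(y) / (size - 1), ks, n);
    }

    // Nearest-texel sample with both coordinates clamped to 0..1.
    float sample(float nDotL, float nDotH) const
    {
        std::size_t x = std::size_t(saturate(nDotL) * (size_ - 1) + 0.5f);
        std::size_t y = std::size_t(saturate(nDotH) * (size_ - 1) + 0.5f);
        return texels_[y * size_ + x];
    }

private:
    std::size_t size_;
    std::vector<float> texels_;
};
```

With a 256 x 256 table, Ks = 1 and n = 20, a sample at N.L = 0.8, N.H = 0.95 is within 0.05 of the direct calculation, and any negative N.L returns 0 as the original expression does.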