GPU Computation Strategies


1
GPU Computation Strategies & Tricks

Ian Buck Stanford University

2
DirectX or OpenGL?

DirectX
Render to Texture
SetRenderTarget()
No float targets on NV3x
Write once run anywhere
DBMON
Short programs
Only 96 instructions required
ps_2_a compiler target allows long programs on NV3x
Readback is slow!
50 MB/sec

OpenGL
0 to N texture addressing
GL_TEXTURE_RECTANGLE_EXT
Readback is fast
Render to Texture not finalized
Pbuffer rendering can be slow
SuperBuffers
GL_EXT_render_target
Specialized float formats for ATI and NV
No ARB standard for creating float Pbuffer
ATI float2 Red and Alpha
NV float2 Red and Green

3
ATI Radeon 9800XT or NVIDIA GeForce 5900 Ultra?
Instruction Timings
4
Floating Point Precision

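NVIDIA FP32
s23e8 (integers exact up to 16,777,216; 16,777,217 is the first that is not)
ATI 24-bit float
s16e7 (integers exact up to 131,072; 131,073 is the first that is not)
NVIDIA FP16
s10e5 (integers exact up to 2,048; 2,049 is the first that is not)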

Bit layout: s | exponent | mantissa
value = (-1)^s × 1.mantissa × 2^(exponent - bias)
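
The integer limit of each format follows from its mantissa width. A minimal sketch in C, assuming an implicit leading 1 as in the value formula; the helper names `largest_exact_integer` and `survives_fp32` are hypothetical and not part of the original slides:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: with m explicit mantissa bits plus the implicit
 * leading 1, every integer from 0 to 2^(m+1) is exactly representable.
 * 2^(m+1) + 1 is the first integer that is not. */
uint32_t largest_exact_integer(unsigned mantissa_bits)
{
    assert(mantissa_bits < 31);            /* keep the shift in range */
    return (uint32_t)1 << (mantissa_bits + 1);
}

/* Round-trips n through a 32-bit float; returns 1 if n survives intact. */
int survives_fp32(uint32_t n)
{
    volatile float f = (float)n;           /* volatile forces a real float store */
    return (uint32_t)f == n;
}
```

Calling `largest_exact_integer` with 23, 16 and 10 covers the s23e8, s16e7 and s10e5 layouts; `survives_fp32` checks the FP32 case against the host CPU's own float type.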
5
Floating Point Precision

Common Mistake
Pack large 1D array in 2D texture
Compute 1D address in shader
Convert 1D address into 2D
FP precision will leave unaddressable texels!

First integer index that cannot be represented exactly:
NVIDIA FP32 16,777,217
ATI 24-bit float 131,073
NVIDIA FP16 2,049
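
The mistake can be reproduced on the CPU. The sketch below is an illustration only: it assumes the shader holds the 1D index in an FP32 register and splits it into row and column by division. The `texcoord` type and both function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y; } texcoord;   /* hypothetical 2D texel address */

/* Converts a 1D element index, held in a float as it would be in a
 * fragment program, into a 2D texel address for a texture `width` wide.
 * The index is assumed to be non-negative. */
texcoord address_1d_to_2d(float index, float width)
{
    texcoord t;
    assert(width > 0.0f && index >= 0.0f);
    t.y = (float)(unsigned long)(index / width);   /* row: truncating divide */
    t.x = index - t.y * width;                     /* column: remainder */
    return t;
}

/* Returns 1 when two integer indices land on the same texel once they
 * have been stored in FP32, meaning one of them is unaddressable. */
int indices_collide(unsigned a, unsigned b, float width)
{
    texcoord ta = address_1d_to_2d((float)a, width);
    texcoord tb = address_1d_to_2d((float)b, width);
    return ta.x == tb.x && ta.y == tb.y;
}
```

With a 4096-wide texture, indices 16,777,216 and 16,777,217 resolve to the same texel, because the second index is rounded as soon as it is stored in a float.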
6
Multiple Outputs

Hardware supported multiple outputs
Not as fast as you think

ATI 9800XT
7
Multiple Outputs

Software solution
Let cgc or fxc do dead code elimination
can be a good idea if shader is separable

kernel void foo (float3 a<>, float3 b<>, ..., out float3 x<>, out float3 y<>)
kernel void foo1(float3 a<>, float3 b<>, ..., out float3 x<>)
kernel void foo2(float3 a<>, float3 b<>, ..., out float3 y<>)
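
The split into foo1 and foo2 can be sketched in plain C. The kernel body here (a sum and a product) is a made-up placeholder, and `float3` is a stand-in struct. The point is only that each single-output version reuses the full source and discards one result, leaving the unused work for the compiler to remove.

```c
#include <assert.h>

typedef struct { float x, y, z; } float3;   /* stand-in for a shader float3 */

/* Two-output kernel with a hypothetical body: x = a + b, y = a * b. */
void foo(float3 a, float3 b, float3 *x, float3 *y)
{
    x->x = a.x + b.x;  x->y = a.y + b.y;  x->z = a.z + b.z;
    y->x = a.x * b.x;  y->y = a.y * b.y;  y->z = a.z * b.z;
}

/* Pass 1: same source, only x is kept. Once foo is inlined, an optimizing
 * compiler can drop the code that feeds only the discarded y. */
void foo1(float3 a, float3 b, float3 *x)
{
    float3 discarded_y;
    foo(a, b, x, &discarded_y);
}

/* Pass 2: same source, only y is kept. */
void foo2(float3 a, float3 b, float3 *y)
{
    float3 discarded_x;
    foo(a, b, &discarded_x, y);
}
```

This only pays off when the two outputs share little work. If x and y depend on the same expensive intermediate, both passes recompute it.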
8
Scatter Techniques

Problem: a[i] = p
Indirect write
Can't set the x,y of fragment in pixel shader
Also want to do a[i] += p

9
Scatter Techniques

Solution 1
Sort & Search
Shader outputs destination address and data
Bitonic sort based on address
Run binary search shader over destination buffer
Each fragment searches for source data
See Sorting and Searching course notes
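
The sort-then-search scheme can be modeled on the CPU as follows. This is a sketch, not the course-notes implementation: the `record` layout and function names are hypothetical, destination addresses are assumed unique, and the record count is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned addr; float data; } record;  /* hypothetical layout */

/* Bitonic sorting network, ascending by address; n must be a power of two.
 * Each (k, j) step is one full sweep over the buffer, which is the part
 * that would map onto one rendering pass. */
void bitonic_sort(record *r, size_t n)
{
    size_t k, j, i;
    assert(n != 0 && (n & (n - 1)) == 0);
    for (k = 2; k <= n; k <<= 1) {
        for (j = k >> 1; j > 0; j >>= 1) {
            for (i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = (i & k) == 0;
                    if ((r[i].addr > r[partner].addr) == ascending) {
                        record tmp = r[i];
                        r[i] = r[partner];
                        r[partner] = tmp;
                    }
                }
            }
        }
    }
}

/* One "fragment" per destination element: binary-search the sorted records
 * for this element's own index and copy the data if a record targets it.
 * The scatter has been turned into a gather. */
void scatter_by_search(const record *sorted, size_t n,
                       float *dest, size_t dest_len)
{
    size_t i;
    for (i = 0; i < dest_len; i++) {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (sorted[mid].addr < i) lo = mid + 1; else hi = mid;
        }
        if (lo < n && sorted[lo].addr == i)
            dest[i] = sorted[lo].data;
    }
}
```

Destination elements that no record targets keep their previous value.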

10
Scatter Techniques

Solution 2
Render points
Use vertex shader to set destination
or just read back the data and reissue

11
Scatter Techniques

Solution 3
Vertex Textures
Render data and address to texture
Issue points, set point x,y in vertex shader
using address texture
Requires texld instruction in vertex program
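
A CPU model of point-based scatter, for illustration only. The function name and the row-major address layout are assumptions. The loop body stands in for two GPU stages: the vertex stage that looks up the destination and positions the point, and the fragment stage that writes the data there.

```c
#include <assert.h>
#include <stddef.h>

/* One point is issued per source element. The "vertex stage" fetches the
 * element's destination from the address buffer and converts it to an
 * x,y position; the "fragment stage" writes the data at that position. */
void scatter_points(const unsigned *address, const float *data, size_t count,
                    float *target, unsigned width, unsigned height)
{
    size_t i;
    assert(width > 0);
    for (i = 0; i < count; i++) {
        unsigned x = address[i] % width;    /* vertex stage: address -> x,y */
        unsigned y = address[i] / width;
        if (y < height)                     /* points outside the target are dropped */
            target[(size_t)y * width + x] = data[i];   /* fragment stage */
    }
}
```

If two points share a destination, the later one overwrites the earlier one in this model.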

12
Conditional Mask

How to efficiently implement if (a) then c = b
Kill instruction or LRP c, a, b, c
Executes all conditional code
Using early Z-kill
Set Zbuffer equal to conditional
Z test can prevent shader execution
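
The LRP form of the conditional can be written out in C. This sketch assumes the ARB fragment program definition of LRP and a condition encoded as 0.0 or 1.0; `conditional_assign` is a hypothetical name.

```c
#include <assert.h>

/* LRP as defined for ARB fragment programs: a*b + (1 - a)*c. */
float lrp(float a, float b, float c)
{
    return a * b + (1.0f - a) * c;
}

/* if (a) then c = b, with a encoded as 0.0 or 1.0. Both b and the old c
 * have already been computed by the time the select runs, so no work is
 * saved. */
float conditional_assign(float a, float b, float c)
{
    assert(a == 0.0f || a == 1.0f);
    return lrp(a, b, c);
}
```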

13
Conditional Mask

Using early Z-kill
Z-kill operates at 4x4 block resolution
Good only if locality in conditional
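
A simplified model shows why locality matters. It assumes a 4x4 block is skipped only when every pixel in it fails the Z test, and that otherwise all 16 pixels run the shader. Real hardware behavior may differ, and `shaded_pixels` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4   /* block size stated on the slide */

/* Returns the number of shader invocations for a width x height mask,
 * where mask != 0 marks a pixel whose conditional is true. A block costs
 * nothing if all its pixels are masked out and 16 invocations otherwise. */
size_t shaded_pixels(const unsigned char *mask, size_t width, size_t height)
{
    size_t bx, by, x, y, count = 0;
    assert(width % BLOCK == 0 && height % BLOCK == 0);
    for (by = 0; by < height; by += BLOCK) {
        for (bx = 0; bx < width; bx += BLOCK) {
            int any = 0;
            for (y = by; y < by + BLOCK; y++)
                for (x = bx; x < bx + BLOCK; x++)
                    any |= mask[y * width + x] != 0;
            if (any)
                count += BLOCK * BLOCK;
        }
    }
    return count;
}
```

On an 8x8 target with half the pixels active, a mask covering the left half costs 32 invocations, while a checkerboard mask with the same number of active pixels costs all 64.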

14
Optimizing Execution

Two methods for GPGPU shader execution

Method 1: quad
glBegin(GL_QUADS);
glVertex2f(left, bottom);
glVertex2f(right, bottom);
glVertex2f(right, top);
glVertex2f(left, top);
glEnd();

Method 2: one oversized triangle, limited by the viewport
glViewport(0, 0, width, height);
glBegin(GL_TRIANGLES);
glVertex2f(0, 0);
glVertex2f(width*2, 0);
glVertex2f(0, height*2);
glEnd();
Faster: higher observed bandwidth
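
The oversized triangle still reaches every pixel of the render target, which can be checked on the CPU. The sketch assumes vertex coordinates are in pixel units and that pixels are sampled at their centers; both function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if the pixel center (px + 0.5, py + 0.5) lies inside the
 * triangle (0,0), (2*width,0), (0,2*height). The hypotenuse is the line
 * x/(2*width) + y/(2*height) = 1. */
int triangle_covers_pixel(int px, int py, int width, int height)
{
    double x = px + 0.5, y = py + 0.5;
    assert(width > 0 && height > 0);
    return x >= 0.0 && y >= 0.0 &&
           x / (2.0 * width) + y / (2.0 * height) <= 1.0;
}

/* Returns 1 if every pixel of the width x height viewport is covered. */
int triangle_covers_viewport(int width, int height)
{
    int px, py;
    for (py = 0; py < height; py++)
        for (px = 0; px < width; px++)
            if (!triangle_covers_pixel(px, py, width, height))
                return 0;
    return 1;
}
```

The part of the triangle beyond width and height falls outside the viewport and produces no fragments.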
15
Performance Issues