GPU Shading and Rendering - PowerPoint PPT Presentation

Slides: 27

Transcript and Presenter's Notes


2
GPU Shading and Rendering
3
GPU Shading and Rendering: Introduction
  • Marc Olano
  • UMBC

4
GPU
  • GPU: Graphics Processing Unit
  • Designed for real-time graphics
  • Present in almost every PC
  • Increasing realism and complexity

[Image: America's Army]
5
GPU computation
[Diagram labels: CPU, Displayed Pixels]
6
Low-level code
!!ARBvp1.0
# Transform the normal to view space
TEMP Nv,Np;
DP3 Nv.x,state.matrix.modelview.invtrans.row[0],vertex.normal;
DP3 Nv.y,state.matrix.modelview.invtrans.row[1],vertex.normal;
DP3 Nv.z,state.matrix.modelview.invtrans.row[2],vertex.normal;
MAD Np,Nv,{.9,.9,.9,0},{0,0,0,1};
# screen position from vertex
TEMP Vp;
DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
# interpolate
MAD Np, Np, -vertex.color.x, Np;
MAD result.position, Vp, vertex.color.x, Np;
END
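As a reading aid, the following Python sketch emulates on the CPU what the DP4 and MAD instructions in the listing above compute. The helper names (dp4, mad, transform, blend) and the sample matrix are illustrative assumptions, not part of the slides or of the ARB vertex program extension.

```python
def dp4(a, b):
    """4-component dot product, as computed by the DP4 instruction."""
    return sum(x * y for x, y in zip(a, b))


def mad(a, b, c):
    """Component-wise multiply-add a*b + c, as computed by MAD."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def transform(matrix_rows, position):
    """One DP4 per matrix row gives the full matrix-vector product."""
    return [dp4(row, position) for row in matrix_rows]


def blend(vp, np_, k):
    """The two MADs that end the listing: Vp*k + Np*(1 - k)."""
    scaled = mad(np_, [-k] * 4, np_)   # Np * -k + Np = Np * (1 - k)
    return mad(vp, [k] * 4, scaled)    # Vp * k + Np * (1 - k)


# Hypothetical model-view-projection matrix: translate by (1, 2, 3).
mvp = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
vp = transform(mvp, [0.5, 0.5, 0.5, 1.0])
```

Written this way, the four DP4 lines are one matrix-vector product, and the final pair of MADs is a linear interpolation keyed by vertex.color.x.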
7
High-level code
void main() {
    vec4 Kin = gl_Color; // key input

    // screen position from vertex, texture and normal
    vec4 Vp = ftransform();
    vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0,1);
    vec4 Np = vec4(nn*.9,1);

    // interpolate between Vp, Tp and Np
    gl_Position = Vp;
    gl_Position = mix(Tp,gl_Position,pow(1.-Kin.x,8.));
    gl_Position = mix(Np,gl_Position,pow(1.-Kin.y,8.));

    // copy to output
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_TexCoord[1] = Vp;
    gl_TexCoord[3] = Kin;
}
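The interpolation step in the high-level listing can be traced with a small CPU-side sketch. The Python below assumes the standard GLSL definition mix(a, b, t) = a*(1-t) + b*t; the function blend_position and the sample values in the usage note are hypothetical and only mirror the structure of the shader.

```python
def mix(a, b, t):
    """GLSL-style mix: a*(1 - t) + b*t, applied per component."""
    return [x * (1.0 - t) + y * t for x, y in zip(a, b)]


def blend_position(vp, tp, np_, kin_x, kin_y):
    """Mirror of the shader's three-way blend between Vp, Tp and Np."""
    pos = vp
    # Weight (1 - key)^8 stays near 0 unless the key is close to 0,
    # so each key quickly pulls the position toward Tp or Np.
    pos = mix(tp, pos, (1.0 - kin_x) ** 8.0)
    pos = mix(np_, pos, (1.0 - kin_y) ** 8.0)
    return pos
```

With both keys at 0 the result is the ordinary screen position Vp; a key of 1 in x selects the texture-space position Tp, and a key of 1 in y selects the normal-space position Np.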
8
Non-real time vs. Real-time
  • Non-real time
      • Developed from general CPU code
      • Seconds to hours per frame
      • 1000s of lines
      • Unlimited computation, texture, memory, ...
  • Real-time
      • Developed from fixed-function hardware
      • Tens of frames per second
      • 1000s of instructions
      • Limited computation, texture, memory, ...

9
Non-real time vs. Real-time
  • Non-real time stages: Application, Displacement, Surface, Light, Volume, Atmosphere, Imager, Displayed Pixels
  • Real-time stages: Application, Texture/Buffer, Vertex, Geometry, Fragment, Displayed Pixels
10
History (not real-time)
  • Testbed: Whitted and Weimer 1981
  • Shade Trees: Cook 1984
  • Image Synthesizer: Perlin 1985
  • RenderMan: Hanrahan and Lawson 1990
  • Multi-pass RenderMan: Peercy et al. 2000
  • GPU acceleration: Wexler et al. 2005

11
History (real-time)
  • Custom HW: Olano and Lastra 1998
  • Multi-pass standard HW: Peercy et al. 2000
  • Register combiners: NVIDIA 2000
  • Vertex programs: Lindholm et al. 2001
  • Compiling to mixed HW: Proudfoot et al. 2001
  • Fragment programs
  • Standardized languages
  • Geometry shaders: Blythe 2006

12
Choices
  • OS: Windows, Mac, Linux
  • API: DirectX, OpenGL
  • Language: HLSL, GLSL, Cg, ...
  • Compiler: DirectX, OpenGL, Cg, ASHLI
  • Runtime: CgFX, ASHLI, OSG (and others), sample code

13
Major Commonalities
  • Vertex and Fragment/Pixel
  • C-like, if/while/for
  • Structs and arrays
  • Float plus small vector and matrix types
  • Swizzle and mask (a.xyz = b.xxw)
  • Common math and shading functions
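
The swizzle and mask notation a.xyz = b.xxw can be unpacked with a short CPU-side sketch. The Python helpers below (swizzle, masked_write) are hypothetical names used for illustration only; shading languages provide this directly as syntax.

```python
_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(v, pattern):
    """Read components in any order, repeats allowed: b.xxw."""
    return [v[_INDEX[c]] for c in pattern]


def masked_write(dest, mask, values):
    """Write only the components named in the mask: a.xyz = values."""
    out = list(dest)
    for c, val in zip(mask, values):
        out[_INDEX[c]] = val
    return out


a = [0.0, 0.0, 0.0, 9.0]
b = [1.0, 2.0, 3.0, 4.0]
a = masked_write(a, "xyz", swizzle(b, "xxw"))   # a.xyz = b.xxw
```

The swizzle on the right selects b.x twice and then b.w; the mask on the left leaves a.w untouched.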

14
GPU Parallelism
Pipeline
15
GPU Parallelism
Pipeline
SPMD Parallel: Fragment Stream
16
GPU Parallelism
SIMD Parallel: 2x2 Block
SPMD Parallel: Fragment Stream
17
GPU Parallelism
SIMD Parallel: 2x2 Block
Pipeline (NVIDIA)
18
GPU Parallelism
Vector Parallel: Limited MIMD
Pipeline (NVIDIA)
19
Managing GPU Programming
  • Simplified computational model
      • Bonus: consistent as hardware changes
  • All stages SIMD
  • Explicit 4-element SIMD vectors
  • Fixed conversion / remapping between each stage

20
Vertex
  • One element in / one out
  • NO communication
  • Can select fragment address

21
Geometry
  • More next (Blythe talk)
  • One element in / 0 to 100 out
      • Limited by hardware buffer sizes
  • Like vertex:
      • NO communication
      • Can select fragment address
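
The "one element in, zero or more out" model can be sketched on the CPU as follows. This Python is only an illustration: MAX_OUTPUT stands in for the hardware buffer limit mentioned above, and geometry_stage and cull_or_split are hypothetical names, not part of any graphics API.

```python
MAX_OUTPUT = 100  # hypothetical cap standing in for the hardware buffer limit


def geometry_stage(primitives, shader):
    """Run the shader once per primitive; each call sees only its own input."""
    out = []
    for prim in primitives:
        emitted = shader(prim)
        if len(emitted) > MAX_OUTPUT:
            raise ValueError("output exceeds buffer limit")
        out.extend(emitted)
    return out


def cull_or_split(segment):
    """Example shader: drop zero-length segments, split the rest in two."""
    a, b = segment
    if a == b:
        return []                      # zero outputs
    mid = (a + b) / 2.0
    return [(a, mid), (mid, b)]        # two outputs


result = geometry_stage([(0.0, 2.0), (1.0, 1.0)], cull_or_split)
```

Because no call can read another call's input or output, the invocations can run in any order or in parallel.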

22
Fragment
  • Biggest computational resource
  • One element in / 0 or 1 out
  • Cannot change destination address
      • "I am element x,y in an array, what is my value?"
  • Effectively no communication
  • Conditionals expensive
      • Better if block coherence
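
The fragment model ("I am element x,y in an array, what is my value?") can be sketched as a function that is told its address and returns at most one value. The Python below is a CPU-side illustration under that assumption; render and diagonal_discard are hypothetical names, and None is used here to stand for a discarded fragment.

```python
def render(width, height, fragment_shader):
    """Call the shader once per pixel; it cannot write to any other address."""
    image = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = fragment_shader(x, y)
            if value is not None:       # None means the fragment was discarded
                image[y][x] = value
    return image


def diagonal_discard(x, y):
    """Example shader: discard the diagonal, checkerboard elsewhere."""
    if x == y:
        return None                     # zero outputs
    return (x + y) % 2                  # one output, at the given address


image = render(2, 2, diagonal_discard)
```

The shader never chooses where its result goes; it only decides what value, if any, its own pixel receives.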

23
Program / Multiple Passes
  • Communication
      • None in one pass
      • Arbitrary read addresses between passes
  • Data layout
      • No persistent per-processor memory
      • No penalty to change
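
One way to see how multiple passes substitute for communication is a running sum, where every element eventually depends on all earlier ones. The Python sketch below is a CPU-side illustration, not code from the talk: within a pass each output is computed independently, and between passes it may read any address of the previous pass's output. The names run_pass and prefix_sum are hypothetical.

```python
def run_pass(kernel, src):
    """One pass: every output element is computed independently from src."""
    return [kernel(i, src) for i in range(len(src))]


def prefix_sum(values):
    """Inclusive running sum built from about log2(n) independent passes."""
    buf = list(values)
    offset = 1
    while offset < len(buf):
        def kernel(i, src, offset=offset):
            # Read an arbitrary address of the previous pass's output.
            return src[i] + (src[i - offset] if i >= offset else 0)
        buf = run_pass(kernel, buf)     # this pass's output feeds the next
        offset *= 2
    return buf


sums = prefix_sum([1, 2, 3, 4, 5])
```

No element talks to a neighbor inside a pass; the non-local result emerges only because each pass reads what the previous one wrote.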

24
Multiple passes
  • GPGPU
  • Non-local effects
      • Shadow maps
      • Texture space
  • Precomputation
      • Fix some degrees of freedom
      • Factor into functions of 1-3D
      • Project input or output into another space

25
GPU Shading and Rendering