Title: GPU

1
GPU

Precision, Power, Programmability
CPU x60/decade, 6 GFLOPS, 6GB/sec
GPU x1000/decade, 20 GFLOPS, 25GB/sec
Arithmetic heavy (read OR write) faster hardware
Parallelization
Multi-billion entertainment market drives
innovation
32-bit Floating point
Programmable (graphics, physics, general purpose
data-flow)
Can't simply port CPU code to GPU
David Luebke et al. GPGPU, SIGGRAPH 2004
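
To put the two growth figures above in per-year terms, here is a small sketch that converts a per-decade factor into the implied annual factor. It assumes steady compound growth, which the slide does not state, and the helper name `annual_factor` is made up for illustration.

```python
def annual_factor(per_decade: float) -> float:
    """Annual growth factor implied by a per-decade factor (compound growth assumed)."""
    return per_decade ** (1.0 / 10.0)

cpu = annual_factor(60.0)      # CPU: x60 per decade (figure from the slide)
gpu = annual_factor(1000.0)    # GPU: x1000 per decade (figure from the slide)
print(f"CPU: x{cpu:.2f} per year, GPU: x{gpu:.2f} per year")
```

Under that assumption the CPU figure works out to roughly 1.5x per year and the GPU figure to roughly 2x per year.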

2
History of the 3D graphics industry

60s
Line drawings, hidden lines, parametric surfaces
(B-splines)
Automated drafting & machining for car,
airplane, and ship manufacturers
70s
Mainframes, vector tubes (HP)
Software: solids (CSG), ray tracing, Z-buffer
for hidden lines
80s
Graphics workstations ($50K-$1M): frame buffers,
rasterizers, GL, PHIGS
VR: CAVEs and head-mounted displays
CAD/CAM & GIS: CATIA, SDRC, PTC
Sun, HP, IBM, SGI, E&S, DEC
90s
PCs ($2K): graphics boards, OpenGL, Java3D
CAD, video games, animation: AutoCAD, SolidWorks,
Alias-Wavefront
Intel, many board vendors
00s
Laptops, PDAs, cell phones: parallel graphics
chips
Everything will be graphics, 3D, animated,
interactive
Nvidia, Sony, Nokia

3
History of GPU

Pre-GPU Graphics Acceleration
SGI, Evans & Sutherland. Introduced concepts like
vertex transformation and texture mapping. Very
expensive!
First-Generation GPU (up to 1998)
Nvidia TNT2, ATI Rage, Voodoo3. Vertex
transformation on CPU, limited set of math
operations.
Second-Generation GPU (1999-2000)
GeForce 256, GeForce2, Radeon 7500, Savage3D.
Transformation & Lighting (T&L). More configurable,
still not programmable.
Third-Generation GPU (2001)
GeForce3, GeForce4 Ti, Xbox, Radeon 8500. Vertex
Programmability, pixel-level configurability.
Fourth-Generation GPU (2002-)
GeForce FX series, Radeon 9700 and on.
Vertex-level and pixel-level programmability.

4
Architecture
Application
Vertex Shader
transformed vertices, normals, colors
Geometry Shader
Rasterizer
fragments (surfels per pixel)
texture
Fragment Shader
pixel color, depth, stencil
Compositor
Display
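
The stage order listed above can be sketched as a toy software pipeline. This is only an illustration under simplifying assumptions: the geometry shader and texture lookups are left out, the compositor is a plain overwrite, and every function name here is hypothetical rather than part of any graphics API.

```python
def vertex_shader(v, scale, offset):
    """Transform one vertex (x, y, r, g, b): a simple scale and translate."""
    x, y, *color = v
    return (x * scale + offset[0], y * scale + offset[1], *color)

def edge(a, b, p):
    """Signed area term: tells which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Yield one fragment (px, py, interpolated color) per covered pixel center."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return                                   # degenerate triangle: no fragments
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)
            w0 = edge(b, c, p) / area            # barycentric weight of vertex a
            w1 = edge(c, a, p) / area            # barycentric weight of vertex b
            w2 = edge(a, b, p) / area            # barycentric weight of vertex c
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                color = tuple(w0 * a[i] + w1 * b[i] + w2 * c[i] for i in (2, 3, 4))
                yield px, py, color

def fragment_shader(color):
    """Compute the pixel color; here the interpolated color is passed through."""
    return color

def draw(triangle, width=8, height=8):
    framebuffer = [[None] * width for _ in range(height)]
    transformed = [vertex_shader(v, scale=1.0, offset=(0.0, 0.0)) for v in triangle]
    for px, py, color in rasterize(transformed, width, height):
        framebuffer[py][px] = fragment_shader(color)   # compositor: overwrite
    return framebuffer

# One triangle with a red, a green, and a blue corner.
triangle = [(0, 0, 1, 0, 0), (8, 0, 0, 1, 0), (0, 8, 0, 0, 1)]
fb = draw(triangle)
```

Pixels outside the triangle stay `None`, while covered pixels hold a color interpolated from the three corners.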
5
Buffers

Color: 8-bit index to color table, or float/16-bit
true color
Depth: 24-bit or float (0 at back plane)
Back and front: display front, update back, swap
Stereo: shutter glasses, HMD. Alternate frames
Auxiliary: off-screen working space. Helps reduce
passes.
Stencil: 8 bits (left over from the depth buffer).
<, >, mask
Accumulation: sum, scale (supersampling, blur)
P-buffer, superbuffers: render to texture
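
The "display front, update back, swap" cycle for the back and front buffers can be sketched as follows. The class name `DoubleBuffer` and its methods are hypothetical, and real hardware swaps buffer pointers or flips scan-out rather than Python lists.

```python
class DoubleBuffer:
    """Two color buffers: the front one is displayed, the back one is drawn into."""

    def __init__(self, width, height, clear=0):
        self.front = [[clear] * width for _ in range(height)]
        self.back = [[clear] * width for _ in range(height)]

    def draw(self, x, y, color):
        self.back[y][x] = color            # updates never touch the visible buffer

    def swap(self):
        self.front, self.back = self.back, self.front

buffers = DoubleBuffer(4, 2)
buffers.draw(1, 0, 255)
visible_before = buffers.front[0][1]       # still the cleared value
buffers.swap()
visible_after = buffers.front[0][1]        # the finished frame is now displayed
```

Because drawing only touches the back buffer, the viewer never sees a half-drawn frame.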

6
Fragment operations

Depth tests: <, ≤, >, ≥, =; Z ∈ depth interval
Stencil test: mask, counter, parity.
Alpha tests: compare to reference alpha
Alpha blending: max, min, replace, blend
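
A minimal sketch of how these per-fragment operations might chain together is shown below. The order of the tests, the dictionary layout, the "greater than reference" alpha test, and the default `<` depth comparison are all assumptions made for illustration; which comparison means "nearer" depends on the depth convention in use.

```python
import operator

# Comparison functions selectable for the depth test.
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq}

def process_fragment(pixel, frag, depth_func="<", alpha_ref=0.0, stencil_mask=0xFF):
    """Return the updated pixel, or the unchanged pixel if any test fails.

    pixel: {"color": (r, g, b), "depth": z, "stencil": int}
    frag:  {"color": (r, g, b), "alpha": a, "depth": z}
    """
    # Alpha test: compare the fragment alpha to a reference alpha.
    if not frag["alpha"] > alpha_ref:
        return pixel
    # Stencil test: pass only where the masked stencil value is nonzero.
    if (pixel["stencil"] & stencil_mask) == 0:
        return pixel
    # Depth test: compare the fragment depth to the stored depth.
    if not COMPARE[depth_func](frag["depth"], pixel["depth"]):
        return pixel
    # Alpha blending ("blend" mode): src * alpha + dst * (1 - alpha).
    a = frag["alpha"]
    color = tuple(a * s + (1 - a) * d for s, d in zip(frag["color"], pixel["color"]))
    return {"color": color, "depth": frag["depth"], "stencil": pixel["stencil"]}

pixel = {"color": (0.0, 0.0, 0.0), "depth": 1.0, "stencil": 1}
near = {"color": (1.0, 0.0, 0.0), "alpha": 0.5, "depth": 0.25}
far = {"color": (0.0, 1.0, 0.0), "alpha": 1.0, "depth": 0.75}

out = process_fragment(pixel, near)        # passes all tests, blended at 50%
hidden = process_fragment(out, far)        # fails the depth test, pixel unchanged
```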

7
Data Parallelism in GPUs

Data flow: vertices -> fragments -> pixels
Parallelism at each stage
No shared or static data (except textures)
ALU-heavy (multiple ALUs per stage in pipe)
Fight memory latency with more computation

8
GPGPU

Stream: collection of records (pixels, vertices)
Stored in textures (a computational grid)
Kernel: function applied to each element in
stream
Transform, evolve (no dependency between records)
Matrix algebra
Image/volume processing
Physical simulation
Global illumination
Ray tracing
Photon mapping
Radiosity
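
The stream and kernel idea above can be sketched on the CPU as follows: a 2D list stands in for the texture, and a kernel is applied independently to every element, with each pass writing a fresh grid. The diffusion kernel is just one example of a physical-simulation step, and the names `run_kernel` and `diffuse` are hypothetical.

```python
def run_kernel(kernel, texture):
    """Apply `kernel` independently to every element (x, y) of a 2D grid.

    The input grid is only read and the result goes to a new grid,
    mirroring "read from texture, render to texture".
    """
    height, width = len(texture), len(texture[0])
    return [[kernel(texture, x, y) for x in range(width)] for y in range(height)]

def diffuse(tex, x, y):
    """One Jacobi relaxation step: average of the four neighbors (clamped at edges)."""
    h, w = len(tex), len(tex[0])
    left, right = tex[y][max(x - 1, 0)], tex[y][min(x + 1, w - 1)]
    up, down = tex[max(y - 1, 0)][x], tex[min(y + 1, h - 1)][x]
    return 0.25 * (left + right + up + down)

grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 100.0                         # a single hot cell
for _ in range(3):                         # the output of one pass feeds the next
    grid = run_kernel(diffuse, grid)
total = sum(map(sum, grid))
```

No element's result depends on another element's result within the same pass, which is what lets a GPU evaluate all of them in parallel.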

9
Computational Resources

Programmable parallel processors
Vertex & fragment pipelines
Rasterizer
Mostly useful for interpolating addresses
(texture coordinates) and per-vertex constants
Texture unit
Read-only memory interface
Render to texture (or Copy to texture)
Write-only memory interface

10
Vertex Processor

Fully programmable (SIMD / MIMD)
Processes 4-vectors (RGBA / XYZW)
Capable of scatter but not gather (A[i,j] = x)
Can change the location of current vertex
Cannot read info from other vertices
Can only read a small constant memory
Vertex Texture Fetch
Random access memory for vertices
Arguably still not gather
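
Scatter, in the sense used above, means each record computes its own output address and writes there. The sketch below is a CPU stand-in: the record layout `(i, j, di, dj, x)` and the names `vertex_program` and `scatter` are made up for illustration.

```python
def vertex_program(record):
    """Compute the output address (i, j) for one record from its own data only."""
    i, j, di, dj, x = record
    return i + di, j + dj, x               # the program may move the vertex

def scatter(records, height, width):
    grid = [[0.0] * width for _ in range(height)]
    for record in records:
        i, j, x = vertex_program(record)
        if 0 <= i < height and 0 <= j < width:   # off-grid writes are discarded
            grid[i][j] = x                        # scatter: A[i][j] = x
    return grid

records = [(0, 0, 1, 2, 5.0), (2, 1, 0, 0, 7.0), (3, 3, 1, 0, 9.0)]
A = scatter(records, height=4, width=4)
```

Note that `vertex_program` never reads another record, matching the "cannot read info from other vertices" restriction.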

11
Fragment Processor

May be invoked at each pixel by drawing a full
screen quad
Fully programmable (SIMD)
Processes 4-vectors (RGBA / XYZW)
Random access memory read (textures)
Capable of gather (x = A[i+1,j]) and some scatter
RAM read (texture), but no RAM write
Output address fixed to a specific pixel
But can change that address
Typically more useful than vertex processor
More fragment pipelines than vertex pipelines
Gather
Direct output (fragment processor is at end of
pipeline)
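
Gather is the mirror image: the output address is fixed, but the kernel may read from any input address. A rough CPU sketch follows, assuming clamped reads at the border; `gather_kernel` and `full_screen_pass` are hypothetical names.

```python
def gather_kernel(A, i, j):
    """Fragment-style kernel: the output address (i, j) is fixed, reads are arbitrary."""
    below = A[min(i + 1, len(A) - 1)][j]   # gather: x = A[i+1][j], clamped at the edge
    return below - A[i][j]                 # e.g. a vertical finite difference

def full_screen_pass(kernel, A):
    """Invoke the kernel once per output element, like drawing a full-screen quad."""
    return [[kernel(A, i, j) for j in range(len(A[0]))] for i in range(len(A))]

A = [[1, 2], [4, 8], [9, 18]]
D = full_screen_pass(gather_kernel, A)     # [[3, 6], [5, 10], [0, 0]]
```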

12
Branching

Not supported or expensive
Avoid, replace by math
Depth test
Stencil test
Occlusion query (conditional execution)
Pre-computation (region of interest, use to set
stencil mask)
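
"Replace by math" can be illustrated with a 0/1 mask that selects between two results arithmetically. The helpers `step` and `mix` below are plain Python functions modeled on the shader built-ins of the same names; the shading functions are invented examples.

```python
def step(edge, x):
    """1.0 where x >= edge, else 0.0."""
    return float(x >= edge)

def mix(a, b, t):
    """Linear blend: a when t == 0, b when t == 1."""
    return a * (1.0 - t) + b * t

def shade_branching(x):
    if x >= 0.5:
        return x * 2.0
    return x * 0.5

def shade_branchless(x):
    # Both sides are evaluated; the mask picks one of them without a branch.
    return mix(x * 0.5, x * 2.0, step(0.5, x))

samples = [0.0, 0.25, 0.5, 0.75, 1.0]
results = [shade_branchless(x) for x in samples]
```

The trade-off is that both sides are always computed, so this pays off mainly when each side is cheap.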
