1
GPU Data Formatting and Addressing

Aaron Lefohn, University of California, Davis

2
Overview

GPU Memory Model
GPU-Based Data Structures
Performance Considerations

3
GPU memory model

GPU Data Storage
Vertex data
Texture data
Frame buffer

[Figure: GPU memory on PS3.0 GPUs: texture data, frame buffer(s), vertex data]

4
GPU memory model

Read-Only
Traditional use of GPU memory
CPU writes, GPU reads
Read/Write
Save frame buffer(s) for later use as texture or vertex array
Save up to 16 32-bit floating-point values per pixel
Multiple Render Targets (MRTs)

5
How to Save Render Result

Copy framebuffer result to other GPU memory
Copy-to-texture
Copy-to-vertex-array
Write directly to other GPU memory
Render-to-texture
Render-to-vertex-array

6
OpenGL GPU Memory Writes

Texture
Copy frame buffer to texture
Render-to-texture
WGL_ARB_render_texture
GL_EXT_render_target
Superbuffers
Vertex Array
Copy frame buffer to vertex array
GL_EXT_pixel_buffer_object
Superbuffers
Render-to-vertex-array
Superbuffers

7
Render-To-Texture 1

Copy-To-Texture
Good
Cross-Platform texture writes
Flexible output
2D output: copy to 1D, 2D, or 3D texture
Bad
Slow
Consumes internal GPU memory bandwidth

8
Render-To-Texture 2

WGL_ARB_render_texture
Render-to-texture (RTT) using pbuffers
http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
Good
Fast RTT
Current state of the art for RTT
Bad
Only works on Windows
Slow OpenGL context switches
Many hacks to avoid this bottleneck

9
Render-To-Texture 3

GL_EXT_render_target
Proposed extension for cross-platform RTT
http://www.opengl.org/resources/features/GL_EXT_render_target.txt
Good
Cross-platform, efficient RTT solution
Lightweight, simple extension
Bad
Specification not approved (April 24, 2004)
No implementations exist (April 24, 2004)

10
Render-To-Texture 4

Superbuffers
Proposed new memory model for GPUs
http://www.ati.com/developer/gdc/SuperBuffers.pdf
Good
Unified GPU memory model
Render to any GPU memory
Cross platform (OpenGL owns memory, not OS)
Mix-and-match depth/stencil/color buffers
Bad
Large, complex extension
Specification not approved (April 24, 2004)
Only driver support is alpha version (ATI)

11
Render-To-Texture Summary

OpenGL RTT Currently Only Under Windows
Pbuffers
Complex and awkward RTT mechanism
Current state of the art
Cross-Platform RTT Coming Soon

12
Render-To-Vertex-Array 1

GL_EXT_pixel_buffer_object
Copy framebuffer to vertex buffer object
http://developer.nvidia.com/object/nvidia_opengl_specs.html
Good
Only GPU/AGP memory bandwidth
Works with current drivers (NVIDIA)
Bad
No direct render-to-vertex-array (slower than true RTVA)
No ATI implementation

13
Render-To-Vertex-Array 2

Superbuffers
Write to memory object as render target
Read from memory object as vertex array
Good
Direct render-to-vertex-array (fast)
Bad
Can render results always be interpreted as vertex data?
Large, complex, unapproved extension

14
Render-To-Vertex-Array Summary

Current OpenGL Support
NVIDIA: GL_EXT_pixel_buffer_object
ATI: Superbuffers
Semantics Still Under Development

15
Fbuffer: Capturing Fragments

Idea
Rasterization-Order FIFO Buffer
Render results are fragment values instead of pixel values
Mark and Proudfoot, Graphics Hardware 2001
http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/
Uses
Designed for multi-pass rendering with transparent geometry
New possibilities for GPGPU?
Varying number of results per pixel
RTT and RTVA with an fbuffer?

16
Fbuffer: Capturing Fragments

Implementations
ATI Radeon 9800 and newer ATI GPUs
Not yet exposed to user (ask for it!)
Problems
Size of fbuffer is not known before rendering
GPUs cannot perform dynamic memory allocation
How to handle buffer overflow?

17
Overview

GPU Memory Model
GPU-Based Data Structures
Performance Considerations

18
GPU-Based Data Structures

Building Blocks
GPU memory addresses
Address Generation
Address Use
Pointers
Multi-dimensional arrays
Sparse representations

19
GPU Memory Addresses

Where Are Addresses Generated?
CPU: vertex stream or textures
Vertex processor: input stream, ALU ops, or textures
Rasterizer: interpolation
Fragment processor: input stream, ALU ops, or textures

20
GPU Memory Addresses

Where Are Addresses Used?
Vertex textures (PS3.0 GPUs)
Fragment textures

[Figure: texture data read by the vertex processor]

21
GPU Memory Addresses

Pointers
Store addresses in texture
Dependent texture read
Example: see Tim Purcell's ray tracing talk
float2 addr = tex2D( addrTex, texCoord ).xy;
float2 data = tex2D( dataTex, addr ).xy;

[Figure: an address texture whose texels hold addresses into a data texture]

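The dependent texture read above can be mimicked on the CPU. The following is a minimal sketch, not the code from the talk: the names Addr2, fetch_addr, fetch_data, and deref are hypothetical, and plain row-major arrays stand in for the address and data textures.

```c
#include <assert.h>

/* Hypothetical 2D texel address (integer texel coordinates). */
typedef struct { int x, y; } Addr2;

/* Stand-in for tex2D on the address texture (row-major, `width` texels per row). */
Addr2 fetch_addr(const Addr2 *addr_tex, int width, Addr2 at) {
    assert(at.x >= 0 && at.x < width && at.y >= 0);
    return addr_tex[at.y * width + at.x];
}

/* Stand-in for tex2D on the data texture. */
float fetch_data(const float *data_tex, int width, Addr2 at) {
    assert(at.x >= 0 && at.x < width && at.y >= 0);
    return data_tex[at.y * width + at.x];
}

/* Pointer dereference: the first read yields an address,
   the second (dependent) read uses that address to fetch the data. */
float deref(const Addr2 *addr_tex, const float *data_tex, int width, Addr2 coord) {
    Addr2 addr = fetch_addr(addr_tex, width, coord);
    return fetch_data(data_tex, width, addr);
}
```

With a 2x2 address texture whose texel (0,0) stores the address (1,1), deref at (0,0) returns whatever the data texture holds at (1,1).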
22
GPU-Based Data Structures

Building Blocks
GPU memory addresses
Address Generation
Address Use
Pointers
Multi-dimensional arrays
Sparse representations

23
Multi-Dimensional Arrays

Build Data Structures in 2D Memory
Read/write GPU memory is optimized for 2D images
But Isn't Physical Memory 1D?
GPU memory hierarchy optimized to capture 2D locality
Rasterization
Texture filtering
Igehy, Eldridge, Proudfoot, "Prefetching in a Texture Cache Architecture," Graphics Hardware, 1998
Conclusion: use the illusion of 2D physical memory

24
GPU Arrays

Large 1D Arrays
Current GPUs limit 1D array sizes to 2048 or 4096
Pack into 2D memory
1D-to-2D address translation
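A 1D-to-2D address translation can be sketched as follows, assuming a row-major layout in a 2D texture of fixed width. The names Addr2, addr_1d_to_2d, and addr_2d_to_1d are illustrative and do not come from the talk.

```c
#include <assert.h>

/* Hypothetical 2D texel address. */
typedef struct { int x, y; } Addr2;

/* Map element i of a large 1D array to a texel in a 2D texture
   that is `width` texels wide (row-major packing). */
Addr2 addr_1d_to_2d(int i, int width) {
    assert(i >= 0 && width > 0);
    Addr2 a = { i % width, i / width };
    return a;
}

/* Inverse mapping: recover the 1D index from the 2D address. */
int addr_2d_to_1d(Addr2 a, int width) {
    return a.y * width + a.x;
}
```

For example, element 5000 of a 1D array stored in a texture 2048 texels wide lands at column 904 of row 2.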

25
GPU Arrays

3D Arrays
Problem
GPUs do not have 3D frame buffers
No RTT to slice of 3D texture (except Superbuffers)
Solutions
Stack of 2D slices
Multiple slices per 2D buffer
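The "multiple slices per 2D buffer" layout needs a 3D-to-2D address translation. Below is one possible sketch, assuming the slices are tiled left to right and then row by row; the function name and the tiles_per_row parameter are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical 2D texel address. */
typedef struct { int x, y; } Addr2;

/* Map voxel (x, y, z) of a volume with nx-by-ny slices to a texel in a
   2D buffer that holds `tiles_per_row` slices side by side per row. */
Addr2 addr_3d_to_2d(int x, int y, int z, int nx, int ny, int tiles_per_row) {
    assert(x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && tiles_per_row > 0);
    Addr2 a;
    a.x = (z % tiles_per_row) * nx + x;  /* tile column, then offset within slice */
    a.y = (z / tiles_per_row) * ny + y;  /* tile row, then offset within slice */
    return a;
}
```

With 64x64 slices tiled 8 per row, voxel (3, 5, 10) sits in tile column 2 of tile row 1, that is, at texel (131, 69).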

26
GPU Arrays

Problems With 3D Arrays for GPGPU
Cannot read stack of 2D slices as 3D texture
Must know which slices are needed in advance
Visualization of 3D data difficult
Solutions
Need render-to-slice-of-3D-texture (Superbuffers)
Volume rendering of slice-based 3D data
Course 28, Real-Time Volume Graphics, SIGGRAPH 2004

27
GPU Arrays

Higher Dimensional Arrays
Pack into 2D buffers
N-D to 2D address translation
Same problems as 3D arrays if data does not fit in a single 2D texture
Conclusions
Fundamental GPU memory primitive is a fixed-size 2D array
GPGPU needs more general memory model
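One way to read "N-D to 2D address translation" is to linearize the N-D index first and then wrap the linear index into rows of a 2D buffer. The sketch below assumes that approach, with the first dimension varying fastest; linearize and addr_nd_to_2d are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D texel address. */
typedef struct { int x, y; } Addr2;

/* Collapse an n-dimensional index into a single linear index.
   idx[0] varies fastest; dims[d] is the extent of dimension d. */
long linearize(const int *idx, const int *dims, size_t n) {
    long linear = 0;
    for (size_t d = n; d-- > 0; ) {
        assert(idx[d] >= 0 && idx[d] < dims[d]);
        linear = linear * dims[d] + idx[d];
    }
    return linear;
}

/* Wrap the linear index into a 2D buffer that is `width` texels wide. */
Addr2 addr_nd_to_2d(const int *idx, const int *dims, size_t n, int width) {
    assert(width > 0);
    long linear = linearize(idx, dims, n);
    Addr2 a = { (int)(linear % width), (int)(linear / width) };
    return a;
}
```

This simple wrap does not preserve 2D locality for every dimension; a tiled layout may be preferable when neighbors along several axes are read together.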

28
GPU-Based Data Structures

Building Blocks
GPU memory addresses
Address Generation
Address Use
Pointers
Multi-dimensional arrays
Sparse representations

29
Sparse Data Structures

Why Sparse Data Structures?
Reduce computational workload
Reduce memory pressure
Examples
Sparse matrices
Krueger et al., SIGGRAPH 2003
Bolz et al., SIGGRAPH 2003
Implicit surface computations (sparse volumes)
Sherbondy et al., IEEE Visualization 2003
Lefohn et al., IEEE Visualization 2003
Premoze et al., Eurographics 2003

30
Sparse Computation

Option 1: Store Complete Data Set on GPU
Cull unused data
Conditional execution tricks (discussed earlier)
Option 2: Store Only Sparse Data on GPU
Saves memory
Potentially much faster than culling
Much more complicated (especially if time-varying)

31
Sparse Data Structures

Basic Idea
Pack active data elements into GPU memory
For more information
Linear algebra section in this course: static structures
Level-set case study in this course: dynamic structures
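Packing active elements can be illustrated with a CPU-side sketch. It assumes a per-cell activity flag and builds both the packed buffer and a map from dense cells to packed slots; the function pack_active and its parameters are illustrative, not taken from the course material.

```c
#include <stddef.h>

/* Copy the active elements of a dense array into a contiguous packed buffer.
   slot_of[i] receives the packed slot of dense cell i, or -1 if the cell is
   inactive. Returns the number of packed elements. */
size_t pack_active(const float *dense, const unsigned char *active, size_t n,
                   float *packed, int *slot_of) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (active[i]) {
            slot_of[i] = (int)count;
            packed[count++] = dense[i];
        } else {
            slot_of[i] = -1;
        }
    }
    return count;
}
```

Only the packed buffer would need to live in GPU memory; the slot map is what later allows neighbors to be located.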

32
Sparse Data Structures

Addressing Sparse Data
Neighborhoods no longer implicitly defined on grid
Use pointer-based data structures to locate neighbors
Pre-compute neighbor addresses if possible
Use CPU or vertex processor
Removes pointer dereference from fragment program
Separate common addressing case from boundary conditions
Common case must be cache coherent
See Harris and Lefohn case studies for substream technique
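Pre-computing neighbor addresses on the CPU might look like the sketch below. It assumes a 2D grid, a dense-cell-to-packed-slot map (with -1 for inactive cells), and a dedicated boundary slot that missing neighbors are redirected to; all names here are hypothetical.

```c
/* Neighbor directions for a 2D grid. */
enum { LEFT, RIGHT, DOWN, UP, NUM_NEIGHBORS };

/* For each active cell of a w-by-h grid, store the packed slots of its four
   neighbors. slot_of maps a dense cell (row-major) to its packed slot, or -1
   if inactive. Neighbors that are off the grid or inactive are redirected to
   boundary_slot, so the per-element lookup needs no branching. */
void precompute_neighbors(const int *slot_of, int w, int h, int boundary_slot,
                          int (*neighbors)[NUM_NEIGHBORS]) {
    static const int dx[NUM_NEIGHBORS] = { -1, 1, 0, 0 };
    static const int dy[NUM_NEIGHBORS] = { 0, 0, -1, 1 };
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int slot = slot_of[y * w + x];
            if (slot < 0) continue;  /* inactive cells have no packed entry */
            for (int k = 0; k < NUM_NEIGHBORS; ++k) {
                int nx = x + dx[k], ny = y + dy[k];
                int inside = nx >= 0 && nx < w && ny >= 0 && ny < h;
                int n = inside ? slot_of[ny * w + nx] : -1;
                neighbors[slot][k] = (n >= 0) ? n : boundary_slot;
            }
        }
    }
}
```

The resulting table could be uploaded as vertex data or as a texture so that the fragment program reads neighbors directly instead of chasing pointers.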

33
Overview

GPU Memory Model
GPU-Based Data Structures
Performance Considerations

34
Memory Performance Issues

Pbuffer Survival Guide
Dependent Texture Costs
Computational Frequency

35
Pbuffer Survival Guide

Pbuffers Give us Render-To-Texture
Designed to create an environment map or two
Never intended to be used for GPGPU (100s of pbuffers)
Problem
Each pbuffer has its own OpenGL render context
Each pbuffer may have depth and/or stencil buffer
Changing OpenGL contexts is slow
Solution
Many optimizations to avoid this bottleneck

36
Pbuffer Survival Guide

Pack Scalar Data Into RGBA
4x memory savings
4x reduction in context switches
Be careful of read-modify-write hazard
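Packing scalar data into RGBA amounts to a small address translation from a scalar index to a texel and a channel. A minimal sketch, assuming four consecutive scalars share one texel (other packings, such as 2x2 blocks, are equally possible); PackedAddr and scalar_to_rgba are made-up names.

```c
#include <assert.h>

/* Hypothetical address of a scalar inside an RGBA buffer. */
typedef struct {
    int texel;    /* index of the RGBA texel */
    int channel;  /* 0 = R, 1 = G, 2 = B, 3 = A */
} PackedAddr;

/* Map scalar index i to the texel and channel that store it
   when four consecutive scalars are packed per texel. */
PackedAddr scalar_to_rgba(int i) {
    assert(i >= 0);
    PackedAddr p = { i / 4, i % 4 };
    return p;
}
```

Because four scalars now share one texel, a pass that updates one of them from the others in the same texel reads and writes the same location, which is where the read-modify-write hazard comes from.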

[Figure: scalar data in 4 RGBA pbuffers vs. packed into 1 RGBA pbuffer]

37
Pbuffer Survival Guide

Use Multi-Surface Pbuffers
Each RGBA surface is its own render-texture
Front, Back, AuxN (N = 0, 1, 2, ...)
Greatly reduces context switches
Technically illegal, but blessed by ATI. Works on NVIDIA.

[Figure: 1 pbuffer with 5 RGBA surfaces vs. 5 pbuffers with 1 RGBA surface each]

38
Pbuffer Survival Guide

Using Multi-Surface Pbuffers
Allocate a double-buffered pbuffer (and/or one with AUX buffers)
Set render target to back buffer
glDrawBuffer(GL_BACK);
Bind front buffer as texture
wglBindTexImageARB(hpbuffer, WGL_FRONT_LEFT_ARB);
Render
Switch buffers
wglReleaseTexImageARB(hpbuffer, WGL_FRONT_LEFT_ARB);
glDrawBuffer(GL_FRONT);
wglBindTexImageARB(hpbuffer, WGL_BACK_LEFT_ARB);
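The front/back switching above is a ping-pong scheme. The following CPU-only analogy, which makes no OpenGL calls, sketches the control flow under the assumption of two equally sized surfaces; PingPong, pingpong_pass, and rotate_left are hypothetical names.

```c
#include <stddef.h>

/* Two surfaces of one "pbuffer": one is read as a texture, the other is
   the render target. `read` selects which surface is currently the texture. */
typedef struct {
    float *surface[2];
    int read;
    size_t n;
} PingPong;

/* One render pass: every output element is computed from the read surface
   only, written to the other surface, and then the roles are swapped. */
void pingpong_pass(PingPong *pp, float (*kernel)(const float *src, size_t i, size_t n)) {
    const float *src = pp->surface[pp->read];
    float *dst = pp->surface[1 - pp->read];
    for (size_t i = 0; i < pp->n; ++i) {
        dst[i] = kernel(src, i, pp->n);
    }
    pp->read = 1 - pp->read;  /* "switch buffers" */
}

/* Example kernel: each element takes the value of its right neighbor. */
float rotate_left(const float *src, size_t i, size_t n) {
    return src[(i + 1) % n];
}
```

Running rotate_left in place on a single buffer would overwrite element 0 before the last element reads it; keeping the read and write surfaces separate avoids that read-modify-write hazard.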

39
Pbuffer Survival Guide

Pack 2D domains into large buffer
Flat 3D textures
Be careful of read-modify-write hazard

[Figure: a 3D volume and its flattened 2D layout]

40
Dependent Texture Costs

Cache Coherency
Dependent reads fast if they hit cache
Even chained dependencies can be same speed as non-dependent reads
Very slow if out of cache
Example
3 levels of dependent cache misses can be >10x slower
More detail in "GPU Computation Strategies and Tricks"

41
Computational Frequency

Compute Memory Addresses at Low Frequency
Compute memory addresses in vertex program
Let rasterizer interpolation create per-fragment addresses
Compute neighbor addresses this way
Avoid fragment-level address computation whenever possible
Consumes fragment instructions
Computation often redundant with neighboring fragments
May defeat texture pre-fetch
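The idea of computing addresses per vertex and letting the rasterizer produce per-fragment values can be sketched in one dimension. This is a simplified model, assuming linear interpolation sampled at fragment centers; interp and fragment_addr are made-up helper names.

```c
/* Linear interpolation between two per-vertex values. */
float interp(float a, float b, float t) {
    return a + (b - a) * t;
}

/* Address seen by fragment x of a span that is `width` fragments wide,
   given the addresses assigned to the span's two end vertices.
   The fragment is sampled at its center, as a rasterizer would do. */
float fragment_addr(float addr_v0, float addr_v1, int x, int width) {
    float t = ((float)x + 0.5f) / (float)width;
    return interp(addr_v0, addr_v1, t);
}
```

Assigning vertex addresses 0 and 8 to an 8-fragment span gives fragment 3 the address 3.5, its own texel center. A second interpolant with vertex addresses 1 and 9 gives the same fragment 4.5, the address of its right neighbor, without any arithmetic in the fragment program.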

42
Conclusions

GPU Memory Model Evolving
Writable GPU memory forms loop-back in an otherwise feed-forward streaming pipeline
Memory model will continue to evolve as GPUs become more general stream processors
GPGPU Data Structures
Basic memory primitive is limited-size, 2D texture
Use address translation to fit all array dimensions into 2D
Maintain 2D cache locality
Render-To-Texture
Use pbuffers with care and eagerly adopt their successor

43
Selected References

J. Bolz, I. Farmer, E. Grinspun, P. Schröder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," SIGGRAPH 2003
N. Goodnight, C. Woolley, G. Lewin, D. Luebke, G. Humphreys, "A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware," Graphics Hardware 2003
M. Harris, W. Baxter, T. Scheuermann, A. Lastra, "Simulation of Cloud Dynamics on Graphics Hardware," Graphics Hardware 2003
H. Igehy, M. Eldridge, K. Proudfoot, "Prefetching in a Texture Cache Architecture," Graphics Hardware 1998
J. Krueger, R. Westermann, "Linear Algebra Operators for GPU Implementation of Numerical Algorithms," SIGGRAPH 2003
A. Lefohn, J. Kniss, C. Hansen, R. Whitaker, "A Streaming Narrow-Band Algorithm: Interactive Deformation and Visualization of Level Sets," IEEE Transactions on Visualization and Computer Graphics 2004

44
Selected References

A. Lefohn, J. Kniss, C. Hansen, R. Whitaker, "Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware," IEEE Visualization 2003
W. Mark, K. Proudfoot, "The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering," Graphics Hardware 2001
T. Purcell, C. Donner, M. Cammarano, H. W. Jensen, P. Hanrahan, "Photon Mapping on Programmable Graphics Hardware," Graphics Hardware 2003
A. Sherbondy, M. Houston, S. Napel, "Fast Volume Segmentation With Simultaneous Visualization Using Programmable Graphics Hardware," IEEE Visualization 2003

45
OpenGL References

GL_EXT_pixel_buffer_object: http://www.nvidia.com/dev_content/nvopenglspecs/GL_EXT_pixel_buffer_object.txt
GL_EXT_render_target: http://www.opengl.org/resources/features/GL_EXT_render_target.txt
OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/
Superbuffers: http://www.ati.com/developer/gdc/SuperBuffers.pdf
WGL_ARB_render_texture: http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
WGL_ARB_pbuffer: http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_pbuffer.txt

46
Questions?