1
GPU Data Formatting and Addressing
  • Aaron Lefohn University of California, Davis

2
Overview
  • GPU Memory Model
  • GPU-Based Data Structures
  • Performance Considerations

3
GPU memory model
  • GPU Data Storage
  • Vertex data
  • Texture data
  • Frame buffer

[Figure labels: PS3.0 GPUs; Texture Data; Frame Buffer(s); Vertex Data]
4
GPU memory model
  • Read-Only
  • Traditional use of GPU memory
  • CPU writes, GPU reads
  • Read/Write
  • Save frame buffer(s) for later use as texture or
    vertex array
  • Save up to 16 32-bit floating-point values per pixel
  • Multiple Render Targets (MRTs)

5
How to Save Render Result
  • Copy framebuffer result to other GPU memory
  • Copy-to-texture
  • Copy-to-vertex-array
  • Write directly to other GPU memory
  • Render-to-texture
  • Render-to-vertex-array

6
OpenGL GPU Memory Writes
  • Texture
  • Copy frame buffer to texture
  • Render-to-texture
  • WGL_ARB_render_texture
  • GL_EXT_render_target
  • Superbuffers
  • Vertex Array
  • Copy frame buffer to vertex array
  • GL_EXT_pixel_buffer_object
  • Superbuffers
  • Render-to-vertex-array
  • Superbuffers

7
Render-To-Texture 1
  • Copy-To-Texture
  • Good
  • Cross-Platform texture writes
  • Flexible output
  • 2D output → copy to 1D, 2D, or 3D texture
  • Bad
  • Slow
  • Consumes internal GPU memory bandwidth

8
Render-To-Texture 2
  • WGL_ARB_render_texture
  • Render-to-texture (RTT) using pbuffers
  • http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
  • Good
  • Fast RTT
  • Current state of the art for RTT
  • Bad
  • Only works on Windows
  • Slow OpenGL context switches
  • Many hacks to avoid this bottleneck

9
Render-To-Texture 3
  • GL_EXT_render_target
  • Proposed extension for cross-platform RTT
  • http://www.opengl.org/resources/features/GL_EXT_render_target.txt
  • Good
  • Cross-platform, efficient RTT solution
  • Lightweight, simple extension
  • Bad
  • Specification not approved (April 24, 2004)
  • No implementations exist (April 24, 2004)

10
Render-To-Texture 4
  • Superbuffers
  • Proposed new memory model for GPUs
  • http://www.ati.com/developer/gdc/SuperBuffers.pdf
  • Good
  • Unified GPU memory model
  • Render to any GPU memory
  • Cross platform (OpenGL owns memory, not OS)
  • Mix-and-match depth/stencil/color buffers
  • Bad
  • Large, complex extension
  • Specification not approved (April 24, 2004)
  • Only driver support is alpha version (ATI)

11
Render-To-Texture Summary
  • OpenGL RTT Currently Only Under Windows
  • Pbuffers
  • Complex and awkward RTT mechanism
  • Current state of the art
  • Cross-Platform RTT Coming Soon

12
Render-To-Vertex-Array 1
  • GL_EXT_pixel_buffer_object
  • Copy framebuffer to vertex buffer object
  • http://developer.nvidia.com/object/nvidia_opengl_specs.html
  • Good
  • Uses only GPU/AGP memory bandwidth
  • Works with current drivers (NVIDIA)
  • Bad
  • No direct render-to-vertex-array (slower than
    true RTVA)
  • No ATI implementation

13
Render-To-Vertex-Array 2
  • Superbuffers
  • Write to memory object as render target
  • Read from memory object as vertex array
  • Good
  • Direct render-to-vertex-array (fast)
  • Bad
  • Can render results always be interpreted as
    vertex data?
  • Large, complex, unapproved extension

14
Render-To-Vertex-Array Summary
  • Current OpenGL Support
  • NVIDIA: GL_EXT_pixel_buffer_object
  • ATI: Superbuffers
  • Semantics Still Under Development

15
Fbuffer: Capturing Fragments
  • Idea
  • Rasterization-Order FIFO Buffer
  • Render results are fragment values instead of
    pixel values
  • Mark and Proudfoot, Graphics Hardware 2001
  • http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/
  • Uses
  • Designed for multi-pass rendering with
    transparent geometry
  • New possibilities for GPGPU?
  • Varying number of results per pixel
  • RTT and RTVA with an fbuffer?

16
Fbuffer: Capturing Fragments
  • Implementations
  • ATI Radeon 9800 and newer ATI GPUs
  • Not yet exposed to user (ask for it!)
  • Problems
  • Size of fbuffer is not known before rendering
  • GPUs cannot perform dynamic memory allocation
  • How to handle buffer overflow?
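
To make the fbuffer idea concrete, here is a toy CPU model in Python: fragments are appended in rasterization order, so a pixel can own any number of results, and the buffer has a fixed capacity chosen before rendering. The class name, the method names, and the overflow policy (drop the fragment and set a flag) are assumptions made for illustration only; the slides leave overflow handling as an open question.

```python
class FBuffer:
    """Toy FIFO of fragments, kept in arrival (rasterization) order."""

    def __init__(self, capacity):
        self.capacity = capacity      # fixed up front: no dynamic allocation
        self.fragments = []           # list of (pixel, value) pairs
        self.overflowed = False

    def emit(self, pixel, value):
        """Append one fragment; return False if the buffer is already full."""
        if len(self.fragments) >= self.capacity:
            self.overflowed = True    # hypothetical policy: drop and flag
            return False
        self.fragments.append((pixel, value))
        return True

    def per_pixel(self):
        """Group stored values by pixel, preserving arrival order."""
        grouped = {}
        for pixel, value in self.fragments:
            grouped.setdefault(pixel, []).append(value)
        return grouped


fb = FBuffer(capacity=4)
for pixel, value in [((0, 0), 0.2), ((1, 0), 0.5), ((0, 0), 0.7)]:
    fb.emit(pixel, value)
print(fb.per_pixel())  # pixel (0, 0) holds two results, pixel (1, 0) holds one
```

A real implementation would need some way to recover from overflow, for example by splitting the pass; this sketch only detects it.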

17
Overview
  • GPU Memory Model
  • GPU-Based Data Structures
  • Performance Considerations

18
GPU-Based Data Structures
  • Building Blocks
  • GPU memory addresses
  • Address Generation
  • Address Use
  • Pointers
  • Multi-dimensional arrays
  • Sparse representations

19
GPU Memory Addresses
  • Where Are Addresses Generated?
  • CPU: vertex stream or textures
  • Vertex processor: input stream, ALU ops, or
    textures
  • Rasterizer: interpolation
  • Fragment processor: input stream, ALU ops, or
    textures

20
GPU Memory Addresses
  • Where Are Addresses Used?
  • Vertex textures (PS3.0 GPUs)
  • Fragment textures

[Figure labels: Texture Data; Vertex Processor]
21
GPU Memory Addresses
  • Pointers
  • Store addresses in texture
  • Dependent texture read
  • Example: see Tim Purcell's ray tracing talk
  • float2 addr = tex2D( addrTex, texCoord );
  • float2 data = tex2D( dataTex, addr );

[Figure: an address texture whose texels index into a data texture]
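
The two-fetch pattern on this slide can be mimicked on the CPU. The sketch below is a toy Python model, assuming integer texel coordinates and nearest-neighbour fetches; the texture contents and the helper names (tex2d, dependent_read) are made up for illustration.

```python
def tex2d(texture, coord):
    """Fetch one texel; coord is (x, y), where x selects the column."""
    x, y = coord
    return texture[y][x]

# Address texture: each texel stores the (x, y) of a texel in data_tex.
addr_tex = [
    [(1, 1), (0, 0)],
    [(0, 1), (1, 0)],
]

# Data texture: the payload values (hypothetical numbers).
data_tex = [
    [10.0, 20.0],
    [30.0, 40.0],
]

def dependent_read(tex_coord):
    addr = tex2d(addr_tex, tex_coord)  # first fetch returns an address
    return tex2d(data_tex, addr)       # second fetch depends on the first

print(dependent_read((0, 0)))  # 40.0: address (1, 1) points at the last texel
```

The address texture plays the role of a pointer array: changing its contents redirects reads without touching the data texture.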
22
GPU-Based Data Structures
  • Building Blocks
  • GPU memory addresses
  • Address Generation
  • Address Use
  • Pointers
  • Multi-dimensional arrays
  • Sparse representations

23
Multi-Dimensional Arrays
  • Build Data Structures in 2D Memory
  • Read/Write GPU memory optimized for 2D
  • Images
  • But Isn't Physical Memory 1D?
  • GPU memory hierarchy optimized to capture 2D
    locality
  • Rasterization
  • Texture filtering
  • Igehy, Eldridge, Proudfoot, "Prefetching in a
    Texture Cache Architecture," Graphics Hardware,
    1998
  • Conclusion: use the illusion of 2D physical memory

24
GPU Arrays
  • Large 1D Arrays
  • Current GPUs limit 1D array sizes to 2048 or 4096
  • Pack into 2D memory
  • 1D-to-2D address translation
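
A minimal sketch of 1D-to-2D address translation in Python, assuming row-major packing into a texture of fixed width. The function names and the example width of 2048 are illustrative choices, not part of any API.

```python
def addr_1d_to_2d(i, width):
    """Map a linear index to (x, y) in a row-major 2D texture."""
    return (i % width, i // width)

def addr_2d_to_1d(x, y, width):
    """Inverse mapping: (x, y) back to the linear index."""
    return y * width + x

def required_height(n, width):
    """Rows needed to hold n elements (ceiling division)."""
    return (n + width - 1) // width

# One million elements packed into rows of 2048 texels.
print(required_height(1_000_000, 2048))  # 489
print(addr_1d_to_2d(5000, 2048))         # (904, 2)
```

In a fragment program the same arithmetic would typically be done in floating point, so half-texel offsets and rounding need care; this sketch ignores those details.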

25
GPU Arrays
  • 3D Arrays
  • Problem
  • GPUs do not have 3D frame buffers
  • No RTT to slice of 3D texture (except
    Superbuffers)
  • Solutions
  • Stack of 2D slices
  • Multiple slices per 2D buffer
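
One way to place multiple slices in a single 2D buffer is to tile them in a grid. The Python sketch below assumes slice z occupies tile (z mod T, z div T) for T tiles per row; this layout and the function names are assumptions for illustration, and other layouts are equally valid.

```python
def addr_3d_to_2d(x, y, z, width, height, tiles_per_row):
    """Map voxel (x, y, z) to a texel in a 2D buffer of tiled slices."""
    tile_x = z % tiles_per_row    # column of the tile holding slice z
    tile_y = z // tiles_per_row   # row of the tile holding slice z
    return (tile_x * width + x, tile_y * height + y)

def addr_2d_to_3d(u, v, width, height, tiles_per_row):
    """Inverse mapping: texel (u, v) back to voxel (x, y, z)."""
    tile_x, x = divmod(u, width)
    tile_y, y = divmod(v, height)
    return (x, y, tile_y * tiles_per_row + tile_x)

# A 64 x 64 x 64 volume stored as an 8 x 8 grid of slices (512 x 512 texels).
print(addr_3d_to_2d(3, 5, 10, 64, 64, 8))  # (131, 69)
```

Neighbors in x and y stay adjacent in the 2D buffer, while neighbors in z land in a different tile, so z-neighbor lookups need the translation above.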

26
GPU Arrays
  • Problems With 3D Arrays for GPGPU
  • Cannot read stack of 2D slices as 3D texture
  • Must know which slices are needed in advance
  • Visualization of 3D data difficult
  • Solutions
  • Need render-to-slice-of-3D-texture (Superbuffers)
  • Volume rendering of slice-based 3D data
  • Course 28, Real-Time Volume Graphics, Siggraph
    2004

27
GPU Arrays
  • Higher Dimensional Arrays
  • Pack into 2D buffers
  • N-D to 2D address translation
  • Same problems as 3D arrays if data does not fit
    in a single 2D texture
  • Conclusions
  • Fundamental GPU memory primitive is a fixed-size
    2D array
  • GPGPU needs more general memory model
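
N-D to 2D address translation can be sketched as two steps: linearize the N-D index, then wrap the linear index into rows of a 2D texture. The Python below assumes the first index varies fastest and a fixed texture width; both are illustrative conventions rather than requirements.

```python
def addr_nd_to_2d(index, shape, width):
    """Map an N-D index to (x, y) in a 2D texture with the given width."""
    linear = 0
    stride = 1
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range for shape")
        linear += i * stride   # first dimension varies fastest
        stride *= n
    return (linear % width, linear // width)

# A 4 x 4 x 4 x 8 array packed into a texture 16 texels wide.
print(addr_nd_to_2d((1, 2, 3, 4), (4, 4, 4, 8), 16))  # (9, 19)
```

This simple linearization does not preserve 2D locality for the higher dimensions; a tiled layout may be preferable when neighbor access patterns matter.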

28
GPU-Based Data Structures
  • Building Blocks
  • GPU memory addresses
  • Address Generation
  • Address Use
  • Pointers
  • Multi-dimensional arrays
  • Sparse representations

29
Sparse Data Structures
  • Why Sparse Data Structures?
  • Reduce computational workload
  • Reduce memory pressure
  • Examples
  • Sparse matrices
  • Krueger et al., Siggraph 2003
  • Bolz et al., Siggraph 2003
  • Implicit surface computations (sparse volumes)
  • Sherbondy et al., IEEE Visualization 2003
  • Lefohn et al., IEEE Visualization 2003

[Image credit: Premoze et al., Eurographics 2003]
30
Sparse Computation
  • Option 1: Store Complete Data Set on GPU
  • Cull unused data
  • Conditional execution tricks (discussed earlier)
  • Option 2: Store Only Sparse Data on GPU
  • Saves memory
  • Potentially much faster than culling
  • Much more complicated (especially if time-varying)

31
Sparse Data Structures
  • Basic Idea
  • Pack active data elements into GPU memory
  • For more information
  • Linear algebra section in this course: static
    structures
  • Level-set case study in this course: dynamic
    structures

32
Sparse Data Structures
  • Addressing Sparse Data
  • Neighborhoods no longer implicitly defined on
    grid
  • Use pointer-based data structures to locate
    neighbors
  • Pre-compute neighbor addresses if possible
  • Use CPU or vertex processor
  • Removes pointer dereference from fragment program
  • Separate common addressing case from boundary
    conditions
  • Common case must be cache coherent
  • See Harris and Lefohn case studies for
    substream technique
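
As a rough illustration of packing active elements and pre-computing neighbor addresses, the Python sketch below packs the active cells of a 2D mask into a dense list and builds a table of neighbor indices on the CPU. The function names, the four-neighbor stencil, and the -1 boundary marker are all assumptions for the example.

```python
def pack_active(mask):
    """Return the active (x, y) cells in scan order and a cell -> index map."""
    cells = [(x, y) for y, row in enumerate(mask)
             for x, on in enumerate(row) if on]
    address = {cell: i for i, cell in enumerate(cells)}
    return cells, address

def neighbor_table(cells, address, boundary=-1):
    """For each packed cell, list packed indices of left/right/up/down."""
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [[address.get((x + dx, y + dy), boundary) for dx, dy in offsets]
            for x, y in cells]

mask = [
    [False, True,  False],
    [True,  True,  True],
    [False, False, False],
]
cells, address = pack_active(mask)
table = neighbor_table(cells, address)
print(table[address[(1, 1)]])  # [1, 3, 0, -1]
```

With the table computed ahead of time, the per-element kernel only reads neighbor indices instead of searching for them, which is the point of moving the pointer dereference out of the fragment program.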

33
Overview
  • GPU Memory Model
  • GPU-Based Data Structures
  • Performance Considerations

34
Memory Performance Issues
  • Pbuffer Survival Guide
  • Dependent Texture Costs
  • Computational Frequency

35
Pbuffer Survival Guide
  • Pbuffers Give us Render-To-Texture
  • Designed to create an environment map or two
  • Never intended to be used for GPGPU (100s of
    pbuffers)
  • Problem
  • Each pbuffer has its own OpenGL render context
  • Each pbuffer may have depth and/or stencil buffer
  • Changing OpenGL contexts is slow
  • Solution
  • Many optimizations to avoid this bottleneck

36
Pbuffer Survival Guide
  • Pack Scalar Data Into RGBA
  • > 4x memory savings
  • 4x reduction in context switches
  • Be careful of read-modify-write hazard

[Figure: scalar data in 4 RGBA pbuffers versus packed into 1 RGBA pbuffer]
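
Packing scalar data into RGBA can be pictured as interleaving four scalar arrays into one array of 4-tuples. The Python sketch below is a CPU-side illustration with made-up function names; it says nothing about how a particular driver lays out texels.

```python
def pack_rgba(r, g, b, a):
    """Interleave four equal-length scalar arrays into RGBA texels."""
    if not (len(r) == len(g) == len(b) == len(a)):
        raise ValueError("all four channels must have the same length")
    return list(zip(r, g, b, a))

def unpack_channel(texels, channel):
    """Extract one scalar array (0 = R, 1 = G, 2 = B, 3 = A)."""
    return [texel[channel] for texel in texels]

texels = pack_rgba([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0])
print(texels)                     # [(1.0, 3.0, 5.0, 7.0), (2.0, 4.0, 6.0, 8.0)]
print(unpack_channel(texels, 2))  # [5.0, 6.0]
```

The read-modify-write hazard shows up when a pass updates only one channel: the other three must be copied through from a different buffer, because the buffer being written cannot safely be read in the same pass.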
37
Pbuffer Survival Guide
  • Use Multi-Surface Pbuffers
  • Each RGBA surface is its own render-texture
  • Front, Back, AuxN (N = 0, 1, 2, ...)
  • Greatly reduces context switches
  • Technically illegal, but blessed by ATI. Works
    on NVIDIA.

[Figure: 1 pbuffer with 5 RGBA surfaces versus 5 pbuffers with 1 RGBA surface each]
38
Pbuffer Survival Guide
  • Using Multi-Surface Pbuffers
  • Allocate a double-buffered pbuffer (and/or one
    with AUX buffers)
  • Set render target to back buffer
  • glDrawBuffer(GL_BACK)
  • Bind front buffer as texture
  • wglBindTexImageARB(hpbuffer, WGL_FRONT_LEFT_ARB)
  • Render
  • Switch buffers
  • wglReleaseTexImageARB(hpbuffer, WGL_FRONT_LEFT_ARB)
  • glDrawBuffer(GL_FRONT)
  • wglBindTexImageARB(hpbuffer, WGL_BACK_LEFT_ARB)
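
The read-from-one-surface, write-to-the-other pattern in these steps is often called ping-ponging. The Python sketch below models it on the CPU with two lists standing in for the front and back surfaces; the function names and the three-tap blur kernel are hypothetical and only illustrate the control flow, not the OpenGL calls.

```python
def ping_pong(initial, kernel, passes):
    """Run `passes` passes, reading one buffer and writing the other."""
    front = list(initial)           # bound as the input "texture"
    back = [0.0] * len(front)       # current render target
    for _ in range(passes):
        for i in range(len(front)):
            back[i] = kernel(front, i)   # never read the buffer being written
        front, back = back, front        # switch buffers
    return front

def blur(src, i):
    """Three-tap box filter with clamped borders."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3.0

print(ping_pong([0.0, 3.0, 0.0], blur, 1))  # [1.0, 1.0, 1.0]
```

Writing the result into the same list that the kernel reads would let later elements see already-updated neighbors, which is the read-modify-write hazard the two-surface scheme avoids.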

39
Pbuffer Survival Guide
  • Pack 2D domains into large buffer
  • Flat 3D textures
  • Be careful of read-modify-write hazard

[Figure: a 3D volume and its flattened 2D layout]
40
Dependent Texture Costs
  • Cache Coherency
  • Dependent reads fast if they hit cache
  • Even chained dependencies can be same speed as
    non-dependent reads
  • Very slow if out of cache
  • Example
  • 3 levels of dependent cache misses can be >10x
    slower
  • More detail in "GPU Computation Strategies and
    Tricks"
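
The effect of access coherence on cache behavior can be illustrated with a toy model. The Python sketch below assumes a small LRU cache of 4x4 texel tiles; real GPU texture caches differ in size, layout, and replacement policy, so the numbers are only indicative.

```python
from collections import OrderedDict
import random

def hit_rate(coords, tile=4, capacity=16):
    """Fraction of fetches served by an LRU cache of tile x tile blocks."""
    cache = OrderedDict()
    hits = 0
    for x, y in coords:
        key = (x // tile, y // tile)       # tile containing this texel
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used tile
    return hits / len(coords)

size = 64
# Coherent pattern: scan the texture row by row.
coherent = [(x, y) for y in range(size) for x in range(size)]
# Scattered pattern: the same number of fetches at random addresses.
rng = random.Random(0)
scattered = [(rng.randrange(size), rng.randrange(size)) for _ in coherent]

print(hit_rate(coherent))   # 0.9375
print(hit_rate(scattered))  # much lower in this toy model
```

The row scan touches each tile sixteen times before moving on, so only the first touch misses; scattered addresses, like those produced by incoherent dependent reads, rarely find their tile resident.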

41
Computational Frequency
  • Compute Memory Addresses at Low Frequency
  • Compute memory addresses in vertex program
  • Let rasterizer interpolation create per-fragment
    addresses
  • Compute neighbor addresses this way
  • Avoid fragment-level address computation whenever
    possible
  • Consumes fragment instructions
  • Computation often redundant with neighboring
    fragments
  • May defeat texture pre-fetch
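
To see why addresses can be computed at vertices and left to the rasterizer, the Python sketch below linearly interpolates two vertex values across a span of fragments. It assumes fragment centers at half-integer positions and a span that starts and ends on pixel boundaries; the function names are illustrative.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def interpolated_coords(v0, v1, n):
    """Values seen at the centers of n fragments between two vertices."""
    return [lerp(v0, v1, (i + 0.5) / n) for i in range(n)]

n = 4
# The vertex program emits the center address and the right-neighbor
# address (center + 1) once per vertex.
center = interpolated_coords(0.0, float(n), n)
right = interpolated_coords(1.0, float(n) + 1.0, n)

print(center)  # [0.5, 1.5, 2.5, 3.5]
print(right)   # [1.5, 2.5, 3.5, 4.5]
```

Because the offset is constant, interpolating the offset coordinate gives each fragment its neighbor address with no per-fragment arithmetic, and the fetches remain predictable for texture prefetching.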

42
Conclusions
  • GPU Memory Model Evolving
  • Writable GPU memory forms loop-back in an
    otherwise feed-forward streaming pipeline
  • Memory model will continue to evolve as GPUs
    become more general stream processors
  • GPGPU Data Structures
  • Basic memory primitive is limited-size, 2D
    texture
  • Use address translation to fit all array
    dimensions into 2D
  • Maintain 2D cache locality
  • Render-To-Texture
  • Use pbuffers with care and eagerly adopt their
    successor

43
Selected References
  • J. Bolz, I. Farmer, E. Grinspun, P. Schröder,
    Sparse Matrix Solvers on the GPU: Conjugate
    Gradients and Multigrid, SIGGRAPH 2003
  • N. Goodnight, C. Woolley, G. Lewin, D. Luebke, G.
    Humphreys, A Multigrid Solver for Boundary Value
    Problems Using Programmable Graphics Hardware,
    Graphics Hardware 2003
  • M. Harris, W. Baxter, T. Scheuermann, A. Lastra,
    Simulation of Cloud Dynamics on Graphics
    Hardware, Graphics Hardware 2003
  • H. Igehy, M. Eldridge, K. Proudfoot, Prefetching
    in a Texture Cache Architecture, Graphics
    Hardware 1998
  • J. Krueger, R. Westermann, Linear Algebra
    Operators for GPU Implementation of Numerical
    Algorithms, SIGGRAPH 2003
  • A. Lefohn, J. Kniss, C. Hansen, R. Whitaker, A
    Streaming Narrow-Band Algorithm: Interactive
    Deformation and Visualization of Level Sets,
    IEEE Transactions on Visualization and Computer
    Graphics 2004

44
Selected References
  • A. Lefohn, J. Kniss, C. Hansen, R. Whitaker,
    Interactive Deformation and Visualization of
    Level Set Surfaces Using Graphics Hardware, IEEE
    Visualization 2003
  • W. Mark, K. Proudfoot, The F-Buffer: A
    Rasterization-Order FIFO Buffer for Multi-Pass
    Rendering, Graphics Hardware 2001
  • T. Purcell, C. Donner, M. Cammarano, H. W.
    Jensen, P. Hanrahan, Photon Mapping on
    Programmable Graphics Hardware, Graphics
    Hardware 2003
  • A. Sherbondy, M. Houston, S. Napel, Fast Volume
    Segmentation With Simultaneous Visualization
    Using Programmable Graphics Hardware, IEEE
    Visualization 2003

45
OpenGL References
  • GL_EXT_pixel_buffer_object:
    http://www.nvidia.com/dev_content/nvopenglspecs/GL_EXT_pixel_buffer_object.txt
  • GL_EXT_render_target:
    http://www.opengl.org/resources/features/GL_EXT_render_target.txt
  • OpenGL Extension Registry:
    http://oss.sgi.com/projects/ogl-sample/registry/
  • Superbuffers:
    http://www.ati.com/developer/gdc/SuperBuffers.pdf
  • WGL_ARB_render_texture:
    http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
    http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_pbuffer.txt

46
Questions?
  • Acknowledgements
  • Cass Everitt, Craig Kolb, Chris Seitz, and Jeff
    Juliano at NVIDIA
  • Mark Segal, Rob Mace, and Evan Hart at ATI
  • GPGPU Siggraph 2004 course presenters
  • Joe Kniss and Ross Whitaker
  • Brian Budge
  • John Owens
  • National Science Foundation Graduate Fellowship
  • Pixar Animation Studios