Title: General-Purpose Computation on Graphics Hardware
1. General-Purpose Computation on Graphics Hardware
2. Introduction
- David Luebke, University of Virginia
3. Course Introduction
- The GPU on commodity video cards has evolved into an extremely flexible and powerful processor
  - Programmability
  - Precision
  - Power
- This course will address how to harness that power for general-purpose computation
4. Motivation: Computational Power
- GPUs are fast
  - 3 GHz Pentium 4 theoretical: 6 GFLOPS, 5.96 GB/sec peak
  - GeForce FX 5900 observed: 20 GFLOPS, 25.3 GB/sec peak
- GPUs are getting faster, faster
  - CPUs: annual growth ≈ 1.5× → decade growth ≈ 60×
  - GPUs: annual growth > 2.0× → decade growth > 1000×
Courtesy Kurt Akeley, Ian Buck & Tim Purcell, GPU Gems (see course notes)
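The decade figures on this slide follow from compounding the annual rates over ten years. A quick check, assuming the quoted annual factors stay constant from year to year (a simplification; `compound_growth` is a name chosen for this sketch):

```python
def compound_growth(annual_factor: float, years: int = 10) -> float:
    """Total speedup after `years` of constant annual growth."""
    return annual_factor ** years


cpu_decade = compound_growth(1.5)  # CPU annual rate quoted on the slide
gpu_decade = compound_growth(2.0)  # lower bound of the GPU annual rate

print(f"CPU: about {cpu_decade:.1f}x per decade")
print(f"GPU: more than {gpu_decade:.0f}x per decade")
```

The results, roughly 58× and 1024×, match the rounded "60" and "1000" on the slide.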
5. Motivation: Computational Power
[Figure: chart with series labeled GPU and CPU. Courtesy Naga Govindaraju]
6. An Aside: Computational Power
- Why are GPUs getting faster so fast?
  - Arithmetic intensity: the specialized nature of GPUs makes it easier to use additional transistors for computation, not cache
  - Economics: the multi-billion dollar video game market is a pressure cooker that drives innovation
7. Motivation: Flexible and precise
- Modern GPUs are deeply programmable
  - Programmable pixel, vertex, video engines
  - Solidifying high-level language support
- Modern GPUs support high precision
  - 32-bit floating point throughout the pipeline
  - High enough for many (not all) applications
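One way to see why 32-bit floating point is enough for many but not all applications is to round ordinary 64-bit values to 32 bits and watch what is lost. The sketch below runs on the CPU with Python's standard `struct` module and assumes IEEE 754 single precision; `to_float32` is a helper invented for this illustration, and the rounding behavior of any particular GPU may differ.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


big = to_float32(16_777_216.0)   # 2**24 is exactly representable
bumped = to_float32(big + 1.0)   # 2**24 + 1 needs 25 bits of mantissa

print("increment lost:", bumped == big)
print("0.1 survives rounding:", to_float32(0.1) == 0.1)
```

A 24-bit mantissa cannot hold every integer above 2**24, so accumulating counters or large sums in single precision can silently stop changing.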
8. Motivation: The Potential of GPGPU
- The power and flexibility of GPUs make them an attractive platform for general-purpose computation
- Example applications range from in-game physics simulation to conventional computational science
- Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor
9. The Problem: Difficult To Use
- GPUs designed for and driven by video games
  - Programming model is unusual and tied to computer graphics
  - Programming environment is tightly constrained
- Underlying architectures are:
  - Inherently parallel
  - Rapidly evolving (even in basic feature set!)
  - Largely secret
- Can't simply port code written for the CPU!
10. Course goals
- A detailed introduction to general-purpose computing on graphics hardware
- We emphasize:
  - Core computational building blocks
  - Strategies and tools for programming GPUs
  - Tips & tricks, perils & pitfalls of GPU programming
- Several case studies to bring it all together
11. Why a SIGGRAPH Course?
- Why SIGGRAPH, instead of (say) Supercomputing?
- Many graphics applications stand to benefit from GPGPU
  - Hot topic case studies: tone mapping, level sets, fluids
  - Keeping computation on-card!
- Many graphics applications strive for visual plausibility rather than rigorous scientific realism
  - Better tolerate GPU limitations in precision, memory
  - Well suited as GPGPU early adopters
- GPGPU programming still requires expertise of SIGGRAPH audience
12. Course Prerequisites
- We assume:
  - Familiarity with interactive graphics and computer graphics hardware
  - Ideally, some experience programming vertex & pixel shaders
- Target audience:
  - Researchers interested in GPGPU
  - Graphics and games developers interested in incorporating these techniques into their applications
  - Attendees wishing a survey of this exciting new field
13. Course Topics
- GPU building blocks
- Languages and tools
- Effective GPU programming
- GPGPU case studies
14. Course Topics: Details
- GPU building blocks
  - Linear algebra
  - Sorting and searching
  - Database operations
- Languages and tools
  - High-level languages
  - Debugging tools
15. Course Topics: Details
- Effective GPU programming
  - Efficient data-parallel programming
  - Data formatting & addressing
  - GPU computation strategies & tricks
- Case studies in GPGPU programming
  - Physically-based simulation on GPUs
  - Ray tracing & photon mapping on GPUs
  - Tone mapping on GPUs
  - Level sets on GPUs
16. Speakers: In Order of Appearance
- David Luebke, University of Virginia
- Mark Harris, NVIDIA
- Jens Krüger, TU-Munich
- Tim Purcell, Stanford (NVIDIA)
- Naga Govindaraju, University of North Carolina
- Ian Buck, Stanford
- Cliff Woolley, University of Virginia
- Aaron Lefohn, University of California, Davis
17. Course Schedule: GPU Building Blocks
Speakers: Luebke, Harris, Krüger, Purcell
- 8:30 Introduction
  - Welcome, overview, the graphics pipeline
- 9:00 Mapping computational concepts to the GPU
  - Streaming, resources, CPU-GPU analogies, branching
- 9:20 Linear algebra
  - Representations, operations, example algorithms
- 9:55 Sorting & searching (part 1)
  - Bitonic sort, binary search
- 10:15 Break
18. Course Schedule: Languages & Tools
Speakers: Purcell, Govindaraju, Buck, Purcell
- 10:30 Sorting & searching (part 2)
  - Nearest-neighbor search
- 10:45 Database operations
  - Queries, boolean predicates, aggregation
- 11:15 High-level languages
  - Cg/HLSL/GLslang, Sh, Brook
- 11:45 Debugging tools
  - imdebug, DirectX/OpenGL shader IDEs, ShadeSmith
- 12:15 Lunch break
19. Course Schedule: Effective GPU Programming
Speakers: Woolley, Lefohn, Buck, All
- 1:45 Efficient data-parallel GPU programming
  - Computational frequency, profiling, load balancing
- 2:15 Data formatting & addressing
  - Memory layout, data structures
- 2:45 GPU computation strategies & tricks
  - Precision, performance, scatter, branching
- 3:15 Q & A
  - Questions for the speakers?
- 3:30 Break
20. Course Schedule: GPGPU Case Studies
Speakers: Harris, Woolley, Lefohn, Purcell
- 3:45 Physically-based simulation on GPUs
  - Reaction-diffusion, fluids, clouds
- 4:10 Tone mapping on GPUs
  - High-dynamic range images, tone mapping
- 4:35 Level sets on GPUs
  - Streaming level sets, visualization, segmentation
- 5:00 Global illumination on GPUs
  - Ray tracing, photon mapping
- 5:30 Wrap!
21. GPU Fundamentals: The Graphics Pipeline
[Diagram: Application (CPU) → Transform → Rasterizer → Shade → Video Memory (Textures); the stages after Application run on the GPU and share Graphics State. Data passed between stages: Vertices (3D) → Transformed, Lit Vertices (2D) → Fragments (pre-pixels) → Final pixels (Color, Depth). A Render-to-texture path is also shown.]
- A simplified graphics pipeline
- Note that pipe widths vary
- Many caches, FIFOs, and so on not shown
22. GPU Fundamentals: The Modern Graphics Pipeline
[Diagram: Application (CPU) → Vertex Processor → Rasterizer → Fragment (Pixel) Processor → Video Memory (Textures); the stages after Application run on the GPU and share Graphics State. Data passed between stages: Vertices (3D) → Transformed, Lit Vertices (2D) → Fragments (pre-pixels) → Final pixels (Color, Depth). A Render-to-texture path is also shown.]
- Programmable vertex processor!
- Programmable pixel processor!
23. GPU Pipeline: Transform
- Vertex Processor (multiple operate in parallel)
  - Transform from world space to image space
  - Compute per-vertex lighting
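The transform step can be sketched on the CPU as a matrix multiply, a perspective divide, and a viewport mapping applied to each vertex independently. This is a minimal illustration, not how any particular GPU implements it: the function names, the row-major 4x4 matrix layout, and the top-left pixel origin are assumptions made for this sketch, and per-vertex lighting is left out.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # 4x4, row-major (assumed layout)
Vec3 = Tuple[float, float, float]


def transform_vertex(m: Matrix, v: Vec3, width: int, height: int) -> Vec3:
    """Map one world-space vertex to image-space (x, y, depth)."""
    p = (v[0], v[1], v[2], 1.0)  # homogeneous coordinates
    # Matrix-vector product gives clip-space coordinates.
    cx, cy, cz, cw = (sum(row[i] * p[i] for i in range(4)) for row in m)
    # Perspective divide gives normalized device coordinates in [-1, 1].
    nx, ny, nz = cx / cw, cy / cw, cz / cw
    # Viewport mapping gives pixel coordinates, y pointing down.
    return ((nx + 1.0) * 0.5 * width, (1.0 - ny) * 0.5 * height, nz)


def vertex_stage(m: Matrix, vertices: List[Vec3],
                 width: int, height: int) -> List[Vec3]:
    """Apply the same transform to every vertex; no vertex depends on another."""
    return [transform_vertex(m, v, width, height) for v in vertices]


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
screen = vertex_stage(IDENTITY, triangle, 640, 480)
print(screen)
```

Because each vertex is processed in isolation, the list comprehension in `vertex_stage` is the part that hardware can spread across multiple vertex processors.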
24. GPU Pipeline: Rasterizer
- Rasterizer
  - Convert geometric rep. (vertex) to image rep. (fragment)
    - Fragment = image fragment
    - Pixel + associated data: color, depth, stencil, etc.
  - Interpolate per-vertex quantities across pixels
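Rasterization and interpolation can be illustrated with a brute-force software rasterizer that tests every pixel center against a triangle and blends the per-vertex colors with barycentric weights. This is a sketch under stated assumptions: real hardware uses far more efficient traversal, and the `edge` and `rasterize` helpers and the `(x, y, color)` vertex layout are invented for this example.

```python
from typing import Dict, Tuple

Color = Tuple[float, ...]
Vertex = Tuple[float, float, Color]  # x, y, per-vertex color (assumed layout)


def edge(ax: float, ay: float, bx: float, by: float,
         px: float, py: float) -> float:
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0: Vertex, v1: Vertex, v2: Vertex,
              width: int, height: int) -> Dict[Tuple[int, int], Color]:
    """Return one fragment per covered pixel: {(x, y): interpolated color}."""
    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    fragments: Dict[Tuple[int, int], Color] = {}
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            # Barycentric weights; dividing by area handles either winding.
            w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py) / area
            w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py) / area
            w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Interpolate the per-vertex quantity (here, color).
                fragments[(x, y)] = tuple(
                    w0 * a + w1 * b + w2 * c
                    for a, b, c in zip(v0[2], v1[2], v2[2]))
    return fragments


red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
frags = rasterize((0.0, 0.0, red), (8.0, 0.0, green), (0.0, 8.0, blue), 8, 8)
print(len(frags), frags[(0, 0)])
```

The same weights would interpolate any per-vertex quantity, such as texture coordinates or depth, not only color.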
25. GPU Pipeline: Shade
- Fragment Processors (multiple in parallel)
  - Compute a color for each pixel
  - Optionally read colors from textures (images)
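The shade stage amounts to running one small program per output pixel, with each invocation free to read from textures. The CPU sketch below mimics that model with a single-channel texture stored as a nested list. The names `tex_lookup`, `blur_fragment`, and `shade` are hypothetical, and the clamp-to-edge addressing is an assumption of this example.

```python
from typing import Callable, List

Texture = List[List[float]]  # single-channel image, indexed [y][x]


def tex_lookup(tex: Texture, x: int, y: int) -> float:
    """Read a texel, clamping coordinates to the texture edge."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def blur_fragment(tex: Texture, x: int, y: int) -> float:
    """Example fragment program: average a texel with its 4 neighbors."""
    taps = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return sum(tex_lookup(tex, x + dx, y + dy) for dx, dy in taps) / len(taps)


def shade(tex: Texture,
          program: Callable[[Texture, int, int], float]) -> Texture:
    """Run the same program once per output pixel; pixels are independent."""
    h, w = len(tex), len(tex[0])
    return [[program(tex, x, y) for x in range(w)] for y in range(h)]


source = [[0.0, 0.0, 0.0],
          [0.0, 5.0, 0.0],
          [0.0, 0.0, 0.0]]
result = shade(source, blur_fragment)
for row in result:
    print(row)
```

Each output value depends only on reads from the input texture, never on another output, which is why many fragment processors can work on different pixels at once.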
26. Coming Up
- Next: Mapping computational concepts to the GPU
- Also coming up:
  - Core building blocks for GPGPU computing
  - Memory layout, data structures, and algorithms
  - Detailed advice on writing high performance GPGPU code
  - Lots of examples