Title: Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware
1. Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware
Eurographics Symposium on Rendering 2003 
25-27th June - Leuven, Belgium 
2. HDR and Tone Mapping
[Image pair: compressed vs. clamped to [0, 1]]
3. Advances in graphics hardware
- Physically-based rendering on the GPU (Purcell et al., 2003)
- High dynamic range texture mapping (Debevec et al., 2001)
4. System Overview
- Interactive tone mapping system for an OpenGL application
[Diagram labels: application, display callback, HDR image, tone mapping system, LDR image, frame buffer]
5. Interface to the application
[Diagram labels: tone mapping system, application]
- tmInitialize() // Initialize the system 
- tmEnable() // Retarget GL calls 
- Draw geometry 
- tmCompress() // Compress output 
- tmDisable() // Restore app context
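The call order above can be mimicked with a toy model. This is only a sketch: the class and method names are hypothetical stand-ins for the tm* entry points, plain lists stand in for the floating-point and frame buffers, and the L / (1 + L) curve is a placeholder for whatever tmCompress() actually applies.

```python
class ToneMapSketch:
    """Hypothetical stand-in for the tm* calls; not the real system."""

    def __init__(self):             # plays the role of tmInitialize()
        self.hdr_buffer = []        # off-screen floating-point target
        self.frame_buffer = []      # what the display finally shows
        self.target = self.frame_buffer

    def enable(self):               # tmEnable(): retarget drawing
        self.hdr_buffer.clear()
        self.target = self.hdr_buffer

    def draw(self, luminances):     # the application's "draw geometry" step
        self.target.extend(luminances)

    def compress(self):             # tmCompress(): HDR -> LDR (placeholder curve)
        self.frame_buffer[:] = [l / (1.0 + l) for l in self.hdr_buffer]

    def disable(self):              # tmDisable(): restore the app's own target
        self.target = self.frame_buffer


tm = ToneMapSketch()
tm.enable()
tm.draw([0.5, 4.0, 100.0])          # unbounded HDR values
tm.compress()
tm.disable()
```

After the sequence, tm.frame_buffer holds values below 1 even though the application drew unbounded luminances, which is the point of retargeting the draw calls.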
6. Choosing a tone mapping operator
- Photographic Tone Reproduction for Digital Images (Reinhard et al., 2002)
- Global operator is a simple transfer function
[Plot: transfer function; horizontal axis: scaled luminance; vertical axis: 0 to 1]
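As a sketch of the transfer function, the global operator in Reinhard et al. scales each world luminance by key / log-average and then applies L / (1 + L). The function and parameter names below are illustrative, and key = 0.18 is the default from that paper rather than a value stated on these slides.

```python
def global_operator(world_lum, log_avg, key=0.18):
    """Map world luminances to display values in [0, 1).

    world_lum: iterable of non-negative luminances.
    log_avg:   log-average luminance of the whole image.
    key:       target "middle grey" (assumed default, see lead-in).
    """
    scaled = [key * l / log_avg for l in world_lum]   # scaled luminance
    return [l / (1.0 + l) for l in scaled]            # transfer function
```

Low luminances pass through almost linearly, while very bright ones approach 1 without ever reaching it.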
7. Choosing a tone mapping operator
- Local operator
  - Digital analog to burning and dodging
[Figure labels: local area luminance, center-surround]
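The center-surround test can be sketched for a single pixel, following the formulation in Reinhard et al.: compare the blurred luminance at one scale (center) with the next larger scale (surround), and keep growing the scale while the normalized difference stays below a threshold. The function names are hypothetical; phi = 8, eps = 0.05, and a 1.6 ratio between scales are that paper's suggested values, not figures from these slides.

```python
def select_zone(blurred, scales, key=0.18, phi=8.0, eps=0.05):
    """Return the index of the largest scale whose surround is still uniform.

    blurred[i] is the blurred scaled luminance at scales[i] for one pixel;
    the surround for scale i is taken to be blurred[i + 1].
    """
    chosen = 0
    for i in range(len(blurred) - 1):
        center, surround = blurred[i], blurred[i + 1]
        v = (center - surround) / ((2.0 ** phi) * key / scales[i] ** 2 + center)
        if abs(v) > eps:        # a contrast edge entered the surround
            break
        chosen = i
    return chosen


def compress_pixel(lum, blurred, scales):
    """Local operator: divide by 1 + the local area luminance."""
    return lum / (1.0 + blurred[select_zone(blurred, scales)])
```

A pixel in a flat region runs through every scale, while a pixel near a bright edge stops early, which is what keeps halos from forming around high-contrast boundaries.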
8. Why use this tone mapping operator?
- Global operator is simple and fast to compute 
- Only one global computation 
- We can dynamically choose the number of zones 
9-14. Variable number of zones
[Image sequence over six slides, whose titles are suffixed 3 through 8; the extracted caption on each slide reads "3 Zones"]
15. System block diagram
16. Implementation
- Target architecture
  - ATI Radeon 9800 (R350)
- Data storage
  - Floating-point off-screen buffers (pbuffers)
  - Multiple rendering surfaces (GL_AUXi)
- Algorithms
  - ARB fragment and vertex assembly
  - Generate fragments with image-sized quads
- Data representation
  - Vector vs. scalar organization
17. Global operator block diagram
18. Implementation: global operator
- Simple luminance transform
- Store luminance and log luminance in separate channels
[Diagram labels: HDR image, luminance, log luminance, mipmap reduction, LDR image, single buffer]
19. Implementation: global operator
[Diagram labels: single rendering surface, HDR image, luminance, log luminance, mipmap reduction, log luminance channel, log average luminance, LDR image, single buffer]
20. Implementation: global operator
[Diagram labels: HDR image, texture 0, operator shader, luminance, log luminance, texture 1, texture 2, mipmap reduction, LDR image, single buffer]
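The mipmap reduction in these diagrams can be imitated on the CPU: average the log-luminance channel in 2 x 2 blocks until one value remains, then exponentiate to get the log-average luminance. This is a minimal sketch assuming a square, power-of-two image; the function names and the small delta that guards against log(0) are illustrative choices.

```python
import math


def mipmap_reduce(level):
    """Average each 2 x 2 block, halving both dimensions (one mipmap step)."""
    size = len(level)
    return [[(level[y][x] + level[y][x + 1] +
              level[y + 1][x] + level[y + 1][x + 1]) / 4.0
             for x in range(0, size, 2)]
            for y in range(0, size, 2)]


def log_average_luminance(lum, delta=1e-6):
    """lum: square 2D list of luminances whose side is a power of two."""
    level = [[math.log(delta + v) for v in row] for row in lum]
    while len(level) > 1:
        level = mipmap_reduce(level)
    return math.exp(level[0][0])
```

Because every level averages equally sized blocks, the 1 x 1 level equals the mean of all log luminances, so the reduction gives the same result as a direct sum.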
21. Local operator block diagram
22. Implementation: GPU-based convolutions
- Transform n-vector product into multiple 4-vector products
[Diagram labels: filter, luminance]
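The decomposition named on this slide can be sketched as follows: an n-element inner product is padded to a multiple of four and summed as a series of 4-element dot products, which is the shape of a four-component dot-product instruction in a fragment program. The helper names are hypothetical.

```python
def dot4(a, b):
    """One 4-vector dot product (the unit of work per instruction)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def dot_by_fours(filter_taps, samples):
    """Compute an n-vector inner product as a sum of 4-vector products."""
    f, s = list(filter_taps), list(samples)
    pad = (-len(f)) % 4                  # zero-pad up to a multiple of four
    f += [0.0] * pad
    s += [0.0] * pad
    return sum(dot4(f[i:i + 4], s[i:i + 4]) for i in range(0, len(f), 4))
```

An 11-tap filter, for example, needs three 4-vector products instead of eleven scalar multiply-adds.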
23. Vectorizing the luminance
- Output 4 pixels at the same time
- Useful for expensive algorithms
- Requires a conversion back to scalar form
[Figure label: stacked domain]
24-28. Vectorizing the luminance
- A simple method for luminance vectorization
- Preserves spatial locality
[Animation labels: luminance, R, G, B, A]
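The slides do not spell out the exact packing layout. One layout consistent with "preserves spatial locality" is to place four horizontally adjacent luminance values into the R, G, B, and A channels of one texel. The sketch below assumes that layout; the function names are hypothetical.

```python
def pack_row(lum_row):
    """Pack a scanline into RGBA texels, 4 adjacent pixels per texel.

    Assumed layout (not confirmed by the slides): pixel 4k goes to R,
    4k + 1 to G, 4k + 2 to B, and 4k + 3 to A of texel k.
    """
    if len(lum_row) % 4 != 0:
        raise ValueError("row width must be a multiple of 4")
    return [tuple(lum_row[i:i + 4]) for i in range(0, len(lum_row), 4)]


def unpack_row(texels):
    """Conversion back to scalar form: flatten texels into a scanline."""
    return [value for texel in texels for value in texel]
```

The packed row is a quarter as wide, so each fragment processed touches four pixels at once.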
29-32. GPU-based convolutions
Example: 1 x n inner product
[Animation labels: filter, image, stacked image, Pass 1, Pass 2, Pass 3]
33-37. GPU-based convolutions
- Compute multiple 4-vector products per pass
- Less shader and texture switching
[Animation labels: single render pass, stacked image]
38. GPU-based convolutions
- Advantages
  - Handles large kernels
  - Efficient memory access
  - No transform back to scalar values
512 x 512 image
Kernel     Time
11 x 11    6 ms
21 x 21    10 ms
41 x 41    16 ms
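A 1 x n filter over a stacked row can be sketched as below, assuming each texel holds four horizontally adjacent pixels (an assumed layout, not one the slides confirm). Each texel that overlaps the filter window contributes one 4-vector product, and the result for every output pixel is already a scalar. The function names are hypothetical, and the filter is applied as a correlation, which matches convolution for symmetric kernels.

```python
def dot4(a, b):
    """One 4-vector dot product."""
    return sum(x * y for x, y in zip(a, b))


def filter_packed_row(texels, taps):
    """Apply an odd-length 1 x n filter to a row stored as RGBA texels.

    Pixels outside the row are treated as zero.
    """
    radius = len(taps) // 2
    out = []
    for x in range(4 * len(texels)):
        first = max(0, (x - radius) // 4)
        last = min(len(texels) - 1, (x + radius) // 4)
        acc = 0.0
        for t in range(first, last + 1):
            # Filter weights for the 4 pixels of texel t; zero outside the window.
            weights = [taps[4 * t + c - x + radius]
                       if abs(4 * t + c - x) <= radius else 0.0
                       for c in range(4)]
            acc += dot4(weights, texels[t])
        out.append(acc)
    return out
```

With n taps, each output pixel needs roughly n / 4 texel fetches and dot products, which is one way to read the modest growth in the timings above as the kernel widens.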
39. System block diagram
40-43. Calculating adaptation zones on the GPU
[Animation labels: luminance, FRONT, BACK, Buffer 0, Buffer 1. The pair of numbers shown advances across the four slides: (0, 1), (2, 1), (2, 3), (4, 3)]
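One reading of this animation is a ping-pong scheme: only two consecutive zones are resident at a time, and each new zone overwrites the older of the two, so a center and its surround are always available together. The sketch below models that reading; compute_zone is a hypothetical callable standing in for the convolution that produces zone k.

```python
def ping_pong(compute_zone, zone_count):
    """Walk through zones while keeping only two buffers alive.

    Returns the sequence of (Buffer 0 zone, Buffer 1 zone) states
    and the final buffer contents.
    """
    buffers = [compute_zone(0), compute_zone(1)]
    held = [0, 1]                     # which zone each buffer currently holds
    states = [tuple(held)]
    for k in range(2, zone_count):
        slot = k % 2                  # overwrite the buffer holding the older zone
        buffers[slot] = compute_zone(k)
        held[slot] = k
        states.append(tuple(held))
    return states, buffers
```

Running it for five zones reproduces the number pairs listed for these slides, with memory use that stays constant however many zones are evaluated.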
44. Performance: global operator
[Plot: frames per second vs. image size, for 16-bit and 32-bit floats]
45. Performance: local operator
[Plot: frames per second vs. number of zones, for 16-bit and 32-bit floats]
46. Performance comparison: CPU vs. GPU
47. Results: Accuracy
- Comparison with CPU, 512 x 512 image
Image                    RMS error
Scaled luminance         0.022
Convolution (5 x 5)      0.026
Convolution (49 x 49)    0.032
Final image              1.051
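For reference, an RMS error between a CPU result and a GPU result can be computed as below. The slides do not state the units or any normalization behind the table, so this is the plain, unnormalized definition, and the function name is illustrative.

```python
import math


def rms_error(reference, test):
    """Root-mean-square difference between two equally sized pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    squared = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.sqrt(squared / len(reference))
```

Identical images give 0, and a constant offset of d on every pixel gives exactly d.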
48. False-color zone images
[Image pair: CPU vs. GPU]
49-54. Images generated at 30 Hz
[Image sequence over six slides, each comparing: compressed (2 zones) vs. clamped to [0, 1]]
55. Conclusion and Future Work
- Summary
  - System for interactively compressing HDR output from an OpenGL application
  - Complex tone mapping operator on the GPU
- Future Work
  - Other tone mapping operators
  - Further optimizations
  - Non-invasive implementation