Title: Numerical-Precision-Optimized Volume Rendering
1 Numerical-Precision-Optimized Volume Rendering
Sqeeze
Ingmar Bitter, Neophytos Neophytou, Klaus Mueller, Arie Kaufman
7 Outline
- Numerical precision - a rendering resource
- Fixed-point arithmetic
- Reverse order precision analysis
  - Compositing, shading, gradients, classification, sampling/splatting, sample/splat location
- Results
- Conclusions
10 Numerical Precision - A Resource
- Double precision computation for all: ideal?
  - slower than all other alternatives
  - not possible on graphics cards (at least for now)
  - expensive on custom chip implementations
  - and most importantly
  - not needed to create best possible images!
    - reasons: predominantly 8-bit displays (per channel)
    - limited range intervals throughout
11 Current Status
- Stable volume rendering pipeline, both CPU and GPU [LL94, Lev88, MJC02, Wes90, EKE01, RSEB00]
- Interpolation before classification, even for splatting [MMC99]
- Caching optimized for volume rendering [Kni00, LCCK02, PSL98]
- Precision-limited rendering systems: ATI, NVidia, VolumePro [PHK99], VizardII [MKW02], UltraVis [Kni00]
- Completely fixed final output image display bit precision
  - 8 bits per RGB color channel on CRTs and LCDs
  - 8 bits max in DVI standard
  - SGI's 12-bit color displays are nearly extinct
  - Radiologists' requirements are not mass market; the same analysis applies
12 OpenGL Arithmetic: 1 * 1 = 1?
- Representation: $[0, 255] \Rightarrow a = b = 255$
- Computation: $(a_{[0,255]} \cdot b_{[0,255]}) \gg 8$
  - $= 254$ → wrong
  - → 1 mult, one shift
- Alternatively: $\mathit{tmp} = a_{[0,255]} \cdot b_{[0,255]} + 128$; $\mathit{result} = (\mathit{tmp} + (\mathit{tmp} \gg 8)) \gg 8$
  - $= 255$, correct [Bli95]
  - → 1 mult, 2 adds, 2 shifts
13 OpenGL Arithmetic: 1 * 1 = 1?
- Representation: fixed-point I.Fb
  - I.Fb: I integer bits, F fraction bits
  - 8 bits → 1.7b fixed-point number
  - then $a = b = 1^{1.7b} = 128$
- Computation: $(a^{1.7b} \cdot b^{1.7b}) \gg 7$
  - $= 128$ → correct
  - → 1 mult, one shift
  - → one fewer bit of resolution, but OK (we will see)
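The three multiplication variants from the two OpenGL arithmetic slides can be compared directly. The following is a minimal sketch; the function names are illustrative and not taken from the talk.

```python
def mul_shift8(a, b):
    """Naive 8-bit multiply with 255 representing 1.0: one mult, one shift."""
    return (a * b) >> 8


def mul_rounded(a, b):
    """Rounded variant cited as [Bli95]: one mult, two adds, two shifts."""
    tmp = a * b + 128
    return (tmp + (tmp >> 8)) >> 8


def mul_fixed_1_7(a, b):
    """1.7b fixed-point multiply with 128 representing 1.0: one mult, one shift."""
    return (a * b) >> 7


print(mul_shift8(255, 255))     # 1 * 1 in the [0, 255] representation
print(mul_rounded(255, 255))    # same product with the rounding correction
print(mul_fixed_1_7(128, 128))  # 1 * 1 in 1.7b fixed point
```

The 1.7b form keeps the cheap single shift while still mapping 1 * 1 to 1, at the cost of one bit of resolution.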
15 Reverse Order Precision Analysis
- Unified ray casting and splatting pipelines
- Composite creates the final image
- Precision requirements propagate backwards
[Figure: pipeline stages for ray casting (Sample Location, Sample) and splatting (Splat Location, Splat), followed by Classify, Gradient, Shade, Composite]
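17 Compositing - Math
- Pre-(alpha)-multiplied colors
  - $C \rightarrow \alpha C = (\alpha R, \alpha G, \alpha B)$
- Alpha correction ($r$ samples per unit)
  - $T_{\text{corrected}} = (1 - \alpha)^{r}$
- With back-to-front compositing
  - $C_{\text{CompositingBuffer}} \leftarrow T_{\text{corrected}} \cdot C_{\text{CompositingBuffer}} + C_{\text{front}}$
  - $T_{\text{CompositingBuffer}} \leftarrow T_{\text{corrected}} \cdot T_{\text{CompositingBuffer}}$, with $\alpha_{\text{CompositingBuffer}} = 1 - T_{\text{CompositingBuffer}}$
  - perform multiplication $N$ times per pixel
  - → correct solution needs $N \cdot F \cdot r$ bits precision
[Figure: compositing buffer (T, C) combined with the incoming $T_{\text{corrected}}$, $C_{\text{front}}$ to give the updated compositing buffer]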
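As a floating-point reference for the back-to-front update, a minimal sketch follows. It assumes samples arrive as (pre-multiplied color, alpha) pairs ordered back to front and that the buffer starts empty (color 0, transparency 1); the function names and the single-channel color are illustrative simplifications.

```python
def corrected_transparency(alpha, r=1.0):
    """Alpha correction as written on the slide: T_corrected = (1 - alpha)^r."""
    return (1.0 - alpha) ** r


def composite_back_to_front(samples, r=1.0):
    """Accumulate (premultiplied_color, alpha) samples, ordered back to front."""
    c_buf, t_buf = 0.0, 1.0  # empty buffer: no color, fully transparent
    for c_front, alpha in samples:
        t = corrected_transparency(alpha, r)
        c_buf = t * c_buf + c_front  # attenuate what lies behind, add the new sample
        t_buf = t * t_buf            # accumulated transparency of the whole ray
    return c_buf, t_buf


print(composite_back_to_front([(0.5, 0.5), (0.25, 0.25)]))
```

Each sample costs one multiplication per buffer component, which is the per-pixel multiplication count N that the precision analysis builds on.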
22 Compositing Precision Theory
- 8-bit destination resolution
  - therefore all partial results can be rounded
  - drop all bits not contributing to the 8 most significant bits (MSB)
- Adding $N = 2^p$ samples
  - allows $8 + p$ bits to influence the 8 MSB
- Conversion from $\alpha_{\text{CompositingBuffer}} C$ to $C$ for display (division)
  - allows $8 + p$ more bits to influence the 8 MSB
- Conversion from $\alpha_{\text{corrected}} C$ to $C$ for display
  - allows $r$ times as many bits to influence the 8 MSB
- Sufficient resolution is $r \cdot 2 \cdot (8 + p)$ for $C$, $r \cdot (8 + p)$ for $\alpha$
  - 32/16 bits for $C$/$\alpha_{\text{CompositingBuffer}}$ for $256^3$ volumes and no super-sampling
  - 608 bits for $512^2 \times 2048$ volumes and 16 samples per voxel
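The bit counts quoted above follow from the two formulas. The sketch below assumes p is the base-2 logarithm of the number of slices along the ray (rounded up) and r is the number of samples per voxel, a reading that reproduces both quoted figures; the function name is illustrative.

```python
def compositing_bits(num_slices, r=1):
    """Return (bits for C, bits for alpha) using r*2*(8+p) and r*(8+p)."""
    p = (num_slices - 1).bit_length()  # smallest p with 2**p >= num_slices
    return r * 2 * (8 + p), r * (8 + p)


print(compositing_bits(256))         # 256^3 volume, no super-sampling
print(compositing_bits(2048, r=16))  # 512^2 x 2048 volume, 16 samples per voxel
```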
28 Compositing Precision Practice
- No alpha correction ($r = 1$): $2 \cdot (8 + p)$ bits
- Iso-surface rendering using old-fashioned OpenGL
  - store not $\alpha C$ but $C$ in frame buffer: $(8 + p)$
  - bright colors: $5 + p$
  - at most 8 non-zero samples per ray ($p = 3$): $5 + 3 = 8$ bits
  - → standard 24 bit RGBA frame buffer is adequate
- Fog visualization
  - what matters is the ability to see objects through volumetric fog (substance with low opacity)
  - visual experiments show 15 fractional bits are sufficient
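29 Compositing Conclusion
[Figure: least-significant-bit fog at various bit precisions (8, 10, 12, 14, 15, 16); $512^3$ dataset, $r = 2$]
- Preferred bit-aware back-to-front compositing equations
  - $\alpha C^{1.15b} \leftarrow T^{1.15b}_{\text{sample}} \cdot \alpha C^{1.15b} + C^{1.15b}_{\text{sample}}$
  - $T^{1.15b} \leftarrow T^{1.15b}_{\text{sample}} \cdot T^{1.15b}$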
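The 1.15b compositing update can be sketched with plain integers. This assumes one integer bit and 15 fraction bits (32768 represents 1.0), truncation after each multiply, and samples given as (pre-multiplied color, transparency) pairs ordered back to front; `to_fixed` and `composite_fixed` are hypothetical helper names.

```python
F = 15        # fraction bits of the 1.15b format
ONE = 1 << F  # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float in [0, 1] to 1.15b."""
    return int(round(x * ONE))


def composite_fixed(samples):
    """Back-to-front compositing of (color, transparency) samples in 1.15b."""
    c_buf, t_buf = 0, ONE
    for c_sample, t_sample in samples:
        c_buf = ((t_sample * c_buf) >> F) + c_sample  # one mult, one shift, one add
        t_buf = (t_sample * t_buf) >> F               # one mult, one shift
    return c_buf, t_buf


samples = [(to_fixed(0.5), to_fixed(0.5)), (to_fixed(0.25), to_fixed(0.75))]
c, t = composite_fixed(samples)
print(c / ONE, t / ONE)
```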
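30 Shading - Math
- Phong: $C_{\text{color}} = k_{\text{ambient}}\, O_{\text{objectColor}}\, I_{\text{lightIntensity}} + k_{\text{diffuse}}\, O \sum_i I_i\, (N \cdot L_i) + k_{\text{specular}} \sum_i I_i\, (R \cdot L_i)^{r}$
- $k \in [0, 1]$; $k_{\text{ambient}} + k_{\text{diffuse}} + k_{\text{specular}} = 1$
- $O_{\text{objectColor}}$ (8 bit) and $I_{\text{lightIntensity}} \in [0, 1]$
- $N \cdot L_i$ and $R \cdot L_i \in [-1, 1]$, but $\in [0, 1]$ after clamping
- Phong $C_{\text{color}} \in [0, 1]$ (possibly clamping $\sum_i$)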
31 Shading - Analysis
- Phong $C_{\text{color}}$ needs to be as precise as 1.15b
- Use 16.16b for all multiplications $[0, 1] \cdot [0, 1]$
  - sufficient precision and no overflow
33 Shading - New Computation
- Replace specular exponentiation with recursive multiplies
  - repeatedly multiply number with itself
  - works for all exponents $r = 2^n$
  - when $r = 2^6$ (16 bit precision), then max error < 0.005
  - better results than Knittel's parabola approximation
[Figure: comparison of Knittel's parabola, pow, and $r = 2^n$]
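Repeated squaring in fixed point can be sketched as follows. The sketch assumes 16 fraction bits with truncation after each squaring and measures the error against floating-point exponentiation on a uniform grid of inputs in [0, 1]; the function names and the grid size are illustrative choices, not the authors' test setup.

```python
F = 16        # fraction bits (16.16b format)
ONE = 1 << F  # fixed-point representation of 1.0


def pow_2n_fixed(x_fixed, n):
    """Raise a fixed-point value in [0, 1] to the power 2**n by squaring n times."""
    for _ in range(n):
        x_fixed = (x_fixed * x_fixed) >> F
    return x_fixed


def max_error(n=6, steps=1000):
    """Largest deviation from x ** (2 ** n) over a uniform grid on [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        x = i / steps
        approx = pow_2n_fixed(int(x * ONE), n) / ONE
        worst = max(worst, abs(approx - x ** (2 ** n)))
    return worst


print(max_error(6))  # stays below the 0.005 bound quoted for r = 2^6
```

Only n multiplies and n shifts are needed per specular term, regardless of how large the exponent 2^n is.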
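34 Shading - Conclusion
- Preferred bit-aware Phong shading equation
  - $C^{16.16b} = k^{16.16b}_{\text{ambient}}\, O^{0.8b}_{\text{objectColor}}\, I^{16.16b}_{\text{light}} + k^{16.16b}_{\text{diffuse}}\, O^{0.8b} \sum_i I^{16.16b}_i\, (N^{16.16b} \cdot L^{16.16b}_i) + k^{16.16b}_{\text{specular}} \sum_i I^{16.16b}_i\, (R^{16.16b} \cdot L^{16.16b}_i)^{2^n}$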
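35 Gradients - Math
- $G_x = 0.5 \cdot \text{sample}(x+1, y, z) - 0.5 \cdot \text{sample}(x-1, y, z)$
- $G_y = 0.5 \cdot \text{sample}(x, y+1, z) - 0.5 \cdot \text{sample}(x, y-1, z)$
- $G_z = 0.5 \cdot \text{sample}(x, y, z+1) - 0.5 \cdot \text{sample}(x, y, z-1)$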
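The central-difference gradient can be written as a short function. In this sketch the volume is abstracted as a callable `sample(x, y, z)`, and `ramp` is a hypothetical test volume used only to check the result.

```python
def central_gradient(sample, x, y, z):
    """Central-difference gradient (Gx, Gy, Gz) of a scalar field."""
    gx = 0.5 * sample(x + 1, y, z) - 0.5 * sample(x - 1, y, z)
    gy = 0.5 * sample(x, y + 1, z) - 0.5 * sample(x, y - 1, z)
    gz = 0.5 * sample(x, y, z + 1) - 0.5 * sample(x, y, z - 1)
    return gx, gy, gz


def ramp(x, y, z):
    """Hypothetical linear test volume with a known gradient of (1, 2, 3)."""
    return x + 2 * y + 3 * z


print(central_gradient(ramp, 5, 5, 5))
```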
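36 Gradients - Analysis
- $G \rightarrow G^{1.Fb}$
- Discrete nearest gradient vector neighbors
  - $\sin\varphi = 1/2^F$, $\sin\varphi \approx \varphi \Rightarrow \varphi \approx 1/2^F$
- Maximum error for specular intensity, large $r$
  - $r = 64$: $1^{64} = 1$, but $1^{64} \rightarrow (1 - 1/2^F)^{64}$
  - error of 22%, 6.1%, 1.6%, 0.4% for $F$ of 8, 10, 12, 14
[Figure: angle $\varphi$ between neighboring discrete gradient vectors]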
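The quoted specular errors can be reproduced from the expression $1 - (1 - 1/2^F)^{64}$. A minimal sketch, with an illustrative function name:

```python
def specular_quantization_error(F, r=64):
    """Relative error when a dot product of 1 is quantized to 1 - 2**-F and raised to r."""
    return 1.0 - (1.0 - 1.0 / (1 << F)) ** r


for F in (8, 10, 12, 14):
    print(F, f"{100 * specular_quantization_error(F):.2f}%")
```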
37 Gradients - Analysis
- $512^3$-sized spheres with Phong highlights
  - 4, 6, 8, 10, 12, 14 bit gradients
  - Diffuse artifacts for 4 and 6 bits
  - Specular artifacts up to 10 bits
[Figure: rendered spheres labeled by gradient bit precision (4, 6, 8, 10, 12, 14)]
38 Gradients - Conclusion
- Thus, 12 bits dynamic range is needed
- Now consider normalization
  - reduces I.Fb to 1.Fb
  - up to I bits will be added to the fractional part
- Volume samples often have 12 bits
  - $G_{x,y,z}$ with 12.12b minimum representation
  - $G_{x,y,z}$ with 16.16b preferred representation
  - leaves room for interpolation bits in normalization
42 Classification - Prelims and Recaps
- Use of $T$ instead of $\alpha$ is more efficient in compositing operation
- Largest visual precision/quantization error occurs at high transparencies (low opacities)
  - need more bits for $T$ than for $C$, just to be sure
- Want transfer function lookup table to be cache-friendly
  - power-of-2 RGBA-tuple alignment
- Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02]
44 Classification - Math
- Desired lookup table entries
  - $R^{1.8b} G^{1.8b} B^{1.8b} T^{1.16b}$ → 5.5 bytes
- Common lookup table entries
  - $R^{0.8b} G^{0.8b} B^{0.8b} \alpha^{0.8b}$ → 4 bytes
- Better lookup table entries
  - $R^{0.8b} G^{0.8b} B^{0.8b} \sqrt{\alpha}^{\,0.8b}$ → spreads low $\alpha$
- Computed after lookup: $T = 1 - (\sqrt{\alpha})^2$
  - $R^{0.8b} G^{0.8b} B^{0.8b} T^{1.16b}$ → squaring doubles precision
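The effect of storing $\sqrt{\alpha}$ instead of $\alpha$ in 8 bits can be illustrated with a small sketch. It assumes the 0.8b code maps 255 to 1.0 and uses rounding when encoding; the function names are illustrative.

```python
import math


def encode_alpha_linear(alpha):
    """Common table entry: alpha stored directly in 8 bits."""
    return round(alpha * 255)


def encode_alpha_sqrt(alpha):
    """Better table entry: sqrt(alpha) stored in 8 bits, spreading out low opacities."""
    return round(math.sqrt(alpha) * 255)


def transparency_from_sqrt_code(code):
    """After lookup: square the 8-bit code (a 16-bit result), then T = 1 - alpha."""
    return 1.0 - (code * code) / (255 * 255)


faint = 0.0001  # a very low opacity, such as thin fog
print(encode_alpha_linear(faint))  # linear 8-bit storage loses it entirely
print(encode_alpha_sqrt(faint))    # sqrt storage keeps a non-zero code
print(1.0 - transparency_from_sqrt_code(encode_alpha_sqrt(faint)))
```

The table entry stays at 4 bytes, while the squared value carries roughly twice as many significant bits at the low-opacity end.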
45 Classification - Conclusion
[Figure: foot with least-significant-thin-tissue fog, rendered with $\alpha^{0.8b}$, $\sqrt{\alpha}^{\,0.8b}$, and $\alpha^{0.16b}$]
- Preferred bit-aware lookup table entries: $R^{0.8b} G^{0.8b} B^{0.8b} \sqrt{\alpha}^{\,0.8b}$
46 Sample Interpolation - Math
- $\text{sample} = \text{voxel}_0 \cdot (1 - w) + \text{voxel}_1 \cdot w$
- $\text{sample} = w \cdot (\text{voxel}_1 - \text{voxel}_0) + \text{voxel}_0$
- Requirements
  - $G_{x,y,z}$, derived from samples, need 12 bit dynamic range
  - samples need 12 bit values for transfer function lookup
  - cover both low and high dynamic range neighborhoods
- Therefore, $\text{sample}^{12.12b}$ is a minimum requirement
  - integer part comes from voxels → $\text{voxel}^{12.0b}$
  - fractional part comes from interpolation → $w^{1.12b}$
47 Sample Interpolation - Conclusion
- Preferred bit-aware sample interpolation
  - $\text{sample}^{12.12b} = w^{1.12b} \cdot (\text{voxel}_1^{12.0b} - \text{voxel}_0^{12.0b}) + \text{voxel}_0^{12.0b}$
- Splats start on voxels, need no interpolation
  - $\text{splat}^{12.0b} = \text{voxel}^{12.0b}$
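The one-multiply interpolation form maps directly onto integers. The sketch below assumes 12-bit unsigned voxel values and a weight in 1.12b (4096 represents 1.0); the function name is illustrative.

```python
F = 12  # fraction bits of the interpolation weight and of the result


def interpolate_fixed(voxel0, voxel1, w_fixed):
    """sample(12.12b) = w(1.12b) * (voxel1 - voxel0) + voxel0, all in integers."""
    # The product already carries 12 fraction bits; voxel0 is shifted to match.
    return w_fixed * (voxel1 - voxel0) + (voxel0 << F)


halfway = interpolate_fixed(100, 200, 1 << (F - 1))  # w = 0.5
print(halfway / (1 << F))
```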
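48 Sample Location - Math
- $k$-th sample location $= \text{startPos} + \sum_k V_{\text{inc}}$
- Perspective rays need to differ enough to allow 1024 rays across 60 degrees, or 0.05°
  - $\sin\varphi = (k \cdot 1/2^F) / k$, $\sin\varphi \approx \varphi \Rightarrow \varphi \approx 1/2^F$
  - $F = 6, 12, 16$ → $\varphi = 0.9°, 0.05°, 0.0009°$
- Also, need to address 2048 slices (integer positions) → 11 bits
- Thus, need overall 11.12b
[Figure: angle $\varphi$ between neighboring ray directions after $k$ steps]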
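The angular resolution implied by F fraction bits can be evaluated numerically. This sketch takes $\sin\varphi = 1/2^F$ literally and converts to degrees; the function name is illustrative, and the comparison against 60°/1024 only checks that a given F meets the stated ray budget.

```python
import math


def angular_step_degrees(F):
    """Smallest direction change, in degrees, representable with F fraction bits."""
    return math.degrees(math.asin(1.0 / (1 << F)))


budget = 60.0 / 1024  # 1024 rays across 60 degrees
for F in (6, 12, 16):
    step = angular_step_degrees(F)
    print(F, step, "meets budget" if step <= budget else "too coarse")
```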
49 Sample Location - Conclusion
- Preferred bit-aware sample location
  - perspective projection: $\text{sampleLocation}^{11.12b} = \text{startPos}^{11.12b} + \sum V_{\text{inc}}^{1.12b}$
  - parallel projection: $\text{sampleLocation}^{11.6b}$ (0.9° OK)
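Stepping along a ray by repeated addition of the increment can be sketched per coordinate axis. The sketch assumes 12 fraction bits and shows a single axis for brevity; `to_fixed` and `sample_locations` are hypothetical helper names.

```python
F = 12  # fraction bits of the 11.12b sample location


def to_fixed(x):
    """Convert a float coordinate to fixed point with F fraction bits."""
    return int(round(x * (1 << F)))


def sample_locations(start_pos, v_inc, count):
    """Return the first `count` fixed-point locations startPos + k * Vinc on one axis."""
    pos, inc = to_fixed(start_pos), to_fixed(v_inc)
    locations = []
    for _ in range(count):
        locations.append(pos)
        pos += inc  # one integer add per sample
    return locations


print([loc / (1 << F) for loc in sample_locations(1.5, 0.25, 3)])
```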
50 Splat Scan Conversion - Math
- Splats project onto image grid → reverse rays
- Allow as many as 2048 splat rays across 60 degrees, or 0.025°
- Hence, twice the ray casting precision
  - one extra fractional bit: $F = 13$
- Also address 2048 slices (11 bits)
- Thus, need overall 11.13b
51 Splat Scan Conversion - Conclusion
- Preferred bit-aware splat scan conversion
  - $\text{splatLocation}^{11.13b} = \text{startVoxelPos}^{11.13b} + \sum V_{\text{inc}}^{1.13b}$
- Splats are usually pre-transformed and stored in bucket lists (one per sheet-buffer)
- Preferred voxel location sheet buffer format: $x^{11.13b}\, u^{8.0b}\, y^{11.13b}\, v^{8.0b}$ (64 bits total)
  - $x, y$: location on splat plane
  - $u$: index into pre-integrated splat table
  - $v$: voxel value
[Figure: splat at location (x, y) with table index u]
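The 64-bit sheet buffer record can be packed and unpacked with shifts and masks. The field order inside the word (x in the top 24 bits, then u, y, v) is an assumption made for this sketch, as are the function names; the slide only fixes the field widths.

```python
def pack_splat(x, u, y, v):
    """Pack x, y (24-bit 11.13b values) and u, v (8-bit values) into one 64-bit word."""
    assert 0 <= x < (1 << 24) and 0 <= y < (1 << 24)
    assert 0 <= u < (1 << 8) and 0 <= v < (1 << 8)
    return (x << 40) | (u << 32) | (y << 8) | v


def unpack_splat(word):
    """Inverse of pack_splat: return (x, u, y, v)."""
    return ((word >> 40) & 0xFFFFFF, (word >> 32) & 0xFF,
            (word >> 8) & 0xFFFFFF, word & 0xFF)


x = (1000 << 13) | 4096  # 1000.5 in 11.13b
y = (2047 << 13) | 8191  # largest 11.13b value
word = pack_splat(x, 17, y, 255)
print(hex(word), unpack_splat(word))
```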
52 Results
- Summary of minimum precision requirements

Rendering Stage        Input     Output
Sample locations       N/A       11.12b
Sample interpolation   12.00b    12.12b
Classification         12.00b    4 × 0.8b
Gradients              12.12b    1.12b
Shading                1.12b     1.15b
Compositing            1.15b     1.15b
53 Results
- Restricted iso-surface rendering
  - texture map volume rendering can be done using plain OpenGL or DirectX and 8 bit frame buffers
- General volume rendering, all pipeline stages
  - 32 bit single precision floating point format
  - 16.16b fixed point format (up to 4x faster in our tests)
  - Pentium allows 2 simple 32-bit integer ops per clock cycle
54 Conclusions
- 8 bits per RGB channel on final display
- Analysis of requirements by back propagation
- Sufficient precision computations using
  - either 32 bits single precision floating point format
  - or 16.16b fixed point format
- Voxel location sheet buffer: $x^{11.13b}\, u^{8.0b}\, y^{11.13b}\, v^{8.0b}$
- Transfer functions stored as $R^{0.8b} G^{0.8b} B^{0.8b} \sqrt{\alpha}^{\,0.8b}$
- Compositing/fragment buffer: $R^{1.15b} G^{1.15b} B^{1.15b} T^{1.15b}$
55 Acknowledgements
- Hewlett Packard Laboratories
- ONR grant N000140110034
- NSF CAREER grant ACI-0093157
- DOE grant MO-068
- Thanks to Tom Malzbender and Michael Meissner for technical discussions.
- Thanks to Ronald Summers for resources.