Title: Towards Acquiring and Rendering Real
1Hardware-Assisted Visibility Sorting for
Tetrahedral Volume Rendering
Steven Callahan Milan Ikits João Comba
Cláudio Silva
2Overview
- Introduction
- Previous Work
- Hardware-Assisted Visibility Sorting
- Results
- Future Work
- Conclusion
3Research Goal
- Real-time volume rendering
- Scalable (machine performance)
- Data of arbitrary size
- Simple and robust implementations
4Volume Rendering
Regular
Irregular
5Why Irregular Grids ?
- Unstructured grids are the preferred data type in
scientific computations
- Level-of-detail (LOD) techniques intrinsically
need unstructured grids
El-Sana et al, Ben-Gurion
6 Optical Models
Absorption plus emission
Light
s
Δs
7Compositing
Front-to-back
I1
I0
I2
α1
α2
α0
I01
I2
α2
α01
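The compositing step on this slide can be sketched as follows. This is a minimal illustration, assuming each sample carries an intensity I that is already weighted by its opacity α; the function name and the sample values are made up for the example and do not come from the slides.

```python
def composite_front_to_back(samples):
    """Composite (intensity, alpha) samples ordered from nearest to farthest.

    Assumption: intensities are premultiplied by their alpha.
    """
    intensity, alpha = 0.0, 0.0
    for i_k, a_k in samples:
        # Only the light not yet absorbed by nearer samples gets through.
        intensity += (1.0 - alpha) * i_k
        alpha += (1.0 - alpha) * a_k
    return intensity, alpha


# Illustrative samples: (I0, a0), (I1, a1), (I2, a2).
samples = [(0.2, 0.5), (0.3, 0.5), (0.4, 1.0)]

# Merge the first two samples into (I01, a01), then add the third.
i01, a01 = composite_front_to_back(samples[:2])
merged = composite_front_to_back([(i01, a01), samples[2]])

# Compositing all three samples at once gives the same result.
full = composite_front_to_back(samples)
```

Because the operation is associative, the pair (I01, α01) can stand in for the first two samples, which is what the slide's second row of labels suggests.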
8Volume Rendering (Intersection) Sampling
Sorting
9Sampling Triangle-Based Approach
Class 1 (+, +, +, -)
Class 2 (+, +, -, -)
Projected Tetrahedra (Shirley and Tuchman, 1990)
10Sorting
Application
Object-Space Sorting
i.e., let's sort the geometry!
Rasterization
Image Space
Display
11Cell-Projection
12Object-Space Sorting Williams MPVO
Idea: Define ordering relations by looking at
shared faces.
D
A
C
F
B
E
Viewing direction
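A rough sketch of the shared-face idea, not Williams's actual MPVO implementation: each shared face yields a "comes before" relation depending on which side faces the viewer, and a topological sort of those relations gives a front-to-back order. The input layout (a list of `(cell_a, cell_b, normal)` tuples), the orthographic `view_dir`, and the three-cell example are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def front_to_back_order(cells, shared_faces, view_dir):
    """Order cells front to back from relations on shared faces.

    shared_faces holds (cell_a, cell_b, normal) tuples, where normal is the
    face normal pointing from cell_a into cell_b. view_dir points from the
    eye into the scene (orthographic view assumed).
    """
    sorter = TopologicalSorter({c: set() for c in cells})
    for a, b, normal in shared_faces:
        d = dot(normal, view_dir)
        if d > 0:
            sorter.add(b, a)  # a is in front of b, so a comes first
        elif d < 0:
            sorter.add(a, b)  # b is in front of a
        # d == 0: the face is seen edge-on and imposes no relation
    return list(sorter.static_order())


# Three hypothetical cells in a row along +x.
cells = ["A", "B", "C"]
faces = [("A", "B", (1, 0, 0)), ("B", "C", (1, 0, 0))]
order = front_to_back_order(cells, faces, (1, 0, 0))
```

Cells that share no face with each other get no relation here, so their relative order is arbitrary.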
13MPVO Limitations
Missing relations!
14XMPVO
Idea: Use ray-shooting queries to complement
ordering relations.
C
D
A
B
A < B
Viewing direction
15Sorting
i.e., let's sort the pixels!
16Image-Space Sorting A-Buffer
- Idea: Keep a list of intersections for each
pixel.
Carpenter 1984
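A toy sketch of the per-pixel list idea. The class and method names are hypothetical, and a real A-buffer stores more per entry; this only shows that fragments may arrive in any order and are sorted per pixel afterwards.

```python
from collections import defaultdict


class ABuffer:
    """Per-pixel lists of (depth, value) intersections."""

    def __init__(self):
        self.pixels = defaultdict(list)

    def add(self, x, y, depth, value):
        # Fragments can arrive in any order; just append.
        self.pixels[(x, y)].append((depth, value))

    def resolve(self, x, y):
        """Return the pixel's intersections sorted front to back."""
        entries = self.pixels.get((x, y), [])
        return sorted(entries, key=lambda entry: entry[0])


buf = ABuffer()
buf.add(0, 0, 3.0, "c")
buf.add(0, 0, 1.0, "a")
buf.add(0, 0, 2.0, "b")
front_to_back = [value for _, value in buf.resolve(0, 0)]
```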
17Cell-Projection With An A-Buffer
18Cell-Projection With An A-Buffer
19Cell-Projection With An A-Buffer
20Cell-Projection With An A-Buffer
21Cell-Projection With An A-Buffer
22Cell-Projection With An A-Buffer
23Cell-Projection With An A-Buffer
24Cell-Projection With An A-Buffer
25Cell-Projection With An A-Buffer
26Cell-Projection With An A-Buffer
27Cell-Projection With An A-Buffer
28Cell-Projection With An A-Buffer
29Cell-Projection With An A-Buffer
Not sorted!
30Cell-Projection With An A-Buffer
Sorted!
31A-Buffer Limitations
Number of intersections: O(cn²)
n × n pixels
c cells
- Problems
- Time: sorting takes too long
- Memory: storage too high
32Sorting
33Approximate Object-Space Sorting
1
34Approximate Object-Space Sorting
1
2
35Approximate Object-Space Sorting
3
1
2
36Approximate Object-Space Sorting
3
5
1
4
2
37Approximate Object-Space Sorting
3
6
5
7
1
4
2
38Approximate Object-Space Sorting
3
7
5
6
1
4
2
A Solution: Use an insertion-sort A-buffer!
39Approximate Object-Space Sorting
What about the space problem?
3
7
5
6
1
4
2
→ Use a conservative bound on the intersections
40Hardware Assisted Visibility Sorting (HAVS)
- Sort in image-space and object-space
- Do an approximate object-space sorting of the
cells on the CPU (i.e., sort by face centroid)
- Complete the sort in image-space by using a
fixed-depth A-buffer (called a k-buffer)
implemented on the GPU
- Can handle non-convex meshes, has a low memory
overhead, and requires minimal pre-processing of
data
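The CPU stage could look roughly like the sketch below. It assumes triangular faces given as three (x, y, z) vertices and sorts them by squared distance from the eye to each face centroid; the function names and data layout are illustrative, not taken from the HAVS code.

```python
def centroid(face):
    """Average of a triangle's three vertices."""
    return tuple(sum(v[i] for v in face) / 3.0 for i in range(3))


def sort_faces_by_centroid(faces, eye):
    """Return faces ordered by squared distance from eye to face centroid.

    This is only an approximate visibility order: two faces can be ordered
    wrongly even though their centroids are ordered correctly.
    """
    def dist2(face):
        c = centroid(face)
        return sum((c[i] - eye[i]) ** 2 for i in range(3))

    return sorted(faces, key=dist2)


# Two hypothetical triangles, the farther one listed first.
far_face = ((0, 0, 5), (1, 0, 5), (0, 1, 5))
near_face = ((0, 0, 1), (1, 0, 1), (0, 1, 1))
ordered = sort_faces_by_centroid([far_face, near_face], eye=(0, 0, 0))
```

The order only has to be approximately right, since the image-space stage finishes the sort.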
41HAVS Overview
42k-buffer
- Fixed-size A-buffer of depth k
- Fragment stream sorter
- Stores k entries for each pixel. Each entry
consists of the fragment's scalar value and its
distance to the viewpoint
- An incoming fragment replaces the entry that is
closest to the eye (front-to-back compositing)
- Given a sequence of fragments such that each
fragment is within k positions of its position
in sorted order, it will output the fragments in
sorted order
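The stream-sorting behavior for a single pixel can be simulated in software. This is a sketch only: the heap is a convenience for the simulation (the GPU version compares entries directly in the fragment shader), and the function name and sample fragments are hypothetical.

```python
import heapq


def k_buffer_stream(fragments, k):
    """Yield (distance, scalar) fragments using a buffer of k entries."""
    buffer = []
    for frag in fragments:
        heapq.heappush(buffer, frag)
        if len(buffer) > k:
            # The buffer held k entries plus the incoming fragment:
            # emit the one closest to the eye and keep the other k.
            yield heapq.heappop(buffer)
    while buffer:
        # Flush the entries still stored once the stream ends.
        yield heapq.heappop(buffer)


# Each fragment is at most one position away from its sorted position.
nearly_sorted = [(2.0, 0.2), (1.0, 0.1), (4.0, 0.4), (3.0, 0.3)]
sorted_out = list(k_buffer_stream(nearly_sorted, k=1))

# A fragment that is two positions out of place defeats k = 1.
too_far = [(3.0, 0.3), (2.0, 0.2), (1.0, 0.1)]
unsorted_out = list(k_buffer_stream(too_far, k=1))
```

The second case shows why the choice of k matters: once a fragment arrives later than the buffer can hold, the output order is wrong for that pixel.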
43k-buffer Hardware Implementation
- Use the multiple render target capability of ATI
graphics cards (ATI_draw_buffers in OpenGL)
- Use a P-buffer to accumulate color and opacity and
three Aux buffers for the k-buffer entries
P-buffer
Aux 0
Aux 1
Aux 2
44Fragment Shader Overview
45Details
- Fix incorrect screen-space texture coordinates
caused by perspective-correct interpolation
Projecting vertices to find tex coords
Projecting tex coords in shader
Perspective interpolation
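A small numeric sketch of why the two approaches on this slide differ, reduced to one dimension. It assumes the standard perspective-correct interpolation formula for vertex attributes; the clip-space values are made up for illustration.

```python
def perspective_interp(a0, a1, w0, w1, t):
    """Perspective-correct interpolation of an attribute at screen parameter t."""
    num = (1 - t) * a0 / w0 + t * a1 / w1
    den = (1 - t) / w0 + t / w1
    return num / den


# Two vertices in clip space as (x, w). Illustrative values only.
x0, w0 = 1.0, 1.0
x1, w1 = 8.0, 4.0
s0, s1 = x0 / w0, x1 / w1            # projected screen positions
t = 0.5                              # halfway between them on screen
true_screen = (1 - t) * s0 + t * s1  # where the fragment actually is

# Projecting at the vertices, then interpolating the projected coordinate.
projected_at_vertices = perspective_interp(s0, s1, w0, w1, t)

# Interpolating unprojected x and w, then dividing per fragment.
x_frag = perspective_interp(x0, x1, w0, w1, t)
w_frag = perspective_interp(w0, w1, w0, w1, t)
projected_in_shader = x_frag / w_frag
```

With these values the per-fragment division recovers the true screen coordinate, while the coordinate projected at the vertices is pulled toward the vertex with the smaller w.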
46Details
- Simultaneously reading and writing to a buffer is
undefined when fragments are rasterized in
parallel
47Details
- The buffers are initialized and flushed using k
screen-aligned rectangles with negative scalar
values
- Handling non-convex objects requires tagging the
exterior faces with a negative distance d and
keeping track of whether we are inside or outside
the mesh using the sign of the scalar value v
48Details
- Early ray termination reads the accumulated opacity
and kills the fragment if it is over a given
threshold. Early z-test is currently not
available on the ATI 9800 when using multiple
render targets
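Early ray termination can be illustrated on the CPU as a compositing loop that stops once the accumulated opacity passes a threshold. The threshold of 0.99, the function name, and the sample values are assumptions for this sketch; on the GPU the test runs per fragment rather than as a loop.

```python
def composite_with_early_termination(samples, threshold=0.99):
    """Front-to-back compositing that skips fragments behind opaque pixels.

    samples are (intensity, alpha) pairs ordered nearest first, with
    intensities premultiplied by alpha. Returns (intensity, alpha, used).
    """
    intensity, alpha, used = 0.0, 0.0, 0
    for i_k, a_k in samples:
        if alpha > threshold:
            break  # the pixel is effectively opaque: kill later fragments
        intensity += (1.0 - alpha) * i_k
        alpha += (1.0 - alpha) * a_k
        used += 1
    return intensity, alpha, used


# The first sample is fully opaque, so the other two are never composited.
result = composite_with_early_termination([(0.5, 1.0), (0.3, 0.5), (0.2, 0.5)])
```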
49Pre-Integrated Transfer Function
- Previous Work
- Volume density optical model
- Williams and Max 1992
- Pre-integration on GPU
- Roettger et al. 2000
- 5 s to update a 128x128x128 table
- Incremental pre-integration on CPU
- Weiler et al. 2003
- 1.5 s to update a 128x128x128 table
50Pre-Integrated Transfer Function
Williams and Max
51Pre-Integrated Transfer Function
52Pre-Integrated Transfer Function
Weiler et al.
53Pre-Integrated Transfer Function
- Our Approach
- Incremental pre-integration of the 3D transfer
function completely on the GPU
- Compute the base slice using Roettger et al.
- Compute the other slices using the base slice and
the previously computed slice (Weiler et al.)
- 0.067 s to update a 128x128x128 table
- This allows interactive updates to the colormap
and transfer function opacity
54Experiments
- Environment
- 3.0 GHz Pentium 4
- 1024 MB RAM
- Windows XP
- ATI Radeon 9800 Pro
- Results
- k-buffer analysis
- Performance results
55K-buffer Analysis
- Accuracy analysis
- Analysis of k depth required to correctly render
datasets
- Max values from 14 fixed viewpoints
Dataset   Max A   Max k   k > 2    k > 6
Spx2      476     22      10,262   512
Torso     649     15      43,317   1,683
Fighter   904     3       1        0
56k-buffer Analysis
- Distribution analysis
- Shows actual pixels that require large k depths
to render correctly for each viewpoint
k < 2 (green), 2 < k < 6 (yellow), k > 6 (red)
57Results
- Performance
- Average values from 14 fixed viewpoints
- Does not include partial sort on CPU
- 512 x 512 viewport with a 128 x 128 x 128
pre-integrated transfer function
Dataset   Cells   k = 2 fps   k = 2 tets/s   k = 6 fps   k = 6 tets/s
Spx2      0.8 M   2.07        1712 K         1.7         1407 K
Torso     1.1 M   3.13        3390 K         1.86        1977 K
Fighter   1.4 M   2.41        3387 K         1.56        2190 K
58Image - Blunt Fin
59Image - Spx
60Image - Torso
61Image - Fighter
62Future Work
- Optimize partial sort on CPU
- Develop techniques to refine datasets to respect
a given k (subdivide degenerate tets)
- Incorporate isosurface rendering
- Parallel techniques
- Proper hole handling
- Dynamic data
- Use early z-test
63Conclusion
- Renders up to 6 million tets/sec when using a
linear transfer function
- Handles arbitrary non-convex meshes
- Requires minimal pre-processing of data
- Maximum data size is bounded by main memory
- Uses simple vertex and fragment shaders