Title: Real-time Mesh Simplification Using the GPU


1
Real-time Mesh Simplification Using the GPU
  • Christopher DeCoro
  • Natasha Tatarchuk
  • 3D Application Research Group

2
Introduction
  • Implement Mesh Decimation in real-time
  • Utilizes new Geometry Shader stage of GPU
  • Achieves a 20x speedup over CPU

3
Project Motivation
  • Massive Increases in submitted geometry
  • Geometry rendered per shadow map (6x for
    cubemap!)
  • Not always needed at highest resolution
  • Geometry not always known at build-time
  • Dynamically-skinned objects only finalized at
    run-time
  • May be customized to user's machine based on
    capabilities, would need to be adapted at program
    load time
  • Could be dynamically generated per level, need to
    be adapted at level load time
  • Simplification therefore needs to be fast (or
    even real-time)
  • Also, just as importantly
  • We want applications that exercise and stress the
    GS/GPU
  • Evaluate new capabilities of the GPU
  • Learn how to adapt previously CPU-bound
    algorithms
  • Develop GPU-centric methodologies
  • Identify future feature set for GS/GPU as a whole
  • Limitations still exist which should be
    addressed?

4
Contributions
  • Mapping of Decimation to GPU
  • 20x speedup vs. CPU
  • Enables load-time or real-time usage
  • Detail Preservation by Non-linear Warping
  • Also applicable to CPU out-of-core decimation
  • General-purpose GPU Octree
  • Adaptive decimation w/ constant memory
  • Applications not limited to simplification:
    collision detection, frustum culling, etc.

5
Outline
  • Project Introduction and Motivation
  • Background
  • Decimation with Vertex Clustering
  • Geometry Shaders in Direct3D 10
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU

6
Vertex Clustering
  • Reduces mesh resolution
  • High-res mesh as input
  • Low-res as output
  • All implemented on the GPU
  • Ideal for processing streamed out data
  • Useful when rendering multiple times (e.g.
    shadows)
  • Can handle enormous models from scanned data
  • Based on Out-of-Core Simplification of Large
    Polygonal Models, P. Lindstrom, 2000

Figure from Lindstrom 2000
7
Previous Rendering Pipeline
  • Vertex Shaders and Pixel Shaders
  • Limited to 1 output per 1 input
  • No culling of triangles for decimation
  • Fixed destination for each stage
  • Result meshes cannot be (easily) saved and reused

8
DirectX10 Rendering Pipeline
  • Geometry Shader in between VS and PS
  • Called for each primitive (usually triangle)
  • Able to access all vertices of a primitive
  • Can compute per-face quantities
  • Breaks 1:1 input-output limitation
  • Allows triangles to be culled from pipeline
  • Allows stream-out of processed geometry
  • Decimated meshes can easily be saved and reused

9
Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
  • Overview
  • Quadric Generation
  • Optimal Position Computation
  • Final Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU

10
Algorithm Overview
  • Start with the input mesh
  • Shown divided into clusters
  • Pass 1: Compute the quadric map from mesh
  • Use GS to compute quadric
  • Accumulate in cluster map, an RT used as large
    array
  • Pass 2: For each cluster, compute optimal
    position
  • Solves a linear system given by quadrics
  • Pass 3: Collapse each vertex to representative
  • 9x9x9 grid shown

Model Courtesy of Stanford Graphics Lab
11
Vertex Clustering Pipeline
  • Pass 1: Create Quadric Map
  • Input: Original Mesh
  • Computation:
  • Determine plane equation, face quadrics for
    triangle
  • Compute the cluster and address of each vertex
  • Pack quadric into RT at appropriate address
  • Output: Render Targets representing clusters with
    packed quadrics and average positions
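
The "cluster and address" step of Pass 1 can be sketched on the CPU as follows. This is a minimal Python sketch under stated assumptions, not the presenters' shader code: the function names, the bounding-box normalization, and the row-major cell layout are all hypothetical choices made for illustration.

```python
def cluster_id(pos, bbox_min, bbox_max, grid):
    """Map a 3D position to a linear cell index in a uniform grid (row-major, assumed)."""
    idx = []
    for p, lo, hi, n in zip(pos, bbox_min, bbox_max, grid):
        t = (p - lo) / (hi - lo)             # normalize the coordinate to [0, 1]
        idx.append(min(int(t * n), n - 1))   # clamp the upper boundary into the last cell
    ix, iy, iz = idx
    return ix + grid[0] * (iy + grid[1] * iz)


def write_addr(cid, map_width):
    """Map a linear cell index to a (column, row) texel of a 2D cluster map."""
    return cid % map_width, cid // map_width


# A 9x9x9 grid has 729 cells, which fit exactly in a 27x27 cluster map.
cid = cluster_id((0.5, 0.5, 0.5), (0, 0, 0), (1, 1, 1), (9, 9, 9))
print(cid, write_addr(cid, 27))
```

Because the mapping depends only on the vertex position, every vertex can be processed independently, which is what makes the regular grid a good fit for a render target used as an array.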

12
Quadric Map Implementation
//Map a point to its location in the cluster map array
//(clusterId, expand, outer and packQuadric are helpers not shown on the slide)
float2 writeAddr( float3 vPos )
{
    uint iX = clusterId(vPos) / iClusterMapSize.x;
    uint iY = clusterId(vPos) % iClusterMapSize.y;
    return expand( float2(iX,iY)/float(iClusterMapSize.x) )
           + 1.0/iClusterMapSize.x;
}

[maxvertexcount(3)]
void main( triangle ClipVertex input[3],
           inout PointStream<FragmentData> stream )
{
    //For the current triangle, compute the area and normal
    float3 vNormal = (cross( input[1].vWorldPos - input[0].vWorldPos,
                             input[2].vWorldPos - input[0].vWorldPos ));
    float fArea = length(vNormal)/6;
    vNormal = normalize(vNormal);

    //Then compute the distance of plane to the origin along the normal
    float fDist = -dot(vNormal, input[0].vWorldPos);

    //Compute the components of the face quadrics using the plane coefficients
    float3x3 qA = fArea*outer(vNormal, vNormal);
    float3 qb = fArea*vNormal*fDist;
    float qc = fArea*fDist*fDist;

    //Loop over each vertex in input triangle primitive
    for(int i=0; i<3; i++)
    {
        //Assign the output position in the quadric map
        FragmentData output;
        output.vPos = float4(writeAddr(input[i].vPos),0,1);
        //Write the quadric to be accumulated in the quadric map
        packQuadric( qA, qb, qc, output );
        stream.Append( output );
    }
}
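
For readers who want to check the quadric arithmetic outside a shader, here is a small Python sketch of the same per-triangle computation. It mirrors the weights used on the slide (cross-product length divided by 6, i.e. one third of the triangle area per vertex); the function names are hypothetical and degenerate triangles are not handled.

```python
import math


def face_quadric(p0, p1, p2):
    """Return (A, b, c) of the area-weighted plane quadric for one triangle."""
    e1 = [p1[i] - p0[i] for i in range(3)]
    e2 = [p2[i] - p0[i] for i in range(3)]
    n = [e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0]]
    length = math.sqrt(sum(x * x for x in n))
    w = length / 6.0                      # |cross| / 6 = one third of the triangle area
    n = [x / length for x in n]           # unit normal
    d = -sum(n[i] * p0[i] for i in range(3))   # plane offset: n.v + d = 0
    A = [[w * n[i] * n[j] for j in range(3)] for i in range(3)]
    b = [w * n[i] * d for i in range(3)]
    c = w * d * d
    return A, b, c


def quadric_error(A, b, c, v):
    """Evaluate v^T A v + 2 b^T v + c, the weighted squared distance to the plane."""
    Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    return (sum(v[i] * Av[i] for i in range(3))
            + 2 * sum(b[i] * v[i] for i in range(3)) + c)


A, b, c = face_quadric((0, 0, 1), (1, 0, 1), (0, 1, 1))   # triangle in the plane z = 1
print(quadric_error(A, b, c, (5, 5, 1)), quadric_error(A, b, c, (0, 0, 3)))
```

Since quadrics add component-wise, accumulating them per cluster only needs additive blending into the cluster map.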

13
Vertex Clustering Pipeline
  • Pass 2: Find Optimal Positions
  • Input: Cluster Map Render Targets, Full-screen
    Quad
  • Computation:
  • Determine if we can solve for optimal position
  • If not, fall back to vertex average
  • Output: Render Targets representing clusters with
    optimal position of representative vertex

14
Optimal Positions
Original Mesh
  • For each cell, need representative
  • Naïve solution: Use averages
  • Looks very blocky
  • Does not consider the original faces, only
    vertices
  • Implemented solution: Use quadrics
  • Quadrics measure squared distance to the original
    surface
  • We can solve for optimal position

Simplified w/ Averages
Simplified w/ Quadrics
15
Optimal Positions Implementation
float3 optimalPosition(float2 vTexcoord)
{
    float3 vPos = float3(0,0,0);
    float4 dataWorld, dataA0, dataB, dataA1;

    //Read the vertex average from the cluster map
    dataWorld = tClusterMap0.SampleLevel( sClusterMap0, vTexcoord, 0 );
    int iCount = dataWorld.w;

    //Only compute optimal position if there are vertices in this cluster
    if( iCount != 0 )
    {
        //Read all the data from the clustermap to reconstruct the quadric
        dataA0 = tClusterMap1.SampleLevel( sClusterMap1, vTexcoord, 0 );
        dataA1 = tClusterMap2.SampleLevel( sClusterMap2, vTexcoord, 0 );
        dataB  = tClusterMap3.SampleLevel( sClusterMap3, vTexcoord, 0 );

        //Then reassemble the quadric (A is symmetric)
        float3x3 qA = { dataA0.x, dataA0.y, dataA0.z,
                        dataA0.y, dataA0.w, dataA1.x,
                        dataA0.z, dataA1.x, dataA1.y };
        float3 qB = dataB.xyz;
        float qC = dataA1.z;

        //Determine if inverting A is stable, if so, compute optimal position
        //If not, default to using the average position
        //(inverse() is not an HLSL intrinsic; assumed to be a helper defined elsewhere)
        const float SINGULAR_THRESHOLD = 1e-11;
        if( determinant(qA) > SINGULAR_THRESHOLD )
            vPos = -mul( inverse(qA), qB );
        else
            vPos = dataWorld.xyz / dataWorld.w;
    }
    return vPos;
}
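
The solve-or-fall-back logic of Pass 2 can be checked with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, Cramer's rule stands in for whatever inverse the shader uses, and an empty cluster returns None here whereas the slide's shader leaves the position at the origin.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def optimal_position(A, b, pos_sum, count, threshold=1e-11):
    """Minimize v^T A v + 2 b^T v + c by solving A v = -b.
    Falls back to the vertex average when A is (near-)singular."""
    if count == 0:
        return None                       # empty cluster: nothing to place
    d = det3(A)
    if d > threshold:
        v = []
        for k in range(3):
            # Cramer's rule: replace column k of A by the right-hand side -b
            m = [[(-b[i] if j == k else A[i][j]) for j in range(3)]
                 for i in range(3)]
            v.append(det3(m) / d)
        return v
    return [s / count for s in pos_sum]


# Three orthogonal planes x=1, y=2, z=3 give A = I and b = (-1, -2, -3).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(optimal_position(identity, [-1, -2, -3], [0, 0, 0], 3))
```

A cluster whose faces are all coplanar yields a rank-deficient A, which is exactly the case where the average position is used instead.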

16
Vertex Clustering Pipeline
  • Pass 3: Decimate Mesh
  • Input: Cluster Map Render Targets, Input Mesh
  • Computation:
  • Find clusters, Remap vertices to representative
  • Determine if triangle becomes degenerate
  • If not, stream output new triangle at new
    positions
  • Output: Low-resolution Mesh

17
Final Clustering Implementation
[maxvertexcount(3)]
void main( triangle ClipVertex input[3],
           inout TriangleStream<StreamoutVertex> stream )
{
    //Only emit a triangle if all three vertices are in diff. clusters
    if( all_different(clusterId(input[0].vPos),
                      clusterId(input[1].vPos),
                      clusterId(input[2].vPos)) )
    {
        for(int i=0; i<3; i++)
        {
            //Lookup optimal position in the RT computed in Pass 2
            //(indexed by the current vertex i, not always vertex 0)
            vPos = tClusterMap3.SampleLevel( sClusterMap3,
                                             readAddr(input[i].vPos), 0 );
            //Output vertex to stream out
            stream.Append( vPos );
        }
    }
    return;
}
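
The degenerate-triangle test of Pass 3 is easy to reproduce on the CPU. The following Python sketch is illustrative only: `decimate`, `cell_of`, and the use of cell centers as representatives are assumptions, and the output is an unindexed triangle list, similar to what stream-out produces.

```python
def decimate(vertices, triangles, cell_of, representative):
    """Keep triangles whose three corners fall in three different cells and
    move each corner to its cell's representative position."""
    out = []
    for tri in triangles:
        cells = [cell_of(vertices[i]) for i in tri]
        if len(set(cells)) == 3:          # otherwise it collapses to an edge or a point
            out.append(tuple(representative[c] for c in cells))
    return out


# Unit-sized cells; representatives are simply the cell centers here (an assumption).
verts = [(0.1, 0.1, 0.0), (0.9, 0.2, 0.0), (0.2, 0.8, 0.0),
         (1.5, 0.5, 0.0), (0.5, 1.5, 0.0)]
tris = [(0, 1, 2), (0, 3, 4)]
cell_of = lambda p: tuple(int(c) for c in p)
reps = {(0, 0, 0): (0.5, 0.5, 0.5),
        (1, 0, 0): (1.5, 0.5, 0.5),
        (0, 1, 0): (0.5, 1.5, 0.5)}
print(decimate(verts, tris, cell_of, reps))
```

The first triangle lies entirely inside one cell and is dropped; the second spans three cells and survives with its corners snapped to the representatives.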

18
Vertex Clustering Pipeline
  • Alternate Pass 2: Downsample RTs
  • Input and Output as before
  • Computation:
  • Collapse 8 adjacent cells by adding cluster
    quadrics
  • Compute optimal position for 2x larger cell
  • Create multiple lower levels of detail without
    repeatedly incurring Pass 1 overhead (75%)
  • Pass 3 can use previous streamed-out mesh
  • Lower levels of detail almost free
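
Because quadrics are additive, the downsampling step reduces to summing the packed values of each 2x2x2 block of cells. A minimal Python sketch, assuming a hypothetical sparse cluster map stored as a dictionary from integer cell coordinates to a flat list of packed coefficients:

```python
def downsample(cluster_map):
    """Merge each 2x2x2 block of cells into one parent cell by adding packed quadrics."""
    coarse = {}
    for (x, y, z), q in cluster_map.items():
        parent = (x // 2, y // 2, z // 2)     # cell of the 2x coarser grid
        acc = coarse.get(parent)
        coarse[parent] = list(q) if acc is None else [a + v for a, v in zip(acc, q)]
    return coarse


fine = {(0, 0, 0): [1.0, 2.0], (1, 1, 1): [3.0, 4.0], (2, 0, 0): [5.0, 6.0]}
print(downsample(fine))
```

Applying `downsample` repeatedly yields a chain of coarser cluster maps without revisiting the input mesh, which is where the savings over rerunning Pass 1 come from.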

19
Timing Results
  • Recorded Time Spent in Decimation
  • GPU: AMD/ATI XXX
  • CPU: 3 GHz Intel P4
  • Significant Improvement over CPU
  • Averages 20x speedup on large models
  • Scales linearly

20
More Results
  • Models shown at varying resolutions

Buddha, 45x130x45 grid
Bunny, 90x90x90 grid
Dragon, 100x60x20 grid
Models Courtesy of Stanford Graphics Lab
21
More Results
  • Models shown at varying resolutions

Buddha, 20x70x20 grid
Bunny, 60x60x60 grid
Dragon, 50x25x10 grid
22
More Results
  • Models shown at varying resolutions

Buddha, 10x40x10 grid
Bunny, 20x20x20 grid
Dragon, 30x15x6 grid
23
Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • View-dependent Simplification
  • Region-of-interest Simplification
  • Probabilistic Octrees on the GPU

24
View-dependent Simplification
  • Standard simplification does not consider view
  • Preserves uniform amount of detail all over
  • Simplify in post-projection space to use view
  • Preserves more detail closer to viewer (left)

View Direction
25
Arbitrary Warping Functions
  • View transform is a special case of nonlinear warp
  • Can use arbitrary warp for adaptive
    simplification
  • Regular grids allow data-independence,
    parallelism
  • Constant time mapping from position to grid cell
  • Maps well onto GPU render targets
  • Forces uniform resolution throughout output mesh
  • Irregular geometry grids allow non-uniform output
  • Cells can be larger/smaller in certain regions
  • Corresponds to lower/greater output triangle
    density
  • We lose constant-time mapping of position to cell
  • Solution: apply inverse warp to vertices
  • Equivalent to applying forward warp to grid cells
  • Clustering still performed in uniform grid
  • Flexibility of irregular geometry w/ speed of
    regular
  • One proposal: Gaussian weighting functions
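
The idea of warping vertices while keeping the grid regular can be shown in one dimension. This Python sketch is only an illustration: `cell_index` is a hypothetical helper, and the square-root warp is a stand-in chosen for simplicity, not the Gaussian-based warp proposed in the talk.

```python
import math


def cell_index(t, n, warp=None):
    """Uniform-grid lookup for a normalized coordinate t in [0, 1].
    An optional warp is applied to the coordinate first; the grid stays regular."""
    w = warp(t) if warp else t
    return min(int(w * n), n - 1)


# Hypothetical warp that stretches the region near t = 0.
sqrt_warp = math.sqrt

print(cell_index(0.02, 10), cell_index(0.07, 10))                        # same cell without warp
print(cell_index(0.02, 10, sqrt_warp), cell_index(0.07, 10, sqrt_warp))  # distinct cells with warp
```

Two nearby points that share a cell in the plain grid land in different cells once the coordinate is warped, so more detail survives in that region while the lookup remains constant-time.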

26
Region-of-Interest Specification
  • Importance specified w/ biased Gaussian
  • Highest preservation at mean
  • Width of region given by sigma
  • Bias prevents falloff to zero
  • Integrate to produce corresponding warp function
  • (Derivation given in paper)
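
As a rough illustration of integrating a biased Gaussian into a warp, here is a one-dimensional Python sketch. The exact form is an assumption (a Gaussian mixed with a constant bias, integrated and rescaled to map [0, 1] onto [0, 1]); the derivation in the paper may differ, and `make_warp` is a hypothetical name.

```python
import math


def make_warp(mean, sigma, bias):
    """Build a 1D warp on [0, 1] by integrating importance = (1 - bias) * gaussian + bias."""
    def cdf(x):
        gauss = 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))
        return (1.0 - bias) * gauss + bias * x   # the bias term keeps the slope above zero
    lo, hi = cdf(0.0), cdf(1.0)
    return lambda x: (cdf(x) - lo) / (hi - lo)   # rescale so that 0 -> 0 and 1 -> 1


warp = make_warp(mean=0.5, sigma=0.1, bias=0.2)
print(warp(0.45), warp(0.5), warp(0.55))
```

The warp is steepest near the mean, so a uniform grid applied after warping places more cells there; the bias keeps the slope positive far from the mean, so distant regions still receive some cells.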

27
Region-of-Interest Specification
  • Warping allows non-uniform/adaptive level of
    detail
  • Head has most semantic importance
  • Detail lost in uniform simplification
  • We can warp first to expand center
  • Equivalent to grid density increasing
  • Adaptive simplification preserves head detail

28
Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU
  • Motivation
  • Probabilistic Storage
  • Adaptive Simplification
  • Randomized Construction
  • Results

29
Octrees - Motivation
  • Basic grid
  • regular geometry, regular topology
  • Limitations as we discussed
  • Warped grid
  • irregular geometry, regular topology
  • Much improved; however, we can do better
  • May be difficult to know required detail a priori
  • CPU Solution: Multi-resolution grid (i.e. octree)
  • Irregular topology (irregular geometry w/
    warping)
  • Store grid at many levels of detail
  • Measure error at each level, use coarse as
    possible
  • Efficiency requires dynamic memory, storage O(L^3)
  • Requires O(L) writes to produce correct tree

30
GPU Solution: Probabilistic Octrees
  • Proposal
  • Successful storage not guaranteed, w/ Prob. < 1
  • However, storage failure detected on read
  • Assumptions allow much flexibility
  • We can have unlimited depth tree (but lim P = 0)
  • Sparse storage of data
  • Require conservative algorithms for task
  • Vertex clustering (conveniently!) is such an
    example
  • So is collision detection and frustum culling
  • Only studied in brief in this paper, we would
    like to analyze more for future work

31
Implementation Details
  • Storage: Spatial Hashes
  • Map (position,level) to cell, cell hashed to
    index
  • Additive blending for quadric accumulation
    (app-specific)
  • Max blending to store (key,-key) with data (i.e.
    min_key,max_key)
  • Retrieval:
  • Again map (position, level) to index
  • Retrieve key value from data, collision iff
    min_key != max_key
  • Use parent level, which will have higher storage
    probability
  • Usage for Adaptive Simplification:
  • For each vertex, find maximum error level below
    some threshold
  • Use this as the representative vertex
  • Can perform binary search along path
  • Conservative, because we can maintain validity
    even when using parent of optimal node (just adds
    some error)

32
Randomized Generation
  • Currently hidden
  • We can probably skip this; we are too long already

33
Probabilistic Octree Results
  • Adaptive simplification shown on bunny (4K tris)
  • Preserves detail around leg, eyes and ears
  • Simplifies significantly on large, flat regions
  • Using 8% of storage of total tree, we have < 10%
    collisions
  • Only 20% performance hit vs. standard grids

34
Conclusions
  • GS is a powerful tool for interactive graphics
  • Amplification and decimation are important
    applications of GS

35
Geometry Shaders and Other Feature Wish-List
  • Bring back the Point fill mode
  • Important for scatter in GPGPU applications
  • Data amplification improvements with indexed
    stream out
  • Avoiding triangle soups very non-trivial
  • Efficient indexable temps

36
Thanks a lot!
  • Various people here

37
Questions?