Title: Research on Graph-Cut for Stereo Vision
1Research on Graph-Cut for Stereo Vision
- Presenter: Nelson Chang
- Institute of Electronics,
- National Chiao Tung University
2Outline
- Research Overview
- Brief Review of Stereo Vision
- Hierarchical Exhaustive Search
- Partitioned Graph-Cut for Stereo Vision
- Hierarchical Parallel Graph-Cut
3Our Research
[Figure: HRP-2 head]
- A fast vision system for robotics
- Stereo vision
- Local block-based diffusion (M)
- Graph-cut (PhD)
- Belief propagation (PhD)
- Segmentation
- Watershed (M)
- Meanshift
- Approaches
- Embedded solutions
- DSP (U)
- ASIC
- PC-based solutions
- Dual webcam stereo (U)
[Figure: HRP-2 tri-camera head]
4My Research
- A fast graph-cut VLSI engine for stereo vision
- ASIC approach
- Goal: 256x256 pixels, 30 depth labels, 30 fps
- Stereo vision system prototypes
- PC-based
- DSP-based
- FPGA/ASIC-based
5Review on Stereo Vision
6Concept of Stereo Vision
- Computational stereo: determine the 3-D structure of a scene from 2 or more images taken from distinct viewpoints.
[Figure: triangulation for non-verged geometry; d: disparity, Z: depth, T: baseline, f: focal length]
M. Z. Brown et al., "Advances in Computational Stereo," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, August 2003.
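For the non-verged (parallel-camera) geometry, the standard triangulation relation is $Z = f \, T / d$. The sketch below only illustrates that relation; the function name and the example numbers are hypothetical and do not come from the slides.

```python
def depth_from_disparity(d, focal_length, baseline):
    """Depth Z = f * T / d for two parallel (non-verged) cameras.

    d            -- disparity in pixels
    focal_length -- f, in pixels
    baseline     -- T, distance between the two camera centers
    The result has the same unit as the baseline.
    """
    if d <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_length * baseline / d


# Hypothetical numbers: a larger disparity means a nearer point.
near = depth_from_disparity(20, focal_length=500, baseline=2)
far = depth_from_disparity(5, focal_length=500, baseline=2)
```

Because Z is inversely proportional to d, a disparity map can be read directly as an inverse depth map.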
7Disparity Image
- Disparity Map/Image
- The disparities of all the pixels in the image
- Example
[Figure: left and right camera views; a 4x4 block with a disparity of 110 pixels and its left and right disparity maps; gray-scale legend from d = 0 (farthest) to d = 255 (nearest)]
8How to find the disparity of a pixel? (1/2)
- Simple Local Method
- Block Matching
- SAD = Sum of Absolute Differences
- $\mathrm{SAD} = \sum |I_L - I_R|$
- Find the candidate disparity with minimal SAD
- Assumption
- Disparities within a block should be the same
- Limitation
- Performs poorly in texture-less regions
- Performs poorly on repeating patterns
[Figure: block-matching example. The left block is compared with right-image candidate blocks at d(k-1) (SAD = 400), d(k) (SAD = 0), and d(k+1) (SAD = 600); d(k) has the minimal SAD.]
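A minimal sketch of SAD block matching, assuming images are plain 2-D lists of intensities and that a left pixel at column x matches the right pixel at column x - d. The function names and the search convention are illustrative assumptions, not the presenter's implementation.

```python
def sad(left, right, row, col, size, d):
    """Sum of absolute differences between a size x size block of the
    left image at (row, col) and the right-image block shifted by d."""
    total = 0
    for r in range(row, row + size):
        for c in range(col, col + size):
            total += abs(left[r][c] - right[r][c - d])
    return total


def best_disparity(left, right, row, col, size, max_d):
    """Return the candidate disparity in [0, max_d] with minimal SAD."""
    # Skip candidates that would shift the block outside the image.
    candidates = [d for d in range(max_d + 1) if col - d >= 0]
    return min(candidates, key=lambda d: sad(left, right, row, col, size, d))
```

In a texture-less region every candidate yields nearly the same SAD, and on a repeating pattern several candidates tie, which is why the local method fails in those two cases.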
9How to find the disparity of a pixel? (2/2)
- Complex Global Method
- Graph-cut, Belief Propagation
- Disparity estimation → optimal labeling problem
- Assign the label (disparity) of each pixel such that a given global energy is minimal
- Energy is a function of the label set (disparity map/image)
- The energy considers the
- Intensity similarity of the corresponding pixels
- Example: Absolute Difference (AD), $D = |I_L - I_R|$
- Disparity smoothness of neighboring pixels
- Example: Potts model
- If $d_L \neq d_R$, $V = K$; else $V = 0$
[Figure: center pixel "?" with neighbors labeled 0, 0, 16, and 32. Candidate d = 0 gives V = 2K; d = 16 gives V = 3K; d = 32 gives V = 3K; d = 2 gives V = 4K.]
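The Potts example on this slide (a pixel whose four neighbors carry the labels 0, 0, 16, and 32) can be checked with a short sketch. The function names are hypothetical, and the per-pixel energy is simply the AD data term plus the Potts smoothness term.

```python
def potts(d_p, d_q, k):
    """Potts model: constant penalty k when two neighboring labels differ."""
    return k if d_p != d_q else 0


def smoothness_cost(candidate, neighbor_labels, k):
    """Total V for one pixel, summed over its neighbors."""
    return sum(potts(candidate, n, k) for n in neighbor_labels)


def pixel_energy(i_left, i_right, candidate, neighbor_labels, k):
    """D (absolute intensity difference) plus V (Potts smoothness)."""
    return abs(i_left - i_right) + smoothness_cost(candidate, neighbor_labels, k)


neighbors = [0, 0, 16, 32]
# With k = 1 the results are the multiples of K shown on the slide.
multiples = {d: smoothness_cost(d, neighbors, 1) for d in (0, 16, 32, 2)}
```

The global energy is the sum of this quantity over all pixels, with each neighboring pair counted once.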
10Swap and Expansion Moves
Strong moves give more chances of finding a better local minimum
- Weak move
- Modifies 1 label at a time
- Standard move
- Strong move
- Modifies multiple labels at a time
- Proposed swap and expansion moves
[Figure: energy (E) reached from the initial labeling by weak and strong moves; example labelings: initial labeling, standard move, α-β swap, α-expansion]
11 4-connected structure
- Most common graph/MRF (BP) structure in stereo
[Figure: 2-variable graph-cut with source and sink terminals, and the MRF used in belief propagation with observable nodes (D) and hidden nodes linked by V. In the MRF for belief propagation, D and V are vectors.]
12Hierarchical Exhaustive Search on
13Outline
- Combinatorial Optimization
- Graph-Cut
- Exhaustive Search
- Iterated Conditional Modes
- Hierarchical Exhaustive Search
- Result
- Summary and Next Step
14Combinatorial Optimization
- Determine a combination (pattern, set of labels) such that the energy of this combination is minimum
- Example: 4-bit binary label problem
- Find a label set which yields the minimal energy
- Each individual bit can be set to 0 or 1
- Each label corresponds to an energy cost
- Each neighboring bit pair should preferably have the same label (smoothness)
Energy(0000) = 99 + 92 + 100 + 101 = 392
Energy(0001) = 99 + 92 + 100 + 98 + 10 = 399
15Graph-Cut
- Formulate the previous problem as a graph-cut problem
- Find the cut with minimum total capacity (cost, energy)
- Solving the graph-cut: Ford-Fulkerson method
[Figure: s-t graph of the 4-bit example with link capacities and the residual capacities after augmentation]
Total flow pushed = 99 + 79 + 100 + 98 + 1 + 10 + 3 = 390 = max flow (energy of the cut 1100)
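A minimal sketch of the 4-bit example as an s-t min-cut, solved with the BFS (Edmonds-Karp) variant of Ford-Fulkerson. The per-bit costs `D0`, `D1` and the smoothness cost `K` are reconstructed from the energy sums and comparisons shown on the slides, and the chain topology is an assumption; the convention that the source side means label 0 is also chosen here for illustration.

```python
from collections import deque

D0 = [99, 92, 100, 101]   # assumed cost of label 0 per bit
D1 = [100, 79, 114, 98]   # assumed cost of label 1 per bit
K = 10                    # assumed smoothness cost between neighbors


def build_graph(d0, d1, k):
    """Source side = label 0 (cuts the link to t, paying d0);
    sink side = label 1 (cuts the link from s, paying d1)."""
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    for i in range(len(d0)):
        add_edge("s", i, d1[i])
        add_edge(i, "t", d0[i])
    for i in range(len(d0) - 1):
        add_edge(i, i + 1, k)
        add_edge(i + 1, i, k)
    return cap


def max_flow(cap, s="s", t="t"):
    """Return (max flow, set of nodes still reachable from s)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        # Find the bottleneck of the augmenting path, then push flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck


flow, source_side = max_flow(build_graph(D0, D1, K))
labels = [0 if i in source_side else 1 for i in range(len(D0))]
```

Under these assumptions the maximum flow equals the energy of the minimum cut, and the labeling is read from which side of the cut each node ends up on.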
16Exhaustive Search
- List all the combinations and their corresponding energies
- Example: 1100 has the minimal energy of 390
Label set Energy Label set Energy
0000 392 1000 403
0001 399 1001 410
0010 426 1010 437
0011 413 1011 424
0100 399 1100 390
0101 414 1101 397
0110 413 1110 404
0111 400 1111 391
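A minimal sketch of the exhaustive search, assuming a 4-bit chain with per-bit costs reconstructed from the slides' energy sums (`D0`, `D1`) and a smoothness cost `K = 10`; these names and the chain topology are assumptions. Under them the model reproduces the listed energies, except that it gives 406 rather than the listed 414 for 0101.

```python
from itertools import product

D0 = [99, 92, 100, 101]   # assumed cost of label 0 per bit
D1 = [100, 79, 114, 98]   # assumed cost of label 1 per bit
K = 10                    # assumed cost for each differing neighbor pair


def energy(labels):
    """Data cost of every bit plus K for each neighboring pair that differs."""
    data = sum(D1[i] if bit else D0[i] for i, bit in enumerate(labels))
    smooth = sum(K for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def exhaustive_search(n_bits=4):
    """Evaluate all 2**n_bits label sets and return the one with minimal energy."""
    return min(product((0, 1), repeat=n_bits), key=energy)


best = exhaustive_search()
```

The number of candidates doubles with every added bit, so the approach is only practical for very small graphs.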
17Iterated Conditional Modes
- Iteratively finds the best label under the current given condition
- Greedy
- Different starting decisions (initial conditions) give different results
- Can find local minima
- Example
- Start with bit 1 because it is more reliable
- Iteration order: bit1 → bit0 → bit2 → bit3
- Final solution: 1100
[Figure: 4-bit chain with the decision taken at each bit]
bit0: 100 (1) < 99 + 10 (0) → 1
bit1: 79 (1) < 92 (0) → 1
bit2: 100 + 10 (0) < 114 (1) → 0
bit3: 101 (0) < 98 + 10 (1) → 0
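A single greedy pass in the spirit of the example above can be sketched as follows. The per-bit costs are reconstructed from the comparisons on the slide, undecided neighbors are assumed to contribute no smoothness cost, and the function name is hypothetical.

```python
D0 = [99, 92, 100, 101]   # assumed cost of label 0 per bit
D1 = [100, 79, 114, 98]   # assumed cost of label 1 per bit
K = 10                    # assumed smoothness cost between neighbors


def icm_single_pass(d0, d1, k, order):
    """Visit the bits in the given order; each bit takes the cheaper label
    given the neighbors that have already been decided."""
    labels = [None] * len(d0)
    for i in order:
        costs = []
        for candidate in (0, 1):
            cost = d1[i] if candidate else d0[i]
            for j in (i - 1, i + 1):
                decided = 0 <= j < len(labels) and labels[j] is not None
                if decided and labels[j] != candidate:
                    cost += k
            costs.append(cost)
        labels[i] = 0 if costs[0] <= costs[1] else 1
    return labels


slide_order = icm_single_pass(D0, D1, K, order=[1, 0, 2, 3])
other_order = icm_single_pass(D0, D1, K, order=[3, 2, 1, 0])
```

With the slide's order the pass ends at 1100; starting from bit 3 instead ends at 1101, a worse local minimum, which illustrates the dependence on the initial decision.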
18Exhaustive Search Engine
- Exhaustive search can be implemented in hardware
- Less sequential dependency
- Not suitable for graphs larger than 4x4
Result of fully connected graph, NOT 4-connected graph
19Hierarchical Graph-Cut?
- Solve a large-n graph with multiple small-n GCEs hierarchically
- Example
- Solve n = 16 with 4 + 1 n = 4 graph-cuts
For each sub-graph, find the best 2 label sets
For each sub-graph vertex: Label 0 = 1st label set, Label 1 = 2nd label set
Assumption !! The optimal solution must be within the combinations of sub-graph label sets !!
[Figure: Sub-graph 0, Sub-graph 1, Sub-graph 2, Sub-graph 3]
20HGC Speed up Evaluation
- For an 8-point GCE with 8 sets of ECUs
- Cost: 300 eq. adders
- Latency: 41 cycles per graph
- If only 1 GCE is used to compute a 64-point 2-variable graph-cut
Latency = 41 cycles x 8 + 41 cycles + T_V = 369 cycles + T_V
If V is computed for each pixel: T_V = 3584 cycles (slide expression: (8x8) x (8x7/2) x 4)
Total latency = 3953 cycles
Question: Is this solution the optimal label set for n = 64?
21Hierarchical Exhaustive Search
pat0 is the best candidate pattern; pat1 is the 2nd-best candidate pattern
- 64x64 nodes
- 4x4-based pyramid structure
- 3 levels
Level 2
D_at_lv2 = E0/E1_at_lv1; Label0_at_lv2 = pat0_at_lv1; Label1_at_lv2 = pat1_at_lv1
Level 1
D_at_lv1 = E0/E1_at_lv0; Label0_at_lv1 = pat0_at_lv0; Label1_at_lv1 = pat1_at_lv0
Level 0
D_at_lv0 = D0/D1_at_lv0; Label0_at_lv0 = Label0; Label1_at_lv0 = Label1
22Computing V term at Level 1
- For 1st-order neighboring sub-graphs Gi and Gj
- Possible neighboring pair combinations
- (pat0i, pat0j)
- (pat0i, pat1j)
- (pat1i, pat0j)
- (pat1i, pat1j)
- Compute V(patXi, patXj) with the original neighboring cost
- Example
- V(pat0i, pat0j) = K
- V(pat0i, pat1j) = K + K + K = 3K
[Figure: labels along the boundary between Gi and Gj, rows top to bottom. pat0i | pat0j: 0|0, 0|0, 0|1, 1|1 (one differing pair, V = K). pat0i | pat1j: 0|1, 0|0, 0|1, 1|0 (three differing pairs, V = 3K).]
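The level-1 V term in this example counts the differing label pairs along the shared boundary of two horizontally adjacent 4x4 patterns and charges K for each. A minimal sketch follows; the boundary columns are taken from the slide's example, while the interior entries (shown as "?" on the slide) are filled with 0 purely as placeholders and the function name is hypothetical.

```python
K = 20  # smoothness constant; 20 is only an illustrative value


def boundary_cost(pat_left, pat_right, k):
    """V between two horizontally adjacent patterns: k for every row where the
    last column of the left pattern differs from the first column of the right."""
    return sum(k for row_l, row_r in zip(pat_left, pat_right)
               if row_l[-1] != row_r[0])


# Only the boundary columns below come from the slide; the rest are placeholders.
pat0_i = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
pat0_j = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
pat1_j = [[1, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

v_00 = boundary_cost(pat0_i, pat0_j, K)
v_01 = boundary_cost(pat0_i, pat1_j, K)
```

Vertically adjacent sub-graphs would compare the last row of one pattern with the first row of the other in the same way.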
23Result of 16x16 (256) 2 level HES
- 100 randomly generated graphs
- D/V: 10
- Symmetric V: 20
- Error rate
- Max: 17/256 (6.6%)
- Average: 7/256 (2.8%)
- Min: 2/256 (0.8%)
24Result of 64x64 (4096) 3 level HES
- 100 randomly generated graphs
- D/V: 10
- Symmetric V: 20
- Error rate
- Max: 185/4096 (4.5%)
- Average: 146/4096 (3.6%)
- Min: 115/4096 (2.8%)
25Death Sentence to HES
26Error Rate vs. Graph Size
Error rate range became smaller
Graph size | Average energy increase | Average error rate (%) | Error rate std. dev. (%) | Min error rate (%) | Max error rate (%)
16x16 (2 level) | 0.20 | 2.74 (7/256) | 1.43 | 0.39 | 8.59
64x64 (3 level) | 0.25 | 3.63 (149/4096) | 0.40 | 2.58 | 4.54
256x256 (4 level) | 0.28 | 3.65 (2393/65536) | 0.09 | 3.36 | 3.89
3.63% vs. 3.65%: the error rate did not increase significantly
27Impact of different V cost
- 64x64 (3 level) HES
- 100 patterns per V cost value
- D cost (average over s-link caps of 10 patterns, 2 for each V)
- Average: 162.8
- Std. dev.: 94.4
- V cost
- 10, 20, 40, 60, 80
28Stereo Matching Case
- Stereo pair: Tsukuba
- Expansion with random label order
- 15 labels → 15 graph-cut computations
- Graph size: 256 x 256
- D term: truncated Sum of Squared Error (tSSE)
- Truncated at AD = 20
- V term: Potts model
- K = 20
29 1st iteration result
[Figure: BnK's expansion result; labels 4, 5, and 9 are marked as important expansions]
- Error rate might exceed 20% for important expansion moves
Label (α) | Error rate (%) | Energy difference (%) | Label (α) | Error rate (%) | Energy difference (%)
0 | 0.62 | 12.7 | 8 | 5.01 | 28.3
1 | 1.07 | 16.1 | 9 | 12.01 | 42.0
2 | 0.00 | 0.0 | 10 | 5.55 | 32.2
3 | 2.76 | 24.7 | 11 | 5.18 | 30.5
4 | 21.59 | 38.7 | 12 | 5.33 | 31.3
5 | 22.91 | 44.2 | 13 | 7.07 | 34.6
6 | 9.21 | 32.1 | 14 | 2.98 | 23.2
7 | 7.83 | 40.0 | | |
30Reason for failure
- The best 2 local candidates do NOT include the final optimal solution
- Errors often happen near lv2 and lv3 block boundaries
- The majority of nodes have both 0 source and 0 sink link capacity
- More dependent on the neighboring nodes' labels
- D:V ratio 56:20 → 2.8:1
- Similar to the D:V 163:60 case
- Error rate for random patterns: 15%
Best 2 patterns in does NOT consider the
pattern of
31Partitioned (Block) Graph-Cut
32Motivation
- Global
- Considers the whole picture
- More information
- Local
- Considers a limited region of a picture
- Less information
Is it necessary to use that much information in global methods?
33Concept
- Original full GC
- 1 big graph
- Partitioned GC
- N smaller graphs
What is the smallest possible partition that achieves the same performance?
34Experiment Setting
- Energy
- D term
- Luma only
- Birchfield-Tomasi cost (best result at half-pel position)
- Squared error
- V term
- Potts model: $V = K \times T(d_i \neq d_j)$
- The constant K is the same for all partitions
- Partition size
- 4x4, 16x16, 32x32, 64x64, 128x128
- Stereo pairs
- Tsukuba, Teddy, Cones, Venus
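35Tsukuba 4x4, 16x16, 32x32, 64x64
[Figure: Tsukuba results for 4x4, 16x16, 32x32, and 64x64 partitions]
36Tsukuba 96x96, 128x128
[Figure: Tsukuba results for 96x96 and 128x128 partitions, compared with full GC]
37Venus 32x32, 64x64
[Figure: Venus results for 32x32 and 64x64 partitions]
38Venus 96x96, 128x128
[Figure: Venus results for 96x96 and 128x128 partitions, compared with full GC]
39Teddy 32x32, 64x64
[Figure: Teddy results for 32x32 and 64x64 partitions]
40Teddy 96x96, 128x128
[Figure: Teddy results for 96x96 and 128x128 partitions, compared with full GC]
41Cones 32x32, 64x64
[Figure: Cones results for 32x32 and 64x64 partitions]
42Cones 96x96, 128x128
[Figure: Cones results for 96x96 and 128x128 partitions, compared with full GC]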
43Middlebury Result
Evaluation web page: http://cat.middlebury.edu/stereo/
Block size | Tsukuba nonocc | Tsukuba all | Tsukuba disc | Venus nonocc | Venus all | Venus disc | Teddy nonocc | Teddy all | Teddy disc | Cones nonocc | Cones all | Cones disc
Best | 6.0 | 6.8 | 24.7 | 2.9 | 4.4 | 19.4 | 14.5 | 22.8 | 29.1 | 11.2 | 20.9 | 20.9
Full | 8.6 | 9.4 | 27.4 | 3.3 | 4.7 | 18.4 | 14.5 | 22.8 | 29.1 | 14.5 | 23.8 | 24.2
32 | 16.2 | 17.0 | 33.7 | 24.0 | 24.9 | 29.6 | 27.6 | 34.9 | 35.2 | 24.4 | 32.7 | 30.2
64 | 10.6 | 11.5 | 29.6 | 10.0 | 11.3 | 19.5 | 19.1 | 27.1 | 29.8 | 19.2 | 28.0 | 27.3
96 | 9.4 | 10.2 | 28.7 | 9.1 | 10.4 | 20.5 | 16.3 | 24.7 | 29.5 | 15.1 | 24.3 | 24.7
128 | 8.8 | 9.5 | 27.3 | 8.4 | 9.6 | 21.5 | 15.2 | 23.5 | 28.6 | 14.6 | 23.9 | 24.0
Best: full GC with the best parameter. Full: full GC with K = 20 (Tsukuba) and K = 60 (others).
44Summary
- Smallest possible partition size (2% accuracy drop)
- Tsukuba → 64x64
- Teddy, Cones → 96x96
- Venus → larger than 128x128
- Benefits
- Possible complexity or storage reduction
- Parallelism increase
- Drawbacks
- Performance (disparity accuracy) drop
- PC computation takes longer
45Hierarchical Parallel Graph-Cut
46Concept of Hierarchical Parallel GC
- Bottom up
- Solve graph-cut for smaller subgraphs
- Solve graph-cut for larger subgraphs
- Larger subgraph = set of neighboring smaller subgraphs
!! Each subgraph is temporarily independent !!
Larger subgraph = sg0 + sg1 + sg2 + sg3
[Figure: Level 0 subgraphs sg0, sg1, sg2, and sg3 merged into one Level 1 subgraph]
47HPGC for solving a 256x256 graph
Step 1: 64 32x32 Lv0 subgraphs
Step 2: 16 64x64 Lv1 subgraphs
Step 3: 4 128x128 Lv2 subgraphs
Step 4: 1 256x256 Lv3 subgraph
Total graph-cut computations: 64 + 16 + 4 + 1 = 85
!! HPGC must use Ford-Fulkerson-based methods !!
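The step counts above follow from doubling the subgraph side at every level until it covers the whole graph. A small sketch of that bookkeeping, with a hypothetical function name:

```python
def hpgc_schedule(graph_side=256, base_side=32):
    """Return (subgraph side, number of subgraphs) per level, bottom up.
    The subgraph side doubles at each level until it equals the graph side."""
    schedule = []
    side = base_side
    while side <= graph_side:
        per_side = graph_side // side
        schedule.append((side, per_side * per_side))
        side *= 2
    return schedule


levels = hpgc_schedule()
total_cuts = sum(count for _, count in levels)
```

All subgraphs within one level are independent of each other, so up to 64 processing elements could work in parallel at level 0 but only one at the top level.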
48Boykov and Kolmogorov's Motivation
- Dinic method
- Search for the shortest augmenting path
- Use Breadth-First Search (BFS)
- Example
- Search for shortest paths (length k)
- Use BFS, expand the search tree
- Find all paths of length k
- Search for shortest paths (length k+1)
- Use BFS, RE-expand the search tree
- Find all paths of length k+1
- Search for shortest paths (length k+2)
- Use BFS, RE-RE-expand the search tree
- ...
[Figure: example graph with edges labeled 1]
Why don't we REUSE the expanded tree?
49BnK's Method
- Concept
- Reuse the already expanded trees
- Avoid re-expanding the trees from scratch (nothing)
- 3 stages
- Growth
- Grow the search tree
- Augmentation
- Ford-Fulkerson style augmentation
- Adoption
- Reconnect the unconnected sub-trees
- Connect the orphans to a new parent
[Figure: augmenting path; saturate critical edge; adopt orphans]
50Feature of BnK method
- Based on Ford-Fulkerson
- Bidirectional search tree construction
- Search tree reuse
- Determine the label (source or sink) using tree connectivity
[Figure: source tree and sink tree]
51Connectivity is why HPGC works
- Example: a 2x4 binary-variable graph
[Figure: graph view and tree view]
52Connectivity of the various cases
53How to add edges
- When should nodes A and B check their edge?
- If A and B belong to different search trees
- A is in a sink tree, B is in a source tree
- A is in a source tree, B is in a sink tree
- Implies a source → sink path
- If A or B is an orphan (not connected to any tree)
- A is an orphan, B is not an orphan
- A is not an orphan, B is an orphan
- Check for possible connectivity of the orphan
[Figure: neighboring nodes A and B]
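The two conditions above can be written as a small predicate. This is only a sketch of the rule as stated on the slide; the state names and the function name are hypothetical.

```python
SOURCE, SINK, ORPHAN = "source", "sink", "orphan"


def should_check_edge(state_a, state_b):
    """Decide whether neighboring nodes A and B need to examine their edge.

    Case 1: A and B sit in different search trees, which implies a possible
            source-to-sink path through the edge.
    Case 2: exactly one of them is an orphan, which may be reconnected
            through the other node.
    """
    if {state_a, state_b} == {SOURCE, SINK}:
        return True
    return (state_a == ORPHAN) != (state_b == ORPHAN)
```

Two nodes in the same tree, or two orphans, leave the edge untouched under this rule.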
54Complexity Result
- Method
- Annotate each line of code with basic operations
- Read
- Write
- Arithmetic
- Logic
- Compare
- Branch
- Examples
- C = A + B: 2R, 1W, 1A
- if (A == B): 2R, 1C, 1B
55Stereo Matching Case
- Stereo pair: Tsukuba
- Expansion with random label order
- 15 labels → 15 graph-cut computations
- Graph size: 256 x 256
- D term: truncated Sum of Squared Error (tSSE)
- Truncated at AD = 20
- V term: Potts model
- K = 20
56 1st iteration result
[Figure: BnK's expansion result; labels 4, 5, and 9 are marked as important expansions]
- Labels 4, 5, and 9 are key moves
Label (α) | Error rate (%) | Energy difference (%) | Label (α) | Error rate (%) | Energy difference (%)
0 | 0.62 | 12.7 | 8 | 5.01 | 28.3
1 | 1.07 | 16.1 | 9 | 12.01 | 42.0
2 | 0.00 | 0.0 | 10 | 5.55 | 32.2
3 | 2.76 | 24.7 | 11 | 5.18 | 30.5
4 | 21.59 | 38.7 | 12 | 5.33 | 31.3
5 | 22.91 | 44.2 | 13 | 7.07 | 34.6
6 | 9.21 | 32.1 | 14 | 2.98 | 23.2
7 | 7.83 | 40.0 | | |
57Full BnK Graph-cut Operation Distribution
- 256x256 graph, Tsukuba, iteration 0, label 5
77,407,307 operations
[Figure: operation distribution] Memory access is dominant; control is 22%; arithmetic is insignificant
58Full GC vs. HPGC
- 256x256 graph, Tsukuba, iteration 0, label 5
77,407,307 operations
[Figure: full GC vs. HPGC with 4, 8, 16, 32, and 64 PEs]
59Conclusion
- HPGC can improve speed with multiple PEs
- To perform 30 fps, 30 labels, 256x256 graph-cut
- 1 PE at 100 MHz
- Average cycle budget for each subgraph: 1.3K cycles
- Lv0 subgraph is 32x32
- Next step
- Small BnK graph-cut engine architecture design
- Estimate speed/cost
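The 1.3K-cycle figure is consistent with a simple budget calculation, assuming one graph-cut per label per frame and 85 subgraph cuts per graph-cut (the HPGC total for a 256x256 graph). Both assumptions are inferred from the slides rather than stated on this one, and the function name is hypothetical.

```python
def cycles_per_subgraph(clock_hz, fps, labels, subgraph_cuts):
    """Average cycle budget for one subgraph cut on a single PE."""
    cuts_per_second = fps * labels * subgraph_cuts
    return clock_hz / cuts_per_second


budget = cycles_per_subgraph(clock_hz=100_000_000, fps=30, labels=30,
                             subgraph_cuts=85)
```

The result is roughly 1.3K cycles on average, and that average must cover every level from the 32x32 Lv0 subgraphs up to the full 256x256 graph.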
60Progress Check
- Previous plan
- Parallel graph-cut engine for binary-variable graph
- Based on Boykov and Kolmogorov's graph-cut algorithm
- Complexity analysis: done
- Hierarchical parallel algorithm SW model: done
- Small BnK graph-cut engine architecture design
- Based on Boykov and Kolmogorov's algorithm: next 2 weeks
- Hierarchical parallel graph-cut engine architecture design
- Based on my hierarchical parallel algorithm, modified from BnK's algorithm: June/July