Title: Visual Grouping and Recognition
1. Visual Grouping and Recognition
2. UCB Collaborators
- Prof. Jitendra Malik
- Prof. Dave Patterson
- Charless Fowlkes
- Doron Tal
- See http://www.cs.berkeley.edu/dmartin/papers/iccv01.pdf
3. From images to objects
Labeled sets: tiger, grass, etc.
4. Recognition
- Object classes are hierarchical (e.g. leopard vs. clouded leopard vs. my pet clouded leopard Leopold)
- Must be tolerant to changes in pose and illumination
5. Framework for Recognition: Three stages
- Segmentation: Images → Regions
- Association: Regions → Super-regions
- Matching: Super-regions → Prototype views
6. Segmentation: Images → Regions
- Slight over-segmentation of objects is OK
- Under-segmentation is BAD
7. Association: Regions → Super-regions
- Simple enumeration of connected components
- Number of super-regions of size k in an image with n regions is approximately $4^k \, n / k$
- For typical images, the number is 1K-10K
- Plausibility ordering could reduce the effective number substantially
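The enumeration step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the segmentation has already been converted into a region adjacency graph, and the names `adjacency` and `connected_subsets` are hypothetical. Connected sets of regions are grown one neighbouring region at a time, up to a size limit.

```python
def connected_subsets(adjacency, max_size):
    """Enumerate every connected set of regions with at most max_size members.

    adjacency: dict mapping a region id to the set of ids of adjacent regions
               (assumed symmetric). Returns a set of frozensets of region ids.
    """
    found = set()
    # Start from single regions; each one is trivially connected.
    frontier = {frozenset([region]) for region in adjacency}
    for size in range(1, max_size + 1):
        found |= frontier
        if size == max_size:
            break
        next_frontier = set()
        for subset in frontier:
            # Regions that touch the current subset but are not yet in it.
            neighbours = set().union(*(adjacency[r] for r in subset)) - subset
            for region in neighbours:
                next_frontier.add(subset | {region})
        frontier = next_frontier
    return found


# Four regions arranged in a ring: 0-1, 1-3, 3-2, 2-0.
ring = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
print(len(connected_subsets(ring, 3)))  # 12: four singles, four pairs, four triples
```

Because the candidate count grows quickly with the size limit, a plausibility ordering such as the one mentioned above would be applied to prune `frontier` at each step; that part is not shown here.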
8. Matching: Super-regions → Prototype Views
- Objects are represented by a set of prototypical views (10 per object)
- For each super-region S, calculate the probability that it is an instance of view V
- Determine the most probable labeling of the image into objects
- Tolerant to:
  - pose and illumination changes
  - intra-category variation
  - error in segmentation and association steps
9. Focus on Segmentation
- Segmentation Recognition (?!)
- Need quantitative measures: a benchmark!
- MNIST for handwritten digits
- SPEC for CPUs
- WinStone for PCs
- TPC-C for transaction processing DBs
- Segmentation / Recognition on ISTORE
10. Step 1: Establish Ground-truth
- We want high-level Gold Standard segmentations.
- Range of granularities
- Segmentation tool in Java
- Explicit partition of image into pixel sets
- Ease of deployment
- 10 UCB grad students
- 60 images, 180 total segmentations
- Up to 5 segmentations/image by different people
- Goal: 1K images, 5K segmentations
11. Human Segmentations (1)
12. Human Segmentations (2)
13. Step 2: Similarity / Error Measures
14. Segmentation Refinement
[Figure labels: A, B, C, D]
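15. Local Refinement Error
- How much is segmentation $S_1$ a refinement of segmentation $S_2$ at pixel $p_i$?
- $E(S_1, S_2, p_i) = \dfrac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$, where $R(S, p_i)$ is the region of segmentation $S$ that contains pixel $p_i$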
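The local refinement error is easy to compute once each segmentation is stored as one region label per pixel. The sketch below makes that assumption (flat, equal-length label sequences); the function name `local_refinement_error` is illustrative. It uses the identity |R1 \ R2| = |R1| - |R1 ∩ R2|, so only region sizes and pairwise overlaps need to be counted.

```python
from collections import Counter


def local_refinement_error(s1, s2):
    """Return E(S1, S2, p_i) for every pixel i.

    s1, s2: equal-length sequences of region labels, one entry per pixel.
    """
    if len(s1) != len(s2):
        raise ValueError("segmentations must cover the same pixels")
    region_size = Counter(s1)        # |R(S1, p)| for each S1 label
    overlap = Counter(zip(s1, s2))   # |R(S1, p) ∩ R(S2, p)| for each label pair
    return [
        (region_size[a] - overlap[(a, b)]) / region_size[a]
        for a, b in zip(s1, s2)
    ]


fine = [0, 0, 1, 1]    # two regions
coarse = [0, 0, 0, 0]  # one region covering everything
print(local_refinement_error(fine, coarse))  # [0.0, 0.0, 0.0, 0.0]
print(local_refinement_error(coarse, fine))  # [0.5, 0.5, 0.5, 0.5]
```

The measure is asymmetric by design: the finer segmentation is a perfect refinement of the coarser one (error 0), while the reverse direction is penalised.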
16. Segmentation Error Measures
- Global Consistency Error (GCE)
  - Refinement in same direction at all pixels
  - $\mathrm{GCE}(S_1, S_2) = \frac{1}{n} \min\left\{ \sum_i E(S_1, S_2, p_i),\ \sum_i E(S_2, S_1, p_i) \right\}$
- Local Consistency Error (LCE)
  - Refinement in either direction at each pixel
  - $\mathrm{LCE}(S_1, S_2) = \frac{1}{n} \sum_i \min\left\{ E(S_1, S_2, p_i),\ E(S_2, S_1, p_i) \right\}$
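Both measures follow directly from the per-pixel refinement error. The sketch below is a minimal illustration under the assumption that each segmentation is a flat sequence of region labels, one per pixel; the helper names are hypothetical. GCE takes the minimum after summing over pixels, LCE takes the minimum at each pixel before summing, so LCE can never exceed GCE.

```python
from collections import Counter


def _local_errors(s1, s2):
    """Per-pixel refinement error E(S1, S2, p_i) for label sequences s1, s2."""
    region_size = Counter(s1)
    overlap = Counter(zip(s1, s2))
    return [(region_size[a] - overlap[(a, b)]) / region_size[a]
            for a, b in zip(s1, s2)]


def consistency_errors(s1, s2):
    """Return (GCE, LCE) for two segmentations of the same n pixels."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("segmentations must be non-empty and equally long")
    e12 = _local_errors(s1, s2)
    e21 = _local_errors(s2, s1)
    n = len(s1)
    gce = min(sum(e12), sum(e21)) / n                      # one direction for all pixels
    lce = sum(min(a, b) for a, b in zip(e12, e21)) / n     # best direction per pixel
    return gce, lce


print(consistency_errors([0, 0, 1, 1], [0, 0, 0, 0]))  # (0.0, 0.0): pure refinement
print(consistency_errors([0, 0, 1, 1], [0, 1, 1, 1]))  # (0.25, 0.125)
```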
17. Measure Results
                   GCE (human vs. human)   LCE (human vs. human)   GCE (NCuts vs. human)
Same image         0.11                    0.07                    0.28
Different images   0.39                    0.30                    0.38
18. NCuts Per-Image Error
[Figure: per-image error. Red: NCuts vs. Human. Blue: Human vs. Human.]
19. Future Work: Dataset
- Obtain 1000-5000 segmentations
- Hire undergrads
- Vision groups at other schools
- More widespread deployment on web?
- Release dataset to community
- Informal public forum for segmentation algorithm comparison
  - e.g. MNIST
20. Future Work: Segmentation Algorithms
- Cue combination is the key
  - luminance, color, texture, motion, stereoscopic depth, familiar configuration
- Feedback between segmentation and matching
- Distributed computational framework for exploring these issues
  - Millennium or ISTORE
21. Visual Recognition on ISTORE
- Sequential code
  - Segmentation: 5 minutes / image
  - Association: negligible
  - Matching: 0.5 sec / match
- Memory requirements are very low
  - 10K object categories × 10 views/category × 100 × 100 pixels/view × 1 byte/pixel = 1 GB
- Computation on 10^4-node ISTORE
  - Segmentation
    - 50% embarrassingly parallel (many convolutions)
    - 50% sparse eigenvalue problem
    - → Frame-rate throughput, but not latency
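The memory and throughput figures on this slide can be checked with simple arithmetic. The sketch below reads the node count as 10^4 and assumes each node segments a whole image on its own, which is one way to get frame-rate throughput without improving latency; both the function names and that scheduling assumption are illustrative, not taken from the slides.

```python
def prototype_store_bytes(categories=10_000, views_per_category=10,
                          view_side_pixels=100, bytes_per_pixel=1):
    """Storage needed to keep every prototype view in memory."""
    return (categories * views_per_category
            * view_side_pixels ** 2 * bytes_per_pixel)


def segmentation_throughput(nodes=10 ** 4, seconds_per_image=5 * 60):
    """Images per second if every node segments its own image (assumption)."""
    return nodes / seconds_per_image


print(prototype_store_bytes())                # 1000000000 bytes, i.e. 1 GB
print(round(segmentation_throughput(), 1))    # 33.3 images/sec
```

Under this assumption a single image still takes the full 5 minutes to segment, which is why the throughput reaches video frame rate while the latency does not.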
22.
- Matching: embarrassingly parallel
- 1K candidate super-regions
- 20K matches/sec at full resolution
- Consider only 1% of matches at full resolution (10% pass the color/texture filter, 10% of those pass the low-resolution shape filter)
- If half the time is spent in pruning and half in full-resolution matching, we get 10K matches/sec
- Worst case: 100 object categories
- Best case depends on how well one can exploit context, hierarchy and hashing
- Humans can recognize 10K-100K objects
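One way to connect these numbers is the back-of-the-envelope calculation below. It is an interpretation rather than something the slides spell out: it assumes 10^4 nodes at 0.5 sec per full-resolution match, 10 views per category, and a worst case in which every super-region of one image per second is tested against every view of every category. All parameter names are hypothetical.

```python
def matching_budget(nodes=10 ** 4, seconds_per_match=0.5,
                    color_pass_one_in=10, shape_pass_one_in=10,
                    full_res_time_share=0.5,
                    super_regions=1_000, views_per_category=10):
    """Return (peak full-res rate, sustained full-res rate,
    candidate rate, categories testable per image per second)."""
    peak_full_res = nodes / seconds_per_match             # all time on full-res matching
    full_res_rate = peak_full_res * full_res_time_share   # rest of the time goes to pruning
    # Only 1 in (10 * 10) candidates survives both filters to reach full resolution.
    candidates_per_full_res = color_pass_one_in * shape_pass_one_in
    candidate_rate = full_res_rate * candidates_per_full_res
    categories = candidate_rate / (super_regions * views_per_category)
    return peak_full_res, full_res_rate, candidate_rate, categories


print(matching_budget())  # (20000.0, 10000.0, 1000000.0, 100.0)
```

Under these assumptions the budget works out to 100 categories, which agrees with the worst-case figure above; exploiting context, hierarchy or hashing would reduce the number of candidates per category and raise that count.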
23. ISTORE Applications Summary
- Segmentation algorithm development
  - Much compute, little storage
- Real-time recognition of image/video stream content
  - Plus storage for subsequent retrieval
- Content-based indexing of all the images/video on the Internet!