Visual Grouping and Recognition - PowerPoint PPT Presentation

Slides: 24

Provided by: sche59

Learn more at: http://iram.cs.berkeley.edu

1
Visual Grouping and Recognition

David Martin
UC Berkeley

2
UCB Collaborators

Prof. Jitendra Malik
Prof. Dave Patterson
Charless Fowlkes
Doron Tal
See http://www.cs.berkeley.edu/dmartin/papers/iccv01.pdf

3
From images to objects

Labeled sets: tiger, grass, etc.

4
Recognition

Object classes are hierarchical (e.g. leopard vs. clouded leopard vs. my pet clouded leopard Leopold)
Must be tolerant to changes in pose and illumination

5
Framework for Recognition: Three stages

Segmentation: Images → Regions
Association: Regions → Super-regions
Matching: Super-regions → Prototype views

6
Segmentation: Images → Regions

Slight over-segmentation of objects is OK
Under-segmentation is BAD

7
Association: Regions → Super-Regions

Simple enumeration of connected components
Number of super-regions of size k in an image with n regions is approximately $4^k n / k$
For typical images, the number is 1K-10K
Plausibility ordering could reduce the effective number substantially
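
The enumeration step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: regions are treated as nodes of an adjacency graph, the 4-connected grid built by `grid_adjacency` is a hypothetical stand-in for a real region adjacency graph, and `estimated_count` assumes the slide's estimate reads as 4^k · n / k.

```python
def grid_adjacency(rows, cols):
    """Hypothetical region adjacency graph: regions laid out on a 4-connected grid."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            neighbours = set()
            if r > 0:
                neighbours.add(node - cols)
            if r < rows - 1:
                neighbours.add(node + cols)
            if c > 0:
                neighbours.add(node - 1)
            if c < cols - 1:
                neighbours.add(node + 1)
            adj[node] = neighbours
    return adj


def connected_subsets(adj, k):
    """Enumerate every connected set of exactly k regions as a frozenset.

    Starts from single regions and grows each set by one adjacent region
    per round; the set container removes duplicates.
    """
    current = {frozenset([v]) for v in adj}
    for _ in range(k - 1):
        grown = set()
        for subset in current:
            for v in subset:
                for w in adj[v]:
                    if w not in subset:
                        grown.add(subset | {w})
        current = grown
    return current


def estimated_count(n, k):
    """Assumed reading of the slide's estimate: 4^k * n / k."""
    return 4 ** k * n / k
```

Under that assumed reading, `estimated_count(100, 3)` is a little over 2,100, which falls inside the slide's 1K-10K range for typical images.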

8
Matching: Super-regions → Prototype Views

Objects are represented by a set of prototypical views (10 per object)
For each super-region S, calculate the probability that it is an instance of view V
Determine the most probable labeling of the image into objects
Tolerant to:
pose and illumination changes
intra-category variation
errors in the segmentation and association steps

9
Focus on Segmentation

Segmentation Recognition (?!)
Need quantitative measures: a benchmark!
MNIST for handwritten digits
SPEC for CPUs
WinStone for PCs
TPC-C for transaction processing DBs
Segmentation / Recognition on ISTORE

10
Step 1: Establish Ground-truth

We want high-level Gold Standard segmentations.
Range of granularities
Segmentation tool in Java
Explicit partition of image into pixel sets
Ease of deployment
10 UCB grad students
60 images, 180 total segmentations
Up to 5 segmentations/image by different people
Goal: 1K images, 5K segmentations

11
Human Segmentations (1)

12
Human Segmentations (2)

13
Step 2: Similarity / Error Measures

14
Segmentation Refinement

[Figure: regions labeled A, B, C, D]

15
Local Refinement Error

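How much is segmentation $S_1$ a refinement of segmentation $S_2$ at pixel $p_i$?

$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$

where $R(S, p_i)$ is the region of segmentation $S$ that contains pixel $p_i$.
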
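
The local refinement error can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes a segmentation is stored as a flat list of integer segment labels, one per pixel, and the helper names `region_of` and `local_refinement_error` are hypothetical.

```python
def region_of(labels, i):
    """Set of pixel indices that share pixel i's segment label."""
    return {j for j, lab in enumerate(labels) if lab == labels[i]}


def local_refinement_error(s1, s2, i):
    """Fraction of pixel i's region in s1 that lies outside its region in s2."""
    r1 = region_of(s1, i)
    r2 = region_of(s2, i)
    return len(r1 - r2) / len(r1)


# s1 splits the single region of s2 in two, so s1 is a refinement of s2.
s1 = [0, 0, 1, 1]
s2 = [0, 0, 0, 0]
error_refined = local_refinement_error(s1, s2, 0)   # 0.0
error_reverse = local_refinement_error(s2, s1, 0)   # 0.5
```

The measure is not symmetric: it is zero when the first segmentation refines the second at that pixel, and positive in the opposite direction.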
16
Segmentation Error Measures

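Global Consistency Error (GCE)
Refinement in the same direction at all pixels
$\mathrm{GCE}(S_1, S_2) = \frac{1}{n} \min\left\{ \sum_i E(S_1, S_2, p_i),\ \sum_i E(S_2, S_1, p_i) \right\}$
Local Consistency Error (LCE)
Refinement in either direction at each pixel
$\mathrm{LCE}(S_1, S_2) = \frac{1}{n} \sum_i \min\left\{ E(S_1, S_2, p_i),\ E(S_2, S_1, p_i) \right\}$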
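
Both measures can be computed from the per-pixel refinement errors. The sketch below is an assumption-laden illustration rather than the authors' implementation: segmentations are flat lists of segment labels of equal length n, and the function names `refinement_errors`, `gce` and `lce` are hypothetical.

```python
from collections import Counter


def refinement_errors(s1, s2):
    """Per-pixel E(S1, S2, p_i) for two label lists of equal length."""
    size1 = Counter(s1)              # region size in s1, keyed by label
    overlap = Counter(zip(s1, s2))   # size of each s1-region / s2-region intersection
    # |R1 \ R2| = |R1| - |R1 intersect R2|
    return [(size1[a] - overlap[(a, b)]) / size1[a] for a, b in zip(s1, s2)]


def gce(s1, s2):
    """Global Consistency Error: one refinement direction for the whole image."""
    e12 = refinement_errors(s1, s2)
    e21 = refinement_errors(s2, s1)
    return min(sum(e12), sum(e21)) / len(s1)


def lce(s1, s2):
    """Local Consistency Error: refinement direction chosen per pixel."""
    e12 = refinement_errors(s1, s2)
    e21 = refinement_errors(s2, s1)
    return sum(min(a, b) for a, b in zip(e12, e21)) / len(s1)
```

Because LCE takes the minimum inside the sum, it can never exceed GCE for the same pair of segmentations.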

17
Measure Results

            GCE (human vs. human)   LCE (human vs. human)   GCE (NCuts vs. human)
SAME        0.11                    0.07                    0.28
DIFFERENT   0.39                    0.30                    0.38

18
NCuts Per-Image Error

Red: NCuts vs. Human
Blue: Human vs. Human

19
Future Work: Dataset

Obtain 1000-5000 segmentations
Hire undergrads
Vision groups at other schools
More widespread deployment on web?
Release dataset to community
Informal public forum for segmentation algorithm comparison (e.g. MNIST)

20
Future Work: Segmentation Algorithms

Cue combination is the key
luminance, color, texture, motion, stereoscopic depth, familiar configuration
Feedback between segmentation and matching
Distributed computational framework for exploring these issues
Millennium or ISTORE

21
Visual Recognition on ISTORE

Sequential code
Segmentation: 5 minutes / image
Association: negligible
Matching: 0.5 sec / match
Memory requirements are very low
10K object categories × 10 views/category × 100×100 pixels/view × 1 byte/pixel = 1 GB
Computation on a $10^4$-node ISTORE
Segmentation
50% embarrassingly parallel (many convolutions)
50% sparse eigenvalue problem
→ Frame rate throughput, but not latency
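
The memory and segmentation figures on this slide can be checked with a short calculation. This is a back-of-the-envelope sketch, not a measurement: it assumes the node count reads as 10^4, and it assumes that throughput comes from each node segmenting a different image, which is one possible reading of the "throughput, but not latency" remark.

```python
# Prototype-view storage, using the numbers on the slide.
categories = 10_000
views_per_category = 10
pixels_per_view = 100 * 100
bytes_per_pixel = 1
total_bytes = categories * views_per_category * pixels_per_view * bytes_per_pixel

# Segmentation throughput. Assumption: 10^4 nodes, one image per node at a time.
nodes = 10 ** 4
seconds_per_image = 5 * 60
images_per_second = nodes / seconds_per_image

# Latency for a single image is unchanged under this assumption.
latency_seconds = seconds_per_image

print(total_bytes)                   # 1000000000 bytes, i.e. 1 GB
print(round(images_per_second, 1))   # 33.3 images/sec, roughly video frame rate
```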

Matching: Embarrassingly parallel
1K candidate super-regions
20K matches/sec at full resolution
Consider only 1% of matches at full resolution (10% pass color/texture filter, 10% of those pass low-resolution shape filter)
If half the time is spent in pruning and half in full-resolution matching, we get 10K matches/sec
Worst case: 100 object categories
Best case: depends on how well one can exploit context, hierarchy and hashing
Humans can recognize 10K-100K objects
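
The matching figures follow from simple arithmetic, sketched below. The node count of 10^4 is an assumption (it is the value that reproduces 20K matches/sec from 0.5 sec per match), and the final line is only one possible reading of the slide: it treats the 10K matches/sec as the 1% of candidates that survive both filters.

```python
from fractions import Fraction

nodes = 10 ** 4              # assumption: node count of the ISTORE cluster
seconds_per_match = 0.5      # sequential full-resolution matching cost

# Full-resolution matching rate across all nodes.
full_res_rate = nodes / seconds_per_match

# Two-stage pruning: 10% pass the color/texture filter,
# and 10% of those pass the low-resolution shape filter.
full_res_fraction = Fraction(10, 100) * Fraction(10, 100)

# Half of the time goes to pruning, half to full-resolution matching.
rate_with_pruning = full_res_rate / 2

# One possible reading: candidates screened per second if only the
# surviving fraction reaches full-resolution matching.
candidates_per_second = rate_with_pruning / float(full_res_fraction)

print(full_res_rate)        # 20000.0
print(full_res_fraction)    # 1/100
print(rate_with_pruning)    # 10000.0
```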

23
ISTORE Applications Summary