Visual Grouping and Recognition

1
Visual Grouping and Recognition
  • David Martin
  • UC Berkeley

2
UCB Collaborators
  • Prof. Jitendra Malik
  • Prof. Dave Patterson
  • Charless Fowlkes
  • Doron Tal
  • See http://www.cs.berkeley.edu/~dmartin/papers/iccv01.pdf

3
From images to objects
Labeled sets: tiger, grass, etc.
4
Recognition
  • Object classes are hierarchical (e.g. leopard vs.
    clouded leopard vs. my pet clouded leopard
    Leopold)
  • Must be tolerant to changes in pose and
    illumination

5
Framework for Recognition: Three stages
  • Segmentation: Images → Regions
  • Association: Regions → Super-regions
  • Matching: Super-regions → Prototype views

6
Segmentation: Images → Regions
  • Slight over-segmentation of objects is OK
  • Under-segmentation is BAD

7
Association: Regions → Super-Regions
  • Simple enumeration of connected components
  • Number of super-regions of size k in an image with n
    regions is approximately $4^k n / k$
  • For typical images, number is 1K-10K
  • Plausibility ordering could reduce effective
    number substantially
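
The enumeration step above can be sketched in a few lines. This is a minimal brute-force illustration, not the presenter's implementation: the region adjacency graph `adjacency` is a hypothetical 2x2 arrangement of four regions, and the function names are made up for this example. A super-region is taken here to be any connected set of exactly k adjacent regions.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def super_regions(adjacency, k):
    """Enumerate all connected sets of exactly k regions (brute force)."""
    return [
        frozenset(candidate)
        for candidate in combinations(sorted(adjacency), k)
        if is_connected(candidate, adjacency)
    ]


# Hypothetical region adjacency graph: four regions laid out as
#   0 1
#   2 3
# where horizontally and vertically touching regions are adjacent.
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}

counts = {k: len(super_regions(adjacency, k)) for k in range(1, 5)}
print(counts)  # {1: 4, 2: 4, 3: 4, 4: 1}
```

Testing every k-subset is only practical for small graphs; a plausibility ordering, as the slide suggests, would prune most candidates before they are ever matched.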

8
Matching: Super-regions → Prototype Views
  • Objects are represented by a set of prototypical
    views (10 per object)
  • For each super-region S, calculate probability
    that it is an instance of view V
  • Determine most probable labeling of image into
    objects
  • Tolerant to:
      • pose and illumination changes
      • intra-category variation
      • error in segmentation and association steps

9
Focus on Segmentation
  • Segmentation Recognition (?!)
  • Need quantitative measures: A benchmark!
      • MNIST for handwritten digits
      • SPEC for CPUs
      • WinStone for PCs
      • TPC-C for transaction processing DBs
  • Segmentation / Recognition on ISTORE

10
Step 1: Establish Ground-truth
  • We want high-level Gold Standard segmentations.
  • Range of granularities
  • Segmentation tool in Java
  • Explicit partition of image into pixel sets
  • Ease of deployment
  • 10 UCB grad students
  • 60 images, 180 total segmentations
  • Up to 5 segmentations/image by different people
  • Goal: 1K images, 5K segmentations

11
Human Segmentations (1)
12
Human Segmentations (2)
13
Step 2: Similarity / Error Measures
14
Segmentation Refinement
[Figure: regions labeled A, B, C, and D]
15
Local Refinement Error
  • How much is segmentation S1 a refinement of
    segmentation S2 at pixel pi?

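$$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$
where R(S, p) is the region of segmentation S that contains pixel p.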
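
A small worked example may help here. It reads the local refinement error as the size of the set difference between the two regions containing the pixel, normalized by the size of the first region. The pixel indices and the function name are hypothetical, chosen only for illustration.

```python
def local_refinement_error(region_1, region_2):
    """Fraction of region_1's pixels that fall outside region_2."""
    return len(region_1 - region_2) / len(region_1)


# Hypothetical pixel indices. In S1 the pixel lies in a small region;
# in S2 it lies in a larger region that fully contains the small one.
r1 = {0, 1}
r2 = {0, 1, 2, 3}

print(local_refinement_error(r1, r2))  # 0.0
print(local_refinement_error(r2, r1))  # 0.5
```

The measure is asymmetric: it is zero when the first region is contained in the second (S1 refines S2 at that pixel) and grows as the first region spills outside the second.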
16
Segmentation Error Measures
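  • Global Consistency Error (GCE)
      • Refinement in same direction at all pixels
      • $\mathrm{GCE}(S_1, S_2) = \frac{1}{n} \min\left\{ \sum_i E(S_1, S_2, p_i),\ \sum_i E(S_2, S_1, p_i) \right\}$
  • Local Consistency Error (LCE)
      • Refinement in either direction at each pixel
      • $\mathrm{LCE}(S_1, S_2) = \frac{1}{n} \sum_i \min\left\{ E(S_1, S_2, p_i),\ E(S_2, S_1, p_i) \right\}$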
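
The two measures can be sketched as follows, assuming each segmentation is given as a flat list of region labels, one per pixel (a hypothetical representation chosen for brevity). GCE takes the minimum of the two summed errors, so refinement must go the same way everywhere; LCE takes the minimum at each pixel, so the direction may vary across the image.

```python
def regions_by_pixel(labels):
    """For each pixel, return the set of pixels that share its label."""
    groups = {}
    for pixel, label in enumerate(labels):
        groups.setdefault(label, set()).add(pixel)
    return [groups[label] for label in labels]


def refinement_errors(labels_1, labels_2):
    """Per-pixel local refinement error of segmentation 1 against 2."""
    regions_1 = regions_by_pixel(labels_1)
    regions_2 = regions_by_pixel(labels_2)
    return [len(a - b) / len(a) for a, b in zip(regions_1, regions_2)]


def gce(labels_1, labels_2):
    """Global Consistency Error: one refinement direction for all pixels."""
    e12 = refinement_errors(labels_1, labels_2)
    e21 = refinement_errors(labels_2, labels_1)
    return min(sum(e12), sum(e21)) / len(labels_1)


def lce(labels_1, labels_2):
    """Local Consistency Error: best refinement direction per pixel."""
    e12 = refinement_errors(labels_1, labels_2)
    e21 = refinement_errors(labels_2, labels_1)
    return sum(min(x, y) for x, y in zip(e12, e21)) / len(labels_1)


# Hypothetical 6-pixel segmentations. On pixels 0-3, seg_b refines seg_a;
# on pixels 4-5, seg_a refines seg_b.
seg_a = [0, 0, 0, 0, 1, 2]
seg_b = [0, 0, 1, 1, 2, 2]

print(lce(seg_a, seg_b))            # 0.0
print(round(gce(seg_a, seg_b), 3))  # 0.167
```

In this example LCE is zero because each pixel is consistent in one direction or the other, while GCE is nonzero because no single direction works for the whole image. By construction LCE never exceeds GCE.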

17
Measure Results
                          Same image    Different images
GCE (human vs. human)     0.11          0.39
LCE (human vs. human)     0.07          0.30
GCE (NCuts vs. human)     0.28          0.38
18
NCuts Per-Image Error
Red: NCuts vs. Human
Blue: Human vs. Human
19
Future Work: Dataset
  • Obtain 1000-5000 segmentations
      • Hire undergrads
      • Vision groups at other schools
      • More widespread deployment on web?
  • Release dataset to community
      • Informal public forum for segmentation algorithm
        comparison
      • e.g. MNIST

20
Future Work: Segmentation Algorithms
  • Cue combination is the key
      • luminance, color, texture, motion, stereoscopic
        depth, familiar configuration
  • Feedback between segmentation and matching
  • Distributed computational framework for exploring
    these issues
      • Millennium or ISTORE

21
Visual Recognition on ISTORE
  • Sequential code
      • Segmentation: 5 minutes / image
      • Association: Negligible
      • Matching: 0.5 sec / match
  • Memory requirements are very low
      • 10K object categories × 10 views/category × 100 ×
        100 pixels/view × 1 byte/pixel = 1 GB.
  • Computation on 10^4-node ISTORE
      • Segmentation
          • 50% embarrassingly parallel (many convolutions)
          • 50% sparse eigenvalue problem
          • → Frame rate throughput, but not latency
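
The arithmetic behind the memory and throughput figures can be checked with a short script. It assumes a 10^4-node machine and, as one possible reading of "throughput, but not latency", that each node segments a separate image; both are assumptions of this sketch rather than statements from the slides.

```python
# Prototype storage, using the figures on the slide.
categories = 10_000
views_per_category = 10
pixels_per_view = 100 * 100
bytes_per_pixel = 1

prototype_bytes = (
    categories * views_per_category * pixels_per_view * bytes_per_pixel
)
print(prototype_bytes)  # 1000000000

# Segmentation throughput, assuming 10^4 nodes that each work on
# their own image (assumption made for this sketch).
nodes = 10**4
seconds_per_image = 5 * 60

images_per_second = nodes / seconds_per_image
print(round(images_per_second, 1))  # 33.3
```

Under these assumptions the aggregate rate is in the range of video frame rates, yet any single image still takes the full five minutes, which is why latency does not improve.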

22
  • Matching: Embarrassingly parallel
      • 1K candidate super-regions
      • 20K matches/sec at full resolution
      • Consider only 1% of matches at full resolution
        (10% pass color/texture filter, 10% of those pass
        low resolution shape filter)
      • If half the time is spent in pruning and half in full
        resolution matching, we get 10K matches/sec
  • Worst case: 100 object categories
  • Best case depends on how well one can exploit
    context, hierarchy and hashing.
  • Humans can recognize 10K-100K objects
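
The matching budget can be worked through numerically. The sketch below assumes 10^4 nodes, 10 prototype views per category, and a two-stage filter in which one in ten candidates survives each stage; the function name is hypothetical, and treating the worst case as one image per second is an interpretation, not something the slides state.

```python
NODES = 10**4            # assumed node count
SECONDS_PER_MATCH = 0.5  # sequential full-resolution match time

# Aggregate full-resolution matching rate across all nodes.
full_res_rate = NODES / SECONDS_PER_MATCH
print(full_res_rate)  # 20000.0

# If only half of the time goes to full-resolution matching:
effective_rate = full_res_rate / 2
print(effective_rate)  # 10000.0


def full_res_matches_per_image(categories, super_regions=1_000, views=10):
    """Count the matches that survive both pruning filters."""
    pairs = super_regions * views * categories
    after_color_texture = pairs // 10      # 1 in 10 passes the first filter
    after_shape = after_color_texture // 10  # 1 in 10 of those passes
    return after_shape


needed = full_res_matches_per_image(categories=100)
print(needed)                   # 10000
print(needed / effective_rate)  # 1.0
```

With 100 categories and no help from context, hierarchy, or hashing, the surviving matches use up the whole one-second budget, which is consistent with the worst-case figure on the slide.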

23
ISTORE Applications Summary
  • Segmentation algorithm development
  • Much compute, little storage
  • Real-time recognition of image/video stream
    content
  • Plus storage for subsequent retrieval
  • Content-based indexing of all the images/video on
    the Internet!