Title: Sliding Windows
Sliding Windows: Silver Bullet or Evolutionary Dead End?
- Alyosha Efros, Bastian Leibe, Krystian Mikolajczyk
- Sicily Workshop, Syracusa, 23.09.2006
What is a Sliding Window Approach?
- Search over space and scale
- Detection as subwindow classification problem
- In the absence of a more intelligent strategy, any global image classification approach can be converted into a localization approach by using a sliding-window search.
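As a rough illustration of that conversion, the sketch below scans a generic scorer over every position and scale. The classify_window function, the 24x24 window, the stride of 4, and the 1.2x pyramid factor are illustrative assumptions, not values from the talk.

    import numpy as np

    def pyramid(image, scale=1.2, min_size=(24, 24)):
        """Yield progressively downscaled copies of a grayscale image (nearest-neighbour)."""
        while image.shape[0] >= min_size[0] and image.shape[1] >= min_size[1]:
            yield image
            new_h, new_w = int(image.shape[0] / scale), int(image.shape[1] / scale)
            rows = (np.arange(new_h) * image.shape[0] / new_h).astype(int)
            cols = (np.arange(new_w) * image.shape[1] / new_w).astype(int)
            image = image[rows][:, cols]

    def sliding_window_detect(image, classify_window, win=(24, 24), stride=4, thresh=0.5):
        """Turn a global image classifier into a localizer by brute-force subwindow search."""
        detections = []
        orig_h = image.shape[0]
        for level in pyramid(image):
            s = orig_h / level.shape[0]              # how much this pyramid level was shrunk
            for y in range(0, level.shape[0] - win[0] + 1, stride):
                for x in range(0, level.shape[1] - win[1] + 1, stride):
                    patch = level[y:y + win[0], x:x + win[1]]
                    score = classify_window(patch)   # any global classification approach
                    if score > thresh:
                        # map the window back to original-image coordinates
                        detections.append((x * s, y * s, win[1] * s, win[0] * s, score))
        return detections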
Task: Object Localization in Still Images
- What options do we have to choose from?
- Sliding window approaches
- Classification problem
- Papageorgiou & Poggio '00, Schneiderman & Kanade '00, Viola & Jones '01, Mikolajczyk et al. '04, Torralba et al. '04, Dalal & Triggs '05, Wu & Nevatia '05, Laptev '06
- Feature-transform based approaches
- Part-based generative models, typically with a star topology
- Fergus et al. '03, Leibe & Schiele '04, Fei-Fei et al. '04, Felzenszwalb & Huttenlocher '05, Winn & Criminisi '06, Opelt et al. '06, Mikolajczyk et al. '06
- Massively parallel NN architectures
- e.g. convolutional NNs
- LeCun et al. '98, Osadchy et al. '04, Garcia et al. '??
- Smart segmentation based approaches
- Localization based on robustified bottom-up segmentation
- Todorovic & Ahuja '06, Roth & Ommer '06
Sliding-Window Approaches
- Pros
- Can draw from vast stock of ML methods.
- Independence assumption between subwindows.
- Makes classification easier.
- Process can be parallelized.
- Simple technique, can be tried out very easily.
- No translation/scale invariance required in the model.
- There are methods to do it very fast.
- Cascades with AdaBoost/SVMs (see the cascade sketch after this list)
- Good detection performance on many benchmark datasets.
- e.g. face detection, VOC challenges
- Direct control over search range (e.g. on the ground plane).
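The "very fast" bullet usually refers to attentional cascades in the style of Viola & Jones: a chain of increasingly expensive classifiers in which most windows are rejected by the cheap early stages. A minimal sketch, in which the per-stage scorers and thresholds are hypothetical placeholders:

    class CascadeStage:
        """One cascade stage: a cheap scoring function plus its rejection threshold."""
        def __init__(self, score_fn, threshold):
            self.score_fn = score_fn       # e.g. a small boosted ensemble or a linear SVM
            self.threshold = threshold

    def cascade_classify(patch, stages):
        """Reject as early as possible; only rare survivors pay the full evaluation cost."""
        for stage in stages:
            if stage.score_fn(patch) < stage.threshold:
                return False               # early rejection handles most background windows
        return True                        # passed every stage -> report a detection

    # Illustrative wiring (the scorers below would have to be trained, e.g. with AdaBoost):
    # stages = [CascadeStage(tiny_boosted_score, 0.1),
    #           CascadeStage(medium_boosted_score, 0.4),
    #           CascadeStage(full_svm_score, 0.0)]
    # keep = cascade_classify(window_patch, stages)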
Sliding-Window Approaches
- Cons
- Can draw from vast stock of ML methods... as long as they can be evaluated in a few ms.
- Need to evaluate many subwindows (100,000s).
- ⇒ Needs very fast, accurate classification
- ⇒ Many training examples required, often limited to low training resolution.
- ⇒ Can only deal with relatively small occlusions.
- Still need to fuse resulting detections (see the fusion sketch after this list)
- ⇒ Hard/suboptimal from binary classification output
- Classification task often ill-defined
- How to label half a car?
- Difficult to deal with changing aspect ratios
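The detection-fusion step mentioned above is typically some variant of non-maximum suppression over the per-window scores. A minimal greedy sketch; the (x, y, w, h, score) tuple format and the 0.5 overlap threshold are illustrative assumptions:

    def iou(a, b):
        """Intersection-over-union of two boxes given as (x, y, w, h)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def non_max_suppression(detections, overlap_thresh=0.5):
        """detections: list of (x, y, w, h, score); keep only local maxima of the score."""
        kept = []
        for det in sorted(detections, key=lambda d: d[4], reverse=True):
            # suppress anything that overlaps a stronger, already-kept detection
            if all(iou(det[:4], k[:4]) < overlap_thresh for k in kept):
                kept.append(det)
        return kept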
Duality to Feature-Based Approaches
- How to find maxima in the Hough space efficiently?
- Maxima search ≈ coarse-to-fine sliding-window stage!
- Main differences
- All features evaluated upfront (instead of in a cascade).
- Generative model instead of discriminative classifier.
- Maxima search already performs detection fusion.
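Read concretely, "maxima search ≈ coarse-to-fine sliding window" means the vote accumulator can be scanned in coarse blocks first and only the promising blocks refined. A small sketch over a 2D Hough accumulator; the 8-pixel block size and top-5 refinement are illustrative assumptions:

    import numpy as np

    def coarse_to_fine_maxima(votes, block=8, top_k=5):
        """votes: 2D array of accumulated Hough votes (e.g. over object-centre positions)."""
        h, w = votes.shape
        blocks = []
        for y in range(0, h, block):                       # coarse "sliding window" pass
            for x in range(0, w, block):
                blocks.append((votes[y:y + block, x:x + block].sum(), y, x))
        blocks.sort(reverse=True)

        maxima = []
        for _, y, x in blocks[:top_k]:                     # fine pass inside the best blocks
            patch = votes[y:y + block, x:x + block]
            dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
            maxima.append((y + dy, x + dx, patch[dy, dx]))
        return maxima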
So What is Left to Oppose?
- Feature-based vs. Window-based?
- (Almost) exclusive use of discriminative methods
- Low training resolutions
- How to deal with changing aspect ratios?
1. Feature-based vs. Window-based
- May be mainly an implementation trade-off
- Few, localized features ⇒ feature-based evaluation better
- Many, dense features ⇒ window-based evaluation better
- Noticed already by e.g. Schneiderman '04
- The trade-offs may change as your method develops
2. Exclusive Use of Discriminative Methods
Leibe & Schiele '04
Generative Models for Sliding Windows
- Continuous confidence scores
- Smoother maxima in hypothesis space
- Coarser sampling possible
- Backprojection capability
- Determine a hypothesis's support in the image (see the sketch after this list)
- Resolve overlapping cases
- Easier to deal with partial occlusion
- Part-based models
- Reasoning about missing parts
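A toy illustration of these three points, loosely in the spirit of Leibe & Schiele '04 but not their actual formulation: each feature vote contributes a continuous amount to the window's score and is also recorded per pixel, so a hypothesis's support can be backprojected and missing support (occlusion) located.

    import numpy as np

    def score_and_backproject(features, window, image_shape):
        """features: list of (x, y, weight) votes from a generative part/feature model,
        with 'weight' standing in for p(feature | object) -- an illustrative simplification.
        Returns a continuous confidence and a per-pixel support map for the hypothesis."""
        x0, y0, w, h = window
        support = np.zeros(image_shape, dtype=float)
        score = 0.0
        for fx, fy, weight in features:
            if x0 <= fx < x0 + w and y0 <= fy < y0 + h:
                score += weight                       # soft evidence, not a hard 0/1 label
                support[int(fy), int(fx)] += weight   # backprojected support in the image
        return score, support

    # Overlapping hypotheses can then be resolved by asking which one "owns" the shared
    # support, and partial occlusion shows up as missing support in a predictable region.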
Sliding Windows for Generative Models
- Apply cascade idea to generative models
- Discriminative training
- Evaluate most promising features first
- Direct control over search range
- Only need to evaluate positions in the search corridor
- Only need to consider a subset of features
- Easier to adapt to different geometry (e.g. curved ground surface)
- ⇒ Should combine discriminative and generative elements!
[Figure: search corridor]
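A rough sketch of how such a search corridor follows from scene geometry: with a pinhole camera looking parallel to a flat ground plane (the camera height, horizon row, and object height below are illustrative assumptions), an object's foot-point row already fixes its expected pixel height, so only a narrow band of window sizes needs evaluating at each position.

    def expected_height_px(v_foot, horizon_row, cam_height_m, obj_height_m):
        """Flat ground plane, camera looking parallel to it: pixel height grows linearly
        with the distance of the foot-point row below the horizon."""
        if v_foot <= horizon_row:
            return None              # foot point above the horizon: geometrically impossible
        return obj_height_m * (v_foot - horizon_row) / cam_height_m

    def search_corridor(image_height, horizon_row=240, cam_height_m=1.2,
                        obj_height_m=1.7, tolerance=0.3):
        """For each candidate foot-point row, the narrow range of window heights worth testing."""
        corridor = {}
        for v_foot in range(horizon_row + 1, image_height):
            h = expected_height_px(v_foot, horizon_row, cam_height_m, obj_height_m)
            corridor[v_foot] = (h * (1 - tolerance), h * (1 + tolerance))
        return corridor

A curved ground surface would only change the per-row mapping, which is exactly the kind of adaptation the bullet above alludes to.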
3. Low Training Resolutions
- Many current s-w detectors operate on tiny images
- Viola & Jones: 24×24 pixels
- Torralba et al.: 32×32 pixels
- Dalal & Triggs: 64×96 pixels (notable exception)
- Main reasons
- Training efficiency (exhaustive feature selection in AdaBoost)
- Evaluation speed
- Want to recognize objects at small scales
- But
- Limited information content available at those resolutions (see the quick arithmetic below)
- Not enough support to compensate for occlusions!
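Quick arithmetic on the window sizes above makes the information-content point concrete (the comparison is just pixel counting):

    vj  = 24 * 24    # Viola & Jones window    ->   576 pixels
    tor = 32 * 32    # Torralba et al. window  -> 1,024 pixels
    dt  = 64 * 96    # Dalal & Triggs window   -> 6,144 pixels
    print(vj, tor, dt, round(dt / vj, 1))      # 576 1024 6144 10.7 -> roughly 10x more support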
4. Changing Aspect Ratios
- Sliding window requires a fixed window size
- Basis for learning an efficient cascade classifier
- How to deal with changing aspect ratios?
- Fixed window size
- ⇒ Wastes training dimensions
- Adapted window size
- ⇒ Difficult to share features
- Squashed views (Dalal & Triggs)
- ⇒ Need to squash the test image, too
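A minimal sketch of the "squashed views" option: every crop, whatever its aspect ratio, is resampled to one canonical window, and test windows get squashed the same way before classification. The 64x64 canonical size and the nearest-neighbour resampling are illustrative choices only.

    import numpy as np

    def squash_to_canonical(crop, canonical=(64, 64)):
        """Resample a crop of arbitrary aspect ratio to a fixed window, deliberately
        ignoring (squashing) its original aspect ratio."""
        ch, cw = canonical
        rows = (np.arange(ch) * crop.shape[0] / ch).astype(int)
        cols = (np.arange(cw) * crop.shape[1] / cw).astype(int)
        return crop[rows][:, cols]

    # Training: squash every annotated box to the canonical size.
    # Testing: candidate windows of several aspect ratios are squashed to the same
    # size, so one fixed-size classifier can score all of them.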
- What is wrong with sliding window? Search complexity?
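To put a rough number on the search complexity, here is a back-of-the-envelope count of subwindows; the 640x480 image, 24x24 window, stride of 4, and 1.2x pyramid factor are all illustrative assumptions:

    def count_subwindows(img_w=640, img_h=480, win=24, stride=4, scale=1.2):
        """Rough count of windows a sliding-window detector has to classify."""
        total, w, h = 0, img_w, img_h
        while w >= win and h >= win:
            nx = (w - win) // stride + 1
            ny = (h - win) // stride + 1
            total += nx * ny
            w, h = int(w / scale), int(h / scale)
        return total

    print(count_subwindows())   # ~5 * 10**4 here; a finer stride or larger images quickly
                                # reach the 100,000s of windows mentioned above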
- Is there anything that cannot be done with sliding window?
Sliding-Window Approaches
- Pros
- Can draw from vast stock of ML methods.
- Simple technique, can be tried out very easily.
- There are methods to do it very fast.
- Good detection performance on many benchmark datasets.
- Direct control over search range (e.g. on the ground plane).
- Cons
- Need to evaluate many subwindows (100,000s).
- ⇒ Needs very fast, accurate classification ⇒ cascades, AdaBoost
- ⇒ Many training examples, often limited to low training resolution.
- ⇒ Can only deal with relatively small occlusions.
- Still need to fuse resulting detections
- ⇒ Hard/suboptimal from binary classification output
- Difficult to deal with changing aspect ratios
So What is Left to Oppose?
- Feature-based vs. Window-based?
- Mainly implementation trade-off
- (Almost) exclusive use of discriminative methods
- Why not apply generative methods instead, or combinations?
- ⇒ Smoother maxima in the sampled 3D space.
- ⇒ Ability to backproject responses (top-down segmentation).
- ⇒ Easier to deal with partial occlusions.
- Low training resolutions
- Only limited information content
- How to deal with changing aspect ratios?
- E.g. front & side views of cars?
- Fixed/adaptive window size?
- How to share features between those?