Title: Feature Harvesting for Tracking-by-Detection
1. Feature Harvesting for Tracking-by-Detection
- Mustafa Ozuysal
- Vincent Lepetit
- Francois Fleuret
- Pascal Fua
2. The Problem
- The 3-D object detection and pose estimation problem.
- Starting from a simple ellipsoid, we robustly learn both the geometry and the appearance of the object.
- At run time, we detect the features and compute the pose of a moving object.
3. The Ellipsoid
We define an ellipsoid that roughly projects onto the object's location in the first frame (a rough model used only for initialization). We then extract feature points inside this projection and use the image patches surrounding them to train a first classifier.
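Back-projecting a feature point onto the ellipsoid amounts to intersecting the camera ray through that pixel with the ellipsoid; a pixel lies inside the ellipsoid's projection exactly when such an intersection exists. The sketch below is only an illustration under assumptions not stated in the slides: a pinhole camera at the origin with hypothetical intrinsics fx, fy, cx, cy, an axis-aligned ellipsoid given in camera coordinates by its center and radii, and a camera located outside the ellipsoid.

```python
import math
from typing import Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject_to_ellipsoid(u: float, v: float,
                             fx: float, fy: float, cx: float, cy: float,
                             center: Sequence[float],
                             radii: Sequence[float]) -> Optional[Point3]:
    """Return the visible 3-D point of the ellipsoid under pixel (u, v), or None.

    Assumptions (hypothetical): pinhole camera at the origin looking down +z,
    axis-aligned ellipsoid in camera coordinates, camera outside the ellipsoid.
    """
    # Direction of the ray through the pixel (camera coordinates, z = 1).
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rescale each axis so the ellipsoid becomes a unit sphere.
    ds = [d[k] / radii[k] for k in range(3)]
    oc = [-center[k] / radii[k] for k in range(3)]  # (origin - center), rescaled
    # Solve |oc + t * ds|^2 = 1 for t.
    a = sum(x * x for x in ds)
    b = 2.0 * sum(ds[k] * oc[k] for k in range(3))
    c = sum(x * x for x in oc) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses: pixel is outside the ellipsoid's projection
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # nearer root = front surface
    if t <= 0.0:
        return None  # intersection behind the camera
    return (t * d[0], t * d[1], t * d[2])
```

For example, with fx = fy = 100, cx = cy = 50 and a unit sphere centered at (0, 0, 10), the central pixel (50, 50) back-projects to (0.0, 0.0, 9.0), while the corner pixel (0, 0) falls outside the projection and yields None.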
4. Randomized Trees for Feature Recognition: Outline
- The approach relies on matching image features extracted from training images with those extracted at run time, despite large perspective and scale variations.
- This wide-baseline matching problem is formulated as a classification problem.
- The set of all possible views of each individual 3-D point (an object feature) is treated as a class.
- During training, the image features associated with an object feature are extracted, and the image patches surrounding them are used to train a set of randomized trees.
5. Classifier, Object Features, Image Patches
- M = {M_i}: a set of 3-D object features on the target object.
- f_{i,j}: a number of image patches centered on the projections of M_i into image j.
- A classifier C is trained using the patches f_{i,j}.
- C maps a given patch to an object feature: C(f) = i if f is recognized as a view of M_i.
- At run time, C can then be used to recognize the object features M_i by considering the image patch f around a detected image feature. The 3-D position of M_i is required to compute the 3-D pose.
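To make the notation concrete, the sketch below builds the labelled training set of pairs (f_{i,j}, i): each object feature M_i is projected into image j with a camera matrix, and the square patch around the projection becomes a training sample of class i. The 3x4 projection-matrix representation, the grayscale image stored as a list of rows, and the patch half-size are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale image, indexed image[row][col]
Matrix34 = Sequence[Sequence[float]]


def project(P: Matrix34, M: Sequence[float]) -> Tuple[float, float]:
    """Project 3-D point M with a 3x4 camera matrix P; returns (u, v) = (col, row)."""
    X = (M[0], M[1], M[2], 1.0)
    x = [sum(P[r][k] * X[k] for k in range(4)) for r in range(3)]
    return x[0] / x[2], x[1] / x[2]


def extract_patch(image: Image, center: Tuple[float, float], half: int):
    """Return the (2*half+1)^2 patch centered on the rounded point, or None near borders."""
    u, v = int(round(center[0])), int(round(center[1]))
    h, w = len(image), len(image[0])
    if not (half <= u < w - half and half <= v < h - half):
        return None
    return [row[u - half:u + half + 1] for row in image[v - half:v + half + 1]]


def training_patches(features, views, half=2):
    """Collect (f_ij, i) pairs: the patch around the projection of M_i into image j.

    features: list of 3-D points M_i; views: list of (image_j, P_j) pairs.
    """
    samples = []
    for image, P in views:
        for i, M in enumerate(features):
            patch = extract_patch(image, project(P, M), half)
            if patch is not None:  # skip features projecting too close to the border
                samples.append((patch, i))
    return samples
```

The class label is simply the index i of the object feature, which is what lets the classifier output C(f) = i identify the 3-D point M_i needed for pose estimation.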
6. Randomized Trees as a classifier
- Randomized trees are particularly well adapted because they naturally handle multi-class problems.
- The tree leaves contain estimates of the posterior distribution over the classes, learned from the training data.
7. Randomized Trees as a classifier (contd)
- A patch f is classified by dropping it down each tree, performing at each node an elementary test that sends it to one child or the other, and summing the probabilities stored in the leaves it reaches; the patch is assigned the class with the highest total.
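The summation rule fits in a few lines. In the sketch below, each tree is represented only by the class distributions stored in its leaves (a hypothetical layout: leaf_posteriors[t][l] is the distribution of leaf l in tree t), and the leaf reached by the patch in each tree is assumed to be known already; the patch is given the class with the largest summed probability.

```python
from typing import List, Sequence, Tuple


def classify(leaf_posteriors: Sequence[Sequence[Sequence[float]]],
             reached_leaves: Sequence[int]) -> Tuple[int, List[float]]:
    """Sum the class distributions stored in the leaves a patch reached.

    leaf_posteriors[t][l] is the distribution stored in leaf l of tree t;
    reached_leaves[t] is the index of the leaf the patch reached in tree t.
    Returns the class with the largest sum, and the summed scores.
    """
    n_classes = len(leaf_posteriors[0][0])
    score = [0.0] * n_classes
    for tree, leaf in zip(leaf_posteriors, reached_leaves):
        for k, p in enumerate(tree[leaf]):
            score[k] += p
    best = max(range(n_classes), key=lambda k: score[k])
    return best, score
```

Summing (equivalently, averaging) over several trees smooths out the errors of any single randomly built tree.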
8. Elementary Test in Trees
- Simple binary test at each node:
- If I(f, m1) < I(f, m2), go to the left child;
- otherwise, go to the right child.
- Here I(f, m) is the intensity of image patch f at pixel m, and m1, m2 are two pixel locations in the patch.
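A minimal sketch of the node test and of dropping a patch down one tree follows. It assumes a complete binary tree stored breadth-first, with the pixel pair (m1, m2) of internal node k in tests[k] and the children of node k at 2k+1 (left) and 2k+2 (right); this layout and the (row, col) pixel convention are illustrative choices, not taken from the slides. Equal intensities go to the right child, matching the "otherwise" branch.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col) inside the patch


def elementary_test(patch: Sequence[Sequence[int]], m1: Pixel, m2: Pixel) -> bool:
    """True means 'go to the left child': I(f, m1) < I(f, m2)."""
    return patch[m1[0]][m1[1]] < patch[m2[0]][m2[1]]


def drop_patch(patch: Sequence[Sequence[int]],
               tests: List[Tuple[Pixel, Pixel]]) -> int:
    """Drop a patch down a complete binary tree and return the index of its leaf.

    tests[k] holds (m1, m2) for internal node k; nodes are numbered breadth-first,
    so the children of k are 2k+1 (left) and 2k+2 (right).
    """
    n_internal = len(tests)
    node = 0
    while node < n_internal:
        m1, m2 = tests[node]
        node = 2 * node + 1 if elementary_test(patch, m1, m2) else 2 * node + 2
    return node - n_internal  # leaves are numbered 0 .. n_internal
```

Each test reads only two pixels, so classifying a patch costs one comparison per level of each tree.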
9. Randomized Trees as a classifier (contd)
10. Randomized Trees and On-line Learning
- The approach described above assumes that the complete training set is available from the beginning. This is not true in our case, as object features may be added or removed while the classifier is being trained.
- A tree-building method that can accommodate these changes should therefore be used.
11. Randomized Trees and On-line Learning: Tree Building
- We build the trees by randomly selecting the elementary tests.
- The training data is used only to evaluate the posterior probabilities in the leaves of these randomly generated trees.
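Because the tests are drawn at random, a tree's structure does not depend on which object features currently exist, which is what makes later additions and removals cheap. The sketch below draws one random pixel pair per internal node of a complete tree stored breadth-first; the depth, patch size, number of trees and seeds are hypothetical values, and the leaf posteriors are left to be estimated from training data separately.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]


def random_tree_tests(depth: int, patch_size: int,
                      rng: random.Random) -> List[Tuple[Pixel, Pixel]]:
    """Draw the tests of a complete binary tree of the given depth at random.

    One (m1, m2) pixel pair per internal node, stored breadth-first. No training
    data is used here; only the leaf posteriors are estimated from data later.
    """
    tests = []
    for _ in range(2 ** depth - 1):
        m1 = (rng.randrange(patch_size), rng.randrange(patch_size))
        m2 = m1
        while m2 == m1:  # comparing a pixel with itself would carry no information
            m2 = (rng.randrange(patch_size), rng.randrange(patch_size))
        tests.append((m1, m2))
    return tests


# Hypothetical forest: 20 trees of depth 10 over 32x32 patches, one seed per tree.
forest = [random_tree_tests(10, 32, random.Random(seed)) for seed in range(20)]
```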
12. Randomized Trees and On-line Learning: Tree Updating
- Incorporating new views of object features
- This only requires storing the normalizing term and keeping the counters for each class. We then use newly detected patches to increment the counters.
- Removing object features
- Replacing object features
- M1 → M2: the 3-D coordinates of M1 are replaced by those of M2, and the image patches associated with M2 are used to update the posterior.
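The counters can be kept per leaf, as in the sketch below (a hypothetical class, not the authors' code). It assumes the posterior in a leaf is estimated by dividing each class counter by the leaf's total count, with a uniform distribution for a leaf that has seen nothing; the normalization used in the original work may differ. Replacing feature M1 by M2 could then be done by calling remove_class on M1's index and feeding M2's patches under the same index.

```python
from typing import List


class LeafCounts:
    """Hypothetical per-tree leaf statistics supporting on-line updates."""

    def __init__(self, n_leaves: int, n_classes: int) -> None:
        self.counts = [[0] * n_classes for _ in range(n_leaves)]  # one counter per class
        self.totals = [0] * n_leaves  # normalizing term of each leaf

    def add_view(self, leaf: int, cls: int) -> None:
        """A new training patch of class `cls` reached `leaf`: increment the counters."""
        self.counts[leaf][cls] += 1
        self.totals[leaf] += 1

    def remove_class(self, cls: int) -> None:
        """Forget object feature `cls` by removing its counts from every leaf."""
        for leaf, row in enumerate(self.counts):
            self.totals[leaf] -= row[cls]
            row[cls] = 0

    def posterior(self, leaf: int) -> List[float]:
        """Counter-based estimate of the class distribution stored in `leaf`."""
        n_classes = len(self.counts[leaf])
        total = self.totals[leaf]
        if total == 0:
            return [1.0 / n_classes] * n_classes  # assumption: uniform for empty leaves
        return [c / total for c in self.counts[leaf]]
```

Since only integer counters are stored, adding or forgetting a view is a constant-time update and never requires rebuilding the tree.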
13. From Harvesting to Detection: Overview
- How to generate classes: to initialize the training process, we position the ellipsoid, project it into the image, extract some image features and back-project them onto the ellipsoid, which creates an initial set of object features M_i.
- How to generate training image patches: by lightly affine-warping the image patches surrounding the image features, we create the image patches that let us instantiate a first set of randomized trees.
- During training, new features detected on the object are integrated into the classifier, so we need to select which of the existing object features to keep.
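One way to generate such lightly warped views is sketched below: a small random rotation combined with small independent scalings of the two axes about the patch center, applied by inverse mapping with nearest-neighbor sampling. The warp parameterization, its ranges (max_angle, max_scale) and the fill value for pixels that fall outside the source patch are assumptions; the slides do not specify them.

```python
import math
import random
from typing import List, Sequence, Tuple

Affine = Tuple[Tuple[float, float], Tuple[float, float]]


def warp_patch(patch: Sequence[Sequence[int]], A: Affine, fill: int = 0) -> List[List[int]]:
    """Warp a square patch by the 2x2 linear map A about its center.

    Inverse mapping with nearest-neighbor sampling: each output pixel is looked up
    at A^-1 applied to its offset from the center. Pixels mapping outside get `fill`.
    """
    n = len(patch)
    c = (n - 1) / 2.0
    (a, b), (d, e) = A
    det = a * e - b * d
    ia, ib, id_, ie = e / det, -b / det, -d / det, a / det  # entries of A^-1
    out = [[fill] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            dx, dy = x - c, y - c
            sx = int(round(ia * dx + ib * dy + c))
            sy = int(round(id_ * dx + ie * dy + c))
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = patch[sy][sx]
    return out


def random_light_affine(rng: random.Random, max_angle: float = 0.2,
                        max_scale: float = 0.1) -> Affine:
    """Hypothetical 'light' warp: rotation by a small angle (radians) after
    independent small scalings of the x and y axes."""
    t = rng.uniform(-max_angle, max_angle)
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)
    ct, st = math.cos(t), math.sin(t)
    # A = R(t) @ diag(sx, sy)
    return ((ct * sx, -st * sy), (st * sx, ct * sy))


# Usage: ten lightly warped views of a 5x5 test patch.
rng = random.Random(0)
patch = [[10 * r + c for c in range(5)] for r in range(5)]
views = [warp_patch(patch, random_light_affine(rng)) for _ in range(10)]
```

Keeping the warps small means each synthetic view stays close to what the camera could plausibly see from a nearby viewpoint, which is what the slide's "lightly" suggests.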
14. From Harvesting to Detection: Five Steps of Harvesting
- Given the features trained up to time t-1:
- We extract image features from frame t and use the classifier to match them, which in general succeeds only for a subset of these features.
- We derive a first estimate of the camera pose from these correspondences using a robust estimator that lets us reject erroneous correspondences.
- We use this first pose estimate to project unmatched image features from frame t-1 into frame t, and match them by looking for the image features closest to their projections.
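The last step can be sketched as a greedy nearest-neighbor assignment. The input points are assumed to be already projected into frame t with the first pose estimate (that projection is not shown). The slides only say "closest to their projections"; the distance threshold max_dist and the rule that each detected feature is used at most once are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def match_by_projection(projected: Sequence[Point2], detected: Sequence[Point2],
                        max_dist: float = 5.0) -> List[Tuple[int, int]]:
    """Match projected points to the closest detected features in frame t.

    Returns (index in projected, index in detected) pairs. Candidate pairs are
    accepted in order of increasing distance, so each point is used at most once.
    """
    candidates = []
    for i, (pu, pv) in enumerate(projected):
        for j, (du, dv) in enumerate(detected):
            dist = math.hypot(pu - du, pv - dv)
            if dist <= max_dist:
                candidates.append((dist, i, j))
    candidates.sort()
    used_i, used_j, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            matches.append((i, j))
    return matches
```

The brute-force double loop is quadratic in the number of features; a spatial grid or k-d tree would be the usual choice for larger sets.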
15. From Harvesting to Detection: Five Steps of Harvesting (contd)
- Using these additional correspondences, we derive a refined pose estimate.
- We apply small affine warps to the patches around the image features matched in frame t to update the classifier (incorporating new views). Features that have not been recognized often are removed and replaced by new ones.
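The replacement rule can be as simple as thresholding a recognition rate, as sketched below. The slides only say that features "not recognized often" are removed, so the statistics kept (recognized, seen), the minimum number of observations min_seen and the threshold min_rate are all hypothetical.

```python
from typing import List, Sequence


def features_to_replace(recognized: Sequence[int], seen: Sequence[int],
                        min_seen: int = 10, min_rate: float = 0.3) -> List[int]:
    """Indices of object features whose recognition rate is too low.

    recognized[i]: number of frames in which feature i was matched by the classifier;
    seen[i]: number of frames in which feature i was expected to be visible.
    Features observed fewer than min_seen times are kept, since their rate is unreliable.
    """
    return [i for i, (r, s) in enumerate(zip(recognized, seen))
            if s >= min_seen and r / s < min_rate]
```

The returned class indices can then be freed and reused for newly harvested features.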
16. From Harvesting to Detection: Detection
- At run time, we use exactly the same procedure, with one single change: we stop updating the classifier.
17. 3-D Tracking by Detection
18. 3-D Tracking by Detection (contd)
19. Feature Harvesting
- During training we use the same process, but now the classifier is not initially available and we want to create it incrementally by feature harvesting.
- Let us first denote by C_t the best classifier obtained with the images up to frame t and the feature correspondences computed using the corresponding poses.
20. Feature Harvesting (contd)
21. Feature Harvesting (contd)
Once the pose P_t of frame t is found, the classifier C_{t-1} obtained so far is updated using correspondences between object features and image features to give the new classifier C_t.
22. Feature Harvesting (contd)
- To validate this training procedure, we performed
the experiment depicted in the following figure,
which clearly shows that the recovered camera
trajectory does not drift
23. Experiments
24. Experiments