1. Tracking Across Multiple Cameras With Disjoint Views
by Omar Javed, Zeeshan Rasheed, Khurram Shafique, and Mubarak Shah
- Shane Brennan
- 1/31/07
2. The Problem
- Techniques for tracking individuals from a single camera have improved, but tracking an individual across multiple cameras with no overlapping views remains extremely difficult
- The object is hidden from view for an indeterminate amount of time
- The appearance of an individual can change greatly due to changes in viewpoint, lighting, and other environmental conditions
3. An Example of a Camera Setup
4. Some Notation
- Assume the single-camera tracking problem is solved
- K cameras, C1, C2, ..., CK
- Oj = {Oj,1, Oj,2, ..., Oj,mj} is the set of tracks observed by camera Cj
- Observations are broken into two parts: appearance (app) and space-time (st) features (location, velocity, time)
5. Some Notation, continued...
- Let the ordered pair (Oa,b, Oc,d) denote the hypothesis that observations Oa,b and Oc,d are consecutive tracks of the same object
- Find a set of correspondences K such that each observation is preceded or succeeded by at most one other observation, and a hypothesis (Oa,b, Oc,d) is in K only if Oa,b and Oc,d are consecutive tracks of the same object
6. The Formulation
- Maximize the posterior!
- P(k | Oi,a, Oj,b) is the probability of the correspondence k given the observations Oi,a and Oj,b from two cameras Ci and Cj
7. The Formulation, continued...
- From Bayes' theorem, the posterior is proportional to the likelihood of the observations given the correspondence times the prior probability of the correspondence
- Since appearance and space-time information are considered independent, the likelihood factors into an appearance term and a space-time term (see the sketch below)
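A hedged sketch of the factorization just described, using k as shorthand for the correspondence hypothesis (notation assumed for illustration, not copied from the slides):

```latex
P(k \mid O_{i,a}, O_{j,b})
  \;\propto\; P(O_{i,a}, O_{j,b} \mid k)\, P(k)
  \;=\; P\big(O_{i,a}(\mathrm{app}), O_{j,b}(\mathrm{app}) \mid k\big)\,
        P\big(O_{i,a}(\mathrm{st}),  O_{j,b}(\mathrm{st})  \mid k\big)\, P(k)
```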
8. The Prior
- The prior P(k) is the probability that an object transitions from camera Ci to camera Cj
- By also assuming that observations are uniformly distributed, the term P(Oi,a, Oj,b) becomes a constant scale factor, so the posterior maximization reduces to maximizing the likelihood terms times the transition prior
9. Space-Time Conditional Probs
- Assume camera correspondences are known during training
- Let S be a sample of n d-dimensional data points x1, x2, ..., xn drawn from a multivariate distribution p(x); estimate p(x) using the Parzen window (kernel density estimation) technique
- K(x) is the multivariate kernel used in the density estimate
10. Space-Time Cond Probs, continued
- H is a symmetric d x d bandwidth matrix, assumed to be diagonal in order to simplify matters
- x is a 7-dimensional feature vector containing exit location/velocity, entry location/velocity, and time of travel (inter-arrival time)
- During training, as correspondences are found (or hand labeled), the feature vector is added to S (a sketch of the resulting density estimate follows below)
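A minimal sketch of the Parzen window estimate with a diagonal bandwidth matrix; the Gaussian kernel and all names are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def parzen_density(x, samples, bandwidth_diag):
    """Parzen window estimate of p(x) from training samples.

    x              : (d,) query feature vector (exit/entry location,
                     velocity, inter-arrival time)
    samples        : (n, d) matrix S of training feature vectors
    bandwidth_diag : (d,) diagonal of the bandwidth matrix H
    """
    n, d = samples.shape
    h = np.asarray(bandwidth_diag, dtype=float)
    # Scaled differences H^{-1/2} (x - x_i) for every sample.
    u = (x - samples) / np.sqrt(h)
    # Gaussian kernel K(u) = (2*pi)^{-d/2} exp(-0.5 * u^T u).
    k = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2.0 * np.pi) ** (d / 2.0)
    # p(x) = (1/n) * |H|^{-1/2} * sum_i K(H^{-1/2}(x - x_i)).
    return k.sum() / (n * np.sqrt(np.prod(h)))
```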
11. Inter-Arrival Times
- The inter-arrival time depends on the magnitude and direction of motion
- It also depends on the locations of exit and entry between the camera views
- The locations of exit and entry points between cameras are themselves correlated
- The prior probability of correspondence for an object moving from Ci to Cj is calculated as the ratio of the number of people who exit Ci and enter Cj to the total number of people who exit Ci during the learning phase (see the formula below)
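Written out as a formula (notation mine, for illustration):

```latex
P(C_i \rightarrow C_j) \;=\;
  \frac{\#\{\text{objects that exit } C_i \text{ and enter } C_j\}}
       {\#\{\text{objects that exit } C_i\}}
```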
12. Appearance Probs
- Need to model the change in appearance across cameras, i.e., learn the appearance change function
- Use color histograms; represent the distance between two histograms k and q using a modified Bhattacharyya coefficient (see the sketch below)
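A minimal sketch of a Bhattacharyya-based histogram distance. This uses the common D = sqrt(1 - BC) form; the paper's "modified" coefficient may differ, so treat this as an assumed stand-in:

```python
import numpy as np

def bhattacharyya_distance(k, q):
    """Distance between two color histograms k and q.

    Uses D = sqrt(1 - sum_u sqrt(k_u * q_u)); D = 0 for identical
    histograms and D = 1 for non-overlapping ones.
    """
    k = np.asarray(k, dtype=float)
    q = np.asarray(q, dtype=float)
    k = k / k.sum()
    q = q / q.sum()
    bc = np.sum(np.sqrt(k * q))          # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))   # clamp for numerical safety
```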
13. Appearance Probs, continued
- Find the distance D for every object that travels between cameras i and j during training, and model these distances as a Gaussian
- The appearance term of a candidate correspondence can then be computed as the Gaussian density of its distance, where the mean and variance are those of the color distance data between cameras i and j (written out below)
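In symbols (a hedged reconstruction; the subscripts are my notation):

```latex
P\big(O_{i,a}(\mathrm{app}), O_{j,b}(\mathrm{app}) \mid k\big)
  \;\approx\; \mathcal{N}\big(D_{ab};\, \mu_{ij},\, \sigma_{ij}^{2}\big)
  \;=\; \frac{1}{\sqrt{2\pi}\,\sigma_{ij}}
        \exp\!\Big(-\frac{(D_{ab}-\mu_{ij})^{2}}{2\sigma_{ij}^{2}}\Big)
```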
14. Some Distance Histograms
15. Establishing Correspondences
- Finding K can be modelled as finding paths through a directed graph: each node is an observation Oi,a, a correspondence is an arc between two nodes, and the weight of an arc is the value of the log-likelihood function
- A solution K is a set of disjoint directed paths covering the entire graph (every vertex is in exactly one path); the solution to the MAP problem is the set whose sum of arc weights is maximum among all such sets
16. Establishing Corresp, continued...
- Can reduce this to finding a maximum matching of an undirected bipartite graph
- Can be solved in O(n^2.5) time using the method described by Hopcroft and Karp in "An n^(5/2) Algorithm for Maximum Matchings in Bipartite Graphs"
- Split each vertex into two vertices, v- and v+: v- receives the arcs coming into the vertex, and v+ carries the arcs leaving the vertex (a sketch of the matching step follows below)
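A minimal sketch of the weighted matching step. It swaps in scipy's Hungarian-style solver (linear_sum_assignment) as a stand-in for the Hopcroft-Karp routine the slides cite, and the exit/entry split plays the role of the v-/v+ vertex duplication; all names are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_observations(log_likelihood, min_score=-np.inf):
    """Match 'exiting' observations (rows) to 'entering' observations (columns).

    log_likelihood : (n, m) matrix of arc weights; entry [a, b] is the
                     log-likelihood that observations a and b are
                     consecutive tracks of the same object.
    Returns the correspondence set K as a list of (a, b) index pairs.
    """
    rows, cols = linear_sum_assignment(log_likelihood, maximize=True)
    # Drop pairings whose score is too low to be a plausible correspondence.
    return [(a, b) for a, b in zip(rows, cols)
            if log_likelihood[a, b] > min_score]
```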
17. The bad part...
- The method of establishing correspondences assumes all observations are available, so it can't be used in real time!
- Fix: use a sliding window. This is a tradeoff between accuracy and timely availability of results
- The authors adjust the size of the sliding window online, but this is still a sub-optimal solution; it would be best not to need a sliding window at all, but the need is inherent in the method
18. Online Update
- Incorporate new observations and discard old ones
- Achieved by estimating the density of D from the most recent N samples: the Gaussian parameters are updated from the N recent distance samples using a learning-rate parameter (a sketch follows below)
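A minimal sketch of one plausible running update of the appearance Gaussian, assuming an exponential-forgetting scheme with learning rate alpha (the exact update rule is not recoverable from the slides):

```python
import numpy as np

def update_gaussian(mu, var, recent_distances, alpha=0.1):
    """Blend the current Gaussian (mu, var) with the statistics of the
    N most recent color-distance samples.

    alpha is the learning-rate parameter: alpha = 0 keeps the old model,
    alpha = 1 replaces it with the recent-sample statistics.
    """
    d = np.asarray(recent_distances, dtype=float)
    mu_new = (1.0 - alpha) * mu + alpha * d.mean()
    var_new = (1.0 - alpha) * var + alpha * d.var()
    return mu_new, var_new
```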
19. Results
20. Appearance Modeling for Tracking in Multiple Non-overlapping Cameras
by Omar Javed, Khurram Shafique, and Mubarak Shah
- The goal: a better representation for finding the brightness transfer function between the appearances of an individual as seen in two separate cameras
- The authors represent the change in appearance as a function they call a Brightness Transfer Function (BTF)
21. Brightness Transfer Functions
- The BTFs for a pair of cameras lie in a small subspace of the space of all possible BTFs
- For a one-to-one mapping of brightness values, objects must be planar and have only diffuse reflectance
22. Some Notation
- Li(p, t) is the scene radiance at a world point p of an object illuminated by white light, as observed by camera Ci at time t
- Assuming objects have no specular reflectance, Li(p, t) is the product of a material term Mi(p, t) = M(p) (the albedo) and an illumination/camera-geometry term Gi(p, t)
- So Li(p, t) = M(p) Gi(p, t)
23. BTF Formulation
- Assuming planarity, Gi(p, t) = Gi(q, t) = Gi(t) for all points p and q on an object, so Li(p, t) = M(p) Gi(t)
- Image irradiance is given as Ei(p, t) = Li(p, t) Yi(t) = M(p) Gi(t) Yi(t), where Yi(t) is a function of camera parameters: hi(t) and di(t) are the focal length and aperture of the lens, and the remaining factor involves the angle that the principal ray from p makes with the optical axis. The cos term is negligible and is replaced with a constant c (see the sketch below)
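A hedged reconstruction of the image-irradiance term just described, written in the usual thin-lens form (the exact expression on the slide is not recoverable, so treat this as a sketch):

```latex
E_i(p, t) \;=\; L_i(p, t)\, Y_i(t),
\qquad
Y_i(t) \;=\; \frac{\pi}{4}\left(\frac{d_i(t)}{h_i(t)}\right)^{2} \cos^{4}\alpha(p, t)
       \;\approx\; \frac{\pi}{4}\left(\frac{d_i(t)}{h_i(t)}\right)^{2} c
```

Here alpha(p, t) denotes the angle between the principal ray from p and the optical axis, the term approximated by the constant c above.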
24. BTF Formulation, continued...
- Denote by Xi(t) the time of exposure and by gi the radiometric response function of camera Ci; then the measured image brightness Bi(p, t) of world point p can be written as Bi(p, t) = gi(Ei(p, t) Xi(t)) = gi(M(p) Gi(t) Yi(t) Xi(t)), i.e., the radiometric response applied to the product of the material, geometric, camera-parameter, and exposure-time terms
25. Calculating the BTF
- Assume a point p is viewed by both cameras i and j; since the material properties M(p) remain constant across the two views, the two measured brightness values can be related
- The BTF maps the brightness of p as seen in camera i to its brightness as seen in camera j through a factor w(ti, tj), a function of the camera parameters and the illumination/scene geometry of cameras i and j at times ti and tj (see the sketch below)
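A hedged reconstruction of this relation, obtained by eliminating M(p) from the brightness model of slide 24 (treat it as a sketch rather than the paper's exact equation):

```latex
B_j(p, t_j) \;=\; g_j\!\big(M(p)\, G_j(t_j)\, Y_j(t_j)\, X_j(t_j)\big)
            \;=\; g_j\!\Big(w(t_i, t_j)\, g_i^{-1}\!\big(B_i(p, t_i)\big)\Big),
\qquad
w(t_i, t_j) \;=\; \frac{G_j(t_j)\, Y_j(t_j)\, X_j(t_j)}
                       {G_i(t_i)\, Y_i(t_i)\, X_i(t_i)}
```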
26. Calculating the BTF, continued...
- The previous relation is valid for any point p, so p can be dropped from the notation. It is implicit that the BTF is the same for any given pair of frames, so ti and tj can also be dropped for simplicity. Let fij denote a BTF from camera i to camera j
- Create a vector for fij by sampling it at a set of fixed, increasing brightness values Bi(1) < Bi(2) < ... < Bi(d): fij = (fij(Bi(1)), ..., fij(Bi(d)))
27. Calculating the BTF, continued...
- The space of all BTFs has dimension at most d, where d is the number of brightness levels (256 for typical cameras). It can be shown that the BTFs actually lie in a small subspace, using Theorem 1
- Theorem 1: the subspace of brightness transfer functions has dimension at most m if the radiometric response function gj of camera Cj can be expressed in terms of m arbitrary but fixed 1D functions (see the paper for the precise condition)
28. Calculating the BTF, continued...
- From Theorem 1, the upper bound on the dimension of the subspace depends on the radiometric response of camera j. Such functions are usually nonlinear and differ from one camera to another, but they do not have exotic forms and are well approximated by simple parametric models
- The response can be modeled by a gamma function, i.e., g(x) = x^gamma, so for all a and x in R, g(ax) = (ax)^gamma = a^gamma x^gamma
29. Calculating the BTF, continued...
- Since the response can be represented with a gamma function, the subspace of BTFs has dimension at most 2, as opposed to 256
- The radiometric response function can be represented more accurately with a polynomial if desired, though the dimension of the space of BTFs will then be the degree of the polynomial
30. Estimating BTFs
- View the same object in cameras i and j, and normalize the histograms by assuming that the percentage of pixels with brightness less than Bi is the same in both views
- If Hi and Hj are the normalized cumulative histograms of object observations Oi and Oj, then Hi(Bi) = Hj(Bj) = Hj(fij(Bi)), so fij(Bi) = Hj^-1(Hi(Bi))
- Use this relation to compute the BTF fij for every pair of corresponding observations in the training set, and define Fij as the collection of all the fij's (a sketch follows below)
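A minimal sketch of the fij(Bi) = Hj^-1(Hi(Bi)) estimate; inverting Hj by linear interpolation is an assumption, as are all names:

```python
import numpy as np

def estimate_btf(pixels_i, pixels_j, levels=256):
    """Estimate the BTF f_ij mapping brightness in camera i to camera j.

    pixels_i, pixels_j : 1-D arrays of brightness values (0..levels-1)
                         from corresponding observations of one object.
    Returns an array btf of length `levels` with btf[b] = f_ij(b).
    """
    # Normalized cumulative histograms H_i and H_j.
    hist_i, _ = np.histogram(pixels_i, bins=levels, range=(0, levels))
    hist_j, _ = np.histogram(pixels_j, bins=levels, range=(0, levels))
    H_i = np.cumsum(hist_i) / hist_i.sum()
    H_j = np.cumsum(hist_j) / hist_j.sum()
    # f_ij(B_i) = H_j^{-1}(H_i(B_i)); invert H_j by interpolation.
    brightness = np.arange(levels)
    return np.interp(H_i, H_j, brightness)
```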
31. Estimating BTFs, continued...
- Use probabilistic principal component analysis (PPCA) to learn the subspace of Fij
- Under this model, a d-dimensional BTF fij can be written as fij = W y + mean(Fij) + e, where y is a normally distributed q-dimensional subspace variable with q < d, W is a d x q projection matrix, mean(Fij) is the mean of Fij, and e is isotropic Gaussian noise, e ~ N(0, σ²I). Since y and e are normally distributed, fij is distributed as fij ~ N(mean(Fij), Z), where Z = W W^T + σ²I
32. Estimating BTFs, continued...
- W is estimated as W = Uq (Eq - σ²I)^(1/2) R, where the q column vectors of the d x q matrix Uq are the eigenvectors of the sample covariance matrix of Fij, Eq is the q x q diagonal matrix of corresponding eigenvalues, R is an arbitrary rotation matrix (set to the identity), and σ² is estimated from the remaining, discarded eigenvalues
- We can now compute the probability of a particular BTF belonging to the learned subspace of BTFs. This process can be carried out for each color channel separately (a sketch follows below)
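A minimal sketch of the PPCA fit and the resulting BTF likelihood, following the standard Tipping-Bishop estimates; the function and variable names are mine, and evaluating N(f; mean, Z) as a log-density is an assumption about how the probability is used:

```python
import numpy as np

def fit_ppca(F, q):
    """Fit a PPCA model to the BTF collection F (n x d matrix of BTF vectors)."""
    mean = F.mean(axis=0)
    cov = np.cov(F - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)           # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending
    sigma2 = evals[q:].mean()                    # noise from discarded eigenvalues
    # W = Uq (Eq - sigma2*I)^{1/2}, taking R = identity.
    W = evecs[:, :q] @ np.diag(np.sqrt(np.maximum(evals[:q] - sigma2, 0.0)))
    return mean, W, sigma2

def btf_log_likelihood(f, mean, W, sigma2):
    """Log of N(f; mean, Z) with Z = W W^T + sigma2 * I."""
    d = len(mean)
    Z = W @ W.T + sigma2 * np.eye(d)
    diff = f - mean
    _, logdet = np.linalg.slogdet(Z)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(Z, diff))
```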
33. Incorporating BTFs into Tracking
- Use the BTF to obtain a better estimate of the distance between objects tracked in two separate cameras, as discussed for the first paper
- This provides a better comparison of object appearances, leading to better tracking overall
34. Results
35. Results, continued...
36. A Comparison
37. Histogram Comparison
38. Thank You