Title: Segmentation and Tracking of Multiple Humans in Crowded Environments
1Segmentation and Tracking of Multiple Humans in
Crowded Environments
- Tao Zhao, Ram Nevatia, Bo Wu
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
VOL. 30, NO. 7, JULY 2008
2Outline
- Introduction
- Overview
- Probabilistic modeling
- Computing MAP by efficient MCMC
- Experimental results
- Conclusion
3Introduction
- Segmentation and tracking of multiple humans in
crowded situations is made difficult by
interobject occlusion.
4Introduction
- The method is feasible for a crowded scene
- persistent and temporarily heavy occlusion
- Does not require that humans be isolated when they
first enter the scene.
- More complex shape models are needed.
- Joint reasoning about the collection of objects
is needed.
5Introduction
- Main features of this work
- A three-dimensional part-based human body model
which enables the segmentation and tracking of
humans in 3D and the inference of interobject
occlusion naturally.
- A Bayesian framework that integrates segmentation
and tracking based on a joint likelihood for the
appearance of multiple objects.
6Introduction
- The design of efficient Markov chain dynamics,
directed by proposal probabilities based on image
cues.
- The incorporation of a color-based background
model in a mean-shift tracking step.
7Overview
- The prior models
- Background model
- Based on a background model, the foreground blobs
are extracted as the basic observation.
- 3D human shape model
- Since the hypotheses are in 3D, occlusion
reasoning is straightforward.
- Camera model and ground plane
- Multiple 3D human hypotheses are projected onto
the image plane and matched with the foreground
blobs.
8Overview
- Segmentation and tracking are integrated in a
unified framework and interoperate over time.
9Overview
- We formulate the problem as one of Bayesian
inference to find the best interpretation given
the image observations, the prior model, and the
estimates from the previous frame analysis.
- That is, maximum a posteriori (MAP) estimation.
10Overview
- The state to be estimated at each frame
- The number of objects
- Their correspondences to the objects in the
previous frame (if any).
- Their parameters (for example, position)
- Uncertainty of the parameters
11Probabilistic modeling
- Our goal is to estimate the state at time t,
θ(t), given the image observations I(1), ..., I(t).
- θ is the state of the objects.
- Θ is the solution space.
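Written as a formula, the goal on this slide is a standard MAP estimate. The superscript notation and the star marking the optimum are assumptions made here for illustration, not taken from the slides:

```latex
\theta^{(t)*} = \arg\max_{\theta^{(t)} \in \Theta}
  P\left(\theta^{(t)} \mid I^{(1)}, \ldots, I^{(t)}\right)
```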
12Probabilistic modeling
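- A state containing n objects can be written as
θ = {(k1, m1), ..., (kn, mn)} ∈ Θn, where ki is the
unique identity of the ith object whose parameters
are mi and Θn is the solution space of exactly n
objects.
- The entire solution space Θ is the union of the
subspaces Θn over all admissible numbers of objects n.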
133D human shape model
- The parameters of an individual human, m, are
defined based on a 3D human shape model.
- The model does not attempt to capture the detailed
shape and articulation parameters of the human body.
- It consists of head, torso, and legs, with a fixed
spatial relationship.
143D human shape model
- The parameters (mi) that describe a 3D human
hypothesis
- size (hi): the 3D height of the model; it also
controls the overall scaling of the object in the
three directions.
- thickness (fi): captures extra scaling in the
horizontal directions.
- position (ui or (xi, yi)): the image position of
the head.
153D human shape model
- orientation (oi): 3D orientation of the body
- Orientations of the models are quantized into a few
levels for computational efficiency.
- inclination (ii): 2D inclination of the body
- The body may be inclined slightly.
16Object appearance model
- We use a color histogram of the object,
defined within the object shape.
- It helps establish correspondence in tracking
because it is insensitive to the nonrigidity of
human motion.
- Efficient algorithms exist, for example, the
mean-shift technique, to optimize a
histogram-based objective function.
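A minimal sketch of a color histogram restricted to an object shape. The function name, the RGB quantization into 8 bins per channel, and the flat pixel/mask lists are assumptions made for illustration; the slides do not specify the color space or bin count.

```python
def color_histogram(pixels, mask, bins_per_channel=8):
    """Normalized RGB histogram of the pixels whose mask entry is True.

    pixels: list of (r, g, b) tuples with values in 0..255.
    mask:   list of booleans of the same length (the object shape).
    """
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for (r, g, b), inside in zip(pixels, mask):
        if not inside:
            continue  # pixel lies outside the object shape
        # Quantize each channel from 0..255 into bins_per_channel bins.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    if count == 0:
        return hist  # empty shape: all-zero histogram
    return [h / count for h in hist]
```

Because the histogram discards pixel positions inside the shape, it changes little when limbs move, which is the insensitivity to nonrigid motion noted above.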
17Background appearance model
- The probability of pixel j being from the
background is
18The prior distribution
- The first term
- is independent of time and is defined
by
- Si is the projected image of the ith object and
|Si| is its area.
19The prior distribution
- P(o = frontal) = P(o = profile) = 1/2
- P(xi, yi) is a uniform distribution in the region
where a human head is plausible
- P(hi) is a Gaussian distribution N(μh, σh²)
truncated to the range [hmin, hmax]
- P(fi) is a Gaussian distribution N(μf, σf²)
truncated to the range [fmin, fmax]
- P(ii) is a Gaussian distribution N(μi, σi²)
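A minimal sketch of how a truncated Gaussian prior such as P(hi) could be evaluated. The function name and the example values in the usage note are hypothetical; the slides give no numbers for the means, variances, or ranges.

```python
import math


def truncated_gaussian_pdf(x, mean, std, lo, hi):
    """Density of N(mean, std^2) restricted to [lo, hi] and renormalized."""
    if not lo <= x <= hi:
        return 0.0  # no probability mass outside the plausible range

    def cdf(v):
        # Gaussian cumulative distribution function via the error function.
        return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))

    mass_inside = cdf(hi) - cdf(lo)
    density = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
    return density / mass_inside
```

For example, `truncated_gaussian_pdf(1.75, 1.7, 0.1, 1.4, 2.0)` would score a 1.75 m hypothesis under an assumed height prior, while any height outside the range gets probability zero.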
20The prior distribution
- the second term
- We approximate it by
- We rearrange θ(t) and θ(t-1) such that
one of
is true.
21The prior distribution
- Passoc
- We assume that the position and the inclination
of an object follow constant velocity models with
Gaussian noise.
22The prior distribution
- The height and thickness follow a Gaussian
distribution.
- We use Kalman filters for temporal estimation.
- Pnew, Pdead
- the likelihood of the initialization of a new
track
- the likelihood of the termination of an existing
track
- They are set empirically according to the
distance of the object to the entrances/exits.
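A minimal sketch of a Kalman filter for one coordinate under a constant velocity model with Gaussian noise. The function names, the scalar noise parameters `q` and `r`, and the choice to add `q` to both diagonal covariance terms are assumptions made for illustration, not the settings used in the paper.

```python
def kalman_predict(x, P, q, dt=1.0):
    """Predict step. x = (position, velocity); P = 2x2 covariance as nested tuples.

    q is an assumed process-noise variance added to both diagonal terms.
    """
    pos, vel = x
    x_pred = (pos + dt * vel, vel)  # constant velocity motion
    (p00, p01), (p10, p11) = P
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]]
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return x_pred, ((n00, n01), (n10, n11))


def kalman_update(x, P, z, r):
    """Update step with a position-only measurement z of variance r."""
    pos, vel = x
    (p00, p01), (p10, p11) = P
    s = p00 + r                 # innovation variance
    k0, k1 = p00 / s, p10 / s   # Kalman gain
    residual = z - pos
    x_new = (pos + k0 * residual, vel + k1 * residual)
    # P_new = (I - K H) P with H = [1, 0]
    P_new = (((1.0 - k0) * p00, (1.0 - k0) * p01),
             (p10 - k1 * p00, p11 - k1 * p01))
    return x_new, P_new
```

In a tracker of this kind, the predict step would supply the expected position for the next frame, and the update step would fold in the position found in that frame.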
23Joint image likelihood for multiple objects and
the background
- The visible part of an object
- determined by the depth order of all of the
objects, which can be inferred from their 3D
position and the camera model.
- Nonobject region
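A minimal sketch of how visible parts and the nonobject region could be derived from a depth order. Representing projected shapes as sets of pixel indices and depth as a single distance per object are assumptions made for illustration.

```python
def visibility(masks, depths, image_pixels):
    """Split the image into per-object visible parts and a nonobject region.

    masks:        list of sets of pixel indices (projected object shapes).
    depths:       distance of each object from the camera (smaller = nearer).
    image_pixels: iterable of all pixel indices in the image.
    """
    order = sorted(range(len(masks)), key=lambda i: depths[i])  # nearest first
    occupied = set()
    visible = [set() for _ in masks]
    for i in order:
        # An object is visible only where no nearer object already projects.
        visible[i] = masks[i] - occupied
        occupied |= masks[i]
    nonobject = set(image_pixels) - occupied
    return visible, nonobject
```

Processing objects from nearest to farthest means each pixel is assigned to at most one object, so the visible parts and the nonobject region partition the image.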
24Joint image likelihood for multiple objects and
the background
- The joint likelihood P(I | θ) consists of two
terms
- The first term
25Joint image likelihood for multiple objects and
the background
- di is the color histogram of the background image
within the visibility mask of object i.
- pi is the color histogram of the object.
- The Bhattacharyya coefficient of pi and di
reflects the similarity of the two histograms.
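A minimal sketch of the Bhattacharyya coefficient between two normalized histograms, using its standard definition as the sum over bins of the square root of the product of the two bin values. The function name is hypothetical.

```python
import math


def bhattacharyya_coefficient(p, d):
    """Similarity of two normalized histograms: 1 if identical, 0 if disjoint."""
    if len(p) != len(d):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, d))
```

A value near 1 for an object histogram and the background histogram under its visibility mask means the object is hard to tell apart from the background there.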
26Joint image likelihood for multiple objects and
the background
- The second term is
- ej = log(Pb(Ij)) is the log probability of pixel j
belonging to the background model
27Computing MAP by efficient MCMC
- Computing the MAP is an optimization problem.
- Optimization is challenging
- With an unknown number of objects, the solution
space contains subspaces of varying dimension.
- It includes both discrete and continuous
variables.
- We adopt a data-driven Markov chain Monte Carlo
(MCMC) approach to explore this complex solution
space.
28Computing MAP by efficient MCMC
- MCMC method with jump/diffusion dynamics to
sample the posterior probability.
- Jumps cause the Markov chain to move between
subspaces with different dimensions and traverse
the discrete variables.
- Diffusions make the Markov chain sample
continuous variables.
- In the process of sampling, the best solution is
recorded and the uncertainty associated with the
solution is also obtained.
29Computing MAP by efficient MCMC
30Computing MAP by efficient MCMC
- MCMC method
- We want to design a Markov chain with stationary
distribution P(θ).
- At the gth iteration, we sample a candidate state
θ' from a proposal distribution q(θg | θg-1).
- If the candidate state θ' is accepted, θg = θ'.
- Otherwise, θg = θg-1.
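A minimal sketch of the sampling loop on a toy one-dimensional target. The acceptance test used here is the standard Metropolis-Hastings rule, which the slide does not spell out, and the function names, the toy Gaussian target, and the random-walk proposal are all assumptions made for illustration. Recording the best sample seen follows the description of keeping the best solution during sampling.

```python
import math
import random


def metropolis_hastings(log_target, propose, log_q, theta0, iterations, rng):
    """Sample from a target known up to a constant; also track the best sample.

    log_q(a, b) is the log proposal density of moving to a from b.
    """
    theta, theta_lp = theta0, log_target(theta0)
    best, best_lp = theta, theta_lp
    samples = []
    for _ in range(iterations):
        cand = propose(theta, rng)
        cand_lp = log_target(cand)
        # log of p(cand) q(theta | cand) / (p(theta) q(cand | theta))
        log_ratio = cand_lp - theta_lp + log_q(theta, cand) - log_q(cand, theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, theta_lp = cand, cand_lp  # accept the candidate
        # otherwise the chain stays at the previous state
        if theta_lp > best_lp:
            best, best_lp = theta, theta_lp
        samples.append(theta)
    return best, samples


# Toy setup (hypothetical): Gaussian target centered at 3, symmetric random walk.
def toy_log_target(x):
    return -0.5 * (x - 3.0) ** 2


def toy_propose(x, rng):
    return x + rng.gauss(0.0, 1.0)


def toy_log_q(a, b):
    return 0.0  # symmetric proposal: the two q terms cancel
```

Calling `metropolis_hastings(toy_log_target, toy_propose, toy_log_q, 0.0, 2000, random.Random(0))` returns the best sample found along with the full chain.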
31Computing MAP by efficient MCMC
- A Markov chain constructed in this way has its
stationary distribution equal to P(θ), independent
of the choice of the proposal probability q(·) and
the initial state θ0.
- The choice of the proposal probability q(·) can
affect the efficiency of MCMC significantly.
- Using more informed proposal probabilities, for
example, as in data-driven MCMC, will make
the Markov chain traverse the solution space more
efficiently. Therefore, the proposal distribution
is written as q(θg | θg-1, I).
32Markov chain dynamic
- The dynamics correspond to the proposal
distribution with a mixture density, where A is
the set of all dynamics: {add, remove, establish,
break, exchange, diff}.
- We assume that we have the sample in the (g-1)th
iteration and now propose a
candidate θ' for the gth iteration.
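A minimal sketch of drawing one dynamic from a mixture proposal: a dynamic is picked from the set A with some probability, and its own proposal then generates the candidate. The function name and the weights in the usage note are hypothetical; the slides do not give the mixture probabilities.

```python
import random

DYNAMICS = ("add", "remove", "establish", "break", "exchange", "diff")


def choose_dynamic(weights, rng):
    """Pick one dynamic according to its (unnormalized) mixture weight."""
    unknown = set(weights) - set(DYNAMICS)
    if unknown:
        raise ValueError(f"unknown dynamics: {sorted(unknown)}")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

For example, `choose_dynamic({"add": 0.1, "remove": 0.1, "diff": 0.8}, random.Random(0))` would favor the diffusion move, with the weights chosen here purely for illustration.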
33Markov chain dynamic
- Dynamics
- object hypothesis addition
- Sample the parameters of a new human hypothesis
(kn+1, mn+1) and add it to θg-1.
- object hypothesis removal
- establish correspondence
34Markov chain dynamic
- break correspondence
- exchange identity
- parameter update
35Experimental results
- Evaluation on an outdoor scene
37Experimental results
- There are 20 occlusion events overall, nine of
which are heavy occlusions.
- We use 500 iterations per frame.
- Trajectory-based errors
- Trajectories of three objects are broken once (ID
28 -> ID 35, ID 31 -> ID 32, ID 30 -> ID 41)
- Trajectory initialization
- Some start when the objects are only partially
inside.
- Only the initialization of three objects (objects
31, 50, 52) is noticeably delayed.
- Partial occlusion and/or the lack of contrast
with the background are the causes of the delays.
- The detection rate and the false-alarm rate
are 98.13 and 0.27 percent, respectively.
38Conclusion
- A principled approach to simultaneously detect
and track humans in a crowded scene.
- We formulate the problem as a Bayesian MAP
estimation problem.
- The inference is performed by an MCMC-based
approach to explore the joint solution space.
- The success lies in the integration of the
top-down Bayesian formulation following the image
formation process and the bottom-up features that
are directly extracted from images.