Title: Dressed Human Model Detection
1Dressed Human Model Detection
2Overview
- Purpose
- Why we want to
- What challenges we have to overcome
- Means
- Model Classification
- Bayesian Similarity Measure (BSM) and Body Part Identification
- Recursive Context Reasoning (RCR)
3Why Would We Want To?
- Mobile Robot Navigation
- Working safely among humans
- Visual Surveillance
- Human Motion Capture
- Animation, VR, and HCI applications
- Shape-Based Image Retrieval
4Challenges
- Appearance Variations
- Clothing
- Articulation
- Occlusion
- Other People (crowded areas)
- Objects (Partially visible person)
- Projection Ambiguities
- Projection from 3D model to Image Plane
5Appearance Variations
Variations due to Clothing
Variations due to Articulation
6Occlusion
By an Object
By other People
7Model Classification
- Requirements for good Object Classification
- Independent of Scale, Orientation, and Position
- Handle partial occlusions
- Allow for articulated moving parts
- Handle shape distortions due to noise
- Allow for some shape variations
- Support efficient shape recognition and
classification
8Model Classification (cont.)
1st: Find Contour Outlines
9Model Classification (cont.)
2nd: Shape Decomposition
[Figure panels: Random, NCM, Natural]
NCM - Negative Curvature Minima (small circles)
A good start, but not exactly what we need, so we introduce another constraint
10Model Classification (cont.)
- Short-Cut Rule for a valid cut
- Must be a straight line
- Cross an axis of local symmetry
- Join 2 points on the outline (at least 1 NCM)
- Be the shortest cut, if there are several possible competing cuts
- Extraneous calculations: alternate method next
11Model Classification (cont.)
- Salience Constraint replaces the axis of symmetry calculation
- Salience defines how pronounced a part is, or how part-like it is
- 3 Factors determine Salience
- Size relative to whole object
- Degree to which the part protrudes
- Strength of its boundaries
12Model Classification (cont.)
- Let's take a look at an example and break down the formula
- Cl and Cr have equal arc length based on cut PPM
- Tp represents the threshold determining whether a cut makes a significant part or not
- P represents a test point for the cut's end
[Figure labels: The Smallest Cut; Salience of Part (Curve / Cut)]
13Model Classification (cont.)
- Final Step: Grouping over-segmented parts
- We consider all NCM over a certain magnitude threshold to avoid noise
- This may leave extraneous cuts for the same parts
- So we group starting with the largest parts first when several possible merges exist
- Continue until no more non-decomposable larger parts can be created and none of the existing large parts can be decomposed into significant parts
14Model Classification Summary
Find the Human's Silhouette-Contour Outline
Smooth Outline and Find all significant NCM over
a magnitude threshold to cut out noise
15Model Classification Summary
Group All Over-Segmented Parts to get Natural
Shape Decomposition
Use Short-Cut Method and Salience Constraint to
create cuts
16Human Body Model
Side View
Front View
- We are working with images, so 2D models are preferred
- We also find that these two models are sufficient
- The models also have probability distributions of the spatial relationships between the parts and the torso
17Human Body Model (cont.)
- Each body part is modeled by a ribbon
- Width (w): the average width along the ribbon
- Length (l): the major axis from the ribbon spine
- Aspect Ratio (a): w/l, invariant under similarity transforms; captures the global shape while ignoring small local shape deformations
18Human Body Model (cont.)
- Aspect Ratio alone is too ambiguous to distinguish different parts (e.g., Head and Torso)
- The Origin of each part is located at the joint connecting the part to its parent in the connect-to hierarchy (more on this in a bit)
- The exception is the torso, whose origin is its geometric center
19Human Body Model (cont.)
General Connect-To Hierarchy, used as a guide map for determining the parts' relationships to each other
20Human Body Model (cont.)
- Now we can parameterize a body part with the following vector
- $v = (a, l, x, y, \theta)$
- $a$ - Aspect Ratio, $l$ - length
- $x, y$ - origin located in the parent part
- $\theta$ - intersection angle between the major axis of this part and its parent part
21Human Body Model (cont.)
- With m parts as defined in the previous slide, we can represent a model with four model matrices
- Aspect Ratio Vector $A = [a_1, \ldots, a_m]$
- Length Ratio Matrix $S = [s_{ij}]$, $i, j = 1, \ldots, m$, where $s_{ij} = l_i / l_j$
- Relative Position Vector $X = [x_1, y_1, \ldots, x_m, y_m]$
- Orientation/Posture Vector $T = [\theta_1, \ldots, \theta_m]$
- A and S are TRS-invariant
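As a rough illustration of the aspect ratio vector A and the length ratio matrix S, the sketch below builds both from a list of ribbon measurements. The `Ribbon` tuple, the function name `model_matrices`, and the sample widths and lengths are made up for this example and do not come from the paper.

```python
from typing import List, NamedTuple, Tuple


class Ribbon(NamedTuple):
    """Hypothetical container for one body part's ribbon measurements."""
    width: float   # average width along the ribbon (w)
    length: float  # major-axis length of the ribbon (l)


def model_matrices(parts: List[Ribbon]) -> Tuple[List[float], List[List[float]]]:
    """Return (A, S) with a_i = w_i / l_i and s_ij = l_i / l_j."""
    A = [p.width / p.length for p in parts]
    S = [[pi.length / pj.length for pj in parts] for pi in parts]
    return A, S


# Illustrative numbers only: a torso-like ribbon and an arm-like ribbon.
parts = [Ribbon(width=30.0, length=60.0), Ribbon(width=10.0, length=50.0)]
A, S = model_matrices(parts)

# Uniformly scaling the image (here by 2) leaves both A and S unchanged.
scaled = [Ribbon(p.width * 2.0, p.length * 2.0) for p in parts]
A2, S2 = model_matrices(scaled)
```

Because A and S are built only from ratios of lengths, neither changes when the whole figure is translated, rotated, or scaled.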
22Human Body Model (cont.)
- We only need to know the relative positions of the six main parts (head, torso, two arms, and two legs)
- So we put the parts into a normalized torso coordinate system with the torso length as 1
- We get TRS-invariant relative positions
- $U = (0, 0, u_1, v_1, \ldots, u_5, v_5)$
- where $(u_i, v_i) = ((x_i, y_i) - (x_1, y_1)) / l_1$, $(x_1, y_1)$ are the coordinates of the torso's center, and $l_1$ is the torso's length
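The normalization into torso coordinates can be sketched as follows. The function name `torso_normalized`, the convention that the first entry is the torso center, and the pixel coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def torso_normalized(positions: List[Point], torso_length: float) -> List[Point]:
    """Apply (u_i, v_i) = ((x_i, y_i) - (x_1, y_1)) / l_1 to every part origin.

    positions[0] is assumed to be the torso center (x_1, y_1).
    """
    if torso_length <= 0:
        raise ValueError("torso_length must be positive")
    x1, y1 = positions[0]
    return [((x - x1) / torso_length, (y - y1) / torso_length) for x, y in positions]


# Illustrative pixel coordinates: torso center, then a head and a leg origin.
image_positions = [(100.0, 200.0), (100.0, 150.0), (90.0, 250.0)]
U = torso_normalized(image_positions, torso_length=100.0)

# The same figure shifted by (5, 7) and enlarged by 2 gives the same U.
moved = [(2.0 * x + 5.0, 2.0 * y + 7.0) for x, y in image_positions]
U_moved = torso_normalized(moved, torso_length=200.0)
```

This sketch covers the translation and scale parts of the formula only; it does not rotate the points into the torso's own axes.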
23Human Body Model (cont.)
- We now have the human model (H) parameterized with three TRS-invariant matrices: $H = \{A, S, U\}$
- This constrains the aspect ratios, relative sizes, and positions of the body parts
- This could be extended by imposing constraints on the orientation T to form stronger constraints on the appearance of a person, but this is not done in this paper
24Human Body Model (cont.)
- TRS-invariant Probabilistic Model
- Implementing probability distributions to accommodate shape variation among people
- For simplicity, we assume Gaussian distributions and statistical independence between the A, S, and U matrices
- They can be estimated from the average, the variance, and the distribution function
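For reference, under the Gaussian assumption the average, variance, and distribution function take the standard textbook forms below, written for a generic model parameter $x$ with $K$ observed samples $x_1, \ldots, x_K$. The symbols $x$, $K$, $\mu$, and $\sigma^2$ are notation chosen here and may differ from the paper's.

```latex
\mu = \frac{1}{K}\sum_{k=1}^{K} x_k, \qquad
\sigma^2 = \frac{1}{K}\sum_{k=1}^{K} (x_k - \mu)^2, \qquad
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```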
25Human Body Model (cont.)
- Importance of the TRS-invariant Probabilistic Model
- These probability distributions provide metrics to evaluate the shape, size relationship, and configuration similarities between the detected contour and the human body model (used later on!)
- The parameters for the averages (means) and variances are pre-calculated from observed data in the human population and stored in a table for reference
26Human Body Model (cont.)
- Dressed Human Modeling
- What if clothing obscures some of the body parts we are looking for?
- The model class should represent the generic shapes of its objects and emphasize shape differences between classes, while the shape variations within the class should not influence its description.
- For the Human Model, clothing is often the primary cause of these shape variations.
27Human Body Model (cont.)
- Merged Body Parts are introduced to handle the variations caused by clothing and other occluders
Example: A lady is wearing a skirt, so you will not see her two legs and may or may not see her feet. In this case you would represent her legs as the merged body part lower. This gives us more flexibility in handling clothing.
28Human Body Model (cont.)
- Sample models using merged body parts with
different levels of detail, which become the
Dressed Human Models
29Human Body Model (cont.)
- How do merged parts help?
- Like regular body parts, the merged parts are also modeled with ribbons and connect with each other at joints
- The TRS-invariant probabilistic representation is used to encode the shapes of the merged parts and their relationships with the other parts
- A new relationship is introduced in the form of part-of (using the previous example, the two legs are part-of the merged body part lower)
- The location of a part can now be inferred from its connected parts or from the merged parts covering it
30Human Body Model (cont.)
- Trunk: the special merged body part
- Covers the torso and some other body parts
- Occupies the same position as the torso
- Length and width are adjustable to handle various self-occlusions and clothing variations
- Length constraint is therefore the sum of the lengths of the trunk and the attached parts
[Figure: sample trunk, with the head labeled]
31Human Body Model Summary
- TRS-invariant dressed human model
- Independent of size, pose, articulation, and clothing
- Part-based representation used to model the occluding contour
- Merged parts representing multiple body parts
- Parts organized into a hierarchy to facilitate coarse-to-fine decomposition and classification
32Similarity Measure
- What makes a good shape similarity measure for classifying a highly deformable shape such as a person?
- Large similarity measurements within the class
- Independent of position, size, and orientation
- Support articulation and partial occlusion
- Handle noise, deformation, and low-resolution blur
- Needs to be efficient to compute
33Similarity Measure (cont.)
- How is the Similarity Measure used?
- It evaluates the resemblance between a contour (test data) and a model (known) based on the best match between their body parts
- Each ribbon that is decomposed from the contour is compared against our model's body parts
34Similarity Measure (cont.)
- Next we set up the problem
- n ribbons are found in contour C
- $C = \{c_1, c_2, \ldots, c_n\}$
- m body parts are found in human model F
- $F = \{f_1, f_2, \ldots, f_m\}$
- H is the match hypothesis, v is the view
- $H = \{h_1, h_2, \ldots, h_m, v\}$
- Note: there is a one-to-one correspondence between F and H
- For this paper, v is limited to 2 views (front/back and side)
35Similarity Measure (cont.)
- How do we define a match hypothesis H?
- First establish a mapping value for each $h_x \in H$, $1 \le x \le m$
- Consider the corresponding $f_x$ from our model body parts
- Cycle through all ribbons $c_y \in C$, $1 \le y \le n$
- If ribbon $c_y$ corresponds to part $f_x$, then $h_x$ is set to the value y
- Else, if no ribbon corresponds to part $f_x$, then $h_x$ is set to zero
- Then choose a view v to be portrayed
36Similarity Measure (cont.)
- Note that when $h_x$ is set to zero, part $f_x$ has no matching ribbon; this allows for parts hidden by occlusion and for ribbons that are not body parts at all
- Next we want to optimize (maximize) our hypothesis match H and find the best hypothesis $H^*$
- However, before we can do that we need to understand Bayes Rule and Maximum A Posteriori (MAP) theory
37Bayes Rule
- Basic Probability
- $P(A)$, $P(B)$: the probability that A, B will occur (each considered on its own)
- $P(A \mid B)$: the probability that A will occur given that B has been observed
- Conditional Probability
- $P(A,B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$: the probability of observing both A and B (intersection of Venn diagram)
- Example: A is all Red Fish and B is all 6" Fish, so $P(A,B)$ would be the probability of observing a 6" Red Fish, which is the probability of observing a Red Fish (A) given our known observation of a 6" Fish (B), times the probability of observing a 6" Fish (more later)
38Bayes Rule (cont.)
- Based on the conditional probability definition, we can manipulate the equation to get
- $P(A,B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- $P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- $P(A \mid B)\,P(B) / P(A) = P(B \mid A)\,P(A) / P(A)$
- $P(A \mid B)\,P(B) / P(A) = P(B \mid A)$ (Bayes Rule)
39Bayes Rule (cont.)
- Importance of $P(B \mid A) = P(A \mid B)\,P(B) / P(A)$
- Think of A as cause and B as effect: assuming A is present, we know the probability of B being observed (left side of the equation, label 1)
- Allows us to use the likelihood of a cause A given an observation of B (right side of the equation, label 2)
- It can also be seen how the probability for A changes from the prior $P(A)$, before we observe anything, to the posterior $P(A \mid B)$, once we have observed B (right side of the equation, label 3)
- Now let's take a look at an example
40Bayes Rule Example
- Now for another look at the fish example
- Red Fish make up 40% of the fish population: $P(A) = 0.4$
- 60% of the total fish population are 6" or longer: $P(B) = 0.6$
- 15% of the fish are 6" Red Fish: $P(A,B) = 0.15$
- We want to know the probability of seeing a 6" Fish based on observing a Red Fish: $P(B \mid A)$
- With this information we first want to find the probability of having a Red Fish when a 6" Fish is observed: $P(A \mid B)$
- Next, let's look at the equations to see how we get there
41Bayes Rule Example (cont)
- We know $P(A)$, $P(B)$, and $P(A,B)$
- We want $P(B \mid A)$, and we know Bayes Rule says $P(B \mid A) = P(A \mid B)\,P(B) / P(A)$
- So it's easy to see that we need to first know $P(A \mid B)$ before we can apply Bayes Rule
- So we use the rule for conditional probability we saw earlier
- $P(A,B) = P(A \mid B)\,P(B)$
- If we divide both sides by $P(B)$, we get
- $P(A \mid B) = P(A,B) / P(B) = 0.15 / 0.6 = 0.25$ (25%)
42Bayes Rule Example (cont)
- Now we can apply Bayes Rule to find $P(B \mid A)$
- $P(B \mid A) = P(A \mid B)\,P(B) / P(A)$
- $P(B \mid A) = 0.25 \times 0.6 / 0.4$
- $P(B \mid A) = 0.375$ (37.5%)
- We could then also apply the alternate form of the conditional probability rule as a check
- $P(A,B) = P(B \mid A)\,P(A)$
- $P(A,B) = 0.375 \times 0.4 = 0.15$ (15%)
- So we see that once we observe a Red Fish, we have a 37.5% chance of observing a 6" fish
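The arithmetic of the fish example can be replayed in a few lines. The helper names `conditional` and `bayes` are chosen for this sketch; the three input probabilities are the ones given on the slides.

```python
def conditional(p_joint: float, p_given: float) -> float:
    """P(X|Y) = P(X,Y) / P(Y)."""
    return p_joint / p_given


def bayes(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Bayes Rule: P(B|A) = P(A|B) P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


p_a = 0.40   # P(A): red fish
p_b = 0.60   # P(B): fish that are 6" or longer
p_ab = 0.15  # P(A,B): red fish that are 6" or longer

p_a_given_b = conditional(p_ab, p_b)        # 0.15 / 0.6
p_b_given_a = bayes(p_a_given_b, p_b, p_a)  # 0.25 * 0.6 / 0.4

print(f"P(A|B) = {p_a_given_b:.3f}")
print(f"P(B|A) = {p_b_given_a:.3f}")
print(f"check: P(B|A) P(A) = {p_b_given_a * p_a:.3f}")
```

The last line repeats the check from the slide: multiplying $P(B \mid A)$ by $P(A)$ should give back the joint probability.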
43MAP Maximum A Posteriori
- In essence, MAP is used to select the world parameters (or data model) that will maximize the probability of the desired result based on the observed conditions
- Now back to our human detection problem and applying Bayes Rule.
44Similarity Measure (cont.)
- We are looking for our optimized hypothesis $H^*$, which is the MAP hypothesis
- $H^* = \arg\max_H P(H, C \mid \text{person})$
- In plain English: given that a person is present in the image, which hypothesis H, taken together with the observed contour C, is the most probable?
45Similarity Measure (cont.)
- $H^* = \arg\max_H P(H, C \mid \text{person})$
- First application of Bayes Rule to get this
- $P(H, C \mid \text{person}) = P(C \mid H, \text{person})\,P(H \mid \text{person})$
- Ignore the conditioning on person for now to see this
- $P(H, C) = P(C \mid H)\,P(H)$
- We can see it has the same form as
- $P(A,B) = P(B \mid A)\,P(A)$
- Now make the substitution and this gives us
- $H^* = \arg\max_H P(C \mid H, \text{person})\,P(H \mid \text{person})$
46Similarity Measure (cont.)
- $H^* = \arg\max_H P(C \mid H, \text{person})\,P(H \mid \text{person})$
- $\Rightarrow P(C \mid H, \text{person})\,P(\text{person} \mid H)\,P(H) / P(\text{person})$, which represents the second application of Bayes Rule
- Assume all hypotheses have the same prior; then $P(H) / P(\text{person})$ is a constant and can be dropped
- $H^* = \arg\max_H P(C \mid H, \text{person})\,P(\text{person} \mid H)$
- Accordingly, the goodness function that rates the hypothesis is defined as
- $G(H) = P(C \mid H, \text{person})\,P(\text{person} \mid H)$
47The Goodness Function
- $G(H) = P(C \mid H, \text{person})\,P(\text{person} \mid H)$
- $P(C \mid H, \text{person})$: this evaluates the degree of resemblance between the matched pairs (the likelihood)
- $P(\text{person} \mid H)$ is proportional to the number of identified body parts; the more parts identified, the more likely the extracted contour is a person (the posterior)
- Thus the MAP hypothesis $H^*$ maximizes the resemblance between the matched pairs and the number of identified body parts
- Finally, this leads us to our Bayesian Similarity Measure (BSM), which evaluates the resemblance between the contour C and the human model
- $BSM(C) = G(H^*)$
48The Goodness Function (cont.)
- To calculate goodness, we need to estimate the likelihood $P(C \mid H, \text{person})$ and the posterior probability $P(\text{person} \mid H)$
- $\hat{A}$, $\hat{S}$, $\hat{U}$ are the aspect ratios, relative sizes (with respect to the other identified parts), and relative positions (to the torso) of the identified body parts
- This information is compared to the reference data for the model parts corresponding to the identified parts
- Ex: Take the identified head and compare it to the model head information
49The Goodness Function (cont.)
- Next we define function N to take the best match from each set to use for calculating the likelihood
- Show N, using A as an example
- $N(\hat{A} \mid \bar{A}, \Sigma_A)$: highest probability of match
- where $\bar{A}$ is the average and $\Sigma_A$ is the variance of the model parameters
- Assuming that $\hat{A}$, $\hat{S}$, and $\hat{U}$ are statistically independent, we can estimate the likelihood as
- $P(C \mid H, \text{person}) = N(\hat{A} \mid \bar{A}, \Sigma_A)\,N(\hat{S}_i \mid \bar{S}_i, \Sigma_{S_i})\,N(\hat{U} \mid \bar{U}, \Sigma_U)$
- Thus we take the best a, s, and u from $\hat{A}$, $\hat{S}$, $\hat{U}$ and use their probabilities to determine the likelihood of the contour being a person
50The Goodness Function (cont.)
- Next estimate the posterior $P(\text{person} \mid H)$
- $P(\text{person} \mid H) = \sum_i d_i w_i$
- where $d_i = 0$ if $h_i = 0$, and $d_i = 1$ if $h_i \neq 0$; basically, if the part was identified, then add its weight $w_i$ to the estimate
- $w_i$ is the contribution to the presence of a person from the identification of this part, and is defined as $n_i / n$, where $n = \sum_j n_j$, $\forall f_j$ that have no subparts
- $n_i = 1$ if $f_i$ does not have subparts; head and torso are exceptions to this, as their presence strongly indicates a person ($w_i > 0.5$)
51The Goodness Function (cont.)
- For body parts that do have subparts, $n_i$ is defined recursively
- $n_i = \alpha \sum_j n_j$, $\forall f_j$ being a subpart of $f_i$
- where $\alpha < 1$ is a degeneration factor used to reduce the false alarms caused by the generality of the human model
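The recursive weights can be sketched in a few lines. The hierarchy in `SUBPARTS`, the part names, and the value 0.8 for the degeneration factor are assumptions made for illustration, and the special weighting of head and torso is left out.

```python
from typing import Dict, List

# Hypothetical part-of hierarchy: merged part -> its subparts.
# Parts that do not appear as keys have no subparts.
SUBPARTS: Dict[str, List[str]] = {
    "lower": ["left_leg", "right_leg"],
    "left_leg": ["left_thigh", "left_calf"],
    "right_leg": ["right_thigh", "right_calf"],
}
DEGENERATION = 0.8  # degeneration factor, must be < 1; the value is an assumption


def n_value(part: str) -> float:
    """n_i = 1 without subparts, else the factor times the sum of the subparts' n_j."""
    children = SUBPARTS.get(part, [])
    if not children:
        return 1.0
    return DEGENERATION * sum(n_value(c) for c in children)


def leaves(part: str) -> List[str]:
    """All parts below `part` that have no subparts."""
    children = SUBPARTS.get(part, [])
    if not children:
        return [part]
    return [leaf for c in children for leaf in leaves(c)]


def weight(part: str, root: str) -> float:
    """w_i = n_i / n, with n summed over the parts that have no subparts."""
    n_total = sum(n_value(leaf) for leaf in leaves(root))
    return n_value(part) / n_total


def posterior(identified: List[str], root: str) -> float:
    """P(person | H): sum of w_i over the identified parts (those with d_i = 1)."""
    return sum(weight(p, root) for p in identified)


fine = posterior(["left_thigh", "left_calf", "right_thigh", "right_calf"], root="lower")
coarse = posterior(["lower"], root="lower")
```

With these numbers the merged part lower alone scores less than its four fine-level parts together, which is the effect the degeneration factor is meant to have.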
52The Goodness Function Summary
- This function is designed to evaluate the goodness of a hypothesis H
- $G(H) = N(\hat{A} \mid \bar{A}, \Sigma_A)\,N(\hat{S}_i \mid \bar{S}_i, \Sigma_{S_i})\,N(\hat{U} \mid \bar{U}, \Sigma_U)\,P(\text{person} \mid H)$
- It looks for the best match between the extracted ribbons and the human body parts; the higher the value, the better the match
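A minimal sketch of the goodness function, assuming each measured quantity is a scalar with its own Gaussian and simply multiplying the densities. The selection of a best match by N is not modeled, and the means, variances, observations, and posterior value are invented for the example.

```python
import math
from typing import Sequence


def gaussian(x: float, mean: float, var: float) -> float:
    """One-dimensional Gaussian density N(x | mean, var)."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def likelihood(observed: Sequence[float], means: Sequence[float],
               variances: Sequence[float]) -> float:
    """Product of independent Gaussian terms, one per measured quantity."""
    result = 1.0
    for x, m, v in zip(observed, means, variances):
        result *= gaussian(x, m, v)
    return result


def goodness(observed: Sequence[float], means: Sequence[float],
             variances: Sequence[float], posterior: float) -> float:
    """G(H) = P(C | H, person) * P(person | H)."""
    return likelihood(observed, means, variances) * posterior


# Invented reference table: an aspect ratio, a length ratio, a relative position.
means = [0.5, 1.2, -0.5]
variances = [0.01, 0.04, 0.01]

g_close = goodness([0.52, 1.15, -0.48], means, variances, posterior=0.8)
g_far = goodness([0.90, 2.00, 0.10], means, variances, posterior=0.8)
```

A hypothesis whose measurements sit near the model means (`g_close`) scores higher than one whose measurements are far away (`g_far`), and either score shrinks when fewer parts are identified.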
53Dynamic Model Assembling Step 1
- Find the coarse-level decomposition of the
contour C by grouping the ribbons whose major
axes share a common endpoint and have similar
widths
54Dynamic Model Assembling Step 2
- Identify the coarse-level body parts and
evaluate them based on the goodness function
55Dynamic Model Assembling Steps 3-4
- If parts at the coarse level can be further decomposed into subparts, consider the fine-level information, label parts, and calculate locations of the subparts that were missed at the coarse level
- This will be looked at in more depth a little later on
56Body Part Identification
- So why work at the coarse level instead of just using the fine-level data?
- There are fewer body parts, and thus a much smaller hypothesis space
- A coarse-level hypothesis will result in a lower posterior $P(\text{person} \mid H)$ because of the degeneration factor ($\alpha$)
- By combining the constraints on the aspect ratios, relative sizes and positions, and the posterior $P(\text{person} \mid H)$, the goodness function selects the human model and model parts at the right resolution to label the decomposed contour correctly (see the next slide for examples)
57Body Part Identification (cont.)
- Examples of the coarse-to-fine hypothesis selection and dynamic model assembling
- H2 results in the highest goodness value and also the most accurate body part identification
58Multiple Person Example
- What if the contour represents two people as in
the example below
In this case, the system will need to pick
multiple hypotheses.
59Multiple Person Example (cont.)
- Then select the best hypothesis to identify a
person contained in the contour and remove those
parts from the contour
Decompose the entire contour into parts
Now go back to the hypothesis selection step and
repeat the selection and body part identification
until all significant parts have been taken care
of.
60Applying BSM to Human Detection
- For Human Detection, a contour is provided and the BSM needs to determine whether this contour represents the silhouette of a person or not
- $BSM(C) > \text{threshold}$
- C is a person's silhouette when the above expression evaluates to true
61Human Detection (cont.)
- How to handle ambiguous contours?
- When rotated slightly, a contour may no longer look like a person
- To avoid false alarms, the threshold is replaced by upper and lower bounds, meaning that the similarity measurement must be sufficiently high or sufficiently low; otherwise no decision can be made
62Human Detection (cont.)
- If no decision can be made on the contour because it falls in the uncertainty region between the lower and upper bounds, then a more distinguished contour will need to be found, and for this we use a Recursive Context Reasoning (RCR) algorithm (more on this in a bit, but first some results from the human detection method just discussed)
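The two-bound decision can be written as a small three-way classifier. The function name `classify`, the bound values 0.4 and 0.7, and the choice to treat values exactly on a bound as decided are assumptions for this sketch.

```python
def classify(bsm: float, lower: float, upper: float) -> str:
    """Three-way decision on a Bayesian Similarity Measure value."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if bsm >= upper:
        return "person"
    if bsm <= lower:
        return "not person"
    return "uncertain"  # falls in the uncertainty region: hand over to RCR


LOWER, UPPER = 0.4, 0.7  # illustrative bounds, not taken from the paper

decisions = [classify(value, LOWER, UPPER) for value in (0.33, 0.60, 0.87)]
```

Only contours that land in the "uncertain" band need the extra refinement pass.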
63Human Detection Results
Correctly identified as humans when compared
against the human model and the goodness values
are listed with the contours
64Human Detection Results (cont.)
Correctly identified as NOT humans when compared
against the human model and the goodness values
are listed with the contours
65RCR Algorithm
- The purpose of the Recursive Context Reasoning
(RCR) Algorithm is to provide a contour updating
procedure to help re-evaluate the BSM and produce
a refined (more detailed) contour to determine if
the original detected contour is a person or not.
66RCR Algorithm (cont.)
- The basic algorithm
- Step 1: Original contour extraction
- For each contour, run Steps 2-7
- Step 2: Contour decomposition into natural parts (model classification)
- Step 3: Body part identification (finding the best H using the goodness function)
- Step 4: Human detection (BSM)
- If the BSM is in the ambiguous region, continue; else done
- Step 5: Update the locations and labels of the body parts
- Step 6: Align the predicted outlines of the missed body parts to the edge features in the image
- Step 7: Recalculate the similarity measure and determine if a person is present or not
67RCR Algorithm (cont.)
- Step 5: Update shapes and locations of the body parts
- Remember, a body part is parameterized with a vector $(a, l, x, y, \theta)$
- Using a weighted Least Squares Method (LSM), we can integrate the parameters estimated from the labeled contour with the corresponding model body parts
- Example on the next slide
68RCR Algorithm (cont.)
- Example: find the connection joint for the left arm with the torso
- Find estimate P4 using LSM with estimates P1, P2, and P3
- P1: the initial estimate based on body part identification
- P2: estimate based on the locations of the torso and head
- P3: estimate based on the major axis of the arm
[Figure: joint estimates P1, P2, and P3 and the combined estimate P4]
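One simple reading of the weighted least-squares combination is that P4 minimizes the weighted sum of squared distances to P1, P2, and P3, which makes it their weighted mean. The sketch below assumes that reading; the coordinates and weights are invented, and the paper's actual weighting scheme is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def weighted_lsm(estimates: List[Point], weights: List[float]) -> Point:
    """Point P minimizing sum_k w_k * ||P - P_k||^2, i.e. the weighted mean."""
    if len(estimates) != len(weights) or not estimates:
        raise ValueError("need exactly one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(w * p[0] for p, w in zip(estimates, weights)) / total
    y = sum(w * p[1] for p, w in zip(estimates, weights)) / total
    return (x, y)


# Invented joint estimates in pixels; P2 is trusted twice as much as the others.
p1, p2, p3 = (50.0, 80.0), (54.0, 78.0), (52.0, 85.0)
p4 = weighted_lsm([p1, p2, p3], weights=[1.0, 2.0, 1.0])
```

A natural choice, not confirmed by the slides, would be to weight each estimate by the inverse of its variance.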
69RCR Algorithm (cont.)
- Step 6: Predict the parameters of the missed parts
- Goal is to estimate the missing body part's parameter vector $(a_j, l_j, x_j, y_j)$
- Assume aspect ratios of the different body parts are independent of each other, so the MAP estimate of $a_j$ is simply its mean (average) value, with the corresponding variance
- $\hat{a}_j = \bar{a}_j$ and $\Sigma_{\hat{a}_j} = \Sigma_{a_j}$
- Again, these mean values and variances are pre-computed and stored in a reference table
70RCR Algorithm (cont.)
- The length of the body part can be estimated from any of the identified body parts
[Equation not recovered: estimate of $l_j$ from an identified part]
Basically, this compares what we know about the average length ratio between these two parts with the length of the model body part to calculate the missing part's length
71RCR Algorithm (cont.)
- If more than one part has been identified, then the MAP estimate of $l_j$ is the weighted summation
[Equation not recovered: weighted summation over the identified parts]
where I is the set of identified parts
72RCR Algorithm (cont.)
- Next we need to calculate the position of our missing part, which is done using a transform matrix T to change the model coordinates to the image coordinate system
- $X_I = T U_I$
- $T = X_I U_I^T (U_I U_I^T)^{-1}$
- where $X_I$ is the set of coordinates for the identified parts in the image coordinate system and $U_I$ is the set of coordinates for the identified parts in the model coordinate system
- Estimate the position of the unidentified body part as
- $(x_j, y_j, 1)^T = T (u_j, v_j, 1)^T$
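The least-squares transform can be tried out with plain Python lists. The helper functions, the sample model and image coordinates, and the leg position being predicted are all made up for this sketch; it assumes at least three identified parts that are not collinear.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def inverse3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate; raises if (nearly) singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix: need non-collinear identified parts")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def estimate_transform(image_pts: Sequence[Point], model_pts: Sequence[Point]) -> Matrix:
    """T = X U^T (U U^T)^-1, with homogeneous columns (x, y, 1) in X and U."""
    X = transpose([[x, y, 1.0] for x, y in image_pts])  # 3 x k
    U = transpose([[u, v, 1.0] for u, v in model_pts])  # 3 x k
    Ut = transpose(U)
    return matmul(matmul(X, Ut), inverse3(matmul(U, Ut)))


def predict(T: Matrix, uv: Point) -> Point:
    """(x_j, y_j, 1)^T = T (u_j, v_j, 1)^T, returning only (x_j, y_j)."""
    col = matmul(T, [[uv[0]], [uv[1]], [1.0]])
    return (col[0][0], col[1][0])


# Invented data: torso, head, and two arms in model and image coordinates.
model = [(0.0, 0.0), (0.0, -0.5), (-0.2, -0.4), (0.2, -0.4)]
image = [(100.0, 200.0), (100.0, 150.0), (80.0, 160.0), (120.0, 160.0)]
T = estimate_transform(image, model)
leg_xy = predict(T, (-0.1, 0.5))  # a missed part at model position (-0.1, 0.5)
```

In this toy data the image points are the model points scaled by 100 and shifted by (100, 200), so the predicted leg origin should land near (90, 250).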
73RCR Algorithm (cont.)
- Now take the translation, rotation angle, and scaling involved in transformation T and get the vector $t = (t_x, t_y, \theta, s)$
- Assume that the predicted location $X_j$ can be approximated by a first-order Taylor series expansion about the mean of t; then the uncertainty of $X_j$ is approximated as
- $\Sigma_{X_j} \approx J_t \Sigma_t J_t^T + T U_j T^T$
- where $J_t$ is the Jacobian and $U_j$ is the position in model space
- A Jacobian matrix, in simple terms, holds the first-order partial derivatives of each output element with respect to each input parameter
74RCR Algorithm (cont.)
- If the part being predicted is a subpart, then the position in the image frame $(x_j, y_j)$ can be inferred from the part $f_i$ directly connected to it
- $(x_j, y_j) = (x_i, y_i) + l_i(\cos\theta_i, \sin\theta_i)$
- Or from the extended body part $f_k$ that is covering it
- $(x_j, y_j) = R\lambda\,[(u_j, v_j) - (u_k, v_k)] + (x_k, y_k)$
- where $R = \begin{pmatrix} \cos\theta_k & \sin\theta_k \\ -\sin\theta_k & \cos\theta_k \end{pmatrix}$ and $\lambda = l_k / \bar{l}_k$
75RCR Algorithm (cont.)
- Examples
- Predicting the subparts from the identified parts
Later on, more examples will be shown where the locations of some of the primary body parts are predicted
76RCR Algorithm (cont.)
- Contour Alignment
- Step 1: Render the outline of a body part; if it is unidentified, set its orientation to the orientation of the torso
- Step 2: Align the rendered outline with the edge features such that
- $\theta_i = \arg\max_\theta N(B_\theta \cap E)$
- where $B_\theta$ is the rendered boundary of $f_i$ at orientation $\theta$, E is the set of edge pixels, and N(s) is the number of points in the point set s
- Step 3: If $N(B_{\theta_i} \cap E) > \text{threshold}$, then body part $f_i$ is detected and the points in $B_{\theta_i} \cap E$ are removed from the edge image E; otherwise $f_i$ is not detected
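The alignment steps above can be sketched with pixel sets. Here `render_segment` is a stand-in that draws a straight line instead of a real ribbon outline, and the candidate angles, the edge pixels, and the threshold are invented for the example.

```python
import math
from typing import Callable, Iterable, Optional, Set, Tuple

Pixel = Tuple[int, int]


def render_segment(origin: Pixel, length: int, theta: float) -> Set[Pixel]:
    """Stand-in renderer: pixels along a line of the given length at angle theta."""
    ox, oy = origin
    return {(ox + round(t * math.cos(theta)), oy + round(t * math.sin(theta)))
            for t in range(length + 1)}


def align(render: Callable[[float], Set[Pixel]], edges: Set[Pixel],
          angles: Iterable[float], threshold: int
          ) -> Tuple[Optional[float], bool, Set[Pixel]]:
    """Pick the theta maximizing N(B_theta & E); drop matched pixels if detected."""
    best_theta: Optional[float] = None
    best_overlap: Set[Pixel] = set()
    for theta in angles:
        overlap = render(theta) & edges
        if best_theta is None or len(overlap) > len(best_overlap):
            best_theta, best_overlap = theta, overlap
    detected = len(best_overlap) > threshold
    remaining = edges - best_overlap if detected else edges
    return best_theta, detected, remaining


# Invented edge image: a vertical line of 21 pixels at x = 10.
edges = {(10, y) for y in range(21)}
candidates = [0.0, math.pi / 4, math.pi / 2]
best_theta, detected, remaining = align(
    lambda theta: render_segment((10, 0), 20, theta), edges, candidates, threshold=10)
```

Removing the matched pixels from E keeps a later body part from being aligned to the same edge evidence.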
77RCR Algorithm (cont.)
- Contour Alignment
- Examples
- Ex 1: the right arm and both legs start at the default orientation (unidentified) and are then aligned to the contour
- Ex 2: similarly corrects the arm positions
[Figures: Ex 1, Ex 2]
78RCR Algorithm Example
- Putting it all together and running an example
[Figure: three extracted contours with similarity measures SM = 0.60, SM = 0.67, and SM = 0.33]
79RCR Algorithm Example (cont.)
- After the first iteration, we can kick the car out, but the people's similarity values are too low to be classified, so we apply the RCR re-evaluation steps
- Find the joints, render missing parts, align to the contour, run the similarity measure again, and see whether we detect or not
[Figure: refined contours with SM = 0.87 and SM = 0.76]
Two iterations are normally sufficient
80Resources
- Main Paper
- http://www.ri.cmu.edu/pubs/pub_3767.html
- Bayes Rule and MAP
- http://webcourse.cs.technion.ac.il/236607/Winter2002-2003/ho/WCFiles/Tutorial2.ppt
- http://www.cs.ust.hk/martin/comp327/t1.pdf
- Jacobian Matrices
- http://mathworld.wolfram.com/Jacobian.html
- Statistical Variance and Deviation
- http://dorakmt.tripod.com/mtd/glosstat.html