Title: Recovering Human Body Configurations: Combining Segmentation and Recognition
1Recovering Human Body Configurations: Combining Segmentation and Recognition
- Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley)
- Alexei A. Efros (Oxford)
2The goal
- Given an image
- Detect a human figure
- Localize joints and limbs
- Create a skeleton of their pose
- Create a segmentation mask of the person
3Other approaches: Simple features
- Model people as generalized cylinders (1980s)
- Easily implemented bottom up
- Often use tree to express relations
- Problems
- Cylinders are common
- Often dependencies between body parts
- Really need context
4Other approaches: Probable pose
- Often use probable pose
- Template matching
- Top down constraints on pose
- But even highly improbable poses are still
possible
5Other approaches: Frequent simplifications
- Nude models
- Limited poses
- Background subtraction or limited clutter
6Arguably the most difficult recognition problem
in computer vision
- Variation in clothing
- Variation in limbs
- Variation in pose
7Solution: Islands of Saliency
- Use low-level features that are informative independent of context
- Based on these islands, one can fill in the gaps using context
8Algorithm
9Algorithm: Segmenting into regions and superpixels
10Segmentation
- Combine boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001)
- Groups similar pixels into regions
11Segmentation: Regions
- 40 regions
- Most salient parts of body become regions
- Limbs usually split into two half-limbs
12Segmentation: Superpixels
- 200 regions (oversegmentation)
- Retains virtually all structures in original
- Still reduces complexity from 400,000 pixels to
200 superpixels
13Algorithm: Finding salient limbs and torsos
14Finding limbs
- Candidates: all 40 regions
- Four cues for half-limb detection
- Contour: probability of the boundary
- Average probability of the region's boundary, as measured by Martin's boundary finder
- Shape: how close to a rectangle
- Area of overlap with reconstructed rectangle,
15Find limbs
- Shading
- Limbs are roughly cylindrical, so should have 3D pop out due to shading
- Compare Ix-, Ix+, Iy-, Iy+ for region to mean of Ix-, Ix+, Iy-, Iy+ for training set
- Focus cue
- Background is often not in focus
- C_focus = E_high / (a * E_low + b)
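As a minimal sketch of the focus cue, reading the formula on this slide as C_focus = E_high / (a * E_low + b): the function below takes a region's high- and low-frequency energies as inputs that have already been computed. How those energies are measured is not stated on the slide, and the values of a and b used here are placeholders rather than the authors' settings.

```python
def focus_cue(e_high, e_low, a=1.0, b=1.0):
    """Focus score: high-frequency energy normalized by low-frequency energy.

    a and b are placeholder constants (assumptions); b keeps the ratio
    finite when a region has almost no low-frequency energy.
    """
    denominator = a * e_low + b
    if denominator <= 0:
        raise ValueError("a * e_low + b must be positive")
    return e_high / denominator


# Hypothetical energies: an in-focus region keeps more high-frequency energy.
sharp_region = focus_cue(e_high=6.0, e_low=2.0)
blurred_region = focus_cue(e_high=0.5, e_low=2.0)
```

With these made-up inputs the sharp region scores higher than the blurred one, which is the behavior the cue relies on to separate a focused figure from an out-of-focus background.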
16Finding limbs
- Cues are combined by summing
- Use logistic regression to learn weights
(training set of hand-labeled half-limbs)
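The cue combination can be sketched as a logistic model over the four cue values: a weighted sum passed through a sigmoid, with weights fitted to labeled examples. The code below is an illustration only. The training data is invented, the plain stochastic gradient fit stands in for whatever solver the authors used, and the names `limb_probability` and `fit_logistic` are hypothetical.

```python
import math


def limb_probability(cues, weights, bias):
    """Sigmoid of the weighted sum of cue values (contour, shape, shading, focus)."""
    z = bias + sum(w * c for w, c in zip(weights, cues))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(samples, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for cues, label in zip(samples, labels):
            error = label - limb_probability(cues, weights, bias)
            bias += learning_rate * error
            weights = [w + learning_rate * error * c for w, c in zip(weights, cues)]
    return weights, bias


# Invented cue vectors: label 1 = hand-labeled half-limb, 0 = other region.
samples = [
    [0.9, 0.8, 0.7, 0.9],
    [0.2, 0.3, 0.4, 0.1],
    [0.8, 0.9, 0.6, 0.7],
    [0.1, 0.2, 0.3, 0.3],
]
labels = [1, 0, 1, 0]

weights, bias = fit_logistic(samples, labels)
limb_like = limb_probability([0.85, 0.85, 0.65, 0.8], weights, bias)
background_like = limb_probability([0.15, 0.25, 0.35, 0.2], weights, bias)
```

Ranking regions by this probability gives one ordered candidate list per image, which is what the evaluation slides that follow measure.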
17Evaluation: Cues
- (Plot: number of hits against number of candidates generated)
18Evaluation: Performance
19Evaluation summary
- Not very good detectors
- Boundary strength is the best single cue
- Combining cues yields better performance
- On average 4.08 of top 8 candidates produced were hits
- 89% have at least 3 hits among top 8
- Motivates search for 3 half-limbs combined with head and torso
20Finding torsos
- Unlike half-limbs, a torso typically spans several regions
- Consider all sets of adjacent regions within some range of total sizes
- Set of cues
- Contour
- Shape
- Focus
- (No shading)
21Finding torsos
- Find orientation of torso
- Find best matching head
- Again contour, shape, and focus cues, with a disk as the shape
- Score for torso, score for head, and score for relative positions of head to torso multiplied to create score for oriented torso
23Evaluation
- Success if all four torso points within 60 pixels
of ground truth
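The success criterion can be sketched as a per-point distance check. The slide does not say which distance is used or how points are paired, so this sketch assumes Euclidean distance and that predicted and ground-truth points are listed in the same order; `torso_is_hit` is a hypothetical name.

```python
import math


def torso_is_hit(predicted_points, true_points, max_dist=60.0):
    """True if every predicted torso point lies within max_dist pixels of its match."""
    if len(predicted_points) != 4 or len(true_points) != 4:
        raise ValueError("expected four (x, y) points per torso")
    return all(
        math.hypot(px - tx, py - ty) <= max_dist
        for (px, py), (tx, ty) in zip(predicted_points, true_points)
    )


# Hypothetical ground truth and two detections shifted by different amounts.
truth = [(100, 100), (200, 100), (200, 300), (100, 300)]
close_detection = [(x + 30, y + 40) for x, y in truth]  # 50 pixels off
far_detection = [(x + 60, y + 60) for x, y in truth]    # about 85 pixels off
```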
24Algorithm: Pruning to form partial configurations
25Body building
- From 5-7 half-limbs and 50 candidate oriented torsos, form partial configurations consisting of
- One torso
- Three half-limbs, each assigned to
- One of 8 half-limb body parts
- One of two polarities
- 2-3 million partial configurations!
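The size of this search space can be checked with a short count. The sketch assumes that the three half-limbs are chosen without regard to order, that they receive three distinct body-part labels, and that each gets its own polarity; the slide does not spell out its exact counting, so treat this as one plausible reading.

```python
from math import comb, perm


def count_partial_configurations(n_torsos, n_half_limbs, n_parts=8, n_polarities=2, k=3):
    """Torsos x (choose k half-limbs) x (ordered part labels) x (polarity per limb)."""
    return n_torsos * comb(n_half_limbs, k) * perm(n_parts, k) * n_polarities ** k


counts = {n: count_partial_configurations(50, n) for n in (5, 6, 7)}
# counts == {5: 1344000, 6: 2688000, 7: 4704000}
```

Under these assumptions, 6 candidate half-limbs give about 2.7 million configurations, which sits inside the 2-3 million range quoted on the slide.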
26Enforce constraints
- Relative widths
- Foreshortening doesn't affect width of limbs much
- Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected
- Length of limbs relative to torso
- Assume torso not too foreshortened
- No more than +/- 40 degrees angle with image plane
- Again, prune limbs more than 4 standard deviations away from mean length, relative to torso
- Seems to be making some assumptions of probable pose
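The two size constraints can be sketched as simple threshold tests. The width test is one-sided (only limbs that are too wide are rejected) and the length test is two-sided, following the wording of the slide. The statistics passed in below are invented for illustration, and expressing widths as a fraction of torso size is an assumption.

```python
def passes_width_constraint(width, expected_width, width_std, k=4.0):
    """Reject only limbs more than k standard deviations wider than expected."""
    return width <= expected_width + k * width_std


def passes_length_constraint(length_ratio, mean_ratio, ratio_std, k=4.0):
    """Reject limbs whose length relative to the torso is over k deviations from the mean."""
    return abs(length_ratio - mean_ratio) <= k * ratio_std


# Invented statistics: widths and lengths expressed relative to the torso.
plausible_limb = (
    passes_width_constraint(0.30, expected_width=0.25, width_std=0.02)
    and passes_length_constraint(0.60, mean_ratio=0.55, ratio_std=0.05)
)
too_wide = passes_width_constraint(0.40, expected_width=0.25, width_std=0.02)
too_long = passes_length_constraint(0.90, mean_ratio=0.55, ratio_std=0.05)
```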
27Enforce constraints
- Adjacency
- Upper limbs must be adjacent to torso
- Lower limbs must be adjacent to upper limbs
- Symmetry in clothing: color histograms must not be overly dissimilar for corresponding segments
- E.g. right and left upper arms should be similar
- Makes some small assumptions about variations in clothing
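The symmetry test can be sketched as a histogram comparison with a cutoff. The slide does not name the distance measure or the threshold, so the chi-squared distance and the value 0.5 below are assumptions chosen for illustration.

```python
def chi_squared_distance(hist_a, hist_b):
    """Chi-squared distance between two histograms after normalizing each to sum to 1."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive mass")
    norm_a = [v / total_a for v in hist_a]
    norm_b = [v / total_b for v in hist_b]
    return 0.5 * sum(
        (a - b) ** 2 / (a + b) for a, b in zip(norm_a, norm_b) if a + b > 0
    )


def symmetric_enough(hist_left, hist_right, max_distance=0.5):
    """True if corresponding segments have similar enough color histograms."""
    return chi_squared_distance(hist_left, hist_right) <= max_distance


# Hypothetical 3-bin color histograms for left and right upper arms.
matching_sleeves = symmetric_enough([10, 5, 1], [9, 6, 1])
mismatched_sleeves = symmetric_enough([16, 0, 0], [0, 0, 16])
```

The distance ranges from 0 for identical histograms to 1 for histograms with no overlapping bins, so the cutoff controls how much clothing asymmetry is tolerated.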
28Body building: slimming down
- Reduces to 1000 partial configurations
- Sorted by linear combination of the torso and the three half-limb scores
- (This score can be used to improve torso detection)
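The ranking step can be sketched as a weighted sum followed by a sort. The weights, the dictionary layout, and the function names are hypothetical; the slide only says that the combination is linear.

```python
def configuration_score(torso_score, limb_scores, torso_weight=1.0, limb_weight=1.0):
    """Linear combination of one torso score and the half-limb scores."""
    return torso_weight * torso_score + limb_weight * sum(limb_scores)


def rank_configurations(configs, torso_weight=1.0, limb_weight=1.0):
    """Return configurations sorted from highest to lowest combined score."""
    return sorted(
        configs,
        key=lambda c: configuration_score(
            c["torso"], c["limbs"], torso_weight, limb_weight
        ),
        reverse=True,
    )


# Invented scores for three surviving partial configurations.
configs = [
    {"name": "A", "torso": 0.9, "limbs": (0.8, 0.7, 0.6)},
    {"name": "B", "torso": 0.5, "limbs": (0.9, 0.9, 0.9)},
    {"name": "C", "torso": 0.2, "limbs": (0.1, 0.1, 0.1)},
]
ranked_names = [c["name"] for c in rank_configurations(configs)]
```

In this toy example configuration B outranks A despite its weaker torso, which illustrates how limb evidence can reorder the torso candidates.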
29Algorithm
30Extending to full limbs
- Adding additional rectangles, evaluated on adjacent superpixels, to empty limb joints
- Want high internal similarity and high dissimilarity to surroundings
31Algorithm
34Summary
- Arguably the most difficult problem in computer vision
- Not solved here
- Method here is appealing
- Don't need to store exemplars
- Island of saliency approach seems useful in many contexts
- Use some configural knowledge to make reasonable guesses
- Good illustration of integrating recognition and segmentation