Title: Vision II: applications for humanoids
1. Vision II: applications for humanoids
2. What specific requirements do humanoids have for vision?
- Need to recognize objects
- Need to track changes over time
- Need to estimate geometry
- Need to estimate pose
3. What specific advantages do humanoids bring to vision?
- Mobility
  - Can change viewpoint (active vision)
- Dexterity
  - Can interact with objects
- Multiple modes of sensation
- Size?
4. Fundamental Issues (my opinion)
- Dynamics / Tracking
  - Need to follow target objects
  - Need to infer how its actions are changing the world
- Segmentation / Recognition
  - To interact, need to understand what is present in a scene
  - I would argue this requires high-level features and holistic representations
5. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples (Ross et al. 2006)
  - Better Vision Through Manipulation (Fitzpatrick et al. 2002)
7. Tracking I: Intro to Meanshift
- Meanshift soccer: http://www.youtube.com/watch?v=zLtjPfPP9HY
8. What is Mean Shift?
A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.
- PDF in feature space
  - Color space
  - Scale space
  - Actually, any feature space you can conceive
- Pipeline: non-parametric density estimation (discrete PDF representation) → non-parametric density GRADIENT estimation (the mean shift) → PDF analysis
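To make that gradient-climbing step concrete, here is a minimal NumPy sketch of mean shift with a flat kernel; the function name, bandwidth, and synthetic data are illustrative, not from the talk:

```python
import numpy as np

def mean_shift_mode(samples, start, bandwidth=1.0, tol=1e-5, max_iter=100):
    """Climb the density of `samples` from `start` (flat-kernel sketch).

    Each step replaces the current point with the mean of all samples
    inside the bandwidth window; the mean shift vector is the difference
    between that local center of mass and the current point.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.norm(samples - x, axis=1)
        window = samples[d < bandwidth]
        if len(window) == 0:
            break
        shift = window.mean(axis=0) - x
        x += shift
        if np.linalg.norm(shift) < tol:
            break
    return x

# Two Gaussian clumps; starting near one of them converges to its mode.
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0, 0.3, (200, 2)),
                     rng.normal(3, 0.3, (200, 2))])
print(mean_shift_mode(samples, start=[2.5, 2.5], bandwidth=1.0))
```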
9. Intuitive Description
[Figure: a distribution of identical billiard balls, annotated with the region of interest, its center of mass, and the resulting mean shift vector. Objective: find the densest region.]
10. Non-Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF.
Data point density implies PDF value!
[Figure: assumed underlying PDF alongside real data samples.]
11. But what does the probability describe?
- Pixel-wise probability that the current window was drawn from the same distribution as the target window
- Feature space is a histogram of colors in the target window
- The histogram is backprojected onto the image to compute the meanshift vector
12. Histogram Backprojection
- Want an estimate of probability for each pixel, based on its relative abundance in the target
- Compute a ratio histogram R
- Each pixel in the search window is assigned to a bin in R
- Provides a heuristic for emphasizing colors that have a large representation in the target image (see the sketch below)
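A small NumPy sketch of the ratio-histogram idea (in the style of Swain and Ballard's histogram backprojection); the hue-only feature, bin count, and function name are my assumptions:

```python
import numpy as np

def backproject(target_hue, image_hue, n_bins=32):
    """Ratio-histogram backprojection (sketch).

    Each pixel of `image_hue` is replaced by R[bin] =
    min(target_hist[bin] / image_hist[bin], 1), which is high for colors
    over-represented in the target relative to the whole image. The
    result is the per-pixel weight map that mean shift climbs.
    """
    bins = np.linspace(0, 180, n_bins + 1)      # hue assumed in [0, 180)
    t_hist, _ = np.histogram(target_hue, bins=bins)
    i_hist, _ = np.histogram(image_hue, bins=bins)
    ratio = np.minimum(t_hist / np.maximum(i_hist, 1), 1.0)
    idx = np.clip(np.digitize(image_hue, bins) - 1, 0, n_bins - 1)
    return ratio[idx]                           # same shape as image_hue
```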
13. Backprojected Histogram
14. Uses in robotics: Visual servoing
- Visual servoing: control based on feedback of visual measurements
- Weird German visual servoing: http://www.youtube.com/watch?v=zj-779Nsjh8
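As an illustration of the feedback idea, a toy proportional servo step; a real image-based servo maps the error through the interaction (image Jacobian) matrix, which this sketch simplifies away:

```python
import numpy as np

def visual_servo_step(feature_px, target_px, gain=0.5):
    """One step of a toy image-based visual servo (sketch).

    Commands a velocity proportional to the pixel error between the
    tracked feature and its desired image location. The mapping from
    image error to robot velocity is assumed to be identity here.
    """
    error = np.asarray(feature_px, float) - np.asarray(target_px, float)
    return -gain * error   # commanded image-plane velocity

# Closed loop: a tracker (e.g. mean shift) supplies feature_px each frame,
# and the command is sent to the robot until the error is small.
```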
15. Camshift on our arm
- Bug tracking with camshift: http://blip.tv/file/1587997
- Multiple bugs: http://blip.tv/file/1581360
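For reference, the standard OpenCV CamShift loop looks roughly like this (camera index, initial window, and bin counts are placeholders):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 300, 200, 100, 100                  # illustrative initial window
roi_hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([roi_hsv], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    box, window = cv2.CamShift(prob, window, term)  # window adapts in size
    pts = np.intp(cv2.boxPoints(box))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow('camshift', frame)
    if cv2.waitKey(30) == 27:                       # Esc to quit
        break
```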
16. Problems with Meanshift
- Feature space is limited to color
- Loses spatial information (think geometry) about the target
- Easily gets confused by objects with similar color profiles
- Meanshift my face: http://blip.tv/file/1764913
17. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples
  - Better Vision Through Manipulation
18. When does spatial information matter?
- Holistic image perception
- Necessary when low-level features can't disambiguate candidate objects
19. Vision methods using more global features
- Simplest example: figure/ground segmentation (also from gestalt psychology)
- Many methods exist to segment:
  - Canny edge detector
  - Normalized cuts
  - K-means clustering
  - LOCUS
20. Types of segmentation
- Two general classes of algorithms:
  - Region-based: outputs a label for each pixel (normalized cuts does this)
  - Edge-based: identifies a boundary between object and background (Canny edge detector); doesn't constrain boundaries to closed contours
21. Graph cuts (min-cut segmentation)
- Method for optimizing a segmentation that balances pixelwise likelihoods against a separation penalty
- Involves constructing a flow network whose edge capacities encode the contribution of each term
- Apply Ford-Fulkerson (max-flow = min-cut) to solve; the sketch below shows the construction
- One way or another, every pixel gets a label
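A compact sketch of that construction using networkx's max-flow solver; the unary/pairwise weighting scheme here is a generic one, not necessarily the one used in the talk:

```python
import networkx as nx
import numpy as np

def mincut_segment(unary_fg, unary_bg, smoothness=1.0):
    """Binary segmentation as an s-t min cut (sketch).

    `unary_fg[p]` / `unary_bg[p]` are per-pixel costs (e.g. negative log
    likelihoods) of labeling pixel p foreground / background; neighboring
    pixels pay `smoothness` for disagreeing. The minimum cut (found via
    max-flow, Ford-Fulkerson style) gives the minimum-cost labeling.
    """
    h, w = unary_fg.shape
    g = nx.DiGraph()
    for i in range(h):
        for j in range(w):
            p = (i, j)
            g.add_edge('s', p, capacity=float(unary_bg[i, j]))  # paid if p labeled bg
            g.add_edge(p, 't', capacity=float(unary_fg[i, j]))  # paid if p labeled fg
            for q in [(i + 1, j), (i, j + 1)]:                  # 4-neighborhood
                if q[0] < h and q[1] < w:
                    g.add_edge(p, q, capacity=smoothness)
                    g.add_edge(q, p, capacity=smoothness)
    _, (source_side, _) = nx.minimum_cut(g, 's', 't')
    labels = np.zeros((h, w), bool)
    for p in source_side - {'s'}:
        labels[p] = True        # pixels on the source side are foreground
    return labels
```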
22. Great, but where do the likelihoods come from?
- Early segmentation algorithms were trained on human labels at the pixel level
- Newer algorithms are labeled at the image level (a type specified for a whole class of images)
23. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples
  - Better Vision Through Manipulation
24. Automatic discovery of labels
- Uses motion as a cue for inference about objects
- Different perspectives on a scene allow inference based on simple assumptions
25. SANE: learning static segmentation from motion
- Learns local boundary likelihoods by applying Migdal & Grimson motion segmentation to video data
- Extracts a feature set on 5x5 patches
- Can apply the learned patch combination to segment static images (with similar visual properties); see the sketch below
- No human labeling required
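A rough sketch of the training idea, with motion-derived boundary masks standing in for the Migdal & Grimson output; the raw-pixel features and logistic-regression classifier are stand-ins, not SANE's actual feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_boundary_classifier(frames_gray, motion_boundaries, patch=5):
    """Learn a static boundary detector from motion labels (sketch).

    Motion segmentation of video (precomputed boundary masks in
    `motion_boundaries`) supplies free labels: a 5x5 appearance patch is
    positive if its center lies on a moving-object boundary. The trained
    classifier can then score boundaries in static images.
    """
    r = patch // 2
    X, y = [], []
    for img, bmask in zip(frames_gray, motion_boundaries):
        h, w = img.shape
        for i in range(r, h - r, 4):        # sparse grid of patch centers
            for j in range(r, w - r, 4):
                X.append(img[i - r:i + r + 1, j - r:j + r + 1].ravel())
                y.append(int(bmask[i, j]))
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```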
26. SANE features
27. Why this is promising
- Evidence from developmental psychology that humans learn segmentation cues this way
- "The ability to distinguish object boundaries by motion and depth perception developmentally precedes the ability to segment based on cues such as color, brightness, and texture. This suggests that segmentation with these cues can be learned from more primitive, causality-dependent mechanisms." - Spelke, 1980-something
- "While an experienced adult can interpret visual scenes perfectly well without acting upon them, linking action and perception seems crucial to the developmental process that leads to that competence." - Fitzpatrick, 2002
28. Can being a robot help?
- Robots can produce auxiliary information to help make inferences about visual data
- A robot can move itself to acquire new data about the same scene (active vision paradigm)
- It can also move the objects themselves
29. Better Vision Through Manipulation (Fitzpatrick et al. 2002)
- Attempts to model a robot's learning trajectory after biological evidence (mirror neurons and cortical motor regions; can explain, if people are interested)
- Visual learning starts with linking motor events to visual consequences
- Used to learn a simple 2D workspace IK solution
- Continues on to learn object segmentations
30. Assumptions
- "While the relationship between the optic flow and the physical motion is likely to be extremely complex, the correlation in time of the two events will generally be exceedingly precise. This time-correlation can be used as a signature to identify parts of the scene that are being influenced by the robot motion, even in the presence of other distracting motion sources."
- Simple idea (see the sketch below)
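A minimal sketch of that time-correlation signature: correlate a binary motor-activity signal with per-pixel optic-flow magnitude and keep the strongly correlated pixels (array shapes and threshold are assumptions):

```python
import numpy as np

def arm_mask_from_correlation(flow_mags, motor_active, threshold=0.5):
    """Find pixels whose motion is time-locked to the robot's own action.

    `flow_mags` is a (T, H, W) stack of per-frame optic-flow magnitudes
    and `motor_active` a length-T 0/1 array of when the arm was commanded
    to move. Pixels whose flow correlates strongly with the motor signal
    are attributed to the robot, even amid other distracting motion.
    """
    m = motor_active - motor_active.mean()
    f = flow_mags - flow_mags.mean(axis=0)
    corr = (f * m[:, None, None]).sum(axis=0) / (
        np.linalg.norm(f, axis=0) * np.linalg.norm(m) + 1e-9)
    return corr > threshold      # boolean (H, W) mask of the robot's arm
```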
31. Workspace visual servoing
- Learns a 2D closed-loop policy for manipulator configuration
- Populates a table of joint parameters vs. visual motion (mainly of the end effector); sketched below
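A sketch of such a table as a nearest-neighbor map from (configuration, joint change) to observed end-effector pixel motion; the class and method names are illustrative, not from the paper:

```python
import numpy as np

class VisuoMotorMap:
    """Table linking joint-space moves to observed image motion (sketch).

    Populated by motor babbling; at run time the nearest stored entry
    predicts what visual motion a candidate joint change will produce,
    which is enough to close a simple 2D servo loop.
    """
    def __init__(self):
        self.keys, self.image_motion = [], []

    def record(self, q, dq, dpx):
        # dpx: end-effector pixel displacement observed after moving dq at q
        self.keys.append(np.concatenate([q, dq]))
        self.image_motion.append(np.asarray(dpx))

    def predict_motion(self, q, dq):
        key = np.concatenate([q, dq])
        d = [np.linalg.norm(key - k) for k in self.keys]
        return self.image_motion[int(np.argmin(d))]
```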
32. Learning about objects
- Given visuo-motor knowledge of the arm, begin exploring objects
- Segmentation of regions via optical flow
- Exploratory poking actions to identify the shape of an object
33. Identifying rigid bodies
34. Identifying shape
Poking can reveal a difference in the shape of two objects without any prior knowledge of their appearance.
35. Future directions: representing others' actions?
- Given an object model, can reverse the direction of inference when observing the object being moved by a foreign manipulator
36. Katz & Brock
- Learn object affordances through manipulator exploration
- Discover object kinematics via affine transformations
- Represent as a graphical model, where edges are pruned when a DOF is discovered
37. Feature Finding
- Use a stock feature detector (SIFT)
- Track features through poking interactions with the arm
38. Identifying independent links
- Features of rigid bodies will move as a unit
39. Representing learned knowledge (sketchy)
- Graphical model (see the sketch below):
  - Nodes are features
  - Edges connect features that don't change relative distance over the interaction
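A sketch of building that graph from feature tracks: edges survive only when pairwise distances stay near-constant, so connected components become candidate rigid links (the tolerance and data layout are assumptions):

```python
import networkx as nx
import numpy as np

def find_rigid_links(tracks, tol=2.0):
    """Group tracked features into rigid bodies (sketch of the slide's idea).

    `tracks` maps feature id -> (T, 2) array of tracked positions over a
    poking interaction. Features are nodes; an edge survives only if the
    pair's distance stays (near) constant over the interaction, so each
    connected component is a candidate rigid link, and the missing edges
    between components mark discovered degrees of freedom.
    """
    g = nx.Graph()
    ids = list(tracks)
    g.add_nodes_from(ids)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(tracks[ids[a]] - tracks[ids[b]], axis=1)
            if d.max() - d.min() < tol:      # relative distance ~ constant
                g.add_edge(ids[a], ids[b])
    return list(nx.connected_components(g))  # each component: one rigid body
```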