Title: Vision II: applications for humanoids
1. Vision II: applications for humanoids
2. What specific requirements do humanoids have for vision?
- Need to recognize objects
- Need to track changes over time
- Need to estimate geometry
- Need to estimate pose
3. What specific advantages do humanoids bring to vision?
- Mobility
  - Can change viewpoint (active vision)
- Dexterity
  - Can interact with objects
- Multiple modes of sensation
- Size?
4. Fundamental Issues (my opinion)
- Dynamics / Tracking
  - Need to follow target objects
  - Need to infer how its actions are changing the world
- Segmentation / Recognition
  - To interact, need to understand what is present in a scene
  - I would argue this requires high-level features and holistic representations
5. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples (Ross et al. 2006)
  - Better Vision Through Manipulation (Fitzpatrick et al. 2002)
7. Tracking I: Intro to Meanshift
- Meanshift soccer: http://www.youtube.com/watch?v=zLtjPfPP9HY
8. What is Mean Shift?
A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.
- PDF in feature space
  - Color space
  - Scale space
  - Actually, any feature space you can conceive
- Pipeline: non-parametric density estimation (discrete PDF representation) → non-parametric density GRADIENT estimation (the mean shift) → PDF analysis
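To make that gradient-climbing step concrete, here is a minimal NumPy sketch of mean shift with a flat kernel; the function name, bandwidth, and synthetic data are illustrative, not from the talk:

```python
import numpy as np

def mean_shift_mode(samples, start, bandwidth=1.0, tol=1e-5, max_iter=100):
    """Climb the density of `samples` from `start` (flat-kernel sketch).

    Each step replaces the current point with the mean of all samples
    inside the bandwidth window; the mean shift vector is the difference
    between that local center of mass and the current point.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.norm(samples - x, axis=1)
        window = samples[d < bandwidth]
        if len(window) == 0:
            break
        shift = window.mean(axis=0) - x
        x += shift
        if np.linalg.norm(shift) < tol:
            break
    return x

# Two Gaussian clumps; starting near one of them converges to its mode.
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0, 0.3, (200, 2)),
                     rng.normal(3, 0.3, (200, 2))])
print(mean_shift_mode(samples, start=[2.5, 2.5], bandwidth=1.0))
```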
9. Intuitive Description
[Figure: a distribution of identical billiard balls, annotated with the region of interest, its center of mass, and the resulting mean shift vector. Objective: find the densest region.]
10. Non-Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF.
Data point density implies PDF value!
[Figure: assumed underlying PDF alongside real data samples.]
11. But what does the probability describe?
- Pixel-wise probability that the current window was drawn from the same distribution as the target window
- Feature space is a histogram of colors in the target window
- The histogram is backprojected onto the image to compute the meanshift vector
12. Histogram Backprojection
- Want an estimate of probability for each pixel, based on its relative abundance in the target
- Compute a ratio histogram R
- Each pixel in the search window is assigned to a bin in R
- Provides a heuristic for emphasizing colors that have a large representation in the target image (see the sketch below)
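A small NumPy sketch of the ratio-histogram idea (in the style of Swain and Ballard's histogram backprojection); the hue-only feature, bin count, and function name are my assumptions:

```python
import numpy as np

def backproject(target_hue, image_hue, n_bins=32):
    """Ratio-histogram backprojection (sketch).

    Each pixel of `image_hue` is replaced by R[bin] =
    min(target_hist[bin] / image_hist[bin], 1), which is high for colors
    over-represented in the target relative to the whole image. The
    result is the per-pixel weight map that mean shift climbs.
    """
    bins = np.linspace(0, 180, n_bins + 1)      # hue assumed in [0, 180)
    t_hist, _ = np.histogram(target_hue, bins=bins)
    i_hist, _ = np.histogram(image_hue, bins=bins)
    ratio = np.minimum(t_hist / np.maximum(i_hist, 1), 1.0)
    idx = np.clip(np.digitize(image_hue, bins) - 1, 0, n_bins - 1)
    return ratio[idx]                           # same shape as image_hue
```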
13. Backprojected Histogram
14. Uses in robotics: Visual servoing
- Visual servoing: control based on feedback of visual measurements
- Weird German visual servoing: http://www.youtube.com/watch?v=zj-779Nsjh8
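As an illustration of the feedback idea, a toy proportional servo step; a real image-based servo maps the error through the interaction (image Jacobian) matrix, which this sketch simplifies away:

```python
import numpy as np

def visual_servo_step(feature_px, target_px, gain=0.5):
    """One step of a toy image-based visual servo (sketch).

    Commands a velocity proportional to the pixel error between the
    tracked feature and its desired image location. The mapping from
    image error to robot velocity is assumed to be identity here.
    """
    error = np.asarray(feature_px, float) - np.asarray(target_px, float)
    return -gain * error   # commanded image-plane velocity

# Closed loop: a tracker (e.g. mean shift) supplies feature_px each frame,
# and the command is sent to the robot until the error is small.
```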
15. Camshift on our arm
- Bug tracking with camshift: http://blip.tv/file/1587997
- Multiple bugs: http://blip.tv/file/1581360
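For reference, the standard OpenCV CamShift loop looks roughly like this (camera index, initial window, and bin counts are placeholders):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 300, 200, 100, 100                  # illustrative initial window
roi_hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([roi_hsv], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    box, window = cv2.CamShift(prob, window, term)  # window adapts in size
    pts = np.intp(cv2.boxPoints(box))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow('camshift', frame)
    if cv2.waitKey(30) == 27:                       # Esc to quit
        break
```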
16. Problems with Meanshift
- Feature space is limited to color
- Loses spatial information (think geometry) about the target
- Easily gets confused by objects with similar color profiles
- Meanshift my face: http://blip.tv/file/1764913
17. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples
  - Better Vision Through Manipulation
18. When does spatial information matter?
- Holistic image perception
- Necessary when low-level features can't disambiguate candidate objects
19. Vision methods using more global features
- Simplest example: figure/ground segmentation (also from gestalt psychology)
- Many methods exist to segment:
  - Canny edge detector
  - Normalized cuts
  - K-means clustering
  - LOCUS
20. Types of segmentation
- Two general classes of algorithms:
  - Region-based: outputs a label for each pixel (normalized cuts does this)
  - Edge-based: identifies a boundary between object and background (Canny edge detector); doesn't constrain boundaries to closed contours
21. Graph cuts (min-cut segmentation)
- Method for optimizing a segmentation that balances pixelwise likelihoods against a separation penalty
- Involves constructing a flow network whose edge capacities encode the contribution of each term
- Apply Ford-Fulkerson (max-flow = min-cut) to solve; the sketch below shows the construction
- One way or another, every pixel gets a label
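A compact sketch of that construction using networkx's max-flow solver; the unary/pairwise weighting scheme here is a generic one, not necessarily the one used in the talk:

```python
import networkx as nx
import numpy as np

def mincut_segment(unary_fg, unary_bg, smoothness=1.0):
    """Binary segmentation as an s-t min cut (sketch).

    `unary_fg[p]` / `unary_bg[p]` are per-pixel costs (e.g. negative log
    likelihoods) of labeling pixel p foreground / background; neighboring
    pixels pay `smoothness` for disagreeing. The minimum cut (found via
    max-flow, Ford-Fulkerson style) gives the minimum-cost labeling.
    """
    h, w = unary_fg.shape
    g = nx.DiGraph()
    for i in range(h):
        for j in range(w):
            p = (i, j)
            g.add_edge('s', p, capacity=float(unary_bg[i, j]))  # paid if p labeled bg
            g.add_edge(p, 't', capacity=float(unary_fg[i, j]))  # paid if p labeled fg
            for q in [(i + 1, j), (i, j + 1)]:                  # 4-neighborhood
                if q[0] < h and q[1] < w:
                    g.add_edge(p, q, capacity=smoothness)
                    g.add_edge(q, p, capacity=smoothness)
    _, (source_side, _) = nx.minimum_cut(g, 's', 't')
    labels = np.zeros((h, w), bool)
    for p in source_side - {'s'}:
        labels[p] = True        # pixels on the source side are foreground
    return labels
```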
22. Great, but where do the likelihoods come from?
- Early segmentation algorithms were trained on human labels at the pixel level
- Newer algorithms are labeled at the image level (a type specified for a whole class of images)
23. Outline
- I) Dynamics / Tracking
  - Meanshift for following objects
  - Visual servoing
- II) Segmentation
  - Basic network flow algorithms
  - Overview of other approaches
- III) Putting it together: segmentation from motion
  - SANE: Segmentation According to Natural Examples
  - Better Vision Through Manipulation
24. Automatic discovery of labels
- Uses motion as a cue for inference about objects
- Different perspectives on a scene allow inference based on simple assumptions
25. SANE: learning static segmentation from motion
- Learns local boundary likelihoods by applying Migdal & Grimson motion segmentation to video data
- Extracts a feature set on 5x5 patches
- Can apply the learned patch combination to segment static images (with similar visual properties); see the sketch below
- No human labeling required
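A rough sketch of the training idea, with motion-derived boundary masks standing in for the Migdal & Grimson output; the raw-pixel features and logistic-regression classifier are stand-ins, not SANE's actual feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_boundary_classifier(frames_gray, motion_boundaries, patch=5):
    """Learn a static boundary detector from motion labels (sketch).

    Motion segmentation of video (precomputed boundary masks in
    `motion_boundaries`) supplies free labels: a 5x5 appearance patch is
    positive if its center lies on a moving-object boundary. The trained
    classifier can then score boundaries in static images.
    """
    r = patch // 2
    X, y = [], []
    for img, bmask in zip(frames_gray, motion_boundaries):
        h, w = img.shape
        for i in range(r, h - r, 4):        # sparse grid of patch centers
            for j in range(r, w - r, 4):
                X.append(img[i - r:i + r + 1, j - r:j + r + 1].ravel())
                y.append(int(bmask[i, j]))
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```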
26. SANE features
27. Why this is promising
- Evidence from developmental psychology that humans learn segmentation cues this way
- "The ability to distinguish object boundaries by motion and depth perception developmentally precedes the ability to segment based on cues such as color, brightness, and texture. This suggests that segmentation with these cues can be learned from more primitive, causality-dependent mechanisms." - Spelke, 1980-something
- "While an experienced adult can interpret visual scenes perfectly well without acting upon them, linking action and perception seems crucial to the developmental process that leads to that competence." - Fitzpatrick, 2002
28. Can being a robot help?
- Robots can produce auxiliary information to help make inferences about visual data
- A robot can move itself to acquire new data about the same scene (active vision paradigm)
- It can also move the objects themselves
29. Better Vision Through Manipulation (Fitzpatrick et al. 2002)
- Attempts to model a robot's learning trajectory after biological evidence (mirror neurons and cortical motor regions; can explain, if people are interested)
- Visual learning starts with linking motor events to visual consequences
- Used to learn a simple 2D workspace IK solution
- Continues on to learn object segmentations
30. Assumptions
- "While the relationship between the optic flow and the physical motion is likely to be extremely complex, the correlation in time of the two events will generally be exceedingly precise. This time-correlation can be used as a signature to identify parts of the scene that are being influenced by the robot motion, even in the presence of other distracting motion sources."
- Simple idea (see the sketch below)
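A minimal sketch of that time-correlation signature: correlate a binary motor-activity signal with per-pixel optic-flow magnitude and keep the strongly correlated pixels (array shapes and threshold are assumptions):

```python
import numpy as np

def arm_mask_from_correlation(flow_mags, motor_active, threshold=0.5):
    """Find pixels whose motion is time-locked to the robot's own action.

    `flow_mags` is a (T, H, W) stack of per-frame optic-flow magnitudes
    and `motor_active` a length-T 0/1 array of when the arm was commanded
    to move. Pixels whose flow correlates strongly with the motor signal
    are attributed to the robot, even amid other distracting motion.
    """
    m = motor_active - motor_active.mean()
    f = flow_mags - flow_mags.mean(axis=0)
    corr = (f * m[:, None, None]).sum(axis=0) / (
        np.linalg.norm(f, axis=0) * np.linalg.norm(m) + 1e-9)
    return corr > threshold      # boolean (H, W) mask of the robot's arm
```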
31. Workspace visual servoing
- Learns a 2D closed-loop policy for manipulator configuration
- Populates a table of joint parameters vs. visual motion (mainly of the end effector); sketched below
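A sketch of such a table as a nearest-neighbor map from (configuration, joint change) to observed end-effector pixel motion; the class and method names are illustrative, not from the paper:

```python
import numpy as np

class VisuoMotorMap:
    """Table linking joint-space moves to observed image motion (sketch).

    Populated by motor babbling; at run time the nearest stored entry
    predicts what visual motion a candidate joint change will produce,
    which is enough to close a simple 2D servo loop.
    """
    def __init__(self):
        self.keys, self.image_motion = [], []

    def record(self, q, dq, dpx):
        # dpx: end-effector pixel displacement observed after moving dq at q
        self.keys.append(np.concatenate([q, dq]))
        self.image_motion.append(np.asarray(dpx))

    def predict_motion(self, q, dq):
        key = np.concatenate([q, dq])
        d = [np.linalg.norm(key - k) for k in self.keys]
        return self.image_motion[int(np.argmin(d))]
```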
32. Learning about objects
- Given visuo-motor knowledge of the arm, begin exploring objects
- Segmentation of regions via optical flow
- Exploratory poking actions to identify the shape of an object
33. Identifying rigid bodies
34. Identifying shape
Poking can reveal a difference in the shape of two objects without any prior knowledge of their appearance.
35. Future directions: representing others' actions?
- Given an object model, can reverse the direction of inference when observing the object being moved by a foreign manipulator
36. Katz & Brock
- Learn object affordances through manipulator exploration
- Discover object kinematics via affine transformations
- Represent as a graphical model, where edges are pruned when a DOF is discovered
37. Feature Finding
- Use a stock feature detector (SIFT)
- Track features through poking interactions with the arm
38. Identifying independent links
- Features of rigid bodies will move as a unit
39. Representing learned knowledge (sketchy)
- Graphical model (see the sketch below):
  - Nodes are features
  - Edges connect features that don't change relative distance over the interaction
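A sketch of building that graph from feature tracks: edges survive only when pairwise distances stay near-constant, so connected components become candidate rigid links (the tolerance and data layout are assumptions):

```python
import networkx as nx
import numpy as np

def find_rigid_links(tracks, tol=2.0):
    """Group tracked features into rigid bodies (sketch of the slide's idea).

    `tracks` maps feature id -> (T, 2) array of tracked positions over a
    poking interaction. Features are nodes; an edge survives only if the
    pair's distance stays (near) constant over the interaction, so each
    connected component is a candidate rigid link, and the missing edges
    between components mark discovered degrees of freedom.
    """
    g = nx.Graph()
    ids = list(tracks)
    g.add_nodes_from(ids)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(tracks[ids[a]] - tracks[ids[b]], axis=1)
            if d.max() - d.min() < tol:      # relative distance ~ constant
                g.add_edge(ids[a], ids[b])
    return list(nx.connected_components(g))  # each component: one rigid body
```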