Vision II: applications for humanoids
1
Vision II: applications for humanoids
  • Jon Scholz
  • BHR

2
What specific requirements do humanoids have for
vision?
  • Need to recognize objects
  • Need to track changes over time
  • Need to estimate geometry
  • Need to estimate pose

3
What specific advantages do humanoids bring to
vision?
  • Mobility
  • Can change viewpoint (active vision)
  • Dexterity
  • Can interact with objects
  • Multiple modes of sensation
  • Size?

4
Fundamental Issues (my opinion)
  • Dynamics & Tracking
  • Need to follow target objects
  • Need to infer how its actions are changing the
    world
  • Segmentation & Recognition
  • To interact, need to understand what is present
    in a scene
  • This, I would argue, requires high-level features
    and holistic representations

5
Outline
  • I) Dynamics & Tracking
  • Meanshift for following objects
  • Visual servoing
  • II) Segmentation
  • Basic network-flow algorithms
  • Overview of other approaches
  • III) Putting it together: segmentation from
    motion
  • SANE - Segmentation according to natural examples
    (Ross et al., 2006)
  • Better vision through Manipulation (Fitzpatrick
    et al., 2002)

6
Outline
  • I) Dynamics & Tracking
  • Meanshift for following objects
  • Visual servoing
  • II) Segmentation
  • Basic network-flow algorithms
  • Overview of other approaches
  • III) Putting it together: segmentation from
    motion
  • SANE - Segmentation according to natural examples
  • Better vision through Manipulation

7
Tracking I - Intro to Meanshift
  • meanshift soccer - http://www.youtube.com/watch?v=zLtjPfPP9HY

8
What is Mean Shift?
A tool for finding modes in a set of data
samples, manifesting an underlying probability
density function (PDF) in R^N
  • PDF in feature space
  • Color space
  • Scale space
  • Actually, any feature space you can conceive

The mean-shift pipeline: non-parametric density
estimation (a discrete PDF representation),
non-parametric density GRADIENT estimation (the
mean shift itself), then PDF analysis. A minimal
mode-seeking sketch follows.
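To make the gradient step concrete, here is a minimal mode-seeking
sketch in Python/NumPy (an illustration under a Gaussian kernel; the
function name and bandwidth are assumptions, not from the slides):

    import numpy as np

    def mean_shift_mode(samples, start, bandwidth=1.0, tol=1e-5,
                        max_iter=100):
        # Seek the nearest density mode from `start` by repeatedly
        # moving to the kernel-weighted mean of the samples.
        x = np.asarray(start, dtype=float)
        for _ in range(max_iter):
            d2 = np.sum((samples - x) ** 2, axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian weights
            x_new = (w[:, None] * samples).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:     # converged at a mode
                return x_new
            x = x_new
        return x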
9
Intuitive Description
[Figure: mean-shift iterations on a distribution
of identical billiard balls. A region of interest
shifts along the mean-shift vector toward its
center of mass. Objective: find the densest
region.]
10
Non-Parametric Density Estimation
Assumption: the data points are sampled from an
underlying PDF, so data point density implies PDF
value!
[Figure: real data samples drawn from an assumed
underlying PDF.]
11
But what does the probability describe?
  • Pixel-wise probability that the current window
    was drawn from the same distribution as the target
  • Feature space is a histogram of colors in the
    target window
  • Histogram is backprojected to image to compute
    meanshift vector

12
Histogram Backprojection
  • Want an estimate of probability for each pixel
    based on its relative abundance in the target
  • Compute a ratio histogram R
  • Each pixel in the search window is assigned to a
    bin in R
  • Provides a heuristic for emphasizing colors that
    have a large representation in the target image
    (a minimal sketch follows)
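A minimal sketch of the ratio-histogram backprojection described
above, assuming the classic Swain & Ballard min(M/I, 1) form on hue
values (the slide does not spell out the formula):

    import numpy as np

    def backproject(target_hue, window_hue, nbins=16):
        # Ratio histogram R = min(target / window, 1), then look up
        # each window pixel's bin to get a per-pixel "probability".
        bins = np.linspace(0, 180, nbins + 1)       # OpenCV-style hue range
        m, _ = np.histogram(target_hue, bins=bins)  # target histogram
        i, _ = np.histogram(window_hue, bins=bins)  # search-window histogram
        r = np.minimum(m / np.maximum(i, 1), 1.0)
        idx = np.clip(np.digitize(window_hue, bins) - 1, 0, nbins - 1)
        return r[idx]

The resulting map is what the mean-shift window climbs.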

13
Backprojected Histogram
14
Uses in robotics: visual servoing
  • Visual servoing - control based on feedback of
    visual measurements (a minimal control-step
    sketch follows)
  • weird German visual servoing - http://www.youtube.com/watch?v=zj-779Nsjh8
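A minimal control-step sketch of the idea (hypothetical:
`jacobian_pinv`, the pseudoinverse of the image Jacobian, and the
pixel coordinates are assumed to come from the robot/vision stack):

    import numpy as np

    def servo_step(target_px, current_px, jacobian_pinv, gain=0.5):
        # Proportional image-based visual servoing: command joint
        # velocities that shrink the image-space feature error.
        error = np.asarray(current_px, float) - np.asarray(target_px, float)
        return -gain * jacobian_pinv @ error   # joint-velocity command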

15
Camshift on our arm
  • Bug tracking with camshift - http://blip.tv/file/1587997
  • multiple bugs - http://blip.tv/file/1581360
    (a minimal CamShift loop is sketched below)
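For reference, a minimal OpenCV CamShift loop of the kind used in
these demos (the camera index and initial window are placeholder
assumptions):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)                  # assumed camera source
    ok, frame = cap.read()
    x, y, w, h = 200, 150, 80, 80              # hand-picked initial window
    roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = (x, y, w, h)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        box, window = cv2.CamShift(prob, window, term)  # adaptive window
        pts = np.intp(cv2.boxPoints(box))
        cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
        cv2.imshow("camshift", frame)
        if cv2.waitKey(30) & 0xFF == 27:       # Esc to quit
            break

CamShift is mean shift plus an adaptive window size and orientation,
which is why it copes better with scale changes than plain mean shift.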

16
Problems with Meanshift
  • Feature space is limited to color
  • Loses spatial information (think geometry) about
    the target
  • Easily gets confused by objects with similar
    color profiles
  • meanshift my face - http://blip.tv/file/1764913

17
Outline
  • I) Dynamics & Tracking
  • Meanshift for following objects
  • Visual servoing
  • II) Segmentation
  • Basic network-flow algorithms
  • Overview of other approaches
  • III) Putting it together: segmentation from
    motion
  • SANE - Segmentation according to natural examples
  • Better vision through Manipulation

18
When does spatial information matter?
  • Holistic image perception
  • Necessary when low-level features can't
    disambiguate candidate objects

19
Vision methods using more global features
  • Simplest example: figure/ground segmentation
    (also from Gestalt psychology)
  • Many methods exist to segment:
  • Canny edge detector
  • Normalized Cuts
  • K-means clustering
  • LOCUS

20
Types of segmentation
  • Two general classes of algorithms:
  • Region-based: outputs a label for each pixel
    (normalized cuts does this)
  • Edge-based: identifies a boundary between object
    and background (Canny edge detector)
  • Doesn't constrain boundaries to closed contours

21
Normalized Cuts
  • Method for optimizing a segmentation based on
    pixelwise likelihoods plus a separation penalty
  • Involves constructing a flow network describing
    the contributions of each term (strictly, this
    min-cut construction is the related graph-cuts
    method; normalized cuts itself is usually solved
    as an eigenvalue problem)
  • Apply Ford-Fulkerson (max-flow/min-cut) to solve
    (a toy sketch follows)
  • One way or another, everything gets a label
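A toy sketch of the network-flow recipe on a 4-connected pixel grid
(using networkx for clarity; the unary costs and uniform smoothness
weight are assumptions, and real systems use contrast-sensitive
weights and faster max-flow solvers):

    import networkx as nx
    import numpy as np

    def mincut_segment(unary_fg, unary_bg, smoothness=1.0):
        # Binary segmentation by max-flow/min-cut. unary_fg/unary_bg
        # hold per-pixel costs (e.g. negative log-likelihoods) of
        # labeling each pixel foreground/background.
        h, w = unary_fg.shape
        G = nx.DiGraph()
        src, sink = "S", "T"
        for y in range(h):
            for x in range(w):
                p = (y, x)
                # terminal edges encode the likelihood terms
                G.add_edge(src, p, capacity=float(unary_bg[y, x]))
                G.add_edge(p, sink, capacity=float(unary_fg[y, x]))
                # pairwise edges penalize cutting between neighbors
                for q in ((y + 1, x), (y, x + 1)):
                    if q[0] < h and q[1] < w:
                        G.add_edge(p, q, capacity=smoothness)
                        G.add_edge(q, p, capacity=smoothness)
        _, (src_side, _) = nx.minimum_cut(G, src, sink)
        labels = np.zeros((h, w), dtype=bool)
        for node in src_side:
            if node != src:
                labels[node] = True      # source side = foreground
        return labels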

22
Great, but where do the likelihoods come from?
  • Early segmentation algorithms relied on human
    labels at the pixel level
  • Newer algorithms are labeled at the image level
    (a type is specified for a class of images)

23
Outline
  • I) Dynamics & Tracking
  • Meanshift for following objects
  • Visual servoing
  • II) Segmentation
  • Basic network-flow algorithms
  • Overview of other approaches
  • III) Putting it together: segmentation from
    motion
  • SANE - Segmentation according to natural examples
  • Better vision through Manipulation

24
Automatic discovery of labels
  • Uses motion as a cue for inference about objects
  • Different perspectives on a scene allow
    inference based on simple assumptions

25
SANE: learning static segmentation from motion
  • Learns local boundary likelihoods by applying
    Migdal & Grimson to video data
  • Extracts a feature set on 5x5 patches
  • Can apply the learned patch combination to
    segment static images (with similar visual
    properties)
  • No human labeling required (a sketch of the idea
    follows)
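A rough sketch of the motion-supervision idea (in the spirit of SANE,
not the paper's actual pipeline, which uses Migdal & Grimson
background subtraction rather than this flow threshold):

    import cv2
    import numpy as np

    def motion_boundary_labels(prev_gray, next_gray, thresh=1.0):
        # Derive free 'boundary' training labels from motion:
        # threshold dense optical flow into a moving-object mask
        # and use the mask's edges as positive boundary examples.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)     # per-pixel motion magnitude
        moving = (mag > thresh).astype(np.uint8)
        edges = cv2.Canny(moving * 255, 50, 150)
        return edges > 0                       # boundary-label mask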

26
SANE features
27
Why this is promising
  • Evidence from developmental psychology that
    humans learn segmentation cues this way
  • "The ability to distinguish object boundaries by
    motion and depth perception developmentally
    precedes the ability to segment based on cues
    such as color, brightness, and texture. This
    suggests that segmentation with these cues can be
    learned from more primitive, causality-dependent
    mechanisms."
  • - Spelke, 1980-something
  • "While an experienced adult can interpret visual
    scenes perfectly well without acting upon them,
    linking action and perception seems crucial to
    the developmental process that leads to that
    competence."
  • - Fitzpatrick, 2002

28
Can being a robot help?
  • A robot can produce auxiliary information to
    help make inferences about visual data
  • It can move itself to acquire new data about the
    same scene (the active-vision paradigm)
  • It can also move the objects themselves

29
Better Vision Through Manipulation (Fitzpatrick
et al. 2002)
  • Attempts to model a robot's learning trajectory
    on biological evidence (mirror neurons and
    cortical motor regions; can explain, if people
    are interested)
  • Visual learning starts with linking motor events
    to visual consequences
  • Used to learn a simple 2D workspace IK solution
  • Continues on to learn object segmentations

30
Assumptions
  • "While the relationship between the optic flow
    and the physical motion is likely to be extremely
    complex, the correlation in time of the two
    events will generally be exceedingly precise.
    This time-correlation can be used as a signature
    to identify parts of the scene that are being
    influenced by the robot motion, even in the
    presence of other distracting motion sources."
  • A simple idea (a correlation sketch follows)
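A minimal sketch of that time-correlation signature (the signal names
are assumptions; per-region flow energy and motor activity would come
from the tracker and the controller):

    import numpy as np

    def self_motion_score(motor_active, flow_energy):
        # Normalized zero-lag correlation between a binary motor
        # signal and a region's optical-flow energy over the same
        # frames; regions that move when (and only when) the arm
        # moves score near 1.
        m = motor_active - motor_active.mean()
        f = flow_energy - flow_energy.mean()
        denom = np.sqrt((m * m).sum() * (f * f).sum())
        return float((m * f).sum() / denom) if denom > 0 else 0.0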

31
Workspace visual servoing
  • Learns a 2D closed-loop policy for manipulator
    configuration
  • Populates a table of joint parameters vs. visual
    motion (mainly of the end effector); a
    hypothetical sketch follows
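A hypothetical sketch of such a table (nearest-neighbor lookup is a
simplification; the paper's actual interpolation scheme is not
described on this slide):

    import numpy as np

    table = []  # entries: (joint_delta, observed image_delta)

    def record(joint_delta, image_delta):
        # Fill the table during motor babbling.
        table.append((np.asarray(joint_delta, float),
                      np.asarray(image_delta, float)))

    def servo_lookup(desired_image_delta):
        # Pick the stored command whose recorded end-effector image
        # motion best matches the desired image-space motion.
        d = np.asarray(desired_image_delta, float)
        best = min(table, key=lambda e: np.linalg.norm(e[1] - d))
        return best[0]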

32
Learning about objects
  • Given visuo-motor knowledge of the arm, begin
    exploring objects
  • Segmentation of regions via optical flow
  • Exploratory poking actions to identify the shape
    of an object

33
Identifying rigid bodies
34
Identifying shape
"Poking can reveal a difference in the shape of two
objects without any prior knowledge of their
appearance."
35
Future directions: representing others' actions?
  • Given an object model, the direction of
    inference can be reversed when observing the
    object being moved by a foreign manipulator

36
Katz & Brock
  • Learn object affordances through manipulator
    exploration
  • Discover object kinematics via affine
    transformations
  • Represent as a graphical model, where edges are
    pruned if a DOF is discovered

37
Feature Finding
  • Use a stock feature detector (SIFT)
  • Track features through poking interactions with
    the arm

38
Identifying independent links
  • Features of rigid bodies will move as a unit

39
Representing learned knowledge (sketchy)
  • Graphical model:
  • Nodes are features
  • Edges connect features that don't change relative
    distance over the interaction (a sketch follows)
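A minimal sketch of that representation (the feature tracks and the
rigidity tolerance are assumptions):

    import numpy as np
    import networkx as nx

    def rigid_groups(tracks, tol=2.0):
        # tracks: feature positions, shape (frames, features, 2).
        # Connect two features when their pairwise distance stays
        # (near) constant over the interaction; connected components
        # are then candidate rigid links.
        n = tracks.shape[1]
        G = nx.Graph()
        G.add_nodes_from(range(n))
        for i in range(n):
            for j in range(i + 1, n):
                d = np.linalg.norm(tracks[:, i] - tracks[:, j], axis=1)
                if d.max() - d.min() < tol:
                    G.add_edge(i, j)
                # else: edge pruned, a relative DOF was observed
        return list(nx.connected_components(G))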