Title: Robot Vision
1 Robot Vision
2 Introduction
- Computer vision
- Endowing machines with the means to see
- Create an image of a scene and extract features
- Very difficult problem for machines
- Several different scenes can produce identical images.
- Images can be noisy.
- Cannot directly invert the image to reconstruct the scene.
3 Human Vision (1)
4 Human Vision (2)
5 Human Vision (3)
6 Steering an Automobile
- ALVINN system [Pomerleau 1991, 1993]
- Uses an artificial neural network
- Input: a 30×32 TV image (960 input nodes)
- 5 hidden nodes
- 30 output nodes
- Trained with a modified "on-the-fly" training regime
- A human driver drives the car, and the driver's actual steering angles are taken as the correct labels for the corresponding inputs.
- Shifted and rotated images were also used for training.
- ALVINN has driven for 120 consecutive kilometers at speeds up to 100 km/h.
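The layer sizes listed above can be made concrete with a small forward pass. This is only a sketch of a 960-5-30 network: the sigmoid activation, the random weights, the random stand-in image, and reading the steering command off the most active output node are all assumptions made for illustration, not details taken from the slides.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 30 * 32, 5, 30  # 960-5-30, as on the slide

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    # One weight row per node of the layer; small random weights (assumed).
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(pixels, w_hidden, w_output):
    hidden = [sigmoid(sum(w * p for w, p in zip(row, pixels))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]

def steering_index(outputs):
    # Assumption: each output node stands for one steering direction,
    # and the most active node is taken as the command.
    return max(range(len(outputs)), key=lambda i: outputs[i])

rng = random.Random(0)
w_hidden = make_layer(N_INPUT, N_HIDDEN, rng)
w_output = make_layer(N_HIDDEN, N_OUTPUT, rng)
image = [rng.random() for _ in range(N_INPUT)]  # stand-in for a 30x32 frame
outputs = forward(image, w_hidden, w_output)
```

In training, the human driver's steering angle for each frame would supply the target for the output layer; that step is omitted here.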
7 Steering an Automobile: ALVINN
8 Two stages of Robot Vision (1)
- Finding the objects in the scene
- Looking for edges in the image
- Edge: a part of the image across which the image intensity or some other property of the image changes abruptly.
- Attempting to segment the image into regions
- Region: a part of the image in which the image intensity or some other property of the image changes only gradually.
9 Two stages of Robot Vision (2)
- Image processing stage
- Transform the original image into one that is more amenable to the scene analysis stage.
- Involves various filtering operations that help reduce noise, accentuate edges, and find regions.
- Scene analysis stage
- Attempt to create an iconic or a feature-based description of the original scene, providing the task-specific information.
10 Two stages of Robot Vision (3)
- Scene analysis stage produces task-specific information.
- If only the disposition of the blocks is important, an appropriate iconic model can be (C B A FLOOR).
- If it is important to determine whether there is another block on top of the block labeled C, an adequate description will include the value of a feature, CLEAR_C.
11 Averaging (1)
- The original image can be represented as an m × n array of numbers. The numbers represent the light intensities at corresponding points in the image.
- Certain irregularities in the image can be smoothed by an averaging operation.
- The averaging operation involves sliding an averaging window all over the image array.
12 Averaging (2)
- The smoothing operation thickens broad lines and eliminates thin lines and small details.
- The averaging window is centered at each pixel, and the weighted sum of all the pixel numbers within the averaging window is computed. This sum then replaces the original value at that pixel.
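The sliding-window operation described above might look like the following. It is a minimal sketch: it assumes an odd-sized square window, stores the image as a list of lists, and leaves border pixels unchanged, which is only one of several possible border conventions.

```python
def average_filter(image, weights):
    """Slide a square weighted window over a 2-D list of intensities."""
    rows, cols = len(image), len(image[0])
    k = len(weights) // 2                 # window half-width
    out = [row[:] for row in image]       # border pixels are left unchanged
    for r in range(k, rows - k):
        for c in range(k, cols - k):
            # Weighted sum of the pixels under the window centered at (r, c).
            out[r][c] = sum(
                weights[i + k][j + k] * image[r + i][c + j]
                for i in range(-k, k + 1)
                for j in range(-k, k + 1)
            )
    return out

box = [[1 / 9] * 3 for _ in range(3)]     # uniform 3x3 averaging window
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 9, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
smoothed = average_filter(image, box)
```

The single bright pixel is spread over its 3×3 neighborhood, which is the sense in which small details are smoothed away.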
13 Averaging (3)
- A common function used for smoothing is a Gaussian of two dimensions.
- Convolving an image with a Gaussian is equivalent to finding the solution to a diffusion equation when the initial condition is given by the image intensity field.
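A two-dimensional Gaussian can be sampled on a small grid to obtain an averaging window. The sketch below assumes a truncated, discretely sampled Gaussian that is renormalized so the weights sum to 1; the parameter names sigma and radius are chosen here for illustration.

```python
import math

def gaussian_window(sigma, radius):
    """Sampled 2-D Gaussian on a (2*radius+1)^2 grid, normalized to sum to 1."""
    raw = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]
           for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

window = gaussian_window(sigma=1.0, radius=2)
```

A larger sigma gives a wider window and therefore stronger smoothing.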
14 Averaging (4)
15 Edge enhancement (1)
- Edge: any boundary between parts of the image with markedly different values of some property.
- Edges are often related to important object properties.
- Edges in the image occur at places where the second derivative of the image intensity is zero.
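The second-derivative criterion can be illustrated on a one-dimensional intensity profile. This sketch approximates the second derivative by a discrete second difference and reports sign changes (zero crossings); the profile values are made up for the example.

```python
def second_difference(signal):
    # Discrete second derivative: f[i-1] - 2*f[i] + f[i+1].
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    # Indices i where the sign changes between values[i] and values[i+1].
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]

# A smooth ramp from dark (0) to bright (10).
profile = [0, 0, 1, 3, 7, 9, 10, 10]
d2 = second_difference(profile)      # [1, 1, 2, -2, -1, -1]
edges = zero_crossings(d2)           # [2]
```

Entry i of d2 is centered on profile index i + 1, so the crossing lies between profile indices 3 and 4, where the intensity rises fastest.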
16 Edge enhancement (2)
17 Combining Edge Enhancement with Averaging (1)
- Edge enhancement alone would tend to emphasize noise elements along with enhancing edges.
- To be less sensitive to noise, both operations are needed (first averaging, then edge enhancing).
- We can convolve the one-dimensional image with the second derivative of a Gaussian curve to combine both operations.
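As a rough illustration of the combined operation, the sketch below convolves a one-dimensional step edge with samples of the second derivative of a Gaussian. The kernel is left unnormalized, only fully overlapping positions are computed, and the values of sigma and radius are arbitrary choices for the example.

```python
import math

def gaussian_second_derivative(sigma, radius):
    # Samples of d^2/dx^2 exp(-x^2 / (2 sigma^2)), up to a constant factor.
    return [((x * x - sigma * sigma) / sigma ** 4)
            * math.exp(-x * x / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]

def convolve_valid(signal, kernel):
    # The kernel is symmetric, so correlation and convolution coincide.
    n = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(n))
            for i in range(len(signal) - n + 1)]

step = [0.0] * 10 + [1.0] * 10     # ideal step edge between index 9 and 10
kernel = gaussian_second_derivative(sigma=1.0, radius=4)
response = convolve_valid(step, kernel)
```

Entry i of response is centered on signal index i + 4. The response changes sign between entries 5 and 6, that is, between signal indices 9 and 10, which is where the step lies.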
18 Combining Edge Enhancement with Averaging (2)
- The Laplacian is a second-derivative-type operation that enhances edges of any orientation.
- The Laplacian of the two-dimensional Gaussian function looks like an upside-down hat, often called a sombrero function.
- The entire averaging/edge-finding operation can be achieved by convolving the image with the sombrero function (called Laplacian filtering).
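The upside-down-hat shape can be seen by sampling the Laplacian of a two-dimensional Gaussian. This sketch drops the Gaussian's normalizing constant, and the grid size and sigma are arbitrary illustrative choices.

```python
import math

def sombrero(x, y, sigma):
    # Laplacian of exp(-(x^2 + y^2) / (2 sigma^2)), up to a constant factor.
    r2 = x * x + y * y
    return ((r2 - 2.0 * sigma * sigma) / sigma ** 4) * math.exp(-r2 / (2.0 * sigma * sigma))

def sombrero_window(sigma, radius):
    return [[sombrero(x, y, sigma) for x in range(-radius, radius + 1)]
            for y in range(-radius, radius + 1)]

window = sombrero_window(sigma=1.0, radius=3)
```

The center weight is negative and is surrounded by a ring of positive weights that decay toward zero. A window like this could be slid over the image in the same way as an averaging window.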
19 6.4.4 Finding Regions
- Another method for processing the image
- → to find regions
- Finding regions ⇔ finding outlines
20 A region of the image
- A region is homogeneous.
- The difference in intensity values of pixels in the region is no more than some ε.
- A polynomial surface of degree k can be fitted to the intensity values of pixels in the region with largest error less than ε.
- For no two adjacent regions is it the case that the union of all the pixels in these two regions satisfies the homogeneity property.
- Each region corresponds to a world object or a meaningful part of one.
21 Split-and-merge method
- The algorithm begins with just one candidate region, the whole image.
- Repeat until no more splits need to be made:
- All candidate regions that do not satisfy the homogeneity property are each split into four equal-sized candidate regions.
- Adjacent candidate regions are merged if their combined pixels satisfy the homogeneity property.
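The split phase of the method can be sketched as a recursive quadrant subdivision. The code below covers only splitting, not merging. It assumes a square image whose side is a power of two, and it uses the simplest homogeneity test (intensity spread at most eps); the function names and the sample image are illustrative.

```python
def is_homogeneous(image, r, c, size, eps):
    # Homogeneity test: intensities in the block differ by at most eps.
    values = [image[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    return max(values) - min(values) <= eps

def split(image, r, c, size, eps):
    """Return homogeneous square blocks as (row, col, size) tuples."""
    if size == 1 or is_homogeneous(image, r, c, size, eps):
        return [(r, c, size)]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            # Split a non-homogeneous block into four equal quadrants.
            blocks += split(image, r + dr, c + dc, half, eps)
    return blocks

image = [[1, 1, 9, 9],
         [1, 1, 9, 9],
         [1, 1, 1, 1],
         [1, 1, 1, 9]]
blocks = split(image, 0, 0, 4, eps=0)
```

Here three quadrants are already homogeneous and the fourth is split into single pixels, giving seven blocks. A merge phase would then join adjacent blocks whose combined pixels pass the same test, for example the two left-hand quadrants.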
23 Regions Found by Split-and-Merge for a Grid-World Scene (from Fig. 6.12)
24 Cleaning up the regions found by the split-and-merge method
- Eliminating very small regions (some of which are transitions between larger regions).
- Straightening bounding lines.
- Taking into account the known shapes of objects likely to be in the scene.
25 6.4.5 Using Image Attributes Other Than Intensity
- Image attributes other than the homogeneity of image intensity
- → Visual texture
- Fine-grained variation of the surface reflectivity of the objects
- Ex) a field of grass, a section of carpet, foliage in trees, the fur of animals
- The reflectivity variations in objects cause similar fine-grained structure in image intensity.
26 Methods for analyzing texture
- Structural methods
- Represent regions in the image by a tessellation of primitive "texels": small shapes comprising black and white parts.
- Statistical methods
- Based on the idea that image texture is best described by a probability distribution for the intensity values over regions of the image.
- Ex) an image of a grassy field in which the blades of grass are oriented vertically
- → a probability distribution that peaks for thin, vertically oriented regions of high intensity, separated by regions of low intensity
27 Other attributes
- If we had a direct way to measure the range from the camera to objects in the scene, we could produce a range image and look for abrupt range differences.
- Range image: each pixel value represents the distance from the corresponding point in the scene to the camera.
- Motion, color
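Looking for abrupt range differences can be sketched on a single row of a range image. The threshold, the units, and the sample distances below are assumptions made for the example.

```python
def range_edges(range_row, threshold):
    # Indices where the distance jumps by more than threshold between neighbours.
    return [i for i in range(len(range_row) - 1)
            if abs(range_row[i + 1] - range_row[i]) > threshold]

# Distances in metres: a near object standing in front of a far wall.
row = [5.0, 5.1, 5.0, 2.0, 2.1, 2.0, 5.2]
jumps = range_edges(row, threshold=1.0)
```

The two reported positions bracket the near object, so range discontinuities mark its outline directly.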
28 6.5 Scene Analysis (1)
- Scene analysis
- Extracting from the image the needed information about the scene
- Requires either additional images (for stereo vision) or general information about the kinds of scenes, since the scene-to-image transformation is many-to-one.
- The required knowledge
- Very general or quite specific
- Explicit or implicit
29 6.5 Scene Analysis (2)
- Knowledge of surface reflectivity characteristics and shading of intensity in the image
- → gives information about the shape of smooth objects in the scene.
- Iconic scene analysis
- Build a model of the scene or parts of the scene
- Feature-based scene analysis
- Extracts features of the scene needed by the task
- Task-oriented or purposive vision
30 6.5.1 Interpreting Lines and Curves in the Image
- Interpreting the line drawing
- Association between scene properties and the components of a line drawing
- Trihedral vertex polyhedra
- The scene is assumed to contain only planar surfaces such that no more than three surfaces intersect in a point.
31 Three kinds of edges in Trihedral vertex polyhedra (1/2)
- There are only three kinds of ways in which two planes can intersect in a scene edge.
- Occlude
- One kind of edge is formed by two planes, with one of them occluding the other.
- Labeled in Fig. 6.15 with arrows (→).
- The arrowhead points along the edge such that the surface doing the occluding is to the right of the arrow.
32 Three kinds of edges in Trihedral vertex polyhedra (2/2)
- Blade
- Two planes can intersect such that both planes are visible in the scene.
- The two surfaces form a convex edge.
- Labeled with pluses (+).
- Fold
- The edge is concave.
- Labeled with minuses (−).
33 Labels for Lines at Junctions
34 Line-labeling scene analysis (1/2)
- Labeling all of the junctions in the image as V,
W, Y, or T junctions according to the shape of
the junctions in the image
35 Line-labeling scene analysis (2/2)
- Assign +, −, or → labels to the lines in the image.
- An image line that connects two junctions must have a consistent labeling.
- If there is no consistent labeling,
- → there must have been some error in converting the image into a line drawing, or
- → the scene must not have been one of trihedral polyhedra.
- Constraint satisfaction problem
36 6.5.2 Model-Based Vision (1/2)
- If we knew that the scene contained a parallelepiped (as in Figure 6.15), we could attempt to fit a projection of a parallelepiped to components of an image of this scene.
- Generalized cylinders as building blocks for model construction
- Each cylinder has 9 parameters.
37 Model-Based Vision (2/2)
- An example: rough scene reconstruction of a human figure
- Hierarchical representation
- Each cylinder in the model can be articulated into a set of smaller cylinders.
38 6.6 Stereo Vision and Depth Information
- Depth information can be obtained using stereo vision, which is based on triangulation calculations using two (or more) images.
- Some depth information can be extracted from a single image.
- The analysis of texture in the image can indicate that some elements in the scene are closer than others.
- More precise depth information: if we know that a perceived object is on the floor and we know the camera height above the floor, we can calculate the distance to the object.
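The floor-based distance calculation can be sketched under a simple pinhole-camera model. Everything specific here is an assumption for illustration: the optical axis is taken to be horizontal, image rows grow downward, horizon_row is the row where the horizontal direction projects, and the focal length is expressed in pixels.

```python
def floor_distance(camera_height, pixel_row, horizon_row, focal_length_px):
    """Horizontal distance to a floor point seen at pixel_row (rows grow downward)."""
    offset = pixel_row - horizon_row
    if offset <= 0:
        raise ValueError("point is at or above the horizon, so it is not on the floor")
    # The ray through this pixel dips below horizontal by atan(offset / f);
    # similar triangles then give distance = height * f / offset.
    return camera_height * focal_length_px / offset

d = floor_distance(camera_height=1.0, pixel_row=340, horizon_row=240, focal_length_px=500)
```

With these made-up numbers, the point where the object meets the floor is 5 units away, in the same units as the camera height.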
39 Depth Calculation from a Single Image
40 Stereo Vision
- Stereo vision uses triangulation.
- Two lenses whose centers are separated by a baseline, b.
- The image points of a scene point, at distance d, created by these lenses.
- The angles of these image points from the lens centers, α and β.
- The optical axes are parallel, the image planes are coplanar, and the scene point is in the same plane as that formed by the two parallel optical axes.
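Under the geometry described above, the distance follows from the baseline and the two angles. The sketch below names the angles alpha and beta and assumes a particular convention: each is measured between a lens's optical axis and the ray to the scene point, taken as positive toward the other lens. The slide's figure may use a different convention, in which case the formula changes accordingly.

```python
import math

def stereo_distance(baseline, alpha, beta):
    # The scene point lies d*tan(alpha) to one side of the first axis and
    # d*tan(beta) to the other side of the second; the two offsets add up
    # to the baseline, so d = b / (tan(alpha) + tan(beta)).
    return baseline / (math.tan(alpha) + math.tan(beta))

d = stereo_distance(baseline=0.2, alpha=math.atan(0.03), beta=math.atan(0.01))
```

With a baseline of 0.2 and tangents of 0.03 and 0.01, the computed distance is 5, in the same units as the baseline. Small angles correspond to distant points, so distance estimates become more sensitive to angular error as the distance grows.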
41 Triangulation in Stereo Vision
42 The main complication
- In scenes containing more than one point, it must be established which pair of points in the two images correspond to the same scene point.
- For a pixel in one image, we must be able to identify the corresponding pixel in the other image. → the correspondence problem
43 Techniques for the correspondence problem
- Geometric analysis reveals that we need only search along one dimension (the epipolar line).
- One-dimensional searches can be implemented by cross-correlation of two image intensity profiles along corresponding epipolar lines.
- We do not have to find correspondences between individual pairs of image points but can do so between pairs of larger image components, such as lines.
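A one-dimensional correspondence search might be sketched as follows. It assumes the two images are aligned so that corresponding epipolar lines are the same image row, and it uses normalized cross-correlation (a common variant that is insensitive to brightness offsets) rather than plain cross-correlation. The function names and the intensity profiles are made up for the example.

```python
import math

def ncc(a, b):
    # Normalized cross-correlation of two equal-length intensity windows.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0   # flat windows carry no information

def best_match(left_row, right_row, x, half_width):
    """Position in right_row whose neighbourhood best matches left_row around x."""
    patch = left_row[x - half_width: x + half_width + 1]
    best_pos, best_score = None, float("-inf")
    for pos in range(half_width, len(right_row) - half_width):
        window = right_row[pos - half_width: pos + half_width + 1]
        score = ncc(patch, window)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos

left = [1, 1, 1, 1, 1, 2, 8, 3, 1, 1]
right = [1, 1, 2, 8, 3, 1, 1, 1, 1, 1]
pos = best_match(left, right, x=6, half_width=1)
disparity = 6 - pos
```

The bright feature at index 6 in the left profile is found at index 3 in the right profile. The resulting shift of 3 pixels is the quantity that triangulation converts into depth.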
44 Assignments
- Pages 111–112
- Ex. 6.2, Ex. 6.4, Ex. 6.5