Title: Segmentation
1 Segmentation
Goal
To divide an image into parts that are closely identified with objects, surfaces, features, areas, ... of the real world scene.
Problem
This is (almost always) too hard to accomplish directly. We must try to infer these image-region-to-world-feature identities through an amalgam of photometric and geometric observations on the image, as well as domain knowledge.
Reality
Segmentation, as used in the description of a computer vision system, will typically refer to an initial cut at the problem (perhaps involving lots of processing) that is then passed on to processing for grouping, description, and recognition.
Departure
Notice that, very soon, we will no longer be processing pixels. We will be looking at groups of pixels, and then quickly, from segmentation to what follows, we will make a transition to processing more abstract entities. This is where we leave image processing behind.
2 Segmentation is driven by three sources of information
Global Knowledge
What I tend to call domain knowledge. This is what you know about the problem before you start. For example...
(1) Analyzing microscopic blood samples.
(2) Looking at aerial photos of highways.
(3) Robotics, with four types of objects.
(4) Automotive collision warning device.
(5) Inspection of a known manufactured part.
Edge Measures
By now you know what these are!
Regional Measures
These will be photometric and geometric measurements on image regions (having 2D extent). Since in an ideal world, regions are bounded by edges, this is - in a sense - a dual problem to the edge detection approach.
3 Thresholding
The simplest form of segmentation. Only works for really simple situations, such as an isolated object against a roughly uniform background.
Form a binary image by setting all pixels of gray level (or some derived scalar quantity) greater than or equal to a threshold t to 1 -- all others to 0.
Can also do band thresholding, in which pixels with gray levels falling within some range of values are set to 1. This can work for roughly uniform backgrounds with objects that can be either brighter or darker.
Fixed thresholds almost never work. We have to improve them by adapting to the conditions, either image by image or -- better -- by varying the threshold over the image.
With multithresholding we create an N-ary image, with N relatively small, but greater than two.
Comment
Most of the ideas (below) can be used on subimages and combined, but the assumptions may not hold as well.
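The three variants on this slide (plain, band, and multithresholding) can be sketched in a few lines. This is only an illustration on plain Python lists; the function names and the tiny test image are made up for the example, and the multithreshold labeling rule (count how many thresholds a pixel meets) is one reasonable choice, not the only one.

```python
def threshold(image, t):
    """Binary image: 1 where the pixel is >= t, 0 elsewhere."""
    return [[1 if v >= t else 0 for v in row] for row in image]

def band_threshold(image, lo, hi):
    """Binary image: 1 where lo <= pixel <= hi, 0 elsewhere."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in image]

def multi_threshold(image, thresholds):
    """N-ary image: label = number of thresholds the pixel meets or exceeds.

    With k thresholds the labels run from 0 to k, so N = k + 1.
    """
    ts = sorted(thresholds)
    return [[sum(v >= t for t in ts) for v in row] for row in image]

# Hypothetical 4x4 gray-level image: dark background, one bright blob,
# and one mid-gray pixel.
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 90, 11],
    [12, 11, 10, 10],
]

binary = threshold(image, 100)           # bright blob only
band = band_threshold(image, 50, 150)    # mid-gray pixel only
labels = multi_threshold(image, [50, 150])  # 0 = dark, 1 = mid, 2 = bright
```

Note that the thresholds here are hand-picked, which is exactly what the following slides argue against doing in practice.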
4 Threshold Selection (detection)
Hand-tuning to select the proper threshold for an image, or even a set of images, will almost never result in a solution that works in a realistic test.
Thresholding can be made to work in many cases (not perfectly, but well enough to get started) by analyzing either some property of the image itself, or by analyzing the result of the segmentation against our expectation. This is one way in which domain knowledge can be used.
P-tile
In p-tile thresholding, the threshold is iteratively adjusted until a fraction p of the pixels in the image are set to 0. p is a quantity that must be set in advance based on our a priori knowledge of the domain.
For instance, we know we're looking at machine parts on a conveyor belt, and we know that each part will be seen in isolation and will occupy about 25% of the image area. Set p = 0.75, so that the remaining 25% of the pixels (the part) are set to 1.
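A minimal sketch of p-tile selection, under two assumptions that are not on the slide: the object is brighter than the background, and the threshold is found in one shot by sorting the pixel values rather than by iterative adjustment (the result is the same quantile). The function name and test image are hypothetical.

```python
def p_tile_threshold(image, p):
    """Return t such that roughly a fraction p of the pixels fall below t.

    Pixels below t are set to 0 and pixels >= t are set to 1, so p is the
    expected background fraction. Ties in gray level can make the achieved
    fraction differ slightly from p.
    """
    values = sorted(v for row in image for v in row)
    k = int(round(p * len(values)))          # how many pixels should become 0
    k = min(max(k, 0), len(values) - 1)      # keep the index in range
    return values[k]

# Hypothetical conveyor-belt image: the part covers 4 of 16 pixels (25%).
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 190, 11],
    [12, 11, 10, 10],
]

t = p_tile_threshold(image, 0.75)
binary = [[1 if v >= t else 0 for v in row] for row in image]
zeros = sum(row.count(0) for row in binary)  # pixels assigned to background
```

For a darker object on a brighter background, the same idea applies with the comparison reversed.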
5 Modal Analysis
Often (assuming a binary segmentation is reasonable) the histogram of pixel values (gray level, or some derived quantity) will be essentially bimodal.
Bimodal? Basically, it has two lumps, or dominant local maxima, separated by a reasonably distinct valley, or a dominant minimum. Then, one can analyze the histogram to set the threshold to correspond to the valley bottom.
Of course, histograms can be noisy -- so you have to make sure that the local maxima you select are displaced far enough from each other to ensure that they aren't part of the same lump. Once you've done that, take the global minimum between the two maxima as your threshold.
This approach does not make any use of the notion of spatial coherence. Indeed, most thresholding methods do not. There is no use of the resulting structure or organization exhibited by the segment(s) to evaluate them.
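The valley-finding recipe above might look roughly like this. It is a sketch with several assumptions of my own: integer gray levels in 0..255, a moving-average smoother to tame histogram noise, and a fixed minimum separation between the two modes. The second "peak" is simply the tallest bin at least that far from the first, which is a shortcut rather than a true local-maximum search. All names are hypothetical.

```python
def histogram(image, levels=256):
    """Count how many pixels take each integer gray level in [0, levels)."""
    h = [0] * levels
    for row in image:
        for v in row:
            h[v] += 1
    return h

def smooth(h, radius=2):
    """Moving-average smoothing of a histogram (window shrinks at the ends)."""
    n = len(h)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(h[lo:hi]) / (hi - lo))
    return out

def valley_threshold(h, min_separation=32):
    """Threshold at the lowest bin between the two dominant modes."""
    p1 = max(range(len(h)), key=lambda i: h[i])            # tallest bin
    far = [i for i in range(len(h)) if abs(i - p1) >= min_separation]
    p2 = max(far, key=lambda i: h[i])                      # tallest distant bin
    a, b = sorted((p1, p2))
    return min(range(a, b + 1), key=lambda i: h[i])        # valley bottom

# Hypothetical bimodal image: a dark cluster near 40, a bright one near 180.
row_dark = [38, 39, 40, 40, 40, 41, 42, 40]
row_bright = [178, 179, 180, 180, 181, 182, 180, 180]
image = [row_dark] * 6 + [row_bright] * 2

h = smooth(histogram(image))
t = valley_threshold(h)
```

When the valley is a flat run of empty bins, as in this toy image, any value in the run separates the modes equally well; this sketch happens to return the first one.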
6 Rubber Sheet Background Estimation (Adaptive t)
This is something a buddy of mine and I dreamed up in EE661 at Purdue -- which is like EE863 here at OSU. (But I'm sure it's been done by others.)
The underlying assumptions are
A smoothly varying background brightness function.
Objects may be either brighter or darker (locally) than the background.
The objects do not reach the image border.
At each point in the image, estimate the local background brightness (as if no object were present) by computing the bilinear interpolation of the four background pixels at the ends of the current row and column.
Do band thresholding around this estimated background value to segment objects. Pixels with brightness values lying outside the band are assigned (initially, at least) to the object region(s).
You can improve this by first smoothing the border pixel values; there are probably other tricks you can think up.
How could spatial coherence be injected into this?
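Here is one way the rubber sheet estimate could be coded. The slide does not spell out how the four border pixels are combined, so this sketch assumes the simplest reading: interpolate linearly along the row, interpolate linearly along the column, and average the two. The function names, the band half-width, and the test image are all made up for illustration, and the image is assumed to be at least 2x2.

```python
def rubber_sheet_background(image):
    """Estimate the background at each pixel from the four border pixels
    at the ends of its row and column (assumed combination rule: average
    of the row-wise and column-wise linear interpolations)."""
    h, w = len(image), len(image[0])
    bg = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            u = c / (w - 1)                      # position along the row, 0..1
            v = r / (h - 1)                      # position along the column, 0..1
            horiz = (1 - u) * image[r][0] + u * image[r][w - 1]
            vert = (1 - v) * image[0][c] + v * image[h - 1][c]
            bg[r][c] = 0.5 * (horiz + vert)
    return bg

def segment_rubber_sheet(image, band):
    """Mark pixels whose brightness lies outside +/- band of the estimate."""
    bg = rubber_sheet_background(image)
    return [[1 if abs(v - b) > band else 0 for v, b in zip(row, bg_row)]
            for row, bg_row in zip(image, bg)]

# Hypothetical 5x5 image: a brightness ramp as background, plus one brighter
# and one darker object pixel, neither touching the border.
image = [[100 + 10 * c + 5 * r for c in range(5)] for r in range(5)]
image[2][2] += 100   # brighter than the local background
image[1][3] -= 60    # darker than the local background

mask = segment_rubber_sheet(image, band=20)
```

A single global threshold would struggle here because the ramp spans 60 gray levels, while the band test only looks at the deviation from the local estimate.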
7 Optimal Thresholding
This is sort of a misnomer. These methods are optimal, given enough assumptions....
One popular notion is to fit some number of Gaussians to the histogram. Ummm.... how many? Usually two. Then the threshold is set as the minimum probability between the maxima of the two Gaussians. Rather Bayes-like, eh?
This works when the assumptions hold (the number of regions, and that the class-conditional densities are really Gaussian).
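As a rough illustration of the two-Gaussian idea, the sketch below fits a two-component mixture to the pixel values and takes the gray level with the lowest mixture density between the two fitted means. The fitting method (plain EM with quartile-based initial means) is my own choice, since the slide does not say how the fit is done, and the synthetic data, function names, and iteration count are all assumptions.

```python
import math
import random

def gaussian(x, mu, var):
    """Value at x of a 1-D Gaussian density with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_gaussians(values, iterations=60):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of two floats.
    """
    xs = sorted(values)
    n = len(xs)
    mu = [float(xs[n // 4]), float(xs[(3 * n) // 4])]   # crude initial means
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-6)
    var = [var_all, var_all]
    w = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p0 = w[0] * gaussian(x, mu[0], var[0])
            p1 = w[1] * gaussian(x, mu[1], var[1])
            s = (p0 + p1) or 1e-300                     # guard against underflow
            resp.append((p0 / s, p1 / s))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return w, mu, var

def mixture_valley_threshold(w, mu, var):
    """Integer gray level with the lowest mixture density between the means."""
    lo, hi = sorted(mu)
    candidates = range(int(math.ceil(lo)), int(math.floor(hi)) + 1)
    return min(candidates,
               key=lambda t: sum(w[k] * gaussian(t, mu[k], var[k]) for k in range(2)))

# Synthetic pixel values: 60% background near 60, 40% object near 170.
rng = random.Random(0)
values = [min(255, max(0, round(rng.gauss(60, 10)))) for _ in range(600)]
values += [min(255, max(0, round(rng.gauss(170, 12)))) for _ in range(400)]

w, mu, var = fit_two_gaussians(values)
t = mixture_valley_threshold(w, mu, var)
```

If the two lumps are not close to Gaussian, or there are really three regions, the fit can still converge to something, which is the "given enough assumptions" caveat in action.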
8 Iterative Threshold Selection (Optimal?)
This technique iteratively adjusts the segmentation threshold until it is the average of the mean background and mean object brightnesses, as determined by the segmentation. (Sounds like something for nothing...)
We begin by assuming we know nothing about the image (good assumption) except that the four corner pixels are part of the background (probably a good assumption in many cases). We will start by assigning all of the remaining pixels to the foreground (object) region. Now, we iterate...
At step t compute $\mu_B^t$ and $\mu_O^t$ as the mean value of the pixels currently assigned to the background (B) and the object regions (O), respectively.
9
We next compute a new threshold as the average of the two regional average brightness values:
$T^{t+1} = \frac{\mu_B^t + \mu_O^t}{2}$
This new threshold is then applied to the image to produce a new segmentation, and then we compute new values for the $\mu$s and another T. We stop when....
Four to ten iterations is usually sufficient for this algorithm to converge -- which is a bit surprising (at least to me).
I wonder if this could be used sequentially somehow to do N-ary segmentation by extracting one region at a time. What do you think?
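The iterative procedure from these two slides can be sketched as follows. The stopping rule is not given on the slide, so this version assumes a common one: stop when the threshold changes by less than a small tolerance. It also assumes the image is larger than 2x2 (so that something is left for the object region after removing the corners). The function name, tolerance, and test image are hypothetical.

```python
def iterative_threshold(image, max_iter=100, tol=0.5):
    """Iterate T = (mean background + mean object) / 2 until T settles.

    Starts with the four corner pixels as background and everything else
    as object. The stopping rule (|T_new - T_old| < tol) is an assumption.
    """
    h, w = len(image), len(image[0])
    corners = {(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)}
    pixels = [v for row in image for v in row]
    background = [image[r][c] for r, c in corners]
    objects = [image[r][c] for r in range(h) for c in range(w)
               if (r, c) not in corners]
    t_old = None
    t_new = None
    for _ in range(max_iter):
        mu_b = sum(background) / len(background)   # mean background brightness
        mu_o = sum(objects) / len(objects)         # mean object brightness
        t_new = 0.5 * (mu_b + mu_o)
        if t_old is not None and abs(t_new - t_old) < tol:
            break
        t_old = t_new
        # Re-segment: split all pixels at the new threshold.
        low = [v for v in pixels if v < t_new]
        high = [v for v in pixels if v >= t_new]
        if not low or not high:                    # degenerate split, give up
            break
        background, objects = low, high
    return t_new

# Hypothetical 4x4 image: dark background with a bright 2x2 part.
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 190, 11],
    [12, 11, 10, 10],
]

t = iterative_threshold(image)
```

Because the update averages the two means, the threshold itself does not depend on which side is called background; only the labels flip when the object is the darker region.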