Title: Evaluation Techniques in Computer Vision
1Evaluation Techniques in Computer Vision
- EE4H, M.Sc 0407191
- Computer Vision
- Dr. Mike Spann
- m.spann_at_bham.ac.uk
- http://www.eee.bham.ac.uk/spannm
2Contents
- Why evaluate?
- Images: synthetic/natural?
- Noise
- Example 1. Evaluation of thresholding/segmentation methods
- Example 2. Evaluation of optical flow methods
3Why evaluate?
- Computer vision algorithms are complex and difficult to analyse mathematically
- Evaluation is usually through measurement of the algorithm's performance on test images
- Use of a range of images to establish a performance envelope
- Comparison with existing algorithms
- Performance on degraded (noise-added) images (robustness)
- Sensitivity to algorithm parameter settings
4Test images
- Real images
  - Ground truth difficult to establish
- Pseudo-real images
  - Could be synthetic objects moving against a real background
  - Often a good compromise
- Synthetic images
  - Noise and illumination variation over object surfaces hard to model realistically
5Simple synthetic images
- Simple object/background synthetic images are used to evaluate thresholding and segmentation algorithms
- They obey a very simple image model (piecewise constant + Gaussian noise)
- Unrealistic: in practice, images are not like this!
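As a minimal sketch of the image model above (piecewise constant grey levels plus iid Gaussian noise), the following Python generates an object/background test image together with its ground truth mask. The function name, the centred square object and the default grey levels and noise level are illustrative assumptions, not values from the slides.

```python
import numpy as np

def make_synthetic_image(size=64, background=80.0, obj=160.0, sigma=10.0, seed=0):
    """Piecewise-constant object/background image plus iid Gaussian noise.

    All defaults are illustrative assumptions. Returns (noisy image, truth mask).
    """
    rng = np.random.default_rng(seed)
    truth = np.zeros((size, size), dtype=bool)
    q = size // 4
    truth[q:size - q, q:size - q] = True           # centred square object
    clean = np.where(truth, obj, background)       # piecewise-constant model
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    return noisy, truth

image, truth = make_synthetic_image(sigma=10.0)
```

Because the mask is returned alongside the image, any thresholding result can be compared pixel by pixel against the ground truth.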
6Simple synthetic images
[Figure: simple synthetic object/background images with zero, low and medium noise]
7Pseudo-real images
- More realistic object/background images are better used to evaluate segmentation algorithms
- Images of natural objects in natural illumination
- Ground truth can be established using hand segmentation tools (such as those built into many image processing packages)
8Pseudo-real images
[Figure: pseudo-real test images: screws, keys, cars and washers]
9Simple synthetic edges
- Again, a piecewise constant + Gaussian noise image model
- Ideal step edge
  - Precise edge location, but not achievable by finite-aperture imaging systems
10Simple synthetic edges
[Figure: simple synthetic step edges with low, medium and high noise]
11Pseudo-real edges
- More realistic edge profiles can be created by smoothing an ideal step edge
[Figure: a step edge passed through a Gaussian filter]
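The smoothing step can be sketched in one dimension as follows. This is only an illustration under assumed values: the grey levels, the filter width `sigma` and the helper names are hypothetical, and the kernel is truncated at three standard deviations.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalised 1D Gaussian truncated at 3 sigma (an assumption)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smoothed_step_edge(length=64, low=50.0, high=200.0, sigma=2.0):
    """Return (ideal step edge, Gaussian-smoothed edge) as 1D profiles."""
    step = np.where(np.arange(length) < length // 2, low, high)
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    # Replicate the end values so the output has the same length as the input.
    padded = np.pad(step, radius, mode="edge")
    return step, np.convolve(padded, k, mode="valid")

step, smooth = smoothed_step_edge()
```

Increasing `sigma` widens the transition, which mimics a larger imaging aperture.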
12Pseudo-real movies
- The Yosemite sequence is a computer-generated movie: a rendered fly-through of the Yosemite valley
- Background clouds are real
- Enables the true flow (ground truth) to be determined
- Used extensively in the evaluation of optical flow algorithms
- yosemite.avi
- yosemite_flow.avi
13Noise
- Often used to evaluate the robustness of algorithms
- Additive noise is usual in optical images, but multiplicative noise is more realistic in sonar/radar images
  - Noise level proportional to signal level
- Usual noise model is independent random variables (usually Gaussian)
  - Correlated noise is often more realistic
14Noise
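- Standard noise model is zero-mean independent, identically distributed (iid) Gaussian (normal) random variables
- Characterised by the variance $\sigma^2$
- Probability distribution of the random variables (rvs), writing $n$ for the noise value:
  $$p(n) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{n^2}{2\sigma^2}\right)$$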
15Noise
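- Noise level characterised by the signal-to-noise ratio (SNR)
- Usually expressed in dB
- Defined as (writing $\sigma^2$ for the noise variance)
  $$\mathrm{SNR} = 10 \log_{10}\!\left(\frac{S}{\sigma^2}\right)\ \text{dB}$$
- $S$ is the mean-square grey level, defined (for an $N \times N$ pixel image with grey levels $g(x, y)$) as
  $$S = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} g(x, y)^2$$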
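A short sketch of how the signal-to-noise ratio might be computed in practice. It assumes the SNR is ten times the base-10 logarithm of the mean-square grey level divided by the noise variance; the function names are hypothetical.

```python
import numpy as np

def snr_db(clean, sigma):
    """SNR in dB, assuming SNR = 10*log10(mean-square grey level / sigma^2)."""
    s = np.mean(np.asarray(clean, dtype=float) ** 2)   # mean-square grey level
    return float(10.0 * np.log10(s / sigma ** 2))

def sigma_for_snr(clean, target_db):
    """Noise standard deviation that gives the requested SNR for this image."""
    s = np.mean(np.asarray(clean, dtype=float) ** 2)
    return float(np.sqrt(s / 10.0 ** (target_db / 10.0)))

clean = np.full((8, 8), 100.0)      # illustrative constant image
print(snr_db(clean, 10.0))          # mean-square level 1e4 over variance 1e2
```

Inverting the definition, as `sigma_for_snr` does, is convenient when test images are needed at fixed SNR values such as 30 dB or 0 dB.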
16Noise
[Figure: example images at different signal-to-noise ratios, including 30 dB and 0 dB]
17Noise (mean-square error)
- We can regard the mean-square error (difference) between two images as noise
- Often used to evaluate image compression algorithms by comparing the original and decompressed images
- Image differences can also be expressed as the peak signal-to-noise ratio (PSNR) in dB by taking the signal level as 255
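The two measures above can be sketched as follows. The function names are hypothetical, and the PSNR is taken in its usual form, ten times the base-10 logarithm of the squared peak level (255) over the mean-square error.

```python
import numpy as np

def mse(a, b):
    """Mean-square error (difference) between two images of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def psnr_db(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB, taking the signal level as `peak`."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(peak ** 2 / err))

original = np.zeros((4, 4))
decompressed = original + 1.0        # every pixel off by one grey level
print(mse(original, decompressed), psnr_db(original, decompressed))
```

Casting to float before subtracting avoids wrap-around when the inputs are unsigned 8-bit images.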
18Noise (mean-square error)
19Other types of noise
- The other main category of (additive) noise is impulse (sometimes called salt-and-pepper) noise
- Characterised by the impulse rate (spatial density of noise impulses) and the mean-square amplitude of the impulses
- Can normally be filtered out easily using median filters
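A minimal sketch of impulse noise and median filtering, using NumPy only. The function names, the 3x3 window, the equal split between salt (255) and pepper (0) and the 5% impulse rate are all assumptions made for illustration.

```python
import numpy as np

def add_salt_and_pepper(image, rate, rng):
    """Replace a fraction `rate` of pixels with 0 or 255 (equal odds, assumed)."""
    out = np.array(image, dtype=float)
    hits = rng.random(out.shape) < rate        # which pixels receive an impulse
    salt = rng.random(out.shape) < 0.5         # salt or pepper for each pixel
    out[hits & salt] = 255.0
    out[hits & ~salt] = 0.0
    return out

def median_filter_3x3(image):
    """3x3 median filter with replicated borders."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    # Stack the nine shifted copies of the image and take the per-pixel median.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifts), axis=0)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 128.0)
noisy = add_salt_and_pepper(clean, rate=0.05, rng=rng)
restored = median_filter_3x3(noisy)
```

An isolated impulse never reaches the middle of the sorted 3x3 neighbourhood, which is why the median removes it while a linear smoothing filter would only spread it out.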
20Other types of noise
[Figure: original image, the same image with salt-and-pepper noise, and the de-speckled result]
21Other types of noise
- There are many other types of noise which can be considered in algorithm evaluation
- Essentially more sophisticated and realistic probability distributions of the noise rvs
- For example, a generalised Gaussian model is often considered to model heavy-tailed distributions
- However, in my humble opinion, a more realistic source of noise is the deviation from the ideal of the illumination variation across object surfaces
24Evaluation of thresholding segmentation methods
- Segmentation and thresholding algorithms essentially group pixels into regions (or classes)
- Simplest case is object/background
- Simple evaluation metrics just quantify the number of misclassified pixels
- For basic image models, such as constant grey level in object/background regions plus iid Gaussian noise, the probability of error can be computed analytically
25Evaluation of thresholding segmentation methods
- For a simple object/background image
26Evaluation of thresholding segmentation methods
- Misclassification probability is a function of the threshold T
- For a simple constant region grey level model plus additive iid Gaussian noise, we can easily derive an analytical expression for this probability
- Not very useful in practice, as the image model is limited and we also require the ground truth
- More useful simply to measure the misclassification error as a function of threshold
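Both routes can be sketched side by side. The analytic form below is one standard derivation under stated assumptions, not necessarily the expression on the original slide: the object is brighter than the background, a pixel is labelled object when its grey level exceeds T, both regions share the noise standard deviation `sigma`, and `p_obj` is the fraction of object pixels. All names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def analytic_error(T, mu_b, mu_o, sigma, p_obj):
    """Misclassification probability for threshold T under the assumed model."""
    miss = normal_cdf((T - mu_o) / sigma)                 # object pixel <= T
    false_alarm = 1.0 - normal_cdf((T - mu_b) / sigma)    # background pixel > T
    return p_obj * miss + (1.0 - p_obj) * false_alarm

def measured_error(image, truth, T):
    """Fraction of pixels whose thresholded label disagrees with ground truth."""
    return float(np.mean((image > T) != truth))

# With equal priors and T midway between the means, the error is Phi(-2).
print(analytic_error(120.0, 80.0, 160.0, 20.0, 0.5))
```

`measured_error` needs only an image and its ground truth mask, so it applies equally to images that do not follow the Gaussian model.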
27Evaluation of thresholding segmentation methods
- Usual to represent correct classification probabilities and false alarm probabilities jointly within a receiver operating characteristic (ROC) curve
- For example, the ROC curve shows how these vary as a function of threshold for an object/background classification
28Evaluation of thresholding segmentation methods
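[Figure: ROC curve. Vertical axis: probability of correct classification (0.0 to 1.0). Horizontal axis: probability of false alarm (0.0 to 1.0). The curve is parameterised by the threshold, with end points labelled T = 0 and T = 255.]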
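An ROC curve of this kind can be traced by sweeping the threshold over all grey levels. In this sketch, "correct classification" is read as the fraction of object pixels labelled object and "false alarm" as the fraction of background pixels labelled object, with pixels above T labelled object. These readings and the function name are assumptions.

```python
import numpy as np

def roc_points(image, truth, thresholds=range(256)):
    """Return a list of (P_false_alarm, P_correct) pairs, one per threshold."""
    obj = image[truth]       # grey levels of true object pixels
    bg = image[~truth]       # grey levels of true background pixels
    points = []
    for T in thresholds:
        p_correct = float(np.mean(obj > T))   # object pixels labelled object
        p_false = float(np.mean(bg > T))      # background pixels labelled object
        points.append((p_false, p_correct))
    return points

# Noise-free illustration: background at 80, object at 160.
truth = np.zeros((8, 8), dtype=bool)
truth[:, 4:] = True
image = np.where(truth, 160.0, 80.0)
points = roc_points(image, truth)
```

On real or noisy images the points sweep from the top-right corner at T = 0 towards the origin at T = 255; the closer the curve passes to the top-left corner, the better the two classes separate.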
29Evaluation of thresholding segmentation methods
- More useful methods of evaluation can be found by taking account of the application of the segmentation
- Segmentation is rarely an end in itself but a component in an overall machine vision system
- Also, the level of under- or over-segmentation of an algorithm needs to be determined
30Evaluation of thresholding segmentation methods
[Figure: ground truth, an under-segmentation and an over-segmentation of the same image]
31Evaluation of thresholding segmentation methods
- Under-segmentation is bad, as distinct regions are merged
- Over-segmentation can be acceptable, as sub-regions comprising a single ground truth region can be merged using high-level knowledge
- Also, the level of over-segmentation can be controlled by the parameter settings of the algorithm
32Evaluation of thresholding segmentation methods
- A possible segmentation metric is to quantify correctly detected regions, over-segmentation and under-segmentation
- Depends upon some threshold setting T
- Region-based rather than pixel-based
- Used in Koester and Spann's paper (IEEE Trans. PAMI, 2000) to evaluate range image segmentations
33Evaluation of thresholding segmentation methods
- Correct detection
  - At least T% of the pixels in region k of the segmented image are marked as pixels in region j of the ground truth image
  - And vice versa
[Figure: a region in the segmentation and the corresponding region in the ground truth (GT) image]
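The correct-detection test can be sketched on two label images as follows. This is an illustrative reading of the rule, not the code used in the cited paper: T is expressed as a fraction between 0 and 1 rather than a percentage, and the function name is hypothetical.

```python
import numpy as np

def correct_detections(gt, seg, T=0.8):
    """Return (j, k) pairs where GT region j and segmented region k match.

    A pair matches when the overlap covers at least a fraction T of region k
    and at least a fraction T of region j (the "vice versa" condition).
    """
    pairs = []
    for j in np.unique(gt):
        in_j = gt == j
        for k in np.unique(seg[in_j]):          # only regions that touch j
            in_k = seg == k
            overlap = np.count_nonzero(in_j & in_k)
            if overlap >= T * in_k.sum() and overlap >= T * in_j.sum():
                pairs.append((int(j), int(k)))
    return pairs

gt = np.array([[1, 1, 1, 1, 1, 2, 2, 2, 2, 2]])
seg = np.array([[7, 7, 7, 7, 7, 7, 8, 8, 8, 8]])
print(correct_detections(gt, seg, T=0.75))
```

Raising T makes the test stricter: in the toy example the boundary is off by one pixel, so both pairs pass at T = 0.75 but neither passes at T = 0.9.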
34Evaluation of thresholding segmentation methods
- Over-segmentation
  - Region j in the ground truth image corresponds to regions k1, k2, ..., km in the segmented image if
    - At least T% of the pixels in each region ki are marked as pixels of region j
    - At least T% of the pixels in region j are marked as pixels in the union of regions k1, k2, ..., km
35Evaluation of thresholding segmentation methods
[Figure: one region in the GT image split into several regions in the segmentation]
36Evaluation of thresholding segmentation methods
- Under-segmentation
  - Regions j1, j2, ..., jm in the ground truth image correspond to region k in the segmented image if
    - At least T% of the pixels in region k are marked as pixels in the union of regions j1, j2, ..., jm
    - At least T% of the pixels in each region ji are marked as pixels in region k
37Evaluation of thresholding segmentation methods
[Figure: several regions in the GT image merged into one region in the segmentation]
38Evaluation of thresholding segmentation methods
- The metric also allows us to quantify missed and noise regions
  - Missed regions: regions in the ground truth image not found in the segmented image
  - Noise regions: regions in the segmented image not found in the ground truth image
- Overall, the average number of correct, over-segmented, under-segmented, missed and noise regions can be quantified over an image database and different algorithms compared
39Evaluation of optical flow methods
- Optical flow algorithms compute the 2D optical flow vector at each pixel using consecutive frames in a video sequence
- Optical flow algorithms are notoriously non-robust
- Crucial to evaluate the effectiveness of any method used (or any new method devised)
- Usually ground truth is difficult to come by
40Evaluation of optical flow methods
41Evaluation of optical flow methods
- This simple error measurement naturally amplifies errors when the flow vectors are large (for the same relative flow error)
- Can normalise the error by the product of the magnitudes of the ground truth flow and the flow estimate
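The effect of the normalisation can be illustrated with a small sketch. The exact error formula is not reproduced in the text, so this assumes the simple measure is the squared difference between estimated and true flow vectors, and that the normalised measure divides it by the product of the two vector magnitudes. The function name and the small constant `eps` (which guards against division by zero) are hypothetical.

```python
import numpy as np

def flow_errors(est, gt, eps=1e-12):
    """Per-pixel squared flow error and a magnitude-normalised version.

    est, gt: arrays of shape (H, W, 2) holding (u, v) flow components.
    """
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    squared = np.sum((est - gt) ** 2, axis=-1)
    # Product of the magnitudes of the estimate and the ground truth.
    norm = np.linalg.norm(est, axis=-1) * np.linalg.norm(gt, axis=-1)
    return squared, squared / (norm + eps)

gt = np.array([[[1.0, 0.0]]])
est = 1.1 * gt                                    # 10% relative error
sq_small, n_small = flow_errors(est, gt)
sq_large, n_large = flow_errors(10 * est, 10 * gt)  # same relative error
```

Scaling both fields by ten multiplies the squared error by one hundred, while the normalised error is essentially unchanged.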
42Evaluation of optical flow methods
- Often the ground truth is not available
- A useful (but often crude) way of comparing the quality of two optical flow fields is to compute the displaced frame difference (DFD) statistic
- Uses the two consecutive frames of the sequence from which the flows were computed
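One common form of the DFD can be sketched as below; it is an assumption that this matches the definition intended here. Each pixel of the first frame is compared with the pixel it maps to in the second frame, and the squared differences are averaged. Displacements are rounded to the nearest pixel and clipped at the image border, which is a simplification; the function name is hypothetical.

```python
import numpy as np

def displaced_frame_difference(frame1, frame2, flow):
    """Mean squared difference between frame1 and flow-displaced frame2.

    flow has shape (H, W, 2) with flow[..., 0] = u (x) and flow[..., 1] = v (y).
    """
    frame1 = np.asarray(frame1, dtype=float)
    frame2 = np.asarray(frame2, dtype=float)
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-pixel target coordinates, clipped to stay inside the image.
    x2 = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    diff = frame2[y2, x2] - frame1
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
frame1 = rng.random((16, 16))
frame2 = np.roll(frame1, 1, axis=1)           # scene shifted right by one pixel
true_flow = np.zeros((16, 16, 2))
true_flow[..., 0] = 1.0
zero_flow = np.zeros((16, 16, 2))
```

Of two candidate flow fields, the one with the lower DFD maps the first frame onto the second more faithfully; here `true_flow` scores lower than `zero_flow`.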
43Evaluation of optical flow methods
44Evaluation of optical flow methods
- DFD is a crude estimate because it says nothing directly about the accuracy of the motion field, just the quality of the pixel mapping from one frame to the next
- Plus it says nothing about the confidence attached to the optical flow estimates
- However, it is the basis of the motion compensation algorithms in most of the current video compression standards (MPEG, H.261, etc.)
45Evaluation of optical flow methods
- In optical flow estimation, as in other types of estimation algorithm, we are often interested in the quality of the estimates
- In classic estimation theory, we often compute confidence limits on estimates
- We can say with a certain degree of confidence (say 90%) that the parameter lies within certain bounds
- We usually assume that the quantities we are estimating follow some known probability distribution (for example chi-squared)
46Evaluation of optical flow methods
- In the case of optical flow vectors, confidence regions are ellipses in 2 dimensions
- They essentially characterise the distribution of the estimation error
- Assuming a normal distribution of the flow error, confidence ellipses can be drawn for any confidence limit
- Orientation and shape of the ellipses are determined by the covariance matrix defining the normal distribution
- The eigenvalues of the covariance matrix set the axis lengths of the ellipse for a particular confidence limit
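Under the normality assumption above, the ellipse for a given confidence level follows directly from the covariance matrix. The sketch below uses the fact that the squared Mahalanobis distance of a 2D normal error follows a chi-squared distribution with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - p). The function name and the example covariance are hypothetical.

```python
import math
import numpy as np

def confidence_ellipse(cov, confidence=0.9):
    """Return (semi-major, semi-minor, angle in radians) of the ellipse."""
    # Chi-squared quantile for 2 degrees of freedom: -2 * ln(1 - p).
    scale = -2.0 * math.log(1.0 - confidence)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    semi_axes = np.sqrt(scale * eigvals)
    major_dir = eigvecs[:, 1]                  # direction of largest variance
    angle = math.atan2(major_dir[1], major_dir[0])
    return float(semi_axes[1]), float(semi_axes[0]), angle

cov = [[4.0, 0.0], [0.0, 1.0]]                 # illustrative flow-error covariance
major, minor, angle = confidence_ellipse(cov, confidence=0.9)
```

The eigenvectors give the orientation and the eigenvalues, scaled by the quantile, give the axis lengths, so a higher confidence level enlarges the ellipse without rotating it.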
47Evaluation of optical flow methods
[Figure: confidence ellipses at the 70%, 90% and 99% levels]
48Evaluation of optical flow methods
[Figure: four panels: Yosemite; Yosemite true flow; Yosemite flow (LK); Yosemite flow (LK), confidence thresholded]
49Conclusions
- Evaluation in computer vision is a difficult and often controversial topic
- I would suggest 3 rules of thumb to consider when evaluating your work for the purposes of assignments
  - Consider your test data carefully. Make it as realistic as possible
  - Make your evaluations as application-driven as possible
  - Make your algorithms self-evaluating if possible, through the use of confidence statistics