ECSE-6963 / BMED-6961: Cell & Tissue Image Analysis
Lecture 16: Feature Selection & Validation
- Badri Roysam
- Rensselaer Polytechnic Institute, Troy, New York 12180
2. Recap: Blob Segmentation
3. Recap: Four Ideas for Blob Segmentation
- Indeed, there are lots of ideas out there!
- Idea 1: Use other algorithms instead of watershed (e.g., clustering)
- Idea 2: Touching cells have a membrane between them. If the membrane is labeled fluorescently, we have an additional cue.
- Idea 3: Touching cells often exhibit some edges. The watershed algorithm does not exploit them.
- Idea 4: If the touching cells have shapes that can be modeled, we can exploit that information to improve object separation.
4. Recap: Features of Connected Components
- Area
- Feret box, and minimum enclosing rectangle
- Diameter
- Centroid
- Convexity
- Radius, circularity
- Shape complexity
- Boundary curvature and special points
5. Recap: Combining Ideas
- When we encounter a new application, a combination of these ideas can be used
- For highest performance, object modeling is essential
- This topic continues to evolve: new ideas and novel combinations of old ones continue to emerge (e.g., use multiple object models to handle a diversity of cell types)
- Today's discussion
- Good and bad features of objects
- Performance evaluation & validation
6. The Feature Selection Problem
- The features that we have studied are only a subset of the many that can be defined
- It's fun to invent new features, but there's a caveat to consider
- If we consider too many features
- High-dimensional space: we need too many examples to estimate model parameters (Cover's inequality)
- The curse of dimensionality
- New features may not have enough additional discriminatory value
- Computationally expensive
- How much accuracy do we really need?
7. Example: Cervical Smears
8. Features of Nuclei in Cervical Smears
[Figure: feature chart, grouped into four families; recoverable feature names below]
- Absorption features: OD_mean, OD_var, OD_max, OD_int, I_norm
- Size features: Area, R_mean, R_var
- Texture features: Energy_1, Energy_2, Corr_1, Corr_2, Homog_1, Homog_2, CHomog, Entropy_1, Entropy_2, Contrast_1, Contrast_2
- Shape features: Elong, Fit, Tort, Clump, Bclump
9. Good Features
- A good feature
- Is significantly different for each class
- Has a small variance/spread within each class
- Maximizes the Fisher discriminant ratio
- Is not correlated with another feature in use
- Correlated features are redundant, and increase dimensionality without adding value
[Figure: example histograms of a bad (overlapping) and a good (well-separated) feature]
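The Fisher discriminant ratio mentioned above is the squared separation of the class means divided by the summed within-class variances. A minimal sketch (the function name and the two-class restriction are illustrative assumptions):

```python
from statistics import mean, pvariance

def fisher_ratio(a, b):
    """Fisher discriminant ratio for one feature over two classes:
    between-class separation squared over summed within-class spread."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))

# A well-separated feature scores higher than an overlapping one.
good = fisher_ratio([0, 0, 1, 1], [5, 5, 6, 6])
bad = fisher_ratio([0, 0, 1, 1], [1, 1, 2, 2])
```

Ranking candidate features by this ratio is one simple way to screen them before the hypothesis tests discussed next.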
10. Discriminating Ability
- We start by examining the discriminating power of each feature independently
- Qualitative method: clear separation of classes on a scatter plot or histogram
- Quantitative method
- Start with a LABELED scatter plot
- Define two hypotheses
- H0: The values of the feature do not differ significantly (null hypothesis)
- H1: The values of the feature differ significantly (alternative hypothesis)
- The term "significantly" is quantified by a significance level α.
11. Gaussian Basics
If we gather up enough numbers together, their average will tend to be Gaussian distributed (the central limit theorem).
The density falls rapidly: 95% of the samples fall within two standard deviations of the mean!
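The two-standard-deviation rule is easy to verify empirically. A quick sketch (sample size and seed are arbitrary choices):

```python
import random

# Draw many samples from a Gaussian and count how many fall
# within two standard deviations of the mean.
random.seed(1)
mu, sigma = 10.0, 3.0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
frac = sum(abs(s - mu) <= 2 * sigma for s in samples) / len(samples)
# frac comes out near 0.95 (the theoretical value is about 0.9545)
```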
12. Probabilities & Tables
To calculate the probability that x lies in a certain interval, we need to integrate the Gaussian. This is needed frequently in statistics. Normalize, and look up a table of integrals for N(0,1); the calculation boils down to a lookup of the standard normal CDF.
Key terms: significance level, acceptance interval.
[Figure: old-fashioned printed table of the standard normal distribution]
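In code, the printed table is replaced by the error function, which gives the standard normal CDF directly. A sketch of the normalize-then-look-up procedure (function names are my own):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) for N(0,1), via the error function (replaces the table lookup)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_in_interval(a, b, mu, sigma):
    """P(a <= x <= b) for x ~ N(mu, sigma^2), after normalizing to N(0,1)."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

p = prob_in_interval(-2, 2, 0, 1)  # probability within two standard deviations
```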
13. The Sample Mean & Variance are Random Variables
Suppose that they are Gaussian distributed, and mutually independent.
14. Hypothesis Testing
Suppose that we have just two classes for now.
15. Hypothesis Testing (continued)
[Equation: definition of the test statistic q]
16. Significance
Suppose we choose a 95% confidence (5% significance) level; the acceptance interval is then the central region of N(0,1) containing 95% of the probability, i.e. [-1.96, 1.96] for the standardized statistic.
If q falls in the above range, decide H0; else decide H1.
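Slides 14-16 can be summarized in a few lines of code. This sketch assumes the standard two-sample z statistic with known variances (the slides' lost equation is presumably this form); the function name is my own:

```python
import math

def two_sample_z(x, y, sigma_x, sigma_y, z_crit=1.96):
    """Two-sample test with known variances.
    Returns the test statistic q and the decision:
    'H0' if q falls inside the acceptance interval [-z_crit, z_crit], else 'H1'."""
    n, m = len(x), len(y)
    q = (sum(x) / n - sum(y) / m) / math.sqrt(sigma_x ** 2 / n + sigma_y ** 2 / m)
    return q, 'H0' if abs(q) <= z_crit else 'H1'
```

With z_crit = 1.96 this implements the 95% acceptance interval from the slide.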
17. Case when the Variances are Unknown
We can no longer use the Gaussian table; we need the t-distribution table instead. The lookup needs two numbers:
- Degrees of freedom (DOF)
- The equal/unequal variance assumption
In MATLAB: h = ttest2(x, y, alpha, tail, vartype), where alpha is the significance level (typically 5%).
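A Python equivalent of the unequal-variance case of ttest2 can be sketched with the standard library: compute Welch's t statistic and the Welch-Satterthwaite degrees of freedom, then compare |t| to the critical value from a t-table (the function name is my own):

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of
    freedom (the 'unequal variance' option of MATLAB's ttest2)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)
    se2 = vx / nx + vy / ny
    t = (mean(x) - mean(y)) / se2 ** 0.5
    dof = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, dof

# Decide by looking up the critical value for (dof, alpha) in a t-table;
# for large dof and alpha = 5%, the critical value is close to 1.98.
```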
18. Discriminant Functions
- A function of the features that allows us to discriminate between classes
- A generalization of likelihood ratios and thresholds
- Linear discriminant: the sign of the discriminant tells us the decision
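A linear discriminant has the form g(x) = w·x + b; classification is just the sign test described above. A minimal sketch, assuming the weights w and offset b have already been obtained from training (the function name and the 1/2 class labels are illustrative):

```python
def classify(x, w, b):
    """Linear discriminant g(x) = w.x + b; the sign of g gives the decision."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if g > 0 else 2  # class 1 if positive, else class 2
```

For example, classify([2.0, 0.0], [1.0, -1.0], -1.0) evaluates g = 1 > 0 and returns class 1.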
19. The Next Step
- The features that pass the individual hypothesis tests could still have correlations among themselves
- Correlation implies redundancy, and wasted dimensions
- Procedure (forward selection)
- Pick the single best feature
- Try all remaining features one at a time, and add the one that gives the best improvement
- Repeat until the last added feature does not add enough improvement to justify an extra dimension
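The forward-selection procedure above can be sketched generically. The score function is an assumption standing in for any figure of merit (e.g., cross-validated classification accuracy), and min_gain encodes "enough improvement to justify an extra dimension":

```python
def forward_select(features, score, min_gain=0.01):
    """Greedy forward selection: repeatedly add the feature giving the best
    improvement in score(subset); stop when the gain is too small."""
    chosen, best = [], 0.0
    remaining = list(features)
    while remaining:
        gains = {f: score(chosen + [f]) for f in remaining}
        f_best = max(gains, key=gains.get)
        if gains[f_best] - best < min_gain:
            break  # last feature does not justify an extra dimension
        chosen.append(f_best)
        best = gains[f_best]
        remaining.remove(f_best)
    return chosen
```

With a toy score that rewards two specific features, the sketch picks exactly those two and then stops.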
20. Stepwise Discriminant Analysis
- We can come up with a selection method that goes the other way (backward elimination)
- Start off with all features
- Remove one feature at a time
- Continue while performance is still acceptable
- Stepwise discriminant analysis is a method that combines the top-down and bottom-up approaches
- Generally, not worth writing our own code; better to use commercial packages
- The above approaches are still sub-optimal
- THE VERY BEST approach is to exhaustively consider all subsets of features and pick the best one. This is very expensive: d features give 2^d - 1 non-empty subsets to evaluate.
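The combinatorial cost of the exhaustive approach is easy to make concrete (the function name is my own):

```python
from itertools import combinations

def all_subsets(features):
    """Enumerate every non-empty subset of the features: 2^d - 1 of them."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)

# Even a modest 20-feature set already forces over a million evaluations.
n = sum(1 for _ in all_subsets(range(20)))  # n == 2**20 - 1
```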
21. How do we Test our Model?
Why bother? Because our model should hold up over images that we haven't processed yet!
[Diagram: from a batch of images, one subset is selected for feature computation and labeling; the remainder are to be processed automatically]
Select a subset of features, build a discriminant based on them, and evaluate its effectiveness over the remaining images.
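This train/hold-out split of the batch can be sketched in a few lines (the function name, the 30% training fraction, and the fixed seed are illustrative assumptions):

```python
import random

def holdout_split(images, train_frac=0.3, seed=0):
    """Split a batch of images: one labeled subset for feature computation
    and discriminant building, the rest held out for automated processing."""
    imgs = list(images)
    random.Random(seed).shuffle(imgs)  # seeded for reproducibility
    k = max(1, int(train_frac * len(imgs)))
    return imgs[:k], imgs[k:]  # (training images, held-out images)
```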
22. Features of Nuclei in Cervical Smears
[Figure: the same feature chart as in Slide 8, repeated for reference]
23. Feature Sets Compared
Six feature sets were compared: 2-D and 3-D features, each computed on raw data, nearest-neighbor deblurred data, and Wiener-filter deblurred data.
24. Classification Results with a Linear Discriminant Classifier
[Bar chart: percent correct (y-axis, 78-86%) versus features used (x-axis), comparing 2-D and 3-D feature sets on nearest-neighbor and Wiener-filter deblurred data]
25. Stepwise Linear Discriminant Analysis Results

Rank | 2-D    | 2-D Nearest | 2-D Wiener | 3-D     | 3-D Nearest | 3-D Wiener
-----|--------|-------------|------------|---------|-------------|-----------
1    | R_mean | R_mean      | I_norm     | R_mean  | R_mean      | I_norm
2    | Corr_2 | Corr_2      | OD_var     | Corr_2  | Corr_2      | OD_int
3    | Clump  | Clump       | R_mean     | Homog_1 | CHomog      | OD_mean
4    | CHomog | Homog_2     | Entropy_1  | CHomog  | Clump       | CHomog

Moral: the relative importance of features can be affected by pre-processing.
26. Validation and Performance Assessment
- Validation
- Is the software system's output valid?
- Essential to adoption by biologists/clinicians
- Performance assessment
- Exactly how well is the software working?
- Surprisingly tricky issue, given the subjectivity and variability of people
- Inter-subject variability
- Intra-subject variability
27. Testing Against a Consensus
- Ask multiple human observers to manually analyze the image
- From scratch, or
- By editing the machine output
- Convene a meeting of the human observers
- Discuss differences of opinion on each cell
- Develop a single consensus opinion
- This becomes the "gold standard"
- Compare the software output against the gold standard, and measure concordance
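The slides do not name a specific concordance measure; one common choice for comparing a segmentation against a gold standard is the Dice coefficient over pixel sets (the function name and set-of-coordinates representation are assumptions):

```python
def dice(a, b):
    """Dice concordance between two segmentations of one object,
    each given as a set of pixel coordinates: 2|A&B| / (|A|+|B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))
```

A value of 1.0 means perfect agreement with the gold standard; values near 0 indicate a miss or false segmentation.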
28. Classical Multiple-Observer Validation
[Diagram: the image is segmented manually by observers 1 through N and also by the automated method; the manual results feed a consensus-building step, and a quantitative comparison of the consensus against the automated segmentation yields measure(s) of automated segmentation performance]
Appropriate for validating novel algorithms.
29. Things that Commonly Go Wrong
- Poor data quality
- Damaged specimen
- Mis-shapen objects
- Fragments
- Poor image quality
- Noise
- Spectral bleed-through
- Partially-imaged nuclei
- Types of segmentation errors
- Miss
- Inaccurate boundary
- False segmentation
- Under-segmentation
- Over-segmentation
- Separation errors
Go back to the microscope if at all possible
30. Handling Partial Objects
- Usually, partial objects need to be deleted based on their features
- Location (close to the border)
- Size (less than the modeled value)
- Brick rule
- Define an interior sub-volume ("brick") in the image
- Only accept cells that are wholly contained in it
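The brick rule is a simple containment test. A sketch, assuming each cell is a list of (z, y, x) voxel coordinates and the brick is inset by a fixed margin on every axis (the function name and data layout are illustrative):

```python
def brick_rule(cells, shape, margin):
    """Keep only cells wholly contained in an interior sub-volume ('brick')
    inset `margin` voxels from every border of an image of size `shape`."""
    lo = [margin] * 3
    hi = [s - margin for s in shape]

    def inside(cell):
        # every voxel of the cell must lie strictly inside the brick
        return all(lo[d] <= v < hi[d] for voxel in cell for d, v in enumerate(voxel))

    return [c for c in cells if inside(c)]
```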
31. Outlier Detection
Outliers are good candidates for further inspection.
32. Color Codes for Highlighting Errors
Any measure of the quality of fit to the object model, p(X), can serve as a tool for highlighting errors: red = potentially awful; yellow = questionable; green = okay.
33. Explanatory Display Coding
- Make it easy for the user to separate unhandled errors from handled errors
- One idea is to attach mini explanation codes to each object
- Display detailed explanations when a user clicks on a cell or rests the mouse over it
- Keep a record trail of all operations that led to each object
34. Object Separation Error Example
[Figure: gallery view indicating 3 objects; a split error, and the correctly split cells]
35. Editing the Output
Typical editing operations: add object, split/merge, dilate/shrink.
36. Edit-Based Validation Protocol
[Diagram: the image is segmented automatically; observer 1 inspects and edits the result, followed by a supervisory inspect & edit, yielding a verified & corrected segmentation; a record of edits is kept at each stage, and statistics computed on the recorded edit operations provide measure(s) of automated segmentation performance]
Edits not made are implicitly interpreted as correct results.
- Much less effort compared to multi-observer validation
- Subtly different from multi-observer validation, but a good approximation
- Appropriate for mature algorithms in routine usage
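The "statistics on recorded edit operations" step can be sketched directly: an overall edit rate plus counts per edit type (the function name and the edit-type strings are illustrative assumptions):

```python
from collections import Counter

def edit_statistics(edit_log, n_objects):
    """Summarize a recorded edit trail.
    Returns the edit rate (edits per segmented object, a direct performance
    indicator) and counts per edit type (a guide for software revision)."""
    counts = Counter(edit_log)
    rate = len(edit_log) / n_objects if n_objects else 0.0
    return rate, counts
```

For example, a log of ['split', 'split', 'merge'] over 100 objects gives a 3% edit rate, with split errors as the dominant type.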
37. Algorithm to Add an Object
- Seeded region growing
- The user clicks on a point on the object to be added
- Initialize a connected component with this point
- Examine each neighboring pixel
- If its intensity is within X of the starting point, include it in the connected component (the tolerance X is set by the user)
- Other criteria can be included
- Stop when there are no more points to add
- Flexibility in designing stopping criteria!
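The steps above can be sketched as a breadth-first flood fill. This is a minimal version assuming a 2-D image stored as nested lists and 4-connectivity (the slide does not specify the connectivity or dimensionality); the intensity test against the seed point is the only growth criterion here:

```python
from collections import deque

def seeded_region_grow(image, seed, tol):
    """Seeded region growing on a 2-D intensity image (list of lists).
    Starts from the user-clicked `seed` (row, col), adds 4-connected
    neighbors whose intensity is within `tol` of the seed intensity,
    and stops when no more points can be added."""
    rows, cols = len(image), len(image[0])
    start = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:  # stop when there are no more points to add
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - start) <= tol):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

Other criteria (edge strength, shape constraints) can be added inside the neighbor test, which is where the flexibility in stopping criteria comes from.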
38. The Need to Record Edits
- Often, cell segmentation is performed in a pharmaceutical and/or legal setting
- Much is at stake; we need protection from cheating and carelessness!
- The Food and Drug Administration has laws and guidelines, generally called Good Laboratory Practices (GLP)
- Bottom line
- When an edit is made, save the original data, and allow rollback (undo)
- Record the time stamp, the identity of the person making the edit, and an explanatory note for the inspector
39. Edits are Valuable!
- The edit rate is a direct indicator of software performance
- Basis for edit-based validation
- The types of edits indicate the most common types of errors being made by the software
- Basis for software revision
- They also indicate the kinds of images and objects for which errors are occurring
- Sometimes, a good basis for improving specimen preparation and imaging steps
40. Summary
- The feature selection problem
- Need to select a few really good features
- The curse of dimensionality
- Multiple-observer validation and performance assessment
- Technical dimension
- Human dimension
- Legal dimension
41. Instructor Contact Information
- Badri Roysam
- Professor of Electrical, Computer, & Systems Engineering
- Office: JEC 7010
- Rensselaer Polytechnic Institute
- 110 8th Street, Troy, New York 12180
- Phone: (518) 276-8067
- Fax: (518) 276-8715
- Email: roysam@ecse.rpi.edu
- Website: http://www.ecse.rpi.edu/roysam
- Course website: http://www.ecse.rpi.edu/roysam/CTIA
- Secretary: Laraine Michaelides, JEC 7012, (518) 276-8525, michal@.rpi.edu
- Grader: Piyushee Jha (jhap@rpi.edu)
Center for Sub-Surface Imaging & Sensing