Title: Style and Adaptation
1. Style and Adaptation
- Harsha Veeramachaneni
- Automated Reasoning Systems Division
- IRST, Trento, Italy
2. Outline
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Conclusion
3. Terminology
- The patterns in a field share a common source
4. Context
- Context: inter-pattern statistical dependence in pattern fields
- Linguistic context: class dependence between the patterns in a field (some class sequences are statistically more likely than others)
5. Style context
6. Example: multiple writers
7. Style context
- Style context: a type of inter-pattern feature dependence, i.e., within a field the renderings of the singlet patterns are not independent of one another
- Present due to a multiplicity of dissimilar sources
- Can be exploited to reduce inter-source confusions
8. Styles in other domains
9. Utilizing style context
10. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
11. Desirable properties of style-conscious classifiers
- Should be accurately trainable with readily available training data carrying source and class labels
  - But labeled training fields of all lengths are difficult to obtain
- Should approach the average intra-style accuracy asymptotically with field length
- Should be computationally feasible
12. Main assumptions
- Test fields are style-consistent, but the source identity is unknown
- The context in the class labels of the patterns is independent of the source
- Within a source, the singlet patterns in a field are independent of one another (given the field-class)
13. Style-conscious classification
[Diagram: training samples for all 10^10 field-classes train a field classifier, which chooses one of the 10^10 field-classes; example field classification result: 518 276 9999]
14. Method 2: Font identification
[Diagram: training samples for all 10 classes in every font train font-specific singlet classifiers; font recognition selects the classifier that produces the field classification result, e.g., 518 276 9999]
15. Method 3: Style-conscious quadratic field classifier
- Two singlet classes, A and B
- Two writers, Sam and Joe
- One feature per singlet
16. Terminology (cont.)
[Diagram: a test field written by one of the writers undergoes feature extraction, yielding the singlet feature vectors x1 and x2; their concatenation is the field feature vector y = (x1, x2)]
17. Field feature distributions
18. (Figure-only slide, no transcript)
19. Quadratic field classifier
Assign the field feature vector y to the field-class c that minimizes the quadratic discriminant
    d_c(y) = (y - mu_c)^T Sigma_c^{-1} (y - mu_c) + ln |Sigma_c|,
where mu_c is the field-class mean and Sigma_c is the field-class covariance matrix.
20. Means and covariance matrices
- Field-class means are singlet-class means concatenated
- The diagonal blocks of a field covariance matrix are singlet covariance matrices
- Means and covariance matrices for any field length can be constructed from a few basic blocks
21. Cross-covariance matrices
- The cross-covariance matrices depend only on the source-conditional class means
- They can therefore be estimated accurately
- Style context arises from the variation of class means across sources (see the sketch below)
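
A minimal sketch of the construction, assuming per-class singlet means mu[i], singlet covariances Sigma[i], and cross-covariance blocks C[i][j] estimated from the source-conditional class means (names, shapes, and data layout are illustrative assumptions, not from the slides):

import numpy as np

def estimate_cross_cov(per_source_means, mu, i, j):
    # C[i][j] = E_s[(m_i(s) - mu_i)(m_j(s) - mu_j)^T], averaged over
    # sources s, where m_i(s) is the class-i mean of source s.
    devs = [(m[i] - mu[i], m[j] - mu[j]) for m in per_source_means]
    return sum(np.outer(a, b) for a, b in devs) / len(devs)

def field_params(mu, Sigma, C, field_class):
    # Field-class mean: the singlet-class means concatenated.
    # Field-class covariance: singlet covariances on the diagonal,
    # cross-covariance blocks elsewhere (C holds all ordered pairs,
    # so C[j][i] == C[i][j].T keeps the matrix symmetric).
    d = len(mu[field_class[0]])
    L = len(field_class)
    m = np.concatenate([mu[i] for i in field_class])
    S = np.zeros((L * d, L * d))
    for p, i in enumerate(field_class):
        for q, j in enumerate(field_class):
            S[p*d:(p+1)*d, q*d:(q+1)*d] = Sigma[i] if p == q else C[i][j]
    return m, S

def quadratic_discriminant(y, m, S):
    # Gaussian quadratic discriminant of the field feature vector y;
    # the field-class with the smallest value wins.
    diff = y - m
    _, logdet = np.linalg.slogdet(S)
    return diff @ np.linalg.solve(S, diff) + logdet

Because only the basic blocks mu, Sigma, and C need to be stored, the parameters for a field of any length can be assembled on demand, as slide 20 states.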
22. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
23. Model for continuously distributed styles
Under such a model, our quadratic field classifier is optimal.
24. A model for continuously distributed styles
- A source is identified by its singlet class means
- The sources are Gaussian distributed
- The class means are correlated (inter-class correlation)
- The singlet covariance matrices are style-independent (formalized below)
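
In symbols (a reconstruction consistent with the bullets above; the notation is mine, not the slides'): writing \(m_i(s)\) for the class-\(i\) mean of source \(s\),

\[
\big(m_1(s), \dots, m_K(s)\big) \sim \mathcal{N}(\bar{m}, B),
\qquad
x \mid s,\ \text{class } i \sim \mathcal{N}\big(m_i(s), \Sigma_i\big),
\]

where the off-diagonal blocks of \(B\) carry the inter-class correlation of the means, and the singlet covariances \(\Sigma_i\) are the same for every style \(s\).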
25. Experimental results: handwritten numerals
- Database: NIST Special Database 19 (handwritten digits with writer labels)
- 10 samples per digit per writer
- Training set: 494 writers, 54,193 samples
- Test set: 499 new writers, 54,481 samples
- Features: 100 directional chain-code features
26. Error rates: style-conscious quadratic classifier
27. Provable Properties
- Inter-class style > intra-class style
- The asymptotic error rate (with field length) is the within-style error rate
- Order-independent classification
- The error rate as a function of field length can be bounded for a two-class problem
28. Example
29. A Breather
The hidden style variable s: shadow / no shadow
E. H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
30. Styles?
The hidden style variable s: shadow / no shadow
E. H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
31. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
32. Decision-Directed Adaptation
- Nagy and Shelton, "Self-Corrective Character Recognition System," IEEE Trans. Information Theory, April 1966
33. Decision-Directed Adaptation
[Figure: two-feature scatter plot (Feature 1 vs. Feature 2). Labeled training samples of classes A and B determine a classifier learnt on the training set; the unlabeled test set, shown as question marks, falls on both sides of its decision boundary.]
34. Decision-Directed Adaptation
- If the classifier is a minimum-distance-to-centroid classifier, decision-directed adaptation is equivalent to k-means
- Errors in the test-set labels are not uniform
- This implies that even with an infinite training set and an infinite test set from the same distribution, the classification still changes (a sketch of the loop follows)
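
A minimal sketch of that equivalence, assuming a minimum-distance-to-centroid classifier and numpy arrays (function and variable names are illustrative):

import numpy as np

def decision_directed_adaptation(train_X, train_y, test_X, n_iter=10):
    # Seed the centroids from the labeled training set, then repeat:
    # classify the test set, re-estimate the centroids from the
    # self-assigned labels. This loop is exactly k-means initialized
    # with the training centroids.
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(classes)):
            if np.any(labels == k):
                centroids[k] = test_X[labels == k].mean(axis=0)
    return classes[labels], centroids

Even when the test distribution matches the training distribution, the re-estimated centroids differ from the class-conditional means wherever the classes overlap, because each centroid averages only the patterns assigned to it; this is why the classification still changes.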
35. Adaptation to exploit style context
- We have a classifier trained on several styles
- The test set is from one (unknown) style
- The classifier adapts to the test style by estimating the style parameters from the test set and reclassifying the test set with the new parameters
36. Adaptation: class means only
- Model the test data as a mixture of Gaussians with unknown means
- Assume that the a priori class probabilities and the class-conditional covariance matrices are known
- Use the EM algorithm for maximum-likelihood (ML) estimation of the class means (sketched below)
- But the model for continuously distributed styles gives us a distribution over the class means
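
A minimal EM sketch under the slide's assumptions (known priors and covariances, unknown means; names are illustrative):

import numpy as np
from scipy.stats import multivariate_normal

def em_class_means(X, priors, covs, means_init, n_iter=20):
    # ML estimation of the component means of a Gaussian mixture,
    # holding the class priors and covariances fixed.
    means = [m.copy() for m in means_init]
    K = len(priors)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each test pattern
        lik = np.stack([priors[k] * multivariate_normal.pdf(X, means[k], covs[k])
                        for k in range(K)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted sample means
        for k in range(K):
            means[k] = resp[:, k] @ X / resp[:, k].sum()
    return means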
37. Gaussian distributed styles
38. Adaptation: class means
- We can use a Bayesian EM algorithm for maximum a posteriori (MAP) estimation of the component means of a Gaussian mixture (the M-step is sketched below)
- As the size of the test set (the field length) increases, the MAP estimates become identical to the ML estimates
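
For a single class mean with an independent Gaussian prior \(\mathcal{N}(\bar{m}_k, B_k)\), the MAP M-step takes a standard form (a simplification: the correlated-means model of slide 24 couples these updates across classes):

\[
\hat{m}_k = \left(B_k^{-1} + n_k \Sigma_k^{-1}\right)^{-1}
\left(B_k^{-1} \bar{m}_k + \Sigma_k^{-1} \sum_t r_{tk}\, x_t\right),
\qquad n_k = \sum_t r_{tk},
\]

where the \(r_{tk}\) are the E-step responsibilities. As \(n_k\) grows, the prior terms are dominated by the data terms and the update tends to the ML weighted mean, which is the convergence to ML claimed above.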
39. How does MAP adaptation accomplish style-conscious classification?
[Diagram: training samples from many different writers are used in training; MAP estimation of the class means on the test set yields a style-specific singlet classifier, which produces the classification result.]
MAP adaptation is analogous to font identification, but for continuously distributed styles.
40. Example
41. Decision boundaries in field feature space
[Figure: optimal field classifier optimized for character error rate vs. optimal field classifier optimized for field error rate]
42. Decision boundaries (cont.)
[Figure: MAP adaptive classifier (inter-class style context), MAP adaptive classifier (intra-class style context), and ML adaptive classifier]
43. Some results on the MNIST data
Error rate without adaptation: 2.12%
[Figure: error rates for adaptive classification as a function of field length]
44. Small-sample adaptation: two types of non-representative training sets
- We adapt not only when the training set is different from the test set, but also when it is small
- Semi-supervised classification
- Claim: this is no different from style adaptation
45. Small-sample adaptation
- We perform style-constrained classification of the test set under the posterior distribution over all possible problems, treated as styles
46. Conclusions
- Style context is due to the common source of the patterns in a field
- Modeling style context with second-order statistics (pairwise correlations) allows efficient estimation of the model parameters
- Style-conscious classifiers approach style-specific accuracy with increasing field length
- Adaptation is another means of exploiting style context
- MAP adaptation can be viewed as style-first recognition
- Adaptation and semi-supervised classification can be studied under the umbrella of style
47. Thank you
48. Adaptation by clustering
- Labeling based on linguistic constraints
  - Give the clusters dummy labels and solve a substitution cipher
- Labeling based on the training set
  - Label each cluster by the closest class in the training set
49. Common method for adaptation: clustering
- Cluster the patterns in the test data
- Each cluster is a different class
- Label the clusters either
  - based on linguistic constraints, or
  - based on proximity to the training set (sketched below)
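
A minimal sketch of the second option, assuming scikit-learn's KMeans (names are illustrative; note that several clusters may map to the same class):

import numpy as np
from sklearn.cluster import KMeans

def adapt_by_clustering(train_X, train_y, test_X):
    # Cluster the test data, then label each cluster by the training
    # class whose mean is closest to the cluster centre.
    classes = np.unique(train_y)
    class_means = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    km = KMeans(n_clusters=len(classes), n_init=10).fit(test_X)
    dists = np.linalg.norm(km.cluster_centers_[:, None] - class_means[None], axis=2)
    cluster_to_class = classes[dists.argmin(axis=1)]
    return cluster_to_class[km.labels_]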
50. EM for clustering
- Expectation-Maximization (EM) algorithm
  - An iterative method for parameter estimation
- The test data is modeled as a mixture of parameterized distributions
- We can use EM for mixture identification (i.e., to estimate the parameters)
- Traditionally, the EM algorithm is used to obtain the maximum-likelihood estimates of the parameters
51. Example
52. Example
53. Adaptation and Style
- What is the cause of non-representative training data for our problem?
- The training data comes from a large number of styles, while the test data comes from one style
- Adaptation is another means to exploit style consistency
54. Adaptation and Style
55. Style-conscious classification: recap
- Short fields (L = 2, 3, ...)
  - Few discrete styles (e.g., printed text in a few different fonts)
    - Multi-modal field classifier (one mode per style)
    - Style-conscious quadratic field classifier
  - Continuous styles (e.g., different writers)
    - Style-conscious quadratic field classifier
- Long fields (L = 10, 20, 30, ...)
  - Adaptation
    - ML (when styles are discrete, or when fields are long enough that ML and MAP coincide)
    - MAP
56. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
57. Error rates: ML adaptation
Singlet error rates; test fields are one test writer at a time.

Method                          Error rate (%)
No adaptation                   1.40
Mean adaptation                 0.88
Mean + covariance adaptation    0.83

Out of 499 test writers, 140 improved and 30 worsened. Maximum improvement for any writer: 20 samples; maximum worsening: 3 samples.
58. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
59. Directions for future research
- Further computational improvements to the quadratic field classifier
- Quantification of the amount of useful style
- Methods for multiple initializations of the EM algorithm
- A formal study of adaptation encompassing the EM algorithm (initialization and convergence to local optima)
- Other application areas
60. Method 1: Decision-directed adaptation
- Observation: some errors in the training set can be tolerated
- Assume the classifier trained on the training set has a low error rate
- Assume the test set is large enough
- Method: classify the test set, then retrain the classifier on the labeled test set
- Disadvantages
  - Errors in the test-set labels are not uniform
  - No proof that it will improve accuracy
  - The training set is not a fixed point
61. Bounded search algorithm
- A fast implementation of the quadratic field classifier
- Greatly reduces the number of field quadratic discriminant values that must be computed for a field class
- We show that the field discriminant value for a field-class can be bounded from below by a function of quantities computed by the quadratic singlet classifier (see the sketch below)
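
A generic branch-and-bound sketch of this idea (the actual lower bound derived in the talk is not reproduced here; lower_bound and field_discriminant are assumed callables):

def bounded_search(y, field_classes, lower_bound, field_discriminant):
    # Visit field-classes in order of increasing singlet-based lower
    # bound; the full (expensive) field discriminant is evaluated only
    # while the bound can still beat the best exact value found so far.
    candidates = sorted(field_classes, key=lambda c: lower_bound(c, y))
    best_c, best_d = None, float("inf")
    for c in candidates:
        if lower_bound(c, y) >= best_d:
            break  # every remaining candidate is bounded out
        d = field_discriminant(c, y)
        if d < best_d:
            best_c, best_d = c, d
    return best_c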
62. Bounded search algorithm
63. Styles in various classification problems
64. Experimental results: machine-printed numerals
- Machine-printed numerals in five fonts
- Training set: 250 samples/digit/font = 12,500 samples
- Test set: 250 samples/digit/font = 12,500 new samples
- Features: 64 directional chain-code features, of which 8 principal-component features were used
65. Error counts: style-conscious classification
Number of errors for the 5 test fonts (out of 2,500 samples each).
T = Times Roman, V = Verdana, A = Avant Garde, B = Bookman Old Style, H = Helvetica
66. Field reject-error trade-off: style-conscious quadratic classifier