Title: Style and Adaptation
1. Style and Adaptation
- Harsha Veeramachaneni
- Automated Reasoning Systems Division
- IRST, Trento, Italy
2. Outline
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Conclusion
3. Terminology
- The patterns in a field share a common source
4. Context
- Context: inter-pattern statistical dependence in pattern fields
- Linguistic context: class dependence between the patterns in a field (some class sequences are statistically more likely than others)
5. Style context
6. Example: multiple writers
7. Style context
- Style context: a type of inter-pattern feature dependence, i.e., within a field the renderings of the singlet patterns are not independent of one another
- Present due to a multiplicity of dissimilar sources
- Can be exploited to reduce inter-source confusions
8. Styles in other domains
9. Utilizing style context
10. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
11. Desirable properties of style-conscious classifiers
- Should be accurately trainable with readily available training data carrying source and class labels
  - But labeled training fields of all lengths are difficult to obtain
- Should approach the average intra-style accuracy asymptotically with field length
- Should be computationally feasible
12. Main assumptions
- Test fields are style-consistent, but the source identity is unknown
- The context in the class labels of the patterns is independent of the source
- Within a source, the singlet patterns in a field are independent of one another (given the field-class)
13. Style-conscious classification
[Diagram: training samples for all 10^10 field-classes train a field classifier, which chooses one of the 10^10 field-classes; example field classification result: 518 276 9999]
14. Method 2: Font identification
[Diagram: training samples for all 10 classes in every font train font-specific singlet classifiers; font recognition selects the classifier that produces the field classification result, e.g., 518 276 9999]
15. Method 3: Style-conscious quadratic field classifier
- Two singlet classes, A and B
- Two writers, Sam and Joe
- One feature per singlet
16. Terminology (cont.)
[Diagram: a test field written by one of the writers undergoes feature extraction, yielding the singlet feature vectors x1 and x2; their concatenation is the field feature vector y = (x1, x2)]
17. Field feature distributions
18. (Figure-only slide, no transcript)
19. Quadratic field classifier
Assign the field feature vector y to the field-class c that minimizes the quadratic discriminant
    d_c(y) = (y - mu_c)^T Sigma_c^{-1} (y - mu_c) + ln |Sigma_c|,
where mu_c is the field-class mean and Sigma_c is the field-class covariance matrix.
20. Means and covariance matrices
- Field-class means are singlet-class means concatenated
- The diagonal blocks of a field covariance matrix are singlet covariance matrices
- Means and covariance matrices for any field length can be constructed from a few basic blocks
21. Cross-covariance matrices
- The cross-covariance matrices depend only on the source-conditional class means
- They can therefore be estimated accurately
- Style context arises from the variation of class means across sources (see the sketch below)
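
A minimal sketch of the construction, assuming per-class singlet means mu[i], singlet covariances Sigma[i], and cross-covariance blocks C[i][j] estimated from the source-conditional class means (names, shapes, and data layout are illustrative assumptions, not from the slides):

import numpy as np

def estimate_cross_cov(per_source_means, mu, i, j):
    # C[i][j] = E_s[(m_i(s) - mu_i)(m_j(s) - mu_j)^T], averaged over
    # sources s, where m_i(s) is the class-i mean of source s.
    devs = [(m[i] - mu[i], m[j] - mu[j]) for m in per_source_means]
    return sum(np.outer(a, b) for a, b in devs) / len(devs)

def field_params(mu, Sigma, C, field_class):
    # Field-class mean: the singlet-class means concatenated.
    # Field-class covariance: singlet covariances on the diagonal,
    # cross-covariance blocks elsewhere (C holds all ordered pairs,
    # so C[j][i] == C[i][j].T keeps the matrix symmetric).
    d = len(mu[field_class[0]])
    L = len(field_class)
    m = np.concatenate([mu[i] for i in field_class])
    S = np.zeros((L * d, L * d))
    for p, i in enumerate(field_class):
        for q, j in enumerate(field_class):
            S[p*d:(p+1)*d, q*d:(q+1)*d] = Sigma[i] if p == q else C[i][j]
    return m, S

def quadratic_discriminant(y, m, S):
    # Gaussian quadratic discriminant of the field feature vector y;
    # the field-class with the smallest value wins.
    diff = y - m
    _, logdet = np.linalg.slogdet(S)
    return diff @ np.linalg.solve(S, diff) + logdet

Because only the basic blocks mu, Sigma, and C need to be stored, the parameters for a field of any length can be assembled on demand, as slide 20 states.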
22. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
23. Model for continuously distributed styles
Under such a model, our quadratic field classifier is optimal.
24. A model for continuously distributed styles
- A source is identified by its singlet class means
- The sources are Gaussian distributed
- The class means are correlated (inter-class correlation)
- The singlet covariance matrices are style-independent (formalized below)
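
In symbols (a reconstruction consistent with the bullets above; the notation is mine, not the slides'): writing \(m_i(s)\) for the class-\(i\) mean of source \(s\),

\[
\big(m_1(s), \dots, m_K(s)\big) \sim \mathcal{N}(\bar{m}, B),
\qquad
x \mid s,\ \text{class } i \sim \mathcal{N}\big(m_i(s), \Sigma_i\big),
\]

where the off-diagonal blocks of \(B\) carry the inter-class correlation of the means, and the singlet covariances \(\Sigma_i\) are the same for every style \(s\).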
25. Experimental results: handwritten numerals
- Database: NIST Special Database 19 (handwritten digits with writer labels)
- 10 samples per digit per writer
- Training set: 494 writers, 54,193 samples
- Test set: 499 new writers, 54,481 samples
- Features: 100 directional chain-code features
26. Error rates: style-conscious quadratic classifier
27. Provable Properties
- Inter-class style > intra-class style
- The asymptotic error rate (with field length) is the within-style error rate
- Order-independent classification
- The error rate as a function of field length can be bounded for a two-class problem
28. Example
29. A Breather
The hidden style variable s: shadow / no shadow
E. H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
30. Styles?
The hidden style variable s: shadow / no shadow
E. H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
31. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
32. Decision-Directed Adaptation
- Nagy and Shelton, "Self-Corrective Character Recognition System," IEEE Trans. Information Theory, April 1966
33. Decision-Directed Adaptation
[Figure: two-feature scatter plot (Feature 1 vs. Feature 2). Labeled training samples of classes A and B determine a classifier learnt on the training set; the unlabeled test set, shown as question marks, falls on both sides of its decision boundary.]
34. Decision-Directed Adaptation
- If the classifier is a minimum-distance-to-centroid classifier, decision-directed adaptation is equivalent to k-means
- Errors in the test-set labels are not uniform
- This implies that even with an infinite training set and an infinite test set from the same distribution, the classification still changes (a sketch of the loop follows)
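
A minimal sketch of that equivalence, assuming a minimum-distance-to-centroid classifier and numpy arrays (function and variable names are illustrative):

import numpy as np

def decision_directed_adaptation(train_X, train_y, test_X, n_iter=10):
    # Seed the centroids from the labeled training set, then repeat:
    # classify the test set, re-estimate the centroids from the
    # self-assigned labels. This loop is exactly k-means initialized
    # with the training centroids.
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(classes)):
            if np.any(labels == k):
                centroids[k] = test_X[labels == k].mean(axis=0)
    return classes[labels], centroids

Even when the test distribution matches the training distribution, the re-estimated centroids differ from the class-conditional means wherever the classes overlap, because each centroid averages only the patterns assigned to it; this is why the classification still changes.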
35. Adaptation to exploit style context
- We have a classifier trained on several styles
- The test set is from one (unknown) style
- The classifier adapts to the test style by estimating the style parameters from the test set and reclassifying the test set with the new parameters
36. Adaptation: class means only
- Model the test data as a mixture of Gaussians with unknown means
- Assume that the a priori class probabilities and the class-conditional covariance matrices are known
- Use the EM algorithm for maximum-likelihood (ML) estimation of the class means (sketched below)
- But the model for continuously distributed styles gives us a distribution over the class means
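
A minimal EM sketch under the slide's assumptions (known priors and covariances, unknown means; names are illustrative):

import numpy as np
from scipy.stats import multivariate_normal

def em_class_means(X, priors, covs, means_init, n_iter=20):
    # ML estimation of the component means of a Gaussian mixture,
    # holding the class priors and covariances fixed.
    means = [m.copy() for m in means_init]
    K = len(priors)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each test pattern
        lik = np.stack([priors[k] * multivariate_normal.pdf(X, means[k], covs[k])
                        for k in range(K)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted sample means
        for k in range(K):
            means[k] = resp[:, k] @ X / resp[:, k].sum()
    return means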
37. Gaussian distributed styles
38. Adaptation: class means
- We can use a Bayesian EM algorithm for maximum a posteriori (MAP) estimation of the component means of a Gaussian mixture (the M-step is sketched below)
- As the size of the test set (the field length) increases, the MAP estimates become identical to the ML estimates
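
For a single class mean with an independent Gaussian prior \(\mathcal{N}(\bar{m}_k, B_k)\), the MAP M-step takes a standard form (a simplification: the correlated-means model of slide 24 couples these updates across classes):

\[
\hat{m}_k = \left(B_k^{-1} + n_k \Sigma_k^{-1}\right)^{-1}
\left(B_k^{-1} \bar{m}_k + \Sigma_k^{-1} \sum_t r_{tk}\, x_t\right),
\qquad n_k = \sum_t r_{tk},
\]

where the \(r_{tk}\) are the E-step responsibilities. As \(n_k\) grows, the prior terms are dominated by the data terms and the update tends to the ML weighted mean, which is the convergence to ML claimed above.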
39. How does MAP adaptation accomplish style-conscious classification?
[Diagram: training samples from many different writers are used in training; MAP estimation of the class means on the test set yields a style-specific singlet classifier, which produces the classification result.]
MAP adaptation is analogous to font identification, but for continuously distributed styles.
40. Example
41. Decision boundaries in field feature space
[Figure: optimal field classifier optimized for character error rate vs. optimal field classifier optimized for field error rate]
42. Decision boundaries (cont.)
[Figure: MAP adaptive classifier (inter-class style context), MAP adaptive classifier (intra-class style context), and ML adaptive classifier]
43. Some results on the MNIST data
Error rate without adaptation: 2.12%
[Figure: error rates for adaptive classification as a function of field length]
44. Small-sample adaptation: two types of non-representative training sets
- We adapt not only when the training set is different from the test set, but also when it is small
- Semi-supervised classification
- Claim: this is no different from style adaptation
45. Small-sample adaptation
- We perform style-constrained classification of the test set under the posterior distribution over all possible problems, treated as styles
46. Conclusions
- Style context is due to the common source of the patterns in a field
- Modeling style context with second-order statistics (pairwise correlations) allows efficient estimation of the model parameters
- Style-conscious classifiers approach style-specific accuracy with increasing field length
- Adaptation is another means of exploiting style context
- MAP adaptation can be viewed as style-first recognition
- Adaptation and semi-supervised classification can be studied under the umbrella of style
47. Thank you
48. Adaptation by clustering
- Labeling based on linguistic constraints
  - Give the clusters dummy labels and solve a substitution cipher
- Labeling based on the training set
  - Label each cluster by the closest class in the training set
49. Common method for adaptation: clustering
- Cluster the patterns in the test data
- Each cluster is a different class
- Label the clusters either
  - based on linguistic constraints, or
  - based on proximity to the training set (sketched below)
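
A minimal sketch of the second option, assuming scikit-learn's KMeans (names are illustrative; note that several clusters may map to the same class):

import numpy as np
from sklearn.cluster import KMeans

def adapt_by_clustering(train_X, train_y, test_X):
    # Cluster the test data, then label each cluster by the training
    # class whose mean is closest to the cluster centre.
    classes = np.unique(train_y)
    class_means = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    km = KMeans(n_clusters=len(classes), n_init=10).fit(test_X)
    dists = np.linalg.norm(km.cluster_centers_[:, None] - class_means[None], axis=2)
    cluster_to_class = classes[dists.argmin(axis=1)]
    return cluster_to_class[km.labels_]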
50. EM for clustering
- Expectation-Maximization (EM) algorithm
  - An iterative method for parameter estimation
- The test data is modeled as a mixture of parameterized distributions
- We can use EM for mixture identification (i.e., to estimate the parameters)
- Traditionally, the EM algorithm is used to obtain the maximum-likelihood estimates of the parameters
51. Example
52. Example
53. Adaptation and Style
- What is the cause of non-representative training data for our problem?
- The training data comes from a large number of styles, while the test data comes from one style
- Adaptation is another means to exploit style consistency
54. Adaptation and Style
55. Style-conscious classification: recap
- Short fields (L = 2, 3, ...)
  - Few discrete styles (e.g., printed text in a few different fonts)
    - Multi-modal field classifier (one mode per style)
    - Style-conscious quadratic field classifier
  - Continuous styles (e.g., different writers)
    - Style-conscious quadratic field classifier
- Long fields (L = 10, 20, 30, ...)
  - Adaptation
    - ML (when styles are discrete, or when fields are long enough that ML and MAP coincide)
    - MAP
56. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
57. Error rates: ML adaptation
Singlet error rates; test fields are one test writer at a time.

Method                          Error rate (%)
No adaptation                   1.40
Mean adaptation                 0.88
Mean + covariance adaptation    0.83

Out of 499 test writers, 140 improved and 30 worsened. Maximum improvement for any writer: 20 samples; maximum worsening: 3 samples.
58. Next
- Introduction
- Style-conscious field classification
- Model for continuously distributed styles
- Adaptation
- Application
- Conclusion
59. Directions for future research
- Further computational improvements to the quadratic field classifier
- Quantification of the amount of useful style
- Methods for multiple initializations of the EM algorithm
- A formal study of adaptation encompassing the EM algorithm (initialization and convergence to local optima)
- Other application areas
60. Method 1: Decision-directed adaptation
- Observation: some errors in the training set can be tolerated
- Assume the classifier trained on the training set has a low error rate
- Assume the test set is large enough
- Method: classify the test set, then retrain the classifier on the labeled test set
- Disadvantages
  - Errors in the test-set labels are not uniform
  - No proof that it will improve accuracy
  - The training set is not a fixed point
61. Bounded search algorithm
- A fast implementation of the quadratic field classifier
- Greatly reduces the number of field quadratic discriminant values that must be computed for a field class
- We show that the field discriminant value for a field-class can be bounded from below by a function of quantities computed by the quadratic singlet classifier (see the sketch below)
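
A generic branch-and-bound sketch of this idea (the actual lower bound derived in the talk is not reproduced here; lower_bound and field_discriminant are assumed callables):

def bounded_search(y, field_classes, lower_bound, field_discriminant):
    # Visit field-classes in order of increasing singlet-based lower
    # bound; the full (expensive) field discriminant is evaluated only
    # while the bound can still beat the best exact value found so far.
    candidates = sorted(field_classes, key=lambda c: lower_bound(c, y))
    best_c, best_d = None, float("inf")
    for c in candidates:
        if lower_bound(c, y) >= best_d:
            break  # every remaining candidate is bounded out
        d = field_discriminant(c, y)
        if d < best_d:
            best_c, best_d = c, d
    return best_c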
62. Bounded search algorithm
63. Styles in various classification problems
64. Experimental results: machine-printed numerals
- Machine-printed numerals in five fonts
- Training set: 250 samples/digit/font = 12,500 samples
- Test set: 250 samples/digit/font = 12,500 new samples
- Features: 64 directional chain-code features, of which 8 principal-component features were used
65. Error counts: style-conscious classification
Number of errors for the 5 test fonts (out of 2,500 samples each).
T = Times Roman, V = Verdana, A = Avant Garde, B = Bookman Old Style, H = Helvetica
66. Field reject-error trade-off: style-conscious quadratic classifier