Title: Revisiting Output Coding for Sequential Supervised Learning
1. Revisiting Output Coding for Sequential Supervised Learning
- Guohua Hao, Alan Fern
- School of Electrical Engineering and Computer Science, Oregon State University
- Corvallis, OR, U.S.A.
2. Scalability in CRF Training
- Linear-chain CRF model
- Inference in training:
  - Partition function: forward-backward algorithm
  - Maximizing over label sequences: Viterbi algorithm
  - Both cost O(T·L²) per pass, for sequence length T and label-set size L
- Repeated inference in training is computationally demanding
- Cannot scale to large label sets
[Figure: linear-chain CRF — label nodes y_{t-1}, y_t, y_{t+1} over observation nodes x_{t-1}, x_t, x_{t+1}]
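To make the cost concrete, here is a minimal Viterbi sketch over a linear chain (a toy scoring interface, not the talk's model): the nested maximization over label pairs is what gives the O(T·L²) cost that is paid again on every training iteration.

```python
import numpy as np

def viterbi(obs_scores, trans_scores):
    """Most-likely label sequence for a linear-chain model.

    obs_scores:   (T, L) log-potentials for each position/label
    trans_scores: (L, L) log-potentials for label transitions
    The L x L candidate matrix per step makes the cost O(T * L^2),
    quadratic in label-set size L -- why large label sets are expensive.
    """
    T, L = obs_scores.shape
    score = obs_scores[0].copy()          # best score ending in each label
    back = np.zeros((T, L), dtype=int)    # backpointers
    for t in range(1, T):
        cand = score[:, None] + trans_scores   # L x L candidate scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + obs_scores[t]
    # Recover the best path by following backpointers.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The forward-backward pass for the partition function has the same shape: the per-step max over label pairs becomes a sum, so the quadratic dependence on L is identical.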
3. Recent Work of Focus
- Sequential Error-Correcting Output Coding (SECOC)
- Error-Correcting Output Coding (ECOC)
4. [Figure: SECOC — the multiclass label chain y_{t-1}, y_t, y_{t+1} over observations x_{t-1}, x_t, x_{t+1} is replaced by n binary chains, y_t^1 through y_t^n, one per code-word bit]
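ECOC reduces a k-class problem to a set of binary problems through a code matrix, and SECOC applies the same reduction at every time step. A minimal sketch of encoding and Hamming decoding, using a toy 4-label, 7-bit code matrix (illustrative only, not the codes used in this work):

```python
import numpy as np

# Toy code matrix: one row (code word) per label, one column per binary
# subproblem. Minimum row distance is 4, so one flipped bit still decodes.
CODE = np.array([
    [1, 1, 1, 1, 1, 1, 1],   # label 0
    [0, 0, 0, 0, 1, 1, 1],   # label 1
    [0, 0, 1, 1, 0, 0, 1],   # label 2
    [0, 1, 0, 1, 0, 1, 0],   # label 3
])

def encode(label):
    """Map a multiclass label to its binary code word (one bit per chain)."""
    return CODE[label]

def decode(bits):
    """Pick the label whose code word is closest in Hamming distance.
    Each bit comes from an independently trained binary classifier, so
    a few bit errors can still decode to the correct label."""
    dists = (CODE != np.asarray(bits)).sum(axis=1)
    return int(dists.argmin())
```

In SECOC, each column of the matrix defines one binary label sequence, and a separate binary CRF is trained per column; decoding is applied position by position.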
5. Representational Capacity of SECOC
- Intuitively, training each binary CRF independently should not be able to capture rich transition structure
- Counter-example to independent training
- Our hypothesis: when the transition structure is critical, independent training will not do as well
| Y     | 1 2 3 1 2 3 1 |
| b1(Y) | 1 0 0 1 0 0 1 |
| b2(Y) | 0 1 0 0 1 0 0 |
| b3(Y) | 0 0 1 0 0 1 0 |
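The counter-example can be checked mechanically. A small sketch, assuming the bit functions shown above are label indicators, b_k(y) = [y = k]:

```python
# Counter-example label sequence and indicator bit functions b_k(y) = [y == k].
Y = [1, 2, 3, 1, 2, 3, 1]

def bits(k, seq):
    """Binary sequence for bit chain k: 1 where the label equals k."""
    return [1 if y == k else 0 for y in seq]

b1, b2, b3 = bits(1, Y), bits(2, Y), bits(3, Y)

# Jointly the three bit sequences determine Y exactly...
decoded = [1 * b1[t] + 2 * b2[t] + 3 * b3[t] for t in range(len(Y))]
assert decoded == Y
# ...but a binary CRF trained on b_k alone never sees the other bit
# chains, so it cannot exploit the deterministic joint transition
# structure 1 -> 2 -> 3 -> 1.
```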
6. Our Method: Cascaded SECOC
- Feed previous binary predictions into later binary CRFs to help capture the transition structure
- For problems where a transition model is critical, we hope to see cascade training outperform independent training
- For problems where the observation model is more informative but the sliding window is small, cascade features can still help; a large sliding window will dominate the effect of cascade training
[Figure: cascade architecture — previous binary predictions feed into the current binary chain]
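A schematic of the cascade training loop, under an assumed learner interface (the `train_binary_crf` callable below is a hypothetical stand-in for the base CRF learners, GTB or VP, not the paper's API):

```python
def train_cascaded_secoc(train_binary_crf, X, Y_bits, history):
    """Schematic of cascaded SECOC training (interface hypothetical).

    train_binary_crf(features, bits) -> predictor stands in for the base
    CRF learner. X holds one feature tuple per sequence position;
    Y_bits[k] is the 0/1 target sequence for bit k of the code words.
    Bit chains are trained in order, and each chain sees the predictions
    of the previous `history` chains as extra features -- the cascade.
    """
    predictors, predictions = [], []
    for target in Y_bits:
        prev = predictions[-history:] if history > 0 else []
        feats = [list(X[t]) + [p[t] for p in prev] for t in range(len(X))]
        f = train_binary_crf(feats, target)
        predictors.append(f)
        predictions.append([f(ft) for ft in feats])
    return predictors
```

Setting `history = 0` recovers independent SECOC; larger `history` lets later chains condition on more of the code word, at the cost of compounding any prediction errors made by earlier chains.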
7. Experimental Results
- Synthetic data sets
  - Generated by an HMM
  - Transition data set
  - Both data set
- Base CRF training algorithms
  - Gradient Tree Boosting (GTB)
  - Voted Perceptron (VP)
- Methods for comparison
  - iid: non-sequential ECOC
  - i-SECOC: independent SECOC
  - c-SECOC(h): cascaded SECOC with history length h
  - Beam search
8. Nettalk Data Set (134 labels)
9. Noun Phrase Chunking (NPC) (121 labels)
- Synthetic Data Sets (40 labels)
10. [Figure: results]
11. Summary
- i-SECOC can perform poorly when explicitly capturing complex transition models is critical
- c-SECOC can improve accuracy in such situations by using cascade features
- Performance of c-SECOC can depend strongly on the base CRF algorithm; algorithms capable of capturing complex (non-linear) feature interactions are preferred
- When using less powerful base CRF learning algorithms, other approaches (e.g., beam search) can outperform c-SECOC
12. Future Directions
- Efficient validation procedure for selecting cascade history length
- Incremental generation of code words
- Wide comparison of methods for dealing with large label sets
Acknowledgements
We thank John Langford for discussion of the counter-example to independent SECOC and Thomas Dietterich for his support. This work was supported by NSF grant IIS-0307592.