Title: Revisiting Output Coding for Sequential Supervised Learning
1. Revisiting Output Coding for Sequential Supervised Learning
- Guohua Hao, Alan Fern
- School of Electrical Engineering and Computer Science, Oregon State University
- Corvallis, OR, U.S.A.
2. Scalability in CRF Training
- Linear-chain CRF model
- Inference in training:
  - Partition function: forward-backward algorithm
  - Maximizing over label sequences: Viterbi algorithm
  - Both cost O(T·L²) per pass, for sequence length T and label-set size L
- Repeated inference in training is computationally demanding
- Cannot scale to large label sets
[Figure: linear-chain CRF — label nodes y_{t-1}, y_t, y_{t+1} over observation nodes x_{t-1}, x_t, x_{t+1}]
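To make the cost concrete, here is a minimal Viterbi sketch over a linear chain (a toy scoring interface, not the talk's model): the nested maximization over label pairs is what gives the O(T·L²) cost that is paid again on every training iteration.

```python
import numpy as np

def viterbi(obs_scores, trans_scores):
    """Most-likely label sequence for a linear-chain model.

    obs_scores:   (T, L) log-potentials for each position/label
    trans_scores: (L, L) log-potentials for label transitions
    The L x L candidate matrix per step makes the cost O(T * L^2),
    quadratic in label-set size L -- why large label sets are expensive.
    """
    T, L = obs_scores.shape
    score = obs_scores[0].copy()          # best score ending in each label
    back = np.zeros((T, L), dtype=int)    # backpointers
    for t in range(1, T):
        cand = score[:, None] + trans_scores   # L x L candidate scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + obs_scores[t]
    # Recover the best path by following backpointers.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The forward-backward pass for the partition function has the same shape: the per-step max over label pairs becomes a sum, so the quadratic dependence on L is identical.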
3. Recent Work of Focus
- Sequential Error-Correcting Output Coding (SECOC)
- Error-Correcting Output Coding (ECOC)
4. [Figure: SECOC — the multiclass label chain y_{t-1}, y_t, y_{t+1} over observations x_{t-1}, x_t, x_{t+1} is replaced by n binary chains, y_t^1 through y_t^n, one per code-word bit]
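ECOC reduces a k-class problem to a set of binary problems through a code matrix, and SECOC applies the same reduction at every time step. A minimal sketch of encoding and Hamming decoding, using a toy 4-label, 7-bit code matrix (illustrative only, not the codes used in this work):

```python
import numpy as np

# Toy code matrix: one row (code word) per label, one column per binary
# subproblem. Minimum row distance is 4, so one flipped bit still decodes.
CODE = np.array([
    [1, 1, 1, 1, 1, 1, 1],   # label 0
    [0, 0, 0, 0, 1, 1, 1],   # label 1
    [0, 0, 1, 1, 0, 0, 1],   # label 2
    [0, 1, 0, 1, 0, 1, 0],   # label 3
])

def encode(label):
    """Map a multiclass label to its binary code word (one bit per chain)."""
    return CODE[label]

def decode(bits):
    """Pick the label whose code word is closest in Hamming distance.
    Each bit comes from an independently trained binary classifier, so
    a few bit errors can still decode to the correct label."""
    dists = (CODE != np.asarray(bits)).sum(axis=1)
    return int(dists.argmin())
```

In SECOC, each column of the matrix defines one binary label sequence, and a separate binary CRF is trained per column; decoding is applied position by position.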
5. Representational Capacity of SECOC
- Intuitively, training each binary CRF independently should not be able to capture rich transition structure
- Counter-example to independent training
- Our hypothesis: when the transition structure is critical, independent training will not do as well
| Y     | 1 2 3 1 2 3 1 |
| b1(Y) | 1 0 0 1 0 0 1 |
| b2(Y) | 0 1 0 0 1 0 0 |
| b3(Y) | 0 0 1 0 0 1 0 |
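The counter-example can be checked mechanically. A small sketch, assuming the bit functions shown above are label indicators, b_k(y) = [y = k]:

```python
# Counter-example label sequence and indicator bit functions b_k(y) = [y == k].
Y = [1, 2, 3, 1, 2, 3, 1]

def bits(k, seq):
    """Binary sequence for bit chain k: 1 where the label equals k."""
    return [1 if y == k else 0 for y in seq]

b1, b2, b3 = bits(1, Y), bits(2, Y), bits(3, Y)

# Jointly the three bit sequences determine Y exactly...
decoded = [1 * b1[t] + 2 * b2[t] + 3 * b3[t] for t in range(len(Y))]
assert decoded == Y
# ...but a binary CRF trained on b_k alone never sees the other bit
# chains, so it cannot exploit the deterministic joint transition
# structure 1 -> 2 -> 3 -> 1.
```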
6. Our Method: Cascaded SECOC
- Feed previous binary predictions into later binary CRFs to help capture the transition structure
- For problems where a transition model is critical, we hope to see cascade training outperform independent training
- For problems where the observation model is more informative but the sliding window is small, cascade features can still help; a large sliding window will dominate the effect of cascade training
[Figure: cascade architecture — previous binary predictions feed into the current binary chain]
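A schematic of the cascade training loop, under an assumed learner interface (the `train_binary_crf` callable below is a hypothetical stand-in for the base CRF learners, GTB or VP, not the paper's API):

```python
def train_cascaded_secoc(train_binary_crf, X, Y_bits, history):
    """Schematic of cascaded SECOC training (interface hypothetical).

    train_binary_crf(features, bits) -> predictor stands in for the base
    CRF learner. X holds one feature tuple per sequence position;
    Y_bits[k] is the 0/1 target sequence for bit k of the code words.
    Bit chains are trained in order, and each chain sees the predictions
    of the previous `history` chains as extra features -- the cascade.
    """
    predictors, predictions = [], []
    for target in Y_bits:
        prev = predictions[-history:] if history > 0 else []
        feats = [list(X[t]) + [p[t] for p in prev] for t in range(len(X))]
        f = train_binary_crf(feats, target)
        predictors.append(f)
        predictions.append([f(ft) for ft in feats])
    return predictors
```

Setting `history = 0` recovers independent SECOC; larger `history` lets later chains condition on more of the code word, at the cost of compounding any prediction errors made by earlier chains.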
7. Experimental Results
- Synthetic data sets
  - Generated by an HMM
  - Transition data set
  - Both data set
- Base CRF training algorithms
  - Gradient Tree Boosting (GTB)
  - Voted Perceptron (VP)
- Methods for comparison
  - iid: non-sequential ECOC
  - i-SECOC: independent SECOC
  - c-SECOC(h): cascaded SECOC with history length h
  - Beam search
8. Nettalk Data Set (134 labels)
9. Noun Phrase Chunking (NPC) (121 labels)
- Synthetic Data Sets (40 labels)
10. [Figure: results]
11. Summary
- i-SECOC can perform poorly when explicitly capturing complex transition models is critical
- c-SECOC can improve accuracy in such situations by using cascade features
- Performance of c-SECOC can depend strongly on the base CRF algorithm; algorithms capable of capturing complex (non-linear) feature interactions are preferred
- When using less powerful base CRF learning algorithms, other approaches (e.g., beam search) can outperform c-SECOC
12. Future Directions
- Efficient validation procedure for selecting cascade history length
- Incremental generation of code words
- Wide comparison of methods for dealing with large label sets
Acknowledgements
We thank John Langford for discussion of the counter-example to independent SECOC and Thomas Dietterich for his support. This work was supported by NSF grant IIS-0307592.