Revisiting Output Coding for Sequential Supervised Learning

Transcript and Presenter's Notes
1
Revisiting Output Coding for Sequential Supervised Learning
  • Guohua Hao & Alan Fern
  • School of Electrical Engineering and Computer Science
  • Oregon State University
  • Corvallis, OR, U.S.A.

2
Scalability in CRF Training
  • Linear-chain CRF model (figure below)
  • Inference in training:
  • Partition function: forward-backward algorithm
  • Maximizing over label sequences: Viterbi algorithm
  • Complexity of both: O(T · L²) for sequence length T and label-set size L
    (see the Viterbi sketch after the figure)
  • Repeated inference in training
  • Computationally demanding
  • Cannot scale to large label sets

[Figure: linear-chain CRF with label nodes y_{t-1}, y_t, y_{t+1} and observation nodes x_{t-1}, x_t, x_{t+1}]
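
As a concrete illustration of the per-inference cost, here is a minimal Viterbi sketch (not the authors' code; unary and transition are assumed score arrays). The nested loops make the O(T · L²) cost per call explicit; the forward-backward pass for the partition function has the same shape, which is why large label sets are the bottleneck.

  import numpy as np

  def viterbi(unary, transition):
      # unary: (T, L) per-position label scores; transition: (L, L) pairwise scores
      T, L = unary.shape
      score = np.empty((T, L))
      back = np.zeros((T, L), dtype=int)
      score[0] = unary[0]
      for t in range(1, T):                  # T positions ...
          for y in range(L):                 # ... times L current labels ...
              cand = score[t - 1] + transition[:, y]   # ... times L previous labels
              back[t, y] = np.argmax(cand)
              score[t, y] = cand[back[t, y]] + unary[t, y]
      path = [int(np.argmax(score[-1]))]     # best final label, then follow back-pointers
      for t in range(T - 1, 0, -1):
          path.append(int(back[t, path[-1]]))
      return path[::-1]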
3
Recent Work of Focus
  • Sequential Error Correcting Output Coding (SECOC)
  • Error Correcting Output Coding (ECOC)

4
  • Extension to CRF model: one binary chain CRF per code bit

[Figure: binary chain CRF for bit k, with label nodes y^k_{t-1}, y^k_t, y^k_{t+1} over observations x_{t-1}, x_t, x_{t+1}]
  • Decoding: at each position, combine the n predicted bits and map back to a label (see the sketch below)

[Figure: decoding — per-position bit predictions y^1_t, ..., y^n_t across the n binary chains]
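
A rough sketch of per-position decoding (illustrative, not the paper's implementation; bit_probs and code are assumed inputs): each position is assigned the label whose codeword is closest, in expected Hamming distance, to the predicted bits.

  import numpy as np

  def secoc_decode(bit_probs, code):
      # bit_probs: (T, n) predicted P(bit k = 1) at each position, from n binary CRFs
      # code: (L, n) code matrix, one n-bit codeword per label
      labels = []
      for p in bit_probs:
          dist = np.abs(code - p).sum(axis=1)  # expected Hamming distance to each codeword
          labels.append(int(np.argmin(dist)))
      return labels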
5
Representational Capacity of SECOC
  • Intuitively, it feels that training each binary
    CRF independently will not be able to capture
    rich transition structure
  • Counter-example to independent training
  • Our hypothesis when the transition structure is
    critical, independent training will not do as well

[Figure: deterministic cyclic transition structure 1 → 2 → 3 → 1]

Y:      1 2 3 1 2 3 1
b1(Y):  1 0 0 1 0 0 1
b2(Y):  0 1 0 0 1 0 0
b3(Y):  0 0 1 0 0 1 0
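
The ambiguity can be checked mechanically (an illustrative sketch, not from the paper): within any single bit sequence, the previous bit does not determine the next one, so an independent first-order binary chain with uninformative observations has nothing to latch onto.

  from collections import Counter

  Y = [1, 2, 3] * 4                      # the cyclic label sequence 1,2,3,1,2,3,...
  b1 = [1 if y == 1 else 0 for y in Y]   # b1(Y) = 1,0,0,1,0,0,...
  # After a 0, the next bit is 0 about as often as it is 1, so
  # P(next bit | previous bit = 0) carries almost no information.
  print(Counter(zip(b1, b1[1:])))        # (0, 0) and (0, 1) both occur often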
6
Our Method: Cascaded SECOC
  • Feed previous bits' predictions into later binary CRFs to help capture the
    transition structure (sketch below)
  • For problems where the transition model is critical, we hope to see
    cascaded training outperform independent training
  • For problems where the observation model is more informative, cascading
    should help only when the sliding window is small; a large sliding window
    will dominate the effect of cascaded training

[Figure: cascade architecture — previous binary predictions are added as input features]
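
A minimal sketch of the cascade (assuming a hypothetical base learner train_binary_crf whose models have a predict method — stand-ins for GTB or VP; this is not the authors' code): the binary problem for bit k is trained with the predictions of the previous h bits appended to each position's features.

  def cascaded_secoc_train(X, Y, code, h, train_binary_crf):
      # X: list of observation sequences; Y: list of 0-indexed label sequences
      # code: (L, n) binary code matrix; h: cascade history length
      n = code.shape[1]
      models, preds = [], []             # preds[k][i]: bit-k predictions for sequence i
      for k in range(n):
          lo = max(0, k - h)
          # append the previous h bits' predictions to each position's features
          X_aug = [[list(x_t) + [preds[j][i][t] for j in range(lo, k)]
                    for t, x_t in enumerate(x_seq)]
                   for i, x_seq in enumerate(X)]
          targets = [[code[y_t, k] for y_t in y_seq] for y_seq in Y]
          model = train_binary_crf(X_aug, targets)
          models.append(model)
          preds.append([model.predict(x_seq) for x_seq in X_aug])
      return models

With h = 0 this reduces to independent training (i-SECOC); larger h lets bit k's chain see more of the earlier bits, which is how the cascade recovers transition information.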
7
Experimental Results
  • Synthetic Data Sets: generated by an HMM (see the sampler sketch below)
  • "Transition" data set
  • "Both" data set
  • Base CRF training algorithms:
  • Gradient Tree Boosting (GTB)
  • Voted Perceptron (VP)
  • Methods for comparison:
  • iid -- non-sequential ECOC
  • i-SECOC -- independent SECOC
  • c-SECOC(h) -- cascaded SECOC with history length h
  • Beam search
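
A generic HMM sampler along these lines would produce such data (a sketch under assumed trans, emit, and init distributions, not the paper's exact generator):

  import numpy as np

  def sample_hmm(trans, emit, init, T, rng=None):
      # trans: (L, L) P(y_t | y_{t-1}); emit: (L, V) P(x_t | y_t); init: (L,) P(y_1)
      rng = rng or np.random.default_rng(0)
      y = [rng.choice(len(init), p=init)]
      for _ in range(T - 1):
          y.append(rng.choice(trans.shape[1], p=trans[y[-1]]))
      x = [rng.choice(emit.shape[1], p=emit[s]) for s in y]
      return x, y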

8
  • NETtalk Data Set (134 labels)

9
  • Noun Phrase Chunking (NPC) (121 labels)
  • Synthetic Data Sets (40 labels)

10
  • Comparing to Beam Search

11
Summary
  • i-SECOC can perform poorly when explicitly
    capturing complex transition models is critical
  • c-SECOC can improve accuracy in such situations
    by using cascade features
  • Performance of c-SECOC can depend strongly on the base CRF learning
    algorithm; algorithms capable of capturing complex (non-linear) feature
    interactions are preferred
  • When using less powerful base CRF learning
    algorithms, other approaches (e.g. beam search)
    can outperform c-SECOC

12
Future Directions
  • Efficient validation procedure for selecting the cascade history length
  • Incremental generation of code words
  • Wider comparison of methods for dealing with large label sets

Acknowledgements
We thank John Langford for discussion of the counter-example to independent
SECOC and Thomas Dietterich for his support. This work was supported by NSF
grant IIS-0307592.