1
Co Training
  • Presented by Shankar B S
  • DMML Lab
  • 05-11-2007

2
Bootstrapping
  • Bootstrapping: use the initial labeled data to build
    a predictive labeling procedure, then use the newly
    labeled data to build a new predictive procedure
    (a minimal self-training sketch follows this list)
  • Examples
  • EM algorithm: in each iteration the model
    parameters are updated. The model defines a joint
    probability distribution on the observed data.
  • Rule-based bootstrapping
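
A minimal self-training (bootstrapping) sketch in Python, assuming a
generic scikit-learn-style base classifier; the logistic-regression
model, confidence threshold, and round limit are illustrative
assumptions, not part of the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Bootstrapping: repeatedly label the most confident unlabeled
    points and retrain the predictive procedure on the enlarged set."""
    for _ in range(max_rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # add the newly labeled data to the training pool
        new_y = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, new_y])
        X_unlab = X_unlab[~confident]
    return clf
```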

3
Co-Training
Two views X1, X2. Two distinct hypothesis
classes H1, H2, consisting of functions predicting
Y from X1 and X2 respectively. Bootstrap using
h1 ∈ H1, h2 ∈ H2. If X1 is conditionally independent
of X2 given Y, then, given a weak predictor in H1
and an algorithm which can learn H2 under
random misclassification noise, it is
possible to learn a good predictor in H2.
4
Example
  • The description of a web page can be partitioned into
  • Words occurring on that page
  • Words occurring on the hyperlinks pointing to
    that page (anchor text)
  • Train a separate learning algorithm on each view.
  • Use the predictions of each algorithm on
    unlabeled examples to enlarge the training set of the
    other (a minimal co-training sketch follows this list)
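
A minimal co-training sketch, assuming matrices X1 (page words) and X2
(anchor-text words) that give the two views of the same examples, naive
Bayes base learners, and a small per-round selection count; these
specific choices are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_view=5):
    """Each view's classifier labels its most confident unlabeled
    examples; those examples enlarge the other view's training set."""
    for _ in range(rounds):
        h1 = MultinomialNB().fit(X1_l, y_l)
        h2 = MultinomialNB().fit(X2_l, y_l)
        if len(X1_u) == 0:
            break
        chosen = {}  # unlabeled index -> predicted label
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(X_u)
            top = np.argsort(proba.max(axis=1))[-per_view:]
            for i in top:
                chosen.setdefault(int(i), h.classes_[proba[i].argmax()])
        idx = np.array(sorted(chosen))
        labels = np.array([chosen[i] for i in idx])
        # move the newly labeled examples (both views) into the pool
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, labels])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1, h2
```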

5
Co-training framework
  • Instance space X = X1 × X2
  • X1, X2 are two different views of the same example
  • Label l = f1(X1) = f2(X2) = f(X)
  • f1, f2 are the target functions; f is the combined
    target function
  • C1 and C2 are concept classes defined over X1, X2
  • f1 ∈ C1, f2 ∈ C2, f = (f1, f2) ∈ C1 × C2
  • Even if C1 and C2 are large concept classes with high
    complexity, the set of compatible target
    functions might be simpler and smaller
    (a small agreement-score sketch follows this list)
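
A small sketch of the compatibility idea on this slide: a pair (f1, f2)
is compatible when both views predict the same label, so an empirical
agreement score over unlabeled data is one illustrative way to measure
it (the function below is an assumption, not a definition from the
slides).

```python
import numpy as np

def agreement(h1, h2, X1_u, X2_u):
    """Fraction of unlabeled examples where the two views agree,
    i.e. h1(x1) == h2(x2); fully compatible pairs score 1.0."""
    return float(np.mean(h1.predict(X1_u) == h2.predict(X2_u)))
```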

6
Co-training framework
  • X1 = X2 = {0,1}^n
  • C1 = C2 = conjunctions over {0,1}^n
  • If the first coordinate of X1 is 0 (and the target
    conjunction f1 is known to contain that coordinate),
    the example is negative, which gives a labeled
    negative example for X2
  • If the distribution has non-zero probability only
    on pairs where X1 = X2, then no useful information
    about f2 can be obtained.
  • If X2 is conditionally independent of X1 given Y,
    then a new random negative example is obtained,
    which is quite useful (see the sketch after this list).
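
An illustrative sketch of the {0,1}^n conjunction example, assuming the
target conjunction over X1 is known to contain coordinate 0: any x1
with x1[0] == 0 must be negative, so its paired x2 becomes a labeled
negative example for learning f2 (the function name and coordinate
choice are assumptions).

```python
import numpy as np

def negative_x2_examples(X1_u, X2_u):
    """Weak predictor on view 1: coordinate 0 equal to 0 implies a
    negative label, so the paired view-2 instances are labeled negatives."""
    is_negative = X1_u[:, 0] == 0
    return X2_u[is_negative]
```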

7
Idea 1: Feature selection with multiple views
  • As in co-training, suppose we have two views
  • f1(X1) = f2(X2) = c
  • We want to do feature selection on X1
  • Using X2 can reduce the number of labeled
    instances required
  • Or
  • Given a set of labeled instances, X2 can be used
    to select a better set of features
    (a rough sketch follows this list)
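
A rough sketch of how X2 might reduce the labeled data needed for
feature selection on X1: a classifier trained on the X2 view
pseudo-labels unlabeled data, and the pseudo-labels drive a
mutual-information ranking of X1's features. The choice of classifier
and scorer is an illustrative assumption, not from the slides.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def select_x1_features(X2_l, y_l, X1_u, X2_u, k=20):
    """Pseudo-label unlabeled data with the X2 view, then rank X1
    features by mutual information with those pseudo-labels."""
    h2 = LogisticRegression(max_iter=1000).fit(X2_l, y_l)
    pseudo = h2.predict(X2_u)
    scores = mutual_info_classif(X1_u, pseudo)
    return np.argsort(scores)[::-1][:k]   # indices of the top-k X1 features
```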

8
Idea 2: Feature expansion
  • Suppose we have two views of the same data, X1 and X2,
    and the classifier uses the combined data set.
  • If X2 is available only for some instances, we
    can use X1 to construct X2 for the rest of the
    instances using the labeled training data and/or
    the unlabeled test data.
  • Related to the missing-features problem
  • EM algorithm
  • KNN algorithm
  • Median, mean, etc. (an imputation sketch follows this list)
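
A minimal sketch of the missing-view case using standard imputers: rows
that lack the X2 view carry NaNs in the X2 columns of the combined
matrix, and a KNN or mean imputer fills them from the training and/or
unlabeled data. The column layout and parameters are illustrative
assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

def fill_missing_view(X_combined, method="knn"):
    """X_combined stacks the two views [X1 | X2]; missing X2 entries are NaN."""
    if method == "knn":
        imputer = KNNImputer(n_neighbors=5)       # nearest complete rows supply values
    else:
        imputer = SimpleImputer(strategy="mean")  # per-column mean ("median" also works)
    return imputer.fit_transform(X_combined)
```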