1
Co Training
  • Presented by Shankar B S
  • DMML Lab
  • 05-11-2007

2
Bootstrapping
  • Bootstrapping: use the initial labeled data to build
    a predictive labeling procedure, then use the newly
    labeled data to build a new predictive procedure
    (a minimal self-training sketch follows this list)
  • Examples
  • EM algorithm: in each iteration the model
    parameters are updated. The model defines a joint
    probability distribution on the observed data.
  • Rule-based bootstrapping
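
A minimal self-training (bootstrapping) sketch in Python, assuming a
generic scikit-learn-style base classifier; the logistic-regression
model, confidence threshold, and round limit are illustrative
assumptions, not part of the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Bootstrapping: repeatedly label the most confident unlabeled
    points and retrain the predictive procedure on the enlarged set."""
    for _ in range(max_rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # add the newly labeled data to the training pool
        new_y = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, new_y])
        X_unlab = X_unlab[~confident]
    return clf
```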

3
Co-Training
Two views X1, X2. Two distinct hypothesis
classes H1, H2, consisting of functions predicting
Y from X1 and X2 respectively. Bootstrap using
h1 ∈ H1, h2 ∈ H2. If X1 is conditionally independent
of X2 given Y, then, given a weak predictor in H1
and an algorithm which can learn H2 under
random misclassification noise, it is
possible to learn a good predictor in H2.
4
Example
  • The description of a web page can be partitioned into
  • Words occurring on that page
  • Words occurring on the hyperlinks pointing to
    that page (anchor text)
  • Train a separate learning algorithm on each view.
  • Use the predictions of each algorithm on
    unlabeled examples to enlarge the training set of the
    other (a minimal co-training sketch follows this list)
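
A minimal co-training sketch, assuming matrices X1 (page words) and X2
(anchor-text words) that give the two views of the same examples, naive
Bayes base learners, and a small per-round selection count; these
specific choices are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_view=5):
    """Each view's classifier labels its most confident unlabeled
    examples; those examples enlarge the other view's training set."""
    for _ in range(rounds):
        h1 = MultinomialNB().fit(X1_l, y_l)
        h2 = MultinomialNB().fit(X2_l, y_l)
        if len(X1_u) == 0:
            break
        chosen = {}  # unlabeled index -> predicted label
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(X_u)
            top = np.argsort(proba.max(axis=1))[-per_view:]
            for i in top:
                chosen.setdefault(int(i), h.classes_[proba[i].argmax()])
        idx = np.array(sorted(chosen))
        labels = np.array([chosen[i] for i in idx])
        # move the newly labeled examples (both views) into the pool
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, labels])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1, h2
```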

5
Co-training framework
  • Instance space X = X1 × X2
  • X1, X2 are two different views of the same example
  • Label l = f1(X1) = f2(X2) = f(X)
  • f1, f2 are the target functions; f is the combined
    target function
  • C1 and C2 are concept classes defined over X1, X2
  • f1 ∈ C1, f2 ∈ C2, f = (f1, f2) ∈ C1 × C2
  • Even if C1 and C2 are large concept classes with high
    complexity, the set of compatible target
    functions might be simpler and smaller
    (a small agreement-score sketch follows this list)
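
A small sketch of the compatibility idea on this slide: a pair (f1, f2)
is compatible when both views predict the same label, so an empirical
agreement score over unlabeled data is one illustrative way to measure
it (the function below is an assumption, not a definition from the
slides).

```python
import numpy as np

def agreement(h1, h2, X1_u, X2_u):
    """Fraction of unlabeled examples where the two views agree,
    i.e. h1(x1) == h2(x2); fully compatible pairs score 1.0."""
    return float(np.mean(h1.predict(X1_u) == h2.predict(X2_u)))
```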

6
Co-training framework
  • X1 = X2 = {0,1}^n
  • C1 = C2 = conjunctions over {0,1}^n
  • If the first coordinate of X1 is 0 (and the target
    conjunction f1 is known to contain that coordinate),
    the example is negative, which gives a labeled
    negative example for X2
  • If the distribution has non-zero probability only
    on pairs where X1 = X2, then no useful information
    about f2 can be obtained.
  • If X2 is conditionally independent of X1 given Y,
    then a new random negative example is obtained,
    which is quite useful (see the sketch after this list).
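
An illustrative sketch of the {0,1}^n conjunction example, assuming the
target conjunction over X1 is known to contain coordinate 0: any x1
with x1[0] == 0 must be negative, so its paired x2 becomes a labeled
negative example for learning f2 (the function name and coordinate
choice are assumptions).

```python
import numpy as np

def negative_x2_examples(X1_u, X2_u):
    """Weak predictor on view 1: coordinate 0 equal to 0 implies a
    negative label, so the paired view-2 instances are labeled negatives."""
    is_negative = X1_u[:, 0] == 0
    return X2_u[is_negative]
```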

7
Idea 1: Feature selection with multiple views
  • As in co-training, suppose we have two views
  • f1(X1) = f2(X2) = c
  • We want to do feature selection on X1
  • Using X2 can reduce the number of labeled
    instances required
  • Or
  • Given a set of labeled instances, X2 can be used
    to select a better set of features
    (a rough sketch follows this list)
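
A rough sketch of how X2 might reduce the labeled data needed for
feature selection on X1: a classifier trained on the X2 view
pseudo-labels unlabeled data, and the pseudo-labels drive a
mutual-information ranking of X1's features. The choice of classifier
and scorer is an illustrative assumption, not from the slides.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def select_x1_features(X2_l, y_l, X1_u, X2_u, k=20):
    """Pseudo-label unlabeled data with the X2 view, then rank X1
    features by mutual information with those pseudo-labels."""
    h2 = LogisticRegression(max_iter=1000).fit(X2_l, y_l)
    pseudo = h2.predict(X2_u)
    scores = mutual_info_classif(X1_u, pseudo)
    return np.argsort(scores)[::-1][:k]   # indices of the top-k X1 features
```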

8
Idea 2: Feature expansion
  • Suppose we have two views of the same data, X1 and X2,
    and the classifier uses the combined data set.
  • If X2 is available only for some instances, we
    can use X1 to construct X2 for the rest of the
    instances using the labeled training data and/or
    the unlabeled test data.
  • Related to the missing-features problem
  • EM algorithm
  • KNN algorithm
  • Median, mean, etc. (an imputation sketch follows this list)
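
A minimal sketch of the missing-view case using standard imputers: rows
that lack the X2 view carry NaNs in the X2 columns of the combined
matrix, and a KNN or mean imputer fills them from the training and/or
unlabeled data. The column layout and parameters are illustrative
assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

def fill_missing_view(X_combined, method="knn"):
    """X_combined stacks the two views [X1 | X2]; missing X2 entries are NaN."""
    if method == "knn":
        imputer = KNNImputer(n_neighbors=5)       # nearest complete rows supply values
    else:
        imputer = SimpleImputer(strategy="mean")  # per-column mean ("median" also works)
    return imputer.fit_transform(X_combined)
```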