An Overview on Semi-Supervised Learning Methods - PowerPoint PPT Presentation

1
An Overview on Semi-Supervised Learning Methods
  • Matthias Seeger, MPI for Biological Cybernetics
  • Tuebingen, Germany

2
Overview
  • The SSL Problem
  • Paradigms for SSL, with Examples
  • The Importance of Input-Dependent Regularization
  • Note: Citations omitted here (given in my
    literature review)

3
Semi-Supervised Learning
  • SSL is Supervised Learning...
  • Goal: Estimate P(y|x) from Labeled Data Dl =
    {(xi, yi)}
  • But: An additional source tells us about P(x)
    (e.g., Unlabeled Data Du = {xj})

The Interesting Case
4
Obvious Baseline Methods
The Goal of SSL is To Do Better. Not uniformly
and always (No Free Lunch; and yes, of course,
unlabeled data can hurt). But, as always, if our
modelling and algorithmic efforts reflect true
problem characteristics:
  • Do not use info about P(x) → Supervised Learning
  • Fit a Mixture Model using Unsupervised
    Learning, then label up components using the yi
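The second baseline can be sketched as follows: cluster all inputs without using labels, then give each cluster the majority label of the few labeled points it contains. A minimal illustration, assuming 1-D data, two clusters, and plain k-means; the function names and toy data are my own, not from the talk.

```python
def kmeans_1d(xs, iters=50):
    """Two-cluster 1-D k-means; returns the two cluster centers."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if g0: c0 = sum(g0) / len(g0)
        if g1: c1 = sum(g1) / len(g1)
    return c0, c1

def label_clusters(centers, labeled):
    """Assign each center the majority label of its labeled points."""
    votes = {c: {} for c in centers}
    for x, y in labeled:
        c = min(centers, key=lambda c: abs(x - c))
        votes[c][y] = votes[c].get(y, 0) + 1
    return {c: max(v, key=v.get) for c, v in votes.items() if v}

# Toy example: two well-separated clusters, one labeled point in each.
unlabeled = [0.1, 0.2, 0.3, 4.8, 5.0, 5.2]
labeled = [(0.15, 'A'), (5.1, 'B')]
centers = kmeans_1d(unlabeled + [x for x, _ in labeled])
cluster_label = label_clusters(centers, labeled)

def predict(x):
    """Label of the nearest labeled-up cluster center."""
    return cluster_label[min(cluster_label, key=lambda c: abs(x - c))]
```

Note that the unlabeled points shape the cluster centers, so they influence the final classifier even though only two labels are available.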

5
The Generative Paradigm
  • Model the Class Distributions P(x|y)
  • This implies a model for P(y|x) and for P(x)

6
The Joint Likelihood
  • The natural criterion in this context
  • Maximize using EM (the idea is as old as EM)
  • Early and recent theoretical work on asymptotic
    variance
  • Advantage: Easy to implement for standard
    mixture model setups
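The EM procedure above can be sketched as follows, assuming a two-class 1-D Gaussian mixture with shared variance: responsibilities of labeled points are clamped to their known class, unlabeled points contribute soft posteriors, and both enter the joint-likelihood updates. The names and toy data are illustrative, not from the talk.

```python
import math

def gauss(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ssl_em(labeled, unlabeled, iters=30):
    """EM on labeled + unlabeled data; assumes both classes 0 and 1 appear
    among the labeled points."""
    mus = [sum(x for x, y in labeled if y == k) /
           sum(1 for _, y in labeled if y == k) for k in (0, 1)]
    var, pis = 1.0, [0.5, 0.5]
    for _ in range(iters):
        # E-step: labeled points are clamped to their class,
        # unlabeled points get posterior responsibilities.
        resp = [([1.0, 0.0] if y == 0 else [0.0, 1.0], x) for x, y in labeled]
        for x in unlabeled:
            p = [pis[k] * gauss(x, mus[k], var) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append(([p[0] / s, p[1] / s], x))
        # M-step: weighted updates of mixing weights, means, shared variance.
        n = [sum(r[k] for r, _ in resp) for k in (0, 1)]
        pis = [n[k] / len(resp) for k in (0, 1)]
        mus = [sum(r[k] * x for r, x in resp) / n[k] for k in (0, 1)]
        var = sum(r[k] * (x - mus[k]) ** 2
                  for r, x in resp for k in (0, 1)) / len(resp)
    return mus, var, pis

def posterior(x, mus, var, pis):
    """P(y=1 | x) under the fitted mixture, via Bayes' rule."""
    p = [pis[k] * gauss(x, mus[k], var) for k in (0, 1)]
    return p[1] / (p[0] + p[1])
```

The unlabeled points pull the means and variance toward the true component shapes, which in turn sharpens the P(y|x) posterior.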

7
Drawbacks of Generative SSL
  • Choice of the source weighting λ is crucial
  • Cross-Validation fails for small n
  • Homotopy Continuation (Corduneanu et al.)
  • Just like in Supervised Learning:
  • Model for P(y|x) is specified indirectly
  • Fitting is not primarily concerned with
    P(y|x). Also: Have to represent P(x) generally
    well, not just the aspects which help with P(y|x).

8
The Diagnostic Paradigm
  • Model P(y|x, θ) and P(x|μ) directly
  • But: Since θ, μ are independent a priori, θ does
    not depend on μ, given the data → Knowledge of μ
    does not influence the P(y|x) prediction in a
    probabilistic setup!

9
What To Do About It
  • Non-probabilistic diagnostic techniques
  • Replace the expected loss (Tong, Koller; Chapelle
    et al.) → Very limited effect if n is small
  • Some old work (e.g., Anderson)
  • Drop the prior independence of θ, μ →
    Input-Dependent Regularization

10
Input-Dependent Regularization
  • Conditional priors P(θ|μ) make the P(y|x)
    estimation dependent on P(x)
  • Now, unlabeled data can really help...
  • And it can hurt, for the same reason!

11
The Cluster Assumption (CA)
  • Empirical observation: Clustering of the data xj
    w.r.t. a sensible distance / features is often
    fairly compatible with the class regions
  • Weaker: Class regions do not tend to cut
    high-volume regions of P(x)
  • Why? Ask philosophers! My guess: Selection bias
    for features / distance

No matter why: Many SSL methods implement the CA
and work fine in practice
12
Examples For IDR Using CA
  • Label Propagation, Gaussian Random Fields:
    Regularization depends on the graph structure,
    which is built from all xj → More smoothness in
    regions of high connectivity / affinity flows
  • Cluster Kernels for SVM (Chapelle et al.)
  • Information Regularization (Corduneanu, Jaakkola)
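Label propagation of the kind described above can be sketched as follows: Gaussian affinities are built over all points (labeled and unlabeled), labeled scores stay clamped, and unlabeled scores are repeatedly averaged over their graph neighbors. A toy 1-D version with illustrative names; a sketch of the idea, not any particular published implementation.

```python
import math

def label_propagation(labeled, unlabeled, sigma=0.5, iters=200):
    """Propagate class scores over a Gaussian-affinity graph.
    labeled: list of (x, class); unlabeled: list of x. Returns the
    predicted class for each unlabeled point, in order."""
    xs = [x for x, _ in labeled] + unlabeled
    n, nl = len(xs), len(labeled)
    # Affinity matrix over ALL points: this is where P(x) enters.
    w = [[math.exp(-(xs[i] - xs[j]) ** 2 / sigma ** 2) for j in range(n)]
         for i in range(n)]
    classes = sorted({y for _, y in labeled})
    # One score column per class; labeled rows are one-hot and stay clamped.
    f = [[1.0 if i < nl and labeled[i][1] == c else 0.0 for c in classes]
         for i in range(n)]
    for _ in range(iters):
        for i in range(nl, n):  # update only the unlabeled rows
            d = sum(w[i])
            f[i] = [sum(w[i][j] * f[j][k] for j in range(n)) / d
                    for k in range(len(classes))]
    return [classes[max(range(len(classes)), key=lambda k: f[i][k])]
            for i in range(nl, n)]
```

Because affinities decay with distance, labels flow along chains of nearby unlabeled points, so dense regions of P(x) end up smoothly labeled, which is exactly the Cluster Assumption at work.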

13
More Examples for IDR
  • Some methods do IDR, but implement the CA only in
    special cases
  • Fisher Kernels (Jaakkola et al.): Kernel from
    Fisher features → Automatic feature induction
    from a P(x) model
  • Co-Training (Blum, Mitchell): Consistency across
    different views (features)
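The Co-Training idea can be sketched as a loop in which each of the two views trains its own simple classifier, and in each round the more confident view labels one unlabeled point for both. The nearest-centroid classifiers, scalar views, and toy data below are my own illustrative choices, not from the talk or the original paper.

```python
def centroids(pairs):
    """Per-class mean of a scalar view, from (value, class) pairs."""
    sums = {}
    for v, y in pairs:
        s, c = sums.get(y, (0.0, 0))
        sums[y] = (s + v, c + 1)
    return {y: s / c for y, (s, c) in sums.items()}

def confidence(v, cents):
    """(margin, predicted class): gap between the two nearest centroids."""
    ds = sorted((abs(v - m), y) for y, m in cents.items())
    return ds[1][0] - ds[0][0], ds[0][1]

def co_train(labeled, unlabeled, rounds=4):
    """labeled: [((view1, view2), class)]; unlabeled: [(view1, view2)].
    Returns the final per-view centroid classifiers."""
    train = [[(x[i], y) for x, y in labeled] for i in (0, 1)]
    pool = list(unlabeled)
    for _ in range(min(rounds, len(unlabeled))):
        cents = [centroids(train[i]) for i in (0, 1)]
        # Pick the (point, view) pair with the highest confidence.
        best = max(((confidence(x[i], cents[i]), x, i)
                    for x in pool for i in (0, 1)))
        (_, yhat), x, _ = best
        pool.remove(x)
        for i in (0, 1):  # the confident view teaches both views
            train[i].append((x[i], yhat))
    return [centroids(train[i]) for i in (0, 1)]
```

The consistency-across-views assumption shows up in the teaching step: a label inferred confidently in one view is trusted as training data for the other.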

14
Is SSL Always Generative?
  • Wait: We have to model P(x) somehow. Is this not
    always generative then? ... No!
  • Generative: Model P(x|y) fairly directly; the
    P(y|x) model and the effect of P(x) are implicit
  • Diagnostic IDR:
  • Direct model for P(y|x), more flexibility
  • Influence of P(x) knowledge on the P(y|x)
    prediction is directly controlled, e.g. through
    the CA → The model for P(x) can be much less
    elaborate

15
Conclusions
  • Gave a taxonomy for probabilistic approaches to
    SSL
  • Illustrated the paradigms by examples from the
    literature
  • Tried to clarify some points which have led to
    confusion in the past