A Comparison of Methods for Transductive Transfer Learning
Transcript and Presenter's Notes
1
A Comparison of Methods for Transductive Transfer
Learning
  • Andrew Arnold
  • Advised by William W. Cohen
  • Machine Learning Department
  • School of Computer Science
  • Carnegie Mellon University
  • May 30, 2007

2
What we are able to do
  • Supervised learning
  • Train on large, labeled data sets drawn from the same
    distribution as the testing data
  • Well-studied problem

[Slide figure: labeled Train data and held-out Test data drawn
from the same distribution, illustrated with two biomedical
sentences: "Reversible histone acetylation changes the
chromatin structure and can modulate gene transcription.
Mammalian histone deacetylase 1 (HDAC1)..." and "The neuronal
cyclin-dependent kinase p35/cdk5 comprises a catalytic subunit
(cdk5) and an activator subunit (p35)..."]
3
What we're getting better at doing
  • Semi-supervised learning
  • Same as before, but now
  • Add large unlabeled or weakly labeled data sets
    from the same domain
  • Zhu 05, Grandvalet 05

[Slide figure: the same two sentences, now with a large
Auxiliary pool of unlabeled data available during training
alongside the Train set]
4
What we're getting better at doing
  • Transductive learning
  • Unlabeled test data is available during training
  • Easier than inductive learning
  • Learning to make specific predictions rather than a
    general function
  • Joachims 99, 03, Sindhwani 05, Vapnik 98

[Slide figure: the same two sentences; the Auxiliary unlabeled
data is both available during training and is the eventual
Test set]
5
What we'd like to be able to do
  • Transfer learning (domain adaptation)
  • Leverage large, previously labeled data from a
    related domain
  • The related domain we'll be training on (with lots of
    data): the Source
  • The domain we're interested in and will be tested on
    (data-scarce): the Target
  • Ng 06, Daumé 06, Jiang 06, Blitzer 06,
    Ben-David 07, Thrun 96

[Slide figure: Train on a source domain (E-mail, or biomedical
Abstract), Test on a different target domain (IM, or figure
Caption); the caption-style variant reads "Neuronal
cyclin-dependent kinase p35/cdk5 (Fig 1, a) comprises a
catalytic subunit (cdk5, left panel) and an activator subunit
(p35, fmi 4)"]
6
What we'd like to be able to do
  • Transfer learning (multi-task)
  • Same domain, but slightly different task
  • The related task we'll be training on (with lots of
    data): the Source
  • The task we're interested in and will be tested on
    (data-scarce): the Target
  • Ando 05, Sutton 05

[Slide figure: Train on a source task (Names, or Proteins),
Test on a target task (Pronouns, or Action Verbs), within the
same biomedical sentences as before]
7
Motivation
  • Why is transfer important?
  • Often we violate the non-transfer assumption without
    realizing it. How much data is truly independent and
    identically distributed (i.i.d.)?
  • E.g. Different authors, annotators, time periods,
    sources
  • Large amounts of labeled data/trained classifiers
    already exist
  • Why waste data and computation?
  • Can learning be made easier by leveraging related
    domains/problems?
  • Life-long learning
  • Why is transduction important?
  • Why solve a harder problem than we need to?
  • Unlabeled data is vast and cheap
  • Are transduction and transfer so different?
  • Can we learn more about one by studying the other?

8
Outline
  • Motivating Problems
  • Supervised learning
  • Semi-supervised learning
  • Transductive learning
  • Transfer learning: domain adaptation
  • Transfer learning: multi-task
  • Methods
  • Maximum entropy (MaxEnt)
  • Source regularized maximum entropy
  • Feature space expansion
  • Feature selection
  • Feature space transformation
  • Iterative Pseudo Labeling (IPL)
  • Biased thresholding
  • Support Vector Machines (SVMs)
  • Inductive SVM
  • Transductive SVM
  • Experiment

9
Maximum Entropy (MaxEnt)
  • Discriminative model
  • Matches feature expectations of model to data

Conditional likelihood
Regularized optimization
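The equations on this slide did not survive the transcript. For
reference, the standard MaxEnt conditional likelihood and its
Gaussian-regularized training objective (a reconstruction, not
necessarily the slide's exact notation) are:

    \[ p_\lambda(y \mid x) \;=\; \frac{\exp\big(\sum_i \lambda_i f_i(x, y)\big)}{\sum_{y'} \exp\big(\sum_i \lambda_i f_i(x, y')\big)} \]

    \[ \hat{\lambda} \;=\; \arg\max_\lambda \sum_j \log p_\lambda(y_j \mid x_j) \;-\; \frac{\lVert \lambda \rVert_2^2}{2\sigma^2} \]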
10
Summary of Learning Settings
11
Source-regularized MaxEnt
  • Instead of regularizing towards zero
  • Learn a model λ^s on the source data
  • During target training
  • Regularize towards the source-trained λ^s

Chelba & Acero 04
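Following the cited Chelba & Acero idea, the zero-mean Gaussian
prior from the previous slide is replaced by one centered at the
source-trained weights λ^s; the target objective then has the
form (again a reconstruction):

    \[ \hat{\lambda}^t \;=\; \arg\max_\lambda \sum_j \log p_\lambda(y_j \mid x_j) \;-\; \frac{\lVert \lambda - \lambda^s \rVert_2^2}{2\sigma^2} \]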
12
Feature Space Expansion
  • Add extra degrees of freedom
  • Allow the classifier to discern general vs. specific
    features (see the sketch below)

Daumé 06, 07
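The cited Daumé method makes the extra degrees of freedom
concrete by tripling the feature space into general,
source-only, and target-only copies. A minimal sketch (the
function name and array layout are ours, not the talk's):

    import numpy as np

    def augment(X, domain):
        # Daume's feature augmentation: each source vector x becomes
        # (x, x, 0) and each target vector becomes (x, 0, x), so the
        # classifier can weight general vs. domain-specific copies.
        zeros = np.zeros_like(X)
        if domain == "source":
            return np.hstack([X, X, zeros])
        if domain == "target":
            return np.hstack([X, zeros, X])
        raise ValueError("domain must be 'source' or 'target'")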
13
Feature selection
  • Emphasize features shared by the source and target
    data
  • De-emphasize features that differ
  • How to measure? The Fisher exact test (see the
    sketch below)
  • Is P(feature | source) ≈ P(feature | target)?
  • If so, it is a shared feature → keep
  • If not, it is a domain-specific feature → discard
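A sketch of how such a filter might look, assuming binary
feature-occurrence counts and a significance cutoff alpha (the
cutoff value and helper name are illustrative, not from the
talk):

    from scipy.stats import fisher_exact

    def keep_feature(src_hits, src_total, tgt_hits, tgt_total, alpha=0.05):
        # 2x2 contingency table: feature present/absent in source vs. target.
        table = [[src_hits, src_total - src_hits],
                 [tgt_hits, tgt_total - tgt_hits]]
        _, p = fisher_exact(table)
        # Large p: no evidence the rates differ, so treat as shared -> keep.
        return p >= alpha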

14
Feature Space Transformation
  • Source and target are originally only independently
    separable
  • Learn a transformation, G, to allow joint
    separation (one possible formalization below)
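The slide's figure and equations are not in the transcript; one
plausible formalization of "joint separation" is to learn G
together with a single classifier w over the pooled source and
target examples (our notation, not the slide's):

    \[ \min_{G,\, w} \;\; \sum_{(x, y) \,\in\, S \,\cup\, T} \ell\big(w^\top G\, \phi(x),\; y\big) \]

where \ell is a classification loss and \phi(x) the original
feature map.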

15
Iterative Pseudo Labeling (IPL)
  • A novel algorithm for MaxEnt-based transfer (sketched
    below)
  • Adjust feature values to match feature
    expectations in source and target
  • A tunable parameter trades off certainty vs.
    adaptivity
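The IPL update itself is not preserved in the transcript. As a
rough sketch of the control flow only, here is a generic
iterative pseudo-labeling loop; the talk's method additionally
adjusts feature values toward matched source/target feature
expectations, which is not reproduced here (the names and the
sklearn-style model interface are assumptions):

    import numpy as np

    def iterative_pseudo_label(model, X_src, y_src, X_tgt, rounds=10, tau=0.95):
        # Train on source, pseudo-label confident target points,
        # retrain on the union, and repeat.
        X, y = X_src, y_src
        for _ in range(rounds):
            model.fit(X, y)
            probs = model.predict_proba(X_tgt)
            conf = probs.max(axis=1)
            keep = conf >= tau  # trust only high-confidence pseudo-labels
            X = np.vstack([X_src, X_tgt[keep]])
            y = np.concatenate([y_src, probs[keep].argmax(axis=1)])
        return model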

16
IPL analysis
Given a linear transform, we can express the conditional
feature expectations of the target data in terms of a
transformation of the source expectations.
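A plausible reconstruction of the missing equations: if the
target feature functions are a linear image of the source ones,
f^t(x, y) = G f^s(x, y), then by linearity of expectation

    \[ \mathbb{E}_{p_\lambda(y \mid x)}\big[ f^t(x, y) \big] \;=\; G\, \mathbb{E}_{p_\lambda(y \mid x)}\big[ f^s(x, y) \big] \]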
17
Biased Thresholding
  • Different proportions of positive examples
  • e.g. learning to predict rain in humid and arid
    climates
  • How to maximize F1 (and not accuracy)?
  • Score Cut (s-cut)
  • Select score threshold over ranked train scores
  • Apply to test data
  • Percentage Cut (p-cut)
  • Estimate the proportion of positive examples expected
    in the target data
  • Set the threshold so as to select this proportion
    (both cuts are sketched below)
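A minimal sketch of both cuts (the F1-maximizing threshold
search is one natural reading of "select score threshold over
ranked train scores"; the names are ours):

    import numpy as np
    from sklearn.metrics import f1_score

    def s_cut(train_scores, train_labels, test_scores):
        # Score cut: pick the train-score threshold with the best
        # train F1, then apply that fixed threshold to the test scores.
        thresholds = np.unique(train_scores)
        f1s = [f1_score(train_labels, train_scores >= t) for t in thresholds]
        best = thresholds[int(np.argmax(f1s))]
        return test_scores >= best

    def p_cut(test_scores, positive_rate):
        # Percentage cut: label the top positive_rate fraction positive,
        # where positive_rate is the estimated target positive proportion.
        threshold = np.quantile(test_scores, 1.0 - positive_rate)
        return test_scores >= threshold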

18
Support Vector Machines (SVMs)
  • Inductive (standard) SVM
  • Learn separating hyperplane on labeled training
    data. Then evaluate on held-out testing data.
  • Transductive SVM
  • Learn hyperplane in the presence of labeled
    training data AND unlabeled testing data. Use
    distribution of testing points to assist you.
  • Easier to learn particular labels than a whole
    function.
  • More expensive than inductive
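For reference, the Joachims 99 transductive objective optimizes
over the unknown test labels y*_j as well as the hyperplane:

    \[ \min_{w,\, b,\, y^*,\, \xi,\, \xi^*} \;\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_i \xi_i \;+\; C^* \sum_j \xi^*_j \]
    \[ \text{s.t.} \;\; y_i(w \cdot x_i + b) \ge 1 - \xi_i, \qquad y^*_j(w \cdot x^*_j + b) \ge 1 - \xi^*_j \]

which is why it is more expensive: the search over test-label
assignments makes the problem combinatorial.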

19
Transductive vs. Inductive SVM
Joachims 99, 03
20
Domain
21
Data
<Protname>p35</Protname>/<Protname>cdk5</Protname> binds and
phosphorylates <Protname>beta-catenin</Protname> and regulates
<Protname>beta-catenin</Protname> /
<Protname>presenilin-1</Protname> interaction.
<prot> p38 stress-activated protein kinase </prot> inhibitor
reverses <prot> bradykinin B(1) receptor </prot>-mediated
component of inflammatory hyperalgesia.
  • Notice the differences in
  • Length and density of protein names
  • Number of training examples (UT has roughly four times
    as many as Yapex)
  • Positive examples (twice as many in Yapex)

22
Experiment
  • Examining three dimensions
  • Labeled vs unlabeled vs prior auxiliary data
  • e.g. the expected proportion of positive target
    examples, or a few labeled target examples
  • Transduction vs. induction
  • Transfer vs. non-transfer
  • Since there are few true positives, we focused on
  • F1 = (2 × Precision × Recall) / (Precision + Recall)
  • Source: UT; target: Yapex
  • For IPL, the trade-off parameter was set to 0.95
    (conservative)

23
Results: Transfer
  • Transfer is much more difficult
  • Accuracy is not the problem

24
Results: Transduction
  • Transduction helps in transfer setting
  • TSVM copes better than MaxEnt, ISVM

25
Results: IPL
  • IPL can help boost performance
  • Makes transfer MaxEnt competitive with TSVM
  • But bounded by quality of initial pseudo-labels

26
Results: Priors
  • Priors improve unsupervised transfer
  • Thresholding helps balance recall and precision →
    better F1
  • A little bit of knowledge can help a lot

27
Results: Supervision
  • Supervised transfer beats supervised non-transfer
  • Significant at a 99% binomial confidence interval on
    precision and recall
  • But not by as much as might be hoped for
  • Even relatively simple transfer methods can help

28
Conclusions & Contributions
  • Introduced a novel MaxEnt transfer method: IPL
  • Can match transduction in unsupervised setting
  • Gives probabilistic results
  • Analyzed and compared various methods related to
    transfer learning and concluded
  • Transfer is hard
  • But made easier when explicitly addressed
  • Transduction is a good start
  • TSVM excels even with scant prior knowledge
  • A little prior target knowledge is even better
  • No need for fully labeled target data set

29
Limitations & Future Work
  • Threshold is important
  • Currently it is used only at test time
  • Why not incorporate it earlier to get better
    pseudo-labels?
  • Priors seem to help a lot
  • Currently we use only feature means; what about
    variances?
  • Can structuring the feature space lead to
    parsimonious, transferable priors?

[Slide figure: a small feature hierarchy rooted at token, with
branches left and right and leaves such as
token.is.capitalized and token.is.numeric]
30
Limitations & Future Work: high-level
  • How to better make use of source data?
  • Why doesn't source data help more?
  • Is IPL convex?
  • Is this exactly what we want to optimize?
  • How does regularization affect convexity?
  • What, exactly, is the relationship between
    transduction and transfer?
  • Can their theories be unified?
  • When is it worth explicitly modeling transfer?
  • How different do the domains need to be?
  • How much source/target data do we need?
  • What kind of priors do we need?

31
Thank you!
Questions?
32
References