Title: A Comparison of Methods for Transductive Transfer Learning
1. A Comparison of Methods for Transductive Transfer Learning
- Andrew Arnold
- Advised by William W. Cohen
- Machine Learning Department
- School of Computer Science
- Carnegie Mellon University
- May 30, 2007
2. What we are able to do
- Supervised learning
- Train on large, labeled data sets drawn from the same distribution as the testing data
- Well-studied problem
[Figure: example training and test sentences drawn from the same distribution, e.g. "Reversible histone acetylation changes the chromatin structure and can modulate gene transcription. Mammalian histone deacetylase 1 (HDAC1) ..." and "The neuronal cyclin-dependent kinase p35/cdk5 comprises a catalytic subunit (cdk5) and an activator subunit (p35) ..."]
3. What we're getting better at doing
- Semi-supervised learning
- Same as before, but now add large unlabeled or weakly labeled data sets from the same domain
- Zhu 05, Grandvalet 05
[Figure: labeled training sentences plus auxiliary unlabeled sentences from the same domain, using the same histone acetylation and p35/cdk5 example sentences as above]
4. What we're getting better at doing
- Transductive learning
- Unlabeled test data is available during training
- Easier than inductive learning: learning specific predictions rather than a general function
- Joachims 99, 03, Sindhwani 05, Vapnik 98
[Figure: the auxiliary unlabeled data is both the auxiliary data and the eventual test data, again using the histone acetylation and p35/cdk5 example sentences]
5. What we'd like to be able to do
- Transfer learning (domain adaptation)
- Leverage large, previously labeled data sets from a related domain
- Source: the related domain we'll be training on (with lots of data)
- Target: the domain we're interested in and will be tested on (data scarce)
- Ng 06, Daumé 06, Jiang 06, Blitzer 06, Ben-David 07, Thrun 96
[Figure: domain adaptation examples. Train on source domain E-mail, test on target domain IM; train on source domain Abstract ("The neuronal cyclin-dependent kinase p35/cdk5 comprises a catalytic subunit (cdk5) and an activator subunit (p35)"), test on target domain Caption ("Neuronal cyclin-dependent kinase p35/cdk5 (Fig 1, a) comprises a catalytic subunit (cdk5, left panel) and an activator subunit (p35, fmi 4)")]
6. What we'd like to be able to do
- Transfer learning (multi-task)
- Same domain, but slightly different task
- Source: the related task we'll be training on (with lots of data)
- Target: the task we're interested in and will be tested on (data scarce)
- Ando 05, Sutton 05
[Figure: multi-task examples. Train on source task Names, test on target task Pronouns; train on source task Proteins, test on target task Action Verbs, using the same histone acetylation and p35/cdk5 example sentences]
7. Motivation
- Why is transfer important?
- We often violate the non-transfer assumption without realizing it. How much data is truly identically distributed (i.i.d.)?
- E.g. different authors, annotators, time periods, sources
- Large amounts of labeled data and trained classifiers already exist. Why waste data and computation?
- Can learning be made easier by leveraging related domains/problems?
- Life-long learning
- Why is transduction important?
- Why solve a harder problem than we need to?
- Unlabeled data is vast and cheap
- Are transduction and transfer so different?
- Can we learn more about one by studying the other?
8. Outline
- Motivating Problems
- Supervised learning
- Semi-supervised learning
- Transductive learning
- Transfer learning: domain adaptation
- Transfer learning: multi-task
- Methods
- Maximum entropy (MaxEnt)
- Source regularized maximum entropy
- Feature space expansion
- Feature selection
- Feature space transformation
- Iterative Pseudo Labeling (IPL)
- Biased thresholding
- Support Vector Machines (SVMs)
- Inductive SVM
- Transductive SVM
- Experiment
9. Maximum Entropy (MaxEnt)
- Discriminative model
- Matches the feature expectations of the model to those of the data
- Trained by regularized optimization of the conditional likelihood (a standard formulation is sketched below)
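The slide's equations did not survive extraction; the following is a standard MaxEnt formulation, reconstructed in generic notation (weights λ_k, features f_k, Gaussian-prior variance σ²), which may differ from the notation on the original slide.

```latex
% Conditional likelihood under a log-linear (MaxEnt) model
p_\Lambda(y \mid x) = \frac{1}{Z_\Lambda(x)} \exp\Big( \sum_k \lambda_k f_k(x, y) \Big),
\qquad
Z_\Lambda(x) = \sum_{y'} \exp\Big( \sum_k \lambda_k f_k(x, y') \Big)

% Regularized optimization: maximize conditional log-likelihood
% under a Gaussian (L2) prior centered at zero
\Lambda^* = \arg\max_\Lambda \sum_i \log p_\Lambda(y_i \mid x_i) - \frac{\lVert \Lambda \rVert^2}{2\sigma^2}
```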
10. Summary of Learning Settings
[Table: summary of the learning settings above; contents not recoverable from the extraction]
11. Source-regularized MaxEnt
- Instead of regularizing towards zero:
- Learn a model on the source data
- During target training, regularize towards the source-trained weights (Λ_s in the sketch below)
- Chelba 04
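A sketch of the modified objective, following the Chelba 04 idea of centering the Gaussian prior at the source-trained weights; the symbols Λ_s and σ² are my notation, since the slide's formula is not recoverable.

```latex
% Learn \Lambda_s on the source data, then during target training
% regularize towards \Lambda_s instead of towards zero
\Lambda_t^* = \arg\max_\Lambda \sum_{i \in \mathrm{target}} \log p_\Lambda(y_i \mid x_i)
- \frac{\lVert \Lambda - \Lambda_s \rVert^2}{2\sigma^2}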
12. Feature Space Expansion
- Add extra degrees of freedom
- Allow the classifier to discern general vs. domain-specific features (a minimal sketch follows)
- Daumé 06, 07
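A minimal sketch of a Daumé-style feature augmentation (my illustration, not code from the talk): each feature gets a shared copy plus a domain-specific copy, so the classifier can learn separate weights for general and domain-specific behavior.

```python
import numpy as np

def augment_features(X, domain):
    """Daumé-style 'frustratingly easy' augmentation (sketch).

    X      : (n, d) feature matrix
    domain : "source" or "target"
    Returns an (n, 3*d) matrix: [shared copy | source-only copy | target-only copy].
    """
    n, d = X.shape
    zeros = np.zeros((n, d))
    if domain == "source":
        return np.hstack([X, X, zeros])   # shared + source-specific copies active
    return np.hstack([X, zeros, X])       # shared + target-specific copies active

# Usage (hypothetical): stack augmented source and target examples,
# then train any linear classifier on the expanded representation.
```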
13. Feature Selection
- Emphasize features shared by the source and target data
- Minimize domain-specific (different) features
- How to measure?
- Fisher exact test (a sketch follows this list)
- Is P(feature | source) ≈ P(feature | target)?
- If so, it is a shared feature → keep
- If not, it is a domain-specific feature → discard
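A sketch of how the per-feature test could be applied (my illustration; scipy's fisher_exact and the 0.05 significance cutoff are assumptions, not details from the slides).

```python
from scipy.stats import fisher_exact

def keep_feature(src_count, src_total, tgt_count, tgt_total, alpha=0.05):
    """Keep a feature if its occurrence rate is not significantly different
    between the source and target corpora (i.e. it looks 'shared')."""
    table = [[src_count, src_total - src_count],
             [tgt_count, tgt_total - tgt_count]]
    _, p_value = fisher_exact(table)
    return p_value >= alpha  # large p-value: no evidence the domains differ on this feature
```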
14. Feature Space Transformation
- Source and target are originally only independently separable
- Learn a transformation, G, that allows joint separation (a sketch follows)
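One way to write the idea, purely as my sketch (the slide gives no formula): learn a transformation G of the target features so that a single hyperplane w separates the source points and the transformed target points jointly.

```latex
% Sketch: one separator w for source points x_i^{(s)} and transformed target points G x_j^{(t)}
y_i^{(s)} \big( w^\top x_i^{(s)} \big) > 0 \;\; \forall i,
\qquad
y_j^{(t)} \big( w^\top G\, x_j^{(t)} \big) > 0 \;\; \forall j
```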
15. Iterative Pseudo Labeling (IPL)
- Novel algorithm for MaxEnt-based transfer
- Adjust feature values to match feature expectations in the source and target (a sketch of the loop follows)
- A trade-off parameter (α in the sketch below) balances certainty vs. adaptivity
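A high-level sketch of what such an iterative pseudo-labeling loop could look like, reconstructed from the bullet points only; the MaxEntModel class, its methods, and the exact use of the trade-off parameter alpha are all hypothetical.

```python
def iterative_pseudo_labeling(source_X, source_y, target_X, alpha=0.95, n_iters=10):
    """IPL sketch: train on source, pseudo-label the target, and nudge the model's
    feature expectations toward the target each round.
    alpha near 1 = conservative (trust the source); smaller = more adaptive."""
    model = MaxEntModel()                                # hypothetical MaxEnt classifier
    model.fit(source_X, source_y)
    for _ in range(n_iters):
        pseudo_y = model.predict(target_X)               # pseudo-label the target data
        src_exp = model.feature_expectations(source_X, source_y)
        tgt_exp = model.feature_expectations(target_X, pseudo_y)
        mixed = alpha * src_exp + (1 - alpha) * tgt_exp  # interpolate the two expectations
        model.refit_to_expectations(mixed)               # retrain to match the mixed expectations
    return model
```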
16. IPL analysis
- Given a linear transform, we can express the conditional feature expectations of the target data in terms of a transformation of the source expectations (sketched below)
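The derivation on the slide did not survive extraction; a plausible form of the statement, in notation I am assuming (G the linear transform relating source and target feature vectors f):

```latex
% If target features are a linear transform of source features,
%   f^{(t)}(x, y) = G \, f^{(s)}(x, y),
% then conditional feature expectations transform the same way:
\mathbb{E}\big[ f^{(t)}(x, y) \,\big|\, y \big] = G \; \mathbb{E}\big[ f^{(s)}(x, y) \,\big|\, y \big]
```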
17. Biased Thresholding
- Source and target may have different proportions of positive examples
- E.g. learning to predict rain in humid vs. arid climates
- How to maximize F1 (and not accuracy)? Two schemes (both sketched below):
- Score Cut (s-cut)
- Select a score threshold over the ranked training scores, then apply it to the test data
- Percentage Cut (p-cut)
- Estimate the proportion of positive examples expected in the target data, and set the threshold so as to select that proportion
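A minimal sketch of the two thresholding schemes (my illustration; the function names and the choice of F1 as the criterion for the s-cut search are assumptions).

```python
import numpy as np
from sklearn.metrics import f1_score

def s_cut_threshold(train_scores, train_labels):
    """Score cut: choose the threshold over the ranked training scores that maximizes F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in np.sort(train_scores):
        f1 = f1_score(train_labels, (train_scores >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def p_cut_threshold(test_scores, expected_positive_rate):
    """Percentage cut: set the threshold so that the expected fraction of
    test examples is labeled positive."""
    return np.quantile(test_scores, 1.0 - expected_positive_rate)

# Usage (hypothetical): y_pred = (test_scores >= threshold)
```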
18. Support Vector Machines (SVMs)
- Inductive (standard) SVM
- Learn a separating hyperplane on labeled training data, then evaluate on held-out testing data
- Transductive SVM (a standard form of the objective is sketched below)
- Learn the hyperplane in the presence of labeled training data AND unlabeled testing data, using the distribution of the testing points to assist
- Easier to learn particular labels than a whole function
- More expensive than the inductive SVM
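For reference, a common way to write the transductive SVM objective, in standard Joachims-style notation rather than anything recoverable from the slide: the labels y*_j of the unlabeled test points are optimized jointly with the hyperplane.

```latex
% Transductive SVM (sketch): jointly choose the hyperplane (w, b) and
% the labels y^*_j of the m unlabeled test points
\min_{w,\, b,\, \{y^*_j\},\, \xi,\, \xi^*} \;
\tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i + C^* \sum_{j=1}^{m} \xi^*_j
\quad \text{s.t.} \quad
y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\;
y^*_j (w^\top x^*_j + b) \ge 1 - \xi^*_j, \;\;
\xi_i,\, \xi^*_j \ge 0
```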
19. Transductive vs. Inductive SVM
- Joachims 99, 03
20. Domain
21. Data
<Protname>p35</Protname>/<Protname>cdk5</Protname> binds and phosphorylates <Protname>beta-catenin</Protname> and regulates <Protname>beta-catenin</Protname>/<Protname>presenilin-1</Protname> interaction.
<prot>p38 stress-activated protein kinase</prot> inhibitor reverses <prot>bradykinin B(1) receptor</prot>-mediated component of inflammatory hyperalgesia.
- Notice the differences in:
- Length and density of protein names
- Number of training examples (UT ≈ 4× Yapex)
- Proportion of positive examples (twice as many in Yapex)
22. Experiment
- Examining three dimensions:
- Labeled vs. unlabeled vs. prior auxiliary data
- E.g. the proportion of positive target examples, or a few labeled target examples
- Transduction vs. induction
- Transfer vs. non-transfer
- Since there are few true positives, we focus on
- F1 = (2 × Precision × Recall) / (Precision + Recall)
- Source: UT, target: Yapex
- For IPL, the trade-off parameter was set to .95 (conservative)
23. Results: Transfer
- Transfer is much more difficult
- Accuracy is not the problem
24. Results: Transduction
- Transduction helps in transfer setting
- TSVM copes better than MaxEnt, ISVM
25. Results: IPL
- IPL can help boost performance
- Makes transfer MaxEnt competitive with TSVM
- But bounded by quality of initial pseudo-labels
26. Results: Priors
- Priors improve unsupervised transfer
- The threshold helps balance recall and precision → better F1
- A little bit of knowledge can help a lot
27. Results: Supervision
- Supervised transfer beats supervised non-transfer
- Significant at a 99% binomial CI on precision and recall
- But not by as much as might be hoped for
- Even relatively simple transfer methods can help
28. Conclusions & Contributions
- Introduced IPL, a novel MaxEnt transfer method
- Can match transduction in the unsupervised setting
- Gives probabilistic results
- Analyzed and compared various methods related to transfer learning and concluded:
- Transfer is hard
- But it is made easier when explicitly addressed
- Transduction is a good start
- TSVM excels even with scant prior knowledge
- A little prior target knowledge is even better
- No need for a fully labeled target data set
29. Limitations & Future Work
- The threshold is important
- Currently only used at test time
- Why not incorporate it earlier and get better pseudo-labels?
- Priors seem to help a lot
- Currently only using feature means; what about variances?
- Can structuring the feature space lead to parsimonious, transferable priors?
[Figure: example feature hierarchy over a token, e.g. left/right context, token.is.capitalized, token.is.numeric]
30. Limitations & Future Work: high-level
- How can we make better use of the source data?
- Why doesn't the source data help more?
- Is IPL convex?
- Is this exactly what we want to optimize?
- How does regularization affect convexity?
- What, exactly, is the relationship between transduction and transfer?
- Can their theories be unified?
- When is it worth explicitly modeling transfer?
- How different do the domains need to be?
- How much source/target data do we need?
- What kind of priors do we need?
31. Thank you!
Questions?
32. References