Title: A Two-Stage Approach to Domain Adaptation for Statistical Classifiers
1. A Two-Stage Approach to Domain Adaptation for Statistical Classifiers
- Jing Jiang, ChengXiang Zhai
- Department of Computer Science
- University of Illinois at Urbana-Champaign
2. What is domain adaptation?
3. Example: named entity recognition
persons, locations, organizations, etc.
[Diagram: standard supervised learning; an NER classifier is trained on labeled New York Times data and tested on unlabeled New York Times data, scoring 85.5]
4. Example: named entity recognition
persons, locations, organizations, etc.
[Diagram: non-standard (realistic) setting; the NER classifier is trained on labeled New York Times data but tested on text from a different source, and the score drops to 64.1]
5. Domain difference → performance drop
[Diagram: ideal setting; NER classifier trained on NYT (New York Times), tested on NYT: 85.5. Realistic setting; trained on NYT, tested on Reuters: 64.1]
6. Another NER example
[Diagram: ideal setting; gene name recognizer trained on mouse, tested on mouse: 54.1. Realistic setting; trained on fly, tested on mouse: 28.1]
7. Other examples
- Spam filtering
  - Public email collection → personal inboxes
- Sentiment analysis of product reviews
  - Digital cameras → cell phones
  - Movies → books
- Can we do better than standard supervised learning?
- Domain adaptation: design learning methods that are aware of the difference between the training and test domains.
8. How do we solve the problem in general?
9-10. Observation 1: domain-specific features
wingless, daughterless, eyeless, apexless
- describing phenotype
- in fly gene nomenclature
- the feature "-less" is weighted high
CD38, PABPC5 (gene names from other organisms)
- Is the "-less" feature still useful for other organisms? No!
11-12. Observation 2: generalizable features
- the feature "X be expressed" is useful across organisms
13. General idea: a two-stage approach
[Diagram: the feature space is split into generalizable features, shared by the source and target domains, and domain-specific features]
14. Goal
[Diagram: desired feature coverage across the source and target domains]
15. Regular classification
[Diagram: feature coverage under regular classification, using the source domain only]
16. Generalization: emphasize generalizable features in the trained model (Stage 1)
[Diagram: Stage 1 shifts the model toward the generalizable features shared with the target domain]
17. Adaptation: pick up domain-specific features for the target domain (Stage 2)
[Diagram: Stage 2 extends the model to target-domain-specific features]
18. Regular semi-supervised learning
[Diagram: feature coverage under regular semi-supervised learning]
19. Comparison with related work
- We explicitly model generalizable features.
  - Previous work models them implicitly (Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007).
- We do not need labeled target data, but we need multiple source (training) domains.
  - Some work requires labeled target data (Daumé III 2007).
- We have a 2nd stage of adaptation, which uses semi-supervised learning.
  - Previous work does not incorporate semi-supervised learning (Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007).
20. Implementation of the two-stage approach with logistic regression classifiers
21. Logistic regression classifiers
Input "... and wingless are expressed in ..." → x, a vector of p binary features (e.g. "-less", "X be expressed"):
  x = (0, 1, 0, 0, 1, 0, 1, 0)
One weight vector w_y per class y:
  w_y = (0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4)
  p(y | x) ∝ exp(w_y^T x)
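The scoring rule on this slide can be sketched numerically. This is a minimal illustration of p(y | x) ∝ exp(w_y^T x) with two classes; the class names and the "not_gene" weight vector are illustrative assumptions, not values from the talk.

```python
# Sketch of the slide-21 classifier: binary features x, one weight
# vector per class, and p(y|x) proportional to exp(w_y^T x).
import numpy as np

x = np.array([0, 1, 0, 0, 1, 0, 1, 0], dtype=float)  # p binary features
W = {"gene":     np.array([0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4]),
     "not_gene": np.array([0.5, 0.1, 0.2,  0.4, 0.3, 0.1,  0.2, 0.6])}  # assumed

scores = {y: w @ x for y, w in W.items()}            # w_y^T x per class
log_z = np.log(sum(np.exp(s) for s in scores.values()))  # log normalizer
probs = {y: float(np.exp(s - log_z)) for y, s in scores.items()}

best = max(probs, key=probs.get)
print(best)  # -> gene
```

Only the features that fire (value 1) contribute to the score, which is why high weights on generalizable features matter so much when the domain changes.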
22. Learning a logistic regression classifier
Objective: maximize [log likelihood of training data] − [regularization term]
  w* = argmax_w Σ_i log p(y_i | x_i; w) − λ ‖w‖²
- regularization term: penalizes large weights, controls model complexity
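The objective above can be optimized with plain gradient ascent. Below is a minimal sketch for the binary case; the toy data set and the hyperparameters (lam, lr, n_steps) are illustrative assumptions.

```python
# L2-regularized binary logistic regression trained by gradient ascent,
# maximizing  sum_i log p(y_i | x_i; w) - lam * ||w||^2.
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_logreg(X, y, lam=0.1, lr=0.5, n_steps=500):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_steps):
        p_hat = sigmoid(X @ w)                     # p(y=1 | x) per example
        grad = X.T @ (y - p_hat) - 2.0 * lam * w   # gradient of the objective
        w += lr * grad / n
    return w

# Binary feature vectors: x[j] = 1 iff feature j fires in the context.
X = np.array([[0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 1, 1]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)  # 1 = gene name, 0 = not

w = train_logreg(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(int)
print(preds.tolist())  # -> [1, 1, 0, 0]
```

The `- 2.0 * lam * w` term is exactly the regularizer from the slide: it shrinks all weights toward zero and keeps the model from over-fitting rare features.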
23. Generalizable features in weight vectors
K source domains D1, D2, ..., DK with learned weight vectors w1, w2, ..., wK:
  w1 = (0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9,  0.4)
  w2 = (3.2, 0.5, 4.5, -0.1, 3.5, 0.1, -1.0, -0.2)
  wK = (0.1, 0.7, 4.2,  0.1, 3.2, 1.7,  0.1,  0.3)
Generalizable features (here the 3rd and 5th) get high weight in every domain; domain-specific features do not.
24. We want to decompose w in this way
  w = (0.2, 4.5, 5.0, -0.3,  3.0, 2.1, -0.9, 0.4)
    = (0,   0,   4.6,  0,    3.2, 0,    0,   0)    ← h non-zero entries for h generalizable features
    + (0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4)  ← domain-specific part
25. Feature selection matrix A
The h × p matrix A selects the h generalizable features; each row is an indicator vector with a single 1, e.g.
  A = [ 0 0 1 0 0 0 0 0 ...
        0 0 0 0 1 0 0 0 ...
        ...               ]
  z = A x
For x = (0, 1, 0, 0, 1, 0, 1, 0) this gives z = (0, 1, 0): the values of the generalizable features.
26. Decomposition of w
  w = (0.2, 4.5, 5.0, -0.3,  3.0, 2.1, -0.9, 0.4)   full weight vector
  u = (0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4)   weights for domain-specific features
  v = (4.6, 3.2, 3.6)                               weights for the h generalizable features
  w^T x = v^T z + u^T x, with z = A x
27. Decomposition of w
  w^T x = v^T z + u^T x
        = v^T A x + u^T x
        = (A^T v)^T x + u^T x
  ⇒ w = A^T v + u
28. Decomposition of w
  w = A^T v + u
  A^T v: shared by all domains; non-zero only at the generalizable features, e.g. (0, 0, 4.6, 0, 3.2, 0, 0, 0, ...) with v = (4.6, 3.2, 3.6)
  u: domain-specific, e.g. (0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4)
  w = (0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4)
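The identities on slides 25-28 are easy to check numerically. In this sketch the positions of the generalizable features (the 3rd and 5th, with h = 2) are an illustrative assumption chosen to match the example vectors.

```python
# Numeric check of the decomposition: z = A x and w = A^T v + u,
# which together give w^T x = v^T z + u^T x.
import numpy as np

p, h = 8, 2                      # p features, h of them generalizable
A = np.zeros((h, p))
A[0, 2] = 1.0                    # 3rd feature is generalizable
A[1, 4] = 1.0                    # 5th feature is generalizable

v = np.array([4.6, 3.2])         # weights shared by all domains
u = np.array([0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4])  # domain-specific

w = A.T @ v + u                  # recovers the full weight vector
x = np.array([0, 1, 0, 0, 1, 0, 1, 0], dtype=float)
z = A @ x                        # values of the generalizable features

assert np.isclose(w @ x, v @ z + u @ x)
print(w)  # matches the full weight vector from slide 24
```

Because A only rearranges coordinates, the decomposition costs nothing at prediction time; it only changes how the weights are regularized during training.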
29. Framework for generalization
Fix A and optimize:
  maximize [log likelihood of the labeled data from the K source domains, under w_k = A^T v + u_k] − [regularization term]
- λ_s ≫ 1 to penalize domain-specific features
30. Framework for adaptation
Fix A and optimize:
  maximize [log likelihood of pseudo-labeled target-domain examples] − [regularization term]
- λ_t = 1 ≪ λ_s, so the model picks up domain-specific features in the target domain
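The Stage-1 objective can be sketched for binary labels: K domains share v (the weights on the generalizable features, routed through A) while each domain keeps its own u_k, and a large λ_s pushes weight onto the shared features. The toy data, feature layout, and all hyperparameters below are illustrative assumptions.

```python
# Sketch of the Stage-1 "generalization" optimization by gradient
# ascent: maximize sum_k loglik(D_k; w_k) - lam_s*sum_k ||u_k||^2
#                  - lam_v*||v||^2,  with w_k = A^T v + u_k.
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def stage1(Xs, ys, A, lam_s=10.0, lam_v=0.1, lr=0.05, n_steps=3000):
    K, (h, p) = len(Xs), A.shape
    v = np.zeros(h)
    us = [np.zeros(p) for _ in range(K)]
    for _ in range(n_steps):
        grad_v = -2.0 * lam_v * v
        for k in range(K):
            w_k = A.T @ v + us[k]
            g_w = Xs[k].T @ (ys[k] - sigmoid(Xs[k] @ w_k))  # dloglik/dw_k
            grad_v += A @ g_w                    # chain rule via w_k = A^T v + u_k
            us[k] += lr * (g_w - 2.0 * lam_s * us[k]) / len(ys[k])
        v += lr * grad_v / K
    return v, us

# Feature 0 predicts the label in both domains; feature 1 is noise.
A = np.array([[1.0, 0.0]])        # A selects feature 0 as generalizable
Xs = [np.array([[1.0, 0.0], [0.0, 0.0]]),
      np.array([[1.0, 1.0], [0.0, 1.0]])]
ys = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]

v, us = stage1(Xs, ys, A)
# The shared weight dominates; domain-specific weights stay near zero.
print(round(float(v[0]), 2), [round(float(u[0]), 3) for u in us])
```

Stage 2 (this slide) reuses the same optimization on pseudo-labeled target-domain examples with λ_t = 1 ≪ λ_s, so the target model is free to grow its own domain-specific weights.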
31. How to find A? (1)
- Joint optimization
- Alternating optimization
32. How to find A? (2)
- Domain cross validation
- Idea: train on (K − 1) source domains and test on the held-out source domain
- Approximation:
  - w_f^k: weight for feature f learned from domain k
  - w_f^(−k): weight for feature f learned from the other domains
  - rank features by the agreement between w_f^k and w_f^(−k)
- See paper for details
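One way to sketch this heuristic is to score each feature by how consistently it is weighted across held-out splits. The scoring rule below (held-out weight times the mean held-in weight, summed over domains) is an illustrative assumption; the paper's exact criterion differs in detail.

```python
# Domain cross validation sketch: rank features by cross-domain
# agreement of their learned weights.
import numpy as np

# Per-domain weight vectors as on slide 23 (rows = domains, cols = features).
W = np.array([[0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9,  0.4],
              [3.2, 0.5, 4.5, -0.1, 3.5, 0.1, -1.0, -0.2],
              [0.1, 0.7, 4.2,  0.1, 3.2, 1.7,  0.1,  0.3]])
K, p = W.shape

scores = np.zeros(p)
for k in range(K):
    held_out = W[k]                               # weights from domain k
    held_in = W[np.arange(K) != k].mean(axis=0)   # weights from the others
    scores += held_out * held_in                  # high only if high in both

top_h = np.argsort(-scores)[:2]                   # pick h = 2 generalizable features
print(sorted(top_h.tolist()))  # -> [2, 4]
```

On the slide-23 weights this recovers exactly the two columns that were large in every domain, which is what the selection matrix A should encode.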
33. Intuition for domain cross validation
[Diagram: with D_k = fly held out, "expressed" is weighted high both on the held-in domains D1, ..., D_{k−1} and on fly, while "-less" is weighted high only on fly; so "expressed" generalizes and "-less" does not]
34. Experiments
- Data set
  - BioCreative Challenge Task 1B
  - Gene/protein name recognition
  - 3 organisms/domains: fly, mouse, and yeast
- Experiment setup
  - 2 organisms for training, 1 for testing
  - F1 as performance measure
35. Experiments: generalization
[Results table omitted; F = fly, M = mouse, Y = yeast]
- Using generalizable features is effective.
- Domain cross validation is more effective than joint optimization.
36. Experiments: adaptation
[Results table omitted; F = fly, M = mouse, Y = yeast]
- Domain-adaptive bootstrapping is more effective than regular bootstrapping.
37. Experiments: adaptation
[Results plot omitted]
- Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.
38. Conclusions and future work
- Two-stage domain adaptation
  - Generalization outperformed standard supervised learning
  - Adaptation outperformed standard bootstrapping
- Two ways to find generalizable features
  - Domain cross validation is more effective
- Future work
  - Single source domain?
  - Setting the parameters h and m
39. References
- S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
- J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
- H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.
40. Thank you!