A Two-Stage Approach to Domain Adaptation for Statistical Classifiers

1
A Two-Stage Approach to Domain Adaptation for
Statistical Classifiers
  • Jing Jiang ChengXiang Zhai
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

2
What is domain adaptation?
3
Example: named entity recognition
persons, locations, organizations, etc.
standard supervised learning: train (labeled) and test (unlabeled) data both from the New York Times → NER classifier achieves 85.5
4
Example: named entity recognition
persons, locations, organizations, etc.
non-standard (realistic) setting: train (labeled) on the New York Times, test (unlabeled) on a different source → NER classifier drops to 64.1
5
Domain difference → performance drop
ideal setting: train NER classifier on New York Times, test on New York Times → 85.5
realistic setting: train on New York Times, test on Reuters → 64.1
6
Another NER example
ideal setting: train gene name recognizer on mouse, test on mouse → 54.1
realistic setting: train on fly, test on mouse → 28.1
7
Other examples
  • Spam filtering
  • public email collection → personal inboxes
  • Sentiment analysis of product reviews
  • digital cameras → cell phones
  • movies → books
  • Can we do better than standard supervised learning?
  • Domain adaptation: design learning methods that are aware of the difference between the training and test domains.

8
How do we solve the problem in general?
9
Observation 1
domain-specific features
wingless daughterless eyeless apexless
10
Observation 1
domain-specific features
wingless daughterless eyeless apexless
  • describing phenotypes
  • following fly gene nomenclature
  • the feature "-less" is weighted highly

CD38 PABPC5
Is the "-less" feature still useful for other organisms?
No!
11
Observation 2
generalizable features
12
Observation 2
generalizable features
feature "X be expressed"
13
General idea: a two-stage approach
(diagram: the feature space is divided into domain-specific and generalizable features, shared between the source and target domains)
14
Goal
15
Regular classification
16
Stage 1, Generalization: emphasize generalizable features in the trained model
17
Stage 2, Adaptation: pick up domain-specific features for the target domain
18
Regular semi-supervised learning
19
Comparison with related work
  • We explicitly model generalizable features.
  • Previous work models them implicitly (Blitzer et al. 2006; Ben-David et al. 2007; Daumé III 2007).
  • We do not need labeled target data, but we do need multiple source (training) domains.
  • Some work requires labeled target data (Daumé III 2007).
  • We add a second stage of adaptation that uses semi-supervised learning.
  • Previous work does not incorporate semi-supervised learning (Blitzer et al. 2006; Ben-David et al. 2007; Daumé III 2007).

20
Implementation of the two-stage approach with
logistic regression classifiers
21
Logistic regression classifiers
x: vector of p binary features (e.g. 0 1 0 0 1 0 1 0), with features such as "-less" and "X be expressed" extracted from text like "... and wingless are expressed in ..."
wy: weight vector for class y (e.g. 0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4)
classification score: wyT x
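As a concrete illustration of scoring with p binary features, here is a minimal multiclass logistic regression sketch; the feature positions, the second class, and its weights are made-up assumptions, while the "gene" weight vector reuses the slide's numbers:

```python
import numpy as np

# p = 8 binary features; "-less" and "X be expressed" are feature names
# from the slides, the remaining positions are hypothetical.
x = np.array([0, 1, 0, 0, 1, 0, 1, 0], dtype=float)  # features of one token

# Weight vectors per class: "gene" reuses the slide's numbers,
# "other" is made up for illustration.
w = {
    "gene":  np.array([0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4]),
    "other": np.array([0.1, -1.0, 0.3, 0.8, -0.5, 0.0, 1.2, -0.2]),
}

def predict_proba(x, w):
    """p(y | x) proportional to exp(w_y^T x), normalized over classes."""
    scores = {y: wy @ x for y, wy in w.items()}
    m = max(scores.values())                         # subtract max for stability
    exps = {y: np.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

probs = predict_proba(x, w)
print(probs)  # "gene" dominates here: its active features carry large weights
```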
22
Learning a logistic regression classifier
maximize: log likelihood of the training data (scores wyT x as above) − regularization term
the regularization term penalizes large weights to control model complexity
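The objective on this slide (log likelihood minus a penalty on large weights) can be sketched as follows; the toy data, learning rate, and regularization strength λ = 0.1 are assumptions, not values from the paper:

```python
import numpy as np

# Toy labeled data (assumed): 20 examples, 8 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 8)).astype(float)
y = (X[:, 1] > 0).astype(int)      # label driven by one feature, so the toy task is learnable
lam = 0.1                          # regularization strength (assumed)

def objective(w):
    """Binary logistic log likelihood minus the L2 penalty on large weights."""
    z = X @ w
    log_lik = np.sum(y * z - np.log1p(np.exp(z)))
    return log_lik - lam * (w @ w)

# Maximize the regularized objective by gradient ascent.
w = np.zeros(8)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += 0.05 * (X.T @ (y - p) - 2 * lam * w)

print(objective(w), objective(np.zeros(8)))  # trained weights beat the zero vector
```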
23
Generalizable features in weight vectors
K source domains D1 ... DK yield weight vectors w1 ... wK:
w1: 0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4
w2: 3.2 0.5 4.5 -0.1 3.5 0.1 -1.0 -0.2
wK: 0.1 0.7 4.2 0.1 3.2 1.7 0.1 0.3
generalizable features (the 3rd and 5th columns) receive consistently high weights in every domain; domain-specific features do not
24
We want to decompose w in this way
decompose w into a shared part with h non-zero entries (one per generalizable feature) plus a domain-specific part:
0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4
= 0 0 4.6 0 3.2 0 0 0
+ 0.2 4.5 0.4 -0.3 -0.2 2.1 -0.9 0.4
25
Feature selection matrix A
A is an h × p feature selection matrix with a single 1 per row; it selects the h generalizable features, so z = Ax keeps only those entries of x (e.g. x = 0 1 0 0 1 0 1 0 maps to its 3rd and 5th entries)
26
Decomposition of w
v: weights for the h generalizable features (e.g. 4.6 3.2 3.6)
u: weights for the domain-specific features (e.g. 0.2 4.5 0.4 -0.3 -0.2 2.1 -0.9 0.4)
the score decomposes as wT x = vT z + uT x
27
Decomposition of w
wT x = vT z + uT x
= vT (Ax) + uT x
= (AT v)T x + uT x
⇒ w = AT v + u
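A small numpy check of the identity derived above (z = Ax and w = AT v + u imply wT x = vT z + uT x); the positions of the generalizable features are illustrative assumptions, with v and u loosely based on the slide's numbers:

```python
import numpy as np

p, h = 8, 2
generalizable = [2, 4]             # hypothetical positions of the generalizable features

# A: h x p selection matrix with a single 1 per row.
A = np.zeros((h, p))
for row, col in enumerate(generalizable):
    A[row, col] = 1.0

x = np.array([0, 1, 1, 0, 1, 0, 1, 0], dtype=float)
v = np.array([4.6, 3.2])                                   # generalizable-feature weights
u = np.array([0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4])  # domain-specific weights

z = A @ x              # z = Ax keeps only the generalizable features
w = A.T @ v + u        # w = A^T v + u

print(w @ x, v @ z + u @ x)   # identical scores: w^T x = v^T z + u^T x
```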
28
Decomposition of w
w = AT v + u: the component AT v is shared by all domains (it embeds the generalizable-feature weights v, e.g. 4.6 3.2 3.6, back into the full feature space), while u is domain-specific:
0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4 = (0 0 4.6 0 3.2 0 0 0) + (0.2 4.5 0.4 -0.3 -0.2 2.1 -0.9 0.4)
29
Framework for generalization
Fix A and optimize the weights:
maximize the log likelihood of the labeled data from the K source domains (one weight vector wk per domain) minus a regularization term
set λs ≫ 1 to penalize the domain-specific components of the wk
30
Framework for adaptation
Fix A and optimize the weights:
maximize the likelihood of pseudo-labeled target-domain examples
set λt = 1 ≪ λs so that domain-specific features of the target domain can be picked up
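One way to sketch both stages is to regularize the domain-specific weights u much harder in stage 1 (λs ≫ 1) than in stage 2 (λt = 1 ≪ λs); this gradient-ascent solver and its toy data are assumptions, not the paper's implementation:

```python
import numpy as np

def fit_decomposed(X, y, A, lam_shared, lam_specific, iters=2000, lr=0.01):
    """Gradient ascent on the regularized binary logistic log likelihood of
    w = A^T v + u, penalizing u with strength lam_specific
    (a sketch, not the paper's exact solver)."""
    h, p = A.shape
    v, u = np.zeros(h), np.zeros(p)
    for _ in range(iters):
        w = A.T @ v + u
        prob = 1.0 / (1.0 + np.exp(-(X @ w)))
        g = X.T @ (y - prob)                 # log-likelihood gradient w.r.t. w
        v += lr * (A @ g - 2 * lam_shared * v)
        u += lr * (g - 2 * lam_specific * u)
    return v, u

# Toy target-domain data (assumed): the label also needs feature 5,
# which is NOT among the generalizable features selected by A.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(40, 8)).astype(float)
y = ((X[:, 2] + X[:, 5]) > 0).astype(int)
A = np.zeros((2, 8)); A[0, 2] = A[1, 4] = 1.0

# Stage 1 (generalization): a large penalty drives the domain-specific u toward zero.
v1, u1 = fit_decomposed(X, y, A, lam_shared=0.1, lam_specific=20.0)
# Stage 2 (adaptation): a small penalty lets u pick up feature 5.
v2, u2 = fit_decomposed(X, y, A, lam_shared=0.1, lam_specific=1.0)

print(np.abs(u1).max(), np.abs(u2).max())  # u stays much smaller under the strong penalty
```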
31
How to find A? (1)
  • Joint optimization
  • Alternating optimization

32
How to find A? (2)
  • Domain cross validation
  • Idea: train on (K − 1) source domains and test on the held-out source domain
  • Approximation:
  • wf,k: weight for feature f learned from domain k
  • wf,k̄: weight for feature f learned from the other domains
  • rank features by how well these two weights agree across the held-out domains
  • see the paper for details
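The domain-cross-validation idea can be sketched as follows: fit a model on each domain and on its complement, then favor features whose weights stay high in both. The scoring rule here (a sum of element-wise minima) and the synthetic domains are assumptions; the paper's exact ranking formula is not reproduced:

```python
import numpy as np

def fit_logreg(X, y, lam=0.1, iters=2000, lr=0.02):
    """Plain L2-regularized binary logistic regression, by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * (X.T @ (y - p) - 2 * lam * w)
    return w

# Three synthetic source domains (assumed): feature 0 predicts the label in
# every domain ("generalizable"); feature 1 helps only in domain 0 ("domain-specific").
rng = np.random.default_rng(2)
domains = []
for k in range(3):
    X = rng.integers(0, 2, size=(60, 4)).astype(float)
    y = (np.maximum(X[:, 0], X[:, 1]) if k == 0 else X[:, 0]).astype(int)
    domains.append((X, y))

# Hold out each domain k: compare the weight learned on domain k alone (w_f,k)
# with the weight learned on the other domains (w_f,k-bar);
# reward features whose weights are high in both.
score = np.zeros(4)
for k in range(len(domains)):
    Xk, yk = domains[k]
    rest = [d for i, d in enumerate(domains) if i != k]
    Xr = np.vstack([d[0] for d in rest])
    yr = np.concatenate([d[1] for d in rest])
    score += np.minimum(fit_logreg(Xk, yk), fit_logreg(Xr, yr))

ranking = np.argsort(-score)
print(ranking)   # feature 0 ranks first: consistently high weight in every split
```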

33
Intuition for domain cross validation

(example: hold out Dk = fly; "expressed" receives a high weight both from the other domains D1 ... Dk−1 and from fly itself (e.g. 1.5 vs 2.0), so it generalizes, while "-less" is weighted highly only in fly (e.g. 0.05 vs 1.2), so it is domain-specific)
34
Experiments
  • Data set:
  • BioCreative Challenge Task 1B
  • gene/protein name recognition
  • 3 organisms/domains: fly, mouse, and yeast
  • Experiment setup:
  • 2 organisms for training, 1 for testing
  • F1 as the performance measure

35
Experiments: Generalization (F = fly, M = mouse, Y = yeast)
using generalizable features is effective
domain cross validation is more effective than joint optimization
36
Experiments: Adaptation (F = fly, M = mouse, Y = yeast)
domain-adaptive bootstrapping is more effective than regular bootstrapping
37
Experiments: Adaptation
domain-adaptive semi-supervised learning is more effective, especially with a small number of pseudo labels
38
Conclusions and future work
  • Two-stage domain adaptation:
  • generalization outperformed standard supervised learning
  • adaptation outperformed standard bootstrapping
  • Two ways to find generalizable features:
  • domain cross validation is more effective
  • Future work:
  • single source domain?
  • setting the parameters h and m

39
References
  • S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
  • J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
  • H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

40
Thank you!