Experimental Results

After adding knowledge, POS tagging error reduces by 42%; SRL error reduces by 25% on be verbs and by 9% on all verbs.

Constrained Conditional Model
- Incorporate prior knowledge as constraints c = {C_j(.)}.
- Learn the weight vector w ignoring c.
- Impose constraints c at inference time.

Domain Adaptation
Problem: Performance of statistical systems drops significantly when tested on a domain different from the training domain. Example: in the CoNLL 2007 shared task, the annotation standard differed across the source and target domains.
Motivation: Prior knowledge is cheap and readily available for many domains.
Solution: Use prior knowledge about the target domain for better adaptation.
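The Constrained Conditional Model recipe above (learn w without constraints, then impose them at inference) can be sketched as follows. The toy indicator features, weights, and the single period constraint are illustrative assumptions, not the poster's actual model.

```python
from itertools import product

TAGS = ["PRP", "VB", "NNS", "."]

def score(w, words, tags):
    # Linear score w . phi(x, y) with simple word-tag indicator features.
    return sum(w.get((word, tag), 0.0) for word, tag in zip(words, tags))

def constrained_argmax(w, words, constraints):
    # Enumerate candidate tag sequences, keep only those satisfying every
    # constraint C_j, and return the highest-scoring survivor.  The weights
    # w were learned WITHOUT the constraints.
    best, best_score = None, float("-inf")
    for tags in product(TAGS, repeat=len(words)):
        if not all(c(words, tags) for c in constraints):
            continue
        s = score(w, words, tags)
        if s > best_score:
            best, best_score = list(tags), s
    return best

def period_constraint(words, tags):
    # Example constraint: the token "." must receive the tag ".".
    return all(t == "." for wd, t in zip(words, tags) if wd == ".")

# Hypothetical learned weights, including a bad one the constraint overrides.
w = {("I", "PRP"): 1.0, ("eat", "VB"): 1.0, ("fruits", "NNS"): 1.0,
     (".", "VB"): 0.5}
print(constrained_argmax(w, ["I", "eat", "fruits", "."], [period_constraint]))
# -> ['PRP', 'VB', 'NNS', '.']
```

Without the constraint, the bad weight would tag "." as VB; imposing the constraint at inference corrects the output without retraining.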
System         POS   SRL (All Verbs)   SRL (Be Verbs)
Baseline       86.2  58.1              15.5
Self-training  86.2  58.3              13.7
PDA-KW         91.8  62.1              34.5
PDA-ST         92.0  62.4              36.4
- For POS tagging, we do not have any domain-independent knowledge.
- For SRL, we use some domain-independent knowledge. Example: two arguments cannot overlap.
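The non-overlap rule can be checked mechanically. A minimal sketch, assuming argument spans are (start, end) token offsets with end exclusive (the span format is an assumption, not the poster's representation):

```python
# Domain-independent SRL constraint: no two argument spans may overlap.

def overlaps(a, b):
    # Two half-open spans (start, end) overlap iff each starts before
    # the other ends.
    return a[0] < b[1] and b[0] < a[1]

def satisfies_no_overlap(spans):
    # A candidate argument set is legal iff all spans are pairwise disjoint.
    return not any(overlaps(spans[i], spans[j])
                   for i in range(len(spans))
                   for j in range(i + 1, len(spans)))

print(satisfies_no_overlap([(0, 1), (2, 3)]))  # disjoint A0 and A1 -> True
print(satisfies_no_overlap([(0, 2), (1, 3)]))  # overlapping spans  -> False
```

At inference time, candidate argument sets failing this check are simply discarded.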
POS Tagging

I     eat   fruits   .
PRP   VB    NNS      .

When a POS tagger trained on the WSJ domain is tested on the Bio domain, F1 drops by 9%.

PDA-KW
- Incorporate target-domain-specific knowledge c' = {C_k(.)} as constraints.
- Impose constraints c and c' at inference time.
- Adaptation without retraining.

Semantic Role Labeling (SRL)

I     eat   fruits   .
A0    V     A1

When an SRL system trained on the WSJ domain is tested on Ontonotes, F1 drops by 18%.

Comparison with JiangZh07

System       POS   Amount of Target Labeled Data
PDA-ST       92.0  0
JiangZh07-1  87.2  300
JiangZh07-2  94.2  2730

- Without using any labeled data, prior knowledge reduces error by 38% over using 300 labeled sentences.
- Without using any labeled data, prior knowledge recovers 72% of the accuracy gain of adding 2730 labeled sentences.

Prior Knowledge on Ontonotes
Be verbs are unseen in the training domain.
- If the be verb is immediately followed by a verb, there can be no core argument. Example: John is eating.
- If the be verb is followed by the word "like", core arguments A0 and A1 are possible. Example: And he's like why's the door open?
- Otherwise, A1 and A2 are possible. Example: John is a good man.
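The three be-verb rules above amount to a lookup from the token following the be verb to the set of admissible core arguments. A sketch (the function name and the word/POS input format are assumptions):

```python
# Ontonotes prior knowledge for "be" verbs: which core arguments are
# admissible, keyed on the token immediately after the be verb.

def allowed_core_args(next_word, next_pos):
    if next_pos.startswith("VB"):       # "John is eating." -> no core args
        return set()
    if next_word.lower() == "like":     # "And he's like why's the door open?"
        return {"A0", "A1"}
    return {"A1", "A2"}                 # "John is a good man."

print(allowed_core_args("eating", "VBG"))  # -> set()
print(allowed_core_args("like", "IN"))     # -> {'A0', 'A1'}
print(allowed_core_args("a", "DT"))        # -> {'A1', 'A2'}
```

At inference, any SRL candidate assigning a core argument outside this set to a be verb is ruled out.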
PDA-ST
- Motivation: Constraints are accurate but apply rarely. Can we generalize to cases where the constraints do not apply?
- Solution: Embed constraints into self-training.

(Figure: frame file of the be verb.)

Notation: Ds = source domain labeled data, Du = target domain unlabeled data, Dt = target domain test data.
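The idea of embedding constraints into self-training can be sketched schematically: train on Ds, label Du under the constraints, add the constrained predictions to the training pool, and retrain. The trainer, predictor, and constraint interfaces below are assumptions, not the poster's actual system.

```python
# Schematic PDA-ST loop.  A "model" here is a toy word -> tag dictionary.

def self_train(train, predict_constrained, Ds, Du, rounds=1):
    pool = list(Ds)
    model = train(pool)
    for _ in range(rounds):
        # Constrained predictions on Du become pseudo-labeled examples,
        # so the constraints shape what the retrained model learns.
        pseudo = [(x, predict_constrained(model, x)) for x in Du]
        model = train(pool + pseudo)
    return model

def train(data):
    model = {}
    for words, tags in data:
        model.update(zip(words, tags))
    return model

def predict_constrained(model, words):
    # Illustrative constraint: the token "." must receive the tag ".";
    # unseen words default to NN.
    return ["." if w == "." else model.get(w, "NN") for w in words]

Ds = [(["I", "eat"], ["PRP", "VB"])]
Du = [["I", "eat", "fruits", "."]]
model = self_train(train, predict_constrained, Ds, Du)
print(model.get("fruits"), model.get("."))  # -> NN .
```

The constrained labels on Du generalize the knowledge to words the constraints themselves never mention, which is exactly the motivation stated above.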
Prior Knowledge on BioMed
- Hyphenated compounds are tagged as NN. Example: H-ras
- Digit-letter combinations should be tagged as NN. Example: CTNNB1
- A hyphen should be tagged as HYPH.
- Any word unseen in the source domain that is followed by the word "gene" should be tagged as NN. Example: ras gene
- If a word never appears with the tag NNP in the training data, predict NN instead of NNP. Example: polymerase chain reaction ( PCR )
Source: annotation wiki. Only names of persons, locations, etc. are proper nouns, and these are very few; gene, disease, and drug names are marked as common nouns.

Self-training
- Motivation: How good is self-training without knowledge?
- Same as PDA-ST, except that the red-boxed line of the algorithm figure is replaced, i.e., predictions on Du are made without the constraints.

Conclusion
- Prior knowledge gives results competitive with using labeled data.
- Future work:
  - Improve the results for self-training.
  - Find theoretical justifications for self-training.
  - Apply PDA to more tasks/domains.

Suggestions?

References
- J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. ACL 2007.
- G. Kundu and D. Roth. Adapting Text Instead of the Model: An Open Domain Approach. CoNLL 2011.
- J. Blitzer, R. McDonald, and F. Pereira. Domain Adaptation with Structural Correspondence Learning. EMNLP 2006.
This research is sponsored by ARL and DARPA under the Machine Reading program.