Hyphenated compounds are tagged as NN.

Description:

Title: Slide 1. Author: View. Last modified by: Tree. Created Date: 7/8/2003 8:27:43 PM. Document presentation format: Custom. Other titles: Arial, Wingdings ...

Slides: 3
Provided by: View61
Transcript and Presenter's Notes

Experimental Results
Constrained Conditional Model
Domain Adaptation
After adding knowledge, POS tagging error is reduced
by 42%, and SRL error by 25% on be verbs and 9% on
all verbs.
  • Incorporate prior knowledge as constraints
    c = {Cj(.)}.
  • Learn the weight vector w, ignoring c.
  • Impose the constraints c at inference time.
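
The three steps above can be read as a constrained inference problem. A sketch in notation that is not on the poster: the feature map \phi and the candidate set \mathcal{Y}(x) are assumptions, and the constraints are treated as hard (each C_j returns 1 when satisfied).

```latex
\hat{y} \;=\; \arg\max_{y \in \mathcal{Y}(x)} \; w^{\top} \phi(x, y)
\quad \text{subject to} \quad C_j(x, y) = 1 \;\; \text{for all } C_j \in c
```

Because w is learned without reference to c, changing the constraint set changes only the feasible region at inference time and does not require retraining.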

Problem: The performance of statistical systems drops
significantly when they are tested on a domain different
from the training domain.
Example: In the CoNLL 2007 shared task, the annotation
standard differed across the source and target domains.
Motivation: Prior knowledge is cheap and readily
available for many domains.
Solution: Use prior knowledge on the target domain for
better adaptation.
System          POS    SRL (All Verbs)   SRL (Be Verbs)
Baseline        86.2   58.1              15.5
Self-training   86.2   58.3              13.7
PDA-KW          91.8   62.1              34.5
PDA-ST          92.0   62.4              36.4
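
The error reductions quoted for these results can be checked from the table. The sketch below assumes "error" means 100 minus the reported score and that the comparison is Baseline against PDA-ST; the function name is illustrative only.

```python
def relative_error_reduction(baseline_score: float, system_score: float) -> float:
    """Percentage of the baseline's error (100 - score) removed by the system."""
    baseline_err = 100.0 - baseline_score
    system_err = 100.0 - system_score
    return 100.0 * (baseline_err - system_err) / baseline_err

# Scores copied from the results table: (Baseline, PDA-ST).
scores = {
    "POS": (86.2, 92.0),
    "SRL, all verbs": (58.1, 62.4),
    "SRL, be verbs": (15.5, 36.4),
}
for task, (base, pda_st) in scores.items():
    print(f"{task}: {relative_error_reduction(base, pda_st):.1f}% error reduction")
```

Under these assumptions the POS and be-verb figures come out near 42% and 25%. The all-verbs figure comes out near 10% rather than the 9% quoted; the poster does not say which row or rounding was used for that number.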
  • For POS tagging, we do not have any
    domain-independent knowledge.
  • For SRL, we use some domain-independent
    knowledge.
  • Example: Two arguments cannot overlap.

POS Tagging
I    eat  fruits  .
PRP  VB   NNS     .
When a POS tagger trained on the WSJ domain is tested
on the Bio domain, F1 drops by 9%.
PDA-KW
Comparison with JiangZh07
  • Incorporate target-domain-specific knowledge
    c' = {C'k(.)} as constraints.
  • Impose constraints c and c' at inference time.
  • Adaptation without retraining.
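
A minimal sketch of what imposing constraints at inference time could look like for a tagger. It assumes token-local hard constraints and per-token tag scores from an already trained model; the poster does not describe its inference procedure, and `constrained_decode`, `hyphen_is_hyph`, and the toy scores are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# A constraint inspects token i and returns the set of tags it permits,
# or None when it has nothing to say about that token.
Constraint = Callable[[List[str], int], Optional[Set[str]]]

def constrained_decode(
    tokens: List[str],
    scores: List[Dict[str, float]],
    constraints: List[Constraint],
) -> List[str]:
    """Pick, for each token, the best-scoring tag that all constraints permit."""
    tags = []
    for i, tag_scores in enumerate(scores):
        allowed: Optional[Set[str]] = None
        for constraint in constraints:
            permitted = constraint(tokens, i)
            if permitted is not None:
                allowed = permitted if allowed is None else allowed & permitted
        if not allowed:  # no constraint fired, or the constraints conflict
            allowed = set(tag_scores)
        tags.append(max(allowed, key=lambda t: tag_scores.get(t, float("-inf"))))
    return tags

def hyphen_is_hyph(tokens: List[str], i: int) -> Optional[Set[str]]:
    """Toy target-domain rule: a bare hyphen must be tagged HYPH."""
    return {"HYPH"} if tokens[i] == "-" else None

# Made-up scores standing in for a source-domain model's output.
tokens = ["T", "-", "cell"]
scores = [{"NN": 1.2, "NNP": 0.9}, {":": 2.0, "HYPH": 0.1}, {"NN": 1.5, "VB": 0.2}]
print(constrained_decode(tokens, scores, [hyphen_is_hyph]))
```

The model's scores are left untouched; only the set of admissible tags changes, which is why this style of adaptation needs no retraining.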

Semantic Role Labeling (SRL)
  • Without using any labeled data, prior knowledge
    reduces error by 38% relative to using 300 labeled
    sentences.
  • Without using any labeled data, prior knowledge
    recovers 72% of the accuracy gain from adding 2730
    labeled sentences.

I   eat  fruits  .
A0  V    A1
When an SRL system trained on the WSJ domain is tested
on OntoNotes, F1 drops by 18%.
Prior Knowledge on OntoNotes
Be verbs are unseen in the training domain.
System        POS    Amount of Target Labeled Data (sentences)
PDA-ST        92.0   0
JiangZh07-1   87.2   300
JiangZh07-2   94.2   2730
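
The two comparison claims can be reproduced from the POS accuracies. This is a sketch that assumes the Baseline POS score of 86.2 from the results table is the reference point for the "accuracy gain"; the variable names are illustrative.

```python
# POS accuracies taken from the tables on this poster.
baseline, pda_st = 86.2, 92.0
jiang_300, jiang_2730 = 87.2, 94.2

# Error reduction of PDA-ST relative to the system using 300 labeled sentences.
error_reduction = 100 * (pda_st - jiang_300) / (100 - jiang_300)

# Share of the gain from 2730 labeled sentences that PDA-ST achieves with none.
gain_recovered = 100 * (pda_st - baseline) / (jiang_2730 - baseline)

print(f"{error_reduction:.1f}% error reduction, {gain_recovered:.1f}% of gain recovered")
```

Both values land within rounding distance of the 38% and 72% quoted.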
  • If a be verb is immediately followed by a verb,
    there can be no core argument.
  • Example: John is eating.
  • If a be verb is followed by the word "like", core
    arguments A0 and A1 are possible.
  • Example: And he's like why's the door open?
  • Otherwise, A1 and A2 are possible.
  • Example: John is a good man.
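
The three be-verb rules above can be sketched as a function that returns the core argument labels still allowed. It assumes Penn Treebank POS tags (all verb tags start with "VB") and that the caller already knows the position of the be verb; `allowed_core_args` is a hypothetical name.

```python
from typing import List, Set

def allowed_core_args(tokens: List[str], pos_tags: List[str], be_index: int) -> Set[str]:
    """Core argument labels permitted for the be verb at position be_index."""
    nxt = be_index + 1
    if nxt < len(tokens):
        # Rule 1: be verb immediately followed by a verb -> no core argument.
        if pos_tags[nxt].startswith("VB"):
            return set()
        # Rule 2: be verb followed by the word "like" -> A0 and A1.
        if tokens[nxt].lower() == "like":
            return {"A0", "A1"}
    # Rule 3: otherwise -> A1 and A2.
    return {"A1", "A2"}

print(allowed_core_args(["John", "is", "eating", "."], ["NNP", "VBZ", "VBG", "."], 1))
print(allowed_core_args(["John", "is", "a", "good", "man", "."],
                        ["NNP", "VBZ", "DT", "JJ", "NN", "."], 1))
```

The rules are checked in the order the poster lists them, so a "like" that the tagger marks as a verb would fall under the first rule; whether that matches the authors' intent is not stated.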

PDA-ST
  • Motivation: Constraints are accurate but apply
    rarely. Can we generalize to cases where the
    constraints do not apply?
  • Solution: Embed constraints into self-training.
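
One way to picture embedding constraints into self-training is the loop below. It is only a sketch: the poster's actual algorithm is not reproduced in this transcript, the learner is a toy that memorizes each word's most frequent tag, and `train`, `predict`, `self_train_with_constraints`, and the "UNK" placeholder tag are all assumptions.

```python
from collections import Counter, defaultdict

def train(examples):
    """Toy learner: remember the most frequent tag of each word."""
    counts = defaultdict(Counter)
    for tokens, tags in examples:
        for tok, tag in zip(tokens, tags):
            counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def predict(model, tokens, constraint=None, default="UNK"):
    """Tag tokens; a constraint, when it fires, overrides the model."""
    tags = []
    for tok in tokens:
        forced = constraint(tok) if constraint else None
        tags.append(forced if forced is not None else model.get(tok, default))
    return tags

def self_train_with_constraints(source_labeled, target_unlabeled, constraint, rounds=1):
    """Label target data with constrained inference, then retrain on the union."""
    model = train(source_labeled)
    for _ in range(rounds):
        pseudo = [(toks, predict(model, toks, constraint)) for toks in target_unlabeled]
        model = train(source_labeled + pseudo)
    return model

# Toy rule: a token with an internal hyphen is tagged NN.
constraint = lambda tok: "NN" if "-" in tok.strip("-") else None
source = [(["I", "eat", "fruits", "."], ["PRP", "VB", "NNS", "."])]
target = [["I", "eat", "H-ras", "."]]

model = self_train_with_constraints(source, target, constraint)
print(predict(train(source), ["H-ras"]), predict(model, ["H-ras"]))
```

After one round the retrained model tags the target-domain word correctly even when no constraint is supplied at prediction time, which is the kind of generalization the motivation asks about.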

Frame file of be verb
Ds: source domain labeled data
Du: target domain unlabeled data
Dt: target domain test data
Conclusion
Prior Knowledge on BioMed
  • Hyphenated compounds are tagged as NN.
  • Example: H-ras
  • Digit-letter combinations should be tagged as NN.
  • Example: CTNNB1
  • A hyphen should be tagged as HYPH.
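
These BioMed rules are simple enough to write as a token-level check. The sketch below is one possible reading: the exact definitions of "hyphenated compound" and "digit-letter combination" are assumptions, and `biomed_forced_tag` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed definition: alphanumeric pieces joined by one or more hyphens.
HYPHENATED = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+")

def biomed_forced_tag(token: str) -> Optional[str]:
    """Return the tag a rule forces for this token, or None if no rule applies."""
    if token == "-":
        return "HYPH"
    if HYPHENATED.fullmatch(token):
        return "NN"
    has_digit = any(ch.isdigit() for ch in token)
    has_letter = any(ch.isalpha() for ch in token)
    if token.isalnum() and has_digit and has_letter:
        return "NN"
    return None

for tok in ["H-ras", "CTNNB1", "-", "gene"]:
    print(tok, biomed_forced_tag(tok))
```

Returning None for tokens no rule covers leaves those decisions to the statistical tagger.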

Annotation wiki
  • Prior knowledge gives results competitive with
    using labeled data.
  • Future Work:
  • Improve the results for self-training.
  • Find theoretical justifications for self-training.
  • Apply PDA to more tasks and domains.

Suggestions?
Only names of persons, locations, etc. are proper
nouns, and these are very few. Gene, disease, and drug
names, etc. are marked as common nouns.
References
  • Any word unseen in the source domain and followed
    by the word "gene" should be tagged as NN.
    Example: ras gene
  • If a word does not appear with the tag NNP in the
    training data, predict NN instead of NNP.
    Example: polymerase chain reaction ( PCR )
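
The two vocabulary-based rules above can be sketched as a pass over a tagger's output. Applying them as a post-hoc override is a simplification of inference-time constraints, and `apply_vocabulary_rules`, `source_vocab`, and `nnp_words` are hypothetical names for information gathered from the source-domain training data.

```python
from typing import List, Set

def apply_vocabulary_rules(
    tokens: List[str],
    predicted_tags: List[str],
    source_vocab: Set[str],   # words seen in source-domain training data
    nnp_words: Set[str],      # words seen with tag NNP in training data
) -> List[str]:
    """Override predicted tags where one of the two rules applies."""
    tags = list(predicted_tags)
    for i, tok in enumerate(tokens):
        followed_by_gene = i + 1 < len(tokens) and tokens[i + 1].lower() == "gene"
        if tok not in source_vocab and followed_by_gene:
            tags[i] = "NN"    # rule 1: unseen word before "gene"
        elif tags[i] == "NNP" and tok not in nnp_words:
            tags[i] = "NN"    # rule 2: never NNP in training -> NN
    return tags

print(apply_vocabulary_rules(["the", "ras", "gene"], ["DT", "NNP", "NN"],
                             source_vocab={"the", "gene"}, nnp_words={"John"}))
```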

Self-training
  • J. Jiang and C. Zhai, Instance Weighting for
    Domain Adaptation in NLP, ACL 2007.
  • G. Kundu and D. Roth, Adapting Text instead of
    the Model: An Open Domain Approach, CoNLL 2011.
  • J. Blitzer, R. McDonald, F. Pereira, Domain
    Adaptation with Structural Correspondence
    Learning, EMNLP 2006.
  • Motivation: How good is self-training without
    knowledge?

Same as PDA-ST except replace the red boxed line
with the following line.
This research is sponsored by ARL and DARPA,
under the machine reading program.