Classification of Semantic Relations in Noun Compounds using MeSH

1
Classification of Semantic Relations in Noun
Compounds using MeSH
  • Marti Hearst, Barbara Rosario
  • SIMS, UC Berkeley

2
LINDI Project Synopsis
  • Goal: Extract semantics from text
  • Method: statistical corpus analysis
  • Focus: BioMedical text
  • Interesting inferences (Swanson)
  • Rich lexical resources
  • Difficult NLP problems
  • Noun Compounds

3
Noun Compounds (NCs)
  • Any sequence of nouns that itself functions as a
    noun
  • asthma hospitalizations
  • asthma hospitalization rates
  • bone marrow aspiration needle
  • health care personnel hand wash
  • Technical text is rich with NCs
  • Open-labeled long-term study of the
    subcutaneous sumatriptan efficacy and
    tolerability in acute migraine treatment.

4
NCs: 3 computational tasks (Lauer & Dras 94)
  • Identification
  • Syntactic analysis (attachments)
  • Baseline headache frequency
  • Tension headache patient
  • Semantic analysis
  • Headache treatment: treatment for headache
  • Corticosteroid treatment: treatment that uses
    corticosteroid

5
NC Semantic Relations
  • Linguistic theories regarding the nature of the
    relations between constituents in NCs all
    conflict.
  • J. Levi 78
  • P. Downing 77
  • B. Warren 78

6
NC Semantic relations
  • 38 Relations found by iterative refinement based
    on 2245 NCs
  • Goals
  • More specific than case roles
  • General enough to aid coverage
  • Allow for domain-specific relations

7
Semantic relations
  • Examples
  • Frequency/time of: influenza season, headache interval
  • Measure of: relief rate, asthma mortality, hospital survival
  • Instrument: aciclovir therapy, laser irradiation, aerosol
    treatment
  • Purpose: headache drugs, voice therapy, influenza
    treatment
  • Defect: hormone deficiency, csf fistulas, gene mutation
  • Inhibitor: adrenoreceptor blockers, influenza prevention

8
Multi-class Assignment
  • Some NCs can be described by more than one
    semantic relation
  • eyelid abnormalities: location and defect
  • food allergy: cause and activator
  • cell growth: change and activity
  • tumor regression: change and ending/reduction

9
Extraction of NCs
  1. Titles and abstracts from Medline (medical
    bibliographic database)
  2. Part of Speech Tagger
  3. Extraction of sequences of units tagged as nouns
  4. Collection of 2245 NCs with 2 nouns
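
The extraction steps above can be sketched roughly as follows. This is only an illustration, not the authors' code: it assumes the text has already been tagged with Penn Treebank-style tags (where noun tags start with "NN"), and the function name extract_noun_compounds is hypothetical.

```python
from typing import Iterable, List, Tuple


def extract_noun_compounds(
    tagged: Iterable[Tuple[str, str]], min_len: int = 2, max_len: int = 2
) -> List[Tuple[str, ...]]:
    """Return maximal runs of noun-tagged tokens whose length is within bounds.

    Assumption: noun tags start with "NN" (Penn Treebank convention).
    """
    compounds: List[Tuple[str, ...]] = []
    run: List[str] = []
    # A sentinel with a non-noun tag flushes a run that ends the input.
    for token, tag in list(tagged) + [("", "END")]:
        if tag.startswith("NN"):
            run.append(token.lower())
        else:
            if min_len <= len(run) <= max_len:
                compounds.append(tuple(run))
            run = []
    return compounds


# Hand-tagged toy sentence (invented for illustration).
sentence = [
    ("Recurrent", "JJ"), ("headache", "NN"), ("pain", "NN"),
    ("was", "VBD"), ("reported", "VBN"), ("during", "IN"),
    ("influenza", "NN"), ("season", "NN"), (".", "."),
]
print(extract_noun_compounds(sentence))
```

With the default bounds only two-noun sequences are kept, matching step 4; longer runs such as "bone marrow aspiration needle" would need max_len raised.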

10
Models
  • Lexical (words)
  • headache pain
  • Class-based model using MeSH descriptors at different
    levels of description
  • MeSH 2: C23 G11
  • MeSH 3: C23.888 G11.561
  • MeSH 4: C23.888.592 G11.561.796
  • MeSH 5: C23.888.592 G11.561.796
  • MeSH 6: C23.888.592.612 G11.561.796.444

11
MeSH Tree Structures
  • 1. Anatomy A
  • 2. Organisms B
  • 3. Diseases C
  • 4. Chemicals and Drugs D
  • 5. Analytical, Diagnostic and Therapeutic
    Techniques and Equipment E
  • 6. Psychiatry and Psychology F
  • 7. Biological Sciences G
  • 8. Physical Sciences H
  • 9. Anthropology, Education, Sociology and
    Social Phenomena I
  • 10. Technology and Food and Beverages J
  • 11. Humanities K
  • 12. Information Science L
  • 13. Persons M
  • 14. Health Care N
  • 15. Geographic Locations Z

12
MeSH Tree Structures
  • 1. Anatomy A
    • Body Regions A01
    • Musculoskeletal System A02
    • Digestive System A03
    • Respiratory System A04
    • Urogenital System A05
    • Endocrine System A06
    • Cardiovascular System A07
    • Nervous System A08
    • Sense Organs A09
    • Tissues A10
    • Cells A11
    • Fluids and Secretions A12
    • Animal Structures A13
    • Stomatognathic System A14
    • (...)
  • Body Regions A01
    • Abdomen A01.047
      • Groin A01.047.365
      • Inguinal Canal A01.047.412
      • Peritoneum A01.047.596
      • Umbilicus A01.047.849
    • Axilla A01.133
    • Back A01.176
    • Breast A01.236
    • Buttocks A01.258
    • Extremities A01.378
    • Head A01.456
    • Neck A01.598
    • (...)

13
Mapping Nouns to MeSH Concepts
  • headache recurrence
  • C23.888.592.612.441 C23.550.291.937
  • headache pain
  • C23.888.592.612.441 G11.561.796.444
  • breast cancer cells
  • A01.236 C04 A11
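
A minimal sketch of this mapping step, assuming a plain dictionary lookup from noun to MeSH tree code. The table below is hypothetical and holds only the codes shown on this slide; a real system would consult the full MeSH vocabulary and would have to handle nouns with several codes.

```python
from typing import List, Optional

# Hypothetical lookup table, filled only with the codes from the slide.
MESH_LOOKUP = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
    "breast": "A01.236",
    "cancer": "C04",
    "cells": "A11",
}


def map_compound(compound: str) -> List[Optional[str]]:
    """Map each noun of a compound to its MeSH code, or None if unmapped."""
    return [MESH_LOOKUP.get(noun.lower()) for noun in compound.split()]


for nc in ("headache recurrence", "headache pain", "breast cancer cells"):
    print(nc, "->", map_compound(nc))
```

Returning None for unmapped nouns keeps the positions aligned with the original compound, which matters when the codes are later used as per-noun features.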

14
Levels of Description
  • headache pain (C23.888.592.612.441
    G11.561.796.444)
  • Only Tree: C G
  • C (Diseases)
  • G (Biological Sciences)
  • Level 1: C 23 G 11
  • C 23 (Diseases: Pathological Conditions)
  • G 11 (Biological Sciences: Musculoskeletal,
    Neural, and Ocular Physiology)
  • Level 2: C 23 888 G 11 561
  • C 23.888 (Diseases: Pathological Conditions: Signs
    and symptoms)
  • G 11.561 (Biological Sciences: Musculoskeletal,
    Neural, and Ocular Physiology: Nervous System
    Physiology)
  • Level 3: C 23 888 592 G 11 561 796
  • C 23.888.592 (Diseases: Pathological Conditions:
    Signs and symptoms: Neurologic Manifestations)
  • G 11.561.796 (Biological Sciences:
    Musculoskeletal, Neural, and Ocular
    Physiology: Nervous System Physiology: Sensation)
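
The levels of description amount to truncating a MeSH tree code after a given number of dot-separated fields. A small sketch, using the level numbering of this slide (level 0 is the tree letter alone, level 1 is C23, and so on); the function name truncate_mesh is an assumption made for illustration.

```python
def truncate_mesh(code: str, level: int) -> str:
    """Truncate a MeSH tree code to the given level of description.

    Level 0 keeps only the tree letter; level n keeps the letter plus
    the first n dot-separated numeric fields. Codes shorter than the
    requested level are returned unchanged.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    tree_letter, numeric_part = code[0], code[1:]
    fields = numeric_part.split(".") if numeric_part else []
    return tree_letter + ".".join(fields[:level])


headache, pain = "C23.888.592.612.441", "G11.561.796.444"
for level in range(4):
    print(level, truncate_mesh(headache, level), truncate_mesh(pain, level))
```

Each compound then yields one feature pair per level, so a classifier can be trained separately at each granularity and the results compared.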

15
Classification Task Method
  • Multi-class (18) classification problem
  • Multi-layer Neural Networks to classify across
    all relations simultaneously.
  • Evaluation: distinguish between
  • Seen NCs: where 1 or 2 words appeared in the
    training set
  • Unseen NCs: in which neither word appeared in the
    training set
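
The seen/unseen distinction can be made concrete with a short sketch. It assumes each NC is a tuple of nouns and that "appeared in the training set" means the noun occurs in any position of any training NC; the function name and toy data are invented for illustration.

```python
from typing import List, Sequence, Tuple

NounCompound = Tuple[str, ...]


def split_seen_unseen(
    train: Sequence[NounCompound], test: Sequence[NounCompound]
) -> Tuple[List[NounCompound], List[NounCompound]]:
    """Split test NCs into seen (at least one training noun) and unseen (none)."""
    vocabulary = {noun for nc in train for noun in nc}
    seen = [nc for nc in test if any(noun in vocabulary for noun in nc)]
    unseen = [nc for nc in test if not any(noun in vocabulary for noun in nc)]
    return seen, unseen


# Toy data, invented for illustration.
train_ncs = [("headache", "pain"), ("influenza", "season")]
test_ncs = [("headache", "recurrence"), ("pollen", "season"), ("gene", "mutation")]
seen_ncs, unseen_ncs = split_seen_unseen(train_ncs, test_ncs)
print("seen:", seen_ncs)
print("unseen:", unseen_ncs)
```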

16
Accuracy for 18-way Classification
  • Correct answer in first three (76-78%)
  • Correct answer in first two (71-73%)
  • Correct answer ranked first (61-62%)
  • Training: 855 NCs (50%)
  • Testing: 805 NCs (75 unseen)
  • Baseline (guessing most frequent class)

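
The three accuracy figures above correspond to checking whether the correct relation is among the classifier's top one, two, or three ranked outputs. A sketch of that measure, assuming the classifier returns a ranked list of relation labels per NC; the function name and the toy rankings are invented and are not results from the study.

```python
from typing import List, Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Fraction of items whose gold label is among the first k ranked labels."""
    if not gold or len(ranked_predictions) != len(gold):
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    hits = sum(1 for ranked, label in zip(ranked_predictions, gold) if label in ranked[:k])
    return hits / len(gold)


# Toy rankings, invented for illustration only.
rankings: List[List[str]] = [
    ["purpose", "instrument", "defect"],
    ["defect", "purpose", "instrument"],
    ["instrument", "defect", "purpose"],
    ["purpose", "defect", "instrument"],
]
gold_labels = ["purpose", "purpose", "purpose", "instrument"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, gold_labels, k))
```
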
17
Accuracies for 18-way classification: generalization on unseen NCs
  • Training: 73 NCs (5%)
  • Testing: 1587 NCs (810 unseen) (95%)
  • Plotted series: MeSH, MeSH on unseen, Lexical, Lexical on unseen

18
Accuracies by Unseen Noun
  • Case 1: first N unseen
  • Case 2: second N unseen
  • Case 3: both N seen
  • Case 4: neither N seen
  • Training: 73 NCs (5%)
  • Testing: 1587 NCs (810 unseen) (95%)

19
Accuracy for each relation

20
Accuracy for sample relations
  • Produces (genetic)
  • Ex. Test Set: thymidine allele, tumor dna, csf mrna,
    acetylase gene, virion rna (...)

21
Accuracy for sample relations
  • Frequency/time of
  • Test Set: disease recurrence, headache recurrence,
    enterovirus season, influenza season, mosquito season,
    pollen season, disease stage, transcription stage,
    drive time, injection time, ischemia time, travel time

22
Accuracy for sample relations
  • Purpose
  • Test Set: varicella vaccine, tb vaccines, poliovirus
    vaccine, influenza vaccination, influenza immunization,
    abscess drainage, acne therapy, asthma therapy, asthma
    treatment, carcinogen treatment, disease treatment,
    hiv treatment

23
Related work
  • Finin (1980)
  • Detailed AI analysis, hand-coded
  • Rindflesch et al. (2000)
  • Hand-coded rule base to extract certain types of
    assertions

24
Related work
  • Vanderwende (1994)
  • automatically extracts semantic information from
    an on-line dictionary
  • manipulates a set of handwritten rules
  • 13 classes
  • 52% accuracy
  • Lapata (2000)
  • classifies nominalizations into subject/object
    binary distinction
  • 80% accuracy
  • Lauer (1995)
  • probabilistic model
  • 8 classes
  • 47% accuracy

25
Related work
  • Prepositional Phrase Attachment
  • The problem
  • Eat spaghetti with a fork
  • Eat spaghetti with sauce
  • V N1 P N2
  • Attachment/association, not semantics
  • Approaches
  • Word occurrences (Hindle & Rooth 93)
  • Using a lexical hierarchy
  • Conceptual association (Resnik 93, Resnik &
    Hearst 93)
  • Transformation-based (Brill & Resnik 94)
  • MDL to find optimal tree cut (Li & Abe 98)
  • LINDI: use ML techniques to determine appropriate
    level of lexical hierarchy, classify into
    semantic relations

26
Conclusions
  • A simple method for assigning semantic relations
    to noun compounds
  • Does not require complex hand-coded rules
  • Does make use of existing lexical resources
  • High accuracy levels for an 18-way class
    assignment
  • Small training set gets 60% accuracy on mixed
    seen and unseen words
  • Tiny training set (73 NCs) gets 40% accuracy on
    entirely unseen words
  • Off-the-shelf, unoptimized ML algorithms

27
Future work
  • Analysis of cases where it doesn't work
  • NCs with > 2 terms
  • How to generalize patterns found for noun
    compounds to other syntactic structures?
  • How can we best formally represent semantics?
  • How can we deal with non-medical words?
  • Should we use other ontologies (e.g., WordNet)?

28
Using Relations
  • Eventual plan: combine relations with
    constituents' ontology memberships
  • Examples
  • Instrument_2(biopsy, needle) ->
    Instrument_2(Diagnostic, Tool)
  • Procedure(brain, biopsy) ->
    Procedure(Anatomical-Element, Diagnostic)
  • Procedure(tumor, marker) ->
    Procedure(Disease-element, Indicator)
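
One way to read the examples above is as a substitution of each constituent by its ontology class. The sketch below assumes a flat noun-to-class table containing only the memberships shown on this slide; the table, the function name, and the tuple representation are hypothetical and are not a description of the planned system.

```python
from typing import Dict, Tuple

# Hypothetical class memberships, taken only from the slide's examples.
ONTOLOGY_CLASS: Dict[str, str] = {
    "biopsy": "Diagnostic",
    "needle": "Tool",
    "brain": "Anatomical-Element",
    "tumor": "Disease-element",
    "marker": "Indicator",
}


def generalize(relation: str, nouns: Tuple[str, ...]) -> Tuple[str, Tuple[str, ...]]:
    """Replace each noun argument of a relation by its ontology class."""
    return relation, tuple(ONTOLOGY_CLASS[noun] for noun in nouns)


print(generalize("Instrument_2", ("biopsy", "needle")))
print(generalize("Procedure", ("brain", "biopsy")))
print(generalize("Procedure", ("tumor", "marker")))
```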