Semantic Interpretation of Medical Text - PowerPoint PPT Presentation

Slides: 30
Provided by: rosa6

1
Semantic Interpretation of Medical Text
  • Barbara Rosario, SIMS
  • Steve Tu, UC Berkeley
  • Advisor: Marti Hearst, SIMS

2
Semantic Interpretation of Medical Text
  • More accurate representation of the content of
    the input text
  • Enhance text with information (concepts,
    relationships) drawn from a medical knowledge
    source
  • Determine semantic meaning of the words (and
    bigger constructs) and the relationships between
    them.

3
Combine Statistical and Symbolic Methods
  • Use of knowledge bases, semantic hierarchies,
    medical knowledge, rules
  • Use of statistical methods and machine learning
    techniques

4
Statistical methods
  • Disambiguation
  • Detection of semantic patterns
  • Classification of semantically related constructs
  • Degrees (weights, probabilities)

5
First Experiment: Noun Compounds and MeSH
  • Interpretation of noun compounds is crucially
    semantic
  • Noun compounds extracted from a collection of
    titles and abstracts of medical journals found in
    Medline
  • MeSH (Medical Subject Headings) concepts for the
    labels

6
Preprocessing
  • Tagger
  • Noun Compound Extraction
  • MeSH
  • Semantic Labeling
  • Output: Semantically Labelled Noun Compounds

7
MeSH Tree Structures (main)
  • 1. Anatomy [A]
  • 2. Organisms [B]
  • 3. Diseases [C]
  • 4. Chemicals and Drugs [D]
  • 5. Analytical, Diagnostic and Therapeutic
    Techniques and Equipment [E]
  • 6. Psychiatry and Psychology [F]
  • 7. Biological Sciences [G]
  • 8. Physical Sciences [H]
  • 9. Anthropology, Education, Sociology and
    Social Phenomena [I]
  • 10. Technology and Food and Beverages [J]
  • 11. Humanities [K]
  • 12. Information Science [L]
  • 13. Persons [M]
  • 14. Health Care [N]
  • 15. Geographic Locations [Z]

8
MeSH Tree Structures (node A expanded)
  • 1. Anatomy [A]
    • Body Regions [A01]
    • Musculoskeletal System [A02]
    • Digestive System [A03]
    • Respiratory System [A04]
    • Urogenital System [A05]
    • Endocrine System [A06]
    • Cardiovascular System [A07]
    • Nervous System [A08]
    • Sense Organs [A09]
    • Tissues [A10]
    • Cells [A11]
    • Fluids and Secretions [A12]
    • Animal Structures [A13]
    • Stomatognathic System [A14]
    • Hemic and Immune Systems [A15]
    • Embryonic Structures [A16]
  • Body Regions [A01]
    • Abdomen [A01.047]
      • Groin [A01.047.365]
      • Inguinal Canal [A01.047.412]
      • Peritoneum [A01.047.596]
      • Retroperitoneal Space [A01.047.681]
      • Umbilicus [A01.047.849]
    • Axilla [A01.133]
    • Back [A01.176]
    • Breast [A01.236]
    • Buttocks [A01.258]
    • Extremities [A01.378]
    • Head [A01.456]
    • Neck [A01.598]
    • Pelvis [A01.673]
    • Perineum [A01.719]
    • Skin [A01.835]
    • Thorax [A01.911]
    • Viscera [A01.960]

9
Mapping Nouns to MeSH Concepts
  • Ex: migraine headache recurrence
  • migraine: C10.228.140.546.800.525, C10.228.140.300.800.542,
    C14.907.253.937.542
  • headache: C23.888.592.612.441, C10.597.617.470, C23.888.646.487
  • recurrence: C23.550.291.937

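The mapping step on this slide can be sketched as a dictionary lookup. This is only an illustration: the table `MESH_CODES` and the functions `map_compound` and `first_sense` are hypothetical names, the table holds just the three nouns shown here, and picking the first listed code per noun is an assumption rather than the presenters' stated method.

```python
# Hypothetical lookup table; the codes are copied from the slide above.
MESH_CODES = {
    "migraine": [
        "C10.228.140.546.800.525",
        "C10.228.140.300.800.542",
        "C14.907.253.937.542",
    ],
    "headache": [
        "C23.888.592.612.441",
        "C10.597.617.470",
        "C23.888.646.487",
    ],
    "recurrence": ["C23.550.291.937"],
}


def map_compound(compound, table=MESH_CODES):
    """Return (noun, candidate MeSH tree numbers) for each noun in the compound."""
    return [(noun, table.get(noun, [])) for noun in compound.split()]


def first_sense(compound, table=MESH_CODES):
    """Assumed heuristic: keep the first listed code per noun, or None if unknown."""
    return [codes[0] if codes else None for _, codes in map_compound(compound, table)]


if __name__ == "__main__":
    for noun, codes in map_compound("migraine headache recurrence"):
        print(noun, codes)
    print(first_sense("migraine headache recurrence"))
```

A noun such as "migraine" maps to several tree numbers, so some selection step is needed before a compound gets one code per noun.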
10
More Noun Compounds
  • migraine headache recurrence
  • C10.228.140.546.800.525 C23.888.592.612.441
    C23.550.291.937
  • blood plasma perfusion
  • A12.207.152 A15.145.693 E05.680
  • migraine headache pain
  • C10.228.140.546.800.525 C23.888.592.612.441
    G11.561.796.444
  • brain stem neurons
  • A08.186.211 E05.595.402.541.250 A08.663
  • rat liver mitochondria
  • B02.649.865.635.560 A03.620 A11.368.702.564
  • plasma arginine vasopressin
  • A15.145.693 D12.125.095.104 D06.472.734.692.781
  • rat thyroid cells
  • B02.649.865.635.560 A06.407.900 A11
  • growth hormone secretion
  • G07.553.481 D27.505.440.472 A12.200
  • blood urea nitrogen
  • A12.207.152 D02.948 D01.362.625
  • breast cancer cells
  • A01.236 C04 A11
  • cancer cell lines
  • C04 A11 G05.331.599.110.708.330.800.400

11
Attachment and Semantic Interpretation
  • Attachment classification
  • acute migraine treatment N N N (LA)
  • intra-nasal migraine treatment N N N (RA)
  • To bootstrap semantic interpretation
  • Decision tree (Quinlan)

12
Levels of Descriptions
  • migraine headache recurrence (LA)
  • C10.228.140.546.800.525 C23.888.592.612.441
    C23.550.291.937

Feature vector
  • Only Tree: C, C, C
  • Level 1: C, 10, C, 23, C, 23
  • Level 2: C, 10.228, C, 23.888, C, 23.550
  • Level 3: C, 10.228.140, C, 23.888.592, C, 23.550.291
  • Level 4: C, 10.228.140.546, C, 23.888.592.612, C, 23.550.291.937

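The levels of description above amount to truncating each MeSH tree number to a given depth. A minimal sketch, assuming each code is one tree letter followed by dot-separated numeric parts; the function names `describe` and `feature_vector` are illustrative, not taken from the presentation.

```python
def describe(code, level):
    """Split a MeSH tree number into its tree letter and a numeric prefix.

    level 0 corresponds to "Only Tree"; level n keeps the first n numeric parts.
    """
    tree, rest = code[0], code[1:]
    if level == 0:
        return [tree]
    parts = rest.split(".")
    return [tree, ".".join(parts[:level])]


def feature_vector(codes, level):
    """Concatenate the per-noun descriptions into one flat feature vector."""
    features = []
    for code in codes:
        features.extend(describe(code, level))
    return features


if __name__ == "__main__":
    codes = [
        "C10.228.140.546.800.525",  # migraine
        "C23.888.592.612.441",      # headache
        "C23.550.291.937",          # recurrence
    ]
    for level in range(5):
        print(level, feature_vector(codes, level))
```

Codes shorter than the requested level (for example `A11`) are simply kept whole, since slicing past the end of the parts list returns what is there.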
13
Decision Tree Classification
             Training         Training        Testing          Testing
             before pruning   after pruning   before pruning   after pruning
Only Tree    15.8             16.4            17.3             17.3
Level 1      11.2             11.8            15.4             15.4
Level 2      7.9              8.6             21.2             17.3
Level 3      7.9              10.5            26.9             17.3
Level 4      8.6              9.9             25.0             19.2

14
Expressiveness of Decision Trees
  • first noun tree B ra (33.0/3.7)
  • first noun tree E ra (2.0/1.6)
  • first noun tree F la (0.0)
  • first noun tree G la (4.0/0.3)
  • first noun tree A
    • second noun tree B la (0.0)
    • second noun tree D la (4.0/0.3)
    • second noun tree E la (10.0/0.4)
    • second noun tree F la (0.0)
    • second noun tree G la (6.0/1.6)
    • second noun tree A
      • first tree position < 4 ra (7.0/1.6)
      • first tree position > 4 la (36.0/5.8)
    • second noun tree C
      • third noun tree A ra (9.0/0.3)
      • third noun tree B la (0.0)
      • third noun tree D la (1.0/0.3)
      • third noun tree E la (5.0/0.3)
      • third noun tree F la (0.0)

15
(No Transcript)
16
Semantic Interpretation
  • Use decision tree paths for the detection of
    clusters of noun compounds with the same semantic
    interpretation

17
Ex: ACA <anatomy> <disease> <anatomy>
  • breast cancer cells  A01.236 C04 A11  ra
  • bladder cancer cells  A05.810.161 C04 A11  ra
  • colon carcinoma cells  A03.492.411.495 C04.557.470 A11  ra
  • prostate tumor cells  A10.336.707 C04 A11  ra
  • prostate cancer tissue  A10.336.707 C04 A10  ra
  • lung cancer cells  A04.411 C04 A11  ra
  • colon cancer cells  A03.492.411.495.356 C04 A11  ra
  • brain tumor tissue  A08.186.211 C04 A10  ra
  • colon cancer tissues  A03.492.411.495.356 C04 A10  ra
  • bladder tumor cells  A05.810.161 C04 A11  ra
Interpretation: <Part of Body> noun3 exhibits <Disease> noun2 in <location> noun1

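One simple way to see clusters such as ACA is to group compounds by the top-level tree letter of each noun's code. This is a simplification for illustration: the presentation derives clusters from decision tree paths, whereas the hypothetical `tree_pattern` and `cluster_by_pattern` below only look at the first letter of each code. The sample data is copied from the slides.

```python
from collections import defaultdict

# A few compounds and their MeSH codes, copied from the slides.
COMPOUNDS = [
    ("breast cancer cells", ["A01.236", "C04", "A11"]),
    ("bladder cancer cells", ["A05.810.161", "C04", "A11"]),
    ("brain tumor tissue", ["A08.186.211", "C04", "A10"]),
    ("breast cancer treatment", ["A01.236", "C04", "E02"]),
    ("muscle disease diagnosis", ["A10.690", "C23.550.288", "E01"]),
]


def tree_pattern(codes):
    """Join the tree letters of the codes, e.g. ['A01.236', 'C04', 'A11'] -> 'ACA'."""
    return "".join(code[0] for code in codes)


def cluster_by_pattern(compounds):
    """Group compound strings by their tree-letter pattern, preserving input order."""
    clusters = defaultdict(list)
    for text, codes in compounds:
        clusters[tree_pattern(codes)].append(text)
    return dict(clusters)


if __name__ == "__main__":
    for pattern, members in cluster_by_pattern(COMPOUNDS).items():
        print(pattern, members)
```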
18
Ex: ACE <anatomy> <disease> <Analytical,
Diagnostic and Therapeutic Techniques and
Equipment>
  • muscle disease diagnosis  A10.690 C23.550.288 E01  la
  • breast cancer prognosis  A01.236 C04 E01.789  la
  • breast cancer treatment  A01.236 C04 E02  la
  • hip fracture treatment  A01.378.592 C21.866.405 E02  la
  • cell cancer treatment  A11 C04 E02  la
  • brain tumor treatment  A08.186.211 C04 E02  la
  • colon adenocarcinoma xenograft  A03.492.411.495.356 C04.557.470.200.025 E04.936.764
  • colon carcinoma xenograft  A03.492.411.495.356 C04.557.470.200 E04.936.764
  • colon carcinoma xenografts  A03.492.411.495.356 C04.557.470.200 E04.936.764
  • neck cancer xenografts  A01.598 C04 E04.936.764
Interpretation:
  1. <Diagnosis [E01]> noun3 diagnoses <Disease> noun2 in <location> noun1
  2. <Therapeutics [E02] | Surgical Procedure [E04]> noun3 treats <Disease> noun2 in <location> noun1

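The two interpretations above depend on which E sub-branch the third noun falls under. A minimal sketch of that rule, assuming the third code's first three characters identify the sub-branch; the function `interpret_ace` and the template strings are illustrative wording based on the slide, not the presenters' code.

```python
# Verb chosen per E sub-branch, following the two interpretations on the slide.
VERB_BY_BRANCH = {
    "E01": "diagnoses",  # Diagnosis
    "E02": "treats",     # Therapeutics
    "E04": "treats",     # Surgical Procedure
}


def interpret_ace(compound, codes):
    """Fill the slide's template for an anatomy-disease-technique compound.

    Returns None when the compound does not match the A, C, E pattern or the
    third noun's sub-branch has no template.
    """
    nouns = compound.split()
    if len(nouns) != 3 or len(codes) != 3:
        return None
    if [code[0] for code in codes] != ["A", "C", "E"]:
        return None
    verb = VERB_BY_BRANCH.get(codes[2][:3])
    if verb is None:
        return None
    noun1, noun2, noun3 = nouns
    return f"{noun3} {verb} {noun2} in {noun1}"


if __name__ == "__main__":
    print(interpret_ace("breast cancer treatment", ["A01.236", "C04", "E02"]))
    print(interpret_ace("muscle disease diagnosis", ["A10.690", "C23.550.288", "E01"]))
```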
19
From MeSH to UMLS
  • Unified Medical Language System, a project at the U.S.
    National Library of Medicine
  • 3 UMLS Knowledge Sources
  • Metathesaurus
  • Semantic Network
  • SPECIALIST lexicon and programs

20
Metathesaurus
  • Most extensive of UMLS sources
  • 730,000 concepts representing more than 1,500,000
    strings in over 60 vocabularies and
    classifications
  • Organized by concept or meaning.
  • In essence, its purpose is to link alternative
    names and views of the same concept together and
    to identify useful relationships between
    different concepts.
  • Relationships in the Metathesaurus come from the
    sources themselves or are created by the
    Metathesaurus editors.

21
Semantic Network
  • Consistent categorization of all concepts
    represented in the UMLS Metathesaurus and the
    important relationships between them.
  • Every concept has been assigned a semantic type.
  • The semantic types (134) are the nodes in the
    Network, and the relationships between them are
    the links (54)
  • High level semantic structure

22
"Biologic Function" Hierarchy
23
Noun Compounds, again
  • Very preliminary studies
  • Can we use the information in the Semantic Network
    for the semantic interpretation of noun
    compounds?
  • Are semantic types and relationships good
    descriptors? Are they useful for disambiguation
    and classification?

24
Mapping of Noun Compounds
NC: peptide CRF receptor antagonists
C0030956 | C0010132 | C0597357 | C0243076
Amino Acid, Peptide, or Protein | Hormone | Receptor | Pharmacologic Substance
A1.4.1.2.1.7 | A1.4.1.1.3.2 | A1.4.1.1.3.6 | A1.4.1.1.1
  • rel_12.1 (Amino Acid, Peptide, or Protein, Hormone) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.2
  • rel_13.1 (Amino Acid, Peptide, or Protein, Receptor) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.6
  • rel_14.1 (Amino Acid, Peptide, or Protein, Pharmacologic Substance) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.1
  • rel_23.1 (Hormone, Receptor) interacts_with A1.4.1.1.3.2 R3.1.5 A1.4.1.1.3.6
  • rel_24.1 (Hormone, Pharmacologic Substance) interacts_with A1.4.1.1.3.2 R3.1.5 A1.4.1.1.1
  • rel_34.1 (Receptor, Pharmacologic Substance) interacts_with A1.4.1.1.3.6 R3.1.5 A1.4.1.1.1

25
Mapping of Noun Compounds
NC: day hospital treatment
C0439228 | C0019994 | C0039798, C0087111
Temporal Concept | Health Care Related Organization | Functional Concept, Therapeutic or Preventive Procedure
A2.1.1 | A2.7.1 | A2.1.4, B1.3.1.3
  • rel_12.1 (Temporal Concept, Health Care Related Organization) NOT found in SemNet
  • rel_13.1 (Temporal Concept, Functional Concept) NOT found in SemNet
  • rel_13.2 (Temporal Concept, Therapeutic or Preventive Procedure) NOT found in SemNet
  • rel_23.1 (Health Care Related Organization, Functional Concept) NOT found in SemNet
  • rel_23.2 (Health Care Related Organization, Therapeutic or Preventive Procedure) location_of R2.1

26
Mapping of Noun Compounds
NC: brain serotonin metabolism
C0006104 | C0036751 | C0025519, C0025520
Body Part, Organ, or Organ Component | Neuroreactive Substance or Biogenic Amine | Organism Function, Functional Concept
A1.2.3.1 | A1.4.1.1.3.1 | B2.2.1.1.1, A2.1.4
  • rel_12.1 (Body Part, Organ, or Organ Component, Neuroreactive Substance or Biogenic Amine) produces R3.2.1
  • rel_13.1 (Body Part, Organ, or Organ Component, Organism Function) location_of R2.1
  • rel_13.2 (Body Part, Organ, or Organ Component, Functional Concept) NOT found in SemNet
  • rel_23.1 (Neuroreactive Substance or Biogenic Amine, Organism Function) disrupts R3.1.3
  • rel_23.2 (Neuroreactive Substance or Biogenic Amine, Functional Concept) NOT found in SemNet

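The pairwise lookups in these mapping slides can be sketched as a table keyed by ordered pairs of semantic types. The table `RELATIONS` below holds only the three relations shown for "brain serotonin metabolism" and is not the full Semantic Network; `relations_between` and `supported_types` are hypothetical helpers, and using "has a relation" as a disambiguation cue is an assumption about how such a filter might work.

```python
from itertools import product

NOT_FOUND = "NOT found in SemNet"

# Tiny excerpt of type-pair relations, copied from the slide above.
RELATIONS = {
    ("Body Part, Organ, or Organ Component",
     "Neuroreactive Substance or Biogenic Amine"): "produces",
    ("Body Part, Organ, or Organ Component", "Organism Function"): "location_of",
    ("Neuroreactive Substance or Biogenic Amine", "Organism Function"): "disrupts",
}


def relations_between(types_a, types_b, table=RELATIONS):
    """Look up every (type of noun a, type of noun b) pair in the relation table."""
    return [(a, b, table.get((a, b), NOT_FOUND)) for a, b in product(types_a, types_b)]


def supported_types(context_types, candidate_types, table=RELATIONS):
    """Keep the candidate types that have a known relation with some context type."""
    return [
        candidate
        for candidate in candidate_types
        if any((context, candidate) in table for context in context_types)
    ]


if __name__ == "__main__":
    serotonin = ["Neuroreactive Substance or Biogenic Amine"]
    metabolism = ["Organism Function", "Functional Concept"]
    for row in relations_between(serotonin, metabolism):
        print(row)
    print(supported_types(serotonin, metabolism))
```

Here "metabolism" has two candidate types, and only "Organism Function" takes part in a listed relation, which is the kind of evidence the next slide evaluates.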
27
Mapping Words - Semantic Types, Semantic
Relationships
  • Semantic types correctly assigned (on 246 nc, 738
    nouns): 59%
  • Semantic types disambiguated by the relationships
    • Doesn't disambiguate: 42.7%
    • Disambiguates wrong: 17.3%
    • Disambiguates correctly: 40%

28
(Some of) Future Work
  • Explore UMLS sources in more depth
  • What forms the best basis for automatic semantic
    interpretation of noun phrases?
  • Semantic types?
  • Metathesaurus concepts? (and what parts of them)
  • Just MeSH concepts?
  • Machine Learning algorithms to help choose a good
    representation of medical terms

29
Future Work
  • Machine learning algorithms for classification
  • Can we (and how) generalize patterns found for
    noun compounds to other syntactic structures?
  • How can we best formally represent semantics?
  • How can we combine symbolic rules with
    statistical methods?
  • How can we deal with non-medical words?
  • Can the system help us disambiguate them?
  • Should we use other ontologies (e.g., WordNet)?