Title: Semantic Interpretation of Medical Text
1Semantic Interpretation of Medical Text
- Barbara Rosario, SIMS
- Steve Tu, UC Berkeley
- Advisor: Marti Hearst, SIMS
2Semantic Interpretation of Medical Text
- More accurate representation of the content of the input text
- Enhance text with information (concepts, relationships) drawn from a medical knowledge source
- Determine semantic meaning of the words (and bigger constructs) and the relationships between them.
3Combine Statistical and Symbolic Methods
- Use of knowledge bases, semantic hierarchies, medical knowledge, rules
- Use of statistical methods and machine learning techniques
4Statistical methods
- Disambiguation
- Detection of semantic patterns
- Classification of semantically related constructs
- Degrees (weights, probabilities)
5First Experiment: Noun Compounds and MeSH
- Interpretation of noun compounds is crucially semantic
- Noun compounds extracted from a collection of titles and abstracts of medical journals found in Medline
- MeSH (Medical Subject Headings) concepts for the labels
6Preprocessing
Tagger
Noun Compound Extraction
MeSH
Semantic Labeling
Output Semantic Labelled Noun Compounds
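The extraction step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the tagger emits (word, tag) pairs with Penn Treebank-style noun tags (NN, NNS, ...), and the function name and the minimum run length of two nouns are hypothetical choices.

```python
def extract_noun_compounds(tagged, min_len=2):
    """Return maximal runs of consecutive nouns as lists of words."""
    compounds, run = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):      # assumed Penn Treebank-style noun tags
            run.append(word)
            continue
        if len(run) >= min_len:       # a non-noun closes the current run
            compounds.append(run)
        run = []
    if len(run) >= min_len:           # flush a run that ends the input
        compounds.append(run)
    return compounds


tagged = [("the", "DT"), ("migraine", "NN"), ("headache", "NN"),
          ("recurrence", "NN"), ("was", "VBD"), ("rare", "JJ")]
print(extract_noun_compounds(tagged))
```

Each extracted compound would then be passed to the MeSH lookup and semantic labeling steps.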
7MeSH Tree Structures (main)
- 1. Anatomy A
- 2. Organisms B
- 3. Diseases C
- 4. Chemicals and Drugs D
- 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment E
- 6. Psychiatry and Psychology F
- 7. Biological Sciences G
- 8. Physical Sciences H
- 9. Anthropology, Education, Sociology and Social Phenomena I
- 10. Technology and Food and Beverages J
- 11. Humanities K
- 12. Information Science L
- 13. Persons M
- 14. Health Care N
- 15. Geographic Locations Z
8MeSH Tree Structures (node A expanded)
- 1. Anatomy A
- Body Regions A01
- Musculoskeletal System A02
- Digestive System A03
- Respiratory System A04
- Urogenital System A05
- Endocrine System A06
- Cardiovascular System A07
- Nervous System A08
- Sense Organs A09
- Tissues A10
- Cells A11
- Fluids and Secretions A12
- Animal Structures A13
- Stomatognathic System A14
- Hemic and Immune Systems A15
- Embryonic Structures A16
- Body Regions A01
- Abdomen A01.047
- Groin A01.047.365
- Inguinal Canal A01.047.412
- Peritoneum A01.047.596
- Retroperitoneal Space A01.047.681
- Umbilicus A01.047.849
- Axilla A01.133
- Back A01.176
- Breast A01.236
- Buttocks A01.258
- Extremities A01.378
- Head A01.456
- Neck A01.598
- Pelvis A01.673
- Perineum A01.719
- Skin A01.835
- Thorax A01.911
- Viscera A01.960
9Mapping Nouns to MeSH Concepts
- Ex: migraine headache recurrence
migraine: C10.228.140.546.800.525 C10.228.140.300.800.542 C14.907.253.937.542
headache: C23.888.592.612.441 C10.597.617.470 C23.888.646.487
recurrence: C23.550.291.937
10More Noun Compounds
- migraine headache recurrence
- C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
- blood plasma perfusion
- A12.207.152 A15.145.693 E05.680
- migraine headache pain
- C10.228.140.546.800.525 C23.888.592.612.441 G11.561.796.444
- brain stem neurons
- A08.186.211 E05.595.402.541.250 A08.663
- rat liver mitochondria
- B02.649.865.635.560 A03.620 A11.368.702.564
- plasma arginine vasopressin
- A15.145.693 D12.125.095.104 D06.472.734.692.781
- rat thyroid cells
- B02.649.865.635.560 A06.407.900 A11
- growth hormone secretion
- G07.553.481 D27.505.440.472 A12.200
- blood urea nitrogen
- A12.207.152 D02.948 D01.362.625
- breast cancer cells
- A01.236 C04 A11
- cancer cell lines
- C04 A11 G05.331.599.110.708.330.800.400
11Attachment and Semantic Interpretation
- Attachment classification
- acute migraine treatment N N N (LA)
- intra-nasal migraine treatment N N N (RA)
- To bootstrap semantic interpretation
- Decision tree (Quinlan)
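The two attachment labels can be made concrete with explicit bracketing. The sketch below assumes that LA and RA stand for left and right attachment in the usual sense (the first two words group together versus the last two); the function name is hypothetical.

```python
def bracket(w1, w2, w3, attachment):
    """Render a three-word compound with explicit bracketing."""
    label = attachment.lower()
    if label == "la":                 # left attachment: first two words group
        return f"[[{w1} {w2}] {w3}]"
    if label == "ra":                 # right attachment: last two words group
        return f"[{w1} [{w2} {w3}]]"
    raise ValueError(f"unknown attachment label: {attachment!r}")


print(bracket("acute", "migraine", "treatment", "LA"))
print(bracket("intra-nasal", "migraine", "treatment", "RA"))
```

Under this reading, the first example is a treatment for acute migraine, while the second is a migraine treatment that is intra-nasal.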
12Levels of Descriptions
- migraine headache recurrence (LA)
- C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
Feature vector
Only Tree   C, C, C
Level 1     C, 10, C, 23, C, 23
Level 2     C, 10.228, C, 23.888, C, 23.550
Level 3     C, 10.228.140, C, 23.888.592, C, 23.550.291
Level 4     C, 10.228.140.546, C, 23.888.592.612, C, 23.550.291.937
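The levels of description amount to truncating each MeSH tree number at a given depth. A minimal sketch, assuming one tree number per noun has already been chosen; the function name and the use of level 0 for "Only Tree" are illustrative choices:

```python
def feature_vector(tree_numbers, level):
    """Truncate each MeSH tree number to `level` dotted components.

    level=0 keeps only the tree letter ("Only Tree" on the slide).
    """
    features = []
    for code in tree_numbers:
        letter, rest = code[0], code[1:]
        features.append(letter)
        if level > 0 and rest:
            # keep the first `level` components of the numeric part
            features.append(".".join(rest.split(".")[:level]))
    return features


codes = ["C10.228.140.546.800.525", "C23.888.592.612.441", "C23.550.291.937"]
for level in range(5):
    print(level, feature_vector(codes, level))
```

A tree number shorter than the requested level is simply kept whole, as with the third noun at Level 4.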
13Decision Tree Classification
            Training        Training       Testing         Testing
            before pruning  after pruning  before pruning  after pruning
Only Tree   15.8            16.4           17.3            17.3
Level 1     11.2            11.8           15.4            15.4
Level 2     7.9             8.6            21.2            17.3
Level 3     7.9             10.5           26.9            17.3
Level 4     8.6             9.9            25.0            19.2
14Expressiveness of Decision Trees
- first noun tree B: ra (33.0/3.7)
- first noun tree E: ra (2.0/1.6)
- first noun tree F: la (0.0)
- first noun tree G: la (4.0/0.3)
- first noun tree A:
  - second noun tree B: la (0.0)
  - second noun tree D: la (4.0/0.3)
  - second noun tree E: la (10.0/0.4)
  - second noun tree F: la (0.0)
  - second noun tree G: la (6.0/1.6)
  - second noun tree A:
    - first tree position < 4: ra (7.0/1.6)
    - first tree position > 4: la (36.0/5.8)
  - second noun tree C:
    - third noun tree A: ra (9.0/0.3)
    - third noun tree B: la (0.0)
    - third noun tree D: la (1.0/0.3)
    - third noun tree E: la (5.0/0.3)
    - third noun tree F: la (0.0)
16Semantic Interpretation
- Use decision tree paths for the detection of clusters of noun compounds with the same semantic interpretation
17Ex: ACA <anatomy> <disease> <anatomy>
breast cancer cells      A01.236              C04          A11  ra
bladder cancer cells     A05.810.161          C04          A11  ra
colon carcinoma cells    A03.492.411.495      C04.557.470  A11  ra
prostate tumor cells     A10.336.707          C04          A11  ra
prostate cancer tissue   A10.336.707          C04          A10  ra
lung cancer cells        A04.411              C04          A11  ra
colon cancer cells       A03.492.411.495.356  C04          A11  ra
brain tumor tissue       A08.186.211          C04          A10  ra
colon cancer tissues     A03.492.411.495.356  C04          A10  ra
bladder tumor cells      A05.810.161          C04          A11  ra
Interpretation: <Part of Body> noun3 exhibits <Disease> noun2 in <location> noun1
18Ex: ACE <anatomy> <disease> <Analytical, Diagnostic and Therapeutic Techniques and Equipment>
muscle disease diagnosis        A10.690              C23.550.288          E01          la
breast cancer prognosis         A01.236              C04                  E01.789      la
breast cancer treatment         A01.236              C04                  E02          la
hip fracture treatment          A01.378.592          C21.866.405          E02          la
cell cancer treatment           A11                  C04                  E02          la
brain tumor treatment           A08.186.211          C04                  E02          la
colon adenocarcinoma xenograft  A03.492.411.495.356  C04.557.470.200.025  E04.936.764
colon carcinoma xenograft       A03.492.411.495.356  C04.557.470.200      E04.936.764
colon carcinoma xenografts      A03.492.411.495.356  C04.557.470.200      E04.936.764
neck cancer xenografts          A01.598              C04                  E04.936.764
Interpretation:
1. <Diagnosis E01> noun3 diagnoses <Disease> noun2 in <location> noun1
2. <Therapeutics E02, Surgical Procedure E04> noun3 treats <Disease> noun2 in <location> noun1
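The ACA and ACE interpretations can be read as templates keyed on the top-level MeSH trees of the three nouns. The sketch below is one hypothetical encoding of those two slides, not the authors' system: the function name, the dictionary layout, and the returned field names are assumptions, while the relations (exhibits, diagnoses, treats) and the E01/E02/E04 split come from the slides.

```python
# For ACE compounds the relation depends on the E subtree of the third noun.
ACE_RELATIONS = {"E01": "diagnoses", "E02": "treats", "E04": "treats"}


def interpret(nouns, codes):
    """Map a three-noun compound and its MeSH codes to a relation frame."""
    n1, n2, n3 = nouns
    signature = "".join(code[0] for code in codes)   # e.g. "ACA"
    if signature == "ACA":
        relation = "exhibits"
    elif signature == "ACE":
        relation = ACE_RELATIONS.get(codes[2][:3])   # E01, E02, E04, ...
    else:
        relation = None
    if relation is None:
        return None                                  # no template applies
    return {"head": n3, "relation": relation, "disease": n2, "location": n1}


print(interpret(("breast", "cancer", "cells"), ("A01.236", "C04", "A11")))
print(interpret(("breast", "cancer", "prognosis"), ("A01.236", "C04", "E01.789")))
```

Compounds whose signature has no template, such as an AAE compound, return `None` here.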
19From MeSH to UMLS
- Unified Medical Language System, a project at the U.S. National Library of Medicine
- 3 UMLS Knowledge Sources
- Metathesaurus
- Semantic Network
- SPECIALIST lexicon and programs
20Metathesaurus
- Most extensive of UMLS sources
- 730,000 concepts representing more than 1,500,000 strings in over 60 vocabularies and classifications
- Organized by concept or meaning.
- In essence, its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.
- Relationships in the Metathesaurus come from the sources themselves or are created by the Metathesaurus editors.
21Semantic Network
- Consistent categorization of all concepts represented in the UMLS Metathesaurus and the important relationships between them.
- Every concept has been assigned a semantic type.
- The semantic types (134) are the nodes in the Network, and the relationships between them are the links (54)
- High level semantic structure
22 "Biologic Function" Hierarchy
23Noun Compounds, again
- Very preliminary studies
- Can we use the information in the Semantic Network for the semantic interpretation of noun compounds?
- Are semantic types and relationships good descriptors? Are they useful for disambiguation and classification?
24Mapping of Noun Compounds
NC: peptide CRF receptor antagonists
C0030956 | C0010132 | C0597357 | C0243076
Amino Acid, Peptide, or Protein | Hormone | Receptor | Pharmacologic Substance
A1.4.1.2.1.7 | A1.4.1.1.3.2 | A1.4.1.1.3.6 | A1.4.1.1.1
rel_12.1 (Amino Acid, Peptide, or Protein, Hormone) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.2
rel_13.1 (Amino Acid, Peptide, or Protein, Receptor) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.6
rel_14.1 (Amino Acid, Peptide, or Protein, Pharmacologic Substance) interacts_with A1.4.1.2.1.7 R3.1.5 A1.4.1.1.1
rel_23.1 (Hormone, Receptor) interacts_with A1.4.1.1.3.2 R3.1.5 A1.4.1.1.3.6
rel_24.1 (Hormone, Pharmacologic Substance) interacts_with A1.4.1.1.3.2 R3.1.5 A1.4.1.1.1
rel_34.1 (Receptor, Pharmacologic Substance) interacts_with A1.4.1.1.3.6 R3.1.5 A1.4.1.1.1
25Mapping of Noun Compounds
NC: day hospital treatment
C0439228 | C0019994 | C0039798 / C0087111
Temporal Concept | Health Care Related Organization | Functional Concept / Therapeutic or Preventive Procedure
A2.1.1 | A2.7.1 | A2.1.4 / B1.3.1.3
rel_12.1 (Temporal Concept, Health Care Related Organization) NOT found in SemNet
rel_13.1 (Temporal Concept, Functional Concept) NOT found in SemNet
rel_13.2 (Temporal Concept, Therapeutic or Preventive Procedure) NOT found in SemNet
rel_23.1 (Health Care Related Organization, Functional Concept) NOT found in SemNet
rel_23.2 (Health Care Related Organization, Therapeutic or Preventive Procedure) location_of R2.1
26Mapping of Noun Compounds
NC: brain serotonin metabolism
C0006104 | C0036751 | C0025519 / C0025520
Body Part, Organ, or Organ Component | Neuroreactive Substance or Biogenic Amine | Organism Function / Functional Concept
A1.2.3.1 | A1.4.1.1.3.1 | B2.2.1.1.1 / A2.1.4
rel_12.1 (Body Part, Organ, or Organ Component, Neuroreactive Substance or Biogenic Amine) produces R3.2.1
rel_13.1 (Body Part, Organ, or Organ Component, Organism Function) location_of R2.1
rel_13.2 (Body Part, Organ, or Organ Component, Functional Concept) NOT found in SemNet
rel_23.1 (Neuroreactive Substance or Biogenic Amine, Organism Function) disrupts R3.1.3
rel_23.2 (Neuroreactive Substance or Biogenic Amine, Functional Concept) NOT found in SemNet
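The rel_ij.k listings in these examples can be reproduced by enumerating every pair of nouns and every combination of their candidate semantic types. The sketch below is an illustration under assumptions: the `SEMNET` dictionary holds only the few relations quoted on the slides (a real system would load the full Semantic Network), and the function name is hypothetical.

```python
from itertools import combinations, product

# Relations copied from the example slides; a stand-in for the full network.
SEMNET = {
    ("Body Part, Organ, or Organ Component",
     "Neuroreactive Substance or Biogenic Amine"): "produces",
    ("Body Part, Organ, or Organ Component", "Organism Function"): "location_of",
    ("Neuroreactive Substance or Biogenic Amine", "Organism Function"): "disrupts",
    ("Health Care Related Organization",
     "Therapeutic or Preventive Procedure"): "location_of",
}


def pair_relations(types_per_noun):
    """Yield (label, type_a, type_b, relation or None) for every noun pair."""
    for i, j in combinations(range(len(types_per_noun)), 2):
        pairs = product(types_per_noun[i], types_per_noun[j])
        for k, (a, b) in enumerate(pairs, start=1):
            yield f"rel_{i + 1}{j + 1}.{k}", a, b, SEMNET.get((a, b))


# brain serotonin metabolism: the third noun has two candidate types
types = [["Body Part, Organ, or Organ Component"],
         ["Neuroreactive Substance or Biogenic Amine"],
         ["Organism Function", "Functional Concept"]]
for label, a, b, rel in pair_relations(types):
    print(label, f"({a}, {b})", rel or "NOT found in SemNet")
```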
27Mapping Words - Semantic Types, Semantic Relationships
- Semantic types correctly assigned (on 246 noun compounds, 738 nouns): 59%
- Semantic types disambiguated by the relationships
- Doesn't disambiguate: 42.7%
- Disambiguates incorrectly: 17.3%
- Disambiguates correctly: 40%
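One plausible reading of "disambiguated by the relationships" is that, when a noun has several candidate semantic types, the type that participates in some Semantic Network relation with a neighbouring noun's type is preferred. The slides do not spell out the rule, so the sketch below is an assumption; the `RELATED` set contains only pairs quoted in the earlier examples, and the function name is hypothetical.

```python
# Type pairs linked in the Semantic Network, copied from the example slides.
RELATED = {
    ("Health Care Related Organization", "Therapeutic or Preventive Procedure"),
    ("Body Part, Organ, or Organ Component", "Organism Function"),
    ("Neuroreactive Substance or Biogenic Amine", "Organism Function"),
}


def disambiguate(candidates, context_types, related=RELATED):
    """Return the single candidate type linked to the context, else None.

    None covers both failure cases: no candidate is linked, or several are.
    """
    linked = [c for c in candidates
              if any((ctx, c) in related or (c, ctx) in related
                     for ctx in context_types)]
    return linked[0] if len(linked) == 1 else None


# "treatment" in "day hospital treatment" has two candidate types
print(disambiguate(
    ["Functional Concept", "Therapeutic or Preventive Procedure"],
    ["Temporal Concept", "Health Care Related Organization"]))
```

Under this rule, a `None` result corresponds to the "doesn't disambiguate" case, and a returned type still has to be checked against a gold label to count as correct or incorrect.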
28(Some of) Future Work
- Explore UMLS sources in more depth
- What forms the best basis for automatic semantic interpretation of noun phrases?
- Semantic types?
- Metathesaurus concepts? (And what parts of them?)
- Just MeSH concepts?
- Machine learning algorithms to help choose a good representation of medical terms
29Future Work
- Machine learning algorithms for classification
- Can we generalize patterns found for noun compounds to other syntactic structures, and if so, how?
- How can we best formally represent semantics?
- How can we combine symbolic rules with statistical methods?
- How can we deal with non-medical words?
- Can the system help us disambiguate them?
- Should we use other ontologies (e.g., WordNet)?