Title: Automatic Identification of Treatment Relations for Medical Ontology Learning: An Exploratory Study
1. Automatic Identification of Treatment Relations for Medical Ontology Learning: An Exploratory Study
- Chew-Hung Lee, Christopher Khoo
- Jin-Cheon Na
Knowledge Organization & Discovery Research Group, Division of Information Studies, School of Communication & Information, Nanyang Technological University, Singapore
2. Background
- An ontology is an explicit specification of a conceptualization (Gruber, 1993)
- It contains the knowledge found in a domain
  - concepts, relations and axioms
  - E.g., concepts: drug, disease; relation: treats
- An ontology forms the foundation for the Semantic Web
- Ontologies are also important as classification systems in knowledge management
3. Motivation
- Creation of ontologies is non-trivial
  - Analysis of domain sources
  - Background knowledge
  - Obtaining consensus among users
- Current methods manually enumerate concepts and relations
  - Labor intensive
  - Give rise to inconsistencies
  - Not suitable for developing large ontologies
4. Automatic Ontology Learning
- Proposed work
  - develops an automatic method to build an ontology from a document collection
  - uses a small seed ontology (i.e., the UMLS Metathesaurus and Semantic Network) and enriches it with semantic relations identified by analyzing domain texts
5. UMLS
- Medical knowledge base maintained by the National Library of Medicine (USA)
- UMLS Metathesaurus
  - contains biomedical concepts and terms from many controlled vocabularies and classification systems used in medical information systems
- UMLS Semantic Network
  - specifies a set of basic semantic types that may be assigned to concepts in the UMLS Metathesaurus
  - specifies a set of relationships that may hold between semantic types
6. UMLS Semantic Network (from UMLS Knowledge Source)
7. Related Work
- Maedche (2002) and Navigli, Velardi & Gangemi (2003) worked on semi-automatic methods to extract concepts and relations
  - Built ontologies from broad-domain documents, such as travel-related documents
- Blake and Pratt (2001) mined semantic relationships among terms from medical texts
- Khoo, Chan & Niu (2002) looked for causal relations by matching graphical patterns in syntactic parse trees
8. Early Experiment
- Goal
  - Assess how effectively the UMLS Semantic Network can identify semantic relations between pairs of related concepts
- Domain
  - Focused on colon cancer treatment
  - Medical abstracts obtained from MedLine
    - A specialized digital library maintained by the National Library of Medicine
9. Early Experiment
- Approach
  - Important terms were extracted from medical abstracts and mapped to medical concepts in the UMLS Metathesaurus
  - Association rule mining was applied to the mapped concepts to find associated concept pairs
  - Semantic relations for the associated concept pairs were inferred using the UMLS Semantic Network
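The approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstracts, the concept-to-type table and the two-entry excerpt of allowed relations are toy data invented for the example (not the real UMLS content), and the function names `mine_pair_rules` and `candidate_relations` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy input (assumption): each abstract reduced to the set of concepts found in it.
abstracts = [
    {"Fluorouracil", "Leucovorin", "Colonic Neoplasms"},
    {"Fluorouracil", "Leucovorin"},
    {"Fluorouracil", "Colonic Neoplasms"},
    {"Oxaliplatin", "Colonic Neoplasms"},
]

# Toy concept -> semantic type table (assumption, standing in for the Metathesaurus).
semantic_type = {
    "Fluorouracil": "Pharmacologic Substance",
    "Leucovorin": "Pharmacologic Substance",
    "Oxaliplatin": "Pharmacologic Substance",
    "Colonic Neoplasms": "Neoplastic Process",
}

# Illustrative excerpt of relations allowed between semantic types (assumption,
# not the full semantic network).
semantic_network = {
    ("Pharmacologic Substance", "Pharmacologic Substance"): ["interacts_with"],
    ("Pharmacologic Substance", "Neoplastic Process"): ["treats", "causes"],
}


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) rules over concept pairs."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in t)
    pair_count = Counter(
        frozenset(pair) for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for pair, count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def candidate_relations(lhs, rhs):
    """Look up the relation tags allowed between the semantic types of two concepts."""
    return semantic_network.get((semantic_type[lhs], semantic_type[rhs]), [])


rules = mine_pair_rules(abstracts, min_support=0.5, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(lhs, candidate_relations(lhs, rhs), rhs, support, confidence)
```

A pair whose semantic types allow several relations (here, a drug and a neoplastic process) receives more than one tag, which is the ambiguity a purely type-based lookup cannot resolve.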
10. Early Experiment (Ontology Learning Process)
11. Early Experiment: Results
- Able to infer semantic relations 68% of the time
- 34 rules were generated after filtering out rules containing human, mice and rats
  - 11 rules without tags (32%)
  - 4 rules with a single tag (12%)
    - Leucovorin/administration & dosage interact_with Fluorouracil/administration & dosage (with a support of 3% and a confidence of 100%)
  - 19 rules with multiple tags (56%)
- Method could not distinguish between several possible relation types (e.g. treat, cause, etc.)
- Suggests the use of natural language processing (NLP) to improve accuracy in identifying relations
12. Current Study
- Goal
  - Apply an Information Extraction (IE) technique to identify the semantic relation between identified pairs of concepts
  - Develop a method (automatic and manual) for constructing patterns that identify treatment relations expressed in text (not to extract concepts)
13. Methodology
- Identify sentences containing a reference to a drug as well as to a disease
  - Most such sentences express a treatment relation between the drug and the disease
- Construct patterns
  - Use association rule mining to identify frequently occurring word patterns in the sentences
  - Manually construct extraction patterns from the sentences
14. Data Preparation
- 500 records in the area of colonic neoplasms/drug therapy were downloaded from MedLine (National Library of Medicine)
- 408 of these records contained medical abstracts
- Each abstract was segmented into sentences
- Each sentence was passed to MMTx to extract UMLS concepts
- Sentences containing a concept relating to a disease and/or a pharmacologic substance (i.e., drug) were identified
15. Data Preparation
- Abstracts containing at least one sentence with both a disease concept and a reference to a drug were categorized as good abstracts (108 abstracts)
- The 211 drug-disease sentences in the good abstracts were used in constructing extraction patterns
- The remaining 300 abstracts were categorized as bad abstracts
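The filtering step can be illustrated with a minimal sketch. It assumes a naive regular-expression sentence splitter and two tiny hand-made term lists in place of MMTx concept mapping; the names `DRUG_TERMS`, `DISEASE_TERMS`, `drug_disease_sentences` and `label_abstract` are hypothetical and exist only for this example.

```python
import re

# Toy lexicons (assumption): a real pipeline would use UMLS concepts from MMTx.
DRUG_TERMS = {"oxaliplatin", "5-fu", "irinotecan", "leucovorin"}
DISEASE_TERMS = {"colon cancer", "colonic neoplasms"}


def split_sentences(abstract):
    """Naive sentence segmentation on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def mentions(sentence, terms):
    """True if any term occurs in the sentence (case-insensitive substring match)."""
    lowered = sentence.lower()
    return any(term in lowered for term in terms)


def drug_disease_sentences(abstract):
    """Sentences that mention both a drug and a disease."""
    return [
        s
        for s in split_sentences(abstract)
        if mentions(s, DRUG_TERMS) and mentions(s, DISEASE_TERMS)
    ]


def label_abstract(abstract):
    """'good' if at least one sentence mentions both a drug and a disease, else 'bad'."""
    return "good" if drug_disease_sentences(abstract) else "bad"


good = (
    "We report a case of irinotecan-resistant colon cancer responding to "
    "chronotherapy with oxaliplatin (L-OHP), 5-FU, l-LV (l-Leucovorin). "
    "The lesion was resected."
)
bad = "Gene expression was studied in colon cancer. Irinotecan kinetics were modelled."
print(label_abstract(good), label_abstract(bad))
```

Note that the second abstract is labelled bad even though it mentions both a drug and a disease, because the two mentions never occur in the same sentence.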
16. Examples of Drug-Disease Sentences
- We report a case of irinotecan-resistant colon cancer responding to chronotherapy with oxaliplatin (L-OHP), 5-FU, l-LV (l-Leucovorin).
- These results indicate that chronomodulated 5-FU and LV with L-OHP therapy could be an effective regimen for cases of irinotecan-resistant colon cancer.
17. Application of Apriori Algorithm
- Each drug-disease sentence was divided into individual word tokens
- Punctuation marks, prepositions, determiners, conjunctions, disjunctions, pronouns and numbers were removed
- Apriori algorithm (i.e., association rule mining) was applied to the dataset using Clementine data mining software
- Parameters
  - Minimum support: 2%
  - Minimum confidence: 80%
- Results
  - 281 rules generated
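For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of the same kind of processing: tokenise each sentence, drop stop words and numbers, mine frequent word sets level by level (Apriori), then derive rules above a confidence threshold. It is not the Clementine implementation; the stop list, the three sentences and the thresholds are toy assumptions chosen so the example is easy to trace by hand.

```python
import re
from itertools import combinations

# Tiny stop list (assumption) standing in for the removed word classes.
STOP_WORDS = {"a", "an", "the", "of", "to", "with", "in", "for", "and", "or", "we", "it"}


def tokenize(sentence):
    """Lower-case a sentence; keep tokens that are neither stop words nor numbers."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS and not t.isdigit()}


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets one item at a time."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c
            for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


SENTENCES = [
    "Patients were treated with 5-FU for colon cancer.",
    "Colon cancer was treated with oxaliplatin.",
    "5-FU dose was 400 mg.",
]
transactions = [tokenize(s) for s in SENTENCES]
# Thresholds are much higher than the 2% / 80% on the slide because the toy set is tiny.
frequent = apriori(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
```

On this toy data the strongest rules link "colon", "cancer" and "treated", but most mined associations in real text are between words that simply co-occur in a domain, not words that signal a relation.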
18. Top Ten Rules Using Normalized Chi-Square
19. Results & Discussion
- Few terms signified a treatment relation
- Statistical association measures alone were not enough to construct extraction patterns
- The study therefore proceeded with manually constructed extraction patterns
20. Extraction Patterns
- 224 extraction patterns were manually constructed from the 211 drug-disease sentences
  - ranging from single words to phrases with embedded wildcard tokens
- Grouped into semantic categories
  - Administration of treatment, e.g. exposure to, use of, using, clinically used, administered, receiving treatment with
  - Treatment dosage, e.g. low-dose, dose of, dosage schedule
  - Mortality and survival, e.g. mortality, death rate, survival benefit, extends the survival
  - Therapy, e.g. chemotherapy, treatment, regimen, adjuvant, drug, pro-drug
  - Clinical trial, e.g. tested on, feasibility trial, clinical trial
21. Extraction Patterns
- Grouped into semantic categories (continued)
  - Effect, e.g. outcome, responsive, influence, results, sensitivity, effective. Words referring to an effect can be subdivided into:
    - Agent of effect, e.g. agents, anti-cancer agent
    - Target of effect, e.g. targeting, targeted
    - Effect action, e.g. active in
    - Effect against something, e.g. anti-cancer, anti-tumor, antagonist
    - Effect in controlling or inhibiting something, e.g. controlling, inhibition, inhibitor, cytostatic
    - Effect in decreasing or increasing something, e.g. impaired, decrease, reduce, regression, remission, increase, elevated
    - Effect in killing something, e.g. kill, apoptosis inducing, cytotoxic
    - Good effect, e.g. beneficial, useful, benefit, improve, promising
    - Therapeutic effect, e.g. treat, curatively, clinical, clinically
    - Free of disease, e.g. disease-free, recurrence-free
    - Interaction effect, e.g. synergistic, modulation
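One way such categorised patterns could be applied to a sentence is sketched below. The slides do not give the pattern syntax, so this assumes that `*` in a pattern stands for up to three intervening words; that convention, the pattern `treated * with`, and the helper names are hypothetical. The other patterns are a small subset of the examples listed above.

```python
import re

PATTERNS = {
    "Administration of treatment": ["use of", "administered", "receiving treatment with"],
    "Therapy": ["chemotherapy", "treatment", "regimen", "adjuvant"],
    "Effect": ["outcome", "responsive", "effective"],
    "Therapeutic effect": ["treat", "curatively"],
    # Hypothetical wildcard pattern, included only to show the wildcard handling.
    "Wildcard example": ["treated * with"],
}


def compile_pattern(pattern):
    """Turn a pattern into a regex; '*' allows up to three arbitrary words (assumed)."""
    tokens = pattern.split()
    regex = re.escape(tokens[0])  # assumes a pattern never starts with '*'
    gap = r"\s+"
    for token in tokens[1:]:
        if token == "*":
            gap = r"\s+(?:\S+\s+){0,3}"
        else:
            regex += gap + re.escape(token)
            gap = r"\s+"
    return re.compile(r"\b" + regex + r"\b", re.IGNORECASE)


COMPILED = {
    category: [(p, compile_pattern(p)) for p in patterns]
    for category, patterns in PATTERNS.items()
}


def match_categories(sentence):
    """Return {category: [matching patterns]} for the categories that fire."""
    hits = {}
    for category, compiled in COMPILED.items():
        matched = [p for p, rx in compiled if rx.search(sentence)]
        if matched:
            hits[category] = matched
    return hits


example = (
    "These results indicate that chronomodulated 5-FU and LV with L-OHP therapy "
    "could be an effective regimen for cases of irinotecan-resistant colon cancer."
)
print(match_categories(example))
```

Whole-word matching is used here, so the pattern "treat" does not fire on "treated"; whether the original patterns matched word stems is not stated in the slides.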
22. Evaluation of Patterns
- Preliminary evaluation of the effectiveness of the patterns for identifying treatment relations, using a sample of 30 good abstracts
23. Results & Discussion
- Recall is at least 60%
- Precision is low, especially for sentences with a disease only
- Patterns were also applied to sentences from 30 bad abstracts
  - A precision of 37.3% and a recall of 69.1% were obtained
  - Initial assessment is that bad abstracts report more theoretical studies that are not directly focused on treatments
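As a reminder of how such figures are derived, here is a minimal sketch of precision and recall computed over sets of sentence identifiers. Treating the sentence as the unit of evaluation is an assumption, and the identifiers are made up for illustration; they are not the study's data.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the patterns flag 8 sentences, 3 of which are among the
# 4 sentences a human judged to express a treatment relation.
predicted = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 3, 9}
precision, recall = precision_recall(predicted, relevant)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Low precision with reasonable recall, as in this toy case, means the patterns fire on many sentences that do not express a treatment relation while still catching most of those that do.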
24. Additional Preliminary Findings
- Patterns were matched against data sets from other medical domains
  - Colon Cancer
    - 162 matched sentences out of 289 sentences (58%)
  - Breast Cancer
    - 453 matched sentences out of 1134 sentences (40%)
25. Top Six Patterns Matched
- Colon Cancer
  - chemotherapy (37)
  - adjuvant (33)
  - treated (21)
  - therapy (18)
  - adjuvant chemotherapy (13)
  - survival benefit (11)
- Breast Cancer
  - chemotherapy (107)
  - therapy (95)
  - adjuvant (42)
  - treated (41)
  - received (29)
  - treated with (29)
26. Conclusion
- Explored a semi-automatic approach to constructing linguistic patterns
  - Association rule mining did not yield useful patterns
  - Suggests statistical association measures may be used in combination with syntactic and semantic constraints
- The manually constructed extraction patterns yielded promising results
  - Still need to improve the patterns through error analysis
  - Apply the patterns to other medical domains such as breast cancer, heart disease and AIDS
- Intend to develop a method for identifying treatment relations across sentences
27. Error Analysis
- From false negative sentences (sentences containing a treatment relation that were not identified by the patterns), 16 additional patterns were identified
  - Six are new patterns not found in the original pattern list
  - Four are spelling variations of existing patterns (e.g. anti-tumour vs. anti-tumor)
  - Four are parts of existing patterns
  - Two are similar to existing patterns, either through rearrangement of word order or by sharing an important word
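Misses caused by spelling variation could plausibly be reduced by normalising text before matching. The sketch below illustrates one possible way to do this and is not something the study reports doing; the variant table is a hypothetical two-entry example.

```python
import re

# Hypothetical table mapping spelling variants to a canonical form.
VARIANTS = {"tumour": "tumor", "tumours": "tumors"}


def normalize(text):
    """Lower-case text and replace each known variant word with its canonical form."""
    return re.sub(
        r"[a-z]+",
        lambda match: VARIANTS.get(match.group(0), match.group(0)),
        text.lower(),
    )


print(normalize("Anti-Tumour activity was observed."))
```

Applying the same normalisation to both the patterns and the sentences would let a single pattern such as anti-tumor cover both spellings.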
28. Error Analysis
- From false positive sentences (sentences identified by the patterns that do not contain a treatment relation), the majority contained no specific reference to a drug or treatment
- Two sentences referred to a treatment but no useful information could be extracted
  - "The treatment was effective and the lesion disappeared."
  - "There was a massive therapeutic effect without side effects."
- Several sentences contained a reference to a treatment described in other sentences in the abstract
- Other sentences described results of a theoretical study, diagnosis, schedule of chemotherapy, etc.