Automatic Identification of Treatment Relations for Medical Ontology Learning: An Exploratory Study

1
Automatic Identification of Treatment Relations
for Medical Ontology Learning: An Exploratory
Study
  • Chew-Hung Lee, Christopher Khoo &
    Jin-Cheon Na

Knowledge Organization & Discovery Research
Group, Division of Information Studies, School of
Communication & Information, Nanyang Technological
University, Singapore

2
Background
  • An ontology is an explicit specification of a
    conceptualization (Gruber, 1993)
    • It contains the knowledge of a domain in the
      form of concepts, relations and axioms
    • E.g., the concepts drug and disease, and the
      relation treats
  • An ontology forms the foundation for the Semantic
    Web
  • Ontologies are also important as classification
    systems in knowledge management

3
Motivation
  • Creation of ontologies is non-trivial
    • Analysis of domain sources
    • Background knowledge
    • Obtaining consensus among users
  • Current methods manually enumerate concepts and
    relations
    • Labor-intensive
    • Give rise to inconsistencies
    • Not suitable for developing large ontologies

4
Automatic Ontology Learning
  • Proposed work: develop an automatic method to
    build an ontology from a document collection
    • Use a small seed ontology (i.e., the UMLS
      Metathesaurus and Semantic Network) and enrich
      it with semantic relations identified by
      analyzing domain texts

5
UMLS
  • Medical knowledge base maintained by the National
    Library of Medicine (USA)
  • UMLS Metathesaurus
    • Contains biomedical concepts and terms from many
      controlled vocabularies and classification
      systems used in medical information systems
  • UMLS Semantic Network
    • Specifies a set of basic semantic types that may
      be assigned to concepts in the UMLS Metathesaurus
    • Specifies a set of relationships that may hold
      between semantic types
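To make the role of the Semantic Network concrete, the following is a minimal Python sketch of looking up which relations are permitted between two semantic types. The table `TOY_NETWORK` and the helpers `candidate_relations` and `tag_category` are hypothetical: the table is a hand-written excerpt for illustration, not the real UMLS data files or their contents. The three groups returned by `tag_category` mirror the "no tag / single tag / multiple tags" grouping used in the early experiment.

```python
from typing import Dict, Set, Tuple

# Hypothetical excerpt for illustration only; the real UMLS Semantic Network
# is distributed as data files with far more semantic types and relations.
TOY_NETWORK: Dict[Tuple[str, str], Set[str]] = {
    ("Pharmacologic Substance", "Disease or Syndrome"): {"treats", "prevents"},
    ("Pharmacologic Substance", "Pharmacologic Substance"): {"interacts_with"},
}


def candidate_relations(type_a: str, type_b: str) -> Set[str]:
    """Relations the toy network allows from type_a to type_b (empty if none)."""
    return TOY_NETWORK.get((type_a, type_b), set())


def tag_category(relations: Set[str]) -> str:
    """Group a concept pair by how many relation 'tags' the network offers."""
    if not relations:
        return "no tag"
    return "single tag" if len(relations) == 1 else "multiple tags"


for pair in [("Pharmacologic Substance", "Disease or Syndrome"),
             ("Pharmacologic Substance", "Pharmacologic Substance"),
             ("Disease or Syndrome", "Pharmacologic Substance")]:
    rels = candidate_relations(*pair)
    print(pair, sorted(rels), tag_category(rels))
```

A pair that receives several candidate relations is exactly the ambiguous case that a type-level lookup cannot resolve on its own.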

6
UMLS Semantic Network (from UMLS Knowledge Source)

7
Related Works
  • Maedche (2002) and Navigli, Velardi & Gangemi
    (2003) worked on semi-automatic methods to
    extract concepts and relations
    • They built ontologies from broad-domain
      documents, such as travel-related documents
  • Blake and Pratt (2001) mined semantic
    relationships among terms from medical texts
  • Khoo, Chan & Niu (2002) looked for causal
    relations by matching graphical patterns in
    syntactic parse trees

8
Early Experiment
  • Goal
    • Assess how effectively the UMLS Semantic Network
      identifies semantic relations between pairs of
      related concepts
  • Domain
    • Colon cancer treatment
    • Medical abstracts obtained from MedLine, a
      specialized digital library maintained by the
      National Library of Medicine

9
Early Experiment
  • Approach
    • Important terms were extracted from medical
      abstracts and mapped to medical concepts in the
      UMLS Metathesaurus
    • Association rule mining was applied to the
      mapped concepts to find associated concept pairs
    • Semantic relations between associated concept
      pairs were inferred using the UMLS Semantic
      Network
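The association-mining step can be sketched in a few lines of Python. This is a simplified, pairwise-only version (not full Apriori) under stated assumptions: each abstract is treated as one transaction holding the set of its mapped concept names, support is an absolute count of abstracts, and the concept sets below are hypothetical. `pair_rules` is an illustrative helper, not the authors' tool.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, int, float]  # (antecedent, consequent, support, confidence)


def pair_rules(transactions: List[Set[str]], min_support: int,
               min_confidence: float) -> List[Rule]:
    """Mine single-concept rules A -> B from sets of mapped concepts.

    support    = number of transactions containing both A and B (a count here)
    confidence = support(A and B) / support(A)
    """
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for concepts in transactions:
        item_counts.update(concepts)
        pair_counts.update(frozenset(p) for p in combinations(sorted(concepts), 2))

    rules: List[Rule] = []
    for pair, support in pair_counts.items():
        if support < min_support:
            continue
        # Test both directions of the pair as candidate rules.
        for a, b in permutations(sorted(pair), 2):
            confidence = support / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical concept sets, one per abstract.
abstracts = [
    {"Leucovorin", "Fluorouracil", "Colonic Neoplasms"},
    {"Leucovorin", "Fluorouracil"},
    {"Fluorouracil", "Colonic Neoplasms"},
    {"Irinotecan", "Colonic Neoplasms"},
]
print(pair_rules(abstracts, min_support=2, min_confidence=0.8))
```

Only Leucovorin -> Fluorouracil survives here: every abstract mentioning Leucovorin also mentions Fluorouracil, but not the reverse. The surviving pairs would then be passed to a Semantic Network lookup to propose relation tags.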

10
Early Experiment (Ontology Learning Process)

11
Early Experiment Results
  • Able to infer semantic relations 68% of the time
  • 34 rules were generated after filtering out rules
    containing human, mice and rats
    • 11 rules without tags (32%)
    • 4 rules with a single tag (12%)
      • E.g., Leucovorin/administration & dosage
        interact_with Fluorouracil/administration &
        dosage (with a support of 3 and a confidence
        of 100%)
    • 19 rules with multiple tags (56%)
  • The method could not distinguish among several
    possible relation types (e.g., treat, cause)
  • Suggests the use of natural language processing
    (NLP) to improve accuracy in identifying relations
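The percentages on this slide follow directly from the rule counts. The short Python check below reproduces them. Its one assumption is that "able to infer semantic relations" counts every rule with at least one tag, that is, single-tag plus multiple-tag rules.

```python
# Rule counts reported on this slide (after filtering out human/mice/rats rules).
counts = {"no tag": 11, "single tag": 4, "multiple tags": 19}
total = sum(counts.values())
shares = {group: round(100 * n / total) for group, n in counts.items()}

# Assumption: "able to infer" covers every rule with at least one tag.
inferred = round(100 * (counts["single tag"] + counts["multiple tags"]) / total)
print(total, shares, inferred)
```

Under that reading, 23 of the 34 rules carry a tag, which rounds to the reported 68%.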

12
Current Study
  • Goal
    • Apply an information extraction (IE) technique
      to identify the semantic relation between
      identified pairs of concepts
    • Develop a method (automatic and manual) for
      constructing patterns that identify treatment
      relations expressed in text (the aim is not to
      extract concepts)

13
Methodology
  • Identify sentences containing a reference to a
    drug as well as to a disease
    • Most such sentences express a treatment relation
      between the drug and the disease
  • Construct patterns
    • Use association rule mining to identify
      frequently occurring word patterns in the
      sentences
    • Manually construct extraction patterns from the
      sentences

14
Data Preparation
  • 500 records in the area of colonic neoplasms/drug
    therapy were downloaded from MedLine (National
    Library of Medicine)
  • 408 of these records contained medical abstracts
  • Each abstract was segmented into sentences
  • Each sentence was passed to MMTx (MetaMap
    Transfer) to extract UMLS concepts
  • Sentences containing a concept relating to a
    disease and/or a pharmacologic substance (i.e., a
    drug) were identified
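The sentence-selection steps can be sketched in Python as follows. The sketch does not call MMTx: the semantic types for each sentence are written by hand as a stand-in for its output. The groupings `DRUG_TYPES` and `DISEASE_TYPES` are assumptions, because the slides do not list which UMLS types were counted as "disease" or "drug".

```python
import re
from typing import List, Set

# Assumed semantic-type groupings; the slides do not list the exact UMLS
# types treated as "disease" and "pharmacologic substance".
DRUG_TYPES = {"Pharmacologic Substance"}
DISEASE_TYPES = {"Disease or Syndrome", "Neoplastic Process"}


def split_sentences(abstract: str) -> List[str]:
    """Naive segmenter: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def is_drug_disease(semantic_types: Set[str]) -> bool:
    """True when a sentence has at least one drug and one disease concept."""
    return bool(semantic_types & DRUG_TYPES) and bool(semantic_types & DISEASE_TYPES)


abstract = "We report a case. Oxaliplatin was given for colon cancer."
sentences = split_sentences(abstract)
# Hand-written stand-in for the semantic types a concept mapper would return.
types_per_sentence = [set(), {"Pharmacologic Substance", "Neoplastic Process"}]
flags = [is_drug_disease(t) for t in types_per_sentence]
print(list(zip(sentences, flags)))
```

The regex splitter is deliberately naive and will break on abbreviations such as "et al."; a real pipeline would use a proper sentence segmenter.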

15
Data Preparation
  • Abstracts containing at least one sentence with
    both a disease concept and a reference to a drug
    were categorized as good abstracts (108 abstracts)
  • The 211 drug-disease sentences in the good
    abstracts were used to construct extraction
    patterns
  • The remaining 300 abstracts were categorized as
    bad abstracts
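The good/bad split described above reduces to an "any sentence qualifies" test. The sketch below assumes one boolean per sentence, True when that sentence mentions both a drug and a disease. The abstract IDs and flags are hypothetical, and `categorize_abstracts` is an illustrative helper.

```python
from typing import Dict, List, Tuple


def categorize_abstracts(flags: Dict[str, List[bool]]) -> Tuple[List[str], List[str], int]:
    """Split abstracts into good and bad, and count drug-disease sentences in good ones.

    flags maps an abstract ID to one boolean per sentence (True = the sentence
    mentions both a drug and a disease concept).
    """
    good = [aid for aid, f in flags.items() if any(f)]
    bad = [aid for aid, f in flags.items() if not any(f)]
    n_sentences = sum(sum(flags[aid]) for aid in good)
    return good, bad, n_sentences


# Hypothetical per-sentence flags for three abstracts.
flags = {"A1": [True, False, True], "A2": [False, False], "A3": [False, True]}
good, bad, n_sentences = categorize_abstracts(flags)
print(good, bad, n_sentences)
```

On the study's data this split gave 108 good abstracts, 300 bad abstracts, and 211 drug-disease sentences.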

16
Examples of Drug-Disease Sentences
  • We report a case of irinotecan-resistant colon
    cancer responding to chronotherapy with
    oxaliplatin (L-OHP), 5-FU, l-LV (l-Leucovorin).
  • These results indicate that chronomodulated 5-FU
    and LV with L-OHP therapy could be an effective
    regimen for cases of irinotecan-resistant colon
    cancer.

17
Application of Apriori Algorithm
  • Each drug-disease sentence was divided into
    individual word tokens
  • Punctuation marks, prepositions, determiners,
    conjunctions, disjunctions, pronouns and numbers
    were removed
  • The Apriori algorithm (an association rule mining
    algorithm) was applied to the dataset using the
    Clementine data mining software
  • Parameters
    • Minimum support 2
    • Minimum confidence 80%
  • Results
    • 281 rules generated
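The preprocessing and mining steps above can be sketched in plain Python, without Clementine. Several assumptions apply. The stop list is a tiny hypothetical stand-in for the removed word classes. Support is treated as an absolute count of sentences (the slide's "2" may be a count or a percentage). Rules have a single-word consequent. This is a textbook level-wise Apriori with join and prune steps, not a reproduction of Clementine's implementation.

```python
import re
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

# Hypothetical, deliberately tiny stop list standing in for the removed
# prepositions, determiners, conjunctions and pronouns.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "with", "for", "to",
              "and", "or", "we", "it", "this", "these", "that"}


def to_transaction(sentence: str) -> Itemset:
    """Lower-case, strip punctuation, then drop stop words and numbers."""
    items = set()
    for raw in sentence.lower().split():
        token = raw.strip(".,;:()[]!?\"'")
        if token and token not in STOP_WORDS and not re.fullmatch(r"[\d.,%]+", token):
            items.add(token)
    return frozenset(items)


def apriori(transactions: List[Itemset], min_support: int) -> Dict[Itemset, int]:
    """Level-wise Apriori: every itemset found in >= min_support transactions."""
    def frequent_only(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    level = frequent_only({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent_only(candidates)
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent: Dict[Itemset, int], min_confidence: float
                      ) -> List[Tuple[Tuple[str, ...], str, int, float]]:
    """Rules X -> y with confidence = support(X plus y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((tuple(sorted(antecedent)), consequent, support, confidence))
    return sorted(rules)


# Hypothetical sentences for illustration.
sentences = ["Adjuvant chemotherapy with 5-FU improved survival.",
             "Adjuvant chemotherapy prolonged survival in 12 patients.",
             "Irinotecan therapy was tested."]
transactions = [to_transaction(s) for s in sentences]
frequent = apriori(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.8)
print(len(frequent), len(rules))
```

Every antecedent looked up in `association_rules` is guaranteed to be present, because every subset of a frequent itemset is itself frequent and is found at an earlier level.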

18
Top Ten Rules Using Normalized Chi-Square

19
Results & Discussion
  • Few of the terms signified a treatment relation
  • Statistical association measures alone were not
    enough to construct extraction patterns
  • Next step: a study using manually constructed
    extraction patterns
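The ranking measure named on the "Top Ten Rules" slide can be sketched in Python, although the table itself did not survive. Two assumptions apply. First, "normalized chi-square" is taken to be the Pearson chi-square of the 2x2 table (antecedent present or absent versus consequent present or absent) divided by the number of sentences, which equals phi squared and lies between 0 and 1. Second, the rule labels and counts below are hypothetical.

```python
from typing import List, Tuple


def chi_square_2x2(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Pearson chi-square for antecedent X vs consequent Y over n sentences.

    n_x: sentences containing X, n_y: containing Y, n_xy: containing both.
    """
    observed = [[n_xy, n_x - n_xy],
                [n_y - n_xy, n - n_x - n_y + n_xy]]
    row_totals = [n_x, n - n_x]
    col_totals = [n_y, n - n_y]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # a zero margin leaves the statistic undefined; skip it
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def normalized_chi_square(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Assumed normalization: chi-square / n (phi squared), in [0, 1]."""
    return chi_square_2x2(n, n_x, n_y, n_xy) / n


# Hypothetical rules: (label, n_x, n_y, n_xy) over 20 sentences.
candidates: List[Tuple[str, int, int, int]] = [
    ("patients -> chemotherapy", 10, 8, 4),
    ("adjuvant -> chemotherapy", 6, 8, 6),
]
ranked = sorted(candidates, key=lambda r: normalized_chi_square(20, *r[1:]), reverse=True)
for label, n_x, n_y, n_xy in ranked:
    print(f"{label}: {normalized_chi_square(20, n_x, n_y, n_xy):.3f}")
```

The second rule scores 0 because "patients" and "chemotherapy" co-occur exactly as often as independence predicts. This illustrates the slide's point: a strong association score says nothing about whether a term signals a treatment relation.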

20
Extraction Patterns
  • 224 extraction patterns were manually constructed
    from the 211 drug-disease sentences
    • Patterns range from single words to phrases
      with embedded wildcard tokens
  • Patterns were grouped into semantic categories
    • Administration of treatment, e.g., exposure to,
      use of, using, clinically used, administered,
      receiving treatment with
    • Treatment dosage, e.g., low-dose, dose of, dosage
      schedule
    • Mortality and survival, e.g., mortality, death
      rate, survival benefit, extends the survival
    • Therapy, e.g., chemotherapy, treatment, regimen,
      adjuvant, drug, pro-drug
    • Clinical trial, e.g., tested on, feasibility
      trial, clinical trial
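The slides do not give the wildcard syntax, so the Python sketch below assumes that `*` stands for exactly one arbitrary word and that matching is case-insensitive. Each pattern is compiled into a regular expression over the sentence. The pattern `treated * with` is a hypothetical example of a wildcard pattern; the other patterns are taken from the slide.

```python
import re
from typing import List


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a pattern in which '*' is assumed to match exactly one word."""
    parts = [r"\S+" if token == "*" else re.escape(token)
             for token in pattern.lower().split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def matching_patterns(sentence: str, patterns: List[str]) -> List[str]:
    """Return the patterns (in the given order) that occur in the sentence."""
    text = sentence.lower()
    return [p for p in patterns if compile_pattern(p).search(text)]


PATTERNS = ["receiving treatment with", "dose of", "survival benefit", "treated * with"]
print(matching_patterns(
    "Patients receiving treatment with oxaliplatin had a survival benefit.", PATTERNS))
print(matching_patterns("Patients were treated daily with 5-FU.", PATTERNS))
```

The word boundaries stop a pattern such as `treated` from firing inside a longer word. Extra whitespace between words is tolerated because tokens are joined with `\s+`.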

21
Extraction Patterns
  • Grouped into semantic categories (continued)
    • Effect, e.g., outcome, responsive, influence,
      results, sensitivity, effective. Words referring
      to an effect can be subdivided into:
      • Agent of effect, e.g., agents, anti-cancer agent
      • Target of effect, e.g., targeting, targeted
      • Effect action, e.g., active in
      • Effect against something, e.g., anti-cancer,
        anti-tumor, antagonist
      • Effect in controlling or inhibiting something,
        e.g., controlling, inhibition, inhibitor,
        cytostatic
      • Effect in decreasing or increasing something,
        e.g., impaired, decrease, reduce, regression,
        remission, increase, elevated
      • Effect in killing something, e.g., kill,
        apoptosis inducing, cytotoxic
      • Good effect, e.g., beneficial, useful, benefit,
        improve, promising
      • Therapeutic effect, e.g., treat, curatively,
        clinical, clinically
      • Free of disease, e.g., disease-free,
        recurrence-free
      • Interaction effect, e.g., synergistic, modulation
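Once patterns are grouped, a matched sentence can be summarized by the semantic categories it touches. The Python sketch below uses a small, hand-picked subset of the example patterns and categories from the two slides above, in the hypothetical mapping `CATEGORY_OF`. Like the previous sketch, it assumes whole-word, case-insensitive matching.

```python
import re
from collections import Counter
from typing import Dict

# A small, hand-picked subset of the slides' example patterns and categories.
CATEGORY_OF: Dict[str, str] = {
    "administered": "Administration of treatment",
    "dose of": "Treatment dosage",
    "survival benefit": "Mortality and survival",
    "chemotherapy": "Therapy",
    "adjuvant": "Therapy",
    "clinical trial": "Clinical trial",
    "cytotoxic": "Effect in killing something",
    "disease-free": "Free of disease",
    "synergistic": "Interaction effect",
}


def categories_in(sentence: str) -> Counter:
    """Count how many patterns from each semantic category occur in a sentence."""
    text = sentence.lower()
    found: Counter = Counter()
    for pattern, category in CATEGORY_OF.items():
        if re.search(r"\b" + re.escape(pattern) + r"\b", text):
            found[category] += 1
    return found


print(categories_in("Adjuvant chemotherapy gave a survival benefit to disease-free patients."))
```

Keeping the category alongside each match makes it easy to see, for example, whether a sentence is flagged only by generic Therapy words or also by a more specific effect or outcome pattern.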

22
Evaluation of Patterns
  • Preliminary evaluation of the effectiveness of
    the patterns for identifying treatment relations,
    using a sample of 30 good abstracts

23
Results & Discussion
  • Recall is at least 60%
  • Precision is low, especially for sentences
    mentioning only a disease
  • The patterns were also applied to sentences from
    30 bad abstracts
    • Precision of 37.3% and recall of 69.1% were
      obtained
  • Initial assessment: bad abstracts report more
    theoretical studies that are not directly
    focused on treatments
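For reference, the precision and recall figures above follow the standard definitions, sketched here in Python. The counts in the usage line are hypothetical, because the slides report only the resulting percentages. The example shows how a pattern set can keep fairly high recall while precision drops, which is what happens when many matched sentences contain no treatment relation.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall for treatment-relation detection.

    tp: treatment-relation sentences the patterns matched
    fp: matched sentences without a treatment relation
    fn: treatment-relation sentences the patterns missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
p, r = precision_recall(tp=4, fp=6, fn=1)
print(f"precision={p:.1%} recall={r:.1%}")
```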

24
Additional Preliminary Findings
  • Patterns were matched against data sets from
    other medical domains
  • Colon Cancer
    • 162 matched sentences out of 289 sentences (58%)
  • Breast Cancer
    • 453 matched sentences out of 1134 sentences (40%)

25
Top Six Patterns Matched
  • Colon Cancer
  • chemotherapy (37)
  • adjuvant (33)
  • treated (21)
  • therapy (18)
  • adjuvant chemotherapy (13)
  • survival benefit (11)
  • Breast Cancer
  • chemotherapy (107)
  • therapy (95)
  • adjuvant (42)
  • treated (41)
  • received (29)
  • Treated with (29)

26
Conclusion
  • Explored a semi-automatic approach to
    constructing linguistic patterns
  • Association rule mining did not yield useful
    patterns
    • Suggests that statistical association measures
      be used in combination with syntactic and
      semantic constraints
  • The manually constructed extraction patterns
    yielded promising results
  • Still need to improve the patterns through error
    analysis
  • Apply the patterns to other medical domains, such
    as breast cancer, heart disease and AIDS
  • Intend to develop a method for identifying
    treatment relations across sentences

27
Error Analysis
  • From the false negative sentences (sentences
    containing a treatment relation that were not
    identified by the patterns), 16 additional
    patterns were identified
    • Six are new patterns not found in the original
      pattern list
    • Four are spelling variations of existing
      patterns (e.g., anti-tumour vs. anti-tumor)
    • Four are parts of existing patterns
    • Two are similar to existing patterns, either
      differing only in word order or sharing an
      important word
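Spelling variants such as anti-tumour vs. anti-tumor can be handled by normalizing both the patterns and the sentences before matching, instead of adding one pattern per variant. The Python sketch below maps British spellings to American ones. Only the tumour/tumor pair comes from the slide; the other entries in `VARIANTS` are hypothetical additions, and replacements are emitted in lower case.

```python
import re
from typing import Dict

# Variant table; only tumour -> tumor comes from the slide, the rest are
# hypothetical examples.
VARIANTS: Dict[str, str] = {"tumour": "tumor", "tumours": "tumors",
                            "oesophageal": "esophageal"}
_VARIANT_RE = re.compile(r"\b(" + "|".join(map(re.escape, VARIANTS)) + r")\b",
                         re.IGNORECASE)


def normalize_spelling(text: str) -> str:
    """Replace listed British spellings with American ones (lower-case output)."""
    return _VARIANT_RE.sub(lambda m: VARIANTS[m.group(1).lower()], text)


print(normalize_spelling("Anti-tumour activity was seen in two tumours."))
```

Because the hyphen is a word boundary, the prefix in "anti-tumour" is preserved and only the variant word is rewritten.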

28
Error Analysis
  • Among the false positive sentences (sentences
    identified by the patterns that do not contain a
    treatment relation), the majority contained no
    specific reference to a drug or treatment
  • Two sentences referred to treatment, but no
    useful information could be extracted:
    • The treatment was effective and the lesion
      disappeared.
    • There was a massive therapeutic effect without
      side effects.
  • Several sentences referred to a treatment
    described in other sentences of the abstract
  • Other sentences described results of theoretical
    studies, diagnosis, chemotherapy schedules, etc.