Title: A Knowledge-based Medical Digital Library
1 A Knowledge-based Medical Digital Library
- Wesley W. Chu
- Computer Science Dept, UCLA
- wwc_at_cs.ucla.edu
2 Data in a Medical Digital Library
- Structured data (patient lab data, demographic data) -- CoBase
- Images (X rays, MRI, CT scans) -- KMeD
- Free-text -- KMeX
- Patient reports
- Teaching files
- Literature
- News articles
3 System Overview
[Figure: a query goes to the Medical Digital Library, which returns relevant information drawn from:]
- structured data (e.g., lab results, patient demographic data)
- image data (e.g., X-ray images, CT images, etc.)
- free-text data (e.g., medical literature, news articles, etc.)
4 Benefits of a Knowledge-based Medical Digital Library
- Content-based information retrieval
- Transforms patient records into a sea of information sources
- Provides scenario-specific information for patient care, medical research, and education
5 Characteristics of Medical Queries
- Multimedia
- Temporal
- Evolutionary
- Spatial
- Imprecise
6 CoBase: Cooperative Database
www.cobase.cs.ucla.edu
- Use knowledge base to
- Derive Approximate Answers
- Answer Conceptual Queries
- Provide Associative Query Answers
7 KB: Type Abstraction Hierarchy (TAH)
- Uses a clustering technique to group similar
- Attribute values
- Image features
- Spatial relationships among objects
- Provides multi-level knowledge (conceptual) representation
8 Data mining for KB -- TAH
- Clustering data of an attribute
- Value -- difference between the exact value and the returned approximate value
- Frequency -- probability of occurrence for each value
- Can be extended to multiple attributes
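The two ingredients named on this slide (value difference and frequency) can be combined into a single clustering criterion. The following is a minimal sketch, assuming the criterion is the expected absolute difference between two values drawn from the same cluster according to their observed frequencies. The function names, the binary-split search, and the tumor sizes are illustrative and not taken from the slides.

```python
from collections import Counter
from itertools import product


def relaxation_error(values):
    """Expected |x - y| when x and y are drawn independently from the
    cluster according to their observed frequencies."""
    counts = Counter(values)
    total = len(values)
    return sum(
        (cx / total) * (cy / total) * abs(x - y)
        for (x, cx), (y, cy) in product(counts.items(), repeat=2)
    )


def best_binary_split(values):
    """Try every cut between adjacent distinct values and keep the cut whose
    two sub-clusters have the lowest size-weighted relaxation error."""
    ordered = sorted(values)
    distinct = sorted(set(ordered))
    best = None
    for cut in distinct[1:]:
        left = [v for v in ordered if v < cut]
        right = [v for v in ordered if v >= cut]
        cost = (len(left) * relaxation_error(left)
                + len(right) * relaxation_error(right)) / len(ordered)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best


# Hypothetical tumor sizes in millimetres.
sizes = [3, 4, 4, 5, 20, 22, 23]
cost, small, large = best_binary_split(sizes)
```

Applying the split recursively to each sub-cluster would give a multi-level hierarchy for one attribute.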
9 Type Abstraction Hierarchies for Medical Domain
10 Generalization and Specialization in TAH
11 Query Relaxation
12 Cooperative Querying for Medical Applications
- Query
- Find the treatment used for the tumor similar-to (loc, size) X1 on 12 year-old Korean males.
- Relaxed Query
- Find the treatment used for the tumor Class X on preteen Asians.
- Association
- The success rate, side effects, and cost of the treatment.
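The step from "12 year-old" to "preteen" can be read as generalizing a value to a TAH node and then specializing the node back into the values it covers. The sketch below assumes a small hand-written age hierarchy. The node names, age ranges, and function names are hypothetical and only illustrate the generalize-then-specialize pattern.

```python
# Hypothetical TAH for patient age: inner nodes map to their children,
# leaves map to (low, high) age ranges in years.
AGE_TAH = {
    "any age": ["child", "adult"],
    "child": ["preschool", "preteen", "teen"],
    "adult": ["young adult", "senior"],
}
AGE_RANGES = {
    "preschool": (0, 5), "preteen": (6, 12), "teen": (13, 19),
    "young adult": (20, 64), "senior": (65, 120),
}


def leaf_for(age):
    """Generalize an exact age to the leaf concept that contains it."""
    for name, (low, high) in AGE_RANGES.items():
        if low <= age <= high:
            return name
    raise ValueError(f"no TAH leaf covers age {age}")


def parent_of(node):
    for parent, children in AGE_TAH.items():
        if node in children:
            return parent
    return None  # node is the root


def leaves_under(node):
    if node in AGE_RANGES:
        return [node]
    return [leaf for child in AGE_TAH[node] for leaf in leaves_under(child)]


def relax(age, levels=1):
    """Climb `levels` steps above the leaf concept for `age`, then specialize
    the reached node back into the age ranges it covers."""
    node = leaf_for(age)
    for _ in range(levels):
        up = parent_of(node)
        if up is None:
            break
        node = up
    return node, [AGE_RANGES[leaf] for leaf in leaves_under(node)]
```

With these assumed ranges, `relax(12, levels=0)` stays at "preteen", and each further level widens the set of acceptable ages.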
14 KMeD: Retrieval of Images by Content
www.kmed.cs.ucla.edu
- PI: Wesley Chu, Ph.D, Computer Science Department
- Co-PIs
- A. Cardenas, Ph.D, Computer Science Department
- Ricky Taira, Ph.D, School of Medicine
- Consultants
- Denise Aberle, M.D.
- C.M. Breant, Ph.D
- Graduate students
- Alex Bui
- Christina Chu
- John Dionisio
- T. Plattner
- D. Johnson
- C. Hsu
- T. Ieong
15 KMeD Objectives
- Matching images based on features
- Processing of queries based on spatial relationships among objects
- Answering of imprecise queries
- Visual query interface
16 KMeD: Retrieval of Images by Feature and Content
- Features
- size, shape, texture, density, histology
- Spatial Relations
- angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
- Evolution of Object Growth
- fusion, fission
21 Knowledge-Based Image Model
Representation Level (features and content)
22 Queries
Query Analysis and Feature Selection
Knowledge-Based Query Processing
Knowledge-Based Content Matching Via TAHs
Query Relaxation
Query Answers
23 User Model
- Customizes query processing to the user's interests, preferences, needs, and goals
- e.g., query conditions, relaxation control, etc.
- User type
- Default Parameter Values
- Feature and Content Matching Policies
- Complete Match
- Partial Match
24 User Model (cont.)
- Relaxation Control Policies
- Relaxation Order
- Unrelaxable Object
- Preference List
- Measure for Ranking
- Triggering conditions
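One way to picture the relaxation control policies listed on this slide is as a small per-user settings record. This is a sketch only: the class, its field names, and the example values are assumptions made for illustration, not the KMeD data model.

```python
from dataclasses import dataclass, field


@dataclass
class RelaxationPolicy:
    """Per-user relaxation control settings (all field names are illustrative)."""
    relaxation_order: list                                # attributes, relaxed first to last
    unrelaxable: set = field(default_factory=set)         # conditions that must stay exact
    preference_list: dict = field(default_factory=dict)   # attribute -> preferred values
    min_answers: int = 1                                   # triggering condition

    def attributes_to_relax(self):
        """Attributes in relaxation order, skipping the unrelaxable ones."""
        return [a for a in self.relaxation_order if a not in self.unrelaxable]

    def should_relax(self, answer_count):
        """Trigger relaxation when the exact query returns too few answers."""
        return answer_count < self.min_answers


# Hypothetical policy for one user type.
radiologist = RelaxationPolicy(
    relaxation_order=["tumor_size", "tumor_location", "patient_age"],
    unrelaxable={"tumor_location"},
    min_answers=5,
)
```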
25 Query Preprocessing
- Segment and label contours for objects of interest
- Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects
- Organize the features and spatial relationships of objects into a feature database
- Classify the feature database into a Type Abstraction Hierarchy (TAH)
27 Visual Query Language and Interface
- Point-click-drag interface
- Objects may be represented by icons
- Spatial relationships among objects are
represented graphically
29 Visual Query Example
Retrieve brain tumor cases where a tumor is
located in the region as indicated in the picture
34 A KB Medical Digital Library
www.cobase.cs.ucla.edu
[Figure: system overview. A query goes to the Medical Digital Library, which returns relevant information from structured data, image data, and free-text data.]
35 KMeX
www.cobase.cs.ucla.edu
- Project leader: Wesley W. Chu
- Consultants: Hooshang Kangarloo, M.D., Denise Aberle, M.D.
- Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
36 A Sample Patient Report
- Tissue Source
- LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
- FINAL DIAGNOSIS
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION)
- LUNG CANCER, SMALL CELL, STAGE II.
37 Scenario-Specific Queries
- Queries that mention one or more scenarios
- E.g., "keratoconus treatment"; "lung cancer diagnosis and complications"
- A scenario (e.g., treatment): a repeating healthcare situation
- >60% of medical queries are scenario specific [HMW90, HPH96, EOE99, EOG00, WMH01]
38 Scenario Specific Retrieval
[Figure: the sample patient report from slide 36 (tissue source: lung, fine needle aspiration, left lower lobe; final diagnosis: lung nodule and small cell lung cancer, stage II).]
39
- Challenge I: Indexing
- Challenge II: Terms in the query are too general
- Challenge III: Mismatch between terms in the query and the documents
40 IndexFinder
http://fargo.homedns.org/umls/demo.aspx
- Extract key information from clinical free texts
- Search relevant reports
- Search similar patients
- Medical KB (UMLS) provides standard medical concepts
- IndexFinder
- Extracts UMLS concepts from clinical texts
41 Previous Approaches
Free text
42 Problems of Previous Approaches
- Concepts cannot be discovered if they are not in a single noun phrase.
- E.g., in "second, third, and fourth ribs", "second rib" cannot be discovered.
- Difficult to scale to large text computing.
- Natural language processing requires significant computing resources
43 Our Approach: IndexFinder (Zou et al. 03)
Previous: free text -> UMLS
Suppose UMLS contains only "Lung cancer".
We would discard all words in the text except "lung" and "cancer".
44 Knowledge-based approach
- Using the compact index data without using any database system.
- Permuting words in a sentence to generate UMLS concept candidates.
- Using filters to eliminate irrelevant concepts.
45 Concept Candidates Generation
- Assumptions
- Knowledge base provides a phrase table.
- Each phrase (concept) is a set of words.
- An input text T is represented as a set of words.
- Goal
- Combining words in T to generate concept candidates
- Example
- T = {D, E, F}
- Answer: 5
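The assumptions above (phrases and text as sets of words) suggest a simple way to generate candidates: a phrase is a candidate when every one of its words occurs in the text, and words that appear in no phrase are ignored. The following is a minimal sketch of that idea. The phrase table, its IDs, and the function names are hypothetical, and the inverted index is one possible realization rather than the published IndexFinder data structure.

```python
# Hypothetical phrase table: phrase id -> set of words in the phrase.
PHRASE_TABLE = {
    "P1": {"lung", "cancer"},
    "P2": {"lung"},
    "P3": {"small", "cell", "lung", "cancer"},
    "P4": {"breast", "cancer"},
}


def build_inverted_index(table):
    """Map each word to the ids of the phrases that contain it."""
    index = {}
    for phrase_id, words in table.items():
        for word in words:
            index.setdefault(word, set()).add(phrase_id)
    return index


def concept_candidates(text_words, table, index):
    """Return ids of phrases whose words all occur in the text.
    Words that appear in no phrase are simply skipped."""
    hits = {}
    for word in set(text_words):
        for phrase_id in index.get(word, ()):
            hits[phrase_id] = hits.get(phrase_id, 0) + 1
    return sorted(pid for pid, n in hits.items() if n == len(table[pid]))


index = build_inverted_index(PHRASE_TABLE)
text = "small cell lung cancer of the left lower lobe".split()
candidates = concept_candidates(text, PHRASE_TABLE, index)
```

Because only words found in the index are touched, the work grows with the number of distinct text words rather than with the size of the phrase table.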
46 Eliminate irrelevant concepts
- Syntactic filter
- Limit the word combinations within a sentence.
- Semantic filter
- Using semantic types (e.g. body part, disease, treatment, diagnosis)
- Using the ISA relationship to filter out general terms and keep the specific ones.
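The two filters can be sketched as small predicates over candidate concepts. The code below is illustrative only: it assumes the syntactic filter means that the words of a candidate must fit inside a window of consecutive tokens, and it uses made-up semantic types and ISA links in place of UMLS data. All function names are hypothetical.

```python
from itertools import product


def fits_window(tokens, phrase_words, window):
    """Syntactic filter (assumed form): keep a candidate only if one occurrence
    of each of its words fits inside `window` consecutive tokens."""
    positions = [[i for i, t in enumerate(tokens) if t == w] for w in phrase_words]
    if any(not p for p in positions):
        return False
    return any(max(c) - min(c) + 1 <= window for c in product(*positions))


def ancestors(concept, isa_parent):
    """All concepts reachable by following ISA links upward."""
    found = set()
    while concept in isa_parent and isa_parent[concept] not in found:
        concept = isa_parent[concept]
        found.add(concept)
    return found


def semantic_filter(candidates, semantic_type, allowed_types, isa_parent):
    """Keep candidates of an allowed semantic type, then drop any concept
    that is an ISA ancestor of another kept concept (too general)."""
    kept = [c for c in candidates if semantic_type.get(c) in allowed_types]
    general = set()
    for c in kept:
        general |= ancestors(c, isa_parent)
    return [c for c in kept if c not in general]


tokens = "a small mass was found in the left hilum of the lung".split()

# Hypothetical semantic types and ISA links, for illustration only.
semantic_type = {"lung": "body part", "mass": "disease",
                 "lung mass": "disease", "left": "spatial concept"}
isa_parent = {"lung mass": "mass"}
specific = semantic_filter(["lung", "mass", "lung mass", "left"],
                           semantic_type, {"body part", "disease"}, isa_parent)
```

Here "left" is dropped by its semantic type and "mass" is dropped because the more specific "lung mass" is also present.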
47 Comparison of IndexFinder with MetaMap
Input: "A small mass was found in the left hilum of the lung."
48 Summary of IndexFinder
http://fargo.homedns.org/umls/demo.aspx
- An efficient method that maps from UMLS to free text to extract concepts without using any database system.
- Syntactic and semantic filters are used to eliminate irrelevant candidates.
- IndexFinder is able to find more specific concepts than NLP approaches.
- IndexFinder is scalable and can be operated in real time.
49 Topic Directory
- Using indexing for document retrieval cannot provide
- Standard vocabulary
- Cross reference among topics
- Scenario-specific search
- Topic directory resolves these shortcomings by dynamically clustering documents into knowledge-based topics based on user-specified scenarios
50 Challenge II: Terms used in the query are too general
- Scenario terms in a query, e.g., treatment
- vs. specialized terms in documents, e.g., epikeratoplasty, contact lens, etc.
51 Challenge II: Terms used in the query are too general
- Expanding the general terms in the query to specific terms that are used in the document
Query: lung cancer, diagnosis options
Expanded query: lung cancer, chest x-ray, bronchography, ...
Document: the effectiveness of chest x-ray and bronchography on patients with lung cancer
52 The Mismatch Problem
- Scenario concepts are too general to match specialized ones in relevant docs
Query: keratoconus, treatment
Expanded Query: keratoconus, treatment, contact lens, epikeratoplasty, epikeratophakia
Document 1: The use of contact lens after keratoconic epikeratoplasty
Document 2: Epikeratophakia for aphakia, keratoconus, and myopia
53 Deriving Scenario-Specific Relationships
[Figure: contact lens, epikeratoplasty, and epikeratophakia, each linked to keratoconus by a "treats" relationship.]
- More specific than term co-occurrence relationships
- Not directly available through a knowledge source
54 Basic Idea
- Start from pairs of frequently co-occurring concepts [Qiu03, Jing94, Xu96]
- Apply knowledge structures to filter out pairs that are irrelevant to a given scenario, e.g., treatment
55 Sample Co-Occurring Pairs
- Concepts most frequently co-occurring with
keratoconus
56 UMLS: The Knowledge Source
- Three major components
- The MetaThesaurus
- > 800K medical concepts, each <ID, multiple string forms>
- E.g., <C0022578, Keratoconus, Cornea conical>
- Used for detecting concepts from free text
- The Semantic Network
- 100 semantic types, 50 relations among types
- E.g., Disease or Syndrome, containing 44,000 concepts
- Used for deriving scenario-specific relationships
- The SPECIALIST Lexicon
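57 Structure of The Knowledge Source
[Figure: two layers. The Semantic Network layer shows the semantic types Pharmacological Substance and Disease or Syndrome joined by a "treats" relation. The Meta-Thesaurus layer shows the concepts insulin, lactase, keratoconus, and acute hydrops keratoconus.]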
58 Fragment of The Semantic Network for Each Scenario
- E.g., the treatment scenario
[Figure: Therapeutic or Preventive Procedure, Pharmacological Substance, and Medical Device are each linked to Disease or Syndrome by a "treats" relation.]
59 Filtering
[Figure: concepts shown around keratoconus for filtering: corneal, central cornea, penetrating keratoplasty, acute hydrops, epikeratoplasty, griffonia, contact lens.]
60 Knowledge-Based Query Expansion
- Original query: <ckey, cs>
- ckey, a key concept, e.g., keratoconus
- cs, a set of scenario concepts, e.g., treatment
- ce, concepts having scenario-specific relationships with ckey
- ckey - cs - ce, e.g., keratoconus - treats - contact lens
- Expanded query: <ckey, cs, ce>
- E.g., keratoconus, treatment, contact lens, epikeratoplasty, epikeratophakia
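The expansion step can be sketched as a filter over co-occurring concepts: a concept is kept as ce only when its semantic type stands in a scenario relation (here "treats") to the semantic type of ckey. This is a minimal sketch under that assumption. The type assignments for individual concepts and the function name are illustrative, not read from UMLS.

```python
# Semantic-network fragment for the treatment scenario: semantic types that
# can "treat" a Disease or Syndrome (as in the slide 58 fragment).
SCENARIO_RELATIONS = {
    "treatment": {
        "Disease or Syndrome": {
            "Therapeutic or Preventive Procedure",
            "Pharmacological Substance",
            "Medical Device",
        },
    },
}

# Semantic types of individual concepts; these assignments are illustrative.
SEMANTIC_TYPE = {
    "keratoconus": "Disease or Syndrome",
    "acute hydrops": "Disease or Syndrome",
    "corneal": "Body Part",
    "penetrating keratoplasty": "Therapeutic or Preventive Procedure",
    "epikeratoplasty": "Therapeutic or Preventive Procedure",
    "contact lens": "Medical Device",
}


def expand_query(ckey, scenario, cooccurring):
    """Keep only co-occurring concepts whose semantic type is related to the
    type of ckey under the given scenario, then append them to the query."""
    allowed = SCENARIO_RELATIONS[scenario].get(SEMANTIC_TYPE[ckey], set())
    ce = [c for c in cooccurring if SEMANTIC_TYPE.get(c) in allowed]
    return [ckey, scenario] + ce


cooccurring = ["corneal", "penetrating keratoplasty", "acute hydrops",
               "epikeratoplasty", "contact lens"]
expanded = expand_query("keratoconus", "treatment", cooccurring)
```

Concepts such as "corneal" and "acute hydrops" co-occur with keratoconus but are dropped because their types do not treat a disease.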
61 Statistical Expansion (Baseline for Comparison)
- Original query: <ckey, cs>
- ce, co-occurring concepts
- ckey co-occurs with ce, e.g., keratoconus co-occurs with corneal
- Expanded query: <ckey, cs, ce>
62 Assigning Proper Weights to Expansion Terms
- Vector Space Model (VSM) for computing query-document similarity
- Both the query and the document are represented as vectors of terms
- The weight for a single term might significantly affect document ranking
- ve: weight of an expansion term ce
- Based on the co-occurrence between ce and ckey
- ve = vkey * co(ckey, ce)
- vkey: weight of ckey computed from regular tf-idf weighting
- co(ckey, ce): statistical co-occurrence between ckey and ce
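A small numeric sketch may help show how the weights enter the vector space model. It assumes the expansion weight is the product of vkey and co(ckey, ce) and that query-document similarity is the cosine of sparse term-weight vectors. The weight and co-occurrence values are hypothetical.

```python
import math


def expansion_weights(v_key, cooccurrence):
    """ve = vkey * co(ckey, ce) for every expansion concept ce."""
    return {ce: v_key * co for ce, co in cooccurrence.items()}


def cosine(query, document):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * document.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


v_key = 1.0                                             # hypothetical tf-idf weight of ckey
co = {"contact lens": 0.5, "epikeratoplasty": 0.25}     # hypothetical co-occurrence values
query = {"keratoconus": v_key, **expansion_weights(v_key, co)}
document = {"contact lens": 1.0, "epikeratoplasty": 1.0}
```

A document that never mentions keratoconus has zero similarity to the unexpanded query but a positive similarity to the expanded one, which is why the size of each ve matters for ranking.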
63 Need for Weight Adjustments
- Weight adjustments needed to compensate for the filtering
Statistical expansion            | Knowledge-based expansion
ce                         ve    | ce                         ve
fuchs dystrophy            0.289 | penetrating keratoplasty   0.247
penetrating keratoplasty   0.247 | epikeratoplasty            0.230
epikeratoplasty            0.230 | epikeratophakia            0.119
corneal ectasia            0.168 | keratoplasty               0.103
acute hydrops              0.165 | contact lens               0.101
keratometry                0.133 | thermokeratoplasty         0.092
corneal topography         0.132 | button                     0.067
corneal                    0.130 | secondary lens implant     0.057
aphakic corneal edema      0.122 | fittings adapters          0.048
epikeratophakia            0.119 | esthesiometer              0.043
64 The OHSUMED Testbed
- A testbed: a benchmark query set, a corpus, and relevance judgments for each query
- OHSUMED [HBL94]
- 57 scenario-specific queries
- e.g., "keratoconus treatment"; "thrombocytosis treatment and diagnosis"; "diagnostic and therapeutic work up of breast mass"
- 348K MEDLINE articles (title + abstract), 1988-1992
65 OHSUMED Scenarios
Scenario                                        | # of Queries
treatment of a disease                          | 28
diagnosis of a disease                          | 11
prevention of a disease                         | 2
differential diagnosis of a symptom/disease     | 12
pathophysiology of a disease                    | 5
complications of a disease/medication           | 7
etiology of a disease                           | 3
risk factors of a disease                       | 3
prognosis of a disease                          | 1
epidemiology of a disease                       | 1
research of a disease                           | 1
organisms of a disease                          | 1
criteria of medication                          | 3
when to perform a medication                    | 1
preventive health care for a type of patients   | 1
66 Comparison Under Different Expansion Sizes
- s: expansion size
- Metric: avgp
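The slide does not define avgp. A common reading is average precision, so the sketch below shows the non-interpolated form of that metric for a single query, as an assumption rather than the exact measure used in the experiments. The document ids are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one query: the mean of the
    precision values at the rank of each relevant document retrieved,
    divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Average the per-query scores over (ranking, relevant set) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranking for one query: d1 and d3 are the relevant documents.
score = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Comparing this score across expansion sizes s would show how many expansion terms help before noise starts to hurt the ranking.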
67 Summary of Query Expansion
- The knowledge-based approach selects more scenario-specific terms than the statistical approach and achieves better performance
- Differences in the quality of the knowledge structure across scenarios yield different performance improvements
68
- Challenge I: Indexing
- Challenge II: Terms in the query are too general
- Challenge III: Mismatch between terms used in the query and the documents causes problems in ranking of results
69 Challenge III: Mismatch between terms used in query and documents
Query: lung cancer, ...
Document 1: lung carcinoma
Document 2: lung neoplasm
Document 3: anti-cancer drug combinations
70 Ranking query results
- Traditional approach
- Word Stem based Vector Space Model (VSM)
- Concept based VSM
- New approach
- Phrase based (word + concept) VSM
71 Phrase-based Vector Space Model (VSM)
[Figure: the query "lung cancer, ..." compared with three documents ("lung carcinoma", "lung neoplasm", "anti-cancer drug combinations"). Surviving labels: "lung cancer / lung carcinoma", "parent_of", "lung neoplasm", "Knowledge-source", and "missing!!!".]
72 Phrase-based VSM Examples
(C0242379) lung cancer
(C0003393) anti cancer drug combin
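To illustrate why pairing a concept with its word stems helps, the sketch below scores phrase pairs by the larger of a concept-level score and a stem-overlap score. The scoring rule, the 0.5 score for parent/child concepts, the placeholder parent id, and the assumption that "lung carcinoma" maps to the same concept as "lung cancer" are all illustrative; only the two concept ids shown on this slide come from the source.

```python
# A phrase is a (concept id, set of word stems) pair.
QUERY = ("C0242379", frozenset({"lung", "cancer"}))
DOCUMENTS = {
    # Assumed to share the query's concept id.
    "lung carcinoma": ("C0242379", frozenset({"lung", "carcinoma"})),
    # Placeholder id for the parent concept; not a real UMLS identifier.
    "lung neoplasm": ("PARENT_PLACEHOLDER", frozenset({"lung", "neoplasm"})),
    "anti-cancer drug combinations":
        ("C0003393", frozenset({"anti", "cancer", "drug", "combin"})),
}
PARENT_OF = {"C0242379": "PARENT_PLACEHOLDER"}   # child concept -> parent concept
RELATED_SCORE = 0.5                               # assumed score for parent/child pairs


def concept_similarity(a, b):
    if a == b:
        return 1.0
    if PARENT_OF.get(a) == b or PARENT_OF.get(b) == a:
        return RELATED_SCORE
    return 0.0


def stem_similarity(stems_a, stems_b):
    """Jaccard overlap of the two stem sets."""
    return len(stems_a & stems_b) / len(stems_a | stems_b)


def phrase_similarity(p, q):
    """Score a phrase pair by the better of its concept and stem evidence."""
    return max(concept_similarity(p[0], q[0]), stem_similarity(p[1], q[1]))


phrase_scores = {name: phrase_similarity(QUERY, doc) for name, doc in DOCUMENTS.items()}
stem_scores = {name: stem_similarity(QUERY[1], doc[1]) for name, doc in DOCUMENTS.items()}
```

With stems alone, "lung carcinoma" and "lung neoplasm" score the same, while the phrase-based score separates the synonym from the parent concept and keeps both above the unrelated drug phrase.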
73 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
16100 queries vs. 5 50 queries
74 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
75 Summary of free text retrieval
- Knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval
- IndexFinder
- Word permutation
- Syntactic and semantic filtering
- Extracts domain-specific key concepts for indexing
- Knowledge-based query expansion
- Transforms general terms into the scenario-specific terms
- A higher probability for the query to match relevant documents
- Phrase-based indexing
- Transforms document indexing into a phrase paradigm (concept and its word stems)
- Improves retrieval effectiveness
76 Experimental Results
- Knowledge-based query expansion (KQE) is superior to statistical query expansion.
- Knowledge-based phrase vector space model (PVSM) is superior to stem-based vector space model (SVSM).
- KQE + PVSM can yield 15-20% improvements in precision/recall over SVSM.
77 KMeX Demo
[Figure: an ad-hoc query, or a patient report for content correlation, is sent to the Medical Digital Library (free text documents); the query results cover news articles, patient reports, medical literature, and teaching materials.]
78 Query Answering via Templates
- Sample templates: <disease>, treatment; <disease>, diagnosis
[Figure: the template "<disease>, treatment" is filled in as "lung cancer, treatment". The diagram's components are IndexFinder, Query Expansion (with the terms radiotherapy, chemotherapy, and cisplatin), and the Phrase-based VSM, which returns relevant documents.]
79 Applications (contd)
- Scenario-specific content correlation
[Figure: components shown are Patient Report, IndexFinder, Scenario Selection (e.g. treatment, diagnosis, etc.), Query Templates, Query Expansion, and Phrase-based VSM, ending in relevant documents.]
90 Future Applications
- Patients search for relevant literature and specialists regarding the treatment of their specific disease.
- Healthcare providers identify other individuals with similar demography and disease, and discover the success rates and side effects of the treatment methods used.
- Medical researchers study the characteristics of new diseases and the effectiveness of treatment methods for those diseases.
91 Acknowledgments
- This research was supported by
- DARPA F30602-94-C-0207
- NSF grant IIS-0097438
- NIC/NIH Grant 4442511-33780