Title: Knowledge-based Information Management for Biomedical Applications
1 Knowledge-based Information Management for Biomedical Applications
- Wesley Chu
- Computer Science Department
- University of California
- Los Angeles, CA
- wwc_at_cs.ucla.edu
- www.kmed.cs.ucla.edu
2 Outline
- Data types
- Uses of knowledge bases to enhance information management
- Sample systems
- Structured data
- Multi-media
- Free-text
- Conclusion
3 Information Formats Used in Biomedical Applications
- Structured data
- Multi-media images
- Semi-structured
- Free-text
4 Uses of Knowledge Bases to Enhance Information Management
- Approximate matching
- Query conditions
- Image features
- Similar conceptual terms
5 Uses of Knowledge Bases to Enhance Information Management
- KB query processing
- Similarity query answering
- Associative query answering
- Scenario-specific query answering
- Sentinel: triggering and alerting
6 Examples of KB Information Systems
- CoBase (1990-1998), DARPA
- A database that cooperates with the user for structured data
- KMeD (1991-2000), NSF
- A knowledge-based medical multi-media database
- Medical Digital Library (2001-2005), NIH
- A knowledge-based digital file room for patient care, education, and research.
7 CoBase www.cobase.cs.ucla.edu
- Project leader: Wesley W. Chu
- Graduate students
- K. Chiang
- C. Larson
- R. Lee
- M. Merzbacher
- M. Minock
- Frank Meng
- Wenlei Mao
- Mark Yang
- K. Zhang
- Staff
- Q. Chen
- Gladys Chow
- Hua Yang
8 CoBase: Cooperative Databases
- Conventional query answering
- Need to know the detailed database schema
- Cannot get approximate answers
- Cannot answer conceptual queries
- Cooperative query answering
- Derive approximate answers
- Answer conceptual queries
- Provide additional relevant answers that the user does not (or does not know how to) ask for
9 Cooperative Queries
- Sample queries
- Find a nearby friendly airport that can land an F-15
- Find hospitals with facility similar to St. Johns near LAX
- Find a seaport with railway facility in Los Angeles
- CoBase provides: Relaxation, Approximation, Association, Explanation
- Diagram labels: CoBase Servers, Domain Knowledge, Heterogeneous Information Sources
10 Generalization and Specialization
11 Cooperative Querying for Medical Applications
- Query
- Find the treatment used for the tumor similar-to (loc, size) X1 on 12 year-old Korean males.
- Relaxed Query
- Find the treatment used for the tumor Class X on preteen Asians.
- Association
- The success rate, side effects, and cost of the treatment.
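The relaxation step on this slide can be sketched as walking each query value one level up a type abstraction hierarchy until some case matches. The sketch below is illustrative only: the hierarchies, the case records, and the function names are all hypothetical and are not taken from CoBase.

```python
# Hypothetical type abstraction hierarchies (TAHs), stored as child -> parent.
AGE_TAH = {"12 year-old": "preteen", "9 year-old": "preteen", "preteen": "child"}
ETHNICITY_TAH = {"Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian"}
TAHS = {"age": AGE_TAH, "ethnicity": ETHNICITY_TAH}

# Hypothetical case records; no exact match exists for a 12 year-old Korean.
CASES = [
    {"age": "9 year-old", "ethnicity": "Chinese", "treatment": "surgery"},
    {"age": "12 year-old", "ethnicity": "Japanese", "treatment": "radiation"},
]

def ancestors(value, tah):
    """Return the value followed by its ancestors, most specific first."""
    chain = [value]
    while chain[-1] in tah:
        chain.append(tah[chain[-1]])
    return chain

def matches(case, query):
    """A case matches when each of its attribute values generalizes to the query value."""
    return all(query[a] in ancestors(case[a], TAHS[a]) for a in query)

def relax(query):
    """Generalize every attribute one level up its TAH, where a parent exists."""
    return {a: TAHS[a].get(v, v) for a, v in query.items()}

def answer(query, max_steps=3):
    """Relax the query step by step until at least one case matches."""
    for _ in range(max_steps + 1):
        hits = [c for c in CASES if matches(c, query)]
        if hits:
            return query, hits
        query = relax(query)
    return query, []

relaxed, hits = answer({"age": "12 year-old", "ethnicity": "Korean"})
print(relaxed)                         # the query that finally produced answers
print([c["treatment"] for c in hits])  # treatments from the approximate matches
```

Relaxing all attributes at once is a simplification; a relaxation order or a list of unrelaxable attributes could be applied inside `relax`.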
12 Type Abstraction Hierarchies for Medical Domain
13 KB Type Abstraction Hierarchy
- Using clustering technique to group similar
- Attribute values
- Image features
- Spatial relationships among objects
- Provides multi-level knowledge (conceptual)
representation
14 Data Mining for TAH for Numerical Attribute Values
- Clustering metric: relaxation error
- Difference between the exact value and the returned approximate value
- Relaxation error is weighted by the probability of occurrence of each value
- Can be extended to multiple attributes
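One way to read the relaxation-error metric is as the expected absolute difference between an exact value and the value returned in its place, with both weighted by how often they occur in the cluster. The sketch below assumes that reading; the formula, the exhaustive split search, and the function names are assumptions for illustration, not the published algorithm.

```python
from collections import Counter

def relaxation_error(values):
    """Expected |exact - returned| when any value in the cluster may be
    returned for any other, weighted by each value's relative frequency."""
    counts = Counter(values)
    n = len(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in counts.items()
        for xj, cj in counts.items()
    )

def best_binary_split(values):
    """Try every cut between sorted distinct values and keep the cut whose
    two sub-clusters have the lowest size-weighted relaxation error.
    Returns (error, left, right), or None if there is nothing to split."""
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for cut in distinct[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        err = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or err < best[0]:
            best = (err, left, right)
    return best

sizes = [1, 2, 3, 10, 11, 12]          # hypothetical attribute values
error, left, right = best_binary_split(sizes)
print(left, right)
```

Applying the split recursively to each sub-cluster would yield a multi-level hierarchy over the attribute.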
15 Query Relaxation
16 Summary: CoBase
- Derive Approximate Answers
- Answer Conceptual Queries
- Provide Associative Query Answers
17 KMeD www.kmed.cs.ucla.edu
- PI: Wesley Chu, Ph.D., Computer Science Department
- Co-PIs
- A. Cardenas, Ph.D., Computer Science Department
- Ricky Taira, Ph.D., School of Medicine
- Graduate students
- Alex Bui
- Christina Chu
- John Dionisio
- T. Plattner
- D. Johnson
- C. Hsu
- T. Ieong
- Consultants
- Denise Aberle, M.D.
- C.M. Breant, Ph.D.
18 KMeD Goal: Retrieval of Images by Features and Content
- Features
- size, shape, texture, density, histology
- Spatial Relations
- angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
- Evolution of Object Growth
- fusion, fission
22 Characteristics of Medical Queries
- Multimedia
- Temporal
- Evolutionary
- Spatial
- Imprecise
25 Knowledge-Based Image Model
Representation Level (features and content)
26 Queries
Query Analysis and Feature Selection
Knowledge-Based Query Processing
Knowledge-Based Content Matching via TAHs
Query Relaxation
Query Answers
27 User Model
- To customize to users' interests and preferences, needs, and goals
- e.g. query conditions, relaxation control, etc.
- User type
- Default Parameter Values
- Feature and Content Matching Policies
- Complete Match
- Partial Match
28 User Model (cont.)
- Relaxation Control Policies
- Relaxation Order
- Unrelaxable Object
- Preference List
- Measure for Ranking
- Triggering conditions
29 Query Preprocessing
- Segment and label contours for objects of interest
- Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects
- Organize the features and spatial relationships of objects into a feature database
- Classify the feature database into a Type Abstraction Hierarchy (TAH)
31 Similarity Query Answering
- Determine relevant features based on query input
- Select TAH based on these features
- Traverse through the TAH nodes to match all the images with similar features in the database
- Present the images and rank their similarity (e.g., by mean square error)
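The four steps above can be sketched as a descent through a TAH followed by ranking. This is a minimal sketch under stated assumptions: the `TAHNode` class, the two-feature vectors, and the rule of backing up to a parent node when too few images are found are all hypothetical choices, not the KMeD implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TAHNode:
    low: float                                    # range covered on the first feature
    high: float
    children: list = field(default_factory=list)
    images: list = field(default_factory=list)    # (image_id, feature_vector) at leaves

    def all_images(self):
        """Collect every image stored at or below this node."""
        if not self.children:
            return list(self.images)
        return [img for child in self.children for img in child.all_images()]

def mse(a, b):
    """Mean square error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def similar_images(root, query, k):
    """Descend to the most specific node covering query[0], then back up
    (relax) until at least k images are available; rank them by MSE."""
    path = [root]
    node = root
    while node.children:
        nxt = next((c for c in node.children if c.low <= query[0] < c.high), None)
        if nxt is None:
            break
        path.append(nxt)
        node = nxt
    for node in reversed(path):
        candidates = node.all_images()
        if len(candidates) >= k or node is root:
            ranked = sorted(candidates, key=lambda img: mse(img[1], query))
            return ranked[:k]

# Hypothetical feature database: (tumor size, density) per image.
small = TAHNode(0, 5, images=[("img1", (2.0, 0.3)), ("img2", (4.0, 0.5))])
large = TAHNode(5, 10, images=[("img3", (6.0, 0.4)), ("img4", (9.0, 0.9))])
root = TAHNode(0, 10, children=[small, large])

print([img_id for img_id, _ in similar_images(root, (4.5, 0.5), k=3)])
```

With `k=1` the search stays inside the leaf that covers the query; with `k=3` it relaxes to the root before ranking.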
32 Visual Query Language and Interface
- Point-click-drag interface
- Objects may be represented by icons
- Spatial relationships among objects are
represented graphically
34 Visual Query Example
Retrieve brain tumor cases where a tumor is
located in the region as indicated in the picture
38 Implementation
- Sun Sparc 20 workstations (128 MB RAM, 24-bit frame buffer)
- Oracle Database Management System
- C
- Mass Storage of Images (9 GB)
42 Summary: KMeD
- Image retrieval by feature and content
- Matching images based on features
- Processing of queries based on spatial relationships among objects
- Answering of imprecise queries
- Expression of queries via visual query language
- Integrated view of temporal multimedia data in a
timeline metaphor
44 Medical Digital Library www.kmed.cs.ucla.edu
- Project leader: Wesley W. Chu
- Graduate students
- Victor Z. Liu
- Wenlei Mao
- Qinghua Zou
- Consultants
- Hooshang Kangarloo, M.D.
- Denise Aberle, M.D.
45 Data Types Used in a Medical Digital Library
- Structured data (patient lab data, demographic data) -- CoBase
- Images (X rays, MRI, CT scans) -- KMeD
- Free-text (patient reports, teaching files, literature, news articles) -- FTRS (Free-text retrieval system)
46 A Knowledge-based Free-Text Retrieval System (FTRS)
- Inputs: ad hoc query; patient report for content correlation
- Document sources: news articles, patient reports, medical literature, teaching materials
- Output: query results
47 A Sample Patient Report
- Tissue Source
- LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
- FINAL DIAGNOSIS
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION)
- LUNG CANCER, SMALL CELL, STAGE II.
48 Scenario-Specific Retrieval
- Tissue Source
- LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
- FINAL DIAGNOSIS
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION)
- LUNG CANCER, SMALL CELL, STAGE II.
49 Challenge I: Indexing for Free-Text
- Extracting key concepts in the free-text for indexing
- Free-text: Lung cancer, small cell, stage II
- Concept terms in knowledge source: stage II small cell lung cancer
- Conventional methods use NLP
- Not scalable
50 Challenge II: Mismatch between terms used in query and documents
- Query: lung cancer
- Do these documents match the query?
- Document 1: lung carcinoma
- Document 2: lung neoplasm
- Document 3: anti-cancer drug combinations
51 Challenge III: Terms used in the query are too general
- Expanding the general terms in the query to specific terms that are used in the document
- Query: lung cancer, diagnosis options
- Expanded query: lung cancer, chest x-ray, bronchography
- Document: the effectiveness of chest x-ray and bronchography on patients with lung cancer
52 A Medical KB: Unified Medical Language System (UMLS)
- Metathesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts)
- Semantic Network: classifies concepts into classes (e.g. disease and syndrome, treated by, therapeutic procedure, etc.)
- SPECIALIST Lexicon
53 Using knowledge sources to resolve these challenges
- Challenge I: Automatic indexing of free text
- Challenge II: Mismatch between terms in the query and the documents
- Challenge III: Terms in the query are too general
54 IndexFinder: Extracting domain-specific key concepts
- Technique
- Permute words from text to generate concept candidates.
- Use knowledge base to select the valid candidates.
- Problem
- Valid candidates may be irrelevant to the document.
- Redundant concepts
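The technique on this slide can be illustrated with a naive sketch: enumerate order-independent word combinations from a sentence and keep those that the knowledge base recognizes as concepts. The tiny `KB` dictionary and the function names are hypothetical, and this exhaustive enumeration is only for illustration; it is not the indexing scheme IndexFinder itself uses.

```python
from itertools import combinations

# Hypothetical mini knowledge base: word set -> canonical concept phrase.
KB = {
    frozenset({"lung", "cancer"}): "lung cancer",
    frozenset({"small", "cell", "lung", "cancer"}): "small cell lung cancer",
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}):
        "stage II small cell lung cancer",
    frozenset({"lung", "nodule"}): "lung nodule",
}

def tokenize(text):
    """Lowercase the words and strip surrounding punctuation."""
    return [w.strip(".,()").lower() for w in text.split() if w.strip(".,()")]

def extract_concepts(sentence, max_words=6):
    """Generate word combinations (word order ignored) from one sentence
    and keep those that appear in the knowledge base."""
    words = sorted(set(tokenize(sentence)))
    found = set()
    for size in range(1, min(max_words, len(words)) + 1):
        for combo in combinations(words, size):
            concept = KB.get(frozenset(combo))
            if concept:
                found.add(concept)
    return found

print(sorted(extract_concepts("Lung cancer, small cell, stage II.")))
```

The output contains the specific concept along with its more general forms, which is the redundancy problem the slide notes.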
55 Filtering out Irrelevant Concepts
- Syntactic filter
- Limit permutation of words within a sentence.
- Semantic filter
- Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts
- Use ISA relationship to filter out general concepts and yield specific concepts.
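The semantic filter can be sketched in two passes: keep candidates whose semantic type is wanted, then drop any candidate that is an ISA ancestor of another surviving candidate. The `ISA` links, the `SEMANTIC_TYPE` table, and the function names below are hypothetical stand-ins for what a knowledge source such as UMLS would supply.

```python
# Hypothetical ISA links (child -> parent) and semantic types.
ISA = {
    "stage II small cell lung cancer": "small cell lung cancer",
    "small cell lung cancer": "lung cancer",
}
SEMANTIC_TYPE = {
    "lung cancer": "disease",
    "small cell lung cancer": "disease",
    "stage II small cell lung cancer": "disease",
    "fine needle aspiration": "diagnostic procedure",
    "cell": "body part",
}

def isa_ancestors(concept):
    """All more general concepts reachable through ISA links."""
    result = set()
    while concept in ISA:
        concept = ISA[concept]
        result.add(concept)
    return result

def filter_concepts(candidates, wanted_types):
    """Keep candidates of a wanted semantic type, then remove any that is
    a generalization of another kept candidate."""
    typed = {c for c in candidates if SEMANTIC_TYPE.get(c) in wanted_types}
    general = set()
    for c in typed:
        general |= isa_ancestors(c)
    return typed - general

candidates = {"lung cancer", "small cell lung cancer",
              "stage II small cell lung cancer", "cell"}
print(filter_concepts(candidates, {"disease"}))
```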
56 IndexFinder Performance
- Two orders of magnitude faster than conventional approaches
- No NLP
- Time complexity is linear with the number of distinct words in the text
- Preliminary Evaluation
- IndexFinder generates more valid terms than NLP (using a single noun phrase)
- Filtering is effective in eliminating irrelevant terms
57 Using knowledge sources to resolve these challenges
- Challenge I: Automatic indexing of free text
- Challenge II: Mismatch between terms in the query and the documents
- Challenge III: Terms in the query are too general
58 Phrase-based Vector Space Model (VSM)
- Query: lung cancer
- Document: lung carcinoma (lung cancer and lung carcinoma are linked in the knowledge source)
- Document: lung neoplasm (parent_of lung cancer in the knowledge source)
- Document: anti-cancer drug combinations (link missing in the knowledge source)
59 Phrase-based VSM Examples
(C0242379) lung cancer
(C0003393) anti cancer drug combin
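A phrase in this model carries both a concept identifier and its word stems, so two phrases can be compared through the knowledge source and, when no link exists there, through shared stems. The sketch below is one possible scoring under assumptions: taking the maximum of the two similarities, the 0.5 weight for a parent link, the Jaccard stem overlap, and the identifier `C_LUNG_NEOPLASM` are all hypothetical. Treating lung carcinoma as the same concept as lung cancer is also an assumption; only the two identifiers shown on the slide come from the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phrase:
    concept: str        # concept identifier in the knowledge source
    stems: frozenset    # word stems of the phrase

# Hypothetical concept-to-concept similarity for a parent/child link.
CONCEPT_SIM = {("C0242379", "C_LUNG_NEOPLASM"): 0.5}

def concept_similarity(a, b):
    """1.0 for the same concept, a table value for linked concepts, else 0."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get((a, b), CONCEPT_SIM.get((b, a), 0.0))

def stem_similarity(a, b):
    """Jaccard overlap of word stems; helps when the knowledge source has no link."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def phrase_similarity(p, q):
    """Take whichever evidence is stronger: concept link or stem overlap."""
    return max(concept_similarity(p.concept, q.concept),
               stem_similarity(p.stems, q.stems))

query = Phrase("C0242379", frozenset({"lung", "cancer"}))
carcinoma = Phrase("C0242379", frozenset({"lung", "carcinoma"}))       # assumed synonym
neoplasm = Phrase("C_LUNG_NEOPLASM", frozenset({"lung", "neoplasm"}))  # hypothetical id
drugs = Phrase("C0003393", frozenset({"anti", "cancer", "drug", "combin"}))

for doc in (carcinoma, neoplasm, drugs):
    print(phrase_similarity(query, doc))
```

The third document still receives a nonzero score through the shared stem "cancer" even though no concept link is available.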
60 Using knowledge sources to resolve these challenges
- Challenge I: Automatic indexing of free text
- Challenge II: Mismatch between terms in the query and the documents
- Challenge III: Terms in the query are too general
61 Query Expansion (QE)
- Queries in the following form benefit from expansion
- <key concept> <general supporting concept(s)>, e.g. <lung cancer> <treatment options>
- They are expanded into the form
- <key concept> <specific supporting concept(s)>, e.g. <lung cancer> <chemotherapy, radiotherapy>
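The expansion pattern above can be sketched as a lookup keyed on the key concept and the general supporting concept. The `EXPANSIONS` table and the function name are hypothetical; in practice the specific terms would be derived from a knowledge source rather than written by hand.

```python
# Hypothetical knowledge: (key concept, general supporting concept) -> specific terms.
EXPANSIONS = {
    ("lung cancer", "treatment options"): ["chemotherapy", "radiotherapy"],
    ("lung cancer", "diagnosis options"): ["chest x-ray", "bronchography"],
}

def expand_query(key_concept, supporting_concept):
    """Replace a general supporting concept with specific ones tied to the
    key concept; leave the query unchanged when nothing is known."""
    specifics = EXPANSIONS.get((key_concept, supporting_concept))
    if not specifics:
        return [key_concept, supporting_concept]
    return [key_concept, *specifics]

print(expand_query("lung cancer", "treatment options"))
```

Keying on the pair rather than on the supporting concept alone keeps the expansion specific to the scenario, so "treatment options" for a different disease would expand differently.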
62 Knowledge-based Scenario-specific Expansion
63 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
Overall improvement: 33%, 100 queries vs. 5%, 50 queries
64 FTRS: Scenario-specific Query Answering
- Sample templates: <disease>, treatment; <disease>, diagnosis
65 FTRS: Scenario-specific content correlation
- IndexFinder extracts key concepts from free-text
for content correlation
66 Summary: KB Free-text retrieval
- Technologies
- IndexFinder: extracts key concepts from the free-text
- Phrase-based VSM: a new document indexing paradigm (concept and its word stems) to improve retrieval effectiveness
- Knowledge-based query expansion: matches query with scenario-specific documents
- Provides scenario-specific free-text retrieval
67 Conclusions
- Knowledge sources provide
- Approximate matching
- Query conditions
- Image features
- Query processing
- Similarity query answering
- User modeling
- Associative answering
- Triggering and alerting
- Document retrieval
- Convert ad hoc free-text into controlled vocabulary
- Phrase-based VSM
- Content correlation
- Scenario-specific retrieval
- Increase capabilities and effectiveness of information management
68 Acknowledgement
This research is supported by DARPA, NSF Grant 9619345, and NIC/NIH Grant 4442511-33780