A Knowledge-based Medical Digital Library

Provided by: CSU117
Learn more at: http://web.cs.ucla.edu

1
A Knowledge-based Medical Digital Library
  • Wesley W. Chu
  • Computer Science Dept, UCLA
  • wwc_at_cs.ucla.edu

2
Data in a Medical Digital Library
  • Structured data (patient lab data, demographic
    data, ...)--CoBase
  • Images (X rays, MRI, CT scans)--KMeD
  • Free-text--KMeX
  • Patient reports
  • Teaching files
  • Literature
  • News articles

3
System Overview
query
Medical Digital Library
relevant information
structured data (e.g., lab results, patient
demographic data)
image data (e.g., X-ray images, CT images, etc.)
free-text data (e.g., medical literature, news
articles, etc.)
4
Benefits of Knowledge-based Medical Digital Library
  • Content Based Information Retrieval
  • Transforms patient records into a sea of
    information sources
  • Provides scenario-specific information for
    patient care, medical research and education.

5
Characteristics of Medical Queries
  • Multimedia
  • Temporal
  • Evolutionary
  • Spatial
  • Imprecise

6
CoBase: Cooperative Database
www.cobase.cs.ucla.edu
  • Use knowledge base to
    • Derive Approximate Answers
    • Answer Conceptual Queries
    • Provide Associative Query Answers

7
KB: Type Abstraction Hierarchy (TAH)
  • Using clustering technique to group similar
    • Attribute values
    • Image features
    • Spatial relationships among objects
  • Provides multi-level knowledge (conceptual)
    representation

8
Data mining for KB--TAH
  • Clustering data of an attribute
    • Value--difference between the exact value and
      the returned approximate value
    • Frequency--probability of occurrence for each
      value
  • Can be extended to multiple attributes
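
The two ingredients named on this slide (value difference and frequency) can be combined into a single clustering criterion. The following is a minimal sketch, assuming the criterion is the expected pairwise difference within a cluster weighted by frequency; the function names `relaxation_error` and `best_binary_cut` and the sample ages are hypothetical and do not come from the slides.

```python
from collections import Counter
from itertools import product


def relaxation_error(values):
    """Expected |x - y| when an exact value x is answered by a value y
    from the same cluster, both weighted by their frequency."""
    counts = Counter(values)
    n = len(values)
    return sum(
        (cx / n) * (cy / n) * abs(x - y)
        for (x, cx), (y, cy) in product(counts.items(), repeat=2)
    )


def best_binary_cut(values):
    """Try every cut between sorted distinct values and return
    (cost, left, right) for the cut with the lowest size-weighted error."""
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for cut in distinct[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        cost = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best


# Illustrative patient ages (made up for this sketch).
ages = [9, 10, 10, 11, 12, 30, 31, 33]
cost, young, old = best_binary_cut(ages)
print(sorted(young), sorted(old))
```

Applying the cut recursively to each resulting cluster would give a multi-level hierarchy; that step is omitted here.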

9
Type Abstraction Hierarchies for Medical Domain
10
Generalization and Specialization in TAH
11
Query Relaxation
12
Cooperative Querying for Medical Applications
  • Query
    • Find the treatment used for the tumor similar-to
      (loc, size) X1 on 12 year-old Korean males.
  • Relaxed Query
    • Find the treatment used for the tumor Class X on
      preteen Asians.
  • Association
    • The success rate, side effects, and cost of the
      treatment.
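
The relaxation from "12 year-old Korean" to "preteen Asians" can be sketched as a lookup in a type abstraction hierarchy. This is an illustration only: the `TAH` contents, the patient records, and the function names are hypothetical, and a one-level hierarchy is assumed for brevity.

```python
# Hypothetical one-level TAH: concept -> values it covers.
TAH = {
    "age": {"preteen": {9, 10, 11, 12}, "teen": {13, 14, 15}},
    "ethnicity": {"Asian": {"Korean", "Japanese", "Chinese"}},
}


def generalize(attr, value):
    """Return (concept, member values) for the TAH node covering `value`."""
    for concept, values in TAH[attr].items():
        if value in values:
            return concept, values
    return None, {value}  # no covering node: leave the condition unrelaxed


def relax(query):
    """Replace each exact condition by the values under its parent concept."""
    return {attr: generalize(attr, value)[1] for attr, value in query.items()}


def search(records, conditions):
    """Keep records whose attributes all fall in the allowed value sets."""
    return [
        r for r in records
        if all(r[attr] in allowed for attr, allowed in conditions.items())
    ]


# Made-up records for the sketch.
patients = [
    {"id": 1, "age": 11, "ethnicity": "Japanese", "treatment": "surgery"},
    {"id": 2, "age": 40, "ethnicity": "Korean", "treatment": "radiation"},
]
query = {"age": 12, "ethnicity": "Korean"}

exact = search(patients, {attr: {value} for attr, value in query.items()})
# Fall back to the relaxed query only when the exact query has no answer.
approximate = exact or search(patients, relax(query))
print([p["id"] for p in approximate])
```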

13
System Overview
query
Medical Digital Library
relevant information
structured data (e.g., lab test results, patient
demographic data)
image data (e.g., X-ray images, CT images, etc.)
free-text data (e.g., medical literature, news
articles, etc.)
14
KMeD: Retrieval of images by content
www.kmed.cs.ucla.edu
  • PI: Wesley Chu, Ph.D, Computer Science Department
  • Co-PIs
    • A. Cardenas, Ph.D, Computer Science Department
    • Ricky Taira, Ph.D, School of Medicine
  • Consultants
    • Denise Aberle, M.D.
    • C.M. Breant, Ph.D
  • Graduate students
    • Alex Bui
    • Christina Chu
    • John Dionisio
    • T. Plattner
    • D. Johnson
    • C. Hsu
    • T. Ieong

15
KMeD Objectives
  • Matching images based on features
  • Processing of queries based on spatial
    relationships among objects
  • Answering of imprecise queries
  • Visual query interface

16
KMeD: Retrieval of Images by Features and Content
  • Features
    • size, shape, texture, density, histology
  • Spatial Relations
    • angle of coverage, shortest distance, overlapping
      ratio, contact ratio, relative direction
  • Evolution of Object Growth
    • fusion, fission

21
Knowledge-Based Image Model
Representation Level (features and content)
22
Queries
Query Analysis and Feature Selection
Knowledge-Based Query Processing
Knowledge-Based Content Matching Via TAHs
Query Relaxation
Query Answers
23
User Model
  • Customizes for the user's interests, preferences,
    needs, and goals
    • e.g., query conditions, relaxation control,
      etc.
  • User type
  • Default Parameter Values
  • Feature and Content Matching Policies
    • Complete Match
    • Partial Match

24
User Model (cont.)
  • Relaxation Control Policies
  • Relaxation Order
  • Unrelaxable Object
  • Preference List
  • Measure for Ranking
  • Triggering conditions

25
Query Preprocessing
  • Segment and label contours for objects of
    interest
  • Determine relevant features and spatial
    relationships (e.g., location, containment,
    intersection) of the selected objects
  • Organize the features and spatial relationships
    of objects into a feature database
  • Classify the feature database into a Type
    Abstraction Hierarchy (TAH)

27
Visual Query Language and Interface
  • Point-click-drag interface
  • Objects may be represented by icons
  • Spatial relationships among objects are
    represented graphically

29
Visual Query Example
Retrieve brain tumor cases where a tumor is
located in the region as indicated in the picture
34
A KB Medical Digital Library
www.cobase.cs.ucla.edu
query
Medical Digital Library
relevant information
structured data (e.g., lab test results, patient
demographic data)
image data (e.g., X-ray images, CT images, etc.)
free-text data (e.g., medical literature, news
articles, etc.)
35
KMeX
www.cobase.cs.ucla.edu
  • Project leader: Wesley W. Chu
  • Consultants: Hooshang Kangaloo, M.D.; Denise
    Aberle, M.D.
  • Graduate students: Victor Z. Liu, Wenlei Mao,
    Qinghua Zou

36
A Sample Patient Report
  • Tissue Source
  • LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
  • FINAL DIAGNOSIS
  • - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE
    ASPIRATION)
  • - LUNG CANCER, SMALL CELL, STAGE II.

37
Scenario-Specific Queries
  • Queries that mention one or more scenarios
  • E.g., "keratoconus treatment"; "lung cancer
    diagnosis and complications"
  • A scenario (e.g., treatment): a repeating
    healthcare situation
  • >60% of medical queries are scenario-specific
    [HMW90, HPH96, EOE99, EOG00, WMH01]

38
Scenario Specific Retrieval
Tissue Source LUNG (FINE NEEDLE ASPIRATION)
(LEFT LOWER LOBE) FINAL DIAGNOSIS - LUNG
NODULE, LEFT LOWER LOBE (FINE NEEDLE
ASPIRATION) - LUNG CANCER, SMALL CELL, STAGE
II.
39
  • Challenge I: Indexing
  • Challenge II: Terms in the query are too general
  • Challenge III: Mismatch between terms in the
    query and the documents

40
IndexFinder
http://fargo.homedns.org/umls/demo.aspx
  • Extract key information from clinical free texts
  • Search relevant reports
  • Search similar patients
  • Medical KB (UMLS) provides standard medical
    concepts
  • IndexFinder
  • Extracts UMLS concepts from clinical texts

41
Previous Approaches
Free text
42
Problems of Previous Approaches
  • Concepts cannot be discovered if they are not in
    a single noun phrase.
    • E.g., in "second, third, and fourth ribs",
      "second rib" cannot be discovered.
  • Difficult to scale to large-scale text computing.
    • Natural language processing requires significant
      computing resources

43
Our Approach: IndexFinder (Zou et al. 03)
Previous: free text -> UMLS
Suppose UMLS contains only "lung cancer".
We would discard all words in the text except
"lung" and "cancer".
44
Knowledge-based approach
  • Using the compact index data without using any
    database system.
  • Permuting words in a sentence to generate UMLS
    concept candidates.
  • Using filters to eliminate irrelevant concepts.

45
Concept Candidates Generation
  • Assumptions
    • Knowledge base provides a phrase table.
    • Each phrase (concept) is a set of words.
    • An input text T is represented as a set of words.
  • Goal
    • Combining words in T to generate concept
      candidates
  • Example
    • T = {D, E, F}
    • Answer: 5
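
Under the stated assumptions (each phrase is a set of words, the text is a set of words), candidate generation reduces to a subset test. The sketch below is illustrative only: the phrase table, the IDs `P1` to `P4`, and the crude plural-stripping `normalize` helper are hypothetical stand-ins, not the actual IndexFinder index or UMLS identifiers.

```python
import re

# Hypothetical phrase table: phrase ID -> set of (normalized) words.
PHRASE_TABLE = {
    "P1": {"lung", "cancer"},
    "P2": {"second", "rib"},
    "P3": {"third", "rib"},
    "P4": {"heart", "failure"},
}


def normalize(word):
    """Crude stand-in for real stemming: drop a trailing plural 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def words_of(text):
    """Represent the input text T as a set of normalized words."""
    return {normalize(w) for w in re.findall(r"[a-z]+", text.lower())}


def concept_candidates(text, phrase_table=PHRASE_TABLE):
    """Return IDs of phrases whose every word occurs somewhere in the text."""
    t = words_of(text)
    return sorted(pid for pid, words in phrase_table.items() if words <= t)


print(concept_candidates("Fracture of the second, third, and fourth ribs"))
```

Because word order and adjacency are ignored, "second rib" is found even though it is not a contiguous noun phrase in the input; the same property is why filters are needed to remove spurious combinations.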

46
Eliminate irrelevant concepts
  • Syntactic filter
    • Limit word combinations to within a
      sentence.
  • Semantic filter
    • Using semantic types (e.g., body part, disease,
      treatment, diagnosis)
    • Using the ISA relationship to filter out general
      terms and keep the specific ones.
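
The three filters can be sketched as small functions over a list of candidates. Everything here is an assumption made for illustration: the `CONCEPTS` table, its semantic types and ISA links, and the token-window reading of the syntactic filter are hypothetical and are not taken from IndexFinder itself.

```python
# Hypothetical concept table: name -> (semantic type, ISA parent or None).
CONCEPTS = {
    "cancer": ("disease", None),
    "lung cancer": ("disease", "cancer"),
    "lung": ("body part", None),
    "found": ("activity", None),
}


def syntactic_filter(tokens, phrase_words, max_span):
    """True if all phrase words occur inside some window of max_span tokens."""
    return any(
        phrase_words <= set(tokens[start:start + max_span])
        for start in range(len(tokens))
    )


def semantic_filter(candidates, allowed_types):
    """Keep only candidates whose semantic type is of interest."""
    return [c for c in candidates if CONCEPTS[c][0] in allowed_types]


def ancestors(concept):
    """Yield the ISA ancestors of a concept, nearest first."""
    parent = CONCEPTS[concept][1]
    while parent is not None:
        yield parent
        parent = CONCEPTS[parent][1]


def isa_filter(candidates):
    """Drop a general term when a more specific descendant is also present."""
    general = {a for c in candidates for a in ancestors(c)}
    return [c for c in candidates if c not in general]


tokens = "a small mass was found in the left hilum of the lung".split()
print(syntactic_filter(tokens, {"mass", "lung"}, max_span=5))
print(semantic_filter(["cancer", "lung cancer", "lung", "found"],
                      {"disease", "body part"}))
print(isa_filter(["cancer", "lung cancer", "lung"]))
```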

47
Comparison of IndexFinder with MetaMap
Input: A small mass was found in the left hilum
of the lung.
48
Summary of IndexFinder
http://fargo.homedns.org/umls/demo.aspx
  • An efficient method that maps from UMLS to free
    text to extract concepts without using any
    database system.
  • Syntactic and semantic filters are used to
    eliminate irrelevant candidates.
  • IndexFinder is able to find more specific
    concepts than NLP approaches.
  • IndexFinder is scalable and can be operated in
    real time.

49
Topic Directory
  • Using indexing for document retrieval cannot
    provide
    • Standard vocabulary
    • Cross reference among topics
    • Scenario specific search
  • Topic directory resolves these shortcomings by
    dynamically clustering documents into knowledge
    based topics based on user specified scenarios

50
Challenge II: Terms used in the query are too
general
  • Scenario terms in a query, e.g., treatment
  • vs. specialized terms in documents, e.g.,
    epikeratoplasty, contact lens, etc.

51
Challenge II: Terms used in the query are too
general
  • Expanding the general terms in the query to
    specific terms that are used in the document

Query: lung cancer, diagnosis options
Expanded query: lung cancer, chest x-ray,
bronchography, ...
Document: the effectiveness of chest x-ray and
bronchography on patients with lung cancer
52
The Mismatch Problem
  • Scenario concepts are too general to match
    specialized ones in relevant docs

Query: keratoconus, treatment
Expanded Query: keratoconus, treatment, contact
lens, epikeratoplasty, epikeratophakia
Document 1: The use of contact lens after
keratoconic epikeratoplasty
Document 2: Epikeratophakia for aphakia,
keratoconus, and myopia
53
Deriving Scenario-Specific Relationships
contact lens -> treats -> keratoconus
epikeratoplasty -> treats -> keratoconus
epikeratophakia -> treats -> keratoconus
  • More specific than term co-occurrence
    relationships
  • Not directly available through a knowledge source

54
Basic Idea
  • Start from pairs of frequently co-occurring
    concepts [Qiu03, Jing94, Xu96]
  • Apply knowledge structures to filter out pairs
    that are irrelevant to a given scenario, e.g.,
    treatment
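
The first step, collecting frequently co-occurring concept pairs, might look like the sketch below. It assumes co-occurrence is counted per document and that each document has already been reduced to a list of concepts; the sample documents and function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """For each unordered concept pair, count the documents containing both."""
    counts = Counter()
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def partners(counts, key):
    """Concepts co-occurring with `key`, most frequent first."""
    scored = Counter()
    for (a, b), n in counts.items():
        if a == key:
            scored[b] += n
        elif b == key:
            scored[a] += n
    return scored.most_common()


# Made-up concept lists, one per document.
docs = [
    ["keratoconus", "contact lens"],
    ["keratoconus", "corneal", "contact lens"],
    ["keratoconus", "corneal"],
    ["keratoconus", "contact lens"],
]
print(partners(cooccurrence_counts(docs), "keratoconus"))
```

The resulting list still mixes treatment-related concepts with unrelated ones, which is what the knowledge-based filtering step is meant to address.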

55
Sample Co-Occurring Pairs
  • Concepts most frequently co-occurring with
    keratoconus

56
UMLS: The Knowledge Source
  • Three major components
  • The Metathesaurus
    • > 800K medical concepts, <ID, multiple string
      forms>
    • E.g., <C0022578, Keratoconus, Cornea
      conical>
    • Used for detecting concepts from free text
  • The Semantic Network
    • 100 semantic types, 50 relations among types
    • E.g., Disease or Syndrome, containing 44,000
      concepts
    • Used for deriving scenario-specific relationships
  • The SPECIALIST Lexicon

57
Structure of The Knowledge Source
Disease or Syndrome
Pharmacological Substance
The Semantic Network
treats
insulin
keratoconus
The Meta-Thesaurus
acute hydrops keratoconus
lactase
58
Fragment of The Semantic Network for Each Scenario
  • E.g., the treatment scenario

Therapeutic or Preventive Procedure -> treats -> Disease or Syndrome
Pharmacological Substance -> treats -> Disease or Syndrome
Medical Device -> treats -> Disease or Syndrome
59
Filtering
corneal
central cornea
penetrating keratoplasty
acute hydrops
epikeratoplasty
keratoconus
griffonia
contact lens
60
Knowledge-Based Query Expansion
  • Original query: <ckey, cs>
    • ckey: a key concept, e.g., keratoconus
    • cs: a set of scenario concepts, e.g., treatment
  • ce: concepts having scenario-specific
    relationships with ckey
    • ckey - cs - ce, e.g., keratoconus - treats -
      contact lens
  • Expanded query: <ckey, cs, ce>
    • E.g., keratoconus, treatment, contact lens,
      epikeratoplasty, epikeratophakia
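
A minimal sketch of this expansion step follows, assuming each concept carries one semantic type and each scenario is described by (source type, relation, target type) triples. The type assignments, the `SCENARIO_RELATIONS` table, and the `expand` function are hypothetical illustrations, not the system's actual data or API.

```python
# Illustrative semantic type assignments (assumptions for this sketch).
SEMANTIC_TYPE = {
    "keratoconus": "Disease or Syndrome",
    "acute hydrops": "Disease or Syndrome",
    "contact lens": "Medical Device",
    "epikeratoplasty": "Therapeutic or Preventive Procedure",
    "corneal": "Body Part",
}

# Fragment of the semantic network relevant to each scenario.
SCENARIO_RELATIONS = {
    "treatment": {
        ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
        ("Pharmacological Substance", "treats", "Disease or Syndrome"),
        ("Medical Device", "treats", "Disease or Syndrome"),
    },
}


def expand(ckey, scenario, cooccurring):
    """Build <ckey, cs, ce>: keep co-occurring concepts whose type is
    linked to the type of ckey by a relation of the given scenario."""
    key_type = SEMANTIC_TYPE[ckey]
    allowed = {
        src for src, _rel, dst in SCENARIO_RELATIONS[scenario]
        if dst == key_type
    }
    ce = [c for c in cooccurring if SEMANTIC_TYPE.get(c) in allowed]
    return [ckey, scenario] + ce


print(expand("keratoconus", "treatment",
             ["corneal", "epikeratoplasty", "acute hydrops", "contact lens"]))
```

Passing the full co-occurrence list through unchanged, without the type check, would correspond to the statistical baseline.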

61
Statistical Expansion (Baseline for Comparison)
  • Original query: <ckey, cs>
  • ce: co-occurring concepts
    • ckey co-occurs with ce, e.g., keratoconus
      co-occurs with corneal
  • Expanded query: <ckey, cs, ce>

62
Assigning Proper Weights to Expansion Terms
  • Vector Space Model (VSM) for computing
    query-document similarity
    • Both the query and the document are represented
      as vectors of terms
    • The weight for a single term might significantly
      affect document ranking
  • ve: weight of an expansion term ce
    • Based on the co-occurrence between ce and ckey
    • ve = vkey × co(ckey, ce)
    • vkey: weight of ckey computed from regular tf-idf
      weighting
    • co(ckey, ce): statistical co-occurrence between
      ckey and ce
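
To see how an expansion weight feeds into ranking, here is a small sketch that assumes the weight of an expansion term is the product of the key concept's weight and a co-occurrence score in [0, 1], and that query-document similarity is the cosine between sparse term vectors. The numeric values and function names are invented for illustration.

```python
import math


def expansion_weights(v_key, co):
    """Weight each expansion concept by v_key times its co-occurrence score."""
    return {ce: v_key * score for ce, score in co.items()}


def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


v_key = 2.0  # assumed tf-idf weight of the key concept
co = {"contact lens": 0.5, "epikeratoplasty": 0.25}  # assumed scores

plain_query = {"keratoconus": v_key}
expanded_query = {**plain_query, **expansion_weights(v_key, co)}

# A document that mentions only a specialized treatment term.
doc = {"epikeratoplasty": 1.0}
print(cosine(plain_query, doc), cosine(expanded_query, doc))
```

The document is invisible to the plain query but receives a non-zero score from the expanded one, and the size of that score depends directly on the expansion weight.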

63
Need for Weight Adjustments
  • Weight adjustments needed to compensate for the
    filtering

Statistical expansion            Knowledge-based expansion
ce                       ve      ce                        ve
fuchs dystrophy          0.289   penetrating keratoplasty  0.247
penetrating keratoplasty 0.247   epikeratoplasty           0.230
epikeratoplasty          0.230   epikeratophakia           0.119
corneal ectasia          0.168   keratoplasty              0.103
acute hydrops            0.165   contact lens              0.101
keratometry              0.133   thermokeratoplasty        0.092
corneal topography       0.132   button                    0.067
corneal                  0.130   secondary lens implant    0.057
aphakic corneal edema    0.122   fittings adapters         0.048
epikeratophakia          0.119   esthesiometer             0.043
64
The OHSUMED Testbed
  • A testbed: a benchmark query set, a corpus, and
    relevance judgments for each query
  • OHSUMED [HBL94]
    • 57 scenario-specific queries
    • e.g., "keratoconus treatment"; "thrombocytosis
      treatment and diagnosis"; "diagnostic and
      therapeutic work up of breast mass"
    • 348K MEDLINE articles (title + abstract),
      1988-1992

65
OHSUMED Scenarios
Scenario                                        # of Queries
treatment of a disease                          28
diagnosis of a disease                          11
prevention of a disease                         2
differential diagnosis of a symptom/disease     12
pathophysiology of a disease                    5
complications of a disease/medication           7
etiology of a disease                           3
risk factors of a disease                       3
prognosis of a disease                          1
epidemiology of a disease                       1
research of a disease                           1
organisms of a disease                          1
criteria of medication                          3
when to perform a medication                    1
preventive health care for a type of patients   1
66
Comparison Under Different Expansion Sizes
  • s: expansion size
  • Metric: avgp
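
The slides do not define avgp. Assuming it denotes average precision, one common non-interpolated form can be sketched as follows; the function name and the sample ranking are hypothetical, and the original evaluation may have used a different variant (for example an interpolated one).

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document
    retrieved; relevant documents never retrieved contribute zero."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Made-up ranking: d1 and d3 are the relevant documents.
score = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(score, 3))
```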

67
Summary of Query Expansion
  • Knowledge based approach selects more scenario
    specific terms than statistical approach and
    achieves better performance
  • Differences in the quality of the knowledge
    structure across scenarios yield different
    performance improvements

68
  • Challenge I: Indexing
  • Challenge II: Terms in the query are too general
  • Challenge III: Mismatch between terms used in the
    query and the documents causes problems in
    ranking of results

69
Challenge III: Mismatch between terms used in
query and documents
  • Example

Query: lung cancer
Document 1: lung carcinoma (?)
Document 2: lung neoplasm (?)
Document 3: anti-cancer drug combinations (?)
70
Ranking query results
  • Traditional approach
    • Word Stem based Vector Space Model (VSM)
    • Concept based VSM
  • New approach
    • Phrase based (word + concept) VSM
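
One way to picture a phrase that carries both a concept and its word stems is sketched below. The scoring rule (full credit for a shared concept, otherwise Jaccard overlap of stems) is an assumption chosen for illustration and is not the formula used by the system. The two C-prefixed IDs are the ones shown on the examples slide; treating "lung carcinoma" as sharing the lung cancer concept and the ID `X_NEOPLASM` are hypothetical.

```python
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    concept: Optional[str]  # concept ID, or None if no concept was found
    stems: frozenset        # word stems of the phrase


def phrase_similarity(p, q):
    """1.0 when both phrases map to the same concept; otherwise the
    Jaccard overlap of their stems (an assumed fallback)."""
    if p.concept is not None and p.concept == q.concept:
        return 1.0
    union = p.stems | q.stems
    return len(p.stems & q.stems) / len(union) if union else 0.0


query = Phrase("C0242379", frozenset({"lung", "cancer"}))
carcinoma = Phrase("C0242379", frozenset({"lung", "carcinoma"}))
neoplasm = Phrase("X_NEOPLASM", frozenset({"lung", "neoplasm"}))
drugs = Phrase("C0003393", frozenset({"anti", "cancer", "drug", "combin"}))

for name, phrase in [("lung carcinoma", carcinoma),
                     ("lung neoplasm", neoplasm),
                     ("anti-cancer drug combinations", drugs)]:
    print(name, round(phrase_similarity(query, phrase), 3))
```

A stem-only model would score "lung carcinoma" no higher than "lung neoplasm", while a concept-only model would give "lung neoplasm" no credit at all; keeping both signals in one phrase avoids each of these failure modes.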

71
Phrase-based Vector Space Model (VSM)
[Figure: the query "lung cancer" matched through the knowledge source against the documents "lung carcinoma", "lung neoplasm", and "anti-cancer drug combinations"; labels shown: "lung cancer / lung carcinoma", "lung neoplasm", "parent_of", "missing!!!"]
72
Phrase-based VSM Examples
(C0242379) lung cancer
(C0003393) anti cancer drug combin
73
Retrieval Effectiveness Comparison (Corpus:
OHSUMED, KB: UMLS)
16100 queries vs. 5 50 queries
74
Retrieval Effectiveness Comparison (Corpus:
OHSUMED, KB: UMLS)
75
Summary of free text retrieval
  • Knowledge-based (UMLS) approach provides
    scenario-specific medical free-text retrieval
  • IndexFinder
    • Word permutation
    • Syntactic and semantic filtering
    • Extracts domain-specific key concepts for
      indexing
  • Knowledge-based query expansion
    • Transforms general terms into the scenario
      specific terms
    • A higher probability for the query to match
      relevant documents
  • Phrase based indexing
    • Transforms document indexing into a phrase
      paradigm (concept and its word stems)
    • Improves retrieval effectiveness

76
Experimental Results
  • Knowledge based query expansion (KQE) is superior
    to statistical query expansion.
  • Knowledge based phrase vector space model (PVSM)
    is superior to stem based vector space model
    (SVSM).
  • KQE + PVSM can yield a 15-20% improvement in
    precision/recall over SVSM.

77
KMeX Demo
Medical Digital Library (free text documents)
Ad-hoc query
Patient report for content correlation
Query results
News Articles
Patient reports
Medical literature
Teaching materials
78
Query Answering via Templates
  • Sample templates: <disease>, treatment;
    <disease>, diagnosis

Template: <disease>, treatment
Query: lung cancer, treatment
[Figure: components shown: IndexFinder, Query Expansion, Phrase-based VSM, relevant documents; terms shown: lung cancer, radiotherapy, chemotherapy, cisplatin]
79
Applications (cont'd)
  • Scenario-specific content correlation

[Figure: components shown: Patient Report, IndexFinder, Scenario Selection (e.g., treatment, diagnosis, etc.), Query Templates, Query Expansion, Phrase-based VSM, relevant documents]
90
Future Applications
  • Patients search for relevant literature and
    specialists regarding the treatment of their
    specific disease.
  • Healthcare providers identify other
    individuals with similar demography and disease,
    and discover the success rates and side effects
    of the treatment methods used.
  • Medical researchers study the characteristics
    of new diseases and the effectiveness of
    treatment methods for those diseases.

91
Acknowledgments
  • This research was supported by
    • DARPA F30602-94-C-0207
    • NSF grant IIS-0097438
    • NIC/NIH Grant 4442511-33780