Free-text Medical Document Retrieval via Phrase-based Vector Space Model (PowerPoint presentation)

Source: http://web.cs.ucla.edu

1
Free-text Medical Document Retrieval via
Phrase-based Vector Space Model
  • Wenlei Mao, MS and Wesley W. Chu, PhD
  • wenlei_at_cs.ucla.edu and wwc_at_cs.ucla.edu
  • Computer Science Department
  • University of California, Los Angeles

2
Outline
  • Vector space model (VSM) in document retrieval
  • Stem-based VSM
  • Concept-based VSM
  • Conceptual similarity
  • Phrase-based VSM
  • Retrieval effectiveness comparison
  • Conclusion

3
Document Retrieval
  • Find free-text documents that answer queries such as:
      • "Hyperthermia, leukocytosis, increased
        intracranial pressure, and central
        herniation. Cerebral edema secondary to
        infection, diagnosis and treatment."

4
Vector Space Model (VSM)
5
Stem-based VSM
  • Morphological variants bear similar content
      • E.g., edema and edemas
  • Use a stemmer to extract stems
      • Lovins stemmer and Porter stemmer
  • Serves as the baseline for comparison
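
A minimal stem-based VSM can be sketched in Python under two stated assumptions: documents and queries become TF-IDF weighted stem vectors ranked by cosine similarity (a common VSM weighting; the slides do not give the exact scheme), and `toy_stem` is a hypothetical plural-stripping stand-in, not the Lovins or Porter stemmer.

```python
import math
import re
from collections import Counter


def toy_stem(word):
    """Hypothetical stemmer: strips a plural "s" only (NOT Lovins or Porter)."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def tokenize(text):
    return [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def tfidf_vectors(docs):
    """One sparse {stem: tf * log(N / df)} vector per document, plus the idf table."""
    bags = [Counter(tokenize(d)) for d in docs]
    df = Counter(stem for bag in bags for stem in bag)
    idf = {stem: math.log(len(bags) / n) for stem, n in df.items()}
    vectors = [{stem: tf * idf[stem] for stem, tf in bag.items()} for bag in bags]
    return vectors, idf


def query_vector(query, idf):
    """Weight query stems with corpus idf; stems never seen in the corpus are dropped."""
    bag = Counter(tokenize(query))
    return {stem: tf * idf[stem] for stem, tf in bag.items() if stem in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "Cerebral edemas secondary to infection",
    "Fever and leukocytosis after surgery",
    "Treatment of hypertension",
]
vectors, idf = tfidf_vectors(docs)
q = query_vector("Cerebral edema with hyperthermia", idf)
scores = [cosine(q, v) for v in vectors]
```

Document 0 scores 2/sqrt(10), about 0.63, because "edemas" and "edema" reduce to the same stem. Document 1 scores 0.0 even though "fever" is a synonym of "hyperthermia", which is the gap the following slide describes.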

6
Shortcomings of Stem-based VSM
  • Inability to capture multi-word concepts
      • E.g., increased intracranial pressure
  • Inability to utilize the relations between
    concepts
      • Synonyms: hyperthermia and fever
      • IS-A relation: hyperthermia and body
        temperature elevation

7
Concept-based VSM
  • Uses concepts in a knowledge base (KB) as terms
      • KB: the Metathesaurus in the UMLS (Unified
        Medical Language System)
  • Captures multi-word concepts
  • Captures synonyms
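
Concept-based indexing can be sketched as a greedy longest-match scan against a thesaurus. The `THESAURUS` table and its concept IDs are invented for illustration and stand in for the UMLS Metathesaurus; they are not real UMLS identifiers, and real concept mapping against UMLS is considerably more involved.

```python
import re

# Hypothetical mini-thesaurus: word tuple -> invented concept ID.
# Synonyms share an ID, and multi-word concepts are single entries.
THESAURUS = {
    ("hyperthermia",): "C_FEVER",
    ("fever",): "C_FEVER",
    ("increased", "intracranial", "pressure"): "C_ICP_RAISED",
    ("intracranial", "pressure"): "C_ICP",
    ("cerebral", "edema"): "C_CEREBRAL_EDEMA",
    ("leukocytosis",): "C_LEUKOCYTOSIS",
}
MAX_LEN = max(len(key) for key in THESAURUS)


def extract_concepts(text):
    """Greedy longest match; words that start no thesaurus entry are skipped."""
    words = re.findall(r"[a-z]+", text.lower())
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            cid = THESAURUS.get(tuple(words[i:i + n]))
            if cid is not None:
                concepts.append(cid)
                i += n
                break
        else:
            i += 1  # no entry starts here: a KB coverage gap
    return concepts


print(extract_concepts("Hyperthermia, leukocytosis, increased intracranial pressure"))
# ['C_FEVER', 'C_LEUKOCYTOSIS', 'C_ICP_RAISED']
```

Because "hyperthermia" and "fever" map to the same hypothetical ID, a query and a document that use different synonyms now share a term, and "increased intracranial pressure" is kept as one unit instead of three unrelated stems.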

8
Shortcomings of Concept-based VSM
  • Concepts may be related
      • E.g., hyperthermia and body temperature
        elevation are not identical but related concepts
      • Need to quantify conceptual relations
  • Knowledge bases are often incomplete, which
    reduces the retrieval effectiveness

9
Conceptual Similarity Evaluation
10
Deriving Conceptual Similarity From Hypernym
Hierarchy
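
One common way to derive a similarity s(ci, cj) from a hypernym (IS-A) hierarchy is the Wu-Palmer measure, 2 * depth(LCS) / (depth(ci) + depth(cj)), where LCS is the lowest common ancestor. It is substituted here for illustration and is not necessarily the formula used in this work; the small hierarchy is hypothetical, loosely following the IS-A example on the "Shortcomings of Stem-based VSM" slide.

```python
# Hypothetical IS-A hierarchy: child -> parent (None marks the root).
PARENT = {
    "finding": None,
    "body_temperature_finding": "finding",
    "body_temperature_elevation": "body_temperature_finding",
    "hyperthermia": "body_temperature_elevation",
    "hypothermia": "body_temperature_finding",
    "blood_finding": "finding",
    "leukocytosis": "blood_finding",
}


def ancestors(concept):
    """Path from the concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    return len(ancestors(concept))  # the root has depth 1


def similarity(ci, cj):
    """Wu-Palmer style similarity in (0, 1]; 1.0 only for identical concepts."""
    anc_j = set(ancestors(cj))
    lcs = next(a for a in ancestors(ci) if a in anc_j)  # lowest common ancestor
    return 2 * depth(lcs) / (depth(ci) + depth(cj))


print(similarity("hyperthermia", "body_temperature_elevation"))  # 6/7, about 0.857
print(similarity("hyperthermia", "leukocytosis"))                # 2/7, about 0.286
```

Concepts that meet deep in the hierarchy score high, while concepts related only through the root score low, which gives a graded value instead of the all-or-nothing match of the basic concept-based VSM.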
11
Shortcomings of Concept-based VSM
  • Concepts may be related
      • The conceptual similarity measure, s(c_i, c_j),
        quantifies relations between concepts
  • Knowledge bases are often incomplete, which
    reduces the retrieval effectiveness

12
Incompleteness of the Knowledge Bases
  • In general, concept-based VSM cannot outperform
    stem-based VSM

13
Phrase-based Indexing Examples
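
A phrase, as the conclusion defines it, pairs a concept with its word stems. The sketch below is a minimal indexer under hypothetical assumptions: an invented mini-thesaurus keyed by stems, a crude plural-stripping `stem` function, and a rule that words the thesaurus does not cover become stem-only phrases (concept `None`), so KB gaps still leave something to match on.

```python
import re
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    concept: Optional[str]  # hypothetical concept ID, or None if the KB has no entry
    stems: frozenset        # word stems of the matched text


# Hypothetical mini-thesaurus keyed by stem tuples (invented IDs, not UMLS CUIs).
THESAURUS = {
    ("increased", "intracranial", "pressure"): "C_ICP_RAISED",
    ("cerebral", "edema"): "C_CEREBRAL_EDEMA",
    ("hyperthermia",): "C_FEVER",
}
MAX_LEN = max(len(key) for key in THESAURUS)
STOPWORDS = {"and", "of", "to", "the", "in"}


def stem(word):
    # Crude plural stripping; a stand-in for a real stemmer such as Porter's.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def index_phrases(text):
    """Greedy longest match on stems; uncovered non-stopwords become stem-only phrases."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            stems = tuple(stem(w) for w in words[i:i + n])
            cid = THESAURUS.get(stems)
            if cid is not None:
                phrases.append(Phrase(cid, frozenset(stems)))
                i += n
                break
        else:
            s = stem(words[i])
            if s not in STOPWORDS:
                phrases.append(Phrase(None, frozenset({s})))
            i += 1
    return phrases


for p in index_phrases("Cerebral edemas secondary to infection"):
    print(p)
```

Matching on stems lets "edemas" still hit the "cerebral edema" entry, and "secondary" and "infection" survive as stem-only phrases even though this toy KB has no concept for them.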
14
Evaluate Phrase-based Document Similarity
15
To Compare Retrieval Effectiveness
  • Test set: OHSUMED
      • 106 queries, 14K documents
      • Expert relevance judgments: R (relevant) or
        N (not relevant)
  • Retrieval effectiveness measures
      • Recall: the percentage of relevant documents
        retrieved so far
      • Precision: the percentage of retrieved documents
        that are relevant
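
The two definitions above can be computed at each rank cut-off k of a ranked result list. This is a minimal sketch; the document IDs and judgments are hypothetical, and the slides do not say which cut-offs or averaging the OHSUMED comparison used.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall after examining the top-k retrieved documents."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for doc_id in retrieved if doc_id in relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


ranked = ["d3", "d7", "d1", "d9", "d4"]  # hypothetical system ranking
relevant = {"d3", "d1", "d8"}            # hypothetical documents judged R

for k in range(1, len(ranked) + 1):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Recall never decreases as k grows ("retrieved so far"), while precision can fall as non-relevant documents enter the list; plotting the pairs gives a precision-recall curve.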

16
Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
[Figure: 100 queries vs. 50 queries]
17
Stem and Concept Similarity Contribution Weights
18
Sensitivity of Retrieval Effectiveness to f_s and f_c
19
Computational Complexity Using Phrase-based VSM
  • Data reorganization
      • Build separate indexes on stems and concepts
      • For each concept c_i, keep a list of its related
        concepts c_j and their conceptual similarities s(c_i, c_j)
  • The time complexities of the document similarity
    calculation are of the same order of magnitude for
      • Stem-based VSM
      • Phrase-based VSM
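
The data reorganization above can be sketched with plain dictionaries: two inverted indexes (stem to documents, concept to documents) and, for each concept, a precomputed list of related concepts with their similarities. The `threshold` pruning parameter, the symmetric storage of related pairs, and all IDs and similarity values are hypothetical assumptions for illustration.

```python
from collections import defaultdict


def build_indexes(docs):
    """docs: {doc_id: (stems, concepts)} -> two inverted indexes, term -> {doc_id: tf}."""
    stem_index, concept_index = defaultdict(dict), defaultdict(dict)
    for doc_id, (stems, concepts) in docs.items():
        for s in stems:
            stem_index[s][doc_id] = stem_index[s].get(doc_id, 0) + 1
        for c in concepts:
            concept_index[c][doc_id] = concept_index[c].get(doc_id, 0) + 1
    return stem_index, concept_index


def build_related(similarities, threshold=0.5):
    """{(ci, cj): s} -> {ci: [(cj, s), ...]} sorted by s, keeping s >= threshold (hypothetical)."""
    related = defaultdict(list)
    for (ci, cj), s in similarities.items():
        if s >= threshold:
            related[ci].append((cj, s))
            related[cj].append((ci, s))
    for ci in related:
        related[ci].sort(key=lambda pair: -pair[1])
    return related


def concept_candidates(query_concepts, concept_index, related):
    """Score documents reached from each query concept or one of its related concepts."""
    scores = defaultdict(float)
    for ci in query_concepts:
        for cj, s in [(ci, 1.0)] + related.get(ci, []):
            for doc_id, tf in concept_index.get(cj, {}).items():
                scores[doc_id] += s * tf
    return dict(scores)


docs = {
    "d1": (["fever", "child"], ["C_FEVER"]),
    "d2": (["hypothermia"], ["C_HYPOTHERMIA"]),
    "d3": (["edema"], ["C_EDEMA"]),
}
sims = {("C_FEVER", "C_BTE"): 0.86, ("C_FEVER", "C_HYPOTHERMIA"): 0.57, ("C_FEVER", "C_EDEMA"): 0.2}
stem_index, concept_index = build_indexes(docs)
related = build_related(sims)
scores = concept_candidates(["C_FEVER"], concept_index, related)
print(scores)
```

Each query concept expands only into its short precomputed related list, which illustrates why the extra concept work need not change the order of magnitude of the similarity calculation.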

20
Conclusion
  • A new document indexing paradigm based on phrases
    is proposed
      • Use phrases (a concept and its word stems) as terms
  • Document similarity is derived from both the stem
    and the concept contributions
  • Conceptual similarity quantifies the concept
    relations and improves retrieval effectiveness
  • Stems remedy the incomplete coverage of the
    knowledge base (missing concepts and missing
    links between related concepts)
  • Experimental results reveal a significant
    retrieval effectiveness improvement of the
    phrase-based VSM over the stem-based VSM
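
Combining the stem and concept contributions can be sketched as a weighted sum per phrase pair: f_s times the stem overlap (Jaccard) plus f_c times s(ci, cj), with a document score averaged over query phrases using each query phrase's best match. This combination rule, the default weights, and the similarity table are hypothetical illustrations, not the formula from this work.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Phrase = Tuple[Optional[str], FrozenSet[str]]  # (concept ID or None, word stems)

# Hypothetical conceptual similarities s(ci, cj); identical concepts score 1.0.
CONCEPT_SIM: Dict[Tuple[str, str], float] = {
    ("C_FEVER", "C_BODY_TEMP_ELEVATION"): 0.86,
}


def s(ci: Optional[str], cj: Optional[str]) -> float:
    if ci is None or cj is None:
        return 0.0  # KB gap: only the stem contribution is available
    if ci == cj:
        return 1.0
    return CONCEPT_SIM.get((ci, cj), CONCEPT_SIM.get((cj, ci), 0.0))


def phrase_sim(p: Phrase, q: Phrase, f_s: float, f_c: float) -> float:
    (ci, stems_p), (cj, stems_q) = p, q
    union = stems_p | stems_q
    stem_part = len(stems_p & stems_q) / len(union) if union else 0.0
    return f_s * stem_part + f_c * s(ci, cj)


def document_sim(query: List[Phrase], doc: List[Phrase], f_s=0.5, f_c=0.5) -> float:
    """Average, over query phrases, of the best-matching document phrase."""
    if not query or not doc:
        return 0.0
    return sum(max(phrase_sim(p, q, f_s, f_c) for q in doc) for p in query) / len(query)


query = [("C_BODY_TEMP_ELEVATION", frozenset({"body", "temperature", "elevation"})),
         (None, frozenset({"leukocytosis"}))]
doc = [("C_FEVER", frozenset({"fever"})), (None, frozenset({"leukocytosis"}))]
print(document_sim(query, doc))  # about 0.465
```

The first query phrase matches only through the concept contribution (no shared stems), while "leukocytosis", missing from the toy KB, matches only through its stem, mirroring how stems remedy KB incompleteness. Setting f_c = 0 drops the concept contribution entirely.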

21
Acknowledgement
This research is supported in part by NIC/NIH
Grant 4442511-33780
22
Model Comparison