The Hows and Whys of Semantic Distance

1
The Hows and Whys of Semantic Distance
  • Vipul Kashyap, National Library of Medicine, NIH
  • kashyap_at_nlm.nih.gov
  • NIST Semantic Distance Workshop, Gaithersburg, MD
  • November 11, 2003

2
Outline
  • Why Semantic Distance?
  • Some applications of Semantic Distance
  • The Semantic Conveyance problem
  • Translations across multiple ontologies
  • Role of Semantic Distance
  • One approach to measuring Semantic Distance
  • The Hows of Semantic Distance
  • Types of Semantic Distance measures

3
Why do we need Semantic Distance (in Healthcare)?
  • Healthcare information is characterized by
    multiple terminologies,
  • e.g., MeSH, CPT, LOINC, SNOMED, etc.
  • Interoperability across terminologies is crucial
    to healthcare information system interoperability
  • Which terminology do I interoperate with?
  • What criteria/measure do we use?
  • Application dependent v/s application specific
  • Should the measure be machine understandable?
  • Should the measure be human understandable?

4
Applications of Semantic Distance
  • Information Retrieval
  • To improve automated assignment of indexing based
    descriptors
  • Semantic Vocabulary Integration
  • To choose the closest related concepts while
    translating in and out of the multiple
    vocabularies
  • Look at the Semantic Vocabulary Interoperability
    Project at
  • http://cgsb2.nlm.nih.gov/kashyap/projects/SVIP

5
Semantic Conveyance
[Figure: the Sender transmits the message "XXX Book" to the Receiver]
  • The Sender has his own ontology (The Red
    Ontology)
  • The Receiver has his own (The Blue Ontology)
  • For communication to take place,
  • The receiver should translate the message
    (content) from the Red Ontology to the Blue
    Ontology
  • Questions
  • Is it always possible?
  • How many candidate possibilities are there?
  • How do we choose among them? Semantic Distance?

6
Terminology 1: The Red Terminology
[Figure: concept hierarchy; recoverable node labels: Instructions, Reference-Manual]
http://www.cogsci.princeton.edu/wn/w3wn.html
7
Terminology 2: The Blue Terminology
[Figure: concept hierarchy; recoverable node labels: Conference, Agent, Person, Organization, Author, Publisher, University, Thesis, Periodical-Publication]
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
8
Inter-terminological relationships
Typically represented in the UMLS Metathesaurus
  • Synonyms
  • semantics preserving
  • Hyponyms/Hypernyms
  • semantics altering
  • typically results in loss of information
  • List of Hyponyms
  • technical-manual hyponym manual
  • book hyponym book
  • proceedings hyponym book
  • thesis hyponym book
  • misc-publication hyponym book
  • technical-reports hyponym book
  • press hyponym periodical-publication
  • periodical hyponym periodical-publication

9
Translations across multiple terminologies
[Figure: candidate translations; recoverable labels: union(Book, Proceedings, ..., Misc-Publication), document, Technical-Manual, GuideBook]
10
Semantic Conveyance
[Figure: Sender and Receiver; message labels: XXX Book, XXX Document, XXX union(...)]
  • XXX Document
  • XXX union(...)
  • XXX ...
  • How do we choose between the various alternatives?
  • Semantic Distance to the rescue!!

11
Proposal for Semantic Distance: Extensional Measure
[Figure: Ext(Term) and Ext(Translation), with the regions Loss in Precision and Loss in Recall]
Precision = |Ext(Term) ∩ Ext(Translation)| / |Ext(Translation)|
Recall = |Ext(Term) ∩ Ext(Translation)| / |Ext(Term)|
Percentage Loss = (Ext(Term) ? Ext(Translation)) / (Ext(Term) Ext(Translation))
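
Precision and recall over concept extensions can be sketched in Python by treating each extension as a set of instance identifiers. This is a minimal illustration, not the author's implementation: the function names and the instance IDs are made up for the example.

```python
def precision(ext_term, ext_translation):
    """Fraction of the translation's extension that also belongs to the term."""
    if not ext_translation:
        return 0.0
    return len(ext_term & ext_translation) / len(ext_translation)


def recall(ext_term, ext_translation):
    """Fraction of the term's extension that the translation still covers."""
    if not ext_term:
        return 0.0
    return len(ext_term & ext_translation) / len(ext_term)


# Hypothetical extensions: IDs of instances classified under each concept.
ext_book = {1, 2, 3, 4}                    # term "Book"
ext_document = {1, 2, 3, 4, 5, 6, 7, 8}    # candidate translation "Document"

p = precision(ext_book, ext_document)      # 4 shared / 8 in the translation
r = recall(ext_book, ext_document)         # 4 shared / 4 in the term
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up sets the translation keeps every Book instance (no loss in recall) but half of what it returns is not a Book (loss in precision).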
12
Choosing an optimal translation: Local v/s Global Decision Making
[Figure: translation paths among Publication, Document, Journal and Book, with edges labelled LOSS(Document, Publication), LOSS(Publication, Journal), LOSS(Document, Book) and LOSS(Journal, Book)]
  • Local Decision Making
  • LOSS(Publication, Journal) > LOSS(Document,
    Publication)
  • Document is chosen as the translation
  • But LOSS(Book, Document) > LOSS(Book, Journal) !!
  • Global Decision Making
  • Both translations, Document and Journal, are passed
    on to the next level
  • Journal is chosen as the appropriate translation
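
The difference between local and global decision making can be sketched as a greedy choice versus an end-to-end comparison. The loss values below are hypothetical (the slide gives only their relative order), and adding the per-step losses to score a whole path is an assumption made for this example.

```python
# Hypothetical loss values; only their relative order comes from the slide.
LOSS = {
    ("Publication", "Document"): 0.2,
    ("Publication", "Journal"): 0.4,
    ("Document", "Book"): 0.7,
    ("Journal", "Book"): 0.3,
}


def path_loss(source, candidate, target):
    """Total loss of going source -> candidate -> target (assumed additive)."""
    return LOSS[(source, candidate)] + LOSS[(candidate, target)]


def local_choice(source, candidates, target):
    """Greedy: keep only the candidate with the smallest first-step loss."""
    best = min(candidates, key=lambda c: LOSS[(source, c)])
    return best, path_loss(source, best, target)


def global_choice(source, candidates, target):
    """Pass every candidate to the next level and compare end-to-end loss."""
    best = min(candidates, key=lambda c: path_loss(source, c, target))
    return best, path_loss(source, best, target)


candidates = ["Document", "Journal"]
print("local :", local_choice("Publication", candidates, "Book"))
print("global:", global_choice("Publication", candidates, "Book"))
```

The greedy version commits to Document after the first step, while the global version keeps both candidates alive and ends up with Journal, whose total loss is lower.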

13
Using Subsumption for tighter bounds on Semantic Distance
  • Term subsumes Translation
  • Ext(Translation) ⊆ Ext(Term) ⇒ Ext(Term) ∩
    Ext(Translation) = Ext(Translation)
  • Precision = 1,
  • Recall = |Ext(Translation)| / |Ext(Term)|
  • Should be able to incorporate other
    application-specific measures to adapt distance
    measures
  • The same terminological translation might have
    different semantic distances based on
    application-specific adaptations
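
When one concept subsumes the other, precision and recall follow from the extension sizes alone, with no need to compute the intersection. The sketch below covers the case on the slide (Term subsumes Translation) and, as an addition not on the slide, the mirror case (Translation subsumes Term). The function name and the sizes are illustrative.

```python
def precision_recall_under_subsumption(n_term, n_translation, term_subsumes):
    """Return (precision, recall) from extension sizes when one concept subsumes the other.

    n_term and n_translation are the (non-zero) sizes of Ext(Term) and
    Ext(Translation).
    """
    if term_subsumes:
        # Ext(Translation) is contained in Ext(Term):
        # everything the translation returns is correct.
        return 1.0, n_translation / n_term
    # Mirror case (not on the slide): Ext(Term) is contained in
    # Ext(Translation), so nothing is missed but extra instances come back.
    return n_term / n_translation, 1.0


# Hypothetical sizes: 200 instances under the term, 50 under the translation.
print(precision_recall_under_subsumption(200, 50, term_subsumes=True))
```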

14
Proposal for Semantic Distance: Intensional Measure
  • Difference in Translation
  • Book ? union(Book, Thesis, Proceedings,
    Technical-Manual, Misc-Publication)
  • Terminological Difference
  • Book ? (AND Publication (ATLEAST 1 ISBN))
  • Publication ? (AND document (ATLEAST 1
    PLACE-OF-PUBLICATION))
  • Book ? (AND document (ATLEAST 1 ISBN) (ATLEAST 1
    PLACE-OF-PUBLICATION))
  • Loss of Information
  • (-) union(Trade-Book, Brochure, SongBook,
    PrayerBook, TextBook)
  • information related to trade books, brochures,
    song books, prayer books and text books is lost
  • (+) (AND (ATLEAST 1 ISBN) (ATLEAST 1
    PLACE-OF-PUBLICATION))
  • spurious documents that don't have an ISBN number
    and a place of publication are gained
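
The terminological difference can be sketched by unfolding concept definitions until a term of the target vocabulary is reached and collecting the constraints met on the way. Reading each relation on the slide as a definition (a concept equals its parent plus extra constraints) is an assumption, as are the dictionary layout and the function name.

```python
# Each concept maps to (parent concept, constraints added on top of the parent).
# Treating these as definitions is an assumption made for this sketch.
DEFINITIONS = {
    "Book": ("Publication", {("ATLEAST", 1, "ISBN")}),
    "Publication": ("document", {("ATLEAST", 1, "PLACE-OF-PUBLICATION")}),
}


def expand(concept):
    """Unfold a concept into its root concept plus all accumulated constraints."""
    constraints = set()
    while concept in DEFINITIONS:
        parent, own = DEFINITIONS[concept]
        constraints |= own
        concept = parent
    return concept, constraints


root, difference = expand("Book")
print(root)                 # the concept Book bottoms out in
print(sorted(difference))   # constraints dropped when Book is replaced by root
```

The collected constraints are exactly what is given up when Book is translated to document, which is the non-numerical, human-readable form of the distance.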

15
Measures for Semantic Distance: Pros and Cons
  • Intensional Measure
  • May not make sense as it mixes two vocabularies,
  • e.g., does Book - Book make any sense?
  • The problem becomes worse if the two
    terminologies are in different languages
  • Makes it hard for the system to differentiate
    between the various alternatives
  • Extensional Measure
  • Based on Standard Information Retrieval Measures
    (F-measure)
  • Can be tailored to reflect change in semantic
    distance for different applications
  • However
  • Probability distributions of various terms need
    to be estimated
  • An information loss interval doesn't make much
    sense to the user.

16
How do we measure Semantic Distance?
  • Extensional Approaches
  • Intensional Approaches (based on concept
    definitions)
  • Combination of the above approaches
  • Combining the above using some weightage schemes
  • Applying semantics of subsumption to the semantic
    distance measures
  • Applying other constraints specified in the
    semantic network

17
Types of Semantic Distance Metrics: Intensional
  • Numerical
  • Based on features (e.g., Tversky's measure)
  • Based on traversal of specific conceptual
    relationships (is-a, part-of) and arbitrary
    domain specific relationships
  • Non-numerical
  • Based on semantic concept differences, e.g. a
    book without a publication date
  • Important for human understandability

18
Proposal for Semantic Distance: Tversky's measure from Psycho-semantics
  • S(a, b) = |A ∩ B| / (|A ∩ B| + α(a, b) |A - B| + (1 - α(a, b)) |B - A|)
  • S(a, b) is the similarity between two arbitrary
    objects, a, b
  • A and B are feature sets of a, b respectively
  • α is a real number, 0 ≤ α ≤ 1
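
A minimal Python sketch of this ratio form of Tversky's measure follows. For simplicity the weight is passed as a constant `alpha` rather than a function of the two objects, and the feature sets are invented for illustration.

```python
def tversky_similarity(features_a, features_b, alpha=0.5):
    """Ratio-model similarity between two feature sets.

    alpha weights the features found only in A; (1 - alpha) weights the
    features found only in B. alpha must lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)
    only_b = len(features_b - features_a)
    denominator = common + alpha * only_a + (1 - alpha) * only_b
    # The ratio is undefined when the denominator is zero (for example, two
    # empty feature sets); returning 0.0 is a choice made for this sketch.
    return common / denominator if denominator else 0.0


# Hypothetical feature sets.
book = {"title", "publisher", "isbn", "author"}
journal = {"title", "publisher", "issn", "volume"}

print(tversky_similarity(book, journal))             # symmetric weighting
print(tversky_similarity({"title"}, book, alpha=1))  # ignore features only in B
print(tversky_similarity({"title"}, book, alpha=0))  # ignore features only in A
```

Values of `alpha` other than 0.5 make the measure asymmetric, which is what lets it express that a specific concept resembles a general one more than the reverse.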

19
Types of Semantic Distance Metrics: Extensional
  • Numerical: based on estimation of underlying
    concept intensions
  • Computation of joint and conditional probability
    distributions
  • Computation of concept co-occurrences in
    documents
  • Computation of cosine measures in a vector space
    model
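
As one concrete instance of the vector space option, the cosine measure between two concept vectors can be sketched as follows. The vectors are hypothetical counts of how often each concept occurs in three documents; nothing here comes from the original slides beyond the name of the measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector in this sketch
    return dot / (norm_u * norm_v)


# Hypothetical occurrence counts of each concept in documents d1, d2, d3.
book = [3, 0, 1]
manual = [2, 0, 1]
conference = [0, 4, 0]

print(cosine_similarity(book, manual))      # close to 1: similar usage
print(cosine_similarity(book, conference))  # 0: never occur together
```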

20
A Classification of Numerical (Intensional and Extensional) Approaches
  • Traversal of graph-based information models
  • Traversal of Hierarchical Relationships
  • Intensional
  • Feature contrast based approaches (e.g., Tversky)
  • Intensional
  • Probabilistic approaches (e.g., Precision,
    Recall, F-measure)
  • Based on estimation of extensions/distributions
    of concepts
  • Some combination of the above?

21
Counter-Example: Hierarchical approaches
[Figure: concept hierarchy over the nodes C4, C2, C3 and C1]
  • Hierarchical approach
  • semantic-distance(C1, C2) < semantic-distance(C1, C4)
  • However, if we look at concept extensions, it
    might be the case that
  • only 10% of C2 is C1 and
  • 90% of C3 is C1 and 20% of C4 is C3
  • This implies 18% of C4 is C1
  • Thus, semantic-distance(C1, C2) >
    semantic-distance(C1, C4)
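
The counter-example can be checked numerically. The sketch below assumes a layout for the figure (C1 under both C2 and C3, which are both under C4), reads the 20% figure as the share of C4 that falls under C3 (which is what the 18% product requires), and uses one minus the overlap share as a stand-in extensional distance. All three are assumptions for illustration.

```python
from collections import deque
from fractions import Fraction

# Assumed layout of the figure: C1 sits under C2 and C3, both under C4.
PARENTS = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C4"], "C4": []}


def edge_distance(start, goal):
    """Number of is-a edges on the shortest upward path from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for parent in PARENTS[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None  # goal is not an ancestor of start


# Shares of each concept's extension, as stated in the counter-example.
frac_c2_is_c1 = Fraction(10, 100)
frac_c3_is_c1 = Fraction(90, 100)
frac_c4_is_c3 = Fraction(20, 100)
frac_c4_is_c1 = frac_c3_is_c1 * frac_c4_is_c3  # 18/100

hierarchical = {c: edge_distance("C1", c) for c in ("C2", "C4")}
extensional = {"C2": 1 - frac_c2_is_c1, "C4": 1 - frac_c4_is_c1}

print("hierarchical:", hierarchical)
print("extensional :", extensional)
```

By edge count C2 is the nearer concept, while by extension overlap C4 is nearer, so the two kinds of measure rank the same pair in opposite orders.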

22
Counter-Example: Probabilistic approaches
[Figure: concepts C2, C3 and C1]
  • Probabilistic approach
  • semantic-distance(C1, C2) and
    semantic-distance(C2, C3) are given
  • Can we compute semantic-distance(C1, C3)?
  • It is quite possible that
  • semantic-distance(C1, C3) < semantic-distance(C1, C2) and
  • semantic-distance(C1, C3) < semantic-distance(C2, C3)
  • Feature contrast based approach
  • If the features of the concepts are known,
    Tversky's measures may be used
  • However, the Semantic Proximity algorithm has to
    identify C3 as a concept that may have semantic
    resemblance with C1
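
A small set-based example shows why the distance between C1 and C3 cannot be derived from the other two distances. It uses the Jaccard distance as a stand-in extensional measure (an assumption, since the slides do not fix a formula) and made-up extensions.

```python
from fractions import Fraction


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, computed exactly; 0 for two empty sets."""
    union = a | b
    if not union:
        return Fraction(0)
    return 1 - Fraction(len(a & b), len(union))


# Hypothetical extensions.
c1 = {1, 2, 3}
c2 = {3, 4, 5, 6, 7}
c3 = {1, 2, 3, 4}       # overlaps heavily with c1
c3_alt = {4, 5, 8, 9}   # same distance to c2 as c3, but disjoint from c1

print(jaccard_distance(c1, c2))                                # 6/7
print(jaccard_distance(c2, c3), jaccard_distance(c2, c3_alt))  # 5/7 5/7
print(jaccard_distance(c1, c3), jaccard_distance(c1, c3_alt))  # 1/4 1
```

Both choices of the third concept give the same two known distances, yet its distance to C1 is 1/4 in one case and 1 in the other. In the first case it is also smaller than either known distance.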

23
Research areas dealing with Semantic Distances
  • Knowledge Representation
  • primarily non-numerical, intensional
  • Statistical Clustering
  • primarily numerical, vector space model
  • Data Mining
  • primarily numerical, probabilistic
  • Machine Learning
  • primarily numerical
  • Information Retrieval
  • primarily numerical, probabilistic
  • Medical Informatics
  • primarily numerical, intensional
  • Natural Language Processing
  • ??
  • ... any other field

24
Conclusions
  • Semantic Distance measures need to be application
    specific
  • Text Retrieval
  • (Structured) Data Retrieval
  • Domain and Context Specific
  • Semantic Distance measures should be both human
    and machine processable
  • They should be based on standard measures as far
    as possible
  • E.g., F-measure from Information Retrieval
  • There is a need for estimation of various
    distributions of medical concepts in a given
    population
  • E.g., may need to mine CDC databases