Semantic distance - PowerPoint PPT Presentation

Title: Semantic distance


1
Semantic distance WordNet
  • Serge B. Potemkin
  • Moscow State University
  • Philological faculty

2
Distance and metrics
  • Fundamental concept
  • distance between entities under consideration
  • Semantic distance between words or concepts
  • Metrical space axioms?
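
Whether a given semantic distance satisfies the metric space axioms can be checked mechanically. The following is a minimal sketch, assuming the distances are stored in a dictionary keyed by word pairs; the words and the distance values are invented for illustration and are not taken from the presentation.

```python
from itertools import product

def symmetric(pairs, words):
    """Build a full distance table from one value per unordered pair."""
    dist = {(w, w): 0.0 for w in words}
    for (a, b), d in pairs.items():
        dist[(a, b)] = dist[(b, a)] = d
    return dist

def metric_violations(words, dist, tol=1e-9):
    """Return human-readable violations of the metric axioms."""
    problems = []
    for a in words:
        if abs(dist[(a, a)]) > tol:
            problems.append(f"identity: d({a},{a}) != 0")
    for a, b in product(words, repeat=2):
        if dist[(a, b)] < -tol:
            problems.append(f"non-negativity: d({a},{b}) < 0")
        if abs(dist[(a, b)] - dist[(b, a)]) > tol:
            problems.append(f"symmetry: d({a},{b}) != d({b},{a})")
    for a, b, c in product(words, repeat=3):
        if dist[(a, c)] > dist[(a, b)] + dist[(b, c)] + tol:
            problems.append(f"triangle: d({a},{c}) > d({a},{b}) + d({b},{c})")
    return problems

# Toy values (assumed): "star" is close to both "sun" and "movie",
# while "sun" and "movie" are far apart.
WORDS = ["star", "sun", "movie"]
toy = symmetric({("star", "sun"): 1.0,
                 ("star", "movie"): 1.0,
                 ("sun", "movie"): 3.0}, WORDS)
violations = metric_violations(WORDS, toy)
```

In this toy table the triangle inequality fails because a polysemous word links two otherwise distant words, which is one reason the axioms are worth questioning for word-level distances.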

3
Distance is needed for
  • word sense disambiguation,
  • determining the structure of texts,
  • text summarization and annotation,
  • information extraction and retrieval,
  • automatic indexing,
  • lexical selection,
  • the automatic correction of word errors in text

4
Approaches to distance measuring
  • Corpora-based
  • Dictionary-based
  • Roget-structured thesauri
  • WordNet and other semantic networks

5
WordNet
  • Synonym sets (synsets)
  • Subsumption hierarchy (hyponymy / hypernymy),
  • 3 meronymic (PART-OF) relations
  • COMPONENT-OF,
  • MEMBER-OF,
  • SUBSTANCE-OF and their inverses
  • Antonymy,
  • COMPLEMENT-OF

6
WordNet shortcomings
  • 150,000 synsets: inadequate coverage
  • Non-English versions cover 20-70% of the English one
  • (100,000 synsets for Russian)
  • Extension is hard
  • Distance measuring is controversial

7
Corpora-based approach
  • Two words wa and wb are the closer, the more often
    their neighbors (within ±5 words) coincide.
  • Ex. (distributional profile of the word star)
  • space 0.28, movie 0.2, famous 0.13,
  • light 0.09, rich 0.04, ...
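
A distributional profile of this kind can be sketched as follows. The toy corpus, the small window used in the example call, and the choice of cosine similarity as the closeness measure are assumptions made for illustration; the slides do not say which measure was used.

```python
import math
from collections import Counter

def profile(tokens, target, window=5):
    """Relative frequencies of the words found within +/- window of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity of two sparse profiles (0.0 if either is empty)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Toy corpus (assumed); a window of 2 keeps the tiny example discriminative.
tokens = ("the star shines in space the sun shines in space "
          "the movie was famous").split()
p_star = profile(tokens, "star", window=2)
p_sun = profile(tokens, "sun", window=2)
p_movie = profile(tokens, "movie", window=2)
sim_sun = cosine(p_star, p_sun)
sim_movie = cosine(p_star, p_movie)
```

On this corpus "star" comes out closer to "sun" than to "movie", simply because more of their neighbors coincide.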

8
Dictionary-based approach
  • Two words wa and wb are the closer, the more often
    the words in their definitions coincide.
  • Ex. wa = linguistics, wb = stylistics
  • the, study, of, language, in, general, and, of,
    particular, languages, and, their, structure,
    and, grammar, and, history
  • the, study, of, style, in, written, or, spoken,
    language.
  • 2 content words (study, language) coincide in the definitions
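
The overlap count for this example can be reproduced with a few lines. The two definitions are the ones quoted on the slide; the stop-word list is an assumption, chosen so that function words such as "the" and "of" are not counted as coincidences.

```python
# Function words ignored when comparing definitions (assumed list).
STOP_WORDS = {"the", "of", "in", "and", "or", "their"}

def content_words(definition):
    """Lower-cased definition words with stop words removed."""
    return {w for w in definition.lower().split() if w not in STOP_WORDS}

def definition_overlap(def_a, def_b):
    """Content words shared by two dictionary definitions."""
    return content_words(def_a) & content_words(def_b)

linguistics = ("the study of language in general and of particular "
               "languages and their structure and grammar and history")
stylistics = "the study of style in written or spoken language"

shared = definition_overlap(linguistics, stylistics)
```

Note that "languages" and "language" are not matched here; a stemmer or lemmatizer would be needed to treat them as the same word.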

9
Bilingual dictionary approach
  • Two words wa and wb are the closer, the more often
    their equivalents coincide.
  • ρ(Wa, Wb) = 1 / Σ n_i,
  • where
  • Σ is the sum over all coinciding Russian
    equivalents
  • and n_i is the number of dictionaries where an
    equivalent occurs
  • Or ρ(Wa, Wb) = Σ n_ai · n_bi / (a_R · b_R)
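
The first variant, the reciprocal of the summed dictionary counts over coinciding equivalents, might be computed as in the sketch below. Several things here are assumptions: each word is represented as a mapping from an equivalent to the set of dictionaries listing it, the count for a shared equivalent is taken as the number of dictionaries that list it for both words, and words with no shared equivalent are treated as infinitely distant. The equivalents ("r1", "r2", ...) and dictionary names ("D1", ...) are placeholders, not real data.

```python
import math

def bilingual_distance(equiv_a, equiv_b):
    """1 / (sum of dictionary counts over equivalents shared by both words)."""
    total = 0
    for eq in equiv_a.keys() & equiv_b.keys():
        # Count the dictionaries that give this equivalent for both words.
        total += len(equiv_a[eq] & equiv_b[eq])
    return 1.0 / total if total else math.inf

# Placeholder entries: equivalent -> dictionaries that list it.
word_a = {"r1": {"D1", "D2"}, "r2": {"D1"}}
word_b = {"r1": {"D1", "D2", "D3"}, "r3": {"D2"}}
word_c = {"r3": {"D2"}}

d_ab = bilingual_distance(word_a, word_b)  # one shared equivalent, 2 dictionaries
d_bc = bilingual_distance(word_b, word_c)  # one shared equivalent, 1 dictionary
d_ac = bilingual_distance(word_a, word_c)  # nothing shared
```

The more dictionaries agree on shared equivalents, the smaller the distance becomes.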

10
Multidimensional scaling
  • Semantic network is a graph
  • nodes -- words
  • edges -- links between words via bilingual
    lexicon
  • edge length -- ρ(Wa, Wb)
  • The graph can be embedded in N-dimensional
    space,
  • where N = number of words in the lexicon (> 100,000)
  • Multidimensional scaling for visualization
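
To make the scaling step concrete, the sketch below places a handful of words in the plane so that the plotted distances approximate given pairwise distances. It uses plain gradient descent on the raw stress (the sum of squared differences between plotted and target distances), which is one common way to do multidimensional scaling and not necessarily the procedure used in the presentation. The 3 x 3 distance matrix, the learning rate, and the step count are assumed toy values.

```python
import math
import random

def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mds(target, dim=2, steps=2000, lr=0.05, seed=0):
    """Gradient descent on raw stress: sum over pairs of (d_ij - target_ij)^2."""
    rng = random.Random(seed)
    n = len(target)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [pts[i][k] - pts[j][k] for k in range(dim)]
                d = math.sqrt(sum(x * x for x in diff)) or 1e-9
                coef = 2.0 * (d - target[i][j]) / d
                for k in range(dim):
                    grads[i][k] += coef * diff[k]
        for i in range(n):
            for k in range(dim):
                pts[i][k] -= lr * grads[i][k]
    return pts

# Toy symmetric distance matrix for three words (assumed values).
rho = [[0.0, 1.0, 2.0],
       [1.0, 0.0, 1.5],
       [2.0, 1.5, 0.0]]
coords = mds(rho)
```

For a lexicon of more than 100,000 words this quadratic loop would be far too slow; it is only meant to show what the visualization step computes for a small neighborhood.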

11
New synonyms

12
1-neighborhood of accolade
  • Links between synonyms (black)
  • Links between synonyms from the dictionary
    (green)
  • 2 isolated clusters.

13
Dominant in acerbity neighborhood
  • acerbity (?????????) excluded
  • cluster (bold lines) derived by a Markov process
  • asperity (????????) is the centre of the cluster

14
2 dominants for bicycling (wheelcrook)

15
Adjustable parameters
  • - space dimension
  • - minimal number of dictionaries linking
    synonyms
  • - maximal distance from the word under
    consideration
  • - maximal number of displayed words
  • - word excluded from clustering

16
Compare LDB with WordNet (accolade)
  • n - noun, v - verb

17
Controversy 1
  • The immediate hyperonym of the accolade synset in
    WordNet is symbol -- (an arbitrary sign (written
    or printed) that has acquired a conventional
    significance).
  • The immediate hyperonym of commendation (more
    frequent than accolade) is the accolade synset
  • Actually, accolade is a hyponym of commendation
  • It is impossible to disambiguate accolade
    (bracket) from accolade (praise)

18
Controversy 2
  • WordNet
  • dog 1 domestic dog
  • hyperonym - canine, canid.
  • further: mammal, ..., entity
  • Neither animal nor pet is linked with dog as a
    hyperonym.
  • A tree structure is inadequate for semantic coding.

19
Conclusion
  • Each meaning of a polysemous word can be coded
    as a pair (wE, wR), in contrast to synset coding.
  • A metric superimposed on the LDB enables homograph
    disambiguation and extraction of dominants
  • A network has particular advantages over a
    hierarchical representation of semantic relations
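
The pair coding can be illustrated with a small sketch. It assumes that the database is simply a list of (English word, Russian equivalent) pairs; the entries below, including the placeholder equivalents "R_praise" and "R_bracket" and the word "brace", are hypothetical and stand in for real dictionary data.

```python
# Hypothetical entries: (wE, wR) = (English word, Russian equivalent).
# "R_praise" and "R_bracket" are placeholders, not real dictionary data.
PAIRS = [
    ("accolade", "R_praise"),
    ("accolade", "R_bracket"),
    ("commendation", "R_praise"),
    ("brace", "R_bracket"),
]

def senses(word, pairs):
    """All (wE, wR) pairs for an English word: one pair per meaning."""
    return [p for p in pairs if p[0] == word]

def neighbours(sense, pairs):
    """English words that share the Russian equivalent of this sense."""
    w_e, w_r = sense
    return sorted(e for e, r in pairs if r == w_r and e != w_e)

accolade_senses = senses("accolade", PAIRS)
praise_neighbours = neighbours(("accolade", "R_praise"), PAIRS)
bracket_neighbours = neighbours(("accolade", "R_bracket"), PAIRS)
```

Each pair picks out one meaning of the homograph, and the words reachable through the shared equivalent differ per meaning, which is what makes disambiguation possible in this representation.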