WordNet: An Overview - PowerPoint PPT Presentation

Provided by: anubha2


1
WordNet: An Overview
  • Anubhav Madan
  • anubhavm@comp.nus.edu.sg

2
Today's Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial

3
WordNet: A Lexical Database
  • Started in 1985
  • Basic unit: the synset
  • Hierarchical arrangement with respect to definition
  • Contains compounds, phrasal verbs, collocations,
    and idiomatic phrases
  • Example: Bad Person → offender, libertine (each
    hyponym points back to Bad Person with the "@"
    hypernym pointer)
  • Establishes a rich, dense network that helps
    establish text coherence

4
WordNet: The Facts
  • A word or phrase is the basic unit
  • Words are organized into synsets: groups of
    words or phrases that share the same sense
  • A gloss is a textual definition of the synset
  • Synsets are organized into hierarchies
  • Hypernym/hyponym: concept IS-A concept
  • Meronym/holonym: concept HAS-PART concept
  • Types: nouns, verbs, adjectives, and adverbs
  • 80,000 nouns organized into 60,000 concepts
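To make the IS-A (hypernym/hyponym) organization concrete, here is a minimal Python sketch over a tiny hand-built hierarchy. The concept names are borrowed from the Medium of Exchange example used later in these slides; the dictionary, the function names, and the single-parent simplification are assumptions for illustration, not WordNet's storage format or any library's API (real WordNet synsets can have several hypernyms).

```python
from typing import Dict, List, Optional

# Hypothetical toy IS-A hierarchy: each concept maps to its single hypernym.
# (Real WordNet synsets can have more than one hypernym.)
HYPERNYM: Dict[str, Optional[str]] = {
    "medium_of_exchange": None,  # top of this small fragment
    "money": "medium_of_exchange",
    "credit": "medium_of_exchange",
    "cash": "money",
    "credit_card": "credit",
    "coin": "cash",
}


def hypernym_chain(concept: str) -> List[str]:
    """Follow IS-A links upward from a concept to the top of the fragment."""
    chain = [concept]
    parent = HYPERNYM[concept]
    while parent is not None:
        chain.append(parent)
        parent = HYPERNYM[parent]
    return chain


def hyponyms(concept: str) -> List[str]:
    """Invert the IS-A links to list a concept's direct hyponyms."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == concept)


def lowest_common_subsumer(a: str, b: str) -> str:
    """Most specific concept that both a and b are kinds of."""
    ancestors_of_b = set(hypernym_chain(b))
    for concept in hypernym_chain(a):
        if concept in ancestors_of_b:
            return concept
    raise ValueError(f"{a} and {b} share no subsumer")


print(hypernym_chain("coin"))  # ['coin', 'cash', 'money', 'medium_of_exchange']
print(hyponyms("medium_of_exchange"))  # ['credit', 'money']
print(lowest_common_subsumer("cash", "credit_card"))  # medium_of_exchange
```

The lowest common subsumer computed here is the same notion the similarity measures later in the deck rely on.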

5
WordNet Architecture
6
WordNet Architecture
  • Word/synset pairs are stored in the WordNet database
  • Each entry holds a list of word forms, a pointer to
    its lexicographer file, frames (for verbs), a list
    of elements, an optional gloss, and the adjective
    cluster (for adjectives)
  • Example: {apple, edible_fruit,@ (fruit with red or
    yellow or green skin and crisp whitish flesh) }
  • Indexes: senses are ordered by frequency of use
  • Index of familiarity: how well known the word is
  • Index and data files
  • Sense index
  • The Grinder is a converter: it takes the source
    files written by lexicographers and converts them
    into a format that WordNet can read and update
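The synset entry format can be illustrated with a minimal Python parser for a single lexicographer-style line such as the apple example. This is a simplified sketch under stated assumptions: it handles only word forms (written `word,`), pointers (written `target,symbol`, e.g. `edible_fruit,@` for a hypernym), and one parenthesized gloss. Real lexicographer files have further syntax (verb frames, adjective clusters, cross-file pointers) that is not handled here, and the `Synset` class and `parse_synset` function are hypothetical names, not part of the WordNet tools or the Grinder.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Synset:
    words: List[str] = field(default_factory=list)
    pointers: List[Tuple[str, str]] = field(default_factory=list)  # (target, symbol)
    gloss: str = ""


# Simplified pattern: "{", the word/pointer body, an optional "(gloss)", "}".
SYNSET_RE = re.compile(r"^\{\s*(?P<body>.*?)\s*(?:\((?P<gloss>[^)]*)\))?\s*\}$")


def parse_synset(line: str) -> Synset:
    """Parse one simplified lexicographer-file synset line."""
    match = SYNSET_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a synset line: {line!r}")
    synset = Synset(gloss=(match.group("gloss") or "").strip())
    for token in match.group("body").split():
        target, _, symbol = token.partition(",")
        if symbol:  # "edible_fruit,@" -> pointer to edible_fruit, symbol "@"
            synset.pointers.append((target, symbol))
        else:       # "apple," -> plain word form
            synset.words.append(target)
    return synset


entry = parse_synset(
    "{ apple, edible_fruit,@ (fruit with red or yellow or green skin "
    "and crisp whitish flesh) }"
)
print(entry.words)     # ['apple']
print(entry.pointers)  # [('edible_fruit', '@')]
print(entry.gloss)     # fruit with red or yellow or green skin and crisp whitish flesh
```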

7
Today's Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial

8
WordNet::Similarity
  • A Perl package that measures the similarity and
    relatedness of concepts using the structure and
    glosses of WordNet
  • Main categories of measures:
  • Path based
  • Depth based
  • Information Content Based
  • Gloss Based

9
WordNet Similarity Measures
  • Path Finder
  • Depth Finder
  • Wup (Wu and Palmer): twice the depth of the least
    common subsumer (LCS), scaled by the sum of the
    depths of the two concepts below the root
  • Lch (Leacock and Chodorow): negative log of the
    shortest path length scaled by twice the maximum
    depth of the hierarchy
  • Path: inverse of the shortest path length
  • Information Content Finder
  • Resnik: information content (IC) of the LCS of
    the two concepts
  • Jcn (Jiang and Conrath): inverse of the distance
    IC(c1) + IC(c2) - 2·IC(LCS)
  • Lin: 2·IC(LCS) scaled by the sum IC(c1) + IC(c2)
  • Gloss Finder
  • Lesk (Banerjee and Pedersen): finds and scores
    overlaps between glosses
  • Vector (Patwardhan): builds a co-occurrence matrix
    from glosses and represents glosses as vectors
  • Hso (Hirst and St-Onge): assigns directions to
    relations and scores the paths between words
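As a simplified sketch of the gloss-overlap idea behind the Lesk measure in the spirit of Banerjee and Pedersen's extended gloss overlaps: each maximal run of n consecutive words shared by two glosses contributes n², so a shared phrase counts for more than the same words scattered. This sketch only compares two gloss strings split on whitespace; as we understand it, the full measure also ignores overlaps made up solely of function words and compares the glosses of related synsets, which is omitted here. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def _longest_common_run(a: List[Optional[str]], b: List[Optional[str]]) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest shared run of words."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def gloss_overlap_score(gloss1: str, gloss2: str) -> int:
    """Sum n**2 over each maximal shared phrase of n words (simplified)."""
    a: List[Optional[str]] = list(gloss1.lower().split())
    b: List[Optional[str]] = list(gloss2.lower().split())
    score = 0
    while True:
        n, i, j = _longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        # Blank out the matched words so they are not reused and cannot
        # join two separate overlaps into one longer run.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n


# "used as money" scores 3**2 = 9 and the lone "a" scores 1**2 = 1.
print(gloss_overlap_score("a flat metal piece used as money",
                          "the official currency used as money in a country"))  # 10
```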

Demo
10
LCH
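[Figure: IS-A tree. Root sits 2 edges above Medium of Exchange; Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card; each of these edges has length 1; maximum depth D = 5]
Lch Related (Money-Credit) = -log(path length / 2D) = -log(2/10) ≈ 0.70 (base-10 log)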
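The Lch arithmetic on this slide can be checked with a few lines of Python. This sketch follows the slide's own conventions, which are assumptions here: path length counted in edges and base-10 logarithms. Other implementations may count nodes or use a different log base, which shifts the numbers but not the ranking. The function name `lch` is hypothetical.

```python
import math


def lch(path_length: int, max_depth: int) -> float:
    """Leacock-Chodorow relatedness: -log(path_length / (2 * max_depth)).

    Uses base-10 logs and counts edges, following the slide's worked example.
    """
    if path_length <= 0:
        raise ValueError("path_length must be positive when counting edges")
    return -math.log10(path_length / (2 * max_depth))


# Money-Credit: 2 edges via Medium of Exchange; maximum depth D = 5.
print(round(lch(2, 5), 2))  # 0.7
# A pair joined by a single edge scores higher.
print(round(lch(1, 5), 2))  # 1.0
```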
11
WUP
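[Figure: IS-A tree. Root sits 2 edges above Medium of Exchange; Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card; each of these edges has length 1; maximum depth D = 5]
Wup ConSim (Money-Credit) = 2·depth(LCS) / (depth(Money) + depth(Credit)) = (2·2)/(3 + 3) = 4/6 ≈ 0.67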
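A minimal Python sketch of Wu-Palmer over the Money/Credit example, computing depths and the least common subsumer from parent links instead of reading them off the figure. Assumptions: depth counts edges from the root (root = 0), and `filler_node` is a hypothetical stand-in for the 2-edge gap between Root and Medium of Exchange so that the depths match the slide. Depth conventions vary between implementations (some count the root as depth 1).

```python
from typing import Dict, List, Optional

# Toy taxonomy from the Medium of Exchange example.
# "filler_node" is hypothetical: it models the 2-edge gap between Root and
# Medium of Exchange so that depth(medium_of_exchange) == 2 as on the slide.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "filler_node": "root",
    "medium_of_exchange": "filler_node",
    "money": "medium_of_exchange",
    "credit": "medium_of_exchange",
    "cash": "money",
    "credit_card": "credit",
    "coin": "cash",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by all of its hypernyms, up to the root."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(concept: str) -> int:
    """Number of edges between the concept and the root (root has depth 0)."""
    return len(ancestors(concept)) - 1


def lcs(a: str, b: str) -> str:
    """Least common subsumer: first ancestor of a that is also an ancestor of b."""
    ancestors_of_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in ancestors_of_b)


def wup(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(lcs("money", "credit"))            # medium_of_exchange
print(round(wup("money", "credit"), 2))  # 0.67
```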
12
Path
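[Figure: IS-A tree. Root sits 2 edges above Medium of Exchange; Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card; each of these edges has length 1; maximum depth D = 5]
  • Inverse of the shortest path length
Path (Money-Credit) = 1 / shortest path length = 1/2 = 0.50 (counting edges; WordNet::Similarity counts the nodes along the path, which gives 1/3 ≈ 0.33)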
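Taking the path measure as the inverse of the shortest path length, a breadth-first search over the IS-A links is enough to compute it. This is a sketch under assumptions: the links are the Money/Credit example (with a hypothetical `filler_node` for the 2-edge gap under Root), links are walked in both directions, and `count_nodes` switches between counting nodes and counting edges. With `count_nodes=False`, identical concepts would give a division by zero, which is one reason to count nodes.

```python
from collections import defaultdict, deque
from typing import Dict, Set

# Toy IS-A links (child, parent); "filler_node" is a hypothetical stand-in
# for the 2-edge gap between Root and Medium of Exchange.
IS_A = [
    ("filler_node", "root"),
    ("medium_of_exchange", "filler_node"),
    ("money", "medium_of_exchange"),
    ("credit", "medium_of_exchange"),
    ("cash", "money"),
    ("credit_card", "credit"),
    ("coin", "cash"),
]

GRAPH: Dict[str, Set[str]] = defaultdict(set)
for child, parent in IS_A:
    GRAPH[child].add(parent)  # links can be walked in both directions
    GRAPH[parent].add(child)


def shortest_path_edges(a: str, b: str) -> int:
    """Breadth-first search over IS-A links; returns the number of edges."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a} and {b}")


def path_similarity(a: str, b: str, count_nodes: bool = True) -> float:
    """Inverse of the shortest path length, measured in nodes or in edges."""
    edges = shortest_path_edges(a, b)
    length = edges + 1 if count_nodes else edges
    return 1 / length


print(path_similarity("money", "credit"))                     # 0.3333333333333333
print(path_similarity("money", "credit", count_nodes=False))  # 0.5
```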
13
Resnik
[Figure: IS-A tree. Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card]
Resnik Sim (Money-Credit) = IC(Medium of Exchange) = -log(3/6) ≈ 0.30
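Resnik similarity is the information content of the least common subsumer, IC(c) = -log P(c), where P(c) is estimated from corpus counts of c and everything it subsumes. The sketch below shows that count propagation. The counts and the `other_concept` sibling under the root are hypothetical, chosen only so that P(Medium of Exchange) = 3/6 as in the slide's arithmetic; base-10 logs follow the slides. Concepts with a zero count would need smoothing, which is not handled here.

```python
import math
from typing import Dict, List, Optional

PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "other_concept": "root",  # hypothetical sibling outside the money subtree
    "medium_of_exchange": "root",
    "money": "medium_of_exchange",
    "credit": "medium_of_exchange",
    "cash": "money",
    "credit_card": "credit",
    "coin": "cash",
}

# Hypothetical corpus counts (occurrences tagged directly with each concept).
COUNTS = {"money": 1, "credit": 1, "coin": 1, "other_concept": 3}


def ancestors(concept: str) -> List[str]:
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def subsumed_frequency() -> Dict[str, int]:
    """Credit every count to the concept and all of its hypernyms."""
    freq = {c: 0 for c in PARENT}
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            freq[a] += n
    return freq


FREQ = subsumed_frequency()
TOTAL = FREQ["root"]


def ic(concept: str) -> float:
    """Information content -log10 P(c); undefined for zero-frequency concepts."""
    return -math.log10(FREQ[concept] / TOTAL)


def lcs(a: str, b: str) -> str:
    ancestors_of_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in ancestors_of_b)


def resnik(a: str, b: str) -> float:
    """Resnik similarity: IC of the least common subsumer."""
    return ic(lcs(a, b))


print(FREQ["medium_of_exchange"], TOTAL)    # 3 6
print(round(resnik("money", "credit"), 2))  # 0.3
```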
14
Lin
[Figure: IS-A tree. Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card]
Lin Sim (Money-Credit) log (6/6 3/6) 0.30
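The Lin measure is 2·IC(LCS) divided by IC(c1) + IC(c2). Here is a minimal Python sketch using hypothetical probabilities (P(LCS) = 3/6 and P(c1) = P(c2) = 1/6) rather than values from a real corpus; the function names are illustrative. Because Lin is a ratio of logarithms, the choice of log base cancels out.

```python
import math


def information_content(probability: float) -> float:
    """IC = -log10 P(c); base 10 to match the slides' arithmetic."""
    return -math.log10(probability)


def lin(ic_lcs: float, ic_a: float, ic_b: float) -> float:
    """Lin similarity: 2 * IC(LCS) / (IC(a) + IC(b))."""
    denominator = ic_a + ic_b
    if denominator == 0:
        raise ValueError("both concepts have zero information content")
    return 2 * ic_lcs / denominator


# Hypothetical probabilities: P(LCS) = 3/6, P(a) = P(b) = 1/6.
score = lin(information_content(3 / 6),
            information_content(1 / 6),
            information_content(1 / 6))
print(round(score, 3))  # 0.387
```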
15
JCN
[Figure: IS-A tree. Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card]
Jcn Dist (Money-Coin) = -log(3/6) - log(2/6) + 2·log(6/6)
= 0.301 + 0.477 + 0 = 0.778
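The Jcn distance is IC(c1) + IC(c2) - 2·IC(LCS), and the relatedness is its inverse. This Python sketch plugs in the probabilities that appear on the slide (3/6 and 2/6 for the two concepts, 6/6 for the LCS) with base-10 logs. Returning infinity for a zero distance (identical concepts) is a convention chosen for this sketch, not necessarily what any particular package does.

```python
import math


def ic(probability: float) -> float:
    """Information content -log10 P(c), base 10 as on the slides."""
    return -math.log10(probability)


def jcn_distance(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(LCS)."""
    return ic_a + ic_b - 2 * ic_lcs


def jcn_similarity(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Jcn relatedness is the inverse of the distance."""
    distance = jcn_distance(ic_a, ic_b, ic_lcs)
    # Convention for this sketch: identical concepts get infinite relatedness.
    return math.inf if distance == 0 else 1 / distance


d = jcn_distance(ic(3 / 6), ic(2 / 6), ic(6 / 6))
print(round(d, 3))  # 0.778
print(round(jcn_similarity(ic(3 / 6), ic(2 / 6), ic(6 / 6)), 2))  # 1.29
```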
16
Lesk
17
Vector
18
HSO
  • Classifies the relations in WordNet as having
    directions
  • IS-A (hypernym) links point upward, hyponym links
    point downward, and links such as antonymy are
    horizontal
  • Two words are related if a path connects them that
    is neither too long nor changes direction too often
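As a rough sketch of the HSO idea, the function below scores a path given as a list of link directions using weight = C - path length - k × (number of direction changes). The constants C = 8 and k = 1 and the five-link limit are our recollection of the original settings and should be treated as assumptions. The sketch also ignores HSO's rules about which direction patterns are allowed and its separate scoring of strong relations such as synonymy; all names are hypothetical.

```python
from typing import List

# Assumed constants (see lead-in): C caps the weight, K penalises each turn.
C = 8
K = 1
MAX_LINKS = 5  # assumed longest path allowed for a medium-strength relation


def direction_changes(directions: List[str]) -> int:
    """Count how often consecutive links switch direction."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


def hso_weight(directions: List[str]) -> int:
    """Weight of a path given as link directions: 'up', 'down' or 'horizontal'.

    Paths that are too long get 0; strong relations, which involve no path,
    are scored separately in HSO and are rejected by this sketch.
    """
    if not directions:
        raise ValueError("empty path: strong relations are not modelled here")
    if len(directions) > MAX_LINKS:
        return 0
    return max(0, C - len(directions) - K * direction_changes(directions))


# Money -> Medium of Exchange -> Credit: one upward IS-A link, then one downward.
print(hso_weight(["up", "down"]))      # 5  (8 - 2 - 1)
print(hso_weight(["up", "up", "up"]))  # 5  (8 - 3 - 0)
```

The second call shows the trade-off the slide describes: a longer path that never turns can score as well as a shorter one that changes direction.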

19
Demo
20
Today's Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial

21
Applications
  • Building Semantic Concordances
  • Performance and Confidence in a Semantic
    Annotation
  • Resnik: similarity measure in class-based
    probabilities
  • Lch: WordNet similarity measure in word sense
    identification
  • Text Retrieval using WordNet

22
Applications
  • Lexical Chains as Representations of Context for
    the Detection and Correction of Malapropisms
  • Temporal Indexing through Lexical Chaining
  • COLOR-X
  • Knowledge Processing on an Extended WordNet

23
Further Speculation
  • Sense Disambiguation
  • Information Retrieval
  • Semantic Relations and Textual Coherence
  • Knowledge engineering

24
The Limitations
  • The relation IS-NOT or NOT-A-KIND-OF is
    inexpressible
  • The relation IS-USED-AS-A-KIND-OF is also
    inexpressible
  • No explicit distinction between proper and common
    nouns: it was too difficult to include this
    information
  • Does not attempt to identify basic-level or
    generic categories. For concepts in the middle of
    the lexical hierarchy, many listed features can
    distinguish between words; WordNet does not
    support this.
  • Too few semantic relations in WordNet

25
Tutorial
  • What is WordNet?
  • Why is WordNet unique?
  • What is the difference between WordNet and
    WordNet::Similarity?
  • What are some of the limiting features?
  • Give an example of a human scenario where
    WordNet would be instrumental

26
Tutorial
  • What similarity measure would you use if you had
    only the following information?
  • Path linkages between words in an ontology
  • Information Content of the Words
  • Gloss of the Words
  • An ontology with direction

27
References
  • Overview: Pedersen, T., Patwardhan, S., and
    Michelizzi, J. 2004. WordNet::Similarity -
    Measuring the relatedness of concepts. In
    Proceedings of the Fifth Annual Meeting of the
    North American Chapter of the Association for
    Computational Linguistics (NAACL-04), 38-41,
    Boston, May 2004.
  • Lch: Leacock, C., and Chodorow, M. 1998.
    Combining local context and WordNet similarity
    for word sense identification. In Fellbaum, C.,
    ed., WordNet: An electronic lexical database. MIT
    Press. 265-283.
  • Wup: Wu, Z., and Palmer, M. 1994. Verb semantics
    and lexical selection. In 32nd Annual Meeting of
    the Association for Computational Linguistics,
    133-138.
  • Res: Resnik, P. 1995. Using information content
    to evaluate semantic similarity in a taxonomy. In
    Proceedings of the 14th International Joint
    Conference on Artificial Intelligence, 448-453.
  • Lin: Lin, D. 1998. An information-theoretic
    definition of similarity. In Proceedings of the
    International Conference on Machine Learning.
  • Jcn: Jiang, J., and Conrath, D. 1997. Semantic
    similarity based on corpus statistics and lexical
    taxonomy. In Proceedings of the International
    Conference on Research in Computational
    Linguistics, 19-33.
  • Hso: Hirst, G., and St-Onge, D. 1998. Lexical
    chains as representations of context for the
    detection and correction of malapropisms. In
    Fellbaum, C., ed., WordNet: An electronic lexical
    database. MIT Press. 305-332.
  • Lesk: Banerjee, S., and Pedersen, T. 2003.
    Extended gloss overlaps as a measure of semantic
    relatedness. In Proceedings of the Eighteenth
    International Joint Conference on Artificial
    Intelligence, 805-810.
  • Vector: Patwardhan, S. 2003. Incorporating
    dictionary and corpus information into a context
    vector measure of semantic relatedness. Master's
    thesis, Univ. of Minnesota, Duluth.
  • Links available at
    http://www.comp.nus.edu.sg/anubhavm/reading.htm

28
Thank You
  • Anubhav Madan
  • anubhavm@comp.nus.edu.sg