Slides: 18
Provided by: facultyWa9

Title: WordNet


1
WordNet
  • WordNet, WSD

2
WordNet
  • What is WordNet?
  • Miller (1995): "WordNet is an online lexical
    database designed for use under program control.
    English nouns, verbs, adjectives, and adverbs are
    organized into sets of synonyms, each
    representing a lexicalized concept. Semantic
    relations link the synonym sets."

3
WordNet
  • Go to the main WordNet site
  • http://wordnet.princeton.edu/
  • Open the WordNet folder on Pongo
  • /dropbox/570/wordnet/dict

4
WordNet Vocabulary
  • See the glossary at http://wordnet.princeton.edu/gloss
  • synset: A synonym set; a set of words that are
    interchangeable in some context
  • lemma: lower case ASCII text of a word as found in
    the WordNet database index files
  • lexical pointer: A lexical pointer indicates a
    relation between words in synsets
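Because lemmas are stored in a normalized form, a lookup has to normalize the query word first. A minimal sketch, assuming the wndb convention that index-file lemmas are lower case and that the words of a multi-word collocation are joined with underscores; the function name `to_lemma` is hypothetical.

```python
def to_lemma(word: str) -> str:
    """Map a surface word or phrase to the index-file lemma form (sketch).

    Assumes the wndb conventions: lemmas are lower case, and the words of a
    collocation are joined with underscores instead of spaces.
    """
    # split() with no argument also collapses runs of whitespace
    return "_".join(word.strip().lower().split())


print(to_lemma("Performance"))     # performance
print(to_lemma("  Ice  Cream "))   # ice_cream
```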

5
Navigating WordNet files
  • data.* files: the actual network files (synsets),
    one per part of speech (data.noun, data.verb,
    data.adj, data.adv)
  • index.* files: contain lower case instances of
    all words in WordNet, with pointers to the synset
    entries in the network
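Navigation goes from an index file to a data file. The sketch below assumes the wndb layout: each index.<pos> line begins with the lemma, its last synset_cnt fields are synset offsets, and each offset is the byte position of that synset's line in the matching data.<pos> file, so the data file can be read with a seek. The function name `lookup` and the `dict_dir` parameter are hypothetical; `dict_dir` would be something like the dict folder on Pongo.

```python
import os


def lookup(dict_dir: str, lemma: str, pos: str = "noun") -> list[str]:
    """Return the raw data-file lines for every synset containing `lemma` (sketch)."""
    offsets = []
    with open(os.path.join(dict_dir, f"index.{pos}"), encoding="utf-8") as index:
        for line in index:
            if line.startswith("  "):       # license/header lines start with two spaces
                continue
            fields = line.split()
            if fields and fields[0] == lemma:
                synset_cnt = int(fields[2])  # number of synsets the lemma is in
                offsets = [int(off) for off in fields[-synset_cnt:]]
                break
    results = []
    # Binary mode, because synset offsets are byte offsets into the data file.
    with open(os.path.join(dict_dir, f"data.{pos}"), "rb") as data:
        for off in offsets:
            data.seek(off)
            results.append(data.readline().decode("utf-8").rstrip("\n"))
    return results
```

A linear scan of the index file is the simplest choice; the index files are sorted, so a binary search would also work for large lookups.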

6
WordNet data file
  • 00045430 04 n 01 performance 3 003 @ 00033580 n
    0000 ~ 00045680 n 0000 ~ 00045874 n 0000 | any
    recognized accomplishment; "they admired his
    performance under stress"
  • 00045680 04 n 01 overachievement 0 003 @ 00045430
    n 0000 + 02537922 v 0101 ! 00045874 n 0101 |
    better than expected performance (better than
    might have been predicted from intelligence
    tests)
  • Fields, left to right: synset file offset, file
    number, synset type (POS), number of words in
    synset, each word followed by its lex_id, number
    of pointers, the pointers to other synsets (type
    of pointer, synset offset, POS, source/target),
    and finally the gloss
See wndb
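The field layout above can be parsed left to right. A minimal sketch, assuming the wndb conventions that w_cnt is a two-digit hexadecimal number, lex_id is a one-digit hexadecimal number, p_cnt is a three-digit decimal number, each pointer has four fields, and the gloss follows " | ". Verb frames (data.verb only) and adjective markers are not handled; the function name is hypothetical. The sample is the performance entry, with its hyponym pointers written using the `~` symbol.

```python
def parse_data_line(line: str) -> dict:
    """Split one data.<pos> line into its wndb fields (sketch)."""
    body, _, gloss = line.partition(" | ")  # gloss comes after the first " | "
    f = body.split()
    offset, lex_filenum, ss_type = f[0], int(f[1]), f[2]
    w_cnt = int(f[3], 16)                   # word count is hexadecimal
    i = 4
    words = []
    for _ in range(w_cnt):
        words.append((f[i], int(f[i + 1], 16)))  # (word, lex_id)
        i += 2
    p_cnt = int(f[i])                       # pointer count is decimal
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, pos, src_tgt = f[i:i + 4]
        pointers.append((symbol, target, pos, src_tgt))
        i += 4
    return {"offset": offset, "lex_filenum": lex_filenum, "ss_type": ss_type,
            "words": words, "pointers": pointers, "gloss": gloss.strip()}


line = ('00045430 04 n 01 performance 3 003 @ 00033580 n 0000 '
        '~ 00045680 n 0000 ~ 00045874 n 0000 | any recognized accomplishment; '
        '"they admired his performance under stress"')
rec = parse_data_line(line)
print(rec["words"])        # [('performance', 3)]
print(rec["pointers"][0])  # ('@', '00033580', 'n', '0000')
```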
7
Pointer symbols
  • For nouns
    !     Antonym
    @     Hypernym
    ~     Hyponym
    #m    Member holonym
    #s    Substance holonym
    #p    Part holonym
    %m    Member meronym
    %s    Substance meronym
    %p    Part meronym
    =     Attribute
    +     Derivationally related form

See wninput
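The pointer symbol says what kind of relation a pointer encodes, and the source/target field says whether it links whole synsets or individual words. A minimal sketch, assuming the noun symbols documented in wninput(5WN) (only those listed on this slide) and the wndb rule that source/target "0000" marks a semantic pointer, while any other value gives the source and target word numbers as two hexadecimal digits each. `NOUN_POINTERS` and `describe_pointer` are hypothetical names.

```python
# Noun pointer symbols, limited to the ones listed on the slide.
NOUN_POINTERS = {
    "!": "Antonym", "@": "Hypernym", "~": "Hyponym",
    "#m": "Member holonym", "#s": "Substance holonym", "#p": "Part holonym",
    "%m": "Member meronym", "%s": "Substance meronym", "%p": "Part meronym",
    "=": "Attribute", "+": "Derivationally related form",
}


def describe_pointer(symbol: str, source_target: str) -> str:
    """Name a pointer and say whether it is semantic or lexical (sketch)."""
    name = NOUN_POINTERS.get(symbol, "unknown")
    if source_target == "0000":
        return f"{name} (semantic)"         # relation between whole synsets
    src = int(source_target[:2], 16)        # word number in the source synset
    tgt = int(source_target[2:], 16)        # word number in the target synset
    return f"{name} (lexical: word {src} -> word {tgt})"


print(describe_pointer("@", "0000"))   # Hypernym (semantic)
print(describe_pointer("!", "0101"))   # Antonym (lexical: word 1 -> word 1)
```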
8
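WordNet index file
  • abomination n 3 2 @ 3 0 09613960 07401317
    00734041
  • Fields, left to right: lemma (word), POS, number
    of synsets, number of pointer types, the pointer
    symbols, sense count, tagged sense count, and the
    synset file offsets of the lemma's synsets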
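Because the number of pointer symbols varies, an index line has to be parsed using its own counts. A minimal sketch, assuming the wndb layout `lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt synset_offset...`; the function name and both sample lines (widget, gadget) are made up for illustration.

```python
def parse_index_line(line: str) -> dict:
    """Split one index.<pos> line into its wndb fields (sketch)."""
    f = line.split()
    lemma, pos = f[0], f[1]
    synset_cnt, p_cnt = int(f[2]), int(f[3])
    symbols = f[4:4 + p_cnt]          # exactly p_cnt pointer symbols follow
    i = 4 + p_cnt
    sense_cnt, tagsense_cnt = int(f[i]), int(f[i + 1])
    offsets = f[i + 2:i + 2 + synset_cnt]
    return {"lemma": lemma, "pos": pos, "pointer_symbols": symbols,
            "sense_cnt": sense_cnt, "tagsense_cnt": tagsense_cnt,
            "offsets": offsets}


# Hypothetical index lines, not taken from the WordNet database.
print(parse_index_line("widget n 2 2 @ ~ 2 1 00000100 00000200"))
print(parse_index_line("gadget n 1 0 1 0 00000300")["offsets"])  # ['00000300']
```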
9
WordNet tools
  • Many, many tools
  • General documentation
  • http://wordnet.princeton.edu/doc
  • Online query and lookup
  • http://wordnet.princeton.edu/perl/webwn
  • APIs and tools: http://wordnet.princeton.edu/links
  • WordNet::Similarity
  • http://wn-similarity.sourceforge.net/
  • WordNet::Similarity web interface
  • http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
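Many of the measures in WordNet::Similarity are computed over the is-a hierarchy. A minimal sketch in the spirit of the simple path measure, assuming similarity is 1 / (1 + number of edges on the shortest path), which equals 1 divided by the number of nodes on that path. The hierarchy and the dotted sense names below are hypothetical toy data, not WordNet itself.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parents); not real WordNet data.
HYPERNYMS = {
    "dog.n.01": ["canine.n.01"],
    "wolf.n.01": ["canine.n.01"],
    "canine.n.01": ["carnivore.n.01"],
    "cat.n.01": ["feline.n.01"],
    "feline.n.01": ["carnivore.n.01"],
    "carnivore.n.01": [],
}


def neighbors(node):
    # Treat hypernym links as undirected edges: parents plus children.
    ups = HYPERNYMS.get(node, [])
    downs = [child for child, parents in HYPERNYMS.items() if node in parents]
    return ups + downs


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + shortest edge count between a and b); 0.0 if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:                            # breadth-first search
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (dist + 1)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


print(path_similarity("dog.n.01", "wolf.n.01"))  # 2 edges, 1/3
print(path_similarity("dog.n.01", "cat.n.01"))   # 4 edges, 0.2
```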

10
WordNet and WSD
  • Mihalcea 2002 describes a system for sense-tagging
    text using WordNet (and related tools and
    resources)

11
Mihalcea 2002
  • Some tools and resources described
  • Senseval
  • http//www.senseval.org/
  • Evaluation exercises for Word Sense
    Disambiguation
  • Senseval-1 to Senseval-3, held over the last
    several years as workshops at ACL
  • Senseval-4 coming up
  • Data and materials from Senseval-3 can be
    downloaded
  • Some useful materials for multiple languages
  • Materials and test data for English, Italian,
    Basque, Catalan, Chinese, Romanian, and Spanish

12
Mihalcea 2002
  • Some tools and resources described
  • SemCor
  • Sense-tagged Brown corpus
  • Created at Princeton
  • Used for training WSD systems
  • Can be downloaded from Mihalcea's web site
  • http://www.cs.unt.edu/~rada/downloads.html
  • We're also planning on installing it on Pongo
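The simplest way to "train" on a sense-tagged corpus such as SemCor is a most-frequent-sense baseline: count the senses each lemma is tagged with and always predict the most common one. This is only a common baseline, not the system Mihalcea describes; the function name, the (lemma, sense) input format, and the sense labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_tokens):
    """Most-frequent-sense baseline from (lemma, sense) pairs (sketch)."""
    counts = defaultdict(Counter)
    for lemma, sense in tagged_tokens:
        counts[lemma][sense] += 1
    # most_common(1) returns [(sense, count)] for the top sense
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


# Hypothetical sense-tagged tokens; the sense labels are made up.
corpus = [("bank", "bank%1"), ("bank", "bank%2"), ("bank", "bank%1"),
          ("plant", "plant%2")]
model = train_mfs(corpus)
print(model["bank"])   # bank%1
```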

13
McCarthy et al. 2004
  • Task: find the predominant word senses in
    untagged text
  • Unlike Mihalcea 2002, did not rely on a supervised
    method using SemCor
  • Built a thesaurus from raw text and combined it
    with WordNet
  • Intuition: a word's predominant sense is affected
    by genre, domain, or text type, so it is better
    determined from context in an untagged corpus
  • Rather than relying on SemCor's 250,000 words,
    where the word senses are rather limited

14
McCarthy et al.
  • Thesaurus development relies on grammatical
    dependency relations between words in the corpus
  • Look at distributional similarities between a
    word and its neighbors
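The neighbors and their distributional similarity scores can then be turned into a ranking of the target word's senses. A minimal sketch, assuming the prevalence score described in McCarthy et al. (2004): each neighbor n votes for each sense s of the target word in proportion to dss(w, n) times the WordNet similarity between s and n, normalized over all senses of w, where the sense-to-neighbor similarity is the maximum over the neighbor's senses. The word star, its sense labels, its neighbors, and every score below are hypothetical.

```python
# Hypothetical toy inputs: thesaurus neighbors of "star" with made-up
# distributional similarity scores, and made-up WordNet similarities.
NEIGHBORS = [("planet", 0.30), ("actor", 0.20), ("galaxy", 0.25)]
SENSES = ["star/celestial", "star/celebrity"]
WN_SIM = {
    ("star/celestial", "planet"): 0.8, ("star/celebrity", "planet"): 0.2,
    ("star/celestial", "actor"): 0.1, ("star/celebrity", "actor"): 0.9,
    ("star/celestial", "galaxy"): 0.6, ("star/celebrity", "galaxy"): 0.2,
}


def prevalence_scores(neighbors, senses, sense_sim):
    """Score each sense of a target word from its thesaurus neighbors (sketch).

    neighbors: list of (neighbor, dss) pairs
    sense_sim(sense, neighbor): assumed to already take the max over the
        neighbor's own senses
    """
    scores = {s: 0.0 for s in senses}
    for nb, dss in neighbors:
        sims = {s: sense_sim(s, nb) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue                  # neighbor unrelated to every sense
        for s in senses:
            scores[s] += dss * sims[s] / total
    return scores


scores = prevalence_scores(NEIGHBORS, SENSES, lambda s, n: WN_SIM[(s, n)])
print(max(scores, key=scores.get))   # star/celestial
```

With these toy numbers, star/celestial scores 0.4475 and star/celebrity 0.3025; because each neighbor's vote is normalized, the scores sum to the total distributional similarity (0.75).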

15
McCarthy et al.
  • Experimented with several similarity measures
    available in WordNet::Similarity
  • First experiment used SemCor to see how well the
    unsupervised system worked
  • 2595 polysemous nouns in SemCor

16
McCarthy et al.
  • Experiment 2: evaluated against the SENSEVAL-2
    English all-words data
  • Compared precision and recall using SemCor vs.
    using their automatically acquired data (and the
    SENSEVAL ceiling)
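In Senseval-style scoring, precision is computed over the items a system attempted and recall over all items in the gold standard, so a system that abstains can have high precision but lower recall. A simplified sketch, assuming exactly one gold sense and at most one predicted sense per item (the official scorer also supports weighted multiple answers); the function name and the toy data are hypothetical.

```python
def precision_recall(predictions, gold):
    """Precision over attempted items, recall over all gold items (sketch).

    predictions: item id -> predicted sense, or None if the system abstained
    gold: item id -> correct sense
    """
    attempted = {i: s for i, s in predictions.items() if s is not None}
    correct = sum(1 for i, s in attempted.items() if gold.get(i) == s)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {1: "a", 2: "b", 3: "a", 4: "c"}
preds = {1: "a", 2: "a", 3: "a", 4: None}   # item 4 is not attempted
print(precision_recall(preds, gold))        # 2 of 3 attempted, 2 of 4 total
```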

17
McCarthy et al.
  • Some experiments with domain-specific corpora
    gave these results