New Developments in WordNet

1
New Developments in WordNet: An Electronic Lexical Resource
Helen Langone
Columbia University
Princeton Cognitive Science Laboratory
2
Classic WordNet
  • A lexical semantic network relating word forms
    and lexicalized concepts (i.e., concepts that
    speakers have adopted word forms to express)
  • Main relations: hyponymy/troponymy
    (kind-of/way-to), meronymy (part-whole),
    synonymy, antonymy
  • Predominantly hierarchical, few relations across
    grammatical class; glosses and example sentences
    do not participate in the network
  • Nouns organized under 9 unique beginners
  • Command-line interface and C library
  • Prehistoric (but greppable!) db format
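
To make the "greppable" point concrete, here is a minimal sketch of reading one record from a classic plain-text data file. The field layout (offset, lexicographer file number, part of speech, hexadecimal word count, word/lex_id pairs, decimal pointer count, four-field pointers, then the gloss after a bar) follows the WordNet database format documentation as best recalled, so treat it as an assumption. The sample record, its offsets, and the function name parse_data_line are made up for illustration.

```python
def parse_data_line(line):
    """Parse one synset record laid out like a line of a data.* file."""
    record, _, gloss = line.partition(" | ")
    fields = record.split()
    offset, lex_filenum, ss_type = fields[0], fields[1], fields[2]
    word_count = int(fields[3], 16)      # word count is written in hex
    pos = 4
    words = []
    for _ in range(word_count):
        words.append(fields[pos])        # fields[pos + 1] is the lex_id
        pos += 2
    pointer_count = int(fields[pos])     # pointer count is decimal
    pos += 1
    pointers = []
    for _ in range(pointer_count):
        symbol, target_offset, target_pos, _source_target = fields[pos:pos + 4]
        pointers.append((symbol, target_offset, target_pos))
        pos += 4
    return {
        "offset": offset,
        "pos": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# Made-up record: the offsets are NOT real WordNet offsets.
sample = ("00000001 03 n 02 playwright 0 dramatist 0 "
          "001 @ 00000002 n 0000 | someone who writes plays")
parsed = parse_data_line(sample)
```

Because every record is a single line of space-separated fields, a plain grep for a word form already finds the synsets that contain it; the parser only adds structure on top of that.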

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Unique beginner synsets
7
(No Transcript)
8
What's ahead/new...
  • Instance-of pointers
  • Morphosemantic links
  • Ontological reorg
  • Glosses disambiguated, parsed and translated into
    logical forms
  • SQL database access and Perl library
  • Alternative forms, inflected forms

9
Instance-of pointer
  • In classic WordNet, the hyponymy relation made no
    distinction between subsumption (type or kind of
    x) and instantiation (is an instance of an x)
  • e.g. George Washington is an Instance-of
    United_States_President, not a Kind-of
  • e.g. a playwright is a Kind-of writer, author,
    while Shakespeare is an Instance-of the type
    dramatist, playwright
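
The distinction can be sketched with two separate edge tables, one for kind-of and one for instance-of. This is an illustrative toy, not WordNet's storage format; the dictionary names and the helper functions are assumptions, and only the examples from the slide are used as data.

```python
# Kind-of (subsumption) edges between classes.
KIND_OF = {
    "playwright": ["writer"],
}

# Instance-of (instantiation) edges from individuals to classes.
INSTANCE_OF = {
    "Shakespeare": ["playwright"],
    "George_Washington": ["United_States_President"],
}


def is_instance(name):
    """True if the entry is an individual rather than a class."""
    return name in INSTANCE_OF


def ancestors(name):
    """Classes reachable by one instance-of step, then kind-of steps."""
    result = []
    frontier = INSTANCE_OF.get(name, []) + KIND_OF.get(name, [])
    while frontier:
        node = frontier.pop(0)
        if node not in result:
            result.append(node)
            frontier.extend(KIND_OF.get(node, []))
    return result
```

With the two tables kept apart, a query such as "list all kinds of writer" can skip individuals like Shakespeare, which a single undifferentiated hyponymy pointer could not do.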

10
Morphosemantic links
  • Links between word forms motivated by both form
    and meaning.
  • verb-noun (write/writer), noun-adj
    (duplicity/duplicitous), verb-adj
    (talk/talkative), noun-noun (friend/friendship)
  • Not the sense of morphosemantics used in
    morphology for form-meaning correspondences.
  • For polysemous forms, relates only sense-specific
    derivationally-related forms e.g.
    digest/digestion. 2 distinct senses of the verb
    digest (physiological, psychological) correspond
    with 2 distinct senses of the noun digestion.
  • How is this different from Porter and the like?
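
A minimal sketch of what "sense-specific" means for the digest/digestion example: links are keyed by sense, not by bare word form. The sense labels and the table and function names are hypothetical placeholders, not WordNet sense keys.

```python
# Each key and value is (word form, part of speech, sense label).
# The sense labels are hypothetical stand-ins for real sense identifiers.
MORPHOSEMANTIC_LINKS = {
    ("digest", "v", "physiological"): [("digestion", "n", "physiological")],
    ("digest", "v", "psychological"): [("digestion", "n", "psychological")],
}


def related_senses(sense):
    """Derivationally related senses of one specific sense."""
    return MORPHOSEMANTIC_LINKS.get(sense, [])


def related_by_form(form):
    """What a form-only approach sees: every sense lumped together."""
    lumped = []
    for (source_form, _pos, _label), targets in MORPHOSEMANTIC_LINKS.items():
        if source_form == form:
            lumped.extend(targets)
    return lumped
```

Looking up the physiological sense of digest returns only the physiological sense of digestion, while the form-only lookup returns both senses with no way to tell them apart.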

11
Porter stemmer
  • Pattern-matches on word endings
  • Misses many regular form-form correspondences
  • biography/biographer/biographical
  • song/sing/songster
  • deception/deceive
  • Conflates unrelated forms
  • amor- --> amorous, amorously, amorousness, amoral,
    amorally, amoralism, amorality...!
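
Both failure modes can be reproduced with a deliberately naive suffix stripper. This is NOT Porter's algorithm, only a simplified stand-in that shares its form-only character; the suffix list and the function name naive_stem are assumptions chosen for the slide's examples.

```python
# Suffixes are tried longest first; purely form-based, no meaning involved.
SUFFIXES = ["ousness", "ality", "alism", "ously", "ally", "ous", "al"]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


amor_family = ["amorous", "amorously", "amorousness",
               "amoral", "amorally", "amoralism", "amorality"]

# Unrelated meanings collapse onto one stem...
conflated = {naive_stem(w) for w in amor_family}

# ...while related forms with irregular endings stay apart.
missed = naive_stem("song") != naive_stem("sing")
```

Any ending-based matcher faces the same trade-off: loosening the rules conflates more unrelated forms, and tightening them misses more related ones.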

12
Derivationally-related forms of myth
mythological
mythologic
mythology
mythicise
mythologize
mythologization
myth
mythologise
mythicize
mythologisation
mythic
mythical
mythologist
13
Porter-stemmed forms of myth
Stems: myth-, mythic-, mythicis-, mytholog-, mythologis-, mythologist-
Forms: mythological, mythologic, mythology, mythicise, mythologize,
mythologization, myth, mythologise, mythicize, mythologisation, mythic,
mythical, mythologist
14
(No Transcript)
15
Why should we care about the semantic part?
  • Catvar (Habash and Dorr, 2003) clusters over
    100,000 word forms into 63,000 clusters based on
    morphological relatedness.
  • Viegas et al.'s (1996) lexical rules
    automatically derive related words from a shared
    stem.
  • Neither considers polysemy: like Porter, they lump
    together all forms having the same stem.
  • Relating forms that derive from different senses
    will affect language understanding.

16
(No Transcript)
17
(No Transcript)
18
Why should we care about the semantic part?
19
Why should we care about the semantic part?
20
Reorg of the top levels
  • All noun hierarchies are now subsumed under a
    single synset, entity.
  • The reorganization brings WordNet more in
    alignment with ontologies that would map into it
    (e.g., SUMO)
  • The noun file is now structured more like a
    general-purpose ontology, but be aware that
    multiple inheritance exists.
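
The practical consequence of multiple inheritance is that a synset can have more than one path up to entity, so code that walks the hierarchy should expect a list of paths rather than a single chain. The sketch below uses a toy hierarchy; every node other than entity is a hypothetical placeholder, not a claim about WordNet's actual top levels.

```python
# Toy hypernym table: child -> list of parents. Only "entity" as the single
# root comes from the slide; the other nodes are made up for illustration.
HYPERNYMS = {
    "entity": [],
    "physical_entity": ["entity"],
    "abstraction": ["entity"],
    "object": ["physical_entity"],
    "communication": ["abstraction"],
    "book": ["object", "communication"],   # two parents
}


def paths_to_root(node):
    """Return every hypernym path from node up to the root."""
    parents = HYPERNYMS[node]
    if not parents:
        return [[node]]
    return [[node] + path
            for parent in parents
            for path in paths_to_root(parent)]


book_paths = paths_to_root("book")
```

Every path ends at entity, but a tree-shaped assumption (one parent per node) would silently drop one of the two paths for book.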

21
(No Transcript)
22
Disambiguation and parsing of the glosses
  • Classic WordNet: only synsets and entry word
    forms participate in the network of relations
  • Sense-tagging is a process of disambiguation: a
    word form is linked to its context-appropriate
    sense (e.g., run a company vs. run a race)
  • Sense-tagging will do the equivalent of
    hyper-linking every open-class word in the
    glosses to every other semantically-related
    word/concept in WordNet
  • Will add an additional 800,000 direct links to
    the 120,000 bidirectional links currently in
    place; the number of words/synsets indirectly
    reachable will be far greater.
  • Disambiguated glosses will be parsed and
    translated into first order predicate logic (by
    Jerry Hobbs at USC/ISI)
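
Why a modest number of direct gloss links enlarges the indirectly reachable set so much can be sketched with a breadth-first search over a toy graph. The node names and both edge tables are made-up illustrations built around the run a company / run a race example, not real WordNet data.

```python
from collections import deque


def reachable(start, edges):
    """All nodes reachable from start by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


# Hypothetical relation pointers already present in the network.
RELATION_LINKS = {"run#company": ["operate"], "run#race": ["compete"]}

# Hypothetical links added by sense-tagging words inside the glosses.
GLOSS_LINKS = {
    "run#company": ["company"],
    "company": ["business"],
    "run#race": ["race"],
}

combined = {
    node: RELATION_LINKS.get(node, []) + GLOSS_LINKS.get(node, [])
    for node in set(RELATION_LINKS) | set(GLOSS_LINKS)
}

before = reachable("run#company", RELATION_LINKS)
after = reachable("run#company", combined)
```

Each new direct link also opens up everything reachable from its target, which is why the indirect gain outpaces the count of links added.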

23
WNDEV
  • Current database format poses some limitations
  • Restricts the number and kinds of relations and
    searches possible
  • Separate files for each part of speech mean that
    changes affecting more than one file require
    coordination among lexicographers
  • Unstructured format means that formatting errors
    may not be caught
  • Byte offset access means maintaining separate
    editable and compiled versions of the data, and a
    lengthy grind-and-fix-parse-errors process
  • Adding to the lexicon must be done manually using
    an arcane syntax; no automatic means exists for
    loading entries
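
The byte-offset problem can be illustrated in a few lines: if records are located by their byte position in a file, editing one record shifts the position of every record after it, so the index has to be rebuilt. The record contents and the function name index_offsets are made up for illustration.

```python
def index_offsets(data):
    """Map each record's first field to the byte offset where its line starts."""
    offsets = {}
    position = 0
    for line in data.splitlines(keepends=True):
        key = line.split(b" ", 1)[0].decode("ascii")
        offsets[key] = position
        position += len(line)
    return offsets


original = b"alpha first record\nbeta second record\ngamma third record\n"

# Lengthen the first record, as a lexicographer's edit might.
edited = original.replace(b"first record", b"first record, now longer")

offsets_before = index_offsets(original)
offsets_after = index_offsets(edited)

# Records whose offsets no longer match the old index.
stale = [key for key in offsets_before
         if offsets_before[key] != offsets_after[key]]
```

When other records refer to those positions, as pointers between synsets would, a one-line edit invalidates references throughout the file, which is one reading of why editable and compiled versions have to be kept separately.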

24
(No Transcript)
25
WNDEV
  • WordNet development environment: a suite of tools
    for developing, editing, and using WordNet
  • SQL database and Perl library
  • Graphical interface for editing synsets, linking,
    creating new relations, etc.
  • Tools for automatic loading of externally-created
    entry sets, database patches, and format
    conversions.
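
A relational layout addresses several of the limitations listed earlier: one relation table can hold links across parts of speech, and new relation kinds are just new values in a column. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The schema, table names, glosses, and the related_words helper are all hypothetical; the slides do not describe WNDEV's actual schema, and its library is in Perl.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synset (
        id    INTEGER PRIMARY KEY,
        pos   TEXT NOT NULL,
        gloss TEXT
    );
    CREATE TABLE sense (
        synset_id INTEGER NOT NULL REFERENCES synset(id),
        word      TEXT NOT NULL
    );
    CREATE TABLE relation (
        source_id INTEGER NOT NULL REFERENCES synset(id),
        target_id INTEGER NOT NULL REFERENCES synset(id),
        kind      TEXT NOT NULL
    );
""")

# Made-up sample rows: a verb synset and a noun synset in the same tables.
conn.executemany("INSERT INTO synset VALUES (?, ?, ?)",
                 [(1, "v", "produce a literary work"),
                  (2, "n", "a person who writes")])
conn.executemany("INSERT INTO sense VALUES (?, ?)",
                 [(1, "write"), (2, "writer")])
conn.execute("INSERT INTO relation VALUES (?, ?, ?)",
             (1, 2, "morphosemantic"))


def related_words(word, kind):
    """Words in synsets linked from any synset containing `word`."""
    rows = conn.execute(
        """
        SELECT t.word
        FROM sense AS s
        JOIN relation AS r ON r.source_id = s.synset_id
        JOIN sense AS t ON t.synset_id = r.target_id
        WHERE s.word = ? AND r.kind = ?
        ORDER BY t.word
        """,
        (word, kind),
    ).fetchall()
    return [row[0] for row in rows]
```

Here the verb-to-noun link lives in the same table as any other relation, so a cross-part-of-speech change touches one place instead of several files, and the column constraints reject malformed rows at insert time.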