Enhancing Wordnet with Morphological Knowledge

1
Enhancing Wordnet with Morphological Knowledge
  • May 28th 2007
  • Emilia Apostolova
  • School of Computer Science, Telecommunication and
    Information Systems
  • DePaul University
  • emilia.aposto_at_gmail.com

2
Enhancing Wordnet with Morphological Knowledge
  • Background
  • Wordnet
  • The Princeton Wordnet (PWN)
  • Limitations of PWN
  • Problem definition, Motivation and Hypothesis
  • Design: Data and Data Analysis
  • Deliverables and Schedule

3
Background - WordNet
  • Wordnet
  • A semantic lexicon: a machine-readable lexical
    database organized by meanings
  • The Princeton Wordnet (PWN)
  • Started in 1985 at the Cognitive Science
    Laboratory of Princeton University.
  • As of 2006, the database contains about 150,000
    words, a total of 207,000 word-sense pairs.
  • Groups words into synsets, provides short,
    general definitions, and records the various
    semantic relations between these synonym sets
    (antonyms, broader term, narrower term, used for,
    etc.).
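
The synset organization described above can be pictured with a small data structure. This is only a sketch: the `Synset` class, the `lookup` helper, and the two sample entries are hypothetical illustrations, not the PWN database format or its API.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of a synset; not the real PWN storage format.
@dataclass(eq=False)
class Synset:
    words: list                                     # synonyms sharing one meaning
    gloss: str                                      # short, general definition
    relations: dict = field(default_factory=dict)   # relation name -> list of Synset

# Two illustrative entries with simplified glosses.
vehicle = Synset(["motor vehicle", "automotive vehicle"],
                 "a self-propelled wheeled vehicle")
car = Synset(["car", "auto", "automobile"],
             "a motor vehicle with four wheels")

# Semantic relations are recorded between synsets, not between single words.
car.relations["broader term"] = [vehicle]
vehicle.relations["narrower term"] = [car]

def lookup(word, synsets):
    """Return every synset that lists the given word."""
    return [s for s in synsets if word in s.words]

if __name__ == "__main__":
    for synset in lookup("auto", [vehicle, car]):
        broader = [b.words[0] for b in synset.relations.get("broader term", [])]
        print(synset.words, "|", synset.gloss, "| broader:", broader)
```

Because lookup goes from a word to a meaning, any word form that is not listed in some synset simply returns nothing, which is the gap the rest of the presentation addresses.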

4
Background PWN

5
Background PWN
  • Motivation
  • Traditional dictionaries organized on
    historical principles.
  • Advances in psycholinguistics and NLP.
  • Purpose
  • To produce a combination of dictionary and
    thesaurus that is more intuitively usable.
  • To support automatic text analysis and
    artificial intelligence applications.

6
Background WordNets
  • EuroWordNet - Dutch, Italian, Spanish, German,
    French, Czech and Estonian
  • The Global WordNet Association: wordnets being
    developed for 52 languages.

7
Background Limitations of PWN
  • No morphological knowledge.
  • Morpheme
  • The smallest meaningful unit of a language.
  • joy-less
  • re-create
  • happi-ness
  • comput-er-ize
  • tree-s

8
Background Limitations of PWN (Cont'd)
  • Originally no plans to include any morphological
    knowledge.
  • Inflectional morphological parser interface
    built.
  • Miller, Fellbaum, and Miller suggest that it
    became obvious that programs dealing with
    derivational morphology would greatly enhance the
    value of WordNet, but that more ambitious project
    hasn't been undertaken yet.
  • English has a limited inflectional system, but
    a very complex and productive derivational
    morphology.
  • compute (root) - computer, computerize,
    computerization, recomputerize, noncomputerized,
    computerless, etc.
  • Impossible to list exhaustively in a lexicon.

9
Problem Definition
  • Give Wordnet the ability to recognize the
    derived forms of a word (including coined terms
    or inventive uses of language) that might occur
    in natural text by proposing an integrated
    morpheme dictionary and a derivational parser
    implementation. The expectation is that
    morphological parsing and morphemic semantic
    knowledge will significantly increase the
    semantic meaning derived from texts using the
    PWN system.

10
Problem Definition (Cont'd)
  • Integrated morpheme dictionary
  • Bound and free morphemes: computer-less,
    happi-ness, re-create
  • Derivational parser interface
  • A parser that can break down an English word
    into its morphemes
  • re-computer-ize

11
Motivation
  • English has a rich derivational morphology. PWN
    is used in numerous research projects (more than
    420 listed at the official Wordnet bibliography
    site). PWN lacks morphological knowledge.
  • An increasing number of research projects aim at
    adapting WordNet to specialized scientific
    domains to aid NLP tasks in the corresponding
    fields. The relative frequency of
    neologisms (coined words) in scientific texts
    appears to be higher than in general public
    texts.
  • WordNet is the structural basis for wordnets in
    different languages, most of which have much
    richer morphology than the English language.

12
Hypothesis
  • Morphological parsing and morphemic semantic
    knowledge will significantly increase the
    semantic meaning derived from natural texts using
    the PWN system.

13
Design
  • Data
  • Two different types of English language corpora
    will be used to evaluate the system - general
    public news texts and specialized computer
    science research papers.
  • Wordnet will be used to derive the meaning of
    words in the sample texts. The number of misses
    (words for which WordNet returns no meaning) for
    each type of text, with and without derivational
    logic, will be recorded.
  • Independent variable
  • Presence of morphological semantic knowledge and
    parser.
  • Dependent variable
  • Number of misses (words without definition) in
    the sample texts.

14
Deliverables and Schedule
  • Integrated Morpheme Dictionary (4 weeks)
  • - compile a bound morpheme dictionary in a
    suitable format
  • - extend the WordNet schema and integrate the
    dictionary into the existing database
  • Integrated derivational parser interface (4
    weeks)
  • - attempt to modify and integrate, if
    possible, an existing English language
    derivational parser, for example PC-KIMMO
  • Find and prepare English language corpora text
    samples (1 week)

15
Deliverables and Schedule (Cont'd)
  • Prepare and run a script to extract word
    definitions for the words in the sample text.
    Provide a list of newly found words and their
    meanings. (1 week)
  • Manually inspect the new meanings found through
    morphological knowledge and mark as hits only
    ones whose meaning is, at least partially, a sum
    of the meaning of their morphemes. (1 week)
  • Analyze the results and prepare a final report.
    (2 weeks)

16
??