Minimally Supervised Morphological Analysis by Multimodal Alignment

1
Minimally Supervised Morphological Analysis by
Multimodal Alignment
  • David Yarowsky
  • and
  • Richard Wicentowski

2
Introduction
  • The algorithm is capable of inducing inflectional
    morphological analyses of both regular and highly
    irregular forms.
  • The algorithm combines four original alignment
    models, based on:
  • Relative corpus frequency.
  • Contextual similarity.
  • Weighted string similarity.
  • Incrementally retrained inflectional transduction
    probabilities.

3
Lecture Subjects
  • Task definition.
  • Required and Optional resources.
  • The Algorithm.
  • Empirical Evaluation.

4
Task Definition
  • Consider this task as three steps:
  • Step 1: Estimate a probabilistic alignment between
    inflected forms and root forms.
  • Step 2: Train a supervised morphological analysis
    learner on a weighted subset of these aligned pairs.
  • Step 3: Use the result from step 2 to iteratively
    refine the alignment in step 1.
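
The three steps form a loop, which might be sketched as follows. This is a minimal illustration only: `estimate_alignment`, `train_analyzer`, `rounds` and `threshold` are hypothetical names, and selecting the "weighted subset" by a fixed confidence threshold is an assumption, not something the slides specify.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[str, str]          # (inflected form, root form)
Alignment = Dict[Pair, float]   # pair -> alignment probability


def iterate_alignment(
    estimate_alignment: Callable[[Optional[object]], Alignment],
    train_analyzer: Callable[[Alignment], object],
    rounds: int = 3,
    threshold: float = 0.5,
) -> Tuple[Alignment, object]:
    """Alternate between aligning (step 1) and training (step 2)."""
    analyzer = None
    alignment: Alignment = {}
    for _ in range(rounds):
        # Step 1: estimate the probabilistic alignment. From the second
        # round on, the trained analyzer is fed back in (step 3).
        alignment = estimate_alignment(analyzer)
        # Step 2: train on the more confident pairs only (assumed rule).
        confident = {p: w for p, w in alignment.items() if w >= threshold}
        analyzer = train_analyzer(confident)
    return alignment, analyzer
```

Both callables are supplied by the caller, so the loop itself stays independent of how alignment and training are actually done.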

5
Example (POS)
  • Definitions

6
Task Definition cont.
  • The target output of step 1

7
Required and Optional resources
  • For the given language we need:
  • A table of the inflectional parts of speech (POS).
  • A list of the canonical suffixes.
  • A large text corpus.

8
Required and Optional resources cont.
  • A list of the candidate noun, verb and adjective
    roots (from a dictionary), and any rough mechanism
    for identifying the candidate POS of the
    remaining vocabulary (not based on morphological
    analysis).
  • A list of the consonants and vowels.

9
Required and Optional resources cont.
  • A list of common function words (not essential).
  • Distance/similarity tables generated on
    previously studied languages (if available).

10
The Algorithm
  • Combines four original alignment models:
  • Alignment by Frequency Similarity.
  • Alignment by Context Similarity.
  • Alignment by Weighted Levenshtein Distance.
  • Alignment by Morphological Transformation
    Probabilities.

11
Lemma Alignment by Frequency Similarity
  • The motivating dilemma

12
Lemma Alignment by Frequency Similarity cont.
  • This table is based on relative corpus frequency.

13
Lemma Alignment by Frequency Similarity cont.

14
Lemma Alignment by Frequency Similarity cont.
  • A problem: the true alignments between
    inflections are unknown in advance.
  • A simplifying assumption: the frequency ratios
    between inflections and roots are not
    significantly different between regular and
    irregular morphological processes.
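
One way to make the simplifying assumption concrete is to fit the distribution of log frequency ratios on pairs that are easy to align, then score any candidate pair against it. The sketch below assumes a Gaussian over log ratios, which the slides do not state; all function names are hypothetical and the counts are invented for illustration, not corpus data.

```python
import math
from statistics import mean, stdev


def log_ratio(inflection_count, root_count):
    """Log of the inflection/root corpus frequency ratio."""
    return math.log(inflection_count / root_count)


def fit_ratio_model(count_pairs):
    """Mean and standard deviation of log ratios over trusted pairs."""
    ratios = [log_ratio(i, r) for i, r in count_pairs]
    return mean(ratios), stdev(ratios)


def ratio_score(inflection_count, root_count, mu, sigma):
    """Unnormalised Gaussian score: 1.0 at the mean, falling towards 0."""
    z = (log_ratio(inflection_count, root_count) - mu) / sigma
    return math.exp(-0.5 * z * z)


# Invented (inflection count, root count) pairs for regular verbs.
regular_pairs = [(120, 150), (40, 60), (900, 1000), (15, 30)]
mu, sigma = fit_ratio_model(regular_pairs)

plausible = ratio_score(80, 100, mu, sigma)     # ratio close to the typical one
implausible = ratio_score(2, 1000, mu, sigma)   # ratio far from the typical one
```

A candidate root whose frequency ratio to the inflection is far from the typical ratio receives a score near zero, even if the two strings look similar.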

15
Lemma Alignment by Frequency Similarity cont.
  • Similarity between regular and irregular forms

16
Lemma Alignment by Frequency Similarity cont.
  • The expected frequency should also be estimable
    from the frequency of any of the other
    inflectional variants.
  • VBD/VBG and VBD/VBZ could also be used as
    estimators.

17
Lemma Alignment by Frequency Similarity cont.

18
Lemma Alignment by Context Similarity
  • Based on the contextual similarity of the
    candidate forms.
  • Similarity is computed between vectors of
    weighted and filtered context features.
  • This clusters inflectional variants of verbs
    (e.g. sipped, sipping, and sip).
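
The idea can be illustrated with bag-of-words context vectors compared by cosine similarity. This is a simplified sketch: it uses raw counts rather than the weighted features mentioned above, filters only by a stop-word list, and the window size, function names and toy sentence are all assumptions.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=3, stopwords=frozenset()):
    """Count non-stopword tokens within `window` positions of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stopwords:
                vec[tokens[j]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


text = ("she sipped the hot tea while he will sip hot tea "
        "and they sold the old car").split()
stop = frozenset({"she", "the", "while", "he", "will", "and", "they"})

sipped = context_vector(text, "sipped", stopwords=stop)
sip = context_vector(text, "sip", stopwords=stop)
sold = context_vector(text, "sold", stopwords=stop)
```

In this toy text, `sipped` and `sip` share the context words `hot` and `tea`, so their similarity is higher than that of `sipped` and `sold`.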

19
Lemma Alignment by Context Similarity cont.
  • Example

20
Lemma Alignment by Weighted Levenshtein Distance
  • Consider overall stem edit distance.
  • A cost matrix of four distance costs,
    initially set to (0.5, 0.6, 1.0, 0.98).
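
A weighted edit distance of this kind might look like the sketch below. The slide gives the four initial values but not what each one applies to, so the mapping used here is purely an assumption: 0.5 for vowel/vowel substitution, 0.6 for inserting or deleting a vowel, 1.0 for consonant/consonant substitution and consonant insertion or deletion, and 0.98 for consonant/vowel substitution. The vowel set is also assumed.

```python
VOWELS = set("aeiou")                     # assumed vowel inventory
DEFAULT_COSTS = (0.5, 0.6, 1.0, 0.98)     # values from the slide


def sub_cost(a, b, costs=DEFAULT_COSTS):
    """Substitution cost by character class (assumed mapping)."""
    if a == b:
        return 0.0
    vowel_vowel, _, cons_cons, cons_vowel = costs
    a_is_vowel, b_is_vowel = a in VOWELS, b in VOWELS
    if a_is_vowel and b_is_vowel:
        return vowel_vowel
    if not a_is_vowel and not b_is_vowel:
        return cons_cons
    return cons_vowel


def indel_cost(ch, costs=DEFAULT_COSTS):
    """Insertion/deletion cost (assumed: cheaper for vowels)."""
    return costs[1] if ch in VOWELS else costs[2]


def weighted_levenshtein(source, target, costs=DEFAULT_COSTS):
    """Standard dynamic-programming edit distance with weighted operations."""
    prev = [0.0]
    for ch in target:
        prev.append(prev[-1] + indel_cost(ch, costs))
    for a in source:
        cur = [prev[0] + indel_cost(a, costs)]
        for j, b in enumerate(target, start=1):
            cur.append(min(
                prev[j] + indel_cost(a, costs),        # delete a
                cur[j - 1] + indel_cost(b, costs),     # insert b
                prev[j - 1] + sub_cost(a, b, costs),   # substitute or match
            ))
        prev = cur
    return prev[-1]
```

Under these assumed costs a vowel change such as sing/sang is cheaper than a consonant change such as sing/ring, which favours the stem-internal vowel shifts common in irregular forms.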

21
Lemma Alignment by Morphological Transformation
Probabilities
  • The goal is to generalize a mapping function via
    a generative probabilistic model.

22
Lemma Alignment by Morphological Transformation
Probabilities
  • Result table

23
Lemma Alignment by Morphological Transformation
Probabilities cont.
24
Lemma Alignment by Morphological Transformation
Probabilities cont.
  • Example

25
Lemma Alignment by Morphological Transformation
Probabilities cont.
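  • Example
\[
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, +\text{ed}, \text{VBD})
  &= P(y \to i \mid \text{solidify}, +\text{ed}, \text{VBD}) \\
  &\approx \lambda_1 P(y \to i \mid \text{ify}, +\text{ed}) \\
  &\quad + (1-\lambda_1)\bigl(\lambda_2 P(y \to i \mid \text{fy}, +\text{ed}) \\
  &\quad + (1-\lambda_2)\bigl(\lambda_3 P(y \to i \mid \text{y}, +\text{ed}) \\
  &\quad + (1-\lambda_3)\bigl(\lambda_4 P(y \to i \mid +\text{ed}) \\
  &\quad + (1-\lambda_4) P(y \to i)\bigr)\bigr)\bigr)
\end{aligned}
\]
  • POS can be deleted (the backoff terms drop the
    VBD condition).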
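
The nested interpolation in this example can be evaluated from the least specific estimate outwards. The sketch below assumes the probabilities live in a plain dictionary keyed by (change, context); the table layout, the function names and the toy values in the usage note are hypothetical.

```python
def interpolated_backoff(estimates, lambdas):
    """Combine estimates ordered from most to least specific.

    Expects len(estimates) == len(lambdas) + 1. Computes
    l1*p1 + (1-l1)*(l2*p2 + (1-l2)*(... + (1-ln)*p_last)).
    """
    result = estimates[-1]
    for lam, est in zip(reversed(lambdas), reversed(estimates[:-1])):
        result = lam * est + (1.0 - lam) * result
    return result


def stem_change_prob(change, root, suffix, table, lambdas):
    """Back off over root endings of length 3, 2, 1, then suffix only,
    then the unconditional probability of the stem change."""
    contexts = [(root[-k:], suffix) for k in (3, 2, 1)]
    contexts += [("", suffix), None]
    estimates = [table.get((change, ctx), 0.0) for ctx in contexts]
    return interpolated_backoff(estimates, lambdas)
```

For `solidify` the three root endings are `ify`, `fy` and `y`, matching the contexts in the example. With an invented table such as `{("y->i", ("ify", "+ed")): 0.9}` and every lambda set to 1.0, the result is simply the most specific estimate, 0.9.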
26
Lemma Alignment by Model Combination and the
Pigeonhole Principle
  • No single model is sufficiently effective on its
    own.
  • The Frequency, Levenshtein and Context Similarity
    models retain equal relative weight.
  • The Morphological Transformation Similarity model
    increases in relative weight.
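
A weighting scheme with this behaviour could be sketched as a weighted average in which three models keep a fixed, equal weight and the fourth grows with the iteration number. The linear growth schedule, the `growth` parameter and the use of a simple average are assumptions made for illustration; the slides do not give the actual combination formula.

```python
def combined_score(freq, lev, ctx, morph, iteration, growth=0.5):
    """Weighted average of four model scores in [0, 1].

    Frequency, Levenshtein and context scores each keep weight 1.0;
    the morphological transformation score gets 1.0 + growth * iteration.
    """
    w_morph = 1.0 + growth * iteration
    total = freq + lev + ctx + w_morph * morph
    return total / (3.0 + w_morph)


early = combined_score(0.2, 0.2, 0.2, 1.0, iteration=0)
late = combined_score(0.2, 0.2, 0.2, 1.0, iteration=4)
```

With identical inputs, the later iteration moves the combined score closer to the morphological model's opinion.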

27
Lemma Alignment by Model Combination and the
Pigeonhole Principle
  • Example

28
Lemma Alignment by Model Combination and the
Pigeonhole Principle cont.
  • The final alignment is based on the pigeonhole
    principle.
  • For a given POS, a root shouldn't have more than
    one inflection, nor should multiple inflections in
    the same POS share the same root.
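
The one-to-one constraint can be enforced with a simple greedy pass over candidate pairs for a single POS, taking the best-scoring pair first and skipping any pair whose inflection or root is already taken. Greedy selection is an assumption of this sketch, not necessarily the procedure used in the work presented, and the scores in the usage note are invented.

```python
def pigeonhole_align(scored_pairs):
    """Pick a one-to-one inflection -> root mapping for one POS.

    `scored_pairs` maps (inflection, root) to a combined score.
    """
    used_inflections, used_roots = set(), set()
    alignment = {}
    ranked = sorted(scored_pairs.items(), key=lambda kv: kv[1], reverse=True)
    for (inflection, root), _score in ranked:
        if inflection in used_inflections or root in used_roots:
            continue  # this slot is already filled by a better pair
        alignment[inflection] = root
        used_inflections.add(inflection)
        used_roots.add(root)
    return alignment
```

For example, given `{("sang", "sing"): 0.8, ("singed", "sing"): 0.7, ("singed", "singe"): 0.6}`, the root `sing` is claimed by `sang`, so `singed` falls through to `singe`.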

29
Empirical Evaluation
  • Performance