Computational Morphology - PowerPoint PPT Presentation

Provided by: ariadnafo

1
Computational Morphology
  • Algorithms for NLP
  • Fall 2009

2
Background
  • Morphology
  • The study of the structure of words.
  • Quick, quicker, quickest, quickly, quicken
  • Hablo, hablas, habla, hablamos, habláis, hablan
    (Spanish present tense of hablar, "to speak")
  • Morphemes
  • Smallest units of meaning.
  • Or smallest recurring units.
  • Morphology is to morphemes as syntax is to words.

3
Morphemes
  • Express concepts or relationships.
  • Ex: car, table, anti-, re-.
  • Express syntactic features.
  • Ex: number (singular, plural), tense (present,
    past, future), gender (masculine, feminine).

4
Morphemes (cont.)
  • Morph
  • Morphemes as parts of a word.
  • Car: the morpheme car is realized as the morph
    car to form the word car.
  • Cars: the morpheme car and the plural morpheme
    are realized as car and -s, respectively, to form
    the word cars.
  • Allomorphs
  • The different forms of a morpheme.
  • Ex: the plural morpheme in English has several
    allomorphs (-es, -s, stem vowel alteration,
    etc.).
  • Ex: take, took (the past-tense morpheme realized
    as a stem vowel alteration).

5
Morphemes (cont.)
  • Free morphemes
  • Can form words by themselves.
  • Ex: car, dog.
  • Bound morphemes
  • Must be combined with other morphemes to form
    words.
  • Ex: the plural morpheme, anti-.
  • Words can be formed by free morphemes only, bound
    morphemes only, or a combination of free and bound
    morphemes.

6
Morphology
  • Study of the rules that govern the combination of
    morphemes.
  • Inflection: same word, different syntactic
    information
  • Run/runs/running, book/books
  • Derivation: new word, different meaning
  • Often a different part of speech, but not always
  • Possible/possibly/impossible, happy/happiness
  • Compounding: new word, each part is a word
  • Blackbird, firefighter, hardhat
  • Water hose, garden hose, rubber hose, fire hose

7
Computational Morphology
  • Analysis (words → encoded meaning)
  • Take a sequence of characters as input and
    produce an analysis of the information encoded in
    the characters.
  • Ex: plays → (play/noun/plural) or (play/verb/3rd
    person/singular/present).
  • Generation (meaning → words)
  • Generate words from a set of features.
  • Ex: (run/verb/1st person/singular/past) → ran
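Analysis and generation can be treated as inverse lookups between surface words and feature bundles. The sketch below assumes a tiny hand-written table (its entries and the tuple encoding of features are illustrative, not part of any real lexicon); a word may have several analyses, as plays does.

```python
from collections import defaultdict

# Hypothetical toy table of (surface word, feature bundle) entries.
ENTRIES = [
    ("plays", ("play", "noun", "plural")),
    ("plays", ("play", "verb", "3rd person", "singular", "present")),
    ("ran", ("run", "verb", "1st person", "singular", "past")),
]

# Analysis index: surface word -> all feature bundles (words can be ambiguous).
ANALYSES = defaultdict(list)
# Generation index: feature bundle -> surface word.
GENERATIONS = {}
for word, features in ENTRIES:
    ANALYSES[word].append(features)
    GENERATIONS[features] = word


def analyze(word):
    """Return every analysis of a surface word (empty list if unknown)."""
    return list(ANALYSES.get(word, []))


def generate(features):
    """Return the surface word for a feature bundle, or None if unknown."""
    return GENERATIONS.get(features)


print(analyze("plays"))
print(generate(("run", "verb", "1st person", "singular", "past")))  # ran
```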

8
Motivation
  • Analysis
  • Some non-English languages encode more grammatical
    information in morphology than in syntax.
  • Even spell checking is impossible without
    morphological analysis of surface words
  • Is renationalizationability in the lexicon? Is
    it misspelled?
  • Generation
  • Produce grammatical output (for NLG, MT)

9
Role in Analysis of NL
  • Sequence of characters → Tokenizer → Morphological
    analysis (lexicon + morphology rules) → (Syntactic
    analysis)

10
Applications
  • Machine translation
  • Spell checker
  • Grammar checker
  • Information retrieval
  • NLG
  • Etc.

11
Techniques
  • List all possible words (full-form lexicon).
  • No need for rules.
  • Each word is listed with all necessary features.
  • Size of comprehensive lexicon may be enormous
  • Feasible? For some languages, yes. For others, no!
  • List base forms in lexicon and perform analysis
    and/or generation
  • Encode all morphemes and rules.
  • More manageable numbers.
  • More complex, but a variety of strategies are
    available.
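The second strategy can be sketched as a stem lexicon plus a short list of suffix morphemes: analysis strips a candidate suffix and looks the remainder up among the stems. This is a minimal sketch under stated assumptions: the stems and suffixes are hypothetical, and only plain concatenation is handled (no spelling changes such as the e-deletion in move+ed).

```python
# Hypothetical stem lexicon: base form -> possible parts of speech.
STEMS = {"play": ["noun", "verb"], "walk": ["noun", "verb"], "book": ["noun", "verb"]}

# Hypothetical suffix morphemes: (suffix, part of speech it attaches to, features).
SUFFIXES = [
    ("s", "noun", "plural"),
    ("s", "verb", "3rd person singular present"),
    ("ed", "verb", "past"),
    ("ing", "verb", "progressive"),
]


def analyze(word):
    """Return (stem, pos, features) analyses via stem lookup or suffix stripping."""
    results = [(word, pos, "base") for pos in STEMS.get(word, [])]
    for suffix, pos, features in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if pos in STEMS.get(stem, []):
                results.append((stem, pos, features))
    return results


for w in ["plays", "walked", "booking"]:
    print(w, analyze(w))
```

Three stems and four suffix entries already cover forms such as plays, walked and booking, each of which a full-form lexicon would have to list separately.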

12
Finite State Approaches
  • Lexicon can be built as a finite state
    transducer.
  • Letter trees.
  • Too much work to do manually.
  • Cascading automata (Kay and Kaplan).
  • Generative morphology.
  • Resulting transducer is very large.
  • Two-level morphology (Kimmo Koskenniemi).
  • Antworth. Introduction to Two-level Phonology,
    1990. http://www.sil.org/pckimmo/two-level_phon.html
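A letter tree (trie) is a deterministic finite-state acceptor whose states are word prefixes. A minimal sketch, assuming plain nested dictionaries as the transition table and a '$' key as the final-state marker; a practical lexicon would also minimize the automaton and attach features to final states.

```python
def build_letter_tree(words):
    """Build a trie: each node maps a letter to a child node; '$' marks a final state."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True  # end-of-word marker (assumes '$' never occurs in words)
    return root


def accepts(tree, word):
    """Follow one transition per letter; accept only if we stop in a final state."""
    node = tree
    for letter in word:
        if letter not in node:
            return False  # no transition for this letter: reject
        node = node[letter]
    return node.get("$", False)


tree = build_letter_tree(["car", "cars", "cat", "dog"])
print(accepts(tree, "cars"), accepts(tree, "ca"), accepts(tree, "cow"))  # True False False
```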

13
Generative Morphology
  • Generative rules are rewriting rules (thus
    uni-directional).
  • Generative rules are sequential.
  • Generative morphology involves rewriting an
    underlying form into its surface form, via zero
    or more intermediate forms (levels).

14
Generative Morphology (cont.)
  • Example rules
  • Vowel Raising (1): e → i / __ C0 i (C means
    consonant, C0 means zero or more consonants)
  • Palatalization (2): t → c / __ i
  • Example rule application
  • UR temi; after rule 1: timi (an intermediate
    level); after rule 2: cimi = SR cimi
  • Reversing the ordering gives the wrong answer
    (temi → temi → timi)!
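The ordering effect can be reproduced with two ordered string rewrites. A minimal sketch, assuming the segment inventory is just lowercase letters with a, e, i, o, u as the vowels (everything else counts as C), and using Python regular-expression lookaheads for the right contexts:

```python
import re

# Assumption: any non-vowel character counts as a consonant (C).
C = "[^aeiou]"


def vowel_raising(form):
    """Rule 1: e -> i / __ C0 i (C0 = zero or more consonants)."""
    return re.sub(rf"e(?={C}*i)", "i", form)


def palatalization(form):
    """Rule 2: t -> c / __ i."""
    return re.sub(r"t(?=i)", "c", form)


def derive(underlying, rules):
    """Apply each rule once, in order; return every form from UR to SR."""
    forms = [underlying]
    for rule in rules:
        forms.append(rule(forms[-1]))
    return forms


print(derive("temi", [vowel_raising, palatalization]))  # ['temi', 'timi', 'cimi']
print(derive("temi", [palatalization, vowel_raising]))  # ['temi', 'temi', 'timi']
```

Each rule sees only the output of the previous one, which is what makes the rules unidirectional and order-dependent: palatalization can only apply once raising has created the triggering i.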

[Figure: intermediate levels between the UR and the SR]
15
Two-level Morphology
  • Rules are correspondence rules (thus
    bi-directional).
  • Rules are executed in parallel.
  • Two-level morphology involves specifying the
    valid correspondence between segments in the
    lexical form (underlying) and in the surface form.

16
Two-level Morphology (cont.)
  • Example rules
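  • Vowel Raising (1): e:i <=> __ C:C* @:i (@ is the
    wildcard, matching any symbol; * means zero or more)
  • Palatalization (2): t:c <=> __ @:i
  • Example rule application: UR (lexical) temi, SR
    (surface) cimi; both rules apply in parallel
    between the two levels.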
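Because two-level rules constrain lexical:surface pairs directly, both rules can be checked against one alignment with no intermediate form. A minimal sketch, assuming equal-length tapes without 0 symbols, a, e, i, o, u as the only vowels, and right contexts only (as in rules 1 and 2); the function names are hypothetical:

```python
VOWELS = set("aeiou")


def is_consonant(ch):
    return ch.isalpha() and ch not in VOWELS


def raising_context(rest):
    """Right context of rule 1: C:C* followed by any pair with surface i (@:i)."""
    j = 0
    while j < len(rest) and is_consonant(rest[j][0]) and is_consonant(rest[j][1]):
        j += 1
    return j < len(rest) and rest[j][1] == "i"


def palatalization_context(rest):
    """Right context of rule 2: the next pair has surface i (@:i)."""
    return len(rest) > 0 and rest[0][1] == "i"


def satisfies(pairs, x, y, right_context):
    """Check x:y <=> __ RC at every position of an aligned pair sequence."""
    for k, (lex, surf) in enumerate(pairs):
        in_context = right_context(pairs[k + 1:])
        if (lex, surf) == (x, y) and not in_context:
            return False  # => part: x:y may only occur in this context
        if lex == x and in_context and surf != y:
            return False  # <= part: in this context, lexical x must surface as y
    return True


RULES = [("e", "i", raising_context), ("t", "c", palatalization_context)]


def accepts(lexical, surface):
    """Accept if every rule holds on the same alignment (rules apply in parallel)."""
    pairs = list(zip(lexical, surface))  # assumes equal-length tapes
    return all(satisfies(pairs, x, y, rc) for x, y, rc in RULES)


print(accepts("temi", "cimi"), accepts("temi", "timi"), accepts("temi", "cemi"))  # True False False
```

Rule 2 is conditioned on surface i, so in temi:cimi it is satisfied by the i that rule 1 licenses in the same alignment; this is why the parallel rules need no ordering.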

[Figure: two levels, related directly by rules 1 and 2]
17
Two-level Morphology (cont.)
  • Surface level
  • How words are realized.
  • Ex: buys, misbehaving
  • Lexical level
  • How words are formed from morphemes in the
    lexicon.
  • Ex: buy+s, mis+behave+ing (+ marks a morpheme
    boundary)

18
Two-level Morphology (cont.)
  • Uses finite-state transducers (Mealy machine).
  • Bi-directional.
  • Alternative view: a two-tape FSA.
  • Knowledge encoded into rules, which are then
    compiled into finite-state machines.

19
Two-level Morphology (cont.)
  • Surface string:
  • moved
  • Lexical string:
  • move+ed
  • Aligned correspondence:
  • m o v e + e d
  • m o v e 0 0 d
  • You can think of these as the symbols on the two
    tapes of a two-tape FSA, or as the input and output
    of an FST.
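The two-tape view can be made concrete by zipping the aligned strings into lexical:surface pairs and reading either tape back off. A minimal sketch, assuming '+' is the morpheme-boundary symbol on the lexical tape and '0' is the null symbol:

```python
NULL = "0"  # null symbol; '+' (morpheme boundary) is an ordinary lexical symbol here


def to_pairs(lexical_tape, surface_tape):
    """Zip two aligned, equal-length tapes into lexical:surface pairs."""
    if len(lexical_tape) != len(surface_tape):
        raise ValueError("aligned tapes must have the same length")
    return list(zip(lexical_tape, surface_tape))


def project(pairs, side):
    """Read one tape back (0 = lexical, 1 = surface), dropping null symbols."""
    return "".join(pair[side] for pair in pairs if pair[side] != NULL)


pairs = to_pairs("move+ed", "move00d")
print(" ".join(f"{lex}:{surf}" for lex, surf in pairs))  # m:m o:o v:v e:e +:0 e:0 d:d
print(project(pairs, 0), project(pairs, 1))  # move+ed moved
```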

20
Two-level Rules
  • We need rules to describe
  • Changes between lexical and surface forms.
  • The context in which these changes occur.
  • Whether the change is obligatory or optional.

21
Two-level Rules (cont.)
  • General form of rules
  • x:y op LC __ RC
  • x:y is the pair of a lexical symbol and a surface
    symbol.
  • op is the operator that defines the type of rule.
  • LC is the left context of the pair.
  • RC is the right context of the pair.

22
Two-level Rules (cont.)
  • Operators and types of rules
  • Context Restriction Rule (=>)
  • The specified pair can occur only in the given
    context.
  • Surface Coercion Rule (<=)
  • In the given context, the lexical symbol must be
    realized as the specified surface symbol.
  • Composite Rule (<=>)
  • An abbreviation for the combination of the two
    types above; can be read as "if and only if".

23
Two-level Rules (cont.)
  • Multiple Context Restriction rules covering the
    same pair are interpreted disjunctively: at least
    one of the contexts must be satisfied.
  • pair => (context1 | context2 | ...)
  • Multiple Surface Coercion rules covering the same
    pair are interpreted conjunctively:
  • (context1 => pair) & (context2 => pair) & ...
  • Equivalent to (context1 | context2 | ...) => pair
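The disjunctive and conjunctive readings can be sketched with contexts as predicates over an aligned pair sequence. The two contexts for y:i are hypothetical (they mirror tries and tried, with + as the morpheme boundary and 0 as null), not rules of English; surface_coercion and surface_coercion_merged implement the two equivalent forms of the Surface Coercion combination.

```python
def align(lexical, surface):
    """Pair up two equal-length aligned tapes ('0' = null, '+' = boundary)."""
    if len(lexical) != len(surface):
        raise ValueError("tapes must be aligned")
    return list(zip(lexical, surface))


# Hypothetical contexts for the pair y:i, as predicates over (pairs, position).
def before_boundary_e(pairs, k):
    """Context 1: __ +:e (as in try+s / tries)."""
    return k + 1 < len(pairs) and pairs[k + 1] == ("+", "e")


def before_null_boundary_e(pairs, k):
    """Context 2: __ +:0 e (as in try+ed / tri0ed)."""
    return (k + 2 < len(pairs) and pairs[k + 1] == ("+", "0")
            and pairs[k + 2][0] == "e")


CONTEXTS = [before_boundary_e, before_null_boundary_e]


def context_restriction(pairs, pair, contexts):
    """pair => c1 | c2 | ...: each occurrence of pair needs at least one context."""
    return all(any(c(pairs, k) for c in contexts)
               for k, p in enumerate(pairs) if p == pair)


def surface_coercion(pairs, pair, contexts):
    """(c1 => pair) & (c2 => pair) & ...: in each context, lexical x surfaces as y."""
    x, y = pair
    return all(p[1] == y
               for c in contexts
               for k, p in enumerate(pairs)
               if p[0] == x and c(pairs, k))


def surface_coercion_merged(pairs, pair, contexts):
    """The equivalent single rule (c1 | c2 | ...) => pair."""
    x, y = pair
    return all(p[1] == y for k, p in enumerate(pairs)
               if p[0] == x and any(c(pairs, k) for c in contexts))


def composite(pairs, pair, contexts):
    """pair <=> contexts: both the => and the <= conditions must hold."""
    return (context_restriction(pairs, pair, contexts)
            and surface_coercion(pairs, pair, contexts))


for lex, surf in [("try+s", "tries"), ("try+ed", "tri0ed"),
                  ("try", "tri"), ("try+s", "tryes")]:
    print(lex, surf, composite(align(lex, surf), ("y", "i"), CONTEXTS))
# try+s tries True / try+ed tri0ed True / try tri False / try+s tryes False
```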

24
Two-level Rules (cont.)
  • Example: the English word moved analyzed as
    move+ed.
  • m o v e + e d
  • m o v e 0 0 d
  • Null symbols 0 on the surface level.
  • Lexical symbols such as + (morpheme boundary) on
    the lexical level.
  • Transducer is defined on pairs of symbols,
    denoted x:y.
  • Transducer is deterministic (on pairs), but may
    not be defined on all possible pairs.
  • Wildcard symbol @.
  • Symbols can be grouped into sets.

25
Two-level Rules (cont.)
  • Rules for the example (and for the example only,
    not rules for English morphology)
  • +:0 <=> m o v e __ e:0 d
  • e:0 <=> m o v e +:0 __ d
  • Identity pairs (x:x) can be abbreviated as
    single symbols or set names.
  • We can build a corresponding transducer.

26
Two-level Rules (cont.)
  • Another example
  • y:i => C (+:0) __ +:0
  • C is the set of consonants (and V is the set of
    vowels).
  • This rule says that if we see y:i, the context
    must be the specified one (otherwise reject).
  • The left context is a consonant, optionally
    followed by +:0 (an optional morpheme boundary).
    We can also use regular expressions to describe
    contexts.

27
Two-level Rules (cont.)
  • Another example: adding e in pluralization, as
    in tax → taxes, only after x, z, c, ch, s, sh,
    or a lexical y realized as surface i.
  • +:e <=> {x, z, c (h), s (h), y:i} __ s

28
Two-level Rules (cont.)
  • Example: possessives of plurals
  • Bookss
  • Book0s0s
  • Blemish0ss
  • Blemish0es00
  • s0 lt 0 (0e) ss 0 _

29
Two-level Rules (cont.)
  • Example:
  • tie+ing
  • ty00ing
  • i:y <=> __ e:0 +:0 i

30
Selecting the Type of Rule
  Rule | Is a:b the only pair allowed in this context? | Is a:b allowed in this context only?
  =>   | no                                            | yes
  <=   | yes                                           | no
  <=>  | yes                                           | yes

31
Converting Rules to FSA
  • Conversion schema for context restriction rules
    x:y => LC __ RC
  • A single FSA for each rule
  • All machines across all rules must accept for a
    lexical/surface pair of strings to be correct
  • Assumption: a rule may be applied only once per
    input string

32
Converting Rules to FSA
  • Conversion schema for context restriction rules
    x:y => LC __ RC

[Figure: FSA for x:y => LC __ RC, with arcs labeled @:@, LC, ¬LC, x:y, ¬(x:y), RC and ¬RC]
33
Converting Rules to FSA
  • x:y => LC __ RC
  • Under different assumptions:
  • Multiple occurrences of the pair allowed
  • LC and RC not allowed to overlap

[Figure: FSA for x:y => LC __ RC under these assumptions, with arcs labeled LC, ¬LC, x:y, ¬(x:y), RC, ¬RC and @:@]
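The schema for the multiple-occurrence case can be sketched as a three-state DFA over pairs. This assumes LC and RC are single pairs, distinct from each other and from x:y (longer contexts need more states), and treats every unmentioned pair as an @:@ arc; a missing transition means rejection. The rule e:0 => +:0 __ d is a hypothetical simplification of the moved rules.

```python
def compile_context_restriction(target, lc, rc):
    """Build a DFA over pairs for the rule target => lc __ rc.

    Assumes target, lc and rc are single, distinct pairs (so LC and RC cannot
    overlap) and that target may occur any number of times.
    States: 0 = neutral, 1 = just read lc, 2 = read lc + target, rc required.
    """
    def step(state, pair):
        if state == 2:
            return 0 if pair == rc else None  # None: no transition, reject
        if pair == target:
            return 2 if state == 1 else None  # target must directly follow lc
        if pair == lc:
            return 1
        return 0  # any other pair (the @:@ arcs)
    return step, {0, 1}  # state 2 is not accepting: rc is still owed


def run(dfa, pairs):
    """Run the DFA over a pair sequence; reject on a missing transition."""
    step, accepting = dfa
    state = 0
    for pair in pairs:
        state = step(state, pair)
        if state is None:
            return False
    return state in accepting


# Hypothetical simplified rule for the moved example: e:0 => +:0 __ d
rule = compile_context_restriction(("e", "0"), ("+", "0"), ("d", "d"))
for lex, surf in [("move+ed", "move00d"), ("move+ed", "mov00ed"), ("move+es", "move00s")]:
    print(lex, surf, run(rule, list(zip(lex, surf))))
# move+ed move00d True / move+ed mov00ed False / move+es move00s False
```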
34
Converting Rules to FSA (cont.)
  • What about <= and <=>?
  • Similar idea
  • What about multiple rules?
  • More complicated
  • A collection of rules can be compiled into one
    large FST
  • Xerox has a software package that implements
    this; PC-Kimmo (from SIL) is a freely available
    implementation of two-level morphology.