Computational Morphology - PowerPoint PPT Presentation

1
Computational Morphology
  • Lauri Karttunen

2
Computational morphology
  • The big questions
  • Efficient generation and recognition
  • Common data format
  • Common "runtime" algorithm for all languages
  • Established results
  • Lexical representations are regular languages
  • Morphological alternations are regular relations
  • Regular relations can be compiled into
    finite-state transducers
  • Burning issues
  • Nonconcatenative phenomena: reduplication
    (Malay), interdigitation (Arabic)
  • Nonlocal dependencies
  • Syntax/morphology interface

3
Overview
  • Computational morphology
  • A success story
  • Realizational Morphology (is finite-state)
  • Lexical representations
  • Realization rules
  • Morphophonological rules
  • Rules of referral
  • Elsewhere principle (Panini's principle)
  • Challenges

4
Computational morphology
5
Two challenges
  • Morphotactics
  • Words are composed of smaller elements that must
    be combined in a certain order
  • piti-less-ness is English
  • piti-ness-less is not English
  • Phonological alternations
  • The shape of an element may vary depending on the
    context
  • pity is realized as piti in pitilessness
  • die becomes dy in dying

6
Morphology is regular (rational)
  • The relation between the surface forms of a
    language and the corresponding lexical forms can
    be described as a regular relation.
  • A regular relation consists of ordered pairs of
    strings.
  • <leaf+N+Pl, leaves>, <hang+V+Past, hung>
  • Any finite collection of such pairs is a regular
    relation.
  • Regular relations are closed under operations
    such as concatenation, iteration, union, and
    composition.
  • Complex regular relations can be derived from
    simple relations.
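
The closure properties above can be illustrated with a minimal Python sketch that treats a finite relation as a set of (lexical, surface) pairs. The tag spelling `+N`, `+Pl`, `+V`, `+Past` and the intermediate form `leafs` are assumptions made for illustration; a real system would use finite-state networks rather than explicit sets.

```python
from itertools import product

def union(r, s):
    """Union of two relations."""
    return r | s

def concat(r, s):
    """Concatenate every pair of r with every pair of s, side by side."""
    return {(u1 + u2, l1 + l2) for (u1, l1), (u2, l2) in product(r, s)}

def compose(r, s):
    """Feed the lower side of r into the upper side of s."""
    return {(u, l) for (u, m1) in r for (m2, l) in s if m1 == m2}

noun = {("leaf+N", "leaf")}        # simple relation: a stem
plural = {("+Pl", "s")}            # simple relation: a suffix
repair = {("leafs", "leaves")}     # simple relation: an alternation (assumed)

regular = concat(noun, plural)               # {("leaf+N+Pl", "leafs")}
irregular = {("hang+V+Past", "hung")}
lexicon = union(compose(regular, repair), irregular)
```

The resulting `lexicon` is itself a finite, hence regular, relation built only from simpler ones.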

7
Morphology is finite-state
  • A regular relation can be defined using the
    metalanguage of regular expressions.
  • A regular expression can be compiled into a
    finite-state transducer that implements the
    relation computationally.

8
Regular morphotactics
  • Principles of word-formation in most languages
    can be defined as a regular language or relation
    using operators such as concatenation and union.
  • Toy example ("|" = union, "( )" = optionality):
  • Noun = ear | father
  • Adj = clear | clever | fat
  • Adv = ever
  • NPref = anti
  • AdjSuff = er | est
  • NSuf = s
  • English = (NPref) Noun (NSuf) | Adj (AdjSuff)
    | Adv
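
The toy grammar can be tried out with Python's `re` module, whose patterns offer the same union and optionality operators. This is only a sketch of the morphotactics as a recognizer; the function name `is_word` is hypothetical, and no phonological alternations are applied.

```python
import re

NOUN = r"(?:ear|father)"
ADJ = r"(?:clear|clever|fat)"
ADV = r"(?:ever)"
NPREF = r"(?:anti)"
ADJSUFF = r"(?:er|est)"
NSUF = r"(?:s)"

# English = (NPref) Noun (NSuf) | Adj (AdjSuff) | Adv
ENGLISH = re.compile(rf"(?:{NPREF}?{NOUN}{NSUF}?|{ADJ}{ADJSUFF}?|{ADV})")

def is_word(s):
    """True if s is generated by the toy morphotactic grammar."""
    return ENGLISH.fullmatch(s) is not None
```

Under this grammar `is_word("antifathers")` and `is_word("fater")` hold, while `is_word("fatter")` does not: the morphotactics alone produce the ungeminated form.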

9
Simple lexicon
[Figure: finite-state network encoding the toy lexicon]
(NPref) Noun (NSuf) | Adj (AdjSuff) | Adv
10
Regular alternations
  • Phonological alternations can be represented as
    regular relations using special regular
    expression operators
  • Ordered rewrite systems (Panini 500 BC,
    Chomsky & Halle 1968)
  • Parallel two-level systems (Koskenniemi 1983)
  • Simple gemination rule for English
  • t -> t t || .#. C* V _ e [r | s t]
  • Geminate t at the end of a monosyllabic stem
    with a single vowel that is followed by er or
    est. (fater -> fatter vs. greater -> greater).
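
As a rough illustration, the gemination rule can be approximated with a Python regular-expression substitution. The sketch assumes that a consonant is any character outside `aeiou` and that the rule applies at the start of the word; it is not the compiled transducer the slide refers to.

```python
import re

VOWELS = "aeiou"

# Left context: word start, any consonants, exactly one vowel.
# Right context (lookahead): "er" or "est".
GEMINATE = re.compile(rf"^([^{VOWELS}]*[{VOWELS}])t(?=e(?:r|st))")

def geminate(form):
    """Double a stem-final t before er/est in a one-vowel stem."""
    return GEMINATE.sub(r"\1tt", form)
```

Here `geminate("fater")` gives `fatter`, while `geminate("greater")` is left unchanged because the stem contains two vowels before the t.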

11
Transducer lexicon
[Figure: transducer network for the toy lexicon; a 0:t arc inserts the geminated t]
[(NPref) Noun (NSuf) | Adj (AdjSuff) | Adv]
.o. t -> t t || _ e [r | s t]
12
Lexical transducer
  • Bidirectional generation or analysis
  • Compact and fast
  • Comprehensive systems have been built for over 20
    languages
  • English, German, Dutch, French, Italian, Spanish,
    Portuguese, Finnish, Russian, Turkish, Japanese,
    Basque, Greek, Arabic, Bulgarian,
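
Bidirectionality can be pictured with a small Python sketch in which one set of pairs serves both generation and analysis. The pair list and the tag names (including `leave+V+3sg`) are hypothetical; an actual lexical transducer stores the relation as a compact network, not as a table.

```python
from collections import defaultdict

# Hypothetical (lexical, surface) pairs for illustration only.
PAIRS = [
    ("leaf+N+Pl", "leaves"),
    ("leave+V+3sg", "leaves"),
    ("hang+V+Past", "hung"),
]

def index(pairs):
    """Index the same relation in both directions."""
    down, up = defaultdict(set), defaultdict(set)
    for lexical, surface in pairs:
        down[lexical].add(surface)   # generation: lexical -> surface
        up[surface].add(lexical)     # analysis: surface -> lexical
    return down, up

GENERATE, ANALYZE = index(PAIRS)
```

Looking up `ANALYZE["leaves"]` returns both analyses, while `GENERATE["hang+V+Past"]` returns the single surface form.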

13
Morphology is a solved problem
14
Who cares?
  • The success of computational morphology has not
    made any impact within linguistics.
  • Computational concerns
  • completeness of coverage, physical size, speed of
    application, formal power,
  • Academic concerns
  • explanation, universal principles,
    generalizations, theoretical predictions, elegant
    formalism,
  • Let's try to build a bridge

15
Realizational Morphology
  • Gregory Stump, Inflectional Morphology. A Theory
    of Paradigm Structure. Cambridge U. Press. 2001.
  • A rich set of notational conventions designed to
    capture important linguistic generalizations.
  • Interpretable, precise formalism.
  • Computational implementation in DATR (Finkel &
    Stump 2002).
  • The good news: Realizational morphology is a
    finite-state model.

16
Finite-state advantage
  • Casting Stump's system into a regular expression
    formalism that has a compiler has a fundamental
    advantage over implementation in systems such as
    DATR.
  • DATR can be used to generate an inflected surface
    form from its lexical representation but it is
    not directly usable for recognition. In contrast,
    finite-state transducers are bidirectional
    generator/recognizers.
  • Issues to be addressed
  • Lexical representations
  • Realization rules (= rules of exponence)
  • Morphophonological rules
  • Rules of referral
  • Rule ordering by general principles

17
Lexical representation
  • <Stem, Features>

Stem: a phonological representation
Features: a set of morphological properties
18
Realization rule
  • RR_{n,τ,C}(<X,σ>) =def <Y',σ>

n: rule block
τ: features realized by the rule
C: category
X: phonological input
Y': phonological output
σ: features
19
Rule application
  • Realization rules are ordered into blocks by the
    linguist.
  • Within blocks, the ordering is determined by
    specificity (Elsewhere rule, Panini's principle).
  • The final output of a realization rule may depend
    on morphophonological rules.
  • X → Y → Y'

20
Cascade of rule applications
<bet, SubPer1, NumSg, ObjPer2, NumSg,
TnsPastRec>
21
Observations
  • The lexical representations of Realizational
    Morphology constitute a regular language.
  • They can be described by a regular expression.
  • All examples of realization rules given in
    Stump's book represent regular relations.
  • They can be compiled into finite-state
    transducers.
  • Because regular relations are closed under
    composition, the cascade of rule applications
    yields a single transducer.
  • We can eliminate the features from the surface
    side once the composition has been done.
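
The idea that a cascade collapses into one mapping can be sketched in Python with ordinary function composition. The feature keys `ObjPer` and `ObjNum` and the helper names are hypothetical; only the stem `bet` and the prefix `ko` for second person singular objects come from the slides.

```python
from functools import reduce

def r301(form):
    """Prefix ko when the object agreement features are 2 and Sg."""
    stem, feats = form
    if feats.get("ObjPer") == "2" and feats.get("ObjNum") == "Sg":
        return ("ko" + stem, feats)
    return form

def strip_features(form):
    """Eliminate the features from the surface side."""
    stem, _ = form
    return (stem, {})

def compose(*rules):
    """Combine a cascade of rules into a single function."""
    def cascade(form):
        return reduce(lambda acc, rule: rule(acc), rules, form)
    return cascade

realize = compose(r301, strip_features)
```

Calling `realize(("bet", {"ObjPer": "2", "ObjNum": "Sg"}))` yields `("kobet", {})`. With transducers the composition is computed once, ahead of time, rather than at each call.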

22
Literal example
In a real application, one would prefer a more
parsimonious encoding of the feature structure.
23
Realization rules
  • Stump's realization rules can easily be expressed
    in Parc/XRCE regular expression formalism.
  • Example
  • RR_{3,{ObjPer2, NumSg},V}(<X,σ>) =def <koX,σ>
  • define R301 [. .] -> ko || "<" _ ObjAgr 2
    Sg
  • "Rule R301: Insert (= rewrite the empty string
    as) "ko" at the beginning of a phonological
    form whose object agreement features contain
    the values 2 and Sg."

24
Morphophonological rules
  • The output of a realization rule may be subject
    to a morphophonological rule.
  • Stump's morphophonemic rules are simple rewrite
    rules, easily expressed in the Parc/XRCE regular
    expression formalism.
  • If X = W vowel1 and Y = X vowel2 Z, then the
    indicated vowel2 is absent from Y'.
  • Vowel -> 0 || Vowel "" _
  • where "" marks the place where the suffix is
    inserted.

25
Rules of referral
  • Realization rules may be defined in terms of
    other realization rules.
  • The same affix can express more than one bundle
    of morphological features (syncretism).
  • In Lingala, mo expresses class 4 singular 3rd
    person agreement for subjects and objects.
  • In the Parc/XRCE regular expression formalism, a
    rule of referral corresponds to a substitution
    operation.
  • If R305 is the object agreement rule, the
    corresponding subject agreement rule is
  • `[[R305], Obj, Sub]
  • It yields a transducer identical to R305 except
    that the insertion of mo is controlled by subject
    agreement features.
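
A rule of referral as substitution can be sketched in Python by keeping the rule declarative and swapping one symbol in it. The class `AgreementRule`, the feature encoding, and the assumption that mo is attached as a prefix are all hypothetical illustration choices.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AgreementRule:
    role: str       # "Obj" or "Sub"
    values: tuple   # agreement values the rule realizes
    affix: str

    def apply(self, stem, feats):
        """Attach the affix if the role's features match."""
        if feats.get(self.role) == self.values:
            return self.affix + stem
        return stem

def refer(rule, old_role, new_role):
    """Derive a new rule by substituting one role symbol for another."""
    return replace(rule, role=new_role) if rule.role == old_role else rule

R305 = AgreementRule(role="Obj", values=("3", "Sg", "4"), affix="mo")
SUBJECT_RULE = refer(R305, "Obj", "Sub")
```

`SUBJECT_RULE` is identical to `R305` except that it is triggered by subject agreement features.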

26
Elsewhere principle
  • While the rule blocks are ordered by the
    linguist, the realization rules within each block
    and the morphophonological rules are ordered by
    specificity.
  • A specific rule takes precedence over a more
    general rule in cases where both are applicable.
  • This principle is very important for Stump. But
    he gives no precise definition for it within his
    formalism.
  • The Elsewhere Principle is an extremely simple
    notion for realization rules and for
    symbol-to-symbol morphophonological rules in a
    finite-state model.

27
Specific vs. General
28
Input/Output languages
  • Rule A and Rule B have the same input language:
    the universal ("sigma star") language.
  • Both rules can be applied without failure to any
    string. If the context is not met, the output is
    the same as the input.
  • The output languages are not the same.
  • A "successful" application of an obligatory rule
    removes from the output language the strings to
    which it has applied.
  • Every string missing from the output language of
    Rule B is missing from the output language of
    Rule A, but not vice versa.
  • The output language of Rule A is a proper subset
    of the output language of Rule B.

29
Output language of Rule A
Rule A: k -> 0 || Vowel _ Vowel
Rule B: k -> v || u _ u
30
Output language of Rule B
Rule A: k -> 0 || Vowel _ Vowel
Rule B: k -> v || u _ u
31
Principled rule ordering
  • The relationship of any two rules A and B that
    insert a string or replace a particular symbol
    can be determined by the following method
  • Extract the output languages (a finite-state
    operation).
  • Check whether one is the proper subset of the
    other (a finite-state operation).
  • This determination can be done efficiently and
    without any knowledge of how the rules were
    expressed.
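
The method can be approximated in Python by enumerating all strings up to a small length bound and comparing the resulting output sets. This bounded enumeration is a stand-in for the exact finite-state operations; the four-letter alphabet, the vowel set, and the bound of 4 are assumptions chosen to keep the example small.

```python
import re
from itertools import product

ALPHABET = "akuv"
VOWELS = "au"

def rule_a(s):
    """Rule A: k -> 0 || Vowel _ Vowel"""
    return re.sub(rf"(?<=[{VOWELS}])k(?=[{VOWELS}])", "", s)

def rule_b(s):
    """Rule B: k -> v || u _ u"""
    return re.sub(r"(?<=u)k(?=u)", "v", s)

def output_language(rule, max_len=4):
    """Outputs of the rule over every input string up to max_len."""
    inputs = ("".join(p)
              for n in range(max_len + 1)
              for p in product(ALPHABET, repeat=n))
    return {rule(s) for s in inputs}

OUT_A = output_language(rule_a)
OUT_B = output_language(rule_b)
A_IS_PROPER_SUBSET = OUT_A < OUT_B   # set comparison: proper subset
```

Within the bound, `OUT_A` is a proper subset of `OUT_B`: a string such as `aka` survives Rule B but not Rule A. Since the context of Rule B (u _ u) is a special case of Rule A's, the rule with the larger output language is the more specific one.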

32
Discussion
  • It is evident that Realizational Morphology is
    yet another variant of finite-state morphology.
  • Stump could say "Your theory is a notational
    variant of mine but mine is better."
  • There are many examples where notation matters:
  • B => A _ C "B must occur between A and C."
  • ~[~[?* A] B ?*] & ~[?* B ~[C ?*]]
  • Stump's convoluted and cumbersome notation takes
    no advantage of the nice formal and computational
    properties that it in fact has. It is a
    finite-state model that does not know its name.

33
Morphotactic challenges
  • Most languages build words by concatenation
  • unthinkingly
  • parismutnngauniraqlauqsimanngitjunga
    (Inuktitut)
  • (= parimunngauniralauqsimanngittunga, "I never
    said I wanted to go to Paris")
  • Some languages also have nonconcatenative
    processes of word formation
  • Arabic interdigitation
  • Malay reduplication

34
Interdigitation in Arabic
Concatenative: kuutib + a
(stem + suffix)
The root, template and vocalization morphemes
interdigitate into a stem.
35
Full-stem reduplication in Malay
  • In Malay, the overt plural of bagi (suitcase)
    is bagibagi (orthographically bagi-bagi); the
    plural of peraturan (rule) is
    peraturanperaturan, etc.
  • To model such pluralization, you need to copy the
    stem, no matter what it is and no matter how long
    it is.
  • Such full-stem reduplication appears to be far
    beyond finite-state power
  • The copy language, {ww | w ∈ L}, is
    context-sensitive.

36
Compile-replace: a new algorithm
  • Define networks using concatenation, as before,
    but in such a way that the paths in the network
    may themselves contain regular expressions.
  • Reapply the compiler to its own output, compiling
    the regular expression substrings and replacing
    them with the result of the compilation.

37
A non-linguistic example before compile-replace
Network containing a regular expression,
a*, delimited with ^[ and ^].
38
Non-linguistic example after compile-replace
Maps every string in the infinite a* language to
the regular expression from which the language
was compiled.
39
Iteration operator
  • ^n
  • A^2 denotes two concatenations of the language A
    with itself, equivalent to A A.
  • A = {bagi, pelabuhan, ...}
  • A^2 = {bagibagi, bagipelabuhan, pelabuhanbagi,
    pelabuhanpelabuhan, ...}
  • Finite-state languages and relations are closed
    under n-ary concatenation.
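
A short Python sketch shows why the iteration operator applied to a whole language is not the same as reduplication. The two-word language is taken from the slide; the function name `power` is hypothetical.

```python
from itertools import product

A = {"bagi", "pelabuhan"}

def power(language, n):
    """n-fold concatenation of a finite language with itself."""
    return {"".join(parts) for parts in product(sorted(language), repeat=n)}

A2 = power(A, 2)                      # includes mixed forms
REDUPLICATED = {w + w for w in A}     # only true copies
```

`A2` contains the ill-formed `bagipelabuhan` and `pelabuhanbagi`, whereas `REDUPLICATED` contains only `bagibagi` and `pelabuhanpelabuhan`.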

40
Compile-replace in Malay
  • Before
  • Lemma: b a g i Noun Plural
  • Underlying form: b a g i ^2
  • After
  • Lemma: b a g i Noun Plural
  • Surface string: b a g i b a g i
  • The compile-replace operation does not create any
    ill-formed reduplicates such as pelabuhanbagi.
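
The per-path character of compile-replace can be imitated in Python: each lexicon entry carries its own small expression, which is evaluated separately and replaced by its result. The sketch handles only expressions of the form `^[{stem}^n^]`; that delimiter and brace spelling, and the `+Noun+Plural` tags, are assumptions for illustration rather than the exact notation on the slide.

```python
import re

# Matches a delimited expression of the form ^[ {stem} ^n ^]
EXPRESSION = re.compile(r"\^\[\s*\{(\w+)\}\s*\^(\d+)\s*\^\]")

def compile_replace(lower):
    """Evaluate each delimited expression and splice in the result."""
    return EXPRESSION.sub(lambda m: m.group(1) * int(m.group(2)), lower)

LEXICON = {
    "bagi+Noun+Plural": "^[{bagi}^2^]",
    "pelabuhan+Noun+Plural": "^[{pelabuhan}^2^]",
}

SURFACE = {lemma: compile_replace(lower) for lemma, lower in LEXICON.items()}
```

Because the iteration is evaluated inside each path, the copies always match, and no mixed form such as `pelabuhanbagi` can arise.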

41
Merge operators for Arabic
  • Merge a Filler into a Template
  • .m>. is the merge-to-the-right operator and
  • .<m. is the merge-to-the-left operator.
  • k t b .m>. C V V C V C
  • k V V t V b
  • k V V t V b .<m. u i
  • k u u t i b
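
The effect of merging can be sketched in Python as slot filling: the filler is treated as a pattern, and the slots of one class in the template are filled with a string of that class that matches the pattern. The function `merge`, the brute-force search, and the filler patterns `u*i` and `a+` are assumptions (the slide text shows only `u i` and `a`); the sketch does not distinguish the two operator directions, which differ only in which operand is the template.

```python
import re
from itertools import product

def merge(template, filler, slot, alphabet):
    """Fill every `slot` symbol in template with a string over
    `alphabet` that matches the regular expression `filler`."""
    n = template.count(slot)
    for candidate in product(alphabet, repeat=n):
        if re.fullmatch(filler, "".join(candidate)):
            symbols = iter(candidate)
            return "".join(next(symbols) if ch == slot else ch
                           for ch in template)
    return None  # the filler cannot cover the slots

with_root = merge("CVVCVC", "ktb", "C", "ktb")      # root into template
stem = merge(with_root, "u*i", "V", "aiu")          # vocalization into result
```

Here `with_root` is `kVVtVb` and `stem` is `kuutib`. With the template `CVCVC` and the assumed vocalization `a+`, the same two steps give `katab`.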

42
Compile-replace in Arabic: before and after
  • Before
  • Lemma: k t b Root C V C V C Template a
    Voc
  • Underlying: k t b .m>. C V C V C .<m. a
  • After
  • Lemma: k t b Root C V C V C Template a
    Voc
  • Surface: k a t a b
  • Alternation rules apply to the interdigitated
    stems to produce the real surface strings.

43
XRCE Arabic
  • Lexicon
  • 4930 roots
  • 400 phonologically distinct patterns
  • 90,000 stems
  • 72 million words
  • Rules
  • 66 alternation rules for deletion, assimilation,
    etc.
  • Construction
  • compile-replace algorithm merges roots and
    patterns to form stems
  • composition with alternation rules creates the
    final transducer with optional vowels
  • time required: a few minutes

44
Conclusion
  • Computationally, morphology is a solved problem.

Syntax-morphology interface
45
References
  • Lauri Karttunen, "Computing with Realizational
    Morphology" in CICLing-2003, A. Gelbukh (ed.),
    Lecture Notes in Computer Science 2588, pages
    205-216. Springer Verlag. 2003.
  • For a copy write to karttune_at_parc.com
  • This PowerPoint presentation will be available at
    a local web site.
  • Kenneth R. Beesley and Lauri Karttunen,
    Finite-State Morphology, CSLI Publications.
    February 2003. (Software included).