Preliminary Experiments in Morphological Evolution

1
(Preliminary) Experiments in Morphological Evolution
  • Richard Sproat
  • University of Illinois at Urbana-Champaign
  • rws@uiuc.edu
  • 3rd Workshop on "Quantitative Investigations in
    Theoretical Linguistics" (QITL-3)
  • Helsinki, 2-4 June 2008

2
Overview
  • The explananda
  • Previous work on evolutionary modeling
  • Computational models and preliminary experiments

3
Phenomena
  • How do paradigms arise?
  • Why do words fall into different inflectional
    "equivalence classes"?
  • Why do stem alternations arise?
  • Why is there syncretism?
  • Why are there rules of referral?

4
Stem alternations in Sanskrit
zero
guna
Examples from Stump, Gregory (2001) Inflectional
Morphology: A Theory of Paradigm Structure.
Cambridge University Press.
5
Stem alternations in Sanskrit
morphomic (Aronoff, M. 1994. Morphology by
Itself. MIT Press.)
vrddhi
lexeme-class particular
lexeme-class particular
6
Evolutionary Modeling (A tiny sample)
  • Hare, M. and Elman, J. L. (1995) Learning and
    morphological change. Cognition, 56(1):61--98.
  • Kirby, S. (1999) Function, Selection, and
    Innateness: The Emergence of Language Universals.
    Oxford.
  • Nettle, D. (1999) Using Social Impact Theory to
    simulate language change. Lingua,
    108(2-3):95--117.
  • de Boer, B. (2001) The Origins of Vowel Systems.
    Oxford.
  • Niyogi, P. (2006) The Computational Nature of
    Language Learning and Evolution. Cambridge, MA:
    MIT Press.

7
Experiment 1: Rules of Referral
8
Rules of referral
  • Stump, Gregory (1993) On rules of referral.
    Language. 69(3), 449-479
  • (After Zwicky, Arnold (1985) How to describe
    inflection. Berkeley Linguistics Society. 11,
    372-386.)

9
Latin declensions
10
Are rules of referral interesting?
  • Are they useful for the learner?
  • Wouldn't the learner have heard instances of
    every paradigm?
  • Are they historically interesting?
  • Does morphological theory need mechanisms to
    explain why they occur?

11
Another example: Bögüstani nominal declension
Columns (for each of three paradigms): Sg, Du, Pl
Rows: Nom, Acc, Gen, Dat, Loc, Inst, Abl, Illat
  • Bögüstani
  • A language of Uzbekistan
  • ISO 639-3: bgs
  • Population: 15,500 (1998 Durieux).
  • Comments: Capsicum chinense and Coffea arabica
    farmers

12
Monte Carlo simulation (generating Bögüstani)
  • Select a re-use bias B
  • For each language
  • Generate a set of vowels, consonants and affix
    templates
  • a, i, u, e
  • n f r w B s x j D
  • V, C, CV, VC
  • Decide on p paradigms (minimum 3), r rows
    (minimum 2), c columns (minimum 2)

13
Monte Carlo simulation
  • For each paradigm in the language
  • Iterate over (r, c)
  • Let α be the previous affix stored for r: with
    probability p = B, retain α in L
  • Let β be the previous affix stored for c: with
    probability p = B, retain β in L
  • If L is non-empty, set (r, c) to a random
    choice from L
  • Otherwise generate a new affix for (r, c)
  • Store (r, c)'s affix for r and c
  • Note that $P(\text{new-affix}) = (1-B)^2$ (once
    both r and c have a stored affix)
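
The generation procedure on the last two slides can be sketched in Python roughly as follows. This is a minimal sketch, not the simulation code used for the talk: the function names are hypothetical, the inventories are the example values from the previous slide, and it assumes that the stored row and column affixes carry over from one paradigm to the next within a language.

```python
import random

# Example inventories taken from the slide; a real run would also randomize these.
VOWELS = "aiue"
CONSONANTS = "nfrwBsxjD"
TEMPLATES = ["V", "C", "CV", "VC"]


def new_affix(rng):
    """Spell out a randomly chosen template, e.g. 'CV' -> 'na'."""
    template = rng.choice(TEMPLATES)
    return "".join(
        rng.choice(VOWELS if symbol == "V" else CONSONANTS) for symbol in template
    )


def generate_language(n_paradigms, n_rows, n_cols, bias, seed=0):
    """Return a list of paradigms, each a dict mapping (row, col) to an affix."""
    rng = random.Random(seed)
    # Assumption: the "previous affix" memories are shared across paradigms.
    row_affix, col_affix = {}, {}
    language = []
    for _ in range(n_paradigms):
        paradigm = {}
        for r in range(n_rows):
            for c in range(n_cols):
                candidates = []  # the list L from the slide
                if r in row_affix and rng.random() < bias:
                    candidates.append(row_affix[r])
                if c in col_affix and rng.random() < bias:
                    candidates.append(col_affix[c])
                affix = rng.choice(candidates) if candidates else new_affix(rng)
                paradigm[(r, c)] = affix
                row_affix[r] = affix
                col_affix[c] = affix
        language.append(paradigm)
    return language
```

With a bias of 1.0 every cell after the first reuses a stored affix, so the whole language collapses onto one exponent; with a bias of 0.0 every cell gets a freshly generated affix.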

14
Sample language: bias 0.04
Consonants: x n p w j B t r s S m
Vowels: a i u e
Templates: V, C, CV, VC
15
Sample language: bias 0.04
Consonants: n f r w B s x j D
Vowels: a i u e
Templates: V, C, CV, VC
16
Sample language: bias 0.04
Consonants: r p j d G D
Vowels: a i u e o y O
Templates: V, C, CV, VC, CVC, VCV, CVCV, VCVC
17
Sample language: bias 0.04
Consonants: D k S n b s l t w j B g G d
Vowels: a i u e
Templates: V, C, CV, VC
18
Results of Monte Carlo simulations (8000 runs,
5000 languages per run)
19
Interim conclusion
  • Syncretism, including rules of referral, may
    arise as a chance byproduct of tendencies to
    reuse inflectional exponents --- and hence reduce
    the number of exponents needed in the system.
  • Side question: is the amount of ambiguity among
    inflectional exponents statistically different
    from that among lexemes? (cf. Beard's
    Lexeme-Morpheme-Base Morphology)
  • Probably not, since inflectional exponents tend
    to be shorter, so the chances of collisions are
    much higher

20
Experiment 2: Stabilizing Multiple Paradigms in a
Multiagent Network
21
Paradigm Reduction in Multi-agent Models with
Scale-Free Networks
  • Agents connected in scale-free network
  • Only connected agents communicate
  • Agents more likely to update forms from
    interlocutors they trust
  • Each individual agent has pressure to simplify
    its morphology by collapsing exponents
  • Exponent collapse is picked to minimize an
    increase in paradigm entropy
  • Paradigms may be simplified, removing
    distinctions and thus reducing paradigm entropy
  • As the number of exponents decreases so does the
    pressure to reduce
  • Agents analogize paradigms to other words

22
Scale-free networks
23
Scale-free networks
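  • Connection degrees follow the Yule-Simon
    distribution:
    $f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1)$
    (with $\mathrm{B}$ the beta function)
  • where for sufficiently large k:
    $f(k;\rho) \propto 1/k^{\rho+1}$
  • i.e. reduces to Zipf's law (cf. Baayen, Harald
    (2000) Word Frequency Distributions. Springer.)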
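
As a quick numerical check of the large-k behavior, the standard Yule-Simon probability mass function f(k; ρ) = ρ B(k, ρ+1) can be compared with its power-law approximation ρ Γ(ρ+1) / k^(ρ+1). This is only an illustrative sketch: the function names are hypothetical, and the value of ρ used in the talk is not given.

```python
import math


def yule_simon_pmf(k, rho):
    """Yule-Simon pmf: rho * B(k, rho + 1), computed via log-gamma for stability."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def power_law_tail(k, rho):
    """Large-k approximation: rho * Gamma(rho + 1) / k ** (rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)
```

For ρ = 1 the pmf reduces to 1/(k(k+1)), so the exact value and the approximation 1/k² differ by a factor of k/(k+1), which approaches 1 as k grows.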

24
Scale-free vs. Random: 1000 nodes
25
Relevance of scale-free networks
  • Social networks are scale-free
  • Nodes with multiple connections seem to be
    relevant for language change.
  • cf. James Milroy and Lesley Milroy (1985)
    Linguistic change, social network and speaker
    innovation. Journal of Linguistics, 21:339--384.

26
Scale-free networks in the model
  • Agents communicate individual forms to other
    agents
  • When two agents differ on a form, one agent will
    update its form with a probability p proportional
    to how well connected the other agent is
  • p = MaxP × ConnectionDegree(agent) /
    MaxConnectionDegree
  • (Similar to PageRank)
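
A minimal sketch of this update rule, with hypothetical function names and an explicit random generator passed in; it assumes the degree in the formula is that of the agent being listened to, as the preceding bullet describes.

```python
import random


def update_probability(other_degree, max_degree, max_p):
    """p = MaxP * ConnectionDegree(other agent) / MaxConnectionDegree."""
    return max_p * other_degree / max_degree


def maybe_update(own_form, other_form, other_degree, max_degree, max_p, rng):
    """Adopt the interlocutor's form with probability p when the two forms differ."""
    if own_form != other_form:
        if rng.random() < update_probability(other_degree, max_degree, max_p):
            return other_form
    return own_form
```

The best-connected agent in the network is therefore copied with probability MaxP, and an agent with no connections is never copied.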

27
Paradigm entropy
  • For exponents φ and morphological functions μ,
    define the Paradigm Entropy as
    $H(\mu \mid \phi) = -\sum_{\phi}\sum_{\mu} p(\phi, \mu) \log p(\mu \mid \phi)$
  • (NB: this is really just the conditional
    entropy)
  • If each exponent is unambiguous, the paradigm
    entropy is 0
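
Read as the conditional entropy of the morphological function given the exponent, the quantity can be computed as in the sketch below. The representation is an assumption made for illustration: a paradigm is a dict from slot to exponent, each slot has a probability, and logarithms are base 2.

```python
import math
from collections import defaultdict


def paradigm_entropy(exponents, slot_probs):
    """Conditional entropy H(function | exponent), in bits.

    exponents:  dict mapping slot (morphological function) -> exponent
    slot_probs: dict mapping slot -> probability of that slot
    """
    # Marginal probability of each exponent: sum over the slots it realizes.
    exponent_mass = defaultdict(float)
    for slot, exponent in exponents.items():
        exponent_mass[exponent] += slot_probs[slot]

    entropy = 0.0
    for slot, exponent in exponents.items():
        p_joint = slot_probs[slot]
        if p_joint > 0:
            entropy -= p_joint * math.log2(p_joint / exponent_mass[exponent])
    return entropy
```

A paradigm with a distinct exponent in every slot scores 0, and two equiprobable slots sharing one exponent score 1 bit.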

28
Example
29
Syncretism tends to be most common in rarer
parts of paradigm
30
Old Latin 1st/2nd Declensions
31
Simulation
  • 100 agents in scale-free or random network
  • Roughly 250 connections in either case
  • 20 bases
  • 5 cases, 2 numbers: each slot associated with
    a probability
  • Max probability of updating one's form for a
    given slot given what another agent has is 0.2 or
    0.5
  • Probability of analogizing within one's own
    vocabulary is 0.01, 0.02 or 0.05
  • Also a mode where we force analogy every 50
    iterations
  • Analogize to words within same analogy group (4
    such groups in current simulation)
  • Winner-takes-all strategy
  • (Numbers in the titles of the ensuing plots are
    given as UpdateProb/AnalogyProb (e.g. 0.2/0.01))
  • Run for 1000 iterations

32
Features of simulation
  • At the nth iteration, compute:
  • The paradigm distribution over agents for each
    word
  • Paradigm purity: the proportion of agents holding
    the winning paradigm
  • The number of distinct winning paradigms
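
These per-iteration statistics might be computed along the following lines. The data layout is an assumption: each agent's paradigm for a word is a hashable tuple of exponents, and the function names are hypothetical.

```python
from collections import Counter


def paradigm_purity(agent_paradigms):
    """Return (winning paradigm, share of agents holding it) for one word."""
    counts = Counter(agent_paradigms)
    winner, n_holders = counts.most_common(1)[0]
    return winner, n_holders / len(agent_paradigms)


def distinct_winning_paradigms(paradigms_by_word):
    """Count the distinct winning paradigms across all words.

    paradigms_by_word: dict mapping word -> list of paradigms, one per agent.
    """
    winners = {paradigm_purity(ps)[0] for ps in paradigms_by_word.values()}
    return len(winners)
```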

33
Scale-free Network 0.2/0.01
34
Scale-free network 0.5/0.05
35
Random network 0.5/0.05
36
Scale-free network 0.5/0.05, 5000 runs
37
Random network 0.5/0.05, 5000 runs
38
Scale-free network 0.5/0.00, 5000 runs, no analogy
39
Scale-free network 0.5/0.00, 30,000 runs, no
analogy
40
Sample final state
0.24, 0.21, 0.095, 0.095, 0.06, 0.12, 0.095, 0.048, 0.024, 0.012
41
Adoption of acc/acc/acc/acc/acc/ACC/ACC/ACC/ACC/ACC
in a 0.5/0.05 run
42
Interim conclusions
  • Scale-free networks don't seem to matter:
    convergence behavior seems to be no different
    from a random network
  • Is that a big surprise?
  • Analogy matters
  • Paradigm entropy (conditional entropy) might be a
    model for paradigm simplification

43
Experiment 3: Large-scale multi-agent
evolutionary modeling with learning (work in
progress)
44
Synopsis
  • System is seeded with a grammar and small number
    of agents
  • Initial grammars all show an agglutinative
    pattern
  • Each agent randomly selects a set of phonetic
    rules to apply to forms
  • Agents are assigned to one of a small number of
    social groups
  • 2 parents beget child agents.
  • Children are exposed to a predetermined number of
    training forms combined from both parents
  • Forms are presented proportional to their
    underlying frequency
  • Children must learn to generalize to unseen slots
    for words
  • Learning algorithm similar to:
  • David Yarowsky and Richard Wicentowski (2000)
    "Minimally supervised morphological analysis by
    multimodal alignment." Proceedings of ACL-2000,
    Hong Kong, pages 207-216.
  • Features include last n characters of input form,
    plus semantic class
  • Learners select the optimal surface form to
    derive other forms from (optimal requiring the
    simplest resulting ruleset: a Minimum
    Description Length criterion)
  • Forms are periodically pooled among all agents
    and the n best forms are kept for each word and
    each slot
  • Population grows, but is kept in check by
    natural disasters and a quasi-Malthusian model
    of resource limitations
  • Agents age and die according to reasonably
    realistic mortality statistics
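
The training step for a child agent (pooling both parents' forms and presenting them in proportion to frequency) could be sketched as below. The tuple layout, the function name and the use of sampling with replacement are assumptions for illustration, not details given in the slides.

```python
import random


def sample_training_forms(parent_a, parent_b, n_forms, rng):
    """Draw n_forms training items from the pooled forms of both parents.

    Each parent is a list of (word, slot, surface_form, frequency) tuples.
    Items are drawn with replacement, weighted by their frequency.
    """
    pool = parent_a + parent_b
    weights = [frequency for (_, _, _, frequency) in pool]
    picks = rng.choices(pool, weights=weights, k=n_forms)
    return [(word, slot, form) for (word, slot, form, _) in picks]
```

Because rare slots are drawn rarely, a child may never see some (word, slot) pairs in its training sample; those are the unseen slots it has to fill by generalization.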

45
Population growth, 300 years
46
Phonological rules
  • c_assimilation
  • c_lenition
  • degemination
  • final_cdel
  • n_assimilation
  • r_syllabification
  • umlaut
  • v_nasalization
  • voicing_assimilation
  • vowel_apocope
  • vowel_coalescence
  • vowel_syncope

K ptkbdgmnNfvTDszSZxGCJlrhX
L wy
V aeiouAEIOU@0âêîôûÂÊÎÔÛãõÕ
Regressive voicing assimilation
b -> p / - _ ?ptkfTsSxC
d -> t / - _ ?ptkfTsSxC
g -> k / - _ ?ptkfTsSxC
D -> T / - _ ?ptkfTsSxC
z -> s / - _ ?ptkfTsSxC
Z -> S / - _ ?ptkfTsSxC
G -> x / - _ ?ptkfTsSxC
J -> C / - _ ?ptkfTsSxC

td -> D / aeiouâêîôûã? _ ?aeiouâêîôûã
pb -> v / aeiouâêîôûã? _ ?aeiouâêîôûã
gk -> G / aeiouâêîôûã? _ ?aeiouâêîôûã
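
The two rule sets on this slide can be approximated with regular expressions, as in the sketch below. It is a simplified reading, not the rule engine used in the experiments: the "-" and "?" symbols in the rule contexts are ignored, each rule set is applied in a single simultaneous pass, and the function names are hypothetical.

```python
import re

VOICELESS = "ptkfTsSxC"
DEVOICE = str.maketrans("bdgDzZGJ", "ptkTsSxC")
LENITION_VOWELS = "aeiouâêîôûã"
LENITE = {"t": "D", "d": "D", "p": "v", "b": "v", "g": "G", "k": "G"}

# A voiced obstruent immediately followed by a voiceless obstruent.
VOICING_RE = re.compile(f"[bdgDzZGJ](?=[{VOICELESS}])")
# A stop standing between two vowels of the lenition context.
LENITION_RE = re.compile(f"(?<=[{LENITION_VOWELS}])[tdpbgk](?=[{LENITION_VOWELS}])")


def voicing_assimilation(form):
    """Devoice an obstruent before a voiceless obstruent, e.g. g -> k before s."""
    return VOICING_RE.sub(lambda m: m.group(0).translate(DEVOICE), form)


def lenition(form):
    """Lenite an intervocalic stop, e.g. k -> G between e and o."""
    return LENITION_RE.sub(lambda m: LENITE[m.group(0)], form)
```

Under this reading, `voicing_assimilation("Agsaf")` gives `Aksaf` and `lenition("lArpuxmeko")` gives `lArpuxmeGo`, both of which match forms that appear in the example runs.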
47
Example run
  • Initial paradigm
  • Abog placc Abogmeon
  • Abog pldat Abogmeke
  • Abog plgen Abogmei
  • Abog plnom Abogmeko
  • Abog sgacc Abogaon
  • Abog sgdat Abogake
  • Abog sggen Abogai
  • Abog sgnom Abogako
  • NUMBER: 'a' sg 0.7, 'me' pl 0.3
  • CASE: 'ko' nom 0.4, 'on' acc 0.3, 'i' gen 0.2,
    'ke' dat 0.1
  • PHONRULE_WEIGHTING=0.60
  • NUM_TEACHING_FORMS=1500
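
The initial paradigm above is purely agglutinative: stem, then number affix, then case affix. A small sketch that rebuilds it from the NUMBER and CASE settings follows. Treating a slot's frequency as the product of its number and case probabilities is an assumption, and the function names are hypothetical.

```python
# Affix and probability for each value, as listed on the slide.
NUMBER = {"sg": ("a", 0.7), "pl": ("me", 0.3)}
CASE = {"nom": ("ko", 0.4), "acc": ("on", 0.3), "gen": ("i", 0.2), "dat": ("ke", 0.1)}


def initial_paradigm(stem):
    """Map each slot label (e.g. 'placc') to stem + number affix + case affix."""
    return {
        number + case: stem + number_affix + case_affix
        for number, (number_affix, _) in NUMBER.items()
        for case, (case_affix, _) in CASE.items()
    }


def slot_frequencies():
    """Assumed slot frequency: P(number) * P(case), treating the two as independent."""
    return {
        number + case: p_number * p_case
        for number, (_, p_number) in NUMBER.items()
        for case, (_, p_case) in CASE.items()
    }
```

Here `initial_paradigm("Abog")["placc"]` is `Abogmeon`, as in the listing above, and under the independence assumption the most frequent slot is sgnom (0.7 × 0.4) and the rarest is pldat (0.3 × 0.1).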

48
Behavior of agent 4517 at 300 years
Abog placc Abogmeon
Abog pldat Abogmeke
Abog plgen Abogmei
Abog plnom Abogmeko
Abog sgacc Abogaon
Abog sgdat Abogake
Abog sggen Abogai
Abog sgnom Abogako

Abog placc Abogmeô
Abog pldat Abogmeke
Abog plgen Abogmei
Abog plnom Abogmeko
Abog sgacc Abogaô
Abog sgdat Abogake
Abog sggen Abogai
Abog sgnom Abogako

lArpux placc lArpuxmeô
lArpux pldat lArpuxmeGe
lArpux plgen lArpuxmei
lArpux plnom lArpuxmeGo
lArpux sgacc lArpuxaô
lArpux sgdat lArpuxaGe
lArpux sggen lArpuxai
lArpux sgnom lArpuxaGo

lIdrab placc lIdravmeô
lIdrab pldat lIdrabmeke
lIdrab plgen lIdravmei
lIdrab plnom lIdrabmeGo
lIdrab sgacc lIdravaô
lIdrab sgdat lIdravaGe
lIdrab sggen lIdravai
lIdrab sgnom lIdravaGo

59 paradigms covering 454 lexemes
49
Another run
50
Another run
  • Initial paradigm
  • Adgar placc Adgarmeon
  • Adgar pldat Adgarmeke
  • Adgar plgen Adgarmei
  • Adgar plnom Adgarmeko
  • Adgar sgacc Adgaraon
  • Adgar sgdat Adgarake
  • Adgar sggen Adgarai
  • Adgar sgnom Adgarako
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1500

51
Behavior of agent 5061 at 300 years
Albir placc Elbirmen
Albir pldat ElbirmeGe
Albir plgen Elbirm
Albir plnom ElbirmeGo
Albir sgacc Elbiran
Albir sgdat Elbira
Albir sggen Elbi
Albir sgnom Elbira

Abog placc Abogmeon
Abog pldat Abogmeke
Abog plgen Abogmei
Abog plnom Abogmeko
Abog sgacc Abogaon
Abog sgdat Abogake
Abog sggen Abogai
Abog sgnom Abogako

rIsxuf placc rIsxufamen
rIsxuf pldat rIsxufamke
rIsxuf plgen rIsxufme
rIsxuf plnom rIsxufmeGo
rIsxuf sgacc rIsxufan
rIsxuf sgdat rIsxufaGe
rIsxuf sggen rIsxufa
rIsxuf sgnom rIsxufaGo

Utber placc Ubbermen
Utber pldat UbbermeGe
Utber plgen Ubberme
Utber plnom UbberameGo
Utber sgacc Ubberan
Utber sgdat UbberaGe
Utber sggen Ubbera
Utber sgnom UbberaGo

109 paradigms covering 397 lexemes
52
One more example
53
One more example
  • Initial paradigm as before
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1000

54
Behavior of agent 4195 at 300 years
Abog placc Abogmeon
Abog pldat Abogmeke
Abog plgen Abogmei
Abog plnom Abogmeko
Abog sgacc Abogaon
Abog sgdat Abogake
Abog sggen Abogai
Abog sgnom Abogako

Odeg placc Odm
Odeg pldat Ô
Odeg plgen Odm
Odeg plnom Oxm
Odeg sgacc O
Odeg sgdat O
Odeg sggen O
Odeg sgnom O

fApbof placc fAbofdm
fApbof pldat fAbofm
fApbof plgen fAbofdm
fApbof plnom fAbofxm
fApbof sgacc fAbof
fApbof sgdat fAbof
fApbof sggen fAbof
fApbof sgnom fAbof

dugfIp placc dikfIdm
dugfIp pldat dikfÎ
dugfIp plgen dikfIdm
dugfIp plnom dikfIxm
dugfIp sgacc dikfI
dugfIp sgdat dikfI
dugfIp sggen dikfI
dugfIp sgnom dikfI

unfEr placc ûfEdm
unfEr pldat ûfÊ
unfEr plgen ûfEtm
unfEr plnom ûfExm
unfEr sgacc ûfE
unfEr sgdat ûfE
unfEr sggen ûfE
unfEr sgnom ûfE

exgUp placc exgUdm
exgUp pldat exgÛ
exgUp plgen exgUgm
exgUp plnom exgUxm
exgUp sgacc exgU
exgUp sgdat exgU
exgUp sggen exgU
exgUp sgnom exgU

66 paradigms covering 250 lexemes
55
One final example
56
Final example
  • NUMBER: 'a' sg 0.6, 'tu' du 0.1, 'me' pl 0.3
  • CASE: 'ko' nom 0.4, 'on' acc 0.3, 'i' gen 0.2,
    'ke' dat 0.1
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1000

57
Final example (some agent or other)
Abbus duacc Abbustuon
Abbus dudat Abbustuke
Abbus dugen Abbustui
Abbus dunom Abbustuko
Abbus placc Abbusmeon
Abbus pldat Abbusmeke
Abbus plgen Abbusmei
Abbus plnom Abbusmeko
Abbus sgacc Abbusaon
Abbus sgdat Abbusake
Abbus sggen Abbusai
Abbus sgnom Abbusako

Agsaf duacc Aksaf
Agsaf dudat AkstuG
Agsaf dugen Aksaf
Agsaf dunom Aksaf
Agsaf placc Aksafm
Agsaf pldat Aksafm
Agsaf plgen Aksafm
Agsaf plnom Aksafm
Agsaf sgacc Aksaf
Agsaf sgdat Aksaf
Agsaf sggen Aksaf
Agsaf sgnom Aksaf

mampEl duacc mãpEl
mampEl dudat mãptuG
mampEl dugen mãpEl
mampEl dunom mãpEl
mampEl placc mãpElm
mampEl pldat mãpElrm
mampEl plgen mãpElm
mampEl plnom mãpElm
mampEl sgacc mãpEl
mampEl sgdat mãpEl
mampEl sggen mãpEl
mampEl sgnom mãpEl

odEs duacc odEs
odEs dudat ottuG
odEs dugen odEs
odEs dunom oktuG
odEs placc odEsm
odEs pldat odEsrm
odEs plgen odEsm
odEs plnom odEskm
odEs sgacc odEs
odEs sgdat odEs
odEs sggen odEs
odEs sgnom odEs

rIndar duacc rÎdar
rIndar dudat rÎttuG
rIndar dugen rÎdar
rIndar dunom rÎktuG
rIndar placc rÎdarm
rIndar pldat rÎdarm
rIndar plgen rÎdarm
rIndar plnom rÎdarm
rIndar sgacc rÎdar
rIndar sgdat rÎdar
rIndar sggen rÎdar
rIndar sgnom rÎdar

171 paradigms covering 228 lexemes
58
Questions
  • Are there too many paradigms?
  • Is there too much irregularity?

59
How many paradigms can there be?
  • Russian nouns belong to one of three declension
    patterns. (Wade, Terence (1992) Comprehensive
    Russian Grammar. Blackwell, Oxford)
  • Wade discusses many subclasses
  • From Zaliznjak, A. (1987) Grammaticheskij slovar'
    russkogo jazyka, Russkij jazyk, Moscow:
  • at least 500 classes spread over 55,000 nouns

60
How irregular can things be? Hindi/Urdu Number
Names
61
Future work
  • More realistic learning
  • Incorporate paradigm reduction and analogy
    mechanisms from Experiment 2
  • Add other sources of variation, such as borrowing
    of other forms
  • Develop evaluation metrics
  • Can we go beyond "look Ma, it learns"?

62
Acknowledgments
  • Center for Advanced Studies for release time Fall
    2007
  • The National Science Foundation through TeraGrid
    resources provided by the National Center for
    Supercomputing Applications
  • Google Research grant (for infrastructure
    originally associated with another project)
  • For helpful discussion/suggestions:
  • Chen Li
  • Shalom Lappin
  • Juliette Blevins
  • Les Gasser and the LEADS group
  • Audience at UIUC Linguistics Seminar