Title: Preliminary Experiments in Morphological Evolution
1 (Preliminary) Experiments in Morphological Evolution
- Richard Sproat
- University of Illinois at Urbana-Champaign
- rws_at_uiuc.edu
- 3rd Workshop on "Quantitative Investigations in Theoretical Linguistics" (QITL-3), Helsinki, 2-4 June 2008
2 Overview
- The explananda
- Previous work on evolutionary modeling
- Computational models and preliminary experiments
3 Phenomena
- How do paradigms arise?
- Why do words fall into different inflectional equivalence classes?
- Why do stem alternations arise?
- Why is there syncretism?
- Why are there rules of referral?
4 Stem alternations in Sanskrit
(Table of stem grades: zero, guna)
Examples from Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.
5 Stem alternations in Sanskrit
(Table continued, labels: vrddhi; morphomic (Aronoff, M. 1994. Morphology by Itself. MIT Press); lexeme-class particular)
6 Evolutionary Modeling (a tiny sample)
- Hare, M. and Elman, J. L. (1995) Learning and morphological change. Cognition, 56(1): 61-98.
- Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford.
- Nettle, D. (1999) Using Social Impact Theory to simulate language change. Lingua, 108(2-3): 95-117.
- de Boer, B. (2001) The Origins of Vowel Systems. Oxford.
- Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.
7 Experiment 1: Rules of Referral
8 Rules of referral
- Stump, Gregory (1993) On rules of referral. Language, 69(3): 449-479.
- (After Zwicky, Arnold (1985) How to describe inflection. Berkeley Linguistics Society, 11: 372-386.)
9 Latin declensions
10 Are rules of referral interesting?
- Are they useful for the learner?
- Wouldn't the learner have heard instances of every paradigm?
- Are they historically interesting?
- Does morphological theory need mechanisms to explain why they occur?
11 Another example: Bögüstani nominal declension
(Declension table: Sg / Du / Pl by Nom, Acc, Gen, Dat, Loc, Inst, Abl, Illat, for three paradigms)
- Bögüstani
- A language of Uzbekistan
- ISO 639-3: bgs
- Population: 15,500 (1998 Durieux).
- Comments: Capsicum chinense and Coffea arabica farmers
12 Monte Carlo simulation (generating Bögüstani)
- Select a re-use bias B
- For each language:
- Generate a set of vowels, consonants and affix templates, e.g.:
- a, i, u, e
- n f r w B s x j D
- V, C, CV, VC
- Decide on p paradigms (minimum 3), r rows (minimum 2), c columns (minimum 2)
13 Monte Carlo simulation
- For each paradigm in the language:
- Iterate over (r, c)
- Let α be the previous affix stored for row r; with probability B retain α in L
- Let β be the previous affix stored for column c; with probability B retain β in L
- If L is non-empty, set (r, c) to a random choice from L
- Otherwise generate a new affix for (r, c)
- Store (r, c)'s affix for r and c
- Note that P(new affix) = (1 - B)^2
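The generation procedure above can be sketched in Python. This is a minimal sketch under stated assumptions: it fills a single paradigm (the actual simulation iterates over many paradigms per language, carrying the row and column stores across paradigms), and `new_affix` is a hypothetical helper for drawing a fresh random affix.

```python
import random

def new_affix():
    # Hypothetical affix generator: a random CV affix over a toy inventory.
    return random.choice("nfrw") + random.choice("aiue")

def generate_paradigm(rows, cols, bias):
    """Fill a rows x cols paradigm. For each cell (r, c), with probability
    `bias` retain the affix previously stored for the row (likewise for the
    column); pick randomly among the retained candidates, else generate a
    new affix. Hence P(new affix) = (1 - bias)^2 once both stores exist."""
    row_affix, col_affix = {}, {}
    paradigm = {}
    for r in range(rows):
        for c in range(cols):
            candidates = []
            if r in row_affix and random.random() < bias:
                candidates.append(row_affix[r])
            if c in col_affix and random.random() < bias:
                candidates.append(col_affix[c])
            affix = random.choice(candidates) if candidates else new_affix()
            paradigm[(r, c)] = affix
            row_affix[r] = affix   # store this cell's affix for its row...
            col_affix[c] = affix   # ...and for its column
    return paradigm
```

With bias = 1.0 every cell after the first reuses a stored affix, so the whole paradigm ends up syncretic; with bias = 0.0 every cell receives a fresh affix.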
14 Sample language, bias 0.04
Consonants: x n p w j B t r s S m; Vowels: a i u e; Templates: V, C, CV, VC
15 Sample language, bias 0.04
Consonants: n f r w B s x j D; Vowels: a i u e; Templates: V, C, CV, VC
16 Sample language, bias 0.04
Consonants: r p j d G D; Vowels: a i u e o y O; Templates: V, C, CV, VC, CVC, VCV, CVCV, VCVC
17 Sample language, bias 0.04
Consonants: D k S n b s l t w j B g G d; Vowels: a i u e; Templates: V, C, CV, VC
18 Results of Monte Carlo simulations (8000 runs, 5000 languages per run)
19 Interim conclusion
- Syncretism, including rules of referral, may arise as a chance byproduct of tendencies to reuse inflectional exponents (and hence to reduce the number of exponents needed in the system).
- Side question: is the amount of ambiguity among inflectional exponents statistically different from that among lexemes? (cf. Beard's Lexeme-Morpheme Base Morphology)
- Probably not, since inflectional exponents tend to be shorter, so the chances of collisions are much higher
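The collision claim can be checked with a birthday-problem estimate. A sketch, assuming exponents are drawn uniformly at random from the space of strings of a fixed length over a fixed alphabet (`collision_prob` is a hypothetical helper, not part of the simulation):

```python
def collision_prob(n_items, alphabet_size, length):
    """Birthday-problem estimate: probability that at least two of
    n_items uniformly random strings of the given length collide."""
    space = alphabet_size ** length
    p_distinct = 1.0
    for i in range(n_items):
        # chance the (i+1)-th draw avoids all i previous ones
        p_distinct *= (space - i) / space
    return 1.0 - p_distinct
```

For 50 items over a 10-symbol alphabet, length-2 strings collide almost certainly, while length-5 strings (lexeme-like) collide only rarely, which is the asymmetry the slide points to.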
20 Experiment 2: Stabilizing Multiple Paradigms in a Multiagent Network
21 Paradigm Reduction in Multi-agent Models with Scale-Free Networks
- Agents connected in a scale-free network
- Only connected agents communicate
- Agents are more likely to update forms from interlocutors they trust
- Each individual agent has pressure to simplify its morphology by collapsing exponents
- Exponent collapse is picked to minimize any increase in paradigm entropy
- Paradigms may be simplified, removing distinctions and thus reducing paradigm entropy
- As the number of exponents decreases, so does the pressure to reduce
- Agents analogize paradigms to other words
22 Scale-free networks
23 Scale-free networks
- Connection degrees follow the Yule-Simon distribution: p(k) = ρ B(k, ρ + 1), with B the Beta function
- where for sufficiently large k, p(k) ∝ k^-(ρ + 1)
- i.e. it reduces to Zipf's law (cf. Baayen, Harald (2000) Word Frequency Distributions. Springer.)
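Degree distributions of this shape can be produced by preferential attachment. The slides do not say how the networks were generated, so this Barabási-Albert-style growth procedure is an assumption, sketched in plain Python:

```python
import random

def preferential_attachment(n_nodes, m=2, seed=None):
    """Grow a scale-free graph: each new node attaches to m distinct
    existing nodes chosen with probability proportional to their current
    degree. Returns the edge list."""
    rng = random.Random(seed)
    edges = []
    # Repeated-node list: sampling uniformly from it is degree-biased.
    degree_pool = list(range(m))
    for new in range(m, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(degree_pool))
        for target in chosen:
            edges.append((new, target))
            degree_pool.extend([new, target])  # each endpoint gains a degree
    return edges
```

The `degree_pool` trick makes degree-proportional sampling O(1) per draw; early nodes accumulate connections and become the hubs characteristic of scale-free networks.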
24 Scale-free vs. Random, 1000 nodes
25 Relevance of scale-free networks
- Social networks are scale-free
- Nodes with multiple connections seem to be relevant for language change
- cf. James Milroy and Lesley Milroy (1985) Linguistic change, social network and speaker innovation. Journal of Linguistics, 21: 339-384.
26 Scale-free networks in the model
- Agents communicate individual forms to other agents
- When two agents differ on a form, one agent will update its form with a probability p proportional to how well connected the other agent is
- p = MaxP × ConnectionDegree(agent) / MaxConnectionDegree
- (Similar to PageRank)
27 Paradigm entropy
- For exponents f and morphological functions µ, define the Paradigm Entropy as
- H(µ | f) = - Σ_f P(f) Σ_µ P(µ | f) log P(µ | f)
- (NB: this is really just the conditional entropy)
- If each exponent is unambiguous, the paradigm entropy is 0
28 Example
29 Syncretism tends to be most common in rarer parts of the paradigm
30 Old Latin 1st/2nd Declensions
31 Simulation
- 100 agents in a scale-free or random network
- Roughly 250 connections in either case
- 20 bases
- 5 cases, 2 numbers; each slot associated with a probability
- Max probability of updating one's form for a given slot, given what another agent has, is 0.2 or 0.5
- Probability of analogizing within one's own vocabulary is 0.01, 0.02 or 0.05
- Also a mode where we force analogy every 50 iterations
- Analogize to words within the same analogy group (4 such groups in the current simulation)
- Winner-takes-all strategy
- (Numbers in the titles of the ensuing plots are given as UpdateProb/AnalogyProb (e.g. 0.2/0.01))
- Run for 1000 iterations
32 Features of simulation
- At the nth iteration, compute:
- The paradigm distribution over agents for each word
- Paradigm purity: the proportion of the winning paradigm
- The number of distinct winning paradigms
33 Scale-free network, 0.2/0.01
34 Scale-free network, 0.5/0.05
35 Random network, 0.5/0.05
36 Scale-free network, 0.5/0.05, 5000 runs
37 Random network, 0.5/0.05, 5000 runs
38 Scale-free network, 0.5/0.00, 5000 runs, no analogy
39 Scale-free network, 0.5/0.00, 30,000 runs, no analogy
40 Sample final state
(Plot: proportions of surviving paradigms, e.g. 0.24, 0.21, 0.095, 0.095, 0.06, 0.12, 0.095, 0.048, 0.024, 0.012)
41 Adoption of acc/acc/acc/acc/acc/ACC/ACC/ACC/ACC/ACC in a 0.5/0.05 run
42 Interim conclusions
- Scale-free networks don't seem to matter: convergence behavior seems to be no different from a random network
- Is that a big surprise?
- Analogy matters
- Paradigm entropy (conditional entropy) might be a model for paradigm simplification
43 Experiment 3: Large-scale multi-agent evolutionary modeling with learning (work in progress)
44 Synopsis
- System is seeded with a grammar and a small number of agents
- Initial grammars all show an agglutinative pattern
- Each agent randomly selects a set of phonetic rules to apply to forms
- Agents are assigned to one of a small number of social groups
- 2 parents beget child agents
- Children are exposed to a predetermined number of training forms combined from both parents
- Forms are presented in proportion to their underlying frequency
- Children must learn to generalize to unseen slots for words
- Learning algorithm similar to:
- David Yarowsky and Richard Wicentowski (2000) "Minimally supervised morphological analysis by multimodal alignment." Proceedings of ACL-2000, Hong Kong, pages 207-216.
- Features include the last n characters of the input form, plus semantic class
- Learners select the optimal surface form to derive other forms from ("optimal" meaning requiring the simplest resulting ruleset: a Minimum Description Length criterion)
- Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot
- Population grows, but is kept in check by "natural disasters" and a quasi-Malthusian model of resource limitations
- Agents age and die according to reasonably realistic mortality statistics
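The base-form selection step above can be caricatured with a crude MDL criterion: pick the slot whose forms derive all other slots with the fewest distinct suffix-replacement rules. This is a sketch only, with hypothetical names (`pick_base_slot`, `suffix_rule`); the actual Yarowsky-Wicentowski-style learner uses much richer alignments and features.

```python
def suffix_rule(base, derived):
    """Smallest rewrite 'strip X, add Y' mapping base -> derived,
    found by peeling off the longest common prefix."""
    i = 0
    while i < min(len(base), len(derived)) and base[i] == derived[i]:
        i += 1
    return (base[i:], derived[i:])

def pick_base_slot(lexicon):
    """lexicon: {word: {slot: form}}, all words sharing the same slots.
    Returns (best_slot, rule_count): the slot whose forms derive every
    other slot with the fewest distinct rules (a crude MDL criterion)."""
    slots = next(iter(lexicon.values())).keys()
    best = None
    for base_slot in slots:
        rules = set()
        for forms in lexicon.values():
            base = forms[base_slot]
            for slot, form in forms.items():
                if slot != base_slot:
                    rules.add((slot, suffix_rule(base, form)))
        if best is None or len(rules) < best[1]:
            best = (base_slot, len(rules))
    return best
```

On a regular agglutinative toy lexicon every base choice needs only one rule per non-base slot; irregularity shows up as extra rules, which is what the description-length criterion penalizes.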
45 Population growth, 300 years
46 Phonological rules
- c_assimilation
- c_lenition
- degemination
- final_cdel
- n_assimilation
- r_syllabification
- umlaut
- v_nasalization
- voicing_assimilation
- vowel_apocope
- vowel_coalescence
- vowel_syncope
Character classes: K = p t k b d g m n N f v T D s z S Z x G C J l r h X; L = w y; V = a e i o u A E I O U @ 0 â ê î ô û Â Ê Î Ô Û ã õ Õ
Regressive voicing assimilation:
b -> p / _ [p t k f T s S x C]
d -> t / _ [p t k f T s S x C]
g -> k / _ [p t k f T s S x C]
D -> T / _ [p t k f T s S x C]
z -> s / _ [p t k f T s S x C]
Z -> S / _ [p t k f T s S x C]
G -> x / _ [p t k f T s S x C]
J -> C / _ [p t k f T s S x C]
Lenition:
t, d -> D / [a e i o u â ê î ô û ã] _ [a e i o u â ê î ô û ã]
p, b -> v / [a e i o u â ê î ô û ã] _ [a e i o u â ê î ô û ã]
g, k -> G / [a e i o u â ê î ô û ã] _ [a e i o u â ê î ô û ã]
47 Example run
- Initial paradigm:
- Abog pl acc Abogmeon
- Abog pl dat Abogmeke
- Abog pl gen Abogmei
- Abog pl nom Abogmeko
- Abog sg acc Abogaon
- Abog sg dat Abogake
- Abog sg gen Abogai
- Abog sg nom Abogako
- NUMBER: 'a' sg 0.7; 'me' pl 0.3
- CASE: 'ko' nom 0.4; 'on' acc 0.3; 'i' gen 0.2; 'ke' dat 0.1
- PHONRULE_WEIGHTING = 0.60
- NUM_TEACHING_FORMS = 1500
48 Behavior of agent 4517 at 300 years
Abog: pl acc Abogmeon, pl dat Abogmeke, pl gen Abogmei, pl nom Abogmeko, sg acc Abogaon, sg dat Abogake, sg gen Abogai, sg nom Abogako
Abog: pl acc Abogmeô, pl dat Abogmeke, pl gen Abogmei, pl nom Abogmeko, sg acc Abogaô, sg dat Abogake, sg gen Abogai, sg nom Abogako
lArpux: pl acc lArpuxmeô, pl dat lArpuxmeGe, pl gen lArpuxmei, pl nom lArpuxmeGo, sg acc lArpuxaô, sg dat lArpuxaGe, sg gen lArpuxai, sg nom lArpuxaGo
lIdrab: pl acc lIdravmeô, pl dat lIdrabmeke, pl gen lIdravmei, pl nom lIdrabmeGo, sg acc lIdravaô, sg dat lIdravaGe, sg gen lIdravai, sg nom lIdravaGo
59 paradigms covering 454 lexemes
49 Another run
50 Another run
- Initial paradigm:
- Adgar pl acc Adgarmeon
- Adgar pl dat Adgarmeke
- Adgar pl gen Adgarmei
- Adgar pl nom Adgarmeko
- Adgar sg acc Adgaraon
- Adgar sg dat Adgarake
- Adgar sg gen Adgarai
- Adgar sg nom Adgarako
- PHONRULE_WEIGHTING = 0.80
- NUM_TEACHING_FORMS = 1500
51 Behavior of agent 5061 at 300 years
Albir: pl acc Elbirmen, pl dat ElbirmeGe, pl gen Elbirm, pl nom ElbirmeGo, sg acc Elbiran, sg dat Elbira, sg gen Elbi, sg nom Elbira
Abog: pl acc Abogmeon, pl dat Abogmeke, pl gen Abogmei, pl nom Abogmeko, sg acc Abogaon, sg dat Abogake, sg gen Abogai, sg nom Abogako
rIsxuf: pl acc rIsxufamen, pl dat rIsxufamke, pl gen rIsxufme, pl nom rIsxufmeGo, sg acc rIsxufan, sg dat rIsxufaGe, sg gen rIsxufa, sg nom rIsxufaGo
Utber: pl acc Ubbermen, pl dat UbbermeGe, pl gen Ubberme, pl nom UbberameGo, sg acc Ubberan, sg dat UbberaGe, sg gen Ubbera, sg nom UbberaGo
109 paradigms covering 397 lexemes
52 One more example
53 One more example
- Initial paradigm as before
- PHONRULE_WEIGHTING = 0.80
- NUM_TEACHING_FORMS = 1000
54 Behavior of agent 4195 at 300 years
Abog: pl acc Abogmeon, pl dat Abogmeke, pl gen Abogmei, pl nom Abogmeko, sg acc Abogaon, sg dat Abogake, sg gen Abogai, sg nom Abogako
Odeg: pl acc Odm, pl dat Ô, pl gen Odm, pl nom Oxm, sg acc O, sg dat O, sg gen O, sg nom O
fApbof: pl acc fAbofdm, pl dat fAbofm, pl gen fAbofdm, pl nom fAbofxm, sg acc fAbof, sg dat fAbof, sg gen fAbof, sg nom fAbof
dugfIp: pl acc dikfIdm, pl dat dikfÎ, pl gen dikfIdm, pl nom dikfIxm, sg acc dikfI, sg dat dikfI, sg gen dikfI, sg nom dikfI
unfEr: pl acc ûfEdm, pl dat ûfÊ, pl gen ûfEtm, pl nom ûfExm, sg acc ûfE, sg dat ûfE, sg gen ûfE, sg nom ûfE
exgUp: pl acc exgUdm, pl dat exgÛ, pl gen exgUgm, pl nom exgUxm, sg acc exgU, sg dat exgU, sg gen exgU, sg nom exgU
66 paradigms covering 250 lexemes
55 One final example
56 Final example
- NUMBER: 'a' sg 0.6; 'tu' du 0.1; 'me' pl 0.3
- CASE: 'ko' nom 0.4; 'on' acc 0.3; 'i' gen 0.2; 'ke' dat 0.1
- PHONRULE_WEIGHTING = 0.80
- NUM_TEACHING_FORMS = 1000
57 Final example (some agent or other)
Abbus: du acc Abbustuon, du dat Abbustuke, du gen Abbustui, du nom Abbustuko, pl acc Abbusmeon, pl dat Abbusmeke, pl gen Abbusmei, pl nom Abbusmeko, sg acc Abbusaon, sg dat Abbusake, sg gen Abbusai, sg nom Abbusako
Agsaf: du acc Aksaf, du dat AkstuG, du gen Aksaf, du nom Aksaf, pl acc Aksafm, pl dat Aksafm, pl gen Aksafm, pl nom Aksafm, sg acc Aksaf, sg dat Aksaf, sg gen Aksaf, sg nom Aksaf
mampEl: du acc mãpEl, du dat mãptuG, du gen mãpEl, du nom mãpEl, pl acc mãpElm, pl dat mãpElrm, pl gen mãpElm, pl nom mãpElm, sg acc mãpEl, sg dat mãpEl, sg gen mãpEl, sg nom mãpEl
odEs: du acc odEs, du dat ottuG, du gen odEs, du nom oktuG, pl acc odEsm, pl dat odEsrm, pl gen odEsm, pl nom odEskm, sg acc odEs, sg dat odEs, sg gen odEs, sg nom odEs
rIndar: du acc rÎdar, du dat rÎttuG, du gen rÎdar, du nom rÎktuG, pl acc rÎdarm, pl dat rÎdarm, pl gen rÎdarm, pl nom rÎdarm, sg acc rÎdar, sg dat rÎdar, sg gen rÎdar, sg nom rÎdar
171 paradigms covering 228 lexemes
58 Questions
- Are there too many paradigms?
- Is there too much irregularity?
59 How many paradigms can there be?
- Russian nouns belong to one of three declension patterns. (Wade, Terence (1992) A Comprehensive Russian Grammar. Blackwell, Oxford)
- Wade discusses many subclasses
- From Zaliznjak, A. (1987) Grammaticheskij slovar' russkogo jazyka. Russkij jazyk, Moscow:
- at least 500 classes spread over 55,000 nouns
60 How irregular can things be? Hindi/Urdu Number Names
61 Future work
- More realistic learning
- Incorporate paradigm reduction and analogy mechanisms from Experiment 2
- Add other sources of variation, such as borrowing of other forms
- Develop evaluation metrics
- Can we go beyond "look Ma, it learns"?
62 Acknowledgments
- Center for Advanced Studies, for release time Fall 2007
- The National Science Foundation, through TeraGrid resources provided by the National Center for Supercomputing Applications
- Google Research grant (for infrastructure originally associated with another project)
- For helpful discussion/suggestions:
- Chen Li
- Shalom Lappin
- Juliette Blevins
- Les Gasser and the LEADS group
- Audience at UIUC Linguistics Seminar