Title: A Lexical Theory of Variation
1 A Lexical Theory of Variation
Andries W. Coetzee
Workshop on Variation, Gradience and Frequency in Phonology
Stanford University, July 2007
2- Things that are known to influence variation
- Grammar
  - Where: where variation appears and where it does not
  - Frequency: how often a process applies in a given context
- Lexical frequency
  - Some variable processes affect frequent words more; others affect infrequent words more.
- Extra-grammatical factors
  - Speech style, speech rate, etc.
3- Existing theories of variation
- Grammatical
  - Variable rules in the Labovian tradition (Labov 1972; Sankoff 1988)
  - Several OT models (Anttila 1997; Boersma and Hayes 2000; Coetzee 2006; Reynolds 1994)
  - Reasonably successful at accounting for the grammatical influence.
- Usage-based/exemplar models (Bybee 2001, 2002; Pierrehumbert 2001)
  - Reasonably successful at accounting for the influence of lexical frequency.
- Interaction between the two
  - Models that incorporate both are still largely absent.
4- Structure of the presentation
- Usage frequency and variation
- The basics of the proposal
- Phonetically motivated variation
- Analogically motivated variation
- Learning lexical distributions
5 Usage Frequency and Variation
6- Phonetically motivated variable process
- Typical phonological process
- Applies more often to lexical items with higher usage frequency
- Example: t/d deletion
  - Pre-C: west bank → wes bank
  - Pre-V: west end → wes end
  - Pre-Pause: west → wes

Influence of frequency (Bybee 2000: 70)
             High frequency   Low frequency
n            1,650            399
% deleted    54.4             34.4

Chicano English (Santa Ana 1991)
             Pre-C    Pre-V    Pre-Pause
n            3,693    1,574    1,024
% deleted    62       45       37
7- Analogically motivated variable process
- Usually some kind of regularization process: irregular plural/past tense replaced with regular
- Applies more often to lexical items with lower usage frequency
- Example: regularization of past tense verbs
  - Infrequent verbs are more likely to regularize (Hooper 1976: 100; Bybee 1985: 120, 2002: 269; Bybee & Slobin 1982)
  - Kucera and Francis (1982) frequencies, as calculated at www.iphod.com.
  - Also many examples from the historical literature (Phillips 1984, 2001 and references therein).

Less likely to regularize      More likely to regularize
Present   Raw   Log            Present   Raw   Log
keep      348   2.54           creep     19    1.28
leave     345   2.54           leap      20    1.30
sleep     106   2.03           weep      22    1.34
drive     174   2.24           dive      32    1.51
8- The challenge
- A formal theory of variation that
  - Captures the role of grammar
    - Determines what kind of variation is possible
    - Influences the frequency of application
  - Captures the role of lexical frequency
    - A variable process applies differently to different lexical items.
    - Different kinds of processes are differently influenced by lexical frequency.
9 The Proposal: Variation Through Lexical Indexation
10- Variable lexical indexation
- Lexically indexed constraints (Pater 1994, 2000; Itô & Mester 1995, 1999)
  - Allow a way in for lexical influence
  - Yet still keep control in the hands of the grammar
- Variation through variable lexical class affiliation
  - Note that the grammar stays constant; what varies is the lexical class affiliation of lexical items. Variation is hence moved from the grammar into the lexicon.

                      MAX-L2   M    MAX-L1
/west/L2   ☞ west              *
             wes      *!
/west/L1     west              *!
           ☞ wes                    *
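The evaluation in this tableau can be sketched in Python. This is a minimal illustration, not the author's implementation: the function names `violations` and `winner` are hypothetical, M is assumed to penalize the candidate that keeps the final t, and an indexed MAX constraint is assumed to be violated only when deletion applies to an item carrying the matching index.

```python
# Ranking from the tableau, highest-ranked constraint first.
RANKING = ["MAX-L2", "M", "MAX-L1"]

def violations(candidate, lexical_class):
    """Return one violation count per constraint, in ranking order."""
    deleted = candidate == "wes"
    profile = []
    for constraint in RANKING:
        if constraint == "M":
            # Assumption: M penalizes the faithful candidate that keeps final t.
            profile.append(0 if deleted else 1)
        else:
            # MAX-Lk is violated only by deletion in an item indexed to class Lk.
            index = constraint.split("-")[1]
            profile.append(1 if deleted and index == lexical_class else 0)
    return tuple(profile)

def winner(lexical_class, candidates=("west", "wes")):
    """Pick the candidate whose violation profile is lexicographically smallest."""
    return min(candidates, key=lambda c: violations(c, lexical_class))

for cls in ("L2", "L1"):
    print(f"/west/{cls} -> {winner(cls)}")
```

Comparing violation tuples lexicographically mirrors strict domination: a single violation of a higher-ranked constraint outweighs any number of lower-ranked ones.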
11- Lexical distribution functions
- What determines the lexical class affiliation of a lexical item?
  - Each lexical item is stored with a probability density function.
  - Every time a lexical item is submitted to the grammar for evaluation, a value is chosen randomly along the x-axis of the distribution function.
  - The x-axis is divided into equally sized adjacent regions corresponding to the number of indexed versions of the constraint.
- Correlation between frequency and skewness of distribution function
  - Frequent lexical items: left-skewed function
  - Infrequent lexical items: right-skewed function
[Figure: distribution functions for low-, average-, and high-frequency items; x-axis divided into regions L2 and L1]
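The sampling step described on this slide can be sketched as follows. This is a hedged illustration only: the function names, the use of a Beta distribution on [0, 1] at this point, and the shape values 3.0 and 2.0 are assumptions made for the example, and the region order (highest index leftmost, L1 rightmost) is read off the figure labels.

```python
import random

def sample_lexical_class(alpha, beta, n_classes=2, rng=random):
    """Draw x in [0, 1] from Beta(alpha, beta) and map it to a lexical class.

    The x-axis is cut into n_classes equal regions; the leftmost region is
    the highest index (e.g. L2) and the rightmost region is L1.
    """
    x = rng.betavariate(alpha, beta)
    region = min(int(x * n_classes), n_classes - 1)  # 0 = leftmost region
    return f"L{n_classes - region}"

def share_of_class(target, alpha, beta, n_draws=2000, seed=0, n_classes=2):
    """Estimate how often an item lands in the target class over many draws."""
    rng = random.Random(seed)
    hits = sum(
        sample_lexical_class(alpha, beta, n_classes, rng) == target
        for _ in range(n_draws)
    )
    return hits / n_draws

# Hypothetical shape values: alpha > beta is left skewed (frequent item),
# alpha < beta is right skewed (infrequent item).
print("frequent item, share of L1:  ", share_of_class("L1", 3.0, 2.0))
print("infrequent item, share of L1:", share_of_class("L1", 2.0, 3.0))
```

Under these assumptions a left-skewed function puts most of its mass in the L1 region, so the frequent item is indexed to the lowest-ranked faithfulness constraint more often.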
12 Example 1: Phonetically Motivated Variation
13 t/d-deletion again

Frequency
             High frequency   Low frequency
n            1,650            399
% deleted    54.4             34.4

Context
             Pre-C    Pre-V    Pre-Pause
n            3,693    1,574    1,024
% deleted    62       45       37

- Grammar
  - Markedness constraints
    - PRE-C: No t/d in the context C_C
    - PRE-V: No t/d in the context C_V
    - PRE-PAUSE: No t/d in the context C_ before a pause
    - Contextual licensing constraints à la Steriade (1997)
  - Four indexed versions of MAX.
  - Ranking
    - MAX-L4 » PRE-C » MAX-L3 » PRE-V » MAX-L2 » PRE-PAUSE » MAX-L1
14- The grammar in Pre-C condition
- Preservation if MAX-L4; deletion if MAX-L3, MAX-L2, MAX-L1

                           MAX-L4  PRE-C  MAX-L3  PRE-V  MAX-L2  PRE-PAUSE  MAX-L1
/westL4 bank/  ☞ west bank         *
                 wes bank  *!
/westL3 bank/    west bank         *!
               ☞ wes bank                 *
/westL2 bank/    west bank         *!
               ☞ wes bank                                *
/westL1 bank/    west bank         *!
               ☞ wes bank                                                   *
15- The grammar in Pre-V condition
- Preservation if MAX-L4, MAX-L3; deletion if MAX-L2, MAX-L1

                           MAX-L4  PRE-C  MAX-L3  PRE-V  MAX-L2  PRE-PAUSE  MAX-L1
/westL4 end/   ☞ west end                         *
                 wes end   *!
/westL3 end/   ☞ west end                         *
                 wes end                  *!
/westL2 end/     west end                         *!
               ☞ wes end                                 *
/westL1 end/     west end                         *!
               ☞ wes end                                                    *
16- The grammar in Pre-Pause condition
- Preservation if MAX-L4, MAX-L3, MAX-L2; deletion if MAX-L1

                           MAX-L4  PRE-C  MAX-L3  PRE-V  MAX-L2  PRE-PAUSE  MAX-L1
/westL4/       ☞ west                                            *
                 wes       *!
/westL3/       ☞ west                                            *
                 wes                      *!
/westL2/       ☞ west                                            *
                 wes                                     *!
/westL1/         west                                            *!
               ☞ wes                                                        *
17- Likelihood of deletion based on grammar alone
- Grammar: MAX-L4 » PRE-C » MAX-L3 » PRE-V » MAX-L2 » PRE-PAUSE » MAX-L1
- Note that grammar determines
  - What variation is observed: only a process that reduces markedness, only a process that is grammatically motivated.
  - How frequently the process applies in which context.
- But we still need to give the lexicon its due.

Context     Example     Indexation resulting   Indexation resulting   % of indexations
                        in retention           in deletion            resulting in deletion
Pre-C       west side   L4                     L3, L2, L1             75
Pre-V       west end    L4, L3                 L2, L1                 50
Pre-Pause   west        L4, L3, L2             L1                     25
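The percentages in this table follow directly from the ranking, as the short Python sketch below shows. It assumes that, on the grammar alone, every indexation is equally likely; the function names are hypothetical, and PRE-PAUSE is this sketch's label for the pre-pausal markedness constraint.

```python
# Ranking from the slide, highest-ranked constraint first.
RANKING = ["MAX-L4", "PRE-C", "MAX-L3", "PRE-V", "MAX-L2", "PRE-PAUSE", "MAX-L1"]
CONTEXT_CONSTRAINT = {"Pre-C": "PRE-C", "Pre-V": "PRE-V", "Pre-Pause": "PRE-PAUSE"}
INDEXES = ["L4", "L3", "L2", "L1"]

def deletes(context, index):
    """Deletion wins when the context's markedness constraint outranks MAX-<index>."""
    markedness_rank = RANKING.index(CONTEXT_CONSTRAINT[context])
    faithfulness_rank = RANKING.index(f"MAX-{index}")
    return markedness_rank < faithfulness_rank

def deletion_share(context):
    """Percentage of the four indexations that result in deletion."""
    deleting = [index for index in INDEXES if deletes(context, index)]
    return 100 * len(deleting) / len(INDEXES)

for context in CONTEXT_CONSTRAINT:
    retained = [i for i in INDEXES if not deletes(context, i)]
    deleted = [i for i in INDEXES if deletes(context, i)]
    print(context, "retention:", retained, "deletion:", deleted,
          "share:", deletion_share(context))
```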
18- The influence of lexical frequency
- Frequencies from Francis and Kucera (1982), calculated at www.iphod.com.

                        Raw frequency   Log frequency   Expected deletion
Infrequent     vest     4               0.60            Low
Intermediate   modest   29              1.46            Medium
Frequent       best     361             2.56            High
Mean                    29.8            1.47

[Figure: lexical distribution functions for best, modest, and vest; axis labels: MAX-L4, MAX-L3, MAX-L2, MAX-L1, PRE-C, PRE-V, PRE-PAUSE]
19 Example 2: Analogically Motivated Variation
20- Regularization of the strong past tense in English
- Specific examples from Kucera and Francis (1982) (www.iphod.com)
- Irregular morphology/suppletion as allomorphy
  - Two morphological options for formation of the past tense.
  - Both options are input to the grammar, so that choice of the one allomorph does not violate faithfulness relative to the other (Anttila 1997; Bonet 2004; Itô and Mester 2006; Kager 1996; Mascaró 1996; etc.).
- Constraints
  - OO-FAITH: Some kind of paradigm uniformity (Benua 2000; Kenstowicz 1996; etc.)
  - USELISTED: The input of a candidate must be a single lexical entry (Zuraw 2000)

         Base
         Raw    Log     Regular past   Strong past   % regular
speed    91     1.96    3              9             25
dive     32     1.51    5              4             56
leap     20     1.30    20             2             91
mean     29.8   1.47
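The derived columns of this table can be recomputed from the raw counts. A small Python sketch, with a hypothetical function name, assuming the last column is the share of regular past forms among all past forms and that Log is the base-10 logarithm of the raw base frequency:

```python
import math

# (raw base frequency, regular past count, strong past count), from the table
VERBS = {"speed": (91, 3, 9), "dive": (32, 5, 4), "leap": (20, 20, 2)}

def percent_regular(regular, strong):
    """Share of regular past-tense forms among all past-tense forms, in percent."""
    return 100 * regular / (regular + strong)

for verb, (raw, regular, strong) in VERBS.items():
    print(f"{verb:6s} log10 frequency = {math.log10(raw):.2f}  "
          f"regular = {percent_regular(regular, strong):.0f}%")
```

The less frequent the base, the larger the regular share, which is the pattern the slide attributes to analogically motivated variation.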
21
                                  OO-FAITH-L2  USELISTED  OO-FAITH-L1
/leapL1 ed/, /leaptL1/    leaped               *!
OO-Base leap            ☞ leapt                           *
/leapL2 ed/, /leaptL2/  ☞ leaped               *
OO-Base leap              leapt   *!

- And the influence from the lexicon
[Figure: lexical distribution functions for dive, leap, and speed; axis labels: OO-FAITH-L2, OO-FAITH-L1, USELISTED]
22 Lexical Distribution Functions
23- What needs to be learned?
- Grammar: ranking between constraints
- Lexicon: lexical items, with their probabilistic distribution functions.
- These are two separate learning problems, each with its own solution.
- Learning the grammar
  - Well-developed learnability literature in OT (Tesar and Smolensky 1998, 2000, etc.)
  - And specifically on learning an indexed grammar (Pater 2006, to appear).
  - I will therefore not dwell on this aspect here.
- Learning the lexicon
  - Focus here on how the lexical distribution functions might be acquired.
24- General properties of lexical distribution functions
- MAX L1
- DEP L1 L2 L3
- IDENTF L1 L2 L3 L4
[Figure: lexical distribution functions for average, frequent, and infrequent items]
25- General properties of lexical distribution functions
- Basic requirements
  - Minimum and maximum value.
  - Shape parameters that determine skewness
- Beta distribution (Evans, Hastings & Peacock 2000)
  - α = β → symmetric
  - α < β → right skewed
  - α > β → left skewed
[Figure: Beta distributions for frequent, average, and infrequent items]
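For reference, the standard Beta density on [0, 1] and its skewness make the three cases explicit. These are textbook formulas rather than material from the slides; B(α, β) denotes the Beta function and γ₁ the skewness.

```latex
f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad 0 \le x \le 1,\quad \alpha,\beta > 0

\gamma_1 = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}
```

The sign of γ₁ is the sign of β − α: it is zero when α = β (symmetric), positive when α < β (right skewed), and negative when α > β (left skewed).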
26- A small-scale simulation
- IPhOD 1.3 (www.iphod.com)
  - 33,432 words, with CMU transcriptions and Kucera-Francis frequencies
  - Multiplied KF by 10 to avoid having to work with log(1)
- Calculated the following
  - Mean frequency of all words in IPhOD: μ = 297.89; log(μ) = 2.47.
  - Collected all words that end in -Ct or -Cd, excluding past tense verbs, and took the log of the frequency for each of these.
- Distribution functions

Frequency            Skewness        α        β
frequent (f > μ)     left (α > β)    log(f)   log(μ)
infrequent (f < μ)   right (α < β)   log(μ)   log(f)
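A minimal Python sketch of the frequency-to-shape mapping follows. The cell order of the infrequent row in the table above is ambiguous in this extracted version, so the sketch makes one labeled assumption: α = log10(f) and β = log10(μ) for every word, which gives α > β (left skew) exactly when f > μ and α < β (right skew) when f < μ. The function names are hypothetical; the example frequencies are the ones listed on the next slide (KF × 10).

```python
import math

MEAN_FREQ = 297.89  # mean IPhOD frequency after multiplying KF by 10

def shape_parameters(freq, mean_freq=MEAN_FREQ):
    """Assumed mapping: alpha = log10(word frequency), beta = log10(mean frequency)."""
    return math.log10(freq), math.log10(mean_freq)

def skew_direction(alpha, beta):
    """Skew of Beta(alpha, beta): left if alpha > beta, right if alpha < beta."""
    if alpha > beta:
        return "left"
    if alpha < beta:
        return "right"
    return "symmetric"

EXAMPLES = {"aghast": 10, "vest": 40, "modest": 290, "best": 3610, "most": 11610}

for word, freq in EXAMPLES.items():
    alpha, beta = shape_parameters(freq)
    print(f"{word:7s} alpha = {alpha:.2f}  beta = {beta:.2f}  "
          f"skew = {skew_direction(alpha, beta)}")
```

Because the lowest scaled frequency is 10, the smallest α under this mapping is 1, so both shape parameters stay positive as the Beta distribution requires.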
27
            aghast   vest   modest   best   most    μ
Frequency   10       40     290      3610   11610   297.89
Log         1        1.60   2.46     3.56   4.07    2.47

[Figure: lexical distribution functions for aghast, most, modest, best, and vest; axis labels: MAX-L4, MAX-L3, MAX-L2, MAX-L1, PRE-C, PRE-V, PRE-PAUSE]
28- How well do the predictions line up with reality?
- Once the values of α and β for a word are known, it is easy to calculate the likelihood of an x-value falling in a specific range along the x-axis, and hence the likelihood of deletion in each of the three contexts for each word.
- Using this, I ran a simulation, feeding each -Ct and -Cd word through the grammar, according to its frequency in IPhOD.

Phonological context (value in brackets is ratio to Pre-C) (Santa Ana 1991)
                     Pre-C   Pre-V      Pre-Pause
Chicano English      62      45 (.73)   37 (.60)
Predictions of LTV   90      62 (.69)   27 (.30)

Frequency (value in brackets is ratio to > 35/million) (Bybee 2000)
                     > 35/million   < 35/million
Chicano English      54             34 (.63)
Predictions of LTV   65             43 (.66)
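The per-word calculation described on this slide can be sketched in Python. This is an illustration under stated assumptions, not the author's simulation: α = log10(f) and β = log10(μ) with μ = 297.89, the x-axis is cut into four equal regions ordered L4, L3, L2, L1 from left to right, and deletion occurs when the sampled region's MAX constraint is outranked by the context's markedness constraint. The function names and the midpoint-rule integration are choices made for this example. It does not reproduce the LTV figures in the table, which weight every -Ct/-Cd word by its IPhOD frequency.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1, computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def prob_above(cut, a, b, steps=20000):
    """P(x > cut) under Beta(a, b), approximated with the midpoint rule."""
    width = (1 - cut) / steps
    total = sum(beta_pdf(cut + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width

# Assumed cut points: deletion in Pre-C for L3, L2, L1 (x > 0.25),
# in Pre-V for L2, L1 (x > 0.50), and pre-pausally for L1 only (x > 0.75).
CUTS = {"Pre-C": 0.25, "Pre-V": 0.50, "Pre-Pause": 0.75}

def deletion_probabilities(freq, mean_freq=297.89):
    """Deletion probability per context for a word with the given scaled frequency."""
    a, b = math.log10(freq), math.log10(mean_freq)
    return {context: prob_above(cut, a, b) for context, cut in CUTS.items()}

for word, freq in {"vest": 40, "modest": 290, "best": 3610}.items():
    probs = deletion_probabilities(freq)
    print(word, {context: round(p, 2) for context, p in probs.items()})
```

Under these assumptions each word shows the Pre-C > Pre-V > Pre-Pause ordering imposed by the grammar, while more frequent words show higher deletion probabilities in every context.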
29- How can this be refined further?
- Currently, the lexical distribution functions are determined purely on the basis of lexical frequency. But we know that different dialects show different deletion rates.
  - Either different dialects have different lexical frequencies.
  - Or there are other parameters that can be set independently of lexical frequency.
    - Maybe some constant is added to or subtracted from the mean?
      - Added: more words become infrequent → more conservative dialect.
      - Subtracted: more words become frequent → more deletion.
    - Maybe the lexical space can be warped, i.e. the regions along the x-axis that correspond to lexical classes are not of equal size.
    - Maybe lexical distribution functions are best-fit functions, i.e. learn a function that would result in the correct deletion rate; but then we lose the connection between usage frequency and deletion rates.
30 Conclusion
31- Conclusion
- Existing grammatical models of variation do not allow the lexicon enough opportunity to play a role. (Pierrehumbert 2001)
  - p. 138: "A second challenge arises from the fact that the differential phonetic outcomes relate specifically to word frequency. Standard generative models do not encode word frequency. They treat the word frequency effects as matters of linguistic performance rather than linguistic competence. Thus the intrusion of word frequency into a traditional area of linguistics, namely to conditioning of allophony, is not readily accommodated in the classical generative viewpoint."
  - p. 148: "The exemplar model is the only current model which has these properties."
- Purely usage-based models probably do not allow the grammar enough say.
  - Bybee (2000: 73): "... it does mean that there is no variable rule of t/d-deletion. Rather there is a gradual process of shortening or reducing the lingual gesture ..."
  - Bybee (2002: 268): "If we take linguistic behavior to be highly practiced neuromotor activity then we can view reductive sound changes as the result of the automation of linguistic production. It is well known that repeated neuromotor patterns become more efficient as they are practiced: transitions are smoothed by the anticipatory overlap of gestures, and unnecessary or extreme gestures decrease in magnitude or are omitted."
- LTV is an attempt to do both. Does it succeed?
32 References
Anttila, Arto. 1997. Deriving variation from grammar. In Frans Hinskens, Roeland van Hout and Leo Wetzels, eds. Variation, Change and Phonological Theory. Amsterdam: John Benjamins. p. 35-68.
Benua, Laura. 2000. Phonological Relations Between Words. New York: Garland.
Boersma, Paul and Bruce Hayes. 2000. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32: 45-86.
Bonet, Eulàlia. 2004. Morph insertion and allomorphy in Optimality Theory. International Journal of English Studies, 4: 73-104.
Bybee, Joan L. 1985. Morphology: A Study of the Relation Between Meaning and Form. Amsterdam: Benjamins.
Bybee, Joan L. 2000. The phonology of the lexicon: evidence from lexical diffusion. In Michael Barlow and Suzanne Kemmer, eds. Usage-Based Models of Language. Stanford: CSLI Publications. p. 65-85.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14: 261-290.
Bybee, Joan L. and Dan I. Slobin. 1982. Rules and schemas in the development and use of the English past tense. Language, 58: 265-289.
Coetzee, Andries W. 2006. Variation as accessing non-optimal candidates. Phonology, 23: 337-385.
Itô, Junko and Armin Mester. 1995. The core-periphery structure of the lexicon and constraints on reranking. In J. Beckman, S. Urbanczyk, and L. Walsh, eds. University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory. Amherst: GLSA. p. 181-209.
Itô, Junko and Armin Mester. 1999. The structure of the phonological lexicon. In Natsuko Tsujimura, ed. The Handbook of Japanese Linguistics. Malden: Blackwell. p. 62-100.
Itô, Junko and Armin Mester. 2006. Indulgentia parentum filiorum pernicies: Lexical allomorphy in Latin and Japanese. In Eric Bakovic, Junko Ito, and John McCarthy, eds. Wondering at the Natural Fecundity of Things: Essays in Honor of Alan Prince. Paper 9. (http://repositories.cdlib.org/lrc/prince/9).
Hooper, Joan B. 1976. Word frequency in lexical diffusion and the source of morphological change. In William M. Christie, ed. Current Progress in Historical Linguistics. Amsterdam: North-Holland Publishing Co. p. 95-105.
Kager, René. 1996. On affix allomorphy and syllable counting. In Ursula Kleinhenz, ed. Interfaces in Phonology. Berlin: Akademie Verlag. p. 155-171.
Kenstowicz, Michael. 1996. Base-identity and uniform exponence: alternatives to cyclicity. In J. Durand and B. Laks, eds. Current Trends in Phonology: Models and Methods. Paris-X and Salford: University of Salford Publications. p. 363-393.
Labov, William. 1972. The internal evolution of linguistic rules. In Robert P. Stockwell and Ronald K.S. Macaulay, eds. Linguistic Change and Generative Theory. Bloomington: Indiana University Press. p. 101-171.
Mascaró, Joan. 1996. External allomorphy as emergence of the unmarked. In Jacques Durand and Bernard Laks, eds. Current Trends in Phonology: Models and Methods. Salford, Manchester: European Studies Research Institute, University of Salford. p. 473-483.
Pater, Joe. 1994. Against the underlying specification of an exceptional English stress pattern. Toronto Working Papers in Linguistics, 13: 95-121.
Pater, Joe. 2000. Non-uniformity in English secondary stress: the role of ranked and lexically specific constraints. Phonology, 17: 237-274.
Pater, Joe. 2006. The Locus of Exceptionality: Morpheme-Specific Phonology as Constraint Indexation. In L. Bateman, M. O'Keefe, E. Reilly, and A. Werle, eds. University of Massachusetts Occasional Papers in Linguistics 32: Papers in Optimality Theory III. Amherst: GLSA. p. 259-296.
Pater, Joe. to appear. Morpheme-specific phonology: constraint indexation and inconsistency resolution. In Steve Parker, ed. Phonological Argumentation. London: Equinox Publishers.
Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language, 60: 320-342.
Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 123-136.
Pierrehumbert, Janet. 2001. Exemplar dynamics: Word frequency, lenition, and contrast. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 137-157.
Reynolds, Bill. 1994. Variation and Phonological Theory. Ph.D. dissertation, University of Pennsylvania.
Sankoff, David. 1988. Variable rules. In Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier, eds. Sociolinguistics: An International Handbook of the Science of Language and Society. Berlin, New York: Walter de Gruyter. p. 984-997.
Santa Ana, Otto. 1991. Phonetic Simplification Processes in the English of the Barrio: A Cross-Generational Sociolinguistic Study of the Chicanos of Los Angeles. Ph.D. dissertation, University of Pennsylvania.
Steriade, Donca. 1997. Phonetics in Phonology: The Case of Laryngeal Neutralization. Ms. UCLA.
Tesar, Bruce and Paul Smolensky. 1998. Learnability in Optimality Theory. Linguistic Inquiry, 29: 229-268.
Tesar, Bruce and Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Zuraw, Kie. 2000. Patterned Exceptions in Phonology. Ph.D. dissertation, UCLA.
34 The end