Title: Stressing what is important: Orthographic cues and Lexical Stress Assignment
1Stressing what is important: Orthographic cues and Lexical Stress Assignment
- Nada Ševa
- University of York, UK
- Padraic Monaghan
- Lancaster University, UK
- Joanne Arciuli
- Charles Sturt University, Australia
2Previous models of reading in English
- Dual-route cascade (DRC) model (Coltheart, 2000; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001)
- rule-based model (grapheme-to-phoneme (GPC) rules for novel words)
- Connectionist models (Harm & Seidenberg, 1999, 2004; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989)
- triangle model (Harm & Seidenberg, 2004): interaction between orthography, phonology and semantics
- Connectionist Dual Process (CDP) model (Perry, Ziegler, & Zorzi, 2007)
3- Problems
- Only monosyllabic words
- There are only approx. 8,500 monosyllabic words in English and over 50,000 polysyllabic words
- Extension to other languages
- Increased complexity in grapheme-to-phoneme coding in polysyllabic words
- hothouse
- Stress assignment
4- Stress and spoken word processing
- lexical access (van Donselaar et al., 2005; Soto-Faraco et al., 2001)
- the division of words into sublexical units such as onset-rime (Goswami, 2003; Wood, 2006)
- word, phrase, sentence boundaries (Cutler et al., 1997; Sebastian-Galles & Costa, 1997)
5- Stress and written word processing
- Stress sensitivity facilitates learning to read (Wood & Terrell, 1998; Wood, 2006) and stress assignment in second language learning (Wade-Woolley et al., 2004; Goetry et al., 2006)
- Stress representation is activated during silent reading (Ashby & Clifton, 2005)
6-
- Nature of the stress representation?
- Current theories of word production state that lexical stress is part of the metrical representation, which is retrieved or computed in parallel with phonological encoding (Caramazza, 1997; Levelt, Roelofs, & Meyer, 1999; Schiller, 2006).
- Reading and stress assignment in languages with non-fixed stress placement (English, Dutch, Italian)?
- English
- ZEbra (trochaic) vs. GiRAffe (iambic)
- 70% vs. 30%
7- The Rastle & Coltheart (2000) model proposed a system of sublexical rules that translate the orthographic representation into both the segmental and the suprasegmental parts of the phonological representation.
8- Rastle & Coltheart (2000) model
- a) Represents part of the Dual-route Cascade (DRC) model of reading (Coltheart et al., 2001)
- b) Draws on linguistic analyses of stress patterns in English by Fudge (1984) and Garde (1968)
- 54 beginnings and 101 endings (most of them morphemes in English) could influence the placement of stress
9Steps in the algorithm
1) identification of predefined beginnings and then endings
2) translation of the remaining parts of words into a phonological representation by using grapheme-to-phoneme (GPC) rules plus a set of additional rules for correcting illegal phoneme combinations
3) stress assignment based on the stored affix stress position and the quality of the vowels (presence of schwa)
4) if no prefix or suffix was identified, application of first-syllable stress as the default stress position.
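The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Rastle & Coltheart implementation. The affix tables are made-up stand-ins for the model's 54 beginnings and 101 endings. The GPC translation of step 2 is omitted, so the caller says which syllables contain schwa. The two tie-breaking choices are also assumptions: an ending outranks a beginning, and a schwa syllable is never stressed.

```python
# Hypothetical affix tables (illustrative only): affix -> stressed syllable (1 or 2).
PREFIXES = {"ab": 2, "con": 2}
SUFFIXES = {"een": 2, "oon": 2, "ness": 1}


def find_affixes(word):
    """Step 1: look for a stored beginning first, then a stored ending."""
    prefix = next(
        (p for p in sorted(PREFIXES, key=len, reverse=True)
         if word.startswith(p) and len(word) > len(p)),
        None,
    )
    rest = word[len(prefix):] if prefix else word
    suffix = next(
        (s for s in sorted(SUFFIXES, key=len, reverse=True)
         if rest.endswith(s) and len(rest) > len(s)),
        None,
    )
    return prefix, suffix


def assign_stress(word, schwa_syllables=frozenset()):
    """Return 1 or 2, the stressed syllable of a disyllabic letter string.

    schwa_syllables stands in for the output of step 2 (GPC translation):
    the set of syllable positions whose vowel would be pronounced as schwa.
    """
    prefix, suffix = find_affixes(word)
    if suffix is not None:        # assumption: an ending outranks a beginning
        stress = SUFFIXES[suffix]
    elif prefix is not None:      # step 3: use the stored affix stress position
        stress = PREFIXES[prefix]
    else:
        stress = 1                # step 4: default first-syllable stress
    if stress in schwa_syllables:  # assumption: a schwa syllable is not stressed
        stress = 3 - stress        # move stress to the other syllable
    return stress


for item in ("zebra", "about", "plamdeen"):
    print(item, assign_stress(item))
```

With these toy tables, "zebra" falls through to the default, "about" is decided by its beginning, and "plamdeen" by its ending.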
10- Correct stress assignment for 89.7% of English disyllabic words from the CELEX database (Baayen et al., 1993).
- Nonword test
- 210 nonwords: 115 trochaic and 95 iambic
- 15 subjects estimated stress position in a reading-aloud task
- 84.8% correct stress assignment on the nonword test.
11- Problems?
- Is this really a sublexical procedure, given the role of affixes in the stress assignment process?
- What is the role of orthography?
12 13- The statistical regularities with respect to stress assignment could be learned in the same way as the regularities in the orthography-to-phonology mapping (Harm & Seidenberg, 1999, 2004; Plaut et al., 1996; Seidenberg & McClelland, 1989).
14- Distributional cues
- general (trochaic words more frequent)
- nouns (trochaic) vs. verbs (iambic) (Kelly & Bock, 1988; Sereno, 1986)
- Phonological cues
- the rime: syllables with reduced vowels are unstressed and syllables with consonant clusters in the coda are stressed (Chomsky & Halle, 1968)
- the onset: consonant clusters (Kelly, 2004).
- Orthographic cues
- length and complexity of beginnings and endings, the identity of letters (both consonants and vowels) (Arciuli & Cupples, 2006, in press; Kelly, Morris, & Verrekia, 1998).
15- Experimental studies have demonstrated that readers are sensitive to such phonological, orthographic and distributional cues present in the input (Arciuli & Cupples, 2006, in press; Colombo, 1992; Kelly & Bock, 1988; Kelly et al., 1998)
16Corpus analyses of orthographic cues
To what extent can beginnings and endings predict
stress position?
17Corpus analyses of orthographic cues
- Disyllabic words from CELEX
- words with distinct orthography and/or pronunciation and/or grammatical category count as separate words.
- All words
- 18,571 1st syllable stress, 2,387 2nd syllable stress
- Lemma analyses (no inflectional morphology)
- 9,485 1st syllable stress, 1,813 2nd syllable stress
- Monomorphemic analyses (no inflectional or derivational morphology)
- 2,420 1st syllable stress, 375 2nd syllable stress
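The counts on this slide give a chance baseline for the classification analyses: a classifier that always guesses first-syllable stress is already right for most words. The short Python sketch below does that arithmetic. The counts come from the slide; the variable and function names are illustrative.

```python
# (first-syllable stress, second-syllable stress) word counts from the slide.
counts = {
    "all words": (18571, 2387),
    "lemmas": (9485, 1813),
    "monomorphemic": (2420, 375),
}


def first_syllable_share(first, second):
    """Proportion of words with first-syllable (trochaic) stress."""
    return first / (first + second)


baseline = {name: first_syllable_share(*pair) for name, pair in counts.items()}

for name, share in baseline.items():
    print(f"{name}: {share:.1%} first-syllable stress")
```

Guessing first-syllable stress every time would be correct for roughly 84% to 89% of word types across the three corpora, so the cue-based classifications need to be judged against that figure.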
18Analysis
- Discriminant analysis
- used to determine which variables discriminate between trochaic vs. iambic words.
- Type and token analysis (weighted by frequency)
19Beginnings and endings
- Beginning cue
- Orthography up to and including first vowel (as in Arciuli & Cupples, 2006)
- 789 distinct beginnings
- Ending cue
- Orthography from final vowel onwards
- 1411 distinct endings
- E.g.
- penguin: pe-, -uin
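The two cue definitions can be written as a pair of small Python functions. This is a sketch, not the authors' extraction code. It assumes the vowel letters are a, e, i, o, u (y counts as a consonant). It also assumes "final vowel" means the final run of vowel letters, because the penguin example gives -uin rather than -in.

```python
import re

VOWELS = "aeiou"  # assumption: y is treated as a consonant letter


def beginning_cue(word):
    """Letters up to and including the first vowel letter."""
    word = word.lower()
    match = re.match(rf"[^{VOWELS}]*[{VOWELS}]", word)
    return match.group(0) if match else word  # no vowel letter: whole word


def ending_cue(word):
    """Letters from the final run of vowel letters to the end of the word."""
    word = word.lower()
    match = re.search(rf"[{VOWELS}]+[^{VOWELS}]*$", word)
    return match.group(0) if match else word  # no vowel letter: whole word


print(beginning_cue("penguin"), ending_cue("penguin"))
```

Applied to "penguin", the functions return "pe" and "uin", matching the example on the slide.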
20Results: All Words
21Results: Lemmata
22Results: Monomorphemes
23- The Educator's Word Frequency Guide (Zeno, 1995).
- a quantitative summary of the printed vocabulary encountered by students in American schools.
- 60,527 samples of text from over 6,000 textbooks, works of literature, and popular works of fiction and nonfiction.
- from grade 1 (age 5) to college.
24Results: Tokens
25Educator's WFL vs. CELEX
26- There is a large amount of potential information in orthographic beginnings/endings
- That goes well beyond morphemes
- Most beginnings/endings were not morphemes
- For all analyses, better classification from endings than beginnings (more so for children than for adults)
27Modelling
28- 25,016 English disyllabic words
- CELEX lexical database (Baayen et al., 1993)
- 83% trochaic, 17% iambic
-
- learning rate = 0.005
- alignment: left
- 5 million presentations of words, selected according to their log-compressed frequency
- 20 simulations
- 90% training, 10% testing, randomly selected
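The training regime on this slide (a random 90/10 split, then words presented in proportion to their log-compressed frequency) could be set up roughly as below. The slide does not give the compression function, so log(1 + frequency) is an assumption, and the toy lexicon and its frequencies are made up for illustration.

```python
import math
import random

# Made-up (word, frequency) pairs standing in for the CELEX training items.
TOY_LEXICON = [("zebra", 120), ("giraffe", 45), ("about", 5000),
               ("penguin", 30), ("contrast", 800), ("abject", 12)]


def split_words(items, test_fraction=0.10, seed=0):
    """Randomly hold out a fraction of the items for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def presentation_stream(train_items, n_presentations, seed=0):
    """Yield words with probability proportional to log(1 + frequency).

    Frequencies must be at least 1, otherwise a word gets zero weight.
    """
    rng = random.Random(seed)
    words = [word for word, _ in train_items]
    weights = [math.log(1 + freq) for _, freq in train_items]
    for _ in range(n_presentations):
        yield rng.choices(words, weights=weights, k=1)[0]


train, test = split_words(TOY_LEXICON, test_fraction=0.10)
sample = list(presentation_stream(train, n_presentations=10))
print(len(train), len(test), sample)
```

Log compression keeps very frequent words such as "about" from dominating training: with the toy numbers above its raw frequency is over 400 times that of "abject", but its sampling weight is only about three times larger. Running 20 simulations would correspond to repeating the split and the stream with 20 different seeds.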
29- d = 2.1
- d = 2.6
- d = 3.8
30- nouns vs. verbs: contrast as a noun versus contrast as a verb
- overgeneralization errors
- ab-: about, above, abroad (second syllable)
- CELEX
- 60 ab- (51897) 2nd syllable stress,
- 21 ab- (7708) 1st syllable stress
- error: abject.
- evenly distributed errors
- con-
- CELEX
- 101 con- (13008) 1st syllable stress,
- 169 con- (44292) 2nd syllable stress
-
- errors: 38 1st syllable,
- 44 2nd syllable stress
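The CELEX counts above show how strongly each beginning points to second-syllable stress, by word type and by frequency. The Python sketch below computes both proportions. It assumes the numbers in parentheses on the slide are summed token frequencies; the slide does not label them.

```python
# (number of word types, summed frequency) per stress pattern, from the slide.
# Treating the parenthesised numbers as token frequencies is an assumption.
CUES = {
    "ab-": {"first": (21, 7708), "second": (60, 51897)},
    "con-": {"first": (101, 13008), "second": (169, 44292)},
}


def second_syllable_support(cue):
    """Return (type proportion, token proportion) favouring 2nd-syllable stress."""
    (types_1, freq_1) = CUES[cue]["first"]
    (types_2, freq_2) = CUES[cue]["second"]
    return types_2 / (types_1 + types_2), freq_2 / (freq_1 + freq_2)


for cue in CUES:
    by_type, by_token = second_syllable_support(cue)
    print(f"{cue} second-syllable stress: {by_type:.1%} of types, "
          f"{by_token:.1%} of tokens")
```

On this reading, ab- favours second-syllable stress in about 74% of types and 87% of tokens, which fits the overgeneralization error on "abject". For con- the evidence is weaker (about 63% of types and 77% of tokens), in line with errors that fall in both directions.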
31- Test on Rastle & Coltheart (2000) nonwords?
32RC 2000 nonwords
33- no-/-ate
- nonword: nockate (second syllable)
- CELEX
- 104 no- (22077) 1st syllable stress,
- 15 no- (285) 2nd syllable stress
- 108 -ate (6565) 1st syllable stress,
- 165 -ate (3608) 2nd syllable stress
-
34- Why does the RC model exhibit better performance than neural networks?
- Limited and non-representative training set for NN models
35- Training on all polysyllabic words with the stress on 1st or 2nd syllable
- 51,948 words, 89.6% of the polysyllabic word types in the CELEX database.
-
- 68.6% 1st syllable and 31.4% 2nd syllable words
- (disyllabic words: 87% trochaic vs. 13% iambic words)
-
37- Why does the RC model exhibit better performance than neural networks?
- Limited training set for NN models
- Explicitly define beginnings and endings
38- Kelly (2004) nonwords
- 96 nonwords varying in onset complexity
- ½ C onset: pamdeen
- ½ CC onset: plamdeen
- 78 trochaic vs. 18 iambic words
- 20 subjects in silent reading task
39Kelly (2004) nonwords
- Trochaic
- Iambic
40Kelly (2004) results
41- RC (2000) model
- 1/3 of errors were from the no-prefix/no-suffix class of words
- (bolay, wispay)
- co- (colvane, corlax)
- Conflicting cues (beginnings vs. endings)
- plamdeen, gronvoon
- pl-, gr- (complex onset): trochaic words
- -een, -oon (suffix): iambic words
-
42- Why does the RC model exhibit better performance than neural networks?
- Limited training set for NN models
- Explicitly define beginnings and endings
- Phonology and/or parts-of-speech information
44Phonology and Parts-of-speech
- d = 0.74
- d = 3.06
45- Multiple-cue accounts have been shown to result in more accurate classification in
- speech segmentation tasks (Onnis, Monaghan, Chater, & Richmond, 2005)
- grammatical categorisation tasks (Monaghan, Christiansen, & Chater, 2007).
47Orthography, Phonology, Parts-of-speech
48- What is the role of orthography?
- Orthography and other cues?
- Rule-based vs. connectionist account?
- Sublexical nature of the stress assignment?
49Conclusions
- The present study demonstrated that stress assignment for words and nonwords can be accomplished with good accuracy in a connectionist model that learns to map orthography onto stress position for disyllabic words in English.
- Additional simulations indicated that a combination of orthographic, phonological and distributional cues can give improved performance in the stress assignment task.
50Conclusions
- Rule-based vs. connectionist accounts
- The connectionist account allowed a more detailed exploration of the different cues relevant for stress assignment
- Stress assignment is clearly part of a sublexical process.
51Further simulations
- Further testing on novel sets of nonwords, including phonological and distributional information
- Cross-linguistic comparison with Italian
- Simulations of the developmental results.
52- This work was supported by the ESRC/ARC
Bilateral Research Awards Grant, RES 000-22-1975.