Title: Parsing acoustic variability as a mechanism for feature abstraction
1 Parsing acoustic variability as a mechanism for feature abstraction
Jennifer Cole and Gary Linebaugh, University of Illinois
Bob McMurray and Cheyenne Munson, University of Iowa
www.psychology.uiowa.edu/faculty/mcmurray
2 Phonetic precursors to phonological sound patterns
- Many phonological sound patterns are claimed to have precursors in systematic phonetic variation that arises due to coarticulation.
- Assimilation
- - Vowel harmony from V-to-V coarticulation (Ohala 1994; Beddor et al. 2001)
- - Palatalization from V-to-C coarticulation (Ohala 1994)
- - Nasal place assimilation (-mb, -nd, -ŋg) from C-to-C coarticulation (Browman & Goldstein 1991)
- Epenthesis
- - Epenthetic stops from C-C coarticulation: sen[t]se (Ohala 1998)
- Deletion
- - Consonant cluster simplification via deletion from C-C coarticulation: perfec(t) memory (Browman & Goldstein 1991)
3 The role of the listener
- Phonologization
- - when acoustic properties that arise due to coarticulation are interpreted by the listener as primary phonological properties of the target sound.
- - generalization over variable acoustic input that results in a new constraint on sound patterning.
4 The role of the listener
- From V-to-V coarticulation
[Figure: vowel diagram]
5 The role of the listener
- From V-to-V coarticulation
[Figure: vowel diagram]
6 The role of the listener
- Perception may yield vowel assimilation
[Figure: vowel diagram]
7 The role of the listener
- But distinct factors can produce similar variants
[Figure: vowel diagram]
8 From perception to phonology
- What is the mechanism for mapping from continuous perceptual features to phonological categories?
- ?i mid and high
- central and front-peripheral
- ?? mid and low
- central and back
9 From perception to phonology
- What is the mechanism for mapping from continuous perceptual features to phonological categories?
- ?i mid and high
- central and front-peripheral
- ?? mid and low
- central and back
- The problem
- - The perceptual system is confronted with uncertainty due to variation arising from multiple sources.
- - Yet, patterns of variation must get associated with individual features of the context vowel (e.g., high, front) if coarticulation serves as a precursor to phonological assimilation.
- - How do lawful, categorical patterns emerge from ambiguous, variable input?
- - The lack of invariance problem!
10 Our claims
- What is the mechanism for mapping from continuous perceptual features to phonological categories?
- Our claims
- - Variability is retained. Acoustic variability is parsed into components related to the target segment and the local context.
- - Feature abstraction through parsing. Acoustic parsing provides a mechanism for the emergence of phonological features from patterned variation in fine phonetic detail.
11 Variability is retained
- Listeners are sensitive to fine-grained acoustic variation. (Goldinger 2000; Hay 2000; Pierrehumbert 2003)
- Variability is retained, not discarded. Consistent with exemplar models of the lexicon, phonetic detail is encoded and stored, and can inform subsequent categorization of new sound tokens.
12 Variability is retained
- Variability is useful for the identification of sounds in contexts of coarticulation.
- The perceptual system uses information about variability to identify a sound and its context, in parallel.
- Variability due to coarticulation is exploited to facilitate perception.
- - Listeners benefit from the presence of anticipatory coarticulation in predicting the identity of the upcoming sound. (Martin & Bunnell 1982; Fowler 1981, 1984; Gow 2001, 2003; Munson, this conference)
- Variability due to coarticulation is subtracted to identify the underlying target sound. (Fowler 1984; Beddor et al. 2001, 2002; Gow 2003)
13 Variability and perceptual facilitation
- Perceptual facilitation from V-to-V coarticulation is expected to occur only if:
- - The effects of coarticulation are systematic: an influencing vowel conditions a consistent acoustic effect on target vowels.
- - The listener can recognize coarticulatory effects on the target vowel.
- - The listener can isolate the effects of the context vowel from other sources of variation, and attribute those effects to the context vowel.
14 Feature abstraction through parsing
- More specifically, under coarticulation of vowel height and backness:
- - The listener must parse out the portion of the variance in F1 and F2 that is due to coarticulation, and base their perception of the target vowel on the residual values.
- - Acoustic parsing isolates the effects of the context vowel on F1 and F2.
15 Feature abstraction through parsing
- The parsed acoustic variance defines features of the context vowel, over which new generalizations can be formed. → phonologization
[Diagram: context vowel i parsed as high]
16 Feature abstraction through parsing
- The parsed acoustic variance defines features of the context vowel, over which new generalizations can be formed. → phonologization
[Diagram: context vowel i parsed as high; target phonologized to i]
17 Feature abstraction through parsing
- The parsed acoustic variance defines features of the context vowel, over which new generalizations can be formed. → phonologization
Question: Why phonologization? If target and context vowels can both be identified from the fine phonetic detail, what's the force driving phonologization?
18 Testing the model
- The acoustic parsing model of speech perception requires that there is a robust and systematic pattern of acoustic variation from V-to-V coarticulation.
- In this paper we present supporting evidence from an acoustic study of coarticulation.
- We examine a range of V-to-V coarticulatory effects in VCV contexts that cross a word boundary, where coarticulation cannot be attributed to lexicalized phonetic patterns.
19 Key Questions
- Extent of phenomenon
- - Does V-to-V coarticulation cross word boundaries?
- - Does V-to-V coarticulation affect both F1 and F2?
- - Relative strength of V-to-V effects vs. other forms of coarticulation?
- Usefulness of phenomenon
- - How could V-to-V effects translate to perceptual inferences?
- - Is the information provided by V-to-V coarticulation different when other sources of variation are explained?
20 Methods
Target vowels: ? ? (measure coarticulation)
Context vowels: i æ ? (induce coarticulation)
- /u/ excluded from contexts (rounded, fronted)
- intervening consonant varied in
- - place (labial, coronal, velar)
- - voicing
- - /?g/ excluded (tends to be raised)
21 Methods
22 Methods
- Methods
- 10 University of Illinois students.
- 48 phrases x 3 repetitions.
- Phrases embedded in neutral carrier sentences
- /?/ He said _______ all the time
- /?/ I love _______ as a title
- Coding
- F1, F2, F3
- - Converted to Bark for analysis
- LPC (Burg Method)
- Outliers / misproductions inspected by hand
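The slides state that formants were converted to Bark but do not say which formula was used. As a minimal sketch, the conversion below assumes the Traunmüller (1990) approximation; the function name `hz_to_bark` and the formant values are illustrative, not taken from the study.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark (Traunmüller 1990 approximation).

    Assumption: the slides do not name the formula; this is one common choice.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Illustrative formant values in Hz (not measurements from the study).
formants_hz = {"F1": 500.0, "F2": 1500.0, "F3": 2500.0}
formants_bark = {name: hz_to_bark(f) for name, f in formants_hz.items()}

for name, z in formants_bark.items():
    print(f"{name}: {z:.2f} Bark")
```

Working in Bark compresses the higher frequencies, so differences in F1 and F2 are expressed on a scale closer to perceived distance.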
23 Analysis
Target x Voicing x Context
              F1      F2
Voicing       p.033   p.001
Target        p.005   p.001
Context       p.001   p.001
Interactions  n.s.    n.s.
V-to-V coarticulation crosses word boundaries. Clear effects of coarticulatory context on both F1 and F2.
Target x Place x Context
              F1      F2
Place         n.s.    p.001
Target        p.01    p.001
Context       p.001   p.001
Interactions  some    some
24 Analysis
[Figure: panels for Male and Female speakers]
- A lot of unexplained variance
- How does the perceptual system get to the V-to-V coarticulation?
- How useful is V-to-V coarticulation?
- Does accounting for other sources of variance in the signal improve the usefulness of V-to-V?
25 Strategy
Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation.
?-coarticulated ?? or i-coarticulated ??
26 Strategy
Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation.
A slightly i-coarticulated ?? or a really i-coarticulated ??
27 Strategy
Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation.
If you knew the category:
If ?, then expect i
If ?, then expect ?
? - ? Positive (more i-like)
? - ? Negative (more ?-like)
F2? F2category coarticulation direction
28 Strategy
Target F2? coarticulation direction
Strategy
1) Compute mean of a source of variance.
2) Subtract that mean from F1/F2.
3) Residual is coarticulation direction.
4) Repeat for each source of variance (speaker, target vowel, place, voicing).
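The four steps can be sketched directly as repeated mean subtraction. The example below is a hedged illustration on made-up, fully balanced F2 values in Bark; the helper `parse_out`, the speaker and vowel labels, and the effect sizes are all hypothetical, and only two of the four sources (speaker, target vowel) are parsed to keep it short.

```python
from collections import defaultdict


def parse_out(tokens, values, factor):
    """Steps 1-2: compute the mean for each level of `factor` and subtract it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tok, v in zip(tokens, values):
        sums[tok[factor]] += v
        counts[tok[factor]] += 1
    return [v - sums[tok[factor]] / counts[tok[factor]]
            for tok, v in zip(tokens, values)]


# Hypothetical tokens: F2 = speaker offset + target vowel value + context shift.
tokens, f2 = [], []
for speaker, s_off in (("s1", -1.0), ("s2", 1.0)):
    for target, t_val in (("v1", 10.0), ("v2", 12.0)):
        for context, c_shift in (("i", 0.2), ("ae", -0.2)):
            tokens.append({"speaker": speaker, "target": target, "context": context})
            f2.append(s_off + t_val + c_shift)

# Step 4: repeat the subtraction for each source of variance in turn.
residual = f2
for factor in ("speaker", "target"):
    residual = parse_out(tokens, residual, factor)

# Step 3: the sign of what is left is the coarticulation direction.
for tok, r in zip(tokens, residual):
    print(tok["speaker"], tok["target"], tok["context"], round(r, 3))
```

In this balanced toy design the residual is positive for every token with an /i/ context and negative otherwise, regardless of speaker or target vowel.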
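29 Strategy
Hierarchical regression can do exactly these things.
1) Compute mean of a source of variance
$F1_{\mathrm{predicted}} = \beta_1 \cdot \mathrm{target} + \beta_0$
If target = 0 for /?/ and 1 for /?/:
→ $F1_{\mathrm{predicted}} = \beta_1 \cdot 0 + \beta_0$, so Mean /?/ $= \beta_0$
→ $F1_{\mathrm{predicted}} = \beta_1 \cdot 1 + \beta_0$, so Mean /?/ $= \beta_0 + \beta_1$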
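30 Strategy
- Hierarchical regression can do exactly these things.
- 1) Compute mean of a source of variance.
- 2) Subtract that mean from F1/F2.
- 3) Residual is coarticulation direction.
$\mathrm{Residual} = F1_{\mathrm{actual}} - F1_{\mathrm{predicted}} = F1_{\mathrm{actual}} - (\beta_1 \cdot \mathrm{target} + \beta_0)$
→ target = 0: $\mathrm{Resid}_{\mathrm{target}} = F1_{\mathrm{actual}} - \beta_0$
→ target = 1: $\mathrm{Resid}_{\mathrm{target}} = F1_{\mathrm{actual}} - (\beta_0 + \beta_1)$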
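The link between a dummy-coded regression and category means can be checked numerically. The sketch below fits a one-predictor least-squares line in closed form; the function `fit_line` and the F1 values are hypothetical, chosen only to show that the intercept equals the mean of the category coded 0 and that intercept plus slope equals the mean of the category coded 1.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b1 * x + b0 with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical F1 values (Bark): three tokens of each target vowel.
target = [0, 0, 0, 1, 1, 1]          # dummy code for the two target vowels
f1_actual = [5.9, 6.1, 6.0, 4.4, 4.6, 4.5]

b0, b1 = fit_line(target, f1_actual)
mean_0 = sum(f1_actual[:3]) / 3      # mean of the vowel coded 0
mean_1 = sum(f1_actual[3:]) / 3      # mean of the vowel coded 1

# Residual = actual - predicted, i.e. each token minus its own category mean.
resid_target = [y - (b1 * t + b0) for t, y in zip(target, f1_actual)]

print(f"b0 = {b0:.3f}, mean of category 0 = {mean_0:.3f}")
print(f"b0 + b1 = {b0 + b1:.3f}, mean of category 1 = {mean_1:.3f}")
```

Because the residuals are deviations from the category mean, they average to zero within each target vowel, which is what makes them usable as input to the next regression step.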
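31 Strategy
- Hierarchical regression can do exactly these things.
- 1) Compute mean of a source of variance.
- 2) Subtract that mean from F1/F2.
- 3) Residual is coarticulation direction.
- 4) Repeat for each source of variance (speaker, target vowel, place, voicing).
$F1 = \beta_1 \cdot \mathrm{Target} + \beta_0$
$\mathrm{Resid}_{\mathrm{target}} = \beta_2 \cdot \mathrm{Place} + \beta_0$
$\mathrm{Resid}_{\mathrm{place}} = \beta_3 \cdot \mathrm{Voicing} + \beta_0$
$\mathrm{Resid}_{\mathrm{voicing}} = \beta_4 \cdot \mathrm{V\text{-}to\text{-}V} + \beta_0$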
32 Strategy
- Construct a hierarchical regression to systematically account for known sources of variance from F1 and F2:
- - Speaker
- - Target vowel
- - Place (intervening C)
- - Voicing (intervening C)
- - Interactions between target, place, and voicing
- After partialing out these factors, how much variance does vowel context (V-to-V) account for?
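One way to make the closing question concrete is to track how much of the total sum of squares each step removes when factors are entered in a fixed order. The sketch below is an assumption-laden toy: effects are purely additive, the design is fully crossed, interactions are omitted, and every label and effect size is hypothetical rather than drawn from the study.

```python
from collections import defaultdict
from itertools import product


def subtract_group_means(levels, values):
    """Remove from each value the mean of the group (level) it belongs to."""
    sums, counts = defaultdict(float), defaultdict(int)
    for lvl, v in zip(levels, values):
        sums[lvl] += v
        counts[lvl] += 1
    return [v - sums[lvl] / counts[lvl] for lvl, v in zip(levels, values)]


def sum_of_squares(values):
    """Sum of squared deviations from the grand mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical additive effects in Bark, one token per cell of the design.
effects = {
    "speaker": {"s1": -1.0, "s2": 1.0},
    "target": {"v1": -0.5, "v2": 0.5},
    "context": {"i": 0.2, "ae": -0.2},
}
order = ["speaker", "target", "context"]  # vowel context (V-to-V) entered last
cells = list(product(*(effects[f] for f in order)))
f2 = [11.0 + sum(effects[f][lvl] for f, lvl in zip(order, cell)) for cell in cells]

total_ss = sum_of_squares(f2)
share = {}
residual = f2
for i, factor in enumerate(order):
    before = sum_of_squares(residual)
    residual = subtract_group_means([cell[i] for cell in cells], residual)
    share[factor] = (before - sum_of_squares(residual)) / total_ss

for factor in order:
    print(f"{factor}: {100 * share[factor]:.1f}% of variance")
```

In this toy design the context effect is a small share of the raw variance but accounts for all of what remains once speaker and target vowel have been partialed out.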
33 Regression F2
- 1) Raw Data
[Figure: panels for Male and Female speakers]
34 Regression F2
- 1) Raw Data
- Partialed Out
- 2) Subject
35 Regression F2
- 1) Raw Data
- Partialed Out
- 2) Subject
- 3) Target Vowel
36 Regression F2
- 1) Raw Data
- Partialed Out
- 2) Subject
- 3) Target Vowel
- 4) Consonant
37 Regression F2
- 1) Raw Data
- Partialed Out
- 2) Subject
- 3) Target Vowel
- 4) Consonant
- 5) Interactions
38 Regression F1
39 Regression F1
40 Regression F1
Total R² = .884
Post-hoc analysis: height only.
41 Regression F1
Total R² = .884
Post-hoc analysis: height only.
42 Regression F2
43 Regression F2
44 Regression F2
45 Regression F2
Total R² = .940
Post-hoc analysis: height and backness.
46 Regression Summary
Progressively accounting for variance is powerful:
- F1: 88% of variance
- F2: 94% of variance
using only known sources of variance.
V-to-V coarticulation is readily apparent when other sources of variance are explained.
Effect of V-to-V coarticulation has a similar size to place/voicing effects.
How useful would this be?
47 Predicting Vowel Identity
- Multinomial Logistic Regression (MLR)
- - Classification algorithm
- - Predict category membership from multiple variables.
- - Categories do not have to be binary.
48 Predicting Vowel Identity
- Multinomial Logistic Regression (MLR)
- - Classification algorithm
- - Predict category membership from multiple variables.
- - Categories do not have to be binary.
- - Assumes optimal listener.
- - Computes % correct.
- - How well could a listener do under ideal circumstances with the information provided?
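For readers unfamiliar with MLR, the sketch below trains a small softmax (multinomial logistic) classifier by plain gradient descent on made-up, well-separated two-dimensional cues. It is not the analysis used in the study: the data, the three class labels, the learning rate, and the function names are all hypothetical, and accuracy is computed on the training tokens themselves.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_mlr(X, y, n_classes, lr=0.5, epochs=300):
    """Fit one weight vector per class (last weight is the bias) by gradient descent."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v / len(X)
        for k in range(n_classes):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j]
    return W


def predict(W, x):
    """Return the index of the class with the highest score."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(W)), key=lambda k: scores[k])


# Hypothetical (F1, F2) residual cues for three context-vowel classes.
centroids = {0: (-1.0, 1.0), 1: (1.0, 1.0), 2: (0.0, -1.0)}
offsets = [(-0.2, 0.0), (0.2, 0.0), (0.0, -0.2), (0.0, 0.2)]
X, y = [], []
for label, (c1, c2) in centroids.items():
    for d1, d2 in offsets:
        X.append((c1 + d1, c2 + d2))
        y.append(label)

W = train_mlr(X, y, n_classes=3)
accuracy = sum(predict(W, x) == label for x, label in zip(X, y)) / len(X)
print(f"% correct: {100 * accuracy:.0f}")
```

The "% correct" figure from such a model is best read as an upper bound on how much category information the cues carry, not as a claim about what listeners actually do.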
49 Predicting Vowel Identity
[Figure: % correct (0 to 60) by context vowel: i, ?, æ, same vowel. Partialed out: subject, vowel, place, voicing, interactions.]
Model does quite well at predicting all vowels except the identity (same vowel) case.
50 Predicting Vowel Identity
51 Predicting Vowel Identity
Does partialing out other sources of variance improve the utility of V-to-V coarticulation?
- Use linear regression to partial out variance.
- Use F1, F2 residuals to predict vowels.
FULL: Partial out everything.
RAW: No parsing.
SPEAKER: Partial out speaker variation only. Assume speaker normalization, but no interactions between consonant, or vowel and V-to-V.
VOWEL: Partial out effects of everything heard at the target vowel (speaker and target).
NO-SPKR: Assume no normalization, but interactions between consonants.
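The contrast between the RAW and SPEAKER conditions can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-class-mean rule on a single F2 cue rather than the study's MLR, and the speaker offsets, context shifts, and labels are hypothetical values chosen so that speaker differences swamp the V-to-V effect.

```python
def nearest_mean_accuracy(values, labels):
    """Classify each value by the nearest class mean; return proportion correct."""
    classes = sorted(set(labels))
    means = {c: sum(v for v, lab in zip(values, labels) if lab == c) / labels.count(c)
             for c in classes}
    hits = sum(min(classes, key=lambda c: abs(v - means[c])) == lab
               for v, lab in zip(values, labels))
    return hits / len(values)


# Hypothetical F2 values (Bark): baseline + speaker offset + context shift.
speaker_offsets = {"s1": -1.0, "s2": 0.0, "s3": 1.0}
context_shift = {"i": 0.2, "ae": -0.2}
speakers, contexts, raw_f2 = [], [], []
for spk, s_off in speaker_offsets.items():
    for ctx, c_shift in context_shift.items():
        speakers.append(spk)
        contexts.append(ctx)
        raw_f2.append(11.0 + s_off + c_shift)

# SPEAKER-style parsing: subtract each speaker's own mean before classifying.
speaker_mean = {spk: sum(v for v, s in zip(raw_f2, speakers) if s == spk) / speakers.count(spk)
                for spk in speaker_offsets}
parsed_f2 = [v - speaker_mean[s] for v, s in zip(raw_f2, speakers)]

raw_acc = nearest_mean_accuracy(raw_f2, contexts)        # RAW: no parsing
parsed_acc = nearest_mean_accuracy(parsed_f2, contexts)  # speaker parsed out

print(f"RAW: {100 * raw_acc:.0f}% correct")
print(f"SPEAKER parsed: {100 * parsed_acc:.0f}% correct")
```

With these toy numbers the raw cue misclassifies tokens from the speakers with the lowest and highest F2, while the speaker-parsed residual separates the two contexts cleanly.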
52 Predicting Vowel Identity
[Figure: % correct (25 to 45) for the FULL, VOWEL, SPEAKER, NO-SPKR, and RAW models]
FULL: about 4% better than others.
VOWEL: parsing out consonant may not be necessary.
SPEAKER: effect of speaker and phonetic cues similar.
RAW: V-to-V not useful without some parsing.
53 Predicting Vowel Identity
1) Parse out speaker effects on target
2) Regressively compensate for consonant coarticulation
3) Use residuals to predict context vowel
[Diagram labels: preceding context, target vowel, consonant, context vowel]
Suggests a 3-stage parsing process to maximally use V-to-V modifications.
54 Key Questions
- Extent of phenomenon
- Word boundaries?
- Both F1 and F2?
- Relative strength of V-to-V effects?
- Usefulness of phenomenon
- Perceptual inferences?
- Parsing out variability?
55 Summary: Extent
- Clear evidence for V-to-V coarticulation across word boundaries: not lexicalized.
- V-to-V in both formants (height and backness).
- Strength is similar to that of place and voicing.
- Known sources of variance (speaker, vowel, consonant, V-to-V) can account for most of the variability in vowel production.
- Problem of lack of invariance?
- - Identifying multiple categories at once may be easier than identifying one.
56 Summary: Usefulness
- Idealized listener (with parsing) could identify upcoming vowel at 40% correct given only V-to-V coarticulation.
- - Near 50% for /i/ and /?/
- Parsing dramatically improves predictive power of V-to-V coarticulation.
- Do you need perfect categorization of variance sources (e.g. speaker, target vowel, voicing)?
- - Imperfect categorization enhances need for multiple cues.
- - Simultaneously evaluating multiple features (e.g. V1, C, V2) yields correct parse.
- How do you determine the order of parsing?
- - Temporal order of information arrival?
57 Future Directions
- How do you identify the components you will be parsing?
- - See Toscano poster.
- Does the model actually describe perception?
- - Parsing is a temporal process.
- - Visual world paradigm to assess time-course of processing (e.g. McMurray, Clayards, & Tanenhaus, in prep; McMurray, Tanenhaus, & Aslin, 2002; McMurray, Munson, & Gow, submitted).
- Parsing as part of word recognition.
- - Lexical structure can contribute to inferences.
- - Interactive activation models (McClelland & Elman, 1986) could implement this.
58 Conclusions
- Where do features come from?
- - Emerge out of progressively accounting for sources of variance from signal.
- - Any chunk (segment) of the input can provide multiple features.
- - Speaker normalization may work by same process.
- Why phonologize?
- - Eliminates one step of parsing.
- How does the system balance need for features with utility of fine-grained detail?
- - Features provide tag to parse variance and utilize continuous detail.