Parsing acoustic variability as a mechanism for feature abstraction presentation

1
Parsing acoustic variability as a mechanism for
feature abstraction
Jennifer Cole, Gary Linebaugh (University of Illinois)
Bob McMurray, Cheyenne Munson (University of Iowa)
www.psychology.uiowa.edu/faculty/mcmurray
2
Phonetic precursors to phonological sound patterns

Many phonological sound patterns are claimed to
have precursors in systematic phonetic variation
that arises due to coarticulation

Assimilation
Vowel harmony from V-to-V coarticulation
(Ohala 1994; Beddor et al. 2001)
Palatalization from V-to-C coarticulation
(Ohala 1994)
Nasal place assimilation (-mb, -nd, -ŋg) from
C-to-C coarticulation
(Browman & Goldstein 1991)

Epenthesis
Epenthetic stops from C-C coarticulation
sentse
(Ohala 1998)

Deletion
Consonant cluster simplification via deletion
from C-C coarticulation: perfec(t) memory
(Browman & Goldstein 1991)

3
The role of the listener

Phonologization
when acoustic properties that arise due to
coarticulation are interpreted by the listener as
primary phonological properties of the target
sound.
generalization over variable acoustic input that
results in a new constraint on sound patterning.

4
The role of the listener

From V-to-V coarticulation

[Figure: vowel diagram; symbols lost in extraction]
5
The role of the listener

From V-to-V coarticulation
[Figure: vowel diagram; symbols lost in extraction]
6
The role of the listener

Perception may yield vowel assimilation
[Figure: vowel diagram; symbols lost in extraction]
7
The role of the listener

But distinct factors can produce similar
variants
[Figure: vowel diagram; symbols lost in extraction]
8
From perception to phonology

What is the mechanism for mapping from continuous
perceptual features to phonological categories?
?i mid and high
central and front-peripheral
?? mid and low
central and back

9
From perception to phonology

The problem
The perceptual system is confronted with
uncertainty due to variation arising from
multiple sources.

Yet, patterns of variation must get associated
with individual features of the context vowel
(e.g., high, front) if coarticulation serves as a
precursor to phonological assimilation.

How do lawful, categorical patterns emerge from
ambiguous, variable input?
the lack of invariance problem!

10
Our claims

What is the mechanism for mapping from continuous
perceptual features to phonological categories?

Variability is retained. Acoustic variability
is parsed into components related to the target
segment and the local context.
Feature abstraction through parsing. Acoustic
parsing provides a mechanism for the emergence of
phonological features from patterned variation in
fine phonetic detail.
11
Variability is retained

Listeners are sensitive to fine-grained acoustic
variation (Goldinger 2000; Hay 2000;
Pierrehumbert 2003).

Variability is retained, not discarded

Consistent with exemplar models of the lexicon,
phonetic detail is encoded and stored, and can
inform subsequent categorization of new sound
tokens.
12
Variability is retained

Variability is useful for the identification of
sounds in contexts of coarticulation.
The perceptual system uses information about
variability to identify a sound and its context,
in parallel.

Variability due to coarticulation is exploited
to facilitate perception.
- Listeners benefit from the presence of
anticipatory coarticulation in predicting the
identity of the upcoming sound.
(Martin & Bunnell 1982; Fowler 1981, 1984; Gow
2001, 2003;
Munson, this conference)

Variability due to coarticulation is subtracted
to identify the underlying target sound.
(Fowler 1984; Beddor et al. 2001, 2002; Gow 2003)

13
Variability and perceptual facilitation

Perceptual facilitation from V-to-V
coarticulation is expected to occur only if:
The effects of coarticulation are systematic: an
influencing vowel conditions a consistent
acoustic effect on target vowels.
The listener can recognize coarticulatory effects
on the target vowel.
The listener can isolate the effects of the context
vowel from other sources of variation, and
attribute those effects to the context vowel.

14
Feature abstraction through parsing

More specifically, under coarticulation of vowel
height and backness,
The listener must parse out the portion of the
variance in F1 and F2 that is due to
coarticulation, and base their perception of the
target vowel on the residual values.
Acoustic parsing isolates the effects of context
vowel on F1 and F2.
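
The idea of basing target perception on residual values can be sketched as follows. This is only an illustration under stated assumptions: the prototypes, the context shifts, and the names TARGETS, CONTEXT_SHIFT, nearest_target and identify_target are all hypothetical, and nearest-prototype classification stands in for whatever the listener actually does.

```python
# Hypothetical prototypes and context shifts (F1, F2 in Bark); all numbers
# are invented for illustration and are not measurements from the study.
TARGETS = {"v1": (6.0, 11.0), "v2": (5.4, 11.8)}
CONTEXT_SHIFT = {"i": (-0.3, 0.4), "a": (0.3, -0.2)}


def nearest_target(f1, f2):
    """Pick the target prototype closest to the point (f1, f2)."""
    return min(
        TARGETS,
        key=lambda v: (TARGETS[v][0] - f1) ** 2 + (TARGETS[v][1] - f2) ** 2,
    )


def identify_target(f1, f2, context):
    """Parse out the context vowel's expected effect, then classify."""
    shift_f1, shift_f2 = CONTEXT_SHIFT[context]
    return nearest_target(f1 - shift_f1, f2 - shift_f2)


# A token of v1 produced before /i/: raised and fronted toward /i/.
token = (5.65, 11.45)
print(nearest_target(*token))          # classification on raw values
print(identify_target(*token, "i"))    # classification on residual values
```

In this toy case the raw values fall closer to the wrong prototype, while the residual values recover the intended target.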

15
Feature abstraction through parsing

The parsed acoustic variance defines features of
the context vowel, over which new generalizations
can be formed. → phonologization

?i
? i
? i ? high
16
Feature abstraction through parsing


i
? i
? i ? high
phonologized to i
17
Feature abstraction through parsing


Question: Why phonologization? If target and
context vowels can both be identified from the
fine phonetic detail, what's the force driving
phonologization?
18
Testing the model

The acoustic parsing model of speech perception
requires that there is a robust and systematic
pattern of acoustic variation from V-to-V
coarticulation.
In this paper we present supporting evidence from
an acoustic study of coarticulation.

We examine a range of V-to-V coarticulatory
effects in VCV contexts that cross a word
boundary, where coarticulation cannot be
attributed to lexicalized phonetic patterns.

19

Key Questions

Extent of phenomenon
Does V-to-V coarticulation cross word boundaries?
Does V-to-V coarticulation affect both F1 and F2?
Relative strength of V-to-V effects vs. other
forms of coarticulation?
Usefulness of phenomenon
How could V-to-V effects translate to perceptual
inferences?
Is the information provided by V-to-V
coarticulation different when other sources of
variation are explained?

20

Methods
Target vowels: ? ? (measure coarticulation)
Context vowels: i æ ? (induce coarticulation)

/u/ excluded from contexts (rounded and fronted)
Intervening consonant varied in
- place (labial, coronal, velar)
- voicing
- /?g/ excluded (tends to be raised)

21

Methods
22

Methods

10 University of Illinois students.
48 phrases x 3 repetitions.
Phrases embedded in neutral carrier sentences:
/?/ He said _______ all the time
/?/ I love _______ as a title
Coding
F1, F2, F3
- Converted to Bark for analysis
LPC (Burg Method)
Outliers / misproductions inspected by hand

23

Analysis
Target x Voicing x Context
              F1      F2
Voicing       p.033   p.001
Target        p.005   p.001
Context       p.001   p.001
Interactions  n.s.    n.s.

V-to-V coarticulation crosses word boundaries.
Clear effects of coarticulatory context on both
F1 and F2.

Target x Place x Context
              F1      F2
Place         n.s.    p.001
Target        p.01    p.001
Context       p.001   p.001
Interactions  some    some
24
Analysis
[Figure: separate panels for male and female speakers]

A lot of unexplained variance
How does the perceptual system get to the
V-to-V coarticulation?
How useful is V-to-V coarticulation?
Does accounting for other sources of variance in
the signal improve the usefulness of V-to-V?
25
Strategy
Need to systematically account for sources of
variance prior to evaluating V-to-V
coarticulation.
?-coarticulated ?? or i-coarticulated ??
26
Strategy
A slightly i-coarticulated ?? or A really
i-coarticulated ??
27
Strategy
If you knew the category:
If ?, then expect i
If ?, then expect ?
? - ?: Positive (more i-like)
? - ?: Negative (more ?-like)
F2? F2category coarticulation direction
28
Strategy
Target F2? coarticulation direction
Strategy:
1) Compute mean of a source of variance
2) Subtract that mean from F1/F2
3) Residual is coarticulation direction.
4) Repeat for each source of variance
(speaker, target vowel, place, voicing).
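
The four steps can be sketched as repeated mean subtraction. This is a toy illustration, not the authors' analysis code: the helper name parse_out and all values are hypothetical, and the design is balanced so that plain group means suffice.

```python
from collections import defaultdict
from statistics import mean


def parse_out(values, labels):
    """Subtract each group's mean from its members; return the residuals."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    group_mean = {lab: mean(vs) for lab, vs in groups.items()}
    return [v - group_mean[lab] for v, lab in zip(values, labels)]


# Hypothetical F2 values (Bark) for illustration; not data from the study.
f2      = [11.0, 11.4, 12.0, 12.4, 10.0, 10.4, 11.0, 11.4]
speaker = ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"]
target  = ["v1", "v1", "v2", "v2", "v1", "v1", "v2", "v2"]
context = ["a",  "i",  "a",  "i",  "a",  "i",  "a",  "i"]

resid = f2
for source in (speaker, target):   # steps 1-4: one source at a time
    resid = parse_out(resid, source)

# What is left is the coarticulation direction: positive = more i-like here.
for c, r in zip(context, resid):
    print(c, round(r, 2))
```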
29
Strategy
Hierarchical regression can do exactly these
things.
1) Compute mean of a source of variance
F1predicted = β1 · target + β0
If target = 0 for /?/ and 1 for /?/:
→ F1predicted = β1 · 0 + β0, so Mean /?/ = β0
→ F1predicted = β1 · 1 + β0, so Mean /?/ = β0 + β1
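
A quick check that a regression on a 0/1-coded target recovers the two category means. This is a sketch with invented F1 values; fit_line is a hypothetical helper implementing ordinary least squares for a single predictor.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = b1 * x + b0 with one predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Hypothetical F1 values (Bark); target coded 0 for one vowel, 1 for the other.
target = [0, 0, 0, 1, 1, 1]
f1     = [5.0, 5.2, 5.4, 6.0, 6.1, 6.5]

b0, b1 = fit_line(target, f1)
mean_0 = mean(v for v, t in zip(f1, target) if t == 0)
mean_1 = mean(v for v, t in zip(f1, target) if t == 1)

print(b0, mean_0)        # intercept vs. mean of the vowel coded 0
print(b0 + b1, mean_1)   # intercept + slope vs. mean of the vowel coded 1
```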
30
Strategy

Hierarchical Regression can do exactly these
things.
1) Compute mean of a source of variance.
2) Subtract that mean from F1/F2.
3) Residual is coarticulation direction.

Residual = F1actual - F1predicted
         = F1actual - (β1 · target + β0)
→ Residtarget = F1actual - β0 (target = 0)
→ Residtarget = F1actual - (β0 + β1) (target = 1)
31
Strategy

Hierarchical Regression can do exactly these
things.
1) Compute mean of a source of variance.
2) Subtract that mean from F1/F2.
3) Residual is coarticulation direction.
4) Repeat for each source of variance (speaker,
target vowel, place, voicing).

F1 = β1 · Target + β0
Residtarget = β2 · Place + β0
Residplace = β3 · Voicing + β0
Residvoicing = β4 · V-to-V + β0
32
Strategy

Construct a hierarchical regression to
systematically account for known sources of
variance in F1 and F2:
Speaker
Target vowel
Place (intervening C)
Voicing (intervening C)
Interactions between target, place and voicing
After partialing out these factors, how much
variance does vowel context (V-to-V) account for?
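
One way to read "how much variance does vowel context account for" is as the share of total variance removed at each step of entry. The sketch below uses invented, balanced toy data and hypothetical helper names (residualize, sum_sq); it replaces the full regression with group-mean subtraction and ignores interactions.

```python
from collections import defaultdict
from statistics import mean


def residualize(values, labels):
    """Remove each label's mean from its members (one regression step)."""
    by_label = defaultdict(list)
    for v, lab in zip(values, labels):
        by_label[lab].append(v)
    means = {lab: mean(vs) for lab, vs in by_label.items()}
    return [v - means[lab] for v, lab in zip(values, labels)]


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


# Hypothetical F2 values (Bark) in a balanced toy design; not study data.
f2 = [11.0, 11.4, 12.0, 12.4, 10.0, 10.4, 11.0, 11.4]
factors = {
    "speaker": ["s1"] * 4 + ["s2"] * 4,
    "target":  ["v1", "v1", "v2", "v2"] * 2,
    "V-to-V":  ["a", "i"] * 4,
}

total = sum_sq(f2)
resid, share = f2, {}
for name, labels in factors.items():      # dict order = order of entry
    before = sum_sq(resid)
    resid = residualize(resid, labels)
    share[name] = (before - sum_sq(resid)) / total

for name, s in share.items():
    print(f"{name}: {s:.3f} of total variance")
```

Entering V-to-V last means its share reflects only what the earlier factors left unexplained.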

33
Regression F2
1) Raw Data
Male
Female
34
Regression F2

1) Raw Data
Partialed Out
2) Subject

35
Regression F2

1) Raw Data
Partialed Out
2) Subject
3) Target Vowel

36
Regression F2

1) Raw Data
Partialed Out
2) Subject
3) Target Vowel
4) Consonant

37
Regression F2

1) Raw Data
Partialed Out
2) Subject
3) Target Vowel
4) Consonant
5) Interactions

38
Regression F1
39
Regression F1
40
Regression F1
Total R2 = .884
Post-hoc analysis: height only.
41
Regression F1
Total R2 = .884
Post-hoc analysis: height only.
42
Regression F2
43
Regression F2
44
Regression F2
45
Regression F2
Total R2 = .940
Post-hoc analysis: height and backness.
46
Regression Summary
Progressively accounting for variance is
powerful:
F1: 88% of variance
F2: 94% of variance
using only known sources of variance.
V-to-V coarticulation is readily apparent when
other sources of variance are explained.
Effect of V-to-V coarticulation has a similar
size to place/voicing effects.
How useful would this be?
47
Predicting Vowel Identity

Multinomial Logistic Regression (MLR)
Classification algorithm
Predict category membership from multiple
variables.
Categories do not have to be binary

48
Predicting Vowel Identity

Assumes optimal listener.
Computes % correct.
How well could a listener do under ideal
circumstances with the information provided?
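
A minimal sketch of multinomial logistic regression used as a classifier, written in plain Python with batch gradient descent. The data are invented (F1, F2) residuals for three context vowels, and the names fit_mlr and predict, the learning rate, and the epoch count are assumptions for illustration; the slides do not say how the model was fitted.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_mlr(X, y, n_classes, lr=0.5, epochs=500):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_feat = len(X[0])
    # One weight vector per class; the last entry is the bias term.
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = list(xi) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(Wk, row)) for Wk in W])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, xi):
    """Return the index of the most probable class."""
    row = list(xi) + [1.0]
    scores = [sum(w * v for w, v in zip(Wk, row)) for Wk in W]
    return scores.index(max(scores))


# Hypothetical (F1, F2) residuals for three context vowels; not study data.
X = [(-0.3, 0.4), (-0.2, 0.5), (-0.4, 0.3),     # class 0
     (0.4, 0.1), (0.5, 0.0), (0.3, 0.2),        # class 1
     (0.1, -0.5), (0.0, -0.4), (-0.1, -0.6)]    # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = fit_mlr(X, y, n_classes=3)
correct = sum(predict(W, xi) == yi for xi, yi in zip(X, y))
print(f"percent correct: {100 * correct / len(X):.0f}")
```

Percent correct on held-out tokens, rather than on the training tokens as here, would be the fairer measure of what an ideal listener could do.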

49
Predicting Vowel Identity
[Figure: bar chart of % correct by context vowel (i, ?, æ, same vowel). Partialed out: subject, vowel, place, voicing, interactions]
Model does quite well at predicting all vowels
but the identity.
50
Predicting Vowel Identity
51
Predicting Vowel Identity
Does partialing out other sources of variance
improve the utility of V-to-V coarticulation?
- Use linear regression to partial out variance.
- Use F1, F2 residuals to predict vowels.
FULL: Partial out everything.
RAW: No parsing.
SPEAKER: Partial out speaker variation only.
Assume speaker normalization, but no
interactions between consonant, or vowel and
V-to-V.
VOWEL: Partial out effects of everything heard
at the target vowel (speaker, target).
NO-SPKR: Assume no normalization, but
interactions between consonants.
52
Predicting Vowel Identity
[Figure: bar chart of % correct (axis 25 to 45) for FULL, VOWEL, SPEAKER, NO-SPKR, RAW]
FULL: about 4% better than others.
VOWEL: parsing out consonant may not be necessary.
SPEAKER: effect of speaker and phonetic cues similar.
RAW: V-to-V not useful without some parsing.
53
Predicting Vowel Identity
Suggests a 3-stage parsing process to maximally
use V-to-V modifications:
1) Parse out speaker effects on target
2) Regressively compensate for consonant
coarticulation
3) Use residuals to predict context vowel
[Diagram labels: target vowel, consonant, context vowel, preceding context]
54

Key Questions

Extent of phenomenon
Word boundaries?
Both F1 and F2?
Relative strength of V-to-V effects?
Usefulness of phenomenon
Perceptual inferences?
Parsing out variability?

55
Summary Extent

Clear evidence for V-to-V coarticulation across
word boundaries: not lexicalized.
V-to-V in both formants (height and backness).
Strength is similar to that of place and voicing.
Known sources of variance (speaker, vowel,
consonant, V-to-V) can account for most of the
variability in vowel production.
Problem of lack of invariance?
Identifying multiple categories at once may be
easier than identifying one.

56
Summary Usefulness

Idealized listener (with parsing) could identify
upcoming vowel at 40% correct given only V-to-V
coarticulation.
- Near 50% for /i/ and /?/
Parsing dramatically improves predictive power of
V-to-V coarticulation
Do you need perfect categorization of variance
sources (e.g. speaker, target vowel, voicing)?
Imperfect categorization enhances need for
multiple cues.
Simultaneously evaluating multiple features (e.g.
V1, C, V2) yields correct parse.
How do you determine the order of parsing?
- Temporal order of information arrival?

57
Future Directions

How do you identify the components you will be
parsing?
See Toscano poster.
Does the model actually describe perception?
Parsing is a temporal process.
Visual world paradigm to measure time-course of
processing (e.g. McMurray, Clayards & Tanenhaus,
in prep; McMurray, Tanenhaus & Aslin, 2002;
McMurray, Munson & Gow, submitted).
Parsing as part of word recognition.
Lexical structure can contribute to inferences.
Interactive activation models (McClelland &
Elman, 1986) could implement this.

58
Conclusions

Where do features come from?
Emerge out of progressively accounting for
sources of variance from signal.
Any chunk (segment) of the input can provide
multiple features.
Speaker normalization may work by same process.
Why phonologize?
Eliminates one step of parsing.
How does the system balance need for features
with utility of fine-grained detail?
Features provide tag to parse variance and
utilize continuous detail.
