Parsing acoustic variability as a mechanism for feature abstraction


Title: Parsing acoustic variability as a mechanism for feature abstraction


1
Parsing acoustic variability as a mechanism for
feature abstraction
Jennifer Cole, Gary Linebaugh (University of Illinois)
Bob McMurray, Cheyenne Munson (University of Iowa)
www.psychology.uiowa.edu/faculty/mcmurray
2
Phonetic precursors to phonological sound patterns
  • Many phonological sound patterns are claimed to
    have precursors in systematic phonetic variation
    that arises due to coarticulation
  • Assimilation
  • Vowel harmony from V-to-V coarticulation
    (Ohala 1994; Beddor et al. 2001)
  • Palatalization from V-to-C coarticulation
    (Ohala 1994)
  • Nasal place assimilation (-mb, -nd, -ŋg) from
    C-to-C coarticulation
    (Browman & Goldstein 1991)
  • Epenthesis
  • Epenthetic stops from C-C coarticulation,
    e.g. sen[t]se
    (Ohala 1998)
  • Deletion
  • Consonant cluster simplification via deletion
    from C-C coarticulation, e.g. perfec(t) memory
    (Browman & Goldstein 1991)

3
The role of the listener
  • Phonologization
  • when acoustic properties that arise due to
    coarticulation are interpreted by the listener as
    primary phonological properties of the target
    sound.
  • generalization over variable acoustic input that
    results in a new constraint on sound patterning.

4
The role of the listener
  • From V-to-V coarticulation

[Diagram: vowel symbols lost in extraction]
5
The role of the listener
  • From V-to-V coarticulation
[Diagram: vowel symbols lost in extraction]
6
The role of the listener
  • Perception may yield vowel assimilation
[Diagram: vowel symbols lost in extraction]
7
The role of the listener
  • But distinct factors can produce similar
    variants
[Diagram: vowel symbols lost in extraction]
8
From perception to phonology
  • What is the mechanism for mapping from continuous
    perceptual features to phonological categories?
  • ?i mid and high
  • central and front-peripheral
  • ?? mid and low
  • central and back

9
From perception to phonology
  • What is the mechanism for mapping from continuous
    perceptual features to phonological categories?
  • ?i mid and high
  • central and front-peripheral
  • ?? mid and low
  • central and back
  • The problem
  • The perceptual system is confronted with
    uncertainty due to variation arising from
    multiple sources.
  • Yet, patterns of variation must get associated
    with individual features of the context vowel
    (e.g., high, front) if coarticulation serves as a
    precursor to phonological assimilation.
  • How do lawful, categorical patterns emerge from
    ambiguous, variable input?
  • the lack of invariance problem!

10
Our claims
  • What is the mechanism for mapping from continuous
    perceptual features to phonological categories?
  • Our claims
  • Variability is retained. Acoustic variability
    is parsed into components related to the target
    segment and the local context.
  • Feature abstraction through parsing. Acoustic
    parsing provides a mechanism for the emergence of
    phonological features from patterned variation in
    fine phonetic detail.
11
Variability is retained
  • Listeners are sensitive to fine-grained acoustic
    variation. (Goldinger 2000; Hay 2000;
    Pierrehumbert 2003)
  • Variability is retained, not discarded

Consistent with exemplar models of the lexicon,
phonetic detail is encoded and stored, and can
inform subsequent categorization of new sound
tokens.
12
Variability is retained
  • Variability is useful for the identification of
    sounds in contexts of coarticulation.
  • The perceptual system uses information about
    variability to identify a sound and its context,
    in parallel.
  • Variability due to coarticulation is exploited
    to facilitate perception.
  • Listeners benefit from the presence of
    anticipatory coarticulation in predicting the
    identity of the upcoming sound.
    (Martin & Bunnell 1982; Fowler 1981, 1984; Gow
    2001, 2003; Munson, this conference)
  • Variability due to coarticulation is subtracted
    to identify the underlying target sound.
    (Fowler 1984; Beddor et al. 2001, 2002; Gow 2003)

13
Variability and perceptual facilitation
  • Perceptual facilitation from V-to-V
    coarticulation is expected to occur only if:
  • The effects of coarticulation are systematic: an
    influencing vowel conditions a consistent
    acoustic effect on target vowels.
  • The listener can recognize coarticulatory effects
    on the target vowel.
  • The listener can isolate the effects of the context
    vowel from other sources of variation, and
    attribute those effects to the context vowel.

14
Feature abstraction through parsing
  • More specifically, under coarticulation of vowel
    height and backness,
  • The listener must parse out the portion of the
    variance in F1 and F2 that is due to
    coarticulation, and base their perception of the
    target vowel on the residual values.
  • Acoustic parsing isolates the effects of context
    vowel on F1 and F2.

15
Feature abstraction through parsing
  • The parsed acoustic variance defines features of
    the context vowel, over which new generalizations
    can be formed. → phonologization

[Diagram: symbols partly lost in extraction; legible labels: i, high]
16
Feature abstraction through parsing
  • The parsed acoustic variance defines features of
    the context vowel, over which new generalizations
    can be formed. → phonologization

[Diagram: symbols partly lost in extraction; legible labels: i, high, "phonologized to i"]
17
Feature abstraction through parsing
  • The parsed acoustic variance defines features of
    the context vowel, over which new generalizations
    can be formed. → phonologization

Question: Why phonologization? If target and
context vowels can both be identified from the
fine phonetic detail, what is the force driving
phonologization?
18
Testing the model
  • The acoustic parsing model of speech perception
    requires that there is a robust and systematic
    pattern of acoustic variation from V-to-V
    coarticulation.
  • This paper: we present supporting evidence from
    an acoustic study of coarticulation.
  • We examine a range of V-to-V coarticulatory
    effects in VCV contexts that cross a word
    boundary, where coarticulation cannot be
    attributed to lexicalized phonetic patterns.

19

Key Questions
  • Extent of phenomenon
  • Does V-to-V coarticulation cross word boundaries?
  • Does V-to-V coarticulation affect both F1 and F2?
  • Relative strength of V-to-V effects vs. other
    forms of coarticulation?
  • Usefulness of phenomenon
  • How could V-to-V effects translate to perceptual
    inferences?
  • Is the information provided by V-to-V
    coarticulation different when other sources of
    variation are explained?

20

Methods
Target vowels: ɛ, ʌ (measure coarticulation)
Context vowels: i, æ, ɑ (induce coarticulation)
  • /u/ excluded from contexts (rounded, fronted)
  • Intervening consonant varied in
  • - place (labial, coronal, velar)
  • - voicing
  • - /ɛg/ excluded (tends to be raised)

21

Methods
22

Methods
  • 10 University of Illinois students.
  • 48 phrases x 3 repetitions.
  • Sentences embedded in neutral carrier sentences
  • /?/ He said _______ all the time
  • /?/ I love _______ as a title
  • Coding
  • F1, F2, F3
  • - Converted to Bark for analysis
  • LPC (Burg Method)
  • Outliers / misproductions inspected by hand

23

Analysis
Target x Voicing x Context
                F1      F2
Voicing         p.033   p.001
Target          p.005   p.001
Context         p.001   p.001
Interactions    n.s.    n.s.

Target x Place x Context
                F1      F2
Place           n.s.    p.001
Target          p.01    p.001
Context         p.001   p.001
Interactions    some    some

V-to-V coarticulation crosses word boundaries.
Clear effects of coarticulatory context on both
F1 and F2.
24
Analysis
[Figure panels: Male, Female]
  • A lot of unexplained variance
  • How does the perceptual system get to the
    V-to-V coarticulation?
  • How useful is V-to-V coarticulation?
  • Does accounting for other sources of variance in
    the signal improve the usefulness of V-to-V?
25
Strategy
Need to systematically account for sources of
variance prior to evaluating V-to-V
coarticulation.
?-coarticulated ?? or i-coarticulated ??
26
Strategy
Need to systematically account for sources of
variance prior to evaluating V-to-V
coarticulation.
A slightly i-coarticulated ?? or A really
i-coarticulated ??
27
Strategy
Need to systematically account for sources of
variance prior to evaluating V-to-V
coarticulation.
If you knew the category If ?, then expect
i If ? then expect ?
? - ? Positive (more i-like) ? - ? Negative
(more ?-like)
$F2 - F2_{category}$ = coarticulation direction
28
Strategy
Target F2? coarticulation direction
Strategy
1) Compute mean of a source of variance.
2) Subtract that mean from F1/F2.
3) Residual is coarticulation direction.
4) Repeat for each source of variance
(speaker, target vowel, place, voicing).
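
A minimal sketch of one pass through steps 1 to 3, assuming a single categorical source of variance (here the target vowel). The function name `partial_out`, the labels "E" and "V", and the F2 values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partial_out(values, labels):
    """Subtract from each value the mean of all values sharing its label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return [v - means[lab] for v, lab in zip(values, labels)]


# Hypothetical F2 values (Bark) for two target-vowel categories.
f2 = [12.0, 12.4, 10.0, 10.6]
target = ["E", "E", "V", "V"]

resid = partial_out(f2, target)
# A positive residual means F2 is higher than expected for the category,
# a negative residual means it is lower.
print([round(r, 3) for r in resid])
```

Step 4 then amounts to calling the same function again on the residuals with the next set of labels.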
29
Strategy
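Hierarchical regression can do exactly these
things.
1) Compute mean of a source of variance.
$F1_{predicted} = \beta_1 \cdot target + \beta_0$
If target = 0 for one target vowel and 1 for the other:
target = 0: $F1_{predicted} = \beta_1 \cdot 0 + \beta_0$, so the mean of that vowel is $\beta_0$
target = 1: $F1_{predicted} = \beta_1 \cdot 1 + \beta_0$, so the mean of that vowel is $\beta_0 + \beta_1$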
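
A small sketch of why a regression on a 0/1 target code computes category means: with one dummy-coded predictor, the least-squares intercept equals the mean of the category coded 0, and intercept plus slope equals the mean of the category coded 1. The helper `fit_line` and the F1 values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b1 * x + b0 with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical F1 values (Bark); target is dummy-coded 0 or 1.
target = [0, 0, 0, 1, 1, 1]
f1 = [5.0, 5.2, 5.4, 6.0, 6.1, 6.5]

b0, b1 = fit_line(target, f1)
mean_0 = sum(f1[:3]) / 3  # mean of the category coded 0
mean_1 = sum(f1[3:]) / 3  # mean of the category coded 1

print(f"b0 = {b0:.3f}, mean of category 0 = {mean_0:.3f}")
print(f"b0 + b1 = {b0 + b1:.3f}, mean of category 1 = {mean_1:.3f}")
```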
30
Strategy
  • Hierarchical regression can do exactly these
    things.
  • 1) Compute mean of a source of variance.
  • 2) Subtract that mean from F1/F2.
  • 3) Residual is coarticulation direction.

$Residual = F1_{actual} - F1_{predicted} = F1_{actual} - (\beta_1 \cdot target + \beta_0)$
target = 0: $Resid_{target} = F1_{actual} - \beta_0$
target = 1: $Resid_{target} = F1_{actual} - (\beta_0 + \beta_1)$
31
Strategy
  • Hierarchical regression can do exactly these
    things.
  • 1) Compute mean of a source of variance.
  • 2) Subtract that mean from F1/F2.
  • 3) Residual is coarticulation direction.
  • 4) Repeat for each source of variance (speaker,
    target vowel, place, voicing).

$F1 = \beta_1 \cdot Target + \beta_0$
$Resid_{target} = \beta_2 \cdot Place + \beta_0$
$Resid_{place} = \beta_3 \cdot Voicing + \beta_0$
$Resid_{voicing} = \beta_4 \cdot \text{V-to-V} + \beta_0$
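
The chain of regressions on this slide can be sketched as a loop in which each factor is regressed out of the residuals left by the previous one. This is an illustration under stated assumptions, not the authors' analysis code: every factor is categorical, so regressing it out equals subtracting its level means, and the toy data are a made-up, perfectly additive 2 x 2 x 2 design. The names `partial_out`, `variance`, and `hierarchical_parse` are hypothetical.

```python
from collections import defaultdict


def partial_out(values, labels):
    """Residuals after subtracting the mean of each label's values."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    means = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
    return [v - means[lab] for v, lab in zip(values, labels)]


def variance(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def hierarchical_parse(values, factors):
    """Partial out factors in order; report each factor's share of total variance."""
    total = variance(values)
    resid = list(values)
    shares = {}
    for name, labels in factors:
        new_resid = partial_out(resid, labels)
        shares[name] = (variance(resid) - variance(new_resid)) / total
        resid = new_resid
    return resid, shares


# Toy data: 10 + target effect (+-1) + voicing effect (+-0.5) + context effect (+-0.25).
target = ["E", "E", "E", "E", "V", "V", "V", "V"]
voicing = ["vd", "vd", "vl", "vl", "vd", "vd", "vl", "vl"]
context = ["i", "a", "i", "a", "i", "a", "i", "a"]
f1 = [11.75, 11.25, 10.75, 10.25, 9.75, 9.25, 8.75, 8.25]

resid, shares = hierarchical_parse(
    f1, [("target", target), ("voicing", voicing), ("context", context)]
)
for name, share in shares.items():
    print(f"{name}: {share:.3f} of total variance")
```

In this toy design the last factor (context, standing in for V-to-V) accounts for a small share of the raw variance but for all of the variance that remains once target and voicing have been removed.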
32
Strategy
  • Construct a hierarchical regression to
    systematically account for known sources of
    variance from F1 and F2
  • Speaker
  • Target vowel
  • Place (intervening C)
  • Voicing (intervening C)
  • Interactions between target, place, and voicing
  • After partialing out these factors, how much
    variance does vowel context (V-to-V) account for?

33
Regression F2
1) Raw Data
[Figure panels: Male, Female]
34
Regression F2
  • 1) Raw Data
  • Partialed Out
  • 2) Subject

35
Regression F2
  • 1) Raw Data
  • Partialed Out
  • 2) Subject
  • 3) Target Vowel

36
Regression F2
  • 1) Raw Data
  • Partialed Out
  • 2) Subject
  • 3) Target Vowel
  • 4) Consonant

37
Regression F2
  • 1) Raw Data
  • Partialed Out
  • 2) Subject
  • 3) Target Vowel
  • 4) Consonant
  • 5) Interactions

38
Regression F1
39
Regression F1
40
Regression F1
Total R² = .884
Post-hoc analysis: height only.
41
Regression F1
Total R² = .884
Post-hoc analysis: height only.
42
Regression F2
43
Regression F2
44
Regression F2
45
Regression F2
Total R² = .940
Post-hoc analysis: height and backness.
46
Regression Summary
Progressively accounting for variance is
powerful:
F1: 88% of variance
F2: 94% of variance
using only known sources of variance.
V-to-V coarticulation is readily apparent when
other sources of variance are explained.
Effect of V-to-V coarticulation has a similar
size to place/voicing effects.
How useful would this be?
47
Predicting Vowel Identity
  • Multinomial Logistic Regression (MLR)
  • Classification algorithm
  • Predict category membership from multiple
    variables.
  • Categories do not have to be binary

48
Predicting Vowel Identity
  • Multinomial Logistic Regression (MLR)
  • Classification algorithm
  • Predict category membership from multiple
    variables.
  • Categories do not have to be binary
  • Assumes optimal listener.
  • Computes % correct.
  • How well could a listener do under ideal
    circumstances with the information provided?
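
A minimal sketch of a multinomial logistic regression classifier of the kind described above, written in plain Python with batch gradient descent. It is not the authors' model or software: the learning rate, the number of epochs, the class coding (0 = /i/, 1 = /æ/, 2 = a low back vowel), and the F1/F2 residual values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, x):
    """Linear score for each class; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def fit_mlr(X, y, n_classes, lr=0.5, epochs=500):
    """Fit softmax regression weights by batch gradient descent."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    """Return the index of the most probable class."""
    scores = class_scores(W, x)
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical (F1 residual, F2 residual) pairs in Bark, three context vowels.
X = [(-0.3, 0.4), (-0.2, 0.5), (-0.4, 0.3),   # class 0: /i/ context
     (0.3, 0.2), (0.4, 0.1), (0.2, 0.3),      # class 1: /ae/ context
     (0.2, -0.4), (0.3, -0.5), (0.1, -0.3)]   # class 2: low back context
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = fit_mlr(X, y, n_classes=3)
accuracy = sum(predict(W, x) == label for x, label in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

Percent correct from a classifier like this is an upper bound in the sense used on the slide: it measures how much context-vowel information the cues carry, not how listeners actually use them.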

49
Predicting Vowel Identity
[Figure: % correct (y-axis, 0 to 60) by context vowel, including a "Same Vowel" condition. Partialed out: subject, vowel, place, voicing, interactions.]
Model does quite well at predicting all vowels
but the identity.
50
Predicting Vowel Identity
51
Predicting Vowel Identity
Does partialing out other sources of variance
improve the utility of V-to-V coarticulation?
  • Use linear regression to partial out variance.
  • Use F1, F2 residuals to predict vowels.
FULL: Partial out everything.
RAW: No parsing.
SPEAKER: Partial out speaker variation only.
Assume speaker normalization, but no
interactions between consonant, or vowel and
V-to-V.
VOWEL: Partial out effects of everything
heard at the target vowel (speaker + target).
NO-SPKR: Assume no normalization, but
interactions between consonants.
52
Predicting Vowel Identity
[Figure: % correct (y-axis, 25 to 45) for each parsing condition: FULL, VOWEL, SPEAKER, NO-SPKR, RAW]
FULL: about 4% better than others.
VOWEL: parsing out consonant may not be necessary.
SPEAKER: Effect of speaker and phonetic cues similar.
RAW: V-to-V not useful without some parsing.
53
Predicting Vowel Identity
[Diagram labels: preceding context, target vowel, consonant, context vowel]
1) Parse out speaker effects on target.
2) Regressively compensate for consonant
coarticulation.
3) Use residuals to predict context vowel.
Suggests a 3-stage parsing process to maximally
use V-to-V modifications.
54

Key Questions
  • Extent of phenomenon
  • Word boundaries?
  • Both F1 and F2?
  • Relative strength of V-to-V effects?
  • Usefulness of phenomenon
  • Perceptual inferences?
  • Parsing out variability?

55
Summary Extent
  • Clear evidence for V-to-V coarticulation across
    word boundaries: not lexicalized.
  • V-to-V in both formants (height and backness).
  • Strength is similar to that of place and voicing.
  • Known sources of variance (speaker, vowel,
    consonant, V-to-V) can account for most of the
    variability in vowel production.
  • Problem of lack of invariance?
  • Identifying multiple categories at once may be
    easier than identifying one.

56
Summary Usefulness
  • Idealized listener (with parsing) could identify
    upcoming vowel at 40% correct given only V-to-V
    coarticulation.
  • - Near 50% for /i/ and /ɑ/
  • Parsing dramatically improves predictive power of
    V-to-V coarticulation
  • Do you need perfect categorization of variance
    sources (e.g. speaker, target vowel, voicing)?
  • Imperfect categorization enhances need for
    multiple cues.
  • Simultaneously evaluating multiple features (e.g.
    V1, C, V2) yields correct parse.
  • How do you determine the order of parsing?
  • - Temporal order of information arrival?

57
Future Directions
  • How do you identify the components you will be
    parsing?
  • See Toscano poster.
  • Does the model actually describe perception?
  • Parsing is a temporal process.
  • Visual world paradigm to assess time-course of
    processing (e.g. McMurray, Clayards & Tanenhaus,
    in prep; McMurray, Tanenhaus & Aslin, 2002;
    McMurray, Munson & Gow, submitted).
  • Parsing as part of word recognition.
  • Lexical structure can contribute to inferences.
  • Interactive activation models (McClelland &
    Elman, 1986) could implement this.

58
Conclusions
  • Where do features come from?
  • Emerge out of progressively accounting for
    sources of variance from signal.
  • Any chunk (segment) of the input can provide
    multiple features.
  • Speaker normalization may work by same process.
  • Why phonologize?
  • Eliminates one step of parsing.
  • How does the system balance need for features
    with utility of fine-grained detail?
  • Features provide tag to parse variance and
    utilize continuous detail.