Two /b/ or not too bee: Gradient sensitivity to subphonemic variation, categorical perception, and the effect of task
Bob McMurray
With thanks to
2. Outline
- Invariance, covariance, and gradient sensitivity in speech perception
- Categorical perception and other previous research
- Experiment 1: Gradient sensitivity in word recognition
- Experiments 2-5: The effect of experimental task
- Targets, competitors, gradient sensitivity, and temporal dynamics
- Conclusions
3. Problem of Invariance
Phonetic features are correlated with many acoustic realizations, and the acoustic realization of a phonetic feature depends on context. How do we extract invariant linguistic representations from a variable acoustic signal? What properties of the signal provide an invariant mapping to linguistic representations? How do we extract discrete units from a graded signal?
4. Problem of Invariance
Two solutions:
- Motor Theory: acoustic invariance does not exist, but specialized mechanisms allow us to unpack speech into invariant motor representations (Liberman & Mattingly, 1985; Fowler, 1986).
- Acoustic Invariance: better computational methods and neurologically inspired models may find invariant acoustic properties of the signal (Blumstein, 1998; Sussman et al., 1998).
5. Rethinking Invariance
The fundamental approach: how do we pay attention to the right parts of the signal and ignore the variation? However, recent work suggests that this variation is actually highly informative covariation.
6. Rethinking Invariance
Measurements of productions show effects of:
- speaking rate on VOT (e.g., Kessinger & Blumstein)
- prosodic domain on VOT and articulatory strength (Fougeron & Keating)
- place of articulation on vowel quality up to 5 syllables away (Local)
- between-consonant coarticulation (Mann & Repp)
These findings suggest that a system sensitive to fine-grained detail could take advantage of all of this information.
7. Rethinking Invariance
Speech perception shows probabilistic effects of many information sources: lexical context, spectral vs. temporal cues, visual information, transition statistics, speech rate, stimulus naturalness, sentential context, compensatory coarticulation, embeddings, syllabic stress, lexical stress, phrasal stress.
A system that was sensitive to fine-grained acoustic detail might be much more efficient than one that was not. Tracking covariance may help solve the problem of invariance.
8. What sort of sensitivity is needed?
Gradient sensitivity: as fundamentally graded acoustic information changes (even changes that still result in the same category), activation of lexical or sublexical representations changes monotonically. Activation of linguistic units reflects the probability that a unit is instantiated by the acoustic signal.
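The contrast between gradient and strictly categorical activation can be sketched numerically. In this toy model, a logistic function maps VOT to /p/ activation; the 17 ms boundary and the slope are illustrative values, not parameters from the experiments:

```python
import math

def p_activation(vot_ms, boundary=17.0, slope=0.35):
    """Graded /p/ activation as a logistic function of VOT.

    Under gradient sensitivity, activation changes monotonically with
    VOT even within a category; boundary and slope are invented here.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

def categorical(vot_ms, boundary=17.0):
    """Step-function activation implied by strict categorical perception."""
    return 1.0 if vot_ms > boundary else 0.0

# Two tokens on the /b/ side of the boundary: same category,
# but different graded activations.
print(p_activation(0), p_activation(10))   # both < .5, yet unequal
print(categorical(0), categorical(10))     # identical: 0.0 0.0
```

The gradient listener distinguishes a 0 ms from a 10 ms token even though both are labeled /b/; the categorical listener treats them identically.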
9. Categorical Perception
CP suggests listeners do not show gradient sensitivity to subphonemic information.
[Figure: idealized identification function; percent /p/ responses (0-100) as a function of VOT, with a sharp /b/-/p/ boundary.]
10. Evidence for Categorical Perception
Supported by:
- Work on VOT and place of articulation.
- Ubiquity of steep identification functions.
- Recent electrophysiological data (e.g., Phillips, Pellathy, Marantz, Yellin, Wexler, Poeppel, McGinnis & Roberts, 2000; Sharma & Dorman, 1999).
11. Revisiting Categorical Perception?
Evidence against CP comes from:
- Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
- Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
- Goodness ratings: Miller (1997); Massaro & Cohen (1983)
Only goodness ratings show any hint of gradiency; there are no gradient effects from identification tasks. But 2AFC metalinguistic tasks may underestimate sensitivity to subphonemic acoustic information.
12. Lexical sensitivity
Andruski, Blumstein & Burton (1994):
- Created stimuli that were either voiceless, 1/3 voiced, or 2/3 voiced.
- 2/3 voiced stimuli primed semantic associates more weakly than fully voiceless or 1/3 voiced tokens.
- First demonstration of lexical sensitivity to natural variation in consonants.
However:
- 2/3 voiced stimuli were close to the category boundary.
- No evidence for gradiency: only a difference between 2 items.
- Hard to interpret temporal dynamics in priming tasks.
13. Remaining Questions
- Is sensitivity to subphonemic differences gradient?
- Is it symmetrical (i.e., gradiency on both sides of the category boundary)?
- Are differences preserved long enough to be usefully combined with subsequent input?
- Perhaps we need a more sensitive measure.
14. Eye-Tracking
A 250 Hz real-time stream of eye positions, parsed into saccades, fixations, blinks, etc., with head-movement compensation. Output is in screen coordinates.
15. Eye-Tracking
Fixations to objects in response to spoken instructions:
- are time-locked to incoming information (Tanenhaus, Spivey-Knowlton, Eberhard and Sedivy, 1995)
- can be easily mapped onto lexical activation from models like TRACE (Allopenna, Magnuson and Tanenhaus, 1998)
- show effects of non-displayed competitors (Dahan, Magnuson, Tanenhaus & Hogan)
- provide a glimpse at how activation for competitors unfolds in parallel over time.
16. Experiment 1: Lexical Identification (too bee)
Can we use eye-tracking methodologies to find evidence for graded perception of VOT?
17. Experiment 1: Lexical Identification
- Six 9-step /b/-/p/ VOT continua (0-40 ms): bear/pear, beach/peach, butter/putter, bale/pale, bump/pump, bomb/palm
- 12 L- and Sh- filler items: leaf, lamp, ladder, lock, lip, leg, shark, ship, shirt, shoe, shell, sheep
- Identification indicated by mouse click on a picture
- Eye movements monitored at 250 Hz
- 17 subjects
18-20. Experiment 1: Lexical Identification (trial sequence)
Subjects get a moment to view the items; 500 ms later the spoken target ("Bear") is presented.
21. Experiment 1: Identification Results
[Figure: proportion of /p/ responses as a function of VOT (ms).]
High agreement across subjects and items for the category boundary:
By subject: 17.25 +/- 1.33 ms; by item: 17.24 +/- 1.24 ms.
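The boundary can be estimated as the VOT at which the identification function crosses 50% /p/ responses. A minimal sketch using linear interpolation between the two continuum steps that straddle 50%; the response proportions below are invented, not the experiment's data:

```python
def category_boundary(vots, prop_p):
    """VOT at which the proportion of /p/ responses crosses .5,
    by linear interpolation between adjacent continuum steps."""
    for (v0, p0), (v1, p1) in zip(zip(vots, prop_p), zip(vots[1:], prop_p[1:])):
        if p0 <= 0.5 <= p1:
            return v0 + (0.5 - p0) * (v1 - v0) / (p1 - p0)
    raise ValueError("identification function never crosses .5")

vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]                       # 9-step continuum
prop_p = [0.02, 0.03, 0.05, 0.20, 0.85, 0.97, 0.99, 1.0, 1.0]   # invented data
print(category_boundary(vots, prop_p))  # falls between 15 and 20 ms
```

Fitting a logistic curve (as is standard for identification functions) would give both a boundary and a slope estimate; interpolation is the bare-bones version.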
22. Analysis of Fixations
Trials with a low-frequency response were excluded.
[Figure: resulting proportion of /p/ responses by VOT (ms).]
23. Analysis of Fixations
[Figure: example trials over time (200 ms bins), showing fixations to the target (bug), the competitor (bus), and unrelated items (cat, fish).]
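The fixation curves in the figures are proportions computed sample-by-sample across trials. A minimal sketch, assuming a hypothetical trial format in which each trial records which object is fixated at every 4 ms sample (250 Hz); none of these names come from the talk:

```python
from collections import Counter

def fixation_proportions(trials, roles=("target", "competitor", "unrelated")):
    """For each sample index, the proportion of trials fixating each role."""
    n_samples = min(len(t) for t in trials)
    props = []
    for i in range(n_samples):
        counts = Counter(t[i] for t in trials)          # who is fixated now
        props.append({r: counts[r] / len(trials) for r in roles})
    return props

# Three toy trials, four samples each.
trials = [
    ["target", "target", "target", "target"],
    ["competitor", "target", "target", "target"],
    ["unrelated", "competitor", "target", "target"],
]
props = fixation_proportions(trials)
print(props[0]["target"])  # 1/3 of trials are on the target in the first sample
```

Plotting these proportions against sample time yields curves like those in the eye-movement figures.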
24. Experiment 1: Eye Movement Results
[Figure: fixation proportions over time (ms) for VOT = 0 (B response) and VOT = 40 ms (P response).]
More looks to the competitor than to unrelated items.
25. Analysis of Fixations: Gradient Competitor Effects
E.g., given that the subject heard "bomb" and clicked on "bomb", how often was the subject looking at the palm?
[Figure: schematic predictions; under categorical results, target and competitor fixation proportions over time do not vary within a category, while under a gradient effect they vary with VOT.]
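The competitor analysis can be sketched as: keep only trials where the subject clicked the target, then average looks to the competitor at each VOT step. The trial fields below are hypothetical stand-ins, not the talk's data format:

```python
from statistics import mean

def competitor_looks_by_vot(trials):
    """Mean proportion of time on the competitor per VOT, correct trials only."""
    by_vot = {}
    for t in trials:
        if t["clicked"] != t["target"]:
            continue  # exclude low-frequency (competitor) responses
        by_vot.setdefault(t["vot"], []).append(t["competitor_time"])
    return {vot: mean(v) for vot, v in sorted(by_vot.items())}

trials = [
    {"vot": 0,  "target": "bomb", "clicked": "bomb", "competitor_time": 0.05},
    {"vot": 15, "target": "bomb", "clicked": "bomb", "competitor_time": 0.12},
    {"vot": 15, "target": "bomb", "clicked": "palm", "competitor_time": 0.30},
]
print(competitor_looks_by_vot(trials))  # {0: 0.05, 15: 0.12}
```

Under gradient sensitivity, these means should rise monotonically as VOT approaches the category boundary.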
26. Experiment 1: Eye Movement Results
Gradient competitor effects of VOT?
[Figure: fixation proportions over time since word onset (ms), by VOT step and response.]
There is a smaller effect on the amplitude of activation and more effect on its duration: competitors stay active longer as VOT approaches the category boundary.
27. Experiment 1: Gradiency?
[Figure: fixation proportions (looks to the /b/ and /p/ competitors) as a function of VOT (ms).]
28. Experiment 1: Eye Movement Results
[Figure: competitor fixation proportions by response as a function of VOT (ms); category boundary marked.]
29. Experiment 1: Eye Movement Results
[Figure: the same analysis restricted to unambiguous stimuli only.]
30. Experiment 1: Results and Conclusions
Subphonemic acoustic differences in VOT affect lexical activation:
- Gradient effect of VOT on looks to the competitor.
- Effect holds even for unambiguous stimuli.
- Effect seems to be long-lasting (we'll get back to that).
This is a conservative test:
- Incorrect responses were filtered out.
- Unambiguous stimuli only.
31. However...
Why was it so hard to find evidence for gradiency in CP tasks? The steep identification function is consistently replicated.
What aspects of the task affect our ability to see gradient sensitivity?
- Phoneme ID vs. lexical ID?
- Number of alternatives?
- Type of stimuli?
- Sensitivity of the response measure?
32. Experiment 2: Categorical Perception (2 /b/, not too bee)
What can the eye-tracking paradigm reveal about ordinary phoneme identification experiments?
33. Experiment 2: Categorical Perception
Replicates the classic task:
- 9-step /ba/-/pa/ VOT continuum (0-40 ms)
- 2AFC identification indicated by mouse click
- Eye movements monitored at 250 Hz
- 17 subjects
34. Experiment 2: Categorical Perception
35. Experiment 2: Identification Results
[Figure: proportion of /p/ responses as a function of VOT (ms).]
36. Experiment 2: Data Analysis
Trials with a low-frequency response were excluded, effectively yielding a perfect categorization function.
[Figure: proportion of /p/ responses by VOT (ms) after exclusion.]
37. Experiment 2: Eye Movement Data
[Figure: looks to B and looks to P over time (ms), by VOT and response.]
- Some hints of gradiency for /p/; even less for /b/.
- Difference between stimuli near the boundary and endpoints, perhaps more for /p/.
38. Experiment 2: Eye Movement Data
/b/: p = .044, p_trend = .055; /p/: p < .001, p_trend = .005.
Could be driven by differences near the category boundary.
39. Experiment 2: Eye Movement Data
[Figure: competitor fixation proportions by VOT (ms); category boundary marked. Unambiguous stimuli only.]
/b/: p = .884, p_trend = .678; /p/: p = .013, p_trend = .003.
40. Experiment 2: Results and Conclusions
- Very steep slope for mouse response curves, consistent with traditional results.
- Identical category boundary to Experiment 1, validating the stimuli.
- Small difference between stimuli near the category boundary and others, similar to Pisoni & Tash and Andruski et al.
- Gradient effect weak for /ba/, moderate for /pa/.
41. Experiment 3: Number of Response Alternatives (Not 2 but /b/?)
Compare to Experiment 2 (Ba/Pa).
42. Experiment 3: Ba/Pa/La/Sha
Given the strong evidence for gradiency in Experiment 1 and the weaker evidence in Experiment 2, what is the effect of the number of response alternatives?
- Same 9-step /ba/-/pa/ VOT continuum (0-40 ms) as Experiment 2.
- La and Sha filler items added.
- 4AFC identification indicated by mouse click; button locations randomized between subjects.
- Eye movements monitored at 250 Hz.
- 17 subjects.
43. Experiment 3: Ba/Pa/La/Sha
44. Experiment 3: Identification Results
[Figure: proportion of /p/ responses as a function of VOT.]
The number of response alternatives accounts for some of the difference in slope.
45. Experiment 3: Data Analysis
Trials with a low-frequency response were excluded, effectively yielding a perfect categorization function.
[Figure: proportion of /p/ responses by VOT (ms) after exclusion.]
46. Experiment 3: Eye Movement Data
[Figure: fixation proportions over time (ms) for VOT = 0 (response B) and VOT = 40 ms (response P).]
- More looks to the competitor than to unrelated stimuli (p < .001).
- Eye movements in phoneme ID tasks are sensitive to acoustic similarity.
47. Experiment 3: Eye Movement Data
[Figure: looks to P and looks to B over time (ms), by VOT and response.]
Difference between stimuli near the boundary and endpoints.
48. Experiment 3: Eye Movement Data
[Figure: competitor fixation proportions by VOT (ms); category boundary marked.]
Close, but no star: nothing reaches significance. /b/: p = .055, p_trend = .068; /p/: p = .510, p_trend = .199.
49. Experiment 3: Eye Movement Data
[Figure: the same analysis, unambiguous stimuli only.]
Even worse: /b/: p = .374, p_trend = .419; /p/: p = .356, p_trend = .151.
50. Experiment 3: Results
Eye movements in phoneme ID tasks are sensitive to acoustic similarity between target and competitor. The number of alternatives explains some of the differences in the ID function. VERY weak subphonemic effects on lexical activation.
51. Experiment 4: Response Type (too /b/)
Is there a difference between phoneme and lexical identification tasks? Compare to Experiment 1 (words).
52. Experiment 4: Response Type
- Same 6 VOT continua (0-40 ms) as Experiment 1: beach/peach, bear/pear, bomb/palm, bale/pail, bump/pump, butter/putter.
- Same 12 L- and Sh- filler items.
- 4AFC phoneme identification indicated by mouse click; button locations randomized between subjects.
- Eye movements monitored at 250 Hz.
- 17 subjects.
53. Experiment 4: Response Type
54. Experiment 4: Identification Results
[Figure: proportion of /p/ responses as a function of VOT.]
Similar category boundary and slope to Experiment 1: Exp 1 = 17.25 +/- 1.33 ms; Exp 4 = 16.34 +/- 1.52 ms.
55. Experiment 4: Eye Movement Data
[Figure: looks to P and looks to B over time (ms), by VOT and response.]
Small differences in the right direction.
56. Experiment 4: Eye Movement Data
[Figure: competitor fixation proportions by VOT (ms); category boundary marked.]
Gradient effects using the whole range of stimuli: /b/: p < .001, p_trend = .002; /p/: p = .001, p_trend = .031.
57. Experiment 4: Eye Movement Data
[Figure: the same analysis, unambiguous stimuli only.]
Marginal effects using unambiguous stimuli only: /b/: p = .074, p_trend = .074; /p/: p = .137, p_trend = .108.
58. Experiment 4: Results
The weaker subphonemic effect suggests that offline metalinguistic tasks are less sensitive to fine-grained phonetic detail than online tasks. Some detail is preserved in these tasks (at least with word stimuli).
59. Experiment 5: 2AFC Words (2 bee)
Bringing it all together.
60. Experiment 5: 2-Words
Is the difference in ID curve slopes purely the result of the number of response alternatives, or does task play a role?
- Same 6 VOT continua (0-40 ms) as Experiment 1: beach/peach, bear/pear, bomb/palm, bale/pail, bump/pump, butter/putter.
- 0 filler items.
- 2AFC identification indicated by mouse click.
- Eye movements monitored at 250 Hz.
61. Experiment 5: Task
62. Experiment 5: Identification Results
[Figure: proportion of /p/ responses as a function of VOT.]
Similar category boundary and slope to Experiment 1: Exp 1 = 17.25 +/- 1.33 ms; Exp 5 = 16.18 +/- 1.74 ms.
63. Experiment 5: Eye Movement Data
[Figure: fixation proportions over time (ms), by VOT and response.]
Clean, but small, gradient effects for /p/; effects for /b/ near the category boundary.
64. Experiment 5: Eye Movement Data
[Figure: competitor fixation proportions by VOT (ms); category boundary marked.]
Gradient effects using the whole range of stimuli: /b/: p < .001, p_trend = .005; /p/: p = .017, p_trend = .026.
65. Experiment 5: Eye Movement Data
[Figure: the same analysis, prototypical range only.]
Weaker effects using the prototypical range: /b/: p = .443, p_trend = .802; /p/: p = .044, p_trend = .052.
66. Experiment 5: Results
The shallow ID curve slope suggests that 2AFC alone is not enough to create a steep slope; 2AFC phoneme ID is needed. Weaker gradient effects: do fixed response locations and no filler items make this task more explicit?
67. Trying to Make Sense Out of It All (Being and Nothingness?)
68. Slope of ID Function
BP > BaPaLaSha > all others (p < .05); Words = Exp 4 = 2-Words (p > .1). 2AFC results in less sensitivity (in the ID function) than 4AFC for non-word stimuli.
69. Gradient Effect Across Experiments
[Figure: summary of gradient effects across experiments, for all stimuli and without stimuli near the category boundary.]
70. Pooled Eye Movement Data
[Figure: Response P looks to B and Response B looks to P, by VOT (ms); category boundary marked.]
Overall: B: p_VOT < .001, p_trend < .001, p_VOT x exp > .15; P: p_VOT < .001, p_trend < .001, p_VOT x exp > .2.
71. Pooled Eye Movement Data
[Figure: the same analysis without stimuli near the category boundary.]
B: p_VOT = .005, p_trend < .019, p_VOT x exp > .1; P: p_VOT < .001, p_trend < .001, p_VOT x exp > .2.
72. Conclusions on Task Manipulations: Identification Functions
Phoneme ID tasks with non-words yield the sharpest categorization functions, which may mask subphonemic sensitivity. Even within these tasks, the number of response alternatives makes a big difference.
73. Conclusions on Task Manipulations: Competitor Effects (Eye Movements)
Natural, 4AFC lexical identification provides the cleanest evidence for gradiency (measured by fixations to the competitor) for both the /p/ and /b/ halves of the continuum. All experiments offer evidence of subphonemic sensitivity when we include stimuli near the category boundary. Eye movements provide a much more sensitive measure for assessing the role of fine-grained phonetic detail.
74. Conclusions on Task Manipulations: Competitor Effects (Eye Movements)
No one factor seems to account for the presence or absence of the gradient effect. Most experiments showed weak evidence for a gradient effect, but larger effects for /p/ than /b/. Why?
- Differences in the variance of the distribution of /b/ and /p/ in the learning environment? (Lisker & Abramson; Gerken & Maye)
- An auditory locus? Double-peaked firing in auditory cortex shows more VOT sensitivity for voiceless than voiced stops. (Steinschneider et al.; Sharma et al.)
75. Targets and Competitors, Gradient Effects and Temporal Dynamics
...and a return to Experiment 1.
76. Targets and Competitors
Why look exclusively at the competitor? Do subphonemic differences affect activation of the target? Andruski et al. suggest they do.
77. Experiment 1: Target Activation
[Figure: target fixation proportions over time (ms), by VOT and response.]
Target effects are much weaker, even in Experiment 1, and may be limited to the range near the category boundary.
78. Experiment 1: Target Activation
[Figure: target fixation proportions by VOT (ms); category boundary marked.]
Overall: B: p_VOT = .035, p_trend = .103; P: p_VOT < .001, p_trend < .010.
79. Experiment 1: Target Activation
[Figure: the same analysis, unambiguous stimuli only.]
B: p_VOT = .44, p_trend = .98; P: p_VOT = .33, p_trend = .22.
80. Target Activation Conclusions
- Target sensitivity to subphonemic differences is carried by differences between ambiguous and prototypical stimuli.
- Consistent with previous research:
  - Andruski et al.: 2/3 voicing is close to the ambiguous region (27 ms).
  - Pisoni & Tash: increased RT near the boundary.
81. Target Activation Conclusions
- Gradient sensitivity to subphonemic differences is stronger in competitor activation than in target activation.
- Consistent with Misiurski, Blumstein, Rissman and Berman (in press).
- This makes sense:
  - Degrading target activation isn't likely to be helpful in word recognition.
  - Augmenting competitor activation could be very helpful.
82. Gradiency and Time
- Phonetic context in speech perception isn't simultaneous:
  - Rate information (vowel length) arrives after the consonant.
  - Coarticulation occurs across multiple segments.
  - Lexical information has a larger scope than phonetic information.
- Simply tracking graded acoustic features is not enough: graded activation of lexical or sublexical units must persist over time to be integrated.
83. Temporal Ambiguity Resolution
The lexical/phonetic identity of a segment can be determined by acoustic features that arrive after the segment in question. An ambiguous first consonant in "?rown" is clearly a /b/ after hearing the rest of the word. Thus, as in higher-level language comprehension, temporal ambiguity resolution is an important issue.
84. Temporal Ambiguity Resolution
- Lexical/phonetic temporal ambiguity can be caused by:
  - Vowel length (a cue to speaking rate and stress)
  - Lexical/statistical effects
  - Embedded words
- Subphonemic sensitivity can minimize or eliminate the effects of temporary phonetic ambiguity by:
  - Storing how ambiguous a segment is
  - Keeping competitors active until resolution occurs.
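Why preserved gradiency helps can be sketched with a toy probabilistic update (this is an illustration, not the talk's model): a listener who keeps a graded estimate of the ambiguous consonant can combine it with later-arriving evidence, while one who commits to a category immediately cannot. All priors and likelihoods below are invented:

```python
def bayes_update(prior_b, likelihood_b, likelihood_p):
    """Combine a graded prior over /b/ vs /p/ with later-arriving evidence."""
    post = prior_b * likelihood_b
    return post / (post + (1 - prior_b) * likelihood_p)

# An ambiguous VOT leaves /b/ favored but graded (prior .7, invented).
graded = 0.7
# Later vowel-length evidence favors /p/ (invented likelihoods .3 vs .7).
print(bayes_update(graded, 0.3, 0.7))  # pulled back toward even (0.5)
# A listener who had already committed to /b/ (prior 1.0) ignores it:
print(bayes_update(1.0, 0.3, 0.7))     # stays 1.0
```

The graded listener revises toward /p/ when the later cue arrives; the committed listener is stuck with the early categorization.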
85. Experiment 1: Effect of Time?
- How long does the gradient sensitivity to VOT remain?
- Need to examine:
  - the effect of time on competitor fixations
  - its interaction with VOT
86. Experiment 1: Effect of Time?
[Figure: fixation proportions over time since word onset (ms), by VOT step and response.]
The time-course data suggest that gradiency sticks around at least 1600 milliseconds after syllable onset.
87. Experiment 1: Effect of Time? (Analysis)
- Trials were randomly sorted into two groups (early and late).
- This ensures independence of the data in each time bin (since each trial contributes to only one).
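The random early/late split can be sketched as follows: each trial is assigned to exactly one of two groups, so the two time-window analyses use independent trials. The trial format and seed are illustrative:

```python
import random

def split_trials(trials, seed=0):
    """Randomly halve trials into 'early' and 'late' analysis groups."""
    rng = random.Random(seed)
    shuffled = trials[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

early, late = split_trials(list(range(100)))
# Each trial lands in exactly one group, so the groups are independent.
print(len(early), len(late), set(early) & set(late))  # 50 50 set()
```

Contrast this with analyzing early and late windows of the same trials, where the two bins would share trial-level noise.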
88. Experiment 1: VOT x Time
[Figure: competitor fixation proportions by VOT (ms) for the early and late trial groups; category boundary marked.]
Main effect of time: /b/ p = .001, /p/ p = .0001. Main effect of VOT: /b/ p = .015, /p/ p = .001. Linear trend for VOT: /b/ p = .022, /p/ p = .009. No interaction (p > .1).
89. Experiment 1: VOT x Time
[Figure: the same analysis.]
Main effect of time: /b/ p = .001, /p/ p = .0001. Main effect of VOT: /b/ p = .006, /p/ p = .013. Linear trend for VOT: /b/ p = .0012, /p/ p = .02. No interaction (p > .1).
90. Finally, Some Conclusions
Lexical activation exhibits gradient effects of subphonemic (VOT) variation. The effect is robust and long-lasting, so it could potentially be very helpful for resolving temporal ambiguity and integrating information over time. The effect of subphonemic variation is stronger for competitors than for targets.
91. Finally, Some Conclusions
Experimental task is crucial for seeing sensitivity: more responses and a less metalinguistic task mean more gradiency. ID functions are influenced by the type of stimuli (e.g., words vs. nonwords) as well as the number of response alternatives; more realistic tasks yield more gradient ID functions.
92. Finally, Some Conclusions