Title: Precedence-based speech segregation in a virtual auditory environment
1. Precedence-based speech segregation in a virtual auditory environment
- Brungart, Simpson and Freyman (2005)
2. The Precedence Effect
- Sounds produced in areas with multiple surfaces give rise to reflections. Many copies of a sound reach a listener's ears. The direct sound arrives first.
- With complex sounds like speech, early reflections tend to perceptually fuse with the direct sound (the Haas effect).
- The direct sound dominates localisation: the precedence effect.
[Figure: perceived direction as a function of the delay D between the two sources]
- D within +/- 0.5 ms -> summing localisation
- D > 1 ms -> precedence effect
- D > 20 ms -> echo threshold (two sources perceived)
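
The delay regimes on this slide can be sketched as a small lookup. This is only an illustration: the function name `classify_delay` is hypothetical, the thresholds are the approximate values quoted above, and the label for the 0.5 to 1 ms range is an assumption because the slide does not name that region.

```python
def classify_delay(delay_ms: float) -> str:
    """Map a lead/lag delay (ms) onto the regimes listed on the slide.

    Only the magnitude of the delay is used; the sign says which
    source leads, not which regime applies.
    """
    d = abs(delay_ms)
    if d <= 0.5:
        return "summing localisation"
    if d <= 1.0:
        # Assumption: the slide leaves 0.5-1 ms unlabelled.
        return "transition (not specified on the slide)"
    if d <= 20.0:
        return "precedence effect"
    return "echo (two sources perceived)"


for d in (0.3, -4.0, 64.0):
    print(d, "->", classify_delay(d))
```

Real echo thresholds depend on the stimulus, so the 20 ms boundary should be read as a rough figure for speech-like sounds rather than a fixed constant.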
3. Masking
- The amount of interference one stimulus can cause in the perception of another stimulus (Yost and Nielsen, 1977).
- The elevation in threshold of a target signal due to the presence of a masker.
- Energetic masking
  - Masking that results from competition between target and masker at the periphery of the auditory system, i.e., overlapping excitation patterns in the cochlea or auditory nerve (AN) (Durlach et al., 2003).
- Informational masking
  - Non-energetic masking
  - Central masking
  - Difficulty segregating the audible acoustic components of the target speech signal from the audible acoustic components of a perceptually similar speech masker (p. 3241).
4. Some Assumptions
- Speech target
- Random noise masker: purely energetic masking?
- Speech masker: energetic and informational masking?
- So if an experimental manipulation affects the amount of masking produced by the speech masker but not the noise masker, this is due to a reduction in informational masking?
- Seems reasonable
5. The Basic Experiment
Freyman et al. (1999): free-field. Brungart et al.: virtual auditory space over headphones.
F-F: Baseline masking
F-R: Release from masking regardless of type of masker
F-RF: Release from masking with speech but NOT with noise masker
6. Experiment 1
- Adding a delayed copy of the noise masker at the front drops performance to baseline.
- Adding a delayed copy of the speech masker at the front hardly makes any difference.
- Note: this uses a speech recognition task which is resistant to energetic masking.
- Therefore a large informational masking component?
7. Interpretation
- The precedence effect causes the listener to localise the RF masker off to the right, which helps auditory selective attention attend to the target speech, hence reducing informational masking.
- This doesn't affect the noise masker because it has no informational masking effect: adding it to the front just increases its energetic masking effect.
- BUT the effect is also observed when the delay is negative, so that the first copy of the masker comes from the front (i.e. F-FR) (Freyman et al., 1999).
- Precedence should localise the masker to the front in this condition, so why the release from masking with a speech masker?
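
The sign convention for the delay can be made concrete with a small sketch. It assumes the masker is a list of samples and that the delay is given in whole samples; the function name `masker_channels` is hypothetical and is not taken from the paper. A positive delay gives the RF arrangement (right copy first, front copy lagging) and a negative delay gives the FR arrangement (front copy first).

```python
from typing import List, Tuple


def masker_channels(masker: List[float],
                    delay_samples: int) -> Tuple[List[float], List[float]]:
    """Return (front, right) source signals for a two-copy masker.

    delay_samples > 0: right copy leads, front copy lags (RF).
    delay_samples < 0: front copy leads, right copy lags (FR).
    delay_samples == 0: both copies start together.
    """
    pad = [0.0] * abs(delay_samples)
    leading = list(masker) + pad   # starts immediately, zero-padded at the end
    lagging = pad + list(masker)   # starts after the delay
    if delay_samples >= 0:
        return lagging, leading    # front lags, right leads
    return leading, lagging        # front leads, right lags


front, right = masker_channels([1.0, 2.0, 3.0], 2)
print(front)  # front copy is delayed by two samples
print(right)  # right copy starts immediately
```

Under the precedence effect the leading copy should dominate localisation, which is why the negative-delay result is surprising.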
8. Experiment 2
- What is the effect of varying the delay between the two masker presentations over +/- 64 ms?
- For a noise masker?
  - Very little.
  - Some release from masking at delays which cause notches in the spectrum of the masker far enough apart to be resolved by the ear
- For a single-speaker speech masker?
  - Little effect of delay, positive or negative, until the echo threshold is exceeded
- For a two-speaker speech masker?
  - Much more variation, but still substantial release from masking. Possibly some release from energetic masking effects
- Note that as speakers are added, multi-speaker babble approaches speech-shaped noise.
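
The remark about spectral notches follows from comb filtering: summing a signal with an equal-amplitude copy delayed by D gives a magnitude response of 2|cos(pi f D)|, with notches at odd multiples of 1/(2D) spaced 1/D apart. The sketch below lists those frequencies. It assumes two identical, equal-level copies summed at a single point, which ignores the head-related filtering that makes the copies differ at each ear; the function name `notch_frequencies` is hypothetical.

```python
from typing import List


def notch_frequencies(delay_ms: float, max_hz: float) -> List[float]:
    """Notch frequencies (Hz) of x(t) + x(t - D) up to max_hz.

    Notches fall at f = (2k + 1) / (2D) for k = 0, 1, 2, ...
    """
    d_ms = abs(delay_ms)
    if d_ms == 0:
        return []  # no delay, no comb filtering
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) * 1000.0 / (2.0 * d_ms)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


print(notch_frequencies(1.0, 4000.0))       # notches 1000 Hz apart
print(len(notch_frequencies(64.0, 4000.0)))  # many closely spaced notches
```

Short delays give widely spaced notches and long delays give densely packed ones, which fits the observation that only some delays produce notches the ear can resolve.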
9. A Puzzle
- There is virtually no difference between positive and negative delays with the single-speaker masker, and not much of an advantage with the two-speaker masker
- What is going on here?
- Two possibilities (actually 3, but I'll come back to this)
- 1) The effect is not based on perceived location, but on timbre or source width
- 2) Even when the copy of the masker added to the front leads the one from the right, the one to the right pulls the perceived location off a little, so that it is perceived somewhere between front and right
- If (2) is the case, then shifting the apparent location of the target to match that of the masker should abolish the release from masking
10. Experiment 3
Position of target varied from 0° to 60° in 5° steps, at 7 different delay values from -4 to +4 ms.
- U-shaped performance curves for all 3 maskers at D = 0 ms. Masker heard midway between front and right.
- For the two-speaker masker, when there is a lag (+ve D) > 0.5 ms, subjects do best when the target is located near the front (0°). As expected.
- When there is a lead (-ve D) > 0.5 ms, subjects do best when the target is located to the right.
- BUT the minimum performance is found around 10°, NOT at 0°
11. Conclusions
- This would appear to support hypothesis (2) from the puzzle slide: the lagging copy at the right pulls the perceived masker location away from the front
- BUT why is there not a similar minimum around 50° when there is a positive delay?
- Also, energetic and informational masking do not seem to have been completely separated by this paradigm, as was first thought
- AND no mention is made of the phenomenon of the BMLD (binaural masking level difference)
  - Whenever the phase or level differences of the target signal at the 2 ears are not the same as those of the masker, ability to detect or identify the target improves
  - Inversion of the signal at one ear gives better performance than delaying it, so not just segregation by spatial separation
  - Large BMLDs occur when target and masker are not subjectively well separated
  - Hearing is sensitive to the profile of interaural decorrelation across frequency
- This could potentially explain why negative delays are as useful as positive delays: adding a delayed copy of the masker at the right changes the interaural correlation of the masker relative to the target
- But this still wouldn't explain the difference between speech and noise
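
The interaural-correlation argument can be illustrated with a deliberately crude toy model. Everything specific here is an assumption for illustration: a frontal source is taken to reach both ears identically, a source at the right is modelled only as an interaural time difference (`ITD`, in samples) with no head shadow or HRTF filtering, the masker is white Gaussian noise, and correlation is measured at zero lag only. It is not the analysis used in the paper.

```python
import math
import random


def interaural_correlation(left, right):
    """Normalised zero-lag cross-correlation of the two ear signals."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0


def delayed(x, n):
    """Delay x by n >= 0 samples, keeping the original length."""
    return [0.0] * n + x[:len(x) - n]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]

ITD = 30   # assumed interaural delay of the right-hand source, in samples
D = 200    # assumed lag of the front copy of the masker, in samples

# A front-only source is identical at both ears in this toy model.
front_only = interaural_correlation(noise, noise)

# Two-copy masker: the right copy reaches the right ear first and the left
# ear ITD samples later; the front copy reaches both ears D samples late.
front_copy = delayed(noise, D)
left = [a + b for a, b in zip(delayed(noise, ITD), front_copy)]
right = [a + b for a, b in zip(noise, front_copy)]
two_copy = interaural_correlation(left, right)

print(round(front_only, 3), round(two_copy, 3))
```

In this model the two-copy masker is clearly less correlated across the ears than a front-only source, whichever copy leads, so a frontal target differs from the masker in interaural correlation for both positive and negative delays.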