Title: Some Aspects of Wideband Speech in Enterprise Telephony
1. Some Aspects of Wideband Speech in Enterprise Telephony
- Eric J. Diethorn (ejd_at_avaya.com)
- with
- Gary W. Elko (gwe_at_avaya.com) and
- Joseph L. Hall (jhall01_at_avaya.com)
- Avaya, Inc.
- Avaya Labs, Research
- 233 Mt. Airy Road,
- Basking Ridge, New Jersey 07920 USA
2. Outline
- Physical acoustics
- Echo
- Voice coders
- Conferencing
- Wideband speech intelligibility
- Hallway demonstration: Avaya SIP softphone
3. Some introductory thoughts
- Wideband speech telephony will instantaneously raise the bar of end-user expectation, at least for some applications.
- Skype
- We have standards for the reproduction of wideband speech, but is wider-band good enough?
- Maybe 150 to 5000 Hz is good enough?
- With greater bandwidth comes a greater range of potential artifacts that the acoustical-signal-processing engineer must address.
- Low-frequency acoustic echo, earpiece hiss, speech-coder distortion, arbitration of multiple sampling rates.
- The preferences of end users are uncertain.
- Speech-bandwidth policies (buddy lists, profiles)?
- Suppose I have a physiological speech impediment. Do I want it emphasized?
4. Physical acoustics
- The physical design of terminal acoustics must change to render wideband speech.
- Acoustical signal processing changes, too.
5. Loudspeaker enclosures
- Frequency response, traditional narrowband speakerphone, 80 dB SPL at 50 cm
[Figure: sound pressure level (dB) versus frequency (Hz)]
6. Loudspeaker enclosures
- Total harmonic distortion, traditional narrowband speakerphone, 80 dB SPL at 50 cm
- High distortion at the low-frequency end of the wideband-speech spectrum.
- Acoustic echo control is difficult, if not impossible, without acoustical modifications.
[Figure: THD at harmonics versus frequency (Hz)]
7. Earpieces
- Frequency response, wideband handset
[Figure: sound pressure level (dB) versus frequency (Hz)]
- In order to satisfy wideband standards, acoustical modifications are necessary to extend the low-frequency response of most earpiece designs.
- This is particularly challenging for physical arrangements in which the earpiece is held to the ear with little pressure.
8. Microphones
- Most low-cost electret microphones used today have a frequency response that is practically flat beyond the range of wideband speech; they are wideband ready.
- Multiple-microphone arrangements (arrays) can be exploited to reduce the level of ambient noise at frequencies not present in traditional narrowband telephony.
- Low-frequency rumble.
- High-frequency hiss.
- Short-time spectral modification methods of noise reduction can help, but the perception of artifacts from such processing is enhanced by the wider speech band.
9. Microphones
[Figure: two diagrams, each labeled "Front of phone"]
- Omnidirectional microphone (traditional)
- Good pick-up of talkers in all directions
- But picks up ambient noise from all directions
- Directional microphone
- Reduces off-axis noise
- Reduces reverberation of the talker's voice
- Reduces coupling from the speakerphone loudspeaker (helping AEC)
- But talkers off axis can't be heard well.
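The trade-off between the two microphone types can be illustrated with an idealized first-order directional pattern. The sketch below assumes an ideal cardioid for the directional microphone, which the slides do not specify; the function names, the `alpha` parameter, and the chosen angles are illustrative assumptions only.

```python
import math

def first_order_gain(theta_deg, alpha=0.5):
    """Linear gain of an ideal first-order microphone at angle theta_deg.

    alpha = 1.0 is omnidirectional, alpha = 0.5 is a cardioid (assumed
    here for the directional case), alpha = 0.0 is a dipole.
    """
    theta = math.radians(theta_deg)
    return abs(alpha + (1.0 - alpha) * math.cos(theta))

def gain_db(gain, floor_db=-120.0):
    """Convert a linear gain to dB, clamping pattern nulls to floor_db."""
    if gain <= 0.0:
        return floor_db
    return max(20.0 * math.log10(gain), floor_db)

if __name__ == "__main__":
    # Compare pick-up on axis, at the side, and directly behind.
    for angle in (0, 90, 180):
        omni = gain_db(first_order_gain(angle, alpha=1.0))
        cardioid = gain_db(first_order_gain(angle, alpha=0.5))
        print(f"{angle:3d} deg  omni {omni:7.1f} dB  cardioid {cardioid:7.1f} dB")
```

Under these assumptions the cardioid is about 6 dB down at 90 degrees and has a null at 180 degrees, which is the same property that attenuates off-axis talkers.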
10. Echo
- Requirements on echo control may change.
- The art of echo control must evolve to meet the
challenge of wideband speech.
11. Requirements on Talker Echo
- Roundtrip, mouth-to-ear echo-loss requirements were measured on populations for narrowband speech. How well do these data apply to wideband-speech echo paths?
[Figure: percent good-or-better versus acoustic-to-acoustic echo-path loss (dB). Echo annoyance as a function of roundtrip, mouth-to-ear loss and delay, for narrowband speech.]
- Source: Transmission Systems for Communications, Bell Telephone Laboratories, Inc., 5th Edition, 1982.
12. Talker Echo, Continued
- Being strictly digital, wideband-speech network paths do not suffer from analog circuit noises; however, analog and environmental noises enter calls at the endpoint. Should requirements on talker echo incorporate such (wideband) noise phenomena?
[Figure: echo annoyance as a function of roundtrip, mouth-to-ear echo-and-noise loss. Long-haul (1000 mi.) PSTN connection, circa 1980.]
- Source: Transmission Systems for Communications, Bell Telephone Laboratories, Inc., 5th Edition, 1982.
13. Wideband speech coding
- G.722, G.722.1 and G.722.2
- G.722 is cheap.
- G.722.1 often comes with video-on-the-enterprise (Polycom).
- Proprietary codecs
- Silicon solution providers have their favorites. Some are pretty good.
- Linear 16-bit encoding?
- Speech-transmission bandwidth (bits-per-second) is becoming a non-issue in the enterprise, at least for wired LANs.
- Architecturally appealing within the enterprise. Let boundary gateways worry about transcoding.
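The bandwidth argument for linear 16-bit encoding can be made concrete with a little arithmetic. The sketch below assumes a 16 kHz sampling rate for wideband speech and uses nominal payload bit rates for the ITU-T codecs; packet-header overhead is ignored, and the 100 Mbit/s link figure and all function names are illustrative assumptions.

```python
def linear_pcm_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Payload bit rate of uncompressed linear PCM, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# Nominal payload rates in kbit/s. Several of these codecs define more
# than one mode; a single representative mode is listed for each.
NOMINAL_KBPS = {
    "G.711 (narrowband)": 64.0,
    "G.722 (wideband)": 64.0,
    "G.722.1 (wideband)": 32.0,    # a 24 kbit/s mode also exists
    "G.722.2 (wideband)": 23.85,   # highest of its modes
}

def streams_per_link(link_kbps, stream_kbps):
    """Number of one-way streams that fit on a link, ignoring overhead."""
    return int(link_kbps // stream_kbps)

if __name__ == "__main__":
    rates = dict(NOMINAL_KBPS)
    rates["Linear 16-bit, 16 kHz"] = linear_pcm_kbps(16000)
    link_kbps = 100_000.0  # assumed 100 Mbit/s wired LAN segment
    for name, kbps in sorted(rates.items(), key=lambda kv: kv[1]):
        count = streams_per_link(link_kbps, kbps)
        print(f"{name:24s} {kbps:7.2f} kbit/s  about {count} streams")
```

Uncoded wideband speech costs four times the payload of G.722 at its 64 kbit/s mode, yet under these assumptions a single wired segment still carries hundreds of such streams.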
14. Multirate audio conferencing
- Rate arbitration
- Transcoding
- Multirate mixing
- (Artificial) bandwidth extension
[Figure: conference bridge server (wide- and narrowband speech) with connections labeled IP-1, PSTN (narrowband speech), Leased WAN (compressed speech, e.g., G.729, G.726), and IP-2]
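One simple way to picture rate arbitration is a bridge that mixes at the highest sampling rate any participant can use and gives each participant the widest-band codec it supports. The sketch below is a hypothetical policy, not a description of any actual bridge; the endpoint names reuse the figure labels, and the codec lists, table, and function names are assumptions for illustration.

```python
# Sampling rate implied by each codec (Hz).
CODEC_SAMPLE_RATE_HZ = {
    "G.711": 8000,
    "G.726": 8000,
    "G.729": 8000,
    "G.722": 16000,
}

def pick_codec(supported):
    """Pick the supported codec with the highest sampling rate.

    Ties go to the codec listed first, so list order expresses preference.
    """
    known = [c for c in supported if c in CODEC_SAMPLE_RATE_HZ]
    if not known:
        raise ValueError("no known codec offered")
    return max(known, key=lambda c: CODEC_SAMPLE_RATE_HZ[c])

def arbitrate(endpoints):
    """Return (mix_rate_hz, plan) for a dict of endpoint -> codec list."""
    chosen = {name: pick_codec(codecs) for name, codecs in endpoints.items()}
    mix_rate = max(CODEC_SAMPLE_RATE_HZ[c] for c in chosen.values())
    plan = {}
    for name, codec in chosen.items():
        rate = CODEC_SAMPLE_RATE_HZ[codec]
        plan[name] = {
            "codec": codec,
            "rate_hz": rate,
            # Narrowband legs must be resampled to and from the mix rate.
            "needs_resampling": rate != mix_rate,
        }
    return mix_rate, plan

if __name__ == "__main__":
    endpoints = {
        "IP-1": ["G.722", "G.711"],
        "PSTN": ["G.711"],
        "IP-2": ["G.729", "G.726"],
    }
    mix_rate, plan = arbitrate(endpoints)
    print("mix rate:", mix_rate)
    for name, leg in plan.items():
        print(name, leg)
```

A policy like this keeps wideband quality between wideband endpoints, while narrowband legs pay only the cost of resampling and transcoding at the bridge.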
15. Stereo audio conferencing
- Hands-free, wideband-speech communications with stereo echo cancellation
[Figure: two-room stereo conferencing diagram with labels: talker, echo, g1, h1, ROOM 1, ROOM 2]
16. Stereo Conferencing
(Placeholder, video demonstration)
17. Wideband speech intelligibility
- Siemens: wideband transmissions can reduce speech ambiguities by as much as 90 percent, increasing conversational intelligibility and reducing listener fatigue. (2003 press release)
- Polycom: for single syllables, 3.3 kHz bandwidth yields an accuracy of only 75 percent, as opposed to over 95 percent with 7 kHz bandwidth. (2003 white paper)
- Marketing vs. science: both required
18. Experimental study
- Similar to the Diagnostic Rhyme Test and the Diagnostic Alliteration Test, except we generated our own word pairs
- e.g., tie/pie (hot/hop)
- Subject hears one of the two, is shown both, and is asked, "Which of these two did you hear?"
- Clean anechoic speech filtered to 3 bandwidths
- [50, 3300], [50, 5000] and [50, 7000] Hz.
- Investigate all nine combinations of three bandwidths and three additive-noise levels (0 dB, 12 dB, 24 dB SNR).
- Reference: G. A. Miller and P. E. Nicely, "An analysis of perceptual confusions among some English consonants," Lincoln Laboratory, MIT, 1955 (J. Acoust. Soc. Amer., Vol. 27, pp. 338-352)
For questions concerning aspects of this study, contact Joseph L. Hall, Avaya Research, jhall01_at_avaya.com
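The nine listening conditions, and the scaling of additive noise to a target SNR, can be sketched as follows. This is only an illustration: the slides do not say how the noise was generated or how levels were measured, so the Gaussian noise, the mean-power definition of SNR, the stand-in tone, and all function names are assumptions. The band-limiting filters are omitted.

```python
import itertools
import math
import random

BANDS_HZ = [(50, 3300), (50, 5000), (50, 7000)]
SNRS_DB = [0, 12, 24]

def mean_power(x):
    """Average squared sample value."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power over noise power equals snr_db."""
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]

def conditions():
    """All combinations of bandwidth and SNR (3 x 3 = 9)."""
    return list(itertools.product(BANDS_HZ, SNRS_DB))

if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000
    # Stand-in signal: a tone instead of recorded anechoic speech.
    speech = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for band, snr_db in conditions():
        noisy = add_noise_at_snr(speech, noise, snr_db)
        residual = [y - s for y, s in zip(noisy, speech)]
        measured = 10.0 * math.log10(mean_power(speech) / mean_power(residual))
        print(band, snr_db, round(measured, 2))
```

In a real experiment the noise would be added before or after band-limiting according to the study design, which the slides do not state.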
19. What do they sound like?
- Seed, feed, seed at different bandwidths and
additive noise levels.
20. Representative results
21. Summary of results
22. Hallway Demonstration -- Avaya wideband SIP softphone
- Wideband speech (16 kHz sampling, bandwidth limited by PC sound architecture).
- Voice codecs
- G.711, G.729, G.726
- G.722