1. Front-end Audio Processing: Reflections on Issues, Requirements, and Solutions
Tomas Gaensler, mh acoustics, www.mhacoustics.com
Summit, NJ / Burlington, VT, USA
2. Front-end Audio Processing
- Processing to enhance perceived and/or measured sound quality in communication and recording devices
(Figure: devices "Then" and "Now")
3. Not So Famous Quotes (Acoustic Jewelry/Bluetooth Headset)
- Gary Elko (mh/Bell Labs colleague), at IWAENC 1995: "Acoustic echo cancellation will not be needed in the future when people wear acoustic jewelry."
- Arno Penzias (1978 Nobel Prize laureate): "No one would want acoustic jewelry, because people would think the users talking to themselves are crazy."
- I'm glad the success of Bluetooth headsets shows that both were completely wrong!
4. Classical Front-end Architectures: POTS
5. Classical Front-end Architectures: Cellphone 1995
6. Classical Front-end Architectures: Cellphone 2005-2010
7. Cellphones and Hands-free
- Common problems
  - Far-end listener does not hear near-end talker
  - Near-end listener does not understand far-end talker
- Why?
  - Form factor (size)
  - Limited understanding of physics and acoustics(?)
8. RX/TX Levels, Coupling and Doubletalk
- Echo louder than near-end
- Linear AEC
  - ERLE ≈ 20-30 dB
- After cancellation: Residual Echo to Near-end Ratio (RENR)
  - RENR ≈ 90 - 20 - 70 = 0 dB (echo at mic - ERLE - near-end level)
- >20 dB of residual echo suppression required
- Duplexness suffers
(Figure: far-end ≈ 95-100 dBSPL at loudspeaker; near-end talker ≈ 55-70 dBSPL at mic)
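The RENR bullet is level bookkeeping in dB. Here is a minimal sketch of that arithmetic. It takes the 90 dBSPL echo level from the next slide and the worst-case ERLE and near-end values from this one. The function names and the -20 dB RENR target are hypothetical illustrations, not part of the original material.

```python
def renr_db(echo_at_mic_db: float, erle_db: float, near_end_db: float) -> float:
    """Residual echo to near-end ratio (dB) after a linear AEC.

    Residual echo = echo level at the mic minus the ERLE; RENR compares that
    residual with the near-end speech level at the same mic."""
    residual_echo_db = echo_at_mic_db - erle_db
    return residual_echo_db - near_end_db


def required_suppression_db(renr: float, target_renr_db: float) -> float:
    """Extra residual echo suppression needed to bring RENR down to the target.
    The target itself is a design choice (assumed here, not from the slide)."""
    return max(0.0, renr - target_renr_db)


# Worst case from the slide: 90 dBSPL echo, 20 dB ERLE, 70 dBSPL near-end talker.
renr = renr_db(90.0, 20.0, 70.0)
print(renr)
print(required_suppression_db(renr, -20.0))   # assumed target: echo 20 dB under speech
```

With an RENR around 0 dB, any target that keeps residual echo well below the near-end talker lands in the ">20 dB of suppression" regime. That suppression is what costs duplexness.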
9. TX Dynamic Range and Noise
- Echo ≈ 90 dBSPL → peak echo ≈ 105-110 dBSPL
- Echo must not saturate the TX path
(Figure: echo level 90 dBSPL; near-end speech level 70 dBSPL)
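The jump from 90 to 105-110 dBSPL implies a crest factor of about 15-20 dB. The sketch below works out how far the 70 dBSPL near-end speech then sits below full scale, if full scale is set so the echo peak never clips. The 16-bit converter and the 6.02·N + 1.76 dB ideal-ADC formula are assumptions added for illustration. The helper names are hypothetical.

```python
def peak_level_db(rms_level_db: float, crest_factor_db: float) -> float:
    """Peak level implied by an average (RMS) level plus a crest factor, in dB."""
    return rms_level_db + crest_factor_db


def near_end_below_clip_db(peak_echo_db: float, near_end_db: float) -> float:
    """How far near-end speech sits below full scale when the converter's
    full scale is set to the echo peak (so the echo never saturates)."""
    return peak_echo_db - near_end_db


def ideal_adc_snr_db(bits: int) -> float:
    """Textbook quantization SNR of an ideal N-bit converter (full-scale sine)."""
    return 6.02 * bits + 1.76


# Slide values: echo 90 dBSPL, near-end speech 70 dBSPL. The crest factor range
# follows from the 105-110 dB peak; the 16-bit converter is an assumption.
for crest_db in (15.0, 20.0):
    peak = peak_level_db(90.0, crest_db)
    margin = near_end_below_clip_db(peak, 70.0)
    remaining = ideal_adc_snr_db(16) - margin
    print(f"peak echo {peak:.0f} dBSPL: near-end speech {margin:.0f} dB below clip, "
          f"about {remaining:.0f} dB quantization SNR left for it")
```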
10. TX Fixed-point Processing and Quantization Noise
- Q-noise increases by 6·log2(N) dB!
  - N = 64 → Q-noise increases by 36 dB
- Double precision required
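The slide does not say where N comes from. One mechanism with exactly this scaling, assumed here, works as follows. A fixed-point path attenuates the signal by N to leave headroom, for example 1/2 at each of log2(N) stages of a block transform. It then quantizes and restores the level, so it loses log2(N) bits. Each bit is worth about 6 dB, giving 20·log10(N) ≈ 6·log2(N) dB. The hypothetical `qnoise_increase_db` measures that growth numerically.

```python
import numpy as np


def quantize(x: np.ndarray, bits: int = 16) -> np.ndarray:
    """Round to a fixed-point grid with the given word length (full scale = +/-1)."""
    step = 2.0 ** -(bits - 1)
    return np.round(x / step) * step


def qnoise_increase_db(n: int, bits: int = 16, num: int = 200_000, seed: int = 0) -> float:
    """Measured growth of quantization noise when a signal is attenuated by n
    (headroom in fixed point), quantized, and then amplified back by n."""
    rng = np.random.default_rng(seed)
    x = 0.5 * rng.uniform(-1.0, 1.0, num)          # test signal inside full scale
    err_direct = quantize(x, bits) - x             # quantized at full resolution
    err_scaled = quantize(x / n, bits) * n - x     # log2(n) bits lost, then rescaled
    return 10.0 * np.log10(np.mean(err_scaled ** 2) / np.mean(err_direct ** 2))


for n in (4, 16, 64):
    print(n, round(qnoise_increase_db(n), 1), round(6.02 * np.log2(n), 1))
```

Double precision, or the block scaling and floating point listed later, avoids this growth by keeping enough bits below the scaled-down signal.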
11. RX Dynamic Range and Distortion
(Figure: RX path with digital gain, analog gain, and reference tap "To AEC")
- Small loudspeakers have a rather high (high-pass) cut-off frequency
- EQ is often required to get acceptable sound (frequency response). However, EQ means
  - Loss of signal loudness and dynamic range
  - Increased (analog) distortion
- Many manufacturers compensate for the loss of signal level with excessive digital gain and therefore get (digital) saturation
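A minimal sketch of why excessive digital gain ends in saturation. A tone at -6 dBFS passes through a fixed digital gain and is clipped at full scale, as a fixed-point RX path would do. Its THD is then measured from FFT bins at the harmonics. The sample rate, tone frequency and helper names are assumptions chosen for illustration.

```python
import numpy as np


def apply_digital_gain(x: np.ndarray, gain_db: float) -> np.ndarray:
    """Fixed digital gain followed by saturation at +/-1.0 (full scale),
    which is what a fixed-point RX path does when the gain is too large."""
    return np.clip(x * 10.0 ** (gain_db / 20.0), -1.0, 1.0)


def thd_percent(y: np.ndarray, fs: float, f0: float, n_harm: int = 10) -> float:
    """THD in percent: RMS of harmonics 2..n_harm relative to the fundamental.
    Assumes the record holds an integer number of periods of f0, so every
    harmonic falls exactly on an FFT bin and no window is needed."""
    spec = np.abs(np.fft.rfft(y))
    k0 = int(round(f0 * len(y) / fs))
    harmonics = [spec[k * k0] for k in range(2, n_harm + 1) if k * k0 < len(spec)]
    return 100.0 * float(np.sqrt(np.sum(np.square(harmonics)))) / float(spec[k0])


fs, f0 = 16000.0, 500.0                                   # assumed rate and tone
t = np.arange(16000) / fs                                 # 1 s = 500 whole periods
x = 10.0 ** (-6.0 / 20.0) * np.sin(2 * np.pi * f0 * t)   # tone at -6 dBFS
for g in (0.0, 6.0, 12.0):
    y = apply_digital_gain(x, g)
    print(f"{g:4.1f} dB gain -> THD {thd_percent(y, fs, f0):.2f} %")
```

Up to 6 dB of gain the tone just reaches full scale. At 12 dB it is clipped hard and the THD jumps from essentially zero to tens of percent.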
12. What Can or Should be Done?
- Minimize acoustical coupling by good physical design
- TX
  - Use noise suppression, but not excessively
  - Double precision, block scaling, or floating point
- RX
  - Compression instead of fixed gain
  - 10% or less loudspeaker/driver THD is desired
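"Compression instead of fixed gain" can be sketched as a static compressor curve driven by the frame peak level. Quiet signals still get the full make-up gain. Loud signals are pulled back so they stay under full scale. The 12 dB make-up gain, -18 dBFS threshold, 4:1 ratio and 10 ms frames are assumed values, and the function names are hypothetical. A production design would also smooth the gain with attack and release time constants.

```python
import numpy as np


def compressor_gain_db(level_db, makeup_db=12.0, threshold_db=-18.0, ratio=4.0):
    """Static compression curve (all levels in dBFS).
    Below the threshold the full make-up gain is applied; above it the output
    rises only 1/ratio dB per input dB, so loud input is not pushed into
    saturation the way a fixed 12 dB gain would push it."""
    over_db = np.maximum(np.asarray(level_db, dtype=float) - threshold_db, 0.0)
    return makeup_db - over_db * (1.0 - 1.0 / ratio)


def compress(x, fs, frame_ms=10.0, **curve):
    """Frame-wise compressor driven by the frame peak level.
    Attack/release smoothing of the gain is omitted to keep the sketch short."""
    n = max(1, int(fs * frame_ms / 1000.0))
    y = np.empty(len(x), dtype=float)
    for i in range(0, len(x), n):
        frame = x[i:i + n]
        peak_db = 20.0 * np.log10(np.max(np.abs(frame)) + 1e-12)
        y[i:i + n] = frame * 10.0 ** (compressor_gain_db(peak_db, **curve) / 20.0)
    return y
```

With these defaults, a 0 dBFS peak comes out at -18 + 12 + 18/4 = -1.5 dBFS. A -40 dBFS passage still gets the full +12 dB.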
13. What about Non-linear AEC Algorithms?
- Interesting problem, proposed and worked on for many years
- Not practical in most AEC applications, since
  - Complicated model
  - Gain and therefore saturation possibly in both TX and RX paths
  - Added complexity and system cost
  - Often slow convergence
  - Difficult to fine-tune in the field
- Even when non-linear cancellation works perfectly, the user still perceives a distorted loudspeaker signal!
14. Classical Front-end Architectures: Cellphone 2005-2010
15. Single Channel Noise Suppression
- Basic single channel noise suppressor
  - An extremely successful signal processing invention by Manfred Schroeder in the 1960s
- Musical tones: is it a (solved) problem?
- How do we evaluate and improve quality?
- How about convergence rate?
16. Background to Single Channel Noise Suppressors
(Figure: speech + noise → NS → enhanced speech)
- Block processing
- Frequency domain model
- Linear time-varying filter
- Wiener filter
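The four ingredients on this slide can be combined into a generic Wiener-type suppressor. The block processing uses windowed 50%-overlap frames, the frequency-domain model is the FFT, and the time-varying filter is a per-bin Wiener gain. The slide's original equations are not recoverable, so treat this as a minimal sketch, not the author's formulation. It assumes a known noise PSD (in the same units as |FFT|² of a windowed frame) and a square-root Hann window. The 0.25 gain floor (-12 dB) echoes the "12 dB suppression" used later. The function names are hypothetical.

```python
import numpy as np


def wiener_gain(noisy_psd, noise_psd, gain_floor=0.25):
    """Per-bin Wiener gain H = S / (S + N), with the clean-speech PSD S
    estimated as max(noisy - noise, 0). The floor limits suppression depth."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    gain = speech_psd / np.maximum(speech_psd + noise_psd, 1e-12)
    return np.maximum(gain, gain_floor)


def suppress(x, noise_psd, frame_len=256, gain_floor=0.25):
    """Block processing: windowed frames with 50% overlap, FFT, per-bin
    time-varying gain, inverse FFT and overlap-add."""
    hop = frame_len // 2
    # Periodic sqrt-Hann for analysis and synthesis: the product is a Hann
    # window, which sums to one at 50% overlap.
    win = np.sqrt(np.hanning(frame_len + 1)[:-1])
    y = np.zeros(len(x) + frame_len)
    for start in range(0, len(x) - frame_len + 1, hop):
        spec = np.fft.rfft(win * x[start:start + frame_len])
        gain = wiener_gain(np.abs(spec) ** 2, noise_psd, gain_floor)
        y[start:start + frame_len] += win * np.fft.irfft(gain * spec, frame_len)
    return y[:len(x)]
```

With a zero noise estimate the filter is transparent. With an overwhelming noise estimate every bin sits at the floor.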
17. Background to Single Channel Noise Suppressors
- Estimation of spectra is often done recursively
- Frequency smoothing
- Noise spectrum updated only when speech is not present
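A minimal sketch of recursive spectrum estimation with frequency smoothing and a speech-gated noise update. It assumes a first-order (exponential) recursion with smoothing constant `alpha` and a 3-bin moving average across frequency. The speech/no-speech decision comes from an external voice activity detector, which is not shown. All names and parameter values are hypothetical.

```python
import numpy as np


def smooth_over_frequency(psd: np.ndarray, width: int = 3) -> np.ndarray:
    """Moving average across `width` neighbouring bins (width odd).
    Edge bins are padded by repetition so they are not biased toward zero."""
    half = width // 2
    padded = np.pad(psd, half, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


class RecursiveNoiseEstimator:
    """First-order recursive (exponentially weighted) noise PSD estimate.

    noise <- alpha * noise + (1 - alpha) * frame_psd, applied only in frames
    that an external voice activity decision marks as speech-free. The first
    frame initializes the estimate (an assumption of this sketch)."""

    def __init__(self, alpha: float = 0.9, freq_width: int = 3):
        self.alpha = alpha
        self.freq_width = freq_width
        self.noise_psd = None

    def update(self, frame_psd: np.ndarray, speech_present: bool) -> np.ndarray:
        smoothed = smooth_over_frequency(np.asarray(frame_psd, float), self.freq_width)
        if self.noise_psd is None:
            self.noise_psd = smoothed
        elif not speech_present:
            self.noise_psd = self.alpha * self.noise_psd + (1.0 - self.alpha) * smoothed
        return self.noise_psd
```

The choice of `alpha` sets the trade-off discussed under convergence rate. A value closer to 1 gives a smoother estimate but slower tracking of a changing noise floor.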
18. Musical Tones: Is it a (Solved) Problem?
- Examples
  - Original (Sally Sievers reel, June-Sept. 1964, by Manfred Schroeder and Mohan Sondhi at Bell Labs)
  - Noisy original (input SNR, iSNR = 6 dB)
  - Schroeder, 1960s
  - Generic spectral subtraction, Boll 1979
  - IS-127, 1995
- A problem of the last century, now only a constraint in design
  - Controlling the variance of the suppression gains
  - Any NS algorithm should be constrained not to have musical tones
  - The constraint must have only a small impact on voice quality
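One common way to control the variance of suppression gains is to smooth each bin's gain over time and apply a gain floor. This is an assumed technique chosen for illustration; the slide does not say which method the examples use. The smoothing constant `beta`, the -12 dB floor and the function name are hypothetical.

```python
import numpy as np


def smooth_gains(raw_gains: np.ndarray, beta: float = 0.7, floor_db: float = -12.0) -> np.ndarray:
    """Reduce frame-to-frame variance of suppression gains.

    raw_gains has shape (frames, bins). Each bin is smoothed over time with a
    first-order recursion and then floored, so isolated bins cannot jump
    between near-zero and near-unity gain from frame to frame, which is what
    produces musical tones."""
    floor = 10.0 ** (floor_db / 20.0)
    out = np.empty(raw_gains.shape, dtype=float)
    state = raw_gains[0].astype(float)
    for m in range(raw_gains.shape[0]):
        state = beta * state + (1.0 - beta) * raw_gains[m]
        out[m] = np.maximum(state, floor)
    return out
```

The "small impact on voice quality" constraint shows up here as a limit on `beta`. Heavier smoothing suppresses musical tones better but smears speech onsets and slows the response to new noise.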
19. Quality Metrics
- Most importantly: listen!
- SNR
  - Total
  - Segmental
  - During speech
- Distortion metrics
  - ISD (Itakura-Saito distance)
  - ITU-T P.862 PESQ/MOS-LQO
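A minimal sketch of the SNR-type metrics and the Itakura-Saito distance, assuming time-aligned clean and processed signals. Clamping per-frame SNR to [-10, 35] dB is a common convention assumed here, not something the slide specifies. Restricting the segmental average to speech-active frames gives the "during speech" variant. Function names are hypothetical.

```python
import numpy as np


def snr_db(clean: np.ndarray, processed: np.ndarray) -> float:
    """Total SNR: clean-signal energy over error energy for the whole file."""
    err = processed - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2)))


def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB."""
    snrs = []
    for i in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[i:i + frame_len]
        e = processed[i:i + frame_len] - c
        s = 10.0 * np.log10((np.sum(c ** 2) + 1e-12) / (np.sum(e ** 2) + 1e-12))
        snrs.append(np.clip(s, lo, hi))
    return float(np.mean(snrs))


def itakura_saito(p_ref: np.ndarray, p_test: np.ndarray, eps: float = 1e-12) -> float:
    """Itakura-Saito distance between two power spectra (>= 0, 0 iff equal)."""
    r = (p_ref + eps) / (p_test + eps)
    return float(np.mean(r - np.log(r) - 1.0))
```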
20. Quality Metric: P.862 (PESQ/MOS-LQO)
- MOS-LQO (MOS Listening Quality Objective)
- Alg-1/2: Wiener methods with 12 dB noise suppression
- What can the best noise suppressor achieve?
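MOS-LQO is not the raw P.862 output. ITU-T P.862.1 maps the raw PESQ score (about -0.5 to 4.5) onto the MOS-LQO scale with a logistic function. The sketch below uses the constants as recalled from P.862.1; check them against the Recommendation before relying on them. The function name is hypothetical, and PESQ itself is not implemented here.

```python
import math


def pesq_to_mos_lqo(raw_pesq: float) -> float:
    """Logistic mapping from a raw P.862 score to MOS-LQO (ITU-T P.862.1 form)."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


for raw in (1.0, 2.0, 3.0, 4.0, 4.5):
    print(raw, round(pesq_to_mos_lqo(raw), 2))
```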
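21. Quality Metric: My Rule of Thumb
- The ideal MOS (PESQ) performance bound is given by shifting the unprocessed PESQ curve to the left by the amount of suppression
- Example for 12 dB suppression: 12 dB shift to the left
(Figure: unprocessed PESQ curve and the same curve shifted 12 dB to the left)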
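The rule of thumb treats an ideal suppressor with S dB of noise reduction as a plain SNR improvement. At input SNR x, it should score what the unprocessed signal scores at x + S. Here is a sketch of that shift by interpolation over a measured unprocessed curve. The function name is hypothetical, and no measured curve from the talk is reproduced.

```python
import numpy as np


def ideal_pesq_bound(isnr_db, pesq_unprocessed, suppression_db=12.0):
    """Unprocessed PESQ-vs-input-SNR curve shifted suppression_db to the left.

    isnr_db must be increasing. np.interp holds the end values outside the
    measured range, so the bound is only meaningful where isnr + S still lies
    inside the measured SNR range."""
    isnr_db = np.asarray(isnr_db, dtype=float)
    curve = np.asarray(pesq_unprocessed, dtype=float)
    return np.interp(isnr_db + suppression_db, isnr_db, curve)
```

Plotting the returned bound next to an algorithm's measured MOS curve shows how much of the theoretical 12 dB the algorithm actually turns into quality.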
22. Convergence Rate
- Important performance criterion
  - Non-stationary noise conditions
  - Frame loss
- Main objective
  - Maximize convergence rate while maintaining speech quality
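For a recursive noise estimate, convergence rate is set by the smoothing constant. This sketch assumes the first-order update P ← α·P + (1 - α)·X in linear power. It computes the time constant and the number of frames needed to get within a tolerance of a new, higher noise level. The function names, α and the 1 dB tolerance are hypothetical choices.

```python
import math


def time_constant_ms(alpha: float, frame_ms: float) -> float:
    """Time constant of P <- alpha * P + (1 - alpha) * X: after this long the
    estimate has covered 1 - 1/e (about 63 %) of a step change."""
    return -frame_ms / math.log(alpha)


def frames_to_converge(alpha: float, step_up_db: float, tol_db: float = 1.0) -> int:
    """Frames needed, after the noise power jumps up by step_up_db, until the
    estimate (starting at the old level) is within tol_db of the new level."""
    if step_up_db <= tol_db:
        return 0                                  # already within tolerance
    r = 10.0 ** (step_up_db / 10.0)               # new level / old level
    t = 10.0 ** (-tol_db / 10.0)                  # done once estimate >= t * r
    # Estimate after n frames: r - (r - 1) * alpha**n >= t * r
    bound = r * (1.0 - t) / (r - 1.0)
    return math.ceil(math.log(bound) / math.log(alpha))


for alpha in (0.9, 0.98):
    print(alpha, round(time_constant_ms(alpha, 10.0), 1), frames_to_converge(alpha, 20.0))
```

A smaller α converges faster after a noise change or frame loss. It also lets more frame-to-frame variance into the estimate, and that variance is what trades against speech quality.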
23. Convergence Rate: A Useful Test
- Input sequence
- IS-127
- Wiener based
- A spectral subtraction m-script retrieved from the Internet
24. Convergence Rate and MOS-LQO
(Figure: MOS-LQO for "Normal" and "Fast" convergence settings)
25. Current Applications and Drivers of NS Technology
- Where is NS going in industry now?
  - Beyond 12 dB of suppression
  - Multi-microphone solutions
    - Two- or more-channel suppressors
    - Linear beamforming
- Applications
  - Mobile phones (a few two-microphone models have reached the market)
  - Bluetooth headsets: a great "new" application for signal processing (Ericsson BT headset, 2000)
26. Background to Linear Beamforming
- N: number of microphones
- Broadside linear beamforming (e.g., delay-and-sum)
  - Directional gain 10·log10(N)
  - White noise gain (WNG) > 0 dB
  - Practical size large (≈30 cm)
- Endfire differential beamforming
  - Directional gain 20·log10(N)
  - WNG < 0 dB
  - Practical size small (1.5-5 cm)
⇒ Differential beamformers are more suitable for small form factors
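The WNG sign difference can be checked numerically. WNG is the array's on-axis power response over the squared norm of its weights, |wᴴd|²/‖w‖². This sketch compares a broadside delay-and-sum array with the simplest endfire differential pair, a dipole (zero internal delay). It assumes plane-wave propagation and c = 343 m/s. The function name and geometry values are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s (about 20 degC)


def wng_db(weights, positions_m, freq_hz, look_dir=1.0):
    """White noise gain (dB) of a linear array: |w^H d|^2 / ||w||^2.
    d is the plane-wave steering vector; look_dir is cos(theta) of the look
    direction (1.0 = endfire, 0.0 = broadside)."""
    k = 2.0 * np.pi * freq_hz / C
    d = np.exp(-1j * k * np.asarray(positions_m, dtype=float) * look_dir)
    w = np.asarray(weights, dtype=complex)
    return float(10.0 * np.log10(np.abs(np.vdot(w, d)) ** 2 / np.vdot(w, w).real))


# Broadside delay-and-sum, 4 mics over 30 cm: WNG = 10*log10(4), about +6 dB.
print(round(wng_db(np.ones(4) / 4, [0.0, 0.1, 0.2, 0.3], 1000.0, look_dir=0.0), 2))
# Endfire dipole, 1.5 cm spacing at 1 kHz: WNG = 10*log10(2*sin^2(kd/2)) < 0 dB.
print(round(wng_db([1.0, -1.0], [0.0, 0.015], 1000.0), 2))
```

For the differential pair, WNG keeps falling as kd shrinks, so sensor noise is amplified most at low frequencies and small spacings.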
27. Background to Linear Beamforming
- What do we gain?
  - Less reverberation (increased intelligibility)
  - Less (environmental) noise
  - No (or low) distortion on axis
  - Possible interference rejection by spatial zero(s)
- Some issues
  - Performance is given by the critical distance!
  - Increase in sensor noise (WNG, differential beamforming)
28. Beamforming Critical Distance
- Critical distance (reverberation radius): the distance at which the reverberant-to-direct path energy ratio is 0 dB
- DI: directivity index, the gain in direct-to-reverberant energy over an omnidirectional microphone
- Order: order of finite differences used (1st → 2 mics, 2nd → 3 mics, etc.)

Order   DI (dB)   Critical distance (× omni)
0       0         1.0
1       6         2.0
2       9.5       3.0
3       12        4.0
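The third column follows from the DI. Critical distance scales with the square root of the directivity factor Q = 10^(DI/10). The DI values in the table also match 20·log10(n + 1), the maximum DI of an order-n differential array with n + 1 microphones. A small sketch reproducing the table under those two relations (function names hypothetical):

```python
import math


def max_di_db(order: int) -> float:
    """Maximum directivity index of an order-n differential array (n + 1 mics)."""
    return 20.0 * math.log10(order + 1)


def critical_distance_factor(di_db: float) -> float:
    """Critical distance relative to an omni mic: sqrt(Q), with Q = 10^(DI/10)."""
    return math.sqrt(10.0 ** (di_db / 10.0))


for n in range(4):
    di = max_di_db(n)
    print(n, round(di, 1), round(critical_distance_factor(di), 1))
```

Doubling the critical distance lets a talker stand twice as far away for the same direct-to-reverberant ratio. That is why the slide calls critical distance the measure of beamformer performance.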
29. First-Order Differential Beamforming
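The figure for this slide is not recoverable. As a hedged illustration, here is the standard first-order differential structure: two omni microphones spaced d apart, with the rear signal delayed by T and subtracted from the front. For a plane wave from angle θ, H = 1 - exp(-jω(T + d·cos θ/c)). The speed of sound, spacing and function name are assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def first_order_response(freq_hz, theta_rad, spacing_m, delay_s):
    """Magnitude response of y(t) = x_front(t) - x_back(t - T) for a plane wave
    arriving from angle theta (0 = front endfire)."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    tau = delay_s + spacing_m * np.cos(theta_rad) / C
    return np.abs(1.0 - np.exp(-1j * w * tau))


d = 0.015                 # 1.5 cm spacing (assumed)
T = d / C                 # internal delay equal to the acoustic travel time
for f in (100.0, 200.0, 400.0, 800.0):
    print(f, round(20 * np.log10(first_order_response(f, 0.0, d, T)), 1))
```

Setting T = d/c puts the null at θ = 180°, which is a cardioid. At low frequencies the on-axis response rises about 6 dB per octave, so the output must be equalized. That equalization is where the negative WNG of differential arrays comes from.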
30. Classical First-Order Beamformer Responses
(Figure: cardioid, hypercardioid, and dipole polar responses)
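For small spacing, a first-order differential beamformer has the polar pattern α + (1 - α)·cos θ, with α = T/(T + d/c). Cardioid (α = 0.5), hypercardioid (α = 0.25) and dipole (α = 0) are the classical choices. This sketch computes each pattern's directivity index in a spherically isotropic field and its null angle. The dictionary and function names are hypothetical.

```python
import numpy as np

PATTERNS = {"cardioid": 0.5, "hypercardioid": 0.25, "dipole": 0.0}


def directivity_index_db(alpha: float) -> float:
    """DI of alpha + (1 - alpha) * cos(theta) in a spherically isotropic field.
    Averaging the power pattern over the sphere gives
    Q = 1 / (alpha^2 + (1 - alpha)^2 / 3)."""
    return float(10.0 * np.log10(1.0 / (alpha ** 2 + (1.0 - alpha) ** 2 / 3.0)))


def null_angle_deg(alpha: float) -> float:
    """Angle where the pattern is zero: cos(theta) = -alpha / (1 - alpha)."""
    return float(np.degrees(np.arccos(-alpha / (1.0 - alpha))))


for name, a in PATTERNS.items():
    print(f"{name:13s} DI {directivity_index_db(a):4.1f} dB, null at {null_angle_deg(a):5.1f} deg")
```

The hypercardioid reaches Q = 4 (about 6 dB), the first-order entry in the critical-distance table. Cardioid and dipole both give Q = 3 (about 4.8 dB) but place their nulls differently.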
31. Beamforming Demo: DEWIND processing