Motorola presents in collaboration with CNEL
Project Golden Voice
Meena Ramani, Lingyun Gu, Kausthub Kale
Introduction
Frequency Independent Beamformer
- Motivation: The limitation of the traditional narrowband transmission channel
- Advantage: Phone line frequency range 300 Hz-3400 Hz; recovered frequency range 20 Hz-8000 Hz
- Goal: Increase speech intelligibility and quality by adding artificial high-frequency components
- Basic Assumption: The high correlation between the low-frequency and high-frequency components of the same phonemes
- Frequency fold
- GMM algorithm
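The "frequency fold" item can be illustrated with a minimal sketch, assuming it refers to classic spectral folding: inserting a zero after every narrowband sample doubles the sampling rate and mirrors the baseband spectrum about the old Nyquist frequency. The 8 kHz input rate, the 1 kHz test tone, and the helper names `spectral_fold` and `tone_magnitude` are illustrative assumptions, not details taken from the poster.

```python
import cmath
import math


def spectral_fold(narrowband):
    """Upsample by 2 with zero insertion (assumed reading of 'frequency fold').

    The output holds the input samples at even indices and zeros elsewhere,
    which mirrors the baseband spectrum about the old Nyquist frequency.
    """
    wideband = [0.0] * (2 * len(narrowband))
    wideband[::2] = narrowband
    return wideband


def tone_magnitude(signal, freq_hz, fs_hz):
    """Length-normalised magnitude of the DFT of `signal` at one frequency."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        for n, s in enumerate(signal)
    )
    return abs(acc) / len(signal)


FS_NARROW = 8000   # Hz, assumed narrowband sampling rate
FS_WIDE = 16000    # Hz, rate after zero insertion
TONE_HZ = 1000.0   # hypothetical test tone

narrow = [math.sin(2 * math.pi * TONE_HZ * i / FS_NARROW) for i in range(800)]
wide = spectral_fold(narrow)

mag_original = tone_magnitude(wide, TONE_HZ, FS_WIDE)             # 1 kHz
mag_image = tone_magnitude(wide, FS_WIDE / 2 - TONE_HZ, FS_WIDE)  # 7 kHz mirror
mag_elsewhere = tone_magnitude(wide, 5000.0, FS_WIDE)             # no energy expected
```

In this sketch the original tone and its mirror image carry equal energy. A practical system would still shape the folded band with an estimated high-band spectral envelope.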
Speech enhancement for cell phones
Use psychoacoustic and auditory system knowledge to improve speech loudness and intelligibility
Bandwidth Extension of Telephone Speech
Beamforming is a signal processing technique that operates on arrays of multiple sensors
Types of Beamforming
Frequency Dependent and Frequency Independent
- Bandwidth Expansion
- Direction of arrival estimation Beamforming
Motivation
Need for enhanced voice quality
- Conventional beamformers are all frequency dependent.
- The few frequency-independent beamformers available work with large (512-microphone) array systems.
Complete mobility under noisy conditions
Ability to identify different speakers in a
conference call
Increase the intelligibility of speech
Novel approach
Constraints
- The algorithm developed at CNEL works on a narrow-baseline (4 cm) two-microphone system
- The results are superior to conventional techniques
Physical constraints
Low software and hardware complexity
Good performance at all frequencies
Improvements in SNR
Improvement in Recognition
Real time operation
Aim
Direction Of Arrival (DOA) estimation
Excitation Regeneration
DOA Requirements
- Differentiate the speech source from the noise source
- Overcome problems of signal distortion due to noise
- Prevent loss of accuracy due to room reverberations
Spectral Envelope Regeneration
Results
Signal processed by the algorithm
Speech with babble noise in the background
Method
DOA Algorithm requirements
DOA Method Equation for Implementation
Delay and Sum
Minimum Variance
MUSIC
Coherent MUSIC
Root MUSIC
ESPRIT
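The equations for these methods did not survive in the text. As a rough illustration of the simplest entry, Delay and Sum, here is a minimal narrowband sketch for a two-microphone array with the 4 cm baseline. It assumes a single far-field tone, a speed of sound of 343 m/s, and angles measured from broadside. The function names, the 2 kHz test tone, and the 16 kHz sampling rate are hypothetical choices, not the poster's implementation.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
MIC_SPACING = 0.04      # m, the 4 cm baseline


def phasor(signal, freq_hz, fs_hz):
    """Complex amplitude of `signal` at freq_hz (single-bin DFT)."""
    return sum(
        s * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        for n, s in enumerate(signal)
    )


def delay_and_sum_doa(mic0, mic1, freq_hz, fs_hz, step_deg=1):
    """Scan candidate angles and return the one with maximum summed power."""
    x0 = phasor(mic0, freq_hz, fs_hz)
    x1 = phasor(mic1, freq_hz, fs_hz)
    best_angle, best_power = None, -1.0
    for angle in range(-90, 91, step_deg):
        tau = MIC_SPACING * math.sin(math.radians(angle)) / SPEED_OF_SOUND
        # Undo the candidate delay on microphone 1, then sum the channels.
        steered = x0 + x1 * cmath.exp(2j * math.pi * freq_hz * tau)
        power = abs(steered) ** 2
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle


def simulate_tone(angle_deg, freq_hz=2000.0, fs_hz=16000.0, n=160):
    """Noise-free tone arriving from angle_deg; microphone 1 hears it later."""
    tau = MIC_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    mic0 = [math.cos(2 * math.pi * freq_hz * (i / fs_hz)) for i in range(n)]
    mic1 = [math.cos(2 * math.pi * freq_hz * (i / fs_hz - tau)) for i in range(n)]
    return mic0, mic1


mic0, mic1 = simulate_tone(30)
estimated_angle = delay_and_sum_doa(mic0, mic1, 2000.0, 16000.0)
```

With only two closely spaced microphones the scanned power curve is very broad, so in noise the peak location becomes unreliable; this is one reason the subspace methods in the list are of interest.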
Hamming window length: 20 ms
LPC order (wideband): 18
LPC order (narrowband): 14
Spectral representation: LPC cepstrum
Mixture number (Q): 128
VQ codebook size: 128
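To show how these analysis parameters fit together, the following minimal sketch turns one 20 ms Hamming-windowed frame into an order-14 LPC cepstrum using the autocorrelation method, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The 8 kHz sampling rate, the synthetic test frame, and all function names are assumptions for illustration; the poster does not specify its implementation.

```python
import math
import random

FS_NARROWBAND = 8000        # Hz, assumed telephone-band sampling rate
FRAME_MS = 20               # Hamming window length
LPC_ORDER_NARROWBAND = 14   # narrowband LPC order


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation sums r[0..max_lag] (not divided by the frame length)."""
    return [
        sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + sum_k a[k] z^-k; returns (a, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        reflection = -acc / err
        updated = a[:]
        for k in range(1, i):
            updated[k] = a[k] + reflection * a[i - k]
        updated[i] = reflection
        a = updated
        err *= 1.0 - reflection * reflection
    return a, err


def lpc_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        a_n = a[n] if n <= p else 0.0
        acc = sum(m * c[m] * a[n - m] for m in range(1, n) if n - m <= p)
        c[n] = -a_n - acc / n
    return c[1:]


def frame_to_cepstrum(frame, order):
    """Window one frame, fit LPC coefficients, and convert them to a cepstrum."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return lpc_cepstrum(a, order)


# Synthetic stand-in for a speech frame: two tones plus a little noise.
rng = random.Random(0)
frame_len = FS_NARROWBAND * FRAME_MS // 1000  # 160 samples
frame = [
    math.sin(2 * math.pi * 500 * i / FS_NARROWBAND)
    + 0.5 * math.sin(2 * math.pi * 1500 * i / FS_NARROWBAND)
    + 0.05 * rng.gauss(0, 1)
    for i in range(frame_len)
]
cepstrum = frame_to_cepstrum(frame, LPC_ORDER_NARROWBAND)
```

Feature vectors of this kind are what a GMM with Q mixtures or a VQ codebook would then model; the wideband branch would use the same steps with order 18.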
- Low computational intensity (FLOPS)
- High accuracy (Confidence Interval)
- High speed (Time taken)
- Easy to implement
- Work well at low SNRs
- Work well in a 2 microphone narrow baseline
(4cm) system.
Speech with pink noise in the background
Signal processed by the algorithm
ESPRIT
High Speed and good low SNR performance
Low FLOPS count
Good Accuracy
Tradeoff between Accuracy and Computational
intensity
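For intuition about why ESPRIT is cheap on a two-microphone array, here is a minimal narrowband sketch under strong assumptions: one source, complex snapshots at a single known frequency, a 4 cm spacing, and a speed of sound of 343 m/s. With two sensors each ESPRIT subarray is a single microphone, so the rotation operator reduces to the ratio of the two entries of the principal eigenvector of the 2x2 covariance matrix. The function names and simulation settings are hypothetical and are not the algorithm developed at CNEL.

```python
import cmath
import math
import random


def esprit_doa_two_mics(snapshots, freq_hz, spacing_m=0.04, c=343.0):
    """Estimate one source angle (degrees from broadside) from (x0, x1) snapshots."""
    n = len(snapshots)
    # Sample covariance R = E[x x^H] for the two-channel snapshot vector.
    r00 = sum(abs(x0) ** 2 for x0, _ in snapshots) / n
    r11 = sum(abs(x1) ** 2 for _, x1 in snapshots) / n
    r01 = sum(x0 * x1.conjugate() for x0, x1 in snapshots) / n
    if abs(r01) == 0.0:
        raise ValueError("channels are uncorrelated; no signal subspace found")
    # Largest eigenvalue and its eigenvector for the 2x2 Hermitian matrix.
    lam = (r00 + r11) / 2 + math.sqrt(((r00 - r11) / 2) ** 2 + abs(r01) ** 2)
    u0, u1 = r01, lam - r00
    # Rotation between the two one-sensor subarrays.
    rotation = u1 / u0
    phase_lag = -cmath.phase(rotation)  # phase by which microphone 1 lags
    sin_theta = phase_lag * c / (2 * math.pi * freq_hz * spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))


def simulate_snapshots(angle_deg, freq_hz, n=500, noise_std=0.02,
                       spacing_m=0.04, c=343.0, seed=0):
    """Unit-amplitude, random-phase source plus weak complex Gaussian noise."""
    rng = random.Random(seed)
    phi = 2 * math.pi * freq_hz * spacing_m * math.sin(math.radians(angle_deg)) / c
    snapshots = []
    for _ in range(n):
        s = cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
        n0 = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        n1 = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        snapshots.append((s + n0, s * cmath.exp(-1j * phi) + n1))
    return snapshots


estimate_deg = esprit_doa_two_mics(simulate_snapshots(30.0, 2000.0), 2000.0)
```

No angle scan is needed: the estimate comes from one small eigen-decomposition, which is consistent with the low FLOPS count reported for ESPRIT. Speech is wideband, so a practical system would have to apply an estimate like this per frequency band or use a wideband formulation.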
Improvements in SNR for varying Noise DOA
Plot comparing the MSE for the six different
methods at different SNRs
Performance comparison between Motorola's Noise
suppressor and our algorithm
Comparison of FLOPS for the six different methods
at 10 dB SNR
ESPRIT 34501
Outperforms!