Title: How to deal with the noise in real systems?
- Hsiao-Chun Wu
- Motorola PCS Research and Advanced Technology Labs, Speech Laboratory
- richardw_at_srl.css.mot.com
- Phone: (815) 884-3071
Why do we need to study noise?
- Noise exists everywhere, and it degrades the performance of real-world signal processing. Since noise cannot simply be avoided by system engineers, modern noise-processing techniques have been researched and designed to overcome this problem. Many related research areas have therefore emerged, such as signal detection, signal enhancement/noise suppression, and channel equalization.
How to deal with noise? Cut it off!
- Spectral Truncation
- Spectral Subtraction (1989)
- Time Truncation
- Signal Detection
- Spatial and/or Temporal Filtering
- Equalization
- Array Signal Separation (Blind Source Separation)
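As a concrete illustration of the spectral-subtraction idea listed above, here is a minimal sketch (not the 1989 algorithm itself): it assumes the first few frames are noise-only, averages them into a noise magnitude estimate, and subtracts that estimate from each frame's magnitude spectrum, clamping to a spectral floor to avoid negative magnitudes. The frame length, floor value, and rectangular no-overlap framing are illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=5, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch).

    Assumes the first `noise_frames` frames are noise-only and uses
    them to estimate the noise magnitude spectrum.
    """
    n_frames = len(noisy) // frame_len
    frames = noisy[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Noise magnitude estimated by averaging the leading noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; clamp to a spectral floor to avoid
    # negative magnitudes (the source of "musical noise" artifacts).
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    # Resynthesize using the noisy phase (phase is left untouched).
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

In practice overlap-add framing and a smoothed noise estimate would be used; this version only shows the core subtract-and-floor step.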
Session 1. On-line Automatic End-of-speech Detection Algorithm (Time Truncation)
- 1. Project goal.
- 2. Review of current methods.
- 3. Introduction to the voice-metric-based end-of-speech detector.
- 4. Simulation results.
- 5. Conclusion.
1. Project Goal
- Problem
- Digit-dial recognition with unknown digit-string length
- Solution 1
- A fixed-length window, such as 10 seconds? (inconvenient for users)
- Solution 2
- Dynamic termination of data capture? (needs a robust detection algorithm)
- Research and design a robust dynamic-termination mechanism for the speech recognizer:
- a new on-line automatic end-of-speech detection algorithm with small computational complexity.
- Design a more robust front end to improve the recognition accuracy of speech recognizers:
- the new algorithm can also reduce the wasteful feature extraction of redundant noise.
2. Review of Current Methods
- Most speech detection algorithms fall into three categories.
- Frame-energy detection
- Short-term frame energy (20 ms) can be used for speech/noise classification.
- It is not robust at high background-noise levels.
- Zero-crossing-rate detection
- The short-term zero-crossing rate can also be used for speech/noise classification.
- It is not robust across a wide variety of noise types.
- Higher-order-spectral detection
- Short-term higher-order spectra can be used for speech/noise classification.
- It implies heavy computational complexity, and its threshold is difficult to pre-determine.
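The first two categories above can be sketched in a few lines. The 160-sample frame corresponds to a 20 ms frame at 8 kHz sampling; the thresholds are illustrative placeholders that would need tuning on real data.

```python
import numpy as np

def frame_features(x, frame_len=160):
    """Per-frame short-term energy and zero-crossing rate (sketch).

    At 8 kHz sampling, 160 samples is the 20 ms frame mentioned
    in the review of current methods.
    """
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    # A zero crossing occurs where consecutive samples change sign.
    zcr = (np.diff(np.sign(frames), axis=1) != 0).sum(axis=1) / frame_len
    return energy, zcr

def classify(energy, zcr, e_thresh, z_thresh):
    """Naive speech/noise decision: high energy and low ZCR -> speech.

    The thresholds are hypothetical; their fragility under changing
    noise levels and noise types is exactly the weakness the slides
    point out for these two methods.
    """
    return (energy > e_thresh) & (zcr < z_thresh)
```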
3. Introduction to the Voice-Metric-Based End-of-speech Detector
- End-of-speech detection using voice-metric features is based on the Mel-energies. Voice-metric features are robust over a wide variety of background noise. The voice-metric-based speech/noise classifier was originally applied in the IS-127 CELP speech coder standard. We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
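A hedged sketch of the underlying idea: compute Mel-band energies per frame and score each band by its estimated SNR against a running noise estimate. The band layout and the clipped-dB score below are stand-ins; the actual IS-127 / VR LITE voice-metric score table is not reproduced here.

```python
import numpy as np

def mel_band_energies(frame, sr=8000, n_bands=10):
    """Energy in roughly mel-spaced bands of one frame (illustrative;
    the real VR LITE filterbank parameters are not reproduced here)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    # Mel-scale forward and inverse mappings.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    # Mel-spaced band edges between 100 Hz and the Nyquist frequency.
    edges = inv(np.linspace(mel(100), mel(sr / 2), n_bands + 1))
    bins = np.fft.rfftfreq(len(frame), 1 / sr)
    return np.array([spec[(bins >= lo) & (bins < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def voice_metric(band_energy, noise_energy):
    """Sum of per-band SNR scores; higher means more speech-like.

    The clipped-dB mapping is a hypothetical stand-in for the
    IS-127 voice-metric lookup table."""
    snr_db = 10 * np.log10(np.maximum(band_energy, 1e-12)
                           / np.maximum(noise_energy, 1e-12))
    return np.clip(snr_db, 0, 20).sum()
```

Comparing the per-frame score against an adaptive threshold is what drives the speech/noise decision in the detector diagrams that follow.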
Voice metric score table
Original VR LITE Front End
[Block diagram: raw data → FFT → Mel-spectrum → SNR estimate and voice metric → pre-S/N classifier → voice-metric scores → post-S/N classifier (with threshold adaptation and silence-duration threshold) → EOS buffer → "Speech start?" decision; on "yes", the end-of-speech detector stops data capture.]
Front end with end-of-speech detector
[Flowchart: speech input → segmentation of speech into frames → process frame i → "end of speech?"; if no, continue with the next frame i+1; if yes, data capture terminates.]
String 2-2-9-1-7-8 in Car at 55 mph
[Waveform plot: raw data, 6.51 seconds total; actual end point at 3.78 seconds; detected end point at 4.81 seconds.]
String 2-2-9-1-7-8 in Car at 55 mph
[Plot annotating the end point on the time axis (seconds): a correct detection after the end point gives the correct-detection time error; a false detection before it gives the false-detection time error.]
4. Simulation Results
(Simulations are run over the Motorola digit-string database, comprising 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)
- A. Receiver Operating Characteristic (ROC) curve. The ROC curve plots the end-of-speech detection rate versus the false (early) detection rate. We compare two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.
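A point on such a curve at a given threshold can be computed as follows (a generic sketch, not the evaluation code behind the slides): sweep the decision threshold, and at each setting count detections among true end-of-speech events versus detections among non-events.

```python
def roc_curve(scores, labels, thresholds):
    """ROC points: (false detection rate, detection rate) per threshold.

    `scores` are detector outputs per trial; `labels` mark trials that
    truly contain an end-of-speech event. Both names are illustrative.
    """
    pts = []
    pos = sum(labels)
    neg = len(labels) - pos
    for t in thresholds:
        detected = [s >= t for s in scores]
        tp = sum(d and l for d, l in zip(detected, labels))       # hits
        fp = sum(d and not l for d, l in zip(detected, labels))   # false alarms
        pts.append((fp / neg, tp / pos))
    return pts
```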
ROC curve
[Plot: detection rate (%) versus false detection rate (%).]
- B. String-accuracy-convergence (SAC) curve. The SAC curve plots the string recognition accuracy versus the false (early) detection rate. We compare the same two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.
SAC curve
[Plot: string recognition accuracy (%) versus false detection rate (%).]
C. Table of detection results
(This table illustrates the results on the Madison sub-database, which includes data files with 1.85 seconds or more of silence after the end of speech.)
(This table illustrates the results on the small database collected by Motorola PCS CSSRL. All digit strings were recorded in a fixed 15-second window.)

| Condition | Avg. Time Error (s) | Avg. False Detection Time Error (s) | Avg. Correct Detection Time Error (s) | False Detection Rate (%) | Number of Strings | Total Detection Rate (%) | String Recognition Accuracy w/ EOS (%) | String Recognition Accuracy w/o EOS (%) |
| Overall | 1.82 | 0 | 1.82 | 0 | 121 | 96.69 | 50.41 | 29.75 |
| Office Close-talk | 1.85 | 0 | 1.85 | 0 | 21 | 100 | 66.67 | 61.90 |
| Office Arm-length | 1.84 | 0 | 1.84 | 0 | 20 | 100 | 65.00 | 65.00 |
| Café Close-talk | 1.76 | 0 | 1.76 | 0 | 40 | 100 | 40.00 | 15.00 |
| Café Arm-length | 1.85 | 0 | 1.85 | 0 | 40 | 90 | 45.00 | 10.00 |
Analysis of the Simulation Results: Why didn't EOS detection work well in babble noise?
Optimal Detection Decision
- Bayes classifier
- Likelihood ratio test
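In standard form (a textbook result, not taken from the slides), the Bayes-optimal speech/noise decision reduces to a likelihood ratio test. With priors $P(\cdot)$ and costs $C_{ij}$ for deciding hypothesis $i$ when $j$ is true:

```latex
\Lambda(x) \;=\; \frac{p(x \mid \mathrm{speech})}{p(x \mid \mathrm{noise})}
\;\underset{\text{noise}}{\overset{\text{speech}}{\gtrless}}\;
\eta \;=\; \frac{P(\mathrm{noise})\,(C_{10} - C_{00})}{P(\mathrm{speech})\,(C_{01} - C_{11})}
```

Babble noise is hard precisely because $p(x \mid \mathrm{noise})$ then overlaps heavily with $p(x \mid \mathrm{speech})$: the interference is itself speech-like.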
Digit "one": close-talking mic, quiet office
Digit "one": hands-free mic, 55 mph car
Digit "one": far-talking mic, cafeteria
5. Conclusion
- The new voice-metric-based end-of-speech detector is robust over a wide variety of background noise.
- The new detector brings only a small increase in computational complexity and can be implemented in real time.
- The new detector can improve recognition performance by discarding the extra noise captured under a fixed data-capture window.
- The new detector still needs further improvement in babble-noise environments.
Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)
- 1. Motivation and research goal.
- 2. Statement of the blind source separation problem.
- 3. Principles of blind source separation.
- 4. Criteria for blind source separation.
- 5. Application to blind channel equalization for digital communication systems.
- 6. Simulation and comparison.
- 7. Summary and conclusion.
1. Motivation
- Mimic the human auditory system's ability to differentiate the subject signals from other sounds, such as interfering sources and background noise, for clear recognition of the subject content.
- "One of the most striking facts about our ears is that we have two of them--and yet we hear one acoustic world; only one voice per speaker." (E. C. Cherry and W. K. Taylor, "Some further experiments on the recognition of speech, with one and two ears," Journal of the Acoustical Society of America, 26:554-559, 1954)
- The "cocktail party effect"--the ability to focus one's listening attention on a single talker among a cacophony of conversations and background noise--has been recognized for some time. This specialized listening ability may be due to characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.
Research Goal
- Design a preprocessor built on digital-signal-processing speech enhancement algorithms. The input signals are collected through multiple sensor (microphone) arrays. After computation by the embedded signal processing algorithms, we obtain clearly separated signals at the output.
[Diagram: audio input → blind source separation algorithms → enhanced output]
2. Problem Statement of Blind Source Separation
What is blind source separation?
Formulation of the Blind Source Separation Problem
- A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) * S(t), where
X(t) = [x_1(t), ..., x_m(t)]^T and S(t) = [s_1(t), ..., s_n(t)]^T.
- We need to estimate a separator W(t) such that Y(t) = W(t) * X(t) ≈ S(t), where
Y(t) = [y_1(t), ..., y_n(t)]^T.
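For the instantaneous (memoryless) special case of this model, X(t) = H S(t) with a constant mixing matrix H, a small self-contained separation sketch is possible: whiten the mixtures, then search for the rotation that maximizes non-Gaussianity (absolute excess kurtosis). This is a simple stand-in for the information-theoretic criteria on the following slides, not the algorithm from the talk.

```python
import numpy as np

def separate_two(x):
    """Blind separation of two instantaneously mixed sources (sketch).

    Whitens the 2 x N mixture matrix, then searches over rotation
    angles for the one maximizing total non-Gaussianity, measured by
    absolute excess kurtosis of the outputs.
    """
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening: decorrelate and normalize the mixtures.
    cov = np.cov(x)
    d, E = np.linalg.eigh(cov)
    z = E @ np.diag(d ** -0.5) @ E.T @ x
    best, best_y = -np.inf, z
    # Rotations beyond 90 degrees only permute/negate the outputs,
    # so searching [0, pi/2) is sufficient.
    for theta in np.linspace(0, np.pi / 2, 180):
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        y = R @ z
        k = (y ** 4).mean(axis=1) - 3   # excess kurtosis per output
        if np.abs(k).sum() > best:
            best, best_y = np.abs(k).sum(), y
    return best_y
```

As in all blind separation, the recovered sources come back only up to permutation, sign, and scale.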
3. Principles of Blind Source Separation
The independence measure: Shannon's mutual information.
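In standard form (a textbook identity, not reproduced from the slides), the mutual information of the separator outputs $y_1, \dots, y_n$ is

```latex
I(y_1, \dots, y_n) \;=\; \sum_{i=1}^{n} H(y_i) \;-\; H(y_1, \dots, y_n),
\qquad H(y) = -\int p(y)\,\log p(y)\,dy .
```

Since $I \ge 0$ with equality if and only if the outputs are statistically independent, driving $I$ toward zero is a natural criterion for adapting the separator.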
4. Criteria to Separate Independent Sources
- Constrained entropy (Wu, IJCNN'99)
- Hadamard measure (Wu, ICA'99)
- Frobenius norm (Wu, NNSP'97)
- Quadratic Gaussianity (Wu, NNSP'99)
5. Application to Blind Single-Channel Equalization for Digital Communication Systems
- We apply the minimization of a modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, ...] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t), passing through the digital channel
h(t) = [c(t, 0.11) + 0.8 c(t-1, 0.11) - 0.4 c(t-3, 0.11)] W_6T(t),
where c(t, b) is the raised-cosine function with roll-off factor b and W_6T(t) is a rectangular window. The input signal to the equalizer is x(t) = h(t) * s(t) + n(t), where n(t) is the background noise. We applied generalized anti-Hebbian learning to adapt w(t) such that y(t) = w(t) * x(t) ≈ s(t).
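The generalized anti-Hebbian rule under the constrained-entropy criterion is not reproduced here. As a runnable stand-in, the sketch below adapts a blind equalizer with the well-known constant modulus algorithm (CMA), which likewise needs no training symbols, only the modulus R of the constellation (R = 1 for 2-PAM/BPSK). Tap count, step size, and initialization are illustrative.

```python
import numpy as np

def cma_equalize(x, n_taps=11, mu=1e-3, R=1.0):
    """Constant-modulus blind equalizer, CMA(2,2) variant (sketch).

    Adapts FIR taps w so that the output modulus approaches R,
    without access to the transmitted symbols.
    """
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0                 # center-spike initialization
    y = np.zeros(len(x))
    for t in range(n_taps, len(x)):
        u = x[t - n_taps:t][::-1]        # regressor, most recent first
        y[t] = w @ u
        e = y[t] * (y[t] ** 2 - R)       # CMA error term
        w -= mu * e * u                  # stochastic-gradient tap update
    return y, w
```

Like the anti-Hebbian approach on the slide, CMA leaves a delay and sign ambiguity in the recovered symbol stream, which a receiver resolves separately.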
[Plot: signal-to-interference ratio (dB) versus signal-to-noise ratio (dB).]
[Plot: bit error rate versus signal-to-noise ratio (dB).]
6. Simulation and Comparison
- Simulation results compare our generalized anti-Hebbian learning, the SDIF algorithm, and Lee's Infomax method (Lee, IJCNN'97) over three real recordings downloaded from the Salk Institute, University of California at San Diego.
New VR LITE Front End: Blind Source Separation + End-of-speech Detection
7. Conclusion and Future Research
- The computational cost of blind source separation needs to be reduced.
- Test BSS for EOS detection under microphone arrays of the same kind.
- Incorporate other array signal processing techniques (a beamformer?) to improve speech detection and recognition.