Title: How to deal with the noise in real systems?
- Hsiao-Chun Wu
- Motorola PCS Research and Advanced Technology Labs, Speech Laboratory
- richardw_at_srl.css.mot.com
- Phone: (815) 884-3071
Why do we need to study noise?
- Noise exists everywhere, and it degrades the performance of real-world signal processing. Since noise cannot simply be avoided by system engineers, modern noise-processing techniques have been researched and designed to overcome this problem. Many related research areas have therefore emerged, such as signal detection, signal enhancement/noise suppression, and channel equalization.
How to deal with noise? Cut it off!
- Spectral Truncation
- Spectral Subtraction (1989)
- Time Truncation
- Signal Detection
- Spatial and/or Temporal Filtering
- Equalization
- Array Signal Separation (Blind Source Separation)
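As a concrete illustration of the spectral-subtraction idea listed above, here is a minimal sketch (not the 1989 algorithm itself): it assumes the first few frames are noise-only, averages them into a noise magnitude estimate, and subtracts that estimate from each frame's magnitude spectrum, clamping to a spectral floor to avoid negative magnitudes. The frame length, floor value, and rectangular no-overlap framing are illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=5, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch).

    Assumes the first `noise_frames` frames are noise-only and uses
    them to estimate the noise magnitude spectrum.
    """
    n_frames = len(noisy) // frame_len
    frames = noisy[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Noise magnitude estimated by averaging the leading noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; clamp to a spectral floor to avoid
    # negative magnitudes (the source of "musical noise" artifacts).
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    # Resynthesize using the noisy phase (phase is left untouched).
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

In practice overlap-add framing and a smoothed noise estimate would be used; this version only shows the core subtract-and-floor step.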
Session 1. On-line Automatic End-of-speech Detection Algorithm (Time Truncation)
- 1. Project goal.
- 2. Review of current methods.
- 3. Introduction to the voice-metric-based end-of-speech detector.
- 4. Simulation results.
- 5. Conclusion.
1. Project Goal
- Problem
- Digit-dial recognition with unknown digit-string length
- Solution 1
- A fixed-length window, such as 10 seconds? (inconvenient for users)
- Solution 2
- Dynamic termination of data capture? (needs a robust detection algorithm)
- Research and design a robust dynamic-termination mechanism for the speech recognizer:
- a new on-line automatic end-of-speech detection algorithm with small computational complexity.
- Design a more robust front end to improve the recognition accuracy of speech recognizers:
- the new algorithm can also reduce the wasteful feature extraction of redundant noise.
2. Review of Current Methods
- Most speech detection algorithms fall into three categories.
- Frame-energy detection
- Short-term frame energy (20 ms) can be used for speech/noise classification.
- It is not robust at high background-noise levels.
- Zero-crossing-rate detection
- The short-term zero-crossing rate can also be used for speech/noise classification.
- It is not robust across a wide variety of noise types.
- Higher-order-spectral detection
- Short-term higher-order spectra can be used for speech/noise classification.
- It implies heavy computational complexity, and its threshold is difficult to pre-determine.
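The first two categories above can be sketched in a few lines. The 160-sample frame corresponds to a 20 ms frame at 8 kHz sampling; the thresholds are illustrative placeholders that would need tuning on real data.

```python
import numpy as np

def frame_features(x, frame_len=160):
    """Per-frame short-term energy and zero-crossing rate (sketch).

    At 8 kHz sampling, 160 samples is the 20 ms frame mentioned
    in the review of current methods.
    """
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    # A zero crossing occurs where consecutive samples change sign.
    zcr = (np.diff(np.sign(frames), axis=1) != 0).sum(axis=1) / frame_len
    return energy, zcr

def classify(energy, zcr, e_thresh, z_thresh):
    """Naive speech/noise decision: high energy and low ZCR -> speech.

    The thresholds are hypothetical; their fragility under changing
    noise levels and noise types is exactly the weakness the slides
    point out for these two methods.
    """
    return (energy > e_thresh) & (zcr < z_thresh)
```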
3. Introduction to the Voice-Metric-Based End-of-speech Detector
- End-of-speech detection using voice-metric features is based on the Mel-energies. Voice-metric features are robust over a wide variety of background noise. The voice-metric-based speech/noise classifier was originally applied in the IS-127 CELP speech coder standard. We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
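A hedged sketch of the underlying idea: compute Mel-band energies per frame and score each band by its estimated SNR against a running noise estimate. The band layout and the clipped-dB score below are stand-ins; the actual IS-127 / VR LITE voice-metric score table is not reproduced here.

```python
import numpy as np

def mel_band_energies(frame, sr=8000, n_bands=10):
    """Energy in roughly mel-spaced bands of one frame (illustrative;
    the real VR LITE filterbank parameters are not reproduced here)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    # Mel-scale forward and inverse mappings.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    # Mel-spaced band edges between 100 Hz and the Nyquist frequency.
    edges = inv(np.linspace(mel(100), mel(sr / 2), n_bands + 1))
    bins = np.fft.rfftfreq(len(frame), 1 / sr)
    return np.array([spec[(bins >= lo) & (bins < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def voice_metric(band_energy, noise_energy):
    """Sum of per-band SNR scores; higher means more speech-like.

    The clipped-dB mapping is a hypothetical stand-in for the
    IS-127 voice-metric lookup table."""
    snr_db = 10 * np.log10(np.maximum(band_energy, 1e-12)
                           / np.maximum(noise_energy, 1e-12))
    return np.clip(snr_db, 0, 20).sum()
```

Comparing the per-frame score against an adaptive threshold is what drives the speech/noise decision in the detector diagrams that follow.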
Voice metric score table
Original VR LITE Front End
[Block diagram: raw data → FFT → Mel-spectrum → SNR estimate and voice metric → pre-S/N classifier → voice-metric scores → post-S/N classifier (with threshold adaptation and silence-duration threshold) → EOS buffer → "Speech start?" decision; on "yes", the end-of-speech detector stops data capture.]
Front end with end-of-speech detector
[Flowchart: speech input → segmentation of speech into frames → process frame i → "end of speech?"; if no, continue with the next frame i+1; if yes, data capture terminates.]
String 2-2-9-1-7-8 in Car at 55 mph
[Waveform plot: raw data, 6.51 seconds total; actual end point at 3.78 seconds; detected end point at 4.81 seconds.]
String 2-2-9-1-7-8 in Car at 55 mph
[Plot annotating the end point on the time axis (seconds): a correct detection after the end point gives the correct-detection time error; a false detection before it gives the false-detection time error.]
4. Simulation Results
(Simulations are run over the Motorola digit-string database, comprising 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)
- A. Receiver Operating Characteristic (ROC) curve. The ROC curve plots the end-of-speech detection rate versus the false (early) detection rate. We compare two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.
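A point on such a curve at a given threshold can be computed as follows (a generic sketch, not the evaluation code behind the slides): sweep the decision threshold, and at each setting count detections among true end-of-speech events versus detections among non-events.

```python
def roc_curve(scores, labels, thresholds):
    """ROC points: (false detection rate, detection rate) per threshold.

    `scores` are detector outputs per trial; `labels` mark trials that
    truly contain an end-of-speech event. Both names are illustrative.
    """
    pts = []
    pos = sum(labels)
    neg = len(labels) - pos
    for t in thresholds:
        detected = [s >= t for s in scores]
        tp = sum(d and l for d, l in zip(detected, labels))       # hits
        fp = sum(d and not l for d, l in zip(detected, labels))   # false alarms
        pts.append((fp / neg, tp / pos))
    return pts
```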
ROC curve
[Plot: detection rate (%) versus false detection rate (%).]
- B. String-accuracy-convergence (SAC) curve. The SAC curve plots the string recognition accuracy versus the false (early) detection rate. We compare the same two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.
SAC curve
[Plot: string recognition accuracy (%) versus false detection rate (%).]
C. Table of detection results
(This table illustrates the results on the Madison sub-database, which includes data files with 1.85 seconds or more of silence after the end of speech.)
(This table illustrates the results on the small database collected by Motorola PCS CSSRL. All digit strings were recorded in a fixed 15-second window.)

| Condition | Avg. Time Error (s) | Avg. False Detection Time Error (s) | Avg. Correct Detection Time Error (s) | False Detection Rate (%) | Number of Strings | Total Detection Rate (%) | String Recognition Accuracy w/ EOS (%) | String Recognition Accuracy w/o EOS (%) |
| Overall | 1.82 | 0 | 1.82 | 0 | 121 | 96.69 | 50.41 | 29.75 |
| Office Close-talk | 1.85 | 0 | 1.85 | 0 | 21 | 100 | 66.67 | 61.90 |
| Office Arm-length | 1.84 | 0 | 1.84 | 0 | 20 | 100 | 65.00 | 65.00 |
| Café Close-talk | 1.76 | 0 | 1.76 | 0 | 40 | 100 | 40.00 | 15.00 |
| Café Arm-length | 1.85 | 0 | 1.85 | 0 | 40 | 90 | 45.00 | 10.00 |
Analysis of the Simulation Results: Why didn't EOS detection work well in babble noise?
Optimal Detection Decision
- Bayes classifier
- Likelihood ratio test
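In standard form (a textbook result, not taken from the slides), the Bayes-optimal speech/noise decision reduces to a likelihood ratio test. With priors $P(\cdot)$ and costs $C_{ij}$ for deciding hypothesis $i$ when $j$ is true:

```latex
\Lambda(x) \;=\; \frac{p(x \mid \mathrm{speech})}{p(x \mid \mathrm{noise})}
\;\underset{\text{noise}}{\overset{\text{speech}}{\gtrless}}\;
\eta \;=\; \frac{P(\mathrm{noise})\,(C_{10} - C_{00})}{P(\mathrm{speech})\,(C_{01} - C_{11})}
```

Babble noise is hard precisely because $p(x \mid \mathrm{noise})$ then overlaps heavily with $p(x \mid \mathrm{speech})$: the interference is itself speech-like.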
Digit "one": close-talking mic, quiet office
Digit "one": hands-free mic, 55 mph car
Digit "one": far-talking mic, cafeteria
5. Conclusion
- The new voice-metric-based end-of-speech detector is robust over a wide variety of background noise.
- The new detector brings only a small increase in computational complexity and can be implemented in real time.
- The new detector can improve recognition performance by discarding the extra noise captured under a fixed data-capture window.
- The new detector still needs further improvement in babble-noise environments.
Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)
- 1. Motivation and research goal.
- 2. Statement of the blind source separation problem.
- 3. Principles of blind source separation.
- 4. Criteria for blind source separation.
- 5. Application to blind channel equalization for digital communication systems.
- 6. Simulation and comparison.
- 7. Summary and conclusion.
1. Motivation
- Mimic the human auditory system's ability to differentiate the subject signals from other sounds, such as interfering sources and background noise, for clear recognition of the subject content.
- "One of the most striking facts about our ears is that we have two of them--and yet we hear one acoustic world; only one voice per speaker." (E. C. Cherry and W. K. Taylor, "Some further experiments on the recognition of speech, with one and two ears," Journal of the Acoustical Society of America, 26:554-559, 1954)
- The "cocktail party effect"--the ability to focus one's listening attention on a single talker among a cacophony of conversations and background noise--has been recognized for some time. This specialized listening ability may be due to characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.
Research Goal
- Design a preprocessor built on digital-signal-processing speech enhancement algorithms. The input signals are collected through multiple sensor (microphone) arrays. After computation by the embedded signal processing algorithms, we obtain clearly separated signals at the output.
[Diagram: audio input → blind source separation algorithms → enhanced output]
2. Problem Statement of Blind Source Separation
What is blind source separation?
Formulation of the Blind Source Separation Problem
- A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) * S(t), where
X(t) = [x_1(t), ..., x_m(t)]^T and S(t) = [s_1(t), ..., s_n(t)]^T.
- We need to estimate a separator W(t) such that Y(t) = W(t) * X(t) ≈ S(t), where
Y(t) = [y_1(t), ..., y_n(t)]^T.
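For the instantaneous (memoryless) special case of this model, X(t) = H S(t) with a constant mixing matrix H, a small self-contained separation sketch is possible: whiten the mixtures, then search for the rotation that maximizes non-Gaussianity (absolute excess kurtosis). This is a simple stand-in for the information-theoretic criteria on the following slides, not the algorithm from the talk.

```python
import numpy as np

def separate_two(x):
    """Blind separation of two instantaneously mixed sources (sketch).

    Whitens the 2 x N mixture matrix, then searches over rotation
    angles for the one maximizing total non-Gaussianity, measured by
    absolute excess kurtosis of the outputs.
    """
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening: decorrelate and normalize the mixtures.
    cov = np.cov(x)
    d, E = np.linalg.eigh(cov)
    z = E @ np.diag(d ** -0.5) @ E.T @ x
    best, best_y = -np.inf, z
    # Rotations beyond 90 degrees only permute/negate the outputs,
    # so searching [0, pi/2) is sufficient.
    for theta in np.linspace(0, np.pi / 2, 180):
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        y = R @ z
        k = (y ** 4).mean(axis=1) - 3   # excess kurtosis per output
        if np.abs(k).sum() > best:
            best, best_y = np.abs(k).sum(), y
    return best_y
```

As in all blind separation, the recovered sources come back only up to permutation, sign, and scale.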
3. Principles of Blind Source Separation
The independence measure: Shannon's mutual information.
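In standard form (a textbook identity, not reproduced from the slides), the mutual information of the separator outputs $y_1, \dots, y_n$ is

```latex
I(y_1, \dots, y_n) \;=\; \sum_{i=1}^{n} H(y_i) \;-\; H(y_1, \dots, y_n),
\qquad H(y) = -\int p(y)\,\log p(y)\,dy .
```

Since $I \ge 0$ with equality if and only if the outputs are statistically independent, driving $I$ toward zero is a natural criterion for adapting the separator.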
4. Criteria to Separate Independent Sources
- Constrained entropy (Wu, IJCNN'99)
- Hadamard measure (Wu, ICA'99)
- Frobenius norm (Wu, NNSP'97)
- Quadratic Gaussianity (Wu, NNSP'99)
5. Application to Blind Single-Channel Equalization for Digital Communication Systems
- We apply the minimization of a modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, ...] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t), passing through the digital channel
h(t) = [c(t, 0.11) + 0.8 c(t-1, 0.11) - 0.4 c(t-3, 0.11)] W_6T(t),
where c(t, b) is the raised-cosine function with roll-off factor b and W_6T(t) is a rectangular window. The input signal to the equalizer is x(t) = h(t) * s(t) + n(t), where n(t) is the background noise. We applied generalized anti-Hebbian learning to adapt w(t) such that y(t) = w(t) * x(t) ≈ s(t).
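The generalized anti-Hebbian rule under the constrained-entropy criterion is not reproduced here. As a runnable stand-in, the sketch below adapts a blind equalizer with the well-known constant modulus algorithm (CMA), which likewise needs no training symbols, only the modulus R of the constellation (R = 1 for 2-PAM/BPSK). Tap count, step size, and initialization are illustrative.

```python
import numpy as np

def cma_equalize(x, n_taps=11, mu=1e-3, R=1.0):
    """Constant-modulus blind equalizer, CMA(2,2) variant (sketch).

    Adapts FIR taps w so that the output modulus approaches R,
    without access to the transmitted symbols.
    """
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0                 # center-spike initialization
    y = np.zeros(len(x))
    for t in range(n_taps, len(x)):
        u = x[t - n_taps:t][::-1]        # regressor, most recent first
        y[t] = w @ u
        e = y[t] * (y[t] ** 2 - R)       # CMA error term
        w -= mu * e * u                  # stochastic-gradient tap update
    return y, w
```

Like the anti-Hebbian approach on the slide, CMA leaves a delay and sign ambiguity in the recovered symbol stream, which a receiver resolves separately.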
[Plot: signal-to-interference ratio (dB) versus signal-to-noise ratio (dB).]
[Plot: bit error rate versus signal-to-noise ratio (dB).]
6. Simulation and Comparison
- Simulation results compare our generalized anti-Hebbian learning, the SDIF algorithm, and Lee's Infomax method (Lee, IJCNN'97) over three real recordings downloaded from the Salk Institute, University of California at San Diego.
New VR LITE Front End: Blind Source Separation + End-of-speech Detection
7. Conclusion and Future Research
- The computational cost of blind source separation needs to be reduced.
- Test BSS for EOS detection under microphone arrays of the same kind.
- Incorporate other array signal processing techniques (a beamformer?) to improve speech detection and recognition.