How to deal with the noise in real systems?

1
How to deal with the noise in real systems?
  • Hsiao-Chun Wu
  • Motorola PCS Research and Advanced Technology
    Labs, Speech Laboratory
  • richardw@srl.css.mot.com
  • Phone (815) 884-3071

2
Why do we need to study noise?
  • Noise exists everywhere and degrades the performance of real-world signal processing. Since noise cannot be avoided by system engineers, modern noise-processing technology has been researched and designed to overcome this problem. Hence many related research areas have emerged, such as signal detection, signal enhancement/noise suppression, and channel equalization.

3
How to deal with noise? Cut it off!!!!
  • Spectral Truncation
  • Spectral Subtraction (1989)
  • Time Truncation
  • Signal Detection
  • Spatial and/or Temporal Filtering
  • Equalization
  • Array Signal Separation (Blind Source Separation)
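Of the techniques above, spectral subtraction is the simplest to illustrate. The sketch below is a minimal illustration only: the frame/bin layout, the noise-estimation scheme, and the spectral floor are assumptions, not details from the slides.

```python
import numpy as np

def spectral_subtraction(noisy_stft, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum, frame by frame.

    noisy_stft : 2-D array of complex STFT frames (frames x bins)
    noise_mag  : 1-D array, estimated noise magnitude per frequency bin
    floor      : spectral floor that prevents negative magnitudes
    """
    mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    # Half-wave rectification with a spectral floor to limit musical noise.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return clean_mag * np.exp(1j * phase)  # reuse the noisy phase
```

In practice `noise_mag` would be estimated by averaging the magnitude spectrum over leading noise-only frames before speech starts.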

4
Session 1. On-line Automatic End-of-speech
Detection Algorithm (Time Truncation)
  • 1. Project goal.
  • 2. Review of current methods.
  • 3. Introduction to voice metric based
    end-of-speech detector.
  • 4. Simulation results.
  • 5. Conclusion.

5
1. Project Goal
  • Problem
  • Digit-dial recognition with unknown digit string
    length
  • Solution 1
  • a fixed-length window, such as 10 seconds? (inconvenient for users)
  • Solution 2
  • dynamic termination of data capture? (needs a robust detection algorithm)

6
  • Research and design a robust dynamic-termination mechanism for speech recognizers:
  • a new on-line automatic end-of-speech detection algorithm with small computational complexity.
  • Design a more robust front end to improve the recognition accuracy of speech recognizers:
  • the new algorithm also avoids wasting feature extraction on redundant noise.

7
2. Review of Current Methods
  • Most speech detection algorithms fall into three categories.
  • Frame energy detection:
  • short-term frame energy (20 ms) can be used for speech/noise classification;
  • it is not robust at high background-noise levels.
  • Zero-crossing-rate detection:
  • the short-term zero-crossing rate can also be used for speech/noise classification;
  • it is not robust across a wide variety of noise types.
  • Higher-order-spectral detection:
  • short-term higher-order spectra can be used for speech/noise classification;
  • it carries a heavy computational burden, and its threshold is difficult to pre-determine.
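The first two categories are cheap to compute. A minimal sketch of both cues (the thresholds and the combination rule are hypothetical and would need tuning per microphone and noise level):

```python
import numpy as np

def frame_energy(frame):
    # Short-term energy of one frame (e.g. 20 ms of samples).
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

def classify_frame(frame, energy_thresh, zcr_thresh):
    # Toy speech/noise decision combining both cues: voiced speech tends to
    # have high energy and a low zero-crossing rate.
    return (frame_energy(frame) > energy_thresh
            and zero_crossing_rate(frame) < zcr_thresh)
```

This also makes the slide's criticisms concrete: `energy_thresh` breaks down when background noise energy rivals speech energy, and `zcr_thresh` breaks down for noise types whose zero-crossing statistics resemble speech.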

8
3. Introduction to Voice Metric Based
End-of-speech Detector
  • End-of-speech detection using voice-metric features is based on the Mel-energies. Voice-metric features are robust over a wide variety of background noises. The voice-metric-based speech/noise classifier was originally used in the IS-127 CELP speech coder standard. We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
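As an illustration of the idea only (this is not the IS-127 specification; the score-table values and the SNR quantization below are placeholders), a voice metric can be formed by scoring each Mel-band SNR through a lookup table and summing:

```python
import numpy as np

# Placeholder score table: higher band SNR (in dB) maps to a larger
# voice-metric contribution. The real IS-127 table values differ.
SCORE_TABLE = [0, 1, 2, 4, 8, 16, 32, 50]

def voice_metric(band_energies, noise_energies):
    """Sum per-Mel-band scores indexed by the quantized band SNR."""
    snr_db = 10 * np.log10(np.maximum(band_energies, 1e-10) /
                           np.maximum(noise_energies, 1e-10))
    # Quantize SNR in 3 dB steps and clip to the table range.
    idx = np.clip((snr_db / 3).astype(int), 0, len(SCORE_TABLE) - 1)
    return int(sum(SCORE_TABLE[i] for i in idx))
```

A frame would then be declared speech when the metric exceeds an adaptive threshold; because the scores depend on per-band SNR rather than raw energy, the decision degrades more gracefully across noise types.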

9-14
(No transcript: figure slides)
15
Voice metric score table [table image not transcribed]
16
Original VR LITE Front End
[Block diagram of the end-of-speech detector: raw data → FFT → Mel-spectrum → SNR estimate and voice metric → voice metric scores; a pre-S/N classifier, threshold adaptation, silence-duration threshold, post-S/N classifier, and EOS buffer feed a "Speech start?" decision (yes/no); when the end-of-speech detector fires, data capture stops.]
17
front end with end-of-speech detector
[Flowchart: speech input is segmented into frames; each frame i is processed and tested for "end of speech?"; if no, continue with the next frame i+1; if yes, data capture terminates.]
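The frame loop combined with the silence-duration threshold can be sketched as follows (the function name and frame bookkeeping are hypothetical; per-frame speech/noise decisions are assumed to come from the voice-metric classifier):

```python
def end_of_speech(frame_is_speech, silence_thresh_frames):
    """Return the frame index at which data capture stops, or None.

    frame_is_speech       : iterable of per-frame speech/noise decisions
    silence_thresh_frames : silence-duration threshold in frames
                            (e.g. 1.85 s / 20 ms frames ~= 92 frames,
                            matching the threshold quoted on the slides)
    """
    started, silent = False, 0
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            started, silent = True, 0     # any speech frame resets the count
        elif started:
            silent += 1
            if silent >= silence_thresh_frames:
                return i                  # enough trailing silence: stop
    return None                           # never triggered (fixed window ends)
```

Requiring speech to start first prevents leading silence from terminating capture before the user has said anything.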
18
String "2-2-9-1-7-8" in car at 55 mph
[Waveform plot: raw data, 6.51 seconds total; actual end point at 3.78 seconds; detected end point at 4.81 seconds.]
19
String "2-2-9-1-7-8" in car at 55 mph
[Plot: end point on a time axis (seconds), marking the correct-detection time error and the false-detection time error.]
20
4. Simulation Results (Simulations are run over the Motorola digit-string database: 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)
  • A. Receiver Operating Characteristic (ROC): the ROC curve plots the end-of-speech detection rate against the false (early) detection rate. We compare two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.

21
ROC curve
[Plot: detection rate (%) versus false detection rate (%).]
22
  • B. String-accuracy-convergence (SAC) curve: the SAC curve plots the string recognition accuracy against the false (early) detection rate. We compare the same two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.

23
SAC curve
[Plot: string recognition accuracy (%) versus false detection rate (%).]
24
C. Table of detection results (This table illustrates the results over the Madison sub-database, including data files with 1.85 seconds or more of silence after the end of speech.) [table image not transcribed]
25
(This table illustrates the results over the small database collected by Motorola PCS CSSRL. All digit strings were recorded in a fixed 15-second window.)

Condition          Avg Time   Avg False-Det   Avg Correct-Det   False Det   Strings   Detection   Str. Accuracy   Str. Accuracy
                   Error      Time Error      Time Error        Rate                  Rate        (w/ EOS)        (w/o EOS)
Overall            1.82 s     0 s             1.82 s            0%          121       96.69%      50.41%          29.75%
Office close-talk  1.85 s     0 s             1.85 s            0%          21        100%        66.67%          61.90%
Office arm-length  1.84 s     0 s             1.84 s            0%          20        100%        65.00%          65.00%
Café close-talk    1.76 s     0 s             1.76 s            0%          40        100%        40.00%          15.00%
Café arm-length    1.85 s     0 s             1.85 s            0%          40        90%         45.00%          10.00%
26
Analysis of the Simulation Results: Why didn't EOS detection work well in babble noise?
27
Optimal Detection Decision
  • Bayes classifier
  • Likelihood Ratio Test
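The two bullets above can be written out explicitly. Assuming H1 denotes "speech present" and H0 "noise only" (this notation is not on the slide), the Bayes-optimal decision with equal error costs is the likelihood ratio test:

```latex
\Lambda(x) \;=\; \frac{p(x \mid H_1)}{p(x \mid H_0)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\;
\eta \;=\; \frac{P(H_0)}{P(H_1)}
```

With unequal decision costs C_ij, the threshold generalizes to η = P(H0)(C10 − C00) / [P(H1)(C01 − C11)]; the difficulty in babble noise is that p(x | H0) then resembles p(x | H1), so Λ(x) hovers near η.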

28
Digit "one" in close-talking mic, quiet office
29
Digit "one" in hands-free mic, 55 mph car
30
Digit "one" in far-talking mic, cafeteria
31
5. Conclusion
  • The new voice-metric-based end-of-speech detector is robust over a wide variety of background noises.
  • It adds only a small computational cost and can be implemented in real time.
  • It can improve recognition performance by discarding the extra noise captured by a fixed data-capture window.
  • It still needs improvement in babble-noise environments.

32
Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)
  • 1. Motivation and research goal.
  • 2. Statement of the blind source separation problem.
  • 3. Principles of blind source separation.
  • 4. Criteria for blind source separation.
  • 5. Application to blind channel equalization for digital communication systems.
  • 6. Simulation and comparison.
  • 7. Summary and conclusion.

33
1. Motivation
  • Mimic the human auditory system's ability to separate the signals of interest from other sounds, such as interfering sources and background noise, for clear recognition of their content.
  • "One of the most striking facts about our ears is that we have two of them, and yet we hear one acoustic world, only one voice per speaker." (E. C. Cherry and W. K. Taylor, "Some further experiments on the recognition of speech, with one and two ears," Journal of the Acoustical Society of America, 26:554-559, 1954)
  • The "cocktail party effect," the ability to focus one's listening attention on a single talker among a cacophony of conversations and background noise, has been recognized for some time. This specialized listening ability may be due to characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.

34
Research Goal
  • Design a preprocessor built on digital-signal-processing speech enhancement algorithms. The input signals are collected through a multiple-sensor (microphone) array; after the embedded signal processing algorithms run, clearly separated signals appear at the output.

35
[Diagram: Audio Input → Blind Source Separation Algorithms → Enhanced Output.]
36
2. Problem Statement of Blind Source Separation
What is Blind Source Separation?
37
Formulation of Blind Source Separation Problem
  • A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t)S(t), where
    X(t) = [x_1(t), x_2(t), ..., x_n(t)]^T and S(t) = [s_1(t), s_2(t), ..., s_m(t)]^T.
  • We need to estimate a separator W(t) such that Y(t) = W(t)X(t) ≈ S(t) (up to permutation and scaling), where
    Y(t) = [y_1(t), y_2(t), ..., y_m(t)]^T.
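For the instantaneous (memoryless) special case of this formulation, the model reduces to X = AS with a constant mixing matrix A. The toy sketch below (the matrix A is made up; in real BSS it is unknown and W must be found blindly) only verifies the model and the role of the separator:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(2, 1000))   # two independent sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                   # hypothetical mixing matrix
X = A @ S                                    # sensor observations, X = A S
W = np.linalg.inv(A)                         # ideal separator (W A = I)
Y = W @ X                                    # recovered sources
```

The "blind" problem is exactly that `np.linalg.inv(A)` is unavailable: W must be adapted from X alone, using independence criteria such as those on the following slides, and is then only identifiable up to permutation and scaling of the rows.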

38
3. Principles of Blind Source Separation
The independence measure: Shannon's mutual information.
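Writing the measure out (notation assumed, since the slide's formula is not transcribed): for separator outputs y_1, ..., y_m,

```latex
I(y_1, \dots, y_m) \;=\; \sum_{i=1}^{m} H(y_i) \;-\; H(y_1, \dots, y_m) \;\ge\; 0
```

with equality if and only if the outputs are statistically independent; BSS therefore adapts W to drive the mutual information of its outputs toward zero.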
39
4. Criteria to Separate Independent Sources
  • Constrained Entropy (Wu, IJCNN'99)
  • Hadamard Measure (Wu, ICA'99)
  • Frobenius Norm (Wu, NNSP'97)
  • Quadratic Gaussianity (Wu, NNSP'99)

40
5. Application to Blind Single-Channel Equalization for Digital Communication Systems
  • We apply the minimization of a modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, ...] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t), passing through the digital channel
    h(t) = [c(t, 0.11) + 0.8 c(t-1, 0.11) - 0.4 c(t-3, 0.11)] W_6T(t),
    where c(t, b) = sinc(t) cos(πbt) / (1 - (2bt)^2) is the raised-cosine function with roll-off factor b, and W_6T(t) is a rectangular window of width 6T. The input signal to the equalizer is x(t) = h(t) * s(t) + n(t), where n(t) is the background noise. We applied generalized anti-Hebbian learning to adapt w(t) such that the combined response w(t) * h(t) approximates a delayed unit impulse.
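The channel setup above can be sketched numerically. Two assumptions to flag: the slide's generalized anti-Hebbian update is not reproduced here, so a trained LMS rule stands in for it, and a binary (±1) PAM constellation is assumed since the slide's symbol set is not transcribed.

```python
import numpy as np

def raised_cosine(t, b):
    # c(t, b) = sinc(t) * cos(pi*b*t) / (1 - (2*b*t)^2), with the removable
    # singularity at 2*b*t = +/-1 guarded numerically.
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * b * t) ** 2
    safe = np.where(np.abs(denom) < 1e-8, 1.0, denom)
    return np.sinc(t) * np.cos(np.pi * b * t) / safe

rng = np.random.default_rng(1)
t = np.arange(-3, 4)                          # 6T-wide rectangular window
h = (raised_cosine(t, 0.11) + 0.8 * raised_cosine(t - 1, 0.11)
     - 0.4 * raised_cosine(t - 3, 0.11))      # channel taps from the slide

s = rng.choice([-1.0, 1.0], size=5000)        # binary PAM (assumed)
x = np.convolve(s, h)[:len(s)] + 0.01 * rng.standard_normal(len(s))

# Trained LMS adaptation as a stand-in for the blind anti-Hebbian rule.
L, delay, mu = 21, 13, 1e-3
w = np.zeros(L)
w[10] = 1.0                                   # center-spike initialization
sq_err = []
for n in range(L, len(s)):
    u = x[n - L + 1:n + 1][::-1]              # x[n], x[n-1], ..., x[n-L+1]
    e = s[n - delay] - w @ u                  # error vs. delayed symbol
    w += mu * e * u
    sq_err.append(e * e)
```

Because c(t, b) vanishes at nonzero integer t, the sampled channel reduces to taps [1, 0.8, -0.4] at delays 0, 1, and 3 symbols, and the equalizer's job is to invert that intersymbol interference up to the chosen delay.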

41
[Plot: signal-to-interference ratio (dB) versus signal-to-noise ratio (dB).]
42
[Plot: bit error rate versus signal-to-noise ratio (dB).]
43
6. Simulation and Comparison
  • Simulation results compare our generalized anti-Hebbian learning, the SDIF algorithm, and Lee's Infomax method (Lee, IJCNN'97) over three real recordings downloaded from the Salk Institute, University of California at San Diego.

44
New VR LITE Front End
[Diagram: Blind Source Separation → End-of-speech Detection.]
45
7. Conclusion and Future Research
  • The computational complexity of blind source separation needs to be reduced.
  • Test BSS for EOS detection with microphone arrays of the same kind.
  • Incorporate other array signal processing techniques (e.g., a beamformer) to improve speech detection and recognition.