Advances in WP1 - PowerPoint PPT Presentation

About This Presentation

Title:

Advances in WP1

Slides: 21

Provided by: cvspC

Transcript and Presenter's Notes

Title: Advances in WP1

1
Advances in WP1

Nancy Meeting 6-7 July 2006

www.loquendo.com
2
WP1: Environment & Sensor Robustness
T1.2 Noise Independence

Noise Reduction
Spectral Subtraction (Year 1) and Spectral Attenuation (Year 2)
"Automatic Speech Recognition With a Modified Ephraim-Malah Rule", Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, vol. 13, no. 1, January 2006
Evaluation of HEQ for feature normalization (HEQ study Revision 2)

3
Denoising Techniques for Y2 evaluations (1)
Spectral Attenuation (or spectral weighting) is a
form of audio signal enhancement in which noise
suppression can be viewed as the application of a
suppression rule, or non-negative real-valued
gain Gk, to each bin k of the observed signal
magnitude spectrum, in order to form an estimate
of the original signal magnitude spectrum.
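As a minimal sketch of the idea on this slide, the snippet below applies one non-negative gain per frequency bin to an observed magnitude spectrum. The function names and the simple Wiener-like rule used to produce the gains are illustrative assumptions, not the rule evaluated in the presentation.

```python
def apply_suppression_rule(noisy_magnitude, gains):
    """Scale each bin k of the observed magnitude spectrum by its gain G_k."""
    if len(noisy_magnitude) != len(gains):
        raise ValueError("one gain per frequency bin is required")
    if any(g < 0.0 for g in gains):
        raise ValueError("gains must be non-negative")
    return [g * y for g, y in zip(gains, noisy_magnitude)]


def wiener_like_gain(noisy_magnitude, noise_magnitude, floor=0.1):
    """Illustrative rule only (assumption): G_k = max(1 - |D_k|^2 / |Y_k|^2, floor)."""
    gains = []
    for y, d in zip(noisy_magnitude, noise_magnitude):
        snr_term = 1.0 - (d * d) / (y * y) if y > 0.0 else 0.0
        gains.append(max(snr_term, floor))
    return gains


# Toy three-bin frame: observed magnitudes and a flat noise estimate.
noisy = [4.0, 2.0, 1.0]
noise = [1.0, 1.0, 1.0]
gains = wiener_like_gain(noisy, noise)
clean_estimate = apply_suppression_rule(noisy, gains)
```

Any other suppression rule can be swapped in for `wiener_like_gain` as long as it returns one non-negative gain per bin.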

4
Denoising Techniques for Y2 evaluations (2)
We propose to make the estimation of the a priori and the a posteriori SNR dependent on the noise overestimation factor α(m) and the spectral floor β(m). (The equations are not preserved in this transcript.)
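Since the slide's equations are missing, the following is only one plausible reading, not the authors' formula. It assumes the standard decision-directed a priori SNR estimate, with the overestimation factor scaling the noise power and the spectral floor acting as a lower bound on the a priori SNR. The names `alpha`, `beta` and `smoothing` are hypothetical, and the noise power is assumed to be strictly positive.

```python
def snr_estimates(noisy_power, noise_power, prev_clean_power,
                  alpha=1.0, beta=0.01, smoothing=0.98):
    """Per-bin a posteriori and a priori SNR for one frame.

    ASSUMPTIONS (not taken from the slides): alpha multiplies the noise
    power estimate, beta lower-bounds the a priori SNR, and the a priori
    SNR follows the usual decision-directed recursion.
    """
    posteriori, priori = [], []
    for y2, d2, a2 in zip(noisy_power, noise_power, prev_clean_power):
        scaled_noise = alpha * d2                      # overestimated noise power
        gamma = y2 / scaled_noise                      # a posteriori SNR
        xi = (smoothing * (a2 / scaled_noise)
              + (1.0 - smoothing) * max(gamma - 1.0, 0.0))
        posteriori.append(gamma)
        priori.append(max(xi, beta))                   # apply the floor
    return posteriori, priori


# Two bins: one with speech energy, one where the floor takes over.
gamma, xi = snr_estimates([4.0, 1.0], [1.0, 1.0], [2.0, 0.0],
                          alpha=2.0, beta=0.01, smoothing=0.5)
```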
5
Denoising Techniques for Y2 evaluations (3)
The noise spectrum amplitude is obtained by a first-order recursion in conjunction with an energy-based Voice Activity Detector (VAD). In that recursion, one parameter controls the update speed (set to 0.9), a second parameter controls the allowed dynamics of the noise (set to 4.0), and the noise standard deviation σ(m) has its own estimator. (The equations and the symbols of the two parameters are not preserved in this transcript.)
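The sketch below shows one way such a VAD-gated first-order recursion could look. How exactly the VAD decision and the dynamics limit gate the update is an assumption here, as are the parameter names `update_speed` and `dynamics`. Only the values 0.9 and 4.0 come from the slide.

```python
def update_noise_magnitude(prev_noise, noisy_magnitude, is_speech,
                           noise_std, update_speed=0.9, dynamics=4.0):
    """Advance a per-bin noise magnitude estimate by one frame.

    ASSUMPTION: the estimate is frozen while the VAD reports speech, and a
    bin is updated only if the new observation stays within
    dynamics * noise_std of the previous estimate.
    """
    if is_speech:
        return list(prev_noise)
    updated = []
    for n_prev, y in zip(prev_noise, noisy_magnitude):
        if abs(y - n_prev) <= dynamics * noise_std:
            # First-order recursion: mostly keep the old value, blend in the new one.
            updated.append(update_speed * n_prev + (1.0 - update_speed) * y)
        else:
            updated.append(n_prev)
    return updated


previous = [1.0, 1.0]
observed = [2.0, 10.0]
during_pause = update_noise_magnitude(previous, observed, is_speech=False, noise_std=0.5)
during_speech = update_noise_magnitude(previous, observed, is_speech=True, noise_std=0.5)
```

In this toy frame the first bin moves slightly toward the observation, the second bin is rejected as too large a jump, and nothing changes while speech is detected.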
6
Baseline evaluations of Loquendo ASR on Aurora2
speech databases
7
Year 1 and 2 Performance evaluations
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
        Test A                    Test B                    Test C                    A-B-C Avg
Models  Clean        Multi        Clean        Multi        Clean        Multi        Clean        Multi
ND      24.4         6.5          22.5         8.9          24.7         9.8          23.7         8.1
WM      16.0 (34.4)  6.1 (6.1)    15.6 (30.7)  7.9 (11.2)   16.7 (32.4)  9.5 (3.0)    16.0 (32.5)  7.5 (7.4)
EMM     14.7 (39.7)  6.0 (7.7)    15.8 (29.8)  8.0 (10.1)   15.2 (38.5)  8.9 (9.2)    15.2 (35.9)  7.4 (8.6)
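The figures in parentheses appear to be relative error reductions (in percent) with respect to the ND row of the same column. The slide does not state this, but the arithmetic matches, as the following small check shows. The function name is an illustrative choice.

```python
def relative_error_reduction(baseline_error, system_error):
    """Percentage by which system_error improves on baseline_error."""
    return 100.0 * (baseline_error - system_error) / baseline_error


# Test A, clean models: ND = 24.4, WM = 16.0.
wm_test_a_clean = round(relative_error_reduction(24.4, 16.0), 1)
# Test B, clean models: ND = 22.5, WM = 15.6.
wm_test_b_clean = round(relative_error_reduction(22.5, 15.6), 1)
```

Some entries differ from this computation in the last digit, presumably because the slide values were derived from unrounded error rates.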
8
Baseline evaluations of Loquendo ASR on Aurora3
speech databases
9
Year 1 and 2 Performance evaluations
        Ita WM        Ita HM        Spa WM        Spa HM
ND      1.8           53.4          2.7           25.4
WM      1.7 (5.5)     22.5 (57.9)   2.4 (11.1)    10.1 (60.2)
EMM     1.6 (11.1)    17.8 (66.7)   2.3 (14.8)    11.5 (54.7)
10
Baseline evaluations of Loquendo ASR on Aurora4
speech databases
11
Year 1 and 2 Performance evaluations
CLEAN Models
        Clean          Car            Babble         Restaurant     Street         Airport        Train Station  Noise avg.
ND      14.8           45.7           76.9           70.6           66.0           70.7           67.7           66.3
WM      14.8 (0.0)     33.0 (27.8)    63.4 (17.5)    69.3 (1.8)     56.9 (13.8)    68.1 (3.7)     51.2 (24.4)    57.0 (14.0)
EMM     14.5 (2.02)    29.6 (35.2)    62.9 (18.2)    68.4 (3.1)     54.2 (17.8)    68.4 (3.2)     46.3 (31.6)    55.0 (17.0)
12
Year 1 and 2 Performance evaluations
MULTI Models
        Clean          Car            Babble         Restaurant     Street         Airport        Train Station  Noise avg.
ND      15.7           24.8           40.1           41.8           41.9           39.1           42.3           38.3
WM      16.6 (-5.7)    24.1 (2.8)     39.7 (1.0)     43.2 (-3.3)    39.6 (5.5)     39.5 (-1.0)    37.1 (12.3)    37.2 (2.9)
EMM     15.5 (1.3)     24.7 (0.4)     40.4 (-0.7)    44.2 (-5.7)    39.5 (5.7)     40.4 (-3.3)    38.2 (9.7)     37.9 (1.0)
13
HEQ Denoising techniques
14
HEQ Evaluation Revision 1 (1) (Loquendo, UGR)
Problems:
(1) Context dependency (whole-utterance CDF estimation works best)
(2) High variability in background noise segments
15
HEQ Integration Revision 1 (2) (Loquendo, UGR)
Phoneme-based Models
Feature Normalization (frame level, 39 coefficients)
Denoise (Power Spectrum level)
AURORA3 ITA - HM
          SA      WA      WI      WD      WS
Loquendo  46.6    77.5    4.8     7.2     10.4
HEQ121    38.2    69.6    4.3     12.6    13.5
HEQ121    37.9    69.1    3.5     13.8    13.5
HEQ1001   46.5    77.7    4.0     7.3     11.0
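The slide does not expand the column abbreviations. Assuming SA, WA, WI, WD and WS stand for sentence accuracy, word accuracy, and word insertions, deletions and substitutions (all in percent), the WA column can be cross-checked against the three error columns. The helper below is an illustrative sketch of that check.

```python
def word_accuracy(insertions, deletions, substitutions):
    """Word accuracy (%) implied by the three word error components (%)."""
    return 100.0 - insertions - deletions - substitutions


# (WA, WI, WD, WS) copied from the table rows above, in order.
rows = [
    (77.5, 4.8, 7.2, 10.4),
    (69.6, 4.3, 12.6, 13.5),
    (69.1, 3.5, 13.8, 13.5),
    (77.7, 4.0, 7.3, 11.0),
]
# Allow for rounding of each reported figure to one decimal place.
consistent = [abs(word_accuracy(wi, wd, ws) - wa) <= 0.11 for wa, wi, wd, ws in rows]
```

Under that reading every row agrees to within rounding, which supports the assumed meaning of the abbreviations.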
16
HEQ Evaluation Revision 2 (3) (Loquendo, UGR)
E+12CEP, ΔE+12ΔCEP, ΔΔE+12ΔΔCEP (39 coefficients)
HEQ (1573) applied to each of the three groups
Benefits:
(1) Relations in magnitude and dynamics among coefficients are preserved
(2) More stable CDF estimation, similar to extending the HEQ temporal window
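To illustrate why pooling coefficients stabilizes the CDF estimate, here is a minimal sketch of histogram equalization that estimates one empirical CDF from every value in a temporal window and maps the center frame onto a reference distribution. The standard normal reference, the rank-based CDF and the function name are assumptions. The slide only gives the figure 1573, which is consistent with 13 coefficients pooled over a 121-frame window (13 × 121 = 1573).

```python
import bisect
from statistics import NormalDist


def equalize_center_frame(window, reference=NormalDist()):
    """Equalize the center frame of `window` (a list of frames, each a list
    of coefficients) using one CDF pooled over all frames and coefficients.

    ASSUMPTION: the target distribution is a standard normal.
    """
    pooled = sorted(value for frame in window for value in frame)
    count = len(pooled)
    center = window[len(window) // 2]
    equalized = []
    for value in center:
        rank = bisect.bisect_right(pooled, value)   # pooled values <= value
        cdf = (rank - 0.5) / count                  # stays strictly inside (0, 1)
        equalized.append(reference.inv_cdf(cdf))
    return equalized


# Synthetic 121-frame window of 13 coefficients each (toy data, not speech).
window = [[((13 * t + c) % 97) * 0.1 + c for c in range(13)] for t in range(121)]
pooled_size = sum(len(frame) for frame in window)
equalized = equalize_center_frame(window)
```

Because a single monotonic mapping is shared by all pooled coefficients, their relative ordering within the frame is kept, which is in line with the first benefit listed above.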
17
HEQ Evaluation Revision 2 (4) (Loquendo, UGR)
AURORA3 ITA - HM
           SA      WA      WI      WD      WS
WM         46.6    77.5    4.8     7.2     10.4
HEQ121     47.9    77.7    5.1     6.7     10.5
HEQ241     49.7    79.7    4.3     6.6     9.3
WM+HEQ121  49.0    79.2    5.1     5.7     10.0
WM+HEQ241  50.8    79.8    4.6     6.1     9.4
18
HEQ for denoising (5) (Loquendo, UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the
same clean and noisy signal
19
HEQ for signal level equalization (6) (Loquendo, UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same
clean signal at normal gain level and at
low gain level
20
WP1 Workplan

Selection of suitable benchmark databases (m6)
Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
Discriminative VAD (training and AURORA3 testing) (m16)
Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
Preliminary results on spectral subtraction and HEQ techniques (m24)
Integration of denoising and normalization techniques (m33)
Noise estimation and reduction for non-stationary noises (m33)