Title: HIWIRE MEETING Granada, June 9-10, 2005
- JOSÉ C. SEGURA, LUZ GARCÍA
- JAVIER RAMÍREZ
2 Schedule
- Non-linear feature normalization
  - ECDF segmental implementation
  - Progressive equalization
  - 2-class normalization
- Non-linear speaker adaptation/independence
  - Non-linear feature normalization
  - Non-linear model adaptation
- VAD and technique combination
  - MO-LRT
  - Bi-spectrum based VAD
  - Combined Front-End
4 ECDF-based nonlinear transformation (1)
- CDF-matching nonlinear transformation
- In previous work we modeled CDFs using histograms
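As a minimal sketch of histogram-based CDF matching, the function below estimates the CDF of one feature from a histogram and maps each value through the inverse CDF of a reference distribution. The standard Gaussian reference, the bin count, and the name `heq_histogram` are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np
from statistics import NormalDist

def heq_histogram(x, n_bins=100):
    """Map the values in x so that their CDF roughly matches a standard Gaussian."""
    x = np.asarray(x, dtype=float)
    # Histogram estimate of the CDF, known exactly at the bin edges.
    counts, edges = np.histogram(x, bins=n_bins)
    cdf_at_edges = np.concatenate(([0.0], np.cumsum(counts))) / x.size
    # Linear interpolation gives the CDF value of every sample.
    c = np.interp(x, edges, cdf_at_edges)
    # Keep probabilities strictly inside (0, 1) before inverting.
    c = np.clip(c, 0.5 / x.size, 1.0 - 0.5 / x.size)
    # Inverse CDF of the reference (standard Gaussian: an assumption).
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in c])
```

The mapping is monotonic, so the ordering of the feature values is preserved while their distribution is reshaped.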
5 ECDF-based nonlinear transformation (2)
- An alternative algorithm based on order statistics
- It is faster: it only requires sorting and table indexing
- Results are almost equal to those obtained with histograms
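The order-statistics idea can be sketched as follows: sort the values, take the rank r of each sample, use (r - 0.5)/N as the ECDF estimate, and look up the reference inverse CDF in a precomputed table. The Gaussian reference and the function name are assumptions of this sketch.

```python
import numpy as np
from statistics import NormalDist

def heq_order_statistics(x):
    """Rank-based equalization: only sorting and table indexing are needed."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Rank (1..n) of each sample, obtained from a single sort.
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    # Table of reference quantiles at the ECDF points (r - 0.5) / n.
    # Standard Gaussian reference: an assumption of this sketch.
    inv = NormalDist().inv_cdf
    table = np.array([inv((r - 0.5) / n) for r in range(1, n + 1)])
    # Table indexing replaces the histogram lookup.
    return table[ranks - 1]
```

For a fixed N the table depends only on the reference distribution, so it can be computed once and reused.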
6 ECDF Segmental implementation
- Based on a sliding window
- José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio, Javier Ramírez, "Cepstral domain segmental nonlinear feature transformations for robust speech recognition," IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
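A sliding-window version can be sketched as below: each frame is equalized using only the frames inside a window centred on it, so the algorithmic delay equals the half window length. The parameter `half_window`, the truncation at utterance edges, and the Gaussian reference are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def segmental_heq(x, half_window=60):
    """Equalize each frame using the 2*half_window + 1 frames around it."""
    x = np.asarray(x, dtype=float)
    inv = NormalDist().inv_cdf
    y = np.empty_like(x)
    for t in range(x.size):
        # Window centred on frame t, truncated at the utterance edges.
        lo = max(0, t - half_window)
        hi = min(x.size, t + half_window + 1)
        window = x[lo:hi]
        # Rank of the centre value inside the window (1..window size).
        r = np.sum(window < x[t]) + 1
        # ECDF estimate followed by the reference inverse CDF (Gaussian assumed).
        y[t] = inv((r - 0.5) / window.size)
    return y
```

Because only ranks inside the window are used, the output is unchanged by any positive scaling or offset of the input feature.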
7 Progressive normalization
- Not all MFCCs offer equal discrimination
- HEQ introduces a certain distortion
- Normalizing only up to a certain MFCC gives the best performance
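Progressive normalization can be illustrated with a short sketch that equalizes only the first coefficients of a feature matrix and leaves the rest untouched. The column layout (one column per coefficient, in increasing order), the rank-based equalizer, and the names `equalize_column`, `progressive_heq` and `last_coeff` are assumptions, not the authors' implementation.

```python
import numpy as np
from statistics import NormalDist

def equalize_column(x):
    """Rank-based Gaussianization of one feature stream."""
    n = x.size
    # Double argsort yields the rank (1..n) of every sample.
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    inv = NormalDist().inv_cdf
    return np.array([inv((r - 0.5) / n) for r in ranks])

def progressive_heq(features, last_coeff):
    """Equalize columns 0..last_coeff of a (frames x coefficients) matrix."""
    out = np.array(features, dtype=float)  # copy, the input is not modified
    for k in range(last_coeff + 1):
        out[:, k] = equalize_column(out[:, k])
    return out
```

Sweeping `last_coeff` over the available coefficients and scoring each setting on a development set is one way to find the cut-off that performs best.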
8 ECDF-based normalization results
9 2-class normalization (1)
- A first approach to parametric non-linear equalization
- PDFs are modeled as two-class Gaussian mixtures for each MFCC
- Actually, we use speech-like and noise-like classes
- EM is used on each sentence to obtain the Gaussian classes
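A minimal sketch of the per-sentence EM step for one MFCC is given below. The median-based initialisation, the fixed iteration count, the variance floor, and the function name are assumptions chosen to keep the example short.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture; returns weights, means, variances."""
    x = np.asarray(x, dtype=float)
    # Initialisation (an assumption): split the sentence at its median.
    med = np.median(x)
    low, high = x[x <= med], x[x > med]
    mu = np.array([low.mean(), high.mean()])
    var = np.array([low.var(), high.var()]) + 1e-6
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each class for every frame.
        lik = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        post = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = post.sum(axis=0)
        w = nk / x.size
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var
```

With this initialisation the first component tends to capture the lower-valued class and the second the higher-valued one; which of them is noise-like or speech-like depends on the coefficient.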
10 2-class normalization (2)
- Nonlinear parametric transformation
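The slide text does not spell out the transformation, so the sketch below shows only one plausible form, assumed for illustration: each frame is linearly rescaled towards reference statistics separately for each class, and the two results are blended with the class posteriors. Because the posteriors depend on the input value, the overall mapping is nonlinear. The names `two_class_equalize`, `ref_mu` and `ref_var` are hypothetical.

```python
import numpy as np

def two_class_equalize(x, w, mu, var, ref_mu, ref_var):
    """Posterior-weighted blend of two per-class linear mappings (assumed form)."""
    x = np.asarray(x, dtype=float)[:, None]
    w, mu, var = (np.asarray(a, dtype=float) for a in (w, mu, var))
    ref_mu = np.asarray(ref_mu, dtype=float)
    ref_var = np.asarray(ref_var, dtype=float)
    # Class posteriors under the two-Gaussian model of the sentence.
    lik = w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    post = lik / lik.sum(axis=1, keepdims=True)
    # Per-class linear map onto the reference mean and variance.
    mapped = ref_mu + (x - mu) * np.sqrt(ref_var / var)
    # Blend the two mapped values with the posteriors.
    return (post * mapped).sum(axis=1)
```

When the sentence statistics already equal the reference statistics, the mapping reduces to the identity.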
11 2-class normalization results
13 ECDF Features Normalization
- HEQ as a non-linear speaker normalization technique using ECDF
14 ECDF Norm. for SA
            Test01 WER (%)   Test01 WAcc (%)
MLLR        10.97            89.03
BASELINE    13.22            86.78
AFE         12.74            87.26
ECDF        11.23            88.77
15 ECDF Models Adaptation
- Two approaches
  - Pure equalization (HEQ MOD)
    - New Gaussian distributions
      - Shift on the means: $X \rightarrow X_{HEQ}$
      - Scale factor on the variances
  - Equalization mixed with a linear transformation (HEQ PLIN)
    - Linear transformation: $X_A = M X + B$
    - $M$, $B$ such that $D(X_A, X_{HEQ}) = \lVert M X + B - X_{HEQ} \rVert^2$ is minimum
Diagram labels: Speaker Specific Features, Speaker Independent Features
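The linear transformation of the second approach is an ordinary least-squares problem: find M and B so that M X + B is as close as possible to the equalized features. A minimal sketch follows, assuming frames are stored as rows and using a hypothetical function name.

```python
import numpy as np

def fit_linear_to_heq(X, X_heq):
    """Return M, B minimising the sum over frames of ||M x + B - x_heq||^2."""
    X = np.asarray(X, dtype=float)          # (frames, dims), original features
    X_heq = np.asarray(X_heq, dtype=float)  # (frames, dims), equalized features
    # Append a column of ones so that B is estimated jointly with M.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X1, X_heq, rcond=None)
    # Rows of W map row-vector frames, so M is the transpose of the top block.
    M, B = W[:-1].T, W[-1]
    return M, B
```

The adapted features are then obtained as `X @ M.T + B`.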
16 Models Adaptation
            Test01 WER (%)   Test01 WAcc (%)
MLLR        10.97            89.03
BASELINE    13.22            86.78
HEQ MOD     12.95            87.05
HEQ PLIN    13.31            86.52
17 SA methods: comparison
18 Future Work 1/2
- SA models using MLLR are not robust against noise
- Feature Normalization + MLLR
19 Future Work 2/2
- Non-linear Feature Normalization and Model Adaptation
- Development of further experiments with more complex tasks on the WSJ1 database (spoke3 and spoke4)
21 Previous work on VAD
- Voice activity detection
  - Kullback-Leibler divergence
    - J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "A New Kullback-Leibler VAD for Robust Speech Recognition," IEEE Signal Processing Letters, Vol. 11, No. 2, pp. 666-669, Feb. 2004
  - Long-term spectral divergence
    - J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information," Speech Communication, Vol. 42/3-4, pp. 271-287, 2004
  - Subband SNR estimation using OS filters
    - J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition," to appear in IEEE Transactions on Speech and Audio Processing, 2005/2006
  - Multiple observation likelihood ratio test
    - J. Ramírez, J. C. Segura, C. Benítez, L. García, A. Rubio, "Statistical Voice Activity Detection using a Multiple Observation Likelihood Ratio Test," to appear in IEEE Signal Processing Letters
22 Likelihood ratio test
- Generalization of Sohn's VAD
  - J. Sohn, N. S. Kim, W. Sung, "A statistical model-based voice activity detection," IEEE Signal Processing Letters, vol. 6 (1), pp. 1-3, 1999.
- Two hypotheses are considered
  - $H_0$: $y = n$ (absence of speech, silence)
  - $H_1$: $y = s + n$ (speech presence)
- Optimum decision rule (Bayes classifier)
- $l$-frame observation vector
- LRT evaluation → adequate signal model
LRT: likelihood ratio test
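For a single frame, the likelihood ratio can be sketched under the complex Gaussian model used in Sohn's VAD, where each DFT bin has variance λ_n under H0 and λ_n + λ_s under H1. The function names, the averaging over bins, and the zero threshold are assumptions for illustration; the variances are taken as given rather than estimated.

```python
import numpy as np

def frame_log_lrt(power_spectrum, noise_var, speech_var):
    """Mean log-likelihood ratio over frequency bins for one frame."""
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    speech_var = np.asarray(speech_var, dtype=float)
    gamma = power_spectrum / noise_var  # a posteriori SNR per bin
    xi = speech_var / noise_var         # a priori SNR per bin
    # log of p(Y|H1) / p(Y|H0) for zero-mean complex Gaussian bins.
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    return float(log_lr.mean())

def is_speech(power_spectrum, noise_var, speech_var, threshold=0.0):
    """Decide H1 when the log-LRT exceeds a threshold (hypothetical value)."""
    return frame_log_lrt(power_spectrum, noise_var, speech_var) > threshold
```

In practice the threshold trades speech misses against false alarms and would be tuned for the target operating point.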
23 Multiple observation likelihood ratio test
- MO-LRT (multiple observation LRT)
- Given a set of $N = 2m+1$ consecutive observations
- LRT
- Under statistical independence
- Recursive Log-LRT
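Under statistical independence the joint likelihood ratio of the N = 2m+1 frames factorises, so its logarithm is the sum of the per-frame log-LRTs, and moving the window by one frame only requires adding the newest term and subtracting the oldest. A minimal sketch, with a hypothetical function name and NaN used where the window does not fit:

```python
import numpy as np

def mo_lrt(frame_log_lr, m):
    """Sum of per-frame log-LRTs over the 2m+1 frames centred on each frame."""
    llr = np.asarray(frame_log_lr, dtype=float)
    n_frames = llr.size
    out = np.full(n_frames, np.nan)  # undefined near the edges
    if n_frames < 2 * m + 1:
        return out
    # First complete window, centred on frame m.
    acc = llr[: 2 * m + 1].sum()
    out[m] = acc
    # Recursive update: add the newest frame, drop the oldest one.
    for l in range(m + 1, n_frames - m):
        acc += llr[l + m] - llr[l - m - 1]
        out[l] = acc
    return out
```

The decision for frame l needs frames up to l + m, so the detector introduces a delay of m frames.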
24 Analysis: optimum delay
Figure panels: probability distributions; classification errors
- Increasing m (the number of observations)
  - Reduces the overlap between the distributions
- Misclassification errors
  - Reduced for speech, with a moderate increase for non-speech
25 Analysis: optimum delay
- ROC analysis: AURORA 3 Spanish (High-Ch1, 5 dB)
26 Speech recognition experiments
Block diagram labels: Frame dropping (FD), Wiener Filtering (WF), MFCC, HTK, Noise estimation, VAD
AURORA 2: average WAcc (%) for CT and MCT
MO-LRT     G.729   AMR1    AMR2        AFE
86.14      70.32   74.29   82.89       83.29
Ref. VAD   Woo     Li      Marzinzik   Sohn
86.86      81.09   82.11   85.23       83.80
27 Speech recognition experiments
AURORA 3 Spanish SpeechDat-Car, WAcc (%)
          MO-LRT   G.729   AMR1    AMR2    AFE
WM        96.33    88.62   94.65   95.67   95.28
MM        91.61    72.84   80.59   90.91   90.23
HM        87.43    65.50   62.41   85.77   77.53
Average   91.79    75.65   74.33   90.78   87.68

          MO-LRT   Woo     Li      Marzinzik   Sohn
WM        96.33    95.35   91.82   94.29       96.07
MM        91.61    89.30   77.45   89.81       91.64
HM        87.43    83.64   78.52   79.43       84.03
Average   91.79    89.43   82.60   87.84       90.58
28 Work in progress
- Statistical tests in the bispectrum domain
  - J. M. Górriz, et al., "Voice Activity Detection Based on HOS," 8th International Work-Conference on Artificial Neural Networks (IWANN'2005)
  - J. M. Górriz, et al., "Statistical Tests for Voice Activity Detection," Non-linear Speech Processing (NOLISP2005), 2005.
  - J. M. Górriz, et al., "Bispectra analysis-based VAD for robust speech recognition," First International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC2005)
- Bispectrum LRT (application of MO-LRT on the bispectra)
  - J. M. Górriz, et al., "An Improved MO-LRT VAD Based on a Bispectra Gaussian Model," submitted to Electronics Letters.
29 GSTC-UGR speech recognition results
- LTSE VAD
  - J. Ramírez, et al., "Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information," Speech Communication, Vol. 42/3-4, pp. 271-287, 2004
- Segmental ECDF, 60-frame delay
  - J. C. Segura, et al., "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition," IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
- Progressive
  - Log-E up to the 4th cepstral coefficient
30 GSTC-UGR speech recognition results
AURORA 2, WAcc (%)
                                           Set A   Set B   Set C   Average
Multicondition training, GSTC-UGR          90.58   90.23   89.10   90.14
Multicondition training, HIWIRE baseline   88.40   88.96   88.97   88.74
Clean training, GSTC-UGR                   86.01   86.84   85.00   86.14
Clean training, HIWIRE baseline            64.00   69.10   64.73   66.18
WER relative improvements: 12% (MCT), 59% (CT)

AURORA 3, WAcc (%)
                  Italian                 Spanish                 Average
                  WM      MM      HM      WM      MM      HM      WM      MM      HM
GSTC-UGR          96.94   91.89   86.19   96.52   92.03   89.95   96.73   91.96   88.07
HIWIRE baseline   94.40   87.14   46.75   89.30   83.18   65.50   91.85   85.16   56.13
WER relative improvements: 60% (WM), 46% (MM), 73% (HM)
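The relative improvements quoted for this slide follow from the word accuracies if WER is taken as 100 minus WAcc (an assumption of this sketch) and the gain is expressed relative to the baseline WER. The function name is hypothetical.

```python
def relative_wer_improvement(wacc_baseline, wacc_system):
    """Relative WER reduction (%) of a system over a baseline, from WAcc (%)."""
    wer_baseline = 100.0 - wacc_baseline  # assumes WER = 100 - WAcc
    wer_system = 100.0 - wacc_system
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# AURORA 2 averages from the table: multicondition and clean training.
print(round(relative_wer_improvement(88.74, 90.14)))
print(round(relative_wer_improvement(66.18, 86.14)))
```

The same function applied to the AURORA 3 average columns gives the WM, MM and HM figures.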
31 GSTC-UGR speech recognition results
AURORA 4, WER (%) (clean training experiments)
Test              1       2       3       4       5       6       7       Avg.
GSTC-UGR          13.37   19.52   37.53   40.22   39.19   37.16   39.30   32.33
HIWIRE baseline   13.22   24.68   46.00   47.62   52.67   44.79   54.73   40.53

Test              8       9       10      11      12      13      14      Avg.
GSTC-UGR          21.40   30.76   45.49   48.43   50.46   45.30   48.77   41.52
HIWIRE baseline   22.58   36.21   55.40   58.31   65.34   54.11   62.28   50.60
WER relative improvements: 20% (test sets 1-7), 17% (test sets 8-14)