HIWIRE MEETING Granada, June 9-10, 2005

1
HIWIRE MEETING Granada, June 9-10, 2005
  • JOSÉ C. SEGURA, LUZ GARCÍA
  • JAVIER RAMÍREZ

2
Schedule
  • Non-linear feature normalization
      • ECDF segmental implementation
      • Progressive equalization
      • 2-class normalization
  • Non-linear speaker adaptation/independence
      • Non-linear feature normalization
      • Non-linear model adaptation
  • VAD and technique combination
      • MO-LRT
      • Bi-spectrum based VAD
      • Combined Front-End

4
ECDF-based nonlinear transformation (1)
  • CDF-matching nonlinear transformation
  • In previous work we modeled the CDFs using
    histograms

5
ECDF-based nonlinear transformation (2)
  • An alternative algorithm based on order statistics
  • It is faster: it only requires sorting and table
    indexing
  • Results are almost identical to those obtained
    with histograms
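A minimal sketch of an order-statistics equalizer, for illustration only. It assumes a standard Gaussian reference distribution and a rank-based CDF estimate of (rank + 0.5) / T; the function name `equalize_order_stats` and both of these choices are assumptions, not taken from the slides.

```python
from statistics import NormalDist


def equalize_order_stats(values, reference=NormalDist(0.0, 1.0)):
    """Map each value to the reference quantile of its rank-based ECDF."""
    n = len(values)
    # Sorting the indices by value gives the order statistics;
    # the position of a sample in this ordering is its 0-based rank.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # ECDF estimate at the sample; the 0.5 offset keeps it inside (0, 1).
        cdf = (rank + 0.5) / n
        # "Table indexing" step: look up the reference quantile.
        out[i] = reference.inv_cdf(cdf)
    return out


if __name__ == "__main__":
    track = [3.2, -1.0, 0.4, 7.5, 2.1]  # one cepstral coefficient over time
    print(equalize_order_stats(track))
```

No histogram has to be built: the only costs are one sort and one quantile lookup per frame.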

6
ECDF Segmental implementation
  • Based on a sliding window
  • José C. Segura, M. Carmen Benítez, Ángel de la
    Torre, Antonio J. Rubio, Javier Ramírez, "Cepstral
    domain segmental nonlinear feature
    transformations for robust speech recognition,"
    IEEE Signal Processing Letters, Vol. 11, No. 5,
    pp. 517-520, 2004
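The sliding-window idea can be sketched as follows. This is an assumption-laden illustration, not the published algorithm: the window is centred on the current frame, truncated at the utterance edges, ties are split evenly, and the reference is a standard Gaussian. The name `segmental_equalize` and the parameter `half_window` are hypothetical.

```python
from statistics import NormalDist


def segmental_equalize(values, half_window=60, reference=NormalDist()):
    """Equalize each frame using only the frames in a window around it."""
    n = len(values)
    out = []
    for t in range(n):
        # Window of up to 2*half_window + 1 frames, truncated at the edges.
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = values[lo:hi]
        # Rank of the centre frame inside its own window.
        smaller = sum(1 for v in window if v < values[t])
        ties = sum(1 for v in window if v == values[t])  # includes the frame itself
        cdf = (smaller + 0.5 * ties) / len(window)
        out.append(reference.inv_cdf(cdf))
    return out


if __name__ == "__main__":
    print(segmental_equalize([0.3, 1.2, -0.7, 2.5, 0.1, 0.9], half_window=2))
```

With a centred window the algorithmic delay equals `half_window` frames, since the frame at time t cannot be transformed until frame t + `half_window` has arrived.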

7
Progressive normalization
  • Not all MFCCs are equally discriminative
  • HEQ introduces some distortion
  • Normalizing only up to a certain MFCC gives the
    best performance
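One way to picture this is a wrapper that equalizes only the low-order coefficient tracks and passes the rest through. This is a hedged sketch: `progressive_normalize`, `last_coeff` and the stand-in equalizer `remove_mean` are hypothetical names, and `remove_mean` is used only to keep the demo short (it is not HEQ).

```python
def progressive_normalize(frames, equalize, last_coeff):
    """Equalize coefficient tracks 0..last_coeff; leave higher ones unchanged."""
    n_coeffs = len(frames[0])
    # Transpose to one time series ("track") per coefficient.
    tracks = [[frame[k] for frame in frames] for k in range(n_coeffs)]
    for k in range(min(last_coeff + 1, n_coeffs)):
        tracks[k] = equalize(tracks[k])
    # Transpose back to frame-major order.
    return [[tracks[k][t] for k in range(n_coeffs)] for t in range(len(frames))]


def remove_mean(track):
    # Stand-in equalizer for the demo only; any per-track transform fits here.
    mean = sum(track) / len(track)
    return [v - mean for v in track]


if __name__ == "__main__":
    frames = [[1.0, 10.0, 5.0], [3.0, 20.0, 7.0]]
    print(progressive_normalize(frames, remove_mean, last_coeff=0))
```

The cut-off index is a tuning parameter: it trades the robustness gained on the low-order coefficients against the distortion added to the higher ones.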

8
ECDF-based normalization results
9
2-class normalization (1)
  • A first approach to parametric non-linear
    equalization
  • PDFs are modeled as two-class Gaussian mixtures
    for each MFCC
  • Currently the two classes are speech-like and
    noise-like
  • EM is run on each sentence to estimate the
    Gaussian classes
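The per-sentence EM step can be sketched for one coefficient as below. This shows only the mixture fit, not the class-based transformation that follows it. The initialisation (splitting the sorted samples in half), the iteration count and the variance floor are assumptions, and `fit_two_gaussians` is a hypothetical name.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(x, n_iter=50, var_floor=1e-6):
    """EM for a 1-D two-Gaussian mixture; returns (weights, means, variances)."""
    xs = sorted(x)
    half = len(xs) // 2
    # Initialisation (assumption): lower half -> class 0, upper half -> class 1.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand_mean = sum(xs) / len(xs)
    grand_var = max(sum((v - grand_mean) ** 2 for v in xs) / len(xs), var_floor)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for every frame.
        post1 = []
        for v in x:
            p0 = weights[0] * gauss_pdf(v, means[0], variances[0])
            p1 = weights[1] * gauss_pdf(v, means[1], variances[1])
            total = p0 + p1
            post1.append(p1 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, mean and variance of each class.
        for c in (0, 1):
            resp = [p if c == 1 else 1.0 - p for p in post1]
            mass = sum(resp)
            if mass <= 0.0:
                continue
            weights[c] = mass / len(x)
            means[c] = sum(r * v for r, v in zip(resp, x)) / mass
            var = sum(r * (v - means[c]) ** 2 for r, v in zip(resp, x)) / mass
            variances[c] = max(var, var_floor)
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.05, -0.05, 10.0, 10.1, 9.9, 10.05, 9.95]
    print(fit_two_gaussians(data))
```

With this initialisation class 0 ends up as the lower-valued class and class 1 as the higher-valued one; which of them is noise-like or speech-like depends on the coefficient being modeled.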

10
2-class normalization (2)
Nonlinear parametric transformation
11
2-class normalization results
12
Schedule
  • Non-linear feature normalization
      • ECDF segmental implementation
      • Progressive equalization
      • 2-class normalization
  • Non-linear speaker adaptation/independence
      • Non-linear feature normalization
      • Non-linear model adaptation
  • VAD and technique combination
      • MO-LRT
      • Bi-spectrum based VAD
      • Combined Front-End

13
ECDF Features Normalization
  • HEQ as a non-linear speaker normalization
    technique using ECDF

14
ECDF Norm. for SA
            Test01 WER (%)   Test01 WAcc (%)
MLLR        10.97            89.03
BASELINE    13.22            86.78
AFE         12.74            87.26
ECDF        11.23            88.77
15
ECDF Models Adaptation
  • Two approaches
  • Pure equalization (HEQ MOD)
      • New Gaussian distributions
      • Shift on the means: $X \to X_{HEQ}$
      • Scale factor on the variances
  • Equalization mixed with a linear transformation
    (HEQ PLIN)
      • Linear transformation: $X_A = M X + B$
      • $M$, $B$ such that
        $D(X_A, X_{HEQ}) = \| M X + B - X_{HEQ} \|^2$
        is minimum

Speaker Specific Features
Speaker Independent Features
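For the linear variant, the minimisation of $\| M X + B - X_{HEQ} \|^2$ has a closed-form least-squares solution. The sketch below treats a single coefficient, so M and B reduce to scalars m and b (a diagonal-transform assumption that the slides do not state); `fit_linear_to_heq` is a hypothetical name.

```python
def fit_linear_to_heq(x, x_heq):
    """Least-squares m, b minimising sum((m*x + b - x_heq)**2) for one coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(x_heq) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, x_heq))
    # Fall back to a pure shift if the input track is constant.
    m = sxy / sxx if sxx > 0.0 else 1.0
    b = mean_y - m * mean_x
    return m, b


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]         # original features
    x_heq = [1.0, 3.0, 5.0, 7.0]     # equalized targets (toy values)
    print(fit_linear_to_heq(x, x_heq))
```

A full-matrix M would need a multivariate least-squares solve instead of this per-coefficient formula.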
16
Models Adaptation
            Test01 WER (%)   Test01 WAcc (%)
MLLR        10.97            89.03
BASELINE    13.22            86.78
HEQ MOD     12.95            87.05
HEQ PLIN    13.31            86.52
17
SA methods. Comparison
18
Future Work 1/2
  • SA models using MLLR are not robust against noise
  • Feature Normalization + MLLR

19
Future Work 2/2
  • Non-linear feature normalization and model
    adaptation
  • Further experiments on more complex tasks using
    the WSJ1 database (spoke3 and spoke4)

20
Schedule
  • Non-linear feature normalization
      • ECDF segmental implementation
      • Progressive equalization
      • 2-class normalization
  • Non-linear speaker adaptation/independence
      • Non-linear feature normalization
      • Non-linear model adaptation
  • VAD and technique combination
      • MO-LRT
      • Bi-spectrum based VAD
      • Combined Front-End

21
Previous work on VAD
  • Voice activity detection
  • Kullback-Leibler divergence
      • J. Ramírez, J. C. Segura, C. Benítez, A. de la
        Torre, A. Rubio, "A New Kullback-Leibler VAD for
        Robust Speech Recognition," IEEE Signal
        Processing Letters, Vol. 11, No. 2, pp. 666-669,
        Feb. 2004
  • Long-term spectral divergence
      • J. Ramírez, J. C. Segura, C. Benítez, A. de la
        Torre, A. Rubio, "Efficient Voice Activity
        Detection Algorithms Using Long-Term Speech
        Information," Speech Communication, Vol. 42/3-4,
        pp. 271-287, 2004
  • Subband SNR estimation using OS filters
      • J. Ramírez, J. C. Segura, C. Benítez, A. de la
        Torre, A. Rubio, "An Effective Subband OSF-based
        VAD with Noise Reduction for Robust Speech
        Recognition," to appear in IEEE Transactions on
        Speech and Audio Processing, 2005/2006
  • Multiple observation likelihood ratio test
      • J. Ramírez, J. C. Segura, C. Benítez, L. García,
        A. Rubio, "Statistical Voice Activity Detection
        using a Multiple Observation Likelihood Ratio
        Test," to appear in IEEE Signal Processing
        Letters

22
Likelihood ratio test
  • Generalization of Sohn's VAD
      • J. Sohn, N. S. Kim, W. Sung, "A statistical
        model-based voice activity detection," IEEE
        Signal Processing Letters, vol. 6 (1), pp. 1-3,
        1999.
  • Two hypotheses are considered
      • $H_0$: $y = n$ (absence of speech, silence)
      • $H_1$: $y = s + n$ (speech presence)
  • Optimum decision rule (Bayes classifier)
  • l-frame observation vector
  • LRT evaluation requires an adequate signal model

LRT: Likelihood ratio test
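For reference, the generic form of a likelihood ratio test on the two hypotheses can be written as below. The symbols $L$ for the ratio and $\eta$ for the decision threshold are notational assumptions, and the slide's own signal model is not reproduced here.

```latex
L(\mathbf{y}) = \frac{p(\mathbf{y} \mid H_1)}{p(\mathbf{y} \mid H_0)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta
```

Speech is declared when the ratio exceeds the threshold, and silence otherwise.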
23
Multiple observation likelihood ratio test
  • MO-LRT (multiple observation LRT)
  • Given a set of $N = 2m + 1$ consecutive observations
  • LRT
  • Under statistical independence
  • Recursive Log-LRT

24
Analysis: optimum delay
[Figure panels: probability distributions; classification errors]
  • Increasing m (number of observations)
      • Reduces the overlap between the distributions
      • Misclassification errors: reduced for speech,
        with a moderate increase for non-speech

25
Analysis: optimum delay
  • ROC analysis: AURORA 3 Spanish
    (High-Ch1, 5 dB)

26
Speech recognition experiments
[Block diagram: noise estimation, VAD, Wiener filtering (WF), frame dropping (FD), MFCC, HTK]

AURORA 2: average WAcc (%) for CT and MCT

MO-LRT     G.729    AMR1     AMR2        AFE
86.14      70.32    74.29    82.89       83.29

Ref. VAD   Woo      Li       Marzinzik   Sohn
86.86      81.09    82.11    85.23       83.80
27
Speech recognition experiments
AURORA 3 Spanish SpeechDat-Car, WAcc (%)

           MO-LRT   G.729    AMR1     AMR2        AFE
WM         96.33    88.62    94.65    95.67       95.28
MM         91.61    72.84    80.59    90.91       90.23
HM         87.43    65.50    62.41    85.77       77.53
Average    91.79    75.65    74.33    90.78       87.68

           MO-LRT   Woo      Li       Marzinzik   Sohn
WM         96.33    95.35    91.82    94.29       96.07
MM         91.61    89.30    77.45    89.81       91.64
HM         87.43    83.64    78.52    79.43       84.03
Average    91.79    89.43    82.60    87.84       90.58
28
Work in progress
  • Statistical tests in the bispectrum domain
      • J. M. Górriz, et al., "Voice Activity Detection
        Based on HOS," 8th International Work-Conference
        on Artificial Neural Networks (IWANN'2005)
      • J. M. Górriz, et al., "Statistical Tests for
        Voice Activity Detection," Non-linear Speech
        Processing (NOLISP2005), 2005.
      • J. M. Górriz, et al., "Bispectra analysis-based
        VAD for robust speech recognition," First
        International Work-Conference on the Interplay
        Between Natural and Artificial Computation
        (IWINAC2005)
  • Bispectrum LRT (application of MO-LRT to
    the bispectra)
      • J. M. Górriz, et al., "An Improved MO-LRT VAD
        Based on a Bispectra Gaussian Model," submitted
        to Electronics Letters.

29
GSTC-UGR speech recognition results
  • LTSE VAD
      • J. Ramírez, et al., "Efficient Voice Activity
        Detection Algorithms Using Long-Term Speech
        Information," Speech Communication, Vol. 42/3-4,
        pp. 271-287, 2004
  • Segmental ECDF, 60-frame delay
      • J. C. Segura, et al., "Cepstral Domain Segmental
        Nonlinear Feature Transformations for Robust
        Speech Recognition," IEEE Signal Processing
        Letters, Vol. 11, No. 5, pp. 517-520, 2004
  • Progressive
      • Log-E, up to the 4th cepstral coefficient

30
GSTC-UGR speech recognition results
AURORA 2 WAcc (%)                          Set A    Set B    Set C    Average
Multicondition training, GSTC-UGR          90.58    90.23    89.10    90.14
Multicondition training, HIWIRE baseline   88.40    88.96    88.97    88.74
Clean training, GSTC-UGR                   86.01    86.84    85.00    86.14
Clean training, HIWIRE baseline            64.00    69.10    64.73    66.18
WER relative improvements: 12% (MCT), 59% (CT)

AURORA 3 WAcc (%)   Italian                   Spanish                   Average
                    WM      MM      HM        WM      MM      HM        WM      MM      HM
GSTC-UGR            96.94   91.89   86.19     96.52   92.03   89.95     96.73   91.96   88.07
HIWIRE baseline     94.40   87.14   46.75     89.30   83.18   65.50     91.85   85.16   56.13
WER relative improvements: 60% (WM), 46% (MM), 73% (HM)
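The relative improvements quoted on this slide can be reproduced from the average word accuracies, assuming WER = 100 - WAcc. That relation and the helper name `relative_wer_improvement` are assumptions of this sketch.

```python
def relative_wer_improvement(wacc_baseline, wacc_system):
    """Relative WER reduction (%) of a system over a baseline, from WAcc values."""
    wer_base = 100.0 - wacc_baseline
    wer_sys = 100.0 - wacc_system
    return 100.0 * (wer_base - wer_sys) / wer_base


if __name__ == "__main__":
    # AURORA 2 averages: (HIWIRE baseline, GSTC-UGR)
    print("MCT:", round(relative_wer_improvement(88.74, 90.14)))
    print("CT: ", round(relative_wer_improvement(66.18, 86.14)))
    # AURORA 3 averages over Italian and Spanish
    print("WM: ", round(relative_wer_improvement(91.85, 96.73)))
    print("MM: ", round(relative_wer_improvement(85.16, 91.96)))
    print("HM: ", round(relative_wer_improvement(56.13, 88.07)))
```

For clean training, for example, the baseline WER is 100 - 66.18 = 33.82 and the system WER is 100 - 86.14 = 13.86, so the reduction is 19.96 / 33.82, or about 59%.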
31
GSTC-UGR speech recognition results
AURORA 4 WER (%) (clean training experiments)

Test              1       2       3       4       5       6       7       Avg.
GSTC-UGR          13.37   19.52   37.53   40.22   39.19   37.16   39.30   32.33
HIWIRE baseline   13.22   24.68   46.00   47.62   52.67   44.79   54.73   40.53

Test              8       9       10      11      12      13      14      Avg.
GSTC-UGR          21.40   30.76   45.49   48.43   50.46   45.30   48.77   41.52
HIWIRE baseline   22.58   36.21   55.40   58.31   65.34   54.11   62.28   50.60

WER relative improvements: 20% (test sets 1-7), 17% (test sets 8-14)
32
HIWIRE MEETING Granada, June 9-10, 2005
  • JOSÉ C. SEGURA, LUZ GARCÍA
  • JAVIER RAMÍREZ