Advances in WP1 and WP2 - PowerPoint PPT Presentation

Title: Advances in WP1 and WP2


1
Advances in WP1 and WP2
  • Paris Meeting, 11 February 2005

www.loquendo.com
2
Advances in WP1
3
WP1 Environment and Sensor Robustness: T1.2 Noise
Independence
  • Voice Activity Detection
  • A model-based approach using neural networks (NN)
    to discriminate between two classes (noise and
    voice) will be explored
  • NN input could be standard features (cepstral
    coefficients, energy) after noise reduction,
    possibly complemented by other features
    (pitch/voicing) produced by other partners (IRST)
  • The training set will be multi-style, including
    several types of noise conditions and languages
  • Noise Reduction
  • Several noise reduction techniques will be
    tested on the test sets selected as benchmarks
    for the project
  • Spectral Subtraction (standard, Wiener and
    SNR-dependent) and Spectral Attenuation
    (Ephraim-Malah SA, standard and SNR-dependent)
  • New techniques for non-stationary noise
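
The two-class (noise vs. voice) detector described on this slide can be illustrated with a deliberately tiny sketch. It assumes a single logistic unit as a stand-in for the neural network and frame log-energy as the only input feature; the function names, the synthetic frames and the training settings are all hypothetical and are not taken from the presentation.

```python
import math
import random

def log_energy(frame):
    """Log of the mean squared amplitude of one frame (offset avoids log 0)."""
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-12)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_vad(frames, labels, lr=0.5, epochs=200):
    """Fit one logistic unit on standardised log-energy.

    labels: 1 = voice, 0 = noise. This is a toy stand-in for the NN
    classifier mentioned on the slide, not the project's model.
    """
    feats = [log_energy(f) for f in frames]
    mean = sum(feats) / len(feats)
    std = math.sqrt(sum((e - mean) ** 2 for e in feats) / len(feats)) or 1.0
    xs = [(e - mean) / std for e in feats]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y  # gradient of cross-entropy w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return {"w": w, "b": b, "mean": mean, "std": std}

def is_voice(model, frame):
    """Classify one frame with the trained unit."""
    x = (log_energy(frame) - model["mean"]) / model["std"]
    return sigmoid(model["w"] * x + model["b"]) > 0.5

def synthetic_frame(amplitude, rng, length=160):
    """Hypothetical test signal: uniform random samples of a given amplitude."""
    return [rng.uniform(-amplitude, amplitude) for _ in range(length)]

rng = random.Random(0)
noise_frames = [synthetic_frame(a, rng) for a in (0.01, 0.015, 0.02)]
voice_frames = [synthetic_frame(a, rng) for a in (0.3, 0.5, 0.8)]
model = train_vad(noise_frames + voice_frames, [0, 0, 0, 1, 1, 1])
```

A realistic system would replace the single energy feature with the cepstral and pitch/voicing features listed above and train on multi-style data.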

4
WP1 Speech Databases for Noise Reduction
  • Aurora 2 - Connected digits - TIdigits data
    downsampled to 8 kHz, filtered with a G.712
    characteristic, with noise artificially added at
    several SNRs (20 dB, 15 dB, 10 dB, 5 dB, 0 dB,
    -5 dB). There are three test sets:
  • A: same noises as in training (subway, babble,
    car, exhibition hall)
  • B: four different noises (restaurant, street,
    airport, train station)
  • C: same noises as A, but filtered with a
    different microphone characteristic
  • Aurora 3 - Connected digits recorded in a car
    environment - Signals collected by hands-free
    (ch1) and close-talking (ch0) microphones. In
    HIWIRE we use the Italian and Spanish recordings.
    There are two test sets:
  • WM (Well Matched): ch0 and ch1 recordings used in
    both the training and testing lists
  • HM (High Mismatch): ch0 for training and ch1 for
    testing
  • Aurora 4 - Continuous speech, 5k vocabulary -
    WSJ0 5K with six kinds of added noise: car,
    babble, restaurant, street, airport, train
    station. It uses the standard bigram language
    model.
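
As a rough illustration of what "noise artificially added at several SNRs" means, the sketch below scales a noise signal so that the speech-to-noise power ratio hits a target value in dB. It assumes plain mean-square power over the whole signal; the official Aurora tooling measures levels differently, and the function names here are hypothetical.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Return speech + gain * noise, where gain is chosen so that
    10 * log10(P_speech / P_scaled_noise) equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, noisy):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))

# Hypothetical signals: a sine as "speech", uniform random samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 5 * n / 160) for n in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
for snr in (20, 15, 10, 5, 0, -5):
    noisy = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 3))
```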

5
Denoising Techniques for baseline evaluations
Spectral Subtraction (SS) operates in the
frequency domain and attempts to compute a
denoised version of the power spectrum. Wiener
spectral subtraction is defined as
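
The slide's own equation is not reproduced in this transcript. As a hedged illustration only, the sketch below uses common textbook forms: power spectral subtraction with an over-subtraction factor and a spectral floor, and a Wiener-type gain G = (|Y|² - |N|²) / |Y|² applied to the amplitude spectrum. The parameter names (alpha, floor) and their defaults are assumptions and may differ from what Loquendo used.

```python
def power_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Per-bin power spectral subtraction.

    noisy_power: |Y(k)|^2 for each frequency bin of one frame.
    noise_power: estimated |N(k)|^2 for the same bins.
    alpha: over-subtraction factor (assumed parameter).
    floor: fraction of the noisy power kept as a lower bound.
    """
    return [max(y - alpha * n, floor * y)
            for y, n in zip(noisy_power, noise_power)]

def wiener_gain(noisy_power, noise_power, floor=0.01):
    """Per-bin Wiener-type gain, floored to avoid negative or zero gains."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0:
            gains.append(floor)
        else:
            gains.append(max((y - n) / y, floor))
    return gains

def wiener_subtraction(noisy_power, noise_power, floor=0.01):
    """Denoised power spectrum: the gain scales the amplitude,
    so the power is scaled by the gain squared."""
    gains = wiener_gain(noisy_power, noise_power, floor)
    return [g * g * y for g, y in zip(gains, noisy_power)]
```

An SNR-dependent variant would typically make `alpha` a function of the estimated frame SNR rather than a constant.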
6
Baseline evaluations of Loquendo ASR on Aurora2
speech databases
7
Baseline Performance evaluations
 
  • This test was performed with Loquendo ASR, using
    the CLEAN and MULTI_CONDITION models trained on
    the Aurora2 training lists.
  • The test was run on the A/B/C testing lists.

Performance in terms of Word Accuracy and Error
Reduction
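
The two reported metrics can be computed as in the sketch below. It assumes the usual definitions (word accuracy counts substitutions, deletions and insertions against the number of reference words; error reduction is relative to the baseline error rate), which the slides do not spell out. The numbers in the usage lines are made up for illustration and are not results from the presentation.

```python
def word_accuracy(n_ref, substitutions, deletions, insertions):
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref

def error_reduction(baseline_accuracy, new_accuracy):
    """Relative error reduction in percent, with error = 100 - accuracy."""
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    if baseline_error <= 0:
        raise ValueError("baseline has no errors to reduce")
    return 100.0 * (baseline_error - new_error) / baseline_error

# Illustrative values only.
acc = word_accuracy(n_ref=1000, substitutions=30, deletions=10, insertions=10)
gain = error_reduction(baseline_accuracy=90.0, new_accuracy=acc)
```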
8
Baseline evaluations of Loquendo ASR on Aurora3
speech databases
9
Baseline Performance evaluations
 
  • This test was performed with Loquendo ASR, using
    the models trained on the Aurora3 training
    lists.
  • The test was run on the Well Matched (WM) and
    High Mismatch (HM) testing lists.

Performance in terms of Word Accuracy and Error
Reduction
10
Baseline evaluations of Loquendo ASR on Aurora4
speech databases (work in progress)
11
WP1 Workplan
  • Selection of suitable benchmark databases (m6)
  • Completion of LASR baseline experiments with
    Spectral Subtraction (Wiener, SNR-dependent) (m12)
  • Discriminative VAD (m16)
  • Spectral Attenuation (Ephraim-Malah SA,
    SNR-dependent) (m18)
  • Noise estimation and reduction for non-stationary
    noise (m24)

12
Advances in WP2
13
WP2 User Robustness: T2.2 Speaker Adaptation
  • Acoustic model adaptation
  • Loquendo ASR is based on a hybrid HMM-NN
  • Hybrid HMM-NN is an alternative to HMM modeling
    that exploits the discriminative training of an
    MLP to estimate the likelihoods of the acoustic
    units; it is also very efficient for open
    vocabularies
  • Unlike for HMMs, little has been published in
    the literature on the adaptation of NNs
  • State-of-the-art NN adaptation methods
  • The Linear Input Network (LIN) method has been
    proposed for speaker adaptation, with promising
    results (Neto 1996, Mana 2002)
  • The principle of LIN adaptation is to learn,
    through error back-propagation, the parameters of
    a linear input space transformation
  • The speaker-independent acoustic model (MLP) is
    kept fixed
  • Innovative NN adaptation methods
  • Other innovative techniques for NN adaptation
    will be proposed and tested, including
    regularization techniques and rotations of the
    NN hidden-unit activations
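
The LIN principle described above (train only a linear input transform by back-propagation while the speaker-independent model stays fixed) can be sketched as follows. To keep the example short and verifiable, a single frozen logistic unit stands in for the speaker-independent MLP; the weights, the adaptation data and the training settings are hypothetical and are not Loquendo's implementation.

```python
import math

# Frozen "speaker-independent" model: one logistic unit (stand-in for the MLP).
W_SI = [2.0, -1.0]
B_SI = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity_lin(dim):
    """LIN initialised to the identity, so adaptation starts from the SI model."""
    A = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    return A, [0.0] * dim

def lin_apply(A, b, x):
    """Linear input transform x' = A x + b."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(b))]

def si_forward(x):
    """Output of the frozen model for an (already transformed) input."""
    return sigmoid(sum(w * xi for w, xi in zip(W_SI, x)) + B_SI)

def mean_loss(A, b, data):
    """Mean cross-entropy of the frozen model behind the LIN."""
    total = 0.0
    for x, y in data:
        p = min(max(si_forward(lin_apply(A, b, x)), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def adapt_lin(data, lr=0.05, epochs=300):
    """Learn A and b by gradient descent; W_SI and B_SI are never updated."""
    dim = len(W_SI)
    A, b = identity_lin(dim)
    for _ in range(epochs):
        grad_A = [[0.0] * dim for _ in range(dim)]
        grad_b = [0.0] * dim
        for x, y in data:
            p = si_forward(lin_apply(A, b, x))
            for i in range(dim):
                # Error back-propagated through the frozen unit to its input.
                delta = (p - y) * W_SI[i]
                grad_b[i] += delta
                for j in range(dim):
                    grad_A[i][j] += delta * x[j]
        for i in range(dim):
            b[i] -= lr * grad_b[i] / len(data)
            for j in range(dim):
                A[i][j] -= lr * grad_A[i][j] / len(data)
    return A, b

# Hypothetical adaptation data: features shifted so the frozen model errs.
ADAPT_DATA = [([-1.0, 0.0], 1), ([-0.5, 0.5], 1), ([0.0, 1.0], 1),
              ([-3.0, 0.0], 0), ([-3.5, -0.5], 0), ([-2.0, 2.0], 0)]
A, b = adapt_lin(ADAPT_DATA)
```

Comparing `mean_loss(*identity_lin(2), ADAPT_DATA)` with `mean_loss(A, b, ADAPT_DATA)` shows the effect of adaptation on this toy data.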

14
LOQUENDO Activity in the first year
  • The first activity has been the selection of
    suitable benchmark databases: the WSJ0 Adaptation
    component and the WSJ1 Spoke-3 component
  • The second activity has been the set-up of
    experimental baselines for these databases, with
    standard LASR and without adaptation
  • In the meantime, the LIN adaptation method has
    been implemented; experiments on the benchmarks
    are under way and will be presented at M12

15
Speech Databases for Speaker Adaptation
  • WSJ0 (standard ARPA, 1993, LDC, 1000)
  • Large vocabulary (5K words) continuous speech
    database
  • Test set: 8 speakers, 40 utterances each, read
    speech, bigram LM
  • Adaptation set: the same 8 speakers, 40
    utterances each
  • WSJ1 (1994, LDC, 1500)
  • Similar to WSJ0, same vocabulary and LM
  • SPOKE-3: standard case study of adaptation to
    non-native speakers
  • 10 speakers, each with 40 adaptation utterances
    and 40 test utterances
  • HIWIRE Non-Native Speaker database
  • Collected within the project
  • 80 speakers, each reading 100 utterances
16
WSJ0 baseline
  • The WSJ0 SI test set consists of 8 speakers and
    40 sentences for each speaker (two microphones:
    WV1 Sennheiser, WV2 others)
  • Vocabulary: 5K words, with a standard bigram LM
  • The Adaptation component of WSJ0 consists of the
    same 8 speakers as the SI test set, with 40
    adaptation sentences for each of them
  • Only the portions of the adaptation and test sets
    recorded with the same microphone (Sennheiser,
    WV1) have been used

17
WSJ1 SPOKE-3 baseline
  • Spoke-3 is the standard WSJ1 case study to
    evaluate adaptation to non-native speakers
  • There are 10 non-native speakers
  • For each of them there are 40 adaptation
    sentences and 40 test sentences
  • Vocabulary is 5K words, with standard bigram LM
  • Standard LASR for US English has been used

18
Workplan
  • Selection of suitable benchmark databases (m6)
  • Baseline set-up for the selected databases (m8)
  • LIN adaptation method implemented and
    tested on the benchmarks (m12)
  • Regularization methods implemented and
    tested on the benchmarks (m12)
  • Innovative NN adaptation methods for acoustic
    modeling (m24)