Title: Advances in WP1 and WP2
1. Advances in WP1 and WP2
- Paris Meeting, 11 February 2005
www.loquendo.com
2. Advances in WP1
3. WP1 Environment and Sensor Robustness: T1.2 Noise Independence
- Voice Activity Detection
  - A model-based approach using neural networks (NN) to discriminate two classes (noise and voice) will be explored
  - NN input could be standard features (cepstral coefficients, energy) after noise reduction, possibly complemented by other features (pitch/voicing) produced by other partners (IRST)
  - The training set will be multi-style, including several types of noise conditions and languages
- Noise Reduction
  - Several noise reduction techniques will be tested on the test sets selected as benchmarks for the project
  - Spectral Subtraction (standard, Wiener and SNR dependent) and Spectral Attenuation (Ephraim-Malah SA, standard and SNR dependent)
  - New techniques for non-stationary noises
4. WP1 Speech Databases for Noise Reduction
- Aurora 2: connected digits. TIdigits data downsampled to 8 kHz, filtered with a G.712 characteristic, with noise artificially added at several SNRs (20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB). There are three test sets:
  - A: same noises as in training (subway, babble, car noise, exhibition hall)
  - B: 4 different noises (restaurant, street, airport, train station)
  - C: same noises as A but filtered with a different microphone
- Aurora 3: connected digits recorded in a car environment. The signal is collected by hands-free (ch1) and close-talk (ch0) microphones. In HIWIRE we use the Italian and Spanish recordings. There are two test sets:
  - WM (Well Matched): ch0 and ch1 recordings used in both training and testing lists
  - HM (High Mismatch): ch0 for training and ch1 for testing
- Aurora 4: continuous speech, 5K vocabulary. It is WSJ0 5K with added noise of 6 kinds: car, babble, restaurant, street, airport, train station. It uses standard bigram language modeling.
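As an illustration of how noise can be added to clean speech at a target SNR, the sketch below scales a noise segment so that the ratio of mean speech power to mean noise power matches the requested value. It is a simplification under stated assumptions: the function names `mix_at_snr` and `snr_db` are hypothetical, levels are measured as plain mean power over the whole utterance, and no G.712 filtering is applied, so it may not match the procedure actually used to build Aurora 2.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, from the mean power of each sequence."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


def mix_at_snr(speech, noise, target_snr_db):
    """Return (noisy_speech, scale) with the noise scaled to the target SNR.

    Assumption: level is the mean power over the full utterance.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose scale so that speech_power / (scale^2 * noise_power)
    # equals 10^(target_snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    noisy = [s + scale * n for s, n in zip(speech, noise)]
    return noisy, scale


# Synthetic stand-ins for a speech and a noise recording (illustrative only).
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1200)]
noisy_by_snr = {snr: mix_at_snr(speech, noise, snr)[0] for snr in (20, 15, 10, 5, 0, -5)}
```

The loop at the end produces one noisy copy per SNR level listed for Aurora 2; real use would replace the synthetic sequences with audio samples.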
5. Denoising Techniques for baseline evaluations
Spectral Subtraction (SS) operates in the
frequency domain and attempts to compute a
denoised version of the power spectrum. Wiener
spectral subtraction is defined as
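The defining equation is not preserved in this copy of the slide. As a hedged illustration only, one commonly used Wiener-style form applies a per-bin gain G(k) = max(1 - N(k)/Y(k), floor) to the noisy magnitude, where Y(k) is the noisy power spectrum and N(k) is the estimated noise power spectrum. The sketch below implements that form; the function names and the `floor` parameter are assumptions, and this should not be read as the formula used in the Loquendo baseline.

```python
def wiener_gain(noisy_power, noise_power, floor=0.01):
    """Per-bin Wiener-style gain, clipped from below by a spectral floor.

    noisy_power: power spectrum of the noisy frame, one value per bin.
    noise_power: estimated noise power spectrum, same length.
    floor: hypothetical lower bound that avoids negative or zero gains.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0.0:
            gains.append(floor)  # empty bin: nothing to estimate
        else:
            gains.append(max((y - n) / y, floor))
    return gains


def denoise_power_spectrum(noisy_power, noise_power, floor=0.01):
    """Apply the gain to the magnitude, i.e. gain squared to the power."""
    gains = wiener_gain(noisy_power, noise_power, floor)
    return [g * g * y for g, y in zip(gains, noisy_power)]


# Illustrative values, not measured data.
frame_power = [4.0, 1.0, 0.5]
noise_estimate = [1.0, 1.0, 1.0]
clean_estimate = denoise_power_spectrum(frame_power, noise_estimate)
```

Bins where the noise estimate exceeds the observed power fall back to the floor, which is the usual way to limit the "musical noise" that plain subtraction produces.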
6. Baseline evaluations of Loquendo ASR on Aurora2 speech databases
7. Baseline Performance evaluations
- This test was performed with the Loquendo ASR using the CLEAN / MULTI_CONDITION models trained on the Aurora2 training lists.
- The test was run on the A/B/C testing lists.
Performance in terms of Word Accuracy and Error Reduction
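For reference, the two metrics named above can be computed as sketched below, assuming the usual definitions: word accuracy counts substitutions, deletions and insertions against the number of reference words, and error reduction is the relative decrease in word error between a baseline and an improved system. The function names are hypothetical and the numbers in the example are made up for illustration, not results from these experiments.

```python
def word_accuracy(n_ref_words, substitutions, deletions, insertions):
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    errors = substitutions + deletions + insertions
    return 100.0 * (n_ref_words - errors) / n_ref_words


def error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction of the word error (100 - accuracy), in percent."""
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up counts: 1000 reference words, 50 substitutions,
# 30 deletions, 20 insertions.
example_accuracy = word_accuracy(1000, 50, 30, 20)
example_reduction = error_reduction(90.0, 95.0)
```

Because insertions are counted as errors, word accuracy can become negative in very noisy conditions.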
8. Baseline evaluations of Loquendo ASR on Aurora3 speech databases
9. Baseline Performance evaluations
- This test was performed with the Loquendo ASR and the models trained on the Aurora3 training lists.
- The test was run on the Well Matched (WM) and High Mismatch (HM) testing lists.
Performance in terms of Word Accuracy and Error Reduction
10. Baseline evaluations of Loquendo ASR on Aurora4 speech databases (work in progress)
11. WP1 Workplan
- Selection of suitable benchmark databases (m6)
- Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
- Discriminative VAD (m16)
- Spectral Attenuation (Ephraim-Malah SA SNR dependent) (m18)
- Noise estimation and reduction for non-stationary noises (m24)
12. Advances in WP2
13. WP2 User Robustness: T2.2 Speaker Adaptation
- Acoustic model adaptation
  - Loquendo ASR is based on a hybrid HMM-NN architecture
  - Hybrid HMM-NN is an alternative to HMM modeling that exploits the discriminative training of an MLP to estimate the likelihoods of the acoustic units; it is also very efficient for open vocabularies
  - Unlike for HMMs, little has been published on the adaptation of NNs
- State-of-the-art NN adaptation methods
  - The Linear Input Network (LIN) method has been proposed for speaker adaptation with promising results [Neto 1996] [Mana 2002]
  - The principle of LIN adaptation is to learn, through error back-propagation, the parameters of a linear transformation of the input space
  - The speaker-independent acoustic model (MLP) is kept fixed
- Innovative NN adaptation methods
  - Other innovative techniques for NN adaptation will be proposed and tested, including regularization techniques and rotations of the NN hidden-unit activations
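The LIN principle described above can be sketched on a toy network: a linear transform of the input, initialized to the identity, is trained by back-propagating the cross-entropy error through an MLP whose own weights are never updated. Everything below is an illustrative assumption (class names, the tiny 2-3-2 network, its weights, the learning rate and the four "speaker" samples); it is not the LASR implementation.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


class FixedMLP:
    """Stand-in for the speaker-independent model; weights stay fixed."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x):
        h = [math.tanh(a + b) for a, b in zip(matvec(self.w1, x), self.b1)]
        p = softmax([a + b for a, b in zip(matvec(self.w2, h), self.b2)])
        return h, p

    def input_gradient(self, h, p, target):
        """Gradient of the cross-entropy loss with respect to the MLP input."""
        dz = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        dh = [sum(self.w2[k][j] * dz[k] for k in range(len(dz)))
              for j in range(len(h))]
        da = [g * (1.0 - hj * hj) for g, hj in zip(dh, h)]  # tanh derivative
        return [sum(self.w1[j][i] * da[j] for j in range(len(da)))
                for i in range(len(self.w1[0]))]


class LinearInputNetwork:
    """Trainable transform y = A x + b placed in front of the fixed MLP."""

    def __init__(self, dim):
        self.a = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def transform(self, x):
        return [v + b for v, b in zip(matvec(self.a, x), self.b)]

    def update(self, x, grad_output, learning_rate):
        # dL/dA[i][j] = dL/dy[i] * x[j] and dL/db[i] = dL/dy[i]
        for i, g in enumerate(grad_output):
            for j, xj in enumerate(x):
                self.a[i][j] -= learning_rate * g * xj
            self.b[i] -= learning_rate * g


def adapt(mlp, lin, data, epochs=50, learning_rate=0.1):
    """Train only the LIN parameters; return the mean loss of each epoch."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            h, p = mlp.forward(lin.transform(x))
            total -= math.log(p[target])
            lin.update(x, mlp.input_gradient(h, p, target), learning_rate)
        history.append(total / len(data))
    return history


# Toy set-up: the fixed MLP favours class 0 when x[0] > x[1], while the
# hypothetical speaker's features are arranged the other way round.
mlp = FixedMLP([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0],
               [[1.0, -1.0, 1.0], [-1.0, 1.0, -1.0]], [0.0, 0.0])
lin = LinearInputNetwork(2)
data = [([0.2, 0.5], 0), ([0.1, 0.6], 0), ([0.5, 0.2], 1), ([0.6, 0.1], 1)]
history = adapt(mlp, lin, data)
```

Starting from the identity means the adapted system initially behaves exactly like the speaker-independent one, and only the small number of LIN parameters has to be estimated from the adaptation sentences.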
14. LOQUENDO Activity in the first year
- The first activity was the selection of suitable benchmark databases: the WSJ0 Adaptation component and the WSJ1 Spoke-3 component
- The second activity was the set-up of experimental baselines for these databases, with standard LASR and without adaptation
- In the meantime, the LIN adaptation method has been implemented; experiments on the benchmarks are under way and will be presented at M12
15. Speech Databases for Speaker Adaptation
- WSJ0 (standard ARPA, 1993, LDC, 1000)
  - Large-vocabulary (5K words) continuous speech database
  - Test set: 8 speakers, 40 utterances each, read speech, bigram LM
  - Adaptation set: the same 8 speakers, 40 utterances each
- WSJ1 (1994, LDC, 1500)
  - Similar to WSJ0, same vocabulary and LM
  - SPOKE-3: standard case study of adaptation to non-native speakers
  - 10 speakers, each with 40 adaptation utterances and 40 test utterances
- Hiwire Non-Native Speaker database
  - Collected within the project
  - 80 speakers, each reads 100 utterances
16. WSJ0 baseline
- The WSJ0 SI test set is made up of 8 speakers with 40 sentences per speaker (two microphones: WV1 Sennheiser, WV2 others)
- Vocabulary: 5K words, with a standard bigram LM
- The Adaptation component of WSJ0 is made up of the same 8 speakers as the SI test set, with 40 adaptation sentences for each of them
- Only the part of the adaptation and test sets recorded with the matching microphone (Sennheiser, WV1) has been used
17. WSJ1 SPOKE-3 baseline
- Spoke-3 is the standard WSJ1 case study for evaluating adaptation to non-native speakers
- There are 10 non-native speakers
- For each of them there are 40 adaptation sentences and 40 test sentences
- Vocabulary is 5K words, with a standard bigram LM
- Standard LASR for US English has been used
18. Workplan
- Selection of suitable benchmark databases (m6)
- Baseline set-up for the selected databases (m8)
- LIN adaptation method implemented and tested on the benchmarks (m12)
- Regularization methods implemented and tested on the benchmarks (m12)
- Innovative NN adaptation methods for acoustic modeling (m24)