Title: Advances in WP2
Slide 1: Advances in WP2
Nancy Meeting, 6-7 July 2006
www.loquendo.com
Slide 2: Recent Work on NN Adaptation in WP2
- State-of-the-art LIN adaptation method implemented and evaluated on the benchmarks (m12)
- Innovative LHN adaptation method implemented and evaluated on the benchmarks (m21)
- Experimental results on the benchmark corpora and the Hiwire database with LIN and LHN (m21)
- Further advances on new adaptation methods (m24)
Slide 3: LIN Adaptation
[Diagram: speaker-independent MLP (SI-MLP) with an input layer fed by the speech signal parameters, 1st and 2nd hidden layers, and an output layer giving the emission probabilities of the acoustic-phonetic units; LIN adds a trainable linear layer in front of the input layer.]
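LIN (Linear Input Network) adaptation trains a single linear transform placed in front of the frozen SI-MLP. A minimal PyTorch sketch of this idea follows; the module names and layer sizes are illustrative assumptions, not the configuration used in the deck:

```python
import torch.nn as nn

class LINAdapter(nn.Module):
    """SI-MLP with a trainable Linear Input Network (LIN) prepended."""

    def __init__(self, si_mlp: nn.Module, feat_dim: int):
        super().__init__()
        # LIN: square linear transform of the input features,
        # initialized to the identity so adaptation starts from the SI model.
        self.lin = nn.Linear(feat_dim, feat_dim)
        nn.init.eye_(self.lin.weight)
        nn.init.zeros_(self.lin.bias)
        # Freeze the SI-MLP: only the LIN weights are updated
        # on the adaptation data.
        self.si_mlp = si_mlp
        for p in self.si_mlp.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        return self.si_mlp(self.lin(x))

# Hypothetical two-hidden-layer SI-MLP, as in the diagram (sizes made up):
si_mlp = nn.Sequential(
    nn.Linear(39, 512), nn.Sigmoid(),
    nn.Linear(512, 512), nn.Sigmoid(),
    nn.Linear(512, 683), nn.Softmax(dim=-1))
model = LINAdapter(si_mlp, feat_dim=39)
```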
Slide 4: LHN Adaptation
[Diagram: the same SI-MLP stack; LHN instead inserts a trainable linear layer after a hidden layer, between the hidden layers and the output layer.]
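LHN (Linear Hidden Network) places the identity-initialized linear layer after a hidden layer of the frozen SI-MLP rather than before the input. A hedged sketch, reusing the hypothetical `nn.Sequential` SI-MLP from the LIN example:

```python
import torch.nn as nn

def insert_lhn(si_mlp: nn.Sequential, hidden_dim: int,
               position: int) -> nn.Sequential:
    """Return the SI-MLP with a trainable Linear Hidden Network (LHN)
    inserted at `position` (e.g., after a hidden layer's activation)."""
    lhn = nn.Linear(hidden_dim, hidden_dim)
    nn.init.eye_(lhn.weight)          # start as the identity, so the
    nn.init.zeros_(lhn.bias)          # SI model is unchanged before training
    for p in si_mlp.parameters():     # freeze all original SI weights;
        p.requires_grad_(False)       # only the LHN is adapted
    layers = list(si_mlp.children())
    layers.insert(position, lhn)
    return nn.Sequential(*layers)

# e.g., after the 2nd hidden layer's activation (index 4 in the toy SI-MLP):
# adapted = insert_lhn(si_mlp, hidden_dim=512, position=4)
```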
Slide 5: Results Summary (WER, %)

Test set                       | Baseline | LIN adapted | Rel. E.R. | LHN adapted | Rel. E.R.
WSJ0 16 kHz, bigram LM         | 10.5     | 9.4         | 10.5%     | 8.4         | 20.0%
WSJ1 Spoke-3 16 kHz, bigram LM | 54.2     | 46.5        | 14.2%     | 30.6        | 43.5%
HIWIRE 8 kHz                   | 11.6     | 7.8         | 32.1%     | 7.2         | 37.9%

(Rel. E.R. = relative error reduction with respect to the baseline.)
Slide 6: Papers Presented
- Roberto Gemello, Franco Mana, Stefano Scanzio, Pietro Laface, Renato De Mori, "Adaptation of Hybrid ANN/HMM Models Using Hidden Linear Transformations and Conservative Training", Proc. of ICASSP 2006, Toulouse, France, May 2006.
- Dario Albesano, Roberto Gemello, Pietro Laface, Franco Mana, Stefano Scanzio, "Adaptation of Artificial Neural Networks Avoiding Catastrophic Forgetting", Proc. of IJCNN 2006, Vancouver, Canada, July 2006.
Slide 7: The Forgetting Problem in ANN Adaptation
- It is well known in connectionist learning that acquiring new information during adaptation can damage previously learned information (catastrophic forgetting).
- This effect must be taken into account when adapting an ANN with a limited amount of data, which does not include enough samples for all the classes.
- The absent classes may be forgotten during adaptation, because discriminative training (error back-propagation) always assigns zero targets to absent classes.
Slide 8: Forgetting in ANNs for ASR
- When adapting an ANN/HMM model for ASR, this problem can arise when the adaptation set contains no examples of some phonemes, due to the limited amount of adaptation data or the limited vocabulary.
- ANN training is discriminative, unlike that of GMM-HMMs, so absent phonemes are penalized by assigning them a zero target during adaptation.
- This induces in the ANN a forgetting of the capability to classify the absent phonemes: while the HMM models for phonemes with no observations remain un-adapted, the ANN output units corresponding to those phonemes lose their characterization rather than simply staying un-adapted. A minimal sketch of this effect is given below.
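A small illustration (my own, not from the deck) of why zero targets erode absent classes: with a softmax output layer and cross-entropy loss, the gradient with respect to each logit is softmax(z) - t, so every class not present in the adaptation data receives a positive gradient on every frame, which keeps pushing its output down:

```python
import torch
import torch.nn.functional as F

classes = ["A", "E", "I", "O", "U"]        # toy 5-vowel output layer
z = torch.randn(1, 5, requires_grad=True)  # logits for one frame
target = torch.tensor([1])                 # correct class: "E"
F.cross_entropy(z, target).backward()
for c, g in zip(classes, z.grad[0]):
    # Gradient is softmax(z) - one_hot(target): positive for every
    # class other than "E", including the absent vowels A and I.
    print(f"{c}: grad {g.item():+.3f}")
```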
Slide 9: Example of Forgetting
Adaptation examples only of E, U, O (e.g., from the words "uno", "due", "tre"); no examples of the other vowels (A, I, …). The classes with examples adapt, but tend to invade the classes with no examples, which are partially forgotten.
[Figure: vowel chart in the F1/F2 (kHz) formant plane with the vowel classes (A, E, I, O, U); the adapted classes E, U, O expand into the regions of the absent classes A and I.]
Slide 10: Conservative Training
- We have introduced Conservative Training to avoid the forgetting of absent phonemes.
- The idea is to avoid a zero target for the absent phonemes, using for them the output of the original NN as target.
- Let FP be the set of phonemes present in the adaptation set and FA the set of absent ones. The targets are assigned according to the following equations.
Slide 11: Conservative Training Target Assignment Policy

For output class $i$, with $c$ the class of the correct phoneme for the current frame:

$$
t_i =
\begin{cases}
o_i^{SI} & \text{if } i \in F_A \quad \text{(absent class)} \\
0 & \text{if } i \in F_P,\ i \neq c \quad \text{(present class, not the correct one)} \\
1 - \sum_{j \in F_A} o_j^{SI} & \text{if } i = c \quad \text{(correct class)}
\end{cases}
$$

where $o_i^{SI}$ is the output of the original speaker-independent NN for class $i$.
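A minimal sketch of this target assignment (PyTorch; the function name and tensor layout are my own, not from the papers):

```python
import torch

def conservative_targets(si_output: torch.Tensor,
                         correct: int,
                         absent: torch.Tensor) -> torch.Tensor:
    """Conservative Training targets for one adaptation frame.

    si_output : posteriors of the original (pre-adaptation) SI network
    correct   : index of the correct phoneme class for this frame
    absent    : boolean mask, True for phonemes absent from the
                adaptation set (the set FA)
    """
    t = torch.zeros_like(si_output)             # present, non-correct: 0
    t[absent] = si_output[absent]               # absent classes keep SI response
    t[correct] = 1.0 - si_output[absent].sum()  # remaining mass to correct class
    return t

si_out = torch.tensor([0.10, 0.55, 0.05, 0.20, 0.10])    # toy posteriors A E I O U
absent = torch.tensor([True, False, True, False, False])  # A and I unseen
print(conservative_targets(si_out, correct=1, absent=absent))
# tensor([0.1000, 0.8500, 0.0500, 0.0000, 0.0000])
```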
Slide 12: Conservative Training
- In this way, the phonemes absent from the adaptation set are represented by the response of the original NN.
- Thus, the absent phonemes are not absorbed by the neighboring present phonemes.
- The results of adaptation with Conservative Training are:
  - Comparable performance on the target environment
  - Preservation of performance on the generalist environment
  - A large improvement in speaker adaptation performance when only a few sentences are available
Slide 13: Adaptation Tasks
- Application-data adaptation: Directory Assistance
  - 9325 Italian city names
  - 53713 training / 3917 test utterances
- Vocabulary adaptation: Command Words
  - 30 command words
  - 6189 training / 3094 test utterances
- Channel/environment adaptation: Aurora-3
  - 2951 training / 654 test utterances
Slide 14: Adaptation Results on Different Tasks (WER, %)

Adaptation Method | Directory Assistance (application) | Command Words (vocabulary) | Aurora-3 CH1 (channel/environment)
No adaptation     | 14.6 | 3.8 | 24.0
LIN               | 11.2 | 3.4 | 11.0
LIN + CT          | 12.4 | 3.4 | 15.3
LHN               | 9.6  | 2.1 | 9.8
LHN + CT          | 10.1 | 2.3 | 10.4
Slide 15: Mitigation of Catastrophic Forgetting Using Conservative Training

Tests of the adapted models on Italian continuous speech (WER, %):

Adaptation Method | Adapted on Directory Assistance | Adapted on Command Words | Adapted on Aurora-3 CH1
LIN               | 36.3 | 42.7 | 108.6
LIN + CT          | 36.5 | 35.2 | 42.1
LHN               | 40.6 | 63.7 | 152.1
LHN + CT          | 40.7 | 45.3 | 44.2
No adaptation     | 29.3 | 29.3 | 29.3
Slide 16: Conclusions
- The new LHN adaptation method, developed within the project, outperforms standard LIN adaptation.
- In adaptation tasks with missing classes, Conservative Training reduces the catastrophic forgetting effect, preserving performance on another, generic task.
Slide 17: Workplan
- Selection of suitable benchmark databases (m6)
- Baseline set-up for the selected databases (m8)
- LIN adaptation method implemented and evaluated on the benchmarks (m12)
- Experimental results on the Hiwire database with LIN (m18)
- Innovative NN adaptation methods and algorithms for acoustic modeling, with experimental results (m21)
- Further advances on new adaptation methods (m24)
- Unsupervised adaptation algorithms and experimentation (m33)