Adapting Hybrid ANN/HMM to Speech Variations
Stefano Scanzio, Pietro Laface (Politecnico di Torino),
Dario Albesano, Roberto Gemello, and Franco Mana
PowerPoint presentation transcript

Transcript and Presenter's Notes

1
  • Adapting Hybrid ANN/HMM to Speech Variations
  • Stefano Scanzio, Pietro Laface
  • Politecnico di Torino
  • Dario Albesano, Roberto Gemello, and Franco Mana

2
Acoustic Model Adaptation
  • Adaptation tasks
  • Linear Input Network
  • Linear Hidden Network
  • Catastrophic forgetting
  • Conservative Training
  • Results on several adaptation tasks

3
Acoustic model adaptation
  • Specific speaker
  • Speaking style (spontaneous, regional accents)
  • Audio channel (telephone, cellular, microphone)
  • Environment (car, office, ...)
  • Specific vocabulary

[Diagram: Voice Application - ASR - Data Log]
4
Linear Input Network adaptation
[Diagram: LIN adaptation - a linear layer is inserted between the
speech parameters and the input layer of the speaker/task-independent
MLP, whose output layer estimates the emission probabilities of the
acoustic-phonetic units]
5
Linear Hidden Network - LHN
[Diagram: LHN adaptation - the same MLP (input layer, hidden layers 1
and 2, output layer estimating the emission probabilities of the
acoustic-phonetic units), with a linear transformation inserted after
a hidden layer rather than at the input]
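The LIN and LHN ideas above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the layer sizes, weights, and names are invented. A small frozen MLP gets identity-initialised linear layers at the input (LIN) and after a hidden layer (LHN); adaptation would update only those layers by back-propagation.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-a)) for a in vec]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
# Frozen speaker/task-independent MLP: input -> hidden 1 -> hidden 2 -> output
W1, W2, W_out = rand_matrix(8, 4, rng), rand_matrix(8, 8, rng), rand_matrix(5, 8, rng)

# LIN: trainable linear layer before the input layer, initialised to identity.
# LHN: the same idea, inserted after a hidden layer. During adaptation only
# these matrices would be updated; W1, W2 and W_out stay frozen.
A_lin = identity(4)
A_lhn = identity(8)

def forward(x):
    x = matvec(A_lin, x)           # LIN transform of the speech parameters
    h1 = sigmoid(matvec(W1, x))
    h1 = matvec(A_lhn, h1)         # LHN transform of hidden-layer activations
    h2 = sigmoid(matvec(W2, h1))
    return matvec(W_out, h2)       # emission scores for the phonetic units

# With identity initialisation the adapted network reproduces the original.
scores = forward([0.5, -0.2, 0.1, 0.3])
```

The identity initialisation matters: before any adaptation step the network behaves exactly like the original speaker-independent model.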
6
Catastrophic forgetting
  • Acquiring new information can damage previously
    learned information if the new data do not
    adequately represent the knowledge included in
    the original training data
  • This effect is evident when the adaptation data do
    not contain examples for a subset of the output
    classes
  • The problem is more severe in the ANN framework than
    in the Gaussian Mixture HMM framework

7
Catastrophic forgetting
  • The back-propagation algorithm penalizes classes with
    no adaptation examples by setting their target value
    to zero for every adaptation frame
  • Thus, during adaptation, the weights of the ANN
    will be biased
  • to favor the activations of the classes with
    samples in the adaptation set
  • to weaken the other classes
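A minimal sketch of why this bias arises, assuming standard one-hot targets (the frame sequence below is invented for illustration):

```python
def one_hot(true_class, n_classes):
    # Standard target assignment: 1 for the frame's class, 0 for all others
    return [1.0 if c == true_class else 0.0 for c in range(n_classes)]

# Suppose the adaptation set contains only classes 6 and 7 out of 16
adaptation_frames = [6, 7, 6, 7, 6]
n_classes = 16

# Accumulate the target mass each class receives over the adaptation set
total = [0.0] * n_classes
for frame_class in adaptation_frames:
    for c, t in enumerate(one_hot(frame_class, n_classes)):
        total[c] += t

# Every class other than 6 and 7 gets target 0 on every frame, so
# back-propagation only ever pushes their activations down.
print(total)
```

Classes 6 and 7 accumulate all the target mass; the other fourteen classes receive zero on every frame, which is the mechanism behind the forgetting described above.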

8
16-class training
  • 2 input nodes, 20 x 2 hidden nodes, 16 output nodes
  • 2500 training patterns per class
  • The adaptation set includes 5000 patterns
    belonging only to classes 6 and 7
9
Adaptation of 2 Classes
[Figure: class regions of the 16-class network, with the adapted
classes 6 and 7 highlighted]
10
Conservative Training target assignment policy
Standard target assignment policy
  • 0.00 0.00 1.00 0.00 0.00
P2 is the class corresponding to the current input frame
Px: a class with examples in the adaptation set
Mx: a missing class (no examples in the adaptation set)
11
Adaptation of 2 classes
[Figures: class regions after standard adaptation vs Conservative
Training adaptation of classes 6 and 7]
12
Adaptation tasks
  • Application data adaptation: Directory Assistance
  • 9325 Italian city names
  • 53713 training / 3917 test utterances
  • Vocabulary adaptation: Command words
  • 30 command words
  • 6189 training / 3094 test utterances
  • Channel-Environment adaptation: Aurora-3
  • 2951 training / 654 test utterances
  • Speaker adaptation: WSJ0
  • 8 speakers, 16 kHz
  • 40 train / 40 test sentences

13
Results on different tasks (WER)
Adaptation Method | Application: Directory Assistance | Vocabulary: Command Words | Channel-Environment: Aurora-3 CH1
No adaptation     | 14.6 | 3.8 | 24.0
LIN               | 11.2 | 3.4 | 11.0
LIN + CT          | 12.4 | 3.4 | 15.3
LHN               |  9.6 | 2.1 |  9.8
LHN + CT          | 10.1 | 2.3 | 10.4
14
Mitigation of Catastrophic Forgetting using
Conservative Training
Tests using adapted models on Italian continuous
speech (WER)

Adaptation Method | Adapted on Application: Directory Assistance | Vocabulary: Command Words | Channel-Environment: Aurora-3 CH1
LIN               | 36.3 | 42.7 | 108.6
LIN + CT          | 36.5 | 35.2 |  42.1
LHN               | 40.6 | 63.7 | 152.1
LHN + CT          | 40.7 | 45.3 |  44.2
No adaptation     | 29.3 | 29.3 |  29.3
15
Networks used in the Speaker Adaptation Task
  • STD (Standard)
  • 2-hidden-layer hybrid MLP-HMM model
  • 273 input features (39 parameters x 7 context
    frames)
  • IMP (Improved)
  • Uses a wider input window spanning a time context
    of 25 frames
  • Includes an additional hidden layer
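The feature counts above follow directly from the windowing; a quick check (the 975-feature figure for IMP is an assumption that the same 39 parameters per frame are kept, which the slide does not state):

```python
# Input-layer sizes implied by the slide. The IMP figure assumes the same
# 39 parameters per frame are kept (an assumption; the slide does not say).
params_per_frame = 39
std_input = params_per_frame * 7    # STD: 7 context frames
imp_input = params_per_frame * 25   # IMP: 25 context frames
print(std_input, imp_input)         # std_input matches the 273 features above
```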

16
Results on WSJ0 Speaker Adaptation Task
Net type | Adaptation method | Trigram LM (WER)
STD      | No adaptation     | 8.4
STD      | LIN               | 7.9
STD      | LIN + CT          | 7.1
STD      | LHN + CT          | 6.6
STD      | LIN + LHN + CT    | 6.3
IMP      | No adaptation     | 6.5
IMP      | LHN + CT          | 5.6
IMP      | LIN + LHN + CT    | 5.0
17
Conclusions
  • LHN adaptation outperforms LIN adaptation
  • Linear transformations at different levels
    produce different positive effects
  • LIN + LHN performs better than LHN alone
  • In adaptation tasks with missing classes,
    Conservative Training
  • reduces the catastrophic forgetting effect,
    preserving the performance on another generic
    task
  • improves the performance in speaker adaptation
    with few available sentences

18
Weight merging
19
Conservative Training (CT)
  • For each observation frame:
  • 1) Set the target value for each class that has
    no (or few) adaptation data to its posterior
    probability computed by the original network
  • 2) Set to zero the target value for a class that
    has adaptation data but does not correspond to
    the input frame
  • 3) Set the target value for the class
    corresponding to the input frame to 1 minus the
    sum of the posterior probabilities assigned
    according to rule 1)
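The three rules above can be sketched as a target-vector construction, assuming the original (unadapted) network's posteriors are available for the current frame. Function and variable names are illustrative, and the "few examples" refinement of rule 1 is omitted for brevity:

```python
def conservative_targets(true_class, adaptation_classes, original_posteriors):
    """Build a Conservative Training target vector for one frame.

    true_class          -- index of the class of the current input frame
    adaptation_classes  -- set of class indices that have adaptation data
    original_posteriors -- posteriors from the original (unadapted) network
    """
    n = len(original_posteriors)
    targets = [0.0] * n
    missing_mass = 0.0
    for c in range(n):
        if c not in adaptation_classes:
            # Rule 1: missing classes keep the original network's posterior
            targets[c] = original_posteriors[c]
            missing_mass += original_posteriors[c]
        elif c != true_class:
            # Rule 2: adaptation classes other than the true one get 0
            targets[c] = 0.0
    # Rule 3: the true class gets the remaining probability mass
    targets[true_class] = 1.0 - missing_mass
    return targets

# Toy example with 5 classes: classes 2 and 3 have adaptation data,
# classes 0, 1 and 4 are missing; the current frame belongs to class 2.
posteriors = [0.10, 0.05, 0.60, 0.20, 0.05]
print(conservative_targets(2, {2, 3}, posteriors))
```

By construction the target vector still sums to one, and the missing classes keep a non-zero target, so back-propagation no longer drives their activations toward zero on every frame.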

20
Conclusions on LHN
  • LHN outperforms LIN
  • Linear transformations at different levels
    produce different positive effects
  • LIN + LHN performs better than LHN alone
  • For continuous speech, the wide-input IMP network
    is better than the STD one