Transcript and Presenter's Notes

Title: Adaptation Techniques in Automatic Speech Recognition

1
Adaptation Techniques in Automatic Speech
Recognition
  • Tor André Myrvoll
  • Telektronikk 99(2), Issue on Spoken Language
    Technology in Telecommunications, 2003.

2
Goal and Objective
  • Make ASR robust to speaker and environmental
    variability.
  • Model adaptation: automatically adapt an HMM using
    limited but representative new data to improve
    performance.
  • Train ASRs for applications with insufficient data.

3
What Do We Have/Adapt?
  • An HMM-based ASR trained in the usual manner.
  • The output probabilities are parameterized by GMMs.
  • Adapting the state transition probabilities and
    mixture weights gives no improvement.
  • The covariances (Σ) are difficult to estimate
    robustly.
  • The mixture means can be adapted optimally and
    have proven useful.

4
Adaptation Principles
  • Main assumption: the original model is good enough;
    model adaptation cannot be a full re-training!

5
Offline Vs. Online
  • If possible, adapt offline (performance is not
    compromised for computational reasons).
  • Decode the adaptation speech data using the
    current model.
  • Use this to estimate the speaker-dependent
    model's statistics.

6
Online Adaptation Using Prior Evolution.
  • The present posterior becomes the next prior, as
    sketched below.
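
A minimal sketch of the prior-evolution recursion (the batch notation
y_{1:n} is my assumption, not the article's): the posterior over the model
parameters λ after adaptation batch n serves as the prior for batch n+1.

  % Recursive Bayesian update: posterior after batch n = prior for batch n+1
  p(\lambda \mid y_{1:n}) \;\propto\; p(y_n \mid \lambda)\, p(\lambda \mid y_{1:n-1}),
  \qquad p(\lambda \mid y_{1:0}) \equiv p(\lambda)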

7
MAP Adaptation
  • HMMs have no sufficient statistics, so conjugate
    prior-posterior pairs cannot be used. Find the
    posterior via EM.
  • Find the prior empirically (multi-modal; the first
    model is estimated using ML training).
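
For a single Gaussian mean, MAP adaptation interpolates between the prior
mean and the sample mean of the adaptation data. A standard form (my
notation, not necessarily the article's: τ is the prior weight and γ(t)
the occupation probability of frame x_t):

  \hat{\mu} \;=\; \frac{\tau\,\mu_0 + \sum_t \gamma(t)\, x_t}{\tau + \sum_t \gamma(t)}

With little data the estimate stays near the prior mean μ_0; with more
data it approaches the ML estimate.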

8
EMAP
  • Not all phonemes occur in every context in the
    adaptation data, so correlations between variables
    must be stored.
  • EMAP considers only the correlations between mean
    vectors, under a jointly Gaussian assumption (see
    the sketch below).
  • For large model sizes, share means across models.
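
Under the jointly Gaussian assumption, adaptation of observed means
propagates to unobserved ones via standard conditional-Gaussian algebra
(the observed/unobserved partitioning below is my illustration):

  % Partition the stacked mean vector into observed (o) and unobserved (u) parts
  \begin{pmatrix} \mu_o \\ \mu_u \end{pmatrix}
  \sim \mathcal{N}\!\left(
    \begin{pmatrix} m_o \\ m_u \end{pmatrix},
    \begin{pmatrix} \Sigma_{oo} & \Sigma_{ou} \\ \Sigma_{uo} & \Sigma_{uu} \end{pmatrix}
  \right)
  \;\Rightarrow\;
  \mathbb{E}[\mu_u \mid \hat{\mu}_o]
  = m_u + \Sigma_{uo}\,\Sigma_{oo}^{-1}(\hat{\mu}_o - m_o)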

9
Transformation Based Model Adaptation
  • Estimate a transform T parameterized by θ.
  • ML estimation of θ.
  • MAP estimation of θ.
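
The two estimation criteria can be sketched as follows (notation assumed:
λ is the original model, Y the adaptation data, T_θ(λ) the transformed
model):

  \theta_{ML}  = \arg\max_{\theta}\; p(Y \mid T_{\theta}(\lambda)), \qquad
  \theta_{MAP} = \arg\max_{\theta}\; p(Y \mid T_{\theta}(\lambda))\, p(\theta)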

10
Bias, Affine and Nonlinear Transformations
  • ML estimation of bias.
  • Affine transformation.
  • Nonlinear transformation (the mapping may be a
    neural network).
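
A sketch of the three transformation families applied to a mixture mean μ
(the symbols A, b and f are my notation):

  \hat{\mu} = \mu + b \;\;\text{(bias)}, \qquad
  \hat{\mu} = A\mu + b \;\;\text{(affine)}, \qquad
  \hat{\mu} = f(\mu) \;\;\text{(nonlinear, e.g. a neural network)}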

11
MLLR
  • Apply separate affine transformations to different
    parts of the model, grouped into regression classes
    (HEAdapt in HTK); a sketch follows.
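
A minimal, illustrative sketch of per-regression-class MLLR mean
adaptation (the function and variable names are mine, not HTK's API):

  import numpy as np

  def apply_mllr_means(means, classes, transforms):
      """Apply a per-regression-class affine transform to Gaussian means.

      means:      (N, d) array of mixture mean vectors
      classes:    length-N array mapping each mean to a regression class
      transforms: dict class_id -> (A, b), A of shape (d, d), b of shape (d,)
      """
      adapted = np.empty_like(means)
      for i, mu in enumerate(means):
          A, b = transforms[classes[i]]
          adapted[i] = A @ mu + b  # MLLR mean update: mu_hat = A mu + b
      return adapted

  # Example: four means split over two regression classes.
  means = np.random.randn(4, 3)
  classes = np.array([0, 0, 1, 1])
  transforms = {0: (np.eye(3), np.zeros(3)),             # identity: no change
                1: (0.9 * np.eye(3), np.full(3, 0.1))}   # mild scaling plus bias
  print(apply_mllr_means(means, classes, transforms))

Tying transforms to regression classes lets sparse adaptation data update
many Gaussians at once.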

12
SMAP
  • Model the mismatch between the SI model (x) and
    the test environment.
  • No mismatch: the normalized observations follow
    N(0, I).
  • Mismatch: they follow N(ν, η).
  • ν and η are estimated by the usual ML methods on
    the adaptation data.
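
A sketch of one standard SMAP-style formulation (my reconstruction of the
slide's missing equations): each observation x is normalized by its
mixture component, and the mismatch is modeled in the normalized domain.

  z = \Sigma^{-1/2}(x - \mu), \qquad
  z \sim \mathcal{N}(0, I) \;\;\text{(no mismatch)}, \qquad
  z \sim \mathcal{N}(\nu, \eta) \;\;\text{(mismatch)}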

13
Adaptive Training
  • Gender-dependent model selection.
  • VTLN (configured in HTK via WARPFREQ; see the
    sketch below).
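
An illustrative HTK configuration fragment for VTLN (the values are
placeholders; consult the HTK Book for the exact semantics):

  # Piecewise-linear frequency warping for VTLN
  WARPFREQ    = 1.10   # per-speaker warping factor
  WARPLCUTOFF = 300    # lower boundary frequency of the warped region (Hz)
  WARPUCUTOFF = 3300   # upper boundary frequency of the warped region (Hz)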

14
Speaker Adaptive Training
  • Assumption: there exists a compact model λ_c that
    relates to all speaker-dependent models via an
    affine transformation T (MLLR). The model and the
    transformations are found jointly using EM.
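
The joint objective can be sketched as follows (notation assumed: Y_s is
the data of training speaker s, and T_s that speaker's affine transform):

  (\hat{\lambda}_c, \{\hat{T}_s\})
  = \arg\max_{\lambda_c,\, \{T_s\}} \prod_{s=1}^{S} p\big(Y_s \mid T_s(\lambda_c)\big)

EM alternates between updating the compact model λ_c and re-estimating
the speaker transforms T_s.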

15
Cluster Adaptive Training
  • Group the speakers in the training set into
    clusters, then find the cluster closest to the
    test speaker.
  • Use canonical models (see the sketch below).
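
In one standard formulation of cluster adaptive training, each mixture
mean for a test speaker is an interpolation of the cluster-canonical
means (the weight notation w_c is mine):

  \hat{\mu}_m = \sum_{c=1}^{C} w_c\, \mu_m^{(c)}

Only the C interpolation weights need to be estimated from the test
speaker's adaptation data.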

16
Eigenvoices
  • Similar to cluster adaptive training.
  • Concatenate the means from R speaker-dependent
    models into supervectors. Perform PCA on these
    vectors. Store K << R eigenvoice vectors.
  • Form a vector of means from the SI model too.
  • Given a new speaker, the mean supervector is a
    linear combination of the SI vector and the
    eigenvoice vectors, as sketched below.
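
A minimal, illustrative eigenvoice sketch (dimensions and names are mine;
the supervectors here are random stand-ins for real speaker-dependent
means):

  import numpy as np

  R, D, K = 20, 100, 5                    # R speaker models, dim D, K << R
  supervectors = np.random.randn(R, D)    # stand-in for concatenated SD means
  si_vector = supervectors.mean(axis=0)   # stand-in for the SI supervector

  # PCA via SVD of the centered supervectors; rows of Vt are principal axes.
  U, S, Vt = np.linalg.svd(supervectors - si_vector, full_matrices=False)
  eigenvoices = Vt[:K]                    # keep the K leading eigenvoice vectors

  # A new speaker's mean supervector = SI vector + weighted eigenvoices.
  # The weights w would be estimated from adaptation data (e.g. by ML);
  # here they are illustrative values only.
  w = np.zeros(K)
  w[0] = 0.5
  new_speaker_means = si_vector + w @ eigenvoices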

17
Summary
  • Two major approaches: MAP (including EMAP) and MLLR.
  • MAP needs more data than MLLR (it uses a simple
    prior); given enough data, MAP converges to the SD
    model.
  • Adaptive training is gaining popularity.
  • For mobile applications, complexity and memory
    are major concerns.