Transcript and Presenter's Notes

Title: Adaptation Techniques in Automatic Speech Recognition

1
Adaptation Techniques in Automatic Speech
Recognition
  • Tor André Myrvoll
  • Telektronikk 99(2), Issue on Spoken Language
    Technology in Telecommunications, 2003.

2
Goal and Objective
  • Make ASR robust to speaker and environmental
    variability.
  • Model adaptation: automatically adapt an HMM using
    limited but representative new data to improve
    performance.
  • Train ASRs for applications with insufficient data.

3
What Do We Have/Adapt?
  • An HMM-based ASR trained in the usual manner.
  • The output probabilities are parameterized by GMMs.
  • Adapting the state transition probabilities and
    mixture weights gives no improvement.
  • The covariances (Σ) are difficult to estimate
    robustly.
  • The mixture means can be adapted optimally and
    have proven useful.

4
Adaptation Principles
  • Main assumption: the original model is good enough;
    model adaptation cannot be a full re-training!

5
Offline Vs. Online
  • If possible, adapt offline (performance is not
    compromised for computational reasons).
  • Decode the adaptation speech data using the
    current model.
  • Use this to estimate the speaker-dependent
    model's statistics.

6
Online Adaptation Using Prior Evolution.
  • The present posterior becomes the next prior, as
    sketched below.
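
A minimal sketch of the prior-evolution recursion (the batch notation
y_{1:n} is my assumption, not the article's): the posterior over the model
parameters λ after adaptation batch n serves as the prior for batch n+1.

  % Recursive Bayesian update: posterior after batch n = prior for batch n+1
  p(\lambda \mid y_{1:n}) \;\propto\; p(y_n \mid \lambda)\, p(\lambda \mid y_{1:n-1}),
  \qquad p(\lambda \mid y_{1:0}) \equiv p(\lambda)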

7
MAP Adaptation
  • HMMs have no sufficient statistics, so conjugate
    prior-posterior pairs cannot be used. Find the
    posterior via EM.
  • Find the prior empirically (multi-modal; the first
    model is estimated using ML training).
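
For a single Gaussian mean, MAP adaptation interpolates between the prior
mean and the sample mean of the adaptation data. A standard form (my
notation, not necessarily the article's: τ is the prior weight and γ(t)
the occupation probability of frame x_t):

  \hat{\mu} \;=\; \frac{\tau\,\mu_0 + \sum_t \gamma(t)\, x_t}{\tau + \sum_t \gamma(t)}

With little data the estimate stays near the prior mean μ_0; with more
data it approaches the ML estimate.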

8
EMAP
  • Not all phonemes occur in every context in the
    adaptation data, so correlations between variables
    must be stored.
  • EMAP considers only the correlations between mean
    vectors, under a jointly Gaussian assumption (see
    the sketch below).
  • For large model sizes, share means across models.
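
Under the jointly Gaussian assumption, adaptation of observed means
propagates to unobserved ones via standard conditional-Gaussian algebra
(the observed/unobserved partitioning below is my illustration):

  % Partition the stacked mean vector into observed (o) and unobserved (u) parts
  \begin{pmatrix} \mu_o \\ \mu_u \end{pmatrix}
  \sim \mathcal{N}\!\left(
    \begin{pmatrix} m_o \\ m_u \end{pmatrix},
    \begin{pmatrix} \Sigma_{oo} & \Sigma_{ou} \\ \Sigma_{uo} & \Sigma_{uu} \end{pmatrix}
  \right)
  \;\Rightarrow\;
  \mathbb{E}[\mu_u \mid \hat{\mu}_o]
  = m_u + \Sigma_{uo}\,\Sigma_{oo}^{-1}(\hat{\mu}_o - m_o)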

9
Transformation Based Model Adaptation
  • Estimate a transform T parameterized by θ.
  • ML estimation of θ.
  • MAP estimation of θ.
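
The two estimation criteria can be sketched as follows (notation assumed:
λ is the original model, Y the adaptation data, T_θ(λ) the transformed
model):

  \theta_{ML}  = \arg\max_{\theta}\; p(Y \mid T_{\theta}(\lambda)), \qquad
  \theta_{MAP} = \arg\max_{\theta}\; p(Y \mid T_{\theta}(\lambda))\, p(\theta)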

10
Bias, Affine and Nonlinear Transformations
  • ML estimation of bias.
  • Affine transformation.
  • Nonlinear transformation (the mapping may be a
    neural network).
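
A sketch of the three transformation families applied to a mixture mean μ
(the symbols A, b and f are my notation):

  \hat{\mu} = \mu + b \;\;\text{(bias)}, \qquad
  \hat{\mu} = A\mu + b \;\;\text{(affine)}, \qquad
  \hat{\mu} = f(\mu) \;\;\text{(nonlinear, e.g. a neural network)}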

11
MLLR
  • Apply separate affine transformations to different
    parts of the model, grouped into regression classes
    (HEAdapt in HTK); a sketch follows.
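
A minimal, illustrative sketch of per-regression-class MLLR mean
adaptation (the function and variable names are mine, not HTK's API):

  import numpy as np

  def apply_mllr_means(means, classes, transforms):
      """Apply a per-regression-class affine transform to Gaussian means.

      means:      (N, d) array of mixture mean vectors
      classes:    length-N array mapping each mean to a regression class
      transforms: dict class_id -> (A, b), A of shape (d, d), b of shape (d,)
      """
      adapted = np.empty_like(means)
      for i, mu in enumerate(means):
          A, b = transforms[classes[i]]
          adapted[i] = A @ mu + b  # MLLR mean update: mu_hat = A mu + b
      return adapted

  # Example: four means split over two regression classes.
  means = np.random.randn(4, 3)
  classes = np.array([0, 0, 1, 1])
  transforms = {0: (np.eye(3), np.zeros(3)),             # identity: no change
                1: (0.9 * np.eye(3), np.full(3, 0.1))}   # mild scaling plus bias
  print(apply_mllr_means(means, classes, transforms))

Tying transforms to regression classes lets sparse adaptation data update
many Gaussians at once.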

12
SMAP
  • Model the mismatch between the SI model (x) and
    the test environment.
  • No mismatch: the normalized observations follow
    N(0, I).
  • Mismatch: they follow N(ν, η).
  • ν and η are estimated by the usual ML methods on
    the adaptation data.
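
A sketch of one standard SMAP-style formulation (my reconstruction of the
slide's missing equations): each observation x is normalized by its
mixture component, and the mismatch is modeled in the normalized domain.

  z = \Sigma^{-1/2}(x - \mu), \qquad
  z \sim \mathcal{N}(0, I) \;\;\text{(no mismatch)}, \qquad
  z \sim \mathcal{N}(\nu, \eta) \;\;\text{(mismatch)}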

13
Adaptive Training
  • Gender-dependent model selection.
  • VTLN (configured in HTK via WARPFREQ; see the
    sketch below).
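
An illustrative HTK configuration fragment for VTLN (the values are
placeholders; consult the HTK Book for the exact semantics):

  # Piecewise-linear frequency warping for VTLN
  WARPFREQ    = 1.10   # per-speaker warping factor
  WARPLCUTOFF = 300    # lower boundary frequency of the warped region (Hz)
  WARPUCUTOFF = 3300   # upper boundary frequency of the warped region (Hz)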

14
Speaker Adaptive Training
  • Assumption: there exists a compact model λ_c that
    relates to all speaker-dependent models via an
    affine transformation T (MLLR). The model and the
    transformations are found jointly using EM.
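
The joint objective can be sketched as follows (notation assumed: Y_s is
the data of training speaker s, and T_s that speaker's affine transform):

  (\hat{\lambda}_c, \{\hat{T}_s\})
  = \arg\max_{\lambda_c,\, \{T_s\}} \prod_{s=1}^{S} p\big(Y_s \mid T_s(\lambda_c)\big)

EM alternates between updating the compact model λ_c and re-estimating
the speaker transforms T_s.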

15
Cluster Adaptive Training
  • Group the speakers in the training set into
    clusters, then find the cluster closest to the
    test speaker.
  • Use canonical models (see the sketch below).
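
In one standard formulation of cluster adaptive training, each mixture
mean for a test speaker is an interpolation of the cluster-canonical
means (the weight notation w_c is mine):

  \hat{\mu}_m = \sum_{c=1}^{C} w_c\, \mu_m^{(c)}

Only the C interpolation weights need to be estimated from the test
speaker's adaptation data.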

16
Eigenvoices
  • Similar to cluster adaptive training.
  • Concatenate the means from R speaker-dependent
    models into supervectors. Perform PCA on these
    vectors. Store K << R eigenvoice vectors.
  • Form a vector of means from the SI model too.
  • Given a new speaker, the mean supervector is a
    linear combination of the SI vector and the
    eigenvoice vectors, as sketched below.
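
A minimal, illustrative eigenvoice sketch (dimensions and names are mine;
the supervectors here are random stand-ins for real speaker-dependent
means):

  import numpy as np

  R, D, K = 20, 100, 5                    # R speaker models, dim D, K << R
  supervectors = np.random.randn(R, D)    # stand-in for concatenated SD means
  si_vector = supervectors.mean(axis=0)   # stand-in for the SI supervector

  # PCA via SVD of the centered supervectors; rows of Vt are principal axes.
  U, S, Vt = np.linalg.svd(supervectors - si_vector, full_matrices=False)
  eigenvoices = Vt[:K]                    # keep the K leading eigenvoice vectors

  # A new speaker's mean supervector = SI vector + weighted eigenvoices.
  # The weights w would be estimated from adaptation data (e.g. by ML);
  # here they are illustrative values only.
  w = np.zeros(K)
  w[0] = 0.5
  new_speaker_means = si_vector + w @ eigenvoices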

17
Summary
  • Two major approaches: MAP (including EMAP) and MLLR.
  • MAP needs more data than MLLR (it uses a simple
    prior); given enough data, MAP converges to the SD
    model.
  • Adaptive training is gaining popularity.
  • For mobile applications, complexity and memory
    are major concerns.