Gaussian Mixture Language Models for Speech Recognition
1
Gaussian Mixture Language Models for Speech
Recognition
  • Mohamed Afify, Olivier Siohan and Ruhi Sarikaya

2
Introduction
  • Two issues for n-gram models
  • Generalizability and adaptability
  • Generalizability
  • Word classes / parsing
  • Measure similarity in a continuous space
  • Adaptability
  • n-gram LMs require a large number of parameters
  • Use the continuous space to reduce the number of parameters

3
Approach
  • Word
  • Word vector of M dimensions
  • New word vector of L dimensions
  • History concatenation of the previous N-1 words
  • History vector N-1 words, so M(N-1) dimensions
  • New history vector of L dimensions

4
Approach (cont.)
  • Probability density for history y given the word w: p(y|w) = Σ_k c_{w,k} N(y; μ_{w,k}, Σ_{w,k})
  • Probability of word w given history y (Bayes rule): P(w|y) = p(y|w) P(w) / Σ_{w'} p(y|w') P(w')
  • P(w) is a smoothed n-gram or smoothed clustered n-gram
  • Exponents can be used to control the dynamic ranges of the n-gram and Gaussian mixture probabilities
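The scoring rule above can be sketched in a few lines (a hypothetical toy setup: diagonal-covariance GMMs, and `alpha`/`beta` standing in for the dynamic-range exponents):

```python
import numpy as np

def gmm_logpdf(y, weights, means, variances):
    # log N(y; mu_k, diag(var_k)) for each mixture component k,
    # then log-sum-exp over components weighted by the mixture weights
    log_norm = -0.5 * (np.log(2 * np.pi * variances)
                       + (y - means) ** 2 / variances).sum(axis=1)
    return np.logaddexp.reduce(np.log(weights) + log_norm)

def word_posterior(y, gmms, ngram_prob, alpha=1.0, beta=1.0):
    # P(w | y) proportional to p(y | w)^alpha * P_ngram(w)^beta,
    # normalized over the vocabulary (Bayes rule); the exponents
    # control the dynamic ranges of the two scores
    scores = np.array([alpha * gmm_logpdf(y, *gmms[w])
                       + beta * np.log(ngram_prob[w])
                       for w in range(len(gmms))])
    scores -= scores.max()  # numerical stability before exponentiating
    probs = np.exp(scores)
    return probs / probs.sum()
```

Each vocabulary entry `w` carries its own GMM `(weights, means, variances)` over the reduced history space, and the smoothed n-gram supplies the prior `ngram_prob[w]`.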
5
Implementation
  • Word co-occurrence matrix E entry (i, j) counts how often word i follows word j
  • SVD of E, keeping 100 dimensions
  • To create a trigram history
  • The two preceding word vectors are stacked to form a 200-d vector
  • LDA + MLLT
  • Reduce dimensionality to 50
  • GMM training
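The first two steps of this pipeline can be sketched as follows (a toy corpus and k=2 stand in for the real training data and the 100-dim SVD; the LDA+MLLT reduction to 50 dimensions is omitted):

```python
import numpy as np

# Toy corpus standing in for the real training data
corpus = ["the cat sat", "the dog sat", "a cat ran"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence matrix E: E[i, j] counts word i following word j
E = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for prev, cur in zip(words, words[1:]):
        E[idx[cur], idx[prev]] += 1

# SVD of E; keep the top-k left-singular directions as word vectors
k = 2  # the slides keep 100
U, S, _ = np.linalg.svd(E, full_matrices=False)
word_vecs = U[:, :k] * S[:k]  # one k-dim vector per word

# Trigram history: stack the two preceding word vectors (2k dims,
# i.e. 200-d for k = 100); LDA+MLLT would then reduce this to 50
history = np.concatenate([word_vecs[idx["the"]], word_vecs[idx["cat"]]])
```

The GMMs from the previous slide are then trained on these reduced history vectors, one mixture per vocabulary word.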

6
Experimental results
  • 5-best rescoring

7
A discriminative training framework using n-best
speech recognition transcriptions and scores for
spoken utterance classification
  • Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alex
    Acero

8
Introduction
  • Conventionally, a two-phase approach is adopted for the SUC (spoken utterance classification) task
  • ASR transcription
  • Semantic classification
  • It has been reported that reductions in WER (word error rate) do not necessarily translate into reductions in CER (classification error rate)
  • A novel discriminative training framework for jointly learning the language model and the classification model is proposed

9
DT framework Using the N-best Lists
  • As long as enough words are recognized to trigger the correct salient phrase, the correct meaning is assigned to the utterance
  • A maximum-entropy (ME) classifier is used
  • A joint association score combines the ASR recognition score with the classifier score
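A sketch of how such a joint score could be formed over an N-best list (a hypothetical representation: each hypothesis is a feature vector plus an ASR log score, and `kappa` weights the ASR evidence against the classifier posterior):

```python
import numpy as np

def me_posteriors(features, lam):
    # Maximum-entropy classifier: P(c | s) = softmax of per-class
    # linear scores lam[c] . features
    scores = lam @ features
    scores -= scores.max()  # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def joint_association_scores(nbest, lam, kappa=1.0):
    # For each hypothesis (feature vector, ASR log score) in the
    # N-best list, combine recognition and classification evidence:
    #   g(s, c) = kappa * log p_ASR(s) + log P(c | s)
    return np.array([kappa * asr_log + np.log(me_posteriors(f, lam))
                     for f, asr_log in nbest])

def pick(nbest, lam):
    # The (sentence, class) pair with the highest joint score wins
    g = joint_association_scores(nbest, lam)
    s, c = np.unravel_index(np.argmax(g), g.shape)
    return s, c
```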

10
DT framework Using the N-best Lists (cont.)
  • The sentence most likely to yield the correct class is first extracted from the N-best list based on its joint association score
  • The remaining sentences in the N-best list are then assigned to the competing classes
  • This assignment of sentences to classes is an effective mechanism for discriminating the sentence most likely to yield the correct class from those more likely to yield the wrong classes

11
DT framework Using the N-best Lists (cont.)
  • Discriminant function and loss function
  • A smooth approximation of the loss
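One common way to realize these two pieces is an MCE-style formulation (a sketch under that assumption, not necessarily the paper's exact functional forms):

```python
import numpy as np

def misclassification_measure(g_correct, g_competitors, eta=1.0):
    # Discriminant: correct-class joint score versus a soft maximum
    # over the competing-class scores; negative when the correct
    # class wins, positive when it loses
    comp = np.log(np.mean(np.exp(eta * np.asarray(g_competitors)))) / eta
    return -g_correct + comp

def smooth_loss(d, alpha=1.0):
    # Sigmoid approximation of the 0-1 loss: near 0 when the correct
    # class wins by a margin, near 1 when it loses, and differentiable
    # so the LM and classifier parameters can be trained by gradients
    return 1.0 / (1.0 + np.exp(-alpha * d))
```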

12
DT framework Using the N-best Lists (cont.)
  • Assignment of class

13
DT framework Using the N-best Lists (cont.)
  • Discriminative training (DT) of the LM parameters
  • DT of the classifier parameters

14
Experimental Results

15
Conclusions
  • A new discriminative training framework for spoken utterance classification was proposed
  • The use of N-best transcriptions is motivated by the fact that the same class is often associated with many variants of spoken utterances