ICASSP 2005 Survey: Discriminative Training (6 papers)

1
ICASSP 2005 Survey: Discriminative Training (6
papers)
  • Presenter: Jen-Wei Kuo

2
Outline
  • Adaptation of Precision Matrix Models on Large
    Vocabulary Continuous Speech Recognition
    Cambridge University
  • Discriminative Training of CDHMMs for Maximum
    Relative Separation Margin York University
  • Statistical Performance Analysis of MCE/GPD
    Learning in Gaussian Classifiers and Hidden
    Markov Models BBN
  • Discriminative Training of Acoustic Models
    Applied to Domains with Unreliable Transcripts
    JHU
  • Minimum Classification Error for Large Scale
    Speech Recognition Tasks using Weighted Finite
    State Transducers NTT
  • Discriminative Training based on the Criterion of
    Least Phone Competing Tokens for Large Vocabulary
    Speech Recognition Microsoft

3
Discriminative Training of CDHMMs for Maximum
Relative Separation Margin
  • Chaojun Liu, Hui Jiang, Xinwei Li
  • York University, Canada
  • ICASSP05 - Discriminative Training
  • Presenter: Jen-Wei Kuo

4
Reference
  • Large Margin HMMs for Speech Recognition
  • Xinwei Li, Hui Jiang, Chaojun Liu
  • York University, Canada
  • ICASSP05 - Speech and Audio Processing
    Applications session

5
Large Margin Estimation (LME) of HMM
The constraint cannot guarantee the existence of a
solution.
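
The slide only names the criterion; for orientation, a sketch of the standard LME formulation (notation is mine, not taken from the slide):

    % Separation margin of training token X_i with correct transcription W_i
    d(X_i) = \mathcal{F}(X_i \mid \lambda_{W_i})
             - \max_{W \ne W_i} \mathcal{F}(X_i \mid \lambda_W)

    % LME: maximize the minimum margin over the support token set S,
    % subject to every support token being correctly classified
    \tilde{\lambda} = \arg\max_{\lambda} \min_{X_i \in S} d(X_i),
    \qquad \text{subject to } d(X_i) > 0 \;\; \forall X_i \in S

The positivity constraint is presumably the one the slide refers to: nothing guarantees that a model exists under which every support token in S has a positive margin.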
6
Iterative Localized Optimization
  • Step 1. Based on the current model, choose the
    support token that satisfies the above constraints
    and gives the minimum margin.
  • Step 2. Update the model using GPD (see the
    sketch below).
  • Step 3. If the convergence conditions are not met,
    go to Step 1.
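
A minimal sketch of this loop, assuming hypothetical helpers margin_fn (separation margin of a token under the current model) and gpd_update_fn (one GPD step); neither name comes from the paper:

    def iterative_localized_optimization(model, tokens, margin_fn, gpd_update_fn,
                                         epsilon=1.0, max_iters=20):
        """Sketch only: tokens are assumed hashable utterance ids."""
        for _ in range(max_iters):
            # Step 1: support tokens = tokens with a small positive margin
            margins = {tok: margin_fn(model, tok) for tok in tokens}
            support = [tok for tok, m in margins.items() if 0.0 < m < epsilon]
            if not support:
                break
            worst = min(support, key=lambda tok: margins[tok])
            # Step 2: GPD update driven by the minimum-margin support token
            model = gpd_update_fn(model, worst)
            # Step 3: here convergence is just a fixed iteration budget
        return model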

7
Experimental Results
  • English E-set vocabulary of OGI ISOLET database

8
Experimental Results
9
Experimental Results
10
Large Relative Margin Estimation (LRME) of HMM
11
Large Relative Margin Estimation (LRME) of HMM
12
Experimental Results
  • English E-set vocabulary of OGI ISOLET database
    and Alphabet set

13
Experimental Results
14
Experimental Results
15
Conclusion
  • Main Concept
  • Criterion
  • Maximum Large Margin
  • Maximum Large Relative Margin
  • Support token
  • An utterance which has a relatively small
    positive margin

16
Discriminative Training of Acoustic Models
Applied to Domains with Unreliable Transcripts
  • Lambert Mathias
  • Girija Yegnanarayanan, Juergen Fritsch
  • JHU
  • Multimodal Technologies, Inc.
  • ICASSP05 - Discriminative Training
  • Presenter: Jen-Wei Kuo

17
Introduction
  • This paper presents a method for the automatic
    generation of transcripts from medical reports.
  • Medical Domain
  • Unlimited amount of speech data available for
    each speaker
  • These speech data have no verbatim transcripts,
    only final reports
  • Medical final reports
  • Made by physicians and other healthcare
    professionals
  • Grammatical error corrections
  • Removal of disfluencies and repetitions
  • Addition of nondictated sentence and paragraph
    boundaries
  • Rearranged order of dictated paragraphs
  • Can still be exploited as an information source
    for generating training transcripts

18
Introduction
  • Central idea of this paper
  • Step1. Transform the reports into spoken-form
    transcripts (Partially Reliable Transcripts,
    PRT)
  • Step2. Identify reliable regions in the
    transcripts
  • Step3. Apply ML/MMI acoustic training
  • Propose an approach of frame-based filtering for
    lattice-based MMI
  • Step4. The results show that MMI outperforms ML

19
Partially Reliable Transcripts
  • Step1. Normalize the medical reports to a common
    format
  • Step2. Generate a report-specific FSG for each of
    the available medical reports
  • Step3. Use the normalized medical reports to
    train an LM
  • Step4. Generate the orthographic transcripts
    using the LM and the best AM
  • Step5. Annotate the orthographic transcripts by
    aligning them against the corresponding
    report-specific FSG
  • Step6. Parse the orthographic transcripts using
    the report-specific FSG with a robust parser that
    allows for INS, DEL and SUB
  • Step7. If a word is an INS, DEL or SUB, mark the
    frames of its underlying phone sequence as
    unreliable; otherwise mark them as reliable (see
    the sketch after this list)
  • Step8. Use the reliable segments to retrain the
    AMs
  • Step9. Go to Step4.
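
The frame marking in Steps 6-7 can be pictured with a small sketch; the alignment layout and parse tags below are illustrative assumptions, not the paper's data format:

    def mark_frame_reliability(aligned_words, num_frames):
        """aligned_words: list of (word, start_frame, end_frame, parse_tag),
        where parse_tag is 'MATCH', 'INS', 'DEL' or 'SUB' from the robust
        parse against the report-specific FSG. Returns a per-frame mask."""
        reliable = [False] * num_frames
        for word, start, end, tag in aligned_words:
            ok = (tag == 'MATCH')  # INS/DEL/SUB frames stay unreliable
            for t in range(start, min(end + 1, num_frames)):
                reliable[t] = ok
        return reliable

    # Only frames with reliable[t] == True would contribute to retraining (Step 8).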

20
MMI Training with Frame Filtering
  • Approach 1
  • Step1. Mark each arc on the MMI training lattices
    as RELIABLE or UNRELIABLE
  • Step2. Counts (num and den) are then accumulated
    only on the RELIABLE arcs
  • Approach 2 (Frame Filtering)
  • Step1. Mark each frame as reliable or
    unreliable
  • Step2. Allow for inclusion of partially reliable
    words in the training
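
A toy sketch of the frame-filtering idea in Approach 2, assuming per-frame numerator and denominator occupancies are already available (names and data layout are illustrative, not the paper's):

    import numpy as np

    def accumulate_mmi_stats(features, num_gamma, den_gamma, reliable):
        """features: (T, D) array; num_gamma/den_gamma: dicts state -> (T,)
        occupancy arrays; reliable: (T,) boolean mask from transcript filtering."""
        stats = {}
        for state in set(num_gamma) | set(den_gamma):
            g_num = num_gamma.get(state, np.zeros(len(features)))
            g_den = den_gamma.get(state, np.zeros(len(features)))
            g = (g_num - g_den) * reliable   # unreliable frames contribute nothing
            occ = float(g.sum())
            first = g @ features             # weighted sum of feature vectors
            stats[state] = (occ, first)
        return stats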

21
Experimental Results
22
Experimental Results
23
Minimum Classification Error for Large Scale
Speech Recognition Tasks using Weighted Finite
State Transducers
  • Erik McDermott and Shigeru Katagiri
  • NTT Communication Science Laboratories
  • ICASSP05 - Discriminative Training
  • Presenter: Jen-Wei Kuo

24
Introduction
  • Special features highlighted in this paper
  • MCE training with Quickprop optimization
  • SOLON WFST-based recognizer (designed by NTT)
  • It uses a time-synchronous beam search strategy
    and has been applied to LMs with vocabularies of
    up to 1.8 million words
  • Context-dependent model design using decision
    tree
  • Corpus of Spontaneous Japanese (CSJ) lecture
    speech transcription task (about 190 hrs)
  • Name recognition on 22k names
  • Word recognition on 30k words

25
Corpus for Name Recognition
  • Name Recognition (40 hrs from CSJ)
  • 35500 utterances (39 hrs) for training
  • Contain 22320 names (16547 family names and 5744
    given names)
  • 6428 utterances for testing
  • Contain OOVs
  • WFST
  • Weight-pushing, Network Optimization
  • 489756 nodes
  • 1349430 arcs

26
WFST Recognizer
  • Four strategies to generate denominator
    statistics for MCE training
  • Triphone-Loop
  • Like free syllable recognition in Mandarin
  • Bigram triphone LM
  • Full-WFST LM + Flat Transcripts
  • Full 22k LM (22,320 names in vocabulary)
  • Represent the transcription as a WFST obtained by
    composing the full WFST with the transcribed word
    sequence (see the toy sketch after this list)
  • Lattice-WFST + Flat Transcripts
  • The lattice is first generated by the MLE-trained
    model
  • Faster than Full-WFST (800 arcs on average vs.
    1349430 arcs)
  • Lattice-WFST + Rich Transcripts
  • Add all possible fillers into the transcription
    grammar
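
For the "Full-WFST LM + Flat Transcripts" strategy, composing the full WFST with a linear transcription acceptor can be pictured with a pure-Python toy (illustrative only; final-state bookkeeping is omitted and this is not NTT's SOLON code):

    from collections import deque

    def compose_with_transcript(transcript, arcs, start):
        """Restrict a WFST to paths whose input labels spell `transcript`.
        arcs: dict state -> list of (in_label, out_label, weight, next_state);
        '<eps>' input labels consume no transcript word."""
        composed, seen, queue = [], {(start, 0)}, deque([(start, 0)])
        while queue:
            state, pos = queue.popleft()
            for in_lab, out_lab, w, nxt in arcs.get(state, []):
                if in_lab == '<eps>':
                    nxt_pos = pos
                elif pos < len(transcript) and in_lab == transcript[pos]:
                    nxt_pos = pos + 1
                else:
                    continue
                composed.append(((state, pos), in_lab, out_lab, w, (nxt, nxt_pos)))
                if (nxt, nxt_pos) not in seen:
                    seen.add((nxt, nxt_pos))
                    queue.append((nxt, nxt_pos))
        return composed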

27
Experimental Results
28
Experimental Results
29
Experimental Results
  • Use of Lp norm and N-best incorrect candidates
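
For context (standard MCE notation, not taken from the slides), the misclassification measure with an Lp-style soft-max over the N best incorrect candidates and its sigmoid loss are usually written as:

    % soft-max over the N-best incorrect candidates j != k
    d_k(X;\Lambda) = -g_k(X;\Lambda)
        + \log\Big[\frac{1}{N}\sum_{j \ne k} \exp\big(\eta\, g_j(X;\Lambda)\big)\Big]^{1/\eta}

    % smoothed 0-1 loss used for gradient-based (e.g. Quickprop) optimization
    \ell\big(d_k(X;\Lambda)\big) = \frac{1}{1+\exp\big(-\gamma\, d_k(X;\Lambda)+\theta\big)}

As eta grows, the soft-max approaches the score of the single best incorrect candidate; smaller eta spreads the weight over the N-best list.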

30
Word Recognition
  • Word Recognition Corpus and Exp. Results
  • 154000 utterances (190 hrs) for training
  • 10 lecture speeches and 130 minutes in total
  • 30k words in vocabulary
  • WFST
  • Trigram LM
  • 6138702 arcs
  • MCE Training
  • Beam search with unigram (about 3-5x RT)
  • 494845 arcs

31
Discriminative Training based on the Criterion of
Least Phone Competing Tokens for Large Vocabulary
Speech Recognition
  • Bo Liu12, Hui Jiang3, Jian-Lai Zhou1, Ren-Hua
    Wang2
  • 1Microsoft Research Asia
  • 2University of Science and Technology of China
  • 3York University
  • ICASSP05 - Discriminative Training
  • Presenter: Jen-Wei Kuo

32
Reference
  • A Dynamic In-Search Discriminative Training
    Approach for Large Vocabulary Speech Recognition
  • Hui Jiang, Olivier Siohan, Frank K. Soong,
    Chin-Hui Lee
  • Bell Labs, Lucent Technologies
  • ICASSP02 Discriminative Training in Speech
    Recognition session

33
Competing Token Collection
  • For each frame t
  • For each active word arc w
  • Perform backtrace to obtain the partial path
  • HMM alignment
  • For each HMM m
  • Calculate the overlap rate
  • If overlap rate < threshold and Likelihood(m) <
    Likelihood(Ref)
  • Then m is collected as a competing token
  • End
  • End
  • End
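
A rough sketch of the in-search collection loop above; the decoder/reference helpers are hypothetical stand-ins, and the comparison directions simply mirror the slide's wording:

    def collect_competing_tokens(frames, decoder, reference, overlap_thr):
        competing = []
        for t in frames:
            for arc in decoder.active_word_arcs(t):            # active word arcs at frame t
                path = decoder.backtrace(arc, t)                # partial path ending in this arc
                for hmm, seg in decoder.hmm_alignment(path):    # HMM-level segmentation
                    ref_hmm, ref_seg = reference.overlapping(seg)
                    rate = overlap_rate(seg, ref_seg)
                    if rate < overlap_thr and \
                       decoder.likelihood(hmm, seg) < decoder.likelihood(ref_hmm, ref_seg):
                        competing.append((hmm, seg))
        return competing

    def overlap_rate(seg_a, seg_b):
        """Fraction of seg_a covered by seg_b; segments are (start, end) frame pairs."""
        (a0, a1), (b0, b1) = seg_a, seg_b
        inter = max(0, min(a1, b1) - max(a0, b0) + 1)
        return inter / (a1 - a0 + 1)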

34
Experimental Results
  • Corpus
  • DARPA Communicator task (Travel Reservation
    Application)

35
Introduction
  • Discriminative Criterion at the Phone Level
  • Least Phone Competing Tokens Criterion (LPCT)
  • Given speech segment O and phone a
  • Competing Token (CT)
  • True Token (TT)

36
Off-line Token Collection
  • Discriminative Criterion at the Phone Level
  • True Token (TT)
  • First, forced alignment is performed.
  • Every segment in the reference is treated as a
    TT.
  • Competing Token (CT)
  • Generate the word lattice.
  • At each word arc, phone boundaries are annotated.
  • Decide whether each phone arc is a CT (see the
    sketch after this list)
  • 1. max overlap with the same phone in the
    reference > threshold
  • 2. log-likelihood difference > threshold
  • 3. add the phone arc (segment and phone id) to
    the CT set
  • LPCT = Token Collection + MCE/GPD
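
The off-line selection rules 1-3 can be sketched as follows; the arc/reference representations, thresholds, and the exact quantity compared in rule 2 are assumptions that just mirror the slide's wording:

    def select_competing_tokens(lattice_phone_arcs, reference_segs,
                                overlap_thr, llr_thr):
        """lattice_phone_arcs: list of (phone, start, end, loglik);
        reference_segs: list of (phone, start, end, loglik) from forced
        alignment (the TTs). Returns the collected competing tokens."""
        competing = []
        for phone, start, end, loglik in lattice_phone_arcs:
            same = [r for r in reference_segs if r[0] == phone]
            if not same:
                continue
            best = max(same, key=lambda r: overlap(start, end, r[1], r[2]))
            # rule 1: max overlap with a same-phone reference segment > threshold
            if overlap(start, end, best[1], best[2]) <= overlap_thr:
                continue
            # rule 2: log-likelihood difference against that segment > threshold
            if loglik - best[3] <= llr_thr:
                continue
            # rule 3: keep the phone arc (segment and phone id) as a CT
            competing.append((phone, start, end))
        return competing

    def overlap(a0, a1, b0, b1):
        return max(0, min(a1, b1) - max(a0, b0) + 1)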

37
Least Phone Competing Tokens Criterion (LPCT)
  • Experimental Results
  • Resource Management database

38
Least Phone Competing Tokens Criterion (LPCT)
39
Experimental Results
  • Switchboard database

40
Experimental Results
41
Adaptation of Precision Matrix Models on Large
Vocabulary Continuous Speech Recognition
  • K. C. Sim and M. J. F. Gales
  • University of Cambridge
  • ICASSP05 - Discriminative Training
  • Presenter: Jen-Wei Kuo

42
Background for Precision Modeling
  • Problem
  • How to model the correlations between feature
    dimensions without greatly increasing the number
    of model parameters
  • Solution
  • An approximate diagonal covariance matrix is
    employed
  • Structured precision matrix approximations → SPAM
    model
  • Rank-1 bases:
  • n = d → STC model
  • d < n < d(d+1)/2 → EMLLT model
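
A small numerical sketch of the basis expansion behind these models (my reading of the usual SPAM/STC/EMLLT relationship, not code from the paper):

    import numpy as np

    def precision_from_basis(lambdas, basis):
        """SPAM-style precision matrix: P = sum_i lambda_i * S_i with symmetric
        bases S_i. With rank-1 bases S_i = a_i a_i^T, n = d gives STC and
        d < n < d(d+1)/2 gives EMLLT."""
        return sum(l * S for l, S in zip(lambdas, basis))

    # toy example: rank-1 bases from random directions (d = 3, n = 3, STC-like)
    d, n = 3, 3
    rng = np.random.default_rng(0)
    dirs = rng.standard_normal((n, d))
    basis = [np.outer(a, a) for a in dirs]           # rank-1 symmetric bases
    lambdas = np.abs(rng.standard_normal(n)) + 0.1   # positive weights
    P = precision_from_basis(lambdas, basis)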

43
Research Progress