
Title: MultiPass Pronunciation Adaptation


1
Multi-Pass Pronunciation Adaptation
  • Nathan Bodenstab, Summer Intern 2006

2
The Problem
  • Word pronunciations (prons) in a lexicon can be
  • Incorrect
  • Consistently mispronounced by speakers
    (especially names)
  • Using transcribed acoustic data, we want to
    correct these prons and increase recognition
    accuracy, i.e., solve
    $\hat{B} = \arg\max_i P(B_i \mid X, A) = \arg\max_i P(X \mid B_i)\, P(B_i \mid A)$
  • X: acoustic data
  • A: lexicon entry / canonical pron (language
    model)
  • B_i: i-th pron candidate
  • Example: Stephan Granger (Auto Attendant), Nuance
    phonemes (similar to CPA)
  • Lexicon entries (canonical prons)
  • s t E i f v @ I n
  • g r e n dZ @r
  • Prons from acoustic data
  • s t @ I f A n
  • g r A o n dZ i e @r

The prons found in the acoustic data may be multi-modal.
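Given the definitions of X, A, and B_i above, the selection step can be sketched as picking the candidate that maximizes the acoustic log-likelihood log P(X | B_i) plus a weighted log prior log P(B_i | A). This is a minimal sketch under stated assumptions: the scores are taken as already computed, and the name `best_pron`, the `lm_weight` knob, and every number in the demo are hypothetical. In PronLearn itself the search runs inside a recognizer over an FSM, not over an explicit candidate list.

```python
import math
from typing import Dict, Sequence, Tuple

Pron = Tuple[str, ...]


def best_pron(
    candidates: Sequence[Pron],
    acoustic_logprob: Dict[Pron, float],  # log P(X | B_i), e.g. from a recognizer
    prior_logprob: Dict[Pron, float],     # log P(B_i | A), derived from the lexicon entry
    lm_weight: float = 1.0,               # hypothetical knob scaling the prior
) -> Pron:
    """Return the candidate maximizing log P(X|B_i) + lm_weight * log P(B_i|A)."""
    return max(candidates, key=lambda b: acoustic_logprob[b] + lm_weight * prior_logprob[b])


if __name__ == "__main__":
    canonical = ("g", "r", "e", "n", "dZ", "@r")
    variant = ("g", "r", "A", "o", "n", "dZ", "i", "e", "@r")
    # Made-up scores, for illustration only.
    acoustic = {canonical: -120.0, variant: -110.0}
    prior = {canonical: math.log(0.9), variant: math.log(0.1)}
    print(best_pron([canonical, variant], acoustic, prior))                  # variant: acoustics win
    print(best_pron([canonical, variant], acoustic, prior, lm_weight=10.0))  # canonical: prior wins
```

Raising `lm_weight` shifts the decision toward the lexicon entry, which is the same acoustics-versus-lexicon trade-off the later slides tune through FSM weights.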
3
Prior Work
  • Dragon's MREC pron-guessing (Drew Lowry) and
    dictionary checker (Paul Vozila).
  • Learns graphoneme n-grams (grapheme / phoneme
    pairs) using an HMM. Passes acoustic data
    through the n-best results to find best pron(s).
  • Blue Nuance Autopron (Francoise Beaufays, Ananth
    Sankar, Mitch Weintraub)
  • Forced alignment of utterance to best dictionary
    pron. Finds the worst phoneme match and replaces
    it with alternative phonemes. Passes acoustic data
    through the resulting prons plus a linguistic
    prior to find best pron(s).
  • Speechworks 6.5 LEARN (Mark Fanty and Krishna
    Govindarajan)
  • Start with dictionary phone graph (FSM) and
    augment with learned phone variations. Pass
    acoustic data through new phone graph to find
    best pron(s).

4
PronLearn - New Pron Learning Algorithm
  • Summer Project Goals
  • Stand-alone tool to correct sub-optimal prons and
    be compatible with both Quantum and OSR
  • PronLearn Algorithm Outline
  • Input a set of transcribed audio files (e.g., 25
    utterances of Stephan Granger)
  • Pass 1
  • Create a weighted FSM of possible prons for the
    utterance
  • Run each audio file through the FSM and record
    its preferred path
  • Pass 2
  • Take the top X phoneme distortions from Pass 1
    and build a new (non-weighted) FSM
  • Re-run each audio file through the new FSM and
    record the preferred path
  • Pass 3 (if recognition results aren't clustered
    well)
  • Repeat Pass 2 using new preferred phoneme
    distortions
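The three-pass control flow in this outline can be sketched as a small driver. This is a hypothetical sketch, not the PronLearn code: each FSM is flattened to the set of prons it accepts, `recognize` stands in for a call into OSR or Phantom, and the "clustered well" test (`is_clustered`, top pron covering at least half the utterances) is an assumed criterion, since the slides do not define one.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

Pron = Tuple[str, ...]
# Hypothetical recognizer interface: given one utterance and the prons an FSM
# allows, return the preferred path. A real run would call OSR or Phantom here.
Recognizer = Callable[[object, Set[Pron]], Pron]


def is_clustered(prons: Sequence[Pron], min_share: float = 0.5) -> bool:
    """Assumed criterion: the most frequent pron covers at least min_share of utterances."""
    top_count = Counter(prons).most_common(1)[0][1]
    return top_count / len(prons) >= min_share


def pron_learn(
    utterances: Sequence[object],
    pass1_fsm: Set[Pron],
    build_next_fsm: Callable[[List[Pron]], Set[Pron]],  # top-X distortions -> new FSM
    recognize: Recognizer,
    max_passes: int = 3,
) -> Counter:
    """Run Pass 1, then Pass 2, then Pass 3 only if the Pass 2 results are not clustered."""
    # Pass 1: every utterance through the (weighted) Pass 1 FSM.
    prons = [recognize(u, pass1_fsm) for u in utterances]
    for _ in range(max_passes - 1):
        # Pass 2 (and possibly Pass 3): rebuild the FSM from the latest preferred paths.
        fsm = build_next_fsm(prons)
        prons = [recognize(u, fsm) for u in utterances]
        if is_clustered(prons):
            break
    return Counter(prons)
```

Keeping the recognizer and FSM builder as parameters separates the pass logic from the engine, so the same loop could in principle drive either Quantum or OSR.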

5
PronLearn Pass 1
  • Pass 1 example: Stephen
  • Initialize weighted FSM with canonical pron(s)
  • Add phoneme substitutions, deletions, and
    insertions with learned weights P(new_phone |
    canonical_phone)
  • Run utterances through weighted FSM and retrieve
    each preferred path
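One way to picture the Pass 1 FSM is a linear chain with one state per canonical-phone boundary: each canonical phone gets an arc for itself plus arcs for its substitutions and deletions (EPS), and insertions become self-loops. The sketch below assumes that layout, a weight of 1.0 on every canonical arc, and a nested dict `sub_probs[canonical][new]` of learned probabilities; `build_pass1_fsm` and `Arc` are hypothetical names, and a real system would hand such arcs to its own FSM tools.

```python
from typing import Dict, List, NamedTuple, Sequence

EPS = "EPS"  # epsilon: a deleted canonical phone, or the source side of an insertion


class Arc(NamedTuple):
    src: int
    dst: int
    phone: str     # phone emitted on this arc; EPS means nothing is emitted
    weight: float  # P(new_phone | canonical_phone); 1.0 on canonical arcs here


def build_pass1_fsm(
    canonical: Sequence[str],
    sub_probs: Dict[str, Dict[str, float]],
) -> List[Arc]:
    """Linear-chain weighted FSM: state i precedes canonical phone i; final state is len(canonical)."""
    arcs: List[Arc] = []
    for i, c in enumerate(canonical):
        arcs.append(Arc(i, i + 1, c, 1.0))  # keep the canonical phone (assumed weight 1.0)
        for new, p in sub_probs.get(c, {}).items():
            if new != c and p > 0.0:
                # new == EPS is a deletion of c; any other phone is a substitution.
                arcs.append(Arc(i, i + 1, new, p))
    # Insertions: phone-emitting self-loops weighted by P(new | EPS); the
    # recognizer's path scores keep the number of insertions in check.
    for i in range(len(canonical) + 1):
        for new, p in sub_probs.get(EPS, {}).items():
            if p > 0.0:
                arcs.append(Arc(i, i, new, p))
    return arcs
```

For example, with canonical `s t E f @ n` and `sub_probs = {"f": {"v": 0.3}, "@": {"I": 0.2, EPS: 0.1}, EPS: {"@": 0.05}}`, the chain gets 6 canonical arcs, 3 substitution/deletion arcs, and 7 insertion self-loops.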

6
PronLearn Pass 1
  • Substitution probabilities P(new_phone |
    canonical_phone) are estimated from a
    linguist-generated lexicon
  • Align alternate prons of a single word with a
    dynamic programming alignment algorithm (as in
    spelling correction: shortest edit distance)
  • EPS is used to represent insertions and deletions
  • Example
  • / s t E f @ n /
  • / t E v I n / → (s,EPS) (f,v) and (@,I)
  • Phoneme differences between prons are tallied, and
    relative frequency counts are used to estimate
    probabilities
  • Adding more context, i.e., P(new | prev,
    canonical, next)
  • Didn't cause a sufficient improvement in accuracy
  • Keeping the context-independent model simplifies
    hand-crafted estimation to a phone similarity
    confusion matrix when no data is available
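The alignment and relative-frequency estimate described above can be sketched with a standard unit-cost edit-distance alignment. Assumptions: each ordered pair of alternate prons of a word is aligned with the first one treated as canonical, identity pairs are counted so that each P(new | canonical) row sums to 1, and `align` / `estimate_sub_probs` are hypothetical names. On the slide's example, the non-identity pairs come out as (s,EPS), (f,v), (@,I).

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

EPS = "EPS"  # marks the empty side of an insertion or deletion


def align(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Unit-cost minimum edit distance alignment, returned as (a_phone, b_phone) pairs."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # d[i][j]: distance of a[:i] vs b[:j]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                           # delete a[i-1]
                d[i][j - 1] + 1,                           # insert b[j-1]
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match or substitute
            )
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:  # trace one optimal path back to the origin
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], EPS))
            i -= 1
        else:
            pairs.append((EPS, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def estimate_sub_probs(lexicon: Dict[str, List[Sequence[str]]]) -> Dict[str, Dict[str, float]]:
    """Relative-frequency estimate of P(new | canonical) from each word's alternate prons."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prons in lexicon.values():
        for canon, alt in permutations(prons, 2):  # each ordered pair; first is "canonical"
            for c, new in align(canon, alt):
                counts[c][new] += 1                # identity pairs are counted too
    return {
        c: {new: k / sum(row.values()) for new, k in row.items()}
        for c, row in counts.items()
    }
```

Because identity pairs are included, a phone that is rarely altered keeps most of its probability mass on itself, which is what lets the Pass 1 FSM stay close to the canonical pron by default.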

7
PronLearn Pass 1: FSM Weight
  • We can control the balance between the acoustic
    and the language model contributions
  • Phoneme substitution, deletion, and insertion
    weights can be modified to
  • Favor acoustics: P(new_phone | canonical_phone)
    → 1.0
  • Favor the canonical pron (LM): P(new_phone |
    canonical_phone) → 0.0
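The slides fix only the two extremes of this bias. One assumed one-parameter family that reaches both ends, and leaves the learned probabilities unchanged at the midpoint, is w = P^((1-b)/b) for an acoustic bias b in (0, 1], with w = 0 at b = 0. This mapping and the names `biased_weight` and `reweight` are illustrative assumptions, not PronLearn's actual weighting.

```python
from typing import Dict


def biased_weight(p: float, acoustic_bias: float) -> float:
    """Map a learned P(new | canonical) to an arc weight under an assumed bias knob.

    acoustic_bias = 1.0 -> 1.0 (all paths weighted equally: favor acoustics)
    acoustic_bias = 0.5 -> p   (learned probability unchanged)
    acoustic_bias = 0.0 -> 0.0 (only the canonical pron survives: favor the LM)
    """
    if not 0.0 <= acoustic_bias <= 1.0:
        raise ValueError("acoustic_bias must be in [0, 1]")
    if acoustic_bias == 0.0:
        return 0.0
    return p ** ((1.0 - acoustic_bias) / acoustic_bias)


def reweight(sub_probs: Dict[str, Dict[str, float]], acoustic_bias: float) -> Dict[str, Dict[str, float]]:
    """Apply biased_weight to every non-canonical entry; identity arcs stay at 1.0."""
    return {
        c: {new: 1.0 if new == c else biased_weight(p, acoustic_bias) for new, p in row.items()}
        for c, row in sub_probs.items()
    }
```

Because the exponent shrinks toward 0 as b approaches 1 and grows without bound as b approaches 0, a single scalar sweeps the FSM from "every phone sequence is equally likely" to "canonical pron only".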

10
PronLearn Problems
  • Why is one pass not enough?
  • We have a tuning parameter to shift bias between
    favoring the utterance acoustics or favoring the
    dictionary phonemes
  • PronLearn solution: first learn which phoneme
    substitutions are acoustically popular (Pass 1),
    then limit the paths through the FSM to only
    these substitutions (Pass 2)

Favor acoustics: weights all phoneme sequences
equally (as in voice enrollment). Recognized
prons vary widely; there is no clustered group of new prons.
Favor dictionary: a heavy bias towards dictionary
prons clusters the new pron results, but does not
allow much deviation.
But we want both!
11
PronLearn Pass 2
  • Pass 1: favor acoustics (low dictionary-pron
    bias)
  • Pass 2
  • Extract the top X substitutions from the Pass 1
    prons and build an unweighted FSM
  • Re-run utterances through the new Pass 2 FSM and
    record the preferred prons
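Pass 2 can be sketched as follows: align each Pass 1 pron against the canonical pron, count position-specific distortions (substitutions, deletions, insertions), keep the top X, and enumerate the prons that the resulting unweighted FSM accepts. The position-specific counting, the insertion slots, and the names `distortions` and `build_pass2_prons` are assumptions for illustration; the slides do not say whether distortions are tied to positions.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Set, Tuple

EPS = "EPS"
Pron = Tuple[str, ...]


def align(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Unit-cost minimum edit distance alignment as (a_phone, b_phone) pairs."""
    n, m = len(a), len(b)
    # First row/column hold the distance to an empty prefix.
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], EPS))
            i -= 1
        else:
            pairs.append((EPS, b[j - 1]))
            j -= 1
    return pairs[::-1]


def distortions(canonical: Pron, pron: Pron) -> List[Tuple[int, str, str]]:
    """(position, canonical phone or EPS, new phone or EPS) for each edit.

    Insertions are keyed to the index of the next canonical phone."""
    out: List[Tuple[int, str, str]] = []
    pos = 0
    for c, new in align(canonical, pron):
        if c != new:
            out.append((pos, c, new))
        if c != EPS:
            pos += 1
    return out


def build_pass2_prons(canonical: Pron, pass1_prons: Sequence[Pron], top_x: int) -> Set[Pron]:
    """Keep the top_x distortions and enumerate every pron the unweighted FSM accepts."""
    counts = Counter(d for p in pass1_prons for d in distortions(canonical, p))
    ins: List[Set[Pron]] = [{()} for _ in range(len(canonical) + 1)]  # slot before phone k, plus end
    sub: List[Set[Pron]] = [{(c,)} for c in canonical]                # canonical phone always allowed
    for (pos, c, new), _ in counts.most_common(top_x):
        if c == EPS:
            ins[pos].add((new,))
        else:
            sub[pos].add(() if new == EPS else (new,))                # () encodes a deletion
    slots: List[Set[Pron]] = []
    for k in range(len(canonical)):
        slots += [ins[k], sub[k]]
    slots.append(ins[-1])
    # Each combination picks one option per slot; concatenating them gives one path.
    return {tuple(ph for piece in combo for ph in piece) for combo in product(*slots)}
```

Enumerating paths is only practical because Pass 2 keeps just X distortions; that restriction is exactly what clusters the recognized prons compared with the wide-open Pass 1 FSM.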

12
PronLearn Three-Pass Examples
Most frequent pron is new
13
PronLearn Pass 3
  • If we want to add at most n new prons, we can
    either take the top n prons from Pass 2, or
  • Pass 3: build a new FSM with only the dictionary
    prons and the n best new prons
  • This forces every utterance to choose which of
    the dictionary prons or the n new prons is its
    best acoustic representation
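Pass 3 reduces to building a small candidate set and letting each utterance choose from it. The sketch assumes a hypothetical recognizer callback `recognize(utterance, allowed_prons)` and takes the n best new prons to be the most frequent Pass 2 results that are not already dictionary prons; `pass3` is an illustrative name.

```python
from collections import Counter
from typing import Callable, Sequence, Set, Tuple

Pron = Tuple[str, ...]


def pass3(
    dict_prons: Set[Pron],
    pass2_prons: Sequence[Pron],
    n: int,
    utterances: Sequence[object],
    recognize: Callable[[object, Set[Pron]], Pron],  # hypothetical recognizer hook
) -> Counter:
    """Restrict the FSM to the dictionary prons plus the n best new prons, then re-tally."""
    # "New" means a Pass 2 result that is not already a dictionary pron.
    new_counts = Counter(p for p in pass2_prons if p not in dict_prons)
    allowed = set(dict_prons) | {p for p, _ in new_counts.most_common(n)}
    # Every utterance is forced to pick one of the allowed prons.
    return Counter(recognize(u, allowed) for u in utterances)
```

Utterances whose Pass 2 pron was a rare variant are redistributed onto the surviving candidates, so the final counts reflect only the prons that could actually be added to the dictionary.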

14
PronLearn Three-Pass Examples
15
PronLearn Three-Pass Examples
16
Results
  • How much can pron learning improve recognition?
    Many Auto Attendant recognition errors had one or
    more of the following causes
  • Heavy signal noise
  • Name alteration (Michael → Mike, Richard →
    Dick)
  • Difficult grammar competition (Teri Thomas vs.
    Kerry Thomas)
  • The accuracy of oracle pron learning is difficult
    (impossible) to know, but even perfect prons would
    not solve all of these problems

17
Results
  • Auto Attendant Simulation (Phantom)
  • Training: 3750 utterances (150 names)
  • Testing: 3750 utterances, plus 10,000 additional
    grammar names to increase task difficulty

18
Results
  • Auto Attendant Simulation (2) (Phantom)
  • Training: 3750 utterances (150 names)
  • Testing: 3-pass PronLearn with 3750 utterances,
    plus X additional grammar names to increase the task
    difficulty

19
Thanks
20
Results
  • BellSouth Directory Assistance OSR 3.09 (thanks
    to Jean-Philippe)
  • Baseline accuracy was obtained with an OSS
    hand-crafted dictionary that decreased the original
    error by 2.0
  • Prons were learned for only 240 words, from a single data set

21
PronLearn Tools
  • Documentation at
  • http://silicon.speechworks.com/cgi-bin/wiki.pl?PronLearn
  • Written in Python
  • Requires a local install of OSR or Phantom. Uses
    acc_test, dicttest, split_gram, and FSM tools
  • genProns.py
  • Input: (transcription, audio file) pairs
  • Output: per-word pron counts (example below)
  • mergePronCounts.py: parallelize work or
    accumulate counts over time
  • genUserDict.py: threshold pron percentages or
    optimize on a validation set, and output to an
    XML user dictionary

Word        Freq  Percent  InDict  Pron
leonard       21    0.840       0  l I n @r d
leonard        3    0.120       1  l E n @r d
leonard        1    0.040       0  A l w E n @r d
leonard        0    0.000       1  l E n @ d
pendergast    23    0.920       1  p E n d @r g a s t
pendergast     2    0.080       0  E n d @r g a s t
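The count-merging and thresholding steps attributed to mergePronCounts.py and genUserDict.py can be sketched with `collections.Counter` and `xml.etree.ElementTree`. The function names, the 0.1 threshold in the usage note, and the XML tag names (`lexicon`, `entry`, `pron`) are hypothetical; the slides do not give the real user-dictionary schema or the scripts' interfaces. Percentages are each pron's share of its word's total count, matching the Percent column of the table (e.g. 21/25 = 0.840 for leonard).

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (word, space-separated pron), as in the Word and Pron columns


def merge_pron_counts(batches: Iterable[Counter]) -> Counter:
    """Sum counts from parallel jobs or earlier runs (the role of mergePronCounts.py)."""
    return sum(batches, Counter())


def pron_percentages(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Each pron's share of its word's total count (the Percent column)."""
    totals: Dict[str, int] = defaultdict(int)
    for (word, _), k in counts.items():
        totals[word] += k
    shares: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (word, pron), k in counts.items():
        shares[word][pron] = k / totals[word]
    return dict(shares)


def user_dict_xml(shares: Dict[str, Dict[str, float]], threshold: float) -> str:
    """Keep prons whose share reaches threshold and serialize them (hypothetical tags)."""
    root = ET.Element("lexicon")
    for word in sorted(shares):
        kept = sorted((p for p, s in shares[word].items() if s >= threshold),
                      key=lambda p: -shares[word][p])
        if kept:
            entry = ET.SubElement(root, "entry", word=word)
            for pron in kept:
                ET.SubElement(entry, "pron").text = pron
    return ET.tostring(root, encoding="unicode")
```

With the leonard and pendergast counts from the table split across two batches and merged, a 0.1 threshold keeps "l I n @r d" and "l E n @r d" for leonard and only "p E n d @r g a s t" for pendergast.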