Title: Multi-Pass Pronunciation Adaptation
1 Multi-Pass Pronunciation Adaptation
- Nathan Bodenstab, Summer Intern 2006
2 The Problem
- Word pronunciations (prons) in a lexicon can be
- Incorrect
- Consistently mispronounced by speakers (especially names)
- Using transcribed acoustic data, we want to correct these prons and increase recognition accuracy, i.e. solve for the best pron candidate (a hedged reconstruction of this objective follows the example below) given
- X: acoustic data
- A: lexicon entry / canonical pron (language model)
- B_i: the i-th pron candidate
- Example: Stephan Granger (Auto Attendant), Nuance phonemes (similar to CPA)
- Lexicon entries (canonical prons)
- s t E i f v @ I n
- g r e n dZ @r
- Prons from acoustic data
- s t @ I f A n
- g r A o n dZ i e @r
- May be multi-modal
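The objective itself is not reproduced on the slide; a hedged reconstruction from the variable list above, assuming a standard Bayes-style split into an acoustic term and a pron prior, would be:

```latex
B^{*} = \arg\max_{B_i} \; P(X \mid B_i)\, P(B_i \mid A)
```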
3 Prior Work
- Dragon's MREC pron-guessing (Drew Lowry) and dictionary checker (Paul Vozila)
- Learns graphoneme n-grams (grapheme / phoneme pairs) using an HMM. Passes acoustic data through the n-best results to find the best pron(s).
- Nuance Autopron (Francoise Beaufays, Ananth Sankar, Mitch Weintraub)
- Forced alignment of the utterance to the best dictionary pron. Finds the worst phoneme match and replaces it with alternative phonemes. Passes acoustic data through the resulting prons (with a linguistic prior) to find the best pron(s).
- SpeechWorks 6.5 LEARN (Mark Fanty and Krishna Govindarajan)
- Starts with the dictionary phone graph (FSM) and augments it with learned phone variations. Passes acoustic data through the new phone graph to find the best pron(s).
4 PronLearn - New Pron Learning Algorithm
- Summer Project Goals
- Stand-alone tool to correct sub-optimal prons and be compatible with both Quantum and OSR
- PronLearn Algorithm Outline (see the sketch after this outline)
- Input: a set of transcribed audio files (e.g. 25 utterances of Stephan Granger)
- Pass 1
- Create a weighted FSM of possible prons for the utterance
- Run each audio file through the FSM and record its preferred path
- Pass 2
- Take the top X phoneme distortions from Pass 1 and build a new (non-weighted) FSM
- Re-run each audio file through the new FSM and record the preferred path
- Pass 3 (if recognition results aren't clustered well)
- Repeat Pass 2 using the new preferred phoneme distortions
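A minimal Python sketch of the three-pass flow outlined above. Every callable it takes (the FSM builders, the decoder, the distortion extractor) is a hypothetical placeholder for the Quantum/OSR tooling, not the shipped PronLearn code.

```python
from collections import Counter

def pron_learn(utterances, build_weighted_fsm, build_unweighted_fsm,
               decode, extract_distortions, top_x=5):
    """Hypothetical driver: decode(utt, fsm) returns the preferred phone path,
    extract_distortions(path) yields the phoneme distortions that path used."""
    # Pass 1: run every utterance against the weighted FSM of possible prons
    fsm1 = build_weighted_fsm()
    pass1_paths = [decode(u, fsm1) for u in utterances]

    # Tally the phoneme distortions preferred in Pass 1 and keep the top X
    counts = Counter(d for path in pass1_paths for d in extract_distortions(path))
    top_distortions = [d for d, _ in counts.most_common(top_x)]

    # Pass 2: re-decode against a non-weighted FSM limited to those distortions
    fsm2 = build_unweighted_fsm(top_distortions)
    pass2_paths = [decode(u, fsm2) for u in utterances]

    # Pass 3 (if the Pass 2 prons are not well clustered) would repeat the
    # Pass 2 step with the newly preferred distortions.
    return Counter(tuple(p) for p in pass2_paths)   # pron -> frequency
```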
5 PronLearn Pass 1
- Pass 1 example: Stephen
- Initialize the weighted FSM with the canonical pron(s)
- Add phoneme substitutions, deletions, and insertions with learned weights P(new_phone | canonical_phone), as in the sketch below
- Run the utterances through the weighted FSM and retrieve each preferred path
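A minimal sketch of how the Pass 1 arcs could be laid out, assuming a simple (state, state, label, weight) arc list rather than the actual Quantum/OSR FSM format; the insertion and deletion weights are illustrative defaults.

```python
def build_pass1_arcs(canonical, sub_probs, ins_prob=0.01, del_prob=0.01):
    """canonical: list of phones.
    sub_probs: dict (canonical_phone, new_phone) -> P(new_phone | canonical_phone)."""
    arcs = []  # (from_state, to_state, phone_label, weight)
    for i, phone in enumerate(canonical):
        arcs.append((i, i + 1, phone, 1.0))                 # canonical phone
        for (canon, new), p in sub_probs.items():           # substitutions
            if canon == phone and new != phone:
                arcs.append((i, i + 1, new, p))
        arcs.append((i, i + 1, "<eps>", del_prob))          # deletion
        arcs.append((i, i, "<any>", ins_prob))              # insertion self-loop
    return arcs

# e.g. the canonical "Stephen" pron with one learned substitution f -> v
arcs = build_pass1_arcs("s t E f @ n".split(), {("f", "v"): 0.3})
```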
6 PronLearn Pass 1
- Substitution probabilities P(new_phone | canonical_phone) are estimated using a linguist-generated lexicon
- Align alternate prons of a single word with a dynamic programming alignment algorithm (shortest edit distance, as in spelling correction); see the sketch below
- EPS is used to represent insertions and deletions
- Example
- / s t E f @ n /
- / t E v I n / → (s,EPS), (f,v), and (@,I)
- Phoneme differences between prons are tallied and relative frequency counts are used to estimate probabilities
- Adding more context, e.g. P(new_phone | prev, canonical, next)
- Didn't give a sufficient improvement in accuracy
- When no data is available, hand-crafted estimation simplifies to a phone similarity confusion matrix
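A runnable sketch of the dynamic programming alignment described above (a standard edit-distance backtrace, not the tool's own implementation), reproducing the slide's Stephen example:

```python
def align_prons(a, b, eps="EPS"):
    """Align two phone sequences; EPS marks an insertion or deletion."""
    n, m = len(a), len(b)
    # cost[i][j] = edit distance between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                             cost[i - 1][j] + 1,      # delete a[i-1]
                             cost[i][j - 1] + 1)      # insert b[j-1]
    # Backtrace to recover aligned (canonical, new) phone pairs
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], eps)); i -= 1
        else:
            pairs.append((eps, b[j - 1])); j -= 1
    return list(reversed(pairs))

# The slide's example: "s t E f @ n" vs. "t E v I n"
print(align_prons("s t E f @ n".split(), "t E v I n".split()))
# -> [('s', 'EPS'), ('t', 't'), ('E', 'E'), ('f', 'v'), ('@', 'I'), ('n', 'n')]
```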
7 PronLearn Pass 1 FSM Weight
- We can control the balance between the acoustic and the language model contribution
- We can modify the phoneme substitution, deletion, and insertion weights to
- Favor acoustics: P(new_phone | canonical_phone) → 1.0
- Favor the canonical pron (LM): P(new_phone | canonical_phone) → 0.0
- One possible weighting scheme is sketched below
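One way to expose this knob is as an exponent (a scale) on the learned substitution weights. The exponent scheme below is only an illustrative assumption, not necessarily how PronLearn implements the bias.

```python
def scaled_arc_weight(learned_prob, lm_scale):
    """lm_scale = 0.0 -> every alternative arc gets weight 1.0 (favor acoustics);
    large lm_scale -> alternative arcs vanish (favor the canonical pron);
    lm_scale = 1.0 -> use P(new_phone | canonical_phone) unchanged."""
    return learned_prob ** lm_scale

print(scaled_arc_weight(0.3, 0.0))   # 1.0    (pure acoustics)
print(scaled_arc_weight(0.3, 1.0))   # 0.3    (learned weight)
print(scaled_arc_weight(0.3, 4.0))   # ~0.008 (strongly favor the canonical pron)
```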
8 PronLearn Problems
- Why is one pass not enough?
- We have a tuning parameter to shift bias between favoring the utterance acoustics or favoring the canonical phonemes
- Favor acoustics: weights all phoneme sequences equally (used in voice enrollment). Recognized prons vary widely; no clustered group of new prons.
9 PronLearn Problems
- Why is one pass not enough?
- We have a tuning parameter to shift bias between favoring the utterance acoustics or favoring the dictionary phonemes
- Favor acoustics: weights all phoneme sequences equally (used in voice enrollment). Recognized prons vary widely; no clustered group of new prons.
- Favor dictionary: heavy bias towards dictionary prons clusters the new pron results, but does not allow much deviation.
10 PronLearn Problems
- Why is one pass not enough?
- We have a tuning parameter to shift bias between favoring the utterance acoustics or favoring the dictionary phonemes
- Favor acoustics: weights all phoneme sequences equally (used in voice enrollment). Recognized prons vary widely; no clustered group of new prons.
- Favor dictionary: heavy bias towards dictionary prons clusters the new pron results, but does not allow much deviation.
- But we want both!
- PronLearn solution: first learn which phoneme substitutions are acoustically popular (Pass 1), then limit the possible paths through the FSM to only those substitutions (Pass 2)
11 PronLearn Pass 2
- Pass 1: favor acoustics (low dictionary pron bias)
- Pass 2
- Extract the top X substitutions from the Pass 1 prons and build an unweighted FSM (see the sketch below)
- Re-run the utterances through the new Pass 2 FSM and record the preferred prons
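A minimal sketch of the Pass 1 → Pass 2 hand-off described above. The `align` callable is an assumption: it should return (canonical_phone, new_phone) pairs, e.g. the DP alignment sketched on the earlier Pass 1 slide.

```python
from collections import Counter

def top_substitutions(canonical, pass1_paths, align, top_x=3):
    """Tally the phone distortions seen in the Pass 1 preferred paths and
    return the top X, to be used when building the unweighted Pass 2 FSM."""
    counts = Counter()
    for path in pass1_paths:
        for canon_phone, new_phone in align(canonical, path):
            if canon_phone != new_phone:          # keep only real distortions
                counts[(canon_phone, new_phone)] += 1
    return [sub for sub, _ in counts.most_common(top_x)]
```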
12 PronLearn Three Pass Examples
- Most frequent pron is new
13 PronLearn Pass 3
- If we want to add at most n new prons, we can either take the top n prons from Pass 2, or
- Pass 3: build a new FSM with only the dictionary prons and the n-best new prons (a sketch follows)
- This forces every utterance to choose which of the n new prons is its best acoustic representation
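A minimal sketch of Pass 3 as described above; `build_fsm_from_prons` and `decode` stand in for the recognizer interface and are assumptions, not the shipped tool calls.

```python
from collections import Counter

def pass3_vote(utterances, dict_prons, new_prons, build_fsm_from_prons, decode, n=2):
    """Restrict the FSM to the dictionary prons plus the n best new prons,
    then let each utterance vote for the pron it matches best acoustically."""
    candidates = list(dict_prons) + list(new_prons)[:n]
    fsm = build_fsm_from_prons(candidates)
    votes = Counter(tuple(decode(u, fsm)) for u in utterances)
    return votes  # pron -> number of utterances that preferred it
```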
14 PronLearn Three Pass Examples
15 PronLearn Three Pass Examples
16 Results
- How much can pron learning help improve recognition results? Many Auto Attendant recognition errors had one or more of the following
- Heavy signal noise
- Name alteration (Michael → Mike, Richard → Dick)
- Difficult grammar competition (Teri Thomas vs. Kerry Thomas)
- Oracle pron learning accuracy is difficult (impossible) to know, but perfect prons will obviously not solve all of our problems
17 Results
- Auto Attendant Simulation - Phantom
- Training: 3750 utterances (150 names)
- Testing: 3750 utterances + 10,000 additional grammar names to increase task difficulty
18 Results
- Auto Attendant Simulation (2) - Phantom
- Training: 3750 utterances (150 names)
- Testing: 3-pass PronLearn with 3750 utterances + X additional grammar names to increase the task difficulty
19 Thanks
20 Results
- BellSouth Directory Assistance, OSR 3.09 (thanks to Jean-Philippe)
- Baseline accuracy was achieved using an OSS hand-crafted dictionary that decreased the original error by 2.0%
- Only learned prons for 240 words from one data set
21 PronLearn Tools
- Documentation at
- http://silicon.speechworks.com/cgi-bin/wiki.pl?PronLearn
- Written in Python
- Requires a local install of OSR or Phantom. Uses acc_test, dicttest, split_gram, and FSM tools
- genProns.py
- Input: (transcription, audio file) pairs
- Output
- mergePronCounts.py: parallelize work or accumulate counts over time
- genUserDict.py: threshold pron percentages or optimize on a validation set, and output to an XML user dictionary (a sketch of this thresholding step follows the table below)
Word        Freq  Percent  InDict  Pron
leonard       21    0.840       0  l I n @r d
leonard        3    0.120       1  l E n @r d
leonard        1    0.040       0  A l w E n @r d
leonard        0    0.000       1  l E n @ d
pendergast    23    0.920       1  p E n d @r g a s t
pendergast     2    0.080       0  E n d @r g a s t
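A minimal sketch of the thresholding step genUserDict.py is described as performing: keep the new (not-in-dictionary) prons whose relative frequency clears a cutoff and emit them as user-dictionary entries. The row layout mirrors the count table above; the XML element names are illustrative assumptions, not the actual OSR user-dictionary schema.

```python
def select_new_prons(counts, min_percent=0.5):
    """counts: list of (word, freq, percent, in_dict, pron) rows, as in the table above."""
    keep = [(word, pron) for word, freq, pct, in_dict, pron in counts
            if not in_dict and pct >= min_percent]
    lines = ["<userdict>"]
    for word, pron in keep:
        lines.append('  <entry word="%s" pron="%s"/>' % (word, pron))
    lines.append("</userdict>")
    return "\n".join(lines)

rows = [("leonard", 21, 0.840, 0, "l I n @r d"),
        ("leonard", 3, 0.120, 1, "l E n @r d")]
print(select_new_prons(rows))   # keeps only the frequent new pron "l I n @r d"
```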