Phoneme Alignment based on Discriminative Learning

Learn more at: https://www.cis.upenn.edu

Category:

more less

Transcript and Presenter's Notes

Title: Phoneme Alignment based on Discriminative Learning

1
Phoneme Alignment based on Discriminative Learning

Shai Shalev-Shwartz
The Hebrew University, Jerusalem
Joint work with
Joseph Keshet, Hebrew University
Yoram Singer, Google
Dan Chazan, IBM

2
The Alignment Problem
Text: Have a test
Phonetic transcription: /hh ae v ey tcl t eh s tcl t/
Waveform
3
The Alignment Problem Setting
acoustic representation
start-time of phoneme p_i in x
alignment function
phonetic representation
/hh ae v ey tcl t eh s tcl t/
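One way to write the setting on this slide in symbols is sketched below. The slide's own formulas did not survive extraction, so the bar notation and the lengths T (number of acoustic frames) and k (number of phonemes) are assumptions chosen for illustration.

```latex
\[
\bar{x} = (x_1, \ldots, x_T), \qquad
\bar{p} = (p_1, \ldots, p_k), \qquad
\bar{y} = (y_1, \ldots, y_k), \qquad
f : (\bar{x}, \bar{p}) \mapsto \bar{y}
\]
```

Here y_i is the start-time of phoneme p_i in the acoustic sequence, and f is the alignment function to be learned.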
4
Acoustic Representation
Short-time Fourier Transform
5
Comparing Alignments
e.g.
6
ε-insensitive Cost
ε-insensitivity region
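The formula for the insensitive cost is missing from this transcript. A minimal sketch of one plausible form is given below: the fraction of phoneme boundaries whose predicted start-time is off by more than a tolerance. The function name, the `eps` parameter, and the use of frame indices as time units are assumptions for illustration, not the presenters' definition.

```python
def insensitive_cost(y_true, y_pred, eps=1):
    """Fraction of boundaries whose timing error exceeds eps (times in frames)."""
    if len(y_true) != len(y_pred):
        raise ValueError("alignments must have the same number of boundaries")
    # Errors of at most eps fall inside the insensitivity region and cost nothing.
    misses = sum(1 for a, b in zip(y_true, y_pred) if abs(a - b) > eps)
    return misses / len(y_true)

# Toy alignments (hypothetical): start frames of four phonemes.
reference = [0, 12, 20, 31]
predicted = [0, 13, 24, 31]
cost = insensitive_cost(reference, predicted, eps=1)
```

In this toy case only the third boundary misses by more than one frame, so one boundary out of four is counted as wrong.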
7
A Discriminative Learning Approach
Training set
Learning Algorithm
Hypotheses class
Alignment function
8
Outline of Solution

Define the hypotheses class, which constitutes the template of our alignment function
Map each possible alignment into a vector in an abstract vector space
Devise a projection in the vector space that orders alignments according to their quality
Derive a simple online learning algorithm
Convert the online algorithm to a batch procedure with some formal guarantees

9
Feature Primitives for Alignment
acoustic and phonetic representation
feature primitive for alignment: assesses the quality of a suggested alignment
suggested alignment
10
Feature Primitive I
Cumulative spectral change across the boundaries
11
Feature Primitive I
Cumulative spectral change across the boundaries
12
Feature Primitive II
Cumulative confidence in the phoneme sequence
13
Feature Primitive III
Phoneme duration model
14
Feature Primitive IV
Speaking-rate (dynamics)
Spectrogram at different rates of articulation (Pickett, 1980)
15
Feature Functions for Alignment
Mapping all possible alignments into a vector space
slightly incorrect alignment
correct alignment
grossly incorrect alignment
16
Main Solution Principle
Find a linear projection that ranks alignments according to their quality
slightly incorrect alignment
correct alignment
grossly incorrect alignment
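The ranking idea can be illustrated with a small sketch: each candidate alignment is represented by a feature vector, and a weight vector w scores it by an inner product, so that better alignments should receive higher scores. The feature values and the weight vector below are made-up numbers for illustration only.

```python
def score(w, phi):
    """Linear projection: inner product of the weights and a feature vector."""
    return sum(wi * fi for wi, fi in zip(w, phi))

# Hypothetical feature vectors for three candidate alignments.
candidates = {
    "correct alignment": [0.9, 0.8],
    "slightly incorrect alignment": [0.7, 0.6],
    "grossly incorrect alignment": [0.1, 0.2],
}
w = [1.0, 0.5]  # assumed weights

# Rank candidates from highest to lowest projected score.
ranking = sorted(candidates, key=lambda name: score(w, candidates[name]), reverse=True)
```

A good projection places the correct alignment first and the grossly incorrect one last, as this toy w does.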
17
Main Solution Principle (cont.)
example of low confidence projection
slightly incorrect alignment
correct alignment
grossly incorrect alignment
18
Main Solution Principle (cont.)
example of incorrect projection
slightly incorrect alignment
correct alignment
grossly incorrect alignment
19
Online Learning
Online Learning Algorithm
Hypotheses class
Cumulative cost
20
Online Learning

For
Receive an instance
Predict
Receive the true alignment and pay the cost
If
Set
Set
Update
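The formulas for the steps above were lost in this transcript. The sketch below fills the same skeleton (receive, predict, pay cost, update) with a passive-aggressive style update, which is an assumption and not necessarily the presenters' exact rule. The names `online_learn`, `features`, `cost`, and the cap `C` are hypothetical, and prediction here simply enumerates a small candidate list.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def online_learn(examples, features, cost, dim, C=1.0):
    """Run one online pass; return the weight vector held after each round."""
    w = [0.0] * dim
    history = []
    for x, candidates, y_true in examples:
        # Predict: highest-scoring candidate alignment under the current w.
        y_pred = max(candidates, key=lambda y: dot(w, features(x, y)))
        paid = cost(y_true, y_pred)
        if paid > 0:
            # Difference between the true and the predicted feature vectors.
            delta = [a - b for a, b in zip(features(x, y_true), features(x, y_pred))]
            norm_sq = dot(delta, delta)
            if norm_sq > 0:
                # Step size that makes y_true outscore y_pred by the cost paid, capped at C.
                tau = min(C, (paid - dot(w, delta)) / norm_sq)
                w = [wi + tau * di for wi, di in zip(w, delta)]
        history.append(list(w))
    return history

# Toy problem (hypothetical): x is a target position, y a candidate boundary.
def features(x, y):
    return [-abs(y - x), 1.0]

def cost(y_true, y_pred):
    return float(y_true != y_pred)

examples = [(5, [2, 5, 8], 5), (4, [1, 4, 7], 4)]
weights = online_learn(examples, features, cost, dim=2)
```

In the toy run the first round makes a mistake and updates w, while the second round predicts correctly and leaves w unchanged.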

21
Converting from Online to Batch

Run the online algorithm on the training set and generate w_1, ..., w_M
Small online error ⇒ there exists w ∈ {w_1, ..., w_M} whose generalization error is low (Cesa-Bianchi et al.)
Choose the w ∈ {w_1, ..., w_M} that minimizes the error on a fresh validation set
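The selection step of the conversion can be sketched as follows: evaluate every weight vector produced by the online pass on held-out data and keep the one with the lowest average cost. The helper names and the scalar toy model are assumptions for illustration.

```python
def select_best(weight_vectors, validation_set, predict, cost):
    """Return the candidate with the lowest average cost on the validation set."""
    def average_cost(w):
        total = sum(cost(y, predict(w, x)) for x, y in validation_set)
        return total / len(validation_set)
    return min(weight_vectors, key=average_cost)

# Toy setup (hypothetical): scalar "weights" and a multiplicative predictor.
def predict(w, x):
    return round(w * x)

def cost(y_true, y_pred):
    return abs(y_true - y_pred)

online_weights = [1.0, 2.0, 3.0]      # stand-ins for the online iterates
validation_set = [(2, 4), (3, 6)]     # (input, true output) pairs
best = select_best(online_weights, validation_set, predict, cost)
```

Only the validation set is used for the choice, so the training examples that shaped the candidates play no part in ranking them.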

22
Algorithmic aspects

Running-time
If the inference step can be performed in polynomial time (e.g. by dynamic programming), then the entire algorithm runs in polynomial time as well.
Worst-case analysis for online learning
For any competitor u,
Generalization error
Online-to-batch conversion guarantees that low online error ⇒ low generalization error
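To illustrate the running-time remark, here is a minimal dynamic-programming sketch of alignment inference. It assumes, as a simplification, that the score of an alignment is a sum of per-frame scores `frame_scores[t][i]` for assigning frame t to the i-th phoneme; features such as phoneme duration or speaking rate would need a richer state. The function name and the toy scores are hypothetical.

```python
def best_alignment(frame_scores):
    """Start frame of each phoneme in the best monotonic segmentation.

    Every phoneme gets at least one frame; runs in O(T * k) time.
    """
    T, k = len(frame_scores), len(frame_scores[0])
    if T < k:
        raise ValueError("need at least one frame per phoneme")
    NEG = float("-inf")
    best = [[NEG] * k for _ in range(T)]
    advanced = [[False] * k for _ in range(T)]  # True if phoneme i starts at frame t
    best[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for i in range(min(k, t + 1)):
            stay = best[t - 1][i]
            move = best[t - 1][i - 1] if i > 0 else NEG
            advanced[t][i] = move > stay
            best[t][i] = max(stay, move) + frame_scores[t][i]
    # Trace back from the last frame, which must belong to the last phoneme.
    starts = [0] * k
    i = k - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][i]:
            starts[i] = t
            i -= 1
    return starts

# Toy scores (hypothetical): 6 frames, 3 phonemes.
frame_scores = [
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0], [0, 1, 0],
    [0, 0, 1],
]
starts = best_alignment(frame_scores)
```

The table has T rows and k columns and each cell is filled in constant time, which is what keeps the prediction step polynomial.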

23
Experiments

TIMIT corpus
Phoneme representation
48 phonemes (Lee & Hon, 1989)
Acoustic Representation
MFCC+Δ+ΔΔ (ETSI standard)
TIMIT training set
500 utterances for training a frame classifier
3096 utterances for learning alignment function
100 utterances used for validation

24
Alternative Approaches

Brugnara, Falavigna & Omologo, Automatic segmentation and labeling of speech based on HMM, 1993.
Hosom, Automatic phoneme alignment based on acoustic-phonetic modeling, 2002.
Toledano, Gomez & Grande, Automatic Phoneme Alignment, 2003.

25
Results
Method           Training size   Test set        t < 10 ms   t < 20 ms   t < 30 ms   t < 40 ms
Discrim. Algo.   650 or 3696     192 (core)      79.7        92.1        96.2        98.1
Brugnara et al.  3696            192 (core)      75.3        88.9        94.4        97.1
Discrim. Algo.   650 or 2336     1344 (entire)   80.0        92.3        96.4        98.2
Brugnara et al.  2336            1344 (entire)   74.6        88.8        94.1        96.8
Brugnara, Falavigna and Omologo, Automatic segmentation and labeling of speech based on Hidden Markov Models, Speech Comm., 12 (1993) 357-370.
26
Current and Future Work