Phoneme Alignment based on Discriminative Learning

1
Phoneme Alignment based on Discriminative Learning
  • Shai Shalev-Shwartz
  • The Hebrew University, Jerusalem
  • Joint work with
  • Joseph Keshet, Hebrew University
  • Yoram Singer, Google
  • Dan Chazan, IBM

2
The Alignment Problem
Text: "Have a test"
Phonetic transcription: /hh ae v ey tcl t eh s tcl t/
Waveform (figure)
3
The Alignment Problem Setting
acoustic representation
start-time of phoneme p_i in x
alignment function
phonetic representation
/hh ae v ey tcl t eh s tcl t/
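The symbols that accompanied these labels did not survive extraction. One consistent way to write the setting is sketched below; the letters chosen (x, p, y, f, T, k, d) are assumed notation, not necessarily the symbols used on the slide.

```latex
\begin{align*}
\bar{x} &= (x_1, \dots, x_T), \quad x_t \in \mathbb{R}^d
  && \text{acoustic representation ($T$ frames)} \\
\bar{p} &= (p_1, \dots, p_k)
  && \text{phonetic representation ($k$ phonemes)} \\
\bar{y} &= (y_1, \dots, y_k), \quad 1 \le y_1 < \dots < y_k \le T
  && \text{$y_i$ = start-time of phoneme $p_i$ in $\bar{x}$} \\
f &: (\bar{x}, \bar{p}) \mapsto \bar{y}
  && \text{alignment function}
\end{align*}
```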
4
Acoustic Representation
Short-time Fourier Transform
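As a rough illustration of what a short-time Fourier transform produces, the sketch below cuts a signal into overlapping frames, applies a Hann window, and takes the magnitude of a naive DFT per frame. The frame length, hop size, and function name are illustrative assumptions; the slide does not state the analysis parameters used in the talk.

```python
import cmath
import math

def stft_magnitudes(signal, frame_len=8, hop=4):
    """Return one magnitude spectrum (bins 0..frame_len/2) per frame."""
    # Periodic Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT coefficient for frequency bin k
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

# A pure tone with 2 cycles per 8-sample frame should peak in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

A real front end would use an FFT and much longer frames; the naive DFT is used here only to keep the example dependency-free.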
5
Comparing Alignments
e.g.
6
ε-insensitive Cost
ε-insensitivity region
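The formula for the insensitive cost is missing from the transcript. A minimal sketch, assuming the cost is the fraction of phoneme start-times that deviate from the true start-times by more than a tolerance (called `epsilon` here), could look like this:

```python
def insensitive_cost(true_starts, predicted_starts, epsilon):
    """Fraction of boundaries whose error exceeds the tolerance epsilon.

    Assumed definition: errors inside the insensitivity region
    (|y - y_hat| <= epsilon) cost nothing.
    """
    if len(true_starts) != len(predicted_starts):
        raise ValueError("alignments must have the same number of phonemes")
    misses = sum(1 for y, y_hat in zip(true_starts, predicted_starts)
                 if abs(y - y_hat) > epsilon)
    return misses / len(true_starts)

# Errors of 0, 1, 5 and 0 frames with a tolerance of 1 frame:
# only the third boundary falls outside the insensitivity region.
cost = insensitive_cost([0, 10, 20, 30], [0, 11, 25, 30], epsilon=1)
```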
7
A Discriminative Learning Approach
Training set
Learning Algorithm
Hypotheses class
Alignment function
8
Outline of Solution
  • Define the hypotheses class - constitutes the
    template of our alignment function
  • Map each possible alignment into vectors in an
    abstract vector-space
  • Devise a projection in the vector-space which
    orders alignments according to their quality
  • Derive a simple online learning algorithm
  • Convert the Online Alg. to a Batch procedure with
    some formal guarantees

9
Feature Primitives for Alignment
acoustic and phonetic representation
feature primitive for alignment: assesses the
quality of a suggested alignment
suggested alignment
10
Feature Primitive I
Cumulative spectral change across the boundaries
11
Feature Primitive I
Cumulative spectral change across the boundaries
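The exact definition of this primitive is not in the transcript. One plausible reading, sketched below under that assumption, sums a distance between the frames just before and just after each proposed boundary; a correct alignment should place boundaries where the spectrum changes most. The Euclidean distance and the `offset` parameter are illustrative choices.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def boundary_spectral_change(frames, starts, offset=1):
    """Sum of distances between frames `offset` before and after each boundary."""
    total = 0.0
    for y in starts:
        left, right = y - offset, y + offset
        # Skip boundaries too close to the edges of the utterance
        if left >= 0 and right < len(frames):
            total += euclidean(frames[left], frames[right])
    return total

# Three frames of one "sound" followed by three frames of another.
frames = [[0.0, 0.0]] * 3 + [[3.0, 4.0]] * 3
at_true_boundary = boundary_spectral_change(frames, [3])
at_wrong_boundary = boundary_spectral_change(frames, [1])
```

In this toy case the boundary placed at the real change scores 5.0, while the misplaced one scores 0.0.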
12
Feature Primitive II
Cumulative confidence in the phoneme sequence
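A minimal sketch of this primitive, assuming a frame classifier that returns a confidence for every phoneme at every frame: the feature sums, over all frames, the confidence of the phoneme that the suggested alignment assigns to that frame. The dictionary layout of `frame_scores` is a hypothetical interface.

```python
def cumulative_confidence(frame_scores, phonemes, starts):
    """Sum frame-classifier confidences along a suggested alignment.

    frame_scores[t][p] is the (assumed) confidence that frame t is phoneme p.
    Phoneme i covers frames starts[i] .. starts[i+1]-1; the last one runs
    to the end of the utterance.
    """
    ends = list(starts[1:]) + [len(frame_scores)]
    total = 0.0
    for phoneme, start, end in zip(phonemes, starts, ends):
        for t in range(start, end):
            total += frame_scores[t][phoneme]
    return total

frame_scores = [
    {"hh": 0.9, "ae": 0.1},
    {"hh": 0.8, "ae": 0.2},
    {"hh": 0.3, "ae": 0.7},
    {"hh": 0.1, "ae": 0.9},
]
good = cumulative_confidence(frame_scores, ["hh", "ae"], [0, 2])
bad = cumulative_confidence(frame_scores, ["hh", "ae"], [0, 1])
```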
13
Feature Primitive III
Phoneme duration model
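The slide does not say which duration model is used. As one possible instance, the sketch below scores an alignment by the Gaussian log-likelihood of each phoneme's duration, given a per-phoneme mean and standard deviation; the Gaussian form and the `stats` table are assumptions.

```python
import math

def duration_log_likelihood(phonemes, starts, n_frames, stats):
    """Sum of Gaussian log-densities of the durations implied by `starts`.

    stats[p] = (mean, std) of the duration of phoneme p, in frames.
    """
    ends = list(starts[1:]) + [n_frames]
    total = 0.0
    for phoneme, start, end in zip(phonemes, starts, ends):
        mean, std = stats[phoneme]
        duration = end - start
        total += (-0.5 * math.log(2 * math.pi * std ** 2)
                  - (duration - mean) ** 2 / (2 * std ** 2))
    return total

stats = {"hh": (2.0, 1.0), "ae": (4.0, 1.0)}
typical = duration_log_likelihood(["hh", "ae"], [0, 2], 6, stats)
atypical = duration_log_likelihood(["hh", "ae"], [0, 4], 6, stats)
```

Here `typical` gives both phonemes their mean durations, so it scores higher than `atypical`, which swaps them.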
14
Feature Primitive IV
Speaking-rate (dynamics)
Spectrogram at different rates of articulation
(Pickett, 1980)
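The transcript gives no formula for the speaking-rate primitive. One way to capture the idea that speaking rate changes slowly is sketched here, purely as an assumption: express each phoneme's duration relative to its typical duration and penalize abrupt changes in that ratio between consecutive phonemes.

```python
def speaking_rate_change(phonemes, starts, n_frames, mean_duration):
    """Sum of squared changes in relative duration between adjacent phonemes.

    mean_duration[p] is the (assumed known) typical duration of phoneme p.
    A value near zero means the implied speaking rate is steady.
    """
    ends = list(starts[1:]) + [n_frames]
    rates = [(end - start) / mean_duration[p]
             for p, start, end in zip(phonemes, starts, ends)]
    return sum((b - a) ** 2 for a, b in zip(rates, rates[1:]))

mean_duration = {"hh": 2.0, "ae": 4.0}
steady = speaking_rate_change(["hh", "ae"], [0, 2], 6, mean_duration)
uneven = speaking_rate_change(["hh", "ae"], [0, 4], 6, mean_duration)
```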
15
Feature Functions for Alignment
Mapping all possible alignments into a vector
space
slightly incorrect alignment
correct alignment
grossly incorrect alignment
16
Main Solution Principle
Find a linear projection that ranks alignments
according to their quality
slightly incorrect alignment
correct alignment
grossly incorrect alignment
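The ranking principle on this slide can be sketched in a few lines: each candidate alignment is mapped to a feature vector, and a weight vector `w` scores it by an inner product. The feature values and weights below are made up for illustration.

```python
def score(w, phi):
    """Linear projection of a feature vector onto w."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def rank_alignments(w, candidates):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: score(w, candidates[name]),
                  reverse=True)

w = [1.0, -0.5]
candidates = {
    "grossly incorrect": [0.5, 4.0],
    "correct": [3.0, 0.0],
    "slightly incorrect": [2.5, 1.0],
}
ranking = rank_alignments(w, candidates)
```

A good `w` places the correct alignment first and orders the remaining candidates by how wrong they are.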
17
Main Solution Principle (cont.)
example of low confidence projection
slightly incorrect alignment
correct alignment
grossly incorrect alignment
18
Main Solution Principle (cont.)
example of incorrect projection
slightly incorrect alignment
correct alignment
grossly incorrect alignment
19
Online Learning
Online Learning Algorithm
Hypotheses class
Cumulative cost
20
Online Learning
  • For
  • Receive an instance
  • Predict
  • Receive true alignment and Pay cost
  • If
  • Set
  • Set
  • Update
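The formulas for these steps were lost in extraction. The sketch below fills the same skeleton (predict, pay cost, conditional update) with a generic passive-aggressive style rule; the margin requirement `sqrt(cost)`, the step size `tau`, and the cap `C` are assumptions, not a transcription of the slide. Candidate alignments are given directly as feature vectors to keep the example small.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def online_update(w, phi_true, phi_pred, cost, C=1.0):
    """One conditional update step (assumed passive-aggressive form)."""
    delta = [a - b for a, b in zip(phi_true, phi_pred)]
    loss = math.sqrt(cost) - dot(w, delta)
    norm_sq = dot(delta, delta)
    if loss <= 0 or norm_sq == 0:
        return list(w)                      # margin satisfied: no change
    tau = min(C, loss / norm_sq)            # step size
    return [wi + tau * di for wi, di in zip(w, delta)]

def train_online(examples, dim, C=1.0):
    """examples: list of (candidate feature vectors, true index, costs)."""
    w = [0.0] * dim
    cumulative_cost = 0.0
    history = [list(w)]
    for candidates, true_index, costs in examples:
        # Predict: highest-scoring candidate under the current w
        pred = max(range(len(candidates)),
                   key=lambda i: dot(w, candidates[i]))
        cumulative_cost += costs[pred]      # pay cost of the prediction
        w = online_update(w, candidates[true_index], candidates[pred],
                          costs[pred], C)
        history.append(list(w))
    return w, cumulative_cost, history

example = ([[0.0, 1.0], [1.0, 0.0]], 1, [1.0, 0.0])
w_final, total_cost, history = train_online([example, example], dim=2)
```

On this toy stream the first round mispredicts and triggers an update; the second round predicts correctly and leaves `w` unchanged.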

21
Converting from Online to Batch
  • Run the online algorithm on the training set and
    generate w1, ..., wM
  • Small online error ⇒ there exists w ∈ {w1, ..., wM} whose
    generalization error is low
  • (Cesa-Bianchi et al.)
  • Choose w ∈ {w1, ..., wM} which minimizes the error on
    a fresh validation set
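The final selection step can be sketched as follows: every weight vector produced during the online run is evaluated on held-out data and the one with the lowest average cost is kept. The validation-set layout (candidate feature vectors plus a cost per candidate) is a simplifying assumption for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def validation_error(w, validation_set):
    """Average cost of the top-scoring candidate for each validation item."""
    total = 0.0
    for candidates, costs in validation_set:
        pred = max(range(len(candidates)),
                   key=lambda i: dot(w, candidates[i]))
        total += costs[pred]
    return total / len(validation_set)

def select_hypothesis(hypotheses, validation_set):
    """Pick the hypothesis with the lowest validation error."""
    return min(hypotheses, key=lambda w: validation_error(w, validation_set))

hypotheses = [[0.0, 0.0], [0.5, -0.5], [-1.0, 1.0]]
validation_set = [([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0])]
best = select_hypothesis(hypotheses, validation_set)
```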

22
Algorithmic aspects
  • Running-time
  • If the inference step can be performed in
    polynomial time (e.g. by dynamic programming),
    then the entire algorithm runs in polynomial
    time as well.
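To illustrate why dynamic programming makes the inference tractable, the sketch below finds the best monotone alignment when the score decomposes over frames. That decomposition is an assumption made for the example (the feature set in the talk also includes boundary and duration terms); the running time is O(T·k) for T frames and k phonemes.

```python
def best_alignment(frame_scores, n_phonemes):
    """frame_scores[t][i]: score of assigning frame t to the i-th phoneme.

    Returns (start frames, total score). Every phoneme gets at least one
    frame, so len(frame_scores) must be >= n_phonemes.
    """
    T = len(frame_scores)
    NEG = float("-inf")
    best = [[NEG] * n_phonemes for _ in range(T)]
    advanced = [[False] * n_phonemes for _ in range(T)]
    best[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for i in range(min(t + 1, n_phonemes)):
            stay = best[t - 1][i]                       # same phoneme
            move = best[t - 1][i - 1] if i > 0 else NEG  # next phoneme
            if move > stay:
                best[t][i] = move + frame_scores[t][i]
                advanced[t][i] = True
            else:
                best[t][i] = stay + frame_scores[t][i]
    # Backtrack from the last frame of the last phoneme
    starts = [0] * n_phonemes
    i = n_phonemes - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][i]:
            starts[i] = t
            i -= 1
    return starts, best[T - 1][n_phonemes - 1]

frame_scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
starts, total = best_alignment(frame_scores, 2)
```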
  • Worst case analysis for Online Learning
  • For any competitor u,
  • Generalization error
  • Online-to-batch conversion guarantees that low
    online error ⇒ low generalization error

23
Experiments
  • TIMIT corpus
  • Phoneme representation
  • 48 phonemes (Lee & Hon, 1989)
  • Acoustic Representation
  • MFCC??? (ETSI standard)
  • TIMIT training set
  • 500 utterances for training a frame classifier
  • 3096 utterances for learning alignment function
  • 100 utterances used for validation

24
Alternative Approaches
  • Brugnara, Falavigna & Omologo, Automatic
    segmentation and labeling of speech based on HMM,
    1993.
  • Hosom, Automatic phoneme alignment based on
    acoustic-phonetic modeling, 2002.
  • Toledano, Gomez & Grande, Automatic Phoneme
    Alignment, 2003.

25
Results
Method           Training size  Test set       t < 10 ms  t < 20 ms  t < 30 ms  t < 40 ms
Discrim. Algo.   650 or 3696    192 (core)     79.7       92.1       96.2       98.1
Brugnara et al.  3696           192 (core)     75.3       88.9       94.4       97.1
Discrim. Algo.   650 or 2336    1344 (entire)  80.0       92.3       96.4       98.2
Brugnara et al.  2336           1344 (entire)  74.6       88.8       94.1       96.8
Brugnara, Falavigna and Omologo, Automatic segmentation and labeling of speech based on Hidden Markov Models, Speech Comm., 12 (1993) 357-370.
26
Current and Future Work
  • Discriminative learning methods for
  • Whole phoneme sequence classification
  • 64 (ours) vs. 59 (HMM IDIAP Torch3)
  • Results without normalization of silences etc.
  • Small vocabulary continuous speech recognition
  • Segmentation of utterances to speakers
  • Full online learning setting
  • real-time adaptation to Speaker/environment
    changes