Recent Progress on Automatic Phoneme Alignment

Description:

Spectrogram. Context. Phonetic Transcription. attitude. ae dx ax ... Spectrogram. Boundaries given by HMM Forced Alignment. Hypothesis Boundaries as Input to SVM ...

1
Recent Progress on Automatic Phoneme Alignment
  • Hung-Yi Lo and Hsin-Min Wang

2
Outline
  • A Minimum Boundary Error Framework for Automatic
    Phonetic Segmentation (ICSLP06, ISCSLP06)
  • Phonetic Boundary Refinement Using Support Vector
    Machine (submitted to ICASSP06)
  • Introduction
  • Reduced support vector machine
  • Phone-transition-dependent SVM classifier
  • Experimental results
  • Progress on Manual Corpus Labeling

3
Automatic Corpus Segmentation Problem
4
Boundary Refinement Using SVM
5
Introduction to Support Vector Machine
6
Support Vector Machines Formulation
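  • Solve the quadratic program for some $C > 0$:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{m}\xi_i \qquad \text{(QP)}$$
$$\text{s.t.}\quad y_i\,(w^\top x_i + b) + \xi_i \ge 1,\quad \xi_i \ge 0,\quad i = 1,\dots,m$$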
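As a minimal sketch of what the quadratic program optimizes, the function below evaluates the standard soft-margin SVM primal objective at a given point. The names `w`, `b`, `X`, `y`, and `C` are illustrative assumptions, not notation taken from the slides, and the code only evaluates the objective; it does not solve the program.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate 0.5*||w||^2 + C*sum(slack) for a soft-margin SVM.

    Assumed (hypothetical) inputs: w is a weight list, b a bias,
    X a list of feature lists, y a list of +1/-1 labels, C > 0.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        # Signed margin of sample i under the hyperplane (w, b).
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        # Smallest slack that satisfies margin + slack >= 1, slack >= 0.
        slacks.append(max(0.0, 1.0 - margin))
    regularizer = 0.5 * sum(w_j * w_j for w_j in w)
    return regularizer + C * sum(slacks), slacks
```

A larger `C` penalizes margin violations more heavily relative to the regularizer.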
7
Reduced Support Vector Machine
  • Overcoming computational and storage difficulties
    in nonlinear SVM by using a rectangular kernel.
  • Choose a small random sample $\bar{A} \in R^{\bar{m} \times n}$
    of $A \in R^{m \times n}$.
  • Replace $K(A, A^\top)$ by $K(A, \bar{A}^\top)$
    in nonlinear SVM.
  • Only need to compute and store $m \times \bar{m}$
    numbers for the rectangular kernel.
  • Computational complexity reduced from $O(m^3)$
    to $O(\bar{m}^3)$.
  • The nonlinear classifier is defined by the
    optimal solution $(\bar{u}, b)$ obtained by solving a
    quadratic programming problem.
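The rectangular kernel idea can be illustrated with a short sketch. It assumes a Gaussian kernel and a uniformly random reduced set; the function names, the `gamma` parameter, and the `seed` argument are hypothetical choices for illustration, not details given on the slide.

```python
import math
import random


def gaussian_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def rectangular_kernel(A, m_bar, gamma, seed=0):
    """Build the m x m_bar kernel between all rows of A and a random subset.

    Returns the kernel matrix and the indices of the sampled rows.
    Only m * m_bar values are computed, instead of m * m for the full kernel.
    """
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(A)), m_bar))
    A_bar = [A[i] for i in idx]
    K = [[gaussian_kernel(a, r, gamma) for r in A_bar] for a in A]
    return K, idx
```

With 10 training vectors and `m_bar = 3`, the result is a 10 x 3 matrix rather than 10 x 10.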

8
Feature Extraction in Refinement Stage
  • Each frame is represented by a 45-dim vector:
  • 39 MFCC-based coefficients
  • Zero crossing rate
  • Bisector frequency
  • Burst degree
  • Spectral entropy
  • General weighted entropy
  • Subband energy
  • Two features are extracted for each hypothesized
    boundary:
  • Symmetrical Kullback-Leibler distance
  • Spectral feature transition rate
  • Each hypothesized boundary is represented by the
    feature vectors of the frames immediately to its
    left and right plus the two boundary features,
    giving a 92-dim augmented vector (2 × 45 + 2).
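The construction of the augmented vector can be sketched as follows. The symmetrical Kullback-Leibler distance is computed here between the normalized power spectra of the two neighboring frames, which is an assumption about its inputs; the spectral feature transition rate is passed in as a precomputed value (`sftr`) because the slides do not define it. All function names are hypothetical.

```python
import math


def normalize(spectrum, eps=1e-12):
    """Scale a power spectrum to sum to 1, flooring entries at eps."""
    total = sum(spectrum)
    return [max(s / total, eps) for s in spectrum]


def symmetric_kl(left_spectrum, right_spectrum):
    """Symmetrical KL distance: KL(p||q) + KL(q||p) = sum((p - q) * log(p / q))."""
    p = normalize(left_spectrum)
    q = normalize(right_spectrum)
    return sum((p_i - q_i) * math.log(p_i / q_i) for p_i, q_i in zip(p, q))


def augmented_vector(left_frame, right_frame, skl, sftr):
    """Concatenate two 45-dim frame vectors and two boundary features (92 dims)."""
    if len(left_frame) != 45 or len(right_frame) != 45:
        raise ValueError("each frame vector must have 45 dimensions")
    return list(left_frame) + list(right_frame) + [skl, sftr]
```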

9
Phone-Transition-Dependent SVM Classifier
10
Phone Transition Clustering
  • For each specific phone transition case, we
    gather all augmented vectors associated with the
    human-labeled phone boundaries, and compute the
    mean vector.
  • For each one of the three phone transition
    classes, sonorant to non-sonorant, sonorant to
    sonorant, non-sonorant to non-sonorant, apply
    K-means algorithm to cluster the phone
    transitions according to their mean vectors.
  • Note that only the phone transitions with enough
    instances are considered in this step.
  • The number of clusters is determined according to
    the cross-validation accuracy that the resulting
    SVM classifiers achieve on the training data.
  • We assign the phone transitions that were skipped
    in the K-means step for lack of instances to the
    nearest clusters according to the distances
    between their mean vectors and the cluster
    centers.
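The clustering steps above can be sketched as below. This is a simplified illustration, not the authors' implementation: K-means is initialized deterministically from the first `k` points, and `min_count` (the threshold for "enough instances") is a hypothetical parameter whose value the slides do not give.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm, initialized from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean_vector(members)
    return centers, labels


def cluster_transitions(means, counts, k, min_count):
    """Cluster frequent transitions, then attach rare ones to the nearest center.

    means:  {transition: mean augmented vector}
    counts: {transition: number of human-labeled boundaries}
    """
    frequent = [t for t in sorted(means) if counts[t] >= min_count]
    centers, labels = kmeans([means[t] for t in frequent], k)
    assignment = dict(zip(frequent, labels))
    for t in means:
        if t not in assignment:
            assignment[t] = min(range(k),
                                key=lambda c: sq_dist(means[t], centers[c]))
    return assignment
```

In the procedure described on the slide this would be run once per transition class, with `k` chosen by cross-validation.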

11
Discriminative Feature Extraction
  • For each partition subset, two discriminative
    features, namely discriminative weighted entropy
    and discriminative subband energy, are extracted
    to replace the general weighted entropy and
    subband energy.
  • Discriminative weighted entropy:
    $H_w = -\sum_i w_i \, p_i \log p_i$
  • where $w = (w_1, \dots, w_N)$ is a weight vector and $p_i$ is
    the $i$-th element of the normalized power spectrum. The
    weight vector of each partition subset is
    trained by a linear SVM to maximize the variation
    of the weighted entropy feature close to the true
    boundary.
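A weighted entropy of this kind can be computed as in the sketch below. It assumes the common form in which each term of the spectral entropy is scaled by its weight, that is, the negative sum of w_i * p_i * log(p_i); the function name and the `eps` guard are illustrative assumptions. Training the weights with a linear SVM is not shown.

```python
import math


def weighted_entropy(power_spectrum, weights, eps=1e-12):
    """Return -sum(w_i * p_i * log(p_i)) over the normalized power spectrum."""
    total = sum(power_spectrum)
    h = 0.0
    for w_i, s_i in zip(weights, power_spectrum):
        p_i = s_i / total
        if p_i > eps:  # treat 0 * log(0) as 0
            h -= w_i * p_i * math.log(p_i)
    return h
```

With all weights equal to 1 this reduces to the ordinary spectral entropy.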

12
Discriminative Feature Extraction
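  • Discriminative subband energy: a weighted sum of
    the subband energies, $\sum_j w_j E_j$,
  • where $E_j$ is the $j$-th
    pre-defined subband energy and $w_j$ is its
    weight score.
  • The weight score is defined in terms of $\mu_j$ and
    $\sigma_j$, the mean and
    standard deviation of the j-th subband energy for
    the training samples of the positive or negative
    class.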

13
Boundary Recognition Using SVM
  • An RSVM classifier is trained for each phone
    transition subset.
  • Augmented vectors associated with the true
    boundaries are the positive training samples.
  • Randomly selected augmented vectors at least 20 ms
    away from the true boundary are the negative
    training samples.
  • A Gaussian kernel with weighted Euclidean distance
    is applied.
  • In the testing phase, the augmented vectors around
    the hypothesized boundary are examined by the
    classifier of the subset to which the phone
    transition belongs.
  • The position with the maximum classifier output
    is recognized as the refined boundary.
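A Gaussian kernel with weighted Euclidean distance, and the kernel classifier output built on it, might look like the sketch below. The per-dimension `weights`, the width `gamma`, and the names `reduced_set`, `coeffs`, and `bias` are hypothetical; the slides do not state how the weights are chosen.

```python
import math


def weighted_gaussian_kernel(x, z, weights, gamma):
    """exp(-gamma * sum_i w_i * (x_i - z_i)^2)."""
    d2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, z))
    return math.exp(-gamma * d2)


def decision_value(x, reduced_set, coeffs, bias, weights, gamma):
    """Kernel classifier output: sum_j u_j * K(x, a_j) + b over the reduced set."""
    return bias + sum(
        u * weighted_gaussian_kernel(x, a, weights, gamma)
        for u, a in zip(coeffs, reduced_set)
    )
```

A dimension with weight 0 is ignored by the kernel, so the weights control how much each of the 92 features contributes to the distance.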

14
Experimental Setup
  • TIMIT corpus (dialect sentences are excluded)
  • Training set: 3,696 utterances
  • Testing set: 1,312 utterances
  • Context-independent HMMs are used for the
    preliminary alignment
  • 50 phone models (left-to-right topology)
  • 3 states for each HMM
  • 5,265 mixture components in total
  • Frame window size and shift are 20 ms and 5 ms,
    respectively.
  • 12 MFCCs and log energy, plus their first and
    second differences (39 coefficients)
  • Cepstral mean subtraction (CMS) and cepstral
    variance normalization (CVN) are applied.
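CMS and CVN can be sketched as a mean and variance normalization of each feature dimension. The version below normalizes over all frames of one utterance, which is an assumption; the slides do not say over which span the statistics are computed.

```python
import math


def cmvn(frames, eps=1e-8):
    """Subtract the mean and divide by the standard deviation, per dimension.

    frames: list of equal-length feature vectors for one utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # Guard against constant dimensions, whose standard deviation is zero.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]
```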

15
Experimental Setup
  • The number of phone transition clusters is
    determined by cross-validation on the TIMIT
    training data:
  • 20 in the sonorant to non-sonorant class
  • 16 in the sonorant to sonorant class
  • 10 in the non-sonorant to non-sonorant class
  • In the refinement phase, 5 hypothesized
    boundaries, spaced 5 ms apart within ±10 ms of
    the initial boundary, are examined by the SVM.
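The refinement search can be sketched as follows: candidates are generated at 5 ms steps within 10 ms of the initial boundary, and the one with the highest classifier output is kept. The `score_fn` argument is a hypothetical stand-in for the trained classifier of the matching phone transition cluster.

```python
def candidate_offsets_ms(step_ms=5, radius_ms=10):
    """Offsets of the hypothesized boundaries relative to the initial one."""
    return list(range(-radius_ms, radius_ms + 1, step_ms))


def refine_boundary(initial_ms, score_fn, step_ms=5, radius_ms=10):
    """Return the candidate position (in ms) with the maximum classifier output."""
    candidates = [initial_ms + o for o in candidate_offsets_ms(step_ms, radius_ms)]
    return max(candidates, key=score_fn)
```

With the default settings the offsets are -10, -5, 0, 5, and 10 ms, which gives the 5 hypothesized boundaries per initial boundary.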

16
Experimental Results
17
Thank you