Title: Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
1. Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
- Mark Hasegawa-Johnson
- jhasegaw@uiuc.edu
- University of Illinois at Urbana-Champaign, USA
2. Lecture 5: Generalization Error and Support Vector Machines
- Observation Vector, Summary Statistic, Principal Components Analysis (PCA)
- Risk Minimization
- If the Posterior Probability is known, MAP is optimal
- Example: Linear Discriminant Analysis (LDA)
- When the true Posterior is unknown: Generalization Error
- VC Dimension, and bounds on Generalization Error
- Lagrangian Optimization
- Linear Support Vector Machines
- The SVM Optimality Metric
- Lagrangian Optimization of the SVM Metric
- Hyper-parameters and Over-training
- Kernel-Based Support Vector Machines
- Kernel-based classification: optimization formulas
- Hyperparameters and Over-training
- The Entire Regularization Path of the SVM
- High-Dimensional Linear SVM
- Text classification using indicator functions
- Speech acoustic classification using redundant features
3. What is an Observation?
- An observation can be:
- A vector created by vectorizing many consecutive MFCC or mel-spectra
- A vector including MFCC, formants, pitch, PLP, auditory model features, ...
4. Normalized Observations
5. Plotting the Observations, Part 1: Scatter Plots and Histograms
6. Problem: Where is the Information in a 1000-Dimensional Vector?
7. Statistics that Summarize a Training Corpus
8. Summary Statistics: Matrix Notation
Examples of y = -1
Examples of y = +1
9. Eigenvectors and Eigenvalues of R
10. Plotting the Observations, Part 2: Principal Components Analysis
11. What Does PCA Extract from the Spectrogram? Plot a PCA-Gram
- 1024-dimensional principal component → 32×32 spectrogram, plotted as an image (a sketch of this reshaping follows this list)
- 1st principal component (not shown) measures total energy of the spectrogram
- 2nd principal component: E(after landmark) − E(before landmark)
- 3rd principal component: E(at the landmark) − E(surrounding syllables)
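As a rough illustration of the PCA-gram idea, the sketch below vectorizes landmark-centered spectrogram patches, computes the principal components from the sample covariance, and reshapes a component back into a 32×32 image. The patch size, number of tokens, and random data are assumptions made purely for illustration; the real corpus and landmark alignment are not specified here.

```python
import numpy as np

# Hypothetical data: M landmark-centered spectrogram patches, each 32 frames x 32 bins,
# vectorized into 1024-dimensional observation vectors (shapes assumed for illustration).
M = 500
X = np.random.randn(M, 32 * 32)          # stand-in for real spectrogram patches

# Center the data and form the sample covariance R = (1/M) * Xc^T Xc
X_centered = X - X.mean(axis=0)
R = (X_centered.T @ X_centered) / M

# Eigenvectors of R, sorted by decreasing eigenvalue, are the principal components
eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project observations onto the first two components for a 2-D scatter plot,
# and reshape a component into a 32x32 image (a "PCA-gram") for inspection.
projections = X_centered @ eigvecs[:, :2]
pca_gram_2nd = eigvecs[:, 1].reshape(32, 32)
```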
12. Minimum-Risk Classifier Design
13. True Risk, Empirical Risk, and Generalization
14. When the PDF is Known, Maximum A Posteriori (MAP) is Optimal
15. Another Way to Write the MAP Classifier: Test the Sign of the Log Likelihood Ratio
16. MAP Example: Gaussians with Equal Covariance
17. Linear Discriminant Projection of the Data
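A minimal sketch of the equal-covariance Gaussian case: the log likelihood ratio reduces to a linear discriminant, so the MAP decision tests the sign of v·x + b. The class means, covariance, and priors below are made-up values for illustration only.

```python
import numpy as np

# Hypothetical two-class Gaussian model with shared covariance (values for illustration only)
mu_pos = np.array([1.0, 0.5])      # mean of class y = +1
mu_neg = np.array([-1.0, -0.5])    # mean of class y = -1
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
prior_pos, prior_neg = 0.5, 0.5

# Equal covariances make the log likelihood ratio linear in x:
#   log LR(x) = v^T x + b,  with v = Sigma^{-1} (mu_pos - mu_neg)
Sigma_inv = np.linalg.inv(Sigma)
v = Sigma_inv @ (mu_pos - mu_neg)
b = -0.5 * (mu_pos @ Sigma_inv @ mu_pos - mu_neg @ Sigma_inv @ mu_neg) \
    + np.log(prior_pos / prior_neg)

def map_classify(x):
    """MAP decision: test the sign of the log likelihood ratio."""
    return 1 if v @ x + b > 0 else -1

print(map_classify(np.array([0.8, 0.2])))   # expected: +1 (closer to mu_pos)
```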
18. Other Linear Classifiers: Empirical Risk Minimization (Choose v, b to Minimize Remp(v, b))
19. A Serious Problem: Over-Training
The same projection, applied to new test data
Minimum-Error projection of training data
20. When the True PDF is Unknown: Upper Bounds on True Risk
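The slide's figure is not reproduced here; for reference, the standard Vapnik-style bound that this kind of slide usually presents relates true risk R to empirical risk Remp through the VC dimension h and the number of training samples M, holding with probability at least 1 − η. The exact constants on the original slide may differ.

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\tfrac{2M}{h} + 1\right) - \ln\tfrac{\eta}{4}}{M}}
```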
21. The VC Dimension of a Hyperplane Classifier
22. Schematic Depiction: w Controls the Expressiveness of the Classifier (and a less expressive classifier is less prone to over-train)
23. The SVM: An Optimality Criterion
24. Lagrangian Optimization with an Inequality Constraint
- Consider minimizing f(v), subject to the constraint g(v) ≥ 0. Two solution types exist:
- g(v) = 0: the g(v) = 0 curve is tangent to the f(v) = fmin curve at v = v*
- g(v) > 0: v* minimizes f(v) (the unconstrained minimum already satisfies the constraint)
Diagram from Osborne, 2004
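A compact statement of the two cases the diagram illustrates, written in the notation of the slides (this is the standard Karush-Kuhn-Tucker setup; the slide's own algebra is not reproduced here):

```latex
% Minimize f(v) subject to g(v) >= 0; Lagrangian L(v,a) = f(v) - a g(v), with a >= 0.
\begin{aligned}
\nabla_v f(v^\ast) &= a\,\nabla_v g(v^\ast), \qquad a \ge 0, \qquad a\,g(v^\ast) = 0,\\[4pt]
\text{so either}\quad & a > 0 \ \text{and}\ g(v^\ast) = 0 \quad \text{(Case 1: constraint active)},\\
\text{or}\quad & a = 0 \ \text{and}\ g(v^\ast) > 0 \quad \text{(Case 2: unconstrained minimum)}.
\end{aligned}
```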
25. Case 1: gm(v) = 0
26. Case 2: gm(v) > 0
27. Training an SVM
28. Differentiate the Lagrangian
29. ... now Simplify the Lagrangian
30. ... and Impose the Kuhn-Tucker Conditions
31. Three Types of Vectors
Interior vector: a = 0
Margin support vector: 0 < a < C
Error: a = C
Partial error: a = C
From Hastie et al., NIPS 2004
32. ... and finally, Solve the SVM
33. Quadratic Programming
ai2 is off the margin: truncate to ai2 = 0. ai1 is still a margin candidate: solve for it again in iteration i+1.
34. Linear SVM Example
35. Linear SVM Example
36. Choosing the Hyper-Parameter to Avoid Over-Training (Wang, Presentation at CLSP Workshop WS04)
SVM test-corpus error vs. λ = 1/C, classification of nasal vs. non-nasal vowels.
37. Choosing the Hyper-Parameter to Avoid Over-Training
- Recall that v = Σm am ym xm
- Therefore, ||v|| < (C Σm ||xm||²)^(1/2) < (C M max ||xm||²)^(1/2)
- Therefore, the width of the margin is constrained to 1/||v|| > (C M max ||xm||²)^(-1/2), and therefore the SVM is not allowed to make the margin very small in its quest to fix individual errors
- Recommended solution (a minimal sketch follows this list):
- Normalize xm so that max ||xm|| = 1 (e.g., using libsvm)
- Set C = 1/M
- If desired, adjust C up or down by a factor of 2, to see if the error rate on independent development test data decreases
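A minimal sketch of this recipe; scikit-learn's SVC is used here purely for illustration (the slides mention libsvm, and any SVM package with an adjustable C would do), and the training data are random stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: M tokens, K-dimensional observations (for illustration only)
M, K = 2000, 40
X = np.random.randn(M, K)
y = np.sign(np.random.randn(M))

# Normalize so that max ||x_m|| = 1, as recommended on the slide
X = X / np.max(np.linalg.norm(X, axis=1))

# Start from C = 1/M, then try doubling/halving and keep whichever value
# gives the lowest error on an independent development set.
for C in [0.5 / M, 1.0 / M, 2.0 / M]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # ... evaluate clf on development data here, keep the best C ...
```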
38. From Linear to Nonlinear SVM
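For reference, the kernelized decision function has the same form as the linear one with inner products replaced by a kernel; written in the slides' notation, with the RBF kernel whose width parameter γ appears on the later hyperparameter slides:

```latex
f(x) \;=\; \sum_{m} a_m\, y_m\, K(x, x_m) + b,
\qquad
K(x, x_m) \;=\; \exp\!\left(-\gamma\, \lVert x - x_m \rVert^{2}\right)
```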
39. Example: RBF Classifier
40. An RBF Classification Boundary
41. Two Hyperparameters → Choosing Hyperparameters is Much Harder (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
42. Optimum Value of C Depends on γ (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
From Hastie et al., NIPS 2004
43. SVM is a Regularized Learner (λ = 1/C)
44. SVM Coefficients are a Piece-Wise Linear Function of λ = 1/C (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
45. The Entire Regularization Path of the SVM: Algorithm (Hastie, Zhu, Tibshirani, and Rosset, NIPS 2004)
- Start with λ large enough (C small enough) that all training tokens are partial errors (am = C). Compute the solution to the quadratic programming problem in this case, including inversion of X^T X or X X^T.
- Reduce λ (increase C) until the initial event occurs: two partial-error points enter the margin, i.e., in the QP problem, am = C becomes the unconstrained solution rather than just the constrained solution. This is the first breakpoint. The slopes dam/dλ change, but only for the two training vectors on the margin; all other training vectors continue to have am = C. Calculate the new values of dam/dλ for these two training vectors.
- Iteratively find the next breakpoint. The next breakpoint occurs when one of the following happens:
- A value of am that was on the margin leaves the margin, i.e., the piece-wise-linear function am(λ) hits am = 0 or am = C.
- One or more interior points enter the margin, i.e., in the QP problem, am = 0 becomes the unconstrained solution rather than just the constrained solution.
- One or more partial-error points enter the margin, i.e., in the QP problem, am = C becomes the unconstrained solution rather than just the constrained solution.
46. One Method for Using SVMPath (WS04, Johns Hopkins, 2004)
- Download the SVMPath code from Trevor Hastie's web page.
- Test several values of γ, including values within a few orders of magnitude of γ = 1/K.
- For each candidate value of γ, use SVMPath to find the C-breakpoints. Choose a few dozen C-breakpoints for further testing, and write out the corresponding values of am.
- Test the SVMs on a separate development test database for each combination (C, γ) and find the development test error. Choose the combination that gives the least development test error. (A sketch of this selection loop follows the list.)
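The SVMPath package itself is R code; the sketch below only illustrates the selection loop of the recipe above, substituting an ordinary grid of C values for the exact C-breakpoints. The function name, data arguments, and grid values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def pick_hyperparameters(X_train, y_train, X_dev, y_dev, K):
    """Try gamma values around 1/K and, for each, a range of C values,
    keeping the (C, gamma) pair with the lowest development-set error."""
    best = (None, None, np.inf)
    for gamma in [0.1 / K, 1.0 / K, 10.0 / K]:      # a few orders of magnitude around 1/K
        for C in np.logspace(-3, 3, 13):            # stand-in for the SVMPath C-breakpoints
            clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
            err = np.mean(clf.predict(X_dev) != y_dev)
            if err < best[2]:
                best = (C, gamma, err)
    return best                                     # (best C, best gamma, dev error)
```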
47. Results: RBF SVM
SVM test-corpus error vs. λ = 1/C, classification of nasal vs. non-nasal vowels.
Wang, WS04 Student Presentation, 2004
48. High-Dimensional Linear SVMs
49. Motivation: Project it Yourself
- The purpose of a nonlinear SVM:
- f(x) contains higher-order polynomial terms in the elements of x.
- By combining these higher-order polynomial terms, Σm am ym K(x, xm) can create a more flexible boundary than can Σm am ym x^T xm.
- The flexibility of the boundary does not lead to generalization error; the regularization term λ||v||² avoids generalization error.
- A different approach (sketched after this list):
- Augment x with higher-order terms, up to a very large dimension. These terms can include:
- Polynomial terms, e.g., xi xj
- N-gram terms, e.g., (xi at time t AND xj at time t)
- Other features suggested by knowledge-based analysis of the problem
- Then apply a linear SVM to the higher-dimensional problem
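A minimal sketch of the "project it yourself" idea: augment each observation with hand-chosen higher-order terms, then train an ordinary linear SVM on the augmented vectors. The specific augmentations, dimensions, and random data below are assumptions for illustration, not the feature set used in the examples that follow.

```python
import numpy as np
from sklearn.svm import LinearSVC

def augment(x_t, x_prev):
    """Augment one frame with higher-order terms: pairwise products within the
    frame (polynomial terms x_i * x_j) and products across adjacent frames."""
    pair = np.outer(x_t, x_t)[np.triu_indices(len(x_t))]   # polynomial terms
    cross = x_t * x_prev                                    # simple cross-time terms
    return np.concatenate([x_t, pair, cross])

# Hypothetical frames (dimensions for illustration only)
X = np.random.randn(300, 10)                    # 300 frames, 10 features each
X_aug = np.array([augment(X[t], X[t - 1]) for t in range(1, len(X))])
y = np.sign(np.random.randn(len(X_aug)))

clf = LinearSVC(C=1.0 / len(X_aug)).fit(X_aug, y)   # linear SVM on the augmented features
```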
50. Example 1: Acoustic Classification of Stop Place of Articulation
- Feature dimension: K = 483 per 10 ms
- MFCCs + deltas + delta-deltas, 25 ms window: K = 39 per 10 ms
- Spectral shape: energy, spectral tilt, and spectral compactness, once per millisecond: K = 40 per 10 ms
- Noise-robust MUSIC-based formant frequencies, amplitudes, and bandwidths: K = 10 per 10 ms
- Acoustic-phonetic parameters (formant-based relative spectral measures and time-domain measures): K = 42 per 10 ms
- Rate-place model of neural response fields in the cat auditory cortex: K = 352 per 10 ms
- Observation: concatenation of up to 17 frames, for a total of K = 17 × 483 = 8211 dimensions (a frame-stacking sketch follows this list)
- Results: accuracy improves as more features are added, up to 7 frames (one per 10 ms; 3381-dimensional x). Adding more frames didn't help.
- The RBF SVM still outperforms the linear SVM, but only by 1%.
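A sketch of the frame-stacking step; the dimensions follow the slide (483 features per 10 ms frame, 7 frames stacked into 3381 dimensions), while the feature values themselves are simulated for illustration.

```python
import numpy as np

def stack_frames(features, num_frames=7):
    """Concatenate num_frames consecutive 483-dim frames centered on each frame,
    yielding one 7 x 483 = 3381-dimensional observation per center frame."""
    half = num_frames // 2
    stacked = [features[t - half:t + half + 1].ravel()
               for t in range(half, len(features) - half)]
    return np.array(stacked)

frames = np.random.randn(1000, 483)        # simulated 10-ms feature frames
X = stack_frames(frames)
print(X.shape)                             # (994, 3381)
```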
51. Example 2: Text Classification
- Goal:
- Utterances were recorded by physical therapy patients, specifying their physical activity once per half hour for seven days.
- Example utterance: "I ate breakfast for twenty minutes, then I walked to school for ten minutes."
- Goal: for each time period, determine the type of physical activity, from among 2000 possible type categories.
- Indicator features (a sketch follows this list):
- 50,000 features: one per word in a 50,000-word dictionary
- x = [d1, d2, d3, ..., d50000]^T
- di = 1 if the i-th dictionary word was contained in the utterance, zero otherwise
- x is very sparse: most sentences contain only a few words
- A linear SVM is very efficient
52. Example 2: Text Classification
- Result:
- 85% classification accuracy
- Most incorrect classifications were reasonable to a human
- "I played hopscotch with my daughter": playing a game, or light physical exercise?
- Some categories were never observed in the training data; therefore no test data were assigned to those categories
- Conclusion: the SVM is learning keywords and keyword combinations
53. Summary
- Plotting the Data: use PCA, LDA, or any other discriminant
- If the PDF is known: use the MAP classifier
- If the PDF is unknown: Structural Risk Minimization
- The SVM is a training criterion: a particular upper bound on the structural risk of a hyperplane
- Choosing hyperparameters:
- Easy for a linear classifier
- For a nonlinear classifier: use the Complete Regularization Path algorithm
- High-dimensional Linear SVMs: the human user acts as an intelligent kernel