Title: My Slides
1. My Slides
- Support vector machines (brief intro)
- WS04: What we accomplished
- WS04: Organizational lessons
- AVICAR video corpus: current status
2. Support Vector Machines as (sort of) compared to Neural Networks
Difficult to do, because they have never been compared head-to-head on any speech task!
3. SVM = Regularized Nonlinear Discriminant
Kernel transform to an infinite-dimensional Hilbert space.
The only way in which an SVM differs from an RBF-NN: THE TRAINING CRITERION.
SVM discriminant dimension: c = argmin( training_error(c) + 1/width(margin(c)) ) (spelled out in standard form after this slide)
The SVM extracts a discriminant dimension, then either:
- (Bourlard/Morgan hybrid; Niyogi & Burges, 2002) Posterior PDF: sigmoid model in the discriminant dimension, OR
- (BDFK tandem; Borys & Hasegawa-Johnson, 2005) Likelihood: Gaussian mixture in the discriminant dimension.
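The training criterion above can be read as the standard soft-margin SVM objective; a sketch in the usual notation (the hinge-loss / slack-variable form below is a textbook reconstruction, not taken from the slide):

% The margin width is 2/||w||, so the ||w||^2 term penalizes 1/width(margin),
% and the slack variables xi_i upper-bound the training error.
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\|w\|^{2} \;+\; C\sum_{i=1}^{N}\xi_{i} \\
\text{s.t.}\quad & y_{i}\bigl(w^{\top}\phi(x_{i}) + b\bigr) \;\ge\; 1-\xi_{i},
\qquad \xi_{i}\ge 0,
\end{aligned}

where \phi is the kernel feature map into the infinite-dimensional Hilbert space.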
4. Binary Classifier = sign( Nonlinear Discriminant )
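A minimal sketch of slides 3 and 4 in code, using scikit-learn as a stand-in for the actual WS04 tools; the toy data, kernel settings, Platt sigmoid, and two-component Gaussian mixtures are illustrative assumptions, not the project's code.

import numpy as np
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

# Toy binary data standing in for acoustic observation vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# RBF-SVM: a regularized nonlinear discriminant (kernel transform + margin criterion).
svm = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True).fit(X, y)

# Slide 4: the binary classifier is the sign of the nonlinear discriminant.
d = svm.decision_function(X)        # the extracted "discriminant dimension"
labels = np.where(d >= 0, 1, -1)    # sign( nonlinear discriminant )

# Option 1 (Bourlard/Morgan-style hybrid): sigmoid posterior model in the
# discriminant dimension (Platt scaling, enabled by probability=True).
posterior = svm.predict_proba(X)[:, 1]

# Option 2 (tandem): class-conditional Gaussian-mixture likelihoods fit
# in the discriminant dimension, usable as HMM/DBN observation scores.
gmm_pos = GaussianMixture(n_components=2, random_state=0).fit(d[y == 1].reshape(-1, 1))
gmm_neg = GaussianMixture(n_components=2, random_state=0).fit(d[y == 0].reshape(-1, 1))
log_lik_ratio = gmm_pos.score_samples(d.reshape(-1, 1)) - gmm_neg.score_samples(d.reshape(-1, 1))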
5. Advantages of SVMs w.r.t. NNs
- Accuracy
- SVM generalizes much better from small training data sets (training tokens > 6X observation vector size)
- As training data size increases, the accuracies of NN and SVM converge
- Theoretically, and in some practical experiments too
- Like a 3-layer MLP, the RBF-SVM is a universal approximator
- Fast training: nearly quadratic optimality criterion (spelled out below)
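For reference, the "nearly quadratic optimality criterion" is the standard SVM dual, a quadratic program over the N Lagrange multipliers (textbook form, not from the slide):

\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{N}\alpha_{i}
\;-\; \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,K(x_{i},x_{j}) \\
\text{s.t.}\quad & 0 \le \alpha_{i} \le C, \qquad \sum_{i=1}^{N}\alpha_{i}y_{i} = 0,
\end{aligned}

with K the RBF kernel; the N-by-N kernel matrix in the second term is also where the O(N²) training cost on the next slide comes from.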
6. Disadvantages of SVMs w.r.t. NNs
- No way to train with a very large training set
- Complexity is O(N²): either fast or impossible
- Computational complexity during test
- Solution: Burges' reduced set method (extra training step, only available right now in Matlab)
- Accuracy: unless you optimize the hyper-parameters, accuracy is good but not great
- Exhaustive hyper-parameter training is very slow (sketched below)
- Can get good, but not great, accuracy with the theoretically correct hyper-parameters
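A sketch of the exhaustive hyper-parameter search the last two bullets refer to, with scikit-learn's GridSearchCV and toy data as illustrative stand-ins; the grid values are assumptions.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for labeled acoustic frames.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 12))
y_train = (X_train[:, 0] > 0).astype(int)

# Every (C, gamma) pair is trained once per CV fold, and each fit is roughly
# quadratic in the number of training tokens, which is why exhaustive
# hyper-parameter training gets very slow on large sets.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)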
7. Disadvantages of SVMs w.r.t. NNs (continued)
- The real problem: we need phonetically labeled training data
- Embedded re-estimation experiment:
- Pre-trained SVMs used as HMM input (tandem system)
- RBF weights re-estimated, together with the HMM params, in order to maximize the likelihood of the training data
- Result: training-data likelihood and WRA increased
- Result: test-data WRA decreased
8. WS04
9. WS04 SVM/DBN Hybrid Recognizer
[Figure: DBN decomposition of the words "A LIKE" into articulatory feature streams]
- Word: A, LIKE
- Canonical form: tongue closed, tongue mid, tongue front, tongue open
- Surface form: semi-closed, tongue open, tongue front, tongue front
- Manner: glide, front vowel
- Place: palatal
- SVM outputs: p( g_PGR(x) | palatal glide release ), p( g_GR(x) | glide release )
- x = multi-frame observation including spectrum, formants, auditory model
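A sketch of what "SVM outputs for every frame" involves: stack a multi-frame observation around each frame and evaluate each landmark SVM on it. The window width, feature layout, and predict_proba interface are assumptions for illustration; this is not the single WS04 tool mentioned on the next slide.

import numpy as np

def stack_frames(features, context=4):
    """Build multi-frame observations x_t: each row concatenates the current
    frame with `context` frames on each side (edges padded by repetition)."""
    T, _ = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].ravel() for t in range(T)])

def landmark_posteriors(features, svms):
    """Evaluate every landmark SVM (e.g. 'palatal glide release',
    'glide release') on every frame of an utterance.
    `svms` maps landmark names to trained classifiers exposing predict_proba;
    returns {name: array of p(landmark | x_t) over frames t}."""
    X = stack_frames(features)
    return {name: clf.predict_proba(X)[:, 1] for name, clf in svms.items()}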
10. WS04 Organizational Lessons: What Worked
- Innovative experiments, made possible by people who really wanted to be doing what they were doing
- Result: published ideas were interesting to many people
- Parallel SVM classification experiments allowed us to test many different SVM definitions
- Result: classification errors mostly below 20% by the end of the WS
- Parallel recognizer test experiments (DBN/SVM was one, MaxEnt-based lattice rescoring was another)
- Result: both achieved a small (nonsignificant) WER reduction over the baseline
11. WS04 Organizational Lessons: What Didn't Work
- Software bottleneck between the SVMs and the recognizers: only one tool was available to apply an SVM to every frame in a speech file, and only one person knew how to use it.
- Too many experimental variables: should SVMs be trained using (1) all frames, or (2) only landmark frames? The DBN expects (1). The HMM works best if manner features use (1) and place features use (2). And the DBN? Impossible to test in six weeks.
- Apples vs. oranges: the SVM-only classifier outputs in cases (1) and (2) were incomparable ⇒ no test short of full DBN integration is meaningful.
12. WS04 Organizational Lessons: What Didn't Work (continued)
- Unbeatable baseline: the goal was to rescore the output of the SRI recognizer in order to reduce WER ⇒ to find acoustic information not already used by the baseline recognizer.
- What information is not already used? With a phone-based ANN/HMM hybrid system, it's hard to say.
- When an experiment fails, why?
- Better: use an open-source baseline (not state of the art, but that's OK), and construct test systems in a continuum between the baseline and the target.
13. AVICAR
14. AVICAR Recording Hardware
The system is not permanently installed; mounting requires 10 minutes.
15. AVICAR Data Summary
- 100 talkers
- 5 noise conditions
  - Engine idling
  - 35 mph, windows closed / windows open
  - 55 mph, windows closed / windows open
- 4 types of utterances
  - Isolated digits
  - Phone numbers
  - Isolated letters (E-set articulation test)
  - TIMIT sentences
- Public release: 16 schools and companies (but I don't know how many are using it)
16. AVICAR Labeling & Recognition
- Manual lip segmentation: 36 images
- Automatic face tracking: nearly perfect
- Automatic lip tracking: not so good
- Manual audio segmentation: sentence boundaries
- Audio enhancement
- Audio digit WRA: 97%, 89%, 87%, 84%, 78%
17. AVICAR Data Problems
- DivX encoding ⇒ database < 300 GB, but...
- DivX ⇒ poor edge quality in some images
- Amelioration plan: re-transfer from the tapes at high quality; huge data size for the folks who want it.