1
My Slides
  • Support vector machines (brief intro)
  • WS04: what we accomplished
  • WS04: organizational lessons
  • AVICAR video corpus: current status

2
Support Vector Machines, as (sort of) compared to
Neural Networks
Difficult to do, because they have never been
compared head-to-head on any speech task!
3
SVM = Regularized Nonlinear Discriminant
Kernel transform to an infinite-dimensional
Hilbert space.
The only way in which an SVM differs from an
RBF-NN is THE TRAINING CRITERION. SVM discriminant
dimension:
c = argmin( training_error(c)
            + lambda / width(margin(c)) )
An SVM extracts a discriminant dimension, used
either as
(Bourlard/Morgan hybrid; Niyogi & Burges, 2002)
Posterior PDF = sigmoid model in the discriminant
dimension,
OR
(BDFK tandem; Borys & Hasegawa-Johnson, 2005)
Likelihood = mixture Gaussian in the discriminant
dimension
4
Binary Classifier = sign( Nonlinear Discriminant )
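
A minimal sketch of slides 3-4, assuming scikit-learn and toy data (every
variable name below is illustrative, and the Platt sigmoid parameters A, B
are fixed here rather than fit on held-out data): the RBF-SVM's discriminant
d(x) is thresholded with sign() for classification, pushed through a sigmoid
for a hybrid-style posterior, or modeled with a per-class Gaussian for a
tandem-style likelihood.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # toy observation vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy binary labels

svm = SVC(kernel="rbf", C=1.0)   # RBF kernel: implicit map to an
svm.fit(X, y)                    # infinite-dimensional Hilbert space
d = svm.decision_function(X)     # the extracted discriminant dimension

y_hat = np.sign(d)               # slide 4: classifier = sign(discriminant)

# Hybrid-style posterior: sigmoid model in the discriminant dimension
A, B = -2.0, 0.0                 # illustrative Platt parameters
posterior = 1.0 / (1.0 + np.exp(A * d + B))

# Tandem-style likelihood: Gaussian (in general, mixture Gaussian)
# fit to d(x) separately within each class
mu1, s1 = d[y == 1].mean(), d[y == 1].std()
lik1 = np.exp(-0.5 * ((d - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))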
5
Advantages of SVMs w.r.t. NNs
  • Accuracy
  • SVM generalizes much better from small training
    data sets (training tokens > 6x observation
    vector size); a toy comparison follows this list
  • As the training data size increases, the
    accuracies of NN and SVM converge
  • Theoretically, and in some practical experiments
    too
  • Like a 3-layer MLP, an RBF-SVM is a universal
    approximator
  • Fast training: nearly quadratic optimality
    criterion
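
Purely illustrative, not an experiment from the talk: a toy scikit-learn
comparison in the small-training-set regime the slide describes, with
training tokens equal to 6x the observation-vector size. Results vary with
the seed; the claim is the slide's, not guaranteed by this toy.

import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
dim = 20
n_train = 6 * dim                              # training tokens = 6x vector size
Xtr = rng.normal(size=(n_train, dim))
ytr = (Xtr.sum(axis=1) > 0).astype(int)
Xte = rng.normal(size=(2000, dim))             # large held-out test set
yte = (Xte.sum(axis=1) > 0).astype(int)

svm = SVC(kernel="rbf").fit(Xtr, ytr)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=1).fit(Xtr, ytr)
print("RBF-SVM test accuracy:", svm.score(Xte, yte))
print("MLP test accuracy:    ", mlp.score(Xte, yte))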

6
Disadvantages of SVMs w.r.t. NNs
  • No way to train with a very large training set
  • Complexity O(N^2): either fast or impossible
  • Computational complexity during test
  • Solution: Burges' reduced-set method (an extra
    training step, right now only available in
    Matlab)
  • Accuracy: unless you optimize the
    hyper-parameters, accuracy is good but not great
  • Exhaustive hyper-parameter training is very slow
    (see the sketch after this list)
  • Can get good, but not great, accuracy with the
    theoretically correct hyper-parameters
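
A hedged illustration (toy data, illustrative grid values) of why exhaustive
hyper-parameter training is slow: every (C, gamma) cell of the grid costs one
SVM fit, each fit is roughly O(N^2) in kernel evaluations, and
cross-validation multiplies the count again.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 8))                  # toy training data
y = (X[:, 0] > 0).astype(int)

grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)  # 4 x 4 x 5 = 80 SVM fits,
search.fit(X, y)                                      # each roughly O(N^2)
print(search.best_params_, round(search.best_score_, 3))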

7
Disadvantages of SVMs w.r.t. NNs
  • The real problem: we need phonetically labeled
    training data
  • Embedded re-estimation experiment:
  • Pre-trained SVMs used as HMM input (tandem
    system)
  • RBF weights re-estimated, together with the HMM
    parameters, in order to maximize the likelihood
    of the training data
  • Result: training-data likelihood and WRA
    increased
  • Result: test-data WRA decreased

8
WS04
9
WS04 SVM/DBN hybrid recognizer
[Slide diagram: the word "LIKE" decomposed into
articulatory features.
Canonical form: tongue closed, tongue mid, tongue
front, tongue open.
Surface form: semi-closed, tongue open, tongue
front, tongue front.
Manner: glide, front vowel. Place: palatal.
SVM outputs: p( g_PGR(x) | palatal glide release ),
p( g_GR(x) | glide release ),
where x is a multi-frame observation including
spectrum, formants, and an auditory model.]
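
Not the WS04 code, just a hedged sketch of the tandem idea in this diagram,
with hypothetical detector names: each landmark SVM (e.g., a g_PGR
palatal-glide-release detector) is applied to every frame, and its sigmoid
posterior becomes one dimension of the observation vector handed to the
DBN/HMM.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_frames, n_dims = 500, 39                    # e.g., multi-frame spectral features
frames = rng.normal(size=(n_frames, n_dims))  # x: one observation per frame
landmark_labels = {                           # toy 0/1 labels per detector
    "palatal_glide_release": (frames[:, 0] > 1.0).astype(int),
    "glide_release": (frames[:, 1] > 1.0).astype(int),
}

detectors = {name: SVC(kernel="rbf").fit(frames, lab)
             for name, lab in landmark_labels.items()}

def tandem_features(x):
    # One sigmoid posterior per landmark detector, stacked per frame
    d = np.array([det.decision_function(x) for det in detectors.values()])
    return (1.0 / (1.0 + np.exp(-d))).T       # shape: (n_frames, n_detectors)

obs = tandem_features(frames)                 # observation stream for the DBN/HMM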

10
WS04 Organizational Lessons: What Worked
  • Innovative experiments, made possible by people
    who really wanted to be doing what they were
    doing
  • Result: the published ideas were interesting to
    many people
  • Parallel SVM classification experiments allowed
    us to test many different SVM definitions
  • Result: classification errors mostly below 20%
    by the end of the workshop
  • Parallel recognizer test experiments (DBN/SVM
    was one, MaxEnt-based lattice rescoring was
    another)
  • Result: both achieved a small (nonsignificant)
    WER reduction over the baseline

11
WS04 Organizational Lessons: What Didn't Work
  • Software bottleneck between the SVMs and the
    recognizers: only one tool was available to
    apply an SVM to every frame in a speech file,
    and only one person knew how to use it.
  • Too many experimental variables: should SVMs be
    trained using (1) all frames, or (2) only
    landmark frames? The DBN expects 1. The HMM
    works best if manner features are 1 and place
    features are 2. The DBN? Impossible to test in
    six weeks.
  • Apples & oranges: SVM-only classifier outputs in
    cases 1 and 2 were incomparable, so no test
    short of full DBN integration is meaningful.

12
WS04 Organizational Lessons: What Didn't Work
  • Unbeatable baseline: the goal was to rescore the
    output of the SRI recognizer in order to reduce
    WER, i.e., to find acoustic information not
    already used by the baseline recognizer.
  • What information is not already used? With a
    phone-based ANN/HMM hybrid system, it is hard to
    say.
  • When an experiment fails, why?
  • Better: use an open-source baseline (not state
    of the art, but that's OK), and construct test
    systems in a continuum between baseline and
    target.

13
AVICAR
14
AVICAR Recording Hardware
The system is not permanently installed; mounting
requires 10 minutes.
15
AVICAR Data Summary
  • 100 talkers
  • 5 noise conditions
  • Engine idling
  • 35 mph, windows closed / windows open
  • 55 mph, windows closed / windows open
  • 4 types of utterances
  • Isolated digits
  • Phone numbers
  • Isolated letters (e-set articulation test)
  • TIMIT sentences
  • Public release: 16 schools & companies (but I
    don't know how many are using it)

16
AVICAR Labeling & Recognition
  • Manual lip segmentation: 36 images
  • Automatic face tracking: nearly perfect
  • Automatic lip tracking: not so good
  • Manual audio segmentation: sentence boundaries
  • Audio enhancement
  • Audio digit WRA: 97%, 89%, 87%, 84%, 78%

17
AVICAR Data Problems
  • DIVX encoding gt database lt 300G, but
  • DIVX gt poor edge quality in some images
  • Amelioration plan re-transfer from tapes in high
    quality, huge data size for folks who want it.