Steps into Detection-Based ASR


1
Steps into Detection-Based ASR
  • I-Fan Chen
  • http://www.iis.sinica.edu.tw/ifanchen

2
Outline
  • Introduction
  • What is Detection-Based ASR?
  • Why Detection-Based ASR?
  • Steps into Detection-Based ASR
  • What we need?
  • How to implement a baseline system?
  • Available Toolkits
  • MLP
  • CRF

3
Introduction: What is Detection-Based ASR?
  • Detection-Based ASR = Knowledge-Based ASR
  • Detection: the approach
  • Knowledge: the target
  • Goal
  • To imitate human speech recognition

Ref: C.-H. Lee, et al., "An Overview on Automatic
Speech Attribute Transcription (ASAT)," in
Proc. INTERSPEECH 2007
4
Human Speech Recognition
[Diagram: Knowledge Detection → Integration → Knowledge (Higher Level)]
5
Human Speech Recognition
[Diagram: Knowledge Detection (acoustic features: MFCC, PLP) → Integration (HMM) → Knowledge (Higher Level)]
HMM is a special case of Knowledge-Based ASR
6
Why Detection-Based ASR?
  • Is classic HMM-ASR enough?
  • What we have done on it
  • Discriminative AM Training
  • Discriminative Feature Extraction
  • Discriminative LM Training
  • Discriminative Decoding
  • System Fusion
  • Bayesian optimization on WER/PER/ER
  • But

7
20
8
More than HMM
[Diagram: Knowledge Detection (acoustic features: MFCC, PLP) → Integration (HMM) → Knowledge (Higher Level)]
9
Why Detection-Based ASR?
  • We need more than HMM
  • More research topics
  • What kind of features should we use?
  • How to integrate different feature types?
  • Feature asynchrony problem
  • Different sampling rates
  • Sample-based
  • Frame-based
  • Feature format mismatch
  • Spike-train features

10
Why Detection-Based ASR?
  • We need more than HMM
  • More research topics
  • More interesting
  • Linguistics
  • Biology
  • Signal Processing
  • Machine Learning

11
Why Detection-Based ASR?
  • We need more than HMM
  • More research topics
  • More interesting
  • It is a Trend!

12
How to implement a baseline system?
  • What we need?
  • Corpus
  • English: TIMIT
  • Chinese: TCC300
  • Knowledge attributes
  • Detectors
  • SVM, MLP, HMM, etc.
  • Integrator
  • HMM, CRF, etc.

13
How to implement a baseline system?
  • Take a paper for example
  • I-Fan Chen and Hsin-Min Wang, "An Investigation
    of Phonological Feature Systems Used in
    Detection-Based ASR," in Proc. ISCSLP 2008
  • English DT ASR TIMIT
  • Focused problem
  • Knowledge Attribute Selection
  • The way to train a CRF integrator

14
Detection-Based ASR
HSR
Knowledge Detection
Integration
Knowledge (Higher Level)
  • Phone
  • Syllable
  • Word
  • Sentence
  • Semantic info
  • HMM
  • CRF (Conditional Random Fields)
  • Phonological attr
  • Prosodic attr
  • Acoustic attr

15
Phonological Systems (1)
  • 3 phonological systems (King & Taylor, 2000)
  • SPE, GP, MV
  • SPE: Sound Pattern of English
  • Chomsky and Halle, 1968
  • Production based
  • 13 binary features (+ silence = 14)
  • E.g., anterior, nasal, round, etc.
  • GP: Government Phonology
  • Sound structure: prime based
  • 11 binary features (8 primes + 3 head primes)

Ref: "Detection of Phonological Features in
Continuous Speech using Neural Networks,"
https://www.era.lib.ed.ac.uk/handle/1842/1001
16
Phonological Systems (2)
  • MV: Multi-Valued System
  • Commonly used in phonological analysis
  • Production based
  • 6 features, with 2-10 possible values each
  • E.g., centrality, front-back, manner, phonation,
    place, roundness

17
Phonological Features Detection (1)
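[Diagram: MLP detector for SPE14 and GP11. Input: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4). Hidden layer: 250 nodes. Output: attribute posterior probabilities for frame i, which are then quantized.]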
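The 9-frame input window can be sketched as follows. This is a minimal illustration, not the Nico Toolkit setup used in the paper: the function name `stack_context` is hypothetical, the MFCC values are made up, and repeating the first/last frame at utterance edges is an assumption the slides do not confirm.

```python
from typing import List


def stack_context(frames: List[List[float]], left: int = 4, right: int = 4) -> List[List[float]]:
    """Concatenate each frame with its neighbours (i-left .. i+right)."""
    n = len(frames)
    stacked = []
    for i in range(n):
        window: List[float] = []
        for offset in range(-left, right + 1):
            # Clamp at the utterance edges by repeating the first/last frame
            # (an assumption; the slides do not say how edges are handled).
            j = min(max(i + offset, 0), n - 1)
            window.extend(frames[j])
        stacked.append(window)
    return stacked


# Toy input: 20 frames of 13 "MFCC" values each (made-up numbers).
mfcc = [[float(t * 13 + c) for c in range(13)] for t in range(20)]
mlp_input = stack_context(mfcc)
print(len(mlp_input), len(mlp_input[0]))  # 20 117
```

Each MLP input vector therefore has 9 x 13 = 117 values, with the current frame i in the middle of the window.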
18
Phonological Features Detection (2)
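[Diagram: MV29 detector. Each of the 6 MV features outputs posterior probabilities over its possible values, e.g. Centrality: 0 1 0 0, Front-Back: 1 0 0, Roundness: 0 1 0. The per-feature vectors are concatenated into one vector: 0 1 0 0 1 0 0 0 1 0 ...]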
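A small sketch of how per-feature posteriors could be turned into the concatenated 0/1 vector shown on this slide. The posterior values and the helper name `quantize` are hypothetical, and picking the argmax is an assumed quantization rule; only three of the six MV features are included.

```python
from typing import Dict, List


def quantize(posteriors: List[float]) -> List[int]:
    """Turn a posterior vector into a one-hot vector (1 at the argmax)."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return [1 if k == best else 0 for k in range(len(posteriors))]


# Hypothetical posteriors for three of the six MV features (made-up values).
frame_posteriors: Dict[str, List[float]] = {
    "centrality": [0.1, 0.7, 0.1, 0.1],
    "front_back": [0.8, 0.1, 0.1],
    "roundness": [0.2, 0.6, 0.2],
}

one_hot = {name: quantize(p) for name, p in frame_posteriors.items()}
# Concatenate the per-feature one-hot vectors into one observation vector.
observation = [bit for name in frame_posteriors for bit in one_hot[name]]
print(observation)  # [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
```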
19
[Figure panels: SPE14, GP11, MV]
20
CRF Integrator
General Chain CRF
[Diagram: chain of output nodes y (phones): y_{i-1}, y_i, ...; conditioned on the input sequence x (phonological features): x_{i-1}, x_i, x_{i+1}]
λ_j, μ_k: feature function weight parameters
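For reference, a common way to write the general chain CRF is given below. The slide only names the weights, so assigning λ_j to transition feature functions f_j and μ_k to state feature functions g_k is an assumption that follows the usual textbook convention, not something the slides state.

```latex
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
\exp\!\Big( \sum_{i} \sum_{j} \lambda_j \, f_j(y_{i-1}, y_i, \mathbf{x}, i)
          + \sum_{i} \sum_{k} \mu_k \, g_k(y_i, \mathbf{x}, i) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
\exp\!\Big( \sum_{i} \sum_{j} \lambda_j \, f_j(y'_{i-1}, y'_i, \mathbf{x}, i)
          + \sum_{i} \sum_{k} \mu_k \, g_k(y'_i, \mathbf{x}, i) \Big)
```

Here y is the phone sequence, x is the sequence of phonological features, and Z(x) normalizes over all candidate phone sequences.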
21
CRF Integrator Training Issues
  • Required Label for CRF Training
  • Phone: y
  • Phonological features: x

[Diagram: Training data. Speech → MLP Detector → phonological features (with errors); together with the phone labels they train the DT CRF (Detected-data Trained CRF).]
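A minimal sketch of preparing DT CRF training rows from detector outputs. The token names (gp_A, n_gp_A, ...) follow the sample data on the CRF Train slide; the detector posteriors, the 0.5 threshold, and the helper names `to_token` and `make_rows` are assumptions for illustration.

```python
from typing import List, Sequence

# The four GP attributes visible in the sample data on the CRF Train slide.
PRIMES = ["A", "H", "hh", "S"]


def to_token(prime: str, posterior: float, threshold: float = 0.5) -> str:
    """Quantize one detector posterior into 'gp_X' (present) or 'n_gp_X' (absent)."""
    return f"gp_{prime}" if posterior >= threshold else f"n_gp_{prime}"


def make_rows(posteriors: Sequence[Sequence[float]], phones: Sequence[str]) -> List[str]:
    """One row per frame: detected attribute symbols, then the phone label."""
    rows = []
    for frame, phone in zip(posteriors, phones):
        tokens = [to_token(p, v) for p, v in zip(PRIMES, frame)]
        rows.append(" ".join(tokens + [phone]))
    return rows


# Made-up detector outputs for three frames, paired with their phone labels.
detected = [[0.1, 0.2, 0.3, 0.1], [0.9, 0.8, 0.7, 0.2], [0.6, 0.1, 0.4, 0.3]]
labels = ["h", "p", "er"]
rows = make_rows(detected, labels)
print("\n".join(rows))
```

Because the rows come from real detector outputs, they carry the detectors' errors into CRF training, which is the point of the DT CRF.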
22
Experiments
  • Corpus: TIMIT
  • No SA1, SA2
  • Training set (3296 utts), Dev set (400 utts)
  • Test set (1344 utts)
  • Phone set: TIMIT61
  • Evaluation: CMU/MIT 39
  • Baseline
  • CI-HMM
  • MLP toolkit
  • Nico Toolkit
  • CRF toolkit
  • CRF++

23
Oracle Test
  • Suppose all detectors achieve 100% accuracy
  • Upper bounds of phonological feature systems
  • OT CRF (Oracle Trained CRF)

[Table: Oracle free phone decoding results]
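The oracle setting can be sketched by deriving error-free attribute rows directly from the phone labels. The phone-to-attribute table below is hypothetical and for illustration only; it is not the actual GP table from the paper, and `oracle_row` is an invented helper name.

```python
from typing import Dict, List, Set

PRIMES = ["A", "H", "hh", "S"]

# Hypothetical phone -> active-attribute table, for illustration only.
PHONE_TO_PRIMES: Dict[str, Set[str]] = {
    "h": set(),
    "p": {"A", "H", "hh"},
    "er": {"A"},
}


def oracle_row(phone: str) -> str:
    """Build the row a perfect (100% accurate) detector bank would produce."""
    active = PHONE_TO_PRIMES[phone]
    tokens = [f"gp_{p}" if p in active else f"n_gp_{p}" for p in PRIMES]
    return " ".join(tokens + [phone])


oracle_rows: List[str] = [oracle_row(ph) for ph in ["h", "p", "er"]]
print("\n".join(oracle_rows))
```

Training and testing a CRF on rows like these gives an upper bound for a feature system, since no detection error is involved.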
24
Real Case for OT, RT CRF
  • Use the MLP detectors' results for CRF integration

25
System Combine (1)
  • 3 phonological systems are complementary

26
System Combine (2): method
  • Use CRF for combination
  • Use Development set for training

[Diagram: chain CRF. Input x_{i-1}, x_i, x_{i+1}: the phone outputs of the SPE, GP, and MV systems. Output y_{i-1}, y_i, ...: the combined result phone.]
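One way the combination CRF's training rows could be laid out is sketched below: one column per system plus the reference phone as the label. The phone sequences are made up, the helper name `combination_rows` is hypothetical, and the sketch assumes the three systems' outputs are already aligned position by position, which the slides do not spell out.

```python
from typing import List, Sequence


def combination_rows(spe: Sequence[str], gp: Sequence[str], mv: Sequence[str],
                     reference: Sequence[str]) -> List[str]:
    """One row per position: SPE phone, GP phone, MV phone, reference phone."""
    if not (len(spe) == len(gp) == len(mv) == len(reference)):
        raise ValueError("all sequences must have the same length")
    return [" ".join(cols) for cols in zip(spe, gp, mv, reference)]


# Made-up, position-aligned outputs of the three systems on a dev-set utterance.
spe_out = ["h", "p", "er"]
gp_out = ["h", "k", "er"]
mv_out = ["h", "p", "r"]
ref = ["h", "p", "er"]

combo_rows = combination_rows(spe_out, gp_out, mv_out, ref)
print("\n".join(combo_rows))
```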
27
System Combine (3): result
28
Conclusion
  • A well-designed phonological feature system is
    important.
  • Oracle Trained CRF is able to retrieve more
    phonological information from speech
  • High phone correction rate (sensitive to
    detection error)
  • Helpful for combination
  • Detection-Based ASR is promising
  • Front-end detector is a major issue

29
Relevant References
  • J. Morris and E. Fosler-Lussier, "Combining
    phonetic attributes using conditional random
    fields," in Proc. INTERSPEECH 2006
  • J. Morris and E. Fosler-Lussier, "Further
    Experiments With Detector-Based Conditional
    Random Fields in Phonetic Recognition," in Proc.
    ICASSP 2007

30
Toolkits Detail
  • MLP
  • QuickNet MLP (http://www.icsi.berkeley.edu/Speech/qn.html)
  • Nico ToolKit (http://nico.nikkostrom.com/)
  • CRF
  • CRF++ (http://crfpp.sourceforge.net/)
  • Mallet (http://mallet.cs.umass.edu/)
  • Java CRF (http://crf.sourceforge.net/)

31
Nico Toolkit
  • Program
  • C code
  • Runs directly in a Linux environment
  • Pros
  • Very good documentation
  • http://nico.nikkostrom.com/
  • Includes a speech example on the TIMIT corpus
  • http://nico.nikkostrom.com/doc/speechex.html

32
CRF++: Yet Another CRF toolkit
  • Program
  • C++ code
  • Runs directly on Linux / Windows
    (binary package for MS-Windows)
  • Train: crf_learn
  • Test: crf_test
  • Pros
  • Supports multi-threading
  • Very easy to use
  • Cons
  • Binary feature functions only

33
CRF++ Train
  • Required data (file)
  • Feature template
  • Training data
  • Command line
  • crf_learn fea.tpl train.data output_model.crf

Training data: one row per time step, with time running downward. The leading
columns are the observations; the last column is the label (Y).

    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h
    gp_A    gp_H    gp_hh    n_gp_S  p
    gp_A    n_gp_H  n_gp_hh  n_gp_S  er
    gp_A    n_gp_H  n_gp_hh  n_gp_S  er
    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h
    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h
    n_gp_A  gp_H    gp_hh    n_gp_S  s

Feature template: in %x[row,col], row is the relative time for the watch point
and col is the column ID.

    # Unigram
    U01:%x[-1,0]
    U02:%x[0,0]
    U03:%x[1,0]
    U04:%x[-1,1]
    U05:%x[0,1]
    U06:%x[1,1]
    # Bigram
    B
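To make the template notation concrete, the sketch below expands CRF++-style %x[row,col] macros by hand for one position of the sample data. It only mimics the substitution for illustration and is not CRF++ code; the helper name `expand` and the `_PAD` placeholder for positions outside the sequence are assumptions.

```python
import re
from typing import List

PAD = "_PAD"  # placeholder for rows outside the sequence (an assumption)


def expand(template: str, rows: List[List[str]], pos: int) -> str:
    """Expand every %x[row,col] macro in `template` at position `pos`."""
    def lookup(match):
        rel, col = int(match.group(1)), int(match.group(2))
        idx = pos + rel  # row is relative to the current position
        return rows[idx][col] if 0 <= idx < len(rows) else PAD
    return re.sub(r"%x\[(-?\d+),(\d+)\]", lookup, template)


# First three rows of the training sample (observation columns plus label).
rows = [
    "n_gp_A n_gp_H n_gp_hh n_gp_S h".split(),
    "gp_A gp_H gp_hh n_gp_S p".split(),
    "gp_A n_gp_H n_gp_hh n_gp_S er".split(),
]
templates = ["U01:%x[-1,0]", "U02:%x[0,0]", "U03:%x[1,0]",
             "U04:%x[-1,1]", "U05:%x[0,1]", "U06:%x[1,1]"]

features = [expand(t, rows, 1) for t in templates]
print(features)
# ['U01:n_gp_A', 'U02:gp_A', 'U03:gp_A', 'U04:n_gp_H', 'U05:gp_H', 'U06:n_gp_H']
```

At position 1 (the "p" row), the six unigram templates look at columns 0 and 1 of the previous, current, and next rows.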
34
CRF++ Test
  • Required data
  • model
  • Test_Data
  • Command line
  • crf_test -m model -o result Test_Data

Result file: one row per time step, with time running downward. The leading
columns are your AF detectors' results, followed by the answer and, in the
last column, the predicted result.

    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h   h
    n_gp_A  gp_H    gp_hh    gp_S    p   k
    n_gp_A  gp_H    gp_hh    gp_S    p   k
    gp_A    gp_H    gp_hh    n_gp_S  p   r
    gp_A    n_gp_H  n_gp_hh  n_gp_S  er  r
    n_gp_A  n_gp_H  n_gp_hh  gp_S    l   h
    n_gp_A  n_gp_H  n_gp_hh  gp_S    h   h
    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h   h
    n_gp_A  n_gp_H  n_gp_hh  n_gp_S  h   h
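A quick way to score a result file is to compare its last two columns row by row, as sketched below. This gives a per-row (token-level) accuracy only, which is an assumed stand-in and not the phone correct rate reported in the paper; the function name `token_accuracy` and the sample rows are illustrative.

```python
from typing import Iterable, Tuple


def token_accuracy(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (correct, total), comparing the last two columns of each row."""
    correct = total = 0
    for line in lines:
        cols = line.split()
        if len(cols) < 2:  # blank lines separate sequences
            continue
        answer, predicted = cols[-2], cols[-1]
        total += 1
        correct += int(answer == predicted)
    return correct, total


# A few rows in the result-file layout: observations, answer, prediction.
sample = """\
n_gp_A n_gp_H n_gp_hh n_gp_S h h
n_gp_A gp_H gp_hh gp_S p k
gp_A gp_H gp_hh n_gp_S p r
gp_A n_gp_H n_gp_hh n_gp_S er r

n_gp_A n_gp_H n_gp_hh n_gp_S h h
""".splitlines()

correct, total = token_accuracy(sample)
print(f"{correct}/{total} rows correct")  # 2/5 rows correct
```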