Title: Steps into Detection-Based ASR
1 Steps into Detection-Based ASR
- I-Fan Chen
- http://www.iis.sinica.edu.tw/ifanchen
2 Outline
- Introduction
- What is Detection-Based ASR?
- Why Detection-Based ASR?
- Steps into Detection-Based ASR
- What do we need?
- How to implement a baseline system?
- Available Toolkits
- MLP
- CRF
3 Introduction: What is Detection-Based ASR?
- Detection-Based ASR / Knowledge-Based ASR
- Detection approach
- Knowledge target
- Goal
- To Imitate Human Speech Recognition
Ref: C.-H. Lee et al., "An Overview on Automatic Speech Attribute Transcription (ASAT)," in Proc. INTERSPEECH 2007
4 Human Speech Recognition
[Figure: Knowledge Detection → Integration → Knowledge (Higher Level)]
5 Human Speech Recognition
[Figure: Knowledge Detection → Integration → Knowledge (Higher Level); in classic ASR, acoustic features (MFCC, PLP) fill the detection stage and an HMM does the integration]
- HMM is a special case of Knowledge-Based ASR
6 Why Detection-Based ASR?
- Is classic HMM-ASR enough?
- What has been done on it
- Discriminative AM Training
- Discriminative Feature Extraction
- Discriminative LM Training
- Discriminative Decoding
- System Fusion
- Bayesian optimization on WER/PER/ER
- But
8 More than HMM
[Figure: Knowledge Detection → Integration → Knowledge (Higher Level), with acoustic features (MFCC, PLP) and HMM shown as only one instance of the pipeline]
9 Why Detection-Based ASR?
- We need more than HMM
- More research topics
- What kind of features should we use?
- How to integrate different feature types?
- Feature asynchrony problem
- Different sampling rates
- Sample-based
- Frame-based
- Feature format mismatch
- Spike-train features
10 Why Detection-Based ASR?
- We need more than HMM
- More research topics
- More interesting
- Linguistics
- Biology
- Signal Processing
- Machine Learning
11 Why Detection-Based ASR?
- We need more than HMM
- More research topics
- More interesting
- It is a Trend!
12 How to implement a baseline system?
- What do we need?
- Corpus
- English TIMIT
- Chinese TCC300
- Knowledge Attributes
- Detectors
- SVM, MLP, HMM, etc.
- Integrator
- HMM, CRF, etc.
13 How to implement a baseline system?
- Take a paper as an example
- I-Fan Chen and Hsin-Min Wang, "An Investigation of Phonological Feature Systems Used in Detection-Based ASR," in Proc. ISCSLP 2008
- English detection-based (DT) ASR on TIMIT
- Focused problem
- Knowledge Attribute Selection
- The way to train a CRF integrator
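14 Detection-Based ASR
[Figure: the HSR (human speech recognition) pipeline, Knowledge Detection → Integration → Knowledge (Higher Level), applied to ASR]
- Knowledge detection: phonological attributes, prosodic attributes, acoustic attributes
- Integration: HMM, CRF (Conditional Random Fields)
- Knowledge (higher level): phone, syllable, word, sentence, semantic info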
15 Phonological Systems (1)
- 3 phonological systems (King & Taylor, 2000)
- SPE, GP, MV
- SPE: Sound Pattern of English
- Chomsky and Halle, 1968
- Production-based
- 13 binary features (14 including silence)
- E.g., anterior, nasal, round
- GP: Government Phonology
- Based on sound structure (primes)
- 11 binary features (8 primes + 3 head primes)
Ref: "Detection of Phonological Features in Continuous Speech using Neural Networks," https://www.era.lib.ed.ac.uk/handle/1842/1001
16 Phonological Systems (2)
- MV: Multi-Valued system
- Commonly used in phonological analysis
- Production-based
- 6 features, each with 2 to 10 possible values
- Centrality, front-back, manner, phonation, place, roundness
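17 Phonological Feature Detection (1)
[Figure: MLP detector]
- Input: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4)
- MLP with a hidden layer of 250 nodes
- Output: attribute posterior probabilities for frame i, quantized into SPE14 or GP11 features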
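The detector on this slide can be sketched as follows: stack 13 MFCCs from frames i-4 to i+4 into one 117-dimensional input, pass it through an MLP with a 250-node hidden layer, and threshold the attribute posteriors. This is a minimal pure-Python sketch, not the Nico Toolkit's implementation; the sigmoid units, the edge-frame padding, the 0.5 threshold, and the random (untrained) weights are assumptions for illustration.

```python
import math
import random

N_MFCC = 13      # MFCCs per frame (from the slide)
CONTEXT = 4      # frames i-4 .. i+4, a 9-frame window (from the slide)
N_HIDDEN = 250   # hidden-layer size (from the slide)
N_ATTR = 14      # outputs for SPE14; use 11 for GP11

def stack_context(frames, i, context=CONTEXT):
    """Concatenate frames i-context .. i+context into one input vector.
    Frames past either edge are replaced by the edge frame (assumption)."""
    window = []
    for t in range(i - context, i + context + 1):
        j = min(max(t, 0), len(frames) - 1)  # clamp to the utterance edges
        window.extend(frames[j])
    return window

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def layer(x, weights, biases):
    """Fully connected layer followed by a sigmoid."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_out, n_in, rng):
    """Small random weights; a real detector would be trained on TIMIT."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def quantize(posteriors, threshold=0.5):
    """Binary attribute decisions: 1 if the posterior exceeds the threshold."""
    return [1 if p > threshold else 0 for p in posteriors]

# Usage with random MFCCs and untrained weights (shapes only, not real output).
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(20)]
x = stack_context(frames, 10)                  # 9 * 13 = 117 inputs
w1, b1 = random_layer(N_HIDDEN, len(x), rng)
w2, b2 = random_layer(N_ATTR, N_HIDDEN, rng)
posteriors = layer(layer(x, w1, b1), w2, b2)
bits = quantize(posteriors)
```

The quantized bits are what the CRF integrator later receives as its observations x.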
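18 Phonological Feature Detection (2)
- MV29: the detector outputs a posterior probability for every possible value of the 6 MV features (29 outputs)
- Each feature's posteriors are quantized to a one-hot vector, e.g. Centrality = 0 1 0 0, Front-Back = 1 0 0, Roundness = 0 1 0
- Concatenated: 0 1 0 0 1 0 0 0 1 0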
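Assuming each MV feature is quantized by keeping only its most probable value (one-hot at the argmax), the step can be sketched as below. The group sizes for Centrality (4), Front-Back (3), and Roundness (3) are read off the example vectors on this slide; the posterior values are invented, and the other three features (manner, phonation, place) are left out.

```python
def quantize_groups(posteriors, group_sizes):
    """One-hot quantization of a flat posterior vector, one group per MV feature.

    posteriors  -- detector outputs for every possible value, concatenated by feature
    group_sizes -- number of possible values of each feature, in the same order
    """
    if sum(group_sizes) != len(posteriors):
        raise ValueError("group sizes must cover the posterior vector exactly")
    out, start = [], 0
    for size in group_sizes:
        group = posteriors[start:start + size]
        best = max(range(size), key=group.__getitem__)  # most probable value
        out.extend(1 if k == best else 0 for k in range(size))
        start += size
    return out

# Centrality (4 values), Front-Back (3), Roundness (3); made-up posteriors.
post = [0.1, 0.6, 0.2, 0.1,   0.7, 0.2, 0.1,   0.3, 0.5, 0.2]
print(quantize_groups(post, [4, 3, 3]))  # prints [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
```

With these inputs the result reproduces the concatenated vector on the slide; for the full MV system the group sizes would cover all of the detector's outputs.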
19 SPE14 / GP11 / MV
[Figure: the three phonological feature systems]
20 CRF Integrator
- General chain CRF
- Output y (phones): y_{i-1}, y_i, y_{i+1}
- Input x (phonological features): x_{i-1}, x_i, x_{i+1}
- λ_j, μ_k: feature function weight parameters
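For reference, the textbook linear-chain CRF that this slide's symbols point to is written below. The transition functions t_j and state functions s_k are the standard formulation and an assumption here, since the slide only names their weights λ_j and μ_k; in CRF++ these feature functions are binary indicators.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\!\Big( \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, \mathbf{x}, i)
            + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, \mathbf{x}, i) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\!\Big( \sum_{i} \sum_{j} \lambda_j \, t_j(y'_{i-1}, y'_i, \mathbf{x}, i)
            + \sum_{i} \sum_{k} \mu_k \, s_k(y'_i, \mathbf{x}, i) \Big)
```

Training typically fits λ_j and μ_k by maximizing the (regularized) conditional log-likelihood of the phone labels; decoding picks the phone sequence y that maximizes p(y | x).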
21 CRF Integrator Training Issues
- Required labels for CRF training
- Phone y
- Phonological features x
[Figure: detected-data trained (DT) CRF. Speech → MLP detector → phonological features (with errors); these are paired with the phone labels of the training data to train the DT CRF]
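Assembling DT CRF training data amounts to writing each frame's quantized detector output next to that frame's reference phone. The sketch below assumes the CRF++ file layout (one frame per line, label in the last column, a blank line between utterances) and the gp_X / n_gp_X naming used in the CRF++ training example of this tutorial; the helper names are hypothetical. For an OT CRF, the feature bits would come from the phone labels instead of the detector.

```python
def dt_rows(bits, names, phones):
    """CRF++ training lines for one utterance.

    bits[t][k] -- quantized detector output (0/1) of feature names[k] at frame t
    phones[t]  -- reference phone label of frame t (the CRF output y)
    """
    if len(bits) != len(phones):
        raise ValueError("need exactly one phone label per frame")
    rows = []
    for frame, phone in zip(bits, phones):
        # A present feature is written as its name, an absent one with an "n_" prefix.
        cols = [name if b else "n_" + name for name, b in zip(names, frame)]
        rows.append(" ".join(cols + [phone]))
    return rows

def to_crfpp(utterances):
    """Join utterances into one training file, separated by blank lines."""
    return "\n\n".join("\n".join(rows) for rows in utterances) + "\n"

names = ["gp_A", "gp_H", "gp_hh", "gp_S"]   # 4 of the 11 GP features
utt = dt_rows([[0, 0, 0, 0], [1, 1, 1, 0]], names, ["h", "p"])
print(to_crfpp([utt]), end="")
# n_gp_A n_gp_H n_gp_hh n_gp_S h
# gp_A gp_H gp_hh n_gp_S p
```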
22 Experiments
- Corpus: TIMIT
- SA1 and SA2 sentences excluded
- Training set (3,296 utts), dev set (400 utts)
- Test set (1,344 utts)
- Phone set: TIMIT 61 phones
- Evaluation: CMU/MIT 39-phone set
- Baseline
- CI-HMM (context-independent HMM)
- MLP toolkit
- Nico Toolkit
- CRF toolkit
- CRF++
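Scoring on the CMU/MIT 39-phone set means folding the 61 TIMIT labels before comparison. A sketch, assuming the widely used Lee and Hon (1989) folding (closures and pauses merged into silence, the glottal stop q dropped); it should be checked against the exact table used for the reported numbers.

```python
# Phones that change under the folding; every other TIMIT phone maps to itself.
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "ix": "ih", "ux": "uw",
    "hv": "hh", "el": "l", "em": "m", "en": "n", "nx": "n", "eng": "ng", "zh": "sh",
    "bcl": "sil", "dcl": "sil", "gcl": "sil", "pcl": "sil", "tcl": "sil", "kcl": "sil",
    "h#": "sil", "pau": "sil", "epi": "sil",
    "q": None,  # glottal stop: removed before scoring
}

TIMIT61 = """iy ih eh ey ae aa aw ay ah ao oy ow uh uw ux er ax ix axr ax-h
b d g p t k dx q bcl dcl gcl pcl tcl kcl jh ch s sh z zh f th v dh
m n ng em en eng nx l r w y hh hv el pau epi h#""".split()

def fold(phones):
    """Map a TIMIT-61 phone sequence to the 39-phone set, dropping 'q'."""
    folded = []
    for p in phones:
        mapped = FOLD.get(p, p)
        if mapped is not None:
            folded.append(mapped)
    return folded

print(len({FOLD.get(p, p) for p in TIMIT61} - {None}))  # prints 39
print(fold(["h#", "q", "ix", "pcl", "p"]))              # prints ['sil', 'ih', 'sil', 'p']
```

The CRF is still trained on the 61 labels; folding is applied to both reference and hypothesis only at scoring time.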
23 Oracle Test
- Suppose all detectors achieve 100% accuracy
- Gives upper bounds for the phonological feature systems
- OT CRF (oracle-trained CRF)
[Table: oracle free phone decoding results]
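Free phone decoding results are usually reported as phone correct rate and phone accuracy after aligning each decoded sequence with its reference. The sketch below assumes unit edit costs (scoring tools such as HTK's HResults may weight error types differently) and the usual definitions Correct = (N - S - D)/N and Accuracy = (N - S - D - I)/N; the slides do not say which scoring tool was used, and the example sequences are made up.

```python
def align_counts(ref, hyp):
    """Minimum-edit-distance alignment with unit costs.

    Returns (substitutions, deletions, insertions); ties between equally short
    alignments are broken by preferring fewer substitutions, then fewer deletions.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (edits, S, D, I) for ref[:i] against hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = best[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            diag = (e + miss, s + miss, d, k)   # match or substitution
            e, s, d, k = best[i - 1][j]
            up = (e + 1, s, d + 1, k)           # reference phone deleted
            e, s, d, k = best[i][j - 1]
            left = (e + 1, s, d, k + 1)         # extra hypothesis phone inserted
            best[i][j] = min(diag, up, left)
    _, s, d, k = best[n][m]
    return s, d, k

def phone_scores(ref, hyp):
    """Correct = (N - S - D) / N and Accuracy = (N - S - D - I) / N, with N = len(ref)."""
    s, d, k = align_counts(ref, hyp)
    n = len(ref)
    return (n - s - d) / n, (n - s - d - k) / n

ref = ["sil", "h", "eh", "l", "ow", "sil"]
hyp = ["sil", "h", "ah", "l", "l", "ow", "sil"]
print(align_counts(ref, hyp))   # prints (1, 0, 1)
```

Because Correct ignores insertions, a system can score a high phone correct rate while its accuracy stays lower, which is why both numbers are usually reported.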
24 Real Case for OT, RT CRF
- Use the MLP detector outputs for CRF integration
25 System Combination (1)
- The 3 phonological systems are complementary
26 System Combination (2): Method
- Use a CRF for combination
- Use the development set for training
- Input x: phone outputs of the SPE, GP, and MV systems (x_{i-1}, x_i, x_{i+1})
- Output y: combined phone result (y_{i-1}, y_i, y_{i+1})
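One way to prepare the combination data, assuming the three systems' outputs are aligned frame by frame (an assumption; the slide does not say at which level they are combined): write one CRF++ line per frame with the SPE, GP, and MV phone hypotheses as columns 0 to 2 and the reference phone last, and let the template look at each column over frames i-1, i, i+1. The helper name and the template IDs are hypothetical.

```python
def combination_rows(spe, gp, mv, ref):
    """One CRF++ line per frame: SPE, GP and MV phone outputs, then the reference phone."""
    if not (len(spe) == len(gp) == len(mv) == len(ref)):
        raise ValueError("all sequences need one entry per frame")
    return [" ".join(cols) for cols in zip(spe, gp, mv, ref)]

# Unigram features for each system (columns 0-2) at time offsets -1, 0, +1,
# one feature joining all three current outputs, and label bigrams.
TEMPLATE = "\n".join(
    ["# Unigram"]
    + [f"U{col}{k}:%x[{off},{col}]" for col in range(3) for k, off in enumerate((-1, 0, 1))]
    + ["U30:%x[0,0]/%x[0,1]/%x[0,2]", "", "# Bigram", "B"]
)

print("\n".join(combination_rows(["h", "p"], ["h", "b"], ["hh", "p"], ["h", "p"])))
# h h hh h
# p b p p
```

The joint feature U30 lets the CRF learn which system to trust when the three disagree, which is where complementary systems help.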
27 System Combination (3): Results
28 Conclusion
- A well-designed phonological feature system is important
- The Oracle-Trained (OT) CRF can retrieve more phonological information from speech
- High phone correct rate (but sensitive to detection errors)
- Helpful for combination
- Detection-Based ASR is promising
- The front-end detector is a major issue
29 Relevant References
- J. Morris and E. Fosler-Lussier, "Combining phonetic attributes using conditional random fields," in Proc. INTERSPEECH 2006
- J. Morris and E. Fosler-Lussier, "Further experiments with detector-based conditional random fields in phonetic recognition," in Proc. ICASSP 2007
30 Toolkit Details
- MLP
- QuickNet MLP (http://www.icsi.berkeley.edu/Speech/qn.html)
- Nico Toolkit (http://nico.nikkostrom.com/)
- CRF
- CRF++ (http://crfpp.sourceforge.net/)
- Mallet (http://mallet.cs.umass.edu/)
- Java CRF (http://crf.sourceforge.net/)
31 Nico Toolkit
- Program
- C code
- Executable programs run directly in a Linux environment
- Pros
- Very good documentation
- http://nico.nikkostrom.com/
- Speech example on the TIMIT corpus
- http://nico.nikkostrom.com/doc/speechex.html
32 CRF++: Yet Another CRF Toolkit
- Program
- C++ code
- Executable programs run directly on Linux / Windows (binary package for MS-Windows)
- Train: crf_learn
- Test: crf_test
- Pros
- Supports multi-threading
- Very easy to use
- Cons
- Binary feature functions only
33 CRF++ Train
- Required data (files)
- Feature template
- Training data
- Command line
- crf_learn fea.tpl train.data output_model.crf
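Training data (train.data), one frame per line in time order; columns 0-3 are observations (GP features), and the last column is the label (Y):
    n_gp_A n_gp_H n_gp_hh n_gp_S h
    gp_A gp_H gp_hh n_gp_S p
    gp_A n_gp_H n_gp_hh n_gp_S er
    gp_A n_gp_H n_gp_hh n_gp_S er
    n_gp_A n_gp_H n_gp_hh n_gp_S h
    n_gp_A n_gp_H n_gp_hh n_gp_S h
    n_gp_A gp_H gp_hh n_gp_S s
Feature template (fea.tpl):
    # Unigram
    U01:%x[-1,0]
    U02:%x[0,0]
    U03:%x[1,0]
    U04:%x[-1,1]
    U05:%x[0,1]
    U06:%x[1,1]
    # Bigram
    B
In %x[row,col], row is the time offset relative to the current frame (the watch point) and col is the column ID.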
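To see what the template does, the sketch below expands each unigram line at one frame by replacing %x[row,col] with the token at (current frame + row, column col), mimicking the macro substitution described on this slide. CRF++ then pairs each expanded string with every output label to form binary features, and B adds label-bigram features. The boundary placeholder <pad> is hypothetical; CRF++ uses its own boundary tokens.

```python
import re

TEMPLATE = """# Unigram
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[-1,1]
U05:%x[0,1]
U06:%x[1,1]

# Bigram
B"""

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")
PAD = "<pad>"  # hypothetical placeholder for positions outside the utterance

def expand(template_line, rows, i):
    """Replace every %x[row,col] in a template line with the token it points at,
    relative to frame i (row = time offset, col = column ID)."""
    def lookup(m):
        t, col = i + int(m.group(1)), int(m.group(2))
        return rows[t][col] if 0 <= t < len(rows) else PAD
    return MACRO.sub(lookup, template_line)

def unigram_features(template, rows, i):
    """Expanded feature strings of all unigram (U...) template lines at frame i."""
    return [expand(line, rows, i) for line in template.splitlines() if line.startswith("U")]

# First three frames of the training example on this slide.
data = """n_gp_A n_gp_H n_gp_hh n_gp_S h
gp_A gp_H gp_hh n_gp_S p
gp_A n_gp_H n_gp_hh n_gp_S er"""
rows = [line.split() for line in data.splitlines()]
print(unigram_features(TEMPLATE, rows, 1))
# prints ['U01:n_gp_A', 'U02:gp_A', 'U03:gp_A', 'U04:n_gp_H', 'U05:gp_H', 'U06:n_gp_H']
```

Note that the template never refers to column 4: the label column is only used as the CRF output.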
34 CRF++ Test
- Required data
- model
- Test_Data
- Command line
- crf_test -m model -o result Test_Data
Output (result), one frame per line in time order; columns 0-3 are your AF detector results, column 4 is the answer, and the last column is the predicted result:
    n_gp_A n_gp_H n_gp_hh n_gp_S h h
    n_gp_A gp_H gp_hh gp_S p k
    n_gp_A gp_H gp_hh gp_S p k
    gp_A gp_H gp_hh n_gp_S p r
    gp_A n_gp_H n_gp_hh n_gp_S er r
    n_gp_A n_gp_H n_gp_hh gp_S l h
    n_gp_A n_gp_H n_gp_hh gp_S h h
    n_gp_A n_gp_H n_gp_hh n_gp_S h h
    n_gp_A n_gp_H n_gp_hh n_gp_S h h
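A quick sanity check on the result file is frame-level agreement between the answer and the predicted column. The sketch below assumes the output layout shown on this slide at crf_test's default verbosity (higher verbosity adds probability information, which this parser does not handle); phone-level scores would additionally need runs of identical labels collapsed and an alignment against the reference.

```python
def read_crf_output(text):
    """Parse crf_test output into utterances of (answer, predicted) pairs.

    Assumes each line holds the test file's columns (detector results, then the
    answer label) followed by the predicted label, with a blank line between utterances.
    """
    utterances, current = [], []
    for line in text.splitlines():
        cols = line.split()
        if not cols:                      # blank line closes an utterance
            if current:
                utterances.append(current)
                current = []
            continue
        current.append((cols[-2], cols[-1]))
    if current:
        utterances.append(current)
    return utterances

def frame_accuracy(utterances):
    """Fraction of frames whose predicted label equals the answer."""
    frames = [pair for utt in utterances for pair in utt]
    return sum(ans == pred for ans, pred in frames) / len(frames)

sample = """n_gp_A n_gp_H n_gp_hh n_gp_S h h
n_gp_A gp_H gp_hh gp_S p k
gp_A gp_H gp_hh n_gp_S p r
n_gp_A n_gp_H n_gp_hh n_gp_S h h
"""
print(frame_accuracy(read_crf_output(sample)))  # prints 0.5
```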