Steps into Detection-Based ASR

1
Steps into Detection-Based ASR

I-Fan Chen
http://www.iis.sinica.edu.tw/ifanchen

2
Outline

Introduction
What is Detection-Based ASR?
Why Detection-Based ASR?
Steps into Detection-Based ASR
What do we need?
How to implement a baseline system?
Available Toolkits
MLP
CRF

3
Introduction: What is Detection-Based ASR?

Detection-Based ASR (also called Knowledge-Based ASR)
Detection: the approach
Knowledge: the target
Goal
To Imitate Human Speech Recognition

Ref: C.-H. Lee et al., "An Overview on Automatic Speech Attribute Transcription (ASAT)," in Proc. INTERSPEECH 2007
4
Human Speech Recognition
Diagram: Knowledge Detection -> Integration -> Knowledge (Higher Level)
5
Human Speech Recognition
Diagram: Knowledge Detection (acoustic features: MFCC, PLP) -> Integration (HMM) -> Knowledge (Higher Level)
HMM is a special case of Knowledge-Based ASR
6
Why Detection-Based ASR?

Is classic HMM-ASR enough?
What we have done with it
Discriminative AM Training
Discriminative Feature Extraction
Discriminative LM Training
Discriminative Decoding
System Fusion
Bayesian optimization on WER/PER/ER
But

8
More than HMM
Diagram: Knowledge Detection (acoustic features: MFCC, PLP) -> Integration (HMM) -> Knowledge (Higher Level)
9
Why Detection-Based ASR?

We need more than HMM
More research topics
What kind of features should we use?
How to integrate different feature types?
Feature asynchrony problem
Different sampling rates: sample-based, frame-based, etc.
Feature format mismatch
Spike-train features

10
Why Detection-Based ASR?

We need more than HMM
More research topics
More interesting
Linguistics
Biology
Signal Processing
Machine Learning

11
Why Detection-Based ASR?

We need more than HMM
More research topics
More interesting
It is a Trend!

12
How to implement a baseline system?

What do we need?
Corpus
English: TIMIT
Chinese: TCC300
Knowledge Attributes
Detectors
SVM, MLP, HMM, etc.
Integrator
HMM, CRF, etc.

13
How to implement a baseline system?

Take a paper as an example
I-Fan Chen and Hsin-Min Wang, "An Investigation of Phonological Feature Systems Used in Detection-Based ASR," in Proc. ISCSLP 2008
English DT ASR on TIMIT
Problems in focus
Knowledge attribute selection
The way to train a CRF integrator

14
Detection-Based ASR
HSR: Knowledge Detection -> Integration -> Knowledge (Higher Level)

Knowledge Detection: phonological attributes, prosodic attributes, acoustic attributes
Integration: HMM, CRF (Conditional Random Fields)
Knowledge (Higher Level): phone, syllable, word, sentence, semantic info

15
Phonological Systems (1)

3 phonological systems (King & Taylor, 2000): SPE, GP, MV
SPE: Sound Pattern of English
Chomsky and Halle, 1968
Production based
13 binary features (+ silence = 14)
E.g., anterior, nasal, round, etc.
GP: Government Phonology
Sound structure: prime based
11 binary features (8 primes + 3 head primes)

Ref: "Detection of Phonological Features in Continuous Speech using Neural Networks," https://www.era.lib.ed.ac.uk/handle/1842/1001
16
Phonological Systems (2)

MV: Multi-Valued System
Commonly used in phonological analysis
Production based
6 features, each with 2 to 10 possible values
E.g., centrality, front-back, manner, phonation, place, roundness

17
Phonological Features Detection (1)
Detectors: SPE14, GP11
Input: 13 MFCCs per frame, 9 frames (i-4, ..., i, ..., i+4)
Detector: MLP, hidden layer with 250 nodes
Output: attribute posterior probabilities for frame i, then quantized
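The 9-frame input window described on this slide can be sketched as follows. This is only an illustration: the function name `stack_context` is hypothetical, and repeating the first and last frame at the utterance edges is an assumption, since the slides do not say how edges are handled.

```python
from typing import List

def stack_context(frames: List[List[float]], left: int = 4, right: int = 4) -> List[List[float]]:
    """Concatenate frames i-left .. i+right for every frame i.

    With 13 MFCCs per frame and left = right = 4, each MLP input
    vector has 13 * 9 = 117 values.
    """
    last = len(frames) - 1
    stacked = []
    for i in range(len(frames)):
        window: List[float] = []
        for offset in range(-left, right + 1):
            # Assumption of this sketch: clamp indices at the utterance edges.
            j = min(max(i + offset, 0), last)
            window.extend(frames[j])
        stacked.append(window)
    return stacked

# Toy input: 20 frames, each with 13 identical values equal to the frame index.
frames = [[float(t)] * 13 for t in range(20)]
mlp_input = stack_context(frames)
print(len(mlp_input), len(mlp_input[0]))  # 20 117
```

Each row of `mlp_input` would be one input vector for the MLP detector; the centre 13 values of a row belong to frame i itself.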
18
Phonological Features Detection (2)
MV29: 6 MV features, one posterior probability per possible value
Centrality: 0 1 0 0
Front-Back: 1 0 0
Roundness: 0 1 0
Concatenated: 0 1 0 0 1 0 0 0 1 0
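The way the per-feature value vectors on this slide line up into one vector can be reproduced with a few lines of Python. The sizes and active positions are read off the slide; the helper name `one_hot` and the tuple layout are assumptions made for the sketch, and only the three features shown on the slide are included.

```python
def one_hot(index: int, size: int) -> list:
    """Return a vector of `size` zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

# (feature name, number of possible values, active value index), as on the slide.
slide_example = [
    ("centrality", 4, 1),   # 0 1 0 0
    ("front-back", 3, 0),   # 1 0 0
    ("roundness", 3, 1),    # 0 1 0
]

blocks = {name: one_hot(index, size) for name, size, index in slide_example}
concatenated = [bit for name, _, _ in slide_example for bit in blocks[name]]
print(concatenated)  # [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
```

In the detector output the zeros and ones would be replaced by posterior probabilities, one per possible value.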
19
Figure panels: SPE14, GP11, MV
20
CRF Integrator
General Chain CRF
Output (Phone): y = (..., y_{i-1}, y_i, ...)
Input (Phonological Features): x = (..., x_{i-1}, x_i, x_{i+1}, ...)
λ_j, μ_k: feature function weight parameters
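The equation on this slide did not survive extraction. For orientation, the commonly used form of a linear-chain CRF with transition weights λ_j and state weights μ_k is given below; the feature function names t_j and s_k are the usual textbook symbols and are an assumption here, not taken from the slide.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, \mathbf{x}, i)
          + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, \mathbf{x}, i) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Big( \sum_{i} \sum_{j} \lambda_j \, t_j(y'_{i-1}, y'_i, \mathbf{x}, i)
          + \sum_{i} \sum_{k} \mu_k \, s_k(y'_i, \mathbf{x}, i) \Big)
```

Here y is the phone sequence, x is the sequence of phonological features, and Z(x) normalizes over all candidate phone sequences y'.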
21
CRF Integrator Training Issues

Required labels for CRF training
Phone: y
Phonological features: x

Training data for the DT (Detected-data Trained) CRF
Speech -> MLP detector -> phonological features (with errors)
Phone labels
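A minimal sketch of how detected training rows might be assembled for the DT CRF is shown below. The token style (`gp_A` for "on", `n_gp_A` for "off") follows the CRF toolkit data shown later in the deck; the 0.5 threshold, the function name `quantize`, and the posterior values are assumptions made for illustration only.

```python
def quantize(posteriors, names, threshold=0.5):
    """Map each attribute posterior to an on/off token for one frame."""
    return [name if p >= threshold else "n_" + name
            for name, p in zip(names, posteriors)]

# Four GP attribute names as they appear in the toolkit data columns.
names = ["gp_A", "gp_H", "gp_hh", "gp_S"]

# Made-up detector outputs for two frames, with their phone labels.
frame_posteriors = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.2]]
phone_labels = ["h", "p"]

# One row per frame: observation tokens first, phone label last.
rows = [" ".join(quantize(p, names) + [y])
        for p, y in zip(frame_posteriors, phone_labels)]
print("\n".join(rows))
# n_gp_A n_gp_H n_gp_hh n_gp_S h
# gp_A gp_H gp_hh n_gp_S p
```

Because the tokens come from real detector output, any detection error ends up in the CRF training data, which is the point of training on detected rather than oracle features.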
22
Experiments

Corpus: TIMIT
SA1 and SA2 excluded
Training set (3296 utts), Dev set (400 utts), Test set (1344 utts)
Phone set: TIMIT 61
Evaluation: CMU/MIT 39
Baseline: CI-HMM
MLP toolkit: Nico Toolkit
CRF toolkit: CRF++

23
Oracle Test

Suppose all detectors achieve 100% accuracy
Upper bounds of the phonological feature systems
OT (Oracle Trained) CRF

Oracle Free Phone Decoding Results
24
Real Case for OT, RT CRF

Use the MLP detectors' results for CRF integration

25
System Combine (1)

3 phonological systems are complementary

26
System Combine (2): method

Use CRF for combination
Use the development set for training

Output y = (..., y_{i-1}, y_i, ...): combined result (phone)
Input x = (..., x_{i-1}, x_i, x_{i+1}, ...): phone outputs of the SPE, GP, and MV systems
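One way to picture the combination CRF's training data is as rows with the three systems' phone hypotheses as observation columns and the reference phone as the label. The sketch below assumes the three outputs are already aligned step by step, which the slides do not state; the function name `combination_rows` and the phone sequences are made up.

```python
def combination_rows(spe, gp, mv, reference):
    """Build one row per time step: SPE, GP, MV hypotheses, then the label."""
    if not (len(spe) == len(gp) == len(mv) == len(reference)):
        raise ValueError("all sequences must be aligned and equally long")
    return [" ".join(cols) for cols in zip(spe, gp, mv, reference)]

# Made-up aligned outputs of the three systems and the reference phones.
spe_sys = ["h", "p", "er"]
gp_sys = ["h", "k", "er"]
mv_sys = ["h", "p", "r"]
reference = ["h", "p", "er"]

rows = combination_rows(spe_sys, gp_sys, mv_sys, reference)
print("\n".join(rows))
# h h h h
# p k p p
# er er r er
```

Rows in this layout have the same shape as the detector-based CRF data (observations first, label last), so the same toolkit can be used for combination.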
27
System Combine (3): result
28
Conclusion

A well-designed phonological feature system is important.
Oracle Trained CRF is able to retrieve more phonological information from speech
High phone correction rate (sensitive to detection error)
Helpful for combination
Detection-Based ASR is promising
Front-end detector is a major issue

29
Relevant References

J. Morris and E. Fosler-Lussier, "Combining phonetic attributes using conditional random fields," in Proc. INTERSPEECH 2006
J. Morris and E. Fosler-Lussier, "Further Experiments with Detector-Based Conditional Random Fields in Phonetic Recognition," in Proc. ICASSP 2007

30
Toolkits Detail

MLP
QuickNet MLP (http://www.icsi.berkeley.edu/Speech/qn.html)
Nico Toolkit (http://nico.nikkostrom.com/)
CRF
CRF++ (http://crfpp.sourceforge.net/)
Mallet (http://mallet.cs.umass.edu/)
Java CRF (http://crf.sourceforge.net/)

31
Nico Toolkit

Program
C code
Runs directly as an executable in a Linux environment
Pros
Very good documentation
http://nico.nikkostrom.com/
Includes a speech example on the TIMIT corpus
http://nico.nikkostrom.com/doc/speechex.html

32
CRF++: Yet Another CRF Toolkit

Program
C++ code
Runs directly as an executable on Linux / Windows (binary package for MS-Windows)
Train: crf_learn
Test: crf_test
Pros
Supports multi-threading
Very easy to use
Cons
Binary feature functions only

33
CRF++ Train

Required data (files)
Feature template
Training data
Command line
crf_learn fea.tpl train.data output_model.crf

Training data (train.data): one row per time step, with time running downward; the leading columns are observations and the last column is the label (Y)
n_gp_A n_gp_H n_gp_hh n_gp_S h
gp_A gp_H gp_hh n_gp_S p
gp_A n_gp_H n_gp_hh n_gp_S er
gp_A n_gp_H n_gp_hh n_gp_S er
n_gp_A n_gp_H n_gp_hh n_gp_S h
n_gp_A n_gp_H n_gp_hh n_gp_S h
n_gp_A gp_H gp_hh n_gp_S s

Feature template (fea.tpl): in %x[row,col], row is the relative time for the watch point and col is the column ID
# Unigram
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[-1,1]
U05:%x[0,1]
U06:%x[1,1]
# Bigram
B
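To make the template notation concrete, the sketch below expands `%x[row,col]` macros by hand for one time step, using the first three training rows from this slide. It is an illustration of the idea, not the toolkit's own code: the function name `expand` is hypothetical, and the `_PAD` token used for positions outside the sequence is an assumption of this sketch.

```python
import re

# Matches macros of the form %x[row,col], e.g. %x[-1,0].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def expand(template_line, rows, i):
    """Expand the %x[row,col] macros of one template line at time step i."""
    def lookup(match):
        t = i + int(match.group(1))   # relative time for the watch point
        col = int(match.group(2))     # column ID
        if 0 <= t < len(rows):
            return rows[t][col]
        return "_PAD"                 # assumption: placeholder outside the sequence
    return MACRO.sub(lookup, template_line)

rows = [line.split() for line in [
    "n_gp_A n_gp_H n_gp_hh n_gp_S h",
    "gp_A gp_H gp_hh n_gp_S p",
    "gp_A n_gp_H n_gp_hh n_gp_S er",
]]
template = ["U01:%x[-1,0]", "U02:%x[0,0]", "U03:%x[1,0]",
            "U04:%x[-1,1]", "U05:%x[0,1]", "U06:%x[1,1]"]

# Features generated for the middle row (time step 1).
features = [expand(line, rows, 1) for line in template]
print(features)
# ['U01:n_gp_A', 'U02:gp_A', 'U03:gp_A', 'U04:n_gp_H', 'U05:gp_H', 'U06:n_gp_H']
```

Each expanded string is paired with the label at that time step to form a binary feature function, which is why the toolkit is listed as supporting binary feature functions only.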
34
CRF++ Test
crf_test output: one row per time step, with time running downward; the last column is the label estimated by crf_test
n_gp_A n_gp_H n_gp_hh n_gp_S h h
n_gp_A gp_H gp_hh gp_S p k
n_gp_A gp_H gp_hh gp_S p k
gp_A gp_H gp_hh n_gp_S p r
gp_A n_gp_H n_gp_hh n_gp_S er r
n_gp_A n_gp_H n_gp_hh gp_S l h
n_gp_A n_gp_H n_gp_hh gp_S h h
n_gp_A n_gp_H n_gp_hh n_gp_S h h
n_gp_A n_gp_H n_gp_hh n_gp_S h h
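Output of this shape can be scored by comparing the last two columns of each row. The sketch below does that for the rows on this slide, assuming the second-to-last column is the reference label and the last column is the estimated one. The function name `label_accuracy` is hypothetical, and this per-row score is not the phone-level evaluation used in the experiments.

```python
def label_accuracy(output_lines):
    """Fraction of rows whose last two columns (reference, estimate) agree."""
    total = 0
    correct = 0
    for line in output_lines:
        cols = line.split()
        # Skip blank separator lines and comment lines.
        if len(cols) < 2 or cols[0].startswith("#"):
            continue
        total += 1
        if cols[-2] == cols[-1]:
            correct += 1
    return correct / total if total else 0.0

output_lines = [
    "n_gp_A n_gp_H n_gp_hh n_gp_S h h",
    "n_gp_A gp_H gp_hh gp_S p k",
    "n_gp_A gp_H gp_hh gp_S p k",
    "gp_A gp_H gp_hh n_gp_S p r",
    "gp_A n_gp_H n_gp_hh n_gp_S er r",
    "n_gp_A n_gp_H n_gp_hh gp_S l h",
    "n_gp_A n_gp_H n_gp_hh gp_S h h",
    "n_gp_A n_gp_H n_gp_hh n_gp_S h h",
    "n_gp_A n_gp_H n_gp_hh n_gp_S h h",
]
accuracy = label_accuracy(output_lines)
print(f"{accuracy:.3f}")  # 0.444
```

Four of the nine rows agree, so the per-row accuracy for this small excerpt is 4/9.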