The SPEERAL Decoder - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
The SPEERAL Decoder
  • NOCERA Pascal
  • Laboratoire d'Informatique d'Avignon
  • AGROPARC
  • BP 1228, 84911 AVIGNON Cedex 9
  • Tel: 04.90.84.35.07
  • E-mail: pascal.nocera@lia.univ-avignon.fr

2
The SPEERAL System
Stochastic approach: find the best hypothesis
among all possible hypotheses with the A*
algorithm.
3
The SPEERAL System
Stochastic approach
4
Acoustic Models
  • Hidden Markov Models
  • Gaussian Mixture Models
  • Contextual Models (Phonemes)
5
Acoustic Model Toolkit
  • Parameterization program
  • Text to phone program
  • Alignment program
  • HMM learning program
  • Supervised and unsupervised Model Adaptation
  • MLLR
  • MAP
  • Structural Model Space Transformation

6
Linguistic Models
  • Stochastic Language Models
  • N-grams
  • Class based language models

7
Linguistic Model Toolkit
  • Text Normalization Tools
  • Language Model Training
  • CMU toolkit
  • SRI toolkit
  • ATT toolkit
  • Language Model Compilation
  • Lexicon Compilation

8
Standard A* algorithm (1/2)
  • "Best-first" search algorithm
  • Extend the best path to generate new candidates
  • Assign a score F(x) = g(x) + h(x) to every
    explored path
  • g(x) combines language model and acoustic scores
  • h(x) estimates the probability of the best
    extension
  • Keep the list of explored paths as a priority
    queue
  • Stop when the best path reaches the end

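The loop described on this slide can be sketched as follows. This is a minimal illustration of a generic A* search, not the SPEERAL implementation; all names are assumptions, and scores are treated as costs to minimize rather than probabilities.

```python
import heapq

def a_star(start, successors, h, is_goal):
    """Minimal best-first A* sketch.  A candidate is a path with
    accumulated cost g (in SPEERAL terms, combined acoustic and
    language-model scores); h(node) is an admissible estimate of the
    remaining cost.  Explored paths are kept in a priority queue
    ordered by F = g + h, the best path is extended first, and the
    search stops when the best path reaches the end."""
    queue = [(h(start), 0, (start,))]            # entries are (F, g, path)
    while queue:
        f, g, path = heapq.heappop(queue)        # pop the best candidate
        node = path[-1]
        if is_goal(node):
            return g, path                       # best path reached the end
        for nxt, cost in successors(node):       # extend it: new candidates
            heapq.heappush(
                queue, (g + cost + h(nxt), g + cost, path + (nxt,)))
    return None
```

With h(x) = 0 this degenerates to a uniform-cost search, which connects to the heuristic extremes listed on the next slide.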
9
Standard A* algorithm (2/2)
  • Requires an admissible heuristic function
  • h(x) underestimates the true cost of the
    remaining path (the more accurate, the better).
  • Sample heuristics:
  • h(x) = 0 → breadth-first search
  • h(x) = true remaining cost (i.e. F(x) never
    changes) → deterministic search

10
The SPEERAL System
  • Language model:
  • Stochastic n-gram LM (n = 3)
  • Lexical, phonetic and acoustic knowledge sources:
  • Acoustic model (HMM, ...)
  • Decoding vocabulary (lexicon)
  • Input signal → phoneme lattice
  • (p, beg, end, sc) with score sc = P(X[beg..end] | p)

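The (p, beg, end, sc) tuples above suggest a simple lattice representation. A minimal sketch, with assumed field names and illustrative values:

```python
from collections import namedtuple

# One phoneme-lattice entry as on the slide: phoneme p hypothesised on
# frames beg..end with acoustic score sc = P(X[beg..end] | p).
Edge = namedtuple("Edge", ["p", "beg", "end", "sc"])

def edges_starting_at(lattice, frame):
    """All phoneme hypotheses that begin at a given frame (the edges a
    decoder would try to extend a path with)."""
    return [e for e in lattice if e.beg == frame]

# A toy three-edge lattice (scores are illustrative log values).
lattice = [Edge("b", 0, 7, -12.3), Edge("p", 0, 8, -14.1),
           Edge("o", 7, 15, -9.8)]
```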
11
Sounding function h
  • Remaining-path estimation:
  • Acoustic score only
  • Computed with a backward Viterbi pass, during
    phoneme-lattice generation
  • Heuristic admissibility:
  • Underestimates the remaining cost (no LM
    information is included)
  • Cannot be the true cost (lack of LM information)

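The backward pass above can be sketched as a dynamic program over the lattice. This is an illustration of the idea, not the SPEERAL code; the lattice representation is an assumption.

```python
def backward_best_cost(lattice, n_frames):
    """Backward Viterbi pass over a phoneme lattice, a sketch of the
    sounding function h.  Each lattice entry is assumed to be
    (p, beg, end, sc) with sc a log-probability, so 'best' = maximum.
    h[t] is the best purely acoustic score from frame t to the end."""
    neg_inf = float("-inf")
    h = [neg_inf] * (n_frames + 1)
    h[n_frames] = 0.0                            # empty remainder scores 0
    for t in range(n_frames - 1, -1, -1):        # backward recursion
        for p, beg, end, sc in lattice:
            if beg == t and h[end] > neg_inf:
                h[t] = max(h[t], sc + h[end])
    return h
```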
12
Lexicon
  • Prefix-tree organization
  • Widely applied
  • Compact representation
  • Search effort occurs at word beginnings

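A prefix tree over phonetised words can be sketched in a few lines. The nested-dict representation and the '#' end marker are assumptions, not SPEERAL's data structure:

```python
def build_prefix_tree(lexicon):
    """Prefix-tree (trie) organization of the lexicon: words sharing an
    initial phoneme sequence share nodes, giving a compact
    representation where the search effort is factorised at word
    beginnings."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:                       # walk/create the shared prefix
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)  # leaf: completed word(s)
    return root
```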
13
Search space
  • Phoneme lattice
  • Concatenation of lexical trees

14
LM look-ahead
  • Word anticipation:
  • n is a lexicon node
  • wn is any leaf (i.e. word) of the sub-tree
    starting at n
  • P(n | ... wi-2 wi-1) ≈ Part_LM(n, wi-2 wi-1)
  • Part_LM(n, wi-2 wi-1) = max over wn of
    P(wn | wi-2 wi-1)
  • → Paths leading to improbable words are penalized
    early

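The Part_LM maximum can be computed by a recursive walk of the sub-tree below node n. A sketch under assumptions: n is a nested-dict trie node whose '#' entries list completed words, and lm_prob(word, history) is any trigram probability function returning plain (not log) probabilities.

```python
def part_lm(node, history, lm_prob):
    """LM look-ahead sketch: Part_LM(n, wi-2 wi-1) is the maximum of
    P(wn | wi-2 wi-1) over every word wn reachable from lexicon node n."""
    best = 0.0
    for key, child in node.items():
        if key == "#":                       # leaf: completed words wn
            best = max(best, max(lm_prob(w, history) for w in child))
        else:                                # recurse into the sub-tree
            best = max(best, part_lm(child, history, lm_prob))
    return best
```

In practice this value would be cached per node rather than recomputed, so improbable branches are penalized as soon as the search enters them.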
15
Start-synchronous tree
  • Asynchronous search:
  • The search processes the same part of the lexicon
    with different histories.
  • With start-synchronous capabilities:
  • The most advanced path can be reused when it is
    encountered twice.
  • For each frame x, the lexicon copy starting at x
    is stored.
  • Only the deepest nodes (or leaves) are stored.

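The per-frame storage described above can be sketched as a cache keyed by start frame. This is an illustrative reading of the slide, with assumed names and an assumed best-score reuse criterion:

```python
class StartSynchronousCache:
    """For each start frame x, remember the deepest lexicon nodes
    already explored in the lexicon copy starting at x, so a path that
    reaches the same (frame, node) with a different history can reuse
    the work instead of re-expanding the sub-tree."""
    def __init__(self):
        self.deepest = {}                 # frame -> {node_id: best score}

    def lookup_or_update(self, frame, node_id, score):
        """True if this (frame, node) was already reached with a score
        at least as good, i.e. the stored expansion can be reused."""
        nodes = self.deepest.setdefault(frame, {})
        if node_id in nodes and nodes[node_id] >= score:
            return True
        nodes[node_id] = score            # new deepest node for this frame
        return False
```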
16
Principle (1/5)
[Diagram: lexicon trees started at frame 0 and at frame t; the deepest lexicon nodes reached at each frame are stored.]
17
Principle (2/5)
[Diagram: lexicon trees at frame 0 and frame t.]
18
Principle (3/5)
[Diagram: lexicon trees at frame 0 and frame t.]
19
Principle (4/5)
[Diagram: lexicon trees at frame 0 and frame t.]
20
Principle (5/5)
[Diagram: lexicon trees at frame 0, frame t, and frame tn.]
21
Search space pruning
  • Optimization:
  • If two candidates end with the same 3 words, only
    the best is kept.
  • Cut:
  • Shorter candidates are dropped when they fall too
    far behind the deepest (most advanced) candidate.

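Both pruning rules above can be sketched together. The candidate representation (score, frame, word sequence) and the beam parameter are assumptions for illustration:

```python
def prune(candidates, beam):
    """Search-space pruning sketch.  Each candidate is assumed to be a
    (score, frame, words) triple with higher scores better.
    - Optimization: among candidates ending with the same 3 words,
      keep only the best-scoring one.
    - Cut: drop candidates whose frame lags more than `beam` frames
      behind the most advanced (deepest) surviving candidate."""
    best_by_suffix = {}
    for score, frame, words in candidates:
        key = tuple(words[-3:])                  # last three words
        cur = best_by_suffix.get(key)
        if cur is None or score > cur[0]:
            best_by_suffix[key] = (score, frame, words)
    kept = list(best_by_suffix.values())
    deepest = max(frame for _, frame, _ in kept)
    return [c for c in kept if deepest - c[1] <= beam]
```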
22
  • ASR Output:
  • 1-best hypothesis
  • N-best hypotheses
  • Word graph
  • Applications:
  • Transcription
  • Question answering
  • Named-entity extraction
  • Information retrieval
  • Call-type classification

23
French Broadcast News Campaign: ESTER
Broadcast News (1-hour-long show)
24
System Description
  • Acoustic Models:
  • 10k contextual HMMs
  • 3.6k states
  • 230k Gaussians
  • Lexicon: 65k words
  • Language model combination:
  • (Le Monde 87-02, 0.41)
  • (Le Monde 02-03, 0.24)
  • (ESTER, 0.35)

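The three corpus-specific models and their coefficients (which sum to 1.0) suggest a weighted combination. A sketch assuming plain linear interpolation, which is one common way to combine such models; the slide does not state the combination method:

```python
def interpolated_lm(components, weights):
    """Linear interpolation of language models: P(w | h) is the
    weighted sum of the component probabilities, using the weights on
    the slide (Le Monde 87-02: 0.41, Le Monde 02-03: 0.24,
    ESTER: 0.35).  The interpolation itself is an assumption."""
    assert abs(sum(weights) - 1.0) < 1e-9        # weights must sum to 1
    def p(word, history):
        return sum(w * comp(word, history)
                   for comp, w in zip(components, weights))
    return p
```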
25
Results and Demonstration
  • WER: 25% (10× real time)
  • Demonstration on TV