The SPEERAL Decoder - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
The SPEERAL Decoder
  • NOCERA Pascal
  • Laboratoire d'Informatique d'Avignon
  • AGROPARC
  • BP 1228, 84911 AVIGNON Cedex 9
  • Tel: 04.90.84.35.07
  • E-mail: pascal.nocera@lia.univ-avignon.fr

2
The SPEERAL System
Stochastic approach: find the best hypothesis
among all possible hypotheses with the A*
algorithm.
3
The SPEERAL System
Stochastic approach
4
Acoustic Models
  • Hidden Markov Models
  • Gaussian Mixture Models
  • Contextual Models (Phonemes)
5
Acoustic Model Toolkit
  • Parameterization program
  • Text to phone program
  • Alignment program
  • HMM learning program
  • Supervised and unsupervised Model Adaptation
  • MLLR
  • MAP
  • Structural Model Space Transformation

6
Linguistic Models
  • Stochastic Language Models
  • N-grams
  • Class based language models

7
Linguistic Model Toolkit
  • Text Normalization Tools
  • Language Model Training
  • CMU toolkit
  • SRI toolkit
  • ATT toolkit
  • Language Model Compilation
  • Lexicon Compilation

8
Standard A* algorithm (1/2)
  • "Best-first" search algorithm
  • Extend the best path to generate new candidates
  • Assign a score F(x) = g(x) + h(x) to every
    explored path
  • g(x) combines language model and acoustic scores
  • h(x) estimates the probability of the best
    extension
  • Keep the list of explored paths as a priority
    queue
  • Stop when the best path reaches the end

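The loop described on this slide can be sketched as follows. This is a minimal illustration of a generic A* search, not the SPEERAL implementation; all names are assumptions, and scores are treated as costs to minimize rather than probabilities.

```python
import heapq

def a_star(start, successors, h, is_goal):
    """Minimal best-first A* sketch.  A candidate is a path with
    accumulated cost g (in SPEERAL terms, combined acoustic and
    language-model scores); h(node) is an admissible estimate of the
    remaining cost.  Explored paths are kept in a priority queue
    ordered by F = g + h, the best path is extended first, and the
    search stops when the best path reaches the end."""
    queue = [(h(start), 0, (start,))]            # entries are (F, g, path)
    while queue:
        f, g, path = heapq.heappop(queue)        # pop the best candidate
        node = path[-1]
        if is_goal(node):
            return g, path                       # best path reached the end
        for nxt, cost in successors(node):       # extend it: new candidates
            heapq.heappush(
                queue, (g + cost + h(nxt), g + cost, path + (nxt,)))
    return None
```

With h(x) = 0 this degenerates to a uniform-cost search, which connects to the heuristic extremes listed on the next slide.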
9
Standard A* algorithm (2/2)
  • Requires an admissible heuristic function
  • h(x) underestimates the true cost of the
    remaining path (the more accurate, the better).
  • Sample heuristics:
  • h(x) = 0 → breadth-first search
  • h(x) = true remaining cost (i.e. F(x) never
    changes) → deterministic search

10
The SPEERAL System
  • Language model:
  • Stochastic n-gram LM (n = 3)
  • Lexical, phonetic and acoustic knowledge sources:
  • Acoustic model (HMM, ...)
  • Decoding vocabulary (lexicon)
  • Input signal → phoneme lattice
  • (p, beg, end, sc) with score sc = P(X[beg..end] | p)

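The (p, beg, end, sc) tuples above suggest a simple lattice representation. A minimal sketch, with assumed field names and illustrative values:

```python
from collections import namedtuple

# One phoneme-lattice entry as on the slide: phoneme p hypothesised on
# frames beg..end with acoustic score sc = P(X[beg..end] | p).
Edge = namedtuple("Edge", ["p", "beg", "end", "sc"])

def edges_starting_at(lattice, frame):
    """All phoneme hypotheses that begin at a given frame (the edges a
    decoder would try to extend a path with)."""
    return [e for e in lattice if e.beg == frame]

# A toy three-edge lattice (scores are illustrative log values).
lattice = [Edge("b", 0, 7, -12.3), Edge("p", 0, 8, -14.1),
           Edge("o", 7, 15, -9.8)]
```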
11
Sounding function h
  • Remaining-path estimation:
  • Acoustic score only
  • Computed with a backward Viterbi pass, during
    phoneme-lattice generation
  • Heuristic admissibility:
  • Underestimates the remaining cost (no LM
    information is included)
  • Cannot be the true cost (lack of LM information)

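The backward pass above can be sketched as a dynamic program over the lattice. This is an illustration of the idea, not the SPEERAL code; the lattice representation is an assumption.

```python
def backward_best_cost(lattice, n_frames):
    """Backward Viterbi pass over a phoneme lattice, a sketch of the
    sounding function h.  Each lattice entry is assumed to be
    (p, beg, end, sc) with sc a log-probability, so 'best' = maximum.
    h[t] is the best purely acoustic score from frame t to the end."""
    neg_inf = float("-inf")
    h = [neg_inf] * (n_frames + 1)
    h[n_frames] = 0.0                            # empty remainder scores 0
    for t in range(n_frames - 1, -1, -1):        # backward recursion
        for p, beg, end, sc in lattice:
            if beg == t and h[end] > neg_inf:
                h[t] = max(h[t], sc + h[end])
    return h
```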
12
Lexicon
  • Prefix-tree organization
  • Widely applied
  • Compact representation
  • Search effort occurs at word beginnings

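A prefix tree over phonetised words can be sketched in a few lines. The nested-dict representation and the '#' end marker are assumptions, not SPEERAL's data structure:

```python
def build_prefix_tree(lexicon):
    """Prefix-tree (trie) organization of the lexicon: words sharing an
    initial phoneme sequence share nodes, giving a compact
    representation where the search effort is factorised at word
    beginnings."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:                       # walk/create the shared prefix
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)  # leaf: completed word(s)
    return root
```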
13
Search space
  • Phoneme lattice
  • Concatenation of lexical trees

14
LM look-ahead
  • Word anticipation:
  • n is a lexicon node
  • wn is any leaf (i.e. word) of the sub-tree
    starting at n
  • P(n | ... wi-2 wi-1) ≈ Part_LM(n, wi-2 wi-1)
  • Part_LM(n, wi-2 wi-1) = max over wn of
    P(wn | wi-2 wi-1)
  • → Paths leading to improbable words are penalized
    early

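The Part_LM maximum can be computed by a recursive walk of the sub-tree below node n. A sketch under assumptions: n is a nested-dict trie node whose '#' entries list completed words, and lm_prob(word, history) is any trigram probability function returning plain (not log) probabilities.

```python
def part_lm(node, history, lm_prob):
    """LM look-ahead sketch: Part_LM(n, wi-2 wi-1) is the maximum of
    P(wn | wi-2 wi-1) over every word wn reachable from lexicon node n."""
    best = 0.0
    for key, child in node.items():
        if key == "#":                       # leaf: completed words wn
            best = max(best, max(lm_prob(w, history) for w in child))
        else:                                # recurse into the sub-tree
            best = max(best, part_lm(child, history, lm_prob))
    return best
```

In practice this value would be cached per node rather than recomputed, so improbable branches are penalized as soon as the search enters them.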
15
Start-synchronous tree
  • Asynchronous search:
  • The search processes the same part of the lexicon
    with different histories.
  • With start-synchronous capabilities:
  • The most advanced path can be reused when it is
    encountered twice.
  • For each frame x, the lexicon copy starting at x
    is stored.
  • Only the deepest nodes (or leaves) are stored.

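The per-frame storage described above can be sketched as a cache keyed by start frame. This is an illustrative reading of the slide, with assumed names and an assumed best-score reuse criterion:

```python
class StartSynchronousCache:
    """For each start frame x, remember the deepest lexicon nodes
    already explored in the lexicon copy starting at x, so a path that
    reaches the same (frame, node) with a different history can reuse
    the work instead of re-expanding the sub-tree."""
    def __init__(self):
        self.deepest = {}                 # frame -> {node_id: best score}

    def lookup_or_update(self, frame, node_id, score):
        """True if this (frame, node) was already reached with a score
        at least as good, i.e. the stored expansion can be reused."""
        nodes = self.deepest.setdefault(frame, {})
        if node_id in nodes and nodes[node_id] >= score:
            return True
        nodes[node_id] = score            # new deepest node for this frame
        return False
```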
16
Principle (1/5)
[Diagram: lexicon trees started at frame 0 and at frame t; the deepest lexicon nodes reached at each frame are stored.]
17
Principle (2/5)
[Diagram: lexicon trees at frame 0 and frame t.]
18
Principle (3/5)
[Diagram: lexicon trees at frame 0 and frame t.]
19
Principle (4/5)
[Diagram: lexicon trees at frame 0 and frame t.]
20
Principle (5/5)
[Diagram: lexicon trees at frame 0, frame t, and frame tn.]
21
Search space pruning
  • Optimization:
  • If two candidates end with the same 3 words, only
    the best is kept.
  • Cut:
  • Shorter candidates are dropped when they fall too
    far behind the deepest (most advanced) candidate.

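Both pruning rules above can be sketched together. The candidate representation (score, frame, word sequence) and the beam parameter are assumptions for illustration:

```python
def prune(candidates, beam):
    """Search-space pruning sketch.  Each candidate is assumed to be a
    (score, frame, words) triple with higher scores better.
    - Optimization: among candidates ending with the same 3 words,
      keep only the best-scoring one.
    - Cut: drop candidates whose frame lags more than `beam` frames
      behind the most advanced (deepest) surviving candidate."""
    best_by_suffix = {}
    for score, frame, words in candidates:
        key = tuple(words[-3:])                  # last three words
        cur = best_by_suffix.get(key)
        if cur is None or score > cur[0]:
            best_by_suffix[key] = (score, frame, words)
    kept = list(best_by_suffix.values())
    deepest = max(frame for _, frame, _ in kept)
    return [c for c in kept if deepest - c[1] <= beam]
```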
22
  • ASR Output:
  • 1-best hypothesis
  • N-best hypotheses
  • Word graph
  • Applications:
  • Transcription
  • Question answering
  • Named-entity extraction
  • Information retrieval
  • Call-type classification

23
French Broadcast News Campaign: ESTER
Broadcast News (1-hour-long show)
24
System Description
  • Acoustic Models:
  • 10k contextual HMMs
  • 3.6k states
  • 230k Gaussians
  • Lexicon: 65k words
  • Language model combination:
  • (Le Monde 87-02, 0.41)
  • (Le Monde 02-03, 0.24)
  • (ESTER, 0.35)

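The three corpus-specific models and their coefficients (which sum to 1.0) suggest a weighted combination. A sketch assuming plain linear interpolation, which is one common way to combine such models; the slide does not state the combination method:

```python
def interpolated_lm(components, weights):
    """Linear interpolation of language models: P(w | h) is the
    weighted sum of the component probabilities, using the weights on
    the slide (Le Monde 87-02: 0.41, Le Monde 02-03: 0.24,
    ESTER: 0.35).  The interpolation itself is an assumption."""
    assert abs(sum(weights) - 1.0) < 1e-9        # weights must sum to 1
    def p(word, history):
        return sum(w * comp(word, history)
                   for comp, w in zip(components, weights))
    return p
```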
25
Results and Demonstration
  • WER: 25% (10× real time)
  • Demonstration on TV