SPEECH RECOGNITION SEARCH - PowerPoint PPT Presentation

About This Presentation
Title:

SPEECH RECOGNITION SEARCH

Description:

Acoustic model and Language model drive an integrated search process ... Use standard trellis. Allow transition from word ends to word starts where LM allows ... – PowerPoint PPT presentation

Slides: 25
Provided by: DVC2

Transcript and Presenter's Notes

Title: SPEECH RECOGNITION SEARCH


1
SPEECH RECOGNITION SEARCH
2
Bayesian Recognition, combining acoustic and
linguistic information sources
  Ŵ = argmax_W P(W | x_1..x_T) = argmax_W P(x_1..x_T | W) · P(W)
  • W : word sequence
  • x_1..x_T : feature vector sequence
  • Ŵ : recognized word sequence

3
STRUCTURE for ISOLATED WORD SYSTEMS
The language model acts as a postprocessor on the
acoustic match
Word likelihoods: P(w1 | xi), P(w2 | xi), ..., P(wM | xi)
HMM scores: P(xi | w1), P(xi | w2), ..., P(xi | wM)
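To make this postprocessing step concrete, here is a minimal Python sketch that combines per-word HMM log scores with language-model log priors and ranks the words by the resulting posterior score; the dictionaries and the toy vocabulary are illustrative assumptions, not part of the original slides.

```python
import math

def rank_words(hmm_log_scores, word_log_priors):
    """Combine acoustic HMM scores with language-model priors.

    hmm_log_scores:  {word: log P(x | word)}  -- acoustic match
    word_log_priors: {word: log P(word)}      -- language model
    Returns words ranked by log P(x | word) + log P(word),
    which is proportional to the posterior log P(word | x).
    """
    combined = {w: hmm_log_scores[w] + word_log_priors[w]
                for w in hmm_log_scores}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up scores for a 3-word vocabulary
scores = {"yes": -42.1, "no": -45.7, "maybe": -44.0}
priors = {"yes": math.log(0.5), "no": math.log(0.3), "maybe": math.log(0.2)}
print(rank_words(scores, priors))
```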
4
STRUCTURE for LARGE VOCABULARY CONTINUOUS SPEECH
RECOGNITION
Acoustic model and language model drive an
integrated search process
[Block diagram: speech signal s(t) → Feature Extraction → feature vectors xi → Large Vocabulary Search Engine, driven by the Acoustic Model and the Language Model → ranked sentence likelihoods P(S1 | xi), P(S2 | xi), ...]
5
DECODING FOR SMALL VOCABULARY SYSTEMS WITH
SHORT-SPAN LMs
6
Search Space for IWR (Isolated Word Recognition) with full word HMMs
7
Search Space for IWR with full word HMMs
  • the search space is defined by:
  • all states of all words, as derived from the
    (word/phonetic) lexicon
  • a few additional states marking the beginning and
    end (of the sentence)
  • additional connections for feeding in a priori word
    probabilities (LM information)

8
Time Synchronous Viterbi Search
[Trellis figure: observations ... x_{t-1}, x_t along the time axis; all states of the search space along the state axis]
  • at each time t:
  • compute a new set of likelihoods P(s | x_0..x_t)
    for all states s in the search space
  • the new likelihoods are based on the values at
    time t-1 and the new observation x_t
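A minimal sketch of one time-synchronous update in log space, assuming dense NumPy tables for the transition and emission scores; the function and argument names are illustrative assumptions.

```python
import numpy as np

def viterbi_step(prev_scores, log_trans, log_emis_t):
    """One time-synchronous Viterbi update.

    prev_scores: (S,) best log score per state at time t-1
    log_trans:   (S, S) log transition scores, log_trans[i, j] = log P(j | i)
    log_emis_t:  (S,) log emission scores log P(x_t | s) for the new frame
    Returns the (S,) scores at time t and the (S,) backpointers.
    """
    # candidate scores for every (previous state, current state) pair
    cand = prev_scores[:, None] + log_trans      # (S, S)
    backptr = cand.argmax(axis=0)                # best predecessor per state
    scores_t = cand.max(axis=0) + log_emis_t     # keep only the best path
    return scores_t, backptr
```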
9
Time Synchronous Recognizer with pruning
  • limit the number of states considered for
    computation to a set of active states
  • active states at time t are those that can be
    reached from the active states at time t-1
  • after computation of likelihoods at time t
  • rank all active states according to likelihood
  • PRUNE away the less likely states
  • if there are no active states left in a word,
    abandon the word as a whole

[Figure: pruning example in which word3 has no active states left and is abandoned]
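A minimal sketch of the pruning step, assuming each active hypothesis is a (log score, word id, state id) tuple; the beam size and threshold values are illustrative assumptions.

```python
def prune(active, beam=200, threshold=10.0):
    """Keep only the most promising hypotheses.

    active: list of (log_score, word_id, state_id) tuples at time t.
    A hypothesis survives if it is within `threshold` of the best score
    and among the `beam` highest-scoring hypotheses overall.
    Words with no surviving state are abandoned implicitly.
    """
    if not active:
        return []
    best = max(h[0] for h in active)
    survivors = [h for h in active if best - h[0] <= threshold]
    survivors.sort(key=lambda h: h[0], reverse=True)
    return survivors[:beam]
```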
10
Continuous Word Recognition
[Figure: continuous-recognition search network connecting a sentence start node S, the word models, and a sentence end node E]
11
Continuous Word Recognition
  • words are expanded into their states as in
    isolated word recognition
  • word connections contain word transition
    probabilities (limited to bigram probabilities)

12
One-Pass Dynamic Programming
  • Use a standard trellis
  • Allow transitions from word ends to word starts
    where the LM allows (see the sketch after this list)
  • Use a standard Viterbi beam search
  • Recognition of the word string:
  • by backtracking (requires maintenance of the full
    search history)
  • by adding the word history as extra information
    into the nodes
  • a decision is made on the best possible history at
    the start-up time of a new word
  • intrinsic bigram limitation
  • Conclusion:
  • well suited for small to medium vocabulary tasks
    with simple language models
  • not suited for large vocabulary recognition with
    complex language models
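A minimal sketch of the word-boundary step performed at each frame: scores are propagated from word end states to word start states through bigram LM probabilities, and the best predecessor word is recorded as the per-node history. All names and the data layout are illustrative assumptions.

```python
def word_transitions(exit_scores, log_bigram, vocab):
    """Propagate scores from word ends to word starts.

    exit_scores: {word: best log score of that word's end state at time t}
    log_bigram:  {(prev_word, next_word): log P(next_word | prev_word)}
    vocab:       iterable of all words in the lexicon
    Returns {next_word: (entry_score, best_predecessor)} used to (re)start
    each word's first state; the predecessor is the per-node word history.
    """
    entries = {}
    for nxt in vocab:
        best_score, best_prev = float("-inf"), None
        for prev, score in exit_scores.items():
            s = score + log_bigram.get((prev, nxt), float("-inf"))
            if s > best_score:
                best_score, best_prev = s, prev
        if best_prev is not None:
            entries[nxt] = (best_score, best_prev)
    return entries
```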

13
DECODING FOR LARGE VOCABULARY CONTINUOUS SPEECH
RECOGNITION
14
Search for Phoneme Based Recognizers
[Block diagram: speech signal → Feature Extraction → sequence of observations → Acoustic Matching by Dynamic Programming (using the Acoustic Models) → phoneme hypotheses → Large Vocabulary Search (using the Linguistic Models) → ranked list of recognized words and sentences]
Example ranked list of sentence hypotheses (scores):
   0.0  recognize speech
  -6.3  wreck a nice beach
  -8.8  recognize beach
  -9.7  wreck a nice peach
  ...
15
Search for Phoneme Based Recognizers
  • Output of the Acoustic Match: a dense phoneme graph
  • many phoneme hypotheses need to be considered
    in parallel
  • begin and end times of competing hypotheses may
    not coincide
  • The Linguistic Match matches the phoneme hypotheses
    against all sentence hypotheses suggested by
  • the phonetic lexicon
  • the language model
  • ISSUES
  • the number of possible sentence hypotheses is huge
    (infinite)
  • the phonetic lexicon may not provide enough
    pronunciation variants
  • SOLUTION
  • prototypical systems integrate the acoustic match and
    the linguistic match into a single search, trying to
    find the most likely sentence for a stream of
    features
  • sentence hypotheses are built up word by word,
    i.e. new words are hypothesized when a word end
    state is reached

16
Search in Large Vocabulary Continuous
Recognition
  • GOAL: Find the sentence with the highest
    likelihood, given the observed features
  • REALITY: The number of possible sentences is so
    huge that only a fraction of all hypotheses can
    be evaluated
  • SOLUTION: Quickly select those few hypotheses
    that seem to be likely candidates to become
    winners

17
Issues in Large Vocabulary Continuous
Recognition
  • SEARCH STRATEGY
  • the ORDER in which the different hypotheses are
    evaluated
  • BREADTH FIRST: SYNCHRONOUS VITERBI BEAM SEARCH
  • BEST FIRST (A*): STACK DECODING
  • PRUNING STRATEGY
  • pruning parameters and criteria: beam threshold,
    number of hypotheses to be considered
  • Is there a chance that the best path gets pruned
    away?
  • DATA REPRESENTATION
  • ACOUSTIC MODEL
  • LEXICON
  • LANGUAGE MODEL
  • SEARCH HYPOTHESIS

18
Lexical Trees for large vocabulary tasks
  • Flat (linear) dictionary
  • each word is an entry by itself
  • full computation for each word
  • computation proportional to N_words x average number
    of phonemes per word
  • Tree structured dictionaries
  • organize the lexicon as a tree (see the sketch after
    this list)
  • computation is shared between words with the same
    initial set of phonemes
  • the total number of nodes in the search network is
    much smaller than for linearly organized lexicons
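A minimal sketch of a tree-structured lexicon built over phoneme strings, to make the node sharing concrete; the pronunciations and the dictionary-of-dictionaries layout are illustrative assumptions.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phoneme sequences.

    lexicon: {word: list of phonemes}
    Each node is {phoneme: child_node}; a node also stores the words
    that end there under the special key "words".
    """
    root = {"words": []}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {"words": []})
        node["words"].append(word)   # word end node
    return root

# "speech" and "speed" share the nodes for the prefix s-p-iy
lexicon = {"speech": ["s", "p", "iy", "ch"],
           "speed":  ["s", "p", "iy", "d"],
           "beach":  ["b", "iy", "ch"]}
tree = build_lexical_tree(lexicon)
```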

19
Lexical Trees for large vocabulary tasks
20
Dynamic Tree Expansion
  • Works with lexical trees
  • When an end node of the tree is reached, a new
    lexical tree can be added to the search space
    (see the sketch after this list)
  • Different pruning criteria for internal nodes vs.
    end nodes
  • Serves as the basic algorithm for typical decoders in
    LVCSR
  • Because the initial nodes of a tree are shared,
    cross-word coarticulation cannot be modelled in a
    single pass
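A minimal sketch of the expansion step, reusing the tree layout from the lexical-tree sketch above; the Hypothesis record and the lm_log_prob hook are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    node: dict          # current position in a lexical tree
    score: float        # accumulated log score
    history: tuple      # words recognized so far

def expand(hyp, tree_root, lm_log_prob):
    """Expand one hypothesis by one phoneme layer.

    Whenever a word end node is reached, a fresh copy of the lexical
    tree is (conceptually) attached: the hypothesis restarts at the
    root with the word appended to its history and an LM score added.
    """
    out = []
    for phone, child in hyp.node.items():
        if phone == "words":
            continue
        out.append(Hypothesis(child, hyp.score, hyp.history))  # acoustic score added elsewhere
    for word in hyp.node["words"]:                              # word end reached
        new_score = hyp.score + lm_log_prob(hyp.history, word)
        out.append(Hypothesis(tree_root, new_score, hyp.history + (word,)))
    return out
```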

21
Dynamic Tree Expansion
22
Dynamic Combination of Lexicon and LM
  • Maintaining multiple copies of the lexical tree
    wastes memory.
  • Use of longer-span LMs is not possible.
  • Solution:
  • Reuse a single lexical tree.
  • A hypothesis is the combination of a lexicon position
    and a language model history (see the sketch after
    this list).
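A minimal sketch of the resulting recombination: with a single shared lexical tree, a search hypothesis is identified by its (lexicon position, LM history) pair, so hypotheses with the same pair can be merged and only the best-scoring one kept. All names are illustrative assumptions.

```python
def recombine(hypotheses):
    """Merge hypotheses with identical (tree node, LM history).

    hypotheses: iterable of (node_id, lm_history, log_score) tuples,
    where lm_history is e.g. a tuple of the last N-1 words for an
    N-gram LM. With a single shared lexical tree, this pair fully
    identifies a search state, so only the best instance must be kept.
    """
    best = {}
    for node_id, lm_history, score in hypotheses:
        key = (node_id, lm_history)
        if key not in best or score > best[key][2]:
            best[key] = (node_id, lm_history, score)
    return list(best.values())
```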

23
Dynamic Combination of Lexicon and LM
24
Multi-pass algorithms
  • A number of essential features of LVCSR
    drastically increase the search space:
  • cross-word triphone models
  • long-span language models
  • A multi-pass algorithm can overcome this problem
    with the following strategy (see the sketch after
    this list)
  • FIRST PASS
  • use simplified assumptions such that an efficient
    search algorithm can be used to explore the FULL
    search space
  • use a highly conservative pruning strategy
  • retain possible solutions in a graph or as an
    N-best list of sentences
  • SUBSEQUENT PASS(ES)
  • use detailed acoustic and language models
  • use a different search algorithm (sometimes
    exhaustive) on the retained graph
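A minimal sketch of a second pass implemented as N-best rescoring, assuming the first pass has retained sentence hypotheses with their acoustic scores; the scoring hook and the LM weight are illustrative assumptions.

```python
def rescore_nbest(nbest, detailed_lm_score, lm_weight=10.0):
    """Second pass: re-rank first-pass hypotheses with a detailed LM.

    nbest: list of (sentence_words, first_pass_acoustic_score) pairs
           retained from the conservative first pass.
    detailed_lm_score: callable returning a log probability for a full
           word sequence under the long-span language model.
    Returns the hypotheses re-ranked by the combined score.
    """
    rescored = [(words, ac + lm_weight * detailed_lm_score(words))
                for words, ac in nbest]
    rescored.sort(key=lambda h: h[1], reverse=True)
    return rescored
```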