1
Speech Recognition
  • Michael Gibney and Alexis Baird
  • CIT595

2
Speech Recognition Applications
  • Automation, translation, dictation and
    transcription, medical, and military uses
  • In any situation where keyed input is impractical
    or impossible (as in military aircraft cockpits,
    over-the-phone computer interaction, or in cases
    where the user is physically handicapped), speech
    recognition technology can overcome obstacles by
    providing the possibility of a voice user
    interface.

3
Speech Recognition Overview
  • Acoustic Signal
  • Phonemes
  • Words
  • Sentences

4
Acoustic Signal
  • The acoustic signal is sliced into fixed time
    units, and features are automatically marked in
    each slice (see the sketch below)
  • Calculate the probability that a set of signals
    produced a given phoneme
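
A minimal Python sketch of the slicing step; the 16 kHz sample rate, 25 ms frame, and 10 ms hop are illustrative assumptions, not values from the slides:

```python
# Slice an acoustic signal into overlapping fixed-length frames,
# from which features can then be extracted per frame.
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Slice a 1-D signal into overlapping fixed-length frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack([signal[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])

# Example: one second of synthetic "speech" yields 98 frames of 400 samples.
frames = frame_signal(np.random.randn(16000))
print(frames.shape)  # (98, 400)
```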

5
Phonemes: What are phonemes?
  • foxes → /fɑksəz/
  • tough → /tʌf/
  • Not necessarily one-to-one correspondence with
    written letters
  • Each phoneme has a unique set of features that
    can be extracted from its acoustic realization

6
Phonemes
  • Why is identifying phonemes hard?
  • No two speakers realize phonemes identically:
    gender differences, dialectal variation, etc.
  • Pin, pen, legs, bad, caught
  • Allophonic variation: pot, spot, top

7
Acoustic Signal to Phonemes
  • Compression/rarefaction of air
  • Diaphragm (microphone) → analog signal
  • A→D conversion (sampling + quantization)
  • Sample (for speech recognition) usually stored as
    an int (8- or 16-bit)
  • Sampling/quantization tradeoff: frequency accuracy
    (Nyquist frequency) vs. amplitude accuracy (bit
    depth); see the sketch below
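
A minimal sketch of the A→D step: "sample" a tone and quantize it to 16-bit signed ints. The 8 kHz rate and 440 Hz tone are illustrative assumptions:

```python
import numpy as np

sample_rate = 8000                         # Nyquist frequency = 4 kHz;
t = np.arange(sample_rate) / sample_rate   # higher frequencies would alias
analog = np.sin(2 * np.pi * 440 * t)       # idealized analog tone in [-1, 1]

# Quantization: bit depth sets amplitude accuracy, just as the sample
# rate sets the highest frequency that can be represented.
digital = np.round(analog * 32767).astype(np.int16)
print(digital[:5])
```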

8
Data Representation
  • What information is considered important? (varies
    for different applications: fixed/float DSP, bit
    depth, etc.)
  • Amplitude/time → frequency-amplitude/time
    (spectrogram), which can be read back into sound!
    In theory, the digitized waveform should be
    readable directly: all the information is there,
    and despite the variation of contexts, frequency
    combinations are clearly recognizable (to
    humans!) as a human voice
  • Amplitude wave decomposition can be interpreted
    into phonemes (highly contextual! e.g. different
    voices, accents, noise, sentence context, etc.)
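
To make the amplitude/time → frequency-amplitude/time conversion concrete, here is a rough spectrogram sketch using a short-time FFT; the window and hop sizes are assumptions:

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop_len=160):
    """Magnitude of the short-time FFT: one frequency slice per frame."""
    window = np.hanning(frame_len)
    frames = [signal[i : i + frame_len] * window
              for i in range(0, len(signal) - frame_len, hop_len)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# One second at 16 kHz -> (time slices, frequency bins).
spec = spectrogram(np.random.randn(16000))
print(spec.shape)  # (98, 201)
```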

9
Phonemes to Words
  • Why is it hard to identify words from phonemes?
  • Word boundaries blur in fluent speech: "set
    your," "what you," "find him," "left me"
  • Simple Conditional Probability Algorithm
  • ŵ = argmax P(w|O) = argmax P(O|w)P(w)/P(O)
      = argmax P(O|w)P(w), maximizing over w ∈ V
  • O = o1o2o3…on = sequence of observations
  • w = some word in the vocabulary V
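
A toy sketch of this decision rule: since P(O) is the same for every candidate word, it drops out of the argmax. The vocabulary and all probabilities below are invented for illustration:

```python
# Noisy-channel word choice: pick w maximizing P(O|w) * P(w).
likelihood = {"two": 0.30, "to": 0.25, "too": 0.20}   # P(O|w), toy values
prior = {"two": 0.05, "to": 0.60, "too": 0.10}        # P(w), toy values

best_word = max(likelihood, key=lambda w: likelihood[w] * prior[w])
print(best_word)  # "to": the strong prior outweighs the acoustic score
```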

10
Phonemes to Words II
  • Weighted automaton: states corresponding to
    single phonemes plus a set of transition
    probabilities between states → a pronunciation
    network for each potential word
  • What does this remind us of?
  • Forward algorithm: given as input a pronunciation
    network for each possible word and an observed
    sequence of spectral input slices, we want the
    probabilities of possible words corresponding to
    the observed sequence
  • forward[t, j] is the probability of being in
    state j after seeing the first t observations,
    given the automaton λ
  • forward[t, j] = P(o1o2o3…ot, qt = j | λ), weighted
    by the word prior P(w)
  • Forward algorithm is applied to each word and the
    word with the greatest probability is selected
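
A minimal sketch of the forward computation over one word's pronunciation network; the two-state network, transition, and emission probabilities below are toy values:

```python
def forward_prob(obs, init, trans, emit):
    """Total probability of `obs` under the automaton (init, trans, emit)."""
    n_states = len(init)
    # fwd[j] = P(o1..ot, qt = j | lambda), updated one observation at a time.
    fwd = [init[j] * emit[j][obs[0]] for j in range(n_states)]
    for o in obs[1:]:
        fwd = [sum(fwd[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
               for j in range(n_states)]
    return sum(fwd)

# Two-state toy network; observations are symbols from {0, 1}.
init = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
print(forward_prob([0, 1], init, trans, emit))  # 0.342
# Per the slide: score each word as forward_prob(...) * P(w), pick the max.
```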

11
Phonemes to Words III
  • Forward algorithm makes a few simplifications
  • Assumes that input is a sequence of symbols
    rather than slices of an acoustic signal
  • Assumes that input symbols have an exact
    one-to-one correspondence to states
  • Solution? Hidden Markov Models

12
Hidden Markov Models
  • Extension of simple state machine idea
  • Adds a sequence of observations which don't
    uniquely determine the states but have a
    corresponding set of observation likelihoods
  • Observation likelihoods are the probability of a
    given observation being generated from a given
    state
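
One possible way to sketch these pieces in code; the phoneme state names, transition probabilities, and observation likelihoods below are all invented:

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: list          # hidden states (e.g. phonemes)
    trans: dict           # trans[(i, j)] = P(next state j | state i)
    obs_likelihood: dict  # obs_likelihood[(j, o)] = P(observation o | state j)

hmm = HMM(
    states=["ih", "iy"],
    trans={("ih", "ih"): 0.7, ("ih", "iy"): 0.3,
           ("iy", "iy"): 0.8, ("iy", "ih"): 0.2},
    # The same acoustic slice "o1" can be emitted by either state, just
    # with different likelihoods -- so the states remain hidden.
    obs_likelihood={("ih", "o1"): 0.6, ("iy", "o1"): 0.4},
)
print(hmm.obs_likelihood[("ih", "o1")])
```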

13
Words to Sentences
  • Why do we care about sentence-level?
  • Ŵ = argmax P(W|O) = argmax P(O|W)P(W)/P(O)
      = argmax P(O|W)P(W)
  • P(W) = prior probability
  • P(O|W) = observation likelihood

14
Prior Probability P(W)
  • Probability of a given sequence of words
  • Use n-gram models
  • Bigram example: find the relative frequency of
    word pairs such as "want to" (see the sketch
    below)
  • Requires a huge corpus (collection of data)
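
A minimal sketch of a bigram prior estimated by relative frequency, P(w2|w1) = count(w1 w2) / count(w1). The tiny "corpus" here is invented; as the slide notes, a real system needs a huge one:

```python
from collections import Counter

corpus = "i want to fly i want to eat i want a flight".split()
unigrams = Counter(corpus)                  # count(w1)
bigrams = Counter(zip(corpus, corpus[1:]))  # count(w1 w2)

def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("want", "to"))  # 2/3: "want to" twice, "want a" once
```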

15
Observation Likelihood P(O|W)
  • Many different types of algorithms
  • Use Hidden Markov Models (find the probability of
    a sequence of phonemes given the entire sentence)
  • Viterbi and A* algorithms: compute the
    probability of an observation sequence given each
    sentence AND return the most likely sentence

16
Viterbi Algorithm
  • Find the best state sequence (q = q1q2…qT) given
    an observation sequence (O) and a model/state
    graph (λ)
  • A matrix in which each cell contains the
    probability of the best path to that cell; the
    y-axis is all the words in the lexicon, the
    x-axis is a sequence of observed phonemes
  • viterbi[t, i] = max P(q1q2…qt-1, qt = i, o1o2…ot | λ)
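
A minimal Viterbi sketch, reusing the toy two-state model from the forward-algorithm example (all probabilities invented): each cell keeps the best-path probability, and backpointers recover the best state sequence:

```python
def viterbi(obs, init, trans, emit):
    """Return the most likely state sequence for `obs` and its probability."""
    n_states = len(init)
    vit = [init[j] * emit[j][obs[0]] for j in range(n_states)]
    backptr = []
    for o in obs[1:]:
        prev = vit
        # For each state j, the best predecessor state i.
        step = [max(range(n_states), key=lambda i: prev[i] * trans[i][j])
                for j in range(n_states)]
        vit = [prev[step[j]] * trans[step[j]][j] * emit[j][o]
               for j in range(n_states)]
        backptr.append(step)
    # Trace back from the most probable final state.
    best = max(range(n_states), key=lambda j: vit[j])
    path = [best]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    return list(reversed(path)), max(vit)

init = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
print(viterbi([0, 1, 1], init, trans, emit))  # ([0, 1, 1], 0.2304)
```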

17
Obstacles in Speech Recognition
  • "John hit Mary" vs. "John hid Mary"
  • Dialectal variation
  • Noise interference
  • Fast speech or hyper-articulated speech

18
Other Methods in SR
  • Discourse level → context of the surrounding
    speech is used to predict future words
  • Training on domain-specific corpora, i.e. if the
    speech recognition system is for an airline, use
    a context-specific corpus