Speech Recognition Introduction I

About This Presentation

Slides: 25

Provided by: Bakk7

Transcript and Presenter's Notes

Title: Speech Recognition Introduction I

1
Speech Recognition: Introduction I

E.M. Bakker

2
Speech Recognition

Some Applications
An Overview
General Architecture
Speech Production
Speech Perception

3
Speech Recognition

Goal: Automatically extract the string of words spoken from the speech signal.

4
Speech Recognition

Goal: Automatically extract the string of words spoken from the speech signal.

How is SPEECH produced?
Characteristics of Acoustic Signal

5
Speech Recognition

Goal: Automatically extract the string of words spoken from the speech signal.

How is SPEECH perceived? => Important Features

6
Speech Recognition

Goal: Automatically extract the string of words spoken from the speech signal.

What LANGUAGE is spoken? => Language Model

7
Speech Recognition

Goal: Automatically extract the string of words spoken from the speech signal.

What is in the BOX?

8
Important Components of General SR Architecture

Speech Signals
Signal Processing Functions
Parameterization
Acoustic Modeling (Learning Phase)
Language Modeling (Learning Phase)
Search Algorithms and Data Structures
Evaluation

9
Recognition Architectures: A Communication Theoretic Approach

[Diagram] Message Source -> Linguistic Channel -> Articulatory Channel -> Acoustic Channel
[Diagram] Observable: Message, Words, Sounds, Features

Speech Recognition Problem: P(W|A), where A is the acoustic signal and W is the words spoken
Objective: minimize the word error rate
Approach: maximize P(W|A) during training
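
The word error rate named in the objective is usually computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The slides do not spell this out, so the function below is only a minimal sketch under that standard definition; the name word_error_rate is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c" gives one substitution and one deletion over four reference words.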

Bayesian formulation for speech recognition:
P(W|A) = P(A|W) P(W) / P(A), where A is the acoustic signal and W is the words spoken
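
Because P(A) is the same for every candidate W, a recognizer can compare hypotheses using P(A|W) P(W) alone. The sketch below illustrates that comparison in the log domain. The candidate strings and all probabilities are made-up numbers, and best_hypothesis is a hypothetical helper, not part of any toolkit.

```python
import math

def best_hypothesis(acoustic_likelihood, language_prior):
    """Return the word string W that maximizes P(A|W) * P(W).

    P(A) is constant across hypotheses, so it is dropped from the comparison.
    Assumes every probability is strictly positive.
    """
    def log_score(words):
        return math.log(acoustic_likelihood[words]) + math.log(language_prior[words])
    return max(acoustic_likelihood, key=log_score)

# Made-up numbers for illustration only.
acoustic_likelihood = {"recognize speech": 0.0020, "wreck a nice beach": 0.0025}
language_prior = {"recognize speech": 0.010, "wreck a nice beach": 0.001}
winner = best_hypothesis(acoustic_likelihood, language_prior)
```

Here the acoustic model slightly prefers the second string, but the language model prior outweighs it, so the first string wins.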

Components:
P(A|W): acoustic model (hidden Markov models, mixtures)
P(W): language model (statistical, finite state networks, etc.)
The language model typically predicts a small set of next words based on knowledge of a finite number of previous words (N-grams).
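
As a rough illustration of predicting a small set of next words from a finite history, the sketch below uses a bigram model (a one-word history) with plain maximum-likelihood estimates. The function names and the toy corpus are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each one-word history."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for previous, current in zip(words, words[1:]):
            followers[previous][current] += 1
    return followers

def predict_next(followers, previous, k=3):
    """Return up to k likely next words with their estimates of P(word | previous)."""
    counts = followers.get(previous)
    if not counts:
        return []  # unseen history: this unsmoothed sketch predicts nothing
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(k)]

# Toy corpus, invented for illustration.
followers = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
top = predict_next(followers, "the", k=2)
```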

10
Recognition Architectures
Input Speech
Language Model P(W)
11
ASR Architecture
Evaluators
Feature Extraction
Recognition Searching Strategies
Speech Database, I/O
HMM Initialisation and Training
Common Base Classes, Configuration and Specification
Language Models
12
Signal Processing Functionality

Acoustic Transducers
Sampling and Resampling
Temporal Analysis
Frequency Domain Analysis
Cepstral Analysis
Linear Prediction and LP-Based Representations
Spectral Normalization

13
Acoustic Modeling: Feature Extraction

[Diagram] Input Speech -> Fourier Transform -> Cepstral Analysis -> Perceptual Weighting -> Energy, Mel-Spaced Cepstrum
[Diagram] -> Time Derivative -> Delta Energy, Delta Cepstrum
[Diagram] -> Time Derivative -> Delta-Delta Energy, Delta-Delta Cepstrum
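
The two time-derivative stages can be sketched as follows. The slide does not say how the derivative is computed, so this example assumes a commonly used regression formula over a window of +/- 2 frames, with the first and last frames repeated at the edges. The function name delta and the window size are illustrative choices.

```python
def delta(frames, window=2):
    """Approximate the time derivative of a sequence of feature vectors.

    frames is a list of equal-length lists (one feature vector per frame).
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    dim = len(frames[0])
    result = []
    for t in range(len(frames)):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]  # repeat the last frame at the end
                behind = frames[max(t - n, 0)][d]    # repeat the first frame at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result

# Toy one-dimensional "cepstrum" that rises by 1.0 per frame.
static = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
deltas = delta(static)        # first time derivative
delta_deltas = delta(deltas)  # second time derivative
```

Applying the same function twice mirrors the two chained Time Derivative blocks in the diagram.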
14
Acoustic Modeling

Dynamic Programming
Markov Models
Parameter Estimation
HMM Training
Continuous Mixtures
Decision Trees
Limitations and Practical Issues of HMM

15
Acoustic Modeling: Hidden Markov Models

Acoustic models encode the temporal evolution of the features (spectrum).
Gaussian mixture distributions are used to account for variations in speaker, accent, and pronunciation.
Phonetic model topologies are simple left-to-right structures.
Skip states (time-warping) and multiple paths (alternate pronunciations) are also common features of models.
Sharing model parameters is a common strategy to reduce complexity.
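
A left-to-right topology with skip transitions can be written down as a transition matrix in which each state may only stay, advance by one, or skip ahead by two. The sketch below builds such a matrix. The probabilities 0.6 and 0.1 are arbitrary starting values chosen for illustration, and treating the last state as absorbing is a simplification.

```python
def left_to_right_transitions(num_states, self_loop=0.6, skip=0.1):
    """Build a row-stochastic transition matrix for a left-to-right HMM."""
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    if self_loop < 0 or skip < 0 or self_loop + skip > 1:
        raise ValueError("self_loop and skip must be non-negative and sum to at most 1")
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        if i == num_states - 1:
            matrix[i][i] = 1.0                 # final state absorbs in this sketch
        elif i == num_states - 2:
            matrix[i][i] = self_loop           # no room left to skip
            matrix[i][i + 1] = 1.0 - self_loop
        else:
            matrix[i][i] = self_loop           # stay (stretches time)
            matrix[i][i + 1] = 1.0 - self_loop - skip  # advance
            matrix[i][i + 2] = skip            # skip a state (compresses time)
    return matrix

transitions = left_to_right_transitions(3)
```

Every entry below the diagonal stays zero, which is what makes the model left-to-right.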

16
Acoustic Modeling: Parameter Estimation

Closed-loop data-driven modeling supervised from a word-level transcription.
The expectation/maximization (EM) algorithm is used to improve our parameter estimates.
Computationally efficient training algorithms (Forward-Backward) have been crucial.
Batch mode parameter updates are typically preferred.
Decision trees are used to optimize parameter-sharing, system complexity, and the use of additional linguistic knowledge.
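
The forward half of the Forward-Backward procedure can be sketched in a few lines: it sums over all state sequences in time proportional to the number of frames rather than enumerating them. For brevity this example assumes discrete output symbols instead of the Gaussian mixtures mentioned on the earlier slide, and the two-state model is invented for illustration.

```python
def forward_likelihood(initial, transitions, emissions, observations):
    """Compute P(observations | model) for a discrete-output HMM."""
    if not observations:
        raise ValueError("observations must not be empty")
    num_states = len(initial)
    # alpha[j] = P(o_1 .. o_t, state at time t = j)
    alpha = [initial[j] * emissions[j][observations[0]] for j in range(num_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(num_states))
            * emissions[j][symbol]
            for j in range(num_states)
        ]
    return sum(alpha)

# Invented two-state left-to-right model with output symbols "a" and "b".
initial = [1.0, 0.0]
transitions = [[0.7, 0.3], [0.0, 1.0]]
emissions = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
likelihood = forward_likelihood(initial, transitions, emissions, ["a", "b"])
```

A full training pass would pair this with the backward recursion to obtain state occupancy counts for the EM update. Practical systems also work with scaled or log probabilities to avoid underflow on long utterances.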

17
Language Modeling

Formal Language Theory
Context-Free Grammars
N-Gram Models and Complexity
Smoothing
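
Smoothing addresses the problem that an N-gram never seen in training would otherwise get probability zero. The slide does not name a particular method, so the sketch below uses add-k smoothing on bigrams simply because it is the shortest to show. The function name and toy corpus are assumptions.

```python
from collections import Counter

def add_k_bigram_model(sentences, k=1.0):
    """Return (probability function, vocabulary) for an add-k smoothed bigram model."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocabulary = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocabulary.update(words[1:])  # every token that can be predicted
        for previous, current in zip(words, words[1:]):
            history_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    vocab_size = len(vocabulary)

    def probability(current, previous):
        """Smoothed estimate of P(current | previous)."""
        numerator = bigram_counts[(previous, current)] + k
        denominator = history_counts[previous] + k * vocab_size
        return numerator / denominator

    return probability, vocabulary

# Toy corpus, invented for illustration.
probability, vocabulary = add_k_bigram_model(["the cat sat", "the dog sat"])
unseen = probability("sat", "the")  # "the sat" never occurs, yet gets nonzero mass
```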

18
Language Modeling
19
Language Modeling: N-Grams
20
LM: Integration of Natural Language
21
Search Algorithms and Data Structures

Basic Search Algorithms
Time Synchronous Search
Stack Decoding
Lexical Trees
Efficient Trees

22
Dynamic Programming-Based Search
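
One well-known dynamic programming search over an HMM is the Viterbi algorithm, which keeps only the best-scoring path into each state at each frame. The slide content beyond the title is not available, so the following is a generic sketch rather than the presenter's decoder. It assumes discrete output symbols and an invented two-state model.

```python
def viterbi(initial, transitions, emissions, observations):
    """Return (best state sequence, its probability) for a discrete-output HMM."""
    if not observations:
        raise ValueError("observations must not be empty")
    num_states = len(initial)
    # best[j] = probability of the best path that ends in state j at the current frame
    best = [initial[j] * emissions[j][observations[0]] for j in range(num_states)]
    backpointers = []
    for symbol in observations[1:]:
        pointers = []
        new_best = []
        for j in range(num_states):
            prev_state = max(range(num_states), key=lambda i: best[i] * transitions[i][j])
            pointers.append(prev_state)
            new_best.append(best[prev_state] * transitions[prev_state][j] * emissions[j][symbol])
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state.
    state = max(range(num_states), key=lambda j: best[j])
    probability = best[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, probability

# Invented two-state left-to-right model with output symbols "a" and "b".
initial = [1.0, 0.0]
transitions = [[0.7, 0.3], [0.0, 1.0]]
emissions = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
path, probability = viterbi(initial, transitions, emissions, ["a", "a", "b", "b"])
```

Because all states are advanced one frame at a time, this is also a small instance of time-synchronous search. Large-vocabulary decoders typically add log probabilities and pruning on top of the same recursion.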
23
Recognition Architectures
Input Speech
Language Model P(W)
24
Speech Recognition
