1. Feature Selection, Acoustic Modeling and Adaptation: SDSG Review of Recent Work
- Technical University of Crete
- Speech Processing and Dialog Systems Group
- Presenter: Alex Potamianos
2. Outline
- Prior Work
- Adaptation
- Acoustic Modeling
- Robust Feature Selection
- Bridge over to HIWIRE work-plan
- Robust Features, Acoustic Modeling, Adaptation
- New areas: audio-visual ASR, microphone arrays
3. Adaptation
- Transformation-based adaptation
- MAP Adaptation (Bayesian learning approximation)
- Speaker Clustering / Speaker space models.
- Robust Feature Selection
- Combinations
4. Acoustic Model Adaptation: SDSG Selected Work
- Constrained Estimation Adaptation
- Maximum Likelihood Stochastic Transformations
- Combined Transformation-MAP adaptation
- MLST Basis Vectors
- Incremental Adaptation
- Dependency modeling of biases
- Vocal Tract Norm. with Linear Transformation
5. Constrained Estimation Adaptation (Digalakis 1995)
- Hypothesize a sequence of feature-space linear transformations x → Ax + b.
- Adapted models then have means Aμ + b and covariances AΣAᵀ, with A diagonal.
- Adaptation is equivalent to estimating the state-dependent transformation parameters (A_s, b_s); a sketch follows.
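A minimal sketch of this constrained update in Python (numpy only; the function name and toy values are illustrative assumptions, not the paper's code):

```python
import numpy as np

def constrained_adapt(mu, Sigma, A, b):
    """Constrained-estimation adaptation: the same feature-space
    transform x -> A x + b moves both the Gaussian mean and the
    covariance, so mu' = A mu + b and Sigma' = A Sigma A^T."""
    return A @ mu + b, A @ Sigma @ A.T

# Toy usage with a diagonal A, as the slide assumes.
mu = np.zeros(3)
Sigma = np.eye(3)
A = np.diag([1.1, 0.9, 1.0])
b = np.array([0.5, -0.2, 0.0])
mu_a, Sigma_a = constrained_adapt(mu, Sigma, A, b)
```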
6. Compared to MLLR (Leggetter 1996)
- Both were published at about the same time.
- MLLR is model adaptation only.
- MLLR transforms only the model means.
- The transformation matrix in MLLR is block diagonal.
- Constrained estimation is more generic.
7. Limitations of the Linear Assumption
- The linear assumption may be too restrictive in modeling the training-testing dependency.
- Goal: try a more complex transformation.
- All Gaussians in a class are restricted to be transformed identically, using the same transformation.
- Goal: let each Gaussian in a class decide on its own transformation.
- Which transformation applies to each Gaussian is predefined.
- Goal: let the system automatically choose the transformation-Gaussian pairs.
8. ML Stochastic Transformations (MLST) (Diakoloukas and Digalakis 1997)
- Hypothesize a sequence of feature-space stochastic transformations of the form x → A_j x + b_j, where component j is applied with probability λ_j (a probabilistic mixture of linear transformations).
9. MLST Model-Space
- Use a set of MLSTs instead of linear transformations.
- Adapted observation densities become mixtures of linearly transformed Gaussians (sketched below).
- MLST-Method I: the transformation matrices A_sj are diagonal.
- MLST-Method II: the transformation matrices A_sj are block diagonal.
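A minimal sketch of evaluating such an adapted density, assuming the model-space form b'(x) = Σ_i w_i Σ_j λ_j N(x; A_j μ_i + b_j, A_j Σ_i A_jᵀ); all names are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mlst_density(x, weights, mus, Sigmas, lams, As, bs):
    """Evaluate an MLST-adapted mixture: every SI Gaussian
    (w_i, mu_i, Sigma_i) is expanded into one component per basis
    transform (A_j, b_j), weighted by the transform probability lam_j."""
    p = 0.0
    for w, mu, Sigma in zip(weights, mus, Sigmas):
        for lam, A, b in zip(lams, As, bs):
            p += w * lam * multivariate_normal.pdf(
                x, mean=A @ mu + b, cov=A @ Sigma @ A.T)
    return p
```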
10. MLST: Reducing the Number of Mixture Components
- The adapted mixture densities contain many more Gaussians (each SI Gaussian is expanded by every component transform).
- Reduce the Gaussians back to their SI number (two of the rules are sketched below):
- HPT: apply the component transformation with the highest probability to each Gaussian.
- LCT: apply a linear combination of all component transforms.
- MTG: merge the transformed Gaussians.
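Minimal sketches of the HPT and MTG rules, assuming per-Gaussian transform posteriors `gammas` estimated from the adaptation data (hypothetical names, illustrative only):

```python
import numpy as np

def hpt(mu, gammas, As, bs):
    """HPT: keep one Gaussian per SI Gaussian by applying only the
    component transform with the highest posterior probability."""
    j = int(np.argmax(gammas))
    return As[j] @ mu + bs[j]

def mtg(gammas, mus_t, Sigmas_t):
    """MTG: merge the transformed Gaussians into a single Gaussian by
    moment matching (weighted mean, plus within- and between-component
    covariance)."""
    g = np.asarray(gammas, dtype=float)
    g = g / g.sum()
    mu = sum(gi * m for gi, m in zip(g, mus_t))
    Sigma = sum(gi * (S + np.outer(m - mu, m - mu))
                for gi, m, S in zip(g, mus_t, Sigmas_t))
    return mu, Sigma
```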
11. Schematic Representation of MLST Adaptation
12. MLST Properties
- A_sj and b_sj are shared at a state or state-cluster level.
- Transformation weights λ_j are estimated at the Gaussian level.
- MLST combines transformed Gaussians.
- MLST is flexible in how a transformation is selected for each Gaussian.
- MLST allows an arbitrary number of transformations per class.
13. MLST Compared to ML Linear Transforms
- Hard versus soft decision:
- The linear component is chosen based on the training samples.
- Adaptation resolution:
- Linear components are common to a transformation class.
- The transformation is chosen at the Gaussian level.
- Result: increased adaptation resolution together with robust estimation.
14. MLST Basis Transforms (Boulis, Diakoloukas and Digalakis 2000)
- Algorithm steps:
- Cluster the training-speaker space into classes.
- Train MLST component transforms using data from each training-speaker class.
- Use the adaptation data to estimate only the transformation weights (sketched below).
- This is like bringing a-priori knowledge into the estimation process.
- Results in rapid speaker adaptation.
- Significant gains for medium and small adaptation sets.
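A sketch of the weight-estimation step, assuming the basis transforms are fixed and only the weights are learned from adaptation data; shown for a single Gaussian with an EM-style update (names and structure are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_weights(X, mu, Sigma, As, bs, iters=10):
    """EM-style re-estimation of MLST transformation weights with the
    basis transforms (As, bs) held fixed; only the weights lam are
    learned from the adaptation frames X (single-Gaussian case)."""
    J = len(As)
    lam = np.full(J, 1.0 / J)
    comps = [(A @ mu + b, A @ Sigma @ A.T) for A, b in zip(As, bs)]
    for _ in range(iters):
        # E-step: posterior of each basis transform for every frame.
        lik = np.stack([lam[j] * multivariate_normal.pdf(X, m, S)
                        for j, (m, S) in enumerate(comps)], axis=1)
        gamma = lik / lik.sum(axis=1, keepdims=True)
        # M-step: the new weights are the average responsibilities.
        lam = gamma.mean(axis=0)
    return lam
```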
15. Combined Transformation-Bayesian Adaptation (Digalakis and Neumeyer 1996)
- MAP estimation can be expressed as an interpolation between a prior estimate and the ML estimate from the adaptation data; using the transform-adapted parameters as the prior combines the two methods (sketched below).
- Retains the asymptotic properties of MAP.
- Retains the fast adaptation rates of transformations.
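A minimal sketch of the combination for a Gaussian mean, assuming the transform-adapted mean serves as the prior mean in a standard MAP update (the weight tau and all names are illustrative assumptions):

```python
import numpy as np

def combined_map_mean(mu_si, A, b, X, tau=10.0):
    """Combined transformation + MAP adaptation of a Gaussian mean:
    first move the SI mean with the estimated transform (fast, works
    with sparse data), then shrink toward the ML mean of the adaptation
    data as the frame count N grows (MAP's asymptotic behavior)."""
    mu_prior = A @ mu_si + b          # transformation-based prior
    N = len(X)
    mu_ml = X.mean(axis=0)            # ML estimate from adaptation data
    return (tau * mu_prior + N * mu_ml) / (tau + N)
```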
16. Rapid Speech Recognizer Adaptation (Digalakis et al. 2000)
- Dependence models for the bias components of cascaded transforms. Techniques:
- Gaussian multiscale processes
- Hierarchical tree-structured priors
- Explicit correlation models
- Markov random fields
17. VTN with Linear Transformation (Potamianos and Rose 1997; Potamianos and Narayanan 1998)
- Vocal tract normalization (VTN):
- Select the optimal warping factor a according to
- â = argmax_a P(X^a | a, λ, H),
- where H is the transcription, λ the acoustic model, and X^a the observation sequence frequency-warped by factor a (a search sketch follows).
- VTN with linear transformation:
- (â, θ̂) = argmax_{a,θ} P(h_θ(X^a) | a, θ, λ, H),
- where h_θ(·) is a parametric linear transformation with parameter θ.
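A minimal sketch of the warping-factor search, assuming a small grid of factors and a black-box `score(a)` callback that warps the features and returns the transcription-constrained log-likelihood (both the grid and the callback are illustrative assumptions):

```python
import numpy as np

def select_warp_factor(warp_grid, score):
    """VTN warping-factor search: pick the factor a whose frequency-
    warped features X^a score highest against the model given the
    transcription, i.e. a_hat = argmax_a P(X^a | a, lambda, H)."""
    scores = [score(a) for a in warp_grid]
    return warp_grid[int(np.argmax(scores))]

# Typical usage: a grid of roughly +/-12% linear warping factors.
# a_hat = select_warp_factor(np.linspace(0.88, 1.12, 13), score)
```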
18. Acoustic Modeling: SDSG Selected Work
- Genones: a generalized Gaussian mixture tying scheme
- Stochastic Segment Models (SSMs)
19. Genones: Generalized Mixture Tying (Digalakis, Monaco and Murveit 1996)
- Algorithm steps:
- Clustering: group HMM states based on the similarity of their distributions (see the sketch below).
- Splitting: construct seed codebooks for each state cluster,
- either by identifying the most likely mixture-component subset,
- or by clustering down the original codebook.
- Re-estimation of the parameters using Baum-Welch.
- Yields a better trade-off between modeling resolution and robustness.
- Genones are used in Decipher and Nuance.
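A toy sketch of the clustering step, in which each cluster of states will share one Gaussian codebook (a genone). Here k-means over fixed-size state summary vectors stands in, purely for illustration, for the paper's likelihood-based clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_states_to_genones(state_vectors, n_genones):
    """Group HMM states so that each cluster shares one Gaussian
    codebook (a genone). States are summarized here by fixed-size
    vectors (e.g., concatenated mixture means); the original work
    clusters on distribution similarity rather than k-means."""
    km = KMeans(n_clusters=n_genones, n_init=10).fit(np.asarray(state_vectors))
    return km.labels_  # genone index for every state
```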
20. Segment Models
- HMM limitations:
- Weak duration modeling
- The conditional-independence-of-observations assumption
- Restrictions on feature extraction imposed by frame-based observations
- Segment-model motivation:
- Larger number of degrees of freedom in the model
- Use of segmental features
- Modeling of the correlation between frame-based features
- Powerful modeling of transitions and longer-range speech dynamics
- Less distortion for segmental coding → segmental recognition is more efficient
21. General Stochastic Segment Models
- A segment s in an utterance of N frames is s = (t_a, t_b), with 1 ≤ t_a ≤ t_b ≤ N.
- Segment model density: the joint density p(x_{t_a}, ..., x_{t_b} | s, α) of the frames given the segment and its label α.
- Segment models generate a variable-length sequence of frames (a toy instance follows).
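A toy instance of such a variable-length segment density, assuming the common fixed-region construction (a linear time warp onto M region-dependent Gaussians); one simple choice among many, not the general model itself:

```python
import numpy as np
from scipy.stats import multivariate_normal

def segment_log_density(X, region_means, region_covs):
    """Toy stochastic-segment-model density: a variable-length segment
    X (L x d frames) is mapped onto M fixed regions by a linear time
    warp, and each frame is scored by its region's Gaussian."""
    L, M = len(X), len(region_means)
    regions = np.minimum((np.arange(L) * M) // L, M - 1)
    return sum(multivariate_normal.logpdf(x, region_means[r], region_covs[r])
               for x, r in zip(X, regions))
```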
22. Stochastic Segment Model (Ostendorf and Digalakis 1992)
- Problem: model the time correlation within a segment.
- Solution: Gaussian model variations based on assumptions about the form of statistical dependency:
- Gauss-Markov model (sketched below)
- Dynamical system model
- Target-state model
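A minimal sketch of the first of these, assuming a first-order Gauss-Markov dependency x_t = F x_{t-1} + w_t within the segment (names and shapes are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_markov_loglik(X, mu0, Sigma0, F, Q):
    """First-order Gauss-Markov segment model: the first frame is
    Gaussian, and each later frame depends linearly on its predecessor
    (x_t = F x_{t-1} + w_t, w_t ~ N(0, Q)), capturing within-segment
    time correlation that frame-independent HMM states ignore."""
    ll = multivariate_normal.logpdf(X[0], mu0, Sigma0)
    for prev, cur in zip(X[:-1], X[1:]):
        ll += multivariate_normal.logpdf(cur, F @ prev, Q)
    return ll
```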
23. SSM Viterbi Decoding (Ostendorf, Digalakis and Kimball 1996)
- HMM Viterbi recognition: map the best state sequence to a word sequence.
- SSM analogous solution: map the best segment-label sequence to the appropriate word sequence, searching over segmentations as well (sketched below).
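A minimal sketch of the segmental dynamic program, assuming a black-box `seg_score(a, ta, tb)` returning the log segment density plus any transition score (names, the duration cap, and the scoring callback are illustrative assumptions):

```python
import numpy as np

def segmental_viterbi(N, labels, seg_score, max_len=40):
    """Segmental Viterbi: D[t] is the best score over segmentations of
    frames 0..t-1; each step extends the hypothesis by one whole
    labeled segment rather than one frame, as in HMM Viterbi."""
    D = np.full(N + 1, -np.inf)
    D[0], back = 0.0, [None] * (N + 1)
    for t in range(1, N + 1):
        for ta in range(max(0, t - max_len), t):
            for a in labels:
                s = D[ta] + seg_score(a, ta, t)
                if s > D[t]:
                    D[t], back[t] = s, (ta, a)
    # Trace back the best segment-label sequence.
    segs, t = [], N
    while t > 0:
        ta, a = back[t]
        segs.append((a, ta, t))
        t = ta
    return segs[::-1], D[N]
```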
24. From HMMs to Segment Models (Ostendorf and Digalakis 1996)
- A unified view of stochastic modeling:
- A general stochastic model that encompasses most segment-model variants
- Similarities in terms of correlation and parameter-tying assumptions
- Analogies between segment models and HMMs
25. Robust Feature Selection
- Time-frequency representations for ASR (Potamianos and Maragos 1999)
- Confidence-measure estimation for ASR features sent over wireless channels (missing features) (Potamianos and Weerackody 2001)
- AM-FM model-based features (Dimitriadis et al. 2002)
26. Other Work
- Multiple source separation using microphone
arrays (Sidiropoulos et al. 2001)
27. Prior Work Overview
- Constrained Estimation Adaptation
- MLST
- Combinations
- MAP (Bayesian) Adaptation
- VTLN
- Genones
- Segment Models
- Robust Features
28. HIWIRE Work Proposal
- Adaptation: Bayes optimal classification
- Acoustic modeling: segment models
- Feature selection: AM-FM features
- Microphone arrays: speech/noise separation
- Audio-visual ASR: baseline experiments
29. Bayes Optimal Classification (HIWIRE proposal)
- Classifier decision for a test data vector x_test:
- Choose the class c with the highest predictive value P(x_test | D_c) = ∫ P(x_test | θ) p(θ | D_c) dθ, where D_c is the training data of class c.
30. Bayes Optimal versus MAP
- Assumption: the posterior p(θ | D) is sufficiently peaked around the most probable point.
- MAP approximation: P(x_test | D) ≈ P(x_test | θ_MAP),
- where θ_MAP is the set of parameters that maximizes the posterior p(θ | D) (both rules are sketched below).
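A minimal sketch of the two decision rules, assuming posterior parameter samples are available (e.g., from MCMC) and a Gaussian class likelihood; all names are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_optimal_score(x, theta_samples):
    """Bayes optimal: average the likelihood of x over posterior
    parameter samples, a Monte Carlo approximation of the integral
    of P(x | theta) p(theta | D) d(theta)."""
    return np.mean([multivariate_normal.pdf(x, mu, Sigma)
                    for mu, Sigma in theta_samples])

def map_score(x, theta_map):
    """MAP approximation: a single point estimate stands in for the
    whole posterior; valid when the posterior is sharply peaked."""
    mu, Sigma = theta_map
    return multivariate_normal.pdf(x, mu, Sigma)

# Decision: pick the class with the highest predictive score, e.g.
# c_hat = max(classes, key=lambda c: bayes_optimal_score(x, samples[c]))
```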
31. Why Bayes Optimal Classification?
- It is the optimal classification criterion.
- The predictions of all parameter hypotheses are combined:
- Better discrimination
- Less training data required
- Faster asymptotic convergence to the ML estimate
- However:
- It is computationally more expensive.
- It is difficult to find analytical solutions,
- hence some approximations must still be considered.
32. Segment Models
- Phone Transition modeling
- New features
- Combine with HMMs
- Parametric modeling of feature trajectories
33. AM-FM Features
34. Audio-Visual ASR
35. Microphone Array
- Speech/noise source separation algorithms