1
Feature Selection, Acoustic Modeling and
Adaptation SDSG REVIEW of recent WORK
  • Technical University of Crete
  • Speech Processing and Dialog Systems Group
  • Presenter: Alex Potamianos

2
Outline
  • Prior Work
  • Adaptation
  • Acoustic Modeling
  • Robust Feature Selection
  • Bridge over to HIWIRE work-plan
  • Robust Features, Acoustic Modeling, Adaptation
  • New areas: audio-visual, microphone arrays

3
Adaptation
  • Transformation-based adaptation
  • MAP Adaptation (Bayesian learning approximation)
  • Speaker Clustering / Speaker space models.
  • Robust Feature Selection
  • Combinations

4
Acoustic Model Adaptation SDSG Selected Work
  • Constrained Estimation Adaptation
  • Maximum Likelihood Stochastic Transformations
  • Combined Transformation-MAP adaptation
  • MLST Basis Vectors
  • Incremental Adaptation
  • Dependency modeling of biases
  • Vocal Tract Norm. with Linear Transformation

5
Constrained Estimation Adaptation (Digalakis
1995)
  • Hypothesize a sequence of feature-space linear transformations of the form x → A x + b
  • Adapted models are then N(A μ + b, A Σ A^T), with A diagonal
  • Adaptation is equivalent to estimating the state-dependent transformation parameters (A, b) (a sketch follows this list)
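A minimal sketch of the constrained-estimation update, assuming diagonal-covariance Gaussians and a diagonal transform; function and variable names are illustrative, not the original implementation:

import numpy as np

# Minimal sketch: a shared feature-space transform x_hat = A x + b is
# equivalent to transforming the model parameters:
#   mean       ->  A mu + b
#   covariance ->  A Sigma A^T   (stays diagonal when A is diagonal)

def adapt_gaussian(mu, sigma_diag, a_diag, b):
    """Adapt one diagonal-covariance Gaussian with a diagonal transform (A, b)."""
    mu_adapted = a_diag * mu + b                 # A mu + b
    sigma_adapted = (a_diag ** 2) * sigma_diag   # A Sigma A^T for diagonal A
    return mu_adapted, sigma_adapted

# Example: a 3-dimensional Gaussian and one state-dependent transform
mu = np.array([0.5, -1.0, 2.0])
sigma_diag = np.array([1.0, 0.5, 2.0])
a_diag = np.array([1.1, 0.9, 1.05])
b = np.array([0.2, -0.1, 0.0])
print(adapt_gaussian(mu, sigma_diag, a_diag, b))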

6
Compared to MLLR (Leggetter 1996)
  • Both were published at about the same time.
  • MLLR is model-space adaptation only.
  • MLLR transforms only the model means.
  • The transformation matrix in MLLR is block diagonal.
  • Constrained estimation is more generic.

7
Limitations of the Linear Assumption
  • The linear assumption may be too restrictive in modeling the training-testing dependency.
  • Goal: Try a more complex transformation.
  • All Gaussians in a class are restricted to be transformed identically, using the same transformation.
  • Goal: Let each Gaussian in a class decide on its own transformation.
  • Which transformation transforms each Gaussian is predefined.
  • Goal: Let the system automatically choose the transformation-Gaussian couples.

8
ML Stochastic Transformations (MLST)
(Diakoloukas Digalakis 1997)
  • Hypothesize a sequence of feature-space stochastic transformations: each observation is transformed by one of several linear components, x → A_j x + b_j, chosen with probability λ_j

9
MLST model-space
  • Use a set of MLSTs instead of linear transformations.
  • Adapted observation densities: each SI Gaussian is replaced by a λ-weighted combination of its transformed versions N(A_sj μ + b_sj, A_sj Σ A_sj^T)
  • MLST-Method I: A_sj is diagonal
  • MLST-Method II: A_sj is block diagonal

10
MLST: Reduce the number of mixture components
  • The adapted mixture densities contain many more Gaussians (one per original component and transform).
  • Reduce the Gaussians back to their SI number (sketched below):
  • HPT: Apply the component transformation with the highest probability to each Gaussian.
  • LCT: Apply the linear combination of all component transforms.
  • MTG: Merge the transformed Gaussians.
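A hypothetical sketch of the expansion and of the HPT and LCT reductions listed above, assuming diagonal component transforms (Method I); names and toy numbers are illustrative only, not the published estimator:

import numpy as np

# Each SI Gaussian is expanded into J transformed copies weighted by lam_j,
# then reduced back to a single Gaussian with HPT or LCT.

def mlst_expand(mu, var, A, b, lam):
    """Expand one diagonal Gaussian; row j of A, b holds the diagonal A_j and offset b_j."""
    mus = A * mu + b           # (J, D): A_j mu + b_j
    vars_ = (A ** 2) * var     # (J, D): A_j Sigma A_j^T for diagonal A_j
    return lam, mus, vars_

def reduce_hpt(lam, mus, vars_):
    """HPT: keep only the transform with the highest probability."""
    j = int(np.argmax(lam))
    return mus[j], vars_[j]

def reduce_lct(lam, A, b, mu, var):
    """LCT: combine the component transforms linearly, then apply once."""
    A_c, b_c = lam @ A, lam @ b
    return A_c * mu + b_c, (A_c ** 2) * var

# Toy example: D = 2 dimensions, J = 3 component transforms
mu, var = np.array([0.0, 1.0]), np.array([1.0, 0.5])
A = np.array([[1.0, 1.0], [1.1, 0.9], [0.95, 1.05]])
b = np.array([[0.0, 0.0], [0.2, -0.1], [-0.1, 0.3]])
lam = np.array([0.2, 0.5, 0.3])
lam, mus, vars_ = mlst_expand(mu, var, A, b, lam)
print(reduce_hpt(lam, mus, vars_))
print(reduce_lct(lam, A, b, mu, var))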

11
Schematic representation of MLST adaptation
12
MLST properties
  • A_sj, b_sj are shared at a state or state-cluster level
  • Transformation weights λ_j are estimated at the Gaussian level
  • MLST combines transformed Gaussians
  • MLST is flexible in how a transformation is selected for each Gaussian.
  • MLST chooses an arbitrary number of transformations per class.

13
MLST compared to ML Linear Transforms
  • Hard versus Soft decision
  • Choose the linear component based on the training
    samples.
  • Adaptation Resolution
  • Linear components are common to a transformation
    class
  • Choose the transformation at a Gaussian level
  • Increased adaptation resolution - robust
    estimation

14
MLST basis transforms (Boulis Diakoloukas
Digalakis 2000)
  • Algorithm steps:
  • Cluster the training-speaker space into classes
  • Train MLST component transforms using data from each training-speaker class
  • Use the adaptation data to estimate only the transformation weights (sketched below)
  • This is like bringing a priori knowledge into the estimation process
  • Results in rapid speaker adaptation
  • Significant gains for medium and small data sets
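A hedged sketch of the rapid-adaptation step: the basis transforms (A_j, b_j) are assumed to be pre-trained from speaker clusters, and only the weights λ_j are re-estimated from the adaptation data; the single EM-style responsibility update below is an illustrative stand-in for the actual estimator:

import numpy as np

def diag_gauss_logpdf(x, mu, var):
    """Log-density of a diagonal-covariance Gaussian, summed over dimensions."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

def estimate_weights(X, mu, var, A, b, lam0):
    """One EM-style update of the transform weights on adaptation frames X."""
    # Log-likelihood of every frame under every transformed Gaussian
    logp = np.stack([diag_gauss_logpdf(X, A[j] * mu + b[j], (A[j] ** 2) * var)
                     for j in range(len(lam0))], axis=1)   # shape (N, J)
    logp = logp + np.log(lam0)                             # add current weights
    resp = np.exp(logp - logp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)                # per-frame posteriors
    return resp.mean(axis=0)                               # updated weights

# Toy adaptation data with two pre-trained basis transforms
rng = np.random.default_rng(0)
X = rng.normal(0.3, 1.0, size=(50, 2))
mu, var = np.zeros(2), np.ones(2)
A = np.array([[1.0, 1.0], [1.2, 0.8]])
b = np.array([[0.0, 0.0], [0.3, 0.3]])
print(estimate_weights(X, mu, var, A, b, np.array([0.5, 0.5])))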

15
Combined Transformation and Bayesian Adaptation (Digalakis Neumeyer 1996)
  • The transformation-adapted models serve as the priors for MAP estimation (sketched below)
  • Retains the asymptotic properties of MAP
  • Retains the fast adaptation rates of transformations.
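A small illustrative sketch of the combination for a single Gaussian mean: the transformation-adapted mean acts as the MAP prior mean, so the estimate stays near the transformed model when adaptation data are scarce and approaches the ML estimate as data grow. The relevance factor tau and all names are assumptions for illustration:

import numpy as np

# Hedged sketch of combined transformation + MAP adaptation of one Gaussian
# mean: the transformed SI mean acts as the MAP prior mean.

def map_mean(mu_prior, frames, gammas, tau=10.0):
    """Standard MAP mean update with occupancy-weighted adaptation data."""
    occ = gammas.sum()
    return (tau * mu_prior + gammas @ frames) / (tau + occ)

mu_si = np.array([0.0, 1.0])
A, b = np.array([1.1, 0.9]), np.array([0.2, -0.1])
mu_prior = A * mu_si + b                  # transformation-adapted prior mean

rng = np.random.default_rng(1)
frames = rng.normal([0.5, 0.8], 1.0, size=(20, 2))
gammas = np.ones(20)                      # per-frame state occupancies
print(map_mean(mu_prior, frames, gammas))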

16
Rapid Speech Recognizer Adaptation (Digalakis et al. 2000)
  • Dependency models of the bias components of cascaded transforms. Techniques:
  • Gaussian multiscale process
  • Hierarchical tree-structured prior
  • Explicit correlation models
  • Markov Random Fields

17
VTN with Linear Transformation (Potamianos and Rose 1997, Potamianos and Narayanan 1998)
  • Vocal Tract Normalization:
  • Select the optimal warping factor a according to a* = arg max_a P(X^a | a, λ, H)
  • where H is the transcription, λ the acoustic model, and X^a the observation sequence frequency-warped by factor a.
  • VTN with linear transformation:
  • (a*, θ*) = arg max_{a,θ} P(X^a | a, θ, λ, H)
  • where h_θ(·) is a parametric linear transformation with parameter θ (a search sketch follows)
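A minimal sketch of the warping-factor search implied by the criterion above; warp_features and forced_align_loglik are hypothetical placeholders for the real front-end warping and the transcription-constrained (forced-alignment) scoring, and the grid of candidate factors is only an example:

import numpy as np

def select_warp(X, H, model, warp_features, forced_align_loglik,
                alphas=np.arange(0.88, 1.13, 0.02)):
    """Pick the warping factor a maximizing the likelihood of the warped features given H."""
    best_a, best_ll = None, -np.inf
    for a in alphas:
        ll = forced_align_loglik(model, warp_features(X, a), H)
        if ll > best_ll:
            best_a, best_ll = a, ll
    return best_a, best_ll

The joint VTN-plus-linear-transformation criterion would simply add an inner loop over candidate θ values (or an ML estimate of θ for each a).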

18
Acoustic Modeling SDSG Selected Work
  • Genones: a generalized Gaussian mixture tying scheme
  • Stochastic Segment Models (SSMs)

19
Genones: Generalized Mixture Tying (Digalakis Monaco Murveit 1996)
  • Algorithm steps (a clustering sketch follows this list):
  • Clustering of HMM states based on the similarity of their distributions
  • Splitting: Construct seed codebooks for each state cluster
  • Either identify the most likely mixture-component subset
  • Or cluster down the original codebook
  • Re-estimation of the parameters using Baum-Welch
  • Better trade-off between modelling resolution and robustness
  • Genones are used in Decipher and Nuance
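A simplified sketch of the first step (bottom-up clustering of HMM states by the similarity of their distributions); the paper's actual similarity criterion is not reproduced here, and a symmetric KL divergence between the states' mixture-weight vectors stands in for it:

import numpy as np

def sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two discrete mixture-weight vectors."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

def cluster_states(weights, n_clusters):
    """Agglomeratively cluster states; weights has shape (n_states, n_codewords)."""
    clusters = [[i] for i in range(len(weights))]
    cents = [w.astype(float).copy() for w in weights]
    while len(clusters) > n_clusters:
        # merge the pair of clusters with the most similar centroids
        i, j = min(((a, c) for a in range(len(cents)) for c in range(a + 1, len(cents))),
                   key=lambda ij: sym_kl(cents[ij[0]], cents[ij[1]]))
        clusters[i] += clusters.pop(j)
        merged = np.mean([weights[s] for s in clusters[i]], axis=0)
        cents[i] = merged / merged.sum()
        cents.pop(j)
    return clusters

# Toy example: 6 states sharing a 4-component codebook, grouped into 3 clusters,
# each of which would then receive its own seed codebook (genone).
w = np.array([[.70, .10, .10, .10], [.60, .20, .10, .10], [.10, .10, .70, .10],
              [.10, .20, .60, .10], [.65, .15, .10, .10], [.10, .10, .10, .70]])
print(cluster_states(w, 3))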

20
Segment Models
  • HMM limitations
  • Weak duration modelling
  • Conditional independence of observations
    assumption
  • Restrictions on feature extraction imposed by
    frame-based observations
  • Segment models motivation
  • Larger number of degrees of freedom in the model
  • Use segmental features
  • Model correlation of frame-based features
  • Powerful modelling of transitions and
    longer-range speech dynamics
  • Less distortion for segmental coding → segmental recognition more efficient

21
General Stochastic Segment Models
  • A segment s in an utterance of N frames is s = (t_a, t_b), 1 ≤ t_a ≤ t_b ≤ N
  • Segment model density: the joint density of the frames y_{t_a}, ..., y_{t_b} given the segment label (a worked form follows)
  • Segment models generate a variable-length sequence of frames
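As a worked form of the segment density, following the standard segment-model formulation (the notation is assumed, not copied from the slide): a segment with label α generates a duration l and then the l frames.

% Segment model density (standard form; notation assumed)
p\bigl(y_{t_a}, \ldots, y_{t_b} \mid \alpha\bigr)
  = p\bigl(y_{t_a}, \ldots, y_{t_b} \mid l, \alpha\bigr)\, p(l \mid \alpha),
\qquad l = t_b - t_a + 1 .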

22
Stochastic Segment Model (Ostendorf Digalakis
1992)
  • Problem: Model time correlation within a segment
  • Solution: Gaussian model variations based on assumptions about the form of the statistical dependency (standard forms sketched below):
  • Gauss-Markov model
  • Dynamical System model
  • Target State model
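For orientation, the forms usually associated with the first two variants (stated here as an assumption about the standard models, not quoted from the slide):

% Gauss-Markov model: each frame depends linearly on the previous frame
y_{t+1} = F\, y_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)

% Dynamical-system model: a hidden state x_t evolves linearly and emits the frame y_t
x_{t+1} = F\, x_t + w_t, \qquad y_t = H\, x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)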

23
SSM Viterbi Decoding (Ostendorf Digalakis
Kimball 1996)
  • HMM Viterbi recognition: state-to-word sequence mapping
  • SSM analogous solution: map the segment label sequence to the appropriate word sequence (a decoding sketch follows)
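A compact sketch of the segment-level dynamic program behind such decoding; seg_logscore is a placeholder for the segment model density (plus any duration and transition terms), and max_dur and labels are illustrative parameters:

import numpy as np

def segment_viterbi(n_frames, labels, seg_logscore, max_dur=30):
    """Choose segment boundaries and labels maximizing the summed segment log-scores."""
    best = np.full(n_frames + 1, -np.inf)   # best[t]: best score over frames [0, t)
    best[0] = 0.0
    back = [None] * (n_frames + 1)          # back[t] = (segment start, label)
    for t in range(1, n_frames + 1):
        for d in range(1, min(max_dur, t) + 1):
            s = t - d                       # candidate segment covers frames [s, t)
            for lab in labels:
                score = best[s] + seg_logscore(lab, s, t)
                if score > best[t]:
                    best[t], back[t] = score, (s, lab)
    segs, t = [], n_frames                  # backtrace the best segmentation
    while t > 0:
        s, lab = back[t]
        segs.append((lab, s, t))
        t = s
    return best[n_frames], segs[::-1]

The recovered label sequence is then mapped to a word sequence, in analogy to the HMM state-to-word mapping.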

24
From HMMs to Segment Models (Ostendorf Digalakis
1996)
  • Unified view of stochastic modeling
  • General stochastic model that encompasses most SM
    type models
  • Similarities in terms of correlation and
    parameter tying assumptions
  • Analogies between segment models and HMMs

25
Robust Feature Selection
  • Time-Frequency Representation for ASR
  • (Potamianos and Maragos 1999)
  • Confidence Measure Estimation for ASR Features
    sent over wireless channels (missing features)
  • (Potamianos and Weerackody 2001)
  • AM-FM Model Based Features
  • (Dimitriadis et al 2002)

26
Other Work
  • Multiple source separation using microphone
    arrays (Sidiropoulos et al. 2001)

27
Prior Work Overview
Constr. Est. Adapt.
MLST.
Combinations
MAP (Bayes) Adapt.
VTLN
Genones
Segment Models
Robust Features
28
HIWIRE Work Proposal
  • Adaptation: Bayes optimal classification
  • Acoustic Modeling: Segment Models
  • Feature Selection: AM-FM Features
  • Microphone Arrays: Speech/Noise Separation
  • Audio-Visual ASR: Baseline experiments
29
Bayes optimal classification (HIWIRE proposal)
  • Classifier decision for a test data vector x_test
  • Choose the class c with the highest value of P(c | x_test, X_train) = ∫ P(c | x_test, θ) p(θ | X_train) dθ, i.e. average the class likelihood over the parameter posterior

30
Bayes optimal versus MAP
  • Assumption: the parameter posterior is sufficiently peaked around its most probable point
  • MAP approximation: replace the integral over parameters with a single point estimate, P(c | x_test, X_train) ≈ P(c | x_test, θ_MAP)
  • θ_MAP is the set of parameters that maximizes the parameter posterior p(θ | X_train) (a toy comparison follows this list)
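A toy comparison of the two rules under an assumed setup (two classes with uncertain Gaussian means and equal priors, the parameter posterior summarized by Monte-Carlo samples); it only illustrates that the MAP shortcut matches the Bayes-optimal score when the posterior is peaked and can differ when it is broad:

import numpy as np

def gauss_pdf(x, mu, var=1.0):
    """Density of a unit-variance Gaussian class model with mean mu."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
x_test = 0.4
# Posterior over each class mean, summarized by samples
post_samples = {"c1": rng.normal(0.0, 0.50, 2000),   # broad parameter posterior
                "c2": rng.normal(1.0, 0.05, 2000)}   # sharply peaked posterior

# Bayes-optimal: average the class likelihood over the posterior samples
bayes_scores = {c: gauss_pdf(x_test, s).mean() for c, s in post_samples.items()}
# MAP shortcut: plug in the single most probable mean (here the samples'
# median, since these toy posteriors are symmetric)
map_scores = {c: gauss_pdf(x_test, np.median(s)) for c, s in post_samples.items()}
print("Bayes-optimal:", bayes_scores)
print("MAP approximation:", map_scores)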

31
Why Bayes optimal classification
  • Optimal classification criterion
  • The predictions of all parameter hypotheses are combined
  • Better discrimination
  • Less training data needed
  • Faster asymptotic convergence to the ML estimate
  • However:
  • Computationally more expensive
  • Difficult to find analytical solutions
  • ... hence some approximations should still be considered

32
Segment Models
  • Phone Transition modeling
  • New features
  • Combine with HMMs
  • Parametric modeling of feature trajectories

33
AM-FM Features
  • See NTUA presentation

34
Audio-Visual ASR
  • Baseline

35
Microphone Array
  • Speech/noise source separation algorithms