Title: HIWIRE Progress Report July 2006
1. HIWIRE Progress Report July 2006
- Technical University of Crete
- Speech Processing and Dialog Systems Group
- Presenter: Alex Potamianos
2. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
3. Blind Speech Separation (BSS) problem
4. Data Model and Problem Statement
- Quantities in the mixing model (sketched below):
  - Mixing impulse response matrix
  - Spatial signature of the i-th speaker for lag t
  - Additive noise vector
  - L: channel order
- Objective: estimate the inverse-channel impulse response matrix W(t) from the observed signal
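A minimal sketch of the standard convolutive mixing model behind the quantities listed above; the symbol names x, H, s, n, W are my own and not taken from the slide:

    x(t) = \sum_{\tau=0}^{L} H(\tau)\, s(t - \tau) + n(t)

Here the i-th column of H(\tau) is the spatial signature of the i-th speaker for lag \tau, n(t) is the additive noise vector, and L is the channel order; the goal is a W(\tau) such that \sum_{\tau} W(\tau)\, x(t - \tau) recovers the source signals.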
5. BSS permutation problem
- Permutation problem: the order of the separated sources may differ across frequency bins
- To resolve the permutation, combine:
  - Spatial constraints
  - Continuity constraints in the frequency domain
- The solution to the permutation problem can be formulated using an ILS minimization criterion (a continuity-based sketch follows below)
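As a rough illustration of a frequency-continuity constraint, the sketch below reorders the per-bin sources by maximizing envelope correlation between adjacent bins; the function name and the correlation criterion are illustrative assumptions and do not reproduce the project's ILS formulation.

    import itertools
    import numpy as np

    def align_permutations(envelopes):
        """envelopes: array of shape (n_bins, n_src, n_frames) holding the
        magnitude envelopes of the separated sources in each frequency bin.
        Reorders the sources in every bin so that adjacent bins are maximally
        correlated (a simple frequency-continuity constraint, not the ILS
        criterion used in the project)."""
        n_bins, n_src, _ = envelopes.shape
        aligned = envelopes.copy()
        for f in range(1, n_bins):
            best_perm, best_score = None, -np.inf
            for perm in itertools.permutations(range(n_src)):
                # correlate each source with its counterpart in the previous bin
                score = sum(np.corrcoef(aligned[f - 1, i], aligned[f, perm[i]])[0, 1]
                            for i in range(n_src))
                if score > best_score:
                    best_perm, best_score = perm, score
            aligned[f] = aligned[f, list(best_perm)]
        return aligned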
6. Recent progress
- Improved solution to permutation problem
- Combining spatial and continuity constraints
- Trying out different continuity criteria
- Created a synthetic database using typical room impulse responses
- First ASR experiments using the synthetic database
7. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
8. Motivation
- Combining classifiers/information sources is an important problem in machine learning applications
- A simple yet powerful way to combine classifiers is the multi-stream approach, which assumes independent information sources
- Unsupervised stream weight computation for multi-stream classifiers is an open problem
9. Problem Definition
10. Optimal Stream Weights: Result I
- For equal error rates in the single-stream classifiers, the optimal stream weights are inversely proportional to the total stream estimation error variance
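A minimal formalization of Result I, writing s_i for the weight of stream i and \sigma_i^2 for its total estimation error variance, and assuming the weights are normalized to sum to one (notation and normalization are mine):

    s_i \propto \frac{1}{\sigma_i^2}, \qquad s_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}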
11. Optimal Stream Weights: Result II
- For equal estimation error variance in each stream, the optimal weights are approximately inversely proportional to the single-stream classification error
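Analogously for Result II, writing e_i for the classification error of the single-stream classifier of stream i, again assuming normalized weights (notation mine):

    s_i \approx \frac{1/e_i}{\sum_j 1/e_j}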
12. Recent Progress
- Experiments with synthetic data
  - Gaussian-distribution classification problem
  - Results show a good match with the theoretical results
- Experimental verification for Naïve Bayes classifiers
  - Utterance classification (NLP application)
- First experiments with unsupervised estimates of stream weights
  - Intra-class based metrics on observations
  - AV-ASR application
13. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
14. Dynamical System Segment Model
- Based on a linear dynamical system (sketched below), where x is the state, y the observation, u the control, and w, v the noise terms
- The system parameters should guarantee identifiability, controllability, observability, and stability
- We investigated more generalized parameter structures
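A sketch of the linear dynamical system referred to above, with F, H and the noise covariances P, R named as on the next slide; how the control u enters and the zero-mean Gaussian noise assumptions are my own and may differ from the exact form used:

    x_{t+1} = F x_t + u + w_t,   w_t ~ N(0, P)
    y_t     = H x_t + v_t,       v_t ~ N(0, R)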
15. Generalized forms of parameter structures
- The system parameters have an identifiable canonical form (one example is sketched below):
  - F: ones on the superdiagonal, zeros elsewhere, except rows r_i which hold free parameters (i = 1, ..., n)
  - H: column dimension equal to that of F, filled with zeros; taking r_0 = 0, row i has a one in column r_{i-1} + 1
  - P, R: filled with free parameters
- We propose a novel element-wise estimation procedure based on the EM algorithm for system identification
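A sketch of one standard identifiable canonical structure (a companion-type form with ones on the superdiagonal and a single row of free parameters); it illustrates the idea but is not necessarily the generalized multi-row structure described above:

    import numpy as np

    def companion_canonical(a):
        """Build F with ones on the superdiagonal and a single row of free
        parameters a = [a_1, ..., a_n], plus an H that reads out the first
        state component. One standard identifiable canonical form; the
        generalized multi-row structure of the slides may differ."""
        n = len(a)
        F = np.diag(np.ones(n - 1), k=1)   # ones on the superdiagonal, zeros elsewhere
        F[-1, :] = a                       # free parameters in the last row
        H = np.zeros((1, n))
        H[0, 0] = 1.0                      # a single one, remaining entries zero
        return F, H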
16. Application to speech
- Experiments on clean data from AURORA 2
- 11 word models (one-nine, zero, oh)
- The number of segments per model depends on the number of phones in the word model
- HTK for feature extraction (14 MFCCs)
- Alignments obtained with HTK using HMMs
- 4000 training sentences; 600 isolated words for testing
17. Results
- Fig. (a): classification performance (using 3 different initializations)
- Fig. (b): the log-likelihood is increasing for the same runs
18. Conclusions and Future Work
- Developed new forms of linear state-space models
- Proposed a novel element-wise parameter estimation process
- Performed training and classification on AURORA 2 based on speech segments and LDS
- Results show a correlation between performance and initialization
- Future work:
  - Investigation of optimal initialization
  - Feature-to-segment alignment (through dynamic programming)
  - Investigation of the state-space dimension
19. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
20. Vocal Tract Length Normalization
- Linear and Non-Linear Frequency Warping
- Multi-Parameter Frequency Warping
- Warping and Spectral Bias Addition by ML Estimation
21. Linear and Non-Linear Warping Analysis
- An optimal warping factor a is computed (for each phoneme) so that the Euclidean spectral distance (MSE) between the warped spectrum g(X) and the corresponding unwarped spectrum X is minimized; the optimization is performed by full search (sketched below)
- The mapped spectrum is then warped according to this optimal warping factor
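A minimal sketch of the full-search estimation described above; g stands for whichever warping function is applied (linear, piece-wise, or power), X_ref for the corresponding reference spectrum, and the candidate grid for a is an assumption:

    import numpy as np

    def best_warping_factor(X, X_ref, g, alphas=np.linspace(0.8, 1.2, 41)):
        """Full search for the warping factor a that minimizes the Euclidean
        (MSE) distance between the warped spectrum g(X, a) and the
        corresponding reference spectrum X_ref."""
        errors = [np.mean((g(X, a) - X_ref) ** 2) for a in alphas]
        return alphas[int(np.argmin(errors))]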
22. Linear and Non-Linear Warping
- Frequency warping is implemented by re-sampling the spectral envelope at linearly or non-linearly warped frequency indices (sketched below), i.e.:
  1. Linear
  2. Piece-Wise Non-Linear
  3. Power
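A sketch of warping by re-sampling the spectral envelope at warped bin indices; the linear and power cases are shown, and the piece-wise non-linear case follows the same pattern (the interpolation scheme and edge handling are assumptions):

    import numpy as np

    def warp_linear(X, a):
        """Re-sample the spectral envelope X (one frame) at the linearly
        warped bin indices k' = a * k."""
        k = np.arange(len(X))
        return np.interp(np.clip(a * k, 0, len(X) - 1), k, X)

    def warp_power(X, a):
        """Re-sample at the power-warped bin indices k' = K * (k / K) ** a."""
        K = len(X) - 1
        k = np.arange(len(X))
        return np.interp(K * (k / K) ** a, k, X)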
23. Multi-Parameter Frequency Warping
- After computing the optimal warping factor, we explore alternative piecewise linear frequency warping strategies (sketched below)
- Bi-Parametric Warping Function (2 pts)
  - Different warping factors are evaluated for the low (F < 3 kHz) and high (F > 3 kHz) frequencies
- Four-Parametric Warping Function (4 pts)
  - Different warping factors are evaluated for the frequency ranges 0-1.5, 1.5-3, 3-4.5 and 4.5-8 kHz
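A sketch of the four-parameter piece-wise linear warp, with one factor per band and breakpoints at 1.5, 3 and 4.5 kHz; the continuity-by-accumulation construction is an assumption:

    import numpy as np

    def piecewise_warp(f, alphas, edges=(0.0, 1500.0, 3000.0, 4500.0, 8000.0)):
        """Map frequencies f (Hz) through a continuous piece-wise linear warp
        with slope alphas[i] inside the band edges[i]..edges[i+1]."""
        edges = np.asarray(edges, dtype=float)
        # accumulate the warped band widths so the mapping stays continuous
        warped_edges = np.concatenate(
            ([edges[0]], edges[0] + np.cumsum(np.diff(edges) * np.asarray(alphas))))
        return np.interp(f, edges, warped_edges)

For example, piecewise_warp(np.array([1000.0, 2000.0]), [1.1, 1.0, 0.95, 1.0]) applies factor 1.1 below 1.5 kHz and factor 1.0 between 1.5 and 3 kHz.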
24. Reduction in MSE: Non-linear warping
25. Reduction in MSE: Multi-parametric warping
26. Reduction in MSE: Bias Removal and Multi-parametric warping
27. Ongoing work
- Implementation of phone-dependent warping in HTK
- Implementation of multi-parametric warping and bias removal in HTK
28. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
29. Optimal Bayes Adaptation
- The central problem is to determine the posterior of the model parameters given the adaptation data
- Using Bayes' rule (sketched below), this becomes a 2-step process:
  - Obtain the priors from the SI models
  - Compute the likelihoods
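A minimal sketch of the Bayes rule referred to above, with lambda denoting the model parameters and O the adaptation data (notation mine):

    p(\lambda \mid O) = \frac{p(O \mid \lambda)\, p(\lambda)}{p(O)} \propto p(O \mid \lambda)\, p(\lambda)

The prior p(\lambda) comes from the speaker-independent (SI) models and the likelihood p(O \mid \lambda) is computed from the adaptation data.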
30. Phone-Based Clustering
- Cluster the output distributions based on a common central phone
- Figure: genones (genone 1, genone 2, ...) arranged by number of mixture components (1 ... M) and number of dimensions (cepstrum coefficients); each component of this representation stands for a prior
31. Our Implementation
- Computation of the priors
- Computation of the likelihoods using the Baum-Welch algorithm and ML
- After computing the posterior probabilities we apply smoothing techniques
32. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration