Title: HIWIRE Progress Report July 2006
1. HIWIRE Progress Report July 2006
- Technical University of Crete
- Speech Processing and Dialog Systems Group
- Presenter: Alex Potamianos
2. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
3. Blind Speech Separation (BSS) problem
4. Data Model and Problem Statement
- Quantities in the mixing model (sketched below):
  - Mixing impulse response matrix
  - Spatial signature of the i-th speaker for lag t
  - Additive noise vector
  - L: channel order
- Objective: estimate the inverse-channel impulse response matrix W(t) from the observed signal
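A minimal sketch of the standard convolutive mixing model behind the quantities listed above; the symbol names x, H, s, n, W are my own and not taken from the slide:

    x(t) = \sum_{\tau=0}^{L} H(\tau)\, s(t - \tau) + n(t)

Here the i-th column of H(\tau) is the spatial signature of the i-th speaker for lag \tau, n(t) is the additive noise vector, and L is the channel order; the goal is a W(\tau) such that \sum_{\tau} W(\tau)\, x(t - \tau) recovers the source signals.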
5. BSS permutation problem
- Permutation problem: the order of the separated sources may differ across frequency bins
- To resolve the permutation, combine:
  - Spatial constraints
  - Continuity constraints in the frequency domain
- The solution to the permutation problem can be formulated using an ILS minimization criterion (a continuity-based sketch follows below)
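As a rough illustration of a frequency-continuity constraint, the sketch below reorders the per-bin sources by maximizing envelope correlation between adjacent bins; the function name and the correlation criterion are illustrative assumptions and do not reproduce the project's ILS formulation.

    import itertools
    import numpy as np

    def align_permutations(envelopes):
        """envelopes: array of shape (n_bins, n_src, n_frames) holding the
        magnitude envelopes of the separated sources in each frequency bin.
        Reorders the sources in every bin so that adjacent bins are maximally
        correlated (a simple frequency-continuity constraint, not the ILS
        criterion used in the project)."""
        n_bins, n_src, _ = envelopes.shape
        aligned = envelopes.copy()
        for f in range(1, n_bins):
            best_perm, best_score = None, -np.inf
            for perm in itertools.permutations(range(n_src)):
                # correlate each source with its counterpart in the previous bin
                score = sum(np.corrcoef(aligned[f - 1, i], aligned[f, perm[i]])[0, 1]
                            for i in range(n_src))
                if score > best_score:
                    best_perm, best_score = perm, score
            aligned[f] = aligned[f, list(best_perm)]
        return aligned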
6. Recent progress
- Improved solution to permutation problem
- Combining spatial and continuity constraints
- Trying out different continuity criteria
- Created a synthetic database using typical room impulse responses
- First ASR experiments using the synthetic database
7. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
8. Motivation
- Combining classifiers/information sources is an important problem in machine learning applications
- A simple yet powerful way to combine classifiers is the multi-stream approach, which assumes independent information sources
- Unsupervised stream weight computation for multi-stream classifiers is an open problem
9. Problem Definition
10. Optimal Stream Weights: Result I
- For equal error rates in the single-stream classifiers, the optimal stream weights are inversely proportional to the total stream estimation error variance
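A minimal formalization of Result I, writing s_i for the weight of stream i and \sigma_i^2 for its total estimation error variance, and assuming the weights are normalized to sum to one (notation and normalization are mine):

    s_i \propto \frac{1}{\sigma_i^2}, \qquad s_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}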
11. Optimal Stream Weights: Result II
- For equal estimation error variance in each stream, the optimal weights are approximately inversely proportional to the single-stream classification error
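Analogously for Result II, writing e_i for the classification error of the single-stream classifier of stream i, again assuming normalized weights (notation mine):

    s_i \approx \frac{1/e_i}{\sum_j 1/e_j}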
12. Recent Progress
- Experiments with synthetic data
  - Gaussian-distribution classification problem
  - Results show a good match with the theoretical results
- Experimental verification for Naïve Bayes classifiers
  - Utterance classification (NLP application)
- First experiments with unsupervised estimates of stream weights
  - Intra-class based metrics on observations
  - AV-ASR application
13. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
14. Dynamical System Segment Model
- Based on a linear dynamical system (sketched below), where x is the state, y the observation, u the control, and w, v the noise terms
- The system parameters should guarantee identifiability, controllability, observability, and stability
- We investigated more generalized parameter structures
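A sketch of the linear dynamical system referred to above, with F, H and the noise covariances P, R named as on the next slide; how the control u enters and the zero-mean Gaussian noise assumptions are my own and may differ from the exact form used:

    x_{t+1} = F x_t + u + w_t,   w_t ~ N(0, P)
    y_t     = H x_t + v_t,       v_t ~ N(0, R)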
15. Generalized forms of parameter structures
- The system parameters have an identifiable canonical form (one example is sketched below):
  - F: ones on the superdiagonal, zeros elsewhere, except rows r_i which hold free parameters (i = 1, ..., n)
  - H: column dimension equal to that of F, filled with zeros; taking r_0 = 0, row i has a one in column r_{i-1} + 1
  - P, R: filled with free parameters
- We propose a novel element-wise estimation procedure based on the EM algorithm for system identification
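A sketch of one standard identifiable canonical structure (a companion-type form with ones on the superdiagonal and a single row of free parameters); it illustrates the idea but is not necessarily the generalized multi-row structure described above:

    import numpy as np

    def companion_canonical(a):
        """Build F with ones on the superdiagonal and a single row of free
        parameters a = [a_1, ..., a_n], plus an H that reads out the first
        state component. One standard identifiable canonical form; the
        generalized multi-row structure of the slides may differ."""
        n = len(a)
        F = np.diag(np.ones(n - 1), k=1)   # ones on the superdiagonal, zeros elsewhere
        F[-1, :] = a                       # free parameters in the last row
        H = np.zeros((1, n))
        H[0, 0] = 1.0                      # a single one, remaining entries zero
        return F, H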
16. Application to speech
- Experiments on clean data from AURORA 2
- 11 word models (one-nine, zero, oh)
- The number of segments per model depends on the number of phones in the word model
- HTK for feature extraction (14 MFCCs)
- Alignments obtained with HTK using HMMs
- 4000 training sentences; 600 isolated words for testing
17. Results
- Fig. (a): classification performance (using 3 different initializations)
- Fig. (b): the log-likelihood is increasing for the same runs
18. Conclusions and Future Work
- Developed new forms of linear state-space models
- Proposed a novel element-wise parameter estimation process
- Performed training and classification on AURORA 2 based on speech segments and LDS
- Results show a correlation between performance and initialization
- Future work:
  - Investigation of optimal initialization
  - Feature-to-segment alignment (through dynamic programming)
  - Investigation of the state-space dimension
19. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
20. Vocal Tract Length Normalization
- Linear and Non-Linear Frequency Warping
- Multi-Parameter Frequency Warping
- Warping and Spectral Bias Addition by ML Estimation
21. Linear and Non-Linear Warping Analysis
- An optimal warping factor a is computed (for each phoneme) so that the Euclidean spectral distance (MSE) between the warped spectrum g(X) and the corresponding unwarped spectrum X is minimized; the optimization is performed by full search (sketched below)
- The mapped spectrum is then warped according to this optimal warping factor
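A minimal sketch of the full-search estimation described above; g stands for whichever warping function is applied (linear, piece-wise, or power), X_ref for the corresponding reference spectrum, and the candidate grid for a is an assumption:

    import numpy as np

    def best_warping_factor(X, X_ref, g, alphas=np.linspace(0.8, 1.2, 41)):
        """Full search for the warping factor a that minimizes the Euclidean
        (MSE) distance between the warped spectrum g(X, a) and the
        corresponding reference spectrum X_ref."""
        errors = [np.mean((g(X, a) - X_ref) ** 2) for a in alphas]
        return alphas[int(np.argmin(errors))]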
22. Linear and Non-Linear Warping
- Frequency warping is implemented by re-sampling the spectral envelope at linearly or non-linearly warped frequency indices (sketched below), i.e.:
  1. Linear
  2. Piece-Wise Non-Linear
  3. Power
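A sketch of warping by re-sampling the spectral envelope at warped bin indices; the linear and power cases are shown, and the piece-wise non-linear case follows the same pattern (the interpolation scheme and edge handling are assumptions):

    import numpy as np

    def warp_linear(X, a):
        """Re-sample the spectral envelope X (one frame) at the linearly
        warped bin indices k' = a * k."""
        k = np.arange(len(X))
        return np.interp(np.clip(a * k, 0, len(X) - 1), k, X)

    def warp_power(X, a):
        """Re-sample at the power-warped bin indices k' = K * (k / K) ** a."""
        K = len(X) - 1
        k = np.arange(len(X))
        return np.interp(K * (k / K) ** a, k, X)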
23. Multi-Parameter Frequency Warping
- After computing the optimal warping factor, we explore alternative piecewise linear frequency warping strategies (sketched below)
- Bi-Parametric Warping Function (2 pts)
  - Different warping factors are evaluated for the low (F < 3 kHz) and high (F > 3 kHz) frequencies
- Four-Parametric Warping Function (4 pts)
  - Different warping factors are evaluated for the frequency ranges 0-1.5, 1.5-3, 3-4.5 and 4.5-8 kHz
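A sketch of the four-parameter piece-wise linear warp, with one factor per band and breakpoints at 1.5, 3 and 4.5 kHz; the continuity-by-accumulation construction is an assumption:

    import numpy as np

    def piecewise_warp(f, alphas, edges=(0.0, 1500.0, 3000.0, 4500.0, 8000.0)):
        """Map frequencies f (Hz) through a continuous piece-wise linear warp
        with slope alphas[i] inside the band edges[i]..edges[i+1]."""
        edges = np.asarray(edges, dtype=float)
        # accumulate the warped band widths so the mapping stays continuous
        warped_edges = np.concatenate(
            ([edges[0]], edges[0] + np.cumsum(np.diff(edges) * np.asarray(alphas))))
        return np.interp(f, edges, warped_edges)

For example, piecewise_warp(np.array([1000.0, 2000.0]), [1.1, 1.0, 0.95, 1.0]) applies factor 1.1 below 1.5 kHz and factor 1.0 between 1.5 and 3 kHz.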
24. Reduction in MSE: Non-linear warping
25. Reduction in MSE: Multi-parametric warping
26. Reduction in MSE: Bias Removal and Multi-parametric warping
27. Ongoing work
- Implementation of phone-dependent warping in HTK
- Implementation of multi-parametric warping and bias removal in HTK
28. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration
29. Optimal Bayes Adaptation
- The central problem is to determine the posterior of the model parameters given the adaptation data
- Using Bayes' rule (sketched below), this becomes a 2-step process:
  - Obtain the priors from the SI models
  - Compute the likelihoods
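A minimal sketch of the Bayes rule referred to above, with lambda denoting the model parameters and O the adaptation data (notation mine):

    p(\lambda \mid O) = \frac{p(O \mid \lambda)\, p(\lambda)}{p(O)} \propto p(O \mid \lambda)\, p(\lambda)

The prior p(\lambda) comes from the speaker-independent (SI) models and the likelihood p(O \mid \lambda) is computed from the adaptation data.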
30. Phone-Based Clustering
- Cluster the output distributions based on a common central phone
- Figure: genones (genone 1, genone 2, ...) arranged by number of mixture components (1 ... M) and number of dimensions (cepstrum coefficients); each component of this representation stands for a prior
31. Our Implementation
- Computation of the priors
- Computation of the likelihoods using the Baum-Welch algorithm and ML
- After computing the posterior probabilities we apply smoothing techniques
32. Outline
- Work package 1
  - Task 1: Blind Source Separation for ASR
  - Tasks 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
- Work package 2
  - Tasks 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
- Work package 3
  - Task 1: Fixed platform integration