Title: Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis
1Blind Source Separation of Acoustic Signals Based
on Multistage Independent Component Analysis
- Hiroshi SARUWATARI,
- Tsuyoki NISHIKAWA, and Kiyohiro SHIKANO
- Graduate School of Information Science
- Nara Institute of Science and Technology
2Contents
- Background of blind source separation (BSS)
- Overview of Independent Component Analysis (ICA)
- Time-domain ICA (TDICA)
- Frequency-domain ICA (FDICA)
- Disadvantages of TDICA and FDICA
- Proposal of Multistage ICA
- Experimental results under real acoustic conditions
- Conclusion
3Background
Hands-free speech recognition system
[Figure: a talker asks "Is it fine tomorrow?"; a microphone picks up the target speech for the speech recognition system.]
4Background (Contd)
- Goal
- Realization of a high-quality hands-free speech interface system
- Microphone array
- Receiver which consists of multiple elements
- To enhance target speech or reduce interference
- Problem of microphone array processing
- A priori information is required.
- Directions of arrival of the sound sources
- Breaks of target speech for filter adaptation
5Problem of Microphone Array
- Delay-and-sum: to produce a narrow beam pattern
- Adaptive: to update filters to reduce the noise
Target
High sidelobes pick up noise.
It is necessary to observe only noise for filter learning.
Set target direction.
Null
6Blind Source Separation (BSS)
- Approach taken to estimate source signals only from the observed mixed signals.
- No information about source directions or acoustic conditions is required.
- Independent component analysis (ICA) is mainly used.
- Previous works on ICA
- J. Cardoso, 1989
- C. Jutten, 1990 (Higher-order decorrelation)
- P. Comon, 1994 (defined the term ICA)
- A. Bell et al., 1995 (infomax)
7ICA-Based BSS
No a priori information (unsupervised adaptive filtering)
[Figure: Speaker 1 (Source 1, "Good Morning!") and Speaker 2 (Source 2, "Hello!") are picked up by Microphone 1 and Microphone 2 as Observed signal 1 and Observed signal 2.]
8BSS for Instantaneous mixture
Linear mixing process: Observed = Mixing Matrix x Source
Separation process: Separated = Unmixing Matrix x Observed
The unmixing matrix is optimized.
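A minimal sketch of this model, assuming two sources and a made-up 2x2 real-valued mixing matrix. The names `s`, `x`, `y`, `A`, `W` and all numeric values are illustrative and not taken from the slides. Here the unmixing matrix is simply the inverse of the known mixing matrix; in actual BSS the mixing matrix is unknown and the unmixing matrix must be optimized from the observations alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical source signals (rows), 1000 samples each.
s = rng.laplace(size=(2, 1000))

# Hypothetical real-valued mixing matrix (unknown in a real BSS problem).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Linear mixing process: observed = mixing matrix x source.
x = A @ s

# Separation process: separated = unmixing matrix x observed.
# The ideal unmixing matrix is used here only to illustrate the model.
W = np.linalg.inv(A)
y = W @ x

# True when the separated signals equal the sources up to rounding error.
recovered = np.allclose(y, s)
```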
9Various Criteria for ICA
Separated Signal
- Decorrelation
- To minimize correlation among signals in multiple time durations
- Nonlinear function 1
- To minimize higher-order correlation
- Nonlinear function 2
- To assume the p.d.f. of sources
Sigmoid function
10Cost Function for Nonlinear Function 2
Kullback-Leibler (KL) divergence between
and
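A common formulation of this cost is given here as an assumption, with notation that is not from the slides. It takes the divergence between the joint p.d.f. of the separated signals, written as the vector y = (y_1, ..., y_L)^T, and the product of their marginal p.d.f.s:

```latex
\mathrm{KL}(\mathbf{W})
  = \int p(\mathbf{y}) \,
    \log \frac{p(\mathbf{y})}{\prod_{l=1}^{L} p(y_l)} \, d\mathbf{y}
```

In this form the divergence is non-negative and becomes zero only when the separated signals are mutually independent.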
11Derivation for Nonlinear Function 2
To update the unmixing matrix along the negative gradient of the cost function
Nonlinear Function 2 → To be diagonalized
The nonlinear function can be approximated by a sigmoid function for speech signals.
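The update can be sketched as follows, under stated assumptions: a real-valued instantaneous mixture, `tanh` as the sigmoid-type nonlinearity, and the widely used natural-gradient form W ← W + step · (I − ⟨φ(y) yᵀ⟩) W. The function name, step size, iteration count and test signals are all hypothetical, and this is not necessarily the exact rule on the slide.

```python
import numpy as np

def natural_gradient_ica(x, step=0.1, n_iter=500):
    """Estimate an unmixing matrix for observations x (channels x samples)."""
    n_ch, n_samples = x.shape
    W = np.eye(n_ch)
    identity = np.eye(n_ch)
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y)                    # sigmoid-type nonlinearity
        corr = (phi @ y.T) / n_samples      # <phi(y) y^T>, driven toward the identity
        W = W + step * (identity - corr) @ W
    return W

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))            # super-Gaussian stand-ins for speech
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                  # hypothetical mixing matrix
x = A @ s

W = natural_gradient_ica(x)

# Global matrix W A: close to a scaled permutation when separation succeeds.
G = np.abs(W @ A)
G_sorted = np.sort(G, axis=1)
crosstalk = G_sorted[:, 0] / G_sorted[:, 1]  # residual-to-dominant ratio per output
```

The off-diagonal entries of `corr` vanish at convergence, which is the sense in which the nonlinear correlation matrix is diagonalized.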
12Why Instantaneous Mixture Model?
- ICA for the instantaneous mixture model
- The mixing matrix is represented as real-valued.
- Time delays among microphones and room reflections are not taken into account.
Is it applicable to sound signals in a real acoustic environment?
NO!
It is a useful example for showing how ICA works, but it is only a mathematical toy model.
13ICA for Convolutive Mixture Model (1)
- In application to a microphone array, each received signal has a time delay which corresponds to the sound direction and the position of each element.
- The mixing matrix A is not a simple scalar-valued coefficient matrix, but is represented as convolution filters.
Mixing Process in Real Environment
Mixing Matrix (FIR-Filter)
Source
Observed
→ We should use FIR filters for the separation process.
14ICA for Convolutive Mixture Model (2)
- To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
Mixing Process in Frequency Domain
Complex-Valued Mixing Matrix
Source
Observed
→ We only need to solve the complex-valued instantaneous mixture in each subband independently.
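The following sketch illustrates why the frequency transform turns a convolutive mixture into one complex-valued instantaneous mixture per frequency bin. All names, sizes and filter values are made up for illustration. Circular convolution over a single frame is used so that the identity is exact; with short-time analysis of linearly convolved signals it holds only approximately, and better when the frame is long relative to the mixing filters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft = 64      # frame length, equal to the number of frequency bins
n_taps = 8      # hypothetical mixing-filter length

s = rng.standard_normal((2, n_fft))        # one frame of two sources
a = rng.standard_normal((2, 2, n_taps))    # a[j, i]: FIR filter from source i to microphone j

# Time domain: each microphone observes a sum of circularly filtered sources.
x = np.zeros((2, n_fft))
for j in range(2):
    for i in range(2):
        for k in range(n_taps):
            x[j] += a[j, i, k] * np.roll(s[i], k)

# Frequency domain: one complex 2x2 mixing matrix per bin.
S = np.fft.fft(s, axis=1)                  # shape (2, n_fft)
A = np.fft.fft(a, n=n_fft, axis=2)         # shape (2, 2, n_fft)
X = np.einsum('jif,if->jf', A, S)          # X(f) = A(f) S(f) in every bin

# True when the transformed observations match the per-bin products.
same = np.allclose(np.fft.fft(x, axis=1), X)
```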
15Permutation Problem in FDICA
ICA is conducted in each subband independently.
Ordering and scaling of the outputs are arbitrary in ICA.
Permutation and Gain Indeterminacy
16Solutions for Permutation Problem
- To use correlation among output waveforms (Murata et al., 1998)
- To use the directivity pattern of W (Kurita and Saruwatari, 2000)
- To use correlation among unmixing matrices in neighboring frequency bins (Parra et al., 2000; Asano et al., 2001)
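As a rough illustration of the first idea only, the sketch below aligns two-source FDICA outputs by comparing amplitude envelopes across frequency bins. It is a simplified stand-in written for this example, not the algorithm of any of the cited authors. The function name, array layout and synthetic data are assumptions.

```python
import numpy as np

def align_permutation(Y):
    """Y: complex array (n_bins, 2, n_frames) of per-bin separated outputs.
    Returns a copy whose output order is made consistent across bins."""
    Y = Y.copy()
    ref = np.abs(Y[0])                                   # running reference envelopes
    for f in range(1, Y.shape[0]):
        env = np.abs(Y[f])
        c = np.corrcoef(np.vstack([ref, env]))[:2, 2:]   # reference vs. current bin
        if c[0, 1] + c[1, 0] > c[0, 0] + c[1, 1]:
            Y[f] = Y[f, ::-1].copy()                     # swap the two outputs
            env = env[::-1]
        ref = ref + env
    return Y

rng = np.random.default_rng(0)
n_bins, n_frames = 16, 200
e = rng.random((2, n_frames))                            # hypothetical source envelopes
phase = np.exp(2j * np.pi * rng.random((n_bins, 2, n_frames)))
Y_true = e[None, :, :] * phase

# Swap the outputs in a random subset of bins to imitate the permutation problem.
swap = rng.random(n_bins) < 0.5
Y_mixed = Y_true.copy()
Y_mixed[swap] = Y_mixed[swap][:, ::-1]

Y_aligned = align_permutation(Y_mixed)

# For each bin, find which source envelope output 0 follows.
owner = [int(np.argmax([np.corrcoef(np.abs(Y_aligned[f, 0]), e[i])[0, 1]
                        for i in range(2)]))
         for f in range(n_bins)]
consistent = len(set(owner)) == 1
```

The gain ambiguity is left untouched here; only the ordering is made consistent.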
17Problem in ICA-Based BSS
Separation performance degrades significantly under reverberant conditions.
Why?
Reverberation in a typical room: 2400-tap FIR filter at 8 kHz sampling
→ The number of parameters is too large.
It is necessary to achieve a BSS method that is robust against reverberation.
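The filter length quoted above can be checked with simple arithmetic: 2400 taps at 8 kHz span 0.3 s, which matches the 300 ms reverberation time used in the experimental setup. The total coefficient count assumes, for illustration only, two sources and two microphones with one FIR filter per path.

```python
sampling_rate_hz = 8000
taps = 2400

# Duration covered by the FIR filter, in seconds.
duration_s = taps / sampling_rate_hz

# Assumption for illustration: 2 sources x 2 microphones, one filter per path.
n_sources, n_mics = 2, 2
n_coefficients = n_sources * n_mics * taps
```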
18Conventional Approaches
- Frequency-Domain ICA (FDICA)
- To estimate the separation coefficients for every frequency bin in the frequency domain
- Time-Domain ICA (TDICA)
- To estimate the separation FIR filter in the time domain
19FDICA-Based BSS
20Performance of FDICA
- In conventional dereverberation processing, dereverberation performance improves as the number of subbands (filter length) is increased.
<Speculation>
In FDICA, does source-separation performance also improve as the number of subbands is increased?
To investigate the relation between the number of subbands and source-separation performance
21Experimental Setup
- Interelement spacing is 4 cm.
- Reverberation time is 300 ms.
- Sound source: speech of 2 male and 2 female speakers from the ASJ corpus (12 combinations)
- Evaluation: Noise Reduction Rate = Output SNR [dB] - Input SNR [dB]
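The noise reduction rate is the output SNR minus the input SNR, both in dB. A minimal sketch, assuming the target and interference components can be measured separately at the input and at the output (function names and test signals are hypothetical):

```python
import numpy as np

def snr_db(target, interference):
    """Signal-to-noise ratio in dB from separate target and interference components."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

def noise_reduction_rate(target_in, interf_in, target_out, interf_out):
    """Output SNR [dB] minus input SNR [dB]."""
    return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interference = rng.standard_normal(8000)

# Hypothetical separator: target untouched, interference amplitude reduced tenfold.
nrr = noise_reduction_rate(target, interference, target, 0.1 * interference)
```

A tenfold amplitude reduction of the interference corresponds to 20 dB, so `nrr` is 20 dB in this example.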
22FDICA Algorithm
- Iterative learning algorithm (Amari, 1996)
- The update rule includes a step-size parameter.
23Relation between Number of Subbands and
Separation Performance
24Narrow-Band Signals
[Figure: real part of the narrow-band signals at 1 kHz (Signal 1 and Signal 2), shown for 32 subbands and for 2048 subbands.]
25Relation between Number of Subbands and
Correlation
Higher correlation
26Trade-off Relation between Independence and
Robustness against Reverberation
- Small number of subbands: high independence, but poor against reverberation
- Large number of subbands: low independence, but robust against reverberation
[Figure: Noise Reduction Rate versus Number of Subbands]
27TDICA-Based BSS
28Performance of TDICA
- In conventional dereverberation processing, dereverberation performance improves as the filter length is increased.
<Speculation>
In TDICA, does source-separation performance also improve as the filter length is increased?
To investigate the relation between the filter length and source-separation performance
29TDICA Algorithm
minimize
where
30TDICA Algorithm (Contd)
- Iterative equation of separation filter
- Natural gradient of Q(w(z))
where a is the step-size parameter
31Proposed Iterative Equation of TDICA
- Iterative equation of separation filter (TDICA1)
32Results of TDICA1 and TDICA2
TDICA 1 (only correlation of same time)
TDICA 2 (correlation of different time)
33Problems and Solution
FDICA
- Advantages
- To simplify the convolutive mixture down to instantaneous mixtures
- Easy to converge the separation filter with high stability
- Disadvantages
- Independence assumption collapses in each narrow band.
- Separation performance saturates before reaching a sufficient level.
TDICA
- Advantages
- To treat the fullband speech signals, where the independence assumption of sources usually holds
- High convergence possibility near the optimal point
- Disadvantages
- Iterative rule for filter learning is complicated.
- Convergence degrades under reverberant conditions.
34Separation Procedure of MSICA
Mixing system → FDICA → TDICA
- Separated signals of FDICA are regarded as the input signals for TDICA.
- Residual cross-talk components of FDICA can be removed by TDICA.
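The cascade structure can be sketched as below. The two stages are deliberately simple stand-ins (fixed matrices on an instantaneous mixture), not real FDICA or TDICA. They only show that the first-stage outputs, which still contain cross-talk, become the inputs of a second stage that removes the residual. All names and values are hypothetical.

```python
import numpy as np

def msica(x, fdica_stage, tdica_stage):
    """Cascade: the first-stage outputs become the second-stage inputs."""
    y_first = fdica_stage(x)
    y_second = tdica_stage(y_first)
    return y_first, y_second

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])          # hypothetical mixing matrix
x = A @ s

# Stand-in stages: an imperfect unmixing matrix that leaves residual
# cross-talk, followed by a correction of that residual.
W_first = np.linalg.inv(A) + 0.05
W_second = np.linalg.inv(W_first @ A)

y_first, y_second = msica(x, lambda v: W_first @ v, lambda v: W_second @ v)

# Largest deviation from the true sources after each stage.
err_first = np.max(np.abs(y_first - s))
err_second = np.max(np.abs(y_second - s))
```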
35Comparison among Each ICA
- To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
- Comparison among separation performances of TDICA, FDICA, and MSICA
36Relation between Filter Length and Separation
Performance
9.4
37Comparison among Each ICA
- To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
- Comparison among separation performances of TDICA, FDICA, and MSICA
38Comparison Results
39Spectral Distortion (TDICA)
10 taps No whitening, 1000 taps whitening
40Spectral Distortion (FDICA, MSICA)
FDICA No whitening, MSICA No whitening
41Sound Demonstration of MSICA
- Reverberation time: 300 ms
- Mixed speech (Female + Male)
- Separated speech (Female)
- Separated speech (Male)
Female
Male
40
-30
42Conclusion
- Disadvantages of FDICA and TDICA
- Separation performance of FDICA saturates before reaching a sufficient level.
- Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.
- MSICA combining FDICA and TDICA
- In the TDICA part of MSICA, separation performance is improved even with a long filter.
- Separation performance of MSICA is superior to those of TDICA and FDICA.
43Future Work
- Further evaluation in real environment
- Robustness under reverberant conditions
- Larger array with more than 2 elements
- To apply BSS to a speech recognition system
- Improvement of convergence speed
- On-line and real-time algorithm
44ICA and BSS Where do we go?
We should go to NARA-city!
Why?
45 4th International Symposium on ICA and BSS (ICA2003), in NARA
- Date: April 1-4, 2003
- Place: Nara, JAPAN
- Scientific Areas
- ICA and Factor Analysis, PCA etc.
- Blind source separation
- Blind and semi-blind equalization and deconvolution
- Blind identification
- Any signal processing application related to ICA
- URL: http://ica2003.jp/
46Analysis Conditions
TDICA, TDICA part in MSICA
- Filter length: 10-2000 taps
- Number of blocks B (block length = data length / B): 1-10
- ICA iterations: 500
FDICA part in MSICA
- Number of subbands: 1024 points
- Frame shift: 16 points
- ICA iterations: 30
47Analysis Conditions
TDICA, TDICA part in MSICA
- Filter length: 10 taps (TDICA), 1000 taps (MSICA)
- Number of blocks B: 3 (TDICA), 9 (MSICA)
- ICA iterations: 500
FDICA, FDICA part in MSICA
- Number of subbands: 1024 points
- Frame shift: 16 points
- ICA iterations: 30