Title: Blind Separation of Speech Mixtures
1 Blind Separation of Speech Mixtures
- Vaninirappuputhenpurayil Gopalan REJU
- School of Electrical and Electronic Engineering
- Nanyang Technological University
2 Introduction
Blind Source Separation
Convolutive
[Figure: sources s1 and s2]
3 Introduction
Convolutive Blind Source Separation
Instantaneous Blind Source Separation
4 Introduction
Convolutive Blind Source Separation: difficult to separate
Instantaneous Blind Source Separation: easy to separate
5 Introduction
Overdetermined mixing: No. of sources < No. of sensors. Easy to separate.
Determined mixing: No. of sources = No. of sensors.
Underdetermined mixing: No. of sources > No. of sensors. Difficult to separate.
6 Approaches for BSS of Speech Signals
Types of mixing
Instantaneous mixing
Convolutive mixing
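7 Approaches for BSS of Speech Signals
Instantaneous mixing
Step 1: Selection of cost function
Step 2: Minimization or maximization of the cost function
[Figure: sources S1, S2 pass through the mixing system H to give mixtures X1, X2; the unmixing system W produces outputs Y1, Y2. Separated?]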
8 Approaches for BSS of Speech Signals
Instantaneous mixing
Selection of cost function
- Statistical independence: signals from two different sources are independent
- Information theoretic
- Non-Gaussianity
  - Central limit theorem: a mixture of two or more sources will be more Gaussian than its individual components
  - Non-Gaussianity measures: kurtosis, negentropy
- Nonlinear cross moments
- Temporal structure of speech
- Non-stationarity of speech
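The central limit theorem argument can be checked numerically. The sketch below is only an illustration: Laplacian noise is an assumed stand-in for speech-like (super-Gaussian) sources, and the mixing weights 0.6 and 0.8 are arbitrary. It compares the excess kurtosis of each source with that of one mixture.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis E[z^4] - 3 of the standardized signal (0 for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
n = 200_000
s1 = rng.laplace(size=n)   # assumed super-Gaussian stand-in for a speech source
s2 = rng.laplace(size=n)
x = 0.6 * s1 + 0.8 * s2    # one instantaneous mixture (arbitrary weights)

k_s1, k_s2, k_x = (excess_kurtosis(v) for v in (s1, s2, x))
print(f"sources: {k_s1:.2f}, {k_s2:.2f}  mixture: {k_x:.2f}")
```

The mixture's kurtosis lies closer to zero (the Gaussian value) than that of either source, which is why maximizing non-Gaussianity of the outputs can serve as a separation criterion.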
9 Approaches for BSS of Speech Signals
Instantaneous mixing
Minimization or maximization of the cost function
- Simple gradient method
- Natural gradient method, e.g. Infomax ICA algorithm
- Newton's method, e.g. FastICA
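As an illustration of the Newton-type option, here is a minimal sketch of a deflationary fixed-point ICA iteration with the kurtosis nonlinearity, in the spirit of FastICA. It is not the reference FastICA implementation; the mixing matrix `a`, the Laplacian sources and all function names are assumptions made for the example.

```python
import numpy as np

def whiten(x):
    """Center x (channels x samples) and decorrelate it to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    v = e @ np.diag(d ** -0.5) @ e.T      # whitening matrix
    return v @ x, v

def fastica_kurtosis(z, n_iter=200, tol=1e-10, seed=0):
    """Deflationary fixed-point ICA with the kurtosis nonlinearity g(u) = u^3."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    w_all = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = w @ z
            w_new = (z * u ** 3).mean(axis=1) - 3.0 * w   # fixed-point update
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)    # remove found directions
            w_new /= np.linalg.norm(w_new)
            converged = 1.0 - abs(w_new @ w) < tol
            w = w_new
            if converged:
                break
        w_all[i] = w
    return w_all

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 50_000))         # assumed super-Gaussian sources
a = np.array([[1.0, 0.6], [0.5, 1.0]])    # hypothetical mixing matrix H
x = a @ s
z, v = whiten(x)
w = fastica_kurtosis(z)
g = w @ v @ a                             # global system: ideally a scaled permutation
print(np.round(g, 3))
```

Each row of `g` should have a single dominant entry, which reflects the usual scaling and ordering ambiguity of ICA.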
10 Approaches for BSS of Speech Signals
Convolutive Mixing
Time Domain
- Advantage: no permutation problem
- Disadvantages: slow convergence; high computational cost for long filter taps
Frequency Domain
- Advantages: low computational cost; fast convergence
- Disadvantage: permutation problem
[Figure: sources S1, S2 pass through the mixing system H to give mixtures X1, X2; the unmixing system W outputs either (Y1, Y2) or (Y2, Y1).]
11 Permutation Problem in Frequency Domain BSS
[Figure: the mixed signals x1, x2, x3 pass through a K-point FFT; an instantaneous ICA algorithm (BSS) is applied to each frequency bin f1, f2, ..., fk; after solving the permutation problem, a K-point IFFT gives the separated signals y1, y2, y3. Due to the permutation problem, the bin-wise outputs assigned to one signal (e.g. y3) correspond to different sources, so the signals are still mixed.]
12 Motivation
BSS
- Determined/Overdetermined (mixtures ≥ sources)
  - Instantaneous
  - Convolutive
    - Frequency domain: frequency bin-wise separation, permutation problem
    - Time domain
- Underdetermined (mixtures < sources)
  - Instantaneous: mixing matrix estimation, source estimation
  - Convolutive
    - Frequency domain: frequency bin-wise separation, permutation problem
    - Time domain
  - Automatic detection of no. of sources
13 My Contribution - I
[Figure: the BSS classification tree of the Motivation slide, repeated.]
14 Algorithm for Solving the Permutation Problem
[Figure: the frequency-domain BSS pipeline of slide 11 (K-point FFT, instantaneous ICA in each frequency bin f1, f2, ..., fk, solving the permutation problem, K-point IFFT), from mixed signals x1, x2, x3 to separated signals y1, y2, y3, shown with the permutation problem and with the permutation problem solved.]
15 Existing Method for Solving the Permutation Problem
Direction Of Arrival (DOA) method
Direction of y1: -30°; direction of y2: 20°
Position of the pth sensor
Velocity of sound
16 Existing Method for Solving the Permutation Problem
Direction Of Arrival (DOA) method
- Disadvantages
  - Fails at lower frequencies.
  - Fails when sources are near.
  - Room reverberation.
  - Sensor positions must be known.
- Reasons for failure at lower frequencies
  - Lower spacing causes error in phase difference measurement.
  - The relation is approximated for a plane wave front under anechoic conditions.
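17 Existing Method for Solving the Permutation Problem
Adjacent bands correlation method
[Figure: the frequency-domain BSS pipeline (K-point FFT, bin-wise BSS at f1, f2, ..., fk, solving the permutation problem, K-point IFFT) from mixed signals x1, x2, x3 to separated signals y1, y2, y3. One pair of adjacent-bin outputs is marked "High correlation" and two pairs are marked "Low correlation".]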
18 Existing Method for Solving the Permutation Problem
Adjacent bands correlation method
[Figure: outputs s1 and s2 across the bins K-1, K, K+1, K+2, K+3, ..., with the correlation matrix [r11 r12; r21 r22] between adjacent bins (r11, r22 within the same output; r12, r21 across outputs). Two examples are shown, with confidence and without confidence, with the outcomes "Change permutation" and "No change".]
19 Existing Method for Solving the Permutation Problem
Adjacent bands correlation method
[Figure: the adjacent-bin correlation diagram of slide 18, repeated.]
Disadvantage: the method is not robust, since a wrong decision at one bin can propagate to the bins that follow.
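A minimal sketch of the adjacent bands correlation idea for two outputs follows. It assumes the bin-wise magnitude envelopes are already available; the synthetic envelopes, the noise level and the function name `align_adjacent_bins` are illustrative assumptions, and no confidence check is included.

```python
import numpy as np

def align_adjacent_bins(env):
    """Align a 2-output permutation bin by bin using envelope correlation.

    env: array (K, 2, T) of magnitude envelopes |Y_i(k, t)| after bin-wise ICA.
    Returns a boolean array of length K, True where the outputs were swapped.
    """
    env = env.copy()
    swapped = np.zeros(env.shape[0], dtype=bool)
    for k in range(1, env.shape[0]):
        # 2x2 correlation matrix between bin k-1 (already aligned) and bin k
        r = np.corrcoef(env[k - 1], env[k])[:2, 2:]
        if r[0, 1] + r[1, 0] > r[0, 0] + r[1, 1]:
            env[k] = env[k, ::-1].copy()   # change permutation
            swapped[k] = True
    return swapped

rng = np.random.default_rng(0)
K, T = 64, 400
# Two hypothetical smoothed source envelopes shared by all bins
base = np.abs(rng.standard_normal((2, T)))
base = np.stack([np.convolve(b, np.ones(10) / 10, "same") for b in base])
env = base[None] + 0.1 * rng.random((K, 2, T))

# Randomly permute the outputs in about half of the bins (bin 0 is the reference)
truth = rng.random(K) < 0.5
truth[0] = False
env[truth] = env[truth][:, ::-1]

swapped = align_adjacent_bins(env)
print(np.array_equal(swapped, truth))
```

Because every decision is made relative to the previous bin, a single wrong decision would flip all the bins after it, which is the robustness issue noted above.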
20 Existing Method for Solving the Permutation Problem
Combination of DOA and correlation methods
DOA + Harmonic correlation + Adjacent bands correlation
Advantage: increased robustness
21 Proposed Algorithm: Partial Separation Method (Parallel Configuration)
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals," Neurocomputing, Vol. 71, No. 10-12, June 2008, pp. 2098-2112.
Time domain stage
Frequency domain stage
22 Partial Separation Method (Parallel Configuration)
Time domain stage
Frequency domain stage
23 Partial Separation Method (Cascade Configuration)
Parallel configuration
Frequency domain stage
Time domain stage
24 Advantages of Partial Separation Method
25 Comparison with Adjacent Bands Correlation Method
26 Comparison with DOA Method
- PS: partial separation method with confidence check
- C1: correlation between the adjacent bins without confidence check
- C2: correlation between adjacent bins with confidence check
- Ha: correlation between the harmonic components with confidence check
- PS1: partial separation method alone, without confidence check
27 My Contribution - II
[Figure: the BSS classification tree of the Motivation slide, repeated.]
28 Underdetermined Blind Source Separation of Instantaneous Mixtures
29 Mathematical Representation of Instantaneous Mixing
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "An algorithm for mixing matrix estimation in instantaneous blind source separation," Signal Processing, Vol. 89, Issue 9, September 2009, pp. 1762-1773.
Time domain
P: No. of mixtures, Q: No. of sources
Time-Frequency domain
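For orientation, the standard instantaneous mixing model consistent with this legend can be written as below. The symbols A, s and S are the usual textbook notation and are assumed here; only x, X(k, t), P and Q appear elsewhere in these slides.

```latex
% Time domain: P mixtures, Q sources, real mixing matrix A (assumed notation)
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{x}(t) \in \mathbb{R}^{P},\ \mathbf{s}(t) \in \mathbb{R}^{Q},\
\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_Q] \in \mathbb{R}^{P \times Q}

% Time-frequency domain (after the STFT), frequency bin k and time frame t
\mathbf{X}(k,t) = \mathbf{A}\,\mathbf{S}(k,t) = \sum_{q=1}^{Q} \mathbf{a}_q\, S_q(k,t)
```

Since the STFT is linear and A does not depend on time, the same matrix A appears in both domains.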
30 Single Source Points in Time-Frequency domain
Single source point 1
Single source point 2
[Equations: at each single source point the term of the other source is 0.]
31 Single Source Points in Time-Frequency domain
Single source point 1
Single source point 2
32 Single Source Points in Time-Frequency domain
Single source point 1
Single source point 2
[Equations: the source terms at each point are scalars, leading (∴) to one relation at single source point 1 and one at single source point 2.]
33 Scatter Diagram of the Mixtures When Sources Are Perfectly Sparse
[Figure: example with zero-valued source samples and the resulting scatter diagram]
34 Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse
[Figure: example with zero-valued source samples and the resulting scatter diagram]
35 Scatter Diagram of the Mixtures When Sources Are Sparse
No. of sources: 6, No. of mixtures: 2
36 Scatter Diagram of the Mixtures When Sources Are Sparse, After Clustering
No. of sources: 6, No. of mixtures: 2
37 Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse
Objective: estimation of the single source points.
No. of sources: 6, No. of mixtures: 2
38 Principle of the Proposed Algorithm for the Detection of Single Source Points
Single source point 1
Single source point 2
Multi source point
[Equations: scalar factors at the single source points]
39 Principle of the Proposed Algorithm for the Detection of Single Source Points
Single source point 1
Single source point 2
Multi source point
[Equations: scalar factors at the single source points]
40 Principle of the Proposed Algorithm for the Detection of Single Source Points
Average of 15 pairs of speech utterances of length 10 s each
[Figure: curves for SSP and MSP]
41 Proposed Algorithm for the Detection of Single Source Points
[Figure: SSP and MSP]
42 Elimination of Outliers
SSPs detection
Clustering
Outlier elimination
43 Experimental Results
No. of mixtures: 2, No. of sources: 6
44 Detected Single Source Points, Three Mixtures
No. of mixtures: 3, No. of sources: 6
45 Comparison with Classical Algorithms for Determined Case
Average of 500 experimental results
No. of mixtures: 2, No. of sources: 2
46 Comparison with Method Proposed in [1], Underdetermined Case
Normalized mean square error (NMSE) in mixing matrix estimation (dB)
P: No. of mixtures, Q: No. of sources
Order of the mixing matrices (P×Q)
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, "Underdetermined blind source separation based on sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 423-437, Feb. 2006.
47 Advantages of the Proposed Algorithm
1) Much simpler constraint: the algorithm does not require a single source zone.
2) Separation performance is better.
3) The algorithm is extremely simple but effective:
Step 1: Convert x in the time domain to the TF domain to get X.
Step 2: Check the condition.
Step 3: If the condition is satisfied, then X(k, t) is a sample at the SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 to 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained.
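Steps 2 to 4 can be sketched as below. The condition itself is not legible on the slide, so this example assumes the test from the cited Signal Processing paper as I understand it: a point is kept when the real and imaginary parts of X(k, t) point along the same direction to within a small angle. The threshold `dtheta_deg`, the mixing columns `a1`, `a2` and the function name are hypothetical.

```python
import numpy as np

def detect_ssp(X, dtheta_deg=0.8):
    """Flag time-frequency points whose real and imaginary parts are parallel.

    X: complex array (P, N), one column per time-frequency point X(k, t).
    Returns a boolean mask of length N (True = kept as a single source point).
    Points with a zero real or imaginary part are discarded in this sketch.
    """
    re, im = X.real, X.imag
    num = np.abs(np.sum(re * im, axis=0))
    den = np.linalg.norm(re, axis=0) * np.linalg.norm(im, axis=0)
    cos_angle = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return cos_angle > np.cos(np.deg2rad(dtheta_deg))

# Hypothetical real mixing columns for P = 2 mixtures
a1, a2 = np.array([1.0, 0.3]), np.array([0.5, 1.0])
X = np.stack(
    [
        a1 * (2 + 1j),                    # only source 1 active
        a1 * (2 + 1j) + a2 * (1 - 3j),    # both sources active
        a2 * (-1 + 0.5j),                 # only source 2 active
    ],
    axis=1,
)
mask = detect_ssp(X)
print(mask)   # first and third points are kept, the second is discarded
```

The kept columns of `X.real` are parallel to `a1` or `a2`, so clustering their directions gives estimates of the mixing matrix columns.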
48 My Contributions III, IV and V
[Figure: the BSS classification tree of the Motivation slide, repeated.]
49 Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking," IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan. 2010, pp. 101-116.
[Figure: the signals from Mic 1 to Mic P are each transformed by an STFT to give the mixtures in the TF domain; a mask estimation block derives the masks, which are applied to the mixtures to give the separated signals in the TF domain.]
50 Mathematical Representation
Time domain
P: No. of mixtures, Q: No. of sources
Frequency domain
51 Single Source Points
Instantaneous mixing
Single source point 1
Single source point 2
[Equation labels: real mixing vectors, multiplied by real scalars]
Convolutive mixing
Single source point 1
Single source point 2
[Equation labels: complex mixing vectors, multiplied by complex scalars]
52 Basic Principle of Single Source Points Detection
Convolutive mixing
Single source point 1
Single source point 2
[Equation labels: complex mixing vectors, multiplied by complex scalars]
The Hermitian angle between the complex vectors u1 and u2 will remain the same even if the vectors are multiplied by any complex scalars, whereas the pseudo angle will change.
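This invariance can be verified numerically. The sketch below assumes the usual definitions: the complex cosine u1^H u2 / (||u1|| ||u2||) = ρ e^{jφ}, with Hermitian angle arccos(ρ) and pseudo angle φ. The vectors and scalars are arbitrary example values.

```python
import numpy as np

def complex_cosine(u1, u2):
    """cos(theta_C) = u1^H u2 / (||u1|| ||u2||), a complex number rho * exp(j*phi)."""
    return np.vdot(u1, u2) / (np.linalg.norm(u1) * np.linalg.norm(u2))

def hermitian_angle(u1, u2):
    """theta_H = arccos(|cos(theta_C)|), a value in [0, pi/2]."""
    return np.arccos(np.clip(np.abs(complex_cosine(u1, u2)), 0.0, 1.0))

def pseudo_angle(u1, u2):
    """phi = arg(cos(theta_C)), the phase of the complex cosine."""
    return np.angle(complex_cosine(u1, u2))

u1 = np.array([1 + 2j, 0.5 - 1j])
u2 = np.array([0.3 + 0.1j, 2 - 0.7j])
c1, c2 = 2.0 * np.exp(0.9j), 0.5 * np.exp(-0.4j)   # arbitrary complex scalars

h_before, h_after = hermitian_angle(u1, u2), hermitian_angle(c1 * u1, c2 * u2)
p_before, p_after = pseudo_angle(u1, u2), pseudo_angle(c1 * u1, c2 * u2)
print(h_before, h_after)   # equal up to rounding
print(p_before, p_after)   # differ by arg(c2) - arg(c1)
```

In the convolutive case the unknown source value at a single source point is exactly such a complex scalar, which is why a detection test built on the Hermitian angle is unaffected by it.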
53 Algorithm for Single Source Points Detection
[Figure: detection flow using the Hermitian angles θH1 and θH2, with the two tests combined by OR.]
54 Mask Estimation by k-means (KM)
[Figure: clean and estimated masks]
55 Mask Estimation by Fuzzy c-means (FCM)
[Figure: clean and estimated masks]
56 Automatic Detection of Number of Sources
Cluster validation technique
For c = 2 to cmax
  Cluster the data into c clusters.
  Calculate the cluster validation index.
End
Take c corresponding to the best clustering as the number of sources.
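The loop above can be sketched as follows. The slide does not name the clustering method or the validation index, so this example assumes plain k-means and the mean silhouette width; the three synthetic blobs stand in for feature vectors of three sources, and all function names are hypothetical.

```python
import numpy as np

def kmeans(x, c, n_iter=100):
    """Plain k-means with farthest-point initialisation. Returns the labels."""
    centers = [x[0]]
    for _ in range(1, c):
        d = np.min([np.linalg.norm(x - m, axis=1) for m in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(x, labels):
    """Mean silhouette width, used here as the cluster validation index."""
    dist = np.linalg.norm(x[:, None] - x[None], axis=2)
    ids = np.unique(labels)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:          # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, labels == j].mean() for j in ids if j != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def estimate_num_sources(x, c_max):
    """Try c = 2..c_max and return the c with the best validation index."""
    scores = {c: silhouette(x, kmeans(x, c)) for c in range(2, c_max + 1)}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])   # three "sources"
x = np.concatenate([m + 0.3 * rng.standard_normal((40, 2)) for m in blob_centers])
best_c, scores = estimate_num_sources(x, c_max=5)
print(best_c, {c: round(s, 2) for c, s in scores.items()})
```

Other validation indices could be substituted for `silhouette` without changing the surrounding loop.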
57 Elimination of Low Energy Points