Title: Blind Separation of Acoustic Signals
1 Blind Separation of Acoustic Signals
- Scott C. Douglas
- Southern Methodist University, Dallas TX, USA
Presented by - Praharshana Perera
2 Cocktail Party Problem: Convolutive Mixtures
[Figure: source signals s1(k), ..., sm(k) are mixed into sensor signals x1(k), ..., xn(k)]
3 Convolutive Blind Source Separation Task
[Figure: block diagram of the unknown mixing system followed by the separation system]
- The task is "blind" because
- The source signals are not observed
- No information is available about the mixture
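4 Unknown Mixing System
- Source signal vector sequence: $\mathbf{s}(k) = [s_1(k), \ldots, s_m(k)]^T$
- These signals pass through an (n x m) linear time-invariant system with matrix impulse response $\mathbf{A}_l$
- The measured signal vector sequence: $\mathbf{x}(k) = [x_1(k), \ldots, x_n(k)]^T$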
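5 Unknown Mixing System
- The mixing model is convolutive: $\mathbf{x}(k) = \mathbf{A}_l * \mathbf{s}(k)$ ($*$ is the convolution operator)
- Causality of the system model is assumed: $\mathbf{A}_l = \mathbf{0}$ for $l < 0$
- No broadband noise is present, so that $\mathbf{x}(k) = \sum_{l=0}^{\infty} \mathbf{A}_l\, \mathbf{s}(k-l)$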
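The causal, noise-free convolutive model can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy, a finite number of mixing taps, and source samples equal to zero before k = 0. The function name `convolutive_mix` and the array layout are hypothetical choices made for this sketch.

```python
import numpy as np

def convolutive_mix(A, s):
    """Mix sources through a causal FIR matrix system (hypothetical helper).

    A: array of shape (taps, n, m), A[l] is the mixing matrix at lag l.
    s: array of shape (m, N), one row per source signal.
    Returns x of shape (n, N) with x(k) = sum_l A[l] @ s(k - l).
    """
    taps, n, m = A.shape
    N = s.shape[1]
    x = np.zeros((n, N))
    # Samples before k = 0 are assumed to be zero.
    for l in range(min(taps, N)):
        x[:, l:] += A[l] @ s[:, :N - l]
    return x
```

With a single tap (`taps = 1`) the function reduces to instantaneous mixing, x(k) = A s(k).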
6 Separation System
- Each source can be uniquely extracted from the sensor measurements
- The sequence of (m x n) matrices $\mathbf{B}_l$ describes the separation system: $\mathbf{y}(k) = \sum_{l=0}^{\infty} \mathbf{B}_l\, \mathbf{x}(k-l)$
- $\mathbf{y}(k)$ contains the estimates of the individual sources
- Causality of the separation system is assumed
- The number of sensors n is at least the number of sources m ($n \geq m$)
7 Goal of Convolutive BSS
- Goal: Adjust the impulse response of the demixing system such that each output signal $y_i(k)$ contains a filtered version of exactly one source signal $s_j(k)$, without replacement and without any loss of information
- The assignment of sources to outputs is one-to-one and arbitrary
- Assumption
- Each $s_i(k)$ is statistically independent of each $s_j(l)$ for all $i \neq j$, all k, and all l
- For samples $s_i = s_i(k)$, the joint PDF is of the form $p_s(s_1, \ldots, s_m) = \prod_{i=1}^{m} p_{s_i}(s_i)$
8 The Simple Cocktail Party Problem: Instantaneous Blind Signal Separation
[Figure: sources s1, ..., sm pass through the mixing matrix A to give observations x1, ..., xn]
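9 Instantaneous Blind Signal Separation
- Mixing model: $\mathbf{x}(k) = \mathbf{A}\,\mathbf{s}(k)$
- Linear and instantaneous mixing
- No dispersive effects or time delays are present
- Instantaneous separation: $\mathbf{y}(k) = \mathbf{B}\,\mathbf{x}(k)$
- Solution
- Adapt the separation matrix $\mathbf{B}$ such that $\mathbf{B}\mathbf{A} = \mathbf{P}\mathbf{D}$
- where $\mathbf{P}$ is an (m x m) permutation matrix with one unity entry in each row and column, and $\mathbf{D}$ is a diagonal scaling matrix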
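One way to check numerically whether a combined matrix BA has the permutation-times-scaling form is sketched below. It is a hedged illustration, assuming NumPy; the helper name `is_scaled_permutation` and the relative tolerance are hypothetical choices.

```python
import numpy as np

def is_scaled_permutation(C, tol=1e-8):
    """Return True if C has exactly one significant entry per row and column.

    "Significant" means larger in magnitude than tol times the largest
    entry of C (an assumed, illustrative threshold).
    """
    C = np.asarray(C, dtype=float)
    if C.ndim != 2 or C.shape[0] != C.shape[1]:
        return False
    mask = np.abs(C) > tol * np.abs(C).max()
    one_per_row = np.all(mask.sum(axis=1) == 1)
    one_per_col = np.all(mask.sum(axis=0) == 1)
    return bool(one_per_row and one_per_col)
```

In practice an estimated separation matrix only approximates this form, so a looser tolerance (for example `tol=0.1`) would be more appropriate than the default.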
10 Multichannel Blind Deconvolution
- Convolutive BSS
- Attempts to enforce spatial independence of the output signals
- Blind deconvolution
- Attempts to enforce both spatial and temporal independence
- Additional assumption
- All source signals are spatially and temporally independent
- Ideal multichannel blind deconvolution
- Each $y_i(k)$ is a scaled, time-shifted version of $s_j(k)$
11 Separating Criteria for BSS
- The efficiency of BSS depends on the separation criterion that is employed
- Convolutive BSS separation criteria
- Density modeling
- Contrast functions
- Correlation based criteria
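12 Density Modeling Criteria
- Based on information theory
- Characterize the shared information in a set of signals
- Separation is achieved when no shared information can be found in the output signals
- Method
- Adjust the separation system $\mathbf{B}_l(k)$ so that the joint PDF of $\mathbf{y}(k)$, $p_y(\mathbf{y})$, is as close as possible to some model distribution $\hat{p}_y(\mathbf{y})$
- Kullback-Leibler divergence measure: $D(p_y \| \hat{p}_y) = \int p_y(\mathbf{y}) \log \frac{p_y(\mathbf{y})}{\hat{p}_y(\mathbf{y})}\, d\mathbf{y}$
- The formula can be expressed using the expectation operator: $D(p_y \| \hat{p}_y) = E\{\log p_y(\mathbf{y}(k))\} - E\{\log \hat{p}_y(\mathbf{y}(k))\}$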
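13 Density Modeling Criteria
- The choice of the model distribution depends on the assumptions about and prior knowledge of $\mathbf{s}(k)$
- If all $s_i(k)$ are identically distributed, the model can be chosen as $\hat{p}_y(\mathbf{y}) = \prod_{i=1}^{m} p_s(y_i)$, where $p_s$ is the common source PDF
- This leads to an ML estimate of the separation matrix for the given signal statistics
- The divergence depends on the PDF of the extracted sources in $\mathbf{y}(k)$, which in turn depends on the impulse response $\mathbf{B}_l$
- A cost function for $\mathbf{B}_l$ can be developed that matches the above formula up to a constant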
14 Contrast Functions
- Depend on a single extracted output
- Identify when one output $y_i(k)$ contains elements of only one source signal $s_j(k)$
- Do not require significant knowledge about the nature of the source PDF
- Define the combined system matrix $\mathbf{C} = \mathbf{B}\mathbf{A}$
- ith extracted output signal: $y_i(k) = \sum_{j=1}^{m} c_{ij}\, s_j(k)$
- A contrast function is a cost function of the distribution of $y_i(k)$
- A local maximum over all elements $c_{ij}$, $1 \le j \le m$, defines a separating solution
- The normalized kurtosis of the random variable y is a good candidate for a contrast function: $\kappa(y) = \frac{E\{y^4\}}{(E\{y^2\})^2} - 3$
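A sample estimate of the normalized kurtosis can be sketched as follows. The sketch assumes the common definition E{y^4} / (E{y^2})^2 - 3 for a zero-mean signal and uses NumPy; the function name `normalized_kurtosis` is a hypothetical choice.

```python
import numpy as np

def normalized_kurtosis(y):
    """Sample estimate of E{y^4} / (E{y^2})^2 - 3 after removing the mean."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()           # the definition assumes a zero-mean signal
    m2 = np.mean(y ** 2)       # second moment (power)
    m4 = np.mean(y ** 4)       # fourth moment
    return m4 / m2 ** 2 - 3.0
```

The value is close to zero for Gaussian samples, negative for flat distributions such as the uniform, and positive for heavy-tailed distributions such as the Laplacian.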
15 Correlation Based Criteria
- Assume that the source signals are statistically independent but temporally correlated
- such that the normalized cross-correlation matrix of the sources has m unique eigenvalues for some value of $l \neq 0$
- This condition yields a separating solution
- The normalized cross-correlation matrix of the input signals can be simplified
- By defining the eigenvalue decomposition of this matrix, the demixing matrix $\mathbf{B}$ can be calculated
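The eigenvalue-based idea can be illustrated for the instantaneous case with a whiten-then-diagonalize procedure in the style of the AMUSE method. This is a swapped-in sketch, not the formula from the slide: it assumes NumPy, as many sensors as sources, zero-mean stationary sources with distinct autocorrelations at the chosen lag, and hypothetical function names.

```python
import numpy as np

def lagged_covariance(x, lag):
    """Symmetrized sample covariance between x(k) and x(k - lag)."""
    x = x - x.mean(axis=1, keepdims=True)
    N = x.shape[1]
    R = x[:, lag:] @ x[:, :N - lag].T / (N - lag)
    return 0.5 * (R + R.T)

def second_order_separation(x, lag=1):
    """Estimate a demixing matrix B from zero-lag and lagged covariances."""
    R0 = lagged_covariance(x, 0)
    Rl = lagged_covariance(x, lag)
    d, E = np.linalg.eigh(R0)
    P = (E / np.sqrt(d)).T               # whitening matrix: P R0 P^T = I
    _, U = np.linalg.eigh(P @ Rl @ P.T)  # eigenvectors of whitened lagged covariance
    return U.T @ P                       # B such that B A is close to P D
```

The separation relies on the eigenvalues of the whitened lagged covariance being distinct; if two sources have the same autocorrelation at the chosen lag, a different lag has to be tried.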
16 Structures and Algorithms for BSS
- Filter structures
- Design issues
- Room reverberation
- Stability of the separation system
- Computational complexity
- Solution: finite impulse response (FIR) filters
- The separation equation becomes $\mathbf{y}(k) = \sum_{l=0}^{L} \mathbf{B}_l(k)\, \mathbf{x}(k-l)$
- L is the system's filter length
- $\mathbf{B}_l(k)$, $0 \le l \le L$, are the FIR filter parameters
17 Density Matching BSS using Natural Gradient Adaptation
- The algorithm is based on the cost function
- Calculate the gradient of the cost function
- Make differential updates to the filter parameters
- Problem: the gradient is difficult to calculate
- Solution: transform the update into a natural gradient adaptation procedure
- The natural gradient method alters the true gradient search direction for more efficient adaptation
18 Natural Gradient Adaptation
- The conditions on the output sequence $\mathbf{y}(k)$ corresponding to a stationary point of the update hold for $1 \le i, j \le m$ and all l
- This condition implies spatial statistical independence of the extracted output signals
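To make the natural gradient idea concrete, the sketch below implements the widely used instantaneous special case, B <- B + mu (I - f(y) y^T) B, whose stationary point satisfies E{f(y_i) y_j} = 0 for i != j. It is not the convolutive algorithm itself. The assumptions are: NumPy, equal numbers of sensors and sources, batch averaging over all samples, and a tanh nonlinearity f suited to heavy-tailed sources. The function name and default parameters are hypothetical.

```python
import numpy as np

def natural_gradient_bss(x, step=0.1, n_iter=1000):
    """Batch natural gradient separation for an instantaneous square mixture.

    x: array of shape (m, N) holding the sensor signals.
    Returns the (m x m) separation matrix B.
    """
    m, N = x.shape
    B = np.eye(m)
    I = np.eye(m)
    for _ in range(n_iter):
        y = B @ x
        G = np.tanh(y) @ y.T / N       # sample estimate of E{f(y) y^T}
        B = B + step * (I - G) @ B     # natural gradient update
    return B
```

At convergence the off-diagonal entries of the estimate of E{f(y) y^T} are close to zero, which is the instantaneous counterpart of the stationary-point condition.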
19 Contrast Based BSS under Prewhitening Constraints
- A contrast function identifies an independent source when it is extracted from a linear mixture
- Extracting m sources in parallel with such a procedure does not guarantee that $y_i(k)$ and $y_j(k)$ correspond to different source signals
- To extract all the sources, parameter-dependent constraints can be used
- The separation system $\mathbf{B}(k)$ is factorized into two separation systems: $\mathbf{B}(k) = \mathbf{W}(k)\mathbf{P}(k)$
- $\mathbf{P}(k)$ and $\mathbf{W}(k)$ are optimized separately
20 [Figure: block diagram s(k) -> A -> x(k) -> P -> v(k) -> W -> y(k)]
- Two processing stages
- A prewhitening stage
- The goal of the prewhitening stage is to calculate a signal sequence $\mathbf{v}(k) = \mathbf{P}(k)\mathbf{x}(k)$ whose covariance matrix is the identity, so that its outputs are temporally and spatially uncorrelated with unit variance
- A separation stage
- in which an (m x m) separation matrix $\mathbf{W}$ is used to extract the individual sources from $\mathbf{v}(k)$: $\mathbf{y}(k) = \mathbf{W}(k)\mathbf{v}(k)$
- The separation matrix $\mathbf{W}(k)$ can be constrained to be orthonormal
- This constraint guarantees that each output signal $y_i(k)$ corresponds to a different source signal when contrast function optimization is performed
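A prewhitening stage can be sketched as follows for the simplest setting. The sketch covers spatial whitening only, with a fixed matrix P rather than a filter P(k), and it assumes NumPy and noise-free mixtures of m sources observed at n >= m sensors. The function name `prewhiten` is hypothetical.

```python
import numpy as np

def prewhiten(x, m):
    """Spatially whiten n sensor signals down to m unit-variance signals.

    x: array of shape (n, N); m: number of sources to retain.
    Returns (P, v) with P of shape (m, n) and v = P @ x (mean removed).
    """
    x = x - x.mean(axis=1, keepdims=True)
    R = x @ x.T / x.shape[1]                   # sample covariance of x
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:m]        # keep the m largest
    P = (eigvecs[:, idx] / np.sqrt(eigvals[idx])).T
    return P, P @ x
```

After this stage, any remaining mixing of the retained signals is an orthonormal rotation, which is why the second-stage matrix W can be constrained to be orthonormal.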
21 Numerical Evaluations
- The real-world signal mixtures used are
- A two-channel recording of two male speakers in a conference room (sampled at fs = 16 kHz, 16 bit, t = 18.75 s)
- A moderate level of reverberation
- Nominal level of fan noise
- A two-channel recording of a male-female a cappella duet taken from an audio CD (sampled at fs = 44.1 kHz, 16 bit, t = 139.3 s)
- Close harmonies with significant spectral overlap
- Both natural and artificial reverberation effects
- Algorithm: density-based convolutive BSS
- To evaluate the performance, a signal-to-interference ratio (SIR) was calculated for the mixed and separated signals
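The slides do not state how the SIR was computed. A common definition, assumed here, is the ratio in decibels of the power of the desired source component to the power of the interfering component in a given channel. The helper below is a hypothetical sketch using NumPy, and it presumes the two components are available separately, as they are in a simulation.

```python
import numpy as np

def sir_db(target, interference):
    """Signal-to-interference ratio in dB from separate signal components."""
    target = np.asarray(target, dtype=float)
    interference = np.asarray(interference, dtype=float)
    power_ratio = np.sum(target ** 2) / np.sum(interference ** 2)
    return 10.0 * np.log10(power_ratio)
```

Comparing the value for a sensor signal with the value for the corresponding separated output gives the SIR improvement achieved by the separation system.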
22 Speech-Speech Separation Example
- Signal-to-interference and signal-to-noise ratios for the mixed and separated signals in the speech-speech separation example
- The audio quality of the extracted outputs was natural and listenable
23 A Cappella Duet Example
- Spectrograms of the left and right channels before processing over the time interval $113 \le t \le 114.2$ s
- Spectrograms of the outputs of the algorithm over the same time interval
- As can be seen, the male's voice is enhanced in the left output channel, whereas the female's voice is enhanced in the right output channel
24 BSS Audio Examples
- The algorithm is based on density modeling
- A speaker was recorded in a normal office room with two distant-talking microphones (sampling rate 16 kHz) with loud music in the background
- The distances between the speaker, the cassette player, and the microphones are about 60 cm, in a square arrangement
- Recordings: Microphone 1, Microphone 2
- Outputs: Separated Source 1, Separated Source 2
25 Conclusions and Open Issues
- Time-varying acoustical environments
- Changes in source number
- Robustness issues
- Separating mixtures containing more sources than sensors
- Statistical efficiency of separation methods