Title: Subband cocktail-party speech separation: CASA vs. BSS
1. Subband cocktail-party speech separation: CASA vs. BSS
Seungjin Choi, Department of Computer Science and Engineering, POSTECH, Korea (seungjin_at_postech.ac.kr)
Joint work with Frederic Berthommier, ICP, INPG, France
2. Numbers95 Stereo Database
ST-Numbers95 Database, ICP/INP Grenoble. Authors: E. Tessier and F. Berthommier
[Figure labels: Left source, Right source, Reference, Mixture]
A large database of binary mixtures of sentences (n = 613) was recorded by Tessier and Berthommier (1999). The Numbers95 signals were played by loudspeakers and recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.
3. Filterbank decomposition
4. The CASA Model
5. Reconstruction Accuracy
6. Gain of CASA
7. Gain of CASA: Relative Level
[Figure labels: RAY, RAX]
8. Subband effect for CASA
[Figure: three panels; y-axis in dB]
Effect of the number of subbands (nbsb) on the RA (in dB) for the CASA model. From left to right: averaged left-source RA, averaged right-source RA, and averaged left+right RA over all frames. The number of subbands varies from 1 to 5, and the two curves correspond to durations of 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled ().
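The relation between RA and gain stated in the caption can be illustrated with a minimal sketch. The slides do not give the RA formula, so the definition used here (reference energy over residual energy, in dB) is an assumption, and the names `reconstruction_accuracy_db` and `gain_db` are hypothetical. The only part taken from the caption is that the gain is the RA of the separated output minus the RA of the mixture.

```python
import numpy as np

def reconstruction_accuracy_db(reference, estimate, eps=1e-12):
    """Assumed RA: energy of the reference over energy of the residual, in dB."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps)
                           / (np.sum(residual ** 2) + eps))

def gain_db(reference, mixture, output):
    """Gain: RA of the separated output minus RA of the unprocessed mixture."""
    return (reconstruction_accuracy_db(reference, output)
            - reconstruction_accuracy_db(reference, mixture))

# Toy data (not from the database): two equal-variance noise "sources".
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
interferer = rng.standard_normal(4096)
mixture = source + interferer        # interferer at the same level as the source
output = source + 0.1 * interferer   # pretend the interferer was attenuated tenfold

ra_mixture = reconstruction_accuracy_db(source, mixture)
ra_output = reconstruction_accuracy_db(source, output)
gain = gain_db(source, mixture, output)
```

Under this assumed definition, attenuating the interferer by a factor of ten in amplitude raises the RA by 20 dB regardless of the source itself, which is why the mixture RA serves as the baseline for the gain.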
9. Effect of nbsb: RA
[Figure labels: Mixt., Left, Right, 2, 4]
10. Subband effect for CASA: Gain
[Figure labels: Right, Left, nbsb = 1, nbsb = 4]
11. The BSS Model
[Diagram: mixture channels Xl(t) and Xr(t), outputs Yl(t) and Yr(t); blocks labelled gain, nonlinear function and delayed output]
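The diagram only names three ingredients (a gain, a nonlinear function and a delayed output), so the following is a guess at how such a model could be wired, not the authors' algorithm. It is a minimal sketch of a two-channel cross-coupled feedback network in which each output subtracts a filtered copy of the other channel's delayed outputs, and each tap is updated by gain × f(current output) × delayed output of the other channel. The function name `feedback_bss`, the parameters `n_taps` and `gain`, the choice of `tanh`, and the toy mixture are all assumptions.

```python
import numpy as np

def feedback_bss(x_left, x_right, n_taps=3, gain=1e-3, nonlinearity=np.tanh):
    """Hypothetical cross-coupled feedback demixing network (assumed structure)."""
    x_left = np.asarray(x_left, dtype=float)
    x_right = np.asarray(x_right, dtype=float)
    n = len(x_left)
    w_lr = np.zeros(n_taps)  # taps applied to delayed right outputs, subtracted from left
    w_rl = np.zeros(n_taps)  # taps applied to delayed left outputs, subtracted from right
    # Outputs are stored with n_taps leading zeros that act as the initial history.
    y_l = np.zeros(n + n_taps)
    y_r = np.zeros(n + n_taps)
    for t in range(n):
        i = t + n_taps
        past_l = y_l[i - n_taps:i][::-1]  # y_l(t-1), ..., y_l(t-n_taps)
        past_r = y_r[i - n_taps:i][::-1]  # y_r(t-1), ..., y_r(t-n_taps)
        y_l[i] = x_left[t] - w_lr @ past_r
        y_r[i] = x_right[t] - w_rl @ past_l
        # Update: gain * nonlinear function of the output * delayed other output.
        w_lr += gain * nonlinearity(y_l[i]) * past_r
        w_rl += gain * nonlinearity(y_r[i]) * past_l
    return y_l[n_taps:], y_r[n_taps:], w_lr, w_rl

# Toy mixture (an assumption): the right source leaks into the left channel
# with a one-sample delay and amplitude 0.5.
rng = np.random.default_rng(0)
s_left = rng.laplace(size=20000)
s_right = rng.laplace(size=20000)
x_right = s_right
x_left = s_left.copy()
x_left[1:] += 0.5 * s_right[:-1]

y_left, y_right, w_lr, w_rl = feedback_bss(x_left, x_right)
```

In this toy setting the first tap of `w_lr` should drift toward the leakage coefficient, since that is where the nonlinear correlation between the left output and the delayed right output vanishes. Real recordings would need longer filters and are not guaranteed to behave this cleanly.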
12. Gain of BSS: Relative Level
[Figure labels: RAY, RAX]
13. Subband effect for BSS
[Figure: three panels (left, right, left+right); x-axis: nbsb from 1 to 4; y-axis in dB; legend values: 2, 3, 10, 100]
Effect of the number of subbands (nbsb) on the RA (in dB) for the BSS model. From left to right: averaged left-source RA, averaged right-source RA, and averaged left+right RA over all frames. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10 and 100. The RA of the mixture is labelled (). In each figure, two points are added at nbsb = 1 for the "BSS giv" condition (?) and for the "BSS ori" data (?).
14. RA and Gain for BSS
Speech Separation Program (C) POSTECH. Authors: S. Choi and H. Hong
[Figure: two panels (Left, Right); y-axis: RA (dB); curves labelled RAX, RAY and Mixt.]
Frames: 1024 bins with half overlap
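The framing noted above (1024-bin frames with half overlap) can be sketched as follows, together with a per-frame RA. This is an illustration only: the RA formula (reference energy over residual energy, in dB) is an assumption because the slides do not define it, and the names `frame_signal` and `framewise_ra_db` are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=1024):
    """Split a 1-D signal into frames of frame_len samples with half overlap."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def framewise_ra_db(reference, estimate, frame_len=1024, eps=1e-12):
    """Assumed per-frame RA: reference energy over residual energy, in dB."""
    ref = frame_signal(reference, frame_len)
    est = frame_signal(estimate, frame_len)
    num = np.sum(ref ** 2, axis=1) + eps
    den = np.sum((ref - est) ** 2, axis=1) + eps
    return 10.0 * np.log10(num / den)

# Toy data (not from the database): a noise reference and a slightly noisy estimate.
rng = np.random.default_rng(0)
reference = rng.standard_normal(8192)
estimate = reference + 0.1 * rng.standard_normal(8192)
ra_per_frame = framewise_ra_db(reference, estimate)
```

Trailing samples that do not fill a whole frame are dropped in this sketch; padding the last frame would be an equally reasonable choice.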
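15. Subband effect for BSS: Gain
16. Demixing filters
17. Coherence spectrograms
18. Effect of nbp: Coherence spectrograms
[Figure: left and right coherence spectrograms for each condition, with the coherence value (Coh)]
Condition   nbp   Coh
NBP3          3   0.68
NBP10        10   0.65
NBP100      100   0.60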
19. Coherence statistic
Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10 and 100. The RA of the mixture is labelled (). CohX, the coherence between the two mixture channels, is labelled () in the right figure. In each figure, two points are added at nbsb = 1 for the "BSS giv" condition (?) and for the "BSS ori" data (?).
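The coherence index described above (the mean of the coherence spectrogram) can be sketched as follows. The slides do not say how the coherence spectrogram is estimated, so the short-time magnitude-squared coherence used here, the frame length, and the number of frames averaged (`span`) are assumptions, and all function names are hypothetical.

```python
import numpy as np

def stft(x, frame_len=256):
    """Hann-windowed STFT with half-overlapping frames (rows are frames)."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def local_mean(s, span):
    """Average a spectrogram over `span` consecutive frames."""
    return np.stack([s[i:i + span].mean(axis=0)
                     for i in range(len(s) - span + 1)])

def coherence_spectrogram(a, b, frame_len=256, span=8, eps=1e-12):
    """Assumed estimator: magnitude-squared coherence per time-frequency cell."""
    A, B = stft(a, frame_len), stft(b, frame_len)
    s_ab = local_mean(A * np.conj(B), span)          # smoothed cross-spectrum
    s_aa = local_mean((A * np.conj(A)).real, span)   # smoothed auto-spectra
    s_bb = local_mean((B * np.conj(B)).real, span)
    return np.abs(s_ab) ** 2 / (s_aa * s_bb + eps)

def coherence_index(a, b, **kwargs):
    """Coherence index: mean of the coherence spectrogram."""
    return float(np.mean(coherence_spectrogram(a, b, **kwargs)))

# Toy data (not from the database): identical vs. independent noise channels.
rng = np.random.default_rng(0)
left = rng.standard_normal(8000)
right = rng.standard_normal(8000)
coh_same = coherence_index(left, left)
coh_indep = coherence_index(left, right)
```

With this estimator, identical channels give an index close to 1 and independent channels give a much lower value, so a lower index between the two outputs would be read as better separation. The absolute values depend strongly on `span`, which is why figures obtained with a different estimator are not directly comparable.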
20. Summary results
[Figure labels: CASA, BSS, REF; Left, Right, mean]