Title: HIWIRE meeting
1. HIWIRE meeting
- ITC-irst
- Activity report
- Marco Matassoni
- Piergiorgio Svaizer
- March 9-10, 2006
- Torino
2. Outline
- Beamforming and Adaptive Noise Cancellation
- Environmental Acoustics Estimation
- Audio-Video data collection
- Multi-channel pitch estimation
- Fixed-platform prototype acquisition module
3. Delay-and-Sum (DS) Beamforming
The availability of multi-channel signals makes it possible to selectively capture the desired source.
- Issues
- estimation of reliable time differences of arrival (TDOAs)
- Method
- Crosspower-Spectrum Phase (CSP) analysis over multiple frames
- Advantages
- robustness
- reduced computational load
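A minimal sketch of the two steps. It assumes a single far-field source, integer-sample delays, and CSP computed as the phase-normalized cross-spectrum (the same weighting as GCC-PHAT), accumulated over all frames before the peak is picked. The function names, frame length, hop, and maximum lag are illustrative assumptions, not the settings used in the experiments. A small pure-Python FFT keeps the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spec):
    """Inverse FFT via the conjugation identity."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def csp_tdoa(ref, other, frame_len=256, hop=128, max_lag=20):
    """Delay (samples) of `other` w.r.t. `ref`; positive means `other` lags.

    Phase-normalized cross-spectra are accumulated over all frames before
    the peak is picked, which stabilizes the estimate.
    """
    acc = [0j] * frame_len
    for start in range(0, min(len(ref), len(other)) - frame_len + 1, hop):
        a = fft(ref[start:start + frame_len])
        b = fft(other[start:start + frame_len])
        for k in range(frame_len):
            cross = b[k] * a[k].conjugate()
            mag = abs(cross)
            if mag > 1e-12:
                acc[k] += cross / mag  # keep phase only (CSP weighting)
    csp = [v.real for v in ifft(acc)]
    # Negative lags wrap around to the end of the circular CSP buffer.
    return max(range(-max_lag, max_lag + 1), key=lambda tau: csp[tau % frame_len])


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average (DS beamforming)."""
    n = min(len(c) for c in channels)
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if 0 <= j < len(ch):
                acc += ch[j]
        out.append(acc / len(channels))
    return out
```

In use, the delay of every microphone is estimated against a reference channel with `csp_tdoa` and the resulting list is passed to `delay_and_sum`; averaging the aligned channels reinforces the source while uncorrelated noise is attenuated.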
4. DS with MarkIII
- Test set
- set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
- clean models, trained on the original TIDIGITS
- Results (word recognition rate, WRR)
Configuration   WRR (%)
C_1             38.5
C_32            50.8
DS_C8           79.9
DS_C16          83.0
DS_C32          85.3
DS_C64          85.4
5. Adaptive Noise Cancellation (ANC)
A remote microphone can be used as a reference for noise estimation.
6. NLMS
- The tested algorithm is the Normalized Least Mean Squares (NLMS) algorithm, which iteratively estimates an FIR filter that minimizes the difference between the primary channel and the filtered reference
- We implemented two algorithms
- time domain
- frequency domain (subband)
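A minimal sketch of the time-domain variant, assuming one primary channel and one synchronously sampled reference channel. The filter length, step size `mu`, and regularizer `eps` are illustrative values, not those used in the experiments, and the frequency-domain (subband) variant is not shown. At each sample the filter predicts the noise in the primary channel from recent reference samples; the prediction error is the enhanced output.

```python
def nlms_anc(primary, reference, taps=16, mu=0.5, eps=1e-8):
    """Time-domain NLMS adaptive noise canceller (illustrative sketch).

    primary:   microphone picking up speech + noise
    reference: remote microphone picking up (mostly) noise only
    Returns (enhanced error signal, final filter weights).
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # reference history, newest sample first
    enhanced = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y  # error signal = enhanced output
        # Step normalized by the reference energy in the filter window.
        step = mu * e / (eps + sum(b * b for b in buf))
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        enhanced.append(e)
    return enhanced, w
```

A larger `mu` converges faster but leaves more residual error when speech is present in the primary channel, since speech then acts as disturbance for the weight update.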
7. DS and ANC
- Test set
- set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
- clean models, trained on the original TIDIGITS
- Results (WRR; T = time-domain ANC, F = frequency-domain ANC)

Configuration   WRR (%)
C_32 (T)        64.7
C_32 (F)        72.4
DS_C64 (T)      81.8
DS_C64 (F)      88.4
8. Acoustics estimation
- Idea
- Realistically simulate an environment (and its noise)
- Method
- Measure several impulse responses in an environment with multi-channel equipment (through reproduction of chirp signals), preserving relative amplitudes and mutual delays
- Generate appropriate noisy signals starting from clean data
- Acoustic models derived in this way perform better in the given environment, also on real data
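A minimal sketch of the simulation step, assuming one measured impulse response and one noise recording per microphone, each noise recording at least as long as the clean signal. The noise gain is chosen so that channel 0 reaches the target SNR, and the same gain is applied to every channel so that inter-channel level differences are preserved. The function names are hypothetical, and the chirp-based impulse-response measurement itself is not shown.

```python
import math


def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def power(x):
    return sum(v * v for v in x) / len(x)


def simulate_multichannel(clean, impulse_responses, noises, snr_db):
    """Return one simulated noisy signal per microphone.

    impulse_responses: measured IRs, one per microphone (hypothetical input)
    noises:            noise recordings, one per microphone, len >= len(clean)
    snr_db:            target SNR, enforced on channel 0
    """
    n = len(clean)
    # Convolving with each measured IR keeps relative amplitudes and delays.
    reverberant = [convolve(clean, h)[:n] for h in impulse_responses]
    noise_ref = power(noises[0][:n])
    gain = math.sqrt(power(reverberant[0]) / (noise_ref * 10 ** (snr_db / 10)))
    # One common gain for all channels preserves inter-channel noise levels.
    return [[s + gain * v for s, v in zip(rev, noise[:n])]
            for rev, noise in zip(reverberant, noises)]
```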
9. Audio-Video Data Collection
- Idea
- In a noisy environment, exploit additional features from video data
- (collaboration with NTUA and TUC)
- Design of AV corpus
- Task: English connected digits, HIWIRE commands/keywords
- Channels: 4 audio, 3 video
- Environment: acoustically treated room, noise diffusion
10. Audio-Video Setup
11. Audio-Video Setup
- Audio
- 4 omnidirectional Shure PZM microphones, 16 kHz/16 bits
- background noise diffused by 2 loudspeakers
- Video
- Webcam: 640x480, 30 fps color, Unix timestamps
- Stereoscopic camera pair: 640x480, 30 fps b/w or 15 fps color, perfectly synchronous
- Current data sets
- 8 speakers / connected digits
- 2 speakers / HIWIRE keyword lists
12. Fixed prototype acquisition device
- Hardware platform
- 8 Shure microphones, RME Hammerfall sound card
- Software environment
- Linux, ALSA driver
- Acquisition module
- synchronously acquires multiple channels (8)
- writes the enhanced signal plus additional information/features (start/end-of-speech hypotheses, voiced/unvoiced decisions, pitch, ...) to its standard output or to a file
13. Multi-channel pitch analysis
The basic principle is that we can exploit many observations of the same speech process. Once the speaker has been located, we can take into account the different propagation times to the microphones and perform a time alignment.
Pitch analysis can then be performed on adjacent time intervals extracted from different microphone signals. Basic correlation techniques: AMDF (average magnitude difference function), AUTOC (autocorrelation), WAUTOC (weighted autocorrelation).
14. Single-channel method: Weighted Autocorrelation (WAUTOC)
- For every frame of length N
- (see Shimamura-Kobayashi, Trans. on SAP, 2001)
15. A Multi-channel WAUTOC Method
- For a given frame, WAUTOC is computed for each channel and summed over the M channels
- Issues
- weights wi may represent the channel reliability
- possible intraframe smoothing of the resulting fundamental frequency contour could improve the overall accuracy
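Under the same assumed WAUTOC form, here is a sketch of the multi-channel combination. For one frame, the WAUTOC curve of each (already time-aligned) channel is multiplied by a weight w_i, and the M curves are summed before the peak is picked. The slides leave open how the reliability weights are obtained, so here they are simply inputs. All names and parameter values are hypothetical.

```python
def wautoc_curve(frame, taus, k=0.01):
    """Weighted autocorrelation R(tau) / (AMDF(tau) + k) over candidate lags."""
    n = len(frame)
    curve = []
    for tau in taus:
        pairs = range(n - tau)
        r = sum(frame[i] * frame[i + tau] for i in pairs) / n
        d = sum(abs(frame[i] - frame[i + tau]) for i in pairs) / n
        curve.append(r / (d + k))
    return curve


def multichannel_wautoc_pitch(frames, weights, fs, f0_min=60.0, f0_max=400.0, k=0.01):
    """F0 (Hz) from the weighted sum of per-channel WAUTOC curves.

    frames:  one time-aligned analysis frame per microphone
    weights: per-channel reliability weights w_i (hypothetical values)
    """
    taus = list(range(int(fs // f0_max), int(fs // f0_min) + 1))
    if taus[-1] >= min(len(f) for f in frames):
        raise ValueError("frames too short for the requested f0_min")
    total = [0.0] * len(taus)
    for frame, w in zip(frames, weights):
        for j, value in enumerate(wautoc_curve(frame, taus, k)):
            total[j] += w * value
    best = max(range(len(taus)), key=lambda j: total[j])
    return fs / taus[best]
```

A noisy channel given a small weight contributes little to the summed curve, so the peak is dominated by the more reliable channels.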
16. Video example: distant-talking speech recognition
17. Video example: multi-channel pitch estimation
18. Forthcoming activities
- more effective combination of beamforming and ANC
- also test ANC before DS beamforming
- test post-filtering after DS
- audio-video collection: improve audio/video synchronization
- audio-video collection: select the best balance between quality and frame rate
- acoustically characterize the target environment (prototype)
- integrate the selected features into the multi-channel front-end