Title: Position Calibration of Audio Sensors and Actuators
1 Position Calibration of Audio Sensors and Actuators in a Distributed Computing Platform
Vikas C. Raykar (University of Maryland, College Park)
Igor Kozintsev, Rainer Lienhart (Intel Labs, Intel Corporation)
2 Motivation
- Many multimedia applications are emerging that use multiple audio/video sensors and actuators.
(Figure: audio/video sensors and actuators (speakers, microphones, cameras, displays) on distributed computing platforms, used for distributed capture and rendering, number crunching, and other applications; the diagram marks the scope of the current work.)
3 What can you do with multiple microphones?
- Speaker localization and tracking.
- Beamforming or Spatial filtering.
4 Some Applications
- Speech recognition
- Hands-free voice communication
- Novel interactive audio-visual interfaces
- Multichannel speech enhancement
- Smart conference rooms
- Audio/image-based rendering
- Audio/video surveillance
- Speaker localization and tracking
- Multichannel echo cancellation
- Source separation and dereverberation
- Meeting recording
5 More Motivation
- Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
- Dedicated infrastructure is required in terms of the sensors, multi-channel interface cards, and computing power.
- On the other hand:
  - Computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
  - Audio/video sensors on different laptops can be used to form a distributed network of sensors.
(Figure: a laptop with an internal microphone.)
6 (No transcript: figure only)
7 (No transcript: figure only)
8 Common TIME and SPACE
- Put all the distributed audio/visual input/output capabilities of all the laptops into a common TIME and SPACE.
- For the common TIME, see our poster: "Universal Synchronization Scheme for Distributed Audio-Video Capture on Heterogeneous Computing Platforms," R. Lienhart, I. Kozintsev and S. Wehr.
- In this paper we deal with common SPACE, i.e., estimating the 3D positions of the sensors and actuators.
- Why common SPACE?
  - Most array processing algorithms require that the precise positions of the microphones be known.
  - Manual measurement is painful and imprecise.
9 This paper is about...
(Figure: 3D coordinate axes X, Y, Z; the goal is to estimate sensor and actuator positions in this common coordinate system.)
10 If we know the positions of the speakers...
- Each measured distance constrains the microphone to lie on a circle (a sphere in 3D) centered at the corresponding speaker.
- If the distances are not exact, or if we have more speakers, solve for the microphone position in the least-squares sense.
(Figure: circles in the X-Y plane around the known speaker positions intersecting at the unknown microphone position.)
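To make the idea on this slide concrete, here is a minimal sketch (not the authors' code) of recovering one microphone position from known speaker positions and measured distances by least squares; the coordinates and distances are made-up example values, and the conversion from time of flight to distance is assumed to have been done already.

import numpy as np
from scipy.optimize import least_squares

# Known 2D speaker positions (made-up example values), one speaker per row.
speakers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
# Measured distances (meters) from the unknown microphone to each speaker.
measured = np.array([1.41, 2.24, 3.16, 3.61])

def residuals(mic):
    # Predicted minus measured microphone-to-speaker distances.
    return np.linalg.norm(speakers - mic, axis=1) - measured

# Solve in the least-squares sense, starting from the centroid of the speakers.
mic_est = least_squares(residuals, x0=speakers.mean(axis=0)).x
print(mic_est)   # close to (1.0, 1.0) for these example values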
11 If the positions of the speakers are unknown
- Consider M microphones and S speakers.
- What can we measure? The distance between each speaker and each microphone, or equivalently the Time Of Flight (TOF), giving an M x S TOF matrix.
- Assume the TOF measurements are corrupted by Gaussian noise; we can then derive the ML estimate.
(Figure: a calibration signal played from each speaker and captured by all the microphones.)
12 Nonlinear Least Squares...
- More formally, we can derive the ML estimate using a Gaussian noise model.
- Find the microphone and speaker coordinates which minimize the sum, over all microphone-speaker pairs, of the squared difference between the measured TOF and the TOF predicted from the coordinates.
13 Maximum Likelihood (ML) Estimate...
- We can define a noise model and derive the ML estimate, i.e., maximize the likelihood.
- If the noise is Gaussian and independent, the ML estimate is the same as the least-squares solution.
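Writing out the argument on this slide in the notation of the preceding slides (a sketch, not a transcription of the original equation): with independent Gaussian noise of variance \sigma^2 on each measured TOF \hat{t}_{ij},

p(\{\hat{t}_{ij}\} \mid \{m_i\},\{s_j\}) = \prod_{i=1}^{M}\prod_{j=1}^{S} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\bigl(\hat{t}_{ij} - \|m_i - s_j\|/c\bigr)^2}{2\sigma^2} \right),

-\log p = \text{const} + \frac{1}{2\sigma^2} \sum_{i,j} \left( \hat{t}_{ij} - \frac{\|m_i - s_j\|}{c} \right)^2,

so maximizing the likelihood is the same as minimizing the least-squares cost of the previous slide (m_i and s_j are the microphone and speaker positions, c the speed of sound).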
14 Reference Coordinate System
- Without fixing a reference coordinate system there are multiple global minima: any translation, rotation, or reflection of a solution fits the distances equally well.
- In 2D (figure): fix the origin, the X axis, and the positive Y axis.
- Similarly in 3D:
  1. Fix the origin: (0,0,0).
  2. Fix the X axis: (x1,0,0).
  3. Fix the Y axis: (x2,y2,0).
  4. Fix the positive Z axis: require x1, x2, y2 > 0.
- Which nodes to choose? Discussed later.
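As an illustration of the construction above (a minimal numpy sketch; the function name and the choice of anchor points are mine, not from the paper), an estimated point set can be moved into this reference coordinate system as follows:

import numpy as np

def to_reference_frame(points, a=0, b=1, c=2, d=3):
    """Translate/rotate points (N x 3, N >= 4) so that point a is at the origin,
    point b on the +X axis, point c in the X-Y plane with y > 0, and point d has z >= 0."""
    t = points - points[a]                      # fix the origin at point a
    e1 = t[b] / np.linalg.norm(t[b])            # X axis points towards b
    v = t[c] - (t[c] @ e1) * e1                 # component of c orthogonal to the X axis
    e2 = v / np.linalg.norm(v)                  # Y axis points towards c
    e3 = np.cross(e1, e2)                       # Z axis completes the right-handed frame
    out = t @ np.column_stack([e1, e2, e3])     # express all points in the new frame
    if out[d, 2] < 0:                           # resolve the mirror ambiguity
        out[:, 2] *= -1
    return out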
15 On a synchronized platform all is well...
16 However, on a distributed system...
17 The journey of an audio sample...
- This laptop wants to play a calibration signal on the other laptop, so it issues a play command in software over the network.
- When will the sound actually be played out from the loudspeaker?
18 On a distributed system...
(Figure: timeline with respect to a common time origin t; for source j, playback starts at some point and the signal is emitted later; for microphone i, capture starts at some point and the signal is received later.)
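In equation form, the timeline above corresponds to a measurement model of roughly this shape (my notation; ts_j is the emission start time of speaker j and tm_i the capture start time of microphone i, both relative to the common time origin):

\hat{t}_{ij} = \frac{\|m_i - s_j\|}{c} + ts_j - tm_i + n_{ij}, \qquad n_{ij} \sim \mathcal{N}(0, \sigma^2).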
19 Joint Estimation...
- We have the M x S matrix of TOF measurements.
- Unknown parameters:
  - Microphone and speaker coordinates: 3(M+S) - 6
  - Microphone capture start times: M - 1 (assume tm_1 = 0)
  - Speaker emission start times: S
- In total 4M + 4S - 7 parameters to estimate from MS observations.
- We can reduce the number of parameters.
20 Use Time Difference of Arrival (TDOA)...
- The formulation is the same as above, but with fewer parameters.
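For example (my formulation of the reduction, consistent with the slide rather than copied from it), differencing against a reference microphone cancels the speaker emission start times:

\hat{t}_{ij} - \hat{t}_{1j} = \frac{\|m_i - s_j\| - \|m_1 - s_j\|}{c} + tm_1 - tm_i + (n_{ij} - n_{1j}),

leaving only the coordinates and the M-1 microphone capture start times as unknowns.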
21 Nonlinear least squares...
- Minimized using the Levenberg-Marquardt method.
- The objective is a function of a large number of parameters; unless we have a good initial guess, the minimization may not converge to the global minimum.
- An approximate initial guess is required.
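As a sketch of the kind of minimization being described (simplified: it ignores the capture/emission start times and the reference-frame constraints, and it is not the authors' implementation), scipy's Levenberg-Marquardt solver can refine microphone and speaker coordinates from a TOF matrix given an initial guess:

import numpy as np
from scipy.optimize import least_squares

C = 342.0  # assumed speed of sound in m/s

def refine_positions(tof, mics0, spks0):
    """Refine microphone (M x 3) and speaker (S x 3) coordinates from an M x S
    matrix of measured times of flight, starting from an initial guess.
    Needs M*S >= 3*(M+S) so the 'lm' solver has enough observations."""
    M, S = tof.shape

    def residuals(theta):
        mics = theta[:3 * M].reshape(M, 3)
        spks = theta[3 * M:].reshape(S, 3)
        dists = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
        return (tof - dists / C).ravel()        # time-of-flight residuals

    theta0 = np.concatenate([mics0.ravel(), spks0.ravel()])
    sol = least_squares(residuals, theta0, method='lm')   # Levenberg-Marquardt
    return sol.x[:3 * M].reshape(M, 3), sol.x[3 * M:].reshape(S, 3)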
22 Closed-form Solution...
- Say we are given all pairwise distances between N points; can we get the coordinates?
23 Classical Metric Multidimensional Scaling
- Let X be the matrix of coordinates and B = X X^T the dot-product matrix; B is symmetric, positive semidefinite, and of rank 3 (for points in 3D).
- Given B, can you get X? Yes, via the Singular Value Decomposition; this is the same as Principal Component Analysis.
- But we can measure only the pairwise distance matrix.
24 How to get the dot product matrix from the pairwise distance matrix
- Double centering: with d_{ij} the distance between points i and j,
  b_{ij} = -\frac{1}{2}\Bigl( d_{ij}^2 - \frac{1}{N}\sum_{k} d_{ik}^2 - \frac{1}{N}\sum_{k} d_{kj}^2 + \frac{1}{N^2}\sum_{k,l} d_{kl}^2 \Bigr).
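A minimal sketch of classical MDS as described on the last two slides (assuming a complete, noise-free distance matrix; the function name is mine):

import numpy as np

def classical_mds(D, dim=3):
    """Recover coordinates (up to rotation, translation, and reflection) from an
    N x N matrix of pairwise distances by double centering + eigendecomposition."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # dot-product (Gram) matrix
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]        # keep the `dim` largest
    scale = np.sqrt(np.maximum(evals[idx], 0.0))
    return evecs[:, idx] * scale               # coordinates with centroid at the origin

# Example: recover the corners of a unit square (up to a rigid transform).
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
print(classical_mds(D, dim=2))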
25 Centroid as the origin
- Double centering places the centroid of the points at the origin; later we shift the solution to our original reference coordinate system.
- Slightly perturb each GPC location into two points to get the initial guess for the microphone and speaker coordinates.
26 Example of MDS
27 Can we use MDS? Two problems
1. We do not have the complete set of pairwise distances: only speaker-to-microphone distances are measured, so the microphone-to-microphone and speaker-to-speaker blocks of the distance matrix are UNKNOWN.
2. The measured distances include the effect of the lack of synchronization (the unknown emission and capture start times).
28 Clustering approximation
29 Clustering approximation
(Figures: the microphones and speakers on the same GPC are clustered and treated as approximately co-located, which yields a complete matrix of approximate GPC-to-GPC distances.)
30 Finally, the complete algorithm
1. Start from the measured TOF matrix.
2. Clustering approximation: cluster the sensors on each GPC to obtain an approximate distance matrix between GPCs, together with approximate emission start times (ts) and capture start times (tm).
3. Compute the dot-product matrix and run MDS (choosing the dimension and coordinate system) to get approximate GPC locations.
4. Perturb each GPC location to obtain approximate microphone and speaker locations as the initial guess.
5. Refine with the TDOA-based nonlinear minimization to obtain the final microphone and speaker locations and capture start times tm.
31 Sample result in 2D
32 Algorithm Performance
- The performance of our algorithm depends on:
  - The noise variance in the estimated distances.
  - The number of microphones and speakers.
  - The microphone and speaker geometry.
- One way to study the dependence is to run a large number of Monte Carlo simulations.
- Alternatively, given a noise model, we can derive a bound on how badly our algorithm can perform: the Cramér-Rao bound.
33 The Cramér-Rao bound
- Gives a lower bound on the variance of any unbiased estimator.
- Does not depend on the estimator, just on the data and the noise model.
- Basically tells us to what extent the noise limits our performance, i.e., you cannot get a variance lower than the CR bound.
- The bound is computed from the Jacobian of the measurement model; the resulting matrix is rank deficient, so we remove the known (fixed) parameters before inverting.
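In symbols (a standard statement in my notation, not a transcription of the slide): for the Gaussian TOF model,

\operatorname{cov}(\hat{\Theta}) \succeq F(\Theta)^{-1}, \qquad F(\Theta) = \frac{1}{\sigma^2} J^\top J, \qquad J = \frac{\partial\, t(\Theta)}{\partial \Theta},

where t(\Theta) stacks the predicted TOFs and \Theta the unknown parameters; F is rank deficient because of the coordinate-frame ambiguity, so the fixed (known) parameters are removed from \Theta before inverting.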
34 The number of sensors matters
35 The number of sensors matters
36 Geometry also matters
37 Geometry also matters
38 Calibration Signal
39 Time Delay Estimation
40 Time Delay Estimation
- Compute the cross-correlation between the signals received at the two microphones.
- The location of the peak in the cross-correlation gives an estimate of the delay.
- The task is complicated for two reasons:
  1. Background noise.
  2. Channel multi-path due to room reverberations.
- Use the Generalized Cross Correlation (GCC):
  R_{12}(\tau) = \int W(\omega)\, X_1(\omega) X_2^*(\omega)\, e^{j\omega\tau}\, d\omega,
  where W(\omega) is the weighting function.
- We use the PHAT (Phase Transform) weighting, W(\omega) = 1 / |X_1(\omega) X_2^*(\omega)|.
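A minimal sketch of GCC-PHAT delay estimation as described above (assuming both channels are sampled at the same rate fs; the function name is mine):

import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 in seconds (positive if x2 lags x1),
    using the Generalized Cross Correlation with the PHAT weighting."""
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G = X2 * np.conj(X1)                       # cross-spectrum
    G /= np.abs(G) + 1e-12                     # PHAT: keep only the phase
    cc = np.fft.irfft(G, n=n)                  # generalized cross-correlation
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))   # lags -max..+max
    delay_samples = np.argmax(np.abs(cc)) - max_shift            # peak location
    return delay_samples / fs

The peak location divided by the sampling rate gives the delay in seconds; the calibration procedure then converts such delays to distances via the speed of sound.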
41 Synchronized setup: bias 0.08 cm, sigma 3.8 cm
42 Distributed Setup
43 Experimental results using real data
44 Summary
- General-purpose computers can be used for distributed array processing.
- It is possible to define a common time and space for a network of distributed sensors and actuators.
- For more information please see our two papers or contact igor.v.kozintsev_at_intel.com or rainer.lienhart_at_intel.com.
- Let us know if you would be interested in testing/using our time and space synchronization software for developing distributed algorithms on GPCs (available in January 2004).
45 Thank You! Questions?