Title: Digital Audio Signal Processing Lecture-4: Acoustic Echo Cancellation
1 Digital Audio Signal Processing Lecture-4
Acoustic Echo Cancellation
- Marc Moonen
- Dept. E.E./ESAT-STADIUS, KU Leuven
- marc.moonen_at_esat.kuleuven.be
- homes.esat.kuleuven.be/moonen/
2 Outline
- Introduction
  - Acoustic echo cancellation (AEC) problem and applications
  - Acoustic channels
- Adaptive filtering algorithms for AEC
  - NLMS
  - Frequency domain adaptive filters
  - Affine projection algorithm (APA)
- Control algorithm
- Post-processing
- Loudspeaker non-linearity
- Stereo AEC
3 Introduction
- AEC problem/applications
- Suppress echo
  - to guarantee normal conversation conditions
  - to prevent the closed-loop system from becoming unstable
- Applications
  - Teleconferencing
  - Hands-free telephony
  - Handsets
4 Introduction
- AEC standardization
- ITU-T recommendations (G.167) on acoustic echo controllers state that
  - Input/output delay of the AEC should be smaller than 16 ms
  - Far-end signal suppression should reach 40..45 dB (depending on application), if no near-end signal is present
  - In presence of near-end signals the suppression should be at least 25 dB
  - Many other requirements
5 Introduction
- Room Acoustics (I)
- Propagation of sound waves in an acoustic environment results in
  - signal attenuation
  - spectral distortion
- Propagation can be modeled quite well as a linear filtering operation
- Non-linear distortion mainly stems from the loudspeakers. This is often a second-order effect and is mostly not taken into account explicitly
6 Introduction
- Room Acoustics (II)
- The linear filter model of the acoustic path between loudspeaker and microphone is represented by the acoustic impulse response
- Observe that
  - First there is a dead time
  - Then come the direct path impulse and some early reflections, which depend on the geometry of the room
  - Finally there is an exponentially decaying tail called reverberation, coming from multiple reflections on walls, objects, ...
- Reverberation mainly depends on the reflectivity (rather than the geometry) of the room
7 Introduction
- Room Acoustics (III)
- To characterize the reflectivity of a recording room, the reverberation time RT60 is defined
  - RT60 = time which the sound pressure level or intensity needs to decay to 60 dB below its original value
  - For a typical office room RT60 is between 100 and 400 ms, for a church RT60 can be several seconds
- Acoustic room impulse responses are highly time-varying!
[Audio examples: original speech signal; ESAT speech laboratory, RT60 ≈ 120 ms; Begijnhofkerk Leuven, RT60 ≈ 3730 ms]
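To get a feeling for what these RT60 values mean in samples, the following minimal sketch builds a toy impulse response as white noise under an exponential envelope that drops by 60 dB after RT60. The sampling rate of 8000 Hz and the helper names `decay_envelope` and `synthetic_rir` are assumptions made for illustration; a measured room response also has a dead time and distinct early reflections, which are not modeled here.

```python
import numpy as np

def decay_envelope(t_s, rt60_s):
    """Amplitude envelope that has dropped by 60 dB at t = rt60_s."""
    return 10.0 ** (-3.0 * np.asarray(t_s) / rt60_s)

def synthetic_rir(rt60_s, fs_hz, seed=0):
    """Toy impulse response: white noise shaped by the RT60 decay envelope."""
    n = int(round(rt60_s * fs_hz))        # samples spanned by the 60 dB decay
    t = np.arange(n) / fs_hz
    rng = np.random.default_rng(seed)
    return decay_envelope(t, rt60_s) * rng.standard_normal(n)

fs = 8000  # Hz, assumed sampling rate
for name, rt60 in [("ESAT speech laboratory", 0.120),
                   ("Begijnhofkerk Leuven", 3.730)]:
    print(name, "->", len(synthetic_rir(rt60, fs)), "taps")
```

Under these assumptions the 60 dB decay spans roughly a thousand samples for the laboratory and tens of thousands for the church, which gives an idea of the filter lengths an echo canceller has to deal with.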
8 Introduction
- Acoustic Impulse Response: FIR or IIR?
- If the acoustic impulse response is modeled as
  - an FIR filter -> many hundreds to several thousands of filter taps are needed
  - an IIR filter -> filter order can be reduced, but still hundreds of filter coefficients (numerator + denominator) may be needed (sigh!)
- Hence FIR models are typically used in practice because
  - these are guaranteed to be stable
  - in a speech communication set-up the acoustics are highly time-varying, hence adaptive filtering techniques are called for (see DSP-CIS)
    - FIR adaptive filters: simple adaptation rules, no local minima, ...
    - IIR adaptive filters: more complex adaptation, local minima
9 Introduction
- Conventional AEC Techniques
  - Directional loudspeakers and microphones
  - Voice controlled switching, loss control
  - Howling control: stability margin improvement of the closed loop by
    - frequency shifting
    - using comb filters
    - removing resonant peaks
  - Non-linear post-processing, e.g. center clipping
11 Adaptive filtering algorithms for AEC
- Basic set-up
- Adaptive filter produces
  - a model for the acoustic room impulse response
  - an estimate of the echo contribution in the microphone signal, which is then subtracted from the microphone signal
- Thanks to adaptivity
  - time-varying acoustics can be tracked
  - performance is superior to that of conventional techniques
12 Adaptive filtering algorithms for AEC
- Algorithms to be discussed
  - Normalized LMS (NLMS)
  - Frequency-domain adaptive filter (FDAF)
    - partitioned block frequency-domain adaptive filter (PB-FDAF)
  - Affine projection algorithm (APA)
    - fast affine projection algorithm
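13 Adaptive Filtering Algorithms: NLMS
- NLMS update equations (standard form)
  $$e_k = d_k - \mathbf{x}_k^T \mathbf{w}_k$$
  $$\mathbf{w}_{k+1} = \mathbf{w}_k + \frac{\mu}{\mathbf{x}_k^T \mathbf{x}_k + \delta}\, \mathbf{x}_k\, e_k$$
- in which
  $$\mathbf{x}_k = [x_k \;\; x_{k-1} \;\; \dots \;\; x_{k-N+1}]^T$$
- N is the adaptive filter length, $\mu$ is the adaptation stepsize, $\delta$ is a regularization parameter and k is the discrete-time index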
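The NLMS recursion can be tried out on a toy echo path. The sketch below is a minimal textbook-style implementation, not a production echo canceller: the function name `nlms`, the 16-tap toy echo path, the white far-end signal and the absence of a near-end signal are all assumptions chosen to keep the example small.

```python
import numpy as np

def nlms(x, d, N, mu=0.5, delta=1e-6):
    """N-tap NLMS: returns the error signal e and the final weights w."""
    w = np.zeros(N)
    xbuf = np.zeros(N)                   # [x[k], x[k-1], ..., x[k-N+1]]
    e = np.zeros(len(x))
    for k in range(len(x)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[k]
        e[k] = d[k] - xbuf @ w           # a-priori error
        w = w + mu / (xbuf @ xbuf + delta) * xbuf * e[k]
    return e, w

rng = np.random.default_rng(0)
h = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))   # toy echo path
x = rng.standard_normal(4000)                                 # white far-end signal
d = np.convolve(x, h)[:len(x)]                                # echo only
e, w = nlms(x, d, N=16)
print("weight error norm:", np.linalg.norm(w - h))
```

With a white input and no near-end signal the weights converge quickly to the toy echo path; replacing `x` by a colored signal is an easy way to observe the slower convergence discussed for speech.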
14 Adaptive Filtering Algorithms: NLMS
- Pros and cons of NLMS
  - cheap algorithm: O(N)
  - small input/output delay (= 1 sample)
  - for colored far-end signals (such as speech) convergence of the NLMS algorithm is slow (cf. lambda_max versus lambda_min, etc., see DSP-CIS)
  - large N then means even slower convergence
- NLMS is thus often used for the cancellation of short echo paths
15 Adaptive Filtering Algorithms
- As some input/output delay is acceptable in AEC (cf. ITU), algorithms can be derived that are even cheaper than NLMS, by exchanging implementation cost for extra processing delay, sometimes even with improved performance
  - Frequency-domain adaptive filtering (FDAF)
  - Partitioned Block FDAF (PB-FDAF)
- Cost reduction + optimal (stepsize) tuning for each subband/frequency bin separately results in improved performance
16 Adaptive Filtering Algorithms: Block-LMS
- To derive the frequency-domain adaptive filter, the block-LMS (BLMS) algorithm is considered first, in which N is the number of filter taps, L is the block length and n is the block time index
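A minimal sketch of a block-LMS recursion with N filter taps and block length L is given below. It is a generic textbook form, not necessarily the exact variant on the slide: averaging the gradient over the block (the factor `mu / L`), the function name `blms` and the toy signals are assumptions.

```python
import numpy as np

def blms(x, d, N, L, mu):
    """Block LMS: the weights are updated once per block of L samples."""
    w = np.zeros(N)
    xpad = np.concatenate([np.zeros(N - 1), x])   # zero initial history
    e = np.zeros(len(x))
    for n in range(len(x) // L):                  # n is the block time index
        # Row i holds [x[k], x[k-1], ..., x[k-N+1]] for k = n*L + i
        X = np.array([xpad[k:k + N][::-1] for k in range(n * L, (n + 1) * L)])
        blk = slice(n * L, (n + 1) * L)
        e[blk] = d[blk] - X @ w                   # block convolution (filtering)
        w = w + (mu / L) * (X.T @ e[blk])         # block correlation (gradient)
    return e, w

rng = np.random.default_rng(5)
N = L = 16
h = rng.standard_normal(N)                        # toy echo path
x = rng.standard_normal(16000)
d = np.convolve(x, h)[:len(x)]
e, w = blms(x, d, N, L, mu=0.1)
print("weight error norm:", np.linalg.norm(w - h))
```

The two matrix products per block are the convolution and correlation operations that a frequency-domain implementation replaces by FFT-based computations.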
17 Adaptive Filtering Algorithms: Block-LMS
- Both the BLMS convolution and correlation operation are computationally demanding. They can be implemented more efficiently in the frequency domain using fast convolution techniques, i.e. overlap-save/overlap-add
[Equations: convolution and correlation, each computed with overlap-save, using the DFT matrix]
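The fast convolution idea can be checked numerically. The sketch below implements one overlap-save block with NumPy's FFT and compares the result with direct convolution. The function name `overlap_save_block` and the toy sizes are assumptions; the assertion on the FFT size M reflects the usual requirement that circular wrap-around must not reach the L retained output samples.

```python
import numpy as np

def overlap_save_block(w, x_prev, x_new, M):
    """One overlap-save block: returns the L = len(x_new) valid outputs."""
    N, L = len(w), len(x_new)
    assert M >= N + L - 1, "FFT size too small for N taps and L outputs"
    seg = np.concatenate([x_prev, x_new])[-M:]            # last M input samples
    seg = np.concatenate([np.zeros(M - len(seg)), seg])   # zero initial history
    W = np.fft.fft(w, M)                                  # zero-padded filter
    y = np.fft.ifft(np.fft.fft(seg) * W).real             # circular convolution
    return y[-L:]                                         # drop the wrapped part

rng = np.random.default_rng(1)
N = L = 8
M = 16
w = rng.standard_normal(N)
x = rng.standard_normal(5 * L)
y = np.concatenate([overlap_save_block(w, x[:i], x[i:i + L], M)
                    for i in range(0, len(x), L)])
y_ref = np.convolve(x, w)[:len(x)]
print("max deviation from direct convolution:", np.max(np.abs(y - y_ref)))
```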
18 Adaptive Filtering Algorithms: FDAF
- Will only work if M ≥ N + L - 1 (M is FFT-size)
19 Adaptive Filtering Algorithms: FDAF
- Typical parameter setting for the FDAF
- FDAF is functionally equivalent to BLMS
- FDAF is significantly cheaper than (B)LMS for a typical parameter setting, e.g. if N = 1024 (the cost ratio is an estimate only, in practice < 20)
- Input/output delay is equal to 2L-1 = 2N-1, which may be unacceptably large for realistic parameter settings, e.g. if N = 1024 and fs = 8000 Hz -> delay is 256 ms!
20 Adaptive Filtering Algorithms: PB-FDAF
- Overlap-save PB-FDAF: the N-tap full-band filter is split into (N/P) filter sections of P taps each; overlap-save is then applied to each section, etc. (P takes the place of N).
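The partitioning can be illustrated for the filtering part alone (the weight update is left out). In the sketch below an N-tap filter is split into N/P sections of P taps; each section is applied by overlap-save with FFT size 2P to a correspondingly delayed input segment, and the section outputs are summed in the frequency domain. The function name `partitioned_filter_block` and the choice M = 2P are assumptions for illustration.

```python
import numpy as np

def partitioned_filter_block(w, x_hist, P):
    """P output samples of an N-tap filter split into N/P sections.
    x_hist holds the most recent N + P input samples, oldest first."""
    N, M = len(w), 2 * P
    Y = np.zeros(M, dtype=complex)
    for p in range(N // P):
        Wp = np.fft.fft(w[p * P:(p + 1) * P], M)   # section p, zero-padded
        end = len(x_hist) - p * P                  # section p sees input delayed by p*P
        Y += np.fft.fft(x_hist[end - M:end]) * Wp
    return np.fft.ifft(Y).real[-P:]                # last P samples are valid

rng = np.random.default_rng(4)
N, P = 32, 8
w = rng.standard_normal(N)
x = rng.standard_normal(10 * P)
xpad = np.concatenate([np.zeros(N), x])            # zero initial history
y = np.concatenate([partitioned_filter_block(w, xpad[b * P:b * P + N + P], P)
                    for b in range(len(x) // P)])
y_ref = np.convolve(x, w)[:len(x)]
print("max deviation from direct convolution:", np.max(np.abs(y - y_ref)))
```

Only P new input samples are needed per output block, which is why the block length, and with it the delay, can be kept small even for a long filter.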
21 Adaptive Filtering Algorithms: PB-FDAF
- Typical parameter setting
- PB-FDAF is intermediate between LMS and FDAF (P/N = 1)
- PB-FDAF is functionally equivalent to BLMS
- PB-FDAF is cheaper than LMS, e.g. if N = 1024, P = L = 128, M = 256 (estimate)
- Input/output delay is 2L-1, which can be chosen small; in the example above the delay is 32 ms if fs = 8000 Hz
- Used in commercial AECs
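The delay figures quoted for the FDAF (about 256 ms) and the PB-FDAF (about 32 ms) both follow from a delay of 2L-1 samples at fs = 8000 Hz. A small helper makes the arithmetic explicit; the function name `io_delay_ms` is an assumption.

```python
def io_delay_ms(L, fs_hz):
    """Input/output delay of 2L-1 samples, expressed in milliseconds."""
    return 1000.0 * (2 * L - 1) / fs_hz

print(io_delay_ms(1024, 8000))   # FDAF with L = N = 1024      -> 255.875
print(io_delay_ms(128, 8000))    # PB-FDAF with L = P = 128    -> 31.875
```

Both values exceed the 16 ms limit mentioned for ITU-T G.167, so a smaller block length L would be needed to meet that requirement.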
22 Adaptive Filtering Algorithms: PB-FDAF
- PS: Instead of a single stepsize μ, subband-dependent stepsizes can be applied
  - stepsizes dependent on the subband energy (subband normalization)
  - convergence speed increased at only a small extra cost
- PS: The PB-FDAF algorithm can be simplified by leaving the gradient constraint out of the weight updating equation (unconstrained updating)
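23 Adaptive Filtering Algorithms: APA
- Affine Projection Algorithm: intermediate between RLS and NLMS, complexity- as well as performance-wise
- NLMS ($\delta = 0$), with $e_k = d_k - \mathbf{x}_k^T \mathbf{w}_k$:
  $$\mathbf{w}_{k+1} = \mathbf{w}_k + \mu\, \mathbf{x}_k (\mathbf{x}_k^T \mathbf{x}_k)^{-1} e_k$$
  if $\mu = 1$: the a-posteriori error is 0
- APA, with $\mathbf{X}_k = [\mathbf{x}_k \;\; \mathbf{x}_{k-1} \;\; \dots \;\; \mathbf{x}_{k-P+1}]$ and $\mathbf{e}_k = \mathbf{d}_k - \mathbf{X}_k^T \mathbf{w}_k$:
  $$\mathbf{w}_{k+1} = \mathbf{w}_k + \mu\, \mathbf{X}_k (\mathbf{X}_k^T \mathbf{X}_k)^{-1} \mathbf{e}_k$$
  if $\mu = 1$: the P last a-posteriori errors are 0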
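The statement about the a-posteriori errors can be verified with a single APA step. The sketch below uses the standard APA form with an optional regularization term; the function name `apa_step`, the argument layout (X of size N x P, one column per recent input vector) and the random toy data are assumptions.

```python
import numpy as np

def apa_step(w, X, d, mu=1.0, delta=0.0):
    """One APA update. X is N x P (P most recent input vectors), d has length P."""
    e = d - X.T @ w                                       # P a-priori errors
    g = np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), e)
    return w + mu * (X @ g), e

rng = np.random.default_rng(2)
N, P = 16, 4
h = rng.standard_normal(N)             # toy echo path
X = rng.standard_normal((N, P))
d = X.T @ h                            # noise-free desired samples
w1, _ = apa_step(np.zeros(N), X, d, mu=1.0, delta=0.0)
w2, _ = apa_step(np.zeros(N), X, d, mu=1.0, delta=1.0)
print("a-posteriori errors, delta = 0:", d - X.T @ w1)
print("a-posteriori errors, delta = 1:", d - X.T @ w2)
```

With mu = 1 and delta = 0 all P a-posteriori errors vanish up to rounding; with delta > 0 the step is shortened and they no longer vanish. For P = 1 the update reduces to the NLMS step.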
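24 Adaptive Filtering Algorithms: APA
- Problem with APA: near-end noise amplification
  - the desired signal is the sum of the echo signal and the near-end noise
  - in the singular value decomposition of $\mathbf{X}_k$, the outer factors are orthogonal and the middle factor contains the sorted singular values on its diagonal
  - the near-end noise, multiplied by the inverse singular values, appears as noise in the filter weights
- Solution: replace $(\mathbf{X}_k^T \mathbf{X}_k)^{-1}$ by $(\mathbf{X}_k^T \mathbf{X}_k + \delta \mathbf{I})^{-1}$ in the update formula (regularization, similar to $\delta$ in the NLMS formula)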
25 Adaptive Filtering Algorithms: APA
- Effect on near-end noise amplification: smaller if more regularization
- Effect on adaptation speed: slower if more regularization
26 Adaptive Filtering Algorithms: Fast-APA
- APA complexity, i.e. O(P.N), may be reduced to (roughly) LMS complexity, i.e. O(N)
  1. Recursive error vector calculation (delta = 0). Example: if mu = 1, then the lower components were already nulled at time k-1
  2. Delayed filter vector update: accumulate filter adaptations based on vector x_k, apply only when x_k leaves the X_k matrix (at time k+P-1)
  3. Recursive updating scheme for the inverse in the update formula
- Ignore steps 2 & 3
28 Control Algorithm
- Adaptation speed (μ) should be adjusted
  - to the far-end signal power, in order to avoid instability of the adaptive filter -> stepsize normalization (e.g. NLMS)
  - to the amount of near-end activity, in order to prevent the filter from moving away from the optimal solution (see DSP-II) -> double-talk detection
- Double talk refers to the situation where both the far-end and the near-end speaker are active.
29 Control Algorithm
- 3 modes of operation
  - Near-end activity (single or double talk) (Ed large) -> FILT
  - No near-end activity, only far-end activity (Ex large, Ed small) -> FILT+ADAPT
  - No near-end activity, no far-end activity (Ex small, Ed small) -> NOP
- Ex is the short-time energy of the far-end signal (p.36)
- Ed is the short-time energy of the desired signal
30 Control Algorithm
- Double-talk Detection (DTD)
- Difficult problem: detection of speech during speech
- Desired properties
  - Limited number of false alarms
  - Small delay
  - Low complexity
- Different approaches exist in the literature, which are based on
  - Energy
  - Correlation
  - Spectral contents
31 Control Algorithm
- Energy-based DTD
- Compare the short-time energy of the far-end and near-end channel, Ex and Ed
  - Method 1: if Ed > τ Ex -> double talk, where τ is a well-chosen threshold
  - Method 2: a detection statistic is computed; if it is > 1 -> double talk
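Method 1 and the three modes of operation can be combined in a small sketch. Everything specific below is an assumption made for illustration: the recursive energy estimator with forgetting factor `lam`, the threshold `tau = 0.5`, the silence level `floor`, the helper names, and the toy signals (white noise for both talkers, an echo path that is a pure gain of 0.3).

```python
import numpy as np

def short_time_energy(s, lam=0.99):
    """Recursive estimate E[k] = lam * E[k-1] + (1 - lam) * s[k]**2."""
    E = np.zeros(len(s))
    acc = 0.0
    for k, v in enumerate(s):
        acc = lam * acc + (1.0 - lam) * v * v
        E[k] = acc
    return E

def double_talk(Ex, Ed, tau):
    """Method 1: declare double talk wherever Ed > tau * Ex."""
    return Ed > tau * Ex

def aec_mode(Ex_k, Ed_k, tau, floor):
    """Map the two energies at one time instant to a mode of operation."""
    if Ed_k > floor and Ed_k > tau * Ex_k:
        return "FILT"            # near-end activity: filter only
    if Ex_k > floor:
        return "FILT+ADAPT"      # far-end only: filter and adapt
    return "NOP"                 # no activity

rng = np.random.default_rng(6)
x = rng.standard_normal(4000)                # far-end signal
v = np.zeros(4000)
v[2000:] = rng.standard_normal(2000)         # near-end talker in the second half
d = 0.3 * x + v                              # toy microphone (desired) signal
Ex, Ed = short_time_energy(x), short_time_energy(d)
dt = double_talk(Ex, Ed, tau=0.5)
print("detection rate, far-end only:", dt[:2000].mean())
print("detection rate, double talk: ", dt[2500:].mean())
```

The threshold has to sit above the echo-to-far-end energy ratio, which in a real room is unknown and time-varying; this is one reason why energy-based detection is hard to tune.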
32 Post-processing
- Error suppression obtained in practice will be limited to +/- 30 dB, due to
  - non-linearities in the signal path (loudspeakers)
  - time-variations of the acoustic impulse responses
  - finite length of the adaptive filter
  - local background noise
  - failing double-talk detection
- A post-processing unit is added to further reduce the residual signal, e.g. center clipping
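Center clipping can be sketched in a few lines. Several variants exist; the one below sets samples with magnitude below a threshold to zero and shrinks the remaining samples toward zero by the threshold. The function name `center_clip` and the threshold value are assumptions.

```python
import numpy as np

def center_clip(e, threshold):
    """Zero samples with |e| <= threshold, shrink the others by threshold."""
    e = np.asarray(e, dtype=float)
    return np.sign(e) * np.maximum(np.abs(e) - threshold, 0.0)

residual = np.array([-0.5, -0.05, 0.0, 0.05, 0.5])
print(center_clip(residual, 0.1))
```

Low-level residual echo is removed entirely, at the price of distorting low-level near-end speech, which is why such a unit is typically applied after the adaptive filter has done most of the work.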
33 Loudspeaker Non-linearity
- If loudspeaker non-linearity is significant (e.g. consumer applications), then this should be compensated for
- Solution-1: Non-linear model (fixed) in cancellation path
[Figure: block diagram with signals x, y, d, e and blocks "Non-linear model" and "Adaptive filter"]
34 Loudspeaker Non-linearity
- Solution-2: Inverse non-linear model in forward path
- Advantage: if successful, also improves loudspeaker characteristic/sound quality
[Figure: block diagram with signals x, y, d, e and blocks "Inverse non-linear model" and "Adaptive filter"]
36 S-AEC Problem Statement
- Multi-microphone/multi-loudspeaker systems: complexity for pre-whitening (APA, RLS) of x can be shared amongst microphone channels. Apart from this, different microphone signals are processed independently
- Hence from now on consider S-AEC on one microphone only. Other microphone(s) are processed similarly (but independently)
37 S-AEC Problem Statement
- Conditioning problem: S-AEC input vectors contain samples of both loudspeaker signals x1 and x2
  - Mono: autocorrelation of the x-signal (e.g. speech) has an impact on convergence (see DSP-CIS)
  - Stereo: also the cross-correlation between signals x1 and x2 plays a role now
- Large(r) eigenvalue spread (large(r) condition number) of the correlation matrix -> large(r) impact on convergence!
38 S-AEC Problem Statement
- Hence the filter input data matrix X will be singular (with null-space) -> LS solution is non-unique, and solutions depend on (changes in) the transmission room (G1, G2)!
39 S-AEC Problem Statement
In practice
Hence
So that X will be (only) ill-conditioned (instead of rank-deficient), which however is still bad news
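The conditioning problem can be reproduced with a toy experiment. In the sketch below both loudspeaker signals are filtered versions of one common source, using short transmission-room responses g1 and g2 that are shorter than the adaptive filters, so the idealized rank-deficient case is obtained. The helper `data_matrix`, the filter lengths and the random toy data are assumptions.

```python
import numpy as np

def data_matrix(x, N):
    """Rows are [x[k], x[k-1], ..., x[k-N+1]] for k = N-1 ... len(x)-1."""
    return np.array([x[k - N + 1:k + 1][::-1] for k in range(N - 1, len(x))])

rng = np.random.default_rng(3)
N = 8                                         # taps per channel
s = rng.standard_normal(2000)                 # common far-end source
g1 = rng.standard_normal(4)                   # toy transmission-room paths
g2 = rng.standard_normal(4)
x1 = np.convolve(s, g1)[:len(s)]
x2 = np.convolve(s, g2)[:len(s)]

X_stereo = np.hstack([data_matrix(x1, N), data_matrix(x2, N)])
X_indep = np.hstack([data_matrix(rng.standard_normal(2000), N),
                     data_matrix(rng.standard_normal(2000), N)])
cond_stereo = np.linalg.cond(X_stereo.T @ X_stereo)
cond_indep = np.linalg.cond(X_indep.T @ X_indep)
print("condition number, common source:      ", cond_stereo)
print("condition number, independent signals:", cond_indep)
```

Because x1 filtered by g2 equals x2 filtered by g1, the stereo data matrix has a null-space and its correlation matrix is numerically singular, whereas two independent white signals give a condition number close to 1.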
40 S-AEC Fixes
- Reduce correlation between the loudspeaker signals by
  - Complementary comb filters
  - White noise insertion (naive solution, large distortion)
  - Colored (masked) noise insertion
  - Non-linear processing
- Disadvantages
  - Signal distortion
  - Stereo perception may be affected
- In addition, use algorithms that are less sensitive to the condition number than NLMS, e.g. RLS, APA, ...
41 S-AEC Fixes: Complementary comb filters
- Comb-1 for x1, comb-2 for x2
- Two channels are decorrelated, BUT the stereo image is distorted if applied below 1 kHz (psycho-acoustics)
- Can be combined with another technique below 1 kHz
42 S-AEC Fixes: Noise insertion
- Remove all signal content below the masking threshold
- Fill with noise (both channels independently)
- Correlation between input channels decreases
- Poor performance for speech
- Good performance for music
- Computationally intensive
43 S-AEC Fixes: Non-linear processing
- The non-linear function is often a half-wave rectifier
- A sufficiently strong non-linearity is necessary for good performance, but is audible
- Good results for speech, audible artifacts in music
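A half-wave rectifier based decorrelation can be sketched as follows. The specific form used here, adding a scaled positive half-wave to one channel and a scaled negative half-wave to the other, is a commonly described variant and an assumption with respect to the slide, as are the function name `nonlinear_decorrelate` and the gain `alpha = 0.5`.

```python
import numpy as np

def nonlinear_decorrelate(x1, x2, alpha=0.5):
    """Add a scaled half-wave rectified copy to each channel:
    positive half-wave for x1, negative half-wave for x2."""
    x1p = x1 + alpha * 0.5 * (x1 + np.abs(x1))
    x2p = x2 + alpha * 0.5 * (x2 - np.abs(x2))
    return x1p, x2p

rng = np.random.default_rng(7)
s = rng.standard_normal(10000)
x1p, x2p = nonlinear_decorrelate(s, s, alpha=0.5)   # worst case: identical channels
rho_before = np.corrcoef(s, s)[0, 1]
rho_after = np.corrcoef(x1p, x2p)[0, 1]
print("correlation before:", rho_before)
print("correlation after: ", rho_after)
```

Even for identical channels the correlation coefficient drops below 1, which removes the exact linear relation between the two loudspeaker signals; a larger `alpha` decorrelates more but distorts the signals more.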
44 S-AEC Fixes: Non-linear processing
[Figure: mismatch versus time, for the cases "loudspeakers play original signal" and "loudspeakers play processed signal"]