Title: Other Features
1 Other Features
2 Echo Cancellation
3 Acoustic Echo
Ecan
4 Line echo
[Figure: Telephone 1 and Telephone 2, each connected through a hybrid]
5 Subjective reaction to echo
6 Ecan
7 Subjective effect of 15 dB echo return loss
8 Echo suppressor
- In practice more is needed: VOX, override, reset, etc.
9 Why not echo suppression?
- Echo suppression makes the conversation half duplex
- Waste of full-duplex infrastructure
- Conversation is unnatural
- Hard to break in
- Dead-sounding line
- It would be better to cancel the echo
- subtract the echo signal, allowing the desired signal through
- but that requires DSP
10 Echo cancellation?
- Unfortunately, it's not so easy
- The outgoing signal is delayed, attenuated, and distorted
- Two echo canceller architectures
- MODEM TYPE
- LINE ECHO CANCELLER (LEC)
[Figure: the two echo canceller architectures, with labels near end, far end, echo path, clean]
11 LEC architecture
[Figure: LEC block diagram with labels hybrid, A/D, D/A, NLP, filter H, doubletalk detector, adapt, near end, far end, signals X and Y, and a subtraction node]
12 Adaptive Algorithms
- How do we
- find the echo cancelling filter?
- keep it correct even if the echo path parameters change?
- Need an algorithm that continually changes the filter parameters
- All adaptive algorithms are based on the same ideas
- (lack of correlation between desired signal and interference)
- Let's start with a simpler case: adaptive noise cancellation
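13 Noise cancellation
[Figure: noise cancellation diagram with signal x, noise n, unknown gain h giving h n, correction gain e giving e n, and output y]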
14 Noise cancellation - cont.
- Assume that the noise is distorted only by an unknown gain h
- We correct by transmitting e n so that the audience hears
- $y = x + h n - e n = x + (h - e) n$
- the energy of this signal is
- $E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h - e)^2 \langle n^2 \rangle + 2 (h - e) \langle x n \rangle$
- Assume that $C_{xn} = \langle x n \rangle = 0$
- We need only set e to minimize $E_y$! (turn the knob until minimal)
- Even if the distortion is a complete filter h
- we set the ANC filter e to minimize $E_y$
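The "turn the knob until minimal" idea can be tried numerically. The following is only a sketch under stated assumptions: x and n are independent Gaussian sequences, the gain value 0.7 and the knob grid are hypothetical, and the names residual_energy and best_gain are made up for illustration.

```python
import random

def residual_energy(x, n, h, e):
    """Mean energy of y = x + (h - e) * n over the sample sequences."""
    return sum((xi + (h - e) * ni) ** 2 for xi, ni in zip(x, n)) / len(x)

def best_gain(x, n, h, candidates):
    """Turn the knob: pick the candidate e giving the smallest output energy."""
    return min(candidates, key=lambda e: residual_energy(x, n, h, e))

random.seed(0)
N = 5000
x = [random.gauss(0.0, 1.0) for _ in range(N)]   # desired signal
n = [random.gauss(0.0, 1.0) for _ in range(N)]   # noise, generated independently of x
h_true = 0.7                                     # unknown gain (hypothetical value)
knob = [i / 100 for i in range(0, 151)]          # candidate settings e = 0.00 ... 1.50
e_hat = best_gain(x, n, h_true, knob)
print(e_hat)                                     # lands close to h_true
```

The minimum falls near h rather than exactly on it, because over a finite sample the measured cross term between x and n is small but not exactly zero.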
15 The LMS algorithm
- Gradient descent on energy
- correction to H is proportional to error d times input X
- $H \leftarrow H + \lambda \, d \, X$
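A minimal sketch of this update, assuming a 3-tap echo path, a noiseless microphone signal, and no near-end speech. The echo path coefficients, the step size, and the function name lms_echo_canceller are all hypothetical choices for illustration.

```python
import random

def lms_echo_canceller(far, mic, taps, step):
    """Adapt filter H sample by sample; return (H, residual error sequence)."""
    H = [0.0] * taps
    X = [0.0] * taps                      # most recent far-end samples, newest first
    errors = []
    for x, y in zip(far, mic):
        X = [x] + X[:-1]                  # shift the new far-end sample into the buffer
        echo_estimate = sum(h * xi for h, xi in zip(H, X))
        d = y - echo_estimate             # error left after subtracting the estimate
        H = [h + step * d * xi for h, xi in zip(H, X)]   # H <- H + lambda * d * X
        errors.append(d)
    return H, errors

random.seed(1)
echo_path = [0.5, -0.3, 0.1]              # hypothetical echo path, unknown to the filter
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [sum(echo_path[k] * far[i - k] for k in range(len(echo_path)) if i - k >= 0)
       for i in range(len(far))]          # microphone hears only the echo
H, errors = lms_echo_canceller(far, mic, taps=3, step=0.05)
print([round(h, 3) for h in H])
```

With no near-end signal and no noise, H should settle on the echo path coefficients and the residual error should shrink toward zero. A larger step converges faster but becomes unstable beyond a limit that depends on the input power.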
16 Nonlinear processing
- Because of finite numeric precision
- the (linear) LEC filtering cannot completely remove echo
- Standard LEC adds center clipping to remove residual echo
- Clipping threshold needs to be properly set by adaptation
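One common form of center clipping zeroes samples below a threshold and passes the rest unchanged. The sketch below assumes that variant and uses a fixed, hypothetical threshold; a real NLP would adapt the threshold as the slide notes.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold; pass the rest."""
    return [0.0 if abs(s) < threshold else s for s in samples]

residual = [0.002, -0.004, 0.3, -0.25, 0.001]   # hypothetical canceller output
print(center_clip(residual, threshold=0.01))
# → [0.0, 0.0, 0.3, -0.25, 0.0]
```

The small samples stand for residual echo and are removed, while the larger ones, standing for near-end speech, pass through untouched.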
17 Doubletalk detection
- Adaptation of H should take place only when the far end speaks
- So we freeze adaptation when there is no far-end speech or there is double-talk,
- that is, whenever the near end speaks
- Geigel algorithm compares the absolute value of near-end speech
- to half the maximum absolute value in the X buffer
- If near-end exceeds this, we can assume the near end is speaking
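The Geigel comparison fits in a few lines. This sketch makes the decision for a single sample; the function name geigel_doubletalk and the buffer contents are hypothetical, and the hangover time that practical detectors add after a detection is left out.

```python
def geigel_doubletalk(near_sample, far_buffer, ratio=0.5):
    """Return True if near-end speech is declared (adaptation should freeze)."""
    far_peak = max(abs(x) for x in far_buffer)   # largest recent far-end magnitude
    return abs(near_sample) > ratio * far_peak

far_buffer = [0.1, -0.8, 0.4, 0.2]               # recent far-end samples (the X buffer)
print(geigel_doubletalk(0.3, far_buffer))        # 0.3 is not above 0.5 * 0.8: echo only
print(geigel_doubletalk(0.6, far_buffer))        # 0.6 is above 0.4: near end speaking
```

The factor of one half reflects the assumption that the hybrid attenuates the echo by at least 6 dB, so anything louder than half the far-end peak cannot be echo alone.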
18 Data Relays
19 The need for relays
Relays
- Voice is a relatively forgiving signal (or rather, the ear is)
- Compression techniques are designed to pass voice
- but may hopelessly distort other signals
- Even simple tones (or DTMF) may not be passed by coders
- We could go back to 64 Kbps G.711 for non-voice signals
- But isn't that silly?
- Using 64 Kbps for 64 bps or even 9.6 Kbps data?
- The solution is to use a relay
20 Open Channel
- Reasons to use 64 Kbps G.711 (open channel)
- (32 Kbps ADPCM may work as well)
- Inexpensive
- Simple design
- Robust
- Even open channel is not trivial!
- Need dynamic BW mechanism
- Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
- Need to return to compressed voice (end of session, time-out)
21 Tone / Fax / Modem Relay
[Figure: relay chain with labels Analog, A/D D/A, 64 Kbps, Demodulate/Remodulate at each end]
- Problems
- need highly accurate detectors
- need low false alarm rate
- need appropriate protocol
- need accurate timing
- need expensive DSP processing
- delay may be too large
- may need spoofing
- can the two sides operate with different parameters?
22 VoP DSP Architecture
[Figure: Voice Packet Module block diagram with blocks PCM Interface, Tone Detector, Tone Generator, VAD, CNG, DISC., LEC, Multi Channel Codec, Speech Coders, Playout Unit, Packet Voice Protocol, Serial Port, Real Time Operating System, Control]
23 VoP System Implementation
[Figure: system block diagram with blocks Telephony Signaling Module, Network Management Module, Voice Packet Module, Packet Protocol Module, DSP, Microprocessor; connections labeled Signaling, NM info, Voice, Voice Signaling Packets; endpoints PSTN and ATM / FR / IP Network]
24 Quality of Service
25 The meaning of QoS
QoS
- For general purpose data
- Every little bit counts
- only lossless compression
- best effort delivery
- Real-time not essential
- dynamic routing and packet reordering allowed
- For speech
- Only subjective quality counts
- Can use lossy compression
- Can drop segments with little effect
- Real-time essential
- predetermined route preferable (traffic engineering)
26 PSTN QoS
- Virtually all calls (>95%) completed
- Once connected virtually no disconnects or faults
- Toll quality voice
- Low delay (except satellite calls)
- Full switching, optimized routing
- Call Management
- Fax/Modem functions
- Wireline and wireless services
27 Paying for QoS
- Law of Photonics
- Price of transmitting a bit drops by half every 9 months
- Free Internet telephony
- Several firms offering free long distance service over Internet
- Strong compression, significant delay and jitter
- We no longer need to pay for service
- but we are willing to pay for quality of service
28 Paying for QoS
[Figure: labels toll, wire service, mobile service]
29 Speech Quality Measurement
30 Why does it sound the way it sounds?
SQM
- PSTN
- BW = 0.2-3.8 kHz, SNR > 30 dB
- PCM, ADPCM (BER $10^{-3}$)
- five nines reliability
- line echo cancellation
- Voice over packet network
- speech compression
- delay, delay variation, jitter
- packet loss/corruption/priority
- echo cancellation
31 Subjective Voice Quality
- Old Measures
- 5/9
- DRT
- DAM
- The modern scale
- MOS
- DMOS
meet neat seat feet Pete beat heat
32 MOS according to ITU
- P.800 Subjective Determination of Transmission Quality
- Annex B Absolute Category Rating (ACR)
- Score: Listening Quality / Listening Effort
- 5: excellent / relaxed
- 4: good / attention needed
- 3: fair / moderate effort
- 2: poor / considerable effort
- 1: bad / no meaning with feasible effort
33 MOS according to ITU (cont)
- Annex D Degradation Category Rating (DCR)
- Annex E Comparison Category Rating (CCR)
- ACR not good at high quality speech
- DCR scale
- 5 inaudible
- 4 not annoying
- 3 slightly annoying
- 2 annoying
- 1 very annoying
- CCR scale
- 3 much better
- 2 better
- 1 slightly better
- 0 the same
- -1 slightly worse
- -2 worse
- -3 much worse
34 Some MOS numbers
- Effect of Speech Compression
- (from ITU-T Study Group 15)
- Quiet room, 48 kHz 16 bit linear sampling: 5.0
- PCM (A-law/μ-law) 64 Kb/s: 4.1
- G.723.1 @ 6.3 Kb/s: 3.9
- G.729 @ 8 Kb/s: 3.9
- ADPCM G.726 32 Kb/s: 3.8
- GSM @ 13 Kb/s: 3.6
- VSELP IS54 @ 8 Kb/s: 3.4
[Label on slide, placed between the ADPCM and GSM entries: toll quality]
35 The Problem(s) with MOS
- Accurate MOS tests are the only reliable benchmark
- BUT
- MOS tests are off-line
- MOS tests are slow
- MOS tests are expensive
- Different labs give consistently different results
- Most MOS tests only check one aspect of the system
36 The Problem(s) with SNR
- Naive question: Isn't CCR the same as SNR?
- SNR does not correlate well with subjective criteria
- Squared difference is not an accurate comparator
- Gain
- Delay
- Phase
- Nonlinear processing
37 Speech distance measures
- Many objective measures have been proposed
- Segmental SNR
- Itakura-Saito distance
- Euclidean distance in Cepstrum space
- Bark spectral distortion
- Coherence Function
- None correlate well with MOS
- ITU target: find a quality measure that does correlate well
38 Return to Biology
- Standard speech model (LPC)
- (used by most speech processing/compression/recognition systems)
- is a model of speech production
- Unfortunately, speech production and perception systems
- are not matched
- Speech quality measurement idea
- use a model of the human auditory system (perception)
- ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
- ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
- ITU-R BS.1387 Objective Measurements of Perceived Audio Quality
39 Some objective methods
- Perceptual Speech Quality Measurement (PSQM)
- ITU-T P.861
- Perceptual Analysis Measurement System (PAMS)
- BT proprietary technique
- Perceptual Evaluation of Speech Quality (PESQ)
- ITU-T P.862
- Objective Measurement of Perceived Audio Quality (PEAQ)
- ITU-R BS.1387
- E-model
- ITU-T G.107, G.108; ETSI ETR-250
40 Objective Quality Strategy
[Figure: label speech]
41 PSQM philosophy (from P.861)
[Figure: PSQM block diagram with blocks Perceptual model (x2), Internal Representation (x2), Audible Difference, Cognitive Model]
42 PSQM philosophy (cont)
- Perceptual Modelling (Internal representation)
- Short time Fourier transform
- Frequency warping (telephone-band filtering, Hoth noise)
- Intensity warping
- Cognitive Modelling
- Loudness scaling
- Internal cognitive noise
- Asymmetry
- Silent interval processing
- PSQM Values
- 0 (no degradation) to 6.5 (maximum degradation)
- Conversion to MOS
- PSQM to MOS calibration using known references
- Equivalent Q values
43 Problems with PSQM
- Designed for telephony grade speech codecs
- Doesn't take network effects into account
- filtering
- variable time delay
- localized distortions
- Draft standard P.862 adds
- transfer function equalization
- time alignment, delay skipping
- distortion averaging
44 PESQ philosophy (from P.862)
[Figure: PESQ block diagram with blocks Time Alignment, Perceptual model (x2), Internal Representation (x2), Audible Difference, Cognitive Model]
45 E-model
- R factor: mouth-to-ear transmission quality model
- $R = R_0 - I_s - I_d - I_e + A$
- where
- $R_0$: effect of SNR
- $I_s$: effect of simultaneous impairments
- $I_d$: effect of delayed impairments
- $I_e$: effect of equipment distortion
- $A$: advantage of method (e.g. mobility of cellphone)
- Defined in ITU-T G.107, G.108 and ETSI ETR-250
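The R factor sum is simple arithmetic, sketched below with hypothetical impairment values. The conversion from R to an estimated MOS is not on the slide; it is the mapping given in ITU-T G.107 and is included here as an addition. The function names r_factor and r_to_mos are illustrative.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-model rating: R = R0 - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Estimated MOS from R, using the conversion in ITU-T G.107 (not on the slide)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

# Hypothetical values for illustration only, not measurements.
r = r_factor(r0=94.0, i_s=1.0, i_d=5.0, i_e=11.0, a=0.0)
print(r, round(r_to_mos(r), 2))
```

Because the impairments are additive on the R scale, the contribution of each one (for example a codec's equipment factor versus delay) can be read off separately before converting to MOS.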
46 VQMon
- PSQM and PESQ are intrusive techniques
- PSQM and PESQ require on-line DSP processing
- Given the speech encoder
- shouldn't there be a connection
- between network parameters (e.g. packet loss, jitter)
- and speech quality?
- A nonintrusive technique has been developed
- based on the E-model
- Invented by AD Clark (Telchemy); accepted by ETSI TIPHON