Other Features - PowerPoint PPT Presentation

Description:

Standard LEC adds center clipping to remove residual echo. Clipping threshold needs to be properly set by adaptation. Ecan. Y(J) Stein VoP4 17 ...

Slides: 47
Provided by: yjs8

1
Other Features
2
Echo Cancellation
3
Acoustic Echo
Ecan
4
Line echo
Ecan
[Figure: Telephone 1 and Telephone 2 connected through two hybrids]
5
Subjective reaction to echo
Ecan
6
Ecan
7
Subjective effect of 15 dB echo return loss.
Ecan
8
Echo suppressor
Ecan

In practice more is needed: VOX, override, reset, etc.
9
Why not echo suppression?
Ecan
  • Echo suppression makes conversation half duplex
  • Waste of full-duplex infrastructure
  • Conversation unnatural
  • Hard to break in
  • Dead sounding line
  • It would be better to cancel the echo:
  • subtract the echo signal, allowing the desired signal
    through
  • but that requires DSP.

10
Echo cancellation?
Ecan
  • Unfortunately, it's not so easy
  • Outgoing signal is delayed, attenuated, distorted
  • Two echo canceller architectures
  • MODEM TYPE
  • LINE ECHO CANCELLER (LEC)

[Figure: block diagrams of the two architectures; labels: near end, far end, echo path, clean, subtractor (-)]
11
LEC architecture
Ecan

[Figure: LEC block diagram; labels: hybrid, A/D, D/A, NLP, subtractor (-), filter H, doubletalk detector, adapt, near end, far end, signals X and Y]
12
Adaptive Algorithms
Ecan
  • How do we
  • find the echo cancelling filter?
  • keep it correct even if the echo path parameters
    change?
  • Need an algorithm that continually changes the
    filter parameters
  • All adaptive algorithms are based on the same
    ideas
  • (lack of correlation between desired signal and
    interference)
  • Let's start with a simpler case - adaptive noise
    cancellation

13
Noise cancellation
Ecan
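[Figure: adaptive noise cancellation; labels: signal x, noise n, distorted noise h n, correction e n, subtractor (-), output y]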
14
Noise cancellation - cont.
Ecan
  • Assume that noise is distorted only by unknown
    gain h
  • We correct by transmitting e n so that the
    audience hears
  • $y = x + h n - e n = x + (h-e) n$
  • the energy of this signal is
  • $E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h-e)^2 \langle n^2 \rangle + 2 (h-e) \langle x n \rangle$
  • Assume that $C_{xn} = \langle x n \rangle = 0$
  • We need only set e to minimize $E_y$! (turn knob
    until minimal)
  • Even if the distortion is a complete filter h
  • we set the ANC filter e to minimize $E_y$
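
The "turn the knob" idea can be tried numerically. The following is a minimal sketch, not taken from the slides: it assumes white Gaussian test signals, a hypothetical gain h = 0.7, and a simple grid of candidate knob positions. The canceller sees only the noisy mixture and the noise reference, never h itself.

```python
import random

def output_energy(mixture, noise, e):
    """Mean energy of y = mixture - e * noise for a given knob setting e."""
    return sum((m - e * n) ** 2 for m, n in zip(mixture, noise)) / len(mixture)

def turn_knob(mixture, noise, candidates):
    """Return the candidate gain e that gives the smallest output energy."""
    return min(candidates, key=lambda e: output_energy(mixture, noise, e))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # desired signal
n = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # independent noise, so <x n> is near 0
h = 0.7                                             # hypothetical gain, unknown to the canceller
mixture = [xi + h * ni for xi, ni in zip(x, n)]     # what is heard without correction

candidates = [k / 100 for k in range(0, 151)]       # knob positions 0.00 .. 1.50
best_e = turn_knob(mixture, n, candidates)
print(best_e)   # lands close to h because the cross term <x n> is small
```

Because x and n are uncorrelated, the energy is a parabola in e with its minimum near e = h, which is why a simple search suffices here.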

15
The LMS algorithm
Ecan
  • Gradient descent on energy
  • correction to H is proportional to error d times
    input X

$H \leftarrow H + \lambda \, d \, X$
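
As an illustration of the update rule, here is a small sketch of an LMS echo canceller. The echo path, the step size of 0.01, and the white-noise far-end signal are assumptions chosen for the demo; there is no near-end speech, so the error d is pure residual echo.

```python
import random

def lms_echo_canceller(x, y, num_taps, lam):
    """Adapt filter H so that H applied to far-end x predicts the echo in y.

    Returns the final filter and the residual (error) signal d.
    """
    H = [0.0] * num_taps
    X = [0.0] * num_taps                      # most recent far-end samples, newest first
    residual = []
    for xk, yk in zip(x, y):
        X = [xk] + X[:-1]                     # shift the new sample into the buffer
        echo_estimate = sum(hi * xi for hi, xi in zip(H, X))
        d = yk - echo_estimate                # error after subtracting the estimate
        H = [hi + lam * d * xi for hi, xi in zip(H, X)]   # H <- H + lambda d X
        residual.append(d)
    return H, residual

rng = random.Random(1)
echo_path = [0.0, 0.5, -0.3, 0.1]             # hypothetical echo path (delay, attenuation)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [sum(echo_path[j] * x[k - j] for j in range(len(echo_path)) if k - j >= 0)
     for k in range(len(x))]

H, residual = lms_echo_canceller(x, y, num_taps=4, lam=0.01)
print([round(h, 3) for h in H])
```

The step size trades convergence speed against stability; a value that is too large for the input power makes the filter diverge.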
16
Nonlinear processing
Ecan
  • Because of finite numeric precision
  • the LEC (linear) filtering cannot completely
    remove echo
  • Standard LEC adds center clipping to remove
    residual echo
  • Clipping threshold needs to be properly set by
    adaptation
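
A center clipper can be sketched in a few lines. This is an assumed basic variant that zeroes samples at or below the threshold and passes larger ones unchanged; the threshold is fixed here for simplicity, whereas the slide calls for it to be set adaptively.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is at or below the threshold."""
    return [0.0 if abs(s) <= threshold else s for s in samples]

# Small residual echo is removed, louder near-end speech passes through.
cleaned = center_clip([0.01, -0.02, 0.5, -0.7], threshold=0.05)
print(cleaned)
```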

17
Doubletalk detection
Ecan
  • Adaptation of H should take place only when far
    end speaks
  • So we freeze adaptation when no far end or
    double-talk,
  • that is whenever near end speaks
  • Geigel algorithm compares absolute value of
    near-end speech
  • to half the maximum absolute value in X buffer
  • If near-end exceeds this threshold, we can assume
    the near end is speaking
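
The Geigel comparison described above might be sketched as follows. The class name and buffer length are hypothetical, and practical detectors usually add a hangover period and a separate far-end activity check, both omitted here.

```python
from collections import deque

class GeigelDetector:
    """Flags near-end speech when |near| exceeds ratio * max|far| over a buffer."""

    def __init__(self, buffer_len, ratio=0.5):
        self.far = deque([0.0] * buffer_len, maxlen=buffer_len)
        self.ratio = ratio

    def near_end_active(self, far_sample, near_sample):
        """Update the far-end buffer and return True if adaptation should freeze."""
        self.far.append(abs(far_sample))
        return abs(near_sample) > self.ratio * max(self.far)

det = GeigelDetector(buffer_len=4)
echo_only = det.near_end_active(far_sample=1.0, near_sample=0.3)    # weak return: echo
double_talk = det.near_end_active(far_sample=0.0, near_sample=0.6)  # strong: near end talks
print(echo_only, double_talk)
```

The factor of one half reflects the assumption that the hybrid attenuates the echo by at least about 6 dB.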

18
Data Relays
19
The need for relays
Relays
  • Voice is a relatively forgiving signal (or rather,
    the ear is)
  • Compression techniques are designed to pass voice
  • but may hopelessly distort other signals
  • Even simple tones (or DTMF) may not be passed by
    coders
  • We could go back to 64Kbps G.711 for non-voice
    signals
  • But isn't that silly?
  • Using 64Kbps for 64bps or even 9.6Kbps data?
  • The solution is to use a relay

20
Open Channel
  • Reasons to use 64Kbps G.711 (open channel)
  • (32 Kbps ADPCM may work as well)
  • Inexpensive
  • Simple design
  • Robust
  • Even open channel is not trivial!
  • Need dynamic BW mechanism
  • Need to detect the event (fax/modem tone, DTMF,
    MF, CPT, etc.)
  • Need to return to compressed voice (end of
    session, time-out)
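
The switching logic implied by the last three points can be pictured as a two-state machine. The sketch below is purely illustrative: the class, the frame-based time-out, and the boolean detector input are all hypothetical, and a real system would also need the signaling to renegotiate bandwidth.

```python
COMPRESSED, OPEN = "compressed voice", "open channel"

class ChannelMode:
    """Switch to open channel on a detected event, fall back on time-out or session end."""

    def __init__(self, timeout_frames):
        self.mode = COMPRESSED
        self.timeout_frames = timeout_frames
        self.quiet_frames = 0

    def on_frame(self, event_detected, session_ended=False):
        if self.mode == COMPRESSED:
            if event_detected:                 # fax/modem tone, DTMF, etc.
                self.mode = OPEN
                self.quiet_frames = 0
        elif session_ended:
            self.mode = COMPRESSED
        elif event_detected:
            self.quiet_frames = 0              # activity continues, stay open
        else:
            self.quiet_frames += 1
            if self.quiet_frames >= self.timeout_frames:
                self.mode = COMPRESSED         # time-out: return to compressed voice
        return self.mode

m = ChannelMode(timeout_frames=3)
modes = [m.on_frame(e) for e in (False, True, False, False, False)]
print(modes)
```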

21
Tone / Fax / Modem Relay
Relays
[Figure: relay path; labels: Analog, A/D D/A, 64 Kbps, Demodulate/Remodulate at each end]
  • Problems
  • need highly accurate detectors
  • need low false alarm rate
  • need appropriate protocol
  • need accurate timing
  • need expensive DSP processing
  • delay may be too large
  • may need spoofing
  • can sides operate with different parameters?

22
VoP DSP Architecture
Relays
Voice Packet Module
Tone Detector
PCM Interface Tone Generator
VAD CNG DISC.
LEC
Packet Voice Protocol
Multi Channel Codec
Speech Coders
Serial Port
Playout Unit
Real Time Operating System
Control
23
VoP System Implementation
Relays
Signaling
Network Management Module
NM info
Telephony Signaling Module Microprocessor
PSTN
ATM / FR / IP Network
Voice Packet Module
Packet Protocol Module
Voice
Voice Signaling Packets
DSP
Microprocessor
24
Quality of Service
25
The meaning of QoS
QoS
  • For general purpose data
  • Every little bit counts
  • only lossless compression
  • best effort delivery
  • Real-time not essential
  • dynamic routing and packet reordering allowed
  • For speech
  • Only subjective quality counts
  • Can use lossy compression
  • Can drop segments with little effect
  • Real-time essential
  • predetermined route preferable (traffic
    engineering)

26
PSTN QoS
QoS
  • Virtually all calls (>95%) completed
  • Once connected virtually no disconnects or faults
  • Toll quality voice
  • Low delay (except satellite calls)
  • Full switching, optimized routing
  • Call Management
  • Fax/Modem functions
  • Wireline and wireless services

27
Paying for QoS
QoS
  • Law of Photonics
  • Price of transmitting a bit drops by half
    every 9 months
  • Free Internet telephony
  • Several firms offering free long distance
    service over Internet
  • Strong compression, significant delay and
    jitter
  • We no longer need to pay for service
  • but we are willing to pay for quality
    of service

28
Paying for QoS
QoS
[Figure labels: toll, wire service, mobile service]
29
Speech Quality Measurement
30
Why does it sound the way it sounds?
SQM
  • PSTN
  • BW = 0.2-3.8 kHz, SNR > 30 dB
  • PCM, ADPCM (BER $10^{-3}$)
  • five nines reliability
  • line echo cancellation
  • Voice over packet network
  • speech compression
  • delay, delay variation, jitter
  • packet loss/corruption/priority
  • echo cancellation

31
Subjective Voice Quality
SQM
  • Old Measures
  • 5/9
  • DRT
  • DAM
  • The modern scale
  • MOS
  • DMOS

meet neat seat feet Pete beat heat
32
MOS according to ITU
SQM
  • P.800 Subjective Determination of Transmission
    Quality
  • Annex B Absolute Category Rating (ACR)

    Score  Listening Quality  Listening Effort
    5      excellent          relaxed
    4      good               attention needed
    3      fair               moderate effort
    2      poor               considerable effort
    1      bad                no meaning with feasible effort
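
A MOS is simply the mean of the listeners' category ratings. The sketch below assumes ACR ratings on the 1 to 5 scale and adds a normal-approximation 95% confidence interval, which is a common reporting convention rather than something stated on the slide; the ratings themselves are made up.

```python
from math import sqrt
from statistics import mean, stdev

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width

ratings = [4, 4, 3, 5, 4, 3, 4, 5]     # hypothetical listener votes
mos, ci = mean_opinion_score(ratings)
print(mos, round(ci, 2))
```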

33
MOS according to ITU (cont)
SQM
  • Annex D Degradation Category Rating (DCR)
  • Annex E Comparison Category Rating (CCR)
  • ACR not good at high quality speech

    Score  DCR                CCR
     5     inaudible
     4     not annoying
     3     slightly annoying  much better
     2     annoying           better
     1     very annoying      slightly better
     0                        the same
    -1                        slightly worse
    -2                        worse
    -3                        much worse

34
Some MOS numbers
SQM
  • Effect of Speech Compression
  • (from ITU-T Study Group 15)

    Condition                                  MOS
    Quiet room, 48 kHz 16 bit linear sampling  5.0
    PCM (A-law/μ-law) 64 Kb/s                  4.1
    G.723.1 @ 6.3 Kb/s                         3.9
    G.729 @ 8 Kb/s                             3.9
    ADPCM G.726 32 Kb/s                        3.8  (toll quality)
    GSM @ 13 Kb/s                              3.6
    VSELP IS54 @ 8 Kb/s                        3.4

35
The Problem(s) with MOS
SQM
  • Accurate MOS tests are the only reliable
    benchmark
  • BUT
  • MOS tests are off-line
  • MOS tests are slow
  • MOS tests are expensive
  • Different labs give consistently different
    results
  • Most MOS tests only check one aspect of system

36
The Problem(s) with SNR
SQM
  • Naive question: isn't CCR the same as SNR?
  • SNR does not correlate well with subjective
    criteria
  • Squared difference is not an accurate comparator
  • Gain
  • Delay
  • Phase
  • Nonlinear processing
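
A short experiment shows why the squared difference misleads. In this sketch (an assumed 1 kHz test tone sampled at 8 kHz), a half-millisecond delay and a 6 dB gain change, both of which a listener would barely notice, produce poor SNR figures.

```python
import math

def snr_db(reference, test):
    """Conventional SNR: reference energy over energy of the sample-wise difference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(800)]
delayed = [0.0] * 4 + tone[:-4]        # 4 samples = half a period at 1 kHz
quieter = [0.5 * s for s in tone]      # same waveform, 6 dB lower

print(round(snr_db(tone, delayed), 1))   # negative: the "noise" is larger than the signal
print(round(snr_db(tone, quieter), 1))   # only about 6 dB for a pure level change
```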

37
Speech distance measures
SQM
  • Many objective measures have been proposed
  • Segmental SNR
  • Itakura Saito distance
  • Euclidean distance in Cepstrum space
  • Bark spectral distortion
  • Coherence Function
  • None correlate well with MOS
  • ITU target - find a quality-measure that does
    correlate well
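
For concreteness, the first measure in the list, segmental SNR, might be computed as below. The 160-sample frame (20 ms at 8 kHz) and the clamping limits of -10 and 35 dB are assumed conventions, not values from the slides.

```python
import math

def segmental_snr_db(reference, test, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Average of per-frame SNRs, each clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        tst = test[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - t) ** 2 for r, t in zip(ref, tst))
        if signal == 0:
            continue                                   # skip silent frames
        snr = ceil_db if noise == 0 else 10 * math.log10(signal / noise)
        values.append(min(max(snr, floor_db), ceil_db))
    if not values:
        raise ValueError("no non-silent frames to measure")
    return sum(values) / len(values)

tone = [math.sin(2 * math.pi * 1000 * k / 8000) for k in range(320)]
half = [0.5 * s for s in tone]
seg = segmental_snr_db(tone, half)
print(round(seg, 2))
```

Averaging in the dB domain keeps loud frames from dominating, but the measure still compares waveforms sample by sample.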

38
Return to Biology
SQM
  • Standard speech model (LPC)
  • (used by most speech processing/compression/
    recognition systems)
  • is a model of speech production
  • Unfortunately, speech production and perception
    systems
  • are not matched
  • Speech quality measurement idea
  • use a model of the human auditory system
    (perception)
  • ITU-T P.861 Perceptual Speech Quality Measurement
    (PSQM)
  • ITU-T P.862 Perceptual Evaluation of Speech
    Quality (PESQ)
  • ITU-R BS.1387 Objective Measurements of Perceived
    Audio Quality

39
Some objective methods
SQM
  • Perceptual Speech Quality Measurement (PSQM)
  • ITU-T P.861
  • Perceptual Analysis Measurement System (PAMS)
  • BT proprietary technique
  • Perceptual Evaluation of Speech Quality (PESQ)
  • ITU-T P.862
  • Objective Measurement of Perceived Audio Quality
    (PEAQ)
  • ITU-R BS.1387
  • E-model
  • ITU-T G.107, G.108 ETSI ETR-250

40
Objective Quality Strategy
SQM
speech
41
PSQM philosophy (from P.861)
SQM
[Figure: PSQM block diagram; labels: two Perceptual model blocks, each producing an Internal Representation, followed by Audible Difference and Cognitive Model]
42
PSQM philosophy (cont)
SQM
  • Perceptual Modelling (Internal representation)
  • Short time Fourier transform
  • Frequency warping (telephone-band filtering, Hoth
    noise)
  • Intensity warping
  • Cognitive Modelling
  • Loudness scaling
  • Internal cognitive noise
  • Asymmetry
  • Silent interval processing
  • PSQM Values
  • 0 (no degradation) to 6.5 (maximum degradation)
  • Conversion to MOS
  • PSQM to MOS calibration using known references
  • Equivalent Q values

43
Problems with PSQM
SQM
  • Designed for telephony grade speech codecs
  • Doesn't take network effects into account
  • filtering
  • variable time delay
  • localized distortions
  • Draft standard P.862 adds
  • transfer function equalization
  • time alignment, delay skipping
  • distortion averaging

44
PESQ philosophy (from P.862)
SQM
[Figure: PESQ block diagram; labels: Time Alignment, two Perceptual model blocks, each producing an Internal Representation, followed by Audible Difference and Cognitive Model]
45
E-model
SQM
  • R factor mouth to ear transmission quality model
  • $R = R_0 - I_s - I_d - I_e + A$
  • where
  • $R_0$: effect of SNR
  • $I_s$: effect of simultaneous impairments
  • $I_d$: effect of delayed impairments
  • $I_e$: effect of equipment distortion
  • $A$: advantage of method (e.g. mobility of
    cellphone)
    cellphone)
  • Defined in ITU-T G.107, G.108 and ETSI ETR-250
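
The R factor is usually reported together with an estimated MOS. The sketch below combines the sum from this slide with the R-to-MOS conversion given in G.107; the function names and the example impairment values are illustrative assumptions.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-model rating: R = R0 - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Convert an R factor to an estimated MOS (G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

# Hypothetical impairments applied to a starting value of 93.2.
r = r_factor(r0=93.2, i_s=0.0, i_d=5.0, i_e=11.0)
print(round(r, 1), round(r_to_mos(r), 2))
```

With all impairments at zero, R = 93.2 maps to a MOS of about 4.4, which is why that value is often quoted as the ceiling for narrowband telephony.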

46
VQMon
SQM
  • PSQM and PESQ are intrusive techniques
  • PSQM and PESQ require on-line DSP processing
  • Given the speech encoder,
  • shouldn't there be a connection
  • between network parameters (e.g. packet loss,
    jitter)
  • and speech quality?
  • A nonintrusive technique has been developed
  • based on the E-model
  • Invented by AD Clark (Telchemy) accepted by ETSI
    TIPHON
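
To make the idea concrete, here is a simplified E-model-style estimate driven only by network parameters. It is not VQmon itself: it assumes random packet loss, uses the G.107 form for the effective equipment impairment, and borrows a widely used piecewise-linear approximation for the delay impairment. The codec values Ie = 11 and Bpl = 19 are illustrative assumptions.

```python
def delay_impairment(delay_ms):
    """Approximate Id from one-way delay (simplified piecewise-linear fit, an assumption)."""
    excess = max(delay_ms - 177.3, 0.0)
    return 0.024 * delay_ms + 0.11 * excess

def effective_equipment_impairment(ie, bpl, loss_percent):
    """Ie,eff for random loss: Ie + (95 - Ie) * Ppl / (Ppl + Bpl)."""
    return ie + (95.0 - ie) * loss_percent / (loss_percent + bpl)

def estimate_r(delay_ms, loss_percent, ie, bpl, r_default=93.2):
    """R estimate from delay and loss, with all other impairments at defaults."""
    return (r_default
            - delay_impairment(delay_ms)
            - effective_equipment_impairment(ie, bpl, loss_percent))

clean = estimate_r(delay_ms=100, loss_percent=0.0, ie=11.0, bpl=19.0)
lossy = estimate_r(delay_ms=100, loss_percent=5.0, ie=11.0, bpl=19.0)
print(round(clean, 1), round(lossy, 1))
```

No reference signal is needed, so an estimate of this kind can run continuously on live traffic using statistics the endpoint already collects.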