Title: Lingfen Sun
1Impact of Packet Loss Location on Perceived
Speech Quality
- Lingfen Sun
- Graham Wade, Benn Lines
- Emmanuel Ifeachor
- University of Plymouth, U.K.
- L.F.Sun_at_jack.see.plym.ac.uk
- j.wade,B.Lines,E.Ifeachor_at_plym.ac.uk
2Outline
- Introduction
- Codec's internal concealment and convergence time
- Perceptual speech quality measurement
- Simulation system
- Loss location with perceived quality
- Loss location with convergence time
- Conclusions and future work
3Introduction
- End-to-end speech transmission quality
  - IP network performance (e.g. packet loss and jitter)
  - Gateway/terminal (codec loss/jitter compensation)
- Impact of packet loss on perceived speech quality
  - Loss pattern (e.g. burst/random)
  - Loss location (codec's concealment)
4Introduction (cont.)
- Previous research on loss location
  - Concealment performance is speech content related (e.g. voiced/unvoiced)
  - Analysis based on MSE or SNR, for a limited set of codecs
  - Perceptual objective methods used only to assess overall quality under stochastic loss simulations
- Questions
  - How does packet loss location affect perceived speech quality?
  - How does packet loss location affect the codec's convergence time (for a loss constraint)?
5Codec's internal concealment
- What is codec concealment?
  - When a loss occurs, the decoder interpolates the parameters for the lost frame from the parameters of previous frames.
- Which codecs have a concealment algorithm?
  - G.729/G.723.1/AMR (main VoIP codecs)
  - CELP analysis-by-synthesis
- What are the limitations of concealment algorithms?
  - During unvoiced (u) or voiced (v) segments
  - During u/v transitions
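The concealment idea above can be illustrated with a minimal sketch. It is not the G.729, G.723.1 or AMR algorithm: the `FrameParams` fields, the `conceal` function and the 0.9 attenuation factor are hypothetical simplifications, chosen only to show a lost frame being rebuilt from the previous frame's parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameParams:
    """Hypothetical, heavily simplified CELP-style frame parameters."""
    pitch_delay: int
    gain: float


def conceal(frames: List[Optional[FrameParams]],
            attenuation: float = 0.9) -> List[FrameParams]:
    """Replace lost frames (None) with parameters taken from the last frame.

    Assumption: a lost frame reuses the previous pitch delay and an
    attenuated copy of the previous gain. Real codecs are more elaborate.
    """
    out: List[FrameParams] = []
    last = FrameParams(pitch_delay=0, gain=0.0)  # silence before any frame
    for frame in frames:
        if frame is None:
            # Lost frame: extrapolate from the most recent output frame.
            frame = FrameParams(last.pitch_delay, last.gain * attenuation)
        out.append(frame)
        last = frame
    return out


received = [FrameParams(40, 1.0), None, None, FrameParams(42, 0.8)]
concealed = conceal(received)
```

If the frame before the loss is unvoiced and the lost frame starts a voiced segment, this kind of extrapolation has nothing voiced to copy from, which is one way to picture the u/v limitation listed above.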
6Codec's convergence time
- What is convergence time?
  - The time taken by the decoder to resynchronize its state with the encoder after a loss occurs. It is also called resynchronization time.
  - Used to set a loss constraint distance between two consecutive losses for new packet loss metrics
- What is the relationship between convergence time and loss location, codec type and packet size?
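One possible way to turn this definition into a measurement is sketched below. It assumes the decoder output without loss and the decoder output with loss are available as sample sequences, and it counts the time from the lost frame until the per-frame MSE between them stays below a threshold. The function names, the threshold value and the choice to count the lost frame itself are assumptions for illustration; the 10 ms default matches the G.729 frame length.

```python
from typing import List, Sequence


def frame_mse(a: Sequence[float], b: Sequence[float],
              frame_len: int) -> List[float]:
    """Mean squared error between two signals, one value per whole frame."""
    n_frames = min(len(a), len(b)) // frame_len
    return [
        sum((a[i * frame_len + k] - b[i * frame_len + k]) ** 2
            for k in range(frame_len)) / frame_len
        for i in range(n_frames)
    ]


def convergence_time(no_loss: Sequence[float], with_loss: Sequence[float],
                     frame_len: int, loss_frame: int,
                     threshold: float = 1e-4,
                     frame_ms: float = 10.0) -> float:
    """Milliseconds from the lost frame until the two outputs agree again.

    Assumption: the decoder counts as resynchronized after the last frame
    whose MSE against the loss-free output exceeds the threshold.
    """
    mse = frame_mse(no_loss, with_loss, frame_len)
    diverged = [i for i in range(loss_frame, len(mse)) if mse[i] > threshold]
    if not diverged:
        return 0.0
    return (diverged[-1] - loss_frame + 1) * frame_ms


# Synthetic signals: frames 1 and 2 differ after a loss in frame 1.
clean = [0.0] * 40
lossy = list(clean)
lossy[10:20] = [1.0] * 10
lossy[20:30] = [0.5] * 10
t_conv = convergence_time(clean, lossy, frame_len=10, loss_frame=1)
```

The same structure would work with a perceptual distance in place of the MSE, provided that distance can be evaluated frame by frame.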
7Perceptual quality measurement
[Block diagram: the reference signal and the degraded signal are the inputs to an objective perceptual quality test, which outputs an objective MOS]
- Transform the signals into a psychophysical representation approximating human perception
- Calculate their perceptual difference
- Map the difference to an objective MOS (Mean Opinion Score)
- Algorithms: PSQM/PSQM+/MNB/EMBSD/PESQ
8Simulation System
[Block diagram. Blocks: encoder, loss simulation, decoder (two instances), convergence time analysis, perceptual quality measure. Signals: reference speech, bitstream, degraded speech without loss, degraded speech with loss]
- Perceptual speech quality analysis with loss location
- Convergence time analysis with loss location
9Speech test sentence
- The speech test sentence is about 6 seconds long.
- The first talkspurt (about 1.34 seconds, waveform above) is used for loss location analysis.
- It contains four voiced segments, V(1) to V(4), which can be identified from the pitch delay in the G.729 codec.
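As a rough illustration of how voiced segments might be read off a pitch delay track, the sketch below groups consecutive frames whose pitch delay changes only slightly from frame to frame. The smoothness criterion, the `tol` and `min_len` values and the example track are assumptions made here, not the procedure used in the talk.

```python
from typing import List, Sequence, Tuple


def voiced_segments(pitch_delay: Sequence[int], tol: int = 3,
                    min_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, with a smooth delay.

    Assumption: a voiced segment shows a slowly varying pitch delay, so a
    run of at least min_len frames with jumps of at most tol counts as
    voiced.
    """
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(pitch_delay) + 1):
        # A run ends at the end of the track or at a jump larger than tol.
        at_end = i == len(pitch_delay)
        if at_end or abs(pitch_delay[i] - pitch_delay[i - 1]) > tol:
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    return segments


# Synthetic pitch delay values, one per frame (not data from the talk).
track = [20, 95, 33, 60, 61, 60, 62, 61, 110, 25, 80, 81, 82, 81, 80]
segments = voiced_segments(track)
```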
10Pitch delay from G.729 codec
[Figure: pitch delay from the G.729 codec, with voiced segments V(1) to V(4) marked]
11Loss location with perceived quality
- Only one packet loss is created each time
- The loss position moves from left to right, one frame at a time
- Overall perceptual quality is measured with PSQM/PSQM+, MNB and EMBSD
- Packet size: 1 to 4 frames/packet
- Codecs: G.729/G.723.1/AMR
- How does loss location affect perceived speech quality?
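The sweep described above can be sketched as a loop over single-loss patterns. The helper names are hypothetical, and `decode` and `score` stand in for a real codec decoder and an objective quality measure, neither of which is implemented here. The sketch also assumes that the start of the lost packet slides by one frame even when a packet carries several frames.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def single_loss_masks(num_frames: int, frames_per_packet: int
                      ) -> Iterator[Tuple[int, List[bool]]]:
    """Yield (start_frame, mask); mask[i] is True if frame i is lost.

    Exactly one packet of frames_per_packet frames is lost per pattern,
    and the start of the loss moves one frame to the right each time.
    """
    for start in range(num_frames - frames_per_packet + 1):
        end = start + frames_per_packet
        yield start, [start <= i < end for i in range(num_frames)]


def sweep(frames: Sequence[object], frames_per_packet: int,
          decode: Callable[[List[Optional[object]]], object],
          score: Callable[[object], float]) -> List[Tuple[int, float]]:
    """Score the decoded output once for every single-loss position."""
    results = []
    for start, mask in single_loss_masks(len(frames), frames_per_packet):
        # Lost frames are passed to the decoder as None.
        received = [None if lost else f for f, lost in zip(frames, mask)]
        results.append((start, score(decode(received))))
    return results


masks = list(single_loss_masks(5, 2))
```

Running `sweep` once per packet size (1 to 4 frames) and per codec would give one quality-versus-position curve for each combination.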
12Loss position with quality (1)
[Figure: reference speech and degraded speech waveforms with the loss position marked, and PSQM plots]
13Loss position with quality (2)
[Figure: reference speech and degraded speech waveforms with the loss position marked, and PSQM plots]
14Loss position with quality (3)
[Figure: reference speech and degraded speech waveforms with the loss position marked, and PSQM plots]
15Loss position with quality (4)
[Figure: reference speech and degraded speech waveforms with the loss position marked, and PSQM plots]
16Overall PSQM vs loss location (G.729)
[Figure: overall PSQM score versus loss location for G.729]
17Overall MNB vs loss location (G.729)
[Figure: overall MNB score versus loss location for G.729]
18Overall EMBSD vs loss location (G.729)
[Figure: overall EMBSD score versus loss location for G.729]
19Overall PSQM vs loss location (G.723.1)
[Figure: overall PSQM score versus loss location for G.723.1]
20Loss location with perceived quality
- Loss location affects perceived quality.
- A loss in an unvoiced speech segment has no obvious impact on perceived quality.
- A loss at the beginning of a voiced segment has the most severe impact on perceived quality.
- PSQM yields the most detailed results compared with MNB/EMBSD.
21Convergence time based on MSE
[Figure: MSE-based convergence time for G.729]
22Convergence time based on PSQM
23Convergence time based on PSQM
24Loss location with convergence time
- Convergence time is almost the same for different packet sizes
- Convergence time for a loss in unvoiced segments appears stable
- Convergence time shows a good linear relationship with loss position for losses in voiced segments
  - Maximum at the beginning
  - Linearly descending
  - Upper-bounded by the end of the voiced segment
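One way to read the last three points is that a loss inside a voiced segment stops affecting the decoder no later than the end of that segment, so the bound shrinks by one frame length for every frame the loss moves to the right. The sketch below only computes that bound; the function name and the segment boundaries in the example are hypothetical, and the 10 ms default is the G.729 frame length.

```python
def convergence_upper_bound_ms(loss_frame: int, segment_end_frame: int,
                               frame_ms: float = 10.0) -> float:
    """Time from the lost frame to the end of its voiced segment.

    Assumption: segment_end_frame is exclusive and the lost frame lies
    inside the segment, so the bound falls linearly with loss position.
    """
    if loss_frame >= segment_end_frame:
        raise ValueError("loss_frame must lie before segment_end_frame")
    return (segment_end_frame - loss_frame) * frame_ms


# Hypothetical voiced segment covering frames 3..7 (end exclusive at 8).
bounds = [convergence_upper_bound_ms(f, 8) for f in range(3, 8)]
```

The bound is largest for a loss in the first frame of the segment and decreases linearly, which matches the shape described above; the measured convergence times themselves are not reproduced here.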
25Conclusions and future work
- Investigated the impact of loss location on perceived speech quality
- Investigated the impact of loss location on convergence time
- The results will help in developing a perceptually relevant packet loss metric.
- Future work will focus on more extensive analysis of the impact of packet loss on speech content