Title: Voice Over Internet Protocol
1 Voice Over Internet Protocol
- Chapter 3
- Speech-Coding Techniques
2 Speech-Coding Techniques
- Voice Quality
- A little about speech
- Voice Sampling
- Quantization
- Types of Speech Coders
- G.711
- Adaptive Differential PCM (ADPCM)
- Analysis-by-Synthesis (AbS) Codecs
- G.728 Low-Delay CELP
- G.723.1 Algebraic Code-Excited Linear Prediction (ACELP)
- G.729
- Selecting Codecs
- Cascading Codecs
- Tones, Signals, and Dual-Tone Multifrequency (DTMF) Digits
3 Voice Quality
- MOS (Mean Opinion Score)
- Excellent - 5
- Good - 4
- Fair - 3
- Poor - 2
- Bad - 1
- Perceptual Speech Quality Measurement (PSQM)
- ITU-T P.861
- Used to evaluate speech coding
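As a small illustration of how a MOS is obtained, the sketch below averages listener ratings on the five-point scale listed above. The function name and the sample ratings are invented for this example and do not come from the slides.

```python
from statistics import mean

# Five-point opinion scale from the slide.
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mean_opinion_score(ratings):
    """Average a list of listener ratings, each an integer from 1 to 5."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

# Hypothetical ratings from ten listeners for one codec.
ratings = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
print(f"MOS = {mean_opinion_score(ratings):.1f}")
```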
4 A little about Speech
5??????
- ??????
- ?????????????,????????,???????????????????????????
?????? - ??,?????????????????????????,????????????????.
- ??????????,?????????,????????
- ????????????,????????????,???(??,bouncing),
??(????????????????????(????????)? - ???????????????,????????????????
6???(Digitization)
- ??????????????
- ????????????????????????,?????
7???(Digitization)
- ?????????? ????????,???????????????????????
- ??????????????????,???????????????????
- ???????,?????????????????? ?????????????(sampling
frequency) (???a)? - ?????,????????8 kHz (8,000 samples per second)
?48 kHz. ???????????Nyquist??(?????)???? - ??????????????quantization(???b)?
9???
- ???????????????,???????????
- ????????
- ???????????,??????????
- ?????????? (????)
10 Nyquist Theorem
- ???????????????,???????????????????????
11 Nyquist Theorem
- Nyquist?????????????????????????
- ??(a)??????????????????? (???????????????)?
- ???????????????,??(b)?????????????????????????
- ??,???????1.5?,??(c)???????????????? (alias)
,?????????????????? (????????????)? - ?????????????????????,??????Nyquist rate?
14 Nyquist Theorem
- Nyquist Theorem ???????band-limited(????????????
f1 ??????f2,??????????2(f2 - f1). - Nyquist frequency Nyquist??????
- ???????????Nyquist frequency ???,??????????antialiasing filter?????????????????Nyquist frequency????
- ????,?????Alias Frequency?????
- $f_{alias} = f_{sampling} - f_{true}$, for $f_{true} < f_{sampling} < 2 f_{true}$
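The alias-frequency relation above can be checked numerically. The following is a minimal sketch; the function name and the 5.5 kHz example tone are assumptions chosen for illustration.

```python
def alias_frequency(f_true, f_sampling):
    """Apparent (alias) frequency when f_true < f_sampling < 2 * f_true."""
    if not (f_true < f_sampling < 2 * f_true):
        raise ValueError("formula only covers f_true < f_sampling < 2 * f_true")
    return f_sampling - f_true

# Hypothetical case: a 5.5 kHz tone sampled at 8 kHz (below its Nyquist rate of 11 kHz).
print(alias_frequency(5500, 8000))  # 2500
```

Sampling the same tone at 11 kHz or more falls outside the formula's range, which is why the function rejects it rather than returning a value.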
15??????Signal to Noise Ratio (SNR)
- ???????????????signal to noise ratio (SNR)
- ??????????
- SNR????decibel???(dB),?1 dB??1bel???????db???SNR??
?????????10?????,??????
- $\mathrm{SNR} = 10 \log_{10} \dfrac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \dfrac{V_{signal}}{V_{noise}}$
16??????Signal to Noise Ratio (SNR)
- ????????????????. ?? ????????????????,
?SNR??$20 \log_{10}(10) = 20$ dB.
- ?????,10????????????????????,???????SNR??10 dB, or 1 B.
- ??????????????????decibel???,????????????????????,
??????????????
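The two worked figures on this slide (20 dB for a 10:1 voltage ratio, 10 dB for a 10:1 power ratio) follow directly from the SNR definition. A minimal sketch, with function names chosen here for illustration:

```python
import math

def snr_db_from_voltage(v_signal, v_noise):
    """SNR in dB from a voltage (amplitude) ratio: 20 * log10(Vs / Vn)."""
    return 20 * math.log10(v_signal / v_noise)

def snr_db_from_power(p_signal, p_noise):
    """SNR in dB from a power ratio: 10 * log10(Ps / Pn)."""
    return 10 * math.log10(p_signal / p_noise)

print(snr_db_from_voltage(10, 1))  # signal voltage 10x the noise voltage
print(snr_db_from_power(10, 1))    # signal power 10x the noise power
```

The factor of 20 for voltages comes from power being proportional to voltage squared, so both functions describe the same quantity.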
18????????
- ????????????????????
- ???????(Non-uniform quantization)
?????????????????????????
- Weber's Law???????????????????????
- $\Delta\mathrm{Response} \propto \Delta\mathrm{Stimulus} / \mathrm{Stimulus}$
- ???????k???,???????????????(???? r ????? s)
- $dr = k\,(1/s)\,ds$, with response $r$ and stimulus $s$.
19????????
- ????????????????$r = k \ln s + C$, $C$ ???????????
- ??????????$r = k \ln(s/s_0)$
- $s_0$ ??????????????($r = 0$ ?$s = s_0$).
- ?????????????????s?????????r??,???????????
- ??????????µ-law??,(?u-law)??????????????A-law,????
??????????? - ?????????????????
20????????
21 µ = 100 ? 255; A = 87.6 ???????????????
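To make the logarithmic companding idea concrete, here is a minimal sketch of the standard µ-law curve, r = sgn(s) ln(1 + µ|s|) / ln(1 + µ), and its inverse. It assumes the input has already been normalized to the range [-1, 1]; the function names are illustrative, and µ = 255 is one of the two values quoted on the slide.

```python
import math

def mu_law_compress(s, mu=255.0):
    """Map a normalized sample s in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(s)) / math.log1p(mu), s)

def mu_law_expand(r, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(r) * math.log1p(mu)) / mu, r)

for s in (0.01, 0.1, 0.5, 1.0):
    r = mu_law_compress(s)
    print(f"s={s:<5} r={r:.3f} back={mu_law_expand(r):.3f}")
```

Small inputs occupy a disproportionately large share of the output range, so a uniform quantizer applied to r spends more levels on quiet sounds, in line with the Weber's Law argument.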
22?????(filter)
- ????AD????,??????????????????????? ?????????????
- ?????,??????50Hz?10kHz??,??????????????????????band-pass filter?????????
- ????????????20Hz?20kHz??
- ?DA ??????,?????????
- ???????,?????????????????????step functions????
- ??????,?DA?????????low-pass filter?
23?????????
- ???????????,??????????????????(Stereo)???????????
, ??????
24????????
- ??,???????????????? PCM (Pulse Code Modulation)???????? DPCM (??????????)??????????ADPCM?
25 Pulse Code Modulation
- ????????????????????????(sampling and quantization).
- ?????????????,??????????????????????????(?????)?
27?????PCM
- ?????????????????????,?????????????
- ?????????????????????
- ??????????(extraneous).
- ???????????????low-pass filter,???????????????????
?????????
28?????PCM
- ??????????????????????low-pass filtering,?????????
?,???(c)????
29???????
- ????????PCM????,????????????????,???????????????,?
?????????????????? - ????????????????????? (temporal
redundancy),???????,??????????????,???????0??????
??????(histogram)?
30???????
- ????????,????????????????,???????????????
???????????????(spike)? - ??????????????????,??????????????,???????????
31?????????Lossless Predictive Coding
- Predictive coding ??????,?????????????????,??????
???,?????????? - ???????????,????PCM???????
- ????????????,??????????fn??? ???????????????? ?,
???????en???? ????????????
32?????????Lossless Predictive Coding
- ?????????????????????fn-1, fn-2,
fn-3?,??????????, ????????????
33?????????Lossless Predictive Coding
- ???????????????????????
- ????(a)???8kHz????????,?????????8?????
- ???(b),????????????0????
- ???(c)????????????????????????0????
- ?????????????????0,??????????????????????????????
35?????????
- ?? ??????????????0..255?,??????????-255..255????????????????????????????????????
- ????????????????????,??SU?SD, ????Shift-Up?Shift-Down????????????????
- ???????????????????????,??????????-15..16?????????????????????????????(ShiftSU?SD)?,??????????-15..16?????
- ??100??????? SU, SU, SU, 4?
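The Shift-Up / Shift-Down scheme can be sketched as follows. The shift size is not legible on the slide; the value 32 used here is an assumption, chosen because it matches the width of the -15..16 range and reproduces the slide's example of 100 being sent as SU, SU, SU, 4. Function names are illustrative.

```python
SHIFT = 32            # assumed shift size (not legible in the slide)
LOW, HIGH = -15, 16   # limited range of differences that are coded directly

def encode_difference(d):
    """Code one difference as zero or more SU/SD shifts plus an in-range value."""
    codes = []
    while d > HIGH:
        codes.append("SU")
        d -= SHIFT
    while d < LOW:
        codes.append("SD")
        d += SHIFT
    codes.append(d)
    return codes

def decode_difference(codes):
    """Rebuild the difference from the shift codes and the final value."""
    shifts = codes.count("SU") - codes.count("SD")
    return shifts * SHIFT + codes[-1]

print(encode_difference(100))   # ['SU', 'SU', 'SU', 4]
print(encode_difference(-40))   # ['SD', -8]
```

Because the directly coded range is exactly one shift wide, every difference in -255..255 ends up with a unique code sequence.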
36?????????
- Lossless predictive coding????????????????????????
???????????????
37?????????
- ???????????,??????? f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. ??, ????????????f1 = 21,??????????????
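The sample sequence 21, 22, 27, 25, 22 can be run through a small lossless predictive coder. The predictor itself is not legible in the slide text, so this sketch assumes a common choice: predict each sample as the floor of the average of the two previous samples, with the first sample duplicated to start the process and sent uncoded. All names are illustrative.

```python
def encode(samples):
    """Return the first sample plus the prediction errors for the rest."""
    prev2 = prev1 = samples[0]          # assumed start-up: duplicate f1
    errors = []
    for f in samples[1:]:
        prediction = (prev1 + prev2) // 2   # assumed predictor
        errors.append(f - prediction)
        prev2, prev1 = prev1, f
    return samples[0], errors

def decode(first, errors):
    """Invert encode(): the decoder runs the same predictor on decoded samples."""
    out = [first]
    prev2 = prev1 = first
    for e in errors:
        f = (prev1 + prev2) // 2 + e
        out.append(f)
        prev2, prev1 = prev1, f
    return out

first, errors = encode([21, 22, 27, 25, 22])
print(first, errors)           # 21 [1, 6, 1, -4]
print(decode(first, errors))   # [21, 22, 27, 25, 22]
```

The errors cluster near zero, which is the property that makes them cheaper to entropy-code than the raw samples.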
38 DPCM
- Differential PCM???????????,??????????????
- ?????????????????????,???????????,?????Lloyd-Max
???, ????????????????? - ???? ??? fn ??????, ?????, ?? ???????????
39 DPCM
- DPCM ????,???????????????? en,???????????
??DPCM??????? - ??????huffman coding??????????
40 DPCM
- ????????????????????????? ? ? ? ? ?
- ??????(distortion)???????? ? ? ? ? ? ?
?,????????????????????Lloyd- Max
??????????????????
41 DPCM
- ?????,???????????????????????????????,??????????,?
?????? ??????i??, ??????N?????,?????????????
42 Types of Speech Coders
- Waveform codecs
- High-quality output with low complexity
- Require large amounts of bandwidth
- Quality degrades significantly at lower bandwidth
- Source codecs (vocoders)
- Match the incoming signal to a mathematical model
- Use a linear predictive filter
- A set of model parameters replaces the signal itself
- Used for private communications such as military applications
- Hybrid codecs
- Perform a degree of waveform matching
- Utilize knowledge of how people produce sound in the first place
43 PCM
- Sample rate 8 kHz
- Uniform quantization
- 12 bits per sample (96 kbps)
- Non-uniform quantization
- A-law and µ-law
- 8 bits per sample (64 kbps)
- MOS 4.3
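The two bit rates on this slide are simply the sample rate multiplied by the bits per sample. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a PCM stream in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate_hz * bits_per_sample / 1000

print(pcm_bit_rate_kbps(8000, 12))  # uniform quantization: 96.0
print(pcm_bit_rate_kbps(8000, 8))   # A-law / mu-law companding: 64.0
```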
44 ADPCM
- G.721 offers ADPCM-coded speech at 32 kbps
- G.721 has now been superseded by G.726
- G.726
- A-law or µ-law PCM input
- Converted to 16 kbps, 24 kbps, 32 kbps (MOS 4.0), and 40 kbps
45 Analysis-by-Synthesis (AbS) Codecs
- Hybrid coders can provide relatively acceptable quality at rates down to 16 kbps
- Vocoders can provide intelligible speech at 2.4 kbps and lower
- The most successful and commonly used are time-domain AbS codecs
- Use a linear prediction filter to model the vocal tract
- Same filter model as the Linear Predictive Coding (LPC) vocoder
- Do not use a simple two-state voiced/unvoiced model
- Instead, the excitation signal is chosen by attempting to match the reconstructed speech waveform as closely as possible to the original speech
- MPE, RPE, CELP
46 G.728 Low-Delay CELP
- CELP (Code-Excited Linear Predictive)
- Filter
- Its characteristics change over time
- A codebook of acoustic vectors
- Each vector contains a set of elements that represent various characteristics of the excitation signal
- Transmits a set of information
- Filter coefficients, gain, and a pointer to the chosen excitation vector
- Sender and receiver have the same codebook
- G.728
- Operates on five samples at a time (sampled at 8 kHz)
- 1024 vectors, so the index to a vector is 10 bits
- 16 kbps
- MOS 3.9
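The 16 kbps figure follows from the numbers on this slide: one 10-bit codebook index is sent for every five samples at 8 kHz. A short sketch of that arithmetic (constant and variable names are illustrative):

```python
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_VECTOR = 5
CODEBOOK_SIZE = 1024

# Bits needed to address any vector in the codebook.
index_bits = (CODEBOOK_SIZE - 1).bit_length()

vectors_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_VECTOR
bit_rate_bps = vectors_per_second * index_bits
vector_duration_ms = 1000 * SAMPLES_PER_VECTOR / SAMPLE_RATE_HZ

print(index_bits)          # 10
print(bit_rate_bps)        # 16000.0
print(vector_duration_ms)  # 0.625
```

Each vector spans only 0.625 ms of speech, which is where the "low-delay" in the codec's name comes from.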
47 G.728 Low-Delay CELP encoder
48 G.728 Low-Delay CELP decoder
49 G.723.1
- Algebraic Code-Excited Linear Prediction (ACELP)
- 6.3 kbps or 5.3 kbps (8,000 Hz sampling)
- Operates on frames of 240 samples (30 ms delay)
- 4 subframes of 60 samples
- Uses a look-ahead of 7.5 ms (37.5 ms in total)
- Silence suppression
- SID (Silence Insertion Descriptor), 4 bytes
- MOS 3.8
50 G.729
- 8 kbps, frames of 80 samples (8 kHz sampling, 10 ms per frame)
- 5 ms look-ahead (15 ms in total)
- Frame size is 80 bits
- MOS 4.0
- G.729 Annex A
- MOS 3.7
- G.729 Annex B
- G.729 Annex D
- G.729 Annex E
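The frame and delay figures quoted for G.723.1 and G.729 can be reproduced from the frame length, look-ahead, and bit rate. The sketch below is only that arithmetic; the class and attribute names are invented here, and the bits-per-frame value is the nominal rate times the frame duration, ignoring any packing or padding a real bitstream may use.

```python
from dataclasses import dataclass

@dataclass
class FrameCodec:
    name: str
    bit_rate_bps: int
    frame_samples: int
    lookahead_ms: float
    sample_rate_hz: int = 8000

    @property
    def frame_ms(self):
        """Duration of one frame in milliseconds."""
        return 1000 * self.frame_samples / self.sample_rate_hz

    @property
    def algorithmic_delay_ms(self):
        """Frame duration plus look-ahead."""
        return self.frame_ms + self.lookahead_ms

    @property
    def nominal_bits_per_frame(self):
        """Nominal bit rate multiplied by the frame duration."""
        return self.bit_rate_bps * self.frame_ms / 1000

g7231 = FrameCodec("G.723.1", 6300, 240, 7.5)
g729 = FrameCodec("G.729", 8000, 80, 5.0)

for codec in (g7231, g729):
    print(codec.name, codec.frame_ms, codec.algorithmic_delay_ms)
print(g729.nominal_bits_per_frame)  # 80.0
```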
52 Selecting Codecs
- CDMA QCELP
- RFC 2658
- GSM (Enhanced Full Rate, EFR)
- RFC 1890
- Adaptive Multi-Rate (AMR)
- G.711 does not incorporate any logic to deal with packet loss
- G.729 can accommodate a lost frame by interpolating from the previous frame
53 Cascaded Codecs
- Minimize the number of times that a given speech signal is coded and decoded
- In some cases, a VoIP implementation may make cascaded codecs unavoidable
- Ensure that the quality does not degrade
54 Tones, Signals, and Dual-Tone Multifrequency (DTMF) Digits
- The most sophisticated codecs available today achieve bandwidth efficiency without losing significant quality, thanks to smart algorithms and powerful DSPs
- They are based on how voice is produced in the first place
- Tones and beeps also need to be transmitted
- Fax tones, various tones such as busy tones, and DTMF (two-stage dialing, voice mail retrieval, and other applications)
- G.711 can handle these tones; G.723.1 and G.729 cannot
- Use a gateway to handle tones and speech in different ways
- IP network vs. circuit-switched network
- Speech (RTP), external signaling (H.323, SIP)
- RTP data can represent the tone/event
55 RTP payload format for named events/tones
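The slide title matches the named-event payload defined in RFC 2833 (later revised by RFC 4733). Assuming that is the format meant, the sketch below packs the 4-byte event payload: an 8-bit event code, an end (E) bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The function names and the example values are illustrative, and the RTP header itself is not built here.

```python
import struct

# DTMF event codes: digits 0-9 map to 0-9, '*' to 10, '#' to 11, A-D to 12-15.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}

def pack_dtmf_event(digit, duration, volume=10, end=False):
    """Build the 4-byte named-event payload for one DTMF digit.

    duration is in RTP timestamp units (800 = 100 ms at 8 kHz);
    volume is the tone level in -dBm0, 0..63.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_volume = (int(end) << 7) | volume   # E bit, reserved bit 0, volume
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags_volume, duration)

def unpack_dtmf_event(payload):
    """Return (event code, end flag, volume, duration) from a 4-byte payload."""
    event, flags_volume, duration = struct.unpack("!BBH", payload)
    return event, bool(flags_volume & 0x80), flags_volume & 0x3F, duration

payload = pack_dtmf_event("5", duration=800, end=True)
print(payload.hex())               # 058a0320
print(unpack_dtmf_event(payload))  # (5, True, 10, 800)
```

Sending the digit as an event rather than as coded audio is what lets low-bit-rate codecs such as G.723.1 and G.729 carry DTMF reliably.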