ON THE ARCHITECTURE OF THE CDMA2000 VARIABLE-RATE MULTIMODE WIDEBAND (VMR-WB) SPEECH CODING STANDARD
Milan Jelinek, Redwan Salami, Sassan Ahmadi, Bruno Bessette, Philippe Gournay and Claude Laflamme
University of Sherbrooke, Canada - VoiceAge Corp., Canada - Nokia Inc., USA
Encoder Flow Chart
  • VMR-WB
  • Variable-Rate Multi-Mode Wideband Speech Codec
  • New 3GPP2 WB Speech Coding Standard for 3G
    applications
  • Main Features
  • Near Face-to-Face Communication Speech Quality
  • Source and Channel Controlled Operation (4
    Modes)
  • 3GPP/ITU AMR-WB Directly Interoperable in Mode 3
  • Average Bit Rates (ABR)
  • Compliant with CDMA2000 Rate Set 2

VMR-WB Coding Techniques
  • Source-Controlled Operation
  • Hierarchical Signal Classification
  • Operating on Frame-level

1. Voice Activity Detection (VAD)
2. Unvoiced Frame Decision
Based on the following parameters
Coding type      Name              Bit rate (kbit/s)  Description
Inactive speech  CNG ER            1.0                Noise-excited LP filter; smoothed over time
Inactive speech  CNG QR            2.7                As previous, but interoperable with AMR-WB CNG
Unvoiced         Unvoiced HR       6.2                13-bit Gaussian codebook (4x/frame)
Unvoiced         Unvoiced QR       2.7                As previous, but randomly chosen vectors
Voiced           Voiced HR         6.2                Frame-level signal modification; 12-bit ACELP codebook (4x/frame)
Generic          Interoperable FR  13.3               Similar to AMR-WB at 12.65 kbit/s
Generic          Generic FR        13.3               As previous, plus FER protection
Generic          Interoperable HR  6.2                As Interoperable FR, but with random algebraic codebook indices
Generic          Signaling HR      6.2                As previous, plus FER protection
Generic          Generic HR        6.2                Pitch coded 2x/frame; 12-bit ACELP codebook (4x/frame)
  • Normalized Correlation

T: open-loop pitch period estimate
x_i: perceptually weighted input signal
  • Spectral Tilt

Eh: average energy of the last 2 critical bands
El: average energy of the pitch-synchronous bins in the first 10 critical bands
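
The formulas for these two parameters did not survive in this transcript. As a minimal sketch, assuming the usual definition of normalized correlation at lag T and assuming the tilt is the ratio El/Eh (both are assumptions, as are the function names), they could be computed as follows:

```python
import math

def normalized_correlation(x, T):
    """Correlation between x[i] and x[i - T], normalized to [-1, 1].

    x is the perceptually weighted input signal, T the open-loop pitch
    period estimate in samples (assumed definition, not from the poster).
    """
    if not 0 < T < len(x):
        raise ValueError("T must satisfy 0 < T < len(x)")
    cur = x[T:]      # x[i]
    past = x[:-T]    # x[i - T]
    num = sum(a * b for a, b in zip(cur, past))
    den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
    return num / den if den > 0.0 else 0.0

def spectral_tilt(e_low, e_high, eps=1e-12):
    """Assumed definition: low-band average energy over high-band average energy."""
    return e_low / max(e_high, eps)
```

A strongly periodic frame gives a correlation close to 1 at its pitch period, while noise-like (unvoiced) frames give values near 0.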
Mode     Active speech (kbit/s)   40% speech activity (kbit/s)
Mode 3   13.3                     6.1
Mode 0   12.8                     5.7
Mode 1   10.5                     4.8
Mode 2   8.1                      3.8
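
As a quick consistency check on these averages, the sketch below weights the active-speech rate by the speech activity factor and assumes that all inactive frames cost 1.0 kbit/s (the CNG ER rate). That inactive-frame rate and the function name are assumptions, not figures stated for each mode.

```python
def average_bit_rate(active_kbps, activity=0.4, inactive_kbps=1.0):
    """Average bit rate for a given speech activity factor.

    inactive_kbps=1.0 assumes every inactive frame is coded with CNG ER.
    """
    return activity * active_kbps + (1.0 - activity) * inactive_kbps

for mode, active in (("Mode 0", 12.8), ("Mode 1", 10.5), ("Mode 2", 8.1)):
    print(mode, round(average_bit_rate(active), 1))
# Mode 0 5.7
# Mode 1 4.8
# Mode 2 3.8
```

Under this assumption the three CDMA-specific modes reproduce the listed averages. Mode 3 comes out at about 5.9 kbit/s rather than 6.1, which would be consistent with some of its inactive frames using the higher-rate AMR-WB-interoperable CNG QR.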
  • Frame Energy Variation
  • Noise Estimation Update Decision
  • Based on parameters with low sensitivity to noise
    level
  • Pitch period varying
  • AND normalized correlation at pitch period low
  • AND low estimated order of AR model
  • AND signal energy stationary
  • INDEPENDENT of VAD decision!
  • Robust to noise level variations
  • Conservative approach: the noise estimate is
    updated only when the frame is almost certainly
    inactive
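
The update rule above is a conjunction of four conditions that never consults the VAD flag. A minimal sketch of that logic follows; the feature names and every threshold value are hypothetical placeholders, since the transcript gives none.

```python
from dataclasses import dataclass

@dataclass
class FrameFeatures:
    pitch_variation: float      # relative spread of the open-loop pitch estimates
    norm_correlation: float     # normalized correlation at the pitch period
    ar_order: int               # estimated order of the AR model
    energy_variation_db: float  # frame-to-frame energy change, in dB

def should_update_noise(f, min_pitch_variation=0.15, max_correlation=0.5,
                        max_ar_order=4, max_energy_variation_db=3.0):
    """Return True only if all four conditions hold (thresholds are hypothetical).

    The VAD decision is deliberately not an input.
    """
    return (
        f.pitch_variation > min_pitch_variation                    # pitch period varying
        and f.norm_correlation < max_correlation                   # low correlation at pitch period
        and f.ar_order <= max_ar_order                             # low estimated AR order
        and abs(f.energy_variation_db) < max_energy_variation_db   # stationary energy
    )
```

Because every condition must hold, a single speech-like feature is enough to block the update, which is what makes the approach conservative.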

E32(j): maximum energy in a block of 32 samples
  • Relative Frame Energy - Erel

Decision
3. Voiced Frame Decision / Signal Modification
4. Low Energy Decision
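
The four numbered decisions can be read as a chain in which each stage handles the frames that the previous stage left unclassified. The sketch below assumes the four decisions are already available as booleans; the function name is hypothetical, and the mode-dependent choices (for example ER versus QR for inactive frames, or HR versus QR for unvoiced frames) are left out.

```python
def select_coding_type(is_active, is_unvoiced, is_voiced, is_low_energy):
    """Hierarchical frame classification (simplified, mode-independent sketch)."""
    if not is_active:       # 1. voice activity detection
        return "CNG"
    if is_unvoiced:         # 2. unvoiced frame decision
        return "Unvoiced"
    if is_voiced:           # 3. voiced decision (signal modification succeeded)
        return "Voiced HR"
    if is_low_energy:       # 4. low energy decision
        return "Generic HR"
    return "Generic FR"     # remaining frames are coded at full rate

print(select_coding_type(True, False, False, True))  # Generic HR
```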
  • Channel-Controlled Operation
  • 4 Operational Modes Controlled by Channel
    Conditions
  • Transparent Memory-less Mode Switching
  • Per-Frame Bit Rate Control Capability
  • Relative Usage of Coding Types in Active Speech
  • Mode Switching Performance
  • Enhancements at Decoder
  • Low Frequency Post-processing
  • Enhancement of the periodicity in low frequency
    region

Performance (MOS scores from selection test): CDMA-specific modes (Modes 0, 1, 2), WB input
Performance (MOS scores from characterization test)
  • Voiced Decision is an Inherent Part of Original
    Signal Modification Algorithm
  • Frame is coded as voiced if all constraints of
    the modification are satisfied
  • Signal modification is done pitch-synchronously
  • Pitch period evolution is piecewise linear
    (constant at frame end) to avoid pitch period
    oscillations
  • Modified input is synchronous with original
    input at frame end
  • Modification is transparent for at least up to
    30% of active speech frames (in the example
    below, no coding is used and 30% of active
    clean speech frames are modified)
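
To illustrate the piecewise-linear pitch evolution mentioned above, here is a minimal sketch that ramps the pitch period linearly from the previous frame's value and then holds it constant to the frame end. The frame length of 256 samples, the ramp length, and the function name are assumptions for illustration only.

```python
def pitch_contour(prev_pitch, cur_pitch, frame_len=256, ramp_len=192):
    """Per-sample pitch period: linear ramp, then constant up to the frame end.

    frame_len and ramp_len are hypothetical values, not taken from the poster.
    """
    if not 0 < ramp_len <= frame_len:
        raise ValueError("ramp_len must satisfy 0 < ramp_len <= frame_len")
    contour = []
    for n in range(frame_len):
        if n < ramp_len:
            # linear segment, reaching cur_pitch at n == ramp_len - 1
            contour.append(prev_pitch + (cur_pitch - prev_pitch) * (n + 1) / ramp_len)
        else:
            # constant segment at the frame end
            contour.append(float(cur_pitch))
    return contour
```

Holding the period constant at the frame end means the next frame starts from a settled value, which is one way to avoid the oscillations the slide mentions.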
  • NB Input Test
  • Modes 0, 1, 2, 3
  • Clean speech, nominal level
  • Test on Interworking with AMR-WB at 12.65 kbit/s
  • WB input, clean speech conditions

Purpose: to avoid encoding unclassified frames with low perceptual importance at Full Rate
Condition
Ref 0: AMR-WB at 14.25 kbit/s
Ref 1: AMR-WB at 12.65 kbit/s
Ref 2: AMR-WB at 8.85 kbit/s
Test 0: VMR-WB Mode 0
Test 1: VMR-WB Mode 1
Test 2: VMR-WB Mode 2
Et: sum of critical band energies for the current frame, in dB
Ef: long-term mean of Et for active speech
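
The two quantities above suggest that the relative frame energy Erel is the difference Et - Ef in dB; the transcript does not show the formula, so treat that as an assumption. The sketch below tracks Ef with a simple running mean over active frames and flags low-energy frames with a hypothetical threshold; the class name, smoothing factor, and initial value are likewise illustrative.

```python
import math

def total_energy_db(band_energies, floor=1e-10):
    """Et: sum of critical band energies, expressed in dB."""
    return 10.0 * math.log10(max(sum(band_energies), floor))

class RelativeEnergyTracker:
    """Tracks Ef, the long-term mean of Et over active speech frames."""

    def __init__(self, initial_mean_db=40.0, forgetting=0.99):
        self.e_f = initial_mean_db    # hypothetical starting value
        self.forgetting = forgetting  # hypothetical smoothing factor

    def update(self, e_t_db, is_active):
        e_rel = e_t_db - self.e_f     # assumed definition: Erel = Et - Ef
        if is_active:                 # Ef only follows active speech
            self.e_f = self.forgetting * self.e_f + (1.0 - self.forgetting) * e_t_db
        return e_rel

def is_low_energy(e_rel_db, threshold_db=-14.0):
    """Hypothetical threshold: frames far below the long-term mean."""
    return e_rel_db < threshold_db
```

A frame flagged this way would be a candidate for Generic HR instead of full-rate coding.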
Clean Speech Conditions
Example: typical low-energy frame encoded with Generic HR in Mode 2
  • Frame Error Concealment
  • Lost Frame Concealment
  • Excitation energy and spectral envelope converge
    to estimated noise.
  • Excitation periodicity converges to 0.
  • Convergence rate depends on the signal class of
    last good frame.
  • Recovery after erasure
  • Careful energy control of synthesized speech.
  • Artificial onset reconstruction in case of lost
    voiced onset.
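
The convergence behaviour during lost frames can be sketched as a per-frame update that moves the excitation energy toward the estimated noise level and scales the periodicity toward 0. The per-class rates and names below are hypothetical; the transcript only states that the rate depends on the class of the last good frame.

```python
# Hypothetical convergence rates per signal class of the last good frame.
CONVERGENCE_RATE = {"voiced": 0.2, "onset": 0.4, "unvoiced": 0.6}

def conceal_step(energy, periodicity, noise_energy, last_good_class):
    """One lost-frame update of excitation energy and periodicity."""
    a = CONVERGENCE_RATE[last_good_class]
    energy = energy + a * (noise_energy - energy)  # converges to the noise estimate
    periodicity = periodicity * (1.0 - a)          # converges to 0
    return energy, periodicity

energy, periodicity = 100.0, 0.9
for _ in range(3):  # three consecutive erased frames
    energy, periodicity = conceal_step(energy, periodicity, 10.0, "voiced")
```

With these placeholder rates, a frame lost after voiced speech keeps its periodic character longer than one lost after unvoiced speech.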

Channel Error Conditions
 
Background Noise Conditions