Title: Wideband Codecs for Enhanced Voice Quality
2 Wideband Codecs for Enhanced Voice Quality
- Ensuring optimum wideband speech quality in converged VoIP/mobile applications/services
Claude Gravel, VP Engineering, VoiceAge Corporation
3 Contents
- AMR-WB Alleviates These Challenges
- Market Momentum / Conclusions / Demo
4 VoiceAge Corporation: who are we?
Business: Low-bit-rate audio compression technology research, IPR licensing, and development of optimized implementations
Headquarters: Montreal, Canada
Technologies:
- AMR: 3GPP, CableLabs narrowband voice codec
- AMR-WB: 3GPP, ITU-T, CableLabs wideband voice codec
- VMR-WB: 3GPP2, CableLabs wideband voice codec
- AMR-WB+: 3GPP, DVB-H audio codec
Achievements: Won every international audio compression standard for which VoiceAge competed in the last 10 years at 3GPP, 3GPP2, ITU, ETSI, TIA, CableLabs
Implementations: World-class optimized implementations and proprietary solutions on multiple operating systems and processors/platforms (including TI- and ARM-based systems)
Deployment: More than 2B mobile phones and over 500M PCs currently use VoiceAge's technologies
5 International Standards Using ACELP
7 Speech Synthesis Model
Used in CELP/ACELP Speech Coding
[Figure: speech production model labeling air from lungs, vocal cords (periodicity), and vocal tract articulators (including jaw, lips, tongue, velum), with stages numbered 1 to 3]
8 Speech Signal
- Basically, the same synthesis model applies to everyone
- So, speech has a universal structure or signature
[Figure: waveform of the word "VoiceAge" (1.25 sec), with zoomed segments of 45 ms to 180 ms illustrating the segment types below]
- Voiced fricative: quasi-periodic noise, lower energy
- Transient: variable energy, fast spectral evolution
- Purely voiced: quasi-periodic, high energy, more low-frequency energy, strongly correlated
- Unvoiced: non-periodic, low energy, uncorrelated, more high-frequency energy
9 What is Wideband Communication?
- Substantially increases captured speech information
- Delivers double the audio signal bandwidth
- Enables digital end-to-end packet-based services to deliver much better speech communication quality than traditional PSTN circuit-switched telephony
- VoIP quality differentiator
An Emerging Opportunity to Deliver Vastly Improved Speech Quality
10 Signal Bandwidth
Wideband speech:
- Below 200 Hz: increased naturalness, presence, and comfort
- Above 3400 Hz: increased intelligibility and fricative differentiation
[Figure: voiced segment and unvoiced segment]
11 Typical Speech Signal Acoustics
[Figure: speech signal for the sentence "Everyone looked extremely confused about the news"]
- Improved voice quality and intelligibility (e.g., "s" versus "f" differentiation)
- Improved speech naturalness, presence and comfort
Wideband telephony covers much more speech signal information
12 Why Wideband Speech Now?
- Improved intelligibility, naturalness and presence
- Reduces listener fatigue
- Improved hands-free/speakerphone sound quality
- Improves speaker and speech recognition
- High-quality low-bit-rate wideband codecs
- G.722.2/AMR-WB at 7 to 24 kbps
- No need to increase network capacity to deliver better quality sound
- Wideband-capable devices are available now
- Wideband audio microphones and device acoustics are more affordable
- Rising user awareness of enhanced sound quality
- Wideband teleconferencing
- Wideband enterprise/ASP IP telephony
- Wireless/VoIP multimedia services
Speech Coding Technology, Network/Device Capabilities and Market Demand are Converging Towards Pervasive Wideband Communications
14 Voice Processing -- Key for Speech Quality
[Figure: VoIP device block diagram. Blocks shown: Control, Management, Voice Processing (Digital Communications Domain), Call Processing, Voice MIBs, System MIBs, Speech Codec (VAD, CNG, DTX, PLC, Variable/Multi-Rate Switching), PCM I/F, two Echo Cancellers, Packetization/De-packetization (RTP), SNMP, Signaling Protocol, Noise Suppressor, Jitter Buffer, Analog Domain]
Codec choice impacts network cost and interoperability. It is a major contributor to the listener quality experience.
15 Speech Coding Attributes
- Bit rate: as low as possible
- Delay: as little as possible
- Quality: as high as possible
- Complexity: as algorithmically simple as possible, to constrain platform processing and memory requirements and reduce battery consumption in mobile devices
- Robustness: effective operation under background noise and channel impairment conditions
- Standards compliance: open, tested and interoperable solutions
As required by specific applications
Difficult to attain all of these often divergent objectives at the same time
16 VoIP Speech Quality Challenges
- Missing packets
  - Due to network congestion or transmission errors
  - Wireless networks are more prone to losing packets
- Packet delay
  - Due to network congestion or transmission errors
  - Real-time communication can't wait too long for packets or retransmission
- Transcoding
  - Needed when end devices and network equipment support incompatible speech/audio coding technologies while traversing diverse networks, such as across fixed/mobile environments
  - Increases system costs, adds delays and introduces audio quality impairments
- Background noise
  - Reduces intelligibility and comfort level of conversations
  - Ambient office/workplace/household noise
  - Street/car noise in mobile applications
17 Speech Processing Techniques for Improving VoIP Voice Quality
- Missing packet impairments can be mitigated through:
  - Sending additional data to help preserve information
    - FEC/repetition of frames
    - Works well for sporadic packet losses but not so well for bursts of lost packets
    - Increases transmitted bit rate to send redundant information
[Figure: frames and packets along a time axis. A simple forward error correction scheme based on repeating the previous frame in each packet]
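The repetition scheme described on this slide can be sketched in a few lines. This is an illustrative sketch only, not the payload format of any standard: the `Packet` tuple layout and the function names `packetize_with_repetition` and `recover_frames` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Assumed packet layout for illustration:
# (sequence number, current frame, redundant copy of the previous frame)
Packet = Tuple[int, bytes, Optional[bytes]]


def packetize_with_repetition(frames: List[bytes]) -> List[Packet]:
    """Put each frame in its own packet together with a copy of the previous frame."""
    packets = []
    for seq, frame in enumerate(frames):
        previous = frames[seq - 1] if seq > 0 else None
        packets.append((seq, frame, previous))
    return packets


def recover_frames(received: List[Packet], total: int) -> List[Optional[bytes]]:
    """Rebuild the frame sequence; None marks a frame that could not be recovered."""
    frames: List[Optional[bytes]] = [None] * total
    for seq, current, previous in received:
        frames[seq] = current
        # Use the redundant copy only if the primary copy never arrived.
        if previous is not None and frames[seq - 1] is None:
            frames[seq - 1] = previous
    return frames


frames = [b"f0", b"f1", b"f2", b"f3", b"f4"]
packets = packetize_with_repetition(frames)

# Sporadic loss: packet 2 is dropped, but packet 3 carries a copy of frame 2.
single_loss = [p for p in packets if p[0] != 2]
print(recover_frames(single_loss, len(frames)))

# Burst loss: packets 2 and 3 are dropped, so frame 2 has no surviving copy.
burst_loss = [p for p in packets if p[0] not in (2, 3)]
print(recover_frames(burst_loss, len(frames)))
```

The two cases mirror the trade-off on the slide: a single lost packet is fully repaired, a burst of two leaves a gap, and every packet carries roughly twice the payload.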
18 Speech Processing Techniques for Improving VoIP Voice Quality
- Missing packet impairments can be mitigated through (cont'd):
  - Packet loss concealment (PLC)
    - Techniques used by the decoder to estimate parameter values for missing frames based on the characteristics of preceding frames
    - Can be improved by classifying frames and repeating or adjusting parameters based on heuristics driven by the classes of the frames preceding the missing frame(s)
    - Extrapolate missing frame parameters as a function of the expected frame class (e.g., voiced/unvoiced, stops, nasals, ...)
    - E.g., for voiced frames, repeat the pitch parameters
    - Objective: limit abrupt changes in energy that can cause annoying clicks
  - Late packet arrival processing can also be leveraged to benefit from some of the information in a packet that arrives too late
    - Can benefit PLC methods as applied to subsequent delayed or lost packets
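A class-driven concealment heuristic of the kind listed above might look like the following minimal sketch. It is not the concealment procedure of AMR-WB or any other standard codec: the `FrameParams` fields, the `conceal` function, and the 0.7 per-frame gain decay are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical decoded parameters for one speech frame."""
    voiced: bool
    pitch_lag: int  # pitch period in samples; only meaningful for voiced frames
    gain: float     # linear excitation gain


ATTENUATION = 0.7  # assumed gain decay per consecutive lost frame


def conceal(last_good: FrameParams, lost_in_a_row: int) -> FrameParams:
    """Estimate parameters for a missing frame from the last good frame."""
    if lost_in_a_row < 1:
        raise ValueError("lost_in_a_row must be at least 1")
    # Fade the energy gradually instead of muting abruptly, to avoid clicks.
    gain = last_good.gain * ATTENUATION ** lost_in_a_row
    if last_good.voiced:
        # Voiced class: repeat the pitch parameters, attenuate the gain.
        return replace(last_good, gain=gain)
    # Unvoiced class: there is no pitch to repeat, only the fading gain.
    return FrameParams(voiced=False, pitch_lag=0, gain=gain)


last_good = FrameParams(voiced=True, pitch_lag=80, gain=1.0)
for n in (1, 2, 3):
    print(n, conceal(last_good, n))
```

Each additional lost frame keeps the pitch lag of the last good voiced frame while the gain shrinks, which is the "limit abrupt changes in energy" objective in its simplest form.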
19 Speech Processing Techniques for Improving VoIP Voice Quality
- Missing packet impairments can be mitigated through (cont'd):
  - Frame interleaving
    - Each packet contains non-contiguous frames to lower the overall impact of a lost packet on the reconstructed speech signal
    - Introduces delays which may make it unsuitable for real-time speech communication
    - Works well for audio streaming
[Figure: frames distributed across packet 1, packet 2 and packet 3 along a time axis]
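Frame interleaving can be illustrated with a small sketch. The interleaving pattern used here (packet k carries frames k, k + depth, k + 2·depth, ...) and the function names are assumptions for the example; real systems may use other patterns.

```python
from typing import List, Optional, Sequence


def interleave(frames: Sequence[int], depth: int) -> List[List[int]]:
    """Spread frames over `depth` packets: packet k gets frames k, k+depth, ..."""
    return [list(frames[k::depth]) for k in range(depth)]


def deinterleave(packets: Sequence[Optional[List[int]]], depth: int,
                 total: int) -> List[Optional[int]]:
    """Restore frame order; a lost packet (None) leaves isolated gaps."""
    frames: List[Optional[int]] = [None] * total
    for k, packet in enumerate(packets):
        if packet is None:
            continue
        for j, frame in enumerate(packet):
            frames[k + j * depth] = frame
    return frames


frames = list(range(9))          # frame indices stand in for coded frames
packets = interleave(frames, 3)  # three packets of three frames each
print(packets)

packets[1] = None                # the second packet is lost in the network
print(deinterleave(packets, 3, len(frames)))
```

Losing one packet removes three isolated frames rather than three consecutive ones, which is easier to conceal. The cost is that the receiver must wait for all three packets before it can play the first frames in order, hence the added delay noted on the slide.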
20 Speech Processing Techniques for Improving VoIP Voice Quality
- Network congestion, which can lead to delayed or dropped packets, can be alleviated by lowering the average communication bit rate
  - VAD/DTX/CNG
    - Using Voice Activity Detection (VAD), Discontinuous Transmission (DTX) and Comfort Noise Generation (CNG) capabilities to limit consumed bandwidth during periods of silence in a conversation
  - Adaptive codecs
    - Source controlled: optimal selection of the bit rate and coding scheme based on active speech
    - Network controlled: adapt the bit rate to make best use of varying available bandwidth
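Both ideas on this slide reduce to simple arithmetic, sketched below using the nine AMR-WB mode bit rates (6.6 to 23.85 kbps). The selection rule, the function names, and the `silence_kbps` parameter are assumptions for illustration; the figures cover codec payload only and ignore RTP/UDP/IP header overhead.

```python
# AMR-WB codec mode bit rates in kbps, indexed by mode number 0..8.
AMR_WB_MODES_KBPS = [6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def select_mode(available_kbps: float) -> int:
    """Network-controlled adaptation: pick the highest mode that fits the budget.

    Falls back to mode 0 when even the lowest rate does not fit.
    """
    best = 0
    for mode, rate in enumerate(AMR_WB_MODES_KBPS):
        if rate <= available_kbps:
            best = mode
    return best


def average_rate_with_dtx(mode: int, speech_activity: float,
                          silence_kbps: float) -> float:
    """Average bit rate when silence is sent at `silence_kbps` instead of the mode rate.

    speech_activity is the fraction of time the VAD flags as active speech (0..1).
    silence_kbps is an assumed average rate for comfort-noise updates.
    """
    if not 0.0 <= speech_activity <= 1.0:
        raise ValueError("speech_activity must be between 0 and 1")
    active = AMR_WB_MODES_KBPS[mode] * speech_activity
    silent = silence_kbps * (1.0 - speech_activity)
    return active + silent


mode = select_mode(13.0)
print(mode, AMR_WB_MODES_KBPS[mode])
print(average_rate_with_dtx(mode, speech_activity=0.5, silence_kbps=0.0))
```

With a 13 kbps budget the sketch selects the 12.65 kbps mode, and with speech active half the time and nothing sent during silence, the average payload rate is half the mode rate.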
21 Transcoder-Free Network Design for Fixed/Mobile Convergence
22 Improving VoIP Speech Quality
- Mitigating the main issues impacting VoIP speech quality
  - Missing packets
  - Delayed packets
  - Transcoding
  - Background noise
- Proper network engineering with integrated QoS mechanisms (in closed systems)
- Choosing the best speech coding/processing technology (adaptive, enhanced voice quality, robust and extensible)
  - Improved packet loss concealment
  - Late packet arrival processing
  - Time scale modification
  - Adaptive jitter buffering
- Transcoder-free network design to avoid increased system costs, delays and audio quality impairments
  - Leverage seamlessly interoperable standards-proven codecs
- Choose codecs that can readily accommodate background noise suppression algorithms
  - Proven noise suppression in standards selection and characterization testing results
24 Why AMR-WB/G.722.2
- AMR-WB/G.722.2 is the right wideband codec for network convergence
- Very robust
  - Supports dynamic adaptation to mobile network conditions
  - Includes built-in efficient packet loss concealment
  - Performs well even with high bit error rates
- Multi-rate codec delivers very good quality even at bit rates comparable to those of narrowband (12 kbps)
  - No need for potentially costly and time-consuming network capacity upgrades
  - Supports VAD/DTX/CNG for enhanced efficiency
  - Low-complexity encoder and decoder
- Standardized in 3GPP, ITU-T and CableLabs PacketCable 2.0
  - Can interoperate transcoder-free across mobile/IP networks
  - Eliminates transcoding-related latency, impairments and costs
25 Subjective NB-WB Quality Comparison
[Figure: NB-WB voice quality as a function of bit rate. Source: Ericsson Review, No. 3, 2006]
AMR-WB/G.722.2 Greatly Improves Perceived Voice Quality
26 AMR-WB Subjective Testing Results
[Figure: Clean Condition Test (English Language), AMR-WB/G.722.2 Characterization Test. MOS (axis from 1.0 to 5.0) for G.722 @ 64 kbps, G.722 @ 48 kbps, and G.722.2 @ 8.85, 12.65, 18.25 and 23.05 kbps, under two conditions: No Tandem at -26 dBov and Self-Tandem at -26 dBov]
- AMR-WB/G.722.2 Delivers Excellent Wideband Speech Quality
- Even at Low Bit Rates (e.g., MOS at 8.85 kbps exceeds G.722 at 48 kbps)
27 AMR-WB CPU efficiency
- AMR-WB/G.722.2 performance on widely deployed communications device processors shows the codec's relatively low complexity

Mode  Bit rate   ARM 9E (MHz)       TI C55x (MIPS)     TI C64x (MIPS)
      (kbps)     Encoder  Decoder   Encoder  Decoder   Encoder  Decoder
0      6.6       39       11        19.67    4.88      22.15    5.94
1      8.85      34        9        21.24    4.35      23.75    5.00
2     12.65      39        8        24.64    4.20      26.98    4.81
3     14.25      41        8        27.02    4.30      29.36    4.85
4     15.85      41        8        27.23    4.39      29.58    4.88
5     18.25      42        8        28.20    4.55      30.68    4.95
6     19.85      43        8        29.33    4.61      32.10    4.98
7     23.05      43        8        29.13    4.83      31.76    5.05
8     23.85      43        9        26.64    5.21      29.97    5.40

Supported by most commonly used communications processors
28 The Standard Solution Advantage
- Open, collaborative and competitive process
  - Requirements specifically address target applications
- Published algorithms and source code
  - Permits wider and more effective scrutiny
  - Clearer intellectual property ownership
- Rigorous comparative testing under diverse conditions
  - Background noise types and levels
  - Spoken languages
  - Speaker types
  - Various network impairments
Interoperable, Open and Fully Tested: Ensures that the best technologies are chosen
29 Interoperability between Fixed/Mobile Network Services
- Transcoder-free interoperability in fixed/mobile convergence
  - 3GPP, Wi-Fi/WiMAX and ITU-T interoperability
  - AMR-WB/G.722.2 end-to-end across networks
  - No need for transcoding at media gateways
  - Improves service quality end to end
  - Reduces network delays and equipment complexity
  - Lowers network costs (equipment costs and licensing)
31 Growing Market Momentum
- Ixia
- Tektronix
- GL Comms
- NetHawk
- Many others
- VeriSilicon
- Texas Inst.
- Freescale
- Renesas
- ST Micro
-
- Nokia
- Sony-Ericsson
- Motorola
- Samsung
- Panasonic
- NEC
- CounterPath
- Polycom
- Mobiles, Softphones, VoIP terminals, Conferencing terminals
- Nokia
- Ericsson
- AudioCodes
- Gateways, ATA/MTA, Softswitches,
- T-Mobile Trial
- Wireless Operators
- Cablecos
- VoIP ASPs
-
Accelerating Adoption of AMR-WB/G.722.2 leads to
Happy Consumers and a Wealthy Telecom Service
Value Chain
32 Successful Ericsson/T-Mobile Trial
[Chart: participant ratings on a seven-point scale (Extremely Good, Good, Quite Good, Nice to Have, Quite Bad, Bad, Extremely Bad); values shown: 35, 36, 11, 11, 4, 3, 2. Source: Ericsson Review, No. 3, 2006]
- 150 consumers participated for 4 weeks in Germany, April/May 2006; the trial confirmed earlier lab MUSHRA tests
  - More than 90% perceived better voice quality and clarity
  - Felt a greater sense of privacy, discretion and comfort due to improved voice quality and intelligibility
  - Could more easily place and complete calls in environments with high background noise
  - Business users highly valued voice quality for improving communication, reducing expenses and giving a positive impression
- Ericsson anticipates positive outcomes for operators
  - More mobile traffic, i.e., more calls for longer durations
  - Can offer enhanced services for conferencing, personalized ringback signals, automatic voice recognition, voice mail
  - Can cut costs, e.g., by reducing the cost of acquiring new subscribers and reducing helpdesk costs
33 Wideband Speech Communications
- An Evolutionary Migration
- Wideband speech coding is consistent with narrowband codecs
  - Bit rates comparable to narrowband codecs
  - Similar robustness techniques to handle packet losses and delays can be used
  - Low-complexity implementations available for all popular communications processor types
  - While vastly improving perceived voice quality
- Strategically deploying wideband capability in terminal and network equipment enables evolution to wideband speech communications
  - Compatible with existing network infrastructure
  - No forklift replacements needed: a graceful evolutionary migration, not a disruptive revolution
34 Conclusions
- Speech communications are rapidly moving to end-to-end digital packets over all networks, wired and wireless, towards fixed/mobile convergence
- Provides an opportunity to vastly improve communications quality through wide-scale deployment of wideband speech
- Efficient codecs and devices with wideband acoustics and processing are already available
- Many benefits, but also some challenges to consistently delivering high-quality voice end to end in real-world deployments
- Enhanced speech coding and processing techniques have been developed to help overcome these challenges
- The selection of standards-based advanced wideband speech coding technologies such as AMR-WB/G.722.2 is one of the fundamental steps towards improving voice quality between diverse devices and converging networks
- Adoption of AMR-WB/G.722.2 in the telecom service delivery value chain is growing; wideband speech quality has been shown to be highly preferred by consumers
Are your devices, systems, solutions, services ready?
35 Hear the rich sound of wideband
36 Wideband Codecs for Enhanced Voice Quality
- Thank you!
- claude.gravel@voiceage.com
- www.voiceage.com
- Come and talk to VoiceAge at Booth 107