Title: Audio Codecs
1Audio Codecs
Miikka Vilermo
Nokia Research Center, Audio Visual Systems Laboratory
2Introduction
- Codecs evolve and new technologies emerge.
- What can we do with all these codecs?
- Will the emerging technologies change the status
quo?
- What do people want?
3Audio Codecs
- Recent technical advances
- Existing and emerging codecs
- How good is good enough? Codec requirements.
- Important issues outside today's presentation
- One case closed and another reopened (if time)
- Questions
4Recent Technical Advances
- Spectral Band Replication (SBR)
- Binaural Cue Coding (BCC)
- Integer-to-Integer Modified Discrete Cosine
Transform (INTMDCT)
5Spectral Band Replication (SBR)
- SBR is one method of Bandwidth Extension (BWE).
- BWE is a class of methods to increase the
perceived bandwidth without using many bits.
- Psychoacoustics
- SBR was introduced by Coding Technologies.
- The technology is applicable to any coder, e.g.
AAC+, mp3PRO.
- Achieves very high quality at 48 kbps stereo.
- SBR is going to be standardised as High
Efficiency Advanced Audio Coding (HE-AAC).
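A toy illustration of the bandwidth-extension idea, not the standardised SBR algorithm: the encoder keeps only a few per-band energies for the high band, and the decoder rebuilds the high band by copying low-band values and rescaling them to those energies. The function names, the plain list of spectral magnitudes, and the band size are assumptions made for this sketch; real SBR works in a filterbank domain and sends more side information.

```python
import math

def band_energies(spectrum, band_size):
    """Sum of squares per band; this is the side information in the sketch."""
    return [sum(v * v for v in spectrum[i:i + band_size])
            for i in range(0, len(spectrum), band_size)]

def replicate_high_band(low_band, target_energies, band_size):
    """Rebuild a high band from low-band values plus per-band energies."""
    high_band = []
    for band_index, target in enumerate(target_energies):
        # Reuse low-band values as raw material for this high band.
        start = (band_index * band_size) % len(low_band)
        patch = [low_band[(start + k) % len(low_band)] for k in range(band_size)]
        energy = sum(v * v for v in patch)
        # Scale the copied values so the band energy matches the target.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high_band.extend(gain * v for v in patch)
    return high_band

# Hypothetical spectrum: 8 low-band and 8 high-band magnitudes.
original = [9.0, 7.0, 8.0, 6.0, 5.0, 6.0, 4.0, 5.0,
            2.0, 1.5, 1.0, 1.2, 0.8, 0.6, 0.5, 0.4]
low, high = original[:8], original[8:]
side_info = band_energies(high, band_size=4)   # 2 numbers instead of 8 values
rebuilt = replicate_high_band(low, side_info, band_size=4)
```

The rebuilt high band has the same coarse energy envelope as the original but not the same fine structure, which is why the approach relies on psychoacoustics rather than waveform accuracy.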
6High-level block diagram of the SBR incorporated
into an audio encoder (a) and audio decoder (b).
(Juha Ojanperä)
7Block diagram of the SBR encoder module combined
with AAC core encoder. (Juha Ojanperä, Miikka
Vilermo)
8Example of the time/frequency grid of the SBR.
(Juha Ojanperä, Miikka Vilermo)
9Binaural Cue Coding (BCC)
- Traditional multichannel coding requires
(number of channels) x (mono bitrate) kbps.
- Without specific matrixing, traditional
multichannel coding is restricted to a certain
number of channels, e.g. 5.1, and a certain
speaker placement.
- Binaural Cue Coding (BCC) has two versions:
flexible rendering and natural rendering.
- In flexible rendering the original multichannel
input is downmixed to (usually) one channel and
the spatial information is sent as one separate
low-bitrate parametric stream. The decoder then
renders as many channels as are needed based on
the parameterised spatial image. The decoder can
also apply Head-Related Transfer Functions
(HRTFs) to create surround headphone playback.
- In natural rendering one parameterised stream of
spatial information is created for each of the
original channels. This increases the bitrate and
limits rendering options in the decoder, but also
improves quality.
- BCC can also be used as parametric stereo.
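The bitrate argument on this slide can be written out as simple arithmetic. This is a sketch only: the mono and side-information bitrates below are made-up values chosen to show the scaling, not measurements of any codec, and the function names are hypothetical.

```python
def discrete_kbps(channels, mono_kbps):
    """Traditional coding: every channel is coded at the mono bitrate."""
    return channels * mono_kbps

def flexible_bcc_kbps(mono_kbps, side_info_kbps):
    """Flexible rendering: one downmix channel plus one parametric stream."""
    return mono_kbps + side_info_kbps

def natural_bcc_kbps(channels, mono_kbps, side_info_kbps):
    """Natural rendering: one parametric stream per original channel."""
    return mono_kbps + channels * side_info_kbps

# Hypothetical numbers, for illustration only.
MONO_KBPS = 48
SIDE_INFO_KBPS = 4
CHANNELS = 5

discrete = discrete_kbps(CHANNELS, MONO_KBPS)                     # 240
flexible = flexible_bcc_kbps(MONO_KBPS, SIDE_INFO_KBPS)           # 52
natural = natural_bcc_kbps(CHANNELS, MONO_KBPS, SIDE_INFO_KBPS)   # 68
```

With these assumed values the flexible variant stays close to the mono bitrate regardless of channel count, while natural rendering grows with the number of channels but far more slowly than discrete coding.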
10BCC continued
- Typical parameters for BCC are
- Inter-Channel Level Difference (ICLD)
- Inter-Channel Time Difference (ICTD)
- Inter-Channel Correlation (ICC)
- Parameters are applied on critical bands.
- BCC is based on the assumption that on every
critical band the dominant source defines the
spatial perception.
- BCC doesn't suffer from unmasking effects since
the quantization noise is automatically rendered
in the same direction as the source.
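A minimal sketch of how the three cues could be estimated for one pair of channels. It assumes a single frame and the full band instead of per-critical-band analysis, and it takes ICTD as the lag that maximises the normalised cross-correlation and ICC as that maximum; the function names and the test signal are hypothetical.

```python
import math

def icld_db(left, right):
    """Inter-channel level difference: power ratio in decibels."""
    p_left = sum(v * v for v in left)
    p_right = sum(v * v for v in right)
    return 10.0 * math.log10(p_left / p_right)

def normalized_cross_correlation(left, right, lag):
    """Correlation of left[n] with right[n + lag], normalised by total energy."""
    num = sum(left[n] * right[n + lag] for n in range(len(left))
              if 0 <= n + lag < len(right))
    den = math.sqrt(sum(a * a for a in left) * sum(b * b for b in right))
    return num / den

def ictd_and_icc(left, right, max_lag):
    """ICTD: lag (in samples) of the correlation peak. ICC: the peak value."""
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: normalized_cross_correlation(left, right, lag))
    return best_lag, normalized_cross_correlation(left, right, best_lag)

# Hypothetical test signal: the right channel is the left channel
# delayed by 3 samples and scaled to half amplitude.
core = [math.sin(0.4 * n) for n in range(32)]
left = core + [0.0] * 8
right = [0.0] * 3 + [0.5 * v for v in core] + [0.0] * 5

level_difference = icld_db(left, right)            # about 6.02 dB
time_difference, coherence = ictd_and_icc(left, right, max_lag=8)
```

A positive lag here means the right channel lags the left; a real coder would compute these values per critical band and per frame before quantising them.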
11Integer-to-Integer Modified Discrete Cosine
Transform (INTMDCT)
- Lossy coding is important, but how could you
extend that to lossless coding?
- Modified Discrete Cosine Transform (MDCT) is the
most popular audio coding transform, but
losslessly coding floating point values is
difficult.
- Integer-to-Integer Modified Discrete Cosine
Transform (INTMDCT) is similar to MDCT, but if the
input is integers then the output is integers
too.
- It is possible to create an integer version of
any transform whose transform matrix can be
expressed as a product of matrices that have ones
on the diagonal and zeros everywhere else except
in a single row or column.
12INTMDCT Continued
- Givens rotations (butterfly operations) can be
expressed as such matrices. Thus all matrices
that can be expressed as Givens rotations can be
used as the basis for an integer transform.
- MPEG has ongoing standardisation work on lossless
coding. INTMDCT was a basis for that work.
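The idea can be sketched for a single Givens rotation. A rotation matrix factors into three matrices with ones on the diagonal and one off-diagonal entry (lifting steps); rounding the off-diagonal contribution in each step keeps the values integer, and each step can be undone exactly by subtracting the same rounded term. The function names are hypothetical, and this is one rotation only, not a full INTMDCT.

```python
import math

def lifting_rotate(x, y, angle):
    """Integer approximation of a Givens rotation using three lifting steps.

    The angle must not be a multiple of pi, since sin(angle) is a divisor.
    """
    p = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x += round(p * y)   # [[1, p], [0, 1]]
    y += round(s * x)   # [[1, 0], [s, 1]]
    x += round(p * y)   # [[1, p], [0, 1]]
    return x, y

def lifting_rotate_inverse(x, y, angle):
    """Undo lifting_rotate exactly by reversing the steps with subtraction."""
    p = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y

# Hypothetical integer samples and rotation angle.
rotated = lifting_rotate(1000, -250, 0.7)
restored = lifting_rotate_inverse(rotated[0], rotated[1], 0.7)
```

The rotated pair differs from the exact floating-point rotation only by the accumulated rounding error, while the inverse returns the original integers, which is the property lossless coding needs.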
13Block diagram of scalable lossless INTMDCT
enhanced perceptual codec
14Existing and Emerging Codecs
- Internet codecs
- Multichannel codecs
- Lossless codecs
- Low delay codecs
- New codecs
- Others
15Internet Codecs
- MP3
- MPEG-1 layer 3
- largest user base
- near CD-quality can require over 192 kbps for
difficult material
- Ogg Vorbis
- open source
- claimed to be IPR free
- quality around mp3 but varies greatly between
samples
- AAC
- MPEG 2 and 4
- lowest bitrate for CD-quality
- near CD-quality around 128 kbps even for
difficult material
- QuickTime and RealAudio use AAC for high bitrates
- Windows Media
- proprietary
- large user base through Windows
- better than mp3, WMA9 comes close to AAC in
quality
- includes lossless and multichannel coding
16Internet Codecs Continued
- RealAudio
- uses AAC for high bitrates
- proprietary low bitrate codecs, the same as in
earlier versions
- proprietary multichannel codecs
- built for streaming
- ATRAC
- proprietary
- ATRAC3plus for low bitrates (<64 kbps)
- ATRAC3 for high bitrates
- mp3-like quality at high bitrates
- better than AAC at low bitrates
17Multichannel Codecs
- Windows Media 9 and RealAudio 10 include
multichannel coding; AAC and AAC+ also support
multichannel coding
- AC-3 (Audio Coding, Dolby)
- proprietary
- largest installed user base
- quality close to mp3
- production point of view taken into account
- DTS (Digital Theater Systems)
- proprietary
- high bitrate, high quality
- MLP (Meridian Lossless Packing)
- proprietary
- lossless
- SDDS (Sony Dynamic Digital Sound)
- proprietary
- based on ATRAC
18Lossless Codecs
- Compression ratios 1/3-1/2 depending on the
material
- FLAC (Free Lossless Audio Codec)
- free
- Monkey's Audio
- free
- Windows Media
- Many others exist
- MPEG has ongoing standardization work
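As a worked example of what those compression ratios mean in bitrate terms, the arithmetic below assumes the ratio is compressed size divided by original size and that the source is CD audio (44.1 kHz, 16 bits, stereo). The constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed CD audio bitrate in kbps: 1411.2
cd_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

def lossless_kbps(ratio):
    """Bitrate after lossless compression at the given size ratio."""
    return cd_kbps * ratio

best_case = lossless_kbps(1 / 3)    # about 470 kbps
worst_case = lossless_kbps(1 / 2)   # about 706 kbps
```

Under these assumptions a ratio of 1/2 lands near the 700 kbps figure quoted for lossless codecs later in the presentation.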
19Low-Delay Codecs
- G.722 based teleconferencing codecs
- low quality, enough for speech at 64 kbps
- AAC-LD
- MPEG 4
- Quality better than mp3
- Most ordinary codecs are not good enough for
two-way communications; AAC in particular has a
very high delay
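One contributor to codec delay is simply the frame length: the encoder cannot start coding a frame until all of its samples have arrived. The sketch below computes that framing time only; it is a lower bound, since overlap, look-ahead and buffering add further delay. The 1024-sample figure is the standard AAC frame length, the 512-sample figure stands in for a shorter low-delay frame, and the 44.1 kHz sampling rate is an assumption for the example.

```python
def frame_duration_ms(frame_length_samples, sample_rate_hz):
    """Time needed to collect one frame of input, in milliseconds."""
    return 1000.0 * frame_length_samples / sample_rate_hz

long_frame_ms = frame_duration_ms(1024, 44_100)   # about 23.2 ms
short_frame_ms = frame_duration_ms(512, 44_100)   # about 11.6 ms
```

Halving the frame length halves this part of the delay, usually at some cost in coding efficiency, which is the trade-off this slide points at.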
20New Codecs
- Spectral Band Replication
- AAC+ = MPEG HE-AAC, very high quality around
48 kbps
- mp3
- AMR-WB (Adaptive Multi-Rate WideBand, Nokia)
- good quality around 24kbps
- optional codec in 3GPP alongside AAC
- Discrete multichannel
- AAC discrete 5.1 at 128 kbps
- E-AC3 (Enhanced Audio Coding, Dolby)
- Binaural Cue Coding
- mp3 surround 192kbps (FhG, Agere)
- HE-AAC surround 64 kbps, supposedly better than
AC-3 at ???kbps
- MPEG standardization about to start
- Spectral Band Replication + Binaural Cue Coding
- E-AAC (Enhanced AAC, FhG, CT, Philips)
21Other Codecs
- SBC (Sub Band Coding)
- used with Bluetooth
- low complexity, low power
- near CD quality at 320 kbps
- Dolby-E
- multichannel
- synchronous with video frames
- high bitrates, but tandem coding quality has been studied
22How Good Is Good Enough? Codec Requirements
- Many users are happy with 128 kbps mp3, but
others are moving to 192 kbps mp3.
- iTunes AAC 128 is near CD-quality but not fully
transparent. However, this seems to be enough
judging by the popularity of the service.
- On the other hand, RealAudio AAC 192 is
practically transparent.
- Personally, I find AAC at 320 kbps enough, but then
lossless codecs are close at 700 kbps.
- Some Internet music services offer songs with
lossless compression.
- One unanswered question is: what is enough for
streaming?
- Streaming over a fixed line at 128 kbps can be
achieved. But how about wireless links: 3G, WLAN,
Bluetooth? And in many cases there has to be room
for video.
23How Good Is Good Enough? Codec Requirements
Contd.
- Delay
- usually high efficiency means long delay, AAC is
a prime example
- Will multichannel become important?
- Error resilience is a must in wireless
applications
- Scalability would be useful, some new ideas
presented recently by A. Aggarwal
- Editability
- Transcoding is a sin!
- Reversible codecs
- High enough bitrate
24Important Issues Outside Today's Presentation
- DRM (Digital Rights Management)
- usability
- parametric coding
25One Case Closed and Another Reopened (if time)
- Louder Sounds Can Produce Less Forward Masking:
Effects of Component Phase in Complex Tones,
Gockel et al., J. Acoust. Soc. Am., Vol. 114, No.
2, August 2003
- Near-optimal selection of encoding parameters for
audio coding, Aggarwal et al., IEEE International
Conference on Acoustics, Speech, and Signal
Processing (ICASSP '01), 7-11 May 2001,
Proceedings, Volume 5, Pages 3269-3272
26Conclusion
- Existing codecs have matured and added new
features
- For most needs there already is a codec
- Emerging codecs make possible good quality stereo
at 48 kbps and 5.1 multichannel at 64 kbps
- User requirements are still a question