Audio Codecs - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Audio Codecs


1
Audio Codecs
Miikka Vilermo, Nokia Research Center,
Audio-Visual Systems Laboratory
2
Introduction
  • Codecs evolve and new technologies emerge.
  • What can we do with all these codecs?
  • Will the emerging technologies change the status
    quo?
  • What do people want?

3
Audio Codecs
  • Recent technical advances
  • Existing and emerging codecs
  • How good is good enough? Codec requirements.
  • Important issues outside today's presentation
  • One case closed and another reopened (if time)
  • Questions

4
Recent Technical Advances
  • Spectral Band Replication (SBR)
  • Binaural Cue Coding (BCC)
  • Integer-to-Integer Modified Discrete Cosine
    Transform (INTMDCT)

5
Spectral Band Replication (SBR)
  • SBR is one method of Bandwidth Extension (BWE).
  • BWE is a class of methods that increase the
    perceived bandwidth without using many bits,
    relying on psychoacoustics.
  • SBR was introduced by Coding Technologies.
  • The technology is applicable to any coder, e.g.
    AAC and MP3 (as in mp3PRO).
  • Achieves very high quality at 48 kbps stereo.
  • SBR is going to be standardised as High
    Efficiency Advanced Audio Coding (HE-AAC).
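
As a rough illustration of the bandwidth-extension idea on this slide, the toy sketch below rebuilds an uncoded high band by copying low-band spectral values upward and rescaling them to a coarse energy envelope sent as side information. The function names, the band layout and the envelope format are assumptions made for this example; it is not the standardised SBR algorithm, which operates on a filterbank representation and includes further tools.

```python
import math

def band_energy(values):
    """Mean energy of a list of spectral values."""
    return sum(v * v for v in values) / len(values)

def encode_envelope(high_band, num_bands):
    """Encoder side: describe the high band by a few per-band energies."""
    size = len(high_band) // num_bands
    return [band_energy(high_band[i * size:(i + 1) * size])
            for i in range(num_bands)]

def replicate_high_band(low_band, envelope):
    """Decoder side: copy the low band up, then scale each band to the envelope."""
    size = len(low_band) // len(envelope)
    out = []
    for i, target in enumerate(envelope):
        patch = low_band[i * size:(i + 1) * size]
        energy = band_energy(patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(gain * v for v in patch)
    return out

# Hypothetical spectra: the low band is coded by the core codec,
# the high band is only described by its envelope.
low = [8.0, -6.0, 5.0, -4.0, 3.0, -3.0, 2.0, -2.0]
high = [1.0, -0.8, 0.6, -0.5, 0.3, -0.2, 0.2, -0.1]
env = encode_envelope(high, num_bands=4)
rebuilt = replicate_high_band(low, env)
```

The rebuilt band has the same per-band energies as the original high band but not the same fine structure, which is the trade the bullet on perceived bandwidth describes: only the few envelope values cost bits.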

6
High-level block diagram of SBR incorporated into
an audio encoder (a) and audio decoder (b).
(Juha Ojanperä)
7
Block diagram of the SBR encoder module combined
with AAC core encoder. (Juha Ojanperä, Miikka
Vilermo)
8
Example of the time/frequency grid of the SBR.
(Juha Ojanperä, Miikka Vilermo)
9
Binaural Cue Coding (BCC)
  • Traditional multichannel coding requires a
    bitrate of about (number of channels) x (mono
    bitrate) kbps.
  • Without specific matrixing, traditional
    multichannel coding is restricted to a certain
    number of channels e.g. 5.1 and speaker
    placement.
  • Binaural Cue Coding (BCC) has two versions:
    flexible rendering and natural rendering.
  • In flexible rendering the original multichannel
    input is downmixed to (usually) one channel and
    the spatial information is sent as one separate
    low bitrate parametric stream. The decoder then
    renders as many channels as are needed based on
    the parameterised spatial image. The decoder can
    also apply Head Related Transfer Functions
    (HRTFs) to create surround headphone playback.
  • In natural rendering one parameterised stream of
    spatial information is created for each of the
    original channels. This increases the bitrate and
    limits rendering options in the decoder, but also
    improves quality.
  • BCC can also be used as parametric stereo.
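
The bitrate argument on this slide can be put into numbers. The sketch below compares discrete multichannel coding with flexible-rendering BCC; the 64 kbps mono rate and the 2 kbps side-information rate are illustrative assumptions, not figures from the presentation, and all six channels of a 5.1 layout are counted as full channels for simplicity.

```python
def discrete_bitrate_kbps(channels, mono_kbps):
    """Traditional coding: every channel costs roughly one mono stream."""
    return channels * mono_kbps

def bcc_flexible_bitrate_kbps(mono_kbps, side_info_kbps):
    """Flexible rendering: one downmix channel plus one parametric stream."""
    return mono_kbps + side_info_kbps

# Assumed example values (hypothetical): 64 kbps mono, 2 kbps spatial cues.
discrete = discrete_bitrate_kbps(channels=6, mono_kbps=64)               # 384
parametric = bcc_flexible_bitrate_kbps(mono_kbps=64, side_info_kbps=2)   # 66
```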

10
BCC continued
  • Typical parameters for BCC are:
      • Inter-Channel Level Difference (ICLD)
      • Inter-Channel Time Difference (ICTD)
      • Inter-Channel Correlation (ICC)
  • Parameters are applied on critical bands.
  • BCC is based on the assumption that on every
    critical band the dominant source defines the
    spatial perception.
  • BCC does not suffer from unmasking effects since
    the quantization noise is automatically rendered
    to the same direction as the source.
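
A minimal sketch of how the three cues could be estimated for one band and one frame is given below. It assumes the two input lists have already been filtered to a single critical band and have non-zero energy; the function names and the definition of ICC as the normalised cross-correlation at the best lag are choices made for this example.

```python
import math

def cross_corr(x1, x2, lag):
    """Sum of x1[n] * x2[n + lag] over the samples where both exist."""
    return sum(x1[n] * x2[n + lag]
               for n in range(len(x1)) if 0 <= n + lag < len(x2))

def bcc_cues(x1, x2, max_lag):
    """Return (ICLD in dB, ICTD in samples, ICC) for one band and one frame."""
    p1 = sum(a * a for a in x1)
    p2 = sum(b * b for b in x2)
    icld = 10.0 * math.log10(p1 / p2)
    # ICTD: the lag at which the two channels are most similar.
    ictd = max(range(-max_lag, max_lag + 1),
               key=lambda d: cross_corr(x1, x2, d))
    # ICC: normalised cross-correlation at that lag.
    icc = cross_corr(x1, x2, ictd) / math.sqrt(p1 * p2)
    return icld, ictd, icc

# Hypothetical band signals: the right channel is a half-level copy
# of the left channel, delayed by three samples.
left = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.5, 0.5, 0.0, 0.0, 0.0, 0.0]
icld, ictd, icc = bcc_cues(left, right, max_lag=4)
```

For this input the level difference is about 6 dB, the time difference is 3 samples and the correlation is 1, since the right channel is an exact scaled and delayed copy.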

11
Integer-to-Integer Modified Discrete Cosine
Transform (INTMDCT)
  • Lossy coding is important, but how could you
    extend that to lossless coding?
  • Modified Discrete Cosine Transform (MDCT) is the
    most popular audio coding transform, but
    losslessly coding floating point values is
    difficult.
  • Integer-to-Integer Modified Discrete Cosine
    Transform (INTMDCT) is similar to MDCT but if the
    input is integers then the output is integers
    too.
  • It is possible to create an integer version of
    any transform whose matrix can be expressed as a
    product of matrices that have ones on the
    diagonal and zeros everywhere else except in a
    single row or column.

12
INTMDCT Continued
  • Givens rotations (butterfly operations) can be
    expressed as products of such matrices. Thus any
    matrix that can be expressed as a product of
    Givens rotations can be used as the basis for an
    integer transform.
  • MPEG has an ongoing standardisation on lossless
    coding. INTMDCT was a basis for that work.
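
The integer-to-integer idea can be sketched on a single Givens rotation. A rotation by an angle with non-zero sine factors into three matrices with ones on the diagonal and a single off-diagonal entry (lifting steps); rounding inside each step keeps the values integer while leaving the step exactly invertible. The function names are hypothetical, and this is only the rotation building block, not a full INTMDCT.

```python
import math

def int_rotate(x, y, angle):
    """Integer approximation of a rotation using three rounded lifting steps."""
    c, s = math.cos(angle), math.sin(angle)
    t = (c - 1.0) / s          # requires sin(angle) != 0
    x += round(t * y)          # step 1: only x changes
    y += round(s * x)          # step 2: only y changes
    x += round(t * y)          # step 3: only x changes
    return x, y

def int_rotate_inverse(x, y, angle):
    """Undo int_rotate exactly by subtracting the same rounded terms in reverse."""
    c, s = math.cos(angle), math.sin(angle)
    t = (c - 1.0) / s
    x -= round(t * y)
    y -= round(s * x)
    x -= round(t * y)
    return x, y

u, v = int_rotate(100, -37, 0.3)
restored = int_rotate_inverse(u, v, 0.3)   # gives back (100, -37)
```

Each step changes one value by a rounded function of the other, unchanged value, so the inverse can recompute and subtract exactly the same integer. That is why the result is lossless even though it only approximates the real-valued rotation.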

13
Block diagram of a scalable lossless
INTMDCT-enhanced perceptual codec
14
Existing and Emerging Codecs
  • Internet codecs
  • Multichannel codecs
  • Lossless codecs
  • Low delay codecs
  • New codecs
  • Others

15
Internet Codecs
  • MP3
      • MPEG-1 Layer 3
      • largest user base
      • near-CD quality can require over 192 kbps for
        difficult material
  • Ogg Vorbis
      • open source
      • claimed to be IPR-free
      • quality around mp3 but varies greatly between
        samples
  • AAC
      • MPEG-2 and MPEG-4
      • lowest bitrate for CD quality
      • near-CD quality around 128 kbps even for
        difficult material
      • QuickTime and RealAudio use AAC for high
        bitrates
  • Windows Media
      • proprietary
      • large user base through Windows
      • better than mp3; WMA9 comes close to AAC in
        quality
      • includes lossless and multichannel coding

16
Internet Codecs Continued
  • RealAudio
      • uses AAC for high bitrates
      • proprietary low bitrate codecs, the same as
        in earlier versions
      • proprietary multichannel codecs
      • built for streaming
  • ATRAC
      • proprietary
      • ATRAC3plus for low bitrates (<64 kbps)
      • ATRAC3 for high bitrates
      • mp3-like quality at high bitrates
      • better than AAC at low bitrates

17
Multichannel Codecs
  • Windows Media 9 and RealAudio 10 include
    multichannel coding; AAC also supports
    multichannel coding
  • AC-3 (Audio Coding 3, Dolby)
      • proprietary
      • largest installed user base
      • quality close to mp3
      • production point of view taken into account
  • DTS (Digital Theater Systems)
      • proprietary
      • high bitrate, high quality
  • MLP (Meridian Lossless Packing)
      • proprietary
      • lossless
  • SDDS (Sony Dynamic Digital Sound)
      • proprietary
      • based on ATRAC

18
Lossless Codecs
  • Compression ratios 1/3-1/2 depending on the
    material
  • FLAC (Free Lossless Audio Codec)
      • free
  • Monkey's Audio
      • free
  • Windows Media
  • Many others exist
  • MPEG has ongoing standardization work
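
To put the compression ratios on this slide into kbps, the sketch below starts from the uncompressed CD bitrate (44.1 kHz, 16 bits, stereo). It assumes the ratio means compressed size divided by original size; that reading is an interpretation, not something stated on the slide.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

cd = pcm_bitrate_kbps(44100, 16, 2)   # about 1411.2 kbps
at_half = cd / 2                      # about 705.6 kbps
at_third = cd / 3                     # about 470.4 kbps
```

Under that assumption a ratio of 1/2 lands near the figure of roughly 700 kbps that the presentation quotes for lossless codecs in the codec-requirements slide.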

19
Low-Delay Codecs
  • G.722-based teleconferencing codecs
      • low quality, enough for speech at 64 kbps
  • AAC-LD
      • MPEG-4
      • quality better than mp3
  • Most ordinary codecs are not good enough for
    two-way communications; AAC in particular has a
    very high delay
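
Part of the reason for the delay difference is frame length. The sketch below computes only the duration of one frame; real algorithmic delay is higher because of window overlap, look-ahead and buffering, so these numbers are a lower bound on one contribution. The 1024-sample frame is the usual AAC long frame, and the 512-sample frame is assumed here for a low-delay variant.

```python
def frame_duration_ms(frame_samples, sample_rate_hz):
    """Time covered by one codec frame, in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz

aac_frame = frame_duration_ms(1024, 44100)        # about 23.2 ms
low_delay_frame = frame_duration_ms(512, 44100)   # about 11.6 ms
```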

20
New Codecs
  • Spectral Band Replication
  • AAC: MPEG HE-AAC, very high quality around
    48 kbps
  • mp3
  • AMR-WB (Adaptive Multi-Rate Wideband, Nokia)
  • good quality around 24 kbps
  • optional codec in 3GPP alongside AAC
  • Discrete multichannel
  • AAC discrete 5.1 at 128 kbps
  • E-AC-3 (Enhanced AC-3, Dolby)
  • Binaural Cue Coding
  • mp3 surround 192 kbps (FhG, Agere)
  • HE-AAC surround 64 kbps, supposedly better than
    AC-3 at ???kbps
  • MPEG standardization about to start
  • Spectral Band Replication & Binaural Cue Coding
  • E-AAC (Enhanced AAC, FhG, CT, Philips)

21
Other Codecs
  • SBC (Subband Coding)
      • used with Bluetooth
      • low complexity, low power
      • near-CD quality at 320 kbps
  • Dolby E
      • multichannel
      • synchronous with video frames
      • high bitrates, but tandem coding quality has
        been studied

22
How Good Is Good Enough? Codec Requirements
  • Many users are happy with 128 kbps mp3, but
    others are moving to 192 kbps mp3.
  • iTunes AAC 128 is near CD-quality but not fully
    transparent. However, this seems to be enough
    judging by the popularity of the service.
  • On the other hand, RealAudio AAC 192 is
    practically transparent.
  • Personally, I find AAC at 320 kbps enough, but
    at that point lossless codecs are not far off at
    around 700 kbps.
  • Some Internet music services offer songs with
    lossless compression.
  • One unanswered question is: what is enough for
    streaming?
  • Streaming over a fixed line at 128 kbps can be
    achieved. But how about wireless links such as
    3G, WLAN and Bluetooth? And in many cases there
    has to be room for video.

23
How Good Is Good Enough? Codec Requirements
Contd.
  • Delay
      • high efficiency usually means long delay; AAC
        is a prime example
  • Will multichannel become important?
  • Error resilience is a must in wireless
    applications
  • Scalability would be useful, some new ideas
    presented recently by A. Aggarwal
  • Editability
  • Transcoding is a sin!
  • Reversible codecs
  • High enough bitrate

24
Important Issues Outside Today's Presentation
  • DRM (Digital Rights Management)
  • usability
  • parametric coding

25
One Case Closed and Another Reopened (if time)
  • "Louder sounds can produce less forward masking:
    Effects of component phase in complex tones",
    Gockel et al., J. Acoust. Soc. Am., Vol. 114,
    No. 2, August 2003
  • "Near-optimal selection of encoding parameters
    for audio coding", Aggarwal et al., IEEE
    International Conference on Acoustics, Speech,
    and Signal Processing (ICASSP '01), 7-11 May
    2001, Proceedings, Volume 5, pp. 3269-3272

26
Conclusion
  • Existing codecs have matured and added new
    features
  • For most needs there already is a codec
  • Emerging codecs make possible good quality stereo
    at 48 kbps and 5.1 multichannel at 64 kbps
  • User requirements are still a question

27