Title: An Overview of Perceptual Audio Coding and MPEG AAC
1An Overview of Perceptual Audio Coding and MPEG
AAC
2Introduction
- Audio coding (audio compression) algorithms are used to obtain a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage.
- The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction, i.e. generating output audio that cannot be distinguished from the original input even by a listener with "golden ears".
- The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.
3Continue
- MPEG audio compression standards are lossy audio coding standards. They compress audio by reducing perceptual and statistical redundancies.
- The basic task of a perceptual audio coding system is to compress the digital audio data in a way that
  - the compression is as high as possible, and
  - the reconstructed (decoded) audio sounds exactly like (or as close as possible to) the original audio before compression.
4Audio Coding Techniques
- Parametric Coding
- Waveform Coding
  - Time Domain
    - PCM, DPCM, ADPCM etc.
  - Frequency Domain
    - Transform Coding, Subband Coding
- Hybrid Coding
5Perceptual Audio Coding Basics
- Human hearing is limited to frequencies below about 20 kHz in most cases.
- Human hearing is insensitive to quiet frequency components that accompany other, stronger frequency components.
- Stereo audio streams contain largely redundant information.
- MPEG audio compression takes advantage of these facts to reduce the extent and detail of mostly inaudible frequency ranges.
6Generic Perceptual Audio Coding Architecture
7Psychoacoustic Principles
- High-precision engineering models for high-fidelity audio currently do not exist, so audio coding algorithms rely on generalized receiver models to optimize coding efficiency.
- In the case of audio, the receiver is ultimately the human ear, and sound perception is affected by its masking properties.
- Perceptual audio coders achieve compression by exploiting the fact that irrelevant signal information is not detectable even by a well-trained or sensitive listener.
8
- Irrelevant signal information is identified during signal analysis by incorporating several psychoacoustic principles into the coder, including absolute hearing thresholds, critical band frequency analysis, simultaneous masking, the spread of masking along the basilar membrane, and temporal masking.
- Combining all of these yields a quantitative estimate of the fundamental limit of transparent audio signal compression, the perceptual entropy, for a given audio frame.
9
- Perceptual entropy denotes the minimum number of bits that must be allocated to a given audio frame to represent it in a perceptually lossless way.
10Absolute Threshold of Hearing
- The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment.
- It can be expressed with the non-linear function
  $T_q(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(f/1000 - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4}$ (dB SPL), where f is the frequency in Hz.
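The threshold function above can be evaluated directly. The following is a minimal Python sketch, assuming the commonly cited form of the formula (frequency in Hz, last term added); the function name `absolute_threshold_db_spl` is an illustrative choice and not part of any standard.

```python
import math


def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL at f_hz (Hz)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    k = f_hz / 1000.0  # frequency in kHz, as used inside the formula
    return (
        3.64 * k ** -0.8                          # rise toward low frequencies
        - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)   # dip of highest sensitivity
        + 1e-3 * k ** 4                           # rise toward high frequencies
    )


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):7.2f} dB SPL")
```

Evaluating the function over the audible range shows the threshold dipping to its lowest values between roughly 3 and 4 kHz and rising steeply toward both ends of the spectrum.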
12
- When applied to signal compression, the absolute threshold can be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain.
- Using this information, the coder tries to keep the quantization noise level below this threshold.
- As a result, the quantization noise does not become audible.
13
- However,
  - The detection threshold for spectrally complex quantization noise is a modified version of the absolute threshold, with its shape determined by the stimuli present at any given time.
  - Since stimuli are in general time-varying, the detection threshold is also a time-varying function of the input signal.
  - A spreading function helps to determine the modified detection threshold in the presence of the stimuli in a given audio frame.
15Critical Bands
- The human ear can be viewed as a discrete set of band-pass filters that covers the entire audible frequency range up to about 20 kHz.
- The inner ear (the cochlea) contains frequency-sensitive positions. A tone entering the cochlea travels along it until it reaches the position where it resonates, so the cochlea works as a spectrum analyzer.
- The critical bandwidth is a function of frequency that quantifies the cochlear filter passbands. A distance of one critical band corresponds to one Bark.
16
- As the center frequency increases, the critical bandwidth also increases.
- Spectral analysis of audio content is performed using critical bands.
- The critical bandwidth at center frequency f (in Hz) is given by
  $BW_c(f) = 25 + 75\left[1 + 1.4\left(\frac{f}{1000}\right)^{2}\right]^{0.69}$ (Hz)
- To convert frequency in Hz to Bark:
  $Z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left[\left(\frac{f}{7500}\right)^{2}\right]$ (Bark)
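The two critical-band expressions above translate into a few lines of code. This is a minimal sketch, assuming the commonly cited form of both formulas, with the frequency in Hz and scaled to kHz (f/1000) inside the bandwidth expression; the function names are illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz to the Bark (critical-band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000, 20000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark, "
              f"bandwidth ~ {critical_bandwidth_hz(f):7.1f} Hz")
```

Under these formulas the bandwidth stays close to 100 Hz at low frequencies and widens steadily above about 500 Hz, and the whole range up to 20 kHz spans roughly 25 Bark.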
17Figure: Idealized critical band filter bank
18Masking
- Masking refers to a process where one sound is rendered inaudible because of the presence of another sound.
- Simultaneous masking (frequency domain)
  - The relative shapes of the masker and maskee magnitude spectra determine the extent of masking.
- Non-simultaneous masking (time domain)
  - Phase relationships between masker and maskee determine the masking outcome.
19
- Depending on the nature of the masker and the maskee, the following cases arise:
  - Noise Masking Tone (NMT)
  - Tone Masking Noise (TMN)
  - Noise Masking Noise (NMN)
20Noise Masking Tone and Tone Masking Noise
- Masking power is asymmetric between noise and tonal maskers: significantly greater masking power is associated with noise maskers than with tonal maskers.
21Difference between SMR, NMR and SNR
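The three ratios can be related with simple dB arithmetic. The sketch below assumes the usual definitions, none of which are spelled out on the slide: SMR (signal-to-mask ratio) is the level of the masker above the masking threshold, SNR is the signal level above the quantization noise, and NMR = SMR - SNR is the noise level relative to the masking threshold. The figure of about 6.02 dB of SNR per bit is a rule of thumb for a uniform quantizer, and all function names are illustrative.

```python
import math

DB_PER_BIT = 6.02  # assumed rule-of-thumb SNR gain per bit (uniform quantizer)


def noise_to_mask_ratio_db(smr_db: float, snr_db: float) -> float:
    """NMR in dB: positive means the noise pokes above the masking threshold."""
    return smr_db - snr_db


def is_noise_masked(smr_db: float, snr_db: float) -> bool:
    """True when the quantization noise sits at or below the masking threshold."""
    return noise_to_mask_ratio_db(smr_db, snr_db) <= 0.0


def min_bits_for_masked_noise(smr_db: float) -> int:
    """Smallest bit count whose rule-of-thumb SNR reaches the SMR."""
    if smr_db <= 0.0:
        return 0  # the signal itself is below the threshold
    return math.ceil(smr_db / DB_PER_BIT)


if __name__ == "__main__":
    smr = 20.0
    for bits in range(1, 6):
        snr = bits * DB_PER_BIT
        print(bits, "bits: NMR =", round(noise_to_mask_ratio_db(smr, snr), 2), "dB")
```

In this simplified picture a band with an SMR of 20 dB needs an SNR of at least 20 dB, so four bits per sample would be enough to keep the noise masked.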
22Spread of Masking
- A masker centered within one critical band has some predictable effect on detection thresholds in other critical bands. This effect is known as the spread of masking.
- It is often modeled in coding applications by an approximately triangular spreading function.
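A triangular spreading function is easy to sketch on the Bark scale. The slopes used below (+25 dB per Bark below the masker, -10 dB per Bark above it) are values often quoted for the basic triangular model and should be treated as assumptions here, as should the function names; practical coders use more refined, level-dependent shapes.

```python
def triangular_spread_db(dz_bark: float,
                         lower_slope: float = 25.0,
                         upper_slope: float = 10.0) -> float:
    """Attenuation in dB (<= 0) at a distance dz_bark from the masker.

    dz_bark = maskee position - masker position, both in Bark.
    """
    if dz_bark < 0:
        return lower_slope * dz_bark   # steep fall-off toward lower bands
    return -upper_slope * dz_bark      # shallower fall-off toward higher bands


def spread_level_db(masker_level_db: float,
                    masker_bark: float,
                    band_bark: float) -> float:
    """Masker level as 'seen' in another critical band after spreading."""
    return masker_level_db + triangular_spread_db(band_bark - masker_bark)


if __name__ == "__main__":
    for band in range(6, 15):
        print(band, "Bark:", spread_level_db(70.0, 10.0, band), "dB")
```

The asymmetry of the two slopes reflects that a masker affects bands above its own frequency more strongly than bands below it.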
23Non-simultaneous Masking (Temporal Masking)
24MPEG Audio Codec Family
- MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
- MPEG-1 Layer 3 (mp3)
- MPEG-2 (ISO/IEC 13818-7) AAC
- MPEG-4 (ISO/IEC 14496-3) AAC
- MPEG-4 HE AAC
- MPEG-4 HE AAC v2
25MP3 Compression Flow Chart
26
- Layer 3 adds a two-stage (hybrid) filter bank, higher frequency resolution and improved Huffman coding to the basic perceptual coder principle.
Figure labels: MDCT filter bank, QMF filter bank
27
- Bit rates available
  - MPEG-1 Layer 3 supports bit rates of 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s and sampling frequencies of 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (it coincides with the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular on peer-to-peer file sharing networks.
  - MPEG-2 and the unofficial MPEG-2.5 add the bit rates 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s and provide lower sampling frequencies (8, 11.025, 12, 16, 22.05 and 24 kHz).
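To put these bit rates in perspective, the sketch below compares them with uncompressed CD audio and computes the size of one MPEG-1 Layer 3 frame. It assumes 16-bit stereo PCM at 44.1 kHz as the reference and 1152 samples per frame, which holds for MPEG-1 Layer 3 only (the lower MPEG-2 and MPEG-2.5 sampling rates use shorter frames and are not covered). The function names are illustrative.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer 3 only


def pcm_bit_rate(sample_rate_hz: int = 44_100,
                 bits_per_sample: int = 16,
                 channels: int = 2) -> int:
    """Bit rate of uncompressed PCM in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bit_rate_bps: int) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return pcm_bit_rate() / coded_bit_rate_bps


def layer3_frame_bytes(bit_rate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Frame length in bytes: (1152 / 8) * bit_rate / sample_rate, plus padding."""
    return (SAMPLES_PER_FRAME // 8) * bit_rate_bps // sample_rate_hz + padding


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    for kbps in (128, 192, 320):
        bps = kbps * 1000
        print(f"{kbps} kbit/s: ratio {compression_ratio(bps):.2f}:1, "
              f"{layer3_frame_bytes(bps, 44_100)} bytes per frame")
```

At 128 kbit/s and 44.1 kHz this gives a compression ratio of about 11:1 and frames of 417 bytes (418 with the padding byte), each covering about 26 ms of audio.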
28Design limitations of MP3
- There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder. Newer audio compression formats such as Vorbis and AAC no longer have these limitations.
- In technical terms, MP3 is limited in the following ways:
  - Bit rate is limited to a maximum of 320 kbit/s.
  - Time resolution can be too low for highly transient signals, causing some smearing of percussive sounds.
  - Frequency resolution is limited by the small long-block window size, decreasing coding efficiency.
  - There is no scale factor band for frequencies above 15.5/15.8 kHz.
  - Joint stereo is done on a frame-to-frame basis.
  - Encoder/decoder overall delay is not defined, which means there is no official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that allows players that are aware of it to deliver gapless playback.
- Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.
29Advanced Audio Coding (AAC)
- AAC is a standardized, lossy digital audio compression scheme. It was developed with the cooperation and contributions of companies including Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, and was officially declared an international standard by the Moving Picture Experts Group in April 1997.
- It is not backward compatible with other MPEG audio standards (such as MP3).
30
- AAC was promoted as the successor to MP3 for audio coding at medium to high bit rates.
- AAC follows the same basic coding paradigm as Layer 3 (high frequency resolution filter bank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer 3 in many details and uses new coding tools for improved quality at low bit rates.
- Its popularity is currently maintained by its being the default codec of iTunes, the media player that powers the iPod, the most popular digital audio player on the market.
- Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs (encapsulated with FairPlay Digital Rights Management).
31AAC's improvements over MP3
- Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
- Up to 48 channels
- Higher efficiency and simpler filter bank (hybrid → pure MDCT)
- Higher coding efficiency for stationary signals (block size 576 → 1024 samples)
- Higher coding efficiency for transient signals (block size 192 → 128 samples)
- Can use a Kaiser-Bessel derived window function to reduce spectral leakage at the expense of widening the main lobe
- Much better handling of frequencies above 16 kHz
- More flexible joint stereo (separate for every scale factor band)
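The effect of the block-size changes in the list above can be made concrete with a little arithmetic. The sketch assumes that the quoted block sizes are the number of spectral lines produced per block (which for an MDCT also equals the number of new input samples each block covers) and a 44.1 kHz sampling rate; the function names are illustrative.

```python
def line_spacing_hz(sample_rate_hz: float, lines_per_block: int) -> float:
    """Frequency spacing between adjacent spectral lines."""
    return (sample_rate_hz / 2.0) / lines_per_block


def block_advance_ms(sample_rate_hz: float, lines_per_block: int) -> float:
    """Time covered by the new samples of one block, in milliseconds."""
    return 1000.0 * lines_per_block / sample_rate_hz


if __name__ == "__main__":
    fs = 44_100
    for name, long_block, short_block in (("MP3", 576, 192), ("AAC", 1024, 128)):
        print(f"{name}: long block {line_spacing_hz(fs, long_block):.2f} Hz per line, "
              f"short block {block_advance_ms(fs, short_block):.2f} ms per block")
```

Under these assumptions the long-block line spacing drops from about 38.3 Hz (MP3) to about 21.5 Hz (AAC), which helps with stationary signals, while the short-block step shrinks from about 4.35 ms to about 2.90 ms, which helps with transients.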
32
- Both mid/side coding and intensity coding are more flexible, so they can be applied more often to reduce the bit rate.
- An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available within the rarely used Main profile.
- Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
- AAC and HE-AAC are far better than MP3 at very low bit rates, but at medium to higher bit rates the two formats are more comparable.
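The mid/side coding mentioned in the list above can be illustrated with a small round-trip example. This is a simplified sketch on plain lists of sample values, using the common (L+R)/2 and (L-R)/2 normalization; in AAC the tool operates on spectral coefficients and can be switched per scale factor band, which this example does not model. The function names are illustrative.

```python
def ms_encode(left, right):
    """Convert left/right channels into mid (sum) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


if __name__ == "__main__":
    left = [0.5, 0.25, -0.75]
    right = [0.5, 0.125, -0.75]   # nearly identical to the left channel
    mid, side = ms_encode(left, right)
    print("side energy:", energy(side), "vs left energy:", energy(left))
    print("round trip ok:", ms_decode(mid, side) == (left, right))
```

When the two channels are highly correlated, almost all of the energy ends up in the mid signal and the side signal needs very few bits, which is the redundancy that joint stereo coding exploits.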
33
- Modular encoding
  - AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles to define which of a specific set of tools they want to use for a particular application. The standard offers four default profiles:
    - Low Complexity (LC): the simplest and most widely used and supported
    - Main Profile (MAIN): like the LC profile, with the addition of backward prediction
    - Sample-Rate Scalable (SRS), a.k.a. Scalable Sample Rate (MPEG-4 AAC-SSR)
    - Long Term Prediction (LTP): added in the MPEG-4 standard; an improvement of the MAIN profile using a forward predictor with lower computational complexity
- Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give perceptual quality that is nearly the same as, or better than, 128 kbit/s MP3.
34MPEG-2 AAC Flowchart
35MPEG AAC Family
36Extensions and Improvements
- Some extensions have been added to the original AAC standard:
  - MPEG-4 Scalable To Lossless (SLS)
  - High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+: the combination of SBR (Spectral Band Replication) and AAC, used for low bit rates
  - HE-AAC v2, a.k.a. aacPlus v2: the combination of Parametric Stereo (PS) and HE-AAC
  - Perceptual Noise Substitution (PNS)
  - Long Term Predictor (LTP): added in MPEG-4 Part 3
37MPEG AAC Performance
- MPEG AAC provides excellent audio quality. Reaching perceptually transparent quality at only 64 kbit/s per channel, it fulfills the requirements for broadcast quality as defined by the European Broadcasting Union.
- With sampling rates ranging from 8 kHz up to 96 kHz and above, bit rates up to 256 kbit/s, and support for up to 48 channels, MPEG AAC is one of the most flexible audio codecs. The standard also supports mono, stereo, and all common multi-channel configurations (e.g. 5.1 or 7.1).
- Its low computational demands make AAC well suited to low bit rate, high-quality audio applications.
38MPEG-HE AAC
- HE-AAC is the low bit rate codec in the AAC family and is a combination of the AAC LC (Advanced Audio Coding Low Complexity) audio coder and the SBR (Spectral Band Replication) bandwidth expansion tool.
- This combination achieves good stereo quality at bit rates as low as 32 to 48 kbit/s. HE-AAC is also known as aacPlus and can be used in multi-channel operation.
39MPEG-4 HE-AAC v2
- Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates around 16 to 24 kbit/s for stereo content.
- HE-AAC v2 is also known as aacPlus v2.
40
- Rough work
  - Explain basic psychoacoustic principles: absolute threshold of hearing, critical bands, phenomenon of masking (simultaneous, masking asymmetry, spread of masking, non-simultaneous), perceptual entropy
  - MPEG audio codec family: mp3, MPEG-2 AAC, MPEG-4 AAC, aacPlus version 1, aacPlus version 2
  - (mention features present/absent in each)
41
- Limitations of mp3
- What is different in AAC?
- Features in AAC
- Explain each feature in detail (MPEG-2, MPEG-4)