Title: MPEG-4
1MPEG-4
John Lazzaro, John Wawrzynek, June 18, 2001
Modified by Francois Thibault, January 20, 2003
Further modified by Ichiro Fujinaga, January 20, 2005
CS Division, University of California at Berkeley
www.cs.berkeley.edu/johnw
2MPEG 4 Standard
- Finalized its standardization process in 1999 (Vancouver)
- Designed to integrate visual and audio
- Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video
3MPEG 4 Scope
- Provides a set of technologies to satisfy the needs of
  - authors
  - network service providers
  - end users
- Enables the production of content that has far greater reusability in
  - digital television
  - animated graphics
  - web pages
4MPEG 4 Features
- MPEG-4 provides standardized ways to
  - represent units of aural, visual or audiovisual content, called media objects
    - Natural origin: recorded with a camera or microphone
    - Synthetic origin: generated with a computer
  - describe the composition of these objects to create compound media objects that form audiovisual scenes
  - multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service)
  - interact with the audiovisual scene generated at the receiver's end
5MPEG 4 Standard (audio)
[Diagram: MPEG 4 comprises audio, system, and video. Audio splits into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
ISO/IEC 14496-3 sec5
6MPEG 4 Audio Natural (recorded)
- AAC: Advanced Audio Coding
  - Originally created as an extension to MPEG-2
  - Provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel
- CELP: a codebook-excited linear prediction scheme
  - Optimized for telephone-quality transmission of speech in the range 8-32 kbps
- Parametric
  - A novel "harmonic vector noise" method that allows lossy but extremely low-bitrate coding of wideband sounds down to 2 kbps/channel
7MPEG 4 Audio Synthetic (synthesized)
- Structured Audio
  - A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream
  - The receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received
- Text-to-Speech
  - An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations
  - No "method" of creating synthetic speech is standardized by MPEG
8MPEG 4 Standard - Structured Audio
[Diagram: MPEG 4 comprises audio, system, and video. Audio splits into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
Structured Audio: one component in the MPEG audio standard.
ISO/IEC 14496-3 sec5
9Audio Compression Basics
10The Kolmogorov alternative
- Write a computer program that generates the desired audio stream.
- Transmit the computer program.
- To decode, execute the program.
Similar to PostScript!
- MPEG-4 Structured Audio (MP4-SA) uses this approach.
- Eric Scheirer, Editor (MIT Media Lab).
- http://sound.media.mit.edu/eds/mpeg4/
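The "transmit a program, then execute it" idea can be illustrated with a toy Python sketch. This is not the MP4-SA bitstream format: the JSON parameter set standing in for the program, the `decode` function, and the sample rate are all assumptions made for illustration only.

```python
import json
import math
import struct

SAMPLE_RATE = 44100  # assumed sample rate, not taken from the slides


def decode(program: bytes) -> bytes:
    """Run the transmitted description and return 16-bit mono PCM."""
    params = json.loads(program)
    n = int(params["dur"] * SAMPLE_RATE)
    step = 2 * math.pi * params["freq"] / SAMPLE_RATE
    samples = (int(32767 * params["amp"] * math.sin(step * i)) for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)


# Hypothetical "program": a description of the sound, written by hand.
program = json.dumps({"freq": 440.0, "amp": 0.5, "dur": 1.0}).encode()
pcm = decode(program)  # the receiver regenerates the audio locally
ratio = len(pcm) / len(program)
print(len(program), "bytes sent;", len(pcm), "bytes of audio; ratio", round(ratio))
```

Only the short description crosses the network, so the ratio grows with the duration of the sound it describes.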
11MP4-SA Encoding
- may be a creative act: writing a program,
  - directly (emacs), or
  - indirectly (GUI, webpage)
  - In this case, MP4-SA is a lossless compressor.
- may be automatic: given a sound, an encoder writes a program that generates the sound.
  - Automatic encoding is hard in the general case.
12Key Application Music Production
- Modern music production is computer-based.
- Musicians enter performances into computers as control information, not audio waveforms.
- Digital synthesizers, effects, and mixes create the final audio, under engineer/producer control.
Ideal for collaborative productions, remixes, and
...
14Key Application Music Performance
- Music performance requires dynamic control.
- True interactivity requires parameterized sounds.
- Musicians control instruments and effects with interactive controllers.
- Control could be indirect and remote (e.g., games).
15MPEG 4 Structured Audio
- A binary file format that encodes
  - The programming language SAOL (pronounced "sail").
  - The musical score language SASL.
  - Legacy support for MIDI.
  - Audio sample data.
- Result is normative: an MP4-SA file will sound identical on all compliant decoders.
  - Different from MIDI files.
16Why SAOL and MP4-SA? Why not Java?
- Musical performances have temporal structure that changes over several timescales.
- Writing sound generation code in a conventional language results in code dominated by time-scale management.
  - Hard to maintain, hard to optimize.
17Time management is built into SAOL.
- A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
- Work is scheduled to happen
  - at the a-rate (the audio sample rate)
  - at the k-rate (envelope control rate)
  - at the i-rate (rate for new notes)
- Language variables are typed as a/k/i-rate.
- A language statement is scheduled based on the
rate of the variables it contains.
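The three rates can be pictured as nested loops driven by a simulated clock. The Python sketch below is only a rough model of that idea, not how any SAOL decoder is implemented; `run_note`, `S_RATE`, and `K_RATE` are hypothetical names and values chosen for illustration.

```python
S_RATE = 8000   # a-rate: audio samples per second (assumed value)
K_RATE = 100    # k-rate: control passes per second (assumed value)


def run_note(dur: float) -> dict:
    """Count how often each kind of statement would run for one note."""
    counts = {"i": 0, "k": 0, "a": 0}
    counts["i"] += 1                      # i-rate: once, when the note starts
    samples_per_k = S_RATE // K_RATE
    for _ in range(int(dur * K_RATE)):    # simulated clock: one k-cycle per step
        counts["k"] += 1                  # k-rate work (e.g. envelope updates)
        for _ in range(samples_per_k):
            counts["a"] += 1              # a-rate work (one pass per sample)
    return counts


print(run_note(0.5))
```

In a conventional language the programmer writes these loops by hand; in SAOL the rate of each variable determines where a statement runs.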
18SAOL, SASL, and Scheduling
- Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
- A SAOL subprogram (called an instr or instrument) serves as the instrument.
- SASL commands (called score lines) act to play notes on SAOL instruments.
- Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.
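The relationship between score lines and instrument instances can be sketched in Python. This is a simplified model under stated assumptions: the `ScoreLine` class, its field layout (start time, instr name, duration, parameters), and the example score are hypothetical and do not reproduce SASL syntax.

```python
from dataclasses import dataclass


@dataclass
class ScoreLine:
    start: float   # time at which the note is launched, in seconds
    instr: str     # name of the instr to instantiate
    dur: float     # note duration, in seconds
    args: tuple    # parameters passed to the instr (e.g. note, loudness)


# Hypothetical score: three notes played on the same instr.
score = [
    ScoreLine(0.0, "tone", 1.0, (60, 0.5)),
    ScoreLine(0.5, "tone", 1.0, (64, 0.5)),
    ScoreLine(2.0, "tone", 0.5, (67, 0.5)),
]


def active_instances(lines, t):
    """Return the score lines whose instr instance is sounding at time t."""
    return [ln for ln in lines if ln.start <= t < ln.start + ln.dur]


print(len(active_instances(score, 0.75)))  # two overlapping instances of tone
```

Each active score line corresponds to its own instance with its own state, which is why overlapping notes on one instr do not interfere.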
19An example
- SAOL instrument tone, that plays a gated sine
wave. (SAOL code in next slide.)
20SAOL code for tone
instr tone (note, loudness)
{
  ivar a;          // sets osc f
  ksig env;        // env output
  asig x, y;       // osc state
  asig init;

  a = 2*sin(3.141597*cpsmidi(note)/s_rate);
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

  if (init == 0)   // first a-pass only
  {
    x = loudness;
    init = 1;
  }

  x = x - a*y;     // the FLOPS happen in
  y = y + a*x;     // these 3 statements
  output(y*env);   // creates audio output
}                  // end of instr tone
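For readers without a SAOL decoder, the oscillator recurrence at the heart of tone can be tried in Python. This is a sketch, not a port: the envelope (gating) is omitted, the sample rate is assumed, and the `cpsmidi` function below is a stand-in written for this example using the usual A4 = 440 Hz mapping.

```python
import math

S_RATE = 44100  # assumed sample rate


def cpsmidi(note: float) -> float:
    """Stand-in: MIDI note number to frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def tone(note: float, loudness: float, n: int) -> list:
    """Ungated oscillator using the recurrence x -= a*y; y += a*x."""
    a = 2 * math.sin(math.pi * cpsmidi(note) / S_RATE)
    x, y = loudness, 0.0
    out = []
    for _ in range(n):
        x = x - a * y   # uses the previous y
        y = y + a * x   # uses the x just updated
        out.append(y)
    return out


samples = tone(69, 1.0, S_RATE)  # one second of note 69
# Count upward zero crossings as a rough estimate of the frequency in Hz.
crossings = sum(1 for p, q in zip(samples, samples[1:]) if p < 0 <= q)
print(crossings)
```

The per-sample cost is two multiplies and two additions, which is why the SAOL comments point at these statements as the place where the FLOPS happen.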
21SAOL Features
- Rate semantics
- i/k/a-rate execution
- Vector arithmetic
  - e.g., A = B op C means A[i] = B[i] op C[i] for i = 1..n
- All floating-point arithmetic.
- Extensive built-in audio function library
  - signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...
22Sfront - a SAOL-to-C translator
- Converts MP4-SA files to an ANSI C program that, when executed, produces audio.
- Runs on UNIX, Windows, MacOS.
- Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol).
- www.cs.berkeley.edu/lazzaro/sa
23Generator Techniques
- Much of the SA standard describes a library
  - 104 core opcodes (e.g., pow(), allpass(), reverb())
  - 16 wave table generators (e.g., harm, spline, random)
- Sfront optimizes the code produced for each library element instance based on the invocation attributes
  - rate, width, size, constancy, integral nature of the parameters, number of parameters
24Conclusions
- MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  - Physical modeling: good
  - Sampling natural instruments: bad
- If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
- MP4-SA specifies that a decoder produces audio that sounds identical to computing the program accurately.