Title: MPEG-4 Structured Audio
1. MPEG-4 Structured Audio
John Lazzaro, John Wawrzynek
June 18, 2001
Modified by Francois Thibault, January 20, 2003
CS Division, University of California at Berkeley
www.cs.berkeley.edu/johnw
2. MPEG-4 Standard
[Diagram: MPEG-4 comprises audio, system, and video. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
- Structured Audio: one component in the MPEG-4 audio standard.
- ISO/IEC 14496-3, sec. 5
3. Audio Compression Basics
- How well does this work?
  - Perceptually lossless: 10X-20X reduction
    - MP3, Dolby AC3
  - True lossless: 2.5X reduction
    - Shorten, T. Robinson (Cambridge University)
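To put the reduction factors above in concrete terms, the following is a minimal arithmetic sketch. It assumes CD-quality PCM (44.1 kHz, 16 bits, 2 channels) as the uncompressed baseline; that baseline and the function names are illustrative assumptions, not taken from the slides.

```python
# Assumption (not from the slides): the uncompressed source is CD-quality PCM.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate_kbps(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE,
                     channels=CHANNELS):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000.0


def compressed_bitrate_kbps(reduction_factor):
    """Bitrate left after dividing the PCM bitrate by a reduction factor."""
    return pcm_bitrate_kbps() / reduction_factor


for label, factor in [("perceptually lossless, 10X", 10),
                      ("perceptually lossless, 20X", 20),
                      ("true lossless, 2.5X", 2.5)]:
    print(f"{label}: {compressed_bitrate_kbps(factor):.1f} kbit/s")
```

Under this assumed baseline of about 1411 kbit/s, a 10X-20X reduction lands roughly between 70 and 141 kbit/s, and a 2.5X reduction at roughly 564 kbit/s.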
4. The Kolmogorov alternative
- Write a computer program that generates the desired audio stream.
- Transmit the computer program.
- To decode, execute the program.
Similar to PostScript!
- MPEG-4 Structured Audio (MP4-SA) uses this approach.
- Eric Scheirer, Editor (MIT Media Lab).
- http://sound.media.mit.edu/eds/mpeg4/
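The "transmit the program, execute to decode" idea can be illustrated with a toy sketch in Python rather than SAOL. The program text, the `render` and `decode` names, and the tone parameters are all hypothetical; the point is only that the transmitted artifact is source text, and the decoder recovers audio by running it.

```python
import math

# The "transmitted" artifact: a short program instead of the samples themselves.
# Program text, sample rate, and tone parameters are illustrative assumptions.
PROGRAM = """
def render():
    rate, freq, seconds = 8000, 440.0, 1.0
    return [math.sin(2 * math.pi * freq * n / rate)
            for n in range(int(rate * seconds))]
"""


def decode(program_text):
    """Decode by executing the received program and collecting its output."""
    namespace = {"math": math}
    exec(program_text, namespace)
    return namespace["render"]()


samples = decode(PROGRAM)
program_bytes = len(PROGRAM.encode("utf-8"))
pcm_bytes = len(samples) * 2  # size if the same audio were sent as 16-bit PCM
print(program_bytes, pcm_bytes)
```

In this toy case the program is far smaller than the one second of PCM it generates, and its size does not grow if the tone is made longer.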
5. MP4-SA Encoding
- Encoding may be a creative act: writing a program,
  - directly (emacs), or
  - indirectly (GUI, webpage).
  - In this case, MP4-SA is a lossless compressor.
- Encoding may be automatic: given a sound, an encoder writes a program that generates the sound.
  - Automatic encoding is hard in the general case.
6. Key Application: Music Production
- Modern music production is computer-based.
- Musicians enter performances into computers as control information, not audio waveforms.
- Digital synthesizers, effects, and mixes create the final audio, under engineer/producer control.
Ideal for collaborative productions, remixes, and
...
8. Key Application: Music Performance
- Music performance requires dynamic control.
- True interactivity requires parameterized sounds.
- Musicians control instruments and effects with interactive controllers.
- Control could be indirect and remote (e.g., games).
9. MPEG-4 Structured Audio
- A binary file format that encodes:
  - The programming language SAOL (pronounced "sail").
  - The musical score language SASL.
  - Legacy support for MIDI.
  - Audio sample data.
- The result is normative: an MP4-SA file will sound identical on all compliant decoders.
  - Different from MIDI files.
10. Why SAOL and MP4-SA? Why not Java?
- Musical performances have temporal structure that changes over several timescales.
- Writing sound generation code in a conventional language results in code dominated by time-scale management.
  - Hard to maintain, hard to optimize.
11. Time management is built into SAOL.
- A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
- Work is scheduled to happen:
  - at the a-rate (the audio sample rate)
  - at the k-rate (envelope control rate)
  - at the i-rate (rate for new notes)
- Language variables are typed as a/k/i-rate.
- A language statement is scheduled based on the rate of the variables it contains.
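The three rates can be pictured as nested levels of a loop driven by a simulated clock. The sketch below is written in Python, not SAOL, and is only an analogy for the scheduling described above: the function name `render_note`, the 8000 Hz audio rate, the 100 Hz control rate, and the linear fade-out envelope are all assumptions made for illustration.

```python
import math


def render_note(freq_hz, dur_s, a_rate=8000, k_rate=100):
    """Render one note with i-, k-, and a-rate work kept in separate places."""
    samples_per_k = a_rate // k_rate  # audio samples per control period

    # i-rate: runs once, when the note starts.
    phase_step = 2 * math.pi * freq_hz / a_rate
    k_cycles = int(dur_s * k_rate)

    phase, out = 0.0, []
    for k in range(k_cycles):
        # k-rate: runs once per control period (here, a linear fade-out).
        env = 1.0 - k / k_cycles
        for _ in range(samples_per_k):
            # a-rate: runs once per audio sample.
            out.append(env * math.sin(phase))
            phase += phase_step
    return out


note = render_note(440.0, 0.5)
print(len(note))
```

In a conventional language the programmer writes these loops by hand; the slides' point is that SAOL derives the placement of each statement from the rate of the variables it touches.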
12. SAOL, SASL, and Scheduling
- Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
- A SAOL subprogram (called an instr or instrument) serves as the instrument.
- SASL commands (called score lines) act to play notes on SAOL instruments.
- Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.
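The relationship between score lines and instrument instances can be sketched as a small scheduler. This is a Python analogy, not SASL or SAOL syntax: the `(start_s, freq_hz, dur_s)` tuples stand in for score lines, and `ToneInstance`, `play`, and the 8000 Hz rate are hypothetical names and values chosen for illustration.

```python
import math

A_RATE = 8000  # assumed audio sample rate for this sketch


class ToneInstance:
    """One active note: an instance of a hypothetical 'tone' instrument."""

    def __init__(self, freq_hz, dur_s):
        self.step = 2 * math.pi * freq_hz / A_RATE
        self.remaining = int(dur_s * A_RATE)
        self.phase = 0.0

    def next_sample(self):
        value = math.sin(self.phase)
        self.phase += self.step
        self.remaining -= 1
        return value


def play(score, total_s):
    """score: list of (start_s, freq_hz, dur_s) tuples standing in for score lines."""
    pending = sorted(score)
    active, out, peak_polyphony = [], [], 0
    for n in range(int(total_s * A_RATE)):
        # Launch every note whose start time has been reached.
        while pending and pending[0][0] * A_RATE <= n:
            _, freq, dur = pending.pop(0)
            active.append(ToneInstance(freq, dur))
        peak_polyphony = max(peak_polyphony, len(active))
        # Mix all currently active instances into one output sample.
        out.append(sum(inst.next_sample() for inst in active))
        # Retire instances whose duration has elapsed.
        active = [inst for inst in active if inst.remaining > 0]
    return out, peak_polyphony


audio, peak = play([(0.0, 440.0, 0.5), (0.25, 660.0, 0.5)], total_s=1.0)
print(len(audio), peak)
```

The two overlapping notes each get their own instance with independent state, which is the behaviour the last bullet describes for SAOL instrs launched by different score lines.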
13. An example
- A SAOL instrument, tone, that plays a gated sine wave. (SAOL code in next slide.)
14. SAOL Features
- Rate semantics
  - i/k/a-rate execution
- Vector arithmetic
  - e.g., A = B + C is equivalent to: for i = 1..n, A[i] = B[i] + C[i]
- All floating-point arithmetic.
- Extensive built-in audio function library
  - signal generators, table operators, pitch converters, filters, FFT, sample rate conversion, effects, ...
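The vector arithmetic bullet describes an array expression as shorthand for an element-wise loop. A minimal Python sketch of that expansion follows; addition is used as the example operator, and the function name `vector_add` is an assumption for illustration.

```python
def vector_add(b, c):
    """Element-wise A = B + C, written as the explicit loop it stands for."""
    if len(b) != len(c):
        raise ValueError("operands must have the same width")
    a = [0.0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a


print(vector_add([1.0, 2.0], [0.5, 0.5]))
```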
15. Spectrum of implementations
Significant development and maintenance complexity
Zoia Alverti, EPFL, ICASSP 2001
ISO/IEC 14496-3 sec 5, reference implementation
16. Sfront - a SAOL-to-C translator
- Converts MP4-SA files to an ANSI C program that, when executed, produces audio.
- Runs on UNIX, Windows, MacOS.
- Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP.
- www.cs.berkeley.edu/lazzaro/sa
17. Generator Techniques
- Much of the SA standard describes a library:
  - 104 core opcodes (e.g., pow(), allpass(), reverb())
  - 16 wave table generators (e.g., harm, spline, random)
- Sfront optimizes the code produced for each library element instance based on the invocation attributes:
  - rate, width, size, constancy, integral nature of the parameters, number of parameters
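To illustrate what specializing generated code on invocation attributes might look like, here is a toy code generator in Python that emits a C expression for a pow() call. It is a hypothetical sketch, not sfront's actual logic: the function `emit_pow`, the cutoff of 4, and the emitted strings are assumptions. It uses two of the attributes listed above, constancy and integral nature of a parameter.

```python
def emit_pow(base_expr, exponent_expr):
    """Return a C expression string for pow(base, exponent).

    Hypothetical specialization rule, not sfront's actual logic: when the
    exponent is a small non-negative integer constant, unroll to multiplies.
    Assumes base_expr has no side effects, since it may be repeated.
    """
    try:
        value = float(exponent_expr)
    except ValueError:
        value = None  # exponent is not a compile-time constant

    if value is not None and value.is_integer() and 0 <= value <= 4:
        n = int(value)
        if n == 0:
            return "1.0F"
        return "(" + " * ".join([f"({base_expr})"] * n) + ")"

    # General case: fall back to the library call.
    return f"pow({base_expr}, {exponent_expr})"


print(emit_pow("x", "2"))
print(emit_pow("x", "k"))
```

A constant integral exponent yields a chain of multiplies, while a variable or fractional exponent keeps the general library call.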
18. Interesting Issues
- MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  - Physical modeling: good
  - Sampling natural instruments: bad
- If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
- Physical modeling is relatively immature, but holds much promise.
19. Interesting Issues (cont.)
- MP4-SA specifies that a decoder produces audio that sounds identical to computing the program accurately.
- A new role for psychophysics:
  - Instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.
  - Leverage spectral and temporal masking.
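One way to picture "squeezing FLOPS" is a decoder that skips computations whose result would be inaudible. The Python sketch below is a deliberately crude stand-in: it drops additive-synthesis partials that fall more than 60 dB below the loudest one. The fixed threshold, the function names, and the partial list are assumptions for illustration; a real decoder would need a proper spectral and temporal masking model, which this is not.

```python
import math


def prune_partials(partials, floor_db=-60.0):
    """Drop partials more than |floor_db| dB below the loudest one.

    partials: list of (freq_hz, amplitude) pairs. The fixed threshold is a
    placeholder assumption, not a real masking model.
    """
    loudest = max(amp for _, amp in partials)
    floor = loudest * 10 ** (floor_db / 20.0)
    return [(f, a) for f, a in partials if a >= floor]


def render(partials, n_samples, rate=8000):
    """Additive synthesis: cost grows with the number of partials kept."""
    return [sum(a * math.sin(2 * math.pi * f * n / rate) for f, a in partials)
            for n in range(n_samples)]


full = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.0005), (880.0, 0.0001)]
kept = prune_partials(full)
print(len(full), len(kept))
```

Rendering only the kept partials halves the per-sample work in this example, while the waveform differs from the full rendering by at most the summed amplitude of the dropped partials.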