MPEG 4 Structured Audio

1
MPEG 4 Structured Audio
  • Algorithmic Sound for the Internet and Beyond

John Lazzaro, John Wawrzynek
Sep 1, 1999
CS Division, University of California at Berkeley
www.cs.berkeley.edu/johnw
2
MPEG 4 Structured Audio
  • Outline
  • Motivation for structured audio
  • Introduction to MP4-SA
  • Example encoding
  • C translator
  • Physical Instrument Modeling
  • Hardware Architectures
  • Future directions

3
Digital Audio Basics
  • Mono: 705.6 kbps (44.1 kHz × 16 bits)
  • Cell-phone network: 5-10 kbps
  • Dial-up modems: 50 kbps
  • xDSL: 128 to 1000 kbps
  • How well does compression work?
  • True lossless: 2.5X reduction (Shorten, T. Robinson, Cambridge University)
  • Perceptually lossless: 10X-20X reduction (MP3, Dolby AC3)
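
The figures above can be checked with a little arithmetic. The sketch below assumes 16-bit samples at 44.1 kHz (the slide states only the resulting 705.6 kbps) and computes the reduction each link would need to carry a mono stream; the function names are illustrative.

```python
# Sketch: uncompressed PCM bit rate and the reduction each link would need.
# Assumption: 16-bit samples at 44.1 kHz; the slide gives only the result.

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16


def pcm_kbps(channels: int) -> float:
    """Uncompressed PCM bit rate in kbps."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * channels / 1000.0


def reduction_needed(source_kbps: float, link_kbps: float) -> float:
    """Compression factor needed to fit a stream into a link."""
    return source_kbps / link_kbps


mono = pcm_kbps(1)
links_kbps = {"cell phone": 10, "dial-up modem": 50, "xDSL (low end)": 128}
for name, kbps in links_kbps.items():
    print(f"{name}: {reduction_needed(mono, kbps):.1f}X reduction needed")
```

Under these assumptions a dial-up modem needs roughly a 14X reduction, which falls inside the perceptually lossless range quoted on the slide, while a cell-phone link needs far more.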

4
The Kolmogorov alternative
  • Write a computer program that generates the
    desired audio stream.
  • Transmit the computer program.
  • To decode, execute the program.

Similar to PostScript!
  • MPEG-4 Structured Audio (MP4-SA) uses this
    approach.
  • Final draft standard: Nov 15, 1998.
  • Eric Scheirer, Editor (MIT Media Lab).
  • http://sound.media.mit.edu/eds/mpeg4/
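
To make the idea concrete, here is a minimal sketch in Python rather than SAOL. A short program text stands in for the transmitted bitstream, and "decoding" means executing it. The names `PROGRAM`, `render`, and `decode` are hypothetical and are not part of MP4-SA.

```python
import math
import struct

# The "bitstream": a tiny program that generates audio when executed.
PROGRAM = """
def render(seconds, rate=44100, freq=440.0):
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
"""


def decode(program: str, seconds: float) -> bytes:
    """Decode by executing the program, then pack the samples as 16-bit PCM."""
    namespace = {"math": math}
    exec(program, namespace)
    samples = namespace["render"](seconds)
    return b"".join(struct.pack("<h", int(32767 * s)) for s in samples)


pcm = decode(PROGRAM, seconds=5.0)
program_bytes = len(PROGRAM.encode("utf-8"))
ratio = len(pcm) / program_bytes
print(f"program: {program_bytes} bytes, audio: {len(pcm)} bytes, ratio: {ratio:.0f}X")
```

A pure sine tone is the most favorable case possible; the ratio says nothing about how compactly real music can be described.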

5
MP4-SA Encoding
  • Encoding may be a creative act: writing a program
    directly (emacs) or indirectly (GUI, webpage).
  • In this case, MP4-SA is a lossless compressor.
  • Encoding may be automatic: given a sound, an encoder
    writes a program that generates the sound.
  • Automatic encoding is a hard problem in the
    general case.

6
Key Application: Music Production
  • Modern music production is computer-based.
  • Musicians enter performances into computers as
    control information, not audio waveforms.
  • Digital synthesizers, effects, and mixes create
    the final audio, under engineer/producer control.

Ideal format for collaborative productions,
remixes, ...
8
MPEG 4 Structured Audio
  • A binary file format that encodes:
  • The programming language SAOL (pronounced "sail").
  • The musical score language SASL.
  • Legacy support for MIDI.
  • Audio sample data.
  • Result is normative: an MP4-SA file will sound
    identical on all compliant decoders.
  • Different from MIDI files.

9
MPEG 4 Standard
[Diagram: MPEG 4 comprises audio, system, and video. Audio splits into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
Structured Audio: one component in the MPEG audio standard.
ISO/IEC 14496-3, sec. 5
10
MPEG 4 Standard
Advanced Audio Coding (AAC): successor to MP3; delivers the
highest-quality audio, at the highest bit rate.
11
MPEG 4 Standard
Time-Frequency (T/F) Coding: meant for a moderate
bit-rate range, with moderate quality.
12
MPEG 4 Standard
Code Excited Linear Prediction (CELP): low bit-rate
coder; works best as a speech coder.
13
MPEG 4 Standard
Parametric coders: very low bit-rate coders; work
best as speech coders.
14
MPEG 4 Standard
Text-to-Speech (TTS): takes phonetic and prosodic
control information; produces synthesized
speech.
15
MPEG 4 Standard
System level: includes mechanisms for composing
and synchronizing audio (and video) components.
16
Why SAOL and MP4-SA? Why not Java?
  • Musical performances have temporal structure that
    changes over several timescales.
  • Writing sound generation code in a conventional
    language results in code dominated by time-scale
    management.
  • Hard to maintain, hard to optimize.

17
Time management is built into SAOL.
  • A SAOL program executes by moving a simulated
    clock forward in time, performing calculations
    along the way in a synchronous fashion.
  • Work is scheduled to happen:
  • at the a-rate (the audio sample rate),
  • at the k-rate (the envelope control rate), or
  • at the i-rate (the rate for new notes).
  • Language variables are typed as a/k/i-rate.
  • A language statement is scheduled based on the
    rate of the variables it contains.
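
The scheduling rule in the last bullet can be sketched as a small function. This is a simplification written in Python, not SAOL's actual rule set: it treats a statement as running at the fastest rate among its variables and ignores the rates of opcodes. The variable table is borrowed from the tone example later in the deck; `RATE_ORDER`, `statement_rate`, and `rate_of` are hypothetical names.

```python
# Rates ordered from slowest (i) to fastest (a).
RATE_ORDER = {"i": 0, "k": 1, "a": 2}


def statement_rate(variable_rates):
    """A statement runs at the fastest rate among the variables it contains."""
    return max(variable_rates, key=RATE_ORDER.__getitem__)


# Declared rates, as in: ivar a; ksig env; asig x, y;
declared = {"a": "i", "note": "i", "env": "k", "x": "a", "y": "a"}


def rate_of(statement_vars):
    """Look up each variable's declared rate and pick the statement's rate."""
    return statement_rate([declared[v] for v in statement_vars])


print(rate_of(["a", "note"]))     # only i-rate variables
print(rate_of(["env"]))           # a k-rate variable
print(rate_of(["x", "a", "y"]))   # mixes i-rate and a-rate variables
```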

18
SAOL, SASL, and Scheduling
  • Sound creation in MP4-SA can be compared to a
    musician playing notes on an instrument.
  • A SAOL subprogram (called an instr or instrument)
    serves as the instrument.
  • SASL commands (called score lines) act to play
    notes on SAOL instruments.
  • Many instances of a SAOL instr can be active at
    one time, making sounds corresponding to notes
    launched by different score lines in a SASL file.
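
The relationship between score lines and instrument instances can be sketched as follows. The `ScoreLine` record is a simplified stand-in for a SASL score line (start time, instrument name, duration, parameters), not SASL syntax, and the three-note score is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoreLine:
    start: float      # launch time in seconds
    instr: str        # name of the instrument to instantiate
    duration: float   # note length in seconds
    params: tuple     # values passed to the instrument (here: note, loudness)


# Three overlapping notes played on the same instrument.
score = [
    ScoreLine(0.0, "tone", 1.0, (60, 0.5)),
    ScoreLine(0.5, "tone", 1.0, (64, 0.5)),
    ScoreLine(1.0, "tone", 1.0, (67, 0.5)),
]


def active_instances(score, t):
    """Score lines whose instance is sounding at time t."""
    return [line for line in score if line.start <= t < line.start + line.duration]


for t in (0.25, 0.75, 1.25, 2.5):
    notes = [line.params[0] for line in active_instances(score, t)]
    print(f"t={t}: active notes {notes}")
```

Each entry returned by `active_instances` corresponds to one live instance of the instrument, so two overlapping notes mean two instances running at once.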

19
Single Note Execution Trace
  • SAOL instruments contain all the instructions for
    playing a note:
  • Code that runs at note launch (once per i-pass).
  • Code that models timbre evolution at the k-rate
    (once per k-pass).
  • Code to generate audio samples at the a-rate.

Executing a Note (k-rate 4 kHz, a-rate 40 kHz)

time (us)   pass
0           i-pass
0           k-pass
0           a-pass
25          a-pass
50          a-pass
...
225         a-pass
250         k-pass
250         a-pass
275         a-pass
300         a-pass
...
475         a-pass
500         k-pass
500         a-pass
525         a-pass
...
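
The trace follows directly from the two rates: a 4 kHz k-rate gives a k-pass every 250 us, a 40 kHz a-rate gives an a-pass every 25 us, so each k-pass is followed by ten a-passes. A minimal Python sketch that regenerates the same ordering (the generator name `trace` is illustrative):

```python
K_RATE_HZ = 4_000
A_RATE_HZ = 40_000


def trace(num_k_passes):
    """Yield (time_us, pass_name) in execution order for a single note."""
    k_period_us = 1_000_000 // K_RATE_HZ   # 250 us between k-passes
    a_period_us = 1_000_000 // A_RATE_HZ   # 25 us between a-passes
    a_per_k = A_RATE_HZ // K_RATE_HZ       # 10 a-passes per k-pass
    yield 0, "i-pass"                      # runs once, at note launch
    for k in range(num_k_passes):
        t_k = k * k_period_us
        yield t_k, "k-pass"
        for a in range(a_per_k):
            yield t_k + a * a_period_us, "a-pass"


events = list(trace(3))
for time_us, name in events[:5]:
    print(time_us, name)
```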
20
An example
  • The SAOL instrument tone plays a gated sine
    wave. (SAOL code on the next slide.)

21
SAOL code for tone
  instr tone (note, loudness) {
    ivar a;            // sets osc f
    ksig env;          // env output
    asig x, y;         // osc state
    asig init;

    a = 2*sin(3.141593*cpsmidi(note)/s_rate);
    env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

    if (init == 0) {   // first a-pass only
      x = loudness;
      init = 1;
    }

    x = x - a*y;       // the FLOPS happen in
    y = y + a*x;       // these 3 statements
    output(y*env);     // creates audio output
  }                    // end of instr tone
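
For readers without a SAOL decoder, the instrument can be approximated in Python. This is a sketch under stated assumptions, not a faithful SAOL implementation: `cpsmidi` uses the usual equal-tempered mapping (note 69 = 440 Hz), `kline` is written as a plain function of time rather than a stateful opcode, and the k-rate of 1050 Hz is chosen only because it divides 44100 evenly.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def kline(t, dur):
    """Envelope: 0 -> 0.5 over 0.1 s, hold at 0.5, then 0.5 -> 0 over the last 0.1 s."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    return max(0.0, 0.5 * (dur - t) / 0.1)


def tone(note, loudness, dur, s_rate=44_100, k_rate=1_050):
    """Render one note; the three nested levels mirror the i-, k-, and a-rate code."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)   # i-rate: once per note
    x, y = loudness, 0.0                                    # the first-a-pass init
    out = []
    a_per_k = s_rate // k_rate
    for k in range(int(dur * k_rate)):
        env = kline(k / k_rate, dur)                        # k-rate: once per k-pass
        for _ in range(a_per_k):                            # a-rate: once per sample
            x = x - a * y
            y = y + a * x
            out.append(y * env)
    return out


samples = tone(note=69, loudness=1.0, dur=1.0)
print(len(samples), "samples rendered")
```

The two-statement update is a coupled-form oscillator: with a = 2 sin(pi f / s_rate), the pair (x, y) rotates by exactly 2 pi f / s_rate radians per sample, so no sine call is needed in the inner loop.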
22
SAOL code for tone: i-rate
  ivar a;            // sets osc f
  a = 2*sin(3.141593*cpsmidi(note)/s_rate);
23
SAOL code for tone: k-rate
  ksig env;          // env output
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);
24
SAOL code for tone: a-rate
  asig x, y;         // osc state
  asig init;

  if (init == 0) {   // first a-pass only
    x = loudness;
    init = 1;
  }

  x = x - a*y;       // the FLOPS happen in
  y = y + a*x;       // these 3 statements
  output(y*env);     // creates audio output
25
SAOL Unique Features
  • Rate semantics
  • i/k/a-rate execution
  • Vector arithmetic
  • ex: A = B op C means, for i = 1..n, A[i] = B[i] op C[i]
  • All floating-point arithmetic.
  • Extensive built-in audio function library
  • signal generators, table operators, pitch
    converters, filters, fft, sample rate conversion,
    effects, ...
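
The vector-arithmetic bullet describes element-wise evaluation: one array statement stands for a loop over matching elements. A minimal Python sketch of that meaning, restricted to equal-width operands and using multiplication as the example operator (the helper `vector_op` is hypothetical):

```python
import operator


def vector_op(op, b, c):
    """Apply op to matching elements of two equal-width arrays."""
    if len(b) != len(c):
        raise ValueError("operands must have the same width")
    return [op(bi, ci) for bi, ci in zip(b, c)]


B = [1.0, 2.0, 3.0]
C = [0.5, 0.5, 2.0]
A = vector_op(operator.mul, B, C)   # one statement replaces an explicit loop
print(A)
```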

26
SAOL Unique Features
  • Instrument communication through bus structures
  • Dynamic instrument creation and control.
  • Scheduler and language support for MIDI and SASL
    scores.

27
Sfront - a SAOL-to-C translator
  • Converts MP4-SA files to a C program that, when
    executed, produces audio.
  • Runs on UNIX, Win98/NT.
  • Licensed under the GNU General Public License (GPL).
  • www.cs.berkeley.edu/lazzaro/sa

28
Sfront Benchmarks
Sfront version: 0.36
Machine: 450 MHz Pentium III, 128 MB, gcc version egcs-2.91.66, -O3 optimizer
Audio sample rate: 44.1 kHz for all examples
MP3 compression ratio: 11
29
Sfront Performance Summary
  • Rendering (file decoding)
  • Current performance: a benchmark suite of
    moderately complex MP4-SA streams computes in a
    time equivalent to the audio it generates, on a
    400 MHz UltraSPARC or 450 MHz Pentium.
  • Real-time interaction
  • with a MIDI keyboard, with acceptable latency (20
    ms), and microphone input.

30
Interesting Issues
  • MP4-SA puts emphasis on sound synthesis methods
    that can be described in a small amount of space.
  • Physical modeling: good
  • Sampling natural instruments: bad
  • If models are chosen carefully, compression
    ratios of 100 to 10,000 are possible.
  • Physical Modeling is relatively immature, but
    holds much promise.

31
Struck/Plucked Instrument Model
Examples: struck bars, bells, drums, plucked
strings
Parameters: striker characteristics, resonator
constants
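
As one concrete instance of a plucked-string model, the sketch below uses the Karplus-Strong algorithm. It is a well-known minimal technique swapped in for illustration and is not necessarily the model pictured on the slide. The noise burst plays the role of the excitation, and the delay line with lossy averaging plays the role of the resonator; `damping` and `seed` are illustrative parameters.

```python
import random


def pluck(freq_hz, dur_s, s_rate=44_100, damping=0.996, seed=0):
    """Karplus-Strong string: a noise burst circulating in a lossy delay line."""
    rng = random.Random(seed)
    n = max(2, round(s_rate / freq_hz))                  # delay length sets the pitch
    line = [rng.uniform(-1.0, 1.0) for _ in range(n)]    # excitation: the "pluck"
    out = []
    for i in range(int(dur_s * s_rate)):
        cur = line[i % n]
        nxt = line[(i + 1) % n]
        out.append(cur)
        # Resonator: averaging neighbors removes high frequencies,
        # and damping removes energy on every trip around the loop.
        line[i % n] = damping * 0.5 * (cur + nxt)
    return out


samples = pluck(220.0, 1.0)
print(len(samples), "samples rendered")
```

The whole instrument is described by a pitch, a damping constant, and a random seed, which illustrates why model-based descriptions can be so compact.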
32
Blown Instrument Model
Examples: pipes, flutes, etc.
Parameters: shape of non-linear function,
resonator constants
33
Physical Modeling Summary
  • Models the instrument, not the sound.
  • Advantages over traditional synthesis techniques
    (FM, sample-based):
  • Compact descriptions.
  • Physical parameterization leads to more intuitive
    control and lower control bandwidth.
  • State-accurate simulation leads to efficiency in
    re-excitation and emulation of otherwise missing effects.
  • Ultimately: more realistic sounds.

34
Physical Modeling Summary (cont.)
  • Disadvantage: potential for high computational complexity.
  • Approaches:
  • PDE (partial differential equation) approach
    would be nice, but probably not practical.
  • ODE (ordinary differential equation, lumped
    circuit models): practical and very general.
    Captures essential physics.
  • Waveguide filters provide a more efficient
    alternative in some cases.

35
Interesting Issues (cont.)
  • MP4-SA specifies that a decoder produces audio
    that sounds identical to the result of computing
    the program accurately.
  • A new role for psychophysics:
  • Instead of using psychophysics to squeeze bits
    out of a sound representation, MP4-SA decoders
    will use psychophysics to squeeze FLOPS out of
    sound computations.
  • Leverage spectral and temporal masking.

36
Interesting Issues (cont.)
  • MP4-SA can be used in a way similar to
    traditional compression, except that the
    compression method can be ad hoc.
  • Framework for experimentation in encoding.
  • Hope for automatic encoding, if done in a
    voice-specific way:
  • vocals
  • guitar
  • sax
  • and other hard-to-synthesize sounds.

37
Running SAOL on Conventional Architectures
  • Lessons Learned from SAOL development
  • Temporal typing of variables has the nice side
    effect of marking the inner loops.
  • Typically, the a-rate is 10X to 100X the k-rate.
  • A-rate code optimization: moving subexpressions
    into k-rate or i-rate.
  • SAOL semantics support a static heap.
  • No recursion, all variables single-precision floats,
    no pointers ... simplifies optimization.
  • Other researchers (Giorgio Zoia - ETH) are focusing
    on blocking all a-passes for an instance,
    reducing overhead.
  • Processors with SIMD FP support (Intel SSE, AMD
    3DNow!) will be a good match.
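
The a-rate optimization mentioned above, moving subexpressions to a slower rate, can be sketched in Python. In the naive version the oscillator coefficient is recomputed on every a-pass even though it depends only on values fixed at note launch; the hoisted version computes it once. Both function names are illustrative, and the code is an analogy for what a SAOL compiler might do, not sfront's actual transformation.

```python
import math


def render_naive(freq, n, s_rate=44_100):
    """Recomputes the coefficient inside the a-rate loop."""
    x, y, out = 1.0, 0.0, []
    for _ in range(n):
        a = 2.0 * math.sin(math.pi * freq / s_rate)   # depends only on i-rate values
        x = x - a * y
        y = y + a * x
        out.append(y)
    return out


def render_hoisted(freq, n, s_rate=44_100):
    """Same computation with the coefficient moved out to the i-pass."""
    a = 2.0 * math.sin(math.pi * freq / s_rate)       # computed once per note
    x, y, out = 1.0, 0.0, []
    for _ in range(n):
        x = x - a * y
        y = y + a * x
        out.append(y)
    return out


same = render_naive(440.0, 1000) == render_hoisted(440.0, 1000)
print("identical output:", same)
```

Because variables carry their rate in their type, a compiler can find such candidates mechanically: any subexpression inside a-rate code whose operands are all i-rate or k-rate can move outward.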

38
Fixed-Function Hardware for SAOL Accelerators
  • Unlike MPEG-2 chips, DVD chips, etc., it's not
    clear how MP4-SA can be accelerated by rolling an
    ASIC, since every MP4-SA file is a new algorithm.
  • Common opcodes can be hardwired, and the general
    characteristics of typical MP4-SA files could be
    leveraged to specialize a conventional processor
    design.
  • But the language is only six months old:
    execution frequencies are not known.
  • Reconfigurable computing architectures might hold
    promise (however, MP4-SA is all floating point).

39
Directions / Research Opportunities
  • Compiler optimizations for:
  • SAOL and other languages with rate semantics
  • high-performance SIMD architectures
  • runtime code specialization
  • Runtime scheduling under limited compute
    resources.
  • SAOL programming environments.
  • Physical modeling.
  • Automatic encoding.