Roberta Eklund - PowerPoint PPT Presentation

1
MPEG-4 AUDIO OVERVIEW
  • Roberta Eklund
  • Consultant

2
MPEG-4 Audio Overview
  • Natural Audio
  • - T/F
  • - CELP
  • - PARA
  • Structured Audio
  • - SAOL
  • - SASL
  • - SASBF
  • - MIDI-DLS-version 2
  • TTS
  • Cross Tool (Algorithm) Functionality
  • - Pitch/tempo change
  • - Bitrate scalability
  • - Computational complexity scalability
  • - Error robustness
  • Audio related effects
  • Acoustic virtualization

3
Different Tools for Bitrates/Application
4
MPEG-4 Audio Tools PROFILES
  • Object Profile - defines the bitstream syntax
    for a single Object, which can represent a
    meaningful entity in the audio or visual scene
    (an elementary bitstream).
  • Composition Profile - defines which Object
    Profiles can be combined in the audio or visual
    scene (combinations of elementary bitstreams).

5
OBJECT PROFILES
6
Combination Profiles
7
MPEG-4 Encoder Structure
8
MPEG-4 T/F Encoder Configuration
9
MPEG-4 T/F Decoder Configuration
10
Block Diagram of CELP Encoder
11
Block Diagram of CELP Decoder
  • Excitation signal generator
  • - codebook
  • - regular pulse excitation (RPE)
  • - multi-pulse excitation (MPE)
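The two pulse-based excitation types named on this slide can be illustrated with a minimal sketch. It assumes the usual CELP decoder arrangement, in which the excitation drives an all-pole linear-prediction synthesis filter. The function names, the sign convention of the filter coefficients, and the example values are illustrative assumptions, not taken from the MPEG-4 draft.

```python
def multi_pulse_excitation(length, pulses):
    """MPE sketch: `pulses` is a list of (position, amplitude) pairs
    that may sit at arbitrary positions in the frame."""
    excitation = [0.0] * length
    for position, amplitude in pulses:
        excitation[position] += amplitude
    return excitation


def regular_pulse_excitation(length, spacing, phase, amplitudes):
    """RPE sketch: pulses sit on a regular grid at
    phase, phase + spacing, phase + 2 * spacing, ..."""
    excitation = [0.0] * length
    for k, amplitude in enumerate(amplitudes):
        position = phase + k * spacing
        if position < length:
            excitation[position] = amplitude
    return excitation


def lpc_synthesis(excitation, a):
    """All-pole synthesis filter (assumed sign convention):
    s[n] = e[n] - sum_k a[k] * s[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc -= coeff * out[n - 1 - k]
        out.append(acc)
    return out


# A single unit pulse through a first-order filter decays geometrically.
decoded = lpc_synthesis(multi_pulse_excitation(3, [(0, 1.0)]), [-0.5])
```

A real decoder would also apply gains, an adaptive (pitch) codebook and post-filtering; those stages are omitted here.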

12
Block Diagram of PARA Encoder
13
Block Diagram of PARA Decoder
14
PARA is Two Codecs in One
  • Two operating modes
  • - harmonic and noise components (HVXC), for
    speech coding at 2 to 4 kbps
  • - harmonic and individual sinusoidal components
    plus noise (HILN), for coding of music signals
    with low-complexity content (e.g. single
    instruments) at 4 to 16 kbps
  • Combination of both modes
  • - supported by the syntax, with a defined
    transition
  • - automatic mode selector
  • - cross fade from one signal to the other
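The cross fade between the two modes can be sketched as follows. This is a minimal illustration that assumes two already-decoded sample blocks of equal length and a linear ramp; the function name `cross_fade` and the ramp shape are assumptions, not the normative transition defined by the syntax.

```python
def cross_fade(outgoing, incoming):
    """Fade linearly from `outgoing` to `incoming` over their common length."""
    if len(outgoing) != len(incoming):
        raise ValueError("blocks must have the same length")
    n = len(outgoing)
    if n == 1:
        return [incoming[0]]
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        w = i / (n - 1)  # weight of the incoming signal, 0.0 -> 1.0
        mixed.append((1.0 - w) * a + w * b)
    return mixed


# Fade from a constant 1.0 (e.g. the HVXC output) to silence.
faded = cross_fade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])  # [1.0, 0.5, 0.0]
```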

15
Text-to-Speech
  • Phonemic (language-independent) syntax
  • Prosody, timing cues
  • Language, dialect, gender, age parameters
  • Automatic synchronization with FBA (Face and
    Body Animation)
  • Exact TTS synthesis is non-normative; only the
    interface is specified

16
Structured Audio
  • Structured Audio - Sound coding using structured
    descriptions
  • Structured Audio decoder - music and sound-effect
    synthesis
  • MMA, Microsoft and EMU are now collaborating on
    MIDI DLS-version 2 in MPEG-4

17
SAOL
  • Downloadable BNF synthesis grammar
  • Header contains descriptions of several
    synthesizers, effects processors and control
    algorithms, plus routing instructions for the
    audio flow
  • SAOL has 100 primitive processing instructions,
    signal generators, and operators that fill
    wavetables with data.

18
SASL and MIDI
  • SASL: new format for describing control
    parameters
  • - Basically a scheduler of audio events
  • - Designed to interface well with SAOL
  • - New control language similar to MIDI
  • MIDI (Musical Instrument Digital Interface)
  • - Simpler format for describing control
  • - Included as an alternate control method
  • - Leverages existing authoring tools
  • - Gives backwards compatibility to SA
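The idea of a score as "a scheduler of audio events" can be sketched with a time-ordered queue. This is only an illustration of the concept: the class `EventScheduler`, its methods and the event layout are hypothetical and do not reproduce SASL syntax or the MIDI format.

```python
import heapq


class EventScheduler:
    """Hypothetical time-ordered queue of (time, instrument, params) events."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker: equal times keep insertion order

    def schedule(self, time, instrument, *params):
        heapq.heappush(self._queue, (time, self._counter, instrument, params))
        self._counter += 1

    def run_until(self, now):
        """Pop and return every event whose start time is <= now."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            time, _, instrument, params = heapq.heappop(self._queue)
            due.append((time, instrument, params))
        return due


score = EventScheduler()
score.schedule(2.0, "bass", 40)
score.schedule(0.5, "piano", 60, 0.8)
score.schedule(1.0, "piano", 64)
first_second = score.run_until(1.0)  # the two piano events, in time order
```

In a decoder along these lines, each dispatched event would start or control an instrument defined in the orchestra (SAOL) part.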

19
DLS Level 2
  • Aims at consistent synthetic audio playback
    across wide range of platforms
  • Defines a simple wavetable synthesizer
  • Bitstream includes sound samples
  • Score expressed in MIDI
  • Growing support from both software and hardware
    developers
  • DLS is part of DirectMusic in Microsoft's
    DirectX 6.0

20
DLS-2 synthesizer model
  • Simple yet powerful structure, much like many
    existing synthesizers on the market (e.g. in PC
    sound cards)
  • Uses loopable samples as sound sources
    (wavetable)
  • variable routing of control sources
  • 2 envelopes for amplitude control
  • 2 low frequency oscillators
  • 1-pole dynamic low-pass filter
  • Standardized response to MIDI controllers
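Two of the building blocks listed above, looped sample playback and a one-pole low-pass filter, can be sketched in a few lines. The function names and the fixed filter coefficient are assumptions made for illustration; the envelopes, LFOs and control routing of the DLS-2 model are left out, and in a dynamic filter the coefficient would change over time.

```python
def play_looped(sample, loop_start, loop_end, n_out):
    """Read n_out values from `sample`, looping over [loop_start, loop_end)."""
    if not 0 <= loop_start < loop_end <= len(sample):
        raise ValueError("invalid loop points")
    out = []
    pos = 0
    for _ in range(n_out):
        out.append(sample[pos])
        pos += 1
        if pos >= loop_end:
            pos = loop_start  # jump back to the start of the loop
    return out


def one_pole_lowpass(signal, coeff):
    """y[n] = y[n-1] + coeff * (x[n] - y[n-1]), with 0 < coeff <= 1."""
    out = []
    y = 0.0
    for x in signal:
        y += coeff * (x - y)
        out.append(y)
    return out


# Attack portion (10) is played once, then the loop 20, 30 repeats.
voice = play_looped([10, 20, 30, 40], loop_start=1, loop_end=3, n_out=6)
smoothed = one_pole_lowpass([1.0, 1.0], coeff=0.5)
```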

21
Audio BIFS
  • Synchronization with Visual!
  • [Figure: audio scene graph built from AudioMix,
    AudioFX, HRTF, AudioDelay and AudioSource nodes
    connected by audio channels; the sources are
    finger snaps (Parametric), piano (SA) and bass
    (SA).]
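The way sources, delays and mixes compose in such a scene graph can be sketched as a small tree of nodes. The class names follow the node names on this slide, but the `render` interface, the mono signals and the per-child gains are simplifying assumptions and not the node definitions from the MPEG-4 Systems specification.

```python
class AudioSource:
    """Leaf node: holds already-decoded samples (mono, for simplicity)."""

    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)


class AudioDelay:
    """Delays its child by a whole number of samples."""

    def __init__(self, child, delay):
        self.child = child
        self.delay = delay

    def render(self):
        return [0.0] * self.delay + self.child.render()


class AudioMix:
    """Sums its children, each scaled by its own gain."""

    def __init__(self, children, gains):
        self.children = children
        self.gains = gains

    def render(self):
        rendered = [child.render() for child in self.children]
        length = max((len(r) for r in rendered), default=0)
        out = [0.0] * length
        for samples, gain in zip(rendered, self.gains):
            for i, value in enumerate(samples):
                out[i] += gain * value
        return out


piano = AudioSource([1.0, 1.0])
bass = AudioDelay(AudioSource([2.0]), delay=1)
scene = AudioMix([piano, bass], gains=[0.5, 1.0])
mixed = scene.render()  # [0.5, 2.5]
```

Effects nodes such as AudioFX or an HRTF stage would slot into the same tree as further nodes wrapping a child.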
22
Demo Audio BIFS
23
Conclusion
  • MPEG-4 Audio attempts to offer solutions for the
    full spectrum of sound.
  • Some of the tools are more stable, while others
    are still in research and development.
  • MPEG-2 AAC is the best multi-channel lossy audio
    compression standard to date.

24
Acknowledgements
  • I would like to thank the authors from the
    references for providing the material presented
    here today.

25
Definitions
  • T/F: Time/Frequency (MDCT transform)
  • AAC: Advanced Audio Coding
  • PARA: Parametric
  • CELP: Code Excited Linear Prediction
  • SA: Structured Audio
  • PNS: Perceptual Noise Substitution
  • HVXC: Harmonic Vector eXcitation Coding
  • HILN: Harmonic and Individual Lines plus Noise
  • SAOL: Structured Audio Orchestra Language
  • SASL: Structured Audio Score Language
  • MIDI: Musical Instrument Digital Interface
  • TTS: Text to Speech

26
More Definitions
  • CD: Committee Draft
  • IS 13818-7: Advanced Audio Coding
  • LC: Low Complexity
  • BSAC: Bit Sliced Arithmetic Coding
  • SSR: Scalable Sample Rate
  • VBR: Variable Bit Rate
  • TLSS: Tools for Large Step Scalability
  • SNHC: Synthetic/Natural Hybrid Coding
  • DLS: Downloadable Samples

27
Natural Audio Complexity
28
AAC Decoder Complexity Evaluation
MPEG AAC Decoder Complexity
  • 2-channel Main Profile: 40% of a 133 MHz Pentium
  • 2-channel Low Complexity: 25% of a 133 MHz
    Pentium
  • 5-channel Main Profile: 90 sq. mm die, 0.5
    micron CMOS
  • 5-channel Low Complexity: 60 sq. mm die, 0.5
    micron CMOS
29
AAC Test Results
  • Tests at BBC and NHK according to ITU-R BS.1116
  • - triple-stimulus/hidden-reference/double-blind
  • - ITU-R 5-point impairment scale
  • - 95% confidence intervals
  • MPEG AAC provides indistinguishable quality at
    320 kb/s for five channels
  • MPEG AAC at 320 kb/s outperforms MPEG BC Layer II
    at 640 kb/s for five channels
  • Recent stereo tests at NHK showed that MPEG AAC
    provides indistinguishable quality at 128 kb/s
    for two channels
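The bitrates quoted above are totals across all channels. A short calculation, using a hypothetical helper name, puts them on a per-channel basis; it assumes the total is split evenly, which a real encoder need not do.

```python
def per_channel_kbps(total_kbps, channels):
    """Average bitrate per channel, assuming an even split."""
    return total_kbps / channels


aac_five_channel = per_channel_kbps(320, 5)     # 64.0 kb/s per channel
layer2_five_channel = per_channel_kbps(640, 5)  # 128.0 kb/s per channel
aac_stereo = per_channel_kbps(128, 2)           # 64.0 kb/s per channel
```

On this average, both AAC results correspond to 64 kb/s per channel, half the per-channel rate of the Layer II configuration it was compared against.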

30
References
  • M. Bosi, E. Scheirer, B. Edler, Peter G.
    Schreiner, MPEG-4 Seminar, Fribourg, Switzerland,
    1997
  • S. Quackenbush, Coding of Natural Audio in
    MPEG-4, Proc. IEEE ICASSP, Seattle, 1998
  • B. Grill, B. Edler, I. Kaneko, Y. Lee, M.
    Nishiguchi, E. Scheirer, and M. Väänänen (Eds.),
    ISO 14496-3 (MPEG-4 Audio) Committee Draft, MPEG
    document N1903
  • E. Scheirer, The MPEG-4 Structured Audio
    Standard, Proc. IEEE ICASSP, Seattle, 1998
  • Juergen Herre, Updated Description for
    Perceptual Noise Substitution Tool, MPEG
    document M2692
  • E. Scheirer, R. Väänänen, J. Huopaniemi,
    AudioBIFS: The MPEG-4 Standard for Effects
    Processing, AES, SF, 1998
  • Overview:
    http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm