Title: MPEG
1. MPEG
- Howell Istance
- School of Computing
- De Montfort University
2. Moving Picture Experts Group
- Established in 1988 with a remit to develop standards for the coded representation of audio, video and their combination
- Operates within the framework of the Joint ISO/IEC Technical Committee (JTC1 on Information Technology), organised into committees and sub-committees
- Originally 25 experts; now approximately 350 experts from 200 companies and academic institutions, meeting approx. 3 times/year (depending on the committee)
- (All) standards work takes a long time, requires international agreement, and is (potentially) of great industrial strategic importance
3. MPEG-1 standards
- Video standard for low-fidelity video, implemented in software codecs, suitable for transmission over computer networks
- Audio standard has 3 layers; the encoding process increases in complexity and the data rates become lower as the layers increase:
  - Layer 1: 192 kbps
  - Layer 2: 128 kbps
  - Layer 3: 64 kbps (MPEG-1 Layer 3 = MP3)
- (These data rates are doubled for a stereo signal)
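To put these rates in context, a quick calculation of the stereo rates against uncompressed CD audio (assuming the usual 44.1 kHz, 16-bit, 2-channel reference, which is not stated on the slide):

```python
# CD audio as the uncompressed reference: 44.1 kHz x 16 bits x 2 channels
CD_RATE_KBPS = 44.1 * 16 * 2  # = 1411.2 kbps

# Per-layer stereo rates: the slide's per-channel figures, doubled
layer_stereo_kbps = {"Layer 1": 2 * 192, "Layer 2": 2 * 128, "Layer 3": 2 * 64}

# Compression ratio of each layer relative to CD audio
ratios = {layer: CD_RATE_KBPS / rate for layer, rate in layer_stereo_kbps.items()}
```

This gives roughly 3.7:1 for Layer 1, 5.5:1 for Layer 2 and 11:1 for Layer 3, matching the pattern on the slide: more encoder complexity buys a lower bit rate.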
4. MPEG-1 Layer 3 audio encoding
- Encoders analyse an audio signal and compare it to psycho-acoustic models representing the limitations of human auditory perception
- Encode as much useful information as possible within the restrictions set by bit rate and sampling frequency
- Discard samples where the amplitude is below the minimum audition threshold for different frequencies
- Auditory masking: a louder sound masks a softer sound when they are played simultaneously or close together, so the softer sound's samples can be discarded
5. Psychoacoustic model
Throw away samples which will not be perceived, i.e. those under the threshold curve
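The threshold curve can be approximated in code. A sketch using Terhardt's widely cited approximation of the absolute threshold of hearing (real encoders use a more elaborate psychoacoustic model; this only illustrates the "discard what falls under the curve" idea):

```python
import math

def ath_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's formula). Below this level at a given frequency,
    a component is inaudible and can be discarded."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(freq_hz, level_db):
    """Keep a spectral component only if it rises above the threshold."""
    return level_db > ath_db(freq_hz)
```

The curve dips below 0 dB around 3-4 kHz (where the ear is most sensitive) and rises steeply at low and very high frequencies, so a quiet 100 Hz component may be dropped while an equally quiet 3.3 kHz component is kept.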
6. MPEG-1 Layer 3 audio encoding
- Temporal masking: if two tones are close together on the frequency spectrum and are played in quick succession, they may appear indistinct from one another
- Reservoir of bytes: data is organised into frames; space left over in one frame can be used to store data from adjacent frames that need additional space
- Joint stereo: very high and very low frequencies cannot be located in space with the same precision as sounds towards the centre of the audible spectrum, so these are encoded as mono
- Huffman encoding removes redundancy in the encoding of repetitive bit patterns (can reduce file sizes by around 20%)
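On the Huffman step: MP3 itself uses fixed Huffman tables defined in the standard, but the principle, shorter codes for more frequent symbols, can be illustrated with a generic Huffman code builder (a sketch, not the MP3 tables):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free Huffman code table for the symbols in data."""
    freq = Counter(data)
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        (_, _, table), = heap
        return {sym: "0" for sym in table}
    n = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two least frequent subtrees...
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}   # ...merge, prefixing
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[s] for s in "abracadabra")
```

Here the frequent symbol `a` gets a 1-bit code, so the 11 symbols encode in 23 bits instead of the 33 a fixed 3-bit code would need, the same redundancy removal the slide describes.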
7. Masking effects
- Throw away samples in the region masked by a louder tone
8. Schematic of MPEG-1 Layer 3 encoding
http://www.iis.fhg.de/amm/techinf/layer3/index.htm
9. MPEG-2 standards
- Video standard for high-fidelity video
- Levels define parameters: maximum frame size, data rate and chrominance subsampling
- Profiles may be implemented at one or more levels
- MP@ML (Main Profile at Main Level) uses CCIR 601 scanning and 4:2:0 chrominance subsampling, and supports a data rate of 15 Mbps
- MP@ML is used for digital television broadcasting and DVD
- Audio standard is essentially the same as MPEG-1, with extensions to cope with surround sound
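The 15 Mbps ceiling can be set against the raw rate of CCIR 601 video. A back-of-the-envelope calculation (assuming the 625-line format: 720x576 luma at 25 frames/s, 8 bits per sample, which the slide does not spell out):

```python
# Raw bit rate of CCIR 601 (625-line) video with 4:2:0 subsampling
W, H, FPS, BITS = 720, 576, 25, 8

luma = W * H                       # one full-resolution luma plane
chroma = 2 * (W // 2) * (H // 2)   # 4:2:0: two quarter-size chroma planes

raw_bps = (luma + chroma) * BITS * FPS   # = 124,416,000 bits/s
ratio = raw_bps / 15e6                   # vs the MP@ML 15 Mbps ceiling
```

So the uncompressed source runs at about 124 Mbps, and even at the full 15 Mbps MP@ML data rate the video codec must achieve better than 8:1 compression.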
10. MPEG-4
- The MPEG-4 standard activity aimed to define an audiovisual coding standard addressing the needs of the communication, interactive (computing) and broadcasting (TV/film/entertainment) service models
- In MPEG-1 and MPEG-2, "systems" referred to overall architecture, multiplexing and synchronisation
- In MPEG-4, systems also includes scene description, interactivity, content description and programmability
- Initial call for proposals: July 1995; version 2 amendments: December 2000
11-13. Images from Jean-Claude Dufourd, ENST, Paris
14. MPEG-4 Systems: mission
- Develop a coded, streamable representation for audio-visual objects and their associated time-variant data, along with a description of how they are combined
- Coded representation, as opposed to textual representation: binary encoding for bandwidth efficiency
- Streamable, as opposed to downloaded: presentations have a temporal extent rather than being based on files of a finite size
- Audio-visual objects and their associated time-variant data, as opposed to individual audio or visual streams: MPEG-4 Systems deals with combinations of streams to create an interactive visual scene, not with the encoding of audio or visual data
15. MPEG-4 principles
- Audio-visual objects: representations of natural or synthetic objects which have an audio and/or visual manifestation (e.g. a video sequence, a 3D animated face)
- Scene description: information describing where, when and for how long a-v objects will appear
- Interactivity expressed in 3 requirements:
  - client-side interaction with the scene description as well as with exposed properties of a-v objects
  - behaviour attached to a-v objects, triggered by events (e.g. user-generated, timeouts)
  - client-server interaction: user data sent back to the server, which responds with modifications to the scene (for example)
16-17. MPEG-4 Systems principles
Interactive scene description, built from elementary streams:
- Scene description stream
- Object description stream
- Visual object stream
- Visual object stream
- Visual object stream
- Audio object stream
18. Object Descriptor Framework
- The glue between the scene description and streaming resources (elementary streams)
- Object descriptor: a container structure that encapsulates all setup and association information for a set of elementary streams, with a set of sub-descriptors describing the individual streams (e.g. configuration information for the stream decoder)
- Groups sets of streams that are seen as a single entity from the perspective of the scene description
- The object description framework is separated from the scene description so that elementary streams can be changed and re-located without changing the scene description
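The container/sub-descriptor relationship above can be sketched as a data model. The class and field names here are illustrative only, not the normative MPEG-4 descriptor syntax:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DecoderConfig:
    """Sub-descriptor: what the decoder for one stream needs to know."""
    stream_type: str        # e.g. "visual", "audio", "scene description"
    codec: str
    avg_bitrate_bps: int

@dataclass
class ESDescriptor:
    """Describes a single elementary stream."""
    es_id: int
    config: DecoderConfig

@dataclass
class ObjectDescriptor:
    """Container: groups the elementary streams forming one a-v object.
    The scene description refers only to od_id, so streams can be
    replaced or re-located without touching the scene description."""
    od_id: int
    streams: List[ESDescriptor] = field(default_factory=list)

# One a-v object backed by a single visual stream (values invented)
video_od = ObjectDescriptor(od_id=1, streams=[
    ESDescriptor(es_id=101,
                 config=DecoderConfig("visual", "MPEG-4 Visual", 384_000)),
])
```

The point of the indirection is visible in the types: the scene holds `od_id=1`, while everything stream-specific lives inside the descriptor and can change independently.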
19. BIFS: BInary Format for Scenes
- Specifies the spatial and temporal locations of objects in scenes, together with their attributes and behaviours
- Elements of the scene and the relationships between them form a scene graph that must be encoded for transmission
- Based heavily on VRML; supports almost all VRML nodes
- Does not support the use of Java in Script nodes (only ECMAScript)
- Expands on the functionality of VRML, allowing a much broader range of applications to be supported
20. BIFS expansions to VRML
- Compressed binary format
  - BIFS describes an efficient binary representation of the scene graph information
  - Coding may be either lossless or lossy
  - Coding efficiency derives from a number of classical compression techniques, plus some novel ones
  - Knowledge of context is exploited heavily in BIFS
- Streaming
  - A scene may be transmitted as an initial scene followed by timestamped modifications to the scene
  - The BIFS Command protocol allows replacement of entire scenes, addition/deletion/replacement of nodes and behavioural elements in the scene graph, as well as modification of scene properties
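The initial-scene-plus-timestamped-modifications idea can be sketched as follows. The scene structure, node names and command names are invented for illustration; they are not the normative BIFS Command syntax:

```python
# Initial scene: a flat scene graph sent once at session start
scene = {"root": {"children": ["text1"]},
         "text1": {"string": "Hello"}}

# Timestamped commands streamed after the initial scene
commands = [
    (2.0, "replace_field", ("text1", "string", "Goodbye")),
    (5.0, "delete_node", ("text1",)),
]

def apply_command(scene, op, args):
    """Apply one streamed modification to the scene graph in place."""
    if op == "replace_field":
        node, fld, value = args
        scene[node][fld] = value
    elif op == "delete_node":
        (node,) = args
        del scene[node]
        for n in scene.values():           # unlink from any parent
            if node in n.get("children", []):
                n["children"].remove(node)
    return scene

# The receiver applies commands in timestamp order as they arrive
for _ts, op, args in sorted(commands):
    apply_command(scene, op, args)
```

The design point is that the presentation has temporal extent: the receiver never holds a finished file, only the current scene plus a stream of edits, which is what distinguishes BIFS streaming from a downloaded VRML world.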
21. BIFS expansions to VRML
- 2D primitives
  - BIFS includes native support for 2D scenes
  - This facilitates content creators who wish to produce low-complexity scenes, including the traditional television and multimedia industries
  - Many applications cannot bear the cost of requiring decoders to have full 3D rendering and navigation; this is particularly true where hardware decoders must be low cost, as for instance in television set-top boxes
  - Rather than simply partitioning the multimedia world into 2D and 3D, MPEG-4 BIFS allows the combination of 2D and 3D elements in a single scene
22. BIFS expansions to VRML
- Animation
  - A second streaming protocol, BIFS-Anim, provides a low-overhead mechanism for the continuous animation of changes to numerical values of components in the scene
  - These streamed animations provide an alternative to the interpolator nodes supported in both BIFS and VRML
- Enhanced audio
  - BIFS provides the notion of an "audio scene graph"
  - Audio sources, including streaming ones, can be mixed
  - Audio content can even be processed and transformed with special procedural code to produce various sound effects
23. BIFS expansions to VRML
- Facial animation
  - BIFS provides support at the scene level for the MPEG-4 Facial Animation decoder
  - A special set of BIFS nodes exposes the properties of the animated face at the scene level
  - The animated face can be integrated with all BIFS functionalities, similarly to any other audio or visual object