Title: MPEG
1. MPEG
- Howell Istance
- School of Computing
- De Montfort University
2. Moving Picture Experts Group
- Established in 1988 with a remit to develop standards for the coded representation of audio, video and their combination
- Operates within the framework of the Joint ISO/IEC Technical Committee (JTC1 on Information Technology), organised into committees and sub-committees
- Originally 25 experts; now approximately 350 experts from 200 companies and academic institutions, meeting approx. 3 times/year (depending on the committee)
- (All) standards work takes a long time, requires international agreement, and is (potentially) of great industrial strategic importance
3. MPEG-1 standards
- Video standard for low-fidelity video, implemented in software codecs, suitable for transmission over computer networks
- Audio standard has 3 layers; the encoding process increases in complexity and the data rates become lower as the layers increase:
  - Layer 1: 192 kbps
  - Layer 2: 128 kbps
  - Layer 3: 64 kbps (MPEG-1 Layer 3 = MP3)
- (These data rates are doubled for a stereo signal)
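To put these rates in context, a quick calculation of the stereo rates against uncompressed CD audio (assuming the usual 44.1 kHz, 16-bit, 2-channel reference, which is not stated on the slide):

```python
# CD audio as the uncompressed reference: 44.1 kHz x 16 bits x 2 channels
CD_RATE_KBPS = 44.1 * 16 * 2  # = 1411.2 kbps

# Per-layer stereo rates: the slide's per-channel figures, doubled
layer_stereo_kbps = {"Layer 1": 2 * 192, "Layer 2": 2 * 128, "Layer 3": 2 * 64}

# Compression ratio of each layer relative to CD audio
ratios = {layer: CD_RATE_KBPS / rate for layer, rate in layer_stereo_kbps.items()}
```

This gives roughly 3.7:1 for Layer 1, 5.5:1 for Layer 2 and 11:1 for Layer 3, matching the pattern on the slide: more encoder complexity buys a lower bit rate.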
4. MPEG-1 Layer 3 audio encoding
- Encoders analyse an audio signal and compare it to psycho-acoustic models representing the limitations of human auditory perception
- Encode as much useful information as possible within the restrictions set by bit rate and sampling frequency
- Discard samples where the amplitude is below the minimum audition threshold for different frequencies
- Auditory masking: a louder sound masks a softer sound when they are played simultaneously or close together, so the softer sound's samples can be discarded
5. Psychoacoustic model
Throw away samples which will not be perceived, i.e. those under the threshold curve
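The threshold curve can be approximated in code. A sketch using Terhardt's widely cited approximation of the absolute threshold of hearing (real encoders use a more elaborate psychoacoustic model; this only illustrates the "discard what falls under the curve" idea):

```python
import math

def ath_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's formula). Below this level at a given frequency,
    a component is inaudible and can be discarded."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(freq_hz, level_db):
    """Keep a spectral component only if it rises above the threshold."""
    return level_db > ath_db(freq_hz)
```

The curve dips below 0 dB around 3-4 kHz (where the ear is most sensitive) and rises steeply at low and very high frequencies, so a quiet 100 Hz component may be dropped while an equally quiet 3.3 kHz component is kept.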
6. MPEG-1 Layer 3 audio encoding
- Temporal masking: if two tones are close together on the frequency spectrum and are played in quick succession, they may appear indistinct from one another
- Reservoir of bytes: data is organised into frames; space left over in one frame can be used to store data from adjacent frames that need additional space
- Joint stereo: very high and very low frequencies cannot be located in space with the same precision as sounds towards the centre of the audible spectrum, so these are encoded as mono
- Huffman encoding removes redundancy in the encoding of repetitive bit patterns (can reduce file sizes by around 20%)
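On the Huffman step: MP3 itself uses fixed Huffman tables defined in the standard, but the principle, shorter codes for more frequent symbols, can be illustrated with a generic Huffman code builder (a sketch, not the MP3 tables):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free Huffman code table for the symbols in data."""
    freq = Counter(data)
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        (_, _, table), = heap
        return {sym: "0" for sym in table}
    n = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two least frequent subtrees...
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}   # ...merge, prefixing
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[s] for s in "abracadabra")
```

Here the frequent symbol `a` gets a 1-bit code, so the 11 symbols encode in 23 bits instead of the 33 a fixed 3-bit code would need, the same redundancy removal the slide describes.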
7. Masking effects
- Throw away samples in the region masked by a louder tone
8. Schematic of MPEG-1 Layer 3 encoding
http://www.iis.fhg.de/amm/techinf/layer3/index.htm
9. MPEG-2 standards
- Video standard for high-fidelity video
- Levels define parameters: maximum frame size, data rate and chrominance subsampling
- Profiles may be implemented at one or more levels
- MP@ML (Main Profile at Main Level) uses CCIR 601 scanning and 4:2:0 chrominance subsampling, and supports a data rate of 15 Mbps
- MP@ML is used for digital television broadcasting and DVD
- Audio standard is essentially the same as MPEG-1, with extensions to cope with surround sound
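The 15 Mbps ceiling can be set against the raw rate of CCIR 601 video. A back-of-the-envelope calculation (assuming the 625-line format: 720x576 luma at 25 frames/s, 8 bits per sample, which the slide does not spell out):

```python
# Raw bit rate of CCIR 601 (625-line) video with 4:2:0 subsampling
W, H, FPS, BITS = 720, 576, 25, 8

luma = W * H                       # one full-resolution luma plane
chroma = 2 * (W // 2) * (H // 2)   # 4:2:0: two quarter-size chroma planes

raw_bps = (luma + chroma) * BITS * FPS   # = 124,416,000 bits/s
ratio = raw_bps / 15e6                   # vs the MP@ML 15 Mbps ceiling
```

So the uncompressed source runs at about 124 Mbps, and even at the full 15 Mbps MP@ML data rate the video codec must achieve better than 8:1 compression.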
10. MPEG-4
- The MPEG-4 standard activity aimed to define an audiovisual coding standard addressing the needs of the communication, interactive (computing) and broadcasting (TV/film/entertainment) service models
- In MPEG-1 and MPEG-2, "systems" referred to overall architecture, multiplexing and synchronisation
- In MPEG-4, systems also includes scene description, interactivity, content description and programmability
- Initial call for proposals: July 1995; version 2 amendments: December 2000
11-13. Images from Jean-Claude Dufourd, ENST, Paris
14. MPEG-4 Systems: mission
- Develop a coded, streamable representation for audio-visual objects and their associated time-variant data, along with a description of how they are combined
- Coded representation, as opposed to textual representation: binary encoding for bandwidth efficiency
- Streamable, as opposed to downloaded: presentations have a temporal extent rather than being based on files of a finite size
- Audio-visual objects and their associated time-variant data, as opposed to individual audio or visual streams: MPEG-4 Systems deals with combinations of streams to create an interactive visual scene, not with the encoding of audio or visual data
15. MPEG-4 principles
- Audio-visual objects: representations of natural or synthetic objects which have an audio and/or visual manifestation (e.g. a video sequence, a 3D animated face)
- Scene description: information describing where, when and for how long a-v objects will appear
- Interactivity expressed in 3 requirements:
  - client-side interaction with the scene description as well as with exposed properties of a-v objects
  - behaviour attached to a-v objects, triggered by events (e.g. user-generated, timeouts)
  - client-server interaction: user data sent back to the server, which responds with modifications to the scene (for example)
16-17. MPEG-4 Systems principles
Interactive scene description, built from elementary streams:
- Scene description stream
- Object description stream
- Visual object stream
- Visual object stream
- Visual object stream
- Audio object stream
18. Object Descriptor Framework
- The glue between the scene description and streaming resources (elementary streams)
- Object descriptor: a container structure that encapsulates all setup and association information for a set of elementary streams, with a set of sub-descriptors describing the individual streams (e.g. configuration information for the stream decoder)
- Groups sets of streams that are seen as a single entity from the perspective of the scene description
- The object description framework is separated from the scene description so that elementary streams can be changed and re-located without changing the scene description
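The container/sub-descriptor relationship above can be sketched as a data model. The class and field names here are illustrative only, not the normative MPEG-4 descriptor syntax:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DecoderConfig:
    """Sub-descriptor: what the decoder for one stream needs to know."""
    stream_type: str        # e.g. "visual", "audio", "scene description"
    codec: str
    avg_bitrate_bps: int

@dataclass
class ESDescriptor:
    """Describes a single elementary stream."""
    es_id: int
    config: DecoderConfig

@dataclass
class ObjectDescriptor:
    """Container: groups the elementary streams forming one a-v object.
    The scene description refers only to od_id, so streams can be
    replaced or re-located without touching the scene description."""
    od_id: int
    streams: List[ESDescriptor] = field(default_factory=list)

# One a-v object backed by a single visual stream (values invented)
video_od = ObjectDescriptor(od_id=1, streams=[
    ESDescriptor(es_id=101,
                 config=DecoderConfig("visual", "MPEG-4 Visual", 384_000)),
])
```

The point of the indirection is visible in the types: the scene holds `od_id=1`, while everything stream-specific lives inside the descriptor and can change independently.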
19. BIFS: BInary Format for Scenes
- Specifies the spatial and temporal locations of objects in scenes, together with their attributes and behaviours
- Elements of the scene and the relationships between them form a scene graph that must be encoded for transmission
- Based heavily on VRML; supports almost all VRML nodes
- Does not support the use of Java in Script nodes (only ECMAScript)
- Expands on the functionality of VRML, allowing a much broader range of applications to be supported
20. BIFS expansions to VRML
- Compressed binary format
  - BIFS describes an efficient binary representation of the scene graph information
  - Coding may be either lossless or lossy
  - Coding efficiency derives from a number of classical compression techniques, plus some novel ones
  - Knowledge of context is exploited heavily in BIFS
- Streaming
  - A scene may be transmitted as an initial scene followed by timestamped modifications to the scene
  - The BIFS Command protocol allows replacement of entire scenes, addition/deletion/replacement of nodes and behavioural elements in the scene graph, as well as modification of scene properties
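The initial-scene-plus-timestamped-modifications idea can be sketched as follows. The scene structure, node names and command names are invented for illustration; they are not the normative BIFS Command syntax:

```python
# Initial scene: a flat scene graph sent once at session start
scene = {"root": {"children": ["text1"]},
         "text1": {"string": "Hello"}}

# Timestamped commands streamed after the initial scene
commands = [
    (2.0, "replace_field", ("text1", "string", "Goodbye")),
    (5.0, "delete_node", ("text1",)),
]

def apply_command(scene, op, args):
    """Apply one streamed modification to the scene graph in place."""
    if op == "replace_field":
        node, fld, value = args
        scene[node][fld] = value
    elif op == "delete_node":
        (node,) = args
        del scene[node]
        for n in scene.values():           # unlink from any parent
            if node in n.get("children", []):
                n["children"].remove(node)
    return scene

# The receiver applies commands in timestamp order as they arrive
for _ts, op, args in sorted(commands):
    apply_command(scene, op, args)
```

The design point is that the presentation has temporal extent: the receiver never holds a finished file, only the current scene plus a stream of edits, which is what distinguishes BIFS streaming from a downloaded VRML world.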
21. BIFS expansions to VRML
- 2D primitives
  - BIFS includes native support for 2D scenes
  - This facilitates content creators who wish to produce low-complexity scenes, including the traditional television and multimedia industries
  - Many applications cannot bear the cost of requiring decoders to have full 3D rendering and navigation; this is particularly true where hardware decoders must be low cost, as for instance in television set-top boxes
  - Rather than simply partitioning the multimedia world into 2D and 3D, MPEG-4 BIFS allows the combination of 2D and 3D elements in a single scene
22. BIFS expansions to VRML
- Animation
  - A second streaming protocol, BIFS-Anim, provides a low-overhead mechanism for the continuous animation of changes to numerical values of components in the scene
  - These streamed animations provide an alternative to the interpolator nodes supported in both BIFS and VRML
- Enhanced audio
  - BIFS provides the notion of an "audio scene graph"
  - Audio sources, including streaming ones, can be mixed
  - Audio content can even be processed and transformed with special procedural code to produce various sound effects
23. BIFS expansions to VRML
- Facial animation
  - BIFS provides support at the scene level for the MPEG-4 Facial Animation decoder
  - A special set of BIFS nodes exposes the properties of the animated face at the scene level
  - The animated face can be integrated with all BIFS functionalities, similarly to any other audio or visual object