Title: Audio/Video Compression 4
1. Audio/Video Compression 4
- Lecture 3: Multimedia Networks
- Lecture 4: Audio/Video Compression
- Image and Video Compression Standards
- Speech and Audio Compression Standards
- Wavelet Transform and its Application in Compression
2. Introduction to Audio/Video Compression
- With today's technology, only compression makes storage and transmission of digital audio/video streams possible
- Compression exploits redundancy, based on features of human perception
3. Introduction to Audio/Video Compression
- Spatial redundancy: values of neighboring pixels are strongly correlated in natural images
- Temporal redundancy: adjacent frames in a video sequence often show very little change; a strong audio signal in a given time segment can mask certain lower-level distortion in future or past segments
4. Introduction to Audio/Video Compression
- Spectral redundancy: in multispectral images, the spectral values of the same pixel across spectral bands are correlated; an audio signal can completely mask a sufficiently weaker signal in its frequency vicinity
- Redundancy across scale: distinct image features are invariant under scaling
- Redundancy in stereo: correlations between stereo images or audio channels
5. Introduction to Audio/Video Compression
- Spatial/spectral redundancies: transform coding
- Temporal redundancy: DPCM (differential pulse code modulation), motion estimation/motion compensation
- The first compression methods were lossless:
  - Huffman coding
  - Ziv-Lempel coding
  - Arithmetic coding
- Lossless methods alone are inadequate for transmission media of low bandwidth (e.g., ISDN) or for devices of low data throughput (e.g., CD-ROM)
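As an illustration of the lossless methods listed above, here is a minimal Huffman coding sketch. The function names (`huffman_code`, `encode`, `decode`) and the sample message are hypothetical and chosen only for this example; practical coders also have to transmit the code table.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix property: first match is a symbol
            out.append(inverse[current])
            current = ""
    return out

message = list("aaaaaaaabbbbccd")  # skewed frequencies: a=8, b=4, c=2, d=1
code = huffman_code(message)
bits = encode(message, code)
```

With these frequencies the coded message needs 25 bits, against 30 bits for a fixed 2-bit code; the gain is modest, which is consistent with the point that lossless coding alone does not suffice for low-bandwidth channels.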
6. Introduction to Audio/Video Compression
- Lossless vs. lossy compression
- Intraframe vs. interframe compression
- Symmetrical vs. asymmetrical compression
- Real-time: encoding-decoding delay < 50 ms
- Scalable: frames coded at different resolutions or quality levels
- Recent advanced compression methods reduce bandwidth enormously without reducing perceptual quality
7. Introduction to Audio/Video Compression
Uncompressed data → Preprocessing → Source coding → Entropy coding → Compressed data
- Entropy coding: arithmetic coding, Huffman coding, run-length coding
- Source coding: DPCM, DCT, DWT, motion estimation/motion compensation
- Hybrid coding: H.261, H.263, H.263+, JPEG, MPEG-1, MPEG-2, MPEG-4, Perceptual Audio Coder
- Hybrid coding = source coding + entropy coding
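To make the source-coding stage concrete, the following is a minimal sketch of lossless DPCM with a previous-sample predictor. The function names and the sample values are hypothetical; practical codecs usually quantize the residual (making the scheme lossy) and then pass it to an entropy coder.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    residuals, previous = [], 0
    for s in samples:
        residuals.append(s - previous)  # prediction error
        previous = s
    return residuals

def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the residuals."""
    samples, previous = [], 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

signal = [100, 102, 103, 103, 101, 98, 97, 97]  # slowly varying input
residuals = dpcm_encode(signal)
```

After the first sample, the residuals stay within a few units of zero, so they can be entropy coded with fewer bits than the raw samples.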
8. Wavelet Theory
- A unified framework for the analysis of non-stationary signals
- Wavelet transform (WT): an alternative to the classical Short-Time Fourier Transform (STFT), or Gabor transform
- In contrast to the STFT, the WT performs constant-Q (relative-bandwidth) frequency analysis: short windows at high frequencies and long windows at low frequencies
9. Short-Time Fourier Transform
- Fourier Transform (FT)
- X(f): projection of the signal x(t) along exp(j2πft)
- Shows how the signal energy is distributed over frequencies
10. Short-Time Fourier Transform
- To capture the local energy distribution, the STFT is introduced
- g(t): a window of finite support
- Shows how the signal energy is distributed over frequencies around local time τ
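For reference, the two transforms discussed so far are commonly written as below. This is a sketch in a standard notation; the conjugated window $g^{*}$ and the sign of the exponent are assumed conventions, since conventions vary between texts.

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt
\qquad
\mathrm{STFT}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, g^{*}(t - \tau)\, e^{-j 2\pi f t}\, dt
```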
11. Short-Time Fourier Transform
- For a given f, STFT(τ, f) is the output of a bandpass filter whose impulse response is the window function modulated to f
- Resolution in time and frequency is determined by the window g(t)
12. Short-Time Fourier Transform
- Uncertainty Principle (Heisenberg)
- Once the window g(t) is chosen, the resolution in time and frequency is fixed
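The uncertainty principle can be stated as a lower bound on the time-bandwidth product of the window. A common form is sketched below, assuming Δt and Δf denote the RMS duration and RMS bandwidth of a window g(t) centered at the origin, with f in hertz and G(f) the Fourier transform of g(t).

```latex
\Delta t \, \Delta f \ge \frac{1}{4\pi},
\qquad
\Delta t^{2} = \frac{\int t^{2}\, |g(t)|^{2}\, dt}{\int |g(t)|^{2}\, dt},
\quad
\Delta f^{2} = \frac{\int f^{2}\, |G(f)|^{2}\, df}{\int |G(f)|^{2}\, df}
```

Gaussian windows meet the bound with equality, so no window can improve time and frequency resolution at once.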
13. Continuous Wavelet Transform (CWT)
- If Δf/f can be kept constant, resolution in frequency becomes arbitrarily good at low frequencies while resolution in time becomes arbitrarily good at high frequencies
- The CWT follows this idea, but all impulse responses of the filter bank are defined as scaled versions of the same prototype, or basic wavelet, h(t)
14. Continuous Wavelet Transform (CWT)
- Let h_a(t) be the basic wavelet h(t) scaled by a
- h(t): any bandpass function
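A common way to write the scaled wavelet is sketched below, assuming a > 0 and the energy-preserving normalization $1/\sqrt{a}$ (other normalizations are in use). H(f) denotes the Fourier transform of h(t).

```latex
h_a(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t}{a}\right),
\qquad
H_a(f) = \sqrt{a}\, H(a f)
```

Scaling by a stretches the wavelet in time by a factor a and compresses its spectrum by the same factor, so the bandwidth is proportional to the center frequency.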
15. Continuous Wavelet Transform (CWT)
16. Continuous Wavelet Transform (CWT)
- Resolution in frequency of h_a(t)
17. Continuous Wavelet Transform (CWT)
- Given a fixed frequency f_0, if the scale a is chosen as
18. Continuous Wavelet Transform (CWT)
- By definition of the CWT
- The scale a is not linked to frequency modulation but is related to time-scaling
19. Continuous Wavelet Transform (CWT)
- The signal x(at) is seen through a constant-length filter centered at τ/a
- The larger the scale a, the more contracted the signal x(t) becomes
- The smaller the scale a, the more dilated the signal x(t) becomes
- Larger scales: CWT(τ, a) provides a more global view of the signal x(t)
- Smaller scales: CWT(τ, a) provides a more detailed view of the signal x(t)
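The time-scaling view follows from a change of variable in the CWT integral. The sketch below assumes a > 0 and the definition with normalization $1/\sqrt{a}$; substituting $t \to a t$ gives the second form.

```latex
\mathrm{CWT}(\tau, a)
= \frac{1}{\sqrt{a}} \int x(t)\, h^{*}\!\left(\frac{t - \tau}{a}\right) dt
= \sqrt{a} \int x(a t)\, h^{*}\!\left(t - \frac{\tau}{a}\right) dt
```

In the second form the wavelet keeps a fixed length and is centered at τ/a, while the signal itself is contracted (a > 1) or dilated (a < 1).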
20. Continuous Wavelet Transform (CWT)
- Define the wavelet h_{a,τ}
- CWT(τ, a): the inner product, or correlation, between x(t) and h_{a,τ}
- CWT(τ, a) is called the analysis stage (of the signal x(t)) at scale a
21. Continuous Wavelet Transform (CWT)
- x(t) can be recovered from the multi-scale analysis if
22. Continuous Wavelet Transform (CWT)
- Energy conservation
- Signal energy distributed at scale a by
- Wavelet spectrogram, or scalogram: the distribution of signal energy in the time-scale plane (associated with an area measure)
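Energy conservation for the CWT is often written as follows. This sketch assumes the wavelet is normalized so that its admissibility constant equals one; otherwise a constant factor appears on one side.

```latex
\iint |\mathrm{CWT}(\tau, a)|^{2}\, \frac{d\tau\, da}{a^{2}} = \int |x(t)|^{2}\, dt
```

Here $|\mathrm{CWT}(\tau, a)|^{2}$ plays the role of the scalogram and $d\tau\, da / a^{2}$ is the area measure on the time-scale plane.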
23. Continuous Wavelet Transform (CWT)
- Larger scales → more global view → coarser resolutions
- Smaller scales → more detailed view → finer resolutions
- CWT: decomposition of a signal over scales → signal energy distribution at various resolutions
24. Discrete Wavelet Transform (DWT)
- Two methods were developed independently in the late 1970s and early 1980s:
  - Subband coding
  - Pyramid coding, or multiresolution signal analysis
25. Multiresolution Pyramid
- Given an original sequence x(n), n ∈ Z, define a lower-resolution signal
  $y(n) = \sum_k g(k)\, x(2n - k)$
  where g(n) is a halfband lowpass filter
26. Multiresolution Pyramid
- An approximation of x(n) from y(n):
  $a(n) = \sum_k g'(k)\, y'(n - k)$
  where $y'(2n) = y(n)$, $y'(2n+1) = 0$, and g'(n) is an interpolation filter
27. Multiresolution Pyramid
- If g(n) and g'(n) are perfect halfband filters, then a(n) provides a perfect halfband lowpass approximation to x(n)
28. Multiresolution Pyramid
29. Multiresolution Pyramid
- Let d(n) = x(n) - a(n)
- Then x(n) = a(n) + d(n)
- But there is redundancy between a(n) and d(n)
- If x(n) uses sampling rate f_s, then d(n) and y(n) use sampling rates f_s and f_s/2, respectively
30. Multiresolution Pyramid
- Pyramid decomposition: a redundant representation
- But the redundancy is upper bounded by 1 + 1/2 + 1/4 + ... < 2 in a one-dimensional system
- [Diagram: x(n) → y(n) → y'(n), with difference signals d(n) and d'(n) at successive levels]
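The pyramid steps above can be sketched as follows. The filters `g` and `g_interp` are simple short kernels assumed only for illustration (they are not ideal halfband filters), and boundary samples are handled by truncation; reconstruction is still exact because d(n) stores whatever the approximation misses.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out

def approximate(y, g_interp, length):
    """a(n) = sum_k g'(k) y'(n - k), with y'(2n) = y(n), y'(2n+1) = 0."""
    up = [0.0] * (2 * len(y))
    up[::2] = y
    return convolve(up, g_interp)[:length]

def pyramid_stage(x, g, g_interp):
    """One stage: half-rate signal y and full-rate difference d."""
    y = convolve(x, g)[:len(x):2]          # y(n) = sum_k g(k) x(2n - k)
    a = approximate(y, g_interp, len(x))
    d = [xi - ai for xi, ai in zip(x, a)]  # d(n) = x(n) - a(n)
    return y, d

def reconstruct(y, d, g_interp):
    """x(n) = a(n) + d(n)."""
    a = approximate(y, g_interp, len(d))
    return [ai + di for ai, di in zip(a, d)]

g = [0.25, 0.5, 0.25]        # assumed lowpass kernel
g_interp = [0.5, 1.0, 0.5]   # assumed interpolation kernel

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
y1, d1 = pyramid_stage(x, g, g_interp)    # level 1
y2, d2 = pyramid_stage(y1, g, g_interp)   # level 2

total = len(d1) + len(d2) + len(y2)       # samples actually stored
x_rec = reconstruct(reconstruct(y2, d2, g_interp), d1, g_interp)
```

For the 8-sample input, the two-level pyramid stores 8 + 4 + 2 = 14 samples, i.e. a factor of 1 + 1/2 + 1/4 = 1.75, which stays below the bound of 2.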
31. Multiresolution Pyramid
- For perfect halfband lowpass filters g(n) and g'(n), d(n) clearly contains only the frequencies of x(n) above π/2, and thus can also be subsampled by two without loss of information
- In a pyramid, it is possible to take very good lowpass filters and derive visually pleasing coarse versions
- In a subband scheme, critical sampling is accomplished at the price of a constrained filter design and a relatively poor lowpass version as a coarse approximation, which is undesirable if the coarse version is used for viewing in a compatible subchannel
32. Subband Coding
- One stage of a pyramid decomposition produces:
  - a half-rate low-resolution signal
  - a full-rate difference signal
  - the number of samples is increased by 50%
- If the filters g(n) and g'(n) meet certain conditions, oversampling can be avoided
- Subband coding, first popularized in speech compression, does not produce such redundancy
33. Subband Coding
- A full-band one-dimensional signal is decomposed into two subbands using an analysis filter bank
- Ideally, the analysis filter bank consists of a lowpass filter and a highpass filter with nonoverlapping frequency responses and unit gain over their respective bandwidths
- After filtering, the lowpass and highpass signals each have only half of the original bandwidth, or frequency content, and thus can be downsampled by two
- But ideal filters are unrealizable
34. Subband Coding
- By using overlapping responses, frequency gaps in the subband signals can be prevented
- Aliasing is introduced when the lowpass and highpass signals are downsampled by two
- The aliasing effect can be eliminated to produce perfect reconstruction at the synthesis stage
- The lowpass and highpass signals each have a bandwidth of more than half the original bandwidth
- Quadrature Mirror Filters (QMF) are used for analysis/synthesis filtering
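35. Subband Coding
- Output signals from the analysis bank after downsampling:
  - $y_1(n) = (h_1 * x)(2n)$
  - $y_2(n) = (h_2 * x)(2n)$
- After quantization, y_1(n) and y_2(n) become the quantized subband signals
- After upsampling by two, these become the input signals of the synthesis bank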
36. Subband Coding
- Output signals from the synthesis bank
- Reconstructed signal
37. Subband Coding
- Ignoring quantization or coding effects,
- If H_1(z), G_1(z) are ideal lowpass filters and H_2(z), G_2(z) are ideal highpass filters,
38. Subband Coding
39. Subband Coding
- Implying
- Indicating the aliasing component that arises when the filters are not ideal, which is desired to be zero
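The aliasing component can be made explicit with the standard input-output relation of a two-channel filter bank. This is a sketch, assuming analysis filters $H_1, H_2$, synthesis filters $G_1, G_2$, downsampling and upsampling by two, and no quantization.

```latex
\hat{X}(z) = \tfrac{1}{2}\,\bigl[H_1(z) G_1(z) + H_2(z) G_2(z)\bigr]\, X(z)
           + \tfrac{1}{2}\,\bigl[H_1(-z) G_1(z) + H_2(-z) G_2(z)\bigr]\, X(-z)
```

The second term, which multiplies $X(-z)$, is the aliasing component; it vanishes when $H_1(-z) G_1(z) + H_2(-z) G_2(z) = 0$.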
40. Subband Coding
- To have perfect reconstruction in the non-ideal filtering case, the iff conditions are
- If $H_2(z) = H_1(-z)$, $G_1(z) = 2H_1(z)$, $G_2(z) = -2H_1(-z)$, the aliased term becomes zero and the reconstructed signal is given by $\hat{X}(z) = [H_1^2(z) - H_1^2(-z)]\, X(z)$
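The QMF filter choice above can be checked numerically. The sketch below assumes the two-tap averaging filter $H_1(z) = (1 + z^{-1})/2$ as a hypothetical example; for this filter $H_1^2(z) - H_1^2(-z) = z^{-1}$, so the output should equal the input delayed by one sample. Function names are illustrative only.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out

def upsample(y):
    """Insert a zero after every sample."""
    up = [0.0] * (2 * len(y))
    up[::2] = y
    return up

def qmf_analysis(x, h1):
    # H2(z) = H1(-z): flip the sign of the odd-indexed taps
    h2 = [c if k % 2 == 0 else -c for k, c in enumerate(h1)]
    y1 = convolve(x, h1)[::2]   # y1(n) = (h1 * x)(2n)
    y2 = convolve(x, h2)[::2]   # y2(n) = (h2 * x)(2n)
    return y1, y2

def qmf_synthesis(y1, y2, h1):
    g1 = [2 * c for c in h1]                                          # G1(z) = 2 H1(z)
    g2 = [-2 * c if k % 2 == 0 else 2 * c for k, c in enumerate(h1)]  # G2(z) = -2 H1(-z)
    s1 = convolve(upsample(y1), g1)
    s2 = convolve(upsample(y2), g2)
    return [a + b for a, b in zip(s1, s2)]

h1 = [0.5, 0.5]  # assumed example filter
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y1, y2 = qmf_analysis(x, h1)
x_hat = qmf_synthesis(y1, y2, h1)
delayed = x_hat[1:len(x) + 1]  # output shifted back by the one-sample delay
```

Apart from the one-sample delay and a few boundary samples produced by the full convolution, `delayed` matches `x`, so the aliasing introduced by downsampling is cancelled at the synthesis stage.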
41. Subband Coding
- For perfect reconstruction, we need
- or
- Using a symmetric linear-phase FIR filter of length N for H_1 results in
42. Subband Coding
[Figure: frequency-response plot; amplitude axis marked at 1, frequency axis marked at 0, π/2, and π]
43. Subband Coding
- If the subband filters H_i(z), G_i(z) satisfy three conditions, perfect reconstruction results, too
- Aliased term
44. Multiresolution Wavelet Representation and Approximation
- Embedded linear spaces in L^2(R): $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$
- Let A_j be the orthogonal projection onto V_j
- Let O_j be the orthogonal complement of V_j in V_{j+1}
45. Multiresolution Wavelet Representation and Approximation
- Let D_j be the orthogonal projection onto O_j
- Then an original signal A_0 f can be decomposed as $A_0 f = A_{-J} f + \sum_{j=1}^{J} D_{-j} f$
46. Multiresolution Wavelet Representation and Approximation
- A_{-J} f: the orthogonal projection of A_0 f on V_{-J}
- D_{-j} f: the orthogonal projection of A_0 f on O_{-j}
- D_{-j} f and D_{-k} f (j ≠ k) are orthogonal, i.e., uncorrelated, to each other
- D_{-j} f is orthogonal to, i.e., uncorrelated with, A_{-J} f
- A_{-J} f: a coarse version of A_0 f
- D_{-J} f, ..., D_{-1} f: details of A_0 f arranged from coarser to finer
47. Multiresolution Wavelet Representation and Approximation
- Take an orthonormal basis of V_j
- A_j f can be characterized by the coefficients of its orthonormal expansion in this basis
- This coefficient sequence is called a discrete approximation of f in V_j
48. Multiresolution Wavelet Representation and Approximation
- Take an orthonormal basis of O_j
- D_j f can be characterized by the coefficients of its orthonormal expansion in this basis
- This coefficient sequence is called a discrete approximation of f in O_j
49. Multiresolution Wavelet Representation and Approximation
- Thus, A_0 f can be characterized by its discrete approximation
- This can be further characterized by a coarse discrete approximation together with the discrete detail sequences at each level
- This set of discrete signals is called the orthogonal wavelet representation
- It is organized as a coarse version plus increasingly fine details
- The orthogonal representation is a decorrelated representation
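A minimal sketch of an orthogonal wavelet representation follows, assuming the Haar basis (the simplest orthonormal choice, used here only for illustration) and a signal length that is a power of two. Function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)  # normalization that keeps the transform orthonormal

def haar_step(approx):
    """Split a sequence into a half-length coarse version and a detail part."""
    half = len(approx) // 2
    coarse = [S * (approx[2 * n] + approx[2 * n + 1]) for n in range(half)]
    detail = [S * (approx[2 * n] - approx[2 * n + 1]) for n in range(half)]
    return coarse, detail

def haar_decompose(signal, levels):
    """Return the coarsest approximation and the details, finest first."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def haar_reconstruct(approx, details):
    """Add the details back, from coarsest to finest."""
    for d in reversed(details):
        finer = []
        for c, w in zip(approx, d):
            finer += [S * (c + w), S * (c - w)]
        approx = finer
    return approx

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(signal, levels=3)
rebuilt = haar_reconstruct(approx, details)

energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx) + sum(v * v for d in details for v in d)
```

The representation holds 1 + (4 + 2 + 1) = 8 coefficients, the same number as the input, and because the basis is orthonormal the energy of the coefficients equals the energy of the signal.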
50. Multiresolution Wavelet Representation and Approximation
- If we require
- A_j f is band-limited such that it can be sampled at a rate of 2^j, i.e., 2^j samples per unit of time or length
51. Multiresolution Wavelet Representation and Approximation
- Translation invariant with A_0
- Translation invariant with produced by
52. Multiresolution Wavelet Representation and Approximation
- Then the basis functions can be constructed from a scaling function
- Furthermore, let
- then
53. Multiresolution Wavelet Representation and Approximation
- The coarser discrete approximation is the finer one filtered by a lowpass filter and downsampled by two
- Let
54. Multiresolution Wavelet Representation and Approximation
- Let
- then s can be constructed by
- Let
- then
55. Multiresolution Wavelet Representation and Approximation
- The discrete detail signal is the finer discrete approximation filtered by a highpass filter and downsampled by two
- From
- H, G: Quadrature Mirror Filters
56. Multiresolution Wavelet Representation and Approximation
57. Multiresolution Wavelet Representation and Approximation
58. Multiresolution Wavelet Representation and Approximation
- Think of
- Then the analysis stage for subband and wavelet decomposition is the same
- Higher-resolution signal → two low-resolution signals, through filtering and downsampling by two
59. Multiresolution Wavelet Representation and Approximation
- The synthesis stage for subband and wavelet decomposition is different
- For subband: the low-resolution signals are upsampled by two, then filtered, then summed to reconstruct the higher-resolution signal
- For wavelet: the low-resolution signals are filtered by the same filters and downsampled by two, then summed to reconstruct the higher-resolution signal
60. Multiresolution Wavelet Representation and Approximation
- After filtering at the analysis stage, the two resulting signals each have only half the resolution of the original signal
- Downsampling by two is therefore justifiable
- Before filtering at the synthesis stage, upsampling the two low-resolution signals by two in subband decomposition seems less well justified
61. Multiresolution Wavelet Representation and Approximation
62. Multiresolution Wavelet Representation and Approximation
63. Multiresolution Wavelet Representation and Approximation