Title: Introduction to Wavelet Transform
1Introduction to Wavelet Transform
2Time Series are Ubiquitous!
A random sample of 4,000 graphics from 15 of the
world's newspapers published from 1974 to 1989
found that more than 75% of all graphics were
time series (Tufte, 1983).
3Why is Working With Time Series so Difficult?
Answer: We are dealing with subjective notions of
similarity.
The definition of similarity depends on the
user, the domain and the task at hand. We need to
be able to handle this subjectivity.
4Wavelet Transform - Overview
History
- Fourier (1807)
- Haar (1910)
- Math World
5Wavelet Transform - Overview
- What kind of basis function could be useful?
- Impulse Function (Haar) Best time resolution
- Sinusoids (Fourier) Best frequency resolution
- We want both of the best resolutions
- Heisenberg (1930)
- Uncertainty Principle
- There is a lower bound on the joint
time-frequency resolution (an intuitive proof in Mac91)
6Wavelet Transform - Overview
- Gabor (1945)
- Short Time Fourier Transform (STFT)
- Disadvantage: fixed window size
7Wavelet Transform - Overview
- Constructing Wavelets
- Daubechies (1988)
- Compactly Supported Wavelets
- Computation of WT Coefficients
- Mallat (1989)
- A fast algorithm using filter banks
8Discrete Fourier Transform I
Basic Idea: Represent the time series as a linear
combination of sines and cosines, but keep only
the first n/2 coefficients. Why n/2
coefficients? Because each sine wave requires 2
numbers, for the phase (w) and amplitude (A, B).
[Figure: a raw time series X (samples 0 to 140) and its DFT approximation X', built from the first Fourier coefficients (0, 1, 2, ...). Portrait: Jean Fourier, 1768-1830.]
Excellent free Fourier Primer: Hagit Shatkay, "The
Fourier Transform - a Primer", Technical Report
CS-95-37, Department of Computer Science, Brown
University, 1995. http://www.ncbi.nlm.nih.gov/CBBresearch/Postdocs/Shatkay/
9Discrete Fourier Transform II
- Pros and Cons of DFT as a time series representation.
- Good ability to compress most natural signals.
- Fast, off-the-shelf DFT algorithms exist: O(n log(n)).
- (Weakly) able to support time-warped queries.
- Difficult to deal with sequences of different lengths.
- Cannot support weighted distance measures.
Note: The related transform, the DCT, uses only cosine
basis functions. It does not seem to offer any
particular advantages over the DFT.
10History
11Discrete Wavelet Transform I
Basic Idea: Represent the time series as a linear
combination of wavelet basis functions, but keep
only the first N coefficients. Although there
are many different types of wavelets, researchers
in time series mining/indexing generally use Haar
wavelets. Haar wavelets seem to be as powerful
as the other wavelets for most problems and are
very easy to code.
Alfred Haar 1885-1933
Excellent free Wavelets Primer: Stollnitz, E.,
DeRose, T., Salesin, D. (1995). Wavelets for
computer graphics: A primer. IEEE Computer
Graphics and Applications.
12Wavelet Series
13Discrete Wavelet Transform III
- Pros and Cons of Wavelets as a time series representation.
- Good ability to compress stationary signals.
- Fast, linear-time algorithms for the DWT exist.
- Able to support some interesting non-Euclidean similarity measures.
- Signals must have a length n = 2^some_integer.
- Works best if n is 2^some_integer; otherwise wavelets approximate the left side of the signal at the expense of the right side.
- Cannot support weighted distance measures.
14Singular Value Decomposition I
Basic Idea: Represent the time series as a linear
combination of eigenwaves, but keep only the first
N coefficients. SVD is similar to the Fourier and
wavelet approaches in that we represent the data in
terms of a linear combination of shapes (in this
case eigenwaves). SVD differs in that the
eigenwaves are data dependent. SVD has been
successfully used in the text processing
community (where it is known as Latent Semantic
Indexing) for many years. Good free SVD Primer:
Singular Value Decomposition - A Primer. Sonia
Leach.
[Figure: a time series X (samples 0 to 140) and its SVD approximation X'. Portraits: James Joseph Sylvester 1814-1897, Camille Jordan 1838-1921, Eugenio Beltrami 1835-1899.]
15Singular Value Decomposition II
How do we create the eigenwaves?
We have previously seen that we can regard time
series as points in high-dimensional space. We
can rotate the axes such that axis 1 is aligned
with the direction of maximum variance, axis 2 is
aligned with the direction of maximum variance
orthogonal to axis 1, and so on. Since the first few
eigenwaves contain most of the variance of the
signal, the rest can be truncated with little
loss.
This process can be achieved by factoring an M-by-n
matrix of time series into 3 other matrices,
and truncating the new matrices at size N.
16Singular Value Decomposition III
- Pros and Cons of SVD as a time series representation.
- Optimal linear dimensionality reduction technique.
- The eigenvalues tell us something about the underlying structure of the data.
- Computationally very expensive: time O(Mn^2), space O(Mn).
- An insertion into the database requires recomputing the SVD.
- Cannot support weighted distance measures or non-Euclidean measures.
Note: There has been some promising research into
mitigating the SVD's time and space complexity.
17Piecewise Linear Approximation I
Basic Idea: Represent the time series as a
sequence of straight lines. Lines could be
connected, in which case we are allowed N/2
lines; if lines are disconnected, we are
allowed only N/3 lines. Personal experience on
dozens of datasets suggests disconnected is
better. Also, only disconnected allows a
lower-bounding Euclidean approximation.
[Figure: a time series X (samples 0 to 140) and its piecewise linear approximation X'. Portrait: Karl Friedrich Gauss, 1777-1855.]
- With connected lines, each line segment has:
- length
- left_height
- (right_height can be inferred by looking at the next segment)
- With disconnected lines, each line segment has:
- length
- left_height
- right_height
18Problem with Fourier
Fourier analysis breaks down a signal into
constituent sinusoids of different frequencies.
A serious drawback: in transforming to the
frequency domain, time information is lost. When
looking at a Fourier transform of a signal, it is
impossible to tell when a particular event took
place.
19Function Representations
- sequence of samples (time domain)
- finite difference method
- pyramid (hierarchical)
- polynomial
- sinusoids of various frequency (frequency domain)
- Fourier series
- piecewise polynomials (finite support)
- finite element method, splines
- wavelet (hierarchical, finite support)
- (time/frequency domain)
20What Are Wavelets?
- In general, a family of representations using
- hierarchical (nested) basis functions
- finite (compact) support
- basis functions often orthogonal
- fast transforms, often linear-time
21Function Representations: Desirable Properties
- generality: approximate anything well
- discontinuities, nonperiodicity, ...
- adaptable to application
- audio, pictures, flow fields, terrain data, ...
- compact: approximate a function with few coefficients
- facilitates compression, storage, transmission
- fast to compute with
- differential/integral operators are sparse in this basis
- convert an n-sample function to its representation in O(n log n) or O(n) time
22Wavelet History, Part 1
- 1805: Fourier analysis developed
- 1965: Fast Fourier Transform (FFT) algorithm
- 1980s: beginnings of wavelets in physics, vision, speech processing (ad hoc)
- little theory: why/when do wavelets work?
- 1985: Morlet and Grossman: continuous wavelet transform, asking how can you get perfect reconstruction without redundancy?
- 1986: Mallat unified the above work
23Wavelet History, Part 2
- 1985: Meyer tried to prove that no orthogonal wavelet other than Haar exists, and found one by trial and error!
- 1987: Mallat developed multiresolution theory, the DWT, and wavelet construction techniques (but still noncompact)
- 1988: Daubechies added theory: found compact, orthogonal wavelets with an arbitrary number of vanishing moments!
- 1990s: wavelets took off, attracting both theoreticians and engineers
24Time-Frequency Analysis
- For many applications, you want to analyze a function in both time and frequency
- Analogous to a musical score
- Fourier transforms give you frequency information, smearing time.
- Samples of a function give you temporal information, smearing frequency.
- Note: substitute space for time for pictures.
25Comparison to Fourier Analysis
- Fourier analysis
- Basis is global
- Sinusoids with frequencies in arithmetic
progression
- Short-time Fourier Transform (Gabor filters)
- Basis is local
- Sinusoid times Gaussian
- Fixed-width Gaussian window
- Wavelet
- Basis is local
- Frequencies in geometric progression
- Basis has constant shape independent of scale
26Wavelets are faster than FFTs!
27 The results of the CWT are many wavelet
coefficients, which are a function of scale and
position
28Gabor's Proposal: Short Time Fourier Transform
Requirements:
A signal in the time domain requires a short time
window to depict the features of the signal.
A signal in the frequency domain requires a short
frequency window (a long time window) to depict
the features of the signal.
29What are wavelets?
Wavelets are functions defined over a finite
interval and having an average value of zero.
30What is wavelet transform?
The wavelet transform is a tool for carving up
functions, operators, or data into components of
different frequency, allowing one to study each
component separately.
The basic idea of the wavelet transform is to
represent any arbitrary function f(t) as a
superposition of a set of such wavelets or basis
functions.
These basis functions or baby wavelets are
obtained from a single prototype wavelet called
the mother wavelet, by dilations or contractions
(scaling) and translations (shifts).
31The continuous wavelet transform (CWT)
Fourier Transform
The FT is the sum over all time of the signal f(t)
multiplied by a complex exponential:
F(w) = integral of f(t) e^(-jwt) dt
32The variables s and t are the new dimensions,
scale and translation (position), after the
wavelet transform:
W(s, t) = s^(-1/2) integral of f(u) psi((u - t)/s) du
33s is the scale factor, t is the translation
factor, and the factor s^(-1/2) is for energy
normalization across the different scales.
It is important to note that in the above
transforms the wavelet basis functions are not
specified.
This is a difference between the wavelet
transform and the Fourier transform, or other
transforms.
34Scale
Scaling a wavelet simply means stretching (or
compressing) it.
35Scale and Frequency
- Low scale a: compressed wavelet, rapidly changing details
- High scale a: stretched wavelet, slowly changing details
Translation (shift)
Translating a wavelet simply means delaying (or
hastening) its onset.
36(No Transcript)
37Discrete Wavelets
A discrete wavelet is written as
psi_{j,k}(t) = s0^(-j/2) psi(s0^(-j) t - k t0)
where j and k are integers and s0 > 1 is a fixed
dilation step. The translation factor t0 depends
on the dilation step. The effect of discretizing
the wavelet is that the time-scale space is now
sampled at discrete intervals. We usually choose
s0 = 2.
38A band-pass filter
The wavelet has a band-pass-like spectrum.
From Fourier theory we know that compression in
time is equivalent to stretching the spectrum and
shifting it upwards.
Suppose a = 2: a time compression of the wavelet
by a factor of 2 will stretch the frequency
spectrum of the wavelet by a factor of 2 and also
shift all frequency components up by a factor of
2.
39Subband coding
If we regard the wavelet transform as a filter
bank, then we can consider wavelet transforming a
signal as passing the signal through this filter
bank.
The outputs of the different filter stages are
the wavelet and scaling function transform
coefficients.
In general we will refer to this kind of
analysis as a multiresolution analysis.
That is called subband coding.
40 Splitting the signal spectrum with an iterated
filter bank.
Summarizing, if we implement the wavelet
transform as an iterated filter bank, we do not
have to specify the wavelets explicitly! This is
a remarkable result.
41The Discrete Wavelet Transform
Calculating wavelet coefficients at every
possible scale is a fair amount of work, and it
generates an awful lot of data. What if we choose
only a subset of scales and positions at which to
make our calculations?
It turns out, rather remarkably, that if we
choose scales and positions based on powers of
two -- so-called dyadic scales and positions --
then our analysis will be much more efficient and
just as accurate. We obtain just such an analysis
from the discrete wavelet transform (DWT).
42Approximations and Details
The approximations are the high-scale,
low-frequency components of the signal. The
details are the low-scale, high-frequency
components. The filtering process, at its most
basic level, looks like this: the original signal,
S, passes through two complementary filters and
emerges as two signals.
43Downsampling
Unfortunately, if we actually perform this
operation on a real digital signal, we wind up
with twice as much data as we started with.
Suppose, for instance, that the original signal S
consists of 1000 samples of data. Then
the approximation and the detail will each have
1000 samples, for a total of 2000.
To correct this problem, we introduce the
notion of downsampling. This simply means
throwing away every second data point.
44An example
45Reconstructing Approximation and Details
Upsampling
46Wavelet Decomposition
Multiple-Level Decomposition
The decomposition process can be iterated, with
successive approximations being decomposed in
turn, so that one signal is broken down into many
lower-resolution components. This is called the
wavelet decomposition tree.
47 The signal f(t) can be expressed as a sum of the wavelet basis functions, i.e. its DWT:
f(t) = sum over j and k of gamma(j, k) psi_{j,k}(t)
48(No Transcript)
49(No Transcript)
50Wavelet Reconstruction (Synthesis)
Perfect reconstruction
51[Figure: example vectors (4,0) and (1,0) on axes x1 and y1.]
52 2-D Discrete Wavelet Transform
A 2-D DWT can be done as follows:
Step 1: Replace each row with its 1-D DWT
Step 2: Replace each column with its 1-D DWT
Step 3: Repeat steps (1) and (2) on the lowest
subband for the next scale
Step 4: Repeat step (3) until as many scales as
desired have been completed
53Image at different scales
54Correlation between features at different scales
55Wavelet construction: a simplified approach
- Traditional approaches to wavelets have used a filterbank interpretation
- Fourier techniques are required to get synthesis (reconstruction) filters from analysis filters
- Not easy to generalize
56Wavelet construction: lifting
- 3 steps
- Split
- Predict (P step)
- Update (U step)
57Example the Haar wavelet
- S step
- Splits the signal into even and odd samples
58Example the Haar wavelet
- P step
- Predict the odd samples from the even samples
For the Haar wavelet, the prediction for an odd
sample is the previous even sample.
59Example the Haar wavelet
Detail signal
60Example the Haar wavelet
- U step
- Update the even samples to produce the next
coarser scale approximation
The signal average is maintained
61Summary of the Haar wavelet decomposition
Can be computed in place.
[Diagram: P step: subtract each even sample from the following odd sample (weight -1); U step: add half of each detail sample to the even sample (weight 1/2).]
62Inverse Haar wavelet transform
- Simply run the forward Haar wavelet transform backwards!
- Then merge the even and odd samples
63General lifting stage of wavelet decomposition
[Diagram: Split, then P (predict) is subtracted from the odd branch, then U (update) is added to the even branch.]
64Multi-level wavelet decomposition
- We can produce a multi-level decomposition by cascading lifting stages
[Diagram: lift, lift, lift, each stage consuming the previous approximation.]
65General lifting stage of inverse wavelet synthesis
[Diagram: U (undo update), then P (undo predict), then Merge.]
66Multi-level inverse wavelet synthesis
- We can produce a multi-level inverse wavelet synthesis by cascading lifting stages
[Diagram: lift, ..., lift, lift.]
67Advantages of the lifting implementation
- Inverse transform
- The inverse transform is trivial: just run the code backwards
- No need for Fourier techniques
- Generality
- The design of the transform is performed without reference to particular forms for the predict and update operators
- Can even include non-linearities (for integer wavelets)
68Example 2: the linear spline wavelet
- A more sophisticated wavelet uses slightly more complex P and U operators
- Uses linear prediction to determine odd samples from even samples
69The linear spline wavelet
Linear prediction at odd samples
Detail signal (prediction error at odd samples)
Original signal
70The linear spline wavelet
- The prediction for the odd samples is based on the two even samples on either side
71The linear spline wavelet
- The U step uses the current and previous detail signal samples
72The linear spline wavelet
- Preserves signal average and first-order moment
(signal position)
73The linear spline wavelet
- Can still implement in place
[Diagram: P step: subtract half of each neighbouring even sample from the odd sample (weights -1/2); U step: add a quarter of each neighbouring detail sample to the even sample (weights 1/4).]
74Summary of linear spline wavelet decomposition
Computing the inverse is trivial
The even and odd samples are then merged as before
75Wavelet decomposition applied to a 2D image
76Wavelet decomposition applied to a 2D image
77Why is wavelet-based compression effective?
- Allows for intra-scale prediction (like many other compression methods); equivalently, the wavelet transform is a decorrelating transform, just like the DCT as used by JPEG
- Allows for inter-scale (coarse-to-fine scale) prediction
78Why is wavelet-based compression effective?
1 level Haar
Original
1 level linear spline
2 level Haar
79Why is wavelet-based compression effective?
- Wavelet coefficient histogram
80Why is wavelet-based compression effective?
81Why is wavelet-based compression effective?
- Wavelet coefficient dependencies
82Why is wavelet-based compression effective?
- Let's define sets S (small) and L (large) of wavelet coefficients
- The following two probabilities describe interscale dependencies
83Why is wavelet-based compression effective?
- Without interscale dependencies
84Why is wavelet-based compression effective?
- Measured dependencies from Lena
0.886 0.529 0.781 0.219
85Why is wavelet-based compression effective?
[Diagram: coefficient X and its spatial neighbours X1 ... X8.]
86Why is wavelet-based compression effective?
- Measured dependencies from Lena
0.912 0.623 0.781 0.219
87Why is wavelet-based compression effective?
- Have to use a causal neighbourhood for spatial
prediction
88Example image compression algorithms
- We will look at 3 state-of-the-art algorithms
- Set partitioning in hierarchical trees (SPIHT)
- Significance-linked connected components analysis (SLCCA)
- Embedded block coding with optimal truncation (EBCOT), which is the basis of JPEG2000
89The SPIHT algorithm
- Coefficients transmitted in partial order
[Diagram: coefficients numbered 1, 2, 3, ... 14 in decreasing magnitude order; bit-planes from the msb (5) down to the lsb (0).]
90The SPIHT algorithm
- 2 components to the algorithm:
- Sorting pass
- Sorting information is transmitted on the basis of the most significant bit-plane
- Refinement pass
- Bits in bit-planes lower than the most significant bit-plane are transmitted
91The SPIHT algorithm
N = msb of max(abs(wavelet coefficients))
for bit-plane-counter = N downto 1:
    transmit significance/insignificance w.r.t. the bit-plane counter
    transmit refinement bits of all coefficients that are already significant
92The SPIHT algorithm
- Insignificant coefficients (with respect to
current bitplane counter) organised into
zerotrees
93The SPIHT algorithm
- Groups of coefficients made into zerotrees by set partitioning
94The SPIHT algorithm
- SPIHT produces an embedded bitstream
bitstream
.110010101110010110001101011100010111011011101101.
95The SLCCA algorithm
[Flow: Wavelet transform, then quantise coefficients, then cluster and transmit the significance map, then bit-plane encode the significant coefficients.]
96The SLCCA algorithm
- The significance map is grouped into clusters
97The SLCCA algorithm
- Clusters grown out from a seed
Seed
Significant coeff
Insignificant coeff
98The SLCCA algorithm
Significance link
99Image compression results
- Evaluation
- Mean squared error
- Human visual-based metrics
- Subjective evaluation
100Image compression results
Usually expressed as peak signal-to-noise ratio, PSNR (in dB)
101Image compression results
102Image compression results
103Image compression results
SPIHT 0.2 bits/pixel
JPEG 0.2 bits/pixel
104Image compression results
SPIHT
JPEG
105EBCOT, JPEG2000
- JPEG2000, based on embedded block coding with optimal truncation (EBCOT), is the state-of-the-art compression standard
- Wavelet-based
- It addresses the key issue of scalability
- SPIHT is distortion scalable, as we have already seen
- JPEG2000 also introduces resolution and spatial scalability
- An excellent reference to JPEG2000 and compression in general is JPEG2000 by D. Taubman and M. Marcellin
106EBCOT, JPEG2000
- Resolution scalability is the ability to extract
from the bitstream the sub-bands representing any
resolution level
.110010101110010110001101011100010111011011101101.
bitstream
107EBCOT, JPEG2000
- Spatial scalability is the ability to extract
from the bitstream the sub-bands representing
specific regions in the image - Very useful if we want to selectively decompress
certain regions of massive images
.110010101110010110001101011100010111011011101101.
bitstream
108Introduction to EBCOT
- JPEG2000 is able to implement this general scalability by implementing the EBCOT paradigm
- In EBCOT, the unit of compression is the codeblock, which is a partition of a wavelet sub-band
- Typically, following the wavelet transform, each sub-band is partitioned into small blocks (typically 32x32)
109Introduction to EBCOT
- Codeblocks: partitions of wavelet sub-bands
110Introduction to EBCOT
- A simple bit stream organisation could comprise
concatenated code block bit streams
Length of next code-block stream
111Introduction to EBCOT
- This simple bit stream structure is resolution and spatially scalable, but not distortion scalable
- Complete scalability is obtained by introducing quality layers
- Each code block bitstream is individually (optimally) truncated in each quality layer
- The loss of parent-child redundancy is more than compensated for by the ability to individually optimise separate code block bitstreams
112Introduction to EBCOT
- Each code block bit stream partitioned into a set
of quality layers
113EBCOT advantages
- Multiple scalability
- Distortion, spatial and resolution scalability
- Efficient compression
- This results from independent optimal truncation
of each code block bit stream - Local processing
- Independent processing of each code block allows
for efficient parallel implementations as well as
hardware implementations
114EBCOT advantages
- Error resilience
- Again this results from independent code block
processing which limits the influence of errors
115Performance comparison
- A performance comparison with other wavelet-based coders is not straightforward, as it would depend on the target bit rates for which the bit streams were truncated
- With SPIHT, we simply truncate the bit stream when the target bit rate has been reached
- However, we only have distortion scalability with SPIHT
- Even so, we still get favourable PSNR (dB) results when comparing EBCOT (JPEG2000) with SPIHT
116Performance comparison
- We can understand this more fully by looking at graphs of distortion (D) against rate (R) (bitstream length)
[Figure: R-D curve for a continuously modulated quantisation step size, with truncation points marked.]
117Performance comparison
- Truncating the bit stream to some arbitrary rate will yield sub-optimal performance
[Figure: distortion (D) against rate (R) for an arbitrary truncation point.]
118Performance comparison
119Performance comparison
- Comparable PSNR (dB) results between EBCOT and SPIHT, even though:
- Results for EBCOT are for 5 quality layers (5 optimal bit rates)
- Intermediate bit rates are sub-optimal
- We have resolution, spatial and distortion scalability in EBCOT, but only distortion scalability in SPIHT