Title: Introduction to Wavelet Transform
1Introduction to Wavelet Transform
2Time Series are Ubiquitous!
A random sample of 4,000 graphics from 15 of the
world's newspapers published from 1974 to 1989
found that more than 75% of all graphics were
time series (Tufte, 1983).
3Why is Working With Time Series so Difficult?
Answer: We are dealing with subjective notions of
similarity.
The definition of similarity depends on the
user, the domain and the task at hand. We need to
be able to handle this subjectivity.
4Wavelet Transform - Overview
History
- Fourier (1807)
- Haar (1910)
- Math World
5Wavelet Transform - Overview
- What kind of basis function could be useful?
- Impulse Function (Haar) Best time resolution
- Sinusoids (Fourier) Best frequency resolution
- We want both of the best resolutions
- Heisenberg (1930)
- Uncertainty Principle
- There is a lower bound on the joint
time-frequency resolution (an intuitive proof in Mac91)
6Wavelet Transform - Overview
- Gabor (1945)
- Short Time Fourier Transform (STFT)
- Disadvantage: fixed window size
7Wavelet Transform - Overview
- Constructing Wavelets
- Daubechies (1988)
- Compactly Supported Wavelets
- Computation of WT Coefficients
- Mallat (1989)
- A fast algorithm using filter banks
8Discrete Fourier Transform I
Basic Idea: Represent the time series as a linear
combination of sines and cosines, but keep only
the first n/2 coefficients. Why n/2
coefficients? Because each sine wave requires 2
numbers, for the phase (w) and amplitude (A, B).
[Figure: a raw time series X (samples 0 to 140) and its DFT approximation X', built from the first Fourier coefficients (0, 1, 2, ...). Portrait: Jean Fourier, 1768-1830.]
Excellent free Fourier Primer: Hagit Shatkay, "The
Fourier Transform - a Primer", Technical Report
CS-95-37, Department of Computer Science, Brown
University, 1995. http://www.ncbi.nlm.nih.gov/CBBresearch/Postdocs/Shatkay/
9Discrete Fourier Transform II
- Pros and Cons of DFT as a time series representation.
- Good ability to compress most natural signals.
- Fast, off-the-shelf DFT algorithms exist: O(n log(n)).
- (Weakly) able to support time-warped queries.
- Difficult to deal with sequences of different lengths.
- Cannot support weighted distance measures.
Note: The related transform, the DCT, uses only cosine
basis functions. It does not seem to offer any
particular advantages over the DFT.
10History
11Discrete Wavelet Transform I
Basic Idea: Represent the time series as a linear
combination of wavelet basis functions, but keep
only the first N coefficients. Although there
are many different types of wavelets, researchers
in time series mining/indexing generally use Haar
wavelets. Haar wavelets seem to be as powerful
as the other wavelets for most problems and are
very easy to code.
Alfred Haar 1885-1933
Excellent free Wavelets Primer: Stollnitz, E.,
DeRose, T., Salesin, D. (1995). Wavelets for
computer graphics: A primer. IEEE Computer
Graphics and Applications.
12Wavelet Series
13Discrete Wavelet Transform III
- Pros and Cons of Wavelets as a time series representation.
- Good ability to compress stationary signals.
- Fast, linear-time algorithms for the DWT exist.
- Able to support some interesting non-Euclidean similarity measures.
- Signals must have a length n = 2^some_integer.
- Works best if n is 2^some_integer; otherwise wavelets approximate the left side of the signal at the expense of the right side.
- Cannot support weighted distance measures.
14Singular Value Decomposition I
Basic Idea: Represent the time series as a linear
combination of eigenwaves, but keep only the first
N coefficients. SVD is similar to the Fourier and
wavelet approaches in that we represent the data in
terms of a linear combination of shapes (in this
case eigenwaves). SVD differs in that the
eigenwaves are data dependent. SVD has been
successfully used in the text processing
community (where it is known as Latent Semantic
Indexing) for many years. Good free SVD Primer:
Singular Value Decomposition - A Primer. Sonia
Leach.
[Figure: a time series X (samples 0 to 140) and its SVD approximation X'. Portraits: James Joseph Sylvester 1814-1897, Camille Jordan 1838-1921, Eugenio Beltrami 1835-1899.]
15Singular Value Decomposition II
How do we create the eigenwaves?
We have previously seen that we can regard time
series as points in high-dimensional space. We
can rotate the axes such that axis 1 is aligned
with the direction of maximum variance, axis 2 is
aligned with the direction of maximum variance
orthogonal to axis 1, and so on. Since the first few
eigenwaves contain most of the variance of the
signal, the rest can be truncated with little
loss.
This process can be achieved by factoring an M-by-n
matrix of time series into 3 other matrices,
and truncating the new matrices at size N.
16Singular Value Decomposition III
- Pros and Cons of SVD as a time series representation.
- Optimal linear dimensionality reduction technique.
- The eigenvalues tell us something about the underlying structure of the data.
- Computationally very expensive: time O(Mn^2), space O(Mn).
- An insertion into the database requires recomputing the SVD.
- Cannot support weighted distance measures or non-Euclidean measures.
Note: There has been some promising research into
mitigating the SVD's time and space complexity.
17Piecewise Linear Approximation I
Basic Idea: Represent the time series as a
sequence of straight lines. Lines could be
connected, in which case we are allowed N/2
lines; if lines are disconnected, we are
allowed only N/3 lines. Personal experience on
dozens of datasets suggests disconnected is
better. Also, only disconnected allows a
lower-bounding Euclidean approximation.
[Figure: a time series X (samples 0 to 140) and its piecewise linear approximation X'. Portrait: Karl Friedrich Gauss, 1777-1855.]
- With connected lines, each line segment has:
- length
- left_height
- (right_height can be inferred by looking at the next segment)
- With disconnected lines, each line segment has:
- length
- left_height
- right_height
18Problem with Fourier
Fourier analysis breaks down a signal into
constituent sinusoids of different frequencies.
A serious drawback: in transforming to the
frequency domain, time information is lost. When
looking at a Fourier transform of a signal, it is
impossible to tell when a particular event took
place.
19Function Representations
- sequence of samples (time domain)
- finite difference method
- pyramid (hierarchical)
- polynomial
- sinusoids of various frequency (frequency domain)
- Fourier series
- piecewise polynomials (finite support)
- finite element method, splines
- wavelet (hierarchical, finite support)
- (time/frequency domain)
20What Are Wavelets?
- In general, a family of representations using
- hierarchical (nested) basis functions
- finite (compact) support
- basis functions often orthogonal
- fast transforms, often linear-time
21Function Representations: Desirable Properties
- generality: approximate anything well
- discontinuities, nonperiodicity, ...
- adaptable to application
- audio, pictures, flow fields, terrain data, ...
- compact: approximate a function with few coefficients
- facilitates compression, storage, transmission
- fast to compute with
- differential/integral operators are sparse in this basis
- convert an n-sample function to its representation in O(n log n) or O(n) time
22Wavelet History, Part 1
- 1805: Fourier analysis developed
- 1965: Fast Fourier Transform (FFT) algorithm
- 1980s: beginnings of wavelets in physics, vision, speech processing (ad hoc)
- little theory: why/when do wavelets work?
- 1985: Morlet and Grossman: continuous wavelet transform, asking how can you get perfect reconstruction without redundancy?
- 1986: Mallat unified the above work
23Wavelet History, Part 2
- 1985: Meyer tried to prove that no orthogonal wavelet other than Haar exists, and found one by trial and error!
- 1987: Mallat developed multiresolution theory, the DWT, and wavelet construction techniques (but still noncompact)
- 1988: Daubechies added theory: found compact, orthogonal wavelets with an arbitrary number of vanishing moments!
- 1990s: wavelets took off, attracting both theoreticians and engineers
24Time-Frequency Analysis
- For many applications, you want to analyze a function in both time and frequency
- Analogous to a musical score
- Fourier transforms give you frequency information, smearing time.
- Samples of a function give you temporal information, smearing frequency.
- Note: substitute space for time for pictures.
25Comparison to Fourier Analysis
- Fourier analysis
- Basis is global
- Sinusoids with frequencies in arithmetic
progression
- Short-time Fourier Transform (Gabor filters)
- Basis is local
- Sinusoid times Gaussian
- Fixed-width Gaussian window
- Wavelet
- Basis is local
- Frequencies in geometric progression
- Basis has constant shape independent of scale
26Wavelets are faster than FFTs!
27 The results of the CWT are many wavelet
coefficients, which are a function of scale and
position
28Gabor's Proposal: Short Time Fourier Transform
Requirements:
A signal in the time domain requires a short time
window to depict the features of the signal.
A signal in the frequency domain requires a short
frequency window (a long time window) to depict
the features of the signal.
29What are wavelets?
Wavelets are functions defined over a finite
interval and having an average value of zero.
30What is wavelet transform?
The wavelet transform is a tool for carving up
functions, operators, or data into components of
different frequency, allowing one to study each
component separately.
The basic idea of the wavelet transform is to
represent any arbitrary function f(t) as a
superposition of a set of such wavelets or basis
functions.
These basis functions or baby wavelets are
obtained from a single prototype wavelet called
the mother wavelet, by dilations or contractions
(scaling) and translations (shifts).
31The continuous wavelet transform (CWT)
Fourier Transform
The FT is the sum over all time of the signal f(t)
multiplied by a complex exponential:
F(w) = integral of f(t) e^(-jwt) dt
32The variables s and t are the new dimensions,
scale and translation (position), after the
wavelet transform:
W(s, t) = s^(-1/2) integral of f(u) psi((u - t)/s) du
33s is the scale factor, t is the translation
factor, and the factor s^(-1/2) is for energy
normalization across the different scales.
It is important to note that in the above
transforms the wavelet basis functions are not
specified.
This is a difference between the wavelet
transform and the Fourier transform, or other
transforms.
34Scale
Scaling a wavelet simply means stretching (or
compressing) it.
35Scale and Frequency
- Low scale a: compressed wavelet, rapidly changing details
- High scale a: stretched wavelet, slowly changing details
Translation (shift)
Translating a wavelet simply means delaying (or
hastening) its onset.
36(No Transcript)
37Discrete Wavelets
A discrete wavelet is written as
psi_{j,k}(t) = s0^(-j/2) psi(s0^(-j) t - k t0)
where j and k are integers and s0 > 1 is a fixed
dilation step. The translation factor t0 depends
on the dilation step. The effect of discretizing
the wavelet is that the time-scale space is now
sampled at discrete intervals. We usually choose
s0 = 2.
38A band-pass filter
The wavelet has a band-pass-like spectrum.
From Fourier theory we know that compression in
time is equivalent to stretching the spectrum and
shifting it upwards.
Suppose a = 2: a time compression of the wavelet
by a factor of 2 will stretch the frequency
spectrum of the wavelet by a factor of 2 and also
shift all frequency components up by a factor of
2.
39Subband coding
If we regard the wavelet transform as a filter
bank, then we can consider wavelet transforming a
signal as passing the signal through this filter
bank.
The outputs of the different filter stages are
the wavelet and scaling function transform
coefficients.
In general we will refer to this kind of
analysis as a multiresolution analysis.
That is called subband coding.
40 Splitting the signal spectrum with an iterated
filter bank.
Summarizing, if we implement the wavelet
transform as an iterated filter bank, we do not
have to specify the wavelets explicitly! This is
a remarkable result.
41The Discrete Wavelet Transform
Calculating wavelet coefficients at every
possible scale is a fair amount of work, and it
generates an awful lot of data. What if we choose
only a subset of scales and positions at which to
make our calculations?
It turns out, rather remarkably, that if we
choose scales and positions based on powers of
two -- so-called dyadic scales and positions --
then our analysis will be much more efficient and
just as accurate. We obtain just such an analysis
from the discrete wavelet transform (DWT).
42Approximations and Details
The approximations are the high-scale,
low-frequency components of the signal. The
details are the low-scale, high-frequency
components. The filtering process, at its most
basic level, looks like this: the original signal,
S, passes through two complementary filters and
emerges as two signals.
43Downsampling
Unfortunately, if we actually perform this
operation on a real digital signal, we wind up
with twice as much data as we started with.
Suppose, for instance, that the original signal S
consists of 1000 samples of data. Then
the approximation and the detail will each have
1000 samples, for a total of 2000.
To correct this problem, we introduce the
notion of downsampling. This simply means
throwing away every second data point.
44An example
45Reconstructing Approximation and Details
Upsampling
46Wavelet Decomposition
Multiple-Level Decomposition
The decomposition process can be iterated, with
successive approximations being decomposed in
turn, so that one signal is broken down into many
lower-resolution components. This is called the
wavelet decomposition tree.
47 The signal f(t) can be expressed as a sum of the wavelet basis functions, i.e. its DWT:
f(t) = sum over j and k of gamma(j, k) psi_{j,k}(t)
48(No Transcript)
49(No Transcript)
50Wavelet Reconstruction (Synthesis)
Perfect reconstruction
51[Figure: example vectors (4,0) and (1,0) on axes x1 and y1.]
52 2-D Discrete Wavelet Transform
A 2-D DWT can be done as follows:
Step 1: Replace each row with its 1-D DWT
Step 2: Replace each column with its 1-D DWT
Step 3: Repeat steps (1) and (2) on the lowest
subband for the next scale
Step 4: Repeat step (3) until as many scales as
desired have been completed
53Image at different scales
54Correlation between features at different scales
55Wavelet construction: a simplified approach
- Traditional approaches to wavelets have used a filterbank interpretation
- Fourier techniques are required to get synthesis (reconstruction) filters from analysis filters
- Not easy to generalize
56Wavelet construction: lifting
- 3 steps
- Split
- Predict (P step)
- Update (U step)
57Example the Haar wavelet
- S step
- Splits the signal into even and odd samples
58Example the Haar wavelet
- P step
- Predict the odd samples from the even samples
For the Haar wavelet, the prediction for an odd
sample is the previous even sample.
59Example the Haar wavelet
Detail signal
60Example the Haar wavelet
- U step
- Update the even samples to produce the next
coarser scale approximation
The signal average is maintained
61Summary of the Haar wavelet decomposition
Can be computed in place.
[Diagram: P step: subtract each even sample from the following odd sample (weight -1); U step: add half of each detail sample to the even sample (weight 1/2).]
62Inverse Haar wavelet transform
- Simply run the forward Haar wavelet transform backwards!
- Then merge the even and odd samples
63General lifting stage of wavelet decomposition
[Diagram: Split, then P (predict) is subtracted from the odd branch, then U (update) is added to the even branch.]
64Multi-level wavelet decomposition
- We can produce a multi-level decomposition by cascading lifting stages
[Diagram: lift, lift, lift, each stage consuming the previous approximation.]
65General lifting stage of inverse wavelet synthesis
[Diagram: U (undo update), then P (undo predict), then Merge.]
66Multi-level inverse wavelet synthesis
- We can produce a multi-level inverse wavelet synthesis by cascading lifting stages
[Diagram: lift, ..., lift, lift.]
67Advantages of the lifting implementation
- Inverse transform
- The inverse transform is trivial: just run the code backwards
- No need for Fourier techniques
- Generality
- The design of the transform is performed without reference to particular forms for the predict and update operators
- Can even include non-linearities (for integer wavelets)
68Example 2: the linear spline wavelet
- A more sophisticated wavelet uses slightly more complex P and U operators
- Uses linear prediction to determine odd samples from even samples
69The linear spline wavelet
Linear prediction at odd samples
Detail signal (prediction error at odd samples)
Original signal
70The linear spline wavelet
- The prediction for the odd samples is based on the two even samples on either side
71The linear spline wavelet
- The U step uses the current and previous detail signal samples
72The linear spline wavelet
- Preserves signal average and first-order moment
(signal position)
73The linear spline wavelet
- Can still implement in place
[Diagram: P step: subtract half of each neighbouring even sample from the odd sample (weights -1/2); U step: add a quarter of each neighbouring detail sample to the even sample (weights 1/4).]
74Summary of linear spline wavelet decomposition
Computing the inverse is trivial
The even and odd samples are then merged as before
75Wavelet decomposition applied to a 2D image
76Wavelet decomposition applied to a 2D image
77Why is wavelet-based compression effective?
- Allows for intra-scale prediction (like many other compression methods); equivalently, the wavelet transform is a decorrelating transform, just like the DCT as used by JPEG
- Allows for inter-scale (coarse-to-fine scale) prediction
78Why is wavelet-based compression effective?
1 level Haar
Original
1 level linear spline
2 level Haar
79Why is wavelet-based compression effective?
- Wavelet coefficient histogram
80Why is wavelet-based compression effective?
81Why is wavelet-based compression effective?
- Wavelet coefficient dependencies
82Why is wavelet-based compression effective?
- Let's define sets S (small) and L (large) of wavelet coefficients
- The following two probabilities describe interscale dependencies
83Why is wavelet-based compression effective?
- Without interscale dependencies
84Why is wavelet-based compression effective?
- Measured dependencies from Lena
0.886 0.529 0.781 0.219
85Why is wavelet-based compression effective?
[Diagram: coefficient X and its spatial neighbours X1 ... X8.]
86Why is wavelet-based compression effective?
- Measured dependencies from Lena
0.912 0.623 0.781 0.219
87Why is wavelet-based compression effective?
- Have to use a causal neighbourhood for spatial
prediction
88Example image compression algorithms
- We will look at 3 state-of-the-art algorithms
- Set partitioning in hierarchical trees (SPIHT)
- Significance-linked connected components analysis (SLCCA)
- Embedded block coding with optimal truncation (EBCOT), which is the basis of JPEG2000
89The SPIHT algorithm
- Coefficients transmitted in partial order
[Diagram: coefficients numbered 1, 2, 3, ... 14 in decreasing magnitude order; bit-planes from the msb (5) down to the lsb (0).]
90The SPIHT algorithm
- 2 components to the algorithm:
- Sorting pass
- Sorting information is transmitted on the basis of the most significant bit-plane
- Refinement pass
- Bits in bit-planes lower than the most significant bit-plane are transmitted
91The SPIHT algorithm
N = msb of max(abs(wavelet coefficients))
for bit-plane-counter = N downto 1:
    transmit significance/insignificance w.r.t. the bit-plane counter
    transmit refinement bits of all coefficients that are already significant
92The SPIHT algorithm
- Insignificant coefficients (with respect to
current bitplane counter) organised into
zerotrees
93The SPIHT algorithm
- Groups of coefficients made into zerotrees by set partitioning
94The SPIHT algorithm
- SPIHT produces an embedded bitstream
bitstream
.110010101110010110001101011100010111011011101101.
95The SLCCA algorithm
[Flow: Wavelet transform, then quantise coefficients, then cluster and transmit the significance map, then bit-plane encode the significant coefficients.]
96The SLCCA algorithm
- The significance map is grouped into clusters
97The SLCCA algorithm
- Clusters grown out from a seed
Seed
Significant coeff
Insignificant coeff
98The SLCCA algorithm
Significance link
99Image compression results
- Evaluation
- Mean squared error
- Human visual-based metrics
- Subjective evaluation
100Image compression results
Usually expressed as peak signal-to-noise ratio, PSNR (in dB)
101Image compression results
102Image compression results
103Image compression results
SPIHT 0.2 bits/pixel
JPEG 0.2 bits/pixel
104Image compression results
SPIHT
JPEG
105EBCOT, JPEG2000
- JPEG2000, based on embedded block coding with optimal truncation (EBCOT), is the state-of-the-art compression standard
- Wavelet-based
- It addresses the key issue of scalability
- SPIHT is distortion scalable, as we have already seen
- JPEG2000 also introduces resolution and spatial scalability
- An excellent reference to JPEG2000 and compression in general is JPEG2000 by D. Taubman and M. Marcellin
106EBCOT, JPEG2000
- Resolution scalability is the ability to extract
from the bitstream the sub-bands representing any
resolution level
.110010101110010110001101011100010111011011101101.
bitstream
107EBCOT, JPEG2000
- Spatial scalability is the ability to extract
from the bitstream the sub-bands representing
specific regions in the image - Very useful if we want to selectively decompress
certain regions of massive images
.110010101110010110001101011100010111011011101101.
bitstream
108Introduction to EBCOT
- JPEG2000 is able to implement this general scalability by implementing the EBCOT paradigm
- In EBCOT, the unit of compression is the codeblock, which is a partition of a wavelet sub-band
- Typically, following the wavelet transform, each sub-band is partitioned into small blocks (typically 32x32)
109Introduction to EBCOT
- Codeblocks: partitions of wavelet sub-bands
110Introduction to EBCOT
- A simple bit stream organisation could comprise
concatenated code block bit streams
Length of next code-block stream
111Introduction to EBCOT
- This simple bit stream structure is resolution and spatially scalable, but not distortion scalable
- Complete scalability is obtained by introducing quality layers
- Each code block bitstream is individually (optimally) truncated in each quality layer
- The loss of parent-child redundancy is more than compensated for by the ability to individually optimise separate code block bitstreams
112Introduction to EBCOT
- Each code block bit stream partitioned into a set
of quality layers
113EBCOT advantages
- Multiple scalability
- Distortion, spatial and resolution scalability
- Efficient compression
- This results from independent optimal truncation
of each code block bit stream - Local processing
- Independent processing of each code block allows
for efficient parallel implementations as well as
hardware implementations
114EBCOT advantages
- Error resilience
- Again this results from independent code block
processing which limits the influence of errors
115Performance comparison
- A performance comparison with other wavelet-based coders is not straightforward, as it would depend on the target bit rates for which the bit streams were truncated
- With SPIHT, we simply truncate the bit stream when the target bit rate has been reached
- However, we only have distortion scalability with SPIHT
- Even so, we still get favourable PSNR (dB) results when comparing EBCOT (JPEG2000) with SPIHT
116Performance comparison
- We can understand this more fully by looking at graphs of distortion (D) against rate (R) (bitstream length)
[Figure: R-D curve for a continuously modulated quantisation step size, with truncation points marked.]
117Performance comparison
- Truncating the bit stream to some arbitrary rate will yield sub-optimal performance
[Figure: distortion (D) against rate (R) for an arbitrary truncation point.]
118Performance comparison
119Performance comparison
- Comparable PSNR (dB) results between EBCOT and SPIHT, even though:
- Results for EBCOT are for 5 quality layers (5 optimal bit rates)
- Intermediate bit rates are sub-optimal
- We have resolution, spatial and distortion scalability in EBCOT, but only distortion scalability in SPIHT