Title: Scientific Data Compression Through Wavelet Transformation
1 Scientific Data Compression Through Wavelet Transformation
2 Problem Statement
- Scientific data is hard to compress with traditional lossless compressors such as gzip or bzip2, because the repeating patterns they rely on do not exist
- If the data can be transformed into a form that retains the information but can be thresholded, it can be compressed
- Thresholding removes excessively small values and replaces them with zeros (much easier to compress than a 64-bit double)
- Wavelet transforms are well suited for this purpose and have been used in image compression (JPEG 2000)
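As a rough illustration of the thresholding step, a minimal sketch in C (not the project's actual code; coeffs, n, and threshold are assumed names):

    #include <math.h>
    #include <stddef.h>

    /* Replace every coefficient whose magnitude is below the threshold with zero.
     * Long runs of zeros compress far better than arbitrary 64-bit doubles.
     * Returns how many coefficients were zeroed. */
    size_t threshold_coeffs(double *coeffs, size_t n, double threshold)
    {
        size_t zeroed = 0;
        for (size_t i = 0; i < n; i++) {
            if (fabs(coeffs[i]) < threshold) {
                coeffs[i] = 0.0;
                zeroed++;
            }
        }
        return zeroed;
    }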
3 Why Wavelets?
- Wavelet transforms encode more information than other techniques such as the Fourier transform: both time and frequency information is preserved
- In practical terms, the transformation is applied at many scales and positions within the signal
- This results in vectors that encode approximation and detail information
- By separating the signal this way, it is easier to threshold and remove information, so the data can be compressed
4 Wavelets in Compression
- The JPEG 2000 standard gave up the discrete cosine transform in favor of a wavelet transform
- The FBI uses wavelets to compress fingerprint scans by a factor of 15 to 20
5 Choosing the Right Wavelet: The Transform
- Continuous wavelet transform: the sum over the time of the signal convolved with scaled and shifted versions of the wavelet
- Unfortunately, it is slow and generates far too much data. It is also hard to implement
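For reference, the continuous wavelet transform of a signal f(t) at scale a and position b can be written as

    C(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt

and it has to be evaluated over a continuum of scales and shifts, which is where the cost and redundancy come from.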
(Figure from mathworks.com)
6 Choosing the Right Wavelet: The Transform
- The discrete transform: if the scales and positions are chosen as powers of two, the transform is much more efficient and just as accurate
- The signal is then sent through only two subband coders (which extract the approximation and the detail data from the signal)
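Concretely, the dyadic choice restricts the scales and positions to powers of two,

    a = 2^{j}, \qquad b = k \, 2^{j}, \qquad \psi_{j,k}(t) = 2^{-j/2} \, \psi\!\left(2^{-j} t - k\right),

so each level reduces to one low-pass and one high-pass filtering pass followed by downsampling by two.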
(Figure from mathworks.com: the signal is decomposed by low-pass and high-pass filters to obtain the approximation and detail information)
7 Choosing the Right Wavelet: The Decomposition
- The signal can be recursively decomposed to obtain finer detail and a more general approximation. This is called multi-level decomposition
- A signal can be decomposed as many times as it can be divided in half
- Thus, we are left with only one approximation signal at the end of the process
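Since each level halves the approximation, a signal of length N admits at most

    L_{\max} = \lfloor \log_{2} N \rfloor

levels, leaving one approximation vector plus one detail vector per level.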
(Figure from mathworks.com)
8 Choosing the Right Wavelet: The Wavelet
- The low- and high-pass filters (subband coders) are, in reality, the wavelet that is used. A wide variety of wavelets have been created over time
- The low pass is called the scaling function
- The high pass is the wavelet function
- Different wavelets give better results depending on the type of data
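As a side note, for orthogonal wavelet families (the biorthogonal case used later relaxes this), the high-pass filter is the quadrature mirror of the low-pass filter, up to a sign convention:

    g[n] = (-1)^{n} \, h[L - 1 - n],

where h is the scaling (low-pass) filter of length L, which is why the two subband coders really are two views of the same wavelet.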
(Figure from mathworks.com: the scaling function (approximation / low pass) and the wavelet function (detail / high pass))
9 Choosing the Right Wavelet: The Wavelet
- The wavelets that turned out to give the best results were the biorthogonal wavelets
- These were introduced by Cohen, Daubechies, and Feauveau; they exploit the fact that symmetry and exact reconstruction are incompatible (except for the Haar wavelet) if the same filter is used for decomposition and reconstruction
- Instead, a decomposition wavelet and a slightly different reconstruction wavelet are used
(Figures from mathworks.com: the filter coefficients used for convolution, and the actual wavelet and scaling functions)
10 Testing Methodology
- In order to find the best combination of wavelet, decomposition level, and threshold, an exhaustive search was done with Matlab
- A 1000x1000 grid of vorticity data from the Navier-Stokes simulator was first compressed with gzip. This was the baseline file to compare against
- Then each available wavelet in Matlab was tested with 1-, 3-, and 5-level decomposition, in combination with thresholding that removed values smaller than 1x10^-4 to 1x10^-7
- The resulting data was saved, compressed with gzip, and compared against the baseline
- Then the data was reconstructed and the maximum and average errors were measured
11 Testing Methodology: Sample Results
For the application, I chose three of the configurations, representing high compression / high error, medium compression / medium error, and low compression / low error.
12 Matlab Functions
- Four Matlab functions were written for compression and decompression
- wavecompress (1-D) and wavecompress2 (2-D)
- wavedecompress (1-D) and wavedecompress2 (2-D)
wavecompress2 - Lossy compression for 2-D data

    savings = WAVECOMPRESS2(x, mode, outputfile) compresses the 2-D data in x
    using the specified mode and saves it to outputfile. The return value is
    the compression ratio achieved.

    Valid values for mode:
      1 - high compression, high error     (bior3.1 filter, 1x10^-4 threshold)
      2 - medium compression, medium error (bior3.1 filter, 1x10^-5 threshold)
      3 - low compression, low error       (bior5.5 filter, 1x10^-7 threshold)

    To decompress the data, see wavedecompress2.
13 Some Pictures
14 C Implementation
- With the easy work out of the way, the next phase of the project was a C implementation. There were a few reasons for reinventing the wheel:
- I wanted to fully understand the process
- I could try my hand at some parallel processing
- I could have a native 3-D transformation
- And Matlab makes my computer very slow
15 Demo
- ./wavecomp -c 1 -d 2 vorticity000.dat
- ./wavedec vorticity000.dat.wcm
16 Algorithm
- The basic algorithm for 1-D multi-level decomposition (a sketch in C follows this list):
- Convolve the input with the low-pass filter to get the approximation vector
- Convolve the input with the high-pass filter to get the detail vector
- Set the input to the approximation and repeat until the desired decomposition level is reached
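A minimal sketch of that loop in C (illustrative only, not the project's actual code; dwt_step is a helper defined here, and the output sizes are simplified, ignoring the growth caused by the filter length at the edges):

    #include <stdlib.h>
    #include <string.h>

    /* Convolve with one filter and downsample by two (zero-padding before the signal). */
    static void dwt_step(const double *in, size_t n,
                         const double *filt, size_t flen, double *out)
    {
        for (size_t k = 0; k < n; k += 2) {
            double sum = 0.0;
            for (size_t j = 0; j < flen && j <= k; j++)
                sum += filt[j] * in[k - j];
            out[k / 2] = sum;
        }
    }

    /* Multi-level 1-D decomposition: the approximation is re-decomposed at each level. */
    static void dwt_multilevel(double *signal, size_t n, int levels,
                               const double *lo, const double *hi, size_t flen,
                               double **details /* one detail vector per level */)
    {
        for (int lvl = 0; lvl < levels && n >= 2; lvl++) {
            size_t half = n / 2;
            double *approx = malloc(half * sizeof *approx);
            details[lvl]   = malloc(half * sizeof *details[lvl]);

            dwt_step(signal, n, lo, flen, approx);        /* low-pass  -> approximation */
            dwt_step(signal, n, hi, flen, details[lvl]);  /* high-pass -> detail        */

            memcpy(signal, approx, half * sizeof *approx); /* approx becomes next input */
            free(approx);
            n = half;
        }
    }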
17 Convolution
- The convolution step is tricky, though, because the filters use data from before and after a specific point, which makes the edges hard to handle
- For signals that aren't sized appropriately, the data must be extended. The most common approaches are periodic extension, symmetric extension, and zero-padding
- The convolution algorithm (with downsampling by two and zero-padding at the edge):

    for (int k = 0; k < signal_size; k += 2) {
        double sum = 0.0;
        for (int j = 0; j < filter_size; j++) {
            if (k - j >= 0)                  /* zero-pad before the start of the signal */
                sum += filter[j] * input[k - j];
        }
        output[k / 2] = sum;                 /* half-length, downsampled output */
    }
18 Implementation
- The convolution caused the most problems, as many available libraries didn't seem to do it correctly, or assumed the data was periodic or symmetric
- I finally appropriated some code from the PyWavelets project that handled zero-padding extension, determining the appropriate output sizes, and performing the correct convolution along the edges
19 2-D Transformation
- The 2-D transformation proved more challenging in terms of how to store the data and how to decompose it (a sketch follows this list)
- Convolve each row with the low-pass filter to get the row approximation, then downsample
- Convolve each row with the high-pass filter to get the row detail, then downsample
- Convolve each column of the low-pass result with the low-pass filter
- Convolve each column of the low-pass result with the high-pass filter
- Convolve each column of the high-pass result with the low-pass filter
- Convolve each column of the high-pass result with the high-pass filter
- Downsample each result
- Store the three detail subbands and set the input to the low-pass/low-pass subband. Repeat for the desired number of levels.
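A sketch of one 2-D level in C, reusing a dwt_step() helper like the one in the 1-D sketch above (illustrative only; row-major storage, even dimensions assumed, edge-size handling omitted):

    #include <stdlib.h>

    /* One 2-D decomposition level: filter and downsample every row into low/high
     * halves, then filter and downsample the columns of each half, producing the
     * LL approximation and the LH, HL, HH detail subbands (each rows/2 x cols/2). */
    static void dwt2_level(const double *img, size_t rows, size_t cols,
                           const double *lo, const double *hi, size_t flen,
                           double *LL, double *LH, double *HL, double *HH)
    {
        size_t hc = cols / 2, hr = rows / 2;
        double *rowL = malloc(rows * hc * sizeof *rowL);  /* rows through low-pass  */
        double *rowH = malloc(rows * hc * sizeof *rowH);  /* rows through high-pass */
        double *col  = malloc(rows * sizeof *col);
        double *out  = malloc(hr * sizeof *out);

        for (size_t r = 0; r < rows; r++) {               /* 1) filter the rows     */
            dwt_step(img + r * cols, cols, lo, flen, rowL + r * hc);
            dwt_step(img + r * cols, cols, hi, flen, rowH + r * hc);
        }

        for (size_t c = 0; c < hc; c++) {                 /* 2) filter the columns  */
            for (size_t r = 0; r < rows; r++) col[r] = rowL[r * hc + c];
            dwt_step(col, rows, lo, flen, out);
            for (size_t r = 0; r < hr; r++) LL[r * hc + c] = out[r];
            dwt_step(col, rows, hi, flen, out);
            for (size_t r = 0; r < hr; r++) LH[r * hc + c] = out[r];

            for (size_t r = 0; r < rows; r++) col[r] = rowH[r * hc + c];
            dwt_step(col, rows, lo, flen, out);
            for (size_t r = 0; r < hr; r++) HL[r * hc + c] = out[r];
            dwt_step(col, rows, hi, flen, out);
            for (size_t r = 0; r < hr; r++) HH[r * hc + c] = out[r];
        }

        free(rowL); free(rowH); free(col); free(out);
    }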
20 Post Transformation
- After the data was transformed and stored in an appropriate data structure, an in-memory gzip compression (which was oddly better than bzip2) was applied and the result was written out in binary format (a sketch of this step follows below)
- Reconstruction is a separate program that does everything in reverse, except that it uses the reconstruction filters
- There was trouble in storing and reading the data back in an appropriate form based on the decomposition structure
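For the in-memory compression step, something like zlib's one-shot compress() can be used (a sketch under the assumption that zlib is acceptable; note that compress() emits a zlib/DEFLATE stream rather than a true gzip file, so this only approximates what the project did):

    #include <stdlib.h>
    #include <zlib.h>

    /* Deflate the serialized coefficient buffer in memory before writing it out.
     * Returns the compressed length, or 0 on failure. */
    static uLongf deflate_buffer(const unsigned char *raw, uLong raw_len,
                                 unsigned char **compressed)
    {
        uLongf comp_len = compressBound(raw_len);   /* worst-case output size */
        *compressed = malloc(comp_len);
        if (*compressed == NULL ||
            compress(*compressed, &comp_len, raw, raw_len) != Z_OK)
            return 0;
        return comp_len;
    }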
(Diagram: storing the data; the DD, DA, and AD detail subbands for levels 1-3 are laid out around the final AA approximation from level 3)
21 2-D Results
Results for 2-D vorticity data sets using high compression (bior3.1 wavelet, 1x10^-4 threshold):
- 2500x2500 went from 50,000,016 B -> 169,344 B
- 1024x1024 went from 8,388,624 B -> 201,601 B !!!
22 2-D Results
(Chart: compression ratio for different 2-D data sizes)
23 2-D Results
Results for 2-D vorticity data sets using low compression (bior5.5 wavelet, which has more coefficients than bior3.1, with a 1x10^-7 threshold):
- 64x64 actually increases in size, because the decomposition creates matrices whose combined size is larger than the original and the threshold level is too low.
24 Comparison
- Compared to the adaptive subsampling presented in Tallat's thesis.
25 2-D Pictures
(Images: 128x128 vorticity data, original and reconstructed using high compression)
26 2-D Pictures
(Image: difference between the original and reconstructed 128x128 vorticity data)
27 2-D Pictures
(Plot: maximum per-row difference between the original and reconstructed 1024x1024 data)
28 3-D Data
(Figure from http://taco.poly.edu/WaveletSoftware/standard3D.html)
- Similar to 2-D, except more complicated
- My implementation (sketched in code below):
- Take the Z axis, filter and downsample to get A and D
- Take the Y axis, filter and downsample to get AA, AD, DA, DD
- Take the X axis, filter and downsample to get AAA, AAD, ADA, ADD, DAA, DAD, DDA, DDD
- Take AAA, set it as the input, and repeat for the desired number of levels
- The real trouble was in trying to store the data in a way that can be reconstructed later
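Sketched in C with a hypothetical dwt_along_axis() helper (it would filter and downsample every line of a block along one axis, splitting it into a low-pass and a high-pass half); the names are illustrative, not the project's actual API:

    #include <stddef.h>

    /* Minimal 3-D block type for the sketch. */
    typedef struct { double *data; size_t nx, ny, nz; } block_t;

    /* Hypothetical helper: split a block into low-pass and high-pass halves along one axis. */
    void dwt_along_axis(block_t in, char axis, block_t *lo, block_t *hi);

    /* One 3-D level: each axis pass doubles the number of subbands (2 -> 4 -> 8). */
    static void dwt3_level(block_t in, block_t out[8])
    {
        block_t a, d, aa, ad, da, dd;

        dwt_along_axis(in, 'z', &a, &d);            /* Z axis: A, D      */
        dwt_along_axis(a,  'y', &aa, &ad);          /* Y axis: AA, AD    */
        dwt_along_axis(d,  'y', &da, &dd);          /*         DA, DD    */
        dwt_along_axis(aa, 'x', &out[0], &out[1]);  /* X axis: AAA, AAD  */
        dwt_along_axis(ad, 'x', &out[2], &out[3]);  /*         ADA, ADD  */
        dwt_along_axis(da, 'x', &out[4], &out[5]);  /*         DAA, DAD  */
        dwt_along_axis(dd, 'x', &out[6], &out[7]);  /*         DDA, DDD  */
        /* out[0] (AAA) becomes the input for the next level. */
    }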
(Diagram: the eight octants AAA, AAD, ADA, ADD, DAA, DAD, DDA, DDD laid out along the x, y, and z axes; the next decomposition level is stored in place of the AAA octant)
29 3-D Data
- It is also problematic to get real 3-D data.
- Vorticity frames were taken and concatenated.
- Tested with 64x64 vorticity over 64 frames (so not really 3-D data).
30 3-D Visual Comparison
(Images: original and reconstructed 64x64x64 vorticity data)
31 Parallel Processing
- The detail and approximation data can be calculated independently
- XML-RPC was used to send an input vector to one node to compute the detail data and to another to compute the approximation data (see the sketch after this list)
- The master node coordinates the sending and receiving
- This led to an enormous slowdown in performance, as expected:
- XML-RPC adds a huge overhead
- Data is sent one row at a time instead of sending an entire decomposition level, which creates excessive communication
- In the 2-D decomposition, when convolving the columns, four operations could be done in parallel, but they are only done two at a time
- Performance was not the main goal here; rather, this was a proof of concept
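A rough sketch of the master-side dispatch, with a hypothetical rpc_convolve() wrapper standing in for the XML-RPC call (names and signatures are illustrative, not the project's API; the calls are shown sequentially, which is exactly the row-at-a-time granularity criticized above):

    #include <stddef.h>

    /* Hypothetical blocking RPC wrapper: ask a slave node to convolve and downsample
     * one row with the low-pass (0) or high-pass (1) filter, returning cols/2 values. */
    void rpc_convolve(const char *node_addr, const double *row, size_t cols,
                      int use_highpass, double *result);

    /* Master node: one node computes the approximation rows, the other the detail rows. */
    static void parallel_rows(const double *img, size_t rows, size_t cols,
                              const char *approx_node, const char *detail_node,
                              double *approx, double *detail)
    {
        for (size_t r = 0; r < rows; r++) {
            rpc_convolve(approx_node, img + r * cols, cols, 0, approx + r * (cols / 2));
            rpc_convolve(detail_node, img + r * cols, cols, 1, detail + r * (cols / 2));
        }
    }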
32 Demo
- Master node:
- ./wavecomp -c 1 -d 2 -p -m -s 132.239.55.175 132.239.55.174 vorticity001.dat
- Slave node:
- ./wavecomp -p -s
33 Parallel Processing Results
Results for 2-D vorticity data sets using high compression, with parallel processing on two nodes
34 Known Issues
- This method is not applicable to all kinds of data
- If not enough values are thresholded away (because the limit is too low or the wavelet wasn't appropriate), the size can actually increase, because the decomposition usually creates detail and approximation vectors whose combined size is larger than the original
- My implementation does excessive data copying, which could be eliminated to speed up processing. It comes down to the question of whether the transformations should be done in place (which is tricky because the sizes can change)
35 Conclusion
- Lossy compression is applicable to many kinds of data, but the user should have a basic understanding of the thresholding required
- Wavelets are a good choice for such compression, as evidenced by other applications and by these results
- The finer the resolution of the data, the better the compression
36 References
- The following helped significantly:
- Matlab Wavelet Toolbox: http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavelet.html
- Robi Polikar, Wavelet Tutorial: http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html
- PyWavelets: http://www.pybytes.com/pywavelets/
- Geoff Davis, Wavelet Construction Kit: http://www.geoffdavis.net/dartmouth/wavelet/wavelet.html
- Wickerhauser, Adapted Wavelet Analysis from Theory to Software