1
Scientific Data Compression through Wavelet Transformation
  • Chris Fleizach
  • CSE 262

2
Problem Statement
  • Scientific data is hard to compress with
    traditional lossless compressors such as gzip or
    bzip2, because common patterns do not exist
  • If the data can be transformed into a form that
    retains the information but can be thresholded,
    it can be compressed
  • Thresholding removes excessively small values and
    replaces them with zero, which is much easier to
    compress than an arbitrary 64-bit double (see the
    sketch below)
  • Wavelet transforms are well suited for this
    purpose and have been used in image compression
    (JPEG 2000)
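
A minimal sketch of the thresholding step in C, assuming the transformed
coefficients sit in a plain array of doubles (the function and variable names
are illustrative, not taken from the actual implementation):

    #include <math.h>
    #include <stddef.h>

    /* Hard thresholding: zero out every coefficient whose magnitude falls
       below the threshold, so long runs of zeros become easy to compress. */
    void threshold_coeffs(double *coeffs, size_t n, double thresh)
    {
        for (size_t i = 0; i < n; i++)
            if (fabs(coeffs[i]) < thresh)
                coeffs[i] = 0.0;
    }

For example, threshold_coeffs(data, n, 1e-4) corresponds to the most
aggressive threshold used later in these slides.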

3
Why Wavelets?
  • Wavelet transforms encode more information than
    other techniques such as Fourier transforms
  • Both time and frequency information are preserved
  • In practical terms, the transformation is applied
    at many scales and positions within the signal
  • This results in vectors that encode approximation
    and detail information
  • By separating the signal into these components, it
    is easier to threshold and discard information, so
    the data can be compressed

4
Wavelets in Compression
  • The JPEG 2000 standard gave up the discrete cosine
    transform in favor of a wavelet transform
  • The FBI uses wavelets to compress fingerprint
    scans to roughly 1/15th to 1/20th of their
    original size

5
Choosing the Right Wavelet: The Transform
  • Continuous wavelet transform: the sum over all
    time of the signal multiplied by scaled and
    shifted versions of the wavelet (see the formula
    below)
  • Unfortunately, it's slow, generates far too much
    data, and is hard to implement

From mathworks.com
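
For reference, the standard textbook definition of the continuous wavelet
transform of a signal f(t) with wavelet psi, scale a, and shift b (not shown
explicitly on the slide) is

    C(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt

Every (scale, shift) pair produces a coefficient, which is why the continuous
transform generates so much redundant data.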
6
Choosing the Right Wavelet: The Transform
  • The discrete transform - if the scales and
    positions are chosen as powers of two, the
    transform is much more efficient and just as
    accurate
  • The signal is then sent through only two
    subband coders, which extract the approximation and
    the detail data from the signal (see the relation
    below)

Signal decomposed by low pass and high
pass filters to get approx and detail info.
From mathworks.com
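
In the standard decimated filter-bank form (a textbook relation, not taken
from the slide), the approximation and detail coefficients come from
convolving with the low pass filter g and the high pass filter h and keeping
every other output sample:

    a[k] = \sum_{n} g[n]\, x[2k - n], \qquad d[k] = \sum_{n} h[n]\, x[2k - n]

The factor of 2 inside the index is the downsampling step that makes the
discrete transform efficient.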
7
Choosing the Right Wavelet: The Decomposition
  • The signal can be recursively decomposed to get
    finer detail and more general approximation.
  • This is called multi-level decomposition.
  • A signal can be decomposed as many times as it
    can be divided in half (see the bound below)
  • Thus, only one approximation signal remains at
    the end of the process

From mathworks.com
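
Concretely, ignoring filter-length effects, a length-N signal can be halved at
most

    L_{\max} = \lfloor \log_2 N \rfloor

times, so for example a 1024-sample signal supports up to 10 decomposition
levels.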
8
Choosing the Right Wavelet: The Wavelet
  • The low and high pass filters (subband coders)
    are in reality defined by the wavelet that is used;
    a wide variety of wavelets have been created over
    time
  • The low pass is called the scaling function
  • The high pass is the wavelet function
  • Different wavelets give better results depending
    on the type of data

Figure: approximation / low pass / scaling function and
detail / high pass / wavelet function
From mathworks.com
9
Choosing the Right Wavelet: The Wavelet
  • The wavelets that turned out to give the best
    results were the biorthogonal wavelets
  • These were introduced by Daubechies and co-workers
    and exploit the fact that exact reconstruction with
    symmetric filters is impossible if the same wavelet
    is used for both decomposition and reconstruction
  • Instead, a reconstruction wavelet and a
    decomposition wavelet are used that are slightly
    different

Figure: the coefficients of the filters used for
convolution, and the actual wavelet and scaling functions
From mathworks.com
10
Testing Methodology
  • To find the best combination of wavelet,
    decomposition level, and threshold, an exhaustive
    search was done in Matlab
  • A 1000x1000 grid of vorticity data from the
    Navier-Stokes simulator was first compressed with
    gzip; this was the baseline file to compare against
  • Each available wavelet in Matlab was then tested
    with 1-, 3-, and 5-level decomposition, in
    combination with thresholding that removed values
    smaller than 1x10^-4 down to 1x10^-7
  • The resulting data was saved, compressed with
    gzip, and compared against the baseline
  • The data was then reconstructed and the maximum
    and average errors were measured

11
Testing Methodology
Sample Results
For the application, I chose three of the
methodologies, representing high compression / high
error, medium compression / medium error, and low
compression / low error.
12
Matlab functions
  • Four Matlab functions were written for compression
    and decompression:
  • wavecompress (1D) and wavecompress2 (2D)
  • wavedecompress (1D) and wavedecompress2 (2D)

wavecompress2 - Lossy compression for 2-D data
  savings = WAVECOMPRESS2(x, mode, outputfile) compresses the 2-D data
  in x using the mode specified and saves it to outputfile. The return
  value is the compression ratio achieved.
  The valid values for mode are:
    1 - high compression, high error (uses bior3.1 filter and 1e-4 limit)
    2 - medium compression, medium error (uses bior3.1 filter and 1e-5 limit)
    3 - low compression, low error (uses bior5.5 filter and 1e-7 limit)
  To decompress the data, see wavedecompress2.
13
Some Pictures
14
C Implementation
  • With the easy work out of the way, the next phase
    of the project was a C implementation. There
    were a few reasons for reinventing the wheel:
  • I wanted to fully understand the process
  • I could try my hand at some parallel processing
  • I could have a native 3-D transformation
  • And Matlab makes my computer very slow

15
Demo
  • ./wavecomp -c 1 -d 2 vorticity000.dat
  • ./wavedec vorticity000.dat.wcm

16
Algorithm
  • The basic algorithm for 1-D multi-level
    decomposition (see the sketch below):
  • Convolve the input with the low pass filter to
    get the approximation vector
  • Convolve the input with the high pass filter to
    get the detail vector
  • Set the input to the approximation vector, and
    repeat until the desired decomposition level is
    reached
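
A minimal sketch of this loop in C, assuming a hypothetical helper
dwt_step_1d() that convolves a length-n input with the low and high pass
filters and writes n/2 approximation and n/2 detail samples (all names and
signatures here are illustrative, not the actual implementation):

    #include <stdlib.h>

    /* Hypothetical one-level step: low/high pass convolution plus downsample. */
    void dwt_step_1d(const double *in, size_t n, double *approx, double *detail);

    /* Multi-level 1-D decomposition: each level halves the approximation,
       keeps the detail vector, and feeds the approximation back in.
       Assumes the length stays even; real code must handle boundary sizes. */
    void multilevel_dwt(const double *signal, size_t n, int levels,
                        double **details, size_t *detail_len,
                        double **final_approx, size_t *approx_len)
    {
        const double *input = signal;
        double *approx = NULL;
        size_t len = n;

        for (int lvl = 0; lvl < levels && len >= 2; lvl++) {
            size_t half = len / 2;
            double *next = malloc(half * sizeof *next);
            double *detail = malloc(half * sizeof *detail);

            dwt_step_1d(input, len, next, detail);

            details[lvl] = detail;        /* detail vectors are kept at every level */
            detail_len[lvl] = half;

            free(approx);                 /* discard the previous intermediate approx */
            approx = next;
            input = approx;               /* the approximation becomes the next input */
            len = half;
        }
        *final_approx = approx;           /* only one approximation remains at the end */
        *approx_len = len;
    }

Note that free(NULL) is a no-op, so the first iteration is safe.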

17
Convolution
  • The convolution step is tricky, though, because the
    filters use data from before and after a specific
    point, which makes edges hard to handle
  • For signals that aren't sized appropriately, the
    data must be extended; the most common approaches
    are periodic extension, symmetric extension, and
    zero-padding
  • The convolution algorithm:

    for (k = 0; k < signal_size; k += 2) {    /* step by 2: convolve and downsample */
        double sum = 0;
        for (j = 0; j < filter_size; j++)
            sum += filter[j] * input[k - j];  /* k - j < 0 at the edges: needs extension */
        output[k / 2] = sum;
    }

18
Implementation
  • The convolution caused the most problems, as many
    available libraries didn't seem to do it
    correctly, or assumed the data was periodic or
    symmetric
  • I finally appropriated some code from the
    PyWavelets project that handled zero-padding
    extension, determined the appropriate output
    sizes, and performed the convolution correctly
    along the edges

19
2-D transformation
  • The 2-D transformation proved more challenging in
    terms of how to store the data and how to
    decompose it (see the sketch below)
  • Convolve each row with the low pass filter to get
    the approximation data, then downsample along the rows
  • Convolve each row with the high pass filter to get
    the detail data, then downsample along the rows
  • Convolve each remaining low pass column with the
    low pass filter
  • Convolve each remaining low pass column with the
    high pass filter
  • Convolve each high pass column with the low pass filter
  • Convolve each high pass column with the high pass filter
  • Downsample each result along the columns
  • Store the 3 detail subbands and set the input to the
    low pass/low pass subband. Repeat for the desired
    number of levels.
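
A minimal sketch of one 2-D level in C, reusing the hypothetical
dwt_step_1d() from the 1-D sketch above (again, all names are illustrative
and even dimensions are assumed):

    #include <stdlib.h>

    void dwt_step_1d(const double *in, size_t n, double *approx, double *detail);

    /* One 2-D decomposition level: filter and downsample the rows, then the
       columns of each half.  LL, LH, HL, HH are each (rows/2) x (cols/2). */
    void dwt2_level(const double *img, size_t rows, size_t cols,
                    double *LL, double *LH, double *HL, double *HH)
    {
        size_t hc = cols / 2, hr = rows / 2;
        double *lo = malloc(rows * hc * sizeof *lo);   /* row-filtered approximation */
        double *hi = malloc(rows * hc * sizeof *hi);   /* row-filtered detail */
        double *col = malloc(rows * sizeof *col);
        double *a = malloc(hr * sizeof *a);
        double *d = malloc(hr * sizeof *d);

        for (size_t r = 0; r < rows; r++)              /* 1) rows: low and high pass */
            dwt_step_1d(img + r * cols, cols, lo + r * hc, hi + r * hc);

        for (size_t c = 0; c < hc; c++) {              /* 2) columns of each half */
            for (size_t r = 0; r < rows; r++) col[r] = lo[r * hc + c];
            dwt_step_1d(col, rows, a, d);
            for (size_t r = 0; r < hr; r++) { LL[r * hc + c] = a[r]; LH[r * hc + c] = d[r]; }

            for (size_t r = 0; r < rows; r++) col[r] = hi[r * hc + c];
            dwt_step_1d(col, rows, a, d);
            for (size_t r = 0; r < hr; r++) { HL[r * hc + c] = a[r]; HH[r * hc + c] = d[r]; }
        }
        free(lo); free(hi); free(col); free(a); free(d);
    }

For a multi-level decomposition, LH, HL, and HH are stored and the routine is
called again with LL as the new input.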

20
Post Transformation
  • After the data was transformed and stored in an
    appropriate data structure, an in-memory gzip
    compression (which was, oddly, better than bzip2)
    was applied, and the result was written out in
    binary format (a sketch of in-memory compression
    follows below)
  • Reconstruction is a separate program that does
    everything in reverse, except that it uses the
    reconstruction filters
  • There was trouble in storing and reading the data
    back in an appropriate form based on the
    decomposition structure
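
The slides mention an in-memory gzip compression; a minimal sketch of
in-memory compression using zlib's compress2() is below (this illustrates the
idea only; the original program's exact compression API and on-disk container
are not described in the slides):

    #include <stdlib.h>
    #include <zlib.h>

    /* Deflate-compress a buffer of coefficients in memory.  Returns a newly
       allocated buffer and stores the compressed size in *out_len. */
    unsigned char *compress_coeffs(const double *coeffs, size_t n, uLongf *out_len)
    {
        uLong  src_len = (uLong)(n * sizeof *coeffs);
        uLongf bound   = compressBound(src_len);        /* worst-case output size */
        unsigned char *out = malloc(bound);

        if (out == NULL ||
            compress2(out, &bound, (const Bytef *)coeffs, src_len,
                      Z_BEST_COMPRESSION) != Z_OK) {
            free(out);
            return NULL;
        }
        *out_len = bound;                               /* actual compressed size */
        return out;
    }

The compressed buffer can then be written to the binary output file together
with the decomposition structure needed for reconstruction.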

Diagram (storing data): the detail subbands DD, DA, and AD
are stored for levels 1, 2, and 3, followed by the final
approximation AA3.
21
2D Results
Results for the 2-D vorticity data sets using
high compression (bior3.1 wavelet, 1x10^-4 threshold):
2500x2500 went from 50,000,016 B -> 169,344 B
1024x1024 went from 8,388,624 B -> 201,601 B!
22
2D Results
Compression Ratio for different 2-D data sizes
23
2D Results
Results for the 2-D vorticity data sets using
low compression (bior5.5 wavelet, which has more
coefficients than bior3.1; 1x10^-7 threshold):
The 64x64 set actually increases in size because the
decomposition creates matrices whose combined size is
larger than the original, and the threshold level
is too low to zero out enough values.
24
Comparison
  • Compared to the adaptive subsampling presented in
    Tallat's thesis.

25
2D Pictures
128x128 vorticity ORIGINAL
128x128 vorticity RECONSTRUCTED
Using High compression
26
2D Pictures
Difference between 128x128 vorticity original and
reconstructed
27
2D Pictures
Plot of the maximum per-row difference between the
original and reconstructed 1024x1024 data
28
3-D Data
From http://taco.poly.edu/WaveletSoftware/standard3D.html
  • Similar to 2-D, except more complicated
  • My implementation (see the sketch below):
  • Filter and downsample along the Z axis to get A and D
  • Filter and downsample along the Y axis to get AA, AD,
    DA, DD
  • Filter and downsample along the X axis to get AAA, AAD,
    ADA, ADD, DAA, DAD, DDA, DDD
  • Take AAA, set it as the new input, and repeat for the
    desired number of levels
  • The real trouble was storing the data in a way
    that could be reconstructed later
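
A minimal sketch of one 3-D level, expressed in terms of a hypothetical
dwt_along_axis() helper that filters and downsamples every 1-D line of the
volume along a single axis, writing the approximation into the first half and
the detail into the second half of that axis (names are illustrative, not the
actual implementation):

    #include <stddef.h>

    /* Hypothetical helper: one filter-and-downsample pass along one axis
       (0 = z, 1 = y, 2 = x) of an nx x ny x nz volume, in place. */
    void dwt_along_axis(double *vol, size_t nx, size_t ny, size_t nz, int axis);

    /* One 3-D decomposition level.  After the three passes the volume is
       partitioned into the eight octants AAA, AAD, ADA, ADD, DAA, DAD,
       DDA, DDD; the AAA octant (half the size along every axis) becomes
       the input to the next level. */
    void dwt3_level(double *vol, size_t nx, size_t ny, size_t nz)
    {
        dwt_along_axis(vol, nx, ny, nz, 0);   /* z axis: A | D            */
        dwt_along_axis(vol, nx, ny, nz, 1);   /* y axis: AA, AD, DA, DD   */
        dwt_along_axis(vol, nx, ny, nz, 2);   /* x axis: eight octants    */
    }

Keeping track of which octant lives where, across several levels, is exactly
the bookkeeping problem mentioned above.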

Diagram: cube of octants AAA, AAD, ADA, ADD, DAA, DAD,
DDA, DDD along the x, y, and z axes; the next level's
decomposition is stored in the AAA octant.
29
3-D Data
  • It's also problematic to obtain real 3-D data.
  • Vorticity frames were concatenated together.
  • Tested with 64x64 vorticity data over 64 frames (so
    not truly 3-D data).

30
3-D Visual Comparison
Original 64x64x64 vorticity data
Reconstructed 64x64x64 vorticity data
31
Parallel Processing
  • The detail and approximation data can be
    calculated independently.
  • XML-RPC was used to send an input vector to one
    node to compute the detail data and to another to
    compute the approximation data.
  • The master node coordinates the sending and
    receiving.
  • As expected, this led to an enormous slowdown in
    performance:
  • XML-RPC adds a huge overhead
  • Data is sent one row at a time instead of sending
    an entire decomposition level, creating excessive
    communication
  • In the 2-D decomposition, when convolving the
    columns, four operations could be done in parallel,
    but they are only done two at a time
  • Performance was not the main goal here; rather,
    this was a proof of concept.

32
Demo
  • Master Node
  • ./wavecomp -c 1 -d 2 -p -m -s 132.239.55.175
    132.239.55.174 vorticity001.dat
  • Slave Node
  • ./wavecomp -p -s

33
Parallel Processing Results
Results for the 2-D vorticity data sets using
high compression, with parallel processing on two
nodes
34
Known Issues
  • This method is not applicable to all kinds of
    data.
  • If not enough values are thresholded (because the
    limit is too low or the wavelet wasn't
    appropriate), then the size can actually increase
    (because decomposition usually creates detail and
    approximation vectors whose combined size is larger
    than the original)
  • My implementation does excessive data copying,
    which could be eliminated to speed up processing.
    It comes down to whether the transformations should
    be done in place (which is tricky because the sizes
    can change)

35
Conclusion
  • Lossy compression is applicable for many kinds of
    data, but the user should have a basic
    understanding of the thresholding required
  • Wavelets are a good choice for doing such
    compression, as evidenced by other applications
    and these results.
  • The finer the resolution of the data, the better
    the compression

36
References
  • The following helped significantly:
  • Matlab Wavelet Toolbox -
    http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavelet.html
  • Robi Polikar, Wavelet Tutorial -
    http://users.rowan.edu/polikar/WAVELETS/WTtutorial.html
  • PyWavelets -
    http://www.pybytes.com/pywavelets/
  • Geoff Davis, Wavelet Construction Kit -
    http://www.geoffdavis.net/dartmouth/wavelet/wavelet.html
  • Wickerhauser, Adapted Wavelet Analysis from
    Theory to Software