N-bit and ScaleOffset filters presentation

1
N-bit and ScaleOffset filters
MuQun Yang
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, IL 61801
Contact: ymuqun_at_ncsa.uiuc.edu
2
N-Bit Filter Outline

Definition
A usage example
Limitations

3
N-Bit datatype

A user-defined HDF5 datatype that uses fewer
bits than the predefined datatype it is based on.

Illustration of an N-Bit datatype on a
little-endian machine
base type: integer, offset: 4, precision: 16
byte 3   byte 2   byte 1   byte 0
???????? ????SPPP PPPPPPPP PPPP????
S - sign bit, P - significant bit, ? - padding bit
Without the N-bit filter, HDF5 stores the
padding bits of an N-bit datatype, so no disk
space is saved.
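
To make the layout concrete, here is a minimal sketch in plain C (not HDF5 code) of how the significant bits could be pulled out of a 32-bit word, given the offset and precision from the illustration. The names `nbit_extract` and `nbit_sign_extend` are hypothetical helpers introduced for this example; they are not part of the HDF5 API.

```c
#include <stdint.h>

/* Hypothetical helper, not part of HDF5: return the `precision`
 * significant bits that start `offset` bits above bit 0 of `word`.
 * Assumes precision >= 1 and offset + precision <= 32. */
uint32_t nbit_extract(uint32_t word, unsigned offset, unsigned precision)
{
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu
                                      : ((1u << precision) - 1u);
    return (word >> offset) & mask;
}

/* Hypothetical helper: treat the top bit of a `precision`-bit field
 * as the sign bit (S in the illustration) and widen it to int32_t. */
int32_t nbit_sign_extend(uint32_t field, unsigned precision)
{
    uint32_t sign = 1u << (precision - 1);
    return (int32_t)((field ^ sign) - sign);
}
```

With offset 4 and precision 16, the padding bits in the top 12 and bottom 4 positions of the word are discarded, whatever they contain.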
4
N-bit filter

When the N-bit filter is applied to an N-bit
datatype, all padding bits are stripped during
compression, and consecutive values are stored
on disk back to back, like
SPPPPPPPPPPPPPPPSPPPPPPPPPPPPPPP
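
The packing idea can be sketched in plain C as follows. This is an illustration only, not the HDF5 implementation: the function name `nbit_pack` is hypothetical, and the most-significant-bit-first output order is an assumption chosen to match the picture above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the HDF5 implementation: copy the
 * `precision` significant bits of each 32-bit element into `out`,
 * most significant bit first, with no padding between elements.
 * `out` must hold at least (count * precision + 7) / 8 bytes.
 * Returns the number of bytes used. */
size_t nbit_pack(const uint32_t *in, size_t count,
                 unsigned offset, unsigned precision, uint8_t *out)
{
    size_t nbytes = (count * precision + 7) / 8;
    size_t bitpos = 0;               /* next free bit in the output */

    memset(out, 0, nbytes);
    for (size_t i = 0; i < count; i++) {
        /* Walk the field from its top bit down to its bottom bit. */
        for (unsigned b = precision; b-- > 0; ) {
            uint32_t bit = (in[i] >> (offset + b)) & 1u;
            out[bitpos / 8] |= (uint8_t)(bit << (7 - bitpos % 8));
            bitpos++;
        }
    }
    return nbytes;
}
```

Two 32-bit elements with precision 16 occupy 4 bytes after packing instead of 8.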

5
Enable N-Bit filter

Create a dataset creation property list
Set chunking (and specify chunk dimensions)
Set up use of the N-Bit filter
Create dataset specifying this property list
Close property list

6
N-Bit filter usage example

/* Define dataset datatype (N-Bit), and set precision, offset */
datatype = H5Tcopy(H5T_NATIVE_INT);
precision = 17;
H5Tset_precision(datatype, precision);
offset = 4;
H5Tset_offset(datatype, offset);
/* Set the dataset creation property list for N-Bit */
chunk_size[0] = CH_NX;
chunk_size[1] = CH_NY;
properties = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(properties, 2, chunk_size);
H5Pset_nbit(properties);
/* Create a new dataset with N-Bit datatype.
   Five-argument form of H5Dcreate; HDF5 1.8 and later
   also provide it as H5Dcreate1. */
dataset = H5Dcreate(file, DATASET_NAME,
                    datatype, dataspace, properties);

7
N-bit filter restrictions

Only compresses N-Bit datatypes, or fields of
such types, derived from integer or
floating-point types
Assumes padding bits are zero
The fill value is not treated differently

8
ScaleOffset Filter Outline

Definition
How the ScaleOffset filter works
Why use the ScaleOffset filter
Usage examples
Performance with EOS data
Limitations

9
ScaleOffset filter

ScaleOffset compression performs a scale and/or
offset operation on each data value and truncates
the resulting value to a minimum number of bits
before storing it. The datatype is either integer
or floating-point.
The offset in ScaleOffset compression is the
minimum value of a set of data values.
If a fill value is defined for the dataset, the
filter ignores it when finding the minimum
value.

10
How ScaleOffset works

An example for Integer Type
Maximum is 7065, minimum is 2970
The "span" = Max - Min + 1 = 4096
Case 1: No fill value is defined.
Minimum number of bits per element to store =
ceiling(log2(span)) = 12
Case 2: A fill value is defined in this array.
Minimum number of bits per element to store =
ceiling(log2(span + 1)) = 13
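
The arithmetic on this slide can be checked with a short sketch in plain C. The function names `ceil_log2` and `so_min_bits` are hypothetical and are not part of the HDF5 API; the sketch only reproduces the span and ceiling(log2) calculation described above.

```c
#include <stdint.h>

/* Hypothetical helper, not part of HDF5: smallest number of bits b
 * such that 2^b >= n, i.e. ceiling(log2(n)) for n >= 1. */
unsigned ceil_log2(uint64_t n)
{
    unsigned bits = 0;
    while (bits < 64 && ((uint64_t)1 << bits) < n)
        bits++;
    return bits;
}

/* Bits per element for an integer range, following the slide:
 * span = max - min + 1, plus one extra code when a fill value
 * is defined. */
unsigned so_min_bits(int64_t min, int64_t max, int has_fill_value)
{
    uint64_t span = (uint64_t)(max - min) + 1;
    if (has_fill_value)
        span += 1;
    return ceil_log2(span);
}
```

For the example range, `so_min_bits(2970, 7065, 0)` gives 12 and `so_min_bits(2970, 7065, 1)` gives 13, matching the two cases.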

11
How ScaleOffset works

An example for Integer Type (cont.)
Compression
1. Subtract the minimum value from each element
2. Pack all data with the minimum number of bits
Decompression
1. Unpack all data
2. Add the minimum value to each element
Outcome
1. Saves about 60% of disk space in this case
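
The subtract and add steps can be sketched in plain C as below. This is not the HDF5 implementation: the function names are hypothetical, the bit-packing step is left out, and the saving estimate assumes the original elements are 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the HDF5 implementation.
 * Compression step 1: replace each element by its distance from the
 * minimum, so every result fits in the minimum number of bits.
 * Returns the minimum, which must be stored to undo the step. */
int32_t so_subtract_min(const int32_t *data, size_t n, uint32_t *offsets)
{
    int32_t min = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];
    for (size_t i = 0; i < n; i++)
        offsets[i] = (uint32_t)data[i] - (uint32_t)min;
    return min;
}

/* Decompression step 2: add the minimum back to each element. */
void so_add_min(const uint32_t *offsets, size_t n, int32_t min,
                int32_t *data)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (int32_t)(offsets[i] + (uint32_t)min);
}

/* Fraction of space saved when `bits` bits replace a 32-bit element
 * (the 32-bit element size is an assumption of this sketch). */
double so_saving(unsigned bits)
{
    return 1.0 - (double)bits / 32.0;
}
```

With 12 bits per element the estimated saving is 62.5%, and with 13 bits it is about 59%, which is consistent with the "about 60%" figure.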

12
How ScaleOffset works

An example for Floating-point Type
D-scaling factor: the number of decimal digits
of precision to keep in the filter output
Floating-point data: 104.561, 99.459, 100.545,
105.644
D-scaling factor: 2
Preparation for Compression
1. Calculate the minimum value: 99.459
2. Subtract the minimum value
Modified data: 5.102, 0, 1.086, 6.185
3. Scale the data by multiplying by
10^(D-scaling factor) = 100
Modified data: 510.2, 0, 108.6, 618.5
4. Round the data to integers
Modified data: 510, 0, 109, 619
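
The preparation steps can be sketched in plain C as follows. This is an illustration under stated assumptions, not the HDF5 implementation: `dscale_prepare` is a hypothetical name, the D-scaling factor is assumed to be non-negative, and rounding is done half-up on the non-negative differences.

```c
#include <stddef.h>

/* Hypothetical sketch, not the HDF5 implementation: subtract the
 * minimum, scale by 10^d_scale, and round to the nearest integer.
 * Writes the integers to `out` and returns the minimum (the offset).
 * Assumes n >= 1 and d_scale >= 0. */
double dscale_prepare(const double *data, size_t n, int d_scale, long *out)
{
    double min = data[0];
    double factor = 1.0;

    for (int i = 0; i < d_scale; i++)
        factor *= 10.0;              /* factor = 10^d_scale */
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];
    for (size_t i = 0; i < n; i++)   /* differences are >= 0 */
        out[i] = (long)((data[i] - min) * factor + 0.5);
    return min;
}
```

Note that the last value in the slide's example, 618.5, lies exactly on a rounding boundary, so in binary floating-point it may come out as 618 or 619 depending on representation error.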

13
How ScaleOffset works

An example for Floating-point Type (cont.)
Compression and Decompression
Use the ScaleOffset method for integers
Restoration after decompression
1. Divide each value by 10^(D-scaling factor)
2. Add the offset 99.459
3. The restored floating-point data:
104.559, 99.459, 100.549, 105.649
Outcome
1. Lossy compression
2. Compression ratio depends on the D-scaling
factor
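
The restoration steps can be sketched in plain C as below. As before, this is a sketch and not the HDF5 implementation; `dscale_restore` is a hypothetical name and the D-scaling factor is assumed to be non-negative.

```c
#include <stddef.h>

/* Hypothetical sketch, not the HDF5 implementation: undo D-scaling.
 * Divide each stored integer by 10^d_scale and add the offset
 * (the minimum value) back. Assumes d_scale >= 0. */
void dscale_restore(const long *in, size_t n, int d_scale,
                    double min, double *out)
{
    double factor = 1.0;

    for (int i = 0; i < d_scale; i++)
        factor *= 10.0;              /* factor = 10^d_scale */
    for (size_t i = 0; i < n; i++)
        out[i] = (double)in[i] / factor + min;
}
```

Applied to 510, 0, 109, 619 with offset 99.459, this yields values close to 104.559, 99.459, 100.549, 105.649. Each differs from the original input by at most half a unit in the second decimal place, which is where the loss comes from.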

14
Scale-Offset filter compresses floating-point
data

GRIB data packing method
The Scale-Offset compression of floating-point
data is lossy
Two scaling methods: D-scaling and E-scaling
Currently only D-scaling is implemented

15
Why ScaleOffset Filter?

Internal HDF5 filter
Easy to understand
Integer: lossless or lossy
Floating-point: GRIB lossy compression
Easy to control floating-point compression
with the D-scaling factor
Easy to estimate the compression ratio

16
H5Pset_scaleoffset API
H5Pset_scaleoffset(hid_t plist_id,
                   H5Z_SO_scale_type_t scale_type,
                   int scale_factor)

plist_id      IN: Dataset creation property list identifier
scale_type    IN: Flag indicating compression method
    H5Z_SO_FLOAT_DSCALE (0): Floating-point type
    H5Z_SO_INT (2): Integer type
scale_factor  IN: Parameter related to scale
    If scale_type is H5Z_SO_FLOAT_DSCALE, scale_factor
    denotes the decimal scale factor
    If scale_type is H5Z_SO_INT, scale_factor denotes
    minimum-bits, and should be a positive integer or
    H5Z_SO_INT_MINBITS_DEFAULT

17
Integer example

/* Set the fill value of dataset */
fill_val = 10000;
H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);
/* Set parameters for Scale-Offset compression */
H5Pset_scaleoffset(properties, H5Z_SO_INT,
                   H5Z_SO_INT_MINBITS_DEFAULT);
/* Create a new dataset */
dataset = H5Dcreate(file, DATASET_NAME,
                    H5T_NATIVE_INT, dataspace, properties);

18
Floating-point example

fill_val = 10000.0;
/* Set the fill value of dataset */
H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);
/* Set parameters for Scale-Offset compression:
   use D-scaling method,
   set decimal scale factor to 3 */
H5Pset_scaleoffset(properties, H5Z_SO_FLOAT_DSCALE, 3);
/* Create a new dataset */
dataset = H5Dcreate(file, DATASET_NAME,
                    H5T_NATIVE_FLOAT,
                    dataspace, properties);

20
Limitations

The range of compressed floating-point data is
limited by the size of the corresponding unsigned
integer type.
Long double is not supported.
