Title: N-bit and ScaleOffset filters
1 N-bit and ScaleOffset filters
MuQun Yang
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, IL 61801
Contact: ymuqun_at_ncsa.uiuc.edu
2 N-Bit Filter Outline
- Definition
- A usage example
- Limitations
3 N-Bit datatype
- A user-defined HDF5 datatype that can use fewer bits than a predefined datatype.
- Illustration of an N-Bit datatype on a little-endian machine (base type: integer, offset: 4, precision: 16):
  | byte 3 | byte 2 | byte 1 | byte 0 |
  |????????|????SPPP|PPPPPPPP|PPPP????|
  S - sign bit, P - significant bit, ? - padding bit
- Without the N-bit filter, HDF5 stores the padding bits of an N-bit datatype, so no disk space is saved.
4 N-bit filter
- When the N-bit filter is applied to an N-bit datatype, all padding bits are chopped off during compression, and only the sign and significant bits of consecutive values are stored on disk, like:
- SPPPPPPPPPPPPPPPSPPPPPPPPPPPPPPP
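The idea of chopping off padding bits can be sketched in plain C, independent of HDF5. This is only an illustration under stated assumptions: the names `put_bits` and `nbit_pack` are hypothetical, values are treated as 32-bit words, and the most-significant-bit-first layout is chosen for readability; it is not claimed to match the byte layout the HDF5 filter actually writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append the low `nbits` bits of `value` to a bit stream, most significant
 * bit first. `bitpos` counts the bits already written; `out` must start zeroed. */
static void put_bits(uint8_t *out, size_t *bitpos, uint32_t value, unsigned nbits)
{
    for (unsigned i = nbits; i-- > 0; ) {
        if ((value >> i) & 1u)
            out[*bitpos / 8] |= (uint8_t)(0x80u >> (*bitpos % 8));
        (*bitpos)++;
    }
}

/* Hypothetical helper: keep only `precision` bits starting at bit `offset`
 * of each 32-bit input word and store them back to back.
 * Returns the number of output bytes used. */
size_t nbit_pack(const uint32_t *in, size_t n, unsigned offset,
                 unsigned precision, uint8_t *out)
{
    assert(precision >= 1 && offset + precision <= 32);
    size_t bitpos = 0;
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu : ((1u << precision) - 1u);
    for (size_t i = 0; i < n; i++)
        put_bits(out, &bitpos, (in[i] >> offset) & mask, precision);
    return (bitpos + 7) / 8;
}
```

With offset 4 and precision 16, two 32-bit words (8 bytes) shrink to 4 bytes, which corresponds to the two back-to-back 16-bit fields shown on the slide.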
5 Enable N-Bit filter
- Create a dataset creation property list
- Set chunking (and specify chunk dimensions)
- Set up use of the N-Bit filter
- Create dataset specifying this property list
- Close property list
6 N-Bit filter usage example
/* Define the dataset datatype (N-Bit), and set precision and offset */
datatype = H5Tcopy(H5T_NATIVE_INT);
precision = 17;
H5Tset_precision(datatype, precision);
offset = 4;
H5Tset_offset(datatype, offset);
/* Set the dataset creation property list for N-Bit */
chunk_size[0] = CH_NX;
chunk_size[1] = CH_NY;
properties = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(properties, 2, chunk_size);
H5Pset_nbit(properties);
/* Create a new dataset with the N-Bit datatype */
dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, properties);
7 N-bit filter restrictions
- Only compresses an N-Bit datatype or a field derived from an integer or floating-point type
- Assumes padding bits of zero
- The fill value is not treated differently
8 ScaleOffset Filter Outline
- Definition
- How the ScaleOffset filter works
- Why use the ScaleOffset filter
- Usage examples
- Performance with EOS data
- Limitations
9 ScaleOffset filter
- ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.
- "Offset" in Scale-Offset compression means the minimum value of a set of data values.
- If a fill value is defined for the dataset, the filter ignores it when finding the minimum value.
10 How ScaleOffset works
- An example for integer type
- Maximum is 7065; minimum is 2970
- The "span" = Max - Min + 1 = 4096
- Case 1: No fill value is defined.
- Minimum number of bits per element to store: ceiling(log2(span)) = 12
- Case 2: A fill value is defined in this array.
- Minimum number of bits per element to store: ceiling(log2(span + 1)) = 13
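The bit-count arithmetic on this slide can be checked with a short C sketch. The function names `bits_for_count` and `scaleoffset_minbits` are hypothetical helpers written for this illustration; they are not part of the HDF5 API, and the sketch assumes 32-bit signed input values.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b such that 2^b >= count, i.e. ceiling(log2(count)),
 * computed with integers to avoid floating-point rounding. */
unsigned bits_for_count(uint64_t count)
{
    assert(count >= 1);
    unsigned bits = 0;
    while (bits < 64 && ((uint64_t)1 << bits) < count)
        bits++;
    return bits;
}

/* Minimum bits per element for integer data in [min, max].
 * If a fill value is defined, one extra code is reserved for it. */
unsigned scaleoffset_minbits(int32_t min, int32_t max, int has_fill_value)
{
    assert(max >= min);
    uint64_t span = (uint64_t)((int64_t)max - (int64_t)min) + 1;
    return bits_for_count(has_fill_value ? span + 1 : span);
}
```

For the slide's data (minimum 2970, maximum 7065) the span is 4096, so the sketch yields 12 bits without a fill value and 13 bits with one.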
11 How ScaleOffset works
- An example for integer type (cont.)
- Compression
- 1. Subtract the minimum value from each element
- 2. Pack all data with the minimum number of bits
- Decompression
- 1. Unpack all data
- 2. Add the minimum value to each element
- Outcome
- 1. Saves about 60% of disk space for this case
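The compression and decompression steps can be sketched as a round trip in plain C. This is a minimal illustration, not the HDF5 implementation: `so_pack` and `so_unpack` are hypothetical names, the input is assumed to be 32-bit signed integers, the bit order is chosen for readability, and fill-value handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compression: subtract the minimum from each element, then store the low
 * `nbits` bits of each difference back to back (most significant bit first).
 * `out` must start zeroed. Returns the number of bytes used. */
size_t so_pack(const int32_t *in, size_t n, int32_t min, unsigned nbits,
               uint8_t *out)
{
    assert(nbits >= 1 && nbits <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t diff = (uint32_t)((int64_t)in[i] - (int64_t)min);
        for (unsigned b = nbits; b-- > 0; bitpos++)
            if ((diff >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
    }
    return (bitpos + 7) / 8;
}

/* Decompression: unpack each `nbits`-wide difference and add the minimum back. */
void so_unpack(const uint8_t *in, size_t n, int32_t min, unsigned nbits,
               int32_t *out)
{
    assert(nbits >= 1 && nbits <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t diff = 0;
        for (unsigned b = 0; b < nbits; b++, bitpos++)
            diff = (diff << 1) | ((uint32_t)(in[bitpos / 8] >> (7 - bitpos % 8)) & 1u);
        out[i] = (int32_t)((int64_t)min + (int64_t)diff);
    }
}
```

Packing four 32-bit values at 12 bits each takes 6 bytes instead of 16, a saving of 62.5%, which is in line with the "about 60%" figure on the slide.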
12 How ScaleOffset works
- An example for floating-point type
- D-scaling factor: the number of decimal digits of precision to keep in the filter output
- Floating-point data: 104.561, 99.459, 100.545, 105.644
- D-scaling factor: 2
- Preparation for compression
- 1. Calculate the minimum value: 99.459
- 2. Subtract the minimum value
- Modified data: 5.102, 0, 1.086, 6.185
- 3. Scale the data by multiplying by 10^(D-scaling factor) = 100
- Modified data: 510.2, 0, 108.6, 618.5
- 4. Round the data to integers
- Modified data: 510, 0, 109, 619
13 How ScaleOffset works
- An example for floating-point type (cont.)
- Compression and decompression
- Use ScaleOffset for integers on the rounded data
- Restoration after decompression
- 1. Divide each value by 10^(D-scaling factor)
- 2. Add the offset: 99.459
- 3. The restored floating-point data:
- 104.559, 99.459, 100.549, 105.649
- Outcome
- 1. Lossy compression
- 2. The compression ratio depends on the D-scaling factor
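The D-scaling preparation and restoration steps can be sketched in C as follows. This is a hedged illustration only: `dscale_forward` and `dscale_inverse` are hypothetical names, the sketch uses `double` rather than the filter's native floating-point handling, and it rounds half up, which is an assumption about the rounding rule.

```c
#include <assert.h>
#include <stddef.h>

/* 10^d for a small non-negative integer d. */
static double pow10_int(int d)
{
    double p = 1.0;
    while (d-- > 0)
        p *= 10.0;
    return p;
}

/* Preparation: find the minimum, subtract it, scale by 10^d, and round to
 * the nearest integer. Returns the minimum (the "offset"). */
double dscale_forward(const double *in, size_t n, int d, unsigned long *scaled)
{
    assert(n > 0 && d >= 0);
    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];
    double factor = pow10_int(d);
    for (size_t i = 0; i < n; i++)
        /* (in[i] - min) is never negative, so adding 0.5 and truncating rounds to nearest */
        scaled[i] = (unsigned long)((in[i] - min) * factor + 0.5);
    return min;
}

/* Restoration: divide by 10^d and add the offset back. */
void dscale_inverse(const unsigned long *scaled, size_t n, int d, double min,
                    double *out)
{
    double factor = pow10_int(d);
    for (size_t i = 0; i < n; i++)
        out[i] = (double)scaled[i] / factor + min;
}
```

With a D-scaling factor of 2, each restored value differs from the original by at most about 0.005. Note that the slide's last value scales to exactly 618.5 in decimal arithmetic; in binary floating point the product may land slightly on either side of that boundary, so the rounded integer can differ by one from the slide's 619.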
14 Scale-Offset filter compresses floating-point data
- GRIB data packing method
- The Scale-Offset compression of floating-point data is lossy
- Two scaling methods
- D-scaling
- E-scaling
- Currently only D-scaling is implemented
15 Why ScaleOffset Filter?
- Internal HDF5 filter
- Easy to understand
- Integer: lossless or lossy
- Floating-point: GRIB lossy compression
- Easy to control floating-point compression
- D-scaling factor
- Easy to estimate the compression ratio
16 H5Pset_scaleoffset API
herr_t H5Pset_scaleoffset(hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor)
- plist_id (IN): Dataset creation property list identifier
- scale_type (IN): Flag indicating compression method
- H5Z_SO_FLOAT_DSCALE (0): Floating-point type
- H5Z_SO_INT (2): Integer type
- scale_factor (IN): Parameter related to scale
- If scale_type is H5Z_SO_FLOAT_DSCALE, scale_factor denotes the decimal scale factor
- If scale_type is H5Z_SO_INT, scale_factor denotes minimum-bits and should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULT
17 Integer example
/* Set the fill value of the dataset */
fill_val = 10000;
H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);
/* Set parameters for Scale-Offset compression */
H5Pset_scaleoffset(properties, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
/* Create a new dataset */
dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);
18 Floating-point example
/* Set the fill value of the dataset */
fill_val = 10000.0;
H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);
/* Set parameters for Scale-Offset compression:
   use the D-scaling method and set the decimal scale factor to 3 */
H5Pset_scaleoffset(properties, H5Z_SO_FLOAT_DSCALE, 3);
/* Create a new dataset */
dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);
20 Limitations
- The compressed floating-point data range is limited by the size of the corresponding unsigned integer type.
- Long double is not supported.