N-bit and ScaleOffset filters

1
N-bit and ScaleOffset filters
MuQun Yang
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, IL 61801
Contact: ymuqun_at_ncsa.uiuc.edu
2
N-Bit Filter Outline
  • Definition
  • A usage example
  • Limitations

3
N-Bit datatype
  • A user-defined HDF5 datatype that uses fewer
    significant bits than the predefined datatype it
    is derived from.

Illustration of an N-Bit datatype on a little-endian machine
(base type: integer, offset: 4, precision: 16)

   byte 3    byte 2    byte 1    byte 0
  ???????? ????SPPP PPPPPPPP PPPP????

  S - sign bit, P - significant bit, ? - padding bit

Without the N-Bit filter, HDF5 stores the padding bits of an
N-Bit datatype as well, so no disk space is saved.
4
N-bit filter
  • When the N-Bit filter is applied to an N-Bit datatype,
    all padding bits are chopped off during compression,
    and the values are stored on disk back to back, like
    SPPPPPPPPPPPPPPPSPPPPPPPPPPPPPPP
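
The chopping described above can be sketched in plain C for the 32-bit layout from the illustration (offset 4, precision 16). This is only a sketch of the idea, not the HDF5 implementation: the function names `nbit_extract` and `nbit_pack2` are hypothetical, and the bit order HDF5 actually uses on disk is not asserted here.

```c
#include <assert.h>
#include <stdint.h>

/* Return the `precision` significant bits that start `offset` bits above
   the least significant bit of a 32-bit word; padding bits are dropped. */
uint32_t nbit_extract(uint32_t word, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision <= 32 && offset + precision <= 32);
    uint32_t mask = (precision == 32) ? UINT32_MAX
                                      : ((UINT32_C(1) << precision) - 1);
    return (word >> offset) & mask;
}

/* Pack the 16 significant bits of two words back to back into one
   32-bit word, first value in the high half (hypothetical layout). */
uint32_t nbit_pack2(uint32_t a, uint32_t b, unsigned offset)
{
    return (nbit_extract(a, offset, 16) << 16) | nbit_extract(b, offset, 16);
}
```

With this layout, two 32-bit elements occupy 32 bits after packing instead of 64, whatever the padding bits held.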

5
Enable N-Bit filter
  • Create a dataset creation property list
  • Set chunking (and specify chunk dimensions)
  • Set up use of the N-Bit filter
  • Create dataset specifying this property list
  • Close property list

6
N-Bit filter usage example
    /* Define the dataset datatype (N-Bit) and set its precision and offset */
    datatype = H5Tcopy(H5T_NATIVE_INT);
    precision = 17;
    H5Tset_precision(datatype, precision);
    offset = 4;
    H5Tset_offset(datatype, offset);

    /* Set the dataset creation property list for N-Bit */
    chunk_size[0] = CH_NX;
    chunk_size[1] = CH_NY;
    properties = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(properties, 2, chunk_size);
    H5Pset_nbit(properties);

    /* Create a new dataset with the N-Bit datatype */
    dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, properties);

7
N-bit filter restrictions
  • Only compresses an N-Bit datatype, or a field of one,
    derived from an integer or floating-point type
  • Assumes the padding bits are zero
  • The fill value is not treated differently from other values

8
ScaleOffset Filter Outline
  • Definition
  • How the ScaleOffset filter works
  • Why use the ScaleOffset filter
  • Usage examples
  • Performance with EOS data
  • Limitations

9
ScaleOffset filter
  • ScaleOffset compression performs a scale and/or
    offset operation on each data value and truncates
    the resulting value to a minimum number of bits
    before storing it. The datatype is either integer
    or floating-point.
  • "Offset" in ScaleOffset compression means the
    minimum value of a set of data values
  • If a fill value is defined for the dataset, the
    filter will ignore it when finding the minimum
    value

10
How ScaleOffset works
  • An example for integer type
  • Maximum is 7065, minimum is 2970
  • The "span" = Max - Min + 1 = 4096
  • Case 1: No fill value is defined.
  • Minimum number of bits per element to store:
    ceiling(log2(span)) = 12
  • Case 2: A fill value is defined in this array.
  • Minimum number of bits per element to store:
    ceiling(log2(span + 1)) = 13
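
The bit count can be checked with a few lines of C. This is a minimal sketch that assumes 32-bit signed input; the names `ceil_log2_u64` and `scaleoffset_min_bits` are hypothetical helpers, not HDF5 functions. For the maximum 7065 and minimum 2970 above, max - min + 1 is 4096, which needs 12 bits, and reserving one more code for a fill value pushes it to 13.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b such that 2^b >= count, i.e. ceiling(log2(count)). */
unsigned ceil_log2_u64(uint64_t count)
{
    unsigned bits = 0;
    while (bits < 64 && (UINT64_C(1) << bits) < count)
        bits++;
    return bits;
}

/* Minimum number of bits per element for integer data in [min, max]. */
unsigned scaleoffset_min_bits(int32_t min, int32_t max, int has_fill_value)
{
    assert(min <= max);
    /* span = max - min + 1, computed in 64 bits to avoid overflow */
    uint64_t span = (uint64_t)((int64_t)max - (int64_t)min) + 1;
    if (has_fill_value)
        span += 1;  /* one extra code is reserved for the fill value */
    return ceil_log2_u64(span);
}
```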

11
How ScaleOffset works
  • An example for integer type (cont.)
  • Compression
  • 1. Subtract the minimum value from each element
  • 2. Pack all data with the minimum number of bits
  • Decompression
  • 1. Unpack all data
  • 2. Add the minimum value to each element
  • Outcome
  • 1. Saves about 60% of disk space for this case
    (12 bits per element instead of 32, assuming
    32-bit integers)
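
The compression and decompression steps can be sketched as follows. This is an illustration under stated assumptions, not the HDF5 implementation: the names `so_compress` and `so_decompress` are hypothetical, the input is 32-bit signed, and bits are written most significant first, which may differ from the filter's real on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subtract `min` from each element and pack the low `nbits` bits of the
   result into `out`. `out` must hold (n * nbits + 7) / 8 bytes.
   Returns the number of bytes written. */
size_t so_compress(const int32_t *data, size_t n, int32_t min,
                   unsigned nbits, uint8_t *out)
{
    assert(nbits >= 1 && nbits <= 32);
    size_t total_bits = n * nbits, bitpos = 0;
    memset(out, 0, (total_bits + 7) / 8);
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)data[i] - (uint32_t)min;  /* step 1 */
        for (unsigned b = nbits; b-- > 0; bitpos++)      /* step 2 */
            if ((v >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
    }
    return (total_bits + 7) / 8;
}

/* Unpack `n` values of `nbits` bits each and add `min` back. */
void so_decompress(const uint8_t *in, size_t n, int32_t min,
                   unsigned nbits, int32_t *data)
{
    assert(nbits >= 1 && nbits <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < nbits; b++, bitpos++)   /* step 1 */
            v = (v << 1) | ((in[bitpos / 8] >> (7 - bitpos % 8)) & 1u);
        data[i] = (int32_t)(v + (uint32_t)min);          /* step 2 */
    }
}
```

Four 32-bit elements packed at 12 bits each take 6 bytes instead of 16, a saving of 62.5%.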

12
How ScaleOffset works
  • An example for floating-point type
  • D-scaling factor: the number of decimal digits of
    precision to keep in the filter output
  • Floating-point data: 104.561, 99.459, 100.545,
    105.644
  • D-scaling factor: 2
  • Preparation for compression
  • 1. Calculate the minimum value: 99.459
  • 2. Subtract the minimum value
  • Modified data: 5.102, 0, 1.086, 6.185
  • 3. Scale the data by multiplying by
    10^(D-scaling factor) = 100
  • Modified data: 510.2, 0, 108.6, 618.5
  • 4. Round the data to integers
  • Modified data: 510, 0, 109, 619
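
The four preparation steps can be sketched in C. The names `pow10_int` and `dscale_quantize` are hypothetical, the sketch works in double precision, and it assumes a non-negative D-scaling factor. Note that the last sample lands on a rounding boundary (618.5), so in binary floating point it may come out as 618 or 619.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 10 raised to a non-negative integer power. */
double pow10_int(int d)
{
    double s = 1.0;
    for (int i = 0; i < d; i++)
        s *= 10.0;
    return s;
}

/* Steps 1-4: find the minimum, subtract it, scale by 10^d, round.
   Writes the integers to `out` and returns the minimum (the offset). */
double dscale_quantize(const double *data, size_t n, int d, uint32_t *out)
{
    assert(n > 0 && d >= 0);
    double min = data[0];
    for (size_t i = 1; i < n; i++)          /* step 1 */
        if (data[i] < min)
            min = data[i];
    double scale = pow10_int(d);
    for (size_t i = 0; i < n; i++) {
        double x = (data[i] - min) * scale; /* steps 2 and 3, x >= 0 */
        out[i] = (uint32_t)(x + 0.5);       /* step 4: round to nearest */
    }
    return min;
}
```

The resulting integers can then go through the integer path (minimum number of bits, then packing).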

13
How ScaleOffset works
  • An example for floating-point type (cont.)
  • Compression and decompression: using ScaleOffset
    for integers
  • Restoration after decompression
  • 1. Divide each value by 10^(D-scaling factor)
  • 2. Add the offset, 99.459
  • 3. The restored floating-point data:
    104.559, 99.459, 100.549, 105.649
  • Outcome
  • 1. Lossy compression
  • 2. The compression ratio depends on the D-scaling
    factor
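
Restoration is the inverse of the preparation steps. The sketch below uses hypothetical names (`pow10_int`, `dscale_restore`, `dscale_max_error`) and assumes a non-negative D-scaling factor. It also makes the loss explicit: because values are rounded to the nearest integer after scaling, each restored value can differ from the original by up to 0.5 / 10^D, which is 0.005 for D = 2.

```c
#include <assert.h>
#include <stdint.h>

/* 10 raised to a non-negative integer power. */
double pow10_int(int d)
{
    double s = 1.0;
    for (int i = 0; i < d; i++)
        s *= 10.0;
    return s;
}

/* Steps 1-2: divide the stored integer by 10^d, then add the offset. */
double dscale_restore(uint32_t q, double offset, int d)
{
    assert(d >= 0);
    return (double)q / pow10_int(d) + offset;
}

/* Largest rounding error introduced for a given D-scaling factor. */
double dscale_max_error(int d)
{
    assert(d >= 0);
    return 0.5 / pow10_int(d);
}
```

For the first sample, 510 is restored to about 104.559 against an original of 104.561, an error of 0.002.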

14
How the Scale-Offset filter compresses floating-point
data
  • Uses the GRIB data packing method
  • The Scale-Offset compression of floating-point
    data is lossy
  • Two scaling methods
  • D-scaling
  • E-scaling
  • Currently only D-scaling is implemented

15
Why ScaleOffset Filter?
  • Internal HDF5 filter
  • Easy to understand
  • Integer: lossless or lossy
  • Floating-point: GRIB lossy compression
  • Easy to control floating-point compression
  • D-scaling factor
  • Easy to estimate the compression ratio

16
H5Pset_scaleoffset API
herr_t H5Pset_scaleoffset(hid_t plist_id,
    H5Z_SO_scale_type_t scale_type, int scale_factor)
  • plist_id: IN, dataset creation property list
    identifier
  • scale_type: IN, flag indicating compression method
  • H5Z_SO_FLOAT_DSCALE (0): floating-point type
  • H5Z_SO_INT (2): integer type
  • scale_factor: IN, parameter related to the scale type
  • If scale_type is H5Z_SO_FLOAT_DSCALE,
    scale_factor denotes the decimal scale factor
  • If scale_type is H5Z_SO_INT,
    scale_factor denotes minimum-bits, and should be a
    positive integer or H5Z_SO_INT_MINBITS_DEFAULT

17
Integer example
    /* Set the fill value of the dataset */
    fill_val = 10000;
    H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);

    /* Set parameters for Scale-Offset compression */
    H5Pset_scaleoffset(properties, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);

    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);

18
Floating-point example
    /* Set the fill value of the dataset */
    fill_val = 10000.0;
    H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);

    /* Set parameters for Scale-Offset compression:
       use the D-scaling method and set the decimal scale factor to 3 */
    H5Pset_scaleoffset(properties, H5Z_SO_FLOAT_DSCALE, 3);

    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);

19
(No Transcript)
20
Limitations
  • The range of compressed floating-point data is
    limited by the size of the corresponding unsigned
    integer type.
  • Long double is not supported.
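
One way to read the range limitation is as a pre-check on the data: after subtracting the minimum and scaling by 10^D, the largest value has to fit in an unsigned integer of the same size as the floating-point type. The sketch below assumes a 4-byte float paired with a 32-bit unsigned integer. The names `pow10_int` and `dscale_fits_uint32` are hypothetical, and the exact bound the filter applies (for example, whether it reserves a code for the fill value) is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* 10 raised to a non-negative integer power. */
double pow10_int(int d)
{
    double s = 1.0;
    for (int i = 0; i < d; i++)
        s *= 10.0;
    return s;
}

/* Returns 1 if (max - min) * 10^d stays below UINT32_MAX, else 0. */
int dscale_fits_uint32(double min, double max, int d)
{
    assert(min <= max && d >= 0);
    double scaled_span = (max - min) * pow10_int(d);
    return scaled_span < (double)UINT32_MAX;
}
```

For example, data spanning 0 to 1.0e9 with a D-scaling factor of 3 would scale to 1.0e12, which does not fit in 32 bits.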