Title: HDF5 Life cycle of data
1. HDF5: Life cycle of data
2. Outline
- Life cycle of HDF5 data
- I/O operations for datasets with different storage layouts
  - Compact dataset
  - Contiguous dataset
    - Datatype conversion
    - Partial I/O for contiguous dataset
  - Chunked dataset
    - I/O for chunked dataset
- Variable length datasets and I/O
3. Life cycle of HDF5 data
- Life cycle: what happens to data when it is transferred from the application buffer to an HDF5 file?
[Figure: layers the data passes through, top to bottom: Application (data buffer); Object API (H5Dwrite); Library internals (the "magic box"); Virtual file I/O (unbuffered I/O); File or other storage (data in a file)]
4. Life cycle of HDF5 data inside the magic box
- Operations on data inside the magic box
  - Datatype conversion
  - Scattering and gathering
  - Data transformation (filters, compression)
  - Copying to/from internal buffers
- Concepts involved
  - HDF5 metadata, metadata cache
  - Chunking, chunk cache
- Data structures used
  - B-trees (groups, dataset chunks)
  - Hash tables
  - Local and global heaps (variable-length data: link names, strings, etc.)
5. Life cycle of HDF5 data inside the magic box
- Understanding what happens to data inside the magic box helps in writing efficient applications
- The HDF5 library has mechanisms to control behavior inside the magic box
- Goals of this and the next talk are to
  - Introduce the basic concepts and internal data structures and explain how they affect performance and storage sizes
  - Give some recipes for improving performance
6. Operations on data inside the magic box
- Datatype conversion
  - Examples
    - float <-> integer
    - LE <-> BE
    - 64-bit integer to 16-bit integer (overflow may occur!)
- Scattering and gathering
  - Data is scattered/gathered from/to the user's buffers into internal buffers for datatype conversion and partial I/O
- Data transformation (filters, compression)
  - Checksum on raw data and metadata (in 1.8.0)
  - Algebraic transform
  - GZIP and SZIP compression
  - User-defined filters
- Copying to/from internal buffers
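The conversion examples above can be made concrete with a small, library-free sketch. It is not the HDF5 conversion code: the function names are hypothetical, and clamping on overflow is an assumed policy chosen for illustration. It shows why a 64-bit to 16-bit conversion can lose information and what an LE <-> BE conversion does to the bytes of a value.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v can be stored in a 16-bit signed integer without loss. */
int fits_in_i16(int64_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Narrow a 64-bit integer to 16 bits, clamping out-of-range values to
 * the nearest representable value (assumed policy for this sketch). */
int16_t clamp_i64_to_i16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Reverse the byte order of a 32-bit value (LE <-> BE). */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Small self-check that doubles as a usage example. */
void conversion_examples(void)
{
    assert(fits_in_i16(100));
    assert(!fits_in_i16(70000));            /* overflow would occur */
    assert(clamp_i64_to_i16(70000) == INT16_MAX);
    assert(swap_u32(0x11223344u) == 0x44332211u);
}
```

Every element that needs conversion passes through code of this kind, which is why avoiding conversion (matching memory and file datatypes) saves work.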
7. Life cycle of HDF5 data inside the magic box
- HDF5 metadata
  - Information about HDF5 objects used by the library
  - Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, super-block, etc.
  - Usually small compared to raw data sizes (KB vs. MB-GB)
- Metadata cache
  - Space allocated to handle pieces of the HDF5 metadata
  - Allocated by the HDF5 library in the application's memory space
  - Cache behavior affects overall performance
  - Will be covered in the next talk
8. Life cycle of HDF5 data inside the magic box
- Chunking mechanism
  - Chunking: a storage layout in which a dataset is partitioned into fixed-size multi-dimensional tiles, or chunks
  - Used for extendible datasets and datasets with filters applied (checksum, compression)
  - The HDF5 library treats each chunk as an atomic object
  - Greatly affects performance and file sizes
- Chunk cache
  - Created for each chunked dataset
  - Default size is 1 MB
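Because each chunk is treated as an atomic object, it is worth checking how a chosen chunk shape compares with the chunk cache. The following is a minimal arithmetic sketch with hypothetical helper names; the 1 MB figure is the default stated above, and the model ignores any per-chunk bookkeeping the library may add.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one chunk in bytes: product of chunk dimensions times element size. */
size_t chunk_bytes(const size_t *chunk_dims, int rank, size_t elem_size)
{
    assert(rank > 0 && elem_size > 0);
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= chunk_dims[i];
    return n;
}

/* Returns 1 if a chunk of the given size fits entirely in the cache. */
int fits_in_cache(size_t one_chunk_bytes, size_t cache_bytes)
{
    return one_chunk_bytes <= cache_bytes;
}

/* Number of whole chunks the cache can hold at once. */
size_t whole_chunks_in_cache(size_t cache_bytes, size_t one_chunk_bytes)
{
    assert(one_chunk_bytes > 0);
    return cache_bytes / one_chunk_bytes;
}
```

For example, a 256 x 256 chunk of 4-byte elements occupies 262144 bytes, so four such chunks fit in a 1 MB (1048576-byte) cache, while a 1024 x 1024 chunk of 8-byte elements (8 MB) does not fit at all.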
9. Writing a dataset
10. I/O operations for HDF5 datasets with different storage layouts
- Storage layouts
  - Compact
  - Contiguous
  - Chunked
- I/O performance depends on
  - Dataset storage properties
  - Chunking strategy
  - Metadata cache performance
  - Etc.
11. Writing a compact dataset
[Figure labels: application memory; metadata cache; dataset header (datatype, dataspace, ..., attribute 1, attribute 2, data); file]
- Raw data is stored within the dataset header
12. Writing a contiguous dataset with no datatype conversion
[Figure labels: metadata cache; dataset header (datatype, dataspace, ..., attribute 1, attribute 2); user buffer (matrix 5x4x7); file]
13. Writing a contiguous dataset with conversion
[Figure labels: application memory; dataset raw data; metadata cache; dataset header (datatype, dataspace, ..., attribute 1, attribute 2); conversion buffer (1 MB); file (dataset header, dataset raw data)]
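A rough way to reason about the cost of the conversion buffer is to count how many buffer-sized passes a write needs. The sketch below is a simplified model, not library code: the helper names are hypothetical, and it assumes each element occupies the larger of the source and destination type sizes while it is in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Elements that fit in the conversion buffer at once, assuming each
 * element needs room for the wider of the two datatypes. */
size_t elems_per_pass(size_t buf_bytes, size_t src_size, size_t dst_size)
{
    size_t widest = src_size > dst_size ? src_size : dst_size;
    assert(widest > 0);
    return buf_bytes / widest;
}

/* Number of gather/convert/write passes needed for nelems elements. */
size_t conversion_passes(size_t nelems, size_t buf_bytes,
                         size_t src_size, size_t dst_size)
{
    size_t per_pass = elems_per_pass(buf_bytes, src_size, dst_size);
    assert(per_pass > 0);   /* buffer must hold at least one element */
    return (nelems + per_pass - 1) / per_pass;
}
```

Under this model, the 5x4x7 matrix (140 elements) converts in a single pass, while 10,000,000 elements converted from a 4-byte to an 8-byte type through a 1 MB (1048576-byte) buffer take 77 passes. A larger buffer reduces the number of passes.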
14. Sub-setting of contiguous dataset: series of adjacent rows
[Figure: application data in memory (dimensions labelled N and M); the M selected rows go to the file in one I/O operation]
- Data is contiguous in the file
15. Sub-setting of contiguous dataset: adjacent, partial rows
[Figure: application data in memory (dimensions labelled N and M); N elements are selected from each of M rows, requiring several small I/O operations]
- Data is scattered in the file in M contiguous blocks
16. Sub-setting of contiguous dataset: extreme case, writing a column
[Figure: application data in memory (dimensions labelled N and M); 1 element is selected from each of M rows, requiring several small I/O operations]
- Data is scattered in the file in M different locations
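The three sub-setting cases (full adjacent rows, partial rows, a single column) differ only in how many contiguous runs the selection forms in the file. Here is a minimal sketch of that count for a row-major 2-D dataset; the function names are hypothetical, and the model counts one I/O operation per contiguous run with no buffering.

```c
#include <assert.h>
#include <stddef.h>

/* Number of contiguous runs in the file for a selection of m rows by
 * n elements per row, in a row-major dataset that is ncols wide. */
size_t contiguous_runs(size_t ncols, size_t m, size_t n)
{
    assert(n <= ncols);
    if (m == 0 || n == 0) return 0;
    if (n == ncols) return 1;   /* full adjacent rows form one block */
    return m;                   /* one run per partial row */
}

/* Number of elements in each run. */
size_t run_length(size_t ncols, size_t m, size_t n)
{
    assert(n <= ncols);
    if (m == 0 || n == 0) return 0;
    return (n == ncols) ? m * ncols : n;
}
```

With a dataset 100 elements wide, selecting 10 full rows gives 1 run of 1000 elements, selecting 40 elements from each of 10 rows gives 10 runs of 40 elements, and selecting one column across 10 rows gives 10 runs of 1 element.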
17. Sub-setting of contiguous dataset: data sieve buffer
[Figure: application data in memory (dimensions labelled N and M; 1 element selected per row); data is gathered in a 64 KB sieve buffer in memory and moved with memcopy; file]
- Data is scattered in the file in M contiguous blocks
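To see why a sieve buffer helps, one can estimate how many buffer-sized file accesses replace the M small ones. This is a simplified model under stated assumptions, not the library's algorithm: it assumes each buffer load starts at the next needed element, ignores alignment, and uses hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>

/* Selected elements served by one sieve-buffer load, for elements of
 * elem_bytes each that are stride_bytes apart in the file. */
size_t elems_per_load(size_t sieve_bytes, size_t stride_bytes,
                      size_t elem_bytes)
{
    assert(elem_bytes > 0 && stride_bytes >= elem_bytes);
    if (sieve_bytes < elem_bytes) return 0;
    return (sieve_bytes - elem_bytes) / stride_bytes + 1;
}

/* File accesses needed to cover m strided elements through the buffer. */
size_t sieve_loads(size_t m, size_t sieve_bytes, size_t stride_bytes,
                   size_t elem_bytes)
{
    size_t per_load = elems_per_load(sieve_bytes, stride_bytes, elem_bytes);
    assert(per_load > 0);
    return (m + per_load - 1) / per_load;
}
```

For a column of 4-byte elements in a dataset 1024 elements wide (stride 4096 bytes), a 64 KB (65536-byte) buffer covers 16 selected elements per load, so 1000 elements need 63 file accesses instead of 1000. The remaining per-element work becomes memory copies.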
18. Performance tuning for contiguous dataset
- Datatype conversion
  - Avoid for better performance
  - Use the H5Pset_buffer function to customize the conversion buffer size
- Partial I/O
  - Write/read in big contiguous blocks (at least the size of a file system block)
  - Use H5Pset_sieve_buf_size to improve performance for complex subsetting
19. Possible tuning work
- Datatype conversion
  - Use of multiple threads for datatype conversion
- Partial I/O
  - OS vector I/O
  - Asynchronous I/O
20. Writing chunked dataset
- Dimension sizes: X x Y x Z
- Dataset is partitioned into fixed-size multi-dimensional chunks of sizes X/4 x Y/2 x Z
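The partitioning above can be checked with a little arithmetic: chunks of X/4 x Y/2 x Z divide the dataset into 4 x 2 x 1 = 8 chunks. The sketch below generalizes this to dimensions that are not exact multiples of the chunk size; the function names are hypothetical and nothing here calls the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>

/* Chunks needed to cover one dimension (ceiling division). */
size_t chunks_along(size_t dim, size_t chunk_dim)
{
    assert(chunk_dim > 0);
    return (dim + chunk_dim - 1) / chunk_dim;
}

/* Total number of chunks covering a dataset of the given rank. */
size_t chunk_count(const size_t *dims, const size_t *chunk_dims, int rank)
{
    size_t total = 1;
    for (int i = 0; i < rank; i++)
        total *= chunks_along(dims[i], chunk_dims[i]);
    return total;
}

/* Index, along one dimension, of the chunk that holds coordinate c. */
size_t chunk_index(size_t c, size_t chunk_dim)
{
    assert(chunk_dim > 0);
    return c / chunk_dim;
}
```

For an 8 x 6 x 5 dataset with 2 x 3 x 5 chunks the count is 8, matching the X/4 x Y/2 x Z example. A 10 x 10 dataset with 4 x 4 chunks needs 9 chunks, because partial chunks at the edges still occupy a whole chunk.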
21. Extending chunked dataset in any dimension
- Data can be added in any dimension
- Compression is applied to each chunk
- Datatype conversion is applied to each chunk
22. Writing chunked dataset
[Figure labels: chunked dataset with chunks A, B, C; chunk cache; filter pipeline; file containing chunks A, B, C]
- Each chunk is written as a contiguous blob
- Chunks may be scattered all over the file
- Compression is performed when a chunk is evicted from the chunk cache
- Other filters are applied when data goes through the filter pipeline (e.g. encryption)
23. Writing chunked dataset
[Figure labels: application memory; metadata cache (Dataset_1 header ... Dataset_N header, chunking B-tree nodes); chunk cache (default size is 1 MB)]
- Size of the chunk cache is set for the file
- Each chunked dataset has its own chunk cache
- A chunk may be too big to fit into the cache
- Memory may grow if the application keeps opening datasets
24. Partial I/O for chunked dataset
- Build a list of chunks and loop through the list
- For each chunk
  - Bring the chunk into memory
  - Map the selection in memory to the selection in the file
  - Gather elements into the conversion buffer and perform conversion
  - Scatter elements back to the chunk
  - Apply filters (compression) when the chunk is flushed from the chunk cache
- For each element, 3 memcopies are performed
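The first step above, building the list of chunks, depends on how many chunks a selection intersects. The following minimal sketch counts them for a rectangular selection in a 2-D chunked dataset. The function names are hypothetical and the code only illustrates the arithmetic, not the library's internal chunk map.

```c
#include <assert.h>
#include <stddef.h>

/* Chunks intersected along one dimension by a selection that starts at
 * coordinate start and spans count elements. */
size_t chunks_touched_1d(size_t start, size_t count, size_t chunk_dim)
{
    assert(chunk_dim > 0);
    if (count == 0) return 0;
    size_t first = start / chunk_dim;
    size_t last = (start + count - 1) / chunk_dim;
    return last - first + 1;
}

/* Chunks intersected by a rectangular 2-D selection. */
size_t chunks_touched_2d(size_t row0, size_t nrows, size_t chunk_rows,
                         size_t col0, size_t ncols, size_t chunk_cols)
{
    return chunks_touched_1d(row0, nrows, chunk_rows) *
           chunks_touched_1d(col0, ncols, chunk_cols);
}
```

With 10 x 10 chunks, a selection of 10 rows starting at row 5 and 25 columns starting at column 0 touches 2 x 3 = 6 chunks, each of which must be brought into memory and, if filters are set, run through the filter pipeline. Aligning selections to chunk boundaries reduces this count.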
25. Partial I/O for chunked dataset
[Figure labels: application memory; application buffer; chunk; memcopy (step 3)]
- Elements participating in I/O are gathered into the corresponding chunk
26. Partial I/O for chunked dataset
[Figure labels: application memory; chunk cache; chunk; gather data; conversion buffer (step 3); scatter data; file]
- On eviction from the cache, the chunk is compressed and written to the file
27. Variable length datasets and I/O
- Examples of variable-length data
  - String
    - A[0]: the first string we want to write
    - ...
    - A[N-1]: the N-th string we want to write
  - Each element is a record of variable length
    - A[0] = (1,1,0,0,0,5,6,7,8,9): length of the first record is 10
    - A[1] = (0,0,110,2005)
    - ...
    - A[N] = (1,2,3,4,5,6,7,8,9,10,11,12,...,M): length of the (N+1)-th record is M
28. Variable length datasets and I/O
- Variable-length description in an HDF5 application
    typedef struct {
        size_t len;  /* length of the data, in base-type elements */
        void  *p;    /* pointer to the data */
    } hvl_t;
- Base type can be any HDF5 type
  - H5Tvlen_create(base_type)
- 20 bytes of overhead for each element
- Raw data cannot be compressed
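To show how an application buffer of variable-length records is laid out, here is a library-free sketch. It uses a local stand-in type, vl_rec_t (a hypothetical name), with the same two fields as hvl_t so that it compiles without the HDF5 headers. The record contents are the A[0] and A[1] examples from the earlier slide, with int assumed as the base type.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in with the same two fields as hvl_t (assumption: used
 * here only so the sketch builds without the HDF5 headers). */
typedef struct {
    size_t len;  /* number of base-type elements in this record */
    void  *p;    /* pointer to the record's data */
} vl_rec_t;

/* Point a record at an existing array of n ints (no copy is made). */
vl_rec_t make_rec(int *data, size_t n)
{
    vl_rec_t r;
    r.len = n;
    r.p = data;
    return r;
}

/* Total number of base-type elements across all records. */
size_t total_elems(const vl_rec_t *recs, size_t nrecs)
{
    assert(recs != NULL || nrecs == 0);
    size_t total = 0;
    for (size_t i = 0; i < nrecs; i++)
        total += recs[i].len;
    return total;
}

/* Builds the two example records and returns their combined length. */
size_t demo_total(void)
{
    static int a0[] = {1, 1, 0, 0, 0, 5, 6, 7, 8, 9};
    static int a1[] = {0, 0, 110, 2005};
    vl_rec_t recs[2];
    recs[0] = make_rec(a0, sizeof a0 / sizeof a0[0]);
    recs[1] = make_rec(a1, sizeof a1 / sizeof a1[0]);
    return total_elems(recs, 2);
}
```

The application buffer itself is just the array of (length, pointer) pairs. The record data lives elsewhere in memory, which mirrors how the file stores references into global heaps rather than the data inline.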
29. Variable length datasets and I/O
[Figure labels: application buffer; global heaps; raw data]
- Elements in the application buffer point to global heaps where the actual data is stored
30. Writing chunked VL datasets
[Figure labels: application memory; metadata cache (B-tree nodes, dataset header); chunk cache; global heap; raw data; conversion buffer; filter pipeline; VL chunked dataset with selected region; file]
31. VL chunked dataset in a file
[Figure labels: file; dataset header; chunking B-tree; dataset chunks; raw data]
32. Variable length datasets and I/O
- Hints
  - Avoid closing/opening a file while writing VL datasets
    - Global heap information is lost
    - Global heaps may have unused space
  - Avoid writing VL datasets interchangeably
    - Data from different datasets is written to the same heap
  - If the maximum length of the record is known, use fixed-length records and compression
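The last hint can be weighed with a rough size estimate. The sketch below compares uncompressed storage for variable-length records against fixed-length records padded to the maximum length. The function names are hypothetical, and the per-record overhead is a parameter so that the 20-byte figure quoted earlier can be plugged in. Heap fragmentation and compression are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated bytes for n variable-length records: the payload plus a
 * fixed overhead per record. */
size_t vl_bytes(const size_t *lens, size_t n, size_t elem_size,
                size_t overhead_per_rec)
{
    assert(lens != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += lens[i] * elem_size + overhead_per_rec;
    return total;
}

/* Bytes for n fixed-length records, each padded to max_len elements. */
size_t fixed_bytes(size_t n, size_t max_len, size_t elem_size)
{
    return n * max_len * elem_size;
}
```

For two records of lengths 10 and 4 with 4-byte elements and 20 bytes of overhead each, the variable-length estimate is 96 bytes, while fixed-length records padded to 10 elements take 80 bytes before compression. The padding is also highly compressible, which variable-length raw data is not.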
33. Thank you!
Questions?
34. Acknowledgement
This report is based upon work supported in part
by a Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.