Title: HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999
1. HDF
HDF/HDF-EOS Workshop III, Sept. 14-16, 1999
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
2. Topics
- I. Overview
- II. NCSA HDF Activities
- III. HDF5
- IV. HDF4 vs. HDF5
3. I. HDF Overview
4. HDF Mission
To develop, promote, deploy, and support open and
free technologies that facilitate scientific data
storage, exchange, access, analysis and
discovery.
5. What is HDF?
- Scientific data file format and supporting software
- For images, arrays, tables, other structures
- Features
- Portability across architectures
- I/O library
- Files
- Efficient I/O
- Efficient storage
6. Why use HDF?
- Manage data
- Share data
- Use software that understands HDF
- Improve I/O performance
- Improve storage efficiency
- Use an open standard
7. An HDF File: A Collection of Scientific Data Objects
[Figure: HDF file containing four 3-D arrays]
8. Mixing HDF Objects in One File
[Figure: an HDF file containing a group, a 3-D array, a raster image with its palette, and a table]
Table:
Lat  lon  temp
---  ---  ----
12   23   3.1
15   24   4.2
17   21   3.6
16   35   5.7
9. HDF Software
- Utilities and applications for manipulating, viewing, and analyzing data.
- HDF I/O library
- High-level, object-specific APIs.
- Low-level API for I/O to files, etc.
- File or other data source.
[Figure: software layers labelled General Applications, Application Programming Interfaces, Low-level Interface, HDF file]
10. HDF Applications Software
- Free software
- NCSA HDF library and utilities
- Other software
- Commercial/other software that understands
- all of HDF (Noesys, IDL, HDF Explorer)
- certain HDF objects (MATLAB, WebWinds)
- certain HDF applications (SHARP, WIM)
- http://hdf.ncsa.uiuc.edu/tools.html
11. What platforms does HDF run on?
- Sun Solaris
- SGI Indy, Power Challenge, Origin, Cray C90, YMP, T3E
- HP9000, HP-Convex Exemplar
- IBM RS6000, SP2
- DEC Alpha/Digital UNIX, OpenVMS; VAX OpenVMS
- Intel Solaris x86, Linux, FreeBSD, Windows NT/98
- PowerPC Mac-OS
12. A Sampling of HDF Users
- NCSA-affiliated science teams: visualization, data exchange, fast I/O, ...
- Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software
- Boeing: space-time change detection in images
- Distributed Oceanographic Data System (DODS): remote access to earth science data
- Army Research Lab: network distributed global memory
- Center for Analysis and Prediction of Storms: fast parallel I/O, portability, multi-resolution grids
- TRAPPIST (Euro consortium): exchange, analysis, and visualization of non-destructive testing data
13. Major User 1: EOSDIS
- ESDIS Project
- open standard exchange format and I/O library for EOSDIS
- EOS applications
- HDF requirements
- Earth science data types (HDF-EOS, etc.)
- User support for scientists, data producers, etc.
- Library and file structure improvements
- HDF tools, utilities, access software
- Software maintenance and QA
14. Major User 2: ASCI
- ASCI Data Models and Formats (DMF) Group
- open standard exchange format and I/O library for ASCI
- DOE tri-lab ASCI applications
- HDF requirements
- large datasets (> a terabyte)
- ASCI data types, especially meshes
- good performance in massively parallel environments
- primarily HDF5
15. II. NCSA HDF Activities
16. Java applications
- HDF APIs
- Basis for tools that access HDF
- HDF Viewers
- HDF browser/visualizer
- HDF4 Data Server Prototype
- Lessons learned about remote access to
17. Remote Data Access
- The SDB Web-based Server-side Data Browser
- Java for remote access
- WP-ESIP DODS project
- Computational Grids (Globus/GASS)
18. HDF Standardization
- To share files, users must organize them similarly.
- HDF user groups create standard profiles
- Ways to organize data in HDF files.
- Metadata
- API
- Examples: HDF-EOS, ASCI DMF
19. HDF-EOS software layers
[Figure: layers labelled Application Programming Interfaces, Low-level Interface, HDF file]
20. HDF Configuration Record (HCR)
- To simplify the tasks of defining, comparing, and producing HDF-EOS files
- Formal (ODL) descriptions of HDF-EOS objects
21. HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT SWATH
  NAME SCAN1
  OBJECT Dimension
    NAME GeoTrack
    Size 1200
  END_OBJECT Dimension
  OBJECT Dimension
    NAME GeoCrossTrack
    Size 205
  END_OBJECT Dimension
  OBJECT Dimension
    NAME DataX
    Size 2410
  END_OBJECT Dimension
END_OBJECT SWATH
END
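The nested OBJECT/END_OBJECT structure of the swath description above can be walked with a small stack-based parser. The following is only an illustrative sketch, not one of the HCR utilities: the function name `parse_hcr` is hypothetical, and it assumes the whitespace-separated keyword/value layout shown on this slide (optional `=` signs and `/* ... */` comments are tolerated).

```python
import re

# The swath description from the slide, one keyword/value pair per line.
HCR_TEXT = """
/* Project XYZ */
OBJECT SWATH
  NAME SCAN1
  OBJECT Dimension
    NAME GeoTrack
    Size 1200
  END_OBJECT Dimension
  OBJECT Dimension
    NAME GeoCrossTrack
    Size 205
  END_OBJECT Dimension
  OBJECT Dimension
    NAME DataX
    Size 2410
  END_OBJECT Dimension
END_OBJECT SWATH
END
"""


def parse_hcr(text):
    """Parse keyword/value lines into nested dicts (hypothetical helper)."""
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]  # the innermost open object is always stack[-1]
    for raw in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", raw).strip()  # drop /* ... */ comments
        if not line or line == "END":
            continue
        parts = line.replace("=", " ").split()
        if len(parts) != 2:
            raise ValueError(f"expected 'KEY VALUE', got: {line!r}")
        key, value = parts
        if key == "OBJECT":
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            if len(stack) == 1 or stack[-1]["type"] != value:
                raise ValueError(f"unbalanced END_OBJECT {value}")
            stack.pop()
        else:
            # Plain field: store digits as int, everything else as text.
            stack[-1]["fields"][key] = int(value) if value.isdigit() else value
    if len(stack) != 1:
        raise ValueError("missing END_OBJECT")
    return root


swath = parse_hcr(HCR_TEXT)["children"][0]
dims = {d["fields"]["NAME"]: d["fields"]["Size"] for d in swath["children"]}
print(swath["type"], swath["fields"]["NAME"], dims)
```

Once the description is in this form, comparing two records reduces to comparing dictionaries, which is the kind of task the HCR is meant to simplify.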
22. HCR
- HCR Utilities
- Converters between HCR and HDF-EOS
- Edit HCR and HDF-EOS
- Compare HCR with HDF-EOS file
- Current projects
- Extend HCR converters to all of HDF4
- Similar work with HDF5
- XML too
23. III. HDF5
24. Why HDF5?
- HDF shortcomings exposed by EOSDIS, ASCI and others...
- Limits on object and file size (<2GB)
- Limited number of objects (<20K)
- Rigid data models
- I/O performance
- Aging software infrastructure (code entropy)
25. ...new demands...
- Bigger, faster machines and storage systems
- massive parallelism, parallel file systems
- teraflop speeds, terabyte storage
- Greater complexity
- complex data structures
- complex subsetting
- More emphasis on remote distributed access
26. ...and ASCI requirements
- Compatibility with vector bundle model
- Compatibility with MPI-IO
- Ability to transform data between memory and storage
- Parallel file systems: PIOFS, HPSS, etc.
27. New HDF5 Features
- More scalable
- Larger arrays and files
- More objects
- Improved data model
- New datatypes
- Single comprehensive dataset object
- Improved software
- More flexible, robust library
- More flexible API
- More I/O options
28. HDF5 data model
- Two primary objects
- Dataset
- multidimensional array of elements
- rich variety of datatypes
- group
- directory-like structure
- contains datasets, groups, other objects
29. Dataset components
- multidimensional array
- header with metadata
- datatype
- dataspace
- attributes
- storage properties
30. Simple datatypes
- The usual scalars: integer, float
- user-defined scalars (e.g. 13-bit integers)
- variable length (e.g. strings)
- pointers to objects or regions of datasets
- enumeration
- opaque
31. Compound datatypes
- User-defined
- Comparable to C structs
- Members can be simple or compound types
- Members can be multidimensional
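The comparison with C structs can be made concrete with Python's standard `ctypes` module. This is only an analogy for the compound-datatype idea, not HDF5 code: the type and field names (`Reading`, `Record`, `quality`, `window`) are hypothetical, and the sample values are borrowed from the lat/lon/temp table earlier in the talk.

```python
import ctypes


class Reading(ctypes.Structure):
    """A small compound type built only from simple members."""
    _fields_ = [
        ("lat", ctypes.c_int16),
        ("lon", ctypes.c_int16),
        ("temp", ctypes.c_double),
    ]


class Record(ctypes.Structure):
    """A compound type whose members are compound, simple, and array-valued."""
    _fields_ = [
        ("reading", Reading),                  # compound member
        ("quality", ctypes.c_uint8),           # simple member
        ("window", (ctypes.c_int16 * 3) * 2),  # 2 x 3 array member
    ]


# A 5 x 3 array of records, zero-initialised.
Grid = (Record * 3) * 5
grid = Grid()

rec = grid[4][2]  # refers to the element in place, not a copy
rec.reading.lat, rec.reading.lon, rec.reading.temp = 12, 23, 3.1
rec.window[1][2] = 7

print(len(grid), len(grid[0]), grid[4][2].reading.lat, grid[4][2].window[1][2])
```

Each element of the grid is one record, so a single array carries several fields per point.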
32. Data Spaces
- How data are organized to form a dataset
- rank
- dimensions
- Subsetting during I/O operations
- What subset of data is to be moved
- In-memory organization of data
- In-file organization of data
33. HDF5 dataset: array of records
Dimensionality: 5 x 3
34. Dataspaces: Reading Dataset into Memory from File
35. Selection
Examples of mappings between file selections and memory selections.
(a) A hyperslab from a 2D array to the corner of a smaller 2D array
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(c) A sequence of points from a 2D array to a sequence of points in a 3D array
(d) A union of slabs in the file to a union of slabs in memory. The number of elements must be equal.
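Mapping (b) can be modelled in a few lines of plain Python. This sketch does not call the HDF5 library. It borrows the start/stride/count/block vocabulary that HDF5 hyperslab selections use, while the helper names `hyperslab_coords` and `copy_selection`, the array sizes, and the values are all made up for illustration.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """Return the coordinates of a regular hyperslab in row-major order."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # 'count' blocks of 'block' consecutive indices, 'stride' apart.
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))


def copy_selection(src, src_coords, dst, dst_coords):
    """Copy selected elements of src into the selected positions of dst."""
    if len(src_coords) != len(dst_coords):
        raise ValueError("selections must contain the same number of elements")
    for sc, dc in zip(src_coords, dst_coords):
        value = src
        for i in sc:            # walk down to the source element
            value = value[i]
        target = dst
        for i in dc[:-1]:       # walk down to the innermost destination list
            target = target[i]
        target[dc[-1]] = value


# "File" array: 6 x 8, each value encodes its position as row*10 + column.
file2d = [[r * 10 + c for c in range(8)] for r in range(6)]
memory = [0] * 12

# Two rows (1 and 3), each contributing two blocks of two columns.
file_sel = hyperslab_coords(start=(1, 0), stride=(2, 3), count=(2, 2), block=(1, 2))
# Eight contiguous elements starting at offset 3 of the 1D array.
mem_sel = hyperslab_coords(start=(3,), stride=(1,), count=(8,), block=(1,))

copy_selection(file2d, file_sel, memory, mem_sel)
print(memory)
```

The length check in `copy_selection` mirrors the rule that both selections must contain the same number of elements.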
36. Attributes
- Named pieces of data
- Stored in a dataset or group header
- Operations are scaled-down versions of the dataset operations
- Not extendible
- No compression
- No partial I/O
37. Property list
- Properties of objects or operations
- Describe how to create, store, access and
transfer data
38. Some Properties
- chunked: better subsetting access time; extendable
- compressed: improves storage efficiency, transmission speed
- extendable: datasets can be extended in any direction
- split file: metadata in one file, raw data in another
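Why chunking helps subsetting can be seen by counting which chunks a rectangular subset touches. The sketch below is a back-of-the-envelope model rather than library code; the helper name `chunks_touched`, the 100 x 100 dataset, and the 10 x 10 chunk shape are assumptions chosen for illustration.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the grid indices of every chunk a rectangular subset overlaps."""
    ranges = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # chunk holding the first selected index
        last = (s + n - 1) // c   # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


chunk_shape = (10, 10)  # a 100 x 100 dataset would hold 100 such chunks

# A 10 x 5 window near the right edge overlaps only two chunks.
window = chunks_touched(start=(5, 95), count=(10, 5), chunk_shape=chunk_shape)

# One full column of 100 elements overlaps ten chunks.
column = chunks_touched(start=(0, 3), count=(100, 1), chunk_shape=chunk_shape)

print(window, len(column))
```

Under this model a reader only needs the chunks that overlap the request, and when chunks are compressed individually only those chunks need decompressing.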
39. Dataset components
[Figure: a dataset divided into metadata and data. The metadata shown in the figure:]
- Dataspace: Rank2, Dim_15, Dim_24, Dim_32
- Datatype: int16
- Attributes: time 32.4, pressure 987, temp 56
- Storage properties: chunked, compressed
40. Groups
- Structures for organizing the file
- Like Vgroups in HDF4
- Like directories in hierarchical file system
- Every file starts with a root group
- Groups have attributes
41. Groups
- A mechanism for collections of related objects
- Every file starts with a root group
- Can have attributes
- Like directories in Unix, but a graph, rather
than a tree
[Figure: groups under root]
42. Groups
- Groups and members of groups can be shared
[Figure: groups under root with shared members]
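The difference between a tree and a graph shows up as soon as one object is linked from two groups. The following toy model uses plain Python objects and is not the HDF5 API; the `Group` class, the `resolve` helper, and the member names are hypothetical.

```python
class Group:
    """A directory-like container: named links to members, plus attributes."""

    def __init__(self):
        self.members = {}
        self.attrs = {}

    def link(self, name, obj):
        """Add a named link to obj and return obj for convenience."""
        self.members[name] = obj
        return obj


def resolve(root, path):
    """Follow a slash-separated path from the root group."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.members[part]
    return node


root = Group()
a = root.link("a", Group())
b = root.link("b", Group())

temps = a.link("temps", [3.1, 4.2, 3.6, 5.7])  # stands in for a dataset
b.link("shared_temps", temps)                  # second link to the same object

resolve(root, "/b/shared_temps").append(9.9)   # a change via one path...
print(resolve(root, "/a/temps"))               # ...is visible via the other
```

Because both paths lead to the same object, nothing is copied.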
43. Mounting
[Figure: mounting File B]
44. Reading and writing with HDF5
- Set properties
- Describe the data
- datatypes
- rank and dimensions
- mapping between file and memory
- Read/write
45. Files needn't be files: Virtual File Layer
VFL: A public API for writing I/O drivers
[Figure: an hid_t file handle passes through the VFL (Virtual File I/O Layer) to the I/O drivers (memory, mpio, stdio, network), which reach the underlying storage (memory, network, files)]
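The idea of a driver layer can be illustrated with a small abstract interface and two interchangeable back ends. This is a conceptual sketch in plain Python, not the VFL API itself: the class names `Driver`, `MemoryDriver`, and `StdioDriver`, and the `store_and_load` helper, are hypothetical.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Minimal driver interface: byte-addressed reads and writes."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(Driver):
    """Keeps the whole 'file' in a growable in-memory buffer."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(bytes(end - len(self.buf)))  # zero-fill the gap
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])


class StdioDriver(Driver):
    """Stores the bytes in an ordinary file on disk."""

    def __init__(self, path):
        self.f = open(path, "w+b")

    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)

    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()


def store_and_load(driver):
    """Code written against the interface works with any driver."""
    driver.write(8, b"HDF5 data")
    return driver.read(8, 9)


memory_result = store_and_load(MemoryDriver())

with tempfile.TemporaryDirectory() as tmp:
    disk = StdioDriver(os.path.join(tmp, "demo.bin"))
    file_result = store_and_load(disk)
    disk.close()

print(memory_result == file_result)
```

The calling code never learns where the bytes live, which is what lets a "file" be memory, a network resource, or a set of files.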
46. HDF5 tools
- Current
- hdf5ls - lists contents of HDF5 file
- h5dumper - higher level view
- HDF5-to-HDF4 converter
- Future
- Convert HDF5 to ascii, binary, GIFF, etc.
- Convert HDF4 to HDF5
- Java tools - VisAD, etc.
- File/code generation from DDL description
- Talking to vendors
47. Other HDF5 activities
- Performance tuning
- Object model
- Fortran and C API
- Thread-safe HDF5
48. IV. HDF4 vs. HDF5
49. HDF4 vs. HDF5
- HDF4
- Original format and library
- Compatible with all earlier versions
- 6 primary objects
- multidim array of scalars
- raster image, palette
- table
- annotation
- group
- Biggest current user: Earth Observing System Data and Information System (EOSDIS)
- HDF5 - successor to HDF4
- New format and library
- Not compatible with earlier versions
- 2 primary objects
- multidim. array of records
- group
- Biggest current user: Accelerated Strategic Computing Initiative (ASCI)
50. HDF4 object types can be derived from HDF5 datasets and groups
51. Status of HDF4 vs. HDF5
- HDF4 is still an EOS standard
- HDF5 likely also
- HDF4 maintenance
- Maintained as long as EOS needs it
- Minimal new features
- New applications use HDF5 if possible!
- New features, performance improvements, etc.
52. HDF Information
- HDF Information Center
- http://hdf.ncsa.uiuc.edu/
- HDF Help email address
- hdfhelp@ncsa.uiuc.edu
- HDF users mailing list
- hdfnews@ncsa.uiuc.edu