HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999

1
HDF
HDF/HDF-EOS Workshop III, Sept. 14-16, 1999

Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
2
Topics
  • I. Overview
  • II. NCSA HDF Activities
  • III. HDF5
  • IV. HDF4 vs. HDF5

3
I. HDF Overview
4
HDF Mission
To develop, promote, deploy, and support open and
free technologies that facilitate scientific data
storage, exchange, access, analysis and
discovery.
5
What is HDF?
  • Scientific data file format and supporting software
  • For images, arrays, tables, other structures
  • Features
  • Portability across architectures
  • I/O library
  • Files
  • Efficient I/O
  • Efficient storage

6
Why use HDF?
  • Manage data
  • Share data
  • Use software that understands HDF
  • Improve I/O performance
  • Improve storage efficiency
  • Use an open standard

7
An HDF File: A Collection of Scientific Data Objects
[Figure: HDF file containing four 3-D arrays]
8
Mixing HDF Objects in One File
[Figure: one HDF file holding a group, 3-D arrays, raster images, a palette, and a table]
Example table:
Lat   lon   temp
---   ---   ----
12    23    3.1
15    24    4.2
17    21    3.6
16    35    5.7
9
HDF Software
  • Utilities and applications for manipulating,
    viewing, and analyzing data.
  • HDF I/O library
  • High-level, object-specific APIs.
  • Low-level API for I/O to files, etc.
  • File or other data source.

[Figure: software layers, top to bottom: General Applications, Application Programming Interfaces, Low-level Interface, HDF file]
10
HDF Applications Software
  • Free software
  • NCSA HDF library and utilities
  • Other software
  • Commercial/other software that understands
  • all of HDF (Noesys, IDL, HDF Explorer)
  • certain HDF objects (MATLAB, WebWinds)
  • certain HDF applications (SHARP, WIM)
  • http://hdf.ncsa.uiuc.edu/tools.html

11
What platforms does HDF run on?
  • Sun: Solaris
  • SGI: Indy, Power Challenge, Origin; Cray C90, YMP, T3E
  • HP9000, HP-Convex Exemplar
  • IBM: RS6000, SP2
  • DEC: Alpha/Digital UNIX, OpenVMS
  • VAX: OpenVMS
  • Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
  • PowerPC: Mac-OS

12
A Sampling of HDF Users
  • NCSA-affiliated science teams: visualization, data exchange, fast I/O, ...
  • Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software
  • Boeing: space-time change detection in images
  • Distributed Oceanographic Data System (DODS): remote access to earth science data
  • Army Research Lab: network distributed global memory
  • Center for Analysis and Prediction of Storms: fast parallel I/O, portability, multi-resolution grids
  • TRAPPIST (Euro consortium): exchange, analysis and visualization of non-destructive testing data
13
Major User 1: EOSDIS
  • ESDIS Project
  • open standard exchange format and I/O library for
    EOSDIS
  • EOS applications
  • HDF requirements
  • Earth science data types (HDF-EOS, etc.)
  • User support for scientists, data producers, etc.
  • Library and file structure improvements
  • HDF tools, utilities, access software
  • Software maintenance and QA

14
Major User 2: ASCI
  • ASCI Data Models and Formats (DMF) Group
  • open standard exchange format and I/O library for
    ASCI
  • DOE tri-lab ASCI applications
  • HDF requirements
  • large datasets (> a terabyte)
  • ASCI data types, especially meshes
  • good performance in massive parallel environments
  • primarily HDF5

15
II. NCSA HDF Activities
16
Java applications
  • HDF APIs
  • Basis for tools that access HDF
  • HDF Viewers
  • HDF browser/visualizer
  • HDF4 Data Server Prototype
  • Lessons learned about remote access to

17
Remote Data Access
  • The SDB Web-based Server-side Data Browser
  • Java for remote access
  • WP-ESIP DODS project
  • Computational Grids (Globus/GASS)

18
HDF Standardization
  • To share files, users must organize them
    similarly.
  • HDF user groups create standard profiles
  • Ways to organize data in HDF files.
  • Metadata
  • API
  • Examples: HDF-EOS, ASCI DMF

19
HDF-EOS software layers
[Figure: layers, top to bottom: Application Programming Interfaces, Low-level Interface, HDF file]
20
HDF Configuration Record (HCR)
  • To simplify the tasks of defining, comparing, and
    producing HDF-EOS files
  • Formal (ODL) descriptions of HDF-EOS objects

21
HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
22
HCR
  • HCR Utilities
  • Converters: HCR ↔ HDF-EOS
  • Edit HCR and HDF-EOS
  • Compare HCR with HDF-EOS file
  • Current projects
  • Extend HCR converters to all of HDF4
  • Similar work with HDF5
  • XML too

23
III. HDF5
24
Why HDF5?
  • HDF shortcomings exposed by EOSDIS, ASCI and
    others...
  • Limits on object and file size (< 2 GB)
  • Limited number of objects (< 20K)
  • Rigid data models
  • I/O performance
  • Aging software infrastructure (code entropy)

25
  • New demands...
  • Bigger, faster machines and storage systems
  • massive parallelism, parallel file systems
  • teraflop speeds, terabyte storage
  • Greater complexity
  • complex data structures
  • complex subsetting
  • More emphasis on remote distributed access

26
  • and ASCI Requirements
  • Compatibility with vector bundle model
  • Compatibility with MPI-IO
  • Ability to transform data between memory and
    storage
  • Parallel file systems: PIOFS, HPSS, etc.

27
New HDF5 Features
  • More scalable
  • Larger arrays and files
  • More objects
  • Improved data model
  • New datatypes
  • Single comprehensive dataset object
  • Improved software
  • More flexible, robust library
  • More flexible API
  • More I/O options

28
HDF5 data model
  • Two primary objects
  • Dataset
  • multidimensional array of elements
  • rich variety of datatypes
  • group
  • directory-like structure
  • contains datasets, groups, other objects

29
Dataset components
  • multidimensional array
  • header with metadata
  • datatype
  • dataspace
  • attributes
  • storage properties

30
Simple datatypes
  • The usual scalars: integer, float
  • user-defined scalars (e.g. 13-bit integers)
  • variable length (e.g. strings)
  • pointers to objects or regions of datasets
  • enumeration
  • opaque

31
Compound datatypes
  • User-defined
  • Comparable to C structs
  • Members can be simple or compound types
  • Members can be multidimensional
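
The comparison to C structs can be made concrete with a small sketch. It uses Python's standard `ctypes` module rather than the HDF5 library, so it shows only the idea of a compound record, not how HDF5 defines one. The type and field names (`Position`, `Sample`, `lat`, `lon`, `temps`) are hypothetical.

```python
import ctypes


class Position(ctypes.Structure):
    """A compound type with two simple members."""
    _fields_ = [("lat", ctypes.c_int16), ("lon", ctypes.c_int16)]


class Sample(ctypes.Structure):
    """A compound type with a compound member and a multidimensional member."""
    _fields_ = [
        ("pos", Position),                     # member that is itself compound
        ("temps", (ctypes.c_float * 3) * 2),   # 2 x 3 array member
    ]


record = Sample()
record.pos.lat = 12
record.pos.lon = 23
record.temps[1][2] = 3.5

# Every record occupies the same fixed number of bytes, which is what lets
# a dataset be treated as a plain multidimensional array of such records.
record_size = ctypes.sizeof(Sample)
```
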

32
Data Spaces
  • How data are organized to form a dataset
  • rank
  • dimensions
  • Subsetting during I/O operations
  • What subset of data is to be moved
  • In-memory organization of data
  • In-file organization of data

33
HDF5 dataset: array of records
[Figure: dataset with dimensionality 5 x 3]
34
Dataspaces: Reading Dataset into Memory from File
35
Selection: Examples of mappings between file
selections and memory selections.
(a) A hyperslab from a 2D array to the corner of
a smaller 2D array
(b) A regular series of blocks from a 2D array
to a contiguous sequence at a certain offset in a
1D array
(c) A sequence of points from a 2D array to a
sequence of points in a 3D array.
(d) Union of slabs in file to union of slabs in
memory. No. of elements must be equal.
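
Mapping (a) can be sketched in plain Python. This is an illustration of the idea only, not HDF5 code: the function `copy_hyperslab` is hypothetical, and nested lists stand in for the file and memory arrays. The parameter names `start`, `count` and `stride` follow the usual hyperslab vocabulary.

```python
def copy_hyperslab(src, start, count, dst, dst_start=(0, 0), stride=(1, 1)):
    """Copy a count[0] x count[1] selection of src into dst.

    The selection begins at `start` in src and takes every stride-th
    element; it is placed in dst beginning at `dst_start`.
    """
    for i in range(count[0]):
        for j in range(count[1]):
            value = src[start[0] + i * stride[0]][start[1] + j * stride[1]]
            dst[dst_start[0] + i][dst_start[1] + j] = value
    return dst


# "File" array: 6 x 8, each value encodes its (row, column) position.
file_array = [[10 * r + c for c in range(8)] for r in range(6)]

# "Memory" array: a smaller 4 x 4 buffer, initially zero.
memory_array = [[0] * 4 for _ in range(4)]

# Select a 2 x 3 hyperslab starting at row 1, column 2 of the file array
# and place it in the top-left corner of the memory array.
copy_hyperslab(file_array, start=(1, 2), count=(2, 3), dst=memory_array)
```

The file selection and the memory selection both contain six elements, which is the equal-count requirement stated in (d).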
36
Attributes
  • Named pieces of data
  • Stored in a dataset or group header
  • Operations are scaled-down versions of the
    dataset operations
  • Not extendible
  • No compression
  • No partial I/O

37
Property list
  • Properties of objects or operations
  • Describe how to create, store, access and
    transfer data

38
Some Properties
  • chunked: better subsetting access time; extendable
  • compressed: improves storage efficiency, transmission speed
  • extendable: datasets can be extended in any direction
  • split file: metadata in one file, raw data in another
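
Why chunking helps subsetting, and how it combines with compression, can be sketched with the Python standard library. This is a conceptual illustration under stated assumptions, not the HDF5 storage layout: the chunk shape, the dictionary index, and the functions `write_chunks` and `read_element` are all hypothetical.

```python
import struct
import zlib

CHUNK_ROWS, CHUNK_COLS = 2, 4   # hypothetical chunk shape


def write_chunks(array):
    """Split a 2-D list of ints into compressed chunks keyed by chunk index."""
    chunks = {}
    for r0 in range(0, len(array), CHUNK_ROWS):
        for c0 in range(0, len(array[0]), CHUNK_COLS):
            block = [array[r][c]
                     for r in range(r0, r0 + CHUNK_ROWS)
                     for c in range(c0, c0 + CHUNK_COLS)]
            raw = struct.pack(f"<{len(block)}i", *block)
            chunks[(r0 // CHUNK_ROWS, c0 // CHUNK_COLS)] = zlib.compress(raw)
    return chunks


def read_element(chunks, row, col):
    """Read one element by decompressing only the chunk that holds it."""
    key = (row // CHUNK_ROWS, col // CHUNK_COLS)
    raw = zlib.decompress(chunks[key])
    block = struct.unpack(f"<{CHUNK_ROWS * CHUNK_COLS}i", raw)
    return block[(row % CHUNK_ROWS) * CHUNK_COLS + (col % CHUNK_COLS)]


# 4 x 8 array whose dimensions are exact multiples of the chunk shape.
data = [[100 * r + c for c in range(8)] for r in range(4)]
stored = write_chunks(data)
value = read_element(stored, 3, 5)
```

Reading a small subset touches only the chunks that overlap it, and each chunk is compressed independently, so the rest of the array never has to be decompressed.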
39
Dataset components
[Figure: a dataset consists of metadata and data. The metadata shown:]
  • Dataspace: Rank = 2; Dim_1 = 5, Dim_2 = 4, Dim_3 = 2
  • Datatype: int16
  • Attributes: time = 32.4, pressure = 987, temp = 56
  • Storage properties: chunked, compressed
40
Groups
  • Structures for organizing the file
  • Like Vgroups in HDF4
  • Like directories in hierarchical file system
  • Every file starts with a root group
  • Groups have attributes

41
Groups
  • A mechanism for collections of related objects
  • Every file starts with a root group
  • Can have attributes
  • Like directories in Unix, but a graph, rather
    than a tree

[Figure: group structure starting at the root group]
42
Groups
  • Groups and members of groups can be shared

[Figure: group structure starting at the root group]
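
The point that groups form a graph rather than a tree, with members that can be shared, can be sketched in a few lines of Python. The `Group` class and `resolve` function are hypothetical stand-ins, not HDF5 API; a plain list stands in for a dataset.

```python
class Group:
    """A named collection of links to datasets or other groups."""

    def __init__(self):
        self.links = {}        # link name -> Group or dataset object
        self.attributes = {}   # groups can carry attributes


def resolve(root, path):
    """Follow a slash-separated path starting from the root group."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node.links[name]
    return node


root = Group()
root.links["A"] = Group()
root.links["B"] = Group()

temperature = [3.1, 4.2, 3.6, 5.7]   # stands in for a dataset
root.links["A"].links["temp"] = temperature
root.links["B"].links["shared_temp"] = temperature   # second link, same object
```

Here `/A/temp` and `/B/shared_temp` are two paths to one object, which a strict tree of directories could not express.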
43
Mounting
File B
44
Reading and writing with HDF5
  • Set properties
  • Describe the data
  • datatypes
  • rank and dimensions
  • mapping between file and memory
  • Read/write

45
Files needn't be files: Virtual File Layer
VFL: A public API for writing I/O drivers
[Figure: a file handle (hid_t) sits above the VFL (virtual file I/O layer), which dispatches to I/O drivers (memory, mpio, stdio, network) backed by storage (memory, network, files)]
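
The driver idea behind the virtual file layer can be sketched as follows. This is a minimal illustration in Python, not the VFL API itself: the `Driver` interface, its two methods, and the class names are assumptions made for the example. The library-level function only sees the interface, so the same code runs against memory or a disk file.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical driver interface: byte-level I/O at a given offset."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(Driver):
    """Keeps the 'file' in a bytearray."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buffer):
            self.buffer.extend(b"\0" * (end - len(self.buffer)))
        self.buffer[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buffer[offset:offset + size])


class FileDriver(Driver):
    """Keeps the 'file' in an ordinary file on disk."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()   # create the file if it does not exist

    def write(self, offset, data):
        with open(self.path, "r+b") as f:
            f.seek(offset)
            f.write(data)

    def read(self, offset, size):
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


def store_and_fetch(driver):
    """Library-level code that works with any driver."""
    driver.write(4, b"HDF5")
    return driver.read(4, 4)


from_memory = store_and_fetch(MemoryDriver())

with tempfile.TemporaryDirectory() as tmp:
    from_file = store_and_fetch(FileDriver(os.path.join(tmp, "example.bin")))
```
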
46
HDF5 tools
  • Current
  • hdf5ls - lists contents of HDF5 file
  • h5dumper - higher level view
  • hdf5 → hdf4 converter
  • Future
  • Convert HDF5 → ascii, binary, GIF, etc.
  • Convert HDF4 → HDF5
  • Java tools - VisAD, etc.
  • File/code generation from DDL description
  • Talking to vendors

47
Other HDF5 activities
  • Performance tuning
  • Object model
  • Fortran and C API
  • Thread-safe HDF5

48
IV. HDF4 vs. HDF5
49
HDF4 vs. HDF5
  • HDF4
  • Original format and library
  • Compatible with all earlier versions
  • 6 primary objects
  • multidim array of scalars
  • raster image, palette
  • table
  • annotation
  • group
  • Biggest current user: Earth Observing System Data
    and Information System (EOSDIS)
  • HDF5 - successor to HDF4
  • New format and library
  • Not compatible with earlier versions
  • 2 primary objects
  • multidim. array of records
  • group
  • Biggest current user: Accelerated Strategic
    Computing Initiative (ASCI)

50
HDF4 object types can be derived from HDF5
datasets and groups


51
Status of HDF4 vs. HDF5
  • HDF4 is still an EOS standard
  • HDF5 likely also
  • HDF4 maintenance
  • Maintained as long as EOS needs it
  • Minimal new features
  • New applications: use HDF5 if possible!
  • New features, performance improvements, etc.

52
HDF Information
  • HDF Information Center
  • http://hdf.ncsa.uiuc.edu/
  • HDF Help email address
  • hdfhelp@ncsa.uiuc.edu
  • HDF users mailing list
  • hdfnews@ncsa.uiuc.edu