Title: Using HDF5 for Geospatial Vector Data
1Using HDF5 for Geospatial Vector Data
Question How suitable is a general purpose
format like HDF5 for storing and accessing
geospatial feature data?
2Using HDF5 for Geospatial Vector Data
Feature (vector) data example
ESRI Environmental Systems Research Institute,
Inc
3Using HDF5 for Geospatial Vector Data
- Test case ESRI Shapefiles
- Store geometry and attribute information for
spatial features as shapes with vector
coordinates. - Support point, line, and area features.
- Widely used file format for geospatial feature
data.
HDF5 example (1 file)
Shapefile format (3 files)
.shp
.shx
.dbx
ESRI Environmental Systems Research Institute,
Inc
4Using HDF5 for Geospatial Vector Data
Shapefiles tested
Shapefile Shapefile size (M bytes) (.shp .shx) Total shapes Total vertices Max. vert fora shape
A 0.001 1 66 66
B 0.01 44 191 12
C 0.2 219 9,397 1,632
D 3.0 2,253 179,106 38,725
E 12.3 11,576 721,123 500
F 18.8 8,877 1,140,460 500
ESRI Environmental Systems Research Institute,
Inc
5Using HDF5 for Geospatial Vector Data
metadata
x
y
x
y
metadata
x
y
2
- Ragged array 1-D array of variable-length data
types - Index array of offsets to data values in single
linear array. Similar to Shapefiles. - 2-D array one shape per row, multiple arrays
when shape sizes vary.
metadata
x
y
x
y
x
3
metadata
x
y
4
metadata
x
y
x
y
5
x
y
x
y
x
y
x
y
x
y
x
y
x
y
x
y
x
y
metadata
1
0
metadata
2
2
metadata
3
3
metadata
4
6
metadata
5
7
Distribution showing vertices/shape
6Results Comparing Shapefile and HDF5
File size
Access time
- Variable length and compound types significantly
slows access in HDF5. - Can be improved considerably by turning off
internal free lists. - When compound and variable-length types not used,
HDF5 access time is comparable to Shapefile
access.
- Overhead for variable-length structures (ragged
array) is high. - HDF5 linear array with index is comparable to
shapefile. - Compression
- HDF5 linear array with index saves up to 40 vs.
Shapefile. - HDF5 2-D arrays comparable to Shapefile when
compression used. Without compression, HDF5 files
much larger.