Title: Spatiotemporal Databases
1NERC DataGrid data model and its application
- Andrew Woolf1 (A.Woolf_at_rl.ac.uk), Ray Cramer2,
Marta Gutierrez3, Kerstin Kleese van Dam1, Siva
Kondapalli2, Susan Latham3, Bryan Lawrence3, Roy
Lowry2, Kevin O'Neill1, Ag Stephens3
- 1 CCLRC e-Science Centre
- 2 British Oceanographic Data Centre
- 3 British Atmospheric Data Centre
2Outline
- NERC DataGrid data integration problem
- Semantics as integration key
- CSML
- Wrapper/mediator architecture
- Use and future
3NERC DataGrid
4NDG data integration
- Most (but not all) NDG data is file-based
- "On the Grid, no-one should know if you're a file
or relational table" (one service to bind them all)
- The file problem
- multiple formats
- focus usually on container, not content
- Scientific file format examples (earth sciences)
- netCDF
- HDF4
- HDF5
- GRIB
- NASA Ames
- ...
5NDG data integration
6NDG data integration
- Typically, API is fundamental point of reference
- binary format details not always exposed (or guaranteed)
- public API often the only supported access mechanism
- API typically implemented as optimised native library
- why reinvent a well-known working interface?
- Data Format Description Language (DFDL)
- XML facade to file formats
- earth science files often giga-scale ⇒ XML query interface not likely to be efficient
- encapsulating format not the issue for NDG...
- ...integrating domain-specific semantics efficiently across files and formats is!
7NDG data integration
- Information and file contents
- same information in different file formats: want to expose information, not format (seen earlier)
- in addition, semantic information structures may be composed across files
8Integration semantics
- Want semantic access to information, not abstract data
- getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
- not getData(era40.nc, PTMP, 20:50, 300:340, 190:200)
- or even worse
- for j=1990:2000
- getData(era40_j.nc, PTMP, 20:50, 300:340)
- Lossy is OK!
- Care less about completeness of representation
than semantic unification
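The contrast above can be sketched in Python as a thin layer that turns a semantic request into the per-file, index-based calls a native API expects. This is only an illustration under stated assumptions: the `SemanticRequest` type, the lookup tables and the per-year file naming are hypothetical, and the ranges are simply the ones quoted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticRequest:
    phenomenon: str
    dataset: str
    region: str
    years: tuple  # (first, last), inclusive


# Hypothetical lookup tables a data provider would maintain.
VARIABLES = {("ERA-40", "potential temperature"): "PTMP"}
REGIONS = {"North Atlantic": {"lat": (20, 50), "lon": (300, 340)}}
FILE_PATTERNS = {"ERA-40": "era40_{year}.nc"}


def plan_file_requests(req):
    """Translate a semantic request into per-file, index-based requests."""
    variable = VARIABLES[(req.dataset, req.phenomenon)]
    box = REGIONS[req.region]
    pattern = FILE_PATTERNS[req.dataset]
    first, last = req.years
    return [
        (pattern.format(year=year), variable, box["lat"], box["lon"])
        for year in range(first, last + 1)
    ]


request = SemanticRequest(
    "potential temperature", "ERA-40", "North Atlantic", (1990, 2000)
)
plan = plan_file_requests(request)
```

The caller only ever states what information is wanted; the file names, variable codes and ranges stay behind the lookup tables.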
9NDG data integration
- Integration approaches: warehousing
- Integration approaches: wrapper/mediator
10Integration semantics
- Summary
- What we require is
- semantic access to information (within and across files)
- and to use native (well-known) efficient APIs under the covers
- also
- scalability across providers
- warehousing not an option (tera-scale!)
- enhance access and use, outwards-facing (e.g. impacts community, policymakers)
- storage heterogeneity
11Integration semantics
- Database data modelling
- Relational model (Codd, 1970)
- Entity-relationship model (Chen, 1976)
- Semantic data models
- Object-oriented data models (inheritance, aggregation, behaviour)
- File-based data modelling
- Far less advanced
- Abstract models (variables, arrays, etc.; no object file formats in widespread use for earth science data)
- API-driven
12Integration semantics
- Fundamentally, an information community is defined by shared semantics
- semantics often (but not always) implicit
- use information semantics for data integration
- ⇒ Semantics as integration key
- common language across providers (and users)
- supports wrapper/mediator architecture
- NDG Solution components
- semantic data model (Climate Science Modelling Language)
- storage descriptor (wrapper)
- data services (mediator)
13CSML
- Geographic features
- "abstraction of real world phenomena" [ISO 19101]
- Object models for data types: type or instance
- Encapsulate important semantics in universe of discourse
- Application schema
- Defines semantic content and logical structure of datasets
- ISO standards provide conceptual toolkit
- spatial/temporal referencing
- geometry (1-, 2-, 3-D)
- topology
- dictionaries (phenomena, units, etc.)
- GML canonical encoding
(from ISO 19109, Geographic information: Rules for Application Schema)
14CSML
- CSML aims
- provide semantic integration mechanism for NDG data
- explore new standards-based interoperability framework
- emphasise content, not container
- Design principles
- offload semantics onto parameter type (phenomenon, observable, measurand)
- e.g. wind-profiler, balloon temperature sounding
- offload semantics onto CRS
- e.g. scanning radar, sounding radar
- sensible plotting as discriminant
- in-principle unsupervised portrayal
- explicitly aim for small number of weakly-typed
features (in accordance with governance principle
and NDG remit)
15CSML
- Semantic data model
- Climate Science Modelling Language (CSML), http://ndg.nerc.ac.uk/csml
- Weakly-typed conceptual models for range of information types
- Independent of storage concerns
- Based on ISO geographic feature types framework
- Defined on basis of geometric and topologic structure
CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship's cruise track, aircraft's flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single profile of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
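One way to read the table is that a feature type follows from the sampling geometry plus whether the measurement is repeated in time. The Python sketch below encodes that reading; the `classify` function and its `geometry` / `is_series` arguments are hypothetical illustrations, not part of CSML itself.

```python
from enum import Enum


class FeatureType(Enum):
    TRAJECTORY = "TrajectoryFeature"
    POINT = "PointFeature"
    PROFILE = "ProfileFeature"
    GRID = "GridFeature"
    POINT_SERIES = "PointSeriesFeature"
    PROFILE_SERIES = "ProfileSeriesFeature"
    GRID_SERIES = "GridSeriesFeature"


# Hypothetical lookup: (sampling geometry, repeated in time?) -> feature type.
_BY_STRUCTURE = {
    ("point", False): FeatureType.POINT,
    ("profile", False): FeatureType.PROFILE,
    ("grid", False): FeatureType.GRID,
    ("point", True): FeatureType.POINT_SERIES,
    ("profile", True): FeatureType.PROFILE_SERIES,
    ("grid", True): FeatureType.GRID_SERIES,
}


def classify(geometry, is_series=False):
    """Pick a feature type from geometric structure alone."""
    if geometry == "trajectory":
        return FeatureType.TRAJECTORY
    return _BY_STRUCTURE[(geometry, is_series)]


radiosonde = classify("profile")
tide_gauge = classify("point", is_series=True)
```

Note that nothing about the measured parameter appears here, in line with the design principle of offloading those semantics onto the parameter type.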
16CSML
- CSML feature type examples
17Wrapper
- Numerical array descriptors
- provides wrapper architecture for legacy data files
- proxy for numerical content within feature instances
- Connected to data model numerical content through xlink:href
- Three subtypes
- InlineArray
- ArrayGenerator
- FileExtract (NASAAmes, NetCDF, GRIB)
- Composite design pattern for aggregation
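The three descriptor subtypes and the composite pattern could be sketched in Python roughly as follows. This is a minimal illustration, not the NDG implementation: the class interfaces, the start/step/count form of the generator, and the `reader` callable standing in for a native file library are all assumptions.

```python
from abc import ABC, abstractmethod


class ArrayDescriptor(ABC):
    """Common interface: every descriptor can yield its numerical content."""

    @abstractmethod
    def values(self):
        """Return the described numbers as a flat list."""


class InlineArray(ArrayDescriptor):
    def __init__(self, data):
        self._data = list(data)

    def values(self):
        return list(self._data)


class ArrayGenerator(ArrayDescriptor):
    # Assumption: a generator is modelled here as start/step/count.
    def __init__(self, start, step, count):
        self.start, self.step, self.count = start, step, count

    def values(self):
        return [self.start + i * self.step for i in range(self.count)]


class FileExtract(ArrayDescriptor):
    def __init__(self, file_name, variable_name, reader):
        self.file_name = file_name
        self.variable_name = variable_name
        self._reader = reader  # stand-in for a native format library

    def values(self):
        return list(self._reader(self.file_name, self.variable_name))


class AggregatedArray(ArrayDescriptor):
    """Composite: a descriptor made of other descriptors."""

    def __init__(self, components):
        self.components = list(components)

    def values(self):
        out = []
        for component in self.components:
            out.extend(component.values())
        return out


def fake_reader(file_name, variable_name):
    # Canned numbers instead of real file I/O, so the sketch is runnable.
    return [1.0, 2.0]


combined = AggregatedArray([
    InlineArray([0.0]),
    ArrayGenerator(start=10.0, step=5.0, count=3),
    FileExtract("radar_data.nc", "az", fake_reader),
])
```

Because the aggregate exposes the same `values()` interface as its parts, aggregates can themselves be nested inside other aggregates.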
18Wrapper
<NDG:NASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDG:NASAAmesExtract>

<NDG:NetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDG:NetCDFExtract>

<NDG:GRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDG:GRIBExtract>
19Wrapper
- Aggregated array
- arrays may be aggregated along an existing or
new dimension
<AggregatedArray gml:id="globaltemperature">
  <arraySize>180 360</arraySize>
  <aggType>existing</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>northern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>southern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
</AggregatedArray>
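As a rough check of what aggregation along an existing or new dimension means for array shapes, the Python sketch below parses a descriptor like the one above and recomputes the aggregate size from its components. It assumes that aggIndex counts dimensions from 1 and that a "new" aggregation inserts a dimension at that position; neither is stated on the slide. The gml:id attribute is omitted so that no namespace declaration is needed.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<AggregatedArray>
  <arraySize>180 360</arraySize>
  <aggType>existing</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>northern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>southern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
</AggregatedArray>
"""


def parse_shape(text):
    """Turn an arraySize string such as '90 360' into [90, 360]."""
    return [int(n) for n in text.split()]


def aggregated_shape(component_shapes, agg_type, agg_index):
    """Shape of the aggregate, given the shapes of its components."""
    axis = agg_index - 1  # assumption: aggIndex counts dimensions from 1
    first = component_shapes[0]
    if agg_type == "existing":
        # Components must agree on every dimension except the joined one.
        rest = first[:axis] + first[axis + 1:]
        for shape in component_shapes:
            if shape[:axis] + shape[axis + 1:] != rest:
                raise ValueError("component shapes do not line up")
        total = sum(shape[axis] for shape in component_shapes)
        return first[:axis] + [total] + first[axis + 1:]
    if agg_type == "new":
        # Components must be identical in shape; a new dimension is inserted.
        if any(shape != first for shape in component_shapes):
            raise ValueError("component shapes differ")
        return first[:axis] + [len(component_shapes)] + first[axis:]
    raise ValueError("unknown aggType: " + agg_type)


root = ET.fromstring(DESCRIPTOR)
declared = parse_shape(root.findtext("arraySize"))
# Each <component> holds exactly one extract element as its first child.
shapes = [parse_shape(c[0].findtext("arraySize")) for c in root.findall("component")]
computed = aggregated_shape(
    shapes, root.findtext("aggType"), int(root.findtext("aggIndex"))
)
```

Under these assumptions the two 90 x 360 hemispheres join along the first dimension, which matches the 180 x 360 size declared on the aggregate.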
20Mediator
- Data services (mediator)
- Data services expose semantic model
- Mappings to third-party data models (e.g. file formats, OPeNDAP)
- Canonical serialisation (e.g. ISO 19118 UML → XML mapping): Geography Markup Language
- Example services
- netCDF file instantiation
- OPeNDAP delivery
- Open Geospatial Consortium (OGC) web services, e.g. Web Feature Service, Web Coverage Service
- Pushed down to the file level, data access request should use optimised native file format-specific I/O
21Mediator
instantiateNetCDF(DatasetID, FeatureID)
- Provides semantic abstraction layer
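A mediator call of this kind might be organised as a lookup from dataset and feature identifiers to a storage descriptor, followed by dispatch to a format-specific reader. The Python sketch below is a guess at that shape only: the catalogue contents, the identifiers, and the reader functions are hypothetical stand-ins, with no real file I/O.

```python
# Hypothetical catalogue: (dataset id, feature id) -> storage descriptor.
CATALOGUE = {
    ("macehead", "feat01"): {
        "format": "NASAAmes",
        "fileName": "/data/BADC/macehead/mh960606.cf1",
        "variableName": "CFC-12",
    },
    ("radar", "feat04"): {
        "format": "NetCDF",
        "fileName": "radar_data.nc",
        "variableName": "az",
    },
}


def read_nasa_ames(file_name, variable_name):
    # Stand-in for a native NASA Ames reader.
    return ("NASAAmes", file_name, variable_name)


def read_netcdf(file_name, variable_name):
    # Stand-in for a native netCDF library call.
    return ("NetCDF", file_name, variable_name)


READERS = {"NASAAmes": read_nasa_ames, "NetCDF": read_netcdf}


def fetch_feature_data(dataset_id, feature_id):
    """Resolve identifiers to storage, then use the format's own reader."""
    descriptor = CATALOGUE[(dataset_id, feature_id)]
    reader = READERS[descriptor["format"]]
    return reader(descriptor["fileName"], descriptor["variableName"])


result = fetch_feature_data("radar", "feat04")
```

The client names a dataset and a feature; which file format sits underneath is decided entirely inside the mediator.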
22Using CSML
- Example of CSML use: MarineXML
(Figure: MarineXML translation pipeline, with thanks to Keiran Millard, HR Wallingford. Box labels: Biological Species, Chl-a from Satellite, MeasuredHydrodynamics, ModelledHydrodynamics, XSD, XML, XSLT, XML Parser, Data Dictionary, MarineGML(NDG) Feature Types, S52 Portrayal Library, SENC, SeeMyDENC. The figure annotations follow.)
- Data from different parts of the marine community
conforming to a variety of schema (XSD)
- Features in the source XSD must be present in the
data dictionary.
- Phenomena in the XSD must have an associated
portrayal
- For each XSD (for the source data) there is an
XSLT to translate the data to the Feature Types
(FT) defined by CSML. The FTs and XSLT are
maintained in a MarineXML registry
- The result of the translation is an encoding
that contains the marine data in weakly typed
(i.e. generic) Features
- The FTs can then be translated to equivalent FTs
for display in the ECDIS system
- ECDIS acts as an example client for the data.
23Using CSML
<gml:definitionMember>
  <om:Phenomenon gml:id="taxon">
    <gml:description>The taxon name</gml:description>
    <gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
  </om:Phenomenon>
</gml:definitionMember>
</NDG:PhenomenonDefinitions>
<!-- -->
<gml:FeatureCollection>
<!-- -->
<gml:featureMember>
  <NDG:PointFeature gml:id="ICES_100">
    <NDG:PointDomain>
      <domainReference>
        <NDG:Position srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
          <location>55.25 6.5</location>
        </NDG:Position>
      </domainReference>
    </NDG:PointDomain>
    <gml:rangeSet>
      <gml:DataBlock>
        <gml:rangeParameters>
          <gml:CompositeValue>
            <gml:valueComponents>
              <gml:measure uom="tn"/>
              <gml:measure uom="amount"/>
              <gml:measure uom="gsm"/>
            </gml:valueComponents>
          </gml:CompositeValue>
        </gml:rangeParameters>
        <gml:tupleList>
          'ANTHOZOA',63.1,missing
          'Scoloplos armiger',66.1,missing
          'Spio filicornis',10,missing
          'Spiophanes bombyx',60.3,missing
          'Capitellidae',131.8,missing
          'Pholoe',10,missing
          'Owenia fusiformis',23.4,missing
          'Hypereteone lactea',6.8,missing
          'Anaitides groenlandica',13.2,missing
          'Anaitides mucosa',6.8,missing
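To show how the tuple list in the fragment above maps onto its three declared value components, here is a small Python sketch that splits the text into records. It assumes each record is a quoted name followed by two comma-separated tokens, and that the literal token `missing` marks an absent value; the field order (taxon name, amount, gsm) is read off the uom labels, and the function names are hypothetical.

```python
import re

# First three records of the tuple list, with a line wrap inside one name.
TUPLE_LIST = """'ANTHOZOA',63.1,missing
'Scoloplos armiger',66.1,missing 'Spio
filicornis',10,missing"""

# One record: quoted name, then two tokens without commas or whitespace.
RECORD = re.compile(r"'([^']*)',([^,\s]+),([^,\s]+)")


def to_number(token):
    """Convert a token to float, mapping the 'missing' marker to None."""
    return None if token == "missing" else float(token)


def parse_tuple_list(text):
    records = []
    for name, amount, gsm in RECORD.findall(text):
        # Collapse line wraps inside a quoted name to single spaces.
        clean_name = " ".join(name.split())
        records.append((clean_name, to_number(amount), to_number(gsm)))
    return records


records = parse_tuple_list(TUPLE_LIST)
```

Splitting on whitespace alone would not work here, because taxon names such as 'Scoloplos armiger' contain spaces; the quoting is what delimits the first component.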
MarineXML is an initiative of the IOC/IODE of
UNESCO to improve marine data exchange within
the marine community. The European Commission
has provided a funding contribution to this
initiative as part of its 5th Framework Programme
to undertake a pre-standardisation task of
identifying the approaches the marine community
should adopt regarding XML technology to achieve
improved data exchange.
... there is a momentum from organisations such
as IHO and WMO to adopt consistent approaches for
the vocabulary of their data along the reference
implementation of ISO Standards prescribed by the
Open Geospatial Consortium...
The NDG format proved a robust recipient for the
data from each community. It produced economical
files with few redundant elements, striking about
the right balance between weak and strong typing.
24Conclusions/future
- Conclusions
- Mechanism is lossy, in general
- ⇒ semantic integration is far more important than completeness of representation
- Emphasis on content, not container
- Mediator services can expose data model
- Well-known community formats use efficient legacy APIs
- Initial semantic decoration can add context to entire workflow chain
- Loose relationship between legacy file data model and semantic (feature) instance to which it is mapped
25Conclusions/future
- Current and future work (NDG)
- Implement tooling
- CSML parsing/processing
- Automated scanner: files → CSML
- Implement NDG data delivery (mediator) services layered over data model
- Further perspectives
- Integrate with broader interoperability frameworks (e.g. semantics repositories, Feature Type Catalogues: WMO, IOC, INSPIRE)
- Generalise approach
- meta-model for data modelling
- data storage description language for file mappings (DFDL role?)
- canonicalised serialisation for workflows
26Conclusions/future
Managing semantics
(Figure: workflow diagram. Labels: conceptual model; define data models; GML app schema; auto-generate XSD; auto-generated parser; populate dataset instances; GML dataset. The GML dataset box repeats the PointFeature "ICES_100" fragment shown on slide 23.)
27Conclusions/future
Parser
- Stack of Builders (for UML meta-model)
- current class, object, attribute
- specialised for particular UML → XML mapping
- Builder receives
- filtered SAX events
- built object
- Builder returns
- built object
- new object class
- new Builder (for inheritance through
substitutionGroups)
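The stack-of-Builders idea can be sketched with Python's standard `xml.sax` module. This is a simplified illustration, not the NDG parser: every element gets the same generic `Builder` producing a plain dict, there is no UML meta-model, and substitution groups are not handled. The class names are hypothetical.

```python
import xml.sax


class Builder:
    """Builds one object (here a plain dict) from the events of one element."""

    def __init__(self, name, attrs):
        self.obj = {
            "type": name,
            "attrs": dict(attrs.items()),
            "children": [],
            "text": "",
        }

    def characters(self, content):
        self.obj["text"] += content

    def child_built(self, child):
        self.obj["children"].append(child)

    def finish(self):
        self.obj["text"] = self.obj["text"].strip()
        return self.obj


class BuilderStackHandler(xml.sax.ContentHandler):
    """Routes SAX events to the Builder on top of the stack."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.root = None

    def startElement(self, name, attrs):
        # A new element means a new Builder becomes the current one.
        self.stack.append(Builder(name, attrs))

    def characters(self, content):
        if self.stack:
            self.stack[-1].characters(content)

    def endElement(self, name):
        # The finished Builder hands its built object to its parent.
        built = self.stack.pop().finish()
        if self.stack:
            self.stack[-1].child_built(built)
        else:
            self.root = built


def parse(xml_text):
    handler = BuilderStackHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


doc = parse(
    "<NetCDFExtract><arraySize>90 360</arraySize>"
    "<variableName>TMP</variableName></NetCDFExtract>"
)
```

A fuller version would choose a specialised Builder per element name, which is where the mapping from XML elements back to model classes would live.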