Spatiotemporal Databases - PowerPoint PPT Presentation


1
NERC DataGrid data model and its application
  • Andrew Woolf [1] (A.Woolf_at_rl.ac.uk), Ray Cramer [2],
    Marta Gutierrez [3], Kerstin Kleese van Dam [1], Siva
    Kondapalli [2], Susan Latham [3], Bryan Lawrence [3], Roy
    Lowry [2], Kevin O'Neill [1], Ag Stephens [3]
  • [1] CCLRC e-Science Centre
  • [2] British Oceanographic Data Centre
  • [3] British Atmospheric Data Centre

2
Outline
  • NERC DataGrid data integration problem
  • Semantics as integration key
  • CSML
  • Wrapper/mediator architecture
  • Use and future

3
NERC DataGrid
4
NDG data integration
  • Most (but not all) NDG data is file-based
  • "On the Grid, no-one should know if you're a file
    or relational table" (one service to bind them
    all)
  • The file problem
  • multiple formats
  • focus usually on container, not content
  • Scientific file format examples (earth sciences)
  • netCDF
  • HDF4
  • HDF5
  • GRIB
  • NASA Ames
  • ...

5
NDG data integration
6
NDG data integration
  • Typically, API is fundamental point of reference
  • binary format details not always exposed (or
    guaranteed)
  • public API often the only supported access
    mechanism
  • API typically implemented as optimised native
    library
  • why reinvent a well-known working interface?
  • Data Format Description Language (DFDL)
  • XML facade to file formats
  • earth science files often giga-scale → XML query
    interface not likely to be efficient
  • encapsulating format not the issue for NDG...
  • ...integrating domain-specific semantics
    efficiently across files and formats is!

7
NDG data integration
  • Information and file contents
  • same information in different file formats: want
    to expose information, not format (seen earlier)
  • in addition, semantic information structures may
    be composed across files

8
Integration semantics
  • Want semantic access to information, not abstract
    data
  • getData(potential temperature from ERA-40 dataset
    in North Atlantic from 1990 to 2000)
  • not getData(era40.nc, PTMP, 20:50, 300:340,
    190:200)
  • or even worse
  • for j=1990:2000
  • getData(era40_j.nc, PTMP, 20:50, 300:340)
  • Lossy is OK!
  • Care less about completeness of representation
    than semantic unification
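
The contrast above, between asking for a region and period and asking for raw index ranges, can be sketched as a small translation step. This is only an illustration under stated assumptions, not NDG code: the coordinate axes, the bounds, and the names `to_index_range` and `semantic_request` are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def to_index_range(axis, lo, hi):
    """Return (start, stop) so that axis[start:stop] lies within [lo, hi].

    Assumes `axis` is sorted in ascending order.
    """
    return bisect_left(axis, lo), bisect_right(axis, hi)

# Hypothetical coordinate axes for a gridded dataset.
lats = [float(v) for v in range(-90, 91, 10)]   # -90, -80, ..., 90
years = list(range(1980, 2003))                 # 1980 .. 2002

def semantic_request(lat_bounds, year_bounds):
    """Translate a request in physical units into index slices."""
    return {
        "lat": to_index_range(lats, *lat_bounds),
        "time": to_index_range(years, *year_bounds),
    }

request = semantic_request(lat_bounds=(20.0, 60.0), year_bounds=(1990, 2000))
print(request)
```

The caller states bounds in physical units; the index arithmetic that the "not getData(...)" form exposes stays inside the service.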

9
NDG data integration
  • Integration approaches: warehousing
  • Integration approaches: wrapper/mediator

10
Integration semantics
  • Summary
  • What we require is
  • semantic access to information (within and across
    files)
  • and to use native (well-known) efficient APIs
    under the covers
  • also
  • scalability across providers
  • warehousing not an option (tera-scale!)
  • enhance access and use, outwards-facing (e.g.
    impacts community, policymakers)
  • storage heterogeneity

11
Integration semantics
  • Database data modelling
  • Relational model (Codd, 1970)
  • Entity-relationship model (Chen, 1976)
  • Semantic data models
  • Object-oriented data models (inheritance,
    aggregation, behaviour)
  • File-based data modelling
  • Far less advanced
  • Abstract models (variables, arrays, etc.; no
    object file formats in widespread use for earth
    science data)
  • API-driven

12
Integration semantics
  • Fundamentally, an information community is
    defined by shared semantics
  • semantics often (but not always) implicit
  • use information semantics for data integration
  • → Semantics as integration key
  • common language across providers (and users)
  • supports wrapper/mediator architecture
  • NDG Solution components
  • semantic data model (Climate Science Modelling
    Language)
  • storage descriptor (wrapper)
  • data services (mediator)

13
CSML
  • Geographic features
  • "abstraction of real world phenomena" (ISO 19101)
  • Object models for data types: type or instance
  • Encapsulate important semantics in universe of
    discourse
  • Application schema
  • Defines semantic content and logical structure of
    datasets
  • ISO standards provide conceptual toolkit
  • spatial/temporal referencing
  • geometry (1-, 2-, 3-D)
  • topology
  • dictionaries (phenomena, units, etc.)
  • GML: canonical encoding

from ISO 19109, "Geographic information - Rules for application schema"

14
CSML
  • CSML aims
  • provide semantic integration mechanism for NDG
    data
  • explore new standards-based interoperability
    framework
  • emphasise content, not container
  • Design principles
  • offload semantics onto parameter type
    (phenomenon, observable, measurand)
  • e.g. wind-profiler, balloon temperature sounding
  • offload semantics onto CRS
  • e.g. scanning radar, sounding radar
  • sensible plotting as discriminant
  • in-principle unsupervised portrayal
  • explicitly aim for small number of weakly-typed
    features (in accordance with governance principle
    and NDG remit)

15
CSML
  • Semantic data model
  • Climate Science Modelling Language (CSML),
    http://ndg.nerc.ac.uk/csml
  • Weakly-typed conceptual models for range of
    information types
  • Independent of storage concerns
  • Based on ISO geographic feature types framework
  • Defined on basis of geometric and topologic
    structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship's cruise track, aircraft's flight path
PointFeature | Single point measurement. | rain gauge measurement
ProfileFeature | Single profile of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tide gauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
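
One way to read the feature type list is as a small classification by sampling geometry and by whether the feature is a series. The sketch below encodes that reading. The `Geometry` enum and the `feature_type` function are hypothetical and only reproduce the seven names listed; they are not part of CSML.

```python
from enum import Enum

class Geometry(Enum):
    """Sampling geometries implied by the feature type names (an assumption)."""
    POINT = "Point"
    PROFILE = "Profile"
    GRID = "Grid"
    TRAJECTORY = "Trajectory"

def feature_type(geometry, is_series):
    """Return the CSML feature type name for a geometry and series flag."""
    if geometry is Geometry.TRAJECTORY:
        # The list has a single trajectory type; a path already extends in time.
        return "TrajectoryFeature"
    return f"{geometry.value}{'Series' if is_series else ''}Feature"

# Enumerate every name this reading produces.
names = sorted({feature_type(g, s) for g in Geometry for s in (False, True)})
print(names)
```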
16
CSML
  • CSML feature type examples

17
Wrapper
  • Numerical array descriptors
  • provides wrapper architecture for legacy data
    files
  • proxy for numerical content within feature
    instances
  • Connected to data model numerical content
    through xlink:href
  • Three subtypes
  • InlineArray
  • ArrayGenerator
  • FileExtract (NASAAmes, NetCDF, GRIB)
  • Composite design pattern for aggregation
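
The three descriptor subtypes and the composite pattern for aggregation might be sketched as follows. Only the class names InlineArray, ArrayGenerator and FileExtract come from the slide. The shared `values()` interface, the `Aggregate` class, and the generator parameters (`start`, `step`, `count`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

class ArrayDescriptor:
    """Common interface: every descriptor can produce its numerical content."""
    def values(self) -> List[float]:
        raise NotImplementedError

@dataclass
class InlineArray(ArrayDescriptor):
    data: List[float]
    def values(self) -> List[float]:
        return list(self.data)

@dataclass
class ArrayGenerator(ArrayDescriptor):
    start: float   # hypothetical parameters for a regularly spaced array
    step: float
    count: int
    def values(self) -> List[float]:
        return [self.start + i * self.step for i in range(self.count)]

@dataclass
class FileExtract(ArrayDescriptor):
    file_name: str
    variable_name: str
    def values(self) -> List[float]:
        # A real wrapper would call the native format library here.
        raise NotImplementedError("file I/O is outside this sketch")

@dataclass
class Aggregate(ArrayDescriptor):
    """Composite: holds descriptors and is itself a descriptor."""
    components: List[ArrayDescriptor]
    def values(self) -> List[float]:
        out: List[float] = []
        for component in self.components:
            out.extend(component.values())
        return out

combined = Aggregate([InlineArray([0.0, 0.5]), ArrayGenerator(1.0, 0.5, 3)])
print(combined.values())
```

Because `Aggregate` implements the same interface as its components, aggregates can be nested without the caller needing to know.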

18
Wrapper
  • File extract examples

<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>

<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>

<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
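
A descriptor like the NASA Ames example could be read with a standard XML parser before any file access takes place. The sketch below uses Python's `xml.etree.ElementTree`. The element names are copied from the slide, while the `read_extract` function and the dictionary it returns are hypothetical.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
"""

def read_extract(xml_text):
    """Collect the fields a format-specific reader would need."""
    root = ET.fromstring(xml_text)
    return {
        "kind": root.tag,
        # arraySize is a whitespace-separated list of dimension lengths.
        "shape": tuple(int(n) for n in root.findtext("arraySize").split()),
        "numeric_type": root.findtext("numericType"),
        "file_name": root.findtext("fileName"),
        "variable_name": root.findtext("variableName"),
    }

info = read_extract(DESCRIPTOR)
print(info)
```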
19
Wrapper
  • Aggregated array
  • arrays may be aggregated along an existing or
    new dimension

<AggregatedArray gml:id="globaltemperature">
  <arraySize>180 360</arraySize>
  <aggType>existing</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>northern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>southern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
</AggregatedArray>
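
The two aggregation modes (along an existing dimension or along a new one) can be illustrated with nested lists standing in for the file contents. This is a sketch under assumptions: the function names are hypothetical, the axis argument is 0-based, and reading `aggIndex` 1 as "the first dimension" is inferred only from the sizes in the example (two 90 x 360 arrays giving 180 x 360).

```python
def shape(array):
    """Shape of a non-empty rectangular nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

def aggregate_existing(components, axis):
    """Concatenate components along an existing axis (0-based)."""
    if axis == 0:
        return [item for comp in components for item in comp]
    # Descend one level and concatenate the matching sub-arrays.
    return [
        aggregate_existing([comp[i] for comp in components], axis - 1)
        for i in range(len(components[0]))
    ]

def aggregate_new(components):
    """Stack components along a new leading dimension."""
    return list(components)

# Stand-ins for the TMP variable of the two hemisphere files.
north = [[1.0] * 360 for _ in range(90)]
south = [[-1.0] * 360 for _ in range(90)]

print(shape(aggregate_existing([north, south], axis=0)))
print(shape(aggregate_new([north, south])))
```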
20
Mediator
  • Data services (mediator)
  • Data services expose semantic model
  • Mappings to third-party data models (e.g. file
    formats, OPeNDAP)
  • Canonical serialisation (e.g. ISO 19118 UML → XML
    mapping): Geography Markup Language
  • Example services
  • netCDF file instantiation
  • OPeNDAP delivery
  • Open Geospatial Consortium (OGC) web services,
    e.g. Web Feature Service, Web Coverage Service
  • Pushed down to the file level, data access
    request should use optimised native file
    format-specific I/O

21
Mediator
instantiateNetCDF(DatasetID, FeatureID)
  • Provides semantic abstraction layer
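
A mediator call of this shape might resolve the dataset and feature identifiers to a storage descriptor and then hand over to a format-specific reader. The sketch below shows only that dispatch. The catalogue contents, the identifiers, the reader registry and the returned dictionary are all hypothetical, and no netCDF file is actually read or written.

```python
# Hypothetical catalogue: (dataset id, feature id) -> storage descriptor.
CATALOGUE = {
    ("radar", "feat04"): {
        "format": "NetCDF", "file": "radar_data.nc", "variable": "az",
    },
    ("era40", "feat01"): {
        "format": "GRIB", "file": "/e40/ggas1992010100rsn.grb", "variable": "203",
    },
}

READERS = {}

def reader(fmt):
    """Register a format-specific reader under its format name."""
    def register(func):
        READERS[fmt] = func
        return func
    return register

@reader("NetCDF")
def read_netcdf(descriptor):
    # Placeholder: a real reader would call the native netCDF library.
    return f"netCDF read of {descriptor['variable']} from {descriptor['file']}"

@reader("GRIB")
def read_grib(descriptor):
    # Placeholder: a real reader would call a native GRIB decoder.
    return f"GRIB read of {descriptor['variable']} from {descriptor['file']}"

def instantiate_netcdf(dataset_id, feature_id):
    """Resolve identifiers, then push the request down to the native reader."""
    descriptor = CATALOGUE[(dataset_id, feature_id)]
    payload = READERS[descriptor["format"]](descriptor)
    return {"feature": feature_id, "payload": payload}

result = instantiate_netcdf("era40", "feat01")
print(result)
```

The caller never names a file or a format; those details stay in the catalogue and the readers.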

22
Using CSML
  • Example of CSML use: MarineXML

[Figure: MarineXML data flow. Diagram labels: source datasets (Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics), each with XSD and XML; XSLT; XML Parser; Data Dictionary; MarineGML (NDG) Feature Types; S52 Portrayal Library; SENC; SeeMyDENC. With thanks to Keiran Millard, HR Wallingford.]
  • Data from different parts of the marine community
    conforming to a variety of schema (XSD)
  • For each XSD (for the source data) there is an
    XSLT to translate the data to the Feature Types
    (FT) defined by CSML. The FTs and XSLT are
    maintained in a MarineXML registry
  • Features in the source XSD must be present in the
    data dictionary
  • Phenomena in the XSD must have an associated
    portrayal
  • The result of the translation is an encoding
    that contains the marine data in weakly typed
    (i.e. generic) Features
  • The FTs can then be translated to equivalent FTs
    for display in the ECDIS system
  • ECDIS acts as an example client for the data

23
Using CSML
  • EU project MarineXML

    <gml:definitionMember>
      <om:Phenomenon gml:id="taxon">
        <gml:description>The taxon name</gml:description>
        <gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
      </om:Phenomenon>
    </gml:definitionMember>
  </NDGPhenomenonDefinitions>
  <!-- -->
  <gml:FeatureCollection>
    <!-- -->
    <gml:featureMember>
      <NDGPointFeature gml:id="ICES_100">
        <NDGPointDomain>
          <domainReference>
            <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
              <location>55.25 6.5</location>
            </NDGPosition>
          </domainReference>
        </NDGPointDomain>
        <gml:rangeSet>
          <gml:DataBlock>
            <gml:rangeParameters>
              <gml:CompositeValue>
                <gml:valueComponents>
                  <gml:measure uom="tn"/>
                  <gml:measure uom="amount"/>
                  <gml:measure uom="gsm"/>
                </gml:valueComponents>
              </gml:CompositeValue>
            </gml:rangeParameters>
            <gml:tupleList>
              'ANTHOZOA',63.1,missing
              'Scoloplos armiger',66.1,missing
              'Spio filicornis',10,missing
              'Spiophanes bombyx',60.3,missing
              'Capitellidae',131.8,missing
              'Pholoe',10,missing
              'Owenia fusiformis',23.4,missing
              'Hypereteone lactea',6.8,missing
              'Anaitides groenlandica',13.2,missing
              'Anaitides mucosa',6.8,missing
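
The tuple list in this excerpt packs three values per record, matching the three measures declared above it: a quoted taxon name, an amount, and a third value that is "missing" throughout. A minimal sketch of reading it follows, assuming that quoted names contain no embedded quote characters and that "missing" should map to `None`. The names `TUPLE_RE`, `parse_value` and `parse_tuples` are hypothetical.

```python
import re

# First three records copied from the excerpt.
TUPLE_LIST = (
    "'ANTHOZOA',63.1,missing "
    "'Scoloplos armiger',66.1,missing "
    "'Spio filicornis',10,missing"
)

# A quoted name, then two comma-separated tokens without whitespace.
TUPLE_RE = re.compile(r"'([^']*)',([^,\s]+),(\S+)")

def parse_value(token):
    """Convert a numeric token to float; treat 'missing' as no value."""
    return None if token == "missing" else float(token)

def parse_tuples(text):
    return [
        (name, parse_value(amount), parse_value(third))
        for name, amount, third in TUPLE_RE.findall(text)
    ]

records = parse_tuples(TUPLE_LIST)
print(records)
```

Splitting on whitespace alone would break names such as 'Scoloplos armiger', which is why the pattern anchors on the quotes.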

MarineXML is an initiative of the IOC/IODE of
UNESCO to improve marine data exchange within
the marine community. The European Commission
has provided a funding contribution to this
initiative as part of its 5th Framework Programme
to undertake a pre-standardisation task of
identifying the approaches the marine community
should adopt regarding XML technology to achieve
improved data exchange.

... there is a momentum from organisations such
as IHO and WMO to adopt consistent approaches for
the vocabulary of their data along the reference
implementation of ISO Standards prescribed by the
Open Geospatial Consortium...

The NDG format proved a robust recipient for the
data from each community. It produced economical
files with few redundant elements, striking about
the right balance between weak and strong typing.

24
Conclusions/future
  • Conclusions
  • Mechanism is lossy, in general
  • → semantic integration is far more important than
    completeness of representation
  • Emphasis on content, not container
  • Mediator services can expose data model
  • Well-known community formats use efficient
    legacy APIs
  • Initial semantic decoration can add context to
    entire workflow chain
  • Loose relationship between legacy file data model
    and semantic (feature) instance to which it is
    mapped

25
Conclusions/future
  • Current and future work (NDG)
  • Implement tooling
  • CSML parsing/processing
  • Automated scanner: files → CSML
  • Implement NDG data delivery (mediator) services
    layered over data model
  • Further perspectives
  • Integrate with broader interoperability
    frameworks (e.g. semantics repositories, Feature
    Type Catalogues; WMO, IOC, INSPIRE)
  • Generalise approach
  • meta-model for data modelling
  • data storage description language for file
    mappings (DFDL role?)
  • canonicalised serialisation for workflows

26
Conclusions/future
Managing semantics
[Figure: workflow for managing semantics. Diagram labels: conceptual model; define data models; GML app schema; auto-generate XSD; auto-generated parser; GML dataset; populate dataset instances. The GML dataset box repeats the NDGPointFeature "ICES_100" excerpt from slide 23.]

27
Conclusions/future
Parser
  • Stack of Builders (for UML meta-model)
  • current class, object, attribute
  • specialised for particular UML→XML mapping
  • Builder receives
  • filtered SAX events
  • built object
  • Builder returns
  • built object
  • new object class
  • new Builder (for inheritance through
    substitutionGroups)
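
The stack-of-builders idea can be sketched with Python's `xml.sax`: each element start pushes a builder, character data goes to the builder on top of the stack, and each element end pops the builder and passes its built object to the parent. This is a simplified illustration, not the NDG parser. It builds plain dictionaries rather than objects of a UML meta-model, it does not filter events, and it does not switch builders for substitutionGroups. The `Builder` and `StackHandler` classes are hypothetical.

```python
import xml.sax

class Builder:
    """Builds one object from the SAX events of a single element."""
    def __init__(self, tag, attrs):
        self.obj = {"tag": tag, "attrs": dict(attrs.items()),
                    "text": "", "children": []}

    def add_text(self, text):
        self.obj["text"] += text

    def add_child(self, child):
        self.obj["children"].append(child)

    def finish(self):
        self.obj["text"] = self.obj["text"].strip()
        return self.obj

class StackHandler(xml.sax.ContentHandler):
    """Routes SAX events to the builder currently on top of the stack."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.result = None

    def startElement(self, name, attrs):
        self.stack.append(Builder(name, attrs))

    def characters(self, content):
        if self.stack:
            self.stack[-1].add_text(content)

    def endElement(self, name):
        built = self.stack.pop().finish()
        if self.stack:
            self.stack[-1].add_child(built)   # hand built object to parent
        else:
            self.result = built               # root element finished

def parse(xml_text):
    handler = StackHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.result

doc = parse("<domainReference><location>55.25 6.5</location></domainReference>")
print(doc)
```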