Title: NESC workshop
1 Data integration with the Climate Science Modelling Language
- Andrew Woolf (1), Bryan Lawrence (2), Roy Lowry (3), Kerstin Kleese van Dam (1), Ray Cramer (3), Marta Gutierrez (2), Siva Kondapalli (3), Susan Latham (2), Dominic Lowe (2), Kevin O'Neill (1), Ag Stephens (2)
- (1) CCLRC e-Science Centre
- (2) British Atmospheric Data Centre
- (3) British Oceanographic Data Centre
2 Outline
- Background
- Standards: a framework for interoperability
- Climate Science Modelling Language (CSML)
- Using CSML
3 Background
- Data integration requirements
  - scalability across providers
  - warehousing not an option
  - enhance access and use, outwards-facing (e.g. impacts community, policymakers)
  - storage heterogeneity
- ⇒ Semantics as integration key
  - common language across providers (and users)
  - supports wrapper/mediator architecture
4 Standards
- Emerging ISO standards
  - TC211: around 40 standards for geographic information
  - Cover activity spectrum: discovery → access → use
(Figure: ISO 19101 Domain Reference Model)
5 Standards
- Geographic features
  - "abstraction of real world phenomena" [ISO 19101]
  - Type or instance
  - Encapsulate important semantics in universe of discourse
- Application schema
  - Defines semantic content and logical structure of datasets
  - ISO standards provide toolkit
    - spatial/temporal referencing
    - geometry (1-, 2-, 3-D)
    - topology
    - dictionaries (phenomena, units, etc.)
  - GML canonical encoding
(Figure from ISO 19109, Geographic information: Rules for application schema)
6 Standards
- The importance of governance
  - Information community defined by shared semantics
  - Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.)
    - e.g. CF conventions for netCDF files
  - Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135]
- Governance as driver for granularity
  - Remit / interest determines appropriate granularity
    - e.g. IOC, IHO, WMO
<measurement type="Radiosonde" measurand="temperature"/>
<temperatureProfile/>
<Sonde parameter="temperature"/>
7 Climate Science Modelling Language
- Aims
  - provide semantic integration mechanism for NDG data
  - explore new standards-based interoperability framework
  - emphasise content, not container
- Design principles
  - offload semantics onto parameter type (phenomenon, observable, measurand)
    - e.g. wind-profiler, balloon temperature sounding
  - offload semantics onto CRS
    - e.g. scanning radar, sounding radar
  - sensible plotting as discriminant
    - in-principle unsupervised portrayal
  - explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
8 Climate Science Modelling Language
- CSML feature types
  - defined on basis of geometric and topological structure
9 Climate Science Modelling Language
- CSML feature types
  - examples...
10 Climate Science Modelling Language
- Application schema
  - logical structure and semantic content of NDG Dataset
  - Based on GML 3.1
11 Climate Science Modelling Language
- Integration approaches: wrapper/mediator
12 Climate Science Modelling Language
- Numerical array descriptors
  - provides wrapper architecture for legacy data files
  - Connected to data model numerical content through xlink:href
  - Three subtypes
    - InlineArray
    - ArrayGenerator
    - FileExtract (NASAAmes, NetCDF, GRIB)
  - Composite design pattern for aggregation
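The three descriptor subtypes and the composite pattern for aggregation can be pictured with a small Python sketch. This is an illustration only, not the NDG implementation: the class names mirror the slide, but every attribute, the `values()` method and the `reader` callback are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Sequence


class ArrayDescriptor(ABC):
    """Common interface: every descriptor can yield its numerical content."""

    @abstractmethod
    def values(self) -> List[float]:
        ...


@dataclass
class InlineArray(ArrayDescriptor):
    data: Sequence[float]  # values carried directly in the document

    def values(self) -> List[float]:
        return list(self.data)


@dataclass
class ArrayGenerator(ArrayDescriptor):
    start: float
    step: float
    count: int  # values are computed on demand, not stored

    def values(self) -> List[float]:
        return [self.start + i * self.step for i in range(self.count)]


@dataclass
class FileExtract(ArrayDescriptor):
    file_name: str
    variable_name: str
    # hypothetical format-specific reader (NASA Ames, NetCDF, GRIB, ...)
    reader: Callable[[str, str], List[float]]

    def values(self) -> List[float]:
        return self.reader(self.file_name, self.variable_name)


@dataclass
class AggregatedArray(ArrayDescriptor):
    # composite: the children expose the same interface as the aggregate
    components: Sequence[ArrayDescriptor]

    def values(self) -> List[float]:
        out: List[float] = []
        for component in self.components:
            out.extend(component.values())
        return out
```

Because an aggregate is itself a descriptor, aggregates can be nested, and client code does not need to know whether values come from the document, a formula or a legacy file.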
13 Climate Science Modelling Language
- Inline array
<NDGInlineArray>
  <arraySize>5 2</arraySize>
  <uom>udunits.xml#degreeC</uom>
  <numericType>float</numericType>
  <regExpTransform>s/10/9/ge</regExpTransform>
  <numericTransform>5</numericTransform>
  <values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
- Array generator
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
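The array generator declares an arraySize of 10001. Reading its expression as start:step:stop (0:5:50000, with an inclusive stop) would yield exactly that many values. That reading is an assumption rather than something the slides state, and the function name below is hypothetical. A minimal sketch of such an expansion:

```python
def expand_generator(expression: str, array_size: int) -> list:
    """Expand an assumed 'start:step:stop' expression and check arraySize.

    Assumes integer fields and a positive step; stop is treated as inclusive.
    """
    start, step, stop = (int(part) for part in expression.split(":"))
    values = list(range(start, stop + 1, step))
    if len(values) != array_size:
        raise ValueError(f"expected {array_size} values, got {len(values)}")
    return values


minutes = expand_generator("0:5:50000", 10001)
print(minutes[:3], minutes[-1], len(minutes))  # [0, 5, 10] 50000 10001
```

Checking the generated length against the declared arraySize catches a descriptor whose expression and size have drifted apart.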
14 Climate Science Modelling Language
<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
15 Climate Science Modelling Language
- Aggregated array
  - arrays may be aggregated along an existing or new dimension
<NDGAggregatedArray gml:id="feat05cruisetrack">
  <arraySize>2 50</arraySize>
  <aggType>new</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alat</variableName>
    </NDGNetCDFExtract>
  </component>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alon</variableName>
    </NDGNetCDFExtract>
  </component>
</NDGAggregatedArray>
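The relationship between aggType, aggIndex and the declared arraySize can be checked mechanically. The sketch below parses the cruise-track example with Python's standard ElementTree and computes the shape implied by the components. It assumes that aggIndex is 1-based and that element names are unqualified as shown on the slide. The `xmlns:gml` declaration is added only so that the fragment parses on its own, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<NDGAggregatedArray xmlns:gml="http://www.opengis.net/gml" gml:id="feat05cruisetrack">
  <arraySize>2 50</arraySize>
  <aggType>new</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alat</variableName>
    </NDGNetCDFExtract>
  </component>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alon</variableName>
    </NDGNetCDFExtract>
  </component>
</NDGAggregatedArray>
""".strip()


def parse_size(text):
    """Turn an arraySize string such as '2 50' into [2, 50]."""
    return [int(n) for n in text.split()]


def expected_size(agg):
    """Shape implied by the components (aggIndex assumed to be 1-based)."""
    # each <component> wraps exactly one descriptor element
    sizes = [parse_size(c[0].findtext("arraySize")) for c in agg.findall("component")]
    axis = int(agg.findtext("aggIndex")) - 1
    first = sizes[0]
    if agg.findtext("aggType") == "new":
        # new dimension: its length is the number of components
        return first[:axis] + [len(sizes)] + first[axis:]
    # existing dimension: component lengths along that axis add up
    total = sum(s[axis] for s in sizes)
    return first[:axis] + [total] + first[axis + 1:]


agg = ET.fromstring(SAMPLE)
print(expected_size(agg), parse_size(agg.findtext("arraySize")))  # [2, 50] [2, 50]
```

Here two 50-element tracks (alat, alon) stacked along a new first dimension give the declared size of 2 by 50.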
16 Climate Science Modelling Language
instantiateNetCDF(DatasetID, FeatureID)
- Provides semantic abstraction layer
17 Climate Science Modelling Language
- Status
  - Initial feature types defined
  - First draft application schema complete
  - Trial software tooling being coded (parser, netCDF instantiation)
  - Initial deployment trial across BODC, BADC datasets
- Future
  - Separate out wrapper implementation (array descriptors)
  - Disallow internal dictionaries
  - More strongly-typed features?
  - Follow (and pursue!) GML evolution, enhance compliance
  - Expand tooling
- Related work
  - WMO, IOC, IHO
  - MarineXML
  - MOTIIVE (INSPIRE)
18 Using CSML
    <gml:definitionMember>
      <om:Phenomenon gml:id="taxon">
        <gml:description>The taxon name</gml:description>
        <gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
      </om:Phenomenon>
    </gml:definitionMember>
  </NDGPhenomenonDefinitions>
  <!-- -->
  <gml:FeatureCollection>
    <!-- -->
    <gml:featureMember>
      <NDGPointFeature gml:id="ICES_100">
        <NDGPointDomain>
          <domainReference>
            <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
              <location>55.25 6.5</location>
            </NDGPosition>
          </domainReference>
        </NDGPointDomain>
        <gml:rangeSet>
          <gml:DataBlock>
            <gml:rangeParameters>
              <gml:CompositeValue>
                <gml:valueComponents>
                  <gml:measure uom="tn"/>
                  <gml:measure uom="amount"/>
                  <gml:measure uom="gsm"/>
                </gml:valueComponents>
              </gml:CompositeValue>
            </gml:rangeParameters>
            <gml:tupleList>
              'ANTHOZOA',63.1,missing
              'Scoloplos armiger',66.1,missing
              'Spio filicornis',10,missing
              'Spiophanes bombyx',60.3,missing
              'Capitellidae',131.8,missing
              'Pholoe',10,missing
              'Owenia fusiformis',23.4,missing
              'Hypereteone lactea',6.8,missing
              'Anaitides groenlandica',13.2,missing
              'Anaitides mucosa',6.8,missing
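Each record in the tuple list carries three comma-separated fields, matching the three value components (tn, amount, gsm). Records are separated by whitespace, yet taxon names may themselves contain spaces. A reader therefore cannot simply split on whitespace. The sketch below shows one way to read such a list in Python, assuming names are always single-quoted and that the token `missing` marks an absent value. The function names are hypothetical.

```python
import re

TUPLE_LIST = (
    "'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing "
    "'Spio filicornis',10,missing"
)

# One record: a single-quoted name, then two fields containing no comma or space.
RECORD = re.compile(r"'([^']*)',([^,\s]+),([^,\s]+)")


def to_number(field):
    """Return a float, or None for the literal token 'missing'."""
    return None if field == "missing" else float(field)


def parse_tuple_list(text):
    """Split a tuple list into (taxon name, amount, gsm) records."""
    return [(name, to_number(a), to_number(b)) for name, a, b in RECORD.findall(text)]


records = parse_tuple_list(TUPLE_LIST)
for record in records:
    print(record)
```

Matching whole records with a pattern, rather than splitting on separators, keeps multi-word names such as 'Scoloplos armiger' intact.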
MarineXML is an initiative of the IOC/IODE of
UNESCO to improve marine data exchange within
the marine community. The European Commission
has provided a funding contribution to this
initiative as part of its 5th Framework Programme
to undertake a pre-standardisation task of
identifying the approaches the marine community
should adopt regarding XML technology to achieve
improved data exchange.
... there is a momentum from organisations such
as IHO and WMO to adopt consistent approaches for
the vocabulary of their data along the reference
implementation of ISO Standards prescribed by the
Open Geospatial Consortium...
The NDG format proved a robust recipient for the
data from each community. It produced economical
files with few redundant elements, striking about
the right balance between weak and strong typing.
19 Using CSML
Managing semantics
(Diagram labels: conceptual model, UGAS, GML app schema, GML dataset, parser. The GML dataset box repeats the NDGPointFeature ICES_100 excerpt from slide 18.)
20 Using CSML
- Stack of Builders (for UML meta-model)
  - current class, object, attribute
  - specialised for particular UML→XML mapping
- Builder receives
  - filtered SAX events
  - built object
- Builder returns
  - built object
  - new object class
  - new Builder (for inheritance through substitutionGroups)
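The builder stack can be sketched with Python's standard `xml.sax` module. This is a loose illustration under stated assumptions, not the trial parser described in the slides. The `Builder` and `StackHandler` classes are hypothetical, objects are built as plain dicts, and the registry lookup in `startElement` only hints at how a specialised builder might be chosen, for example for substitutionGroup members.

```python
import xml.sax


class Builder:
    """Receives the SAX events for one element and builds a plain dict."""

    def __init__(self, name, attrs):
        self.obj = {"type": name, "attrs": dict(attrs), "text": "", "children": []}

    def characters(self, content):
        self.obj["text"] += content

    def child_built(self, child_obj):
        # the builder also receives each object built by a nested builder
        self.obj["children"].append(child_obj)

    def finish(self):
        self.obj["text"] = self.obj["text"].strip()
        return self.obj  # the built object is returned to the parent


class StackHandler(xml.sax.ContentHandler):
    """Routes SAX events to the builder on top of the stack."""

    def __init__(self, builders=None):
        super().__init__()
        self.builders = builders or {}  # element name -> Builder subclass
        self.stack = []
        self.root = None

    def startElement(self, name, attrs):
        # pick a specialised builder if one is registered, else the default
        cls = self.builders.get(name, Builder)
        self.stack.append(cls(name, attrs))

    def characters(self, content):
        if self.stack:
            self.stack[-1].characters(content)

    def endElement(self, name):
        built = self.stack.pop().finish()
        if self.stack:
            self.stack[-1].child_built(built)
        else:
            self.root = built


handler = StackHandler()
xml.sax.parseString(
    b"<NDGNetCDFExtract><arraySize>50</arraySize>"
    b"<variableName>alat</variableName></NDGNetCDFExtract>",
    handler,
)
print(handler.root["type"], [c["type"] for c in handler.root["children"]])
```

The stack depth tracks the element nesting, so each builder sees only the events for its own element and the finished objects of its children.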