Title: The GHRSST match-up database (MDB) Status report
1 The GHRSST match-up database (MDB): Status report
- Jean-François Piollé (Ifremer/CERSAT),
- Craig Donlon (UKMet),
- Pierre Leborgne (Meteo-France) et al.
2 The GHRSST-PP MDB: Motivation
- The original idea for a common shared MDB system was to have a data resource that everyone working in GHRSST-PP could use to validate satellite SST data.
- Having a common shared resource means that groups at least start the validation from a common set of data, which is not the case today.
- Each group maintains its own set of MDB data, quality-controlled (QC) by different rules and thus including different data. This is for historical and political reasons. With these different data sets it is not really possible to compare SSES derivations or validation results, as each database uses different QC and data content.
- The GHRSST-PP MDB should help to provide a better validation framework that will also deliver an economy of scale (one database that everyone agrees with, compared to many different systems).
- If we all start from the best common set of data, then the assumptions made in the analysis during the validation process are fair.
3 The GHRSST MDB: what exists?
- A specification exists in the GDS for MDB definition, processing, content and data exchange format
- The original plan was for RDACs to deliver MDB records to a central database (GHRSST MDB); it could be modified so that RDACs send a copy of their databases to be reformatted, but there are issues
- Only one RDAC (Medspiration) routinely delivers MDB records based on GHRSST specifications
- One implementation of the MDB dedicated to GHRSST products (independent of the L2 provider) -> limited to Medspiration data
- Failed to meet original requirements
- What does this mean?
  - Limited resources or politically sensitive issues?
  - Not a priority for RDACs or the GHRSST system? -> matter of schedule
  - Not adequate to the need? -> matter of specification, to be addressed by the Science Team
  - Not relevant?
- How can we continue/strengthen this effort?
4 The Medspiration MDB
- Match-ups computed daily for all Medspiration datastreams (AVHRR and SEVIRI SAF OSI products, AATSR, GAC/LAC, AMSR-E, TMI)
- Limited to Atlantic coverage (except for AATSR: global)
- Uses CORIOLIS as the unique datastream for in situ data (inc. GTS data)
- http://www.medspiration.org/tools/mdb/index.html
- Data available in NetCDF format
- http://www.medspiration.org/tools/mdb/preextraction.html
- Flexible multi-criteria extraction interface (to ASCII/NetCDF)
- http://www.medspiration.org/tools/mdb/consultation/
- Significant effort by the Medspiration team, and funding by ESA, to provide open access to all users
- Major item of the upcoming European GMES Thematic Assembly Center (TAC) for SST (2008-2011)
5 A GHRSST MDB: why?
- Why compute SSES if they are already included by the producer in L2P files?
- Original motivation for a GHRSST MDB
- Double check? Intercomparison?
- Required by the GHRSST-PP RAN project to verify the stability of SSES
- It may not be possible to compute accurate SSES from L2P content (if the model depends on channel combination, cloud screening, nadir/dual view differences, or anything related to the SST retrieval process)
- What are the respective responsibilities of GHRSST and L2 producers?
  - L2 providers to provide accurate SSES
  - GHRSST: control and feedback service to warn providers about bad SSES estimation
  - Independent publication of sensor error statistics
- Other usages
  - Has proved to be valuable for intercomparison and for building decision trees for data selection when merging sensors (depending on area, season, retrieval conditions, ...) -> complementary even when SSES are computed by L2 providers
  - Bringing added value to, or compared to, other existing MDBs
  - Ancillary data: a reason why many users build their own MDB
  - Single access/format helps that process
  - Having ancillary data already filled in is also a benefit (although it cannot be exhaustive)
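As an illustration of the kind of independent check discussed on this slide, the sketch below derives a bias and a standard deviation of satellite minus in situ differences for each proximity confidence level. It is only a minimal sketch: the field names (`proximity_confidence`, `sat_sst`, `insitu_sst`) and the sample values are assumptions made for the example, not the GDS record layout or real match-ups.

```python
from collections import defaultdict
from statistics import mean, stdev


def sses_by_confidence(matchups):
    """Return {confidence_level: (bias, std_dev, count)} from match-ups.

    bias and std_dev are computed on (satellite SST - in situ SST).
    """
    diffs = defaultdict(list)
    for m in matchups:
        diffs[m["proximity_confidence"]].append(m["sat_sst"] - m["insitu_sst"])

    stats = {}
    for level, values in diffs.items():
        # A sample standard deviation needs at least two values.
        spread = stdev(values) if len(values) > 1 else None
        stats[level] = (mean(values), spread, len(values))
    return stats


# Hypothetical match-ups (temperatures in kelvin), for illustration only.
matchups = [
    {"proximity_confidence": 5, "sat_sst": 290.0, "insitu_sst": 290.25},
    {"proximity_confidence": 5, "sat_sst": 291.0, "insitu_sst": 290.75},
    {"proximity_confidence": 3, "sat_sst": 288.0, "insitu_sst": 289.0},
]
stats = sses_by_confidence(matchups)
for level in sorted(stats):
    print(level, stats[level])
```

Statistics obtained this way could be compared with the SSES shipped in the L2P files, which is the "double check" role mentioned above.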
6 Benefits of having a central MDB
- Develops a community for operational SSES development and verification
- Required by the GHRSST-PP RAN
- Provides easier access
  - Single point of access (what and how this is done needs to be agreed)
  - Same retrieval procedure, formats, etc.
- Homogeneity: extremely important for understanding the quality of different satellite validation results
  - Standard variables: geolocation, main parameters
  - In situ source, if all match-ups are produced from the same input data stream with the same procedure
  - Quality control
  - Content
  - Matching rules and criteria
- Ancillary data can be added to each match-up (single investment effort, single sources)
  - Water vapour content (climatology, model or satellite)
  - Wind
  - Mixed layer estimation
  - ...
7 Issues with having a central MDB
- Most providers have their own MDB running
- No extra resources available to duplicate investment for inclusion into a centralized database
- Can be a huge task if all match-ups have to be produced at the same place
- Producing and delivering match-ups to the GHRSST MDB
- The Medspiration database was relying only on Coriolis
- Limited to the content of GHRSST-PP files -> lacks information about BTs, channels, etc.
- Need to consider the best format and data content. Perhaps this needs to be flexible?
8 Populating a central MDB
- Computing all match-ups at a single place (Medspiration)
  - Unique in situ data stream (Coriolis, ...)
  - Consistent and homogeneous content (same level of quality control, same filtering, same inputs, ...)
  - Allows computing SSES that depend on proximity confidence (L2P content)
  - More in situ inputs (e.g. does not consider only GTS real-time data) -> more match-ups (or fewer for non-operational sources: M-AERI, research cruises, ...)
  - Optimal consistency
  - High cost in manpower/resources: managing several dataflows, performing colocation for all datasets, software maintenance, processing hardware, ...
  - Limited to L2P content (and ancillary data added afterward) -> no radiances, TBs, ...
- Using match-ups as given by L2 providers
  - Which sources for in situ data? How can they be intercompared?
  - Inconsistency and heterogeneity -> which filtering? Which quality control?
  - There may be no information to compute SSES that depend on proximity confidence (L2P content)
  - Richer (or poorer!) content
  - Easier to manage
  - Keeping the database consistent may be difficult (updates, duplication, ...)
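The colocation step behind the first option can be sketched as follows: for each in situ observation, keep the closest satellite pixel that falls inside a distance window and a time window. The 25 km and 3 hour windows, the field names and the sample points are assumptions chosen for illustration; they are not the matching criteria of the GDS or of the Medspiration MDB.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_matchups(insitu, pixels, max_km=25.0, max_dt=timedelta(hours=3)):
    """Pair each in situ observation with its nearest pixel inside both windows."""
    matchups = []
    for obs in insitu:
        best = None  # (distance_km, pixel) of the closest candidate so far
        for px in pixels:
            if abs(px["time"] - obs["time"]) > max_dt:
                continue  # outside the time window
            d = haversine_km(obs["lat"], obs["lon"], px["lat"], px["lon"])
            if d <= max_km and (best is None or d < best[0]):
                best = (d, px)
        if best is not None:
            matchups.append({"insitu": obs, "pixel": best[1], "distance_km": best[0]})
    return matchups


# Hypothetical inputs, for illustration only.
insitu = [{"id": "buoy-1", "lat": 45.0, "lon": -10.0, "time": datetime(2008, 1, 1, 12, 0)}]
pixels = [
    {"id": "a", "lat": 45.1, "lon": -10.0, "time": datetime(2008, 1, 1, 13, 0)},  # close in space and time
    {"id": "b", "lat": 45.0, "lon": -10.0, "time": datetime(2008, 1, 1, 20, 0)},  # too late
    {"id": "c", "lat": 46.0, "lon": -10.0, "time": datetime(2008, 1, 1, 12, 30)},  # too far
]
result = find_matchups(insitu, pixels)
print([(m["pixel"]["id"], round(m["distance_km"], 1)) for m in result])
```

The brute-force double loop is only meant to show the criteria; running it over every datastream is part of the processing cost listed above.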
9 Issues with distributed contributors
- Updates
- Two ways
  - Provide additional match-ups periodically -> difficult to guarantee no duplication (has to be guaranteed by the provider)
  - Full dataset -> requires removing all previous match-ups and storing the new dataset
    - Provider has to guarantee that the new dataset includes all the match-ups of the previous release
- No mix!!!
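The two update modes can be sketched with a store keyed on a match-up identity. The key used here (dataset, in situ identifier, in situ time) and the record fields are assumptions for the example; a real system would have to agree on what makes two match-ups "the same".

```python
def matchup_key(record):
    """Identity of a match-up (hypothetical choice of fields)."""
    return (record["dataset"], record["insitu_id"], record["insitu_time"])


def apply_incremental(db, new_records):
    """Add only unseen match-ups; return how many were skipped as duplicates."""
    skipped = 0
    for rec in new_records:
        key = matchup_key(rec)
        if key in db:
            skipped += 1
        else:
            db[key] = rec
    return skipped


def apply_full_release(db, dataset, release):
    """Remove every stored match-up of `dataset`, then load the new release."""
    for key in [k for k in db if k[0] == dataset]:
        del db[key]
    for rec in release:
        db[matchup_key(rec)] = rec


# Hypothetical records, for illustration only.
first = [
    {"dataset": "demo_l2p", "insitu_id": "buoy-1", "insitu_time": "2008-01-01T00:00Z", "sat_sst": 290.1},
    {"dataset": "demo_l2p", "insitu_id": "buoy-2", "insitu_time": "2008-01-01T06:00Z", "sat_sst": 291.4},
]
extra = {"dataset": "demo_l2p", "insitu_id": "buoy-3", "insitu_time": "2008-01-02T00:00Z", "sat_sst": 289.7}

db = {}
apply_incremental(db, first)
skipped = apply_incremental(db, [first[0], extra])  # one repeat, one new record
count_after_incremental = len(db)

apply_full_release(db, "demo_l2p", first)  # the release omits `extra`, so it disappears
count_after_release = len(db)
print(skipped, count_after_incremental, count_after_release)
```

The last call shows why a full release must contain all match-ups of the previous one: anything it omits is lost when the old dataset is removed.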
10 Using match-ups from L2 providers vs computing them at the same central place
- Very heterogeneous content
- Requires a very flexible database structure -> complex to build and maintain (many dependencies on L2 providers)
- MODIS
  - brightness temperature for channels 14 (low and high gain), 20, 22, 23, 31, 32, 26, 27, 28, 29
  - min/max/median/average/std. dev. values for each channel in 3x3 boxes
  - Reynolds SST
- AMSR-E/TMI
  - satellite and in situ SST only
- AVHRR Pathfinder
  - brightness temperatures
  - Reynolds SST
  - aerosols
  - air temperature
  - wind speed and direction
  - radiances
  - emissivity
- METOP
  - air temperature
  - wind speed and direction
  - water vapour content (climatology, model)
  - brightness temperature for channels 3.7, 11, 12
  - radiance for channels 0.6, 0.9, 1.2
  - SST algorithm information
  - cloud information
  - aerosols
  - ice
  - climatological information
  - min/max/mean/std. dev.
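One way to picture the "very flexible database structure" is a record with a small fixed core shared by every provider plus an open dictionary for provider-specific fields. This is a sketch under assumptions: the class, the field names and the values are hypothetical and only loosely follow the content lists above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MatchupRecord:
    # Core fields every provider is assumed to be able to fill.
    dataset: str
    time: str
    lat: float
    lon: float
    sat_sst: float
    insitu_sst: float
    # Provider-specific content (brightness temperatures, wind, ...).
    extras: Dict[str, Any] = field(default_factory=dict)


def common_extras(records):
    """Names of the optional fields present in every given record."""
    names = [set(r.extras) for r in records]
    return set.intersection(*names) if names else set()


# Hypothetical records, for illustration only.
modis = MatchupRecord("MODIS", "2008-01-01T12:00Z", 45.0, -10.0, 290.2, 290.0,
                      extras={"bt_ch31": 288.5, "bt_ch32": 287.9, "reynolds_sst": 290.1})
pathfinder = MatchupRecord("AVHRR Pathfinder", "2008-01-01T12:10Z", 45.0, -10.0, 290.4, 290.0,
                           extras={"reynolds_sst": 290.1, "wind_speed": 6.5})
amsre = MatchupRecord("AMSR-E", "2008-01-01T12:20Z", 45.0, -10.0, 289.8, 290.0)

shared_two = common_extras([modis, pathfinder])
shared_all = common_extras([modis, pathfinder, amsre])
print(shared_two, shared_all)
```

The trade-off is the one named on the slide: the open part accepts anything a provider sends, but any analysis across providers can only rely on the core fields and on whatever optional fields happen to overlap.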
11 Summary
- Is there a need for a central MDB?
- Do we need independent SSES estimation?
  - Probably not in NRT, but we do for the GHRSST-PP RAN effort, at least for homogeneity testing
  - May be seen as a simple assessment of native SSES (will not provide better ones, but feedback to the producer)
- How do we build that?
  - If the only need is independent SSES publication, all match-ups should be computed by a single MDB system using L2P and the same in situ data source -> high cost, interest to be considered carefully
  - If there are other needs (research, remote sensing, ...), we should use the match-ups computed by providers (likely richer content, historical data) -> heterogeneous content, reliability? But easier to manage (little processing)
12 Questionnaire
- Are you an L2 or L2P data provider?
  - Most of them, but not all of them (who seem to find it more painful to have to do it)
- Do you compute your own L2 or L2P match-ups?
  - All L2/L2P providers
- If you compute your own match-ups, how frequently do you update them?
  - L2/L2P providers -> daily
  - Research -> monthly
- Do you need L2 or L2P match-ups? If it is not for the validation of your own L2 or L2P products, for which purpose?
  - Some providers express a need for double check/confirmation of their own SSES estimation
  - Others: SSES estimation, sensor characterization, data merging -> need for ancillary data
- If you compute match-ups, are you ready to make them available to everybody?
  - Most of them
  - Some L2/L2P providers cannot (need to compute match-ups in this case)
- Do you mind putting a copy of your MDB at the GHRSST MDB (with proper acknowledgement/credits/...) in addition to any other access you may provide?
  - Those that make them available agree on duplication at the GHRSST MDB (though it may not be the purpose of the GHRSST MDB if we seek complete independence!!)
- Do you expect the GHRSST MDB to compute match-ups from your L2P datasets (from its own in situ data stream) or just store your own match-ups and no others?
  - Question was ambiguous
  - It was about creating independent match-ups at GHRSST level (not using the L2/L2P providers' match-ups)
  - Strongly related to the objectives/independence level sought for the GHRSST MDB
- What service do you expect from the GHRSST MDB? A single access to the match-ups for all L2P datastreams? Homogeneous format? More information associated with the match-ups (ancillary data, ...)? Finding SSES for each L2P dataset? Search and extraction tools? Some intercomparison tools (such as the graphical display of the HR-DDS systems, plots, maps, ...)?
  - Credible and independent evaluation of L2Ps
  - Single access point, format, match-up criteria
  - Search/subsetting/extraction tools
  - Access to specific buoys/cruises
  - Integration with the HR-DDS system
  - Some requirements are contradictory with the minimal goal of the MDB (which is to provide independent SSES)
13 Suggestion for the GHRSST MDB
- GHRSST MDB will be redesigned for more flexibility in content and more independence from the Coriolis system
- Central MDB computes match-ups from L2P and in situ data -> computing resources to be investigated
- In situ sources other than CORIOLIS can be considered if delivered in ARGO format (and not duplicating Coriolis/GTS/Argo)
- Match-ups delivered periodically in netCDF format (and online from the web interface)
  -> Will allow SSES checking and intercomparison with homogeneous content
- L2/L2P providers are encouraged to complete the datasets with whatever information they have (brightness temperature, ...)
- GHRSST MDB will also be able to add ancillary data from other sources
  -> Will make the MDB more complete and comprehensive for wider use
[Diagram: data flow of the proposed GHRSST MDB. Inputs: Coriolis (inc. Argo, GTS, ...) and other in situ sources, both in ARGO format; L2P from the GDAC. Steps inside the GHRSST MDB: compute match-ups, ingest in database, add L1/L2/ancillary information (L2/L2P providers), add ancillary information. Output: match-ups in netCDF format.]
14 The MDB in GHRSST GDS 2.0
- The GDS section on the MDB was initially introduced for the exchange of MDB records between providers and the central MDB archive
- Current status: no exchange in the GHRSST system
- The XML format for MDB records was suitable for flexible content and exchange, not for usage -> adds useless complexity and constraints
- Output format provided by the Medspiration MDB is ASCII and netCDF
- Conclusion
  - Suggest removing the MDB requirement section
  - Replace it with a description of the output format (netCDF) and of the means of access to the central MDB (if any) -> user-manual-level information (should not be in the GDS at all?)
15 The MDB in the GMES context
- MDB listed as a main feature of the future European SST GDAC
- Will imply improvement and some redesign of the current Medspiration MDB, based on experience and usage during the last months
- Will conform to GHRSST ST requirements -> important to reach agreement on MDB goals and design
- Add missing information
- Allow other input streams (for in situ and satellite)
- Work will only start at the end of 2008