Title: PowerPoint Presentation
1 World Data Center Climate: Status and Portal Integration
Michael Lautenschlager, Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
GO-ESSP at LLNL Livermore, June 19th - 21st, 2006
WDCC Home: www.wdcc-climate.de / WDCC Contact: data_at_dkrz.de
2 Content
- WDCC Status
- CERA Concept
- Portal Integration
3 WDCC Content
June 2006: 590 experiments / 79,000 data sets
Data from Earth System Modelling and Related Observations
ERA40
Start: approved in January 2003
Maintenance: Model and Data (MD/MPI-M) and German Climate Computing Centre (DKRZ)
4 Data Export from WDC Climate
Corresponds to 2 - 10 TB/month
5 Geographical Distribution of WDCC Users
Total number of registered users: 750 (May 2006)
6 Data Import into WDC Climate
ECHAM5/MPI-OM IPCC AR4 Scenarios (ca. 110 TB)
7 CERA 1) Concept: Semantic Data Management
- (I) Data catalogue and pointer to Unix files
  - Enable search and identification of data
  - Allow for data access as they are (coarse granularity: raw data files)
- (II) Application-oriented data storage in BLOB tables
  - Time series of individual variables are stored as BLOB entries in DB tables (fine granularity: data products)
  - Allow for fast and selective data access
  - Storage in standard data format (GRIB, NetCDF/CF)
  - Allow for application of standard data processing routines (PINGOs, CDOs)
1) Climate and Environmental data Retrieval and Archiving
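The idea of part (II), one table per variable with one BLOB entry per time step, can be sketched as follows. This is a minimal illustration only: SQLite stands in for the CERA database, the table name `blob_temp2m` and the helper functions are hypothetical, and records are packed as raw 32-bit floats rather than GRIB or NetCDF/CF.

```python
import sqlite3
import struct


def pack_record(values):
    """Pack one flattened 2-D field into a binary BLOB of 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_record(blob):
    """Inverse of pack_record."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")  # SQLite is a stand-in, not the CERA DB
conn.execute("CREATE TABLE blob_temp2m (step INTEGER PRIMARY KEY, data BLOB)")

# One row per time step: the whole table is the time series of one variable.
for step in range(10):
    field = [float(step)] * 4  # toy 2x2 grid, flattened
    conn.execute(
        "INSERT INTO blob_temp2m VALUES (?, ?)", (step, pack_record(field))
    )


def read_steps(conn, first, last):
    """Selective access: fetch only the requested range of time steps."""
    rows = conn.execute(
        "SELECT step, data FROM blob_temp2m "
        "WHERE step BETWEEN ? AND ? ORDER BY step",
        (first, last),
    )
    return [(step, unpack_record(blob)) for step, blob in rows]


subset = read_steps(conn, 3, 5)
```

Only the three requested records are read and unpacked; the remaining entries of the time series are never touched, which is the point of the fine-granularity storage.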
8 WDCC Data Topology
Level 1 - Interface: Metadata entries (XML, ASCII), data files
Level 2 - Interface: Separate files containing BLOB table data in application-adapted structure (time series of single variables)
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
9 CERA Data Model
10 Data matrix of model experiment
Model variables
Model run time
Raw data file in DKRZ Archive
2-D: small BLOBs (180 KB)
3-D: large BLOBs (3 MB)
Raw data file: direct model output (1.3 - 16.2 GB)
Each column is one BLOB table in CERA-DB
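The regrouping implied by the data matrix, from raw files that each hold all variables for part of the run time to one table per variable, might look like the sketch below. File names, variable names and record contents are invented for illustration.

```python
from collections import defaultdict

# Toy stand-in for raw model output: one "file" per time interval,
# each holding one record for every variable (all names hypothetical).
raw_files = {
    "run_2000-01": {"temp2m": b"T0", "precip": b"P0", "wind10m": b"W0"},
    "run_2000-02": {"temp2m": b"T1", "precip": b"P1", "wind10m": b"W1"},
    "run_2000-03": {"temp2m": b"T2", "precip": b"P2", "wind10m": b"W2"},
}


def to_blob_tables(files):
    """Regroup raw files (rows of the matrix) into per-variable tables (columns)."""
    tables = defaultdict(list)
    for name in sorted(files):  # sorted names are chronological in this toy case
        for variable, record in files[name].items():
            tables[variable].append((name, record))
    return dict(tables)


blob_tables = to_blob_tables(raw_files)
```

A request for the full precipitation series then reads `blob_tables["precip"]` alone instead of opening every raw file.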
11 Climate Model Data Structures
- Preferred DB-storage structure for web-based access
  - single variable
  - single level
  - time series of 2D gridded data records
- Formats: GRIB-1, NetCDF/CF (- GRIB-2)
Application related data structure (2-D)
Original data structure (4-D)
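As a rough sketch of the step from the original 4-D structure to single-level time series of 2-D records, assume one variable is held as nested lists indexed `[time][level][lat][lon]`. The function name and the toy data are illustrative assumptions, not part of CERA.

```python
def split_levels(field_4d):
    """Turn data indexed [time][level][lat][lon] into one time series of
    2-D records per level, indexed [level][time][lat][lon]."""
    n_levels = len(field_4d[0])
    return [[step[level] for step in field_4d] for level in range(n_levels)]


# Toy field: 3 time steps, 2 levels, 2x2 grid; each value encodes 10*time + level.
field = [
    [[[10 * t + k] * 2 for _ in range(2)] for k in range(2)]
    for t in range(3)
]

series = split_levels(field)
```

Each entry of `series` is then a candidate for its own BLOB table: one variable, one level, a sequence of 2-D grids over time.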
12 DKRZ Architecture
TX7 Intel Itanium-2 with Linux
13 Portal Integration
Two strategies:
- One-way integration: discovery and use metadata are integrated in a central data portal in one step
  - Example: C3Grid data catalogue (refer to presentation from Heinrich Widmann)
- Two-way integration: discovery metadata are integrated in central data portal, use metadata are extracted from remote archive when they are needed for data download and processing
  - Example: Primary data publication in TIB library catalogue (STD-DOI)
  - Example: WDCC integration in NDG (NERC Data Grid)
14 Primary data publication (STD-DOI)
URL: http://www.std-doi.de/
Primary Data Publication Process
Data Review
ISO 690-2 Metadata for citation of electronic
media
15 Example Publ.-DOI from WDCC
16 DOI URN
17 Publ.-DOI
18 830 GB
19 Ident.-DOI
Data retrieval procedure is given at the end (user identification is required)
20 WDCC Metadata and OAI-PMH
- Open Archives Initiative
- Protocol for Metadata Harvesting
21 WDCC support of OAI-PMH requests
- 1. Identify
  - get information about a repository
- 2. ListMetadataFormats
  - list of available metadata formats
- 3. ListSets
  - list the structure of a repository (sets,...)
- 4. ListIdentifiers
  - list of all identifiers of a set
- 5. GetRecord
  - retrieve one individual metadata record
- 6. ListRecords
  - list records of a set (used for harvesting)
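The six request types differ in the arguments they accept. The sketch below encodes the argument rules of OAI-PMH 2.0 in a small table and checks a request against them. The function name `check_request` and the message wording are illustrative, and the check is deliberately simplified: it does not validate date formats or metadata prefixes.

```python
# Arguments per verb as defined in OAI-PMH 2.0.
VERBS = {
    "Identify": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets": {"required": set(), "optional": set()},
    "ListIdentifiers": {
        "required": {"metadataPrefix"},
        "optional": {"from", "until", "set"},
    },
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
    "ListRecords": {
        "required": {"metadataPrefix"},
        "optional": {"from", "until", "set"},
    },
}
# Verbs that may return partial lists accept resumptionToken as an
# exclusive argument (it replaces all others).
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(verb, args):
    """Return a list of problems; an empty list means the request is well-formed."""
    if verb not in VERBS:
        return [f"badVerb: {verb}"]
    if "resumptionToken" in args:
        if verb not in RESUMABLE:
            return [f"badArgument: {verb} does not take resumptionToken"]
        if len(args) > 1:
            return ["badArgument: resumptionToken must be the only argument"]
        return []
    spec = VERBS[verb]
    problems = [
        f"badArgument: missing {name}"
        for name in sorted(spec["required"] - set(args))
    ]
    allowed = spec["required"] | spec["optional"]
    problems += [
        f"badArgument: illegal {name}" for name in sorted(set(args) - allowed)
    ]
    return problems
```

For example, `check_request("GetRecord", {"identifier": "x"})` reports the missing `metadataPrefix`, mirroring the `badArgument` error a repository would return.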
22 OAI-PMH http
- http request
  - base URL
  - list of keyword arguments
  - Form: key=value pairs
  - Request type: GET or POST (URI syntax)
- http response
  - responseDate (format: UTCdatetime)
  - request (request that generated a response)
  - error (incl. request that generated the error)
- http://www.openarchives.org/OAI/openarchivesprotocol.html
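A minimal sketch of both halves of the exchange: building a GET request from a base URL plus key=value pairs, and reading `responseDate`, `request` and `error` out of a response. The base URL, the identifier and the sample response are invented for illustration and do not come from the WDCC server; for POST, the same encoded pairs would go into the request body instead of the URL.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_get_url(base_url, verb, **args):
    """Append the verb and its arguments to the base URL as key=value pairs."""
    return base_url + "?" + urlencode({"verb": verb, **args})


def parse_envelope(xml_text):
    """Extract responseDate, request and any error codes from a response."""
    root = ET.fromstring(xml_text)
    return {
        "responseDate": root.findtext("oai:responseDate", namespaces=OAI_NS),
        "request": root.findtext("oai:request", namespaces=OAI_NS),
        "errors": [e.get("code") for e in root.findall("oai:error", OAI_NS)],
    }


# Illustrative values only: written for this sketch, not real server output.
url = build_get_url(
    "http://example.org/oai/provider",
    "GetRecord",
    identifier="oai:example.org:1",
    metadataPrefix="oai_dc",
)

sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-06-19T12:00:00Z</responseDate>
  <request verb="GetRecord">http://example.org/oai/provider</request>
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

envelope = parse_envelope(sample)
```

Reserved characters in argument values, such as the colons in the identifier, are percent-encoded by `urlencode`, as the URI syntax requires.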
23 WDCC OAI server at http://uranus.dkrz.de:8080/oai/provider
(Software: dlese (www.dlese.org), apache-tomcat 5.5.12, Java 1.5)
- 35 IPCC experiments with more than 11000 datasets
  - Metadata format: ISO 19115
  - C3Grid (http://gsphere.awi.de:8080/gridsphere/gridsphere)
- 40 STD-DOI experiments with more than 1700 datasets
  - Metadata format: DIF
  - GO-ESSP (NDG, http://ndg.badc.rl.ac.uk/)
24 NDG
OAI Harvesting (Pull or Notification)
DIF XMLs WDCC
OAI Server WDCC (Software dlese)
OAI Client NDG (dlese)
Catalog NDG record 1...n
Discovery Portal NDG
DIF XMLs Provider 2
OAI Server 2
Process
OAI Server n
Delivery
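The pull side of this flow, one OAI client collecting records from servers 1...n into a single catalogue, can be sketched as a loop that follows resumption tokens. Everything here is an assumption made for illustration: `fetch` is a hypothetical callable that would issue the HTTP request, `fake_fetch` returns canned XML instead of contacting a server, and the `dif` metadata prefix and provider names are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(fetch, base_url, metadata_prefix):
    """Pull all record identifiers from one provider, page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(base_url, params))
        for header in root.iterfind(".//oai:record/oai:header", NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=NS))
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token:  # missing or empty token: the list is complete
            return identifiers
        params = {"verb": "ListRecords", "resumptionToken": token}


def build_catalog(fetch, providers, metadata_prefix="dif"):
    """Merge the harvested identifiers of providers 1...n into one catalogue."""
    return {url: harvest(fetch, url, metadata_prefix) for url in providers}


def fake_fetch(base_url, params):
    """Canned two-page response standing in for a real OAI server."""
    if "resumptionToken" in params:
        ids, token = ["rec3"], ""
    else:
        ids, token = ["rec1", "rec2"], "page2"
    records = "".join(
        f"<record><header><identifier>{base_url}/{i}</identifier></header></record>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


catalog = build_catalog(fake_fetch, ["wdcc", "provider2"])
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access; a real client would replace `fake_fetch` with a function that performs the GET request.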
25 URL: http://glue.badc.rl.ac.uk/discovery/
Keyword: ECHAM4