Title: GFDL Data Portal Update: Curator DB Approach
The 5th GO-ESSP Workshop June 19-21 2006, LLNL
- S.Nikonov, V.Balaji, K.Dixon
- GFDL
Outline
- GFDL Data Portal Hardware Upgrade
- Data Portal Statistics
- Metadata Database design for Data Portal usage and for the whole modeling process
Data Portal Hardware Upgrade
- Dell PowerEdge 2850
- Two Intel 3.2GHz Xeon processors
- 2GB RAM
- 300GB system disk
- Two QLogic QLA2340 fiber channel controllers (2Gb/s)
- Red Hat Enterprise Linux 4.0 ES operating system
- Ten StorageTek FlexLine FLC200 fiber channel disk arrays
- Fourteen 250GB SATA drives per array
- 140 drives, total 35TB raw (27TB usable)
- Data transfer and processing speed increased by 40%
- Future plan is to double storage capacity every 2 yrs
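The storage figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and the projection simply assumes the stated plan of doubling raw capacity every 2 years holds exactly.

```python
# Figures from the slide; names are illustrative, not part of any GFDL tool.
ARRAYS = 10
DRIVES_PER_ARRAY = 14
DRIVE_GB = 250
USABLE_TB = 27

drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = drives * DRIVE_GB / 1000  # decimal terabytes


def projected_raw_tb(years_from_now, base_tb=raw_tb, doubling_years=2):
    """Raw capacity if storage doubles every `doubling_years` years (assumption)."""
    return base_tb * 2 ** (years_from_now / doubling_years)


print(drives)   # 140
print(raw_tb)   # 35.0
print(round(USABLE_TB / raw_tb, 2))  # usable fraction of raw capacity
for years in (0, 2, 4, 6):
    print(years, projected_raw_tb(years))
```

The usable fraction (27 of 35 TB) is what remains after redundancy and formatting overhead; the slide does not say how that overhead is split.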
Data Statistics
- Period covered: 01-Oct-2004 to 1-June-2006
- Total amount of data: 8 TB (increased by 50% in 1 yr)
- 12,500 NetCDF files, average file size 650 MB
- Distinct files requested: 6,000
- Distinct hosts served: 1,200
- Data transferred: 20 TB (a 2-fold increase)
- Average data transferred per day: 25 GB
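As a quick consistency check on the holdings figures above, the file count times the average file size should come out near the quoted total. This is a minimal sketch using decimal units (1 TB = 1,000,000 MB), which is an assumption about how the slide's numbers were rounded.

```python
# Figures from the slide; decimal units are an assumption.
FILES = 12_500
AVG_FILE_MB = 650
TRANSFERRED_TB = 20

total_tb = FILES * AVG_FILE_MB / 1_000_000
print(total_tb)  # 8.125, consistent with the quoted 8 TB

# Volume served relative to the size of the holdings.
print(round(TRANSFERRED_TB / total_tb, 2))
```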
Metadata Database Design
- Progress has already been made in this area at CGAM (NMM Suite), and the Curator project is partly devoted to developing metadata standards for models and model output. Those ideas and discussions were extremely useful for our design.
- For comprehensive data analysis, the Data Portal should describe not only the data but also how the data was generated.
- It should use the same metadata database as the modeling system (Flexible Runtime Environment). This database is the joining element of the whole system.
- Analysis of existing data through the Data Portal will help modelers improve models and plan new experiments.
- Thus the Data Portal can be considered not a separate, independent system but a subsystem of the modeling system.
Common functionality schema of the modeling system
Metadata Database usage at different stages of the modeling process
Data Portal Service
Postprocessing Plan
Experiment Preparation
Model Composition
Component Building
Metadata Database
Main Database Compartments and their relationships
Scheme Rationales
- Process Domains: arenas where physical processes play out.
- Physical Process: descriptions of the accepted theoretical approaches for the processes considered in modeling.
- Algorithmization: describes the program modules of elementary physical processes.
- Composition: components, couplers, drivers, technical environment.
- Simulation: describes model output data and its location, including all accompanying administrative information.
Process Domains
- They define the phase spaces of the equations that express physical phenomena in mathematical form. They also serve as containers where elements are put (gases, aerosols, clouds).
- Each contains common descriptions and the sets of elements that constitute the domain.
- Examples: atmosphere or ocean 3D space for dynamics.
Physical Processes
- Each contains theoretical assumptions, a full description, references and other process-specific information.
- Identified by name and the domain where they act.
- Described individually in different tables.
- All process tables share a subset of the same fields: process id, process name, domain, full description.
- The other fields reflect process specifics.
- Process name and domain together are one of the criteria for preventing the same process from being included in a component or coupled model twice.
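One way to picture a process table is the SQLite sketch below. It is not the Curator schema: the table names `domain` and `radiation_process`, the column names, and the `spectral_bands` field are all hypothetical. It only illustrates the four common fields plus one process-specific column, with name and domain together identifying a process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain (
    domain_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- Hypothetical process table: the four common fields, then one
-- process-specific column. Other process tables would follow the same shape.
CREATE TABLE radiation_process (
    process_id       INTEGER PRIMARY KEY,
    process_name     TEXT NOT NULL,
    domain_id        INTEGER NOT NULL REFERENCES domain(domain_id),
    full_description TEXT,
    spectral_bands   INTEGER,
    UNIQUE (process_name, domain_id)
);
""")


def add_process(name, domain_id, description, spectral_bands):
    """Insert a process; return False if name+domain is already registered."""
    try:
        conn.execute(
            "INSERT INTO radiation_process "
            "(process_name, domain_id, full_description, spectral_bands) "
            "VALUES (?, ?, ?, ?)",
            (name, domain_id, description, spectral_bands),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO domain (domain_id, name) VALUES (1, 'atmosphere')")
first = add_process("shortwave", 1, "illustrative entry", 18)
second = add_process("shortwave", 1, "same name and domain again", 18)
print(first, second)
```

The uniqueness constraint here is a design choice for the sketch; the slides only state that name and domain identify a process.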
Algorithmization
- Process codebase: the set of modules implementing a process, including the input data description (namelists and datasets), accompanied by a CVS tag.
- Numeric artifices: the set of modules implementing numeric smoothing (filters, artificial viscosity, general algorithms, etc.).
- Tracer models: descriptions pointing to the fieldtable files associated with tracers.
- Grid specs
- Boundary conditions
- Namelists and datasets (model parameters), fieldtables (tracers): their locations, versions, descriptions, checksums.
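The last item (location, version, description, checksum for each input file) could be gathered along the following lines. This is a sketch under assumptions: the slides do not name a checksum algorithm, so SHA-256 is used here as a stand-in, and `describe_input` and its dictionary keys are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_input(path, version, description):
    """Build one record for a namelist, dataset or fieldtable file."""
    path = Path(path)
    return {
        "location": str(path.resolve()),
        "version": version,
        "description": description,
        "checksum": file_checksum(path),
    }
```

Storing the checksum alongside the location lets a later run detect that an input file was changed without its version being updated.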
Composition
- The main actors here are components.
- A component can be of 2 types: physical component or coupler.
- A component consists of modules.
- The modules that make up a component are determined by the physical processes that are to participate in the final model. These sets of modules are described in the Algorithmization part of the database.
- Another entity of the Composition compartment is the driver. It is a program unit responsible for running components (alone or as a whole coupled model).
- A component is the minimal unit that can be run by a driver.
- Components have a PMIOD description, and the system should use it to decide whether components are compatible. Another criterion applied at the component building stage is that there should not be two identical processes of the same domain in a component or in a coupled model.
- The Coupled Model table describes the set of components that are members of the final coupled model.
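The duplicate-process criterion can be sketched as a simple check over the components of a model. The class and function names below are hypothetical and do not come from the Curator code; the PMIOD-based compatibility check is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Process:
    name: str
    domain: str


@dataclass
class Component:
    name: str
    kind: str  # "physical" or "coupler"
    processes: list = field(default_factory=list)


def duplicate_processes(components):
    """Return the (name, domain) pairs that occur more than once."""
    seen, dupes = set(), set()
    for component in components:
        for process in component.processes:
            key = (process.name, process.domain)
            if key in seen:
                dupes.add(key)
            seen.add(key)
    return sorted(dupes)


atmos = Component("atmos", "physical",
                  [Process("radiation", "atmosphere"),
                   Process("convection", "atmosphere")])
ocean = Component("ocean", "physical", [Process("mixing", "ocean")])
extra = Component("atmos_extra", "physical",
                  [Process("radiation", "atmosphere")])

print(duplicate_processes([atmos, ocean]))          # []
print(duplicate_processes([atmos, ocean, extra]))   # [('radiation', 'atmosphere')]
```

Passing a single-element list checks one component; passing all members checks a coupled model.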
Simulation
Contains tables giving a full description of the conducted experiment, including:
- Institution
- Author
- Project
- Scenario
- Experiment
- Realization
- Postprocessing plan
- Variables
- Variable bundles
- Metadata standards
- Data fields
- Files
Compartment Structure of Curator Database
. . .
Modes of working with the database
- Research mode: the modeler introduces new physical processes into modeling, or new algorithmizations and new components built from newly developed modules, for future use in coupled models. New components are to be described in the database. Model runs conducted for this development purpose are not to be recorded in the DB, except the final ones proving the physical correctness of the new approach.
- Production mode: the experimenter composes a coupled model from the available components described in the database, builds a scenario and a postprocessing plan, and runs the experiment. All this activity is recorded in the database.
- A thoroughly elaborated, very friendly GUI is a critical need for these modes; otherwise users will avoid the database-based way of working, the DB will be empty, and the project will fail.
- Automatic mode: applications fill metadata into the database, grabbing it from data files, or read metadata for their own needs during execution.
- Most progress so far has been made here, using the Simulation compartment of the Curator DB.
Current usage of Curator DB
- Currently the Simulation part of the DB is designed for operational use; it is kept updated and is used in Data Portal activity.
- The DB serves the GFDL Data Portal web site for discovery and navigation of IPCC CM2.1 data. A daemon scans Data Portal storage for newly added data files and records in the DB the metadata extracted from the files, together with system information about them.
- It is used to bring the metadata of data files on the Data Portal into line with the standards defined in the DB. An application queries the DB for the metadata standard assumed for a given file, compares the file's metadata against it and fixes the file.
- It is used by an automatic tool for configuring the DODS Aggregation Server. The tool checks the experiment status (public/not public) in the DB, requests all the metadata needed for the DODS XML configuration file and creates that file.
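The scanning step of the daemon can be pictured as follows. This is a minimal sketch, not the GFDL daemon: the table `out_data_files`, its columns, and the `.nc` filename filter are assumptions, and only system information (size, modification time) is recorded. Extracting metadata from inside the NetCDF files would need a NetCDF reader and is left out.

```python
import os
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open the catalog and create the (hypothetical) file table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS out_data_files ("
        " path TEXT PRIMARY KEY, size_bytes INTEGER, mtime REAL)"
    )
    return conn


def scan_once(conn, root):
    """Record files under `root` not yet in the catalog; return the new paths."""
    new_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".nc"):
                continue
            path = os.path.join(dirpath, filename)
            known = conn.execute(
                "SELECT 1 FROM out_data_files WHERE path = ?", (path,)
            ).fetchone()
            if known:
                continue
            info = os.stat(path)
            conn.execute(
                "INSERT INTO out_data_files VALUES (?, ?, ?)",
                (path, info.st_size, info.st_mtime),
            )
            new_paths.append(path)
    conn.commit()
    return sorted(new_paths)
```

A real daemon would call `scan_once` periodically; a second pass over unchanged storage returns an empty list because every path is already in the catalog.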
Table examples - 1
Table examples - 2
CoupledModels
Data examples - 1
Experiments
Data examples - 2
OutDataFields
OutDataFiles