Title: MURI DATA COMMITTEE May 2002, Update
- Atmospheric Sciences
- Harry Edmon
- Eric Grimit
- David Ovens
- Statistics
- Yulia Gel
- Anton Westveld
- APL
- Leah Foechterle
- Keith Kerr
- Mark Kruger
Data Management Components
- Planning
- Resource Allocation
- Personnel
- Computer Hardware/Software
- Execution
- Good communication
- Cooperative effort
Issues of Concern
- Data requirements and delivery timetables
- Data storage
- Personnel and workload distribution
- Data manipulation tools
- Running models (MM5)
Data Requirements (Statistics)
- Main approaches and personnel
- Data requirements for various approaches
- Timetables for data availability
Main Approaches and Personnel
- (1) Ensembles of Initialization (Yulia Gel)
- (2) Bayesian Model Averaging (Fadoua Balabdaoui)
- (3) MOS Extensions (Anton Westveld)
- (4) Bayesian Mirroring/Bayesian Melding (Tony Eckel, Eric Grimit, Adrian Raftery)
Data Set A (Statistics)
- A. Data for methods (1) and (4)
- For each of the 7 large-scale synoptic models, historical data of the initial values (on a grid) used to start MM5.
- Data for 2 years if possible, though a shorter period would be enough to get started.
- The lowest resolution that still yields a reasonable 48-hour forecast.
- All 6 variables used, with at least the minimum number of layers needed to initialize MM5.
- Observed values at forecast hours 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, and 48 for each initialization day. Again, a subset would suffice if not all of these observations are available.
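The requested observation times (every 3 hours from hour 0 through hour 48 of each initialization) can be generated rather than typed out when assembling the request. A minimal sketch, assuming Python's standard `datetime`; the helper name `requested_valid_times` is hypothetical, and the initialization time is left as a parameter because the slide does not state which model cycle is used.

```python
from datetime import datetime, timedelta

# Forecast hours requested for Data Sets A and B: 0, 3, 6, ..., 48.
FORECAST_HOURS = tuple(range(0, 49, 3))


def requested_valid_times(init_time, hours=FORECAST_HOURS):
    """Return the valid times (init_time + h hours) for one initialization.

    Hypothetical helper for building the observation request; it makes no
    assumption about which cycle (e.g. 00 or 12 UTC) the run started from.
    """
    return [init_time + timedelta(hours=h) for h in hours]


if __name__ == "__main__":
    init = datetime(2002, 2, 10, 0)  # example initialization time
    print(len(FORECAST_HOURS))  # 17 forecast hours per initialization
    for t in requested_valid_times(init)[:3]:
        print(t)
```

Each initialization therefore contributes 17 observation times per site, which is a quick way to estimate the size of the request.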
Data Set B (Statistics)
- Data for methods (2) and (3)
- For the second method we need data similar to what we currently have (phase 1 and phase 2), i.e., MM5 output driven by each of the 7 large-scale synoptic models, plus observations, for the MM5 output variables Z, P, T, U, V, and Q.
- For every daily MM5 run, we would like predictions at forecast hours 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, and 48. These may be either interpolated to the observation sites or in raw gridded form.
- We would also like observations for those runs at the same times.
- (a) Variables P, T, U, V
- (b) Variables Z, Q (same days and times as (a))
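If the forecasts arrive in raw gridded form, site values can be obtained by interpolating from the surrounding grid points; bilinear interpolation is one common choice, not necessarily the one Atmospheric Sciences would use. A minimal sketch, assuming the site location has already been converted to fractional grid indices (the MM5 grid is map-projected, so that conversion is a separate step not shown here); the function name `bilinear_at_site` is hypothetical.

```python
import math


def bilinear_at_site(field, x, y):
    """Bilinearly interpolate a 2-D gridded field to fractional indices (x, y).

    field[j][i] holds the value at row j, column i (at least 2 x 2 points).
    x is the fractional column index and y the fractional row index; both are
    assumed to come from a separate lat/lon-to-grid conversion.
    """
    ny, nx = len(field), len(field[0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1):
        raise ValueError("site lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so the last row/column works.
    i0 = min(int(math.floor(x)), nx - 2)
    j0 = min(int(math.floor(y)), ny - 2)
    dx, dy = x - i0, y - j0
    f00 = field[j0][i0]
    f10 = field[j0][i0 + 1]
    f01 = field[j0 + 1][i0]
    f11 = field[j0 + 1][i0 + 1]
    # Weighted average of the four corners.
    return (f00 * (1 - dx) * (1 - dy)
            + f10 * dx * (1 - dy)
            + f01 * (1 - dx) * dy
            + f11 * dx * dy)


if __name__ == "__main__":
    # Toy 2 x 2 temperature grid; values are made up for illustration.
    temps = [[10.0, 12.0],
             [14.0, 16.0]]
    print(bilinear_at_site(temps, 0.5, 0.5))  # 13.0, the mean of the four corners
```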
Data Set C (Statistics)
- Data for all methods (1), (2), (3), and (4)
- Climatological data, i.e., the long-term averages of the 6 variables (Z, P, T, U, V, Q) at each site with many observations
- Any other atmospheric variables that are part of that data set
- We would like this for the same days and times as the other data sets.
- We understand that these data have only recently (since December) been saved in an easy-to-use format.
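A site climatology of this kind is a grouped average over the observation archive. A minimal sketch, assuming the observations are available as `(site, variable, valid_time, value)` tuples with missing values already removed; grouping by hour of day is an assumption (a monthly or day-of-year key would work the same way), and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def climatology(records):
    """Average observations by (site, variable, hour of day).

    records: iterable of (site, variable, valid_time, value) tuples.
    Returns a dict mapping (site, variable, hour) to the mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, var, valid_time, value in records:
        key = (site, var, valid_time.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Made-up temperatures (K) at KNUW, for illustration only.
    obs = [
        ("KNUW", "T", datetime(2001, 1, 1, 0), 280.0),
        ("KNUW", "T", datetime(2001, 1, 2, 0), 282.0),
        ("KNUW", "T", datetime(2001, 1, 1, 12), 276.0),
    ]
    for key, mean in sorted(climatology(obs).items()):
        print(key, mean)
```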
Delivery Timetables for Statistics MURI Data
| Data set | Date asked | Statistics wants it by | Atmospheric Sciences can provide it by | Collected by | Format Statistics wants | Format delivered |
|---|---|---|---|---|---|---|
| A | February 10, 2002 | June 1, 2002 | June 1, 2002 | Eric | ASCII | Binary |
| B (a) | February 10, 2002 | June 15, 2002 | July 1, 2002 / August 1, 2002 | Eric | ASCII | ASCII |
| B (b) | February 10, 2002 | August 1, 2002 | July 1, 2002 / August 1, 2002 | Eric | ASCII | ASCII |
| C | May 13, 2002 | August 15, 2002 | June 1, 2002 | David | ASCII | ASCII/Binary |
APL Data Requirements (1): Personnel, Current Work, Data Status
- Mark Kruger and Brad Bell
- Simple statistical analysis to display the variability in the data at a given location (KNUW).
- Temperature and wind speed information from the main MM5 output as well as from the ensemble members.
- Data is being provided; only the analysis remains.
- Tom Anderl and Keith Kerr
- Root-mean-square (RMS) analysis of the global model input data to monitor the health of these inputs over time.
- Initially, 500-mb heights and SLP data from all 6 global models at the 0-, 24-, and 48-hour forecast times. More data sets may be requested later, but these 36 data sets (6 models × 2 fields × 3 forecast times) will be enough to prototype the RMS analysis technique.
- 500-mb height and SLP data from all 6 global models for the 0-, 24-, and 48-hour forecasts are available in NetCDF format. Retrieval of these sets from the ATMOS machines to APL is automated, but automatic creation of the data sets is on hold until the initial sets have been vetted.
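Both analyses on this slide reduce to simple statistics. A minimal sketch with hypothetical function names: `ensemble_spread` summarizes the variability of ensemble members at one site such as KNUW (mean and sample standard deviation), and `rms_difference` computes the root-mean-square difference between two fields on the same grid. The slide does not say which reference field the RMS analysis compares against; pairing a forecast with its verifying analysis is an assumption. The last lines enumerate the prototype request to confirm its size; the model names are placeholders.

```python
import math
import statistics
from itertools import product


def ensemble_spread(values):
    """Return (mean, sample standard deviation) of ensemble members at one site.

    statistics.stdev needs at least two members.
    """
    return statistics.mean(values), statistics.stdev(values)


def rms_difference(field_a, field_b):
    """Root-mean-square difference between two equally sized gridded fields."""
    flat_a = [v for row in field_a for v in row]
    flat_b = [v for row in field_b for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("fields must be non-empty and the same size")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a))


# Prototype request: every (model, field, forecast hour) combination.
MODELS = [f"model{k}" for k in range(1, 7)]  # placeholder names
FIELDS = ["500mb_height", "slp"]
HOURS = [0, 24, 48]
REQUEST = list(product(MODELS, FIELDS, HOURS))

if __name__ == "__main__":
    print(len(REQUEST))  # 36
    print(ensemble_spread([3.0, 5.0, 4.0]))
    print(rms_difference([[1.0, 2.0]], [[1.0, 4.0]]))
```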
APL Data Requirements (2): Personnel, Current Work, Data Status
- Scott Sandgathe
- Pattern matching and other advanced analysis.
- Output from the MM5 ensemble members (6 different input models, 1 centroid, and 6 reflections of the input models about the centroid) for SLP, 700-mb heights, 500-mb heights, 925-mb winds, 850-mb temperature, relative humidity (surface?), and precipitation.
- I know where to find the ensemble outputs and how to write the data out in ASCII. I may have to ask some questions about which variables correspond to the data Scott is looking for. Some work will be needed to automate the data flow from ATMOS to APL. Barring any major holdups, this should take about a week.
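The 13-member layout (6 input models, 1 centroid, 6 reflections) follows from a simple construction: the centroid is the mean of the input states, and each reflection mirrors an input state through the centroid, i.e. centroid + (centroid - member) = 2 × centroid - member. A minimal sketch on flattened fields, assuming an equal-weight centroid; it shows only the reflection step, not how the resulting states are fed to MM5, and the function names are hypothetical.

```python
def centroid(members):
    """Equal-weight mean of member fields (each a flat list of floats)."""
    n = len(members)
    return [sum(vals) / n for vals in zip(*members)]


def mirror(members):
    """Reflect each member through the centroid: 2 * centroid - member."""
    c = centroid(members)
    return [[2.0 * ci - mi for ci, mi in zip(c, m)] for m in members]


if __name__ == "__main__":
    # Two toy input states on a 3-point grid; numbers are made up.
    inputs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    print(centroid(inputs))  # [2.0, 2.0, 2.0]
    print(mirror(inputs))    # [[3.0, 2.0, 1.0], [1.0, 2.0, 3.0]]
```

With 6 input models this gives 6 + 1 + 6 = 13 members, and the mirrored members share the centroid of the originals.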
Data Storage
- Access and network traffic are burdensome.
- Eventually, each group will need its own repository (up to a terabyte).
- APL can store its own data and help Statistics with data storage and preparation.
- Indexing and access mechanisms for the data archive should be planned carefully.
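One way to make archive indexing explicit is a deterministic path scheme keyed on model, initialization time, forecast hour, and variable, so any group can locate a file without a separate lookup table. A sketch of a hypothetical layout; the directory structure, file-name pattern, and `.nc` extension are illustrative assumptions, not an agreed convention.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archive_path(root, model, init_time, forecast_hour, variable):
    """Build a hypothetical archive path of the form
    root/model/YYYY/MM/YYYYMMDDHH/fHHH_variable.nc
    """
    return PurePosixPath(
        root,
        model,
        init_time.strftime("%Y"),
        init_time.strftime("%m"),
        init_time.strftime("%Y%m%d%H"),
        f"f{forecast_hour:03d}_{variable}.nc",
    )


if __name__ == "__main__":
    p = archive_path("/data/muri", "model1", datetime(2002, 5, 13, 0), 24, "slp")
    print(p)  # /data/muri/model1/2002/05/2002051300/f024_slp.nc
```

Zero-padding the dates and forecast hours keeps lexical order identical to chronological order within a directory, which simplifies listing and range queries.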
Personnel Resources
- The main workload of getting data ready falls on Atmospheric Sciences (they know where everything is).
- A new hire should be brought on within 2-3 months to help with data and programming.
- APL can help with some data provisioning for Statistics; Mark Kruger will act as liaison.
Data Manipulation Tools
- NetCDF APIs (distribution format)
- Scripting for process chaining, etc.
- VisAD for numeric field operations
- Manipulation and visualization API
- Used by many meteorological groups
- Source code available
- Only Scott and Eric seem to need this now
Running Models
- APL may acquire COAMPS and learn how to run it in 6-12 months.
- MM5: Yulia will need someone who understands the model physics to help her get started (a new hire?).