Title: Scientific Investigations Support from Research Data Archives for Computing in Atmospheric Sciences
Scientific Investigations Support from Research Data Archives for Computing in Atmospheric Sciences 2001
- 29 October, 2001
- Steven Worley
- National Center for Atmospheric Research
- Scientific Computing Division
Key Steps of Scientific Investigations
- Formulate the questions and review the state of understanding
- Search and discover data
- Access data
- Analyze data
- Community sharing and archive
- Document new understandings
Search and Discover Data
- How? A Web-based Information Server
- Salient Features
- 2.5K HTML pages (metadata)
- All datasets are described (500)
- Location of all data files in the MSS (Mass Storage System)
- Higher level information
- Catalogs
- Project specific descriptions
- Dataset descriptions are always current
- Features
- Organization Navigation
- Archive Navigation
- Pull down menus
- Search
- Project Links
- Dataset Page
- Title and brief description
- Systematic Navigation
- Metadata highlights
- Period of Record
- Usage
- Variables
- Related Sites (NOAA)
- Contact Person
- Related Datasets
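The dataset page fields above map onto a simple metadata record, and the "Search" feature can be pictured as a keyword match over those records. A minimal Python sketch, assuming whole-word matching; the `DatasetPage` class, its field names, the `search` function, and the sample records are hypothetical illustrations, not the archive's actual schema or search engine.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPage:
    """Hypothetical record holding the fields shown on a dataset page."""
    dataset_id: str
    title: str
    brief_description: str
    period_of_record: str
    usage: str
    variables: list = field(default_factory=list)
    related_sites: list = field(default_factory=list)
    contact_person: str = ""
    related_datasets: list = field(default_factory=list)

    def words(self) -> set:
        """Lowercase words drawn from the searchable text fields."""
        text = " ".join([self.title, self.brief_description, self.usage, *self.variables])
        return set(text.lower().split())


def search(pages, query):
    """Return IDs of pages whose text contains every word of the query."""
    terms = query.lower().split()
    return [p.dataset_id for p in pages if all(t in p.words() for t in terms)]


# Hypothetical sample records (not real catalog entries)
pages = [
    DatasetPage("ds-A", "Example global reanalysis", "Gridded analyses",
                "example period", "Climate studies",
                variables=["air temperature", "sea level pressure"]),
    DatasetPage("ds-B", "Example daily SLP grids", "Daily sea level pressure",
                "example period", "NAO evaluations",
                variables=["sea level pressure"]),
]

print(search(pages, "sea level pressure"))  # ['ds-A', 'ds-B']
print(search(pages, "NAO"))                 # ['ds-B']
```

Matching whole words rather than substrings keeps short queries such as "NAO" from matching inside longer words.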
Brief Archive History and Specifications
- Started in the mid-1960s (35 years)
- Managed by nine people
- 211K data files
- 17 TB in the MSS
- 530 datasets of all sizes
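The specifications above also give rough averages per file and per dataset. A short Python calculation, assuming decimal units (1 TB = 10^6 MB) and treating the rounded figures (211K files, 17 TB) as exact; averages hide the wide range of dataset sizes in the archive.

```python
# Archive figures from the slide (rounded values treated as exact)
TOTAL_TB = 17
N_FILES = 211_000
N_DATASETS = 530

# Decimal units assumed: 1 TB = 1e6 MB = 1e3 GB
mb_per_file = TOTAL_TB * 1e6 / N_FILES        # about 81 MB per file
gb_per_dataset = TOTAL_TB * 1e3 / N_DATASETS  # about 32 GB per dataset
files_per_dataset = N_FILES / N_DATASETS      # about 398 files per dataset

print(f"{mb_per_file:.0f} MB/file, {gb_per_dataset:.0f} GB/dataset, "
      f"{files_per_dataset:.0f} files/dataset")
```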
Global Observations
- Uses
- Input for global atmospheric reanalysis
- Basic long-term climate assessment and case studies
Operational and Composite Analyses
- Daily SLP (sea level pressure) is a small but very popular dataset, e.g. for NAO (North Atlantic Oscillation) evaluations
- Two main operational centers provide the best current analyses
- Concerns
- Restricted distribution
- U.S. non-profits and UCAR members only
- Need online authentication and authorization for easy access
- Key Aspects
- Medium-size archive, 170 GB
- Multiple products, temporal resolutions, and spatial resolutions: complex
- Highlights
- Frequent updates to FNL (1°), daily via FTP
- High-resolution North America product, Eta at 40 km
- No distribution restrictions or cost
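As an illustration of an NAO evaluation from daily SLP, one common approach is a station-style index: standardized SLP at a southern point (near the Azores or Lisbon) minus standardized SLP at a northern point (near Iceland). The sketch below assumes two equal-length SLP series have already been extracted from the gridded daily SLP; the function names and sample values are hypothetical, and the slides do not say which NAO definition users applied. Published indices usually remove the seasonal cycle and standardize against a fixed base period, which this sketch skips by using each series' own mean and standard deviation.

```python
from statistics import mean, pstdev


def standardize(series):
    """Convert a series to z-scores using its own mean and standard deviation."""
    m, s = mean(series), pstdev(series)
    if s == 0:
        raise ValueError("series has zero variance")
    return [(x - m) / s for x in series]


def nao_index(slp_south, slp_north):
    """Station-style NAO index: standardized south SLP minus standardized north SLP."""
    if len(slp_south) != len(slp_north):
        raise ValueError("series must have the same length")
    return [s - n for s, n in zip(standardize(slp_south), standardize(slp_north))]


# Hypothetical daily SLP values in hPa
south = [1020.0, 1024.0, 1022.0, 1018.0]   # e.g. a grid point near the Azores
north = [1000.0, 990.0, 995.0, 1005.0]     # e.g. a grid point near Iceland
index = nao_index(south, north)
print([round(v, 2) for v in index])        # [-0.89, 2.68, 0.89, -2.68]
```

A positive value marks a day with a stronger-than-usual pressure contrast between the two points.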
Reanalyses
- Notes
- ERA-15 is finished; ERA-40 is running now
- NCEP II, primarily an experimental run
- Outstanding Features
- Three different coordinate surfaces
- Very long analysis, 2 TB in size
- Unrestricted distribution
- CD-ROMs are very popular
Countries Receiving Reanalysis CD-ROMs
- Highlights
- Over 8900 CD-ROMs, 1997-09/2001
- Recipients: U.S. 46; Japan 11; Canada and UK 4 each; Germany and India 3 each; Australia, South Korea, Spain, Mexico, Norway, Russia, and France 2 each
Reanalysis Users for 2001 (4th quarter estimated)
- From the MSS: 209 (Jan.-Sep.: 157)
- On CD-ROM: 47 (Jan.-Sep.: 35)
- Custom data orders on FTP or tape: 48 (Jan.-Sep.: 36)
- From the online server: 540 (Jan.-Sep.: 406)
- Total served: 844
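The per-channel user counts add up to the total. The short Python check below also scales the Jan.-Sep. counts by 12/9 for comparison; that straight-line extrapolation is an assumption for illustration, since the slide does not say how the fourth quarter was estimated.

```python
from fractions import Fraction

# Reanalysis users per channel: (2001 with 4th quarter estimated, Jan.-Sep.)
users = {
    "MSS": (209, 157),
    "CD-ROM": (47, 35),
    "Custom FTP/tape orders": (48, 36),
    "Online server": (540, 406),
}

full_year_total = sum(full for full, _ in users.values())
jan_sep_total = sum(part for _, part in users.values())
print(full_year_total, jan_sep_total)  # 844 634

# Assumed straight-line extrapolation of nine months of counts to twelve months
scaled = {name: round(part * Fraction(12, 9)) for name, (_, part) in users.items()}
print(scaled)
```

The 12/9 scaling reproduces the slide's estimates for three channels and differs by one for the online server (541 versus 540).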
Reanalysis Data Distributed for 2001 (4th quarter estimated)
- From the MSS: 9616 GB (Jan.-Sep.: 7230 GB)
- On CD-ROM: 808 GB (Jan.-Sep.: 935, at 650 MB/CD-ROM)
- Custom orders, FTP and tape: 1383 GB (Jan.-Sep.: 1040 GB)
- From the online server: 88 GB (Jan.-Sep.: 66 GB)
- Total: 11895 GB (11.9 TB)
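A similar check works for data volume. The sketch below sums the four channels and relates the CD-ROM volume to disc counts at 650 MB per disc. Reading the Jan.-Sep. CD-ROM figure of 935 as a number of discs, and using decimal units (1 GB = 1000 MB), are assumptions.

```python
# Reanalysis volume per channel for 2001 (4th quarter estimated), in GB
volume_gb = {"MSS": 9616, "CD-ROM": 808, "Custom FTP/tape": 1383, "Online server": 88}

total_gb = sum(volume_gb.values())
print(total_gb, round(total_gb / 1000, 1))   # 11895 11.9

CD_MB = 650                                  # capacity per CD-ROM, from the slide
jan_sep_discs = 935                          # assumption: a count of discs
jan_sep_cd_gb = jan_sep_discs * CD_MB / 1000            # 607.75 GB
full_year_discs = volume_gb["CD-ROM"] * 1000 / CD_MB    # about 1243 discs

# Share of the total volume delivered by each channel
shares = {name: gb / total_gb for name, gb in volume_gb.items()}
print(f"MSS share: {shares['MSS']:.0%}")     # MSS share: 81%
```

By volume, most data (about 81%) leaves directly from the MSS, even though the online server reaches the most users.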
GCIP Model Data Center Collection
- High-resolution atmospheric models focused on the energy and hydrologic cycles
- Critical data for North American mesoscale studies
- Complete archive is about 1 TB
- GCIP: GEWEX Continental-Scale International Project; GEWEX: Global Energy and Water Cycle Experiment
Figure: 6-year mean temperature at 5 meters, University of Miami
- MICOM: Miami Isopycnic Coordinate Ocean Model, 1/12° resolution, 70°N to 28°S, 16-20 layers
Dataset Sizes and Scales
- Today
- 800 unique users
- 12 TB of data transferred
- 2 TB dataset size
- Example: NCEP/NCAR Reanalysis
- Near Future
- Excludes TB-to-PB Level 0 and Level 1 satellite data and super-scale experimental models
- Number of users: about the same
- Data transferred: 5x to 10x more?
- Dataset size: 2-20 TB
- Examples
- Ocean and atmosphere models
- ECMWF Reanalysis (ERA-40)
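The near-future projection can be made concrete with the slide's own numbers: if the user count stays near 800 while transfers grow 5x to 10x, volume per user grows by the same factor. A back-of-the-envelope Python sketch, assuming decimal units; the `projection` helper is hypothetical.

```python
def projection(users, transferred_tb, factor):
    """Return (total TB, GB per user) after scaling transfers by `factor`.

    Assumes the number of users stays fixed and 1 TB = 1000 GB.
    """
    total_tb = transferred_tb * factor
    return total_tb, total_tb * 1000 / users


# Today's figures from the slide: 800 users, 12 TB transferred
for factor in (1, 5, 10):
    total_tb, per_user_gb = projection(800, 12, factor)
    print(f"{factor}x: {total_tb} TB total, {per_user_gb:.0f} GB per user")
```

Today's 15 GB per user would become roughly 75 to 150 GB per user.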
Access to Data
- Methods
- NCAR computers
- From the local MSS
- Web data server
- Custom data packages
- By request (FTP, tape, CD-ROM)
- Users
- World-class programmers
- Research scientists
- Graduate students
- Undergraduate students
Data Access in the Future
- Do we continue doing what we are doing?
- Absolutely
- Why? It Works
- Over 1000 users annually
- Very diverse skills
- The archive is a heterogeneous collection
- Many formats (ASCII, binary, GRIB, BUFR, netCDF, HDF)
- Many sizes (1 MB to 2 TB)
- Capable of serving large and small projects
- Maintain a variety of flexible methods
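Serving a heterogeneous collection often starts with recognizing a file's format. Several formats in the list above begin with a fixed signature: GRIB and BUFR messages start with the ASCII strings "GRIB" and "BUFR", classic netCDF files with "CDF" plus a version byte, HDF4 files with a four-byte magic number, and HDF5 files with an eight-byte signature. A heuristic Python sketch; the function names are hypothetical, and real files can defeat it (for example, GRIB or BUFR messages preceded by a bulletin header, or an HDF5 signature placed after a user block).

```python
def sniff_format(head: bytes) -> str:
    """Guess a data format from the first bytes of a file (heuristic only)."""
    if not head:
        return "empty"
    if head.startswith(b"GRIB"):
        return "GRIB"
    if head.startswith(b"BUFR"):
        return "BUFR"
    if head[:4] in (b"CDF\x01", b"CDF\x02"):     # netCDF classic / 64-bit offset
        return "netCDF"
    if head.startswith(b"\x0e\x03\x13\x01"):     # HDF4 magic number
        return "HDF4"
    if head.startswith(b"\x89HDF\r\n\x1a\n"):    # HDF5 format signature
        return "HDF5"
    # Treat printable ASCII plus common whitespace as text
    if all(32 <= b < 127 or b in b"\t\n\r" for b in head):
        return "ASCII"
    return "binary (unknown layout)"


def sniff_file(path: str, nbytes: int = 8) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(nbytes))
```

Anything falling through to "binary (unknown layout)" still needs a dataset-specific reader, which is where adaptable I/O tools earn their keep.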
Data Access in the Future (continued)
- Keys to handling larger future collections
- Plan to create useful data products
- Condensed datasets from high-resolution output
- Group the most popular variables and products together
- Serve many users, e.g. via CD-ROMs and the WWW
- Continue to develop emerging online data systems
- User-driven subset selection with graphics and data download options
- Server-side elementary analysis
- Multi-dataset comparisons
- Statistical summaries and basic meteorological calculations
- Our development is the Community Data Portal (CDP)
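The server-side operations listed above (user-driven subset selection, statistical summaries, and condensing high-resolution output) reduce to a few small grid operations. A minimal pure-Python sketch on a regular latitude-longitude grid; the function names are hypothetical and this is not the Community Data Portal's interface. It uses plain unweighted means, whereas global statistics would normally weight each cell by the cosine of latitude, and it does not handle longitude boxes that cross the 0°/360° seam.

```python
from statistics import fmean, pstdev


def subset(grid, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Keep grid[i][j] where lats[i] and lons[j] fall inside the box (inclusive)."""
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    sub = [[grid[i][j] for j in cols] for i in rows]
    return sub, [lats[i] for i in rows], [lons[j] for j in cols]


def summarize(grid):
    """Unweighted count, mean, extremes, and standard deviation of all values."""
    values = [v for row in grid for v in row]
    return {"n": len(values), "mean": fmean(values), "min": min(values),
            "max": max(values), "std": pstdev(values)}


def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks; leftover edge cells are dropped."""
    n_rows, n_cols = len(grid) // factor, len(grid[0]) // factor
    return [[fmean(grid[bi * factor + di][bj * factor + dj]
                   for di in range(factor) for dj in range(factor))
             for bj in range(n_cols)]
            for bi in range(n_rows)]


# Hypothetical 4 x 4 field on a 5-degree grid
lats = [30, 35, 40, 45]
lons = [250, 255, 260, 265]
grid = [[10 * i + j for j in range(4)] for i in range(4)]

box, box_lats, box_lons = subset(grid, lats, lons, 35, 40, 255, 260)
print(box)                      # [[11, 12], [21, 22]]
print(summarize(box)["mean"])   # 16.5
print(coarsen(grid, 2))         # [[5.5, 7.5], [25.5, 27.5]]
```

Block averaging by an integer factor is the simplest way to condense output; it keeps the coarse grid aligned with the original one.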
Data Analysis
- Tools
- NCAR Command Language (NCL) software
- Features in brief
- I/O for many standard data formats
- Easily adapted to read any format
- Hundreds of meteorological functions
- Publication-quality graphics
- The CDP provides analysis capability
- NCL is one of several middleware packages
Community Sharing
- Support for the scientist
- A place to distribute new data results
- Possibly with authentication and authorization control
- E.g. model outputs
- Spin-off benefit
- New data resources for the archive
- Many users can then use the new products
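Authentication and authorization control, whether for shared model outputs or for restricted datasets such as the operational analyses limited to U.S. non-profits and UCAR members, comes down to a per-dataset policy checked after a user has logged in. A minimal Python sketch of only the authorization decision; the `User` fields, policy labels, and `may_download` function are hypothetical and do not describe NCAR's actual system. Authentication itself (verifying who the user is) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Attributes known about a user after a separate login step."""
    name: str
    authenticated: bool
    country: str          # e.g. "US"
    nonprofit: bool
    ucar_member: bool


# Hypothetical policy labels attached to datasets
OPEN = "open"                                  # no distribution restrictions
US_NONPROFIT_OR_UCAR = "us-nonprofit-or-ucar"  # restricted distribution
GROUP_ONLY = "group-only"                      # e.g. model output shared within a project


def may_download(user: User, policy: str, group: frozenset = frozenset()) -> bool:
    """Return True if the user may retrieve a dataset under the given policy."""
    if policy == OPEN:
        return True
    if not user.authenticated:
        return False                           # every restricted policy needs a login
    if policy == US_NONPROFIT_OR_UCAR:
        return (user.country == "US" and user.nonprofit) or user.ucar_member
    if policy == GROUP_ONLY:
        return user.name in group
    return False                               # unknown policy: deny by default


researcher = User("alice", True, "US", nonprofit=True, ucar_member=False)
print(may_download(researcher, US_NONPROFIT_OR_UCAR))  # True
```

Denying unknown policies by default means a mislabeled dataset fails closed rather than open.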
NCEP Operational Analyses Blended with QSCAT Satellite Data
- Figure (panels a-c): wind stress curl, 01/24/2000 1800 UTC
- NCEP operational only
- NCEP and QSCAT swaths
- OI (optimal interpolation) blend of NCEP and QSCAT
- Blending by Colorado Research Associates
- We archive all three products.
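The wind stress curl and OI blend described above involve two standard calculations. In its simplest scalar form, optimal interpolation weights an observation against a background value by their error variances, and the vertical component of wind stress curl is d(tau_y)/dx - d(tau_x)/dy. A minimal Python sketch under strong simplifying assumptions: at most one collocated observation per grid point, error variances given as constants, and a locally Cartesian grid with uniform spacing (spherical metric terms ignored). The function names are hypothetical, and this is not the Colorado Research Associates blending scheme, whose details the slides do not give.

```python
def oi_blend(background, obs, var_b, var_o):
    """Scalar optimal interpolation: background plus weighted innovation.

    Points without an observation (obs is None, e.g. outside the satellite
    swath) keep the background value.
    """
    if obs is None:
        return background
    weight = var_b / (var_b + var_o)   # larger background error trusts obs more
    return background + weight * (obs - background)


def blend_field(background, obs, var_b, var_o):
    """Apply oi_blend point by point to 2-D lists of the same shape."""
    return [[oi_blend(b, o, var_b, var_o) for b, o in zip(brow, orow)]
            for brow, orow in zip(background, obs)]


def stress_curl(tau_x, tau_y, dx, dy):
    """Wind stress curl d(tau_y)/dx - d(tau_x)/dy by centered differences.

    Arrays are indexed [j][i], with j along y and i along x. With stress in
    N/m^2 and spacing in m, the result is in N/m^3. Only interior points are
    returned, so the result is (ny - 2) x (nx - 2).
    """
    ny, nx = len(tau_x), len(tau_x[0])
    return [[(tau_y[j][i + 1] - tau_y[j][i - 1]) / (2 * dx)
             - (tau_x[j + 1][i] - tau_x[j - 1][i]) / (2 * dy)
             for i in range(1, nx - 1)]
            for j in range(1, ny - 1)]


# Rotational test field: tau_x = -y, tau_y = x has curl 2 everywhere
tau_x = [[-float(j) for i in range(4)] for j in range(4)]
tau_y = [[float(i) for i in range(4)] for j in range(4)]
print(stress_curl(tau_x, tau_y, dx=1.0, dy=1.0))  # [[2.0, 2.0], [2.0, 2.0]]
```

Blending each stress component first and then taking the curl yields a blended curl field like the third product listed above; the curl is sensitive to small-scale structure, which is one reason swath data change it noticeably.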