Data Archiving

Steven Worley, NSF/NCAR/SCD


Transcript and Presenter's Notes

1
Data Archiving
  • Definition: Research data archive
  • Meteorological and physical oceanographic data
  • In situ observations
  • Satellite measurements (some)
  • Analyses of the observations
  • Weather Center daily output
  • Reanalyses Projects
  • Predictive model output (Ocean, Atmosphere)
  • Built over 35 years
  • 500 datasets, 55 TB, 350K files on MSS
  • Nine data stewards (grad. degrees in met./ocn.)

2
  • NCEP (ADP)
  • 1975-2003
  • From operational files
  • 8/day to 4/day
  • code changes (e.g. wx)
  • always 8 synoptic time observations
  • hourly through 1996
  • now, 1997-2003
  • more hourly ASOS (NWS)
  • 20 min. data AWOS (FAA)
  • 40-50 duplicates

Jan. 1990, 50 obs/month
3
Data Archiving
  • Practices and Policies
  • Save 2x copies
  • Offsite backup
  • under different management system
  • Time stable attributes
  • No proprietary data formats
  • Access software in basic languages
  • Fortran, C,
  • Software dependence: additional care
  • E.g. netCDF, HDF, GRIB
  • Shared responsibility: cross-agency
  • For large collections
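The "save 2x copies" and "offsite backup" practices imply that the two copies should be checked against each other from time to time. A minimal sketch of such a check follows, assuming both copies are reachable as ordinary files; the function names and the choice of SHA-256 are illustrative assumptions, not the tooling described in the presentation.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read chunk by chunk so large archive files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary_path, backup_path):
    """True when the primary and backup copies are byte-for-byte identical."""
    return sha256_of(primary_path) == sha256_of(backup_path)
```

Storing the digest next to each file as plain text would keep the check itself free of proprietary formats, in line with the policies listed above.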

4
(No Transcript)
5
Data Maintenance
  • Practices and Policies
  • Use a change control system for all transactions
  • Creation
  • File additions, fixes, replacement
  • Metadata updates
  • The data and metadata remain tightly linked.
  • Note: the CCS itself must be viable for decades
  • Same principles as the archive
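The change-control idea above can be sketched as an append-only log in which every creation, file change, and metadata update is one plain-text record. This is only an illustration under assumptions: the action names, the `Transaction` fields, and the dataset identifier in the usage note are hypothetical, and the presentation does not say how its CCS is implemented.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical action names covering the transaction types on the slide.
ALLOWED_ACTIONS = {"create", "add_file", "fix_file", "replace_file", "update_metadata"}


@dataclass(frozen=True)
class Transaction:
    dataset_id: str
    action: str
    target: str
    note: str
    timestamp: str


def record(log, dataset_id, action, target, note=""):
    """Append one transaction to the log as a JSON text line and return it."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    stamp = datetime.now(timezone.utc).isoformat()
    entry = Transaction(dataset_id, action, target, note, stamp)
    # Entries are only ever appended, never edited, so the history is complete.
    log.append(json.dumps(asdict(entry), sort_keys=True))
    return entry


def history(log, dataset_id):
    """Return every logged transaction for one dataset, oldest first."""
    entries = [json.loads(line) for line in log]
    return [entry for entry in entries if entry["dataset_id"] == dataset_id]
```

For example, `record(log, "ds-example", "update_metadata", "readme.txt")` adds one line to `log`, and `history(log, "ds-example")` returns all changes to that dataset. Keeping data and metadata changes in the same log is one way to keep the two tightly linked, and plain JSON lines stay readable without special software.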

6
Data Maintenance
  • Practices and Policies - Concerns
  • Fact: Huge collections of web-based documentation
  • Text, Images, Links
  • Embedded scripting (e.g. JavaScript)
  • HOW DO YOU ARCHIVE WEB SITES?
  • Access content 20 years from now?
  • Data and metadata in DBMSs
  • Software dependent
  • Not viable for long-term archiving: a technology trap
  • Must have DBMS transition plans

7
Data Access
  • Discovery
  • Community Data Portal (NCAR/UCAR wide)
  • Standardized metadata
  • Data Server
  • Detailed metadata, all documentation
  • Four access modes
  • MSS
  • Only for users with MSS connection (NCAR/UCAR)
  • By individual request
  • Handled by data stewards
  • CDROM
  • Data Server
  • Not all data

8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
Access from the DSS server
  • Metrics
  • Period 7/2003-6/2004
  • Web
  • 6TB total volume
  • 10K unique users, 1K repeat users 
  • FTP
  • 10TB total volume (6.6 TB ERA40)
  • 1K unique users, 140 repeat users
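The figures above can be turned into a few ratios for comparison. The sketch below uses only the numbers on this slide; "10K" and "1K" are taken as 10,000 and 1,000, which is an assumption since the slide values are rounded, and the variable names are illustrative.

```python
# Metrics for 7/2003-6/2004, as listed on the slide (rounded values).
web_volume_tb, web_unique, web_repeat = 6.0, 10_000, 1_000
ftp_volume_tb, ftp_unique, ftp_repeat = 10.0, 1_000, 140
ftp_era40_tb = 6.6


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


web_repeat_rate = share(web_repeat, web_unique)   # repeat users among web users
ftp_repeat_rate = share(ftp_repeat, ftp_unique)   # repeat users among FTP users
era40_share = share(ftp_era40_tb, ftp_volume_tb)  # ERA40 part of the FTP volume
total_volume_tb = web_volume_tb + ftp_volume_tb   # web and FTP combined

print(f"Web repeat rate:  {web_repeat_rate:.0%}")
print(f"FTP repeat rate:  {ftp_repeat_rate:.0%}")
print(f"ERA40 share, FTP: {era40_share:.0%}")
print(f"Total volume:     {total_volume_tb:.0f} TB")
```

On these rounded numbers, roughly one web user in ten and one FTP user in seven returned, and ERA40 alone accounts for about two thirds of the FTP volume.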

Top Ten Datasets (web)
12
Under development
  • Numerous data curation and stewardship projects
  • Dataset Metadata Improvements
  • augment text to <xml>
  • Some automatic
  • Some with tools
  • File metadata for historical files (many formats)
  • Standard: compliant or transformable to UCAR and US standards
  • Much more data online
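The "augment text to <xml>" step, in its automatic form, might look like the sketch below: simple `key: value` text metadata is converted into XML elements. The element names, the `dataset` root, and the input layout are assumptions for illustration only; they do not represent the UCAR or US metadata standards mentioned above.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text, root_tag="dataset"):
    """Convert 'key: value' lines into a flat XML document (returned as str)."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if ":" not in line:
            continue  # lines without a key are left for manual curation
        key, value = line.split(":", 1)
        tag = key.strip().lower().replace(" ", "_")
        if not tag:
            continue
        # Assumes keys are simple words that form valid XML element names.
        child = ET.SubElement(root, tag)
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")


sample = "Title: Example surface observations\nPeriod: 1975-2003\nfree-form remark"
print(text_to_xml(sample))
```

Lines that do not fit the pattern are skipped rather than guessed at, which matches the split on the slide between metadata handled automatically and metadata handled with tools.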

13
Under development
  • Broad and easy access
  • Presence in shared file systems
  • One copy for service, internal and external
  • Minimize MSS activity
  • CDP
  • Improved discovery based on better metadata
  • Well populated THREDDS catalogs
  • Leverage data GRID technologies
  • External user authentication
  • Gather access metrics
  • Real time data selection and access
  • Significant computing power, software, and
    strategies
  • Server-side data processing, analyses,
    interpolation, and comparison