Title:

Archiving

Description:

John Milburn & Jochen Horn, UCLA. March 7-8, 2000. DCS Preliminary Design Review.

Learn more at: https://www.cis.rit.edu

1
Archiving
  • Requirements for the SOFIA Data Cycle System
  • Mark Morris, UCLA
  • Joe Mazzarella & Steve Lord, IPAC
  • John Milburn & Jochen Horn, UCLA

2
SOFIA Data Archives
  • Purposes
  • Maximize the scientific productivity of SOFIA
  • Provide public information about existing data
  • Provide data backups
  • Verification of data products
  • Informs related science projects
  • Archival research
  • e.g., context of the Astrophysics Data Program
  • Supplement to other funded or unfunded research.
  • Motivates publication

3
SOFIA Data Archives
  • SOFIA observations to be archived in three forms:
  • SUMMARY ARCHIVE: data headers and logs
  • WORKING ARCHIVE: raw data from all instruments
  • PUBLIC ARCHIVE: reduced data from facility instruments

4
SOFIA Databases

5
SOFIA Data Archives
  • SUMMARY ARCHIVE
  • on-line, equipped with search tool
  • maintained at the SSMOC
  • headers of each observation, giving
  • source names, positions
  • instrument and its parameter settings, integration
    times
  • important environmental and aircraft parameters
  • links to the flight and observing logs
  • identities of P.I. and observer (if different)
  • includes proposal abstract

6
SOFIA Data Archives
  • LOGS
  • Highly automated
  • Can be annotated
  • Flight log
  • Details of observatory functions, flight
    parameters
  • Observing log
  • Fundamental set of observing parameters (e.g.,
    observer ID, source ID, position, instrument
    mode, frequency, filters, bandwidth, start and stop
    times, integration times, chop/nod configuration,
    water vapor index, etc.)
  • Optional set of custom parameters
  • Includes wrap-up commentary (exit interview)

7
SOFIA Data Archives
  • WORKING ARCHIVE
  • purposes
  • Fundamental repository of all untreated SOFIA
    data
  • Backup
  • Resource for archival research
  • maintained by SSMOC staff
  • includes
  • Contents of summary archive
  • Environmental and housekeeping data
  • Raw science data from all instruments
  • made available upon request by qualified
    individuals with a web-based request form on a
    SOFIA archive page having links to the data
    reduction tools. Access requests subject to
    approval by a person with designated authority.
  • access subject to validation period for all but
    the proposing PI.
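
The access rule for the Working Archive can be sketched as a small check. This is a minimal sketch under stated assumptions: the one-year period, the function name, and its parameters are hypothetical, since the slides do not give the length of the validation period, and it assumes the proposing PI needs no separate approval.

```python
from datetime import date, timedelta

# Hypothetical value: the slides do not state how long the validation period is.
VALIDATION_PERIOD = timedelta(days=365)


def may_access_raw_data(requester: str, proposing_pi: str,
                        observed_on: date, today: date,
                        approved: bool) -> bool:
    """Return True if a Working Archive request may be served.

    The proposing PI is exempt from the validation period. Everyone else
    needs an approved request and must wait until the period has elapsed.
    """
    if requester == proposing_pi:
        return True
    if not approved:
        return False
    return today >= observed_on + VALIDATION_PERIOD
```

Keeping the period in one named constant means a policy change touches a single line.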

8
SOFIA Data Archives
  • PUBLIC ARCHIVE
  • created from Working Archive data from (at
    least) the facility instruments which have been
    carried through a standard data reduction
    pipeline
  • fully accessible on the web, following
    validation period
  • maintained at the SSMOC, mirrored at IPAC
  • Consistent in form and function with other
    mission archives embedded within IRSA at IPAC.
  • accompanying tools to examine, and extract
    quantitative information from, the archived
    images and spectra. Where feasible, existing
    IRSA tools will be adapted to this end.

9
SOFIA Data Archives
  • METADATA ARCHIVES recognizing evolution of both
    software and instrumentation
  • Pipeline components - version tracking
  • Pipelines
  • Documentation
  • manuals
  • tutorials

10
Assumptions (1)
The SOFIA Archive Requirements and Design are
being developed with the following assumptions:
  • The primary use is by scientists using the Web.
  • All components of the archive will reside at NASA
    Ames, with "mirrors" of the Public Archive placed
    at remote data centers such as IPAC.
  • SOFIA Facility Instruments will support General
    Investigators (GIs) using the concept of
    Astronomical Observation Templates (AOTs) and
    Astronomical Observation Requests (AORs).
  • The Public Archive will support a well defined
    set of FI AOTs, each of which will be reduced by
    software module pipelines delivered to the SOFIA
    Data Cycle System (DCS).

11
Assumptions (2)
  • PI Instrument data will be supported at the
    Working Archive level.
  • The Archive will consist of science, calibration,
    and laboratory test data from the Facility
    Instruments, plus SSMOC Housekeeping data.
  • SOFIA archive data are for public use after a
    reasonable validation period for proper
    reduction, calibration, and science validation by
    observing teams with support from the SSMOC.
  • The requirements are aimed at the SSMOC, the
    Facility Instrument teams, and the DCS software
    developers for the archive system.
  • Observations will be tracked through their
    complete lifecycle from the AOR through the raw,
    reduced, and final calibrated science data
    products using a unique Observation
    Identification number (OBSID).
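
The lifecycle tracking named in the last assumption could be modeled roughly as follows. This is only a sketch: the `Stage` names, the `Observation` class, and the OBSID string format are hypothetical and are not taken from the DCS design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    """Lifecycle stages listed on the slide, in order."""
    AOR = 1
    RAW = 2
    REDUCED = 3
    CALIBRATED = 4


@dataclass
class Observation:
    obsid: str  # hypothetical format, e.g. "OBS-0001"
    products: Dict[Stage, str] = field(default_factory=dict)

    def record(self, stage: Stage, filename: str) -> None:
        """Attach the product file for one lifecycle stage."""
        self.products[stage] = filename

    def latest_stage(self) -> Optional[Stage]:
        """Return the furthest stage reached, or None if nothing is recorded."""
        if not self.products:
            return None
        return max(self.products, key=lambda stage: stage.value)
```

Because every product hangs off one OBSID, a single lookup answers both "what exists for this observation" and "how far has it been processed".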

12
Archive Interactions with DCS Components

13
High Level Archive Requirements
  • The archive shall simplify use and reuse of SOFIA
    data during reduction, analysis, interpretation
    and publication.
  • The archive shall enable the DCS to store and
    retrieve uniform data products.
  • The archive will adhere to existing (FITS) and
    emerging (XML) standards for data storage and
    interchange between software modules.
  • The archive shall support continuous improvement
    of data reduction pipelines and improvements in
    calibration procedures.
  • The archive shall support online data access for
    humans (Web interfaces) and remote software
    clients (e.g., via XML-based "server mode") from
    other astronomical data centers.
  • The archive shall provide services for archival
    research, including search tools and quantitative
    measurement tools.

14
Functional Requirements (1)
The Archive software shall support efficient and
reliable data insertion functions and procedures.
  • Functions shall be provided to insert raw data
    from all instruments into the Working Archive and
    update a registry (index) of data files. This
    shall be done routinely after each flight.
  • Insertion of intermediate and reduced data files
    resulting from pipeline processing will be
    handled by pipeline modules; these modules
    shall adhere to the file and directory naming
    conventions outlined in the Directory and File
    Naming Conventions.
  • Functions shall be provided to insert into the
    proper Archive level (Working, Summary, Public)
    FITS tables, catalogs, or text files, including
    but not limited to
  • Calibration sources
  • Source lists (targets) for observing programs
  • Observing Logs
  • Flight Plans
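
The insert-and-register step could look roughly like this. It is a sketch under stated assumptions: the `registry` table layout, the function names, and the use of SQLite and SHA-256 are illustrative choices, not part of the SOFIA design.

```python
import hashlib
import sqlite3

ARCHIVE_LEVELS = ("Working", "Summary", "Public")


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical registry of archived files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registry ("
        " filename TEXT PRIMARY KEY, obsid TEXT, level TEXT, sha256 TEXT)"
    )
    return conn


def register_file(conn: sqlite3.Connection, filename: str, obsid: str,
                  level: str, payload: bytes) -> str:
    """Record one file in the registry and return its checksum."""
    if level not in ARCHIVE_LEVELS:
        raise ValueError(f"unknown archive level: {level}")
    digest = hashlib.sha256(payload).hexdigest()
    # Re-registering the same filename replaces the earlier entry.
    conn.execute(
        "INSERT OR REPLACE INTO registry VALUES (?, ?, ?, ?)",
        (filename, obsid, level, digest),
    )
    conn.commit()
    return digest
```

Storing a checksum at insertion time also gives later integrity checks something to compare against.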

15
Functional Requirements (2)
Data for Facility and PI instruments shall be
stored and maintained in the Working, Summary and
Public Archive levels as follows

16
Functional Requirements (3)
Data in the Working, Summary and Public Archive
levels shall be publicly available online through
a Web interface for the different instrument
types as follows

17
Functional Requirements (4)
  • Functions shall be provided to verify the
    integrity and validity of the data products.
  • Functions shall be provided to copy and track
    (version control) validated data products from
    the Working Archive into the Public Archive.
  • The Archive software shall provide functions to
    extract metadata to populate the Summary Archive.
  • The software shall extract header records from
    FITS data in the Working Archive and insert
    metadata into DBMS tables to support queries of
    the Summary Archive.
  • The software shall convert (or "wrap") FITS to
    provide an API to the emerging Astronomical XML
    (AML) format for data and summary (metadata)
    interchange with other data and information
    systems, for example IRSA, STScI, HEASARC, NED,
    and others.
  • The software shall automatically create links
    between data products, calibration files, and
    documentation as described in the Summary Archive
    Contents Requirements.
  • Functions shall be provided to cross-reference
    flight video and audio recordings with the
    Working Archive and other relevant FITS data
    products.
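
The header extraction requirement can be illustrated with a small parser that reads 80-character FITS header cards and loads them into a database table. This is a sketch, assuming well-formed cards: it ignores commentary cards, does not handle escaped quotes or long-string continuation, and the `summary` table layout and function names are hypothetical. A production system would more likely use an established FITS library.

```python
import sqlite3
from typing import Dict


def parse_cards(header: str) -> Dict[str, str]:
    """Parse 80-character FITS header cards into a keyword -> value dict."""
    values: Dict[str, str] = {}
    for start in range(0, len(header), 80):
        card = header[start:start + 80]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # commentary card (COMMENT, HISTORY, blank)
        raw = card[10:]
        if raw.lstrip().startswith("'"):
            # String value: text between the opening and closing quote.
            first = raw.index("'") + 1
            last = raw.index("'", first)
            values[keyword] = raw[first:last].rstrip()
        else:
            # Numeric or logical value: everything before the comment slash.
            values[keyword] = raw.split("/", 1)[0].strip()
    return values


def store_header(conn: sqlite3.Connection, obsid: str,
                 values: Dict[str, str]) -> None:
    """Insert one row per keyword into a hypothetical summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summary"
        " (obsid TEXT, keyword TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO summary VALUES (?, ?, ?)",
        [(obsid, key, value) for key, value in values.items()],
    )
    conn.commit()
```

A keyword/value table is the simplest layout to query by keyword; a real Summary Archive might instead promote frequently searched keywords to their own columns.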

18
Functional Requirements (5)
Queries
  • The Archive software shall support queries for
    data sets meeting selection criteria meaningful
    to astronomers.
  • Queries shall allow location of raw data products
    in the Working Archive.
  • Queries shall allow location of reduced and
    calibrated data in the Public Archive.
  • After searching based on query constraints as
    described above, the user shall have the ability
    to select one or more returned data set
    "handles", which are based on well-documented
    Observation ID (OBSID) numbers, to download the
    data immediately to his or her local computer via
    HTTP or FTP.
  • A Web query form shall be provided which allows
    users to input a known Observation ID (OBSID)
    number to directly return the data products and
    optionally a subset of its associated
    Housekeeping Data and Documentation.

19
Functional Requirements (6)
  • The Archive software shall support queries
    involving astronomical positions in standard
    coordinate systems.
  • The Archive software shall recognize queries on
    astronomical sky regions using cone searches and
    ranges expressed in standard coordinate systems.
  • The Archive software shall support queries on
  • astronomical object names.
  • SOFIA instrument names.
  • AOT names and AOT parameters such as instrumental
    passbands, filter names, etc.
  • Wavelength ranges using standard astronomical
    conventions.
  • Time intervals
  • SOFIA Observation Identifiers (OBSIDs)
  • Observer names (PIs, Co-Is)
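
A cone search of the kind required above selects every observation whose position lies within a given angular radius of a target position. The following is a minimal sketch, assuming positions are stored as right ascension and declination in degrees; the dictionary keys `ra`, `dec`, and `obsid` are hypothetical.

```python
import math
from typing import Dict, List


def angular_separation_deg(ra1: float, dec1: float,
                           ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(observations: List[Dict], ra: float, dec: float,
                radius_deg: float) -> List[Dict]:
    """Return the observations within radius_deg of the position (ra, dec)."""
    return [
        obs for obs in observations
        if angular_separation_deg(ra, dec, obs["ra"], obs["dec"]) <= radius_deg
    ]
```

The haversine form is used because it stays numerically stable for the small separations typical of such searches. A linear scan is adequate for a sketch; a large archive would put a spatial index in front of it.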

20
Functional Requirements (7)
Document Tracking
  • The Archive shall support tracking of data
    products.
  • The Archive shall support tracking of data
    reduction software modules and pipeline
    sequences.
  • The Archive shall support registration and
    tracking of documentation.

21
Functional Requirements (8)
The DCS User Interface shall support modes
of interaction with human users and software
components
  • Command-line user interfaces to each component.
  • Standard Uniform Resource Locators (URLs)
    accessible through Web-based forms and remote
    client software
  • A "server-mode" for use by client software within
    the DCS and from remote sites.
  • Graphical user interface (GUI) "widgets" for
    access to the archive integrated into the SOFIA
    Observation Planning and Flight Planning tools.
  • Results from archive queries shall be returned in
    well defined and clearly documented data
    structures. Ideally these data structures will be
    in a self-documenting, object-oriented format
    using XML.
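
Returning query results as self-documenting XML might look like the sketch below. The element and attribute names (`QueryResult`, `Observation`, `obsid`) are invented for illustration and are not the AML format mentioned elsewhere in these slides. Field names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def results_to_xml(rows: List[Dict]) -> str:
    """Serialize query result rows into a small XML document."""
    root = ET.Element("QueryResult", count=str(len(rows)))
    for row in rows:
        # The OBSID is the handle for each data set, so it becomes an attribute.
        obs = ET.SubElement(root, "Observation", obsid=row["obsid"])
        for key, value in row.items():
            if key == "obsid":
                continue
            ET.SubElement(obs, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A remote client in "server mode" could parse this output with any XML library and read the field names directly from the element tags.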

22
Data Content Requirements - Summary Archive
The Summary Archive shall
  • store observation FITS header keywords and values
    extracted from the data products in a format that
    efficiently supports user queries.
  • contain Project Status information.
  • contain links to abstracts of Observing
    Proposals.
  • contain PI Observing Run Abstracts and Detailed
    Observing Logs.
  • contain links to the executed Flight Plans.
  • contain Flight Director Logs.
  • contain links to the Working and Public Archives,
    Pipeline Software Archive, Documentation Library,
    and Bibliography.

23
Data Content Requirements - Working Archive (1)
The Working Archive shall
  • store raw data (science and calibration) acquired
    from all SOFIA instruments. The raw data and
    related Housekeeping data shall be deposited into
    the Working Archive immediately after a
    successful SOFIA flight, ideally within a few
    hours after landing.
  • serve as the primary data repository. Data
    reduction pipelines will read raw data and write
    intermediate data produced by the Standard Data
    Product pipelines into the Working Archive.
  • serve as a data backup for General Investigators.
  • be housed at the SSMOC, with its data made
    available to eligible PIs and Co-Is as soon as
    the data enter the archive, and to the public
    after the requisite validation period. The
    Working Archive will be available online, but
    Working datasets will be transferred onto a
    Web-accessible (FTP) area with password
    protection.

24
Data Content Requirements - Working Archive (2)
The Working Archive shall
  • track the processing history of science data
    products and instrument calibration files,
    notably for intermediate and reduced data
    products which are preliminary or unvalidated,
    and thus not yet copied to the Public Archive.
  • contain Housekeeping data pertaining to the state
    or status of the instruments, the aircraft, the
    telescope, and observing conditions (environment)
    while observations were made and data were
    collected.
  • contain FITS data files of Housekeeping and
    instrument calibration data stored either as
    header keywords and values, or pointers to more
    extensive data in auxiliary files which are
    required for data reduction and calibration by
    the pipelines.
  • serve as a resource for archival research,
    especially for people who wish to develop
    improvements to the data reduction algorithms to
    push the limits of the observations to make new
    scientific discoveries or improvements to
    previous interpretations.

25
Data Content Requirements - Working Archive (3)
Data in the Working Archive shall be linked to
other Archive components
  • Summary Archive Metadata
  • Actual Flight Plans
  • Project Status
  • Flight Logs
  • appropriate versions of Pipeline Data Reduction
    Software Archive and supporting documentation.
  • Reduced data in the Public Archive
  • Documentation Library
  • Video and audio recordings

26
Data Content Requirements - Public Archive (1)
  • The SOFIA Facility Instruments will each have a
    Standard Pipeline that will produce reduced,
    calibrated images, photometric measurements, or
    spectra for standard modes, or AOTs. Data
    products resulting from filled-in AOTs, which are
    called Astronomical Observation Requests (AORs),
    make up the Public Archive.
  • The Public Archive shall be accessible by GIs and
    the general public through Web-based query and
    request forms.
  • The Public Archive shall serve network-based
    requests for data from remote archive system
    software
  • The Public Archive data shall be mirrored at the
    Infrared Science Archive (IRSA) at IPAC, where
    interfaces and query engines will be developed
    and maintained in coordination with similar
    software used to support community access for
    data from NASA's other infrared missions.

27
Data Content Requirements - Public Archive (2)
Data in the Public Archive shall be linked to
other Archive components
  • Summary Archive Metadata
  • Actual Flight Plans
  • Project Status
  • Flight Logs
  • Pipeline Data Reduction Software Archive
  • the user interfaces for access to the raw data
    and housekeeping data in the Working Archive, and
    the Documentation Library

28
Data Format and Transport Standards
  • The SOFIA instruments shall produce files in FITS
    format as their primary raw data products. These
    will be transferred to the Archive team at the
    SSMOC and comprise the bulk of the Working
    Archive.
  • The DCS shall support archiving of FITS images
    and spectra using the Binary Table (BINTABLE)
    Extension Standard
  • SOFIA data shall follow a standard Dictionary for
    FITS Keyword Types.
  • Both FITS and XML formats will be supported for
    data interchange.
  • The "Observation Sequence Numbers" (OSNs) in a
    flight will be cross-referenced to the OBSID
    (Observation Identification) numbers in each PI's
    observing program using XML documents and/or
    database tables maintained in the Archive.
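
The cross-reference between per-flight Observation Sequence Numbers and OBSIDs can be pictured as a lookup table. This is a sketch only: the class name, the flight identifier, and the ID formats are hypothetical, and the slides leave open whether XML documents or database tables hold the mapping.

```python
from typing import Dict, List, Tuple


class CrossReference:
    """Map (flight, OSN) pairs to the OBSID of the observing program."""

    def __init__(self) -> None:
        self._by_osn: Dict[Tuple[str, int], str] = {}

    def link(self, flight_id: str, osn: int, obsid: str) -> None:
        """Associate one sequence number of a flight with an OBSID."""
        key = (flight_id, osn)
        existing = self._by_osn.get(key)
        if existing is not None and existing != obsid:
            raise ValueError(f"{key} is already linked to {existing}")
        self._by_osn[key] = obsid

    def obsid_for(self, flight_id: str, osn: int) -> str:
        """Look up the OBSID for one sequence number of a flight."""
        return self._by_osn[(flight_id, osn)]

    def sequence_numbers_for(self, obsid: str) -> List[Tuple[str, int]]:
        """List every (flight, OSN) pair that contributed to an OBSID."""
        return sorted(key for key, value in self._by_osn.items()
                      if value == obsid)
```

One OBSID may collect several sequence numbers, possibly across flights, while each sequence number belongs to exactly one OBSID, which is why conflicting links are rejected.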

29
Pipeline Software Archive
  • Pipeline software: a well-defined, documented,
    automated, scientifically validated, ordered
    sequence of data reduction module operations
    designed for a specific set of AOTs supported by
    the SOFIA DCS.
  • The data reduction modules shall be delivered to
    the DCS by the Facility Instrument Teams, along
    with the validated pipelines that support the
    chosen AOTs. The general pipeline architecture,
    maintenance and version control will subsequently
    be SSMOC and DCS responsibilities, initially in
    close collaboration with the instrument teams.
  • An official pipeline version is associated with
    an approved scientific validation procedure
    defined by the SOFIA Science Center. Since data
    reduction software will evolve during the
    lifecycle of SOFIA, and data storage or transfer
    formats may change slightly as knowledge of
    calibration and reductions improves, all modules
    related to data reduction and calibration shall
    be archived and downloadable from the SOFIA DCS
    Web site.

30
Pipeline Software Archive
  • Reduced intermediate and calibrated data products
    which are the result of pipeline data reduction
    software shall contain FITS keywords that record
    the pipeline version that produced them.
  • The Web interface for the Software Archive shall
    indicate which versions of the pipeline software
    produced each AOR on a given date. There shall
    also be links to documentation of each data
    reduction software module.
  • NOTE: Flight Planning software and Proposal
    Preparation software are not included in the
    Software Archive because they are not directly
    related to the science data archive itself.

31
Documentation Library
NOTE: There is currently no centralized
Documentation Library that satisfies the needs of
all aspects of the DCS and the SOFIA project.
Although there is a clear need for the Archive to
have strong ties to the Documentation Library and
SOFIA Bibliography, it will not formally be
considered part of the SOFIA archive, which
concentrates on the science data. The
requirements that relate to the Archive are
included here for completeness, and they should
be considered in the design of the SOFIA
Documentation Library.
The Documentation Library shall
  • contain Users Manuals for the Facility
    Instruments, with version control.
  • contain data reduction and pipeline software
    descriptions and manuals, with version control.
  • contain the Observer's Guide to Aircraft
    Procedures, current version.
  • contain the Flight Planning Software Manual,
    current version.
  • contain the Calls for Proposals, both for
    observing and instrument development, with
    version control.
  • maintain a SOFIA Bibliography to support the
    project in tracking the productivity of each
    observing program. Its contents shall be
    cross-referenced to the Project Status
    information for each proposal.
  • be located at the SOFIA Science Center and
    closely linked to the SOFIA Web site and the data
    archives.

32
Implementation
  • Software Work Products
  • Data Inventory Generator
  • Facility Science Data Capture Tool
  • Housekeeping Data Capture Tool
  • Ancillary Data Capture Tool
  • Summary Generator
  • Header Consolidation
  • Link Generator
  • Validation of Required Files Present
  • Populates the Summary Archive

33
Implementation
  • Software Work Products (continued)
  • Archive Management Tool
  • Archive Integrity Checking Tool
  • Backup Tools
  • DBMS management GUI
  • Expert (Internal) Pipeline Interface
  • Pipeline Invocation Module
  • Query Tools
  • Web-based Query Interface
  • Interface to Commercial DBMS system
  • Report Generation Modules
  • Query Logging

34
Implementation
  • Protocol Documents
  • Format Documents
  • Facility Instruments Science Data Format
    Document
  • Flight Log Format Document
  • Observer Log Format Document
  • Housekeeping Data Format Document
  • Archive Directory Structure Document
  • Design Documents
  • Conceptual Archive Design Document
  • Archive Implementation Design Document

35
Implementation
  • Archive Test Results
  • Archive Testing Plan
  • Archive Testing Reports
  • Archive Performance Verification Reports