Title: Archiving
1Archiving
- Requirements for the SOFIA Data Cycle System
- Mark Morris, UCLA
- Joe Mazzarella and Steve Lord, IPAC
- John Milburn and Jochen Horn, UCLA
2SOFIA Data Archives
- Purposes
- Maximize the scientific productivity of SOFIA
- Provide public information about existing data
- Provide data backups
- Verification of data products
- Informs related science projects
- Archival research
- e.g., context of the Astrophysics Data Program
- Supplement to other funded or unfunded research.
- Motivates publication
3SOFIA Data Archives
- SOFIA observations to be archived in three forms
- SUMMARY ARCHIVE
- data headers and logs
- WORKING ARCHIVE
- raw data from all instruments
- PUBLIC ARCHIVE
- reduced data from facility instruments
4SOFIA Databases
5SOFIA Data Archives
- SUMMARY ARCHIVE
- on-line, equipped with search tool
- maintained at the SSMOC
- headers of each observation, giving
- source names, positions
- instrument and its parameter settings, integration
times
- important environmental and aircraft parameters
- links to the flight and observing logs
- identities of P.I. and observer (if different)
- includes proposal abstract
6SOFIA Data Archives
- LOGS
- Highly automated
- Can be annotated
- Flight log
- Details of observatory functions, flight
parameters
- Observing log
- Fundamental set of observing parameters (e.g.,
observer ID, source ID, position, instrument
mode, frequency, filters, bandwidth, start and stop
times, integration times, chop/nod configuration,
water vapor index, etc.)
- Optional set of custom parameters
- Includes wrap-up commentary (exit interview)
7SOFIA Data Archives
- WORKING ARCHIVE
- purposes
- Fundamental repository of all untreated SOFIA
data
- Backup
- Resource for archival research
- maintained by SSMOC staff
- includes
- Contents of summary archive
- Environmental and housekeeping data
- Raw science data from all instruments
- made available upon request to qualified
individuals through a web-based request form on a
SOFIA archive page that links to the data
reduction tools. Access requests are subject to
approval by a person with designated authority.
- access subject to validation period for all but
the proposing PI.
8SOFIA Data Archives
- PUBLIC ARCHIVE
- created from Working Archive data from (at
least) the facility instruments which have been
carried through a standard data reduction
pipeline
- fully accessible on the web, following
validation period
- maintained at the SSMOC, mirrored at IPAC
- Consistent in form and function with other
mission archives embedded within IRSA at IPAC.
- accompanying tools to examine, and extract
quantitative information from, the archived
images and spectra. Where feasible, existing
IRSA tools will be adapted to this end.
9SOFIA Data Archives
- METADATA ARCHIVES, recognizing evolution of both
software and instrumentation
- Pipeline components
- version tracking
- Pipelines
- Documentation
- manuals
- tutorials
10Assumptions (1)
The SOFIA Archive Requirements and Design are
being developed with the following assumptions:
- The primary use is by scientists using the Web.
- All components of the archive will reside at NASA
Ames, with "mirrors" of the Public Archive placed
at remote data centers such as IPAC.
- SOFIA Facility Instruments will support General
Investigators (GIs) using the concept of
Astronomical Observation Templates (AOTs) and
Astronomical Observation Requests (AORs).
- The Public Archive will support a well-defined
set of Facility Instrument (FI) AOTs, each of
which will be reduced by software module
pipelines delivered to the SOFIA Data Cycle
System (DCS).
11Assumptions (2)
- PI Instrument data will be supported at the
Working Archive level.
- The Archive will consist of science, calibration,
and laboratory test data from the Facility
Instruments, plus SSMOC Housekeeping data.
- SOFIA archive data are for public use after a
reasonable validation period for proper
reduction, calibration, and science validation by
observing teams with support from the SSMOC.
- The requirements are aimed at the SSMOC, the
Facility Instrument teams, and the DCS software
developers for the archive system.
- Observations will be tracked through their
complete lifecycle from the AOR through the raw,
reduced, and final calibrated science data
products using a unique Observation
Identification number (OBSID).
12Archive Interactions with DCS Components.
13High Level Archive Requirements
- The archive shall simplify use and reuse of SOFIA
data during reduction, analysis, interpretation
and publication.
- The archive shall enable the DCS to store and
retrieve uniform data products.
- The archive will adhere to existing (FITS) and
emerging (XML) standards for data storage and
interchange between software modules.
- The archive shall support continuous improvement
of data reduction pipelines and improvements in
calibration procedures.
- The archive shall support online data access for
humans (Web interfaces) and remote software
clients (e.g., via XML-based "server mode") from
other astronomical data centers.
- The archive shall provide services for archival
research, including search tools and quantitative
measurement tools.
14Functional Requirements (1)
The Archive software shall support efficient and
reliable data insertion functions and procedures.
- Functions shall be provided to insert raw data
from all instruments into the Working Archive and
update a registry (index) of data files. This
shall be done routinely after each flight.
- Insertion of intermediate and reduced data files
resulting from pipeline processing will be
handled by pipeline modules; these modules
shall adhere to the file and directory naming
conventions outlined in the Directory and File
Naming Conventions.
- Functions shall be provided to insert into the
proper Archive level (Working, Summary, Public)
FITS tables, catalogs, or text files, including
but not limited to:
- Calibration sources
- Source lists (targets) for observing programs
- Observing Logs
- Flight Plans
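The insertion and registry-update functions described above could look roughly like the following sketch. It assumes a SQLite table as the registry and a SHA-256 checksum per file; the table layout, the function names, and the example path, OBSID, and instrument name are all hypothetical and are not taken from the Directory and File Naming Conventions.

```python
import hashlib
import sqlite3


def open_registry():
    """Create an in-memory registry; a real archive would use a persistent DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registry ("
        " path TEXT PRIMARY KEY,"
        " obsid TEXT NOT NULL,"
        " instrument TEXT NOT NULL,"
        " level TEXT NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def register_file(conn, path, obsid, instrument, payload, level="working"):
    """Record one data file in the registry and return its checksum."""
    digest = hashlib.sha256(payload).hexdigest()
    conn.execute(
        "INSERT INTO registry (path, obsid, instrument, level, sha256)"
        " VALUES (?, ?, ?, ?, ?)",
        (path, obsid, instrument, level, digest),
    )
    return digest


def verify_file(conn, path, payload):
    """Return True if the payload matches the checksum stored at insertion."""
    row = conn.execute(
        "SELECT sha256 FROM registry WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(payload).hexdigest()


# Hypothetical post-flight insertion of one raw file.
conn = open_registry()
register_file(conn, "flight_0001/raw_0001.fits", "OBS-0001",
              "INSTRUMENT_A", b"raw bytes")
```

Storing a checksum at insertion time is one simple way to let a later integrity check detect files that have changed or been corrupted since they entered the archive.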
15Functional Requirements (2)
Data for Facility and PI instruments shall be
stored and maintained in the Working, Summary and
Public Archive levels as follows
16Functional Requirements (3)
Data in the Working, Summary and Public Archive
levels shall be publicly available online through
a Web interface for the different instrument
types as follows
17Functional Requirements (4)
- Functions shall be provided to verify the
integrity and validity of the data products.
- Functions shall be provided to copy and track
(version control) validated data products from
the Working Archive into the Public Archive.
- The Archive software shall provide functions to
extract metadata to populate the Summary Archive.
- The software shall extract header records from
FITS data in the Working Archive and insert
metadata into DBMS tables to support queries of
the Summary Archive.
- The software shall convert (or "wrap") FITS to
provide an API to the emerging Astronomical XML
(AML) format for data and summary (metadata)
interchange with other data and information
systems, for example IRSA, STScI, HEASARC, NED,
and others.
- The software shall automatically create links
between data products, calibration files, and
documentation as described in the Summary Archive
Contents Requirements.
- Functions shall be provided to cross-reference
flight video and audio recordings with the
Working Archive and other relevant FITS data
products.
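As an illustration of the header-extraction requirement above, the sketch below parses simplified 80-character FITS header cards and loads the keyword/value pairs into a DBMS table. It is a minimal sketch under stated assumptions: the parser ignores continuation cards and doubled quotes inside strings, the `summary` table layout is hypothetical, and the `make_card` helper exists only to build test input.

```python
import sqlite3

CARD_LEN = 80  # every FITS header card is 80 characters long


def make_card(keyword, value):
    """Build one simplified header card (helper for this example only)."""
    text = "'" + value + "'" if isinstance(value, str) else str(value)
    return f"{keyword:<8}= {text}".ljust(CARD_LEN)


def parse_header(raw):
    """Parse keyword/value pairs from header text up to the END card."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # skip COMMENT, HISTORY, and blank cards
        field = card[10:].strip()
        if field.startswith("'"):
            # String value: text between the first pair of single quotes.
            value = field[1:field.index("'", 1)].rstrip()
        else:
            value = field.split("/", 1)[0].strip()  # drop trailing comment
        header[keyword] = value
    return header


def load_summary(conn, obsid, header):
    """Insert one observation's header keywords into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summary"
        " (obsid TEXT, keyword TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO summary VALUES (?, ?, ?)",
        [(obsid, key, value) for key, value in header.items()],
    )


raw = "".join([
    make_card("OBJECT", "Sgr A*"),
    make_card("EXPTIME", 12.5),
    "END".ljust(CARD_LEN),
])
header = parse_header(raw)
conn = sqlite3.connect(":memory:")
load_summary(conn, "OBS-0001", header)
```

A keyword/value table of this kind keeps the loader independent of any particular instrument's keyword set, at the cost of storing every value as text.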
18Functional Requirements (5)
Queries
- The Archive software shall support queries for
data sets meeting selection criteria meaningful
to astronomers.
- Queries shall allow location of raw data products
in the Working Archive.
- Queries shall allow location of reduced and
calibrated data in the Public Archive.
- After searching based on query constraints as
described above, the user shall have the ability
to select one or more returned data set
"handles", which are based on well-documented
Observation ID (OBSID) numbers, to download the
data immediately to his or her local computer via
HTTP or FTP.
- A Web query form shall be provided which allows
users to input a known Observation ID (OBSID)
number to directly return the data products and
optionally a subset of its associated
Housekeeping Data and Documentation.
19Functional Requirements (6)
- The Archive software shall support queries
involving astronomical positions in standard
coordinate systems.
- The Archive software shall recognize queries on
astronomical sky regions using cone searches and
ranges expressed in standard coordinate systems.
- The Archive software shall support queries on:
- astronomical object names.
- SOFIA instrument names.
- AOT names and AOT parameters such as instrumental
passbands, filter names, etc.
- Wavelength ranges using standard astronomical
conventions.
- Time intervals
- SOFIA Observation Identifiers (OBSIDs)
- Observer names (PIs, Co-Is)
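A cone search of the kind listed above selects every observation whose position lies within a given angular radius of a target position. The sketch below uses the haversine formula for the angular separation and assumes positions in decimal degrees (right ascension and declination); the record layout and the example OBSIDs are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(observations, ra, dec, radius_deg):
    """Return the observations within radius_deg of (ra, dec)."""
    return [
        obs for obs in observations
        if angular_separation_deg(ra, dec, obs["ra"], obs["dec"]) <= radius_deg
    ]


# Hypothetical observation records.
observations = [
    {"obsid": "OBS-A", "ra": 10.0, "dec": 20.0},
    {"obsid": "OBS-B", "ra": 10.1, "dec": 20.0},
    {"obsid": "OBS-C", "ra": 200.0, "dec": -30.0},
]
matches = cone_search(observations, ra=10.0, dec=20.0, radius_deg=0.2)
```

A linear scan like this is only practical for small tables; a production archive would normally rely on a spatial index in the DBMS.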
20Functional Requirements (7)
Document Tracking
- The Archive shall support tracking of data
products.
- The Archive shall support tracking of data
reduction software modules and pipeline
sequences.
- The Archive shall support registration and
tracking of documentation.
21Functional Requirements (8)
The DCS User Interface shall support modes
of interaction with human users and software
components:
- Command-line user interfaces to each component.
- Standard Uniform Resource Locators (URLs)
accessible through Web-based forms and remote
client software.
- A "server-mode" for use by client software within
the DCS and from remote sites.
- Graphical user interface (GUI) "widgets" for
access to the archive integrated into the SOFIA
Observation Planning and Flight Planning tools.
- Results from archive queries shall be returned in
well defined and clearly documented data
structures. Ideally these data structures will be
in a self-documenting, object-oriented format
using XML.
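One way to return query results as a self-documenting XML structure is sketched below with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`queryResult`, `observation`, `obsid`) are assumptions made for illustration, not a schema defined by the DCS.

```python
import xml.etree.ElementTree as ET


def results_to_xml(rows):
    """Serialize query result rows (dictionaries) to an XML string."""
    root = ET.Element("queryResult", count=str(len(rows)))
    for row in rows:
        obs = ET.SubElement(root, "observation", obsid=row["obsid"])
        for name, value in row.items():
            if name == "obsid":
                continue  # already stored as an attribute
            ET.SubElement(obs, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical result rows from an archive query.
rows = [
    {"obsid": "OBS-0001", "instrument": "INSTRUMENT_A", "level": "public"},
    {"obsid": "OBS-0002", "instrument": "INSTRUMENT_B", "level": "working"},
]
document = results_to_xml(rows)
```

A remote client can then parse the same document with any XML library and does not need to know the column order of the underlying DBMS tables.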
22Data Content Requirements - Summary Archive
The Summary Archive shall
- store observation FITS header keywords and values
extracted from the data products in a format that
efficiently supports user queries.
- contain Project Status information.
- contain links to abstracts of Observing
Proposals.
- contain PI Observing Run Abstracts and Detailed
Observing Logs.
- contain links to the executed Flight Plans.
- contain Flight Director Logs.
- contain links to the Working and Public Archives,
Pipeline Software Archive, Documentation Library,
and Bibliography.
23Data Content Requirements - Working Archive (1)
The Working Archive shall
- store raw data (science and calibration) acquired
from all SOFIA instruments. The raw data and
related Housekeeping data shall be deposited into
the Working Archive immediately after a
successful SOFIA flight, ideally within a few
hours after landing.
- serve as the primary data repository. Data
reduction pipelines will read raw data and write
intermediate data produced by the Standard Data
Product pipelines into the Working Archive.
- serve as a data backup for General Investigators.
- be housed at the SSMOC and made available to
eligible PIs and CoIs as soon as it enters the
archive, and to the public after the requisite
validation period. The Working Archive will be
available online, but Working datasets will be
transferred onto a Web-accessible (FTP) area with
password protection.
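The access rule above (immediate access for the observing team, public access after the validation period) can be expressed as a small check. This is a sketch only: the record layout and user names are hypothetical, and the length of the validation period is left as a parameter because the requirements do not state it.

```python
from datetime import date, timedelta


def may_access(user, observation, today, validation_period):
    """Return True if user may retrieve Working Archive data for observation."""
    if user in observation["team"]:
        return True  # PI and CoIs: access as soon as the data are archived
    release_date = observation["archived_on"] + validation_period
    return today >= release_date


# Hypothetical record and period, for illustration only.
observation = {
    "obsid": "OBS-0001",
    "team": {"pi_a", "coi_b"},
    "archived_on": date(2000, 1, 1),
}
period = timedelta(days=365)
```

For example, `may_access("pi_a", observation, date(2000, 1, 2), period)` is true the day after archiving, while a user outside the team is refused until the period has elapsed.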
24Data Content Requirements - Working Archive (2)
The Working Archive shall
- track the processing history of science data
products and instrument calibration files,
notably for intermediate and reduced data
products which are preliminary or unvalidated,
and thus not yet copied to the Public Archive.
- contain Housekeeping data pertaining to the state
or status of the instruments, the aircraft, the
telescope, and observing conditions (environment)
while observations were made and data were
collected.
- contain FITS data files of Housekeeping and
instrument calibration data stored either as
header keywords and values, or pointers to more
extensive data in auxiliary files which are
required for data reduction and calibration by
the pipelines.
- serve as a resource for archival research,
especially for people who wish to develop
improvements to the data reduction algorithms to
push the limits of the observations to make new
scientific discoveries or improvements to
previous interpretations.
25Data Content Requirements - Working Archive (3)
Data in the Working Archive shall be linked to
other Archive components
- Summary Archive Metadata
- Actual Flight Plans
- Project Status
- Flight Logs
- appropriate versions of Pipeline Data Reduction
Software Archive and supporting documentation.
- Reduced data in the Public Archive
- Documentation Library
- Video and audio recordings
26Data Content Requirements - Public Archive (1)
- The SOFIA Facility Instruments will each have a
Standard Pipeline that will produce reduced,
calibrated images, photometric measurements, or
spectra for standard modes, or AOTs. Data
products resulting from filled-in AOTs, which are
called Astronomical Observation Requests (AORs),
comprise the Public Archive.
- The Public Archive shall be accessible by GIs and
the general public through Web-based query and
request forms.
- The Public Archive shall serve network-based
requests for data from remote archive system
software.
- The Public Archive data shall be mirrored at the
Infrared Science Archive (IRSA) at IPAC, where
interfaces and query engines will be developed
and maintained in coordination with similar
software used to support community access for
data from NASA's other infrared missions.
27Data Content Requirements - Public Archive (2)
Data in the Public Archive shall be linked to
other Archive components
- Summary Archive Metadata
- Actual Flight Plans
- Project Status
- Flight Logs
- Pipeline Data Reduction Software Archive
- the user interfaces for access to the raw data
and housekeeping data in the Working Archive, and
the Documentation Library
28Data Format and Transport Standards
- The SOFIA instruments shall produce files in FITS
format as their primary raw data products. These
will be transferred to the Archive team at the
SSMOC and comprise the bulk of the Working
Archive.
- The DCS shall support archiving of FITS images
and spectra using the Binary Table (BINTABLE)
Extension Standard.
- SOFIA data shall follow a standard Dictionary for
FITS Keyword Types.
- Both FITS and XML formats will be supported for
data interchange.
- The "Observation Sequence Numbers" (OSNs) in a
flight will be cross-referenced to the OBSID
(Observation Identification) numbers in each PI's
observing program using XML documents and/or
database tables maintained in the Archive.
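The OSN-to-OBSID cross-reference described above could be held in a small XML document per flight. The sketch below parses such a document into a lookup table; the document layout, the flight identifier, and the OBSID values are hypothetical, since the requirements do not define the actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical cross-reference document for one flight.
XREF_XML = """
<flight id="FLIGHT-0001">
  <observation osn="1" obsid="OBS-0001"/>
  <observation osn="2" obsid="OBS-0001"/>
  <observation osn="3" obsid="OBS-0002"/>
</flight>
"""


def load_cross_reference(xml_text):
    """Map (flight id, OSN) pairs to OBSIDs."""
    root = ET.fromstring(xml_text.strip())
    flight_id = root.get("id")
    return {
        (flight_id, int(node.get("osn"))): node.get("obsid")
        for node in root.findall("observation")
    }


xref = load_cross_reference(XREF_XML)
```

Keying on the flight identifier together with the OSN reflects that sequence numbers restart on each flight, while an OBSID can span several sequence numbers within a PI's program.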
29Pipeline Software Archive
- Pipeline software is a well-defined, documented,
automated, scientifically validated, ordered
sequence of data reduction module operations
designed for a specific set of AOTs supported by
the SOFIA DCS.
- The data reduction modules shall be delivered to
the DCS by the Facility Instrument Teams, along
with the validated pipelines that support the
chosen AOTs. The general pipeline architecture,
maintenance and version control will subsequently
be SSMOC and DCS responsibilities, initially in
close collaboration with the instrument teams.
- An official pipeline version is associated with
an approved scientific validation procedure
defined by the SOFIA Science Center. Since data
reduction software will evolve during the
lifecycle of SOFIA, and data storage or transfer
formats may change slightly as knowledge of
calibration and reductions improves, all modules
related to data reduction and calibration shall
be archived and downloadable from the SOFIA DCS
Web site.
30Pipeline Software Archive
- Reduced intermediate and calibrated data products
which are the result of pipeline data reduction
software shall contain FITS keywords that record
the pipeline version that produced them.
- The Web interface for the Software Archive shall
indicate which versions of the pipeline software
produced each AOR on a given date. There shall
also be links to documentation of each data
reduction software module.
- NOTE: Flight Planning software and Proposal
Preparation software are not included in the
Software Archive because they are not directly
related to the science data archive itself.
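Reporting which pipeline version was in force on a given date amounts to a lookup in a sorted release history. The sketch below assumes the history is a list of (release date, version) pairs sorted by date; the dates and version strings are invented for illustration and do not describe any real SOFIA pipeline.

```python
import bisect
from datetime import date

# Invented release history, sorted by release date.
RELEASES = [
    (date(2000, 1, 1), "1.0"),
    (date(2000, 6, 1), "1.1"),
    (date(2001, 3, 15), "2.0"),
]


def version_on(releases, day):
    """Return the pipeline version in force on `day`, or None if none yet."""
    release_dates = [released for released, _ in releases]
    index = bisect.bisect_right(release_dates, day) - 1
    if index < 0:
        return None  # `day` precedes the first release
    return releases[index][1]
```

The same version string would be written into the FITS keywords of each reduced product, so that a file can be traced back to the software that produced it.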
31Documentation Library
NOTE: There is currently no centralized
Documentation Library that satisfies the needs of
all aspects of the DCS and the SOFIA project.
Although there is a clear need for the Archive to
have strong ties to the Documentation Library and
SOFIA Bibliography, it will not formally be
considered part of the SOFIA archive, which
concentrates on the science data. These
requirements which are related to the Archive are
included here for completeness, and they should
be considered in the design of the SOFIA
Documentation Library.
The Documentation Library shall
- contain User's Manuals for the Facility
Instruments, with version control.
- contain data reduction and pipeline software
descriptions and manuals, with version control.
- contain the Observer's Guide to Aircraft
Procedures, current version.
- contain the Flight Planning Software Manual,
current version.
- contain the Calls for Proposals, both for
observing and instrument development, with
version control.
- maintain a SOFIA Bibliography to support the
project in tracking the productivity of each
observing program. Its contents shall be
cross-referenced to the Project Status
information for each proposal.
- be located at the SOFIA Science Center and
closely linked to the SOFIA Web site and the data
archives.
32Implementation
- Software Work Products
- Data Inventory Generator
- Facility Science Data Capture Tool
- Housekeeping Data Capture Tool
- Ancillary Data Capture Tool
- Summary Generator
- Header Consolidation
- Link Generator
- Validation of Required Files Present
- Populates the Summary Archive
33Implementation
- Software Work Products (continued)
- Archive Management Tool
- Archive Integrity Checking Tool
- Backup Tools
- DBMS management GUI interface
- Expert (Internal) Pipeline Interface
- Pipeline Invocation Module
- Query Tools
- Web-based Query interface
- Interface to Commercial DBMS system
- Report Generation Modules
- Query Logging
34Implementation
- Protocol Documents
- Format Documents
- Facility Instruments Science Data Format
Document
- Flight Log Format Document
- Observer Log Format Document
- Housekeeping Data Format Document
- Archive Directory Structure Document
- Design Documents
- Conceptual Archive Design Document
- Archive Implementation Design Document
35Implementation
- Archive Test Results
- Archive Testing Plan
- Archive Testing Reports
- Archive Performance Verification Reports