Title: Data Access Layer Servers
1. US National Virtual Observatory
Data Access Layer Servers
NVO Summer School, Aspen, Sept. 2006
Doug Tody (NRAO)
2. Data Access Layer (DAL) Services
- Goals
- Understand what the DAL services are
- and what is involved to implement them
- Agenda
- Review current and planned DAL services
- Introduce options/issues faced in implementing the DAL services
3. Current and Planned DAL Services
- Dataset: Generic dataset, complex data aggregates and associations (proposed)
- Cone (SCS): Catalog data (released)
- SIAP V1.0: Image data (released)
- SSAP: 1D Spectra (near PR; 2nd gen DAL prototype)
- SLAP: Spectral line lists (near PR)
- STAP: Table/Catalog access (proposed)
- SSAP follow-on: Spectral Energy Distributions (SEDs)
- SSAP follow-on: Time series
- SIAP V2.0: Major upgrade (cube data, etc.)
- SNAP: Numerical Models / Theory data
4. Major elements of a DAL service
- Discovery query (queryData)
- Discover data matching query
- Access metadata ("headers") for candidate datasets
- Negotiate contract for virtual data generation
- This is a web/database type operation
- Data access (getData, acref URL)
- Retrieve selected datasets (URL-based)
- May be archival data, or virtual data computed on the fly
- In general the dataset may be computed, like a CGI web page
- This is a numerical/scientific computing type operation
- Interface
- RESTful, parameter-based interface is the only one currently available
- Syntax-based query (ADQL/SQL) will be added as an option
- SOAP will be added but the RESTful interface will be retained
5. Simple Cone Search
- Summary
- Simplest possible access to astronomical catalogs
- By far the most widely implemented VO data service
- Prototypical DAL service
- Query Parameters
- RA, DEC: Position on the sky (J2000, decimal degrees)
- SR: Search radius (decimal degrees)
- VERB: Verbosity (levels 1-3, optional)
- Query Response
- VOTable; UCDs describe columns
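A minimal sketch of how a client might compose a Cone Search request from the RA, DEC, SR, and VERB parameters listed above. The endpoint `http://example.org/cone` and the helper name `cone_search_url` are assumptions made for illustration; they do not refer to a real service or library.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, used only for illustration.
BASE_URL = "http://example.org/cone"


def cone_search_url(base_url, ra, dec, sr, verb=None):
    """Build a Cone Search query URL; RA, DEC and SR are decimal degrees."""
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr < 0.0:
        raise ValueError("SR must be non-negative")
    params = {"RA": ra, "DEC": dec, "SR": sr}
    if verb is not None:
        if verb not in (1, 2, 3):
            raise ValueError("VERB must be 1, 2 or 3")
        params["VERB"] = verb
    # Append to an existing query string if the base URL already has one.
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)


url = cone_search_url(BASE_URL, ra=180.0, dec=-30.5, sr=0.1, verb=2)
print(url)
```

The response to such a request would be a VOTable whose columns are identified by UCD.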
6. Simple Image Access (SIA V1.0)
- Summary
- Uniform access to 2-dimensional images
- Basically 2-D, but data model and interface are more general
- Same service profile as Cone, but adds getData
- The query is now used for data discovery instead of data access as for Cone; data access is a separate operation
- Prototype for 2nd generation DAL interfaces
- Data models, multiple output formats, virtual data generation, etc.
7. SIA Concepts
- Types of Services
- Atlas: Precomputed survey image (entire image)
- Pointed: Image from pointed observation (entire image)
- Cutout: Cutout of existing image (pixels unchanged)
- Mosaic: Reprojected image (pixels resampled)
- Virtual Data
- Data model mediation
- Subsetting, filtering, transformation, etc. on the fly
- Possible to view same data in different ways
- SIA data model is the familiar "astronomical image"
- Generally this means a 2D sky projection, but cubes too
- Data array is logically a regular grid of pixels
- Encoded as a FITS image, GIF/JPEG, etc.
8. SIA Input Parameters
- Required parameters
- POS: center of ROI (ra, dec; decimal degrees, ICRS)
- SIZE: width, or width,height
- FORMAT: ALL, GRAPHIC, image/fits, image/jpeg, text/html; FORMAT=metadata returns service metadata
- Optional parameters
- INTERSECT: values covers, enclosed, center, overlaps
- VERB: table verbosity
- Service-defined parameters
- used to further refine queries, but not yet standardized
- e.g., BAND, SURVEY, etc.
- Image generation parameters
- NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
- used for cutout/mosaic services to specify image to be generated
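The four INTERSECT values can be illustrated with a small sketch. It assumes the usual reading of each value (the image covers the ROI, is enclosed by it, contains its center, or overlaps it) and treats both the ROI and each image as an axis-aligned box on a flat sky, ignoring RA wraparound and the cos(dec) factor. The `Box` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box on a flat-sky approximation (decimal degrees)."""
    ra: float      # center RA
    dec: float     # center Dec
    width: float   # full extent along RA
    height: float  # full extent along Dec

    def bounds(self):
        return (self.ra - self.width / 2, self.ra + self.width / 2,
                self.dec - self.height / 2, self.dec + self.height / 2)


def parse_roi(pos, size):
    """POS is 'ra,dec'; SIZE is 'width' or 'width,height'."""
    ra, dec = (float(p) for p in pos.split(","))
    parts = [float(p) for p in size.split(",")]
    if len(parts) == 1:
        parts = parts * 2          # a single value means a square ROI
    if len(parts) != 2:
        raise ValueError("SIZE must be 'width' or 'width,height'")
    return Box(ra, dec, parts[0], parts[1])


def matches(image, roi, intersect="OVERLAPS"):
    """Return True if the image satisfies the INTERSECT test for the ROI."""
    i_lo_ra, i_hi_ra, i_lo_dec, i_hi_dec = image.bounds()
    r_lo_ra, r_hi_ra, r_lo_dec, r_hi_dec = roi.bounds()
    mode = intersect.upper()
    if mode == "COVERS":      # image covers the whole ROI
        return (i_lo_ra <= r_lo_ra and i_hi_ra >= r_hi_ra and
                i_lo_dec <= r_lo_dec and i_hi_dec >= r_hi_dec)
    if mode == "ENCLOSED":    # image lies entirely inside the ROI
        return (r_lo_ra <= i_lo_ra and r_hi_ra >= i_hi_ra and
                r_lo_dec <= i_lo_dec and r_hi_dec >= i_hi_dec)
    if mode == "CENTER":      # image contains the ROI center
        return (i_lo_ra <= roi.ra <= i_hi_ra and
                i_lo_dec <= roi.dec <= i_hi_dec)
    if mode == "OVERLAPS":    # image and ROI share some area
        return (i_lo_ra <= r_hi_ra and i_hi_ra >= r_lo_ra and
                i_lo_dec <= r_hi_dec and i_hi_dec >= r_lo_dec)
    raise ValueError("unknown INTERSECT value: " + intersect)


roi = parse_roi("180.0,-30.0", "0.5")
large_image = Box(180.1, -30.0, 1.0, 1.0)
small_image = Box(180.0, -30.0, 0.1, 0.1)
print(matches(large_image, roi, "covers"), matches(small_image, roi, "enclosed"))
```

A production service would do this test on the sphere, typically inside the database query itself.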
9. SIA Query Response
- Output is a VOTable
- Must contain a RESOURCE element tagged type="results", containing the results of the query.
- The results resource contains a single table
- Each row of the table describes a single data object which can be retrieved.
- The fields of the table describe the attributes of the dataset
- These are the attributes of the SIA data model
- In SIA 1.0, the UCD is used to identify the data model attribute
- e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
10. SIA Query Response
- Image metadata
- Describes the image object (required)
- Coordinate system metadata
- Image WCS
- Spectral bandpass metadata
- Prototype data model describing spectral bandpass of image
- Processing metadata
- Tells whether the service modified the image data
- Access metadata
- Tells client how to access the dataset (required)
- Resource-specific metadata
- Additional optional service-defined metadata describing image
11. SIA Image Metadata (UCDs)
- VOX:Image_Title: Brief description of image
- POS_EQ_RA_MAIN: RA (ICRS)
- POS_EQ_DEC_MAIN: Dec (ICRS)
- INST_ID: Instrument name
- VOX:Image_MJDateObs: MJD of observation
- VOX:Image_Naxes: Number of image axes
- VOX:Image_Naxis: Length of each axis
- VOX:Image_Scale: Image scale, deg/pix
- VOX:Image_Format: Image file format
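As a rough illustration of reading an SIA query response, the sketch below locates the RESOURCE with type="results", collects the UCD of each FIELD, and returns one dictionary per table row keyed by UCD. The sample VOTable is made up for the example and contains only a few of the fields a real service would return; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up response, reduced to four fields for illustration.
SAMPLE = """<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
      <FIELD name="ra" datatype="double" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="dec" datatype="double" ucd="POS_EQ_DEC_MAIN"/>
      <FIELD name="format" datatype="char" arraysize="*" ucd="VOX:Image_Format"/>
      <DATA><TABLEDATA>
        <TR><TD>Example field A</TD><TD>180.0</TD><TD>-30.0</TD><TD>image/fits</TD></TR>
        <TR><TD>Example field B</TD><TD>180.2</TD><TD>-30.1</TD><TD>image/jpeg</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local(tag):
    """Strip an XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_results(votable_xml):
    """Return the rows of the results table as dicts keyed by UCD."""
    root = ET.fromstring(votable_xml)
    for res in root.iter():
        if local(res.tag) == "RESOURCE" and res.get("type") == "results":
            break
    else:
        raise ValueError('no RESOURCE with type="results"')
    table = next(e for e in res.iter() if local(e.tag) == "TABLE")
    ucds = [f.get("ucd") for f in table.iter() if local(f.tag) == "FIELD"]
    rows = []
    for tr in table.iter():
        if local(tr.tag) == "TR":
            cells = [td.text for td in tr if local(td.tag) == "TD"]
            rows.append(dict(zip(ucds, cells)))
    return rows


rows = parse_results(SAMPLE)
for row in rows:
    print(row["VOX:Image_Title"], row["POS_EQ_RA_MAIN"], row["VOX:Image_Format"])
```

Keying on the UCD rather than the column name is the point of the exercise: column names vary between services, while the UCDs identify the data model attribute.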
13. Image Retrieval
- Retrieval is optional
- Typically only a fraction of the available images are retrieved
- Based on query response
- If an access reference is provided, the data can be retrieved
- SIAP can also be used to describe data which is not online
- The same data may be available in multiple formats
- Image retrieval
- Very simple: access reference is a URL
- Standard tools can be used to fetch the data
- (browser, wget, curl, i/o library, etc.)
- Data is often computed on-the-fly
- All retrieval is synchronous (currently)
- No provision for restricting access (currently)
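Because the access reference is a plain URL, retrieval needs nothing beyond a standard I/O library. The sketch below filters query-response rows down to those that have an access reference in the wanted format, then fetches one synchronously. The row keys `acref` and `format` and both function names are assumptions for illustration. The demo uses a local `file://` URL so that it runs without a network.

```python
import pathlib
import shutil
import tempfile
import urllib.request


def choose_rows(rows, preferred_format="image/fits"):
    """Keep only rows that have an access reference in the wanted format."""
    return [r for r in rows
            if r.get("acref") and r.get("format") == preferred_format]


def fetch(access_reference, dest_path, timeout=60):
    """Synchronously download the dataset behind an access reference URL."""
    with urllib.request.urlopen(access_reference, timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


# Demo without a network: a local file reached through a file:// URL.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "image.fits"
source.write_bytes(b"SIMPLE  =                    T")
rows = [
    {"acref": source.as_uri(), "format": "image/fits"},
    {"acref": None, "format": "image/fits"},          # described, not online
    {"acref": source.as_uri(), "format": "image/jpeg"},
]
selected = choose_rows(rows)
saved = fetch(selected[0]["acref"], workdir / "copy.fits")
print(saved)
```

For a cutout or mosaic service the same call may block while the server computes the image, since all retrieval is currently synchronous.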
14. Simple Spectral Access (SSA)
- Summary
- Uniform access to 1-D spectra
- Can also handle spectral aggregates via association
- Support for SEDs and time series will be added
- First of the 2nd generation DAL interfaces
- Basic approach does not change (queryData, getData)
- Query interface and metadata are generalized
- SIA upgrade (etc.) will share the same basic interface
- Includes a standard data model for spectral datasets
- Needed, as there is no standard way to represent spectra
- Standard serializations are defined (VOTable, FITS, etc.)
- Returned data is typically generated on the fly
- External stored spectra may be in any form
15. SSA Interface Overview
- Service Operations
- queryData: Discovery query
- (getData): URL-based currently, as for SIA
- (stageData): Reserved; used to asynchronously stage data
- getCapabilities: Query service metadata and capabilities
- Complexity
- Basic usage is quite simple
- queryData, examine VOTable
- fetch data by access reference URL
- Basic Spectrum object
- general metadata ("header")
- spectral coordinate vector
- flux vector
- optional error vector
- Formats
- VOTable, FITS, XML, etc.; user or service choice
16. SSA Query Interface
- Mandatory query parameters
- POS: X, Y, FRAME (ICRS)
- SIZE: diameter (decimal degrees)
- BAND: spectral region (1-2 numbers, or name)
- TIME: date1/date2 (ISO 8601)
- FORMAT: VOTable, FITS, XML, text, graphics, html, native
17. SSA Query Interface
- Optional query parameters
- specres: minimum spectral resolution (L/dL)
- spatres: minimum spatial resolution (decimal degrees)
- timeres: minimum time resolution (seconds)
- SNR: minimum SNR
- redshift: redshift interval (1-2 decimal values)
- targetname: target name, e.g., "mars"
- targetclass: target class, e.g., star, QSO, AGN, etc.
18. SSA Query Interface
- Optional query parameters
- pubDID: publisherID string
- creatorDID: creatorID string
- collection: collection ID (shortName, minimum match)
- top: max top-ranked entries to be returned
- token: continuation token for multipage queries
- maxrec: maximum records in query response
- mtime: create/modify time in given range (ISO 8601)
- runid: passed on to any other services
- compress: enable compression
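The SSA query parameters above could be assembled along the following lines. This is a sketch only: the endpoint, the helper names, and the exact value syntax (comma-separated POS, slash-separated ranges for BAND and TIME) are assumptions drawn from the parameter descriptions on these slides, and a real service may require further parameters.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, used only for illustration.
BASE_URL = "http://example.org/ssa"


def range_value(lo, hi=None):
    """Format a one- or two-valued parameter as 'lo' or 'lo/hi'."""
    return str(lo) if hi is None else f"{lo}/{hi}"


def ssa_query_url(base_url, pos, size, band, time, fmt, **optional):
    """Build a queryData URL from the query parameters on these slides."""
    params = {
        "POS": ",".join(str(v) for v in pos),
        "SIZE": size,
        "BAND": band,
        "TIME": time,
        "FORMAT": fmt,
    }
    # Optional parameters (specres, targetname, maxrec, ...) pass through.
    params.update({k: v for k, v in optional.items() if v is not None})
    # Keep ',' and '/' readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",/")


url = ssa_query_url(
    BASE_URL,
    pos=(180.0, -30.0),
    size=0.1,
    band=range_value("3.45E-7", "8.76E-6"),
    time=range_value("2005-01-01", "2005-12-31"),
    fmt="votable",
    maxrec=100,
    targetname="mars",
)
print(url)
```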
19. SSA Query Response
- Classes of Query Metadata
- Query: Describes the query itself
- Association: Logical associations (aggregation)
- Access: Access metadata for data retrieval
- Dataset: General dataset metadata (type etc.)
- DataID: Dataset identification - what is it
- Curation: How data is published and made available
- Target: Astronomical target observed, if any
- Derived: Derived quantities (SNR, redshift, etc.)
- Char.Coverage: Coverage of spatial, spectral, time axes
- Char.Accuracy: Calibration, resolution, sampling, errors
- CoordSys: Coordinate system reference frames (STC)
20. SSA Query Response
- Query Metadata
- Query.Score: Degree of match to query params
- Query.Token: Step through large query response
- Association Metadata
- Association.Type: Type of association
- Association.ID: Instance ID linking associated records
- Association.Key: Unique key identifying each member
- Access Metadata
- Access.Reference: URL of data product to be retrieved
- Access.ServiceDID: DataID of virtual data product
- Access.Format: MIME type of dataset
- Access.Size: approximate dataset size (bytes)
21. SSA Query Response
- DataID - Dataset Identification Metadata
- DataID.Title: One-line description of dataset (String)
- DataID.Collection: Collection name (shortName)
- DataID.Creator: Creator of dataset (String)
- DataID.CreatorID: Identifier for VO Creator (URI)
- DataID.CreatorDID: Dataset ID assigned by creator (URI)
- DataID.CreatorLogo: URL for Creator logo (URI)
- DataID.Contributor: Contributor (may be multiple instances)
- DataID.Date: Date last modified (ISO Date string)
- DataID.Version: Version of dataset instance (String)
- DataID.Instrument: Instrument description (String)
- DataID.Bandpass: Spectral bandpass, e.g., filter (String)
- DataID.DataSource: Original source of data (String)
- DataID.CreationType: How was dataset created (String)
22. Some SSA Concepts
- DataSource
- survey, pointed, theory, artificial
- CreationType
- native, archival, cutout, filtered, mosaic, projection, spectral extraction, catalog extraction, etc.
- Provenance
- Where did this data come from?
- especially important for virtual data generated by service
- DataID (Collection, CreatorDID, etc.) refers to original data
- Curation (PublisherDID etc.) refers to data from service
- CreationType indicates how the data was derived
23. Some SSA Concepts
- Associations
- Use association metadata to link related records (datasets)
- An association is a complex dataset
- Data Models
- Data models formalize the content of data or metadata
- Container/component architecture
- Component data models aggregated in a container and associated logically (similar to a relational database)
- Dataset, Spectrum, Characterization, STC, etc.
- Characterization
- Physically characterize the data
- Spatial, spectral, and temporal axes
- Coverage, sampling, resolution, accuracy
- Applies to any dataset (not specific to spectra)
25. SIA Upgrade Preview (SIA V2.0)
- Main objectives
- Upgrade metadata, query interface as for SSA
- standard generic dataset metadata
- more powerful query interface
- more comprehensive output metadata
- Precision image data access enhancements
- e.g., cube data, image slicing, projection, filtering
- (TBD whether this is folded into basic SIA or done as a separate service class)
- Advanced service capabilities
- versioning, metadata query
- asynchronous data staging, authentication, VOStore integration
26. Cube Data
- Overview
- Motivated primarily by radio data surveys (CGPS, Arecibo)
- Many O/IR integral field unit (IFU) instruments coming online as well
- Challenge: datasets can be both large and complex
- Large datasets
- Current data cubes are several hundred MB up to several GB
- Future wide-field wide-band: 2048x2048x8192x4 = 128 GB
- With polarization, multiple bands, could have 1/2 TB datasets!
- Complex datasets
- e.g., CGPS: HI cube, CO cube, continuum, IQUV, IRAS, same field
- Multiple ways to view the same data
- Multi-band surveys are a simpler example of this trend
- Use-cases for recent study
- CGPS, SGPS, GALFA (Arecibo), SINFONI (ESO IFU)
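The cube sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes that the trailing factor of 4 in 2048x2048x8192x4 is the number of bytes per voxel, and that "with polarization" means four polarization products (e.g., IQUV); neither assumption is stated explicitly on the slide.

```python
GIB = 2 ** 30  # bytes per binary gigabyte


def cube_bytes(nx, ny, nchan, bytes_per_voxel=4, npol=1):
    """Storage needed for an nx * ny * nchan cube with npol polarizations."""
    return nx * ny * nchan * bytes_per_voxel * npol


single = cube_bytes(2048, 2048, 8192)
full_stokes = cube_bytes(2048, 2048, 8192, npol=4)

print(single / GIB)        # 128.0
print(full_stokes / GIB)   # 512.0, i.e. half a terabyte
```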
28. Cube Data
- Data access considerations
- Network download of large cubes can be impractical
- VO-style virtual data access to remote data is required
- subsetting, filtering (spectral or time regions), transformations (projections, spectrum extraction)
- Strategy: iteratively download data subset, visualize locally
- Typical access modes
- Whole image
- Spectrum extraction
- Cutout 2D planes
- Cutout 3D sub-cube (permits local full 3D analysis)
- 2D projection along one axis
- 3D projection (general 3D transformation)
- 2D slice through 3D cube at arbitrary 3D position, orientation
29. Cube Data
- Typical access scenario
- Discovery query to discover data, get access metadata
- Access query to set up virtual data access (WCS based)
- Data access, dynamically generating virtual data
- Repeat for a different region or view
- Example: Compute 2D projection with spectral filtering
- View 2D preview or projection, e.g., continuum
- Extract 1D spectra in sky regions (SSA with synthetic aperture)
- Analyze sky spectrum to determine night sky lines (SLAP)
- Compute 2D projection of cube excluding sky emission, absorption
- Other examples
- Extract 3D sub-cube for full 3D analysis locally
- 2D slice at arbitrary position and orientation
30. Cube Examples
- Extract 2-D plane from cube, same orientation
- queryData
- PubDID=<desired cube dataset>
- POS=<center of 2-D plane>
- SIZE=<spatial extent of 2-D plane>
- (cutout of smaller region also possible here)
- BAND=<spectral-coord of desired plane>
- NAXES=2
- FORMAT=FITS
31. Cube Examples
- 2-D Projection with spectral filtering
- queryData
- PubDID=<desired cube dataset>
- POS=<center of 2-D plane>
- SIZE=<spatial extent of 2-D plane>
- (cutout of smaller region also possible here)
- BAND=<range-list of good spectral regions>
- NAXES=2
- FORMAT=FITS
- (in SINFONI case original cube is in Euro-3D format)
32. Cube Examples
- Extract 3-D Sub-Cube
- queryData
- PubDID=<desired cube dataset>
- POS=<spatial center of region>
- SIZE=<spatial extent of sub-cube>
- BAND=3.45E-7/8.76E-6
- NAXES=3
- FORMAT=FITS
33. Implementing DAL Services
- Overall Process
- Determine what subclass of service to implement
- do we return whole files, cutouts, extract spectra, etc.?
- Select service technology
- Java, dotNet/Mono, Ruby, etc.
- Implement
- Reference code or a template would be useful here
- Test
- Service verification tools
- Register
- As soon as you do this you are online!
34. Cone Search
- queryData operation
- SQL select operation on an RDBMS
- Transform output into VOTable format
- a VOTable package can be useful here
- Issues
- May need to assign UCDs to your catalog fields
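The two queryData steps for a Cone Search service (select from the database, then serialize as VOTable) might look roughly like this. Everything here is an assumption made for illustration: the in-memory SQLite database stands in for the site's RDBMS, the `catalog` table and its columns are invented, and the VOTable is written by hand with the standard XML library rather than with a VOTable package. The UCDs assigned to the three fields are the ones conventionally expected in a Cone Search response.

```python
import math
import sqlite3
import xml.etree.ElementTree as ET


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_select(conn, ra, dec, sr):
    """Coarse SQL cut on declination, then an exact cut in Python."""
    cur = conn.execute(
        "SELECT id, ra_deg, dec_deg FROM catalog "
        "WHERE dec_deg BETWEEN ? AND ? ORDER BY id",
        (dec - sr, dec + sr))
    return [row for row in cur if angular_sep(ra, dec, row[1], row[2]) <= sr]


# (name, datatype, UCD) for each output column.
FIELDS = [
    ("id", "char", "ID_MAIN"),
    ("ra", "double", "POS_EQ_RA_MAIN"),
    ("dec", "double", "POS_EQ_DEC_MAIN"),
]


def to_votable(rows):
    """Serialize the selected rows as a minimal VOTable document."""
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, datatype, ucd in FIELDS:
        attrs = {"name": name, "datatype": datatype, "ucd": ucd}
        if datatype == "char":
            attrs["arraysize"] = "*"
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(data, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (id TEXT, ra_deg REAL, dec_deg REAL)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [("A", 180.00, -30.00), ("B", 180.05, -30.02), ("C", 181.00, -30.00)])
hits = cone_select(conn, ra=180.0, dec=-30.0, sr=0.1)
votable_xml = to_votable(hits)
print(votable_xml)
```

A real catalog would push the positional cut into the database with a spatial index; the Python-side filter here only keeps the sketch short.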
35. Simple Image Access
- queryData operation
- Select operation on an RDBMS
- Compute SIA query response metadata
- Transform output into VOTable format
- Issues
- Computing the SIA query response metadata can be nontrivial
- e.g., for a cutout or mosaic
- don't forget you should return WCS information
- Metadata generation
- This is much easier if image metadata is cached in DBMS
- For virtual data must compose access reference command
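For a cutout service, the query response has to describe an image that does not exist yet and supply an access reference that will generate it. A minimal sketch, assuming a north-up cutout with square pixels and a simple linear WCS: the endpoint, the `ID` parameter, and both function names are hypothetical, while NAXIS, CRPIX, CRVAL, and CDELT are taken from the image generation parameters listed earlier.

```python
from urllib.parse import urlencode

# Hypothetical cutout endpoint, used only for illustration.
CUTOUT_URL = "http://example.org/sia/cutout"


def cutout_metadata(ra, dec, width, height, scale):
    """Describe a north-up cutout centered on (ra, dec).

    width, height are in degrees; scale is degrees per pixel.
    """
    naxis1 = max(1, round(width / scale))
    naxis2 = max(1, round(height / scale))
    return {
        "naxes": 2,
        "naxis": (naxis1, naxis2),
        "cdelt": (-scale, scale),   # RA increases toward decreasing x
        "crpix": ((naxis1 + 1) / 2, (naxis2 + 1) / 2),  # reference = center
        "crval": (ra, dec),
    }


def access_reference(base_url, image_id, meta, fmt="image/fits"):
    """Compose the URL that will generate the cutout when fetched."""
    def pair(values):
        return ",".join(str(v) for v in values)

    params = {
        "ID": image_id,             # hypothetical: source image identifier
        "NAXIS": pair(meta["naxis"]),
        "CRPIX": pair(meta["crpix"]),
        "CRVAL": pair(meta["crval"]),
        "CDELT": pair(meta["cdelt"]),
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params, safe=",/")


meta = cutout_metadata(ra=180.0, dec=-30.0, width=0.5, height=0.5, scale=0.001)
acref = access_reference(CUTOUT_URL, "img42", meta)
print(meta["naxis"], acref)
```

The metadata computed here is what goes into the query response row; the cutout itself is only generated when the client follows the access reference.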
36. Simple Image Access (contd)
- getData operation
- Atlas, Pointed
- only input is an access URL pointing to the file
- return FITS file
- Cutout, Mosaic
- access URL is the command which generates the virtual data
- may require significant, complex computation!
- getCapabilities
- For SIA V1.0 this is FORMAT=metadata
- Tells client service capabilities and any optional parameters
37. Implementing DAL Services
- Web Service Frameworks
- LAMP: Linux, Apache, MySQL, Python/Perl/PHP etc.
- Apache Web server, Tomcat, Java servlets
- dotNET/Mono
- Microsoft approach: SQL Server, C#
- Ruby on Rails
- Trendy new alternative
- Virtual Data Generation
- Backend may require significant computation
- Re-use some science package (IRAF, IDL, AIPS, CASA, etc.)
- Or at least CFITSIO, WCSTOOLS, and other libraries