Stephen Gwyn - PowerPoint PPT Presentation

Aggregating Metadata from Multiple Archives: a Non-VO Approach. Stephen Gwyn, Canadian Astronomy Data Centre (CADC)

1
Aggregating Metadata from Multiple Archives: a
Non-VO Approach
CADC
Stephen Gwyn Canadian Astronomy Data Centre
2
- Astronomy is using more and more archival data
- More than 50% of HST papers are archival
- Similar trends for other telescopes
- Harder for solar system astronomy
SSOIS (Solar System Object Image Search) allows
users to search for images of moving targets
3
SSOIS (Solar System Object Image Search) allows
users to search for images of moving targets
4
SSOIS (Solar System Object Image Search) allows
users to search for images of moving targets
5
Initially, only data from CFHT/MegaCam was searched
6
Next added data from external telescope archives
7
Next added data from external telescope archives
CADC
8
Scraping external archives
For each image, we need:
- position (RA, Dec)
- field of view
- MJD of mid-exposure
- filter
- exposure time
- target name
- URL to data
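The per-image fields listed above map naturally onto a small record type. The following is a minimal sketch in Python, not the SSOIS schema: the class name `ImageRecord`, the field names, the units (degrees, seconds) and the rectangular field of view are all assumptions made for illustration. The helper `mjd_mid_from_start` is likewise hypothetical and covers the common case where an archive reports only the exposure start time.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400.0


@dataclass(frozen=True)
class ImageRecord:
    """One archive image, reduced to the fields a moving-target search needs."""
    ra_deg: float        # image centre, right ascension (assumed degrees)
    dec_deg: float       # image centre, declination (assumed degrees)
    fov_ra_deg: float    # field of view along RA (assumed rectangular, degrees)
    fov_dec_deg: float   # field of view along Dec
    mjd_mid: float       # MJD of mid-exposure
    filter_name: str
    exptime_s: float     # exposure time (assumed seconds)
    target: str
    url: str             # link back to the data

    def __post_init__(self):
        # Reject obviously broken rows before they reach the search index.
        if not 0.0 <= self.ra_deg < 360.0:
            raise ValueError(f"RA out of range: {self.ra_deg}")
        if not -90.0 <= self.dec_deg <= 90.0:
            raise ValueError(f"Dec out of range: {self.dec_deg}")
        if self.exptime_s < 0.0:
            raise ValueError(f"negative exposure time: {self.exptime_s}")


def mjd_mid_from_start(mjd_start: float, exptime_s: float) -> float:
    """Mid-exposure MJD for an archive that reports only the start time."""
    return mjd_start + 0.5 * exptime_s / SECONDS_PER_DAY
```

Validating at construction time keeps bad rows from one archive from silently contaminating the merged table.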
10
There are a variety of data archive interfaces....
11
Scraping external archives
- In an ideal world: one query to get all metadata
- In real life: row limits
- As the archives are updated, they need to be re-scraped periodically
- Programmatic retrieval is required
12
Use SIAP?
Advantages
- A single tool can scrape multiple archives
Disadvantages
- Not all archives have an SIAP interface
- Many SIAP services do not conform to the VO standard
- Not all SIAP services contain all the necessary metadata
- Most archives have at least one heavily observed patch of sky: hit the row limit again
- SIAP services vary in ability for positional queries
  - maximum search area
  - search is circle or box
  - may require 10^5 queries, which may be perceived as a DoS attack
Far better off scraping by day/night/MJD
- Almost all telescopes take <10000 observations per 24 hours
- Can re-scrape with fewer queries
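The gap between the two strategies can be checked with rough arithmetic. The sketch below compares a lower bound on the number of cone searches needed to tile the whole sky with the number of one-night queries needed to cover an archive's lifetime. The maximum search radius of 0.35 degrees and the 15-year span are assumed values chosen for illustration, not figures from the talk, and overlap between cones is ignored, so the positional count is optimistic.

```python
import math

# Area of the full sky in square degrees: 4*pi steradians, converted.
FULL_SKY_SQ_DEG = 4.0 * math.pi * (180.0 / math.pi) ** 2


def queries_by_position(max_radius_deg: float) -> int:
    """Lower bound on cone searches needed to cover the sky (no overlap)."""
    cone_area = math.pi * max_radius_deg ** 2   # flat-sky approximation
    return math.ceil(FULL_SKY_SQ_DEG / cone_area)


def queries_by_night(first_mjd: int, last_mjd: int) -> int:
    """One query per 24-hour window, counting both end nights."""
    return last_mjd - first_mjd + 1


if __name__ == "__main__":
    # Assumed service limit: 0.35 degree maximum search radius.
    print("by position:", queries_by_position(0.35))
    # Assumed archive span: MJD 51544 (2000-01-01) to MJD 57022 (2014-12-31).
    print("by night:   ", queries_by_night(51544, 57022))
```

With these assumptions the positional count is on the order of 10^5, while the nightly count is a few thousand, and re-scraping later only needs the nights since the last run.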
13
Scraping by RA/Dec
14
Scraping by Date
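Scraping by date can be sketched as a loop over one-night MJD windows. The version below also halves a window whenever a reply comes back full, which is one possible way to cope with a row limit on an unusually busy night; that fallback is an assumption added here, not something described in the talk. The `fetch` callable is hypothetical: it stands for whatever archive-specific code returns the rows with `mjd_lo <= MJD < mjd_hi`, truncated at the archive's row limit.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]
Fetch = Callable[[float, float], List[Row]]   # hypothetical archive query


def scrape_window(fetch: Fetch, mjd_lo: float, mjd_hi: float,
                  row_limit: int, min_span: float = 1e-4) -> Iterator[Row]:
    """Yield all rows in [mjd_lo, mjd_hi), halving the window on a full reply."""
    rows = fetch(mjd_lo, mjd_hi)
    # A reply shorter than the limit is complete; min_span stops the recursion
    # if a window cannot be split any further.
    if len(rows) < row_limit or (mjd_hi - mjd_lo) <= min_span:
        yield from rows
        return
    mid = 0.5 * (mjd_lo + mjd_hi)
    yield from scrape_window(fetch, mjd_lo, mid, row_limit, min_span)
    yield from scrape_window(fetch, mid, mjd_hi, row_limit, min_span)


def scrape_by_night(fetch: Fetch, first_mjd: int, last_mjd: int,
                    row_limit: int) -> Iterator[Row]:
    """Query one 24-hour window per night, from first_mjd to last_mjd inclusive."""
    for night in range(first_mjd, last_mjd + 1):
        yield from scrape_window(fetch, float(night), float(night + 1), row_limit)
```

Because each night is an independent query, a periodic re-scrape can restart from the last MJD already harvested instead of repeating the whole archive.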
15
Older archive interfaces
- Query page, simple CGI result page
  - view source on the query page
  - get form inputs
  - issue repeated queries to CGI result page using GET or POST with wget/curl/scripting API
- Easy
http://astronomydata.edu/query?ra=12.87&dec=13.52&mjd=57323
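For a CGI-style interface, the repeated queries amount to filling in the form inputs and encoding them. The sketch below builds both a GET URL and a POST request with the Python standard library. It assumes the form inputs are named `ra`, `dec` and `mjd`, as in the placeholder URL on the slide; the host is that same placeholder, not a real service, so nothing is actually sent here.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://astronomydata.edu/query"   # placeholder host from the slide


def build_get_url(ra: float, dec: float, mjd: int, base_url: str = BASE_URL) -> str:
    """Encode the form inputs into the query string of a GET request."""
    params = {"ra": ra, "dec": dec, "mjd": mjd}
    return base_url + "?" + urlencode(params)


def build_post_request(ra: float, dec: float, mjd: int,
                       base_url: str = BASE_URL) -> Request:
    """Same inputs as a form-encoded POST body, for forms that use POST."""
    body = urlencode({"ra": ra, "dec": dec, "mjd": mjd}).encode("ascii")
    # urllib switches the method to POST whenever a body is supplied.
    return Request(base_url, data=body)


if __name__ == "__main__":
    print(build_get_url(12.87, 13.52, 57323))
```

Passing the result to `urllib.request.urlopen` would issue the query; looping over MJD values then gives the repeated queries described above.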
16
Newer archive interfaces
- AJAX/HTML5/etc. page
  - download JavaScript and run through de-obfuscator
  - locate relevant XMLHttpRequest
  - determine if cookies are necessary
  - issue repeated queries to XMLHttpRequest URLs
- Much harder
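Once the relevant XMLHttpRequest has been located, replaying it from a script mostly means reproducing its URL, body and headers, and keeping any cookies the site sets. The sketch below shows one way to do that with the Python standard library. Everything archive-specific is hypothetical: the endpoint `XHR_URL`, the JSON payload and its field names `mjd_min` and `mjd_max` are invented for illustration, and a real archive may expect a different body format or extra headers.

```python
import http.cookiejar
import json
import urllib.request

# Hypothetical endpoint, standing in for the URL found in the page's JavaScript.
XHR_URL = "https://archive.example.org/api/search"


def make_session():
    """Return an opener that stores cookies from replies and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_xhr_request(mjd_lo: float, mjd_hi: float,
                      url: str = XHR_URL) -> urllib.request.Request:
    """Build a request resembling the browser's background query."""
    # Assumed payload: a JSON body with a date range.
    payload = json.dumps({"mjd_min": mjd_lo, "mjd_max": mjd_hi}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Header that many JavaScript libraries add; some servers check for it.
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, data=payload, headers=headers)
```

If cookies turn out to be necessary, opening the archive's landing page once with the opener fills the jar, and later calls to `opener.open(build_xhr_request(...))` send those cookies automatically.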
17
Easiest of all...
http://smoka.nao.ac.jp/status/obslog/SUP_2007.txt
18
A script to get all Subaru/SuprimeCam metadata...
#!/bin/bash
wget http://smoka.nao.ac.jp/status/obslog/SUP_1999.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2000.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2001.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2002.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2003.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2004.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2005.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2006.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2007.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2008.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2009.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2010.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2011.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2012.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2013.txt
wget http://smoka.nao.ac.jp/status/obslog/SUP_2014.txt
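For periodic re-scraping, the list of yearly log files can be generated rather than typed out. This is a small sketch in Python that assumes the `SUP_<year>.txt` naming pattern seen in the script continues for later years; the function names are illustrative, and `download_all` is shown but not called.

```python
import urllib.request

BASE = "http://smoka.nao.ac.jp/status/obslog"   # directory used in the script


def suprimecam_log_urls(first_year: int = 1999, last_year: int = 2014):
    """One observation-log URL per year, both end years included."""
    return [f"{BASE}/SUP_{year}.txt" for year in range(first_year, last_year + 1)]


def download_all(urls):
    """Fetch each log into the current directory under its own file name."""
    for url in urls:
        filename = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, filename)
```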
19
The second easiest: CADC's Advanced Search
20
The second easiest: CADC's Advanced Search
21
The second easiest: CADC's Advanced Search
22
The second easiest: CADC's Advanced Search
http://www1.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap/sync?LANG=ADQL&REQUEST=doQuery&QUERY=SELECT%20Observation.observationURI%20AS%20%22Preview%22%2C%20Observation.collection%20AS%20%22Collection%22%2C%20Observation.observationID%20AS%20%22Obs.%20ID%22%2C%20COORD1(CENTROID(Plane.position_bounds))%20AS%20%22RA%20(J2000.0)%22%2C%20COORD2(CENTROID(Plane.position_bounds))%20AS%20%22Dec.%20(J2000.0)%22%2C%20Plane.time_bounds_cval1%20AS%20%22Start%20Date%22%2C%20Observation.instrument_name%20AS%20%22Instrument%22%2C%20Plane.time_exposure%20AS%20%22Int.%20Time%22%2C%20Observation.target_name%20AS%20%22Target%20Name%22%2C%20Plane.energy_bandpassName%20AS%20%22Filter%22%2C%20Plane.calibrationLevel%20AS%20%22Cal.%20Lev.%22%2C%20Observation.type%20AS%20%22Obs.%20Type%22%2C%20Plane.energy_bounds_cval1%20AS%20%22Min.%20Wavelength%22%2C%20Plane.energy_bounds_cval2%20AS%20%22Max.%20Wavelength%22%2C%20Observation.proposal_id%20AS%20%22Proposal%20ID%22%2C%20Observation.proposal_pi%20AS%20%22P.I.%20Name%22%2C%20Plane.productID%20AS%20%22Product%20ID%22%2C%20Plane.dataRelease%20AS%20%22Data%20Release%22%2C%20AREA(Plane.position_bounds)%20AS%20%22Field%20of%20View%22%2C%20Plane.position_sampleSize%20AS%20%22Pixel%20Scale%22%2C%20Plane.dataProductType%20AS%20%22Data%20Type%22%2C%20Plane.position_timeDependent%20AS%20%22Moving%20Target%22%2C%20Plane.provenance_name%20AS%20%22Provenance%20Name%22%2C%20Plane.provenance_keywords%20AS%20%22Provenance%20Keywords%22%2C%20Observation.intent%20AS%20%22Intent%22%2C%20Observation.target_type%20AS%20%22Target%20Type%22%2C%20Observation.target_standard%20AS%20%22Target%20Standard%22%2C%20Plane.metaRelease%20AS%20%22Meta%20Release%22%2C%20Observation.sequenceNumber%20AS%20%22Sequence%20Number%22%2C%20Observation.algorithm_name%20AS%20%22Algorithm%20Name%22%2C%20Observation.proposal_title%20AS%20%22Proposal%20Title%22%2C%20Observation.proposal_keywords%20AS%20%22Proposal%20Keywords%22%2C%20Observation.proposal_project%20AS%20%22Proposal%20Project%22%2C%20Plane.position_bounds%20AS%20%22Polygon%22%2C%20Plane.energy_emBand%20AS%20%22Band%22%2C%20Plane.provenance_reference%20AS%20%22Prov.%20Reference%22%2C%20Plane.provenance_version%20AS%20%22Prov.%20Version%22%2C%20Plane.provenance_project%20AS%20%22Prov.%20Project%22%2C%20Plane.provenance_producer%20AS%20%22Prov.%20Producer%22%2C%20Plane.provenance_runID%20AS%20%22Prov.%20Run%20ID%22%2C%20Plane.provenance_lastExecuted%20AS%20%22Prov.%20Last%20Executed%22%2C%20Plane.provenance_inputs%20AS%20%22Prov.%20Inputs%22%2C%20Plane.energy_restwav%20AS%20%22Rest-frame%20Spectral%20Coverage%22%2C%20Plane.planeID%20AS%20%22planeID%22%2C%20isDownloadable(Plane.planeURI)%20AS%20%22DOWNLOADABLE%22%2C%20Plane.planeURI%20AS%20%22CAOM%20Plane%20URI%22%2C%20Observation.instrument_keywords%20AS%20%22Instrument%20Keywords%22%2C%20Plane.energy_transition_species%20AS%20%22Molecule%22%2C%20Plane.energy_transition_transition%20AS%20%22Transition%22%2C%20Plane.position_resolution%20AS%20%22IQ%22%20FROM%20caom2.Plane%20AS%20Plane%20JOIN%20caom2.Observation%20AS%20Observation%20ON%20Plane.obsID%20%3D%20Observation.obsID%20WHERE%20%20(%20Observation.instrument_name%20%3D%20%27MegaPrime%27%20AND%20Observation.collection%20%3D%20%27CFHT%27%20)&FORMAT=tsv
23
The second easiest: CADC's Advanced Search
SELECT
  Observation.observationURI AS "Preview",
  Observation.collection AS "Collection",
  Observation.observationID AS "Obs. ID",
  COORD1(CENTROID(Plane.position_bounds)) AS "RA (J2000.0)",
  COORD2(CENTROID(Plane.position_bounds)) AS "Dec. (J2000.0)",
  Plane.time_bounds_cval1 AS "Start Date",
  Observation.instrument_name AS "Instrument",
  Plane.time_exposure AS "Int. Time",
  Observation.target_name AS "Target Name",
  Plane.energy_bandpassName AS "Filter",
  Plane.calibrationLevel AS "Cal. Lev.",
  Observation.type AS "Obs. Type",
  Plane.energy_bounds_cval1 AS "Min. Wavelength",
  Plane.energy_bounds_cval2 AS "Max. Wavelength",
  Observation.proposal_id AS "Proposal ID",
  Observation.proposal_pi AS "P.I. Name",
  Plane.productID AS "Product ID",
  Plane.dataRelease AS "Data Release",
  AREA(Plane.position_bounds) AS "Field of View",
  Plane.position_sampleSize AS "Pixel Scale",
  Plane.dataProductType AS "Data Type",
  Plane.position_timeDependent AS "Moving Target",
  Plane.provenance_name AS "Provenance Name",
  Observation.intent AS "Intent",
  Observation.target_type AS "Target Type",
  Observation.target_standard AS "Target Standard",
  Observation.sequenceNumber AS "Sequence Number",
  Observation.algorithm_name AS "Algorithm Name",
  Observation.proposal_title AS "Proposal Title",
  Observation.proposal_keywords AS "Proposal Keywords",
  Plane.energy_emBand AS "Band",
  Plane.provenance_version AS "Prov. Version",
  Plane.provenance_project AS "Prov. Project",
  Plane.provenance_runID AS "Prov. Run ID",
  Plane.provenance_lastExecuted AS "Prov. Last Executed",
  Plane.energy_restwav AS "Rest-frame Spectral Coverage",
  isDownloadable(Plane.planeURI) AS "DOWNLOADABLE",
  Plane.planeURI AS "CAOM Plane URI",
  Observation.instrument_keywords AS "Instrument Keywords",
  Plane.energy_transition_species AS "Molecule",
  Plane.energy_transition_transition AS "Transition",
  Plane.position_resolution AS "IQ"
FROM caom2.Plane AS Plane
JOIN caom2.Observation AS Observation ON Plane.obsID = Observation.obsID
WHERE ( Observation.collection = 'CFHT' )
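The relationship between the readable ADQL and the long request URL is plain URL encoding of the query into the `QUERY` parameter. The sketch below rebuilds such a URL in Python for a shortened column list. The endpoint, the parameter names `LANG`, `REQUEST`, `QUERY` and `FORMAT`, and the column and table names are taken from the slides; the shortened query and the function name `tap_sync_url` are illustrative, and whether the service still answers at this address is not something the sketch checks.

```python
from urllib.parse import quote, urlencode

# Endpoint as shown on the slide.
TAP_SYNC = "http://www1.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap/sync"

# Shortened version of the query: only the columns a scraper strictly needs.
ADQL = (
    'SELECT Observation.observationID AS "Obs. ID", '
    'COORD1(CENTROID(Plane.position_bounds)) AS "RA (J2000.0)", '
    'COORD2(CENTROID(Plane.position_bounds)) AS "Dec. (J2000.0)", '
    'Plane.time_bounds_cval1 AS "Start Date", '
    'Plane.time_exposure AS "Int. Time", '
    'Plane.energy_bandpassName AS "Filter" '
    "FROM caom2.Plane AS Plane "
    "JOIN caom2.Observation AS Observation ON Plane.obsID = Observation.obsID "
    "WHERE ( Observation.instrument_name = 'MegaPrime' "
    "AND Observation.collection = 'CFHT' )"
)


def tap_sync_url(adql: str, fmt: str = "tsv", endpoint: str = TAP_SYNC) -> str:
    """Encode an ADQL query into a synchronous TAP request URL."""
    params = {"LANG": "ADQL", "REQUEST": "doQuery", "QUERY": adql, "FORMAT": fmt}
    # quote (not quote_plus) writes spaces as %20; parentheses stay literal.
    return endpoint + "?" + urlencode(params, quote_via=quote, safe="()")


if __name__ == "__main__":
    print(tap_sync_url(ADQL))
```

Keeping the query as readable text and encoding it at the last moment makes it much easier to edit the column list or add a date constraint than working on the encoded URL directly.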
24
The other hard part: parsing downloaded metadata
- Which observations are images?
- Quality control
  - is MJD right?
  - are coordinates 2000.0 or 1950.0?
- Sorting out filters
  - remove narrow band filter data
  - remove bad filters
  - remove grism data
  - maybe homogenize filter names (B vs Bj vs Bjohnson vs Johnson B vs ...)
- Telescope footprint not typically part of the metadata
- Work out links back to original images
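Two of the clean-up steps above lend themselves to small helpers: homogenizing filter names and sanity-checking dates. The sketch below is illustrative only. The alias table contains just the B-band spellings named on the slide and treats them as equivalent, as the slide suggests; a real table would be built per archive. The date helper assumes one specific failure mode, a full Julian Date stored where an MJD was expected, and the plausibility bounds are supplied by the caller.

```python
# Aliases seen in archive metadata, mapped to one canonical name.
# Only the spellings from the slide are listed; extend per archive.
FILTER_ALIASES = {
    "b": "B",
    "bj": "B",
    "bjohnson": "B",
    "johnson b": "B",
}

JD_MINUS_MJD = 2400000.5   # definition: MJD = JD - 2400000.5


def normalize_filter(name: str) -> str:
    """Map known spellings to a canonical filter name; pass others through."""
    key = " ".join(name.lower().split())   # trim and collapse whitespace
    return FILTER_ALIASES.get(key, name.strip())


def to_mjd(value: float) -> float:
    """Convert values that look like full Julian Dates to MJD."""
    return value - JD_MINUS_MJD if value > JD_MINUS_MJD else value


def mjd_is_plausible(mjd: float, first_light_mjd: float, today_mjd: float) -> bool:
    """True if the date falls within the instrument's operating lifetime."""
    return first_light_mjd <= mjd <= today_mjd
```

Rows that fail `mjd_is_plausible` are better set aside for inspection than silently dropped, since a systematic offset usually points to a parsing error that affects the whole archive.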
25
SSOIS saves the Earth....
26
Summary
- SSOIS allows multi-archive searches for moving objects
- Metadata is harvested from external archives
- Lessons learned
  - SIAP is not useful for metadata harvesting
  - multiple queries by time, not by position
  - older interfaces are easier to scrape
  - parsing metadata often harder than retrieving it
27