Title: Introduction to Sky Survey Problems
1Introduction to Sky Survey Problems
2Introduction to sky survey database problems
- Astronomical data
- Astronomical databases
- The Virtual Observatory concept status
- Large sky survey databases
- Spatial indexing in astronomical databases
- Case Study SDSS SkyServer
3Observational Astronomy
4Astronomical data in original form
- Optical
- Image array of pixel values
- X-ray
- Event list positions, arrival times, energies of
all detected photons - Radio
- Interferometric visibilities sparse Fourier
transform of a region of the sky - Very different types of data
5Astronomical data in final form
- Most research done using catalogue data
- i.e. tables of attributes of detected sources
mainly discrete sources (stars, galaxies, etc) - Data compression
- Catalogue - few of image data volume
- Amenable to representation in relational DB
- Natural indexing by location in sky
6Astronomical Databases
- Sky survey archives
- Homogeneous data, standard reduction pipeline
- Science Archive do science on DB
- Telescope archives
- Semi-indexed collections of raw data files from
all observations taken heterogeneous - Download data for reduction and analysis
- Specialist data centres collections of
catalogues - Bibliographic databases scans of major journals
7The Virtual Observatory
- Concept
- Interoperable federation of all the worlds
significant astronomical databases - Facilitate multi-wavelength astronomy
- Status
- Several projects underway AstroGrid in UK
- 5 years work to create a fully working VO
- The VO sets the context for the design of new sky
survey databases
8AstroGrid www.astrogrid.org
- Consortium
- Edinburgh, Leicester, Cambridge, RAL, MSSL,
Jodrell Bank, Queens Belfast - 3 year (4M) project
- 1 yr Phase A Study finished end of 2002
- 2 yr Phase B Implementation to end 2004
- Web (later Grid) service framework in Java
- Currently building web services, portals, etc -
researching OGSA and OGSA-DAI
9Large sky survey databases
- Major science driver for AstroGrid and VO
- New science mining multi-wavelength data
- Largest are optical/near-infrared sky surveys
- Largest of these hosted in Edinburgh
- current - SuperCOSMOS, SDSS (mirror)
- future - WFCAM, VISTA
- Each yield 1-10TB of catalogue data in RDBMS
10Spatial queries in astronomy
- Two important types
- Select entries (with predicate) in area of sky
- Match entries (esp. between two tables)
- Second is special case of first
- i.e. both boil down to point-within-distance-of-p
oint - but distances in two cases can be very different
- Advantage in using a hierarchical spatial
indexing scheme - Perform spatial query at appropriate granularity
11Spatial Indexingin Astronomy
- The Celestial Sphere
- Many coordinate systems
- Most common is the
- equatorial system, with
- Right Ascension and
- Declination as analogues
- of Longitude Latitude
12Spatial indexing in astronomical databases
- Basic DBMS indexes are 1-D e.g. B-trees
- Some DBMSs support general 2-D indexing
- Usually using R-trees (or variants) rectangles
astronomical experiments not too successful
Clive - Some DBMSs have native spatial indexing
- Little knowledge of this in astronomy - want to
know more - But
- The Celestial Sphere is a sphere(!)
- Many geographical spatial DBs use planar
projections - So, astronomers have felt the need to develop
spatial indexing prescriptions of their own
13Hierarchical Triangular Mesh - HTM
- Developed by Sloan survey archive team at JHU
- Start with projection of octahedron on sphere and
subdivide triangles at their midpoints - Generate unique pixel ID code based on position
in the sky and level in hierarchy
can index that with B-tree
14Hierarchical Equal Area Iso-Latitude Pixelisation
(HEALPix)
- Developed by Kris Gorski (now JPL/Caltech)
- Start with division of sphere into twelve equal
area curvilinear quadrilaterals, - then divide each into four
- Like HTM, produces a
- pixel code on which a
- B-tree index can be made
- (Ian HEALPix in Oracle?)
15Sky survey DB case studySkyServer for SDSS
- Sloan Digital Sky Survey (SDSS)
- first of new generation of sky surveys
- US-led team, dedicated telescope camera
- Image half of northern sky in 5 optical bands
- Then obtain optical
- spectra for 1,000,000
- galaxies
- Estimated 1TB of
- catalogue data
16SDSS Archive
- First of new generation of sky survey archives
- Represents the state-of-the-art in sky survey
databases - Developed by Alex Szalays team at Johns Hopkins
- Project started in earnest in about 1996
- OODBMSs seen as the coming thing
- SDSS chose Objectivity/DB for their archive
15 staff-years of effort later, theyd rewritten
much of the DBMS themselvesand then jumped ship
and started using MS SQL Server! - SkyServer
(in collaboration with Jim Gray, MS Research)
17SkyServer design considerations
- Power flexibility to pose arbitrary queries
- Simple astronomers ignorant of SQL!
- Hide messy spherical trigonometry
- Distance on sphere between (a1,d1) and (a2,d2) is
given in SQL by - 2.0asin(sqrt(square(sin(0.5(radians(d1-d2)))
) cos(radians(d1))cos(radians(d2))
square(sin(0.5(radians(a1-a2))))) - Dont want users typing this
- Dont really want DBMS to evaluate expressions
like this often
18SkyServer spatial queries
- Simple table-valued functions exposed to user
- E.g. select count()
- from fGetNearbyObjEq(a,d,radius)
- (a,d)(Right Ascension, Declination)
- Functions call SQL Server Extended Stored
Procedure - HTM index manipulation routines, implemented in a
Dynamically Linked Library (DLL) - DLL generated from HTM package in C
19Lessons from HTM implementation in SkyServer
- SQL is not great for spherical trigonometry
- Messy to write, slow to compute
- Have to define stored procedures/functions
- Expose a clean interface to users
- Let them pose queries the way they want to
- Replace trig operations by integer arithmetic
- Library of HTM index operations underneath
- Precompute tables of neighbouring objects
- Far fewer spatial match operations at query time
20Problems with this approach
- How easy to develop stored procedures, etc?
- Needs detailed knowledge of DBMS
- Extended Stored Procedure calls slow
- How well will query optimiser use HTM?
- less well than built-in spatial index?
- but that might be poorly suited to astronomical
applications - How easy to implement all this in DBMSs other
than SQL Server? - But this works reasonably well in practice!