Association techniques for the Virtual Observatory

1
Association techniques for the Virtual Observatory
  • Bob Mann

2
Why associations are crucial to the Virtual Observatory
  • The essence of the VO is database federation
  • Usually DBs of independent origin
  • No links between entries in different DBs
  • Such links needed for prototypical VO query
  • e.g. "give me all galaxies in region A of the sky
    with an optical/X-ray flux ratio greater than X
    which are not detected in the radio to a limiting
    flux of Y"

[Figure labels: Optical, X-ray, Radio]

3
Why you might think associations are easy to make
  • Natural spatial indexing to astro databases
  • Plus uncertainties on positions, in general
  • Just perform matching by proximity
  • Simple-ish methods for doing this (Clive)
  • Some practical issues for distributed case
  • Data volumes
  • think about transfers and performance
  • Metadata for interoperability
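
Matching by proximity, as described on this slide, might be sketched as below. This is a minimal brute-force illustration, not the method of any particular VO service: the function names, the (RA, Dec) tuple layout and the single fixed search radius are all assumptions made for the example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def match_by_proximity(cat_a, cat_b, radius_deg):
    """For each (ra, dec) in cat_b, return the index of the nearest cat_a
    entry within radius_deg, or None if there is none.

    Hypothetical helper: brute force, O(N_A * N_B), so only usable on toy data.
    """
    matches = []
    for ra_b, dec_b in cat_b:
        best_index, best_sep = None, radius_deg
        for i, (ra_a, dec_a) in enumerate(cat_a):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_index, best_sep = i, sep
        matches.append(best_index)
    return matches
```

A real catalogue would need a spatial index rather than the double loop, which is where the data-volume and performance concerns above come in.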

4
SkyQuery (www.skyquery.net)
  • Restriction to SQLServer databases and .Net
  • Requires special facilities at data centres?
    (Greg)
  • Matching by proximity alone

5
Matching by proximity is not always adequate
Need astrophysical information to know which of
the red objects is the most likely counterpart to
the cyan source
6
General Case
  • Database A
  • Positions (RA_i, Dec_i) for i = 1, ..., N_A
  • Pos. uncerts (σ_RA,i, σ_Dec,i) or (σ_X,i, σ_Y,i) or
    σ_i or σ
  • Other attributes A_ij for j = 1, ..., M_A
  • Ditto for Database B
  • (N_A, N_B) may be up to 10^9
  • (M_A, M_B) may be 10^2
  • <10 likely to be used in association procedure

7
General Requirements
  • Users can readily assess whether associations are
    suitable for their analysis
  • Transparency of method used
  • Figure of merit for each association
  • User-supplied association methods(?)
  • Performance: pre-computation vs. on-the-fly
  • Incorporating astrophysical prior knowledge, but
    not biasing associations unduly
  • Often new classes of source involved

8
Likelihood Ratio technique(s)
  • Likelihood Ratio, LR_ij, for association of the ith
    entry of DB A and the jth of B, defined to be
  • LR_ij = P(A_i is the true counterpart of B_j)
            / P(A_i is not the true counterpart of B_j)
  • Choose i that maximises LR_ij

9
LR example
  • A is an optical catalogue, with magnitudes m and
    negligible positional errors
  • Gaussian positional uncertainty, e(x,y), for B
  • Then, LR_ij = n_{A,ID}(m_i) e(x_j, y_j) / n_A(m_i)
  • Problems
  • Might not know form of n_{A,ID}(m_i)
  • Might have several populations in B
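
The LR example above can be made concrete with a short sketch. It assumes a circular Gaussian for e(x, y) with a single width sigma, and it takes n_{A,ID} and n_A as caller-supplied functions of magnitude; those choices, and all function names, are illustrative assumptions rather than anything stated on the slide.

```python
import math


def gaussian_positional_error(dx, dy, sigma):
    """Circular 2-D Gaussian density e(x, y) at offset (dx, dy); an assumed form."""
    r2 = dx * dx + dy * dy
    return math.exp(-r2 / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def likelihood_ratio(m_i, dx, dy, sigma, n_id, n_bg):
    """LR_ij = n_{A,ID}(m_i) * e(x_j, y_j) / n_A(m_i).

    n_id and n_bg are hypothetical callables returning n_{A,ID}(m) and n_A(m).
    """
    return n_id(m_i) * gaussian_positional_error(dx, dy, sigma) / n_bg(m_i)


def best_counterpart(candidates, sigma, n_id, n_bg):
    """candidates: non-empty list of (magnitude, dx, dy) for entries of A near
    one source of B. Returns (index, LR) of the candidate with the largest LR."""
    scored = [likelihood_ratio(m, dx, dy, sigma, n_id, n_bg)
              for m, dx, dy in candidates]
    best = max(range(len(scored)), key=scored.__getitem__)
    return best, scored[best]
```

With a background density that rises steeply towards faint magnitudes, a bright candidate can win over a fainter one that lies closer to the source, which is the sense in which proximity alone is not always adequate.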

10
If n_{A,ID}(m_i) is not known
  • Estimate it
  • Compare n_A(m) around source positions with n_A(m)
    for full database A
  • Learn it
  • Use EM algorithm to learn form of n_{A,ID}(m_i)
    [Emma Taylor PhD thesis]
  • Circumvent it
  • Set n_{A,ID}(m_i) = const. and normalise LR_ij using
    randomly-located fictitious sources
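
The "estimate it" option might look roughly like the sketch below: histogram the magnitudes of A entries found near the B sources, subtract the counts expected from the full database scaled to the same area, and treat the excess as an unnormalised estimate of n_{A,ID}(m). The function names, the area-scaling step and the clipping at zero are assumptions made for illustration.

```python
from bisect import bisect_right


def histogram(values, bin_edges):
    """Counts per half-open bin [edge_k, edge_k+1); values outside are dropped."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        k = bisect_right(bin_edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def estimate_identified_counts(mags_near_sources, mags_full,
                               search_area, full_area, bin_edges):
    """Excess counts per magnitude bin around source positions.

    search_area: total area searched around all B sources.
    full_area:   area covered by the full database A (same units).
    Negative excesses are clipped to zero.
    """
    near = histogram(mags_near_sources, bin_edges)
    full = histogram(mags_full, bin_edges)
    scale = search_area / full_area
    return [max(0.0, n - f * scale) for n, f in zip(near, full)]
```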

11
But
  • All of these methods require statistics on A
  • e.g. n_A(m)
  • or histogram of any other attribute(s)
  • The more complicated the physical model (e.g.
    multiple source populations in B), the more
    complicated the statistics that are needed
  • Not an insurmountable problem: just lots of
    count(*) queries
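
As an illustration of the kind of count query meant here, the sketch below builds a magnitude histogram, i.e. a binned n_A(m), with a single grouped COUNT(*). It uses an in-memory SQLite table as a stand-in for database A; the table name, column name and binning scheme are hypothetical.

```python
import sqlite3


def make_toy_catalogue(magnitudes):
    """In-memory stand-in for database A with one 'mag' column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalogue_a (id INTEGER PRIMARY KEY, mag REAL)")
    conn.executemany("INSERT INTO catalogue_a (mag) VALUES (?)",
                     [(m,) for m in magnitudes])
    return conn


def count_by_magnitude_bin(conn, m0, width):
    """Return [(bin_index, count), ...] for bins of the given width starting at m0.

    Rows fainter than m0 only; CAST truncates towards zero, so brighter rows
    are excluded rather than folded into bin 0.
    """
    query = """
        SELECT CAST((mag - :m0) / :width AS INTEGER) AS mag_bin, COUNT(*)
        FROM catalogue_a
        WHERE mag >= :m0
        GROUP BY mag_bin
        ORDER BY mag_bin
    """
    return conn.execute(query, {"m0": m0, "width": width}).fetchall()
```

The same pattern extends to histograms of other attributes, or of several attributes at once, by adding terms to the GROUP BY clause.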

12
Pre-computing cross-neighbours
  • LR usually chooses between a few candidates
  • Pre-compute and store cross-neighbours
  • At least for the few, very large DBs
  • Can then allow many probabilistic models to be
    used following the initial proximity cut

[Figure: databases A, B and C, with CrossNeighbours (B,C) and CrossNeighbours (C,B) tables]

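
The pre-computation idea on this slide could be sketched as follows: store every pair within a generous proximity cut once, then let any probabilistic model rescore only those stored candidates. The row layout (index in B, index in C, separation), the function names and the brute-force pairing are assumptions for illustration.

```python
import math


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def precompute_cross_neighbours(cat_b, cat_c, radius_deg):
    """Rows (index_in_b, index_in_c, separation_deg) for all pairs within radius_deg.

    Brute force for clarity; a large database would need a spatial index.
    """
    rows = []
    for j, (ra_b, dec_b) in enumerate(cat_b):
        for k, (ra_c, dec_c) in enumerate(cat_c):
            sep = separation_deg(ra_b, dec_b, ra_c, dec_c)
            if sep <= radius_deg:
                rows.append((j, k, sep))
    return rows


def candidates_for(rows, j):
    """Stored candidates (index_in_c, separation_deg) for entry j of B."""
    return [(k, sep) for (jj, k, sep) in rows if jj == j]
```

A likelihood-ratio model, or any user-supplied one, would then be applied to the output of `candidates_for` instead of to the full catalogues.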
13
Distributed Association Service?
  • cf. Distributed Annotation Server
  • Allows third-party annotation in bio DBs
  • e.g. "inferred function of this gene is junk"
  • Can be included in queries (somehow)
  • Select whatever from BioDB
  • where not "function is junk"
  • Some sort of join between BioDB and the
    Distributed Annotation Server

14
Distributed Association Service (2)
  • Is something like this needed in the VO?
  • Easier than adding extra columns to tables
  • What would it contain
  • References to original databases
  • "entry N in DB A is entry M in DB B"
  • Descriptions of methods used
  • Links to literature references (ADS/CDS)
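
One way to picture such a service is as a separate table of "entry N in DB A is entry M in DB B" records, each carrying the method used and a figure of merit, which queries then join against. The sketch below uses an in-memory SQLite database; every table name, column name and data value in it is hypothetical.

```python
import sqlite3


def build_toy_service():
    """Two toy databases plus a separate association table (all hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE db_a (id INTEGER PRIMARY KEY, optical_flux REAL);
        CREATE TABLE db_b (id INTEGER PRIMARY KEY, xray_flux REAL);
        CREATE TABLE association (
            entry_a INTEGER,          -- reference to an entry in DB A
            entry_b INTEGER,          -- reference to an entry in DB B
            method TEXT,              -- description of the method used
            figure_of_merit REAL,     -- e.g. a likelihood ratio
            reference TEXT            -- link to a literature reference
        );
        INSERT INTO db_a VALUES (1, 10.0), (2, 4.0);
        INSERT INTO db_b VALUES (7, 2.0), (8, 1.0);
        INSERT INTO association VALUES
            (1, 7, 'likelihood ratio', 12.5, 'placeholder reference'),
            (2, 8, 'proximity', 0.4, 'placeholder reference');
    """)
    return conn


def associated_flux_ratios(conn, min_merit):
    """Join A and B through the association table, keeping matches whose
    figure of merit is at least min_merit."""
    query = """
        SELECT a.id, b.id, a.optical_flux / b.xray_flux, s.method, s.figure_of_merit
        FROM association AS s
        JOIN db_a AS a ON a.id = s.entry_a
        JOIN db_b AS b ON b.id = s.entry_b
        WHERE s.figure_of_merit >= ?
        ORDER BY a.id
    """
    return conn.execute(query, (min_merit,)).fetchall()
```

Keeping the associations in their own table means the original databases need no extra columns, and several competing sets of matches, each with its own method description, can coexist.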

15
Associations in the VO
  • Basically, something like Greg's picture
  • Start with a large dose of SkyQuery
  • Add possibility of running user-defined
    algorithms on dataset from proximity cut
  • Pre-compute cross-neighbours for big DBs
  • Distributed Association Service to record matches
    made (and methods used)?