Association techniques for the Virtual Observatory

1
Association techniques for the Virtual Observatory

Bob Mann

2
Why associations are crucial to the Virtual
Observatory

The essence of the VO is database federation
Usually DBs of independent origin
No links between entries in different DBs
Such links needed for prototypical VO query
e.g. give me all galaxies in region A of the sky
with an optical/X-ray flux ratio greater than X
which are not detected in the radio to a limiting
flux of Y

[Figure labels: Optical, X-ray, Radio]

3
Why you might think associations are easy to make

Natural spatial indexing to astro databases
Plus uncertainties on positions, in general
Just perform matching by proximity
Simple-ish methods for doing this (Clive)
Some practical issues for distributed case
Data volumes
think about transfers and performance
Metadata for interoperability
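
As a minimal sketch of matching by proximity, the Python below pairs each source in one catalogue with its nearest neighbour in another, within a fixed search radius. The function names, the (RA, Dec) tuple layout in degrees and the brute-force double loop are assumptions made for illustration only; at the data volumes discussed here a spatial index would be needed.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(h) marginally above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_by_proximity(cat_a, cat_b, radius_deg):
    """For each B source, return the index of the nearest A source
    within radius_deg, or None if there is none. Brute force: O(N_A * N_B)."""
    matches = []
    for ra_b, dec_b in cat_b:
        best_i, best_sep = None, radius_deg
        for i, (ra_a, dec_a) in enumerate(cat_a):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_i, best_sep = i, sep
        matches.append(best_i)
    return matches


# Hypothetical positions in degrees; search radius of 2 arcsec.
cat_a = [(10.0, 20.0), (10.001, 20.0), (50.0, -5.0)]
cat_b = [(10.0002, 20.0), (120.0, 30.0)]
matches = match_by_proximity(cat_a, cat_b, 2.0 / 3600.0)
```

Here the first B source is paired with the first A source and the second B source is left unmatched. Nothing beyond position enters the decision.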

4
SkyQuery (www.skyquery.net)

Restricted to SQL Server databases and .NET
Requires special facilities at data centres?
(Greg)
Matching by proximity alone

5
Matching by proximity is not always adequate

Need astrophysical information to know which of
the red objects is the most likely counterpart to
the cyan source

6
General Case

Database A
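Positions $(\mathrm{RA}_i, \mathrm{Dec}_i)$ for $i = 1, \dots, N_A$
Pos. uncerts $(\sigma_{\mathrm{RA},i}, \sigma_{\mathrm{Dec},i})$ or $(\sigma_{X,i}, \sigma_{Y,i})$ or
$\sigma_i$ or $\sigma$
Other attributes $A_{ij}$ for $j = 1, \dots, M_A$
Ditto for Database B
$(N_A, N_B)$ may be up to $10^9$
$(M_A, M_B)$ may be $10^2$
$< 10$ likely to be used in association procedure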

7
General Requirements

Users can readily assess whether associations are
suitable for their analysis
Transparency of method used
Figure of merit for each association
User-supplied association methods(?)
Performance pre-computation vs. on-the-fly
Incorporating astrophysical prior knowledge, but
not biasing associations unduly
Often new classes of source involved

8
Likelihood Ratio technique(s)

Likelihood Ratio, $LR_{ij}$, for association of ith
entry of DB A and jth of B defined to be
$$LR_{ij} = \frac{\text{prob. that } A_i \text{ is true counterpart of } B_j}{\text{prob. that } A_i \text{ is not true counterpart of } B_j}$$
Choose i that maximises $LR_{ij}$

9
LR example

A is an optical catalogue, with magnitudes m and
negligible positional errors
Gaussian positional uncertainty, e(x,y), for B
Then, $LR_{ij} = n_{A,\mathrm{ID}}(m_i)\, e(x_j, y_j) / n_A(m_i)$
Problems
Might not know form of $n_{A,\mathrm{ID}}(m_i)$
Might have several populations in B
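
The LR example can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of any particular survey: the positional error is taken to be a circular Gaussian with a single width sigma, offsets are measured from the B position in the same units as sigma, and every number in the example is invented. The function and argument names are hypothetical.

```python
import math


def gaussian_positional_density(dx, dy, sigma):
    """Circular 2-D Gaussian e(x, y) for an offset (dx, dy); integrates to 1."""
    return (math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            / (2 * math.pi * sigma * sigma))


def likelihood_ratio(n_id, n_all, dx, dy, sigma):
    """LR = n_ID(m) * e(x, y) / n_A(m) for a single candidate."""
    return n_id * gaussian_positional_density(dx, dy, sigma) / n_all


def best_counterpart(candidates, sigma):
    """candidates: list of (dx, dy, n_id, n_all) tuples.
    Return (index, LR) of the candidate with the largest LR."""
    lrs = [likelihood_ratio(n_id, n_all, dx, dy, sigma)
           for dx, dy, n_id, n_all in candidates]
    best = max(range(len(lrs)), key=lrs.__getitem__)
    return best, lrs[best]


# Invented values: candidate 0 is closer but faint and common,
# candidate 1 is further away but bright and rare.
candidates = [
    (0.5, 0.0, 0.05, 0.02),
    (1.5, 0.0, 0.40, 0.001),
]
best_index, best_lr = best_counterpart(candidates, sigma=1.0)
```

With these made-up inputs the more distant candidate wins, which is the situation where proximity alone would pick the wrong counterpart.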

10
If $n_{A,\mathrm{ID}}(m_i)$ is not known

Estimate it
Compare $n_A(m)$ around source positions with $n_A(m)$
for full database A
Learn it
Use EM algorithm to learn form of $n_{A,\mathrm{ID}}(m_i)$
(Emma Taylor, PhD thesis)
Circumvent it
Set $n_{A,\mathrm{ID}}(m_i)$ = const. and normalise $LR_{ij}$ using
randomly-located fictitious sources
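
The "estimate it" option might look like the sketch below: histogram the magnitudes of A objects found near the B sources, subtract the background expected from the full catalogue scaled to the searched area, and normalise what is left. The function names, the circular search regions and the neglect of overlapping regions are all assumptions made for illustration.

```python
import math


def magnitude_histogram(mags, edges):
    """Count magnitudes falling in each half-open bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for m in mags:
        for k in range(len(counts)):
            if edges[k] <= m < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def estimate_n_id(mags_near_sources, mags_all, edges,
                  n_sources, search_radius, catalogue_area):
    """Background-subtracted, normalised magnitude distribution of likely
    counterparts. Areas must share units; overlap of circles is ignored."""
    near = magnitude_histogram(mags_near_sources, edges)
    full = magnitude_histogram(mags_all, edges)
    # Fraction of the catalogue area covered by the search circles.
    scale = n_sources * math.pi * search_radius ** 2 / catalogue_area
    # Excess over the expected background, clipped at zero.
    excess = [max(0.0, n - f * scale) for n, f in zip(near, full)]
    total = sum(excess)
    return [x / total for x in excess] if total > 0 else excess
```

For instance, with a catalogue dominated by faint objects but an excess of bright objects near the B sources, the returned distribution is weighted towards the bright bin.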

11
But

All of these methods require statistics on A
e.g. $n_A(m)$
or histogram of any other attribute(s)
The more complicated the physical model (e.g.
multiple source populations in B), the more
complicated the statistics that are needed
Not an insurmountable problem, just lots of
count() queries
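
To make the count-query point concrete, here is a small sketch using Python's built-in sqlite3 module: the magnitude histogram $n_A(m)$ is obtained with a single grouped COUNT query. The table name catalogue_a, its columns and the inserted magnitudes are invented for the example, and SQLite merely stands in for whatever engine a data centre actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue_a (id INTEGER PRIMARY KEY, mag REAL)")
conn.executemany(
    "INSERT INTO catalogue_a (mag) VALUES (?)",
    [(m,) for m in (18.2, 18.7, 19.1, 19.4, 19.9, 20.3)],
)


def magnitude_counts(conn, bin_width):
    """Return {bin lower edge: number of objects} from one grouped COUNT query.
    CAST truncates towards zero, so this binning assumes positive magnitudes."""
    rows = conn.execute(
        "SELECT CAST(mag / ? AS INTEGER) AS bin, COUNT(*) "
        "FROM catalogue_a GROUP BY bin ORDER BY bin",
        (bin_width,),
    ).fetchall()
    return {b * bin_width: n for b, n in rows}


histogram = magnitude_counts(conn, 1.0)
```

A histogram of any other attribute would follow the same pattern with a different column in the binning expression.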

12
Pre-computing cross-neighbours

LR chooses between a few candidates usually
Pre-compute and store cross-neighbours
At least for the few, very large DBs
Can then allow many probabilistic models to be
used following the initial proximity cut
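
A rough sketch of this idea: do the expensive proximity cut once, store the candidate pairs, then let any number of scoring models choose among the stored candidates. The flat-sky coordinates, the dictionary layout and the two example models are assumptions for illustration, not a description of an existing service.

```python
import math
from collections import defaultdict


def precompute_cross_neighbours(cat_a, cat_b, radius):
    """Return {j: [(i, dx, dy), ...]} listing every A object within radius
    of B object j. Flat-sky approximation, brute force."""
    neighbours = defaultdict(list)
    for j, (xb, yb) in enumerate(cat_b):
        for i, (xa, ya) in enumerate(cat_a):
            dx, dy = xa - xb, ya - yb
            if math.hypot(dx, dy) <= radius:
                neighbours[j].append((i, dx, dy))
    return dict(neighbours)


def associate(neighbours, score):
    """Pick, for each B object, the stored candidate with the highest score."""
    return {j: max(cands, key=score)[0] for j, cands in neighbours.items()}


# Hypothetical catalogues and per-object weights for A.
cat_a = [(0.3, 0.0), (1.0, 0.0), (10.0, 10.0)]
cat_b = [(0.0, 0.0)]
weights = [1.0, 50.0, 1.0]

stored = precompute_cross_neighbours(cat_a, cat_b, radius=2.0)

# Model 1: nearest neighbour only.
by_distance = associate(stored, lambda c: -math.hypot(c[1], c[2]))

# Model 2: Gaussian offset term (unit width) multiplied by a per-object weight.
by_weight = associate(
    stored, lambda c: weights[c[0]] * math.exp(-(c[1] ** 2 + c[2] ** 2) / 2)
)
```

Both models reuse the same stored pairs, so changing the probabilistic model does not require repeating the spatial search.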

[Figure: databases A, B and C, with CrossNeighbours (B,C) and CrossNeighbours (C,B) tables]

13
Distributed Association Service?

c.f. Distributed Annotation Server
Allows third-party annotation in bio DBs
"inferred function of this gene is junk"
Can be included in queries (somehow)
"Select whatever from BioDB
where not function is junk"
Some sort of join between BioDB and the
Distributed Annotation Server

14
Distributed Association Service (2)

Is something like this needed in the VO?
Easier than adding extra columns to tables
What would it contain
References to original databases
entry N in DB A is entry M in DB B
Descriptions of methods used
Links to literature references (ADS/CDS)
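
One way to picture such a service is as a separate table of association records that can be joined to the original databases at query time. The sketch below uses Python's built-in sqlite3 module; the schema, column names and rows are entirely hypothetical and only illustrate the kind of content listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE optical (id INTEGER PRIMARY KEY, mag REAL);
CREATE TABLE associations (
    db_a TEXT, id_a INTEGER,   -- reference to an entry in the first database
    db_b TEXT, id_b INTEGER,   -- reference to an entry in the second database
    method TEXT,               -- description of the method used
    figure_of_merit REAL,      -- e.g. a likelihood ratio
    reference TEXT             -- link to a literature reference
);
""")
conn.executemany("INSERT INTO optical VALUES (?, ?)",
                 [(1, 18.5), (2, 19.2), (3, 20.1)])
conn.executemany(
    "INSERT INTO associations VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("optical", 1, "xray", 7, "likelihood ratio", 12.5, "placeholder"),
     ("optical", 2, "radio", 4, "proximity", 0.9, "placeholder")],
)


def associated_with(conn, db_b):
    """Optical entries that have a recorded association in db_b."""
    return conn.execute(
        "SELECT o.id, s.id_b, s.method FROM optical o "
        "JOIN associations s ON s.db_a = 'optical' AND s.id_a = o.id "
        "WHERE s.db_b = ? ORDER BY o.id", (db_b,)).fetchall()


def without_association(conn, db_b):
    """Optical entries with no recorded association in db_b."""
    rows = conn.execute(
        "SELECT o.id FROM optical o WHERE NOT EXISTS ("
        "SELECT 1 FROM associations s "
        "WHERE s.db_a = 'optical' AND s.id_a = o.id AND s.db_b = ?) "
        "ORDER BY o.id", (db_b,)).fetchall()
    return [r[0] for r in rows]
```

Note that the absence of a recorded association is not the same as a non-detection to a given limiting flux; that would need information from the second database itself.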

15
Associations in the VO

Basically, something like Greg's picture
Start with a large dose of SkyQuery
Add possibility of running user-defined
algorithms on dataset from proximity cut
Pre-compute cross-neighbours for big DBs
Distributed Association Service to record matches
made? and methods used?
