Title: SKYQUERY www.skyquery.net
1SKYQUERY www.skyquery.net
- Federated Database Query System (using WebServices)
- Developed by Tanu Malik, Alex Szalay, Tamas Budavari, Ani Thakar
- at The Johns Hopkins University
- Rationale
- Same drivers as VOQL/IVOA
- Large/complex astronomy queries
- Query over disjoint/distributed archives
- Demonstrator for NVO database federation.
- See if WebServices really work
- Development time: 5 man-months
- Done over a 6-week period.
2Federation Issues
- 1. Building a basic framework that allows federation, respecting autonomy (and heterogeneity).
- Use a simple architecture and provide a medium for interoperability acceptable to all.
- 2. Using the framework to solve scientific queries.
- Specifying queries transparently.
- Executing queries efficiently.
- WebServices Solution
- SkyQuery uses Web Services for interoperability.
- Use of Internet standards.
- Communication protocol: HTTP
- Message exchange model:
- Simple Object Access Protocol (SOAP)
- eXtensible Markup Language (XML) encoding.
- Service description: Web Services Description Language (WSDL)
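A minimal sketch of the message exchange model listed above: one operation call wrapped in a SOAP 1.1 envelope and encoded as XML, ready to be sent over HTTP. The service namespace `urn:example:skynode` and the operation and parameter names are hypothetical, chosen only for illustration; they are not taken from the SkyQuery WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:skynode"  # hypothetical service namespace (assumption)


def build_soap_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_soap_body(message):
    """Recover the operation name and parameters from an envelope."""
    root = ET.fromstring(message)
    call = root.find(f"{{{SOAP_NS}}}Body")[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params
```

In a real deployment the envelope layout would be dictated by the service's WSDL rather than hand-built; the sketch only shows what travels in the HTTP request body.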
3Architecture
[Architecture diagram; labels: WebServices, Registration, SkyQuery]
4Query Language
- Simple SQL, based on BumbleBee Lex/Yacc, using basic SQL syntax from a book.
- Added extension for Area (now also Poly, Chull in development).
- Added extension for XMATCH.
- (About 6 weeks' work)
- Looks like this:
SELECT o.objId, o.weight, o.color, t.lambda
FROM SDSS:PhotoObject o, TWOMASS:PhotoPrimary t, FIRST:PrimaryObject p
WHERE AREA(181.3,-0.76,6.5) AND XMATCH(o,t,p) < 3.5
AND type = GALAXY AND (o.I - t.m_j) > 2
- Try SkyQuery.Net
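A minimal sketch of what an AREA(ra, dec, radius) predicate could test, assuming it describes a circular region on the sky with the centre in degrees and the radius in arcminutes. The units and the function names are assumptions, not taken from the slides.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_area(ra, dec, center_ra, center_dec, radius_arcmin):
    """True if (ra, dec) lies within radius_arcmin of the centre (assumed units)."""
    separation_arcmin = angular_separation_deg(ra, dec, center_ra, center_dec) * 60.0
    return separation_arcmin <= radius_arcmin
```

For example, `in_area(181.3, -0.66, 181.3, -0.76, 6.5)` tests an object 0.1 degrees (6 arcminutes) north of the centre used in the query above.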
5XMatch
- Obtain counts of objects in each catalogue.
- Take the first catalogue (smallest).
- Select an object.
- Select all objects in the next catalogue within a generous radius of the object (HTM).
- Assume a position for the real body: effectively the weighted mean position of the objects in consideration.
- Then minimise the chi-square formula to work out the radius for a match.
- Evaluate which objects fall in the radius for a match.
- More info on http://www.skyquery.net/matching.htm
- Good for small areas now; optimization in the pipeline.
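The weighted-mean and chi-square steps above can be sketched as follows. This is an illustration under stated assumptions, not the SkyQuery implementation: each observation carries a positional error sigma in arcseconds, weights are 1/sigma^2, positions are handled as unit vectors, and a tuple is accepted when chi-square is below the square of the XMATCH tolerance. All names are hypothetical.

```python
import math

ARCSEC = math.radians(1.0 / 3600.0)  # one arcsecond in radians


def unit_vector(ra_deg, dec_deg):
    """Convert (ra, dec) in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def best_position_and_chi2(observations):
    """observations: list of (ra_deg, dec_deg, sigma_arcsec), one per catalogue.

    Returns the weighted mean direction and the chi-square at that position.
    """
    weights = [1.0 / (sigma * ARCSEC) ** 2 for _, _, sigma in observations]
    vectors = [unit_vector(ra, dec) for ra, dec, _ in observations]
    # Weighted sum of unit vectors; its direction minimises chi-square.
    summed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in summed))
    best = tuple(c / norm for c in summed)
    chi2 = sum(w * sum((vc - bc) ** 2 for vc, bc in zip(v, best))
               for w, v in zip(weights, vectors))
    return best, chi2


def is_match(chi2, tolerance):
    """Assumed acceptance rule: chi-square below tolerance squared."""
    return chi2 < tolerance ** 2
```

With two observations 0.2 arcseconds apart and sigma of 0.1 arcseconds each, every observation sits one sigma from the mean, so chi-square is close to 2 and the pair passes a tolerance of 3.5.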
6Query Execution
- Goal of distributed query execution
- Minimize processing costs.
- Minimize transmission costs.
- Performance queries.
- A SQL-like query to find an upper bound on the number of tuples.
- Query Execution Plan (QEP)
- An efficient plan constructed based on the results of the performance queries.
7Performance Queries
- Done first
- P1: SELECT count(*) FROM SDSS:Photo_Object O WHERE AREA(189.83,-0.52,8.5) AND O.type = 3
- P2: SELECT count(*) FROM TWOMASS:Photo_Primary T WHERE AREA(189.83,-0.52,8.5)
- P3: SELECT count(*) FROM FIRST:Primary_Object P WHERE AREA(189.83,-0.52,8.5)
- For the query:
SELECT o.objId, o.ra, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t, FIRST:PhotoPrimary p
WHERE XMATCH(o,t,!p) < 3.5 AND AREA(189.83,-0.52,8.5) AND o.type = 3
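A minimal sketch of how one performance query per archive could be derived: each count query keeps the shared AREA clause and only the predicates local to that archive, and drops the cross-match. The `ArchiveTerm` class, the function name, and the `archive:table` naming are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveTerm:
    """One entry of the FROM clause (hypothetical helper type)."""
    archive: str
    table: str
    alias: str
    local_predicates: list = field(default_factory=list)


def performance_query(term, area):
    """Build the count query sent to a single archive.

    area is (ra, dec, radius) exactly as written in the AREA clause.
    """
    ra, dec, radius = area
    conditions = [f"AREA({ra},{dec},{radius})"] + list(term.local_predicates)
    return (f"SELECT count(*) FROM {term.archive}:{term.table} {term.alias} "
            "WHERE " + " AND ".join(conditions))
```

Calling `performance_query(ArchiveTerm("TWOMASS", "Photo_Primary", "T"), (189.83, -0.52, 8.5))` yields a count query with the AREA clause as its only condition.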
8Query Execution Plan
- Say R3 < R2 < R1
- QEP
- T1: URL1, SELECT o.objId, o.ra, o.r, o.type, t.objId
  FROM SDSS:PhotoPrimary o
  WHERE AREA(189.83,-0.52,8.5) AND o.type = 3
  AND XMATCH(o,t,!p) < 3.5
- T2: URL2, SELECT o.objId, o.ra, o.r, o.type, t.objId
  FROM TWOMASS:PhotoPrimary t
  WHERE AREA(189.83,-0.52,8.5)
  AND XMATCH(o,t,!p) < 3.5
- T3: URL3, SELECT p.objid, p.cx, p.cy, p.cz
  FROM FIRST:PrimaryObject p
  WHERE AREA(189.83,-0.52,8.5)
- No XMATCH needed at the last archive.
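A minimal sketch of the plan ordering above, assuming R1, R2, R3 are the counts returned by the performance queries: archives are ordered by descending count, each node asks the next node for its partial result, and the last (smallest) archive answers without a cross-match. The callbacks `run_local` and `cross_match` are hypothetical stand-ins for the per-node query and matching step, and the drop-out form `!p` is not modelled.

```python
def build_plan(counts):
    """Order archives by descending count: largest first (T1), smallest last."""
    return sorted(counts, key=counts.get, reverse=True)


def execute_chain(plan, run_local, cross_match, position=0):
    """Run the plan as a chain of calls, one node calling the next.

    run_local(node) returns that archive's rows inside the AREA.
    cross_match(local_rows, partial) extends partial tuples with local rows.
    """
    node = plan[position]
    local_rows = run_local(node)
    if position == len(plan) - 1:
        # Last archive: nothing to match against yet, return 1-tuples.
        return [(row,) for row in local_rows]
    # Fetch the partial result from the rest of the chain, then match locally.
    partial = execute_chain(plan, run_local, cross_match, position + 1)
    return cross_match(local_rows, partial)
```

Under this ordering the smallest result set is the first to be transmitted, which is one way to keep transmission costs low.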
9Query Execution
[Figure: query execution flow. Labels: 1. Performance Queries; 2. QEP; SKYNODE 1, SKYNODE 2, SKYNODE 3; T2 result; T3 result]
10Lessons
- Tools now exist to develop federations fast.
- Adding new nodes is trivial.
- Web services work and are effective for building federations.
- POS skynode to come
- INT skynode added (McMahon)
- This was pre-VOTable; VOTable will be added.
- Formal package to be made: a shrink-wrapped package with a guide.
11What about IVO
- Two parts
- User level interface
- XML (easy to parse), GUI (easy to use), SQL (standard?)
- All of the above?
- Participant DB interfaces
- Support only simple queries
- More advanced: Xmatch, Region???
- WebService interface seems appropriate.
- Use Data Model for names of items (presumably)?