1 Online Science -- The World-Wide Telescope Archetype
- Jim Gray
- Microsoft Research
- Collaborating with
- Alex Szalay, Ani Thakar @ JHU
- Roy Williams, George Djorgovski, Julian Bunn @ Caltech
- Robert Brunner @ U.I.
2 Outline
- The revolution in Computational Science
- The Virtual Observatory Concept
- World-Wide Telescope
3 Computational Science: The Third Science Branch is Evolving
- In the beginning science was empirical.
- Then theoretical branches evolved.
- Now, we have computational branches.
- Was primarily simulation
- Growth areas: data analysis and visualization of peta-scale instrument data.
- These tools help both simulation and instruments.
- They are primitive today.
4 Computational Science
- Traditional Empirical Science
- Scientist gathers data by direct observation
- Scientist analyzes data
- Computational Science
- Data captured by instruments, or data generated by simulator
- Processed by software
- Placed in a database / files
- Scientist analyzes database / files
5 What Do Scientists Do With The Data? They Explore Parameter Space
- There is LOTS of data
- People cannot examine most of it.
- Need computers to do analysis.
- Manual or Automatic Exploration
- Manual: person suggests hypothesis, computer checks hypothesis
- Automatic: computer suggests hypothesis, person evaluates significance
- Given an arbitrary parameter space:
- Data Clusters
- Points between Data Clusters
- Isolated Data Clusters
- Isolated Data Groups
- Holes in Data Clusters
- Isolated Points
- Points / clusters similar to this one
Nichol et al. 2001. Slide courtesy of and adapted from Robert Brunner @ Caltech.
6 Challenge to Data Miners: Rediscover Astronomy
- Astronomy needs deep understanding of physics.
- But some of it was discovered as correlations between variables, then explained with physics.
- Famous example: Hertzsprung-Russell Diagram, star luminosity vs color (temperature)
- Challenge 1 (the student test): How much of astronomy can data mining discover?
- Challenge 2 (the Turing test): Can data mining discover NEW correlations?
7 What's needed? (not drawn to scale)
8 Some science is hitting a wall: FTP and GREP are not adequate
- You can GREP 1 MB in a second
- You can GREP 1 GB in a minute
- You can GREP 1 TB in 2 days
- You can GREP 1 PB in 3 years.
- Oh!, and 1 PB is about 3,000 disks
- At some point you need indices to limit search, plus parallel data search and analysis
- This is where databases can help
- You can FTP 1 MB in 1 sec
- You can FTP 1 GB / min (about $1/GB)
- 1 TB takes 2 days and $1K
- 1 PB takes 3 years and $1M
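The scan times on this slide are bandwidth arithmetic. A minimal sketch, assuming a sustained sequential read rate of 10 MB/s and decimal units (both are illustrative assumptions, the slide gives only the rounded times):

```python
# Back-of-envelope time for one sequential, GREP-style pass over a data set.
# ASSUMPTION: 10 MB/s sustained read rate; this figure is not from the slide.

ASSUMED_RATE_BYTES_PER_SEC = 10 * 10**6  # hypothetical sustained read rate

SIZES = {"1 MB": 10**6, "1 GB": 10**9, "1 TB": 10**12, "1 PB": 10**15}


def scan_seconds(size_bytes, rate=ASSUMED_RATE_BYTES_PER_SEC):
    """Seconds needed to read size_bytes once, front to back, at the given rate."""
    return size_bytes / rate


def humanize(seconds):
    """Render a duration in the largest unit that fits (years, days, minutes)."""
    for unit, length in (("years", 365 * 86400), ("days", 86400), ("minutes", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for label, size in SIZES.items():
        print(f"{label}: {humanize(scan_seconds(size))}")
```

Under this assumed rate the terabyte and petabyte cases land in the same range as the slide's "2 days" and "3 years". An index avoids the full pass, and N parallel readers divide the time by roughly N.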
9 The Digital Shoebox
- Personal
- In the old days people took photos, had them developed, put them in a shoe box
- Some people actually put them in picture albums.
- But mostly, pictures are never seen again; it is hard to find anything
- Science
- In the old days scientists kept notebooks.
- Now they keep ftp servers
- Some put them in indexed databases
- But mostly, data are never seen again and it is
hard to find anything.
How do we find data subsets in the shoebox?
10 Goal: Easy Data Publication and Access
- Augment FTP with data query: return intelligent data subsets
- Make it easy to
- Publish: record structured data
- Find:
- Find data anywhere in the network
- Get the subset you need
- Explore datasets interactively
- Realistic goal:
- Make it as easy as publishing/reading web sites today.
11 Web Services: The Key?
- Web SERVER
- Given a url and parameters
- Returns a web page (often dynamic)
- Web SERVICE
- Given a url and an XML document (soap msg)
- Returns an XML document
- Tools make this look like an RPC.
- F(x,y,z) returns (u, v, w)
- Distributed objects for the web.
- naming, discovery, security,..
- Internet-scale distributed computing
[Figure: with a Web Server, your program sends http and gets back a web page. With a Web Service, your program sends soap and gets back an object in xml, which becomes data in your address space.]
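The "F(x,y,z) returns (u, v, w)" view of a web service can be sketched without any network: the call is wrapped in an XML envelope and the reply is unwrapped into ordinary values. This is only an illustration using Python's standard library. The application namespace, the operation name `F`, and the `FResponse` reply element are hypothetical, and real toolkits generate this plumbing for you.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:demo"  # hypothetical application namespace


def build_request(operation, **params):
    """Wrap a call such as F(x, y, z) in a SOAP envelope and return it as XML text."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Unwrap the reply envelope into a plain dict of result names to text values."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    result = list(body)[0]  # the first child of Body is the response element
    return {child.tag.split("}")[1]: child.text for child in result}


if __name__ == "__main__":
    print(build_request("F", x=1, y=2, z=3))
    # A hand-written reply standing in for what a service might send back.
    reply = (
        f'<s:Envelope xmlns:s="{SOAP_NS}" xmlns:a="{APP_NS}">'
        "<s:Body><a:FResponse><a:u>4</a:u><a:v>5</a:v><a:w>6</a:w>"
        "</a:FResponse></s:Body></s:Envelope>"
    )
    print(parse_response(reply))
```

The caller only sees a function that takes arguments and returns values, which is what makes the exchange look like an RPC.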
12 Grid and Web Services Synergy
- I believe the Grid will be many web services
- IETF standards Provide
- Naming
- Authorization / Security / Privacy
- Distributed Objects
- Discovery, Definition, Invocation, Object Model
- Higher level services: workflow, transactions, DB, ..
- Synergy: commercial Internet and Grid tools
13 Outline
- The revolution in Computational Science
- The Virtual Observatory Concept
- World-Wide Telescope
14 Data Federations of Web Services
- Massive datasets live near their owners
- Near the instruments software pipeline
- Near the applications
- Near data knowledge and curation
- Super Computer centers become Super Data Centers
- Each Archive publishes a web service
- Schema documents the data
- Methods on objects (queries)
- Scientists get personalized extracts
- Federation: uniform access to multiple Archives
- A common global schema
15 Why Astronomy Data?
- It has no commercial value
- No privacy concerns
- Can freely share results with others
- Great for experimenting with algorithms
- It is real and well documented
- High-dimensional data (with confidence intervals)
- Spatial data
- Temporal data
- Many different instruments from many different places and many different times
- Federation is a goal
- The questions are interesting
- How did the universe form?
- There is a lot of it (petabytes)
16 Astronomy Data Growth
- In the old days astronomers took photos.
- Now instruments are digital (100s of GB/night)
- Detectors are following Moore's law.
- Data avalanche: doubles every 2 years
- All data more than 2 years old is public
- About 1 PB public now
[Figure, courtesy of Alex Szalay: total area of the world's 3m telescopes (m^2) and total number of CCD pixels (megapixels).]
Growth over 25 years is a factor of 30 in glass, a factor of 3000 in pixels.
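The two growth factors can be converted into doubling times. The sketch below assumes steady exponential growth over the 25 years, which is a simplification and not something the slide states:

```python
import math


def doubling_time_years(growth_factor, span_years):
    """Years per doubling, assuming steady exponential growth
    by growth_factor over span_years."""
    return span_years * math.log(2) / math.log(growth_factor)


if __name__ == "__main__":
    for label, factor in (("glass", 30), ("pixels", 3000)):
        print(f"{label}: doubles about every "
              f"{doubling_time_years(factor, 25):.1f} years")
```

Under that assumption the pixel count doubles roughly every two years, in line with the "doubles every 2 years" bullet, while telescope glass doubles roughly every five.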
17 Time and Spectral Dimensions: The Multiwavelength Crab Nebula
Szalay's variant of Metcalfe's Law: The utility of N different data sets is approximately N^2/2. Each pairwise comparison gives additional information. The Federation value is superlinear in size.
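The N^2/2 figure is the number of distinct pairs, N(N-1)/2, rounded for large N. A small sketch that enumerates the pairs; the five survey names are only an example list borrowed from the mega-survey slide:

```python
from itertools import combinations

# Example list only; any collection of distinct data sets works.
SURVEYS = ["SDSS", "2MASS", "FIRST", "ROSAT", "GALEX"]


def pairwise_comparisons(datasets):
    """Every unordered pair of distinct data sets: N * (N - 1) / 2 of them."""
    return list(combinations(datasets, 2))


if __name__ == "__main__":
    n = len(SURVEYS)
    pairs = pairwise_comparisons(SURVEYS)
    print(f"N = {n}: {len(pairs)} pairs, N^2/2 = {n * n / 2}")
```

For small N the approximation overshoots, but the gap shrinks quickly: 100 data sets give 4950 pairs against N^2/2 = 5000.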
18 The Age of Mega-Surveys
- Large number of new surveys
- multi-TB in size, 100 million objects or more
- Data publication an integral part of the survey
- Software bill a major cost in the survey
- These mega-surveys are different
- top-down design
- large sky coverage
- sound statistical plans
- well controlled/documented data processing
- Each survey has a publication plan
- Federating these archives
- => Virtual Observatory
MACHO 2MASS DENIS SDSS PRIME DPOSS GSC-II COBE
MAP NVSS FIRST GALEX ROSAT OGLE LSST...
Slide courtesy of Alex Szalay, modified by Jim
19 Data Publishing and Access
- But..
- How do I get at that petabyte of public data?
- Astronomers have a culture of publishing.
- FITS files and many tools. http://fits.gsfc.nasa.gov/fits_home.html
- Encouraged by NASA.
- FTP what you need.
- But, data details are hard to document. Astronomers want to do it, but it is VERY difficult. (What programs were used? What were the processing steps? How were errors treated?)
- And by the way, few astronomers have a spare petabyte of storage in their pocket (today).
- THESIS: Challenging problems are publishing data and providing good query and visualization tools
20 Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
- Premise: Most data is (or could be) online
- So, the Internet is the world's best telescope:
- It has data on every part of the sky
- In every measured spectral band: optical, x-ray, radio..
- As deep as the best instruments (2 years ago).
- It is up when you are up. The seeing is always great (no working at night, no clouds, no moons, no..).
- It's a smart telescope: links objects and data to literature on them.
21 Sky Server
- Alex Szalay of Johns Hopkins built SkyServer (based on TerraServer design) http://skyserver.sdss.org/
- Data access and Astronomy education
- 7M web hits, usage growing 15%/month
- Moving to V4 DB Schema (1.5 TB DB, 5 TB image by 7/1/2003)
- Recent CS efforts have been
- automated data pipeline (workflow engine) and
- web services integration with VO
- Template widely used and cloned in the Astronomy and Computer Science communities
- Prototype for publishing an Astronomy archive on the web.
22 Virtual Observatory Status
- Lots of meetings (too many)
- VO table defined (a successor to FITS?)
- Tool suite emerging
- Defining Astronomy Objects and Methods.
- Federated 5 Web Services (fermilab/sdss, jhu/first, Cal Tech/dposs, Cambridge/nt)
- http://skyquery.net/ multi-survey crossID match and select; distributed query optimization
- http://SkyService.jhu.pha.edu/SdssCutout Image access service (cutout annotated)
- WWT is a great Web Services (.Net) application
- Federating heterogeneous data sources.
- Cooperating organizations
- An Information At Your Fingertips challenge.
23 SkyQuery Web Services http://skyquery.net/
- Basic Services
- Metadata about resources
- Waveband
- Sky coverage
- Translation of names to universal dictionary (UCD)
- Simple search resources
- Cone Search
- Image mosaic
- Unit conversions
- Filtering, counting, histograms
- On-the-fly recalibrations
- Higher Level Services
- Built on Atomic Services
- Perform more complex tasks
- Examples
- Automated resource discovery
- Cross-identifications
- Photometric redshifts
- Outlier detections
- Visualization facilities
- Goal
- Build custom portals in days from existing
building blocks (like today in IRAF or IDL)
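As an illustration of one basic service, a cone search returns the objects within a given angular radius of a sky position. The sketch below is a brute-force version over an in-memory list. The function names, the row layout, and the catalog rows are made up for the example, and this is not the SkyQuery implementation.

```python
import math

# Made-up catalog rows for illustration; positions are in degrees.
SAMPLE_CATALOG = [
    {"id": "a", "ra": 181.30, "dec": -0.76},
    {"id": "b", "ra": 181.35, "dec": -0.80},
    {"id": "c", "ra": 190.00, "dec": 5.00},
]


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the rows lying within radius_deg of the position (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


if __name__ == "__main__":
    hits = cone_search(SAMPLE_CATALOG, 181.3, -0.76, 0.1)
    print([row["id"] for row in hits])
```

A real archive would answer this from a spatial index instead of scanning every row, which is the point made earlier about indices.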
24 SkyQuery Cross-id Steps http://skyquery.net/
- Parse query
- Get counts
- Sort by counts
- Make plan
- Cross-match
- Recursively, from small to large
- Select necessary attributes only
- Return output
- Insert cutout image
SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5)
AND (o.i - t.m_j)>2 AND o.type=3
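The count, sort, plan, and recursive cross-match steps above can be sketched in a few lines. This is a toy version under stated assumptions: archives are in-memory lists, the sample rows are made up, and a crude box test on ra/dec stands in for the real XMATCH criterion. All function names are hypothetical.

```python
# Made-up archive contents for illustration; positions are in degrees.
SAMPLE_ARCHIVES = {
    "SDSS": [
        {"id": "s1", "ra": 181.3001, "dec": -0.7601},
        {"id": "s2", "ra": 185.0000, "dec": 1.0000},
        {"id": "s3", "ra": 182.0002, "dec": 0.5001},
    ],
    "TWOMASS": [
        {"id": "t1", "ra": 181.3000, "dec": -0.7600},
        {"id": "t2", "ra": 182.0000, "dec": 0.5000},
    ],
}


def get_counts(archives):
    """Step 'get counts': ask each archive how many candidate objects it holds."""
    return {name: len(rows) for name, rows in archives.items()}


def make_plan(counts):
    """Steps 'sort by counts' and 'make plan': visit the smallest archive first."""
    return sorted(counts, key=counts.get)


def is_close(a, b, tolerance_deg):
    """ASSUMPTION: a simple box test standing in for the real match criterion."""
    return (abs(a["ra"] - b["ra"]) <= tolerance_deg
            and abs(a["dec"] - b["dec"]) <= tolerance_deg)


def cross_match(plan, archives, tolerance_deg):
    """Recursive cross-match from small to large: match the shorter plan first,
    then extend each partial tuple with counterparts from the next archive."""
    if len(plan) == 1:
        return [(row,) for row in archives[plan[0]]]
    partial = cross_match(plan[:-1], archives, tolerance_deg)
    return [tup + (cand,)
            for tup in partial
            for cand in archives[plan[-1]]
            if is_close(tup[0], cand, tolerance_deg)]


if __name__ == "__main__":
    plan = make_plan(get_counts(SAMPLE_ARCHIVES))
    for tup in cross_match(plan, SAMPLE_ARCHIVES, 0.001):
        print([row["id"] for row in tup])
```

Starting from the smallest archive keeps the partial result small, so less data has to be passed along to each larger archive.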
25 Summary
- The revolution in Computational Science: simulation and analysis
- The Virtual Observatory Concept
- World-Wide Telescope
- I finally found a distributed database
- I have found a distributed system and a
distributed object system.
26 References: NVO (Virtual Observatory), WWT (World-Wide Telescope)
- NVO Science Definition (an NSF report) http://www.nvosdt.org/
- VO Forum website http://www.voforum.org/
- World-Wide Telescope paper in Science V.293 pp. 2037-2038. 14 Sept 2001. (MS-TR-2001-77 word or pdf.)