The World-Wide Telescope Archetype. Jim Gray. Microsoft Research ... So, the Internet is the world's best telescope: It has data on every part of the sky ... – PowerPoint PPT presentation

1
Online Science -- The World-Wide Telescope
Archetype
  • Jim Gray
  • Microsoft Research
  • Collaborating with
  • Alex Szalay, Ani Thakar @ JHU
  • Roy Williams, George Djorgovski, Julian Bunn @
    Caltech
  • Robert Brunner @ U.I.

2
Outline
  • The revolution in Computational Science
  • The Virtual Observatory Concept
  • World-Wide Telescope

3
Computational Science: The Third Science Branch
is Evolving
  • In the beginning science was empirical.
  • Then theoretical branches evolved.
  • Now, we have computational branches.
  • Was primarily simulation
  • Growth areas: data analysis and visualization
    of peta-scale instrument data.
  • Help both simulation and instruments.
  • Are primitive today.

4
Computational Science
  • Traditional Empirical Science
  • Scientist gathers data by direct observation
  • Scientist analyzes data
  • Computational Science
  • Data captured by instruments, or data generated by
    simulator
  • Processed by software
  • Placed in a database / files
  • Scientist analyzes database / files

5
What Do Scientists Do With The Data? They Explore
Parameter Space
  • There is LOTS of data
  • people cannot examine most of it.
  • Need computers to do analysis.
  • Manual or Automatic Exploration
  • Manual: person suggests hypothesis, computer
    checks hypothesis
  • Automatic: computer suggests hypothesis, person
    evaluates significance
  • Given an arbitrary parameter space
  • Data Clusters
  • Points between Data Clusters
  • Isolated Data Clusters
  • Isolated Data Groups
  • Holes in Data Clusters
  • Isolated Points
  • Points / clusters similar to this one

Nichol et al. 2001. Slide courtesy of and adapted
from Robert Brunner @ Caltech.
6
Challenge to Data Miners: Rediscover Astronomy
  • Astronomy needs deep understanding of physics.
  • But, some was discovered as variable
    correlations, then explained with physics.
  • Famous example: Hertzsprung-Russell Diagram, star
    luminosity vs color (temperature)
  • Challenge 1 (the student test): How much of
    astronomy can data mining discover?
  • Challenge 2 (the Turing test): Can data mining
    discover NEW correlations?

7
What's needed? (not drawn to scale)
8
Some science is hitting a wall: FTP and GREP are
not adequate
  • You can GREP 1 MB in a second
  • You can GREP 1 GB in a minute
  • You can GREP 1 TB in 2 days
  • You can GREP 1 PB in 3 years.
  • Oh!, and 1 PB is about 3,000 disks
  • At some point you need indices to limit
    search, plus parallel data search and analysis
  • This is where databases can help
  • You can FTP 1 MB in 1 sec
  • You can FTP 1 GB / min (about $1/GB)
  • 1 TB takes 2 days and $1K
  • 1 PB takes 3 years and $1M
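
The scaling behind these figures is linear: scan time is data size divided by sequential throughput. The sketch below assumes a fixed rate of about 1 GB per minute (taken from the second bullet; the constant `RATE` and the helper names are illustrative, not from the talk). The slide's own TB and PB figures are rounder and somewhat more pessimistic than a strict linear extrapolation.

```python
# Rough scan-time arithmetic. RATE is an assumption: about 1 GB per minute.
RATE = 1e9 / 60  # bytes per second

SIZES = {"1 MB": 1e6, "1 GB": 1e9, "1 TB": 1e12, "1 PB": 1e15}


def scan_seconds(num_bytes: float, bytes_per_second: float = RATE) -> float:
    """Seconds needed to read num_bytes sequentially at a fixed rate."""
    return num_bytes / bytes_per_second


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    units = (("years", 365 * 86400), ("days", 86400),
             ("hours", 3600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name}: {humanize(scan_seconds(size))}")
```

Whatever rate is assumed, each factor of 1,000 in data adds a factor of 1,000 in time, which is why indices and parallel search become necessary.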

9
The Digital Shoebox
  • Personal
  • In the old days, people took photos, had them
    developed, put them in a shoe box
  • Some people actually put them in picture albums.
  • But mostly, pictures are never seen again and it is
    hard to find anything
  • Science
  • In the old days scientists kept notebooks.
  • Now they keep FTP servers
  • Some put them in indexed databases
  • But mostly, data are never seen again and it is
    hard to find anything.

How do we find data subsets in the shoebox?
10
Goal: Easy Data Publication and Access
  • Augment FTP with data query: return
    intelligent data subsets
  • Make it easy to
  • Publish: record structured data
  • Find
  • Find data anywhere in the network
  • Get the subset you need
  • Explore datasets interactively
  • Realistic goal
  • Make it as easy as publishing/reading web sites
    today.

11
Web Services: The Key?
  • Web SERVER
  • Given a URL + parameters
  • Returns a web page (often dynamic)
  • Web SERVICE
  • Given a URL + XML document (SOAP message)
  • Returns an XML document
  • Tools make this look like an RPC.
  • F(x,y,z) returns (u, v, w)
  • Distributed objects for the web.
  • naming, discovery, security,..
  • Internet-scale distributed computing

[Diagram: your program talks http to a Web Server and gets back a web page; your program talks soap to a Web Service and gets back an object in XML, which becomes data in your address space.]
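
The "looks like an RPC" idea can be sketched as follows. This is a minimal illustration, not a real SOAP toolkit: the service is a local stand-in function (`fake_service`) instead of a network endpoint, the method `F` and its arithmetic are invented for the example, and only the standard SOAP 1.1 envelope namespace is taken from the actual protocol.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope


def _envelope():
    """Create an empty SOAP envelope and return (envelope, body)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    return envelope, body


def build_request(method: str, **params) -> bytes:
    """Serialize a method call and its arguments as an XML document."""
    envelope, body = _envelope()
    call = ET.SubElement(body, method)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope)


def fake_service(request_xml: bytes) -> bytes:
    """Hypothetical service: XML document in, XML document out."""
    call = ET.fromstring(request_xml).find(f"{{{SOAP_NS}}}Body")[0]
    x, y, z = (float(call.find(n).text) for n in ("x", "y", "z"))
    envelope, body = _envelope()
    result = ET.SubElement(body, call.tag + "Response")
    for name, value in (("u", x + y), ("v", y + z), ("w", z + x)):
        ET.SubElement(result, name).text = repr(value)
    return ET.tostring(envelope)


def F(x, y, z):
    """Client stub: reads like a local call, but travels as XML."""
    response = fake_service(build_request("F", x=x, y=y, z=z))
    result = ET.fromstring(response).find(f"{{{SOAP_NS}}}Body")[0]
    return tuple(float(result.find(n).text) for n in ("u", "v", "w"))


if __name__ == "__main__":
    print(F(1, 2, 3))
```

In a real deployment the stub would be generated by tooling from the service description and `fake_service` would be an HTTP POST to the service URL; the caller's view, `F(x, y, z)` returning `(u, v, w)`, stays the same.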
12
Grid and Web Services Synergy
  • I believe the Grid will be many web services
  • IETF standards provide
  • Naming
  • Authorization / Security / Privacy
  • Distributed Objects
  • Discovery, Definition, Invocation, Object Model
  • Higher level services: workflow, transactions,
    DB,..
  • Synergy: commercial Internet and Grid tools

13
Outline
  • The revolution in Computational Science
  • The Virtual Observatory Concept
  • World-Wide Telescope

14
Data Federations of Web Services
  • Massive datasets live near their owners
  • Near the instrument's software pipeline
  • Near the applications
  • Near data knowledge and curation
  • Super Computer centers become Super Data Centers
  • Each Archive publishes a web service
  • Schema documents the data
  • Methods on objects (queries)
  • Scientists get personalized extracts
  • Federation: uniform access to multiple Archives
  • A common global schema

15
Why Astronomy Data?
  • It has no commercial value
  • No privacy concerns
  • Can freely share results with others
  • Great for experimenting with algorithms
  • It is real and well documented
  • High-dimensional data (with confidence intervals)
  • Spatial data
  • Temporal data
  • Many different instruments from many different
    places and many different times
  • Federation is a goal
  • The questions are interesting
  • How did the universe form?
  • There is a lot of it (petabytes)

16
Astronomy Data Growth
  • In the old days astronomers took photos.
  • Now instruments are digital (100s of GB/night)
  • Detectors are following Moore's law.
  • Data avalanche: doubles every 2 years
  • All data more than 2 years old is public
  • About 1 PB public now

[Figure: total area of the world's 3 m telescopes (m2) and total number of CCD pixels (megapixels) over time. Courtesy of Alex Szalay.]
Growth over 25 years is a factor of 30 in
glass, a factor of 3000 in pixels.
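
As a quick check, the growth factors in the caption can be converted to doubling times, assuming steady exponential growth over the 25 years (the function name is illustrative). A factor of 3000 works out to a doubling time of a little over two years, consistent with the "doubles every 2 years" bullet; the glass area doubles only about every five years.

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Doubling time implied by steady exponential growth.

    If a quantity grows by growth_factor over the given number of years,
    it doubles every years * ln(2) / ln(growth_factor) years.
    """
    return years * math.log(2) / math.log(growth_factor)


if __name__ == "__main__":
    print(f"CCD pixels: {doubling_time(3000, 25):.2f} years per doubling")
    print(f"Glass area: {doubling_time(30, 25):.2f} years per doubling")
```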
17
Time and Spectral Dimensions: The Multiwavelength
Crab Nebula
Szalay's variant of Metcalfe's Law: The utility
of N different data sets is approximately N^2/2.
Each pair of comparisons gives additional
information. The Federation value is superlinear
in size.
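
The N^2/2 figure is the large-N approximation to the number of distinct pairs, N(N-1)/2. A small sketch that enumerates the pairs (the band names are placeholders chosen for illustration):

```python
from itertools import combinations


def pairwise_comparisons(datasets):
    """All unordered pairs of data sets; there are N*(N-1)/2 of them."""
    return list(combinations(datasets, 2))


if __name__ == "__main__":
    bands = ["optical", "x-ray", "radio", "infrared"]  # placeholder names
    pairs = pairwise_comparisons(bands)
    n = len(bands)
    print(len(pairs), "pairs; N^2/2 =", n * n / 2)
```

Adding one more data set to a federation of N adds N new comparisons, which is where the superlinear value comes from.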
18
The Age of Mega-Surveys
  • Large number of new surveys
  • multi-TB in size, 100 million objects or more
  • Data publication an integral part of the survey
  • Software bill a major cost in the survey
  • These mega-surveys are different
  • top-down design
  • large sky coverage
  • sound statistical plans
  • well controlled/documented data processing
  • Each survey has a publication plan
  • Federating these archives
  • → Virtual Observatory

MACHO 2MASS DENIS SDSS PRIME DPOSS GSC-II COBE
MAP NVSS FIRST GALEX ROSAT OGLE LSST...
Slide courtesy of Alex Szalay, modified by Jim
19
Data Publishing and Access
  • But..
  • How do I get at that petabyte of public
    data?
  • Astronomers have a culture of publishing.
  • FITS files and many tools.
    http://fits.gsfc.nasa.gov/fits_home.html
  • Encouraged by NASA.
  • FTP what you need.
  • But, data details are hard to document.
    Astronomers want to do it, but it is VERY
    difficult. (What programs were used? What were
    the processing steps? How were errors treated?)
  • And by the way, few astronomers have a spare
    petabyte of storage in their pocket (today).
  • THESIS: Challenging problems are publishing
    data and providing good query and visualization tools

20
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
  • Premise: Most data is (or could be) online
  • So, the Internet is the world's best telescope
  • It has data on every part of the sky
  • In every measured spectral band: optical, x-ray,
    radio..
  • As deep as the best instruments (2 years ago).
  • It is up when you are up. The seeing is always
    great (no working at night, no clouds, no moons,
    no..).
  • It's a smart telescope: links objects and
    data to literature on them.

21
Sky Server
  • Alex Szalay of Johns Hopkins built SkyServer
    (based on TerraServer design)
    http://skyserver.sdss.org/
  • Data access and astronomy education
  • 7M web hits, usage growing 15%/month
  • Moving to V4 DB Schema (1.5 TB DB + 5 TB image
    by 7/1/2003)
  • Recent CS efforts have been
  • automated data pipeline (workflow engine) and
  • web services integration with VO
  • Template widely used and cloned in the Astronomy
    and Computer Science communities
  • Prototype for publishing an Astronomy archive on
    web.

22
Virtual Observatory Status
  • Lots of meetings (too many)
  • VO table defined (a successor to FITS?)
  • Tool suite emerging
  • Defining Astronomy Objects and Methods.
  • Federated 5 Web Services (fermilab/sdss,
    jhu/first, Caltech/dposs, Cambridge/nt)
  • http://skyquery.net/ multi-survey cross-ID match
    and select; distributed query optimization
  • http://SkyService.jhu.pha.edu/SdssCutout Image
    access service (cutout + annotated)
  • WWT is a great Web Services (.Net) application
  • Federating heterogeneous data sources.
  • Cooperating organizations
  • An Information At Your Fingertips challenge.

23
SkyQuery Web Services http://skyquery.net/
  • Basic Services
  • Metadata about resources
  • Waveband
  • Sky coverage
  • Translation of names to universal dictionary
    (UCD)
  • Simple search resources
  • Cone Search
  • Image mosaic
  • Unit conversions
  • Filtering, counting, histograms
  • On-the-fly recalibrations
  • Higher Level Services
  • Built on Atomic Services
  • Perform more complex tasks
  • Examples
  • Automated resource discovery
  • Cross-identifications
  • Photometric redshifts
  • Outlier detections
  • Visualization facilities
  • Goal
  • Build custom portals in days from existing
    building blocks (like today in IRAF or IDL)
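
To make one of the basic services concrete, here is a minimal sketch of a cone search: return every catalog object within a given angular radius of a sky position. The in-memory catalog, field names, and function names are assumptions for illustration; a real service would run this against an indexed database rather than scanning a list.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees: right ascension and declination.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius_deg):
    """Objects whose position lies within radius_deg of (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(obj["ra"], obj["dec"], ra, dec)
            <= radius_deg]


if __name__ == "__main__":
    # Made-up objects, for illustration only.
    catalog = [
        {"id": "a", "ra": 181.30, "dec": -0.76},
        {"id": "b", "ra": 181.35, "dec": -0.70},
        {"id": "c", "ra": 190.00, "dec": 10.00},
    ]
    hits = cone_search(catalog, ra=181.3, dec=-0.76, radius_deg=0.1)
    print([obj["id"] for obj in hits])
```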

24
SkyQuery Cross-id Steps http://skyquery.net/
  • Parse query
  • Get counts
  • Sort by counts
  • Make plan
  • Cross-match
  • Recursively, from small to large
  • Select necessary attributes only
  • Return output
  • Insert cutout image

SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t) < 3.5 AND AREA(181.3,-0.76,6.5)
  AND (o.i - t.m_j) > 2 AND o.type = 3
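
The plan described in the steps above (get counts, sort by counts, then cross-match from the smallest archive to the largest, keeping only the needed attributes) can be sketched as follows. This is a simplified illustration under stated assumptions: archives are in-memory lists, matching is a plain angular-distance cut against the object from the smallest archive, and the loop is iterative where the slide says recursive. It is not the matching algorithm SkyQuery itself uses, and all names and data here are hypothetical.

```python
import math


def separation_deg(a, b):
    """Great-circle separation in degrees between two objects (haversine)."""
    ra1, dec1, ra2, dec2 = map(
        math.radians, (a["ra"], a["dec"], b["ra"], b["dec"]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(archives, tolerance_deg):
    """archives maps archive name -> list of {'id', 'ra', 'dec'} objects."""
    # Get counts and sort: start from the archive with the fewest objects.
    plan = sorted(archives, key=lambda name: len(archives[name]))
    first = plan[0]
    partial = [{first: obj} for obj in archives[first]]
    # Extend each partial match one archive at a time, small to large.
    for name in plan[1:]:
        extended = []
        for match in partial:
            for obj in archives[name]:
                if separation_deg(match[first], obj) <= tolerance_deg:
                    extended.append({**match, name: obj})
        partial = extended
    # Select necessary attributes only: here, just the object ids.
    return [{name: obj["id"] for name, obj in m.items()} for m in partial]


if __name__ == "__main__":
    archives = {  # made-up positions, for illustration only
        "SDSS": [{"id": "s1", "ra": 181.3000, "dec": -0.7600},
                 {"id": "s2", "ra": 182.0000, "dec": -0.5000}],
        "TWOMASS": [{"id": "t1", "ra": 181.3005, "dec": -0.7602}],
    }
    print(cross_match(archives, tolerance_deg=3.5 / 3600))
```

Starting from the smallest archive keeps the intermediate result small, so each later archive is probed with as few candidates as possible.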
25
Summary
  • The revolution in Computational
    Science: simulation and analysis
  • The Virtual Observatory Concept
  • World-Wide Telescope
  • I finally found a distributed database
  • I have found a distributed system and a
    distributed object system.

26
References: NVO (Virtual Observatory), WWT (World-Wide
Telescope)
  • NVO Science Definition (an NSF report)
    http://www.nvosdt.org/
  • VO Forum website http://www.voforum.org/
  • World-Wide Telescope paper in Science, V.293, pp.
    2037-2038, 14 Sept 2001. (MS-TR-2001-77, Word or
    PDF.)