National Virtual Observatory - PowerPoint PPT Presentation

About This Presentation
Title:

National Virtual Observatory

Description:

National Virtual Observatory. Theory, Computation, and Data Exploration Panel of the AASC. Charles Alcock, Tom Prince, Alex Szalay

Slides: 41
Provided by: prin75
Learn more at: http://www.sdss.jhu.edu

Transcript and Presenter's Notes

Title: National Virtual Observatory


1
National Virtual Observatory
  • Theory, Computation, and Data Exploration Panel of
    the AASC
  • Charles Alcock, Tom Prince, Alex Szalay

2
The National Virtual Observatory
  • National
  • distributed in scope across institutions and
    agencies
  • available to all astronomers and the public
  • Virtual
  • not tied to a single brick-and-mortar location
  • supports astronomical observations and
    discoveries via remote access to digital
    representations of the sky
  • Observatory
  • general purpose
  • access to large areas of the sky at multiple
    wavelengths
  • supports a wide range of astronomical
    explorations
  • enables discovery via new computational tools

3
Why Now?
  • The past decade has witnessed
  • a thousand-fold increase in computer speed
  • a dramatic decrease in the cost of computing
    storage
  • a dramatic increase in access to broadly
    distributed data
  • large archives at multiple sites and high speed
    networks
  • significant increases in detector size and
    performance
  • These form the basis for science
  • of qualitatively different nature

4
Trends
  • Future dominated by detector improvements
  • Moore's Law growth in CCD capabilities
  • Gigapixel arrays on the horizon
  • Improvements in computing and storage will
    track growth in data volume
  • Investment in software is critical, and
    growing

Figure: total area of 3m telescopes in the world
(in m^2) and total number of CCD pixels (in
megapixels) as a function of time. Growth over 25
years is a factor of 30 in glass and 3000 in pixels.
5
The Discovery Process
Past: observations of small, carefully selected
samples of objects in a narrow wavelength band
Future: high quality, homogeneous
multi-wavelength data on millions of objects,
allowing us to
  • discover significant patterns
  • from the analysis of statistically rich and
    unbiased image/catalog databases
  • understand complex astrophysical systems
  • via confrontation between data and large
    numerical simulations

The discovery process will rely heavily on
advanced visualization and statistical analysis
tools
6
NVO Science: Discoveries
  • Discoveries of rare objects
  • Searches for exotic new sources: truly rare, at
    the level of 1 source in 10 million
  • Multi-wavelength identification of large
    statistical samples of previously rare objects
  • brown dwarfs, high-z quasars, ultra-luminous IR
    galaxies, etc.
  • Efficient cross-identification of unidentified
    sources from new surveys
  • Example: use radio, optical, and IR surveys to
    identify serendipitous Chandra X-ray sources
  • Selection of targets for spectroscopic
    follow-up

7
NVO Science: Statistical Surveys
  • Homogeneous samples of typical objects
  • Mega-surveys: sample size is no longer a problem
  • Statistical accuracy determined entirely by
    systematics
  • Multi-wavelength data enables accurate sample
    selection (evolution, rest-frame selection)
  • High Precision Astrophysics of Origins
  • Large scale structure of the universe
  • Galactic structure
  • Galaxy evolution
  • Active galaxies, galaxy clusters, ...
  • Stellar populations
  • Leading to New Astronomy

8
New Astronomy: Different!
  • Systematic Data Exploration
  • will have a central role in the New Astronomy
  • Digital Archives of the Sky
  • will be the main access to data
  • Data Avalanche
  • the flood of Terabytes of data is already
    happening, whether we like it or not!
  • Transition to the new
  • may be organized or chaotic

9
Ongoing Mega-Surveys
MACHO 2MASS SDSS DPOSS GSC-II COBE
MAP NVSS FIRST GALEX ROSAT OGLE, ...
  • Large number of new surveys
  • Multi-terabyte in size, 100 million objects or
    larger
  • Individual archives planned and under way
  • Multi-wavelength view of the sky
  • Coverage at more than 13 wavelengths within 5 years
  • Impressive early discoveries
  • Finding exotic objects by unusual colors
  • L,T dwarfs, high redshift quasars
  • Finding objects by time variability
  • gravitational micro-lensing

10
High Redshift Quasars
  • Several z > 5 QSOs discovered by SDSS in the early
    test data

11
Methane/T Dwarf
  • Discovery of several new objects by SDSS and 2MASS

12
DPOSS Discoveries

13
New Neighbor of the Milky Way
  • Finding new galaxies by spatial clustering of red
    objects
  • New galaxy is about 30 million light years away
  • Larger than most of the spiral galaxies in the
    Messier Catalogue
  • Clearly visible in the 2MASS infrared image
  • Expect to find 1000s of such galaxies with 2MASS

Optical
Infrared
14
The Observatories
  • NOAO/NRAO
  • 20% of the time on all its telescopes dedicated
    to major surveys using a wide range of telescope
    and instrumentation packages
  • The NASA Great Observatories
  • new opportunities for surveys,
  • combine mission-specific data with those from
    other missions and from the ground

15
HST Data Archive

16
Proposed Surveys
  • Next decade: new optimized survey systems
  • exploring new parameter space
  • Dark Matter Telescope (DMT)
  • map the distribution of matter for z < 1.5 from
    weak lensing, through deep, high quality images
    of galaxies
  • moving and variable objects through repetitive
    surveys
  • Spectroscopic Wide-Field Telescope (SWIFT)
  • evolution of galaxies from z = 4 to the present
    from star formation rates
  • determine chemical abundances and kinematics

17
The Road to the NVO
  • The environment to exploit these huge sky surveys
    does not exist today!
  • Moving 1 Terabyte at 10 Mbyte/s takes about 1 day
  • Expect 100s of intensive queries and 1000s of
    casual queries per day
  • Data will reside at multiple locations
  • Existing analysis tools do not scale to Terabyte
    data sets
  • Acute need in a few years,
  • it will not just happen

A new initiative is needed!
18
NVO: A New Initiative
  • A new initiative is needed
  • to ensure an evolutionary, cost-effective
    transition
  • to maximize the impact of large current and
    future efforts
  • to create the necessary new standards in the
    community
  • to develop the software tools needed
  • to ensure that the astronomical community has the
    proper network and hardware infrastructure to
    carry out its science
  • The National Virtual Observatory
  • can be the catalyst of the New Astronomy

19
The Goals of the NVO
  • Virtual observations of the sky in multiple
    wavelengths, by integrating all-sky Mega-surveys
  • Query the individual object catalogs and image
    databases thousands of times per day
  • Joint queries of the combined catalogs thousands
    of times per day
  • Enable discovery in these archives via new tools:
    novel visualization techniques, supervised and
    unsupervised learning, advanced classification
    techniques

20
NVO: The Challenges
  • Size of the archived data
  • 40,000 square degrees is 2 trillion pixels
  • One band: 4 Terabytes
  • Multi-wavelength: 10-100 Terabytes
  • Time dimension: a few Petabytes
  • The development of
  • new archival methods
  • new analysis tools
  • Hardware requirements
  • Training the next generation

21
Necessary Components
  • New archival methods
  • New analysis tools
  • New hardware requirements

22
New Archival Methods
  • Structure and manage multi-TB (and soon PB) data
    archives, distributed across the continent
  • Rapid and transparent access to image/catalog
    databases across all wavelengths, via intelligent
    query agents
  • Efficient query and data retrieval by more than
    10,000 scientists world-wide, with enhanced
    search operators (like spatial proximity)

23
Examples: non-local queries
  • Find all objects within 1' which have more than
    two neighbors with u-g, g-r, i-K colors within
    0.05 mag
  • Find all star-like objects within dm = 0.2 of the
    colors of a quasar at 5.5 < z < 6.5, using all colors
    in all available catalogs
  • Find galaxies that are blended with a star,
    output the deblended magnitudes
  • Provide a list of moving objects consistent with
    an asteroid, based on all the surveys, estimate
    possible orbit parameters
  • Find binary stars where at least one of them has
    the colors of a white dwarf, within the error
    boxes of hard x-ray sources

24
Examples: Today's I/O rates
  • Reading a 1 TB data set

    Data access                 Speed       Time (days)
    Fast database server        50 MB/s     0.23
    Local SCSI/Fast Ethernet    10 MB/s     1.2
    T1                          0.5 MB/s    23
    Typical good WWW            20 KB/s     580

  • Brute force is not enough: we need clever
    techniques
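
The times in this table follow directly from dividing the data volume by the sustained rate. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function and dictionary names are illustrative only:

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes, rate_bytes_per_s):
    """Days needed to read `size_bytes` at a sustained rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Rates as quoted on the slide, converted to bytes per second.
RATES = {
    "Fast database server": 50e6,
    "Local SCSI/Fast Ethernet": 10e6,
    "T1": 0.5e6,
    "Typical good WWW": 20e3,
}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:26s} {transfer_days(1e12, rate):8.2f} days")
```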

25
Geometric Indexing
Divide and Conquer
Partitioning
3 ? N ? M
Hierarchical Triangular Mesh
Split as k-d tree
Stored as r-tree of bounding boxes
Using regular indexing techniques
26
Sky coordinates
  • Stored as Cartesian coordinates projected onto
    a unit sphere
  • Longitude and latitude lines: intersections of
    planes and the sphere
  • Boolean combinations: query polyhedron
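
A short sketch of why unit vectors are convenient: each constraint becomes a plane test (a dot product compared with an offset), and a query region is a Boolean combination of such tests. The function names below are hypothetical, and only the AND of half-spaces is shown.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cone(ra_deg, dec_deg, radius_deg):
    """Half-space for a circle on the sky: p . center >= cos(radius)."""
    return radec_to_xyz(ra_deg, dec_deg), math.cos(math.radians(radius_deg))


def dec_above(dec_deg):
    """Half-space for a latitude cut: z >= sin(dec)."""
    return (0.0, 0.0, 1.0), math.sin(math.radians(dec_deg))


def in_region(point, halfspaces):
    """True if the point satisfies every (normal, offset) constraint."""
    return all(sum(p * n for p, n in zip(point, normal)) >= offset
               for normal, offset in halfspaces)


# Example region: within 5 degrees of (180, 30) AND at Dec >= 30.
REGION = [cone(180.0, 30.0, 5.0), dec_above(30.0)]
```

A point is tested with `in_region(radec_to_xyz(ra, dec), REGION)`; no trigonometry is needed per object once the catalog stores x, y, z.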
27
Sky Partitioning
Hierarchical Triangular Mesh - based on octahedron
28
Hierarchical Subdivision
  • Hierarchical subdivision of spherical triangles,
    represented as a quadtree
  • In SDSS the tree is 5 levels deep: 8192 triangles
  • In 2MASS the tree goes much deeper in the
    Galactic plane
  • One shoe fits all
  • This indexing is now adopted by SDSS, 2MASS, GSC2,
    POSS2, FIRST and is being considered by CDS,
    PLANCK and GAIA
  • New standard spatial index for astronomy!
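
The subdivision can be sketched as follows: start from the 8 faces of an octahedron, split each spherical triangle into 4 by joining the edge midpoints, and repeat. This is an illustration of the idea only; the face ordering and child numbering here are arbitrary choices and do not reproduce the naming scheme of any actual HTM library.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def midpoint(a, b):
    """Midpoint of the great-circle arc between two unit vectors."""
    return normalize(tuple(x + y for x, y in zip(a, b)))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def octahedron_faces():
    """The 8 root triangles, each oriented so that a . (b x c) > 0."""
    faces = []
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                a, b, c = (sx, 0, 0), (0, sy, 0), (0, 0, sz)
                if dot(a, cross(b, c)) < 0:
                    b, c = c, b
                faces.append((a, b, c))
    return faces


def contains(tri, p, eps=1e-12):
    """True if p lies on the inner side of all three edge planes."""
    a, b, c = tri
    return (dot(cross(a, b), p) >= -eps and
            dot(cross(b, c), p) >= -eps and
            dot(cross(c, a), p) >= -eps)


def children(tri):
    """Split one spherical triangle into four, keeping the orientation."""
    a, b, c = tri
    w0, w1, w2 = midpoint(b, c), midpoint(a, c), midpoint(a, b)
    return [(a, w2, w1), (b, w0, w2), (c, w1, w0), (w0, w1, w2)]


def locate(p, depth):
    """Quadtree path (root face index, then child indices) of the leaf holding p."""
    faces = octahedron_faces()
    index = next(i for i, f in enumerate(faces) if contains(f, p))
    path, tri = [index], faces[index]
    for _ in range(depth):
        kids = children(tri)
        k = next(i for i, t in enumerate(kids) if contains(t, p))
        path.append(k)
        tri = kids[k]
    return path, tri


def trixel_count(depth):
    """Number of leaf triangles after `depth` subdivisions."""
    return 8 * 4 ** depth
```

With 5 levels of subdivision, `trixel_count(5)` gives 8 x 4^5 = 8192, the figure quoted for SDSS. The path returned by `locate` can serve as an integer key, so a spatial search reduces to range lookups with regular indexing techniques.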
29
Result of the Query
30
New Analysis Tools
  • Discover new patterns through advanced
    statistical methods and visualization techniques
  • Confront catalogs and image databases with
    numerical simulations of astrophysical systems
  • Collaborative exploration of multi-wavelength
    databases by multiple groups working at remote
    sites

31
New Hardware Requirements
  • Large distributed database engines with Gbyte/s
    aggregate I/O speed
  • High speed (> 10 Gbits/s) backbones
    cross-connecting the major archives
  • Scalable computing environment with hundreds of
    CPUs for statistical analysis and discovery

32
What is the NVO? - Content
33
What is the NVO? - Components

34
Conceptual Architecture
User
Discovery tools
Analysis tools
Gateway
Data Archives
35
The Flavor/Role of the NVO
  • Highly Distributed and Decentralized
  • Multiple phases, built on top of one another
  • Establish standards, meta-data formats
  • Integrate main catalogs
  • Develop initial querying tools
  • Develop collaboration requirements, establish
    procedure to import new catalogs
  • Develop distributed analysis environment
  • Develop advanced visualization tools
  • Develop advanced querying tools

36
NVO Development Functions
  • Software development
  • query generation/optimization, software agents,
    user interfaces, discovery tools, visualization
    tools
  • Standards development
  • Meta-data, meta-services, streaming formats,
    object relationships, object attributes,...
  • Infrastructure development
  • archival storage systems, query engines, compute
    servers, high speed connections of main centers
  • Train the Next Generation
  • train scientists equally at home in astronomy and
    modern computer science, statistics,
    visualization

37
The Mission of the NVO
  • The National Virtual Observatory should
  • provide seamless integration of the digitally
    represented multi-wavelength sky
  • enable efficient simultaneous access to
    multi-Terabyte to Petabyte databases
  • develop and maintain tools to find patterns and
    discoveries contained within the large databases
  • develop and maintain tools to confront data with
    sophisticated numerical simulations

38
NVO Funding
  • The NVO is ideal for multi-agency and IT funding
  • relevant for all areas of astronomy and space
    science
  • excellent match to goals of the IT2 initiative
  • but core funding must come from NASA and NSF
  • needs serious involvement of computer scientists
  • Scope
  • approximately $25M for the first 5 years, could
    be larger in the second half
  • Requires long term commitment
  • development/deployment (5 + 5 years)
  • Needs to start soon
  • data avalanche has already begun

An effort for the whole astronomy and
astrophysics community!
39
(No Transcript)
40
NVO Layers
Three layers built on top of one another, tied
together with standards
  • Discovery tools
  • Visualization
  • Advanced classification methods
  • Supervised/unsupervised learning
  • Data mining
  • Standards
  • Meta-data
  • Interfaces between archives
  • Cross-identification standards
  • Archive-tool interfaces
  • Basic analysis tools
  • Query capabilities
  • Statistical tools
  • Ability to run user code (API)
  • Browsing tools
  • Archives
  • Data content
  • Interconnections
  • Cross identifications
  • Services