Title: POMS Slides
Pattern Discovery Tools for Large Astronomical
Surveys
Tin Kam Ho
Bell Labs, Lucent Technologies
tkh_at_research.bell-labs.com
in collaboration with
David Wittman, J. Anthony Tyson (University of California, Davis)
Samuel Carliles, Wil O'Mullane, Alex Szalay (Johns Hopkins University)
Mirage web site: http://www.cs.bell-labs.com/who/tkh/mirage
VO interface: http://skyservice.pha.jhu.edu/develop/vo/mirage
Mirage (in public release since 2002) is a prototype analysis tool that supports pattern discovery across multi-typed data. It is a Java-based tool organized around a command interpreter that receives action commands from textual input or a graphical user interface. The action commands cover loading data, incremental import of new entries and new attributes, simple attribute manipulation, and activation of several embedded classification routines. The most important functions are built on simultaneous visualization of raw image data, extracted feature vectors, and classification results.
The graphical display presents a stack of canvas pages. Each page can be subdivided arbitrarily, via horizontal or vertical splits, into rectangular cells. Each cell can be loaded with any data view module via simple drag-and-drop operations. Each module provides its own control commands to manipulate its specific method of data presentation. In addition, all view modules implement the same Java interface, "ActivePanel", which contains the commands listed here. Coupled with view-specific operations, they support very powerful exploration operations:
- getSelected()
- clearSelected()
- highlightDataEntry()
- colorDataEntry()
- clearHighlights()
- clearColors()
- changeToMonochrome()
- changeToColor()
Early results from various uses of Mirage have been very encouraging. We plan to refine and generalize the ideas tried out in the software, working towards a more versatile tool that can support more advanced analysis of the large-scale imaging databases featured in next-generation astronomical surveys.
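To make the role of the shared "ActivePanel" interface concrete, here is a minimal sketch of how a selection made in one view could be echoed to the others. Only the method names getSelected(), clearSelected(), highlightDataEntry(), and clearHighlights() come from the text. Their parameter and return types, and the SimplePanel and SelectionBroadcaster classes, are assumptions made for illustration and are not taken from the Mirage source.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reduction of "ActivePanel": only the method names come from the
// text; the signatures (integer entry indices, Set return type) are assumptions.
interface ActivePanel {
    Set<Integer> getSelected();               // indices of entries selected in this view
    void clearSelected();
    void highlightDataEntry(int entryIndex);  // mark one entry as highlighted
    void clearHighlights();
}

// Minimal stand-in for a view module (e.g. a scatter plot); it draws nothing.
class SimplePanel implements ActivePanel {
    private final String name;
    private final Set<Integer> selected = new LinkedHashSet<>();
    private final Set<Integer> highlighted = new LinkedHashSet<>();

    SimplePanel(String name) { this.name = name; }

    void select(int entryIndex) { selected.add(entryIndex); } // simulates a user selection
    Set<Integer> highlighted() { return highlighted; }

    @Override public Set<Integer> getSelected() { return new LinkedHashSet<>(selected); }
    @Override public void clearSelected() { selected.clear(); }
    @Override public void highlightDataEntry(int entryIndex) { highlighted.add(entryIndex); }
    @Override public void clearHighlights() { highlighted.clear(); }
    @Override public String toString() { return name + " highlights " + highlighted; }
}

// Hypothetical coordinator: copies the selection of one panel into the
// highlights of every registered panel (including the source panel itself).
class SelectionBroadcaster {
    private final List<ActivePanel> panels = new ArrayList<>();

    void register(ActivePanel panel) { panels.add(panel); }

    void broadcastFrom(ActivePanel source) {
        Set<Integer> selection = source.getSelected();
        for (ActivePanel panel : panels) {
            panel.clearHighlights();
            for (int entry : selection) {
                panel.highlightDataEntry(entry);
            }
        }
    }
}

class LinkedViewsDemo {
    public static void main(String[] args) {
        SimplePanel scatter = new SimplePanel("scatter");
        SimplePanel histogram = new SimplePanel("histogram");
        SelectionBroadcaster broadcaster = new SelectionBroadcaster();
        broadcaster.register(scatter);
        broadcaster.register(histogram);

        scatter.select(3);
        scatter.select(7);
        broadcaster.broadcastFrom(scatter);
        System.out.println(histogram);
    }
}
```

Because every view exposes the same small set of commands, the coordinator needs no knowledge of how a given view draws its data, which is presumably what lets new view modules join the linked display without changes elsewhere.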
- Many large-scale sky surveys are generating data at a rate far beyond the reach of traditional manual analysis. This trend is accelerating: in the near future, the Large Synoptic Survey Telescope (LSST) (http://www.lsst.org/lsst_home.shtml) will repeatedly image the entire sky visible from its site, at multiple wavelengths, producing a time-tagged imaging database of 20 petabytes and a corresponding event catalog of 150 TB, with parameters of position, time, intensity, colors, and motion.
- Besides the much increased data volume, databases are no longer collected for a single well-defined purpose, with filters and detectors optimized for known features. Paradigm-shifting discoveries of unexpected events or correlations often result from open-ended explorations. This requires a tool that enables not only detection of the unexpected, but also rapid exploration and visualization of the new phenomenon to determine whether it is scientifically valuable or a previously unidentified systematic error.
Challenges for the Analysis Tool
- Versatile visualization utilities allowing many perspectives
  - Visualization can help verify the correctness of preprocessing steps, clean up undesirable artifacts, choose relevant samples, spot explicit patterns, select useful features, and suggest algorithms and models. To support all these needs, flexibility in the choice of perspectives is critical. Moreover, a connecting architecture is needed so that data relationships can be easily tracked between different views of the data.
- Support for exploratory discovery across diverse data types
  - Astronomical surveys contain multiple data types and incomparable groups of variables. Examples are images, spectra, light curves, and various scalar or vector parameters derived from the raw data. Relationships uncovered in each data type need to be correlated with those from others. This requires tools for modeling, building index structures, and navigating data distributions in each data type, and methods for tracking correlations between different navigation paths.
- Integration of manual and automatic pattern recognition methods
  - Human judgement needs to be part of the analysis loop to apply proper domain expertise. Automatic pattern recognition algorithms can process large data volumes efficiently, objectively, and consistently. They can also compensate for deficiencies in manual exploration due to unreliable human intuition or the inability to comprehend high-dimensional vectors. But "stand-alone" algorithms are not enough. A convenient bridge is needed to connect manual and automatic exploration tools. This includes support for rapid examination of different sampling options and feature choices, algorithmic alternatives and parameters, and facilities for checking the results for validity and interpretation, in contexts of different levels of abstraction from the raw data.
- And a good tool should
  - leverage existing visualization and analysis methods,
  - enable continued growth by addition of new visualization or analysis tools,
  - support interfacing with existing database access tools,
  - be scalable in data volume and processing speed.
Mirage features
- Data Visualization in Multiple, Linked Views
  - Show patterns in histograms, scatter plots, parallel coordinates, tables, images
- Selection and Tracking
  - Select points in any view, broadcast to all others with highlights or colors
- Systematic Traversal of Data Structures
  - Walk in histograms, cluster graphs or trees, echo in all other views
- Flexible Graphics Utilities
  - Open multiple-page plots easily with arbitrary configuration
- Command Scripts
  - Run prepared groups of operations as animations
- Remote Database Access
  - Retrieve data for analysis over WWW
  - VO data access via IVOA client package
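The "arbitrary configuration" of plot pages rests on the page model described earlier: a page is subdivided by horizontal or vertical splits into rectangular cells, each holding one view. One way to represent such a page is a binary tree whose leaves are the cells. The sketch below assumes that representation. The Cell class, its fields, and the convention that a "vertical split" produces left and right cells are all hypothetical and are not taken from Mirage itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one canvas page as a binary tree of rectangular cells.
// Class, field, and method names are illustrative, not Mirage's own.
class Cell {
    final double x, y, width, height; // rectangle in page units (whole page = 1 x 1)
    String view;                      // name of the view module loaded here, if any
    Cell first, second;               // the two children after a split; null for a leaf

    Cell(double x, double y, double width, double height) {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }

    boolean isLeaf() { return first == null; }

    // Assumed meaning: a vertical split draws a vertical divider (left and right cells).
    void splitVertical(double fraction) {
        checkSplit(fraction);
        first = new Cell(x, y, width * fraction, height);
        second = new Cell(x + width * fraction, y, width * (1 - fraction), height);
    }

    // Assumed meaning: a horizontal split draws a horizontal divider (top and bottom cells).
    void splitHorizontal(double fraction) {
        checkSplit(fraction);
        first = new Cell(x, y, width, height * fraction);
        second = new Cell(x, y + height * fraction, width, height * (1 - fraction));
    }

    private void checkSplit(double fraction) {
        if (!isLeaf()) throw new IllegalStateException("cell is already split");
        if (!(fraction > 0 && fraction < 1)) {
            throw new IllegalArgumentException("fraction must be in (0, 1)");
        }
    }

    // Collect the leaf cells depth-first, first child before second.
    List<Cell> leaves() {
        List<Cell> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(Cell cell, List<Cell> out) {
        if (cell.isLeaf()) { out.add(cell); return; }
        collect(cell.first, out);
        collect(cell.second, out);
    }
}

class PageLayoutDemo {
    public static void main(String[] args) {
        Cell page = new Cell(0, 0, 1, 1);
        page.splitVertical(0.5);          // left | right
        page.first.splitHorizontal(0.25); // left column: top / bottom
        page.first.first.view = "histogram";
        page.first.second.view = "scatter plot";
        page.second.view = "image";
        for (Cell c : page.leaves()) {
            System.out.println(c.view + " at (" + c.x + ", " + c.y + ") size "
                    + c.width + " x " + c.height);
        }
    }
}
```

Since every split replaces one leaf with two, any arrangement reachable by repeated horizontal and vertical splits can be expressed, and the leaf rectangles always tile the page exactly.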
Work in progress
- Images
  - FITS image panel with World Coordinates support using JSky package
  - Array of image panels with synchronized zooming and panning
  - Panel for overlay of multiple images and object markers
- Analysis
  - Connection to external libraries for automatic pattern recognition
  - Data structures for high-dimensional spaces
- Database
  - Join among different datasets on arbitrary common keys (e.g. RA, DEC)
  - Coupling with VO access methods
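As an illustration of the planned join among datasets on a common key, the following is a minimal sketch of an in-memory hash join. It assumes each dataset is a list of rows, with each row a map from attribute name to value, and it matches on exact equality of a single key. The KeyJoin class and the attribute names in the demo are hypothetical. Joining on positional keys such as RA and DEC would typically need matching within a tolerance rather than exact equality, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: inner join of two in-memory tables on one shared key attribute.
class KeyJoin {
    static List<Map<String, Object>> join(List<Map<String, Object>> left,
                                          List<Map<String, Object>> right,
                                          String key) {
        // Index the right table by key value so each left row is matched by lookup.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            Object value = row.get(key);
            if (value != null) {
                index.computeIfAbsent(value, unused -> new ArrayList<>()).add(row);
            }
        }
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Object value = row.get(key);
            if (value == null) continue;          // rows without the key never match
            List<Map<String, Object>> matches = index.get(value);
            if (matches == null) continue;        // inner join: unmatched rows are dropped
            for (Map<String, Object> match : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(row);
                // On attribute name collisions the left table's non-null value is kept.
                for (Map.Entry<String, Object> e : match.entrySet()) {
                    merged.putIfAbsent(e.getKey(), e.getValue());
                }
                joined.add(merged);
            }
        }
        return joined;
    }
}

class KeyJoinDemo {
    // Builds a row from alternating attribute names and values.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> photometry = new ArrayList<>();
        photometry.add(row("objId", 1, "mag", 20.5));
        photometry.add(row("objId", 2, "mag", 22.1));

        List<Map<String, Object>> shapes = new ArrayList<>();
        shapes.add(row("objId", 2, "ellipticity", 0.3));

        System.out.println(KeyJoin.join(photometry, shapes, "objId"));
    }
}
```

Indexing one table first keeps the cost roughly proportional to the combined number of rows, rather than comparing every pair of rows.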