Title: DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS
Data Avalanche in Astronomy
Cross Matching: Alignment of Astronomy Catalogs
- Astronomy Sky Surveys (SDSS, 2MASS)
- Observe Galaxies, Quasars, Stars and Serendipity Objects
- Raw data from the telescope is pre-processed
- Hundreds of attributes for each object
- National Virtual Observatory: develop an information technology infrastructure that enables easy access to distributed astronomy catalogs
[Figure: Catalog P and Catalog Q cross-matched into The Matched Catalog]
Distributed PCA Algorithm
- Data matrix: Site A holds $A$ ($n \times p$), Site B holds $B$ ($n \times q$), with one row per object
- $p + q = m$ (total number of attributes)
- Normalize the data at the respective sites without any communication
- A central coordination site S sends A and B a random number generation seed
- A and B each generate the same $\ell \times n$ random matrix $R$ (elements of the random matrix are i.i.d. and chosen from any distribution with mean 0 and variance 1)
- A sends $RA$ and B sends $RB$ to S
- S computes $D = (RA)^T (RB) / \ell$
- $E[D] = E[A^T (R^T R) B] / \ell = A^T E[R^T R] B / \ell = A^T B$, since $E[R^T R] = \ell I_n$ when the entries of $R$ have mean 0 and variance 1 (Johnson-Lindenstrauss lemma)
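The protocol above can be simulated in a single process. The following is a minimal sketch in pure Python, not the authors' implementation. It rests on a few assumptions. The entries of R are Gaussian, although the slide allows any mean-0, variance-1 distribution. "Normalize" is taken to mean zero mean and unit variance per column. The data are synthetic. The function names `random_matrix`, `site_message` and `coordinator_estimate` are hypothetical. Both sites seed their generators identically, so they build the same R without exchanging it, and S only ever sees the projected matrices RA and RB.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def normalize(X):
    """Center each column and scale it to unit variance (local, no communication)."""
    n = len(X)
    cols = []
    for col in zip(*X):
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / std for v in col])
    return transpose(cols)


def random_matrix(seed, ell, n):
    """ell x n matrix with i.i.d. N(0, 1) entries (assumption: Gaussian)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(ell)]


def site_message(local_data, seed, ell):
    """At site A or B: rebuild R from the shared seed and return R @ local_data."""
    R = random_matrix(seed, ell, len(local_data))
    return matmul(R, local_data)


def coordinator_estimate(RA, RB, ell):
    """At S: D = (RA)^T (RB) / ell, an unbiased estimate of A^T B."""
    return [[v / ell for v in row] for row in matmul(transpose(RA), RB)]


# Synthetic example (made-up data): n objects, Site A has p = 2 attributes, Site B has q = 1.
data_rng = random.Random(0)
n, ell, seed = 200, 400, 42
x = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
A = normalize([[xi, data_rng.gauss(0.0, 1.0)] for xi in x])
B = normalize([[xi + 0.5 * data_rng.gauss(0.0, 1.0)] for xi in x])

D = coordinator_estimate(site_message(A, seed, ell), site_message(B, seed, ell), ell)
exact = matmul(transpose(A), B)  # computed here only to check the estimate
for i in range(2):
    print(f"attribute {i}: estimated {D[i][0] / n:.3f}, exact {exact[i][0] / n:.3f}")
```

The columns are normalized, so $A^T B / n$ is the matrix of cross-site correlations. The printout therefore compares estimated and exact correlations. The estimation error shrinks roughly as $1/\sqrt{\ell}$, which means $\ell$ trades communication cost against accuracy.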
The Fundamental Plane of Galaxies
[Figure: Mass / Luminosity / Radius]
Experimental Results
[Figure labels: Velocity Dispersion, Surface Brightness]
- Objective: finding correlations in high-dimensional spaces
- Domain knowledge: for the class of elliptical galaxies, observe the parameters Surface Brightness, log(Velocity Dispersion) and log(Radius)
- A 2D plane, called The Fundamental Plane, exists in the observed space of these parameters
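The domain knowledge above says that a 2D plane lies in the 3D space of surface brightness, log(velocity dispersion) and log(radius). PCA exposes such a plane: the smallest principal component carries almost no variance, and its direction is the plane's normal. Below is a minimal, centralized Python sketch on synthetic data. Several things are assumptions for illustration only. The plane coefficients (1.4, 0.3, -8.0) are made up, as are the parameter spreads and the noise level; none of them are measurements. The small Jacobi eigen-solver stands in for any symmetric eigen-decomposition routine.

```python
import math
import random


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def covariance(rows):
    """Population covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def jacobi_eigen(S, sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix,
    computed with cyclic Jacobi rotations."""
    d = len(S)
    a = [row[:] for row in S]
    V = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(a[p][q]) < 1e-15:
                    continue  # off-diagonal entry is already numerically zero
                # Rotation angle that zeroes entry (p, q) of J^T a J.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(d)] for i in range(d)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                a = matmul(transpose(J), matmul(a, J))
                V = matmul(V, J)
    return [a[i][i] for i in range(d)], V


# Synthetic "elliptical galaxies" lying near a plane (hypothetical coefficients).
rng = random.Random(1)
A_FP, B_FP, C_FP = 1.4, 0.3, -8.0
galaxies = []
for _ in range(500):
    mu = rng.gauss(20.0, 0.5)        # surface brightness
    log_sigma = rng.gauss(2.3, 0.1)  # log(velocity dispersion)
    log_r = A_FP * log_sigma + B_FP * mu + C_FP + rng.gauss(0.0, 0.01)  # log(radius)
    galaxies.append([mu, log_sigma, log_r])

eigvals, V = jacobi_eigen(covariance(galaxies))
k = min(range(3), key=lambda i: eigvals[i])  # index of the smallest-variance direction
normal = [V[i][k] for i in range(3)]         # column k of V is the plane normal
a_hat = -normal[1] / normal[2]
b_hat = -normal[0] / normal[2]
print("share of variance off the plane:", eigvals[k] / sum(eigvals))
print("recovered coefficients:", a_hat, b_hat)
```

Write the plane as log(radius) = a log(velocity dispersion) + b (surface brightness) + c. In the column order used here, its normal is proportional to (-b, -a, 1), and that is how `a_hat` and `b_hat` are read off. PCA minimizes perpendicular rather than vertical distances. The recovered coefficients are therefore close to the generating ones, but not identical.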
The Distributed Problem
[Figure: 2MASS (Mean Surface Brightness, Kmsb) and SDSS]
Build a Distributed Principal Component Analysis Algorithm
Assumptions:
1. Build the cross-matched table off-line.
2. Compute indices and send them to the sites.
The Virtual Table
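The two assumptions above suggest one way to realize the virtual table. The cross-matched table is computed once off-line, and each site then receives only a list of its own row indices. The following is a minimal Python sketch built on stated assumptions. The off-line match is taken as given, in the form of (row in Catalog P, row in Catalog Q) pairs. The catalogs, object labels, attribute values and the function names `site_indices` and `local_view` are all hypothetical. Once each site reorders its local rows, row i at Site A and row i at Site B describe the same object. This is the row-aligned split of attributes that the distributed PCA works on.

```python
# Hypothetical catalogs: each row is (object label, local attribute value).
# Labels exist only so the alignment can be checked; the values are made up.
catalog_p = [("g1", 18.2), ("g4", 17.9), ("g2", 19.4), ("g3", 20.1)]  # at Site A
catalog_q = [("g2", 2.3), ("g3", 2.5), ("g1", 2.1)]                   # at Site B

# Assumed output of the off-line cross-match: (row in P, row in Q) per matched object.
matches = [(0, 2), (2, 0), (3, 1)]


def site_indices(matches, side):
    """Index list sent to one site: side 0 -> Catalog P rows, side 1 -> Catalog Q rows."""
    return [pair[side] for pair in matches]


def local_view(rows, indices):
    """Run at a site: select and reorder local rows so row i is the i-th matched object."""
    return [rows[i] for i in indices]


A = local_view(catalog_p, site_indices(matches, 0))
B = local_view(catalog_q, site_indices(matches, 1))

# Row i of the virtual table is A[i] joined with B[i]; it is never built in one place.
for a_row, b_row in zip(A, B):
    print(a_row, b_row)
```

Unmatched objects, such as `g4` here, simply drop out of the view. Only index lists travel to the sites; the attribute values stay where they are.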
Work done by Haimonti Dutta, Chris Giannella, Kirk Borne, Ran Wolff and Hillol Kargupta
NSF Grants IIS-0329143, IIS-0093353, IIS-0203958 and NASA Grant NAS2-37143