DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS

Provided by: Haim154
Data Avalanche in Astronomy
  • Astronomy Sky Surveys (SDSS, 2MASS)
  • Observes galaxies, quasars, stars and
    serendipity objects
  • Raw data from the telescope is pre-processed
  • Hundreds of attributes for each object
  • National Virtual Observatory: develop an
    information technology infrastructure for
    enabling easy access to distributed astronomy
    catalogs

Cross Matching: Alignment of Astronomy Catalogs

Catalog P
Tuple ID   Join Attribute (X)   A
P1         X1                   A1
P2         X2                   A2
P3         X3                   A2

Catalog Q
Tuple ID   Join Attribute (X)   B
Q1         X3                   B1
Q2         X2                   B2
Q3         X2                   B4
Q4         X1                   B3

The Matched Catalog
Join Attribute (X)   A    B
X1                   A1   B3
X2                   A2   B2
X2                   A2   B4
X3                   A2   B1
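
The cross-match above behaves like an equi-join on the join attribute X. A minimal sketch of that join, using the toy tuples from Catalog P and Catalog Q, might look as follows. The function name `cross_match` and the tuple layout are assumptions made for illustration; real catalog cross-matching typically compares sky positions within a tolerance rather than testing exact key equality.

```python
from collections import defaultdict

# Toy catalogs copied from the tables above: (tuple ID, join attribute X, attribute).
catalog_p = [("P1", "X1", "A1"), ("P2", "X2", "A2"), ("P3", "X3", "A2")]
catalog_q = [("Q1", "X3", "B1"), ("Q2", "X2", "B2"),
             ("Q3", "X2", "B4"), ("Q4", "X1", "B3")]


def cross_match(p_rows, q_rows):
    """Hash join on X (hypothetical helper): index Q by X, then probe with P."""
    q_index = defaultdict(list)
    for _tuple_id, x, b in q_rows:
        q_index[x].append(b)

    matched = []
    for _tuple_id, x, a in p_rows:
        # One output row per matching Q tuple; X2 matches twice.
        for b in q_index.get(x, []):
            matched.append((x, a, b))
    return matched


matched = cross_match(catalog_p, catalog_q)
for row in matched:
    print(*row)
```

Run on the toy data, this yields the four rows of the matched catalog, with X2 appearing twice because two tuples in Catalog Q share that key.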
Distributed PCA Algorithm
  1. Data matrices: Site A holds A (n × p), Site B
    holds B (n × q)
  2. p + q = m (total number of attributes)
  3. Normalize the data at the respective sites
    without any communication
  4. A central coordination site S sends A and B a
    random number generation seed
  5. A and B generate the same l × n random matrix R
    (elements of the random matrix are i.i.d. and
    chosen from any distribution with mean 0 and
    variance 1)
  6. A sends RA and B sends RB to S
  7. S computes D = (RA)^T (RB) / l
  8. E[D] = E[A^T (R^T R) B] / l = A^T E[R^T R] B / l
    = A^T B, since E[R^T R] = l I
    (Johnson-Lindenstrauss lemma)
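
The steps above can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not the authors' implementation: both sites run in one process, "normalize" is taken to mean centering each column and scaling it to unit norm, R is drawn from a standard normal distribution, and all function names and the synthetic data are hypothetical. It uses only the standard library; a numerical library would be the more usual choice for larger matrices.

```python
import math
import random


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]


def normalize_columns(X):
    """Step 3 (assumed form): center each column, scale to unit norm, locally."""
    out = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centered))
        out.append([v / norm for v in centered])
    return transpose(out)


def random_matrix(seed, l, n):
    """Step 5: an l x n matrix of i.i.d. N(0, 1) entries, reproducible from the seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(l)]


def site_projection(X, seed, l):
    """Step 6: what a site sends to S, namely R X (l rows instead of n)."""
    return matmul(random_matrix(seed, l, len(X)), X)


def estimate_cross_product(RA, RB):
    """Step 7: D = (RA)^T (RB) / l, an estimate of A^T B."""
    l = len(RA)
    return [[v / l for v in row] for row in matmul(transpose(RA), RB)]


# Synthetic data (made up): B's single column is strongly correlated with A's first.
data_rng = random.Random(0)
n, l, seed = 40, 2000, 12345
t = [data_rng.gauss(0, 1) for _ in range(n)]
A = normalize_columns([[ti, data_rng.gauss(0, 1)] for ti in t])          # n x 2
B = normalize_columns([[2 * ti + 0.1 * data_rng.gauss(0, 1)] for ti in t])  # n x 1

D = estimate_cross_product(site_projection(A, seed, l), site_projection(B, seed, l))
exact = matmul(transpose(A), B)
max_err = max(abs(D[i][j] - exact[i][j]) for i in range(2) for j in range(1))
print("estimate:", D, "exact:", exact, "max error:", max_err)
```

Because both sites seed their generators identically, they build the same R without ever exchanging it. The estimate is unbiased, and its error shrinks as l grows; here l is chosen larger than n purely to make the demonstration accurate, whereas a communication saving requires l smaller than n.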

The Fundamental Plane of Galaxies
Mass / Luminosity / Radius
Experimental Results
[Figure: Velocity Dispersion, Surface Brightness]
  1. Objective: finding correlations in
    high-dimensional spaces
  2. Domain knowledge: for the class of elliptical
    galaxies, observe the parameters Surface
    Brightness, Log (Velocity Dispersion), Log
    (Radius)
  3. A 2D plane, called the Fundamental Plane,
    exists in the observed space of parameters
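
To illustrate why principal component analysis suits this task: when points in a 3D parameter space lie on a plane, the covariance matrix has one near-zero eigenvalue, and the corresponding eigenvector is the plane's normal. The sketch below uses made-up synthetic points on the plane z = 2x - y, not galaxy data, and a simple power iteration in place of a library eigensolver; the function names are hypothetical.

```python
import math
import random


def covariance(points):
    """Sample covariance matrix of a list of equal-length tuples."""
    n, dim = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - means[k] for k in range(dim)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def smallest_eigenvector(C, iterations=500):
    """Power iteration on (trace(C) * I - C); its dominant eigenvector is
    the eigenvector of C with the smallest eigenvalue."""
    dim = len(C)
    trace = sum(C[i][i] for i in range(dim))
    M = [[(trace if i == j else 0.0) - C[i][j] for j in range(dim)]
         for i in range(dim)]
    # Fixed start vector; it must not be orthogonal to the sought eigenvector.
    v = [1.0] + [0.0] * (dim - 1)
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Synthetic points lying exactly on the plane 2x - y - z = 0 (illustrative only).
rng = random.Random(1)
points = []
for _ in range(200):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    points.append((x, y, 2 * x - y))

normal = smallest_eigenvector(covariance(points))
print("estimated plane normal:", normal)
```

The recovered vector should be parallel to (2, -1, -1), the normal of the synthetic plane. With real, noisy measurements the smallest eigenvalue would be small rather than zero, and its size relative to the other two indicates how tightly the data follow a plane.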

The Distributed Problem
2MASS: Mean Surface Brightness (Kmsb)
SDSS: Red Shift (rs), Angular Effective Radius (Iaer), Velocity Dispersion (vd)
Build a Distributed Principal Component Analysis Algorithm
Assumptions
  1. Build the cross-matched table off-line
  2. Compute indices and send them to the sites
The Virtual Table
Kmsb   Velocity Dispersion   (Angular Eff. Radius × Red Shift)
Work done by: Haimonti Dutta, Chris Giannella,
Kirk Borne, Ran Wolff and Hillol Kargupta
NSF Grants IIS-0329143, IIS-0093353, IIS-0203958
and NASA Grant NAS2-37143