Title: Knowledge Extraction from Scientific Data
1 Knowledge Extraction from Scientific Data
- Roy Williams
- California Institute of Technology
- roy_at_caltech.edu
- SDMIV, 24 October 2002, Edinburgh
KE Tools
S Data
2 Scientific Data
- Datacubes
- N-dimensional array
- spectrum, time-series,
- image, voxels, hyperspectral image
- Concentration
- Pattern matching
- Integration
- Event Sets
- Often derived from pattern matching
- A set of events is a table
- Integrating Event Sets
- Clustering
3 Knowledge Extraction
- Concentration
- principal components
- cluster/outlier finding
- Datacube → Eventset
- Pattern matching
- From theory or from training set
- Integration
- registration of datacubes
- join / crossmatch of eventsets
4 Datacube
Some stars from the DPOSS survey
5 Datacube
An AVIRIS image of San Francisco Bay
atmospheric absorption
400-2500 nm in 224 bands (R. Green, JPL)
6 Concentrating Information
- e.g. Principal Component Analysis
- Given a set of vectors
- Compute dot products
- (same as correlations)
- Diagonalize
- Throw out weaker (noise) components
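The steps listed on this slide can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `pca_concentrate`, the `keep` parameter, and the synthetic data are hypothetical and not taken from the talk.

```python
import numpy as np

def pca_concentrate(vectors, keep):
    """Project row vectors onto their `keep` strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                  # centre each coordinate
    C = X.T @ X / (len(X) - 1)              # dot products between coordinates (covariance)
    evals, evecs = np.linalg.eigh(C)        # diagonalize; eigenvalues come back ascending
    order = np.argsort(evals)[::-1][:keep]  # keep the strongest, drop the weak (noise) ones
    return X @ evecs[:, order], evals[order]

# Hypothetical data: a 3-D cloud that really varies along one direction, plus weak noise.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
data = np.outer(t, [3.0, 2.0, 1.0]) + 0.05 * rng.normal(size=(500, 3))

scores, variances = pca_concentrate(data, keep=1)
print(scores.shape, variances)
```

Here 500 three-component vectors are reduced to 500 single numbers, and almost all of the variance survives because the discarded components carry only the added noise.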
7 Information concentration
Principal Component Analysis
8 Event Sets
- Created by pattern matching
- from a known rule
- from a training set
- by finding clusters
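As an illustration of creating an event set "from a known rule", the sketch below turns a small 2-D datacube into a table of events. The rule (a pixel above a threshold that is brighter than all its neighbours), the function name `find_events`, and the column names are assumptions made for the example, not something specified in the talk.

```python
def find_events(image, threshold):
    """Return one event row per pixel that exceeds threshold and all of its neighbours."""
    h, w = len(image), len(image[0])
    events = []
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            if v <= threshold:
                continue
            # Collect the (up to 8) surrounding pixels, clipped at the image edge.
            neighbours = [image[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))
                          if (j, i) != (y, x)]
            if all(v > n for n in neighbours):
                events.append({"id": f"E{len(events) + 1}", "x": x, "y": y, "peak": v})
    return events

# Hypothetical 5x5 image with two bright sources and one faint pixel.
image = [[0.0] * 5 for _ in range(5)]
image[1][1], image[1][2], image[3][3] = 9.0, 4.0, 7.0

events = find_events(image, threshold=5.0)
for row in events:
    print(row)
```

Each returned dictionary is one row of the event table; the datacube itself is no longer needed once the events have been extracted.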
9 Event Set Table
103?
Column metadata
- name: longitude, content: Earth coordinate, units: degrees, datatype: double, display: f6.2
- name: ID, content: key, units: none, datatype: char
Sample rows (longitude, ID)
- 43.4, E3948547
- 87.2, E3948545
- 83.2, E3943766
108?
10 Gravitational Lenses
Pattern matching finds events in datacubes
A. Szalay, Johns Hopkins
11 Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory
12 Creating Event Sets
Supervised Classification
Given a set of volcanoes, find a lot more volcanoes.
Here we use Singular Value Decomposition.
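The slide does not say how the SVD is applied, so the following is only one plausible sketch, assuming NumPy: learn a low-dimensional subspace from labelled example patches, then score a new patch by how far it lies from that subspace. The names `fit_subspace` and `residual`, the rank, and the synthetic "patches" are all hypothetical.

```python
import numpy as np

def fit_subspace(examples, rank):
    """Return the mean and the top-`rank` right singular vectors of the example patches."""
    A = np.asarray(examples, dtype=float)
    mean = A.mean(axis=0)
    _, _, vt = np.linalg.svd(A - mean, full_matrices=False)
    return mean, vt[:rank]

def residual(patch, mean, basis):
    """Distance from a patch to the learned subspace; small means 'looks like the examples'."""
    d = np.asarray(patch, dtype=float) - mean
    return float(np.linalg.norm(d - basis.T @ (basis @ d)))

# Hypothetical training set: 40 noisy 16-pixel profiles built from a bump plus a tilt.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 16)
bump, ramp = np.exp(-x**2 / 0.1), x
examples = [a * bump + b * ramp + 0.01 * rng.normal(size=16)
            for a, b in zip(rng.uniform(0.5, 1.5, 40), rng.uniform(-0.3, 0.3, 40))]

mean, basis = fit_subspace(examples, rank=2)
r_like = residual(1.2 * bump + 0.1 * ramp, mean, basis)   # resembles the training set
r_unlike = residual(rng.normal(size=16), mean, basis)     # pure noise
print(r_like, r_unlike)
```

Candidates whose residual falls below a chosen cutoff would be added to the event set; choosing that cutoff is a separate problem not addressed here.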
13 Multiparameter data
colour-colour-fx/fopt
symbols: X-ray source counterparts
contours: all optical objects
Mike Watson, Leicester University
14 Integrating Datacubes
Find a mapping from one domain to the other
Registration of DPOSS and Hubble Deep Field
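In the simplest case, the mapping between two datacubes is a pure translation. The sketch below assumes that case and searches integer pixel shifts for the one with the smallest mean squared difference over the overlap. Real registration usually also needs rotation, scale, and sub-pixel accuracy; the function name `best_shift` and the toy images are hypothetical.

```python
def best_shift(reference, image, max_shift):
    """Find integer (dy, dx) so that image[y + dy][x + dx] best matches reference[y][x]."""
    h, w = len(reference), len(reference[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            # Only compare pixels where both images are defined.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    diff = reference[y][x] - image[y + dy][x + dx]
                    err += diff * diff
                    n += 1
            if n and err / n < best_err:
                best, best_err = (dy, dx), err / n
    return best

def blank(h, w):
    return [[0.0] * w for _ in range(h)]

# Hypothetical pair of 8x8 images: the same two sources, displaced by (1, 2) pixels.
reference = blank(8, 8)
reference[2][3], reference[5][5] = 5.0, 3.0
image = blank(8, 8)
image[3][5], image[6][7] = 5.0, 3.0

shift = best_shift(reference, image, max_shift=3)
print(shift)
```

The exhaustive search costs time proportional to the image size times the number of trial shifts, so it only suits small images or small search windows.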
15 Datacube Registration
Movement of ice inferred from registration
16 Integrating Event Sets
- Database Join
- Fuzzy Join
- e.g. astronomical crossmatch
- Distributed Join
- does the Grid do databases?
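A fuzzy join differs from a database join in that rows are paired by proximity rather than by equal keys. The sketch below shows a brute-force astronomical crossmatch on sky position, using only the standard library. The catalogue layout `(id, ra_deg, dec_deg)`, the function names, and the sample sources are assumptions for illustration.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def crossmatch(cat_a, cat_b, radius_deg):
    """Pair each source in cat_a with its nearest cat_b source within radius_deg."""
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = min(cat_b,
                   key=lambda s: angular_sep_deg(ra_a, dec_a, s[1], s[2]),
                   default=None)
        if best is not None and angular_sep_deg(ra_a, dec_a, best[1], best[2]) <= radius_deg:
            matches.append((id_a, best[0]))
    return matches

# Hypothetical catalogues as (id, RA in degrees, Dec in degrees).
cat_a = [("A1", 10.0, 20.0), ("A2", 50.0, -5.0)]
cat_b = [("B1", 10.0002, 20.0001), ("B2", 120.0, 30.0)]

matches = crossmatch(cat_a, cat_b, radius_deg=2.0 / 3600.0)  # 2 arcsecond radius
print(matches)
```

This compares every pair, which is fine for a sketch but not for catalogues of the size discussed in this talk; a practical service would use a spatial index or sky partitioning to limit the candidates.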
17 Integration of Star Catalogs
18 Visualizing Event Sets
Unsupervised clustering
50000 stars in color-color space
19 A Grid of Services
Human gets data
- Understood by human
- Further processing after format change
Network of services
- Grid of pipes and engines
- Switches and actuators
- Data flow
20 Example Grid of Services
Catalog Service
Query Check Service
Query Estimator
DPOSS Service
User's code
Crossmatch Service
2MASS Service
Storage Service
flexible complex metadata AND broadband binary
21 Computing Challenges
Clustering, Classification, Visualization, Outlier Detection
- Visualization of 10^10 points
- Database access to 10^10 points
22 Standards needed
- Bundling diverse objects together
- with code and references
- Referencing data resources on the Grid
- local, remote, replicated, ....
23 Problem Solving Environment
- Plumbing (big data) and electrical (control, metadata)
- Web service and workflow
- Finding service classes/implementations by semantics
- GUI / Executive / IO adapters / Algorithms