Mining Solar Images to Support Astrophysics Research

1
Mining Solar Images to Support Astrophysics
Research
  • Olfa Nasraoui
    • Computer Engineering & Computer Science
    • University of Louisville
    • Olfa.nasraoui_at_louisville.edu
  • In collaboration with
    • Joan Schmelz
    • Department of Physics
    • University of Memphis
    • jschmelz_at_memphis.edu
  • Acknowledgement: team members who worked on this
    project
    • Nurcan Durak, Sofiane Sellah, Heba Elgazzar,
      Carlos Rojas (Univ. of Louisville)
    • Jonatan Gomez and Fabio Gonzalez (National Univ.
      of Colombia)
    • Jennifer Roames, Kaouther Nasraoui (Univ. of
      Memphis)

NASA-AISRP PI Meeting, Univ. Maryland, Oct. 3-5, 2006
2
Outline
  • Motivations & Goals of the Project
  • Sources of Data
  • Methodology
  • Results on EIT Data
  • Upcoming Plans

3
Motivations (1): The Coronal Heating Problem
  • The question of why the solar corona is so hot
    has remained one of the most exciting puzzles in
    astronomy for the last 60 years.
  • Temperature increases very steeply
    • from 6000 degrees in the photosphere (visible
      surface of the Sun)
    • to a few million degrees in the corona (region
      500 kilometers above the photosphere).
  • Even though the Sun is hotter on the inside than
    it is on the outside, the outer atmosphere of the
    Sun (the corona) is indeed hotter than the
    underlying photosphere!
  • Measurements of the temperature distribution
    along the coronal loop length can be used to
    support or eliminate various classes of coronal
    temperature models.
  • Scientific analysis requires data observed by
    instruments such as EIT, TRACE, and SXT.

4
Motivations (2): Finding Needles in Haystacks
(manually)
  • The biggest obstacle to completing the coronal
    temperature analysis task is collecting the right
    data (manually).
  • The search for interesting images (with coronal
    loops) is by far the most time-consuming aspect
    of this coronal temperature analysis.
  • Currently, this process is performed manually.
  • It is therefore extremely tedious, and hinders
    the progress of science in this field.
  • The next-generation "EIT", called MAGRITE,
    scheduled for launch in a few years on NASA's
    Solar Dynamics Observatory, should be able to
    take as many images in about four days as EIT
    took over 6 years! It will no doubt need
    state-of-the-art techniques to sift through the
    massive data to support scientific discoveries.

5
Goals of the Project: Finding Needles in
Haystacks (automatically)
  • Develop an image retrieval system based on Data
    Mining
    • to quickly sift through data sets downloaded from
      online solar image databases
    • and automatically discover the rare but
      interesting images containing solar loops, which
      are essential in studies of the Coronal Heating
      Problem
  • Publish mined knowledge on the web in an
    easily exchangeable format for astronomers.

6
Sources of Data
  • EIT: Extreme UV Imaging Telescope aboard the
    NASA/European Space Agency spacecraft called SOHO
    (Solar and Heliospheric Observatory)
    • http://umbra.nascom.nasa.gov/eit
  • TRACE: NASA's Transition Region And Coronal
    Explorer
    • http://vestige/lmsal.com/TRACE.SXT
  • SXT: Soft X-ray Telescope database on the
    Japanese spacecraft Yohkoh
    • http://ydac.mssl.ucl.ac.uk/ydac/sxt/sfm-cal-top.html

7
Samples of Data: EIT

16
Steps
  • Sample Image Acquisition and Labeling
    • Images with and without solar loops, 1020 x 1022
      pixels, 2 MB per image
  • Image Preprocessing, Block Extraction, and
    Feature Extraction
  • Building & Evaluating Classification Models
    • At block level (is a block a loop or no-loop
      block?)
      • 10-fold cross-validation
      • Train, then test on an independent set 10
        times, and average the results
    • At image level (does an image contain a loop
      block?)
      • Use the model learned from training data
      • One global model, or one model per solar cycle
      • Test on an independent set of images from
        different solar cycles
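
The block-level evaluation above (10-fold cross-validation with averaged results) can be sketched as follows. This is a minimal illustration, not the project's code: the function names and the `train_fn` / `predict_fn` callbacks are hypothetical, and any classifier could be plugged in.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(records, labels, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, and average accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(records), k):
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, records[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each block record is tested exactly once, and it is never part of the training set of the fold that tests it.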

17
Step 1. Sample Image Acquisition and Labeling
  • Used for
    • Downloading training images to use as examples
      for the learning stage
    • Marking the blocks containing interesting loops
  • Marking data is added as metadata in the header of
    the training image

19
Step 2. Image Preprocessing and Block Extraction
  • Despeckling (to clean noise) and Gradient
    Transformation (to bring out the edges)
  • Phase I (loops outside the solar disk): divide the
    area outside the solar disk into blocks of an
    optimal size (to maximize overlap with marked
    areas over all training images)
  • Use each block as one data record to extract
    individual data attributes for learning and
    testing
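
A minimal sketch of the out-of-disk block extraction described above. It assumes the disk center `(cx, cy)` and `radius` are already known in pixel coordinates, and it keeps a tile when its center lies outside the disk. Both the parameter names and the keep-rule are illustrative assumptions; the slides do not specify how the block size was optimized or how blocks straddling the limb were handled.

```python
import math

def out_of_disk_blocks(width, height, cx, cy, radius, block):
    """Return (x, y, w, h) tiles whose centers lie outside the solar disk.

    The image is tiled with non-overlapping square blocks of side `block`;
    partial tiles at the right and bottom borders are skipped.
    """
    blocks = []
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            center_x = x + block / 2
            center_y = y + block / 2
            # Keep the tile only if its center is farther than the disk radius.
            if math.hypot(center_x - cx, center_y - cy) > radius:
                blocks.append((x, y, block, block))
    return blocks
```

Each returned tuple can then serve as one data record from which features are extracted.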

20
Step 2 (contd.): Block Extraction & Labeling
  • Starting with a marked (labeled) image
    • A mark is a rectangle that includes a real loop
  • Extract several out-of-disk blocks from each
    image
  • Label each block automatically based on overlap
    with marked blocks
    • Class 1: Loop
    • Class 2: No loop
  • Hence, this generates several positive & negative
    examples
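
The automatic labeling by overlap can be sketched as below. The 0.5 overlap threshold and the function names are assumptions made for illustration; the slides only state that labels are derived from overlap with the marked rectangles.

```python
def overlap_fraction(block, mark):
    """Fraction of the block's area covered by the marked rectangle.

    Both arguments are (x, y, width, height) tuples in pixel coordinates.
    """
    bx, by, bw, bh = block
    mx, my, mw, mh = mark
    inter_w = max(0, min(bx + bw, mx + mw) - max(bx, mx))
    inter_h = max(0, min(by + bh, my + mh) - max(by, my))
    return (inter_w * inter_h) / (bw * bh)

def label_block(block, marks, threshold=0.5):
    """Return 1 (loop) if some mark covers at least `threshold` of the block,
    otherwise 2 (no loop). The threshold value is an assumed default."""
    for mark in marks:
        if overlap_fraction(block, mark) >= threshold:
            return 1
    return 2
```

A single marked image therefore yields a few class 1 blocks and many class 2 blocks.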

22
Difficult classification problem: loops come in
different sizes, shapes, intensities, etc.



23
Hardly distinguishable regions without
interesting loops
  • Inconsistencies in labeling are common
  • Subjectivity, quality of data



24
Challenging even at the edge level
  • Which block is NOT a loop block?

25
Challenging even at the edge level
  • Which block is NOT a loop block?

26
Defective and Asymmetric nature of Loop Shapes
27
Features inside each block - applied on the
original intensity levels
  • Statistical Features
  • Mean
  • Standard Deviation
  • Smoothness
  • Third Moment
  • Uniformity
  • Entropy
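
The six statistical features listed above can be computed from a block's intensity histogram. The slides give only the feature names, so the formulas in this sketch are an assumption: they follow the common textbook definitions of histogram-based texture descriptors, with the variance normalized by the squared intensity range before computing smoothness.

```python
import math
from collections import Counter

def statistical_features(pixels, levels=256):
    """Histogram-based features for a flat list of integer intensities."""
    n = len(pixels)
    # p(z): fraction of pixels in the block with intensity z
    probs = {z: count / n for z, count in Counter(pixels).items()}
    mean = sum(z * p for z, p in probs.items())
    variance = sum((z - mean) ** 2 * p for z, p in probs.items())
    third = sum((z - mean) ** 3 * p for z, p in probs.items())
    norm_var = variance / (levels - 1) ** 2
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "smoothness": 1 - 1 / (1 + norm_var),  # 0 for constant blocks
        "third_moment": third,                 # skewness of the histogram
        "uniformity": sum(p * p for p in probs.values()),
        "entropy": -sum(p * math.log2(p) for p in probs.values()),
    }
```

A constant block gives zero standard deviation, zero entropy, and uniformity 1, while a block split evenly between two intensities gives uniformity 0.5 and an entropy of 1 bit.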

28
Features inside each block - applied on edges
  • Hough-based Features
    • First apply the Hough transform
      • Image space → Hough Space (H.S.)
      • Pixel → parameter combination for a given shape
      • All pixels → vote for several parameter
        combinations
      • Extract peaks from H.S.
    • Then construct features based on H.S.
  • Peak detection is very challenging
    • Many false peaks (noise)
    • Bin splitting (peaks are split)
  • Biggest problem: size of the Hough accumulator array
    • Every pixel votes for all possible curves that go
      through this pixel
    • Combinatorial explosion as we add more parameters
  • Solution: we feed the Hough space into a stream
    clustering algorithm to detect peaks
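
To illustrate the voting step, here is a minimal sketch of a Hough transform for straight lines, the simplest two-parameter case. The loop shapes in this project would need a curve parameterization with more parameters, which the slides do not spell out, so the line model, the function name, and the bin sizes here are assumptions chosen only to show how pixels vote.

```python
import math
from collections import Counter

def hough_lines(edge_pixels, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta_index, rho_bin) space.

    Each (x, y) edge pixel votes once per angle for the line
    rho = x*cos(theta) + y*sin(theta) passing through it.
    """
    votes = Counter()  # sparse accumulator: only cells that receive votes
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes
```

For ten pixels on the horizontal line y = 5, the cell at 90 degrees and rho = 5 collects all ten votes, but so do the neighboring cells at 89 and 91 degrees. This is a small instance of the peak detection difficulty noted above. With two parameters the accumulator is a 2-D grid; every added parameter multiplies the number of cells.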




29
Stream clustering (published in SIAM Data Mining,
2006): eliminates the need to store the Hough
accumulator array by processing it in 1 pass
  • Input: initial scales s0, max. no. of clusters
  • Output: running (real-time) synopsis of clusters
    in input stream
  • Repeat until end of stream
    • Input next data point x
    • For each cluster in current synopsis
      • Perform Chebyshev test (test for compatibility
        without any assumptions on distributions, but
        requires robust scale estimates)
      • If x passes Chebyshev test Then
        • Update cluster parameters: centroid, scale
    • If no cluster or x fails all Chebyshev tests Then
      • Create new cluster (c = x, s = s0)
    • Perform pairwise Chebyshev tests to merge
      compatible clusters
      • Densest cluster absorbs merged cluster
      • Centroid updated
    • Eliminate clusters with low density
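
The loop above can be sketched in simplified form as follows. This is not the published algorithm: the robust scale estimation is replaced by a plain running mean of squared distances, point count stands in for density, and the Chebyshev test is reduced to the bound dist² <= k² * scale. The class, the function names, and the default `k` are assumptions for illustration.

```python
class Cluster:
    def __init__(self, point, scale0):
        self.centroid = list(point)
        self.scale = scale0   # running mean of squared distances
        self.count = 1        # used here as a stand-in for density

    def dist2(self, point):
        return sum((c - p) ** 2 for c, p in zip(self.centroid, point))

    def passes(self, point, k):
        # Chebyshev-style bound: at most 1/k^2 of points lie beyond k scales.
        return self.dist2(point) <= k ** 2 * self.scale

    def update(self, point):
        d2 = self.dist2(point)
        self.count += 1
        w = 1.0 / self.count
        self.centroid = [c + w * (p - c) for c, p in zip(self.centroid, point)]
        self.scale += w * (d2 - self.scale)

def merge_compatible(clusters, k):
    """Let larger clusters absorb smaller ones whose centroid passes the test."""
    merged = []
    for c in sorted(clusters, key=lambda c: c.count, reverse=True):
        host = next((m for m in merged if m.passes(c.centroid, k)), None)
        if host is None:
            merged.append(c)
        else:
            total = host.count + c.count
            host.centroid = [(host.count * a + c.count * b) / total
                             for a, b in zip(host.centroid, c.centroid)]
            host.count = total
    return merged

def stream_cluster(stream, scale0, max_clusters, k=3.0):
    """Single pass over the stream; returns the final cluster synopsis."""
    clusters = []
    for x in stream:
        compatible = [c for c in clusters if c.passes(x, k)]
        if compatible:
            for c in compatible:
                c.update(x)
        else:
            clusters.append(Cluster(x, scale0))
        clusters = merge_compatible(clusters, k)
        if len(clusters) > max_clusters:
            # Drop the sparsest cluster to respect the synopsis size limit.
            clusters.remove(min(clusters, key=lambda c: c.count))
    return clusters
```

Only the cluster synopsis is kept in memory, so the data points (here, Hough-space votes) never need to be stored in a full accumulator array.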

30
Examples of results
  • Ability to learn (in 1 pass) cluster locations
    and sizes (scales) from very noisy data
  • Clusters of different
  • densities,
  • sizes,
  • shapes

31
Examples of clustering 2-D Hough space
32
Spatial Features
33
Curvature Features
34
Curvature Features (figure labels: l, d)
35
Step 3. Classification
37
Block-based results
  • 150 solar images from 1996, 1997, 2000, 2001,
    2004 & 2005
  • 403 loop blocks
  • 7950 no-loop blocks
38
Loop Mining Tool
40
Image Based Testing Results
41
Upcoming Plans
  • Construction of better shape features from
    outputs of clustering of Hough space
  • TRACE data sets
  • Started
  • Online learning
  • Users can change the label after seeing results
  • System adapts learned models online
  • Use testing tool for labeling
  • Can help at least as a filter
  • Improve demo downloadable tools