Anomaly Detection for Scientific Data - PowerPoint PPT Presentation

1
Anomaly Detection for Scientific Data
  • Mark Schwabacher
  • NASA ARC, Code TI (formerly IC, TC)
  • ROSES Code S T Workshop
  • February 17, 2005

2
What is Anomaly Detection?
  • Seek to find parts of the data (anomalies) that
    are different from the rest of the data
  • Supervised approaches use examples of
    anomalies; unsupervised approaches do not.

3
How can Anomaly Detection be Applied to
Scientific Data?
  • Examples
  • Data from Earth-observing satellites
  • Data from telescopes
  • Directing scientists' attention to anomalies could
    lead to scientific discoveries
  • Detect errors, so they can be corrected

4
Example Earth Science Application: Vegetation Data
  • Joint work with Ranga Myneni of Boston University
  • Used Leaf Area Index (LAI) and Fraction of Absorbed
    Photosynthetically Active Radiation (FPAR)
    from the Moderate Resolution Imaging
    Spectroradiometer (MODIS) instrument aboard the
    Terra and Aqua satellites

5
Results
  • Used MODIS data from one time point at 4 km
    resolution (7.7 million pixels within Earth's
    land area)
  • Used 4 variables: LAI, FPAR, QA, and latitude
  • Used an unsupervised, distance-based anomaly
    detection algorithm
  • The top-ranked outlier was in northern Russia and the
    second-ranked outlier was in southern New Zealand
  • Both points had unusually high LAI and FPAR
    values for their latitudes
  • Investigation revealed a bug in the software that
    produced the LAI and FPAR products
  • Error was corrected, and new versions of the data
    were made available to the scientific community.

6
Algorithm Used: Orca (Distance-Based Outliers)
  • The main idea is to find points in low-density
    regions of the feature space
  • V is the total volume within radius d
  • N is the total number of examples
  • k is the number of examples in the sphere
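
The formula that tied V, N, and k together did not survive in this transcript. Assuming the slide used the standard k-nearest-neighbor density estimate (an assumption, not something the transcript confirms), the relationship can be written as:

```latex
\hat{p}(x) \approx \frac{k}{N \, V}
```

Under this reading, a point whose k-th nearest neighbor lies far away needs a large radius d, and so a large volume V, to enclose k examples. Its estimated density is therefore low, which is what marks it as a candidate outlier.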

Joint work with Stephen Bay of ISLE

7
Orca Algorithm
  • Based on nested loops
  • For each example, find its nearest neighbors
    with a sequential scan
  • Modified with a pruning rule
  • While performing the sequential scan, keep track
    of the closest neighbors found so far
  • Prune examples once the neighbors found so far
    indicate that the example cannot be a top outlier
  • Worst case: O(N^2) distance computations
  • In practice, runs in nearly linear time
  • Can handle millions of data points
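
The nested loop with the pruning rule can be sketched as follows. This is a simplified illustration, not the Orca implementation: it scores each point by the distance to its k-th nearest neighbor (an assumed scoring rule), handles one candidate at a time, and uses hypothetical names (`orca_sketch`, `top_n`, `cutoff`, `seed`) that do not come from the slides.

```python
import heapq
import math
import random


def orca_sketch(points, k=3, top_n=2, seed=0):
    """Return up to top_n (score, index) pairs, highest score first.

    Assumed score: distance from a point to its k-th nearest neighbor.
    """
    # Scan neighbors in a shuffled order so that close neighbors tend to
    # turn up early, letting the pruning rule fire sooner (an assumption
    # of this sketch).
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)

    top = []      # min-heap of (score, index); top[0] is the weakest top outlier
    cutoff = 0.0  # score of the weakest top outlier once top_n have been found

    for i, candidate in enumerate(points):
        nearest = []  # max-heap (negated) of the k smallest distances so far
        pruned = False
        for j in order:
            if j == i:
                continue
            dist = math.dist(candidate, points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -dist)
            elif dist < -nearest[0]:
                heapq.heapreplace(nearest, -dist)
            # The running k-th neighbor distance can only shrink, so once it
            # drops below the cutoff this candidate cannot be a top outlier.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < top_n:
            heapq.heappush(top, (score, i))
        else:
            heapq.heappushpop(top, (score, i))
        if len(top) == top_n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)
```

Calling `orca_sketch(points, k=3, top_n=2)` on a list of coordinate tuples returns the two highest-scoring points. Most inliers exit the inner loop after only a few distance computations, which is the intuition behind the near-linear behavior reported in practice.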

8
Conclusions
  • Anomaly detection algorithms can find
    previously-unknown anomalies in large scientific
    data sets
  • Could lead to scientific discoveries or
    correction of errors
  • Different algorithms find qualitatively different
    anomalies, so it is worth running multiple
    algorithms
  • I presented one algorithm (Orca) that runs in
    nearly linear time in practice, so it can be applied
    to very large data sets

9
Pruning
  • Outliers based on distance to the 3rd nearest
    neighbor (k = 3)

[Figure: sequential scan; d is the distance to the 3rd nearest neighbor
of the weakest top outlier]
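
To make the pruning test concrete, here is a minimal sketch of one candidate's sequential scan. It assumes the cutoff d is supplied by the caller, and the function name and return shape are hypothetical. It also reports how many distance computations were needed before the scan stopped.

```python
import math


def scan_with_pruning(candidate, others, k, cutoff):
    """Sequentially scan `others`, stopping early when pruning applies.

    Returns (score, n_computed). score is the distance to the k-th nearest
    neighbor, or None if the candidate was pruned or has fewer than k
    neighbors.
    """
    nearest = []  # ascending list of the k smallest distances seen so far
    for n_computed, other in enumerate(others, start=1):
        nearest.append(math.dist(candidate, other))
        nearest.sort()
        del nearest[k:]
        # Once k neighbors lie closer than the cutoff, the final score can
        # only be smaller still, so the candidate is not a top outlier.
        if len(nearest) == k and nearest[-1] < cutoff:
            return None, n_computed
    score = nearest[-1] if len(nearest) == k else None
    return score, len(others)
```

With k = 3, a candidate in a dense region is typically discarded after a handful of comparisons, while a true outlier must be compared against every other example.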