Spatial Data Mining - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Spatial Data Mining


1
Spatial Data Mining
  • Satoru Hozumi
  • CS 157B

2
Learning Objectives
  • Understand the concept of Spatial Data Mining
  • Learn techniques on how to find spatial patterns

3
Examples of Spatial Patterns
  • 1854 Asiatic cholera outbreak in London.
  • A water pump identified as the source.
  • Cancer cluster to investigate health hazards.
  • Crime hotspots for planning police patrol routes.
  • Effects on US weather of unusual warming of the
    Pacific Ocean (El Niño).

4
What is a Spatial Pattern?
  • What is not a pattern?
  • Random, haphazard, chance, stray, accidental,
    unexpected.
  • Without definite direction, trend, rule, method,
    design, aim, purpose.
  • What is a Pattern?
  • A frequent arrangement, configuration,
    composition, regularity.
  • A rule, law, method, design, description.
  • A major direction, trend, prediction.

5
Defining Spatial Data Mining
  • Search for spatial patterns.
  • Non-trivial search, as automated as possible.
  • Large search space of plausible hypotheses
  • Ex. Possible causes of Asiatic cholera: water,
    food, air, insects.
  • Interesting, useful, and unexpected spatial
    patterns.
  • Useful in a certain application domain
  • Ex. Shutting off the identified water pump =>
    saved human lives.
  • May provide a new understanding of the world
  • Ex. The water pump-cholera connection led to the
    germ theory.

6
What is NOT Spatial Data Mining
  • Simple querying of Spatial Data
  • Finding neighbors of Canada given names and
    boundaries of all countries (Search space not
    large)
  • Uninteresting or obvious patterns
  • Heavy rainfall in Minneapolis is correlated with
    heavy rainfall in St. Paul (10 miles apart).
  • Common knowledge: nearby places have similar
    rainfall
  • Mining of non-spatial data
  • Diaper sales and beer sales are correlated in
    evenings

7
Families of Spatial Data Mining Patterns
  • Location Prediction
  • Where will a phenomenon occur?
  • Spatial Interactions
  • Which subset of spatial phenomena interact?
  • Hot spot
  • Which locations are unusual or share
    commonalities?

8
Location Prediction
  • Where will a phenomenon occur?
  • Which spatial events are predictable?
  • How can a spatial event be predicted from other
    spatial events?
  • Examples
  • Where will an endangered bird nest?
  • Which areas are prone to fire given maps of
    vegetation and drought?
  • What should be recommended to a traveler in a
    given location?

9
Spatial Interactions
  • Which spatial events are related to each other?
  • Which spatial phenomena depend on other
    phenomena?
  • Examples
  • Earth science
  • Climate and disturbance events => wildfires, hot,
    dry conditions, lightning
  • Epidemiology
  • Disease type and environmental events => West
    Nile disease, stagnant water source, dead birds,
    mosquitoes

10
Hot spots
  • Is a phenomenon spatially clustered?
  • Which spatial entities are unusual or share
    common characteristics?
  • Examples
  • Crime hot spots to plan police patrols

11
Spatial Queries
  • Spatial Range Queries
  • Find all cities within 50 miles of Paris
  • Query has associated region (location, boundary)
  • Answer includes overlapping or contained data
    regions
  • Nearest-Neighbor Queries
  • Find the 10 cities nearest to Paris
  • Results must be ordered by proximity
  • Spatial Join Queries
  • Find all cities near a lake
  • Join condition involves regions and proximity.
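
The three query types above can be sketched with brute-force distance checks. This is only an illustration: the coordinates are made-up planar positions in miles (not real geography), and the helper names `range_query`, `nearest_neighbors`, and `spatial_join` are hypothetical. A real spatial database would typically use a spatial index rather than scanning every point.

```python
import math

# Hypothetical planar coordinates in miles; not real geography.
cities = {"Paris": (0.0, 0.0), "Versailles": (-10.0, -5.0),
          "Orleans": (-20.0, -70.0), "Reims": (80.0, 40.0)}
lakes = {"Lake A": (-12.0, -8.0), "Lake B": (200.0, 200.0)}

def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(n for n, p in points.items() if math.dist(p, center) <= radius)

def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest names, ordered by proximity."""
    return sorted(points, key=lambda n: math.dist(points[n], center))[:k]

def spatial_join(left, right, max_dist):
    """Spatial join: all (left, right) name pairs at most `max_dist` apart."""
    return [(a, b) for a, p in left.items() for b, q in right.items()
            if math.dist(p, q) <= max_dist]

others = {n: p for n, p in cities.items() if n != "Paris"}
within_50 = range_query(others, cities["Paris"], 50.0)      # cities within 50 miles
two_nearest = nearest_neighbors(others, cities["Paris"], 2)  # ordered by distance
near_lake = spatial_join(cities, lakes, 15.0)                # cities near a lake
```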

12
Unique Properties of Spatial Patterns
  • Items in traditional data are independent of
    each other, whereas properties of locations in a
    map are often auto-correlated (patterns exist)
  • Traditional data deals with simple domains, e.g.
    numbers and symbols, whereas spatial data types
    are complex
  • Items in traditional data describe discrete
    objects, whereas spatial data is continuous

13
Association Rules
  • Support: the number of times a rule shows up in a
    database
  • Confidence: conditional probability of Y given X
  • Example
  • (bedrock type = limestone), (soil depth < 50 ft)
    => (sinkhole risk = high)
  • Support = 20, confidence = 0.8
  • Interpretation: locations with limestone bedrock
    and low soil depth have a high risk of sinkhole
    formation.
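
As a small worked example of these two measures, the sketch below counts support and confidence for the sinkhole rule over ten invented locations. The data are hypothetical and chosen so that the confidence comes out at 0.8; they are not the dataset behind the slide's numbers.

```python
# Hypothetical survey of 10 locations: (bedrock type, soil depth in ft, sinkhole risk).
locations = [
    ("limestone", 30, "high"), ("limestone", 20, "high"),
    ("limestone", 45, "high"), ("limestone", 10, "high"),
    ("limestone", 40, "low"),  ("limestone", 80, "low"),
    ("granite", 30, "low"),    ("granite", 90, "low"),
    ("sandstone", 25, "low"),  ("sandstone", 60, "high"),
]

def antecedent(loc):
    """X: limestone bedrock and soil depth under 50 ft."""
    bedrock, depth, _ = loc
    return bedrock == "limestone" and depth < 50

def consequent(loc):
    """Y: high sinkhole risk."""
    return loc[2] == "high"

n_x = sum(1 for loc in locations if antecedent(loc))
n_xy = sum(1 for loc in locations if antecedent(loc) and consequent(loc))

support_count = n_xy                        # times the whole rule holds
support_fraction = n_xy / len(locations)    # same, as a share of all locations
confidence = n_xy / n_x                     # estimate of P(Y | X)
```

Here 5 locations satisfy X and 4 of those also satisfy Y, so the support count is 4 and the confidence is 4/5 = 0.8.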

14
Apriori Algorithm to mine association rules
  • Key challenge
  • Very large search space
  • Key assumption
  • Few associations have support above a given
    threshold
  • Associations with low support are not interesting
  • Key insight
  • If an association item set has high support, then
    so do all its subsets
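
The key insight is what lets Apriori prune the search: a candidate item set is only counted if all of its smaller subsets were already found frequent. A minimal sketch follows, assuming support is an absolute count; the function name `apriori` and the toy transactions are illustrative, not taken from the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]   # candidate itemsets of size 1
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop a candidate if any (k-1)-subset is not frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical transactions, one set of observed items per location.
transactions = [
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil"},
    {"granite", "shallow_soil"},
    {"limestone", "sinkhole"},
]
frequent = apriori(transactions, min_support=3)
```

With this data the three-item candidate is never counted, because its subset {shallow_soil, sinkhole} has support 2 and falls below the threshold.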

15
Association rules Example

16
Techniques for Association Mining
  • Classical method
  • Association rules given item types and
    transactions
  • Assumes spatial data can be decomposed into
    transactions
  • Such decomposition may alter spatial patterns
  • New spatial method
  • Spatial association rule
  • Spatial co-location

17
Associations, Spatial associations, co-location

18
Associations, Spatial associations, co-location

19
Co-location Rules
  • For point data in space
  • Does not need transaction, works directly with
    continuous space
  • Use neighborhood definition and spatial joins
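
One way to read this: instead of transactions, a distance threshold defines which point instances are neighbors, and a rule A => B is scored by how many A instances have a B instance nearby. The sketch below uses that simplified score. The point data, the radius, and the name `colocation_probability` are assumptions made for illustration; published co-location miners use more elaborate measures.

```python
import math

# Hypothetical point data: (feature type, x, y).
points = [
    ("stagnant_water", 0.0, 0.0), ("stagnant_water", 10.0, 0.0),
    ("stagnant_water", 20.0, 20.0), ("stagnant_water", 50.0, 50.0),
    ("dead_bird", 1.0, 1.0), ("dead_bird", 9.0, 1.0),
    ("dead_bird", 21.0, 19.0), ("dead_bird", 80.0, 80.0),
    ("dead_bird", 90.0, 10.0),
]

def colocation_probability(points, a, b, radius):
    """Fraction of `a` instances with at least one `b` instance within `radius`."""
    a_pts = [(x, y) for f, x, y in points if f == a]
    b_pts = [(x, y) for f, x, y in points if f == b]
    # Brute-force spatial join on the neighborhood relation "distance <= radius".
    hits = sum(1 for p in a_pts if any(math.dist(p, q) <= radius for q in b_pts))
    return hits / len(a_pts)

water_to_bird = colocation_probability(points, "stagnant_water", "dead_bird", 2.0)
bird_to_water = colocation_probability(points, "dead_bird", "stagnant_water", 2.0)
```

The two directions differ (3 of 4 water sources versus 3 of 5 birds), which shows that this score is not symmetric.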

20
Co-location rules

21
Clustering
  • Process of discovering groups in large databases
  • Spatial view: rows in a database => points in a
    multi-dimensional space.
  • Visualization may reveal interesting groups

22
Clustering
  • Hierarchical
  • All points in one cluster
  • Split and merge until a stopping criterion is reached
  • Partitional
  • Start with random central points
  • Assign points to nearest central point
  • Update the central points
  • Approach with statistical rigor
  • Density
  • Find clusters based on density of regions
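
The partitional steps (random central points, assign, update) match the usual k-means loop. A minimal sketch is given below, assuming Euclidean distance and a fixed random seed; the names `assign` and `k_means` and the sample points are hypothetical.

```python
import math
import random

def assign(points, centers):
    """Assign every point to its nearest central point."""
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[i].append(p)
    return clusters

def k_means(points, k, max_iter=100, seed=0):
    # Start with k randomly chosen data points as the central points.
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update each center to the mean of its cluster (keep it if the cluster is empty).
        updated = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:   # no center moved, so the assignment is stable
            break
        centers = updated
    return centers, assign(points, centers)

# Two well-separated hypothetical groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, k=2)
```

On data this well separated the loop settles on one center per group. On harder data the result can depend on the random start, so several restarts are common.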

23
Outliers
  • Observations inconsistent with rest of the
    dataset
  • Observations inconsistent with their
    neighborhoods
  • A local instability or discontinuity

24
Variogram Cloud
  • Create a variogram cloud by plotting (attribute
    difference, distance) for each pair of points
  • Select points common to many outlying pairs
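
A rough sketch of this procedure: compute the pair list, call a pair "outlying" when the two points are close but their attribute values differ a lot, then count how often each point appears in such pairs. The sample data, the two thresholds, and the use of the absolute difference are assumptions for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical (x, y, attribute) samples; the point at index 2 is a planted outlier.
samples = [(0, 0, 10.0), (1, 0, 11.0), (2, 0, 30.0), (3, 0, 12.0), (4, 0, 13.0)]

def variogram_cloud(samples):
    """One (distance, attribute difference, i, j) tuple per pair of points."""
    return [(math.dist(a[:2], b[:2]), abs(a[2] - b[2]), i, j)
            for (i, a), (j, b) in combinations(enumerate(samples), 2)]

def outlier_votes(cloud, max_dist, min_diff):
    """Count, per point, the nearby pairs whose attribute difference is large."""
    votes = Counter()
    for d, diff, i, j in cloud:
        if d <= max_dist and diff >= min_diff:
            votes[i] += 1
            votes[j] += 1
    return votes

cloud = variogram_cloud(samples)
votes = outlier_votes(cloud, max_dist=2.0, min_diff=10.0)
suspect = votes.most_common(1)[0][0]   # index of the point in the most outlying pairs
```

Every outlying pair here involves index 2, so that point collects four votes while each of its neighbors collects one.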

25
Moran Scatter Plot
  • Plot (normalized attribute value, weighted
    average in the neighborhood) for each location
  • Select points in upper left and lower right
    quadrant
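
The two coordinates and the quadrant test can be computed without drawing the plot. The sketch below assumes a one-dimensional chain of locations in which each location's neighbors are the adjacent ones, with equal weights; the values and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical attribute values along a line of locations; index 10 breaks the trend.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 12, 13]
# Assumed neighborhood: the adjacent locations on either side, equally weighted.
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}

def moran_scatter(values, neighbors):
    """Return one (normalized value, neighborhood average) pair per location."""
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma for v in values]
    lag = [mean([z[j] for j in neighbors[i]]) for i in range(len(values))]
    return list(zip(z, lag))

def spatial_outliers(scatter):
    """Upper-left (low value, high neighbors) or lower-right (high value, low neighbors)."""
    return [i for i, (z, lag) in enumerate(scatter)
            if (z < 0 < lag) or (lag < 0 < z)]

scatter = moran_scatter(values, neighbors)
flagged = spatial_outliers(scatter)
```

Only index 10 lands in the upper-left quadrant: its own value is below the mean while its neighbors are well above it.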

26
Scatter plot
  • Plot (normalized attribute value, weighted
    average in the neighborhood) for each location
  • Fit a linear regression line
  • Select points which are unusually far from the
  • Select points which are unusually far from the
    regression line.
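
A sketch of the regression variant follows, using an ordinary least-squares fit written out by hand. The data and neighborhood are the same kind of hypothetical chain of locations as before, and "unusually far" is taken to mean a residual larger than two residual standard deviations, which is an assumed cutoff rather than one given in the slides.

```python
from statistics import mean, pstdev

# Hypothetical attribute values along a line of locations; index 10 breaks the trend.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 12, 13]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}

mu, sigma = mean(values), pstdev(values)
z = [(v - mu) / sigma for v in values]                                   # x-axis
lag = [mean([z[j] for j in neighbors[i]]) for i in range(len(values))]  # y-axis

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_line(z, lag)
residuals = [y - (slope * x + intercept) for x, y in zip(z, lag)]
cutoff = 2.0 * pstdev(residuals)       # assumed meaning of "unusually far"
far_from_line = [i for i, r in enumerate(residuals) if abs(r) > cutoff]
```

With this data the fitted slope is about 0.71 and only index 10 lies beyond the cutoff.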

27
Conclusion
  • Patterns are the opposite of random
  • Common spatial patterns
  • Location prediction
  • Feature interaction
  • Hot spot
  • Spatial patterns may be discovered using
    techniques such as association rules, clustering,
    and outlier detection