Cascading spatio-temporal pattern discovery: A summary of results



1
Cascading spatio-temporal pattern discovery: A
summary of results
  • Pradeep Mohan¹, Shashi Shekhar¹, James A. Shine²,
    James P. Rogers²
  • ¹University of Minnesota, Twin-Cities,
    {mohan, shekhar}@cs.umn.edu
  • ²Engineering Research and Development Center,
    Alexandria, VA,
    {James.A.Shine, James.P.Rogers.II}@usace.army.mil

2
Outline
  • Introduction
  • Motivation
  • Problem Statement
  • Related Work
  • Contributions
  • Interest Measure
  • CSTP Miner Algorithm
  • Evaluation and Case Study
  • Conclusion and Future Work

3
Motivation: Public Safety
[Figure: instances of Assault (A.1 to A.4), Bar Closing (B.1, B.2) and Drunk Driving (C.1 to C.4) over time steps T1, T2 and T3]
Cascading spatio-temporal pattern (CSTP)
  • Partially ordered subsets of ST event types.
  • Located together in space.
  • Occur in stages over time.

[Figure: example CSTP over Bar Closing, Drunk Driving and Assault]
Stages: Bar Closing, Assault, Drunk Driving,
Hurricane, Climate change, etc.
Other applications: climate change, epidemiology,
evacuation planning.
4
Problem Definition
  • Input: a) ST framework, b) directed ST
    neighbor relation R, c) interest measure threshold
  • Output: a set of CSTPs with interestingness
    > threshold
  • Objective: a) minimize computation costs while
    discovering statistically meaningful CSTPs.
  • Constraints: a) correctness and completeness

Example: R = (0.5 miles, 2 min.), threshold = 0.5
[Figure: ST join (R) over instances of event types A, B and C]
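A minimal sketch of a directed ST neighbor relation such as R in the example, assuming planar coordinates in miles, time stamps in minutes and a Euclidean distance test. The `Event` tuple and `is_directed_neighbor` are illustrative names that do not come from the presentation.

```python
from math import hypot
from typing import NamedTuple


class Event(NamedTuple):
    etype: str  # event type, e.g. "A", "B" or "C"
    x: float    # miles (assumed planar coordinates)
    y: float    # miles
    t: float    # minutes


def is_directed_neighbor(a: Event, b: Event,
                         max_dist: float = 0.5,
                         max_lag: float = 2.0) -> bool:
    """True if b occurs after a, within max_lag minutes and max_dist miles."""
    lag = b.t - a.t
    close_in_space = hypot(b.x - a.x, b.y - a.y) <= max_dist
    return 0 < lag <= max_lag and close_in_space


# Hypothetical instances: a bar closing followed by a nearby assault.
bar_closing = Event("B", 0.0, 0.0, 0.0)
assault = Event("A", 0.3, 0.0, 1.5)
print(is_directed_neighbor(bar_closing, assault))  # forward in time
print(is_directed_neighbor(assault, bar_closing))  # reversed direction
```

The relation is directed because only the later event can be a neighbor of the earlier one, which is what gives a CSTP its partial order.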
5
Challenges and Contributions
Challenges
  • Space and Time are continuous
  • Many overlapping ST neighborhoods
  • Neighborhood enumeration is computationally
    challenging
  • Conflicting requirements, e.g., statistical
    interpretation vs. computational scalability
  • Exponential candidate space, e.g., candidate
    CSTPs are exponential in the number of event types

Contributions
  • Interest Measures
  • Statistical Interpretation
  • Computational Structure
  • CSTP Miner Algorithm
  • Filtering Strategies
  • Evaluation
  • Experimental Evaluation
  • Case study

6
Limitations of Related Work: ST Data Mining
Related Work                    ST Sequences       ST Subsets
Partial order                   ✓                  ✗
Multiply connected              ✗                  ✓
Multiple patterns               ✓                  ✓
ST statistical interpretation   ✗ (only spatial)   ✗
  • Limitations
  • ST co-occurrence: treats space and time
    independently; no partial order.
  • ST sequence: does not account for multiply
    connected (e.g., nonlinear) patterns; misses
    non-linear semantics; no ST statistical
    interpretation.

7
Interest Measures
  • Cascade Participation Ratio (CPR): conditional
    probability of observing an instance of the CSTP,
    having seen an instance of A.
  • Cascade Participation Index (CPI): lower bound on
    the conditional probability of observing an
    instance of the CSTP, having seen an instance of
    A, B or C.

[Figure: example CSTP over event types A, B and C]
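The two measures can be sketched as follows, assuming the conditional probability is estimated as the fraction of instances of an event type that take part in at least one CSTP instance, and that the lower bound is the minimum over the event types in the pattern. The data layout and the instance lists are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# One pattern instance maps each event type to an instance id,
# e.g. {"A": "A.1", "B": "B.1", "C": "C.2"}.
Instance = Dict[str, str]


def cascade_participation_ratio(pattern_instances: List[Instance],
                                all_instances: Dict[str, Set[str]],
                                etype: str) -> float:
    """Fraction of instances of etype that appear in some pattern instance."""
    participating = {inst[etype] for inst in pattern_instances}
    return len(participating) / len(all_instances[etype])


def cascade_participation_index(pattern_instances: List[Instance],
                                all_instances: Dict[str, Set[str]],
                                pattern_types: Iterable[str]) -> float:
    """Minimum participation ratio over the event types of the pattern."""
    return min(cascade_participation_ratio(pattern_instances, all_instances, m)
               for m in pattern_types)


# Hypothetical dataset: 4 assaults, 2 bar closings, 4 drunk-driving events.
all_instances = {
    "A": {"A.1", "A.2", "A.3", "A.4"},
    "B": {"B.1", "B.2"},
    "C": {"C.1", "C.2", "C.3", "C.4"},
}
# Hypothetical instances of one CSTP over A, B and C.
pattern_instances = [
    {"A": "A.1", "B": "B.1", "C": "C.2"},
    {"A": "A.1", "B": "B.1", "C": "C.3"},
]
cpi = cascade_participation_index(pattern_instances, all_instances, "ABC")
print(cpi)
```

Here only one of four A instances participates, so A sets the index even though half of the B and C instances participate.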
8
Interest Measures: Statistical Interpretation
Spatial statistics: ST K-Function (Diggle et al.
1995)
  • Cascade Participation Index (CPI) is an upper
    bound to the ST K-Function

Example
[Figure: three example point configurations on X, Y and time axes]
                Example 1   Example 2    Example 3
ST K-Function   2/9         3/9 = 1/3    9/9 = 1
CPI             2/3         1            1
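One way to see why such a bound can hold for a two-type pattern from A to B is sketched below. It assumes the K-function values in the example are neighbor-pair counts normalized by the number of possible pairs, which is an assumption made here to match the fractions shown; the symbols are not from the slides.

```latex
% N_A, N_B : number of instances of event types A and B
% P        : set of neighbor pairs (a, b) under R
% A_P, B_P : instances of A and B that occur in at least one pair of P
\hat{K}_{AB} = \frac{|P|}{N_A N_B},
\qquad
\mathrm{CPI} = \min\!\left( \frac{|A_P|}{N_A},\; \frac{|B_P|}{N_B} \right)

% Every pair joins a participating A with a participating B:
|P| \le |A_P|\,|B_P| \le |A_P|\,N_B
\;\Longrightarrow\;
\hat{K}_{AB} \le \frac{|A_P|}{N_A}

|P| \le |A_P|\,|B_P| \le N_A\,|B_P|
\;\Longrightarrow\;
\hat{K}_{AB} \le \frac{|B_P|}{N_B}

\therefore \quad \hat{K}_{AB} \le \mathrm{CPI}
```

Under this reading, three instances of each type with two neighbor pairs over two distinct A and two distinct B instances give 2/9 for the normalized count and 2/3 for the index.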
9
CSTP Miner Algorithm: Overview
[Flowchart: inputs R, CPI threshold and filtering choice; steps Upper Bound Filter, Candidate Generation, Cycle Checking (cycles removed), Multi-resolution Filter (pruned CSTPs), Compute CPI, Prune CSTP; output Prevalent CSTPs]
  • CPI computation involves an ST join.
  • ST join: sort-merge over time, nested loop over
    space.
  • Computational bottleneck!

Note: using the same strategy as Kuramochi and Karypis '04.
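The ST join described on this slide can be sketched as a sort-merge on time followed by a nested loop over the events in each time window. This is an illustrative version for one ordered pair of event types, assuming Euclidean distance and a strict "later than" test; `st_join` and its parameters are hypothetical names, not the authors' implementation.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def st_join(a_events: List[Point], b_events: List[Point],
            max_dist: float, max_lag: float) -> List[Tuple[int, int]]:
    """Index pairs (i, j) where b_events[j] follows a_events[i]
    within max_lag time units and lies within max_dist of it."""
    a_order = sorted(range(len(a_events)), key=lambda i: a_events[i][2])
    b_order = sorted(range(len(b_events)), key=lambda j: b_events[j][2])
    pairs = []
    start = 0  # first b (in time order) that can still follow the current a
    for i in a_order:
        ax, ay, at = a_events[i]
        # Merge step: drop b events that are not strictly later than this a.
        while start < len(b_order) and b_events[b_order[start]][2] <= at:
            start += 1
        # Nested loop over the time window, testing spatial distance.
        k = start
        while k < len(b_order) and b_events[b_order[k]][2] - at <= max_lag:
            bx, by, _ = b_events[b_order[k]]
            if hypot(bx - ax, by - ay) <= max_dist:
                pairs.append((i, b_order[k]))
            k += 1
    return pairs


# Hypothetical coordinates in miles and time stamps in minutes.
a_events = [(0.0, 0.0, 0.0), (5.0, 5.0, 1.0)]
b_events = [(0.2, 0.0, 1.0), (0.0, 0.1, 5.0), (5.0, 5.3, 2.5)]
joined = st_join(a_events, b_events, max_dist=0.5, max_lag=2.0)
print(joined)
```

Sorting both sides by time lets the window start advance monotonically, so only events inside the time window reach the more expensive distance test.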
10
Filtering strategies
  • Enhance savings: filter non-prevalent CSTPs
    before CPI computation.
  • Before candidate generation: Upper bound
    (UB) filter
  • Key idea: CPI has an anti-monotone upper bound.
  • After candidate generation: Multi-resolution
    ST (MST) filter
  • Key idea: there exists a low-dimensional
    embedding in space and time.
  • Overestimate CPI by coarsening the ST dataset.
  • If overestimate(CPI) < threshold, the CSTP is
    pruned.
11
Evaluation
Goals
  a. What is the effect of the number of event types
    on execution time?
  b. What is the effect of the CPI threshold?
  c. Other experiments: effect of neighborhood
    size, dataset size, grid parameters
  • Real dataset: City of Lincoln, Nebraska, year
    2007
  • Matlab 7.0, X5355 2.66 GHz with 16 GB main
    memory and Linux OS
  • Events within an interval of 10 minutes were
    assigned the same time stamp.
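The 10-minute time stamp assignment could look like the sketch below. It assumes intervals are aligned to a fixed origin (here the start of 2007), which the slides do not specify; `to_time_stamp` is an illustrative name.

```python
from datetime import datetime


def to_time_stamp(when: datetime, origin: datetime,
                  interval_minutes: int = 10) -> int:
    """Index of the fixed-width interval, counted from origin, containing when."""
    minutes = (when - origin).total_seconds() / 60
    return int(minutes // interval_minutes)


origin = datetime(2007, 1, 1)  # assumed start of the dataset
print(to_time_stamp(datetime(2007, 1, 1, 0, 9), origin))       # same stamp as 00:00
print(to_time_stamp(datetime(2007, 1, 1, 0, 10), origin))      # next stamp
print(to_time_stamp(datetime(2007, 1, 1, 0, 19, 59), origin))  # still the next stamp
```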

12
Experimental Analysis
Questions
a. What is the effect of the number of event types?
Fixed parameters: a. CPI = 0.2, b. time
neighborhood = 1750 time stamps.
b. What is the effect of the CPI threshold?
Fixed parameters: a. number of event types = 5, b.
time neighborhood = 1750 time stamps.
Trends
a. Pattern size is exponential in the number of
event types. b. MST filter enhances computational
savings.
13
Lincoln, NE crime dataset: Case study
  • Is bar closing a generator for crime-related
    CSTPs?

[Figure: bar locations in Lincoln, NE]
  • Observation: crime peaks around bar closing!

Questions
  • Is bar closing a crime generator?
  • Are there other generators (e.g., Saturday
    nights)?

K-S test: Saturday night is significantly different
from normal-day bar closing (p-value = 1.249 × 10^-7,
K = 0.41)
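For reference, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the two samples. The sketch below computes only that statistic in plain Python; the slides do not say which quantity was sampled, so the inputs here are placeholder numbers. SciPy's `scipy.stats.ks_2samp` is a library alternative that also returns a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_1: Sequence[float], sample_2: Sequence[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    largest_gap = 0.0
    for value in s1 + s2:
        cdf_1 = bisect_right(s1, value) / len(s1)
        cdf_2 = bisect_right(s2, value) / len(s2)
        largest_gap = max(largest_gap, abs(cdf_1 - cdf_2))
    return largest_gap


# Placeholder samples, not data from the case study.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```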
14
Conclusions
  • Cascading ST Patterns are useful in applications
    like Public Safety and Climate change science.
  • ST Multi-resolution filtering enhances
    computational performance.
  • Complementary filtering strategies.
  • Statistically interpretable interest measure.

Future work
  • New interest measure alternatives.
  • Qualitative Comparison with Graphical Models
    (e.g. Dynamic Bayes Nets, Hidden Markov Models
    etc.)

15
Acknowledgment
  • Members of the Spatial Database and Data Mining
    Research Group, University of Minnesota,
    Twin-Cities.
  • This work was supported by grants from the US Army
    and NSF.

Thank You for your Questions, Comments and
Patience!
16
Crime Report Schema Alignment
  • University of Texas at Dallas

17
Overview
  • Two different tables from two different data
    sources. Our goal is to align attributes between
    the two tables.

Washington DC Incidents Reported
NID    CCN      Offence    Long          Latitude
3768   571398   Arson      38.87010181   -76.9822237
3787   519110   Theft      38.88852      -76.9370033
3779   519097   Burglary   38.95143      -77.0238048

Lincoln_Nebraska Incidents Reported
INC_    Time_   Date_        Team_Area
45111   2124    11-17-2007   Northwest Team
41000   1822    12-2-2007    Center Team

Code    Crime
45111   Arson
41000   Auto-theft
41000   Unauthorized use of motor vehicle
18
Dataset ER Diagram
  • Heterogeneity

[Figure: ER diagrams of the Washington DC and Lincoln datasets. Labels shown: Crime, Crime_type, crime, Incident_2007_reported, Football Match and Bars, connected by "located" relationships]
Crime is an attribute in the Washington DC dataset,
while it is a table in the Lincoln dataset.
19
Schema Alignment
  • Syntactic matching: keyword-based matching on
    crime name
  • Lincoln.CrimeType.IncidentClassification =
    Robbery
  • Washington.Crime = Robbery
  • Semantic matching: semantically relevant
  • A. Specialization vs. generalization
  • Lincoln.CrimeType.IncidentClassification =
    Death
  • Washington.Crime = Homicide
  • Death is a superclass of Homicide
  • B. Finding semantic matching
  • Definition of crimes
  • Using shared words to determine similarity
  • Relevant words
  • Find relevant words using K-medoid clustering
    and Normalized Google Distance (NGD)
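For reference, the Normalized Google Distance of Cilibrasi and Vitányi compares how often two terms occur alone and together in a large corpus. The sketch below applies that formula to hypothetical page counts; how the counts are obtained and how the distance feeds the K-medoid clustering are not shown on the slide.

```python
from math import log


def normalized_google_distance(fx: float, fy: float,
                               fxy: float, n: float) -> float:
    """NGD from hit counts: fx and fy for each term, fxy for both together,
    and n for the total number of indexed pages."""
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # terms that never co-occur are maximally distant
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))


# Hypothetical counts, not real search results.
print(normalized_google_distance(1000, 1000, 1000, 1_000_000))  # always together
print(normalized_google_distance(1000, 1000, 10, 1_000_000))    # rarely together
```

Smaller values mean the two terms tend to appear on the same pages, which is what makes the distance usable for grouping keywords into clusters.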

Jeffrey Partyka, Latifur Khan, Bhavani
Thuraisingham, "Geographically-Typed Semantic
Schema Matching," in Proc. of the ACM SIGSPATIAL
International Conference on Advances in
Geographic Information Systems (ACM GIS 2009),
Seattle, Washington, USA, November 2009.
Extended version submitted to Journal of Web
Semantics, Springer.
20
I. Finding Semantic Matching using Definition of
Crime
  • Finding shared words to determine similarity
  • Larceny-Theft: Unlawful taking, carrying,
    leading, or riding away of property from the
    possession or constructive possession of another;
    attempts to do these acts are included in the
    definition. [1]
  • Theft: Illegal taking of another person's
    property without that person's freely-given
    consent. [2]
  • Assault: An act that causes another to
    apprehend an immediate harmful contact. [3]
  • Red keywords are common words in crime
    definitions, while blue keywords are not common.
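A rough sketch of scoring definitions by shared words, using the three definitions quoted on this slide. The stop-word list and the Jaccard ratio are assumptions chosen for illustration; the slide only states that shared words determine similarity.

```python
import re

# Small illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "or", "to", "from", "that",
              "without", "another", "in", "is", "are", "these", "do"}


def keywords(definition: str) -> set:
    """Lower-cased words of a definition, minus stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower())
    return {w for w in words if w not in STOP_WORDS}


def shared_word_similarity(def_1: str, def_2: str) -> float:
    """Jaccard overlap between the keyword sets of two definitions."""
    k1, k2 = keywords(def_1), keywords(def_2)
    union = k1 | k2
    return len(k1 & k2) / len(union) if union else 0.0


larceny = ("Unlawful taking, carrying, leading, or riding away of property "
           "from the possession or constructive possession of another")
theft = ("Illegal taking of another person's property without that "
         "person's freely-given consent")
assault = "An act that causes another to apprehend an immediate harmful contact"

print(keywords(larceny) & keywords(theft))
print(shared_word_similarity(larceny, theft))
print(shared_word_similarity(larceny, assault))
```

With these definitions, Larceny-Theft and Theft share "taking" and "property", while Larceny-Theft and Assault share no keywords.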

[1] http://www.fbi.gov/ucr/cius_04/offenses_reported/property_crime/larceny-theft.html
[2] http://en.wikipedia.org/wiki/Theft
[3] http://en.wikipedia.org/wiki/Assult
21

II. K-medoid NGD Instance Similarity
Step 1: Extract distinct keywords from compared columns

C1 (Washington DC)
Offence    Long          Latitude
Arson      38.87010181   -76.9822237
Theft      38.88852      -76.9370033
Burglary   38.95143      -77.0238048

C2 (Lincoln)
INC_    Team_Area
Arson   Northwest Team
Theft   Center Team

Keywords extracted from columns: Arson, Theft,
Stolen, ...

Step 2: Group distinct keywords together into semantic
clusters
C1: Arson, Theft, Burglary, ...
C2: Arson, Theft, Northwest, ...
Clusters are formed over C1 ∪ C2.

Step 3: Calculate similarity
Similarity = H(CT) / H(C)
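One possible reading of the similarity ratio on this slide, stated here as an assumption, is the conditional entropy of the column label C given the cluster T, divided by the entropy of C. Under that reading the computation could be sketched as follows; the cluster assignments are hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Tuple


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(pairs: List[Tuple[str, str]]) -> float:
    """H(C | T) for (cluster, column) pairs: entropy of the column label
    inside each cluster, weighted by cluster size."""
    by_cluster = {}
    for cluster, column in pairs:
        by_cluster.setdefault(cluster, []).append(column)
    n = len(pairs)
    return sum(len(cols) / n * entropy(cols) for cols in by_cluster.values())


def entropy_ratio(pairs: List[Tuple[str, str]]) -> float:
    """H(C | T) / H(C); defined as 0.0 when only one column is present."""
    h_c = entropy([column for _, column in pairs])
    return conditional_entropy(pairs) / h_c if h_c > 0 else 0.0


# Hypothetical cluster assignments of keywords from columns C1 and C2.
mixed = [("t1", "C1"), ("t1", "C2"), ("t2", "C1"), ("t2", "C2")]
separated = [("t1", "C1"), ("t1", "C1"), ("t2", "C2"), ("t2", "C2")]
print(entropy_ratio(mixed), entropy_ratio(separated))
```

A ratio near 1 means each cluster mixes keywords from both columns, which suggests the columns are similar; a ratio near 0 means the clusters separate the columns.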
22
Lincoln, NE crime dataset: Case study
Pop I            Pop II           KS       P-value         α = 0.05   α = 0.2
Sat Night        All Year         0.4187   1.249 × 10^-7   Yes        Yes
Football Night   All Year         0.3400   0.1067          No         Yes
Sat Night        Football Night   0.1987   0.7899          No         No
23
Limitations of Related Work: Traditional Data
Mining
[Figure: instances A.1 to A.4, B.1, B.2 and C.1 to C.4 on X, Y and T (time) axes, divided by a space partition and a time partition]
  • Properties
  • Related Work
  • Transaction Graph Mining.
  • Transaction is a core concept.
  • Sequential pattern mining.
  • Support as an interest measure.
  • Limitations
  • Transactionization of a continuous framework
    leads to non-empty cutsets.
  • Support (frequency) leads to double counting of
    overlapping edges.

24
Graph Mining Limitations
[Figure: input dataset space (GI) with instances A.1, A.2, B.1, C.2 and C.3; output frequent pattern space (GF) with embeddings such as (A.1, B.1) and (C.2, B.1), and other patterns; overlap graph of embeddings E11, E12 and E13]
Maximum Independent Set Choices
Choice 1: {E11, E13}, size 2. Choice 2: {E12}, size 1.
Properties
  1. MIS choices are non-unique.
  2. No statistical interpretation.
  3. Exact solutions are NP-complete; approximate
    solutions need not be complete.
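The non-uniqueness can be illustrated by enumerating the maximal independent sets of a small overlap graph. The edges below are an assumption chosen to be consistent with the two choices on the slide (E12 overlapping both E11 and E13); brute-force enumeration is only practical for tiny graphs.

```python
from itertools import combinations
from typing import List, Set, Tuple


def maximal_independent_sets(nodes: List[str],
                             edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """All independent sets that cannot be extended by another node."""
    adjacent = {frozenset(e) for e in edges}

    def independent(subset) -> bool:
        return all(frozenset(pair) not in adjacent
                   for pair in combinations(subset, 2))

    candidates = [set(s)
                  for size in range(1, len(nodes) + 1)
                  for s in combinations(nodes, size)
                  if independent(s)]
    # Keep only sets that are not strictly contained in another independent set.
    return [s for s in candidates if not any(s < other for other in candidates)]


# Assumed overlap graph: E12 overlaps both E11 and E13.
embeddings = ["E11", "E12", "E13"]
overlaps = [("E11", "E12"), ("E12", "E13")]
choices = maximal_independent_sets(embeddings, overlaps)
print([sorted(c) for c in choices])
```

A procedure that picks E12 first reports a support of 1, while one that picks E11 and E13 reports 2, so the count depends on the choice made.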
25
Related Work
[Figure: taxonomy of data mining. Labels shown: Data Mining, Spatio-temporal DM, Traditional DM, Sequences, Subsets, Graphs, Association, Our Work, Transaction, Single Graph, Single Pattern, Multiple Patterns, MIS Patterns, Process Mining]
Limitations of Related Work
Related Work                    ST Sequences       ST Subsets   Process Mining   MIS Graph Patterns
Partial order                   ✗                  ✗            ✓                ✗
Multiply connected              ✗                  ✓            ✓                ✗ (undirected graphs)
Multiple patterns               ✓                  ✓            ✗                ✓
ST statistical interpretation   ✗ (only spatial)   ✗            ✗                ✗
26
More experimental analysis
More Questions
  c. What is the effect of temporal neighborhood
    size?
  d. What is the effect of spatial neighborhood size?
  e. What is the effect of dataset size?

Fixed parameters: a. CPI = 0.2, b. time
neighborhood = 1750 time stamps, c. spatial
neighborhood = 7 miles, d. dataset size = 4083
instances, e. event types = 5
Trends
  1. MST filter enhances computational savings.
  2. Performance is sensitive to time neighborhood size.
  3. Performance is not very sensitive to spatial
    neighborhood size.

27
Sensitivity of MST filter to grid parameters
  • What is the sensitivity to MST grid parameter d?
    Fixed parameter: grid parameter t = 2000 time
    stamps.
  • What is the sensitivity to MST grid parameter t?
    Fixed parameter: grid parameter d = 7 miles.

Trends
  1. MST filter is more sensitive to the temporal
    parameter (t) than the spatial parameter (d).
