Title: Cascading spatio-temporal pattern discovery: A summary of results
1 Cascading spatio-temporal pattern discovery: A summary of results
- Pradeep Mohan¹, Shashi Shekhar¹, James A. Shine², James P. Rogers²
- ¹University of Minnesota, Twin-Cities, mohan,shekhar_at_cs.umn.edu
- ²Engineering Research and Development Center, Alexandria, VA, James.A.Shine, James.P.Rogers.II_at_usace.army.mil
2 Outline
- Introduction
- Motivation
- Problem Statement
- Related Work
- Interest Measure
- CSTP Miner Algorithm
- Evaluation and Case Study
- Conclusion and Future Work
3 Motivation: Public Safety
[Figure: instances of Assault (A.1 to A.4), Bar Closing (B.1, B.2), and Drunk Driving (C.1 to C.4) events observed at times T1, T2, and T3]
Cascading spatio-temporal pattern (CSTP)
- Partially ordered subsets of ST event types.
- Located together in space.
- Occur in stages over time.
[Figure: example CSTP over Bar Closing, Drunk Driving, and Assault]
Stages: Bar Closing, Assault, Drunk Driving, Hurricane, Climate change, etc.
Other applications: climate change, epidemiology, evacuation planning.
4 Problem Definition
- Input: a) ST framework, b) directed ST neighbor relation R, c) interest measure threshold
- Output: a set of CSTPs with interestingness > threshold
- Objective: a) minimize computation costs while discovering statistically meaningful CSTPs.
- Constraints: a) correctness and completeness
Example
[Figure: ST join (R) over instances of event types A, B, and C]
R: 0.5 miles, 2 min.
Threshold: 0.5
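The directed ST neighbor relation R from the example can be sketched as a simple predicate. This is a minimal sketch under stated assumptions: planar coordinates in miles, time in minutes, Euclidean distance, and an edge only from an earlier event to a strictly later one. The `Event` type and the `is_st_neighbor` name are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from math import hypot

@dataclass(frozen=True)
class Event:
    etype: str   # event type, e.g. "A", "B", "C"
    x: float     # planar coordinate in miles (assumed)
    y: float
    t: float     # time in minutes (assumed)

def is_st_neighbor(src: Event, dst: Event,
                   max_dist: float = 0.5, max_lag: float = 2.0) -> bool:
    """True if dst occurs after src, within max_lag minutes and max_dist miles."""
    lag = dst.t - src.t
    close_in_space = hypot(dst.x - src.x, dst.y - src.y) <= max_dist
    return 0 < lag <= max_lag and close_in_space

b1 = Event("B", 0.0, 0.0, 0.0)
a1 = Event("A", 0.3, 0.0, 1.5)   # 0.3 miles away, 1.5 min later
c1 = Event("C", 2.0, 0.0, 1.0)   # later, but too far away

print(is_st_neighbor(b1, a1))    # B.1 -> A.1
print(is_st_neighbor(a1, b1))    # the relation is directed
print(is_st_neighbor(b1, c1))
```

Because the relation is directed in time, swapping the arguments changes the answer, which is what allows a pattern to be read as a cascade rather than a plain co-occurrence.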
5 Challenges and Contributions
Challenges
- Space and time are continuous
- Many overlapping ST neighborhoods
- Neighborhood enumeration is computationally challenging
  - E.g., statistical interpretation vs. computational scalability
- Exponential candidate space
  - E.g., candidate CSTPs are exponential in the number of event types
Contributions
- Interest Measures
  - Statistical interpretation
  - Computational structure
- CSTP Miner Algorithm
  - Filtering strategies
- Evaluation
  - Experimental evaluation
  - Case study
6 Limitations of Related Work: ST Data Mining
Related Work                    ST Sequences       ST Subsets
Partial Order                   ✓                  ✗
Multiply connected              ✗                  ✓
Multiple patterns               ✓                  ✓
ST Statistical Interpretation   ✗ (only spatial)   ✗
- ST Co-occurrence
  - Treating space and time independently.
  - Absence of partial order.
- ST Sequence
  - Does not account for multiply connected patterns (e.g., nonlinear).
  - Misses non-linear semantics.
  - No ST statistical interpretation.
7 Interest Measures
- Cascade Participation Ratio (CPR): conditional probability of observing an instance of the CSTP, having seen an instance of A.
- Cascade Participation Index (CPI): lower bound on the conditional probability of observing an instance of the CSTP, having seen an instance of A, B, or C.
[Figure: example CSTP over event types A, B, and C]
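The two measures can be illustrated with a short computation. This is a minimal sketch that assumes CPR for an event type is the fraction of that type's instances taking part in at least one instance of the CSTP, and that CPI is the minimum CPR over the pattern's event types (which makes it the lower bound described above). The function name and the instance data are hypothetical.

```python
def cascade_participation(pattern_instances, totals):
    """Compute CPR per event type and the CPI of one pattern.

    pattern_instances: list of dicts {event_type: instance_id}, one per
        instance of the CSTP found in the data.
    totals: dict {event_type: total number of instances in the dataset}.
    """
    cpr = {}
    for etype, total in totals.items():
        # Distinct instances of this type that take part in the pattern.
        participating = {inst[etype] for inst in pattern_instances}
        cpr[etype] = len(participating) / total
    return cpr, min(cpr.values())

# Hypothetical pattern instances for a CSTP over A, B, and C.
instances = [
    {"A": "A.1", "B": "B.1", "C": "C.1"},
    {"A": "A.1", "B": "B.1", "C": "C.2"},
    {"A": "A.3", "B": "B.2", "C": "C.3"},
]
totals = {"A": 4, "B": 2, "C": 4}

cpr, cpi = cascade_participation(instances, totals)
print(cpr)   # per-type participation ratios
print(cpi)   # the smallest ratio
```

Counting distinct participating instances, rather than pattern instances, keeps an event that appears in several pattern instances from being counted more than once.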
8 Interest Measures: Statistical Interpretation
Spatial statistics: ST K-Function (Diggle et al. 1995)
- Cascade Participation Index (CPI) is an upper bound to the ST K-Function.
Example
[Figure: three example event configurations plotted on X, Y, and time axes]
                Example 1   Example 2   Example 3
ST K-Function   2/9         3/9 = 1/3   9/9 = 1
CPI             2/3         1           1
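The example values can be reproduced with a small calculation. This is a sketch under assumptions: the K-function value is taken as the fraction of all (A, B) instance pairs that satisfy the neighbor relation (intensity and volume scaling omitted), CPI is the minimum participation ratio, and the three neighbor configurations are hypothetical ones chosen to match the fractions on the slide for 3 instances of each type.

```python
from fractions import Fraction

def k_and_cpi(n_a, n_b, neighbor_pairs):
    """Return (normalized pair fraction, CPI) for a two-type pattern.

    neighbor_pairs: iterable of (a_index, b_index) pairs that satisfy
    the ST neighbor relation.
    """
    pairs = set(neighbor_pairs)
    k = Fraction(len(pairs), n_a * n_b)
    pr_a = Fraction(len({a for a, _ in pairs}), n_a)
    pr_b = Fraction(len({b for _, b in pairs}), n_b)
    return k, min(pr_a, pr_b)

# Hypothetical configurations with 3 instances of A and 3 of B.
sparse = [(0, 0), (1, 1)]                              # two neighboring pairs
matched = [(0, 0), (1, 1), (2, 2)]                     # one-to-one neighbors
dense = [(a, b) for a in range(3) for b in range(3)]   # every pair neighbors

for name, pairs in [("sparse", sparse), ("matched", matched), ("dense", dense)]:
    k, cpi = k_and_cpi(3, 3, pairs)
    print(name, k, cpi, k <= cpi)
```

Under this normalization the pair fraction can never exceed CPI, since the number of neighboring pairs is at most the product of the participating instance counts.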
9 CSTP Miner Algorithm: Overview
[Flowchart: inputs are R, the CPI threshold, and the filtering choice. Steps: Upper Bound Filter -> Candidate Generation -> Cycle checking (cycles removed) -> Multi-resolution Filter (pruned CSTPs) -> Compute CPI -> Prune CSTP -> Prevalent CSTPs]
- CPI computation involves ST Join.
- ST Join
  - Sort-merge over time.
  - Nested loop over space.
- Computational bottleneck!
(Using the same strategy as Kuramochi and Karypis '04.)
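The ST join described above (sort-merge over time, nested loop over space) might look like the following. This is a minimal sketch, not the authors' implementation: it assumes events are `(id, x, y, t)` tuples, Euclidean distance, and a directed relation in which the destination event is strictly later than the source. The `st_join` name and the sample events are hypothetical.

```python
from math import hypot

def st_join(src, dst, max_dist, max_lag):
    """Directed ST join between two event lists.

    Returns (src_id, dst_id) pairs with 0 < dst.t - src.t <= max_lag
    and Euclidean distance <= max_dist.
    """
    src = sorted(src, key=lambda e: e[3])   # sort both sides by time
    dst = sorted(dst, key=lambda e: e[3])
    pairs = []
    start = 0
    for sid, sx, sy, st in src:
        # Merge step: skip dst events that are not strictly later than src.
        while start < len(dst) and dst[start][3] <= st:
            start += 1
        # Nested loop over space, restricted to the current time window.
        j = start
        while j < len(dst) and dst[j][3] <= st + max_lag:
            did, dx, dy, _ = dst[j]
            if hypot(dx - sx, dy - sy) <= max_dist:
                pairs.append((sid, did))
            j += 1
    return pairs

bars = [("B.1", 0.0, 0.0, 0.0), ("B.2", 5.0, 5.0, 10.0)]
assaults = [("A.1", 0.2, 0.0, 1.0), ("A.2", 0.1, 0.1, 5.0), ("A.3", 5.1, 5.0, 11.0)]
result = st_join(bars, assaults, max_dist=0.5, max_lag=2.0)
print(result)
```

Sorting by time lets the window start pointer move forward only, but every event inside the window still needs a distance test, which is why the join remains the bottleneck.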
10 Filtering strategies
- Enhance savings: filter non-prevalent CSTPs before CPI computation.
- Before candidate generation: Upper bound (UB) filter
  - Key idea: CPI has an anti-monotone upper bound.
- After candidate generation: Multi-resolution ST (MST) filter
  - Key idea: there exists a low-dimensional embedding in space and time.
  - Overestimate CPI by coarsening the ST dataset.
  - If Overestimate(CPI) < threshold: pruned.
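The multi-resolution idea can be sketched for a two-type pattern A -> B. This is an illustration under assumptions rather than the MST filter itself: events are `(x, y, t)` tuples, both lists are non-empty, and the grid cell size is set equal to the neighborhood size (d in space, t in time), so that any true neighbor pair must fall in the same or adjacent cells. The function name is hypothetical.

```python
from math import floor

def coarse_cpi_upper_bound(a_events, b_events, d, t):
    """Overestimate the CPI of A -> B using only grid-cell membership."""
    def cell(event):
        x, y, ts = event
        return (floor(x / d), floor(y / d), floor(ts / t))

    a_cells = [cell(e) for e in a_events]
    b_cells = [cell(e) for e in b_events]
    a_set, b_set = set(a_cells), set(b_cells)
    steps = [(i, j, k) for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (0, 1)]

    def a_may_participate(c):
        # A neighbor B is at most one cell away in space and
        # zero or one cell later in time.
        return any((c[0] + i, c[1] + j, c[2] + k) in b_set for i, j, k in steps)

    def b_may_participate(c):
        # Mirror image: the A event is zero or one cell earlier in time.
        return any((c[0] + i, c[1] + j, c[2] - k) in a_set for i, j, k in steps)

    pr_a = sum(a_may_participate(c) for c in a_cells) / len(a_cells)
    pr_b = sum(b_may_participate(c) for c in b_cells) / len(b_cells)
    return min(pr_a, pr_b)

a_events = [(0.1, 0.1, 0.5), (10.0, 10.0, 0.5)]
b_events = [(0.3, 0.2, 1.5)]
bound = coarse_cpi_upper_bound(a_events, b_events, d=0.5, t=2.0)
print(bound)          # coarse overestimate of CPI
print(bound < 0.6)    # True means the pattern is pruned at threshold 0.6
```

The bound is safe for pruning because coarsening can only add apparent neighbors, never remove real ones; a pattern whose overestimate is already below the threshold cannot pass the exact test.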
11 Evaluation
Goals
a. What is the effect of event types on execution time?
b. What is the effect of CPI threshold?
c. Other experiments: effect of neighborhood size, dataset size, grid parameters
- Real dataset: City of Lincoln, Nebraska, year 2007
- Matlab 7.0, X5355 2.66 GHz with 16 GB main memory and Linux OS
- Events within an interval of 10 minutes were assigned the same time stamp.
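The time-stamp assignment can be sketched as a binning step. This is a minimal sketch that assumes fixed, non-overlapping 10-minute intervals measured from the start of the dataset, with event times given as offsets in seconds; the slides do not state how interval boundaries were chosen, and the function name is hypothetical.

```python
def to_time_stamp(seconds_since_start: float, bin_minutes: int = 10) -> int:
    """Map a time offset in seconds to the index of its fixed-width interval."""
    return int(seconds_since_start // (bin_minutes * 60))

# Two events 9 minutes apart share a time stamp; the third falls in the next bin.
print(to_time_stamp(30), to_time_stamp(570), to_time_stamp(600))
```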
12 Experimental Analysis
Questions
a. What is the effect of event types?
   Fixed parameters: a. CPI = 0.2, b. time neighborhood = 1750 time stamps.
b. What is the effect of CPI threshold?
   Fixed parameters: a. number of event types = 5, b. time neighborhood = 1750 time stamps.
Trends
a. Pattern size is exponential in the number of event types.
b. MST filter enhances computational savings.
13 Lincoln, NE crime dataset: Case study
- Is bar closing a generator for crime-related CSTPs?
[Figure: bar locations in Lincoln, NE]
- Observation: crime peaks around bar closing!
Questions
- Is bar closing a crime generator?
- Are there other generators (e.g., Saturday nights)?
K-S test: Saturday night is significantly different from normal-day bar closing (P-value = 1.249 x 10^-7, K = 0.41).
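The K value reported above is read here as the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical distribution functions. The following is a minimal sketch of that statistic only; the sample values are hypothetical and are not the Lincoln data. The P-value is not computed here (a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the P-value).

```python
from bisect import bisect_right

def ks_statistic(sample1, sample2):
    """Two-sample K-S statistic: the maximum absolute difference between
    the empirical cumulative distribution functions of the samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    gap = 0.0
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / len(s1)   # share of sample1 <= x
        f2 = bisect_right(s2, x) / len(s2)   # share of sample2 <= x
        gap = max(gap, abs(f1 - f2))
    return gap

# Hypothetical crime counts around bar closing time.
saturday_nights = [9, 11, 12, 14]
normal_days = [3, 4, 5, 12]
print(ks_statistic(saturday_nights, normal_days))
```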
14 Conclusions
- Cascading ST patterns are useful in applications like public safety and climate change science.
- ST multi-resolution filtering enhances computational performance.
- Complementary filtering strategies.
- Statistically interpretable interest measure.
Future work
- New interest measure alternatives.
- Qualitative comparison with graphical models (e.g., Dynamic Bayes Nets, Hidden Markov Models, etc.)
15 Acknowledgment
- Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities.
- This work was supported by grants from the US Army and NSF.
Thank you for your questions, comments, and patience!
16 Crime Report Schema Alignment
- University of Texas at Dallas
17 Overview
- Two different tables from two different data sources. Our goal is to align attributes between the two tables.
Washington DC Incidents Reported
NID    CCN      Offence    Long          Latitude
3768   571398   Arson      38.87010181   -76.9822237
3787   519110   Theft      38.88852      -76.9370033
3779   519097   Burglary   38.95143      -77.0238048
Lincoln_Nebraska Incidents Reported
INC_    Time_   Date_        Team_Area
45111   2124    11-17-2007   Northwest Team
41000   1822    12-2-2007    Center Team
Code    Crime
45111   Arson
41000   Auto-theft
41000   Unauthorized use of motor vehicle
18 Dataset ER Diagram
[Figure: ER diagrams for the Washington DC and Lincoln datasets. Elements shown include Incident_2007_reported, Crime, Crime_type, Football Match, and Bars, connected by "located" relationships.]
Crime is an attribute in the Washington DC dataset, while it is a table in the Lincoln dataset.
19 Schema Alignment
- Syntactic matching: keyword-based matching on crime name
  - Lincoln.CrimeType.IncidentClassification = Robbery
  - Washington.Crime = Robbery
- Semantic matching: semantically relevant
- A. Specialization vs. generalization
  - Lincoln.CrimeType.IncidentClassification = Death
  - Washington.Crime = Homicide
  - Death is a superclass of Homicide
- B. Finding semantic matching
  - Definition of crimes
    - Using shared words to determine similarity
  - Relevant words
    - Find relevant words using K-medoid clustering and Normalized Google Distance (NGD)
Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, Geographically-Typed Semantic Schema Matching, In Proc. of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009. Extended version submitted to Journal of Web Semantics, Springer.
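For reference, the Normalized Google Distance used for the relevant-words step can be computed from search hit counts. This is a minimal sketch of the standard NGD formula; the hit counts and index size below are hypothetical numbers, and the sketch assumes all counts are positive (a zero co-occurrence count would need separate handling).

```python
from math import log

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance.

    fx, fy: number of pages containing term x, term y.
    fxy:    number of pages containing both terms.
    n:      total number of indexed pages.
    """
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))

# Hypothetical counts: terms that always co-occur have distance 0,
# and rarer co-occurrence gives a larger distance.
print(ngd(1000, 1000, 1000, 10**6))
print(ngd(1000, 1000, 500, 10**6))
print(ngd(1000, 1000, 10, 10**6))
```

A pairwise distance of this kind is what a K-medoid clustering step would consume when grouping keywords.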
20 I. Finding Semantic Matching using Definition of Crime
- Finding shared words to determine similarity
- Larceny-Theft: Unlawful taking, carrying, leading, or riding away of property from the possession or constructive possession of another; attempts to do these acts are included in the definition. [1]
- Theft: Illegal taking of another person's property without that person's freely-given consent. [2]
- Assault: An act that causes another to apprehend an immediate harmful contact. [3]
- Red keywords are common words in crime definitions, while blue keywords are not common.
[1] http://www.fbi.gov/ucr/cius_04/offenses_reported/property_crime/larceny-theft.html
[2] http://en.wikipedia.org/wiki/Theft
[3] http://en.wikipedia.org/wiki/Assult
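One way to score shared words between definitions is a Jaccard overlap of their content words. The slide does not name a specific measure, so Jaccard similarity and the small stopword list below are assumptions made for illustration; the definitions are the ones quoted above.

```python
import re

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "or", "to", "from", "in",
             "that", "are", "do", "these", "without"}

def content_words(definition):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w for w in words if w not in STOPWORDS}

def shared_word_similarity(def1, def2):
    """Jaccard similarity between the content-word sets of two definitions."""
    w1, w2 = content_words(def1), content_words(def2)
    return len(w1 & w2) / len(w1 | w2)

larceny = ("Unlawful taking, carrying, leading, or riding away of property "
           "from the possession or constructive possession of another")
theft = ("Illegal taking of another person's property without that "
         "person's freely-given consent")
assault = "An act that causes another to apprehend an immediate harmful contact"

print(shared_word_similarity(larceny, theft))     # shares taking, property, another
print(shared_word_similarity(larceny, assault))   # shares only another
```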
21 II. K-medoid NGD Instance Similarity
Step 1: Extract distinct keywords from compared columns (C1, C2)
Washington DC
Offence    Long          Latitude
Arson      38.87010181   -76.9822237
Theft      38.88852      -76.9370033
Burglary   38.95143      -77.0238048
Lincoln
INC_    Team_Area
Arson   Northwest Team
Theft   Center Team
Keywords extracted from columns: Arson, Theft, Stolen, ...
Step 2: Group distinct keywords together into semantic clusters
C1 U C2: {Arson, Theft, Burglary, ...}, {Arson, Theft, Northwest, ...}
Step 3: Calculate similarity
Similarity = H(C|T) / H(C)
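Step 3 can be sketched as an entropy ratio. This sketch reads the similarity formula as the conditional entropy H(C|T) divided by H(C), where C is the column a keyword came from (C1 or C2) and T is the cluster it was assigned to; that reading, the function names, and the sample assignments are assumptions. It also assumes keywords from both columns are present, so that H(C) is non-zero.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

def entropy_ratio_similarity(assignments):
    """assignments: list of (column, cluster) pairs, one per keyword.
    Returns H(C|T) / H(C)."""
    n = len(assignments)
    h_c = entropy([col for col, _ in assignments])
    h_c_given_t = 0.0
    for cluster in {t for _, t in assignments}:
        members = [col for col, t in assignments if t == cluster]
        h_c_given_t += len(members) / n * entropy(members)
    return h_c_given_t / h_c

# Clusters that mix keywords from both columns suggest similar columns.
mixed = [("C1", "crime"), ("C2", "crime"), ("C1", "place"), ("C2", "place")]
# Clusters that separate the columns suggest dissimilar columns.
separated = [("C1", "crime"), ("C1", "crime"), ("C2", "place"), ("C2", "place")]

print(entropy_ratio_similarity(mixed))
print(entropy_ratio_similarity(separated))
```

The ratio is high when knowing a keyword's cluster says little about which column it came from, and drops toward zero when the clusters split cleanly by column.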
22 Lincoln, NE crime dataset: Case study
Pop. I           Pop. II          KS       P-value          α = 0.05   α = 0.2
Sat Night        All Year         0.4187   1.249 x 10^-7    Yes        Yes
Football Night   All Year         0.3400   0.1067           No         Yes
Sat Night        Football Night   0.1987   0.7899           No         No
23 Limitations of Related Work: Traditional Data Mining
[Figure: event instances A.1 to A.4, B.1, B.2, and C.1 to C.4 in an X-Y-T framework, divided by a space partition and a time partition]
- Transaction Graph Mining.
  - Transaction is a core concept.
- Sequential pattern mining.
  - Support as an interest measure.
- Transactionization of a continuous framework -> non-empty cutsets.
- Support (frequency) leads to double counting of overlapping edges.
24 Graph Mining: Limitations
[Figure: input dataset space (GI) with instances A.1, A.2, B.1, C.2, C.3; output frequent pattern space (GF), and other patterns; overlap graphs of embeddings E11, E12, E13, built from instance pairs such as (A.1, B.1) and (C.2, B.1)]
Maximum Independent Set choices
- Choice 1: E11, E13 (count 2)
- Choice 2: E12 (count 1)
Properties
- MIS choices are non-unique.
- No statistical interpretation.
- Exact solutions are NP-complete; approximate solutions need not be complete.
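The non-uniqueness of the independent-set choices can be reproduced on a toy overlap graph. This is a brute-force sketch for illustration only: it assumes E12 overlaps both E11 and E13 (the overlap edges are inferred from the two choices listed, not stated on the slide) and enumerates maximal independent sets, i.e. sets that cannot be extended, which is what a greedy approximation may return.

```python
from itertools import combinations

def maximal_independent_sets(nodes, overlaps):
    """Enumerate all maximal independent sets of a small overlap graph."""
    edges = {frozenset(e) for e in overlaps}

    def independent(subset):
        return all(frozenset(p) not in edges for p in combinations(subset, 2))

    candidates = [set(s)
                  for r in range(1, len(nodes) + 1)
                  for s in combinations(nodes, r)
                  if independent(s)]
    # Keep only sets that are not strictly contained in another independent set.
    return [s for s in candidates if not any(s < other for other in candidates)]

embeddings = ["E11", "E12", "E13"]
overlaps = [("E11", "E12"), ("E12", "E13")]   # assumed overlap edges

choices = maximal_independent_sets(embeddings, overlaps)
for choice in choices:
    print(sorted(choice), "count", len(choice))
```

Depending on which set is picked, the same pattern is counted once or twice, so a support value derived this way is not well defined without solving the exact (NP-complete) maximum problem.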
25 Related Work
[Figure: taxonomy of related work. Data Mining splits into Spatio-temporal DM and Traditional DM; nodes include Sequences, Subsets, Graphs, Association, Transaction, Single Graph, Single Pattern, Multiple Patterns, MIS Patterns, Process Mining, and Our Work.]
Limitations of Related Work
Related Work                    ST Sequences       ST Subsets   Process Mining   MIS Graph Patterns
Partial Order                   ✗                  ✗            ✓                ✗
Multiply connected              ✗                  ✓            ✓                ✗ (undirected graphs)
Multiple patterns               ✓                  ✓            ✗                ✓
ST Statistical Interpretation   ✗ (only spatial)   ✗            ✗                ✗
26 More experimental analysis
More questions
c. What is the effect of temporal neighborhood size?
- What is the effect of spatial neighborhood size?
- What is the effect of dataset size?
Fixed parameters: a. CPI = 0.2, b. time neighborhood = 1750 time stamps, c. spatial neighborhood = 7 miles, d. dataset size = 4083 instances, e. event types = 5
Trends
- MST filter enhances computational savings.
- Performance is sensitive to time neighborhood size.
- Performance is not very sensitive to spatial neighborhood size.
27 Sensitivity of MST filter to grid parameters
- What is the sensitivity to MST grid parameter d?
  Fixed parameters: a. grid parameter t = 2000 time stamps
- What is the sensitivity to MST grid parameter t?
  Fixed parameters: a. grid parameter d = 7 miles
Trends
- MST filter is more sensitive to the temporal parameter (t) than the spatial parameter (d).