Title: Mining spatiotemporal cascade patterns
1Mining spatio-temporal cascade patterns
2Outline
- Motivation
- Modeling Spatio-Temporal Cascade Pattern
- Spatio Temporal Cascade Pattern Mining Problem
- Challenges
- Related Work
- Contributions
- Key Concepts
- Proposed Approach
- Validation Methodology
- Validation of Interest Measures
- Analytical Validation of Algorithms
- Conclusion and Further Work
3Motivation
- Application Domains: crime (crime linkage), military (insurgent attack patterns), ecology (preservation of endangered species), power systems (cascading blackouts)
- Example: Military
- Understanding insurgent attack patterns.
- Understanding global and local trends in attacks.
- Predicting future locations of attacks.
- Example: Crime
- Crime linkage analysis.
- Relationships between different types of crime (drunk driving, hit and run, homicide, shop breaking).
- Example: Power Systems
- Transmission data analysis.
- Identifying sequences of faults for a potential cascading blackout.
- Preventing blackouts.
Source: http://www.ferc.gov/EventCalendar/Files/20040414101846-blackout.pps
4Motivation: Tactical Crime Analysis
- Geographic Profiling: Step of Scenario Selection (Rossmo 2006)
- Scenario selection: optimal subset of crime sites to be profiled.
- Performs crime series linkage analysis.
- Requires human intervention and manually building separate scenarios.
- No existing measure for the validity of crime linkage.
- Outliers complicate the problem.
- Tactical Crime Analyst Questions
- Can we prune unnecessary data in an unbiased way?
- Are a particular set of crimes part of a single series?
- Are these serial crimes committed by one individual or a group of individuals operating together?
- How can we identify and eliminate outliers?
Source: http://pt.wikipedia.org/wiki/Ataques_a_Tiros_em_Beltway
5Modeling Spatio-Temporal Cascade Patterns: Iraq Insurgency Example
- Cascade Patterns
- Series of attacks by US troops (A)
- Series of attacks on US troops (B)
- Series of attacks on civilian facilities (C)
- Series of suicide attacks (D)
- Series of communal clashes (E)
- Attacks on Iraqi troops (F)
Source: www.tribuneindia.com
6Modeling Spatio-Temporal Cascade Patterns: Lincoln Crime sample dataset
[Figure: snapshots of the dataset at 01:00 AM, 01:30 AM, 02:00 AM, and 02:30 AM]
7Sample Data Records
8Modeling Spatio-Temporal Cascade Patterns: An Example Cascade Pattern
- Event types: Bar Closing (B), Auto Crime (A), Assault (C), Vandalism (D), Other Crimes (E)
[Figure panels: Continuous Sub-cascades; Cascade Pattern]
9Spatio Temporal Cascade Pattern Mining Problem
- Given
- A spatio-temporal event database.
- A set of M spatio-temporal event types.
- A spatial neighborhood relation S? and a temporal neighborhood relation T?.
- A time window (interval) Tw (> T?).
- A spatio-temporal co-occurrence prevalence threshold.
- A spatio-temporal link prevalence threshold.
- A cascade prevalence threshold CP?.
- Find
- All spatio-temporal cascade (ST-Cascade) patterns with cascade prevalence > CP?,
- where every component spatio-temporal co-occurrence has prevalence above the co-occurrence prevalence threshold and every inter-component link has spatio-temporal link prevalence above the link prevalence threshold.
- Objective
- Statistical Significance of interest measures.
- Minimize Computational Cost
- Constraints
- Correctness
- Completeness
- Monotonic Composite Multi-dimensional Interest
Measure
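The "Find" condition above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the `Thresholds` container, the function name `is_reportable`, and the numeric values are all hypothetical, and the prevalence values are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the three user-supplied thresholds."""
    cooccurrence: float  # spatio-temporal co-occurrence prevalence threshold
    link: float          # spatio-temporal link prevalence threshold
    cascade: float       # cascade prevalence threshold


def is_reportable(cascade_prev: float,
                  component_prevs: Iterable[float],
                  link_prevs: Iterable[float],
                  th: Thresholds) -> bool:
    """A cascade is reported only if it and all of its parts exceed their thresholds."""
    return (cascade_prev > th.cascade
            and all(p > th.cooccurrence for p in component_prevs)
            and all(p > th.link for p in link_prevs))


# Illustrative values only.
th = Thresholds(cooccurrence=0.4, link=0.5, cascade=0.5)
accepted = is_reportable(0.6, [0.5, 0.7], [0.6], th)
rejected = is_reportable(0.6, [0.5, 0.7], [0.5], th)  # link prevalence not above 0.5
```

The comparisons are strict because the problem statement asks for prevalence greater than each threshold.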
10Challenges
- Conceptual Challenges
- Spatial footprint of events over time changes with different problem settings.
- A cascade pattern is non-linear (example: cascading power failure).
- Concurrent processes can together form a cascade (example: a group of serial criminals working together and committing different crimes).
- Requires different timing considerations to capture concurrency, prolonged influence, and lack of influence.
- Statistical challenges
- Cascading behavior is multidimensional (involves both space and time): interest measures need to capture this aspect.
- Timing constraints and neighborhood definitions vary across problem domains: interest measures need to be flexible and correct.
- Identifying the right statistical measure from spatial statistics to compare proposed interest measures.
- Existence of non-linearity and concurrent processes across space and time: requires composite multi-dimensional interest measures.
- Risk of generating spurious patterns.
- Desirable computational properties such as monotonicity.
- Computational challenges
- Composite multi-dimensional interest measures are computationally complex.
- The number of patterns is exponential in the number of event types.
- The number of patterns also grows exponentially with the timing constraints and the maximum time span of the dataset.
11Classification of Related Work and Proposed Pattern
12Modeling ST-Cascade Patterns: Related ST-Patterns
13Related Work: Topological Patterns (Wang et al. 2005)
- Topological Pattern: co-occurrence of m feature types over a spatio-temporal neighborhood.
Example: Bar Closing, Drunk Driving, Hit and Run, Accident
- Properties
- All events occur in the same spatio-temporal neighborhood.
- No concept of separate time intervals.
- Concept of space and time neighborhood.
- Interest Measure: Prevalence (participation index over a space-time neighborhood)
- Limitation
- Does not recognize sequences of different spatio-temporal co-occurrences (no ordering between different co-occurrences).
- Example
- Can catch crime patterns co-occurring in a spatio-temporal neighborhood.
- Cannot identify an ordering over different spatio-temporal co-occurrences.
[Figure: Topological Patterns (Wang et al. 2005) vs. ST Cascade. Labels: Contains; Spatio-temporal Co-occurrence; A Topological Pattern]
14Related Work: Sequential Patterns from Spatio-temporal Event Datasets (Huang et al. 2008)
- Spatio-temporal Sequential Pattern: sequence of spatio-temporal events based on a temporal ordering.
Example
- Properties
- Defines a Follow predicate for time ordering using a particular neighborhood N(e).
- Defines a During predicate for intervals (possibly subsets); another neighborhood N1(e) must be defined to find subsets.
- Interest measure is weakly anti-monotonic.
- Limitation
- Sequence Index and Density Ratio can use only a single type of neighborhood function at a time for a particular pattern.
- Requires a priori specification of the respective neighborhood definitions between different event types to obtain sequences of subsets.
- For these reasons, Sequence Index cannot be used as the significance measure for capturing cascades.
- No notion of a continuous sub-sequence, where continuity refers to the presence of common subsets of event types.
- Rules out the presence of both the Follow and During relationships between a pair of event types.
- Example
- Can identify ordered sets of spatio-temporal event types, that is, links between different types.
- Can identify links, but cannot identify co-occurrences of crime types and hence cannot identify an ordering between different co-occurring sets.
15Sequential Patterns from Spatio-temporal Event Datasets vs. Spatio-temporal Cascade Patterns
[Figure: patterns found by each approach]
- Cannot find both of these from the same dataset.
- The two occurrences of D in the above pattern correspond to different event instance sets in this case; in cascades they do not, and in crime data they do not either.
16A Distinguishing Example
- Three Criteria
- Neighborhood Relationship
- Interest Measure
- Pattern Semantics
17A Distinguishing Example
[Figure: the same pair of event types under the During and Follow predicates]
- Sequence Index takes different values under During and under Follow, and hence cannot capture patterns where both exist.
18Related Work: Collocation Episodes (Cao et al. 2006)
- Collocation Episodes: finding inter-movement regularities of different object types.
Example: (Bar Closing, Drunk Driving) → (Drunk Driving, Hit and Run)
- Properties
- A reference feature is used for defining the sequence of episodes.
- Defined for moving object types.
- Limitation
- A reference feature type is required for finding such patterns.
- Trajectories must be known a priori.
- Example
- Can catch patterns that have a common event type.
- Cannot identify patterns that do not have a common event type.
- Assumes that a sequence of episodes is connected by a common reference type.
19Collocation Episodes vs. Spatio-temporal Cascade Patterns
- Collocation Episodes: Inputs
- Trajectories of moving objects
- Spatio-temporal Cascade Patterns: Inputs
- Spatio-temporal event database
[Figure: what each approach finds, its constraints, and what it cannot find]
20Related Work: Spatio-Temporal Cross Correlation Function (ST Cross K Function), Ma et al. 2006
- Temporal extension to the Cross K Function: a statistical measure for quantifying cause-effect relationships between different event types.
- Properties
- Extends Ripley's K function (Ripley et al. 1976) by adding time.
- One-tail case: events of the other type occurring only after a current event.
- Two-tail case: events before and after the current event.
- Limitations
- Defined only for pairs of event types.
- High computational cost for computing the K function.
- Example
- Can identify only pairs of spatio-temporally correlated event types.
[Figure: ST Cross K Function vs. ST Cascades, with panels for the two-tail effect and the one-tail effect]
21Contributions
- Modeled spatio-temporal cascade patterns.
- Defined a monotonic composite interest measure, Cascade Prevalence.
- Proved that the interest measures preserve the anti-monotone property of cascades and statistical significance.
- Developed a novel ST Cascade Miner algorithm.
22Key Concepts -2 Spatio-Temporal Co-occurrence
- Spatio-Temporal Participation Ratio
- Spatio-Temporal Prevalence (Spatio-temporal
Participation Index)
- A Spatio-temporal Co-occurrence is prevalent if
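The formulas for this slide did not survive in the text, so the following is only a sketch of how a spatio-temporal participation ratio and participation index could be computed, assuming the standard co-location definitions (ratio = fraction of a type's instances that take part in some instance of the pattern; index = minimum ratio over the pattern's types). The tuple layout, the Euclidean distance test, and the parameter names `s` and `t_nbr` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import hypot

# Each event is a tuple (event_type, x, y, t); this layout is an assumption.


def st_neighbors(a, b, s, t_nbr):
    """Assumed neighborhood test: within distance s and within t_nbr time units."""
    return hypot(a[1] - b[1], a[2] - b[2]) <= s and abs(a[3] - b[3]) <= t_nbr


def participation_index(events, types, s, t_nbr):
    """Return (index, per-type ratios) for the co-occurrence over `types`.

    Brute force: tries one instance per type and keeps combinations in which
    every pair is a spatio-temporal neighbor. Every type must have instances.
    """
    by_type = {ty: [e for e in events if e[0] == ty] for ty in types}
    participating = {ty: set() for ty in types}
    for combo in product(*(by_type[ty] for ty in types)):
        if all(st_neighbors(a, b, s, t_nbr) for a, b in combinations(combo, 2)):
            for e in combo:
                participating[e[0]].add(e)
    ratios = {ty: Fraction(len(participating[ty]), len(by_type[ty])) for ty in types}
    return min(ratios.values()), ratios


# Toy data: one of two A instances and two of three C instances co-occur.
events = [("A", 0, 0, 0), ("A", 10, 0, 0),
          ("C", 0.5, 0, 0.5), ("C", 0.4, 0, 0.2), ("C", 50, 0, 0)]
pi, ratios = participation_index(events, ["A", "C"], s=1.0, t_nbr=1.0)
```

Under this reading, a co-occurrence would be called prevalent when `pi` exceeds the co-occurrence prevalence threshold from the problem statement.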
23Key Concepts -3
- A Spatio-Temporal Link Relationship is an ordering between spatio-temporal co-occurrences or spatio-temporal event types.
- B and A are event types which
- Satisfy a spatial neighborhood relation S?
- Are within a time window Tw
- Do not satisfy a time neighborhood relation T?
- Definition 1: Spatio-temporal Link Prevalence of a pattern
min( (# of instances of D satisfying a spatial neighborhood relationship with B and occurring after B) / (total # of instances of D), (# of instances of B in the spatial neighborhood of D and occurring before D) / (total # of instances of B) )
= min(4/4, 3/5) = 3/5
- Definition 2: Prevalent Spatio-temporal Cascade
[Figure: ST cascades and prevalent ST cascades, annotated with prevalence values of 3/5]
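Definition 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the `Event` class, the Euclidean distance test with radius `s`, and the numeric parameters `t_nbr` and `t_win` (standing in for the temporal neighborhood and the time window Tw) are hypothetical. The toy data is constructed so that the result matches the min(4/4, 3/5) example.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import hypot


@dataclass(frozen=True)
class Event:
    etype: str
    x: float
    y: float
    t: float


def linked(a, b, s, t_nbr, t_win):
    """b follows a: spatially close, outside the temporal neighborhood, inside the window."""
    dt = b.t - a.t
    return hypot(a.x - b.x, a.y - b.y) <= s and t_nbr < dt <= t_win


def link_prevalence(events, src, dst, s, t_nbr, t_win):
    """min(fraction of dst instances with a preceding src, fraction of src with a following dst)."""
    srcs = [e for e in events if e.etype == src]
    dsts = [e for e in events if e.etype == dst]
    pairs = [(a, b) for a in srcs for b in dsts if linked(a, b, s, t_nbr, t_win)]
    src_hit = {a for a, _ in pairs}
    dst_hit = {b for _, b in pairs}
    return min(Fraction(len(dst_hit), len(dsts)), Fraction(len(src_hit), len(srcs)))


# Five B instances at t = 0; four D instances that follow three of them.
events = [Event("B", x, 0, 0) for x in (0, 10, 20, 30, 40)]
events += [Event("D", 0.5, 0, 2), Event("D", 0.3, 0, 3),
           Event("D", 10.5, 0, 2), Event("D", 20.5, 0, 2)]
lp = link_prevalence(events, "B", "D", s=1.0, t_nbr=1.0, t_win=5.0)
print(lp)  # 3/5
```

All four D instances have a preceding B nearby (4/4), but only three of the five B instances are followed by a D (3/5), so the minimum is 3/5.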
24Key Concepts -4
- Definition 3: Cascade Prevalence of an ST Cascade
- Intersecting at the common instances of B:
Cascade Prevalence = (# of instances of B in [pattern shown in figure]) / (total # of instances of B), if prevalent([component]) and prevalent([component])
- Intersecting at the common instances of E:
Cascade Prevalence = (# of instances of E in [pattern shown in figure]) / (total # of instances of E), if prevalent([component]) and prevalent([component])
- Cascade Prevalence = min( (# of instances of E in [pattern shown in figure]) / (total # of instances of E), (# of instances of B in [pattern shown in figure]) / (total # of instances of B) ), if prevalent([component]) and prevalent([component])
In all of the above cases Tw > T?. The case Tw = T? is considered next.
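25Key Concepts -4 (contd., Examples)
If Tw = T?, the patterns are just connected by a common type.
[Figure: example patterns over event types A, B, C, D, and E]
- If prevalent([component]) and prevalent([component]):
Cascade Prevalence = (# of instances of D in [pattern shown in figure]) / (total # of instances of D)
- If prevalent([component]) and prevalent([component]):
Cascade Prevalence = (# of instances of B in [pattern shown in figure]) / (total # of instances of B)
26Key Concepts -4 (contd., Examples)
[Figure: example pattern over event types A, B, C, and D]
Cascade Prevalence = min( (# of instances of D in [pattern shown in figure]) / (total # of instances of D), (# of instances of B in [pattern shown in figure]) / (total # of instances of B) )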
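One possible reading of Definition 3 is sketched below: for each event type at which two prevalent components meet, take the fraction of that type's instances that appear in both components, then take the minimum over all such junction types. Because the pattern figures are missing from the text, this interpretation is an assumption, and the function names, instance identifiers, and counts are purely illustrative.

```python
from fractions import Fraction


def junction_ratio(in_first, in_second, total):
    """Fraction of a junction type's instances shared by both components."""
    return Fraction(len(set(in_first) & set(in_second)), total)


def cascade_prevalence(junctions):
    """Minimum shared-instance ratio over all junction types.

    `junctions` is a list of (instances_in_first, instances_in_second, total_count).
    """
    return min(junction_ratio(a, b, n) for a, b, n in junctions)


# Hypothetical instance ids: B has 5 instances in total, E has 4.
b_junction = ({"b1", "b2", "b3"}, {"b2", "b3", "b4"}, 5)        # 2 shared of 5
e_junction = ({"e1", "e2", "e3"}, {"e1", "e2", "e3", "e4"}, 4)  # 3 shared of 4

cp_single = cascade_prevalence([b_junction])            # one junction type
cp_both = cascade_prevalence([b_junction, e_junction])  # min over two junction types
```

With a single junction type the measure reduces to one ratio; with two it is the smaller of the two ratios, mirroring the min(...) form on the slides.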
27Evaluation of Interest Measures: Properties
- Monotonicity
- Spatio-temporal prevalence (PI)
- Link prevalence
- Cascade Prevalence (CP)
- Statistical significance relative to ST Cross K Function (Ma et al. 2006)
- Spatio-temporal prevalence (PI)
- Link prevalence is defined only between 2 types.
- Cascade Prevalence (CP)
28Evaluation of Interest Measures: Properties
- Lemma 1
- Spatio-temporal Prevalence is monotonic.
[Figure: example annotated with prevalence values 2/5, 2/5, 1, and 2/5]
- Spatio-temporal Link Prevalence is similar to Spatio-temporal Prevalence with a different time window; hence it has the same properties.
- Lemma 2
- Cascade Prevalence is monotonic.
[Figure: example annotated with prevalence values 3/5, 3/5, and 2/5]
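The reasoning behind Lemma 1 is not spelled out in the text. One plausible argument, assuming the usual participation-ratio definition from co-location mining carries over to spatio-temporal neighborhoods, is sketched here; the symbols pr, PI, C, and C' are notation introduced for this sketch. Any instance of type f that takes part in an instance of the larger pattern C' also takes part in an instance of every sub-pattern C, so adding event types can only lower the ratios.

```latex
% C and C' are sets of event types with C \subseteq C';
% pr(C, f) is the participation ratio of type f in pattern C.
\begin{align*}
pr(C', f) &\le pr(C, f) \quad \text{for every } f \in C,\\
PI(C') = \min_{f \in C'} pr(C', f)
  &\le \min_{f \in C} pr(C', f)
  \le \min_{f \in C} pr(C, f) = PI(C).
\end{align*}
```

This is the property that would let a miner discard every super-pattern of a pattern that already falls below the threshold.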
29Spatiotemporal prevalence is an upper bound to ST
K-Function - Illustration
30Spatio-Temporal Link Prevalence is an upper bound
to ST K-Function - Illustration
31Cascade Prevalence is an upper bound to ST
K-Function - Illustration
32Proposed Approach: Naïve Algorithm
- Step 1: Preprocessing. Create instance-level graph.
- Step 2: Candidate generation
- Candidate co-occurrences
- Candidate spatio-temporal links
- Candidate continuous single-link cascades
- Step 3: Repeatedly merge all continuous (k-1)-linked cascades to form continuous k-linked cascades.
- Step 4: Pruning step
- Limitations
- Step 2: No pruning.
- Step 3: Merges all (k-1)-link cascades.
- Step 4: Looks at all candidates.
33Proposed Approach: Naïve Algorithm
- Inputs
- All of the inputs in the problem definition
- Output
- All prevalent cascade patterns
- Pseudocode
- Preprocessing step
- generate_all_instance_co_occurrences_applying S? and T?
- generate_all_instance_links_applying S? and Tw
- Mining step
- Candidate generation
- Generate all event type co-occurrences
- Generate all event type links
- Generate all continuous sub-cascades with one link; add them to set S
- Merging step
- Initialize k = 2
- Repeat while S is not the null set
- C = all (k-1)-linked continuous sub-cascades
- Repeat while C is not the null set
- Merge all continuous (k-1)-linked sub-cascades with one another
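The merging and pruning steps can be illustrated with a small runnable sketch. It simplifies in several ways that are assumptions rather than part of the slides: each link is a pair of event-type labels (the slides also allow co-occurrences as endpoints), two cascades are treated as continuous when they share an event type, and prevalence is delegated to a caller-supplied `is_prevalent` function. All names are hypothetical.

```python
from itertools import combinations


def nodes(cascade):
    """Event types touched by a cascade (a frozenset of directed links)."""
    return {n for link in cascade for n in link}


def merge_level(level, k):
    """Merge (k-1)-link cascades pairwise into continuous k-link candidates."""
    merged = set()
    for a, b in combinations(level, 2):
        union = a | b
        # Size k means exactly one new link; a shared event type keeps it continuous.
        if len(union) == k and nodes(a) & nodes(b):
            merged.add(union)
    return merged


def naive_cascade_miner(links, is_prevalent):
    """Generate every continuous cascade first, then prune once at the end."""
    level = {frozenset([link]) for link in links}  # single-link cascades
    candidates = set(level)
    k = 2
    while level:  # stops when no k-link cascade can be formed
        level = merge_level(level, k)
        candidates |= level
        k += 1
    return {c for c in candidates if is_prevalent(c)}


# Toy run: three chained links, with a stand-in prevalence test that accepts everything.
links = [("B", "A"), ("A", "C"), ("C", "D")]
result = naive_cascade_miner(links, lambda cascade: True)
```

Because nothing is pruned before the final filter, the candidate set grows with every merge round, which is the limitation listed for Steps 2 to 4 of the naïve approach.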
34Conclusions and Further Work
- Optimized algorithm for Mining ST Cascades
- Proof of Correctness and Completeness.
- Experimental Evaluation.