Title: MINING RELATIONSHIPS AMONG INTERVAL-BASED EVENTS FOR CLASSIFICATION
1MINING RELATIONSHIPS AMONG INTERVAL-BASED EVENTS FOR CLASSIFICATION
- Dhaval Patel, Wynne Hsu, Mong Li Lee
- School of Computing
- National University of Singapore
2Outline
- Introduction
- Problem Tasks
- Contributions
- Proposed Solution
- Experiments
- Conclusion
- On-going Work
3Introduction
- Event duration captures temporal relations between events
- Diabetic patients (E1 Overlap E2)
- Multimedia: video anomaly detection in a smart home environment
- Financial time series: stock market data
4Introduction
- Encoding of temporal relation between events
(A Overlap B) Overlap C
(A Overlap B)
?
(A Overlap B) Overlap C
5Problem Tasks
- Design a lossless representation to encode temporal relations among events (≥ 3 events)
- Design an efficient algorithm to discover frequent interval-based temporal patterns
- Apply the discovered patterns in classification
6Contributions
- Design an augmented hierarchical representation
- Develop IEMiner, an Apriori-based algorithm for discovering frequent interval-based temporal patterns
- Build IEClassifier based on the discovered frequent interval-based temporal patterns
7Augmented Hierarchical Representation
- Incorporate additional count information
- Counts for: Contain, Finish by, Meet, Overlap, Start
- Representation is lossless
(A Overlap[0,0,0,1,0] B) Overlap[0,0,0,1,0] C
(A Overlap[0,0,0,1,0] B)
(A Overlap[0,0,0,1,0] B) Overlap[0,0,0,2,0] C
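The count vector can be illustrated with a small Python sketch. It is not the authors' implementation: the `Interval` type, the `relation` and `encode` helpers, and the example timestamps are all hypothetical. It assumes that events are ordered by (start, end), that the relation between a composite and the next event is taken from the composite's spanning interval, and that each counter records how many earlier events have that relation with the newly added event.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["label", "start", "end"])

# Order of the five counters, as listed on the slide.
COUNTED = ["Contain", "Finish by", "Meet", "Overlap", "Start"]


def relation(a, b):
    """Relation of interval a to interval b, assuming a sorts first by (start, end)."""
    if a.start == b.start:
        return "Equal" if a.end == b.end else "Start"
    if a.end < b.start:
        return "Before"
    if a.end == b.start:
        return "Meet"
    if a.end < b.end:
        return "Overlap"
    return "Finish by" if a.end == b.end else "Contain"


def encode(events):
    """Build an augmented hierarchical string for the events, left to right."""
    events = sorted(events, key=lambda e: (e.start, e.end))
    text = events[0].label
    span = events[0]  # interval covered by the composite built so far
    for i, nxt in enumerate(events[1:], start=1):
        # Count how many earlier events hold each counted relation with nxt.
        counts = [0] * len(COUNTED)
        for prev in events[:i]:
            rel = relation(prev, nxt)
            if rel in COUNTED:
                counts[COUNTED.index(rel)] += 1
        tag = ",".join(map(str, counts))
        left = text if i == 1 else f"({text})"
        text = f"{left} {relation(span, nxt)}[{tag}] {nxt.label}"
        span = Interval("", span.start, max(span.end, nxt.end))
    return text


a, b = Interval("A", 0, 4), Interval("B", 2, 6)
print(encode([a, b, Interval("C", 5, 8)]))  # only B overlaps C
print(encode([a, b, Interval("C", 3, 8)]))  # both A and B overlap C
```

Under these assumptions the two arrangements, which a plain hierarchical encoding would both write as (A Overlap B) Overlap C, print with overlap counts 1 and 2 respectively, which is what makes them distinguishable.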
8IEMiner
Frequent k-patterns
Candidate generation
Candidate (k+1)-patterns
Support counting
Frequent (k+1)-patterns
9IEMiner
Support(A Overlap[0,0,0,1,0] B) = 3/4 (75%)
Confidence((A Overlap[0,0,0,1,0] B) -> Class A) = 2/3
Confidence((A Overlap[0,0,0,1,0] B) -> Class B) = 1/3
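The arithmetic behind these figures can be reproduced with a toy example. The four labelled sequences below are hypothetical and chosen only so that the pattern occurs in three of four sequences, two of them from Class A; the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy data: (class label, does the sequence contain the pattern?)
sequences = [("A", True), ("A", True), ("B", True), ("B", False)]


def support(seqs):
    """Fraction of all sequences that contain the pattern."""
    return Fraction(sum(1 for _, has in seqs if has), len(seqs))


def confidence(seqs, cls):
    """Among sequences containing the pattern, the fraction labelled cls."""
    matching = [label for label, has in seqs if has]
    if not matching:
        return Fraction(0)
    return Fraction(matching.count(cls), len(matching))


print(support(sequences))          # 3/4
print(confidence(sequences, "A"))  # 2/3
print(confidence(sequences, "B"))  # 1/3
```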
10IEMiner
Frequent k-patterns
Candidate generation
Candidate (k+1)-patterns
Support counting
Frequent (k+1)-patterns
11Candidate generation
- Straightforward Apriori-based approach
- Generate level (k+1) candidates from 2 frequent k-patterns
12Candidate generation
- Candidate generation at level (k+1)
- Generate candidates from a frequent k-pattern and a 2-pattern
- To constrain the size of the candidate set
- Size of the 2-pattern set is reduced in each iteration
- Only selected k-patterns are expanded
13Candidate generation
Generate 4-patterns from frequent 3-patterns and 2-patterns
A Overlap[0,0,0,1,0] B
A Before[0,0,0,0,0] D
B Before[0,0,0,0,0] D
A Before[0,0,0,0,0] F
A Before[0,0,0,0,0] G
F Before[0,0,0,0,0] G
C Contain[1,0,0,0,0] D
...
14Candidate generation
15Candidate generation
- Theorem 2
- At iteration (k+1), 2-patterns that are present in fewer than (k-1) frequent k-patterns will not generate any valid candidates.
16Candidate generation
Generate 4-patterns from frequent 3-patterns and 2-patterns
A Overlap[0,0,0,1,0] B
A Before[0,0,0,0,0] D
B Before[0,0,0,0,0] D
A Before[0,0,0,0,0] F
A Before[0,0,0,0,0] G
F Before[0,0,0,0,0] G
C Contain[1,0,0,0,0] D
...
3 3 3 3 1 1 2
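The pruning rule of Theorem 2 can be sketched as a simple filter. The sketch assumes that the seven numbers on the slide are, in order, the number of frequent 3-patterns in which each listed 2-pattern occurs; that alignment is a guess from the slide layout, not something the slide states. The function name `prune_two_patterns` is hypothetical and the count vectors are omitted from the keys for brevity.

```python
def prune_two_patterns(occurrences, k):
    """Keep 2-patterns that occur in at least (k - 1) frequent k-patterns."""
    return {p: n for p, n in occurrences.items() if n >= k - 1}


# Assumed alignment of the slide's counts with the listed 2-patterns.
occurrences = {
    "A Overlap B": 3,
    "A Before D": 3,
    "B Before D": 3,
    "A Before F": 3,
    "A Before G": 1,
    "F Before G": 1,
    "C Contain D": 2,
}

# Generating 4-patterns means k = 3, so the threshold is k - 1 = 2.
kept = prune_two_patterns(occurrences, k=3)
print(sorted(set(occurrences) - set(kept)))  # 2-patterns dropped before expansion
```

With this reading, the two 2-patterns with count 1 fall below the threshold of 2 and are removed from the 2-pattern set before any 3-pattern is expanded.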
17IEMiner
Frequent k-patterns
Candidate generation
Candidate (k+1)-patterns
Support counting
Frequent (k+1)-patterns
18Support counting
- Count the number of windows in which each candidate (k+1)-pattern is present
- For each window w
- Intelligently generate only those candidates which are present in the candidate (k+1)-pattern set
- Increment the count of those generated candidate patterns
- Issue: avoid processing unnecessary windows
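A simplified sketch of window-based support counting follows. It is restricted to 2-patterns and omits the count vector, and it enumerates every event pair in a window before filtering against the candidate set, whereas the slide describes generating only the patterns that are candidates. The `Interval` type, the `relation` helper, and the sample windows are hypothetical.

```python
from collections import Counter, namedtuple
from itertools import combinations

Interval = namedtuple("Interval", ["label", "start", "end"])


def relation(a, b):
    """Relation of interval a to interval b, assuming a sorts first by (start, end)."""
    if a.start == b.start:
        return "Equal" if a.end == b.end else "Start"
    if a.end < b.start:
        return "Before"
    if a.end == b.start:
        return "Meet"
    if a.end < b.end:
        return "Overlap"
    return "Finish by" if a.end == b.end else "Contain"


def count_support(windows, candidates):
    """Count, for each candidate 2-pattern, the number of windows containing it."""
    support = Counter()
    for window in windows:
        events = sorted(window, key=lambda e: (e.start, e.end))
        seen = set()
        for a, b in combinations(events, 2):
            pattern = (a.label, relation(a, b), b.label)
            if pattern in candidates:
                seen.add(pattern)
        support.update(seen)  # a window contributes at most 1 per pattern
    return support


windows = [
    [Interval("A", 0, 4), Interval("B", 2, 6)],
    [Interval("A", 0, 4), Interval("B", 5, 7)],
    [Interval("A", 1, 3), Interval("B", 2, 5), Interval("D", 6, 7)],
]
candidates = {("A", "Overlap", "B"), ("B", "Before", "D")}
print(count_support(windows, candidates))
```

Collecting matches in a per-window set before updating the counter keeps the count at the window level, so a pattern that occurs twice in one window is still counted once.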
19Optimization
- Prefix Count
- Selectively expands frequent k-patterns during candidate generation
- Window Blacklist
- Avoids unnecessary checking of windows to reduce dataset size
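One way a window blacklist could work is sketched below. The slide does not give the blacklisting criterion, so this sketch assumes one that follows from the Apriori property: a window containing no frequent pattern at the current level cannot contain a frequent pattern at the next level and can be skipped from then on. Plain itemsets stand in for temporal patterns, and `count_level`, `singles`, and `pairs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def count_level(windows, blacklist, find_patterns, min_support):
    """Count one level, skipping blacklisted windows; return (frequent, new blacklist)."""
    found = {}
    for wid, window in enumerate(windows):
        if wid in blacklist:
            continue  # no work is spent on this window
        found[wid] = find_patterns(window)
    counts = Counter(p for pats in found.values() for p in pats)
    frequent = {p for p, n in counts.items() if n >= min_support}
    # Windows with no frequent pattern at this level are dead for later levels.
    dead = {wid for wid, pats in found.items() if not (pats & frequent)}
    return frequent, blacklist | dead


def singles(window):
    return {frozenset([x]) for x in window}


def pairs(window):
    return {frozenset(p) for p in combinations(sorted(window), 2)}


windows = [{"A", "B"}, {"A", "B", "C"}, {"D"}]
freq1, blacklist = count_level(windows, set(), singles, min_support=2)
freq2, blacklist = count_level(windows, blacklist, pairs, min_support=2)
print(freq1, freq2, blacklist)
```

In this toy run the third window holds only the infrequent event D, so it is blacklisted after the first level and never scanned at the second.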
20IEClassifier
[Figure: IEClassifier. Labels on the slide: Input, D, n1, n2, ..., nn; values 3, 4, 10, 10; decision schemes: Majority Vote, Highest Confidence]
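The two decision schemes named on the slide can be sketched as follows. The slide does not spell out how IEClassifier applies them, so this is one plausible reading: each frequent pattern found in an instance carries a class label and a confidence, and the instance is labelled either by the most common class among the matched patterns or by the class of the single most confident pattern. The `classify` function and the sample values are hypothetical.

```python
from collections import Counter


def classify(matched, scheme="highest_confidence"):
    """matched: list of (class_label, confidence) for patterns found in the instance."""
    if not matched:
        return None  # no pattern applies; a default class would be needed
    if scheme == "majority_vote":
        votes = Counter(label for label, _ in matched)
        return votes.most_common(1)[0][0]
    # Otherwise pick the class of the single most confident pattern.
    return max(matched, key=lambda m: m[1])[0]


matched = [("A", 0.6), ("A", 0.7), ("B", 0.9)]
print(classify(matched, "majority_vote"))       # A
print(classify(matched, "highest_confidence"))  # B
```

The example is chosen so that the two schemes disagree: two moderately confident patterns vote for A, while the single most confident pattern points to B.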
21Experiments
- Evaluate efficiency and scalability of IEMiner on both synthetic and real-world datasets
- Evaluate accuracy of IEClassifier on the Hepatitis dataset
22Effect of varying minimum support
23Effect of varying database size
24Effect of varying pattern length
25Effect of varying event density
26Effect of optimization strategies
Window Blacklist
Prefix Count
27ASL Dataset
28Hepatitis Dataset
29Accuracy of IEClassifier
Experiments on Hepatitis Data
30Some Discovered Results
Hepatitis Data
31Conclusion
- Mining relationships among interval-based events is an important problem with applications in diverse fields
- Proposed the Augmented Hierarchical Representation
- Designed an efficient IEMiner algorithm
- Designed IEClassifier based on frequent patterns
- Temporal abstraction applied to the Hepatitis dataset can be viewed as a domain-dependent dimensionality reduction technique
32On Going Work
- Integrating IEMiner and IEClassifier as a single-stage algorithm
- Discover only those patterns with high discriminating power
33Q & A?
34
35Related Work
- Kam's A1-pattern discovery algorithm (DaWaK 2000)
- Lossy representation
- Used vertical id concept
- H-DFS (ICDE 2005)
- Matrix-based representation: lists n(n-1)/2 relations for a temporal pattern
- Used vertical id concept with candidate generation
- Tprefix (TKDE 2007)
- Transforms interval data into a sequence
- Prefix-based approach
36Candidate generation
Generate 4-patterns from frequent 3-patterns and 2-patterns
A Overlap[0,0,0,1,0] B
A Before[0,0,0,0,0] D
B Before[0,0,0,0,0] D
A Before[0,0,0,0,0] F
A Before[0,0,0,0,0] G
F Before[0,0,0,0,0] G
C Contain[1,0,0,0,0] F
...
F
F
37Support Counting
Candidate pattern: (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,2,0] C
Before
active_TP: A, B, A Overlap[0,0,0,1,0] B
passive_TP
After
active_TP: A, B, A Overlap[0,0,0,1,0] B, A Overlap[0,0,0,1,0] C, B Overlap[0,0,0,1,0] C, (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,2,0] C
passive_TP
38Optimization
39Optimization
Generate 4-patterns from frequent 3 patterns and
2-patterns
1
1
40Optimization