COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence

1
COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence
  • Ning Jin, Calvin Young, Wei Wang
  • University of North Carolina at Chapel Hill
  • 11/04/2009

2
What Are Graphs?
  • Graph
  • a set of nodes connected by a set of edges
  • nodes and edges can have labels
  • edges can have directions
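
The definition above can be made concrete with a small data structure. This is only an illustrative sketch: the class name `LabeledGraph` and its methods are assumptions introduced here, not something the slides define.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """A small labeled graph; all names here are illustrative."""
    directed: bool = False
    node_labels: dict = field(default_factory=dict)  # node id -> label
    edge_labels: dict = field(default_factory=dict)  # (u, v) -> label

    def _key(self, u, v):
        # Undirected edges are stored under one canonical (sorted) key.
        return (u, v) if self.directed else tuple(sorted((u, v)))

    def add_node(self, node, label=None):
        self.node_labels[node] = label

    def add_edge(self, u, v, label=None):
        self.edge_labels[self._key(u, v)] = label

    def has_edge(self, u, v):
        return self._key(u, v) in self.edge_labels


g = LabeledGraph(directed=False)
g.add_node(1, "A")
g.add_node(2, "B")
g.add_edge(1, 2, "bond")
print(g.has_edge(2, 1))  # True: direction is ignored in an undirected graph
```

Setting `directed=True` keeps `(1, 2)` and `(2, 1)` as distinct edges.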

3
Graph Classification Example
[Figure: example graphs grouped into a negative set and a positive set]
6
Graph Representation
[Figure: data objects represented by graphs]
7
Interesting Properties in Data
[Figure: interesting properties of the data are determined by structure; panel labels: "most", "some"]
8
Graph Classification
[Figure: example objects labeled positive and negative]
Function is determined by structure, so classifying the objects becomes classifying graphs
9
Graph Classification Using Frequent Subgraph Patterns
  • The positive graphs should have some common subgraph patterns that negative graphs don't have
  • Generate classifiers
  • Frequent subgraph mining in the positive set (frequency > threshold)
  • Feature selection
  • High-dimensional data point classification
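
The pipeline above can be sketched in a few lines. To stay self-contained, this sketch makes a strong simplifying assumption: each graph is reduced to its set of labeled edges, and single edges stand in for subgraph patterns, which avoids subgraph isomorphism testing. The data, function names, and threshold are all hypothetical, and the feature selection step is omitted.

```python
from collections import Counter

# Each graph is reduced to its set of labeled edges, e.g. ("A", "B").
positive = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B")},
]
negative = [{("C", "D")}, {("A", "B"), ("C", "D")}]


def frequent_patterns(graphs, threshold):
    """Patterns whose relative frequency in `graphs` exceeds `threshold`."""
    counts = Counter(p for g in graphs for p in g)
    return sorted(p for p, c in counts.items() if c / len(graphs) > threshold)


def to_feature_vector(graph, patterns):
    """Binary vector: 1 if the graph contains the pattern, else 0."""
    return [1 if p in graph else 0 for p in patterns]


patterns = frequent_patterns(positive, threshold=0.5)
X = [to_feature_vector(g, patterns) for g in positive + negative]
y = [1] * len(positive) + [0] * len(negative)
print(patterns)  # [('A', 'B'), ('B', 'C')]
print(X)         # [[1, 1], [1, 1], [1, 0], [0, 0], [1, 0]]
```

The rows of `X` with labels `y` are the high-dimensional data points that any standard classifier could then be trained on.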
11
Graph Classification Using Discriminative Subgraph Patterns
  • Frequent subgraph mining in the positive set and feature selection merge into a single step: mining discriminative/significant subgraph patterns
  • Scoring function
Pattern redundancy:
  • Pattern 1 is found in positive graphs P1, P2 and in negative graphs N1, N2
  • Pattern 2 is found in positive graphs P1, P2, P3 and in negative graphs N1
  • Pattern 1 is redundant given pattern 2
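
The redundancy example can be checked with plain set operations. The criterion below is an assumed reading inferred from the example (pattern 2 covers at least the same positive graphs and no additional negative graphs); the slides do not state a formal definition.

```python
def is_redundant(p1, p2):
    """Assumed criterion: p1 is redundant given p2 when p2 covers every
    positive graph that p1 covers and hits no negative graph p1 avoids."""
    return p1["pos"] <= p2["pos"] and p2["neg"] <= p1["neg"]


pattern1 = {"pos": {"P1", "P2"}, "neg": {"N1", "N2"}}
pattern2 = {"pos": {"P1", "P2", "P3"}, "neg": {"N1"}}

print(is_redundant(pattern1, pattern2))  # True
print(is_redundant(pattern2, pattern1))  # False
```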
13
Previous Discriminative Pattern Mining Methods
  • Each tree node represents a subgraph pattern
  • Each node is a supergraph of its parent node,
    with one more edge
  • One subgraph pattern corresponds to only one node

Scoring function
Pattern redundancy:
  • Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5
  • Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4
  • Pattern 1 is redundant given pattern 2
14
1. Heuristic Exploration Order
[Figure: Pattern 1 and Pattern 2]
Pattern redundancy:
  • Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5
  • Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4
  • Pattern 1 is redundant given pattern 2
15
Heuristic Exploration Order: Delta Score
[Figure: patterns p′ and p, one labeled "large absolute value" and the other "large derivative"]
Delta score of p = score of p − score of p′
It's like looking for the maximum of a function
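
One way to read the delta score is as the change in score when a pattern is extended from the pattern it grew out of. The sketch below works under that assumption; the `score` function (positive frequency minus negative frequency) is a hypothetical stand-in, since the slides do not give the scoring function.

```python
def score(pos_hits, neg_hits, n_pos, n_neg):
    """Hypothetical score: positive frequency minus negative frequency."""
    return pos_hits / n_pos - neg_hits / n_neg


def delta_score(child, parent, n_pos, n_neg):
    """Change in score when `parent` is extended into `child`."""
    return score(*child, n_pos, n_neg) - score(*parent, n_pos, n_neg)


# (positive hits, negative hits) for a parent pattern and two extensions
parent = (8, 8)
child_a = (8, 7)  # score barely moves
child_b = (6, 1)  # score jumps sharply

print(delta_score(child_a, parent, 10, 10))
print(delta_score(child_b, parent, 10, 10))
```

Under this reading, `child_b` would be explored first because its score rises fastest, much like following the steepest slope when searching for a maximum.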
17
Workflow of Pattern Exploration
  • Collect frequent edges in the positive set and insert them into a heap H (a frequency threshold tp is needed)
  • If H is empty, terminate
  • Otherwise, pop from H the pattern p with the highest delta score
  • Extend pattern p, insert the new non-redundant patterns into H, and repeat
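
The workflow above amounts to a best-first search driven by a priority queue. A minimal sketch follows; `delta_score`, `extend`, and `is_redundant` are caller-supplied placeholders (the slides do not define their exact form), and `max_patterns` is a safety cap added here for illustration.

```python
import heapq
import itertools


def explore(frequent_edges, delta_score, extend, is_redundant, max_patterns=100):
    """Best-first pattern exploration ordered by delta score."""
    tie = itertools.count()  # tie-breaker so patterns are never compared
    heap = []
    for edge in frequent_edges:
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(heap, (-delta_score(edge), next(tie), edge))

    explored = []
    while heap and len(explored) < max_patterns:
        _, _, pattern = heapq.heappop(heap)
        explored.append(pattern)
        for child in extend(pattern):
            if not is_redundant(child, explored):
                heapq.heappush(heap, (-delta_score(child), next(tie), child))
    return explored


# Toy run: patterns are strings, extended one character at a time.
order = explore(
    frequent_edges=["a", "b"],
    delta_score=lambda p: {"a": 0.2, "b": 0.5}.get(p, 0.1),
    extend=lambda p: [p + "x"] if len(p) < 2 else [],
    is_redundant=lambda child, seen: child in seen,
)
print(order)  # ['b', 'a', 'bx', 'ax']
```

In a real miner, `extend` would add one edge to the pattern and discard extensions below the frequency threshold tp.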
18
2. Use Co-occurrences of Patterns
[Figure: a pattern over nodes A, B, C, D in graph G can be approximated by the co-occurrence of smaller patterns in G]
19
When Co-occurrence Is Superior
Separately:
  • A-B: N1, N2, P1, P2, P3, P4
  • B-C: N3, N4, P1, P2, P3, P4
Co-occurrence of A-B and B-C:
  • P1, P2, P3, P4 (no negative graphs)
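
The example reduces to a set intersection: a graph supports a co-occurrence only if it contains every pattern in it. The helper name `cooccurrence_support` is introduced here for illustration.

```python
support = {
    "A-B": {"N1", "N2", "P1", "P2", "P3", "P4"},
    "B-C": {"N3", "N4", "P1", "P2", "P3", "P4"},
}


def cooccurrence_support(patterns, support):
    """Graphs that contain every pattern in the co-occurrence."""
    return set.intersection(*(support[p] for p in patterns))


both = cooccurrence_support(["A-B", "B-C"], support)
print(sorted(both))  # ['P1', 'P2', 'P3', 'P4']
```

Each pattern alone matches two negative graphs, but the two negative sets are disjoint, so the intersection contains only positive graphs.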
20
Co-occurrence Generation
  • A co-occurrence is a set of subgraph patterns {p1, p2, ..., pm}
  • A list of candidate co-occurrences 1, 2, ..., n is maintained
  • For each new pattern p, insert pattern p itself into the list
  • Also insert the union of pattern p and candidate co-occurrence k, where merging candidate k and pattern p improves the score of p most significantly
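
A rough sketch of this update step is given below. It assumes a hypothetical score (positive frequency minus negative frequency) and represents each candidate as a tuple of pattern names plus the positive and negative graphs that contain all of them; none of these choices come from the slides.

```python
def score(pos, neg, n_pos, n_neg):
    """Hypothetical score: positive frequency minus negative frequency."""
    return len(pos) / n_pos - len(neg) / n_neg


def add_pattern(candidates, name, pos, neg, n_pos, n_neg):
    """Insert pattern `name`, plus its union with the candidate that helps most.

    Each candidate is (frozenset of names, positive graphs, negative graphs).
    """
    base = score(pos, neg, n_pos, n_neg)
    best, best_gain = None, 0.0
    for names, c_pos, c_neg in candidates:
        # A union of patterns occurs only in graphs containing all of them.
        u_pos, u_neg = pos & c_pos, neg & c_neg
        gain = score(u_pos, u_neg, n_pos, n_neg) - base
        if gain > best_gain:
            best, best_gain = (names | {name}, u_pos, u_neg), gain
    candidates.append((frozenset({name}), pos, neg))
    if best is not None:
        candidates.append(best)
    return candidates


candidates = []
add_pattern(candidates, "A-B", {"P1", "P2", "P3", "P4"}, {"N1", "N2"}, 4, 4)
add_pattern(candidates, "B-C", {"P1", "P2", "P3", "P4"}, {"N3", "N4"}, 4, 4)
for names, pos, neg in candidates:
    print(sorted(names), sorted(pos), sorted(neg))
```

After the second call the list holds A-B, B-C, and their union, and the union matches no negative graph.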
21
3. Use Association Rules to Classify
  • Association rule: {p1, p2, p3, ..., pn} → positive
  • Input of COM (Co-Occurrence rule Miner): positive graph set, negative graph set, frequency threshold tp of a classification rule in the positive set, and frequency threshold tn in the negative set
  • Output of COM: a set of association rules
22
Association Rule Generation
  • Each candidate co-occurrence corresponds to a candidate association rule
  • If a rule has frequency > tp in the positive set and frequency < tn in the negative set, it is a resulting rule
  • Remove redundant rules
  • Terminate when each positive graph is covered
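
The rule generation steps can be sketched as a filter followed by a coverage check. Several choices here are assumptions rather than details from the slides: thresholds are treated as fractions, the comparisons are non-strict so that tn = 0 can be satisfied, and a rule counts as redundant when it covers no new positive graph.

```python
def select_rules(candidates, n_pos, n_neg, tp, tn):
    """Keep candidates frequent in the positive set and rare in the negative set.

    Each candidate is (set of pattern names, positive graphs, negative graphs).
    """
    rules, covered = [], set()
    for names, pos, neg in candidates:
        if len(pos) / n_pos >= tp and len(neg) / n_neg <= tn:
            if pos <= covered:
                continue  # covers no new positive graph: treated as redundant
            rules.append(names)
            covered |= pos
            if len(covered) == n_pos:
                break  # every positive graph is covered
    return rules, covered


def classify(graph_patterns, rules):
    """Predict positive if the graph contains every pattern of some rule."""
    return any(rule <= graph_patterns for rule in rules)


candidates = [
    ({"A-B"}, {"P1", "P2", "P3", "P4"}, {"N1", "N2"}),
    ({"A-B", "B-C"}, {"P1", "P2", "P3", "P4"}, set()),
]
rules, covered = select_rules(candidates, n_pos=4, n_neg=4, tp=0.3, tn=0.0)
print(rules)                                   # [{'A-B', 'B-C'}] (set order may vary)
print(classify({"A-B", "B-C", "C-D"}, rules))  # True
print(classify({"A-B"}, rules))                # False
```

The single-pattern candidate is rejected because it appears in negative graphs, while the co-occurrence passes both thresholds and covers all four positive graphs.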
23
Experiments: Datasets
  • Protein datasets: six SCOP families
  • Chemical datasets: six PubChem bioassays
24
Experiments: Parameters and Evaluation
  • Protein datasets: tp = 30, tn = 0
  • Chemical datasets: tp = 1, tn = 0.4
25
Experimental Results: Protein Datasets
26
Experimental Results: Chemical Datasets
27
Conclusions
  • Using heuristic pattern exploration order and
    co-occurrences can improve runtime efficiency of
    mining discriminative patterns
  • Using association rules can achieve competitive
    classification accuracy

28
Questions and Suggestions