Frequent Subgraph Pattern Mining on Uncertain Graph Data - PowerPoint PPT Presentation

About This Presentation

Title:

Frequent Subgraph Pattern Mining on Uncertain Graph Data

Description:

Frequent Subgraph Pattern Mining on Uncertain Graph Data. Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang, Harbin Institute of Technology, China. CIKM'09, Hong Kong. PowerPoint PPT presentation.


Slides: 30


Transcript and Presenter's Notes

Title: Frequent Subgraph Pattern Mining on Uncertain Graph Data

1
Frequent Subgraph Pattern Mining on Uncertain Graph Data

Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang
Harbin Institute of Technology, China
CIKM'09, Hong Kong
Nov 4, 2009

2
Outline

Background
Problem Definition
Algorithm
Experimental Results
Conclusions

3
Background

Graph mining has played an important role in a range of real-world applications:
Medicine: structures of molecules
Bioinformatics: biological networks
Technology: the WWW
Social science: social networks
Many others

4
Directions of Graph Mining
Models of graphs (e.g., Leskovec et al., KDD'05)
Patterns of graphs (e.g., Yan et al., ICDM'02)
Uncertainties of graphs
Privacy of graphs (e.g., Zou et al., VLDB'09)
Evolution of graphs (e.g., Faloutsos et al., SIGMOD'07)
5
Uncertainties of Graphs: Example I

Protein-Protein Interaction (PPI) Networks
Vertices: proteins
Edges: interactions between proteins
Uncertainties: probabilities that the interactions really exist

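[Figure: a fragment of a PPI network over the proteins TIF34, FET3, NTG1, SMT3, RAD59, and RPC40, with interaction probabilities 0.375, 0.639, 0.867, 0.651, 0.651, 0.147, 0.639, and 0.698]
The data are taken from the STRING Database (http://string-db.org).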
6
Uncertainties of Graphs: Example II

Topologies of wireless sensor networks (WSNs)
Vertices: sensor nodes
Edges: wireless links between sensor nodes
Uncertainties: probabilities that the wireless links are functioning at any given time

[Figure: an example WSN topology with link probabilities 0.75, 0.95, 0.88, 0.92, and 0.69]
7
The Goal of This Paper
Models of graphs (e.g., Leskovec et al., KDD'05)
Patterns of graphs (e.g., Yan et al., ICDM'02)
Uncertainties of graphs (the focus of this paper)
Privacy of graphs (e.g., Zou et al., VLDB'09)
Evolution of graphs (e.g., Faloutsos et al., SIGMOD'07)
8
Outline

Background
Problem Definition
Algorithm
Experimental Results
Conclusions

9
Preliminaries
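[Figure: a graph database and two subgraph patterns, with supports 1.0 and 0.5]
The support of S is the number of graphs containing S divided by the total number of graphs:
$$\mathrm{support}(S) = \frac{|\{G \in D : S \subseteq G\}|}{|D|}$$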
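To make the support definition concrete, here is a minimal sketch for tiny labeled graphs. The graph representation, the function names `contains` and `support`, and the toy data are all hypothetical; containment is checked by brute force over injective, label-preserving vertex maps (non-induced subgraph isomorphism), which is only practical for very small graphs.

```python
from itertools import permutations

# Hypothetical representation: vertex labels plus undirected edges as frozensets.
Graph = tuple[dict[int, str], set[frozenset[int]]]


def contains(graph: Graph, pattern: Graph) -> bool:
    """True if some injective, label-preserving vertex map sends every pattern
    edge to a graph edge (brute force; only for tiny graphs)."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_nodes):
            continue
        if all(frozenset(mapping[v] for v in e) in g_edges for e in p_edges):
            return True
    return False


def support(pattern: Graph, database: list[Graph]) -> float:
    """Number of graphs containing the pattern divided by the number of graphs."""
    return sum(contains(g, pattern) for g in database) / len(database)


g1 = ({1: "A", 2: "B", 3: "C"}, {frozenset({1, 2}), frozenset({2, 3})})
g2 = ({1: "A", 2: "B"}, {frozenset({1, 2})})
s_ab = ({0: "A", 1: "B"}, {frozenset({0, 1})})
s_bc = ({0: "B", 1: "C"}, {frozenset({0, 1})})
print(support(s_ab, [g1, g2]), support(s_bc, [g1, g2]))  # 1.0 0.5
```

The edge A-B occurs in both toy graphs, so its support is 1.0; the edge B-C occurs only in `g1`, so its support is 0.5.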
10
Frequent Subgraph Pattern Mining Problem

Input: a graph database D and a support threshold minsup
Output: all subgraph patterns with support no less than minsup
FSP mining on biological networks (e.g., PPI networks) is an important tool for discovering functional modules (Koyutürk et al., Bioinformatics'04; Turanalp et al., BMC Bioinformatics'08).
PPI networks are subject to uncertainties.
How do we define support?

11
Model of Uncertain Graphs
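[Figure: an uncertain graph whose four edges exist with probabilities 0.5, 0.6, 0.7, and 0.8, and two of its implicated graphs]
$$(1-0.5)\times 0.6\times 0.7\times 0.8 = 0.168$$
$$0.5\times (1-0.6)\times 0.7\times 0.8 = 0.112$$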
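The arithmetic above multiplies p for each kept edge and 1 − p for each dropped edge, which presumes that edges exist independently. A small sketch of that rule follows; the edge names `e1` to `e4` and the function name are hypothetical.

```python
from math import prod


def implicated_probability(edge_probs: dict[str, float], present: set[str]) -> float:
    """Probability that the uncertain graph implicates the graph keeping exactly
    the edges in `present`, assuming edges exist independently."""
    return prod(p if e in present else 1.0 - p for e, p in edge_probs.items())


# Hypothetical edge names for the four edges of the slide's uncertain graph.
edge_probs = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}
print(implicated_probability(edge_probs, {"e2", "e3", "e4"}))  # about 0.168
print(implicated_probability(edge_probs, {"e1", "e3", "e4"}))  # about 0.112
```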
12
Model of Uncertain Graphs (Cont'd)
Theorem: An uncertain graph represents a probability distribution over all its implicated graphs.
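One way to see the theorem on the running example: enumerate all 2^|E| implicated graphs under the same independent-edge assumption and check that their probabilities sum to 1. The function name and edge names are hypothetical.

```python
from itertools import product
from math import prod


def implicated_graphs(edge_probs):
    """Yield (present_edges, probability) for each of the 2**|E| implicated graphs,
    assuming edges exist independently."""
    edges = list(edge_probs)
    for keep in product((False, True), repeat=len(edges)):
        present = frozenset(e for e, k in zip(edges, keep) if k)
        p = prod(edge_probs[e] if k else 1.0 - edge_probs[e] for e, k in zip(edges, keep))
        yield present, p


edge_probs = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}  # hypothetical edge names
dist = dict(implicated_graphs(edge_probs))
print(len(dist))           # 16
print(sum(dist.values()))  # 1.0 up to floating-point rounding
```

The sum is 1 because it expands the product of (p + (1 − p)) over all edges.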
13
Uncertain Graph Databases
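Theorem: An uncertain graph DB represents a probability distribution over all its implicated graph DBs.
In total, there are $2^4 \times 2^3 = 128$ implicated graph databases.
[Figure: an uncertain graph database of two uncertain graphs with 4 and 3 edges, and one of its implicated graph databases]
$$((1-0.5)\times 0.6\times 0.7\times 0.8)\times(0.8\times 0.1\times(1-0.7)) = 4.032\times 10^{-3}$$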
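The slide's product suggests that an implicated graph DB's probability is the product of the probabilities of its implicated graphs, i.e. that the uncertain graphs are independent. A sketch under that assumption; the edge names, `G1`, `G2`, and the function names are hypothetical.

```python
from math import prod


def implicated_probability(edge_probs, present):
    """Probability of one implicated graph, assuming independent edges."""
    return prod(p if e in present else 1.0 - p for e, p in edge_probs.items())


def implicated_db_probability(uncertain_db, choice):
    """Probability that the uncertain DB implicates the graph DB that selects the
    implicated graph choice[i] (a set of present edges) for uncertain graph i.
    Assumes the uncertain graphs are mutually independent."""
    return prod(implicated_probability(g, present) for g, present in zip(uncertain_db, choice))


# Hypothetical edge names; probabilities follow the slide's arithmetic.
G1 = {"a1": 0.5, "a2": 0.6, "a3": 0.7, "a4": 0.8}
G2 = {"b1": 0.8, "b2": 0.1, "b3": 0.7}
db = [G1, G2]
n_dbs = prod(2 ** len(g) for g in db)  # 2**4 * 2**3
p = implicated_db_probability(db, [{"a2", "a3", "a4"}, {"b1", "b2"}])
print(n_dbs, p)  # 128 and approximately 0.004032
```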
14
Expected Support
D: an uncertain graph DB that implicates the graph DBs d_1, d_2, ..., d_n
p_i = Pr(D implicates d_i), for i = 1, ..., n
s_i = support of S in d_i
The expected support of S is
$$\mathrm{esup}(S) = \sum_{i=1}^{n} p_i\, s_i$$
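The definition can be evaluated literally on a toy database by enumerating every implicated graph DB, weighting its support by its probability, and summing. The sketch below assumes independent edges and graphs, and it takes the embeddings of S in each uncertain graph as precomputed edge sets (a hypothetical input): S is contained in an implicated graph iff all edges of at least one embedding are present. The embeddings for `G1` mirror the four-embedding example used later for Pr(S occurs in G_i); `G2` and its embedding are made up.

```python
from itertools import product
from math import prod


def implicated_graphs(edge_probs):
    """Yield (present_edges, probability) for each implicated graph (independent edges)."""
    edges = list(edge_probs)
    for keep in product((False, True), repeat=len(edges)):
        present = {e for e, k in zip(edges, keep) if k}
        yield present, prod(edge_probs[e] if k else 1.0 - edge_probs[e] for e, k in zip(edges, keep))


def expected_support(uncertain_db, embeddings):
    """esup(S) = sum_i p_i * s_i, summed over every implicated graph DB d_i.

    embeddings[k] holds the edge sets of S's embeddings in uncertain graph k
    (assumed precomputed); S is contained in an implicated graph iff all edges
    of at least one embedding are present.
    """
    per_graph = [list(implicated_graphs(g)) for g in uncertain_db]
    esup = 0.0
    for db in product(*per_graph):  # one implicated graph per uncertain graph
        p_i = prod(p for _, p in db)
        hits = sum(any(emb <= present for emb in embs)
                   for (present, _), embs in zip(db, embeddings))
        esup += p_i * hits / len(uncertain_db)  # p_i * s_i
    return esup


G1 = {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8}
G2 = {"y1": 0.8, "y2": 0.1, "y3": 0.7}
emb1 = [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}]
emb2 = [{"y1", "y3"}]
print(expected_support([G1, G2], [emb1, emb2]))  # approximately 0.671
```

By linearity of expectation, the result equals the average over the graphs of Pr(S occurs in G_i): here (0.782 + 0.56) / 2 = 0.671. The enumeration itself is exponential and serves only to illustrate the definition.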
15
FSP Mining Problem on Uncertain Graphs

Input: an uncertain graph database D and an expected support threshold minsup
Output: all subgraph patterns with expected support no less than minsup
It is #P-hard to count the number of frequent subgraph patterns, by reduction from the problem of counting the satisfying truth assignments of a monotone k-CNF formula.
The FSP mining problem on uncertain graphs is NP-hard.

16
Outline

Background
Problem Definition
Algorithm
Experimental Results
Conclusions

17
Approximation Method

It is #P-hard to compute the expected support of a subgraph pattern.
We develop an approximation method to find an approximate set of frequent subgraph patterns.
Let ε (0 < ε < 1) be a relative error tolerance.

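[Figure: the expected-support axis from 0 to 1, split at (1 − ε)·minsup and minsup into three regions]
Output: esup(S) ≥ minsup
Arbitrary: (1 − ε)·minsup ≤ esup(S) < minsup
Discard: esup(S) < (1 − ε)·minsup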
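The three regions define what counts as an acceptable answer: every pattern in the Output region must be reported, none in the Discard region may be, and patterns in between may go either way. A minimal checker for one decision follows; the function name and the exact boundary conventions (≥ at minsup, < at (1 − ε)·minsup) are assumptions.

```python
def satisfies_guarantee(esup, output, minsup, eps):
    """True if outputting (output=True) or discarding (output=False) a pattern with
    expected support esup respects the three regions."""
    if esup >= minsup:
        return output            # Output region: S must be reported
    if esup < (1 - eps) * minsup:
        return not output        # Discard region: S must not be reported
    return True                  # Arbitrary region: either decision is fine


minsup, eps = 0.5, 0.1
print(satisfies_guarantee(0.60, False, minsup, eps))  # False
print(satisfies_guarantee(0.47, False, minsup, eps))  # True
print(satisfies_guarantee(0.30, True, minsup, eps))   # False
```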
18
Objective I

Difficulty I: The number of frequent subgraph patterns is exponentially large.
Objective I: Examine subgraph patterns as efficiently as possible to find all frequent ones.

19
Method for Objective I

Step 1: Build a search tree T of subgraph patterns.
Step 2: Examine the subgraph patterns in T in depth-first order.
If S is infrequent, then all its descendants can be pruned.
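Pruning is safe because expected support is anti-monotone: if S is a subgraph of S′, then S′ can occur only where S does, so esup(S′) ≤ esup(S). The generic depth-first sketch below assumes that every child in the tree extends its parent. The toy tree, its string names, and the `esup` values are hypothetical, and the frequency test is passed in as a callable.

```python
def mine_frequent(root, children, is_frequent):
    """Examine patterns of a search tree in depth-first order.

    An infrequent pattern's subtree is skipped, which is safe when every child
    extends its parent (expected support can only drop along a branch).
    Returns (frequent patterns, patterns actually examined).
    """
    frequent, examined = [], []

    def visit(pattern):
        examined.append(pattern)
        if not is_frequent(pattern):
            return  # prune all descendants of pattern
        frequent.append(pattern)
        for child in children(pattern):
            visit(child)

    for child in children(root):
        visit(child)
    return frequent, examined


# Hypothetical toy search tree; "" is the empty root pattern.
tree = {"": ["A", "B"], "A": ["A-B"], "A-B": ["A-B-C"], "A-B-C": [], "B": ["B-C"], "B-C": []}
esup = {"A": 0.9, "A-B": 0.3, "A-B-C": 0.2, "B": 0.7, "B-C": 0.6}
minsup = 0.5
frequent, examined = mine_frequent("", tree.__getitem__, lambda s: esup[s] >= minsup)
print(frequent)  # ['A', 'B', 'B-C']
print(examined)  # ['A', 'A-B', 'B', 'B-C']  ("A-B-C" is never examined)
```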

20
Objective II

Difficulty II: It is #P-hard to compute the expected support esup(S) of a subgraph pattern S.
Objective II: Make the following judgments without computing esup(S) exactly.
If esup(S) is surely not in the green (Output) region, then discard S.
If esup(S) may be in the green (Output) region and is surely not in the red (Discard) region, then output S.

21
Method for Objective II

Step 1: Approximate esup(S) by an interval [l, u] such that esup(S) ∈ [l, u].
Step 2: Decide whether S can be output by testing the following conditions.

[Figure: the three possible outcomes of the test: Output, Discard, Shrink]
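The exact conditions on this slide are not in the transcript. The sketch below only turns the two judgments stated for Objective II into tests on [l, u]. It is a hypothetical rule, not necessarily the one the authors use: discard when u < minsup, output when u ≥ minsup and l ≥ (1 − ε)·minsup, and shrink otherwise.

```python
def decide(l, u, minsup, eps):
    """Hypothetical decision rule for a pattern whose esup(S) is known to lie in [l, u]."""
    if u < minsup:
        # esup(S) <= u < minsup: surely not in the Output region.
        return "discard"
    if l >= (1 - eps) * minsup:
        # esup(S) may reach minsup (u >= minsup) and esup(S) >= l, so it is
        # surely not in the Discard region below (1 - eps) * minsup.
        return "output"
    # Undecided: the caller should shrink [l, u] and test again.
    return "shrink"


minsup, eps = 0.5, 0.1
print(decide(0.20, 0.40, minsup, eps))  # discard
print(decide(0.46, 0.55, minsup, eps))  # output
print(decide(0.40, 0.55, minsup, eps))  # shrink
```

Under this assumed rule, any interval of width at most ε·minsup with u ≥ minsup already has l ≥ (1 − ε)·minsup. The "shrink" branch is therefore reached only while the interval is still wider than ε·minsup.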
22
Approximating esup(S) by [l, u]
A subgraph pattern S occurs in an uncertain graph G if S is contained in at least one implicated graph of G.

Algorithm: Approximate esup(S) by [l, u]
Step 1: For each uncertain graph G_i in D, approximate Pr(S occurs in G_i) by an interval [l_i, u_i] of width at most ε·minsup.
Step 2:
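The formula for Step 2 is missing from the transcript. One natural way to combine the per-graph intervals, assumed here, follows from linearity of expectation: if esup(S) = (1/n)·Σ_i Pr(S occurs in G_i), then averaging the bounds gives an interval [l, u] that contains esup(S) and is no wider than ε·minsup. The function name is hypothetical.

```python
def combine_intervals(intervals):
    """Combine per-graph intervals [l_i, u_i] for Pr(S occurs in G_i) into [l, u] for esup(S).

    Assumes esup(S) = (1/n) * sum_i Pr(S occurs in G_i), which follows from
    linearity of expectation. Each bound is first clipped to [0, 1], which keeps
    the true probability inside the interval and never widens it.
    """
    n = len(intervals)
    lows = [max(0.0, lo) for lo, _ in intervals]
    highs = [min(1.0, hi) for _, hi in intervals]
    return sum(lows) / n, sum(highs) / n


l, u = combine_intervals([(0.75, 0.80), (0.50, 0.55)])
print(l, u)  # about 0.625 and 0.675
```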
23
Approximate Pr(S occurs in G_i) by [l_i, u_i]
Step 1: Find all embeddings of S in G_i (here, 4 embeddings).
Step 2: Assign Boolean variables to the edges in the embeddings: Pr(x1) = 0.5, Pr(x2) = 0.6, Pr(x3) = 0.7, Pr(x4) = 0.8.
Step 3: Construct a conjunctive clause for each embedding: C1 = x1 ∧ x2, C2 = x1 ∧ x4, C3 = x2 ∧ x3, C4 = x3 ∧ x4.
Step 4: Construct a DNF formula F = C1 ∨ C2 ∨ C3 ∨ C4.
Step 5: Estimate Pr(F = TRUE) by p using Karp and Luby's Monte Carlo method with absolute error ε·minsup/2 and confidence δ (δ ∈ (0, 1)).
Step 6: [l_i, u_i] = [p − ε·minsup/2, p + ε·minsup/2].
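The transcript does not say how the estimator in Step 5 works or how many samples it draws. Below is a sketch of the classic Karp-Luby coverage estimator for the slide's formula, assuming independent variables. The sample count comes from a Hoeffding bound chosen here for simplicity, which is an assumption and not necessarily the bound the authors use. The function names and the parameter values ε = 0.1, minsup = 0.2, δ = 0.01 are hypothetical. An exact enumeration is included only as a check.

```python
import math
import random
from itertools import accumulate, product


def karp_luby(clauses, probs, abs_err, delta, rng=random):
    """Karp-Luby estimate of Pr(F = TRUE) for a monotone DNF F = C_1 v ... v C_m,
    where clauses[j] lists the variables of C_j and the variables are independent
    with Pr(x_v = TRUE) = probs[v]."""
    weights = [math.prod(probs[v] for v in c) for c in clauses]  # Pr(C_j = TRUE)
    total = sum(weights)
    if total == 0.0:
        return 0.0
    cum = list(accumulate(weights))
    # Each sample is total * (0 or 1), so Hoeffding's inequality gives this many
    # samples for |estimate - Pr(F)| <= abs_err with probability >= 1 - delta.
    n = math.ceil(total ** 2 * math.log(2 / delta) / (2 * abs_err ** 2))
    hits = 0
    for _ in range(n):
        j = rng.choices(range(len(clauses)), cum_weights=cum)[0]  # prob w_j / total
        # Draw an assignment conditioned on C_j being TRUE.
        assign = {v: rng.random() < p for v, p in probs.items()}
        for v in clauses[j]:
            assign[v] = True
        # Count the sample only if C_j is the first satisfied clause.
        first = next(k for k, c in enumerate(clauses) if all(assign[v] for v in c))
        hits += first == j
    return total * hits / n


def exact_dnf_probability(clauses, probs):
    """Exact Pr(F = TRUE) by enumerating all assignments (for checking only)."""
    names = list(probs)
    result = 0.0
    for bits in product((False, True), repeat=len(names)):
        assign = dict(zip(names, bits))
        if any(all(assign[v] for v in c) for c in clauses):
            result += math.prod(probs[v] if assign[v] else 1 - probs[v] for v in names)
    return result


def occurrence_interval(p, eps, minsup):
    """Step 6: [l_i, u_i] = [p - eps*minsup/2, p + eps*minsup/2]."""
    half = eps * minsup / 2
    return p - half, p + half


probs = {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8}
clauses = [("x1", "x2"), ("x1", "x4"), ("x2", "x3"), ("x3", "x4")]
eps, minsup, delta = 0.1, 0.2, 0.01  # hypothetical parameter values
exact = exact_dnf_probability(clauses, probs)  # F = (x1 v x3)(x2 v x4): 0.85 * 0.92
p = karp_luby(clauses, probs, eps * minsup / 2, delta, rng=random.Random(0))
print(exact, p, occurrence_interval(p, eps, minsup))
```

Here F factors as (x1 ∨ x3) ∧ (x2 ∨ x4), so the exact value is 0.85 × 0.92 = 0.782. The estimator is unbiased because each satisfying assignment is counted exactly once, through its first satisfied clause.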
24
Outline

Background
Problem Definition
Algorithm
Experimental Results
Conclusions

25
Experimental Results

Data: the STRING Database (http://string-db.org)

26
Time Efficiency
27
Approximation Quality
28
Scalability
29
Conclusions

A new model of uncertain graph data has been
proposed.
The frequent subgraph pattern mining problem on
uncertain graph data has been formalized.
The computational complexity of the problem has
been formally proved to be NP-hard.
An approximate mining algorithm has been
proposed.
Experiments show that the proposed algorithm is efficient, achieves high approximation quality, and scales well.

30
Thank you
