Title: Computer Science and Engineering
- Efficient Subgraph Similarity All-Matching
- Computer Science and Engineering
- Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin, Chuan Xiao - The University of New South Wales
Outline
- Introduction
- Preliminary
- Framework
- Algorithms
- Experiments
- Conclusions
Introduction - Graph Data
- Cheminformatics
- Chemical Compounds (Small Size)
- Bioinformatics
- Protein Interaction Networks (Medium Size)
- Internet
- World Wide Web (Large Size)
Introduction - Subgraph All-Matching
- Problem
- Subgraph exact all-matching enumerates all exact matches of a query graph q in a large data graph G.
- Subgraph similarity all-matching enumerates all similarity matches of a query graph q in a large data graph G.
- Motivations
- Noisy query graphs due to erroneous user input.
- Noisy data graphs due to imprecise data collection.
Preliminaries
- Edge Edit Distance
- The edge edit distance from a graph g1 to another graph g2 is the minimum number of edge insertions required to transform g1 into g2.
- GED(p1, q) = 0, GED(p2, q) = 1.
(Figure: query graph q and patterns p1, p2, with vertex labels A, B, C)
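When every edge of a pattern is already an edge of q and both share the same labeled vertices, the only edits needed are insertions of the edges the pattern lacks. The sketch below computes the distance under exactly that assumption; the function name `edge_edit_distance` and the toy graph are hypothetical, not taken from the slides.

```python
def edge_edit_distance(p_edges, q_edges):
    """Edge insertions needed to turn pattern p into query q.

    Assumes p and q use the same labeled vertex ids and that every edge of
    p is also an edge of q, so the distance is simply the number of q-edges
    that p lacks.
    """
    p = {frozenset(e) for e in p_edges}
    q = {frozenset(e) for e in q_edges}
    if not p <= q:
        raise ValueError("p has an edge that q lacks; insertions alone cannot fix that")
    return len(q - p)


# Hypothetical toy query: triangle 1-2-3 plus a pendant vertex 4.
q = [(1, 2), (2, 3), (1, 3), (2, 4)]
p1 = list(q)                    # no edge missing
p2 = [(1, 2), (2, 3), (2, 4)]   # edge (1, 3) missing
print(edge_edit_distance(p1, q), edge_edit_distance(p2, q))  # 0 1
```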
Preliminaries
- Feasible Pattern
- Given a distance threshold d, p is called a feasible pattern of q if p is a connected subgraph of q with no missing vertex and GED(p, q) ≤ d.
- For d = 1, the feasible patterns of q are p1, p2, p3, and p4.
(Figure: query graph q with d = 1 and its feasible patterns p1, p2, p3, p4; vertex labels A, B, C)
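The definition can be turned into a brute-force enumerator: drop every combination of at most d edges and keep the result only if it still spans all vertices of q as one connected graph. This is a minimal sketch for small queries; `feasible_patterns`, `is_connected`, and the edge-list representation are assumptions for illustration.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """True if every vertex in `vertices` is reachable from the first one."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def feasible_patterns(vertices, q_edges, d):
    """Yield (missing_edges, kept_edges) for each feasible pattern of q.

    A pattern keeps every vertex of q, drops k <= d edges, and must stay
    connected; its distance GED(p, q) is then exactly k.
    """
    q_edges = [tuple(e) for e in q_edges]
    for k in range(d + 1):
        for missing in combinations(q_edges, k):
            kept = [e for e in q_edges if e not in missing]
            if is_connected(vertices, kept):
                yield missing, kept


q = [(1, 2), (2, 3), (1, 3), (2, 4)]
for missing, _ in feasible_patterns([1, 2, 3, 4], q, d=1):
    print("missing:", missing)
```

In this toy query, dropping the pendant edge (2, 4) would isolate vertex 4, so that pattern is rejected.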
Preliminaries
- Similarity Matches
- A similarity match of q in G is a subgraph isomorphism from some feasible pattern p of q to G.
- Every feasible pattern must be considered!
- Exact matches of q in G are also similarity matches, since q itself is a feasible pattern with GED(q, q) = 0!
(Figure: data graph G, query graph q, and patterns p1, p2; one exact match and two similarity matches of q are marked in G)
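To make the definition concrete, the sketch below checks whether a given vertex mapping is an exact match (a non-induced, label-preserving subgraph isomorphism) of one pattern, and calls it a similarity match if it is an exact match of at least one feasible pattern. The function names, the toy graphs, and the hand-listed pattern set are hypothetical.

```python
def is_exact_match(mapping, p_labels, p_edges, g_labels, g_edges):
    """True if `mapping` (pattern vertex -> data vertex) embeds pattern p in G:
    injective, label-preserving, and every pattern edge maps to a data edge."""
    if set(mapping) != set(p_labels):
        return False                                   # all vertices mapped
    if len(set(mapping.values())) != len(mapping):
        return False                                   # injective
    if any(g_labels.get(mapping[u]) != lab for u, lab in p_labels.items()):
        return False                                   # labels preserved
    g_adj = {frozenset(e) for e in g_edges}
    return all(frozenset((mapping[u], mapping[v])) in g_adj for u, v in p_edges)


def is_similarity_match(mapping, patterns, q_labels, g_labels, g_edges):
    """Similarity match: an exact match of at least one feasible pattern
    (each pattern is given as an edge list over q's vertices)."""
    return any(is_exact_match(mapping, q_labels, pe, g_labels, g_edges)
               for pe in patterns)


# Hypothetical toy instance: q is an A-B-C triangle, G lacks the A-C edge.
q_labels = {1: "A", 2: "B", 3: "C"}
q_edges = [(1, 2), (2, 3), (1, 3)]
patterns_d1 = [q_edges, [(1, 2), (2, 3)], [(2, 3), (1, 3)], [(1, 2), (1, 3)]]
g_labels = {10: "A", 11: "B", 12: "C"}
g_edges = [(10, 11), (11, 12)]
m = {1: 10, 2: 11, 3: 12}
print(is_exact_match(m, q_labels, q_edges, g_labels, g_edges))           # False
print(is_similarity_match(m, patterns_d1, q_labels, g_labels, g_edges))  # True
```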
SAPPER [VLDB'10]
(Figure: SAPPER workflow for d = 1 on data graph G: Enumerate Phase, then Search Phase, then Results, producing per-pattern matches Mp1, Mp2, ..., Mp5)
Motivation I - Effective Search Order
(Figure: a query graph with vertices v1 to v8 and a data graph G, vertex labels A, B, C, D; the figure annotates intermediate match counts 1, 1, 1, 1, 4, 12, and 27, which sum to the 47 of Search Order One)
- Search Order One: v1, v2, v3, v4, v5, v6, v7, v8 → 47 intermediate matches
- Search Order Two: v4, v3, v5, v6, v2, v1, v7, v8 → 350 intermediate matches
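The effect of the search order can be reproduced with a small backtracking matcher that maps pattern vertices one at a time in a fixed order and counts every partial match it creates. Counting complete matches as intermediate ones too is an assumption about how "intermediate matches" is tallied; `match_with_order` and the toy data are hypothetical.

```python
def match_with_order(order, p_labels, p_edges, g_labels, g_edges):
    """Backtracking search that maps pattern vertices in the given `order`.

    Returns (matches, intermediate), where `intermediate` counts every
    partial match created during the search (including complete ones).
    """
    def adjacency(edges):
        adj = {}
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        return adj

    g_adj, p_adj = adjacency(g_edges), adjacency(p_edges)
    matches, intermediate = [], 0

    def extend(i, mapping):
        nonlocal intermediate
        if i == len(order):
            matches.append(dict(mapping))
            return
        u = order[i]
        used = set(mapping.values())
        for x, label in g_labels.items():
            if label != p_labels[u] or x in used:
                continue
            # already-mapped neighbours of u must map to neighbours of x
            if all(mapping[w] in g_adj.get(x, set())
                   for w in p_adj.get(u, set()) if w in mapping):
                intermediate += 1
                mapping[u] = x
                extend(i + 1, mapping)
                del mapping[u]

    extend(0, {})
    return matches, intermediate


# Hypothetical data: one A linked to three Bs, plus two extra Bs elsewhere.
p_labels, p_edges = {1: "A", 2: "B"}, [(1, 2)]
g_labels = {0: "A", 1: "B", 2: "B", 3: "B", 4: "B", 5: "B"}
g_edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
print(match_with_order([1, 2], p_labels, p_edges, g_labels, g_edges)[1])  # 4
print(match_with_order([2, 1], p_labels, p_edges, g_labels, g_edges)[1])  # 8
```

Both orders find the same 3 matches; starting from the rarer label prunes the two unconnected B vertices before they are ever extended.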
Motivation II - Sharing Computation
(Figure: two feasible patterns p and p')
- Query Execution Plan One: search p and p' separately → no computation is shared.
- Query Execution Plan Two: search f1, f2, and f2', then join → the computation on f1 is shared.
Framework - DecQ
- Query Decomposition (Phase One)
- Decompose the query graph q into a set of selective, edge-disjoint sub-queries Q = {f1, ..., fn}, called fragments.
(Figure: the query graph q is decomposed into fragments f1, f2, f3, f4)
Framework - DecQ
- Local Matching (Phase Two)
- Enumerate all local (feasible) patterns f' of each fragment f, and apply depth-first search on each local pattern f' to obtain its local matches (the exact matches of f' in G).
(Figure: for a fragment f, the local patterns f'a, f'b, f'c, f'd are enumerated, and depth-first search produces their local matches Mfa, Mfb, Mfc, Mfd)
Framework - DecQ
- Global Matching (Phase Three)
- Enumerate all global (feasible) patterns p and
merge the local matches of decomposed local
patterns of p to obtain the global matches (exact
matches of p in G).
(Figure: for a global pattern p, the local matches Mf1, Mf2, Mf3, Mf4 of its local patterns f'1, f'2, f'3, f'4 are retrieved and merged into the global matches Mp)
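Merging local matches amounts to a join on the pattern vertices that the edge-disjoint local patterns share. The sketch below uses a plain nested-loop join with an injectivity check and folds several match lists together with `functools.reduce`; the function name, the dict-based match representation, and the data are assumptions, and a real system would likely use a hash join.

```python
from functools import reduce


def merge_matches(left, right):
    """Join two lists of matches (dicts: pattern vertex -> data vertex).

    Fragments are edge-disjoint but may share vertices, so two matches are
    compatible when they agree on every shared pattern vertex and the merged
    mapping stays injective. A nested-loop join keeps the sketch short.
    """
    merged = []
    for a in left:
        for b in right:
            if any(a[v] != b[v] for v in a.keys() & b.keys()):
                continue
            combined = {**a, **b}
            if len(set(combined.values())) == len(combined):
                merged.append(combined)
    return merged


# Hypothetical local matches of two local patterns sharing pattern vertex 2.
m_f1 = [{1: 10, 2: 11}, {1: 10, 2: 12}]
m_f2 = [{2: 11, 3: 13}, {2: 12, 3: 11}, {2: 11, 3: 10}]
mp = reduce(merge_matches, [m_f1, m_f2])   # more lists can be appended here
print(mp)
```

The third match of `m_f2` is dropped because combining it with `{1: 10, 2: 11}` would map two pattern vertices to data vertex 10.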
Algorithms
- Local Matching
- Enumerate all local patterns f' of each fragment f.
- Search all exact matches of each f' in a depth-first fashion with an effective search order.
- Effective Search Order
- Finding a search order that minimizes the number of intermediate matches produced in the depth-first search is NP-complete.
Algorithms
- Estimating Exact Matches of a Graph
- Given a graph f, assume M(v) / M(e) contains all mappings in G of a vertex v / edge e in f.
- For each edge (u, v) in f, given any u' in M(u) and v' in M(v), the probability that there is an edge (u', v') in G is:
- The estimated number of exact matches of f in G can be represented by:
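One natural way to realize such an estimate, assuming edges occur independently, is to take P(e) = |M(e)| / (|M(u)| · |M(v)|) for each edge and multiply the product of all |M(v)| by the product of all P(e). This is an assumed model for illustration, not necessarily the exact formula used in DecQ; `estimate_matches` and the statistics are hypothetical.

```python
from math import prod


def estimate_matches(vertices, edges, m_vertex, m_edge):
    """Independence-based estimate of the number of exact matches of f in G.

    m_vertex[v]     -> |M(v)|, candidate data vertices for query vertex v
    m_edge[(u, v)]  -> |M(e)|, candidate data edges for query edge e = (u, v)
    Assumed model: P(e) = |M(e)| / (|M(u)| * |M(v)|) and edges independent,
    so estimate = prod_v |M(v)| * prod_e P(e).
    """
    estimate = prod(m_vertex[v] for v in vertices)
    for u, v in edges:
        estimate *= m_edge[(u, v)] / (m_vertex[u] * m_vertex[v])
    return estimate


# Hypothetical statistics for a path 1-2-3.
m_vertex = {1: 10, 2: 20, 3: 5}
m_edge = {(1, 2): 40, (2, 3): 10}
print(estimate_matches([1, 2, 3], [(1, 2), (2, 3)], m_vertex, m_edge))
```

Here 10 · 20 · 5 = 1000 candidate vertex combinations are scaled by 40/200 and 10/100, giving about 20 estimated matches.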
Algorithms
- Approximating Optimal Search Order
- A search order grows a local pattern f vertex by vertex.
- Greedy heuristic: select the next vertex v such that the estimated number of exact matches of the current subgraph s of f is minimized.
(Figure: local pattern f grown through successive subgraphs s1, s2, s3)
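The greedy rule can be sketched as follows: at each step, among the unchosen vertices adjacent to the current subgraph s, pick the one whose addition gives the smallest estimated match count. The estimate reuses an independence assumption, and restricting candidates to neighbours of s (so the search can prune on edges) is a design choice of this sketch; `greedy_search_order` and the statistics are hypothetical.

```python
def greedy_search_order(vertices, edges, m_vertex, m_edge):
    """Greedy search order: repeatedly add the vertex that minimises the
    estimated number of exact matches of the current subgraph s of f."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def estimate(sub):
        # independence-based estimate for the part of f spanned by `sub`
        est = 1.0
        for v in sub:
            est *= m_vertex[v]
        for u, v in edges:
            if u in sub and v in sub:
                est *= m_edge[(u, v)] / (m_vertex[u] * m_vertex[v])
        return est

    order, chosen = [], set()
    while len(order) < len(vertices):
        rest = [v for v in vertices if v not in chosen]
        # prefer vertices adjacent to s so the search can prune on edges
        frontier = [v for v in rest if adj[v] & chosen] if chosen else rest
        best = min(frontier or rest, key=lambda v: estimate(chosen | {v}))
        order.append(best)
        chosen.add(best)
    return order


m_vertex = {1: 10, 2: 20, 3: 5}
m_edge = {(1, 2): 40, (2, 3): 10}
print(greedy_search_order([1, 2, 3], [(1, 2), (2, 3)], m_vertex, m_edge))
```

On this path the rarest vertex 3 is chosen first, then its only neighbour 2, then 1.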
Algorithms
- Global Matching
- A global pattern p is either a minimal or a non-minimal pattern.
- A minimal pattern p has no subgraph p' that is also a global pattern and misses exactly one edge of p.
- A non-minimal pattern p has at least one subgraph p' that is also a global pattern and misses exactly one edge of p.
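With patterns described by the set of q-edges they miss, minimality can be tested directly: p is non-minimal if removing one more of its edges still leaves a connected graph over all of q's vertices within the distance budget d. The representation and the names `is_minimal` and `is_connected` are assumptions for this sketch.

```python
def is_connected(vertices, edges):
    """True if every vertex in `vertices` is reachable from the first one."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def is_minimal(vertices, q_edges, missing, d):
    """Pattern p = q minus `missing`. p is non-minimal if dropping one more
    of its edges still yields a global pattern (connected, <= d missing)."""
    if len(missing) + 1 > d:
        return True                  # no budget left to drop another edge
    kept = [e for e in q_edges if e not in missing]
    return not any(
        is_connected(vertices, [x for x in kept if x != e]) for e in kept
    )


V = [1, 2, 3, 4]
q = [(1, 2), (2, 3), (1, 3), (2, 4)]
print(is_minimal(V, q, (), 1))           # q itself: not minimal
print(is_minimal(V, q, ((1, 3),), 2))    # remaining tree: minimal
```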
Algorithms
- Processing Minimal Patterns
- For a minimal pattern p, we decompose p into a
set of local patterns and merge the local matches
to obtain global matches Mp.
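(Figure: global patterns p and p' are decomposed into local patterns (f'1, f'2, f'3, f'4 and f1, f2, f'3, f'4) with local matches M1 to M4; the merged matches of (f3, f4) are stored once and reused for the other pattern)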
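The store-and-reuse idea can be sketched as a bushy join plan whose intermediate results are cached under the set of local patterns they cover, so a second global pattern containing the same sub-plan skips that join. The plan encoding (a string id or a pair), the names `evaluate`, `join`, `leaves`, and the local matches are all hypothetical.

```python
def join(left, right):
    """Merge two match lists on shared pattern vertices (injective result)."""
    out = []
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                combined = {**a, **b}
                if len(set(combined.values())) == len(combined):
                    out.append(combined)
    return out


def leaves(plan):
    """Local-pattern ids in a plan: an id (str) or a pair (left, right)."""
    if isinstance(plan, str):
        return frozenset([plan])
    return leaves(plan[0]) | leaves(plan[1])


def evaluate(plan, local_matches, cache, stats):
    """Evaluate a join plan, storing each intermediate result under the set
    of local patterns it covers so later global patterns can reuse it."""
    if isinstance(plan, str):
        return local_matches[plan]
    key = leaves(plan)
    if key not in cache:
        stats["joins"] += 1
        cache[key] = join(evaluate(plan[0], local_matches, cache, stats),
                          evaluate(plan[1], local_matches, cache, stats))
    return cache[key]


# Hypothetical local matches; p and p' both contain local patterns f3, f4.
local = {
    "f1": [{"a": 1, "b": 2}], "f1x": [{"a": 1, "b": 2}],
    "f2": [{"b": 2, "c": 3}], "f3": [{"c": 3, "d": 4}], "f4": [{"d": 4, "e": 5}],
}
cache, stats = {}, {"joins": 0}
m_p = evaluate((("f1", "f2"), ("f3", "f4")), local, cache, stats)
m_p2 = evaluate((("f1x", "f2"), ("f3", "f4")), local, cache, stats)
print(stats["joins"])  # 5 instead of 6: (f3, f4) is joined only once
```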
Algorithms
- Processing Non-minimal Patterns
- For a non-minimal pattern p, we pick the child pattern p' of p with the smallest |Mp'|. For each exact match of p' in G, we check whether the edge that p has but p' lacks exists in G. If it does, the match is validated as an exact match of p in G.
(Figure: a non-minimal pattern p and its child pattern p'; vertex labels A, B, C)
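Validation is then a simple filter: a match of the child p' survives only if the data vertices assigned to the endpoints of the missing edge are adjacent in G. The function name `validate_by_missing_edge` and the toy data are hypothetical.

```python
def validate_by_missing_edge(child_matches, missing_edge, g_edges):
    """Keep the exact matches of child pattern p' in which the edge that p has
    but p' lacks is also present in G; these are exact matches of p."""
    g_adj = {frozenset(e) for e in g_edges}
    u, v = missing_edge
    return [m for m in child_matches if frozenset((m[u], m[v])) in g_adj]


# Hypothetical: p' = path 1-2-3, p adds the edge (1, 3).
child = [{1: 10, 2: 11, 3: 12}, {1: 20, 2: 21, 3: 22}]
g_edges = [(10, 11), (11, 12), (10, 12), (20, 21), (21, 22)]
print(validate_by_missing_edge(child, (1, 3), g_edges))
```

Picking the child with the smallest |Mp'| keeps this filter as cheap as possible, since each candidate costs one adjacency lookup.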
Algorithms
- Decomposition Query Execution Plan
- Each decomposition of a global pattern p corresponds to a query execution plan of p (as in an RDBMS).
- Generating a good query execution plan separately for each global pattern p of q is costly.
- Recursive Bisection
- We use a heuristic to recursively bisect q into a set Q of edge-disjoint fragments.
- Each bisection splits a graph into two subgraphs of balanced size.
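A simple stand-in for balanced recursive bisection is to order the edges by breadth-first discovery (so consecutive edges tend to be near each other) and split that list in half until every fragment has at most `max_size` edges. This is a hypothetical heuristic for illustration, not the authors' method: it balances sizes but does not guarantee connected or selective fragments. `bisect_query`, `bfs_edge_order`, and `max_size` are assumed names.

```python
from collections import deque


def bfs_edge_order(edges):
    """List edges in the order breadth-first search discovers them, so that
    consecutive edges tend to be close to each other in the graph."""
    adj = {}
    for e in edges:
        u, v = e
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))
    seen_v, seen_e, order = set(), set(), []
    for start in adj:                       # also covers disconnected pieces
        if start in seen_v:
            continue
        seen_v.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w, e in adj[u]:
                if e not in seen_e:
                    seen_e.add(e)
                    order.append(e)
                if w not in seen_v:
                    seen_v.add(w)
                    queue.append(w)
    return order


def bisect_query(edges, max_size):
    """Recursively split the edge list into edge-disjoint fragments of at
    most `max_size` edges; each split halves a BFS edge ordering, which keeps
    the two halves balanced in size."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(edges) <= max_size:
        return [list(edges)]
    ordered = bfs_edge_order(edges)
    mid = len(ordered) // 2
    return bisect_query(ordered[:mid], max_size) + bisect_query(ordered[mid:], max_size)


q_edges = [(1, 2), (2, 3), (1, 3), (2, 4), (4, 5), (5, 6)]
print(bisect_query(q_edges, max_size=2))
```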
Experiments
- Real Data
- Data Graph: HPRD (Human Protein Interaction Network), |V(G)| = 9,460 vertices, |E(G)| = 37,081 edges, with vertices labeled by GO terms.
- Query Graphs: subgraphs selected from the HPRD network, with 1-3 inserted noisy edges.
- Synthetic Data
- Data Graphs: obtained from a synthetic graph generator.
- Query Graphs: subgraphs selected from the data graphs, with 1-3 inserted noisy edges.
Experiments
- Evaluated Algorithms
- SAPPER
- ROND (Random search Order No Decomposition)
- EOND (Effective search Order No Decomposition)
- DecQ (Effective Search Order and Decomposition)
- Default Settings
- |E(q)| = 40, avg. deg(q) = 4
- |E(G)| = 5k, avg. deg(G) = 12, SL = 100
- d = 2
Experiments
- Varying Data Graph Settings
Conclusions
- A novel framework, DecQ, for subgraph similarity all-matching.
- An effective search order for local matching in a depth-first fashion.
- An effective query decomposition plan for global matching with computation sharing.
- Thank You!
- Any Questions?