Computer Science and Engineering - PowerPoint PPT Presentation

About This Presentation
Title:

Computer Science and Engineering

Description:

Efficient Subgraph Similarity All-Matching. Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin, Chuan Xiao. Computer Science and Engineering, The University of New South Wales.


Transcript and Presenter's Notes

1
  • Efficient Subgraph Similarity All-Matching
  • Computer Science and Engineering
  • Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin,
    Chuan Xiao
  • The University of New South Wales

2
Outline
  • Introduction
  • Preliminary
  • Framework
  • Algorithms
  • Experiments
  • Conclusions

3
Introduction: Graph Data
  • Chem-informatics: chemical compounds (small size)
  • Bio-informatics: protein interaction networks (medium size)
  • Internet: World Wide Web (large size)

4
Introduction: Subgraph All-Matching
  • Problem
  • Subgraph exact all-matching enumerates all exact
    matches of a query graph q in a large data graph
    G.
  • Subgraph similarity all-matching enumerates all
    similarity matches of a query graph q in a large
    data graph G.
  • Motivations
  • Noisy query graphs due to erroneous user input.
  • Noisy data graphs due to imprecise collection.

5
Preliminaries
  • Edge Edit Distance
  • The edge edit distance from a graph g1 to another
    graph g2 is the minimum number of edge insertions
    required to transform g1 to g2.
  • GED(p1, q) = 0, GED(p2, q) = 1.

[Figure: query graph q and patterns p1 and p2, with vertices labeled A, B and C]
6
Preliminaries
  • Feasible Pattern
  • Given a distance threshold d, p is called a
    feasible pattern of q if p is a connected
    subgraph of q with no missing vertex and
    GED(p, q) ≤ d.
  • The feasible patterns of q are p1, p2, p3, p4
    for d = 1.

[Figure: query graph q with d = 1 and its feasible patterns p1, p2, p3 and p4, with vertices labeled A, B and C]
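
The definition above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' code: the function names and the edge-list encoding are hypothetical, and a feasible pattern is read as a connected subgraph that keeps every vertex of q and drops at most d edges.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(vertices)

def feasible_patterns(vertices, edges, d):
    """Yield edge lists of connected spanning subgraphs missing at most d edges."""
    edges = list(edges)
    for k in range(d + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if is_connected(vertices, kept):
                yield kept

# Hypothetical query: a triangle (1, 2, 3) with a pendant vertex 4.
q_vertices = [1, 2, 3, 4]
q_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
patterns = list(feasible_patterns(q_vertices, q_edges, d=1))
```

Removing the pendant edge (3, 4) would isolate vertex 4, so that candidate is rejected; the query itself and the three triangle-edge removals remain.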
7
Preliminaries
  • Similarity Matches
  • A similarity match of q in G is a subgraph
    isomorphic mapping from any feasible pattern p
    of q to G.
  • Must consider any feasible pattern!
  • Exact matches of q in G are also similarity
    matches!

[Figure: data graph G, query q and patterns p1 and p2, with one exact match and two similarity matches indicated]
8
SAPPER [VLDB'10]
[Figure: SAPPER with d = 1: an Enumerate Phase followed by a Search Phase over G, producing the results Mp1, Mp2, ..., Mp5]
9
Motivation I: Effective Search Order
  • Search Order One: v1, v2, v3, v4, v5, v6, v7,
    v8 → 47 intermediate matches
  • Search Order Two: v4, v3, v5, v6, v2, v1, v7,
    v8 → 350 intermediate matches
[Figure: a query graph with vertices labeled A, B, C and D and the data graph G; the per-step match counts shown for Search Order One are 1, 1, 1, 1, 4, 12 and 27]
10
Motivation II: Sharing Computation
  • Query Execution Plan One: search p and p'
    separately → share no computation
  • Query Execution Plan Two: search f1, f2 and f2',
    then join → share the computation on f1
[Figure: two patterns p and p']
11
Framework - DecQ
  • Query Decomposition (Phase One)
  • Decompose the query graph q into a set of
    selective edge-disjoint sub-queries
    Q = {f1, ..., fn}, called fragments.

[Figure: the query graph q is decomposed into fragments f1, f2, f3 and f4]
12
Framework - DecQ
  • Local Matching (Phase Two)
  • Enumerate all local (feasible) patterns f' of
    each fragment f and apply depth-first search on
    each pattern f' to obtain the local matches
    (exact matches of f' in G).

[Figure: a fragment f is enumerated into local patterns f'a, f'b, f'c and f'd; depth-first search on each yields the local matches Mfa, Mfb, Mfc and Mfd]
13
Framework - DecQ
  • Global Matching (Phase Three)
  • Enumerate all global (feasible) patterns p and
    merge the local matches of decomposed local
    patterns of p to obtain the global matches (exact
    matches of p in G).

[Figure: for a global pattern p, the local patterns f'1, f'2, f'3 and f'4 are retrieved and their local matches Mf1, Mf2, Mf3 and Mf4 are merged into the global matches Mp]
14
Algorithms
  • Local matching
  • Enumerate all local patterns f' of each fragment
    f.
  • Search for all exact matches of each f' by
    depth-first search with an effective search
    order.
  • Effective Search Order
  • It is NP-complete to find a search order with
    the minimum number of intermediate matches
    produced in the depth-first search.
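
The effect of the search order can be illustrated with a small depth-first matcher. This is a sketch under assumptions rather than the DecQ implementation: `dfs_matches`, the label dictionaries and the tiny example graphs are all hypothetical. It counts every partial mapping the search creates, so two orders can be compared on the same query.

```python
def dfs_matches(q_labels, q_edges, g_labels, g_edges, order):
    """Enumerate exact matches of query q in data graph G following `order`.

    Returns (matches, intermediate); `intermediate` counts every partial
    mapping created during the search, complete matches included.
    """
    q_adj = {v: set() for v in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)
    g_adj = {v: set() for v in g_labels}
    for u, v in g_edges:
        g_adj[u].add(v)
        g_adj[v].add(u)

    matches, intermediate = [], 0

    def extend(i, mapping):
        nonlocal intermediate
        if i == len(order):
            matches.append(dict(mapping))
            return
        qv = order[i]
        for gv in g_labels:
            if g_labels[gv] != q_labels[qv] or gv in mapping.values():
                continue
            # Every already-mapped query neighbour must be adjacent in G.
            if all(mapping[n] in g_adj[gv] for n in q_adj[qv] if n in mapping):
                mapping[qv] = gv
                intermediate += 1
                extend(i + 1, mapping)
                del mapping[qv]

    extend(0, {})
    return matches, intermediate

# Hypothetical data: one A vertex, three B vertices, a single A-B edge.
q_labels, q_edges = {"a": "A", "b": "B"}, [("a", "b")]
g_labels, g_edges = {0: "A", 1: "B", 2: "B", 3: "B"}, [(0, 1)]
m1, n1 = dfs_matches(q_labels, q_edges, g_labels, g_edges, ["a", "b"])
m2, n2 = dfs_matches(q_labels, q_edges, g_labels, g_edges, ["b", "a"])
```

Both orders find the same single match, but starting from the rarer label A creates 2 partial mappings while starting from B creates 4.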

15
Algorithms
  • Estimating Exact Matches of a Graph
  • Given a graph f, assume M(v) / M(e) contains all
    mappings in G of a vertex v / edge e in f.
  • For each edge (u, v) in f, given any u' in M(u)
    and v' in M(v), the probability that there is an
    edge (u', v') in G is
    [equation not captured in this transcript]
  • The estimated number of exact matches of f in G
    can be represented by
    [equation not captured in this transcript]
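
The two formulas on this slide are not present in the transcript. One estimate that is consistent with the definitions, assumed here and not taken from the slides, sets P(u, v) = |M(e)| / (|M(u)| · |M(v)|) for an edge e = (u, v) and multiplies the vertex candidate counts by these edge probabilities. The function names and counts below are hypothetical.

```python
from math import prod

def edge_probability(n_edge_maps, n_u_maps, n_v_maps):
    """Assumed P(u, v): share of candidate pairs (u', v') joined by an edge in G."""
    return n_edge_maps / (n_u_maps * n_v_maps)

def estimate_matches(vertex_maps, edge_maps):
    """Assumed estimate: product of |M(v)| over vertices times P(e) over edges.

    vertex_maps: {vertex: |M(v)|}; edge_maps: {(u, v): |M(e)|}.
    Every |M(v)| is assumed to be non-zero.
    """
    est = prod(vertex_maps.values())
    for (u, v), m_e in edge_maps.items():
        est *= edge_probability(m_e, vertex_maps[u], vertex_maps[v])
    return est

# Hypothetical candidate counts for a path a-b-c.
vertex_maps = {"a": 10, "b": 20, "c": 5}
edge_maps = {("a", "b"): 40, ("b", "c"): 25}
estimate = estimate_matches(vertex_maps, edge_maps)
```

With these counts the estimate is 10 · 20 · 5 · (40 / 200) · (25 / 100), that is, about 50 matches.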

16
Algorithms
  • Approximating Optimal Search Order
  • A search order grows a local pattern f' vertex
    by vertex.
  • Greedy heuristic: select the vertex v such that
    the estimated number of exact matches of the
    current subgraph s of f' is minimized.

[Figure: a pattern f grown through the subgraphs s1, s2 and s3]
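
The greedy heuristic can be sketched as follows. The estimate used here (vertex candidate counts multiplied by assumed edge probabilities |M(e)| / (|M(u)| · |M(v)|)) is an assumption, as are the function names, the choice of starting vertex and the example counts; the pattern is assumed to be connected.

```python
from math import prod

def estimate(vertices, edges, vertex_maps, edge_maps):
    """Assumed match estimate for the subgraph induced by `vertices`."""
    est = prod(vertex_maps[v] for v in vertices)
    for u, v in edges:
        if u in vertices and v in vertices:
            est *= edge_maps[(u, v)] / (vertex_maps[u] * vertex_maps[v])
    return est

def greedy_order(vertex_maps, edge_maps):
    """Grow the pattern one vertex at a time, keeping the estimate minimal."""
    edges = list(edge_maps)
    # Assumption: start from the vertex with the fewest candidates.
    order = [min(vertex_maps, key=vertex_maps.get)]
    while len(order) < len(vertex_maps):
        current = set(order)
        # Only vertices adjacent to the current subgraph keep it connected.
        frontier = {w for u, v in edges for w in (u, v)
                    if (u in current) != (v in current) and w not in current}
        best = min(sorted(frontier),
                   key=lambda w: estimate(current | {w}, edges,
                                          vertex_maps, edge_maps))
        order.append(best)
    return order

# Hypothetical star pattern centred on c, with candidate counts per vertex/edge.
vertex_maps = {"a": 10, "b": 20, "c": 5, "d": 50}
edge_maps = {("c", "a"): 10, ("c", "b"): 30, ("c", "d"): 5}
order = greedy_order(vertex_maps, edge_maps)
```

Here d is added before a and b even though it has the most candidates, because the edge (c, d) is the most selective one.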
17
Algorithms
  • Global Matching
  • A global pattern p is either minimal or
    non-minimal.
  • A minimal pattern p has no subgraph p' that is
    also a global pattern and is obtained by removing
    one edge from p.
  • A non-minimal pattern p has at least one subgraph
    p' that is also a global pattern and is obtained
    by removing one edge from p.

18
Algorithms
  • Processing Minimal Patterns
  • For a minimal pattern p, we decompose p into a
    set of local patterns and merge the local matches
    to obtain global matches Mp.

[Figure: patterns p and p' decomposed into local patterns with match sets M1, M2, M3 and M4; the combined matches of f3 and f4 are stored while processing one pattern and reused for the other]
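
Merging local matches can be viewed as a join on the query vertices that two fragments share. The sketch below is an assumption about how such a merge might look, not the authors' implementation; `join_matches`, `cached_join` and the sample matches are hypothetical. Matches are dictionaries from query vertices to data vertices.

```python
def join_matches(left, right):
    """Join two lists of partial matches (dicts: query vertex -> data vertex)."""
    joined = []
    for m1 in left:
        for m2 in right:
            shared = m1.keys() & m2.keys()
            if any(m1[v] != m2[v] for v in shared):
                continue
            merged = {**m1, **m2}
            # Distinct query vertices must map to distinct data vertices.
            if len(set(merged.values())) == len(merged):
                joined.append(merged)
    return joined

_join_cache = {}

def cached_join(key, left, right):
    """Store a join result under `key` so that other patterns can reuse it."""
    if key not in _join_cache:
        _join_cache[key] = join_matches(left, right)
    return _join_cache[key]

# Hypothetical local matches: one fragment covers a-b, the other covers b-c.
M1 = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
M2 = [{"b": 2, "c": 5}, {"b": 4, "c": 3}, {"b": 6, "c": 7}]
joined = cached_join(("f1", "f2"), M1, M2)
```

The second pair agrees on b but maps both a and c to data vertex 3, so it is discarded; a later pattern that needs the same pair of fragments gets the stored result back from the cache.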
19
Algorithms
  • Processing Non-minimal Patterns
  • For a non-minimal pattern p, we pick the child
    pattern p' of p with the smallest match set Mp'.
    For each exact match of p' in G, we check whether
    the missing edge exists in G. If so, the match is
    validated as an exact match of p in G.

[Figure: a non-minimal pattern p and its child pattern p', with vertices labeled A, B and C]
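
The validation step amounts to one edge lookup per child match. A minimal sketch, assuming matches are dictionaries from query vertices to data vertices and G is undirected; `validate_matches` and the sample data are hypothetical.

```python
def validate_matches(child_matches, missing_edge, g_edges):
    """Keep child-pattern matches whose image of `missing_edge` is an edge of G."""
    u, v = missing_edge
    g_edge_set = {frozenset(e) for e in g_edges}
    return [m for m in child_matches
            if frozenset((m[u], m[v])) in g_edge_set]

# Hypothetical data graph: a triangle 1-2-3 and a path 4-5-6.
g_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6)]
# Child pattern p' is the path a-b-c; the parent p adds the edge (a, c).
child_matches = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
validated = validate_matches(child_matches, ("a", "c"), g_edges)
```

Only the match inside the triangle survives, since G has an edge between data vertices 1 and 3 but none between 4 and 6.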
20
Algorithms
  • Decomposition: Query Execution Plan
  • Each decomposition of a global pattern p
    corresponds to a query execution plan for p
    (as in an RDBMS).
  • It is costly to generate a good query execution
    plan for each global pattern p of q.
  • Recursive Bisection
  • We use a heuristic to recursively bisect q
    into a set Q of edge-disjoint fragments.
  • Each step bisects a graph into two subgraphs of
    balanced size.
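
The slides do not give the bisection heuristic itself. The sketch below substitutes a deliberately naive one, which is an assumption: edges are listed in breadth-first order and the list is halved until every fragment has at most `max_edges` edges. It yields edge-disjoint, size-balanced fragments, but unlike a real heuristic it does not consider selectivity and does not guarantee that every fragment is connected.

```python
from collections import deque

def bfs_edge_order(edges):
    """List the edges in the order a breadth-first traversal first meets them."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen_edges, ordered, visited = set(), [], set()
    for start in adj:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                e = frozenset((u, v))
                if e not in seen_edges:
                    seen_edges.add(e)
                    ordered.append((u, v))
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    return ordered

def bisect(edges, max_edges):
    """Recursively halve the edge list; assumes max_edges >= 1."""
    if len(edges) <= max_edges:
        return [list(edges)]
    ordered = bfs_edge_order(edges)
    half = len(ordered) // 2
    return bisect(ordered[:half], max_edges) + bisect(ordered[half:], max_edges)

# Hypothetical query graph with 8 edges.
q_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (4, 5), (5, 6), (6, 4)]
fragments = bisect(q_edges, max_edges=2)
```

For this query the recursion stops after two levels, giving four fragments of two edges each that together cover every edge of q exactly once.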

21
Experiments
  • Real Data
  • Data graph: HPRD (Human Protein Interaction
    Network, |V(G)| = 9,460 vertices, |E(G)| = 37,081
    edges, with vertices labeled by GO term)
  • Query graphs: subgraphs selected from the HPRD
    network with 1-3 inserted noisy edges.
  • Synthetic Data
  • Data graphs: obtained from a synthetic graph
    generator
  • Query graphs: subgraphs selected from the data
    graphs with 1-3 inserted noisy edges.

22
Experiments
  • Evaluated Algorithms
  • SAPPER
  • ROND (Random search Order No Decomposition)
  • EOND (Effective search Order No Decomposition)
  • DecQ (Effective Search Order and Decomposition)
  • Default Settings
  • |E(q)| = 40, avg. deg(q) = 4
  • |E(G)| = 5k, avg. deg(G) = 12, SL = 100
  • d = 2

23
Experiments
  • Varying Error Threshold

24
Experiments
  • Varying Query Settings

25
Experiments
  • Varying Data Graph Settings

26
Experiments
  • Comparing with SAPPER

27
Conclusions
  • A novel framework, DecQ, for subgraph similarity
    all-matching.
  • An effective search order for depth-first local
    matching.
  • An effective query decomposition plan for global
    matching with computation sharing.

28
  • Thank You!
  • Any Questions?