Title: Homomorphism Mapping in Metabolic Pathways
1Homomorphism Mapping in Metabolic Pathways
- Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky
- Computer Science, Georgia State University
- Dec. 1, 2007, RECOMB Satellite Conference on Systems Biology 2007
2Outline
- Concept of Metabolic pathway comparison
- Enzyme similarity
- Graph mappings: embeddings & homomorphisms
- Min cost homomorphism problem for trees
- Optimal DP algorithm for trees
- Min cost homomorphism problem for arbitrary graphs
  - Minimum Feedback vertex set (MFVS)
- Searching metabolic networks for
  - pathway motifs
  - pathway holes
- Web tool
  - Architecture & brief interface
- Future work
3Metabolic pathway & pathways model
4Comparison of metabolic pathways
- Enzyme similarity and pathway topology together
represent the similarity of pathway functionality.
- Pathway topology
- Similarity
5Related work
(Forst & Schulten 1999, Chen & Hofestaedt 2004)
(Pinter 2005: $O(|V_G|^2 |V_T| / \log|V_G| + |V_G||V_T| \log|V_T|)$)
Mapping linear pattern → graph (Kelley et al. 2004: $O(|V_T|^{i+2} |V_G|^2)$)
Exhaustive search (Sharan et al. 2005: $O(i!)\,O(|V_T|^{i+2} |V_G|^2)$; Yang et al. 2007: $O(2^{|V_G|} |V_G|^2)$)
6Enzyme mapping cost
Enzyme D = d1.d2.d3.d4
- EC (Enzyme Commission) notation
- Measure enzyme similarity score $\Delta$ by the lowest common upper class distribution
- Measure $\Delta$ by tight reaction property
Enzyme X = x1.x2.x3.x4
Enzyme Y = y1.y2.y3.y4
$\Delta[X, Y] = 1$
$\Delta[X, Y] = 10$
$\Delta[X, Y] = \infty$ otherwise
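The "lowest common upper class" idea can be read as: find the deepest EC level at which two enzymes still agree. A minimal Python sketch under that reading follows. The slide lists the cost values 1, 10 and infinity but not the condition attached to each, so the table `ASSUMED_COST_BY_DEPTH` (including the cost 0 for identical EC numbers) is purely an illustrative assumption, and both function names are hypothetical.

```python
import math

def shared_ec_levels(ec_x: str, ec_y: str) -> int:
    """Count the leading EC fields (0 to 4) on which two EC numbers agree."""
    depth = 0
    for a, b in zip(ec_x.split("."), ec_y.split(".")):
        if a != b:
            break
        depth += 1
    return depth

# ASSUMPTION: the slide does not say which case earns which cost.
# This table is only one plausible assignment, chosen for illustration.
ASSUMED_COST_BY_DEPTH = {4: 0.0, 3: 1.0, 2: 10.0}

def enzyme_cost(ec_x: str, ec_y: str) -> float:
    """Mapping cost for a pair of enzymes; infinite when they share too little."""
    return ASSUMED_COST_BY_DEPTH.get(shared_ec_levels(ec_x, ec_y), math.inf)
```

For example, `enzyme_cost("1.1.1.1", "1.1.1.2")` uses the depth-3 entry of the assumed table, while two enzymes from different top-level classes get an infinite cost and so can never be mapped to each other.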
7Graph mappings: embeddings & homomorphisms
Homomorphism $f: T \to G$, with $f_v: V_T \to V_G$ and $f_e: E_T \to$ paths of $G$
Edge-to-path cost: $\lambda\,(|f_e(e)| - 1)$
Homomorphism cost: $\lambda \sum_{e \in E_T} (|f_e(e)| - 1)$
We allow different enzymes to be mapped to the
same enzyme.
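As a small illustration of the edge-to-path cost, the sketch below sums the gap penalty over all pattern edges. It assumes that the length of a path is counted in text edges, so a pattern edge mapped onto a single text edge costs nothing. The function name and the dictionary layout are hypothetical.

```python
def homomorphism_edge_cost(edge_paths, lam=1.0):
    """Sum lam * (path length - 1) over all pattern edges.

    edge_paths maps each pattern edge to the list of text vertices on the
    path it is mapped to. ASSUMPTION: path length is counted in edges.
    """
    total = 0.0
    for path in edge_paths.values():
        hops = len(path) - 1          # number of text edges on the path
        total += lam * (hops - 1)     # each extra hop is one gap
    return total

# One pattern edge maps to a single text edge, the other skips one text vertex.
example = {("a", "b"): ["u1", "u2"], ("b", "c"): ["u2", "u3", "u4"]}
cost = homomorphism_edge_cost(example, lam=2.0)
```

Here only the second pattern edge contributes, because its image passes through one intermediate text vertex.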
8Min cost homomorphism of multi-source tree to arbitrary graph
- A multi-source tree is a directed graph whose underlying undirected graph is a tree.
- Given a multi-source tree $T = \langle V_T, E_T \rangle$ (pattern) and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text),
- find a min cost homomorphism $f: T \to G$ of the multi-source tree to the arbitrary graph
9Preprocessing of text graph
Transitive closure of $G$ is the graph $G^* = (V, E^*)$,
where $(i, j) \in E^*$ iff there is an $i$-$j$ path in $G$
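One way to obtain the closure together with the hop counts used later for gap penalties is a breadth-first search from every text vertex, which takes time proportional to |V||E| overall. This is a sketch rather than the authors' implementation; the function name and the adjacency-dictionary input are assumptions.

```python
from collections import deque

def hop_counts(adj):
    """Return {(i, j): minimum number of edges on a directed i-j path}.

    adj maps each vertex to an iterable of its successors. The keys of the
    result are exactly the edges of the transitive closure; paths have at
    least one edge, so (i, i) appears only if i lies on a directed cycle.
    """
    hops = {}
    for src in adj:
        dist = {}
        queue = deque((nxt, 1) for nxt in adj[src])
        while queue:
            node, d = queue.popleft()
            if node in dist:
                continue              # already reached by a shorter path
            dist[node] = d
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    queue.append((nxt, d + 1))
        for dst, d in dist.items():
            hops[(src, dst)] = d
    return hops

closure = hop_counts({"a": ["b"], "b": ["c"], "c": []})
```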
10Pattern graph ordering
- Construct ordered pattern $T'$
  - DFS traversal
  - Process vertices in the reverse of the DFS order
Ordered pattern $T'$
- Each edge $e_i$ in $T'$ is the unique edge connecting $v_i$ with the previous vertices in the order
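The ordering step can be sketched as a DFS over the underlying undirected tree, followed by a reversal so that every vertex is processed before its DFS parent. The function name, the edge-list input and the choice of root are assumptions made for illustration.

```python
def ordered_pattern(tree_edges, root):
    """Return (processing_order, parent) for the underlying undirected tree.

    In DFS preorder every non-root vertex has exactly one edge to an earlier
    vertex (its parent). The processing order is the reverse of that preorder,
    so leaves come first and the root comes last.
    """
    nbrs = {}
    for a, b in tree_edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    parent = {root: None}
    preorder = []
    stack = [root]
    while stack:
        v = stack.pop()
        preorder.append(v)
        for w in nbrs.get(v, []):
            if w not in parent:       # first visit: v becomes the parent of w
                parent[w] = v
                stack.append(w)
    return preorder[::-1], parent

order, parent = ordered_pattern([("a", "b"), ("c", "b"), ("b", "d")], "b")
```

Edge directions are ignored here on purpose: the ordering only depends on the tree shape, while directions matter when paths in the text graph are looked up.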
11DP table
$DT[i, j]$, for pattern vertex $v_i$ and text vertex $u_j$:
min cost homomorphism mapping from the ordered pattern's subgraph
induced by the previous vertices in the order into $G$
12Filling DP table
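$DT[i, j] = \Delta(v_i, u_j)$, if $v_i$ is a leaf in $T$
$DT[i, j] = \Delta(v_i, u_j) + \sum_{l=1}^{|adj(v_i)|} \min_{j_l = 1, \dots, |V_G|} C[i_l, j_l]$, otherwise
$C[i_l, j_l] = DT[i_l, j_l] + \lambda\,(h(j, j_l) - 1)$
$\lambda$ is the penalty for gaps
$h(j, j_l)$ is the number of hops between $u_j$ and $u_{j_l}$ in $G$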
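The recurrence can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the authors' code: the pattern is rooted at an arbitrary vertex `root`, every text vertex is a key of `text_adj`, `delta(v, u)` is a caller-supplied enzyme mapping cost, and a pattern edge must be mapped to a directed text path that respects the direction of the pattern edge. Only the optimal cost is returned, not the mapping.

```python
import math
from collections import deque

def bfs_hops(adj, src):
    """Minimum number of edges (at least 1) on a directed path from src."""
    dist, queue = {}, deque((w, 1) for w in adj.get(src, ()))
    while queue:
        node, d = queue.popleft()
        if node not in dist:
            dist[node] = d
            queue.extend((w, d + 1) for w in adj.get(node, ()))
    return dist

def min_cost_tree_homomorphism(tree_edges, root, text_adj, delta, lam=1.0):
    text = list(text_adj)
    hops = {u: bfs_hops(text_adj, u) for u in text}
    # Neighbours in the underlying undirected tree, with edge direction kept.
    nbrs = {}
    for a, b in tree_edges:
        nbrs.setdefault(a, []).append((b, True))   # edge goes a -> b
        nbrs.setdefault(b, []).append((a, False))  # edge comes from a into b
    parent, preorder, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        preorder.append(v)
        for w, _ in nbrs.get(v, []):
            if w not in parent:
                parent[w] = v
                stack.append(w)
    dt = {}
    for v in reversed(preorder):                   # children before parents
        for u in text:
            cost = delta(v, u)
            for c, v_to_c in nbrs.get(v, []):
                if parent.get(c) != v:
                    continue                       # c is the parent of v
                best = math.inf
                for uc in text:
                    h = hops[u].get(uc) if v_to_c else hops[uc].get(u)
                    if h is not None:
                        best = min(best, dt[c, uc] + lam * (h - 1))
                cost += best
            dt[v, u] = cost
    return min(dt[root, u] for u in text)

# Pattern a -> b -> c against the text chain x -> y -> z -> w.
good = {("a", "x"), ("b", "y"), ("c", "w")}
result = min_cost_tree_homomorphism(
    [("a", "b"), ("b", "c")], "b",
    {"x": ["y"], "y": ["z"], "z": ["w"], "w": []},
    lambda v, u: 0.0 if (v, u) in good else 100.0,
)
```

In the usage example the cheapest mapping sends the pattern edge b -> c onto the two-hop text path y -> z -> w, so the total cost is a single gap penalty.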
13Runtime Analysis for mapping trees
- Transitive closure takes $O(|V_G||E_G|)$.
- Pattern graph ordering takes $O(|V_T| + |E_T|)$.
- Calculating the min contribution of all child pairs of node pair $(v_i \in T, u_j \in G)$ takes $t_{ij} = \deg_T(v_i)\,\deg_G(u_j)$.
- Filling $DT$ takes $\sum_{j=1}^{|V_G|} \sum_{i=1}^{|V_T|} t_{ij} = \sum_{j=1}^{|V_G|} \deg_G(u_j) \sum_{i=1}^{|V_T|} \deg_T(v_i) = 2|E_G||E_T|$.
The total runtime for mapping trees is $O(|V_G||E_G| + |V_G||V_T|)$.
14MFVS
- Minimum Feedback vertex set (MFVS)
  - Given an undirected graph $G = (V, E)$ and a nonnegative weight function $w$ on $V$
  - Find a minimum weight subset of $V$ whose removal leaves an acyclic graph.
- Bad news: the MFVS problem is NP-complete.
- Good news: 2-approximation
- Greedy algorithm
  - Delete degree 1/0 vertices from $V$ and set remaining vertices to $V'$
  - $\mathrm{MFVS} \leftarrow \emptyset$
  - while $V' \neq \emptyset$ do
    - pick the set $S$ of maximal degree vertices and remove it from $V'$
    - $\mathrm{MFVS} \leftarrow \mathrm{MFVS} \cup S$
    - Delete degree 1/0 vertices from $V'$
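The greedy procedure above can be sketched for the unweighted case as follows. The function name and the adjacency-set input are assumptions, the graph is taken to be simple (no self-loops or parallel edges), and the sketch makes no claim about the approximation ratio it achieves. It removes all maximal-degree vertices in each round, as the slide describes; picking a single vertex per round is an obvious variant.

```python
def greedy_feedback_vertex_set(adj):
    """adj maps each vertex to the set of its neighbours (undirected graph)."""
    g = {v: set(ns) for v, ns in adj.items()}

    def prune():
        # Repeatedly delete vertices of degree 0 or 1: they lie on no cycle.
        low = [v for v in g if len(g[v]) <= 1]
        while low:
            v = low.pop()
            if v not in g:
                continue
            for w in g.pop(v):
                g[w].discard(v)
                if len(g[w]) <= 1:
                    low.append(w)

    def remove(v):
        for w in g.pop(v):
            g[w].discard(v)

    fvs = set()
    prune()
    while g:
        top = max(len(ns) for ns in g.values())
        chosen = [v for v in g if len(g[v]) == top]
        for v in chosen:
            remove(v)
        fvs.update(chosen)
        prune()
    return fvs

# Two triangles sharing vertex "a": removing "a" breaks both cycles.
two_triangles = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"a", "d"},
}
picked = greedy_feedback_vertex_set(two_triangles)
```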
15Min cost homomorphism of arbitrary graphs
- Given an arbitrary graph $P = \langle V_P, E_P \rangle$ (pattern) and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text),
- find a min cost homomorphism $f: P \to G$
- Algorithm
  - Find minimum feedback vertex set $F(P)$ of $P$
  - Construct a multi-source tree $P' = \langle V_P - F(P), E_P(V_P - F(P)) \rangle$
  - for every sub-mapping $f_v': F(P) \to V_G$ do
    - obtain min cost homomorphism of multi-source tree $P'$ to arbitrary graph $G$ under sub-mapping $f_v'$
  - choose the min cost homomorphism over all sub-mappings
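The outer loop of this algorithm, which tries every assignment of text vertices to the feedback vertices, can be sketched with `itertools.product`. The tree solver is passed in as a callback: `solve_rest` is a hypothetical function that returns the best cost of mapping the remaining tree given the fixed images, and is not defined by the slides.

```python
import itertools
import math

def best_over_submappings(fvs, text_vertices, solve_rest):
    """Try all |V_G|^|F(P)| images of the feedback vertices, keep the cheapest.

    solve_rest(fixed) is assumed to return the min cost of mapping the
    remaining multi-source tree when the feedback vertices are mapped as in
    the dict `fixed` (math.inf if no mapping exists).
    """
    fvs = list(fvs)
    best_cost, best_fixed = math.inf, None
    for images in itertools.product(text_vertices, repeat=len(fvs)):
        fixed = dict(zip(fvs, images))
        cost = solve_rest(fixed)
        if cost < best_cost:
            best_cost, best_fixed = cost, fixed
    return best_cost, best_fixed

# Toy solver standing in for the tree DP: prefers p -> 2 and q -> 0.
calls = []
def toy_solver(fixed):
    calls.append(fixed)
    return abs(fixed["p"] - 2) + abs(fixed["q"] - 0)

best = best_over_submappings(["p", "q"], [0, 1, 2], toy_solver)
```

The number of solver calls grows exponentially in the size of the feedback vertex set, which is why a small set matters.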
16Runtime Analysis for mapping arbitrary graphs
- Finding min feedback vertex set takes $O(|V_P| + |E_P|)$
- $O(|V_G|^{|F(P)|})$ possible mappings for MFVS
- Finding min homomorphism mapping of multi-source tree to arbitrary graph takes $O(|V_G||E_G| + |V_G||V_T|)$.
The total runtime is $O(|V_G|^{|F(P)|} (|V_G||E_G| + |V_G||V_T|))$.
17Statistical significance
- Random degree-conserved graph generation
  - Reshuffle nodes
  - Reshuffle edges
- Randomized P-value computation
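A minimal sketch of the edge-reshuffling variant and of an empirical P-value is given below. The slides do not spell out either procedure, so the details are assumptions: edges are reshuffled by repeated endpoint swaps that keep every in-degree and out-degree, and the P-value is the fraction of random graphs whose mapping cost is at least as low as the observed one, with a +1 correction. Both function names are hypothetical.

```python
import random

def reshuffle_edges(edges, swaps, rng):
    """Degree-preserving randomisation of a directed edge list.

    Each step picks two edges (a, b), (c, d) and rewires them to (a, d),
    (c, b), skipping swaps that would create a self-loop or a duplicate edge.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def empirical_p_value(observed_cost, random_costs):
    """Share of random costs that are at least as low as the observed cost."""
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return (hits + 1) / (len(random_costs) + 1)

original = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
shuffled = reshuffle_edges(original, swaps=50, rng=random.Random(0))
p = empirical_p_value(1.0, [2.0, 3.0, 0.5, 4.0])
```

In practice the random costs would come from mapping the same pattern into many reshuffled text graphs.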
18Experiments & applications
- All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E. coli
Halobacterium
- Identifying conserved pathways
  - 24 pathways that are conserved across all 4 species
  - 18 more pathways that are conserved across at least three of these species
- Discovering pathway holes
19Mappings with cycles
20Resolving Ambiguity
21Pathway holes
- Check if there is such an enzyme in the pattern
- Find the closest protein in the same group
- If identity is high (> 80%), we expect a good filling
- Align to the previous and next enzyme: the functions may be taken over
22Filling pathways holes
23Web Service Architecture
24Web Interface
26Future work
- Approximation algorithm to handle the comparison of general graphs
- Mining protein interaction networks
- Discovery of critical elements or modules based on graph comparison
- Discovery of evolutionary relations among organisms by pathway comparison of different organisms at different time points
- Integration with genome database
27Reference
- Ron Y. Pinter, Oleg Rokhlenko, Esti Yeger-Lotem, Michal Ziv-Ukelson. Alignment of metabolic pathways. Bioinformatics 21(16):3401-8 (Aug 2005). LNCS 3109, Springer-Verlag.
- Sebastian Wernicke. Combinatorial Algorithms to Cope with the Complexity of Biological Networks. Dissertation (December 2006).
- J. Ellson, E. Gansner, E. Koutsofios, S. North, and G. Woodhull. Graphviz and dynagraph: static and dynamic graph drawing tools. In M. Junger and P. Mutzel, editors, Graph Drawing Software, pages 127-148. Springer-Verlag, 2003.
- X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721-724, 2002.
- N. Ketkar, L. Holder, D. Cook, R. Shah, and J. Coble. Subdue: Compression-based Frequent Pattern Discovery in Graph Data. Proceedings of the ACM KDD Workshop on Open-Source Data Mining, August 2005.
- K. Borgwardt, S. Bottger, H. Kriegel. VGM: visual graph mining. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.
- Q. Cheng, D. Kaur, R. Harrison, and A. Zelikovsky. "Mapping and Filling Metabolic Pathways". RECOMB Satellite Conference on Systems Biology 2007.
- Q. Cheng, R. Harrison, and A. Zelikovsky. "Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways". Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07).
28Question?
Thanks!
29Handling Cycles
- Sorting of the pattern such that children can communicate only through the parent
- Fix images for some pattern vertices => interrupt communication through cycles
- Feedback vertex set $F(T)$: $V_T - F(T)$ is acyclic
- Runtime is increased by a factor of $O(|V_G|^{|F(T)|})$
- $t(v)$ = number of reasonable text images of $v$
- $\prod t(v) \to \min \iff \sum \log t(v) \to \min$
- 2-approximation algo
30Software architecture of service-oriented pathway
mining tool
[Architecture diagram. Component labels: Services Container; Ambiguity pairs; AI; Potential holes; Rule-based mining; DB; Pathway Modeling & Comparison; Storage & Indexing; Data-Control-View; Browsers; PDC; SW; Visualized Outputs; Additional Value Service; Simulation]