Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

Transcript and Presenter's Notes

1
Large-Scale Metabolic Network Alignment: MetaCyc
and KEGG
  • Tomer Altman
  • Bioinformatics Research Group
  • SRI International
  • taltman_at_ai.sri.com

2
Problem Motivation
  • There are an increasing number of encyclopedic
    metabolic networks, or reaction databases
  • KEGG and MetaCyc, plus Rhea, BRENDA, and GO
  • A natural question to ask is, what is similar /
    different between them?
  • There has been some linking of MetaCyc compounds
    to KEGG, but none for reactions

3
Challenges with Mapping Objects
  • Multiple aspects to compare (name, chemical
    structure, reaction substrates, external
    identifiers)
  • Inexact naming
  • Inexact structures (different specificity of
    stereocenters)
  • Inexact description of reactions (classes vs.
    instances, proton-balancing)
  • How to combine the evidence in a logical fashion

4
Evidence / Prediction Data Structure
  • Prediction type
  • Predictor name
  • MetaCyc Object Frame ID
  • KEGG Object identifier
  • Iteration number
  • Parameters used
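
The six fields listed above could be held in a small record type. The following is a minimal Python sketch, not the presenter's implementation: the class name, field names, and example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Evidence:
    """One piece of evidence (or one prediction) linking a MetaCyc object to a KEGG object."""
    prediction_type: str      # hypothetical values: "compound" or "reaction"
    predictor_name: str       # hypothetical values: "name-match", "inchi", "ec-number"
    metacyc_frame_id: str     # MetaCyc object frame ID
    kegg_id: str              # KEGG object identifier
    iteration: int            # iteration in which the evidence was produced
    parameters: Dict[str, Any] = field(default_factory=dict)  # predictor settings


# Illustrative record only; the predictor name and threshold are made up.
example = Evidence(
    prediction_type="compound",
    predictor_name="fingerprint-tanimoto",
    metacyc_frame_id="PYRUVATE",
    kegg_id="C00022",
    iteration=1,
    parameters={"threshold": 0.9},
)
```

Storing the iteration number and parameters with each record makes it possible to trace which pass and which settings produced a given link.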

5
Compound Evidence
  • Curated MetaCyc links to KEGG
  • Name matching
  • PubChem identifier mapping (used for ChEBI as
    well)
  • Molecular Fingerprint Tanimoto Similarity
    Coefficient
  • InChI string comparison
  • Exact Sub-Structure Match (no stereochemistry)
  • All-but-one inference
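
As an illustration of the fingerprint item above, the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below assumes each fingerprint is represented as a Python set of "on" bit positions; how the fingerprints were generated in the original work is not stated in the slides.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints carry no information; treating them as
        # dissimilar is a choice made for this sketch.
        return 0.0
    return len(fp_a & fp_b) / len(union)


# Two toy fingerprints sharing 2 of 4 distinct bits.
score = tanimoto({1, 2, 3}, {2, 3, 4})  # 2 / 4 = 0.5
```

A score of 1.0 means identical fingerprints, which does not by itself guarantee identical structures, so this kind of evidence is quantitative rather than exact.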

6
Compound Prediction Detail: All-but-one
  • Most of the compounds between these two reactions
    are the same
  • Class vs. instance and naming issues leave the
    match between "acceptor" and "oxidized electron
    acceptor" unknown
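
One way to read the all-but-one idea: if two reactions are already believed to correspond and every compound but one on each side has a known counterpart, the two leftover compounds are proposed as a match. The sketch below is an interpretation under that assumption; the function name and the identifiers in the example are hypothetical placeholders.

```python
def all_but_one(metacyc_cpds, kegg_cpds, known):
    """Propose one new compound pair for two reactions assumed to correspond.

    metacyc_cpds, kegg_cpds: compound IDs taking part in each reaction.
    known: dict mapping MetaCyc compound ID -> KEGG compound ID.
    Returns (metacyc_id, kegg_id), or None if the inference does not apply.
    """
    kegg_set = set(kegg_cpds)
    unmatched_metacyc = []
    matched_kegg = set()
    for cpd in metacyc_cpds:
        partner = known.get(cpd)
        if partner is None:
            unmatched_metacyc.append(cpd)
        elif partner in kegg_set:
            matched_kegg.add(partner)
        else:
            # A known link points outside this KEGG reaction, so the two
            # reactions disagree and nothing is inferred.
            return None
    unmatched_kegg = [k for k in kegg_cpds if k not in matched_kegg]
    if len(unmatched_metacyc) == 1 and len(unmatched_kegg) == 1:
        return unmatched_metacyc[0], unmatched_kegg[0]
    return None


# Placeholder IDs: two of three compounds are already linked.
pair = all_but_one(
    ["M-A", "M-B", "M-X"],
    ["K-1", "K-2", "K-9"],
    {"M-A": "K-1", "M-B": "K-2"},
)  # ("M-X", "K-9")
```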

7
Reaction Evidence
  • EC Numbers
  • UniProt Accession Numbers
  • Name matches (gleaned from associated objects)
  • Exact equation match
  • Inexact equation match (cosine similarity)
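
The slides do not say how equations are turned into vectors for the cosine similarity. A plausible sketch, assuming each reaction is a vector of signed stoichiometric coefficients (negative for substrates, positive for products) over a shared compound ID space:

```python
import math


def cosine_similarity(rxn_a, rxn_b):
    """Cosine similarity of two reactions given as {compound_id: signed coefficient}."""
    dot = sum(coef * rxn_b.get(cpd, 0.0) for cpd, coef in rxn_a.items())
    norm_a = math.sqrt(sum(c * c for c in rxn_a.values()))
    norm_b = math.sqrt(sum(c * c for c in rxn_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder compound IDs; the second equation differs only by a proton,
# the kind of proton-balancing mismatch mentioned earlier in the talk.
rxn_1 = {"A": -1, "B": -1, "C": 1}
rxn_2 = {"A": -1, "B": -1, "C": 1, "H+": 1}
similarity = cosine_similarity(rxn_1, rxn_2)
```

Under this encoding a reaction written in the opposite direction scores close to -1, so comparing the absolute value may be preferable; that is a design choice of this sketch, not something the slides specify.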

8
Reaction Prediction Detail: UniProt Mapping
  • Use UniProt Accession numbers to map the enzymes
    in MetaCyc and KEGG to one another
  • Use UniRef90 or UniRef100 clusters to map the same
    protein when the accession numbers are not identical
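
The mapping described above can be sketched as an index lookup: reactions are linked when their enzymes share an accession number or, failing that, a UniRef cluster. The function name, input shapes, and identifiers below are assumptions for illustration; the cluster table would have to be loaded separately from UniRef data.

```python
from collections import defaultdict


def link_by_protein(metacyc_rxn_proteins, kegg_rxn_proteins, uniref_cluster=None):
    """Return (metacyc_rxn, kegg_rxn) pairs whose enzymes share a protein.

    *_rxn_proteins: dict reaction ID -> set of UniProt accession numbers.
    uniref_cluster: optional dict accession -> cluster ID; accessions in the
    same cluster are treated as the same protein.
    """
    clusters = uniref_cluster or {}

    def key(accession):
        # Fall back to the accession itself when no cluster is known.
        return clusters.get(accession, accession)

    kegg_index = defaultdict(set)
    for rxn, accessions in kegg_rxn_proteins.items():
        for acc in accessions:
            kegg_index[key(acc)].add(rxn)

    pairs = set()
    for rxn, accessions in metacyc_rxn_proteins.items():
        for acc in accessions:
            for kegg_rxn in kegg_index.get(key(acc), ()):
                pairs.add((rxn, kegg_rxn))
    return pairs


# Placeholder accessions: ACC_1 matches exactly, ACC_2 and ACC_3 via a cluster.
links = link_by_protein(
    {"RXN-M1": {"ACC_1"}, "RXN-M2": {"ACC_2"}},
    {"R-K1": {"ACC_1"}, "R-K2": {"ACC_3"}},
    uniref_cluster={"ACC_2": "CLUSTER_X", "ACC_3": "CLUSTER_X"},
)
```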

9
From Evidence to Prediction
  • Rule 1: Evidence is partitioned into exact and
    inexact quality types
  • Rule 2: Evidence is partitioned (orthogonally)
    into qualitative and quantitative types
  • Rule 3: A high-quality prediction will consist
    of qualitative and quantitative evidence in favor
  • Rule 4: We try exact quality types first, then
    inexact
  • Rule 5: No contradictory predictions
  • Rule 6: Iterate until no new predictions
  • This is the general idea, but the implementation
    has some domain-based heuristics thrown in, such
    as with EC numbers
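
The six rules can be read as a fixed-point loop. The sketch below is one possible interpretation, not the presenter's implementation: it leaves out the domain heuristics, represents evidence as plain tuples, and treats "no contradictory predictions" as "never link an object twice and skip ambiguous candidates". The `derive` callback is a hypothetical hook for evidence that depends on earlier predictions, such as all-but-one inference.

```python
from collections import Counter, defaultdict

BOTH_KINDS = {"qualitative", "quantitative"}


def predict_links(evidence, derive=None):
    """evidence: iterable of (metacyc_id, kegg_id, quality, kind) tuples, with
    quality in {"exact", "inexact"} and kind in {"qualitative", "quantitative"}.
    derive: optional function(current_links) -> iterable of new evidence tuples.
    Returns a dict mapping MetaCyc ID -> KEGG ID.
    """
    evidence = set(evidence)
    m2k, k2m = {}, {}
    while True:                                   # Rule 6: iterate to a fixed point
        added = 0
        for allowed in ({"exact"}, {"exact", "inexact"}):  # Rule 4: exact first
            kinds = defaultdict(set)
            for m, k, quality, kind in evidence:  # Rules 1 and 2: partition
                if quality in allowed:
                    kinds[(m, k)].add(kind)
            candidates = [
                (m, k) for (m, k), seen in kinds.items()
                if seen >= BOTH_KINDS             # Rule 3: both kinds in favor
                and m not in m2k and k not in k2m  # Rule 5: no re-linking
            ]
            m_count = Counter(m for m, _ in candidates)
            k_count = Counter(k for _, k in candidates)
            for m, k in candidates:
                if m_count[m] == 1 and k_count[k] == 1:  # Rule 5: skip ambiguity
                    m2k[m] = k
                    k2m[k] = m
                    added += 1
        if added == 0:
            return m2k
        if derive is not None:
            evidence |= set(derive(dict(m2k)))


# Placeholder IDs. M3-K1 is rejected because K1 is already linked to M1.
toy_evidence = [
    ("M1", "K1", "exact", "qualitative"), ("M1", "K1", "exact", "quantitative"),
    ("M2", "K2", "inexact", "qualitative"), ("M2", "K2", "exact", "quantitative"),
    ("M3", "K1", "inexact", "qualitative"), ("M3", "K1", "inexact", "quantitative"),
    ("M4", "K4", "exact", "qualitative"),
]
toy_links = predict_links(toy_evidence)
```

In the toy data M4-K4 stays unlinked because it has only qualitative support; a `derive` callback that later contributes quantitative evidence for that pair would let a subsequent iteration accept it.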

10
Statistics
  • 3269 MetaCyc reactions with links to KEGG (75%)
  • 4004 MetaCyc compounds with links to KEGG (>50%)

11
Future Work
  • Greatest Common Subgraph Compound Matching
  • Sensitivity / Specificity characterization of all
    evidence types
  • Analysis of unmatched content of KEGG and MetaCyc
    for algorithm improvement and focused curation
    (inexact reaction matching and novel reaction
    import)
  • Hierarchical clustering of compounds and
    reactions
  • A general tool that can be used for any two given
    metabolic networks
  • Recasting algorithm in a machine learning
    framework

12
Acknowledgements
  • Peter Karp
  • Douglas Brutlag
  • Dan Davison
  • Anamika Kothari
  • Joseph Dale

MetaCyc.org
13
One more thing!
  • Pathway Tools API (beta)
  • http://www.ai.sri.com/taltman/ptools-api/ptools_api.html