The design and implementation of a system that integrates pathway data from KEGG and genome sequence data from NCBI - PowerPoint PPT Presentation

1
The design and implementation of a system that
integrates pathway data from KEGG and genome
sequence data from NCBI
  • Xiang (Sean) Zhou
  • Advisor Prof. Sun Kim
  • Bioinformatics Capstone Project
  • Indiana University

2
Outline
  • Background
  • Methods
  • Sample results
  • Online demonstration
  • Future direction

3
Why do we want to study metabolic pathways?
  • One of the challenges in life science is to
    uncover the fundamental design principles that
    provide the common underlying structure and
    function in all cells and microorganisms [2].
  • The metabolic pathway network serves as a tool
    for achieving this goal.

4
Metabolic Pathway
  • Definition of a metabolic pathway
  • A series of enzyme-catalyzed chemical reactions
    within a cell, which results in the removal of a
    molecule from the environment to be used/stored
    by the cell, or the initiation of another
    metabolic pathway [1].
  • A pathway is a linked set of biochemical
    reactions, linked in the sense that the product of
    one reaction is a reactant of, or an enzyme that
    catalyzes, a subsequent reaction [4].
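
The "linked" condition in the second definition can be illustrated with a minimal sketch. The Reaction type and the is_linked and is_pathway helpers are hypothetical names introduced here, not part of the presented system, and only the product-to-reactant case is modeled (the product-as-enzyme case is left out).

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    """Hypothetical record for one biochemical reaction."""
    name: str
    substrates: frozenset
    products: frozenset


def is_linked(first: Reaction, second: Reaction) -> bool:
    """True if at least one product of `first` is a substrate of `second`."""
    return bool(first.products & second.substrates)


def is_pathway(reactions: list) -> bool:
    """True if every consecutive pair of reactions is linked."""
    return all(is_linked(a, b) for a, b in zip(reactions, reactions[1:]))


# First two steps of glycolysis, used only as an illustration.
hexokinase = Reaction(
    "hexokinase",
    frozenset({"glucose", "ATP"}),
    frozenset({"glucose-6-phosphate", "ADP"}),
)
isomerase = Reaction(
    "phosphoglucose isomerase",
    frozenset({"glucose-6-phosphate"}),
    frozenset({"fructose-6-phosphate"}),
)

print(is_pathway([hexokinase, isomerase]))
```

Reversing the order of the two reactions breaks the link, because fructose-6-phosphate is not a substrate of hexokinase in this toy model.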

5
Why is it so difficult to study metabolism in
multiple genomes?
  • The metabolism of a single organism is too large
    to be grasped by a single mind (e.g., E. coli has
    a metabolism involving over 850 substances and
    1500 reactions).
  • Genome projects keep generating a large amount of
    sequence data.

6
A sample metabolic pathway [3]

7
Pathway Database (DB)
  • A pathway DB is a bioinformatics DB that
    describes biochemical pathways and their
    component reactions, enzymes, and substrates [4].

8
Current Pathway DBs
  • KEGG (Kyoto Encyclopedia of Genes and Genomes)
  • The most comprehensive metabolic pathway DB.
  • EcoCyc
  • Encyclopedia of Escherichia coli K12 Genes and
    Metabolism.
  • CGAP (Cancer Genome Anatomy Project)
  • Pathways on the CGAP web site are obtained
    directly from BioCarta and KEGG.
  • WIT
  • It has become a commercial DB.

9
Disadvantages of current DBs
  • They are static.
  • All data are pre-computed and stored in the DBs.
  • Users' flexibility in choosing genomes and
    pathways of interest is limited.
  • They can only study one genome at a time.
  • Users cannot compare the pathways in different
    genomes at the same time.

10
Motivation
  • Create a system
  • Users can select genomes and pathways of
    interest and perform sequence analysis freely.
  • Enables multi-genome pathway comparison
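
As a rough illustration of what a multi-genome pathway comparison could produce, the sketch below builds a presence/absence table of pathway enzymes per genome. The function names, the genome names, and the annotation sets are hypothetical. The EC numbers are real glycolysis enzymes used only as sample keys. This is not the system's actual method, which relies on sequence analysis.

```python
def presence_table(pathway_enzymes, genome_annotations):
    """Return {genome: {enzyme: True/False}} for every enzyme in the pathway."""
    return {
        genome: {ec: ec in annotated for ec in pathway_enzymes}
        for genome, annotated in genome_annotations.items()
    }


def missing_enzymes(pathway_enzymes, genome_annotations):
    """Return {genome: [enzymes of the pathway not annotated in that genome]}."""
    return {
        genome: [ec for ec in pathway_enzymes if ec not in annotated]
        for genome, annotated in genome_annotations.items()
    }


# Hypothetical input: three glycolysis enzymes and two made-up genomes.
pathway = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
annotations = {
    "genome_A": {"2.7.1.1", "5.3.1.9", "2.7.1.11"},
    "genome_B": {"2.7.1.1", "2.7.1.11"},
}

table = presence_table(pathway, annotations)
missing = missing_enzymes(pathway, annotations)
print(missing)
```

Comparing the rows of such a table side by side shows at a glance which steps of a pathway are covered in each selected genome.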

11
Data Sources
  • KEGG
  • NCBI GenBank
  • PLATCOM genome comparison data

12
The Challenge
  • In KEGG and NCBI GenBank
  • The genome names and gene names are slightly
    different.
  • The IDs used in the two DBs are completely
    different.
  • Some of the protein IDs (pid) in KEGG are
    outdated.
  • Thus, integration of the two DBs is not trivial.
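
One possible way to bridge the two ID schemes is to match records on normalized genome and gene names, sketched below. This is an assumption for illustration only: the slides do not describe the actual matching procedure, the record layout (id, genome name, gene name) is hypothetical, and all IDs shown are made up.

```python
import re


def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_id_map(kegg_records, ncbi_records):
    """Map KEGG ids to NCBI ids via normalized (genome, gene) names.

    Each record is a hypothetical (id, genome_name, gene_name) tuple.
    Returns the mapping plus the KEGG ids that found no counterpart.
    """
    ncbi_index = {
        (normalize(genome), normalize(gene)): ncbi_id
        for ncbi_id, genome, gene in ncbi_records
    }
    mapping, unmatched = {}, []
    for kegg_id, genome, gene in kegg_records:
        key = (normalize(genome), normalize(gene))
        if key in ncbi_index:
            mapping[kegg_id] = ncbi_index[key]
        else:
            unmatched.append(kegg_id)
    return mapping, unmatched


# Made-up records; the ids are placeholders, not real database entries.
kegg = [
    ("K-0001", "Escherichia coli K-12", "pgi"),
    ("K-0002", "Escherichia coli K-12", "pfkA"),
]
ncbi = [("N-1001", "Escherichia coli K12", "PGI")]

mapping, unmatched = build_id_map(kegg, ncbi)
print(mapping, unmatched)
```

Records left in the unmatched list would need another route, for example sequence comparison, which is one reason the integration is not trivial.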

13
The unique features of our system
  • Easy to maintain
  • Only need to download the latest datasets from
    KEGG and NCBI GenBank.
  • Flexibility
  • Sequence analysis is based on the combination of
    the genomes and pathways of the user's choice.
  • Everything is computed on the fly.
  • Integration of KEGG and NCBI GenBank DBs in terms
    of sequence analysis.

14
Methods
  • FASTA
  • ClustalW
  • HMMer
  • A series of modules

15
Infrastructure
[Diagram. Labels: a query protein sequence; a pathway; a reference genome; genomes of interest; protein information; pathway information; search for missing genes]

16
PLATCOM-Metabolic Pathway Division

17
Sample Result (1)

18
Sample Result (2)

19
Sample Result (3)

20
Online Demonstration
  • PlatCom
  • A Platform for Computational Comparative Genomics

21
Future Direction
  • Use conserved domains to perform HMM searches
  • Enable sequence alignment and pattern search
  • Connect to other DBs
  • Protein-Protein Interaction DBs
  • PDB
  • Improve performance by using a dynamic cache.
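
Since results are computed on the fly, a cache keyed on the selected genomes and pathway would let repeated requests skip the recomputation. The sketch below uses Python's functools.lru_cache as one possible approach. The analyze function and its return value are hypothetical stand-ins for an expensive sequence analysis, and nothing here reflects how the system is actually built.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=128)
def analyze(genomes: tuple, pathway: str) -> str:
    """Hypothetical stand-in for an on-the-fly analysis.

    Arguments must be hashable, so the genomes are passed as a tuple.
    """
    global call_count
    call_count += 1
    return f"analysis of {pathway} across {', '.join(genomes)}"


# The second identical request is served from the cache.
first = analyze(("genome_A", "genome_B"), "glycolysis")
second = analyze(("genome_A", "genome_B"), "glycolysis")
print(call_count, analyze.cache_info())
```

The maxsize bound keeps memory use fixed by evicting the least recently used entries, which suits a service where a few genome and pathway combinations are requested far more often than the rest.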

22
References
  1. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N.,
    and Barabási, A.-L. (2000), The large-scale
    organization of metabolic networks, Nature,
    407: 651-654
  2. http://www.free-definition.com/
  3. http://www.genome.ad.jp/kegg/pathway.html
  4. Karp, P. D. (2001), Pathway Databases: A Case Study
    in Computational Symbolic Theories, Science,
    293: 2040-2044

23
Acknowledgments
  • Professor Sun Kim
  • Kwangmin Choi
  • Arvind Gopu