Title: Metabolic Pathway, I609 PhD Seminar: Computational techniques in comparative genomics
1 Metabolic Pathway
I609 PhD Seminar: Computational techniques in comparative genomics
2 Overview
- Metabolism 101
- Resources
- Network topology
- Pathway representation
- Metabolic pathway analysis using a comparative genomics approach
- Pathway evolution
- Practical application
3 Metabolism 101
4 -Omics
5 Basic keywords
- An enzyme is a complex protein, produced by cells, that acts as a catalyst in a specific biochemical reaction.
- A reaction is a process in which one or more substances are chemically changed into one or more different substances.
- Metabolism is the step-by-step modification of an initial molecule to shape it into another product.
- Catabolism (breaking molecules down, releasing energy)
- Anabolism (building molecules up, consuming energy)
- A metabolite is any substance involved in metabolism, either as a product of metabolism or as a substance necessary for it (substrate).
6 Catalytic reaction by an enzyme
7 EC number (from Wikipedia)
- EC 1 Oxidoreductases
- EC 1.1 Acting on the CH-OH group of donors
- EC 1.1.1 With NAD or NADP as acceptor
- EC 1.1.1.1 Alcohol dehydrogenase (an alcohol as substrate)
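Because an EC number is a four-level hierarchy read from left to right, the lineage of any enzyme can be recovered just by taking its dotted prefixes. A minimal sketch, assuming a small hand-written name table (`EC_NAMES`, illustrative only) that covers just the branch listed above:

```python
# Names for the EC 1 -> EC 1.1.1.1 branch listed above (illustrative subset only).
EC_NAMES = {
    "1": "Oxidoreductases",
    "1.1": "Acting on the CH-OH group of donors",
    "1.1.1": "With NAD or NADP as acceptor",
    "1.1.1.1": "Alcohol dehydrogenase",
}


def ec_lineage(ec: str) -> list:
    """Return the dotted prefixes of an EC number, from class down to the full number.

    Incomplete numbers written with '-' (e.g. '1.1.1.-') are rejected in this sketch.
    """
    parts = ec.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a valid EC number: {ec!r}")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def describe(ec: str) -> list:
    """Label each level of the lineage; levels missing from EC_NAMES are shown as '?'."""
    return [f"EC {prefix}: {EC_NAMES.get(prefix, '?')}" for prefix in ec_lineage(ec)]


for line in describe("1.1.1.1"):
    print(line)
```

A real pipeline would load the full IUBMB name list instead of the four-entry table used here.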
8 Metabolome and Metabolomics
- Metabolome
- The complete set of small-molecule metabolites found within a biological sample, such as a single organism
- Target for drug discovery
- Biomarkers of physiological disease (diagnostics)
- Target for metabolic engineering (e.g., Jurassic Park)
- Metabolomics
- "The quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification"
- Metabolomics can be regarded as the end point of the "omics" cascade (genomics, transcriptomics, proteomics, metabolomics, etc.)
- Changes in the metabolome are the ultimate response of an organism to genetic alterations, disease, or environmental influences.
9 Metabolism: TCA cycle, glycolysis
10 Metabolic pathway/network
- A metabolic pathway is a series of chemical reactions occurring within a cell, catalyzed by enzymes, and resulting either in the formation of a metabolic product to be used or stored by the cell, or in the initiation of another metabolic pathway.
- Pathways often require dietary minerals, vitamins, and other cofactors.
- Networks of metabolite feedback pathways
- regulate gene and protein expression
- can also mediate signaling between organisms
- Directed graph
- Reaction stoichiometry: the quantitative relationship between the substrates and products of a reaction
- A substrate enters a metabolic pathway depending on the needs of the cell and the availability of the substrate.
- Initiation of another metabolic pathway (flux-generating step)
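Stoichiometry and the directed-graph view combine naturally in a stoichiometric matrix S (rows are metabolites, columns are reactions), where a flux vector v is at steady state when S·v = 0. A minimal sketch, assuming a hypothetical linear pathway (uptake → A → B → C → secretion); the metabolites, reactions, and flux values are illustrative and not taken from the slides:

```python
# Hypothetical linear pathway: uptake -> A -> B -> C -> secretion.
# Rows are metabolites, columns are reactions; entries are stoichiometric
# coefficients (negative = consumed, positive = produced).
METABOLITES = ["A", "B", "C"]
REACTIONS = ["uptake", "A_to_B", "B_to_C", "secretion"]
S = [
    # uptake  A_to_B  B_to_C  secretion
    [1, -1, 0, 0],  # A
    [0, 1, -1, 0],  # B
    [0, 0, 1, -1],  # C
]


def net_production(S, v):
    """Return S·v: the net rate of change of each metabolite for flux vector v."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """A flux vector is balanced (steady state) when S·v = 0 for every metabolite."""
    return all(abs(x) <= tol for x in net_production(S, v))


balanced = [2.0, 2.0, 2.0, 2.0]    # the same flux through every step
unbalanced = [2.0, 1.0, 1.0, 1.0]  # A is produced at rate 2 but consumed at rate 1

print(net_production(S, unbalanced))  # [1.0, 0.0, 0.0]
```

The nonzero entry for A shows where the unbalanced flux makes a metabolite accumulate.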
11 Metabolic pathway overview: Roche Applied Science "Biochemical Pathways" wall chart
http://www.expasy.ch/cgi-bin/show_thumbnails.pl
12 G5 region: a part of the TCA cycle
13 Metabolic modules: big themes
14 The size of metabolomes
- An attraction of the metabolome has always been that it is numerically smaller, and thus more tractable, than the transcriptome or proteome.
- However, the measured metabolome is larger than the part encoded by the genome
- It includes molecules acquired exogenously, such as drugs, foods, or food additives
- It also includes molecules derived from the host's microflora
- The metabolome is an open system, not a closed one
- Saccharomyces cerevisiae has some 1200 reactions and 650 metabolites
- The two curated reconstructions of the human metabolome presently contain some 1100 and 3300 reactions and 700 and 2700 metabolites, respectively.
15 The numbers of reactions and metabolites are underestimated
- Some areas of metabolism are better represented than others; transporters in particular are highly under-represented (despite their activities in transporting xenobiotics and pharmaceuticals)
- Many enzymes currently have unknown substrates.
- It is hard to discover molecules whose existence one does not suspect, so some molecules might be reasonably prevalent but of unknown chemical identity. (In plants and yeast, most metabolites measured by gas chromatography-mass spectrometry are presently of uncertain identity.)
- "Genome evolution reveals biochemical networks and functional modules", von Mering et al., PNAS 100(26), 2003
16 Public resources
17 KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg
- KEGG is a suite of databases and one of the most integrated pathway databases
- 335 pathways from 906 organisms
- PATHWAY
- holds the current knowledge on molecular interaction networks, including metabolic pathways, regulatory pathways, and molecular complexes
- GENES
- a collection of gene catalogs for all complete genomes and some partial genomes. Each gene catalog is computationally derived from public resources and manually reannotated for the reconstruction of KEGG pathways.
- KEGG GENES is associated with KEGG GENOME, which contains chromosome maps
- LIGAND database
- provides the linkage between the chemical and biological aspects of life in the light of enzymatic reactions
- COMPOUND/GLYCAN/REACTION
- KO for manually curated ortholog groups
- KEGG SSDB for computationally generated ortholog/paralog clusters and gene clusters
18 KEGG
19 MetaCyc
http://www.metacyc.org
- MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways
- More than 900 different organisms are represented
- The majority of pathways occur in microorganisms and plants
- More than 900 metabolic pathways are stored, with more than 6,000 enzymatic reactions and more than 12,000 associated literature citations
- Stores all enzyme-catalyzed reactions that have been assigned EC numbers by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology
- Also stores hundreds of additional enzyme-catalyzed reactions that have not yet been assigned an EC number
- Contains pathways involved in both primary and secondary metabolism
20 MetaCyc
21 Other resources
22 Network topology
23 Graph and Network
- Graph
- A well-known and important concept in discrete mathematics and computer science
- Consists of a set of nodes and a set of edges
- Node: an object
- Edge: a relation between two objects
- Network
- A graph with some information/meaning attached
- We do not distinguish graphs from networks in this talk
24 Graph representation of a metabolic pathway
- Compound network
- Node: chemical compound
- Edge: chemical reaction
- Reaction network
- Node: chemical reaction (or enzyme)
- Edge: chemical compound shared by two reactions
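Both representations can be derived from the same list of reactions. A minimal sketch, assuming a hypothetical three-reaction toy pathway (`R1`, `R2`, `R3` over compounds `A` to `D`); the compound network links each substrate to each product, and the reaction network links two reactions whenever they share a compound:

```python
from itertools import combinations

# Hypothetical toy pathway: reaction id -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
}


def compound_network(reactions):
    """Directed edges substrate -> product, as (substrate, product, reaction) triples."""
    edges = set()
    for rid, (subs, prods) in reactions.items():
        for s in subs:
            for p in prods:
                edges.add((s, p, rid))
    return edges


def reaction_network(reactions):
    """Undirected edges between two reactions, as (reaction1, reaction2, shared compound)."""
    edges = set()
    for (r1, (s1, p1)), (r2, (s2, p2)) in combinations(sorted(reactions.items()), 2):
        for compound in (s1 | p1) & (s2 | p2):
            edges.add((r1, r2, compound))
    return edges


print(sorted(compound_network(REACTIONS)))
print(sorted(reaction_network(REACTIONS)))
```

Note how the single shared compound B links every pair of reactions in the reaction network; a compound used by many reactions produces a dense cluster of edges there.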
25 Graph and Biological Network
[Figure: a metabolic network (KEGG) shown as a graph]
- Node: object, e.g., chemical compound
- Edge: relation between objects, e.g., chemical reaction
26 Degree/connectivity, k
- Degree k: how many links a node has to other nodes
- Undirected network
- Characterized by the average degree $\langle k \rangle = 2L/N$
- N nodes and L links
- Directed network
- Incoming degree, $k_{in}$
- Outgoing degree, $k_{out}$
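The definitions above translate directly into code: every link adds one to the degree of each of its two endpoints, which is why the degrees sum to 2L and the average degree is 2L/N. A minimal sketch over a hypothetical four-node toy graph; nodes are inferred from the edge list, so isolated (degree 0) nodes are not counted:

```python
from collections import Counter


def undirected_degrees(edges):
    """Degree k of each node in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def average_degree(edges):
    """<k> = 2L/N: each of the L links contributes to the degree of two nodes."""
    nodes = {n for edge in edges for n in edge}
    return 2 * len(edges) / len(nodes)


def directed_degrees(edges):
    """Incoming (k_in) and outgoing (k_out) degree for directed edges u -> v."""
    k_in, k_out = Counter(), Counter()
    for u, v in edges:
        k_out[u] += 1
        k_in[v] += 1
    return k_in, k_out


# Hypothetical toy graph: a triangle X-Y-Z with a pendant node W attached to X.
EDGES = [("X", "Y"), ("Y", "Z"), ("Z", "X"), ("X", "W")]
print(dict(undirected_degrees(EDGES)), average_degree(EDGES))
```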
27 Graph and Degree
- Degree
- Node with degree 1: J
- Nodes with degree 2: B, C, F, G, H
- Nodes with degree 3: E, I, A, D
- P(k) (degree distribution): $P(1) = 0.1$, $P(2) = 0.5$, $P(3) = 0.4$
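The degree distribution is simply the fraction of nodes that have each degree. This sketch recomputes the values above from the slide's degree listing for nodes A to J (the edges themselves are not given, so it starts from the degrees):

```python
from collections import Counter
from fractions import Fraction

# Degrees of nodes A-J as listed on this slide.
DEGREES = {"J": 1, "B": 2, "C": 2, "F": 2, "G": 2, "H": 2,
           "E": 3, "I": 3, "A": 3, "D": 3}


def degree_distribution(degrees):
    """P(k) = (number of nodes with degree k) / (total number of nodes)."""
    counts = Counter(degrees.values())
    n = len(degrees)
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}


print({k: float(p) for k, p in degree_distribution(DEGREES).items()})
# {1: 0.1, 2: 0.5, 3: 0.4}
```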
28 Scale-Free Network
- P(k)
- Degree distribution
- Fraction of nodes with degree k
- Scale-free network
- P(k) follows a power law, $P(k) \propto k^{-\gamma}$
- Different from random networks, whose P(k) is Poisson
29 Poisson Distribution and Power-Law Distribution
- Poisson distribution (random graph): $P(k) = e^{-\lambda}\lambda^{k}/k!$
- Power-law distribution (scale-free graph): $P(k) \propto k^{-\gamma}$
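The practical difference between the two formulas is in the tail: the Poisson probability collapses factorially once k exceeds the mean, while the power law decays only polynomially, so hubs remain plausible. A minimal sketch that evaluates both; the mean degree λ = 4, exponent γ = 2.2, and cut-off `k_max = 1000` used to normalize the power law are illustrative assumptions:

```python
import math


def poisson_pk(k: int, lam: float) -> float:
    """Random graph: P(k) = e^{-lam} lam^k / k!, evaluated in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def power_law_pk(k: int, gamma: float, k_max: int = 1000) -> float:
    """Scale-free graph: P(k) proportional to k^{-gamma}, normalized over k = 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm


# Illustrative parameters (assumptions): mean degree lam = 4, exponent gamma = 2.2.
for k in (1, 4, 10, 20, 50):
    print(f"k={k:>2}  Poisson={poisson_pk(k, 4.0):.2e}  power law={power_law_pk(k, 2.2):.2e}")
```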
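30 Random Network vs. Scale-free Network
- Random network: $P(k) = e^{-\lambda}\lambda^{k}/k!$
- Scale-free network: $P(k) \propto k^{-\gamma}$ (grown from $m_0 = 3$ initial nodes, adding $m = 2$ links per new node)
[Figure: growth of the scale-free network; each node is labeled with its attachment probability $k_i / \sum_j k_j$, e.g., 2/6 for the initial nodes, then 3/10 and 2/10, then 4/14 and 2/14]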
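The growth rule on this slide (start from $m_0 = 3$ nodes, attach each new node with $m = 2$ links, choosing node i with probability $k_i / \sum_j k_j$) is the Barabási-Albert preferential-attachment model. A minimal sketch, assuming a fully connected initial core (which gives the 2/6 starting probabilities); the function name and seed are illustrative:

```python
import random
from collections import Counter


def barabasi_albert(n: int, m0: int = 3, m: int = 2, seed: int = 0) -> list:
    """Grow an undirected network of n nodes by preferential attachment.

    Starts from a fully connected core of m0 nodes; every new node then adds
    m links to distinct existing nodes, picking node i with probability
    k_i / sum_j k_j.
    """
    if m0 < 2 or not 1 <= m <= m0:
        raise ValueError("need m0 >= 2 and 1 <= m <= m0")
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Every node appears here once per incident link, so a uniform draw from
    # this list selects node i with probability k_i / sum_j k_j.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = barabasi_albert(2000)
degree = Counter(v for edge in edges for v in edge)
print(len(edges), max(degree.values()))  # a few early nodes become hubs
```

Storing each node once per link keeps every draw proportional to degree without recomputing $\sum_j k_j$ at each step.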
31 (No transcript)
32 Scale-free Networks in the Real World
- Metabolic networks: $\gamma \approx 2.24$ (depending on species)
- Node: chemical compound
- Edge: chemical reaction (almost equivalently, enzyme)
- Protein interaction networks: $\gamma \approx 2.2$
- Node: protein
- Edge: interaction between two proteins
- WWW: $\gamma \approx 2.1$
- Node: web page
- Edge: link between web pages
- Movie stars: $\gamma \approx 2.3$
- Node: actor/actress
- Edge: acting in the same movie
33 Compound Network vs. Reaction Network
34 Line Graph Transformation and Metabolic Network
- Correspondence
- Compound network ↔ original network G
- Reaction network ↔ transformed network L(G)
- However, this correspondence is not precise
- A more realistic transformation is considered later
35 Line Graph Transformation
- Each edge in G becomes a node in L(G)
- Two nodes in L(G) are joined by an edge if the corresponding edges in G share an endpoint
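The definition can be implemented in a few lines. A minimal sketch of the plain line graph defined on this slide (not the physical variant named on the following slide), using a hypothetical toy graph; it also lets us check the useful identity that the L(G) node for edge {u, v} has degree $k_u + k_v - 2$:

```python
from collections import Counter
from itertools import combinations


def line_graph(edges):
    """Line graph L(G) of a simple undirected graph G given as (u, v) pairs.

    Each edge of G becomes a node of L(G) (stored as a sorted tuple); two such
    nodes are adjacent when the corresponding edges of G share an endpoint.
    """
    nodes = sorted({tuple(sorted(e)) for e in edges})
    return [(a, b) for a, b in combinations(nodes, 2) if set(a) & set(b)]


def line_graph_degrees(edges):
    """Degree of every node of L(G); nodes with no neighbours do not appear."""
    deg = Counter()
    for a, b in line_graph(edges):
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical toy graph G: triangle a-b-c plus a pendant node d on c.
G = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(line_graph(G))
print(dict(line_graph_degrees(G)))
```

The line graph of a star with n leaves is a complete graph on n nodes, which is how a single hub in G becomes a densely connected cluster in L(G).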
36 Physical Line Graph Transformation
37 Main Result
- If $P(k) \propto k^{-\gamma}$ in G, then $P(k) \propto k^{-\gamma+1}$ in L(G)
- Assuming that there is no assortative mixing in G
- (i.e., no node has a preference for high- or low-degree neighbors)
- Intuitive proof
- A node v of degree k in G has k edges
- These k edges correspond to k nodes in L(G)
- Each of these k nodes in L(G) has degree around k
- So from v in G, we obtain k nodes of degree around k in L(G)
- Thus $P(k) \propto k \cdot k^{-\gamma} = k^{-\gamma+1}$ in L(G)
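The "degree around k" step of the intuitive proof can be stated more precisely. The following heuristic derivation (not a rigorous proof) uses the exact degree of an L(G) node in a simple graph:

```latex
% Degree of the L(G) node that stands for edge {u, v} of a simple graph G:
\deg_{L(G)}(\{u,v\}) = k_u + k_v - 2
% Heuristic (no assortative mixing): a node v of degree k contributes k nodes
% to L(G), each of degree k + k_u - 2, which is roughly k when k is large
% compared with a typical neighbor degree k_u. Weighting by those k nodes:
P_{L(G)}(k) \propto k \, P_G(k) \propto k \cdot k^{-\gamma} = k^{-(\gamma - 1)}
```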
38 Scale-free topology in biological systems
- "Network biology: understanding the cell's functional organization", Barabási et al., Nature Reviews Genetics 5, 101-113, 2004
- Growth process and preferential attachment
- The network emerges through the successive addition of new nodes (1 and 2, in red)
- The nodes that appeared early in the history of the network are the most connected ones, e.g., CoA, NAD, GTP
- New nodes prefer to connect to more highly connected nodes (1 >> 2)
- Growth and preferential attachment generate hubs through a rich-gets-richer mechanism
- Error tolerance: random failures mostly hit poorly connected nodes and rarely disrupt the network
- Attack vulnerability: targeted removal of the hubs quickly fragments the network
- Gene duplication as the origin of preferential attachment
- Duplicated genes produce identical proteins that interact with the same partners. Therefore, each protein in contact with a duplicated protein gains an extra link.
- Proteins with a large number of interactions tend to gain links more often, because they are more likely to interact with the protein that has been duplicated.
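The duplication argument can be checked on a toy interaction network: copying a protein gives each of its partners one extra link, and if the duplicated protein is chosen uniformly at random, a protein with k partners gains a link with probability k/N, that is, in proportion to its degree. A minimal sketch, assuming a hypothetical five-protein network and no self-interactions:

```python
def duplicate(adjacency, protein, copy_name):
    """Return a new interaction network in which `protein` has been duplicated.

    `adjacency` maps each protein to the set of its interaction partners
    (undirected, no self-interactions). The copy inherits exactly the same
    partners, so every partner of `protein` gains one extra link.
    """
    if copy_name in adjacency:
        raise ValueError(f"{copy_name!r} is already in the network")
    new = {node: set(partners) for node, partners in adjacency.items()}
    new[copy_name] = set(adjacency[protein])
    for partner in adjacency[protein]:
        new[partner].add(copy_name)
    return new


def gain_probability(adjacency, node):
    """Chance that `node` gains a link when one protein, chosen uniformly, is duplicated.

    The node gains a link only if the duplicated protein is one of its k
    partners, so the probability is k / N: proportional to its degree.
    """
    return len(adjacency[node]) / len(adjacency)


# Hypothetical toy network: hub H interacts with P1-P4; P1 also binds P2.
NETWORK = {
    "H": {"P1", "P2", "P3", "P4"},
    "P1": {"H", "P2"},
    "P2": {"H", "P1"},
    "P3": {"H"},
    "P4": {"H"},
}
after = duplicate(NETWORK, "P1", "P1_copy")
print(len(NETWORK["H"]), "->", len(after["H"]))
print(gain_probability(NETWORK, "H"), gain_probability(NETWORK, "P3"))
```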
39 XML representation
40 XML representation
- Machine-readable
- Easy for data exchange
- SBML (for systems biology)
- KGML (for KEGG)
- BioPAX (for MetaCyc etc.)
- XIN (for DIP)
- Etc.
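Because these formats are plain XML, a standard XML parser is enough to pull a reaction list out of them. A minimal sketch using Python's `xml.etree.ElementTree` on a hand-written, simplified SBML Level 2 Version 4 style fragment (hypothetical; it omits compartments, cofactors such as ATP/ADP, and other elements a valid SBML model would need):

```python
import xml.etree.ElementTree as ET

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Simplified, hypothetical SBML-style fragment (not a complete, valid model).
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="c"/>
      <species id="g6p" name="glucose-6-phosphate" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="HEX1" reversible="false">
        <listOfReactants><speciesReference species="glc"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def read_reactions(xml_text):
    """Return ({species id: name}, {reaction id: (reactant ids, product ids)})."""
    root = ET.fromstring(xml_text.strip())
    species = {s.get("id"): s.get("name") for s in root.iterfind(".//sbml:species", NS)}
    reactions = {}
    for r in root.iterfind(".//sbml:reaction", NS):
        subs = [ref.get("species") for ref in r.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        prods = [ref.get("species") for ref in r.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[r.get("id")] = (subs, prods)
    return species, reactions


print(read_reactions(TOY_SBML))
```

The namespace map is required because SBML elements live in a default namespace; KGML and BioPAX files would need their own element paths.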
41 XIN
42 KGML
43 BioPAX
44 SBML
45 Metabolic pathway analysis using a comparative genomics approach
46 Context-based analysis
- A metabolic reconstruction is an attempt to develop a detailed overview of an organism's metabolism from an analysis of its genomic sequence.
- Metabolic reconstructions can reveal new aspects of metabolism even in well-studied organisms.
- Reconstruction supports the inference of pathways on the basis of the presence or absence of relevant genes (missing genes, metabolic holes, etc.).
- Combining inferred pathways into hierarchical blocks produces metabolic charts that are specific to a particular organism and connected to individual genes.
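Presence/absence inference can be sketched as set arithmetic: a pathway is inferred when enough of its enzymes have a gene assigned, and the enzymes without a gene are the metabolic holes (candidate missing genes). A minimal sketch, assuming hypothetical pathway definitions, a hypothetical gene-to-EC annotation, and an arbitrary 50% completeness threshold:

```python
# Hypothetical pathway definitions: pathway name -> EC numbers of its enzymes.
PATHWAYS = {
    "toy_pathway_1": {"2.7.1.1", "5.3.1.9", "2.7.1.11"},
    "toy_pathway_2": {"4.1.2.13", "1.2.1.12"},
}
# Hypothetical genome annotation: gene id -> EC number assigned by homology.
ANNOTATION = {"g001": "2.7.1.1", "g002": "2.7.1.11", "g003": "4.1.2.13", "g004": "1.2.1.12"}


def reconstruct(pathways, annotation, min_fraction=0.5):
    """Infer pathways from gene presence and list the holes (enzymes with no gene)."""
    present = set(annotation.values())
    report = {}
    for name, required in pathways.items():
        fraction = len(required & present) / len(required)
        report[name] = {
            "inferred": fraction >= min_fraction,  # assumed threshold
            "completeness": fraction,
            "holes": sorted(required - present),  # candidate missing genes
        }
    return report


for name, result in reconstruct(PATHWAYS, ANNOTATION).items():
    print(name, result)
```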
47 Goal: find the missing links
- Long-term goal
- Simulation of the whole cell: the virtual cell
- Mid-term goal
- Predict the cell's reaction to changes in the environment
- Predict the cell's reaction to gene knockout/modification
- Current stage
- Describe and calculate network behavior
- Assign functions to all proteins
- Identify all regulatory events
- Fill the gaps!
48 "Missing genes in metabolic pathways: a comparative genomics approach", Osterman and Overbeek, Current Opinion in Chemical Biology, Vol. 7, 2003
49 Gene cluster
- Genes from the same pathway tend to cluster on prokaryotic chromosomes
- Functional coupling
- Clustering of orthologous FASII-related genes (type II fatty acid synthesis; colors correspond to panel (b)) provided key evidence for the identification of two novel enzymes (missing genes) in the SFA and UFA (saturated and unsaturated fatty acid) branches of the FASII pathway: fabK (11b) and fabM
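A first step in gene-cluster analysis is finding runs of neighbouring genes on the same strand. A minimal sketch, assuming hypothetical gene names and coordinates and an arbitrary 300 bp gap threshold; published methods additionally require the cluster to be conserved across several genomes, which this single-genome sketch does not test:

```python
def find_clusters(genes, max_gap=300):
    """Group adjacent same-strand genes whose intergenic gap is at most max_gap bp.

    genes: iterable of (name, start, end, strand) on one chromosome.
    Returns the runs of two or more genes, as lists of names.
    """
    genes = sorted(genes, key=lambda g: g[1])
    if not genes:
        return []
    clusters, current = [], [genes[0]]
    for prev, gene in zip(genes, genes[1:]):
        same_strand = gene[3] == prev[3]
        close = gene[1] - prev[2] <= max_gap
        if same_strand and close:
            current.append(gene)
        else:
            clusters.append(current)
            current = [gene]
    clusters.append(current)
    return [[g[0] for g in run] for run in clusters if len(run) > 1]


# Hypothetical genes: (name, start, end, strand).
GENES = [
    ("pathX_1", 100, 1000, "+"),
    ("pathX_2", 1050, 1800, "+"),
    ("pathX_3", 1900, 2800, "+"),
    ("other_1", 6000, 6900, "-"),
    ("other_2", 9000, 9500, "+"),
]
print(find_clusters(GENES))  # [['pathX_1', 'pathX_2', 'pathX_3']]
```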
50 Gene fusion
- This technique searches for a pair of genes from one genome that appear to be fused into a single gene in another genome, which provides further evidence of potential functional coupling.
- Since its introduction, the protein fusion approach has been implemented and successfully applied to genome-wide analysis of hypothetical proteins.
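The fusion search (often called the Rosetta Stone method) can be sketched as: for every pair of separate proteins in genome A, look for a single protein in genome B that contains both of their domains or ortholog families. A minimal sketch, assuming each protein has already been mapped to a set of hypothetical family labels:

```python
from itertools import combinations


def fusion_links(genome_a, genome_b):
    """Find pairs of separate proteins in genome A whose families co-occur in one protein of genome B.

    Each genome maps protein id -> set of family/domain ids (hypothetical labels).
    Returns (protein_a1, protein_a2, fused_protein_b) triples.
    """
    links = []
    for (p1, d1), (p2, d2) in combinations(sorted(genome_a.items()), 2):
        if d1 & d2:
            continue  # skip pairs that already share a family
        for fused, domains in sorted(genome_b.items()):
            if d1 <= domains and d2 <= domains:
                links.append((p1, p2, fused))
    return links


# Hypothetical genomes: a1 and a2 are separate in A but fused as b1 in B.
GENOME_A = {"a1": {"famX"}, "a2": {"famY"}, "a3": {"famZ"}}
GENOME_B = {"b1": {"famX", "famY"}, "b2": {"famZ"}}
print(fusion_links(GENOME_A, GENOME_B))  # [('a1', 'a2', 'b1')]
```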
51 Phylogenetic profile
- The underlying assumption is that two proteins from the same cellular pathway are expected either to both occur or to both be absent in any given organism.
- Co-occurrence score: the number of genomes in the in-group containing a homolog of a given protein minus the number of genomes in the out-group containing such a homolog, normalized by the highest possible score (28).
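The profile and the co-occurrence score can be computed directly from the definitions above. Since the best possible raw score is achieved by a homolog in every in-group genome and in no out-group genome, the normalizer equals the in-group size (presumably 28 genomes in the cited analysis). A minimal sketch, assuming hypothetical genome names and homolog hits:

```python
def profile(hits, genomes):
    """Binary phylogenetic profile: 1 if the genome contains a homolog, else 0."""
    return [1 if g in hits else 0 for g in genomes]


def cooccurrence_score(hits, in_group, out_group):
    """(in-group genomes with a homolog - out-group genomes with a homolog) / best score.

    The best possible score is len(in_group): a homolog in every in-group
    genome and in none of the out-group genomes.
    """
    hits = set(hits)
    raw = len(hits & set(in_group)) - len(hits & set(out_group))
    return raw / len(in_group)


# Hypothetical genomes and homolog hits.
IN_GROUP = ["g1", "g2", "g3", "g4"]
OUT_GROUP = ["g5", "g6", "g7", "g8"]
HITS = {
    "protA": {"g1", "g2", "g3", "g4"},
    "protB": {"g1", "g2", "g3", "g4"},
    "protC": {"g1", "g2", "g5"},
}
for name, hits in HITS.items():
    print(name, profile(hits, IN_GROUP + OUT_GROUP), cooccurrence_score(hits, IN_GROUP, OUT_GROUP))
```

Here protA and protB have identical profiles, so under the assumption above they would be predicted to belong to the same pathway.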
52 Metabolic module finding
"Genome evolution reveals biochemical networks and functional modules", von Mering et al., PNAS 100(26), 2003
- A metabolic pathway DB describes metabolites and enzymes.
- Comparative genomics can reveal selective pressures shared by groups of enzymes → functional modularity
53 "Genome evolution reveals biochemical networks and functional modules", von Mering et al., PNAS 100(26), 2003
54 Pathway evolution through interaction with the environment
55 Modification of the metabolic profile
- Genomic islands in pathogenic and environmental
microorganisms, Dobrindt et al., Nature Reviews
Microbiology 2, 414-424 (May 2004)
56 Differential metabolic gene profiles among Mycoplasma species
- "Differential metabolism of Mycoplasma species as revealed by their genomes", Arraes et al., Genetics and Molecular Biology 30(1)
- Alternative/salvage pathway
57 Reconstruction of symbiont-host interactions
- "Symbiosis insights through metagenomic analysis of a microbial consortium", Woyke et al., Nature 443, 950-955
58 Practical application
- Drug discovery
- Metabolic engineering
59 Intracellular parasites
http://en.wikipedia.org/wiki/Cryptosporidium_parvum
- Cryptosporidium parvum is an important opportunistic pathogen in AIDS patients and a potential agent for bioterrorism.
- Goal: determine whether IMPDH (inosine 5'-monophosphate dehydrogenase) and other enzymes of the purine nucleotide salvage pathway are valid targets for chemotherapy.
- Preliminary experiments suggest that C. parvum IMPDH is significantly different from the human isozymes.
- Selective inhibitors of the parasite enzyme are currently being developed.
- http://www.america.gov/st/washfile-english/2004/October/20041029121451lcnirelleP0.6819879.html
60 Thanks!