Metabolic Pathway. I609 PhD Seminar: Computational techniques in comparative genomics




1
Metabolic Pathway
I609 PhD Seminar: Computational techniques in comparative genomics
  • Kwangmin Choi

2
Overview
  • Metabolism 101
  • Resources
  • Network topology
  • Pathway representation
  • Metabolic pathway analysis using comparative
    genomics approach
  • Pathway evolution
  • Practical application

3
Metabolism 101
4
-Omics
5
Basic keywords
  • An enzyme is any of several complex proteins that
    are produced by cells and act as catalysts in
    specific biochemical reactions
  • A reaction is a process in which one or more
    substances are changed chemically into one or
    more different substances
  • Metabolism is a step by step modification of the
    initial molecule to shape it into another
    product.
  • Catabolism
  • Anabolism
  • A metabolite is any substance involved in
    metabolism either as a product of metabolism or
    necessary for metabolism (substrate)

6
Catalytic reaction by enzyme
7
EC number (from Wikipedia)
  • EC 1 Oxidoreductases
  • EC 1.1 Acting on the CH-OH group of donors
  • EC 1.1.1 With NAD or NADP as acceptor
  • EC 1.1.1.1 Alcohol dehydrogenase (alcohol as substrate)
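
The four-level hierarchy above can be read directly off the dotted number. A minimal sketch, assuming a plain string such as "1.1.1.1" as input (the function name `ec_lineage` is made up for this illustration):

```python
def ec_lineage(ec_number):
    """Return the chain of EC classes from the top level down to ec_number."""
    parts = ec_number.split(".")
    # Each prefix of the dotted number names one level of the hierarchy.
    return ["EC " + ".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(ec_lineage("1.1.1.1"))
```

Each returned entry corresponds to one bullet on the slide, from the enzyme class down to the individual enzyme.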

8
Metabolome and Metabolomics
  • Metabolome
  • complete set of small-molecule metabolites to be
    found within a biological sample, such as a
    single organism
  • Target for drug discovery; biomarkers of
    physiological disease (diagnostics).
  • Target for metabolic engineering (e.g. Jurassic
    Park)
  • Metabolomics
  • "the quantitative measurement of the dynamic
    multiparametric metabolic response of living
    systems to pathophysiological stimuli or genetic
    modification"
  • Metabolomics can be regarded as the end point of
    the 'omics' era (genomics, transcriptomics,
    proteomics, metabolomics, etc.)
  • The changes in the metabolome are the ultimate
    answer of an organism to genetic alterations,
    disease, or environmental influences.

9
Metabolism: TCA cycle, glycolysis
10
Metabolic pathway/network
  • A metabolic pathway is a series of chemical
    reactions occurring within a cell, catalyzed by
    enzymes, and resulting in either the formation of
    a metabolic product to be used or stored by the
    cell, or the initiation of another metabolic
    pathway.
  • often require dietary minerals, vitamins and
    other cofactors
  • Networks of metabolite feedback pathways
  • regulate gene and protein expression,
  • also can mediate signaling between organisms.
  • Directed graph
  • Reaction stoichiometry: quantitative relationship
  • A substrate enters a metabolic pathway depending
    on the needs of the cell and the availability of
    the substrate.
  • Initiate another metabolic pathway (flux
    generating step).
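
The "directed graph" and "reaction stoichiometry" points can be made concrete with a stoichiometric matrix. The sketch below is only an illustration: the reactions R1 and R2 and the metabolites A to D are hypothetical, and negative coefficients are taken to mean consumption.

```python
# Hypothetical toy reactions: R1: A -> B, R2: B + D -> 2 C
reactions = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "D": -1, "C": 2},
}

metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})

# Stoichiometric matrix: one row per metabolite, one column per reaction.
matrix = [[reactions[r].get(m, 0) for r in reactions] for m in metabolites]


def net_change(fluxes):
    """Net production (+) or consumption (-) of each metabolite for given fluxes."""
    return {
        m: sum(row[j] * fluxes[r] for j, r in enumerate(reactions))
        for m, row in zip(metabolites, matrix)
    }


print(metabolites)
print(matrix)
print(net_change({"R1": 1, "R2": 1}))
```

With both reactions running at unit flux, the intermediate B is produced and consumed at the same rate, so its net change is zero.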

11
Metabolic pathway overview: Roche Applied
Science "Biochemical Pathways" wall chart
http://www.expasy.ch/cgi-bin/show_thumbnails.pl
12
G5 region: a part of the TCA cycle
13
Metabolic modules: big themes
14
The size of metabolomes
  • An attraction of the metabolome has always been
    that it is numerically smaller, and thus more
    tractable, than the transcriptome or proteome.
  • The measured metabolome is greater than that
    encoded by the genome
  • it includes molecules acquired exogenously as
    drugs, foods or food additives,
  • also include molecules derived from the
    microflora of the host
  • Open system, not closed
  • Saccharomyces cerevisiae has some 1200 reactions
    and 650 metabolites
  • The curated human metabolome presently contains
    respectively some 1100/3300 reactions and
    700/2700 metabolites.

15
The numbers of reactions and metabolites are
underestimated
  • Some areas of metabolism are more represented
    than others; transporters especially are highly
    under-represented (for their activities in
    transporting xenobiotics and pharmaceuticals)
  • Many enzymes have currently unknown substrates.
  • It is hard to discover molecules whose existence
    one does not suspect, and so some molecules might
    be reasonably prevalent but of unknown chemical
    identity. (In plants and yeast, most metabolites
    measured by gas chromatography-mass spectrometry
    are presently of uncertain identity.)
  • Genome evolution reveals biochemical networks
    and functional modules
  • Von Mering et al., PNAS vol.100(26), 2003

16
Public resources
17
KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg
  • KEGG is a suite of databases and one of the most
    integrated pathway databases
  • 335 pathways from 906 organisms
  • PATHWAY
  • holds the current knowledge on molecular
    interaction networks, including metabolic
    pathways, regulatory pathways, and molecular
    complexes
  • GENES
  • is a collection of gene catalogs for all the
    complete genomes and some partial genomes. Each
    gene catalog is computationally derived from
    public resources, and is manually reannotated for
    reconstruction of KEGG pathways.
  • KEGG GENES is associated with KEGG GENOME,
    which contains chromosome maps
  • LIGAND Database
  • provide the linkage between chemical and
    biological aspects of life in the light of
    enzymatic reactions
  • COMPOUND/GLYCAN/REACTION
  • KO for manually curated ortholog groups
  • KEGG SSDB for computationally generated
    ortholog/paralog clusters and gene clusters

18
KEGG
19
MetaCyc
http://www.metacyc.org
  • MetaCyc is a database of nonredundant,
    experimentally elucidated metabolic pathways
  • More than 900 different organisms are represented
  • The majority of pathways occur in microorganisms
    and plants
  • More than 900 metabolic pathways are stored, with
    more than 6,000 enzymatic reactions and more than
    12,000 associated literature citations
  • stores all enzyme-catalyzed reactions that have
    been assigned EC numbers by the Nomenclature
    Committee of the International Union of
    Biochemistry and Molecular Biology
  • also stores hundreds of additional 
    enzyme-catalyzed reactions that have not yet been
    assigned an EC number
  • MetaCyc contains pathways involved in both
    primary and secondary metabolism

20
MetaCyc
21
Other resources
22
Network topology
23
Graph and Network
  • Graph
  • Well-known and important concept in discrete
    mathematics and computer science
  • Consists of a set of nodes and a set of edges
  • Node ↔ Object
  • Edge ↔ Relation between two objects
  • Network
  • Graph + some information/meaning
  • We do not distinguish graphs from networks in
    this talk

24
Graph representation of metabolic pathway
  • Compound Network
  • Node ↔ Chemical compound
  • Edge ↔ Chemical reaction
  • Reaction Network
  • Node ↔ Chemical reaction (or enzyme)
  • Edge ↔ Chemical compound shared by two reactions
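
Both views can be derived from the same reaction list. The following is a minimal sketch, assuming each reaction is given as a pair of substrate and product sets; the reaction names R1 to R3 and compounds A to D are hypothetical.

```python
from itertools import combinations

# Hypothetical reactions: name -> (substrates, products)
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
}


def compound_network(reactions):
    """Nodes are compounds; an edge joins a substrate and a product of one reaction."""
    edges = set()
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                edges.add(frozenset((s, p)))
    return edges


def reaction_network(reactions):
    """Nodes are reactions; an edge joins two reactions that share a compound."""
    edges = set()
    for (r1, (s1, p1)), (r2, (s2, p2)) in combinations(reactions.items(), 2):
        if (s1 | p1) & (s2 | p2):
            edges.add(frozenset((r1, r2)))
    return edges


print(compound_network(reactions))
print(reaction_network(reactions))
```

In this toy case compound B takes part in all three reactions, so the reaction network is a triangle even though the compound network is a star.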

25
Graph and Biological Network
  • Metabolic network (KEGG)
  • Graph
  • Node: object, e.g. chemical compound
  • Edge: relation between objects, e.g. chemical
    reaction
26
Degree/connectivity, k
  • How many links the node has to other nodes?
  • Undirected network
  • Characterized by an average degree $\langle k \rangle = 2L/N$
  • N nodes and L links
  • Directed network
  • Incoming degree, kin
  • Outgoing degree, kout
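
A minimal sketch of these quantities, assuming the network is given as a list of (source, target) pairs; the helper names and the three-node example are illustrative only.

```python
def average_degree(n_nodes, n_links):
    """Average degree of an undirected network: each link adds 1 to two nodes."""
    return 2 * n_links / n_nodes


def in_out_degrees(directed_edges):
    """Return (k_in, k_out) dictionaries for a list of (source, target) edges."""
    k_in, k_out = {}, {}
    for src, dst in directed_edges:
        k_out[src] = k_out.get(src, 0) + 1
        k_in[dst] = k_in.get(dst, 0) + 1
        # Make sure every node appears in both dictionaries.
        k_out.setdefault(dst, 0)
        k_in.setdefault(src, 0)
    return k_in, k_out


edges = [("A", "B"), ("A", "C"), ("B", "C")]
print(average_degree(3, len(edges)))
print(in_out_degrees(edges))
```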

27
Graph and Degree
  • Degree
  • Node with degree 1: J
  • Nodes with degree 2: B, C, F, G, H
  • Nodes with degree 3: E, I, A, D
  • P(k) (degree distribution): P(1) = 0.1,
    P(2) = 0.5, P(3) = 0.4
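
The degree distribution on this slide follows from counting how many nodes have each degree. A short sketch, using the node degrees exactly as listed above (the function name is an assumption):

```python
from collections import Counter

# Node degrees as listed on the slide.
degree = {
    "J": 1,
    "B": 2, "C": 2, "F": 2, "G": 2, "H": 2,
    "E": 3, "I": 3, "A": 3, "D": 3,
}


def degree_distribution(degree):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(degree.values())
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}


print(degree_distribution(degree))
```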

28
Scale-Free Network
  • P(k)
  • Degree distribution
  • Frequency of nodes with degree k
  • Scale-free network
  • P(k) follows a power law
  • Different from random networks

29
Poisson Distribution and Power-Law Distribution
  • Poisson distribution (random graph): $P(k) = e^{-\lambda}\lambda^{k}/k!$
  • Power-law distribution (scale-free graph): $P(k) \propto k^{-\gamma}$
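
To see how differently the two distributions treat highly connected nodes, one can evaluate both at a large degree. This is a rough sketch: the values lambda = 4, gamma = 2.2 and the cutoff k_max = 1000 used to normalize the power law are arbitrary choices for illustration.

```python
import math


def poisson_pk(k, lam):
    """Poisson degree distribution of a random graph with mean degree lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def power_law_pk(k, gamma, k_max):
    """Power-law degree distribution, normalized over k = 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm


for k in (1, 5, 20, 50):
    print(k, poisson_pk(k, 4.0), power_law_pk(k, 2.2, 1000))
```

At k = 50 the Poisson probability is vanishingly small, while the power law still assigns a noticeable probability; this heavy tail is what allows hubs to exist.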
30
Random Network vs. Scale-free Network
  • Random network: $P(k) = e^{-\lambda}\lambda^{k}/k!$
  • Scale-free network: $P(k) \propto k^{-\gamma}$
  • [Figure: scale-free network growth example with $m_0 = 3$, $m = 2$; nodes labeled with fractions such as 2/6, 3/10, 4/14]
31
(No Transcript)
32
Scale-free Networks in Real World
  • Metabolic networks: $\gamma \approx 2.24$ (depending on species)
  • Node ↔ Chemical compound
  • Edge ↔ Chemical reaction (almost equivalently,
    enzyme)
  • Protein interaction networks: $\gamma \approx 2.2$
  • Node ↔ Protein
  • Edge ↔ Interaction between two proteins
  • WWW: $\gamma \approx 2.1$
  • Node ↔ Web page
  • Edge ↔ Link between web pages
  • Movie stars: $\gamma \approx 2.3$
  • Node ↔ Actor/Actress
  • Edge ↔ Act in the same movie

33
Compound Network vs. Reaction Network
34
Line Graph Transformation and Metabolic Network
  • Correspondence
  • Compound network ↔ Original network
  • Reaction network ↔ Transformed network
  • But, this correspondence is not precise
  • We will consider more realistic transformation
    later

35
Line Graph Transformation
  • Edge in G ↔ Node in L(G)
  • There is an edge in L(G) if two edges in G
    share the same node as an endpoint
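
The transformation can be written in a few lines. A minimal sketch, assuming an undirected graph G given as a list of node pairs; the function name and the example graphs are illustrative.

```python
from itertools import combinations


def line_graph(edges):
    """Build L(G): every edge of G becomes a node; two such nodes are
    linked when the original edges share an endpoint."""
    lg_nodes = [tuple(sorted(e)) for e in edges]
    lg_edges = [
        (e1, e2)
        for e1, e2 in combinations(lg_nodes, 2)
        if set(e1) & set(e2)
    ]
    return lg_nodes, lg_edges


# A star: node "v" with three neighbors.
star = [("v", "a"), ("v", "b"), ("v", "c")]
print(line_graph(star))

# A path a-b-c-d.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(line_graph(path))
```

For the star, the three edges all meet at "v", so L(G) is a triangle: a node of degree k in G yields k mutually connected nodes in L(G).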

36
Physical Line Graph Transformation
37
Main Result
  • If $P(k) \propto k^{-\gamma}$ in G, then $P(k) \propto k^{-\gamma+1}$ in
    L(G)
  • Assuming that there is no assortative mixing in G
  • (i.e., no node has a preference for high- or
    low-degree neighbors)
  • Intuitive Proof
  • Node v of degree k in G has k edges
  • These k edges correspond to k nodes in L(G)
  • Each of these k nodes in L(G) has degree around k
  • From v in G, we have k nodes of degree around k
    in L(G)
  • Thus, $P(k) \propto k \cdot k^{-\gamma} = k^{-\gamma+1}$ in L(G)

38
Scale-free topology in biological systems
  • Network biology: understanding the cell's
    functional organization, Barabási and Oltvai,
    Nature Reviews Genetics 5, 101-113, 2004
  • Growth process and Preferential attachment
  • The network emerges through the subsequent
    addition of new nodes (1,2 in red)
  • The nodes that appeared early in the history of
    the network are the most connected ones
  • e.g. CoA, NAD, GTP
  • Nodes prefer to connect to more connected nodes.
    (1 >> 2)
  • Growth and preferential attachment generate hubs
    through a rich-gets-richer mechanism
  • Error tolerance
  • Attack vulnerability
  • Gene duplication as the origin of preferential
    attachment
  • Duplicated genes produce identical proteins that
    interact with the same partner. Therefore, each
    protein that is in contact with a duplicated
    protein gains an extra link
  • Proteins with a large number of interactions tend
    to gain links more often, as it is more likely
    that they interact with the protein that has been
    duplicated.
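
The growth and preferential-attachment mechanism described above can be simulated in a few lines. This is a simplified sketch in the spirit of the Barabási-Albert model, not a reference implementation: the starting graph (a complete graph on m0 nodes), the defaults m0 = 3 and m = 2, and the function names are assumptions made for the example.

```python
import random


def preferential_attachment(n_nodes, m0=3, m=2, seed=0):
    """Grow a network: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph so every node has nonzero degree.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears in `pool` once per link end, so a uniform draw
    # from the pool is a degree-proportional draw over nodes.
    pool = [node for edge in edges for node in edge]
    for new in range(m0, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for old in chosen:
            edges.append((new, old))
            pool.extend((new, old))
    return edges


def degree_counts(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


deg = degree_counts(preferential_attachment(500))
hubs = sorted(deg, key=deg.get, reverse=True)[:5]
print([(node, deg[node]) for node in hubs])
```

Printing the five best-connected nodes shows whether early nodes (low indices) dominate, which is the rich-gets-richer effect the slide describes.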

39
XML representation
40
XML representation
  • Machine-readable
  • Easy for data exchange
  • SBML (for systems biology)
  • KGML (for KEGG)
  • BioPAX (for MetaCyc etc.)
  • XIN (for DIP)
  • Etc.
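
As an illustration of why XML is convenient for exchange, the sketch below reads reactions from a small pathway fragment with Python's standard library. The fragment is only loosely modeled on KGML; its element and attribute names are assumptions for this example and should not be taken as the official schema of any of the formats listed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pathway fragment (not an official schema).
PATHWAY_XML = """\
<pathway name="path:example" title="Toy pathway">
  <reaction name="R1" type="irreversible">
    <substrate name="cpd:A"/>
    <product name="cpd:B"/>
  </reaction>
  <reaction name="R2" type="reversible">
    <substrate name="cpd:B"/>
    <product name="cpd:C"/>
  </reaction>
</pathway>
"""


def read_reactions(xml_text):
    """Return (name, substrates, products, type) for each <reaction> element."""
    root = ET.fromstring(xml_text)
    result = []
    for rxn in root.findall("reaction"):
        substrates = [s.get("name") for s in rxn.findall("substrate")]
        products = [p.get("name") for p in rxn.findall("product")]
        result.append((rxn.get("name"), substrates, products, rxn.get("type")))
    return result


for record in read_reactions(PATHWAY_XML):
    print(record)
```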

41
XIN
42
KGML
43
BioPAX
44
SBML
45
Metabolic pathway analysis using comparative
genomics approach
46
Context-based analysis
  • A metabolic reconstruction is an attempt to
    develop a detailed overview of an organism's
    metabolism from an analysis of genomic sequence.
  • Metabolic reconstructions can reveal new aspects
    of metabolism in well-studied organisms
  • It supports inference of pathways on the basis of
    the presence or absence of relevant genes.
    (missing gene, metabolic hole etc.)
  • Combining inferred pathways into hierarchical
    blocks produces metabolic charts specific for a
    particular organism and connected to individual
    genes

47
Goal: Find the missing links
  • Long-term goal
  • Simulation of the whole cell: the virtual cell.
  • Mid-term goal
  • Predict cell reaction to change in environment.
  • Predict cell reaction to gene
    knockout/modification
  • Current stage
  • Describe and calculate network behavior
  • Assign functions to all proteins
  • Identify all regulatory events
  • Fill the gaps!

48
  • Missing genes in metabolic pathways: a
    comparative genomics approach, Osterman and
    Overbeek, Current Opinion in Chemical Biology,
    Vol.7, 2003

49
Gene cluster
  • Genes from the same pathway tend to cluster on
    prokaryotic chromosomes.
  • functional coupling
  • Clustering of orthologous FASII-related genes
    (with corresponding colors to (b)) provided key
    evidence for the identification of two novel
    enzymes (missing genes) involved with SFA and UFA
    II pathways: fabK (11b) and fabM
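
A simple way to look for such clustering is to ask in how many genomes two genes lie close together on the chromosome. The sketch below assumes each genome is given as an ordered list of gene (ortholog group) labels; the genomes, the labels, and the `max_gap` threshold are hypothetical.

```python
def clustered_genomes(genomes, gene_a, gene_b, max_gap=2):
    """List genomes in which gene_a and gene_b lie within max_gap positions."""
    hits = []
    for name, order in genomes.items():
        pos_a = [i for i, g in enumerate(order) if g == gene_a]
        pos_b = [i for i, g in enumerate(order) if g == gene_b]
        if any(abs(i - j) <= max_gap for i in pos_a for j in pos_b):
            hits.append(name)
    return hits


# Hypothetical gene orders along three chromosomes.
genomes = {
    "genome1": ["x", "geneA", "geneB", "y"],
    "genome2": ["geneA", "p", "geneB", "q"],
    "genome3": ["geneA", "p", "q", "r", "geneB"],
}

print(clustered_genomes(genomes, "geneA", "geneB"))
```

The more genomes in which the pair stays clustered, the stronger the hint of functional coupling; a real analysis would also account for the phylogenetic distance between the genomes.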

50
Gene fusion
  • This technique involves searches for a pair of
    genes from one genome that appear to be fused
    into a single gene within another genome,
    providing further evidence of potential
    functional coupling.
  • Since its introduction, the protein fusion
    approach has been implemented and successfully
    applied for genome-wide hypothetical protein
    analysis.
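
The fusion search can be sketched as follows, assuming each gene has already been annotated with the set of protein families it contains (in practice this would come from homology searches). The genomes and family labels are hypothetical.

```python
from itertools import combinations


def fusion_links(genomes):
    """Find family pairs that are fused in one genome and separate in another.

    genomes: name -> list of genes, each gene a set of family labels.
    """
    links = set()
    for fused_name, fused_genes in genomes.items():
        for gene in fused_genes:
            # A gene carrying two or more families is a candidate fusion.
            for a, b in combinations(sorted(gene), 2):
                for other_name, other_genes in genomes.items():
                    if other_name == fused_name:
                        continue
                    has_a = any(g == {a} for g in other_genes)
                    has_b = any(g == {b} for g in other_genes)
                    if has_a and has_b:
                        links.add((a, b))
    return links


# Hypothetical annotation: famA and famB are separate genes in genome1
# but a single fused gene in genome2.
genomes = {
    "genome1": [{"famA"}, {"famB"}, {"famC"}],
    "genome2": [{"famA", "famB"}, {"famC"}],
}

print(fusion_links(genomes))
```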

51
Phylogenetic profile
  • The underlying assumption is that two proteins
    from the same cellular pathway are expected to
    either both occur or both not occur in any
    specific organism
  • Co-occurrence score: the total number of
    genomes in the in-group containing a homolog of
    a given protein minus the number of genomes in
    the out-group containing such a homolog,
    normalized by the highest possible score (28).
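
Both ideas on this slide reduce to simple counting. The sketch below is an illustration under stated assumptions: genomes are identified by hypothetical names, and the score is normalized by the size of the in-group, which is taken here to be the highest possible score.

```python
def cooccurrence_score(present_in, in_group, out_group):
    """(in-group genomes with a homolog) minus (out-group genomes with a
    homolog), divided by the in-group size (assumed maximum score)."""
    raw = len(present_in & in_group) - len(present_in & out_group)
    return raw / len(in_group)


def profile_agreement(profile_a, profile_b):
    """Fraction of genomes in which two proteins are both present or both absent."""
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


in_group = {"g1", "g2", "g3", "g4"}
out_group = {"g5", "g6"}
present_in = {"g1", "g2", "g3", "g5"}
print(cooccurrence_score(present_in, in_group, out_group))

# Presence (1) or absence (0) of two proteins across five genomes.
print(profile_agreement([1, 1, 0, 0, 1], [1, 1, 0, 1, 1]))
```

A protein found in every in-group genome and in no out-group genome gets the maximum score of 1.0 under this normalization.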

52
Metabolic module finding
Genome evolution reveals biochemical networks and functional modules
Von Mering et al., PNAS vol.100(26), 2003
  • Metabolic pathway DB describes metabolites and
    enzymes.
  • Comparative genomics can reveal selective
    pressures shared by groups of enzymes →
    functional modularity

53
  • Genome evolution reveals biochemical networks
    and functional modules
  • Von Mering et al., PNAS vol.100(26), 2003

54
Pathway evolution by interaction with environment
55
Modification of metabolism profile
  • Genomic islands in pathogenic and environmental
    microorganisms, Dobrindt et al., Nature Reviews
    Microbiology 2, 414-424 (May 2004)

56
Differential metabolic gene profile within
Mycoplasma species
  • Differential metabolism of Mycoplasma species as
    revealed by their genomes, Arraes et al.,
    Genetics and Molecular Biology (301)
  • Alternative/salvage pathway

57
Reconstruction of symbiont-host interactions
  • Symbiosis insights through metagenomic analysis
    of a microbial consortium
  • Woyke et al., Nature 443, 950-955

58
Practical application
  • Drug discovery
  • Metabolic engineering

59
Intracellular parasites
http://en.wikipedia.org/wiki/Cryptosporidium_parvum
  • Cryptosporidium parvum  is an important AIDS
    pathogen and a potential agent for bioterrorism. 
  • Determine if IMPDH and other enzymes of the
    purine nucleotide salvage pathways are valid
    targets for chemotherapy.  
  • Preliminary experiments suggest that C. parvum
    IMPDH is significantly different from the human
    isozymes
  • Currently trying to develop selective inhibitors
    of the parasite enzyme
  • http://www.america.gov/st/washfile-english/2004/October/20041029121451lcnirelleP0.6819879.html

60
Thanks!