Characterization of Prokaryotic Genomic Structure and Application to Biological Pathway Prediction - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Characterization of Prokaryotic Genomic Structure
and Application to Biological Pathway Prediction
  • Ying Xu
  • Biochemistry and Molecular Biology Department,
    and
  • Institute of Bioinformatics
  • University of Georgia
  • http://csbl.bmb.uga.edu

2
Deciphering Microbial Genomes
  • Decipher microbial genomes through understanding:
  • individual basic units, e.g., genes and
    cis-regulatory elements,
  • organizational structures of the basic units
  • linking genomic structural information to
    molecular and cellular machinery

gcgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtg
tgggtagtagctgatatgatgcgaggtaggggataggatagcaacagatg
agcggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtag
acttcgcgcataaagctgcgcgagatgattgcaaagragttagatga.
3
What We Know
  • 300 microbes have their complete genomes
    sequenced
  • Most genes in each genome have been
    computationally predicted (quite accurately)
  • Genes are grouped into operons (transcriptional
    units)

4
What We Know

5
What We Know
  • While some of the concepts are well established,
    little is known about how to identify them
    accurately
  • Many other genomic elements and structures are
    yet to be identified
  • RNA genes
  • pseudogenes
  • transposable elements
  • horizontally transferred genes
  • genomic islands
  • genome rearrangements
  • ...
  • regulatory binding motifs of all sorts
  • other regulatory elements encoded in the genome
  • ...

6
Deciphering Microbial Genomes
  • Even if we have all the genomic elements and
    structural information, we still need to figure
    out
  • which genes encode what biological function
  • how the genomic structures encode parts of an
    organism
  • how the parts work together to accomplish complex
    functions, e.g., biological clock

7
Goals of the Project
  • deciphering genomic structures of prokaryotic
    organisms
  • investigating genomic structures beyond individual
    genes through comparative genome analyses
  • ultimately, understanding why prokaryotic genomes
    are organized the way they are
  • elucidating biological pathways and networks in
    prokaryotic organisms through application of
  • information gained about genomic structures
  • other experimental information, and
  • computational modeling

8
  • PART I: Deciphering genomic structure

gcgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtg
tgggtagtagctgatatgatgcgaggtaggggataggatagcaacagatg
agcggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtag
acttcgcgcataaagctgcgcgagatgattgcaaagragttagatgagct
gatgctagaggtcagtgactgatgatcgatgcatgcatggatgatgcagc
tgatcgatgtagatgcaataagtcgatgatcgatgatgatgctagatgat
agctagatgtgatcgatggtaggtaggatggtaggtaaattgatagatgc
tagatcgtaggtagtagct
9
Orthologous Gene Mapping-- the basic tool
  • Finding equivalent genes across microbial
    genomes
  • most fundamental operation in comparative genome
    analysis
  • We have developed a novel method for orthologous
    gene mapping using
  • both sequence similarity information and genomic
    structure information

genome X
genome Y
Mao et al, PNAS, 2006; Wu et al, 2006
(submitted); Mao et al, 2006 (submitted)
10
Orthologous Gene Mapping
  • Observation: a pair of homologous genes across
    two genomes is substantially more likely to be
    orthologous than non-orthologous if another
    pair of homologous genes lies in their
    neighborhood
  • We have developed a scoring scheme for measuring
    the likelihood that two genes are orthologous,
    based on
  • the above observation, and
  • sequence similarity information


Orthologous?
Wu et al, 2006 (submitted)
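
The scoring idea on this slide can be illustrated with a minimal sketch. The slides do not give the actual scoring scheme, so everything below is an assumption made for illustration: the function names, the window size, the saturating support term and the 0.5 weight are all hypothetical. The sketch counts homologous gene pairs near a candidate pair and blends that count with a normalized sequence similarity.

```python
def neighborhood_support(order_x, order_y, gene_x, gene_y, homologs, window=1):
    """Count homologous gene pairs located within `window` genes of the candidate pair."""
    i, j = order_x.index(gene_x), order_y.index(gene_y)
    near_x = [g for k, g in enumerate(order_x) if 0 < abs(k - i) <= window]
    near_y = [g for k, g in enumerate(order_y) if 0 < abs(k - j) <= window]
    return sum((gx, gy) in homologs for gx in near_x for gy in near_y)


def orthology_score(similarity, support, weight=0.5):
    """Blend sequence similarity (0..1) with a saturating neighborhood term (assumed form)."""
    return (1.0 - weight) * similarity + weight * support / (1.0 + support)


# Toy gene orders (hypothetical): xB has two homologs in genome Y, yB and yP.
order_x = ["xA", "xB", "xC"]
order_y = ["yA", "yB", "yQ", "yR", "yP", "yS"]
homologs = {("xA", "yA"), ("xB", "yB"), ("xB", "yP")}

for candidate in ("yB", "yP"):
    support = neighborhood_support(order_x, order_y, "xB", candidate, homologs)
    print(candidate, support, orthology_score(0.8, support))
```

With equal sequence similarity, the candidate whose neighbors are also homologous (yB) scores higher than the isolated one (yP), which is the behavior the observation calls for.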
11
Orthologous Gene Mapping
  • For any group of homologous genes, construct a
    map representing possible orthology relationships
    among the homologous genes
  • Interestingly, the map has a hierarchical
    structure!
  • We developed a database of hierarchically
    clustered equivalent gene clusters (HCG) at
    different resolution levels

Wu et al, 2006 (submitted); Mao et al, 2006
(submitted)
12
Deciphering Genomic Structures
  • By examining orthologous gene mappings across
    genomes, we can derive an enormous amount of
    genomic structure information
  • Operon: genes arranged in tandem in a genome as a
    basic unit of transcriptional regulation; genes
    of an operon work together
  • Regulon: a set of operons regulated by the same
    (transcription) regulatory machinery; genes of a
    regulon work together under certain conditions

13
Prediction of Operons
  • Known features
  • genes of an operon share a common promoter and
    terminator
  • genes of the same operon are functionally related
  • operonic structures are conserved across closely
    related genomes
  • intergenic distances within an operon are
    generally shorter than distances between operons
  • ...
  • Mathematically, the problem can be formulated as
    partitioning a sequence of genes into groups
    that are most consistent with
  • conserved gene neighborhood relationships across
    related genomes
  • functional prediction of genes
  • promoter and terminator predictions
  • known intergenic/operonic distributions
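
As a rough illustration of the partitioning formulation, here is a minimal baseline that uses only two of the listed features: strand and intergenic distance. It is not JPOP or any of the authors' programs. The function name, the tuple layout and the 50 bp `max_gap` threshold are assumptions chosen for the example.

```python
def partition_into_operons(genes, max_gap=50):
    """Split genes into candidate operons.

    genes: list of (name, strand, start, end) tuples sorted by start position.
    Adjacent genes stay in one group if they share a strand and the gap
    between them is at most `max_gap` base pairs (an assumed threshold).
    """
    operons = []
    for gene in genes:
        name, strand, start, end = gene
        if operons:
            _, prev_strand, _, prev_end = operons[-1][-1]
            if strand == prev_strand and start - prev_end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g[0] for g in operon] for operon in operons]


# Hypothetical coordinates for four genes.
genes = [
    ("g1", "+", 100, 400),
    ("g2", "+", 420, 900),    # 20 bp after g1, same strand: same operon
    ("g3", "+", 1200, 1500),  # 300 bp after g2: new operon
    ("g4", "-", 1510, 1900),  # opposite strand: new operon
]
print(partition_into_operons(genes))
```

A fuller predictor would score each candidate partition against all the evidence types listed above rather than applying a single distance cutoff.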

14
Prediction of Operons
  • We have developed a number of computer programs,
    including JPOP, for operon prediction
  • Prediction accuracy is 80% when applied to new
    genomes
  • Prediction accuracy could be improved when
    time-course microarray data is available and used

Chen et al, NAR, 2004; Tran et al, NAR, 2006 (to
appear); Dam et al, 2006 (submitted)
15
Prediction of Uber-operons
  • Studying conservation among groups of operons
    has uncovered lost associations among operons
    that used to work together
  • An uber-operon is a group of functionally related
    operons whose union is conserved across multiple
    genomes
  • We have developed an algorithm for predicting
    uber-operons in a genome, which are useful for
  • prediction of component genes of biological
    pathways
  • regulon prediction

genome X: [g1, g2, g3, g4] [g5, g6, g7, g8, g9, g10]
genome Y: [g1, g2, g3, g4, g5] [g6] [g7, g8, g9, g10]
Che et al, NAR, 2006
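
The two-genome example on this slide can be reproduced with a simplified sketch. It is not the published algorithm, which the slides do not describe. The assumption here is that genes are labeled by shared ortholog identifiers and that operons are merged whenever they share a gene, using a union-find structure. The function name `uber_operons` is hypothetical.

```python
def uber_operons(operons_x, operons_y):
    """Group genes into candidate uber-operons by merging operons that share genes."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for operon in operons_x + operons_y:
        root = find(operon[0])  # also registers single-gene operons
        for gene in operon[1:]:
            parent[find(gene)] = root
            root = find(root)

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())


# Operon layouts from the slide, plus one unrelated operon pair (h1, h2) added for contrast.
genome_x = [["g1", "g2", "g3", "g4"], ["g5", "g6", "g7", "g8", "g9", "g10"], ["h1", "h2"]]
genome_y = [["g1", "g2", "g3", "g4", "g5"], ["g6"], ["g7", "g8", "g9", "g10"], ["h1"], ["h2"]]
for group in uber_operons(genome_x, genome_y):
    print(group)
```

In the slide's example the operon boundaries differ between genomes X and Y, but the union g1 to g10 is the same in both, so the ten genes fall into one group.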
16
Prediction of Regulons
  • A more challenging (and more information-rich)
    problem is to predict regulons
  • Key characteristics of regulons: a group of
    operons sharing similar gene expression patterns
    and having common cis (transcription factor)
    binding sites
  • Challenging issues
  • TF binding sites are difficult to predict
  • existing predictions of operons and binding sites
    are both noisy

17
Prediction of Regulons
  • Our strategy: clustering of operons based on
  • sharing common regulatory binding sites
  • functional relatedness of involved genes
  • prediction of co-regulated genes based on
    microarray data
  • information derived from uber-operons
  • Clustering of operons allows us to weed out some
    of the erroneous predictions by individual
    (noisy) predictors

Su et al, NAR, 2005; Che et al, 2006 (in
preparation)
18
Prediction of Regulatory Binding Sites
  • Mathematically, the problem can be formulated as
    follows: given a set of N promoter sequences and
    the genome, find a k-mer from each promoter
    region so that the N aligned k-mers have high
    information content and the statistical
    significance of observing N aligned k-mers with
    such a level of information content is high
  • Popular methods mainly rely on sampling
    techniques (e.g., Gibbs sampling) to search for
    such a set of k-mers.

TGTGAAAGACTGTTTTTTTGATCGTTTTGACAAAAATGGAAGTCCACA
AAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATCCCATAG
TGATGTACTGCATGTATGCAAAGGACGTCAGATTACCGTGCAGTACAG
TAAACGATTCCACTAATTTATTCCATGTCACTCTTTTCGCATCTTTGT
ACATTACCGCCAATTCTGTAACAGAGATCACACAAAGCGACGGTGGGG
ACTTTTTTTTCATATGCCTGACGGAGTTGACACTTGTAAGTTTTCAAC
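
To make "information content of aligned k-mers" concrete, the sketch below computes the standard relative-entropy measure, summing f * log2(f / p) over bases in each column and then over columns. The uniform background of 0.25 per base and the absence of pseudocounts are simplifying assumptions. The slide does not say which variant the authors use.

```python
import math
from collections import Counter


def information_content(kmers, background=None):
    """Information content (bits) of equal-length aligned k-mers.

    For each column, sum f_b * log2(f_b / p_b) over observed bases b, where
    f_b is the column frequency and p_b the background frequency.
    """
    background = background or {base: 0.25 for base in "ACGT"}
    n = len(kmers)
    total = 0.0
    for column in zip(*kmers):
        for base, count in Counter(column).items():
            freq = count / n
            total += freq * math.log2(freq / background[base])
    return total


# Four identical 5-mers: every column is fully conserved (2 bits each).
print(information_content(["TGTGA", "TGTGA", "TGTGA", "TGTGA"]))
# One column with all four bases equally frequent carries no information.
print(information_content(["A", "C", "G", "T"]))
```

A motif finder in this formulation searches over one k-mer per promoter to maximize this quantity, then checks how unlikely such a value would be by chance.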
19
Prediction of Regulatory Binding Sites
  • Our approach
  • find conserved k-mers through data clustering
  • validation through biophysical approach
  • Binding site identification through data
    clustering

TGGTGTGAAAGACTGTTTTTTTGATAACTGTCTGCATGGTCATATTTTT
AAATTGTGATGTGTATCGAAGTGTGTTAATGTGAGTTAGCTCACTCAT
TAGAATTCTGAGCGGATAACAATTTCACTTCTGTGAACTAAACCGAGG
TCATGAATTCTGTCACAGTGCAAATTCAGAGATTGTGATTCGATTCAC
ATTTAAATGTTGTGCTGTGGTTAACCCAATTACGGTGTCAAATACCGC
ACAGATGCGACCTGTGACGGAAGATCACTTCGCAATTTGTCAGTGGTC
GCACATATCCT
Olman et al, PSB, 2003; Olman et al, JBCB, 2003
20
Tie to Structural Information
  • We have developed a protein-DNA docking program
    for assessing the binding affinity between a
    protein and a DNA motif
  • The core of the program is a statistics-based
    energy function measuring 2-body, 3-body and
    4-body interactions between amino acids and
    nucleotides
  • On a test set with 18 TF structures and 2750
    predicted binding motifs, our program ranks all
    18 correct binding motifs among the top 25
    binding predictions

Liu et al, NAR, 2005
21
Prediction of Functional Modules
  • A pair of genes is considered functionally
    linked if both belong to the same (known)
    pathway, regulon, complex, ...
  • We found that such a functional linkage
    relationship could be predicted using
  • co-occurrence relationships
  • co-evolutionary relationships
  • functional relatedness defined in terms of GO
    classification
  • Using such predictions, we have predicted
    functional linkage maps for all sequenced
    microbial genomes

22
Prediction of Functional Modules
  • Identification of sub-networks that might be
    functional
  • Sub-networks that are densely intra-connected:
    groups of genes that are functionally linked
    with each other, which might indicate that
    these genes work together
  • Sub-networks that are conserved across multiple
    maps: groups of genes whose functional linkage
    relationships are conserved across multiple
    organisms, indicating evolutionary pressure for
    the conservation
  • These two types of relatedness are
    complementary to each other

Wu et al, NAR, 2005; Wu et al, GIW, 2005
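
The two criteria above, dense intra-connection and conservation across maps, can be sketched on toy linkage maps. The representation is an assumption: each map is a set of unordered gene pairs, and gene names stand for ortholog groups so that maps from different organisms are comparable. The function names are hypothetical and this is not the authors' module-finding method.

```python
from itertools import combinations


def link(a, b):
    """An unordered functional link between two genes."""
    return frozenset((a, b))


def density(genes, links):
    """Fraction of all gene pairs within `genes` that are functionally linked."""
    pairs = [link(a, b) for a, b in combinations(sorted(genes), 2)]
    if not pairs:
        return 0.0
    return sum(pair in links for pair in pairs) / len(pairs)


def conserved_links(maps):
    """Links present in every functional linkage map."""
    return set.intersection(*maps)


# Two hypothetical linkage maps over shared ortholog-group labels.
map_1 = {link("g1", "g2"), link("g1", "g3"), link("g2", "g3"), link("g3", "g4")}
map_2 = {link("g1", "g2"), link("g1", "g3"), link("g2", "g3"), link("g4", "g5")}

print(density({"g1", "g2", "g3"}, map_1))        # fully connected triangle
print(density({"g1", "g2", "g3", "g4"}, map_1))  # 4 of 6 possible links
print(sorted(sorted(pair) for pair in conserved_links([map_1, map_2])))
```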
23
Prediction of Functional Modules
Red: Pathway; Blue: Regulon; Green:
Transcription Unit; Purple: Similar GO
assignments
24
Other Related Work
  • Identification and characterization of insertion
    sequences (and other transposable elements) at
    genome scale
  • Identification and characterization of
    protein-binding motifs at genome scale
  • Functional classification of genes at multiple
    resolutions: a framework beyond the concepts of
    homology/orthology
  • Evolutionary studies of operons

25
Working Towards ...
  • Deriving the genomic units and structures, at
    different levels, of microbial genomes
  • making progress ...
  • Understanding the organizational rules of the
    basic units
  • through extensive comparative genome analyses

26
  • PART II: Pathway and network prediction

27
Biological Networks
  • Biological network: a group of bio-molecules
    (protein, DNA, RNA) wired together to
    accomplish a (complex) biological function
  • including regulatory, signaling and metabolic
    components
  • pathways: un-branched networks
  • Example: the process of nitrogen assimilation

Senses which forms of nitrogen are available
-> activates the transport process to take up a
particular form of nitrogen into the cell ->
reduces this form of nitrogen to a form the cell
can utilize directly (nitrate -> nitrite ->
ammonia -> glutamine -> glutamate) -> may
trigger a number of biological processes
28
Predicting Biological Networks
[Figure: expression data labeled A1 A2 ... An,
B1 B2 ... Bn, ..., Y1 Y2 ... Yn, Z1 Z2 ... Zn over
time points t = 1, t = 2, ..., t = N]
What is the common regulation mechanism?
transcription regulation network
29
Predicting Biological Networks
  • Linear dynamic model for a regulatory network
  • A: transition matrix
  • b: constitutive expression level
  • noise at time t
  • expression level of all genes at time t
  • Estimating matrix A as an optimization problem
  • AI (AI)
  • bA Ab

Building models consistent with gene expression
data
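
The model equation did not survive in this transcript. A common form that matches the listed terms is x(t+1) = A x(t) + b + e(t), where the symbols x(t) for the expression vector and e(t) for the noise are assumptions. Under that assumed form, estimating A can be posed as minimizing the squared one-step prediction error over the observed time series. The sketch below evaluates that objective and does not implement the authors' optimization procedure.

```python
def step(A, b, x):
    """One noise-free update of the assumed model: x(t+1) = A x(t) + b."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def squared_error(A, b, series):
    """Sum of squared one-step prediction errors over a time series of expression vectors."""
    error = 0.0
    for x_now, x_next in zip(series, series[1:]):
        predicted = step(A, b, x_now)
        error += sum((p - o) ** 2 for p, o in zip(predicted, x_next))
    return error


# Simulate a hypothetical two-gene network, then compare candidate matrices.
A_true = [[0.5, 0.0], [0.25, 0.5]]
b_true = [1.0, 0.0]
series = [[1.0, 2.0]]
for _ in range(4):
    series.append(step(A_true, b_true, series[-1]))

A_wrong = [[0.5, 0.0], [0.0, 0.5]]  # drops the regulation of gene 2 by gene 1
print(squared_error(A_true, b_true, series))
print(squared_error(A_wrong, b_true, series))
```

With few time points and many genes, many different matrices A reach a similarly low error, which is why additional constraints are needed.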
30
Challenging Problems
  • There are numerous other mathematical frameworks
    for modeling biological networks
  • Experimental data is significantly limited
    compared to the complexity of the networks to be
    elucidated,
  • making the network prediction problem a
    significantly under-constrained problem
  • leading to possibly infinitely many network
    solutions, each of which explains the data
    equally well

31
Network Inference in Microbes-- our general
strategy
  • Framework: prediction of network topologies
    that are most consistent with high-throughput
    data and prior knowledge
  • Constraints: derivation of as much information
    as possible about (a) component genes and (b)
    their interactions, and using it as
    prediction constraints
  • Sampling: sample the feasible network topology
    space to derive a distribution over network
    topologies

Su et al, GIW, 2003; Ji and Xu, Bioinformatics,
2006
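
The sampling step can be illustrated with a deliberately simple sketch: edges known from prior knowledge are always included, excluded edges are never included, and the remaining candidate edges are sampled uniformly. All names and the 0.5 inclusion probability are assumptions. A realistic sampler would also weight each topology by its consistency with the high-throughput data, which is omitted here.

```python
import random
from collections import Counter


def sample_topologies(candidates, required, forbidden, n_samples=1000, seed=0):
    """Estimate how often each candidate edge appears across sampled feasible topologies."""
    rng = random.Random(seed)
    free = [e for e in candidates if e not in required and e not in forbidden]
    counts = Counter()
    for _ in range(n_samples):
        # A feasible topology contains all required edges and no forbidden ones.
        topology = set(required) | {e for e in free if rng.random() < 0.5}
        counts.update(topology)
    return {edge: counts[edge] / n_samples for edge in candidates}


# Hypothetical regulator -> target edges.
candidates = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g1"), ("tf2", "g2")]
required = {("tf1", "g1")}    # e.g. supported by the literature
forbidden = {("tf2", "g2")}   # e.g. ruled out by other evidence
for edge, freq in sample_topologies(candidates, required, forbidden).items():
    print(edge, freq)
```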
32
Information Extractable from Literature to set
the framework
  • Literature and database search
  • to infer initial conceptual models for a target
    pathway
  • to collect information about which genes are
    involved in the target pathway and their
    interaction relationships

Pathways to utilize phosphonates
2-AEP pathway (in Gram-positive microbes):

2-aminoethylphosphonate (NH2CH2CH2PO3H)
  -> [transaminase] ->
phosphonoacetaldehyde (COHCH2PO3H2)
  -> [phosphonatase] ->
acetaldehyde (CHOCH3) + Pi

Automated literature mining capabilities are
desperately needed!!!
33
Derivation of Constraints
  • Information derivable through comparative genome
    analyses and analysis of other experimental data
  • Component genes (parts list) in a target network
  • Functional roles of component genes
  • Possible interaction relationships among
    component genes
  • Higher level functional modules conserved
    across organisms

using a systematic approach!
34
Deriving Parts List
  • Through analysis of microarray gene expression
    data, one could identify an initial list of
    genes possibly involved in a particular
    biological process
  • identification of differentially expressed genes,
    co-expressed genes

g1, g2, ..., gk
The observed gene expression data are the results
of complex interactions of possibly many pathways
in a cell, which might work cooperatively,
competitively or independently with each other
Microarray data might need to be interpreted in
the context of a network model
Xu et al, NAR, 2003
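
As a minimal illustration of deriving an initial parts list from expression data, the sketch below selects genes whose expression changes by at least a chosen log2 fold change between two conditions. The function name, the data layout and the threshold of 1.0 (a twofold change) are assumptions. Real analyses would use replicates and a statistical test rather than a bare cutoff.

```python
import math


def differentially_expressed(control, treated, min_log2_fc=1.0):
    """Return (gene, log2 fold change) for genes changing by at least `min_log2_fc`."""
    hits = []
    for gene in control:
        log2_fc = math.log2(treated[gene] / control[gene])
        if abs(log2_fc) >= min_log2_fc:
            hits.append((gene, log2_fc))
    return hits


# Hypothetical expression intensities under two conditions.
control = {"g1": 100.0, "g2": 100.0, "g3": 100.0}
treated = {"g1": 400.0, "g2": 110.0, "g3": 25.0}
print(differentially_expressed(control, treated))
```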
35
Deriving Parts List
  • Refining parts-list through prediction and
    application of genomic structures (guilt by
    association)
  • Operons
  • Uber-operons
  • Regulons, and
  • Functional modules
  • ...

36
Prediction of Interactions
  • Two types of interactions we intend to capture
  • physical interactions
  • functional links
  • There are a number of databases of experimentally
    verified protein-protein interactions
  • DIP, BIND

Homology search against these data sets is the
key technique
Su et al, GIW, 2004
37
Network Mapping across Genomes
  • Related genomes may employ similar networks for a
    particular biological process
  • Through mapping a homologous network across
    genomes, one could possibly derive a network in
    the target genome

?
38
Network Mapping across Genomes
  • Our approach: map orthologous genes of a
    pathway to a target genome in a way that best
    preserves regulon structures, i.e., co-regulated
    operons
  • The basic idea: find homologous gene pairs with
    the highest sequence similarity under the
    condition that mapped genes are grouped into
    co-regulated operons

homologous genes
Using both homology and genomic structure
information in mapping networks!
39
Network Mapping across Genomes
  • The problem was formulated and solved as a
    Steiner network problem (called constrained
    minimum spanning tree problem)
  • A recent solution solves the problem as an
    integer programming problem

We have implemented the algorithm as a program, P-MAP
Mao et al, PNAS, 2006; Olman et al, CSB, 2004
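
The trade-off between sequence similarity and operon grouping can be shown with a toy exhaustive search. It is not the Steiner network or integer programming formulation used in P-MAP. The objective (total similarity minus a penalty per distinct operon used), the penalty value and all names are assumptions for illustration, and the search is only practical for very small pathways.

```python
from itertools import product


def map_pathway(candidates, operon_of, operon_penalty=0.2):
    """Pick one target gene per pathway gene, favoring high similarity and few operons.

    candidates: {pathway_gene: [(target_gene, similarity), ...]}
    operon_of:  {target_gene: operon_id}
    """
    genes = sorted(candidates)
    best_score, best_map = float("-inf"), None
    for choice in product(*(candidates[g] for g in genes)):
        targets = [target for target, _ in choice]
        if len(set(targets)) < len(targets):
            continue  # one target gene cannot serve two pathway genes
        similarity = sum(sim for _, sim in choice)
        n_operons = len({operon_of[t] for t in targets})
        score = similarity - operon_penalty * n_operons
        if score > best_score:
            best_score, best_map = score, dict(zip(genes, targets))
    return best_map, best_score


# Hypothetical homologs: t5 is slightly more similar to A than t1,
# but t1 shares an operon with B's only homolog, t2.
candidates = {"A": [("t1", 0.9), ("t5", 0.95)], "B": [("t2", 0.8)]}
operon_of = {"t1": "op1", "t2": "op1", "t5": "op3"}
print(map_pathway(candidates, operon_of))
```

The search prefers t1 for gene A even though t5 has the higher similarity, because t1 keeps the mapped genes in a single operon.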
40
Mapping KEGG Pathways
  • (Generic) KEGG pathways consist of enzymes and
    their interactions
  • Mapping a KEGG pathway is essentially to find
    genes that encode the enzymes in the pathway

41
Nitrogen Assimilation and Photosynthesis
  • Known facts
  • the core part of nitrogen assimilation is
    regulated by the TF ntcA, forming the ntcA
    regulon
  • A number of genes are known to be in the ntcA
    regulons in some of the 16 sequenced
    cyanobacterial genomes
  • known ntcA-regulated operons in cyanobacteria
    also have a σ70-like binding motif in their
    promoters
  • We predicted the binding motif of ntcA along with
    the σ70-like motif
  • Key idea: predicting clustered motifs

Su et al, NAR, 2005
42
Nitrogen Assimilation and Photosynthesis
  • Using the profiles of the two binding motifs, we
    searched the 16 genomes for additional ntcA
    regulated genes and identified a number of
    additional operons
  • An interesting observation is that we
    consistently found genes known to be involved in
    photosynthesis across the 11 genomes, with ntcA
    binding motifs
  • It was previously known that the nitrogen
    assimilation process is somehow coordinated with
    the photosynthesis process, but the molecular
    level mechanism is not clear
  • For the first time, we predicted a rough model
    for the coordination process between these two
    important biological processes, based on the
    detailed functions and interactions of the
    involved genes.

Su et al, NAR, 2006
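
Searching a genome with a motif profile can be sketched as a position weight matrix scan. The example sites below are made up for illustration and are not real ntcA binding sites. The pseudocount, the uniform background and the function names are likewise assumptions, and the authors' actual search also required a second, clustered motif, which is not modeled here.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length aligned sites (uniform background)."""
    pwm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, position) of the best-scoring window in `sequence`."""
    k = len(pwm)
    windows = range(len(sequence) - k + 1)
    return max((sum(pwm[i][sequence[p + i]] for i in range(k)), p) for p in windows)


# Hypothetical aligned sites and a short promoter-like sequence.
sites = ["GTAAC", "GTAAC", "GTATC"]
pwm = build_pwm(sites)
score, position = best_hit(pwm, "CCCGTAACCC")
print(position, round(score, 3))
```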
43
Nitrogen Assimilation and Photosynthesis
[Figure: predicted model of the coordination
between nitrogen assimilation and photosynthesis.
Labels recovered from the figure: Nutrients, Light,
CO2, Som, Periplasmic membrane, Plasma membrane,
Photosystem, Calvin cycle, ATP, NADPH, RbcL, RbcS,
Icd, NrtP, Other pathways, NO3-, NO2-, NH4, 2-OG,
PII, NarB, NirA, PetH, GOGAT, GltS, GS, Glu, Gln,
Rpod, NtcA, Urt, Urease, Urea, Cyn, Cyanase,
Cyanate, Amt, DNA, Hypothetical proteins, SYNW2289,
SYNW0273, SYNY2460, 2468, 2469, 2474.
Legend entries (shape codes and color codes): NtcA
regulon, Non-ntcA regulon, gene, protein,
transporter, transcription factor,
transformation/translocation, regulation.]
44
Summary
  • A substantial amount of information about genomic
    structures and organizational rules is derivable
    through comparative genomics
  • This information makes computational derivation
    of biological pathways and networks of microbes
    possible
  • Network prediction is a systems problem, and it
    requires a systems approach
  • Combined application of the multiple types of
    information provides a powerful approach to
    network elucidation

45
Acknowledgment
  • People of the project at UGA
  • Zhengchang Su
  • Fenglou Mao
  • Hongwei Wu
  • PhuongAn Dam
  • Victor Olman
  • Guojun Li
  • Zhijie Liu
  • Fengfeng Zhou
  • Dongsheng Che
  • Collaborators
  • Tao Jiang, UCR
  • Xin Chen, UCR
  • Brian Palenik, UCSD
  • Dong Xu, Univ of Missouri
  • Arthur Grossman, Carnegie Inst
  • Devaki Bhaya, Carnegie Inst
  • Funding support
  • NSF/BDI2 NSF/ITR
  • DOE GTL project

http://csbl.bmb.uga.edu