Comparative genomics and evolution of regulatory interactions in bacteria


Title: Comparative genomics and evolution of regulatory interactions in bacteria


1
Comparative genomics and evolution of regulatory
interactions in bacteria
  • Mikhail Gelfand
  • Research and Training Center of Bioinformatics
  • Institute for Information Transmission Problems
  • Russian Academy of Sciences
  • September 2006

2
  • A list of some observations. In a corner, it's
    warm.
  • A glance leaves an imprint on anything it's dwelt
    on.
  • Water is glass's most public form.
  • Man is more frightening than its skeleton.
  • Joseph Brodsky

3
Basic assumptions and techniques
  • Phylogenetic footprinting (Ross Hardison,
    eukaryotes, 1988): regulatory (transcription
    factor-binding) sites are more conserved than
    surrounding non-coding regions => TF-binding
    sites are seen as conserved islands in multiple
    alignments of gene upstream regions.
      • Works for close genomes (e.g. E. coli and
        Salmonella, sometimes Yersinia), where upstream
        regions are alignable.
      • Ignores site turnover.
  • Consistency filtering (Gelfand and Mironov, 1999,
    bacteria): regulatory systems are biologically
    reasonable => regulons are conserved (more or
    less) => true sites occur upstream of
    orthologous genes (false sites are scattered at
    random).
      • Needs to take care of the operon structure.
      • Assumes conservation of the TF-binding motif in DNA.
      • Ignores evolution of regulatory systems.
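
The consistency-filtering idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' software: the input layout (orthologous group id mapped to the set of genomes with a candidate site upstream), the function name `consistent_groups`, and the threshold `min_genomes` are all hypothetical, and operon structure is ignored.

```python
from typing import Dict, List, Set


def consistent_groups(hits: Dict[str, Set[str]], min_genomes: int = 3) -> List[str]:
    """Keep orthologous groups whose candidate sites recur across genomes.

    `hits` maps a (hypothetical) orthologous-group id to the set of genomes
    in which a candidate site was found upstream of a member gene.
    Groups seen in fewer than `min_genomes` genomes are treated as
    likely false positives and dropped.
    """
    return sorted(g for g, genomes in hits.items() if len(genomes) >= min_genomes)


# Toy input, invented for illustration only.
hits = {
    "groupA": {"genome1", "genome2", "genome3", "genome4"},
    "groupB": {"genome2"},
    "groupC": {"genome1", "genome3", "genome4"},
}
kept = consistent_groups(hits, min_genomes=3)
print(kept)
```

A real analysis would also have to map sites to the first gene of each operon before counting, as the slide notes.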

4
Conserved motif upstream of nrd genes
5
Identification of the candidate regulator by the
analysis of phyletic patterns
  • COG1327: the only COG with exactly the same
    phylogenetic pattern as the motif
      • large scale: on the level of major taxa
      • small scale: within major taxa
          • absent in small parasites among alpha- and
            gamma-proteobacteria
          • absent in Desulfovibrio spp. among
            delta-proteobacteria
          • absent in Nostoc sp. among cyanobacteria
          • absent in Oenococcus and Leuconostoc among
            Firmicutes
          • present only in Treponema denticola among four
            spirochetes
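
Matching a motif's phyletic pattern against protein families can be sketched as a comparison of presence/absence vectors. The sketch below assumes each pattern is a tuple of 0/1 values over the same ordered list of genomes; the function name `matching_patterns`, the family ids, and the toy data are hypothetical and are not taken from the COG database.

```python
from typing import Dict, List, Tuple


def matching_patterns(
    target: Tuple[int, ...],
    patterns: Dict[str, Tuple[int, ...]],
    max_mismatches: int = 0,
) -> List[str]:
    """Return ids whose presence/absence vector matches the target pattern."""
    matches = []
    for family_id, pattern in patterns.items():
        if len(pattern) != len(target):
            raise ValueError(f"pattern length mismatch for {family_id}")
        # Count genomes where presence/absence disagrees with the target.
        mismatches = sum(a != b for a, b in zip(target, pattern))
        if mismatches <= max_mismatches:
            matches.append(family_id)
    return sorted(matches)


# Toy data: 1 = present, 0 = absent, one position per genome.
motif_pattern = (1, 1, 0, 1, 0, 1)
family_patterns = {
    "family_x": (1, 1, 0, 1, 0, 1),
    "family_y": (1, 1, 1, 1, 0, 1),
    "family_z": (0, 1, 0, 1, 0, 0),
}
exact = matching_patterns(motif_pattern, family_patterns)
near = matching_patterns(motif_pattern, family_patterns, max_mismatches=1)
print(exact, near)
```

Allowing a small number of mismatches is a design choice for noisy or unfinished genomes; the slide itself reports an exact match.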

6
COG1327: predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains;
regulator of the riboflavin pathway?
7
Additional evidence 1
  • nrdR is sometimes clustered with nrd genes or
    with replication genes dnaB, dnaI, polA

8
Additional evidence 2
  • In some genomes, candidate NrdR-binding sites are
    found upstream of other replication-related genes
  • dNTP salvage
  • topoisomerase I, replication initiator dnaA,
    chromosome partitioning, DNA helicase II

9
Multiple sites (nrd genes): FNR, DnaA, NrdR
10
Mode of regulation
  • Repressor (overlaps with promoters)
  • Co-operative binding
      • most sites occur in tandem (> 90% of cases)
      • the distance between the copies (centers of
        palindromes) equals an integer number of DNA
        turns
          • mainly (94%) 30-33 bp, in 84% 31-32 bp: 3 turns
          • 21 bp (2 turns) in Vibrio spp.
          • 41-42 bp (4 turns) in some Firmicutes
  • experimental confirmation in Streptomyces
    (Borovok et al. 2004, Grinberg et al. 2006) and
    in E. coli (Grinberg et al. 2006)
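
The arithmetic behind "an integer number of DNA turns" can be checked directly. The sketch assumes a helical repeat of about 10.5 bp per turn for B-form DNA; that value and the function names are assumptions of this example, not figures from the slides.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA (assumed value)


def helical_turns(distance_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Convert a center-to-center distance in bp into helical turns."""
    return distance_bp / bp_per_turn


def nearest_whole_turns(distance_bp: float, bp_per_turn: float = BP_PER_TURN) -> int:
    """Round the distance to the nearest whole number of turns."""
    return round(helical_turns(distance_bp, bp_per_turn))


# Distances quoted on the slide: 21 bp, 31-32 bp, 41-42 bp.
for d in (21, 31, 32, 41, 42):
    print(d, "bp ->", round(helical_turns(d), 2), "turns, nearest:", nearest_whole_turns(d))
```

Sites separated by a whole number of turns lie on the same face of the helix, which is consistent with the co-operative binding suggested above.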

11
Evolutionary processes that shape regulatory
systems
  • Expansion and contraction of regulons (birth or
    death of sites)
  • Duplications of regulators (with or without
    regulated loci)
  • Loss of regulators (with or without regulated
    loci)
  • Re-assortment of regulators and structural genes
  • especially in complex systems
  • Change of regulator specificity
  • Horizontal transfer

12
Birth and death of sites is a very dynamic
process (even in bacteria)
  • NadR-binding sites upstream of pncB seem absent
    in Klebsiella pneumoniae and Serratia marcescens

13
but there are candidate sites further upstream
14
and they are clearly different (not simply
misaligned).
15
Loss of regulators and cryptic sites
Loss of RbsR in Y. pestis (the ABC transporter is
also lost)
RbsR binding site
Start codon of rbsD
16
Regulon expansion: how FruR has become CRA
[Figure. Panel: Gamma-proteobacteria. Sugars: mannose, glucose, mannitol, fructose. Genes: ptsHI-crr, manXYZ, edd, epd, eda, adhE, aceEF, icdA, ppsA, pykF, mtlD, mtlA, pckA, gpmA, pgk, gapA, fbp, pfkA, aceA, tpiA, fruK, fruBA, aceB.]
17
Common ancestor of Enterobacteriales
[Figure. Panels: Gamma-proteobacteria, Enterobacteriales. Sugars: mannose, glucose, mannitol, fructose. Genes: ptsHI-crr, manXYZ, edd, epd, eda, adhE, aceEF, icdA, ppsA, pykF, mtlD, mtlA, pckA, gpmA, pgk, gapA, fbp, pfkA, aceA, tpiA, fruK, fruBA, aceB.]
18
Common ancestor of Escherichia and Salmonella
[Figure. Panels: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp. Sugars: mannose, glucose, mannitol, fructose. Genes: ptsHI-crr, manXYZ, edd, epd, eda, adhE, aceEF, icdA, ppsA, pykF, mtlD, mtlA, pckA, gpmA, pgk, gapA, fbp, pfkA, aceA, tpiA, fruK, fruBA, aceB.]
19
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family regulators:
lineage-specific post-duplication loss
20
The binding motifs are very similar (the blue
branch is somewhat different to avoid
cross-recognition?)
21
Utilization of maltose/maltodextrin, Firmicutes
Displacement: invasion of a regulator from a
different subfamily (horizontal transfer from a
related species?); blue sites
22
Orthologous TFs with completely different
regulons (alpha-proteobacteria and Xanthomonadales)
23
Utilization of an unknown galactoside in
gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not
shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons. It
seems that lacI-X was present in the common
ancestor (Klebsiella is an outgroup)
24
Catabolism of gluconate, proteobacteria
25
Extreme variability of regulation of marginal
regulon members
[Figure labels: β, Pseudomonas spp.]
26
Combined regulatory network for iron homeostasis
genes in alpha-proteobacteria.
[Figure: network diagram; node labels include Fe, -Fe, FeS, and "FeS status of cell".]
The connecting lines denote regulatory
interactions, with the thickness reflecting the
frequency of the interaction in the analyzed
genomes. The suggested negative or positive mode
of operation is shown by a dead end or an arrow
end of the line.
27
Distribution of Irr, Fur/Mur, MntR, RirA,
and IscR regulons in alpha-proteobacteria
'?' in the RirA column denotes the absence of the
rirA gene in an unfinished genomic sequence and
the presence of candidate RirA-binding sites
upstream of the iron uptake genes.
28
Distribution of Irr, Fur/Mur, MntR, RirA,
and IscR regulons in alpha-proteobacteria
'?' in the RirA column denotes the absence of the
rirA gene in an unfinished genomic sequence and
the presence of candidate RirA-binding sites
upstream of the iron uptake genes.
29
Distribution of Irr, Fur/Mur, MntR, RirA,
and IscR regulons in alpha-proteobacteria
Not RirA. IscR?
'?' in the RirA column denotes the absence of the
rirA gene in an unfinished genomic sequence and
the presence of candidate RirA-binding sites
upstream of the iron uptake genes. UPDATE: the
genomes are finished, still no rirA gene.
30
Distribution of the conserved members of the Fe-
and Mn-responsive regulons and the predicted
RirA, Fur/Mur, Irr, and DtxR binding sites in
alpha-proteobacteria
[Table. Columns: Genes, Functions. Functional groups: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake.]
31
Phylogenetic tree of the Fur family of
transcription factors in alpha-proteobacteria - I
Fur in gamma- and beta-proteobacteria
Fur in epsilon-proteobacteria
Fur in Firmicutes
in alpha-proteobacteria
Regulator of manganese uptake genes (sit, mntH)
in alpha-proteobacteria
Regulator of iron uptake and metabolism genes
alpha-proteobacteria
32
[Figure: sequence logos]
Sequence logos for known Fur-binding sites in
Escherichia coli and Bacillus subtilis
Sequence logos for identified Fur-binding sites
in the other group of alpha-proteobacteria
Identified Mur-binding sites: Mur of
alpha-proteobacteria (the A, B, and C groups)
Species labels: Erythrobacter litoralis, Caulobacter crescentus, Novosphingobium aromaticivorans, Zymomonas mobilis, Sphingopyxis alaskensis, Oceanicaulis alexandrii, Rhodospirillum rubrum, Gluconobacter oxydans, Magnetospirillum magneticum, Parvularcula bermudensis
33
Phylogenetic tree of the Fur family of
transcription factors in alpha-proteobacteria - II
Fur in gamma- and beta-proteobacteria
Fur in epsilon-proteobacteria
Fur in Firmicutes
alpha-proteobacteria
Irr in alpha-proteobacteria: regulator of
iron homeostasis
34
Sequence logos for the identified Irr binding
sites in alpha-proteobacteria.
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
35
Phylogenetic tree of the Rrf2 family of
transcription factors in alpha-proteobacteria
Branch labels:
  • Nitrite/NO-sensing regulator NsrR (Nitrosomonas
    europaea, Escherichia coli)
  • Iron repressor RirA (Rhizobium leguminosarum)
  • Cysteine metabolism repressor CymR (Bacillus
    subtilis)
  • Cytochrome complex regulator Rrf2 (Desulfovibrio
    vulgaris)
  • Iron-sulfur cluster synthesis repressor
    IscR (Escherichia coli)
Positional clustering of rrf2-like genes with:
  • iron uptake and storage genes
  • Fe-S cluster synthesis operons
  • genes involved in nitrosative stress protection
  • sulfate uptake/assimilation genes
  • thioredoxin reductase
  • carboxymuconolactone decarboxylase-family genes
  • hmc cytochrome operon
Legend:
  • proteins with the conserved C-X(6-9)-C(4-6)-C
    motif within the effector-responsive domain
  • proteins without a cysteine triad motif
36
Sequence logos for the identified RirA-binding
sites in alpha-proteobacteria
The A group - RirA
(8 species)
(12 species)
The C group - quasi-RirA (12 genomes)
37
An attempt to reconstruct the history
38
Regulators and their binding motifs
  • Subtle changes at close evolutionary distances
  • Cases of motif conservation at surprisingly large
    distances
  • Surprisingly similar motifs of unrelated
    regulators: site usurpation (???)
  • Correlation between contacting nucleotides and
    amino acid residues

39
DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of
contacts (heavy atoms in a base pair at a
distance < cutoff from a protein atom)
[Figure panels: CRP, PurR, IHF, TrpR]
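
The "entropy at aligned sites" quantity can be illustrated with a per-column Shannon entropy. This is a minimal sketch under assumptions: the sites are ungapped and of equal length, no pseudocounts or small-sample corrections are applied, and the function name `column_entropies` and the toy sites are invented for the example rather than taken from the slides.

```python
import math
from collections import Counter
from typing import List


def column_entropies(sites: List[str]) -> List[float]:
    """Shannon entropy (in bits) of each column of equal-length aligned sites."""
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must have equal length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts.values())
        # H = -sum(p * log2(p)) over the nucleotides observed in this column.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy aligned sites, invented for illustration.
sites = ["TGTGA", "TGTGA", "TGAGA", "TGCGA"]
ent = column_entropies(sites)
print([round(abs(h), 3) for h in ent])
```

Low-entropy (conserved) columns are the ones the slide compares against the number of protein-DNA contacts.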
40
Specificity-determining positions in the LacI
family
  • Training set: 459 sequences,
  • average length: 338 amino acids,
  • 85 specificity groups

44 SDPs
10 residues contact NPF (analog of the effector)
7 residues in the effector contact zone
(5 Å < dmin < 10 Å)
6 residues in the intersubunit contacts
5 residues in the intersubunit contact zone
(5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone
(5 Å < dmin < 10 Å)
LacI from E. coli
41
CRP/FNR family of regulators
42
Correlation between contacting nucleotides and
amino acid residues
  • CooA in Desulfovibrio spp.
  • CRP in Gamma-proteobacteria
  • HcpR in Desulfovibrio spp.
  • FNR in Gamma-proteobacteria

Contacting residues (REnnnR): TG: 1st arginine; GA:
glutamate and 2nd arginine

DD  COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
DV  COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
EC  CRP   KITRQEIGQIVGCSRETVGRILKMLED
YP  CRP   KXTRQEIGQIVGCSRETVGRILKMLED
VC  CRP   KITRQEIGQIVGCSRETVGRILKMLEE
DD  HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
DV  HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
EC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
YP  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
VC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK

Binding motifs:
TGTCGGCnnGCCGACA
TTGTGAnnnnnnTCACAA
TTGTgAnnnnnnTcACAA
TTGATnnnnATCAA
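
To make the REnnnR pattern concrete, the residues at the three contacting positions can be read off each aligned sequence. The sketch below is an illustration only: the column positions are located by searching for the pattern in the "EC CRP" row, which is an assumption of this example, and the names `contact_columns` and `contact_residues` are hypothetical.

```python
import re
from typing import Dict, Tuple

# Helix sequences copied from the alignment on the slide.
ALIGNMENT: Dict[str, str] = {
    "DD COOA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "DV COOA": "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    "EC CRP": "KITRQEIGQIVGCSRETVGRILKMLED",
    "DD HCPR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "DV HCPR": "DVTKGLLAGLLGTARETLSRCLSRMVE",
    "EC FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
}


def contact_columns(reference: str) -> Tuple[int, int, int]:
    """Locate the R, E and second R of the REnnnR pattern in a reference row."""
    match = re.search(r"RE...R", reference)
    if match is None:
        raise ValueError("REnnnR pattern not found in reference")
    start = match.start()
    return start, start + 1, start + 5


# Assumption: the alignment is ungapped, so columns transfer between rows.
COLUMNS = contact_columns(ALIGNMENT["EC CRP"])


def contact_residues(sequence: str) -> Tuple[str, str, str]:
    """Residues at the 1st-arginine, glutamate and 2nd-arginine columns."""
    return tuple(sequence[i] for i in COLUMNS)


for name, seq in ALIGNMENT.items():
    print(name, contact_residues(seq))
```

In this reading CRP and HcpR carry all three residues, CooA keeps only the first arginine, and FNR keeps the glutamate and second arginine, which can be compared against the binding motifs listed above.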
43
The correlation holds for other factors in the
family
44
Open problems
  • Model the evolution of regulatory systems (a
    catalog of elementary events, estimates of
    probabilities)
      • Birth of a binding site: what are the mechanisms?
      • Loss of a binding site
      • Duplication of a regulated gene and/or a
        regulator
      • Horizontal transfer of a regulated gene and/or a
        regulator
      • Loss of a regulated gene and/or a regulator
      • Change of specificity
  • General properties?
      • Distribution of TF family and regulon sizes
      • Stable cores and flexible margins of functional
        systems (in terms of gene presence and
        regulation)
  • Co-evolution of TFs and DNA sites
      • Neutral model for the evolution of binding
        sites (with invariant functional pressure from
        the bound protein)
      • How do the motifs evolve? What is the driving
        force: changes in TFs?
      • TF-family, position-specific protein-DNA
        recognition code?
  • All of this needs to take into account the
    incompleteness and noise in the data

45
RNA regulatory systems
  • Riboswitches: regulation by formation of
    alternative structures dependent on binding of
    small molecules
  • T-boxes: regulation by formation of alternative
    structures dependent on binding of uncharged tRNA
  • Highly conserved (sequence, secondary
    structure) => easy to recognize
  • Large => phylogenetic trees, duplications etc.

46
Systematic analysis of T-boxes (very
preliminary results)
  • T-boxes: the mechanism (Grundy & Henkin)

47
Partial alignment of predicted T-boxes
TGG T-box
Aminoacyl-tRNA synthetases
Amino acid biosynthetic genes
Amino acid transporters
48
continued (in the 5' direction)
anti-anti (specifier) codon
Aminoacyl-tRNA synthetases
Amino acid biosynthetic genes
Amino acid transporters
49
800 T-boxes in 90 bacteria
  • Firmicutes
      • aa-tRNA synthetases
      • enzymes
      • transporters
      • all amino acids excluding glutamine, glutamate,
        lysine
  • Actinobacteria (regulation of translation
    predicted)
      • branched chain (ileS)
      • aromatic (Atopobium minutum)
  • Delta-proteobacteria
      • branched chain (leu enzymes)
  • Thermus/Deinococcus group (aa-tRNA synthetases)
      • branched chain (ileS, valS)
      • glycine
  • Chloroflexi, Dictyoglomi
      • aromatic (trp enzymes)
      • branched chain (ileS)
      • threonine

50
Recent duplications and bursts: ARG-T-box in
Clostridium difficile
51
(No Transcript)
52
ASN/ASP/HIS T-boxes: duplications and changes in
specificity
53
Blow-up
54
Branched-chain amino acids: duplications and
changes in specificity
ATC
CTC
ATC
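
Assuming the triplets on these slides are T-box specifier codons, their amino acid specificity follows from the standard genetic code, and the matching tRNA anticodon is the reverse complement. The sketch below covers only the three triplets that appear in this part of the transcript; the function name `reverse_complement` and the DNA-alphabet convention are choices made for the example.

```python
# Subset of the standard genetic code (DNA alphabet) for the triplets shown.
CODON_TO_AA = {"ATC": "Ile", "CTC": "Leu", "GTC": "Val"}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, e.g. a codon -> its anticodon."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


for codon, amino_acid in CODON_TO_AA.items():
    # The anticodon is written 5'->3' in the DNA alphabet for simplicity.
    print(codon, amino_acid, "anticodon:", reverse_complement(codon))
```

All three codons specify branched-chain amino acids, so a single-nucleotide change in the first position is enough to switch the predicted specificity among Ile, Leu and Val.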
55
Blow-up
transporter
ATC
GTC
dual regulation of common enzymes
ATC
CTC
56
Same enzymes, different regulators (common part
of the aromatic amino acid biosynthesis pathway)
cf. E. coli AroF, G, H: feedback inhibition by Trp,
Tyr, Phe; transcriptional regulation by TrpR, TyrR
57
S-box (SAM riboswitch)
Grundy and Henkin, 1998
58
S-box riboswitch: regulator of methionine
biosynthesis
Firmicutes
Loss of S-boxes
Lactobacillales: Met-T-box
Streptococcales: MtaR (transcription
factor), SAM-III riboswitch (metK) (the Henkin
group)
Bacillales: S-box
Clostridiales: S-box
Proteobacteria
E. coli: TFs
Xanthomonas: S-box
alphas: SAM-II
Geobacter: S-box
  • Other genomes with S-boxes: the Zoo
  • Petrotoga
  • actinobacteria (Streptomyces, Thermobifida)
  • Chlorobium, Chloroflexus, Cytophaga
  • Fusobacterium
  • Deinococcus

Need more genomes
59
Acknowledgements
  • Andrei A. Mironov (algorithms and software)
  • Alexandra B. Rakhmaninova (SDPs)
  • Olga Kalinina (SDPs/LacI)
  • Olga Laikova (LacI, sugars)
  • Dmitry Ravcheev (FruR)
  • Dmitry Rodionov (now at Burnham Institute) (NrdR,
    iron)
  • Alexei Vitreschak (RNA)
  • Leonid Mirny, MIT (protein/DNA contacts, SDPs)
  • Andy Johnston, University of East Anglia (iron)
  • Howard Hughes Medical Institute
  • Russian Fund of Basic Research
  • Russian Academy of Sciences, program Molecular
    and Cellular Biology
  • INTAS