Title: Evolution of regulatory interactions in bacteria
1 Evolution of regulatory interactions in bacteria
- Mikhail Gelfand
- Research and Training Center Bioinformatics, Institute for Information Transmission Problems, RAS
- Moscow, Russia
- Singapore, 17-18 July 2006
2 Comparative genomics of regulation
- Why
  - Functional annotation of genes
  - Metabolic modeling
  - Practical applications in genetic engineering, drug targeting, etc.
- How
  - Close genomes: phylogenetic footprinting. Regulatory sites are seen as conservation islands in alignments of gene upstream regions
  - Distant genomes: consistency filtering. Candidate sites in one genome may be unreliable, but independent occurrence upstream of orthologous genes in many genomes yields reliable predictions
- Caveats
  - Presence of (predicted) binding sites does not immediately imply functional regulation
  - Operon structure
  - Need to verify presence of orthologous transcription factors in the studied genomes
  - Orthologous factors may have different binding motifs
  - One functional system may be regulated by different factors within and between genomes
- Many genomes
  - Taxon-specific regulation
  - Evolution
    - individual sites
    - transcription-factor families
    - transcription factors and their binding motifs
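The consistency-filtering idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the input layout, the `min_genomes` threshold, and the toy genome, gene, and group identifiers are all hypothetical and do not come from the talk.

```python
from collections import defaultdict

def consistent_predictions(candidate_sites, orthologs, min_genomes=3):
    """Keep ortholog groups with candidate sites in at least min_genomes genomes.

    candidate_sites: set of (genome, gene) pairs with a predicted upstream site
    orthologs: dict mapping (genome, gene) -> ortholog group id
    """
    genomes_with_site = defaultdict(set)
    for genome, gene in candidate_sites:
        group = orthologs.get((genome, gene))
        if group is not None:
            genomes_with_site[group].add(genome)
    # A site seen in one genome only is treated as unreliable and dropped.
    return {g: sorted(gs) for g, gs in genomes_with_site.items()
            if len(gs) >= min_genomes}

# Toy, made-up input (hypothetical identifiers)
sites = {("g1", "a1"), ("g2", "a2"), ("g3", "a3"), ("g1", "x1")}
orth = {("g1", "a1"): "OG1", ("g2", "a2"): "OG1",
        ("g3", "a3"): "OG1", ("g1", "x1"): "OG2"}
print(consistent_predictions(sites, orth))
```

In this toy input only group OG1 survives, because its candidate sites occur independently in three genomes, while the single OG2 site is filtered out.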
3 How it works: two simple examples
- Biotin regulator of alpha-proteobacteria
- Universal regulator of ribonucleotide reductases: reconstruction of the regulatory system and the mechanism of regulation
4 BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
5 BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
BirA of alpha-proteobacteria: no DNA-binding domain
6 Identification of the candidate regulator (BioR) in alpha-proteobacteria
TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA
TTATAGATAg TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA
TTATCTATAA TTATCTATtA TTATCTAcAA TTATCTATAA TTATCTATAA
TTATCTATAA TcATAGATtA cTATAGATAA TTATCTAcAA
- Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes
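A small sketch of how such candidate sites can be compared on both strands. The reference sequence is an assumption (the first site listed on the slide is simply taken as the reference), and the function names are hypothetical; the talk does not state which search procedure was used.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(site):
    """Reverse complement of a DNA string, preserving letter case."""
    return site.translate(COMPLEMENT)[::-1]

def mismatches(site, reference):
    """Fewest mismatches to the reference on either strand (case-insensitive)."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a.upper(), b.upper()))
    return min(hamming(site, reference),
               hamming(reverse_complement(site), reference))

REFERENCE = "TTATAGATAA"  # assumption: first site on the slide used as reference
for s in ["TTATAGATAA", "TTATCTATAA", "TTATAGATAg", "TcATAGATtA"]:
    print(s, reverse_complement(s), mismatches(s, REFERENCE))
```

Note that the two most frequent sites in the list, TTATAGATAA and TTATCTATAA, are reverse complements of each other, so they represent the same site read from opposite strands.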
7
- Positional clustering: a candidate transcription factor from the GntR family is often found in the same loci (black arrows)
- Phyletic patterns: the phyletic distribution of candidate sites (red circles) exactly coincides with the phyletic distribution of the candidate regulator
- Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself
8 Conserved signal upstream of nrd genes
9 Identification of the candidate regulator by the analysis of phyletic patterns
- COG1327: the only COG with exactly the same phylogenetic pattern as the signal
  - large scale: on the level of major taxa
  - small scale: within major taxa
    - absent in small parasites among alpha- and gamma-proteobacteria
    - absent in Desulfovibrio spp. among delta-proteobacteria
    - absent in Nostoc sp. among cyanobacteria
    - absent in Oenococcus and Leuconostoc among Firmicutes
    - present only in Treponema denticola among four spirochetes
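The phyletic-pattern search described above amounts to looking for gene families whose presence/absence across genomes equals that of the signal. A minimal sketch follows; the genome and COG identifiers are hypothetical placeholders, and exact set equality is an assumption about the matching criterion.

```python
def matching_cogs(signal_genomes, cog_genomes):
    """Return ids of COGs present in exactly the genomes that carry the signal."""
    target = frozenset(signal_genomes)
    return sorted(cog for cog, genomes in cog_genomes.items()
                  if frozenset(genomes) == target)

# Hypothetical genome and COG identifiers, for illustration only
signal = {"genomeA", "genomeB", "genomeD"}
cogs = {
    "COG_x": {"genomeA", "genomeB", "genomeD"},             # same pattern
    "COG_y": {"genomeA", "genomeB", "genomeC", "genomeD"},  # one extra genome
    "COG_z": {"genomeA", "genomeD"},                        # one genome missing
}
print(matching_cogs(signal, cogs))
```

With real data a small tolerance for mismatches may be needed, since unfinished genomes and missed sites add noise to both patterns.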
10 COG1327: predicted transcriptional regulator, consists of Zn-ribbon and ATP-cone domains; regulator of the riboflavin pathway?
11 Additional evidence 1
- nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA
12 Additional evidence 2
- In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
  - dNTP salvage
  - topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II
13 Multiple sites (nrd genes): FNR, DnaA, NrdR
14 Mode of regulation
- Repressor (overlaps with promoters)
- Co-operative binding
  - most sites occur in tandem (> 90% of cases)
  - the distance between the copies (centers of palindromes) equals an integer number of DNA turns
    - mainly (94%) 30-33 bp, in 84% 31-32 bp: 3 turns
    - 21 bp (2 turns) in Vibrio spp.
    - 41-42 bp (4 turns) in some Firmicutes
  - experimental confirmation in Streptomyces (Borovok et al., 2004)
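The arithmetic behind "an integer number of DNA turns" can be checked directly. The sketch below assumes a helical repeat of about 10.5 bp per turn for B-DNA; that value and the function name are assumptions of this example, not figures from the slide.

```python
BP_PER_TURN = 10.5  # assumption: approximate helical repeat of B-DNA

def helical_phase(distance_bp, bp_per_turn=BP_PER_TURN):
    """Return (nearest whole number of turns, deviation from it in turns)."""
    turns = distance_bp / bp_per_turn
    nearest = round(turns)
    return nearest, abs(turns - nearest)

# Center-to-center distances quoted on the slide
for d in (21, 31, 32, 41, 42):
    n, dev = helical_phase(d)
    print(f"{d} bp: about {n} turns (deviation {dev:.2f} turn)")
```

Distances close to a whole number of turns place both sites on the same face of the helix, which is consistent with co-operative binding of two repressor units.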
15 Evolutionary processes that shape regulatory systems
- Expansion and contraction of regulons
- Duplications of regulators with or without regulated loci
- Loss of regulators with or without regulated loci
- Re-assortment of regulators and structural genes
  - especially in complex systems
- Horizontal transfer
16 Loss of regulators, and cryptic sites
Loss of RbsR in Y. pestis (the ABC transporter is also lost)
RbsR binding site
Start codon of rbsD
17 Regulon expansion: how FruR has become CRA
Figure (pathway diagram). Sugars: mannose, glucose, mannitol, fructose. Genes: ptsHI-crr, manXYZ, edd, epd, eda, adhE, aceEF, icdA, ppsA, pykF, mtlD, mtlA, pckA, gpmA, pgk, gapA, fbp, pfkA, aceA, tpiA, fruK, fruBA, aceB. Taxonomic level shown: Gamma-proteobacteria
18 Common ancestor of Enterobacteriales
Figure: the same pathway diagram, sugars, and genes as on slide 17. Taxonomic levels shown: Gamma-proteobacteria, Enterobacteriales
19 Common ancestor of Escherichia and Salmonella
Figure: the same pathway diagram, sugars, and genes as on slide 17. Taxonomic levels shown: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.
20 Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
21 The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)
22 Utilization of an unknown galactoside, gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that LacI-X was present in the common ancestor (Klebsiella is an outgroup)
23 Utilization of maltose/maltodextrin, Firmicutes
Two different ABC transporters (shades of red)
PTS (pink)
Glucoside hydrolases (shades of green)
Two regulators (black and grey)
24 Modularity of the functional subsystem
Two different ABC systems
Three hydrolases in one operon (E. faecalis) or separately
25 Changes of regulation
Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?); blue sites
26 Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)
27 Catabolism of gluconate, proteobacteria
28 Extreme variability of regulation of marginal regulon members
Figure labels: β, ?, Pseudomonas spp.
29 Combined regulatory network for iron homeostasis genes in alpha-proteobacteria
Figure (regulatory network diagram; surviving node labels: Fe, -Fe, FeS, FeS status of cell).
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by the dead end or the arrow end of the line, respectively.
30 Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in alpha-proteobacteria
'?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
31 Phylogenetic tree of the Fur family of transcription factors in alpha-proteobacteria - I
Fur in gamma- and beta-proteobacteria
Fur in epsilon-proteobacteria
Fur in Firmicutes
in alpha-proteobacteria
Regulator of manganese uptake genes (sit, mntH)
in alpha-proteobacteria
Regulator of iron uptake and metabolism genes
alpha-proteobacteria
32 Sequence logos for the identified Fur-binding sites in the D group of alpha-proteobacteria
Species labels: Erythrobacter litoralis, Caulobacter crescentus, Novosphingobium aromaticivorans, Zymomonas mobilis, Sphingopyxis alaskensis, Oceanicaulis alexandrii, Rhodospirillum rubrum, Gluconobacter oxydans, Magnetospirillum magneticum, Parvularcula bermudensis
Identified Mur-binding sites: the A, B, and C groups (Mur of alpha-proteobacteria)
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis
33 Phylogenetic tree of the Fur family of transcription factors in alpha-proteobacteria - II
Fur in gamma- and beta-proteobacteria
Fur in epsilon-proteobacteria
Fur in Firmicutes
alpha-proteobacteria
Irr in alpha-proteobacteria: regulator of iron homeostasis
34 Sequence logos for the identified Irr-binding sites in alpha-proteobacteria
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
35 Phylogenetic tree of the Rrf2 family of transcription factors in alpha-proteobacteria
Nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli)
Iron repressor RirA (Rhizobium leguminosarum)
Cysteine metabolism repressor CymR (Bacillus subtilis)
Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)
Iron-sulfur cluster synthesis repressor IscR (Escherichia coli)
Positional clustering of rrf2-like genes with:
- iron uptake and storage genes
- Fe-S cluster synthesis operons
- genes involved in nitrosative stress protection
- sulfate uptake/assimilation genes
- thioredoxin reductase
- carboxymuconolactone decarboxylase-family genes
- hmc cytochrome operon
Legend:
- proteins with the conserved C-X(6-9)-C-X(4-6)-C motif within the effector-responsive domain
- proteins without a cysteine triad motif
36 Sequence logos for the identified RirA-binding sites in alpha-proteobacteria
The A group (8 species) - RirA
The C group (12 species) - RirA
37 Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in alpha-proteobacteria
Table columns: genes, functions
Functional categories: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake
38 An attempt to reconstruct the history
39 Regulators and their signals
- Subtle changes at close evolutionary distances
- Cases of motif conservation at surprisingly large distances
- Correlation between contacting nucleotides and amino acid residues
40 DNA signals and protein-DNA interactions
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom)
Panels: CRP, PurR, IHF, TrpR
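The per-position entropy of aligned sites can be computed as below. The slide does not state the exact definition used, so this sketch assumes plain Shannon entropy in bits without pseudocounts; the function name is hypothetical, and the example sites are a few of the BioR candidates from slide 6, used here only as sample input. Counting protein-DNA contacts requires structural data and is not shown.

```python
import math
from collections import Counter

def column_entropies(sites):
    """Shannon entropy (bits) of each column of equal-length aligned sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and aligned to the same length")
    entropies = []
    for column in zip(*(s.upper() for s in sites)):
        total = len(column)
        h = -sum((c / total) * math.log2(c / total)
                 for c in Counter(column).values())
        entropies.append(h + 0.0)  # turn -0.0 into 0.0 for conserved columns
    return entropies

sample = ["TTATAGATAA", "TTATAGATAg", "TcATAGATAg", "cTATAGATAA"]
print([round(h, 2) for h in column_entropies(sample)])
```

Low-entropy columns are the conserved positions; the slide relates them to the number of protein contacts at the same positions.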
41 Specificity-determining positions in the LacI family
- Training set: 459 sequences,
- average length 338 amino acids,
- 85 specificity groups
- 44 SDPs
  - 10 residues contact NPF (analog of the effector)
  - 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
  - 6 residues in the intersubunit contacts
  - 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
  - 7 residues contact the operator sequence
  - 6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
42 The LacI family: subtle changes in signals at close distances
Figure (binding signals); surviving labels: G, n, A, CG, Gn, GC
43 CRP/FNR family of regulators
44 Correlation between contacting nucleotides and amino acid residues
- CooA in Desulfovibrio spp.
- CRP in Gamma-proteobacteria
- HcpR in Desulfovibrio spp.
- FNR in Gamma-proteobacteria
Contacting residues: REnnnR. TG: 1st arginine; GA: glutamate and 2nd arginine
DD COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP   KITRQEIGQIVGCSRETVGRILKMLED
YP CRP   KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP   KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
TGTCGGCnnGCCGACA
TTGTGAnnnnnnTCACAA
TTGTgAnnnnnnTcACAA
TTGATnnnnATCAA
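The residues at the three contacting positions can be read off the alignment fragment programmatically. In this sketch the column indices are an assumption: they were obtained by locating the REnnnR pattern ("RETVGR") in the CRP rows of the fragment as printed on the slide, and the function name is hypothetical.

```python
ALIGNMENT = {
    "DD COOA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "DV COOA": "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    "EC CRP":  "KITRQEIGQIVGCSRETVGRILKMLED",
    "YP CRP":  "KXTRQEIGQIVGCSRETVGRILKMLED",
    "VC CRP":  "KITRQEIGQIVGCSRETVGRILKMLEE",
    "DD HCPR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "DV HCPR": "DVTKGLLAGLLGTARETLSRCLSRMVE",
    "EC FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "YP FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "VC FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
}
# Assumption: 0-based columns of R, E, R in "REnnnR", read off the CRP rows
CONTACT_COLUMNS = (14, 15, 19)

def contact_residues(seq, columns=CONTACT_COLUMNS):
    """Concatenate the residues found at the contacting columns."""
    return "".join(seq[i] for i in columns)

# Group the proteins by the residues they carry at the contacting positions
by_pattern = {}
for name, seq in ALIGNMENT.items():
    by_pattern.setdefault(contact_residues(seq), []).append(name)
for pattern, names in by_pattern.items():
    print(pattern, names)
```

In this fragment the CRP and HcpR rows carry R, E, R at these columns, the FNR rows carry V, E, R (the first arginine is replaced), and the CooA rows carry R, Q, T, which can be compared with the binding motifs listed on the slide.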
45 The correlation holds for other factors in the family
46 Open problems
- Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
  - Birth of a binding site: what are the mechanisms?
  - Loss of a binding site
  - Duplication of a regulated gene and/or a regulator
  - Horizontal transfer of a regulated gene and/or a regulator
  - Loss of a structural gene and/or a regulator
- General properties?
  - Distribution of TF family and regulon sizes
  - Stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
- Co-evolution of TFs and DNA sites
  - Neutral model for the evolution of binding sites (with invariant functional pressure from the bound protein)
  - How do the signals evolve? What is the driving force: changes in TFs?
  - TF-family, position-specific protein-DNA recognition code?
- All of this needs to take into account the incompleteness and noise in the data
47 Acknowledgements
- Andrei A. Mironov (algorithms and software)
- Alexandra B. Rakhmaninova (SDPs)
- Dmitry Rodionov (now at Burnham Institute) (BioR, NrdR, iron)
- Olga Laikova (LacI, sugars)
- Dmitry Ravcheev (FruR)
- Olga Kalinina (SDPs/LacI)
- Leonid Mirny, MIT (protein/DNA contacts, SDPs)
- Andy Johnston, University of East Anglia (iron)
- Howard Hughes Medical Institute
- Russian Fund of Basic Research
- Russian Academy of Sciences, program Molecular and Cellular Biology
- INTAS