Title: Coevolution of DNA-binding proteins and their binding sites
1Co-evolution of DNA-binding proteins and their
binding sites
Y. D. Korostelev, O. N. Laikova, A. A. Mironov,
M. S. Gelfand, A. B. Rakhmaninova
2A problem
DNA-binding domain
?
?
Binding site?
Binding site
CAGACCTATTGACGCTAACTACCTCTGAAC
3A common approach: studying spatial structures
of DNA-protein complexes
Preferences (Suzuki et al. 1985, Thornton et al.
2001):
Arg > G, Asn > A, Gln > A, Glu > C, His > G, Asn > A, Lys > T
Arg > GpG, Lys > ApG, Asn > CpA, Asp > ApT
But there are many more sequences of
DNA-binding proteins and their binding sites
than X-ray structures.
4Our approach: studying correlated substitutions
in sequences
Protein alignment    DNA alignment
...LEK R NF...       ...AAGC G G...
...VEK R NF...       ...AAGC G G...
...LDS A NF...       ...ATGC T T...
...LDA A NG...       ...AAGC T G...
...IDY K NF...       ...AAGC A G...
Correlated substitutions
in a pair of positions
5The main idea of the algorithm
Proteins alignment                                            Binding sites alignment
LAFDHDQILQMAQERLQGKVRYQP-IGFELLPEKFSLRQLQRMYETVLGRS---LDKRNF  tTAaTGgCTTTAtGcCACTAT
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF  TTAaaGTAAtAaTTACCATAA
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF  AaAtTGTCTTTAtGcCACTAT
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF  TTATGGTAAATTcTACCATAA
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF  TTATGGTAAATTcTACCATAA
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF  TTATgGTCAgTTTcACcAaAA
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF  TTaGTCgAAATAaccaACtAA
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF  TTATCGTCAtCtcGACGACAA
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF  TttAGGTAAgTTATACTTTTA
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF  tTAaTGgCTTTAtGcCACTAT
Mutual Information
Z-score
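The slide names only "Mutual Information" and "Z-score"; it does not give the formulas or the null model. The following is a minimal sketch under stated assumptions: mutual information is computed from observed frequencies in one protein column and one DNA column (rows paired protein-to-site), and the Z-score is obtained by shuffling the DNA column. The function names `mutual_information` and `z_score`, the shuffling scheme, and the number of shuffles are illustrative choices, not taken from the slides.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two paired alignment columns."""
    n = len(xs)
    px = Counter(xs)            # counts of residues in the protein column
    py = Counter(ys)            # counts of nucleotides in the DNA column
    pxy = Counter(zip(xs, ys))  # counts of (residue, nucleotide) pairs
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def z_score(xs, ys, n_shuffles=1000, seed=0):
    """Z-score of the observed MI against MI of shuffled columns.

    Assumption: the null model is a random permutation of the DNA column.
    """
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        samples.append(mutual_information(xs, shuffled))
    mean = statistics.mean(samples)
    sd = statistics.pstdev(samples)
    return (observed - mean) / sd if sd > 0 else 0.0


# Toy columns taken from the "Our approach" slide: R/R/A/A/K against G/G/T/T/A.
protein_col = list("RRAAK")
dna_col = list("GGTTA")
mi = mutual_information(protein_col, dna_col)
z = z_score(protein_col, dna_col)
```

With only five rows the Z-score is not meaningful in itself; the toy columns merely show the calling convention. On real alignments the same two functions would be applied to every (protein position, DNA position) pair.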
6Input
Proteins alignment
Sites alignment
7Defining statistically significantly correlated
pairs
8Output
Heatmap: graphic representation of correlated
pairs of positions
Contingency table: helps locate preferences
and coordinated substitutions
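As an illustration of the contingency table output, here is a minimal sketch that counts how often each amino acid in one protein column co-occurs with each nucleotide in one DNA column. The function name `contingency_table` and the nested-dictionary layout are assumptions made for this example; the slides do not describe how the tool builds or displays its tables.

```python
from collections import Counter


def contingency_table(protein_col, dna_col):
    """Count co-occurrences of residues and nucleotides in paired columns.

    Returns a nested dict: table[residue][nucleotide] -> count.
    """
    counts = Counter(zip(protein_col, dna_col))
    residues = sorted(set(protein_col))
    bases = sorted(set(dna_col))
    return {aa: {b: counts[(aa, b)] for b in bases} for aa in residues}


def format_table(table):
    """Render the table as plain text, one row per residue."""
    bases = sorted(next(iter(table.values())))
    lines = ["  " + " ".join(f"{b:>3}" for b in bases)]
    for aa, row in table.items():
        lines.append(f"{aa} " + " ".join(f"{row[b]:>3}" for b in bases))
    return "\n".join(lines)


# Toy columns from the "Our approach" slide.
table = contingency_table(list("RRAAK"), list("GGTTA"))
print(format_table(table))
```

A row dominated by a single cell (here R with G, A with T) is the kind of pattern that would suggest a preference or a coordinated substitution.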
9LacI family
A large family of transcription factors
1369 protein sequences
4843 site sequences
DNA-binding domain: 71 aa
Binding site: 20 bp, palindromic
10Only a few pairs appeared to be correlated. Total:
19 pairs, 7 positions in the protein
alignment and 5 positions in the DNA alignment
11Correlated protein positions on a structure (red
spheres)
PurR_Ecoli
12Comparison with contacts in three known
structures of complexes (1efa, 1rzr, 1jft)
1. Classification of contacts (combined contacts
from three structures)
Conserved (identity of amino acid or nucleotide
> 90%)
Specific (contacts between amino acid side
chain and nucleotide base)
Other
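The "conserved" class above can be sketched as a per-column identity check. This is a hedged illustration: it assumes identity means the frequency of the most common symbol in an alignment column, that the threshold is 90%, and that gaps are counted like any other symbol. The names `column_identity` and `is_conserved` are hypothetical.

```python
from collections import Counter


def column_identity(column):
    """Fraction of rows carrying the most frequent symbol in the column."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


def is_conserved(column, threshold=0.9):
    """True if column identity exceeds the threshold (assumed 90%)."""
    return column_identity(column) > threshold


# Toy columns from the "Our approach" slide:
# the invariant N column and the variable R/R/A/A/K column.
conserved = is_conserved(list("NNNNN"))
variable = is_conserved(list("RRAAK"))
```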
13Comparison with contacts in three known
structures of complexes (1efa, 1rzr, 1jft)
2. Correlated pairs
Correlations of non-contacting residue in the loop
Correlated
Conserved (identity of residue or nucleotide
> 90%)
Symmetrical nucleotide has contacts
Other
Specific non-correlated
14Comparison with experimental data (Lehming et
al.)
20-5
16-7
12
15Correlations are not a consequence of phylogenetic
trace
AR_GTA
SR_GCA
16Conclusions
LacI family
Results:
Only a few positions are correlated
Correlated pairs of positions correspond to variable or moderately conserved columns of alignments
Almost all correlated pairs correspond to specific protein-DNA contacts
Correlations are not a consequence of phylogenetic trace
We believe:
Correlations reflect the physicochemical nature of interactions between protein and DNA
An online tool for searching for correlations is available at
http://www.bioinf.fbb.msu.ru/Prot-DNA-Korr
17Other studied families
NrtR
Poster by Korostelev et al.
CRP-FNR, N4-N6 methyltransferases, C-proteins
Poster by Miteeva, Stepanova, et al.