Title: Biophysics and Bioinformatics of Transcription Regulation
Marko Djordjevic, Dept. of Physics, Pupin Labs, Columbia U.
Outline
- Part I: Biophysical approach to transcription factor binding site discovery.
- Part II: Quantitative analysis of bacteriophage gene expression strategies.
PART I: Transcription Factor Binding Site Identification
- Introduction to transcription regulation
- Model for T.F.-DNA interaction
- Biophysics based algorithm
- Application to E. coli T.F. binding sites
- Comparison with the information-theoretic algorithm
- Conclusion for Part I
Control of Gene Expression by Transcription Factors: lac Operon
[Figure: lac operon regulation, with lacZ expression switching between OFF and ON states]
Alberts et al., Molecular Biology of the Cell.
Examples of CAP factor binding sites
attcgtgatagctgtcgtaaagttttgttacctgcctctaacttaagt
gtgacgccgtgcaaataatgccgtgattatagacacttttatttgcga
tgcgtcgcgcattttaatgagattcagatcacatattaatgtgacgtc
ctttgcatacgaaggcgacctgggtcatgctgaggtgttaaattgatc
acgttt
similar words
From experiment, we know some, but NOT all
binding sites for a given transcription factor.
Can we predict ALL of them?
Biophysical model of T.F.-DNA interactions
Probability that DNA segment S is bound by a protein at concentration c:
$$p(S) = \frac{1}{1 + e^{(E(S) - \mu)/k_B T}}$$
with chemical potential $\mu = k_B T \ln(c/K)$.
Note: p(S) is a Fermi function, which describes saturation of binding.
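A minimal Python sketch of this occupancy, p(S) = 1/(1 + exp((E(S) - μ)/k_BT)) with μ = k_BT ln(c/K), assuming E(S) and μ are in the same units and k_BT = 1 by default. The function names are illustrative, not taken from the original work. It shows the saturation effect: once E(S) lies a few k_BT below μ, p(S) is already close to 1, so lowering the energy further barely changes the binding probability.

```python
import math


def binding_probability(energy, mu, kT=1.0):
    """Fermi-function occupancy p(S) = 1 / (1 + exp((E(S) - mu) / kT))."""
    x = (energy - mu) / kT
    # Branch on the sign of x so math.exp never overflows for large |x|.
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def chemical_potential(c, K, kT=1.0):
    """Chemical potential mu = kT * ln(c / K) for protein concentration c."""
    return kT * math.log(c / K)


if __name__ == "__main__":
    mu = chemical_potential(c=1.0, K=1.0)  # c = K gives mu = 0
    for E in (-6.0, -2.0, 0.0, 2.0, 6.0):
        print(E, round(binding_probability(E, mu), 4))
```

Raising c shifts μ upward, which moves the half-occupancy point (E(S) = μ) toward weaker sites.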
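To parameterize the binding energy, we use the independent nucleotide approximation:
$$E(S) = \sum_{i=1}^{L} \sum_{\alpha} \epsilon_{i\alpha} S_{i\alpha},$$
where $S_{i\alpha} = 1$ if base α occupies position i of S and 0 otherwise.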
A. Sengupta, M. Djordjevic and B. I. Shraiman, PNAS 2002
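To make the independent nucleotide approximation concrete, the sketch below scores a sequence by adding one energy-matrix entry per position: E(S) is the sum over positions i of ε_{iα}, where α is the base at position i. The 3-position matrix `EPS` and the function name are hypothetical; real matrices are estimated from binding data.

```python
BASES = "acgt"


def energy(seq, eps):
    """E(S) = sum over positions i of eps[i][alpha], alpha = base at position i.

    eps is an L x 4 energy matrix whose columns follow the order of BASES.
    """
    seq = seq.lower()
    if len(seq) != len(eps):
        raise ValueError("sequence length must equal the number of matrix rows (L)")
    return sum(row[BASES.index(base)] for row, base in zip(eps, seq))


# Hypothetical 3-position matrix (arbitrary units); "acg" is the best binder.
EPS = [
    [0.0, 1.0, 2.0, 1.0],  # position 1: a, c, g, t
    [1.5, 0.0, 1.5, 1.5],  # position 2
    [2.0, 2.0, 0.0, 2.0],  # position 3
]

if __name__ == "__main__":
    print(energy("acg", EPS), energy("TTA", EPS))
```

Because positions contribute additively, the matrix has only 4L parameters, which is what makes estimating it from a limited set of sites feasible.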
What do we want to do?
(M. Djordjevic, A. M. Sengupta and B. I. Shraiman, Genome Res. 2003)
Problem
Mix the protein with a set containing all genomic sequences of length L. Then extract and sequence some of the DNA sequences bound by the factor.
Given the set of extracted sequences S_k, determine the energy matrix and the chemical potential.
Solution
Maximize the likelihood of having all S_k bound (at chemical potential μ) and extracted, and none of the other sequences.
Quadratic Programming (QP) Algorithm
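- In the T → 0 (all-or-none) approximation, p(S) becomes the step function θ(μ - E(S)), and maximizing the likelihood requires that:
  - a) all examples S_k are bound by the T.F.;
  - b) the number of bound random sequences S is minimized.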
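The sketch below only evaluates conditions a) and b) for one candidate (ε, μ) in the T → 0 limit, where a sequence counts as bound when E(S) < μ; it does not reproduce the QP optimization over ε and μ. The function names, the uniform base composition of the random background, the sample size and the toy matrix are all assumptions made for illustration.

```python
import random

BASES = "acgt"


def energy(seq, eps):
    """Independent-nucleotide energy E(S) = sum_i eps[i][base at i]."""
    return sum(row[BASES.index(b)] for row, b in zip(eps, seq))


def is_bound(seq, eps, mu):
    """T -> 0 limit of the Fermi function: bound if E(S) < mu."""
    return energy(seq, eps) < mu


def check_qp_conditions(sites, eps, mu, n_random=10000, seed=0):
    """Evaluate conditions a) and b) for a candidate (eps, mu).

    Returns (all_sites_bound, fraction_of_random_sequences_bound).
    Random sequences use equal base frequencies (an assumption).
    """
    rng = random.Random(seed)
    L = len(eps)
    all_sites_bound = all(is_bound(s, eps, mu) for s in sites)
    n_bound = 0
    for _ in range(n_random):
        s = "".join(rng.choice(BASES) for _ in range(L))
        n_bound += is_bound(s, eps, mu)  # True counts as 1
    return all_sites_bound, n_bound / n_random


# Hypothetical 3-position energy matrix (columns: a, c, g, t).
EPS = [
    [0.0, 1.0, 2.0, 1.0],
    [1.5, 0.0, 1.5, 1.5],
    [2.0, 2.0, 0.0, 2.0],
]
```

Raising μ can only help condition a) and can only hurt condition b), which is the trade-off the optimization has to resolve.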
Application to transcription factor binding site identification in E. coli
- Start with known sites for 50 T.F.s (and RNA polymerase) in the DPInteract database (Church lab).
- Use the QP algorithm to determine ε, μ for each factor.
- Identify all (intergenic) DNA segments S satisfying E_ε(S) < μ for each (ε, μ) set.
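The search step can be sketched as a sliding-window scan that reports every length-L segment with E_ε(S) < μ. Scanning both strands (via the reverse complement) and skipping windows with ambiguous bases are assumptions of this sketch. The function names and the toy matrix are hypothetical.

```python
BASES = "acgt"
COMPLEMENT = str.maketrans("acgt", "tgca")


def energy(seq, eps):
    """Independent-nucleotide energy E(S) = sum_i eps[i][base at i]."""
    return sum(row[BASES.index(b)] for row, b in zip(eps, seq))


def scan(genome, eps, mu):
    """List (position, strand, segment, E) for every length-L window with E(S) < mu."""
    L = len(eps)
    genome = genome.lower()
    hits = []
    for i in range(len(genome) - L + 1):
        fwd = genome[i:i + L]
        if any(b not in BASES for b in fwd):
            continue  # skip windows containing ambiguous bases such as 'n'
        rev = fwd.translate(COMPLEMENT)[::-1]  # reverse complement of the window
        for strand, seg in (("+", fwd), ("-", rev)):
            e = energy(seg, eps)
            if e < mu:
                hits.append((i, strand, seg, e))
    return hits


# Hypothetical 3-position energy matrix (columns: a, c, g, t).
EPS = [
    [0.0, 1.0, 2.0, 1.0],
    [1.5, 0.0, 1.5, 1.5],
    [2.0, 2.0, 0.0, 2.0],
]
```

For example, `scan("TTACGTT", EPS, 1.0)` finds "acg" on the forward strand at position 2 and on the reverse strand of the window starting at position 3.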
Empirical distribution of estimated E
[Figure: distribution of estimated binding energies E compared with the background distribution, with μ indicated]
Sample results for pleiotropic factors
RNAP (RpoD) site statistics
Note: the 50 false negatives for promoter prediction can be reduced by combining it with an activator-TF search.
Information-theoretic weight matrix
A natural threshold does not exist.
This is not the correct binding probability: saturation effects are not properly described.
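For comparison, a common form of the information-theoretic weight matrix assigns log-odds weights w_{iα} = log2(f_{iα}/q_α), where f_{iα} is the pseudocount-smoothed frequency of base α at position i among known sites and q_α is the background frequency. This sketch assumes a uniform background and a pseudocount of 0.5, both arbitrary choices, and the function names are hypothetical. The score ranks sequences, but nothing in its construction singles out a cut-off.

```python
import math

BASES = "acgt"


def weight_matrix(sites, background=None, pseudocount=0.5):
    """Log-odds weights w[i][a] = log2(f_ia / q_a) from equal-length aligned sites."""
    q = background or {b: 0.25 for b in BASES}
    n = len(sites)
    L = len(sites[0])
    total = n + 4 * pseudocount
    matrix = []
    for i in range(L):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].lower()] += 1
        matrix.append({b: math.log2(counts[b] / total / q[b]) for b in BASES})
    return matrix


def score(seq, matrix):
    """Sum of per-position log-odds weights; higher means more site-like."""
    return sum(row[b] for row, b in zip(matrix, seq.lower()))


# Hypothetical aligned sites.
SITES = ["acg", "acg", "acg", "tcg"]
W = weight_matrix(SITES)
```

Any cut-off on `score` has to be chosen externally, in contrast with the threshold μ that the QP approach estimates.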
False negative/positive trade-off curve
(based on comparison with RegulonDB)
[Figure: trade-off curve, with μ indicated]
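A sketch of how such a curve can be traced, assuming each predicted site has an estimated energy and the reference annotation (for example, RegulonDB sites) is available as a set of identifiers. All names and the toy data are hypothetical. Sweeping the threshold μ trades false negatives (known sites missed) against false positives (predicted sites absent from the reference).

```python
def tradeoff_curve(predicted_energies, reference_sites, thresholds):
    """Return (mu, false_negatives, false_positives) for each threshold.

    predicted_energies: dict mapping site identifier -> estimated energy E(S)
    reference_sites: set of identifiers of experimentally known sites
    """
    curve = []
    for mu in thresholds:
        predicted = {s for s, e in predicted_energies.items() if e < mu}
        fn = len(reference_sites - predicted)  # known sites not predicted
        fp = len(predicted - reference_sites)  # predicted sites not in reference
        curve.append((mu, fn, fp))
    return curve


# Hypothetical toy data.
ENERGIES = {"s1": 0.5, "s2": 1.5, "s3": 2.5, "s4": 3.5}
REFERENCE = {"s1", "s3"}
```

With these toy values, raising μ from 1 to 4 removes the single false negative but introduces two false positives.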
Prediction of binding site modality
E.g. CAP, a dual-function transcription factor (activator or repressor).
Prediction is based on the position of predicted CAP sites relative to predicted promoters (i.e. RNAP sites).
Part I Conclusion
Thinking physically about protein-DNA recognition led to a new and improved bioinformatic algorithm.
The algorithm is designed to correctly estimate ε_{iα} and μ given data on protein binding to oligos under controlled conditions (i.e. a biophysical experiment).
- For bioinformatic data, the algorithm provides:
  - 1) a rational choice of the binding threshold μ;
  - 2) minimization of expected FALSE POSITIVES.
Note: the information-theoretic weight matrix approach does not estimate the threshold score for binding.
Acknowledgements
Anirvan M. Sengupta (Dept. of Physics, Rutgers U.)
Boris I. Shraiman (KITP, UCSB)
Xp10 bacteriophage gene expression strategy
Marko Djordjevic, Columbia U., Department of Physics, Pupin Labs
Overview
- Introduction to bacteriophage biology
- Motivation
- Experimental setup
- Quantitative data analysis
- Bioinformatic analysis
- Kinetic modeling
- Conclusion and more general context
Bacteriophages: bacterial viruses
The bacterium Xanthomonas oryzae causes bacterial leaf blight, a serious disease of rice.
The lytic bacteriophage Xp10 infects Xanthomonas oryzae.
From Yuzenkova et al., J. Mol. Biol. (2003) 330, 735-748.
Photo from Mueller, K.E. 1983. Field Problems of Tropical Rice.
Xp10 genome
- p7 functions:
  - Inhibits transcription from most host RNAP promoters
  - Acts as an anti-termination protein
Xp10 genome organization and anti-termination mechanism resemble those of phage λ, which uses only the host RNAP.
However, this view leaves no role for Xp10 RNAP!
Motivation
Scheme of an experiment
Rifampicin (a drug that inhibits bacterial RNAP): 20-minute incubation.
Transcript abundances were measured quantitatively.
Quantitative data analysis
(E. Semenova, M. Djordjevic, B. Shraiman and K. Severinov, in press, Mol. Microbiol.)
Transcript kinetic analysis when bacterial RNAP is inhibited
$$\Delta(t) = [\text{transcript abundance}]_{t,\ 20\ \text{min Rifampicin}} - [\text{transcript abundance}]_{t}$$
Xp10 RNAP has to transcribe the R genes.
The L genes are transcribed exclusively by the bacterial RNAP.
Determine the half-lives of the L transcripts.
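Because rifampicin blocks the bacterial RNAP, L transcripts (made only by that polymerase) can only decay once the drug is added. A sketch of the half-life estimate, assuming simple first-order (exponential) decay and fitting ln A(t) by ordinary least squares. The function name and the synthetic data are hypothetical.

```python
import math


def half_life(times, abundances):
    """Fit ln A(t) = ln A0 - k t by least squares and return t_half = ln 2 / k.

    Assumes first-order decay, i.e. no new transcription after rifampicin.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two (time, abundance) pairs")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    if var == 0:
        raise ValueError("time points must not all be equal")
    k = -cov / var  # decay rate constant
    if k <= 0:
        raise ValueError("abundance does not decay; half-life is undefined")
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic data (hypothetical): a transcript with a 4-minute half-life.
    t = [0, 5, 10, 15, 20]
    a = [100 * 0.5 ** (ti / 4) for ti in t]
    print(round(half_life(t, a), 3))  # 4.0
```

Fitting on a log scale turns exponential decay into a straight line, so a plain linear regression recovers the rate constant.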
Bioinformatic search for Xp10 promoters
Promoters predicted by the MLSA and QPMEME algorithms
(QPMEME: M. Djordjevic, A. Sengupta and B. Shraiman, Genome Res. 13 (2003))
Experimental verification
What are the contributions of the two RNA polymerases to R gene transcription activity?
Measured transcript abundances
(M. Djordjevic, E. Semenova, B. I. Shraiman and K. Severinov, in preparation)
[Figure: measured transcript abundances; panels labeled 56L and 5R]
- Kinetic model assumptions:
  - Anti-termination efficiency is given by a formula in which n0 and n are unknown constants, different for the two RNA polymerases.
  - Proteins are stable on the time scale of infection.
Estimating transcription activities
Transcription activity from transcript abundance
Early R genes
Late R genes
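One simple way to relate transcription activity J(t) to transcript abundance A(t), assuming first-order kinetics dA/dt = J(t) - kA(t) with k = ln 2 / t_half, is J(t) = dA/dt + kA(t). The sketch below applies this with finite differences between consecutive samples. The function name is hypothetical, and using a single known half-life per transcript is an assumption of the sketch.

```python
import math


def transcription_activity(times, abundances, t_half):
    """Estimate J(t) from dA/dt = J(t) - k A(t), with k = ln 2 / t_half.

    Uses finite differences between consecutive samples and returns a list
    of (midpoint time, J) pairs.
    """
    k = math.log(2) / t_half
    result = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        dA_dt = (abundances[i + 1] - abundances[i]) / dt
        a_mid = 0.5 * (abundances[i] + abundances[i + 1])  # abundance at the midpoint
        result.append((0.5 * (times[i] + times[i + 1]), dA_dt + k * a_mid))
    return result
```

At steady state (constant A), the estimated activity equals kA, i.e. synthesis exactly balances degradation; for pure decay with no synthesis it comes out close to zero.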
Modeling results
Transcription of early R genes
Transcription of L genes
Transcription of late R genes
Is transcription by both RNA polymerases necessary for phage viability?
Experiment: if Rif is added at 15 min, the amount of progeny is reduced by 70.
Our prediction from the estimated transcription activities:
Transcription of R-genes by both host and
phage RNAP is necessary for phage viability!
What have we learned about Xp10?
- We identified promoters recognized by Xp10 RNAP. This was a non-trivial problem!
- The joint transcription of the same set of genes by two types of RNA polymerases is unprecedented for a bacteriophage (but it occurs in chloroplasts)!
- Our results strongly suggest that transcription of R genes by both RNA polymerases is necessary for phage viability.
On a more general level
- We argue that microarray experiments, coupled to quantitative data analysis, bioinformatics and kinetic modeling, present an efficient way to analyze phage transcription strategy.
- We introduce quantitative methods of data analysis that may be used to study gene expression strategies of novel viruses.
Acknowledgements
Ekaterina Semenova (Waksman Institute, Rutgers U.)
Boris Shraiman (KITP, UCSB)
Konstantin Severinov (Waksman Institute, Dept. of Molecular Biology and Biochemistry, Rutgers U.)