Biophysics and Bioinformatics of Transcription Regulation - PowerPoint PPT Presentation

1
Biophysics and Bioinformatics of Transcription
Regulation
Marko Djordjevic, Dept. of Physics, Pupin Labs,
Columbia U.
2
  • Outline
  • Part I: Biophysics approach to transcription
    factor binding site discovery.
  • Part II: Quantitative analysis of bacteriophage
    gene expression strategies.

3
PART I: Transcription factor binding site
identification
  • Introduction to transcription regulation
  • Model for T.F.-DNA interaction
  • Biophysics based algorithm
  • Application to E. coli T.F. binding sites
  • Comparison with the information-theoretic algorithm
  • Conclusion for Part I

4
Control of Gene Expression by Transcription
Factors: the lac Operon
[Figure: lac operon; lacZ states shown: OFF, OFF, OFF, ON]
Alberts et al., Molecular Biology of the Cell.
5
Examples of CAP factor binding sites
attcgtgatagctgtcgtaaagttttgttacctgcctctaacttaagt
gtgacgccgtgcaaataatgccgtgattatagacacttttatttgcga
tgcgtcgcgcattttaatgagattcagatcacatat taatgtgacgtc
ctttgcatacgaaggcgacctgggtcatgctgaggtgttaaattgatc
acgttt
(The binding sites appear as similar "words" in these sequences.)
From experiments, we know some, but NOT all,
binding sites for a given transcription factor.
Can we predict ALL of them?
6
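Biophysical model of T.F.-DNA interactions
Probability that a DNA segment S is bound by a protein present at
concentration c, with chemical potential μ = k_B T ln(c/K):
$$p(S) = \frac{1}{1 + e^{(E(S) - \mu)/k_B T}}$$
Note: p(S) is a Fermi function, which describes saturation of
binding.
To parameterize the binding energy, we use the
independent nucleotide approximation:
$$E(S) = \sum_{i=1}^{L} \sum_{\alpha \in \{A,C,G,T\}} \varepsilon_{i\alpha} S_{i\alpha}$$
where S_iα = 1 if base α occupies position i of S, and 0 otherwise.
A. Sengupta, M. Djordjevic and B. I. Shraiman, PNAS
2002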
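The Fermi-function binding probability and the additive (independent nucleotide) energy described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming energies in units of k_BT, an energy matrix stored as one row per position with columns in the order A, C, G, T, and a hypothetical toy matrix whose preferred sequence is AGC; none of these numbers come from the talk.

```python
import math

BASES = "ACGT"  # column order of the energy matrix (assumption)


def binding_energy(seq, eps):
    """Independent nucleotide approximation: E(S) = sum_i eps[i][base at i]."""
    return sum(eps[i][BASES.index(b)] for i, b in enumerate(seq.upper()))


def chemical_potential(c, K, kT=1.0):
    """mu = kT * ln(c / K)."""
    return kT * math.log(c / K)


def binding_probability(seq, eps, mu, kT=1.0):
    """Fermi function p(S) = 1 / (1 + exp((E(S) - mu) / kT))."""
    return 1.0 / (1.0 + math.exp((binding_energy(seq, eps) - mu) / kT))


# Hypothetical 3-position energy matrix in units of kT:
# 0 for the preferred base, 2 for any other base (preferred sequence AGC).
eps = [[0.0, 2.0, 2.0, 2.0],
       [2.0, 2.0, 0.0, 2.0],
       [2.0, 0.0, 2.0, 2.0]]

mu = chemical_potential(c=10.0, K=1.0)
print(binding_probability("AGC", eps, mu))  # E = 0: close to saturation
print(binding_probability("TTT", eps, mu))  # E = 6: rarely bound
```

Raising c (and hence μ) pushes p(S) toward 1 for every low-energy segment and no further, which is the saturation behavior the Fermi form captures.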
7
What do we want to do?
(M. Djordjevic, A. M. Sengupta and B. I. Shraiman,
Genome Res. 2003)
8
Problem
Mix the protein with a set containing all genomic
sequences of length L. Then extract and sequence
some of the DNA sequences bound by the factor.
Given the set of extracted sequences S_k, determine the
energy matrix ε_iα and the chemical potential μ.
Solution
Maximize the likelihood of having all S_k
bound (at chemical potential μ) and extracted,
and none of the other sequences.
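One way to make the likelihood on this slide concrete is sketched below. It assumes (our assumption, not stated on the slide) that each sequence in a finite pool binds independently, that every bound sequence is extracted, and therefore that the likelihood is the product of p(S_k) over extracted sequences times (1 − p(S)) over all others. The pool of all 4^3 sequences and both energy matrices are hypothetical toy inputs.

```python
import math
from itertools import product

BASES = "ACGT"


def energy(seq, eps):
    """Independent nucleotide energy E(S) = sum_i eps[i][base at i]."""
    return sum(eps[i][BASES.index(b)] for i, b in enumerate(seq))


def log_p_bound(seq, eps, mu, kT=1.0):
    """ln p(S) = -ln(1 + exp(x)), x = (E(S) - mu) / kT; avoids log(0)."""
    x = (energy(seq, eps) - mu) / kT
    return -math.log1p(math.exp(x))


def log_p_unbound(seq, eps, mu, kT=1.0):
    """ln(1 - p(S)) = -ln(1 + exp(-x)), x = (E(S) - mu) / kT."""
    x = (energy(seq, eps) - mu) / kT
    return -math.log1p(math.exp(-x))


def log_likelihood(extracted, pool, eps, mu, kT=1.0):
    """ln[ prod_k p(S_k) * prod_{S in pool, not extracted} (1 - p(S)) ].

    Assumes independent binding and that every bound sequence is extracted.
    """
    chosen = set(extracted)
    ll = sum(log_p_bound(s, eps, mu, kT) for s in chosen)
    ll += sum(log_p_unbound(s, eps, mu, kT) for s in pool if s not in chosen)
    return ll


# Toy pool: all 4^3 sequences of length L = 3; one extracted sequence.
pool = ["".join(p) for p in product(BASES, repeat=3)]
eps_consensus = [[0.0, 2.0, 2.0, 2.0],
                 [2.0, 2.0, 0.0, 2.0],
                 [2.0, 0.0, 2.0, 2.0]]
eps_flat = [[0.0] * 4 for _ in range(3)]
print(log_likelihood(["AGC"], pool, eps_consensus, mu=1.0))
print(log_likelihood(["AGC"], pool, eps_flat, mu=1.0))
```

The matrix that favors AGC and penalizes everything else scores far higher than the flat matrix, because the flat matrix also "binds" the 63 sequences that were not extracted.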
9
Quadratic Programming (QP) Algorithm
  • In the T → 0 (all-or-none) approximation to p(S),
    maximizing the likelihood requires that
  • a) all examples S_k are bound by the T.F., i.e. E(S_k) < μ
  • b) the number of bound random sequences S is minimized
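The two conditions of the all-or-none limit can be checked by brute force for a candidate (ε, μ). This sketch does not solve the quadratic program; it only evaluates both conditions, assuming a uniform random background that is enumerated exhaustively (feasible only for small L). The matrix and example sites are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def energy(seq, eps):
    """Independent nucleotide energy E(S) = sum_i eps[i][base at i]."""
    return sum(eps[i][BASES.index(b)] for i, b in enumerate(seq))


def bound_all_or_none(seq, eps, mu):
    """T -> 0 limit of the Fermi function: p(S) = 1 if E(S) < mu, else 0."""
    return energy(seq, eps) < mu


def check_conditions(examples, eps, mu):
    """Return (all examples bound?, number of bound sequences among all 4^L).

    The count runs over every length-L sequence, so it includes the examples.
    """
    L = len(eps)
    all_examples_bound = all(bound_all_or_none(s, eps, mu) for s in examples)
    n_random_bound = sum(
        bound_all_or_none("".join(p), eps, mu) for p in product(BASES, repeat=L)
    )
    return all_examples_bound, n_random_bound


eps = [[0.0, 2.0, 2.0, 2.0],
       [2.0, 2.0, 0.0, 2.0],
       [2.0, 0.0, 2.0, 2.0]]
examples = ["AGC", "AGA"]
print(check_conditions(examples, eps, mu=2.5))  # (True, 10)
print(check_conditions(examples, eps, mu=1.0))  # (False, 1): AGA has E = 2
```

Lowering μ reduces the number of bound random sequences but eventually loses an example; the QP formulation searches for the (ε, μ) that keeps every example bound while keeping that count small.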

10
Application to transcription factor binding site
identification in E. coli
Start with known sites for 50 T.F.s (and RNA
polymerase) in the DPInteract database (Church lab).
Use the QP algorithm to determine ε, μ for each factor.
Identify all (intergenic) DNA segments S
satisfying E_ε(S) < μ for each (ε, μ) set.
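The identification step (report every segment with E_ε(S) < μ) amounts to a sliding-window scan. This minimal sketch assumes a single input string and scans the forward strand only; a real genome-wide search would also test the reverse complement and restrict the scan to intergenic regions, both omitted here. The matrix and sequence are hypothetical.

```python
BASES = "ACGT"


def energy(seq, eps):
    """Independent nucleotide energy E(S) = sum_i eps[i][base at i]."""
    return sum(eps[i][BASES.index(b)] for i, b in enumerate(seq))


def scan(sequence, eps, mu):
    """Return (start, segment, E) for every length-L window with E(S) < mu.

    Forward strand only; windows containing non-ACGT characters are skipped.
    """
    L = len(eps)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - L + 1):
        window = seq[start:start + L]
        if set(window) <= set(BASES):
            e = energy(window, eps)
            if e < mu:
                hits.append((start, window, e))
    return hits


eps = [[0.0, 2.0, 2.0, 2.0],
       [2.0, 2.0, 0.0, 2.0],
       [2.0, 0.0, 2.0, 2.0]]
print(scan("ttagcaagctt", eps, mu=1.0))  # [(2, 'AGC', 0.0), (6, 'AGC', 0.0)]
```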
11
Empirical distribution of estimated E
[Figure: distribution of E compared with the background distribution; threshold μ marked]
12
Sample results
for pleiotropic factors
13
RNAP (RpoD) site statistics
Note: 50 false negatives in promoter
prediction; these can be reduced by
combining with an activator-TF search.
14
Information-theoretic weight matrix
No natural threshold exists.
The score is not the correct binding probability.
Saturation effects are not properly described.
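For contrast, here is a minimal sketch of a standard log-odds (information-theoretic) weight matrix, W_iα = ln(f_iα / q_α), built from aligned sites. The pseudocount of 0.5, the uniform background q_α = 1/4 and the example sites are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=0.5, background=None):
    """W[i][b] = ln(f_ib / q_b) from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    n, L = len(sites), len(sites[0])
    total = n + 4 * pseudocount
    W = []
    for i in range(L):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        W.append({b: math.log(counts[b] / total / background[b]) for b in BASES})
    return W


def score(seq, W):
    """Sum of per-position log-odds; higher means more site-like."""
    return sum(W[i][b] for i, b in enumerate(seq))


sites = ["AGC", "AGC", "AGA", "TGC"]  # hypothetical aligned sites
W = log_odds_matrix(sites)
print(score("AGC", W), score("TTT", W))
```

Nothing in W says which score separates sites from non-sites, and the score is not passed through a saturating function; any cutoff must be chosen separately, unlike the E(S) < μ rule with μ fitted to the data.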
15
False negative/positive trade-off curve
(based on comparison with RegulonDB)
[Figure: trade-off curve as a function of the threshold μ]
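A trade-off curve of this kind can be produced by sweeping the threshold μ. In this sketch a known site counts as a false negative when E ≥ μ, and a segment not among the known sites counts as a false positive when E < μ. Treating every prediction absent from the reference set as a false positive is a simplifying assumption, and the energies are hypothetical.

```python
def tradeoff_curve(true_energies, background_energies, thresholds):
    """For each mu, return (mu, false-negative fraction, false-positive count).

    Uses the same 'E(S) < mu means bound' rule as the prediction step.
    """
    curve = []
    for mu in thresholds:
        fn = sum(e >= mu for e in true_energies) / len(true_energies)
        fp = sum(e < mu for e in background_energies)
        curve.append((mu, fn, fp))
    return curve


known_sites = [0.0, 1.0, 2.0, 3.0]          # hypothetical energies of known sites
other_segments = [1.5, 2.5, 4.0, 5.0, 6.0]  # hypothetical non-site energies
for point in tradeoff_curve(known_sites, other_segments, [1.0, 3.0, 5.0]):
    print(point)
```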
16
Prediction of binding site modality
e.g. CAP, a dual-function transcription factor
Based on the position of predicted CAP sites relative
to predicted promoters (i.e. RNAP sites)
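A position-based modality call might look like the hypothetical sketch below: a site well upstream of the promoter is labeled an activator site, and one overlapping the promoter core is labeled a repressor site. The cutoffs (−35 and +20 relative to the transcription start site) and the function name are placeholders chosen for illustration, not values or code from the talk.

```python
def predict_modality(site_center, tss, strand="+",
                     activator_max=-35.0, repressor_max=20.0):
    """Classify a predicted TF site by its position relative to a promoter.

    Position is measured relative to the transcription start site (0 = TSS,
    negative = upstream). Both cutoffs are hypothetical placeholders.
    """
    rel = site_center - tss if strand == "+" else tss - site_center
    if rel < activator_max:
        return "activator"      # upstream of the core promoter
    if rel <= repressor_max:
        return "repressor"      # overlaps the core promoter region
    return "unclassified"       # too far downstream for this simple rule


print(predict_modality(938.5, 1000))              # site centered at -61.5
print(predict_modality(1061.5, 1000, strand="-")) # same offset, minus strand
```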
17
Part I Conclusion
Thinking physically about protein-DNA
recognition led to a new and improved
bioinformatic algorithm.
The algorithm is designed to correctly estimate
ε_iα and μ given data on protein binding to oligos
under controlled conditions (i.e. a
biophysical experiment).
  • For bioinformatic data, the algorithm provides
  • 1) Rational choice of a binding threshold μ
  • 2) Minimization of expected FALSE POSITIVES.

Note: The information-theoretic weight matrix
approach does not estimate the
threshold score for binding.
18
Acknowledgements
Anirvan M. Sengupta (Dept. of Physics, Rutgers
U.) Boris I. Shraiman (KITP, UCSB)
19
Xp10 bacteriophage gene expression strategy
Marko Djordjevic, Columbia U., Department of Physics,
Pupin Labs
20
Overview
  • Introduction to bacteriophage biology
  • Motivation
  • Experimental setup
  • Quantitative data analysis
  • Bioinformatic analysis
  • Kinetic modeling
  • Conclusion and more general context

21
Bacteriophages - bacterial viruses
22
The bacterium Xanthomonas oryzae causes bacterial
leaf blight, a serious disease of rice.
The lytic bacteriophage Xp10 infects Xanthomonas
oryzae.
From Yuzenkova et al., J. Mol. Biol. (2003) 330,
735-748
Photo from Mueller, K.E. 1983. Field Problems of
Tropical Rice
23
Xp10 genome
  • p7 functions:
    • Inhibits transcription from most host RNAP
      promoters
    • Acts as an anti-termination protein

Xp10 genome organization and anti-termination
mechanism resemble those of phage λ, which uses only
the host RNAP.
However, this view leaves no role for Xp10 RNAP!
24
Motivation
25
Scheme of an experiment
Rifampicin (a drug that inhibits bacterial
RNAP); 20-minute incubation
Transcript abundances were measured quantitatively.
26
Quantitative data analysis
(E. Semenova, M. Djordjevic, B. Shraiman and K.
Severinov, in press, Mol. Microbiol.)
27
Transcript kinetic analysis when bacterial RNAP
is inhibited
Transcript abundance (t + 20 min, Rifampicin)
divided by transcript abundance (t)
Xp10 RNAP has to transcribe the R genes.
L genes are transcribed exclusively by bacterial
RNAP.
Determine the half-lives of L transcripts.
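If a transcript is made only by bacterial RNAP, rifampicin stops its synthesis, and its abundance can only decay. Assuming simple exponential decay over the 20-minute incubation (our modeling assumption), the ratio of abundance after the incubation to abundance at the time of Rif addition gives the half-life directly, as in this sketch:

```python
import math


def half_life_from_ratio(ratio, dt=20.0):
    """Half-life (same units as dt), assuming pure exponential decay:
    A(t + dt) = A(t) * 2 ** (-dt / t_half), i.e. no new synthesis after Rif.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be in (0, 1) for a decaying transcript")
    return dt * math.log(2) / -math.log(ratio)


# A quarter of the transcript left after 20 min means two half-lives elapsed.
print(half_life_from_ratio(0.25))
```

A ratio near or above 1 cannot be explained by decay alone, which is consistent with a transcript still being made by an RNAP that rifampicin does not block.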
28
Bioinformatic search for Xp10 promoters
Promoters predicted by the MLSA and QPMEME algorithms
(QPMEME: M. Djordjevic, A. Sengupta and B.
Shraiman, Genome Res. 13 (2003))
Experimental verification
29
What are the contributions of the two RNA polymerases to
R gene transcription activity?
Measured transcript abundances
(M. Djordjevic, E. Semenova, B. I. Shraiman and K.
Severinov, in preparation)
[Figure: measured transcript abundances; panels labeled 56L and 5R]
  • Kinetic model assumptions:
  • Anti-termination efficiency given by
  • n0 and n are unknown constants, different for
    the two RNA polymerases
  • Proteins are stable on the time scale of
    infection


30
Estimating transcription activities
Transcription activity from transcript abundance
Early R genes
Late R genes
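Estimating transcription activity from abundance can be sketched with a first-order kinetic model, dA/dt = a(t) − k·A(t) with k = ln 2 / t_half, so that a(t) = dA/dt + k·A(t). The first-order decay, the finite-difference derivative and the function name are our assumptions for illustration; the talk's own model also includes anti-termination terms that are not reproduced here.

```python
import math


def transcription_activity(times, abundances, t_half):
    """Estimate a(t) = dA/dt + k * A(t), with k = ln 2 / t_half.

    Assumes first-order transcript decay. dA/dt uses central differences,
    one-sided at the ends; needs at least two time points.
    """
    k = math.log(2) / t_half
    n = len(times)
    activity = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dA = (abundances[hi] - abundances[lo]) / (times[hi] - times[lo])
        activity.append(dA + k * abundances[i])
    return activity


# Hypothetical time course (minutes, arbitrary abundance units).
print(transcription_activity([0, 5, 10, 15], [1.0, 3.0, 4.0, 4.5], t_half=5.0))
```

A constant abundance with nonzero decay implies constant, nonzero synthesis; that is why abundance alone does not show when a gene is actively transcribed.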
31
Modeling results
Transcription of early R genes
Transcription of L genes
Transcription of late R genes
32
Is transcription by both RNA polymerases
necessary for phage viability?
Experiment: if Rif is added at 15 min, the amount
of progeny is reduced by 70.
Our prediction from the estimated transcription
activities:
Transcription of R genes by both host and
phage RNAP is necessary for phage viability!
33
What have we learned about Xp10?
  • We identified promoters recognized by Xp10 RNAP.
    This was a non-trivial problem!
  • The joint transcription of the same set of genes
    by two types of RNA polymerases is unprecedented
    for a bacteriophage (but it occurs in
    chloroplasts)!
  • Our results strongly suggest that transcription
    of R genes by both RNA polymerases is necessary
    for phage viability.

34
On a more general level
  • We argue that a microarray experiment, coupled to
    quantitative data analysis, bioinformatics and
    kinetic modeling, is an efficient way to
    analyze phage transcription strategy.
  • We introduce quantitative methods of data analysis
    that may be used to study gene expression
    strategies of novel viruses.

35
Acknowledgements
Ekaterina Semenova (Waksman Institute, Rutgers
U.) Boris Shraiman (KITP, UCSB) Konstantin
Severinov (Waksman Institute, Dept. of Molecular
Biology and Biochemistry, Rutgers U.)