Title: A Computational Method to Increase Gene Expression
1 A Computational Method to Increase Gene Expression
- Harlan Robins
- Computational Biology
- FHCRC
2 HIV genes are poorly expressed
Cullen BR, Trends Biochem Sci, 2003 Aug;28(8).
3 Problem: an HIV vaccine is likely less effective with poor protein expression
- Immune response correlates positively with antigen levels
- For DNA vaccines, the antigen is the expressed protein
4 Solution: increase protein expression
- Traditional method is codon optimization
- The idea is to recode an ORF with codons corresponding to the highest-abundance tRNAs
- This method is designed to optimize translation
- Codon optimization has had some success in HIV
- Gag, Pol, Env expression can increase 500-fold
- For HIV a new method is needed
- The poor expression of HIV genes is caused by nuclear isolation, not translation
- Gain from codon usage is an indirect effect
5 What else is playing a role besides codon usage?
- Short motifs are a good starting point
- Many RNA binding proteins
- It has been suggested that more proteins bind RNA than DNA
- Statistically meaningful for real genome lengths
6 How do we find motifs in coding regions?
- Two sequences
- Compare the two sequences: search for motifs that are over- or under-represented in one sequence compared to the other
- Single sequence with constraints
- Create a second sequence: the sequence that is maximally random given the constraints (the maximal entropy sequence)
- Apply the same procedure as for two sequences
7 Form a Distribution (from a sequence)
- We need to form a probability distribution from a sequence.
- For length n, we have i = 1, ..., 4^n words. P(s_i) is the fraction of length-n words with sequence s_i.
Example: if our sequence is 40 bases
AGACTAATTGCGTAGCATAATCATGCATGTCGATGCGATT
P(GCAT) = 2/(40-3) ≈ 0.054
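The counting in this example can be reproduced with a short sketch. It assumes overlapping windows, so a sequence of length L contributes L - n + 1 words; the function name `word_distribution` is illustrative and not from the talk.

```python
from collections import Counter


def word_distribution(seq: str, n: int) -> dict[str, float]:
    """Return the fraction of overlapping length-n windows equal to each word."""
    total = len(seq) - n + 1  # number of overlapping windows
    if total <= 0:
        raise ValueError("sequence is shorter than the word length")
    counts = Counter(seq[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


# The 40-base sequence from the slide.
seq = "AGACTAATTGCGTAGCATAATCATGCATGTCGATGCGATT"
dist = word_distribution(seq, 4)
print(round(dist["GCAT"], 3))  # 2 of 37 windows -> 0.054
```

Words that never occur are simply absent from the returned dictionary, so their probability should be read as zero.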
8 Back to finding motifs
- First goal: start with
- a probability distribution (that we get from a sequence)
- a set of constraints on that sequence (i.e. the sequence must code for a particular protein).
- Produce the most random distribution possible given the constraints.
- This will be our first approximation for a background genome.
9 Coding sequence motifs
- Constraints
- Preserve amino acid order
- Preserve codon usage in each gene
- Get the Maximum Entropy Distribution (MED) from the sequence by randomly permuting the codons for each amino acid within each gene.
- For example, imagine a short gene with the following AA and coding sequences:
AA:    M   L1  L2  H1  L3  H2  L4  H3  ST
Codon: ATG CTA CTG CAT TTA CAT CTG CTT TAG
Randomly permute L1, L2, L3, L4 and H1, H2, H3, extracting the fraction of each word of length n. The MED is the average of many runs.
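A minimal sketch of the permutation procedure follows, assuming the standard genetic code and a single in-frame ORF. The gene `ORF`, the function names, and the `runs` and `seed` parameters are hypothetical choices for illustration, not taken from the talk. Codons are shuffled only among positions that encode the same amino acid, so the protein and the per-gene codon usage are both preserved.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def split_codons(orf):
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf):
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def shuffle_synonymous(orf, rng):
    """Permute codons among the positions that share an amino acid."""
    codons = split_codons(orf)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def word_distribution(seq, n):
    total = len(seq) - n + 1
    counts = Counter(seq[i:i + n] for i in range(total))
    return {w: c / total for w, c in counts.items()}


def med_estimate(orf, n, runs=1000, seed=0):
    """Average the length-n word distribution over many shuffled copies."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(runs):
        shuffled = shuffle_synonymous(orf, rng)
        for word, p in word_distribution(shuffled, n).items():
            totals[word] += p
    return {w: t / runs for w, t in totals.items()}


# Hypothetical short gene: M L L H L H L L H stop.
ORF = "ATGCTACTGCATTTACACCTGCTTCATTAG"
med = med_estimate(ORF, n=4, runs=200)
```

For a whole genome, the same shuffle would be applied gene by gene and the word counts pooled before averaging.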
10 Now that we have the two distributions, we need a strategy to find motifs
- We want motifs that are over- or under-represented in the real distribution as compared to the MED.
- Choose the word (motif) that contributes the largest number of bits to the entropy difference between the two distributions.
- Reminder: for this talk we are only considering non-degenerate motifs.
- For each motif s_i with P(s_i), split the distribution into 2 parts: P(s_i) and 1 - P(s_i).
- Find the word s_i which has the maximum relative entropy.
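The selection step can be sketched as below. The sketch assumes the relative entropy is taken from the real distribution to the MED, in bits, over the two-part split {word, everything else}; that direction is an assumption, as are the function names and the toy numbers. Words with zero probability in the MED are skipped, which is a simplification.

```python
import math


def binary_relative_entropy(p, q):
    """D_KL of the split (p, 1 - p) against (q, 1 - q), in bits."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)
    return term(p, q) + term(1 - p, 1 - q)


def top_motif(real, med):
    """Return the word with the largest two-part relative entropy and its score."""
    scores = {}
    for word in set(real) | set(med):
        p, q = real.get(word, 0.0), med.get(word, 0.0)
        if 0 < q < 1:  # skip words the MED cannot score
            scores[word] = binary_relative_entropy(p, q)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy distributions over three words (hypothetical values, not real data).
real = {"CTAG": 0.01, "GCAT": 0.30, "AAAA": 0.69}
med = {"CTAG": 0.10, "GCAT": 0.28, "AAAA": 0.62}
best, bits = top_motif(real, med)
print(best)  # CTAG: strongly under-represented relative to the MED
```

The score does not say whether a word is over- or under-represented; comparing P(s_i) in the two distributions gives the direction.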
11 A problem arises: how do we find the next motif?
Example: CTAG is strongly selected against (as is true in many bacteria because it is a restriction site).
12 Solution (simplified version): rescale the MED to remove the contribution of word s_i from D_KL
For all i ≠ i_max
This rescaling is proven to monotonically decrease D_KL.
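The rescaling formula itself is not given in the text here. One natural choice, assumed for this sketch and not confirmed as the speaker's exact form, sets the MED probability of the chosen word equal to its real probability and multiplies every other word by (1 - P(s_imax)) / (1 - Q(s_imax)), where Q denotes the MED, so the distribution still sums to 1. Under that assumption, D_KL drops by exactly the two-part relative entropy of the chosen word. Function names and toy numbers are hypothetical.

```python
import math


def kl_bits(p, q):
    """D_KL(p || q) in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def rescale_med(real, med, word):
    """Match the MED to the real distribution at `word`, rescaling the rest."""
    p, q = real[word], med[word]
    scale = (1 - p) / (1 - q)  # keeps the total probability at 1
    rescaled = {w: v * scale for w, v in med.items() if w != word}
    rescaled[word] = p
    return rescaled


# Toy distributions over three words (hypothetical values, not real data).
real = {"CTAG": 0.01, "GCAT": 0.30, "AAAA": 0.69}
med = {"CTAG": 0.10, "GCAT": 0.28, "AAAA": 0.62}

before = kl_bits(real, med)
med2 = rescale_med(real, med, "CTAG")
after = kl_bits(real, med2)
print(after < before)  # True
```

Repeating the select-then-rescale cycle yields an ordered list of motifs, each scored against a background that already accounts for the earlier ones.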
13 List of motifs for E. coli
- + or - means over- or under-represented
- Many motifs are known restriction sites
14 Gag expression in transiently transfected 293 cells
- The table presents the results from 4 independent transfection experiments
- Two DNA doses (0.5 and 1 µg) were used for transfection of 293 cells
- Culture media were collected at 48 and 72 hours post transfection
- Gag expression was quantified by p24 measurement
- Results are expressed as the mean value (ng/ml, ±SD) of triplicates
15 Immunogenicity of Gag DNA vaccines in mouse (B-cell or antibody)
Anti-p24 Ab response measured by ELISA
Robins Gag
16 Immunogenicity of Gag DNA vaccines in mouse (T-cell or cellular)
p24-specific CD4 and CD8 cellular immune responses measured by IFN-γ ELISpot
Robins Gag
17 Experimental program by Greg Spies with Julie McElrath, Steve Self, and Larry Corey
- Optimize VRC version of Gag incorporating full set of motifs
- Create Adenovirus vector with optimized Gag and GFP tag
- Test vaccine potency
- In vitro
- Mouse model
- Move to primates