Title: Lecture 6: Gene Prediction
1Lecture 6 Gene Prediction
- Chapter 6
- Part 1 Prokaryotic gene organization and gene
prediction
2Review of Molecular Genetics
3- Promoter
- -10 and -35 regions upstream of the TSS, where sigma factor recognizes a promoter
- Operator (where regulator binds) found between TSS and SD
- Transcriptional start site (TSS) occurs at the tip of the yellow arrow.
- Shine-Dalgarno sequence
- Start codon and stop codon and concept of ORF
- UTR of the transcript.
4- Which strand?
- Template strand
- Coding (Sense) Strand
- Recognizing theoretical ORFs
- Start and Stop in same frame
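A minimal sketch of "start and stop in the same frame", assuming the input is the coding strand written 5' to 3' as DNA. The function name `theoretical_orfs` is hypothetical, chosen for illustration. The pattern reads ATG, then whole codons only, and ends at the first stop codon reached in that same frame.

```python
import re

# ATG, then as few whole codons as possible, then the first in-frame stop.
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

def theoretical_orfs(coding_strand):
    """Return (start_index, orf_sequence) pairs read 5' to 3' on one strand."""
    seq = coding_strand.upper()
    return [(m.start(), m.group()) for m in ORF_PATTERN.finditer(seq)]

# ATG AAA TGA: the stop is in the same frame as the start.
print(theoretical_orfs("CCATGAAATGATT"))
# ATG ATA GCC: the only stop triplet (TAG) is out of frame, so nothing is reported.
print(theoretical_orfs("ATGATAGCC"))
```

Because `finditer` returns non-overlapping matches, ORFs nested inside or overlapping a reported ORF are skipped; this is a simplification, not how a full ORF finder behaves.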
5Operon Model
- Operon
- Polycistronic mRNA unique to prokaryotes
- EX) Lac operon: beta-galactosidase, lactose permease, and lactose transacetylase are all under control of the same promoter, so they are regulated together via the LacI repressor
- Weak versus strong promoters
6Review of Terms
- Template strand
- Coding strand (sense strand)
- Software typically looks for ORFs on the coding strand in the 5' to 3' direction: 5'-ATG ... TGA-3' (the stop codon can also be TAA or TAG)
7Terminology
- ORF
- A series of DNA codons, including a 5' initiation codon and a termination codon, that encodes a putative or known gene.
- Exons
- Portions of the ORF that are transcribed and, when combined, form the coding sequence (CDS) for the gene
- Introns
- Portions of the ORF that are transcribed and are spliced out of the mRNA before translation.
- Untranslated regions (UTRs)
- Non-coding regions that are transcribed and flank the ORF (for DNA) and CDS (for mRNA)
- 5' end (relative to mRNA) UTRs (leader, regulatory sites)
- 3' end UTRs (terminator sites, trailer)
- cDNA (Complementary DNA)
- Get cDNA from reverse transcription of CDS
- CDS
- How to get it in the laboratory.
- How to get it on paper.
8Eukaryotic Gene
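[Figure: eukaryotic gene, from DNA to mRNA]
- DNA template strand (nucleus): upstream region, promoter, transcription (trc) start site, exons separated by introns, terminator, downstream region
- Transcription produces the pre-mRNA (nucleus)
- RNA processing produces the mRNA: 5' cap (GPPP), leader, start codon, stop codon, trailer, 3' poly(A) tail (AAA)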
9Another Example
10 [Figure: making cDNA from mRNA]
- mRNA, written 5' to 3', ends in a poly(A) tail (AAAAAA).
- Add an oligo(dT) primer (TTTTTT), which complements the poly(A) tail.
- Bottom strand (DNA) synthesized by reverse transcriptase, giving an mRNA/DNA hybrid.
- Ribonuclease H degrades the RNA.
- Second strand of DNA synthesized by DNA Polymerase I.
- Result: double-stranded cDNA.
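The cDNA steps on this slide can be mirrored on paper. The sketch below is illustrative only: the function names and the toy mRNA are hypothetical, and priming, enzymes, and RNase H are not modeled. It shows that the first strand is the DNA reverse complement of the mRNA, and the second strand restores the mRNA sequence with T in place of U.

```python
# Base-pairing tables: RNA template to DNA product, and DNA to DNA.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def first_strand_cdna(mrna):
    """DNA strand copied from the mRNA by reverse transcriptase, written 5' to 3'."""
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]

def second_strand_cdna(first_strand):
    """DNA strand copied from the first strand, written 5' to 3'."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]

# Toy mRNA (made up for this example): a short ORF followed by a poly(A) tail.
mrna = "AUGGCUUGA" + "AAAAAA"
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)

print(first)   # begins with the T run that pairs with the poly(A) tail
print(second)  # same as the mRNA, with T instead of U
```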
11Finding Genes in Prokaryotes
12Prediction Strategies for Prokaryotes
- Start and stop codons
- 83% of E. coli start codons are AUG in mRNA (UUG and GUG occur less often)
- Start in DNA coding strand (5' to 3')?
- Stop in DNA?
- Size
- Stop codons occur randomly about every 21 codons in noncoding DNA
- If you have a run of more than 30 sense codons, then you may have a coding region (an ORF is possible)
- Average length of a coding region is 317 codons; less than 1.8% of all genes are shorter than 60 codons
- -35 and -10 recognition for sigma factors
- Transcriptional termination signals
- Inverted repeats followed by a run of uracils found at the 3' end
- Forms a stem-loop structure, which signifies termination
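The "every 21 codons" figure follows from 3 of the 64 codons being stops. The sketch below works through that arithmetic under a loud assumption: all 64 codons are equally likely in noncoding DNA, which real genomes with skewed base composition do not satisfy. The function names are hypothetical.

```python
# Assumption: every codon is equally likely, so 3 of 64 random codons are stops.
P_STOP = 3 / 64

def expected_codons_between_stops(p_stop=P_STOP):
    """Average spacing of stop codons in random sequence (64/3, about 21)."""
    return 1 / p_stop

def prob_open_run(n_codons, p_stop=P_STOP):
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1 - p_stop) ** n_codons

print(expected_codons_between_stops())
# Chance of an open run as long as the thresholds quoted on the slide.
for n in (30, 60, 317):
    print(n, prob_open_run(n))
```

The probabilities fall off quickly with length, which is why longer open runs are better evidence of a coding region than runs near the 30-codon cutoff.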
13Prediction Strategies for Prokaryotes
- Comparison to a database of known sequences
- Look for homology: if it shows homology, then it cannot be junk
- Problem: the database is incomplete
- Lack of homology doesn't mean it isn't a true gene
- Shine-Dalgarno recognition
- AGGAGGU
- Upstream from the start codon, downstream from the trc start site
- Regulatory sites found upstream from the trc start site
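Shine-Dalgarno recognition can be sketched as a search for the best ungapped match to the consensus in a short window upstream of a candidate start codon. Everything beyond the AGGAGGU consensus is an assumption for illustration: the function name `best_sd_match`, the 20-base window, and the simple match-count score are not taken from the lecture or from any particular gene finder.

```python
SD_CONSENSUS = "AGGAGGT"  # DNA (coding strand) form of the mRNA consensus AGGAGGU

def best_sd_match(coding_strand, start_idx, window=20):
    """Return (matches, position, site) for the best ungapped consensus match
    lying entirely within `window` bases upstream of the start codon."""
    seq = coding_strand.upper()
    lo = max(0, start_idx - window)  # window size is an assumed parameter
    best = (0, None, "")
    for pos in range(lo, start_idx - len(SD_CONSENSUS) + 1):
        site = seq[pos:pos + len(SD_CONSENSUS)]
        matches = sum(a == b for a, b in zip(site, SD_CONSENSUS))
        if matches > best[0]:
            best = (matches, pos, site)
    return best

# Made-up sequence: a perfect site a few bases upstream of ATG.
seq = "TTTAGGAGGTACGTCATGAAA"
start = seq.find("ATG")
print(best_sd_match(seq, start))
```

A partial match (for example 5 of 7 positions) may still be a functional site, so in practice the score would be treated as one supporting criterion among several.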
14Where does the Reading Frame Start?
- Use a 6-frame search because we don't know
- 3 reading frames on each strand
- DS DNA
- 5'-tacgtactcaacaatcatgagctggccattttaa-3'
- 3'-atgcatgagttgttagtactcgaccggtaaaatt-5'
- Search for atg in the 5' to 3' direction, which will represent your start codon
- Top strand
- 5'-tacgtactcaacaatcatgagctggccattttaa-3'
- Bottom strand
- 5'-ttaaaatggccagctcatgattgttgagtacgta-3'
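The bottom strand written 5' to 3' is the reverse complement of the top strand. A short sketch using the slide's own sequence (the helper name `reverse_complement` is chosen for illustration):

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(strand):
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]

top = "tacgtactcaacaatcatgagctggccattttaa"
bottom = reverse_complement(top)
print(bottom)  # matches the bottom strand shown on the slide
```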
15Top strand 5'-tacgtactcaacaatcatgagctggccattttaa-3'
Bottom strand 5'-ttaaaatggccagctcatgattgttgagtacgta-3'
In reality there need to be enough codons between the start and stop to represent a real ORF. How many?
- At least 30 codons (90 bases).
- Average 317 codons (951 bases)
- Less than 2% are less than 60 codons (180 bases)
ORF Finder will designate frames. It assumes the single strand you put into the program is the coding (sense) strand in the 5' to 3' direction, and it calls this the plus strand. It works out the opposite strand and calls this minus.
Frames: +1, +2, +3, -1, -2, -3
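A six-frame search can be sketched as three frames on the input (plus) strand and three on its reverse complement (minus). This is a simplified illustration, not ORF Finder's actual algorithm. The function names, the atg-only start rule, and the choice to number minus frames from the 5' end of the reverse complement are assumptions, so the frame labels may not match what ORF Finder reports.

```python
STOPS = {"taa", "tag", "tga"}
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(strand):
    return strand.translate(COMPLEMENT)[::-1]

def orfs_on_strand(strand, min_sense_codons):
    """Scan the three frames of one strand; return (frame, orf) pairs."""
    found = []
    for frame in range(3):
        start = None  # index of the first atg seen since the last stop
        for i in range(frame, len(strand) - 2, 3):
            codon = strand[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOPS:
                # Sense codons = start codon through the codon before the stop.
                if (i - start) // 3 >= min_sense_codons:
                    found.append((frame + 1, strand[start:i + 3]))
                start = None
    return found

def six_frame_orfs(plus_strand, min_sense_codons=30):
    """Frames 1..3 on the plus strand, -1..-3 on the minus strand."""
    plus = plus_strand.lower()
    minus = reverse_complement(plus)
    hits = orfs_on_strand(plus, min_sense_codons)
    hits += [(-f, orf) for f, orf in orfs_on_strand(minus, min_sense_codons)]
    return hits

top = "tacgtactcaacaatcatgagctggccattttaa"
print(six_frame_orfs(top, min_sense_codons=1))  # tiny cutoff for the toy sequence
print(six_frame_orfs(top))                      # 30-codon cutoff
```

With the cutoff lowered to 1, the slide's sequence yields one short ORF on each strand; with the 30-codon minimum, neither passes, which is the point of the size criterion.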
16Question to Ponder!
- What if you find an ORF in a prokaryote with several supporting criteria, but you don't find the promoter region close by upstream?
17Gene Prediction Software
- ORF Finder
- http://www.ncbi.nlm.nih.gov/gorf/gorf.html
- Finds all start and stop codons
- Sorts by size
- Links for easy BLAST
- Useful for Prokaryote ORF finding
- Not very useful for Eukaryote DNA
- Prok Practice together
18Practice
- Download the prok practice sequence from the course web page.
- Copy the sequence and open ORF Finder.
- Paste the sequence into ORF Finder and run.
- Look at Blast output of each possible ORF.
- Look at sizes of putative ORFs