BCB 444/544 - PowerPoint PPT Presentation

1
BCB 444/544
  • Lecture 26
  • Gene Prediction
  • 26_Oct22

2
Required Reading (before lecture)
  • Mon Oct 22 - Lecture 26
  • Gene Prediction
  • Chp 8 - pp 97 - 112
  • Wed Oct 24 - Lecture 27 (will not be covered
    on Exam 2)
  • Regulatory Element Prediction
  • Chp 9 - pp 113 - 126
  • Thurs Oct 25 - Review Session & Project Planning
  • Fri Oct 26 - EXAM 2

3
Assignments & Announcements
  • Sun Oct 21 - Study Guide for Exam 2 was posted
  • Mon Oct 22 - HW4 Due
  • (no "correct" answer to post)
  • Thu Oct 25 - Lab: Optional Review Session for
    Exam
  • 544 Project Planning/Consult with DD & MT
  • Fri Oct 26 - Exam 2 - Will cover
  • Lectures 13-26 (thru Mon Oct 22)
  • Labs 5-8
  • HW 3 & 4
  • All assigned reading
  • Chps 6 (beginning with HMMs), 7-8, 12-16
  • Eddy: What is an HMM
  • Ginalski: Practical Lessons

4
BCB 544 "Team" Projects
  • 544 Extra HW2 is next step in Team Projects
  • Write 1 page outline
  • Schedule meeting with Michael & Drena to discuss
    topic
  • Read a few papers
  • Write a more detailed plan
  • You may work alone if you prefer
  • Last week of classes will be devoted to Projects
  • Written reports due Mon Dec 3 (no class that
    day)
  • Oral presentations (15-20') will be Wed-Fri Dec
    5,6,7
  • 1 or 2 teams will present during each class
    period
  • See Guidelines for Projects posted online

5
BCB 544 Only: New Homework Assignment
  • 544 Extra2 (posted online Thurs?)
  • No - sorry! sent by email on Sat
  • Due: PART 1 - ASAP
  • PART 2 - Fri Nov 2 by 5 PM
  • Part 1 - Brief outline of Project, email to Drena
    & Michael
  • after response/approval, then
  • Part 2 - More detailed outline of project
  • Read a few papers and summarize status of
    problem
  • Schedule meeting with Drena & Michael to
    discuss ideas

6
Seminars this Week
  • BCB List of URLs for Seminars related to
    Bioinformatics
  • http://www.bcb.iastate.edu/seminars/index.html
  • Oct 25 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Dave Segal, UC Davis: Zinc Finger Protein Design
  • Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Guang Song, ComS, ISU: Probing functional
    mechanisms by structure-based modeling and
    simulations

7
Chp 16 - RNA Structure Prediction
  • SECTION V: STRUCTURAL BIOINFORMATICS
  • Xiong: Chp 16 RNA Structure Prediction
    (Terribilini)
  • RNA Function
  • Types of RNA Structures
  • RNA Secondary Structure Prediction Methods
  • Ab Initio Approach
  • Comparative Approach
  • Performance Evaluation

8
Covalent & non-covalent bonds in RNA
This is a new slide
  • Primary
  • Covalent bonds
  • Secondary/Tertiary
  • Non-covalent bonds
  • H-bonds
  • (base-pairing)
  • Base stacking

Fig 6.2 Baxevanis & Ouellette 2005
9
RNA Pseudoknots & Tetraloops
This is a new slide
  • Often have important regulatory or catalytic
    functions

Pseudoknot
Tetraloop
http://academic.brooklyn.cuny.edu/chem/zhuang/QD/mckay_hr.gif
http://www.lbl.gov/Science-Articles/Research-Review/Annual-Reports/1995/images/rna.gif
10
Base Pairing in RNA
This slide has been changed
  • G-C, A-U, G-U ("wobble") & many variants

See IMB Image Library of Biological Molecules
http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs
11
RNA Secondary Structure Prediction Methods
This slide has been changed
  • Two (three, recently) main types of methods
  • Ab initio - based on calculating most
    energetically favorable secondary structure(s)
  • Energy minimization (thermodynamics)
  • Comparative approach - based on comparisons of
    multiple evolutionarily-related RNA sequences
  • Sequence comparison (co-variation)
  • Combined computational & experimental
  • Use experimental constraints when available
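One common way to quantify the co-variation that the comparative approach relies on is the mutual information between two columns of a multiple sequence alignment. The sketch below is an illustration only: the function name `mutual_information` and the toy columns are assumptions made here, not something taken from the lecture or from any particular program.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings holding the residue that
    each aligned sequence has at positions i and j.
    """
    n = len(col_i)
    freq_i = Counter(col_i)               # single-column counts at i
    freq_j = Counter(col_j)               # single-column counts at j
    freq_ij = Counter(zip(col_i, col_j))  # joint counts of (i, j) pairs
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Two columns that always change together (G-C, C-G, A-U, U-A):
# this pattern is what a conserved base-pair looks like.
print(mutual_information("GCAU", "CGUA"))
# Two perfectly conserved columns carry no co-variation signal.
print(mutual_information("GGGG", "CCCC"))
```

High mutual information between two columns suggests compensating changes that preserve a base-pair; fully conserved columns score zero even if they do pair, which is one reason co-variation is usually combined with other evidence.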

12
RNA Secondary structure prediction - 3
This is a new slide
3) Combined experimental & computational
  • Experiments
  • Map single-stranded vs double-stranded regions
    in folded RNA
  • How?
  • Enzymes: S1 nuclease, T1 RNase
  • Chemicals: kethoxal, DMS, OH• (hydroxyl radical)
  • Software
  • Mfold
  • Sfold
  • RNAStructure
  • RNAFold
  • RNAlifold

13
Ab Initio Prediction: Clarifications
This slide has been changed
  • Free energy is calculated based on parameters
    determined in the wet lab
  • Correction: Use known energy associated with
    each type of nearest-neighbor pair
    (base-stacking), not each individual base-pair
  • Base-pair formation is not independent: multiple
    base-pairs adjacent to each other are more
    favorable than individual base-pairs -
    cooperative - because of base-stacking
    interactions
  • Bulges and loops adjacent to base-pairs have a
    free energy penalty

14
Energy minimization: What are the rules?
This is a new slide
What gives here?
Why 1.2 vs 1.6?
C Staben 2005
15
Energy minimization calculations: Base-stacking
is critical
This is a new slide
- Tinoco et al.
C Staben 2005
16
Ab Initio Energy Calculation
This slide has been changed
  • Search for all possible base-pairing patterns
  • Calculate total energy of each structure based on
    all stabilizing and destabilizing forces
  • Total free energy for a specific RNA conformation
    = Sum of incremental energy terms for:
  • helical stacking
  • (sequence dependent)
  • loop initiation
  • unpaired stacking

(favorable "increments" are < 0)
Fig 6.3 Baxevanis & Ouellette 2005
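The sum described on this slide can be written compactly. The subscripts below are labels chosen here for the three kinds of increments the slide lists; they are not notation from the lecture or the textbook.

```latex
\Delta G_{\text{total}}
  = \sum_{\text{stacks}} \Delta G_{\text{helical stacking}}
  + \sum_{\text{loops}} \Delta G_{\text{loop initiation}}
  + \sum_{\text{unpaired}} \Delta G_{\text{unpaired stacking}}
```

Favorable (stabilizing) increments are negative and destabilizing ones are positive, so the predicted structure is the conformation with the lowest total.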
17
Dynamic Programming
This slide has been changed
  • Finding optimal secondary structure is difficult
    - lots of possibilities
  • Compare RNA sequence with itself
  • Apply scoring scheme based on energy parameters
    for base stacking, cooperativity, and penalties
    for destabilizing forces (loops, bulges)
  • Find path that represents most energetically
    favorable secondary structure
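The dynamic programming idea can be sketched with a deliberately simplified scoring scheme. The example below is a Nussinov-style recursion that maximizes the number of base-pairs rather than minimizing free energy, so it is a stand-in for, not a copy of, what energy-based programs do. The function names and the minimum hairpin loop size of 3 are assumptions made for this illustration.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True if bases a and b can form a pair."""
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Maximum number of nested base-pairs in seq.

    best[i][j] holds the best score for the subsequence seq[i..j];
    the table is filled from short spans to long spans, which is the
    'compare the sequence with itself' matrix from the slide.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            # case 2: base i pairs with some base k further along
            for k in range(i + min_loop + 1, j + 1):
                if can_pair(seq[i], seq[k]):
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]


print(nussinov("GGGAAAUCCC"))  # a small hairpin: 3 stacked pairs
```

Replacing the "+1 per pair" score with stacking energies and loop penalties, and minimizing instead of maximizing, gives the general shape of an energy-based fold; a traceback through the table would recover the structure itself.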

18
3 - Popular Programs that use Combined
Computational & Experimental Approaches
  • Mfold
  • Sfold
  • RNAStructure
  • RNAFold
  • RNAlifold

19
Comparison of Predictions for Single RNA using
Different Methods
JH Lee 2007
20
Comparison of Mfold Predictions +/-
Constraints
Mfold plus constraints: -54.84 kcal/mol
Mfold: -126.05 kcal/mol
JH Lee 2007
21
Performance Evaluation
This slide has been changed
  • Ab initio methods: correlation coefficient
    20-60%
  • Comparative approaches: correlation coefficient
    20-80%
  • Programs that require user to supply MSA are more
    accurate
  • Comparative programs are consistently more
    accurate than ab initio
  • Base-pairs predicted by comparative sequence
    analysis for large & small subunit rRNAs are 97%
    accurate when compared with high resolution
    crystal structures! - Gutell, Pace
  • BEST APPROACH? Methods that combine
    computational prediction (ab initio &
    comparative) with experimental constraints (from
    chemical/enzymatic modification studies)

22
Chp 8 - Gene Prediction
  • SECTION III: GENE AND PROMOTER PREDICTION
  • Xiong: Chp 8 Gene Prediction
  • Categories of Gene Prediction Programs
  • Gene Prediction in Prokaryotes
  • Gene Prediction in Eukaryotes

23
What is a Gene?
  • What is a gene? A segment of DNA, some of which is
    "structural," i.e., transcribed to give a
    functional RNA product, & some of which is
    "regulatory"
  • Genes can encode
  • mRNA (for protein)
  • other types of RNA (tRNA, rRNA, miRNA, etc.)
  • Genes differ in eukaryotes vs prokaryotes (&
    archaea), in both structure & regulation

24
Gene Finding
  • Problem: Given a new genomic DNA sequence,
    identify coding regions and their predicted RNA
    and protein sequences
  • ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT
  • ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT
  • Steps
  • Search against protein / EST database
  • Apply gene prediction programs (many programs
    available)
  • Analyze regulatory regions
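As a toy illustration of the problem statement, the sketch below scans the forward strand of the example sequence from this slide for open reading frames, taking an ORF to be an ATG followed in-frame by the first stop codon. This is a teaching simplification, not how the gene prediction programs discussed later work; the function name `find_orfs` is made up for this example, and the reverse strand is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return (start, end, sequence) for each ATG...stop ORF.

    Only the three forward frames are scanned. start is 0-based and
    end is exclusive, so dna[start:end] is the ORF including its stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open an ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None                     # close it at the stop
    return orfs


example = "ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT"
print(find_orfs(example))
# [(6, 27, 'ATGGGGCAGGGTCAGATATAA')]
```

The single hit, ATG GGG CAG GGT CAG ATA TAA, is the short coding region embedded in the slide's sequence. A second ATG near the 3' end has no in-frame stop before the sequence ends, so it is not reported.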

25
Gene Prediction in Prokaryotes vs Eukaryotes
  • Eukaryotes
  • Large genomes: 10^7 - 10^10 bp
  • Often less than 2% coding
  • Complicated gene structure (splicing, long
    introns)
  • Prediction success: 50-95%
  • Prokaryotes
  • Small genomes: 0.5 - 10 x 10^6 bp
  • About 90% of genome is coding
  • Simple gene structure
  • Prediction success: 99%

26
DNA "Signals" Used by Gene Finding Algorithms
  • Exploit the regular gene structure
  • ATG-Exon1-Intron1-Exon2-...-ExonN-STOP
  • Recognize coding bias
  • CAG-CGA-GAC-TAT-TTA-GAT-AAC-ACA-CAT-GAA-
  • Recognize splice sites
  • Intron-cAGt-Exon-gGTgag-Intron
  • Model the duration of regions
  • Introns tend to be much longer than exons, in
    mammals
  • Exons are biased to have a given minimum length
  • Use cross-species comparison
  • Gene structure is conserved in mammals
  • Exons are more similar (85%) than introns

27
Computational Gene Finding Approaches
  • Ab initio methods
  • Search by signal: find DNA sequences involved in
    gene expression
  • Search by content: test statistical properties
    distinguishing coding from non-coding DNA
  • Similarity based methods
  • Database search: exploit similarity to proteins,
    ESTs, and cDNAs
  • Comparative genomics: exploit aligned genomes
  • Do other organisms have similar sequence?
  • Hybrid methods - best

28
Examples of Gene Prediction Software
  • Ab initio
  • Genscan, GeneMark.hmm, Genie, GeneID
  • Similarity-based
  • BLAST, Procrustes
  • Hybrids
  • GeneSeqer, GenomeScan, GenieEST, Twinscan, SGP,
    ROSETTA, CEM, TBLASTX, SLAM.
  • BEST? Ab initio - GENSCAN (according to some
    assessments)
  • Hybrid - GeneSeqer
  • But depends on organism & specific task
  • Lists of Gene Prediction Software
  • http://www.bioinformaticsonline.org/links/ch_09_t_1.html
  • http://cmgm.stanford.edu/classes/genefind/

29
Synthesis & Processing of Eukaryotic mRNA
Gene in DNA
30
What are cDNAs & ESTs?
  • cDNA libraries are important for determining gene
    structure & studying regulation of gene
    expression
  • Isolate RNA (always from a specific
    organism, region, and time point)
  • Convert RNA to complementary DNA
    (with reverse transcriptase)
  • Clone into cDNA vector
  • Sequence the cDNA inserts
  • Short cDNAs are called ESTs or
    Expressed Sequence Tags
  • ESTs are strong evidence for genes
  • Full-length cDNAs can be difficult to obtain

31
UniGene: Unique genes via ESTs
  • Find UniGene at NCBI
  • www.ncbi.nlm.nih.gov/UniGene
  • UniGene clusters contain many ESTs
  • UniGene data come from many cDNA libraries.
  • When you look up a gene in UniGene, you can
    obtain information re: level & tissue
    distribution of expression

32
Gene Prediction
  • Overview of steps & strategies
  • What sequence signals can be used?
  • What other types of information can be used?
  • Algorithms
  • HMMs, Bayesian models, neural nets
  • Gene prediction software
  • 3 major types
  • many, many programs!

33
Overview of Gene Prediction Strategies
  • What sequence signals can be used?
  • Transcription: TF binding sites, promoter,
    initiation site, terminator, CpG islands, etc.
  • Processing signals: Splice donor/acceptors,
    polyA signal
  • Translation: Start (AUG = Met) & stop (UGA, UAA,
    UAG)
  • ORFs, codon usage
  • What other types of information can be used?
  • Homology (sequence comparison, BLAST)
  • cDNAs & ESTs (experimental data, pairwise
    alignment)

34
Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes
Why?
  • Smaller genomes
  • Simpler gene structures
  • Many more sequenced genomes!
    (for comparative approaches)
Many microbial genomes have been fully sequenced &
whole-genome "gene structure" and "gene
function" annotations are available, e.g.,
  • GeneMark.hmm
  • TIGR Comprehensive Microbial Resource (CMR)
  • NCBI Microbial Genomes
35
Predicting Genes - Basic steps
  • Obtain genomic sequence
  • BLAST it!
  • Perform database similarity search
  • (with EST & cDNA databases, if
    available)
  • Translate in all 6 reading frames
  • (i.e., "6-frame translation")
  • Compare with protein sequence databases
  • Use Gene Prediction software to locate genes
  • Analyze regulatory sequences
  • Refine gene prediction
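The "6-frame translation" step can be sketched in a few lines of Python. The codon table is the standard genetic code written in the conventional TCAG order; the function names are chosen for this illustration, and any codon containing a character other than A, C, G, or T is translated as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, ...).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Map frame labels +1..+3 and -1..-3 to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for label, protein in six_frame_translation(
        "ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT").items():
    print(label, protein)
```

Each of the six protein strings can then be compared with a protein database, which is what translated searches such as BlastX do internally.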

36
Predicting Genes - Details
  • 1st, mask to "remove" repetitive elements
    (ALUs, etc.)
  • Perform database search on translated DNA
    (BlastX, TFasta)
  • Use several programs to predict genes
    (GENSCAN, GeneMark.hmm, GeneSeqer)
  • Search for functional motifs in translated ORFs
    (Blocks, Motifs, etc.) & in neighboring DNA
    sequences
  • Repeat

37
Spliced Alignment Algorithm
GeneSeqer - Brendel et al. - ISU
http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
Brendel et al (2004) Bioinformatics 20: 1157
  • Perform pairwise alignment with large gaps in one
    sequence
  • (due to introns)
  • Align genomic DNA with cDNA, ESTs, protein
    sequences
  • Score semi-conserved sequences at splice
    junctions
  • Using Bayesian model or MM
  • Score coding constraints in translated exons
  • Using a Bayesian model or MM

Brendel 2005
38
Brendel - Spliced Alignment II: Compare with
protein probes
Brendel 2005
39
Splice Site Detection
Do DNA sequences surrounding splice "consensus"
sequences contribute to splicing signal?
YES
i: ith position in sequence
I avg: information content averaged over all positions > 20 nt
from splice site
σ_I: avg sample standard deviation of I
Brendel 2005
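The quantities named on this slide can be sketched as follows, assuming the usual Shannon definition of per-position information content for DNA, I_i = 2 + sum over bases of f log2 f, with no small-sample correction. That formula, the function names, and the 20-nt flank default are assumptions made for this illustration; the slide does not spell out the exact calculation behind the figure.

```python
import math
from statistics import mean, stdev


def information_content(column):
    """Information content (bits) of one alignment column.

    Characters other than A, C, G, T (e.g. gaps) are ignored; a column
    with no counted bases scores 0.
    """
    counts = [column.count(base) for base in "ACGT"]
    total = sum(counts)
    if total == 0:
        return 0.0
    info = 2.0  # maximum for a 4-letter alphabet
    for count in counts:
        if count:
            freq = count / total
            info += freq * math.log2(freq)
    return info


def profile(aligned_seqs):
    """Information content at every position of equal-length sequences."""
    return [information_content("".join(col)) for col in zip(*aligned_seqs)]


def background_stats(info, site, flank=20):
    """Mean and sample standard deviation of I far from the splice site.

    Uses only positions more than `flank` nt from index `site`; at
    least two such positions are needed.
    """
    far = [v for i, v in enumerate(info) if abs(i - site) > flank]
    return mean(far), stdev(far)


# Toy columns: fully conserved, two bases equally frequent, all four equal.
print(information_content("AAAA"), information_content("AACC"),
      information_content("ACGT"))
```

Positions whose information content stands well above the background mean are the candidates for contributing to the splicing signal.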
40
Information content vs position
Which sequences are exons & which are
introns? How can you tell?
Brendel et al (2004) Bioinformatics 20: 1157
Brendel 2005
41
Markov Model for Spliced Alignment
Brendel 2005