Gene prediction in flies

Provided by: andrea145
Transcript and Presenter's Notes

1
Gene prediction in flies
  • Background
  • Gene prediction pipeline
  • Resources

2
Background
3
Genome quality
4
Genes in Drosophila melanogaster
  • high gene density
  • at least 20 with alternative transcripts
  • can be nested
  • on the same strand
  • on different strands
  • di-cistronic
  • involve trans-splicing
  • exons from a different strand

5
Gene prediction pipeline
  • Gene prediction by homology
  • no ab-initio predictions
  • not using genomic alignments
  • TBLASTN/Genewise process
  • quick genome scan to find putative gene
    containing regions
  • aligning peptide sequence to genomic fragment
    using a gene model
  • cds
  • introns
  • splice-sites
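
A minimal sketch of the hand-over between the two steps above: hits from the quick genome scan are merged into candidate fragments, which would then be passed to the gene-model alignment. The hit tuple format, the function name `candidate_regions`, and the `max_gap` and `padding` values are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict


def candidate_regions(hits, max_gap=10000, padding=500):
    """Merge scan hits into genomic fragments worth aligning in full.

    hits: iterable of (query_id, contig, strand, start, end) tuples with
          0-based, half-open genomic coordinates (hypothetical format).
    Returns a list of (query_id, contig, strand, start, end) regions.
    """
    grouped = defaultdict(list)
    for query, contig, strand, start, end in hits:
        grouped[(query, contig, strand)].append((start, end))

    regions = []
    for (query, contig, strand), spans in sorted(grouped.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start - cur_end <= max_gap:
                # Close enough to belong to the same putative gene.
                cur_end = max(cur_end, end)
            else:
                regions.append((query, contig, strand,
                                max(0, cur_start - padding), cur_end + padding))
                cur_start, cur_end = start, end
        regions.append((query, contig, strand,
                        max(0, cur_start - padding), cur_end + padding))
    return regions


hits = [
    ("p1", "chr2L", "+", 1000, 1300),
    ("p1", "chr2L", "+", 2000, 2200),
    ("p1", "chr2L", "+", 50000, 50300),
]
regions = candidate_regions(hits)
```

Each returned region is one genomic fragment for the slower peptide-to-genome alignment. The padding leaves room for exons the scan missed.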

6
(No Transcript)
7
Sensitivity - Selectivity - Speed
  • Genome scan
  • strict trade-off between
  • sensitivity versus memory/time
  • Transcript prediction
  • t ~ O(MN)
  • N: length of peptide sequence (quite short)
  • M: length of DNA sequence (large)
  • you want to minimize
  • the length of the genomic sequence to search
  • the number of fragments you align
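
A small worked example of the O(MN) argument above. The lengths are illustrative assumptions, not measurements from the talk. The point is only that the cost scales with the total genomic length aligned.

```python
def dp_cells(peptide_len, dna_len):
    """Dynamic-programming cells for one alignment, t ~ O(M*N)."""
    return peptide_len * dna_len


def total_cells(peptide_len, fragment_lens):
    """Total cost of aligning one peptide against several fragments."""
    return sum(dp_cells(peptide_len, m) for m in fragment_lens)


N = 400  # residues in the query peptide (assumed)

# Aligning against a whole 20 Mb chromosome arm (assumed length) ...
whole_arm = total_cells(N, [20_000_000])
# ... versus three pre-selected fragments of 20, 15 and 5 kb.
three_fragments = total_cells(N, [20_000, 15_000, 5_000])

print(whole_arm // three_fragments)  # 500
```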

8
Solutions
  • ENSEMBL Minigenes
  • cut out putative introns
  • My pipeline
  • priority lists
  • gene structure conservation
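
The minigene idea ("cut out putative introns") can be sketched as follows. This is not ENSEMBL's implementation; `build_minigene`, `to_genome`, and the `flank` parameter are hypothetical names for illustration.

```python
def build_minigene(genome, exon_hits, flank=50):
    """Concatenate hit regions (plus flanks), dropping sequence between them.

    genome: DNA string of the candidate region.
    exon_hits: list of (start, end), 0-based half-open, on `genome`.
    Returns (minigene, blocks); each block is
    (minigene_start, genome_start, length) for mapping coordinates back.
    """
    padded = sorted((max(0, s - flank), min(len(genome), e + flank))
                    for s, e in exon_hits)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)  # overlapping flanks
        else:
            merged.append([s, e])

    pieces, blocks, offset = [], [], 0
    for s, e in merged:
        pieces.append(genome[s:e])
        blocks.append((offset, s, e - s))
        offset += e - s
    return "".join(pieces), blocks


def to_genome(pos, blocks):
    """Map a minigene position back to the original genomic coordinate."""
    for mini_start, genome_start, length in blocks:
        if mini_start <= pos < mini_start + length:
            return genome_start + (pos - mini_start)
    raise ValueError("position outside minigene")


genome = "A" * 100 + "C" * 30 + "G" * 200 + "T" * 30 + "A" * 100
minigene, blocks = build_minigene(genome, [(100, 130), (330, 360)], flank=10)
```

Here the 460 bp region shrinks to a 100 bp minigene, which reduces M in the alignment step. The blocks let predicted exon coordinates be mapped back to the genome.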

9
Difficulties
  • Terminal exons
  • short and thus alignment signal is weak
  • Spindly genes
  • there is no length penalty on introns

10
Concepts
  • Predict in three passes
  • Predict clear cut cases
  • Predict dubious cases
  • only if they don't overlap with a previous
    prediction
  • Predict alternative transcripts
  • Iteratively search for duplications
  • Accept a prediction with conserved exon boundaries
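
A minimal sketch of the first two passes, assuming each candidate has already been flagged as clear cut or dubious. The `Prediction` record and its fields are hypothetical. The third pass (alternative transcripts) is not modelled here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    name: str
    contig: str
    start: int       # 0-based, half-open
    end: int
    clear_cut: bool  # assumed to be decided upstream


def overlaps(a, b):
    """True if two predictions share any genomic position."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def predict_in_passes(candidates):
    # Pass 1: accept all clear-cut cases.
    accepted = [p for p in candidates if p.clear_cut]
    # Pass 2: accept a dubious case only if it does not overlap
    # anything accepted so far.
    for p in candidates:
        if not p.clear_cut and not any(overlaps(p, q) for q in accepted):
            accepted.append(p)
    return accepted


candidates = [
    Prediction("A", "chr3R", 100, 500, True),
    Prediction("B", "chr3R", 400, 800, False),    # overlaps A
    Prediction("C", "chr3R", 900, 1200, False),
    Prediction("D", "chr3R", 1100, 1300, False),  # overlaps C
]
accepted = predict_in_passes(candidates)
```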

11
Conservation of gene structure
[Figure: exon boundaries of query and prediction mapped onto the query protein. Five query/prediction panels: conserved, partially conserved, single exon, retrotransposed, unconserved.]
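
One way to read the five categories, as a sketch. The exact rules used in the pipeline are not given on the slide, so the definitions below and the `tolerance` parameter are assumptions. Boundaries are residue positions on the query protein where an intron interrupts the coding sequence.

```python
def classify_structure(query_bounds, pred_bounds, tolerance=2):
    """Compare exon boundaries mapped onto the query protein.

    Assumed definitions (not from the talk):
      single exon         - neither query nor prediction has introns
      retrotransposed     - query has introns, prediction has none
      conserved           - same number of boundaries, all matching
      partially conserved - at least one query boundary matches
      unconserved         - no query boundary matches
    """
    if not query_bounds and not pred_bounds:
        return "single exon"
    if query_bounds and not pred_bounds:
        return "retrotransposed"
    shared = sum(
        1 for q in query_bounds
        if any(abs(q - p) <= tolerance for p in pred_bounds)
    )
    if (query_bounds and shared == len(query_bounds)
            and len(pred_bounds) == len(query_bounds)):
        return "conserved"
    if shared > 0:
        return "partially conserved"
    return "unconserved"
```

For example, `classify_structure([120, 250], [121, 250])` counts as conserved under these assumptions, because both boundaries agree within two residues.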
12
Quality control
  • Classify predictions into categories
  • Full length or fragment
  • Gene or pseudogene
  • Conserved or not conserved gene structure
  • Heuristically remove predictions
  • that are redundant
  • that are in conflict
  • nested genes
  • good predictions take precedence over bad
    predictions

13
Results
  • http://wwwfgu.anat.ox.ac.uk:8080/cgi-bin/gbrowse

14
Number of predicted genes
15
Orthology assignments
  • Genes in D. melanogaster with orthologs

16
Technical details
  • Hardware
  • 28 dual CPU nodes with 2 GB memory
  • sun grid engine (SGE)
  • Pipeline logic
  • gmake
  • Tasks
  • Python scripts (and Perl scripts)
  • Bash/awk scripts
  • Database
  • Postgres

17
Downstream analysis
  • Pairwise orthology assignment
  • PhyOP Pipeline (Leo Goodstadt (2006))
  • Multiple orthology assignment
  • My own concoction based on graph clustering with
    some consistency criteria
  • Multiple alignment of cds
  • Dialign (<50 sequences)
  • Muscle (<500 sequences)
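
A rough sketch of going from pairwise to multiple orthology and then picking an aligner. The talk's clustering method and consistency criteria are not described, so plain connected components (union-find) are used here as a stand-in. The size thresholds follow the slide. What happens at 500 or more sequences is not stated, so the sketch returns `None` there.

```python
def cluster_orthologs(pairs):
    """Group genes into connected components of the pairwise orthology graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(c) for c in clusters.values())


def choose_aligner(n_sequences):
    """Aligner by cluster size, following the thresholds on the slide."""
    if n_sequences < 50:
        return "dialign"
    if n_sequences < 500:
        return "muscle"
    return None  # not specified in the talk


pairs = [("dmel_1", "dsim_1"), ("dsim_1", "dyak_1"), ("dmel_2", "dsim_2")]
clusters = cluster_orthologs(pairs)
```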

18
Phylogenetic analysis
  • 14,000 GBlocks cleaned multiple alignments
  • Calculation of ka and ks with PAML
  • Phylogenetic trees
  • Genome trees
  • Gene trees
  • built with Fitch/Kitsch

19
Odds and bits
  • Mapping of PDB -> UniProt -> dmel proteins
  • Mapping of Interpro domains onto predictions
  • not up-to-date
  • Codon bias analysis
  • ENC, CAI, information theoretic measures
  • GC3, GC3_4D
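
A minimal sketch of the two GC3 measures, assuming the standard genetic code and reading GC3_4D as GC content at the third position of fourfold-degenerate codons. That reading and the function name `gc3` are assumptions, not taken from the talk.

```python
# First two bases of the eight fourfold-degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc3(cds, fourfold_only=False):
    """GC fraction at third codon positions of an in-frame coding sequence.

    With fourfold_only=True, only fourfold-degenerate codons are counted.
    Codons containing characters other than A, C, G, T are skipped.
    """
    cds = cds.upper()
    gc = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if any(base not in "ACGT" for base in codon):
            continue
        if fourfold_only and codon[:2] not in FOURFOLD_PREFIXES:
            continue
        total += 1
        gc += codon[2] in "GC"
    return gc / total if total else float("nan")


cds = "ATGGCCCTGAAAGGT"  # ATG GCC CTG AAA GGT
all_sites = gc3(cds)                      # 3 of 5 third positions are G or C
fourfold = gc3(cds, fourfold_only=True)   # GCC, CTG, GGT: 2 of 3
```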

20
Comparison of measures
21
Other groups
  • see http://rana.lbl.gov/drosophila/wiki/index.php/Main_Page
  • Gene predictions by others
  • Don Gilbert: SNAP
  • Lior Pachter: GeneMapper (genomic alignments)
  • Eisen Lab: TBLASTN, Genewise/Exonerate, GeneMapper
  • Batzoglou Lab: CONTRAST
  • Brent Lab: N-Scan
  • Guigo: geneid and SGP2

22
http://insects.eugenes.org/species/news/genome-summaries/genepredictions.html
23
Consensus predictions
  • Gbrowser comparison of all gene predictions
  • http://rana.lbl.gov/drosophila/gbrowse/cgi-bin/gbrowse
  • Mike Eisen's group: GLEAN consensus set
  • Don Gilbert: http://insects.eugenes.org/species/
  • Other resources
  • tRNA predictions
  • genome alignments