Value of organism-specific gene finders

Provided by: StevenS79
Transcript and Presenter's Notes

1
Value of organism-specific gene finders
2
The Combiner
  • Accepts input from any 3 or more gene finders, for example:
  • GENSCAN (Burge & Karlin, 1997)
  • GeneMark.hmm (Lukashin & Borodovsky, 1998)
  • GlimmerHMM
  • others...

3
Candidate Prediction Programs
  • GENSCAN (Burge & Karlin, 1997)
  • GeneMark.hmm (Lukashin & Borodovsky, 1998)
  • GlimmerA (variant of GlimmerM; Salzberg, Pertea,
    Gardner & Tettelin, 1999)

Remember: your vote counts as much as the votes of those
people who actually know who the candidates are!
4
Statistics On The Dataset
  • The gene set
    • 1131 genes
    • 5242 exons
    • 4112 introns
  • Average length of the genes/exons/introns
    • 1048 nt / gene
    • 226 nt / exon
    • 161 nt / intron
  • Maximum number of introns: 33 (gene At2g34680)

5
Evaluating Performance
6
Metrics
  • Sensitivity (Sn) = true positives / actual coding
  • Precision (Pr) = true positives / reported as
    coding
  • (Sn + Pr)/2
  • Percentage of genes exactly correct, end to end
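
The first three metrics can be sketched at the nucleotide level as follows. This is only an illustration: it assumes the prediction and the reference annotation are given as equal-length strings of per-base labels ('C' for coding, 'N' for noncoding). The function name and the label encoding are hypothetical and do not come from the slides.

```python
def coding_metrics(predicted, actual):
    """Return (Sn, Pr, (Sn + Pr) / 2) for per-base 'C'/'N' label strings."""
    if len(predicted) != len(actual):
        raise ValueError("label strings must have equal length")
    # True positives: bases labeled coding in both prediction and reference.
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == "C" and a == "C")
    actual_coding = actual.count("C")       # denominator for sensitivity
    reported_coding = predicted.count("C")  # denominator for precision
    sn = true_pos / actual_coding if actual_coding else 0.0
    pr = true_pos / reported_coding if reported_coding else 0.0
    return sn, pr, (sn + pr) / 2


# Toy example: the predicted exon is shifted one base to the left.
sn, pr, avg = coding_metrics("NCCCCNNN", "NNCCCCNN")
print(sn, pr, avg)
```

The fourth metric (genes exactly correct, end to end) would instead compare whole gene structures, which this per-base sketch does not attempt.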

7
Correctly Predicted Exons
(Chart; axis label: Exons)
8
Oracles
  • The GeneFinder Switch Oracle picks the best gene
    prediction for each sequence
  • The Maximum Precision Oracle picks from the
    exons predicted by all gene finders exactly those
    that are correct
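
A rough sketch of the two oracles for a single sequence, under stated assumptions: exons are represented as (start, end) tuples, each gene finder's output is a set of such tuples, and "best" for the switch oracle is taken to mean the highest exon-level (Sn + Pr)/2. The slides do not say how "best" is scored, so that choice and all names below are hypothetical.

```python
def exon_score(predicted, actual):
    """Exon-level (Sn + Pr) / 2, counting only exactly matching exons."""
    correct = len(predicted & actual)
    sn = correct / len(actual) if actual else 0.0
    pr = correct / len(predicted) if predicted else 0.0
    return (sn + pr) / 2


def switch_oracle(predictions, actual):
    """Return the name of the gene finder with the best score on this sequence."""
    return max(predictions, key=lambda name: exon_score(predictions[name], actual))


def maximum_precision_oracle(predictions, actual):
    """Return exactly those exons, predicted by any gene finder, that are correct."""
    all_predicted = set().union(*predictions.values())
    return all_predicted & actual


# Toy data (invented coordinates, not from the Arabidopsis dataset).
actual = {(10, 50), (80, 120), (200, 260)}
predictions = {
    "GS": {(10, 50), (80, 120)},
    "GM": {(10, 50), (90, 120)},
    "GA": {(15, 50), (80, 120), (200, 260)},
}
best_finder = switch_oracle(predictions, actual)
oracle_exons = maximum_precision_oracle(predictions, actual)
print(best_finder, sorted(oracle_exons))
```

Both oracles consult the true annotation, so they are upper bounds on what a combiner could achieve rather than usable predictors.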

9
Limits on any combiner (Arabidopsis)
(Chart; axis labels: Precision, Exon Level Sensitivity)
10
Distribution of Correct Exon Predictions among the Gene Finders
(Chart; axis label: Exons; region labels: GA-GM, GS-GA, GM-GS, GM-GA, GA-GS, GS-GM)
Legend: GS = GENSCAN, GM = GeneMark.hmm, GA = GlimmerA
11
Majority Voting Algorithm
  • 1 vote for each gene finder
  • Establish a label for each base by voting for it
    as coding or noncoding
  • Labels comprise a mosaic prediction (not a valid
    gene model)
  • Choose the prediction that is closest to the
    voted mosaic
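
The voting steps above can be sketched as follows. This is a minimal illustration, not the Combiner's actual implementation: it assumes each gene finder's prediction is a string of per-base labels ('C' coding, 'N' noncoding), that ties count as noncoding, and that "closest" means smallest Hamming distance to the mosaic. The slides do not define the distance measure, and all function names are hypothetical.

```python
def majority_mosaic(label_strings):
    """Label each base by majority vote; ties are treated as noncoding."""
    n = len(label_strings)
    mosaic = []
    for column in zip(*label_strings):  # one column of votes per base
        coding_votes = column.count("C")
        mosaic.append("C" if coding_votes * 2 > n else "N")
    return "".join(mosaic)


def hamming(a, b):
    """Number of positions at which two equal-length label strings differ."""
    return sum(x != y for x, y in zip(a, b))


def combine(predictions):
    """Return (name of the prediction closest to the mosaic, the mosaic)."""
    mosaic = majority_mosaic(list(predictions.values()))
    best = min(predictions, key=lambda name: hamming(predictions[name], mosaic))
    return best, mosaic


# Toy predictions from three gene finders over a 10-base sequence.
predictions = {
    "GS": "NCCCCNNCCN",
    "GM": "NNCCCNNCCN",
    "GA": "NCCCNNNCCC",
}
chosen, mosaic = combine(predictions)
print(chosen, mosaic)
```

Returning one of the original predictions, rather than the mosaic itself, keeps the output a valid gene model, which is the point made in the third bullet.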

12
Combiner Performance
(Chart; axis labels: Precision, Exon Level Sensitivity)
13
Correct exons and whole genes